Scale AI agents for high availability and load
If you're testing AI agents that are used in production, it's important to ensure they can handle the total conversation load, accounting for both live customer traffic and the additional load generated by simulations.
For LivePerson AI agents (built and deployed in LivePerson Conversation Builder), the primary scaling mechanism is the agent connector.
- Capacity limit: A single bot agent connector can handle a maximum of 999 concurrent conversations. (See our Conversation Builder deployment best practices and learn how to add a connector.)
- Proactive scaling: You must evaluate whether the bot is currently handling live traffic and add additional connectors ahead of anticipated spikes to prevent performance degradation.
For third-party AI agents used and managed through LivePerson’s Conversational Cloud, similar scaling principles apply. In this case, a connector shouldn't handle more than 250 conversations, so add additional connectors as needed.
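For capacity planning, the arithmetic is simple: divide the combined live and simulation load by the per-connector limit and round up. The sketch below illustrates this with made-up traffic figures (the numbers are assumptions, not output from any LivePerson tool), using the limits above: 999 concurrent conversations per Conversation Builder connector, 250 for third-party AI agents.

```python
import math

def connectors_needed(live_concurrent: int,
                      simulation_concurrent: int,
                      per_connector_limit: int) -> int:
    """Estimate how many connectors are required to carry live traffic
    plus the extra concurrent load generated by a simulation."""
    total = live_concurrent + simulation_concurrent
    return math.ceil(total / per_connector_limit)

# Illustrative numbers only (assumptions, not real traffic data):
# a Conversation Builder bot at 1,800 live + 300 simulated conversations,
# and a third-party AI agent at 400 live + 100 simulated.
print(connectors_needed(1800, 300, per_connector_limit=999))  # -> 3
print(connectors_needed(400, 100, per_connector_limit=250))   # -> 2
```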
Manage human agent load via CCaaS tools
The introduction of simulations shouldn't overwhelm your human agents. Rely on the built-in workload management tools within your CCaaS solution to manage their load. These features allow you to set strict concurrency limits, ensuring agents only handle a manageable number of conversations at once. By automating these caps, you protect your team from burnout while maintaining high-quality service for every customer, whether real or synthetic.
Blind versus open testing
When using Syntrix to evaluate human agents, you must choose between blind testing (where agents are not told they are interacting with a synthetic customer) and open testing (where the simulation is disclosed).
While blind testing provides the most authentic assessment of an agent's natural performance, it carries cultural risks around surveillance and employee trust.
Conversely, open testing fosters a psychological "safe zone" ideal for coaching and upskilling without performance anxiety.
We recommend prioritizing open testing for training purposes to build confidence, and reserving blind testing for compliance audits. In that case, make sure agents are informed beforehand that synthetic quality assurance checks are a part of standard operations.
Before you run a simulation
- Ensure agent availability. Confirm your human agents are logged in—or your AI agents are deployed and online—for all skills in the scenarios. If a conversation cannot route, it will eventually time out and consume your allotment of total conversations in the simulation.
- Pilot your test. Run 3–5 conversations before starting a full-scale test. Use these initial runs to catch configuration errors, such as incorrect skills.
When you configure a simulation
Total conversations
To ensure your performance data is meaningful, aim for at least 5–10 conversations per agent. Calculate your total test volume by scaling this target against your active agent count.
| Testing/training goal | Recommended value for "Total conversations" |
| --- | --- |
| Quick validation or smoke test | 3–5 conversations is enough to validate that flows are working before committing to a full run. |
| Agent training | Scale to your agent count to ensure everyone gets meaningful exposure. |
| Bot testing | Scale volume to bot complexity. Simple flows might require fewer conversations to surface issues. |
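A quick way to turn the per-agent rule of thumb above into a concrete "Total conversations" value is to multiply it by the number of participating agents. A minimal sketch (the agent counts are illustrative assumptions):

```python
def total_conversations(active_agents: int,
                        conversations_per_agent: int = 5) -> int:
    """Scale the per-agent target (5-10 recommended) by the number of
    agents participating in the simulation."""
    return active_agents * conversations_per_agent

# Example: 12 agents at the low end of the recommended range (5 each).
print(total_conversations(12, 5))   # -> 60
# At the high end (10 each), the same team needs 120 conversations.
print(total_conversations(12, 10))  # -> 120
```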
Max. concurrent conversations
Getting this value right is critical: it controls how much load the simulation places on your agents or bots at any one time.
- When training human agents: Set concurrency slightly above total agent capacity (agents × conversations per agent) to keep a small queue. This ensures agents always have work without a long backlog. For example, 5 agents at 3 concurrent each = 15 capacity, so 18–20 concurrency works well.
- When testing AI agents: Bots can handle much higher concurrency. Start at 10–15 and increase to stress-test throughput, but monitor for rate-limit errors.
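The human-agent guidance above boils down to a small calculation: total capacity plus a modest buffer so a short queue forms. A sketch of that arithmetic follows; the 20% buffer is an assumption chosen so that a capacity of 15 yields a setting of 18, as in the example above.

```python
import math

def recommended_concurrency(agents: int,
                            concurrent_per_agent: int,
                            queue_buffer: float = 0.2) -> int:
    """Set 'Max. concurrent conversations' slightly above total agent
    capacity so a small queue forms without a long backlog."""
    capacity = agents * concurrent_per_agent
    return math.ceil(capacity * (1 + queue_buffer))

# Example from above: 5 agents handling 3 conversations each = 15 capacity,
# so roughly 18 concurrent conversations keeps everyone busy.
print(recommended_concurrency(5, 3))  # -> 18
```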
Max. conversation turns
This is the safety net that prevents conversations from looping indefinitely.
When this limit is reached, the conversation is closed automatically by the system. This can negatively affect assessment scores, so choose a limit that allows legitimate conversation flows to finish.
- General recommendation: 20–30 turns for typical conversations.
- When testing AI agents: Set lower (10–15) to detect infinite loops quickly. If a bot can't resolve an issue in 15 turns, that's a finding in itself.
- For complex, multi-step scenarios: Increase to 30 if the scenario requires extensive back-and-forth, for example, technical troubleshooting with multiple steps.
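Putting the three settings together, here is how two typical runs might be planned. These are plain Python dictionaries for planning purposes only; the field names are illustrative assumptions, not the actual Syntrix configuration schema.

```python
# Illustrative planning values; field names are assumptions, not the real schema.
bot_stress_test = {
    "total_conversations": 50,           # scaled to the complexity of the bot flows
    "max_concurrent_conversations": 15,  # start at 10-15, raise to stress-test throughput
    "max_conversation_turns": 15,        # low limit to surface infinite loops quickly
}

human_agent_training = {
    "total_conversations": 40,           # 5 agents x 8 conversations each
    "max_concurrent_conversations": 18,  # 15 capacity (5 agents x 3) plus a small queue
    "max_conversation_turns": 25,        # within the 20-30 range for typical flows
}
```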
During a simulation
Monitor the conversations in the agent workspace
Watch the first 5–10 conversations closely; look for:
- In-progress conversations with no turns progressing (routing issue)
- Rapid closures with very few conversation turns (bot immediately closing, or agent rejecting)
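If you log or export conversation activity during the run, both warning signs above are easy to flag programmatically. A rough sketch, assuming a hypothetical list of monitoring records (the field names and the five-minute and two-turn thresholds are assumptions):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical monitoring records; field names and thresholds are assumptions.
conversations = [
    {"id": "c1", "turns": 0, "started": datetime.now(timezone.utc) - timedelta(minutes=6), "closed": False},
    {"id": "c2", "turns": 2, "started": datetime.now(timezone.utc) - timedelta(minutes=3), "closed": True},
    {"id": "c3", "turns": 9, "started": datetime.now(timezone.utc) - timedelta(minutes=4), "closed": False},
]

now = datetime.now(timezone.utc)
for c in conversations:
    age = now - c["started"]
    # No turns after several minutes usually means the conversation never routed.
    if not c["closed"] and c["turns"] == 0 and age > timedelta(minutes=5):
        print(f"{c['id']}: possible routing issue (no turns after {age})")
    # Closing within a couple of turns suggests the bot closed it immediately
    # or the agent rejected it.
    if c["closed"] and c["turns"] <= 2:
        print(f"{c['id']}: rapid closure after {c['turns']} turns")
```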
Stop early if something is wrong
If you see systematic errors (wrong skill, broken bot, agent confusion, etc.), stop the simulation instead of letting bad conversations run their course. Fix the issue and re-run the simulation. Allowing flawed simulations to continue wastes valuable agent time and compromises the integrity of your reports.
After a simulation
Analyze at three levels
- Macro (overall simulation): Start with the summary. Did the simulation succeed? Look for broad patterns and system-wide trends.
- Mid (agent performance): Drill into individual results to identify exactly who needs coaching and on which specific skills, and which AI agents need tuning.
- Micro (conversation deep-dive): Review specific failures. Use these transcripts as concrete, real-world examples for training sessions.
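If your results can be exported as per-conversation records, the three levels map onto simple aggregations. A sketch under that assumption (the export fields shown are hypothetical):

```python
from collections import defaultdict

# Hypothetical per-conversation export; the fields are assumptions.
results = [
    {"conversation_id": "c1", "agent": "dana", "passed": True},
    {"conversation_id": "c2", "agent": "dana", "passed": False},
    {"conversation_id": "c3", "agent": "lee",  "passed": True},
    {"conversation_id": "c4", "agent": "lee",  "passed": True},
]

# Macro: overall pass rate for the simulation.
overall = sum(r["passed"] for r in results) / len(results)
print(f"Overall pass rate: {overall:.0%}")

# Mid: pass rate per agent, to see who needs coaching or tuning.
by_agent = defaultdict(list)
for r in results:
    by_agent[r["agent"]].append(r["passed"])
for agent, outcomes in by_agent.items():
    print(f"{agent}: {sum(outcomes) / len(outcomes):.0%}")

# Micro: the specific failed conversations to review as coaching examples.
failures = [r["conversation_id"] for r in results if not r["passed"]]
print("Review transcripts:", failures)
```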
Track long-term progress
Run the same simulations periodically to measure ROI. Consistent benchmarking is the only way to verify if your bot updates, process changes, or agent coaching programs are actually working.
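One lightweight way to benchmark is to record the headline metric from each periodic run and compare it run over run. A sketch with made-up numbers (the run labels and pass rates are assumptions):

```python
# Hypothetical pass rates from repeating the same simulation each month.
benchmark = [
    ("2025-06", 0.62),
    ("2025-07", 0.71),
    ("2025-08", 0.78),
]

# Run-over-run change shows whether coaching or bot updates are paying off.
for (prev_label, prev), (label, score) in zip(benchmark, benchmark[1:]):
    delta = score - prev
    print(f"{label}: {score:.0%} ({delta:+.0%} vs {prev_label})")
```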
Close the feedback loop
Use your results to sharpen your scenarios and personas:
- Too easy? If everyone passes, increase the scenario difficulty or tighten your agent goals.
- Universal failures? If most agents fail the same metric, decide if the goal is unrealistic or if you've found a critical training gap.
- Repetitive outcomes? If multiple personas always lead to the same result, your personas are too generic. Sharpen their unique traits to ensure they sound like distinct individuals.