Conversation Replay
Replay past conversations against your current AI configuration. Compare original responses side by side with what your AI would say now — with full retrieval, confidence, and escalation data.
Full pipeline
Each user message goes through query synthesis, embedding, knowledge base retrieval, and response generation — exactly what happens in production. The only variable is your current configuration: prompts, knowledge base, and model.
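Here is a minimal Python sketch of how one replayed exchange might pass through those four stages. Everything in it is hypothetical and stands in for the real system: `Config`, `run_pipeline`, the bag-of-words `embed`, and the one-turn query synthesis are toy choices made for illustration. It shows that only the `Config` (prompt, knowledge base, model) changes between runs.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Config:
    """Hypothetical bundle of the three inputs a replay varies."""
    system_prompt: str
    knowledge_base: list[str]       # documents available for retrieval
    model: Callable[[str], str]     # stand-in for a real model call


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real pipeline would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def run_pipeline(history: list[str], user_message: str, config: Config, k: int = 2):
    """Run one user message through all four stages with the given config."""
    # 1. Query synthesis: naive stand-in that prepends the previous turn.
    query = " ".join(history[-1:] + [user_message])
    # 2. Embedding.
    query_vec = embed(query)
    # 3. Retrieval: rank knowledge base documents by similarity, keep the top k.
    sources = sorted(
        ((cosine(query_vec, embed(doc)), doc) for doc in config.knowledge_base),
        reverse=True,
    )[:k]
    # 4. Response generation with the configured prompt, context, and model.
    context = "\n".join(doc for _, doc in sources)
    prompt = f"{config.system_prompt}\n\nContext:\n{context}\n\nUser: {user_message}"
    return config.model(prompt), sources
```

Running the same history and message with two `Config` values that differ only in, say, `knowledge_base` isolates the effect of that one change.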
Deterministic signals
Every exchange is scored using measurable signals: confidence levels, retrieval quality, and escalation behavior. You see exactly what changed in the pipeline. Judging the quality of the response itself is left to the human reviewer.
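One way such a deterministic verdict could be computed is sketched below. The scoring rules are assumptions for illustration, not the product's actual logic: confidence is treated as an ordinal `low`/`medium`/`high` level, retrieval quality as a single 0 to 1 score with a hypothetical tolerance `min_retrieval_delta`, and a newly triggered escalation is assumed to count as a regression. Each signal casts one vote, and mixed signals cancel out to "Unchanged".

```python
from dataclasses import dataclass

# Assumed ordinal confidence levels.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Signals:
    confidence: str          # "low" | "medium" | "high" (assumed)
    retrieval_score: float   # e.g. mean relevance of retrieved sources, 0..1 (assumed)
    escalated: bool


def verdict(original: Signals, replayed: Signals, min_retrieval_delta: float = 0.05) -> str:
    votes = 0
    # Confidence: a higher level counts as better, a lower one as worse.
    conf_delta = CONFIDENCE_RANK[replayed.confidence] - CONFIDENCE_RANK[original.confidence]
    votes += (conf_delta > 0) - (conf_delta < 0)
    # Retrieval: ignore changes smaller than the tolerance.
    r_delta = replayed.retrieval_score - original.retrieval_score
    if abs(r_delta) >= min_retrieval_delta:
        votes += 1 if r_delta > 0 else -1
    # Escalation (assumption): newly escalating is worse, no longer escalating is better.
    if original.escalated != replayed.escalated:
        votes += -1 if replayed.escalated else 1
    if votes > 0:
        return "Improved"
    if votes < 0:
        return "Worsened"
    return "Unchanged"
```

Because every input is a recorded number or flag, the same pair of exchanges always yields the same verdict.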
Actionable results
Unchanged exchanges are collapsed. Improved and worsened exchanges are front and center with full retrieval source data, relevance scores, and a clear change summary.
How it works
No setup required. If you have past conversations, you can replay them.
Select any closed conversation from your inbox or click "Replay" on the conversation detail page.
Each user message is re-run through your current AI pipeline — query synthesis, retrieval, and response generation.
See original vs. replayed responses side by side with retrieval data, confidence scores, and an automated verdict.
FAQ
A replay feeds each original user message through the full current AI pipeline: query synthesis, embedding, knowledge base retrieval, and response generation. It uses the same conversation history the original had, so the only variable is your current configuration.
Replays use the original history. When replaying message N, the AI sees the original messages up to that point, not the replayed responses. This isolates the test to your current pipeline configuration, so a change in one replayed response cannot cascade into later exchanges.
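The history rule in this answer can be sketched in a few lines of Python. `Message`, `replay_conversation`, and the `respond` callback are hypothetical names; `respond` stands in for the current AI pipeline. The key line is `history = transcript[:i]`: each replayed message sees only the original transcript prefix.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Message:
    role: str   # "user" or "assistant"
    text: str


def replay_conversation(
    transcript: list[Message],
    respond: Callable[[list[Message], Message], str],
) -> list[tuple[str, Optional[str], str]]:
    """Replay each user message; returns (user text, original reply, replayed reply)."""
    results = []
    for i, msg in enumerate(transcript):
        if msg.role != "user":
            continue
        # Message N sees only the ORIGINAL messages before it, never replayed output.
        history = transcript[:i]
        nxt = transcript[i + 1] if i + 1 < len(transcript) else None
        original_reply = nxt.text if nxt is not None and nxt.role == "assistant" else None
        results.append((msg.text, original_reply, respond(history, msg)))
    return results
```

Appending each replayed reply to `history` instead would make later exchanges depend on earlier replayed answers, which is the cascading effect this design avoids.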
Verdicts are based on deterministic signals only: confidence level changes, retrieval quality differences, and escalation behavior. The system does not attempt to judge response quality subjectively — that is left to the human reviewer who can read both responses side by side.
Yes, you can run multiple replays. Each replay runs independently as a background job, so you can queue several and review their results from the Replays dashboard.
Yes, replays incur usage costs. Each replayed exchange makes a real call to the AI model with full retrieval, so it costs the same as if the conversation were happening live. Replay usage is tracked in your billing dashboard.
Stop guessing whether your changes improved things. Replay and see.