Conversation Replay

Test every change before your customers feel it.

Replay past conversations against your current AI configuration. Compare original responses side by side with what your AI would say now, including full retrieval, confidence, and escalation data.

  • Full pipeline replay
  • Side-by-side comparison
  • Deterministic verdicts

Example replay results: 3 improved, 4 unchanged, 0 worsened. Listed exchanges include Exchange #8 (Facebook login, improved), Exchange #10 (support email, improved), and Exchange #1 (greeting, unchanged). A progress view shows message 5 of 7 moving through four steps: query synthesis (done), embedding & retrieval (done), AI response generation (running), and verdict computation (pending).

Full pipeline

Re-run every step of the AI pipeline

Each user message goes through query synthesis, embedding, knowledge base retrieval, and response generation, exactly as in production. The only variable is your current configuration: prompts, knowledge base, and model.
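To make the stages concrete, here is a minimal Python sketch of replaying a single exchange. Everything in it is hypothetical: `Config`, `replay_exchange`, and the helper functions are illustrative names, the bag-of-words "embedding" stands in for a real embedding model, and `generate` stands in for the model call. What it shows is the order of the stages, with the configuration passed in as the only thing you change between the original run and the replay.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for the current prompts, knowledge base, and model."""
    system_prompt: str
    knowledge_base: dict[str, str]  # chunk id -> chunk text
    model: str


def synthesize_query(history: list[str], message: str) -> str:
    # Placeholder: prepend the previous turn so follow-up questions keep context.
    return f"{history[-1]} {message}" if history else message


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real pipeline would call an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, kb: dict[str, str], k: int = 3, min_score: float = 0.2) -> list[tuple[str, float]]:
    # Score every chunk, drop those below the (hypothetical) threshold, keep the top k.
    q = embed(query)
    scored = [(cid, cosine(q, embed(text))) for cid, text in kb.items()]
    kept = [s for s in scored if s[1] >= min_score]
    return sorted(kept, key=lambda s: s[1], reverse=True)[:k]


def generate(config: Config, message: str, chunks: list[tuple[str, float]]) -> str:
    # Placeholder for the model call.
    return f"[{config.model}] reply to {message!r} grounded in {len(chunks)} chunk(s)"


def replay_exchange(config: Config, history: list[str], message: str) -> dict:
    query = synthesize_query(history, message)        # 1. query synthesis
    chunks = retrieve(query, config.knowledge_base)   # 2. embedding and retrieval
    response = generate(config, message, chunks)      # 3. response generation
    return {"query": query, "chunks": chunks, "response": response}


cfg = Config(
    system_prompt="Be concise.",
    knowledge_base={
        "kb-1": "Reset your password from the login page.",
        "kb-2": "Billing runs monthly.",
    },
    model="model-x",
)
result = replay_exchange(cfg, [], "How do I reset my password?")
```

In this toy run only `kb-1` clears the relevance threshold; `kb-2` shares no words with the question and is filtered out. Swapping in a different `Config` (new prompt, edited knowledge base, or another model) and calling `replay_exchange` again is the essence of a replay.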

Example signals for one exchange: confidence none → high, retrieval filtered → 3 chunks, escalation missed → triggered. Verdict: Improved.

Deterministic signals

Objective data, not subjective judgment

Every exchange is scored using measurable signals: confidence levels, retrieval quality, and escalation behavior. You see exactly what changed in the pipeline. Response quality is left to the human reviewer.
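The exact scoring rules are not published here, so the following is only a sketch of how a deterministic verdict could be computed from those three signals. The `Confidence` scale, the `Signals` fields, and the policy "any regression means worsened" are assumptions chosen for illustration; in particular, `escalation_correct` assumes you know from the original conversation whether escalation was needed.

```python
from dataclasses import dataclass
from enum import IntEnum


class Confidence(IntEnum):
    # Hypothetical ordered confidence levels.
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Signals:
    confidence: Confidence
    chunks_retrieved: int      # 0 means every candidate chunk was filtered out
    escalation_correct: bool   # hypothetical: did escalation match what the conversation needed?


def compare(before: int, after: int) -> int:
    """+1 if the signal went up, -1 if it went down, 0 if unchanged."""
    return (after > before) - (after < before)


def verdict(original: Signals, replayed: Signals) -> str:
    deltas = [
        compare(original.confidence, replayed.confidence),
        # Only "nothing retrieved" vs. "something retrieved" counts as a change.
        compare(min(original.chunks_retrieved, 1), min(replayed.chunks_retrieved, 1)),
        compare(original.escalation_correct, replayed.escalation_correct),
    ]
    if any(d < 0 for d in deltas):
        return "worsened"   # any regression outweighs improvements elsewhere
    if any(d > 0 for d in deltas):
        return "improved"
    return "unchanged"
```

Because the function only compares recorded values, the same pair of inputs always yields the same verdict; judging whether the wording of a response got better is deliberately left out.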

Example: improved Exchanges #8 and #10 are expanded; unchanged Exchanges #1 and #2 are collapsed.

Actionable results

Focus on what changed, skip the noise

Unchanged exchanges are collapsed. Improved and worsened exchanges are front and center with full retrieval source data, relevance scores, and a clear change summary.
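A small sketch of that triage step, assuming each exchange already carries a verdict. `ExchangeResult`, `summarize`, `layout`, and the display order (worsened first, then improved, then unchanged) are hypothetical; the page only states that changed exchanges are shown prominently and unchanged ones are collapsed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ExchangeResult:
    number: int
    verdict: str  # "improved", "unchanged", or "worsened"


# Hypothetical display order: regressions first, then improvements, then the rest.
ORDER = {"worsened": 0, "improved": 1, "unchanged": 2}


def summarize(results: list[ExchangeResult]) -> dict[str, int]:
    """Count exchanges per verdict, including verdicts with zero exchanges."""
    counts = Counter(r.verdict for r in results)
    return {v: counts.get(v, 0) for v in ORDER}


def layout(results: list[ExchangeResult]) -> list[tuple[int, bool]]:
    """Return (exchange number, expanded?) pairs in display order."""
    ordered = sorted(results, key=lambda r: (ORDER[r.verdict], r.number))
    return [(r.number, r.verdict != "unchanged") for r in ordered]


results = [
    ExchangeResult(1, "unchanged"),
    ExchangeResult(2, "unchanged"),
    ExchangeResult(8, "improved"),
    ExchangeResult(10, "improved"),
]
```

Here `layout(results)` puts Exchanges #8 and #10 first and expanded, followed by #1 and #2 collapsed, while `summarize(results)` reports zero worsened exchanges.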

How it works

Three steps to regression-test your AI

No setup required. If you have past conversations, you can replay them.

01

Pick a conversation

Select any closed conversation from your inbox or click "Replay" on the conversation detail page.

02

Run the replay

Each user message is re-run through your current AI pipeline: query synthesis, embedding and retrieval, and response generation.

03

Review the comparison

See original vs. replayed responses side by side with retrieval data, confidence levels, and an automated verdict.

FAQ

Conversation Replay FAQs

What does a replay actually re-run?

A replay feeds each original user message through the full current AI pipeline: query synthesis, embedding, knowledge base retrieval, and response generation. It uses the same conversation history the original had, so the only variable is your current configuration.

Does replay use the current or original conversation history?

Original. When replaying message N, the AI sees the original messages up to that point, not the replayed responses. This isolates the test to your current pipeline configuration: a change in one replayed response cannot cascade into the input for later messages.
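The rule can be sketched in a few lines of Python. `Message`, `replay_inputs`, and `replay_conversation` are hypothetical names, and `respond` stands in for the whole pipeline. The key detail is that each history is a slice of the original transcript, and the replayed reply is collected separately instead of being appended to that history.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


def replay_inputs(transcript: list[Message]) -> Iterator[tuple[list[Message], Message]]:
    """Yield (history, user_message) for every user turn.

    The history is always a slice of the ORIGINAL transcript, so replayed
    responses never leak into later turns.
    """
    for i, msg in enumerate(transcript):
        if msg.role == "user":
            yield transcript[:i], msg


def replay_conversation(
    transcript: list[Message],
    respond: Callable[[list[Message], str], str],
) -> list[str]:
    replayed = []
    for history, user_msg in replay_inputs(transcript):
        # The new reply is stored on its own; it is NOT added to the history.
        replayed.append(respond(history, user_msg.text))
    return replayed


transcript = [
    Message("user", "hi"),
    Message("assistant", "hello"),
    Message("user", "login fails"),
    Message("assistant", "try a password reset"),
]
```

When the second user message is replayed, its history ends with the original assistant reply "hello", whatever the replayed first reply turned out to be.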

How is the improved/worsened verdict determined?

Verdicts are based on deterministic signals only: confidence level changes, retrieval quality differences, and escalation behavior. The system does not attempt to judge response quality subjectively; that is left to the human reviewer, who can read both responses side by side.

Can I replay multiple conversations?

Yes. Each replay runs independently as a background job. You can queue several replays and review their results from the Replays dashboard.
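How the background jobs are scheduled is not described, so this is only an illustrative sketch of "independent" replays using Python's `asyncio`. `run_replay`, `run_queue`, the `max_concurrent` cap, and the simulated failure are all hypothetical. Each job is wrapped so that an error in one replay is recorded as a failed result rather than cancelling the others.

```python
import asyncio


async def run_replay(conversation_id: str) -> dict:
    # Hypothetical job body; a real job would replay every user message.
    await asyncio.sleep(0.01)
    if conversation_id == "broken":
        raise RuntimeError("conversation not found")
    return {"conversation": conversation_id, "status": "done"}


async def run_queue(conversation_ids: list[str], max_concurrent: int = 2) -> list[dict]:
    sem = asyncio.Semaphore(max_concurrent)  # cap on replays running at once

    async def guarded(cid: str) -> dict:
        async with sem:
            try:
                return await run_replay(cid)
            except Exception as exc:
                # One failed replay does not affect the others.
                return {"conversation": cid, "status": "failed", "error": str(exc)}

    # gather() returns results in the same order as the queued ids.
    return await asyncio.gather(*(guarded(c) for c in conversation_ids))


results = asyncio.run(run_queue(["conv-1", "broken", "conv-3"]))
```

Here `conv-1` and `conv-3` finish normally while `broken` is reported as failed, which mirrors reviewing each queued replay on its own.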

Does replay cost AI credits?

Yes. Each replayed exchange makes a real call to the AI model with full retrieval, so it costs the same as if the conversation had happened live. Replay usage is tracked in your billing dashboard.

Ship AI changes with confidence

Stop guessing whether your changes improved things. Replay and see.