D — session replay / review

Claude Code Session Replay — Review Every Agent Action

When a Claude Code or Cowork session produces something surprising — a great result you want to learn from, or a wrong one you need to debug — the natural next question is: what exactly did the agent do? Which files did it read, which tools and MCP servers did it call, what did each skill return, and in what order?

That's what session replay answers. Instead of reading a final summary or scrolling raw logs, you step through the session the way it actually unfolded.

Why "what did the agent do?" is hard to answer by default

Claude Code and Cowork store sessions locally as JSONL transcripts on the machine that ran them. The data is there, but it isn't built for review:

It's scattered across machines — no central place to look across a team or across projects.
It's raw — a flat event stream, not a navigable, human-readable timeline.
It mixes signal and noise — tool calls, partial outputs, and assistant text interleave in ways that are hard to reconstruct after the fact.
There's no quality layer — nothing tells you whether a given step was good, or which skill version produced it.

So in practice, "what did the agent do?" turns into archaeology. Session replay turns it into a click.

What good session replay looks like

A useful replay view reconstructs the session as a timeline you can read and navigate:

Turn by turn — each user prompt and assistant response in order.
Tool and MCP calls attached to the right turn — including the actual command for shell/script steps, and the real server and tool name for MCP calls (not a generic "tool" label).
Skills and agents surfaced — which ran, which version, with what inputs.
Errors and failures visible — so you can see where a run went sideways.
Annotatable — reviewers can comment on a specific turn and flag it good or wrong.

The point is to go from "the agent produced this" to "here's the path it took to get there" — which is what you need to debug, to teach, or to prove a deliverable.

Replay for Claude Code and Cowork with Argus

Argus captures each Claude Code or Cowork session as a structured record and gives you exactly this replay experience. A lightweight plugin instruments the environment; sessions flow into your own workspace, where the replay view reconstructs the full timeline:

See every action — prompts, responses, tool calls, MCP invocations (resolved to real names), and skill/agent runs, anchored to the right point in the conversation.
Review and rate quality — mark turns good or wrong, annotate them, and tag sessions for follow-up.
Catalog what ran — track which skills, MCP servers, and plugins were used, and which versions, across sessions.
Stay private and in control — encrypted, workspace-scoped, visible only to invited members, with redaction and per-session opt-out.

For teams shipping Claude implementations to clients, replay is also proof: you can show a client exactly what was done, and demonstrate that a skill or connector still works the way it should.

FAQ

What does "session replay" mean for Claude Code?

A turn-by-turn record of everything that happened: user prompts, assistant text, every tool call with its input and output, version hashes of every skill that fired. Played back in the order it occurred, with timing — so you can see exactly what the agent did and why.

How is this different from a transcript?

A transcript is text only. Session replay is the full structured event stream: tool inputs/outputs, errors, MCP server calls, hook events, latency per turn. You can filter, annotate, and diff version-over-version. A transcript answers "what was said"; replay answers "what actually happened."

Can I replay Claude Cowork sessions, or only Claude Code in the terminal?

Both, with the right plugin. Cowork sessions run in a sandboxed VM but expose the same hook surface as Claude Code — a capture plugin installed inside the VM (or on the host for Claude Code) produces the same event stream Argus replays.

Does session replay leak secrets if my prompts contained them?

It can if you don't redact. Argus offers entity-level and strict redaction modes via the Anthropic API, and a /private slash command that drops a session entirely. Replay is only as safe as the redaction you've configured upfront.

How long are replayed sessions kept?

For the alpha, indefinitely (you can delete any session manually). Retention policies are on the post-alpha roadmap — workspace-scoped, with sensible defaults and per-session overrides.