Argus vs Langfuse for Claude Cowork sessions
Langfuse is the open-source LLM observability platform that's earned its spot in the AI-tooling stack. It traces prompts, completions, tool calls, costs, and evaluations across any LLM provider you point it at. It's excellent at what it does.
Argus is not a Langfuse competitor. Argus is a tool for a much narrower problem: capturing and replaying Claude Cowork sessions, the ones that happen inside Anthropic's sandboxed desktop runtime where your normal observability stack doesn't reach.
This page exists because we want you to make the right call, not because we want to win a head-to-head.
When you want Langfuse
You're building an LLM product — an app, a chatbot, an agent loop — where you control the prompts and want to trace, evaluate, and iterate on them across many users and many calls. Langfuse's tracing model, prompt management, and eval framework are excellent for this. It also plugs into multiple providers (Anthropic, OpenAI, Mistral, etc.) — which is exactly what you want when your product calls more than one model.
When you want Argus
You're shipping Claude Cowork into someone's organization: an agency delivering an implementation to a client, a forward-deployed engineer rolling out to an internal team, or a consultant who needs to QA what skills do once they leave the consultant's laptop. The sessions you care about happen inside Cowork's VM, not inside an app you wrote, and what your team writes is Cowork skills and MCP servers, not LLM calls in product code.
Argus is purpose-built for that surface:
- Captures Cowork sessions via a plugin running inside the VM.
- Inventories every skill, MCP server, and plugin loaded per session.
- Tracks skill versions so you can diff before/after a refinement.
- Surfaces the workspace/user/identity layer Cowork itself doesn't.
- Ships with a /private opt-out and per-workspace redaction.
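To make the "diff before/after a refinement" idea concrete, here is a minimal sketch using only Python's standard library. It is not Argus's implementation or API. The skill contents, the `diff_skill` helper, and the `skill` file label are all hypothetical and exist only to show what a before/after comparison of two skill versions could look like.

```python
import difflib

# Hypothetical skill text before and after a refinement (illustrative only).
skill_v1 = """\
name: summarize-contract
description: Summarize a contract.
steps:
  - Read the document.
  - Write a summary.
"""

skill_v2 = """\
name: summarize-contract
description: Summarize a contract and flag renewal clauses.
steps:
  - Read the document.
  - List renewal and termination clauses.
  - Write a summary.
"""


def diff_skill(before: str, after: str, name: str = "skill") -> str:
    """Return a unified diff between two versions of a skill's text."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{name}@v1",  # hypothetical version labels
        tofile=f"{name}@v2",
    )
    return "".join(lines)


print(diff_skill(skill_v1, skill_v2))
```

Lines prefixed with `-` come from the earlier version and lines prefixed with `+` from the refined one, so a reviewer can see exactly what changed between the two versions.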
Can I use both?
Yes, and we'd recommend it once you have a product layer behind your Cowork delivery:
- Langfuse captures the traces in your own backend services and agent loops.
- Argus captures what's happening inside the Cowork sessions your team or client is running on their machines.
They don't overlap — Cowork is a runtime Langfuse can't reach, and your product backend is a layer Argus doesn't try to reach.
What Langfuse does that Argus doesn't
- Multi-provider tracing. Langfuse handles OpenAI / Gemini / Cohere the same way it handles Anthropic. Argus is Cowork-only by design.
- Prompt management UI. Langfuse has a polished workflow for versioning and A/B-testing prompts. Argus tracks versions of skills, not arbitrary prompts.
- A large eval framework. Langfuse has a mature evaluations product with LLM-as-judge built in. Argus deliberately puts turn-level human QA first.
What Argus does that Langfuse doesn't
- Cowork session capture. Langfuse can't see what happened inside a Cowork VM unless you pipe events to it via OpenTelemetry (OTel), and even then you get metrics, not session content.
- Skill / MCP catalog. Langfuse traces calls; Argus catalogs the surface — every skill version your team has ever used, every MCP any teammate has loaded.
- Compliance-grade Cowork audit trail. A specific use case the general-purpose observability tools don't address.
Verdict
If you're picking one: pick Langfuse if you're building an AI product and your Cowork usage is incidental. Pick Argus if your team's deliverable is Cowork — skills, MCPs, agents running on someone else's machines — and you need visibility into that specific surface.
Most serious teams shipping Cowork to clients will eventually want both.