Drift Detection & Diffing
One of the primary challenges in AI agent development is understanding why two runs produced different results. ctx includes a diffing engine built for agentic workflows. It identifies "drift" across prompts, tool usage, parameters, reasoning, and final outputs.
The ctx diff command compares two immutable Context Packs and generates a structured report of their differences.
Basic Usage
To compare two execution runs, provide their hashes (or short hex prefixes) to the diff command:
```shell
ctx diff <hash-a> <hash-b>
```
By default, this prints a machine-readable JSON report, which suits CI/CD pipelines and automated regression testing. For a summary formatted for terminal viewing, add the `--human` flag:
```shell
ctx diff <hash-a> <hash-b> --human
```
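In a CI job you will usually call `ctx diff` from a script rather than by hand. The following Python sketch shows one way to do that. It assumes the JSON report is written to stdout and makes no assumption about the command's exit code. The function name `run_ctx_diff` and its injectable `runner` parameter are illustrative and are not part of ctx.

```python
import json
import subprocess
from typing import Any, Callable


def run_ctx_diff(
    hash_a: str,
    hash_b: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> dict[str, Any]:
    """Run `ctx diff` (default JSON mode) and parse the report.

    Assumption: the JSON report goes to stdout. The exit code is not
    interpreted, because its convention is not documented here.
    """
    result = runner(
        ["ctx", "diff", hash_a, hash_b],
        capture_output=True,
        text=True,
    )
    if not result.stdout.strip():
        # No report on stdout: surface whatever ctx printed on stderr instead.
        raise RuntimeError(f"ctx diff produced no JSON output: {result.stderr.strip()}")
    return json.loads(result.stdout)
```

Call it with real hashes or prefixes, for example `run_ctx_diff("7a2b", "4f8e")`. Passing a fake `runner` lets you unit-test the calling code without ctx installed.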
Understanding Drift Types
The diffing engine categorizes differences into specific "Drift Types" to help you pinpoint the source of divergence:
| Drift Type | Description |
| :--- | :--- |
| prompt_drift | Changes in the system prompt or user prompt sequence. |
| tool_drift | Divergence in which tools were called, or changes in the order of tool execution. |
| param_drift | A tool was called with different arguments (e.g., a search query or a file path changed). |
| reasoning_drift | The same tool was called with the same parameters, but produced a different output (indicates non-determinism). |
| output_drift | Differences in the final artifacts or files produced by the agent. |
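The drift types form a closed set, so dashboards can safely aggregate over them. The minimal Python sketch below assumes that report entries carry the `type` strings from the table. The `DriftType` enum and the `count_by_type` helper are hypothetical names. An unrecognized type raises `ValueError`, so a new category is never silently miscounted.

```python
from collections import Counter
from enum import Enum


class DriftType(str, Enum):
    """The five drift categories from the Drift Types table (hypothetical model)."""
    PROMPT = "prompt_drift"        # system/user prompt sequence changed
    TOOL = "tool_drift"            # different tools called, or in a different order
    PARAM = "param_drift"          # same tool, different arguments
    REASONING = "reasoning_drift"  # same tool and arguments, different output
    OUTPUT = "output_drift"        # final artifacts differ


def count_by_type(entries: list[dict]) -> Counter:
    """Tally report entries per drift type; unknown type strings raise ValueError."""
    return Counter(DriftType(entry["type"]) for entry in entries)
```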
The Drift Report
JSON Output (Default)
The JSON output provides a structured DriftReport object. This is useful for building custom dashboards or automated "fidelity" checks.
```json
{
  "pack_hash_a": "7a2b9c1d...",
  "pack_hash_b": "4f8e2a3b...",
  "has_drift": true,
  "entries": [
    {
      "type": "prompt_drift",
      "description": "System prompts differ",
      "pack_a": "3d21a...",
      "pack_b": "9f12b..."
    },
    {
      "type": "param_drift",
      "description": "Step 0: read_file called with different parameters",
      "step_index": 0,
      "pack_a": { "path": "readme.md" },
      "pack_b": { "path": "docs/readme.md" }
    }
  ]
}
```
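A fidelity check can be as small as a policy applied to the `entries` list. This sketch relies on the field names from the example report (`has_drift`, `entries`, `type`, `description`). The policy set `ALLOWED_DRIFT` and both function names are hypothetical. It tolerates `reasoning_drift` because that type can stem from environment or LLM non-determinism rather than from a change you made. Adjust the set to your own policy.

```python
import json
import sys

# Hypothetical policy: drift types tolerated without failing the check.
ALLOWED_DRIFT = {"reasoning_drift"}


def disallowed_entries(report: dict, allowed: set[str] = ALLOWED_DRIFT) -> list[dict]:
    """Return report entries whose drift type is not in `allowed`."""
    if not report.get("has_drift"):
        return []
    return [e for e in report.get("entries", []) if e["type"] not in allowed]


def fidelity_exit_code(report_json: str) -> int:
    """Return 0 if only tolerated drift was found, else 1; log offending entries to stderr."""
    report = json.loads(report_json)
    bad = disallowed_entries(report)
    for e in bad:
        print(f"[{e['type']}] {e['description']}", file=sys.stderr)
    return 1 if bad else 0
```

One possible use: save the report with `ctx diff <hash-a> <hash-b> > report.json`, then end a gate script with `sys.exit(fidelity_exit_code(open("report.json").read()))`.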
Human-Readable Output
When using --human, ctx provides a concise summary of the differences:
```text
Comparing 7a2b9c1d... vs 4f8e2a3b...
2 difference(s) found:
1. [prompt_drift] System prompts differ
2. [param_drift] Step 0: read_file called with different parameters
```
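The two output formats carry the same information. The summary can be rebuilt from the JSON report, which is handy for posting results to a pull request or chat channel. This sketch assumes the `--human` layout is exactly the one shown. The real CLI may abbreviate hashes or word the no-drift case differently. `render_human` is a hypothetical helper, not a ctx function.

```python
def render_human(report: dict) -> str:
    """Rebuild the `--human` summary layout from a DriftReport dict (assumed format)."""
    lines = [
        f"Comparing {report['pack_hash_a']} vs {report['pack_hash_b']}",
        f"{len(report['entries'])} difference(s) found:",
    ]
    # Entries are numbered from 1, in report order.
    for i, entry in enumerate(report["entries"], start=1):
        lines.append(f"{i}. [{entry['type']}] {entry['description']}")
    return "\n".join(lines)
```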
Comparison Logic
The diffing engine analyzes the agent's execution in four stages:
- Prompt Alignment: It compares the system prompt and the sequence of user prompts. If one run received different or additional prompts, it flags a prompt_drift.
- Step-by-Step Execution: It iterates through the execution log. If at step $N$ the agent calls a different tool than in the reference pack, it flags a tool_drift. If the tool matches but the arguments differ, it flags a param_drift.
- Output Divergence: Even when the tool and its parameters are identical, external environments or LLM non-determinism can change the tool's output. This is flagged as reasoning_drift.
- Artifact Comparison: Finally, it compares the content hashes of the final outputs (the files or data produced at the end of the run) and flags any mismatch as output_drift.
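The four stages can be sketched in Python over a simplified run model. `Run`, `Step`, and `diff_runs` are hypothetical names, not ctx's internal types. The sketch compares steps by index and keeps going after a divergence. The real engine may align or stop differently, so treat this as a model of the classification rules rather than ctx's implementation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    tool: str
    params: dict[str, Any]
    output: Any


@dataclass
class Run:
    system_prompt: str
    user_prompts: list[str]
    steps: list[Step]
    artifacts: dict[str, str] = field(default_factory=dict)  # path -> content hash


def diff_runs(a: Run, b: Run) -> list[dict[str, Any]]:
    entries: list[dict[str, Any]] = []
    # Stage 1: prompt alignment.
    if a.system_prompt != b.system_prompt:
        entries.append({"type": "prompt_drift", "description": "System prompts differ"})
    if a.user_prompts != b.user_prompts:
        entries.append({"type": "prompt_drift", "description": "User prompt sequences differ"})
    # Stages 2 and 3: walk both execution logs step by step.
    for i in range(max(len(a.steps), len(b.steps))):
        sa = a.steps[i] if i < len(a.steps) else None
        sb = b.steps[i] if i < len(b.steps) else None
        if sa is None or sb is None or sa.tool != sb.tool:
            # Different tool, or one run has a step the other lacks.
            entries.append({"type": "tool_drift", "step_index": i,
                            "description": f"Step {i}: different tool called"})
        elif sa.params != sb.params:
            entries.append({"type": "param_drift", "step_index": i,
                            "description": f"Step {i}: {sa.tool} called with different parameters",
                            "pack_a": sa.params, "pack_b": sb.params})
        elif sa.output != sb.output:
            # Same tool, same parameters, different output.
            entries.append({"type": "reasoning_drift", "step_index": i,
                            "description": f"Step {i}: {sa.tool} returned different output"})
    # Stage 4: compare final artifacts by content hash.
    if a.artifacts != b.artifacts:
        entries.append({"type": "output_drift", "description": "Final artifacts differ"})
    return entries
```

Each step yields at most one entry, and the checks run from coarsest to finest: tool, then parameters, then output. A changed parameter is therefore never also reported as reasoning drift.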
Workflow Example: Debugging a Regression
If an agent successfully completed a task in the past but is now failing, you can use `ctx diff` to isolate the change:
1. Find the hash of the successful run using `ctx log`.
2. Pack the failing run: `ctx pack failure.json`.
3. Compare them: `ctx diff <success-hash> <failure-hash> --human`.
The report shows immediately whether the prompt was modified, whether a tool returned different output for the same call (for example, after a tool version or environment change), or whether the agent chose a different reasoning path at a specific step.
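When a report lists several entries, the earliest divergence is usually the best place to start. Later steps depend on earlier ones, and prompts precede every step. The following triage helper relies on the JSON report fields from the example (`type`, `step_index`, `description`). The function name and its messages are hypothetical.

```python
def triage(report: dict) -> str:
    """Suggest where to start debugging, given a DriftReport dict."""
    entries = report.get("entries", [])
    if not entries:
        return "No drift: the runs match."
    # Prompt changes precede every step, so they are the earliest possible cause.
    if any(e["type"] == "prompt_drift" for e in entries):
        return "Start with the prompts: they changed before any step ran."
    stepped = [e for e in entries if "step_index" in e]
    if stepped:
        first = min(stepped, key=lambda e: e["step_index"])
        return f"First divergence at step {first['step_index']}: [{first['type']}] {first['description']}"
    remaining = ", ".join(sorted({e["type"] for e in entries}))
    return f"No prompt or step-level drift; remaining types: {remaining}"
```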