Drift Detection & Diffing

A recurring challenge in AI agent development is understanding why two runs produced different results. ctx includes a diffing engine built for agentic workflows that identifies "drift" across prompts, tool usage, parameters, and reasoning.

The ctx diff command compares two immutable Context Packs and generates a structured report of their differences.

Basic Usage

To compare two execution runs, provide their hashes (or short hex prefixes) to the diff command:

ctx diff <hash-a> <hash-b>

By default, this outputs a machine-readable JSON report, making it ideal for CI/CD pipelines or automated regression testing. For a summary formatted for terminal viewing, use the --human flag:

ctx diff <hash-a> <hash-b> --human
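Because the default output is JSON, a CI job can gate on it directly. The following is a minimal sketch, not part of ctx itself: the function name gate and the idea of piping ctx diff into a script are assumptions for illustration. It relies only on the has_drift, entries, type, and description fields shown in the sample report on this page.

```python
import json
import sys


def gate(report_text: str) -> int:
    """Return a process exit code: 0 when the report shows no drift, 1 otherwise."""
    report = json.loads(report_text)
    if not report.get("has_drift", False):
        return 0
    # Print one line per entry to stderr so the CI log shows what changed.
    for entry in report.get("entries", []):
        print(f"{entry['type']}: {entry['description']}", file=sys.stderr)
    return 1


# Assumed pipeline usage (hypothetical script name drift_gate.py):
#   ctx diff <hash-a> <hash-b> | python drift_gate.py
# with the script ending in: sys.exit(gate(sys.stdin.read()))
```

Returning the exit code from a function, rather than calling sys.exit inside it, keeps the check easy to unit test.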

Understanding Drift Types

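The diffing engine categorizes differences into specific "Drift Types" to help you pinpoint the source of divergence. The types referenced on this page include:

prompt_drift: The system prompt or the sequence of user prompts differs between the two packs.
tool_drift: At a given step, the agent called a different tool.
param_drift: The same tool was called at a given step, but with different arguments.
reasoning_drift: The tool and its arguments match, but the tool output differs.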

The Drift Report

JSON Output (Default)

The JSON output provides a structured DriftReport object. This is useful for building custom dashboards or automated "fidelity" checks.

{
  "pack_hash_a": "7a2b9c1d...",
  "pack_hash_b": "4f8e2a3b...",
  "has_drift": true,
  "entries": [
    {
      "type": "prompt_drift",
      "description": "System prompts differ",
      "pack_a": "3d21a...",
      "pack_b": "9f12b..."
    },
    {
      "type": "param_drift",
      "description": "Step 0: read_file called with different parameters",
      "step_index": 0,
      "pack_a": { "path": "readme.md" },
      "pack_b": { "path": "docs/readme.md" }
    }
  ]
}
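For dashboards or fidelity checks, it can help to load the report into typed objects. The sketch below assumes the report has exactly the shape of the sample above; the class names DriftEntry and DriftReport and the function parse_report are illustrative and are not an API provided by ctx. step_index is treated as optional because the prompt_drift entry in the sample does not carry one.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class DriftEntry:
    type: str
    description: str
    pack_a: Any  # a hash string or a parameter object, depending on the entry
    pack_b: Any
    step_index: Optional[int] = None  # missing on entries not tied to a step


@dataclass
class DriftReport:
    pack_hash_a: str
    pack_hash_b: str
    has_drift: bool
    entries: List[DriftEntry]


def parse_report(text: str) -> DriftReport:
    """Parse the JSON text of a drift report into dataclasses."""
    raw = json.loads(text)
    entries = [
        DriftEntry(
            type=e["type"],
            description=e["description"],
            pack_a=e.get("pack_a"),
            pack_b=e.get("pack_b"),
            step_index=e.get("step_index"),
        )
        for e in raw.get("entries", [])
    ]
    return DriftReport(raw["pack_hash_a"], raw["pack_hash_b"], raw["has_drift"], entries)
```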

Human-Readable Output

When using --human, ctx provides a concise summary of the differences:

Comparing 7a2b9c1d... vs 4f8e2a3b...

2 difference(s) found:

  1. [prompt_drift] System prompts differ
  2. [param_drift] Step 0: read_file called with different parameters

Comparison Logic

The diffing engine performs a multi-stage analysis of the agent's execution:

Prompt Alignment: It compares the system prompt and the sequence of user prompts. If the agent received extra prompts in one run, it flags the mismatch.
Step-by-Step Execution: It iterates through the execution log. If at step N the agent calls a different tool than in the reference pack, it flags a tool_drift. If the tool matches but the arguments differ, it flags a param_drift.
Output Divergence: Even if the inputs (parameters) are identical, external environments or LLM non-determinism can cause the tool output to change. This is flagged as reasoning_drift.
Artifact Comparison: Finally, it compares the content hashes of the terminal outputs (the files or data produced at the end of the run).
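The stages above can be sketched roughly as follows. This is an illustration of the described logic, not ctx's implementation: the pack layout (system_prompt, user_prompts, steps, artifacts) and the function name diff_packs are assumptions, and artifact_drift is a placeholder label because this page does not name the type used for artifact differences.

```python
from typing import Any, Dict, List


def diff_packs(a: Dict[str, Any], b: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Compare two hypothetical pack dicts and return a list of drift entries."""
    entries: List[Dict[str, Any]] = []

    # Stage 1: prompt alignment (system prompt, then the user prompt sequence).
    if a["system_prompt"] != b["system_prompt"]:
        entries.append({"type": "prompt_drift", "description": "System prompts differ"})
    if a["user_prompts"] != b["user_prompts"]:
        entries.append({"type": "prompt_drift", "description": "User prompt sequences differ"})

    # Stages 2 and 3: walk both execution logs in lockstep.
    # zip stops at the shorter log, so extra steps in one run are ignored here.
    for i, (sa, sb) in enumerate(zip(a["steps"], b["steps"])):
        if sa["tool"] != sb["tool"]:
            kind, what = "tool_drift", f"{sa['tool']} vs {sb['tool']}"
        elif sa["params"] != sb["params"]:
            kind, what = "param_drift", f"{sa['tool']} called with different parameters"
        elif sa["output"] != sb["output"]:
            kind, what = "reasoning_drift", f"{sa['tool']} returned different output"
        else:
            continue
        entries.append({"type": kind, "step_index": i, "description": f"Step {i}: {what}"})

    # Stage 4: compare content hashes of the final artifacts (name -> hash).
    if a["artifacts"] != b["artifacts"]:
        entries.append({"type": "artifact_drift", "description": "Artifact hashes differ"})

    return entries
```

Each step yields at most one entry: the checks run from most to least fundamental, so a different tool is reported as tool_drift without also comparing its parameters or output.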

Workflow Example: Debugging a Regression

If an agent successfully completed a task in the past but is now failing, you can use diff to isolate the change:

Find the hash of the successful run using ctx log.
Pack the failing run: ctx pack failure.json.
Compare them: ctx diff <success-hash> <failure-hash> --human.

The report shows whether the prompt was modified, whether the agent called a different tool or passed different parameters at a specific step, or whether a tool's output changed despite identical inputs (for example, because of an environment change or LLM non-determinism).
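When a report has many entries, the earliest divergent step is often the most useful starting point, since later differences may follow from it. The helper below is a small sketch under that assumption. The name first_divergent_step is hypothetical, and it expects the JSON report already loaded into a dict with the entries and step_index fields shown in the sample report.

```python
from typing import Any, Dict, Optional


def first_divergent_step(report: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the entry with the lowest step_index, or None if no entry has one."""
    # Entries without a step_index (such as prompt_drift in the sample) are skipped.
    stepped = [e for e in report.get("entries", []) if "step_index" in e]
    return min(stepped, key=lambda e: e["step_index"], default=None)
```

Entries without a step_index are left out here, so check for prompt differences separately before looking at individual steps.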