Replaying Executions

Replay is a core feature of ctx: it re-runs the sequence of actions recorded in a Context Pack. Use it to verify that an agent's work is reproducible, to debug non-deterministic behavior, and to confirm that changes to the environment or to tool definitions have not introduced "drift" into the agent's logic.

Replaying a Pack

To re-execute a recorded agent run, pass a pack hash or a ctx:// URI to the replay command:

ctx replay <hash>

When you run a replay, the substrate:

Loads the manifest: extracts the system prompt, user prompts, and input files.
Executes steps sequentially: iterates through the steps defined in the pack, in order.
Invokes tools: re-calls each tool with the original parameters recorded in the pack.
Compares fidelity: checks the new output of each step against the OutputRef stored in the pack.
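The four stages above can be sketched as a small Python model. This is a minimal illustration under stated assumptions, not ctx's implementation:

- `Step`, `digest`, and `replay` are hypothetical names.
- The steps are assumed to be already loaded from the manifest.
- Tools are plain Python callables.
- OutputRef is assumed to be a SHA-256 digest of the JSON-encoded output. The real pack format may store outputs differently.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    """Hypothetical stand-in for one entry in a pack's Steps array."""
    tool: str
    params: dict[str, Any]
    output_ref: str  # assumption: SHA-256 hex digest of the recorded output


def digest(output: Any) -> str:
    """Hash a JSON-serializable output with sorted keys so dict order does not matter."""
    canonical = json.dumps(output, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def replay(steps: list[Step], tools: dict[str, Callable[..., Any]]) -> list[tuple[int, str, str]]:
    """Re-run each step in order and compare its new output with the recorded OutputRef."""
    results = []
    for index, step in enumerate(steps):
        output = tools[step.tool](**step.params)  # re-call with the recorded parameters
        status = "OK" if digest(output) == step.output_ref else "DIVERGED"
        results.append((index, step.tool, status))
    return results


# Usage with toy tools: search_docs now returns an extra document, so it diverges.
tools = {
    "read_file": lambda path: f"contents of {path}",
    "search_docs": lambda query: ["doc-1", "doc-2"],
}
steps = [
    Step("read_file", {"path": "notes.txt"}, digest("contents of notes.txt")),
    Step("search_docs", {"query": "replay"}, digest(["doc-1"])),
]
for index, tool, status in replay(steps, tools):
    print(f"Step {index}: {tool} [{status}]")
```

This prints `Step 0: read_file [OK]` followed by `Step 1: search_docs [DIVERGED]`. Comparing digests rather than full outputs keeps the recorded pack small. The cost is that a mismatch says only that the output changed, not how it changed.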

Example Output

Replaying Pack: 5f8e2a1b9c3d
Step 0: read_file [OK]
Step 1: search_docs [OK]
Step 2: write_summary [DIVERGED] - Output differs from recorded execution.

Fidelity: DEGRADED
Summary: 2/3 steps matched perfectly. 1 step produced different results.

Understanding Fidelity Levels

After a replay completes, ctx assigns a fidelity level (DEGRADED in the example above) that indicates how closely the re-execution matched the original recording.
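In the example output, one diverged step out of three yields DEGRADED fidelity. The sketch below shows how per-step results might roll up into that summary line.

It assumes, based only on that example, that any divergence is enough to mark a run degraded. The full set of fidelity levels and the exact scoring rules are not described here, so the sketch reports a boolean rather than inventing level names. `summarize` is a hypothetical helper.

```python
def summarize(statuses: list[str]) -> tuple[bool, str]:
    """Return (degraded, summary line) for per-step statuses such as "OK" or "DIVERGED"."""
    total = len(statuses)
    matched = sum(1 for status in statuses if status == "OK")
    diverged = total - matched
    noun = "step" if diverged == 1 else "steps"
    summary = (
        f"{matched}/{total} steps matched perfectly. "
        f"{diverged} {noun} produced different results."
    )
    # Assumption drawn from the example output: any divergence degrades fidelity.
    return diverged > 0, summary


degraded, line = summarize(["OK", "OK", "DIVERGED"])
print("Degraded:", degraded)
print("Summary:", line)
```

For the three statuses above, this reproduces the summary line from the example: `2/3 steps matched perfectly. 1 step produced different results.`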

Integration with CI/CD

Because ctx replay sets its exit code according to execution fidelity, it fits naturally into automated testing pipelines. Use it to ensure that updates to your agent's codebase don't break the reproducibility of known "golden" execution packs.

# Example CI script: treat any non-zero exit from the replay as drift
ctx replay "$GOLDEN_PACK_HASH" || {
  echo "Execution drift detected!" >&2
  exit 1
}

How Replay Handles Steps

The ctx substrate uses the Steps array within the pack manifest to drive the replay. Each step includes:

Type/Tool: The specific function or model call performed.
Parameters: The exact arguments passed to the tool.
Deterministic Flag: A hint to the replayer. If a step is marked Deterministic: true, any divergence in its output during replay is flagged as a significant fidelity loss.
Environment Context: Replays run in your local environment, but ctx compares it with the Environment recorded in the pack (OS, runtime, tool versions) and warns you if your current setup differs significantly from the original.
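The Deterministic flag and the environment check can be modelled with two small functions. This is a sketch with hypothetical names (`Step`, `divergence_severity`, `environment_warnings`), and it makes three assumptions:

- The text does not say how ctx treats divergence on a step that is not marked deterministic, so the "expected-variance" label is a placeholder.
- ctx's threshold for "differs significantly" is not documented here, so the environment check flags any mismatched field.
- The field names `os` and `runtime` are assumed.

```python
import platform
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Step:
    """Hypothetical model of a step; `deterministic` mirrors the Deterministic flag."""
    tool: str
    params: dict[str, Any] = field(default_factory=dict)
    deterministic: bool = False


def divergence_severity(step: Step, diverged: bool) -> Optional[str]:
    """Classify a divergence; a deterministic step should never drift, so drift there is significant."""
    if not diverged:
        return None
    return "significant" if step.deterministic else "expected-variance"  # placeholder label


def current_environment() -> dict[str, str]:
    """Capture a small subset of the local environment (assumed field names)."""
    return {"os": platform.system(), "runtime": f"python {platform.python_version()}"}


def environment_warnings(recorded: dict[str, str], current: dict[str, str]) -> list[str]:
    """List every field whose recorded value differs from the current one."""
    warnings = []
    for key in sorted(recorded.keys() | current.keys()):
        old, new = recorded.get(key), current.get(key)
        if old != new:
            warnings.append(f"{key}: recorded {old!r}, current {new!r}")
    return warnings


recorded = {"os": "Linux", "runtime": "python 3.11.4"}
for warning in environment_warnings(recorded, current_environment()):
    print("warning:", warning)
```

Taking the union of recorded and current keys means a field that exists on only one side is also reported, rather than silently ignored.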

Debugging Divergence

If a replay results in DEGRADED fidelity, use the ctx diff command to compare the original pack with a new pack generated from the replayed run. This shows exactly where the reasoning or tool output diverged.

# Pack the replayed run
ctx pack replayed_log.json

# Compare the original and the replay
ctx diff <original-hash> <replayed-hash> --human
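The ctx diff command performs this comparison for you. To illustrate the idea, the sketch below finds the first step at which two runs part ways.

It works on hypothetical, already-decoded step lists in which each step is a dict with `tool` and `output` keys. This is neither the pack format nor ctx's diff algorithm.

```python
from typing import Any, Optional


def first_divergence(original: list[dict[str, Any]], replayed: list[dict[str, Any]]) -> Optional[int]:
    """Return the index of the first step that differs, or None if the runs match."""
    for index, (before, after) in enumerate(zip(original, replayed)):
        if before.get("tool") != after.get("tool") or before.get("output") != after.get("output"):
            return index
    if len(original) != len(replayed):
        # Every shared step matched, so the first extra (or missing) step is the divergence.
        return min(len(original), len(replayed))
    return None


original = [
    {"tool": "read_file", "output": "contents of notes.txt"},
    {"tool": "search_docs", "output": ["doc-1"]},
    {"tool": "write_summary", "output": "Replay re-runs recorded steps."},
]
replayed = original[:2] + [{"tool": "write_summary", "output": "Replay re-executes a pack."}]
print("First divergent step:", first_divergence(original, replayed))
```

This prints `First divergent step: 2`. The sketch compares the tool name as well as the output. If the agent called a different tool at some step, the later steps are likely shifted as well, so the first mismatch is usually the most useful place to start reading a diff.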