CI/CD & GitHub Actions
Automating the verification and documentation of AI agent runs is critical for maintaining reliability in production. By integrating ctx into your CI/CD pipelines, you can ensure that every agent execution is reproducible and that every change to a prompt or tool is evaluated for drift before it reaches main.
Automated Agent Verification
The primary use case for ctx in a CI environment is to validate that agent behavior remains consistent across code changes. Using the ctx replay command, you can re-execute captured runs and catch regressions automatically.
Replay & Fidelity Checks
When you run ctx replay <hash> in a CI environment, the CLI returns specific exit codes based on the "fidelity" of the run (how closely the replay matched the original execution):
- Exit Code 0: Success. The replay matched the original run perfectly.
- Exit Code 1: Degraded. The agent completed the task, but some non-deterministic steps diverged.
- Exit Code 2: Failed. The agent was unable to complete the steps as recorded.
- name: Verify Agent Fidelity
  run: ctx replay ${{ env.GOLDEN_PACK_HASH }}
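If you want to treat a degraded replay as a warning rather than a hard failure, you can branch on the exit code in a shell step. The `check_fidelity` helper below is illustrative, not part of ctx; only the exit-code meanings (0/1/2) come from the table above:

```shell
#!/bin/sh
# Map a ctx replay exit code to a CI decision:
# 0 = perfect match, 1 = degraded (warn but pass), 2 = failed (block merge).
check_fidelity() {
  case "$1" in
    0) echo "fidelity: exact match" ;;
    1) echo "fidelity: degraded (non-deterministic steps diverged)" ;;
    2) echo "fidelity: failed"; return 1 ;;
    *) echo "fidelity: unknown exit code $1"; return 1 ;;
  esac
}

# In a real pipeline you would capture the actual replay exit code, e.g.:
#   ctx replay "$GOLDEN_PACK_HASH"; check_fidelity $? || exit 1
check_fidelity 1
```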
Detecting Drift in Pull Requests
When a developer modifies a prompt, model parameter, or tool implementation, you can use ctx diff to generate a structured report of how those changes affect the agent's execution path.
Example: Comparative Drift Analysis
You can compare a "Golden Pack" (a known good execution) against a new execution generated in a feature branch:
# Generate a new pack from the current branch execution
NEW_HASH=$(ctx pack current_run.json)
# Compare against the baseline
ctx diff ${{ env.BASELINE_HASH }} $NEW_HASH --human
The diff engine identifies:
- Prompt Drift: Changes in system or user prompts.
- Tool Drift: Changes in which tools were called or the order of calls.
- Reasoning Drift: Divergence in the agent's logic/output at specific steps.
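One convenient way to surface this report to reviewers is to append the human-readable diff to the job summary via GitHub Actions' built-in GITHUB_STEP_SUMMARY file. This step is a sketch (the step name and summary heading are illustrative); it reuses the pack/diff commands shown above:

```yaml
- name: Publish Drift Report
  run: |
    # Pack the current branch execution, then write the diff
    # to the Actions job summary shown on the PR checks page.
    NEW_HASH=$(ctx pack current_run.json)
    {
      echo "## Agent Drift Report"
      ctx diff ${{ env.BASELINE_HASH }} $NEW_HASH --human
    } >> "$GITHUB_STEP_SUMMARY"
```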
GitHub Actions Integration
The following example demonstrates a complete GitHub Action workflow that initializes a context store, packs a new execution log, and verifies it against a baseline.
name: Agent Regression Testing
on: [pull_request]
jobs:
  agent-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install ContextSubstrate
        run: go install github.com/scalefirstai/ctx/cmd/ctx@latest
      - name: Initialize Store
        run: ctx init
      - name: Run Agent Integration Test
        run: ./scripts/run-agent.sh > execution.json
      - name: Pack Execution
        id: pack
        run: |
          HASH=$(ctx pack execution.json)
          echo "pack_hash=$HASH" >> "$GITHUB_OUTPUT"
      - name: Check for Drift
        run: |
          # Fails if there is significant reasoning drift
          ctx diff ${{ vars.GOLDEN_HASH }} ${{ steps.pack.outputs.pack_hash }} --human
      - name: Verify Provenance
        run: ctx verify ./path/to/artifact.json
Artifact Provenance in CI
For teams shipping AI-generated artifacts (e.g., generated code, documentation, or data summaries), use ctx verify to ensure that the artifact in the build matches the recorded execution in your context store.
This prevents "phantom" changes—where an artifact is modified manually after the agent generated it—by validating the sidecar metadata against the immutable context pack.
# In your release pipeline
ctx verify ./generated_assets/summary.txt
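In a GitHub Actions release job, this check might look like the following step. The job name and the loop over the assets directory are illustrative; only the `ctx verify` invocation comes from above:

```yaml
release:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Verify artifact provenance before publishing
      run: |
        # Abort the release if any artifact no longer matches
        # the execution recorded in the context store.
        for f in ./generated_assets/*; do
          ctx verify "$f"
        done
```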
Best Practices for CI
- Golden Packs: Store your "Golden" pack hashes in GitHub Actions Secrets or Variables. Update them only when a behavior change is intentional and approved.
- Deterministic Testing: While ctx handles non-deterministic steps, try to use a temperature: 0 setting during CI runs to minimize noise in reasoning drift reports.
- Persistence: The .ctx/ directory acts as a local database. In CI, you may want to cache the .ctx/objects directory to speed up comparisons against historical packs.
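Caching the object store between runs can be sketched with the standard actions/cache action. The cache key below is an assumption; key on whatever should invalidate your stored packs:

```yaml
- name: Cache context store objects
  uses: actions/cache@v4
  with:
    path: .ctx/objects
    key: ctx-objects-${{ github.ref_name }}
    restore-keys: |
      ctx-objects-
```

Place this step before ctx init so previously packed objects are available when the diff and replay steps run.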