CI/CD & GitHub Actions
Automating the verification and documentation of AI agent runs is critical for maintaining reliability in production. By integrating ctx into your CI/CD pipelines, you can ensure that every agent execution is reproducible and that every change to a prompt or tool is evaluated for drift before it reaches main.
Automated Agent Verification
The primary use case for ctx in a CI environment is to validate that agent behavior remains consistent across code changes. Using the ctx replay command, you can re-execute captured runs and catch regressions automatically.
Replay & Fidelity Checks
When you run ctx replay <hash> in a CI environment, the CLI returns specific exit codes based on the "fidelity" of the run (how closely the replay matched the original execution):
- Exit Code 0: Success. The replay matched the original run perfectly.
- Exit Code 1: Degraded. The agent completed the task, but some non-deterministic steps diverged.
- Exit Code 2: Failed. The agent was unable to complete the steps as recorded.
- name: Verify Agent Fidelity
  run: ctx replay ${{ env.GOLDEN_PACK_HASH }}
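If you want to treat a degraded replay as a warning rather than a hard failure, you can branch on the exit code in a shell step. The `check_fidelity` helper below is illustrative, not part of ctx; only the exit-code meanings (0/1/2) come from the table above:

```shell
#!/bin/sh
# Map a ctx replay exit code to a CI decision:
# 0 = perfect match, 1 = degraded (warn but pass), 2 = failed (block merge).
check_fidelity() {
  case "$1" in
    0) echo "fidelity: exact match" ;;
    1) echo "fidelity: degraded (non-deterministic steps diverged)" ;;
    2) echo "fidelity: failed"; return 1 ;;
    *) echo "fidelity: unknown exit code $1"; return 1 ;;
  esac
}

# In a real pipeline you would capture the actual replay exit code, e.g.:
#   ctx replay "$GOLDEN_PACK_HASH"; check_fidelity $? || exit 1
check_fidelity 1
```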
Detecting Drift in Pull Requests
When a developer modifies a prompt, model parameter, or tool implementation, you can use ctx diff to generate a structured report of how those changes affect the agent's execution path.
Example: Comparative Drift Analysis
You can compare a "Golden Pack" (a known good execution) against a new execution generated in a feature branch:
# Generate a new pack from the current branch execution
NEW_HASH=$(ctx pack current_run.json)
# Compare against the baseline
ctx diff ${{ env.BASELINE_HASH }} $NEW_HASH --human
The diff engine identifies:
- Prompt Drift: Changes in system or user prompts.
- Tool Drift: Changes in which tools were called or the order of calls.
- Reasoning Drift: Divergence in the agent's logic/output at specific steps.
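One convenient way to surface this report to reviewers is to append the human-readable diff to the job summary via GitHub Actions' built-in GITHUB_STEP_SUMMARY file. This step is a sketch (the step name and summary heading are illustrative); it reuses the pack/diff commands shown above:

```yaml
- name: Publish Drift Report
  run: |
    # Pack the current branch execution, then write the diff
    # to the Actions job summary shown on the PR checks page.
    NEW_HASH=$(ctx pack current_run.json)
    {
      echo "## Agent Drift Report"
      ctx diff ${{ env.BASELINE_HASH }} $NEW_HASH --human
    } >> "$GITHUB_STEP_SUMMARY"
```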
GitHub Actions Integration
The following example demonstrates a complete GitHub Action workflow that initializes a context store, packs a new execution log, and verifies it against a baseline.
name: Agent Regression Testing
on: [pull_request]
jobs:
  agent-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install ContextSubstrate
        run: go install github.com/scalefirstai/ctx/cmd/ctx@latest
      - name: Initialize Store
        run: ctx init
      - name: Run Agent Integration Test
        run: ./scripts/run-agent.sh > execution.json
      - name: Pack Execution
        id: pack
        run: |
          HASH=$(ctx pack execution.json)
          echo "pack_hash=$HASH" >> "$GITHUB_OUTPUT"
      - name: Check for Drift
        run: |
          # Fails if there is significant reasoning drift
          ctx diff ${{ vars.GOLDEN_HASH }} ${{ steps.pack.outputs.pack_hash }} --human
      - name: Verify Provenance
        run: ctx verify ./path/to/artifact.json
Artifact Provenance in CI
For teams shipping AI-generated artifacts (e.g., generated code, documentation, or data summaries), use ctx verify to ensure that the artifact in the build matches the recorded execution in your context store.
This prevents "phantom" changes—where an artifact is modified manually after the agent generated it—by validating the sidecar metadata against the immutable context pack.
# In your release pipeline
ctx verify ./generated_assets/summary.txt
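In a GitHub Actions release job, this check might look like the following step. The job name and the loop over the assets directory are illustrative; only the `ctx verify` invocation comes from above:

```yaml
release:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - name: Verify artifact provenance before publishing
      run: |
        # Abort the release if any artifact no longer matches
        # the execution recorded in the context store.
        for f in ./generated_assets/*; do
          ctx verify "$f"
        done
```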
Best Practices for CI
- Golden Packs: Store your "Golden" pack hashes in GitHub Actions Secrets or Variables. Update them only when a behavior change is intentional and approved.
- Deterministic Testing: While ctx handles non-deterministic steps, try to use a temperature: 0 setting during CI runs to minimize noise in reasoning drift reports.
- Persistence: The .ctx/ directory acts as a local database. In CI, you may want to cache the .ctx/objects directory to speed up comparisons against historical packs.
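Caching the object store between runs can be sketched with the standard actions/cache action. The cache key below is an assumption; key on whatever should invalidate your stored packs:

```yaml
- name: Cache context store objects
  uses: actions/cache@v4
  with:
    path: .ctx/objects
    key: ctx-objects-${{ github.ref_name }}
    restore-keys: |
      ctx-objects-
```

Place this step before ctx init so previously packed objects are available when the diff and replay steps run.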