Journey Verification
Journey verification is the core verification mechanism. An AI exercises the system as a real user would, and the trace shows what actually happened.
What a Journey Is
A journey is the arc a real actor travels to reach user-meaningful value. One actor, one outcome, a sequence of steps. Not a test, not a spec, but a complete user arc verified against the running system.
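As a rough illustration, a journey can be written down as a small data structure: one actor, one outcome, an ordered sequence of steps. The class names, field names, and the example journey below are hypothetical, chosen only for this sketch, and do not describe a prescribed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One thing the actor does (hypothetical shape, for illustration)."""
    action: str    # what the actor does, in plain language
    expected: str  # what the actor should observe afterwards


@dataclass(frozen=True)
class Journey:
    """One actor, one outcome, a sequence of steps."""
    actor: str
    outcome: str
    steps: tuple[Step, ...]


# An invented example journey; the domain is an assumption.
journey = Journey(
    actor="new customer",
    outcome="first order is placed and confirmed",
    steps=(
        Step("create an account", "account exists and is signed in"),
        Step("add an item to the cart", "cart shows one item"),
        Step("check out", "order confirmation is returned"),
    ),
)

print(f"{journey.actor}: {len(journey.steps)} steps to '{journey.outcome}'")
```

Keeping exactly one actor and one outcome per journey is what makes the later evaluation unambiguous: every step either moves that actor toward that outcome or it does not.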
The Walk Cycle
- Design - write the journey: actor, goal, steps, evaluation criteria.
- Pre-walk - check tool inventory, inspect schemas, verify preconditions.
- Walk - AI calls the surface step by step. After each call, analyze the trace.
- Evaluate - check against criteria: technical, security, business, surface quality.
- Fix - implement approved fixes.
- Re-walk - verify the fix. Repeat until zero violations.
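The walk, evaluate, fix, and re-walk steps above can be sketched as a loop that stops only when evaluation reports zero violations. This is a minimal sketch under stated assumptions: `walk`, `evaluate`, and `fix` are hypothetical callables supplied by the caller, the trace shape is invented, and the `max_rounds` guard is an addition for safety rather than part of the cycle as described.

```python
from typing import Callable

Trace = dict      # hypothetical: one trace record per surface call
Violation = str   # hypothetical: a human-readable violation message


def walk_until_clean(
    walk: Callable[[], list[Trace]],
    evaluate: Callable[[list[Trace]], list[Violation]],
    fix: Callable[[list[Violation]], None],
    max_rounds: int = 5,
) -> list[list[Violation]]:
    """Walk, evaluate, fix, and re-walk until a walk yields zero violations.

    Returns the violations found in each round; the last entry is empty.
    """
    history: list[list[Violation]] = []
    for _ in range(max_rounds):
        traces = walk()                  # Walk: call the surface step by step
        violations = evaluate(traces)    # Evaluate: check traces against criteria
        history.append(violations)
        if not violations:
            return history               # zero violations: the journey passes
        fix(violations)                  # Fix: stands in for approved fixes only
    raise RuntimeError(f"still violating after {max_rounds} rounds")


# Toy system for demonstration: the first walk issues too many DB queries.
state = {"db_queries": 14}


def walk() -> list[Trace]:
    return [{"step": "list orders", "db_queries": state["db_queries"]}]


def evaluate(traces: list[Trace]) -> list[Violation]:
    return [
        f"{t['step']}: {t['db_queries']} DB queries exceeds budget of 5"
        for t in traces
        if t["db_queries"] > 5
    ]


def fix(violations: list[Violation]) -> None:
    state["db_queries"] = 4  # pretend an approved fix batched the queries


history = walk_until_clean(walk, evaluate, fix)
print(len(history), history[-1])
```

In practice the fix step involves human approval, so a real loop would pause there instead of calling a function directly.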
Black Box vs White Box
Black box: AI sees only the callable surface, public docs, and traces. No source code. Tests what a real client experiences.
White box: AI sees everything, including source, docs, tests, and config. Used after black box walks for root-cause analysis and implementation.
The same system needs both perspectives, but in separate walks. Mixing them produces biased results.
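One way to keep the two perspectives apart is to make the visible artifacts an explicit property of the walk mode. The sketch below is illustrative only: the artifact names are labels invented for this example, mapped from the descriptions above, and `can_see` is a hypothetical helper.

```python
from enum import Enum


class Mode(Enum):
    BLACK_BOX = "black_box"
    WHITE_BOX = "white_box"


# Hypothetical artifact labels, derived from the two descriptions above.
BLACK_BOX_ARTIFACTS = frozenset({"surface", "public_docs", "traces"})
WHITE_BOX_ARTIFACTS = BLACK_BOX_ARTIFACTS | {"source", "tests", "config"}

VISIBLE = {
    Mode.BLACK_BOX: BLACK_BOX_ARTIFACTS,
    Mode.WHITE_BOX: WHITE_BOX_ARTIFACTS,
}


def can_see(mode: Mode, artifact: str) -> bool:
    """Return True if a walk in this mode may read the given artifact."""
    return artifact in VISIBLE[mode]


print(can_see(Mode.BLACK_BOX, "source"))  # black box walks never read source
print(can_see(Mode.WHITE_BOX, "source"))
```

Fixing the mode per walk, not per step, means a black box walk cannot quietly pick up source-level knowledge partway through.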
Evaluation Criteria
Criteria are defined by humans and have binary outcomes: violated or not. There is no subjective AI scoring. For example, if a criterion sets the DB query budget at 5 and the trace shows 14 queries, that is a violation. No interpretation is required.
Criteria are organized by scope: technical, security, business, surface quality.
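The DB query budget example can be expressed as a check that returns only true or false. This is a minimal sketch: the `BudgetCriterion` class, its field names, and the `db_queries` trace key are assumptions made for illustration, not a defined format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetCriterion:
    """A human-defined limit on one numeric value read from a trace."""
    name: str
    scope: str   # one of: technical, security, business, surface quality
    metric: str  # hypothetical key under which the trace reports the value
    budget: int  # the limit, set by a human

    def violated(self, trace: dict) -> bool:
        # Binary outcome: over budget is a violation, anything else is not.
        return trace[self.metric] > self.budget


db_budget = BudgetCriterion(
    name="DB query budget",
    scope="technical",
    metric="db_queries",
    budget=5,
)

print(db_budget.violated({"db_queries": 14}))  # True: 14 exceeds the budget of 5
print(db_budget.violated({"db_queries": 5}))   # False: exactly at budget
```

Because the check compares a number from the trace against a number a human chose, two evaluators looking at the same trace always reach the same verdict.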