Glossary
Layer - one of six maturity levels (0-5), adopted incrementally. Each layer addresses a gap the previous layer cannot fill.
Journey - a defined sequence of steps that verifies a user-facing flow through the callable surface. Not a test case but a verification arc that produces observable traces.
Walk - executing a journey against a live system. An AI agent follows journey steps through the callable surface, collecting responses and trace IDs.
Walk Report - structured output of a walk. Pass/fail for each step, findings, trace IDs, and an overall result. Evidence, not opinions.
Callable Surface - the interface exercised during a walk: an MCP server, REST API, gRPC service, CLI, or any other programmatically invocable interface. The callable surface is what makes journeys walkable by AI agents.
Evaluation Criteria - binary checks applied to a walk report. PASS or FAIL, no partial credit. Organized into four scopes: functional, performance, security, and observability.
Budget - a numeric threshold for a performance criterion. Example: “DB query count must be less than 5.” Compared against actual trace values.
Violation - a criterion that fails, with evidence: a metric value, a trace ID, a response body. A measured fact, not an opinion.
Finding - an observation made during a walk that is noteworthy but may not be a violation, such as unexpected behavior, a quality concern, or an issue outside the defined criteria.
Trace - a distributed trace capturing the internal execution path of a system operation. Produced by OpenTelemetry instrumentation. In this methodology, traces are verification evidence, not monitoring data.
Black Box - a walk mode where the walker sees only callable surface responses. No traces, no internal state. Evaluates the system as a user would.
White Box - a walk mode where the walker also has access to distributed traces. Enables performance evaluation, duplicate detection, and internal behavior verification.
Bias Prevention - the practice of performing black box evaluation before white box evaluation. If the walker sees traces first, it may explain away functional failures using internal context.
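A minimal sketch of how several of these terms might fit together in code: a walk report made of step results, a budget compared against trace values, violations that carry evidence, and an evaluation that runs the black box pass before the white box pass. All names here (StepResult, WalkReport, Budget, Violation, evaluate, db_query_count) are hypothetical and not a schema the methodology prescribes. It also assumes trace metrics have already been extracted from OpenTelemetry spans into a plain dict keyed by trace ID.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StepResult:
    # One journey step as recorded in a walk report (hypothetical shape).
    name: str
    passed: bool
    response: str                   # response body from the callable surface
    trace_id: Optional[str] = None


@dataclass
class WalkReport:
    journey: str
    steps: list[StepResult]
    findings: list[str] = field(default_factory=list)  # noted, but never affect PASS/FAIL

    @property
    def result(self) -> str:
        # Overall functional result: every step must pass, no partial credit.
        return "PASS" if all(s.passed for s in self.steps) else "FAIL"


@dataclass
class Violation:
    criterion: str
    evidence: str                   # a metric value, trace ID, or response body


@dataclass
class Budget:
    # Numeric threshold for a performance criterion, read as "metric must be less than limit".
    metric: str
    limit: float

    def check(self, trace_metrics: dict[str, dict[str, float]]) -> list[Violation]:
        violations = []
        for trace_id, metrics in trace_metrics.items():
            value = metrics.get(self.metric)
            if value is not None and value >= self.limit:
                violations.append(Violation(
                    f"{self.metric} < {self.limit}",
                    f"trace {trace_id}: {self.metric}={value}",
                ))
        return violations


def evaluate(report: WalkReport, budgets: list[Budget],
             trace_metrics: dict[str, dict[str, float]]) -> tuple[str, list[Violation]]:
    violations: list[Violation] = []
    # Black box pass first: judge only callable-surface responses; traces are not read here.
    for step in report.steps:
        if not step.passed:
            violations.append(Violation(f"functional: {step.name}", f"response: {step.response}"))
    # White box pass second: only now compare trace-derived values against budgets.
    for budget in budgets:
        violations.extend(budget.check(trace_metrics))
    return ("PASS" if not violations else "FAIL"), violations


# Usage: both steps pass functionally, but one trace exceeds the query budget.
report = WalkReport("checkout", [
    StepResult("add_to_cart", True, '{"ok": true}', "t1"),
    StepResult("pay", True, '{"ok": true}', "t2"),
])
metrics = {"t1": {"db_query_count": 3}, "t2": {"db_query_count": 7}}
verdict, violations = evaluate(report, [Budget("db_query_count", 5)], metrics)
print(verdict)
for v in violations:
    print(v.criterion, "|", v.evidence)
```

Keeping the two passes as separate loops, with the trace dict untouched until the second one, mirrors the bias prevention rule: functional verdicts are recorded before any internal context is consulted.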