Open Problems
Honest about what’s unsolved. Contributions welcome.
Automated Regression
A journey that passes today may break tomorrow, and no automated re-walk mechanism exists yet. Solving this needs dependency-graph mapping, CI-integrated walks, and a headless walk runner.
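The dependency-graph piece could be as simple as each journey step declaring which source files it exercises, so a CI hook can intersect that with the change set. A minimal sketch; the step names, file paths, and the `STEP_DEPENDENCIES` table are invented for illustration:

```python
# Hypothetical sketch: map changed files to journey steps that must be
# re-walked. All step names and paths below are placeholders.

# Each journey step declares the source files it exercises.
STEP_DEPENDENCIES = {
    "signup/submit-form": {"app/auth.py", "app/models/user.py"},
    "signup/verify-email": {"app/auth.py", "app/mailer.py"},
    "billing/add-card": {"app/billing.py"},
}

def steps_to_rewalk(changed_files):
    """Return the journey steps whose dependencies intersect the change set."""
    changed = set(changed_files)
    return sorted(
        step for step, deps in STEP_DEPENDENCIES.items()
        if deps & changed
    )
```

A change to `app/auth.py` would flag both signup steps and leave billing alone, which is exactly the incremental re-walk behavior described under Scale.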
Adversarial Journeys
Current journeys are “actor tries to succeed.” Missing: authorization bypass attempts, cross-tenant data access, role escalation, input injection. The walk procedure supports failure-mode testing per step, but dedicated adversarial journeys with systematic threat modeling would be more thorough.
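One way to make this systematic: mechanically expand each happy-path step into adversarial variants, one per attack category listed above. A hedged sketch; the variant names and `::` naming convention are assumptions, not an existing scheme:

```python
# Hypothetical sketch: derive adversarial variants from a happy-path step.
# Categories mirror the gaps listed above; names are invented.

ADVERSARIAL_VARIANTS = [
    "authorization-bypass",   # call the endpoint as an actor lacking the role
    "cross-tenant-access",    # swap in another tenant's resource id
    "role-escalation",        # attempt an admin-only action as a member
    "input-injection",        # send SQL/script payloads in free-text fields
]

def adversarial_steps(step_name):
    """Expand one happy-path step into its adversarial counterparts."""
    return [f"{step_name}::{variant}" for variant in ADVERSARIAL_VARIANTS]
```

Generating variants this way guarantees coverage of every category for every step, rather than relying on ad hoc per-step failure modes.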
Performance Budgets
Trace analysis currently uses default budgets (5 DB queries for CRUD, 500 ms response time). No per-operation baselines exist. Adding baselines requires measuring current performance and defining acceptable thresholds per journey step.
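The check itself is mechanical once baselines exist: look up a per-operation budget, fall back to the defaults, and report violations. A sketch under assumed field names; the `BASELINES` numbers are placeholders, since real ones would come from measurement:

```python
# Hypothetical sketch: check a step's trace against per-operation budgets,
# falling back to the defaults stated above (5 queries, 500 ms).

DEFAULT_BUDGET = {"max_db_queries": 5, "max_response_ms": 500}

# Per-operation baselines would be measured, not guessed; placeholders only.
BASELINES = {
    "search/full-text": {"max_db_queries": 12, "max_response_ms": 1200},
}

def check_budget(operation, db_queries, response_ms):
    """Return a list of human-readable budget violations (empty = pass)."""
    budget = BASELINES.get(operation, DEFAULT_BUDGET)
    violations = []
    if db_queries > budget["max_db_queries"]:
        violations.append(f"db_queries {db_queries} > {budget['max_db_queries']}")
    if response_ms > budget["max_response_ms"]:
        violations.append(f"response_ms {response_ms} > {budget['max_response_ms']}")
    return violations
```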
Scale
At 8 journeys, manual walks work. At 50, they don’t. Scaling requires automated execution, prioritization, and incremental re-walks (only steps affected by code changes).
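Prioritization, for example, could rank journeys by risk and by how recently their code changed, then take however many fit the walk budget. A minimal sketch with invented fields (`risk`, `days_since_change`); the scoring rule is an assumption, not a defined policy:

```python
# Hypothetical sketch: order journeys for re-walking when there is no time
# to walk all of them. Fields and scoring are illustrative.

def prioritize(journeys, budget):
    """Walk the highest-risk, most recently touched journeys first."""
    ranked = sorted(
        journeys,
        key=lambda j: (j["risk"], -j["days_since_change"]),
        reverse=True,
    )
    return [j["name"] for j in ranked[:budget]]
```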
Walk Data Format
Walk transcripts are AI-generated markdown. Future direction: structured JSON that can be consumed by a test framework for automated evaluation. This is a prerequisite for CI-integrated walks.
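To make this concrete, one possible transcript shape: binary criteria per step, serializable for a CI runner. The field names are a proposal, not an existing schema:

```python
# Hypothetical sketch of a structured walk transcript, replacing free-form
# markdown. Field names are proposed, not an existing format.
import json

transcript = {
    "journey": "signup",
    "model": "example-model-v1",
    "steps": [
        {
            "name": "submit-form",
            "criteria": [
                {"id": "returns-201", "passed": True},
                {"id": "sends-welcome-email", "passed": False},
            ],
        }
    ],
}

def step_passed(step):
    """A step passes only if every binary criterion passed."""
    return all(c["passed"] for c in step["criteria"])

serialized = json.dumps(transcript)  # what a CI runner would ingest
```

Because the criteria are binary, a test framework can evaluate a transcript without interpreting prose, which is what CI integration needs.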
Cross-Model Consistency
Unknown whether different AI models walking the same journey with the same criteria produce comparable results. Binary criteria should help, but trace analysis may vary.
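Binary criteria at least make disagreement measurable: run two models over the same journey and diff their verdicts per criterion. A sketch assuming each walk reduces to a criterion-id → pass/fail map (an assumed shape):

```python
# Hypothetical sketch: compare two models' walks of the same journey by
# their binary-criteria verdicts. The dict shape is assumed.

def disagreements(walk_a, walk_b):
    """Return criterion ids where the two models' verdicts differ."""
    return sorted(
        cid for cid in walk_a
        if cid in walk_b and walk_a[cid] != walk_b[cid]
    )
```

An empty result on every shared journey would be evidence that criteria are model-independent; a nonempty one pinpoints where trace analysis varies.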
Methodology Observability
No meta-metrics for “is the methodology working?” Candidates: issues found per walk, re-walk count before passing, criteria violations per scope over time, time from walk failure to walk pass.
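The candidate metrics reduce to simple aggregates over walk records. A sketch with invented record fields (`issues_found`, `rewalks_before_pass`):

```python
# Hypothetical sketch: compute candidate meta-metrics from walk records.
# Record fields are invented for illustration.

def meta_metrics(walks):
    """Aggregate per-walk records into methodology-health numbers."""
    issues = sum(w["issues_found"] for w in walks)
    rewalks = [w["rewalks_before_pass"] for w in walks]
    return {
        "issues_per_walk": issues / len(walks),
        "avg_rewalks_before_pass": sum(rewalks) / len(rewalks),
    }
```

Tracked over time, a falling issues-per-walk with a stable re-walk count would suggest the methodology is converging rather than churning.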
Human Bottleneck at Scale
The walk procedure requires human review for every fix. At scale, this needs an auto-approve path for low-risk fixes and mandatory review for high-risk changes.
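Even a crude routing rule would cut the review load. A sketch; the risk-path prefixes and diff-size threshold are invented placeholders, and a real policy would be tuned per codebase:

```python
# Hypothetical sketch: route fixes to auto-approve or human review.
# Prefixes and the 50-line threshold are invented placeholders.

HIGH_RISK_PREFIXES = ("app/auth", "app/billing", "migrations/")

def requires_review(changed_files, lines_changed):
    """High-risk paths or large diffs always go to a human."""
    if lines_changed > 50:
        return True
    return any(f.startswith(HIGH_RISK_PREFIXES) for f in changed_files)
```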
Callable Surface Parity
If the system exposes multiple surfaces (REST, MCP, gRPC), nothing verifies that they behave identically. A walk through MCP may pass while the same operation through REST fails, or fails in a different way.
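A parity check could run the same logical operation through each surface and diff the normalized results against a baseline. A sketch; the callable-per-surface shape and normalization step are assumptions:

```python
# Hypothetical sketch: run one logical operation through every surface and
# report surfaces whose normalized result diverges from the first one.

def check_parity(operation, surfaces):
    """surfaces maps surface name -> callable returning a normalized result."""
    results = {name: call(operation) for name, call in surfaces.items()}
    baseline = next(iter(results.values()))
    return {name: r for name, r in results.items() if r != baseline}
```

The hard part in practice is normalization: each surface wraps results in its own envelope, so the callables must strip transport details before comparison.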