Without an internal sceptic, the AI proposes and the human approves — and the only friction in the system is the human's vigilance.
The Challenger moves friction into the AI, where it is cheap.
Before requesting G2, the AI must produce four sections
Concrete failure modes — not generic ("schedule slip"). Each names: what breaks, cheapest signal, what we would do.
≥2 genuinely different alternatives. For each: 3–5-line sketch, why we rejected it, the condition under which we'd re-open it.
A first-person paragraph stating, as charitably as possible, the case against the chosen approach. If it reads like a strawman, it has failed.
One paragraph: vision fit, decision continuity with prior ADRs, and is the user problem measured or hypothetical?
Concrete vs. theatre
Failure 1 — Scene graph traversal becomes O(n²) on undo
The proposed undo strategy clones the parent chain on every node delete. With 5k nodes (already in our test scene) this is observable as a 200ms hitch. Cheapest signal: a single perf trace on the existing 5k-node sample. Mitigation: structural sharing of the parent chain via persistent data structure.
Counter-argument
The strongest case against this plan is that nobody has ever asked us for undo on the scene graph — it is a hypothesis about user pain, not a measurement. We are spending a sprint on a feature whose absence has produced exactly one support ticket in 18 months.
Failure 1 — Schedule slip
Sometimes development takes longer than expected.
Alternative A — Use a different library
We could use library X instead, but we already chose ours.
Counter-argument
Some people might think this is too complicated.
All three sections are generic and could apply to any plan. The Challenger has not engaged with the specific proposal.
| Phase | When the Challenger runs | Trigger |
|---|---|---|
| Exploration | Mandatory before G2 | Plan transitions draft → proposed |
| Production | Conditional | Human asks to expand scope mid-implementation |
| Context | Optional | Risks-template suspiciously thin (<3 risks, all rated low) |
| Not | Reason |
|---|---|
| A veto | Verdicts belong to the Gatekeeper agents, not the Challenger |
| A code reviewer | Code review happens at G3, against acceptance criteria |
| A devil's-advocate ritual | Generic objections are an automatic failure of the role |
| The human's voice | Humans review the output; they do not write it |