
chevp-ai-framework

Agentic Workflow · Architecture

Agents

One main AI orchestrates — many small, narrow agents review.
Each agent owns a slice of the lifecycle and produces a verdict.

Friction moves into the AI, where it is cheap, instead of piling up as a bottleneck on the human.

What is an agentic workflow?

Not one model doing everything — many small, focused models doing narrow jobs

A single LLM session that proposes, plans, prototypes, codes, reviews, audits, and ships is doing too much at once. The same context window holds both the proposal and the rebuttal, so the rebuttal is rarely strong: the model ends up arguing against a plan it has already committed to.

An agentic workflow splits that single session into an orchestrator (the main AI you talk to) and many specialised agents (sub-sessions with their own prompt, tools, and read-only/read-write boundaries). Each specialised agent reviews a narrow slice with fresh eyes and returns a structured verdict. The orchestrator integrates the verdicts and reports back to the human.
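To make the split concrete, here is a minimal Python sketch of an orchestrator fanning one request out to several agents and folding their verdicts together. It assumes each agent is a plain function returning a `Verdict`, and it assumes the integration rule "the most severe status wins"; neither the `Verdict` type nor that rule comes from the framework itself.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed ordering, from least to most severe. Not taken from the framework.
STATUS_ORDER = ["pass", "conditional-pass", "block"]


@dataclass
class Verdict:
    agent: str
    status: str  # one of STATUS_ORDER
    findings: list[str] = field(default_factory=list)


# Hypothetical agent type: a function from the request payload to a Verdict.
Agent = Callable[[dict], Verdict]


def orchestrate(payload: dict, agents: list[Agent]) -> tuple[str, list[Verdict]]:
    """Send the payload to every agent, then integrate the verdicts.

    Assumed integration rule: the overall status is the most severe
    individual status, so one block from any agent blocks the step.
    """
    verdicts = [agent(payload) for agent in agents]
    overall = max(
        (v.status for v in verdicts), key=STATUS_ORDER.index, default="pass"
    )
    return overall, verdicts
```

With a Challenger stub that passes and an architecture stub that returns conditional-pass, `orchestrate` reports conditional-pass overall and keeps both individual verdicts, so the orchestrator can still show the human which agent raised what.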

chevp-ai-framework ships one cross-cutting role (Challenger) and six concrete agents. The pattern is open — you can register more for security, performance, accessibility, or any other slice your project cares about.

The Pattern

Orchestrator delegates · agents review · verdicts return

Human (you)
  ↕
Orchestrator AI (you talk to this one)
  ├─ Challenger · sceptic
  ├─ Gatekeepers (G1·G2·G3) · transition verdicts
  ├─ arch-reviewer · per-change
  ├─ gov-auditor · repo-wide drift
  └─ … plus any specialist you register: security, perf, a11y, deps …

Each agent is a separate sub-session: it cannot see the orchestrator's history, only what is passed in. That isolation is the point — a fresh perspective doesn't rationalise away the orchestrator's mistakes. Agents are read-only by default; only the orchestrator (with human approval) writes code.
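The isolation described above can be sketched in a few lines. This hypothetical `SubSession` receives a deep copy of only the payload the orchestrator hands over, and refuses any tool outside its allowlist. The tool names mirror the roster below; `Edit` in the usage is an assumed stand-in for any write-capable tool.

```python
import copy
from dataclasses import dataclass

# Read-only tool set used by most agents in the roster.
READ_ONLY_TOOLS = frozenset({"Read", "Glob", "Grep"})


class ToolDenied(Exception):
    """Raised when an agent asks for a tool outside its allowlist."""


@dataclass(frozen=True)
class AgentSpec:
    name: str
    tools: frozenset = READ_ONLY_TOOLS


class SubSession:
    def __init__(self, spec: AgentSpec, payload: dict):
        self.spec = spec
        # The agent gets its own copy of what was passed in, and nothing else:
        # no orchestrator history, and no way to mutate the orchestrator's data.
        self.context = copy.deepcopy(payload)

    def use_tool(self, tool: str) -> str:
        if tool not in self.spec.tools:
            raise ToolDenied(f"{self.spec.name} may not use {tool}")
        return f"{self.spec.name} used {tool}"
```

A session for gatekeeper-g3 would be built with `READ_ONLY_TOOLS | {"Bash"}`, matching its extra build permission; every other agent keeps the default read-only set.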

Why many small agents?

Three problems that one big AI session cannot solve on its own


Confirmation bias

A model that just wrote a plan is the worst reviewer of that plan. The Challenger lives in a separate sub-session, so it does not automatically defer to the reasoning that produced the plan.


Diluted attention

A long context holding code, plans, and ADRs cannot be checked deeply from end to end. A narrow agent reads only what its job needs, sometimes just ten lines, and reads them with a specific question in mind.


Auditable verdicts

Every agent emits a structured report (pass/block + findings). Verdicts can be diffed, logged, and replayed — unlike free-form chat.

The Roster

Six concrete agents + one cross-cutting role — each with a single, focused job


Challenger

Cross-cutting role · Active in Exploration and Production

The internal sceptic. Before any G2 transition, the AI must produce three concrete failure modes for its own plan, two genuinely considered alternatives, the strongest counter-argument against the chosen approach, and a product-coherence check. Generic output ("schedule slip", "scope creep") triggers an automatic regeneration.

Output
4 sections inside the EXP plan: failures, alternatives, counter-argument, product-coherence
Triggers
EXP plan transitions draft → proposed; mid-Production scope changes
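The Challenger's shape requirements are mechanical enough to check automatically. Below is a minimal sketch, assuming the four sections have already been parsed into Python values; `GENERIC_PHRASES` is a hypothetical blocklist seeded with the two examples above. An exact-phrase blocklist only catches the most obvious theatre; judging whether a failure mode could apply to any plan still needs a reviewer.

```python
# Hypothetical blocklist seeded with the examples from the Challenger description.
GENERIC_PHRASES = {"schedule slip", "scope creep"}


def check_challenger(
    failure_modes: list[str],
    alternatives: list[str],
    counter_argument: str,
    product_coherence: str,
) -> list[str]:
    """Return a list of problems; an empty list means the output is accepted."""
    problems = []
    if len(failure_modes) < 3:
        problems.append("fewer than three failure modes")
    if len(alternatives) < 2:
        problems.append("fewer than two alternatives")
    if not counter_argument.strip():
        problems.append("missing counter-argument")
    if not product_coherence.strip():
        problems.append("missing product-coherence check")
    # Crude heuristic: flag entries that are nothing but a known generic phrase.
    for item in [*failure_modes, *alternatives, counter_argument]:
        if item.strip().lower() in GENERIC_PHRASES:
            problems.append(f"generic: {item.strip()!r}")
    return problems


def challenger_action(problems: list[str]) -> str:
    # Any problem means the AI regenerates the Challenger sections.
    return "regenerate" if problems else "accept"
```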

gatekeeper-g1

Subagent · Read-only · Tools: Read · Glob · Grep

Validates the Context → Exploration transition. Checks that the CTX-Plan, the uncertainty triplet (problem-statement, hypotheses, risks), the System Spec, the Software Architecture doc, the fundamental ADRs, the context inventory, and the scope confirmation are all in place. Spawns up to five PROP-NNN Plan-Proposals for out-of-scope items so nothing silently disappears.

Verdict
pass / conditional-pass / block
Triggers
Before any move from Context to Exploration; /gate-check G1
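A sketch of the G1 checklist, assuming each artefact has already been resolved to present or absent. The artefact names come from the description above; the rule that separates conditional-pass from block (a written justification for a gap) is an assumption made for illustration, not the framework's documented behaviour.

```python
# Artefacts named in the G1 description; how each maps to a file is left open.
G1_REQUIRED = (
    "CTX-Plan",
    "problem-statement",
    "hypotheses",
    "risks",
    "System Spec",
    "Software Architecture",
    "fundamental ADRs",
    "context inventory",
    "scope confirmation",
)


def gatekeeper_g1(present: set[str], justified_gaps: dict[str, str]) -> tuple[str, list[str]]:
    """Return (verdict, findings) for the Context -> Exploration transition.

    Assumed rule: a gap with a written justification yields conditional-pass;
    any gap without one blocks.
    """
    gaps = [item for item in G1_REQUIRED if item not in present]
    justified = [item for item in gaps if justified_gaps.get(item, "").strip()]
    missing = [item for item in gaps if item not in justified]
    findings = [f"missing    {item}" for item in missing]
    findings += [f"justified  {item}" for item in justified]
    if missing:
        return "block", findings
    if justified:
        return "conditional-pass", findings
    return "pass", findings
```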

gatekeeper-g2

Subagent · Read-only · Tools: Read · Glob · Grep

Validates the Exploration → Production transition. Checks that the EXP plan declares exploration-mode: A|B; that Kill Criteria, Acceptance Criteria, Risks and a UX-Prototype exist; that insights.md records what was learned; and, critically, that the Challenger output is concrete rather than theatre. Generic Challenger sections trigger an automatic block.

Verdict
pass / conditional-pass / block
Triggers
Before any move from Exploration to Production; /gate-check G2

gatekeeper-g3

Subagent · Read-only (+ Bash for build) · Tools: Read · Glob · Grep · Bash

Validates the Production → Done transition. Verifies every acceptance criterion against the actual code and tests, runs the build, checks that insights.md was updated with implementation surprises (not just copied from G2), checks the PRD's provenance frontmatter for the human approval (approved-by / approved-at), and detects code changed outside the approved PRD scope.

Verdict
pass / conditional-pass / block
Triggers
Before declaring a Production task done; /gate-check G3
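Two of the G3 checks, sketched under stated assumptions: the PRD frontmatter is already parsed into a dict, and the approved scope is a list of glob patterns (a hypothetical convention, not something the framework prescribes). The changed-file list could come from `git diff --name-only` against the branch base.

```python
from fnmatch import fnmatch


def check_prd_provenance(frontmatter: dict[str, str]) -> list[str]:
    """Flag missing human-approval fields in the PRD frontmatter."""
    return [
        f"missing    {key}"
        for key in ("approved-by", "approved-at")
        if not frontmatter.get(key, "").strip()
    ]


def out_of_scope(changed_files: list[str], scope_globs: list[str]) -> list[str]:
    """Return changed paths that match none of the approved scope patterns.

    Note: fnmatch's '*' also matches '/', so 'src/auth/*' covers subfolders too.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in scope_globs)
    ]
```

Any path returned by `out_of_scope` is exactly the "code changed outside the approved PRD scope" that G3 reports.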

architecture-reviewer

Subagent · Read-only · Tools: Read · Glob · Grep

Reviews individual changes — a plan, a code diff, a new ADR — against the project's documented architecture invariants and accepted ADRs. Flags forbidden layer crossings, wrong dependency directions, and patterns that conflict with prior decisions. Returns severities info / warn / block.

Output
REVIEW · VERDICT · FINDINGS with severity per finding
Triggers
New pattern proposed, layer boundary crossed, ADR drafted
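Layer-crossing checks reduce to comparing positions in an ordered list. This sketch assumes a three-layer stack (`ui`, `application`, `domain`), that a module's layer is its first dotted segment, and that skipping a layer is a warn rather than a block; a real project would read these from its architecture invariants and ADRs.

```python
# Assumed layering, outermost first. Dependencies may only point inward.
LAYERS = ["ui", "application", "domain"]


def layer_of(module: str) -> str:
    # Assumption: the first dotted segment of a module name is its layer.
    return module.split(".", 1)[0]


def review_dependencies(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (severity, message) findings for import edges introduced by a change."""
    findings = []
    for src, dst in edges:
        s, d = layer_of(src), layer_of(dst)
        if s not in LAYERS or d not in LAYERS:
            findings.append(("info", f"{src} -> {dst}: module outside known layers"))
        elif LAYERS.index(d) < LAYERS.index(s):
            # Wrong dependency direction: an inner layer reaching outward.
            findings.append(("block", f"{src} -> {dst}: {s} must not depend on {d}"))
        elif LAYERS.index(d) - LAYERS.index(s) > 1:
            # Allowed direction, but it bypasses an intermediate layer.
            findings.append(("warn", f"{src} -> {dst}: {s} skips a layer to reach {d}"))
    return findings
```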

governance-auditor

Subagent · Read-only · Tools: Read · Glob · Grep

Audits the whole repository for content-level drift against accepted ADRs and architecture invariants. Detects: ADRs whose constraints are violated by current code, recurring patterns that have no binding ADR, and accepted ADRs whose subject no longer exists in the codebase. Where the architecture-reviewer is per-change, the auditor is repo-wide.

Severity
BLOCK / CONCERN / INFO
Triggers
/governance-audit; per release; after any ADR is accepted/superseded
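One of the auditor's three detections, an accepted ADR whose subject no longer exists, can be sketched as a filesystem check. The `Adr` record and its `subject_paths` field are hypothetical, and mapping this finding to CONCERN is an assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Adr:
    id: str
    status: str  # e.g. "accepted" or "superseded"
    # Hypothetical field: repository paths whose design this ADR governs.
    subject_paths: list[str] = field(default_factory=list)


def orphaned_adrs(adrs: list[Adr], repo_root: Path) -> list[tuple[str, str]]:
    """Flag accepted ADRs none of whose subject paths still exist."""
    findings = []
    for adr in adrs:
        if adr.status != "accepted" or not adr.subject_paths:
            continue
        if not any((repo_root / rel).exists() for rel in adr.subject_paths):
            # Severity is an assumption; the auditor's scale is BLOCK / CONCERN / INFO.
            findings.append(("CONCERN", f"{adr.id}: none of {adr.subject_paths} exist any more"))
    return findings
```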

gate-validator

Legacy dispatcher · Superseded

Backward-compatibility dispatcher. Older /gate-check invocations still route through here; it forwards to the matching gatekeeper-g1/g2/g3 and returns its output unchanged. New code should call the specialised gatekeepers directly.

Status
Retained for compatibility; do not extend
Replaced by
gatekeeper-g1, gatekeeper-g2, gatekeeper-g3
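The dispatcher's contract (forward the call, return the output unchanged) fits in a lookup table. In this sketch the gatekeeper functions are hypothetical stubs standing in for the real agents.

```python
from typing import Callable


def _stub(gate: str) -> Callable[[str], str]:
    # Stand-in for a real gatekeeper agent: returns a report header only.
    return lambda plan: f"GATEKEEPER: {gate}\nPLAN: {plan}"


GATEKEEPERS: dict[str, Callable[[str], str]] = {
    "G1": _stub("G1"),  # gatekeeper-g1
    "G2": _stub("G2"),  # gatekeeper-g2
    "G3": _stub("G3"),  # gatekeeper-g3
}


def gate_validator(gate: str, plan: str) -> str:
    """Legacy entry point: forward to the matching gatekeeper, output unchanged."""
    try:
        gatekeeper = GATEKEEPERS[gate.upper()]
    except KeyError:
        raise ValueError(f"unknown gate {gate!r}; expected G1, G2 or G3") from None
    return gatekeeper(plan)
```

Because the validator adds nothing of its own, callers lose nothing by switching to the specialised gatekeepers directly.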

Where each agent activates

Mapped onto the 3-step lifecycle

Agent                 | Context | G1        | Exploration | G2        | Production      | G3        | Done
Challenger            | ·       | ·         | ✓ mandatory | ·         | on scope-change | ·         | ·
gatekeeper-g1         | ·       | ✓ verdict | ·           | ·         | ·               | ·         | ·
gatekeeper-g2         | ·       | ·         | ·           | ✓ verdict | ·               | ·         | ·
gatekeeper-g3         | ·       | ·         | ·           | ·         | ·               | ✓ verdict | ·
architecture-reviewer | on ADR  | ·         | on plan     | ·         | on diff         | ·         | ·
governance-auditor    | ·       | ·         | ·           | ·         | ·               | ·         | ✓ periodic
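The activation matrix can also be kept as data, for example so an orchestrator can decide which agents to wake at each step. This is a hypothetical encoding; the cell values are copied from the matrix.

```python
PHASES = ("Context", "G1", "Exploration", "G2", "Production", "G3", "Done")

# Only non-empty cells are listed; a missing phase means "not active".
ACTIVATION: dict[str, dict[str, str]] = {
    "Challenger": {"Exploration": "mandatory", "Production": "on scope-change"},
    "gatekeeper-g1": {"G1": "verdict"},
    "gatekeeper-g2": {"G2": "verdict"},
    "gatekeeper-g3": {"G3": "verdict"},
    "architecture-reviewer": {
        "Context": "on ADR",
        "Exploration": "on plan",
        "Production": "on diff",
    },
    "governance-auditor": {"Done": "periodic"},
}


def agents_for(phase: str) -> dict[str, str]:
    """Return {agent: trigger} for every agent active in the given phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r}")
    return {agent: cells[phase] for agent, cells in ACTIVATION.items() if phase in cells}
```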

The Gatekeepers gate transitions; the Challenger gates thinking; the Architecture-Reviewer gates changes; the Governance-Auditor gates drift.

The output contract

Every agent answers in the same skeleton — so verdicts are diffable

GATEKEEPER: G2
PLAN: EXP-014-auth-refactor
VERDICT: conditional-pass

FINDINGS:
  - satisfied  exploration-mode declared:    context/plans/EXP-014.md:3
  - satisfied  Kill Criteria present:         §"Kill Criteria" non-empty
  - missing    Challenger product-coherence:  no engagement with §Vision Alignment

EVIDENCE-BLOCK CHECK:
  - hypothesis: "session-token storage rewrite reduces compliance surface"
  - result:     "prototype confirmed on 2026-04-22"
  - reasoning:  "ship behind feature flag, dual-read 7d"

CHALLENGER CHECK:
  - failure-modes:    3, concrete
  - alternatives:     2, engaged
  - counter-argument: engaged
  - product-coherence: rubber-stamp

SPAWNED PLAN PROPOSALS (max 5):
  - PROP-027: long-running session migration plan (suggested-type: prd)

NEXT ACTION: regenerate Challenger §4 (product-coherence) before requesting /approve EXP-014

Verdicts can be committed to the repo, replayed, and diffed across runs — unlike free-form chat output. That property is what turns review into governance.

Many more agents are possible

The six shipped agents are an opinionated baseline — not a closed set

The framework's agents/ folder is an extension point. Anything that can be expressed as "a narrow read-only review with a structured verdict" can become an agent. Your project will likely want a few of these:

security-reviewer
Reviews diffs against OWASP top-10, secrets, auth invariants.
perf-auditor
Flags O(n²) hot paths, missing indexes, N+1 queries.
accessibility-checker
Reviews UI diffs for ARIA, contrast, keyboard nav, screen-reader paths.
doc-checker
Verifies that public APIs touched by a PR are reflected in docs/CLAUDE.md.
dependency-watcher
Flags new dependencies, license conflicts, unmaintained packages.
test-coverage-reviewer
Confirms acceptance criteria from the PRD have at least one mapped test.
i18n-reviewer
Catches hardcoded strings in user-facing components.
schema-migrator
Verifies that migrations are reversible and that long-running migrations on large tables are handled safely.
A new agent is just a markdown file in agents/ with a frontmatter block (name, description, tools) and an output contract. The orchestrator picks it up automatically.
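A sketch of how such a file might be read, assuming `---`-delimited frontmatter with one `key: value` per line and a comma-separated `tools` list. The example file and `parse_agent` are hypothetical; the framework's own loader may accept more (for instance full YAML).

```python
# Hypothetical agent file for one of the suggested specialists.
AGENT_FILE = """---
name: security-reviewer
description: Reviews diffs against OWASP top-10, secrets, auth invariants.
tools: Read, Glob, Grep
---
Answer with REVIEW, VERDICT and FINDINGS, one severity per finding.
"""


def parse_agent(text: str) -> dict:
    """Read frontmatter (name, description, tools) plus the output-contract body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("agent file must start with a '---' frontmatter block")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("unterminated frontmatter") from None
    meta: dict = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    missing = {"name", "description", "tools"} - set(meta)
    if missing:
        raise ValueError(f"missing frontmatter keys: {sorted(missing)}")
    meta["tools"] = [t.strip() for t in meta["tools"].split(",") if t.strip()]
    meta["body"] = "\n".join(lines[end + 1:]).strip()
    return meta
```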

Design rules for new agents

Why the existing six are reliable — and how to keep yours that way

Rule | Why
One agent = one job | An agent that reviews "everything" returns nothing useful. Narrow scope makes verdicts strong.
Read-only by default | Verdicts must be advisory. Only the orchestrator (with human approval) writes code or decisions.
Structured output, not prose | Verdicts must be diffable. Free-form “looks good” is forbidden; cite a path or a line.
Generic findings auto-fail | "Schedule slip", "scope creep": if a finding could apply to any plan, the agent has not engaged with this one.
Out-of-scope → proposal, never silence | An agent that finds a tangent files a PROP-NNN; it does not expand its own scope. (Rule 12)
Cap output to 5 proposals | Prevents proposal spam. Any excess is rolled into a single summary-note paragraph.
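The proposal cap can be sketched as follows. `first_id` (where PROP numbering continues) and the wording of the summary note are assumptions made for illustration.

```python
from typing import Optional

MAX_PROPOSALS = 5


def cap_proposals(items: list[str], first_id: int) -> tuple[list[str], Optional[str]]:
    """Turn out-of-scope items into at most five PROP-NNN proposals.

    Anything beyond the cap is folded into one summary-note paragraph
    instead of producing more proposals.
    """
    proposals = [
        f"PROP-{first_id + i:03d}: {item}"
        for i, item in enumerate(items[:MAX_PROPOSALS])
    ]
    overflow = items[MAX_PROPOSALS:]
    note = ("Summary note: also noticed " + "; ".join(overflow) + ".") if overflow else None
    return proposals, note
```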

Read the source files

All agent definitions live next to the rest of the framework, as plain markdown.

agents/