First run · cold
"dedupe the import resolver"
spec-first
No prior decision. Shape the approach, gate it, build — then write the decision to memory with its provenance.
Adaptive agentic layer for Claude Code
Right-sized orchestration, SDLC specialists, and durable memory — for Claude Code. It routes each task to the least process that wins, then holds itself to a bar it can show you. Measured, not asserted.
› classify: multi-file feature, contract seams across 3 layers › route: decompose + verify — least process that wins › architect → plan + bounded task contracts › implement in parallel → verify the assembled whole
The thesis
A capable model will produce something. Whether it produced the right thing, at the right altitude, with the bar held — that is a separate question, and most setups never ask it. Agentry treats orchestration as engineering: the work is shaped to the least process that wins, built by specialists inside clean boundaries, and checked against an acceptance bar before it counts as done. And the system measures itself — so the bar isn't a claim, it's a number you can read.
How it holds together
Orchestration, memory, and specialists — each doing one thing well, wired into a system that adapts to the work in front of it.
Every task is routed to the least process that wins: a one-line fix goes straight to an implementer; an under-specified goal gets a Spec and a gate first; a multi-file feature is planned, split into bounded contracts, and verified. No ceremony where none is earned — and no hoping where a gate is.
Decisions, gotchas, and repo-facts are written to durable memory with provenance, then recalled — few, ranked, scoped — exactly when the next task needs them. The second run is warmer than the first: a problem solved spec-first once can route one-shot later, because the decision is remembered.
Architect, implementer, verifier, designer, librarian — each a specialist with a single concern, dispatched into a clean boundary and held to its contract. They use whatever tools and skills you already have rather than a fixed allowlist, so the harness adapts to your environment instead of fighting it.
How it works
Pick a task — watch it route to the least process that wins.
/agentry:go "The date formatter drops the timezone — fix it."
Single symbol, reversible, no design fork — straight to the implementer in debugging mode.
The moat
Run the same task twice. The first time, Agentry has no prior decision, so it works spec-first — and writes down what it decided. The next time, that decision is recalled, and the same task routes one-shot. The system gets cheaper at the work it has already reasoned through.
First run · cold
"dedupe the import resolver"
spec-first
No prior decision. Shape the approach, gate it, build — then write the decision to memory with its provenance.
Next run · warm
"dedupe the import resolver"
one-shot
The recalled decision removes the unknown. The least process that wins is now a single bounded pass — no re-deriving what was already settled.
Proof
The thesis is "measured, not asserted" — so here is the meter being wrong, in the open. Every entry below is real tracked data, the same run store the eval dashboard reads.
See the evidence6 corrections logged, and acted on — times the meter said one thing and the conductor was right. 3 would have shipped a wrong number; the rest were refinements.
Dispatch-pattern proxy was invalid
events.jsonl seam never existed
Cap-timing deflated decompose → one-shot
'Under-routes' were good decisions
Pagination mislabeled decompose
Grace window deflates some decompose
The harness runs the checks that keep the meter honest: an A/A null test (the same task scored against itself should show no difference), and a planted-discrimination test (deliberately strong and deliberately weak runs should separate). In the tracked runs they came back as expected — the null test stayed null, and the planted cases separated. We report this qualitatively: it's a pre-1.0 harness on a small sample, so read it as "the meter behaves," not a headline metric.
We don't ask you to trust the routing — we show you where it was wrong and how we fixed the meter, not the verdict. Measured, not asserted. The receipts are public, the harness is reproducible, and the dashboard is live.
Install
Paste this into Claude Code. It adds Agentry from its GitHub repo — no account, no build step.
/plugin marketplace add Codestz/agentry /agentry:go <task> — and Agentry routes it for you.