What happened
A user-supplied 4-taskset plan (α Battle Room/Chamber editors, β Pipeline authoring, γ Intelligence recommendations, δ Gated Async + parity) was internally consistent but rested on three assumptions that the repo exploration pass flagged as load-bearing:
- Taskset δ’s parity gate assumed real Blake3 hashing existed. Exploration showed the ledger wrote "blake3:pending" as a literal string. Parity tests would be cosmetic, not cryptographic. Three plausible paths: implement real hashing as part of δ, keep the placeholder and test around it, or split hashing into a separate taskset.
- Taskset β’s PipelineBuilder assumed chain units could declare per-stage output types. Exploration showed ChainStepSchema had {id, unit, agent, gate, depends_on} and no output field. Three paths: extend the schema with optional output, infer output from each step’s referenced unit contract, or drop per-stage output from the plan entirely.
- Taskset δ’s audit deep link assumed one canonical audit signer. Exploration showed the repo had two mint paths (the Meridian Astro route and orchestrator-http /audit/mint), each with its own keypair and kid. Three paths: make orchestrator-http canonical with Meridian proxying, keep both and document the divergence, or make Meridian canonical and drop the HTTP route.
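To make the first finding concrete, here is a minimal sketch of why a parity test over placeholder values is cosmetic. It assumes the ledger stores digests as "blake3:" followed by 64 lowercase hex characters (BLAKE3's default output is 32 bytes); that storage format and the function names are assumptions for illustration, not taken from the repo.

```python
import re

# Assumption: real digests look like "blake3:<64 lowercase hex chars>".
DIGEST_RE = re.compile(r"blake3:[0-9a-f]{64}")

# The literal string the exploration pass found in the ledger.
PLACEHOLDER = "blake3:pending"


def is_real_digest(value: str) -> bool:
    """True only if the value has the shape of an actual BLAKE3 hex digest."""
    return DIGEST_RE.fullmatch(value) is not None


def naive_parity(a: str, b: str) -> bool:
    """Cosmetic check: passes whenever the two strings are equal."""
    return a == b


def strict_parity(a: str, b: str) -> bool:
    """Parity that refuses to pass on placeholders or malformed digests."""
    return is_real_digest(a) and is_real_digest(b) and a == b
```

With two placeholder entries, naive_parity passes while strict_parity fails, which is the gap the question to the user had to close.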
Each assumption became an AskUserQuestion prompt with labelled options, with the path I thought was strongest tagged (Recommended). The user answered all three in under two minutes total. All three answers matched the recommendation.
The plan file was then written with those three decisions baked in, ExitPlanMode was called, and execution proceeded from a 26-subitem TaskList. No mid-execution pivots. No rework.
Why this is repeatable leverage
The shape of the decision capture matters:
- Present the trade-off, not the recommendation alone. Three options per question, each with a concrete description of the trade-off. The recommended path is first and tagged, but the other paths are legitimate enough that “actually, let’s do path 2” is a one-click pivot, not a debate.
- Anchor each question to a specific piece of evidence from exploration. “The blake3: field is currently the literal string blake3:pending. How should we treat this?” is answerable. “What’s your preferred Blake3 strategy?” is not.
- Do all three (or four) questions in one AskUserQuestion call. Batching keeps the cost down: users answer the questions as a group, often in under a minute. Asking sequentially would have cost three context switches plus three decision-forming delays.
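The three rules above can be sketched as a small pre-flight check on a question batch. The Option and Question types and validate_batch are hypothetical names for illustration; they are not the actual AskUserQuestion input schema. The option wording and the choice of which option is marked recommended are also illustrative, not the prompts that were actually sent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Option:
    label: str
    trade_off: str             # concrete description of what this path costs or buys
    recommended: bool = False


@dataclass(frozen=True)
class Question:
    evidence: str              # the specific exploration finding the question rests on
    prompt: str
    options: List[Option]


def validate_batch(questions: List[Question]) -> List[str]:
    """Return a list of problems; an empty list means the batch is well-formed."""
    problems = []
    if not 2 <= len(questions) <= 4:
        problems.append(f"expected 2-4 questions, got {len(questions)}")
    for i, q in enumerate(questions, start=1):
        if not q.evidence.strip():
            problems.append(f"Q{i}: no evidence from exploration")
        if len(q.options) < 2:
            problems.append(f"Q{i}: needs at least two options")
        recommended = [o for o in q.options if o.recommended]
        if len(recommended) != 1:
            problems.append(f"Q{i}: exactly one option must be recommended")
        elif not q.options[0].recommended:
            problems.append(f"Q{i}: recommended option must be listed first")
        if any(not o.trade_off.strip() for o in q.options):
            problems.append(f"Q{i}: every option needs a concrete trade-off")
    return problems


# Illustrative question built from the hashing finding (wording is hypothetical).
hashing = Question(
    evidence='The ledger writes the literal string "blake3:pending".',
    prompt="How should we treat the placeholder hash?",
    options=[
        Option("Implement real hashing in δ", "More work in δ; parity becomes cryptographic", True),
        Option("Keep the placeholder", "No new work; parity stays cosmetic"),
        Option("Split hashing into its own taskset", "δ stays small; parity waits on the new taskset"),
    ],
)
```

Calling validate_batch([hashing]) reports only the batch-size problem, since a single question falls below the 2-4 range; the question itself satisfies the evidence, ordering, and trade-off checks.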
Why it beats “just ask in the chat”
Chat questions are unstructured text. The user has to generate an answer from scratch, remember context about the surrounding work, and commit to a position that might have been better presented with alternatives they hadn’t considered.
AskUserQuestion with options turns the question into a multiple-choice exam with a confident recommendation. The user can either agree (default) or diverge (explicit click). Both paths are fast. The decision is captured in structured form that propagates into the plan file as “three scope decisions baked in,” making it reviewable later.
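As a rough sketch of how structured answers might propagate into a plan file, the snippet below renders each captured decision as one reviewable line. The Decision type, the render_scope_decisions function, and the output layout are all hypothetical; the actual plan-file format is not shown in this write-up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Decision:
    topic: str         # what the question was about
    chosen: str        # label of the option the user picked
    recommended: str   # label of the option that was tagged (Recommended)


def render_scope_decisions(decisions: List[Decision]) -> str:
    """Render decisions as a plain-text block suitable for a plan file."""
    lines = [f"Scope decisions ({len(decisions)} baked in)"]
    for d in decisions:
        if d.chosen == d.recommended:
            note = "matched recommendation"
        else:
            note = f"diverged from recommendation ({d.recommended})"
        lines.append(f"- {d.topic}: {d.chosen} [{note}]")
    return "\n".join(lines)
```

Recording whether the user agreed or diverged alongside the choice keeps the later review cheap: a reader can see at a glance which decisions were defaults and which were deliberate pivots.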
What we’re reusing
- Budget one AskUserQuestion block per plan-mode session for load-bearing assumptions. Not for every minor choice, only for the 2-4 decisions that would cause rework if wrong. Exploration flags candidates; Q&A resolves them.
- Include the paired prompt-op that shows the Explore-partition prompts. Without good exploration, the questions are speculative. The two techniques chain.
- Label the recommended option, always first. Users with less context than you still get a working answer on “Enter”; users with more context can still pivot.