What happened
The clari-tools → oompa migration on 17 April 2026 executed eight tasksets in one session, producing 12 commits across proto schemas, NATS streams, brand guardrails, observability (rules + alerts + dashboard), Python service scaffolds, docker-compose wiring, oompa-spec additions, editor component renderers, Gherkin acceptance tests, a full Next.js 16 landing page rebuild, and cross-cutting final validation. Every taskset ended at a named, binary gate:

| Taskset | Gate |
|---|---|
| 1 — proto + NATS | Proto structural validity + yaml.safe_load_all on streams.yaml |
| 2 — guardrails + observability | promtool check rules + jq -e on dashboards + yaml parse |
| 3 — service scaffolds + compose | docker compose config -q + duplicate-port scan returns empty |
| 4 — oompa spec + editor blocks | pnpm type-check in cho-co-web + spec YAML validity |
| 5 — gherkin tests | pnpm test:dry --tags "@REQ-OMP-*" reports zero ambiguous/undefined |
| 6 — landing structure | pnpm build passes, zero gsap imports remain |
| 7 — landing interactive | pnpm build plus live curl -X POST /api/waitlist |
| 8 — final validation | All of the above, end-to-end, plus forbidden-word scan |
data/ gitignore for the dev waitlist) was a small follow-on within the same taskset, not a rollback.
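As an illustration of how one of the gates in the table can be made binary, here is a minimal sketch of a duplicate-port scan for taskset 3. It assumes the compose file has already been parsed into a dictionary; the function names `published_port` and `duplicate_host_ports` are hypothetical, and the parsing deliberately ignores port ranges, IPv6 bindings, and per-interface bindings. It is not the script used in the migration.

```python
from collections import Counter


def published_port(entry):
    """Return the host port of one compose port entry, or None if unpublished."""
    if isinstance(entry, dict):
        # Long form, e.g. {"target": 80, "published": "8080"}
        port = entry.get("published")
        return None if port is None else str(port)
    # Short form, e.g. "8080:80" or "127.0.0.1:8080:80/tcp"
    parts = str(entry).split("/")[0].split(":")
    return parts[-2] if len(parts) >= 2 else None


def duplicate_host_ports(compose):
    """Return the sorted host ports claimed by more than one port mapping."""
    counts = Counter()
    for service in compose.get("services", {}).values():
        for entry in service.get("ports", []):
            port = published_port(entry)
            if port is not None:
                counts[port] += 1
    return sorted(port for port, n in counts.items() if n > 1)


# Hypothetical parsed compose data: two services claim host port 8080.
compose = {
    "services": {
        "api": {"ports": ["8080:80"]},
        "worker": {"ports": [{"target": 9000, "published": "8080"}]},
        "db": {"ports": ["5432:5432"]},
    }
}
print(duplicate_host_ports(compose))  # ['8080']
```

The gate is green exactly when the returned list is empty, which matches the "returns empty" wording in the table.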
Why gates work
Each gate is a compile-time statement, not a judgment call. promtool check rules either passes or it doesn’t. docker compose config -q either parses or it doesn’t. The gate is run immediately on the staged changes, before the commit. If the gate fails, the fix happens in place before the code reaches the history.
The alternative — “run all the tests at the end” — produces a compound failure mode where taskset 7’s failure might be rooted in taskset 3’s silent break. Gates localize cost. A failing gate at taskset 5 is a local problem; a failing global check after taskset 8 is a bisect exercise.
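The localization argument can be sketched in a few lines: treat each gate as a command whose exit code is the only verdict, run the gates in taskset order, and stop at the first red one. The names `gate_passes` and `first_failing_gate` and the taskset labels are hypothetical, and the stand-in commands simply exit with a fixed code so the sketch runs without promtool or Docker installed.

```python
import subprocess
import sys


def gate_passes(command):
    """Run one gate command; the gate is green only if the exit code is 0."""
    return subprocess.run(command, capture_output=True).returncode == 0


def first_failing_gate(gates):
    """Run gates in taskset order and return the name of the first red one.

    Returns None when every gate is green. Stopping at the first failure
    keeps the problem local to the taskset that introduced it.
    """
    for name, command in gates:
        if not gate_passes(command):
            return name
    return None


# Stand-in commands so the sketch runs anywhere; a real gate would be an
# argv such as ["docker", "compose", "config", "-q"].
ok = [sys.executable, "-c", "raise SystemExit(0)"]
bad = [sys.executable, "-c", "raise SystemExit(1)"]

gates = [
    ("1-proto-nats", ok),
    ("2-observability", ok),
    ("3-compose", bad),
    ("4-spec", ok),
]
print(first_failing_gate(gates))  # 3-compose
```

Because the loop returns at taskset 3, later tasksets never build on the broken state, which is the difference between a local fix and a bisect.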
The commercial impact
Customers and operators budget for “integration drift” when shipping migrations. That drift is usually a multi-hour rollback + triage phase at the end. Gated tasksets collapse that drift to zero: the drift never accrues. For teams shipping weekly migrations, this pattern compounds: in a year, 50-100 migrations × 1-2 hours of drift per migration = 50-200 hours of direct engineering time avoided. On top of that, customer-visible incidents from half-rolled migrations disappear.
Pattern to repeat
- Name each taskset with a single clear goal.
- Declare the gate as an executable command, not an intent.
- Commit at gate-green, with conventional-commits scope matching the taskset.
- If the gate fails, fix before committing. Never commit “fix” separately unless a real latent issue was found.
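
The checklist above can be expressed as a small data structure plus one function. This is a sketch under stated assumptions, not the tooling used in the migration: the `Taskset` class, its field names, the `commit_if_green` helper, and the example scope and goal are all hypothetical. The `run` parameter defaults to `subprocess.run` and exists so the commit step can be exercised without touching a real repository.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Taskset:
    scope: str   # reused as the conventional-commits scope
    goal: str    # the single clear goal, reused as the commit summary
    gate: tuple  # the gate as an executable argv, not an intent

    def commit_message(self, kind="feat"):
        """Build a conventional-commits header: type(scope): summary."""
        return f"{kind}({self.scope}): {self.goal}"


def commit_if_green(taskset, run=subprocess.run):
    """Commit staged changes only when the taskset's gate exits 0."""
    if run(list(taskset.gate)).returncode != 0:
        return False  # fix in place and rerun; nothing is committed
    run(["git", "commit", "-m", taskset.commit_message()], check=True)
    return True


# Hypothetical taskset; only the message is built here, nothing is executed.
compose = Taskset(
    scope="compose",
    goal="add service scaffolds and compose wiring",
    gate=("docker", "compose", "config", "-q"),
)
print(compose.commit_message())
# feat(compose): add service scaffolds and compose wiring
```

Calling `commit_if_green(compose)` inside a repository with staged changes would run the gate first and reach `git commit` only on a zero exit code, so a red gate leaves the history untouched.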