What happened

STRATT’s product claim is: three surfaces (Battle Room / Chamber / Workflow Pipeline), one protocol. The audit trail is byte-identical regardless of which surface a human used. For months, the ledger supporting that claim wrote the literal string "blake3:pending" into every event’s fingerprint field. No actual hashing happened. The audit viewer rendered the string. Tests that asserted “two ledger events are equal” trivially passed, because "blake3:pending" === "blake3:pending" for every pair. The claim was cosmetic. The engine behind it was correct (the events themselves were faithfully recorded, the canonical-serialisation pipeline existed as a library, the audit bridge signed what it was told to sign) — but the invariant-bearing field was a placeholder.

How it was caught

Before writing plan-file changes for the next taskset, three Explore agents ran in parallel for 90 seconds auditing different parts of the repo. One agent was asked specifically: “trace the ledger append path end-to-end; does blake3 get computed?” It returned the placeholder line in under 40 seconds. The fix became a plan-mode decision with three options (implement now / defer / scope separately). The user chose “implement now”; the fix shipped in the same taskset. Cost to find: one agent, 40 seconds. Cost if shipped: indefinite — no internal test would ever have caught it, and the first external probe would have revealed the gap publicly.

Why this matters for the roadmap

  1. Third-party verifiability unlocks procurement. Customers in regulated spaces (legal, finance, med-adjacent) don’t take “trust us, the audit is real” as an answer. They want a cryptographic property they can verify independently, typically by checking a slice of our ledger against our published signing pubkey. “Byte-identical across surfaces” only survives that test if the bytes are actually hashed. They now are.
  2. Cosmetic parity attracts scrutiny. A clever competitor runs our audit viewer for a day, scrapes the blake3: field across sessions, notices they’re all identical, and writes a blog post. The product claim evaporates in public. This almost shipped.
  3. Placeholder debt is a compounding category. A TODO comment rots. A string literal in a production field doesn’t rot loudly — tests stay green, UI renders, no one notices. Audit categorised this as a new tech-debt flavour: “field-shaped TODOs.”

The pattern we’ll apply going forward

  • If you can’t implement an invariant-bearing field yet, throw at the boundary. Don’t populate it with a lie. A hard error during dev forces the conversation; a plausible-looking string hides the gap for months.
  • Parity tests need a negative guard. Four assertions minimum: two equalities and two inequalities. The inequalities prove the test isn’t accidentally passing on constants.
  • Phase-1 parallel audit agents are the cheapest way to catch field-shaped TODOs. Budget 90 seconds before every non-trivial taskset for an “is the thing actually wired?” sweep. Break-even is one catch per quarter. The last two quarters caught seven each.

What we’re measuring

  • Periodic grep -r "<placeholder patterns>" sweeps across all invariant-bearing packages. Candidate patterns: "pending", "TODO", "stub", "fixme", "0000", "<tbd>". Log matches; investigate each.
  • For every new invariant field, a unit test that asserts the field is never one of those patterns after a populate call.