
What We Learned

Writing the canonical serialisation specification before touching any implementation code was the highest-leverage decision in the session. The spec exposed four ambiguities that, left unresolved until implementation, would have surfaced as integration failures: key sort order semantics (UTF-8 byte order vs UTF-16 code unit order), YAML type resolution for values like yes/no and dates, Unicode normalisation form (precomposed vs decomposed characters producing different hashes), and null-handling semantics (whether absent fields appear as null or are omitted entirely). Each of these would have manifested as a “works on my machine” bug: two implementations producing different fingerprints from the same YAML file, with no obvious indication of where the divergence occurred.
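Two of these ambiguities are easy to reproduce with nothing but the Python standard library. The sketch below is illustrative only and says nothing about which options the STRATT spec chose. It sorts the same two keys under UTF-8 byte order and UTF-16 code-unit order (the orders disagree once a key contains a character outside the Basic Multilingual Plane), and it shows that precomposed and decomposed spellings of “café” encode to different bytes until both are normalised to NFC.

```python
import unicodedata


def utf8_key(s: str) -> bytes:
    # Sort key: raw UTF-8 bytes (equivalent to Unicode code point order).
    return s.encode("utf-8")


def utf16_key(s: str) -> bytes:
    # Sort key: big-endian UTF-16, whose bytes compare like its code units.
    return s.encode("utf-16-be")


# U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP vs U+1F600 GRINNING FACE (a surrogate pair in UTF-16).
keys = ["\uff61", "\U0001f600"]
by_utf8 = sorted(keys, key=utf8_key)
by_utf16 = sorted(keys, key=utf16_key)

# Same visible text, two encodings: precomposed é vs e + COMBINING ACUTE ACCENT.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"
raw_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")
nfc_equal = (
    unicodedata.normalize("NFC", precomposed).encode("utf-8")
    == unicodedata.normalize("NFC", decomposed).encode("utf-8")
)

print(by_utf8 == by_utf16)  # False: the two sort rules disagree
print(raw_equal, nfc_equal)  # False True
```

Either difference alone is enough to change every byte fed to the hash, and so the fingerprint.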

Why It Matters

STRATT’s identity layer depends entirely on every component producing identical Blake3 fingerprints from the same YAML input. A single divergence corrupts the trust chain: MERIDIAN renders a tamper banner, CI rejects the PR, and the CLI returns exit code 2. The spec, with its 14 test vectors, is a contract: every implementation that passes all 14 vectors agrees with every other on each edge case the spec pins down, so interoperability follows from conformance rather than from pairwise testing between implementations. This is the difference between “it works on my machine” and “it works everywhere by construction.” The 4-stream synthesis document identified this as critical-path dependency #1: nothing else ships until the serialisation algorithm is frozen.
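One way to make the vectors an executable contract is a harness that runs any candidate implementation over every vector and reports the failures. The sketch below is a hypothetical Python harness: the vector format, the four vectors, and both candidate functions are invented for illustration (they are not the 14 STRATT vectors), and the `null-omitted` vector assumes a null-omission rule the spec may or may not adopt. It compares canonical bytes rather than Blake3 digests, since identical bytes always produce identical fingerprints.

```python
import json
from typing import Any, Callable

# Hypothetical vectors: (name, parsed input, expected canonical text).
# Illustrative only; "null-omitted" assumes null-valued fields are dropped.
VECTORS: list[tuple[str, Any, str]] = [
    ("empty-object", {}, "{}"),
    ("key-order", {"b": 1, "a": 2}, '{"a":2,"b":1}'),
    ("nested", {"z": {"y": True, "x": [1, 2]}}, '{"z":{"x":[1,2],"y":true}}'),
    ("null-omitted", {"a": None, "b": 1}, '{"b":1}'),
]


def naive(value: Any) -> str:
    # Candidate A: sorted keys, compact separators, nulls kept as-is.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def drop_nulls(value: Any) -> Any:
    # Recursively remove dict entries whose value is None.
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value


def omitting(value: Any) -> str:
    # Candidate B: same as A, but applies the null-omission rule first.
    return naive(drop_nulls(value))


def failing_vectors(impl: Callable[[Any], str]) -> list[str]:
    # Byte-level comparison: equal bytes imply equal fingerprints for any
    # deterministic hash function, Blake3 included.
    return [
        name
        for name, data, expected in VECTORS
        if impl(data).encode("utf-8") != expected.encode("utf-8")
    ]


print(failing_vectors(naive))  # ['null-omitted']
print(failing_vectors(omitting))  # []
```

The naive candidate passes the ordering vectors but still emits `"a":null`, so the harness flags exactly the kind of silent divergence that would otherwise show up only as mismatched fingerprints.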

The Pattern

Protocol-first design: define the byte-level algorithm, write reference test vectors, freeze the spec, then implement. This inverts the common “build it, then document it” approach. The upfront cost, one session to write the spec, is repaid several times over in debugging time saved across every future implementation. By aligning with RFC 8785 (JSON Canonicalization Scheme) where possible and documenting exactly where and why STRATT diverges, the spec builds on an established, published canonicalisation scheme (RFC 8785 is an Informational RFC rather than a standards-track one) and only needs to define the YAML-specific extensions. There are four divergences, all in the pre-JSON stages, and each has a documented rationale.
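The layering described here, with YAML-specific stages first and RFC 8785-style emission second, can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not STRATT's algorithm: the hypothetical `pre_json` stage assumes NFC normalisation and null omission as its rules, YAML parsing and type resolution are left out entirely, and only strings, booleans, nulls, integers, lists, and string-keyed mappings are handled. Emission follows RFC 8785's whitespace-free output and its ordering of object keys by UTF-16 code units; floating-point numbers, which RFC 8785 serialises using ECMAScript rules, are rejected rather than approximated.

```python
import json
import unicodedata
from typing import Any


def pre_json(value: Any) -> Any:
    # Assumed pre-JSON rules: NFC-normalise all text, omit null-valued fields.
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [pre_json(v) for v in value]
    if isinstance(value, dict):
        out: dict[str, Any] = {}
        for k, v in value.items():
            if v is None:
                continue  # assumed rule: a null field is treated as absent
            nk = unicodedata.normalize("NFC", k)  # sketch assumes string keys
            if nk in out:
                raise ValueError(f"keys collide after NFC normalisation: {k!r}")
            out[nk] = pre_json(v)
        return out
    return value


def emit(value: Any) -> str:
    # RFC 8785-style emission: no insignificant whitespace; strings escape only
    # quotes, backslashes, and control characters.
    if value is None or isinstance(value, (bool, str)):
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, int):
        if abs(value) > 2**53 - 1:
            raise ValueError("integer outside the IEEE-754 safe-integer range")
        return str(value)
    if isinstance(value, list):
        return "[" + ",".join(emit(v) for v in value) + "]"
    if isinstance(value, dict):
        # Big-endian UTF-16 bytes compare in the same order as UTF-16 code units.
        keys = sorted(value, key=lambda k: k.encode("utf-16-be"))
        members = (json.dumps(k, ensure_ascii=False) + ":" + emit(value[k]) for k in keys)
        return "{" + ",".join(members) + "}"
    raise TypeError(f"unsupported in this sketch: {type(value).__name__}")


def canonical(value: Any) -> bytes:
    # The bytes a fingerprint function (e.g. Blake3) would be applied to.
    return emit(pre_json(value)).encode("utf-8")


doc = {"caf\u00e9": 1, "b": None, "\U0001f600": "x", "\uff61": [True, "e\u0301"]}
print(canonical(doc).decode("utf-8"))  # {"café":1,"😀":"x","｡":[true,"é"]}
```

Under UTF-8 byte order the last two keys would swap places, which is exactly the kind of choice a spec has to freeze before two implementations can agree.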

Applicability

This pattern applies wherever multiple components must agree on a computed value: content hashing, API signature verification, cache key generation, CRDT merge ordering. The investment threshold is low — if two or more systems will independently compute the same thing, write the spec first.