Designing Self-Improving Agent Systems via Execution Traces

The Problem

SPEC-05 defines execution traces and quality scoring, but no STRATT package implements it yet. Grace needs a trace format that is both immediately useful (manual review) and future-compatible (automated DSPy optimisation).

The Insight

The IRProgram type in @stratt/ir already defines the structure of a compiled chain: steps, agents, gates, inputs, outputs, edges, failure modes. A trace has the same shape: it is an IRProgram with timestamps, actual values, and quality scores added. Mirroring IRProgram’s structure in the trace format gives type compatibility for free, so when SPEC-05 is implemented as a STRATT package, no schema migration will be needed.
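As a rough illustration of the mirroring idea, the sketch below extends a simplified stand-in for IRProgram with execution fields. The real type in @stratt/ir is not reproduced here, so every interface and field name in this block (IRStep, StepExecution, startedAt, actualOutputs, and so on) is a hypothetical placeholder.

```typescript
// Hypothetical, simplified stand-ins for the types in @stratt/ir.
// The real IRProgram also covers gates and failure modes.
interface IRStep {
  id: string;
  agent: string;
  inputs: string[];
  outputs: string[];
}

interface IRProgram {
  name: string;
  steps: IRStep[];
  edges: { from: string; to: string }[];
}

// Fields a trace adds on top of each step (names are assumptions).
interface StepExecution {
  startedAt: string; // ISO 8601 timestamp
  finishedAt: string; // ISO 8601 timestamp
  actualOutputs: Record<string, unknown>;
  qualityScore: number; // assumed to lie in [0, 1]
}

type TraceStep = IRStep & StepExecution;

// A trace keeps every IRProgram field and only enriches the steps.
type Trace = Omit<IRProgram, "steps"> & { steps: TraceStep[] };

// Because TraceStep is a superset of IRStep, a Trace can be passed
// wherever an IRProgram is expected, with no conversion step.
function toProgram(trace: Trace): IRProgram {
  return trace;
}

const exampleTrace: Trace = {
  name: "example-chain",
  steps: [
    {
      id: "draft",
      agent: "writer",
      inputs: ["brief"],
      outputs: ["draft"],
      startedAt: "2024-01-01T09:00:00Z",
      finishedAt: "2024-01-01T09:00:12Z",
      actualOutputs: { draft: "placeholder text" },
      qualityScore: 0.9,
    },
  ],
  edges: [],
};

const exampleProgram = toProgram(exampleTrace);
```

The design choice this illustrates is that the trace only adds fields and never renames or removes them, which is what would let existing IRProgram consumers read traces unchanged.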

The Quality Scoring Heuristic

Three factors, weighted:
  • Contract conformance (0.40) — outputs match declared types
  • Completeness (0.35) — all required outputs present
  • Token efficiency (0.25) — useful output vs total tokens
Regression threshold: a quality score delta greater than 0.05 between versions triggers the Veritas gate.
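The weighting above can be sketched as a small scoring function. This is a minimal sketch under two assumptions not stated in the source: each factor is already normalised to the range 0 to 1, and the "delta" that triggers the gate is a drop in score from the previous version to the current one. The names QualityFactors, qualityScore, and isRegression are hypothetical.

```typescript
// Each factor is assumed to be normalised to [0, 1] before weighting.
interface QualityFactors {
  contractConformance: number; // outputs match declared types
  completeness: number; // all required outputs present
  tokenEfficiency: number; // useful output vs total tokens
}

// Weights as given in the heuristic; they sum to 1.
const WEIGHTS = {
  contractConformance: 0.4,
  completeness: 0.35,
  tokenEfficiency: 0.25,
} as const;

const REGRESSION_THRESHOLD = 0.05;

function qualityScore(f: QualityFactors): number {
  return (
    WEIGHTS.contractConformance * f.contractConformance +
    WEIGHTS.completeness * f.completeness +
    WEIGHTS.tokenEfficiency * f.tokenEfficiency
  );
}

// Assumption: a regression is a drop of more than the threshold
// relative to the previous version's score.
function isRegression(previous: number, current: number): boolean {
  return previous - current > REGRESSION_THRESHOLD;
}
```

For example, factors of 1.0, 0.8, and 0.5 work out to 0.40 + 0.28 + 0.125 = 0.805, and a move from 0.9 to 0.8 between versions would count as a regression under this reading.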

The Self-Enhancement Loop

Execute → trace → evaluate (weekly) → identify underperformers → refine (prompt body, agent assignment, chain composition) → re-validate → execute again. This closes the loop that HB-03 explicitly relies on for “20-hour week” retirement operations.
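The "identify underperformers" step of this loop could look something like the sketch below, which averages logged quality scores per step and flags steps whose mean falls under a cutoff. The source does not define what counts as underperforming, so the cutoff parameter, the StepScore shape, and the function name are all hypothetical.

```typescript
// One logged score for one step execution (hypothetical shape).
interface StepScore {
  stepId: string;
  qualityScore: number;
}

// Returns the ids of steps whose mean score over the evaluation
// window is below minMeanScore. The cutoff is an assumed parameter;
// the source does not specify one.
function findUnderperformers(
  scores: StepScore[],
  minMeanScore: number
): string[] {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const { stepId, qualityScore } of scores) {
    const entry = totals.get(stepId) ?? { sum: 0, count: 0 };
    entry.sum += qualityScore;
    entry.count += 1;
    totals.set(stepId, entry);
  }

  const flagged: string[] = [];
  totals.forEach(({ sum, count }, stepId) => {
    if (sum / count < minMeanScore) {
      flagged.push(stepId);
    }
  });
  return flagged;
}

// Illustrative data only: two steps, two executions each.
const weeklyScores: StepScore[] = [
  { stepId: "draft", qualityScore: 0.9 },
  { stepId: "draft", qualityScore: 0.8 },
  { stepId: "summarise", qualityScore: 0.5 },
  { stepId: "summarise", qualityScore: 0.6 },
];

const toRefine = findUnderperformers(weeklyScores, 0.7);
```

With this illustrative data, only "summarise" (mean 0.55) falls below the 0.7 cutoff, so it would be the candidate for refinement and re-validation.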

Business Impact

The trace protocol is the bridge between Grace’s current manual operation and its future autonomous self-improvement. Every trace logged now is training data for the DSPy optimiser later.