Reading Optimisation Reports
The optimisation suite monitors all 8 agents for drift, consistency, and effectiveness. Reports post to #agent-optimisation on Discord. Here’s how to read them.
Report Types
Universal Reports (U-series)
These run for every agent and check foundational health metrics.
| Report | Frequency | What It Checks |
|---|---|---|
| U-01: Prompt Drift | Weekly (Sun) | Has the agent’s behaviour drifted from its original spec? |
| U-02: Memory Hygiene | Daily | Is the agent’s memory clean and well-organised? |
| U-03: Voice Consistency | Fortnightly | Does the agent still sound like itself? |
| U-04: Context Budget | Monthly | Is the agent using its context window efficiently? |
| U-05: ADHD Protocol | Weekly (Sun) | Are ADHD protocols being activated and working? |
| U-06: Council Handoff | Weekly (Sun) | Are agent-to-agent consultations working smoothly? |
| U-07: Tool Relevance | Monthly | Are the agent’s tools still appropriate? |
| U-08: Regression Testing | Monthly | Has response quality degraded? |
| U-09: Personality Entropy | Monthly | Is the agent’s personality stable or drifting? |
| U-10: Config Drift | Quarterly | Has the agent’s config diverged from the canonical spec? |
Group Reports (G-series)
These check how groups of agents work together.
| Report | Agents | What It Checks |
|---|---|---|
| G-01: Shipping Velocity | Anvil + Sentinel | Are we building and deploying at a healthy pace? |
| G-02: Decision Quality | Compass + Atlas + Vault | Are strategic decisions being made well? |
| G-03: Growth & Wellbeing | Tempo + Cortex | Is learning happening without burnout? |
| G-04: Content Quality | Bard + Cortex | Is content accurate and on-brand? |
Individual Reports (I-series)
Deep dives for specific agents with unique concerns.
| Report | Agent | What It Checks |
|---|---|---|
| I-01: Debt/Shipping Ratio | Anvil | Balance between new features and tech debt |
| I-02: Authenticity | Bard | Is content authentic vs. generic AI slop? |
| I-03: Priority Accuracy | Atlas | Were yesterday’s priority calls correct in hindsight? |
How to Read a Report
Reports in Discord follow a consistent format:
📊 OPTIMISATION: U-01 Prompt Drift — Anvil 🛠️
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Status: ✅ OK | Score: 0.91 | Trend: stable
Dimensions:
Tone: 0.93 ✅ (within bounds)
Scope: 0.88 ✅ (within bounds)
Protocol: 0.95 ✅ (strong)
Consistency: 0.89 ✅ (within bounds)
Recommendations: None — Anvil is operating within spec.
Cost: $0.03 | Tokens: 1,247
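If you want to collect reports outside Discord, the summary line is regular enough to parse. The following is a minimal sketch, assuming the line always has the `Status | Score | Trend` shape shown in the example above; the `ReportSummary` class and `parse_status_line` function are hypothetical names for illustration, not part of the optimisation suite.

```python
import re
from dataclasses import dataclass


@dataclass
class ReportSummary:
    """Hypothetical container for the three fields on the summary line."""
    status: str
    score: float
    trend: str


# Matches e.g. "Status: ✅ OK | Score: 0.91 | Trend: stable".
# The emoji before the status word is treated as optional.
_STATUS_LINE = re.compile(
    r"Status:\s*(?:\S+\s+)?(?P<status>[A-Z_]+)\s*\|\s*"
    r"Score:\s*(?P<score>\d+(?:\.\d+)?)\s*\|\s*"
    r"Trend:\s*(?P<trend>\w+)"
)


def parse_status_line(line: str) -> ReportSummary:
    """Extract status, score, and trend from a report summary line."""
    match = _STATUS_LINE.search(line)
    if match is None:
        raise ValueError(f"not a status line: {line!r}")
    return ReportSummary(
        status=match["status"],
        score=float(match["score"]),
        trend=match["trend"],
    )
```

Calling `parse_status_line("Status: ✅ OK | Score: 0.91 | Trend: stable")` yields a summary with status `OK`, score `0.91`, and trend `stable`.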
Status Meanings
| Status | Colour | Meaning | Action Needed |
|---|---|---|---|
| OK | Green | Everything normal | None |
| WARNING | Yellow | Something is drifting but not critical | Keep an eye on it |
| ACTION_REQUIRED | Red | Needs human attention | Read recommendations and act |
| SKIPPED | Grey | Rate-limited or cost-capped | Will retry next cycle |
Score Ranges
- 0.90 - 1.00 — Excellent, operating within spec
- 0.75 - 0.89 — Good, minor drift detected
- 0.50 - 0.74 — Warning zone, review recommendations
- Below 0.50 — Critical, likely needs manual intervention
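The ranges above can be expressed as a small lookup. This is a sketch rather than the suite's own logic: the function name `score_band` is hypothetical, scores are assumed to lie between 0 and 1, and a score that falls between two listed ranges (for example 0.895) is assumed to belong to the lower band.

```python
def score_band(score: float) -> str:
    """Map a report score to the band names used in this guide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.90:
        return "excellent"
    if score >= 0.75:
        return "good"
    if score >= 0.50:
        return "warning"
    return "critical"
```

With the example report shown earlier, the overall score of 0.91 lands in the "excellent" band and the Scope dimension at 0.88 lands in "good".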
Trend Indicators
- improving — Score is going up over recent runs
- stable — Score is consistent
- worsening — Score is declining — pay attention
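This guide does not say how the suite derives the trend label. As an illustration only, one plausible approach compares the average of the more recent runs with the average of the earlier ones. The function name `classify_trend`, the half-and-half split, and the `tolerance` of 0.02 are all assumptions made for this sketch.

```python
from statistics import mean


def classify_trend(scores: list[float], tolerance: float = 0.02) -> str:
    """Label a score history (oldest first) as improving, stable, or worsening.

    Assumed rule: compare the mean of the later half of the runs with the
    mean of the earlier half; differences within `tolerance` count as stable.
    """
    if len(scores) < 2:
        return "stable"  # not enough history to call a direction
    mid = len(scores) // 2
    delta = mean(scores[mid:]) - mean(scores[:mid])
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "worsening"
    return "stable"
```

The tolerance keeps small run-to-run noise from flipping the label back and forth.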
What to Do When You See ACTION_REQUIRED
1. Read the recommendations. The report includes specific suggestions. Most are straightforward: “update the agent’s memory”, “review the prompt for scope creep”, etc.
2. Check the Airtable history. Open the “Optimisation Runs” table in Airtable to see the trend. Is this a one-off or a pattern?
3. Flag to the team. If you’re not sure what to do, share the Discord message in your team channel. The platform team can investigate.
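To make the "one-off or pattern" question concrete, here is a minimal sketch that works on a plain list of past statuses, for example copied out of the “Optimisation Runs” table. The function name `is_recurring`, the window of five runs, and the threshold of two hits are hypothetical choices, not values defined by the suite.

```python
def is_recurring(statuses: list[str], window: int = 5, min_hits: int = 2) -> bool:
    """Return True if ACTION_REQUIRED appears at least `min_hits` times
    in the most recent `window` runs (statuses are ordered oldest first)."""
    recent = statuses[-window:]
    return recent.count("ACTION_REQUIRED") >= min_hits
```

A single ACTION_REQUIRED surrounded by OK runs returns `False` and is probably a one-off; repeated hits in the window suggest a pattern worth flagging to the team.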
Cost Tracking
Every report shows its cost in USD. The system has built-in limits:
| Limit | Default | What Happens When Hit |
|---|---|---|
| Daily cost cap | $5.00 | Remaining runs skip until tomorrow |
| Hourly rate limit | 50 calls | Runs queue until next hour |
| Per-agent cost cap | $1.00/day | That agent’s runs skip |
If you see lots of “SKIPPED” reports, the daily or per-agent cost cap has most likely been hit. This is by design: it prevents runaway API costs.
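The limits in the table can be read as a simple gate in front of each run. The sketch below uses the default values from the table; the names `Limits`, `Decision`, and `decide`, the order in which the limits are checked, and the use of "greater than or equal" comparisons are assumptions for illustration, not the suite's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    RUN = "run"
    SKIP = "skip"    # reported as SKIPPED; retried next cycle
    QUEUE = "queue"  # held until the next hour


@dataclass(frozen=True)
class Limits:
    """Default limits from the table above."""
    daily_cost_cap: float = 5.00        # USD across all agents
    hourly_call_limit: int = 50         # calls per hour
    per_agent_daily_cap: float = 1.00   # USD per agent per day


def decide(
    spent_today: float,
    agent_spent_today: float,
    calls_this_hour: int,
    limits: Limits = Limits(),
) -> Decision:
    """Decide whether a report run should proceed under the cost limits."""
    if spent_today >= limits.daily_cost_cap:
        return Decision.SKIP
    if agent_spent_today >= limits.per_agent_daily_cap:
        return Decision.SKIP
    if calls_this_hour >= limits.hourly_call_limit:
        return Decision.QUEUE
    return Decision.RUN
```

For example, `decide(spent_today=2.40, agent_spent_today=1.00, calls_this_hour=12)` returns `Decision.SKIP` because that agent has reached its per-agent cap, even though the daily cap still has room.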
Running the full suite at default schedules costs approximately $2-3/day. If costs are consistently higher than that, something may be running more often than expected.