KAHN Scope: per-agent dashboard ships

Headline

/#/agents is no longer a row-per-run table. It’s a row-per-agent dashboard — busiest first, outcome histogram per row, p50/p95 duration, mean convergence, sparkline trend, last-seen relative timestamp. The operator’s eye lands on the right thing in one render.

What changed

The Phase A surface answered “which runs ran?” The Phase B dashboard answers questions Phase A couldn’t:
  • Which agent is busiest? Rows ordered by run_count descending.
  • Which agent is failing? Outcome histogram (5-segment stacked bar, scaled across rows so cross-row comparison is honest).
  • How fast are agents converging? p50/p95 duration columns; linear-interpolation percentiles, byte-equivalent across FS+Postgres backends.
  • Is convergence trending up? Sparkline column with fixed [0,1] y-axis (cross-row comparable), endpoint dot coloured by the convergence-badge bucket of the latest score.
  • When was this agent last active? Relative-time last_seen.
Click any agent_id and the surface filters to that agent’s run list — Phase A’s row-per-run table is preserved as the inner panel. Both surfaces are reachable; neither has regressed.
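
For readers who want to see the arithmetic, here is a minimal sketch of the per-agent rollup. It assumes the linear-interpolation convention where rank = q × (n − 1), the same as NumPy's default "linear" method; the post does not say which variant the backends use. The function names, the (agent_id, duration) input shape, and the agent_id tie-break are illustrative assumptions, not the shipped code.

```python
from collections import defaultdict
from math import floor


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 1], using rank = q * (n - 1)."""
    if not values:
        raise ValueError("percentile of an empty sequence is undefined")
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lo = floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    # Interpolate between the two neighbouring order statistics.
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def rollup(runs):
    """Group (agent_id, duration_s) pairs into one row per agent, busiest first."""
    by_agent = defaultdict(list)
    for agent_id, duration_s in runs:
        by_agent[agent_id].append(duration_s)
    rows = [
        {
            "agent_id": agent_id,
            "run_count": len(durations),
            "p50": percentile(durations, 0.50),
            "p95": percentile(durations, 0.95),
        }
        for agent_id, durations in by_agent.items()
    ]
    # run_count descending; the agent_id tie-break is an assumption that keeps
    # the ordering deterministic.
    rows.sort(key=lambda row: (-row["run_count"], row["agent_id"]))
    return rows
```

Under this convention, durations of 10, 20, 30 and 40 seconds give p50 = 25 and p95 = 38.5. Computing percentiles in one shared code path, rather than in each storage backend's own query language, is one way to keep results identical across backends.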

The window selector

24h | 7d | all toggles the rollup window. Unknown values are rejected with a 400 against a documented allow-list, so operators cannot typo new windows into existence.
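
The allow-list check can be sketched roughly as follows. The three window values come from this post; the function name, the response body shape, and the choice to echo the allowed values in the error are assumptions for illustration, not the documented wire format.

```python
from datetime import datetime, timedelta, timezone

# Allow-list from the post; "all" means no lower time bound.
WINDOWS = {"24h": timedelta(hours=24), "7d": timedelta(days=7), "all": None}


def resolve_window(value, now=None):
    """Return (status, body) for a requested rollup window (hypothetical helper)."""
    if value not in WINDOWS:
        # Reject anything outside the allow-list rather than guessing a window.
        return 400, {"error": f"unknown window {value!r}", "allowed": list(WINDOWS)}
    now = now or datetime.now(timezone.utc)
    delta = WINDOWS[value]
    since = None if delta is None else now - delta
    return 200, {"window": value, "since": since}
```

A request for 1w would come back as a 400 here instead of being silently mapped to the nearest known window.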

Why this matters for pilots

The kahn.host external-pilot conversation through Phase A was “we can show you a list of CI runs with agent labels.” Phase B converts it into “we can show you which of your agents is currently degrading, before your customers tell you.” That’s the kahn.tools pitch made concrete.

What’s next

The dashboard is the substrate for two follow-on surfaces: cross-run audit flakiness (/#/audits, also shipped Phase B) and a future per-agent reliability heatmap (north-star Phase C deliverable). The aggregation endpoint serving the dashboard — /api/agent-runs/aggregate — is wire-shape-pinned in kahn-hq/docs/spec/agent-runs-aggregate.md and ready for SDK consumption. The landing-page rewrite at kahn.tools can now anchor on real screenshots of the dashboard, not placeholder copy. Specifically: a screenshot of the audit-runner row with its 6-event rollup (3 audits × pass+fail) is in tree as a check-in fixture and renders deterministically.