KAHN Cloud: per-tenant /api/self ships

Headline

/api/self/tenant is a new auth-scoped endpoint that answers two operator questions the public /api/self cannot: “How many agent runs does my tenant have in storage?” and “Is my producer being rate-limited?”

What changed

The trust boundary is now explicit:
  • GET /api/self — public, no auth. Process-global liveness signal: uptime, last-event ages, archived-runs count. Consumed by runbook smoke probes that intentionally curl without a JWT (post-deploy uptime checks, external monitors).
  • GET /api/self/tenant — auth-scoped (airlock JWT in cloud, localhost sentinel in OSS). Per-tenant data: agent_archived_runs (real count, not a placeholder), rate_limit snapshot.
The prior placeholder agent_archived_runs: 0 on the public endpoint is gone. It was dead weight in the response contract: no SPA surface rendered it.
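The split can be pictured with a minimal sketch. Everything below is illustrative rather than the actual KAHN handler: the names handle_get, resolve_tenant, TOKEN_TO_TENANT, and ARCHIVED_RUNS are hypothetical, the token lookup stands in for real JWT verification, the uptime_s field name is invented, and the 401 response for unauthenticated callers is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-ins; a real deployment would verify a JWT
# and query storage instead.
TOKEN_TO_TENANT = {"demo-token": "tenant-a"}
ARCHIVED_RUNS = {"tenant-a": 3}


@dataclass
class Response:
    status: int
    body: dict


def resolve_tenant(headers: dict) -> Optional[str]:
    """Return the tenant id for a bearer token, or None if absent or unknown."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return None
    return TOKEN_TO_TENANT.get(auth[len("Bearer "):])


def handle_get(path: str, headers: dict) -> Response:
    if path == "/api/self":
        # Public: process-global liveness only, never per-tenant data.
        return Response(200, {"uptime_s": 12.5})
    if path == "/api/self/tenant":
        tenant_id = resolve_tenant(headers)
        if tenant_id is None:
            # Assumed behaviour: reject unauthenticated callers with 401.
            return Response(401, {"error": "unauthorized"})
        count = ARCHIVED_RUNS.get(tenant_id, 0)
        return Response(200, {"agent_archived_runs": count})
    return Response(404, {"error": "not found"})
```

The point of the shape is that the public branch never touches tenant state, so an unauthenticated smoke probe cannot leak per-tenant counts even by accident.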

The rate-limit snapshot

rate_limit carries:
  • capacity, refill_rate_per_sec — the bucket configuration
  • remaining_tokens — current bucket level (continuous-refill float)
  • recent_throttled_count — number of HTTP 429s this tenant received in the last 5 minutes
  • window_s — width of the throttled-event ring buffer
The snapshot is read-only. Repeated polls do NOT consume tokens. SPA dashboards refreshing every 5s would otherwise throttle the producer they’re monitoring; this property is a unit-tested invariant (“does not consume tokens under 50 polls”).
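One way to get a read-only snapshot from a continuous-refill bucket is to compute the current level from elapsed time without writing it back. The sketch below assumes that design; the TokenBucket class, its method names, and the injected now parameter are hypothetical and not taken from the KAHN codebase. Only the field names capacity, refill_rate_per_sec, and remaining_tokens come from the snapshot described above.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float
    refill_rate_per_sec: float
    tokens: float        # level as of updated_at
    updated_at: float    # timestamp (seconds) of the last state change

    def _level_at(self, now: float) -> float:
        # Pure computation: stored level plus refill since updated_at, capped.
        elapsed = max(0.0, now - self.updated_at)
        return min(self.capacity, self.tokens + elapsed * self.refill_rate_per_sec)

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Hot path: the only method that mutates bucket state.
        level = self._level_at(now)
        self.updated_at = now
        if level >= cost:
            self.tokens = level - cost
            return True
        self.tokens = level
        return False

    def snapshot(self, now: float) -> dict:
        # Read-only: reports the level without storing it or spending a token.
        return {
            "capacity": self.capacity,
            "refill_rate_per_sec": self.refill_rate_per_sec,
            "remaining_tokens": self._level_at(now),
        }
```

Because snapshot() never assigns to tokens or updated_at, any number of dashboard polls leaves the producer's budget untouched, which is the invariant the unit test checks.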

Why this matters

A producer asking “is my emitter throttled?” used to require either grepping logs or making the operator triage a 429 spike on backend metrics. Now it’s a JWT-authenticated curl that returns the answer:
# Print only the rate_limit object; -sS hides the progress meter but still shows errors
curl -sS -H "Authorization: Bearer $KAHN_LIVE_SK" \
  https://api.kahn.host/api/self/tenant | jq .rate_limit
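recent_throttled_count: 0 means the tenant received no 429s within the window and is at steady state. A non-zero value means throttling is happening now or happened within the window. Expired throttle events are pruned lazily when the snapshot is read, not when a request is checked, so the hot allow() path stays O(1) regardless of throttle volume.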
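A minimal sketch of lazy prune-on-read, under stated assumptions: the ThrottleLog class, its method names, and the max_events cap are hypothetical, and the 300-second default simply mirrors the 5-minute window mentioned above. A bounded deque stands in for the ring buffer.

```python
from collections import deque


class ThrottleLog:
    def __init__(self, window_s: float = 300.0, max_events: int = 1024):
        self.window_s = window_s
        # Bounded deque: appending past maxlen silently drops the oldest entry,
        # so memory stays fixed and the reported count is capped at max_events.
        self._events = deque(maxlen=max_events)

    def record(self, now: float) -> None:
        # Called when a request is rejected with 429: O(1), no scanning.
        self._events.append(now)

    def recent_count(self, now: float) -> int:
        # Called only when the snapshot is read: drop entries that have aged out.
        cutoff = now - self.window_s
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events)
```

The cost of pruning lands on the reader (the dashboard poll) rather than on the request path, so a burst of throttled requests adds only constant-time appends.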

Substrate work this enables

The dashboards on /#/agents and /#/audits consume real per-tenant data, not a placeholder. Building Phase B against the placeholder would have required a second pass — substrate-first sequencing prevented the rework. agent_archived_runs is now wired through both storage backends with parity-tested counts (count_agent_runs() on FS + Postgres). The trust-boundary split (public vs. tenant-scoped) is a pattern any future per-tenant introspection endpoint can copy.
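A parity test of this kind could look roughly like the sketch below. It is not the project's test: FsStore, SqlStore, archive_run, and run_parity_check are hypothetical names, the one-file-per-run layout and the table schema are invented, and an in-memory SQLite database stands in for Postgres so the example stays self-contained. Only count_agent_runs() is named in the text above.

```python
import sqlite3
import tempfile
from pathlib import Path


class FsStore:
    """Hypothetical filesystem layout: one JSON file per run, one dir per tenant."""

    def __init__(self, root: Path):
        self.root = root

    def archive_run(self, tenant_id: str, run_id: str) -> None:
        tenant_dir = self.root / tenant_id
        tenant_dir.mkdir(parents=True, exist_ok=True)
        (tenant_dir / f"{run_id}.json").write_text("{}")

    def count_agent_runs(self, tenant_id: str) -> int:
        tenant_dir = self.root / tenant_id
        if not tenant_dir.is_dir():
            return 0
        return sum(1 for _ in tenant_dir.glob("*.json"))


class SqlStore:
    """SQLite used here only as a stand-in for the Postgres backend."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE agent_runs ("
            "tenant_id TEXT, run_id TEXT, PRIMARY KEY (tenant_id, run_id))"
        )

    def archive_run(self, tenant_id: str, run_id: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO agent_runs VALUES (?, ?)", (tenant_id, run_id)
        )

    def count_agent_runs(self, tenant_id: str) -> int:
        row = self.conn.execute(
            "SELECT COUNT(*) FROM agent_runs WHERE tenant_id = ?", (tenant_id,)
        ).fetchone()
        return row[0]


def run_parity_check() -> dict:
    """Write the same runs to both stores and return the counts side by side."""
    with tempfile.TemporaryDirectory() as tmp:
        stores = [FsStore(Path(tmp)), SqlStore()]
        for store in stores:
            store.archive_run("tenant-a", "run-1")
            store.archive_run("tenant-a", "run-2")
            store.archive_run("tenant-b", "run-1")
        tenants = ("tenant-a", "tenant-b", "tenant-c")
        return {t: [s.count_agent_runs(t) for s in stores] for t in tenants}
```

Including a tenant with no runs (tenant-c here) is worth the extra line, since the empty case is where a missing directory and an empty query result are most likely to diverge.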

What’s next

Per-tenant audit-policy webhooks (Phase E E1) consume this endpoint shape. Rate-limit visibility is a precondition for the per-(tenant_id, agent_id) bucket overlay (Phase D D4) — operators need to see the limit before they can argue for granularity.