Observability Standards

Goal: full observability across api and mcp-server with minimal operational toil.

Signals

  • Logs: structured (JSON), consistent fields, PII-safe
  • Metrics: golden signals + key business metrics
  • Traces: end-to-end for core flows
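As an illustration of the "structured (JSON), PII-safe" requirement for logs, the sketch below uses only the Python standard library. The `SENSITIVE_KEYS` set, the `fields` attribute on the log record, and the `[REDACTED]` placeholder are assumptions made for this example, not part of the standard; a real service would choose its own list of sensitive fields.

```python
import json
import logging

# Hypothetical set of field names treated as PII; adjust to your data model.
SENSITIVE_KEYS = {"email", "phone", "password", "token"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, masking sensitive fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Extra fields arrive via logger.info(..., extra={"fields": {...}}).
        for key, value in getattr(record, "fields", {}).items():
            payload[key] = "[REDACTED]" if key in SENSITIVE_KEYS else value
        # default=str keeps non-JSON types (datetimes, UUIDs) from raising.
        return json.dumps(payload, default=str)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes one JSON line per event to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `build_logger("api").info("login", extra={"fields": {"email": "a@example.com", "user_id": 7}})` would emit a single JSON line in which `email` is masked and `user_id` is kept. Masking by key name is the simplest approach; it does not catch PII embedded in free-text messages.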

Required conventions

  • Request ID propagated (x-request-id or equivalent)
  • Trace context propagated (W3C traceparent)
  • Log fields: service, env, version, request_id, trace_id, span_id
  • Instrumentation: OpenTelemetry SDK with an OTLP exporter
  • Collector: OpenTelemetry Collector
  • Metrics: Prometheus-compatible backend, visualized in Grafana
  • Logs: Loki- or ELK-style stack
  • Traces: Tempo- or Jaeger-style backend
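The propagation and log-field conventions above can be sketched by hand as follows. In practice the OpenTelemetry SDK's propagators would normally do this work; the sketch only shows what is being carried. The function names are hypothetical, header keys are assumed to be lowercased already, only the version-00 `traceparent` layout is handled, and starting a new trace with the sampled flag `01` is an assumption of this example.

```python
import re
import secrets
import uuid
from typing import Dict, Mapping, Optional

# W3C traceparent, version-00 layout: version-traceid-parentid-flags.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value: str) -> Optional[Dict[str, str]]:
    """Return the parsed header fields, or None if the header is invalid."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C specification.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return {"trace_id": trace_id, "span_id": span_id, "flags": flags}


def outbound_headers(inbound: Mapping[str, str]) -> Dict[str, str]:
    """Build headers for a downstream call, reusing inbound IDs if present."""
    request_id = inbound.get("x-request-id") or str(uuid.uuid4())
    parent = parse_traceparent(inbound.get("traceparent", ""))
    trace_id = parent["trace_id"] if parent else secrets.token_hex(16)
    flags = parent["flags"] if parent else "01"  # assumption: sample new traces
    span_id = secrets.token_hex(8)  # new span ID for this hop
    return {
        "x-request-id": request_id,
        "traceparent": f"00-{trace_id}-{span_id}-{flags}",
    }


def log_fields(headers: Mapping[str, str], service: str, env: str,
               version: str) -> Dict[str, str]:
    """Assemble the required log fields from the outbound headers."""
    context = parse_traceparent(headers["traceparent"])
    if context is None:
        raise ValueError("invalid traceparent header")
    return {
        "service": service,
        "env": env,
        "version": version,
        "request_id": headers["x-request-id"],
        "trace_id": context["trace_id"],
        "span_id": context["span_id"],
    }
```

The trace ID stays constant across hops while each hop mints its own span ID, which is what lets a log line in mcp-server be joined to the originating request in api.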

Dashboards

  • Per-service: latency p50/p95/p99, RPS, error rate, saturation
  • Platform: dependency health, top endpoints/operations, alert status
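To make the per-service panel definitions concrete, the sketch below computes latency percentiles, RPS, and error rate from raw request samples. It is illustrative only: dashboards on a Prometheus-compatible backend would typically derive these from histograms and counters rather than raw samples. The nearest-rank percentile method, the function names, and treating HTTP status 500 and above as an error are assumptions of this example; saturation is not covered because it depends on the resource being measured.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_values: Sequence[float], p: int) -> float:
    """Nearest-rank percentile of an ascending sequence, p in 1..100."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms: Sequence[float], statuses: Sequence[int],
              window_seconds: float) -> Dict[str, float]:
    """Summarize one time window of requests for a per-service dashboard."""
    ordered = sorted(latencies_ms)
    total = len(statuses)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for status in statuses if status >= 500)
    return {
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "rps": total / window_seconds,
        "error_rate": errors / total if total else 0.0,
    }
```

For example, 100 requests with latencies of 1 to 100 ms over a 10-second window, two of which returned 5xx, give p50 = 50, p95 = 95, p99 = 99, an RPS of 10.0, and an error rate of 0.02.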