Observability Standards
Goal: full observability acrossapi and mcp-server with minimal operational toil.
Signals
- Logs: structured (JSON), consistent fields, PII-safe
- Metrics: golden signals + key business metrics
- Traces: end-to-end for core flows
Required conventions
- Request ID propagated (
x-request-idor equivalent) - Trace context propagated (W3C
traceparent) - Log fields:
service,env,version,request_id,trace_id,span_id
Recommended stack (implementation-agnostic)
- Instrumentation: OpenTelemetry SDK + OTLP exporter
- Collector: OpenTelemetry Collector
- Metrics: Prometheus-compatible + Grafana
- Logs: Loki/ELK-style stack
- Traces: Tempo/Jaeger-style backend
Dashboards
- Per-service: latency p50/p95/p99, RPS, error rate, saturation
- Platform: dependency health, top endpoints/operations, alert status