SLOs (Service Level Objectives)
Define SLOs per service and per critical endpoint/flow.
Starter SLOs
api
- Availability: 99.9% monthly for core endpoints
- Latency: 95% of requests under 300ms (adjust per endpoint)
- Error rate: < 0.1% (5xx) for core endpoints
mcp-server
- Availability: 99.9% monthly
- Latency: 95% under 500ms for core tool calls (adjust per operation)
- Error rate: < 0.2% (adjust based on upstream dependency behavior)
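To make the starter targets concrete, the sketch below converts an availability target into an allowed-downtime figure and checks a latency target against a sample of request timings. The function names, the 30-day window, and the choice to treat an empty sample as compliant are assumptions made for illustration, not part of any existing tooling.

```python
from typing import Sequence


def allowed_downtime_minutes(availability_target: float, days: int = 30) -> float:
    """Minutes of full downtime permitted in the window for a given target.

    Assumes a fixed-length window (30 days by default) as the "month".
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - availability_target)


def latency_slo_met(
    latencies_ms: Sequence[float],
    threshold_ms: float,
    target_fraction: float = 0.95,
) -> bool:
    """True if at least target_fraction of requests finished under threshold_ms."""
    if not latencies_ms:
        # Assumption: no traffic in the sample counts as compliant.
        return True
    within = sum(1 for value in latencies_ms if value < threshold_ms)
    return within / len(latencies_ms) >= target_fraction


if __name__ == "__main__":
    # 99.9% over 30 days leaves roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(0.999), 1))
    # 96 of 100 requests under 300 ms satisfies a 95% target.
    print(latency_slo_met([120.0] * 96 + [450.0] * 4, threshold_ms=300))
```

At 99.9% the monthly allowance is under 45 minutes, which is a useful sanity check before committing to the target for a given endpoint.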
Error budgets
- Tie alerting to budget burn rates (fast + slow burn)
- Allocate explicit time for reliability work when a budget is being consumed faster than planned
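A minimal sketch of burn-rate alerting with a fast and a slow window follows. Burn rate here is the observed error ratio divided by the budgeted error ratio (1 minus the SLO target), so a value of 1.0 spends the budget exactly over the SLO window. The rule names, window lengths, and the 14.4 and 6.0 thresholds are assumptions: they are commonly cited starting points that correspond to spending 2% of a 30-day budget in 1 hour and 5% in 6 hours, and should be tuned per service.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BurnRule:
    name: str
    window_hours: float
    threshold: float  # burn-rate multiple at which the alert fires


# Assumed starting thresholds for a 30-day SLO window; tune per service.
RULES = (
    BurnRule(name="fast", window_hours=1, threshold=14.4),
    BurnRule(name="slow", window_hours=6, threshold=6.0),
)


def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """Observed error ratio divided by the budgeted error ratio."""
    if requests == 0:
        return 0.0  # no traffic, nothing burned
    error_ratio = errors / requests
    budget_ratio = 1.0 - slo_target
    return error_ratio / budget_ratio


def firing_alerts(
    window_counts: Dict[str, Tuple[int, int]], slo_target: float
) -> List[str]:
    """Return names of rules whose burn rate meets or exceeds the threshold.

    window_counts maps a rule name to (errors, requests) measured over
    that rule's window.
    """
    fired = []
    for rule in RULES:
        errors, requests = window_counts[rule.name]
        if burn_rate(errors, requests, slo_target) >= rule.threshold:
            fired.append(rule.name)
    return fired


if __name__ == "__main__":
    counts = {"fast": (20, 1000), "slow": (30, 10000)}
    print(firing_alerts(counts, slo_target=0.999))
```

In this example the last hour shows a 2% error ratio against a 0.1% budget (burn rate about 20), so the fast rule fires, while the 6-hour window at about 3x stays below the slow threshold.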