Production Readiness Checklist
Use this checklist to drive issues and to gate any go-live.
1) Service basics
- Health endpoints: liveness + readiness (and dependency checks)
- Graceful shutdown (timeouts, draining)
- Config via env; no hard-coded secrets
- Versioning: build metadata, git sha, runtime version endpoint
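
To illustrate the difference between liveness and readiness, here is a minimal sketch assuming the service exposes two handlers that return an HTTP status code and a JSON-style body. The function names and the shape of the `checks` mapping are hypothetical; wire them into whatever web framework the service already uses.

```python
from typing import Callable, Dict, Tuple


def liveness() -> Tuple[int, dict]:
    # Liveness only reports that the process is running.
    # It deliberately checks no dependencies, so a database outage
    # does not cause the orchestrator to restart a healthy process.
    return 200, {"status": "alive"}


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    # Readiness runs one check per dependency (hypothetical callables
    # supplied by the service, e.g. a cheap database ping).
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises is treated as a failed dependency.
            results[name] = False
    ok = all(results.values())
    body = {"status": "ready" if ok else "not_ready", "checks": results}
    # 503 tells the load balancer to stop routing traffic here.
    return (200 if ok else 503), body
```

Keeping dependency checks out of the liveness handler is a design choice: a failing dependency should take the instance out of rotation, not trigger a restart loop.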
2) Reliability
- Timeouts and retries with backoff; idempotency where needed
- Rate limiting / abuse protection (public endpoints)
- External dependency isolation (circuit breakers/bulkheads where appropriate)
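
As one possible shape for the retry item above, the following sketch retries a callable with capped exponential backoff and full jitter. The helper name `retry_with_backoff` and its default values are assumptions for illustration, not a recommendation for any particular service. It is only safe to wrap operations that are idempotent.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 4,      # illustrative default
    base_delay: float = 0.1,    # seconds, illustrative default
    max_delay: float = 2.0,     # cap on any single delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception:
            # Real code should catch only errors known to be transient.
            if attempt == max_attempts:
                raise
            # Exponential growth, capped, then full jitter so that
            # many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the delay schedule can be tested without waiting. The per-call timeout belongs inside `op` itself, since this helper does not interrupt a call that hangs.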
3) Observability
- Structured logs with request correlation IDs
- Metrics for golden signals (latency, traffic, errors, saturation)
- Traces across service boundaries (OpenTelemetry)
- Dashboards per service + platform overview
- Alerts tied to SLOs (fast-burn and slow-burn error-budget alerts)
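
The fast and slow burn alerts mentioned above are usually expressed as an error-budget burn rate: the observed error ratio divided by the ratio the SLO allows. The sketch below assumes request counts are available per window; the function names are hypothetical, and the thresholds 14.4 and 6 are illustrative values in the style of common multi-window alerting guidance, to be tuned per service.

```python
def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    # slo_target is a success ratio such as 0.99 or 0.999.
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if requests <= 0:
        return 0.0  # no traffic, so no budget consumed
    error_budget = 1.0 - slo_target
    # A burn rate of 1.0 spends the budget exactly over the SLO period.
    return (errors / requests) / error_budget


def classify(
    fast_window_rate: float,
    slow_window_rate: float,
    fast_threshold: float = 14.4,  # illustrative, e.g. a 1-hour window
    slow_threshold: float = 6.0,   # illustrative, e.g. a 6-hour window
) -> str:
    # Fast burn pages someone; slow burn opens a ticket.
    if fast_window_rate >= fast_threshold:
        return "page"
    if slow_window_rate >= slow_threshold:
        return "ticket"
    return "ok"
```

For example, with a 99% SLO, 150 errors in 1000 requests is an error ratio of 0.15 against a budget of 0.01, a burn rate of roughly 15, which exceeds the illustrative fast threshold.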
4) Security
- Authentication + authorization model documented
- Audit logging for sensitive actions
- Secrets management documented and enforced
- Dependency scanning in CI
5) Delivery
- CI gates: lint, unit tests, integration tests, security checks
- CD: pinned build artifact; deploy uses the artifact, not the working tree
- Rollback procedure documented and tested
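
One way to enforce "deploy uses the artifact, not the working tree" is to pin the artifact by digest at build time and verify it before deploying. The sketch below assumes a SHA-256 hex digest was recorded by CI; the helper names are hypothetical and the in-memory artifact stands in for a real file opened in binary mode.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_hex(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    # Hash in chunks so large artifacts are not loaded into memory at once.
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(stream: BinaryIO, pinned_sha256: str) -> bool:
    # Compare against the digest recorded at build time.
    return hmac.compare_digest(sha256_hex(stream), pinned_sha256.strip().lower())


# Illustrative usage with made-up artifact bytes.
artifact_bytes = b"example build artifact bytes"
pinned = hashlib.sha256(artifact_bytes).hexdigest()
deploy_allowed = verify_artifact(io.BytesIO(artifact_bytes), pinned)
```

A deploy step would refuse to proceed when `verify_artifact` returns False, which also makes rollback simpler: rolling back means deploying an earlier pinned digest.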
6) Data (if applicable)
- Migrations policy documented
- Backups enabled; restore tested
- Data retention/deletion policy documented