Production Readiness Checklist

Use this checklist to open tracking issues and to gate every go-live.

1) Service basics

  • Health endpoints: liveness + readiness (readiness includes dependency checks)
  • Graceful shutdown (timeouts, draining)
  • Config via env; no hard-coded secrets
  • Versioning: build metadata, git SHA, runtime version endpoint
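
As an illustration of the config and versioning items above, here is a minimal sketch that loads settings from the environment, fails fast when a required secret is missing, and builds the payload a version endpoint could return. The variable names (DATABASE_URL, SHUTDOWN_TIMEOUT_S, GIT_SHA), the Config fields, and the helper names are hypothetical, chosen for this example only.

```python
import os
import platform
from dataclasses import dataclass


class ConfigError(RuntimeError):
    """Raised when a required environment variable is missing or invalid."""


@dataclass(frozen=True)
class Config:
    database_url: str          # secret: supplied by the environment, never hard-coded
    shutdown_timeout_s: float  # how long to drain in-flight work on shutdown
    git_sha: str               # build metadata, assumed to be injected at build time


def load_config(env=None):
    """Build a Config from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    missing = [name for name in ("DATABASE_URL",) if not env.get(name)]
    if missing:
        raise ConfigError("missing required env vars: " + ", ".join(missing))
    try:
        timeout = float(env.get("SHUTDOWN_TIMEOUT_S", "30"))
    except ValueError as exc:
        raise ConfigError("SHUTDOWN_TIMEOUT_S must be a number") from exc
    return Config(
        database_url=env["DATABASE_URL"],
        shutdown_timeout_s=timeout,
        git_sha=env.get("GIT_SHA", "unknown"),
    )


def version_payload(cfg):
    """Return the data a runtime version endpoint might expose."""
    return {
        "git_sha": cfg.git_sha,
        "runtime": "python " + platform.python_version(),
    }
```

Calling load_config() once at startup, before the service accepts traffic, turns a missing secret into a failed deploy instead of a runtime error on the first request.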

2) Reliability

  • Timeouts and retries with backoff; idempotency where needed
  • Rate limiting / abuse protection (public endpoints)
  • External dependency isolation (circuit breakers/bulkheads where appropriate)
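
The "retries with backoff" item can be sketched as below. This is one possible shape, assuming exponential backoff with full jitter and a fixed attempt cap; the function name, its parameters, and the default values are illustrative, not a prescribed policy. Only wrap operations that are idempotent, since a retried call may run more than once.

```python
import random
import time


def retry_with_backoff(fn, *, attempts=4, base_delay=0.1, max_delay=2.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep, rng=random.random):
    """Call fn(), retrying transient failures with exponential backoff.

    sleep and rng are injectable so the behaviour can be tested
    without real waiting or real randomness.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts - 1:
                raise
            # Exponential growth, capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random fraction of the cap so that
            # many clients do not retry in lockstep.
            sleep(rng() * cap)
```

Exceptions outside retry_on propagate immediately, which keeps programming errors from being retried. The per-call timeout itself is assumed to be enforced inside fn.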

3) Observability

  • Structured logs with request correlation IDs
  • Metrics for golden signals (latency, traffic, errors, saturation)
  • Traces across service boundaries (OpenTelemetry)
  • Dashboards per service + platform overview
  • Alerts tied to SLOs (fast-burn and slow-burn error budget alerts)
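
To make the fast-burn and slow-burn idea concrete: burn rate is the observed error ratio divided by the error budget (1 minus the SLO target), so a burn rate of 1 spends the budget exactly over the SLO period. The sketch below computes it and maps two windows to an alert level. The thresholds 14.4 (short window) and 6 (long window) are assumptions borrowed from commonly cited multiwindow guidance for a 30-day SLO; the function names and the "page"/"ticket" labels are hypothetical.

```python
def burn_rate(errors, total, slo_target):
    """Return how fast the error budget is being consumed.

    errors / total is the observed error ratio in the window;
    1 - slo_target is the error budget.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total == 0:
        return 0.0  # no traffic in the window, so no budget was spent
    budget = 1.0 - slo_target
    return (errors / total) / budget


def classify_burn(fast_window_rate, slow_window_rate,
                  fast_threshold=14.4, slow_threshold=6.0):
    """Map burn rates from a short and a long window to an alert level.

    The default thresholds are assumptions; tune them to your SLO period.
    """
    if fast_window_rate >= fast_threshold:
        return "page"    # budget is burning fast: wake someone up
    if slow_window_rate >= slow_threshold:
        return "ticket"  # sustained slow burn: handle in working hours
    return "ok"
```

For a 99% SLO, 10 errors in 1000 requests gives a burn rate of about 1, while 200 errors in 1000 gives about 20 and would cross the fast-burn threshold assumed here.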

4) Security

  • Authentication + authorization model documented
  • Audit logging for sensitive actions
  • Secrets management documented and enforced
  • Dependency scanning in CI

5) Delivery

  • CI gates: lint, unit tests, integration tests, security checks
  • CD: pinned build artifact; deploy uses the artifact, not the working tree
  • Rollback procedure documented and tested
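
One way to enforce "deploy uses the artifact" is to pin the artifact by digest and verify it before rollout. The sketch below assumes the pin is a SHA-256 hex digest recorded by CI; the function names and the idea of storing the digest next to the release record are illustrative assumptions, not a description of any particular CD tool.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large artifacts fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, pinned_sha256):
    """Raise ValueError unless the file matches the pinned digest."""
    actual = sha256_of(path)
    # compare_digest avoids leaking where the strings first differ.
    if not hmac.compare_digest(actual, pinned_sha256.lower()):
        raise ValueError(
            f"artifact digest mismatch: expected {pinned_sha256}, got {actual}"
        )
    return actual
```

Running the same check during a rollback confirms that the previous release is restored from its recorded artifact and not rebuilt from source.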

6) Data (if applicable)

  • Migrations policy documented
  • Backups enabled; restore tested
  • Data retention/deletion policy documented