TLS Certificate Expiry Recovery

P1 - High Priority: Certificate expiry causes immediate user-facing errors. All HTTPS traffic will fail.

Trigger Patterns

# Check certificate expiry
echo | openssl s_client -connect {{DOMAIN}}:443 2>/dev/null | openssl x509 -noout -dates

# Common log patterns
grep -E "(certificate.*expired|SSL.*invalid|x509.*expired)" /var/log/nginx/*.log
| Pattern | Browser Shows | Impact |
|---|---|---|
| NET::ERR_CERT_DATE_INVALID | "Your connection is not private" | All users blocked |
| SSL_ERROR_CERT_DATE_INVALID | Firefox security warning | All users blocked |
| certificate verify failed | API calls fail silently | Backend services down |
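
The expiry math behind these checks can be rehearsed offline. A minimal sketch, assuming `openssl` and GNU `date` are installed; it generates a throwaway self-signed certificate as a stand-in for a real endpoint's cert:

```shell
# Create a 10-day self-signed cert as a stand-in for a live endpoint's cert
openssl req -x509 -newkey rsa:2048 -nodes -days 10 \
  -keyout /tmp/demo.key -out /tmp/demo.crt \
  -subj "/CN=demo.example" 2>/dev/null

# Extract notAfter and compute whole days remaining (GNU date)
end=$(openssl x509 -noout -enddate -in /tmp/demo.crt | cut -d= -f2)
days=$(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
echo "days remaining: $days"
```

Swap `-in /tmp/demo.crt` for the `openssl s_client` pipe shown above to run the same math against a live endpoint.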

Phase 1: Detect & Assess

AI Prompt: Certificate Triage

You are a security engineer triaging a certificate expiry incident.

**Alert Details:**
- Domain(s) Affected: {{DOMAINS}}
- Certificate Details:

{{OPENSSL_OUTPUT}}

- Current Time: {{CURRENT_TIME}}
- Expiry Time: {{EXPIRY_TIME}}

**Determine:**
1. Is the certificate actually expired or about to expire?
2. What type of certificate is it? (DV, OV, EV, wildcard, SAN)
3. Who is the issuer? (Let's Encrypt, DigiCert, internal CA, etc.)
4. What's the blast radius? (which services/domains affected)
5. Is this a renewal failure or a new certificate needed?

**Output:**
{
  "status": "expired|expiring_soon|valid",
  "certificate_type": "string",
  "issuer": "string",
  "affected_domains": ["string"],
  "affected_services": ["string"],
  "root_cause_hypothesis": "string",
  "recommended_action": "string"
}

Quick Diagnostic Commands

# Check when cert expires
echo | openssl s_client -connect example.com:443 2>/dev/null | \
  openssl x509 -noout -enddate

# Inspect the leaf certificate's validity window
# (openssl x509 parses only the first cert in the piped chain)
echo | openssl s_client -connect example.com:443 -showcerts 2>/dev/null | \
  openssl x509 -noout -text | grep -A2 "Validity"

# List all TLS secrets across a Kubernetes cluster
kubectl get secrets --all-namespaces -o json | \
  jq -r '.items[] | select(.type=="kubernetes.io/tls") |
  "\(.metadata.namespace)/\(.metadata.name)"'

# Check cert-manager certificates
kubectl get certificates --all-namespaces

Phase 2: Diagnose Root Cause

AI Prompt: Renewal Failure Analysis

You are debugging why automatic certificate renewal failed.

**Environment:**
- Certificate Manager: {{CERT_MANAGER}} (cert-manager, certbot, acme.sh, manual)
- DNS Provider: {{DNS_PROVIDER}}
- Challenge Type: {{CHALLENGE_TYPE}} (http-01, dns-01, tls-alpn-01)

**Logs:**
{{CERT_MANAGER_LOGS}}

**Analyze:**
1. Did the renewal attempt happen?
2. What challenge type was used?
3. Where did the challenge fail? (authorization, validation, issuance)
4. Is this a DNS propagation issue, rate limit, or configuration problem?

**Output a diagnosis with specific fix steps.**

Common Failure Modes

- Renewal never ran: certbot timer/cron disabled, or cert-manager not watching the namespace
- dns-01 validation failed: DNS provider credentials expired, or the TXT record had not propagated when the CA checked
- http-01 validation failed: port 80 blocked, or a redirect rule intercepting /.well-known/acme-challenge/
- Rate limited: too many recent issuance attempts (Let's Encrypt enforces per-domain weekly limits)
- Issued but not deployed: the new certificate exists, but the web server or ingress was never reloaded

Phase 3: Execute Recovery

AI Prompt: Generate Recovery Steps

You are generating recovery steps for certificate expiry.

**Context:**
- Certificate Manager: {{CERT_MANAGER}}
- Diagnosis: {{DIAGNOSIS}}
- Urgency: Certificate {{EXPIRED_OR_EXPIRING}} in {{TIME_REMAINING}}

**Generate step-by-step recovery for the specific certificate manager in use.**

**Include:**
1. Immediate mitigation (if any)
2. Certificate renewal/regeneration steps
3. Verification commands
4. Rollback steps if renewal fails
5. Post-recovery verification

**Format each step with command and expected output.**

Recovery Commands by Platform

cert-manager (Kubernetes)

# 1. Check certificate status
kubectl describe certificate {{CERT_NAME}} -n {{NAMESPACE}}

# 2. Check certificate request
kubectl get certificaterequest -n {{NAMESPACE}}

# 3. Force renewal by deleting the secret (cert-manager re-creates it;
#    `cmctl renew {{CERT_NAME}}` is a gentler alternative where available)
kubectl delete secret {{SECRET_NAME}} -n {{NAMESPACE}}

# 4. Trigger certificate re-issuance
kubectl delete certificate {{CERT_NAME}} -n {{NAMESPACE}}
kubectl apply -f certificate.yaml

# 5. Watch for successful issuance
kubectl get certificate {{CERT_NAME}} -n {{NAMESPACE}} -w

# 6. Check challenge status (if stuck)
kubectl get challenges -n {{NAMESPACE}}
kubectl describe challenge {{CHALLENGE_NAME}} -n {{NAMESPACE}}

# 7. Verify new certificate
kubectl get secret {{SECRET_NAME}} -n {{NAMESPACE}} -o jsonpath='{.data.tls\.crt}' | \
  base64 -d | openssl x509 -noout -dates
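
Step 7's date check can be turned into a pass/fail gate with openssl's `-checkend` flag, which exits non-zero if the certificate expires within the given number of seconds. A sketch against a locally generated stand-in; in practice, feed it the decoded `tls.crt` from the secret:

```shell
# Stand-in: a fresh 90-day self-signed cert
openssl req -x509 -newkey rsa:2048 -nodes -days 90 \
  -keyout /tmp/renewed.key -out /tmp/renewed.crt \
  -subj "/CN=renewed.example" 2>/dev/null

# Non-zero exit if the cert expires within 14 days
if openssl x509 -checkend $((14 * 86400)) -in /tmp/renewed.crt >/dev/null; then
  echo "cert OK for at least 14 more days"
else
  echo "cert expiring soon - renewal did not take effect"
fi
```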

Phase 4: Verify Recovery

AI Prompt: Verification Checklist

Generate a verification checklist for certificate recovery.

**Recovered Certificate:**
{{NEW_CERT_DETAILS}}

**Services to Verify:** {{SERVICES_LIST}}

**Create verification steps for:**
1. Certificate validity and chain
2. All affected domains responding
3. All services using new certificate
4. No mixed content warnings
5. API clients connecting successfully

Verification Commands

# Inspect the served certificate in detail (SNI-aware via -servername)
echo | openssl s_client -connect {{DOMAIN}}:443 -servername {{DOMAIN}} 2>/dev/null | \
  openssl x509 -noout -text | head -30

# Check certificate chain completeness
echo | openssl s_client -connect {{DOMAIN}}:443 -showcerts 2>/dev/null | \
  grep -E "(Certificate chain|s:|i:)"

# Spot-check the HTTPS response; the loop only labels output, so for true
# multi-region coverage run it from a host in each region
for region in us-east-1 eu-west-1 ap-southeast-1; do
  echo "=== $region ==="
  curl -sI https://{{DOMAIN}} | head -5
done

# Check certificate transparency logs
curl -s "https://crt.sh/?q={{DOMAIN}}&output=json" | jq '.[0]'

# Verify no certificate warnings in logs
journalctl -u nginx --since "1 hour ago" | grep -i "ssl\|cert\|tls"
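
Chain trust can also be exercised locally with `openssl verify`. A sketch using a self-signed stand-in, which acts as its own trust anchor; a real check would pass the site's intermediate bundle as `-CAfile` and the leaf as the subject:

```shell
# Self-signed stand-in cert
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -keyout /tmp/chain.key -out /tmp/chain.crt \
  -subj "/CN=chain.example" 2>/dev/null

# Verify against an explicit CA file (here, itself); real usage:
#   openssl verify -CAfile intermediates.pem leaf.pem
openssl verify -CAfile /tmp/chain.crt /tmp/chain.crt
```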

Phase 5: Communicate

AI Prompt: Status Communications

Generate incident communications for certificate expiry.

**Incident:** TLS Certificate Expiry
**Duration:** {{DURATION}}
**Affected:** {{AFFECTED_SERVICES}}
**Status:** {{STATUS}}

**Generate:**
1. Customer status page update (no technical details)
2. Internal Slack update (technical details OK)
3. Tweet/social media update (brief, reassuring)

Phase 6: Prevention

AI Prompt: Prevention Plan

Based on this certificate expiry incident, generate prevention measures.

**Root Cause:** {{ROOT_CAUSE}}
**Current Monitoring:** {{CURRENT_MONITORING}}

**Generate:**
1. Monitoring alerts to add (with specific thresholds)
2. Automation improvements
3. Process changes
4. Documentation updates needed

Prevention Checklist

immediate_actions:
  - action: "Add certificate expiry monitoring"
    tool: "Datadog/Prometheus/CloudWatch"
    threshold: "30 days before expiry = warning, 14 days = critical"

  - action: "Add to certificate inventory"
    fields: ["domain", "expiry", "issuer", "renewal_method", "owner"]

automation_improvements:
  - action: "Implement automatic renewal"
    options:
      - "cert-manager with Let's Encrypt"
      - "AWS ACM (auto-renews)"
      - "certbot with cron"

  - action: "Add renewal failure alerting"
    on_failure: "Page on-call immediately"

process_changes:
  - action: "Monthly certificate audit"
    command: "List all certs expiring in next 60 days"

  - action: "Certificate renewal runbook review"
    frequency: "Quarterly"
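
The monthly audit above can be scripted with the same `openssl x509 -checkend` primitive. A sketch; the directory argument is an assumption about where your PEM certs live, not a standard path:

```shell
# Print any .crt under the given directory expiring within N days
check_expiring() {
  local dir=$1 days=$2
  local crt
  for crt in "$dir"/*.crt; do
    [ -e "$crt" ] || continue
    openssl x509 -checkend $(( days * 86400 )) -in "$crt" >/dev/null 2>&1 \
      || echo "EXPIRING SOON: $crt"
  done
}

# Example (hypothetical path): check_expiring /etc/ssl/local 60
```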

Full n8n Workflow

{
  "name": "DR-Certificate-Expiry",
  "nodes": [
    {
      "id": "scheduled-check",
      "type": "n8n-nodes-base.cron",
      "parameters": { "cronExpression": "0 9 * * *" },
      "notes": "Daily certificate check"
    },
    {
      "id": "check-certs",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://{{$env.DOMAIN}}",
        "options": { "response": { "includeCertificate": true } }
      }
    },
    {
      "id": "evaluate-expiry",
      "type": "n8n-nodes-base.code",
      "parameters": {
        "jsCode": "const cert = $input.first().json.certificate;\nconst expiryDate = new Date(cert.valid_to);\nconst daysUntilExpiry = Math.floor((expiryDate - new Date()) / (1000 * 60 * 60 * 24));\n\nreturn {\n  domain: cert.subject.CN,\n  expires: cert.valid_to,\n  daysRemaining: daysUntilExpiry,\n  severity: daysUntilExpiry < 7 ? 'critical' : daysUntilExpiry < 30 ? 'warning' : 'ok'\n};"
      }
    },
    {
      "id": "alert-slack",
      "type": "n8n-nodes-base.slack",
      "parameters": {
        "channel": "#security-alerts",
        "text": "Certificate Expiry Warning\n\nDomain: {{$json.domain}}\nExpires: {{$json.expires}}\nDays Remaining: {{$json.daysRemaining}}"
      }
    }
  ]
}

Set up alerts at 60, 30, 14, and 7 days before expiry. The earlier you catch it, the easier the fix.
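
That threshold schedule maps directly onto alert tiers; a minimal sketch of the tiering logic (tier names are illustrative, not a standard):

```shell
# Map days-until-expiry to an alert tier (60/30/14/7-day thresholds from above)
alert_tier() {
  local days=$1
  if   [ "$days" -le 7 ];  then echo page
  elif [ "$days" -le 14 ]; then echo critical
  elif [ "$days" -le 30 ]; then echo warning
  elif [ "$days" -le 60 ]; then echo notice
  else                          echo ok
  fi
}

alert_tier 5    # → page
alert_tier 45   # → notice
```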