TLS Certificate Expiry Recovery
P1 - High Priority: An expired certificate causes immediate user-facing errors. HTTPS traffic to every affected domain will fail.
Trigger Patterns
# Check certificate expiry
echo | openssl s_client -connect {{DOMAIN}}:443 2>/dev/null | openssl x509 -noout -dates
# Common log patterns
grep -E "(certificate.*expired|SSL.*invalid|x509.*expired)" /var/log/nginx/*.log
| Pattern | Browser Shows | Impact |
|---|---|---|
| NET::ERR_CERT_DATE_INVALID | "Your connection is not private" | All users blocked |
| SSL_ERROR_CERT_DATE_INVALID | Firefox security warning | All users blocked |
| certificate verify failed | API calls fail silently | Backend services down |
Phase 1: Detect & Assess
AI Prompt: Certificate Triage
You are a security engineer triaging a certificate expiry incident.
**Alert Details:**
- Domain(s) Affected: {{DOMAINS}}
- Certificate Details:
{{OPENSSL_OUTPUT}}
- Current Time: {{CURRENT_TIME}}
- Expiry Time: {{EXPIRY_TIME}}
**Determine:**
1. Is the certificate actually expired or about to expire?
2. What type of certificate is it? (DV, OV, EV, wildcard, SAN)
3. Who is the issuer? (Let's Encrypt, DigiCert, internal CA, etc.)
4. What's the blast radius? (which services/domains affected)
5. Is this a renewal failure or a new certificate needed?
**Output:**
{
  "status": "expired|expiring_soon|valid",
  "certificate_type": "string",
  "issuer": "string",
  "affected_domains": ["string"],
  "affected_services": ["string"],
  "root_cause_hypothesis": "string",
  "recommended_action": "string"
}
Quick Diagnostic Commands
# Check when cert expires
echo | openssl s_client -connect example.com:443 2>/dev/null | \
openssl x509 -noout -enddate
# Validity window of the leaf certificate
# (openssl x509 reads only the first certificate, even with -showcerts)
echo | openssl s_client -connect example.com:443 -showcerts 2>/dev/null | \
openssl x509 -noout -text | grep -A2 "Validity"
# List all TLS secrets in a Kubernetes cluster
kubectl get secrets --all-namespaces -o json | \
jq -r '.items[] | select(.type=="kubernetes.io/tls") |
"\(.metadata.namespace)/\(.metadata.name)"'
# Check cert-manager certificates
kubectl get certificates --all-namespaces
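The `-enddate` output above is a single `notAfter=...` line, so turning it into a number of days remaining can be sketched as below. This is a minimal sketch, not part of the original runbook: `days_until_expiry` is a hypothetical helper name, and it assumes GNU `date` (the `-d` flag), which is not available in the same form on BSD or macOS.

```shell
#!/bin/sh
# days_until_expiry "notAfter=<date>" [NOW_EPOCH]
# Hypothetical helper: prints whole days between NOW and the notAfter date.
# Assumes GNU date for the -d flag. The body runs in a subshell so the
# variables stay local to the function.
days_until_expiry() (
  end=${1#notAfter=}                 # strip the "notAfter=" prefix
  now=${2:-$(date +%s)}              # default to the current time
  end_epoch=$(date -u -d "$end" +%s) || exit 1
  # Integer division truncates toward zero, so a result of 0 or less means
  # the certificate has expired or expires within the next 24 hours.
  echo $(( (end_epoch - now) / 86400 ))
)
```

A possible way to use it against a live endpoint, reusing the command from this section: `days_until_expiry "$(echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -enddate)"`.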
Phase 2: Diagnose Root Cause
AI Prompt: Renewal Failure Analysis
You are debugging why automatic certificate renewal failed.
**Environment:**
- Certificate Manager: {{CERT_MANAGER}} (cert-manager, certbot, acme.sh, manual)
- DNS Provider: {{DNS_PROVIDER}}
- Challenge Type: {{CHALLENGE_TYPE}} (http-01, dns-01, tls-alpn-01)
**Logs:**
{{CERT_MANAGER_LOGS}}
**Analyze:**
1. Did the renewal attempt happen?
2. What challenge type was used?
3. Where did the challenge fail? (authorization, validation, issuance)
4. Is this a DNS propagation issue, rate limit, or configuration problem?
**Output a diagnosis with specific fix steps.**
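Before handing logs to the prompt, a rough first pass over them can already separate the three causes named in question 4. The sketch below is only illustrative: `classify_renewal_failure` is a hypothetical helper, and the search strings are assumptions based on typical ACME error text, not an exhaustive list or the exact log format of any particular certificate manager.

```shell
#!/bin/sh
# classify_renewal_failure: reads renewal logs on stdin and prints a coarse
# category. The patterns are illustrative assumptions; extend them with the
# messages your certificate manager actually emits.
classify_renewal_failure() (
  log=$(cat)
  if printf '%s\n' "$log" | grep -qiE 'rateLimited|rate limit|too many certificates'; then
    echo "rate_limit"
  elif printf '%s\n' "$log" | grep -qiE 'NXDOMAIN|SERVFAIL|DNS problem'; then
    echo "dns"
  else
    # Anything else needs a human (or the prompt above) to look closer.
    echo "configuration_or_other"
  fi
)
```

For example, `sudo cat /var/log/letsencrypt/letsencrypt.log | classify_renewal_failure` would apply it to Certbot's default log location, assuming that path is in use on the host.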
Common Failure Modes
Phase 3: Execute Recovery
AI Prompt: Generate Recovery Steps
You are generating recovery steps for certificate expiry.
**Context:**
- Certificate Manager: {{CERT_MANAGER}}
- Diagnosis: {{DIAGNOSIS}}
- Urgency: Certificate {{EXPIRED_OR_EXPIRING}} in {{TIME_REMAINING}}
**Generate step-by-step recovery for the specific certificate manager in use.**
**Include:**
1. Immediate mitigation (if any)
2. Certificate renewal/regeneration steps
3. Verification commands
4. Rollback steps if renewal fails
5. Post-recovery verification
**Format each step with command and expected output.**
Recovery Commands by Platform
- cert-manager (K8s)
- Certbot (Let's Encrypt)
- acme.sh
- AWS ACM
- Emergency Manual
cert-manager (K8s)
# 1. Check certificate status
kubectl describe certificate {{CERT_NAME}} -n {{NAMESPACE}}
# 2. Check certificate request
kubectl get certificaterequest -n {{NAMESPACE}}
# 3. Force renewal by deleting the secret
kubectl delete secret {{SECRET_NAME}} -n {{NAMESPACE}}
# 4. If no new CertificateRequest appears, recreate the Certificate to trigger re-issuance
kubectl delete certificate {{CERT_NAME}} -n {{NAMESPACE}}
kubectl apply -f certificate.yaml
# 5. Watch for successful issuance
kubectl get certificate {{CERT_NAME}} -n {{NAMESPACE}} -w
# 6. Check challenge status (if stuck)
kubectl get challenges -n {{NAMESPACE}}
kubectl describe challenge {{CHALLENGE_NAME}} -n {{NAMESPACE}}
# 7. Verify new certificate
kubectl get secret {{SECRET_NAME}} -n {{NAMESPACE}} -o jsonpath='{.data.tls\.crt}' | \
base64 -d | openssl x509 -noout -dates
Certbot (Let's Encrypt)
# 1. Check certificate status
sudo certbot certificates
# 2. Test renewal (dry run)
sudo certbot renew --dry-run
# 3. Force renewal
sudo certbot renew --force-renewal
# 4. If renewal fails, try with debug
sudo certbot renew --force-renewal --debug-challenges
# 5. Reload web server to pick up the new certificate
sudo systemctl reload nginx # or apache2
# 6. Verify new certificate
echo | openssl s_client -connect localhost:443 -servername {{DOMAIN}} 2>/dev/null | \
openssl x509 -noout -dates
acme.sh
# 1. Check certificate
~/.acme.sh/acme.sh --list
# 2. Force renewal
~/.acme.sh/acme.sh --renew -d {{DOMAIN}} --force
# 3. If DNS challenge, ensure credentials
export CF_Token="{{CLOUDFLARE_TOKEN}}"
export CF_Account_ID="{{CLOUDFLARE_ACCOUNT}}"
# 4. Issue new certificate
~/.acme.sh/acme.sh --issue -d {{DOMAIN}} --dns dns_cf
# 5. Install certificate
~/.acme.sh/acme.sh --install-cert -d {{DOMAIN}} \
--key-file /etc/ssl/private/{{DOMAIN}}.key \
--fullchain-file /etc/ssl/certs/{{DOMAIN}}.crt \
--reloadcmd "systemctl reload nginx"
AWS ACM
# 1. Check certificate status
aws acm describe-certificate --certificate-arn {{CERT_ARN}}
# 2. If validation failed, check DNS
aws acm describe-certificate --certificate-arn {{CERT_ARN}} | \
jq '.Certificate.DomainValidationOptions'
# 3. Request new certificate (if needed)
aws acm request-certificate \
--domain-name {{DOMAIN}} \
--validation-method DNS \
--subject-alternative-names {{SAN_DOMAINS}}
# 4. Get DNS validation records
aws acm describe-certificate --certificate-arn {{NEW_CERT_ARN}} | \
jq '.Certificate.DomainValidationOptions[].ResourceRecord'
# 5. Update load balancer listener (if replacing)
aws elbv2 modify-listener \
--listener-arn {{LISTENER_ARN}} \
--certificates CertificateArn={{NEW_CERT_ARN}}
Emergency Manual
Warning: This generates a self-signed certificate, which causes browser warnings and breaks clients that verify the chain. Replace it with a properly issued certificate as soon as possible.
# 1. Generate self-signed certificate
openssl req -x509 -nodes -days 1 -newkey rsa:2048 \
-keyout /tmp/emergency.key \
-out /tmp/emergency.crt \
-subj "/CN={{DOMAIN}}"
# 2. Backup existing certs
sudo cp /etc/ssl/certs/{{DOMAIN}}.crt /etc/ssl/certs/{{DOMAIN}}.crt.bak
sudo cp /etc/ssl/private/{{DOMAIN}}.key /etc/ssl/private/{{DOMAIN}}.key.bak
# 3. Install emergency cert
sudo cp /tmp/emergency.crt /etc/ssl/certs/{{DOMAIN}}.crt
sudo cp /tmp/emergency.key /etc/ssl/private/{{DOMAIN}}.key
# 4. Reload web server
sudo systemctl reload nginx
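Step 2 above keeps `.bak` copies, so rolling back the emergency certificate is a matter of copying them back. A minimal sketch of that rollback follows; `restore_backup` is a hypothetical helper name, and it assumes the backups were created with the `.bak` suffix used in step 2.

```shell
#!/bin/sh
# restore_backup FILE
# Hypothetical helper: copies FILE.bak back over FILE, preserving mode and
# timestamps, and fails loudly if no backup exists.
restore_backup() {
  if [ -f "$1.bak" ]; then
    cp -p "$1.bak" "$1"
  else
    echo "no backup found for $1" >&2
    return 1
  fi
}
```

For the paths in this section the function would need root (for example by running the script with `sudo`), once for `/etc/ssl/certs/{{DOMAIN}}.crt` and once for `/etc/ssl/private/{{DOMAIN}}.key`, followed by `sudo systemctl reload nginx`.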
Phase 4: Verify Recovery
AI Prompt: Verification Checklist
Generate a verification checklist for certificate recovery.
**Recovered Certificate:**
{{NEW_CERT_DETAILS}}
**Services to Verify:** {{SERVICES_LIST}}
**Create verification steps for:**
1. Certificate validity and chain
2. All affected domains responding
3. All services using new certificate
4. No mixed content warnings
5. API clients connecting successfully
Verification Commands
# Inspect the certificate served for the domain (subject, issuer, validity)
echo | openssl s_client -connect {{DOMAIN}}:443 -servername {{DOMAIN}} 2>/dev/null | \
openssl x509 -noout -text | head -30
# Check certificate chain completeness
echo | openssl s_client -connect {{DOMAIN}}:443 -showcerts 2>/dev/null | \
grep -E "(Certificate chain|s:|i:)"
# Verify from multiple locations: run this from a host in each region
# (for example us-east-1, eu-west-1, ap-southeast-1); looping over region
# names on one machine would only repeat the same local request
curl -sI https://{{DOMAIN}} | head -5
# Check certificate transparency logs
curl -s "https://crt.sh/?q={{DOMAIN}}&output=json" | jq '.[0]'
# Verify no certificate warnings in logs
journalctl -u nginx --since "1 hour ago" | grep -i "ssl\|cert\|tls"
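The chain completeness check above prints subject and issuer lines for a human to read. A cruder automated signal is to count the PEM blocks that `-showcerts` returns, sketched below. `count_chain_certs` is a hypothetical helper, and treating a count of 1 as "intermediates probably missing" is an assumption that holds for most publicly trusted CAs but not necessarily for an internal CA.

```shell
#!/bin/sh
# count_chain_certs: reads `openssl s_client -showcerts` output on stdin and
# prints how many certificates the server sent.
count_chain_certs() {
  # grep -c prints 0 but exits non-zero when nothing matches; `|| true`
  # keeps the function usable under `set -e`.
  grep -c -e '-----BEGIN CERTIFICATE-----' || true
}
```

Usage might look like `echo | openssl s_client -connect {{DOMAIN}}:443 -servername {{DOMAIN}} -showcerts 2>/dev/null | count_chain_certs`, where a result of 1 suggests only the leaf certificate is being served.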
Phase 5: Communicate
AI Prompt: Status Communications
Generate incident communications for certificate expiry.
**Incident:** TLS Certificate Expiry
**Duration:** {{DURATION}}
**Affected:** {{AFFECTED_SERVICES}}
**Status:** {{STATUS}}
**Generate:**
1. Customer status page update (no technical details)
2. Internal Slack update (technical details OK)
3. Tweet/social media update (brief, reassuring)
Phase 6: Prevention
AI Prompt: Prevention Plan
Based on this certificate expiry incident, generate prevention measures.
**Root Cause:** {{ROOT_CAUSE}}
**Current Monitoring:** {{CURRENT_MONITORING}}
**Generate:**
1. Monitoring alerts to add (with specific thresholds)
2. Automation improvements
3. Process changes
4. Documentation updates needed
Prevention Checklist
immediate_actions:
  - action: "Add certificate expiry monitoring"
    tool: "Datadog/Prometheus/CloudWatch"
    threshold: "30 days before expiry = warning, 14 days = critical"
  - action: "Add to certificate inventory"
    fields: ["domain", "expiry", "issuer", "renewal_method", "owner"]
automation_improvements:
  - action: "Implement automatic renewal"
    options:
      - "cert-manager with Let's Encrypt"
      - "AWS ACM (auto-renews)"
      - "certbot with cron"
  - action: "Add renewal failure alerting"
    on_failure: "Page on-call immediately"
process_changes:
  - action: "Monthly certificate audit"
    command: "List all certs expiring in next 60 days"
  - action: "Certificate renewal runbook review"
    frequency: "Quarterly"
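The monthly audit item ("List all certs expiring in next 60 days") could be run against the certificate inventory roughly as follows. This is a sketch under assumptions: the inventory is taken to be a CSV file whose columns follow the `fields` list above (`domain,expiry,issuer,renewal_method,owner`), the expiry column is assumed to be an ISO date, `expiring_within` is a hypothetical helper name, and GNU `date` is required.

```shell
#!/bin/sh
# expiring_within DAYS INVENTORY_CSV [NOW_EPOCH]
# Hypothetical helper: prints "domain expiry" for every inventory row whose
# expiry falls on or before NOW + DAYS (already-expired rows are included).
# Assumes GNU date and a CSV with domain in column 1 and expiry in column 2.
expiring_within() (
  days=$1
  file=$2
  now=${3:-$(date +%s)}
  cutoff=$(( now + days * 86400 ))
  # `|| [ -n "$domain" ]` keeps a final line that lacks a trailing newline.
  while IFS=, read -r domain expiry _rest || [ -n "$domain" ]; do
    case "$domain" in ''|domain) continue ;; esac   # skip blanks and header
    expiry_epoch=$(date -u -d "$expiry" +%s 2>/dev/null) || continue
    if [ "$expiry_epoch" -le "$cutoff" ]; then
      echo "$domain $expiry"
    fi
  done < "$file"
)
```

Called as `expiring_within 60 inventory.csv`, it would print one line per certificate that needs attention, which can be pasted into the audit ticket.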
Full n8n Workflow
{
  "name": "DR-Certificate-Expiry",
  "nodes": [
    {
      "id": "scheduled-check",
      "type": "n8n-nodes-base.cron",
      "parameters": { "cronExpression": "0 9 * * *" },
      "notes": "Daily certificate check"
    },
    {
      "id": "check-certs",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "url": "https://{{$env.DOMAIN}}",
        "options": { "response": { "includeCertificate": true } }
      }
    },
    {
      "id": "evaluate-expiry",
      "type": "n8n-nodes-base.code",
      "parameters": {
        "jsCode": "const cert = $input.first().json.certificate;\nconst expiryDate = new Date(cert.valid_to);\nconst daysUntilExpiry = Math.floor((expiryDate - new Date()) / (1000 * 60 * 60 * 24));\n\nreturn {\n domain: cert.subject.CN,\n expires: cert.valid_to,\n daysRemaining: daysUntilExpiry,\n severity: daysUntilExpiry < 7 ? 'critical' : daysUntilExpiry < 30 ? 'warning' : 'ok'\n};"
      }
    },
    {
      "id": "alert-slack",
      "type": "n8n-nodes-base.slack",
      "parameters": {
        "channel": "#security-alerts",
        "text": "Certificate Expiry Warning\n\nDomain: {{$json.domain}}\nExpires: {{$json.expires}}\nDays Remaining: {{$json.daysRemaining}}"
      }
    }
  ]
}
Related Scenarios
- Database Corruption - When database issues occur
- Security Breach - When certificates are compromised
Set up alerts at 60, 30, 14, and 7 days before expiry. The earlier you catch it, the easier the fix.
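As a rough illustration of those four alert points, the helper below maps a days-remaining figure to the tightest threshold it has crossed. `alert_threshold` is a hypothetical name, and how each threshold is routed (ticket, chat message, page) is left open because the runbook does not specify it.

```shell
#!/bin/sh
# alert_threshold DAYS_REMAINING
# Hypothetical helper: prints the smallest of the 60/30/14/7-day thresholds
# that DAYS_REMAINING has reached, or "none" if expiry is more than 60 days out.
alert_threshold() (
  crossed=none
  for t in 60 30 14 7; do
    if [ "$1" -le "$t" ]; then
      crossed=$t        # later (smaller) thresholds overwrite earlier ones
    fi
  done
  echo "$crossed"
)
```

A monitoring job could compare today's result with yesterday's and notify only when the value changes, so each threshold alerts once rather than daily.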