
DEPLOYMENT_GUIDE.md

TimeChain Protocol Stack - Production Deployment Guide

Version: 1.0.0
Date: December 6, 2025
Audience: DevOps Engineers, System Administrators, SREs

Table of Contents

  1. Quick Start
  2. Prerequisites
  3. Environment Setup
  4. Deployment Strategies
  5. Configuration Guide
  6. Monitoring Setup
  7. Troubleshooting
  8. Scaling
  9. Disaster Recovery
  10. Performance Tuning

Quick Start

5-Minute Development Deployment

# 1. Clone and build
git clone https://github.com/timechain/protocol.git
cd protocol
cargo build --release

# 2. Run single node
cargo run --release -- --environment development

# 3. Verify deployment
curl http://localhost:8080/health
# Expected: { "status": "healthy" }
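In scripts it helps to gate follow-on steps on the health response rather than eyeballing curl output. The helper below is an illustrative sketch; it assumes only the `{ "status": "healthy" }` response shape shown above.

```shell
# is_healthy: exit 0 when a health-endpoint JSON body reports "healthy".
is_healthy() {
  printf '%s' "$1" | grep -q '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Example: poll until the node answers healthy (at most 30 attempts, 1s apart)
wait_healthy() {
  url="${1:-http://localhost:8080/health}"
  for _ in $(seq 1 30); do
    body="$(curl -fsS "$url" 2>/dev/null || true)"
    if is_healthy "$body"; then
      echo "healthy"
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```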

15-Minute Staging Deployment

# 1. Create staging namespace
kubectl create namespace staging

# 2. Deploy with Helm
helm repo add timechain https://charts.timechain.io
helm install timechain timechain/timechain \
  --namespace staging \
  --values values-staging.yaml

# 3. Verify cluster health
kubectl get pods -n staging
kubectl logs -n staging -l app=timechain

Prerequisites

System Requirements

Hardware:
  • CPU: 2+ cores per node
  • Memory: 4GB+ per node (8GB recommended)
  • Storage: 50GB+ SSD per node
  • Network: 1Gbps+ connectivity
Software:
  • Kubernetes: 1.24+ (EKS, GKE, or self-managed)
  • Docker: 20.10+ (or containerd)
  • kubectl: 1.24+
  • Helm: 3.10+

Kubernetes Cluster Setup

# Verify cluster is running
kubectl cluster-info
kubectl get nodes

# Expected output:
# NAME              STATUS   ROLES    AGE   VERSION
# node-1            Ready    worker   10d   v1.28.0
# node-2            Ready    worker   10d   v1.28.0
# node-3            Ready    worker   10d   v1.28.0

Network Requirements

Port   Protocol   Purpose              Direction
8080   HTTP       REST API             Ingress
9443   HTTPS      Secure API           Ingress
6379   TCP        Distributed cache    Internal
9090   HTTP       Prometheus metrics   Internal
5432   TCP        Audit database       Internal

Environment Setup

1. Development Environment (Single Node)

Use Case: Local testing, feature development
# config/development.toml
[deployment]
environment = "development"
replicas = 1
tls_enabled = false
backup_enabled = false
debug_mode = true

[monitoring]
metrics_enabled = true
logging_level = "debug"
health_check_interval_ms = 30000

[performance]
thread_pool_size = 2
buffer_size = 1024
Deploy:
cargo run --release -- --config config/development.toml
Verify:
curl http://localhost:8080/health
curl http://localhost:8080/metrics

2. Staging Environment (3 Replicas)

Use Case: Integration testing, pre-production validation
# config/staging.yaml (Kubernetes ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
    name: timechain-config
    namespace: staging
data:
    config.toml: |
        [deployment]
        environment = "staging"
        replicas = 3
        tls_enabled = true
        backup_enabled = true
        backup_interval_hours = 6

        [monitoring]
        metrics_enabled = true
        logging_level = "info"
        health_check_interval_ms = 10000

        [performance]
        thread_pool_size = 4
        buffer_size = 8192
        max_connections = 1000
Deploy with Helm:
# Apply the ConfigMap first, then install the chart
kubectl apply -f config/staging.yaml
helm install timechain ./chart \
  --namespace staging \
  --values values-staging.yaml
Verify:
# Check pod status
kubectl get pods -n staging
kubectl describe pod -n staging <pod-name>

# Check logs
kubectl logs -n staging -f <pod-name>

# Port forward for testing
kubectl port-forward -n staging svc/timechain-service 8080:8080
curl http://localhost:8080/health

3. Production Environment (5+ Replicas)

Use Case: High-availability, production workloads
# config/production.yaml
apiVersion: v1
kind: ConfigMap
metadata:
    name: timechain-config
    namespace: production
data:
    config.toml: |
        [deployment]
        environment = "production"
        replicas = 5
        tls_enabled = true
        backup_enabled = true
        backup_interval_hours = 1
        backup_retention_days = 30

        [monitoring]
        metrics_enabled = true
        logging_level = "warn"
        health_check_interval_ms = 5000

        [performance]
        thread_pool_size = 8
        buffer_size = 16384
        max_connections = 5000

        [security]
        rate_limit_per_second = 10000
        request_timeout_seconds = 30
        enable_audit_logging = true
Production Deployment Checklist:
  • Create production namespace: kubectl create namespace production
  • Set up persistent volumes for state
  • Configure backup storage (S3/GCS)
  • Set up TLS certificates
  • Configure ingress/load balancer
  • Enable audit logging
  • Set up monitoring dashboards
  • Configure alerting rules
  • Perform load testing
  • Plan rollback procedure

Deployment Strategies

1. Rolling Update Deployment (Recommended for Production)

Characteristics:
  • Zero downtime
  • Gradual rollout (10% per step)
  • Automatic rollback on failure
  • Duration: ~50s for 5 nodes
Helm Deployment:
# values.yaml
strategy:
    type: RollingUpdate
    rollingUpdate:
        maxSurge: 1
        maxUnavailable: 0

updateStrategy:
    type: RollingUpdate
    rollingUpdate:
        maxSurge: 1
        maxUnavailable: 0
Deploy:
# Initial deployment
helm install timechain ./chart --namespace production

# Update to new version
helm upgrade timechain ./chart \
  --namespace production \
  --values values.yaml
Monitor Progress:
# Watch rollout
kubectl rollout status deployment/timechain -n production

# Check rollout history
kubectl rollout history deployment/timechain -n production

# Rollback if needed
kubectl rollout undo deployment/timechain -n production

2. Canary Deployment (Staging Validation)

Characteristics:
  • Validates new version with small traffic %
  • Automatic traffic shift
  • Duration: ~20s total
Deployment Steps:
# 1. Deploy canary (10% traffic)
# (--record is deprecated in recent kubectl; annotate the change-cause instead)
kubectl set image deployment/timechain \
  timechain=timechain:v1.0.1 -n staging

# 2. Monitor canary metrics
kubectl top pod -n staging

# 3. If healthy, promote canary to stable
# (Traffic shift to 100%)

# 4. If unhealthy, rollback
kubectl rollout undo deployment/timechain -n staging
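Step 3's "if healthy" decision can be automated with a simple error-rate gate. The function below is only a sketch: comparing the canary's error rate against the stable fleet's with a 1.5x tolerance is an assumption for illustration, not part of the protocol.

```shell
# canary_ok CANARY_ERR_RATE STABLE_ERR_RATE
# Exit 0 (promote) when the canary's error rate is within 1.5x of stable's.
canary_ok() {
  awk -v c="$1" -v s="$2" 'BEGIN { exit !(c <= s * 1.5) }'
}

# Example gate: promote only if the canary looks no worse than stable
# canary_ok "$canary_rate" "$stable_rate" && echo promote || echo rollback
```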

3. Blue-Green Deployment (Minimal Risk)

Characteristics:
  • Parallel environments
  • Instant switchover
  • Easy rollback
  • Duration: ~5s switch
Helm Deployment:
# Deploy green version
helm install timechain-green ./chart \
  --namespace production \
  --values values-green.yaml

# Test green version
kubectl port-forward svc/timechain-green 8080:8080
# Run smoke tests...

# Switch traffic to green
kubectl patch service timechain -p \
  '{"spec":{"selector":{"version":"green"}}}'

# Keep blue running for quick rollback

4. Immediate Deployment (Development Only)

Use: Feature development, local testing
# Direct deployment
cargo build --release
./target/release/timechain --config config/development.toml

Configuration Guide

Basic Configuration

# .tcproto/config.toml

[deployment]
# Environment profile: development, staging, production
environment = "staging"

# Number of replicas
replicas = 3

# TLS certificate configuration
tls_enabled = true
tls_cert_path = "/etc/timechain/tls/cert.pem"
tls_key_path = "/etc/timechain/tls/key.pem"

# Backup configuration
backup_enabled = true
backup_interval_hours = 6
backup_storage = "s3://timechain-backups"
backup_retention_days = 30

Advanced Configuration

[monitoring]
# Prometheus metrics
metrics_enabled = true
metrics_port = 9090

# Logging configuration
logging_level = "info"  # debug, info, warn, error
logging_format = "json"
log_output = "stdout"

# Health checks
health_check_interval_ms = 10000
health_check_timeout_ms = 5000

[performance]
# Thread pool size (typically 2x CPU cores)
thread_pool_size = 8

# In-memory buffer size
buffer_size = 16384

# Maximum concurrent connections
max_connections = 5000

# Operation timeout
operation_timeout_ms = 30000

[security]
# Rate limiting
rate_limit_per_second = 10000
burst_limit = 20000

# Authentication
auth_enabled = true
auth_header = "Authorization"

# Encryption
encryption_key_path = "/etc/timechain/encryption/key"
cipher = "AEAD"

[database]
# Audit log database
db_type = "postgresql"
db_host = "postgres.production.svc.cluster.local"
db_port = 5432
db_name = "timechain_audit"
db_user = "timechain"
db_password_secret = "timechain-db-password"
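The `thread_pool_size` comment above suggests roughly 2x CPU cores. A quick way to compute that suggestion on the target host (a convenience sketch, not an official tool):

```shell
# Suggest a thread_pool_size of 2x available cores (the rule of thumb above).
suggest_thread_pool_size() {
  cores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
  echo $(( 2 * cores ))
}

suggest_thread_pool_size   # e.g. prints 8 on a 4-core node
```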

Environment Variables

# Override config with environment variables
export TIMECHAIN_ENVIRONMENT=production
export TIMECHAIN_REPLICAS=5
export TIMECHAIN_TLS_ENABLED=true
export TIMECHAIN_BACKUP_STORAGE=s3://my-bucket
export TIMECHAIN_LOG_LEVEL=info
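Before a scripted deploy, it is worth failing fast if an expected override is unset. A small hypothetical preflight helper (the variable names are taken from the list above; the helper itself is not part of the tooling):

```shell
# require_env VAR...: exit nonzero and name each unset/empty variable.
require_env() {
  missing=0
  for v in "$@"; do
    if [ -z "$(eval "printf '%s' \"\${$v:-}\"")" ]; then
      echo "missing: $v" >&2
      missing=1
    fi
  done
  return $missing
}

# Example preflight before helm upgrade:
# require_env TIMECHAIN_ENVIRONMENT TIMECHAIN_TLS_ENABLED || exit 1
```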

Monitoring Setup

Prometheus Configuration

# prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
    name: prometheus-config
    namespace: production
data:
    prometheus.yml: |
        global:
          scrape_interval: 15s
          evaluation_interval: 15s

        scrape_configs:
          - job_name: 'timechain'
            static_configs:
              - targets: ['localhost:9090']
            metrics_path: '/metrics'
Deploy Prometheus:
kubectl apply -f prometheus-config.yaml
kubectl apply -f prometheus-deployment.yaml

Key Metrics to Monitor

# E2E Latency (target: <11ms P99)
histogram_quantile(0.99, rate(timechain_operation_latency_ms_bucket[5m]))

# Throughput (target: >3,000 ops/sec)
rate(timechain_operations_total[1m])

# Error Rate (target: <0.1%)
rate(timechain_errors_total[1m])

# Service Health
up{job="timechain"}

# Node Status
timechain_nodes_running / timechain_nodes_total

# Backup Status
timechain_backup_last_timestamp
timechain_backup_status
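To sanity-check the <0.1% error-rate target from raw counter values, the arithmetic is just errors over total. An illustrative helper (not part of the metrics stack):

```shell
# error_rate_pct ERRORS TOTAL -> percentage with 4 decimal places
error_rate_pct() {
  awk -v e="$1" -v t="$2" 'BEGIN { printf "%.4f", (t > 0 ? 100 * e / t : 0) }'
}

error_rate_pct 12 30000   # 12 errors in 30,000 ops -> 0.0400 (inside the 0.1% target)
```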

Grafana Dashboard

Create dashboard with panels for:
  • E2E latency (P50/P95/P99)
  • Throughput (ops/sec)
  • Error rate
  • Node health
  • CPU/memory usage
  • Storage consumption
  • Backup status
Example Panel (JSON):
{
    "title": "E2E Latency P99",
    "targets": [
        {
            "expr": "histogram_quantile(0.99, rate(timechain_operation_latency_ms_bucket[5m]))"
        }
    ],
    "thresholds": [{ "value": 11, "color": "red" }]
}

Troubleshooting

Common Issues

1. Pods Not Starting

# Check pod status
kubectl describe pod -n production <pod-name>

# Check logs
kubectl logs -n production <pod-name>

# Common causes:
# - Insufficient resources: kubectl top nodes
# - Image pull failure: docker pull <image>
# - Config mount failed: kubectl get cm -n production

# Solution (a pod managed by the Deployment is recreated automatically):
kubectl delete pod -n production <pod-name>

2. High Latency

# Check node performance
kubectl top nodes
kubectl top pod -n production

# Check network connectivity
kubectl exec -n production <pod-name> -- ping <other-pod>

# Check disk I/O
kubectl exec -n production <pod-name> -- iostat -x 1 5

# Solution:
# - Scale horizontally: kubectl scale deployment timechain --replicas=6
# - Increase resources: kubectl set resources deployment timechain -c=timechain \
#   --requests=cpu=1,memory=4Gi --limits=cpu=2,memory=8Gi

3. Backup Failures

# Check backup status
kubectl logs -n production deployment/timechain | grep backup

# Verify backup storage credentials
kubectl get secret timechain-backup-creds -n production

# Check storage permissions
kubectl exec -n production <pod-name> -- aws s3 ls s3://timechain-backups/

# Solution:
# - Verify S3 bucket exists: aws s3 ls
# - Check IAM permissions
# - Recreate backup creds: kubectl create secret generic timechain-backup-creds ...

4. Authentication Failures

# Check certificate expiration
kubectl get certificate -n production
kubectl describe certificate timechain-tls -n production

# Force renewal if needed: delete the TLS Secret and cert-manager reissues it
kubectl delete secret timechain-tls -n production

# Check TLS configuration
kubectl exec -n production <pod-name> -- openssl x509 -in /etc/timechain/tls/cert.pem -text
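For alerting on upcoming expiry, the PEM file can be checked directly. A sketch assuming GNU `date` and the certificate path used above:

```shell
# cert_days_left CERT_PEM -> whole days until the certificate expires
cert_days_left() {
  end="$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)"
  echo $(( ( $(date -d "$end" +%s) - $(date +%s) ) / 86400 ))
}

# Example: warn when fewer than 14 days remain
# [ "$(cert_days_left /etc/timechain/tls/cert.pem)" -lt 14 ] && echo "renew soon"
```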

Debugging Commands

# Full cluster status
kubectl get all -n production

# Recent events
kubectl get events -n production --sort-by='.lastTimestamp'

# Pod logs with timestamps
kubectl logs -n production <pod-name> --timestamps=true

# Previous pod logs (if pod crashed)
kubectl logs -n production <pod-name> --previous

# Execute command in pod
kubectl exec -n production <pod-name> -- <command>

# Interactive shell
kubectl exec -it -n production <pod-name> -- /bin/bash

# Port forward for local testing
kubectl port-forward -n production <pod-name> 8080:8080

Scaling

Horizontal Scaling (Add Nodes)

# Scale deployment
kubectl scale deployment timechain --replicas=10 -n production

# Verify scaling (exclude the header line from the count)
kubectl get pods -n production --no-headers | wc -l

# Monitor rollout
kubectl rollout status deployment/timechain -n production
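A rough way to pick the replica count from a throughput target, using the ~3,000 ops/sec per-node figure from the monitoring section. Purely illustrative ceiling division:

```shell
# replicas_needed TOTAL_OPS_PER_SEC PER_NODE_OPS_PER_SEC -> ceil division, min 1
replicas_needed() {
  awk -v total="$1" -v per="$2" 'BEGIN {
    n = int((total + per - 1) / per)
    print (n < 1 ? 1 : n)
  }'
}

replicas_needed 25000 3000   # -> 9
```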

Vertical Scaling (Increase Resources)

# Update resource requests/limits
kubectl set resources deployment timechain \
  -c=timechain \
  --requests=cpu=2,memory=8Gi \
  --limits=cpu=4,memory=16Gi \
  -n production

# Verify changes
kubectl get deployment -n production -o yaml | grep -A 10 resources

Auto-Scaling

# autoscaling.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: timechain-hpa
    namespace: production
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: timechain
    minReplicas: 5
    maxReplicas: 20
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 70
        - type: Resource
          resource:
              name: memory
              target:
                  type: Utilization
                  averageUtilization: 80
Deploy HPA:
kubectl apply -f autoscaling.yaml
kubectl get hpa -n production --watch

Disaster Recovery

Backup Strategy

Automated Backups:
# backup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
    name: timechain-backup
    namespace: production
spec:
    schedule: "0 * * * *" # Hourly
    jobTemplate:
        spec:
            template:
                spec:
                    containers:
                        - name: backup
                          image: timechain:1.0.0
                          command: ["/backup.sh"]
                          env:
                              - name: BACKUP_STORAGE
                                value: "s3://timechain-backups"
                    restartPolicy: OnFailure
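The CronJob invokes `/backup.sh`, whose contents this guide does not show. A hypothetical minimal version might look like the following; the data directory, archive naming, and opt-in upload step are all assumptions, not the shipped script:

```shell
#!/bin/sh
# Hypothetical sketch of /backup.sh (not the shipped script).
set -eu

run_backup() {
  data_dir="${1:-/var/lib/timechain}"        # assumed state directory
  stamp="$(date -u +%Y-%m-%d-%H%M%S)"
  archive="/tmp/backup-${stamp}.tar.gz"

  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"

  # Upload only when explicitly enabled and the AWS CLI is available.
  if [ "${UPLOAD:-0}" = "1" ] && command -v aws >/dev/null 2>&1; then
    aws s3 cp "$archive" "${BACKUP_STORAGE:-s3://timechain-backups}/"
  fi
  printf '%s\n' "$archive"
}
```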

Restore Procedure

# 1. List available backups
kubectl exec -n production <pod-name> -- \
  aws s3 ls s3://timechain-backups/ --recursive

# 2. Restore from backup
kubectl exec -n production <pod-name> -- \
  aws s3 cp s3://timechain-backups/backup-2025-12-06.tar.gz /tmp/

# 3. Verify restore
kubectl exec -n production <pod-name> -- tar -tzf /tmp/backup-2025-12-06.tar.gz

# 4. Apply backup data
# (Specific to your data structure)

# 5. Verify data integrity
curl http://localhost:8080/health
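Since backup names embed an ISO date (as in step 2's `backup-2025-12-06.tar.gz`), the newest archive from step 1's listing can be picked with a lexical sort. A small convenience helper, assuming that naming scheme:

```shell
# latest_backup NAME... -> the newest date-stamped archive name
# ISO dates (YYYY-MM-DD) sort lexically, so a plain sort suffices.
latest_backup() {
  printf '%s\n' "$@" | sort | tail -n 1
}

latest_backup backup-2025-12-04.tar.gz backup-2025-12-06.tar.gz \
              backup-2025-12-05.tar.gz   # -> backup-2025-12-06.tar.gz
```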

Failover Procedure

# 1. Detect failure
kubectl get pods -n production | grep -i error

# 2. Remove failed node (if hardware failure)
kubectl drain <failed-node> --ignore-daemonsets --delete-emptydir-data

# 3. Kubernetes automatically respawns pods on healthy nodes

# 4. Verify cluster health
kubectl get nodes
kubectl get pods -n production

Performance Tuning

CPU & Memory Optimization

# Monitor resource usage
kubectl top pods -n production --sort-by=memory

# Adjust if needed
kubectl set resources deployment timechain \
  -c=timechain \
  --requests=cpu=1,memory=4Gi \
  --limits=cpu=2,memory=8Gi

Network Optimization

# Enable network policies
kubectl apply -f network-policy.yaml

# Optimize DNS resolution
kubectl exec -n production <pod-name> -- nslookup kubernetes.default.svc.cluster.local

Storage Optimization

# Monitor disk usage
kubectl exec -n production <pod-name> -- df -h

# Enable compression
# (In config.toml)
[storage]
compression_enabled = true
compression_algorithm = "zstd"

Production Checklist

Before deploying to production:
  • All 313 tests passing
  • Performance SLAs validated
  • Security audit complete
  • Backup and recovery tested
  • Monitoring dashboards ready
  • Alerting rules configured
  • Team trained on deployment
  • Runbooks updated
  • Rollback procedure verified
  • Load testing completed
  • Documentation reviewed
  • Go/no-go approval obtained

Support & Escalation

For Issues:
  1. Check logs: kubectl logs -n production deployment/timechain
  2. Consult troubleshooting guide above
  3. File issue: https://github.com/timechain/protocol/issues
  4. Contact support: support@timechain.io

Document Info

Version: 1.0.0
Last Updated: December 6, 2025
Status: Production Ready ✅