
DEPLOYMENT_GUIDE.md

TimeChain Protocol Stack - Production Deployment Guide

Version: 1.0.0
Date: December 6, 2025
Audience: DevOps Engineers, System Administrators, SREs

Table of Contents

  1. Quick Start
  2. Prerequisites
  3. Environment Setup
  4. Deployment Strategies
  5. Configuration Guide
  6. Monitoring Setup
  7. Troubleshooting
  8. Scaling
  9. Disaster Recovery
  10. Performance Tuning

Quick Start

5-Minute Development Deployment

# 1. Clone and build
git clone https://github.com/timechain/protocol.git
cd protocol
cargo build --release

# 2. Run single node
cargo run --release -- --environment development

# 3. Verify deployment
curl http://localhost:8080/health
# Expected: { "status": "healthy" }
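In scripts it helps to gate follow-on steps on the health response rather than eyeballing curl output. The helper below is an illustrative sketch; it assumes only the `{ "status": "healthy" }` response shape shown above.

```shell
# is_healthy: exit 0 when a health-endpoint JSON body reports "healthy".
is_healthy() {
  printf '%s' "$1" | grep -q '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Example: poll until the node answers healthy (at most 30 attempts, 1s apart)
wait_healthy() {
  url="${1:-http://localhost:8080/health}"
  for _ in $(seq 1 30); do
    body="$(curl -fsS "$url" 2>/dev/null || true)"
    if is_healthy "$body"; then
      echo "healthy"
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```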

15-Minute Staging Deployment

# 1. Create staging namespace
kubectl create namespace staging

# 2. Deploy with Helm
helm repo add timechain https://charts.timechain.io
helm install timechain timechain/timechain \
  --namespace staging \
  --values values-staging.yaml

# 3. Verify cluster health
kubectl get pods -n staging
kubectl logs -n staging -l app=timechain

Prerequisites

System Requirements

Hardware:
  • CPU: 2+ cores per node
  • Memory: 4GB+ per node (8GB recommended)
  • Storage: 50GB+ SSD per node
  • Network: 1Gbps+ connectivity
Software:
  • Kubernetes: 1.24+ (EKS, GKE, or self-managed)
  • Docker: 20.10+ (or containerd)
  • kubectl: 1.24+
  • Helm: 3.10+

Kubernetes Cluster Setup

# Verify cluster is running
kubectl cluster-info
kubectl get nodes

# Expected output:
# NAME              STATUS   ROLES    AGE   VERSION
# node-1            Ready    worker   10d   v1.28.0
# node-2            Ready    worker   10d   v1.28.0
# node-3            Ready    worker   10d   v1.28.0

Network Requirements

Port   Protocol   Purpose              Direction
8080   HTTP       REST API             Ingress
9443   HTTPS      Secure API           Ingress
6379   TCP        Distributed cache    Internal
9090   HTTP       Prometheus metrics   Internal
5432   TCP        Audit database       Internal

Environment Setup

1. Development Environment (Single Node)

Use Case: Local testing, feature development
# config/development.toml
[deployment]
environment = "development"
replicas = 1
tls_enabled = false
backup_enabled = false
debug_mode = true

[monitoring]
metrics_enabled = true
logging_level = "debug"
health_check_interval_ms = 30000

[performance]
thread_pool_size = 2
buffer_size = 1024
Deploy:
cargo run --release -- --config config/development.toml
Verify:
curl http://localhost:8080/health
curl http://localhost:8080/metrics

2. Staging Environment (3 Replicas)

Use Case: Integration testing, pre-production validation
# config/staging.yaml (Kubernetes ConfigMap)
apiVersion: v1
kind: ConfigMap
metadata:
    name: timechain-config
    namespace: staging
data:
    config.toml: |
        [deployment]
        environment = "staging"
        replicas = 3
        tls_enabled = true
        backup_enabled = true
        backup_interval_hours = 6

        [monitoring]
        metrics_enabled = true
        logging_level = "info"
        health_check_interval_ms = 10000

        [performance]
        thread_pool_size = 4
        buffer_size = 8192
        max_connections = 1000
Deploy with Helm:
# Apply the ConfigMap first, then install the chart
kubectl apply -f config/staging.yaml
helm install timechain ./chart \
  --namespace staging \
  --values values-staging.yaml
Verify:
# Check pod status
kubectl get pods -n staging
kubectl describe pod -n staging <pod-name>

# Check logs
kubectl logs -n staging -f <pod-name>

# Port forward for testing
kubectl port-forward -n staging svc/timechain-service 8080:8080
curl http://localhost:8080/health

3. Production Environment (5+ Replicas)

Use Case: High-availability, production workloads
# config/production.yaml
apiVersion: v1
kind: ConfigMap
metadata:
    name: timechain-config
    namespace: production
data:
    config.toml: |
        [deployment]
        environment = "production"
        replicas = 5
        tls_enabled = true
        backup_enabled = true
        backup_interval_hours = 1
        backup_retention_days = 30

        [monitoring]
        metrics_enabled = true
        logging_level = "warn"
        health_check_interval_ms = 5000

        [performance]
        thread_pool_size = 8
        buffer_size = 16384
        max_connections = 5000

        [security]
        rate_limit_per_second = 10000
        request_timeout_seconds = 30
        enable_audit_logging = true
Production Deployment Checklist:
  • Create production namespace: kubectl create namespace production
  • Set up persistent volumes for state
  • Configure backup storage (S3/GCS)
  • Set up TLS certificates
  • Configure ingress/load balancer
  • Enable audit logging
  • Set up monitoring dashboards
  • Configure alerting rules
  • Perform load testing
  • Plan rollback procedure

Deployment Strategies

1. Rolling Update Deployment (Recommended for Production)

Characteristics:
  • Zero downtime
  • Gradual rollout (10% per step)
  • Automatic rollback on failure
  • Duration: ~50s for 5 nodes
Helm Deployment:
# values.yaml
strategy:
    type: RollingUpdate
    rollingUpdate:
        maxSurge: 1
        maxUnavailable: 0

updateStrategy:
    type: RollingUpdate
    rollingUpdate:
        maxSurge: 1
        maxUnavailable: 0
Deploy:
# Initial deployment
helm install timechain ./chart --namespace production

# Update to new version
helm upgrade timechain ./chart \
  --namespace production \
  --values values.yaml
Monitor Progress:
# Watch rollout
kubectl rollout status deployment/timechain -n production

# Check rollout history
kubectl rollout history deployment/timechain -n production

# Rollback if needed
kubectl rollout undo deployment/timechain -n production

2. Canary Deployment (Staging Validation)

Characteristics:
  • Validates new version with small traffic %
  • Automatic traffic shift
  • Duration: ~20s total
Deployment Steps:
# 1. Deploy canary (10% traffic)
# (--record is deprecated in recent kubectl; annotate the change-cause instead)
kubectl set image deployment/timechain \
  timechain=timechain:v1.0.1 -n staging

# 2. Monitor canary metrics
kubectl top pod -n staging

# 3. If healthy, promote canary to stable
# (Traffic shift to 100%)

# 4. If unhealthy, rollback
kubectl rollout undo deployment/timechain -n staging
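Step 3's "if healthy" decision can be automated with a simple error-rate gate. The function below is only a sketch: comparing the canary's error rate against the stable fleet's with a 1.5x tolerance is an assumption for illustration, not part of the protocol.

```shell
# canary_ok CANARY_ERR_RATE STABLE_ERR_RATE
# Exit 0 (promote) when the canary's error rate is within 1.5x of stable's.
canary_ok() {
  awk -v c="$1" -v s="$2" 'BEGIN { exit !(c <= s * 1.5) }'
}

# Example gate: promote only if the canary looks no worse than stable
# canary_ok "$canary_rate" "$stable_rate" && echo promote || echo rollback
```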

3. Blue-Green Deployment (Minimal Risk)

Characteristics:
  • Parallel environments
  • Instant switchover
  • Easy rollback
  • Duration: ~5s switch
Helm Deployment:
# Deploy green version
helm install timechain-green ./chart \
  --namespace production \
  --values values-green.yaml

# Test green version
kubectl port-forward svc/timechain-green 8080:8080
# Run smoke tests...

# Switch traffic to green
kubectl patch service timechain -p \
  '{"spec":{"selector":{"version":"green"}}}'

# Keep blue running for quick rollback

4. Immediate Deployment (Development Only)

Use: Feature development, local testing
# Direct deployment
cargo build --release
./target/release/timechain --config config/development.toml

Configuration Guide

Basic Configuration

# .tcproto/config.toml

[deployment]
# Environment profile: development, staging, production
environment = "staging"

# Number of replicas
replicas = 3

# TLS certificate configuration
tls_enabled = true
tls_cert_path = "/etc/timechain/tls/cert.pem"
tls_key_path = "/etc/timechain/tls/key.pem"

# Backup configuration
backup_enabled = true
backup_interval_hours = 6
backup_storage = "s3://timechain-backups"
backup_retention_days = 30

Advanced Configuration

[monitoring]
# Prometheus metrics
metrics_enabled = true
metrics_port = 9090

# Logging configuration
logging_level = "info"  # debug, info, warn, error
logging_format = "json"
log_output = "stdout"

# Health checks
health_check_interval_ms = 10000
health_check_timeout_ms = 5000

[performance]
# Thread pool size (typically 2x CPU cores)
thread_pool_size = 8

# In-memory buffer size
buffer_size = 16384

# Maximum concurrent connections
max_connections = 5000

# Operation timeout
operation_timeout_ms = 30000

[security]
# Rate limiting
rate_limit_per_second = 10000
burst_limit = 20000

# Authentication
auth_enabled = true
auth_header = "Authorization"

# Encryption
encryption_key_path = "/etc/timechain/encryption/key"
cipher = "AEAD"

[database]
# Audit log database
db_type = "postgresql"
db_host = "postgres.production.svc.cluster.local"
db_port = 5432
db_name = "timechain_audit"
db_user = "timechain"
db_password_secret = "timechain-db-password"
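The `thread_pool_size` comment above suggests roughly 2x CPU cores. A quick way to compute that suggestion on the target host (a convenience sketch, not an official tool):

```shell
# Suggest a thread_pool_size of 2x available cores (the rule of thumb above).
suggest_thread_pool_size() {
  cores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
  echo $(( 2 * cores ))
}

suggest_thread_pool_size   # e.g. prints 8 on a 4-core node
```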

Environment Variables

# Override config with environment variables
export TIMECHAIN_ENVIRONMENT=production
export TIMECHAIN_REPLICAS=5
export TIMECHAIN_TLS_ENABLED=true
export TIMECHAIN_BACKUP_STORAGE=s3://my-bucket
export TIMECHAIN_LOG_LEVEL=info
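Before a scripted deploy, it is worth failing fast if an expected override is unset. A small hypothetical preflight helper (the variable names are taken from the list above; the helper itself is not part of the tooling):

```shell
# require_env VAR...: exit nonzero and name each unset/empty variable.
require_env() {
  missing=0
  for v in "$@"; do
    if [ -z "$(eval "printf '%s' \"\${$v:-}\"")" ]; then
      echo "missing: $v" >&2
      missing=1
    fi
  done
  return $missing
}

# Example preflight before helm upgrade:
# require_env TIMECHAIN_ENVIRONMENT TIMECHAIN_TLS_ENABLED || exit 1
```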

Monitoring Setup

Prometheus Configuration

# prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
    name: prometheus-config
    namespace: production
data:
    prometheus.yml: |
        global:
          scrape_interval: 15s
          evaluation_interval: 15s

        scrape_configs:
          - job_name: 'timechain'
            static_configs:
              - targets: ['localhost:9090']
            metrics_path: '/metrics'
Deploy Prometheus:
kubectl apply -f prometheus-config.yaml
kubectl apply -f prometheus-deployment.yaml

Key Metrics to Monitor

# E2E Latency (target: <11ms P99)
histogram_quantile(0.99, rate(timechain_operation_latency_ms_bucket[5m]))

# Throughput (target: >3,000 ops/sec)
rate(timechain_operations_total[1m])

# Error Rate (target: <0.1%)
rate(timechain_errors_total[1m])

# Service Health
up{job="timechain"}

# Node Status
timechain_nodes_running / timechain_nodes_total

# Backup Status
timechain_backup_last_timestamp
timechain_backup_status
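To sanity-check the <0.1% error-rate target from raw counter values, the arithmetic is just errors over total. An illustrative helper (not part of the metrics stack):

```shell
# error_rate_pct ERRORS TOTAL -> percentage with 4 decimal places
error_rate_pct() {
  awk -v e="$1" -v t="$2" 'BEGIN { printf "%.4f", (t > 0 ? 100 * e / t : 0) }'
}

error_rate_pct 12 30000   # 12 errors in 30,000 ops -> 0.0400 (inside the 0.1% target)
```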

Grafana Dashboard

Create dashboard with panels for:
  • E2E latency (P50/P95/P99)
  • Throughput (ops/sec)
  • Error rate
  • Node health
  • CPU/memory usage
  • Storage consumption
  • Backup status
Example Panel (JSON):
{
    "title": "E2E Latency P99",
    "targets": [
        {
            "expr": "histogram_quantile(0.99, rate(timechain_operation_latency_ms_bucket[5m]))"
        }
    ],
    "thresholds": [{ "value": 11, "color": "red" }]
}

Troubleshooting

Common Issues

1. Pods Not Starting

# Check pod status
kubectl describe pod -n production <pod-name>

# Check logs
kubectl logs -n production <pod-name>

# Common causes:
# - Insufficient resources: kubectl top nodes
# - Image pull failure: docker pull <image>
# - Config mount failed: kubectl get cm -n production

# Solution (a pod managed by the Deployment is recreated automatically):
kubectl delete pod -n production <pod-name>

2. High Latency

# Check node performance
kubectl top nodes
kubectl top pod -n production

# Check network connectivity
kubectl exec -n production <pod-name> -- ping <other-pod>

# Check disk I/O
kubectl exec -n production <pod-name> -- iostat -x 1 5

# Solution:
# - Scale horizontally: kubectl scale deployment timechain --replicas=6
# - Increase resources: kubectl set resources deployment timechain -c=timechain \
#   --requests=cpu=1,memory=4Gi --limits=cpu=2,memory=8Gi

3. Backup Failures

# Check backup status
kubectl logs -n production deployment/timechain | grep backup

# Verify backup storage credentials
kubectl get secret timechain-backup-creds -n production

# Check storage permissions
kubectl exec -n production <pod-name> -- aws s3 ls s3://timechain-backups/

# Solution:
# - Verify S3 bucket exists: aws s3 ls
# - Check IAM permissions
# - Recreate backup creds: kubectl create secret generic timechain-backup-creds ...

4. Authentication Failures

# Check certificate expiration
kubectl get certificate -n production
kubectl describe certificate timechain-tls -n production

# Force renewal if needed: delete the TLS Secret and cert-manager reissues it
kubectl delete secret timechain-tls -n production

# Check TLS configuration
kubectl exec -n production <pod-name> -- openssl x509 -in /etc/timechain/tls/cert.pem -text
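For alerting on upcoming expiry, the PEM file can be checked directly. A sketch assuming GNU `date` and the certificate path used above:

```shell
# cert_days_left CERT_PEM -> whole days until the certificate expires
cert_days_left() {
  end="$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)"
  echo $(( ( $(date -d "$end" +%s) - $(date +%s) ) / 86400 ))
}

# Example: warn when fewer than 14 days remain
# [ "$(cert_days_left /etc/timechain/tls/cert.pem)" -lt 14 ] && echo "renew soon"
```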

Debugging Commands

# Full cluster status
kubectl get all -n production

# Recent events
kubectl get events -n production --sort-by='.lastTimestamp'

# Pod logs with timestamps
kubectl logs -n production <pod-name> --timestamps=true

# Previous pod logs (if pod crashed)
kubectl logs -n production <pod-name> --previous

# Execute command in pod
kubectl exec -n production <pod-name> -- <command>

# Interactive shell
kubectl exec -it -n production <pod-name> -- /bin/bash

# Port forward for local testing
kubectl port-forward -n production <pod-name> 8080:8080

Scaling

Horizontal Scaling (Add Nodes)

# Scale deployment
kubectl scale deployment timechain --replicas=10 -n production

# Verify scaling (exclude the header line from the count)
kubectl get pods -n production --no-headers | wc -l

# Monitor rollout
kubectl rollout status deployment/timechain -n production
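A rough way to pick the replica count from a throughput target, using the ~3,000 ops/sec per-node figure from the monitoring section. Purely illustrative ceiling division:

```shell
# replicas_needed TOTAL_OPS_PER_SEC PER_NODE_OPS_PER_SEC -> ceil division, min 1
replicas_needed() {
  awk -v total="$1" -v per="$2" 'BEGIN {
    n = int((total + per - 1) / per)
    print (n < 1 ? 1 : n)
  }'
}

replicas_needed 25000 3000   # -> 9
```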

Vertical Scaling (Increase Resources)

# Update resource requests/limits
kubectl set resources deployment timechain \
  -c=timechain \
  --requests=cpu=2,memory=8Gi \
  --limits=cpu=4,memory=16Gi \
  -n production

# Verify changes
kubectl get deployment -n production -o yaml | grep -A 10 resources

Auto-Scaling

# autoscaling.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: timechain-hpa
    namespace: production
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: timechain
    minReplicas: 5
    maxReplicas: 20
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 70
        - type: Resource
          resource:
              name: memory
              target:
                  type: Utilization
                  averageUtilization: 80
Deploy HPA:
kubectl apply -f autoscaling.yaml
kubectl get hpa -n production --watch

Disaster Recovery

Backup Strategy

Automated Backups:
# backup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
    name: timechain-backup
    namespace: production
spec:
    schedule: "0 * * * *" # Hourly
    jobTemplate:
        spec:
            template:
                spec:
                    containers:
                        - name: backup
                          image: timechain:1.0.0
                          command: ["/backup.sh"]
                          env:
                              - name: BACKUP_STORAGE
                                value: "s3://timechain-backups"
                    restartPolicy: OnFailure
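The CronJob invokes `/backup.sh`, whose contents this guide does not show. A hypothetical minimal version might look like the following; the data directory, archive naming, and opt-in upload step are all assumptions, not the shipped script:

```shell
#!/bin/sh
# Hypothetical sketch of /backup.sh (not the shipped script).
set -eu

run_backup() {
  data_dir="${1:-/var/lib/timechain}"        # assumed state directory
  stamp="$(date -u +%Y-%m-%d-%H%M%S)"
  archive="/tmp/backup-${stamp}.tar.gz"

  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"

  # Upload only when explicitly enabled and the AWS CLI is available.
  if [ "${UPLOAD:-0}" = "1" ] && command -v aws >/dev/null 2>&1; then
    aws s3 cp "$archive" "${BACKUP_STORAGE:-s3://timechain-backups}/"
  fi
  printf '%s\n' "$archive"
}
```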

Restore Procedure

# 1. List available backups
kubectl exec -n production <pod-name> -- \
  aws s3 ls s3://timechain-backups/ --recursive

# 2. Restore from backup
kubectl exec -n production <pod-name> -- \
  aws s3 cp s3://timechain-backups/backup-2025-12-06.tar.gz /tmp/

# 3. Verify restore
kubectl exec -n production <pod-name> -- tar -tzf /tmp/backup-2025-12-06.tar.gz

# 4. Apply backup data
# (Specific to your data structure)

# 5. Verify data integrity
curl http://localhost:8080/health
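Since backup names embed an ISO date (as in step 2's `backup-2025-12-06.tar.gz`), the newest archive from step 1's listing can be picked with a lexical sort. A small convenience helper, assuming that naming scheme:

```shell
# latest_backup NAME... -> the newest date-stamped archive name
# ISO dates (YYYY-MM-DD) sort lexically, so a plain sort suffices.
latest_backup() {
  printf '%s\n' "$@" | sort | tail -n 1
}

latest_backup backup-2025-12-04.tar.gz backup-2025-12-06.tar.gz \
              backup-2025-12-05.tar.gz   # -> backup-2025-12-06.tar.gz
```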

Failover Procedure

# 1. Detect failure
kubectl get pods -n production | grep -i error

# 2. Remove failed node (if hardware failure)
kubectl drain <failed-node> --ignore-daemonsets --delete-emptydir-data

# 3. Kubernetes automatically respawns pods on healthy nodes

# 4. Verify cluster health
kubectl get nodes
kubectl get pods -n production

Performance Tuning

CPU & Memory Optimization

# Monitor resource usage
kubectl top pods -n production --sort-by=memory

# Adjust if needed
kubectl set resources deployment timechain \
  -c=timechain \
  --requests=cpu=1,memory=4Gi \
  --limits=cpu=2,memory=8Gi

Network Optimization

# Enable network policies
kubectl apply -f network-policy.yaml

# Optimize DNS resolution
kubectl exec -n production <pod-name> -- nslookup kubernetes.default.svc.cluster.local

Storage Optimization

# Monitor disk usage
kubectl exec -n production <pod-name> -- df -h

# Enable compression
# (In config.toml)
[storage]
compression_enabled = true
compression_algorithm = "zstd"

Production Checklist

Before deploying to production:
  • All 313 tests passing
  • Performance SLAs validated
  • Security audit complete
  • Backup and recovery tested
  • Monitoring dashboards ready
  • Alerting rules configured
  • Team trained on deployment
  • Runbooks updated
  • Rollback procedure verified
  • Load testing completed
  • Documentation reviewed
  • Go/no-go approval obtained

Support & Escalation

For Issues:
  1. Check logs: kubectl logs -n production deployment/timechain
  2. Consult troubleshooting guide above
  3. File issue: https://github.com/timechain/protocol/issues
  4. Contact support: support@timechain.io

Document Info

Version: 1.0.0
Last Updated: December 6, 2025
Status: Production Ready ✅