KALNET Architecture Expansion: Agent Prompt Strategy

Overview: The Build Pipeline

┌─────────────────────────────────────────────────────────────────┐
│                    KALNET EXPANSION STRATEGY                    │
│                                                                 │
│  Three Execution Stages + One Validation Stage                 │
│  Each stage has serial dependencies + parallel execution       │
│                                                                 │
│  STAGE 0 (Foundation)    → STAGE 1 (Core Services)             │
│                          → STAGE 2 (Integration & Polish)      │
│                          → STAGE 3 (Validation & Docs)         │
│                                                                 │
│  Execution model:                                               │
│  - Stages must execute sequentially (stage N+1 depends on N)   │
│  - Within each stage, agents CAN run in parallel               │
│  - Some tasks have internal dependencies (marked with ※)       │
└─────────────────────────────────────────────────────────────────┘

STAGE 0: Foundation & Infrastructure

Duration: ~30 mins | Parallelization: 4 agents | Dependencies: None

This stage creates the scaffolding everything else depends on. Run all 4 agents in parallel.

Agent 0.1: Network & Infrastructure Layer

Role: Define networking, expose services, create ingress patterns
Prompt:
You are building KALNET's networking and infrastructure layer.
Your task is to create:

1. Enhanced docker-compose.yml that includes:
   - Traefik reverse proxy service (listens on 80, 443)
   - Service labels for automatic routing
   - Health checks on all services
   - Resource limits (CPU, memory)
   - Restart policies with backoff
   - Network segmentation (if needed for security)

2. Traefik configuration:
   - File-based provider for routing rules
   - Let's Encrypt support (self-signed for now)
   - Dashboard on /dashboard

3. Environment configuration:
   - .env.example file with all configurable variables
   - Instructions for customizing domain, ports, etc.

4. Network documentation:
   - Updated architecture showing Traefik as entry point
   - Port mapping diagram (which container maps to which public port)
   - SSL/TLS flow diagram

Requirements:
- All services should be accessible through Traefik
- No port conflicts with existing setup
- Backward compatible (existing docker-compose still works)
- Include validation commands to verify Traefik is routing correctly

Deliverables:
- docker-compose.prod.yml (production-ready with Traefik)
- traefik-config.yml
- .env.example
- KALNET-NETWORKING.md (architecture + routing rules)
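
As a rough illustration of the "service labels for automatic routing" item in this prompt, the Python sketch below generates Traefik v2-style Docker labels for one service. The router name, the `websecure` entrypoint name, the `jellyfin.kalnet.example` domain, and port 8096 are assumptions chosen for illustration, not values taken from the existing KALNET setup.

```python
def traefik_labels(name: str, host: str, port: int,
                   entrypoint: str = "websecure") -> dict[str, str]:
    """Build Docker labels that ask Traefik to route `host` to a container port.

    `entrypoint` must match an entrypoint defined in the Traefik static
    config; "websecure" is only an assumed name.
    """
    return {
        "traefik.enable": "true",
        f"traefik.http.routers.{name}.rule": f"Host(`{host}`)",
        f"traefik.http.routers.{name}.entrypoints": entrypoint,
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }


if __name__ == "__main__":
    # Print the labels in the list form used under `labels:` in docker-compose.
    labels = traefik_labels("jellyfin", "jellyfin.kalnet.example", 8096)
    for key, value in labels.items():
        print(f"- {key}={value}")
```

Generating labels from one function keeps router and service names consistent, which is the usual source of routing mistakes when labels are written by hand.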

Agent 0.2: Persistent Storage & Data Architecture

Role: Design data layout, backup strategy, volume management
Prompt:
You are designing KALNET's persistent storage and data architecture.
Your task is to create:

1. Enhanced volume strategy:
   - Map out each service's data needs (read-only vs read-write)
   - Design separation: config vs cache vs media vs user data
   - Create mount points that make sense for backups/recovery
   - Add a backup volume strategy

2. Storage initialization script (bash):
   - Creates directory structure on host machine
   - Sets proper ownership (docker user)
   - Sets proper permissions (644 for files, 755 for dirs)
   - Pre-creates required subdirectories for each service
   - Generates a storage inventory

3. Data migration guide:
   - How to move media INTO nas-storage from existing locations
   - How to structure media folders for Jellyfin (Movies/, Shows/, Games/)
   - Backup strategy (what to back up, what's ephemeral)

4. Documentation:
   - Data flow diagram (which service writes/reads what)
   - Storage sizing calculator (MB per game, per movie, per workflow)
   - Recovery procedures (how to restore from backup)

Requirements:
- Idempotent (safe to run multiple times)
- Cross-platform aware (paths work on Linux/Mac/Windows hosts)
- Clear ownership/permission model
- Include validation script to verify storage is ready

Deliverables:
- init-storage.sh (initialization script)
- KALNET-STORAGE.md (architecture + recovery procedures)
- storage-structure.txt (example directory tree)
- backup-strategy.md
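
The idempotency requirement in this prompt can be sketched as follows. The prompt asks for a bash script; this Python version only illustrates the logic, and the directory names in `LAYOUT` are hypothetical placeholders for whatever layout the agent designs.

```python
import os
from pathlib import Path

# Hypothetical layout: config, cache, media and backups kept apart.
LAYOUT = [
    "config/jellyfin",
    "config/n8n",
    "cache",
    "media/Movies",
    "media/Shows",
    "media/Games",
    "backups",
]


def init_storage(root: str, layout: list[str] = LAYOUT) -> list[Path]:
    """Create the directory tree under `root`; safe to call repeatedly."""
    created = []
    for rel in layout:
        path = Path(root) / rel
        # exist_ok=True makes a second run a no-op instead of an error.
        path.mkdir(parents=True, exist_ok=True)
        # Set 755 on the listed directory itself (parents keep their mode).
        os.chmod(path, 0o755)
        created.append(path)
    return created
```

Running `init_storage("/srv/kalnet")` twice should leave the same tree in place both times; the `/srv/kalnet` path is likewise only an example.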

Agent 0.3: Observability & Monitoring Foundation

Role: Logging, metrics, health checks infrastructure
Prompt:
You are building KALNET's observability layer (logging, metrics, monitoring).
Your task is to create:

1. Centralized logging:
   - Add logging driver to docker-compose (json-file with rotation)
   - Create a log aggregation container (e.g., Loki or simple file-based)
   - Configure all services to log to centralized store
   - Create log levels for each service (debug, info, warn, error)

2. Health check framework:
   - Define health check probes for each service:
     * HTTPEndpoint (for web services)
     * TCPPort (for Samba)
     * Custom script (if needed)
   - Add timeouts, intervals, retry logic
   - Create a health dashboard API endpoint

3. Metrics & monitoring:
   - Prometheus-compatible metrics endpoints (or mock them)
   - Create a lightweight metrics collector (Go-based)
   - Track: container uptime, service response times, volume usage
   - Metrics should integrate with Discovery Service (extended version)

4. Alerting configuration:
   - Define alert thresholds (service down, disk full, high CPU)
   - Create alerting rules file
   - Integration points (where alerts go: email, webhook, etc.)

5. Monitoring dashboard:
   - Create a simple HTML dashboard that shows:
     * Service status (color-coded: green/yellow/red)
     * Last health check time
     * Resource usage (from docker stats)
     * Recent logs
   - Can be served from Discovery Service (add /dashboard route)

Requirements:
- Non-invasive (doesn't break existing services)
- Lightweight (no performance impact on weak machines)
- Extensible (easy to add new metrics)
- Include sample alert rules

Deliverables:
- observability-compose-fragment.yml (docker-compose additions)
- metrics-collector.go (lightweight metrics service)
- alerting-rules.yaml
- dashboard.html + dashboard.go (serving logic)
- KALNET-OBSERVABILITY.md
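
One way to read the health check framework in this prompt (probe, retry, then map to a green/yellow/red status) is sketched below in Python. The function names and the exact meaning given to "yellow" are assumptions; the prompt leaves those details to the agent, and it does not fix an interval between attempts, so none is included here.

```python
import socket
from typing import Callable


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> Callable[[], bool]:
    """Return a probe that succeeds if a TCP connection can be opened."""
    def probe() -> bool:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    return probe


def check_service(probe: Callable[[], bool], retries: int = 3) -> str:
    """Run `probe` up to `retries` times and map the outcome to a color.

    green  -> first attempt succeeded
    yellow -> succeeded only after at least one failed attempt
    red    -> every attempt failed
    """
    for attempt in range(retries):
        try:
            if probe():
                return "green" if attempt == 0 else "yellow"
        except OSError:
            pass  # refused connections and timeouts count as a failed attempt
    return "red"
```

A call such as `check_service(tcp_probe("samba", 445))` would cover the TCP port case; the host name `samba` is an assumed container name.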

Agent 0.4: Security & Authentication Foundation

Role: User management, credential storage, access control patterns
Prompt:
You are designing KALNET's security and authentication layer.
Your task is to create:

1. Authentication strategy:
   - Design authentication model for all services:
     * Jellyfin: use built-in auth (with local user accounts)
     * n8n: use built-in auth (with admin setup)
     * Samba: continue with guest (OR design simple user management)
     * Discovery: API key authentication (simple token-based)
   - Document how users are created/managed in each service

2. Secrets management:
   - Create a secrets directory structure:
     * API keys for external services
     * Database credentials (future)
     * Encryption keys (for sensitive data)
   - Design a secret injection pattern for docker-compose
   - DO NOT store secrets in git (add to .gitignore)
   - Create a secrets template file (.secrets.example)

3. Network security:
   - Document internal vs external network access
   - Design firewall rules (if using ufw)
   - SSL/TLS strategy (self-signed for MVP, Let's Encrypt for prod)
   - CORS policies for API access

4. Access control matrix:
   - Create a table showing:
     * Who can access what service
     * What actions are allowed
     * What data is accessible
   - Include: family members, external guests, automated systems

5. Security checklist:
   - Pre-deployment security validation
   - Post-deployment hardening steps
   - Regular security maintenance (password rotation, updates)

Requirements:
- Family-friendly (easy enough for non-technical family members)
- Flexible (supports both strict and permissive configurations)
- Secure by default (but with documented overrides)
- Clear documentation of security trade-offs

Deliverables:
- secrets-template/.secrets.example
- security-config.yaml (centralized security settings)
- KALNET-SECURITY.md (architecture + checklist)
- init-security.sh (setup script for permissions, users)
- firewall-rules.sh (ufw rules for Arch Linux)
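
The "simple token-based" API key authentication proposed for the Discovery Service could look roughly like the Python sketch below. The `Authorization: Bearer` header format and both function names are assumptions; the prompt does not specify how the key is transmitted.

```python
import hmac
import secrets
from typing import Optional


def new_api_key() -> str:
    """Generate a random URL-safe token to hand to a client."""
    return secrets.token_urlsafe(32)


def is_authorized(header_value: Optional[str], valid_keys: set[str]) -> bool:
    """Check an `Authorization: Bearer <token>` header against known keys."""
    if not header_value or not header_value.startswith("Bearer "):
        return False
    presented = header_value[len("Bearer "):].encode("utf-8")
    # compare_digest avoids leaking key contents through timing differences.
    return any(
        hmac.compare_digest(presented, key.encode("utf-8"))
        for key in valid_keys
    )
```

Keys generated this way belong in the secrets directory described in the prompt, not in the repository.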

STAGE 1: Core Services & Capabilities

Duration: ~45 mins | Parallelization: 5 agents | Dependencies: Stage 0

Once infrastructure is in place, build out the actual services. These can mostly run in parallel, with a few internal dependencies noted.

Agent 1.1: Game Streaming & Remote Access

Role: Sunshine/Moonlight setup for in-home game streaming
Prompt:
You are building game streaming capabilities for KALNET (Sims, Crazy Taxi, etc).
Your task is to create:

1. Game streaming service (Sunshine):
   - Create a Sunshine Docker container (or setup guide for bare-metal install)
   - Configure for low-latency streaming
   - Audio/video encoding profiles for different network conditions
   - Game detection/shortcuts configuration

2. Client support documentation:
   - Moonlight client setup for different devices:
     * iOS/iPad
     * Android
     * Windows
     * Mac
     * Linux
   - Per-platform configuration and troubleshooting

3. Integration with Jellyfin:
   - Design how games are catalogued in Jellyfin
   - Metadata format for games (artwork, description, play instructions)
   - Launch shortcuts from Jellyfin UI → Sunshine

4. Performance tuning guide:
   - Bitrate recommendations based on network
   - Resolution/FPS trade-offs
   - CPU vs GPU encoding decisions
   - Network optimization (WiFi 6, QoS settings)

5. Game management:
   - Scripts to import games into the system
   - Game library organization structure
   - How to add new games easily

Requirements:
- Works on weak hardware (Raspberry Pi-grade minimum)
- Supports WiFi streaming (with graceful degradation)
- Easy game discovery and launching
- Family-friendly UI

Deliverables:
- sunshine-docker-compose.yml (or setup guide)
- sunshine.conf + apps.json (Sunshine configuration and application list)
- game-import-script.sh
- KALNET-GAME-STREAMING.md
- client-setup-guides/ (per-platform)

Agent 1.2: Private Communications & Social Layer

Role: End-to-end encrypted family messaging (Matrix/Synapse or alternative)
Prompt:
You are building KALNET's private communications layer (family chat, notifications).
Your task is to create:

1. Communication service setup (Matrix/Synapse OR simple alternative):
   - Docker container for messaging service
   - Client library for mobile/desktop access
   - E2E encryption by default
   - Message history (searchable, with storage limits)

2. Integration points:
   - n8n → send notifications to family chat
   - Jellyfin → send "X is watching Y" notifications
   - Service alerts → send to notification channel
   - File share notifications ("file added to NAS")

3. Client apps:
   - Setup guides for Element (Matrix client) on:
     * iOS
     * Android
     * Web (served from KALNET itself)
   - Deep links for easily joining rooms

4. Chat room structure:
   - Design default rooms for family:
     * #general
     * #alerts (service notifications)
     * #requests (movie/game requests)
     * #projects (collaborative tasks)
   - Invite management (keep family-only)

5. Backup & recovery:
   - How to back up chat history
   - Encryption key backup (for recovery)
   - Privacy/retention settings

Requirements:
- E2E encrypted (family privacy)
- Mobile-friendly (iPhone, Android)
- Works on constrained hardware
- Supports rich media (links, images, files)
- Integrates with n8n (webhooks to send messages)

Deliverables:
- comms-docker-compose.yml
- comms-init.sh (room creation, user setup)
- client-setup-guides/ (per-platform)
- KALNET-COMMUNICATIONS.md
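
If the Matrix/Synapse option is chosen, the webhook-style integrations listed in this prompt come down to one client-server API call that sends an `m.room.message` event. The Python sketch below only builds that request without sending it; the homeserver URL, room ID, and token shown in the usage note are placeholders, and note that messages sent this way are not end-to-end encrypted unless the sender implements Matrix encryption itself.

```python
import json
import urllib.parse
import urllib.request


def build_matrix_message(homeserver: str, room_id: str, txn_id: str,
                         text: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a request that posts a text message to a room."""
    url = (
        f"{homeserver}/_matrix/client/v3/rooms/"
        f"{urllib.parse.quote(room_id, safe='')}/send/m.room.message/"
        f"{urllib.parse.quote(txn_id, safe='')}"
    )
    body = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",  # the transaction ID in the path makes retries idempotent
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

For example, `build_matrix_message("http://synapse:8008", "!alerts:kalnet.example", "txn1", "Jellyfin is down", token)` produces a request that `urllib.request.urlopen` could send; an n8n HTTP Request node would issue the same call.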

Agent 1.3: Custom Code Deployment & CI/CD

Role: GitOps pipeline for deploying custom services and containers
Prompt:
You are building KALNET's code deployment and CI/CD pipeline.
Your task is to create:

1. Git-based deployment system:
   - Design a simple git webhook receiver
   - Triggers on push to specific branches (main → auto-deploy, dev → staging)
   - Validates, builds, and deploys services

2. Build configuration:
   - Dockerfile templates for common languages (Go, Python, Node, Rust)
   - Build caching strategy (minimize rebuild times)
   - Multi-stage builds (optimize image size)
   - Private registry setup (if needed)

3. Deployment pipeline:
   - Pre-deployment validation (linting, tests)
   - Health checks post-deployment
   - Automatic rollback on failure
   - Deployment status notifications to family chat

4. Service templates:
   - Template for "deploying a Go HTTP service"
   - Template for "deploying a Python automation script"
   - Template for "deploying a Node.js microservice"
   - Each includes: Dockerfile, docker-compose snippet, health check

5. Developer experience:
   - Local development setup (docker-compose for dev)
   - Debug mode (verbose logging, hot reload if possible)
   - Deployment instructions (git push → live in under 2 minutes)

6. Rollback & versioning:
   - Automatic version tagging (by commit hash or semver)
   - One-command rollback ("deploy [service] [version]")
   - Keep last 3 versions of each service

Requirements:
- Works with Go, Python, TypeScript (your languages)
- Simple enough you can deploy mid-conversation
- Fast feedback (build + deploy < 2 minutes)
- Clear logging of deployment success/failure
- Secure (no accidental secrets in logs)

Deliverables:
- webhook-receiver.go (git webhook listener)
- deployment-pipeline.sh (build + deploy orchestration)
- .dockerfile.templates/ (per-language)
- KALNET-DEPLOYMENT-PIPELINE.md
- deploy.sh (user-friendly deploy script)
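
The core of the webhook receiver is small: authenticate the push, then decide where it deploys. The Python sketch below assumes a GitHub-style `sha256=<hex>` HMAC signature over the raw body, which the prompt does not specify, and uses "production" as a stand-in name for the auto-deploy target. The deliverable itself is specified as Go; this only illustrates the logic.

```python
import hashlib
import hmac
from typing import Optional

# Branch mapping from the prompt: main -> auto-deploy, dev -> staging.
# "production" is an assumed name for the auto-deploy environment.
BRANCH_TARGETS = {
    "refs/heads/main": "production",
    "refs/heads/dev": "staging",
}


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Verify a `sha256=<hex>` HMAC signature over the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))


def deploy_target(ref: str) -> Optional[str]:
    """Return the environment a pushed ref deploys to, or None to ignore it."""
    return BRANCH_TARGETS.get(ref)
```

Checking the signature before parsing the payload keeps unauthenticated requests away from the build step, and returning `None` for unknown branches makes "do nothing" the default.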

Agent 1.4: Advanced n8n Workflows & Integrations ※

Role: Pre-built workflows for common KALNET tasks (depends on comms from 1.2)
Prompt:
You are designing KALNET's automation layer using n8n.
Your task is to create:

1. Pre-built workflow templates:
   a) "File added to NAS" → notification to family chat
   b) "Movie added to Jellyfin" → send details to family
   c) "Service goes down" → alert family
   d) "Scheduled backup" → runs daily, stores to external drive
   e) "Family request" → new movie/game requested via form

2. Workflow development guide:
   - How to create a new workflow in n8n
   - Common patterns (webhooks, HTTP, file operations)
   - Integration with external services:
     * Unsplash (for random art)
     * Weather API (display in dashboard)
     * Calendar sync (show family calendar in dashboard)
     * Custom APIs (your own Go services)

3. Secrets management in n8n:
   - How to store API keys safely
   - Use environment variables from .env
   - Rotate credentials (procedure + checklist)

4. Testing & debugging:
   - How to test a workflow without running it
   - Debug mode (step-by-step execution)
   - Common failure modes and fixes

5. Workflow library:
   - Exported n8n workflows (JSON files)
   - Import instructions
   - Per-workflow documentation

Requirements:
- Workflows can notify via the comms service (from 1.2)
- All async/non-blocking (don't pause KALNET)
- Clear error handling (don't crash on external API failure)
- Logging visible in KALNET dashboard

Deliverables:
- n8n-workflows/ (exported workflow JSONs)
- n8n-integration-guide.md
- workflow-templates/ (templates for common patterns)
- KALNET-AUTOMATION.md
- secrets-example.env (for API keys in workflows)

Agent 1.5: Extended Discovery Service & Service Mesh

Role: Enhanced discovery with metrics, service routing, and inter-service communication
Prompt:
You are extending the KALNET Discovery Service into a lightweight service mesh.
Your task is to create:

1. Enhanced discovery service (Go):
   - Extend main.go with:
     * Service registration API (POST /register)
     * Custom metadata (version, tags, capabilities)
     * Service-to-service authentication (mTLS or simple API key)
     * Weighted routing (for canary deployments)
     * Circuit breaker pattern (fail open/closed)

2. Service mesh capabilities:
   - Service location (DNS-like, returns IP:port)
   - Load balancing (round-robin or weighted)
   - Retry logic (configurable per service)
   - Rate limiting (token bucket per service)
   - Request tracing (correlation IDs)

3. Admin API:
   - Register new service: POST /api/services
   - Deregister: DELETE /api/services/{name}
   - Get service details: GET /api/services/{name}
   - Update service metadata: PATCH /api/services/{name}
   - View all deployments: GET /api/deployments

4. Integration with deployment pipeline:
   - On successful deploy, auto-register service
   - On service shutdown, auto-deregister
   - Health check updates every 10s (more frequent than before)

5. Observability:
   - Metrics: request count, latency, error rate (per service)
   - Tracing: log inter-service calls
   - Dashboard: visualize service topology
   - Alerts: when service latency > threshold

Requirements:
- Stateless (can run multiple instances)
- Fast (sub-millisecond lookups)
- Extensible (easy to add new mesh policies)
- Go-idiomatic code (using your preferred patterns)

Deliverables:
- enhanced-discovery-service.go (extended version)
- service-mesh-config.yaml
- KALNET-SERVICE-MESH.md
- examples/ (example service registration)
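
Two of the mesh policies named in this prompt, token-bucket rate limiting and round-robin load balancing, are compact enough to sketch. The prompt asks for Go; the Python below only shows the logic, and the class and function names are illustrative. The injectable `clock` parameter is an addition to make the limiter testable.

```python
import itertools
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def round_robin(instances):
    """Return an endless iterator that cycles through service instances."""
    return itertools.cycle(instances)
```

A per-service limiter would keep one `TokenBucket` per registered service name. Note that this state lives in memory, so running multiple instances, as the "stateless" requirement suggests, would need shared storage or per-instance limits.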

STAGE 2: Integration & Polish

Duration: ~30 mins | Parallelization: 3 agents | Dependencies: Stage 1

Now that core services exist, integrate them and polish the experience.

Agent 2.1: Unified Dashboard & UI Layer

Role: Web UI for discovering, configuring, and monitoring all KALNET services
Prompt:
You are building KALNET's unified dashboard (web UI).
Your task is to create:

1. Dashboard application (React or vanilla JS):
   Single page showing:
   - Service status (connected to Discovery Service)
   - Quick launch buttons (Jellyfin, n8n, etc.)
   - Recent activity (what's being watched, files added, etc.)
   - System health (CPU, memory, disk, network)
   - Family chat notifications (from comms service)
   - Alerts and warnings

2. Dashboard features:
   a) Service Cards:
      - Service name, status (green/red), uptime
      - Quick links to service UI
      - Last health check time
   
   b) Activity Feed:
      - "Movie X added to Jellyfin"
      - "User Y started watching Z"
      - "Service X went down" (with timestamp)
      - "Backup completed"
      - Filterable by type and time
   
   c) System Metrics:
      - Real-time graphs (last 24h):
        * CPU usage
        * Memory usage
        * Disk usage
        * Network throughput
      - Trends (is it getting worse?)
   
   d) Quick actions:
      - Restart service (button → triggers deployment pipeline)
      - View logs (tail last 50 lines)
      - Run backup (immediately)
      - Trigger workflow (dropdown of n8n workflows)

3. Dashboard API:
   - GET /api/dashboard/status (all services + metrics)
   - GET /api/dashboard/activity (recent events)
   - GET /api/dashboard/metrics (time-series data)
   - POST /api/dashboard/actions/{action} (quick actions)

4. Authentication:
   - Simple login (family member accounts)
   - Session management (stay logged in for 30 days)
   - Per-user preferences (dark mode, notifications, etc.)

5. Mobile responsive:
   - Works great on phones (controls are touch-friendly)
   - Responsive layout (scales from 320px to 4K)

Requirements:
- Single page app (no page reloads)
- WebSocket for real-time updates (optional, fallback to polling)
- Works offline (cached data)
- <500KB initial load (performance critical)
- Accessibility (WCAG 2.1 AA)

Deliverables:
- dashboard/ (React or vanilla JS app)
- dashboard-server.go (serves dashboard + API)
- KALNET-DASHBOARD.md
- screenshots/ (example UI states)

Agent 2.2: Configuration & Setup Wizard

Role: Interactive setup experience for first-time deployment
Prompt:
You are building KALNET's setup wizard and configuration system.
Your task is to create:

1. Interactive setup wizard:
   - First-time setup (runs on first deploy)
   - Step-by-step configuration:
     a) Basic info (network name, your name, family members)
     b) Service selection (which services to enable)
     c) Storage location (where to store media/files)
     d) Network exposure (internal-only vs external with TLS)
     e) Backup location (where to back up data)
     f) User accounts (create family member accounts)
   - Validates all inputs before proceeding
   - Generates config files automatically

2. Configuration management:
   - Single config file (YAML or TOML)
   - All configurable options documented
   - Ability to re-run wizard to update config
   - Environment variable overrides (for Docker/k8s)

3. Configuration validation:
   - Pre-flight checks:
     * Disk space available?
     * Ports available?
     * Required permissions?
     * Docker installed?
   - Post-setup checks:
     * Can reach all services?
     * All volumes mounted?
     * All users created?

4. Advanced configuration:
   - For power users (like you):
     * Manual YAML editing
     * API-based config changes
     * Bulk operations (enable 10 services at once)

5. Configuration backup & restore:
   - Export config to portable file
   - Share config between machines
   - Restore config from backup

Requirements:
- Non-technical friendly (parents could do this)
- Fast (<5 minutes for complete setup)
- Idempotent (safe to re-run)
- Clear error messages (not cryptic)

Deliverables:
- setup-wizard.go (interactive setup)
- config-schema.yaml (defines valid config)
- config.example.yaml (fully documented example)
- KALNET-CONFIGURATION.md
- validate-config.sh (standalone validation)
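
The pre-flight checks in this prompt (disk space, ports, Docker) map onto a few standard-library calls. The Python sketch below is one possible shape for them; the function names are hypothetical, and the wizard deliverable itself is specified as Go.

```python
import shutil
import socket


def has_free_space(path: str, required_bytes: int) -> bool:
    """True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if this process can bind host:port right now.

    Binding ports below 1024 can also fail for lack of privileges,
    so a False result there does not prove the port is in use.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False


def docker_installed() -> bool:
    """True if a `docker` executable is on PATH."""
    return shutil.which("docker") is not None
```

Each check returns a plain boolean so the wizard can turn a failure into a specific, readable message rather than a stack trace.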

Agent 2.3: Documentation & Onboarding

Role: Complete user and developer documentation
Prompt:
You are creating comprehensive documentation for KALNET.
Your task is to create:

1. User documentation:
   a) Getting Started Guide
      - Installation (step-by-step with screenshots)
      - Initial configuration
      - Creating family accounts
      - Adding media to Jellyfin
      - Accessing from different devices
   
   b) Service guides (per service):
      - Jellyfin: watch videos, manage libraries, share with family
      - n8n: understanding workflows, creating automations
      - Samba: access files from Mac/Windows/Linux
      - Game streaming: play games from any room
      - Private chat: send encrypted messages
   
   c) Common tasks:
      - "How do I add a movie?"
      - "How do I watch from my iPhone?"
      - "How do I set up a backup?"
      - "My service went down, what do I do?"
      - "How do I add a new family member?"
   
   d) Troubleshooting guide:
      - "Service not responding"
      - "Can't connect from my phone"
      - "Performance is slow"
      - "Storage is full"
      - "I forgot my password"

2. Developer documentation:
   a) Architecture guide (boxes and arrows, your style):
      - System design
      - Data flow
      - Service interactions
      - Deployment model
   
   b) API reference:
      - Discovery Service API
      - Dashboard API
      - Deployment API
      - Custom service examples
   
   c) Contributing guide:
      - How to add a new service
      - How to add a new workflow
      - Code style and standards
      - Testing procedures
   
   d) Troubleshooting for developers:
      - Common build errors
      - Debug mode setup
      - Performance profiling
      - Log analysis

3. Quick reference cards:
   - CLI commands cheat sheet
   - API endpoints cheat sheet
   - Service ports quick ref
   - Keyboard shortcuts

4. Video tutorials (optional):
   - Setup walkthrough (2 mins)
   - Jellyfin basics (3 mins)
   - Game streaming setup (2 mins)
   - n8n workflow creation (5 mins)

Requirements:
- Written for your family (non-technical)
- But also for developers (technical depth available)
- Searchable and indexed
- Keyboard navigable
- Mobile-friendly

Deliverables:
- docs/ (Markdown-based docs site)
- docs/index.md (home page with quick links)
- docs/user/ (user guides)
- docs/developer/ (API reference + architecture)
- docs/troubleshooting/ (common issues)
- quick-ref-cards.pdf (printable cheat sheets)
- KALNET-DOCS.md (guide to the documentation)

STAGE 3: Validation & Testing

Duration: ~20 mins | Parallelization: 2 agents | Dependencies: Stage 2

Final validation and test automation.

Agent 3.1: Integration Tests & Validation Suite

Role: Automated testing to ensure everything works together
Prompt:
You are building KALNET's test and validation suite.
Your task is to create:

1. Integration test suite (Go or similar):
   Tests that verify:
   - All services start successfully
   - Health checks pass
   - Discovery Service sees all services
   - Cross-service communication works:
     * Jellyfin can read from NAS
     * Samba can mount storage
     * n8n can reach external APIs
     * Dashboard can reach all services
   - User flows:
     * Create account → login → watch video → logout
     * Add file to NAS → appears in Jellyfin → can watch
     * Create automation → runs on schedule → notification sent
   - Error scenarios:
     * Service crash → auto-restart works
     * Network failure → services recover
     * Storage full → graceful degradation

2. Performance tests:
   - Measure startup time for all services
   - Measure discovery latency (<100ms)
   - Measure media streaming latency (<500ms)
   - Measure Samba file access latency
   - Load test: multiple concurrent Jellyfin streams

3. Stress tests:
   - Run for 24 hours
   - Monitor for memory leaks
   - Monitor for disk space issues
   - Monitor for network saturation

4. Validation checklist:
   - Security: ports not exposed to internet (unless configured)
   - Performance: baseline metrics documented
   - Reliability: auto-recovery from failures
   - Usability: all UIs responsive and fast

Requirements:
- Tests are automated (CI/CD compatible)
- Tests can run locally (docker-compose up -d && ./run-tests.sh)
- Clear pass/fail output
- Detailed failure reports (not just "FAIL")
- Integration suite runs in <5 minutes (the 24-hour stress test runs separately)

Deliverables:
- tests/ (test suite)
- tests/integration_test.go (main tests)
- tests/performance_test.go (benchmarks)
- tests/scenarios/ (user journey tests)
- KALNET-TESTING.md
- run-tests.sh (simple test runner)

Agent 3.2: Release & Deployment Validation

Role: Pre-release checklist and production deployment procedures
Prompt:
You are creating KALNET's release and deployment validation process.
Your task is to create:

1. Pre-release checklist:
   - All tests pass ✓
   - Documentation updated ✓
   - Security review completed ✓
   - Performance acceptable ✓
   - No known critical bugs ✓
   - Database migrations safe ✓

2. Release process:
   - Semantic versioning (x.y.z)
   - Automated version tagging (git tags)
   - Changelog generation (what changed)
   - Release notes (user-friendly)
   - Docker image versioning (both :latest and :x.y.z)

3. Deployment procedures:
   a) Staging deployment:
      - Deploy to staging environment
      - Run smoke tests
      - Manual QA checklist
      - Get sign-off before production
   
   b) Production deployment:
      - Zero-downtime deployment strategy
      - Rollback procedure (if things go wrong)
      - Health checks post-deploy
      - Notification to family (deployment in progress, etc.)
   
   c) Rollback procedure:
      - One-command rollback to previous version
      - Data consistency checks after rollback
      - Status notification

4. Monitoring during deployment:
   - Real-time dashboard showing deployment progress
   - Service health checks every 10 seconds
   - Automatic rollback on health check failure
   - Notification on completion (success or failure)

5. Post-deployment validation:
   - All services healthy ✓
   - All users can access services ✓
   - No errors in logs ✓
   - Performance metrics normal ✓
   - Disk space OK ✓

Requirements:
- Deployments take <30 seconds
- Automatic rollback on failure
- Clear status indicators throughout
- Email/chat notifications of deployment status

Deliverables:
- release.sh (automated release process)
- deploy-prod.sh (production deployment)
- rollback.sh (one-command rollback)
- KALNET-DEPLOYMENT.md (procedures)
- pre-release-checklist.md (manual verification)
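
The versioning rules in this prompt (semantic x.y.z versions, images tagged with both the version and `:latest`) can be illustrated with a short Python sketch. The function names and the `kalnet/discovery` image name in the usage note are hypothetical; the deliverable itself is a shell script.

```python
def bump(version: str, part: str) -> str:
    """Increment the major, minor or patch component of an x.y.z version."""
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def image_tags(image: str, version: str) -> list[str]:
    """Tags to push for one release: the exact version plus :latest."""
    return [f"{image}:{version}", f"{image}:latest"]
```

For example, `image_tags("kalnet/discovery", bump("1.4.2", "minor"))` yields tags for version 1.5.0. Rolling back then means redeploying an earlier `:x.y.z` tag, which is why the pinned tag matters more than `:latest`.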

Execution Plan Summary

PARALLEL EXECUTION TIMELINE
───────────────────────────────────────────────────────────────────

STAGE 0 (Foundation)                           Duration: ~30 mins
├─ Agent 0.1: Networking             ├─ Agent 0.2: Storage
├─ Agent 0.3: Observability          └─ Agent 0.4: Security
│   All 4 run in parallel ─────────────────────────────────────────→
│                                                    ✓ Complete

STAGE 1 (Core Services)                        Duration: ~45 mins
├─ Agent 1.1: Game Streaming         ├─ Agent 1.2: Communications
├─ Agent 1.3: CI/CD Pipeline         ├─ Agent 1.4: n8n Workflows ※
└─ Agent 1.5: Enhanced Discovery
│   All 5 run in parallel ─────────────────────────────────────────→
│   (1.4 slightly dependent on 1.2, but can proceed in parallel)
│                                                    ✓ Complete

STAGE 2 (Integration & Polish)                  Duration: ~30 mins
├─ Agent 2.1: Dashboard & UI         ├─ Agent 2.2: Setup Wizard
└─ Agent 2.3: Documentation
│   All 3 run in parallel ─────────────────────────────────────────→
│                                                    ✓ Complete

STAGE 3 (Validation)                           Duration: ~20 mins
├─ Agent 3.1: Integration Tests
└─ Agent 3.2: Release Procedures
│   Both run in parallel ─────────────────────────────────────────→
│                                                    ✓ Complete

TOTAL WALL-CLOCK TIME: ~125 mins (30 + 45 + 30 + 20; stages run in sequence, agents within a stage in parallel)
TOTAL AGENT-HOURS: ~5.5 hours (if run sequentially)

Agent Orchestration Model

Prompt Structure (What Each Agent Gets)

Each agent should receive:
CONTEXT:
- Your role and responsibility
- What already exists (from previous stages)
- What depends on your work
- Success criteria

TASK:
1. Primary deliverables
2. Secondary considerations
3. Integration points
4. Testing requirements

CONSTRAINTS:
- Code style/patterns
- Performance targets
- Security requirements
- File/directory structure

EXAMPLES:
- Example code snippets
- Example output format
- Example integration with other services

SUCCESS CRITERIA:
- Specific, measurable outcomes
- "Integration test passes" not "works"
- Checkable by human or automated test

Coordination Points

Between Stages:
  • Stage N agents must wait for Stage N-1 to complete
  • Completion marker: all primary deliverables present and validated

Within Stages:
  • Agents declare dependencies in their prompt
  • Example: “Agent 1.4 waits for Agent 1.2 to deploy comms service”
  • Fallback: if dependency not ready, agent stubs/mocks it

Handoff Checklist:
Before proceeding to next stage:
- [ ] All agents in current stage report complete
- [ ] All primary deliverables present
- [ ] Integration tests pass
- [ ] No critical blocking issues
- [ ] Documentation for stage is complete
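
The first two items of the handoff checklist are mechanical and could be automated. The Python sketch below is a minimal gate under that assumption; the function name and the way agent status is reported (a plain dict) are hypothetical, and the remaining checklist items still need tests or human review.

```python
from pathlib import Path


def stage_complete(agent_done: dict[str, bool], deliverables: list[str],
                   root: str = ".") -> tuple[bool, list[str]]:
    """Check that all agents reported complete and all deliverables exist.

    Returns (ok, problems) so the caller can print exactly what is missing.
    """
    problems = [
        f"agent {agent} not complete"
        for agent, done in agent_done.items() if not done
    ]
    problems += [
        f"missing deliverable: {name}"
        for name in deliverables if not (Path(root) / name).exists()
    ]
    return (not problems, problems)
```

Returning the list of problems rather than a bare boolean follows the document's own success-criteria advice: a specific, checkable outcome instead of "works".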

Parallelization Constraints

Can Run in Parallel (No Dependencies)

  • Stage 0: All 4 agents (they build different concerns)
  • Stage 1: Agents 1.1, 1.3, 1.5 have no dependencies
  • Stage 1: Agents 1.2, 1.4 can proceed with mocks of each other
  • Stage 2: All 3 agents (dashboard, wizard, docs are independent)
  • Stage 3: Both agents (tests and release procedures)

Must Run Sequentially

  • Stages must complete before next stage starts (dependencies flow down)
  • Within stages, some soft dependencies exist (noted with ※)

Optimal Batch Sizes

  • Stage 0: 4 agents (I/O bound, good parallelization)
  • Stage 1: 5 agents (CPU bound for builds, good parallelization)
  • Stage 2: 3 agents (mixed, good parallelization)
  • Stage 3: 2 agents (validation & release, good parallelization)

Total agents available: 14 (could even run in fewer batches with the right grouping)

Per-Agent Estimation

Agent  Complexity  Duration  Output Lines
─────  ──────────  ────────  ────────────────────
0.1    Medium      8 mins    250 (configs + docs)
0.2    Low         6 mins    150
0.3    High        10 mins   400
0.4    Medium      8 mins    300
1.1    High        12 mins   600
1.2    High        12 mins   500
1.3    High        12 mins   700
1.4    Medium      8 mins    400
1.5    High        10 mins   600
2.1    High        12 mins   1000
2.2    Medium      8 mins    400
2.3    Low         8 mins    2000+ (docs)
3.1    High        12 mins   800
3.2    Medium      10 mins   350
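
The per-agent figures in the table above can be totalled two ways: summed (one agent at a time) or by taking the slowest agent in each stage (full parallelism within stages). The Python sketch below does both. Its results come out lower than the stage-level durations quoted earlier; presumably those include coordination and validation overhead, though that reading is an assumption.

```python
# Minutes per agent, copied from the estimation table.
ESTIMATES = {
    "0.1": 8, "0.2": 6, "0.3": 10, "0.4": 8,
    "1.1": 12, "1.2": 12, "1.3": 12, "1.4": 8, "1.5": 10,
    "2.1": 12, "2.2": 8, "2.3": 8,
    "3.1": 12, "3.2": 10,
}


def totals(estimates: dict[str, int]) -> tuple[int, int]:
    """Return (sequential_minutes, parallel_minutes)."""
    by_stage: dict[str, list[int]] = {}
    for agent, minutes in estimates.items():
        stage = agent.split(".")[0]  # "1.4" -> stage "1"
        by_stage.setdefault(stage, []).append(minutes)
    sequential = sum(estimates.values())
    # With full parallelism, each stage takes as long as its slowest agent.
    parallel = sum(max(minutes) for minutes in by_stage.values())
    return sequential, parallel


if __name__ == "__main__":
    print(totals(ESTIMATES))  # (136, 46)
```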

Interdependencies Map

STAGE 0 (all independent)
        │
        ├─────────────────────────────────────┐
        │                                     │
    STAGE 1.1, 1.3, 1.5 (independent)     STAGE 1.2 (comms)
        │                                     │
        │                          STAGE 1.4 (n8n workflows)
        │                            (soft dep on 1.2)
        │                                     │
        ├─────────────────────────────────────┘
        │
        ├─ STAGE 2.1 (dashboard) ◄─ needs data from 1.1, 1.2, 1.3, 1.5
        ├─ STAGE 2.2 (wizard)
        └─ STAGE 2.3 (docs)
        │
        ├──── STAGE 3.1 (tests)
        └──── STAGE 3.2 (release)
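
The map above can also be expressed as a dependency graph and sorted into batches. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+). It models "stage N waits for stage N-1" as every agent depending on all agents of the previous stage, and the `strict_soft_deps` flag, a hypothetical name, turns the soft 1.4 → 1.2 dependency into a hard one.

```python
from graphlib import TopologicalSorter

STAGES = {
    0: ["0.1", "0.2", "0.3", "0.4"],
    1: ["1.1", "1.2", "1.3", "1.4", "1.5"],
    2: ["2.1", "2.2", "2.3"],
    3: ["3.1", "3.2"],
}


def build_graph(stages, strict_soft_deps=False):
    """Map each agent to the set of agents it must wait for."""
    graph = {}
    for stage, agents in stages.items():
        previous = stages.get(stage - 1, [])
        for agent in agents:
            graph[agent] = set(previous)
    if strict_soft_deps:
        graph["1.4"].add("1.2")  # treat the soft dependency as blocking
    return graph


def batches(graph):
    """Group agents into batches; each batch can run fully in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result
```

With the default settings this yields four batches, one per stage. With `strict_soft_deps=True`, agent 1.4 moves into a batch of its own after the other Stage 1 agents, which is the cost of not mocking the comms service.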

How to Use This Document

For a single agent: Copy the prompt for that agent and adjust it for your specific context.

For batch execution: Run the agents from the same stage in parallel. Example batch 1:
# Run 4 agents in parallel for Stage 0
agent 0.1 &
agent 0.2 &
agent 0.3 &
agent 0.4 &
wait  # wait for all to complete
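
The same batch-then-wait pattern can be driven from Python with `asyncio`. In the sketch below, `run_agent` is a placeholder that does no real work; how an agent is actually launched depends on your tooling and is not specified here.

```python
import asyncio


async def run_agent(agent_id: str) -> str:
    """Placeholder for launching one agent; replace with a real invocation."""
    await asyncio.sleep(0)  # stands in for the agent's actual work
    return agent_id


async def run_pipeline(stages, runner=run_agent):
    """Run stages in order; agents inside a stage run concurrently."""
    completed = []
    for agents in stages:
        # gather() returns only once every agent in the stage has finished,
        # which plays the role of `wait` in the shell example.
        completed.append(await asyncio.gather(*(runner(a) for a in agents)))
    return completed


if __name__ == "__main__":
    plan = [["0.1", "0.2", "0.3", "0.4"], ["1.1", "1.2", "1.3", "1.4", "1.5"]]
    print(asyncio.run(run_pipeline(plan)))
```
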
For sequential execution (if you want to do one agent at a time): Follow the numbered list (0.1 → 0.2 → 0.3 → 0.4 → 1.1 → … → 3.2)

Customization: Feel free to:
  • Skip agents you don’t need yet (e.g., skip 0.3 observability for MVP)
  • Reorder within stages based on your priorities
  • Merge agents if some work is redundant
  • Split agents if they’re too large

Final Notes

  • Each prompt is self-contained (agent doesn’t need to read others)
  • Each prompt includes integration points (where it fits with others)
  • Each prompt specifies success criteria (how to know it’s done)
  • Prompts are written for workspace agents (Claude, but could be others)

This gives you a repeatable, parallelizable architecture for building KALNET from MVP to a fully featured home network. 🚀