Overview

This runbook covers operational procedures for managing the SO1 Factory Orchestrator, including multi-agent workflow execution, FORGE gate validation, and stage transitions. These procedures ensure reliable orchestration of complex automation pipelines across all 6 FORGE stages.

Purpose: Provide step-by-step instructions for operating orchestration agents and managing workflow execution
Scope: Factory Orchestrator operations, FORGE Gatekeeper validation, stage management, workflow debugging
Target Audience: Platform operators, DevOps engineers, automation architects

Prerequisites

  • Control Plane API access (CONTROL_PLANE_API_KEY)
  • Railway project access (SO1 Control Plane)
  • n8n workflow access (orchestration workflows)
  • GitHub repository access (so1-io/so1-agents)
  • curl or API client (Postman, Insomnia)
  • Railway CLI (railway command)
  • jq for JSON parsing
  • OpenCode with orchestration agents installed
  • Understanding of FORGE stages (Research → Design → Build → Test → Deploy → Monitor)
  • Familiarity with SO1 agent architecture
  • Basic knowledge of n8n workflow execution
  • Understanding of gate entry/exit criteria
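
The tool and credential prerequisites above can be checked before starting a session. The following is a minimal sketch, assuming the tools are installed under their usual command names (curl, jq, railway). The function names missing_tools and preflight are illustrative and not part of any SO1 tooling.

```shell
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools and the API key named in the prerequisites.
# Returns 0 when everything is present, 1 otherwise.
preflight() {
  missing="$(missing_tools curl jq railway | tr '\n' ' ')"
  if [ -n "$missing" ]; then
    echo "Missing tools: $missing" >&2
    return 1
  fi
  if [ -z "${CONTROL_PLANE_API_KEY:-}" ]; then
    echo "CONTROL_PLANE_API_KEY is not set" >&2
    return 1
  fi
  echo "Preflight OK"
}
```

Calling preflight at the start of an operator session reports what is missing and returns non-zero, so it can gate the rest of a script.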

Procedure 1: Initiate Factory Orchestrator Workflow

Step 1: Prepare Workflow Request

Identify the workflow type and gather required inputs:
# Example: Multi-agent automation pipeline
WORKFLOW_TYPE="automation_pipeline"
WORKFLOW_NAME="webhook-processing-system"
REQUIRED_AGENTS="webhook-engineer,hono-backend,railway-deployer"

Step 2: Create Orchestration Request

curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "workflow_name": "webhook-processing-system",
    "workflow_type": "automation_pipeline",
    "agents": [
      {
        "agent_id": "webhook-engineer",
        "stage": "design",
        "dependencies": []
      },
      {
        "agent_id": "hono-backend",
        "stage": "build",
        "dependencies": ["webhook-engineer"]
      },
      {
        "agent_id": "railway-deployer",
        "stage": "deploy",
        "dependencies": ["hono-backend"]
      }
    ],
    "forge_validation": true,
    "auto_advance": false
  }' | jq '.'
Expected Response:
{
  "workflow_id": "wf_9Kj2mP7nQz",
  "status": "initialized",
  "current_stage": "research",
  "next_agent": "webhook-engineer",
  "forge_gates": {
    "research_exit": "pending",
    "design_entry": "not_evaluated"
  }
}
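
Later steps reuse the workflow_id from this response. A small sketch of capturing it into a shell variable instead of copying it by hand, assuming the response shape shown above. The helper name extract_workflow_id and the file name workflow-request.json are illustrative assumptions.

```shell
# Read a create-workflow response (JSON on stdin) and print its workflow_id.
# Returns non-zero when the field is missing or null (jq --exit-status).
extract_workflow_id() {
  jq -e -r '.workflow_id'
}

# Usage (assumes the Step 2 request body is saved as workflow-request.json):
#   WORKFLOW_ID="$(curl -s -X POST https://control-plane.so1.io/api/v1/orchestration/workflows \
#     -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
#     -H "Content-Type: application/json" \
#     -d @workflow-request.json | extract_workflow_id)"
```

With WORKFLOW_ID set, the remaining commands can reference "${WORKFLOW_ID}" in place of the literal wf_9Kj2mP7nQz.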

Step 3: Monitor Workflow Progress

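# Poll workflow status every 5 seconds.
# watch runs the quoted command in a child shell, so CONTROL_PLANE_API_KEY must be exported.
watch -n 5 'curl -s https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq ".status, .current_stage, .forge_gates"'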
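
For unattended runs, a bounded polling loop may be preferable to watch because it stops on its own. The sketch below is one way to do that. The terminal status names (completed, failed, terminated) are assumptions, since this runbook only shows the initialized status for workflows, and the function names are illustrative.

```shell
# Succeeds when $1 is a status treated as terminal (assumed names).
is_terminal_status() {
  case "$1" in
    completed|failed|terminated) return 0 ;;
    *) return 1 ;;
  esac
}

# poll_status MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it prints a terminal status.
# Returns 0 on a terminal status, 1 when the attempts run out.
poll_status() {
  max_attempts="$1"; interval="$2"; shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status="$("$@")" || status="unreachable"
    printf 'attempt %s: %s\n' "$attempt" "$status"
    if is_terminal_status "$status"; then
      return 0
    fi
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$interval"
    fi
  done
  return 1
}

# Print the status field of one workflow (same endpoint as the watch command).
fetch_workflow_status() {
  curl -s "https://control-plane.so1.io/api/v1/orchestration/workflows/$1" \
    -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq -r '.status'
}
```

For example, `poll_status 60 5 fetch_workflow_status wf_9Kj2mP7nQz` would check every 5 seconds for up to 60 attempts.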

Step 4: Handle Stage Transitions

When auto_advance is false, approve each stage transition manually:
# Approve transition from Research → Design
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/advance \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "from_stage": "research",
    "to_stage": "design",
    "gate_validation": "passed",
    "operator_notes": "Research artifacts validated, design requirements clear"
  }'
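
Before calling the advance endpoint, the from_stage and to_stage values can be checked locally against the FORGE stage order. This is a sketch under the assumption that only transitions to the immediately following stage are valid; the API may enforce different rules. The function names are illustrative.

```shell
# FORGE stage order, as listed in the prerequisites.
FORGE_STAGES="research design build test deploy monitor"

# Print the stage that follows $1.
# Returns 1 if $1 is the last stage or is not a known stage.
next_stage() {
  [ -n "${1:-}" ] || return 1
  prev=""
  for stage in $FORGE_STAGES; do
    if [ "$prev" = "$1" ]; then
      printf '%s\n' "$stage"
      return 0
    fi
    prev="$stage"
  done
  return 1
}

# Succeeds only when $2 is the stage immediately after $1.
is_valid_transition() {
  [ -n "${2:-}" ] && [ "$(next_stage "$1")" = "$2" ]
}
```

A wrapper script could run `is_valid_transition research design || exit 1` before sending the request, which catches typos in stage names early.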

Procedure 2: FORGE Gate Validation

Step 1: Retrieve Gate Criteria

# Get gate requirements for current stage
curl -s https://control-plane.so1.io/api/v1/forge/gates/design/entry \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '.'
Example Gate Criteria:
{
  "stage": "design",
  "gate_type": "entry",
  "criteria": [
    {
      "id": "research_artifacts",
      "description": "Research phase artifacts present",
      "required": true,
      "validation": "artifact_count >= 1"
    },
    {
      "id": "requirements_defined",
      "description": "Clear requirements documented",
      "required": true,
      "validation": "requirements.length > 0"
    }
  ]
}

Step 2: Invoke FORGE Gatekeeper

curl -X POST https://control-plane.so1.io/api/v1/agents/forge-gatekeeper/validate \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "workflow_id": "wf_9Kj2mP7nQz",
    "gate_type": "exit",
    "current_stage": "research",
    "artifacts": {
      "research_documents": ["market-analysis.md", "competitor-review.md"],
      "requirements": ["REQ-001", "REQ-002", "REQ-003"]
    },
    "strict_mode": true
  }' | jq '.'
Expected Response:
{
  "validation_id": "val_3XyZ9mNpQr",
  "gate_status": "passed",
  "timestamp": "2026-03-10T14:23:45Z",
  "criteria_results": [
    {
      "criterion_id": "research_artifacts",
      "status": "passed",
      "details": "Found 2 research documents"
    },
    {
      "criterion_id": "requirements_defined",
      "status": "passed",
      "details": "3 requirements documented"
    }
  ],
  "recommendation": "proceed",
  "next_stage": "design"
}
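
When the validation is scripted, the response can be turned into an exit status so that a later step only runs on a passed gate. A minimal sketch, assuming the response fields shown above (gate_status, recommendation); the helper name gate_allows_advance is illustrative.

```shell
# Succeeds only when the Gatekeeper response (JSON on stdin) reports a passed
# gate together with a "proceed" recommendation.
gate_allows_advance() {
  jq -e '.gate_status == "passed" and .recommendation == "proceed"' >/dev/null
}

# Usage: save the Gatekeeper response from the request above to a file
# (validation.json is a hypothetical name), then:
#   gate_allows_advance < validation.json && echo "Gate passed, safe to advance"
```

jq's -e flag sets a non-zero exit status when the expression evaluates to false or null, which is what makes the function usable in an && chain.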

Step 3: Handle Gate Failures

If gate validation fails:
# Get detailed failure report
curl -s https://control-plane.so1.io/api/v1/forge/validations/val_3XyZ9mNpQr/report \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '.failed_criteria'
Failure Response Example:
{
  "failed_criteria": [
    {
      "criterion_id": "test_coverage",
      "required": ">=80%",
      "actual": "67%",
      "remediation": "Add unit tests to increase coverage"
    }
  ]
}
Remediation Actions:
  1. Review failed criteria and remediation steps
  2. Execute required fixes (add tests, complete documentation, etc.)
  3. Re-run gate validation
  4. Document remediation in workflow notes
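
To support steps 1 and 4, the failure report can be condensed into one line per failed criterion, suitable for pasting into workflow notes. A sketch assuming the report shape from the failure example above; summarize_failures is an illustrative name.

```shell
# Read a failure report (JSON on stdin) and print one line per failed criterion.
summarize_failures() {
  jq -r '.failed_criteria[]
    | "\(.criterion_id): required \(.required), actual \(.actual). Fix: \(.remediation)"'
}

# Usage:
#   curl -s https://control-plane.so1.io/api/v1/forge/validations/val_3XyZ9mNpQr/report \
#     -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | summarize_failures
```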

Procedure 3: Manage Agent Dependencies

Step 1: Visualize Dependency Graph

# Get workflow dependency graph
curl -s https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/dependencies \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '.dependency_graph'
Example Output:
{
  "nodes": [
    {"agent_id": "webhook-engineer", "stage": "design", "status": "completed"},
    {"agent_id": "hono-backend", "stage": "build", "status": "in_progress"},
    {"agent_id": "railway-deployer", "stage": "deploy", "status": "blocked"}
  ],
  "edges": [
    {"from": "webhook-engineer", "to": "hono-backend", "type": "artifact"},
    {"from": "hono-backend", "to": "railway-deployer", "type": "deployment"}
  ]
}
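
The edges in this output are enough to derive an order in which the agents can run. The sketch below feeds them to tsort, the standard topological sort utility. It assumes the dependency_graph shape shown above, and the function names are illustrative.

```shell
# Convert a dependency graph (JSON on stdin) into "from to" pairs, one per line.
graph_edges() {
  jq -r '.edges[] | "\(.from) \(.to)"'
}

# Read "from to" pairs and print the agents in an order that respects
# every dependency (each agent appears after the agents it depends on).
execution_order() {
  tsort
}

# Usage:
#   curl -s https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/dependencies \
#     -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
#     | jq '.dependency_graph' | graph_edges | execution_order
```

For the example graph this yields webhook-engineer, then hono-backend, then railway-deployer.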

Step 2: Resolve Blocked Dependencies

When an agent is blocked:
# Check blocking dependencies
curl -s https://control-plane.so1.io/api/v1/orchestration/agents/railway-deployer/blocked \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '.blocking_agents'
Resolution Options:
| Blocking Reason | Resolution Action | Command |
|---|---|---|
| Agent not started | Manually trigger agent | POST /agents/{id}/trigger |
| Agent failed | Review errors, retry agent | POST /agents/{id}/retry |
| Missing artifacts | Generate artifacts manually | Run agent in standalone mode |
| Gate validation pending | Complete gate validation | See Procedure 2 |
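
The resolution options can also be encoded as a lookup for use in scripts. This is a sketch only: the reason codes (not_started, failed, missing_artifacts, gate_pending) are hypothetical, since this runbook does not show what the blocked endpoint actually returns. The actions themselves come from the options listed above.

```shell
# Print the suggested resolution for a blocking reason.
# $1 = reason code (hypothetical values), $2 = agent id.
resolution_for() {
  reason="$1"; agent_id="$2"
  case "$reason" in
    not_started)       echo "POST /agents/${agent_id}/trigger" ;;
    failed)            echo "Review errors, then POST /agents/${agent_id}/retry" ;;
    missing_artifacts) echo "Run ${agent_id} in standalone mode to generate artifacts" ;;
    gate_pending)      echo "Complete gate validation (see Procedure 2)" ;;
    *)                 echo "Unknown blocking reason: ${reason}" >&2; return 1 ;;
  esac
}
```

For example, `resolution_for failed hono-backend` prints the retry path for that agent, and an unrecognized reason returns non-zero so the caller can fall back to manual review.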

Step 3: Override Dependencies (Emergency Only)

Dependency overrides should only be used in emergencies. This bypasses FORGE gate validation and can lead to incomplete workflows.
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/override \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "railway-deployer",
    "override_type": "dependency_skip",
    "reason": "Emergency deployment required",
    "operator": "ops-engineer@so1.io",
    "approval_ticket": "INC-12345"
  }'

Procedure 4: Debug Orchestration Failures

Step 1: Retrieve Workflow Logs

# Get full workflow execution logs
curl -s https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/logs \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  | jq '.logs[] | select(.level == "error")'

Step 2: Inspect Agent Execution

# Get specific agent execution details
curl -s https://control-plane.so1.io/api/v1/agents/hono-backend/executions/exec_7PqRs4TnVw \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '{
    status: .status,
    error: .error,
    artifacts: .artifacts,
    stage: .stage,
    duration_ms: .duration_ms
  }'

Step 3: Common Failure Patterns

| Symptom | Root Cause | Resolution |
|---|---|---|
| Workflow stuck at gate | Gate criteria not met | Review gate validation report, complete missing requirements |
| Agent execution timeout | Long-running agent task | Increase timeout in agent config, check resource constraints |
| Missing artifacts | Agent didn’t generate expected output | Review agent logs, verify agent configuration |
| Stage transition blocked | Manual approval required | Approve transition via /advance endpoint |
| Circular dependency | Invalid workflow definition | Update workflow config to remove circular dependencies |

Step 4: Retry Failed Workflow

# Retry the workflow starting at the stage named in retry_from_stage,
# keeping existing artifacts and skipping agents that already completed
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/retry \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "retry_from_stage": "build",
    "reset_artifacts": false,
    "skip_completed_agents": true
  }'

Verification Checklist

After completing orchestration operations, verify:

Troubleshooting

Issue: Workflow Not Starting

Symptoms: Workflow status remains initialized, no agents execute
Possible Causes:
  • Invalid workflow configuration
  • Missing agent definitions
  • Control Plane API unavailable
Resolution:
# Validate workflow configuration
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows/validate \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d @workflow-config.json | jq '.validation_errors'

# Check Control Plane health
curl -s https://control-plane.so1.io/health | jq '.'

Issue: FORGE Gate Always Fails

Symptoms: Gate validation consistently returns failed status
Possible Causes:
  • Strict mode enabled with incomplete artifacts
  • Invalid gate criteria configuration
  • Missing required metadata
Resolution:
# Run validation in non-strict mode to identify issues
curl -X POST https://control-plane.so1.io/api/v1/agents/forge-gatekeeper/validate \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "workflow_id": "wf_9Kj2mP7nQz",
    "gate_type": "exit",
    "current_stage": "build",
    "strict_mode": false
  }' | jq '.warnings'

Issue: Agent Dependency Deadlock

Symptoms: Multiple agents in blocked state, none progressing
Possible Causes:
  • Circular dependency in workflow definition
  • All agents waiting for each other’s output
Resolution:
# Detect circular dependencies
curl -s https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/dependencies/analyze \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '.circular_dependencies'

# Break deadlock by manually triggering one agent
curl -X POST https://control-plane.so1.io/api/v1/agents/webhook-engineer/trigger \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"force": true, "skip_dependencies": true}'
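
A cycle can also be checked locally from an edge list, for instance while editing a workflow definition offline. The sketch below relies on tsort reporting cycles on stderr; it treats any stderr output as a cycle, so malformed input (such as an odd number of tokens) is flagged as well. The function name has_cycle is illustrative.

```shell
# Succeeds when the "from to" edge list on stdin contains a cycle.
# stderr is captured and stdout (the sorted order) is discarded.
has_cycle() {
  [ -n "$(tsort 2>&1 >/dev/null)" ]
}

# Usage:
#   printf '%s\n' 'webhook-engineer hono-backend' 'hono-backend webhook-engineer' \
#     | has_cycle && echo "Circular dependency detected"
```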


Emergency Procedures

Emergency Workflow Termination

When a workflow must be stopped immediately:
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/terminate \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "Emergency termination - security issue detected",
    "operator": "ops-engineer@so1.io",
    "cleanup": true
  }'

Emergency Gate Bypass

Only use in critical production incidents. Requires approval ticket.
curl -X POST https://control-plane.so1.io/api/v1/forge/gates/bypass \
  -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "workflow_id": "wf_9Kj2mP7nQz",
    "gate_type": "exit",
    "stage": "test",
    "bypass_reason": "Critical production fix required",
    "approval_ticket": "INC-67890",
    "operator": "incident-commander@so1.io"
  }'
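
Because a bypass requires an approval ticket, a wrapper script can refuse to send the request when no well-formed ticket is supplied. This is a sketch only: the INC-<digits> format is an assumption inferred from the example tickets in this runbook (INC-12345, INC-67890), and the function names are illustrative.

```shell
# Succeeds when $1 looks like INC-<digits> (assumed ticket format).
valid_ticket() {
  case "${1:-}" in
    INC-|INC-*[!0-9]*) return 1 ;;  # empty number, or a non-digit after the prefix
    INC-*)             return 0 ;;
    *)                 return 1 ;;
  esac
}

# Returns non-zero with a message unless a well-formed ticket is supplied.
require_ticket() {
  if ! valid_ticket "${1:-}"; then
    echo "Refusing bypass: approval ticket missing or malformed" >&2
    return 1
  fi
}
```

Running `require_ticket "$APPROVAL_TICKET" || exit 1` before the curl call keeps an accidental bypass without a ticket from reaching the API. APPROVAL_TICKET is a hypothetical variable name.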