Overview
This runbook covers operational procedures for managing the SO1 Factory Orchestrator, including multi-agent workflow execution, FORGE gate validation, and stage transitions. These procedures ensure reliable orchestration of complex automation pipelines across all 6 FORGE stages.
Purpose: Provide step-by-step instructions for operating orchestration agents and managing workflow execution
Scope: Factory Orchestrator operations, FORGE Gatekeeper validation, stage management, workflow debugging
Target Audience: Platform operators, DevOps engineers, automation architects
Prerequisites
- Control Plane API access (CONTROL_PLANE_API_KEY)
- Railway project access (SO1 Control Plane)
- n8n workflow access (orchestration workflows)
- GitHub repository access (so1-io/so1-agents)
- Understanding of FORGE stages (Research → Design → Build → Test → Deploy → Monitor)
- Familiarity with SO1 agent architecture
- Basic knowledge of n8n workflow execution
- Understanding of gate entry/exit criteria
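The tooling prerequisites above can be checked before running any procedure. The following is a minimal sketch, not part of the SO1 tooling: the function names require_env, require_cmd, and preflight are hypothetical, and it only assumes that the curl and jq commands used throughout this runbook must be on PATH.

```shell
# Hypothetical helpers (not part of SO1): verify local prerequisites.

# Return 0 if the named environment variable is set and non-empty.
require_env() {
  eval "_so1_value=\${$1:-}"
  if [ -z "$_so1_value" ]; then
    echo "missing environment variable: $1" >&2
    return 1
  fi
}

# Return 0 if the named command is available on PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing command: $1" >&2
    return 1
  fi
}

# Check what the curl/jq examples in this runbook rely on.
preflight() {
  require_env CONTROL_PLANE_API_KEY && require_cmd curl && require_cmd jq
}
```

Running preflight before Procedure 1 surfaces a missing key or tool early, rather than as an authentication or parse error midway through a workflow.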
Procedure 1: Initiate Factory Orchestrator Workflow
Step 1: Prepare Workflow Request
Identify the workflow type and gather required inputs:
# Example: Multi-agent automation pipeline
WORKFLOW_TYPE="automation_pipeline"
WORKFLOW_NAME="webhook-processing-system"
REQUIRED_AGENTS="webhook-engineer,hono-backend,railway-deployer"
Step 2: Create Orchestration Request
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"workflow_name": "webhook-processing-system",
"workflow_type": "automation_pipeline",
"agents": [
{
"agent_id": "webhook-engineer",
"stage": "design",
"dependencies": []
},
{
"agent_id": "hono-backend",
"stage": "build",
"dependencies": ["webhook-engineer"]
},
{
"agent_id": "railway-deployer",
"stage": "deploy",
"dependencies": ["hono-backend"]
}
],
"forge_validation": true,
"auto_advance": false
}' | jq '.'
Expected Response:
{
"workflow_id": "wf_9Kj2mP7nQz",
"status": "initialized",
"current_stage": "research",
"next_agent": "webhook-engineer",
"forge_gates": {
"research_exit": "pending",
"design_entry": "not_evaluated"
}
}
Step 3: Monitor Workflow Progress
# Poll workflow status every 5 seconds.
# watch runs the command in a child shell, so CONTROL_PLANE_API_KEY must be exported.
watch -n 5 'curl -s https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq ".status, .current_stage, .forge_gates"'
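For unattended use, the interactive watch command can be replaced with a bounded polling loop. The sketch below is illustrative only: wait_for_status, fetch_status, and the WORKFLOW_ID variable are hypothetical names, and the terminal status values (completed, failed, terminated) are assumptions, since this runbook only shows the initialized status.

```shell
# wait_for_status FETCH_CMD MAX_ATTEMPTS INTERVAL_SECONDS
# FETCH_CMD is any command that prints the current workflow status.
wait_for_status() {
  fetch_cmd=$1
  max_attempts=$2
  interval=$3
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$($fetch_cmd) || status="unreachable"
    case $status in
      completed)         echo "$status"; return 0 ;;  # assumed terminal success
      failed|terminated) echo "$status"; return 1 ;;  # assumed terminal failure
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "timeout"
  return 2
}

# Example fetcher built on the status endpoint shown above.
# WORKFLOW_ID is a hypothetical variable holding the workflow_id from Step 2.
fetch_status() {
  curl -s "https://control-plane.so1.io/api/v1/orchestration/workflows/${WORKFLOW_ID}" \
    -H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq -r '.status'
}
```

A call such as wait_for_status fetch_status 60 5 would poll for roughly five minutes. The distinct return codes (0, 1, 2) let a calling script tell success, failure, and timeout apart.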
Step 4: Handle Stage Transitions
When auto_advance is set to false, each stage transition must be approved manually:
# Approve transition from Research → Design
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/advance \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"from_stage": "research",
"to_stage": "design",
"gate_validation": "passed",
"operator_notes": "Research artifacts validated, design requirements clear"
}'
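Because the advance request names both from_stage and to_stage, it can help to confirm locally that the pair is adjacent in the FORGE order (Research → Design → Build → Test → Deploy → Monitor) before sending it. This is a sketch under assumptions: next_stage and is_valid_transition are hypothetical helpers, and the lowercase identifier for the Monitor stage is assumed by analogy with the other stage names used in the payloads.

```shell
# Print the stage that follows the given FORGE stage.
# Fails for the last stage and for unknown names.
next_stage() {
  case $1 in
    research) echo design ;;
    design)   echo build ;;
    build)    echo test ;;
    test)     echo deploy ;;
    deploy)   echo monitor ;;
    *)        return 1 ;;
  esac
}

# Succeed only when TO directly follows FROM.
is_valid_transition() {
  expected=$(next_stage "$1") && [ "$expected" = "$2" ]
}
```

For example, is_valid_transition research design succeeds, while is_valid_transition research build fails because it would skip the Design stage.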
Procedure 2: FORGE Gate Validation
Step 1: Retrieve Gate Criteria
# Get gate requirements for current stage
curl -s https://control-plane.so1.io/api/v1/forge/gates/design/entry \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '.'
Example Gate Criteria:
{
"stage": "design",
"gate_type": "entry",
"criteria": [
{
"id": "research_artifacts",
"description": "Research phase artifacts present",
"required": true,
"validation": "artifact_count >= 1"
},
{
"id": "requirements_defined",
"description": "Clear requirements documented",
"required": true,
"validation": "requirements.length > 0"
}
]
}
Step 2: Invoke FORGE Gatekeeper
curl -X POST https://control-plane.so1.io/api/v1/agents/forge-gatekeeper/validate \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"workflow_id": "wf_9Kj2mP7nQz",
"gate_type": "exit",
"current_stage": "research",
"artifacts": {
"research_documents": ["market-analysis.md", "competitor-review.md"],
"requirements": ["REQ-001", "REQ-002", "REQ-003"]
},
"strict_mode": true
}' | jq '.'
Expected Response:
{
"validation_id": "val_3XyZ9mNpQr",
"gate_status": "passed",
"timestamp": "2026-03-10T14:23:45Z",
"criteria_results": [
{
"criterion_id": "research_artifacts",
"status": "passed",
"details": "Found 2 research documents"
},
{
"criterion_id": "requirements_defined",
"status": "passed",
"details": "3 requirements documented"
}
],
"recommendation": "proceed",
"next_stage": "design"
}
Step 3: Handle Gate Failures
If gate validation fails, retrieve the detailed report using the validation_id returned by the validate call:
# Get detailed failure report
curl -s https://control-plane.so1.io/api/v1/forge/validations/val_3XyZ9mNpQr/report \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '.failed_criteria'
Failure Response Example:
{
"failed_criteria": [
{
"criterion_id": "test_coverage",
"required": ">=80%",
"actual": "67%",
"remediation": "Add unit tests to increase coverage"
}
]
}
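The required and actual fields in a failure report can be compared directly to see how far a criterion is from passing. The helpers below are a hypothetical sketch that assumes whole-number percentages in the exact string shapes shown in the example (">=80%" and "67%"); other criterion formats would need their own parsing.

```shell
# meets_threshold ACTUAL REQUIRED, e.g. meets_threshold "67%" ">=80%"
meets_threshold() {
  actual=${1%\%}            # "67%"   -> "67"
  required=${2#">="}        # ">=80%" -> "80%"
  required=${required%\%}   # "80%"   -> "80"
  [ "$actual" -ge "$required" ]
}

# Print how many percentage points are still missing (0 if the threshold is met).
coverage_gap() {
  if meets_threshold "$1" "$2"; then
    echo 0
  else
    # actual and required were set by meets_threshold above.
    echo $(( $required - $actual ))
  fi
}
```

With the values from the example report, coverage_gap "67%" ">=80%" prints 13, meaning 13 more percentage points of coverage are needed before re-running the gate.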
Remediation Actions:
- Review failed criteria and remediation steps
- Execute required fixes (add tests, complete documentation, etc.)
- Re-run gate validation
- Document remediation in workflow notes
Procedure 3: Manage Agent Dependencies
Step 1: Visualize Dependency Graph
# Get workflow dependency graph
curl -s https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/dependencies \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '.dependency_graph'
Example Output:
{
"nodes": [
{"agent_id": "webhook-engineer", "stage": "design", "status": "completed"},
{"agent_id": "hono-backend", "stage": "build", "status": "in_progress"},
{"agent_id": "railway-deployer", "stage": "deploy", "status": "blocked"}
],
"edges": [
{"from": "webhook-engineer", "to": "hono-backend", "type": "artifact"},
{"from": "hono-backend", "to": "railway-deployer", "type": "deployment"}
]
}
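A graph like this can be summarized locally to show which agent is waiting on which upstream agent. The sketch below is illustrative: blocking_report is a hypothetical helper, and the line-oriented input format ("node ID STATUS" and "edge FROM TO") is an assumption chosen so the logic needs only awk.

```shell
# Read "node <agent_id> <status>" and "edge <from> <to>" lines on stdin and
# print one line per edge whose upstream agent has not completed.
# The graph JSON could be converted to this format with, for example:
#   jq -r '(.nodes[] | "node \(.agent_id) \(.status)"), (.edges[] | "edge \(.from) \(.to)")'
blocking_report() {
  awk '
    $1 == "node" { status[$2] = $3 }
    $1 == "edge" { from[++n] = $2; to[n] = $3 }
    END {
      for (i = 1; i <= n; i++) {
        s = (from[i] in status) ? status[from[i]] : "unknown"
        if (s != "completed")
          printf "%s waits on %s (%s)\n", to[i], from[i], s
      }
    }'
}
```

For the example graph above, the only line reported is that railway-deployer waits on hono-backend, which is still in_progress.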
Step 2: Resolve Blocked Dependencies
When an agent is blocked:
# Check blocking dependencies
curl -s https://control-plane.so1.io/api/v1/orchestration/agents/railway-deployer/blocked \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '.blocking_agents'
Resolution Options:
| Blocking Reason | Resolution Action | Command |
|---|---|---|
| Agent not started | Manually trigger agent | POST /agents/{id}/trigger |
| Agent failed | Review errors, retry agent | POST /agents/{id}/retry |
| Missing artifacts | Generate artifacts manually | Run agent in standalone mode |
| Gate validation pending | Complete gate validation | See Procedure 2 |
Step 3: Override Dependencies (Emergency Only)
Dependency overrides should only be used in emergencies. This bypasses FORGE gate validation and can lead to incomplete workflows.
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/override \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"agent_id": "railway-deployer",
"override_type": "dependency_skip",
"reason": "Emergency deployment required",
"operator": "ops-engineer@so1.io",
"approval_ticket": "INC-12345"
}'
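Since an override is only appropriate with an approved incident, a wrapper can refuse to proceed when no well-formed ticket is supplied. This is a sketch, not an SO1 feature: valid_ticket and guarded_override are hypothetical names, and the INC-<digits> format is inferred only from the example ticket IDs in this runbook.

```shell
# Succeed only for tickets shaped like INC-<digits> (format assumed from the examples).
valid_ticket() {
  case $1 in
    INC-*[!0-9]*) return 1 ;;  # a non-digit appears after the prefix
    INC-?*)       return 0 ;;  # prefix followed by at least one digit
    *)            return 1 ;;
  esac
}

# Refuse to continue unless a well-formed ticket is supplied.
guarded_override() {
  ticket=$1
  if ! valid_ticket "$ticket"; then
    echo "refusing override: invalid approval ticket '$ticket'" >&2
    return 1
  fi
  echo "ticket $ticket accepted; proceed with the override request"
}
```

The check validates only the shape of the ticket ID. Whether the ticket actually exists and is approved still has to be confirmed in the incident system.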
Procedure 4: Debug Orchestration Failures
Step 1: Retrieve Workflow Logs
# Get full workflow execution logs
curl -s https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/logs \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
| jq '.logs[] | select(.level == "error")'
Step 2: Inspect Agent Execution
# Get specific agent execution details
curl -s https://control-plane.so1.io/api/v1/agents/hono-backend/executions/exec_7PqRs4TnVw \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '{
status: .status,
error: .error,
artifacts: .artifacts,
stage: .stage,
duration_ms: .duration_ms
}'
Step 3: Common Failure Patterns
| Symptom | Root Cause | Resolution |
|---|---|---|
| Workflow stuck at gate | Gate criteria not met | Review gate validation report, complete missing requirements |
| Agent execution timeout | Long-running agent task | Increase timeout in agent config, check resource constraints |
| Missing artifacts | Agent didn’t generate expected output | Review agent logs, verify agent configuration |
| Stage transition blocked | Manual approval required | Approve transition via /advance endpoint |
| Circular dependency | Invalid workflow definition | Update workflow config to remove circular dependencies |
Step 4: Retry Failed Workflow
# Retry entire workflow from last successful stage
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/retry \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"retry_from_stage": "build",
"reset_artifacts": false,
"skip_completed_agents": true
}'
Verification Checklist
After completing orchestration operations, verify:
Troubleshooting
Issue: Workflow Not Starting
Symptoms: Workflow status remains initialized, no agents execute
Possible Causes:
- Invalid workflow configuration
- Missing agent definitions
- Control Plane API unavailable
Resolution:
# Validate workflow configuration
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows/validate \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
-H "Content-Type: application/json" \
-d @workflow-config.json | jq '.validation_errors'
# Check Control Plane health
curl -s https://control-plane.so1.io/health | jq '.'
Issue: FORGE Gate Always Fails
Symptoms: Gate validation consistently returns failed status
Possible Causes:
- Strict mode enabled with incomplete artifacts
- Invalid gate criteria configuration
- Missing required metadata
Resolution:
# Run validation in non-strict mode to identify issues
curl -X POST https://control-plane.so1.io/api/v1/agents/forge-gatekeeper/validate \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"workflow_id": "wf_9Kj2mP7nQz",
"gate_type": "exit",
"current_stage": "build",
"strict_mode": false
}' | jq '.warnings'
Issue: Agent Dependency Deadlock
Symptoms: Multiple agents in blocked state, none progressing
Possible Causes:
- Circular dependency in workflow definition
- All agents waiting for each other’s output
Resolution:
# Detect circular dependencies
curl -s https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/dependencies/analyze \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" | jq '.circular_dependencies'
# Break deadlock by manually triggering one agent
curl -X POST https://control-plane.so1.io/api/v1/agents/webhook-engineer/trigger \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{"force": true, "skip_dependencies": true}'
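A workflow definition can also be checked for cycles locally before it is submitted. The sketch below is a hypothetical helper, independent of the /dependencies/analyze endpoint: has_cycle reads one "from to" edge per line and applies Kahn's algorithm, repeatedly removing nodes with no remaining incoming edges; any node left over is part of a cycle.

```shell
# Read "from to" edges on stdin; exit 0 if the graph contains a cycle, 1 otherwise.
# Edges could be extracted from a dependency graph with, for example:
#   jq -r '.edges[] | "\(.from) \(.to)"'
has_cycle() {
  awk '
    NF >= 2 {
      nodes[$1]; nodes[$2]          # register both endpoints
      indeg[$2]++                   # count incoming edges
      adj[$1] = adj[$1] " " $2      # adjacency list as a space-separated string
    }
    END {
      n = 0; head = 0; total = 0; visited = 0
      for (v in nodes) { total++; if (!indeg[v]) queue[n++] = v }
      while (head < n) {
        v = queue[head++]; visited++
        m = split(adj[v], out, " ")
        for (i = 1; i <= m; i++)
          if (--indeg[out[i]] == 0) queue[n++] = out[i]
      }
      exit (visited < total) ? 0 : 1
    }'
}
```

The exit status follows shell convention for a predicate, so it reads naturally in a condition: if has_cycle < edges.txt; then echo "circular dependency"; fi.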
Emergency Procedures
Emergency Workflow Termination
When a workflow must be stopped immediately:
curl -X POST https://control-plane.so1.io/api/v1/orchestration/workflows/wf_9Kj2mP7nQz/terminate \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"reason": "Emergency termination - security issue detected",
"operator": "ops-engineer@so1.io",
"cleanup": true
}'
Emergency Gate Bypass
Use only during critical production incidents. An approval ticket is required.
curl -X POST https://control-plane.so1.io/api/v1/forge/gates/bypass \
-H "Authorization: Bearer ${CONTROL_PLANE_API_KEY}" \
-H "Content-Type: application/json" \
-d '{
"workflow_id": "wf_9Kj2mP7nQz",
"gate_type": "exit",
"stage": "test",
"bypass_reason": "Critical production fix required",
"approval_ticket": "INC-67890",
"operator": "incident-commander@so1.io"
}'