Overview
Operational runbooks provide step-by-step guidance for common SO1 platform tasks, from deploying new workflows to responding to production incidents. Each runbook includes prerequisites, detailed procedures, verification steps, and troubleshooting guidance.
Target Audience: These runbooks are designed for platform operators, on-call engineers, and team leads responsible for SO1 platform operations.
Runbook Categories
Domain Runbooks
Operational guides organized by domain expertise:
Orchestration
Factory Orchestrator workflows and FORGE gate management
Automation
n8n workflow deployment, webhook management, scheduling
Engineering
Backend API, frontend components, shared TypeScript patterns
DevOps
Railway deployments, GitHub Actions CI/CD, pipeline audits
Documentation
Mintlify docs, API specs, operational runbook creation
Prompts
Veritas prompt management, chain design, fragment curation
Incident
Incident response, triage procedures, postmortem analysis
System Runbooks
Cross-domain operational procedures:
Deployment
End-to-end deployment procedures for all services
Incident Response
Complete incident lifecycle from detection to resolution
Monitoring & Alerts
Health monitoring, alert management, metric interpretation
Backup & Recovery
Data backup procedures and disaster recovery workflows
Using Runbooks Effectively
Quick Start Workflow
- Identify the task: Find the relevant runbook category
- Check prerequisites: Ensure you have required access and tools
- Follow procedures: Execute steps in order and verify each one before moving on
- Troubleshoot if needed: Use the Troubleshooting section for common issues
- Verify completion: Run verification checks before closing
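The "follow procedures, verify each step" loop can be sketched in TypeScript as below. This is a minimal illustration, not an SO1 tool: `RunbookStep` and `runSequentially` are hypothetical names, and each step is assumed to expose an action plus a check that confirms it succeeded. Execution stops at the first failed action or check, so later steps never run against a broken state.

```typescript
// Hypothetical shape of one runbook step: an action plus a check that confirms it worked.
interface RunbookStep {
  name: string;
  run: () => Promise<void>;
  verify: () => Promise<boolean>;
}

interface StepResult {
  name: string;
  ok: boolean;
  error?: string;
}

// Execute steps strictly in order; stop at the first step whose action throws
// or whose verification fails, so the operator can switch to troubleshooting.
async function runSequentially(steps: RunbookStep[]): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (const step of steps) {
    try {
      await step.run();
      const ok = await step.verify();
      results.push({ name: step.name, ok });
      if (!ok) break; // verification failed: do not continue
    } catch (err) {
      results.push({ name: step.name, ok: false, error: String(err) });
      break; // action threw: do not continue
    }
  }
  return results;
}

// Usage: the second step fails verification, so "smoke-test" never runs.
const demoSteps: RunbookStep[] = [
  { name: "check-access", run: async () => {}, verify: async () => true },
  { name: "deploy", run: async () => {}, verify: async () => false },
  { name: "smoke-test", run: async () => {}, verify: async () => true },
];
runSequentially(demoSteps).then((results) => console.log(results));
```

Stopping early mirrors the runbook guidance: a failed verification is the signal to move to the Troubleshooting section rather than push on.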
Runbook Structure
Each runbook follows this consistent format:
- Overview: Purpose, scope, and when to use this runbook
- Prerequisites: Required access, tools, and knowledge
- Procedure: Step-by-step instructions with verification
- Verification: How to confirm successful completion
- Troubleshooting: Common issues and resolutions
- Related Runbooks: Links to complementary procedures
When to Create New Runbooks
Create a new runbook when:
- You perform a task more than three times
- The task requires more than five steps or involves multiple systems
- The task is time-sensitive (e.g., incident response, deployments)
- The process depends on tribal knowledge that needs documenting
- New team members frequently ask about the process
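The creation criteria above can be expressed as a simple check, sketched here in TypeScript. `TaskProfile`, `runbookReasons`, and `shouldCreateRunbook` are hypothetical names; the thresholds (more than three repetitions, more than five steps, more than one system) come directly from the list, and meeting any single criterion is treated as enough to justify a runbook.

```typescript
// Hypothetical description of a recurring task, used to apply the creation criteria.
interface TaskProfile {
  timesPerformed: number;
  stepCount: number;
  systemsTouched: number;
  timeSensitive: boolean; // e.g. incident response, deployments
  undocumentedKnowledge: boolean; // relies on tribal knowledge
  frequentNewcomerQuestions: boolean;
}

// Returns every criterion the task meets, so the reason can be cited in the runbook request.
function runbookReasons(t: TaskProfile): string[] {
  const reasons: string[] = [];
  if (t.timesPerformed > 3) reasons.push("performed more than three times");
  if (t.stepCount > 5 || t.systemsTouched > 1) {
    reasons.push("more than five steps or multiple systems");
  }
  if (t.timeSensitive) reasons.push("time-sensitive");
  if (t.undocumentedKnowledge) reasons.push("tribal knowledge");
  if (t.frequentNewcomerQuestions) reasons.push("frequent newcomer questions");
  return reasons;
}

// Any single matching criterion is enough to justify a new runbook.
function shouldCreateRunbook(t: TaskProfile): boolean {
  return runbookReasons(t).length > 0;
}
```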
Runbook Creation: Use the Runbook Writer agent to generate new operational runbooks from task descriptions or incident learnings.
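Whether a runbook is written by hand or generated by the Runbook Writer agent, it can be checked against the standard structure (Overview through Related Runbooks). The sketch below assumes each section appears as a level-2 Markdown heading such as `## Overview` in the MDX file; that heading convention and the `missingSections` helper are assumptions, not documented SO1 behavior.

```typescript
// The six sections every runbook is expected to contain.
const REQUIRED_SECTIONS = [
  "Overview",
  "Prerequisites",
  "Procedure",
  "Verification",
  "Troubleshooting",
  "Related Runbooks",
] as const;

// Assumes each section is a level-2 heading ("## Overview"); deeper headings are ignored.
function missingSections(mdx: string): string[] {
  const headings = new Set(
    mdx
      .split("\n")
      .map((line) => line.match(/^##\s+(.+?)\s*$/))
      .filter((m): m is RegExpMatchArray => m !== null)
      .map((m) => m[1]),
  );
  return REQUIRED_SECTIONS.filter((section) => !headings.has(section));
}
```

A check like this could run in PR review so that a draft missing, say, its Troubleshooting section is flagged before domain experts review the content.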
Runbook Maintenance
Review Schedule
| Frequency | Trigger | Action |
|---|---|---|
| Quarterly | Calendar | Review all runbooks for accuracy |
| Post-Incident | Major incident | Update affected runbooks within 48h |
| Post-Deployment | Architecture change | Update relevant deployment runbooks |
| On-Request | User feedback | Address inaccuracies or gaps |
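The review schedule can be turned into concrete due dates, as in the TypeScript sketch below. Only the 48-hour post-incident window comes from the table; treating the quarterly review as due by the first day of the next calendar quarter (UTC) is an assumption, and triggers without a stated deadline return `undefined`. `ReviewTrigger` and `reviewDeadline` are hypothetical names.

```typescript
type ReviewTrigger = "quarterly" | "post-incident" | "post-deployment" | "on-request";

const HOUR_MS = 60 * 60 * 1000;

// Returns the date by which the review should be complete, or undefined when
// the schedule states no explicit deadline for that trigger.
function reviewDeadline(trigger: ReviewTrigger, occurredAt: Date): Date | undefined {
  switch (trigger) {
    case "post-incident":
      // Affected runbooks must be updated within 48 hours of a major incident.
      return new Date(occurredAt.getTime() + 48 * HOUR_MS);
    case "quarterly": {
      // Assumption: due by the first day of the next calendar quarter (UTC).
      // Month 12 rolls over to January of the following year in Date.UTC.
      const nextQuarterMonth = (Math.floor(occurredAt.getUTCMonth() / 3) + 1) * 3;
      return new Date(Date.UTC(occurredAt.getUTCFullYear(), nextQuarterMonth, 1));
    }
    default:
      // Post-deployment and on-request reviews have no stated deadline.
      return undefined;
  }
}
```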
Version Control
All runbooks are version-controlled in the so1-io/so1-content repository:
- Location: runbooks/ directory
- Format: Mintlify MDX
- Review Process: PR review by domain experts
- Change Log: Document significant updates in git commit messages
Emergency Runbooks
Feedback & Improvements
Report Issues
Found an outdated procedure or unclear instruction?
- Open a GitHub issue in so1-io/so1-content with:
  - Runbook title and section
  - Description of the issue
  - Suggested improvement (if applicable)
- Submit a PR with corrections:
  - Make changes to the runbook file
  - Include the rationale in the PR description
  - Tag domain experts for review
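If issues are filed programmatically, the fields listed above map naturally onto the `title`, `body`, and `labels` fields accepted by GitHub's REST "create an issue" endpoint (`POST /repos/so1-io/so1-content/issues`). The sketch below only builds that payload and does not send it; the `[Runbook] title: section` title format and the `buildIssuePayload` helper are hypothetical conventions, not an established SO1 format.

```typescript
// Fields a runbook issue should carry, per the reporting guidance.
interface RunbookIssueReport {
  runbookTitle: string;
  section: string;
  description: string;
  suggestedImprovement?: string;
}

// Subset of the fields accepted by GitHub's "create an issue" REST endpoint.
interface IssuePayload {
  title: string;
  body: string;
  labels: string[];
}

// Builds a Markdown issue body; the title format is a hypothetical convention.
function buildIssuePayload(r: RunbookIssueReport, labels: string[] = []): IssuePayload {
  const lines = [
    `**Runbook:** ${r.runbookTitle}`,
    `**Section:** ${r.section}`,
    "",
    "**Issue:**",
    r.description,
  ];
  if (r.suggestedImprovement) {
    lines.push("", "**Suggested improvement:**", r.suggestedImprovement);
  }
  return {
    title: `[Runbook] ${r.runbookTitle}: ${r.section}`,
    body: lines.join("\n"),
    labels,
  };
}
```

For a new-runbook suggestion rather than a correction, the same helper could be called with `["runbook-request"]` as the labels argument.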
Suggest New Runbooks
Use the Runbook Writer agent or submit a request via:
- GitHub issue with the “runbook-request” label
- Internal Slack channel: #so1-documentation
- Direct message to the documentation team
Related Documentation
- Agent Reference Guide - Detailed agent capabilities
- Architecture Overview - System design
- FORGE Execution Stages - Gate compliance
- API Reference - API endpoint documentation