TASKSET 5 (SPAWN) - STRATEGIC BUILD PLAN
Component Overview
SPAWN (Semantic Preparation And Navigation Yield) is the Hydration System for Clari - responsible for enriching documents with metadata, populating MDX templates, extracting semantic content, and preparing content for real-time collaboration and search.
Previous Layer: CAST (Tag Resolution & Linking) ✅ Complete
Current Layer: SPAWN (Hydration & Metadata Extraction) - Starting Now
Next Layer: STITCH (Content Coordination System)
Architecture Overview
4-Stage Build Strategy
STAGE 1: Metadata & FrontMatter (35% Effort)
Deliverables: 500 lines, 12+ tests
- Metadata extractor (YAML/JSON frontmatter parsing)
- Schema validator (type-safe metadata validation)
- Template engine (Handlebars/Liquid-like syntax)
- Asset resolver (images, diagrams, code blocks)
Key functions:
- ExtractMetadata() - Parse document frontmatter
- ValidateSchema() - Validate against schema definitions
- ResolveAssets() - Find and link embedded assets
- BuildMetadataIndex() - Create searchable metadata
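As a rough illustration of the schema validation step, the sketch below checks decoded frontmatter against a small field table. The `FieldSpec` type, the kind names, and the error wording are assumptions made for this example; the plan itself calls for JSON Schema, which a real implementation would use instead.

```go
package main

import (
	"fmt"
	"sort"
)

// FieldSpec is a hypothetical schema entry: the expected kind of a
// frontmatter value and whether the field must be present.
type FieldSpec struct {
	Kind     string // "string", "number", "bool", or "list"
	Required bool
}

// kindOf reports the schema kind of a decoded frontmatter value.
func kindOf(v any) string {
	switch v.(type) {
	case string:
		return "string"
	case int, int64, float64:
		return "number"
	case bool:
		return "bool"
	case []any:
		return "list"
	default:
		return "unknown"
	}
}

// ValidateSchema checks meta against schema and returns one message per
// violation, sorted by field name so the output is deterministic.
func ValidateSchema(meta map[string]any, schema map[string]FieldSpec) []string {
	names := make([]string, 0, len(schema))
	for name := range schema {
		names = append(names, name)
	}
	sort.Strings(names)

	var problems []string
	for _, name := range names {
		spec := schema[name]
		value, ok := meta[name]
		if !ok {
			if spec.Required {
				problems = append(problems, fmt.Sprintf("%s: required field is missing", name))
			}
			continue
		}
		if got := kindOf(value); got != spec.Kind {
			problems = append(problems, fmt.Sprintf("%s: expected %s, got %s", name, spec.Kind, got))
		}
	}
	return problems
}

func main() {
	schema := map[string]FieldSpec{
		"title": {Kind: "string", Required: true},
		"tags":  {Kind: "list"},
		"draft": {Kind: "bool"},
	}
	meta := map[string]any{"tags": "go", "draft": false}
	for _, p := range ValidateSchema(meta, schema) {
		fmt.Println(p)
	}
}
```

Returning every violation at once, rather than stopping at the first, fits the plan's goal of early and comprehensive error reporting.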
STAGE 2: Content Enrichment (30% Effort)
Deliverables: 450 lines, 14+ tests
- Semantic tagging integration (with CAST)
- Table of contents generation
- Section summaries extraction
- Code block highlighting metadata
- Diagram metadata extraction
Key functions:
- EnrichContent() - Add semantic information
- GenerateTOC() - Create hierarchical TOC
- ExtractSummary() - Section summaries
- AnalyzeCodeBlocks() - Extract code metadata
- ExtractDiagrams() - Parse Mermaid/SVG metadata
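One possible shape for TOC generation is sketched below, assuming the documents use ATX-style Markdown headings (`#` through `######`). The `TOCEntry` type and the slug format are illustrative choices, not part of the plan. The hierarchy is carried by the `Level` field, and headings that appear inside fenced code blocks are skipped.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// TOCEntry is one heading in the generated table of contents.
type TOCEntry struct {
	Level  int    // 1 for "#", 2 for "##", and so on
	Title  string // heading text without the leading markers
	Anchor string // hypothetical slug used as the link target
}

var (
	fence     = strings.Repeat("`", 3) // code fence marker
	headingRe = regexp.MustCompile(`^(#{1,6})\s+(.+?)\s*$`)
	nonSlugRe = regexp.MustCompile(`[^a-z0-9]+`)
)

// slugify lowercases the title and joins alphanumeric runs with "-".
func slugify(title string) string {
	return strings.Trim(nonSlugRe.ReplaceAllString(strings.ToLower(title), "-"), "-")
}

// GenerateTOC returns the headings of a Markdown document in order,
// ignoring lines inside fenced code blocks.
func GenerateTOC(markdown string) []TOCEntry {
	var toc []TOCEntry
	inFence := false
	for _, line := range strings.Split(markdown, "\n") {
		if strings.HasPrefix(strings.TrimSpace(line), fence) {
			inFence = !inFence
			continue
		}
		if inFence {
			continue
		}
		if m := headingRe.FindStringSubmatch(line); m != nil {
			toc = append(toc, TOCEntry{Level: len(m[1]), Title: m[2], Anchor: slugify(m[2])})
		}
	}
	return toc
}

func main() {
	doc := "# Guide\n\n## Install\n" + fence + "\n# comment inside code\n" + fence + "\n### Linux\n"
	for _, e := range GenerateTOC(doc) {
		// Indent by heading level to show the hierarchy.
		fmt.Printf("%s- [%s](#%s)\n", strings.Repeat("  ", e.Level-1), e.Title, e.Anchor)
	}
}
```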
STAGE 3: Template Population (20% Effort)
Deliverables: 350 lines, 10+ tests
- MDX component registration
- Template variable substitution
- Conditional content rendering
- Dynamic section rendering
- Cross-reference resolution
Key functions:
- RegisterComponents() - Register MDX components
- PopulateTemplate() - Substitute variables
- RenderConditional() - Conditional content
- ResolveCrossReferences() - @link/@reference/@see resolution
- BuildComponentTree() - Hierarchical component structure
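The plan names the @link/@reference/@see directives but does not define their grammar, so the sketch below assumes an inline form such as `@see(install-guide)`. The `index` map is a stand-in for whatever lookup CAST link resolution provides. Unresolved targets are left in place and reported, so a later pass or a reviewer can deal with them.

```go
package main

import (
	"fmt"
	"regexp"
)

// refRe matches an assumed inline syntax such as "@see(install-guide)".
// The real directive grammar is not specified in the plan.
var refRe = regexp.MustCompile(`@(link|reference|see)\(([^)\s]+)\)`)

// ResolveCrossReferences replaces each directive whose target is present in
// index with a Markdown link and returns the targets it could not resolve.
func ResolveCrossReferences(text string, index map[string]string) (string, []string) {
	var unresolved []string
	out := refRe.ReplaceAllStringFunc(text, func(match string) string {
		target := refRe.FindStringSubmatch(match)[2]
		url, ok := index[target]
		if !ok {
			unresolved = append(unresolved, target)
			return match // keep the directive visible for later passes
		}
		return fmt.Sprintf("[%s](%s)", target, url)
	})
	return out, unresolved
}

func main() {
	// Hypothetical index; in SPAWN this would come from CAST.
	index := map[string]string{"install-guide": "/docs/install"}
	out, missing := ResolveCrossReferences("Start with @see(install-guide), then @link(faq).", index)
	fmt.Println(out)
	fmt.Println("unresolved:", missing)
}
```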
STAGE 4: Integration & API (15% Effort)
Deliverables: 280 lines, 8+ tests
- REST API endpoints
- Service orchestration
- Cache integration
- Pipeline orchestration
- WebSocket preparation
Endpoints:
- POST /api/v1/spawn/enrich - Enrich single document
- POST /api/v1/spawn/batch - Batch enrichment
- GET /api/v1/spawn/metadata/:doc_id - Retrieve metadata
- POST /api/v1/spawn/template/:template_id - Render template
- GET /api/v1/spawn/health - Health check
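To make the route layout concrete, here is a minimal sketch of two of the planned endpoints using only `net/http`. The response bodies, the in-memory `metadataStore`, and the `call` helper are assumptions for illustration. Methods are checked by hand so the code also builds on Go 1.21, where `ServeMux` patterns cannot include a method.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// metadataStore is a hypothetical in-memory stand-in for the metadata index.
var metadataStore = map[string]map[string]any{
	"doc-1": {"title": "Getting Started"},
}

func writeJSON(w http.ResponseWriter, status int, body any) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(body)
}

// NewRouter wires the health and metadata routes from the plan.
func NewRouter() http.Handler {
	const metadataPrefix = "/api/v1/spawn/metadata/"
	mux := http.NewServeMux()
	mux.HandleFunc("/api/v1/spawn/health", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
			return
		}
		writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
	})
	mux.HandleFunc(metadataPrefix, func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			writeJSON(w, http.StatusMethodNotAllowed, map[string]string{"error": "method not allowed"})
			return
		}
		docID := strings.TrimPrefix(r.URL.Path, metadataPrefix)
		meta, ok := metadataStore[docID]
		if !ok {
			writeJSON(w, http.StatusNotFound, map[string]string{"error": "unknown document"})
			return
		}
		writeJSON(w, http.StatusOK, meta)
	})
	return mux
}

// call sends one in-process request and returns the status code and body.
func call(method, path string) (int, string) {
	rec := httptest.NewRecorder()
	NewRouter().ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	paths := []string{
		"/api/v1/spawn/health",
		"/api/v1/spawn/metadata/doc-1",
		"/api/v1/spawn/metadata/nope",
	}
	for _, path := range paths {
		code, body := call(http.MethodGet, path)
		fmt.Println(code, body)
	}
}
```

Exercising the router through `httptest` keeps the endpoint tests in-process, which helps when counting toward the Stage 4 test target without starting a server.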
Implementation Schedule
| Phase | Duration | Files | Tests | LOC |
|---|---|---|---|---|
| Stage 1 | 2-3h | metadata.go, schema.go | 12+ | 500 |
| Stage 2 | 2-3h | enricher.go | 14+ | 450 |
| Stage 3 | 1-2h | templates.go | 10+ | 350 |
| Stage 4 | 1-2h | service.go, handlers.go | 8+ | 280 |
| Testing | 1-2h | spawn_test.go, integration_test.go | 20+ | 400 |
| Total | 7-12h | ~10 files | 64+ tests | 1,980 lines |
Technical Decisions
1. Metadata Format
- Primary: YAML frontmatter (compatible with most doc systems)
- Secondary: JSON frontmatter support
- Schema: JSON Schema for validation
- Example:
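The original example is not present here. As a hedged illustration only, the sketch below shows what a YAML frontmatter block might look like and how it could be separated from the document body before parsing. The field names (`title`, `tags`, `version`) are invented for the example, and treating a block that starts with `{` as JSON is an assumption about how the secondary format would be detected. Actual parsing would go through `gopkg.in/yaml.v3` or `encoding/json`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sample is an illustrative document; the field names are hypothetical.
const sample = `---
title: Getting Started
tags: [setup, intro]
version: 2
---
# Getting Started
`

// SplitFrontmatter returns the raw frontmatter between the leading "---"
// lines, its detected format ("yaml" or "json"), and the remaining body.
// A document without frontmatter is returned unchanged as the body.
func SplitFrontmatter(doc string) (raw, format, body string, err error) {
	lines := strings.Split(strings.ReplaceAll(doc, "\r\n", "\n"), "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return "", "", doc, nil
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			raw = strings.Join(lines[1:i], "\n")
			body = strings.Join(lines[i+1:], "\n")
			format = "yaml"
			if strings.HasPrefix(strings.TrimSpace(raw), "{") {
				format = "json" // assumed convention for JSON frontmatter
			}
			return raw, format, body, nil
		}
	}
	return "", "", "", errors.New("frontmatter opened with --- but never closed")
}

func main() {
	raw, format, body, err := SplitFrontmatter(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("format: %s\nfrontmatter:\n%s\nbody:\n%s", format, raw, body)
}
```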
2. Enrichment Strategy
- Semantic tagging: Integration with CAST tag resolver
- Table of contents: Auto-generated from headings
- Cross-references: Resolved via CAST link resolution
- Code metadata: Language, line count, syntax highlighting info
- Asset resolution: CDN URLs, local paths, remote resources
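The code metadata item (language and line count) could be gathered with a single pass over the document, as in the sketch below. It assumes code blocks are delimited by triple-backtick fences with an optional language tag; the `CodeBlockInfo` type is a hypothetical shape for the result.

```go
package main

import (
	"fmt"
	"strings"
)

var fence = strings.Repeat("`", 3) // code fence marker

// CodeBlockInfo holds metadata for one fenced code block.
type CodeBlockInfo struct {
	Language  string // first word after the opening fence, "" if absent
	LineCount int    // number of lines between the fences
	StartLine int    // 1-based line number of the opening fence
}

// AnalyzeCodeBlocks scans markdown and returns metadata for every closed
// fenced block. A block that is never closed is dropped.
func AnalyzeCodeBlocks(markdown string) []CodeBlockInfo {
	var blocks []CodeBlockInfo
	var current *CodeBlockInfo
	for i, line := range strings.Split(markdown, "\n") {
		trimmed := strings.TrimSpace(line)
		switch {
		case current == nil && strings.HasPrefix(trimmed, fence):
			current = &CodeBlockInfo{StartLine: i + 1}
			if fields := strings.Fields(strings.TrimPrefix(trimmed, fence)); len(fields) > 0 {
				current.Language = fields[0]
			}
		case current != nil && trimmed == fence:
			blocks = append(blocks, *current)
			current = nil
		case current != nil:
			current.LineCount++
		}
	}
	return blocks
}

func main() {
	doc := "Intro\n" + fence + "go\npackage main\n\nfunc main() {}\n" + fence + "\n"
	for _, b := range AnalyzeCodeBlocks(doc) {
		fmt.Printf("language=%q lines=%d start=%d\n", b.Language, b.LineCount, b.StartLine)
	}
}
```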
3. Template Engine
- Format: Handlebars-compatible syntax
- Variables: {{ variable_name }}
- Conditionals: {{#if condition}} ... {{/if}}
- Loops: {{#each array}} ... {{/each}}
- Helpers: {{ toUpperCase text }}
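The variable and helper forms can be sketched with a single regular expression, as below. This is a simplified stand-in and not a Handlebars implementation: it handles only `{{ name }}` and `{{ helper name }}`, leaves block tags such as `{{#if}}` and `{{#each}}` untouched, and the `helpers` table with its single `toUpperCase` entry is assumed for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// varRe matches "{{ name }}" and "{{ helper name }}". Block tags begin with
// "#" or "/" and therefore never match.
var varRe = regexp.MustCompile(`\{\{\s*([A-Za-z_]\w*)(?:\s+([A-Za-z_]\w*))?\s*\}\}`)

// helpers maps helper names to one-argument string functions (hypothetical set).
var helpers = map[string]func(string) string{
	"toUpperCase": strings.ToUpper,
}

// PopulateTemplate substitutes variables and applies helpers. Tags that
// refer to an unknown variable or helper are left in place unchanged.
func PopulateTemplate(tmpl string, vars map[string]string) string {
	return varRe.ReplaceAllStringFunc(tmpl, func(match string) string {
		m := varRe.FindStringSubmatch(match)
		if m[2] == "" { // plain variable: {{ name }}
			if v, ok := vars[m[1]]; ok {
				return v
			}
			return match
		}
		helper, okHelper := helpers[m[1]] // helper call: {{ helper name }}
		value, okValue := vars[m[2]]
		if !okHelper || !okValue {
			return match
		}
		return helper(value)
	})
}

func main() {
	vars := map[string]string{"title": "Getting Started", "author": "ada"}
	fmt.Println(PopulateTemplate("{{ title }} by {{ toUpperCase author }}", vars))
}
```

Leaving unknown tags in the output, instead of replacing them with an empty string, makes template mistakes visible, which supports the validation goal listed under Risk Mitigation.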
4. Caching Strategy
- Metadata cache: 1-hour TTL (60 minutes)
- Template cache: 24-hour TTL (persistent for static templates)
- Enrichment cache: Document version-based invalidation
- Asset cache: 7-day TTL with ETag validation
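The TTL values above could drive a small in-memory cache like the one sketched here. The `TTLCache` type, the injectable clock, and the `EnrichmentKey` format are assumptions for illustration. Version-based invalidation is modelled by putting the document version into the cache key, so a new version simply misses. ETag validation for assets is not shown.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TTLs taken from the caching strategy in the plan.
const (
	MetadataTTL = time.Hour
	TemplateTTL = 24 * time.Hour
	AssetTTL    = 7 * 24 * time.Hour
)

type entry struct {
	value     string
	expiresAt time.Time
}

// TTLCache is a minimal cache with one fixed time-to-live per instance.
type TTLCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	now   func() time.Time // injectable clock, so expiry can be tested
	items map[string]entry
}

func NewTTLCache(ttl time.Duration, now func() time.Time) *TTLCache {
	return &TTLCache{ttl: ttl, now: now, items: make(map[string]entry)}
}

func (c *TTLCache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value, expiresAt: c.now().Add(c.ttl)}
}

// Get returns the value if present and not yet expired.
func (c *TTLCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok {
		return "", false
	}
	if !c.now().Before(e.expiresAt) {
		delete(c.items, key) // expired: evict lazily on read
		return "", false
	}
	return e.value, true
}

// EnrichmentKey embeds the document version in the key (hypothetical format).
func EnrichmentKey(docID string, version int) string {
	return fmt.Sprintf("%s@v%d", docID, version)
}

// fakeClock is a manually advanced clock for deterministic demos and tests.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time       { return f.t }
func (f *fakeClock) AdvanceMinutes(n int) { f.t = f.t.Add(time.Duration(n) * time.Minute) }

func main() {
	clock := &fakeClock{}
	metadata := NewTTLCache(MetadataTTL, clock.Now)
	metadata.Set("doc-1", "cached metadata")

	clock.AdvanceMinutes(59)
	_, hit := metadata.Get("doc-1")
	fmt.Println("hit after 59 minutes:", hit)

	clock.AdvanceMinutes(1)
	_, hit = metadata.Get("doc-1")
	fmt.Println("hit after 60 minutes:", hit)

	fmt.Println(EnrichmentKey("doc-1", 3), EnrichmentKey("doc-1", 4))
}
```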
Dependencies
Internal
- pkg/cast (CAST subsystem for tag resolution)
- pkg/sift (SIFT subsystem for document quality)
External
- gopkg.in/yaml.v3 - YAML parsing
- encoding/json - JSON parsing (Go standard library)
- regexp - Pattern matching for content extraction (Go standard library)
- github.com/Masterminds/sprig - Template functions (Handlebars helpers)
Success Criteria
| Criterion | Target | Definition |
|---|---|---|
| Test Coverage | 50+ tests | Unit + integration tests for all components |
| Pass Rate | 100% | All tests passing, 0 failures |
| Performance | <100ms | Metadata extraction <50ms, enrichment <100ms |
| Compilation | 0 errors | Clean Go build, type-safe throughout |
| Code Quality | Zero warnings | Consistent style, proper error handling |
| Functions | 100+ | Comprehensive function suite |
| Lines | 2,000+ | Production-grade implementation |
Risk Mitigation
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Template syntax errors | Medium | High | Comprehensive validation, early error reporting |
| Asset resolution failures | Medium | Medium | Graceful fallback, detailed logging |
| Performance degradation | Low | High | Aggressive caching, benchmarking |
| CAST integration issues | Low | High | Mock CAST for independent testing |
| Schema validation complexity | Low | Medium | Reusable validation library |
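The "Mock CAST for independent testing" mitigation usually comes down to depending on a narrow interface instead of the concrete package. The sketch below assumes a `TagResolver` interface with one method; the real `pkg/cast` API is not described in this plan, so every name here is hypothetical. `EnrichTags` also shows a graceful fallback: an unresolved tag is kept as written.

```go
package main

import (
	"errors"
	"fmt"
)

// TagResolver is a hypothetical, minimal view of what SPAWN needs from CAST.
type TagResolver interface {
	ResolveTag(name string) (canonical string, err error)
}

// ErrUnknownTag is returned by the mock when a tag has no mapping.
var ErrUnknownTag = errors.New("unknown tag")

// MockCAST is an in-memory TagResolver for tests that must run without CAST.
type MockCAST struct {
	Tags  map[string]string
	Calls []string // lookups are recorded so tests can assert on them
}

func (m *MockCAST) ResolveTag(name string) (string, error) {
	m.Calls = append(m.Calls, name)
	if canonical, ok := m.Tags[name]; ok {
		return canonical, nil
	}
	return "", ErrUnknownTag
}

// EnrichTags resolves each raw tag to its canonical form. When resolution
// fails, the raw tag is kept so enrichment degrades instead of aborting.
func EnrichTags(resolver TagResolver, raw []string) []string {
	out := make([]string, 0, len(raw))
	for _, tag := range raw {
		canonical, err := resolver.ResolveTag(tag)
		if err != nil {
			canonical = tag
		}
		out = append(out, canonical)
	}
	return out
}

func main() {
	mock := &MockCAST{Tags: map[string]string{"golang": "go"}}
	fmt.Println(EnrichTags(mock, []string{"golang", "docs"}))
	fmt.Println("lookups:", mock.Calls)
}
```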
Go/No-Go Checkpoints
Before Starting
- CAST subsystem is production-ready (verify DELIVERY.md)
- Go 1.21+ environment configured
- PostgreSQL available for schema testing
- Development machine ready (4GB+ RAM recommended)
After Stage 1
- All metadata extraction tests pass
- Schema validation comprehensive
- Asset resolution working
- <50ms metadata extraction confirmed
After Stage 2
- Content enrichment tests pass
- CAST integration verified
- TOC generation accurate
- Code block analysis working
After Stage 3
- Template engine tests pass
- MDX component registration working
- Cross-references resolved
- Conditional rendering accurate
After Stage 4
- All API endpoints working
- Health check responsive
- Service orchestration complete
- Cache integration verified
Final Verification
- 50+ tests passing
- 100% pass rate confirmed
- Performance targets met
- Zero compilation errors
- Code coverage >80%
File Structure
Next Phase: STITCH
After SPAWN is complete:
- STITCH - Content Coordination System
- Repository synchronization
- Cross-document linking
- Event-driven updates
- Real-time collaboration bridge
Status: Ready for Implementation
Authority: CTO Approval
Recommended: Proceed with Stage 1 immediately