Context
TOMMY Phase 2 (dry run verification) required 15 webhook fire-and-diagnose iterations across a single session to get the 34-node n8n pipeline producing correct output. Each iteration revealed a distinct bug at a deeper layer of the pipeline.The Iteration Stack
| # | Bug | Phase | Fix Category |
|---|---|---|---|
| 1 | FORGE Shortcut If node strict type on null array | P0 | Type coercion |
| 2-3 | Anthropic API key missing / wrong credential | P2 | Credential binding |
| 4 | API credit balance exhausted | P2 | Billing |
| 5 | pipeline.yaml 404 crashes pipeline | P1 | Error handling |
| 6-7 | Assemble runs 2-3x (parallel fan-out) | P1→P5 | Architecture |
| 8 | Merge node passes empty data downstream | P3 | Data flow |
| 9 | Filter node array return in wrong mode | P3 | Return shape |
| 10-11 | Sequential chain breaks URL expressions | P1 | Expression refs |
| 12-14 | README URL has literal = prefix | P1 | UI save issue |
| 15 | SplitInBatches v3 skips loop body | P4 | Batch config |
What We Learned
The bug stack pattern: Complex workflows don’t fail at one point — they fail at the first point. Each fix reveals the next failure deeper in the pipeline. Debugging requires patience to iterate through every layer rather than assuming the first fix solves everything. API-driven debugging is faster than UI: Every diagnosis was done viacurl + python3 against the n8n REST API, not by clicking through the UI. The execution data contains the full node-by-node trace with input/output payloads.
Cost per iteration: Each dry run that reached the Claude API cost ~0.16 across 4 LLM-hitting runs. The other 11 iterations failed before reaching the LLM — zero token cost.