HTTP MCP: The Server Architecture That Scales
The choice
When building orchestration platforms for AI agents, you face a critical architectural decision: how do your agents communicate with tools?
Option 1: Subprocess invocation
Performance head-to-head
We integrated Traceo (a 26-tool MCP server) into PEBBLE using HTTP transport. Real measurements on April 15, 2026:
Initialization cost
| Transport | Time | Components |
|---|---|---|
| HTTP | 195ms | Network (120ms) + server setup (50ms) + JSON parsing (25ms) |
| Subprocess | ~80ms | Process fork (60ms) + system startup (20ms) |
Per-tool execution
| Transport | Time | Explanation |
|---|---|---|
| HTTP | 20ms | Network roundtrip (15ms) + tool execution (5ms) |
| Subprocess | 15ms | No network, but: fork process (5ms) + load libraries (8ms) + execute (2ms) |
Total cost: Single tool call
| Transport | Time |
|---|---|
| HTTP | 195ms init + 20ms call = 215ms |
| Subprocess | 80ms init + 15ms call = 95ms |
Total cost: 10-tool workflow
| Transport | Time |
|---|---|
| HTTP | 195ms + (20ms × 10) = 395ms |
| Subprocess | 80ms + (15ms × 10) = 230ms |
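The totals in the tables above follow a simple linear model: a one-time initialization cost plus a fixed cost per call. A minimal sketch of that arithmetic, using only the figures reported above; the `Transport` class and its field names are illustrative and not part of PEBBLE or Traceo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transport:
    """Linear cost model: one-time init plus a fixed cost per tool call."""
    name: str
    init_ms: float
    per_call_ms: float

    def total_ms(self, calls: int) -> float:
        # Sequential calls from a single agent; no concurrency modeled.
        return self.init_ms + self.per_call_ms * calls


# Figures taken from the measurement tables above.
HTTP = Transport("HTTP", init_ms=195, per_call_ms=20)
SUBPROCESS = Transport("Subprocess", init_ms=80, per_call_ms=15)

for calls in (1, 10):
    for transport in (HTTP, SUBPROCESS):
        total = transport.total_ms(calls)
        print(f"{transport.name:<10} {calls:>2} call(s): {total:.0f}ms")
```

This model covers one agent running tools sequentially. It says nothing about concurrent agents, which is what the next table addresses.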
Total cost: 100 simultaneous agents, each running 10 tools
| Transport | Total Time | Per-Agent Parallelism | Resource Overhead |
|---|---|---|---|
| HTTP | 395ms (amortized) | ✓ Fully parallel (1 server instance) | 1 running process |
| Subprocess | 2300ms+ (serialize or fork) | ✗ Limited (process table explosion) | 1000 processes spawned |
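The "fully parallel" row assumes the server handles requests from different agents concurrently, so the wall-clock time for the fleet stays close to the time for one agent. The following is a rough simulation of that idea, not a benchmark: `asyncio.sleep` stands in for a network call, the delay is scaled down from the measured 20ms, and all function names are hypothetical.

```python
import asyncio
import time

PER_CALL_S = 0.002  # scaled-down stand-in for one HTTP tool call


async def call_tool(agent_id: int, tool_id: int) -> tuple[int, int]:
    # Simulated I/O wait; a real client would await an HTTP response here.
    await asyncio.sleep(PER_CALL_S)
    return agent_id, tool_id


async def run_agent(agent_id: int, tools: int) -> list[tuple[int, int]]:
    # Each agent runs its own tools one after another.
    return [await call_tool(agent_id, tool_id) for tool_id in range(tools)]


async def run_fleet(agents: int, tools: int) -> tuple[int, float]:
    start = time.perf_counter()
    # Agents overlap with each other because every call is just a wait.
    per_agent = await asyncio.gather(
        *(run_agent(agent_id, tools) for agent_id in range(agents))
    )
    elapsed = time.perf_counter() - start
    return sum(len(results) for results in per_agent), elapsed


total_calls, elapsed_s = asyncio.run(run_fleet(agents=100, tools=10))
sequential_s = 100 * 10 * PER_CALL_S
print(f"{total_calls} calls in {elapsed_s:.3f}s "
      f"(fully serialized would be about {sequential_s:.1f}s)")
```

The simulation only shows that I/O-bound calls overlap. Whether a real server keeps up with 100 agents depends on its own concurrency model and on how much CPU each tool needs.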
Why subprocess fails at scale
Process table explosion
Each tool call = new process:
Fork() isn’t free
Process forking is expensive:
| Operation | Cost |
|---|---|
| fork() | 50-80ms (copy-on-write, still expensive) |
| exec() | 20-40ms (replace process image) |
| Python startup | 20-60ms (import libraries) |
| Tool initialization | Variable |
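These costs vary widely by machine, operating system, and how many libraries the tool imports, so it is worth measuring them on your own hardware. A minimal sketch that times the whole spawn of an empty Python process (process creation, interpreter startup, and exit together, not each step separately); the function name and run count are arbitrary choices.

```python
import statistics
import subprocess
import sys
import time


def time_spawn(runs: int = 5) -> list[float]:
    """Return wall-clock milliseconds for spawning an empty Python process."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Same interpreter as the caller, running a no-op program.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        samples.append((time.perf_counter() - start) * 1000)
    return samples


samples_ms = time_spawn()
print(f"median spawn + startup + exit: {statistics.median(samples_ms):.1f}ms")
```

Replacing `"pass"` with the imports a real tool needs gives a closer estimate of the per-call price of the subprocess approach.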
Memory bloat
When subprocess wins
Subprocess is the right choice when:
- Tool isolation required: Each tool runs in its own security boundary
- Extreme crash tolerance: Tool crashes don’t affect others (a separate process isolates failures)
- Incompatible dependencies: Tool A needs Python 2, Tool B needs Python 3 (separate processes required)
- Single-tool workflows: You’re only running one tool, so initialization cost doesn’t matter
When HTTP wins
HTTP is better when:
- Multi-tool workflows: More than 5 tools per agent session
- Concurrent agents: Multiple agents or users requesting tools simultaneously
- Scaling beyond a single machine: When you need to add capacity, HTTP scales horizontally with load balancers
- Tool reuse: Same tools called repeatedly (connection pooling amortizes network cost)
- Infrastructure simplicity: One running process vs. process explosion
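The connection reuse mentioned in the list above can be illustrated with the standard library alone. In this sketch a throwaway local server stands in for an MCP endpoint (the `/tool` path and `EchoHandler` are hypothetical), and a single persistent connection carries several requests, so the TCP handshake is paid once.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_client_ports = []  # one entry per request, recorded by the server


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows keep-alive connections

    def do_GET(self):
        # The client's source port identifies the underlying TCP connection.
        seen_client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One connection object, reused for every request.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
statuses = []
for _ in range(5):
    conn.request("GET", "/tool")
    response = conn.getresponse()
    response.read()  # drain the body before sending the next request
    statuses.append(response.status)
conn.close()
server.shutdown()
server.server_close()

print(f"{len(statuses)} requests over "
      f"{len(set(seen_client_ports))} TCP connection(s)")
```

Client libraries such as requests (via `Session`), aiohttp, and httpx provide the same reuse through their built-in connection pools.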
Architecture patterns
Pattern 1: Subprocess (Traditional)
Pattern 2: HTTP per tool (PEBBLE’s approach)
Pattern 3: HTTP + isolated containers (Enterprise)
Implementation checklist
If you’re moving from subprocess to HTTP MCP:
- Choose FastMCP or similar HTTP framework
- Implement session management (mcp-session-id header)
- Add connection pooling (requests library, aiohttp, httpx)
- Deploy behind a load balancer (Nginx, HAProxy, cloud LB)
- Monitor latency with Prometheus (init + per-tool p95)
- Set up horizontal scaling rules (e.g., add instance when CPU > 70%)
- Test failover (one server down, traffic reroutes)
- Document connection limits (TCP connections, memory per server)
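The session management item in the checklist comes down to remembering the session ID the server issues and sending it back on every later request. A minimal, transport-agnostic sketch of that bookkeeping; the class name, method names, and the `abc123` value are hypothetical, and the `Accept` value is an assumption about what a streamable HTTP MCP server expects, so check it against your framework.

```python
from typing import Mapping, Optional

SESSION_HEADER = "mcp-session-id"


class McpSessionState:
    """Tracks the session ID issued by the server (illustrative helper)."""

    def __init__(self) -> None:
        self.session_id: Optional[str] = None

    def capture(self, response_headers: Mapping[str, str]) -> None:
        # HTTP header names are case-insensitive, so compare in lowercase.
        for name, value in response_headers.items():
            if name.lower() == SESSION_HEADER:
                self.session_id = value

    def request_headers(self) -> dict[str, str]:
        headers = {
            "Content-Type": "application/json",
            # Assumed Accept value; verify against your server framework.
            "Accept": "application/json, text/event-stream",
        }
        if self.session_id is not None:
            headers[SESSION_HEADER] = self.session_id
        return headers


state = McpSessionState()
first_request = state.request_headers()      # no session ID yet
state.capture({"Mcp-Session-Id": "abc123"})  # made-up server response header
later_request = state.request_headers()      # now carries the session ID
print(later_request[SESSION_HEADER])
```

Keeping this state separate from the HTTP client makes it easy to pair with whichever pooled client you choose.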
The verdict
For production AI agent platforms, HTTP MCP is the better default. Subprocess has its place (isolation requirements, crash tolerance, single-use tools). But if you’re building a platform for multiple agents running multiple tools, HTTP’s advantage compounds with scale. The 195ms initialization penalty is paid once. In the sequential measurements above, subprocess is still faster per call (15ms vs. 20ms), so HTTP does not win on single-agent latency. It wins on concurrency: 100 agents share one running server process (395ms amortized) instead of spawning 1000 processes (2300ms+). At that scale HTTP is cheaper, and much simpler to operate.
What we built
PEBBLE uses HTTP MCP exclusively:
- Traceo (requirements): HTTP at mcp.traceo.cat
- Future providers: HTTP or gRPC (not subprocess)
- Scaling: Add instances horizontally, route via LB