
Session-Managed HTTP: Stateful Communication on Stateless Transport

The paradox

HTTP is stateless by design. No request knows about any other, and the protocol has no built-in notion of a session. Every call is independent. Yet every non-trivial application needs some kind of state: shopping carts, authentication, user context, tool state. The traditional answer is cookies: the browser stores an identifier and sends it back with each request. This works for web browsers, but breaks down for:
  • Service-to-service communication (no browser)
  • Load-balanced clusters (which server holds the state?)
  • Stateful tools and agents (sessions need to survive across calls)
The question: How do you implement sessions over HTTP without violating its stateless nature? The answer: Session headers + connection pooling + distributed state. This isn’t new. But it’s rarely explained clearly. Let’s fix that.

The design

Step 1: Session header exchange

Client sends:
  POST /mcp/initialize
  Accept: application/json, text/event-stream
  
Server responds with:
  HTTP 200
  mcp-session-id: 01ARZ3NDEKTSV4RRFFQ69G5FAV
  
  {"jsonrpc": "2.0", "result": {...}}
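The server creates an in-memory session, assigns it a unique ID (the example above has the shape of a ULID; a random UUID works too), and returns the ID in the mcp-session-id response header.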
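A minimal sketch of the server side of this exchange, assuming a single-process server. The names `Session`, `sessions`, and `handle_initialize` are illustrative, not part of any real framework, and the ID here is a random hex token rather than the ULID-shaped value shown above.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class Session:
    """Server-side state for one client (fields are illustrative)."""
    session_id: str
    created_at: float = field(default_factory=time.time)
    context: dict = field(default_factory=dict)


# In-memory store: fine for a single process, lost on restart.
sessions: dict[str, Session] = {}


def handle_initialize() -> tuple[int, dict[str, str], dict]:
    """Create a session and return (status, response headers, JSON body)."""
    # 128 bits from the OS random source, so the ID cannot be guessed.
    session_id = secrets.token_hex(16)
    sessions[session_id] = Session(session_id)
    headers = {"mcp-session-id": session_id}
    body = {"jsonrpc": "2.0", "result": {}}
    return 200, headers, body
```

A real handler would be wired into whatever HTTP framework you use; only the create-store-return sequence matters here.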

Step 2: Client captures and reuses

# After initialization:
self._session_id = response.headers.get('mcp-session-id')

# On every subsequent call:
headers = {'mcp-session-id': self._session_id}
response = await self.post('/call', body, headers=headers)
The client remembers the ID and includes it in all future requests.
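The fragments above can be pulled into one small client. This is a sketch under stated assumptions: `SessionClient`, `Transport`, and `fake_transport` are hypothetical names, and the transport is a plain callable so the example runs without a network. In practice it would wrap your HTTP library.

```python
from typing import Callable, Optional

# A transport takes (path, body, headers) and returns (response_headers, body).
Transport = Callable[[str, dict, dict], tuple[dict, dict]]


class SessionClient:
    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self._session_id: Optional[str] = None

    def initialize(self) -> None:
        """Run the init call once and capture the session ID."""
        response_headers, _ = self._transport("/mcp/initialize", {}, {})
        self._session_id = response_headers.get("mcp-session-id")
        if self._session_id is None:
            raise RuntimeError("server did not return mcp-session-id")

    def call(self, body: dict) -> dict:
        """Send a tool call with the stored session ID attached."""
        if self._session_id is None:
            raise RuntimeError("call initialize() first")
        headers = {"mcp-session-id": self._session_id}
        _, result = self._transport("/call", body, headers)
        return result


def fake_transport(path: str, body: dict, headers: dict) -> tuple[dict, dict]:
    """Stand-in server used only to exercise the client."""
    if path == "/mcp/initialize":
        return {"mcp-session-id": "demo-session"}, {}
    return {}, {"echo": body, "session": headers.get("mcp-session-id")}
```

Failing loudly when the header is missing is a deliberate choice: a client that silently sends no session ID is the bug that is easiest to miss.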

Step 3: Server retrieves and reuses

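# On receiving a request:
session_id = request.headers.get('mcp-session-id')
session = server.sessions.get(session_id)  # O(1) lookup; None if the ID is unknown
if session is None:
    # Missing, unknown, or expired ID: reject so the client re-initializes
    return error_response(401)  # error_response is a hypothetical helper
# Execute tool with session context
result = await tool.execute(session.context)
The server has a session store (dict, Redis, database). Given an ID, an in-process dict returns the state in microseconds; Redis or a database adds a network round trip on top of that.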

Step 4: Connection pooling captures the real win

# Client with connection pooling (inside a coroutine):
async with aiohttp.ClientSession() as http:  # Create once
    for _ in range(3):
        # The first request opens the TCP connection; later ones reuse it
        async with http.post(url, headers=headers) as resp:
            await resp.read()  # Read the body so the connection returns to the pool
The HTTP library keeps TCP connections alive as long as you reuse one client object (aiohttp.ClientSession, httpx.Client or httpx.AsyncClient, requests.Session). Sequential requests to the same host then reuse the same socket.
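To see the reuse rather than take it on faith, here is a standard-library sketch that counts how many TCP connections a server actually sees. It uses `http.client` and `http.server` in place of aiohttp so it runs with no dependencies; `EchoHandler` and `count_connections` are illustrative names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Distinct client (host, port) pairs seen by the server: one per TCP connection.
seen_connections: set = set()


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive on the server side

    def do_POST(self) -> None:
        seen_connections.add(self.client_address)
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # drain the request body
        payload = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args) -> None:
        pass  # keep the demo quiet


def count_connections(requests_to_send: int) -> int:
    """Send N POSTs over one HTTPConnection; return TCP connections used."""
    seen_connections.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    try:
        for _ in range(requests_to_send):
            conn.request("POST", "/call", body=b"{}",
                         headers={"mcp-session-id": "demo"})
            conn.getresponse().read()  # read fully before reusing the socket
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return len(seen_connections)
```

Calling `count_connections(3)` should report a single connection, because all three requests travel over the same socket. Creating a fresh `HTTPConnection` per request would raise the count to three.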

Why this is clever

Traditional HTTP (no pooling):

Request 1: TCP SYN → SYN-ACK → establish → POST → response → close
Request 2: TCP SYN → SYN-ACK → establish → POST → response → close
Request 3: TCP SYN → SYN-ACK → establish → POST → response → close
─────────────────────────────────────────────────────────────────
Cost: 3 handshakes × ~20ms = 60ms overhead

Session + pooling:

Request 1: TCP SYN → SYN-ACK → establish → POST → response → (keep open)
Request 2: (reuse) → POST → response → (keep open)
Request 3: (reuse) → POST → response → (keep open)
────────────────────────────────────────────────────────
Cost: 1 handshake × ~20ms = 20ms overhead
Total savings: 40ms across 3 calls
Across 100 calls, that's 99 avoided handshakes × ~20ms ≈ 1,980ms, roughly 2 seconds saved just by reusing the connection.

Distributed state (the scaling trick):

Server 1: {"session-id-A": {context...}}
Server 2: {"session-id-B": {context...}}
Server 3: {"session-id-C": {context...}}

Load balancer routes:
Client with session-A → Server 1 (session found ✓)
                     → Server 2 (session NOT found ✗)
                     → Server 3 (session NOT found ✗)
This only works if the same session always goes to the same server (sticky sessions / session affinity). Most load balancers support this. But what if you don’t want sticky sessions? Move session state to shared storage:
All servers → Redis: {session-id → context}

Load balancer routes:
Client with session-A → Server 1 → Redis lookup ✓
                     → Server 2 → Redis lookup ✓
                     → Server 3 → Redis lookup ✓

Any server can handle the request. Stateless from LB perspective.
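A sketch of the shared-store layout, assuming sessions are stored as JSON strings under a `session:<id>` key. `InMemorySharedStore` is a stand-in for Redis so the example runs without a server, and `ToolServer` is a hypothetical name. With a real Redis client you would swap the store for one exposing the same `get` and `set` calls.

```python
import json
from typing import Optional


class InMemorySharedStore:
    """Stand-in for Redis so the sketch runs without a Redis server."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class ToolServer:
    """One server instance; it keeps no session state of its own."""

    def __init__(self, name: str, store: InMemorySharedStore) -> None:
        self.name = name
        self._store = store

    def handle(self, session_id: str, updates: dict) -> dict:
        key = f"session:{session_id}"
        raw = self._store.get(key)
        if raw is None:
            raise LookupError(f"unknown session {session_id!r}")
        context = json.loads(raw)
        context.update(updates)
        context["last_server"] = self.name
        # Read-modify-write is not atomic; concurrent writers need a lock
        # or a transaction in the real store.
        self._store.set(key, json.dumps(context))
        return context


store = InMemorySharedStore()
store.set("session:A", json.dumps({"query": "http"}))
servers = [ToolServer(f"server-{n}", store) for n in (1, 2, 3)]
# Round-robin routing: every server finds session A in the shared store.
results = [srv.handle("A", {"page": i}) for i, srv in enumerate(servers)]
```

The price is one store round trip per request, which is the trade you make for dropping sticky sessions.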

Real performance impact (PEBBLE data)

Without session headers / connection pooling:

Initialize tool-server: 195ms
Tool call #1: 45ms (new TCP connection)
Tool call #2: 45ms (new TCP connection)
Tool call #3: 45ms (new TCP connection)
─────────────
Total: 330ms (for 3 tool calls)

With session headers + pooling:

Initialize tool-server: 195ms
Tool call #1: 20ms (connection established during init, reused)
Tool call #2: 20ms (same connection)
Tool call #3: 20ms (same connection)
─────────────
Total: 255ms (for 3 tool calls)

Savings: 75ms (23% faster)
Marginal cost per tool: 20ms (vs 45ms without pooling)
Over 100 tool calls: 2500ms saved (2.5 seconds).
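The arithmetic behind these figures is easy to re-run for other call counts. The sketch below assumes the simple model implied by the numbers above: a fixed initialization cost plus a constant per-call cost. The function names are illustrative.

```python
def total_ms(init_ms: float, per_call_ms: float, calls: int) -> float:
    """Total latency: one initialization plus N tool calls."""
    return init_ms + per_call_ms * calls


def savings(calls: int, init_ms: float = 195,
            unpooled_ms: float = 45, pooled_ms: float = 20) -> tuple[float, float]:
    """Return (milliseconds saved, fraction of the unpooled total saved)."""
    without_pooling = total_ms(init_ms, unpooled_ms, calls)
    with_pooling = total_ms(init_ms, pooled_ms, calls)
    saved = without_pooling - with_pooling
    return saved, saved / without_pooling


saved_3, fraction_3 = savings(3)        # 75 ms, about 23% of 330 ms
saved_100, fraction_100 = savings(100)  # 2500 ms, about 53% of 4695 ms
```

The fixed 195ms initialization dominates short sessions, so the relative saving grows with the number of calls.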

Design considerations

1. Session TTL (time-to-live)

session.expires_at = time.time() + 1800  # 30 minutes

# On request:
if session.is_expired():
    return HTTP 401  # Session expired, client must re-init
Why: Prevent memory leaks. Without a TTL, abandoned sessions pile up; with one, they expire and can be cleaned up automatically.
Tradeoff: Long-running tools (>30 min) will time out.
Mitigation: Send a keepalive ping or refresh the session before expiry.
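One way to get both the TTL and the refresh-before-expiry mitigation is sliding expiry: every successful lookup pushes the deadline forward. The sketch below assumes an in-memory store; `SessionStore` and its methods are hypothetical names, and the clock is injectable so the behaviour can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

SESSION_TTL_SECONDS = 1800  # 30 minutes, as in the text


@dataclass
class Session:
    session_id: str
    expires_at: float
    context: dict = field(default_factory=dict)


class SessionStore:
    """In-memory store with sliding expiry (names are illustrative)."""

    def __init__(self, ttl: float = SESSION_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._sessions: dict[str, Session] = {}

    def create(self, session_id: str) -> Session:
        session = Session(session_id, self._clock() + self._ttl)
        self._sessions[session_id] = session
        return session

    def get(self, session_id: str) -> Optional[Session]:
        """Return a live session and push its expiry forward, else None."""
        session = self._sessions.get(session_id)
        if session is None:
            return None
        now = self._clock()
        if now >= session.expires_at:
            del self._sessions[session_id]  # expired: caller answers 401
            return None
        session.expires_at = now + self._ttl  # each request acts as a keepalive
        return session

    def purge_expired(self) -> int:
        """Drop expired sessions; run periodically to bound memory."""
        now = self._clock()
        stale = [sid for sid, s in self._sessions.items() if now >= s.expires_at]
        for sid in stale:
            del self._sessions[sid]
        return len(stale)
```

`purge_expired` matters because sessions that are never requested again would otherwise never be checked, and so never removed.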

2. Session size

Problem: Large session objects mean slow serialization and memory bloat.
Pattern: Minimize the session. Pass large data with each request:
# Bad: Store tool results in session
session.last_results = [...]  # Session bloats

# Good: Require client to send context with each request
tool_call = {
    "tool": "search",
    "args": {...},
    "context": {...}  # Client provides context, not server
}
Rule of thumb: Keep sessions under 1 KB. Everything else is context, not state.
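The 1 KB rule of thumb is easy to enforce at write time. A minimal sketch, assuming sessions are JSON-serializable dicts; `session_size_bytes` and `check_session_size` are hypothetical helpers.

```python
import json

MAX_SESSION_BYTES = 1024  # the 1 KB rule of thumb from the text


def session_size_bytes(session: dict) -> int:
    """Size of the session as compact UTF-8 JSON."""
    return len(json.dumps(session, separators=(",", ":")).encode("utf-8"))


def check_session_size(session: dict, limit: int = MAX_SESSION_BYTES) -> None:
    """Raise if the serialized session exceeds the limit."""
    size = session_size_bytes(session)
    if size > limit:
        raise ValueError(f"session is {size} bytes, limit is {limit}")
```

Calling `check_session_size` just before each store write turns a slow drift into bloat into an immediate, visible error.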

3. Concurrency within a session

# What if multiple requests use the same session_id simultaneously?

Request A: session.state = "searching"
Request B: (same session) → sees "searching" → waits or fails?
Request C: (same session) → ???

Collision!
Solution: Either:
  • Serialize: Use locks, guarantee only one request per session at a time
  • Isolate: Give each concurrent request its own mini-context
  • Document: “Sessions not thread-safe; use unique IDs for parallel calls”
Most APIs go with option 1 (serialize). Option 2 is tricky. Option 3 is honest but burdensome.
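Option 1 can be sketched with one `asyncio.Lock` per session ID, assuming a single-process asyncio server. `SessionLocks`, `handle_request`, and `demo` are illustrative names. A multi-server deployment would need a distributed lock instead, which this does not cover.

```python
import asyncio
from collections import defaultdict


class SessionLocks:
    """Hands out one asyncio.Lock per session ID."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    def for_session(self, session_id: str) -> asyncio.Lock:
        return self._locks[session_id]


async def handle_request(locks: SessionLocks, session: dict,
                         session_id: str, label: str) -> str:
    """Run one request; return the state it observed on entry."""
    async with locks.for_session(session_id):
        observed = session["state"]  # "idle" whenever the lock is held
        session["state"] = f"busy:{label}"
        await asyncio.sleep(0.01)    # simulated tool work; other tasks may run
        session["state"] = "idle"
        return observed


async def demo() -> list[str]:
    """Fire three concurrent requests at the same session."""
    locks = SessionLocks()
    session = {"state": "idle"}
    return await asyncio.gather(
        *(handle_request(locks, session, "A", label) for label in "ABC")
    )
```

Running `asyncio.run(demo())` shows every request finding the session idle, because the lock queues them one after another. Without the lock, requests B and C would observe `busy:A`. Locks should be discarded when their session expires, or the lock table becomes its own memory leak.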

4. Session security

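The session ID travels in a header, not in the URL, so it stays out of access logs and histories that record URLs. It is still exposed on the wire:
POST /mcp/call
mcp-session-id: 01ARZ3NDEKTSV4RRFFQ69G5FAV  ← Sent in cleartext if HTTP
Mitigation: Always use HTTPS, which protects the ID in transit. The ID is a bearer credential, so it also has to be unguessable and kept out of logs. If you need extra security, sign the session header:
mcp-session-id: 01ARZ3NDEKTSV4RRFFQ69G5FAV
mcp-session-sig: blake3(session_id + client_key + timestamp)
The server recomputes the signature on each request and rejects mismatches, which means the client must also send the timestamp it signed. This prevents tampering.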
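A runnable sketch of the signing idea, with two substitutions stated plainly: it uses HMAC-SHA256 from the standard library instead of blake3 (which is a third-party package in Python), and it assumes the client sends its timestamp in an extra header, here imagined as `mcp-session-timestamp`. The function names and the five-minute skew window are also assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # reject signatures older or newer than 5 minutes


def sign(session_id: str, timestamp: int, client_key: bytes) -> str:
    """Keyed signature over the session ID and timestamp."""
    message = f"{session_id}.{timestamp}".encode("utf-8")
    return hmac.new(client_key, message, hashlib.sha256).hexdigest()


def verify(session_id: str, timestamp: int, signature: str,
           client_key: bytes, now: Optional[float] = None) -> bool:
    """Check freshness first, then compare signatures in constant time."""
    current = time.time() if now is None else now
    if abs(current - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign(session_id, timestamp, client_key)
    return hmac.compare_digest(expected, signature)
```

A keyed construction such as HMAC is used rather than hashing a plain concatenation, and `hmac.compare_digest` avoids leaking how many leading characters of a guess were correct.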

When to use session-managed HTTP

Use when:
  • Multiple requests from same client
  • Tools have shared context (search queries, filters, pagination)
  • You want connection reuse benefits
  • You’re scaling horizontally (with sticky sessions or a Redis-backed session store)
Don’t use when:
  • Single request, one-off tool call
  • Each tool is completely independent (no shared state)
  • Tools are running in different security zones (isolation > performance)

Implementation checklist

  • Choose HTTP framework with session support (FastAPI + middleware, Flask-Session, etc.)
  • Generate session IDs (UUIDs, cryptographically random)
  • Store sessions (dict for single-instance, Redis for distributed)
  • Set TTL (e.g., 30 minutes)
  • Document session header requirement (easy to forget in clients)
  • Test session expiry (doesn’t leave garbage state)
  • Test concurrent requests (check for race conditions)
  • Monitor session memory (alert if count grows unbounded)
  • Add session affinity to load balancer (if not using Redis)
  • Use HTTPS (always)

The real lesson

Session-managed HTTP is a bridge between HTTP’s stateless nature and real-world stateful needs. It’s not a hack; it’s a design pattern. The key insight: You can be stateless from the platform’s perspective (any server can handle any request) while being stateful from the application’s perspective (tools have context). This is why the session header approach scales better than sticky sessions alone:
Design                                      | Scaling          | Failure                              | Complexity
Sticky sessions (no distributed state)      | To machine count | LB recalculates on server failure    | Low
Distributed session state (Redis + headers) | Unlimited        | Transparent (Redis handles failover) | Medium
For platforms like PEBBLE that will eventually have dozens of tool servers, distributed state is the better foundation.

Further reading

  • HTTP/1.1 RFC 9112, Section 9 (Connection management; obsoletes RFC 7230)
  • Session design trade-offs (blog post link)
  • Redis Cluster for distributed sessions
  • Load balancer session affinity documentation (your cloud provider)

The 20ms latency you see in PEBBLE’s tool calls isn’t magic. It’s the result of session headers + connection pooling cutting per-call latency from 45ms to 20ms, a reduction of more than 50% compared to opening a new connection for every request. Adopt this pattern. Your users won’t see the difference. Your ops team will.