Session-Managed HTTP: Stateful Communication on Stateless Transport

The paradox

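HTTP is fundamentally stateless. The protocol never ties one request to the next: even when several requests share a keep-alive TCP connection, each one must carry everything the server needs to process it. Yet every non-trivial application needs some kind of state: shopping carts, authentication, user context, tool state. The traditional answer is cookies: the browser stores a cookie (often just a session ID) and sends it back on every request, while the server keeps the data behind it. This works for web browsers, but breaks down for: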

Service-to-service communication (no browser)
Load-balanced clusters (which server holds the state?)
Stateful tools and agents (sessions need to survive across calls)

The question: How do you implement sessions over HTTP without violating its stateless nature? The answer: Session headers + connection pooling + distributed state. This isn’t new. But it’s rarely explained clearly. Let’s fix that.

The design

Step 1: Session header exchange

Client sends:
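  POST /mcp/initialize
  Content-Type: application/json
  Accept: application/json, text/event-stream

  {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {...}}

Server responds with:
  HTTP 200
  Content-Type: application/json
  mcp-session-id: 01ARZ3NDEKTSV4RRFFQ69G5FAV

  {"jsonrpc": "2.0", "id": 1, "result": {...}}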

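The server creates an in-memory session, assigns it a unique, unguessable ID (the example is a ULID, a sortable 128-bit identifier, rather than a UUID), and returns the ID in the mcp-session-id response header.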
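A minimal sketch of the server side of Step 1, assuming an in-process dict as the session store. The names `SESSIONS`, `new_ulid`, and `handle_initialize` are hypothetical, and transport details (routing, JSON-RPC parsing) are left out. The ID mimics the ULID layout of the example header (48-bit millisecond timestamp plus 80 random bits, Crockford base32); the random bits come from `secrets`, so IDs can’t be predicted.

```python
import secrets
import time

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Hypothetical in-process session store: session ID -> per-session state.
SESSIONS: dict[str, dict] = {}


def new_ulid() -> str:
    """Return a 26-character ULID: 48-bit ms timestamp + 80 random bits."""
    ts_ms = int(time.time() * 1000) & ((1 << 48) - 1)
    rand = secrets.randbits(80)  # CSPRNG, so IDs can't be guessed
    value = (ts_ms << 80) | rand  # 128-bit integer
    chars = []
    for _ in range(26):  # 26 chars x 5 bits = 130 bits; the top 2 bits are zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def handle_initialize(request_body: dict) -> tuple[int, dict, dict]:
    """Create a session and return (status, headers, JSON-RPC response body)."""
    session_id = new_ulid()
    SESSIONS[session_id] = {"context": {}, "created_at": time.time()}
    headers = {"mcp-session-id": session_id}
    body = {"jsonrpc": "2.0", "id": request_body.get("id"), "result": {}}
    return 200, headers, body


status, headers, body = handle_initialize({"jsonrpc": "2.0", "id": 1, "method": "initialize"})
print(status, headers["mcp-session-id"])
```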

Step 2: Client captures and reuses

# After initialization:
self._session_id = response.headers.get('mcp-session-id')

# On every subsequent call:
headers = {'mcp-session-id': self._session_id}
response = await self.post('/call', body, headers=headers)

The client remembers the ID and includes it in all future requests.
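The capture-and-reuse logic from Step 2 can be sketched as a small client class. `SessionClient` and the `Transport` callable are hypothetical: the transport stands in for a pooled HTTP client so the example runs without a server, and `fake_transport` below plays the server from Step 1.

```python
from typing import Callable, Optional

# Hypothetical transport: (path, body, headers) -> (status, response_headers, body).
# In practice this would wrap a pooled HTTP client.
Transport = Callable[[str, dict, dict], tuple[int, dict, dict]]


class SessionClient:
    """Captures the session ID from initialize and attaches it to later calls."""

    def __init__(self, transport: Transport) -> None:
        self._send = transport
        self._session_id: Optional[str] = None

    def initialize(self) -> dict:
        status, headers, body = self._send("/mcp/initialize", {"method": "initialize"}, {})
        # HTTP header names are case-insensitive; normalize before the lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        self._session_id = lowered.get("mcp-session-id")
        if status != 200 or self._session_id is None:
            raise RuntimeError(f"initialize failed (status {status})")
        return body

    def call(self, body: dict) -> dict:
        if self._session_id is None:
            raise RuntimeError("call initialize() first")
        headers = {"mcp-session-id": self._session_id}
        status, _, resp = self._send("/call", body, headers)
        if status != 200:
            raise RuntimeError(f"call failed (status {status})")
        return resp


# Fake server that records what the client sent.
seen = []


def fake_transport(path: str, body: dict, headers: dict) -> tuple[int, dict, dict]:
    seen.append((path, dict(headers)))
    if path == "/mcp/initialize":
        return 200, {"Mcp-Session-Id": "01ARZ3NDEKTSV4RRFFQ69G5FAV"}, {"result": {}}
    return 200, {}, {"result": "ok"}


client = SessionClient(fake_transport)
client.initialize()
result = client.call({"tool": "search", "args": {}})
```

The initialize request goes out with no session header; every later call carries the captured ID.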

Step 3: Server retrieves and reuses

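# On receiving a request (aiohttp handler; `web` is aiohttp.web):
session_id = request.headers.get('mcp-session-id')
session = server.sessions.get(session_id)  # average O(1) dict lookup
if session is None:
    return web.Response(status=404)  # unknown or expired: client must re-initialize
# Execute the tool with the session's context
result = await tool.execute(session.context)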

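The server keeps a session store (an in-process dict, Redis, or a database). Given an ID, an in-process dict lookup takes well under a microsecond; a networked store such as Redis or a database adds a network round trip to every lookup.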
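A slightly fuller version of the Step 3 lookup, assuming the same kind of in-process dict. `resolve_session` is a hypothetical helper; the status codes (400 when the header is missing, 404 when the ID is unknown or expired) follow the convention of the MCP Streamable HTTP transport, which tells clients to start a new session when they get a 404.

```python
from typing import Optional

# Hypothetical in-process store, as in Step 1: session ID -> session state.
SESSIONS: dict[str, dict] = {
    "01ARZ3NDEKTSV4RRFFQ69G5FAV": {"context": {"filter": "pdf"}},
}


def resolve_session(headers: dict[str, str]) -> tuple[int, Optional[dict]]:
    """Return (HTTP status, session) for an incoming non-initialize request."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    session_id = lowered.get("mcp-session-id")
    if session_id is None:
        return 400, None  # request needs a session but carries no ID
    session = SESSIONS.get(session_id)  # average O(1) dict lookup
    if session is None:
        return 404, None  # unknown or expired: client should re-initialize
    return 200, session


status, session = resolve_session({"Mcp-Session-Id": "01ARZ3NDEKTSV4RRFFQ69G5FAV"})
print(status, session)
```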

Step 4: Connection pooling captures the real win

# Client with connection pooling: create one aiohttp.ClientSession and reuse it
async with aiohttp.ClientSession() as http:
    for _ in range(3):
        # The first request opens a TCP connection; later ones reuse it from the pool
        async with http.post(url, json=body, headers=headers) as resp:
            await resp.read()  # consume the body so the connection returns to the pool

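The HTTP client keeps TCP connections alive and reuses them, as long as you keep one client object around: aiohttp’s ClientSession, httpx’s Client/AsyncClient, and requests’ Session all pool connections (module-level calls like requests.get() do not). Each request picks up an idle pooled connection instead of opening a new one; concurrent requests may still open extra connections, up to the pool’s limit. Despite the name, aiohttp’s ClientSession has nothing to do with the mcp-session-id; it is just the connection pool.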
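Connection reuse is easy to observe with the standard library alone. This sketch starts a local HTTP/1.1 server that records the client port of each request’s TCP connection, then sends three requests over one `http.client.HTTPConnection` and three over fresh connections. `HTTPConnection` is a single persistent connection rather than a pool; the clients above add pooling on top of the same keep-alive behavior. The handler, the port bookkeeping, and the `/call` path are illustrative.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_ports: list[int] = []  # client-side port of each request's TCP connection


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive on the server side

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        seen_ports.append(self.client_address[1])
        payload = b'{"jsonrpc": "2.0", "result": {}}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass


server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address
headers = {"mcp-session-id": "01ARZ3NDEKTSV4RRFFQ69G5FAV", "Content-Type": "application/json"}

# One persistent connection, reused for every call.
conn = http.client.HTTPConnection(host, port)
for _ in range(3):
    conn.request("POST", "/call", body=b"{}", headers=headers)
    conn.getresponse().read()  # drain the body so the socket can be reused
conn.close()
reused = len(set(seen_ports))

# A fresh connection (and TCP handshake) per call, for comparison.
seen_ports.clear()
for _ in range(3):
    c = http.client.HTTPConnection(host, port)
    c.request("POST", "/call", body=b"{}", headers=headers)
    c.getresponse().read()
    c.close()
fresh = len(set(seen_ports))

server.shutdown()
server.server_close()
print(f"distinct TCP connections: reused={reused}, fresh={fresh}")
```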

Why this is clever

Traditional HTTP (no pooling):

Request 1: TCP SYN → SYN-ACK → establish → POST → response → close
Request 2: TCP SYN → SYN-ACK → establish → POST → response → close
Request 3: TCP SYN → SYN-ACK → establish → POST → response → close
─────────────────────────────────────────────────────────────────
Cost: 3 handshakes × ~20ms = 60ms overhead

Session + pooling:

Request 1: TCP SYN → SYN-ACK → establish → POST → response → (keep open)
Request 2: (reuse) → POST → response → (keep open)
Request 3: (reuse) → POST → response → (keep open)
────────────────────────────────────────────────────────
Cost: 1 handshake × ~20ms = 20ms overhead
Total savings: 40ms across 3 calls

Across 100 calls, that’s 99 avoided handshakes: about 2 seconds (99 × 20ms = 1,980ms) saved just by reusing the connection.
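The savings generalize to a simple formula. With n calls and a per-connection setup cost t_hs (the ~20ms assumed above), fresh connections pay the setup n times while a pooled connection pays it once:

```latex
\text{overhead}_{\text{fresh}} = n\,t_{\text{hs}}, \qquad
\text{overhead}_{\text{pooled}} = t_{\text{hs}}, \qquad
\text{saved} = (n-1)\,t_{\text{hs}}

n = 3:\quad (3-1)\times 20\,\text{ms} = 40\,\text{ms}
\qquad
n = 100:\quad (100-1)\times 20\,\text{ms} = 1980\,\text{ms} \approx 2\,\text{s}
```

Over HTTPS, t_hs also includes the TLS handshake (one extra round trip with TLS 1.3, two with TLS 1.2), so each new connection costs more and pooling saves proportionally more.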

Distributed state (the scaling trick):

Server 1: {"session-id-A": {context...}}
Server 2: {"session-id-B": {context...}}
Server 3: {"session-id-C": {context...}}

Load balancer routes:
Client with session-A → Server 1 (session found ✓)
                     → Server 2 (session NOT found ✗)
                     → Server 3 (session NOT found ✗)

This only works if the same session always goes to the same server (sticky sessions / session affinity). Most load balancers support this. But what if you don’t want sticky sessions? Move session state to shared storage:

All servers → Redis: {session-id → context}

Load balancer routes:
Client with session-A → Server 1 → Redis lookup ✓
                     → Server 2 → Redis lookup ✓
                     → Server 3 → Redis lookup ✓

Any server can handle any request, so from the load balancer’s perspective the app servers are stateless.
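The routing trade-off can be simulated in a few lines. This is a toy model, not a load balancer: `DictStore` stands in for either one server’s private memory or a shared Redis instance (in production, Redis GET/SET on serialized session data), round robin models routing without affinity, and a CRC32 hash of the session ID models one way to implement sticky sessions. All names are hypothetical.

```python
import itertools
import zlib


class DictStore:
    """Dict-backed store; stands in for per-server memory or for Redis GET/SET."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def get(self, key: str):
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = value


class AppServer:
    def __init__(self, store: DictStore) -> None:
        self.store = store

    def has_session(self, session_id: str) -> bool:
        return self.store.get(session_id) is not None


def round_robin_hits(servers, session_id: str, n: int) -> int:
    """No affinity: requests rotate across servers; count how many find the session."""
    rr = itertools.cycle(servers)
    return sum(next(rr).has_session(session_id) for _ in range(n))


def sticky_index(session_id: str, n_servers: int) -> int:
    """Session affinity: a stable hash pins each ID to one server."""
    return zlib.crc32(session_id.encode()) % n_servers


def sticky_hits(servers, session_id: str, n: int) -> int:
    target = servers[sticky_index(session_id, len(servers))]
    return sum(target.has_session(session_id) for _ in range(n))


sid = "session-A"

# Per-server memory: the session lives only on the server that ran initialize.
local = [AppServer(DictStore()) for _ in range(3)]
local[sticky_index(sid, 3)].store.set(sid, {"context": {}})

# Shared store: every server reads the same backing store.
shared_store = DictStore()
shared_store.set(sid, {"context": {}})
shared = [AppServer(shared_store) for _ in range(3)]

print("local + round robin: ", round_robin_hits(local, sid, 3), "/ 3")
print("local + sticky:      ", sticky_hits(local, sid, 3), "/ 3")
print("shared + round robin:", round_robin_hits(shared, sid, 3), "/ 3")
```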

Real performance impact (PEBBLE data)

Without session headers / connection pooling:

Initialize tool-server: 195ms
Tool call #1: 45ms (new TCP connection)
Tool call #2: 45ms (new TCP connection)
Tool call #3: 45ms (new TCP connection)
─────────────
Total: 330ms (for 3 tool calls)

With session headers + pooling:

Initialize tool-server: 195ms
Tool call #1: 20ms (connection established during init, reused)
Tool call #2: 20ms (same connection)
Tool call #3: 20ms (same connection)
─────────────
Total: 255ms (for 3 tool calls)

Savings: 75ms (23% faster)
Marginal cost per tool: 20ms (vs 45ms without pooling)

Over 100 tool calls: 2,500ms saved (2.5 seconds).
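The measurements above fit a simple linear model: a fixed initialization cost plus a constant marginal cost per call. This sketch assumes per-call latency stays constant as call count grows, which real workloads won’t guarantee; `total_ms` is a hypothetical helper.

```python
def total_ms(init_ms: float, per_call_ms: float, calls: int) -> float:
    """Initialization cost plus a fixed marginal cost per tool call."""
    return init_ms + per_call_ms * calls


INIT_MS = 195
NO_POOL_MS, POOLED_MS = 45, 20  # per-call latencies from the PEBBLE numbers above

for calls in (3, 100):
    before = total_ms(INIT_MS, NO_POOL_MS, calls)
    after = total_ms(INIT_MS, POOLED_MS, calls)
    saved = before - after
    print(f"{calls} calls: {before:.0f} ms -> {after:.0f} ms, "
          f"saved {saved:.0f} ms ({saved / before:.0%})")
```

Under this model the relative savings grow with call count, from 23% at 3 calls to about 53% at 100, because the 195ms initialization is amortized over more calls.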

Design considerations

1. Session TTL (time-to-live)

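import time

SESSION_TTL = 1800  # 30 minutes

# On creation (and on each request, for a sliding window):
session.expires_at = time.time() + SESSION_TTL

# On request (aiohttp handler; `web` is aiohttp.web):
if time.time() >= session.expires_at:
    return web.Response(status=404)  # session expired: client must re-initialize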

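Why: without expiry, abandoned sessions pile up and leak memory. An expiry timestamp alone frees nothing, though; something has to remove expired entries, either a periodic sweep or the store’s native expiry (e.g., Redis key TTLs). Tradeoff: a session that sits idle for more than 30 minutes, for example while waiting on a long-running tool, will expire. Mitigation: refresh the expiry on every request (a sliding window), send a keepalive ping, or refresh the session before it expires.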
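A sketch of TTL handling for an in-process store, with sliding expiry (each access pushes the deadline out) and an explicit sweep that actually frees memory. `TTLSessionStore` is hypothetical; it uses `time.monotonic` because it lives in one process, and the clock is injectable so expiry can be exercised without waiting 30 minutes. A Redis-backed store would instead set a key expiry (`SET ... EX` or `EXPIRE`) and let Redis evict the key.

```python
import time
from typing import Callable, Optional

SESSION_TTL = 1800.0  # 30 minutes, as above


class TTLSessionStore:
    """In-memory sessions with sliding expiry and an explicit sweep (hypothetical)."""

    def __init__(self, ttl: float = SESSION_TTL,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._sessions: dict[str, tuple[float, dict]] = {}  # id -> (expires_at, state)

    def put(self, session_id: str, state: dict) -> None:
        self._sessions[session_id] = (self._clock() + self._ttl, state)

    def get(self, session_id: str) -> Optional[dict]:
        """Return the session and extend its lifetime, or None if missing/expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, state = entry
        now = self._clock()
        if now >= expires_at:
            del self._sessions[session_id]  # lazy cleanup on access
            return None
        self._sessions[session_id] = (now + self._ttl, state)  # sliding refresh
        return state

    def sweep(self) -> int:
        """Drop every expired session; run periodically to free memory."""
        now = self._clock()
        expired = [sid for sid, (exp, _) in self._sessions.items() if now >= exp]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)

    def __len__(self) -> int:
        return len(self._sessions)


fake_now = [0.0]
store = TTLSessionStore(ttl=1800, clock=lambda: fake_now[0])
store.put("a", {"context": {}})
store.put("b", {"context": {}})
fake_now[0] = 1000.0
store.get("a")  # touching "a" at t=1000 pushes its expiry to t=2800
fake_now[0] = 2000.0
removed = store.sweep()  # "b" expired at t=1800, so it is dropped
print(removed, len(store))
```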

2. Session size

Problem: large session objects mean slow serialization and memory bloat. Pattern: keep the session minimal and pass large data with each request:

# Bad: Store tool results in session
session.last_results = [...]  # Session bloats

# Good: Require client to send context with each request
tool_call = {
    "tool": "search",
    "args": {...},
    "context": {...}  # Client provides context, not server
}

Rule of thumb: Keep sessions under 1 KB. Everything else is context, not state.
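One way to enforce the 1 KB rule of thumb is to measure the session as it will be stored and reject anything over budget. This assumes sessions are serialized as compact JSON; `session_size` and `check_session` are hypothetical helpers.

```python
import json

MAX_SESSION_BYTES = 1024  # the 1 KB rule of thumb above


def session_size(session: dict) -> int:
    """Size of the session as it would be stored (compact UTF-8 JSON)."""
    return len(json.dumps(session, separators=(",", ":")).encode("utf-8"))


def check_session(session: dict) -> None:
    size = session_size(session)
    if size > MAX_SESSION_BYTES:
        raise ValueError(f"session is {size} bytes; budget is {MAX_SESSION_BYTES}")


small = {"user": "u-42", "filters": {"lang": "en"}, "page": 3}
check_session(small)  # within budget: no exception

# Storing tool results in the session blows the budget.
bloated = dict(small, last_results=[{"id": i, "title": "x" * 40} for i in range(50)])
try:
    check_session(bloated)
except ValueError as err:
    print(err)
```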

3. Concurrency within a session

# What if multiple requests use the same session_id simultaneously?

Request A: session.state = "searching"
Request B: (same session) → sees "searching" → waits or fails?
Request C: (same session) → ???

Collision!

Solution: Either:

Serialize: Use locks, guarantee only one request per session at a time
Isolate: Give each concurrent request its own mini-context
Document: “Sessions not thread-safe; use unique IDs for parallel calls”

Option 1 (serialize) is the simplest to get right, which makes it a common default. Option 2 is tricky. Option 3 is honest but pushes the burden onto clients.
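Option 1 can be sketched with one `asyncio.Lock` per session ID. This assumes a single-process asyncio server; `handle` and the `_session_locks` registry are hypothetical, and a multi-server deployment would need a distributed lock (or sticky routing) instead. Three concurrent requests on the same session run strictly one after another.

```python
import asyncio
from collections import defaultdict

# One lock per session ID. Requests on the same session run one at a time;
# requests on different sessions still run concurrently.
_session_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
events: list[str] = []


async def handle(session_id: str, name: str, session: dict) -> None:
    async with _session_locks[session_id]:
        events.append(f"{name} start")
        session["state"] = f"running {name}"
        await asyncio.sleep(0.01)  # simulated tool work; yields to other tasks
        events.append(f"{name} end")


async def main() -> None:
    session = {"state": "idle"}
    await asyncio.gather(
        handle("sess-1", "A", session),
        handle("sess-1", "B", session),
        handle("sess-1", "C", session),
    )


asyncio.run(main())
print(events)
```

In a real server the lock registry should be pruned when sessions expire; otherwise it grows without bound.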

4. Session security

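The session ID travels in a header rather than the URL, so it doesn’t end up in access logs or browser history by default. It is still exposed on the wire:

POST /mcp/call
mcp-session-id: 01ARZ3NDEKTSV4RRFFQ69G5FAV  ← Sent in cleartext over plain HTTP

Mitigation: always use HTTPS, and generate session IDs from a cryptographically secure random source so they can’t be guessed. HTTPS plus unguessable IDs is the baseline. If you need extra security, sign the session header with a keyed MAC and send the timestamp with the request (e.g., in its own header) so the server can recompute the signature:

mcp-session-id: 01ARZ3NDEKTSV4RRFFQ69G5FAV
mcp-session-sig: keyed-BLAKE3(client_key, session_id + timestamp)

The server recomputes the MAC with the shared key, compares it in constant time, and rejects stale timestamps. A stolen or guessed ID is useless without the key, and a captured request can only be replayed within the accepted timestamp window.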
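A sketch of signing and verifying the session header. BLAKE3 isn’t in Python’s standard library, so this swaps in keyed BLAKE2b (`hashlib.blake2b(..., key=...)`), which serves the same MAC role. `sign`, `verify`, the `.` separator between ID and timestamp, the 60-second window, and the shared key are all assumptions for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_S = 60  # accept signatures up to a minute old (assumed window)


def sign(session_id: str, timestamp: int, client_key: bytes) -> str:
    """Keyed BLAKE2b MAC over session ID + timestamp (stand-in for keyed BLAKE3)."""
    msg = f"{session_id}.{timestamp}".encode()  # separator avoids ambiguous concatenation
    return hashlib.blake2b(msg, key=client_key, digest_size=32).hexdigest()


def verify(session_id: str, timestamp: int, signature: str,
           client_key: bytes, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_S:
        return False  # stale or from the future: limits the replay window
    expected = sign(session_id, timestamp, client_key)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


KEY = b"shared-secret-provisioned-out-of-band"  # hypothetical shared key
SID = "01ARZ3NDEKTSV4RRFFQ69G5FAV"
ts = int(time.time())
sig = sign(SID, ts, KEY)

print(verify(SID, ts, sig, KEY))                                # genuine request
print(verify("01ARZ3NDEKTSV4RRFFQ69G5FAX", ts, sig, KEY))       # tampered ID
print(verify(SID, ts - 3600, sign(SID, ts - 3600, KEY), KEY))   # replayed an hour later
```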

When to use session-managed HTTP

✓ Use when:

Multiple requests from same client
Tools have shared context (search queries, filters, pagination)
You want connection reuse benefits
You’re scaling horizontally (backed by sticky sessions or a shared store such as Redis)

✗ Don’t use when:

Single request, one-off tool call
Each tool is completely independent (no shared state)
Tools are running in different security zones (isolation > performance)

Implementation checklist

The real lesson

Session-managed HTTP is a bridge between HTTP’s stateless nature and real-world stateful needs. It’s not a hack; it’s a design pattern. The key insight: you can be stateless from the platform’s perspective (any server can handle any request) while being stateful from the application’s perspective (tools have context). This is why session headers backed by a shared store scale better than sticky sessions alone:

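Design	Scaling	On server failure	Complexity
Sticky sessions (per-server state)	Scales with server count, but sessions are pinned and can’t be rebalanced	LB reroutes to another server; sessions held on the failed server are lost and clients must re-initialize	Low
Distributed session state (Redis + headers)	Add app servers freely; the shared store becomes the bottleneck	Transparent for app-server failures; Redis itself must be set up for failover (replication, Sentinel, or Cluster)	Medium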

For platforms like PEBBLE that will eventually have dozens of tool servers, distributed state is the better foundation.
