Proxy mode
mcp serve starts a single MCP server that aggregates all your configured backends. Any MCP-compatible client connects once and gets access to every tool from every server in your servers.json.
The problem
Without proxy mode, every LLM tool (Claude Code, Cursor, Windsurf, etc.) needs its own copy of your MCP server configuration. Add a new server? Update it in 3 places. Change a token? Same. The config drifts, breaks, and wastes time.
There's another problem: resource waste. When you configure MCP servers with command in mcpServers, each client session spawns its own copy of every server process. Open 5 Claude Code sessions and you get 5 copies of every MCP server — easily 3-4 GB of RAM wasted on duplicate processes.
Stdio vs HTTP: when to use each
"command": "mcp", "args": ["serve"]
Each session spawns a new proxy (which spawns all backends)
Simple, but duplicates everything per session
"type": "sse", "url": "http://…/mcp/sse"
All sessions share one persistent proxy
One process, one set of backends, zero duplication
Recommendation: Run mcp serve --http as a persistent service (systemd, launchd, etc.) and point all your clients to it via SSE. This gives you a single set of backend connections shared across every session, every client, every terminal.
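On Linux, a persistent service can be sketched as a systemd user unit. This is an assumption-laden example: the unit name and binary path are placeholders; only the `mcp serve --http` command comes from this page.

```ini
# ~/.config/systemd/user/mcp-proxy.service (hypothetical unit name)
[Unit]
Description=Shared MCP proxy

[Service]
# Adjust the path to wherever the mcp binary is installed
ExecStart=%h/.local/bin/mcp serve --http
Restart=on-failure

[Install]
WantedBy=default.target
```

Enable it with `systemctl --user enable --now mcp-proxy.service`.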
How it works
1. Client sends `initialize` — the proxy responds immediately with capabilities.
2. Client calls `tools/list` — the proxy connects to all backends (lazy), discovers their tool lists, then lets idle backends shut down automatically.
3. Client calls `tools/call` — the proxy reconnects the target backend on demand (if it was shut down), routes the request, and tracks usage for adaptive timeout.
Tool namespacing
Tools are prefixed with the server name using a double underscore (`__`) as separator:

| Server | Tool | Exposed as |
| --- | --- | --- |
| sentry | search_issues | sentry__search_issues |
| slack | send_message | slack__send_message |
| github | list_repos | github__list_repos |
Descriptions are also prefixed: [sentry] Search for issues in Sentry.
This prevents collisions when two servers expose a tool with the same name.
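The prefixing and routing scheme described above amounts to a few lines of string handling. This is an illustrative Python sketch of the scheme, not the proxy's actual implementation:

```python
def namespace(server: str, tool: str) -> str:
    """Prefix a backend tool name with its server name."""
    return f"{server}__{tool}"

def describe(server: str, description: str) -> str:
    """Prefix a tool description with its server name."""
    return f"[{server}] {description}"

def route(namespaced: str) -> tuple[str, str]:
    """Split a namespaced tool name back into (server, tool).

    Splitting on the first '__' means tool names that themselves
    contain '__' still route correctly, assuming server names don't.
    """
    server, _, tool = namespaced.partition("__")
    return server, tool
```

Note the assumption: server names themselves must not contain `__`, or routing becomes ambiguous.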
Stdio mode (default)
Run `mcp serve` with no arguments. That's it: it reads the same servers.json (or $MCP_CONFIG_PATH) and connects to everything.
Diagnostics go to stderr, keeping stdout clean for the JSON-RPC stream.
HTTP mode
Expose the proxy as an HTTP server so multiple developers can share a single MCP endpoint:
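Start it with the `--http` flag:

```sh
mcp serve --http
```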
This starts an HTTP server on 127.0.0.1:8080 (localhost only, by default).
Custom bind address
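For illustration only — this page documents `--insecure` but not the bind-address flag itself, so the `--addr` name below is an assumption:

```sh
# --addr is a hypothetical flag name; check the CLI help for the real one
mcp serve --http --addr 0.0.0.0:8080 --insecure
```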
Security: Non-loopback addresses require the `--insecure` flag. Without TLS, binding to `0.0.0.0` exposes the proxy to the network in plaintext. Use a reverse proxy (nginx, Caddy) with TLS in production.
Endpoints
| Method | Path | Description |
| --- | --- | --- |
| POST | /mcp | JSON-RPC 2.0 request/response (Streamable HTTP) |
| GET | /mcp | SSE stream (same as /mcp/sse, for backward compatibility) |
| GET | /mcp/sse | SSE stream (old HTTP+SSE transport) |
| GET | /health | Health check (JSON) |
The proxy supports both the Streamable HTTP transport (protocol version 2025-11-25) and the older HTTP+SSE transport (2024-11-05) for backward compatibility.
POST /mcp
Send any MCP JSON-RPC request and get the response. Supports both requests (with id) and notifications (without id):
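For example, first a request with an `id`, then a notification without one (host and port assume the local default; `tools/list` and `notifications/initialized` are standard MCP methods):

```sh
# Request: has an id, so a JSON-RPC response body comes back
curl -s http://127.0.0.1:8080/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

# Notification: no id, so no response body is expected
curl -s http://127.0.0.1:8080/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"notifications/initialized"}'
```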
When used with an SSE session (?session_id=<uuid>), responses are delivered via the SSE stream and the POST returns 202 Accepted.
GET /mcp/sse
SSE endpoint for clients that use the old HTTP+SSE transport (protocol version 2024-11-05). On connect, the server sends an endpoint event with the URL to POST requests to:
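On the wire, the endpoint event looks roughly like this (the exact URL shape is an assumption, based on the `session_id` query parameter described for POST /mcp):

```
event: endpoint
data: /mcp?session_id=<uuid>
```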
JSON-RPC responses are delivered as message events on the SSE stream. The connection stays open with periodic pings (every 15 seconds) to keep it alive. Sessions are cleaned up automatically when the client disconnects.
GET /health
Returns the proxy status:
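A healthy proxy returns something like the following. Fields beyond `status` are not documented on this page, so this is a minimal illustration:

```json
{"status": "ok"}
```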
Graceful shutdown
The HTTP server shuts down cleanly on SIGTERM or SIGINT (Ctrl+C). It stops accepting new connections, finishes in-flight requests, and disconnects all backends.
Team setup
Run one proxy server on shared infrastructure; every developer connects to it over SSE.
Tokens stay on the server. Developers just connect. For a deeper look at the enterprise use case, see Enterprise token management.
Client configuration
Claude Code (stdio)
In your Claude Code MCP settings (.claude/mcp.json or via Claude Code settings):
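The stdio entry from the comparison table above, written out in full (the server key `proxy` is an arbitrary name):

```json
{
  "mcpServers": {
    "proxy": {
      "command": "mcp",
      "args": ["serve"]
    }
  }
}
```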
Claude Code (HTTP — shared server)
Use the SSE transport type pointing to the /mcp/sse endpoint:
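For example (host and port assume the local default of 127.0.0.1:8080; `proxy` is an arbitrary key):

```json
{
  "mcpServers": {
    "proxy": {
      "type": "sse",
      "url": "http://127.0.0.1:8080/mcp/sse"
    }
  }
}
```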
Note: The Streamable HTTP transport (`type: "http"`) requires OAuth, which is not yet supported. Use `type: "sse"` for now.
Cursor (stdio)
In .cursor/mcp.json:
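Same shape as the Claude Code entry (`proxy` is an arbitrary key):

```json
{
  "mcpServers": {
    "proxy": {
      "command": "mcp",
      "args": ["serve"]
    }
  }
}
```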
Cursor (HTTP — shared server)
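In `.cursor/mcp.json`, point at the shared endpoint (host and port assume the local default):

```json
{
  "mcpServers": {
    "proxy": {
      "type": "sse",
      "url": "http://127.0.0.1:8080/mcp/sse"
    }
  }
}
```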
Windsurf
In your Windsurf MCP config, add the proxy the same way: either a stdio entry (`"command": "mcp", "args": ["serve"]`) or an SSE entry pointing at the shared endpoint.
Any MCP client (generic stdio)
Any client that supports the stdio transport can use the proxy. It speaks standard JSON-RPC 2.0 (the MCP protocol) on stdin/stdout.
Any MCP client (HTTP)
Any client that supports HTTP transport can connect: POST JSON-RPC requests to /mcp, or open an SSE stream at /mcp/sse.
Lazy initialization and idle shutdown
The proxy does not keep all backends running permanently. It uses a lazy initialization strategy combined with adaptive idle shutdown to minimize resource usage.
How it works
1. Startup — no backends are connected. The proxy starts instantly.
2. First `tools/list` — the proxy connects to all backends, fetches their tool lists, and caches them. Backends are kept alive after discovery.
3. Idle shutdown — a background task checks every 30 seconds for idle backends. If a backend exceeds its idle timeout, it is shut down. Its tools remain visible in `tools/list`.
4. On-demand reconnect — when `tools/call` targets a disconnected backend, the proxy reconnects it transparently, refreshes the tool cache, and forwards the request.
Usage statistics (request count, frequency) are preserved across reconnections, so the adaptive timeout algorithm maintains continuity.
Adaptive timeout tiers
The default idle timeout is adaptive. The proxy classifies each backend by its usage frequency:
| Tier | Requests | Idle timeout |
| --- | --- | --- |
| Hot | > 20 | 5 min |
| Warm | 5–20 | 3 min |
| Cold | < 5 | 1 min |
Backends with fewer than 2 requests use the minimum timeout (default: 1 min).
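The tier selection above reduces to a threshold function. This Python sketch is illustrative, not the proxy's implementation; only the thresholds and timeouts come from the table, and the request-count window is an assumption:

```python
from datetime import timedelta

MIN_TIMEOUT = timedelta(minutes=1)  # default floor for barely-used backends

def idle_timeout(request_count: int) -> timedelta:
    """Pick an idle timeout from a backend's usage frequency."""
    if request_count < 2:
        return MIN_TIMEOUT               # too little data: use the minimum
    if request_count > 20:
        return timedelta(minutes=5)      # hot: keep alive longest
    if request_count >= 5:
        return timedelta(minutes=3)      # warm
    return timedelta(minutes=1)          # cold
```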
Configuring idle timeout
Per-backend in servers.json:
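A sketch of such an entry. The `idleTimeout` field name is an assumption (see the config file reference for the authoritative name); the values come from the table below:

```json
{
  "mcpServers": {
    "sentry": {
      "command": "sentry-mcp",
      "idleTimeout": "2m"
    }
  }
}
```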
"adaptive" (default)
Usage-based timeout with automatic tier assignment
"never"
Keep alive forever (old behavior)
"<duration>"
Fixed timeout (e.g. "2m", "30s", "1h")
See the config file reference for full details.
Why this matters
With 10 MCP servers configured and 3 Claude Code sessions open:
- Before: 30 backend processes running permanently (~3–4 GB RAM)
- After: only the backends you're actively using stay alive. Idle ones are shut down within 1–5 minutes and reconnected on demand.
Error handling
- Backend fails to connect — logged to stderr and skipped. Other backends still work.
- Backend disconnected (idle shutdown) — `tools/call` reconnects the backend transparently. If reconnection fails, it returns an MCP error with context.
- Backend disconnects mid-session — `tools/call` returns an MCP error identifying which backend failed.
- Unknown tool — returns a JSON-RPC error naming the unknown tool.
- Malformed JSON-RPC — HTTP mode returns a parse error with details; stdio mode silently ignores it.
The proxy never crashes because one backend is down. It degrades gracefully.
Authentication
The proxy supports server-side authentication for HTTP mode. Authentication is configured via serverAuth in servers.json.
No auth (default)
By default, no authentication is required. This is suitable for local development and stdio mode.
Bearer token auth
Static token-to-user mapping. Each token maps to a subject identity:
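A sketch of what such a mapping might look like. The field names under `serverAuth` are assumptions; only `serverAuth` itself and the token-to-subject idea come from this page:

```json
{
  "serverAuth": {
    "type": "bearer",
    "tokens": {
      "s3cr3t-alice": {"subject": "alice", "roles": ["dev"]}
    }
  }
}
```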
Clients pass the token in the Authorization header:
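For example (the token value is illustrative):

```sh
curl -s http://127.0.0.1:8080/mcp \
  -H 'Authorization: Bearer s3cr3t-alice' \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
```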
Forwarded user auth
Trusts a reverse proxy header (e.g. X-Forwarded-User). Only use behind a trusted proxy that sets this header:
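A sketch of the config. The field names are assumptions; the `X-Forwarded-User` header comes from this page:

```json
{
  "serverAuth": {
    "type": "forwarded",
    "header": "X-Forwarded-User"
  }
}
```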
Access control (ACL)
Control which users can access which tools using glob patterns:
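A sketch using the documented rule fields (`subjects`, `roles`, `tools`, `policy`); the top-level `acl` and `defaultPolicy` key names are assumptions:

```json
{
  "acl": {
    "defaultPolicy": "deny",
    "rules": [
      {"subjects": ["alice"], "roles": ["*"], "tools": ["sentry__*"], "policy": "deny"},
      {"subjects": ["*"], "roles": ["*"], "tools": ["*"], "policy": "allow"}
    ]
  }
}
```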
Rules are evaluated in order — first match wins. If no rule matches, the default policy applies.
ACL fields:
- `subjects` — list of user subjects to match (supports `*` wildcard)
- `roles` — list of roles to match (supports `*` wildcard)
- `tools` — list of tool name patterns (supports `*` prefix/suffix globs)
- `policy` — `allow` or `deny`
Both subjects and roles must match for a rule to apply. Empty subjects or roles means "match all".
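The first-match semantics described above can be sketched in Python. Function and key names are assumptions; the matching rules (first match wins, empty lists match all, `*` plus prefix/suffix globs) come from this page:

```python
def glob_match(pattern: str, value: str) -> bool:
    """'*' matches everything; 'x*'/'*x' are prefix/suffix globs."""
    if pattern == "*":
        return True
    if pattern.endswith("*"):
        return value.startswith(pattern[:-1])
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])
    return pattern == value

def check_access(rules, default_policy, subject, roles, tool):
    """Return True if access is allowed. First matching rule wins."""
    for rule in rules:
        # Empty (or absent) subjects/roles lists mean "match all".
        subj_ok = not rule.get("subjects") or any(
            glob_match(p, subject) for p in rule["subjects"])
        role_ok = not rule.get("roles") or any(
            glob_match(p, r) for p in rule["roles"] for r in roles)
        tool_ok = not rule.get("tools") or any(
            glob_match(p, tool) for p in rule["tools"])
        if subj_ok and role_ok and tool_ok:
            return rule["policy"] == "allow"
    return default_policy == "allow"
```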
Note: Stdio mode always uses anonymous identity. ACL rules still apply but the subject is always "anonymous".
Security considerations
Localhost-only by default
HTTP mode binds to 127.0.0.1 by default. This is safe for local development — only processes on the same machine can reach it.
Non-loopback binding
To expose the proxy on the network, you must explicitly opt in with the `--insecure` flag, which acknowledges the risk of plaintext HTTP on a network interface.
Production deployment
For production, put a reverse proxy in front:
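A minimal Caddy sketch (the hostname is illustrative; Caddy provisions TLS automatically for public hostnames):

```
mcp.example.com {
    reverse_proxy 127.0.0.1:8080
}
```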
This gives you:

- TLS termination
- Authentication (bearer tokens or forwarded user)
- Rate limiting
- Access logging
Token isolation
Backend tokens (Slack, GitHub, etc.) live in servers.json on the proxy server. They are never exposed to clients. Only tool results are forwarded.
Environment variables
All standard mcp env vars apply:
| Variable | Purpose |
| --- | --- |
| MCP_CONFIG_PATH | Custom config file path |
| MCP_TIMEOUT | Timeout in seconds for backend connections (default: 60) |
When to use each mode
| Scenario | Use |
| --- | --- |
| Single session, quick test | `mcp serve` (stdio) |
| Multiple sessions on same machine | `mcp serve --http` + SSE clients |
| Team sharing one MCP endpoint | `mcp serve --http` + SSE clients |
| CI/CD pipeline calling tools | `mcp serve --http` + curl |
| Production with auth & TLS | `mcp serve --http` + reverse proxy |
| Calling one tool from a script | `mcp <server> <tool>` directly |
If you regularly open multiple Claude Code sessions, use HTTP mode as a persistent service. Stdio mode spawns a full copy of every backend per session — HTTP mode shares one.