# Kubernetes

Reference manifests for running the MCP proxy in a Kubernetes cluster.

## Quick start

```bash
# 1. Edit the ConfigMap with your server configuration
vim deploy/kubernetes/configmap.yaml

# 2. Create the Secret with your API tokens (not included in kustomize)
kubectl create namespace mcp
kubectl -n mcp create secret generic mcp-secrets \
  --from-literal=sentry-token=sntrys_...

# 3. Apply the manifests
kubectl apply -k deploy/kubernetes/
```

This creates:

* `mcp` namespace
* `mcp-proxy` Deployment (1 replica)
* `mcp-proxy` ClusterIP Service on port 8080
* `mcp-config` ConfigMap with your server configuration

The Secret must be created separately (step 2) to avoid committing real tokens to the repo.

## Configuration

### Server config via ConfigMap

Edit `deploy/kubernetes/configmap.yaml` with your MCP servers:

```yaml
data:
  servers.json: |
    {
      "mcpServers": {
        "sentry": {
          "url": "https://mcp.sentry.dev/sse",
          "headers": {
            "Authorization": "Bearer ${SENTRY_TOKEN}"
          }
        },
        "grafana": {
          "url": "https://grafana.internal/api/mcp/sse",
          "headers": {
            "Authorization": "Bearer ${GRAFANA_TOKEN}"
          }
        }
      }
    }
```

The proxy resolves `${VAR}` placeholders from environment variables at startup. This keeps tokens out of the ConfigMap.

### Secrets for tokens

Create the secret with your real tokens:

```bash
kubectl -n mcp create secret generic mcp-secrets \
  --from-literal=sentry-token=sntrys_abc123 \
  --from-literal=grafana-token=glsa_xyz789
```

Then reference each token in the Deployment env:

```yaml
env:
  - name: SENTRY_TOKEN
    valueFrom:
      secretKeyRef:
        name: mcp-secrets
        key: sentry-token
  - name: GRAFANA_TOKEN
    valueFrom:
      secretKeyRef:
        name: mcp-secrets
        key: grafana-token
```

### OAuth tokens via Secret

For backends that use OAuth (Sentry, Honeycomb, GitHub Copilot, etc.), `mcp` keeps issued access/refresh tokens and dynamic-client registrations in `auth.json`. In a pod with a read-only root filesystem, mounting a writable `auth.json` is awkward — instead, inject the contents directly via `MCP_AUTH_CONFIG`.

**1. Run the OAuth flow once on a workstation:**

```bash
mcp add sentry --remote https://mcp.sentry.dev
# completes browser flow, writes ~/.config/mcp/auth.json
```

**2. Push the resulting file into a Secret:**

```bash
kubectl -n mcp create secret generic mcp-auth \
  --from-file=auth.json=$HOME/.config/mcp/auth.json
```

> Use a Secret (not a ConfigMap) — the file contains live access tokens.

**3. Inject it as an env var in the Deployment:**

```yaml
env:
  - name: MCP_AUTH_CONFIG
    valueFrom:
      secretKeyRef:
        name: mcp-auth
        key: auth.json
```

The proxy reads the inline JSON at startup and keeps it in an in-memory store. OAuth refresh and dynamic-client registration update the in-memory copy so refreshed tokens stay coherent across requests within the pod's lifetime. **Nothing is ever written to disk** — one `warn` log is emitted on the first save attempt. On pod restart, the Secret is re-read and in-memory mutations are discarded.

**Refresh strategy.** When refresh tokens are about to expire, rotate the Secret externally (sealed-secrets, external-secrets-operator, a CronJob that re-runs `mcp add`, etc.) and trigger a rolling restart so new pods pick it up: environment variables sourced from a Secret are read only when a container starts. The proxy itself is not designed to write back to the Secret.
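
The rotation can be scripted. The following is a minimal sketch, assuming the Secret name `mcp-auth` and the Deployment name `mcp-proxy` used on this page; the function name `rotate_mcp_auth` is hypothetical and not part of the `mcp` CLI.

```bash
# Hypothetical helper: replace the mcp-auth Secret from a freshly generated
# auth.json, then restart the Deployment so new pods read the new value.
rotate_mcp_auth() {
  local auth_file="${1:-$HOME/.config/mcp/auth.json}"

  # Render the Secret client-side and apply it, so the command works whether
  # or not the Secret already exists.
  kubectl -n mcp create secret generic mcp-auth \
    --from-file=auth.json="$auth_file" \
    --dry-run=client -o yaml | kubectl -n mcp apply -f -

  # Env vars sourced from a Secret are only read at container start,
  # so recreate the pods and wait for the rollout to finish.
  kubectl -n mcp rollout restart deployment/mcp-proxy
  kubectl -n mcp rollout status deployment/mcp-proxy --timeout=120s
}
```

Call it as `rotate_mcp_auth "$HOME/.config/mcp/auth.json"` after re-running the OAuth flow on the workstation.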

**Limitation.** Since `MCP_AUTH_CONFIG` is read-only, you cannot run `mcp add <server>` against a running pod and have the registration persist. Always pre-provision the auth store on a workstation or in a one-off Job, then ship it via the Secret.

### Pinning the image version

Edit `deploy/kubernetes/kustomization.yaml`:

```yaml
images:
  - name: ghcr.io/avelino/mcp
    newTag: "0.5.0"  # pin to a specific version
```
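
To confirm the pin before applying, the manifests can be rendered locally and the image references listed. This is a small sketch; the helper name `rendered_images` is hypothetical, and it relies only on `kubectl kustomize`, which prints the rendered YAML without contacting the cluster.

```bash
# Hypothetical helper: print every container image found in the rendered
# kustomize output for the given directory (default: deploy/kubernetes/).
rendered_images() {
  kubectl kustomize "${1:-deploy/kubernetes/}" |
    awk '{ for (i = 1; i < NF; i++) if ($i == "image:") print $(i + 1) }'
}
```

Assuming the Deployment references `ghcr.io/avelino/mcp`, running `rendered_images` with the pin above should list that image with the `0.5.0` tag.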

## Why `--insecure`?

The proxy refuses to bind non-loopback addresses without `--insecure`. In Kubernetes, the pod needs `0.0.0.0:8080` so the Service can route traffic to it. TLS termination happens at the Ingress or load balancer level, not at the proxy.

## Health probes

The proxy exposes `GET /health` returning:

```json
{
  "status": "ok",
  "backends_configured": 3,
  "backends_connected": 2,
  "active_clients": 5,
  "tools": 42,
  "version": "0.5.0"
}
```

### Why the probes are configured this way

**Startup probe** — gives 30s (`failureThreshold: 6 * periodSeconds: 5`) for the process to start and begin backend discovery. Discovery is async, so the proxy serves immediately but backends connect in the background.

**Liveness probe** — checks every 30s that the process responds to HTTP. Backend failures are **degraded state**, not a reason to restart the pod. If sentry is down, the proxy still serves grafana tools fine.

**Readiness probe** — checks every 10s. The proxy is ready to serve as soon as it starts because it lazy-connects backends on first request. A probe failure here means the process itself is unhealthy.

> **Do not** use `backends_connected > 0` as a readiness condition. The proxy is designed to start with zero connections and connect on demand.
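
For a manual check that mirrors what the probes see, the response can be summarised with `jq`. This is a sketch, assuming the JSON shape shown above and a local port-forward to the Service; the helper name `health_summary` is hypothetical.

```bash
# In another terminal: kubectl -n mcp port-forward svc/mcp-proxy 8080:8080

# Hypothetical helper: fetch /health and print a one-line summary.
# A partially connected proxy is reported as degraded, not as a failure,
# in line with the liveness/readiness reasoning above.
health_summary() {
  local body
  body="$(curl -fsS "${1:-http://127.0.0.1:8080}/health")" || return 1
  printf '%s\n' "$body" | jq -r '
    if .status != "ok" then
      "unhealthy: status=\(.status)"
    elif .backends_connected < .backends_configured then
      "degraded: \(.backends_connected)/\(.backends_configured) backends connected"
    else
      "ok: \(.backends_connected)/\(.backends_configured) backends connected"
    end'
}
```

With the sample response above (3 configured, 2 connected), the summary reads `degraded: 2/3 backends connected`, which is informational and should not fail a probe.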

## Application logs

Application logs (`tracing` events from the proxy itself — startup, backend discovery, request errors, OAuth flows) go to **stderr** by default and are captured by the kubelet, so `kubectl logs` just works. Two env vars tune this for production:

```yaml
env:
  # EnvFilter syntax — silence noisy library logs, keep mcp at debug.
  - name: MCP_LOG_LEVEL
    value: "mcp=debug,hyper=warn,reqwest=warn,h2=warn"
  # Newline-delimited JSON, one event per line. Drop in any log driver
  # (Loki, Datadog, CloudWatch, Fluentd) without parsing rules.
  - name: MCP_LOG_FORMAT
    value: "json"
```

`MCP_LOG_LEVEL` follows `tracing`'s [EnvFilter](https://docs.rs/tracing-subscriber/latest/tracing_subscriber/filter/struct.EnvFilter.html) syntax. Set the global level with `info`/`debug`/`trace`, or scope per module with comma-separated `target=level` directives. The example above keeps the proxy at `debug` while limiting `hyper`/`reqwest`/`h2` to warnings; their chatter otherwise dominates log volume.

`MCP_LOG_FORMAT=json` swaps the human-readable formatter for newline-delimited JSON. Each line is a complete event with `timestamp`, `level`, `message`, and structured fields. Pair with the audit stream below and you get a single tail-able log surface — every line is JSON, no mixed formats.

```bash
# Live tail, all events:
kubectl -n mcp logs deploy/mcp-proxy -f

# Only proxy errors via jq:
kubectl -n mcp logs deploy/mcp-proxy -f | jq -c 'select(.level=="ERROR")'
```

> **Why stderr, not stdout, for app logs?** In `mcp serve`, audit logs go to **stdout** by default (auto-promotion of `file` to `file+stdout`) and they're the structured product surface. Application/tracing logs go to **stderr** as the diagnostic surface. Kubernetes captures both in the same `kubectl logs` stream by default — split them downstream with `jq` (audit lines have `method`/`identity`; tracing lines have `level`/`target`).
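
The downstream split can be done with two `jq` filters. This is a sketch, assuming `MCP_LOG_FORMAT=json` and the distinguishing fields named above (`method` for audit lines, `level` for tracing lines); the helper names are hypothetical. Lines that are not valid JSON are skipped instead of aborting the stream.

```bash
# Hypothetical helpers: read a mixed log stream on stdin and keep one kind.
audit_only() {
  jq -cR 'fromjson? | select(type == "object" and has("method"))'
}

tracing_only() {
  jq -cR 'fromjson? | select(type == "object" and has("level"))'
}

# Usage against the live stream:
#   kubectl -n mcp logs deploy/mcp-proxy -f | audit_only
```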

## Audit logging

By default, audit logging is disabled (`MCP_AUDIT_ENABLED=false`) because the scratch-based image has no writable filesystem.

**Option A: Stream to stdout (no PVC needed)**

Set `MCP_AUDIT_OUTPUT=stdout` in the Deployment env. Audit entries are emitted as JSON lines to stdout and captured by your cluster's log pipeline (Fluentd, Loki, CloudWatch, etc.). No persistent storage required.

> If you want **both** PVC persistence (queryable via `mcp logs` in `kubectl exec`) and the cluster log pipeline, leave `MCP_AUDIT_OUTPUT` unset — `mcp serve --http` auto-promotes the default `file` to `file+stdout` for exactly this case. Setting `MCP_AUDIT_OUTPUT=file+stdout` explicitly also works (and is honored verbatim).

**Option B: Persist to a PVC**

1. Set `MCP_AUDIT_ENABLED=true` in the Deployment env
2. Mount persistent storage at `/data`:

```yaml
# In deployment.yaml, replace the emptyDir volume:
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mcp-audit-data
```

3. Uncomment `pvc.yaml` in `kustomization.yaml`:

```yaml
resources:
  # ...
  - pvc.yaml
```

4. Apply:

```bash
kubectl apply -k deploy/kubernetes/
```

Audit logs are written to `/data/audit/data` and indexed at `/data/audit/index` (controlled by `MCP_AUDIT_PATH` and `MCP_AUDIT_INDEX_PATH`).

## Security context

The manifests include a hardened security context:

```yaml
securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
```

The image is based on `scratch` — a static binary with no shell, no package manager, no libc. The process runs as UID 0 by default (the Dockerfile doesn't set `USER`), but `scratch` itself does not require root. The attack surface is minimal regardless of UID: no shell to exec into, no tools to exploit, read-only filesystem.

If your cluster policy requires `runAsNonRoot: true`, set a numeric `runAsUser` and ensure mounted volumes (`/tmp`, `/data`) are writable for that UID — either via `fsGroup` or an initContainer:

```yaml
securityContext:
  runAsNonRoot: true
  runAsUser: 65534
  runAsGroup: 65534
  fsGroup: 65534
```

## Scaling

Each replica is fully independent — own backend pool, own tool cache, own connections. There's no shared state, no leader election, no coordination needed.

Scaling to N replicas means:

* N independent connections to each backend
* N copies of the tool/resource/prompt cache in memory
* Clients are load-balanced across replicas by the Service

This is fine for most deployments. Be aware that stdio-based backends (which spawn child processes) will have N copies of each process running across the cluster.

## Graceful shutdown

When Kubernetes sends `SIGTERM` (during rolling updates or scale-down):

1. The proxy stops accepting new connections
2. In-flight requests finish normally
3. Backend clients are shut down in parallel (5s timeout each)
4. Total internal cleanup is bounded to ~10s

`terminationGracePeriodSeconds: 30` in the Deployment gives enough headroom. After 30s, Kubernetes sends `SIGKILL`.

## Environment variables reference

| Variable                    | Manifest value                                | Description                                                                                                                                                                                                                         |
| --------------------------- | --------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `MCP_SERVERS_CONFIG`        | (from ConfigMap)                              | Inline JSON config (highest priority)                                                                                                                                                                                               |
| `MCP_AUTH_CONFIG`           | (from Secret, optional)                       | Inline OAuth tokens (`auth.json`). Read-only — writes are no-ops.                                                                                                                                                                   |
| `MCP_PROXY_REQUEST_TIMEOUT` | `120` (app default)                           | Max seconds per JSON-RPC request                                                                                                                                                                                                    |
| `MCP_LOG_LEVEL`             | `info`                                        | `tracing` `EnvFilter` (e.g. `mcp=debug,hyper=warn,reqwest=warn,h2=warn`)                                                                                                                                                            |
| `MCP_LOG_FORMAT`            | `text`                                        | `json` for newline-delimited JSON to stderr (log drivers)                                                                                                                                                                           |
| `MCP_AUDIT_ENABLED`         | `false`                                       | Enable audit logging                                                                                                                                                                                                                |
| `MCP_AUDIT_OUTPUT`          | unset (→ `file+stdout` in `mcp serve --http`) | `stdout` for cluster log pipeline only, `file` for PVC only (setting this env var is treated as explicit and skips auto-promotion), `file+stdout` for both PVC and pipeline (the auto-promoted default in serve), `none` to disable |
| `MCP_AUDIT_PATH`            | `/data/audit/data`                            | Audit data directory (app default: `~/.config/mcp/db/data`)                                                                                                                                                                         |
| `MCP_AUDIT_INDEX_PATH`      | `/data/audit/index`                           | Audit index directory (app default: `~/.config/mcp/db/index`)                                                                                                                                                                       |
| `MCP_CLASSIFIER_CACHE`      | `/tmp/tool-classification.json`               | Tool classification cache (app default: `~/.config/mcp/tool-classification.json`)                                                                                                                                                   |

Full reference: [Environment variables](/reference/environment-variables.md)

## Exposing outside the cluster

The Service is `ClusterIP` by default. To expose externally, add an Ingress:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-proxy
  namespace: mcp
  annotations:
    # TLS termination at the ingress
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts: ["mcp.example.com"]
      secretName: mcp-tls
  rules:
    - host: mcp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: mcp-proxy
                port:
                  name: http
```

## Troubleshooting

### Pod starts but backends never connect

Check the ConfigMap config is valid JSON:

```bash
kubectl -n mcp get configmap mcp-config -o jsonpath='{.data.servers\.json}' | jq .
```

Check the proxy logs:

```bash
kubectl -n mcp logs deploy/mcp-proxy
```

Look for `[serve] discovering tools from ...` lines. If you see `failed to discover`, the backend URL or token is wrong.

### Health probe fails on startup

Increase the startup probe threshold:

```yaml
startupProbe:
  failureThreshold: 12  # 60s instead of 30s
  periodSeconds: 5
```

### Token not resolving

Ensure the Secret key matches what the Deployment env references, and that the `${VAR_NAME}` in the ConfigMap matches the env var name exactly. If a referenced env var is missing, the placeholder is replaced with an empty string silently — verify the resolved config by checking the proxy logs for authentication failures on backend connections.
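
Because missing variables resolve silently, it can help to compare the placeholders a config references with the env names the Deployment defines. This is a sketch in plain shell; the helper name `unresolved_placeholders` is hypothetical, it only approximates the proxy's `${VAR}` matching with a conservative identifier pattern, and it does not see variables injected through `envFrom`.

```bash
# Hypothetical helper: read a servers.json on stdin and print each ${NAME}
# placeholder that is not among the env names passed as arguments.
unresolved_placeholders() {
  local defined=" $* "
  grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' | sort -u |
    while IFS= read -r token; do
      name="${token#??}"   # strip the leading "${"
      name="${name%?}"     # strip the trailing "}"
      case "$defined" in
        *" $name "*) ;;                 # defined in the Deployment
        *) printf '%s\n' "$name" ;;     # referenced but not defined
      esac
    done
}

# Usage (names is deliberately unquoted so it splits into arguments):
#   names="$(kubectl -n mcp get deploy mcp-proxy \
#     -o jsonpath='{.spec.template.spec.containers[0].env[*].name}')"
#   kubectl -n mcp get configmap mcp-config \
#     -o jsonpath='{.data.servers\.json}' | unresolved_placeholders $names
```

Any name printed is a placeholder that would resolve to an empty string in the pod.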

### Read-only filesystem errors

If you see permission errors, make sure the `tmp` and `data` volumes are mounted. The scratch image has no writable paths without explicit volume mounts.

