Normal Relay routing maps a relay key to a pool of your provider keys and picks one for each request. Proxy mode is the other way around: the caller brings their own upstream provider key, and Relay forwards the request to the provider verbatim — no policy resolution, no key pool, no model rewrite, no translation. It’s useful when you want Relay in front of provider traffic for rate limiting, observability, and a single endpoint, but the credential belongs to the caller rather than to a Relay-managed pool.

Normal mode vs proxy mode

| | Normal mode | Proxy mode |
|---|---|---|
| Upstream credential | Drawn from your key pool | The caller’s, forwarded as-is |
| Routing | Full policy → model → host binding | Host pinned by header (or model lookup) |
| Model rewrite | Yes | No (body forwarded verbatim) |
| Translation | Cross-shape supported | None (byte-for-byte forward) |
| Key pooling & failover | Yes | No |

Enabling it

Proxy mode is off by default. Turn it on through the control plane:
curl -X PUT http://localhost:8081/settings/proxy-mode \
  -H "Authorization: Bearer $RELAY_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"enabled": true, "allowUnauthenticated": false}'
| Setting | Default | Effect |
|---|---|---|
| enabled | false | Master switch. While false, every proxy request returns 403. |
| allowUnauthenticated | false | Whether anonymous proxy requests (no relay key) are allowed. |
The change hot-reloads across all pods within ~1s — no restart.
Proxy mode is gated by this global switch today. A per-relay-key passthroughAllowed flag exists for future per-key gating but is not yet enforced — for now, enabled is what controls access.
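For callers who script the control plane instead of using curl, the same call can be sketched in Python with only the standard library. The endpoint, port, and JSON fields are taken from the curl example above; the function name `build_proxy_mode_request` and the `CONTROL_PLANE_URL` constant are illustrative, not part of Relay.

```python
import json
import urllib.request

# Assumption: control-plane address copied from the curl example above.
CONTROL_PLANE_URL = "http://localhost:8081"


def build_proxy_mode_request(enabled, allow_unauthenticated, admin_token):
    """Build (but do not send) the PUT /settings/proxy-mode request.

    Hypothetical helper; only the URL path, headers, and JSON fields
    come from the documentation.
    """
    body = json.dumps(
        {"enabled": enabled, "allowUnauthenticated": allow_unauthenticated}
    ).encode("utf-8")
    return urllib.request.Request(
        CONTROL_PLANE_URL + "/settings/proxy-mode",
        data=body,
        method="PUT",
        headers={
            "Authorization": "Bearer " + admin_token,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. Building and sending are kept separate so the request can be inspected or logged first.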

Making a proxy request

Send X-WR-Proxy-Mode: Proxy, put your upstream key in Authorization, and pin the upstream host with X-WR-Upstream-Host:
curl http://localhost:8080/anthropic/v1/messages \
  -H "X-WR-Proxy-Mode: Proxy" \
  -H "X-WR-API-Key: <your-relay-key>" \
  -H "X-WR-Upstream-Host: anthropic" \
  -H "Authorization: Bearer sk-ant-your-own-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":256,"messages":[{"role":"user","content":"hello"}]}'
Relay strips its own headers (everything matching X-WR-*, which includes X-WR-API-Key) and forwards the body unchanged, with your Authorization header reattached as the upstream credential.
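The header handling described above can be sketched as two small Python functions: one assembles the headers a client sends, the other models which of them survive once Relay strips its own. Both function names are hypothetical, and `headers_seen_upstream` is only an illustration of the stripping rule, not Relay's implementation; Relay may add or adjust other headers that this sketch does not model.

```python
import json


def build_proxy_request(upstream_key, body, relay_key=None, upstream_host=None):
    """Return (headers, payload_bytes) for a proxy-mode request.

    Hypothetical client-side helper based on the curl example above.
    """
    headers = {
        "X-WR-Proxy-Mode": "Proxy",
        "Authorization": "Bearer " + upstream_key,  # the caller's own provider key
        "Content-Type": "application/json",
    }
    if relay_key is not None:
        headers["X-WR-API-Key"] = relay_key
    if upstream_host is not None:
        headers["X-WR-Upstream-Host"] = upstream_host
    return headers, json.dumps(body).encode("utf-8")


def headers_seen_upstream(headers):
    """Illustrative model only: drop every X-WR-* header, keep the rest."""
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith("x-wr-")
    }
```

With the headers from the curl example, `headers_seen_upstream` would leave only Authorization and Content-Type, which is the point of the sketch: the relay key never reaches the provider, while the caller's provider key does.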

Authenticated vs anonymous

  • Authenticated proxy — include your relay key in X-WR-API-Key. The request is attributed to that key and rate-limited under the inference-api-proxy pool. If you omit X-WR-Upstream-Host, Relay reads the model from the body and resolves the host from your relay key’s policy.
  • Anonymous proxy — omit the relay key. Allowed only when allowUnauthenticated is on, rate-limited per client IP under inference-api-proxy-anonymous, and X-WR-Upstream-Host is required (there’s no policy to infer the host from).
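The two cases above amount to a small decision table, sketched here in Python as a reading aid. The names `ProxyDecision` and `decide` are hypothetical, and the order in which the anonymous checks run (401 before 400) is an assumption; the documentation only states which conditions produce which outcome. The global `enabled` switch is left out and assumed to be on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProxyDecision:
    status: Optional[int]       # None means "accepted, forwarded upstream"
    pool: Optional[str]         # rate-limit pool the request is counted under
    host_source: Optional[str]  # where the upstream host comes from


def decide(relay_key, upstream_host, allow_unauthenticated):
    """Illustrative model of the authenticated vs anonymous rules."""
    if relay_key:
        # Authenticated: the host header is optional, with a fallback
        # to resolving the host from the relay key's policy by model.
        source = "header" if upstream_host else "policy lookup by model"
        return ProxyDecision(None, "inference-api-proxy", source)
    if not allow_unauthenticated:
        return ProxyDecision(401, None, None)
    if not upstream_host:
        # Anonymous requests have no policy to infer the host from.
        return ProxyDecision(400, None, None)
    return ProxyDecision(None, "inference-api-proxy-anonymous", "header")
```

For example, `decide(None, None, True)` yields status 400 because an anonymous request must name its host, while `decide("rk", None, False)` is accepted and falls back to the policy lookup.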

Discovering upstream hosts

List the host slugs you can target with X-WR-Upstream-Host:
curl http://localhost:8080/v1/proxy/hosts \
  -H "X-WR-API-Key: <your-relay-key>"
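A Python equivalent of the lookup, as a minimal sketch. The URL and header come from the curl example; the function names and the `RELAY_URL` constant are illustrative. The response format is not described here, so the sketch returns the raw body as text rather than assuming a schema.

```python
import urllib.request

# Assumption: data-plane address copied from the curl example above.
RELAY_URL = "http://localhost:8080"


def build_hosts_request(relay_key):
    """Build (but do not send) the GET /v1/proxy/hosts request."""
    return urllib.request.Request(
        RELAY_URL + "/v1/proxy/hosts",
        method="GET",
        headers={"X-WR-API-Key": relay_key},
    )


def fetch_hosts_raw(relay_key, timeout=10):
    """Send the request and return the body undecoded beyond UTF-8.

    The response schema is not documented here, so no parsing is attempted.
    """
    with urllib.request.urlopen(build_hosts_request(relay_key), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```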

Errors

| Status | Meaning |
|---|---|
| 400 | X-WR-Proxy-Mode has an invalid value, Authorization (upstream key) is missing, or X-WR-Upstream-Host is unknown / required-but-absent. |
| 401 | Anonymous proxy request while allowUnauthenticated is off. |
| 403 | Proxy mode is disabled (enabled: false). |
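A client can turn these statuses into actionable hints. The sketch below is one hypothetical way to do it; `PROXY_ERROR_HINTS` and `explain_proxy_error` are illustrative names. The same status codes could in principle also come back from the upstream provider, and this page does not say how to tell the two apart, so the hints are phrased as things to check rather than as diagnoses.

```python
# Hints derived from the error table above; the wording is illustrative.
PROXY_ERROR_HINTS = {
    400: (
        "Check that X-WR-Proxy-Mode is valid, Authorization carries the "
        "upstream key, and X-WR-Upstream-Host is a known host (it is "
        "required for anonymous requests)."
    ),
    401: "Anonymous proxying may be off: send X-WR-API-Key or enable allowUnauthenticated.",
    403: "Proxy mode may be disabled: set enabled to true via the control plane.",
}


def explain_proxy_error(status):
    """Return a hint for a proxy-mode status code, or None if not listed."""
    return PROXY_ERROR_HINTS.get(status)
```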
Proxy mode does no translation and no model rewriting — the body is forwarded exactly as sent. Send the request in the wire shape the target host expects.