The inference plane is the customer-facing data plane. It listens on RELAY_PORT (default 8080) and speaks OpenAI- and Anthropic-shape wire protocols, plus Relay’s own provider-neutral canonical shape.

Authentication

Every inference request authenticates with a relay key as a bearer token:
Authorization: Bearer <relay-key>
Relay keys are minted in the admin UI or via the control plane (POST /relay-keys). The plaintext is shown exactly once, at creation; Relay stores only sha256(plaintext), so a lost key cannot be recovered and must be replaced with a new one.
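To illustrate what storing only sha256(plaintext) implies, here is a minimal sketch of hashing a key and checking a presented key against a stored digest. The function names, the UTF-8 encoding, the hex representation, and the constant-time comparison are assumptions made for illustration; the text above only states that the SHA-256 of the plaintext is stored.

```python
import hashlib
import hmac


def digest_relay_key(plaintext: str) -> str:
    """Return the SHA-256 digest of a key as lowercase hex (encoding is an assumption)."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def key_matches(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare it to the stored digest in constant time."""
    return hmac.compare_digest(digest_relay_key(presented), stored_digest)


# Hypothetical key value, used only for this example.
stored = digest_relay_key("example-relay-key")
print(key_matches("example-relay-key", stored))
print(key_matches("wrong-key", stored))
```

Because only the digest is kept, the plaintext cannot be read back from storage, which is why it is displayed a single time.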

Namespacing

Each vendor wire shape is served under its own path prefix. The bare /v1 namespace belongs to Relay’s canonical shape.
| Prefix | Wire shape |
| --- | --- |
| /openai/... | OpenAI Chat Completions, Responses, Embeddings |
| /anthropic/... | Anthropic Messages |
| /v1/... | Relay canonical (provider-neutral) |

Endpoints

| Method | Path | Shape |
| --- | --- | --- |
| POST | /openai/v1/chat/completions | OpenAI Chat Completions |
| POST | /openai/v1/responses | OpenAI Responses API |
| POST | /openai/v1/embeddings | OpenAI Embeddings (byte-pass) |
| POST | /anthropic/v1/messages | Anthropic Messages |
| POST | /v1/generate | Relay canonical request/response |
| GET | /v1/models | Models accessible to the relay key |
| GET | /healthz | Liveness + Postgres ping (public) |
| GET | /openapi.json | Generated typed OpenAPI spec |

Examples

curl http://localhost:8080/openai/v1/chat/completions \
  -H "Authorization: Bearer <relay-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "hello"}]
  }'
Responses stream byte-for-byte from the upstream when the inbound shape matches the upstream shape. Cross-shape requests (e.g. OpenAI in, Anthropic upstream) are translated per chunk through Relay’s canonical protocol.
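The same chat completions call can be made from Python with only the standard library. This is a sketch rather than an official client: the helper name build_chat_request is hypothetical, and the base URL assumes the default RELAY_PORT on localhost, as in the curl example.

```python
import json
import urllib.request


def build_chat_request(base_url: str, relay_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-shape chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/openai/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {relay_key}",  # relay key as bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("http://localhost:8080", "<relay-key>", "gpt-4o", "hello")
print(req.get_method(), req.full_url)
# To send it against a running Relay instance:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the example checkable without a running server.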

Models

The model field is resolved against your catalog. A model is reachable only if a policy grants it to your relay key and the model has an enabled host binding with a healthy host key. List what your key can reach:
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer <relay-key>"
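The reachability rule above is a conjunction of three conditions. The sketch below models it with a hypothetical ModelState record; the type and field names are illustrative only and do not reflect Relay's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class ModelState:
    """Hypothetical view of one catalog model as seen by a single relay key."""
    granted_by_policy: bool   # a policy grants the model to this relay key
    binding_enabled: bool     # the model has an enabled host binding
    host_key_healthy: bool    # that binding has a healthy host key


def is_reachable(state: ModelState) -> bool:
    """A model is reachable only when all three conditions hold."""
    return state.granted_by_policy and state.binding_enabled and state.host_key_healthy


def reachable_models(catalog: dict[str, ModelState]) -> list[str]:
    """Return the names of reachable models, sorted for stable output."""
    return sorted(name for name, state in catalog.items() if is_reachable(state))


# Made-up catalog entries for illustration.
catalog = {
    "model-a": ModelState(True, True, True),
    "model-b": ModelState(True, False, True),   # binding disabled
    "model-c": ModelState(False, True, True),   # not granted by policy
}
print(reachable_models(catalog))
```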

Errors

| Status | Meaning |
| --- | --- |
| 401 | Missing or invalid relay key. |
| 403 | Relay key’s policy does not grant the requested model. |
| 404 | Unknown model, or no enabled binding for it. |
| 429 | Rate limit reached (relay-side or upstream). |
| 502 / 503 | No healthy key in the pool, or upstream unreachable. See Troubleshooting. |
Relay does not fail over mid-stream. Failover across keys and hosts happens before the first byte reaches you. Once bytes flow, an upstream error is returned as-is.
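One way a client might act on these status codes is sketched below. The split into retryable and non-retryable statuses, the rule of never retrying once stream bytes have arrived, and the backoff parameters are all assumptions about reasonable client behavior, not behavior defined by Relay.

```python
# Assumed client policy: retry transient failures, never retry
# authentication, authorization, or unknown-model errors.
RETRYABLE_STATUSES = {429, 502, 503}


def should_retry(status: int, bytes_received: bool) -> bool:
    """Decide whether to resend a request after an error response."""
    if bytes_received:
        # The stream already started; Relay does not fail over mid-stream,
        # and replaying blindly could duplicate partial output.
        return False
    return status in RETRYABLE_STATUSES


def backoff_seconds(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff without jitter: base * 2**attempt, capped at cap."""
    return min(cap, base * (2 ** attempt))


for status in (401, 429, 503):
    print(status, should_retry(status, bytes_received=False))
```

A production client would usually add random jitter to the delay so that many clients do not retry in lockstep.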