Metrics - Wyolet Relay

Relay exports Prometheus metrics for traffic, latency, its own overhead, and a few saturation signals. Every Relay-specific series lives in the relay namespace (names start with relay_); the standard Go runtime and process collectors are exported alongside them.

Scrape endpoint

Metrics are served at /metrics on the control listener (RELAY_CONTROL_PORT, default 8081) — at the listener root, not under the /api prefix.

curl http://localhost:8081/metrics

Prometheus scrape config

scrape_configs:
  - job_name: relay
    static_configs:
      - targets: ["relay:8081"]

Request flow

These are the core RED-style metrics (Rate, Errors, Duration), plus the split between Relay's own time and upstream time. All except relay_post_flight_seconds are labelled by source, the runner that handled the request: pipeline, proxy, ws, or batch.

Metric	Type	Labels	What it answers
`relay_requests_total`	counter	`source`, `status`	How much traffic and how many errors. `status` is a bounded class (`2xx`/`3xx`/`4xx`/`5xx`/`other`), never the raw code.
`relay_request_seconds`	histogram	`source`	End-to-end latency, from handler entry to response body closed.
`relay_overhead_seconds`	histogram	`source`	Relay’s own time = total handler time minus the upstream call.
`relay_admission_seconds`	histogram	`source`	Time from request accept to upstream handoff — auth + rate-limit reserve + key selection.
`relay_inflight_requests`	gauge	`source`	Currently-open requests. A streamed request counts until its body closes.
`relay_post_flight_seconds`	histogram	(none)	Duration of the post-flight observer fan-out per request.
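The bounded `status` label in the table above can be illustrated with a small mapping function. This is a minimal sketch, not Relay's actual code: the name statusClass is hypothetical, and treating 1xx codes and a missing response as `other` is an assumption.

```go
package main

import "fmt"

// statusClass maps a raw HTTP status code to one of five bounded label
// values, so the label's cardinality stays fixed no matter what codes
// upstreams return. Hypothetical helper for illustration only.
func statusClass(code int) string {
	switch {
	case code >= 200 && code <= 299:
		return "2xx"
	case code >= 300 && code <= 399:
		return "3xx"
	case code >= 400 && code <= 499:
		return "4xx"
	case code >= 500 && code <= 599:
		return "5xx"
	default:
		// Assumption: 1xx, 0 (no response written), and out-of-range
		// codes all fall into "other".
		return "other"
	}
}

func main() {
	for _, code := range []int{200, 304, 429, 503, 101, 0} {
		fmt.Printf("%d -> %s\n", code, statusClass(code))
	}
}
```

Keeping the label to a fixed set of values means the number of relay_requests_total series is bounded by the number of sources times five.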

relay_overhead_seconds is the metric to watch — it isolates the latency Relay itself adds. The performance SLO lives here: p99 overhead under 10 ms in a live distributed deployment. Its buckets are tuned tight (100 µs → 500 ms) for exactly this question.
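To make the SLO check concrete, here is a rough sketch of estimating p99 from cumulative histogram buckets with linear interpolation inside the matching bucket, similar in spirit to what Prometheus does for histogram quantiles. The bucket boundaries and counts below are invented for illustration; only the 100 µs to 500 ms range and the 10 ms target come from this page.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket is one cumulative histogram bucket: count is the number of
// observations less than or equal to upperBound (in seconds).
type bucket struct {
	upperBound float64
	count      float64
}

// quantile estimates the q-quantile (0 <= q <= 1) from cumulative buckets
// sorted by ascending upperBound, interpolating linearly within the
// bucket that contains the target rank.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) == 0 || q < 0 || q > 1 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	// First bucket whose cumulative count reaches the target rank.
	i := sort.Search(len(buckets), func(k int) bool { return buckets[k].count >= rank })
	if math.IsInf(buckets[i].upperBound, 1) {
		// The rank falls in the +Inf bucket: report the highest finite bound.
		if i == 0 {
			return math.NaN()
		}
		return buckets[i-1].upperBound
	}
	lower, below := 0.0, 0.0
	if i > 0 {
		lower, below = buckets[i-1].upperBound, buckets[i-1].count
	}
	inBucket := buckets[i].count - below
	if inBucket == 0 {
		return lower
	}
	return lower + (buckets[i].upperBound-lower)*(rank-below)/inBucket
}

// exampleBuckets returns made-up cumulative counts for 1000 requests.
// The boundaries are assumed, not Relay's real bucket layout.
func exampleBuckets() []bucket {
	return []bucket{
		{0.0001, 100}, {0.0005, 400}, {0.001, 700}, {0.005, 950},
		{0.01, 996}, {0.05, 999}, {0.5, 1000}, {math.Inf(1), 1000},
	}
}

func main() {
	const slo = 0.010 // 10 ms, the p99 overhead target
	p99 := quantile(0.99, exampleBuckets())
	fmt.Printf("estimated p99 overhead: %.2f ms, within SLO: %v\n", p99*1000, p99 < slo)
}
```

The estimate can only be as precise as the bucket that contains the target rank, which is why tight buckets around the SLO threshold matter.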

relay_overhead_seconds and relay_admission_seconds are only observed when the request actually reached upstream. A request rejected before handoff (auth failure, rate-limited, no healthy key) is counted in relay_requests_total but contributes no overhead/admission sample — there’s no meaningful split to record.

Health & saturation

Leading indicators for the two failures that hurt silently: dropped background records, and provider keys dying off.

Metric	Type	Labels	What it answers
`relay_records_lost_total`	counter	`kind`	Background records (`usage` / `payload`) dropped because a bounded emitter queue was full. A drop you can’t see is a billing or audit hole — so it’s counted.
`relay_emit_queue_depth`	gauge	`kind`	Events waiting in a bounded emitter queue (`usage` / `payload`) — the leading signal for the drops above, before they happen.
`relay_provider_keys_down_total`	counter	`reason`	Times a pooled provider key was put into cooldown by a failure. `reason` is the failure class, e.g. `auth`, `rate_limit`, `server_error`, `network`, `local_rl`.
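The relationship between queue depth and lost records can be sketched with a bounded channel and a non-blocking send. This is an illustrative model, not Relay's emitter: the emitter type, its methods, and the capacity are all hypothetical.

```go
package main

import "fmt"

// emitter models a bounded background queue. lost stands in for
// relay_records_lost_total and depth() for relay_emit_queue_depth.
// The sketch is single-goroutine; a concurrent version would need an
// atomic counter for lost.
type emitter struct {
	queue chan string
	lost  int
}

func newEmitter(capacity int) *emitter {
	return &emitter{queue: make(chan string, capacity)}
}

// emit enqueues a record without blocking the request path. If the
// queue is full, the record is dropped and the drop is counted.
func (e *emitter) emit(record string) bool {
	select {
	case e.queue <- record:
		return true
	default:
		e.lost++
		return false
	}
}

// depth reports how many records are waiting in the queue.
func (e *emitter) depth() int { return len(e.queue) }

func main() {
	e := newEmitter(2) // tiny capacity so the drop is easy to see
	for i := 1; i <= 3; i++ {
		ok := e.emit(fmt.Sprintf("usage-%d", i))
		fmt.Printf("emit %d: accepted=%v depth=%d lost=%d\n", i, ok, e.depth(), e.lost)
	}
}
```

In this model the depth climbs to capacity before the first drop occurs, which is why the depth gauge works as an early warning for the loss counter.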

relay_provider_keys_down_total is a counter of cooldown transitions, not a gauge of “keys down right now.” Breaker state lives in shared kv, so a per-pod gauge would be inconsistent across the fleet; trip counts, by contrast, sum cleanly. Watch its rate, not its absolute value.

Standard collectors

The default Go and process collectors are registered too, so you also get the usual go_* (goroutines, GC, memory) and process_* (CPU, FDs, resident memory) series without any extra configuration.
