2026-04-14 17:54:32 +02:00

Metrics Reference

All metrics are emitted via OpenTelemetry (OTLP gRPC) with cumulative temporality. OTel metric names use dots; the Prometheus name replaces dots with underscores and appends _total for counters. A unit suffix may also be appended, as in proxy_request_duration_ms_milliseconds.
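
The naming rule can be sketched in Python. This is a minimal illustration that reproduces the names listed in this reference, not the exporter's actual code: the function `to_prometheus_name` and the `UNIT_SUFFIXES` table are hypothetical, and the unit-suffix rule is an assumption inferred from the histogram names in the Histograms section.

```python
# Hypothetical helper: approximates how the OTel names in this reference
# map to their Prometheus names. Not the exporter's real implementation.

# Assumed unit-to-suffix mapping, inferred from the histogram names.
UNIT_SUFFIXES = {"ms": "milliseconds", "bytes": "bytes"}


def to_prometheus_name(otel_name: str, kind: str, unit: str = "") -> str:
    """Convert an OTel metric name to its Prometheus form."""
    # Dots are not used in the Prometheus names; they become underscores.
    name = otel_name.replace(".", "_")
    suffix = UNIT_SUFFIXES.get(unit, "")
    # Append the unit suffix only if the name does not already end with it.
    if suffix and not name.endswith("_" + suffix):
        name = f"{name}_{suffix}"
    # Counters additionally gain a _total suffix.
    if kind == "counter":
        name = f"{name}_total"
    return name


print(to_prometheus_name("proxy.request.count", "counter"))
print(to_prometheus_name("proxy.request.duration_ms", "histogram", "ms"))
print(to_prometheus_name("proxy.request.body_size_bytes", "histogram", "bytes"))
```

Checking a dashboard query against this rule is a quick way to catch a metric that was typed with its OTel name instead of its Prometheus name.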

Counters

| OTel Name | Prometheus Name | Attributes | Description |
|---|---|---|---|
| proxy.request.count | proxy_request_count_total | model, stream, status_code | Total proxied requests |
| proxy.tokens.input | proxy_tokens_input_total | model, credential | Input tokens consumed |
| proxy.tokens.output | proxy_tokens_output_total | model, credential | Output tokens consumed |
| proxy.upstream.errors | proxy_upstream_errors_total | error_type, credential, status_code | Upstream errors (connection failures, 4xx/5xx) |
| proxy.credential.cooldowns | proxy_credential_cooldowns_total | status_code | Credential cooldown activations (rate limited) |
| proxy.stream.requests | proxy_stream_requests_total | model | Streaming request count |

Histograms

| OTel Name | Prometheus Name | Unit | Attributes | Description |
|---|---|---|---|---|
| proxy.request.duration_ms | proxy_request_duration_ms_milliseconds | ms | model, stream, status_code | Request latency |
| proxy.request.body_size_bytes | proxy_request_body_size_bytes | bytes | model, stream | Request body size |

Gauges

| OTel Name | Prometheus Name | Attributes | Description |
|---|---|---|---|
| proxy.usage.utilization | proxy_usage_utilization | window | Current utilization % from Anthropic API (0-100) |
| proxy.usage.resets_at | proxy_usage_resets_at | window | Unix timestamp when the rate limit window resets |
| proxy.credential.active | proxy_credential_active | | Currently active (non-cooldown) credentials |

Window attribute values

  • 5h — 5-hour rolling window
  • 7d — 7-day rolling window
  • 7d_sonnet — 7-day Sonnet-specific window

Structured Logs (Loki)

Each completed request emits a structured log line via OTel LogBridge. Fields are stored as Loki stream labels/structured metadata.

| Field | Type | Description |
|---|---|---|
| input_tokens | int | Input tokens for this request |
| output_tokens | int | Output tokens for this request |
| model | string | Model used |
| latency_ms | float | Request latency in milliseconds |
| status | int | HTTP status code (non-stream only) |
| stream | bool | Whether this was a streaming request |

Log messages: "request completed" (non-stream), "stream completed" (stream).
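
As a rough illustration of the log shape described above, the following Python sketch builds the message and field set for one completed request. The function `build_completion_log` is hypothetical and is not the proxy's code. It assumes that "non-stream only" means the status field is omitted from stream logs.

```python
from typing import Any, Dict, Optional, Tuple


def build_completion_log(
    model: str,
    input_tokens: int,
    output_tokens: int,
    latency_ms: float,
    stream: bool,
    status: Optional[int] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return the (message, fields) pair for one completed request."""
    fields: Dict[str, Any] = {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "model": model,
        "latency_ms": latency_ms,
        "stream": stream,
    }
    if stream:
        # Assumption: stream logs carry no status field.
        return "stream completed", fields
    fields["status"] = status
    return "request completed", fields


# The model name here is a placeholder, not a value from this reference.
message, fields = build_completion_log("example-model", 120, 45, 812.5, False, 200)
print(message, fields)
```

Both messages contain the word "completed", so a single line filter on that word matches stream and non-stream requests alike.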

Per-window token counting via Loki

Window token totals are computed in Grafana using LogQL rather than tracked in memory by the proxy. The totals therefore survive process restarts and follow the exact Anthropic window boundaries.

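Grafana variables derive the window age, the number of seconds elapsed since the current window opened, from Prometheus. The window opened one window length (18000 s for 5h, 604800 s for 7d) before the reset timestamp, so the age is the current time minus the reset timestamp plus the window length: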

window_age_5h = time() - proxy_usage_resets_at{window="5h"} + 18000
window_age_7d = time() - proxy_usage_resets_at{window="7d"} + 604800
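
The same arithmetic as a small Python sketch, for checking values by hand. The names `WINDOW_SECONDS` and `window_age` are hypothetical. The `7d_sonnet` entry is an assumption: it is taken to have the same 7-day length as `7d`.

```python
import time
from typing import Optional

# Window lengths in seconds. The 7d_sonnet entry is assumed to match 7d.
WINDOW_SECONDS = {
    "5h": 5 * 3600,              # 18000
    "7d": 7 * 24 * 3600,         # 604800
    "7d_sonnet": 7 * 24 * 3600,  # 604800
}


def window_age(resets_at: float, window: str, now: Optional[float] = None) -> float:
    """Seconds elapsed since the current rate limit window opened."""
    if now is None:
        now = time.time()
    # The window opened one full window length before it resets.
    window_start = resets_at - WINDOW_SECONDS[window]
    return now - window_start


# A 5h window that resets one hour from "now" opened four hours ago.
now = 1_700_000_000
print(window_age(resets_at=now + 3600, window="5h", now=now))  # 14400
```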

LogQL queries sum token values from individual log events within the window:

sum(sum_over_time(
  {service_name="anthropic-proxy"} |= "completed"
    | unwrap output_tokens
    | __error__=""
  [${window_age_5h}s]
))
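
The query above sums output_tokens. A sketch of how the same query could be generated for either token field is shown below. The helper `token_sum_query` is hypothetical and only builds the query string. Running it against Loki, and the variable substitution Grafana performs, are outside this sketch.

```python
def token_sum_query(field: str, window_age_s: float,
                    service: str = "anthropic-proxy") -> str:
    """Build the per-window LogQL token sum for one token field."""
    if field not in ("input_tokens", "output_tokens"):
        raise ValueError(f"unsupported token field: {field}")
    # The range is written as whole seconds, so truncate any fraction.
    seconds = int(window_age_s)
    return (
        "sum(sum_over_time(\n"
        f'  {{service_name="{service}"}} |= "completed"\n'
        f"    | unwrap {field}\n"
        '    | __error__=""\n'
        f"  [{seconds}s]\n"
        "))"
    )


print(token_sum_query("input_tokens", 14400))
```

The `__error__=""` filter drops log lines whose field could not be unwrapped as a number, so a malformed line is skipped instead of failing the whole query.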

Annotations (Loki)

Upstream 429 (rate limit) responses are surfaced as Grafana annotations using this LogQL query:

{service_name="anthropic-proxy"} |= "upstream error" | json | status = "429"