Metrics Reference

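All metrics are emitted via OpenTelemetry (OTLP gRPC) with cumulative temporality. OTel metric names use dots; on the Prometheus side the dots become underscores and counter names gain a `_total` suffix. A unit suffix may also be appended, as in `proxy_request_duration_ms_milliseconds`.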
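
The naming rule above can be sketched in a few lines of Python. This is a simplified illustration, not the exporter's actual code: `to_prometheus_name` is a hypothetical helper, and it deliberately ignores unit suffixes such as the `_milliseconds` on the duration histogram listed below.

```python
def to_prometheus_name(otel_name: str, is_counter: bool = False) -> str:
    """Approximate the OTel-to-Prometheus name mapping (hypothetical helper)."""
    # Dots in the OTel name become underscores.
    name = otel_name.replace(".", "_")
    # Counters get a _total suffix unless the name already ends with it.
    if is_counter and not name.endswith("_total"):
        name += "_total"
    return name


print(to_prometheus_name("proxy.request.count", is_counter=True))  # proxy_request_count_total
print(to_prometheus_name("proxy.credential.active"))               # proxy_credential_active
```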

Counters

OTel Name	Prometheus Name	Attributes	Description
`proxy.request.count`	`proxy_request_count_total`	`model`, `stream`, `status_code`	Total proxied requests
`proxy.tokens.input`	`proxy_tokens_input_total`	`model`, `credential`	Input tokens consumed
`proxy.tokens.output`	`proxy_tokens_output_total`	`model`, `credential`	Output tokens consumed
`proxy.upstream.errors`	`proxy_upstream_errors_total`	`error_type`, `credential`, `status_code`	Upstream errors (connection failures, 4xx/5xx)
`proxy.credential.cooldowns`	`proxy_credential_cooldowns_total`	`status_code`	Credential cooldown activations (rate limited)
`proxy.stream.requests`	`proxy_stream_requests_total`	`model`	Streaming request count

Histograms

OTel Name	Prometheus Name	Unit	Attributes	Description
`proxy.request.duration_ms`	`proxy_request_duration_ms_milliseconds`	ms	`model`, `stream`, `status_code`	Request latency
`proxy.request.body_size_bytes`	`proxy_request_body_size_bytes`	bytes	`model`, `stream`	Request body size

Gauges

OTel Name	Prometheus Name	Attributes	Description
`proxy.usage.utilization`	`proxy_usage_utilization`	`window`	Current utilization % from Anthropic API (0-100)
`proxy.usage.resets_at`	`proxy_usage_resets_at`	`window`	Unix timestamp when the rate limit window resets
`proxy.credential.active`	`proxy_credential_active`	—	Currently active (non-cooldown) credentials

Window attribute values

5h — 5-hour rolling window
7d — 7-day rolling window
7d_sonnet — 7-day Sonnet-specific window

Structured Logs (Loki)

Each completed request emits a structured log line via OTel LogBridge. Fields are stored as Loki stream labels/structured metadata.

Field	Type	Description
`input_tokens`	int	Input tokens for this request
`output_tokens`	int	Output tokens for this request
`model`	string	Model used
`latency_ms`	float	Request latency in milliseconds
`status`	int	HTTP status code (non-stream only)
`stream`	bool	Whether this was a streaming request

Log messages: "request completed" (non-stream), "stream completed" (stream).

Per-window token counting via Loki

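Window token totals are computed in Grafana using LogQL rather than tracked in memory. Because they are derived from stored logs, the totals survive process restarts and follow the exact Anthropic window boundaries.

Grafana variables derive the window age from Prometheus. A window starts one window length before `proxy_usage_resets_at`, so its age in seconds is the current time minus the reset time plus the window length (18000 s for 5 hours, 604800 s for 7 days):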

window_age_5h = time() - proxy_usage_resets_at{window="5h"} + 18000
window_age_7d = time() - proxy_usage_resets_at{window="7d"} + 604800
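
The same arithmetic, written out as a minimal Python sketch. The function name `window_age_seconds` and the `WINDOW_SECONDS` table are assumptions introduced for illustration; only the formula itself comes from the expressions above.

```python
import time
from typing import Optional

# Window lengths in seconds, matching the constants in the Grafana variables.
WINDOW_SECONDS = {"5h": 5 * 3600, "7d": 7 * 24 * 3600}  # 18000 and 604800


def window_age_seconds(resets_at: float, window: str, now: Optional[float] = None) -> float:
    """Seconds elapsed since the current window began (hypothetical helper)."""
    if now is None:
        now = time.time()
    # The window began one full window length before it resets.
    window_start = resets_at - WINDOW_SECONDS[window]
    # Equivalent to: now - resets_at + window length.
    return now - window_start


# A 5h window that resets at t=1018000 started at t=1000000,
# so at t=1000600 it is 600 seconds old.
print(window_age_seconds(resets_at=1_018_000, window="5h", now=1_000_600))  # 600
```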

LogQL queries sum token values from individual log events within the window:

sum(sum_over_time(
  {service_name="anthropic-proxy"} |= "completed"
    | unwrap output_tokens
    | __error__=""
  [${window_age_5h}s]
))
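
To make explicit what this query computes, the sketch below reproduces the sum over an in-memory list of log records. It is only an illustration: the record layout (`timestamp`, `message`, `output_tokens`) and the function name are assumptions, and in practice Loki evaluates the query, with boundary handling that may differ from this simplified version.

```python
from typing import Any, Iterable, Mapping


def sum_output_tokens(records: Iterable[Mapping[str, Any]], now: float, window_age: float) -> float:
    """Sum output_tokens over "completed" records inside the window (hypothetical helper)."""
    total = 0.0
    for rec in records:
        # Line filter: |= "completed" matches both completion messages.
        if "completed" not in rec["message"]:
            continue
        # Range selector: keep only records within the last window_age seconds.
        if not (now - window_age < rec["timestamp"] <= now):
            continue
        try:
            # unwrap output_tokens: treat the field as a numeric sample.
            total += float(rec["output_tokens"])
        except (KeyError, TypeError, ValueError):
            # Like __error__="", skip records whose value cannot be unwrapped.
            continue
    return total


records = [
    {"timestamp": 100.0, "message": "request completed", "output_tokens": 10},
    {"timestamp": 200.0, "message": "stream completed", "output_tokens": "25"},
    {"timestamp": 250.0, "message": "upstream error", "output_tokens": 99},
    {"timestamp": 10.0, "message": "request completed", "output_tokens": 7},
]
print(sum_output_tokens(records, now=300.0, window_age=250.0))  # 35.0
```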

Annotations (Loki)

HTTP 429 rate-limit events are surfaced as Grafana annotations:

{service_name="anthropic-proxy"} |= "upstream error" | json | status = "429"
