Metrics Reference
All metrics are emitted via OpenTelemetry (OTLP gRPC) with cumulative temporality. OTel metric names use dots; Prometheus converts them to underscores and appends _total for counters.
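The renaming rule above can be sketched in a few lines of Python. This is only an illustration of the two rules stated here (dots become underscores, counters gain `_total`); the function name `to_prometheus_name` and the `kind` labels are hypothetical, and unit suffixes such as the one visible on the duration histogram are deliberately not modeled.

```python
def to_prometheus_name(otel_name: str, kind: str) -> str:
    """Approximate the OTel-to-Prometheus renaming described in this section.

    `kind` is a hypothetical label for this sketch: "counter", "histogram",
    or "gauge". Unit suffixes added by the exporter are not handled here.
    """
    # Prometheus metric names use underscores where OTel names use dots.
    name = otel_name.replace(".", "_")
    # Counters get a _total suffix unless the name already ends with it.
    if kind == "counter" and not name.endswith("_total"):
        name += "_total"
    return name


if __name__ == "__main__":
    print(to_prometheus_name("proxy.request.count", "counter"))
    print(to_prometheus_name("proxy.usage.utilization", "gauge"))
```

A helper like this can be handy when writing dashboards from the OTel names, but the tables below remain the reference for the exact Prometheus names.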
Counters
| OTel Name | Prometheus Name | Attributes | Description |
|---|---|---|---|
| proxy.request.count | proxy_request_count_total | model, stream, status_code | Total proxied requests |
| proxy.tokens.input | proxy_tokens_input_total | model, credential | Input tokens consumed |
| proxy.tokens.output | proxy_tokens_output_total | model, credential | Output tokens consumed |
| proxy.upstream.errors | proxy_upstream_errors_total | error_type, credential, status_code | Upstream errors (connection failures, 4xx/5xx) |
| proxy.credential.cooldowns | proxy_credential_cooldowns_total | status_code | Credential cooldown activations (rate limited) |
| proxy.stream.requests | proxy_stream_requests_total | model | Streaming request count |
Histograms
| OTel Name | Prometheus Name | Unit | Attributes | Description |
|---|---|---|---|---|
| proxy.request.duration_ms | proxy_request_duration_ms_milliseconds | ms | model, stream, status_code | Request latency |
| proxy.request.body_size_bytes | proxy_request_body_size_bytes | bytes | model, stream | Request body size |
Gauges
| OTel Name | Prometheus Name | Attributes | Description |
|---|---|---|---|
| proxy.usage.utilization | proxy_usage_utilization | window | Current utilization % from Anthropic API (0-100) |
| proxy.usage.resets_at | proxy_usage_resets_at | window | Unix timestamp when the rate limit window resets |
| proxy.credential.active | proxy_credential_active | (none) | Currently active (non-cooldown) credentials |
Window attribute values
- 5h: 5-hour rolling window
- 7d: 7-day rolling window
- 7d_sonnet: 7-day Sonnet-specific window
Structured Logs (Loki)
Each completed request emits a structured log line via OTel LogBridge. Fields are stored as Loki stream labels/structured metadata.
| Field | Type | Description |
|---|---|---|
| input_tokens | int | Input tokens for this request |
| output_tokens | int | Output tokens for this request |
| model | string | Model used |
| latency_ms | float | Request latency in milliseconds |
| status | int | HTTP status code (non-stream only) |
| stream | bool | Whether this was a streaming request |
Log messages: "request completed" (non-stream), "stream completed" (stream).
Per-window token counting via Loki
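Window token totals are computed in Grafana with LogQL rather than tracked in memory by the proxy. Because the totals are derived from stored log events, they survive process restarts, and because the query range is derived from proxy_usage_resets_at, they follow the exact Anthropic window boundaries.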
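Grafana variables derive the window age in seconds from the Prometheus gauge proxy_usage_resets_at. The constants 18000 and 604800 are the lengths of the 5-hour and 7-day windows in seconds: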
window_age_5h = time() - proxy_usage_resets_at{window="5h"} + 18000
window_age_7d = time() - proxy_usage_resets_at{window="7d"} + 604800
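The arithmetic behind these two variables can be checked with a small Python sketch. It assumes the window starts exactly one window length before the reset timestamp, which is what the expressions above imply; the names `WINDOW_SECONDS` and `window_age_seconds` are hypothetical and exist only for this illustration.

```python
# Window lengths in seconds, matching the constants in the Grafana variables.
WINDOW_SECONDS = {
    "5h": 5 * 3600,        # 18000
    "7d": 7 * 24 * 3600,   # 604800
}


def window_age_seconds(now: float, resets_at: float, window: str) -> float:
    """Return how long the current window has been open, in seconds.

    Equivalent to: time() - proxy_usage_resets_at{window=...} + window_length
    """
    # Assumption: the window opened one full window length before it resets.
    window_start = resets_at - WINDOW_SECONDS[window]
    return now - window_start


if __name__ == "__main__":
    # A 5h window that resets at t=100000, observed at t=90000,
    # opened at t=82000 and has therefore been open for 8000 seconds.
    print(window_age_seconds(now=90_000, resets_at=100_000, window="5h"))
```

The result is the value that ends up as the LogQL range, so the query covers only events since the current window opened.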
LogQL queries sum token values from individual log events within the window:
sum(sum_over_time(
{service_name="anthropic-proxy"} |= "completed"
| unwrap output_tokens
| __error__=""
[${window_age_5h}s]
))
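To make the query's behaviour concrete, here is a rough Python equivalent operating on an in-memory list of log events. It is a sketch, not how Loki evaluates the query: the event shape (`timestamp`, `message`, and a token field in a dict) and the function name `sum_tokens_in_window` are assumptions for illustration, and the exact treatment of the range boundary in Loki may differ.

```python
from typing import Any, Iterable, Mapping


def sum_tokens_in_window(
    events: Iterable[Mapping[str, Any]],
    now: float,
    window_age: float,
    field: str = "output_tokens",
) -> float:
    """Sum `field` over completed-request events in the last `window_age` seconds."""
    cutoff = now - window_age
    total = 0.0
    for event in events:
        # Mirrors the |= "completed" line filter.
        if "completed" not in event.get("message", ""):
            continue
        # Mirrors the [${window_age_5h}s] range selector.
        if not (cutoff <= event["timestamp"] <= now):
            continue
        # Mirrors unwrap plus __error__="": skip events whose value
        # is missing or cannot be read as a number.
        try:
            total += float(event[field])
        except (KeyError, TypeError, ValueError):
            continue
    return total
```

Passing `field="input_tokens"` gives the corresponding input total, just as swapping the unwrapped label would in the LogQL query.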
Annotations (Loki)
Rate limit events (HTTP 429) are surfaced as Grafana annotations using this LogQL query:
{service_name="anthropic-proxy"} |= "upstream error" | json | status = "429"