docs: add metrics reference
@@ -0,0 +1,80 @@
# Metrics Reference

All metrics are emitted via OpenTelemetry (OTLP gRPC) with cumulative temporality. OTel metric names use dots; Prometheus converts the dots to underscores, appends `_total` to counters, and appends a unit suffix (such as `_milliseconds`) when the name does not already end with it.
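
The naming rule can be sketched in a few lines of Python. This is only an illustration that reproduces the Prometheus names listed in this document; `prometheus_name` and its `unit_suffixes` table are hypothetical helpers, not part of the proxy or of any exporter API.

```python
def prometheus_name(otel_name: str, kind: str, unit: str = "") -> str:
    """Approximate the Prometheus name for an OTel instrument (illustrative only)."""
    # Assumed mapping from the units used in this document to name suffixes.
    unit_suffixes = {"ms": "milliseconds", "bytes": "bytes"}

    # Dots become underscores.
    name = otel_name.replace(".", "_")

    # Append the unit suffix unless the name already ends with it.
    suffix = unit_suffixes.get(unit, "")
    if suffix and not name.endswith("_" + suffix):
        name = f"{name}_{suffix}"

    # Counters additionally get a `_total` suffix.
    if kind == "counter" and not name.endswith("_total"):
        name = f"{name}_total"
    return name
```

For example, `prometheus_name("proxy.request.duration_ms", "histogram", "ms")` yields `proxy_request_duration_ms_milliseconds`, while a name that already ends in `_bytes` is left without a second unit suffix.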

## Counters

| OTel Name | Prometheus Name | Attributes | Description |
|---|---|---|---|
| `proxy.request.count` | `proxy_request_count_total` | `model`, `stream`, `status_code` | Total proxied requests |
| `proxy.tokens.input` | `proxy_tokens_input_total` | `model`, `credential` | Input tokens consumed |
| `proxy.tokens.output` | `proxy_tokens_output_total` | `model`, `credential` | Output tokens consumed |
| `proxy.upstream.errors` | `proxy_upstream_errors_total` | `error_type`, `credential`, `status_code` | Upstream errors (connection failures, 4xx/5xx) |
| `proxy.credential.cooldowns` | `proxy_credential_cooldowns_total` | `status_code` | Credential cooldown activations (rate limited) |
| `proxy.stream.requests` | `proxy_stream_requests_total` | `model` | Streaming request count |

## Histograms

| OTel Name | Prometheus Name | Unit | Attributes | Description |
|---|---|---|---|---|
| `proxy.request.duration_ms` | `proxy_request_duration_ms_milliseconds` | ms | `model`, `stream`, `status_code` | Request latency |
| `proxy.request.body_size_bytes` | `proxy_request_body_size_bytes` | bytes | `model`, `stream` | Request body size |

## Gauges

| OTel Name | Prometheus Name | Attributes | Description |
|---|---|---|---|
| `proxy.usage.utilization` | `proxy_usage_utilization` | `window` | Current utilization percentage reported by the Anthropic API (0-100) |
| `proxy.usage.resets_at` | `proxy_usage_resets_at` | `window` | Unix timestamp at which the rate limit window resets |
| `proxy.credential.active` | `proxy_credential_active` | (none) | Currently active (non-cooldown) credentials |

### Window attribute values

- `5h`: 5-hour rolling window
- `7d`: 7-day rolling window
- `7d_sonnet`: 7-day Sonnet-specific window

## Structured Logs (Loki)

Each completed request emits a structured log line via OTel LogBridge. Fields are stored in Loki as stream labels or structured metadata.

| Field | Type | Description |
|---|---|---|
| `input_tokens` | int | Input tokens for this request |
| `output_tokens` | int | Output tokens for this request |
| `model` | string | Model used |
| `latency_ms` | float | Request latency in milliseconds |
| `status` | int | HTTP status code (non-streaming requests only) |
| `stream` | bool | Whether this was a streaming request |

Log messages: `"request completed"` (non-streaming), `"stream completed"` (streaming).
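
As a rough illustration of the record shape described above, the sketch below assembles the fields and message for one request as a plain dictionary. `build_log_record` is a hypothetical helper; the actual emission path through the OTel LogBridge is not shown here, and the dictionary layout is an assumption.

```python
from typing import Any, Dict, Optional


def build_log_record(
    model: str,
    input_tokens: int,
    output_tokens: int,
    latency_ms: float,
    stream: bool,
    status: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the documented fields for one completed request (illustrative)."""
    fields: Dict[str, Any] = {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "model": model,
        "latency_ms": latency_ms,
        "stream": stream,
    }
    # `status` is documented as present for non-streaming requests only.
    if not stream and status is not None:
        fields["status"] = status

    # The message text depends on whether the request was streamed.
    message = "stream completed" if stream else "request completed"
    return {"message": message, "fields": fields}
```

Calling `build_log_record("example-model", 10, 20, 12.5, stream=False, status=200)` produces a `"request completed"` record that carries `status`, while a streaming call omits that field.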

### Per-window token counting via Loki

Window token totals are computed in Grafana using LogQL rather than tracked in memory. This approach survives process restarts and uses the exact Anthropic window boundaries.

Grafana variables derive the window age from Prometheus. Each window starts one window length (18000 s for `5h`, 604800 s for `7d`) before `proxy_usage_resets_at`, so its age in seconds is:

```
window_age_5h = time() - proxy_usage_resets_at{window="5h"} + 18000
window_age_7d = time() - proxy_usage_resets_at{window="7d"} + 604800
```
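
The same arithmetic, written out in Python for checking values by hand. This is a minimal sketch: `window_age_seconds` and `WINDOW_SECONDS` are hypothetical names, and only the two windows used in the expressions above are covered.

```python
import time
from typing import Optional

# Window lengths in seconds, matching the constants in the expressions above.
WINDOW_SECONDS = {"5h": 5 * 3600, "7d": 7 * 86400}


def window_age_seconds(resets_at: float, window: str, now: Optional[float] = None) -> float:
    """Seconds elapsed since the start of the current rate limit window."""
    if now is None:
        now = time.time()
    # The window began one full window length before it resets.
    window_start = resets_at - WINDOW_SECONDS[window]
    return now - window_start
```

For instance, a `5h` window that resets one hour from now has an age of four hours, so `window_age_seconds(now + 3600, "5h", now)` returns `14400` for any `now`.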

LogQL queries sum token values from individual log events within the window. The `|= "completed"` line filter matches both log messages, and `__error__=""` drops lines whose `output_tokens` value fails to parse:

```logql
sum(sum_over_time(
  {service_name="anthropic-proxy"} |= "completed"
  | unwrap output_tokens
  | __error__=""
  [${window_age_5h}s]
))
```
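
When the query is needed outside Grafana, where the `${window_age_5h}` variable is not available, the same text can be assembled from a computed window age. The helper below is a hedged sketch: `window_token_query` is a hypothetical function, and it only builds the query string without contacting Loki.

```python
def window_token_query(field: str, window_age_s: float, service: str = "anthropic-proxy") -> str:
    """Build the per-window token sum query for one unwrapped field (illustrative)."""
    # LogQL range durations are written as whole seconds here; use at least 1s.
    seconds = max(1, int(window_age_s))
    return (
        "sum(sum_over_time(\n"
        f'  {{service_name="{service}"}} |= "completed"\n'
        f"  | unwrap {field}\n"
        '  | __error__=""\n'
        f"  [{seconds}s]\n"
        "))"
    )
```

Passing `"input_tokens"` instead of `"output_tokens"` gives the matching query for input tokens over the same window.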

## Annotations (Loki)

429 rate limit events are surfaced as Grafana annotations:

```logql
{service_name="anthropic-proxy"} |= "upstream error" | json | status = "429"
```