Window token counts are now computed in Grafana using the @ modifier
with dashboard variables derived from proxy_usage_resets_at. This
eliminates in-memory state, file persistence, and restart sensitivity.
Removes: TokensIn/Out, RecordTokens, setResetTime, persist.go,
window_tokens observable gauges. -171 lines.
message_delta only contains output_tokens. Input tokens are in the
message_start event under message.usage.input_tokens. This was causing
input token counts to be near-zero for all streaming requests.
Token counts per rate limit window are now derived in Grafana via
increase(counter[5h/168h]) on the existing cumulative OTel counters.
Removes TokensIn/Out from Window, RecordTokens, setResetTime, and
the window_tokens observable gauges.
Poll /api/oauth/usage every 5 min and extract utilization from
/v1/messages response headers for real-time updates. Track proxy
tokens in/out per rate limit window (5h/7d), resetting on window
change. Expose as OTel observable gauges for Grafana dashboards.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensure anthropic-beta includes oauth-2025-04-20 when using OAuth tokens,
fixing 401 "OAuth authentication is currently not supported" errors.
Add debug-level logging for upstream requests/responses, sniffed headers,
and token refresh operations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sniffs a real Claude Code request on startup to capture exact HTTP headers,
then replays them for all proxied requests. Injects the billing header with
per-request SHA256 fingerprint into the system prompt. Uses utls with Chrome
TLS fingerprint to pass Cloudflare's bot detection on api.anthropic.com.
Supports both streaming (SSE) and non-streaming modes, round-robin credential
selection with automatic failover, and loading OAuth tokens from both
cli-proxy-api auth files and native ~/.claude/.credentials.json.