# Music Metadata API - API Reference

## API Overview

Music Metadata API exposes a RESTful HTTP API with 11 endpoints for querying music metadata. The API is fully documented with OpenAPI 3.1 and includes an interactive Swagger UI.

- **Base URL:** `http://localhost:8080` (configurable via `-addr` flag)
- **Content-Type:** `application/json`
- **Authentication:** None (public API)
- **CORS:** Not supported
- **Rate Limiting:** 100 requests/second, 200 burst (per-IP)

## Endpoints

### Batch Operations

#### POST /batch/lookup

Retrieve multiple tracks, albums, and artists in a single request.

**Request Body:**

```json
{
  "tracks": ["track_id_1", "track_id_2"],
  "artists": ["artist_id_1", "artist_id_2"],
  "albums": ["album_id_1", "album_id_2"],
  "isrcs": ["USRC12345678", "GBUM71234567"]
}
```

**Constraints:**

- Maximum 400 items total across all arrays
- All fields optional (at least one required)
- Duplicate IDs allowed (deduplicated in response)

**Response:**

```json
{
  "tracks": {
    "track_id_1": {
      "id": "track_id_1",
      "name": "Song Title",
      "isrc": "USRC12345678",
      "duration_ms": 240000,
      "explicit": false,
      "track_number": 1,
      "disc_number": 1,
      "popularity": 85,
      "preview_url": "https://...",
      "album": { /* Album object */ },
      "artists": [ /* Artist objects */ ],
      "original_title": "Original Title",
      "version_title": "Radio Edit",
      "has_lyrics": true,
      "languages": ["en", "es"],
      "artist_roles": {
        "artist_id_1": ["performer", "composer"]
      }
    }
  },
  "artists": {
    "artist_id_1": { /* Artist object */ }
  },
  "albums": {
    "album_id_1": { /* Album object */ }
  },
  "isrcs": {
    "USRC12345678": { /* Track object */ }
  }
}
```

**Status Codes:**

- `200 OK` - Success (even if some items not found)
- `400 Bad Request` - Invalid request (exceeds 400 items, malformed JSON)
- `429 Too Many Requests` - Rate limit exceeded

**Performance:**

- Optimized with batch queries (7 queries for 400 items vs 2,400 individual queries)
- Typical response time: 100-500ms for 400 items

**Example:**

```bash
curl -X POST http://localhost:8080/batch/lookup \
  -H "Content-Type: application/json" \
  -d '{
    "isrcs": ["USRC17607839", "GBUM71029604"],
    "tracks": ["4cOdK2wGLETKBW3PvgPWqT"]
  }'
```

### Track Lookups

#### GET /lookup/isrc/{isrc}

Retrieve track by ISRC (International Standard Recording Code).

**Path Parameters:**

- `isrc` - ISRC code (e.g., `USRC12345678`)

**Response:**

```json
{
  "id": "track_id",
  "name": "Song Title",
  "isrc": "USRC12345678",
  "duration_ms": 240000,
  "explicit": false,
  "track_number": 1,
  "disc_number": 1,
  "popularity": 85,
  "preview_url": "https://p.scdn.co/mp3-preview/...",
  "album": {
    "id": "album_id",
    "name": "Album Title",
    "album_type": "album",
    "label": "Record Label",
    "release_date": "2023-01-15",
    "release_date_precision": "day",
    "external_id_upc": "123456789012",
    "total_tracks": 12,
    "copyright_c": "2023 Label",
    "copyright_p": "2023 Label",
    "images": [
      {
        "url": "https://i.scdn.co/image/...",
        "width": 640,
        "height": 640
      }
    ],
    "artists": [ /* Album artists */ ]
  },
  "artists": [
    {
      "id": "artist_id",
      "name": "Artist Name",
      "followers_total": 1000000,
      "popularity": 90,
      "genres": ["pop", "rock"],
      "images": [ /* Artist images */ ]
    }
  ],
  "original_title": "Original Title",
  "version_title": "Radio Edit",
  "has_lyrics": true,
  "languages": ["en"],
  "artist_roles": {
    "artist_id": ["performer", "composer"]
  }
}
```

**Status Codes:**

- `200 OK` - Track found
- `404 Not Found` - ISRC not in database
- `429 Too Many Requests` - Rate limit exceeded

**Example:**

```bash
curl http://localhost:8080/lookup/isrc/USRC17607839
```

#### GET /lookup/track/{id}

Retrieve track by internal track ID.

**Path Parameters:**

- `id` - Track ID (internal identifier)

**Response:** Same as `/lookup/isrc/{isrc}`

**Status Codes:**

- `200 OK` - Track found
- `404 Not Found` - Track ID not in database
- `429 Too Many Requests` - Rate limit exceeded

**Example:**

```bash
curl http://localhost:8080/lookup/track/4cOdK2wGLETKBW3PvgPWqT
```

### Artist Lookups

#### GET /lookup/artist/{id}

Retrieve artist by ID.

**Path Parameters:**

- `id` - Artist ID

**Response:**

```json
{
  "id": "artist_id",
  "name": "Artist Name",
  "followers_total": 1000000,
  "popularity": 90,
  "genres": ["pop", "rock", "indie"],
  "images": [
    {
      "url": "https://i.scdn.co/image/...",
      "width": 640,
      "height": 640
    },
    {
      "url": "https://i.scdn.co/image/...",
      "width": 320,
      "height": 320
    }
  ]
}
```

**Status Codes:**

- `200 OK` - Artist found
- `404 Not Found` - Artist ID not in database
- `429 Too Many Requests` - Rate limit exceeded

**Example:**

```bash
curl http://localhost:8080/lookup/artist/0TnOYISbd1XYRBk9myaseg
```

### Album Lookups

#### GET /lookup/album/{id}

Retrieve album by ID.

**Path Parameters:**

- `id` - Album ID

**Response:**

```json
{
  "id": "album_id",
  "name": "Album Title",
  "album_type": "album",
  "label": "Record Label",
  "release_date": "2023-01-15",
  "release_date_precision": "day",
  "external_id_upc": "123456789012",
  "total_tracks": 12,
  "copyright_c": "2023 Label",
  "copyright_p": "2023 Label",
  "images": [
    {
      "url": "https://i.scdn.co/image/...",
      "width": 640,
      "height": 640
    }
  ],
  "artists": [
    {
      "id": "artist_id",
      "name": "Artist Name",
      "followers_total": 1000000,
      "popularity": 90,
      "genres": ["pop"],
      "images": [ /* Artist images */ ]
    }
  ]
}
```

**Status Codes:**

- `200 OK` - Album found
- `404 Not Found` - Album ID not in database
- `429 Too Many Requests` - Rate limit exceeded

**Example:**

```bash
curl http://localhost:8080/lookup/album/2ODvWsOgouMbaA5xf0RkJe
```

#### GET /lookup/album/{id}/tracks

Retrieve all tracks for an album.

**Path Parameters:**

- `id` - Album ID

**Response:**

```json
{
  "tracks": [
    {
      "id": "track_id_1",
      "name": "Track 1",
      "track_number": 1,
      "disc_number": 1,
      /* Full track object */
    },
    {
      "id": "track_id_2",
      "name": "Track 2",
      "track_number": 2,
      "disc_number": 1,
      /* Full track object */
    }
  ]
}
```

**Status Codes:**

- `200 OK` - Album found (even if no tracks)
- `404 Not Found` - Album ID not in database
- `429 Too Many Requests` - Rate limit exceeded

**Example:**

```bash
curl http://localhost:8080/lookup/album/2ODvWsOgouMbaA5xf0RkJe/tracks
```

### Search

#### GET /search/track

Search tracks by name.

**Query Parameters:**

- `q` - Search query (minimum 2 characters, required)
- `limit` - Maximum results (default 10, max 50)

**Search Behavior:**

- Case-insensitive substring match (`LIKE %query% COLLATE NOCASE`)
- Ordered by popularity (descending)
- 10-second timeout
- Returns partial matches

**Response:**

```json
{
  "tracks": [
    {
      "id": "track_id",
      "name": "Song Title",
      /* Full track object */
    }
  ],
  "total": 1
}
```

**Status Codes:**

- `200 OK` - Search completed (even if no results)
- `400 Bad Request` - Query too short (< 2 chars) or limit too high (> 50)
- `429 Too Many Requests` - Rate limit exceeded
- `504 Gateway Timeout` - Search exceeded 10 seconds

**Example:**

```bash
curl "http://localhost:8080/search/track?q=bohemian&limit=5"
```

**Performance Note:** Search uses `LIKE %query%`, and the leading wildcard prevents the query from using an index efficiently. Searches on common terms may be slow (full table scan on 256M tracks).

#### GET /search/artist

Search artists by name.

**Query Parameters:**

- `q` - Search query (minimum 2 characters, required)
- `limit` - Maximum results (default 10, max 50)

**Search Behavior:**

- Case-insensitive substring match (`LIKE %query% COLLATE NOCASE`)
- Ordered by follower count (descending)
- 10-second timeout
- Returns partial matches

**Response:**

```json
{
  "artists": [
    {
      "id": "artist_id",
      "name": "Artist Name",
      /* Full artist object */
    }
  ],
  "total": 1
}
```

**Status Codes:**

- `200 OK` - Search completed (even if no results)
- `400 Bad Request` - Query too short (< 2 chars) or limit too high (> 50)
- `429 Too Many Requests` - Rate limit exceeded
- `504 Gateway Timeout` - Search exceeded 10 seconds

**Example:**

```bash
curl "http://localhost:8080/search/artist?q=beatles&limit=5"
```

### Health & Documentation

#### GET /health

Health check endpoint for monitoring.

**Response:**

```json
{
  "status": "ok"
}
```

**Status Codes:**

- `200 OK` - Always (even if database unreachable)

**Limitation:** This is a naive health check. It doesn't verify database connectivity. A database failure won't be detected until actual queries fail.

**Example:**

```bash
curl http://localhost:8080/health
```

#### GET /docs

Interactive Swagger UI for API documentation.

**Response:** HTML page with embedded Swagger UI

**Dependencies:**

- Loads Swagger UI from unpkg.com CDN (browser-side)
- Requires internet connection for first load
- Fetches OpenAPI spec from `/openapi.yaml`

**Example:**

```bash
# Open in browser
open http://localhost:8080/docs
```

#### GET /openapi.yaml

OpenAPI 3.1 specification in YAML format.

**Response:** YAML document with full API specification

**Content-Type:** `application/x-yaml`

**Example:**

```bash
curl http://localhost:8080/openapi.yaml
```

## Rate Limiting

### Algorithm

**Implementation:** Token bucket per IP address

**Configuration:**

- **Rate:** 100 requests/second
- **Burst:** 200 requests
- **Scope:** Per-IP (extracted from `X-Forwarded-For` or `RemoteAddr`)

### Behavior

**Token bucket mechanics:**

1. Each IP gets a bucket with 200 tokens (burst capacity)
2. Tokens refill at 100/second
3. Each request consumes 1 token
4. If bucket empty, request rejected with HTTP 429

**Example scenarios:**

| Scenario | Tokens Available | Result |
|----------|------------------|--------|
| First request | 200 | Allowed (199 remaining) |
| Burst of 200 requests at once | 200 → 0 | All allowed |
| 201st request immediately after the burst | 0 | Rejected (429) |
| Wait 1 second | 0 → 100 | 100 requests allowed |
| Steady 50 req/s | Always > 0 | Never rate limited |

### Response Headers

**When rate limited (HTTP 429):**

```
HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: text/plain

Rate limit exceeded
```

**Retry-After:** Seconds to wait before retrying (always 1)

### IP Extraction

**Priority:**

1. `X-Forwarded-For` header (first IP if comma-separated)
2. `RemoteAddr` from connection

**Example:**

```
X-Forwarded-For: 203.0.113.1, 198.51.100.1
→ Rate limiter uses 203.0.113.1
```

### Known Issues

**Memory leak:** The visitor map grows without bound because entries for inactive IPs are never cleaned up. Long-running servers will accumulate memory over time.

**Workaround:** Restart the server periodically or implement custom cleanup.
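
The token bucket mechanics and the unbounded visitor map described above can be illustrated with a small sketch. This is not the server's implementation: `TokenBucket`, `VisitorLimiter`, the `idle_ttl` setting, and the injectable `clock` are hypothetical names chosen for illustration, and Python is used only for brevity. It shows one possible form of the "custom cleanup" workaround, namely evicting buckets that have been idle for a while.

```python
import time


class TokenBucket:
    """Token bucket with `burst` capacity, refilled at `rate` tokens per second."""

    def __init__(self, rate, burst, now):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # a new visitor starts with a full bucket
        self.last = now             # time of the most recent request

    def allow(self, now):
        # Refill for the elapsed time, capped at the burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a server would answer with HTTP 429 here


class VisitorLimiter:
    """Per-IP buckets plus eviction of idle visitors (hypothetical cleanup)."""

    def __init__(self, rate=100, burst=200, idle_ttl=180.0, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.idle_ttl = idle_ttl  # assumed value; tune to your traffic
        self.clock = clock
        self.visitors = {}

    def allow(self, ip):
        now = self.clock()
        bucket = self.visitors.get(ip)
        if bucket is None:
            bucket = self.visitors[ip] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)

    def cleanup(self):
        """Drop buckets unused for more than idle_ttl seconds; return the count."""
        now = self.clock()
        stale = [ip for ip, b in self.visitors.items() if now - b.last > self.idle_ttl]
        for ip in stale:
            del self.visitors[ip]
        return len(stale)
```

In a real server, `cleanup()` would run on a timer, and access to the visitor map would need a lock because requests arrive concurrently. Evicting an idle visitor is safe in this model: an IP that has been idle longer than two seconds would have a full bucket again anyway.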
## Data Models

### Track Object

```json
{
  "id": "string",                       // Internal track ID
  "name": "string",                     // Track title
  "isrc": "string",                     // ISRC code (optional)
  "duration_ms": 0,                     // Duration in milliseconds
  "explicit": false,                    // Explicit content flag
  "track_number": 0,                    // Track number on album
  "disc_number": 0,                     // Disc number (multi-disc albums)
  "popularity": 0,                      // Popularity score (0-100)
  "preview_url": "string",              // 30-second preview URL (optional)
  "album": { /* Album object */ },      // Parent album
  "artists": [ /* Artist objects */ ],  // Track artists
  "original_title": "string",           // Original title (optional)
  "version_title": "string",            // Version (e.g., "Radio Edit") (optional)
  "has_lyrics": false,                  // Lyrics availability flag
  "languages": ["string"],              // Languages of performance (optional)
  "artist_roles": {                     // Artist roles map (optional)
    "artist_id": ["role1", "role2"]
  }
}
```

**Field notes:**

- `isrc`: May be null for some tracks
- `preview_url`: May be null if no preview available
- `popularity`: Higher = more popular (Spotify-style metric)
- `languages`: ISO 639-1 codes (e.g., "en", "es")
- `artist_roles`: Maps artist ID to roles (e.g., "performer", "composer", "producer")

### Album Object

```json
{
  "id": "string",                       // Internal album ID
  "name": "string",                     // Album title
  "album_type": "string",               // "album", "single", "compilation"
  "label": "string",                    // Record label (optional)
  "release_date": "string",             // ISO 8601 date (YYYY-MM-DD)
  "release_date_precision": "string",   // "year", "month", "day"
  "external_id_upc": "string",          // UPC barcode (optional)
  "total_tracks": 0,                    // Total tracks on album
  "copyright_c": "string",              // Copyright notice (optional)
  "copyright_p": "string",              // Phonographic copyright (optional)
  "images": [ /* Image objects */ ],    // Album artwork (optional)
  "artists": [ /* Artist objects */ ]   // Album artists (optional)
}
```

**Field notes:**

- `album_type`: Typically "album", "single", or "compilation"
- `release_date_precision`: Indicates granularity of release date
- `external_id_upc`: Universal Product Code (barcode)
- `images`: Sorted by size (largest first)

### Artist Object

```json
{
  "id": "string",                      // Internal artist ID
  "name": "string",                    // Artist name
  "followers_total": 0,                // Total followers (optional)
  "popularity": 0,                     // Popularity score 0-100 (optional)
  "genres": ["string"],                // Genres (optional)
  "images": [ /* Image objects */ ]    // Artist images (optional)
}
```

**Field notes:**

- `followers_total`: Spotify-style follower count
- `popularity`: Higher = more popular
- `genres`: Multiple genres possible (e.g., ["pop", "rock"])
- `images`: Sorted by size (largest first)

### Image Object

```json
{
  "url": "string",   // Image URL (typically i.scdn.co)
  "width": 0,        // Width in pixels
  "height": 0        // Height in pixels
}
```

**Field notes:**

- URLs reference external CDN (i.scdn.co)
- Multiple sizes available (640x640, 320x320, 64x64 typical)
- Images not hosted by API (external references)

## Error Responses

### Standard Error Format

```json
{
  "error": "Error message"
}
```

**Content-Type:** `application/json` (for structured errors) or `text/plain` (for simple errors)

### Common Error Codes

| Status | Meaning | Common Causes |
|--------|---------|---------------|
| 400 | Bad Request | Invalid query params, malformed JSON, validation failure |
| 404 | Not Found | ID/ISRC not in database |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Database error, query timeout |
| 504 | Gateway Timeout | Search exceeded 10 seconds |

### Example Error Responses

**404 Not Found:**

```json
{
  "error": "Track not found"
}
```

**400 Bad Request:**

```json
{
  "error": "Query must be at least 2 characters"
}
```

**429 Too Many Requests:**

```
Rate limit exceeded
```

**500 Internal Server Error:**

```json
{
  "error": "Database query failed"
}
```

## OpenAPI Specification

### Metadata

```yaml
openapi: 3.1.0
info:
  title: Music Metadata API
  version: 1.0.0
  description: API for querying music metadata from 256M tracks
  license:
    name: MIT
servers:
  - url: http://localhost:8080
    description: Local development server
```

### Example Endpoint Definition

```yaml
/lookup/track/{id}:
  get:
    summary: Get track by ID
    operationId: getTrack
    parameters:
      - name: id
        in: path
        required: true
        schema:
          type: string
        description: Track ID
    responses:
      '200':
        description: Track found
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Track'
      '404':
        description: Track not found
      '429':
        description: Rate limit exceeded
```

### Schema Definitions

All data models are defined in `components/schemas`:

- `Track`
- `Album`
- `Artist`
- `Image`
- `BatchRequest`
- `BatchResponse`

**Access:** http://localhost:8080/openapi.yaml

## Usage Examples

### Batch Lookup (Python)

```python
import requests

url = "http://localhost:8080/batch/lookup"
payload = {
    "isrcs": ["USRC17607839", "GBUM71029604"],
    "tracks": ["4cOdK2wGLETKBW3PvgPWqT"]
}

response = requests.post(url, json=payload, timeout=10)
response.raise_for_status()  # surface 400/429 instead of parsing an error body
data = response.json()

# ISRCs that were not found are simply absent from the response map
for isrc, track in data.get("isrcs", {}).items():
    print(f"{isrc}: {track['name']} by {track['artists'][0]['name']}")
```

### Search with a Result Limit (JavaScript)

```javascript
async function searchTracks(query, limit = 10) {
  const url = `http://localhost:8080/search/track?q=${encodeURIComponent(query)}&limit=${limit}`;
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.tracks;
}

// Top-level await requires an ES module context
const tracks = await searchTracks("bohemian rhapsody", 5);
tracks.forEach(track => {
  console.log(`${track.name} - ${track.album.name}`);
});
```

### Rate Limit Handling (Go)

```go
import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// fetchWithRetry retries on HTTP 429, honoring Retry-After, a bounded number of times.
func fetchWithRetry(url string) (*http.Response, error) {
	const maxRetries = 5
	for attempt := 0; ; attempt++ {
		resp, err := http.Get(url)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests {
			return resp, nil // the caller is responsible for closing resp.Body
		}
		resp.Body.Close() // release the connection before retrying
		if attempt >= maxRetries {
			return nil, fmt.Errorf("still rate limited after %d retries", maxRetries)
		}
		// Retry-After is given in seconds; fall back to 1s if missing or invalid.
		secs, err := strconv.Atoi(resp.Header.Get("Retry-After"))
		if err != nil || secs < 1 {
			secs = 1
		}
		time.Sleep(time.Duration(secs) * time.Second)
	}
}
```

## Performance Considerations

### Batch vs Individual Requests

**Individual requests (400 tracks):**

- 400 HTTP requests
- 400 × ~50ms =
  20 seconds
- Rate limited at 100 req/s (4 seconds minimum at the steady rate, or about 2 seconds if the full 200-request burst is available)

**Batch request (400 tracks):**

- 1 HTTP request
- ~200-500ms total
- **40-100x faster**

**Recommendation:** Always use the batch endpoint for multiple items.

### Search Performance

**Fast searches:**

- Short, specific queries ("beatles")
- Queries matching popular items (returned first)

**Slow searches:**

- Common words ("love", "the")
- Long queries with many results
- Queries requiring full table scan

**Recommendation:** Implement client-side caching for common searches.

### Caching Strategy

**Cacheable:**

- Track/album/artist lookups (data rarely changes)
- Search results (cache for 1 hour)

**Not cacheable:**

- Health checks
- OpenAPI spec (changes with deployments)

**Recommendation:** The API does not currently send HTTP caching headers, so cache responses on the client or in a reverse proxy.

## Integration Patterns

### Enrichment Pipeline

```
1. Extract ISRCs from audio files (e.g., via AcoustID)
   ↓
2. Batch lookup ISRCs (400 at a time)
   ↓
3. Store track metadata in local database
   ↓
4. Fetch missing artists/albums individually
   ↓
5. Update local cache
```

### Complementing MusicBrainz

```
MusicBrainz (MBID-based)
   ↓
Resolve ISRC from MusicBrainz
   ↓
Lookup ISRC in Music Metadata API
   ↓
Merge metadata (MusicBrainz + Spotify-style data)
```

### Real-time Lookup

```
User plays track
   ↓
Extract ISRC from file
   ↓
Check local cache
   ↓
If miss: GET /lookup/isrc/{isrc}
   ↓
Display metadata in UI
```

## Limitations

### No Authentication

**Impact:**

- Anyone can query API
- No usage tracking per user
- No quota enforcement per user

**Mitigation:**

- Deploy behind reverse proxy with auth
- Use firewall rules to restrict access
- Implement API gateway with authentication

### No CORS

**Impact:**

- Browser-based clients blocked
- Can't call from web apps directly

**Mitigation:**

- Add CORS middleware (custom implementation)
- Use server-side proxy
- Deploy API on same origin as web app

### No Metrics

**Impact:**

- No visibility into usage patterns
- Can't track error rates
- No performance monitoring

**Mitigation:**

- Add Prometheus metrics (custom implementation)
- Use reverse proxy with metrics (e.g., nginx)
- Parse logs for analytics

### Naive Health Check

**Impact:**

- Health endpoint returns OK even if database down
- Monitoring systems can't detect database failures

**Mitigation:**

- Implement custom health check with database ping
- Monitor actual query endpoints (e.g., /lookup/track/test_id)
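
The second mitigation, monitoring a real query endpoint alongside `/health`, could be sketched as an external probe like the one below. This is an illustration under assumptions, not part of the API: `http_status`, `deep_health_check`, the `fetch` parameter, and the three result labels are hypothetical, and `known_track_id` is assumed to be an ID that you know exists in your database.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # non-2xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def deep_health_check(base_url, known_track_id, fetch=http_status):
    """Classify the service as 'ok', 'degraded', or 'down'.

    A 200 from /health only shows that the process answers. A 200 from a
    lookup of a known ID additionally shows that database queries work.
    """
    if fetch(f"{base_url}/health") != 200:
        return "down"
    if fetch(f"{base_url}/lookup/track/{known_track_id}") == 200:
        return "ok"
    return "degraded"  # process is up, but the query path failed


# Example call (requires a running server):
# deep_health_check("http://localhost:8080", "4cOdK2wGLETKBW3PvgPWqT")
```

The `fetch` parameter exists so the probe can be exercised without a live server. Note that a `429` on the lookup is also reported as `degraded` here; a monitoring job that runs frequently from one IP may want to treat that case separately.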