# Music Metadata API - API Reference
## API Overview
Music Metadata API exposes a RESTful HTTP API with 11 endpoints for querying music metadata. The API is fully documented with OpenAPI 3.1 and includes an interactive Swagger UI.
**Base URL:** `http://localhost:8080` (configurable via `-addr` flag)
**Content-Type:** `application/json`
**Authentication:** None (public API)
**CORS:** Not supported
**Rate Limiting:** 100 requests/second, 200 burst (per-IP)
## Endpoints
### Batch Operations
#### POST /batch/lookup
Retrieve multiple tracks, albums, and artists in a single request.
**Request Body:**
```json
{
  "tracks": ["track_id_1", "track_id_2"],
  "artists": ["artist_id_1", "artist_id_2"],
  "albums": ["album_id_1", "album_id_2"],
  "isrcs": ["USRC12345678", "GBUM71234567"]
}
```
**Constraints:**
- Maximum 400 items total across all arrays
- Every field is optional, but at least one must be present
- Duplicate IDs allowed (deduplicated in response)
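
A client holding more than 400 identifiers has to split them across several requests. The following is a minimal sketch of one way to do that; `chunk_batch_payloads` is a hypothetical helper name, not part of the API, and it removes duplicates on the client side so that they do not count against the limit.

```python
from itertools import islice

MAX_ITEMS = 400  # documented limit across all arrays combined


def chunk_batch_payloads(tracks=(), artists=(), albums=(), isrcs=(), max_items=MAX_ITEMS):
    """Yield request bodies for POST /batch/lookup, each within the item limit."""
    # Flatten to (field, id) pairs, dropping duplicates while keeping order.
    pairs = dict.fromkeys(
        (field, item)
        for field, items in (("tracks", tracks), ("artists", artists),
                             ("albums", albums), ("isrcs", isrcs))
        for item in items
    )
    iterator = iter(pairs)
    while True:
        batch = list(islice(iterator, max_items))
        if not batch:
            return
        # Regroup the pairs into the request body shape shown above.
        payload = {}
        for field, item in batch:
            payload.setdefault(field, []).append(item)
        yield payload
```

Each yielded dictionary can be sent as the JSON body of one request; empty arrays are omitted rather than sent.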
**Response:**
```json
{
  "tracks": {
    "track_id_1": {
      "id": "track_id_1",
      "name": "Song Title",
      "isrc": "USRC12345678",
      "duration_ms": 240000,
      "explicit": false,
      "track_number": 1,
      "disc_number": 1,
      "popularity": 85,
      "preview_url": "https://...",
      "album": { /* Album object */ },
      "artists": [ /* Artist objects */ ],
      "original_title": "Original Title",
      "version_title": "Radio Edit",
      "has_lyrics": true,
      "languages": ["en", "es"],
      "artist_roles": {
        "artist_id_1": ["performer", "composer"]
      }
    }
  },
  "artists": {
    "artist_id_1": { /* Artist object */ }
  },
  "albums": {
    "album_id_1": { /* Album object */ }
  },
  "isrcs": {
    "USRC12345678": { /* Track object */ }
  }
}
```
**Status Codes:**
- `200 OK` - Success (even if some items not found)
- `400 Bad Request` - Invalid request (exceeds 400 items, malformed JSON)
- `429 Too Many Requests` - Rate limit exceeded
**Performance:**
- Optimized with batch queries (7 queries for 400 items vs 2,400 individual queries)
- Typical response time: 100-500ms for 400 items
**Example:**
```bash
curl -X POST http://localhost:8080/batch/lookup \
  -H "Content-Type: application/json" \
  -d '{
    "isrcs": ["USRC17607839", "GBUM71029604"],
    "tracks": ["4cOdK2wGLETKBW3PvgPWqT"]
  }'
```
### Track Lookups
#### GET /lookup/isrc/{isrc}
Retrieve track by ISRC (International Standard Recording Code).
**Path Parameters:**
- `isrc` - ISRC code (e.g., `USRC12345678`)
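
Validating the code on the client avoids spending a request on an obviously malformed value. This sketch checks the 12-character ISRC layout (two letters, three alphanumerics, seven digits). `normalize_isrc` is a hypothetical helper, and stripping hyphens is an assumption based on the compact form used in the examples here; the API's handling of hyphenated input is not documented.

```python
import re

# ISRC layout: 2-letter prefix, 3 alphanumerics, 2-digit year, 5-digit designation.
_ISRC_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")


def normalize_isrc(value):
    """Return the 12-character compact ISRC, or None if the input is malformed."""
    compact = value.strip().replace("-", "").upper()
    return compact if _ISRC_RE.fullmatch(compact) else None
```

For example, `normalize_isrc("us-rc1-76-07839")` yields `"USRC17607839"`, which can be placed directly in the path.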
**Response:**
```json
{
  "id": "track_id",
  "name": "Song Title",
  "isrc": "USRC12345678",
  "duration_ms": 240000,
  "explicit": false,
  "track_number": 1,
  "disc_number": 1,
  "popularity": 85,
  "preview_url": "https://p.scdn.co/mp3-preview/...",
  "album": {
    "id": "album_id",
    "name": "Album Title",
    "album_type": "album",
    "label": "Record Label",
    "release_date": "2023-01-15",
    "release_date_precision": "day",
    "external_id_upc": "123456789012",
    "total_tracks": 12,
    "copyright_c": "2023 Label",
    "copyright_p": "2023 Label",
    "images": [
      {
        "url": "https://i.scdn.co/image/...",
        "width": 640,
        "height": 640
      }
    ],
    "artists": [ /* Album artists */ ]
  },
  "artists": [
    {
      "id": "artist_id",
      "name": "Artist Name",
      "followers_total": 1000000,
      "popularity": 90,
      "genres": ["pop", "rock"],
      "images": [ /* Artist images */ ]
    }
  ],
  "original_title": "Original Title",
  "version_title": "Radio Edit",
  "has_lyrics": true,
  "languages": ["en"],
  "artist_roles": {
    "artist_id": ["performer", "composer"]
  }
}
```
**Status Codes:**
- `200 OK` - Track found
- `404 Not Found` - ISRC not in database
- `429 Too Many Requests` - Rate limit exceeded
**Example:**
```bash
curl http://localhost:8080/lookup/isrc/USRC17607839
```
#### GET /lookup/track/{id}
Retrieve track by internal track ID.
**Path Parameters:**
- `id` - Track ID (internal identifier)
**Response:** Same as `/lookup/isrc/{isrc}`
**Status Codes:**
- `200 OK` - Track found
- `404 Not Found` - Track ID not in database
- `429 Too Many Requests` - Rate limit exceeded
**Example:**
```bash
curl http://localhost:8080/lookup/track/4cOdK2wGLETKBW3PvgPWqT
```
### Artist Lookups
#### GET /lookup/artist/{id}
Retrieve artist by ID.
**Path Parameters:**
- `id` - Artist ID
**Response:**
```json
{
  "id": "artist_id",
  "name": "Artist Name",
  "followers_total": 1000000,
  "popularity": 90,
  "genres": ["pop", "rock", "indie"],
  "images": [
    {
      "url": "https://i.scdn.co/image/...",
      "width": 640,
      "height": 640
    },
    {
      "url": "https://i.scdn.co/image/...",
      "width": 320,
      "height": 320
    }
  ]
}
```
**Status Codes:**
- `200 OK` - Artist found
- `404 Not Found` - Artist ID not in database
- `429 Too Many Requests` - Rate limit exceeded
**Example:**
```bash
curl http://localhost:8080/lookup/artist/0TnOYISbd1XYRBk9myaseg
```
### Album Lookups
#### GET /lookup/album/{id}
Retrieve album by ID.
**Path Parameters:**
- `id` - Album ID
**Response:**
```json
{
  "id": "album_id",
  "name": "Album Title",
  "album_type": "album",
  "label": "Record Label",
  "release_date": "2023-01-15",
  "release_date_precision": "day",
  "external_id_upc": "123456789012",
  "total_tracks": 12,
  "copyright_c": "2023 Label",
  "copyright_p": "2023 Label",
  "images": [
    {
      "url": "https://i.scdn.co/image/...",
      "width": 640,
      "height": 640
    }
  ],
  "artists": [
    {
      "id": "artist_id",
      "name": "Artist Name",
      "followers_total": 1000000,
      "popularity": 90,
      "genres": ["pop"],
      "images": [ /* Artist images */ ]
    }
  ]
}
```
**Status Codes:**
- `200 OK` - Album found
- `404 Not Found` - Album ID not in database
- `429 Too Many Requests` - Rate limit exceeded
**Example:**
```bash
curl http://localhost:8080/lookup/album/2ODvWsOgouMbaA5xf0RkJe
```
#### GET /lookup/album/{id}/tracks
Retrieve all tracks for an album.
**Path Parameters:**
- `id` - Album ID
**Response:**
```json
{
  "tracks": [
    {
      "id": "track_id_1",
      "name": "Track 1",
      "track_number": 1,
      "disc_number": 1,
      /* Full track object */
    },
    {
      "id": "track_id_2",
      "name": "Track 2",
      "track_number": 2,
      "disc_number": 1,
      /* Full track object */
    }
  ]
}
```
**Status Codes:**
- `200 OK` - Album found (even if no tracks)
- `404 Not Found` - Album ID not in database
- `429 Too Many Requests` - Rate limit exceeded
**Example:**
```bash
curl http://localhost:8080/lookup/album/2ODvWsOgouMbaA5xf0RkJe/tracks
```
### Search
#### GET /search/track
Search tracks by name.
**Query Parameters:**
- `q` - Search query (minimum 2 characters, required)
- `limit` - Maximum results (default 10, max 50)
**Search Behavior:**
- Case-insensitive substring match (`LIKE %query% COLLATE NOCASE`)
- Ordered by popularity (descending)
- 10-second timeout
- Returns partial matches
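
The behavior listed above can be reproduced against a small in-memory SQLite table. This is only an illustrative sketch: the `tracks` table, its columns, and the `search_tracks` function are assumptions and do not reflect the server's actual schema or code. The lower bound of 1 on `limit` is also an assumption, since only the maximum is documented.

```python
import sqlite3


def search_tracks(conn, q, limit=10):
    """Validate parameters as documented, then run a substring search."""
    if len(q) < 2:
        raise ValueError("Query must be at least 2 characters")
    if not 1 <= limit <= 50:  # the lower bound is an assumption
        raise ValueError("limit must be between 1 and 50")
    rows = conn.execute(
        "SELECT id, name FROM tracks "
        "WHERE name LIKE ? COLLATE NOCASE "
        "ORDER BY popularity DESC LIMIT ?",
        (f"%{q}%", limit),
    )
    return rows.fetchall()


# Tiny in-memory table standing in for the real catalogue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id TEXT, name TEXT, popularity INTEGER)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?)",
    [("t1", "Bohemian Rhapsody", 90), ("t2", "La Bohemian", 40), ("t3", "Imagine", 85)],
)
print(search_tracks(conn, "bohemian"))
```

Both rows containing "bohemian" match regardless of case or position, and the more popular one is returned first.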
**Response:**
```json
{
  "tracks": [
    {
      "id": "track_id",
      "name": "Song Title",
      /* Full track object */
    }
  ],
  "total": 1
}
```
**Status Codes:**
- `200 OK` - Search completed (even if no results)
- `400 Bad Request` - Query too short (< 2 chars) or limit too high (> 50)
- `429 Too Many Requests` - Rate limit exceeded
- `504 Gateway Timeout` - Search exceeded 10 seconds
**Example:**
```bash
curl "http://localhost:8080/search/track?q=bohemian&limit=5"
```
**Performance Note:** Search uses `LIKE %query%`, and a pattern with a leading wildcard cannot use an ordinary index on the name column. Searches for common terms may therefore be slow, since they can require a full table scan over 256M tracks.
#### GET /search/artist
Search artists by name.
**Query Parameters:**
- `q` - Search query (minimum 2 characters, required)
- `limit` - Maximum results (default 10, max 50)
**Search Behavior:**
- Case-insensitive substring match (`LIKE %query% COLLATE NOCASE`)
- Ordered by follower count (descending)
- 10-second timeout
- Returns partial matches
**Response:**
```json
{
  "artists": [
    {
      "id": "artist_id",
      "name": "Artist Name",
      /* Full artist object */
    }
  ],
  "total": 1
}
```
**Status Codes:**
- `200 OK` - Search completed (even if no results)
- `400 Bad Request` - Query too short (< 2 chars) or limit too high (> 50)
- `429 Too Many Requests` - Rate limit exceeded
- `504 Gateway Timeout` - Search exceeded 10 seconds
**Example:**
```bash
curl "http://localhost:8080/search/artist?q=beatles&limit=5"
```
### Health & Documentation
#### GET /health
Health check endpoint for monitoring.
**Response:**
```json
{
  "status": "ok"
}
```
**Status Codes:**
- `200 OK` - Always (even if database unreachable)
**Limitation:** This is a naive health check. It doesn't verify database connectivity. A database failure won't be detected until actual queries fail.
**Example:**
```bash
curl http://localhost:8080/health
```
#### GET /docs
Interactive Swagger UI for API documentation.
**Response:** HTML page with embedded Swagger UI
**Dependencies:**
- Loads Swagger UI from unpkg.com CDN (browser-side)
- Requires internet connection for first load
- Fetches OpenAPI spec from `/openapi.yaml`
**Example:**
```bash
# Open in browser
open http://localhost:8080/docs
```
#### GET /openapi.yaml
OpenAPI 3.1 specification in YAML format.
**Response:** YAML document with full API specification
**Content-Type:** `application/x-yaml`
**Example:**
```bash
curl http://localhost:8080/openapi.yaml
```
## Rate Limiting
### Algorithm
**Implementation:** Token bucket per IP address
**Configuration:**
- **Rate:** 100 requests/second
- **Burst:** 200 requests
- **Scope:** Per-IP (extracted from `X-Forwarded-For` or `RemoteAddr`)
### Behavior
**Token bucket mechanics:**
1. Each IP gets a bucket with 200 tokens (burst capacity)
2. Tokens refill at 100/second
3. Each request consumes 1 token
4. If bucket empty, request rejected with HTTP 429
**Example scenarios:**
| Scenario | Tokens Available | Result |
|----------|------------------|--------|
| First request | 200 | Allowed (199 remaining) |
| 200 requests in one instantaneous burst | 200 → 0 | All allowed |
| 201st request before any refill | 0 | Rejected (429) |
| Wait 1 second | 0 → 100 | 100 requests allowed |
| Steady 50 req/s | Always > 0 | Never rate limited |
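
The mechanics and scenarios above can be checked with a small simulation. This is a sketch of a generic token bucket under the documented rate and burst values, not the server's implementation; the `TokenBucket` class and its explicit `now` timestamps are assumptions made so that the example is deterministic.

```python
class TokenBucket:
    """Token bucket with a continuous refill, driven by caller-supplied timestamps."""

    def __init__(self, rate=100.0, burst=200.0, now=0.0):
        self.rate = rate        # tokens added per second
        self.burst = burst      # bucket capacity
        self.tokens = burst     # a new visitor starts with a full bucket
        self.updated = now

    def allow(self, now):
        # Refill for the elapsed time, capped at the burst capacity.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the server would answer 429 here


bucket = TokenBucket()
burst = sum(bucket.allow(now=0.0) for _ in range(201))
after_wait = sum(bucket.allow(now=1.0) for _ in range(101))
print(burst, after_wait)  # 200 100
```

Of 201 simultaneous requests, 200 are allowed; one second later the bucket has refilled by 100 tokens, so 100 of the next 101 requests pass.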
### Response Headers
**When rate limited (HTTP 429):**
```
HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: text/plain

Rate limit exceeded
```
**Retry-After:** Seconds to wait before retrying (always 1)
### IP Extraction
**Priority:**
1. `X-Forwarded-For` header (first IP if comma-separated)
2. `RemoteAddr` from connection
**Example:**
```
X-Forwarded-For: 203.0.113.1, 198.51.100.1
→ Rate limiter uses 203.0.113.1
```
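
The priority order can be expressed as a short function. This is a sketch only: `client_ip` is a hypothetical name, `headers` is assumed to be a plain dictionary keyed by the canonical header name, and stripping the port from `RemoteAddr` is an assumption about how the key is derived.

```python
def client_ip(headers, remote_addr):
    """Return the address used as the rate-limit key."""
    forwarded = headers.get("X-Forwarded-For", "")
    first = forwarded.split(",")[0].strip()
    if first:
        return first
    # remote_addr is assumed to look like "host:port" or "[ipv6]:port".
    if remote_addr.startswith("["):
        return remote_addr[1:remote_addr.index("]")]
    return remote_addr.rsplit(":", 1)[0]
```

Note that `X-Forwarded-For` is supplied by the client unless a trusted proxy overwrites it, so a deployment that relies on this header for rate limiting should sit behind such a proxy.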
### Known Issues
**Memory leak:** Visitor map grows unbounded. No cleanup for inactive IPs. Long-running servers will accumulate memory over time.
**Workaround:** Restart server periodically or implement custom cleanup.
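
One possible shape for such a cleanup is to record when each IP was last seen and periodically evict idle entries. The sketch below is an illustration in Python, not a patch for the server; `VisitorRegistry`, the 180-second idle threshold, and the explicit `now` arguments are all assumptions.

```python
class VisitorRegistry:
    """Per-IP state with eviction of entries idle longer than max_idle seconds."""

    def __init__(self, max_idle=180.0):
        self.max_idle = max_idle
        self._visitors = {}  # ip -> (state, last_seen)

    def touch(self, ip, now, new_state=dict):
        """Return the state for ip, creating it if needed, and record the visit."""
        state, _ = self._visitors.get(ip, (None, None))
        if state is None:
            state = new_state()
        self._visitors[ip] = (state, now)
        return state

    def evict_idle(self, now):
        """Drop visitors not seen within max_idle seconds; return how many were dropped."""
        stale = [ip for ip, (_, seen) in self._visitors.items()
                 if now - seen > self.max_idle]
        for ip in stale:
            del self._visitors[ip]
        return len(stale)

    def __len__(self):
        return len(self._visitors)
```

Calling `evict_idle` from a background timer keeps the map proportional to the number of recently active clients. An evicted IP simply starts again with a fresh state on its next request.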
## Data Models
### Track Object
```json
{
  "id": "string",                       // Internal track ID
  "name": "string",                     // Track title
  "isrc": "string",                     // ISRC code (optional)
  "duration_ms": 0,                     // Duration in milliseconds
  "explicit": false,                    // Explicit content flag
  "track_number": 0,                    // Track number on album
  "disc_number": 0,                     // Disc number (multi-disc albums)
  "popularity": 0,                      // Popularity score (0-100)
  "preview_url": "string",              // 30-second preview URL (optional)
  "album": { /* Album object */ },      // Parent album
  "artists": [ /* Artist objects */ ],  // Track artists
  "original_title": "string",           // Original title (optional)
  "version_title": "string",            // Version (e.g., "Radio Edit") (optional)
  "has_lyrics": false,                  // Lyrics availability flag
  "languages": ["string"],              // Languages of performance (optional)
  "artist_roles": {                     // Artist roles map (optional)
    "artist_id": ["role1", "role2"]
  }
}
```
**Field notes:**
- `isrc`: May be null for some tracks
- `preview_url`: May be null if no preview available
- `popularity`: Higher = more popular (Spotify-style metric)
- `languages`: ISO 639-1 codes (e.g., "en", "es")
- `artist_roles`: Maps artist ID to roles (e.g., "performer", "composer", "producer")
### Album Object
```json
{
  "id": "string",                       // Internal album ID
  "name": "string",                     // Album title
  "album_type": "string",               // "album", "single", "compilation"
  "label": "string",                    // Record label (optional)
  "release_date": "string",             // ISO 8601 date (YYYY-MM-DD)
  "release_date_precision": "string",   // "year", "month", "day"
  "external_id_upc": "string",          // UPC barcode (optional)
  "total_tracks": 0,                    // Total tracks on album
  "copyright_c": "string",              // Copyright notice (optional)
  "copyright_p": "string",              // Phonographic copyright (optional)
  "images": [ /* Image objects */ ],    // Album artwork (optional)
  "artists": [ /* Artist objects */ ]   // Album artists (optional)
}
```
**Field notes:**
- `album_type`: Typically "album", "single", or "compilation"
- `release_date_precision`: Indicates granularity of release date
- `external_id_upc`: Universal Product Code (barcode)
- `images`: Sorted by size (largest first)
### Artist Object
```json
{
  "id": "string",                       // Internal artist ID
  "name": "string",                     // Artist name
  "followers_total": 0,                 // Total followers (optional)
  "popularity": 0,                      // Popularity score 0-100 (optional)
  "genres": ["string"],                 // Genres (optional)
  "images": [ /* Image objects */ ]     // Artist images (optional)
}
```
**Field notes:**
- `followers_total`: Spotify-style follower count
- `popularity`: Higher = more popular
- `genres`: Multiple genres possible (e.g., ["pop", "rock"])
- `images`: Sorted by size (largest first)
### Image Object
```json
{
  "url": "string",   // Image URL (typically i.scdn.co)
  "width": 0,        // Width in pixels
  "height": 0        // Height in pixels
}
```
**Field notes:**
- URLs reference external CDN (i.scdn.co)
- Multiple sizes available (640x640, 320x320, 64x64 typical)
- Images not hosted by API (external references)
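
Because several sizes are usually offered, a client typically wants the smallest image that still fills its display slot. A minimal sketch, where `pick_image` is a hypothetical helper operating on a list of Image objects as described above:

```python
def pick_image(images, min_width):
    """Return the smallest image at least min_width wide, else the widest one."""
    if not images:
        return None
    wide_enough = [img for img in images if img["width"] >= min_width]
    if wide_enough:
        return min(wide_enough, key=lambda img: img["width"])
    # Nothing is large enough, so fall back to the best available size.
    return max(images, key=lambda img: img["width"])
```

The function does not rely on the list being sorted, so it also works if the ordering differs from the largest-first convention.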
## Error Responses
### Standard Error Format
```json
{
  "error": "Error message"
}
```
**Content-Type:** `application/json` (for structured errors) or `text/plain` (for simple errors)
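
Since an error body may arrive as either JSON or plain text, a client needs to handle both. The following sketch shows one way to extract the message; `error_message` is a hypothetical helper that takes the response's `Content-Type` header value and raw body bytes.

```python
import json


def error_message(content_type, body):
    """Extract a human-readable message from an error response body."""
    text = body.decode("utf-8", errors="replace").strip()
    # Ignore parameters such as "; charset=utf-8" when checking the media type.
    if content_type.split(";")[0].strip().lower() == "application/json":
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            return text
        if isinstance(parsed, dict) and isinstance(parsed.get("error"), str):
            return parsed["error"]
    return text
```

Falling back to the raw text means a plain-text 429 body and a structured 404 body are both reported sensibly.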
### Common Error Codes
| Status | Meaning | Common Causes |
|--------|---------|---------------|
| 400 | Bad Request | Invalid query params, malformed JSON, validation failure |
| 404 | Not Found | ID/ISRC not in database |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Database error, query timeout |
| 504 | Gateway Timeout | Search exceeded 10 seconds |
### Example Error Responses
**404 Not Found:**
```json
{
  "error": "Track not found"
}
```
**400 Bad Request:**
```json
{
  "error": "Query must be at least 2 characters"
}
```
**429 Too Many Requests:**
```
Rate limit exceeded
```
**500 Internal Server Error:**
```json
{
  "error": "Database query failed"
}
```
## OpenAPI Specification
### Metadata
```yaml
openapi: 3.1.0
info:
  title: Music Metadata API
  version: 1.0.0
  description: API for querying music metadata from 256M tracks
  license:
    name: MIT
servers:
  - url: http://localhost:8080
    description: Local development server
```
### Example Endpoint Definition
```yaml
/lookup/track/{id}:
  get:
    summary: Get track by ID
    operationId: getTrack
    parameters:
      - name: id
        in: path
        required: true
        schema:
          type: string
        description: Track ID
    responses:
      '200':
        description: Track found
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Track'
      '404':
        description: Track not found
      '429':
        description: Rate limit exceeded
```
### Schema Definitions
All data models defined in `components/schemas`:
- `Track`
- `Album`
- `Artist`
- `Image`
- `BatchRequest`
- `BatchResponse`
**Access:** http://localhost:8080/openapi.yaml
## Usage Examples
### Batch Lookup (Python)
```python
import requests

url = "http://localhost:8080/batch/lookup"
payload = {
    "isrcs": ["USRC17607839", "GBUM71029604"],
    "tracks": ["4cOdK2wGLETKBW3PvgPWqT"],
}

response = requests.post(url, json=payload, timeout=10)
response.raise_for_status()  # raise on 400/429 instead of parsing an error body
data = response.json()

# The "isrcs" map is keyed by the requested ISRC.
for isrc, track in data.get("isrcs", {}).items():
    print(f"{isrc}: {track['name']} by {track['artists'][0]['name']}")
```
### Search with a Result Limit (JavaScript)
```javascript
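async function searchTracks(query, limit = 10) {
  const url = `http://localhost:8080/search/track?q=${encodeURIComponent(query)}&limit=${limit}`;
  const response = await fetch(url);
  if (!response.ok) {
    // Covers 400 (bad query), 429 (rate limited), and 504 (search timeout).
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.tracks;
}

// Top-level await requires an ES module context.
const tracks = await searchTracks("bohemian rhapsody", 5);
tracks.forEach(track => {
  console.log(`${track.name} - ${track.album.name}`);
});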
```
### Rate Limit Handling (Go)
```go
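import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// fetchWithRetry retries on HTTP 429; the caller must close the returned body.
func fetchWithRetry(url string) (*http.Response, error) {
	const maxRetries = 5 // illustrative cap to avoid retrying forever
	for attempt := 0; ; attempt++ {
		resp, err := http.Get(url)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests {
			return resp, nil
		}
		resp.Body.Close() // release the connection before retrying
		if attempt == maxRetries {
			return nil, fmt.Errorf("still rate limited after %d retries", maxRetries)
		}
		// Retry-After is given in seconds; fall back to 1s if absent or invalid.
		seconds, err := strconv.Atoi(resp.Header.Get("Retry-After"))
		if err != nil || seconds < 1 {
			seconds = 1
		}
		time.Sleep(time.Duration(seconds) * time.Second)
	}
}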
```
## Performance Considerations
### Batch vs Individual Requests
**Individual requests (400 tracks):**
- 400 HTTP requests
- 400 × ~50ms = 20 seconds
- Even when sent in parallel, bounded by the rate limiter: 200 burst tokens plus 100 tokens/second means roughly 2 seconds minimum
**Batch request (400 tracks):**
- 1 HTTP request
- ~200-500ms total
- **40-100x faster**
**Recommendation:** Always use batch endpoint for multiple items.
### Search Performance
**Fast searches:**
- Short, specific queries ("beatles")
- Queries matching popular items (returned first)
**Slow searches:**
- Common words ("love", "the")
- Long queries with many results
- Queries requiring full table scan
**Recommendation:** Implement client-side caching for common searches.
### Caching Strategy
**Cacheable:**
- Track/album/artist lookups (data rarely changes)
- Search results (cache for 1 hour)
**Not cacheable:**
- Health checks
- OpenAPI spec (changes with deployments)
**Recommendation:** Use HTTP caching headers (not currently implemented by API).
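
Until the API sends caching headers, a client can keep its own time-limited cache. The sketch below is one minimal way to do so; `TTLCache` and its `fetch` callback are hypothetical, and the cache has no size limit, so a long-running client would also want an eviction policy.

```python
import time


class TTLCache:
    """Minimal time-based cache; entries expire ttl seconds after being stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch(key)  # only called on a miss or after expiry
        self._entries[key] = (now + self.ttl, value)
        return value


search_cache = TTLCache(ttl=3600)  # one hour, as suggested for search results
```

A caller would pass a function that performs the actual HTTP request as `fetch`, keyed by the query string or the looked-up ID.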
## Integration Patterns
### Enrichment Pipeline
```
1. Extract ISRCs from audio files (e.g., via AcoustID)
2. Batch lookup ISRCs (400 at a time)
3. Store track metadata in local database
4. Fetch missing artists/albums individually
5. Update local cache
```
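
Steps 2 to 4 of this pipeline might look like the following sketch. `enrich` is a hypothetical function; `batch_lookup` stands for any callable that posts a request body to `/batch/lookup` and returns the decoded JSON, and `store` is a dictionary standing in for the local database. It assumes that ISRCs which are not found are absent (or null) in the response's `isrcs` map.

```python
def enrich(isrcs, batch_lookup, store, batch_size=400):
    """Look up ISRCs in batches, store the hits, and return the ISRCs not found."""
    unique = list(dict.fromkeys(isrcs))  # drop duplicates, keep order
    missing = []
    for start in range(0, len(unique), batch_size):
        chunk = unique[start:start + batch_size]
        found = batch_lookup({"isrcs": chunk}).get("isrcs", {})
        for isrc in chunk:
            track = found.get(isrc)
            if track is None:
                missing.append(isrc)
            else:
                store[isrc] = track
    return missing
```

The returned list identifies the items that need a fallback source or a later retry.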
### Complementing MusicBrainz
```
1. MusicBrainz (MBID-based)
2. Resolve ISRC from MusicBrainz
3. Lookup ISRC in Music Metadata API
4. Merge metadata (MusicBrainz + Spotify-style data)
```
### Real-time Lookup
```
1. User plays track
2. Extract ISRC from file
3. Check local cache
4. If miss: GET /lookup/isrc/{isrc}
5. Display metadata in UI
```
## Limitations
### No Authentication
**Impact:**
- Anyone can query API
- No usage tracking per user
- No quota enforcement per user
**Mitigation:**
- Deploy behind reverse proxy with auth
- Use firewall rules to restrict access
- Implement API gateway with authentication
### No CORS
**Impact:**
- Browser-based clients blocked
- Can't call from web apps directly
**Mitigation:**
- Add CORS middleware (custom implementation)
- Use server-side proxy
- Deploy API on same origin as web app
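
The header logic that such a middleware or proxy needs is small. As an illustration only, here is a sketch written as Python WSGI middleware for a proxy placed in front of the API; it is not the server's code, and `cors_middleware` and the allowed-origin list are assumptions. It answers preflight `OPTIONS` requests directly, which browsers send before a JSON `POST` such as `/batch/lookup`.

```python
def cors_middleware(app, allowed_origins):
    """Wrap a WSGI app so responses carry CORS headers for allowed origins."""
    allowed = frozenset(allowed_origins)

    def wrapped(environ, start_response):
        origin = environ.get("HTTP_ORIGIN")
        cors = []
        if origin in allowed:
            cors = [("Access-Control-Allow-Origin", origin), ("Vary", "Origin")]
        if environ["REQUEST_METHOD"] == "OPTIONS" and cors:
            # Preflight: answer directly without reaching the wrapped app.
            cors += [
                ("Access-Control-Allow-Methods", "GET, POST, OPTIONS"),
                ("Access-Control-Allow-Headers", "Content-Type"),
            ]
            start_response("204 No Content", cors)
            return [b""]

        def start_with_cors(status, headers, exc_info=None):
            # Append the CORS headers to whatever the wrapped app sends.
            return start_response(status, list(headers) + cors, exc_info)

        return app(environ, start_with_cors)

    return wrapped
```

Echoing only explicitly allowed origins, rather than sending `*`, keeps the API closed to arbitrary sites while still letting a known web app call it.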
### No Metrics
**Impact:**
- No visibility into usage patterns
- Can't track error rates
- No performance monitoring
**Mitigation:**
- Add Prometheus metrics (custom implementation)
- Use reverse proxy with metrics (e.g., nginx)
- Parse logs for analytics
### Naive Health Check
**Impact:**
- Health endpoint returns OK even if database down
- Monitoring systems can't detect database failures
**Mitigation:**
- Implement custom health check with database ping
- Monitor actual query endpoints (e.g., /lookup/track/test_id)
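
A health check with a database ping could follow the pattern below. This is a sketch using Python's `sqlite3` module purely for illustration; the `health` function, the 503 status, and the `"unavailable"` body are assumptions rather than behavior of the current API.

```python
import sqlite3


def health(conn):
    """Return (http_status, body) after a trivial round trip to the database."""
    try:
        conn.execute("SELECT 1").fetchone()
    except sqlite3.Error as exc:
        # Report the failure so that monitoring can distinguish it from "ok".
        return 503, {"status": "unavailable", "error": str(exc)}
    return 200, {"status": "ok"}
```

Returning a non-200 status when the ping fails lets load balancers and monitors detect a database outage without issuing real lookup queries.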