Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

Music Metadata API - API Reference

API Overview

Music Metadata API exposes a RESTful HTTP API with 11 endpoints for querying music metadata. The API is fully documented with OpenAPI 3.1 and includes an interactive Swagger UI.

Base URL: http://localhost:8080 (configurable via -addr flag)
Content-Type: application/json
Authentication: None (public API)
CORS: Not supported
Rate Limiting: 100 requests/second, 200 burst (per-IP)

Endpoints

Batch Operations

POST /batch/lookup

Retrieve multiple tracks, albums, and artists in a single request.

Request Body:

{
  "tracks": ["track_id_1", "track_id_2"],
  "artists": ["artist_id_1", "artist_id_2"],
  "albums": ["album_id_1", "album_id_2"],
  "isrcs": ["USRC12345678", "GBUM71234567"]
}

Constraints:

  • Maximum 400 items total across all arrays
  • Every field is optional, but at least one must be present
  • Duplicate IDs allowed (deduplicated in response)
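
Clients holding more than 400 identifiers need to split them across several requests. The following is a minimal sketch of that chunking, assuming the limit applies to the combined length of all four arrays as stated above. `chunk_batch_payloads` and `MAX_BATCH_ITEMS` are hypothetical names, not part of the API.

```python
from typing import Dict, Iterator, List

MAX_BATCH_ITEMS = 400  # documented limit across all arrays combined


def chunk_batch_payloads(
    ids_by_kind: Dict[str, List[str]], max_items: int = MAX_BATCH_ITEMS
) -> Iterator[Dict[str, List[str]]]:
    """Yield /batch/lookup request bodies holding at most max_items IDs each.

    ids_by_kind maps a request field ("tracks", "artists", "albums", "isrcs")
    to the IDs to look up. Duplicates within a field are dropped first.
    """
    payload: Dict[str, List[str]] = {}
    count = 0
    for kind, ids in ids_by_kind.items():
        for item in dict.fromkeys(ids):  # de-duplicate while keeping order
            payload.setdefault(kind, []).append(item)
            count += 1
            if count == max_items:
                yield payload
                payload, count = {}, 0
    if payload:  # emit the final, partially filled body
        yield payload
```

Each yielded dictionary can be sent as the JSON body of one POST /batch/lookup call.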

Response:

{
  "tracks": {
    "track_id_1": {
      "id": "track_id_1",
      "name": "Song Title",
      "isrc": "USRC12345678",
      "duration_ms": 240000,
      "explicit": false,
      "track_number": 1,
      "disc_number": 1,
      "popularity": 85,
      "preview_url": "https://...",
      "album": { /* Album object */ },
      "artists": [ /* Artist objects */ ],
      "original_title": "Original Title",
      "version_title": "Radio Edit",
      "has_lyrics": true,
      "languages": ["en", "es"],
      "artist_roles": {
        "artist_id_1": ["performer", "composer"]
      }
    }
  },
  "artists": {
    "artist_id_1": { /* Artist object */ }
  },
  "albums": {
    "album_id_1": { /* Album object */ }
  },
  "isrcs": {
    "USRC12345678": { /* Track object */ }
  }
}

Status Codes:

  • 200 OK - Success (even if some items not found)
  • 400 Bad Request - Invalid request (exceeds 400 items, malformed JSON)
  • 429 Too Many Requests - Rate limit exceeded
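
Because the endpoint answers 200 even when some items are unknown, callers have to compare the response against what they requested. The sketch below assumes (the source does not say) that unknown IDs are either absent from the response maps or mapped to null. `missing_ids` is a hypothetical helper.

```python
from typing import Dict, List


def missing_ids(
    request: Dict[str, List[str]], response: Dict[str, dict]
) -> Dict[str, List[str]]:
    """Return, per request field, the IDs that have no usable response entry."""
    missing: Dict[str, List[str]] = {}
    for kind, ids in request.items():
        found = response.get(kind) or {}
        # Treat both an absent key and a null value as "not found".
        absent = [i for i in dict.fromkeys(ids) if found.get(i) is None]
        if absent:
            missing[kind] = absent
    return missing
```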

Performance:

  • Optimized with batch queries (7 queries for 400 items vs 2,400 individual queries)
  • Typical response time: 100-500ms for 400 items

Example:

curl -X POST http://localhost:8080/batch/lookup \
  -H "Content-Type: application/json" \
  -d '{
    "isrcs": ["USRC17607839", "GBUM71029604"],
    "tracks": ["4cOdK2wGLETKBW3PvgPWqT"]
  }'

Track Lookups

GET /lookup/isrc/{isrc}

Retrieve track by ISRC (International Standard Recording Code).

Path Parameters:

  • isrc - ISRC code (e.g., USRC12345678)
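
An ISRC is 12 characters: a 2-letter prefix, a 3-character alphanumeric registrant code, a 2-digit year, and a 5-digit designation code. Validating it client-side avoids spending a request on an obviously malformed value. This is a sketch; `normalize_isrc` is a hypothetical helper, and whether the API itself accepts hyphenated or lowercase input is not documented.

```python
import re

# Prefix (2 letters) + registrant (3 alphanumerics) + year (2 digits) + designation (5 digits)
_ISRC_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")


def normalize_isrc(raw: str) -> str:
    """Return the compact 12-character form of an ISRC, or raise ValueError."""
    compact = raw.replace("-", "").strip().upper()
    if not _ISRC_RE.fullmatch(compact):
        raise ValueError(f"not a valid ISRC: {raw!r}")
    return compact
```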

Response:

{
  "id": "track_id",
  "name": "Song Title",
  "isrc": "USRC12345678",
  "duration_ms": 240000,
  "explicit": false,
  "track_number": 1,
  "disc_number": 1,
  "popularity": 85,
  "preview_url": "https://p.scdn.co/mp3-preview/...",
  "album": {
    "id": "album_id",
    "name": "Album Title",
    "album_type": "album",
    "label": "Record Label",
    "release_date": "2023-01-15",
    "release_date_precision": "day",
    "external_id_upc": "123456789012",
    "total_tracks": 12,
    "copyright_c": "2023 Label",
    "copyright_p": "2023 Label",
    "images": [
      {
        "url": "https://i.scdn.co/image/...",
        "width": 640,
        "height": 640
      }
    ],
    "artists": [ /* Album artists */ ]
  },
  "artists": [
    {
      "id": "artist_id",
      "name": "Artist Name",
      "followers_total": 1000000,
      "popularity": 90,
      "genres": ["pop", "rock"],
      "images": [ /* Artist images */ ]
    }
  ],
  "original_title": "Original Title",
  "version_title": "Radio Edit",
  "has_lyrics": true,
  "languages": ["en"],
  "artist_roles": {
    "artist_id": ["performer", "composer"]
  }
}

Status Codes:

  • 200 OK - Track found
  • 404 Not Found - ISRC not in database
  • 429 Too Many Requests - Rate limit exceeded

Example:

curl http://localhost:8080/lookup/isrc/USRC17607839

GET /lookup/track/{id}

Retrieve track by internal track ID.

Path Parameters:

  • id - Track ID (internal identifier)

Response: Same as /lookup/isrc/{isrc}

Status Codes:

  • 200 OK - Track found
  • 404 Not Found - Track ID not in database
  • 429 Too Many Requests - Rate limit exceeded

Example:

curl http://localhost:8080/lookup/track/4cOdK2wGLETKBW3PvgPWqT

Artist Lookups

GET /lookup/artist/{id}

Retrieve artist by ID.

Path Parameters:

  • id - Artist ID

Response:

{
  "id": "artist_id",
  "name": "Artist Name",
  "followers_total": 1000000,
  "popularity": 90,
  "genres": ["pop", "rock", "indie"],
  "images": [
    {
      "url": "https://i.scdn.co/image/...",
      "width": 640,
      "height": 640
    },
    {
      "url": "https://i.scdn.co/image/...",
      "width": 320,
      "height": 320
    }
  ]
}

Status Codes:

  • 200 OK - Artist found
  • 404 Not Found - Artist ID not in database
  • 429 Too Many Requests - Rate limit exceeded

Example:

curl http://localhost:8080/lookup/artist/0TnOYISbd1XYRBk9myaseg

Album Lookups

GET /lookup/album/{id}

Retrieve album by ID.

Path Parameters:

  • id - Album ID

Response:

{
  "id": "album_id",
  "name": "Album Title",
  "album_type": "album",
  "label": "Record Label",
  "release_date": "2023-01-15",
  "release_date_precision": "day",
  "external_id_upc": "123456789012",
  "total_tracks": 12,
  "copyright_c": "2023 Label",
  "copyright_p": "2023 Label",
  "images": [
    {
      "url": "https://i.scdn.co/image/...",
      "width": 640,
      "height": 640
    }
  ],
  "artists": [
    {
      "id": "artist_id",
      "name": "Artist Name",
      "followers_total": 1000000,
      "popularity": 90,
      "genres": ["pop"],
      "images": [ /* Artist images */ ]
    }
  ]
}

Status Codes:

  • 200 OK - Album found
  • 404 Not Found - Album ID not in database
  • 429 Too Many Requests - Rate limit exceeded

Example:

curl http://localhost:8080/lookup/album/2ODvWsOgouMbaA5xf0RkJe

GET /lookup/album/{id}/tracks

Retrieve all tracks for an album.

Path Parameters:

  • id - Album ID

Response:

{
  "tracks": [
    {
      "id": "track_id_1",
      "name": "Track 1",
      "track_number": 1,
      "disc_number": 1,
      /* Full track object */
    },
    {
      "id": "track_id_2",
      "name": "Track 2",
      "track_number": 2,
      "disc_number": 1,
      /* Full track object */
    }
  ]
}

Status Codes:

  • 200 OK - Album found (even if no tracks)
  • 404 Not Found - Album ID not in database
  • 429 Too Many Requests - Rate limit exceeded

Example:

curl http://localhost:8080/lookup/album/2ODvWsOgouMbaA5xf0RkJe/tracks
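
The source does not state the order in which tracks are returned. If playback order matters, a client can sort on the documented disc_number and track_number fields. A minimal sketch; `in_album_order` is a hypothetical helper.

```python
from typing import List


def in_album_order(tracks: List[dict]) -> List[dict]:
    """Return tracks sorted by disc, then by position on the disc."""
    return sorted(
        tracks,
        key=lambda t: (t.get("disc_number", 1), t.get("track_number", 0)),
    )
```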

Search

GET /search/track

Search tracks by name.

Query Parameters:

  • q - Search query (minimum 2 characters, required)
  • limit - Maximum results (default 10, max 50)

Search Behavior:

  • Case-insensitive substring match (LIKE %query% COLLATE NOCASE)
  • Ordered by popularity (descending)
  • 10-second timeout
  • Returns partial matches
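
The parameter rules above can be checked before a request is sent, which avoids a round trip that would end in 400. This sketch builds the request URL with the standard library. `build_search_url` is a hypothetical helper, and the lower bound of 1 on limit is an assumption, since only the maximum is documented.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080"  # default address from this document


def build_search_url(kind: str, query: str, limit: int = 10, base_url: str = BASE_URL) -> str:
    """Validate search parameters and return the full request URL."""
    if kind not in ("track", "artist"):
        raise ValueError("kind must be 'track' or 'artist'")
    if len(query) < 2:
        raise ValueError("query must be at least 2 characters")
    if not 1 <= limit <= 50:  # lower bound of 1 is assumed
        raise ValueError("limit must be between 1 and 50")
    return f"{base_url}/search/{kind}?{urlencode({'q': query, 'limit': limit})}"
```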

Response:

{
  "tracks": [
    {
      "id": "track_id",
      "name": "Song Title",
      /* Full track object */
    }
  ],
  "total": 1
}

Status Codes:

  • 200 OK - Search completed (even if no results)
  • 400 Bad Request - Query too short (< 2 chars) or limit too high (> 50)
  • 429 Too Many Requests - Rate limit exceeded
  • 504 Gateway Timeout - Search exceeded 10 seconds

Example:

curl "http://localhost:8080/search/track?q=bohemian&limit=5"

Performance Note: Search uses LIKE %query%. The leading wildcard prevents an ordinary index on the name column from narrowing the search, so the query falls back to scanning. Searches on common terms may be slow (full table scan on 256M tracks).
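
The effect can be observed in a throwaway database. The sketch below assumes SQLite (suggested by COLLATE NOCASE) and uses a hypothetical two-column `tracks` table, not the service's real schema. It asks SQLite for the query plan of a substring match and of a prefix match against the same indexed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tracks (id INTEGER PRIMARY KEY, name TEXT COLLATE NOCASE);
    CREATE INDEX idx_tracks_name ON tracks (name);
    """
)


def query_plan(sql: str) -> str:
    """Return the human-readable plan SQLite chooses for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)  # column 3 holds the detail text


# Leading wildcard: every row has to be examined.
substring_plan = query_plan("SELECT id FROM tracks WHERE name LIKE '%bohemian%'")
# No leading wildcard: the index can bound the range of candidate rows.
prefix_plan = query_plan("SELECT id FROM tracks WHERE name LIKE 'bohemian%'")

print(substring_plan)
print(prefix_plan)
```

The first plan reports a scan and the second an index search. A full-text index is the usual remedy for substring-style search, but nothing in the source indicates the service uses one.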

GET /search/artist

Search artists by name.

Query Parameters:

  • q - Search query (minimum 2 characters, required)
  • limit - Maximum results (default 10, max 50)

Search Behavior:

  • Case-insensitive substring match (LIKE %query% COLLATE NOCASE)
  • Ordered by follower count (descending)
  • 10-second timeout
  • Returns partial matches

Response:

{
  "artists": [
    {
      "id": "artist_id",
      "name": "Artist Name",
      /* Full artist object */
    }
  ],
  "total": 1
}

Status Codes:

  • 200 OK - Search completed (even if no results)
  • 400 Bad Request - Query too short (< 2 chars) or limit too high (> 50)
  • 429 Too Many Requests - Rate limit exceeded
  • 504 Gateway Timeout - Search exceeded 10 seconds

Example:

curl "http://localhost:8080/search/artist?q=beatles&limit=5"

Health & Documentation

GET /health

Health check endpoint for monitoring.

Response:

{
  "status": "ok"
}

Status Codes:

  • 200 OK - Always (even if database unreachable)

Limitation: This is a naive health check. It doesn't verify database connectivity. A database failure won't be detected until actual queries fail.

Example:

curl http://localhost:8080/health

GET /docs

Interactive Swagger UI for API documentation.

Response: HTML page with embedded Swagger UI

Dependencies:

  • Loads Swagger UI from unpkg.com CDN (browser-side)
  • Requires internet connection for first load
  • Fetches OpenAPI spec from /openapi.yaml

Example:

# Open in browser (macOS; use xdg-open on Linux)
open http://localhost:8080/docs

GET /openapi.yaml

OpenAPI 3.1 specification in YAML format.

Response: YAML document with full API specification

Content-Type: application/x-yaml

Example:

curl http://localhost:8080/openapi.yaml

Rate Limiting

Algorithm

Implementation: Token bucket per IP address

Configuration:

  • Rate: 100 requests/second
  • Burst: 200 requests
  • Scope: Per-IP (extracted from X-Forwarded-For or RemoteAddr)

Behavior

Token bucket mechanics:

  1. Each IP gets a bucket with 200 tokens (burst capacity)
  2. Tokens refill at 100/second
  3. Each request consumes 1 token
  4. If bucket empty, request rejected with HTTP 429

Example scenarios:

| Scenario | Tokens Available | Result |
|----------|------------------|--------|
| First request | 200 | Allowed (199 remaining) |
| 200 requests in one instantaneous burst | 200 → 0 | All allowed |
| 201st request immediately after | 0 | Rejected (429) |
| Wait 1 second | 0 → 100 | 100 requests allowed |
| Steady 50 req/s | Always > 0 | Never rate limited |
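
The mechanics above can be reproduced with a small simulation. This is a sketch of a generic token bucket using the documented rate and burst values, not the server's actual implementation. `TokenBucket` is a hypothetical class, and time is passed in explicitly so the scenarios are deterministic.

```python
class TokenBucket:
    """Token bucket with continuous refill, capped at the burst size."""

    def __init__(self, rate: float = 100.0, burst: int = 200, now: float = 0.0):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.tokens = float(burst)  # a new visitor starts with a full bucket
        self.last = now             # time of the previous refill

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the server would answer 429 here


bucket = TokenBucket()
burst_results = [bucket.allow(now=0.0) for _ in range(201)]
print(burst_results.count(True), burst_results[-1])
```

Calling `allow` 201 times at the same instant admits the first 200 requests and rejects the last. One second later, 100 tokens have been refilled.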

Response Headers

When rate limited (HTTP 429):

HTTP/1.1 429 Too Many Requests
Retry-After: 1
Content-Type: text/plain

Rate limit exceeded

Retry-After: Seconds to wait before retrying (always 1)

IP Extraction

Priority:

  1. X-Forwarded-For header (first IP if comma-separated)
  2. RemoteAddr from connection

Example:

X-Forwarded-For: 203.0.113.1, 198.51.100.1
→ Rate limiter uses 203.0.113.1
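
The priority order above can be sketched as follows. `client_ip` is a hypothetical helper. It assumes the header mapping uses the canonical "X-Forwarded-For" key and that RemoteAddr always carries a port, as in "host:port" or "[ipv6]:port".

```python
from typing import Mapping


def client_ip(headers: Mapping[str, str], remote_addr: str) -> str:
    """Pick the address used as the rate-limit key."""
    forwarded = headers.get("X-Forwarded-For", "")
    first = forwarded.split(",")[0].strip()  # left-most entry of the list
    if first:
        return first
    # Fall back to the connection address and drop the port.
    host, sep, _port = remote_addr.rpartition(":")
    return (host if sep else remote_addr).strip("[]")
```

X-Forwarded-For is set by the client unless a trusted proxy overwrites it, so a limiter keyed this way can be sidestepped by varying the header when the API is exposed directly.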

Known Issues

Memory leak: Visitor map grows unbounded. No cleanup for inactive IPs. Long-running servers will accumulate memory over time.

Workaround: Restart server periodically or implement custom cleanup.
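
A custom cleanup usually records when each IP was last seen and periodically evicts entries that have been idle for longer than some threshold. The sketch below shows the idea; `VisitorRegistry` and the 180-second default are assumptions for illustration, not taken from the project.

```python
from typing import Dict


class VisitorRegistry:
    """Tracks last activity per IP so idle entries can be dropped."""

    def __init__(self, idle_ttl: float = 180.0):
        self.idle_ttl = idle_ttl               # seconds of inactivity tolerated
        self.last_seen: Dict[str, float] = {}  # ip -> time of last request

    def touch(self, ip: str, now: float) -> None:
        """Record a request from ip at time now."""
        self.last_seen[ip] = now

    def evict_idle(self, now: float) -> int:
        """Remove visitors idle for longer than idle_ttl; return how many."""
        stale = [ip for ip, seen in self.last_seen.items() if now - seen > self.idle_ttl]
        for ip in stale:
            del self.last_seen[ip]
        return len(stale)
```

In a server, `evict_idle` would run on a timer, and the per-IP bucket would be stored next to the timestamp so both are released together.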

Data Models

Track Object

{
  "id": "string",                    // Internal track ID
  "name": "string",                  // Track title
  "isrc": "string",                  // ISRC code (optional)
  "duration_ms": 0,                  // Duration in milliseconds
  "explicit": false,                 // Explicit content flag
  "track_number": 0,                 // Track number on album
  "disc_number": 0,                  // Disc number (multi-disc albums)
  "popularity": 0,                   // Popularity score (0-100)
  "preview_url": "string",           // 30-second preview URL (optional)
  "album": { /* Album object */ },   // Parent album
  "artists": [ /* Artist objects */ ], // Track artists
  "original_title": "string",        // Original title (optional)
  "version_title": "string",         // Version (e.g., "Radio Edit") (optional)
  "has_lyrics": false,               // Lyrics availability flag
  "languages": ["string"],           // Languages of performance (optional)
  "artist_roles": {                  // Artist roles map (optional)
    "artist_id": ["role1", "role2"]
  }
}

Field notes:

  • isrc: May be null for some tracks
  • preview_url: May be null if no preview available
  • popularity: Higher = more popular (Spotify-style metric)
  • languages: ISO 639-1 codes (e.g., "en", "es")
  • artist_roles: Maps artist ID to roles (e.g., "performer", "composer", "producer")
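
Since several track fields are optional or may be null, client code should not index into them blindly. A small sketch that formats a track with its credited roles; `describe_track` is a hypothetical helper.

```python
def describe_track(track: dict) -> str:
    """Build a one-line description, tolerating missing optional fields."""
    roles = track.get("artist_roles") or {}
    credits = []
    for artist in track.get("artists") or []:
        artist_roles = roles.get(artist["id"], [])
        suffix = f" ({', '.join(artist_roles)})" if artist_roles else ""
        credits.append(artist["name"] + suffix)
    isrc = track.get("isrc") or "no ISRC"
    return f"{track['name']} [{isrc}] by {', '.join(credits) or 'unknown artist'}"
```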

Album Object

{
  "id": "string",                    // Internal album ID
  "name": "string",                  // Album title
  "album_type": "string",            // "album", "single", "compilation"
  "label": "string",                 // Record label (optional)
  "release_date": "string",          // ISO 8601 date (YYYY-MM-DD)
  "release_date_precision": "string", // "year", "month", "day"
  "external_id_upc": "string",       // UPC barcode (optional)
  "total_tracks": 0,                 // Total tracks on album
  "copyright_c": "string",           // Copyright notice (optional)
  "copyright_p": "string",           // Phonographic copyright (optional)
  "images": [ /* Image objects */ ], // Album artwork (optional)
  "artists": [ /* Artist objects */ ] // Album artists (optional)
}

Field notes:

  • album_type: Typically "album", "single", or "compilation"
  • release_date_precision: Indicates granularity of release date
  • external_id_upc: Universal Product Code (barcode)
  • images: Sorted by size (largest first)

Artist Object

{
  "id": "string",                    // Internal artist ID
  "name": "string",                  // Artist name
  "followers_total": 0,              // Total followers (optional)
  "popularity": 0,                   // Popularity score 0-100 (optional)
  "genres": ["string"],              // Genres (optional)
  "images": [ /* Image objects */ ]  // Artist images (optional)
}

Field notes:

  • followers_total: Spotify-style follower count
  • popularity: Higher = more popular
  • genres: Multiple genres possible (e.g., ["pop", "rock"])
  • images: Sorted by size (largest first)

Image Object

{
  "url": "string",    // Image URL (typically i.scdn.co)
  "width": 0,         // Width in pixels
  "height": 0         // Height in pixels
}

Field notes:

  • URLs reference external CDN (i.scdn.co)
  • Multiple sizes available (640x640, 320x320, 64x64 typical)
  • Images not hosted by API (external references)
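
With several sizes available, a client typically wants the smallest image that still fills its display slot. A sketch that does not rely on the list ordering; `pick_image` is a hypothetical helper.

```python
from typing import List, Optional


def pick_image(images: Optional[List[dict]], min_width: int) -> Optional[dict]:
    """Return the smallest image at least min_width wide, else the largest one."""
    if not images:
        return None
    big_enough = [img for img in images if img.get("width", 0) >= min_width]
    if big_enough:
        return min(big_enough, key=lambda img: img["width"])
    # Nothing is wide enough: fall back to the best available.
    return max(images, key=lambda img: img.get("width", 0))
```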

Error Responses

Standard Error Format

{
  "error": "Error message"
}

Content-Type: application/json (for structured errors) or text/plain (for simple errors)

Common Error Codes

| Status | Meaning | Common Causes |
|--------|---------|---------------|
| 400 | Bad Request | Invalid query params, malformed JSON, validation failure |
| 404 | Not Found | ID/ISRC not in database |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Database error, query timeout |
| 504 | Gateway Timeout | Search exceeded 10 seconds |

Example Error Responses

404 Not Found:

{
  "error": "Track not found"
}

400 Bad Request:

{
  "error": "Query must be at least 2 characters"
}

429 Too Many Requests:

Rate limit exceeded

500 Internal Server Error:

{
  "error": "Database query failed"
}
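
Because error bodies arrive either as JSON or as plain text, a client needs to handle both. A minimal sketch; `error_message` is a hypothetical helper that takes the response's Content-Type header and body text.

```python
import json


def error_message(content_type: str, body: str) -> str:
    """Extract a readable message from a JSON or plain-text error body."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            return body.strip()  # mislabeled body: fall back to the raw text
        if isinstance(payload, dict) and "error" in payload:
            return str(payload["error"])
    return body.strip()
```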

OpenAPI Specification

Metadata

openapi: 3.1.0
info:
  title: Music Metadata API
  version: 1.0.0
  description: API for querying music metadata from 256M tracks
  license:
    name: MIT
servers:
  - url: http://localhost:8080
    description: Local development server

Example Endpoint Definition

/lookup/track/{id}:
  get:
    summary: Get track by ID
    operationId: getTrack
    parameters:
      - name: id
        in: path
        required: true
        schema:
          type: string
        description: Track ID
    responses:
      '200':
        description: Track found
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Track'
      '404':
        description: Track not found
      '429':
        description: Rate limit exceeded

Schema Definitions

All data models defined in components/schemas:

  • Track
  • Album
  • Artist
  • Image
  • BatchRequest
  • BatchResponse

Access: http://localhost:8080/openapi.yaml

Usage Examples

Batch Lookup (Python)

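import requests

url = "http://localhost:8080/batch/lookup"
payload = {
    "isrcs": ["USRC17607839", "GBUM71029604"],
    "tracks": ["4cOdK2wGLETKBW3PvgPWqT"]
}

# A timeout keeps the client from hanging if the server stalls.
response = requests.post(url, json=payload, timeout=10)
# Surface 400/429/5xx as exceptions instead of parsing an error body.
response.raise_for_status()
data = response.json()

for isrc, track in data.get("isrcs", {}).items():
    # Guard against tracks that carry no artists.
    artists = track.get("artists") or []
    artist_name = artists[0]["name"] if artists else "Unknown artist"
    print(f"{isrc}: {track['name']} by {artist_name}")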

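Search with Result Limit (JavaScript)

// The API has no CORS support, so run this server-side (e.g., Node.js 18+)
// or from the same origin. Top-level await requires an ES module.
async function searchTracks(query, limit = 10) {
  const url = `http://localhost:8080/search/track?q=${encodeURIComponent(query)}&limit=${limit}`;
  const response = await fetch(url);
  if (!response.ok) {
    // 400 (bad query), 429 (rate limited), or 504 (search timeout)
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.tracks;
}

const tracks = await searchTracks("bohemian rhapsody", 5);
tracks.forEach(track => {
  console.log(`${track.name} - ${track.album.name}`);
});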

Rate Limit Handling (Go)

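import (
    "fmt"
    "net/http"
    "strconv"
    "time"
)

// fetchWithRetry issues a GET and retries on HTTP 429, honoring Retry-After.
// It gives up after maxRetries retries. The caller must close the returned body.
func fetchWithRetry(url string, maxRetries int) (*http.Response, error) {
    for attempt := 0; attempt <= maxRetries; attempt++ {
        resp, err := http.Get(url)
        if err != nil {
            return nil, err
        }

        if resp.StatusCode != http.StatusTooManyRequests {
            return resp, nil
        }
        resp.Body.Close() // release the connection before retrying

        // Fall back to 1 second if Retry-After is missing or malformed.
        seconds, err := strconv.Atoi(resp.Header.Get("Retry-After"))
        if err != nil || seconds < 1 {
            seconds = 1
        }
        time.Sleep(time.Duration(seconds) * time.Second)
    }
    return nil, fmt.Errorf("still rate limited after %d retries: %s", maxRetries, url)
}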

Performance Considerations

Batch vs Individual Requests

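Individual requests (400 tracks):

  • 400 HTTP requests
  • Sequential: 400 × ~50ms = 20 seconds
  • Parallel: bounded by the rate limiter; once the 200-request burst is spent, the remaining 200 requests refill at 100 req/s (about 2 seconds minimum)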

Batch request (400 tracks):

  • 1 HTTP request
  • ~200-500ms total
  • 40-100x faster

Recommendation: Always use batch endpoint for multiple items.

Search Performance

Fast searches:

  • Short, specific queries ("beatles")
  • Queries matching popular items (returned first)

Slow searches:

  • Common words ("love", "the")
  • Long queries with many results
  • Queries requiring full table scan

Recommendation: Implement client-side caching for common searches.

Caching Strategy

Cacheable:

  • Track/album/artist lookups (data rarely changes)
  • Search results (cache for 1 hour)

Not cacheable:

  • Health checks
  • OpenAPI spec (changes with deployments)

Recommendation: The API does not currently send HTTP caching headers, so cache responses client-side or at a reverse proxy until it does.
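
A client-side cache with a time-to-live is one way to apply the one-hour guideline for search results. The sketch below is an in-memory example; `TTLCache` is a hypothetical class, and the clock is injectable so expiry can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() if absent or expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

A search client would use the query string and limit as the key and pass a function that performs the HTTP request as `fetch`.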

Integration Patterns

Enrichment Pipeline

1. Extract ISRCs from audio files (e.g., from embedded tags, or via AcoustID → MusicBrainz)
   ↓
2. Batch lookup ISRCs (400 at a time)
   ↓
3. Store track metadata in local database
   ↓
4. Fetch missing artists/albums (batch lookup preferred over individual calls)
   ↓
5. Update local cache

Complementing MusicBrainz

MusicBrainz (MBID-based)
   ↓
Resolve ISRC from MusicBrainz
   ↓
Lookup ISRC in Music Metadata API
   ↓
Merge metadata (MusicBrainz + Spotify-style data)

Real-time Lookup

User plays track
   ↓
Extract ISRC from file
   ↓
Check local cache
   ↓
If miss: GET /lookup/isrc/{isrc}
   ↓
Display metadata in UI
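
The cache-then-fetch step of this flow can be sketched as below. `fetch_track_by_isrc` and `lookup_isrc` are hypothetical helpers. The base URL is the document's default, and a 404 is cached as None so that unknown ISRCs are not requested again.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional

BASE_URL = "http://localhost:8080"  # default address from this document


def fetch_track_by_isrc(isrc: str) -> Optional[dict]:
    """GET /lookup/isrc/{isrc}; return the track, or None on 404."""
    try:
        with urllib.request.urlopen(f"{BASE_URL}/lookup/isrc/{isrc}", timeout=5) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # 429, 5xx and others are left to the caller


def lookup_isrc(
    isrc: str,
    cache: Dict[str, Optional[dict]],
    fetch: Callable[[str], Optional[dict]] = fetch_track_by_isrc,
) -> Optional[dict]:
    """Return track metadata from the local cache, fetching it on a miss."""
    if isrc in cache:
        return cache[isrc]
    track = fetch(isrc)
    cache[isrc] = track  # negative results are cached as None
    return track
```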

Limitations

No Authentication

Impact:

  • Anyone can query API
  • No usage tracking per user
  • No quota enforcement per user

Mitigation:

  • Deploy behind reverse proxy with auth
  • Use firewall rules to restrict access
  • Implement API gateway with authentication

No CORS

Impact:

  • Cross-origin requests from browser-based clients are blocked
  • Web apps served from another origin can't call the API directly

Mitigation:

  • Add CORS middleware (custom implementation)
  • Use server-side proxy
  • Deploy API on same origin as web app

No Metrics

Impact:

  • No visibility into usage patterns
  • Can't track error rates
  • No performance monitoring

Mitigation:

  • Add Prometheus metrics (custom implementation)
  • Use reverse proxy with metrics (e.g., nginx)
  • Parse logs for analytics

Naive Health Check

Impact:

  • Health endpoint returns OK even if database down
  • Monitoring systems can't detect database failures

Mitigation:

  • Implement custom health check with database ping
  • Monitor actual query endpoints (e.g., /lookup/track/test_id)
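
A health check that pings the database can be as small as running a trivial query and mapping failure to a non-200 status. The sketch below uses Python's sqlite3 module purely for illustration. `health_status` is a hypothetical function, and returning 503 on failure is an assumption, not the project's behavior.

```python
import sqlite3
from typing import Tuple


def health_status(conn: sqlite3.Connection) -> Tuple[int, dict]:
    """Return (HTTP status, JSON body) based on a trivial database query."""
    try:
        conn.execute("SELECT 1").fetchone()
    except sqlite3.Error as exc:
        # Database unreachable or unusable: report it instead of a blanket "ok".
        return 503, {"status": "unavailable", "error": str(exc)}
    return 200, {"status": "ok"}
```

A monitoring system polling such an endpoint would see the failure as soon as the database stops answering, rather than when user queries start to fail.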