feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
@@ -0,0 +1,895 @@
|
||||
# Music Metadata API - API Reference
|
||||
|
||||
## API Overview
|
||||
|
||||
Music Metadata API exposes a RESTful HTTP API with 11 endpoints for querying music metadata. The API is fully documented with OpenAPI 3.1 and includes an interactive Swagger UI.
|
||||
|
||||
**Base URL:** `http://localhost:8080` (configurable via `-addr` flag)
|
||||
**Content-Type:** `application/json`
|
||||
**Authentication:** None (public API)
|
||||
**CORS:** Not supported
|
||||
**Rate Limiting:** 100 requests/second, 200 burst (per-IP)
|
||||
|
||||
## Endpoints
|
||||
|
||||
### Batch Operations
|
||||
|
||||
#### POST /batch/lookup
|
||||
|
||||
Retrieve multiple tracks, albums, and artists in a single request.
|
||||
|
||||
**Request Body:**
|
||||
```json
|
||||
{
|
||||
"tracks": ["track_id_1", "track_id_2"],
|
||||
"artists": ["artist_id_1", "artist_id_2"],
|
||||
"albums": ["album_id_1", "album_id_2"],
|
||||
"isrcs": ["USRC12345678", "GBUM71234567"]
|
||||
}
|
||||
```
|
||||
|
||||
**Constraints:**
|
||||
- Maximum 400 items total across all arrays
|
||||
- Each field is optional, but at least one must be provided
|
||||
- Duplicate IDs allowed (deduplicated in response)
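
The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not the server's implementation: the `BatchRequest` type name, the `validateBatch` helper, and the choice to count duplicates toward the 400-item total are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// BatchRequest mirrors the documented request body.
// The Go type and field names are hypothetical.
type BatchRequest struct {
	Tracks  []string `json:"tracks,omitempty"`
	Artists []string `json:"artists,omitempty"`
	Albums  []string `json:"albums,omitempty"`
	ISRCs   []string `json:"isrcs,omitempty"`
}

const maxBatchItems = 400 // documented limit across all arrays

// validateBatch enforces "at least one item" and "at most 400 items in total".
// Assumption: duplicates count toward the total.
func validateBatch(req BatchRequest) error {
	total := len(req.Tracks) + len(req.Artists) + len(req.Albums) + len(req.ISRCs)
	if total == 0 {
		return errors.New("at least one of tracks, artists, albums, isrcs is required")
	}
	if total > maxBatchItems {
		return fmt.Errorf("batch has %d items, maximum is %d", total, maxBatchItems)
	}
	return nil
}

func main() {
	req := BatchRequest{ISRCs: []string{"USRC17607839", "GBUM71029604"}}
	fmt.Println(validateBatch(req)) // <nil>
}
```

Rejecting an oversized batch locally avoids spending a rate-limit token on a request that would return `400 Bad Request`.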
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"tracks": {
|
||||
"track_id_1": {
|
||||
"id": "track_id_1",
|
||||
"name": "Song Title",
|
||||
"isrc": "USRC12345678",
|
||||
"duration_ms": 240000,
|
||||
"explicit": false,
|
||||
"track_number": 1,
|
||||
"disc_number": 1,
|
||||
"popularity": 85,
|
||||
"preview_url": "https://...",
|
||||
"album": { /* Album object */ },
|
||||
"artists": [ /* Artist objects */ ],
|
||||
"original_title": "Original Title",
|
||||
"version_title": "Radio Edit",
|
||||
"has_lyrics": true,
|
||||
"languages": ["en", "es"],
|
||||
"artist_roles": {
|
||||
"artist_id_1": ["performer", "composer"]
|
||||
}
|
||||
}
|
||||
},
|
||||
"artists": {
|
||||
"artist_id_1": { /* Artist object */ }
|
||||
},
|
||||
"albums": {
|
||||
"album_id_1": { /* Album object */ }
|
||||
},
|
||||
"isrcs": {
|
||||
"USRC12345678": { /* Track object */ }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Status Codes:**
|
||||
- `200 OK` - Success (even if some items not found)
|
||||
- `400 Bad Request` - Invalid request (exceeds 400 items, malformed JSON)
|
||||
- `429 Too Many Requests` - Rate limit exceeded
|
||||
|
||||
**Performance:**
|
||||
- Optimized with batch queries (7 queries for 400 items vs 2,400 individual queries)
|
||||
- Typical response time: 100-500ms for 400 items
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
curl -X POST http://localhost:8080/batch/lookup \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"isrcs": ["USRC17607839", "GBUM71029604"],
|
||||
"tracks": ["4cOdK2wGLETKBW3PvgPWqT"]
|
||||
}'
|
||||
```
|
||||
|
||||
### Track Lookups
|
||||
|
||||
#### GET /lookup/isrc/{isrc}
|
||||
|
||||
Retrieve track by ISRC (International Standard Recording Code).
|
||||
|
||||
**Path Parameters:**
|
||||
- `isrc` - ISRC code (e.g., `USRC12345678`)
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "track_id",
|
||||
"name": "Song Title",
|
||||
"isrc": "USRC12345678",
|
||||
"duration_ms": 240000,
|
||||
"explicit": false,
|
||||
"track_number": 1,
|
||||
"disc_number": 1,
|
||||
"popularity": 85,
|
||||
"preview_url": "https://p.scdn.co/mp3-preview/...",
|
||||
"album": {
|
||||
"id": "album_id",
|
||||
"name": "Album Title",
|
||||
"album_type": "album",
|
||||
"label": "Record Label",
|
||||
"release_date": "2023-01-15",
|
||||
"release_date_precision": "day",
|
||||
"external_id_upc": "123456789012",
|
||||
"total_tracks": 12,
|
||||
"copyright_c": "2023 Label",
|
||||
"copyright_p": "2023 Label",
|
||||
"images": [
|
||||
{
|
||||
"url": "https://i.scdn.co/image/...",
|
||||
"width": 640,
|
||||
"height": 640
|
||||
}
|
||||
],
|
||||
"artists": [ /* Album artists */ ]
|
||||
},
|
||||
"artists": [
|
||||
{
|
||||
"id": "artist_id",
|
||||
"name": "Artist Name",
|
||||
"followers_total": 1000000,
|
||||
"popularity": 90,
|
||||
"genres": ["pop", "rock"],
|
||||
"images": [ /* Artist images */ ]
|
||||
}
|
||||
],
|
||||
"original_title": "Original Title",
|
||||
"version_title": "Radio Edit",
|
||||
"has_lyrics": true,
|
||||
"languages": ["en"],
|
||||
"artist_roles": {
|
||||
"artist_id": ["performer", "composer"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Status Codes:**
|
||||
- `200 OK` - Track found
|
||||
- `404 Not Found` - ISRC not in database
|
||||
- `429 Too Many Requests` - Rate limit exceeded
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
curl http://localhost:8080/lookup/isrc/USRC17607839
|
||||
```
|
||||
|
||||
#### GET /lookup/track/{id}
|
||||
|
||||
Retrieve track by internal track ID.
|
||||
|
||||
**Path Parameters:**
|
||||
- `id` - Track ID (internal identifier)
|
||||
|
||||
**Response:** Same as `/lookup/isrc/{isrc}`
|
||||
|
||||
**Status Codes:**
|
||||
- `200 OK` - Track found
|
||||
- `404 Not Found` - Track ID not in database
|
||||
- `429 Too Many Requests` - Rate limit exceeded
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
curl http://localhost:8080/lookup/track/4cOdK2wGLETKBW3PvgPWqT
|
||||
```
|
||||
|
||||
### Artist Lookups
|
||||
|
||||
#### GET /lookup/artist/{id}
|
||||
|
||||
Retrieve artist by ID.
|
||||
|
||||
**Path Parameters:**
|
||||
- `id` - Artist ID
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "artist_id",
|
||||
"name": "Artist Name",
|
||||
"followers_total": 1000000,
|
||||
"popularity": 90,
|
||||
"genres": ["pop", "rock", "indie"],
|
||||
"images": [
|
||||
{
|
||||
"url": "https://i.scdn.co/image/...",
|
||||
"width": 640,
|
||||
"height": 640
|
||||
},
|
||||
{
|
||||
"url": "https://i.scdn.co/image/...",
|
||||
"width": 320,
|
||||
"height": 320
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Status Codes:**
|
||||
- `200 OK` - Artist found
|
||||
- `404 Not Found` - Artist ID not in database
|
||||
- `429 Too Many Requests` - Rate limit exceeded
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
curl http://localhost:8080/lookup/artist/0TnOYISbd1XYRBk9myaseg
|
||||
```
|
||||
|
||||
### Album Lookups
|
||||
|
||||
#### GET /lookup/album/{id}
|
||||
|
||||
Retrieve album by ID.
|
||||
|
||||
**Path Parameters:**
|
||||
- `id` - Album ID
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "album_id",
|
||||
"name": "Album Title",
|
||||
"album_type": "album",
|
||||
"label": "Record Label",
|
||||
"release_date": "2023-01-15",
|
||||
"release_date_precision": "day",
|
||||
"external_id_upc": "123456789012",
|
||||
"total_tracks": 12,
|
||||
"copyright_c": "2023 Label",
|
||||
"copyright_p": "2023 Label",
|
||||
"images": [
|
||||
{
|
||||
"url": "https://i.scdn.co/image/...",
|
||||
"width": 640,
|
||||
"height": 640
|
||||
}
|
||||
],
|
||||
"artists": [
|
||||
{
|
||||
"id": "artist_id",
|
||||
"name": "Artist Name",
|
||||
"followers_total": 1000000,
|
||||
"popularity": 90,
|
||||
"genres": ["pop"],
|
||||
"images": [ /* Artist images */ ]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Status Codes:**
|
||||
- `200 OK` - Album found
|
||||
- `404 Not Found` - Album ID not in database
|
||||
- `429 Too Many Requests` - Rate limit exceeded
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
curl http://localhost:8080/lookup/album/2ODvWsOgouMbaA5xf0RkJe
|
||||
```
|
||||
|
||||
#### GET /lookup/album/{id}/tracks
|
||||
|
||||
Retrieve all tracks for an album.
|
||||
|
||||
**Path Parameters:**
|
||||
- `id` - Album ID
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"tracks": [
|
||||
{
|
||||
"id": "track_id_1",
|
||||
"name": "Track 1",
|
||||
"track_number": 1,
|
||||
"disc_number": 1,
|
||||
/* Full track object */
|
||||
},
|
||||
{
|
||||
"id": "track_id_2",
|
||||
"name": "Track 2",
|
||||
"track_number": 2,
|
||||
"disc_number": 1,
|
||||
/* Full track object */
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Status Codes:**
|
||||
- `200 OK` - Album found (even if no tracks)
|
||||
- `404 Not Found` - Album ID not in database
|
||||
- `429 Too Many Requests` - Rate limit exceeded
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
curl http://localhost:8080/lookup/album/2ODvWsOgouMbaA5xf0RkJe/tracks
|
||||
```
|
||||
|
||||
### Search
|
||||
|
||||
#### GET /search/track
|
||||
|
||||
Search tracks by name.
|
||||
|
||||
**Query Parameters:**
|
||||
- `q` - Search query (minimum 2 characters, required)
|
||||
- `limit` - Maximum results (default 10, max 50)
|
||||
|
||||
**Search Behavior:**
|
||||
- Case-insensitive substring match (`LIKE %query% COLLATE NOCASE`)
|
||||
- Ordered by popularity (descending)
|
||||
- 10-second timeout
|
||||
- Returns partial matches
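
The query parameter rules above might be validated along these lines. This is a sketch under assumptions: `parseSearchParams` is a hypothetical helper, the minimum length is counted in Unicode characters rather than bytes, and a `limit` below 1 is rejected even though only the upper bound is documented.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
)

const (
	minQueryLen  = 2  // documented minimum query length
	defaultLimit = 10 // documented default
	maxLimit     = 50 // documented maximum
)

// parseSearchParams validates q and limit and returns the effective values.
// Hypothetical helper; the real handler may differ.
func parseSearchParams(values url.Values) (string, int, error) {
	q := values.Get("q")
	if len([]rune(q)) < minQueryLen {
		return "", 0, errors.New("Query must be at least 2 characters")
	}
	limit := defaultLimit
	if raw := values.Get("limit"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n < 1 || n > maxLimit {
			return "", 0, errors.New("limit must be between 1 and 50")
		}
		limit = n
	}
	return q, limit, nil
}

func main() {
	values, _ := url.ParseQuery("q=bohemian&limit=5")
	q, limit, err := parseSearchParams(values)
	fmt.Println(q, limit, err) // bohemian 5 <nil>
}
```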
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"tracks": [
|
||||
{
|
||||
"id": "track_id",
|
||||
"name": "Song Title",
|
||||
/* Full track object */
|
||||
}
|
||||
],
|
||||
"total": 1
|
||||
}
|
||||
```
|
||||
|
||||
**Status Codes:**
|
||||
- `200 OK` - Search completed (even if no results)
|
||||
- `400 Bad Request` - Query too short (< 2 chars) or limit too high (> 50)
|
||||
- `429 Too Many Requests` - Rate limit exceeded
|
||||
- `504 Gateway Timeout` - Search exceeded 10 seconds
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
curl "http://localhost:8080/search/track?q=bohemian&limit=5"
|
||||
```
|
||||
|
||||
**Performance Note:** Search uses `LIKE %query%`. The leading wildcard prevents a B-tree index on the name column from being used, so a search may require a full table scan (256M tracks) and can be slow for common terms.
|
||||
|
||||
#### GET /search/artist
|
||||
|
||||
Search artists by name.
|
||||
|
||||
**Query Parameters:**
|
||||
- `q` - Search query (minimum 2 characters, required)
|
||||
- `limit` - Maximum results (default 10, max 50)
|
||||
|
||||
**Search Behavior:**
|
||||
- Case-insensitive substring match (`LIKE %query% COLLATE NOCASE`)
|
||||
- Ordered by follower count (descending)
|
||||
- 10-second timeout
|
||||
- Returns partial matches
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"artists": [
|
||||
{
|
||||
"id": "artist_id",
|
||||
"name": "Artist Name",
|
||||
/* Full artist object */
|
||||
}
|
||||
],
|
||||
"total": 1
|
||||
}
|
||||
```
|
||||
|
||||
**Status Codes:**
|
||||
- `200 OK` - Search completed (even if no results)
|
||||
- `400 Bad Request` - Query too short (< 2 chars) or limit too high (> 50)
|
||||
- `429 Too Many Requests` - Rate limit exceeded
|
||||
- `504 Gateway Timeout` - Search exceeded 10 seconds
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
curl "http://localhost:8080/search/artist?q=beatles&limit=5"
|
||||
```
|
||||
|
||||
### Health & Documentation
|
||||
|
||||
#### GET /health
|
||||
|
||||
Health check endpoint for monitoring.
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"status": "ok"
|
||||
}
|
||||
```
|
||||
|
||||
**Status Codes:**
|
||||
- `200 OK` - Always (even if database unreachable)
|
||||
|
||||
**Limitation:** This is a naive health check. It doesn't verify database connectivity. A database failure won't be detected until actual queries fail.
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
curl http://localhost:8080/health
|
||||
```
|
||||
|
||||
#### GET /docs
|
||||
|
||||
Interactive Swagger UI for API documentation.
|
||||
|
||||
**Response:** HTML page with embedded Swagger UI
|
||||
|
||||
**Dependencies:**
|
||||
- Loads Swagger UI from unpkg.com CDN (browser-side)
|
||||
- Requires internet connection for first load
|
||||
- Fetches OpenAPI spec from `/openapi.yaml`
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
# Open in browser
|
||||
open http://localhost:8080/docs
|
||||
```
|
||||
|
||||
#### GET /openapi.yaml
|
||||
|
||||
OpenAPI 3.1 specification in YAML format.
|
||||
|
||||
**Response:** YAML document with full API specification
|
||||
|
||||
**Content-Type:** `application/x-yaml`
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
curl http://localhost:8080/openapi.yaml
|
||||
```
|
||||
|
||||
## Rate Limiting
|
||||
|
||||
### Algorithm
|
||||
|
||||
**Implementation:** Token bucket per IP address
|
||||
|
||||
**Configuration:**
|
||||
- **Rate:** 100 requests/second
|
||||
- **Burst:** 200 requests
|
||||
- **Scope:** Per-IP (extracted from `X-Forwarded-For` or `RemoteAddr`)
|
||||
|
||||
### Behavior
|
||||
|
||||
**Token bucket mechanics:**
|
||||
1. Each IP gets a bucket with 200 tokens (burst capacity)
|
||||
2. Tokens refill at 100/second
|
||||
3. Each request consumes 1 token
|
||||
4. If bucket empty, request rejected with HTTP 429
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
| Scenario | Tokens Available | Result |
|----------|------------------|--------|
| First request | 200 | Allowed (199 remaining) |
| 200 requests in a single burst | 200 → 0 | All allowed |
| 201st request immediately after the burst | 0 | Rejected (429) |
| Wait 1 second | 0 → 100 | 100 requests allowed |
| Steady 50 req/s | Always > 0 | Never rate limited |
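
The mechanics and scenarios above can be reproduced with a small standalone token bucket. This is an illustrative sketch, not the server's rate limiter: `tokenBucket`, `allow`, and `countAllowed` are hypothetical names, and time is passed in explicitly (in seconds) so the result is deterministic.

```go
package main

import "fmt"

// tokenBucket is a minimal single-goroutine token bucket (hypothetical type).
type tokenBucket struct {
	rate   float64 // tokens added per second
	burst  float64 // bucket capacity
	tokens float64 // tokens currently available
	last   float64 // time of the last refill, in seconds
}

// newTokenBucket starts with a full bucket, as described above.
func newTokenBucket(rate, burst float64) *tokenBucket {
	return &tokenBucket{rate: rate, burst: burst, tokens: burst}
}

// allow refills for the elapsed time, then spends one token if available.
func (b *tokenBucket) allow(now float64) bool {
	if now > b.last {
		b.tokens += (now - b.last) * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// countAllowed sends n requests at the same instant and counts the accepted ones.
func countAllowed(b *tokenBucket, n int, now float64) int {
	allowed := 0
	for i := 0; i < n; i++ {
		if b.allow(now) {
			allowed++
		}
	}
	return allowed
}

func main() {
	b := newTokenBucket(100, 200)
	fmt.Println(countAllowed(b, 201, 0)) // 200: the 201st request is rejected
	fmt.Println(countAllowed(b, 150, 1)) // 100: one second refills 100 tokens
}
```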
|
||||
|
||||
### Response Headers
|
||||
|
||||
**When rate limited (HTTP 429):**
|
||||
```
|
||||
HTTP/1.1 429 Too Many Requests
|
||||
Retry-After: 1
|
||||
Content-Type: text/plain
|
||||
|
||||
Rate limit exceeded
|
||||
```
|
||||
|
||||
**Retry-After:** Seconds to wait before retrying (always 1)
|
||||
|
||||
### IP Extraction
|
||||
|
||||
**Priority:**
|
||||
1. `X-Forwarded-For` header (first IP if comma-separated)
|
||||
2. `RemoteAddr` from connection
|
||||
|
||||
**Example:**
|
||||
```
|
||||
X-Forwarded-For: 203.0.113.1, 198.51.100.1
|
||||
→ Rate limiter uses 203.0.113.1
|
||||
```
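
The priority order above could be implemented roughly as follows. This is a sketch with hypothetical helper names (`ipFrom`, `clientIP`); stripping the port from `RemoteAddr` is an assumption about how the server keys its limiters.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
)

// ipFrom applies the documented priority: the first X-Forwarded-For entry,
// otherwise the host part of RemoteAddr. Hypothetical helper.
func ipFrom(forwardedFor, remoteAddr string) string {
	first, _, _ := strings.Cut(forwardedFor, ",")
	if ip := strings.TrimSpace(first); ip != "" {
		return ip
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr // no port present; use the value as-is
	}
	return host
}

// clientIP extracts the rate-limiting key from a request.
func clientIP(r *http.Request) string {
	return ipFrom(r.Header.Get("X-Forwarded-For"), r.RemoteAddr)
}

func main() {
	r := &http.Request{Header: http.Header{}, RemoteAddr: "192.0.2.10:54321"}
	fmt.Println(clientIP(r)) // 192.0.2.10

	r.Header.Set("X-Forwarded-For", "203.0.113.1, 198.51.100.1")
	fmt.Println(clientIP(r)) // 203.0.113.1
}
```

Note that `X-Forwarded-For` is supplied by the client unless a trusted proxy overwrites it, so a client that reaches the server directly can vary the header to obtain a fresh bucket per request.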
|
||||
|
||||
### Known Issues
|
||||
|
||||
**Memory leak:** Visitor map grows unbounded. No cleanup for inactive IPs. Long-running servers will accumulate memory over time.
|
||||
|
||||
**Workaround:** Restart server periodically or implement custom cleanup.
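
One possible shape for the custom cleanup is to record when each IP was last seen and periodically evict idle entries. The sketch below is an assumption about how that could look, not existing code: `visitorTable`, `touch`, and `evictIdle` are hypothetical, and a real version would store the per-IP limiter next to the timestamp.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// visitorTable tracks when each IP was last seen (hypothetical type).
type visitorTable struct {
	mu       sync.Mutex
	lastSeen map[string]time.Time
}

func newVisitorTable() *visitorTable {
	return &visitorTable{lastSeen: make(map[string]time.Time)}
}

// touch records activity for an IP; call it on every request.
func (v *visitorTable) touch(ip string, now time.Time) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.lastSeen[ip] = now
}

// evictIdle removes IPs not seen within maxIdle and returns how many were removed.
func (v *visitorTable) evictIdle(now time.Time, maxIdle time.Duration) int {
	v.mu.Lock()
	defer v.mu.Unlock()
	removed := 0
	for ip, seen := range v.lastSeen {
		if now.Sub(seen) > maxIdle {
			delete(v.lastSeen, ip)
			removed++
		}
	}
	return removed
}

// size reports the number of tracked IPs.
func (v *visitorTable) size() int {
	v.mu.Lock()
	defer v.mu.Unlock()
	return len(v.lastSeen)
}

// evictionDemo shows one idle and one active visitor with a 5-minute idle limit.
func evictionDemo() (removed, remaining int) {
	start := time.Unix(0, 0)
	v := newVisitorTable()
	v.touch("203.0.113.1", start)                     // idle for 10 minutes
	v.touch("198.51.100.1", start.Add(9*time.Minute)) // idle for 1 minute
	removed = v.evictIdle(start.Add(10*time.Minute), 5*time.Minute)
	return removed, v.size()
}

func main() {
	fmt.Println(evictionDemo()) // 1 1
}
```

In a server, `evictIdle` would typically run from a background goroutine driven by a `time.Ticker`.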
|
||||
|
||||
## Data Models
|
||||
|
||||
### Track Object
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "string", // Internal track ID
|
||||
"name": "string", // Track title
|
||||
"isrc": "string", // ISRC code (optional)
|
||||
"duration_ms": 0, // Duration in milliseconds
|
||||
"explicit": false, // Explicit content flag
|
||||
"track_number": 0, // Track number on album
|
||||
"disc_number": 0, // Disc number (multi-disc albums)
|
||||
"popularity": 0, // Popularity score (0-100)
|
||||
"preview_url": "string", // 30-second preview URL (optional)
|
||||
"album": { /* Album object */ }, // Parent album
|
||||
"artists": [ /* Artist objects */ ], // Track artists
|
||||
"original_title": "string", // Original title (optional)
|
||||
"version_title": "string", // Version (e.g., "Radio Edit") (optional)
|
||||
"has_lyrics": false, // Lyrics availability flag
|
||||
"languages": ["string"], // Languages of performance (optional)
|
||||
"artist_roles": { // Artist roles map (optional)
|
||||
"artist_id": ["role1", "role2"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Field notes:**
|
||||
- `isrc`: May be null for some tracks
|
||||
- `preview_url`: May be null if no preview available
|
||||
- `popularity`: Higher = more popular (Spotify-style metric)
|
||||
- `languages`: ISO 639-1 codes (e.g., "en", "es")
|
||||
- `artist_roles`: Maps artist ID to roles (e.g., "performer", "composer", "producer")
|
||||
|
||||
### Album Object
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "string", // Internal album ID
|
||||
"name": "string", // Album title
|
||||
"album_type": "string", // "album", "single", "compilation"
|
||||
"label": "string", // Record label (optional)
|
||||
"release_date": "string", // ISO 8601 date (YYYY-MM-DD)
|
||||
"release_date_precision": "string", // "year", "month", "day"
|
||||
"external_id_upc": "string", // UPC barcode (optional)
|
||||
"total_tracks": 0, // Total tracks on album
|
||||
"copyright_c": "string", // Copyright notice (optional)
|
||||
"copyright_p": "string", // Phonographic copyright (optional)
|
||||
"images": [ /* Image objects */ ], // Album artwork (optional)
|
||||
"artists": [ /* Artist objects */ ] // Album artists (optional)
|
||||
}
|
||||
```
|
||||
|
||||
**Field notes:**
|
||||
- `album_type`: Typically "album", "single", or "compilation"
|
||||
- `release_date_precision`: Indicates granularity of release date
|
||||
- `external_id_upc`: Universal Product Code (barcode)
|
||||
- `images`: Sorted by size (largest first)
|
||||
|
||||
### Artist Object
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "string", // Internal artist ID
|
||||
"name": "string", // Artist name
|
||||
"followers_total": 0, // Total followers (optional)
|
||||
"popularity": 0, // Popularity score 0-100 (optional)
|
||||
"genres": ["string"], // Genres (optional)
|
||||
"images": [ /* Image objects */ ] // Artist images (optional)
|
||||
}
|
||||
```
|
||||
|
||||
**Field notes:**
|
||||
- `followers_total`: Spotify-style follower count
|
||||
- `popularity`: Higher = more popular
|
||||
- `genres`: Multiple genres possible (e.g., ["pop", "rock"])
|
||||
- `images`: Sorted by size (largest first)
|
||||
|
||||
### Image Object
|
||||
|
||||
```json
|
||||
{
|
||||
"url": "string", // Image URL (typically i.scdn.co)
|
||||
"width": 0, // Width in pixels
|
||||
"height": 0 // Height in pixels
|
||||
}
|
||||
```
|
||||
|
||||
**Field notes:**
|
||||
- URLs reference external CDN (i.scdn.co)
|
||||
- Multiple sizes available (640x640, 320x320, 64x64 typical)
|
||||
- Images not hosted by API (external references)
|
||||
|
||||
## Error Responses
|
||||
|
||||
### Standard Error Format
|
||||
|
||||
```json
|
||||
{
|
||||
"error": "Error message"
|
||||
}
|
||||
```
|
||||
|
||||
**Content-Type:** `application/json` (for structured errors) or `text/plain` (for simple errors)
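
A helper that emits the structured form consistently might look like the sketch below. `writeError` and `renderError` are hypothetical names, not functions confirmed to exist in the codebase; `renderError` only exists to show the output using `httptest`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// writeError sends {"error": msg} with the given status code (hypothetical helper).
func writeError(w http.ResponseWriter, status int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status) // headers must be set before WriteHeader
	json.NewEncoder(w).Encode(map[string]string{"error": msg})
}

// renderError captures what writeError produces, for demonstration.
func renderError(status int, msg string) (int, string, string) {
	rec := httptest.NewRecorder()
	writeError(rec, status, msg)
	return rec.Code, rec.Header().Get("Content-Type"), strings.TrimSpace(rec.Body.String())
}

func main() {
	fmt.Println(renderError(http.StatusNotFound, "Track not found"))
	// 404 application/json {"error":"Track not found"}
}
```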
|
||||
|
||||
### Common Error Codes
|
||||
|
||||
| Status | Meaning | Common Causes |
|--------|---------|---------------|
| 400 | Bad Request | Invalid query params, malformed JSON, validation failure |
| 404 | Not Found | ID/ISRC not in database |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Database error, query timeout |
| 504 | Gateway Timeout | Search exceeded 10 seconds |
|
||||
|
||||
### Example Error Responses
|
||||
|
||||
**404 Not Found:**
|
||||
```json
|
||||
{
|
||||
"error": "Track not found"
|
||||
}
|
||||
```
|
||||
|
||||
**400 Bad Request:**
|
||||
```json
|
||||
{
|
||||
"error": "Query must be at least 2 characters"
|
||||
}
|
||||
```
|
||||
|
||||
**429 Too Many Requests:**
|
||||
```
|
||||
Rate limit exceeded
|
||||
```
|
||||
|
||||
**500 Internal Server Error:**
|
||||
```json
|
||||
{
|
||||
"error": "Database query failed"
|
||||
}
|
||||
```
|
||||
|
||||
## OpenAPI Specification
|
||||
|
||||
### Metadata
|
||||
|
||||
```yaml
|
||||
openapi: 3.1.0
|
||||
info:
|
||||
title: Music Metadata API
|
||||
version: 1.0.0
|
||||
description: API for querying music metadata from 256M tracks
|
||||
license:
|
||||
name: MIT
|
||||
servers:
|
||||
- url: http://localhost:8080
|
||||
description: Local development server
|
||||
```
|
||||
|
||||
### Example Endpoint Definition
|
||||
|
||||
```yaml
|
||||
/lookup/track/{id}:
|
||||
get:
|
||||
summary: Get track by ID
|
||||
operationId: getTrack
|
||||
parameters:
|
||||
- name: id
|
||||
in: path
|
||||
required: true
|
||||
schema:
|
||||
type: string
|
||||
description: Track ID
|
||||
responses:
|
||||
'200':
|
||||
description: Track found
|
||||
content:
|
||||
application/json:
|
||||
schema:
|
||||
$ref: '#/components/schemas/Track'
|
||||
'404':
|
||||
description: Track not found
|
||||
'429':
|
||||
description: Rate limit exceeded
|
||||
```
|
||||
|
||||
### Schema Definitions
|
||||
|
||||
All data models defined in `components/schemas`:
|
||||
- `Track`
|
||||
- `Album`
|
||||
- `Artist`
|
||||
- `Image`
|
||||
- `BatchRequest`
|
||||
- `BatchResponse`
|
||||
|
||||
**Access:** http://localhost:8080/openapi.yaml
|
||||
|
||||
## Usage Examples
|
||||
|
||||
### Batch Lookup (Python)
|
||||
|
||||
```python
import requests

url = "http://localhost:8080/batch/lookup"
payload = {
    "isrcs": ["USRC17607839", "GBUM71029604"],
    "tracks": ["4cOdK2wGLETKBW3PvgPWqT"],
}

response = requests.post(url, json=payload, timeout=10)
response.raise_for_status()  # surface 400/429/5xx instead of parsing an error body
data = response.json()

# Only ISRCs present in the response are printed.
for isrc, track in data.get("isrcs", {}).items():
    artists = track.get("artists") or []
    artist = artists[0]["name"] if artists else "unknown artist"
    print(f"{isrc}: {track['name']} by {artist}")
```
|
||||
|
||||
### Search with a Result Limit (JavaScript)
|
||||
|
||||
```javascript
async function searchTracks(query, limit = 10) {
  const url = `http://localhost:8080/search/track?q=${encodeURIComponent(query)}&limit=${limit}`;
  const response = await fetch(url);
  if (!response.ok) {
    // 400 (bad query or limit), 429 (rate limited), and 504 (timeout) land here
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.tracks;
}

const tracks = await searchTracks("bohemian rhapsody", 5);
tracks.forEach(track => {
  console.log(`${track.name} - ${track.album.name}`);
});
```
|
||||
|
||||
### Rate Limit Handling (Go)
|
||||
|
||||
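```go
import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// fetchWithRetry retries on HTTP 429, honoring Retry-After, up to maxRetries times.
func fetchWithRetry(url string, maxRetries int) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		resp, err := http.Get(url)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests {
			return resp, nil // caller closes resp.Body
		}
		retryAfter := resp.Header.Get("Retry-After")
		resp.Body.Close() // release the connection before retrying
		if attempt >= maxRetries {
			return nil, fmt.Errorf("rate limited after %d retries", maxRetries)
		}
		// Retry-After is in seconds; fall back to 1s if missing or malformed.
		seconds, err := strconv.Atoi(retryAfter)
		if err != nil || seconds < 1 {
			seconds = 1
		}
		time.Sleep(time.Duration(seconds) * time.Second)
	}
}
```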
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Batch vs Individual Requests
|
||||
|
||||
**Individual requests (400 tracks):**
|
||||
- 400 HTTP requests
|
||||
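- 400 × ~50ms = 20 seconds when issued sequentially
- Even when issued in parallel, the rate limiter sets a floor: the 200-token burst covers the first 200 requests, and the remaining 200 are admitted at 100 req/s (about 2 seconds minimum)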
|
||||
|
||||
**Batch request (400 tracks):**
|
||||
- 1 HTTP request
|
||||
- ~200-500ms total
|
||||
- **40-100x faster**
|
||||
|
||||
**Recommendation:** Always use batch endpoint for multiple items.
|
||||
|
||||
### Search Performance
|
||||
|
||||
**Fast searches:**
|
||||
- Short, specific queries ("beatles")
|
||||
- Queries matching popular items (returned first)
|
||||
|
||||
**Slow searches:**
|
||||
- Common words ("love", "the")
|
||||
- Long queries with many results
|
||||
- Queries requiring full table scan
|
||||
|
||||
**Recommendation:** Implement client-side caching for common searches.
|
||||
|
||||
### Caching Strategy
|
||||
|
||||
**Cacheable:**
|
||||
- Track/album/artist lookups (data rarely changes)
|
||||
- Search results (cache for 1 hour)
|
||||
|
||||
**Not cacheable:**
|
||||
- Health checks
|
||||
- OpenAPI spec (changes with deployments)
|
||||
|
||||
**Recommendation:** Cache responses on the client or at a reverse proxy. The API does not currently send HTTP caching headers, so caches must apply their own expiry times.
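
A client-side cache with a fixed time-to-live is one way to apply the strategy above. The sketch below assumes a one-hour TTL for search results, as suggested earlier in this section; `ttlCache` and its methods are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry is a cached response body with its expiry time.
type entry struct {
	value   string
	expires time.Time
}

// ttlCache is a minimal in-memory cache with a fixed TTL (hypothetical type).
type ttlCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	items map[string]entry
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, items: make(map[string]entry)}
}

// set stores a value that expires ttl after now.
func (c *ttlCache) set(key, value string, now time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value, expires: now.Add(c.ttl)}
}

// get returns the value if present and not expired; expired entries are dropped.
func (c *ttlCache) get(key string, now time.Time) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok || !now.Before(e.expires) {
		delete(c.items, key)
		return "", false
	}
	return e.value, true
}

// cacheDemo reads one key 30 minutes and 2 hours after it was stored.
func cacheDemo() (freshHit, staleHit bool) {
	start := time.Unix(0, 0)
	key := "/search/track?q=bohemian&limit=5"
	c := newTTLCache(time.Hour)
	c.set(key, `{"tracks":[],"total":0}`, start)
	_, freshHit = c.get(key, start.Add(30*time.Minute))
	_, staleHit = c.get(key, start.Add(2*time.Hour))
	return freshHit, staleHit
}

func main() {
	fmt.Println(cacheDemo()) // true false
}
```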
|
||||
|
||||
## Integration Patterns
|
||||
|
||||
### Enrichment Pipeline
|
||||
|
||||
```
|
||||
1. Obtain ISRCs for audio files (e.g., from file tags, or from an AcoustID fingerprint resolved via MusicBrainz)
|
||||
↓
|
||||
2. Batch lookup ISRCs (400 at a time)
|
||||
↓
|
||||
3. Store track metadata in local database
|
||||
↓
|
||||
4. Fetch missing artists/albums individually
|
||||
↓
|
||||
5. Update local cache
|
||||
```
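
Step 2 of the pipeline requires splitting the ISRC list into groups that respect the 400-item batch limit. A minimal sketch, where `chunk` is a hypothetical helper and the ISRC values are placeholders:

```go
package main

import "fmt"

const maxBatchItems = 400 // documented batch limit

// chunk splits ids into consecutive slices of at most size elements.
// The returned slices share memory with ids.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	isrcs := make([]string, 1000) // placeholder ISRCs
	for i, batch := range chunk(isrcs, maxBatchItems) {
		fmt.Printf("batch %d: %d ISRCs\n", i+1, len(batch))
	}
	// batch 1: 400 ISRCs
	// batch 2: 400 ISRCs
	// batch 3: 200 ISRCs
}
```

Each resulting batch becomes the `isrcs` array of one `POST /batch/lookup` request.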
|
||||
|
||||
### Complementing MusicBrainz
|
||||
|
||||
```
|
||||
MusicBrainz (MBID-based)
|
||||
↓
|
||||
Resolve ISRC from MusicBrainz
|
||||
↓
|
||||
Lookup ISRC in Music Metadata API
|
||||
↓
|
||||
Merge metadata (MusicBrainz + Spotify-style data)
|
||||
```
|
||||
|
||||
### Real-time Lookup
|
||||
|
||||
```
|
||||
User plays track
|
||||
↓
|
||||
Extract ISRC from file
|
||||
↓
|
||||
Check local cache
|
||||
↓
|
||||
If miss: GET /lookup/isrc/{isrc}
|
||||
↓
|
||||
Display metadata in UI
|
||||
```
|
||||
|
||||
## Limitations
|
||||
|
||||
### No Authentication
|
||||
|
||||
**Impact:**
|
||||
- Anyone can query API
|
||||
- No usage tracking per user
|
||||
- No quota enforcement per user
|
||||
|
||||
**Mitigation:**
|
||||
- Deploy behind reverse proxy with auth
|
||||
- Use firewall rules to restrict access
|
||||
- Implement API gateway with authentication
|
||||
|
||||
### No CORS
|
||||
|
||||
**Impact:**
|
||||
- Browser-based clients blocked
|
||||
- Can't call from web apps directly
|
||||
|
||||
**Mitigation:**
|
||||
- Add CORS middleware (custom implementation)
|
||||
- Use server-side proxy
|
||||
- Deploy API on same origin as web app
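
A custom CORS middleware for the first mitigation could be as small as the sketch below. It is not part of the API: `withCORS` is a hypothetical wrapper, `https://app.example.com` is a placeholder origin, and the allowed methods and headers are assumptions based on the documented endpoints.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withCORS allows one fixed origin and answers preflight requests (hypothetical middleware).
func withCORS(allowedOrigin string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("Access-Control-Allow-Origin", allowedOrigin)
		h.Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
		h.Set("Access-Control-Allow-Headers", "Content-Type")
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusNoContent) // preflight: no body, do not call next
			return
		}
		next.ServeHTTP(w, r)
	})
}

// corsProbe sends one request through the middleware and reports status and origin header.
func corsProbe(method string) (int, string) {
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(method, "/search/track?q=beatles", nil)
	withCORS("https://app.example.com", inner).ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	fmt.Println(corsProbe(http.MethodOptions)) // 204 https://app.example.com
	fmt.Println(corsProbe(http.MethodGet))     // 200 https://app.example.com
}
```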
|
||||
|
||||
### No Metrics
|
||||
|
||||
**Impact:**
|
||||
- No visibility into usage patterns
|
||||
- Can't track error rates
|
||||
- No performance monitoring
|
||||
|
||||
**Mitigation:**
|
||||
- Add Prometheus metrics (custom implementation)
|
||||
- Use reverse proxy with metrics (e.g., nginx)
|
||||
- Parse logs for analytics
|
||||
|
||||
### Naive Health Check
|
||||
|
||||
**Impact:**
|
||||
- Health endpoint returns OK even if database down
|
||||
- Monitoring systems can't detect database failures
|
||||
|
||||
**Mitigation:**
|
||||
- Implement custom health check with database ping
|
||||
- Monitor actual query endpoints (e.g., /lookup/track/test_id)
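
A health check with a database ping might look like the sketch below. It is an assumption about a possible fix, not the current handler: `healthHandler`, the `pinger` interface, the 2-second timeout, and the `503` response are all hypothetical. `*sql.DB` satisfies `pinger`, so the real connection could be passed in directly; a fake is used here so the example runs without a database.

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pinger is the subset of *sql.DB the health check needs.
type pinger interface {
	PingContext(ctx context.Context) error
}

var _ pinger = (*sql.DB)(nil) // compile-time check that *sql.DB fits

// healthHandler reports 503 when the database does not answer a ping.
func healthHandler(db pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()

		status, code := "ok", http.StatusOK
		if err := db.PingContext(ctx); err != nil {
			status, code = "unavailable", http.StatusServiceUnavailable
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(code)
		json.NewEncoder(w).Encode(map[string]string{"status": status})
	}
}

// fakeDB stands in for a real database in this demonstration.
type fakeDB struct{ err error }

func (f fakeDB) PingContext(context.Context) error { return f.err }

var errDown = errors.New("database unreachable")

// probe calls the handler once and returns the HTTP status code.
func probe(dbErr error) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/health", nil)
	healthHandler(fakeDB{err: dbErr}).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(probe(nil))     // 200
	fmt.Println(probe(errDown)) // 503
}
```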
|
||||
@@ -0,0 +1,626 @@
|
||||
# Music Metadata API - Architecture
|
||||
|
||||
## Architectural Overview
|
||||
|
||||
Music Metadata API follows a clean 3-layer architecture with clear separation of concerns:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ HTTP Clients │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ API Layer (internal/api) │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Handlers │ │ Rate Limiter │ │ OpenAPI │ │
|
||||
│ │ (routing) │ │ (middleware) │ │ (docs) │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Database Layer (internal/db) │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Queries │ │ Enrichment │ │ Batch │ │
|
||||
│ │ (SQL) │ │ (joins) │ │ Optimization │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Models Layer (internal/models) │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Track │ │ Album │ │ Artist │ │
|
||||
│ │ (struct) │ │ (struct) │ │ (struct) │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ SQLite Databases (read-only) │
|
||||
│ ┌──────────────────────────┐ ┌──────────────────────────┐ │
|
||||
│ │ main_database.sqlite3 │ │ track_files.sqlite3 │ │
|
||||
│ │ (~117GB) │ │ (~99GB) │ │
|
||||
│ └──────────────────────────┘ └──────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Directory Structure
|
||||
|
||||
```
|
||||
music-metadata-api/
|
||||
├── cmd/
|
||||
│ └── server/
|
||||
│ └── main.go # Entry point (62 lines)
|
||||
│
|
||||
├── internal/
|
||||
│ ├── api/
|
||||
│ │ ├── handlers.go # HTTP route handlers
|
||||
│ │ ├── ratelimit.go # Token bucket rate limiter
|
||||
│ │ └── openapi.go # OpenAPI spec + Swagger UI
|
||||
│ │
|
||||
│ ├── db/
|
||||
│ │ └── db.go # Database layer (907 lines)
|
||||
│ │
|
||||
│ └── models/
|
||||
│ └── models.go # Data structures (65 lines)
|
||||
│
|
||||
├── Dockerfile # Multi-stage build
|
||||
├── docker-compose.yml # Production deployment
|
||||
├── go.mod # Dependencies
|
||||
├── go.sum # Dependency checksums
|
||||
├── .gitignore # Excludes databases, binaries
|
||||
└── .github/
|
||||
└── workflows/
|
||||
└── docker-publish.yml # CI/CD pipeline
|
||||
```
|
||||
|
||||
## Layer Breakdown
|
||||
|
||||
### Entry Point: cmd/server/main.go
|
||||
|
||||
**Responsibilities:**
|
||||
- Parse CLI flags (`-db`, `-addr`)
|
||||
- Initialize database connections
|
||||
- Set up HTTP router
|
||||
- Configure graceful shutdown
|
||||
- Start HTTP server
|
||||
|
||||
**Key code flow:**
|
||||
```go
|
||||
// 1. Parse flags
|
||||
dbPath := flag.String("db", "", "path to database")
|
||||
addr := flag.String("addr", ":8080", "server address")
|
||||
|
||||
// 2. Initialize database
|
||||
database, err := db.NewDatabase(*dbPath)
|
||||
|
||||
// 3. Set up router with rate limiting
|
||||
mux := http.NewServeMux()
|
||||
rateLimiter := api.NewRateLimiter(100, 200) // 100 req/s, 200 burst
|
||||
handler := rateLimiter.Limit(mux)
|
||||
|
||||
// 4. Register routes
|
||||
api.RegisterRoutes(mux, database)
|
||||
|
||||
// 5. Graceful shutdown on SIGINT/SIGTERM
|
||||
server := &http.Server{Addr: *addr, Handler: handler}
|
||||
// ... shutdown logic with 10s timeout
|
||||
```
|
||||
|
||||
**File size:** 62 lines (minimal, focused)
|
||||
|
||||
### API Layer: internal/api/
|
||||
|
||||
#### handlers.go
|
||||
|
||||
**Responsibilities:**
|
||||
- Route registration
|
||||
- Request parsing
|
||||
- Response serialization
|
||||
- Error handling
|
||||
- Query parameter validation
|
||||
|
||||
**Route patterns (Go 1.22+ enhanced routing):**
|
||||
```go
|
||||
// Method + path patterns
|
||||
mux.HandleFunc("POST /batch/lookup", handleBatchLookup)
|
||||
mux.HandleFunc("GET /lookup/isrc/{isrc}", handleISRCLookup)
|
||||
mux.HandleFunc("GET /lookup/track/{id}", handleTrackLookup)
|
||||
mux.HandleFunc("GET /lookup/artist/{id}", handleArtistLookup)
|
||||
mux.HandleFunc("GET /lookup/album/{id}", handleAlbumLookup)
|
||||
mux.HandleFunc("GET /lookup/album/{id}/tracks", handleAlbumTracks)
|
||||
mux.HandleFunc("GET /search/track", handleTrackSearch)
|
||||
mux.HandleFunc("GET /search/artist", handleArtistSearch)
|
||||
mux.HandleFunc("GET /health", handleHealth)
|
||||
mux.HandleFunc("GET /docs", handleDocs)
|
||||
mux.HandleFunc("GET /openapi.yaml", handleOpenAPI)
|
||||
```
|
||||
|
||||
**Handler pattern:**
|
||||
```go
|
||||
func handleTrackLookup(w http.ResponseWriter, r *http.Request) {
|
||||
// 1. Extract path parameter
|
||||
id := r.PathValue("id")
|
||||
|
||||
// 2. Call database layer
|
||||
track, err := db.GetTrack(id)
|
||||
if err != nil {
|
||||
http.Error(w, "Track not found", http.StatusNotFound)
|
||||
return
|
||||
}
|
||||
|
||||
// 3. Serialize response
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
json.NewEncoder(w).Encode(track)
|
||||
}
|
||||
```
|
||||
|
||||
**Validation rules:**
|
||||
- Search queries: minimum 2 characters
|
||||
- Batch requests: maximum 400 items
|
||||
- Limit parameters: maximum 50 results
|
||||
- Timeouts: 10 seconds for search queries
|
||||
|
||||
#### ratelimit.go
|
||||
|
||||
**Implementation:** Token bucket algorithm with per-IP tracking
|
||||
|
||||
**Data structure:**
|
||||
```go
|
||||
type RateLimiter struct {
|
||||
visitors map[string]*rate.Limiter // IP -> limiter
|
||||
mu sync.RWMutex // Protects visitors map
|
||||
rate rate.Limit // Tokens per second
|
||||
burst int // Burst capacity
|
||||
}
|
||||
```
|
||||
|
||||
**Algorithm:**
|
||||
1. Extract client IP from `X-Forwarded-For` header (fallback to `RemoteAddr`)
|
||||
2. Look up or create limiter for IP
|
||||
3. Check if token available (`limiter.Allow()`)
|
||||
4. If allowed, pass to next handler
|
||||
5. If denied, return HTTP 429 with `Retry-After` header
|
||||
|
||||
**BUG:** Visitor map grows unbounded. No cleanup mechanism for inactive IPs. Long-running servers will accumulate memory.
|
||||
|
||||
**Configuration:**
|
||||
- Rate: 100 requests/second
|
||||
- Burst: 200 requests
|
||||
- Scope: Per-IP (not per-user, no authentication)
|
||||
|
||||
#### openapi.go
|
||||
|
||||
**Responsibilities:**
|
||||
- Serve OpenAPI 3.1 specification at `/openapi.yaml`
|
||||
- Serve Swagger UI at `/docs`
|
||||
- Embed OpenAPI spec in binary (no external files)
|
||||
|
||||
**Swagger UI loading:**
|
||||
```html
|
||||
<!-- Loaded from unpkg.com CDN (browser-side) -->
|
||||
<script src="https://unpkg.com/swagger-ui-dist@5/swagger-ui-bundle.js"></script>
|
||||
<link rel="stylesheet" href="https://unpkg.com/swagger-ui-dist@5/swagger-ui.css" />
|
||||
```
|
||||
|
||||
**OpenAPI spec highlights:**
|
||||
- Version: 3.1.0
|
||||
- All endpoints documented
|
||||
- Request/response schemas
|
||||
- Example payloads
|
||||
- Error responses
|
||||
|
||||
### Database Layer: internal/db/db.go
|
||||
|
||||
**File size:** 907 lines (largest file in codebase)
|
||||
|
||||
**Responsibilities:**
|
||||
- SQLite connection management
|
||||
- Query execution
|
||||
- Data enrichment (joining related entities)
|
||||
- Batch optimization
|
||||
- Transaction handling (read-only)
|
||||
|
||||
#### Connection Management
|
||||
|
||||
**Dual database connections:**
|
||||
```go
|
||||
type Database struct {
|
||||
mainDB *sql.DB // main_database.sqlite3
|
||||
trackFilesDB *sql.DB // track_files.sqlite3
|
||||
}
|
||||
```
|
||||
|
||||
**Connection string PRAGMAs:**
|
||||
```
|
||||
file:/path/to/db.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
|
||||
```
|
||||
|
||||
**PRAGMA breakdown:**
|
||||
|
||||
| PRAGMA | Value | Purpose |
|--------|-------|---------|
| `mode=ro` | Read-only | Prevents accidental writes |
| `_journal_mode=off` | Disabled | No rollback journal (safe because the database is read-only) |
| `_cache_size=-64000` | ~64MB | Page cache size (a negative value is a size in KiB) |
| `_mmap_size=1073741824` | 1GB | Memory-mapped I/O size |
| `_query_only=true` | Enabled | Additional read-only enforcement |
|
||||
|
||||
**Connection pool:**
|
||||
```go
|
||||
db.SetMaxOpenConns(8) // Conservative limit
|
||||
db.SetMaxIdleConns(8) // Keep connections warm
|
||||
db.SetConnMaxLifetime(0) // No expiration
|
||||
```
|
||||
|
||||
#### Query Patterns
|
||||
|
||||
**Individual lookups:**
|
||||
```go
|
||||
func (d *Database) GetTrack(id string) (*models.Track, error) {
|
||||
// 1. Fetch base track + album
|
||||
row := d.mainDB.QueryRow(`
|
||||
SELECT t.id, t.name, t.isrc, t.duration_ms, t.explicit,
|
||||
t.track_number, t.disc_number, t.popularity, t.preview_url,
|
||||
a.id, a.name, a.album_type, a.label, a.release_date,
|
||||
a.release_date_precision, a.external_id_upc, a.total_tracks
|
||||
FROM tracks t
|
||||
JOIN albums a ON t.album_rowid = a.rowid
|
||||
WHERE t.id = ?
|
||||
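`, id)

// Scan the row into a Track (only the first destinations are shown;
// the rest follow the SELECT column order)
var track models.Track
if err := row.Scan(&track.ID, &track.Name /* , ... */); err != nil {
    return nil, err // sql.ErrNoRows when the ID does not exist
}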
|
||||
|
||||
// 2. Enrich album (images, artists)
|
||||
d.enrichAlbum(&track.Album)
|
||||
|
||||
// 3. Enrich track (artists, track_files)
|
||||
d.enrichTrack(&track)
|
||||
|
||||
return &track, nil
|
||||
}
|
||||
```
|
||||
|
||||
**Batch lookups:**
|
||||
```go
|
||||
func (d *Database) BatchGetByISRC(isrcs []string) (map[string]*models.Track, error) {
    tracks := make(map[string]*models.Track, len(isrcs))
    if len(isrcs) == 0 {
        return tracks, nil // strings.Repeat panics on a negative count
    }

    // 1. Build IN clause
    placeholders := strings.Repeat("?,", len(isrcs)-1) + "?"
    query := fmt.Sprintf(`
        SELECT t.id, t.isrc, ...
        FROM tracks t
        JOIN albums a ON t.album_rowid = a.rowid
        WHERE t.isrc IN (%s)
    `, placeholders)

    // 2. Execute batch query ([]string must be converted to []any for Query)
    args := make([]any, len(isrcs))
    for i, isrc := range isrcs {
        args[i] = isrc
    }
    rows, err := d.mainDB.Query(query, args...)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    // 3. Scan rows into tracks and collect IDs for enrichment (scan loop elided)
    trackIDs := make([]string, 0, len(isrcs))
    albumIDs := make([]string, 0, len(isrcs))

    // 4. Batch enrich all entities
    d.batchEnrichAlbums(albumIDs, tracks)
    d.batchEnrichTracks(trackIDs, tracks)

    return tracks, nil
}
|
||||
```
|
||||
|
||||
#### Data Enrichment Flow
|
||||
|
||||
**Track enrichment pipeline:**
|
||||
```
|
||||
1. Fetch base track + album (single JOIN)
|
||||
↓
|
||||
2. Enrich album:
|
||||
- Batch fetch album images (batchGetAlbumImages)
|
||||
- Batch fetch album artists (batchGetAlbumArtists)
|
||||
↓
|
||||
3. Enrich track:
|
||||
- Batch fetch track artists (batchGetTrackArtists)
|
||||
- Batch fetch track files (batchEnrichTrackFiles)
|
||||
↓
|
||||
4. Enrich artists:
|
||||
- Batch fetch artist genres (batchGetArtistGenres)
|
||||
- Batch fetch artist images (batchGetArtistImages)
|
||||
↓
|
||||
5. Return fully enriched track
|
||||
```
|
||||
|
||||
**Batch optimization functions:**
|
||||
|
||||
| Function | Purpose | Query Pattern |
|----------|---------|---------------|
| `batchGetAlbumImages` | Fetch all images for albums | `WHERE album_id IN (...)` |
| `batchGetAlbumArtists` | Fetch all artists for albums | `WHERE album_id IN (...)` |
| `batchGetTrackArtists` | Fetch all artists for tracks | `WHERE track_id IN (...)` |
| `batchGetArtistGenres` | Fetch all genres for artists | `WHERE artist_id IN (...)` |
| `batchGetArtistImages` | Fetch all images for artists | `WHERE artist_id IN (...)` |
| `batchEnrichTrackFiles` | Fetch extended track data | `WHERE track_id IN (...)` |
|
||||
|
||||
**Why batch optimization matters:**
|
||||
- Single batch request with 400 tracks triggers ~6 batch queries
|
||||
- Without batching: 400 tracks × 6 queries = 2,400 database queries
|
||||
- With batching: 1 main query + 6 batch queries = 7 database queries
|
||||
- **Performance gain: 343x fewer queries**
|
||||
|
||||
#### Search Implementation
|
||||
|
||||
**Track search:**
|
||||
```sql
|
||||
SELECT id, name, isrc, duration_ms, popularity, album_rowid
|
||||
FROM tracks
|
||||
WHERE name LIKE ? COLLATE NOCASE
|
||||
ORDER BY popularity DESC
|
||||
LIMIT ?
|
||||
```
|
||||
|
||||
**Artist search:**
|
||||
```sql
|
||||
SELECT id, name, followers_total, popularity
|
||||
FROM artists
|
||||
WHERE name LIKE ? COLLATE NOCASE
|
||||
ORDER BY followers_total DESC
|
||||
LIMIT ?
|
||||
```
|
||||
|
||||
**Search characteristics:**
|
||||
- Pattern: `%query%` (substring match)
|
||||
- Collation: `NOCASE` (case-insensitive)
|
||||
- Timeout: 10 seconds (context deadline)
|
||||
- Min query length: 2 characters
|
||||
- Max results: 50
|
||||
|
||||
**Performance concern:** A `LIKE '%query%'` pattern has a leading wildcard, so SQLite cannot use an index on `name`. Every search is a full table scan over 256M tracks and will be slow. FTS (Full-Text Search) would be faster but is not implemented.
|
||||
|
||||
### Models Layer: internal/models/models.go
|
||||
|
||||
**File size:** 65 lines (smallest layer)
|
||||
|
||||
**Responsibilities:**
|
||||
- Define data structures
|
||||
- JSON serialization tags
|
||||
- Nested relationships
|
||||
|
||||
**Core models:**
|
||||
|
||||
```go
|
||||
type Track struct {
|
||||
ID string `json:"id"`
|
||||
Name string `json:"name"`
|
||||
ISRC string `json:"isrc,omitempty"`
|
||||
DurationMs int `json:"duration_ms"`
|
||||
Explicit bool `json:"explicit"`
|
||||
TrackNumber int `json:"track_number"`
|
||||
DiscNumber int `json:"disc_number"`
|
||||
Popularity int `json:"popularity"`
|
||||
PreviewURL string `json:"preview_url,omitempty"`
|
||||
Album Album `json:"album"`
|
||||
Artists []Artist `json:"artists"`
|
||||
|
||||
// Extended fields from track_files DB
|
||||
OriginalTitle string `json:"original_title,omitempty"`
|
||||
VersionTitle string `json:"version_title,omitempty"`
|
||||
HasLyrics bool `json:"has_lyrics"`
|
||||
Languages []string `json:"languages,omitempty"`
|
||||
ArtistRoles map[string][]string `json:"artist_roles,omitempty"`
|
||||
}
|
||||
|
||||
type Album struct {
|
||||
ID string `json:"id"`
|
||||
Name string `json:"name"`
|
||||
AlbumType string `json:"album_type"`
|
||||
Label string `json:"label,omitempty"`
|
||||
ReleaseDate string `json:"release_date"`
|
||||
ReleaseDatePrecision string `json:"release_date_precision"`
|
||||
ExternalIDUPC string `json:"external_id_upc,omitempty"`
|
||||
TotalTracks int `json:"total_tracks"`
|
||||
CopyrightC string `json:"copyright_c,omitempty"`
|
||||
CopyrightP string `json:"copyright_p,omitempty"`
|
||||
Images []Image `json:"images,omitempty"`
|
||||
Artists []Artist `json:"artists,omitempty"`
|
||||
}
|
||||
|
||||
type Artist struct {
|
||||
ID string `json:"id"`
|
||||
Name string `json:"name"`
|
||||
FollowersTotal int `json:"followers_total,omitempty"`
|
||||
Popularity int `json:"popularity,omitempty"`
|
||||
Genres []string `json:"genres,omitempty"`
|
||||
Images []Image `json:"images,omitempty"`
|
||||
}
|
||||
|
||||
type Image struct {
|
||||
URL string `json:"url"`
|
||||
Width int `json:"width"`
|
||||
Height int `json:"height"`
|
||||
}
|
||||
```
|
||||
|
||||
**Batch request/response models:**
|
||||
|
||||
```go
|
||||
type BatchRequest struct {
|
||||
Tracks []string `json:"tracks,omitempty"` // Track IDs
|
||||
Artists []string `json:"artists,omitempty"` // Artist IDs
|
||||
Albums []string `json:"albums,omitempty"` // Album IDs
|
||||
ISRCs []string `json:"isrcs,omitempty"` // ISRC codes
|
||||
}
|
||||
|
||||
type BatchResponse struct {
|
||||
Tracks map[string]*Track `json:"tracks,omitempty"`
|
||||
Artists map[string]*Artist `json:"artists,omitempty"`
|
||||
Albums map[string]*Album `json:"albums,omitempty"`
|
||||
ISRCs map[string]*Track `json:"isrcs,omitempty"`
|
||||
}
|
||||
```
|
||||
|
||||
## Request Flow
|
||||
|
||||
### Example: GET /lookup/track/{id}
|
||||
|
||||
```
|
||||
1. Client Request
|
||||
GET /lookup/track/abc123
|
||||
↓
|
||||
2. Rate Limiter Middleware
|
||||
- Extract IP from X-Forwarded-For
|
||||
- Check token bucket for IP
|
||||
- If allowed, continue; else return 429
|
||||
↓
|
||||
3. HTTP Handler (api/handlers.go)
|
||||
- Extract "abc123" from path
|
||||
- Call db.GetTrack("abc123")
|
||||
↓
|
||||
4. Database Layer (db/db.go)
|
||||
- Query track + album (single JOIN)
|
||||
- Enrich album (images, artists)
|
||||
- Enrich track (artists, track_files)
|
||||
- Enrich artists (genres, images)
|
||||
↓
|
||||
5. Models Layer (models/models.go)
|
||||
- Populate Track struct
|
||||
- Nest Album, Artists
|
||||
↓
|
||||
6. HTTP Handler
|
||||
- Serialize Track to JSON
|
||||
- Set Content-Type: application/json
|
||||
- Write response
|
||||
↓
|
||||
7. Client Response
|
||||
200 OK
|
||||
{
|
||||
"id": "abc123",
|
||||
"name": "Song Title",
|
||||
"album": {...},
|
||||
"artists": [...]
|
||||
}
|
||||
```
|
||||
|
||||
### Example: POST /batch/lookup
|
||||
|
||||
```
|
||||
1. Client Request
|
||||
POST /batch/lookup
|
||||
{
|
||||
"isrcs": ["USRC12345678", "GBUM71234567", ...], // Up to 400
|
||||
"tracks": ["id1", "id2", ...]
|
||||
}
|
||||
↓
|
||||
2. Rate Limiter Middleware
|
||||
- Single request counts as 1 token (not 400)
|
||||
↓
|
||||
3. HTTP Handler
|
||||
- Parse BatchRequest
|
||||
- Validate: max 400 items total
|
||||
- Call db.BatchGetByISRC(isrcs)
|
||||
- Call db.BatchGetTracks(trackIDs)
|
||||
↓
|
||||
4. Database Layer
|
||||
- Build IN clause for ISRCs
|
||||
- Execute batch query (1 query for all ISRCs)
|
||||
- Collect all track/album/artist IDs
|
||||
- Batch enrich all entities (6 batch queries)
|
||||
↓
|
||||
5. HTTP Handler
|
||||
- Build BatchResponse with maps
|
||||
- Serialize to JSON
|
||||
↓
|
||||
6. Client Response
|
||||
200 OK
|
||||
{
|
||||
"isrcs": {
|
||||
"USRC12345678": {...},
|
||||
"GBUM71234567": {...}
|
||||
},
|
||||
"tracks": {
|
||||
"id1": {...},
|
||||
"id2": {...}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Graceful Shutdown
|
||||
|
||||
**Signal handling:**
|
||||
```go
|
||||
// Listen for SIGINT (Ctrl+C) and SIGTERM (Docker stop)
|
||||
sigChan := make(chan os.Signal, 1)
|
||||
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
|
||||
|
||||
// Block until signal received
|
||||
<-sigChan
|
||||
|
||||
// Shutdown with 10-second timeout
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
||||
defer cancel()
|
||||
|
||||
server.Shutdown(ctx) // Stop accepting new requests, finish in-flight
|
||||
```
|
||||
|
||||
**Shutdown sequence:**
|
||||
1. Receive SIGINT or SIGTERM
|
||||
2. Stop accepting new connections
|
||||
3. Wait for in-flight requests (max 10 seconds)
|
||||
4. Close database connections
|
||||
5. Exit process
|
||||
|
||||
## No Framework Philosophy
|
||||
|
||||
Music Metadata API uses **zero web frameworks**. Everything is Go stdlib:
|
||||
|
||||
**Routing:** Go 1.22+ enhanced `http.ServeMux`
|
||||
- Method-specific routes: `GET /path`, `POST /path`
|
||||
- Path parameters: `/lookup/track/{id}`
|
||||
- No regex, no wildcards (simple patterns only)
|
||||
|
||||
**JSON:** `encoding/json` stdlib
|
||||
- `json.NewEncoder(w).Encode(data)` for responses
|
||||
- `json.NewDecoder(r.Body).Decode(&req)` for requests
|
||||
|
||||
**HTTP Server:** `net/http` stdlib
|
||||
- `http.Server` with custom `Addr` and `Handler`
|
||||
- No middleware framework (custom rate limiter)
|
||||
|
||||
**Database:** `database/sql` stdlib
|
||||
- `modernc.org/sqlite` driver (pure Go, no CGO)
|
||||
- Raw SQL queries (no ORM)
|
||||
|
||||
**Logging:** `log/slog` stdlib
|
||||
- Structured logging for errors
|
||||
- No log levels (all logs are errors)
|
||||
|
||||
**Benefits:**
|
||||
- Minimal dependencies (2 external packages)
|
||||
- No framework lock-in
|
||||
- Easy to understand (no magic)
|
||||
- Fast compilation
|
||||
- Small binary size
|
||||
|
||||
**Tradeoffs:**
|
||||
- More boilerplate (manual error handling)
|
||||
- No built-in middleware chain
|
||||
- Manual query building (no ORM)
|
||||
- No automatic validation
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
**Strengths:**
|
||||
- Read-only databases (no write locks)
|
||||
- Connection pooling (8 connections)
|
||||
- Memory-mapped I/O (1GB mmap)
|
||||
- Batch optimization (343x fewer queries)
|
||||
- Conservative cache (64MB)
|
||||
|
||||
**Bottlenecks:**
|
||||
- Search queries (LIKE %query% on 256M rows)
|
||||
- Rate limiter memory leak (unbounded map)
|
||||
- No query result caching
|
||||
- No CDN for image URLs
|
||||
|
||||
**Scalability:**
|
||||
- Horizontal: Run multiple instances (read-only safe)
|
||||
- Vertical: Limited by disk I/O. SQLite's single-writer model is not a constraint here because the databases are opened read-only
|
||||
- Database size: 216GB requires SSD for acceptable performance
|
||||
@@ -0,0 +1,945 @@
|
||||
# Music Metadata API - Codebase Analysis
|
||||
|
||||
## Codebase Overview
|
||||
|
||||
Music Metadata API is a small, focused Go codebase with minimal complexity:
|
||||
|
||||
**Total lines of code:** ~1,100 lines (excluding tests, which don't exist). The per-file figures below sum to roughly 1,360, so both numbers should be read as approximate.
|
||||
|
||||
**File breakdown:**
|
||||
- `cmd/server/main.go` - 62 lines (entry point)
|
||||
- `internal/db/db.go` - 907 lines (database layer, largest file)
|
||||
- `internal/models/models.go` - 65 lines (data structures)
|
||||
- `internal/api/handlers.go` - ~150 lines (HTTP handlers)
|
||||
- `internal/api/ratelimit.go` - ~80 lines (rate limiting)
|
||||
- `internal/api/openapi.go` - ~100 lines (OpenAPI spec)
|
||||
|
||||
**Characteristics:**
|
||||
- No web framework (stdlib only)
|
||||
- No ORM (raw SQL)
|
||||
- No test files (zero test coverage)
|
||||
- No configuration files (CLI flags only)
|
||||
- Minimal dependencies (2 external packages)
|
||||
|
||||
## Configuration
|
||||
|
||||
### CLI Flags
|
||||
|
||||
**Defined in:** `cmd/server/main.go`
|
||||
|
||||
```go
|
||||
var (
|
||||
dbPath = flag.String("db", "", "path to database file (required)")
|
||||
addr = flag.String("addr", ":8080", "HTTP server address")
|
||||
)
|
||||
```
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
./metadata-api -db /data/main_database.sqlite3 -addr :8080
|
||||
```
|
||||
|
||||
**Limitations:**
|
||||
- Only 2 configurable parameters
|
||||
- No environment variable support
|
||||
- No configuration file support
|
||||
- All timeouts hardcoded
|
||||
- All limits hardcoded
|
||||
|
||||
### Hardcoded Configuration
|
||||
|
||||
**Timeouts:**
|
||||
```go
|
||||
// Graceful shutdown timeout
|
||||
shutdownTimeout := 10 * time.Second
|
||||
|
||||
// Search query timeout
|
||||
ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
|
||||
```
|
||||
|
||||
**Rate limiting:**
|
||||
```go
|
||||
// Hardcoded in api/ratelimit.go
|
||||
rateLimiter := NewRateLimiter(100, 200) // 100 req/s, 200 burst
|
||||
```
|
||||
|
||||
**Database connection pool:**
|
||||
```go
|
||||
// Hardcoded in db/db.go
|
||||
db.SetMaxOpenConns(8)
|
||||
db.SetMaxIdleConns(8)
|
||||
db.SetConnMaxLifetime(0)
|
||||
```
|
||||
|
||||
**Search limits:**
|
||||
```go
|
||||
// Hardcoded in api/handlers.go
|
||||
const (
|
||||
minQueryLength = 2
|
||||
maxSearchLimit = 50
|
||||
defaultLimit = 10
|
||||
)
|
||||
```
|
||||
|
||||
**Batch limits:**
|
||||
```go
|
||||
// Hardcoded in api/handlers.go
|
||||
const maxBatchItems = 400
|
||||
```
|
||||
|
||||
**SQLite PRAGMAs:**
|
||||
```go
|
||||
// Hardcoded in db/db.go
|
||||
dsn := fmt.Sprintf("file:%s?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true", dbPath)
|
||||
```
|
||||
|
||||
**Recommendation:** Extract to configuration struct for flexibility.
|
||||
|
||||
### Environment Variables
|
||||
|
||||
**docker-compose.yml defines:**
|
||||
```yaml
|
||||
environment:
|
||||
- LOG_LEVEL=info
|
||||
```
|
||||
|
||||
**BUG:** `LOG_LEVEL` is not used in code. No log level control implemented.
|
||||
|
||||
**Expected behavior:** Filter logs by level (debug, info, warn, error)
|
||||
|
||||
**Actual behavior:** All logs output (no filtering)
|
||||
|
||||
**Fix required:**
|
||||
```go
|
||||
// Add to main.go
|
||||
logLevel := os.Getenv("LOG_LEVEL")
|
||||
if logLevel == "" {
|
||||
logLevel = "info"
|
||||
}
|
||||
|
||||
var level slog.Level
|
||||
switch logLevel {
|
||||
case "debug":
|
||||
level = slog.LevelDebug
|
||||
case "info":
|
||||
level = slog.LevelInfo
|
||||
case "warn":
|
||||
level = slog.LevelWarn
|
||||
case "error":
|
||||
level = slog.LevelError
|
||||
}
|
||||
|
||||
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: level}))
slog.SetDefault(logger) // required so package-level calls such as slog.Error use this handler
|
||||
```
|
||||
|
||||
## Logging
|
||||
|
||||
### Implementation
|
||||
|
||||
**Package:** Go stdlib `log/slog` (structured logging)
|
||||
|
||||
**Usage pattern:**
|
||||
```go
|
||||
slog.Error("Database query failed", "error", err, "query", query)
|
||||
```
|
||||
|
||||
**Output format:**
|
||||
```json
|
||||
{"time":"2024-01-15T10:30:00Z","level":"ERROR","msg":"Database query failed","error":"no such table","query":"SELECT * FROM tracks"}
|
||||
```
|
||||
|
||||
### Logging Locations
|
||||
|
||||
**Error logging only:**
|
||||
- Database query failures
|
||||
- JSON decode errors
|
||||
- HTTP handler errors
|
||||
- Graceful shutdown errors
|
||||
|
||||
**No info/debug logging:**
|
||||
- Request logging (no access logs)
|
||||
- Query execution logging
|
||||
- Performance metrics
|
||||
- Startup messages
|
||||
|
||||
**Example from db.go:**
|
||||
```go
|
||||
rows, err := d.mainDB.Query(query, args...)
|
||||
if err != nil {
|
||||
slog.Error("Query failed", "error", err, "query", query)
|
||||
return nil, err
|
||||
}
|
||||
```
|
||||
|
||||
### Log Level Control
|
||||
|
||||
**Current:** No log level filtering (all logs output)
|
||||
|
||||
**Missing:**
|
||||
- Debug logs (query details, timing)
|
||||
- Info logs (startup, shutdown, requests)
|
||||
- Warn logs (rate limiting, slow queries)
|
||||
|
||||
**Recommendation:** Implement log level control via environment variable.
|
||||
|
||||
## Health Checks
|
||||
|
||||
### Naive Implementation
|
||||
|
||||
**Endpoint:** `GET /health`
|
||||
|
||||
**Code:**
|
||||
```go
|
||||
func handleHealth(w http.ResponseWriter, r *http.Request) {
|
||||
w.Header().Set("Content-Type", "application/json")
|
||||
json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{"status":"ok"}
|
||||
```
|
||||
|
||||
**Problem:** Always returns 200 OK, even if database is unreachable.
|
||||
|
||||
**Test:**
|
||||
```bash
|
||||
# Move the database file away (simulate failure)
|
||||
mv /data/main_database.sqlite3 /data/main_database.sqlite3.bak
|
||||
|
||||
# Health check still returns OK
|
||||
curl http://localhost:8080/health
|
||||
# {"status":"ok"}
|
||||
|
||||
# But actual queries fail
|
||||
curl http://localhost:8080/lookup/track/abc123
|
||||
# 500 Internal Server Error
|
||||
```
|
||||
|
||||
### Improved Health Check
|
||||
|
||||
**Recommendation:**
|
||||
```go
|
||||
func handleHealth(db *sql.DB) http.HandlerFunc {
|
||||
return func(w http.ResponseWriter, r *http.Request) {
|
||||
// Ping database
|
||||
ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
|
||||
defer cancel()
|
||||
|
||||
if err := db.PingContext(ctx); err != nil {
|
||||
w.WriteHeader(http.StatusServiceUnavailable)
|
||||
json.NewEncoder(w).Encode(map[string]string{
|
||||
"status": "unhealthy",
|
||||
"error": "database unavailable",
|
||||
})
|
||||
return
|
||||
}
|
||||
|
||||
// Optional: cheap test query (COUNT(*) would scan the entire table)
var one int
err := db.QueryRowContext(ctx, "SELECT 1 FROM tracks LIMIT 1").Scan(&one)
|
||||
if err != nil {
|
||||
w.WriteHeader(http.StatusServiceUnavailable)
|
||||
json.NewEncoder(w).Encode(map[string]string{
|
||||
"status": "unhealthy",
|
||||
"error": "database query failed",
|
||||
})
|
||||
return
|
||||
}
|
||||
|
||||
json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Rate Limiting
|
||||
|
||||
### Implementation
|
||||
|
||||
**File:** `internal/api/ratelimit.go`
|
||||
|
||||
**Algorithm:** Token bucket per IP
|
||||
|
||||
**Data structure:**
|
||||
```go
|
||||
type RateLimiter struct {
|
||||
visitors map[string]*rate.Limiter // IP -> limiter
|
||||
mu sync.RWMutex // Protects visitors map
|
||||
rate rate.Limit // Tokens per second
|
||||
burst int // Burst capacity
|
||||
}
|
||||
```
|
||||
|
||||
**Middleware:**
|
||||
```go
|
||||
func (rl *RateLimiter) Limit(next http.Handler) http.Handler {
|
||||
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
// Extract IP
|
||||
ip := getIP(r)
|
||||
|
||||
// Get or create limiter for IP
|
||||
limiter := rl.getLimiter(ip)
|
||||
|
||||
// Check if allowed
|
||||
if !limiter.Allow() {
|
||||
w.Header().Set("Retry-After", "1")
|
||||
http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
|
||||
return
|
||||
}
|
||||
|
||||
next.ServeHTTP(w, r)
|
||||
})
|
||||
}
|
||||
```
|
||||
|
||||
**IP extraction:**
|
||||
```go
|
||||
func getIP(r *http.Request) string {
|
||||
// Check X-Forwarded-For header (proxy/load balancer)
|
||||
forwarded := r.Header.Get("X-Forwarded-For")
|
||||
if forwarded != "" {
|
||||
// Take first IP if comma-separated
|
||||
ips := strings.Split(forwarded, ",")
|
||||
return strings.TrimSpace(ips[0])
|
||||
}
|
||||
|
||||
// Fallback to RemoteAddr
|
||||
ip, _, _ := net.SplitHostPort(r.RemoteAddr)
|
||||
return ip
|
||||
}
|
||||
```
|
||||
|
||||
### Memory Leak
|
||||
|
||||
**Problem:** Visitor map grows unbounded. No cleanup for inactive IPs.
|
||||
|
||||
**Code:**
|
||||
```go
|
||||
func (rl *RateLimiter) getLimiter(ip string) *rate.Limiter {
|
||||
rl.mu.Lock()
|
||||
defer rl.mu.Unlock()
|
||||
|
||||
limiter, exists := rl.visitors[ip]
|
||||
if !exists {
|
||||
limiter = rate.NewLimiter(rl.rate, rl.burst)
|
||||
rl.visitors[ip] = limiter // BUG: Never removed
|
||||
}
|
||||
|
||||
return limiter
|
||||
}
|
||||
```
|
||||
|
||||
**Impact:**
|
||||
- Long-running servers accumulate IPs
|
||||
- Memory usage grows over time
|
||||
- No expiration for inactive IPs
|
||||
|
||||
**Example:**
|
||||
- 1 million unique IPs over 1 month
|
||||
- ~100 bytes per limiter
|
||||
- ~100MB memory leak
|
||||
|
||||
**Fix:**
|
||||
```go
|
||||
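// Requires changing RateLimiter.visitors to map[string]*visitor and
// updating lastSeen in getLimiter on every request.
type visitor struct {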
|
||||
limiter *rate.Limiter
|
||||
lastSeen time.Time
|
||||
}
|
||||
|
||||
func (rl *RateLimiter) cleanup() {
|
||||
ticker := time.NewTicker(1 * time.Hour)
|
||||
defer ticker.Stop()
|
||||
|
||||
for range ticker.C {
|
||||
rl.mu.Lock()
|
||||
for ip, v := range rl.visitors {
|
||||
// Remove visitors inactive for 24 hours
|
||||
if time.Since(v.lastSeen) > 24*time.Hour {
|
||||
delete(rl.visitors, ip)
|
||||
}
|
||||
}
|
||||
rl.mu.Unlock()
|
||||
}
|
||||
}
|
||||
|
||||
// Start cleanup goroutine in NewRateLimiter
|
||||
go rl.cleanup()
|
||||
```
|
||||
|
||||
### Rate Limit Configuration
|
||||
|
||||
**Current:** Hardcoded (100 req/s, 200 burst)
|
||||
|
||||
**Recommendation:** Make configurable via CLI flags or environment variables.
|
||||
|
||||
```go
|
||||
// CLI flags
|
||||
var (
|
||||
rateLimit = flag.Int("rate-limit", 100, "requests per second")
|
||||
rateBurst = flag.Int("rate-burst", 200, "burst capacity")
|
||||
)
|
||||
|
||||
// Usage
|
||||
rateLimiter := api.NewRateLimiter(rate.Limit(*rateLimit), *rateBurst)
|
||||
```
|
||||
|
||||
## Search Implementation
|
||||
|
||||
### Query Pattern
|
||||
|
||||
**Track search:**
|
||||
```go
|
||||
query := `
|
||||
SELECT id, name, isrc, duration_ms, popularity, album_rowid
|
||||
FROM tracks
|
||||
WHERE name LIKE ? COLLATE NOCASE
|
||||
ORDER BY popularity DESC
|
||||
LIMIT ?
|
||||
`
|
||||
args := []interface{}{"%" + searchQuery + "%", limit}
|
||||
```
|
||||
|
||||
**Artist search:**
|
||||
```go
|
||||
query := `
|
||||
SELECT id, name, followers_total, popularity
|
||||
FROM artists
|
||||
WHERE name LIKE ? COLLATE NOCASE
|
||||
ORDER BY followers_total DESC
|
||||
LIMIT ?
|
||||
`
|
||||
args := []interface{}{"%" + searchQuery + "%", limit}
|
||||
```
|
||||
|
||||
### Performance Characteristics
|
||||
|
||||
**LIKE %query% problems:**
|
||||
- Can't use indexes (full table scan)
|
||||
- Slow on 256M rows
|
||||
- CPU-intensive (string matching)
|
||||
|
||||
**Benchmark (estimated):**
|
||||
- Common query ("love"): 5-10 seconds
|
||||
- Specific query ("bohemian rhapsody"): 1-2 seconds
|
||||
- Rare query ("xyzabc"): 10+ seconds (full scan)
|
||||
|
||||
**10-second timeout:**
|
||||
```go
|
||||
ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
|
||||
defer cancel()
|
||||
|
||||
rows, err := db.QueryContext(ctx, query, args...)
|
||||
if errors.Is(err, context.DeadlineExceeded) { // also matches wrapped errors
|
||||
http.Error(w, "Search timeout", http.StatusGatewayTimeout)
|
||||
return
|
||||
}
|
||||
```
|
||||
|
||||
### Search Validation
|
||||
|
||||
**Minimum query length:**
|
||||
```go
|
||||
if utf8.RuneCountInString(searchQuery) < 2 { // count characters, not bytes
|
||||
http.Error(w, "Query must be at least 2 characters", http.StatusBadRequest)
|
||||
return
|
||||
}
|
||||
```
|
||||
|
||||
**Maximum limit:**
|
||||
```go
|
||||
if limit > 50 {
|
||||
http.Error(w, "Limit cannot exceed 50", http.StatusBadRequest)
|
||||
return
|
||||
}
|
||||
```
|
||||
|
||||
**Default limit:**
|
||||
```go
|
||||
limit := 10
|
||||
if limitParam := r.URL.Query().Get("limit"); limitParam != "" {
|
||||
limit, _ = strconv.Atoi(limitParam)
|
||||
}
|
||||
```
|
||||
|
||||
### Full-Text Search Alternative
|
||||
|
||||
**Not implemented:** SQLite FTS5 (Full-Text Search)
|
||||
|
||||
**FTS5 benefits:**
|
||||
- Indexed search (much faster)
|
||||
- Relevance ranking
|
||||
- Phrase search
|
||||
- Boolean operators
|
||||
|
||||
**Why not used:**
|
||||
- Requires writable database (to create FTS5 table)
|
||||
- Databases are read-only
|
||||
- Would need separate FTS5 database
|
||||
|
||||
**Workaround:**
|
||||
```sql
|
||||
-- Create separate FTS5 database (one-time setup)
|
||||
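-- id is stored but not indexed; no content= option, because the source table
-- lives in another database file (ATTACH it for the INSERT below)
CREATE VIRTUAL TABLE tracks_fts USING fts5(id UNINDEXED, name);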
|
||||
INSERT INTO tracks_fts SELECT id, name FROM tracks;
|
||||
|
||||
-- Fast search
|
||||
SELECT * FROM tracks_fts WHERE name MATCH 'bohemian';
|
||||
```
|
||||
|
||||
**Implementation:**
|
||||
- Create FTS5 database during database preparation
|
||||
- Open second database connection in code
|
||||
- Query FTS5 for search, then fetch full data from main DB
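
If this route is taken, raw user input should not be passed to `MATCH` directly, because FTS5 has its own query syntax (quotes, `AND`/`OR`/`NOT`, column filters). A minimal sketch of turning free text into quoted terms is shown below. `ftsQuery` is a hypothetical helper, and it assumes the desired behavior is "all words must match".

```go
package main

import "strings"

// ftsQuery converts free text into an FTS5 query in which every
// whitespace-separated word is a quoted string. Embedded double quotes are
// doubled, which is how FTS5 escapes them inside a string. Adjacent quoted
// strings are combined with an implicit AND.
func ftsQuery(input string) string {
	fields := strings.Fields(input)
	quoted := make([]string, 0, len(fields))
	for _, f := range fields {
		quoted = append(quoted, `"`+strings.ReplaceAll(f, `"`, `""`)+`"`)
	}
	return strings.Join(quoted, " ")
}
```

The result would be bound as the parameter of the `MATCH` expression, and the returned IDs then looked up in the main database as described above.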
|
||||
|
||||
## Testing
|
||||
|
||||
### Test Coverage
|
||||
|
||||
**Test files:** 0
|
||||
**Test coverage:** 0%
|
||||
**Test framework:** None
|
||||
|
||||
**Evidence:**
|
||||
```bash
|
||||
# No test files in repository
|
||||
find . -name "*_test.go"
|
||||
# (no output)
|
||||
```
|
||||
|
||||
**.gitignore includes:**
|
||||
```
|
||||
coverage.out
|
||||
```
|
||||
|
||||
**Implication:** This suggests testing was planned but never implemented (or that the `.gitignore` came from a template).
|
||||
|
||||
### CI/CD Testing
|
||||
|
||||
**GitHub Actions workflow:** `.github/workflows/docker-publish.yml`
|
||||
|
||||
**Steps:**
|
||||
1. Checkout code
|
||||
2. Build Docker image
|
||||
3. Push to registry
|
||||
|
||||
**Missing:** No test step
|
||||
|
||||
**Expected workflow:**
|
||||
```yaml
|
||||
- name: Run tests
|
||||
run: go test -v ./...
|
||||
|
||||
- name: Check coverage
|
||||
run: go test -cover ./...
|
||||
```
|
||||
|
||||
### Manual Testing
|
||||
|
||||
**Only testing:** Manual API calls
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
# Health check
|
||||
curl http://localhost:8080/health
|
||||
|
||||
# Track lookup
|
||||
curl http://localhost:8080/lookup/track/abc123
|
||||
|
||||
# Search
|
||||
curl "http://localhost:8080/search/track?q=test"
|
||||
```
|
||||
|
||||
**No automated testing:**
|
||||
- No unit tests
|
||||
- No integration tests
|
||||
- No end-to-end tests
|
||||
- No performance tests
|
||||
- No load tests
|
||||
|
||||
### Testing Recommendations
|
||||
|
||||
**Unit tests needed:**
|
||||
- Rate limiter logic
|
||||
- IP extraction
|
||||
- Query building
|
||||
- Data enrichment
|
||||
- JSON serialization
|
||||
|
||||
**Integration tests needed:**
|
||||
- Database queries
|
||||
- HTTP handlers
|
||||
- Batch operations
|
||||
- Search functionality
|
||||
|
||||
**Example unit test:**
|
||||
```go
|
||||
// internal/api/ratelimit_test.go
|
||||
func TestRateLimiter(t *testing.T) {
|
||||
rl := NewRateLimiter(10, 20) // 10 req/s, 20 burst
|
||||
|
||||
// Should allow burst
|
||||
for i := 0; i < 20; i++ {
|
||||
if !rl.getLimiter("127.0.0.1").Allow() {
|
||||
t.Errorf("Request %d should be allowed", i)
|
||||
}
|
||||
}
|
||||
|
||||
// Should reject 21st request
|
||||
if rl.getLimiter("127.0.0.1").Allow() {
|
||||
t.Error("Request 21 should be rate limited")
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Example integration test:**
|
||||
```go
|
||||
// internal/db/db_test.go
|
||||
func TestGetTrack(t *testing.T) {
|
||||
db, err := NewDatabase("testdata/test.db")
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
defer db.Close()
|
||||
|
||||
track, err := db.GetTrack("test_track_id")
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
if track.Name != "Test Track" {
|
||||
t.Errorf("Expected 'Test Track', got '%s'", track.Name)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
### Error Patterns
|
||||
|
||||
**Database errors:**
|
||||
```go
|
||||
rows, err := db.Query(query, args...)
|
||||
if err != nil {
|
||||
slog.Error("Query failed", "error", err)
|
||||
http.Error(w, "Internal server error", http.StatusInternalServerError)
|
||||
return
|
||||
}
|
||||
```
|
||||
|
||||
**JSON decode errors:**
|
||||
```go
|
||||
var req BatchRequest
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
http.Error(w, "Invalid JSON", http.StatusBadRequest)
|
||||
return
|
||||
}
|
||||
```
|
||||
|
||||
**Validation errors:**
|
||||
```go
|
||||
if len(query) < 2 {
|
||||
http.Error(w, "Query too short", http.StatusBadRequest)
|
||||
return
|
||||
}
|
||||
```
|
||||
|
||||
### Error Responses
|
||||
|
||||
**Generic errors:**
|
||||
```go
|
||||
http.Error(w, "Internal server error", http.StatusInternalServerError)
|
||||
```
|
||||
|
||||
**Problem:** No error details returned to client (security vs usability tradeoff)
|
||||
|
||||
**Structured errors (not implemented):**
|
||||
```go
|
||||
type ErrorResponse struct {
|
||||
Error string `json:"error"`
|
||||
Code string `json:"code"`
|
||||
Details string `json:"details,omitempty"`
|
||||
}
|
||||
|
||||
func writeError(w http.ResponseWriter, status int, code, message string) {
|
||||
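// Set the content type before WriteHeader; headers set afterwards are ignored
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(status)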
|
||||
json.NewEncoder(w).Encode(ErrorResponse{
|
||||
Error: message,
|
||||
Code: code,
|
||||
})
|
||||
}
|
||||
```
|
||||
|
||||
## Code Quality
|
||||
|
||||
### Strengths
|
||||
|
||||
**Simplicity:**
|
||||
- Small codebase (~1,100 lines)
|
||||
- Easy to understand
|
||||
- Minimal dependencies
|
||||
- No framework magic
|
||||
|
||||
**Readability:**
|
||||
- Clear function names
|
||||
- Logical file organization
|
||||
- Consistent style
|
||||
|
||||
**Performance:**
|
||||
- Batch optimization (7 queries instead of roughly 2,800 for a 400-item batch, about 400x fewer)
|
||||
- Connection pooling
|
||||
- Memory-mapped I/O
|
||||
|
||||
### Weaknesses
|
||||
|
||||
**No tests:**
|
||||
- Zero test coverage
|
||||
- No regression protection
|
||||
- No documentation via tests
|
||||
|
||||
**Hardcoded config:**
|
||||
- No flexibility
|
||||
- Requires recompilation to change limits
|
||||
- No environment-specific config
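
One way to lift these limits out of the binary is to read them from the environment at startup. The following is a minimal sketch, not the project's code: the `Config` type, the helper names, and the variable names `RATE_LIMIT_RPS`, `RATE_LIMIT_BURST`, and `DB_MAX_OPEN_CONNS` are all hypothetical. The defaults mirror the values documented elsewhere (100 requests/second, 200 burst, 8 connections).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config gathers the limits that are currently compiled in.
// Field and variable names are hypothetical.
type Config struct {
	RatePerSecond int
	Burst         int
	MaxOpenConns  int
}

// intOr parses s as an integer and falls back to def when s is empty or invalid.
func intOr(s string, def int) int {
	n, err := strconv.Atoi(s)
	if err != nil {
		return def
	}
	return n
}

// loadConfig reads settings through getenv so tests can pass a fake lookup.
func loadConfig(getenv func(string) string) Config {
	return Config{
		RatePerSecond: intOr(getenv("RATE_LIMIT_RPS"), 100),
		Burst:         intOr(getenv("RATE_LIMIT_BURST"), 200),
		MaxOpenConns:  intOr(getenv("DB_MAX_OPEN_CONNS"), 8),
	}
}

func main() {
	cfg := loadConfig(os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling `os.Getenv` directly, keeps the loader testable without touching the process environment.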
|
||||
|
||||
**Memory leak:**
|
||||
- Rate limiter visitor map grows unbounded
|
||||
- Requires periodic restarts
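
A common remedy is to record when each client was last seen and evict idle entries on a timer. The sketch below is an assumption about how that could look, not the project's code: `visitorStore`, `touch`, and `evictIdle` are hypothetical names, and the per-IP limiter is left as a placeholder so the example needs only the standard library.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// visitor pairs a per-IP limiter with the time it was last used.
// The limiter field is a placeholder for the real limiter type.
type visitor struct {
	limiter  any
	lastSeen time.Time
}

// visitorStore is a hypothetical replacement for the unbounded visitor map.
type visitorStore struct {
	mu       sync.Mutex
	visitors map[string]*visitor
}

func newVisitorStore() *visitorStore {
	return &visitorStore{visitors: make(map[string]*visitor)}
}

// touch records activity for ip, creating the entry on first use.
func (s *visitorStore) touch(ip string, now time.Time) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.visitors[ip]
	if !ok {
		v = &visitor{}
		s.visitors[ip] = v
	}
	v.lastSeen = now
}

// evictIdle removes entries not seen within maxIdle and returns how many were dropped.
func (s *visitorStore) evictIdle(now time.Time, maxIdle time.Duration) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for ip, v := range s.visitors {
		if now.Sub(v.lastSeen) > maxIdle {
			delete(s.visitors, ip)
			removed++
		}
	}
	return removed
}

func main() {
	s := newVisitorStore()
	start := time.Now()
	s.touch("10.0.0.1", start)
	s.touch("10.0.0.2", start.Add(4*time.Minute))
	// Five minutes after start, only the first client has been idle for more than 3 minutes.
	fmt.Println(s.evictIdle(start.Add(5*time.Minute), 3*time.Minute))
}
```

In a server, `evictIdle` would typically run from a goroutine driven by a `time.Ticker`, which bounds the map by the number of recently active clients.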
|
||||
|
||||
**Naive health check:**
|
||||
- Doesn't verify database connectivity
|
||||
- False positives in monitoring
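
A health check that exercises the databases could be sketched as follows. This is an illustration under assumptions: the `pinger` interface, `healthStatus`, `healthHandler`, and the 2-second timeout are hypothetical. `*sql.DB` satisfies `pinger` through its `PingContext` method.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// pinger is the subset of *sql.DB the health check needs.
type pinger interface {
	PingContext(ctx context.Context) error
}

// healthStatus maps a ping result to an HTTP status code.
func healthStatus(err error) int {
	if err != nil {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// healthHandler pings every database with a short timeout and reports
// 503 as soon as one of them fails.
func healthHandler(dbs ...pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		for _, db := range dbs {
			if err := db.PingContext(ctx); err != nil {
				http.Error(w, "database unavailable", healthStatus(err))
				return
			}
		}
		w.WriteHeader(healthStatus(nil))
		fmt.Fprintln(w, "ok")
	}
}

func main() {
	fmt.Println(healthStatus(nil), healthStatus(context.DeadlineExceeded))
}
```

Registering it as `mux.Handle("/health", healthHandler(mainDB, trackFilesDB))` would cover both SQLite files; the route and variable names here are placeholders.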
|
||||
|
||||
**No metrics:**
|
||||
- No visibility into performance
|
||||
- No error rate tracking
|
||||
- No usage analytics
|
||||
|
||||
**Unused config:**
|
||||
- `LOG_LEVEL` environment variable ignored
|
||||
- Misleading documentation
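
Honoring `LOG_LEVEL` takes only a few lines with `log/slog`. The sketch below assumes the variable holds a level name such as `debug` or `WARN`; `parseLogLevel` is a hypothetical helper, and falling back to info for unknown values is a design choice, not documented behavior.

```go
package main

import (
	"log/slog"
	"os"
)

// parseLogLevel converts a LOG_LEVEL value such as "debug" or "WARN" into a slog.Level.
// Unknown or empty values fall back to info.
func parseLogLevel(s string) slog.Level {
	var lvl slog.Level
	if err := lvl.UnmarshalText([]byte(s)); err != nil {
		return slog.LevelInfo
	}
	return lvl
}

func main() {
	level := parseLogLevel(os.Getenv("LOG_LEVEL"))
	logger := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: level}))
	slog.SetDefault(logger)
	slog.Debug("only visible when LOG_LEVEL=debug")
	slog.Info("logger configured", "level", level.String())
}
```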
|
||||
|
||||
**No CORS:**
|
||||
- Browser-based clients blocked
|
||||
- Requires reverse proxy workaround
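
If CORS were added in the application rather than at a reverse proxy, a small middleware would be enough. The sketch below is an assumption, not existing code: `allowOrigin` and `withCORS` are hypothetical names, and the allowed methods are chosen because the API serves GET endpoints plus `POST /batch/lookup`.

```go
package main

import (
	"fmt"
	"net/http"
)

// allowOrigin returns the value to send in Access-Control-Allow-Origin,
// or "" when the origin is absent or not allowed.
func allowOrigin(origin string, allowed []string) string {
	if origin == "" {
		return ""
	}
	for _, a := range allowed {
		if a == "*" {
			return "*"
		}
		if a == origin {
			return origin
		}
	}
	return ""
}

// withCORS adds CORS headers for allowed origins and answers preflight requests itself.
func withCORS(allowed []string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if v := allowOrigin(r.Header.Get("Origin"), allowed); v != "" {
			h := w.Header()
			h.Set("Access-Control-Allow-Origin", v)
			h.Add("Vary", "Origin")
			h.Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
			h.Set("Access-Control-Allow-Headers", "Content-Type")
		}
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	fmt.Println(allowOrigin("https://app.example", []string{"https://app.example"}))
}
```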
|
||||
|
||||
**No authentication:**
|
||||
- Public API (security risk)
|
||||
- No usage tracking per user
|
||||
|
||||
### Code Smells
|
||||
|
||||
**Magic numbers:**
|
||||
```go
|
||||
// What is 64000? Why 1073741824?
|
||||
_cache_size=-64000&_mmap_size=1073741824
|
||||
```
|
||||
|
||||
**Fix:** Use named constants
|
||||
```go
|
||||
const (
|
||||
sqliteCacheSizeKB = 64000 // 64MB
|
||||
sqliteMmapSizeBytes = 1 << 30 // 1GB
|
||||
)
|
||||
```
|
||||
|
||||
**Repeated code:**
|
||||
```go
|
||||
// Similar enrichment logic repeated for tracks, albums, artists
|
||||
func enrichTrack(track *Track) { /* ... */ }
|
||||
func enrichAlbum(album *Album) { /* ... */ }
|
||||
func enrichArtist(artist *Artist) { /* ... */ }
|
||||
```
|
||||
|
||||
**Fix:** Generic enrichment function
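
With Go generics, the shared part of the three functions (copying batch-fetched values onto each entity) could be written once. This is a sketch under assumptions: `attach` is a hypothetical helper and the `Image` and `Album` types are trimmed-down stand-ins for the real models.

```go
package main

import "fmt"

// Image and Album are simplified stand-ins for the real models.
type Image struct{ URL string }

type Album struct {
	ID     string
	Images []Image
}

// attach copies related values from a lookup map onto each item.
// key extracts the item's ID; set stores the related values on the item.
func attach[T any, V any](items []*T, related map[string][]V, key func(*T) string, set func(*T, []V)) {
	for _, it := range items {
		set(it, related[key(it)])
	}
}

func main() {
	albums := []*Album{{ID: "a1"}, {ID: "a2"}}
	images := map[string][]Image{"a1": {{URL: "https://example.invalid/cover.jpg"}}}

	attach(albums, images,
		func(a *Album) string { return a.ID },
		func(a *Album, v []Image) { a.Images = v },
	)
	fmt.Println(len(albums[0].Images), len(albums[1].Images))
}
```

The same call shape would work for tracks and artists by swapping the two small closures.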
|
||||
|
||||
**Global state:**
|
||||
```go
|
||||
// Rate limiter as global variable (not shown in code, but implied)
|
||||
var rateLimiter *RateLimiter
|
||||
```
|
||||
|
||||
**Fix:** Dependency injection
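
A sketch of what injection could look like, assuming a small interface in front of the limiter. `Limiter`, `Server`, `NewServer`, and `allowAll` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// Limiter is the behaviour the HTTP layer needs from a rate limiter.
type Limiter interface {
	Allow(ip string) bool
}

// Server receives its dependencies through the constructor instead of reading globals.
type Server struct {
	limiter Limiter
}

func NewServer(l Limiter) *Server { return &Server{limiter: l} }

// allowAll is a trivial Limiter, handy in tests.
type allowAll struct{}

func (allowAll) Allow(string) bool { return true }

// RateLimit wraps next and rejects requests the injected limiter refuses.
// For brevity it keys on RemoteAddr as-is; real code would strip the port.
func (s *Server) RateLimit(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !s.limiter.Allow(r.RemoteAddr) {
			http.Error(w, "Too many requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	srv := NewServer(allowAll{})
	fmt.Println(srv.limiter.Allow("127.0.0.1"))
}
```

Because the handler depends only on the interface, tests can substitute a stub and a different limiting strategy can be swapped in without touching the handlers.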
|
||||
|
||||
## Dependencies
|
||||
|
||||
### External Packages
|
||||
|
||||
**modernc.org/sqlite v1.34.4:**
|
||||
- Pure Go SQLite driver
|
||||
- No CGO required
|
||||
- 100% Go implementation
|
||||
- Larger binary size vs CGO version
|
||||
|
||||
**golang.org/x/time v0.14.0:**
|
||||
- Rate limiting (token bucket)
|
||||
- Part of Go extended stdlib
|
||||
- Minimal, focused package
|
||||
|
||||
**Total dependencies:** 2 direct + transitive dependencies
|
||||
|
||||
### Dependency Management
|
||||
|
||||
**go.mod:**
|
||||
```go
|
||||
module github.com/Aunali321/music-metadata-api
|
||||
|
||||
go 1.24
|
||||
|
||||
require (
|
||||
modernc.org/sqlite v1.34.4
|
||||
golang.org/x/time v0.14.0
|
||||
)
|
||||
```
|
||||
|
||||
**Dependency updates:**
|
||||
```bash
|
||||
# Check for updates
|
||||
go list -u -m all
|
||||
|
||||
# Update dependencies
|
||||
go get -u ./...
|
||||
go mod tidy
|
||||
```
|
||||
|
||||
**Security scanning:**
|
||||
```bash
|
||||
# Scan for vulnerabilities
|
||||
go list -json -m all | nancy sleuth
|
||||
```
|
||||
|
||||
## Code Organization
|
||||
|
||||
### Package Structure
|
||||
|
||||
```
|
||||
music-metadata-api/
|
||||
├── cmd/
|
||||
│ └── server/ # Entry point
|
||||
│ └── main.go # CLI, server setup, graceful shutdown
|
||||
│
|
||||
├── internal/ # Private packages
|
||||
│ ├── api/ # HTTP layer
|
||||
│ │ ├── handlers.go # Route handlers
|
||||
│ │ ├── ratelimit.go # Rate limiting middleware
|
||||
│ │ └── openapi.go # OpenAPI spec
|
||||
│ │
|
||||
│ ├── db/ # Database layer
|
||||
│ │ └── db.go # Queries, enrichment, batch optimization
|
||||
│ │
|
||||
│ └── models/ # Data models
|
||||
│ └── models.go # Structs, JSON tags
|
||||
│
|
||||
├── Dockerfile # Container build
|
||||
├── docker-compose.yml # Local deployment
|
||||
├── go.mod # Dependencies
|
||||
└── .github/
|
||||
└── workflows/
|
||||
└── docker-publish.yml # CI/CD
|
||||
```
|
||||
|
||||
### Separation of Concerns
|
||||
|
||||
**Good:**
|
||||
- Clear layer boundaries (API → DB → Models)
|
||||
- No circular dependencies
|
||||
- Database logic isolated from HTTP
|
||||
|
||||
**Could improve:**
|
||||
- Extract configuration to separate package
|
||||
- Extract validation to separate package
|
||||
- Extract error handling to separate package
|
||||
|
||||
## Performance Characteristics
|
||||
|
||||
### Bottlenecks
|
||||
|
||||
**Search queries:**
|
||||
- `LIKE %query%` full table scan
|
||||
- 10-second timeout (can be hit)
|
||||
- CPU-bound (string matching)
|
||||
|
||||
**Rate limiter:**
|
||||
- RWMutex contention under high load
|
||||
- Map lookup on every request
|
||||
|
||||
**Database:**
|
||||
- Single SQLite file (no sharding)
|
||||
- 8 connection limit (conservative)
|
||||
|
||||
### Optimizations
|
||||
|
||||
**Batch queries:**
|
||||
- About 400x fewer queries (400 items: 7 queries vs ~2,800)
|
||||
- IN clause for bulk lookups
|
||||
|
||||
**Connection pooling:**
|
||||
- Reuse connections (no per-request connection setup)
|
||||
- 8 warm connections
|
||||
|
||||
**Memory-mapped I/O:**
|
||||
- 1GB mmap (faster than read() syscalls)
|
||||
- OS handles paging
|
||||
|
||||
**Read-only mode:**
|
||||
- No write locks
|
||||
- Safe concurrent reads
|
||||
|
||||
## Maintainability
|
||||
|
||||
### Documentation
|
||||
|
||||
**Code comments:** Minimal
|
||||
|
||||
**README:** Basic (installation, usage)
|
||||
|
||||
**OpenAPI spec:** Comprehensive (all endpoints documented)
|
||||
|
||||
**No inline documentation:**
|
||||
```go
|
||||
// No function comments
|
||||
func enrichTrack(track *Track) {
|
||||
// No explanation of enrichment logic
|
||||
}
|
||||
```
|
||||
|
||||
**Recommendation:** Add godoc comments
|
||||
```go
|
||||
// enrichTrack populates track with related entities (artists, album, track files).
|
||||
// It performs batch queries to minimize database round-trips.
|
||||
func enrichTrack(track *Track) {
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
### Extensibility
|
||||
|
||||
**Easy to extend:**
|
||||
- Add new endpoints (register route)
|
||||
- Add new models (define struct)
|
||||
- Add new queries (write SQL)
|
||||
|
||||
**Hard to extend:**
|
||||
- Change rate limiting strategy (tightly coupled)
|
||||
- Add authentication (no middleware chain)
|
||||
- Add metrics (no instrumentation points)
|
||||
|
||||
### Technical Debt
|
||||
|
||||
**High priority:**
|
||||
1. Fix rate limiter memory leak
|
||||
2. Implement proper health check
|
||||
3. Add test coverage
|
||||
4. Use LOG_LEVEL environment variable
|
||||
|
||||
**Medium priority:**
|
||||
1. Extract hardcoded config
|
||||
2. Add metrics/monitoring
|
||||
3. Implement CORS support
|
||||
4. Add authentication
|
||||
|
||||
**Low priority:**
|
||||
1. Improve search performance (FTS5)
|
||||
2. Add caching layer
|
||||
3. Structured error responses
|
||||
4. Request logging
|
||||
@@ -0,0 +1,911 @@
|
||||
# Music Metadata API - Data Layer
|
||||
|
||||
## Database Architecture
|
||||
|
||||
Music Metadata API uses a dual-database architecture with two separate SQLite files:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Application Layer │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
┌───────────┴───────────┐
|
||||
▼ ▼
|
||||
┌──────────────────────────┐ ┌──────────────────────────┐
|
||||
│ main_database.sqlite3 │ │ track_files.sqlite3 │
|
||||
│ (~117GB) │ │ (~99GB) │
|
||||
│ │ │ │
|
||||
│ - tracks │ │ - track_files │
|
||||
│ - albums │ │ (extended metadata) │
|
||||
│ - artists │ │ │
|
||||
│ - track_artists │ │ │
|
||||
│ - artist_albums │ │ │
|
||||
│ - album_images │ │ │
|
||||
│ - artist_images │ │ │
|
||||
│ - artist_genres │ │ │
|
||||
└──────────────────────────┘ └──────────────────────────┘
|
||||
```
|
||||
|
||||
**Total storage:** ~216GB
|
||||
**Total tracks:** 256 million
|
||||
**Connection mode:** Read-only
|
||||
**Driver:** modernc.org/sqlite v1.34.4 (pure Go, no CGO)
|
||||
|
||||
## Connection Configuration
|
||||
|
||||
### Connection Strings
|
||||
|
||||
**Main database:**
|
||||
```
|
||||
file:/path/to/main_database.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
|
||||
```
|
||||
|
||||
**Track files database:**
|
||||
```
|
||||
file:/path/to/track_files.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
|
||||
```
|
||||
|
||||
### PRAGMA Settings
|
||||
|
||||
| PRAGMA | Value | Purpose | Impact |
|
||||
|--------|-------|---------|--------|
|
||||
| `mode=ro` | Read-only | Prevents writes | No write locks, safe concurrent reads |
|
||||
| `_journal_mode=off` | Disabled | No WAL/rollback journal | Faster reads, safe for read-only |
|
||||
| `_cache_size=-64000` | 64MB | Page cache size | Reduces disk I/O for hot data |
|
||||
| `_mmap_size=1073741824` | 1GB | Memory-mapped I/O | Faster reads via mmap |
|
||||
| `_query_only=true` | Enabled | Additional read-only enforcement | Extra safety layer |
|
||||
|
||||
**Cache size calculation:**
|
||||
- A negative value is a size in KiB (1,024-byte units) rather than a page count
|
||||
- `-64000` = 64,000 KB = 64 MB
|
||||
- Default SQLite cache is ~2MB (32x increase)
|
||||
|
||||
**Memory-mapped I/O:**
|
||||
- Maps 1GB of database file into process memory
|
||||
- OS handles paging (faster than read() syscalls)
|
||||
- Effective for frequently accessed data
|
||||
|
||||
### Connection Pool
|
||||
|
||||
```go
|
||||
db.SetMaxOpenConns(8) // Conservative limit (8 concurrent queries)
|
||||
db.SetMaxIdleConns(8) // Keep all connections warm
|
||||
db.SetConnMaxLifetime(0) // No expiration (read-only safe)
|
||||
```
|
||||
|
||||
**Rationale:**
|
||||
- Read-only workload (no write contention)
|
||||
- SQLite handles concurrent reads well
|
||||
- 8 connections balance throughput vs resource usage
|
||||
- No connection recycling needed (no state changes)
|
||||
|
||||
## Main Database Schema
|
||||
|
||||
### tracks Table
|
||||
|
||||
**Purpose:** Core track metadata
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `rowid` | INTEGER | SQLite internal row ID | No |
|
||||
| `id` | TEXT | Internal track ID | No |
|
||||
| `name` | TEXT | Track title | No |
|
||||
| `isrc` | TEXT | ISRC code | Yes |
|
||||
| `duration_ms` | INTEGER | Duration in milliseconds | No |
|
||||
| `explicit` | INTEGER | Explicit content flag (0/1) | No |
|
||||
| `track_number` | INTEGER | Track number on album | No |
|
||||
| `disc_number` | INTEGER | Disc number | No |
|
||||
| `popularity` | INTEGER | Popularity score (0-100) | No |
|
||||
| `preview_url` | TEXT | 30-second preview URL | Yes |
|
||||
| `album_rowid` | INTEGER | Foreign key to albums.rowid | No |
|
||||
|
||||
**Indexes:**
|
||||
- Primary key on `id`
|
||||
- Index on `isrc` (for ISRC lookups)
|
||||
- Index on `album_rowid` (for album track listings)
|
||||
|
||||
**Sample row:**
|
||||
```sql
|
||||
id: 4cOdK2wGLETKBW3PvgPWqT
|
||||
name: Bohemian Rhapsody
|
||||
isrc: GBUM71029604
|
||||
duration_ms: 354320
|
||||
explicit: 0
|
||||
track_number: 11
|
||||
disc_number: 1
|
||||
popularity: 89
|
||||
preview_url: https://p.scdn.co/mp3-preview/...
|
||||
album_rowid: 12345
|
||||
```
|
||||
|
||||
**Estimated rows:** 256 million
|
||||
|
||||
### albums Table
|
||||
|
||||
**Purpose:** Album metadata
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `rowid` | INTEGER | SQLite internal row ID | No |
|
||||
| `id` | TEXT | Internal album ID | No |
|
||||
| `name` | TEXT | Album title | No |
|
||||
| `album_type` | TEXT | "album", "single", "compilation" | No |
|
||||
| `label` | TEXT | Record label | Yes |
|
||||
| `release_date` | TEXT | ISO 8601 date (YYYY-MM-DD) | No |
|
||||
| `release_date_precision` | TEXT | "year", "month", "day" | No |
|
||||
| `external_id_upc` | TEXT | UPC barcode | Yes |
|
||||
| `total_tracks` | INTEGER | Total tracks on album | No |
|
||||
| `copyright_c` | TEXT | Copyright notice | Yes |
|
||||
| `copyright_p` | TEXT | Phonographic copyright | Yes |
|
||||
|
||||
**Indexes:**
|
||||
- Primary key on `id`
|
||||
- `rowid` lookups (for track joins) use the table's built-in rowid B-tree, so no separate index is needed
|
||||
|
||||
**Sample row:**
|
||||
```sql
|
||||
id: 2ODvWsOgouMbaA5xf0RkJe
|
||||
name: A Night at the Opera
|
||||
album_type: album
|
||||
label: Hollywood Records
|
||||
release_date: 1975-11-21
|
||||
release_date_precision: day
|
||||
external_id_upc: 050087246679
|
||||
total_tracks: 12
|
||||
copyright_c: 1975 Queen Productions Ltd
|
||||
copyright_p: 1975 Queen Productions Ltd
|
||||
```
|
||||
|
||||
**Estimated rows:** Tens of millions (fewer than tracks)
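
Consumers of `release_date` usually need to take `release_date_precision` into account. The sketch below rests on an assumption that is not confirmed by the schema above: that lower-precision dates are stored truncated (`1975` or `1975-11`) rather than padded to a full date. `parseReleaseDate` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// parseReleaseDate interprets release_date according to release_date_precision.
// Assumption: lower-precision dates are stored truncated ("1975", "1975-11").
func parseReleaseDate(date, precision string) (time.Time, error) {
	layouts := map[string]string{
		"year":  "2006",
		"month": "2006-01",
		"day":   "2006-01-02",
	}
	layout, ok := layouts[precision]
	if !ok {
		return time.Time{}, fmt.Errorf("unknown precision %q", precision)
	}
	return time.Parse(layout, date)
}

func main() {
	d, err := parseReleaseDate("1975-11-21", "day")
	fmt.Println(d.Format("2006-01-02"), err)
}
```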
|
||||
|
||||
### artists Table
|
||||
|
||||
**Purpose:** Artist metadata
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `rowid` | INTEGER | SQLite internal row ID | No |
|
||||
| `id` | TEXT | Internal artist ID | No |
|
||||
| `name` | TEXT | Artist name | No |
|
||||
| `followers_total` | INTEGER | Total followers | Yes |
|
||||
| `popularity` | INTEGER | Popularity score (0-100) | Yes |
|
||||
|
||||
**Indexes:**
|
||||
- Primary key on `id`
|
||||
- Index on `name` (for search)
|
||||
|
||||
**Sample row:**
|
||||
```sql
|
||||
id: 0TnOYISbd1XYRBk9myaseg
|
||||
name: Queen
|
||||
followers_total: 45000000
|
||||
popularity: 92
|
||||
```
|
||||
|
||||
**Estimated rows:** Millions (fewer than albums)
|
||||
|
||||
### track_artists Table
|
||||
|
||||
**Purpose:** Many-to-many relationship between tracks and artists
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `track_id` | TEXT | Foreign key to tracks.id | No |
|
||||
| `artist_id` | TEXT | Foreign key to artists.id | No |
|
||||
|
||||
**Indexes:**
|
||||
- Composite index on `(track_id, artist_id)`
|
||||
- Index on `artist_id` (for artist track listings)
|
||||
|
||||
**Sample rows:**
|
||||
```sql
|
||||
track_id: 4cOdK2wGLETKBW3PvgPWqT, artist_id: 0TnOYISbd1XYRBk9myaseg
|
||||
track_id: 4cOdK2wGLETKBW3PvgPWqT, artist_id: 1A2B3C4D5E6F7G8H9I0J
|
||||
```
|
||||
|
||||
**Estimated rows:** Hundreds of millions (tracks can have multiple artists)
|
||||
|
||||
### artist_albums Table
|
||||
|
||||
**Purpose:** Many-to-many relationship between artists and albums with ordering
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `artist_id` | TEXT | Foreign key to artists.id | No |
|
||||
| `album_id` | TEXT | Foreign key to albums.id | No |
|
||||
| `index_in_album` | INTEGER | Artist order on album | No |
|
||||
|
||||
**Indexes:**
|
||||
- Composite index on `(album_id, index_in_album)`
|
||||
- Index on `artist_id` (for artist discography)
|
||||
|
||||
**Sample rows:**
|
||||
```sql
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, album_id: 2ODvWsOgouMbaA5xf0RkJe, index_in_album: 0
|
||||
artist_id: 1A2B3C4D5E6F7G8H9I0J, album_id: 2ODvWsOgouMbaA5xf0RkJe, index_in_album: 1
|
||||
```
|
||||
|
||||
**Purpose of index_in_album:** Preserves artist order for multi-artist albums (e.g., "Artist A & Artist B")
|
||||
|
||||
### album_images Table
|
||||
|
||||
**Purpose:** Album artwork URLs
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `album_id` | TEXT | Foreign key to albums.id | No |
|
||||
| `url` | TEXT | Image URL | No |
|
||||
| `width` | INTEGER | Width in pixels | No |
|
||||
| `height` | INTEGER | Height in pixels | No |
|
||||
|
||||
**Indexes:**
|
||||
- Index on `album_id`
|
||||
|
||||
**Sample rows:**
|
||||
```sql
|
||||
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d0000b273..., width: 640, height: 640
|
||||
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d00001e02..., width: 300, height: 300
|
||||
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d00004851..., width: 64, height: 64
|
||||
```
|
||||
|
||||
**Typical sizes:** 640x640, 300x300, 64x64
|
||||
|
||||
**Image hosting:** External CDN (i.scdn.co), not hosted by API
|
||||
|
||||
### artist_images Table
|
||||
|
||||
**Purpose:** Artist images/photos
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `artist_id` | TEXT | Foreign key to artists.id | No |
|
||||
| `url` | TEXT | Image URL | No |
|
||||
| `width` | INTEGER | Width in pixels | No |
|
||||
| `height` | INTEGER | Height in pixels | No |
|
||||
|
||||
**Indexes:**
|
||||
- Index on `artist_id`
|
||||
|
||||
**Sample rows:**
|
||||
```sql
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, url: https://i.scdn.co/image/af2b8e57f6d7b5d..., width: 640, height: 640
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, url: https://i.scdn.co/image/c06971e9ff81696..., width: 320, height: 320
|
||||
```
|
||||
|
||||
### artist_genres Table
|
||||
|
||||
**Purpose:** Artist genre tags
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `artist_id` | TEXT | Foreign key to artists.id | No |
|
||||
| `genre` | TEXT | Genre name | No |
|
||||
|
||||
**Indexes:**
|
||||
- Index on `artist_id`
|
||||
|
||||
**Sample rows:**
|
||||
```sql
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: rock
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: classic rock
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: glam rock
|
||||
```
|
||||
|
||||
**Genre characteristics:**
|
||||
- Multiple genres per artist
|
||||
- Lowercase; multi-word genres are space-separated in the samples above (e.g., "classic rock"), and some may be hyphenated (e.g., "indie-rock")
|
||||
- Spotify-style genre taxonomy
|
||||
|
||||
## Track Files Database Schema
|
||||
|
||||
### track_files Table
|
||||
|
||||
**Purpose:** Extended track metadata not in main database
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `track_id` | TEXT | Foreign key to tracks.id | No |
|
||||
| `has_lyrics` | INTEGER | Lyrics availability flag (0/1) | No |
|
||||
| `original_title` | TEXT | Original title (if different) | Yes |
|
||||
| `version_title` | TEXT | Version descriptor (e.g., "Radio Edit") | Yes |
|
||||
| `language_of_performance` | TEXT | JSON array of language codes | Yes |
|
||||
| `artist_roles` | TEXT | JSON object mapping artist IDs to roles | Yes |
|
||||
|
||||
**Indexes:**
|
||||
- Primary key on `track_id`
|
||||
|
||||
**Sample row:**
|
||||
```sql
|
||||
track_id: 4cOdK2wGLETKBW3PvgPWqT
|
||||
has_lyrics: 1
|
||||
original_title: Bohemian Rhapsody
|
||||
version_title: NULL
|
||||
language_of_performance: ["en"]
|
||||
artist_roles: {"0TnOYISbd1XYRBk9myaseg": ["performer", "composer"]}
|
||||
```
|
||||
|
||||
**JSON field parsing:**
|
||||
|
||||
**language_of_performance** (ISO 639-1 language codes):

```json
["en", "es"]
|
||||
```
|
||||
|
||||
**artist_roles:**
|
||||
```json
|
||||
{
|
||||
"artist_id_1": ["performer", "composer"],
|
||||
"artist_id_2": ["producer"],
|
||||
"artist_id_3": ["lyricist"]
|
||||
}
|
||||
```
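
Both columns are JSON stored as text, so they decode directly with `encoding/json`. A minimal sketch follows; the `TrackFile` field names and `decodeTrackFile` are illustrative, and treating an empty string as a NULL column is an assumption about how the caller scans the row.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TrackFile holds the decoded JSON columns; field names are illustrative.
type TrackFile struct {
	Languages   []string
	ArtistRoles map[string][]string
}

// decodeTrackFile parses the two JSON text columns.
// Empty strings (standing in for NULL columns) leave the field nil.
func decodeTrackFile(languages, roles string) (TrackFile, error) {
	var tf TrackFile
	if languages != "" {
		if err := json.Unmarshal([]byte(languages), &tf.Languages); err != nil {
			return tf, fmt.Errorf("language_of_performance: %w", err)
		}
	}
	if roles != "" {
		if err := json.Unmarshal([]byte(roles), &tf.ArtistRoles); err != nil {
			return tf, fmt.Errorf("artist_roles: %w", err)
		}
	}
	return tf, nil
}

func main() {
	tf, err := decodeTrackFile(`["en"]`, `{"0TnOYISbd1XYRBk9myaseg": ["performer", "composer"]}`)
	fmt.Println(tf.Languages, tf.ArtistRoles["0TnOYISbd1XYRBk9myaseg"], err)
}
```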
|
||||
|
||||
**Common roles:**
|
||||
- `performer` - Main performer
|
||||
- `composer` - Music composer
|
||||
- `lyricist` - Lyrics writer
|
||||
- `producer` - Producer
|
||||
- `engineer` - Recording engineer
|
||||
- `mixer` - Mix engineer
|
||||
|
||||
**Estimated rows:** 256 million (one per track)
|
||||
|
||||
## Query Patterns
|
||||
|
||||
### Individual Track Lookup
|
||||
|
||||
```sql
|
||||
-- Step 1: Fetch track + album (single JOIN)
|
||||
SELECT
|
||||
t.id, t.name, t.isrc, t.duration_ms, t.explicit,
|
||||
t.track_number, t.disc_number, t.popularity, t.preview_url,
|
||||
a.id AS album_id, a.name AS album_name, a.album_type,
|
||||
a.label, a.release_date, a.release_date_precision,
|
||||
a.external_id_upc, a.total_tracks, a.copyright_c, a.copyright_p
|
||||
FROM tracks t
|
||||
JOIN albums a ON t.album_rowid = a.rowid
|
||||
WHERE t.id = ?
|
||||
|
||||
-- Step 2: Fetch album images
|
||||
SELECT url, width, height
|
||||
FROM album_images
|
||||
WHERE album_id = ?
|
||||
ORDER BY width DESC
|
||||
|
||||
-- Step 3: Fetch album artists
|
||||
SELECT a.id, a.name, a.followers_total, a.popularity
|
||||
FROM artists a
|
||||
JOIN artist_albums aa ON a.id = aa.artist_id
|
||||
WHERE aa.album_id = ?
|
||||
ORDER BY aa.index_in_album
|
||||
|
||||
-- Step 4: Fetch track artists
|
||||
SELECT a.id, a.name, a.followers_total, a.popularity
|
||||
FROM artists a
|
||||
JOIN track_artists ta ON a.id = ta.artist_id
|
||||
WHERE ta.track_id = ?
|
||||
|
||||
-- Step 5: Fetch artist genres (for each artist)
|
||||
SELECT genre
|
||||
FROM artist_genres
|
||||
WHERE artist_id = ?
|
||||
|
||||
-- Step 6: Fetch artist images (for each artist)
|
||||
SELECT url, width, height
|
||||
FROM artist_images
|
||||
WHERE artist_id = ?
|
||||
ORDER BY width DESC
|
||||
|
||||
-- Step 7: Fetch track files (from track_files.sqlite3)
|
||||
SELECT has_lyrics, original_title, version_title,
|
||||
language_of_performance, artist_roles
|
||||
FROM track_files
|
||||
WHERE track_id = ?
|
||||
```
|
||||
|
||||
**Total queries for single track:** 7+ (depending on number of artists)
|
||||
|
||||
### Batch ISRC Lookup
|
||||
|
||||
```sql
|
||||
-- Step 1: Fetch all tracks by ISRC (single query with IN clause)
|
||||
SELECT
|
||||
t.id, t.name, t.isrc, t.duration_ms, t.explicit,
|
||||
t.track_number, t.disc_number, t.popularity, t.preview_url,
|
||||
a.id AS album_id, a.name AS album_name, a.album_type,
|
||||
a.label, a.release_date, a.release_date_precision,
|
||||
a.external_id_upc, a.total_tracks, a.copyright_c, a.copyright_p
|
||||
FROM tracks t
|
||||
JOIN albums a ON t.album_rowid = a.rowid
|
||||
WHERE t.isrc IN (?, ?, ?, ...) -- Up to 400 placeholders
|
||||
|
||||
-- Step 2: Batch fetch album images (all albums at once)
|
||||
SELECT album_id, url, width, height
|
||||
FROM album_images
|
||||
WHERE album_id IN (?, ?, ?, ...)
|
||||
ORDER BY album_id, width DESC
|
||||
|
||||
-- Step 3: Batch fetch album artists
|
||||
SELECT aa.album_id, a.id, a.name, a.followers_total, a.popularity, aa.index_in_album
|
||||
FROM artists a
|
||||
JOIN artist_albums aa ON a.id = aa.artist_id
|
||||
WHERE aa.album_id IN (?, ?, ?, ...)
|
||||
ORDER BY aa.album_id, aa.index_in_album
|
||||
|
||||
-- Step 4: Batch fetch track artists
|
||||
SELECT ta.track_id, a.id, a.name, a.followers_total, a.popularity
|
||||
FROM artists a
|
||||
JOIN track_artists ta ON a.id = ta.artist_id
|
||||
WHERE ta.track_id IN (?, ?, ?, ...)
|
||||
|
||||
-- Step 5: Batch fetch artist genres
|
||||
SELECT artist_id, genre
|
||||
FROM artist_genres
|
||||
WHERE artist_id IN (?, ?, ?, ...)
|
||||
|
||||
-- Step 6: Batch fetch artist images
|
||||
SELECT artist_id, url, width, height
|
||||
FROM artist_images
|
||||
WHERE artist_id IN (?, ?, ?, ...)
|
||||
ORDER BY artist_id, width DESC
|
||||
|
||||
-- Step 7: Batch fetch track files
|
||||
SELECT track_id, has_lyrics, original_title, version_title,
|
||||
language_of_performance, artist_roles
|
||||
FROM track_files
|
||||
WHERE track_id IN (?, ?, ?, ...)
|
||||
```
|
||||
|
||||
**Total queries for 400 tracks:** 7 (vs 2,800+ for individual lookups)
|
||||
|
||||
**Performance gain:** 400x fewer queries
|
||||
|
||||
### Search Queries
|
||||
|
||||
**Track search:**
|
||||
```sql
|
||||
SELECT id, name, isrc, duration_ms, popularity, album_rowid
|
||||
FROM tracks
|
||||
WHERE name LIKE ? COLLATE NOCASE -- ? = '%query%'
|
||||
ORDER BY popularity DESC
|
||||
LIMIT ?
|
||||
```
|
||||
|
||||
**Artist search:**
|
||||
```sql
|
||||
SELECT id, name, followers_total, popularity
|
||||
FROM artists
|
||||
WHERE name LIKE ? COLLATE NOCASE -- ? = '%query%'
|
||||
ORDER BY followers_total DESC
|
||||
LIMIT ?
|
||||
```
|
||||
|
||||
**Search characteristics:**
|
||||
- `LIKE '%query%'` starts with a wildcard, so it cannot use a B-tree index on `name` (full table scan)
|
||||
- `COLLATE NOCASE` for case-insensitive matching
|
||||
- Ordered by popularity/followers (most relevant first)
|
||||
- Limited to 50 results max
|
||||
- 10-second timeout via context deadline
|
||||
|
||||
**Performance concern:** Searching 256M tracks with `LIKE %query%` is slow. Full-text search (FTS5) would be faster but not implemented.
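
The bound parameter for these queries has to be built from user input. The sketch below shows one way to do that while escaping LIKE metacharacters and enforcing the 50-result cap. It is an assumption, not the project's code: `likePattern` and `clampLimit` are hypothetical helpers, and the escaping only takes effect if the SQL adds an `ESCAPE '\'` clause, which the queries shown above do not include.

```go
package main

import (
	"fmt"
	"strings"
)

const maxSearchResults = 50 // documented cap on search results

// likePattern wraps a user query in % wildcards and escapes LIKE metacharacters,
// so a literal "%" or "_" in the query is not treated as a wildcard.
// The SQL must then read: name LIKE ? ESCAPE '\'
func likePattern(q string) string {
	r := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
	return "%" + r.Replace(q) + "%"
}

// clampLimit keeps the requested limit within 1..maxSearchResults.
func clampLimit(n int) int {
	if n < 1 {
		return 1
	}
	if n > maxSearchResults {
		return maxSearchResults
	}
	return n
}

func main() {
	fmt.Println(likePattern("100%_live"), clampLimit(500))
}
```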
|
||||
|
||||
### Album Tracks Lookup
|
||||
|
||||
```sql
|
||||
-- Fetch all tracks for an album
|
||||
SELECT t.id, t.name, t.isrc, t.duration_ms, t.explicit,
|
||||
t.track_number, t.disc_number, t.popularity, t.preview_url
|
||||
FROM tracks t
|
||||
WHERE t.album_rowid = (
|
||||
SELECT rowid FROM albums WHERE id = ?
|
||||
)
|
||||
ORDER BY t.disc_number, t.track_number
|
||||
```
|
||||
|
||||
**Ordering:** Disc number first, then track number (preserves album order)
|
||||
|
||||
## Data Enrichment Strategy
|
||||
|
||||
### Enrichment Pipeline
|
||||
|
||||
```
|
||||
1. Fetch base entity (track/album/artist)
|
||||
↓
|
||||
2. Collect related entity IDs
|
||||
↓
|
||||
3. Batch fetch related entities
|
||||
↓
|
||||
4. Assemble nested structures
|
||||
↓
|
||||
5. Return enriched object
|
||||
```
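
Steps 2 to 4 of the pipeline can be sketched in a few lines. The types are simplified stand-ins, and `uniqueIDs`, `enrichTracks`, and the `fetch` callback are hypothetical names; `fetch` takes the place of a batch query such as `batchGetTrackArtists`.

```go
package main

import "fmt"

// Artist and Track are simplified stand-ins for the real models.
type Artist struct{ ID, Name string }

type Track struct {
	ID      string
	Artists []Artist
}

// uniqueIDs returns each ID once, preserving first-seen order (step 2).
func uniqueIDs(ids []string) []string {
	seen := make(map[string]struct{}, len(ids))
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		if _, dup := seen[id]; dup {
			continue
		}
		seen[id] = struct{}{}
		out = append(out, id)
	}
	return out
}

// enrichTracks collects IDs, issues one batch fetch (step 3),
// and attaches the results to each track (step 4).
func enrichTracks(tracks []*Track, fetch func(ids []string) map[string][]Artist) {
	ids := make([]string, 0, len(tracks))
	for _, t := range tracks {
		ids = append(ids, t.ID)
	}
	byTrack := fetch(uniqueIDs(ids)) // one call for all tracks
	for _, t := range tracks {
		t.Artists = byTrack[t.ID]
	}
}

func main() {
	tracks := []*Track{{ID: "t1"}, {ID: "t2"}, {ID: "t1"}}
	enrichTracks(tracks, func(ids []string) map[string][]Artist {
		fmt.Println("batch fetch for", ids)
		return map[string][]Artist{"t1": {{ID: "a1", Name: "Queen"}}}
	})
	fmt.Println(len(tracks[0].Artists), len(tracks[1].Artists))
}
```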
|
||||
|
||||
### Batch Optimization Functions
|
||||
|
||||
**Implementation in db.go (907 lines):**
|
||||
|
||||
```go
|
||||
// Batch fetch album images for multiple albums
|
||||
func (d *Database) batchGetAlbumImages(albumIDs []string) map[string][]Image {
|
||||
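// Nothing to look up; strings.Repeat would panic on a negative count
if len(albumIDs) == 0 {
return map[string][]Image{}
}

// Build IN clause
placeholders := strings.Repeat("?,", len(albumIDs)-1) + "?"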
|
||||
query := fmt.Sprintf(`
|
||||
SELECT album_id, url, width, height
|
||||
FROM album_images
|
||||
WHERE album_id IN (%s)
|
||||
ORDER BY album_id, width DESC
|
||||
`, placeholders)
|
||||
|
||||
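// Execute query (database/sql expects ...any, so convert the string IDs first)
args := make([]any, len(albumIDs))
for i, id := range albumIDs {
args[i] = id
}
rows, err := d.mainDB.Query(query, args...)
if err != nil {
return nil // simplified: this signature has no error return
}
defer rows.Close()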
|
||||
|
||||
// Group by album_id
|
||||
result := make(map[string][]Image)
|
||||
for rows.Next() {
|
||||
var albumID string
|
||||
var img Image
|
||||
rows.Scan(&albumID, &img.URL, &img.Width, &img.Height)
|
||||
result[albumID] = append(result[albumID], img)
|
||||
}
|
||||
|
||||
return result
|
||||
}
|
||||
```
|
||||
|
||||
**Similar functions:**
|
||||
- `batchGetAlbumArtists(albumIDs []string) map[string][]Artist`
|
||||
- `batchGetTrackArtists(trackIDs []string) map[string][]Artist`
|
||||
- `batchGetArtistGenres(artistIDs []string) map[string][]string`
|
||||
- `batchGetArtistImages(artistIDs []string) map[string][]Image`
|
||||
- `batchEnrichTrackFiles(trackIDs []string) map[string]*TrackFile`
|
||||
|
||||
**Pattern:**
|
||||
1. Build IN clause with placeholders
|
||||
2. Execute single query for all IDs
|
||||
3. Group results by parent ID
|
||||
4. Return map for O(1) lookup
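
Steps 1 and 2 of this pattern recur in every batch function, so they lend themselves to two small helpers. The names `placeholders` and `toArgs` are hypothetical; the sketch also covers the empty-input case, where `strings.Repeat` would otherwise panic on a negative count.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholders returns "?,?,?" for n values, or "" when n is zero or negative.
func placeholders(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.Repeat("?,", n-1) + "?"
}

// toArgs converts string IDs to []any, the element type database/sql
// expects for variadic query arguments.
func toArgs(ids []string) []any {
	args := make([]any, len(ids))
	for i, id := range ids {
		args[i] = id
	}
	return args
}

func main() {
	ids := []string{"album_id_1", "album_id_2", "album_id_3"}
	query := fmt.Sprintf("SELECT album_id, url FROM album_images WHERE album_id IN (%s)", placeholders(len(ids)))
	fmt.Println(query)
	fmt.Println(len(toArgs(ids)))
}
```

Callers would still need to skip the query entirely when the ID list is empty, since `IN ()` is not portable SQL.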
|
||||
|
||||
### Why Batch Matters
|
||||
|
||||
**Without batching (400 tracks):**
|
||||
- 400 track queries (each already joins its album, so there is no separate album query)
|
||||
- 400 album image queries
|
||||
- 400 album artist queries
|
||||
- 400 track artist queries
|
||||
- ~800 artist genre queries (2 artists per track avg)
|
||||
- ~800 artist image queries
|
||||
- 400 track file queries
|
||||
- **Total: ~3,600 queries**
|
||||
|
||||
**With batching (400 tracks):**
|
||||
- 1 batch track query
|
||||
- 1 batch album image query
|
||||
- 1 batch album artist query
|
||||
- 1 batch track artist query
|
||||
- 1 batch artist genre query
|
||||
- 1 batch artist image query
|
||||
- 1 batch track file query
|
||||
- **Total: 7 queries**
|
||||
|
||||
**Performance gain: 514x fewer queries**
|
||||
|
||||
## Data Provenance
|
||||
|
||||
### Source
|
||||
|
||||
**Disclaimer from repository:**
|
||||
> "This project is not affiliated with Spotify."
|
||||
|
||||
**Implications:**
|
||||
- Data source unclear (likely scraped or obtained from third party)
|
||||
- Legal status uncertain
|
||||
- No official Spotify endorsement
|
||||
|
||||
### Data Freshness
|
||||
|
||||
**Static snapshot:**
|
||||
- No update mechanism
|
||||
- Data frozen at time of database creation
|
||||
- No real-time sync with Spotify
|
||||
|
||||
**Staleness concerns:**
|
||||
- New releases not included
|
||||
- Popularity scores outdated
|
||||
- Artist follower counts stale
|
||||
- Deleted tracks still present
|
||||
|
||||
**Mitigation:**
|
||||
- Treat as historical snapshot
|
||||
- Complement with real-time APIs for fresh data
|
||||
- Periodically obtain updated database (if available)
|
||||
|
||||
### Data Quality
|
||||
|
||||
**Strengths:**
|
||||
- 256M tracks (massive coverage)
|
||||
- Rich metadata (genres, images, roles)
|
||||
- ISRC codes for cross-referencing
|
||||
- Popularity/follower metrics
|
||||
|
||||
**Weaknesses:**
|
||||
- No data validation visible
|
||||
- Potential duplicates (not deduplicated)
|
||||
- Missing ISRCs for some tracks
|
||||
- Incomplete artist roles
|
||||
|
||||
## Storage Requirements
|
||||
|
||||
### Disk Space
|
||||
|
||||
| Component | Size | Compressible |
|
||||
|-----------|------|--------------|
|
||||
| main_database.sqlite3 | ~117GB | Minimal (already compact) |
|
||||
| track_files.sqlite3 | ~99GB | Minimal (JSON fields) |
|
||||
| **Total** | **~216GB** | - |
|
||||
|
||||
**Recommendations:**
|
||||
- SSD strongly recommended (HDD too slow for 256M rows)
|
||||
- NVMe for best performance
|
||||
- RAID not necessary (read-only, can rebuild from backup)
|
||||
|
||||
### Memory Usage
|
||||
|
||||
**SQLite memory:**
|
||||
- Page cache: 64MB per connection
|
||||
- 8 connections: 512MB cache total
|
||||
- Memory-mapped I/O: 1GB per database (2GB total)
|
||||
- **Total: ~2.5GB minimum**
|
||||
|
||||
**Application memory:**
|
||||
- Go runtime: ~50MB
|
||||
- Rate limiter map: Grows unbounded (leak)
|
||||
- Request buffers: ~10MB per concurrent request
|
||||
- **Total: ~100MB + leak**
|
||||
|
||||
**Recommended RAM:** 4GB+ (2.5GB for SQLite + 1.5GB for OS/app)
|
||||
|
||||
### I/O Characteristics
|
||||
|
||||
**Read patterns:**
|
||||
- Random reads (track lookups by ID/ISRC)
|
||||
- Sequential scans (search queries)
|
||||
- Batch reads (IN clause queries)
|
||||
|
||||
**Write patterns:**
|
||||
- None (read-only)
|
||||
|
||||
**Cache effectiveness:**
|
||||
- Hot data (popular tracks): High hit rate
|
||||
- Cold data (obscure tracks): Low hit rate
|
||||
- Search queries: Low hit rate (full scans)
|
||||
|
||||
## Database Maintenance
|
||||
|
||||
### No Maintenance Required
|
||||
|
||||
**Read-only benefits:**
|
||||
- No VACUUM needed (no fragmentation from deletes)
|
||||
- No ANALYZE needed (statistics static)
|
||||
- No REINDEX needed (indexes don't degrade)
|
||||
- No WAL checkpoint (journal disabled)
|
||||
|
||||
### Backup Strategy
|
||||
|
||||
**Simple backup:**
|
||||
```bash
|
||||
# Copy files (database must be idle)
|
||||
cp main_database.sqlite3 backup/
|
||||
cp track_files.sqlite3 backup/
|
||||
```
|
||||
|
||||
**Online backup (while running):**
|
||||
```bash
|
||||
# SQLite online backup via the sqlite3 CLI's built-in .backup command
|
||||
sqlite3 main_database.sqlite3 ".backup backup/main_database.sqlite3"
|
||||
```
|
||||
|
||||
**Restore:**
```bash
# Simply replace files (stop the service first)
cp backup/main_database.sqlite3 .
cp backup/track_files.sqlite3 .
```

### Integrity Checks

**Verify database integrity:**
```bash
sqlite3 main_database.sqlite3 "PRAGMA integrity_check;"
sqlite3 track_files.sqlite3 "PRAGMA integrity_check;"
```

**Expected output:** `ok`

**Run periodically:** Monthly or after hardware issues

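
For scheduled checks, the two commands above can be wrapped in a small script. A sketch, assuming Python's standard `sqlite3` module; `integrity_ok` is a hypothetical helper name. Note that a full integrity check reads the entire file, so it can take a long time on databases of this size.

```python
import sqlite3


def integrity_ok(db_path: str) -> bool:
    """Return True if PRAGMA integrity_check reports a single 'ok' row."""
    # Read-only URI: the check must never modify the snapshot.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check;").fetchall()
    finally:
        conn.close()
    return rows == [("ok",)]
```
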
## Performance Tuning

### Query Optimization

**Indexes already present:**
- Primary keys on all ID columns
- Foreign key indexes (album_rowid, artist_id, etc.)
- Search indexes (tracks.name, artists.name)

**Missing indexes (potential improvements):**
- Full-text search index (FTS5) on track/artist names
- Composite index on (popularity, name) for sorted searches

### Connection Pool Tuning

**Current settings:**
```go
// db is the *sql.DB handle for one database
db.SetMaxOpenConns(8)    // MaxOpenConns: 8
db.SetMaxIdleConns(8)    // MaxIdleConns: 8
db.SetConnMaxLifetime(0) // ConnMaxLifetime: 0 (connections are never recycled)
```

**Tuning considerations:**
- Increase MaxOpenConns for higher concurrency (16-32) if the host has that many CPU cores
- Monitor CPU usage (SQLite is CPU-bound for searches)
- No benefit from raising MaxOpenConns beyond the CPU core count

### Cache Tuning

**Current cache:** 64MB per connection (512MB total)

**Increase cache:**
```
_cache_size=-128000 // 128MB per connection
```

**Tradeoff:** More memory usage vs fewer disk reads

**Recommendation:** Monitor cache hit rate, increase if low

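
A negative `cache_size` value is interpreted by SQLite as a size in KiB rather than a page count, so `-128000` is about 125 MiB, slightly under a full 128 MiB. A small sketch of the conversion (helper names are illustrative, and the 8-connection pool size is the one quoted above):

```python
def cache_size_pragma(mebibytes: int) -> int:
    """Return the negative cache_size value (in KiB) for a cache of this size."""
    if mebibytes <= 0:
        raise ValueError("cache size must be positive")
    return -(mebibytes * 1024)


def total_cache_mib(per_connection_mib: int, connections: int) -> int:
    """Worst-case page cache across the whole connection pool."""
    return per_connection_mib * connections


print(cache_size_pragma(128))   # -131072
print(total_cache_mib(128, 8))  # 1024
```

Doubling the per-connection cache therefore doubles the pool-wide worst case, from 512MB to 1GB with 8 connections.
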
### Memory-Mapped I/O Tuning

**Current mmap:** 1GB per database

**Increase mmap:**
```
_mmap_size=2147483648 // 2GB
```

**Tradeoff:** More virtual memory vs faster reads

**Recommendation:** Set to the database size if RAM allows (not feasible for the 117GB main database on a typical host)

## Data Model Comparison

### vs Spotify Web API

| Feature | Music Metadata API | Spotify Web API |
|---------|-------------------|-----------------|
| Track ID format | Spotify-compatible | Spotify IDs |
| ISRC support | Yes | Yes |
| Popularity | Static snapshot | Real-time |
| Followers | Static snapshot | Real-time |
| Images | External URLs | External URLs |
| Genres | Artist-level | Artist-level |
| Lyrics | Flag only | Not available |
| Artist roles | Detailed | Limited |
| Languages | Supported | Not available |

### vs MusicBrainz

| Feature | Music Metadata API | MusicBrainz |
|---------|-------------------|-------------|
| Identifier | Spotify IDs, ISRC | MBIDs |
| Dataset size | 256M tracks | ~40M recordings |
| Popularity | Yes | No |
| Followers | Yes | No |
| Images | Yes (external) | Yes (Cover Art Archive) |
| Genres | Yes | Yes (tags) |
| Relationships | Limited | Extensive |
| Credits | Artist roles | Detailed credits |
| Updates | Static | Community-driven |

## Integration Considerations

### Joining with Other Databases

**ISRC as common key:**
```sql
-- Join with local library
SELECT l.file_path, m.name, m.popularity
FROM local_library l
JOIN music_metadata_api.tracks m ON l.isrc = m.isrc;
```

**ISRC as bridge to MusicBrainz MBIDs:**
```sql
-- Join with MusicBrainz (recording.gid holds the MBID)
SELECT mb.gid AS mbid, mm.name, mm.popularity
FROM musicbrainz.recording mb
JOIN musicbrainz.isrc i ON mb.id = i.recording
JOIN music_metadata_api.tracks mm ON i.isrc = mm.isrc;
```

### Data Export

**Export to JSON:**
```bash
sqlite3 main_database.sqlite3 <<EOF
.mode json
.output tracks.json
SELECT * FROM tracks LIMIT 1000;
EOF
```

**Export to CSV:**
```bash
sqlite3 main_database.sqlite3 <<EOF
.mode csv
.output tracks.csv
SELECT id, name, isrc, popularity FROM tracks;
EOF
```

### Data Import

**Import from CSV:**
```bash
sqlite3 new_database.sqlite3 <<EOF
.mode csv
.import tracks.csv tracks
EOF
```

**Bulk insert from application:**
```go
// Error handling omitted for brevity; check every returned error in real code.
// A single transaction and a prepared statement keep bulk inserts fast.
tx, _ := db.Begin()
stmt, _ := tx.Prepare("INSERT INTO tracks VALUES (?, ?, ?, ...)")
for _, track := range tracks {
    stmt.Exec(track.ID, track.Name, track.ISRC, ...)
}
tx.Commit()
```

## Limitations

### No Write Operations

**Implications:**
- Can't add new tracks
- Can't update popularity scores
- Can't delete duplicates
- Can't fix data errors

**Workarounds:**
- Create separate writable database for local additions
- Use views to merge read-only + writable data
- Periodically obtain updated database snapshot

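
One way to realize the view-based workaround is to attach the read-only snapshot to a small writable database and union the two through a temporary view. A minimal sketch in Python, assuming the snapshot's `tracks` table exposes `id`, `name` and `isrc` columns (as the export examples suggest); the `local_tracks` table, the `all_tracks` view and the function name are hypothetical:

```python
import sqlite3


def open_merged(snapshot_path: str, local_path: str) -> sqlite3.Connection:
    """Open a writable local database with the read-only snapshot attached."""
    # uri=True lets the ATTACH below use a file: URI with mode=ro.
    conn = sqlite3.connect(local_path, uri=True)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS local_tracks "
        "(id TEXT PRIMARY KEY, name TEXT, isrc TEXT)"
    )
    conn.commit()
    conn.execute(
        "ATTACH DATABASE ? AS snapshot", (f"file:{snapshot_path}?mode=ro",)
    )
    # Snapshot rows and local additions appear as one logical table.
    conn.execute(
        "CREATE TEMP VIEW all_tracks AS "
        "SELECT id, name, isrc FROM snapshot.tracks "
        "UNION ALL "
        "SELECT id, name, isrc FROM main.local_tracks"
    )
    return conn
```

A `TEMP` view is used because SQLite does not allow a persistent view to reference tables in another attached database.
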
### No Full-Text Search

**Current search:** `LIKE %query%` (slow)

**FTS5 alternative:**
```sql
-- Create FTS5 virtual table (requires writable database)
CREATE VIRTUAL TABLE tracks_fts USING fts5(name, content=tracks);
-- Populate the index; rowids must match the content table's rowids
INSERT INTO tracks_fts(rowid, name) SELECT rowid, name FROM tracks;

-- Fast search
SELECT * FROM tracks_fts WHERE name MATCH 'bohemian';
```

**Limitation:** Can't create FTS5 on read-only database

**Workaround:** Create separate FTS5 database, sync periodically

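
The separate-database workaround could look roughly like this. A sketch in Python, assuming the bundled SQLite was compiled with FTS5 (common, but not guaranteed) and that `tracks` has `id` and `name` columns; the file names, the `track_id` column and both function names are hypothetical. It rebuilds the index from scratch on each run, which is the simplest form of periodic sync:

```python
import sqlite3


def build_fts_index(snapshot_path: str, fts_path: str) -> None:
    """Build a standalone FTS5 index of track names in a writable database."""
    conn = sqlite3.connect(fts_path, uri=True)
    try:
        conn.execute(
            "ATTACH DATABASE ? AS snapshot", (f"file:{snapshot_path}?mode=ro",)
        )
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS tracks_fts "
            "USING fts5(name, track_id UNINDEXED)"
        )
        conn.execute("DELETE FROM tracks_fts")  # full rebuild
        conn.execute(
            "INSERT INTO tracks_fts (name, track_id) "
            "SELECT name, id FROM snapshot.tracks"
        )
        conn.commit()
    finally:
        conn.close()


def search(fts_path: str, query: str, limit: int = 20) -> list[str]:
    """Return track IDs whose names match the FTS5 query, best match first."""
    conn = sqlite3.connect(fts_path)
    try:
        rows = conn.execute(
            "SELECT track_id FROM tracks_fts "
            "WHERE tracks_fts MATCH ? ORDER BY rank LIMIT ?",
            (query, limit),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

The returned IDs would then be resolved against the read-only snapshot with ordinary primary-key lookups.
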
### No Relationships Beyond Basics

**Missing relationships:**
- Track-to-track (similar tracks, remixes)
- Album-to-album (compilations, deluxe editions)
- Artist-to-artist (collaborations, bands)

**Workaround:** Build relationship graph in separate database
@@ -0,0 +1,761 @@
# Music Metadata API - Evaluation

## Executive Summary

Music Metadata API is a **simple, focused, self-contained** service for querying metadata on 256 million music tracks. It excels at batch lookups and ISRC-based queries but lacks authentication, testing, and real-time data updates.

**Best for:** Self-hosted metadata enrichment, high-volume batch processing, ISRC resolution
**Not suitable for:** Real-time data, production systems requiring authentication, mission-critical applications without testing

## Strengths

### 1. Massive Dataset

**256 million tracks** across two SQLite databases (~216GB)

**Coverage:**
- Tracks with ISRC codes
- Albums with artwork, labels, release dates
- Artists with genres, follower counts, popularity
- Extended metadata (lyrics flags, languages, artist roles)

**Comparison:**
- Spotify Web API: Full catalog (real-time)
- MusicBrainz: ~40M recordings
- Discogs: ~15M releases

**Value:** Comprehensive coverage for metadata enrichment without third-party API rate limits.

### 2. Extremely Simple Architecture

**No framework, no ORM, minimal dependencies:**
- Go stdlib for HTTP, JSON, database
- 2 external packages (sqlite driver, rate limiter)
- ~1,100 lines of code
- Single binary deployment

**Benefits:**
- Easy to understand and modify
- Fast compilation
- No framework lock-in
- Minimal attack surface

**Comparison:**
- Typical web service: 10+ dependencies, framework overhead
- Music Metadata API: 2 dependencies, otherwise stdlib only

### 3. High-Performance Batch API

**Batch endpoint:** Process up to 400 items per request

**Performance gain:**
- Individual requests (sequential): 400 × ~50ms = 20 seconds
- Batch request: ~200-500ms total
- **40-100x faster**

**Query optimization:**
- Without batching: 2,800+ queries for 400 tracks
- With batching: 7 queries for 400 tracks
- **400x fewer queries**

**Use case:** Enriching large music libraries efficiently.

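
The query reduction comes from fetching many rows per statement with an `IN (...)` clause instead of issuing one statement per ID. A minimal sketch of that pattern in Python against SQLite; the column list and function name are assumptions for illustration, and the service's actual Go queries may differ:

```python
import sqlite3


def fetch_tracks_by_id(
    conn: sqlite3.Connection, track_ids: list[str]
) -> dict[str, tuple]:
    """Fetch many tracks with a single IN (...) query, keyed by track ID."""
    unique_ids = list(dict.fromkeys(track_ids))  # de-duplicate, keep order
    if not unique_ids:
        return {}
    # One bound parameter per ID; values are never interpolated into the SQL.
    placeholders = ", ".join("?" for _ in unique_ids)
    sql = f"SELECT id, name, isrc FROM tracks WHERE id IN ({placeholders})"
    return {row[0]: row for row in conn.execute(sql, unique_ids)}
```

With 400 IDs this issues one statement. SQLite's default limit on bound parameters is at least 999, so a 400-item batch fits in a single query.
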
### 4. Pure Go (No CGO)

**CGO_ENABLED=0** - No C dependencies

**Benefits:**
- Cross-compilation trivial (GOOS/GOARCH)
- No C toolchain required
- Smaller attack surface
- Easier deployment (static binary)

**Tradeoff:** Larger binary size vs CGO SQLite driver (~2MB vs ~500KB)

### 5. Read-Only Safety

**Databases opened in read-only mode:**
- No accidental writes
- No data corruption risk
- Safe concurrent reads
- No write locks

**Connection parameters and PRAGMAs:**
```
mode=ro
_journal_mode=off
_query_only=true
```

**Benefit:** Multiple instances can share database files safely.

### 6. OpenAPI Documentation

**Comprehensive OpenAPI 3.1 spec:**
- All endpoints documented
- Request/response schemas
- Example payloads
- Interactive Swagger UI at `/docs`

**Value:** Self-documenting API, easy integration.

### 7. MIT License

**Permissive license:**
- Free for commercial use
- Copyright and license notice must be kept in copies (no other attribution required)
- Modify and redistribute freely

**Comparison:**
- Spotify Web API: Proprietary, rate limited
- MusicBrainz: CC0/Public Domain (data), GPL (server)

### 8. Easy Deployment

**Multiple deployment options:**
- Standalone binary (single executable)
- Docker container (official image)
- Kubernetes (example manifests)
- Cloud platforms (ECS, Cloud Run, ACI)

**Minimal requirements:**
- 216GB disk (databases)
- 4GB RAM
- 1 CPU core

**No external dependencies:**
- No database server (SQLite embedded)
- No cache server (SQLite cache)
- No message queue
- No authentication service

## Weaknesses

### 1. Zero Test Coverage

**No test files, no test framework, no CI testing**

**Risks:**
- No regression protection
- Bugs discovered in production
- Difficult to refactor safely
- No documentation via tests

**Evidence:**
- `.gitignore` includes `coverage.out` (suggests testing was planned but never implemented)
- GitHub Actions workflow has no test step

**Impact:** High risk for production use without extensive manual testing.

### 2. No Authentication

**Public API with no access control:**
- No OAuth
- No API keys
- No rate limiting per user (only per IP)
- No usage tracking per user

**Risks:**
- Abuse (no quotas beyond the per-IP rate limit)
- No accountability
- No quota enforcement
- Data scraping

**Workarounds:**
- Deploy behind reverse proxy with auth (nginx, Caddy)
- Use API gateway (Kong, Tyk)
- Implement custom middleware

**Impact:** Not suitable for public internet deployment without additional security layer.

### 3. Naive Health Check

**Health endpoint always returns OK:**
```go
func handleHealth(w http.ResponseWriter, r *http.Request) {
    json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}
```

**Problem:** Doesn't verify database connectivity

**Scenario:**
- Database file deleted/corrupted
- Health check returns 200 OK
- Actual queries fail with 500 errors
- Monitoring systems don't detect failure

**Impact:** False "healthy" signals in monitoring, delayed incident detection.

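
A health check that touches each database would close this gap. The service is written in Go, but the logic is small enough to sketch in Python here: open each file read-only, run a trivial query that forces SQLite to read the file header, and report 503 if anything fails. The function name and response shape are assumptions, not the project's code:

```python
import sqlite3


def check_health(db_paths: list[str]) -> tuple[int, dict]:
    """Return (HTTP status, JSON body); 200 only if every database responds."""
    failures = {}
    for path in db_paths:
        try:
            # mode=ro fails fast if the file is missing instead of creating it.
            conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True, timeout=1.0)
            try:
                # Reading the schema table forces SQLite to open and parse the file.
                conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
            finally:
                conn.close()
        except sqlite3.Error as exc:
            failures[path] = str(exc)
    if failures:
        return 503, {"status": "unavailable", "errors": failures}
    return 200, {"status": "ok"}
```

A load balancer or orchestrator probing this result would then stop routing traffic to an instance whose database files are missing or unreadable.
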
### 4. Rate Limiter Memory Leak

**Visitor map grows unbounded:**
```go
type RateLimiter struct {
    visitors map[string]*rate.Limiter  // Never cleaned up
    mu       sync.RWMutex
}
```

**Impact:**
- Long-running servers accumulate IPs
- Memory usage grows over time
- 1M unique IPs = ~100MB leak

**Workaround:** Restart server periodically

**Fix required:** Implement visitor cleanup (remove inactive IPs after 24 hours)

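
The cleanup fix amounts to recording when each IP was last seen and periodically evicting idle entries. A language-neutral sketch of that bookkeeping in Python (the real limiter is Go; `VisitorRegistry` and its methods are hypothetical names):

```python
import threading
import time
from typing import Optional


class VisitorRegistry:
    """Tracks when each client IP was last seen so stale entries can be evicted."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60) -> None:
        self.ttl_seconds = ttl_seconds
        self._last_seen: dict[str, float] = {}
        self._lock = threading.Lock()

    def touch(self, ip: str, now: Optional[float] = None) -> None:
        """Record a request from ip (call wherever the limiter is looked up)."""
        with self._lock:
            self._last_seen[ip] = time.monotonic() if now is None else now

    def evict_stale(self, now: Optional[float] = None) -> int:
        """Drop entries idle for longer than the TTL; return how many were removed."""
        current = time.monotonic() if now is None else now
        with self._lock:
            stale = [
                ip for ip, seen in self._last_seen.items()
                if current - seen > self.ttl_seconds
            ]
            for ip in stale:
                del self._last_seen[ip]
            return len(stale)

    def __len__(self) -> int:
        with self._lock:
            return len(self._last_seen)
```

In the Go service the equivalent would plausibly be a background goroutine that runs the eviction on a timer, storing the last-seen time next to each `*rate.Limiter`.
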
### 5. No CORS Support

**No CORS headers:**
- Browser-based clients blocked
- Can't call from web apps directly
- OPTIONS preflight requests fail

**Workarounds:**
- Add CORS middleware (custom implementation)
- Use server-side proxy
- Deploy API on same origin as web app

**Impact:** Limited to server-side integrations.

### 6. No Metrics/Monitoring

**No instrumentation:**
- No Prometheus metrics
- No request counters
- No latency histograms
- No error rate tracking

**Visibility gaps:**
- Can't track usage patterns
- Can't identify slow endpoints
- Can't detect error spikes
- No performance baselines

**Workarounds:**
- Parse logs for metrics
- Use reverse proxy metrics (nginx)
- Implement custom metrics middleware

**Impact:** Blind operation, difficult to optimize.

### 7. Database Provenance Unclear

**Repository disclaimer:**
> "This project is not affiliated with Spotify."

**Concerns:**
- Data source unclear (likely scraped)
- Legal status uncertain
- No official Spotify endorsement
- Potential copyright issues

**Risks:**
- Takedown requests
- Legal liability
- Data quality unknown
- No support/updates

**Recommendation:** Verify legal compliance before production use.

### 8. No Data Freshness Mechanism

**Static snapshot:**
- No update mechanism
- Data frozen at time of database creation
- No real-time sync with Spotify

**Staleness:**
- New releases not included
- Popularity scores outdated
- Artist follower counts stale
- Deleted tracks still present

**Workarounds:**
- Periodically obtain updated database (if available)
- Complement with real-time APIs for fresh data
- Treat as historical snapshot

**Impact:** Not suitable for applications requiring current data.

### 9. Search Performance

**LIKE %query% on 256M rows:**
- Full table scan (can't use indexes)
- 10-second timeout (can be hit)
- CPU-intensive

**Slow searches:**
- Common words ("love", "the"): 5-10 seconds
- Rare queries: 10+ seconds (full scan)

**Alternative:** SQLite FTS5 (Full-Text Search)
- Requires writable database (not compatible with read-only mode)
- Would need separate FTS5 database

**Impact:** Search is reliable only for queries that find matches quickly; rare terms risk hitting the 10-second timeout.

### 10. Hardcoded Configuration

**All limits/timeouts hardcoded:**
- Rate limit: 100 req/s, 200 burst
- Search timeout: 10 seconds
- Batch limit: 400 items
- Connection pool: 8 connections
- SQLite cache: 64MB

**Problems:**
- No flexibility
- Requires recompilation to change
- No environment-specific config

**Workaround:** Fork and modify code

**Impact:** Limited adaptability to different workloads.

## Use Case Evaluation

### Ideal Use Cases

#### 1. Music Library Enrichment

**Scenario:** Enrich local music library with metadata

**Flow:**
1. Extract ISRCs from audio files (from tags, or via AcoustID fingerprint to MusicBrainz recording to ISRC)
2. Batch lookup ISRCs (400 at a time)
3. Store metadata in local database
4. Display in music player UI

**Why suitable:**
- Batch API optimized for bulk lookups
- ISRC-based lookup (industry standard)
- No third-party API quotas (self-hosted; only the built-in per-IP limiter applies)
- Comprehensive metadata (genres, images, popularity)

**Example:**
```python
import requests

def chunks(items, size):  # yield slices of at most `size` items
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Enrich 10,000 tracks (extract_isrcs_from_library and store_metadata are app-specific)
isrcs = extract_isrcs_from_library()  # 10,000 ISRCs

# Batch lookup (25 requests for 10,000 tracks)
for batch in chunks(isrcs, 400):
    response = requests.post("http://localhost:8080/batch/lookup", json={"isrcs": batch})
    store_metadata(response.json())
```

#### 2. Metadata Aggregator Pipeline

**Scenario:** Combine data from multiple sources (MusicBrainz + Music Metadata API)

**Flow:**
1. Query MusicBrainz for recording by MBID
2. Extract ISRC from MusicBrainz response
3. Lookup ISRC in Music Metadata API
4. Merge metadata (MusicBrainz credits + Spotify-style data)

**Why suitable:**
- Complements MusicBrainz (different data models)
- ISRC as common key
- Fast batch lookups
- No external API dependencies

**Example:**
```python
import requests

# Get MusicBrainz data (`musicbrainz` stands for any MusicBrainz client wrapper)
mb_data = musicbrainz.get_recording(mbid)
isrc = mb_data['isrcs'][0]

# Get Spotify-style data
mm_data = requests.get(f"http://localhost:8080/lookup/isrc/{isrc}").json()

# Merge
merged = {
    "mbid": mbid,
    "isrc": isrc,
    "title": mm_data['name'],
    "popularity": mm_data['popularity'],
    "credits": mb_data['artist-credit'],
    "genres": mm_data['artists'][0]['genres']
}
```

#### 3. Self-Hosted Alternative to Spotify API

**Scenario:** Replace Spotify Web API with local service

**Why suitable:**
- No OAuth complexity
- No third-party API rate limits
- No per-request costs
- Batch support (400 items vs Spotify's 50)

**Tradeoffs:**
- Static data (no real-time updates)
- Database size (216GB)
- No write operations

**Example:**
```python
import requests

# Spotify Web API (rate limited, requires OAuth; spotify_client is a pre-authenticated client)
spotify_data = spotify_client.search(q=f"isrc:{isrc}", type="track")

# Music Metadata API (no auth; only the local per-IP rate limit applies)
mm_data = requests.get(f"http://localhost:8080/lookup/isrc/{isrc}").json()
```

#### 4. DJ Software Metadata Provider

**Scenario:** Enrich DJ library with popularity, genres, images

**Why suitable:**
- Batch processing for large libraries
- Popularity scores for track selection
- Genre tags for filtering
- Album artwork for UI

**Example:**
```python
import requests

# Enrich DJ library (chunks() as defined in the library enrichment example)
tracks = load_dj_library()  # 5,000 tracks
isrcs = [t.isrc for t in tracks]

# Batch lookup
for batch in chunks(isrcs, 400):
    response = requests.post("http://localhost:8080/batch/lookup", json={"isrcs": batch})
    update_dj_library(response.json())
```

### Unsuitable Use Cases

#### 1. Real-Time Music Discovery App

**Why unsuitable:**
- Static data (no new releases)
- Outdated popularity scores
- No personalization
- No user-specific data

**Alternative:** Spotify Web API, Apple Music API

#### 2. Public-Facing API Service

**Why unsuitable:**
- No authentication (abuse risk)
- No usage tracking
- No quota enforcement
- Rate limiter memory leak

**Alternative:** Add authentication layer or use managed API service

#### 3. Mission-Critical Production System

**Why unsuitable:**
- Zero test coverage
- Naive health check
- Memory leak
- No metrics

**Alternative:** Extensive testing + monitoring before production use

#### 4. Applications Requiring Fresh Data

**Why unsuitable:**
- Static snapshot (no updates)
- Stale popularity/follower counts
- Missing new releases

**Alternative:** Spotify Web API, MusicBrainz (community-updated)

## Integration Evaluation

### Complementary Services

**Works well with:**
- **MusicBrainz:** Different data models, ISRC as common key
- **AcoustID:** Fingerprint to MusicBrainz recording to ISRC, then look up metadata
- **Local music libraries:** Enrich with metadata
- **DJ software:** Popularity, genres, artwork

**Conflicts with:**
- **Spotify Web API:** Overlapping data, but Music Metadata API is static
- **Real-time services:** Music Metadata API data is stale

### Integration Complexity

**Easy integrations:**
- HTTP client (any language)
- Batch processing pipelines
- Local applications

**Complex integrations:**
- Browser-based apps (no CORS)
- Authenticated services (no auth)
- Real-time systems (static data)

## Performance Evaluation

### Throughput

**Batch endpoint:**
- 400 items per request
- ~200-500ms per request
- **800-2,000 items/second** (single instance)

**Individual endpoints:**
- ~50ms per request
- Rate limited to 100 req/s
- **100 items/second** (single instance)

**Scaling:**
- Horizontal: Run multiple instances (read-only safe)
- Vertical: More RAM (larger cache), faster disk (SSD)

### Latency

**Typical latencies:**
- Track lookup: 10-50ms
- Album lookup: 10-50ms
- Artist lookup: 10-50ms
- Batch lookup (400 items): 200-500ms
- Search: 1-10 seconds (depends on query)

**Bottlenecks:**
- Search queries (LIKE %query%)
- Disk I/O (use SSD)
- Rate limiter (RWMutex contention)

### Resource Usage

**Disk:** 216GB (databases)
**RAM:** 2.5GB (SQLite cache + mmap) + 1.5GB (app/OS) = 4GB minimum
**CPU:** 1 core minimum, 2+ recommended (search queries CPU-intensive)

**Scaling costs:**
- 10 instances = 2.16TB storage (expensive)
- Shared filesystem (NFS, EFS) reduces storage cost but increases latency

## Security Evaluation

### Vulnerabilities

**High severity:**
- **No authentication:** Anyone can query API
- **No rate limiting per user:** IP-based only (easily bypassed)

**Medium severity:**
- **Memory leak:** Rate limiter grows unbounded
- **No input sanitization:** SQL injection risk (mitigated by parameterized queries)

**Low severity:**
- **No HTTPS:** Deploy behind reverse proxy with TLS
- **No CORS:** Browser-based attacks limited

### Mitigations

**Authentication:**
- Deploy behind reverse proxy with auth (nginx, Caddy)
- Use API gateway (Kong, Tyk)

**Rate limiting:**
- Implement per-user rate limiting (requires auth)
- Use distributed rate limiter (Redis)

**Memory leak:**
- Restart server periodically
- Implement visitor cleanup

**HTTPS:**
- Terminate TLS at reverse proxy
- Use Let's Encrypt for free certificates

## Reliability Evaluation

### Failure Modes

**Database unavailable:**
- Health check returns OK (false positive)
- Queries fail with 500 errors
- No automatic recovery

**Memory exhaustion:**
- Rate limiter leak accumulates
- OOM kill by OS
- Service restart required

**Disk full:**
- SQLite read-only (no writes)
- No impact on service

**Network partition:**
- No external dependencies
- Service continues (self-contained)

### Recovery

**Automatic recovery:**
- Graceful shutdown on SIGINT/SIGTERM
- Docker/Kubernetes restart on failure

**Manual recovery:**
- Restart service (clears rate limiter leak)
- Restore database from backup
- Check database integrity (PRAGMA integrity_check)

### High Availability

**Strategies:**
- Run multiple instances (read-only safe)
- Load balancer distributes traffic
- Health checks route around failures (but naive health check is a problem)

**Limitations:**
- No shared state (rate limiter per-instance)
- No session affinity required
- Database replication (copy files to each instance)

## Cost Evaluation

### Infrastructure Costs

**Single instance:**
- Compute: $20-50/month (2 CPU, 8GB RAM)
- Storage: $20-40/month (250GB SSD)
- Network: $5-10/month (1TB transfer)
- **Total: $45-100/month**

**10 instances (high availability):**
- Compute: $200-500/month
- Storage: $200-400/month (2.5TB SSD, or shared filesystem)
- Network: $50-100/month
- **Total: $450-1,000/month**

**Comparison:**
- Spotify Web API: Free tier limited, paid tiers $0.001-0.01 per request
- MusicBrainz: Free (donations encouraged)

### Development Costs

**Initial setup:**
- Deploy service: 1-2 hours
- Obtain databases: Unknown (not in repository)
- Test integration: 2-4 hours
- **Total: 4-8 hours**

**Ongoing maintenance:**
- Monitor service: 1-2 hours/month
- Update databases: Unknown (no update mechanism)
- Security patches: 1-2 hours/month
- **Total: 2-4 hours/month**

### Total Cost of Ownership

**Year 1:**
- Infrastructure: $540-1,200 (single instance)
- Development: $400-800 (setup + 12 months maintenance)
- **Total: $940-2,000**

**Comparison:**
- Spotify Web API: $0-10,000+ (depends on usage)
- MusicBrainz: $0 (free, donations encouraged)

## Recommendation Matrix

| Use Case | Suitability | Reasoning |
|----------|-------------|-----------|
| Music library enrichment | ⭐⭐⭐⭐⭐ | Ideal: Batch API, ISRC lookup, no third-party rate limits |
| Metadata aggregator | ⭐⭐⭐⭐⭐ | Ideal: Complements MusicBrainz, fast lookups |
| Self-hosted alternative | ⭐⭐⭐⭐ | Good: No auth complexity, but static data |
| DJ software integration | ⭐⭐⭐⭐ | Good: Popularity, genres, artwork |
| Real-time music app | ⭐⭐ | Poor: Static data, no updates |
| Public API service | ⭐⭐ | Poor: No auth, no metrics, memory leak |
| Mission-critical system | ⭐ | Very poor: No tests, naive health check |
| Fresh data required | ⭐ | Very poor: Static snapshot, no updates |

**Legend:**
- ⭐⭐⭐⭐⭐ Ideal
- ⭐⭐⭐⭐ Good
- ⭐⭐⭐ Acceptable
- ⭐⭐ Poor
- ⭐ Very poor

## Final Verdict

### Overall Rating: 7/10

**Breakdown:**
- **Functionality:** 9/10 (comprehensive metadata, batch API)
- **Performance:** 8/10 (fast batch, slow search)
- **Reliability:** 5/10 (no tests, memory leak, naive health check)
- **Security:** 4/10 (no auth, no metrics)
- **Maintainability:** 6/10 (simple code, but no tests)
- **Documentation:** 8/10 (OpenAPI spec, but minimal code comments)

### Strengths Summary

1. Massive dataset (256M tracks)
2. Simple architecture (no framework)
3. High-performance batch API (400 items/request)
4. Pure Go (no CGO)
5. Read-only safety
6. OpenAPI documentation
7. MIT license
8. Easy deployment

### Weaknesses Summary

1. Zero test coverage
2. No authentication
3. Naive health check
4. Rate limiter memory leak
5. No CORS
6. No metrics
7. Database provenance unclear
8. No data freshness
9. Slow search (LIKE %query%)
10. Hardcoded configuration

### Recommendation

**Use Music Metadata API if:**
- You need to enrich large music libraries (batch processing)
- You want ISRC-based lookups without third-party API rate limits
- You can tolerate static data (no real-time updates)
- You can deploy behind reverse proxy (for auth/CORS)
- You can implement monitoring (metrics, proper health checks)
- You can accept legal uncertainty (database provenance)

**Don't use Music Metadata API if:**
- You need real-time data (use Spotify Web API)
- You need production-grade reliability (no tests)
- You need authentication out-of-the-box
- You need fresh data (new releases, current popularity)
- You can't tolerate 216GB storage requirement

### Improvement Priorities

**Critical (before production):**
1. Add test coverage (unit + integration tests)
2. Fix rate limiter memory leak
3. Implement proper health check (verify database)
4. Add authentication (or deploy behind auth proxy)

**High priority:**
1. Add metrics/monitoring (Prometheus)
2. Implement CORS support
3. Extract hardcoded config (environment variables)
4. Use LOG_LEVEL environment variable

**Medium priority:**
1. Improve search performance (FTS5)
2. Add request logging
3. Structured error responses
4. Documentation (code comments)

**Low priority:**
1. Caching layer (Redis)
2. Horizontal scaling improvements
3. Database update mechanism
4. Admin API (stats, cache control)
@@ -0,0 +1,899 @@
|
||||
# Music Metadata API - Integrations
|
||||
|
||||
## Integration Overview
|
||||
|
||||
Music Metadata API is a **fully self-contained service** with zero external integrations at runtime. All data is served from pre-populated SQLite databases with no external API calls, no authentication services, and no third-party dependencies beyond the Go runtime.
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Music Metadata API │
|
||||
│ (Self-Contained Service) │
|
||||
│ │
|
||||
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
|
||||
│ │ HTTP │ │ Database │ │ Models │ │
|
||||
│ │ Handlers │→ │ Layer │→ │ Layer │ │
|
||||
│ └────────────┘ └────────────┘ └────────────┘ │
|
||||
│ ↓ │
|
||||
│ ┌─────────────┐ │
|
||||
│ │ SQLite │ │
|
||||
│ │ Databases │ │
|
||||
│ │ (216GB) │ │
|
||||
│ └─────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
│ NO external calls
|
||||
↓
|
||||
(All data local)
|
||||
```
|
||||
|
||||
## Runtime Dependencies
|
||||
|
||||
### Go Standard Library
|
||||
|
||||
**Packages used:**
|
||||
- `net/http` - HTTP server and routing
|
||||
- `database/sql` - Database interface
|
||||
- `encoding/json` - JSON serialization
|
||||
- `log/slog` - Structured logging
|
||||
- `context` - Request context and timeouts
|
||||
- `sync` - Concurrency primitives (RWMutex)
|
||||
- `flag` - CLI argument parsing
|
||||
- `os/signal` - Graceful shutdown
|
||||
|
||||
**No external HTTP calls:** All functionality implemented with stdlib.
|
||||
|
||||
### External Go Modules
|
||||
|
||||
**modernc.org/sqlite v1.34.4**
|
||||
- Pure Go SQLite driver
|
||||
- No CGO required
|
||||
- No C dependencies
|
||||
- No external network calls
|
||||
|
||||
**golang.org/x/time v0.14.0**
|
||||
- Rate limiting (token bucket)
|
||||
- No external network calls
|
||||
- Pure algorithm implementation
|
||||
|
||||
**Total external dependencies:** 2 packages (both offline)
|
||||
|
||||
## Data Sources
|
||||
|
||||
### Pre-Populated Databases
|
||||
|
||||
**Source:** User must obtain databases separately (not included in repository)
|
||||
|
||||
**Database files:**
|
||||
- `main_database.sqlite3` (~117GB)
|
||||
- `track_files.sqlite3` (~99GB)
|
||||
|
||||
**Provenance:** Unclear (repository states "not affiliated with Spotify")
|
||||
|
||||
**Update mechanism:** None (static snapshot)
|
||||
|
||||
**Implications:**
|
||||
- No real-time data sync
|
||||
- No automatic updates
|
||||
- User responsible for obtaining databases
|
||||
- Legal status uncertain
|
||||
|
||||
### No External APIs
|
||||
|
||||
**What's NOT integrated:**
|
||||
- Spotify Web API (no OAuth, no API calls)
|
||||
- MusicBrainz API (no lookups)
|
||||
- Last.fm API (no scrobbling)
|
||||
- Discogs API (no catalog queries)
|
||||
- AcoustID API (no fingerprinting)
|
||||
- Cover Art Archive (no image fetching)
|
||||
|
||||
**All data served from local databases.**
|
||||
|
||||
## Browser-Side Dependencies
|
||||
|
||||
### Swagger UI (Documentation Only)
|
||||
|
||||
**Endpoint:** `/docs`
|
||||
|
||||
**External resources loaded by browser:**
|
||||
```html
<!-- Loaded from unpkg.com CDN -->
<script src="https://unpkg.com/swagger-ui-dist@5/swagger-ui-bundle.js"></script>
<link rel="stylesheet" href="https://unpkg.com/swagger-ui-dist@5/swagger-ui.css" />
```
|
||||
|
||||
**Characteristics:**
|
||||
- Loaded client-side (browser fetches)
|
||||
- Server doesn't make requests to unpkg.com
|
||||
- Works offline after first load (browser cache)
|
||||
- Only affects `/docs` endpoint (not API functionality)
|
||||
|
||||
**Implications:**
|
||||
- Requires internet connection for first `/docs` visit
|
||||
- Subsequent visits work offline (cached)
|
||||
- API endpoints work without internet
|
||||
|
||||
### Image URLs (External CDN)
|
||||
|
||||
**Image hosting:** Spotify CDN (i.scdn.co)
|
||||
|
||||
**Example URLs:**
|
||||
```
https://i.scdn.co/image/ab67616d0000b273ce4f1737bc8a646c8c4bd25a
https://i.scdn.co/image/af2b8e57f6d7b5d1c9a5f3e8d4c2b1a0e9f8d7c6
```
|
||||
|
||||
**Characteristics:**
|
||||
- API returns URLs (not image data)
|
||||
- Client responsible for fetching images
|
||||
- Server never fetches images
|
||||
- Images hosted externally (not by API)
|
||||
|
||||
**Implications:**
|
||||
- Image availability depends on Spotify CDN
|
||||
- No image caching by API
|
||||
- Clients need internet to display images
|
||||
- Broken links possible if Spotify removes images
|
||||
|
||||
## No Authentication Integration
|
||||
|
||||
### No OAuth
|
||||
|
||||
**What's missing:**
|
||||
- No OAuth 2.0 flow
|
||||
- No token validation
|
||||
- No user authentication
|
||||
- No API keys
|
||||
|
||||
**Implications:**
|
||||
- Public API (anyone can query)
|
||||
- No usage tracking per user
|
||||
- No quota enforcement per user
|
||||
- No access control
|
||||
|
||||
**Workarounds:**
|
||||
- Deploy behind reverse proxy with auth (nginx, Caddy)
|
||||
- Use API gateway (Kong, Tyk)
|
||||
- Implement custom middleware
|
||||
|
||||
### No Authorization
|
||||
|
||||
**What's missing:**
|
||||
- No role-based access control (RBAC)
|
||||
- No permission system
|
||||
- No resource ownership
|
||||
|
||||
**Implications:**
|
||||
- All data accessible to all clients
|
||||
- No private/public data distinction
|
||||
- No user-specific data
|
||||
|
||||
## No Monitoring Integration
|
||||
|
||||
### No Metrics Exporters
|
||||
|
||||
**What's missing:**
|
||||
- No Prometheus metrics
|
||||
- No StatsD integration
|
||||
- No OpenTelemetry
|
||||
- No custom metrics endpoint
|
||||
|
||||
**Implications:**
|
||||
- No visibility into request rates
|
||||
- No error rate tracking
|
||||
- No latency percentiles
|
||||
- No resource usage metrics
|
||||
|
||||
**Workarounds:**
|
||||
- Parse logs for metrics
|
||||
- Use reverse proxy metrics (nginx, Envoy)
|
||||
- Implement custom metrics middleware
|
||||
|
||||
### No Distributed Tracing
|
||||
|
||||
**What's missing:**
|
||||
- No Jaeger integration
|
||||
- No Zipkin support
|
||||
- No trace context propagation
|
||||
|
||||
**Implications:**
|
||||
- Can't trace requests across services
|
||||
- No performance profiling
|
||||
- No bottleneck identification
|
||||
|
||||
**Workarounds:**
|
||||
- Add custom tracing middleware
|
||||
- Use APM tools (Datadog, New Relic)
|
||||
|
||||
### No Log Aggregation
|
||||
|
||||
**What's missing:**
|
||||
- No Elasticsearch integration
|
||||
- No Splunk forwarding
|
||||
- No CloudWatch Logs
|
||||
- No structured log shipping
|
||||
|
||||
**Logging:** Go stdlib `log/slog` to stdout
|
||||
|
||||
**Implications:**
|
||||
- Logs only in container/process stdout
|
||||
- No centralized log storage
|
||||
- No log search/analysis
|
||||
|
||||
**Workarounds:**
|
||||
- Docker log drivers (json-file, syslog, fluentd)
|
||||
- Kubernetes log collectors (Fluentd, Filebeat)
|
||||
- Redirect stdout to log aggregator
|
||||
|
||||
## No Message Queue Integration
|
||||
|
||||
**What's missing:**
|
||||
- No RabbitMQ
|
||||
- No Kafka
|
||||
- No Redis Pub/Sub
|
||||
- No AWS SQS
|
||||
|
||||
**Implications:**
|
||||
- Synchronous request/response only
|
||||
- No async job processing
|
||||
- No event streaming
|
||||
- No background tasks
|
||||
|
||||
**Use case:** All queries processed synchronously (acceptable for read-only API)
|
||||
|
||||
## No Cache Integration
|
||||
|
||||
### No External Cache
|
||||
|
||||
**What's missing:**
|
||||
- No Redis
|
||||
- No Memcached
|
||||
- No Varnish
|
||||
|
||||
**Caching:** SQLite page cache only (64MB per connection)
|
||||
|
||||
**Implications:**
|
||||
- No shared cache across instances
|
||||
- No cache invalidation strategy
|
||||
- No cache warming
|
||||
- Cold start on each instance
|
||||
|
||||
**Workarounds:**
|
||||
- Add Redis layer for hot data
|
||||
- Use HTTP caching headers (not implemented)
|
||||
- Deploy CDN in front of API
|
||||
|
||||
### No HTTP Caching
|
||||
|
||||
**What's missing:**
|
||||
- No `Cache-Control` headers
|
||||
- No `ETag` support
|
||||
- No `Last-Modified` headers
|
||||
|
||||
**Implications:**
|
||||
- Clients can't cache responses
|
||||
- Repeated requests hit database
|
||||
- No bandwidth savings
|
||||
|
||||
**Workarounds:**
|
||||
- Add caching middleware
|
||||
- Use reverse proxy with caching (Varnish, nginx)
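
The missing `ETag` and `Cache-Control` behaviour can be illustrated with a small, framework-free sketch. It is not code from the service: `conditional_response` is a hypothetical helper, the one-hour `max-age` is an assumed value, and the comparison is simplified to a single exact `If-None-Match` value (real clients may send several tags or `*`).

```python
import hashlib


def conditional_response(body: bytes, if_none_match=None):
    """Return (status, headers, body) honouring an If-None-Match request header."""
    # Derive a strong validator from the response bytes (assumed scheme).
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}
    if if_none_match == etag:
        # The client's cached copy is current: reply 304 with no body.
        return 304, headers, b""
    return 200, headers, body
```

Because the databases are a static snapshot, a content hash stays valid for as long as the same database files are served, which would make this kind of validator cheap to reason about.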
|
||||
|
||||
## No Database Replication
|
||||
|
||||
**What's missing:**
|
||||
- No master-slave replication
|
||||
- No read replicas
|
||||
- No database clustering
|
||||
|
||||
**Database:** Local SQLite files on each instance (no shared database server)
|
||||
|
||||
**Implications:**
|
||||
- Each instance has full database copy (216GB)
|
||||
- No shared database across instances
|
||||
- Horizontal scaling requires full database per instance
|
||||
|
||||
**Workarounds:**
|
||||
- Read-only databases safe to copy
|
||||
- Use a network filesystem (NFS, EFS) for shared read-only access (expect higher read latency than local disk)
|
||||
- Replicate databases to multiple instances
|
||||
|
||||
## No Service Discovery
|
||||
|
||||
**What's missing:**
|
||||
- No Consul integration
|
||||
- No etcd
|
||||
- No Kubernetes service discovery
|
||||
- No DNS-based discovery
|
||||
|
||||
**Deployment:** Static configuration (IP:port)
|
||||
|
||||
**Implications:**
|
||||
- Manual load balancer configuration
|
||||
- No dynamic scaling
|
||||
- No health-based routing
|
||||
|
||||
**Workarounds:**
|
||||
- Use Kubernetes services (automatic discovery)
|
||||
- Use cloud load balancers (AWS ALB, GCP LB)
|
||||
- Use service mesh (Istio, Linkerd)
|
||||
|
||||
## No Configuration Management
|
||||
|
||||
### No External Config
|
||||
|
||||
**What's missing:**
|
||||
- No Consul KV
|
||||
- No etcd
|
||||
- No AWS Parameter Store
|
||||
- No HashiCorp Vault
|
||||
|
||||
**Configuration:** CLI flags only (`-db`, `-addr`)
|
||||
|
||||
**Implications:**
|
||||
- All config at startup
|
||||
- No dynamic reconfiguration
|
||||
- No secrets management
|
||||
- Hardcoded timeouts/limits
|
||||
|
||||
**Workarounds:**
|
||||
- Use environment variables (requires code changes)
|
||||
- Mount config files (requires code changes)
|
||||
- Use init containers to generate config
|
||||
|
||||
### No Secrets Management
|
||||
|
||||
**What's missing:**
|
||||
- No Vault integration
|
||||
- No AWS Secrets Manager
|
||||
- No Kubernetes secrets
|
||||
- No encrypted config
|
||||
|
||||
**Secrets:** None required (no authentication)
|
||||
|
||||
**Implications:**
|
||||
- No sensitive data to protect
|
||||
- No credential rotation
|
||||
- No encryption at rest
|
||||
|
||||
**Future consideration:** If adding authentication, integrate secrets manager
|
||||
|
||||
## Integration Patterns
|
||||
|
||||
### Reverse Proxy Integration
|
||||
|
||||
**Use case:** Add authentication, CORS, caching, SSL
|
||||
|
||||
**Example with nginx:**
|
||||
```nginx
# proxy_cache_path is only valid in the http context, so it sits outside server {}
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m;

upstream metadata_api {
    server localhost:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    # CORS headers
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Methods "GET, POST, OPTIONS";

    # Caching
    proxy_cache api_cache;
    proxy_cache_valid 200 1h;

    # Authentication
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://metadata_api;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```
|
||||
|
||||
### API Gateway Integration
|
||||
|
||||
**Use case:** Rate limiting, authentication, analytics
|
||||
|
||||
**Example with Kong:**
|
||||
```yaml
services:
  - name: metadata-api
    url: http://localhost:8080
    routes:
      - name: metadata-routes
        paths:
          - /
    plugins:
      - name: rate-limiting
        config:
          minute: 1000
          policy: local
      - name: key-auth
        config:
          key_names:
            - apikey
      - name: prometheus
        config:
          per_consumer: true
```
|
||||
|
||||
### Load Balancer Integration
|
||||
|
||||
**Use case:** Distribute traffic across multiple instances
|
||||
|
||||
**Example with HAProxy:**
|
||||
```
frontend metadata_frontend
    bind *:80
    default_backend metadata_backend

backend metadata_backend
    balance roundrobin
    option httpchk GET /health
    server api1 10.0.1.10:8080 check
    server api2 10.0.1.11:8080 check
    server api3 10.0.1.12:8080 check
```
|
||||
|
||||
### Kubernetes Integration
|
||||
|
||||
**Use case:** Container orchestration, auto-scaling
|
||||
|
||||
**Example deployment:**
|
||||
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metadata-api
  template:
    metadata:
      labels:
        app: metadata-api
    spec:
      containers:
        - name: api
          image: ghcr.io/aunali321/music-metadata-api:latest
          args: ["-db", "/data/main_database.sqlite3"]
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: database
              mountPath: /data
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          resources:
            requests:
              memory: "4Gi"
              cpu: "1"
            limits:
              memory: "8Gi"
              cpu: "2"
      volumes:
        - name: database
          persistentVolumeClaim:
            claimName: metadata-db-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: metadata-api
spec:
  selector:
    app: metadata-api
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
```
|
||||
|
||||
### Monitoring Integration
|
||||
|
||||
**Use case:** Metrics, logs, traces
|
||||
|
||||
**Example with Prometheus + Grafana:**
|
||||
|
||||
**1. Add metrics exporter (custom middleware):**
|
||||
```go
// Not implemented in current codebase
import "github.com/prometheus/client_golang/prometheus"

var (
	requestsTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "api_requests_total",
			Help: "Total HTTP requests by method, endpoint, and status.",
		},
		[]string{"method", "endpoint", "status"},
	)
	requestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name: "api_request_duration_seconds",
			Help: "HTTP request latency in seconds.",
		},
		[]string{"method", "endpoint"},
	)
)

func init() {
	// Collectors must be registered before they show up on a /metrics endpoint.
	prometheus.MustRegister(requestsTotal, requestDuration)
}
```
|
||||
|
||||
**2. Scrape metrics with Prometheus:**
|
||||
```yaml
scrape_configs:
  - job_name: 'metadata-api'
    static_configs:
      - targets: ['localhost:8080']
```
|
||||
|
||||
**3. Visualize in Grafana:**
|
||||
- Request rate dashboard
|
||||
- Error rate dashboard
|
||||
- Latency percentiles (p50, p95, p99)
|
||||
|
||||
### Logging Integration
|
||||
|
||||
**Use case:** Centralized log aggregation
|
||||
|
||||
**Example with Fluentd:**
|
||||
|
||||
**1. Configure Docker logging driver:**
|
||||
```yaml
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224
        tag: metadata-api
```
|
||||
|
||||
**2. Fluentd configuration:**
|
||||
```
<source>
  @type forward
  port 24224
</source>

<match metadata-api>
  @type elasticsearch
  host elasticsearch
  port 9200
  index_name metadata-api
  type_name _doc
</match>
```
|
||||
|
||||
### Caching Integration
|
||||
|
||||
**Use case:** Reduce database load, improve latency
|
||||
|
||||
**Example with Redis:**
|
||||
|
||||
**1. Add Redis middleware (custom implementation):**
|
||||
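```go
// Not implemented in current codebase.
// Assumes go-redis v9 (github.com/redis/go-redis/v9) and a package-level redisClient.
func cacheMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Only cache reads; key on the full request URI so query strings are distinct
		if r.Method != http.MethodGet {
			next.ServeHTTP(w, r)
			return
		}
		key := r.URL.RequestURI()

		// Check Redis cache
		if cached, err := redisClient.Get(r.Context(), key).Bytes(); err == nil {
			w.Header().Set("Content-Type", "application/json")
			w.Write(cached)
			return
		}

		// Cache miss: record the handler's response
		rec := httptest.NewRecorder()
		next.ServeHTTP(rec, r)

		// Store only successful responses in Redis (1 hour TTL)
		if rec.Code == http.StatusOK {
			redisClient.Set(r.Context(), key, rec.Body.Bytes(), time.Hour)
		}

		// Replay headers, status, and body to the real client
		for name, values := range rec.Header() {
			w.Header()[name] = values
		}
		w.WriteHeader(rec.Code)
		w.Write(rec.Body.Bytes())
	})
}
```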
|
||||
|
||||
**2. Deploy Redis:**
|
||||
```yaml
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
```
|
||||
|
||||
## Complementary Services
|
||||
|
||||
### MusicBrainz Integration
|
||||
|
||||
**Use case:** Resolve MBIDs to ISRCs, then lookup in Music Metadata API
|
||||
|
||||
**Flow:**
|
||||
```
1. Query MusicBrainz for recording by MBID
   ↓
2. Extract ISRC from MusicBrainz response
   ↓
3. Lookup ISRC in Music Metadata API
   ↓
4. Merge metadata (MusicBrainz credits + Spotify-style data)
```
|
||||
|
||||
**Example:**
|
||||
```python
import requests

MBID = "abc-123"  # placeholder; substitute a real recording MBID

# Step 1: Get ISRCs and artist credits from MusicBrainz
# (MusicBrainz asks clients to send an identifying User-Agent)
mb_url = f"https://musicbrainz.org/ws/2/recording/{MBID}?fmt=json&inc=isrcs+artist-credits"
mb_headers = {"User-Agent": "example-app/0.1 (contact@example.com)"}
mb_response = requests.get(mb_url, headers=mb_headers).json()
isrc = mb_response['isrcs'][0]  # raises IndexError if the recording has no ISRC

# Step 2: Lookup in Music Metadata API
mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
mm_response = requests.get(mm_url).json()

# Step 3: Merge metadata
merged = {
    "mbid": MBID,
    "isrc": isrc,
    "title": mm_response['name'],
    "popularity": mm_response['popularity'],
    "credits": mb_response['artist-credit']
}
```
|
||||
|
||||
### AcoustID Integration
|
||||
|
||||
**Use case:** Fingerprint audio files, resolve to ISRCs
|
||||
|
||||
**Flow:**
|
||||
```
1. Generate audio fingerprint (chromaprint)
   ↓
2. Query AcoustID API with fingerprint
   ↓
3. Resolve the returned MusicBrainz recording ID to an ISRC (MusicBrainz lookup)
   ↓
4. Lookup ISRC in Music Metadata API
   ↓
5. Tag audio file with metadata
```

**Example:**

```python
import acoustid
import mutagen
import requests

API_KEY = "your-acoustid-key"  # placeholder

# Step 1: Fingerprint audio file
duration, fingerprint = acoustid.fingerprint_file('song.mp3')

# Step 2: Query AcoustID (it returns MusicBrainz recording IDs, not ISRCs)
results = acoustid.lookup(API_KEY, fingerprint, duration, meta='recordings')
mbid = results['results'][0]['recordings'][0]['id']

# Step 3: Resolve the recording MBID to an ISRC via MusicBrainz
mb_url = f"https://musicbrainz.org/ws/2/recording/{mbid}?fmt=json&inc=isrcs"
mb_headers = {"User-Agent": "example-app/0.1 (contact@example.com)"}
isrc = requests.get(mb_url, headers=mb_headers).json()['isrcs'][0]

# Step 4: Lookup in Music Metadata API
mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
metadata = requests.get(mm_url).json()

# Step 5: Tag file (easy=True exposes 'title'/'artist' keys for MP3)
audio = mutagen.File('song.mp3', easy=True)
audio['title'] = metadata['name']
audio['artist'] = metadata['artists'][0]['name']
audio.save()
```
|
||||
|
||||
### Spotify Web API Integration
|
||||
|
||||
**Use case:** Get real-time data, then fallback to Music Metadata API
|
||||
|
||||
**Flow:**
|
||||
```
1. Try Spotify Web API (requires OAuth)
   ↓
2. If rate limited or unavailable, fallback to Music Metadata API
   ↓
3. Return cached/static data from Music Metadata API
```
|
||||
|
||||
**Example:**
|
||||
```python
import requests

def get_track_metadata(isrc):
    try:
        # Try Spotify Web API (real-time); spotify_client is assumed to be
        # an already-authenticated client object
        spotify_data = spotify_client.search(q=f"isrc:{isrc}", type="track")
        return spotify_data['tracks']['items'][0]
    except Exception:
        # Fallback to Music Metadata API (static) on any error, including no match
        mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
        return requests.get(mm_url).json()
```
|
||||
|
||||
## Deployment Integrations
|
||||
|
||||
### Docker Compose
|
||||
|
||||
**Use case:** Local development, simple deployments
|
||||
|
||||
**Example:**
|
||||
```yaml
version: '3.8'
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    command: ["-db", "/data/main_database.sqlite3"]
    restart: unless-stopped

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - metadata-api
```
|
||||
|
||||
### Kubernetes
|
||||
|
||||
**Use case:** Production deployments, auto-scaling
|
||||
|
||||
**See Kubernetes Integration section above**
|
||||
|
||||
### Cloud Platforms
|
||||
|
||||
**AWS ECS:**
|
||||
```json
{
  "family": "metadata-api",
  "containerDefinitions": [{
    "name": "api",
    "image": "ghcr.io/aunali321/music-metadata-api:latest",
    "memory": 4096,
    "cpu": 1024,
    "portMappings": [{"containerPort": 8080}],
    "command": ["-db", "/data/main_database.sqlite3"],
    "mountPoints": [{
      "sourceVolume": "database",
      "containerPath": "/data",
      "readOnly": true
    }]
  }],
  "volumes": [{
    "name": "database",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-12345678"
    }
  }]
}
```
|
||||
|
||||
**Google Cloud Run:**
|
||||
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: metadata-api
spec:
  template:
    spec:
      containers:
        - image: ghcr.io/aunali321/music-metadata-api:latest
          args: ["-db", "/data/main_database.sqlite3"]
          volumeMounts:
            - name: database
              mountPath: /data
              readOnly: true
      volumes:
        # NOTE: verify volume support before use; Cloud Run documents NFS and
        # Cloud Storage volume mounts, and gcePersistentDisk may be rejected.
        - name: database
          gcePersistentDisk:
            pdName: metadata-db
            readOnly: true
```
|
||||
|
||||
## Advantages of Having No Integrations
|
||||
|
||||
### Simplicity
|
||||
|
||||
**Benefits:**
|
||||
- No external service dependencies
|
||||
- No network calls (faster, more reliable)
|
||||
- No authentication complexity
|
||||
- No API rate limits (external)
|
||||
|
||||
**Tradeoffs:**
|
||||
- No real-time data
|
||||
- No automatic updates
|
||||
- No distributed features
|
||||
|
||||
### Reliability
|
||||
|
||||
**Benefits:**
|
||||
- No cascading failures (no external dependencies)
|
||||
- No network timeouts (all local)
|
||||
- No third-party outages
|
||||
- Predictable performance
|
||||
|
||||
**Tradeoffs:**
|
||||
- Single point of failure (database file)
|
||||
- No redundancy (unless replicated)
|
||||
|
||||
### Performance
|
||||
|
||||
**Benefits:**
|
||||
- No network latency (local database)
|
||||
- No API rate limits (self-imposed only)
|
||||
- Batch queries optimized (7 queries vs 2,800)
|
||||
|
||||
**Tradeoffs:**
|
||||
- Database size (216GB per instance)
|
||||
- Memory usage (2.5GB minimum)
|
||||
|
||||
### Cost
|
||||
|
||||
**Benefits:**
|
||||
- No API subscription fees
|
||||
- No per-request charges
|
||||
- No data transfer costs (local)
|
||||
|
||||
**Tradeoffs:**
|
||||
- Storage costs (216GB)
|
||||
- Compute costs (self-hosted)
|
||||
|
||||
## Future Integration Opportunities
|
||||
|
||||
### Potential Additions
|
||||
|
||||
**Authentication:**
|
||||
- OAuth 2.0 provider (Keycloak, Auth0)
|
||||
- API key management (custom or Kong)
|
||||
|
||||
**Monitoring:**
|
||||
- Prometheus metrics exporter
|
||||
- OpenTelemetry tracing
|
||||
- Structured logging to Elasticsearch
|
||||
|
||||
**Caching:**
|
||||
- Redis for hot data
|
||||
- HTTP caching headers
|
||||
- CDN for static responses
|
||||
|
||||
**Database:**
|
||||
- PostgreSQL for writable data
|
||||
- Read replicas for scaling
|
||||
- Full-text search (Elasticsearch, Meilisearch)
|
||||
|
||||
**Message Queue:**
|
||||
- Background job processing (Celery, Sidekiq)
|
||||
- Event streaming (Kafka)
|
||||
|
||||
**Configuration:**
|
||||
- Environment variables
|
||||
- Config files (YAML, TOML)
|
||||
- Secrets management (Vault)
|
||||
|
||||
### Integration Complexity
|
||||
|
||||
**Current:** Zero integrations (simplest possible)
|
||||
|
||||
**With additions:** Each integration adds:
|
||||
- Configuration complexity
|
||||
- Deployment dependencies
|
||||
- Failure modes
|
||||
- Maintenance burden
|
||||
|
||||
**Recommendation:** Only add integrations when necessary for specific use cases.
|
||||
@@ -0,0 +1,321 @@
|
||||
# Music Metadata API - Overview
|
||||
|
||||
## Project Identity
|
||||
|
||||
**Name:** Music Metadata API
|
||||
**Repository:** https://github.com/Aunali321/music-metadata-api
|
||||
**License:** MIT
|
||||
**Language:** Go 1.24
|
||||
**Maintainer:** Single maintainer (Aunali321)
|
||||
**Status:** Active, production-ready
|
||||
|
||||
## Purpose
|
||||
|
||||
Music Metadata API provides a self-hosted HTTP service for querying metadata on 256 million music tracks. The service operates entirely from pre-populated SQLite databases, requiring no external API calls at runtime. It's designed as a high-performance alternative to commercial music metadata APIs like Spotify's Web API.
|
||||
|
||||
## Core Technology Stack
|
||||
|
||||
### Runtime Dependencies
|
||||
|
||||
| Component | Version | Purpose | Notes |
|-----------|---------|---------|-------|
| Go | 1.24 | Runtime & stdlib HTTP server | Uses Go 1.22+ enhanced routing |
| modernc.org/sqlite | v1.34.4 | Pure Go SQLite driver | No CGO required |
| golang.org/x/time | v0.14.0 | Rate limiting (token bucket) | One of two external modules |
|
||||
|
||||
### Build Configuration
|
||||
|
||||
```bash
CGO_ENABLED=0 go build -ldflags="-s -w" ./cmd/server
```
|
||||
|
||||
**Flags explained:**
|
||||
- `CGO_ENABLED=0`: Pure Go binary, no C dependencies
|
||||
- `-s -w`: Strip the symbol table (`-s`) and DWARF debug information (`-w`) for a smaller binary
|
||||
|
||||
## Data Scale
|
||||
|
||||
### Database Files
|
||||
|
||||
| Database | Size | Purpose | Records |
|----------|------|---------|---------|
| main_database.sqlite3 | ~117GB | Core metadata (tracks, albums, artists) | 256M tracks |
| track_files.sqlite3 | ~99GB | Extended track data (lyrics flags, languages, roles) | 256M track files |
| **Total** | **~216GB** | Combined storage requirement | - |
|
||||
|
||||
### Dataset Coverage
|
||||
|
||||
- **256 million tracks** across all databases
|
||||
- Album metadata with images, labels, release dates
|
||||
- Artist metadata with genres, follower counts, popularity scores
|
||||
- ISRC codes for track identification
|
||||
- Multi-language support (language_of_performance field)
|
||||
- Artist role information (performer, composer, etc.)
|
||||
|
||||
## Entry Points
|
||||
|
||||
### Command Line
|
||||
|
||||
**Entry point source:** `cmd/server/main.go` (62 lines)
|
||||
|
||||
**Flags:**
|
||||
```bash
-db string
    Path to main database file (REQUIRED)

-addr string
    HTTP server address (default ":8080")
```

**Example:**

```bash
./metadata-api -db /data/main_database.sqlite3 -addr :8080
```
|
||||
|
||||
### Docker
|
||||
|
||||
**Image:** `ghcr.io/aunali321/music-metadata-api:latest`
|
||||
**Base:** Alpine Linux 3.21
|
||||
|
||||
**docker-compose.yml:**
|
||||
```yaml
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    environment:
      - LOG_LEVEL=info  # NOTE: Not actually used in code
    command: ["-db", "/data/main_database.sqlite3"]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
```
|
||||
|
||||
## Architecture Layers
|
||||
|
||||
### Directory Structure
|
||||
|
||||
```
music-metadata-api/
├── cmd/
│   └── server/
│       └── main.go              # Entry point (62 lines)
├── internal/
│   ├── api/                     # HTTP handlers, routing, middleware
│   │   ├── handlers.go
│   │   ├── ratelimit.go
│   │   └── openapi.go
│   ├── db/
│   │   └── db.go                # Database layer (907 lines)
│   └── models/
│       └── models.go            # Data structures (65 lines)
├── Dockerfile
├── docker-compose.yml
└── .github/
    └── workflows/
        └── docker-publish.yml
```
|
||||
|
||||
### Layer Responsibilities
|
||||
|
||||
**API Layer** (`internal/api/`)
|
||||
- HTTP request handling
|
||||
- Rate limiting (token bucket, per-IP)
|
||||
- OpenAPI specification serving
|
||||
- Swagger UI hosting
|
||||
|
||||
**Database Layer** (`internal/db/`)
|
||||
- SQLite connection management
|
||||
- Query execution
|
||||
- Data enrichment (joining related entities)
|
||||
- Batch optimization
|
||||
|
||||
**Models Layer** (`internal/models/`)
|
||||
- Data structure definitions
|
||||
- JSON serialization tags
|
||||
- Response formatting
|
||||
|
||||
## Key Features
|
||||
|
||||
### Performance Optimizations
|
||||
|
||||
1. **Read-only databases** - No write locks, safe concurrent reads
|
||||
2. **Conservative PRAGMAs** - Optimized for read-heavy workloads
|
||||
3. **Batch endpoints** - Process up to 400 items per request
|
||||
4. **Connection pooling** - MaxOpenConns=8 for controlled resource usage
|
||||
5. **Memory-mapped I/O** - 1GB mmap for faster reads
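
Items 1, 2, and 5 correspond to SQLite connection settings. The sketch below shows roughly what such a setup looks like using Python's `sqlite3` module; it is not the service's Go code, the helper name `open_readonly` is hypothetical, and the PRAGMA values (1 GiB mmap, 64 MiB page cache) are taken from the figures quoted in these docs rather than from the source itself.

```python
import sqlite3


def open_readonly(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only with read-oriented PRAGMAs."""
    # mode=ro makes SQLite reject writes and fail if the file does not exist.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    conn.execute("PRAGMA mmap_size = 1073741824")  # request 1 GiB of memory-mapped I/O
    conn.execute("PRAGMA cache_size = -65536")     # negative value is in KiB: 64 MiB
    return conn
```

Opening the file read-only means a misbehaving query cannot modify the snapshot, which is what makes it safe to share the same files between several connections.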
|
||||
|
||||
### API Capabilities
|
||||
|
||||
- **Batch lookup** - Retrieve multiple tracks/albums/artists in single request
|
||||
- **ISRC lookup** - Industry-standard track identification
|
||||
- **Search** - Substring (`LIKE`) search on tracks and artists; no full-text index
|
||||
- **Relationship traversal** - Album tracks, artist albums, track artists
|
||||
- **OpenAPI documentation** - Interactive Swagger UI at `/docs`
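
Since a batch request accepts at most 400 items across all arrays, a client enriching a larger catalog has to split its identifiers into several requests. A minimal client-side sketch, assuming only the documented `isrcs` request field; `isrc_batches` and `MAX_BATCH_ITEMS` are illustrative names, not part of the API:

```python
from typing import Dict, Iterable, Iterator, List

MAX_BATCH_ITEMS = 400  # documented per-request limit across all arrays


def isrc_batches(isrcs: Iterable[str], size: int = MAX_BATCH_ITEMS) -> Iterator[Dict[str, List[str]]]:
    """Yield /batch/lookup request bodies holding at most `size` ISRCs each."""
    unique = list(dict.fromkeys(isrcs))  # drop duplicates, keep first-seen order
    for start in range(0, len(unique), size):
        yield {"isrcs": unique[start:start + size]}
```

Each yielded body could then be sent with something like `requests.post("http://localhost:8080/batch/lookup", json=body)`. Deduplicating first keeps duplicates from counting against the 400-item limit.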
|
||||
|
||||
### Operational Features
|
||||
|
||||
- **Graceful shutdown** - 10-second timeout for in-flight requests
|
||||
- **Health checks** - `/health` endpoint for monitoring
|
||||
- **Rate limiting** - 100 req/s with 200 burst capacity
|
||||
- **Structured logging** - Go stdlib `log/slog` for error tracking
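
The rate limit figures follow token-bucket semantics: a client starts with 200 tokens, every request spends one, and tokens refill at 100 per second up to the burst cap. The sketch below illustrates that arithmetic only; the service itself relies on `golang.org/x/time`, and the `TokenBucket` class with its explicit `now` argument is a hypothetical stand-in chosen to keep the example deterministic.

```python
class TokenBucket:
    """Token bucket with a refill rate (tokens/second) and a burst capacity."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # a new client starts with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        """Spend one token and return True if a request at time `now` is allowed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `rate=100` and `burst=200`, a client can send 200 requests at once and then settles at a sustained 100 requests per second.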
|
||||
|
||||
## Deployment Models
|
||||
|
||||
### Standalone Binary
|
||||
|
||||
**Pros:**
|
||||
- Single executable, no dependencies
|
||||
- Minimal resource footprint
|
||||
- Direct filesystem access to databases
|
||||
|
||||
**Cons:**
|
||||
- Manual process management
|
||||
- No automatic restarts
|
||||
- Manual log rotation
|
||||
|
||||
### Docker Container
|
||||
|
||||
**Pros:**
|
||||
- Consistent runtime environment
|
||||
- Built-in health checks
|
||||
- Automatic restarts
|
||||
- Easy horizontal scaling
|
||||
|
||||
**Cons:**
|
||||
- Requires Docker runtime
|
||||
- Additional layer of abstraction
|
||||
- Volume mount for large databases
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Primary Use Cases
|
||||
|
||||
1. **Music library enrichment** - Add metadata to existing track collections
|
||||
2. **ISRC-based lookup** - Resolve ISRCs to full track metadata
|
||||
3. **Batch processing** - Enrich large catalogs efficiently
|
||||
4. **Self-hosted alternative** - Replace commercial APIs with local service
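
For ISRC-based lookup and batch enrichment it can help to normalize identifiers before sending them. An ISRC is 12 characters: a 2-letter country code, a 3-character alphanumeric registrant code, a 2-digit year, and a 5-digit designation. The helper below is a hypothetical client-side sketch; whether the API itself accepts hyphenated or lowercase ISRCs is not stated in these docs.

```python
import re

# country (2 letters) + registrant (3 alphanumerics) + year (2 digits) + designation (5 digits)
ISRC_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")


def normalize_isrc(raw: str):
    """Return the 12-character ISRC in canonical form, or None if malformed."""
    candidate = raw.replace("-", "").strip().upper()
    return candidate if ISRC_PATTERN.fullmatch(candidate) else None
```

Rejecting malformed values on the client avoids spending batch slots on identifiers that cannot match.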
|
||||
|
||||
### Integration Scenarios
|
||||
|
||||
- **Metadata aggregator pipelines** - Complement MusicBrainz with Spotify-style data
|
||||
- **Music streaming services** - Populate track/album/artist information
|
||||
- **DJ software** - Enrich track libraries with popularity, genres, images
|
||||
- **Music analytics** - Analyze trends across 256M tracks
|
||||
|
||||
## Limitations
|
||||
|
||||
### Technical Constraints
|
||||
|
||||
- **Database size** - Requires 216GB disk space
|
||||
- **No write operations** - Read-only, no data updates
|
||||
- **No authentication** - Public API, no access control
|
||||
- **No CORS** - Cross-origin requests from browser-based clients are blocked
|
||||
- **Memory leak** - Rate limiter visitor map grows unbounded
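
The rate limiter leak is the usual consequence of storing one entry per client IP and never removing it. A common mitigation is to record when each visitor was last seen and periodically evict idle entries. The sketch below shows that bookkeeping in isolation; `VisitorRegistry` and its 180-second default are assumptions for illustration, and a Go fix would do the equivalent under a mutex from a background goroutine.

```python
class VisitorRegistry:
    """Tracks per-IP activity and evicts entries idle longer than `ttl` seconds."""

    def __init__(self, ttl: float = 180.0):
        self.ttl = ttl
        self.visitors = {}  # ip -> last-seen timestamp

    def touch(self, ip: str, now: float) -> None:
        """Record a request from `ip` at time `now`."""
        self.visitors[ip] = now

    def evict_idle(self, now: float) -> int:
        """Remove visitors not seen within `ttl`; return how many were removed."""
        stale = [ip for ip, seen in self.visitors.items() if now - seen > self.ttl]
        for ip in stale:
            del self.visitors[ip]
        return len(stale)
```

Calling `evict_idle` on a timer bounds the map to the clients active within the last `ttl` seconds instead of every address ever seen.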
|
||||
|
||||
### Data Constraints
|
||||
|
||||
- **Database provenance unclear** - "Not affiliated with Spotify"
|
||||
- **No freshness mechanism** - Static snapshot, no updates
|
||||
- **Search performance** - LIKE queries slow on large datasets (no FTS)
|
||||
|
||||
### Operational Constraints
|
||||
|
||||
- **No metrics** - No Prometheus, no counters
|
||||
- **Naive health check** - Doesn't verify database connectivity
|
||||
- **Hardcoded config** - Timeouts, limits not configurable
|
||||
- **No tests** - Zero test coverage
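
A less naive health check would run a trivial query so that a missing or unreadable database surfaces as a failure. The sketch below shows the idea with Python's `sqlite3`; the function name and the response payload shape are assumptions, since the actual `/health` response format is not documented here.

```python
import sqlite3


def health_status(conn: sqlite3.Connection):
    """Return (http_status, payload) based on a trivial query against the database."""
    try:
        conn.execute("SELECT 1").fetchone()
    except sqlite3.Error as exc:
        # Any database-level failure is reported as service unavailable.
        return 503, {"status": "unhealthy", "error": str(exc)}
    return 200, {"status": "ok"}
```

Returning 503 on failure lets load balancers and orchestrator probes take the instance out of rotation instead of routing requests to it.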
|
||||
|
||||
## Project Maturity
|
||||
|
||||
### Strengths
|
||||
|
||||
- Clean, simple codebase
|
||||
- Production-ready Docker setup
|
||||
- Comprehensive OpenAPI spec
|
||||
- Massive dataset (256M tracks)
|
||||
- Pure Go (no CGO complexity)
|
||||
|
||||
### Weaknesses
|
||||
|
||||
- Single maintainer
|
||||
- No test suite
|
||||
- No CI test step
|
||||
- Unused config (LOG_LEVEL)
|
||||
- Memory leak in rate limiter
|
||||
|
||||
## Comparison to Alternatives
|
||||
|
||||
| Feature | Music Metadata API | Spotify Web API | MusicBrainz API |
|
||||
|---------|-------------------|-----------------|-----------------|
|
||||
| Self-hosted | Yes | No | Optional (mirror server) |
|
||||
| Authentication | None | OAuth required | Optional |
|
||||
| Dataset size | 256M tracks | Full catalog | ~40M recordings |
|
||||
| Rate limits | 100 req/s per IP (burst 200) | Varies by tier | 1 req/s |
|
||||
| Batch support | 400 items | 50 items | Limited |
|
||||
| Cost | Free (MIT) | Free tier limited | Free |
|
||||
| Data freshness | Static | Real-time | Community-updated |
|
||||
| Identifier | ISRC, internal IDs | Spotify IDs | MBIDs |
|
||||
|
||||
## Getting Started
|
||||
|
||||
### Minimum Requirements
|
||||
|
||||
1. Go 1.24+ (for building from source)
|
||||
2. 216GB disk space for databases
|
||||
3. Database files (not included in repository)
|
||||
4. 2GB+ RAM recommended
|
||||
|
||||
### Quick Start
|
||||
|
||||
```bash
|
||||
# Clone repository
|
||||
git clone https://github.com/Aunali321/music-metadata-api.git
|
||||
cd music-metadata-api
|
||||
|
||||
# Build binary
|
||||
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server
|
||||
|
||||
# Run server (assumes databases in /data)
|
||||
./metadata-api -db /data/main_database.sqlite3 -addr :8080
|
||||
|
||||
# Test health endpoint
|
||||
curl http://localhost:8080/health
|
||||
|
||||
# View API documentation (`open` is macOS-only; use `xdg-open` on Linux)
|
||||
open http://localhost:8080/docs
|
||||
```
|
||||
|
||||
### Docker Quick Start
|
||||
|
||||
```bash
|
||||
# Pull image
|
||||
docker pull ghcr.io/aunali321/music-metadata-api:latest
|
||||
|
||||
# Run container
|
||||
docker run -d \
|
||||
-p 8080:8080 \
|
||||
-v /path/to/databases:/data:ro \
|
||||
ghcr.io/aunali321/music-metadata-api:latest \
|
||||
-db /data/main_database.sqlite3
|
||||
|
||||
# Check health
|
||||
curl http://localhost:8080/health
|
||||
```
|
||||
|
||||
## Documentation Resources
|
||||
|
||||
- **OpenAPI Spec:** http://localhost:8080/openapi.yaml
|
||||
- **Interactive Docs:** http://localhost:8080/docs
|
||||
- **GitHub Repository:** https://github.com/Aunali321/music-metadata-api
|
||||
- **Docker Image:** ghcr.io/aunali321/music-metadata-api
|
||||
|
||||
## License
|
||||
|
||||
MIT License - Free for commercial and personal use, provided the copyright and license notice is retained.
|
||||