Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00

Music Metadata API - Architecture

Architectural Overview

Music Metadata API follows a clean 3-layer architecture with clear separation of concerns:

┌─────────────────────────────────────────────────────────────┐
│                      HTTP Clients                            │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                   API Layer (internal/api)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │  Handlers    │  │ Rate Limiter │  │   OpenAPI    │      │
│  │  (routing)   │  │ (middleware) │  │   (docs)     │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                Database Layer (internal/db)                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   Queries    │  │ Enrichment   │  │    Batch     │      │
│  │   (SQL)      │  │  (joins)     │  │ Optimization │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                 Models Layer (internal/models)               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │    Track     │  │    Album     │  │   Artist     │      │
│  │   (struct)   │  │   (struct)   │  │  (struct)    │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│              SQLite Databases (read-only)                    │
│  ┌──────────────────────────┐  ┌──────────────────────────┐ │
│  │  main_database.sqlite3   │  │  track_files.sqlite3     │ │
│  │       (~117GB)           │  │       (~99GB)            │ │
│  └──────────────────────────┘  └──────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Directory Structure

music-metadata-api/
├── cmd/
│   └── server/
│       └── main.go                    # Entry point (62 lines)
│
├── internal/
│   ├── api/
│   │   ├── handlers.go                # HTTP route handlers
│   │   ├── ratelimit.go               # Token bucket rate limiter
│   │   └── openapi.go                 # OpenAPI spec + Swagger UI
│   │
│   ├── db/
│   │   └── db.go                      # Database layer (907 lines)
│   │
│   └── models/
│       └── models.go                  # Data structures (65 lines)
│
├── Dockerfile                         # Multi-stage build
├── docker-compose.yml                 # Production deployment
├── go.mod                             # Dependencies
├── go.sum                             # Dependency checksums
├── .gitignore                         # Excludes databases, binaries
└── .github/
    └── workflows/
        └── docker-publish.yml         # CI/CD pipeline

Layer Breakdown

Entry Point: cmd/server/main.go

Responsibilities:

Parse CLI flags (-db, -addr)
Initialize database connections
Set up HTTP router
Configure graceful shutdown
Start HTTP server

Key code flow:

// 1. Parse flags
dbPath := flag.String("db", "", "path to database")
addr := flag.String("addr", ":8080", "server address")
flag.Parse() // without this call the flags keep their defaults

// 2. Initialize database
database, err := db.NewDatabase(*dbPath)
if err != nil {
    slog.Error("open database", "err", err)
    os.Exit(1)
}

// 3. Set up router with rate limiting
mux := http.NewServeMux()
rateLimiter := api.NewRateLimiter(100, 200)  // 100 req/s, 200 burst
handler := rateLimiter.Limit(mux)

// 4. Register routes (mux is a pointer, so routes added here are
//    visible through the already-wrapped handler)
api.RegisterRoutes(mux, database)

// 5. Graceful shutdown on SIGINT/SIGTERM
server := &http.Server{Addr: *addr, Handler: handler}
// ... shutdown logic with 10s timeout

File size: 62 lines (minimal, focused)

API Layer: internal/api/

handlers.go

Responsibilities:

Route registration
Request parsing
Response serialization
Error handling
Query parameter validation

Route patterns (Go 1.22+ enhanced routing):

// Method + path patterns
mux.HandleFunc("POST /batch/lookup", handleBatchLookup)
mux.HandleFunc("GET /lookup/isrc/{isrc}", handleISRCLookup)
mux.HandleFunc("GET /lookup/track/{id}", handleTrackLookup)
mux.HandleFunc("GET /lookup/artist/{id}", handleArtistLookup)
mux.HandleFunc("GET /lookup/album/{id}", handleAlbumLookup)
mux.HandleFunc("GET /lookup/album/{id}/tracks", handleAlbumTracks)
mux.HandleFunc("GET /search/track", handleTrackSearch)
mux.HandleFunc("GET /search/artist", handleArtistSearch)
mux.HandleFunc("GET /health", handleHealth)
mux.HandleFunc("GET /docs", handleDocs)
mux.HandleFunc("GET /openapi.yaml", handleOpenAPI)

Handler pattern:

func handleTrackLookup(w http.ResponseWriter, r *http.Request) {
    // 1. Extract path parameter
    id := r.PathValue("id")
    
    // 2. Call database layer
    track, err := db.GetTrack(id)
    if err != nil {
        http.Error(w, "Track not found", http.StatusNotFound)
        return
    }
    
    // 3. Serialize response
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(track)
}

Validation rules:

Search queries: minimum 2 characters
Batch requests: maximum 400 items
Limit parameters: maximum 50 results
Timeouts: 10 seconds for search queries
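
The validation rules above could be expressed as a few small helpers. This is a minimal sketch, not the code in handlers.go: the names validateQuery, clampLimit and validateBatch are hypothetical, and trimming whitespace, counting runes rather than bytes, and falling back to the maximum for a non-positive limit are all assumptions.

```go
package main

import (
	"errors"
	"strings"
)

// Limits taken from the validation rules above.
const (
	minQueryLen   = 2
	maxBatchItems = 400
	maxLimit      = 50
)

var (
	errQueryTooShort = errors.New("search query must be at least 2 characters")
	errBatchTooLarge = errors.New("batch request exceeds 400 items")
)

// validateQuery trims the query and rejects anything shorter than minQueryLen.
// Length is counted in runes so multi-byte titles are measured by character.
func validateQuery(q string) (string, error) {
	q = strings.TrimSpace(q)
	if len([]rune(q)) < minQueryLen {
		return "", errQueryTooShort
	}
	return q, nil
}

// clampLimit keeps a requested result count within 1..maxLimit.
// Falling back to maxLimit for non-positive values is an assumption.
func clampLimit(n int) int {
	if n <= 0 || n > maxLimit {
		return maxLimit
	}
	return n
}

// validateBatch checks the combined item count across all ID lists.
func validateBatch(lists ...[]string) error {
	total := 0
	for _, l := range lists {
		total += len(l)
	}
	if total > maxBatchItems {
		return errBatchTooLarge
	}
	return nil
}
```

A handler would call validateQuery and clampLimit before touching the database, and validateBatch(req.Tracks, req.Artists, req.Albums, req.ISRCs) right after decoding a BatchRequest.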

ratelimit.go

Implementation: Token bucket algorithm with per-IP tracking

Data structure:

type RateLimiter struct {
    visitors map[string]*rate.Limiter  // IP -> limiter
    mu       sync.RWMutex               // Protects visitors map
    rate     rate.Limit                 // Tokens per second
    burst    int                        // Burst capacity
}

Algorithm:

1. Extract client IP from X-Forwarded-For header (fallback to RemoteAddr)
2. Look up or create limiter for IP
3. Check if token available (limiter.Allow())
4. If allowed, pass to next handler
5. If denied, return HTTP 429 with Retry-After header
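
The first and last steps of this algorithm can be sketched with the standard library alone. The names pickIP, clientIP and limit are hypothetical, the allow callback stands in for the per-IP limiter.Allow() call, and the Retry-After value of one second is an assumption rather than what ratelimit.go sends.

```go
package main

import (
	"net"
	"net/http"
	"strings"
)

// pickIP returns the first address listed in an X-Forwarded-For value,
// or the host part of remoteAddr when the header value is empty.
func pickIP(forwardedFor, remoteAddr string) string {
	if forwardedFor != "" {
		first, _, _ := strings.Cut(forwardedFor, ",")
		if ip := strings.TrimSpace(first); ip != "" {
			return ip
		}
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr // no port present; use the value as-is
	}
	return host
}

// clientIP applies pickIP to an incoming request.
func clientIP(r *http.Request) string {
	return pickIP(r.Header.Get("X-Forwarded-For"), r.RemoteAddr)
}

// limit wraps next and answers 429 with Retry-After when allow reports false.
func limit(allow func(ip string) bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !allow(clientIP(r)) {
			w.Header().Set("Retry-After", "1") // assumed value, in seconds
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

X-Forwarded-For is supplied by the client unless a trusted proxy overwrites it, so per-IP limits keyed on this header are only as reliable as the proxy in front of the server.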

BUG: The visitors map grows without bound. There is no cleanup mechanism for inactive IPs, so memory use on a long-running server increases with every new client address it sees.
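
One common way to bound such a map is to record when each entry was last used and evict idle entries periodically. The sketch below illustrates that idea and is not a patch for ratelimit.go: visitorStore, touch, sweep, run and the ttl field are all hypothetical names, and the limiter is kept as an opaque value so the example needs only the standard library.

```go
package main

import (
	"sync"
	"time"
)

// visitor pairs a limiter with the time it was last used.
// limiter is opaque here; in the real code it would be the
// *rate.Limiter stored in the visitors map.
type visitor struct {
	limiter  any
	lastSeen time.Time
}

type visitorStore struct {
	mu       sync.Mutex
	visitors map[string]*visitor
	ttl      time.Duration // idle time after which an entry is evicted (assumed)
}

func newVisitorStore(ttl time.Duration) *visitorStore {
	return &visitorStore{visitors: make(map[string]*visitor), ttl: ttl}
}

// touch returns the entry for ip, creating it with newLimiter if needed,
// and records now as its last-seen time.
func (s *visitorStore) touch(ip string, now time.Time, newLimiter func() any) *visitor {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.visitors[ip]
	if !ok {
		v = &visitor{limiter: newLimiter()}
		s.visitors[ip] = v
	}
	v.lastSeen = now
	return v
}

// sweep deletes entries idle for longer than ttl and returns how many were removed.
func (s *visitorStore) sweep(now time.Time) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for ip, v := range s.visitors {
		if now.Sub(v.lastSeen) > s.ttl {
			delete(s.visitors, ip)
			removed++
		}
	}
	return removed
}

// run calls sweep once per interval until stop is closed.
func (s *visitorStore) run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case now := <-ticker.C:
			s.sweep(now)
		case <-stop:
			return
		}
	}
}
```

A server would start `go store.run(time.Minute, stop)` once at startup and close stop during shutdown. The cost is a full map scan under the lock on every sweep, which is acceptable when sweeps are infrequent.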

Configuration:

Rate: 100 requests/second
Burst: 200 requests
Scope: Per-IP (not per-user, no authentication)

openapi.go

Responsibilities:

Serve OpenAPI 3.1 specification at /openapi.yaml
Serve Swagger UI at /docs
Embed OpenAPI spec in binary (no external files)

Swagger UI loading:

<!-- Loaded from unpkg.com CDN (browser-side) -->
<script src="https://unpkg.com/swagger-ui-dist@5/swagger-ui-bundle.js"></script>
<link rel="stylesheet" href="https://unpkg.com/swagger-ui-dist@5/swagger-ui.css" />

OpenAPI spec highlights:

Version: 3.1.0
All endpoints documented
Request/response schemas
Example payloads
Error responses

Database Layer: internal/db/db.go

File size: 907 lines (largest file in codebase)

Responsibilities:

SQLite connection management
Query execution
Data enrichment (joining related entities)
Batch optimization
Transaction handling (read-only)

Connection Management

Dual database connections:

type Database struct {
    mainDB       *sql.DB  // main_database.sqlite3
    trackFilesDB *sql.DB  // track_files.sqlite3
}

Connection string PRAGMAs:

file:/path/to/db.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true

PRAGMA breakdown:

PRAGMA	Value	Purpose
`mode=ro`	Read-only	Prevents accidental writes
`_journal_mode=off`	Disabled	No rollback journal (safe because the database is read-only)
`_cache_size=-64000`	~64MB	Page cache size (a negative value is a size in KiB)
`_mmap_size=1073741824`	1GB	Memory-mapped I/O size
`_query_only=true`	Enabled	Additional read-only enforcement
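
A connection string of this shape can be assembled from a fixed parameter list. The sketch below is an illustration only: pragma, readOnlyPragmas and buildDSN are hypothetical names, the parameter keys are copied verbatim from the table, and how a given SQLite driver interprets them is driver-specific.

```go
package main

import "strings"

// pragma is one key/value pair appended to the SQLite connection string.
type pragma struct{ key, value string }

// readOnlyPragmas mirrors the parameters listed in the table above.
var readOnlyPragmas = []pragma{
	{"mode", "ro"},
	{"_journal_mode", "off"},
	{"_cache_size", "-64000"},
	{"_mmap_size", "1073741824"},
	{"_query_only", "true"},
}

// buildDSN joins the path and parameters into a file: URI.
// A slice is used instead of url.Values so the parameter order stays fixed.
// The path is not escaped, so it must not contain '?' or '#'.
func buildDSN(path string, params []pragma) string {
	var b strings.Builder
	b.WriteString("file:")
	b.WriteString(path)
	for i, p := range params {
		if i == 0 {
			b.WriteByte('?')
		} else {
			b.WriteByte('&')
		}
		b.WriteString(p.key)
		b.WriteByte('=')
		b.WriteString(p.value)
	}
	return b.String()
}
```

Calling buildDSN("/path/to/db.sqlite3", readOnlyPragmas) reproduces the connection string shown above, and the same list can be reused for both database files.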

Connection pool:

db.SetMaxOpenConns(8)   // Conservative limit
db.SetMaxIdleConns(8)   // Keep connections warm
db.SetConnMaxLifetime(0) // No expiration

Query Patterns

Individual lookups:

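func (d *Database) GetTrack(id string) (*models.Track, error) {
    var track models.Track

    // 1. Fetch base track + album, scanning columns in SELECT order
    //    (NULL handling for optional columns is omitted here)
    err := d.mainDB.QueryRow(`
        SELECT t.id, t.name, t.isrc, t.duration_ms, t.explicit,
               t.track_number, t.disc_number, t.popularity, t.preview_url,
               a.id, a.name, a.album_type, a.label, a.release_date,
               a.release_date_precision, a.external_id_upc, a.total_tracks
        FROM tracks t
        JOIN albums a ON t.album_rowid = a.rowid
        WHERE t.id = ?
    `, id).Scan(
        &track.ID, &track.Name, &track.ISRC, &track.DurationMs, &track.Explicit,
        &track.TrackNumber, &track.DiscNumber, &track.Popularity, &track.PreviewURL,
        &track.Album.ID, &track.Album.Name, &track.Album.AlbumType, &track.Album.Label,
        &track.Album.ReleaseDate, &track.Album.ReleaseDatePrecision,
        &track.Album.ExternalIDUPC, &track.Album.TotalTracks,
    )
    if err != nil {
        return nil, err // sql.ErrNoRows when no track has this ID
    }

    // 2. Enrich album (images, artists)
    d.enrichAlbum(&track.Album)

    // 3. Enrich track (artists, track_files)
    d.enrichTrack(&track)

    return &track, nil
}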

Batch lookups:

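func (d *Database) BatchGetByISRC(isrcs []string) (map[string]*models.Track, error) {
    tracks := make(map[string]*models.Track, len(isrcs))
    if len(isrcs) == 0 {
        return tracks, nil // strings.Repeat panics on a negative count
    }

    // 1. Build IN clause and convert the arguments to []any,
    //    because a []string cannot be passed as variadic ...any
    placeholders := strings.Repeat("?,", len(isrcs)-1) + "?"
    args := make([]any, len(isrcs))
    for i, isrc := range isrcs {
        args[i] = isrc
    }
    query := fmt.Sprintf(`
        SELECT t.id, t.isrc, ...
        FROM tracks t
        JOIN albums a ON t.album_rowid = a.rowid
        WHERE t.isrc IN (%s)
    `, placeholders)

    // 2. Execute batch query
    rows, err := d.mainDB.Query(query, args...)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    // 3. Scan rows into tracks and collect IDs for enrichment (scanning elided)
    trackIDs := make([]string, 0, len(isrcs))
    albumIDs := make([]string, 0, len(isrcs))

    // 4. Batch enrich all entities
    d.batchEnrichAlbums(albumIDs, tracks)
    d.batchEnrichTracks(trackIDs, tracks)

    return tracks, nil
}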
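
The IN-clause construction used by the batch lookup can be factored into two small helpers. This is a sketch with hypothetical names (placeholders, toArgs); it guards against an empty input, where strings.Repeat would panic on a negative count, and converts the IDs to []any, which is what database/sql's variadic Query arguments require.

```go
package main

import "strings"

// placeholders returns n comma-separated "?" markers, or "" when n <= 0.
func placeholders(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.Repeat("?,", n-1) + "?"
}

// toArgs converts a string slice to []any so it can be passed to
// database/sql's variadic Query(query string, args ...any).
func toArgs(values []string) []any {
	args := make([]any, len(values))
	for i, v := range values {
		args[i] = v
	}
	return args
}
```

A caller would check for an empty slice first, then format placeholders(len(ids)) into the query and pass toArgs(ids)... as the bind values. Only the placeholder markers are interpolated into the SQL text; the IDs themselves always travel as bound parameters.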

Data Enrichment Flow

Track enrichment pipeline:

1. Fetch base track + album (single JOIN)
   ↓
2. Enrich album:
   - Batch fetch album images (batchGetAlbumImages)
   - Batch fetch album artists (batchGetAlbumArtists)
   ↓
3. Enrich track:
   - Batch fetch track artists (batchGetTrackArtists)
   - Batch fetch track files (batchEnrichTrackFiles)
   ↓
4. Enrich artists:
   - Batch fetch artist genres (batchGetArtistGenres)
   - Batch fetch artist images (batchGetArtistImages)
   ↓
5. Return fully enriched track

Batch optimization functions:

Function	Purpose	Query Pattern
`batchGetAlbumImages`	Fetch all images for albums	`WHERE album_id IN (...)`
`batchGetAlbumArtists`	Fetch all artists for albums	`WHERE album_id IN (...)`
`batchGetTrackArtists`	Fetch all artists for tracks	`WHERE track_id IN (...)`
`batchGetArtistGenres`	Fetch all genres for artists	`WHERE artist_id IN (...)`
`batchGetArtistImages`	Fetch all images for artists	`WHERE artist_id IN (...)`
`batchEnrichTrackFiles`	Fetch extended track data	`WHERE track_id IN (...)`
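
Each of these functions runs one IN query and then has to hand the rows back to the right parent entity. A minimal sketch of that grouping step follows, using album images as the example. The imageRow type, the groupImages name and the url, width and height column names are assumptions; only the album_id filter comes from the table above.

```go
package main

// imageRow is one row of a hypothetical album-image query such as
// SELECT album_id, url, width, height ... WHERE album_id IN (...).
type imageRow struct {
	AlbumID string
	URL     string
	Width   int
	Height  int
}

// Image matches the shape of the Image model used in API responses.
type Image struct {
	URL    string
	Width  int
	Height int
}

// groupImages buckets rows by album ID so each album can pick up its
// images with a map lookup instead of a per-album query.
func groupImages(rows []imageRow) map[string][]Image {
	byAlbum := make(map[string][]Image)
	for _, r := range rows {
		byAlbum[r.AlbumID] = append(byAlbum[r.AlbumID], Image{
			URL:    r.URL,
			Width:  r.Width,
			Height: r.Height,
		})
	}
	return byAlbum
}
```

After grouping, enrichment is a loop over the albums that assigns byAlbum[album.ID] to album.Images. Albums with no rows get a nil slice, which the omitempty tag on Images leaves out of the JSON.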

Why batch optimization matters:

A single batch request with 400 tracks triggers 1 main query plus 6 batch enrichment queries
Without batching: 400 tracks × 6 queries = 2,400 enrichment queries
With batching: 1 main query + 6 batch queries = 7 database queries
Performance gain: 2,400 / 7 ≈ 343x fewer queries
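
The arithmetic above generalizes to any batch size. The helper names below are hypothetical, and the model assumes every track needs every enrichment type exactly once, which is a simplification.

```go
package main

// queriesUnbatched counts enrichment queries when every track is enriched on its own.
func queriesUnbatched(tracks, enrichments int) int {
	return tracks * enrichments
}

// queriesBatched counts one main query plus one query per enrichment type.
func queriesBatched(enrichments int) int {
	return 1 + enrichments
}

// reduction is the ratio of unbatched to batched query counts.
func reduction(tracks, enrichments int) float64 {
	return float64(queriesUnbatched(tracks, enrichments)) / float64(queriesBatched(enrichments))
}
```

With 400 tracks and 6 enrichment types the ratio is 2,400 / 7, about 343. The ratio grows linearly with batch size, so small batches gain far less than the headline figure.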

Search Implementation

Track search:

SELECT id, name, isrc, duration_ms, popularity, album_rowid
FROM tracks
WHERE name LIKE ? COLLATE NOCASE
ORDER BY popularity DESC
LIMIT ?

Artist search:

SELECT id, name, followers_total, popularity
FROM artists
WHERE name LIKE ? COLLATE NOCASE
ORDER BY followers_total DESC
LIMIT ?

Search characteristics:

Pattern: %query% (substring match)
Collation: NOCASE (case-insensitive)
Timeout: 10 seconds (context deadline)
Min query length: 2 characters
Max results: 50
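
These characteristics translate into a pattern builder, a pair of bind values and a context deadline. The sketch below uses hypothetical names (likePattern, trackSearchArgs, searchContext) and assumes the handler passes the derived context to QueryContext. Whether a running scan is actually interrupted at the deadline depends on the driver.

```go
package main

import (
	"context"
	"time"
)

// searchTimeout is the 10-second deadline listed above.
const searchTimeout = 10 * time.Second

// likePattern wraps the query in % wildcards for a substring match.
// It does not escape % or _ inside q, so those characters act as wildcards.
func likePattern(q string) string {
	return "%" + q + "%"
}

// trackSearchArgs returns the two bind values for the track search query:
// the LIKE pattern and the LIMIT.
func trackSearchArgs(q string, limit int) []any {
	return []any{likePattern(q), limit}
}

// searchContext derives a context that is cancelled after searchTimeout.
func searchContext(parent context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(parent, searchTimeout)
}
```

A handler would call searchContext(r.Context()), defer the returned cancel function, and run the query with the derived context and trackSearchArgs(q, limit)... as arguments.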

Performance concern: A LIKE pattern with a leading wildcard (%query%) cannot use a B-tree index, so every search is a full table scan over 256M tracks and will be slow. SQLite's full-text search (FTS) would be faster but is not implemented.

Models Layer: internal/models/models.go

File size: 65 lines (smallest layer)

Responsibilities:

Define data structures
JSON serialization tags
Nested relationships

Core models:

type Track struct {
    ID            string   `json:"id"`
    Name          string   `json:"name"`
    ISRC          string   `json:"isrc,omitempty"`
    DurationMs    int      `json:"duration_ms"`
    Explicit      bool     `json:"explicit"`
    TrackNumber   int      `json:"track_number"`
    DiscNumber    int      `json:"disc_number"`
    Popularity    int      `json:"popularity"`
    PreviewURL    string   `json:"preview_url,omitempty"`
    Album         Album    `json:"album"`
    Artists       []Artist `json:"artists"`
    
    // Extended fields from track_files DB
    OriginalTitle string                 `json:"original_title,omitempty"`
    VersionTitle  string                 `json:"version_title,omitempty"`
    HasLyrics     bool                   `json:"has_lyrics"`
    Languages     []string               `json:"languages,omitempty"`
    ArtistRoles   map[string][]string    `json:"artist_roles,omitempty"`
}

type Album struct {
    ID                    string   `json:"id"`
    Name                  string   `json:"name"`
    AlbumType             string   `json:"album_type"`
    Label                 string   `json:"label,omitempty"`
    ReleaseDate           string   `json:"release_date"`
    ReleaseDatePrecision  string   `json:"release_date_precision"`
    ExternalIDUPC         string   `json:"external_id_upc,omitempty"`
    TotalTracks           int      `json:"total_tracks"`
    CopyrightC            string   `json:"copyright_c,omitempty"`
    CopyrightP            string   `json:"copyright_p,omitempty"`
    Images                []Image  `json:"images,omitempty"`
    Artists               []Artist `json:"artists,omitempty"`
}

type Artist struct {
    ID             string   `json:"id"`
    Name           string   `json:"name"`
    FollowersTotal int      `json:"followers_total,omitempty"`
    Popularity     int      `json:"popularity,omitempty"`
    Genres         []string `json:"genres,omitempty"`
    Images         []Image  `json:"images,omitempty"`
}

type Image struct {
    URL    string `json:"url"`
    Width  int    `json:"width"`
    Height int    `json:"height"`
}

Batch request/response models:

type BatchRequest struct {
    Tracks  []string `json:"tracks,omitempty"`   // Track IDs
    Artists []string `json:"artists,omitempty"`  // Artist IDs
    Albums  []string `json:"albums,omitempty"`   // Album IDs
    ISRCs   []string `json:"isrcs,omitempty"`    // ISRC codes
}

type BatchResponse struct {
    Tracks  map[string]*Track  `json:"tracks,omitempty"`
    Artists map[string]*Artist `json:"artists,omitempty"`
    Albums  map[string]*Album  `json:"albums,omitempty"`
    ISRCs   map[string]*Track  `json:"isrcs,omitempty"`
}
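
The omitempty tags on these models decide which fields appear in a response. The sketch below shows the effect on a trimmed copy of Artist (the Images field is left out to keep it short). The encode helper is hypothetical and exists only for the demonstration.

```go
package main

import "encoding/json"

// Artist repeats the model above without the Images field.
type Artist struct {
	ID             string   `json:"id"`
	Name           string   `json:"name"`
	FollowersTotal int      `json:"followers_total,omitempty"`
	Popularity     int      `json:"popularity,omitempty"`
	Genres         []string `json:"genres,omitempty"`
}

// encode marshals v to a JSON string and panics on error,
// which is acceptable in a sketch but not in a handler.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}
```

encode(Artist{ID: "a1", Name: "Example"}) yields {"id":"a1","name":"Example"}: zero-valued integers and nil slices are dropped. One consequence is that an artist whose popularity is genuinely 0 is serialized the same way as one whose popularity is unknown, so clients cannot tell the two apart.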

Request Flow

Example: GET /lookup/track/{id}

1. Client Request
   GET /lookup/track/abc123
   ↓
2. Rate Limiter Middleware
   - Extract IP from X-Forwarded-For
   - Check token bucket for IP
   - If allowed, continue; else return 429
   ↓
3. HTTP Handler (api/handlers.go)
   - Extract "abc123" from path
   - Call db.GetTrack("abc123")
   ↓
4. Database Layer (db/db.go)
   - Query track + album (single JOIN)
   - Enrich album (images, artists)
   - Enrich track (artists, track_files)
   - Enrich artists (genres, images)
   ↓
5. Models Layer (models/models.go)
   - Populate Track struct
   - Nest Album, Artists
   ↓
6. HTTP Handler
   - Serialize Track to JSON
   - Set Content-Type: application/json
   - Write response
   ↓
7. Client Response
   200 OK
   {
     "id": "abc123",
     "name": "Song Title",
     "album": {...},
     "artists": [...]
   }

Example: POST /batch/lookup

1. Client Request
   POST /batch/lookup
   {
     "isrcs": ["USRC12345678", "GBUM71234567", ...],  // Up to 400
     "tracks": ["id1", "id2", ...]
   }
   ↓
2. Rate Limiter Middleware
   - Single request counts as 1 token (not 400)
   ↓
3. HTTP Handler
   - Parse BatchRequest
   - Validate: max 400 items total
   - Call db.BatchGetByISRC(isrcs)
   - Call db.BatchGetTracks(trackIDs)
   ↓
4. Database Layer
   - Build IN clause for ISRCs
   - Execute batch query (1 query for all ISRCs)
   - Collect all track/album/artist IDs
   - Batch enrich all entities (6 batch queries)
   ↓
5. HTTP Handler
   - Build BatchResponse with maps
   - Serialize to JSON
   ↓
6. Client Response
   200 OK
   {
     "isrcs": {
       "USRC12345678": {...},
       "GBUM71234567": {...}
     },
     "tracks": {
       "id1": {...},
       "id2": {...}
     }
   }

Graceful Shutdown

Signal handling:

// Listen for SIGINT (Ctrl+C) and SIGTERM (Docker stop)
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)

// Block until signal received
<-sigChan

// Shutdown with 10-second timeout
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

server.Shutdown(ctx)  // Stop accepting new requests, finish in-flight

Shutdown sequence:

1. Receive SIGINT or SIGTERM
2. Stop accepting new connections
3. Wait for in-flight requests (max 10 seconds)
4. Close database connections
5. Exit process

No Framework Philosophy

Music Metadata API uses zero web frameworks. Everything is Go stdlib:

Routing: Go 1.22+ enhanced http.ServeMux

Method-specific routes: GET /path, POST /path
Path parameters: /lookup/track/{id}
No regex matching; only simple {name} path wildcards are used

JSON: encoding/json stdlib

json.NewEncoder(w).Encode(data) for responses
json.NewDecoder(r.Body).Decode(&req) for requests

HTTP Server: net/http stdlib

http.Server with custom Addr and Handler
No middleware framework (custom rate limiter)

Database: database/sql stdlib

modernc.org/sqlite driver (pure Go, no CGO)
Raw SQL queries (no ORM)

Logging: log/slog stdlib

Structured logging for errors
No log levels (all logs are errors)

Benefits:

Minimal dependencies (2 external packages)
No framework lock-in
Easy to understand (no magic)
Fast compilation
Small binary size

Tradeoffs:

More boilerplate (manual error handling)
No built-in middleware chain
Manual query building (no ORM)
No automatic validation

Performance Characteristics

Strengths:

Read-only databases (no write locks)
Connection pooling (8 connections)
Memory-mapped I/O (1GB mmap)
Batch optimization (343x fewer queries)
Conservative cache (64MB)

Bottlenecks:

Search queries (LIKE %query% on 256M rows)
Rate limiter memory leak (unbounded map)
No query result caching
No CDN for image URLs

Scalability:

Horizontal: Run multiple instances (read-only safe)
Vertical: Limited by disk I/O. SQLite's single-writer model is not a constraint here because the databases are read-only.
Database size: 216GB requires SSD for acceptable performance
