Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


Music Metadata API - Architecture

Architectural Overview

Music Metadata API follows a clean 3-layer architecture with clear separation of concerns:

┌─────────────────────────────────────────────────────────────┐
│                      HTTP Clients                            │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                   API Layer (internal/api)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │  Handlers    │  │ Rate Limiter │  │   OpenAPI    │      │
│  │  (routing)   │  │ (middleware) │  │   (docs)     │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                Database Layer (internal/db)                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   Queries    │  │ Enrichment   │  │    Batch     │      │
│  │   (SQL)      │  │  (joins)     │  │ Optimization │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                 Models Layer (internal/models)               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │    Track     │  │    Album     │  │   Artist     │      │
│  │   (struct)   │  │   (struct)   │  │  (struct)    │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│              SQLite Databases (read-only)                    │
│  ┌──────────────────────────┐  ┌──────────────────────────┐ │
│  │  main_database.sqlite3   │  │  track_files.sqlite3     │ │
│  │       (~117GB)           │  │       (~99GB)            │ │
│  └──────────────────────────┘  └──────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Directory Structure

music-metadata-api/
├── cmd/
│   └── server/
│       └── main.go                    # Entry point (62 lines)
│
├── internal/
│   ├── api/
│   │   ├── handlers.go                # HTTP route handlers
│   │   ├── ratelimit.go               # Token bucket rate limiter
│   │   └── openapi.go                 # OpenAPI spec + Swagger UI
│   │
│   ├── db/
│   │   └── db.go                      # Database layer (907 lines)
│   │
│   └── models/
│       └── models.go                  # Data structures (65 lines)
│
├── Dockerfile                         # Multi-stage build
├── docker-compose.yml                 # Production deployment
├── go.mod                             # Dependencies
├── go.sum                             # Dependency checksums
├── .gitignore                         # Excludes databases, binaries
└── .github/
    └── workflows/
        └── docker-publish.yml         # CI/CD pipeline

Layer Breakdown

Entry Point: cmd/server/main.go

Responsibilities:

  • Parse CLI flags (-db, -addr)
  • Initialize database connections
  • Set up HTTP router
  • Configure graceful shutdown
  • Start HTTP server

Key code flow:

// 1. Parse flags
dbPath := flag.String("db", "", "path to database")
addr := flag.String("addr", ":8080", "server address")
flag.Parse() // required: without it both flags keep their defaults

// 2. Initialize database
database, err := db.NewDatabase(*dbPath)
if err != nil {
    slog.Error("open database", "error", err)
    os.Exit(1)
}

// 3. Set up router with rate limiting
mux := http.NewServeMux()
rateLimiter := api.NewRateLimiter(100, 200)  // 100 req/s, 200 burst
handler := rateLimiter.Limit(mux)            // wraps mux, so routes registered below are still served

// 4. Register routes
api.RegisterRoutes(mux, database)

// 5. Graceful shutdown on SIGINT/SIGTERM
server := &http.Server{Addr: *addr, Handler: handler}
// ... shutdown logic with 10s timeout

File size: 62 lines (minimal, focused)

API Layer: internal/api/

handlers.go

Responsibilities:

  • Route registration
  • Request parsing
  • Response serialization
  • Error handling
  • Query parameter validation

Route patterns (Go 1.22+ enhanced routing):

// Method + path patterns
mux.HandleFunc("POST /batch/lookup", handleBatchLookup)
mux.HandleFunc("GET /lookup/isrc/{isrc}", handleISRCLookup)
mux.HandleFunc("GET /lookup/track/{id}", handleTrackLookup)
mux.HandleFunc("GET /lookup/artist/{id}", handleArtistLookup)
mux.HandleFunc("GET /lookup/album/{id}", handleAlbumLookup)
mux.HandleFunc("GET /lookup/album/{id}/tracks", handleAlbumTracks)
mux.HandleFunc("GET /search/track", handleTrackSearch)
mux.HandleFunc("GET /search/artist", handleArtistSearch)
mux.HandleFunc("GET /health", handleHealth)
mux.HandleFunc("GET /docs", handleDocs)
mux.HandleFunc("GET /openapi.yaml", handleOpenAPI)

Handler pattern:

func handleTrackLookup(w http.ResponseWriter, r *http.Request) {
    // 1. Extract path parameter
    id := r.PathValue("id")

    // 2. Call database layer
    //    (db is assumed to be the *db.Database handle passed to RegisterRoutes)
    track, err := db.GetTrack(id)
    if err != nil {
        // Simplified: every failure is reported as 404, including query errors
        http.Error(w, "Track not found", http.StatusNotFound)
        return
    }

    // 3. Serialize response
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(track)
}

Validation rules:

  • Search queries: minimum 2 characters
  • Batch requests: maximum 400 items
  • Limit parameters: maximum 50 results
  • Timeouts: 10 seconds for search queries
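
The search limits above could be enforced with a small helper along these lines. This is a minimal sketch, not the project's code: the function name, the parameter names `q` and `limit`, the default limit, and the choice to clamp an oversized limit instead of rejecting it are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

const (
	minQueryLen = 2  // minimum search query length (from the rules above)
	maxLimit    = 50 // maximum results per search (from the rules above)
)

// parseSearchParams validates the raw "q" and "limit" query values.
// An empty limit falls back to maxLimit (an assumed default).
func parseSearchParams(q, limit string) (string, int, error) {
	if len([]rune(q)) < minQueryLen {
		return "", 0, errors.New("query must be at least 2 characters")
	}
	n := maxLimit
	if limit != "" {
		v, err := strconv.Atoi(limit)
		if err != nil || v < 1 {
			return "", 0, errors.New("limit must be a positive integer")
		}
		n = min(v, maxLimit) // clamp instead of rejecting (assumption)
	}
	return q, n, nil
}

func main() {
	q, n, err := parseSearchParams("daft punk", "200")
	fmt.Println(q, n, err)
}
```

A handler would call this with `r.URL.Query().Get("q")` and `r.URL.Query().Get("limit")` and answer with HTTP 400 when an error comes back.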

ratelimit.go

Implementation: Token bucket algorithm with per-IP tracking

Data structure:

type RateLimiter struct {
    visitors map[string]*rate.Limiter  // IP -> limiter
    mu       sync.RWMutex               // Protects visitors map
    rate     rate.Limit                 // Tokens per second
    burst    int                        // Burst capacity
}

Algorithm:

  1. Extract client IP from X-Forwarded-For header (fallback to RemoteAddr)
  2. Look up or create limiter for IP
  3. Check if token available (limiter.Allow())
  4. If allowed, pass to next handler
  5. If denied, return HTTP 429 with Retry-After header
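
Step 1 of the algorithm can be sketched as a pure function over the two inputs. The name `clientIP` and the decision to take the first X-Forwarded-For entry are assumptions; the real `ratelimit.go` may differ.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// clientIP returns the first X-Forwarded-For entry if one is present,
// otherwise the host part of remoteAddr.
func clientIP(forwardedFor, remoteAddr string) string {
	if forwardedFor != "" {
		first, _, _ := strings.Cut(forwardedFor, ",")
		if ip := strings.TrimSpace(first); ip != "" {
			return ip
		}
	}
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return remoteAddr // no port present; use the value as-is
	}
	return host
}

func main() {
	fmt.Println(clientIP("203.0.113.7, 10.0.0.1", "10.0.0.1:5000"))
	fmt.Println(clientIP("", "192.0.2.1:1234"))
}
```

In a handler the inputs would be `r.Header.Get("X-Forwarded-For")` and `r.RemoteAddr`. The header is supplied by the client unless a trusted proxy overwrites it, so a per-IP limit keyed on it is only as reliable as the proxy setup in front of the server.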

BUG: The visitors map grows without bound. Nothing removes limiters for inactive IPs, so memory use on a long-running server grows with the number of distinct client IPs it has seen.

Configuration:

  • Rate: 100 requests/second
  • Burst: 200 requests
  • Scope: Per-IP (not per-user, no authentication)
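
To make the rate and burst numbers concrete, here is a hand-rolled token bucket with the same two parameters. It is a simplified stand-in for the `rate.Limiter` type referenced above, written only to illustrate the behavior; it is not goroutine-safe, and all names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// bucket holds up to burst tokens and refills at rate tokens per second.
type bucket struct {
	rate, burst, tokens float64
	last                time.Time
}

// newBucket starts full, so an idle client can immediately spend the whole burst.
func newBucket(rate float64, burst int, now time.Time) *bucket {
	return &bucket{rate: rate, burst: float64(burst), tokens: float64(burst), last: now}
}

// allow refills for the elapsed time, then spends one token if available.
func (b *bucket) allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.burst, b.tokens+elapsed*b.rate)
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// countAllowed fires n requests at the same instant and counts the accepted ones.
func countAllowed(b *bucket, now time.Time, n int) int {
	ok := 0
	for i := 0; i < n; i++ {
		if b.allow(now) {
			ok++
		}
	}
	return ok
}

// demoBucket sends 250 requests at t0, then 150 more one second later.
func demoBucket() (int, int) {
	t0 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	b := newBucket(100, 200, t0)
	first := countAllowed(b, t0, 250)
	second := countAllowed(b, t0.Add(time.Second), 150)
	return first, second
}

func main() {
	first, second := demoBucket()
	fmt.Println("accepted at t0:", first, "accepted at t0+1s:", second)
}
```

With rate 100 and burst 200, the first instant accepts the full burst of 200 and rejects the rest; one second later 100 tokens have been refilled, so 100 of the next 150 requests pass.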

openapi.go

Responsibilities:

  • Serve OpenAPI 3.1 specification at /openapi.yaml
  • Serve Swagger UI at /docs
  • Embed OpenAPI spec in binary (no external files)

Swagger UI loading:

<!-- Loaded from unpkg.com CDN (browser-side) -->
<script src="https://unpkg.com/swagger-ui-dist@5/swagger-ui-bundle.js"></script>
<link rel="stylesheet" href="https://unpkg.com/swagger-ui-dist@5/swagger-ui.css" />

OpenAPI spec highlights:

  • Version: 3.1.0
  • All endpoints documented
  • Request/response schemas
  • Example payloads
  • Error responses

Database Layer: internal/db/db.go

File size: 907 lines (largest file in codebase)

Responsibilities:

  • SQLite connection management
  • Query execution
  • Data enrichment (joining related entities)
  • Batch optimization
  • Transaction handling (read-only)

Connection Management

Dual database connections:

type Database struct {
    mainDB       *sql.DB  // main_database.sqlite3
    trackFilesDB *sql.DB  // track_files.sqlite3
}

Connection string PRAGMAs:

file:/path/to/db.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true

PRAGMA breakdown:

PRAGMA                  Value      Purpose
mode=ro                 Read-only  Prevents accidental writes
_journal_mode=off       Disabled   No rollback journal (safe because the database is read-only)
_cache_size=-64000      ~64MB      Page cache size (a negative value is a size in KiB)
_mmap_size=1073741824   1GB        Memory-mapped I/O size
_query_only=true        Enabled    Additional read-only enforcement
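
The parameter names above use the `_name=value` style. The modernc.org/sqlite driver documents a different form, `_pragma=name(value)`, so it is worth checking which form the driver in use actually honors. As a sketch under that assumption, an equivalent DSN in the `_pragma` form could be built like this (`readOnlyDSN` is a hypothetical helper):

```go
package main

import "fmt"

// readOnlyDSN builds a read-only SQLite URI using the _pragma=name(value)
// form. The pragma values mirror the table above.
func readOnlyDSN(path string) string {
	pragmas := []string{
		"journal_mode(off)",
		"cache_size(-64000)",
		"mmap_size(1073741824)",
		"query_only(true)",
	}
	dsn := "file:" + path + "?mode=ro"
	for _, p := range pragmas {
		dsn += "&_pragma=" + p
	}
	return dsn
}

func main() {
	fmt.Println(readOnlyDSN("/path/to/db.sqlite3"))
}
```

Running `PRAGMA cache_size;` or `PRAGMA mmap_size;` on an open connection is a quick way to confirm that the settings took effect.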

Connection pool:

db.SetMaxOpenConns(8)   // Conservative limit
db.SetMaxIdleConns(8)   // Keep connections warm
db.SetConnMaxLifetime(0) // No expiration

Query Patterns

Individual lookups:

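func (d *Database) GetTrack(id string) (*models.Track, error) {
    // 1. Fetch base track + album
    row := d.mainDB.QueryRow(`
        SELECT t.id, t.name, t.isrc, t.duration_ms, t.explicit,
               t.track_number, t.disc_number, t.popularity, t.preview_url,
               a.id, a.name, a.album_type, a.label, a.release_date,
               a.release_date_precision, a.external_id_upc, a.total_tracks
        FROM tracks t
        JOIN albums a ON t.album_rowid = a.rowid
        WHERE t.id = ?
    `, id)

    // Scan into the track and its nested album (assumes none of these columns is NULL).
    // An unknown ID surfaces here as sql.ErrNoRows.
    var track models.Track
    if err := row.Scan(
        &track.ID, &track.Name, &track.ISRC, &track.DurationMs, &track.Explicit,
        &track.TrackNumber, &track.DiscNumber, &track.Popularity, &track.PreviewURL,
        &track.Album.ID, &track.Album.Name, &track.Album.AlbumType, &track.Album.Label,
        &track.Album.ReleaseDate, &track.Album.ReleaseDatePrecision,
        &track.Album.ExternalIDUPC, &track.Album.TotalTracks,
    ); err != nil {
        return nil, err
    }

    // 2. Enrich album (images, artists)
    d.enrichAlbum(&track.Album)

    // 3. Enrich track (artists, track_files)
    d.enrichTrack(&track)

    return &track, nil
}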

Batch lookups:

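func (d *Database) BatchGetByISRC(isrcs []string) (map[string]*models.Track, error) {
    tracks := make(map[string]*models.Track, len(isrcs))
    if len(isrcs) == 0 {
        return tracks, nil // also keeps strings.Repeat from panicking on a negative count
    }

    // 1. Build IN clause; Query takes ...any, so the []string must be converted first
    placeholders := strings.Repeat("?,", len(isrcs)-1) + "?"
    args := make([]any, len(isrcs))
    for i, isrc := range isrcs {
        args[i] = isrc
    }
    query := fmt.Sprintf(`
        SELECT t.id, t.isrc, ...
        FROM tracks t
        JOIN albums a ON t.album_rowid = a.rowid
        WHERE t.isrc IN (%s)
    `, placeholders)

    // 2. Execute batch query
    rows, err := d.mainDB.Query(query, args...)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    // ... scan each row into tracks, keyed by ISRC

    // 3. Collect track IDs for enrichment
    trackIDs := make([]string, 0, len(tracks))
    albumIDs := make([]string, 0, len(tracks))
    // ... fill both slices from the scanned tracks

    // 4. Batch enrich all entities
    d.batchEnrichAlbums(albumIDs, tracks)
    d.batchEnrichTracks(trackIDs, tracks)

    return tracks, nil
}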
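
The IN-clause step is easy to get wrong in two places: an empty input, and passing a `[]string` where `database/sql` expects `...any`. A pair of small helpers covers both. The names `inPlaceholders` and `toArgs` are hypothetical, not taken from `db.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// inPlaceholders returns "?,?,...,?" with n placeholders, or "" when n <= 0.
func inPlaceholders(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.Repeat("?,", n-1) + "?"
}

// toArgs converts string values into the []any that Query(query, args...) accepts.
func toArgs(values []string) []any {
	args := make([]any, len(values))
	for i, v := range values {
		args[i] = v
	}
	return args
}

func main() {
	isrcs := []string{"USRC12345678", "GBUM71234567"}
	query := fmt.Sprintf("SELECT id FROM tracks WHERE isrc IN (%s)", inPlaceholders(len(isrcs)))
	fmt.Println(query, toArgs(isrcs))
}
```

Callers should skip the query entirely when the input is empty, since `IN ()` is not portable SQL.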

Data Enrichment Flow

Track enrichment pipeline:

1. Fetch base track + album (single JOIN)
   ↓
2. Enrich album:
   - Batch fetch album images (batchGetAlbumImages)
   - Batch fetch album artists (batchGetAlbumArtists)
   ↓
3. Enrich track:
   - Batch fetch track artists (batchGetTrackArtists)
   - Batch fetch track files (batchEnrichTrackFiles)
   ↓
4. Enrich artists:
   - Batch fetch artist genres (batchGetArtistGenres)
   - Batch fetch artist images (batchGetArtistImages)
   ↓
5. Return fully enriched track

Batch optimization functions:

Function                Purpose                         Query Pattern
batchGetAlbumImages     Fetch all images for albums     WHERE album_id IN (...)
batchGetAlbumArtists    Fetch all artists for albums    WHERE album_id IN (...)
batchGetTrackArtists    Fetch all artists for tracks    WHERE track_id IN (...)
batchGetArtistGenres    Fetch all genres for artists    WHERE artist_id IN (...)
batchGetArtistImages    Fetch all images for artists    WHERE artist_id IN (...)
batchEnrichTrackFiles   Fetch extended track data       WHERE track_id IN (...)
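
Each of these functions has the same second half: the rows from one `IN (...)` query must be distributed back to their parent entities. A minimal sketch of that grouping step for album images follows. `albumImageRow` and `groupByAlbum` are hypothetical names, and `Image` is trimmed to the fields needed here.

```go
package main

import "fmt"

// Image mirrors the shape of the Image model.
type Image struct {
	URL           string
	Width, Height int
}

// albumImageRow is one scanned row of an assumed
// "SELECT album_id, url, width, height ... WHERE album_id IN (...)" query.
type albumImageRow struct {
	AlbumID string
	Image   Image
}

// groupByAlbum buckets rows by album ID, preserving row order within each album.
func groupByAlbum(rows []albumImageRow) map[string][]Image {
	out := make(map[string][]Image)
	for _, r := range rows {
		out[r.AlbumID] = append(out[r.AlbumID], r.Image)
	}
	return out
}

func main() {
	rows := []albumImageRow{
		{AlbumID: "a", Image: Image{URL: "a-640.jpg", Width: 640, Height: 640}},
		{AlbumID: "b", Image: Image{URL: "b-640.jpg", Width: 640, Height: 640}},
		{AlbumID: "a", Image: Image{URL: "a-300.jpg", Width: 300, Height: 300}},
	}
	for id, imgs := range groupByAlbum(rows) {
		fmt.Println(id, len(imgs))
	}
}
```

The caller then assigns `grouped[album.ID]` to each album's `Images` field, which costs one map lookup per album instead of one query per album.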

Why batch optimization matters:

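  • A single batch request with 400 tracks triggers ~6 batch enrichment queries
  • Without batching: 400 tracks × 6 enrichment queries = 2,400 database queries
  • With batching: 1 main query + 6 batch queries = 7 database queries
  • Performance gain: ~343x fewer queries (2,400 / 7); counting the 400 per-track base lookups on the unbatched side as well, the ratio is 2,800 / 7 = 400x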
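
The arithmetic generalizes to any batch size. A small calculation, using the same counting as the bullets above (enrichment queries only on the unbatched side, main query included on the batched side):

```go
package main

import "fmt"

// queryCounts returns the number of queries without and with batching for
// a request of the given size and number of enrichment steps.
func queryCounts(tracks, enrichSteps int) (unbatched, batched int) {
	unbatched = tracks * enrichSteps // one enrichment query per track per step
	batched = 1 + enrichSteps        // one main query plus one query per step
	return unbatched, batched
}

func main() {
	u, b := queryCounts(400, 6)
	fmt.Printf("%d vs %d queries (%.0fx fewer)\n", u, b, float64(u)/float64(b))
}
```

The batched count is independent of the number of tracks, which is why the gain grows with batch size.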

Search Implementation

Track search:

SELECT id, name, isrc, duration_ms, popularity, album_rowid
FROM tracks
WHERE name LIKE ? COLLATE NOCASE
ORDER BY popularity DESC
LIMIT ?

Artist search:

SELECT id, name, followers_total, popularity
FROM artists
WHERE name LIKE ? COLLATE NOCASE
ORDER BY followers_total DESC
LIMIT ?

Search characteristics:

  • Pattern: %query% (substring match)
  • Collation: NOCASE (case-insensitive for ASCII letters only)
  • Timeout: 10 seconds (context deadline)
  • Min query length: 2 characters
  • Max results: 50
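
Building the `%query%` pattern from user input raises one detail: `%` and `_` inside the query are themselves LIKE wildcards. The sketch below escapes them with a backslash. This only works if the SQL also declares the escape character (`LIKE ? ESCAPE '\'`), which the queries shown above do not, so treat it as an optional hardening step and `likePattern` as a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// likeEscaper escapes the backslash itself first, then the two LIKE wildcards.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// likePattern wraps the escaped query in % for a substring match.
func likePattern(q string) string {
	return "%" + likeEscaper.Replace(q) + "%"
}

func main() {
	fmt.Println(likePattern("love"))
	fmt.Println(likePattern("100%"))
}
```

The 10-second limit would be applied separately, by passing a `context.WithTimeout` context to `QueryContext`.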

Performance concern: A pattern with a leading wildcard (LIKE '%query%') cannot use a B-tree index, so every search is a full table scan. On 256M tracks that will be slow. SQLite full-text search (FTS) would be faster but is not implemented.

Models Layer: internal/models/models.go

File size: 65 lines (smallest layer)

Responsibilities:

  • Define data structures
  • JSON serialization tags
  • Nested relationships

Core models:

type Track struct {
    ID            string   `json:"id"`
    Name          string   `json:"name"`
    ISRC          string   `json:"isrc,omitempty"`
    DurationMs    int      `json:"duration_ms"`
    Explicit      bool     `json:"explicit"`
    TrackNumber   int      `json:"track_number"`
    DiscNumber    int      `json:"disc_number"`
    Popularity    int      `json:"popularity"`
    PreviewURL    string   `json:"preview_url,omitempty"`
    Album         Album    `json:"album"`
    Artists       []Artist `json:"artists"`
    
    // Extended fields from track_files DB
    OriginalTitle string                 `json:"original_title,omitempty"`
    VersionTitle  string                 `json:"version_title,omitempty"`
    HasLyrics     bool                   `json:"has_lyrics"`
    Languages     []string               `json:"languages,omitempty"`
    ArtistRoles   map[string][]string    `json:"artist_roles,omitempty"`
}

type Album struct {
    ID                    string   `json:"id"`
    Name                  string   `json:"name"`
    AlbumType             string   `json:"album_type"`
    Label                 string   `json:"label,omitempty"`
    ReleaseDate           string   `json:"release_date"`
    ReleaseDatePrecision  string   `json:"release_date_precision"`
    ExternalIDUPC         string   `json:"external_id_upc,omitempty"`
    TotalTracks           int      `json:"total_tracks"`
    CopyrightC            string   `json:"copyright_c,omitempty"`
    CopyrightP            string   `json:"copyright_p,omitempty"`
    Images                []Image  `json:"images,omitempty"`
    Artists               []Artist `json:"artists,omitempty"`
}

type Artist struct {
    ID             string   `json:"id"`
    Name           string   `json:"name"`
    FollowersTotal int      `json:"followers_total,omitempty"`
    Popularity     int      `json:"popularity,omitempty"`
    Genres         []string `json:"genres,omitempty"`
    Images         []Image  `json:"images,omitempty"`
}

type Image struct {
    URL    string `json:"url"`
    Width  int    `json:"width"`
    Height int    `json:"height"`
}

Batch request/response models:

type BatchRequest struct {
    Tracks  []string `json:"tracks,omitempty"`   // Track IDs
    Artists []string `json:"artists,omitempty"`  // Artist IDs
    Albums  []string `json:"albums,omitempty"`   // Album IDs
    ISRCs   []string `json:"isrcs,omitempty"`    // ISRC codes
}

type BatchResponse struct {
    Tracks  map[string]*Track  `json:"tracks,omitempty"`
    Artists map[string]*Artist `json:"artists,omitempty"`
    Albums  map[string]*Album  `json:"albums,omitempty"`
    ISRCs   map[string]*Track  `json:"isrcs,omitempty"`
}
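
Decoding and validating a batch request might look like the sketch below. It assumes the 400-item cap applies to the total across all four lists, as the batch request flow states. `decodeBatch`, `validateBatch`, and the rejection of empty batches are assumptions. A handler reading from `r.Body` would use `json.NewDecoder` instead of `json.Unmarshal`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

const maxBatchItems = 400

// BatchRequest is copied from the models above.
type BatchRequest struct {
	Tracks  []string `json:"tracks,omitempty"`
	Artists []string `json:"artists,omitempty"`
	Albums  []string `json:"albums,omitempty"`
	ISRCs   []string `json:"isrcs,omitempty"`
}

// validateBatch enforces the item cap across all four lists combined.
func validateBatch(req *BatchRequest) error {
	total := len(req.Tracks) + len(req.Artists) + len(req.Albums) + len(req.ISRCs)
	if total == 0 {
		return errors.New("batch is empty") // assumption: empty batches are rejected
	}
	if total > maxBatchItems {
		return fmt.Errorf("batch has %d items, maximum is %d", total, maxBatchItems)
	}
	return nil
}

// decodeBatch parses the JSON body and validates it.
func decodeBatch(body []byte) (*BatchRequest, error) {
	var req BatchRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}
	if err := validateBatch(&req); err != nil {
		return nil, err
	}
	return &req, nil
}

func main() {
	req, err := decodeBatch([]byte(`{"isrcs":["USRC12345678"],"tracks":["id1","id2"]}`))
	fmt.Println(req, err)
}
```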

Request Flow

Example: GET /lookup/track/{id}

1. Client Request
   GET /lookup/track/abc123
   ↓
2. Rate Limiter Middleware
   - Extract IP from X-Forwarded-For
   - Check token bucket for IP
   - If allowed, continue; else return 429
   ↓
3. HTTP Handler (api/handlers.go)
   - Extract "abc123" from path
   - Call db.GetTrack("abc123")
   ↓
4. Database Layer (db/db.go)
   - Query track + album (single JOIN)
   - Enrich album (images, artists)
   - Enrich track (artists, track_files)
   - Enrich artists (genres, images)
   ↓
5. Models Layer (models/models.go)
   - Populate Track struct
   - Nest Album, Artists
   ↓
6. HTTP Handler
   - Serialize Track to JSON
   - Set Content-Type: application/json
   - Write response
   ↓
7. Client Response
   200 OK
   {
     "id": "abc123",
     "name": "Song Title",
     "album": {...},
     "artists": [...]
   }

Example: POST /batch/lookup

1. Client Request
   POST /batch/lookup
   {
     "isrcs": ["USRC12345678", "GBUM71234567", ...],  // Up to 400
     "tracks": ["id1", "id2", ...]
   }
   ↓
2. Rate Limiter Middleware
   - Single request counts as 1 token (not 400)
   ↓
3. HTTP Handler
   - Parse BatchRequest
   - Validate: max 400 items total
   - Call db.BatchGetByISRC(isrcs)
   - Call db.BatchGetTracks(trackIDs)
   ↓
4. Database Layer
   - Build IN clause for ISRCs
   - Execute batch query (1 query for all ISRCs)
   - Collect all track/album/artist IDs
   - Batch enrich all entities (6 batch queries)
   ↓
5. HTTP Handler
   - Build BatchResponse with maps
   - Serialize to JSON
   ↓
6. Client Response
   200 OK
   {
     "isrcs": {
       "USRC12345678": {...},
       "GBUM71234567": {...}
     },
     "tracks": {
       "id1": {...},
       "id2": {...}
     }
   }

Graceful Shutdown

Signal handling:

// Listen for SIGINT (Ctrl+C) and SIGTERM (Docker stop)
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)

// Block until signal received
<-sigChan

// Shutdown with 10-second timeout
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

server.Shutdown(ctx)  // Stop accepting new requests, finish in-flight

Shutdown sequence:

  1. Receive SIGINT or SIGTERM
  2. Stop accepting new connections
  3. Wait for in-flight requests (max 10 seconds)
  4. Close database connections
  5. Exit process
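
The sequence above can be sketched end to end, including the step that closes the databases after in-flight requests have drained. `shutdownOnSignal` is a hypothetical helper, and `closeRecorder` stands in for the two `*sql.DB` handles so the example runs without a database. The demo simulates Ctrl+C by placing a signal on the channel itself; in `main.go` the channel would come from `signal.Notify`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"time"
)

// shutdownOnSignal blocks until a signal arrives, drains in-flight requests
// (bounded by timeout), then closes the given resources in order.
func shutdownOnSignal(sig <-chan os.Signal, srv *http.Server, timeout time.Duration, closers ...io.Closer) error {
	<-sig
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	err := srv.Shutdown(ctx)
	for _, c := range closers {
		err = errors.Join(err, c.Close())
	}
	return err
}

// closeRecorder only records that Close was called.
type closeRecorder struct{ closed bool }

func (c *closeRecorder) Close() error { c.closed = true; return nil }

// demoShutdown starts a server on a free local port, simulates SIGINT,
// and reports whether the resource was closed.
func demoShutdown() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	srv := &http.Server{Handler: http.NewServeMux()}
	serveErr := make(chan error, 1)
	go func() { serveErr <- srv.Serve(ln) }()

	sig := make(chan os.Signal, 1)
	sig <- os.Interrupt // simulated Ctrl+C
	db := &closeRecorder{}
	if err := shutdownOnSignal(sig, srv, 10*time.Second, db); err != nil {
		return db.closed, err
	}
	// Serve returns http.ErrServerClosed after a clean Shutdown.
	if err := <-serveErr; !errors.Is(err, http.ErrServerClosed) {
		return db.closed, err
	}
	return db.closed, nil
}

func main() {
	closed, err := demoShutdown()
	fmt.Println("resources closed:", closed, "error:", err)
}
```

Closing the databases only after `Shutdown` returns means no in-flight handler can have its connection pulled out from under it.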

No Framework Philosophy

Music Metadata API uses zero web frameworks. Everything is Go stdlib:

Routing: Go 1.22+ enhanced http.ServeMux

  • Method-specific routes: GET /path, POST /path
  • Path parameters: /lookup/track/{id}
  • No regex; only single-segment {name} wildcards (simple patterns only)

JSON: encoding/json stdlib

  • json.NewEncoder(w).Encode(data) for responses
  • json.NewDecoder(r.Body).Decode(&req) for requests

HTTP Server: net/http stdlib

  • http.Server with custom Addr and Handler
  • No middleware framework (custom rate limiter)

Database: database/sql stdlib

  • modernc.org/sqlite driver (pure Go, no CGO)
  • Raw SQL queries (no ORM)

Logging: log/slog stdlib

  • Structured logging for errors
  • No level filtering (everything is logged at Error level)

Benefits:

  • Minimal dependencies (2 external packages)
  • No framework lock-in
  • Easy to understand (no magic)
  • Fast compilation
  • Small binary size

Tradeoffs:

  • More boilerplate (manual error handling)
  • No built-in middleware chain
  • Manual query building (no ORM)
  • No automatic validation

Performance Characteristics

Strengths:

  • Read-only databases (no write locks)
  • Connection pooling (8 connections)
  • Memory-mapped I/O (1GB mmap)
  • Batch optimization (343x fewer queries)
  • Conservative cache (64MB)

Bottlenecks:

  • Search queries (LIKE %query% on 256M rows)
  • Rate limiter memory leak (unbounded map)
  • No query result caching
  • No CDN for image URLs

Scalability:

  • Horizontal: Run multiple instances (read-only safe)
  • Vertical: Limited by disk I/O; SQLite's single-writer limit does not apply because the databases are read-only
  • Database size: 216GB requires SSD for acceptable performance