Music Metadata API - Architecture
Architectural Overview
Music Metadata API follows a clean 3-layer architecture with clear separation of concerns:
┌─────────────────────────────────────────────────────────────┐
│ HTTP Clients │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ API Layer (internal/api) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Handlers │ │ Rate Limiter │ │ OpenAPI │ │
│ │ (routing) │ │ (middleware) │ │ (docs) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Database Layer (internal/db) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Queries │ │ Enrichment │ │ Batch │ │
│ │ (SQL) │ │ (joins) │ │ Optimization │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Models Layer (internal/models) │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Track │ │ Album │ │ Artist │ │
│ │ (struct) │ │ (struct) │ │ (struct) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ SQLite Databases (read-only) │
│ ┌──────────────────────────┐ ┌──────────────────────────┐ │
│ │ main_database.sqlite3 │ │ track_files.sqlite3 │ │
│ │ (~117GB) │ │ (~99GB) │ │
│ └──────────────────────────┘ └──────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Directory Structure
music-metadata-api/
├── cmd/
│ └── server/
│ └── main.go # Entry point (62 lines)
│
├── internal/
│ ├── api/
│ │ ├── handlers.go # HTTP route handlers
│ │ ├── ratelimit.go # Token bucket rate limiter
│ │ └── openapi.go # OpenAPI spec + Swagger UI
│ │
│ ├── db/
│ │ └── db.go # Database layer (907 lines)
│ │
│ └── models/
│ └── models.go # Data structures (65 lines)
│
├── Dockerfile # Multi-stage build
├── docker-compose.yml # Production deployment
├── go.mod # Dependencies
├── go.sum # Dependency checksums
├── .gitignore # Excludes databases, binaries
└── .github/
└── workflows/
└── docker-publish.yml # CI/CD pipeline
Layer Breakdown
Entry Point: cmd/server/main.go
Responsibilities:
- Parse CLI flags (-db, -addr)
- Initialize database connections
- Set up HTTP router
- Configure graceful shutdown
- Start HTTP server
Key code flow:
// 1. Parse flags
dbPath := flag.String("db", "", "path to database")
addr := flag.String("addr", ":8080", "server address")
// 2. Initialize database
database, err := db.NewDatabase(*dbPath)
// 3. Set up router with rate limiting
mux := http.NewServeMux()
rateLimiter := api.NewRateLimiter(100, 200) // 100 req/s, 200 burst
handler := rateLimiter.Limit(mux)
// 4. Register routes
api.RegisterRoutes(mux, database)
// 5. Graceful shutdown on SIGINT/SIGTERM
server := &http.Server{Addr: *addr, Handler: handler}
// ... shutdown logic with 10s timeout
File size: 62 lines (minimal, focused)
API Layer: internal/api/
handlers.go
Responsibilities:
- Route registration
- Request parsing
- Response serialization
- Error handling
- Query parameter validation
Route patterns (Go 1.22+ enhanced routing):
// Method + path patterns
mux.HandleFunc("POST /batch/lookup", handleBatchLookup)
mux.HandleFunc("GET /lookup/isrc/{isrc}", handleISRCLookup)
mux.HandleFunc("GET /lookup/track/{id}", handleTrackLookup)
mux.HandleFunc("GET /lookup/artist/{id}", handleArtistLookup)
mux.HandleFunc("GET /lookup/album/{id}", handleAlbumLookup)
mux.HandleFunc("GET /lookup/album/{id}/tracks", handleAlbumTracks)
mux.HandleFunc("GET /search/track", handleTrackSearch)
mux.HandleFunc("GET /search/artist", handleArtistSearch)
mux.HandleFunc("GET /health", handleHealth)
mux.HandleFunc("GET /docs", handleDocs)
mux.HandleFunc("GET /openapi.yaml", handleOpenAPI)
Handler pattern:
func handleTrackLookup(w http.ResponseWriter, r *http.Request) {
// 1. Extract path parameter
id := r.PathValue("id")
// 2. Call database layer
track, err := db.GetTrack(id)
if err != nil {
http.Error(w, "Track not found", http.StatusNotFound)
return
}
// 3. Serialize response
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(track)
}
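The handler pattern above answers every error with 404. One possible refinement, sketched here under assumptions, is to map a "not found" error to 404 and anything else to 500. The sentinel errNotFound, the stand-in lookup function and statusFor are hypothetical names; the real database layer may signal a missing row differently (for example with sql.ErrNoRows).

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// errNotFound is a hypothetical sentinel error for a missing row.
var errNotFound = errors.New("not found")

// lookup is a stand-in for the database call, used only to produce errors.
func lookup(id string) error {
	switch id {
	case "missing":
		return fmt.Errorf("track %q: %w", id, errNotFound)
	case "boom":
		return errors.New("disk I/O error")
	}
	return nil
}

// statusFor maps a lookup error to an HTTP status code.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, errNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	for _, id := range []string{"abc123", "missing", "boom"} {
		fmt.Println(id, statusFor(lookup(id)))
	}
}
```

Using errors.Is keeps the mapping working even when the database layer wraps the sentinel with extra context.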
Validation rules:
- Search queries: minimum 2 characters
- Batch requests: maximum 400 items
- Limit parameters: maximum 50 results
- Timeouts: 10 seconds for search queries
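The search and limit rules could be checked with helpers along these lines. This is a minimal sketch, not the project's code: the function names, the default limit of 20, the whitespace trimming, and the choice to clamp an oversized limit rather than reject it are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

const (
	minQueryLen  = 2  // from the rules above
	maxLimit     = 50 // from the rules above
	defaultLimit = 20 // assumption: used when no limit is supplied
)

// validateQuery trims the search term and enforces the minimum length,
// counting characters (runes) rather than bytes.
func validateQuery(q string) (string, error) {
	q = strings.TrimSpace(q)
	if utf8.RuneCountInString(q) < minQueryLen {
		return "", fmt.Errorf("query must be at least %d characters", minQueryLen)
	}
	return q, nil
}

// parseLimit parses the raw "limit" parameter and clamps it to maxLimit.
func parseLimit(raw string) (int, error) {
	if raw == "" {
		return defaultLimit, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return 0, errors.New("limit must be a positive integer")
	}
	if n > maxLimit {
		n = maxLimit // clamp; rejecting the request is an equally valid choice
	}
	return n, nil
}

func main() {
	_, err := validateQuery("a")
	fmt.Println(err)
	n, _ := parseLimit("500")
	fmt.Println(n) // 50
}
```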
ratelimit.go
Implementation: Token bucket algorithm with per-IP tracking
Data structure:
type RateLimiter struct {
visitors map[string]*rate.Limiter // IP -> limiter
mu sync.RWMutex // Protects visitors map
rate rate.Limit // Tokens per second
burst int // Burst capacity
}
Algorithm:
- Extract client IP from X-Forwarded-For header (fallback to RemoteAddr)
- Look up or create limiter for IP
- Check if token available (limiter.Allow())
- If allowed, pass to next handler
- If denied, return HTTP 429 with Retry-After header
BUG: Visitor map grows unbounded. No cleanup mechanism for inactive IPs. Long-running servers will accumulate memory.
Configuration:
- Rate: 100 requests/second
- Burst: 200 requests
- Scope: Per-IP (not per-user, no authentication)
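One common way to address the unbounded visitor map is to record when each IP was last seen and evict idle entries periodically. The sketch below shows only that bookkeeping. The types visitor and visitorStore are hypothetical, and the per-IP *rate.Limiter is deliberately left out so the example needs nothing beyond the standard library.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// visitor records when an IP was last seen. In the real limiter this entry
// would also hold the per-IP *rate.Limiter (omitted in this sketch).
type visitor struct {
	lastSeen time.Time
}

type visitorStore struct {
	mu       sync.Mutex
	visitors map[string]*visitor
}

func newVisitorStore() *visitorStore {
	return &visitorStore{visitors: make(map[string]*visitor)}
}

// touch creates the entry for ip if needed and refreshes its timestamp.
func (s *visitorStore) touch(ip string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.visitors[ip]
	if !ok {
		v = &visitor{}
		s.visitors[ip] = v
	}
	v.lastSeen = time.Now()
}

// cleanup deletes entries idle for at least maxIdle and reports how many
// were removed.
func (s *visitorStore) cleanup(maxIdle time.Duration) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for ip, v := range s.visitors {
		if time.Since(v.lastSeen) >= maxIdle {
			delete(s.visitors, ip)
			removed++
		}
	}
	return removed
}

// size returns the number of tracked IPs.
func (s *visitorStore) size() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.visitors)
}

func main() {
	s := newVisitorStore()
	s.touch("203.0.113.7")
	s.touch("198.51.100.9")
	fmt.Println(s.cleanup(time.Hour), s.size()) // 0 2: nothing is stale yet
	fmt.Println(s.cleanup(0), s.size())         // 2 0: a zero threshold evicts everything
}
```

In a server, cleanup would typically run from a goroutine driven by a time.Ticker, with maxIdle set to a few minutes.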
openapi.go
Responsibilities:
- Serve OpenAPI 3.1 specification at /openapi.yaml
- Serve Swagger UI at /docs
- Embed OpenAPI spec in binary (no external files)
Swagger UI loading:
<!-- Loaded from unpkg.com CDN (browser-side) -->
<script src="https://unpkg.com/swagger-ui-dist@5/swagger-ui-bundle.js"></script>
<link rel="stylesheet" href="https://unpkg.com/swagger-ui-dist@5/swagger-ui.css" />
OpenAPI spec highlights:
- Version: 3.1.0
- All endpoints documented
- Request/response schemas
- Example payloads
- Error responses
Database Layer: internal/db/db.go
File size: 907 lines (largest file in codebase)
Responsibilities:
- SQLite connection management
- Query execution
- Data enrichment (joining related entities)
- Batch optimization
- Transaction handling (read-only)
Connection Management
Dual database connections:
type Database struct {
mainDB *sql.DB // main_database.sqlite3
trackFilesDB *sql.DB // track_files.sqlite3
}
Connection string PRAGMAs:
file:/path/to/db.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
PRAGMA breakdown:
| PRAGMA | Value | Purpose |
|---|---|---|
| mode=ro | Read-only | Prevents accidental writes |
| _journal_mode=off | Disabled | No rollback journal or write-ahead log (safe because the databases are read-only) |
| _cache_size=-64000 | ~64MB | Page cache size (negative = KiB) |
| _mmap_size=1073741824 | 1GB | Memory-mapped I/O size |
| _query_only=true | Enabled | Additional read-only enforcement |
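For illustration, the connection string can be assembled from the parameters in the table. The helper buildDSN and the pragma type are hypothetical; the parameter names are copied from the connection string shown above, and whether a given SQLite driver honors these exact names is an assumption worth verifying against that driver's documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// pragma is one key=value pair of the connection string.
type pragma struct{ key, value string }

// buildDSN assembles a read-only SQLite connection string for path,
// keeping the parameters in a fixed order.
func buildDSN(path string) string {
	params := []pragma{
		{"mode", "ro"},
		{"_journal_mode", "off"},
		{"_cache_size", "-64000"},           // negative value = size in KiB
		{"_mmap_size", fmt.Sprint(1 << 30)}, // 1 GiB = 1073741824 bytes
		{"_query_only", "true"},
	}
	parts := make([]string, len(params))
	for i, p := range params {
		parts[i] = p.key + "=" + p.value
	}
	return "file:" + path + "?" + strings.Join(parts, "&")
}

func main() {
	fmt.Println(buildDSN("/path/to/db.sqlite3"))
}
```

A slice is used instead of a map so the parameter order is deterministic, which makes the resulting string easy to compare in logs and tests.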
Connection pool:
db.SetMaxOpenConns(8) // Conservative limit
db.SetMaxIdleConns(8) // Keep connections warm
db.SetConnMaxLifetime(0) // No expiration
Query Patterns
Individual lookups:
func (d *Database) GetTrack(id string) (*models.Track, error) {
// 1. Fetch base track + album
row := d.mainDB.QueryRow(`
SELECT t.id, t.name, t.isrc, t.duration_ms, t.explicit,
t.track_number, t.disc_number, t.popularity, t.preview_url,
a.id, a.name, a.album_type, a.label, a.release_date,
a.release_date_precision, a.external_id_upc, a.total_tracks
FROM tracks t
JOIN albums a ON t.album_rowid = a.rowid
WHERE t.id = ?
`, id)
var track models.Track
// Scan the row into track and track.Album (row.Scan call elided)
// 2. Enrich album (images, artists)
d.enrichAlbum(&track.Album)
// 3. Enrich track (artists, track_files)
d.enrichTrack(&track)
return &track, nil
}
Batch lookups:
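func (d *Database) BatchGetByISRC(isrcs []string) (map[string]*models.Track, error) {
    tracks := make(map[string]*models.Track, len(isrcs))
    if len(isrcs) == 0 {
        return tracks, nil // strings.Repeat panics on a negative count
    }
    // 1. Build IN clause; []string cannot be passed as ...any, so convert
    placeholders := strings.Repeat("?,", len(isrcs)-1) + "?"
    args := make([]any, len(isrcs))
    for i, isrc := range isrcs {
        args[i] = isrc
    }
    query := fmt.Sprintf(`
        SELECT t.id, t.isrc, ...
        FROM tracks t
        JOIN albums a ON t.album_rowid = a.rowid
        WHERE t.isrc IN (%s)
    `, placeholders)
    // 2. Execute batch query
    rows, err := d.mainDB.Query(query, args...)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    // 3. Scan rows into tracks (elided) and collect IDs for enrichment
    trackIDs := make([]string, 0, len(isrcs))
    albumIDs := make([]string, 0, len(isrcs))
    // 4. Batch enrich all entities
    d.batchEnrichAlbums(albumIDs, tracks)
    d.batchEnrichTracks(trackIDs, tracks)
    return tracks, nil
}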
Data Enrichment Flow
Track enrichment pipeline:
1. Fetch base track + album (single JOIN)
↓
2. Enrich album:
- Batch fetch album images (batchGetAlbumImages)
- Batch fetch album artists (batchGetAlbumArtists)
↓
3. Enrich track:
- Batch fetch track artists (batchGetTrackArtists)
- Batch fetch track files (batchEnrichTrackFiles)
↓
4. Enrich artists:
- Batch fetch artist genres (batchGetArtistGenres)
- Batch fetch artist images (batchGetArtistImages)
↓
5. Return fully enriched track
Batch optimization functions:
| Function | Purpose | Query Pattern |
|---|---|---|
| batchGetAlbumImages | Fetch all images for albums | WHERE album_id IN (...) |
| batchGetAlbumArtists | Fetch all artists for albums | WHERE album_id IN (...) |
| batchGetTrackArtists | Fetch all artists for tracks | WHERE track_id IN (...) |
| batchGetArtistGenres | Fetch all genres for artists | WHERE artist_id IN (...) |
| batchGetArtistImages | Fetch all images for artists | WHERE artist_id IN (...) |
| batchEnrichTrackFiles | Fetch extended track data | WHERE track_id IN (...) |
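Each batch function returns rows for many parents from a single IN query, so the rows have to be fanned back out to the right parent afterwards. A minimal sketch of that grouping step for album images follows. The imageRow type and groupImagesByAlbum are hypothetical names, and the actual functions may attach results differently.

```go
package main

import "fmt"

// Image mirrors the model struct (JSON tags omitted here).
type Image struct {
	URL    string
	Width  int
	Height int
}

// imageRow is a hypothetical shape for one row returned by a
// "WHERE album_id IN (...)" query.
type imageRow struct {
	AlbumID string
	Image   Image
}

// groupImagesByAlbum fans a flat result set out into per-album slices,
// preserving row order within each album.
func groupImagesByAlbum(rows []imageRow) map[string][]Image {
	out := make(map[string][]Image)
	for _, r := range rows {
		out[r.AlbumID] = append(out[r.AlbumID], r.Image)
	}
	return out
}

func main() {
	rows := []imageRow{
		{AlbumID: "album1", Image: Image{URL: "https://example.com/a.jpg", Width: 640, Height: 640}},
		{AlbumID: "album2", Image: Image{URL: "https://example.com/b.jpg", Width: 640, Height: 640}},
		{AlbumID: "album1", Image: Image{URL: "https://example.com/c.jpg", Width: 300, Height: 300}},
	}
	byAlbum := groupImagesByAlbum(rows)
	fmt.Println(len(byAlbum["album1"]), len(byAlbum["album2"])) // 2 1
}
```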
Why batch optimization matters:
- Single batch request with 400 tracks triggers ~6 batch queries
- Without batching: 400 tracks × 6 queries = 2,400 database queries
- With batching: 1 main query + 6 batch queries = 7 database queries
- Performance gain: roughly 343x fewer queries (2,400 / 7 ≈ 343)
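The arithmetic behind these figures can be reproduced in a few lines. The function names are made up for this example, and the counts follow the document's own model (six enrichment queries per track, one main query in the batched case).

```go
package main

import "fmt"

// queriesWithoutBatching counts one enrichment query per track per kind.
func queriesWithoutBatching(tracks, enrichments int) int {
	return tracks * enrichments
}

// queriesWithBatching counts one main query plus one query per enrichment kind.
func queriesWithBatching(enrichments int) int {
	return 1 + enrichments
}

func main() {
	naive := queriesWithoutBatching(400, 6)
	batched := queriesWithBatching(6)
	fmt.Printf("%d vs %d (%.0fx fewer)\n", naive, batched, float64(naive)/float64(batched))
	// prints "2400 vs 7 (343x fewer)"
}
```

Note that the batched count does not depend on the number of tracks, so the ratio grows linearly with batch size.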
Search Implementation
Track search:
SELECT id, name, isrc, duration_ms, popularity, album_rowid
FROM tracks
WHERE name LIKE ? COLLATE NOCASE
ORDER BY popularity DESC
LIMIT ?
Artist search:
SELECT id, name, followers_total, popularity
FROM artists
WHERE name LIKE ? COLLATE NOCASE
ORDER BY followers_total DESC
LIMIT ?
Search characteristics:
- Pattern: %query% (substring match)
- Collation: NOCASE (case-insensitive)
- Timeout: 10 seconds (context deadline)
- Min query length: 2 characters
- Max results: 50
Performance concern: a LIKE pattern with a leading wildcard (%query%) cannot use a B-tree index, so every search is a full table scan over roughly 256M tracks and will be slow. FTS (Full-Text Search) would be faster but is not implemented.
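When building the %query% pattern from user input, any % or _ in the input acts as a wildcard unless it is escaped. The sketch below shows one way to build the bound parameter. The function likePattern is hypothetical, and it assumes the SQL is written as LIKE ? ESCAPE '\', a clause that the queries shown above do not include.

```go
package main

import (
	"fmt"
	"strings"
)

// likePattern wraps the user's query in % wildcards for a substring match,
// escaping %, _ and the escape character itself so they match literally.
// Assumption: the SQL statement uses "LIKE ? ESCAPE '\'".
func likePattern(q string) string {
	r := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
	return "%" + r.Replace(q) + "%"
}

func main() {
	fmt.Println(likePattern("love"))  // %love%
	fmt.Println(likePattern("100%"))  // %100\%%
	fmt.Println(likePattern("a_b"))   // %a\_b%
}
```

Escaping does not change the performance picture: the pattern still starts with a wildcard, so the scan remains a full table scan.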
Models Layer: internal/models/models.go
File size: 65 lines (smallest layer)
Responsibilities:
- Define data structures
- JSON serialization tags
- Nested relationships
Core models:
type Track struct {
ID string `json:"id"`
Name string `json:"name"`
ISRC string `json:"isrc,omitempty"`
DurationMs int `json:"duration_ms"`
Explicit bool `json:"explicit"`
TrackNumber int `json:"track_number"`
DiscNumber int `json:"disc_number"`
Popularity int `json:"popularity"`
PreviewURL string `json:"preview_url,omitempty"`
Album Album `json:"album"`
Artists []Artist `json:"artists"`
// Extended fields from track_files DB
OriginalTitle string `json:"original_title,omitempty"`
VersionTitle string `json:"version_title,omitempty"`
HasLyrics bool `json:"has_lyrics"`
Languages []string `json:"languages,omitempty"`
ArtistRoles map[string][]string `json:"artist_roles,omitempty"`
}
type Album struct {
ID string `json:"id"`
Name string `json:"name"`
AlbumType string `json:"album_type"`
Label string `json:"label,omitempty"`
ReleaseDate string `json:"release_date"`
ReleaseDatePrecision string `json:"release_date_precision"`
ExternalIDUPC string `json:"external_id_upc,omitempty"`
TotalTracks int `json:"total_tracks"`
CopyrightC string `json:"copyright_c,omitempty"`
CopyrightP string `json:"copyright_p,omitempty"`
Images []Image `json:"images,omitempty"`
Artists []Artist `json:"artists,omitempty"`
}
type Artist struct {
ID string `json:"id"`
Name string `json:"name"`
FollowersTotal int `json:"followers_total,omitempty"`
Popularity int `json:"popularity,omitempty"`
Genres []string `json:"genres,omitempty"`
Images []Image `json:"images,omitempty"`
}
type Image struct {
URL string `json:"url"`
Width int `json:"width"`
Height int `json:"height"`
}
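To show what the omitempty tags do in practice, this sketch marshals an Artist with and without its optional fields. The struct definitions are copied from above; the encode helper and the sample values are made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Image struct {
	URL    string `json:"url"`
	Width  int    `json:"width"`
	Height int    `json:"height"`
}

type Artist struct {
	ID             string   `json:"id"`
	Name           string   `json:"name"`
	FollowersTotal int      `json:"followers_total,omitempty"`
	Popularity     int      `json:"popularity,omitempty"`
	Genres         []string `json:"genres,omitempty"`
	Images         []Image  `json:"images,omitempty"`
}

// encode marshals an artist to a JSON string; it returns "" on error,
// which cannot happen for these field types.
func encode(a Artist) string {
	b, err := json.Marshal(a)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Zero values and nil slices tagged omitempty are dropped entirely.
	fmt.Println(encode(Artist{ID: "a1", Name: "Example"}))
	// {"id":"a1","name":"Example"}

	fmt.Println(encode(Artist{ID: "a1", Name: "Example", Popularity: 70, Genres: []string{"jazz"}}))
	// {"id":"a1","name":"Example","popularity":70,"genres":["jazz"]}
}
```

One consequence of this tagging is that a genuine value of 0 for followers_total or popularity is indistinguishable from a missing value in the response.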
Batch request/response models:
type BatchRequest struct {
Tracks []string `json:"tracks,omitempty"` // Track IDs
Artists []string `json:"artists,omitempty"` // Artist IDs
Albums []string `json:"albums,omitempty"` // Album IDs
ISRCs []string `json:"isrcs,omitempty"` // ISRC codes
}
type BatchResponse struct {
Tracks map[string]*Track `json:"tracks,omitempty"`
Artists map[string]*Artist `json:"artists,omitempty"`
Albums map[string]*Album `json:"albums,omitempty"`
ISRCs map[string]*Track `json:"isrcs,omitempty"`
}
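A batch request can be decoded and checked against the 400-item limit roughly as follows. This is a sketch under assumptions: decodeBatch and total are hypothetical helpers, and the limit is applied to the sum of all four lists, matching the "max 400 items total" rule in the request flow.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const maxBatchItems = 400

type BatchRequest struct {
	Tracks  []string `json:"tracks,omitempty"`
	Artists []string `json:"artists,omitempty"`
	Albums  []string `json:"albums,omitempty"`
	ISRCs   []string `json:"isrcs,omitempty"`
}

// total counts the items across all four lists.
func (r BatchRequest) total() int {
	return len(r.Tracks) + len(r.Artists) + len(r.Albums) + len(r.ISRCs)
}

// decodeBatch parses a JSON body and enforces the item limit.
// A handler would pass r.Body to the decoder instead of a string.
func decodeBatch(body string) (BatchRequest, error) {
	var req BatchRequest
	if err := json.NewDecoder(strings.NewReader(body)).Decode(&req); err != nil {
		return req, err
	}
	if n := req.total(); n > maxBatchItems {
		return req, fmt.Errorf("batch has %d items, maximum is %d", n, maxBatchItems)
	}
	return req, nil
}

func main() {
	req, err := decodeBatch(`{"isrcs":["USRC12345678"],"tracks":["id1","id2"]}`)
	fmt.Println(req.total(), err) // 3 <nil>
}
```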
Request Flow
Example: GET /lookup/track/{id}
1. Client Request
GET /lookup/track/abc123
↓
2. Rate Limiter Middleware
- Extract IP from X-Forwarded-For
- Check token bucket for IP
- If allowed, continue; else return 429
↓
3. HTTP Handler (api/handlers.go)
- Extract "abc123" from path
- Call db.GetTrack("abc123")
↓
4. Database Layer (db/db.go)
- Query track + album (single JOIN)
- Enrich album (images, artists)
- Enrich track (artists, track_files)
- Enrich artists (genres, images)
↓
5. Models Layer (models/models.go)
- Populate Track struct
- Nest Album, Artists
↓
6. HTTP Handler
- Serialize Track to JSON
- Set Content-Type: application/json
- Write response
↓
7. Client Response
200 OK
{
"id": "abc123",
"name": "Song Title",
"album": {...},
"artists": [...]
}
Example: POST /batch/lookup
1. Client Request
POST /batch/lookup
{
"isrcs": ["USRC12345678", "GBUM71234567", ...], // Up to 400
"tracks": ["id1", "id2", ...]
}
↓
2. Rate Limiter Middleware
- Single request counts as 1 token (not 400)
↓
3. HTTP Handler
- Parse BatchRequest
- Validate: max 400 items total
- Call db.BatchGetByISRC(isrcs)
- Call db.BatchGetTracks(trackIDs)
↓
4. Database Layer
- Build IN clause for ISRCs
- Execute batch query (1 query for all ISRCs)
- Collect all track/album/artist IDs
- Batch enrich all entities (6 batch queries)
↓
5. HTTP Handler
- Build BatchResponse with maps
- Serialize to JSON
↓
6. Client Response
200 OK
{
"isrcs": {
"USRC12345678": {...},
"GBUM71234567": {...}
},
"tracks": {
"id1": {...},
"id2": {...}
}
}
Graceful Shutdown
Signal handling:
// Listen for SIGINT (Ctrl+C) and SIGTERM (Docker stop)
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
// Block until signal received
<-sigChan
// Shutdown with 10-second timeout
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
server.Shutdown(ctx) // Stop accepting new requests, finish in-flight
Shutdown sequence:
- Receive SIGINT or SIGTERM
- Stop accepting new connections
- Wait for in-flight requests (max 10 seconds)
- Close database connections
- Exit process
No Framework Philosophy
Music Metadata API uses zero web frameworks. Everything is Go stdlib:
Routing: Go 1.22+ enhanced http.ServeMux
- Method-specific routes: GET /path, POST /path
- Path parameters: /lookup/track/{id}
- No regex, no wildcards (simple patterns only)
JSON: encoding/json stdlib
- json.NewEncoder(w).Encode(data) for responses
- json.NewDecoder(r.Body).Decode(&req) for requests
HTTP Server: net/http stdlib
- http.Server with custom Addr and Handler
- No middleware framework (custom rate limiter)
Database: database/sql stdlib
- modernc.org/sqlite driver (pure Go, no CGO)
- Raw SQL queries (no ORM)
Logging: log/slog stdlib
- Structured logging for errors
- No log levels (all logs are errors)
Benefits:
- Minimal dependencies (2 external packages)
- No framework lock-in
- Easy to understand (no magic)
- Fast compilation
- Small binary size
Tradeoffs:
- More boilerplate (manual error handling)
- No built-in middleware chain
- Manual query building (no ORM)
- No automatic validation
Performance Characteristics
Strengths:
- Read-only databases (no write locks)
- Connection pooling (8 connections)
- Memory-mapped I/O (1GB mmap)
- Batch optimization (343x fewer queries)
- Conservative cache (64MB)
Bottlenecks:
- Search queries (LIKE %query% on 256M rows)
- Rate limiter memory leak (unbounded map)
- No query result caching
- No CDN for image URLs
Scalability:
- Horizontal: Run multiple instances (read-only safe)
- Vertical: Limited by disk I/O (SQLite's single-writer model is not a factor because the databases are read-only)
- Database size: 216GB requires SSD for acceptable performance