# Music Metadata API - Architecture

## Architectural Overview

Music Metadata API follows a clean 3-layer architecture with clear separation of concerns:

```
┌─────────────────────────────────────────────────────────────┐
│                        HTTP Clients                         │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                  API Layer (internal/api)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Handlers   │  │ Rate Limiter │  │   OpenAPI    │       │
│  │  (routing)   │  │ (middleware) │  │    (docs)    │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                Database Layer (internal/db)                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Queries    │  │  Enrichment  │  │    Batch     │       │
│  │    (SQL)     │  │   (joins)    │  │ Optimization │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│               Models Layer (internal/models)                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │    Track     │  │    Album     │  │    Artist    │       │
│  │   (struct)   │  │   (struct)   │  │   (struct)   │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                SQLite Databases (read-only)                 │
│  ┌──────────────────────────┐ ┌──────────────────────────┐  │
│  │  main_database.sqlite3   │ │  track_files.sqlite3     │  │
│  │  (~117GB)                │ │  (~99GB)                 │  │
│  └──────────────────────────┘ └──────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

## Directory Structure

```
music-metadata-api/
├── cmd/
│   └── server/
│       └── main.go              # Entry point (62 lines)
│
├── internal/
│   ├── api/
│   │   ├── handlers.go          # HTTP route handlers
│   │   ├── ratelimit.go         # Token bucket rate limiter
│   │   └── openapi.go           # OpenAPI spec + Swagger UI
│   │
│   ├── db/
│   │   └── db.go                # Database layer (907 lines)
│   │
│   └── models/
│       └── models.go            # Data structures (65 lines)
│
├── Dockerfile                   # Multi-stage build
├── docker-compose.yml           # Production deployment
├── go.mod                       # Dependencies
├── go.sum                       # Dependency checksums
├── .gitignore                   # Excludes databases, binaries
└── .github/
    └── workflows/
        └── docker-publish.yml   # CI/CD pipeline
```

## Layer Breakdown

### Entry Point: cmd/server/main.go

**Responsibilities:**

- Parse CLI flags (`-db`, `-addr`)
- Initialize database connections
- Set up HTTP router
- Configure graceful shutdown
- Start HTTP server

**Key code flow:**

```go
// 1. Parse flags
dbPath := flag.String("db", "", "path to database")
addr := flag.String("addr", ":8080", "server address")
flag.Parse()

// 2. Initialize database
database, err := db.NewDatabase(*dbPath)
// (error check on err omitted here)

// 3. Set up router with rate limiting
mux := http.NewServeMux()
rateLimiter := api.NewRateLimiter(100, 200) // 100 req/s, 200 burst
handler := rateLimiter.Limit(mux)

// 4. Register routes
api.RegisterRoutes(mux, database)

// 5. Graceful shutdown on SIGINT/SIGTERM
server := &http.Server{Addr: *addr, Handler: handler}
// ... shutdown logic with 10s timeout
```

**File size:** 62 lines (minimal, focused)

### API Layer: internal/api/

#### handlers.go

**Responsibilities:**

- Route registration
- Request parsing
- Response serialization
- Error handling
- Query parameter validation

**Route patterns (Go 1.22+ enhanced routing):**

```go
// Method + path patterns
mux.HandleFunc("POST /batch/lookup", handleBatchLookup)
mux.HandleFunc("GET /lookup/isrc/{isrc}", handleISRCLookup)
mux.HandleFunc("GET /lookup/track/{id}", handleTrackLookup)
mux.HandleFunc("GET /lookup/artist/{id}", handleArtistLookup)
mux.HandleFunc("GET /lookup/album/{id}", handleAlbumLookup)
mux.HandleFunc("GET /lookup/album/{id}/tracks", handleAlbumTracks)
mux.HandleFunc("GET /search/track", handleTrackSearch)
mux.HandleFunc("GET /search/artist", handleArtistSearch)
mux.HandleFunc("GET /health", handleHealth)
mux.HandleFunc("GET /docs", handleDocs)
mux.HandleFunc("GET /openapi.yaml", handleOpenAPI)
```

**Handler pattern:**

```go
func handleTrackLookup(w http.ResponseWriter, r *http.Request) {
	// 1. Extract path parameter
	id := r.PathValue("id")

	// 2. Call database layer (any error is reported as 404)
	track, err := db.GetTrack(id)
	if err != nil {
		http.Error(w, "Track not found", http.StatusNotFound)
		return
	}

	// 3. Serialize response
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(track)
}
```

**Validation rules:**

- Search queries: minimum 2 characters
- Batch requests: maximum 400 items
- Limit parameters: maximum 50 results
- Timeouts: 10 seconds for search queries

#### ratelimit.go

**Implementation:** Token bucket algorithm with per-IP tracking

**Data structure:**

```go
type RateLimiter struct {
	visitors map[string]*rate.Limiter // IP -> limiter
	mu       sync.RWMutex             // Protects visitors map
	rate     rate.Limit               // Tokens per second
	burst    int                      // Burst capacity
}
```

**Algorithm:**

1. Extract client IP from `X-Forwarded-For` header (fallback to `RemoteAddr`)
2. Look up or create limiter for IP
3. Check if token available (`limiter.Allow()`)
4. If allowed, pass to next handler
5. If denied, return HTTP 429 with `Retry-After` header

**BUG:** The visitor map grows without bound. There is no cleanup mechanism for inactive IPs, so a long-running server accumulates one limiter per IP address it has ever seen.

**Configuration:**

- Rate: 100 requests/second
- Burst: 200 requests
- Scope: Per-IP (not per-user, no authentication)

#### openapi.go

**Responsibilities:**

- Serve OpenAPI 3.1 specification at `/openapi.yaml`
- Serve Swagger UI at `/docs`
- Embed OpenAPI spec in binary (no external files)

**Swagger UI loading:**

```html
```

**OpenAPI spec highlights:**

- Version: 3.1.0
- All endpoints documented
- Request/response schemas
- Example payloads
- Error responses

### Database Layer: internal/db/db.go

**File size:** 907 lines (largest file in codebase)

**Responsibilities:**

- SQLite connection management
- Query execution
- Data enrichment (joining related entities)
- Batch optimization
- Transaction handling (read-only)

#### Connection Management

**Dual database connections:**

```go
type Database struct {
	mainDB       *sql.DB // main_database.sqlite3
	trackFilesDB *sql.DB // track_files.sqlite3
}
```

**Connection string PRAGMAs:**

```
file:/path/to/db.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
```

**PRAGMA breakdown:**

| Parameter | Value | Purpose |
|-----------|-------|---------|
| `mode=ro` | Read-only | Opens the file read-only, preventing accidental writes |
| `_journal_mode=off` | Disabled | No rollback journal (safe because the database is read-only) |
| `_cache_size=-64000` | ~64MB | Page cache size (a negative value is a size in KiB) |
| `_mmap_size=1073741824` | 1GB | Memory-mapped I/O size |
| `_query_only=true` | Enabled | Additional read-only enforcement |

**Connection pool:**

```go
db.SetMaxOpenConns(8)    // Conservative limit
db.SetMaxIdleConns(8)    // Keep connections warm
db.SetConnMaxLifetime(0) // No expiration
```

#### Query Patterns

**Individual lookups:**

```go
func (d *Database) GetTrack(id string) (*models.Track, error) {
	var track models.Track

	// 1. Fetch base track + album
	row := d.mainDB.QueryRow(`
		SELECT
			t.id, t.name, t.isrc, t.duration_ms, t.explicit,
			t.track_number, t.disc_number, t.popularity, t.preview_url,
			a.id, a.name, a.album_type, a.label, a.release_date,
			a.release_date_precision, a.external_id_upc, a.total_tracks
		FROM tracks t
		JOIN albums a ON t.album_rowid = a.rowid
		WHERE t.id = ?
	`, id)

	// Scan in SELECT order (this sketch assumes the columns are non-NULL)
	if err := row.Scan(
		&track.ID, &track.Name, &track.ISRC, &track.DurationMs, &track.Explicit,
		&track.TrackNumber, &track.DiscNumber, &track.Popularity, &track.PreviewURL,
		&track.Album.ID, &track.Album.Name, &track.Album.AlbumType, &track.Album.Label,
		&track.Album.ReleaseDate, &track.Album.ReleaseDatePrecision,
		&track.Album.ExternalIDUPC, &track.Album.TotalTracks,
	); err != nil {
		return nil, err // sql.ErrNoRows when the ID is unknown
	}

	// 2. Enrich album (images, artists)
	d.enrichAlbum(&track.Album)

	// 3. Enrich track (artists, track_files)
	d.enrichTrack(&track)

	return &track, nil
}
```

**Batch lookups:**

```go
func (d *Database) BatchGetByISRC(isrcs []string) (map[string]*models.Track, error) {
	tracks := make(map[string]*models.Track, len(isrcs))
	if len(isrcs) == 0 {
		return tracks, nil // strings.Repeat panics on a negative count
	}

	// 1. Build IN clause
	placeholders := strings.Repeat("?,", len(isrcs)-1) + "?"
	query := fmt.Sprintf(`
		SELECT t.id, t.isrc, ...
		FROM tracks t
		JOIN albums a ON t.album_rowid = a.rowid
		WHERE t.isrc IN (%s)
	`, placeholders)

	// 2. Execute batch query (Query takes ...any, so convert the []string first)
	args := make([]any, len(isrcs))
	for i, isrc := range isrcs {
		args[i] = isrc
	}
	rows, err := d.mainDB.Query(query, args...)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	// 3. Scan rows into tracks and collect IDs for enrichment (scan loop elided)
	trackIDs := make([]string, 0, len(isrcs))
	albumIDs := make([]string, 0, len(isrcs))

	// 4. Batch enrich all entities
	d.batchEnrichAlbums(albumIDs, tracks)
	d.batchEnrichTracks(trackIDs, tracks)

	return tracks, nil
}
```

#### Data Enrichment Flow

**Track enrichment pipeline:**

```
1. Fetch base track + album (single JOIN)
   ↓
2. Enrich album:
   - Batch fetch album images (batchGetAlbumImages)
   - Batch fetch album artists (batchGetAlbumArtists)
   ↓
3. Enrich track:
   - Batch fetch track artists (batchGetTrackArtists)
   - Batch fetch track files (batchEnrichTrackFiles)
   ↓
4. Enrich artists:
   - Batch fetch artist genres (batchGetArtistGenres)
   - Batch fetch artist images (batchGetArtistImages)
   ↓
5. Return fully enriched track
```

**Batch optimization functions:**

| Function | Purpose | Query Pattern |
|----------|---------|---------------|
| `batchGetAlbumImages` | Fetch all images for albums | `WHERE album_id IN (...)` |
| `batchGetAlbumArtists` | Fetch all artists for albums | `WHERE album_id IN (...)` |
| `batchGetTrackArtists` | Fetch all artists for tracks | `WHERE track_id IN (...)` |
| `batchGetArtistGenres` | Fetch all genres for artists | `WHERE artist_id IN (...)` |
| `batchGetArtistImages` | Fetch all images for artists | `WHERE artist_id IN (...)` |
| `batchEnrichTrackFiles` | Fetch extended track data | `WHERE track_id IN (...)` |

**Why batch optimization matters:**

- A single batch request with 400 tracks triggers ~6 batch queries
- Without batching: 400 tracks × 6 queries = 2,400 database queries
- With batching: 1 main query + 6 batch queries = 7 database queries
- **Performance gain: 2,400 / 7 ≈ 343x fewer queries**

#### Search Implementation

**Track search:**

```sql
SELECT id, name, isrc, duration_ms, popularity, album_rowid
FROM tracks
WHERE name LIKE ? COLLATE NOCASE
ORDER BY popularity DESC
LIMIT ?
```

**Artist search:**

```sql
SELECT id, name, followers_total, popularity
FROM artists
WHERE name LIKE ? COLLATE NOCASE
ORDER BY followers_total DESC
LIMIT ?
```

**Search characteristics:**

- Pattern: `%query%` (substring match)
- Collation: `NOCASE` (case-insensitive)
- Timeout: 10 seconds (context deadline)
- Min query length: 2 characters
- Max results: 50

**Performance concern:** A pattern with a leading wildcard (`LIKE '%query%'`) cannot use an ordinary index, so every search is a full table scan. On 256M tracks this will be slow. FTS (Full-Text Search) would be faster but is not implemented.
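The search rules above (minimum length, `%query%` pattern, result cap) can be sketched as two small helpers. This is a minimal sketch under stated assumptions, not the project's code: `likePattern` and `clampLimit` are hypothetical names, escaping user-supplied `%` and `_` is an addition the source does not describe, and it assumes the SQL is extended with an `ESCAPE '\'` clause.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

const (
	minQueryLen    = 2  // minimum search query length from the rules above
	maxSearchLimit = 50 // maximum number of results from the rules above
)

// likeEscaper escapes LIKE metacharacters so user input matches literally.
// Assumption: the query uses `WHERE name LIKE ? ESCAPE '\'`.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// likePattern (hypothetical helper) validates q and wraps it as %q%.
func likePattern(q string) (string, error) {
	if utf8.RuneCountInString(q) < minQueryLen {
		return "", errors.New("query must be at least 2 characters")
	}
	return "%" + likeEscaper.Replace(q) + "%", nil
}

// clampLimit (hypothetical helper) caps the requested limit. Treating a
// missing or out-of-range value as the maximum is an assumption.
func clampLimit(n int) int {
	if n < 1 || n > maxSearchLimit {
		return maxSearchLimit
	}
	return n
}

func main() {
	pattern, err := likePattern("50%_off")
	fmt.Println(pattern, err)
	fmt.Println(clampLimit(10), clampLimit(500))
}
```

Without the escaping step, a query such as `%%` passes the 2-character minimum and matches every row, which makes the full-table-scan cost above easy to trigger.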
### Models Layer: internal/models/models.go

**File size:** 65 lines (smallest layer)

**Responsibilities:**

- Define data structures
- JSON serialization tags
- Nested relationships

**Core models:**

```go
type Track struct {
	ID          string   `json:"id"`
	Name        string   `json:"name"`
	ISRC        string   `json:"isrc,omitempty"`
	DurationMs  int      `json:"duration_ms"`
	Explicit    bool     `json:"explicit"`
	TrackNumber int      `json:"track_number"`
	DiscNumber  int      `json:"disc_number"`
	Popularity  int      `json:"popularity"`
	PreviewURL  string   `json:"preview_url,omitempty"`
	Album       Album    `json:"album"`
	Artists     []Artist `json:"artists"`

	// Extended fields from track_files DB
	OriginalTitle string              `json:"original_title,omitempty"`
	VersionTitle  string              `json:"version_title,omitempty"`
	HasLyrics     bool                `json:"has_lyrics"`
	Languages     []string            `json:"languages,omitempty"`
	ArtistRoles   map[string][]string `json:"artist_roles,omitempty"`
}

type Album struct {
	ID                   string   `json:"id"`
	Name                 string   `json:"name"`
	AlbumType            string   `json:"album_type"`
	Label                string   `json:"label,omitempty"`
	ReleaseDate          string   `json:"release_date"`
	ReleaseDatePrecision string   `json:"release_date_precision"`
	ExternalIDUPC        string   `json:"external_id_upc,omitempty"`
	TotalTracks          int      `json:"total_tracks"`
	CopyrightC           string   `json:"copyright_c,omitempty"`
	CopyrightP           string   `json:"copyright_p,omitempty"`
	Images               []Image  `json:"images,omitempty"`
	Artists              []Artist `json:"artists,omitempty"`
}

type Artist struct {
	ID             string   `json:"id"`
	Name           string   `json:"name"`
	FollowersTotal int      `json:"followers_total,omitempty"`
	Popularity     int      `json:"popularity,omitempty"`
	Genres         []string `json:"genres,omitempty"`
	Images         []Image  `json:"images,omitempty"`
}

type Image struct {
	URL    string `json:"url"`
	Width  int    `json:"width"`
	Height int    `json:"height"`
}
```

**Batch request/response models:**

```go
type BatchRequest struct {
	Tracks  []string `json:"tracks,omitempty"`  // Track IDs
	Artists []string `json:"artists,omitempty"` // Artist IDs
	Albums  []string `json:"albums,omitempty"`  // Album IDs
	ISRCs   []string `json:"isrcs,omitempty"`   // ISRC codes
}

type BatchResponse struct {
	Tracks  map[string]*Track  `json:"tracks,omitempty"`
	Artists map[string]*Artist `json:"artists,omitempty"`
	Albums  map[string]*Album  `json:"albums,omitempty"`
	ISRCs   map[string]*Track  `json:"isrcs,omitempty"`
}
```

## Request Flow

### Example: GET /lookup/track/{id}

```
1. Client Request
   GET /lookup/track/abc123
   ↓
2. Rate Limiter Middleware
   - Extract IP from X-Forwarded-For
   - Check token bucket for IP
   - If allowed, continue; else return 429
   ↓
3. HTTP Handler (api/handlers.go)
   - Extract "abc123" from path
   - Call db.GetTrack("abc123")
   ↓
4. Database Layer (db/db.go)
   - Query track + album (single JOIN)
   - Enrich album (images, artists)
   - Enrich track (artists, track_files)
   - Enrich artists (genres, images)
   ↓
5. Models Layer (models/models.go)
   - Populate Track struct
   - Nest Album, Artists
   ↓
6. HTTP Handler
   - Serialize Track to JSON
   - Set Content-Type: application/json
   - Write response
   ↓
7. Client Response
   200 OK
   {
     "id": "abc123",
     "name": "Song Title",
     "album": {...},
     "artists": [...]
   }
```

### Example: POST /batch/lookup

```
1. Client Request
   POST /batch/lookup
   {
     "isrcs": ["USRC12345678", "GBUM71234567", ...],  // Up to 400
     "tracks": ["id1", "id2", ...]
   }
   ↓
2. Rate Limiter Middleware
   - Single request counts as 1 token (not 400)
   ↓
3. HTTP Handler
   - Parse BatchRequest
   - Validate: max 400 items total
   - Call db.BatchGetByISRC(isrcs)
   - Call db.BatchGetTracks(trackIDs)
   ↓
4. Database Layer
   - Build IN clause for ISRCs
   - Execute batch query (1 query for all ISRCs)
   - Collect all track/album/artist IDs
   - Batch enrich all entities (6 batch queries)
   ↓
5. HTTP Handler
   - Build BatchResponse with maps
   - Serialize to JSON
   ↓
6. Client Response
   200 OK
   {
     "isrcs": {
       "USRC12345678": {...},
       "GBUM71234567": {...}
     },
     "tracks": {
       "id1": {...},
       "id2": {...}
     }
   }
```

## Graceful Shutdown

**Signal handling:**

```go
// Listen for SIGINT (Ctrl+C) and SIGTERM (Docker stop)
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)

// Block until signal received
<-sigChan

// Shutdown with 10-second timeout
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
server.Shutdown(ctx) // Stop accepting new requests, finish in-flight
```

**Shutdown sequence:**

1. Receive SIGINT or SIGTERM
2. Stop accepting new connections
3. Wait for in-flight requests (max 10 seconds)
4. Close database connections
5. Exit process

## No Framework Philosophy

Music Metadata API uses **zero web frameworks**. Everything is Go stdlib:

**Routing:** Go 1.22+ enhanced `http.ServeMux`

- Method-specific routes: `GET /path`, `POST /path`
- Path parameters: `/lookup/track/{id}`
- No regex constraints on path segments (simple patterns only)

**JSON:** `encoding/json` stdlib

- `json.NewEncoder(w).Encode(data)` for responses
- `json.NewDecoder(r.Body).Decode(&req)` for requests

**HTTP Server:** `net/http` stdlib

- `http.Server` with custom `Addr` and `Handler`
- No middleware framework (custom rate limiter)

**Database:** `database/sql` stdlib

- `modernc.org/sqlite` driver (pure Go, no CGO)
- Raw SQL queries (no ORM)

**Logging:** `log/slog` stdlib

- Structured logging for errors
- Only the error level is used (all logs are errors)

**Benefits:**

- Minimal dependencies (2 external packages)
- No framework lock-in
- Easy to understand (no magic)
- Fast compilation
- Small binary size

**Tradeoffs:**

- More boilerplate (manual error handling)
- No built-in middleware chain
- Manual query building (no ORM)
- No automatic validation

## Performance Characteristics

**Strengths:**

- Read-only databases (no write locks)
- Connection pooling (8 connections)
- Memory-mapped I/O (1GB mmap)
- Batch optimization (343x fewer queries)
- Conservative cache (64MB)

**Bottlenecks:**

- Search queries (`LIKE %query%` on 256M rows)
- Rate limiter memory leak (unbounded map)
- No query result caching
- No CDN for image URLs

**Scalability:**

- Horizontal: Run multiple instances (safe because the databases are read-only)
- Vertical: Limited by disk I/O. SQLite's single-writer model is not a constraint here because nothing writes
- Database size: 216GB in total (117GB + 99GB), which requires an SSD for acceptable performance