# Music Metadata API - Architecture

## Architectural Overview

Music Metadata API follows a clean 3-layer architecture with clear separation of concerns:

```
┌─────────────────────────────────────────────────────────────┐
│                      HTTP Clients                            │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                   API Layer (internal/api)                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │  Handlers    │  │ Rate Limiter │  │   OpenAPI    │      │
│  │  (routing)   │  │ (middleware) │  │   (docs)     │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                Database Layer (internal/db)                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   Queries    │  │ Enrichment   │  │    Batch     │      │
│  │   (SQL)      │  │  (joins)     │  │ Optimization │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                 Models Layer (internal/models)               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │    Track     │  │    Album     │  │   Artist     │      │
│  │   (struct)   │  │   (struct)   │  │  (struct)    │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│              SQLite Databases (read-only)                    │
│  ┌──────────────────────────┐  ┌──────────────────────────┐ │
│  │  main_database.sqlite3   │  │  track_files.sqlite3     │ │
│  │       (~117GB)           │  │       (~99GB)            │ │
│  └──────────────────────────┘  └──────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```

## Directory Structure

```
music-metadata-api/
├── cmd/
│   └── server/
│       └── main.go                    # Entry point (62 lines)
│
├── internal/
│   ├── api/
│   │   ├── handlers.go                # HTTP route handlers
│   │   ├── ratelimit.go               # Token bucket rate limiter
│   │   └── openapi.go                 # OpenAPI spec + Swagger UI
│   │
│   ├── db/
│   │   └── db.go                      # Database layer (907 lines)
│   │
│   └── models/
│       └── models.go                  # Data structures (65 lines)
│
├── Dockerfile                         # Multi-stage build
├── docker-compose.yml                 # Production deployment
├── go.mod                             # Dependencies
├── go.sum                             # Dependency checksums
├── .gitignore                         # Excludes databases, binaries
└── .github/
    └── workflows/
        └── docker-publish.yml         # CI/CD pipeline
```

## Layer Breakdown

### Entry Point: cmd/server/main.go

**Responsibilities:**
- Parse CLI flags (`-db`, `-addr`)
- Initialize database connections
- Set up HTTP router
- Configure graceful shutdown
- Start HTTP server

**Key code flow:**
```go
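// 1. Parse flags
dbPath := flag.String("db", "", "path to database")
addr := flag.String("addr", ":8080", "server address")
flag.Parse() // without this call both flags keep their defaults

// 2. Initialize database
database, err := db.NewDatabase(*dbPath)
if err != nil {
    slog.Error("failed to open database", "error", err)
    os.Exit(1)
}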

// 3. Set up router with rate limiting
mux := http.NewServeMux()
rateLimiter := api.NewRateLimiter(100, 200)  // 100 req/s, 200 burst
handler := rateLimiter.Limit(mux)

// 4. Register routes
api.RegisterRoutes(mux, database)

// 5. Graceful shutdown on SIGINT/SIGTERM
server := &http.Server{Addr: *addr, Handler: handler}
// ... shutdown logic with 10s timeout
```

**File size:** 62 lines (minimal, focused)

### API Layer: internal/api/

#### handlers.go

**Responsibilities:**
- Route registration
- Request parsing
- Response serialization
- Error handling
- Query parameter validation

**Route patterns (Go 1.22+ enhanced routing):**
```go
// Method + path patterns
mux.HandleFunc("POST /batch/lookup", handleBatchLookup)
mux.HandleFunc("GET /lookup/isrc/{isrc}", handleISRCLookup)
mux.HandleFunc("GET /lookup/track/{id}", handleTrackLookup)
mux.HandleFunc("GET /lookup/artist/{id}", handleArtistLookup)
mux.HandleFunc("GET /lookup/album/{id}", handleAlbumLookup)
mux.HandleFunc("GET /lookup/album/{id}/tracks", handleAlbumTracks)
mux.HandleFunc("GET /search/track", handleTrackSearch)
mux.HandleFunc("GET /search/artist", handleArtistSearch)
mux.HandleFunc("GET /health", handleHealth)
mux.HandleFunc("GET /docs", handleDocs)
mux.HandleFunc("GET /openapi.yaml", handleOpenAPI)
```

**Handler pattern:**
```go
func handleTrackLookup(w http.ResponseWriter, r *http.Request) {
    // 1. Extract path parameter
    id := r.PathValue("id")

    // 2. Call database layer
    //    (database is the *db.Database that was passed to RegisterRoutes)
    track, err := database.GetTrack(id)
    if err != nil {
        // Every error is reported as 404 here, including database failures
        http.Error(w, "Track not found", http.StatusNotFound)
        return
    }

    // 3. Serialize response
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(track)
}
```
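
The handler above answers 404 for any error. A handler that wants to separate "not found" from real failures could map errors to status codes first. This is a hedged sketch only: `ErrNotFound`, `statusForError`, and `writeError` are hypothetical names, not taken from the codebase.

```go
package main

import (
    "database/sql"
    "errors"
    "net/http"
)

// ErrNotFound is a hypothetical sentinel the database layer could return.
var ErrNotFound = errors.New("not found")

// statusForError picks an HTTP status for an error from the database layer.
func statusForError(err error) int {
    switch {
    case err == nil:
        return http.StatusOK
    case errors.Is(err, ErrNotFound), errors.Is(err, sql.ErrNoRows):
        // Unknown ID: the row simply does not exist.
        return http.StatusNotFound
    default:
        // Anything else is treated as a server-side failure.
        return http.StatusInternalServerError
    }
}

// writeError sends the mapped status together with its standard text.
func writeError(w http.ResponseWriter, err error) {
    code := statusForError(err)
    http.Error(w, http.StatusText(code), code)
}
```

Inside a handler this would replace the fixed `http.Error(w, "Track not found", ...)` call with `writeError(w, err)`.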

**Validation rules:**
- Search queries: minimum 2 characters
- Batch requests: maximum 400 items
- Limit parameters: maximum 50 results
- Timeouts: 10 seconds for search queries
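
The search-related rules above could be enforced with two small helpers. This is a minimal sketch, not the project's actual code: the function names, the fallback limit of 20, and the choice to clamp rather than reject an oversized limit are assumptions. Only the numeric bounds come from the list above.

```go
package main

import (
    "errors"
    "strconv"
    "strings"
    "unicode/utf8"
)

const (
    minQueryLen  = 2  // search queries: minimum 2 characters
    maxLimit     = 50 // limit parameter: maximum 50 results
    defaultLimit = 20 // assumption: the source does not state a default
)

// validateQuery trims the search term and rejects it if it is too short.
// Length is counted in runes so multi-byte characters count once.
func validateQuery(q string) (string, error) {
    q = strings.TrimSpace(q)
    if utf8.RuneCountInString(q) < minQueryLen {
        return "", errors.New("query must be at least 2 characters")
    }
    return q, nil
}

// parseLimit reads the raw ?limit= value. Missing, non-numeric, or
// non-positive input falls back to defaultLimit; larger values are
// capped at maxLimit.
func parseLimit(raw string) int {
    n, err := strconv.Atoi(raw)
    if err != nil || n < 1 {
        return defaultLimit
    }
    if n > maxLimit {
        return maxLimit
    }
    return n
}
```

A search handler would call `validateQuery(r.URL.Query().Get("q"))` and `parseLimit(r.URL.Query().Get("limit"))` before touching the database; the parameter names `q` and `limit` are likewise assumed.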

#### ratelimit.go

**Implementation:** Token bucket algorithm with per-IP tracking

**Data structure:**
```go
type RateLimiter struct {
    visitors map[string]*rate.Limiter  // IP -> limiter
    mu       sync.RWMutex               // Protects visitors map
    rate     rate.Limit                 // Tokens per second
    burst    int                        // Burst capacity
}
```

**Algorithm:**
1. Extract client IP from `X-Forwarded-For` header (fallback to `RemoteAddr`)
2. Look up or create limiter for IP
3. Check if token available (`limiter.Allow()`)
4. If allowed, pass to next handler
5. If denied, return HTTP 429 with `Retry-After` header
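
Step 1 of the algorithm above can be sketched as a pure function. The name `clientIP` and the choice of the first `X-Forwarded-For` entry are assumptions; which entry is trustworthy depends on how many proxies sit in front of the service.

```go
package main

import (
    "net"
    "strings"
)

// clientIP returns the key to rate-limit on. It prefers the first entry of
// the X-Forwarded-For value and falls back to the host part of RemoteAddr.
func clientIP(forwardedFor, remoteAddr string) string {
    if forwardedFor != "" {
        // The header may hold a comma-separated chain: client, proxy1, proxy2.
        first, _, _ := strings.Cut(forwardedFor, ",")
        if ip := strings.TrimSpace(first); ip != "" {
            return ip
        }
    }
    // RemoteAddr is "host:port"; strip the port so one client maps to one key.
    host, _, err := net.SplitHostPort(remoteAddr)
    if err != nil {
        return remoteAddr // no port present, use the value unchanged
    }
    return host
}
```

In middleware this would be called as `clientIP(r.Header.Get("X-Forwarded-For"), r.RemoteAddr)`.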

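**BUG:** The visitor map grows without bound. There is no cleanup for inactive IPs, so a long-running server keeps one limiter for every IP it has ever seen. Because `X-Forwarded-For` is client-supplied unless a trusted proxy overwrites it, a client can also create new entries at will by varying the header.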
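
One common way to bound such a map is to record when each IP was last seen and evict idle entries periodically. The sketch below is an assumption about how a fix could look, not the project's code: `visitorTable`, `touch`, `evictIdle`, and `startJanitor` are hypothetical names, and the per-IP limiter itself is left out so the example stays stdlib-only.

```go
package main

import (
    "sync"
    "time"
)

// visitor records when an IP was last seen. A real entry would also hold
// that IP's *rate.Limiter.
type visitor struct {
    lastSeen time.Time
}

type visitorTable struct {
    mu       sync.Mutex
    visitors map[string]*visitor
}

func newVisitorTable() *visitorTable {
    return &visitorTable{visitors: make(map[string]*visitor)}
}

// touch creates or refreshes the entry for ip.
func (t *visitorTable) touch(ip string) {
    t.mu.Lock()
    defer t.mu.Unlock()
    v, ok := t.visitors[ip]
    if !ok {
        v = &visitor{}
        t.visitors[ip] = v
    }
    v.lastSeen = time.Now()
}

// evictIdle removes entries not seen within maxIdle and reports how many.
func (t *visitorTable) evictIdle(maxIdle time.Duration) int {
    cutoff := time.Now().Add(-maxIdle)
    t.mu.Lock()
    defer t.mu.Unlock()
    removed := 0
    for ip, v := range t.visitors {
        if !v.lastSeen.After(cutoff) {
            delete(t.visitors, ip) // deleting during range is safe in Go
            removed++
        }
    }
    return removed
}

// size returns the number of tracked IPs.
func (t *visitorTable) size() int {
    t.mu.Lock()
    defer t.mu.Unlock()
    return len(t.visitors)
}

// startJanitor runs evictIdle on every tick until stop is closed.
func (t *visitorTable) startJanitor(every, maxIdle time.Duration, stop <-chan struct{}) {
    go func() {
        ticker := time.NewTicker(every)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                t.evictIdle(maxIdle)
            case <-stop:
                return
            }
        }
    }()
}
```

A call such as `startJanitor(time.Minute, 3*time.Minute, stop)` would keep the table proportional to recently active clients; the intervals are illustrative.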

**Configuration:**
- Rate: 100 requests/second
- Burst: 200 requests
- Scope: Per-IP (not per-user, no authentication)

#### openapi.go

**Responsibilities:**
- Serve OpenAPI 3.1 specification at `/openapi.yaml`
- Serve Swagger UI at `/docs`
- Embed OpenAPI spec in binary (no external files)

**Swagger UI loading:**
```html
<!-- Loaded from unpkg.com CDN (browser-side) -->
<script src="https://unpkg.com/swagger-ui-dist@5/swagger-ui-bundle.js"></script>
<link rel="stylesheet" href="https://unpkg.com/swagger-ui-dist@5/swagger-ui.css" />
```

**OpenAPI spec highlights:**
- Version: 3.1.0
- All endpoints documented
- Request/response schemas
- Example payloads
- Error responses

### Database Layer: internal/db/db.go

**File size:** 907 lines (largest file in codebase)

**Responsibilities:**
- SQLite connection management
- Query execution
- Data enrichment (joining related entities)
- Batch optimization
- Transaction handling (read-only)

#### Connection Management

**Dual database connections:**
```go
type Database struct {
    mainDB       *sql.DB  // main_database.sqlite3
    trackFilesDB *sql.DB  // track_files.sqlite3
}
```

**Connection string PRAGMAs:**
```
file:/path/to/db.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
```

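Note: the `_name=value` spelling resembles the `mattn/go-sqlite3` DSN convention. The `modernc.org/sqlite` driver named under "No Framework Philosophy" documents a `_pragma=name(value)` form instead, so it is worth verifying (for example by running `PRAGMA cache_size;`) that these settings actually take effect.

**PRAGMA breakdown:**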

| Parameter | Value | Purpose |
|-----------|-------|---------|
| `mode=ro` | Read-only | Opens the file read-only; prevents accidental writes |
| `_journal_mode=off` | Disabled | No rollback journal (safe because nothing is ever written) |
| `_cache_size=-64000` | ~64MB | Page cache size (negative value = size in KiB) |
| `_mmap_size=1073741824` | 1GB | Memory-mapped I/O size |
| `_query_only=true` | Enabled | Additional read-only enforcement |
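
Rather than concatenating the connection string by hand, the parameters can be assembled with `net/url`. This is a sketch under assumptions: `readOnlyDSN` is a hypothetical helper, and the parameter names are copied from the table above without any claim about which driver honors them.

```go
package main

import "net/url"

// readOnlyDSN assembles the connection string for one database file.
// The path is used as-is, so it must not contain '?' or '#'.
func readOnlyDSN(path string) string {
    params := url.Values{}
    params.Set("mode", "ro")
    params.Set("_journal_mode", "off")
    params.Set("_cache_size", "-64000")    // negative: size in KiB
    params.Set("_mmap_size", "1073741824") // 1 GiB
    params.Set("_query_only", "true")
    // Encode sorts keys alphabetically, so the order differs from the
    // hand-written string while the content is the same.
    return "file:" + path + "?" + params.Encode()
}
```

Calling `readOnlyDSN` once per file keeps the two connections configured identically.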

**Connection pool:**
```go
db.SetMaxOpenConns(8)   // Conservative limit
db.SetMaxIdleConns(8)   // Keep connections warm
db.SetConnMaxLifetime(0) // No expiration
```

#### Query Patterns

**Individual lookups:**
```go
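func (d *Database) GetTrack(id string) (*models.Track, error) {
    // 1. Fetch base track + album
    row := d.mainDB.QueryRow(`
        SELECT t.id, t.name, t.isrc, t.duration_ms, t.explicit,
               t.track_number, t.disc_number, t.popularity, t.preview_url,
               a.id, a.name, a.album_type, a.label, a.release_date,
               a.release_date_precision, a.external_id_upc, a.total_tracks
        FROM tracks t
        JOIN albums a ON t.album_rowid = a.rowid
        WHERE t.id = ?
    `, id)

    // Scan in SELECT order; sql.ErrNoRows means the ID is unknown.
    // Nullable columns would need sql.NullString or COALESCE in the query.
    var track models.Track
    if err := row.Scan(
        &track.ID, &track.Name, &track.ISRC, &track.DurationMs, &track.Explicit,
        &track.TrackNumber, &track.DiscNumber, &track.Popularity, &track.PreviewURL,
        &track.Album.ID, &track.Album.Name, &track.Album.AlbumType, &track.Album.Label,
        &track.Album.ReleaseDate, &track.Album.ReleaseDatePrecision,
        &track.Album.ExternalIDUPC, &track.Album.TotalTracks,
    ); err != nil {
        return nil, err
    }

    // 2. Enrich album (images, artists)
    d.enrichAlbum(&track.Album)

    // 3. Enrich track (artists, track_files)
    d.enrichTrack(&track)

    return &track, nil
}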

**Batch lookups:**
```go
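func (d *Database) BatchGetByISRC(isrcs []string) (map[string]*models.Track, error) {
    tracks := make(map[string]*models.Track, len(isrcs))
    if len(isrcs) == 0 {
        return tracks, nil // strings.Repeat panics on a negative count
    }

    // 1. Build IN clause
    placeholders := strings.Repeat("?,", len(isrcs)-1) + "?"
    query := fmt.Sprintf(`
        SELECT t.id, t.isrc, ...
        FROM tracks t
        JOIN albums a ON t.album_rowid = a.rowid
        WHERE t.isrc IN (%s)
    `, placeholders)

    // 2. Execute batch query ([]string must be converted to []any first)
    args := make([]any, len(isrcs))
    for i, isrc := range isrcs {
        args[i] = isrc
    }
    rows, err := d.mainDB.Query(query, args...)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    // 3. Collect track and album IDs for enrichment
    trackIDs := make([]string, 0, len(isrcs))
    albumIDs := make([]string, 0, len(isrcs))
    // ... scan rows into tracks, appending to trackIDs and albumIDs

    // 4. Batch enrich all entities
    d.batchEnrichAlbums(albumIDs, tracks)
    d.batchEnrichTracks(trackIDs, tracks)

    return tracks, nil
}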
```
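
Every batch function needs the same two ingredients: one `?` per value and arguments of type `[]any`. The helpers below are a sketch of how that could be factored out; `placeholders`, `toArgs`, and `inQuery` are hypothetical names, and `SELECT *` stands in for the real column lists.

```go
package main

import "strings"

// placeholders returns "?,?,?" for n = 3 and "" for n <= 0.
// The guard matters because strings.Repeat panics on a negative count.
func placeholders(n int) string {
    if n <= 0 {
        return ""
    }
    return strings.Repeat("?,", n-1) + "?"
}

// toArgs converts IDs to []any so they can be passed as variadic
// arguments to (*sql.DB).Query.
func toArgs(ids []string) []any {
    args := make([]any, len(ids))
    for i, id := range ids {
        args[i] = id
    }
    return args
}

// inQuery builds a "WHERE column IN (...)" statement and its arguments.
// table and column must be trusted identifiers, never user input.
func inQuery(table, column string, ids []string) (string, []any) {
    query := "SELECT * FROM " + table + " WHERE " + column +
        " IN (" + placeholders(len(ids)) + ")"
    return query, toArgs(ids)
}
```

Callers should still return early on an empty ID list, since `IN ()` is not portable SQL.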

#### Data Enrichment Flow

**Track enrichment pipeline:**
```
1. Fetch base track + album (single JOIN)
   ↓
2. Enrich album:
   - Batch fetch album images (batchGetAlbumImages)
   - Batch fetch album artists (batchGetAlbumArtists)
   ↓
3. Enrich track:
   - Batch fetch track artists (batchGetTrackArtists)
   - Batch fetch track files (batchEnrichTrackFiles)
   ↓
4. Enrich artists:
   - Batch fetch artist genres (batchGetArtistGenres)
   - Batch fetch artist images (batchGetArtistImages)
   ↓
5. Return fully enriched track
```

**Batch optimization functions:**

| Function | Purpose | Query Pattern |
|----------|---------|---------------|
| `batchGetAlbumImages` | Fetch all images for albums | `WHERE album_id IN (...)` |
| `batchGetAlbumArtists` | Fetch all artists for albums | `WHERE album_id IN (...)` |
| `batchGetTrackArtists` | Fetch all artists for tracks | `WHERE track_id IN (...)` |
| `batchGetArtistGenres` | Fetch all genres for artists | `WHERE artist_id IN (...)` |
| `batchGetArtistImages` | Fetch all images for artists | `WHERE artist_id IN (...)` |
| `batchEnrichTrackFiles` | Fetch extended track data | `WHERE track_id IN (...)` |
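
After a batch query returns, its rows still have to be distributed to their parents in memory. The following sketch shows that grouping step for album images. The row type `albumImageRow` and both function names are assumptions, and the structs are trimmed copies of the models so the example compiles on its own.

```go
package main

// Image and Album are trimmed-down versions of the models.
type Image struct {
    URL    string
    Width  int
    Height int
}

type Album struct {
    ID     string
    Images []Image
}

// albumImageRow is a hypothetical shape for one result row of
// "... WHERE album_id IN (...)".
type albumImageRow struct {
    AlbumID string
    Image   Image
}

// groupImages buckets the rows of one batch query by album ID,
// preserving row order within each album.
func groupImages(rows []albumImageRow) map[string][]Image {
    byAlbum := make(map[string][]Image)
    for _, r := range rows {
        byAlbum[r.AlbumID] = append(byAlbum[r.AlbumID], r.Image)
    }
    return byAlbum
}

// attachImages gives each album its images without any further queries.
// Albums with no rows end up with a nil slice.
func attachImages(albums []*Album, byAlbum map[string][]Image) {
    for _, a := range albums {
        a.Images = byAlbum[a.ID]
    }
}
```

The same group-then-attach shape would apply to artists, genres, and track files.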

**Why batch optimization matters:**
- A single batch request with 400 tracks triggers ~6 batch enrichment queries
- Without batching: 400 tracks × 6 enrichment queries = 2,400 database queries
- With batching: 1 main query + 6 batch queries = 7 database queries
- **Performance gain: 2,400 / 7 ≈ 343x fewer queries**
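
The arithmetic above generalizes to any batch size. A small sketch, with hypothetical function names, that reproduces the figures:

```go
package main

// queryCounts returns how many enrichment queries run without batching
// (one per track per enrichment step) and how many queries run with
// batching (one main query plus one per enrichment step).
func queryCounts(tracks, enrichments int) (unbatched, batched int) {
    return tracks * enrichments, 1 + enrichments
}

// reduction is the ratio between the two counts.
func reduction(tracks, enrichments int) float64 {
    unbatched, batched := queryCounts(tracks, enrichments)
    return float64(unbatched) / float64(batched)
}
```

The batched count does not depend on the number of tracks, so the ratio grows linearly with batch size.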

#### Search Implementation

**Track search:**
```sql
SELECT id, name, isrc, duration_ms, popularity, album_rowid
FROM tracks
WHERE name LIKE ? COLLATE NOCASE
ORDER BY popularity DESC
LIMIT ?
```

**Artist search:**
```sql
SELECT id, name, followers_total, popularity
FROM artists
WHERE name LIKE ? COLLATE NOCASE
ORDER BY followers_total DESC
LIMIT ?
```

**Search characteristics:**
- Pattern: `%query%` (substring match)
- Collation: `NOCASE` (case-insensitive; SQLite's `LIKE` is already case-insensitive for ASCII by default)
- Timeout: 10 seconds (context deadline)
- Min query length: 2 characters
- Max results: 50
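
The `%query%` pattern and the 10-second deadline could be produced as sketched below. Escaping of `%` and `_` is an assumption that goes beyond what the source describes: the queries shown have no `ESCAPE` clause, so `likePattern` is only correct if the SQL is changed to `name LIKE ? ESCAPE '\'`. Both function names are hypothetical.

```go
package main

import (
    "context"
    "strings"
    "time"
)

// searchTimeout is the documented deadline for search queries.
const searchTimeout = 10 * time.Second

// likePattern wraps a search term in % wildcards for a substring match.
// It escapes \, % and _ in the term so they match literally; this
// requires the query to declare ESCAPE '\'.
func likePattern(term string) string {
    escaper := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
    return "%" + escaper.Replace(term) + "%"
}

// searchContext derives a context that cancels the query after
// searchTimeout or when the parent (for example r.Context()) is done.
func searchContext(parent context.Context) (context.Context, context.CancelFunc) {
    return context.WithTimeout(parent, searchTimeout)
}
```

The derived context would be passed to `QueryContext`, and the returned cancel function should be deferred in the handler.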

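**Performance concern:** The leading wildcard in `LIKE '%query%'` prevents SQLite from using an index on `name` to narrow the match, so a search has to scan rows instead of seeking. On a table of 256M tracks this will be slow. Full-text search (SQLite FTS5) would be faster, but it is not implemented.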

### Models Layer: internal/models/models.go

**File size:** 65 lines (smallest layer)

**Responsibilities:**
- Define data structures
- JSON serialization tags
- Nested relationships

**Core models:**

```go
type Track struct {
    ID            string   `json:"id"`
    Name          string   `json:"name"`
    ISRC          string   `json:"isrc,omitempty"`
    DurationMs    int      `json:"duration_ms"`
    Explicit      bool     `json:"explicit"`
    TrackNumber   int      `json:"track_number"`
    DiscNumber    int      `json:"disc_number"`
    Popularity    int      `json:"popularity"`
    PreviewURL    string   `json:"preview_url,omitempty"`
    Album         Album    `json:"album"`
    Artists       []Artist `json:"artists"`
    
    // Extended fields from track_files DB
    OriginalTitle string                 `json:"original_title,omitempty"`
    VersionTitle  string                 `json:"version_title,omitempty"`
    HasLyrics     bool                   `json:"has_lyrics"`
    Languages     []string               `json:"languages,omitempty"`
    ArtistRoles   map[string][]string    `json:"artist_roles,omitempty"`
}

type Album struct {
    ID                    string   `json:"id"`
    Name                  string   `json:"name"`
    AlbumType             string   `json:"album_type"`
    Label                 string   `json:"label,omitempty"`
    ReleaseDate           string   `json:"release_date"`
    ReleaseDatePrecision  string   `json:"release_date_precision"`
    ExternalIDUPC         string   `json:"external_id_upc,omitempty"`
    TotalTracks           int      `json:"total_tracks"`
    CopyrightC            string   `json:"copyright_c,omitempty"`
    CopyrightP            string   `json:"copyright_p,omitempty"`
    Images                []Image  `json:"images,omitempty"`
    Artists               []Artist `json:"artists,omitempty"`
}

type Artist struct {
    ID             string   `json:"id"`
    Name           string   `json:"name"`
    FollowersTotal int      `json:"followers_total,omitempty"`
    Popularity     int      `json:"popularity,omitempty"`
    Genres         []string `json:"genres,omitempty"`
    Images         []Image  `json:"images,omitempty"`
}

type Image struct {
    URL    string `json:"url"`
    Width  int    `json:"width"`
    Height int    `json:"height"`
}
```

**Batch request/response models:**

```go
type BatchRequest struct {
    Tracks  []string `json:"tracks,omitempty"`   // Track IDs
    Artists []string `json:"artists,omitempty"`  // Artist IDs
    Albums  []string `json:"albums,omitempty"`   // Album IDs
    ISRCs   []string `json:"isrcs,omitempty"`    // ISRC codes
}

type BatchResponse struct {
    Tracks  map[string]*Track  `json:"tracks,omitempty"`
    Artists map[string]*Artist `json:"artists,omitempty"`
    Albums  map[string]*Album  `json:"albums,omitempty"`
    ISRCs   map[string]*Track  `json:"isrcs,omitempty"`
}
```
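
Decoding a `BatchRequest` and applying the 400-item cap could look like the sketch below. The method names `total` and `validate`, the helper `decodeBatch`, and the error text are assumptions; a real handler would decode from `r.Body` rather than from a string.

```go
package main

import (
    "encoding/json"
    "errors"
    "strings"
)

// BatchRequest repeats the model above so the sketch compiles on its own.
type BatchRequest struct {
    Tracks  []string `json:"tracks,omitempty"`
    Artists []string `json:"artists,omitempty"`
    Albums  []string `json:"albums,omitempty"`
    ISRCs   []string `json:"isrcs,omitempty"`
}

const maxBatchItems = 400 // batch requests: maximum 400 items in total

// total counts the items across all four lists.
func (b BatchRequest) total() int {
    return len(b.Tracks) + len(b.Artists) + len(b.Albums) + len(b.ISRCs)
}

// validate rejects requests above the item cap.
func (b BatchRequest) validate() error {
    if b.total() > maxBatchItems {
        return errors.New("batch exceeds 400 items")
    }
    return nil
}

// decodeBatch parses a JSON body and validates the result.
func decodeBatch(body string) (BatchRequest, error) {
    var req BatchRequest
    if err := json.NewDecoder(strings.NewReader(body)).Decode(&req); err != nil {
        return BatchRequest{}, err
    }
    if err := req.validate(); err != nil {
        return BatchRequest{}, err
    }
    return req, nil
}
```

Because the cap applies to the sum of all lists, a request with 300 ISRCs and 200 track IDs would be rejected.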

## Request Flow

### Example: GET /lookup/track/{id}

```
1. Client Request
   GET /lookup/track/abc123
   ↓
2. Rate Limiter Middleware
   - Extract IP from X-Forwarded-For
   - Check token bucket for IP
   - If allowed, continue; else return 429
   ↓
3. HTTP Handler (api/handlers.go)
   - Extract "abc123" from path
   - Call db.GetTrack("abc123")
   ↓
4. Database Layer (db/db.go)
   - Query track + album (single JOIN)
   - Enrich album (images, artists)
   - Enrich track (artists, track_files)
   - Enrich artists (genres, images)
   ↓
5. Models Layer (models/models.go)
   - Populate Track struct
   - Nest Album, Artists
   ↓
6. HTTP Handler
   - Serialize Track to JSON
   - Set Content-Type: application/json
   - Write response
   ↓
7. Client Response
   200 OK
   {
     "id": "abc123",
     "name": "Song Title",
     "album": {...},
     "artists": [...]
   }
```

### Example: POST /batch/lookup

```
1. Client Request
   POST /batch/lookup
   {
     "isrcs": ["USRC12345678", "GBUM71234567", ...],  // Up to 400
     "tracks": ["id1", "id2", ...]
   }
   ↓
2. Rate Limiter Middleware
   - Single request counts as 1 token (not 400)
   ↓
3. HTTP Handler
   - Parse BatchRequest
   - Validate: max 400 items total
   - Call db.BatchGetByISRC(isrcs)
   - Call db.BatchGetTracks(trackIDs)
   ↓
4. Database Layer
   - Build IN clause for ISRCs
   - Execute batch query (1 query for all ISRCs)
   - Collect all track/album/artist IDs
   - Batch enrich all entities (6 batch queries)
   ↓
5. HTTP Handler
   - Build BatchResponse with maps
   - Serialize to JSON
   ↓
6. Client Response
   200 OK
   {
     "isrcs": {
       "USRC12345678": {...},
       "GBUM71234567": {...}
     },
     "tracks": {
       "id1": {...},
       "id2": {...}
     }
   }
```

## Graceful Shutdown

**Signal handling:**
```go
// server.ListenAndServe() must already be running in its own goroutine

// Listen for SIGINT (Ctrl+C) and SIGTERM (Docker stop)
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)

// Block until signal received
<-sigChan

// Shutdown with 10-second timeout
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()

// Stop accepting new requests and wait for in-flight ones to finish
if err := server.Shutdown(ctx); err != nil {
    slog.Error("graceful shutdown failed", "error", err)
}
```

**Shutdown sequence:**
1. Receive SIGINT or SIGTERM
2. Stop accepting new connections
3. Wait for in-flight requests (max 10 seconds)
4. Close database connections
5. Exit process

## No Framework Philosophy

Music Metadata API uses **zero web frameworks**. Everything is Go stdlib:

**Routing:** Go 1.22+ enhanced `http.ServeMux`
- Method-specific routes: `GET /path`, `POST /path`
- Path parameters: `/lookup/track/{id}`
- No regex constraints or catch-all patterns (only single-segment `{name}` wildcards are used)

**JSON:** `encoding/json` stdlib
- `json.NewEncoder(w).Encode(data)` for responses
- `json.NewDecoder(r.Body).Decode(&req)` for requests

**HTTP Server:** `net/http` stdlib
- `http.Server` with custom `Addr` and `Handler`
- No middleware framework (custom rate limiter)

**Database:** `database/sql` stdlib
- `modernc.org/sqlite` driver (pure Go, no CGO)
- Raw SQL queries (no ORM)

**Logging:** `log/slog` stdlib
- Structured logging for errors
- Only the error level is used (no info or debug logging)

**Benefits:**
- Minimal dependencies (2 external packages)
- No framework lock-in
- Easy to understand (no magic)
- Fast compilation
- Small binary size

**Tradeoffs:**
- More boilerplate (manual error handling)
- No built-in middleware chain
- Manual query building (no ORM)
- No automatic validation

## Performance Characteristics

**Strengths:**
- Read-only databases (no write locks)
- Connection pooling (8 connections)
- Memory-mapped I/O (1GB mmap)
- Batch optimization (343x fewer queries)
- Conservative cache (64MB)

**Bottlenecks:**
- Search queries (LIKE %query% on 256M rows)
- Rate limiter memory leak (unbounded map)
- No query result caching
- No CDN for image URLs

**Scalability:**
- Horizontal: Run multiple instances (read-only safe)
- Vertical: Limited by disk I/O. SQLite's single-writer limit does not apply because the databases are read-only
- Database size: ~216GB combined (117GB + 99GB); an SSD is needed for acceptable performance