# Music Metadata API - Codebase Analysis ## Codebase Overview Music Metadata API is a small, focused Go codebase with minimal complexity: **Total lines of code:** ~1,100 lines (excluding tests, which don't exist) **File breakdown:** - `cmd/server/main.go` - 62 lines (entry point) - `internal/db/db.go` - 907 lines (database layer, largest file) - `internal/models/models.go` - 65 lines (data structures) - `internal/api/handlers.go` - ~150 lines (HTTP handlers) - `internal/api/ratelimit.go` - ~80 lines (rate limiting) - `internal/api/openapi.go` - ~100 lines (OpenAPI spec) **Characteristics:** - No web framework (stdlib only) - No ORM (raw SQL) - No test files (zero test coverage) - No configuration files (CLI flags only) - Minimal dependencies (2 external packages) ## Configuration ### CLI Flags **Defined in:** `cmd/server/main.go` ```go var ( dbPath = flag.String("db", "", "path to database file (required)") addr = flag.String("addr", ":8080", "HTTP server address") ) ``` **Usage:** ```bash ./metadata-api -db /data/main_database.sqlite3 -addr :8080 ``` **Limitations:** - Only 2 configurable parameters - No environment variable support - No configuration file support - All timeouts hardcoded - All limits hardcoded ### Hardcoded Configuration **Timeouts:** ```go // Graceful shutdown timeout shutdownTimeout := 10 * time.Second // Search query timeout ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second) ``` **Rate limiting:** ```go // Hardcoded in api/ratelimit.go rateLimiter := NewRateLimiter(100, 200) // 100 req/s, 200 burst ``` **Database connection pool:** ```go // Hardcoded in db/db.go db.SetMaxOpenConns(8) db.SetMaxIdleConns(8) db.SetConnMaxLifetime(0) ``` **Search limits:** ```go // Hardcoded in api/handlers.go const ( minQueryLength = 2 maxSearchLimit = 50 defaultLimit = 10 ) ``` **Batch limits:** ```go // Hardcoded in api/handlers.go const maxBatchItems = 400 ``` **SQLite PRAGMAs:** ```go // Hardcoded in db/db.go dsn := fmt.Sprintf("file:%s?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true", dbPath) ``` **Recommendation:** Extract to configuration struct for flexibility. ### Environment Variables **docker-compose.yml defines:** ```yaml environment: - LOG_LEVEL=info ``` **BUG:** `LOG_LEVEL` is not used in code. No log level control implemented. **Expected behavior:** Filter logs by level (debug, info, warn, error) **Actual behavior:** All logs output (no filtering) **Fix required:** ```go // Add to main.go logLevel := os.Getenv("LOG_LEVEL") if logLevel == "" { logLevel = "info" } var level slog.Level switch logLevel { case "debug": level = slog.LevelDebug case "info": level = slog.LevelInfo case "warn": level = slog.LevelWarn case "error": level = slog.LevelError } logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: level})) ``` ## Logging ### Implementation **Package:** Go stdlib `log/slog` (structured logging) **Usage pattern:** ```go slog.Error("Database query failed", "error", err, "query", query) ``` **Output format:** ```json {"time":"2024-01-15T10:30:00Z","level":"ERROR","msg":"Database query failed","error":"no such table","query":"SELECT * FROM tracks"} ``` ### Logging Locations **Error logging only:** - Database query failures - JSON decode errors - HTTP handler errors - Graceful shutdown errors **No info/debug logging:** - Request logging (no access logs) - Query execution logging - Performance metrics - Startup messages **Example from db.go:** ```go rows, err := d.mainDB.Query(query, args...) if err != nil { slog.Error("Query failed", "error", err, "query", query) return nil, err } ``` ### Log Level Control **Current:** No log level filtering (all logs output) **Missing:** - Debug logs (query details, timing) - Info logs (startup, shutdown, requests) - Warn logs (rate limiting, slow queries) **Recommendation:** Implement log level control via environment variable. ## Health Checks ### Naive Implementation **Endpoint:** `GET /health` **Code:** ```go func handleHealth(w http.ResponseWriter, r *http.Request) { w.Header().Set("Content-Type", "application/json") json.NewEncoder(w).Encode(map[string]string{"status": "ok"}) } ``` **Response:** ```json {"status":"ok"} ``` **Problem:** Always returns 200 OK, even if database is unreachable. **Test:** ```bash # Stop database (simulate failure) mv /data/main_database.sqlite3 /data/main_database.sqlite3.bak # Health check still returns OK curl http://localhost:8080/health # {"status":"ok"} # But actual queries fail curl http://localhost:8080/lookup/track/abc123 # 500 Internal Server Error ``` ### Improved Health Check **Recommendation:** ```go func handleHealth(db *sql.DB) http.HandlerFunc { return func(w http.ResponseWriter, r *http.Request) { // Ping database ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second) defer cancel() if err := db.PingContext(ctx); err != nil { w.WriteHeader(http.StatusServiceUnavailable) json.NewEncoder(w).Encode(map[string]string{ "status": "unhealthy", "error": "database unavailable", }) return } // Optional: Test query var count int err := db.QueryRowContext(ctx, "SELECT COUNT(*) FROM tracks LIMIT 1").Scan(&count) if err != nil { w.WriteHeader(http.StatusServiceUnavailable) json.NewEncoder(w).Encode(map[string]string{ "status": "unhealthy", "error": "database query failed", }) return } json.NewEncoder(w).Encode(map[string]string{"status": "ok"}) } } ``` ## Rate Limiting ### Implementation **File:** `internal/api/ratelimit.go` **Algorithm:** Token bucket per IP **Data structure:** ```go type RateLimiter struct { visitors map[string]*rate.Limiter // IP -> limiter mu sync.RWMutex // Protects visitors map rate rate.Limit // Tokens per second burst int // Burst capacity } ``` **Middleware:** ```go func (rl *RateLimiter) Limit(next http.Handler) http.Handler { return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { // Extract IP ip := getIP(r) // Get or create limiter for IP limiter := rl.getLimiter(ip) // Check if allowed if !limiter.Allow() { w.Header().Set("Retry-After", "1") http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests) return } next.ServeHTTP(w, r) }) } ``` **IP extraction:** ```go func getIP(r *http.Request) string { // Check X-Forwarded-For header (proxy/load balancer) forwarded := r.Header.Get("X-Forwarded-For") if forwarded != "" { // Take first IP if comma-separated ips := strings.Split(forwarded, ",") return strings.TrimSpace(ips[0]) } // Fallback to RemoteAddr ip, _, _ := net.SplitHostPort(r.RemoteAddr) return ip } ``` ### Memory Leak **Problem:** Visitor map grows unbounded. No cleanup for inactive IPs. **Code:** ```go func (rl *RateLimiter) getLimiter(ip string) *rate.Limiter { rl.mu.Lock() defer rl.mu.Unlock() limiter, exists := rl.visitors[ip] if !exists { limiter = rate.NewLimiter(rl.rate, rl.burst) rl.visitors[ip] = limiter // BUG: Never removed } return limiter } ``` **Impact:** - Long-running servers accumulate IPs - Memory usage grows over time - No expiration for inactive IPs **Example:** - 1 million unique IPs over 1 month - ~100 bytes per limiter - ~100MB memory leak **Fix:** ```go type visitor struct { limiter *rate.Limiter lastSeen time.Time } func (rl *RateLimiter) cleanup() { ticker := time.NewTicker(1 * time.Hour) defer ticker.Stop() for range ticker.C { rl.mu.Lock() for ip, v := range rl.visitors { // Remove visitors inactive for 24 hours if time.Since(v.lastSeen) > 24*time.Hour { delete(rl.visitors, ip) } } rl.mu.Unlock() } } // Start cleanup goroutine in NewRateLimiter go rl.cleanup() ``` ### Rate Limit Configuration **Current:** Hardcoded (100 req/s, 200 burst) **Recommendation:** Make configurable via CLI flags or environment variables. ```go // CLI flags var ( rateLimit = flag.Int("rate-limit", 100, "requests per second") rateBurst = flag.Int("rate-burst", 200, "burst capacity") ) // Usage rateLimiter := api.NewRateLimiter(rate.Limit(*rateLimit), *rateBurst) ``` ## Search Implementation ### Query Pattern **Track search:** ```go query := ` SELECT id, name, isrc, duration_ms, popularity, album_rowid FROM tracks WHERE name LIKE ? COLLATE NOCASE ORDER BY popularity DESC LIMIT ? ` args := []interface{}{"%" + searchQuery + "%", limit} ``` **Artist search:** ```go query := ` SELECT id, name, followers_total, popularity FROM artists WHERE name LIKE ? COLLATE NOCASE ORDER BY followers_total DESC LIMIT ? ` args := []interface{}{"%" + searchQuery + "%", limit} ``` ### Performance Characteristics **LIKE %query% problems:** - Can't use indexes (full table scan) - Slow on 256M rows - CPU-intensive (string matching) **Benchmark (estimated):** - Common query ("love"): 5-10 seconds - Specific query ("bohemian rhapsody"): 1-2 seconds - Rare query ("xyzabc"): 10+ seconds (full scan) **10-second timeout:** ```go ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second) defer cancel() rows, err := db.QueryContext(ctx, query, args...) if err == context.DeadlineExceeded { http.Error(w, "Search timeout", http.StatusGatewayTimeout) return } ``` ### Search Validation **Minimum query length:** ```go if len(searchQuery) < 2 { http.Error(w, "Query must be at least 2 characters", http.StatusBadRequest) return } ``` **Maximum limit:** ```go if limit > 50 { http.Error(w, "Limit cannot exceed 50", http.StatusBadRequest) return } ``` **Default limit:** ```go limit := 10 if limitParam := r.URL.Query().Get("limit"); limitParam != "" { limit, _ = strconv.Atoi(limitParam) } ``` ### Full-Text Search Alternative **Not implemented:** SQLite FTS5 (Full-Text Search) **FTS5 benefits:** - Indexed search (much faster) - Relevance ranking - Phrase search - Boolean operators **Why not used:** - Requires writable database (to create FTS5 table) - Databases are read-only - Would need separate FTS5 database **Workaround:** ```sql -- Create separate FTS5 database (one-time setup) CREATE VIRTUAL TABLE tracks_fts USING fts5(id, name, content=tracks); INSERT INTO tracks_fts SELECT id, name FROM tracks; -- Fast search SELECT * FROM tracks_fts WHERE name MATCH 'bohemian'; ``` **Implementation:** - Create FTS5 database during database preparation - Open second database connection in code - Query FTS5 for search, then fetch full data from main DB ## Testing ### Test Coverage **Test files:** 0 **Test coverage:** 0% **Test framework:** None **Evidence:** ```bash # No test files in repository find . -name "*_test.go" # (no output) ``` **.gitignore includes:** ``` coverage.out ``` **Implication:** Testing was planned but never implemented. ### CI/CD Testing **GitHub Actions workflow:** `.github/workflows/docker-publish.yml` **Steps:** 1. Checkout code 2. Build Docker image 3. Push to registry **Missing:** No test step **Expected workflow:** ```yaml - name: Run tests run: go test -v ./... - name: Check coverage run: go test -cover ./... ``` ### Manual Testing **Only testing:** Manual API calls **Example:** ```bash # Health check curl http://localhost:8080/health # Track lookup curl http://localhost:8080/lookup/track/abc123 # Search curl "http://localhost:8080/search/track?q=test" ``` **No automated testing:** - No unit tests - No integration tests - No end-to-end tests - No performance tests - No load tests ### Testing Recommendations **Unit tests needed:** - Rate limiter logic - IP extraction - Query building - Data enrichment - JSON serialization **Integration tests needed:** - Database queries - HTTP handlers - Batch operations - Search functionality **Example unit test:** ```go // internal/api/ratelimit_test.go func TestRateLimiter(t *testing.T) { rl := NewRateLimiter(10, 20) // 10 req/s, 20 burst // Should allow burst for i := 0; i < 20; i++ { if !rl.getLimiter("127.0.0.1").Allow() { t.Errorf("Request %d should be allowed", i) } } // Should reject 21st request if rl.getLimiter("127.0.0.1").Allow() { t.Error("Request 21 should be rate limited") } } ``` **Example integration test:** ```go // internal/db/db_test.go func TestGetTrack(t *testing.T) { db, err := NewDatabase("testdata/test.db") if err != nil { t.Fatal(err) } defer db.Close() track, err := db.GetTrack("test_track_id") if err != nil { t.Fatal(err) } if track.Name != "Test Track" { t.Errorf("Expected 'Test Track', got '%s'", track.Name) } } ``` ## Error Handling ### Error Patterns **Database errors:** ```go rows, err := db.Query(query, args...) if err != nil { slog.Error("Query failed", "error", err) http.Error(w, "Internal server error", http.StatusInternalServerError) return } ``` **JSON decode errors:** ```go var req BatchRequest if err := json.NewDecoder(r.Body).Decode(&req); err != nil { http.Error(w, "Invalid JSON", http.StatusBadRequest) return } ``` **Validation errors:** ```go if len(query) < 2 { http.Error(w, "Query too short", http.StatusBadRequest) return } ``` ### Error Responses **Generic errors:** ```go http.Error(w, "Internal server error", http.StatusInternalServerError) ``` **Problem:** No error details returned to client (security vs usability tradeoff) **Structured errors (not implemented):** ```go type ErrorResponse struct { Error string `json:"error"` Code string `json:"code"` Details string `json:"details,omitempty"` } func writeError(w http.ResponseWriter, status int, code, message string) { w.WriteHeader(status) json.NewEncoder(w).Encode(ErrorResponse{ Error: message, Code: code, }) } ``` ## Code Quality ### Strengths **Simplicity:** - Small codebase (~1,100 lines) - Easy to understand - Minimal dependencies - No framework magic **Readability:** - Clear function names - Logical file organization - Consistent style **Performance:** - Batch optimization (343x fewer queries) - Connection pooling - Memory-mapped I/O ### Weaknesses **No tests:** - Zero test coverage - No regression protection - No documentation via tests **Hardcoded config:** - No flexibility - Requires recompilation to change limits - No environment-specific config **Memory leak:** - Rate limiter visitor map grows unbounded - Requires periodic restarts **Naive health check:** - Doesn't verify database connectivity - False positives in monitoring **No metrics:** - No visibility into performance - No error rate tracking - No usage analytics **Unused config:** - `LOG_LEVEL` environment variable ignored - Misleading documentation **No CORS:** - Browser-based clients blocked - Requires reverse proxy workaround **No authentication:** - Public API (security risk) - No usage tracking per user ### Code Smells **Magic numbers:** ```go // What is 64000? Why 1073741824? _cache_size=-64000&_mmap_size=1073741824 ``` **Fix:** Use named constants ```go const ( sqliteCacheSizeKB = 64000 // 64MB sqliteMmapSizeBytes = 1 << 30 // 1GB ) ``` **Repeated code:** ```go // Similar enrichment logic repeated for tracks, albums, artists func enrichTrack(track *Track) { /* ... */ } func enrichAlbum(album *Album) { /* ... */ } func enrichArtist(artist *Artist) { /* ... */ } ``` **Fix:** Generic enrichment function **Global state:** ```go // Rate limiter as global variable (not shown in code, but implied) var rateLimiter *RateLimiter ``` **Fix:** Dependency injection ## Dependencies ### External Packages **modernc.org/sqlite v1.34.4:** - Pure Go SQLite driver - No CGO required - 100% Go implementation - Larger binary size vs CGO version **golang.org/x/time v0.14.0:** - Rate limiting (token bucket) - Part of Go extended stdlib - Minimal, focused package **Total dependencies:** 2 direct + transitive dependencies ### Dependency Management **go.mod:** ```go module github.com/Aunali321/music-metadata-api go 1.24 require ( modernc.org/sqlite v1.34.4 golang.org/x/time v0.14.0 ) ``` **Dependency updates:** ```bash # Check for updates go list -u -m all # Update dependencies go get -u ./... go mod tidy ``` **Security scanning:** ```bash # Scan for vulnerabilities go list -json -m all | nancy sleuth ``` ## Code Organization ### Package Structure ``` music-metadata-api/ ├── cmd/ │ └── server/ # Entry point │ └── main.go # CLI, server setup, graceful shutdown │ ├── internal/ # Private packages │ ├── api/ # HTTP layer │ │ ├── handlers.go # Route handlers │ │ ├── ratelimit.go # Rate limiting middleware │ │ └── openapi.go # OpenAPI spec │ │ │ ├── db/ # Database layer │ │ └── db.go # Queries, enrichment, batch optimization │ │ │ └── models/ # Data models │ └── models.go # Structs, JSON tags │ ├── Dockerfile # Container build ├── docker-compose.yml # Local deployment ├── go.mod # Dependencies └── .github/ └── workflows/ └── docker-publish.yml # CI/CD ``` ### Separation of Concerns **Good:** - Clear layer boundaries (API → DB → Models) - No circular dependencies - Database logic isolated from HTTP **Could improve:** - Extract configuration to separate package - Extract validation to separate package - Extract error handling to separate package ## Performance Characteristics ### Bottlenecks **Search queries:** - `LIKE %query%` full table scan - 10-second timeout (can be hit) - CPU-bound (string matching) **Rate limiter:** - RWMutex contention under high load - Map lookup on every request **Database:** - Single SQLite file (no sharding) - 8 connection limit (conservative) ### Optimizations **Batch queries:** - 343x fewer queries (400 items: 7 queries vs 2,800) - IN clause for bulk lookups **Connection pooling:** - Reuse connections (no overhead) - 8 warm connections **Memory-mapped I/O:** - 1GB mmap (faster than read() syscalls) - OS handles paging **Read-only mode:** - No write locks - Safe concurrent reads ## Maintainability ### Documentation **Code comments:** Minimal **README:** Basic (installation, usage) **OpenAPI spec:** Comprehensive (all endpoints documented) **No inline documentation:** ```go // No function comments func enrichTrack(track *Track) { // No explanation of enrichment logic } ``` **Recommendation:** Add godoc comments ```go // enrichTrack populates track with related entities (artists, album, track files). // It performs batch queries to minimize database round-trips. func enrichTrack(track *Track) { // ... } ``` ### Extensibility **Easy to extend:** - Add new endpoints (register route) - Add new models (define struct) - Add new queries (write SQL) **Hard to extend:** - Change rate limiting strategy (tightly coupled) - Add authentication (no middleware chain) - Add metrics (no instrumentation points) ### Technical Debt **High priority:** 1. Fix rate limiter memory leak 2. Implement proper health check 3. Add test coverage 4. Use LOG_LEVEL environment variable **Medium priority:** 1. Extract hardcoded config 2. Add metrics/monitoring 3. Implement CORS support 4. Add authentication **Low priority:** 1. Improve search performance (FTS5) 2. Add caching layer 3. Structured error responses 4. Request logging