Files
metadata-agregator/docs/research/music-metadata-api/analysis/CODEBASE.md
T
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

20 KiB

Music Metadata API - Codebase Analysis

Codebase Overview

Music Metadata API is a small, focused Go codebase with minimal complexity:

Total lines of code: ~1,100 lines (excluding tests, which don't exist)

File breakdown:

  • cmd/server/main.go - 62 lines (entry point)
  • internal/db/db.go - 907 lines (database layer, largest file)
  • internal/models/models.go - 65 lines (data structures)
  • internal/api/handlers.go - ~150 lines (HTTP handlers)
  • internal/api/ratelimit.go - ~80 lines (rate limiting)
  • internal/api/openapi.go - ~100 lines (OpenAPI spec)

Characteristics:

  • No web framework (stdlib only)
  • No ORM (raw SQL)
  • No test files (zero test coverage)
  • No configuration files (CLI flags only)
  • Minimal dependencies (2 external packages)

Configuration

CLI Flags

Defined in: cmd/server/main.go

var (
    dbPath = flag.String("db", "", "path to database file (required)")
    addr   = flag.String("addr", ":8080", "HTTP server address")
)

Usage:

./metadata-api -db /data/main_database.sqlite3 -addr :8080

Limitations:

  • Only 2 configurable parameters
  • No environment variable support
  • No configuration file support
  • All timeouts hardcoded
  • All limits hardcoded

Hardcoded Configuration

Timeouts:

// Graceful shutdown timeout
shutdownTimeout := 10 * time.Second

// Search query timeout
ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)

Rate limiting:

// Hardcoded in api/ratelimit.go
rateLimiter := NewRateLimiter(100, 200)  // 100 req/s, 200 burst

Database connection pool:

// Hardcoded in db/db.go
db.SetMaxOpenConns(8)
db.SetMaxIdleConns(8)
db.SetConnMaxLifetime(0)

Search limits:

// Hardcoded in api/handlers.go
const (
    minQueryLength = 2
    maxSearchLimit = 50
    defaultLimit   = 10
)

Batch limits:

// Hardcoded in api/handlers.go
const maxBatchItems = 400

SQLite PRAGMAs:

// Hardcoded in db/db.go
dsn := fmt.Sprintf("file:%s?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true", dbPath)

Recommendation: Extract to configuration struct for flexibility.

Environment Variables

docker-compose.yml defines:

environment:
  - LOG_LEVEL=info

BUG: LOG_LEVEL is not used in code. No log level control implemented.

Expected behavior: Filter logs by level (debug, info, warn, error)

Actual behavior: All logs output (no filtering)

Fix required:

// Add to main.go
logLevel := os.Getenv("LOG_LEVEL")
if logLevel == "" {
    logLevel = "info"
}

var level slog.Level
switch logLevel {
case "debug":
    level = slog.LevelDebug
case "info":
    level = slog.LevelInfo
case "warn":
    level = slog.LevelWarn
case "error":
    level = slog.LevelError
}

logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: level}))

Logging

Implementation

Package: Go stdlib log/slog (structured logging)

Usage pattern:

slog.Error("Database query failed", "error", err, "query", query)

Output format:

{"time":"2024-01-15T10:30:00Z","level":"ERROR","msg":"Database query failed","error":"no such table","query":"SELECT * FROM tracks"}

Logging Locations

Error logging only:

  • Database query failures
  • JSON decode errors
  • HTTP handler errors
  • Graceful shutdown errors

No info/debug logging:

  • Request logging (no access logs)
  • Query execution logging
  • Performance metrics
  • Startup messages

Example from db.go:

rows, err := d.mainDB.Query(query, args...)
if err != nil {
    slog.Error("Query failed", "error", err, "query", query)
    return nil, err
}

Log Level Control

Current: No log level filtering (all logs output)

Missing:

  • Debug logs (query details, timing)
  • Info logs (startup, shutdown, requests)
  • Warn logs (rate limiting, slow queries)

Recommendation: Implement log level control via environment variable.

Health Checks

Naive Implementation

Endpoint: GET /health

Code:

func handleHealth(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}

Response:

{"status":"ok"}

Problem: Always returns 200 OK, even if database is unreachable.

Test:

# Stop database (simulate failure)
mv /data/main_database.sqlite3 /data/main_database.sqlite3.bak

# Health check still returns OK
curl http://localhost:8080/health
# {"status":"ok"}

# But actual queries fail
curl http://localhost:8080/lookup/track/abc123
# 500 Internal Server Error

Improved Health Check

Recommendation:

func handleHealth(db *sql.DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        // Ping database
        ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
        defer cancel()
        
        if err := db.PingContext(ctx); err != nil {
            w.WriteHeader(http.StatusServiceUnavailable)
            json.NewEncoder(w).Encode(map[string]string{
                "status": "unhealthy",
                "error":  "database unavailable",
            })
            return
        }
        
        // Optional: Test query
        var count int
        err := db.QueryRowContext(ctx, "SELECT COUNT(*) FROM tracks LIMIT 1").Scan(&count)
        if err != nil {
            w.WriteHeader(http.StatusServiceUnavailable)
            json.NewEncoder(w).Encode(map[string]string{
                "status": "unhealthy",
                "error":  "database query failed",
            })
            return
        }
        
        json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
    }
}

Rate Limiting

Implementation

File: internal/api/ratelimit.go

Algorithm: Token bucket per IP

Data structure:

type RateLimiter struct {
    visitors map[string]*rate.Limiter  // IP -> limiter
    mu       sync.RWMutex               // Protects visitors map
    rate     rate.Limit                 // Tokens per second
    burst    int                        // Burst capacity
}

Middleware:

func (rl *RateLimiter) Limit(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Extract IP
        ip := getIP(r)
        
        // Get or create limiter for IP
        limiter := rl.getLimiter(ip)
        
        // Check if allowed
        if !limiter.Allow() {
            w.Header().Set("Retry-After", "1")
            http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
            return
        }
        
        next.ServeHTTP(w, r)
    })
}

IP extraction:

func getIP(r *http.Request) string {
    // Check X-Forwarded-For header (proxy/load balancer)
    forwarded := r.Header.Get("X-Forwarded-For")
    if forwarded != "" {
        // Take first IP if comma-separated
        ips := strings.Split(forwarded, ",")
        return strings.TrimSpace(ips[0])
    }
    
    // Fallback to RemoteAddr
    ip, _, _ := net.SplitHostPort(r.RemoteAddr)
    return ip
}

Memory Leak

Problem: Visitor map grows unbounded. No cleanup for inactive IPs.

Code:

func (rl *RateLimiter) getLimiter(ip string) *rate.Limiter {
    rl.mu.Lock()
    defer rl.mu.Unlock()
    
    limiter, exists := rl.visitors[ip]
    if !exists {
        limiter = rate.NewLimiter(rl.rate, rl.burst)
        rl.visitors[ip] = limiter  // BUG: Never removed
    }
    
    return limiter
}

Impact:

  • Long-running servers accumulate IPs
  • Memory usage grows over time
  • No expiration for inactive IPs

Example:

  • 1 million unique IPs over 1 month
  • ~100 bytes per limiter
  • ~100MB memory leak

Fix:

type visitor struct {
    limiter  *rate.Limiter
    lastSeen time.Time
}

func (rl *RateLimiter) cleanup() {
    ticker := time.NewTicker(1 * time.Hour)
    defer ticker.Stop()
    
    for range ticker.C {
        rl.mu.Lock()
        for ip, v := range rl.visitors {
            // Remove visitors inactive for 24 hours
            if time.Since(v.lastSeen) > 24*time.Hour {
                delete(rl.visitors, ip)
            }
        }
        rl.mu.Unlock()
    }
}

// Start cleanup goroutine in NewRateLimiter
go rl.cleanup()

Rate Limit Configuration

Current: Hardcoded (100 req/s, 200 burst)

Recommendation: Make configurable via CLI flags or environment variables.

// CLI flags
var (
    rateLimit = flag.Int("rate-limit", 100, "requests per second")
    rateBurst = flag.Int("rate-burst", 200, "burst capacity")
)

// Usage
rateLimiter := api.NewRateLimiter(rate.Limit(*rateLimit), *rateBurst)

Search Implementation

Query Pattern

Track search:

query := `
    SELECT id, name, isrc, duration_ms, popularity, album_rowid
    FROM tracks
    WHERE name LIKE ? COLLATE NOCASE
    ORDER BY popularity DESC
    LIMIT ?
`
args := []interface{}{"%" + searchQuery + "%", limit}

Artist search:

query := `
    SELECT id, name, followers_total, popularity
    FROM artists
    WHERE name LIKE ? COLLATE NOCASE
    ORDER BY followers_total DESC
    LIMIT ?
`
args := []interface{}{"%" + searchQuery + "%", limit}

Performance Characteristics

LIKE %query% problems:

  • Can't use indexes (full table scan)
  • Slow on 256M rows
  • CPU-intensive (string matching)

Benchmark (estimated):

  • Common query ("love"): 5-10 seconds
  • Specific query ("bohemian rhapsody"): 1-2 seconds
  • Rare query ("xyzabc"): 10+ seconds (full scan)

10-second timeout:

ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
defer cancel()

rows, err := db.QueryContext(ctx, query, args...)
if err == context.DeadlineExceeded {
    http.Error(w, "Search timeout", http.StatusGatewayTimeout)
    return
}

Search Validation

Minimum query length:

if len(searchQuery) < 2 {
    http.Error(w, "Query must be at least 2 characters", http.StatusBadRequest)
    return
}

Maximum limit:

if limit > 50 {
    http.Error(w, "Limit cannot exceed 50", http.StatusBadRequest)
    return
}

Default limit:

limit := 10
if limitParam := r.URL.Query().Get("limit"); limitParam != "" {
    limit, _ = strconv.Atoi(limitParam)
}

Full-Text Search Alternative

Not implemented: SQLite FTS5 (Full-Text Search)

FTS5 benefits:

  • Indexed search (much faster)
  • Relevance ranking
  • Phrase search
  • Boolean operators

Why not used:

  • Requires writable database (to create FTS5 table)
  • Databases are read-only
  • Would need separate FTS5 database

Workaround:

-- Create separate FTS5 database (one-time setup)
CREATE VIRTUAL TABLE tracks_fts USING fts5(id, name, content=tracks);
INSERT INTO tracks_fts SELECT id, name FROM tracks;

-- Fast search
SELECT * FROM tracks_fts WHERE name MATCH 'bohemian';

Implementation:

  • Create FTS5 database during database preparation
  • Open second database connection in code
  • Query FTS5 for search, then fetch full data from main DB

Testing

Test Coverage

Test files: 0
Test coverage: 0%
Test framework: None

Evidence:

# No test files in repository
find . -name "*_test.go"
# (no output)

.gitignore includes:

coverage.out

Implication: Testing was planned but never implemented.

CI/CD Testing

GitHub Actions workflow: .github/workflows/docker-publish.yml

Steps:

  1. Checkout code
  2. Build Docker image
  3. Push to registry

Missing: No test step

Expected workflow:

- name: Run tests
  run: go test -v ./...

- name: Check coverage
  run: go test -cover ./...

Manual Testing

Only testing: Manual API calls

Example:

# Health check
curl http://localhost:8080/health

# Track lookup
curl http://localhost:8080/lookup/track/abc123

# Search
curl "http://localhost:8080/search/track?q=test"

No automated testing:

  • No unit tests
  • No integration tests
  • No end-to-end tests
  • No performance tests
  • No load tests

Testing Recommendations

Unit tests needed:

  • Rate limiter logic
  • IP extraction
  • Query building
  • Data enrichment
  • JSON serialization

Integration tests needed:

  • Database queries
  • HTTP handlers
  • Batch operations
  • Search functionality

Example unit test:

// internal/api/ratelimit_test.go
func TestRateLimiter(t *testing.T) {
    rl := NewRateLimiter(10, 20)  // 10 req/s, 20 burst
    
    // Should allow burst
    for i := 0; i < 20; i++ {
        if !rl.getLimiter("127.0.0.1").Allow() {
            t.Errorf("Request %d should be allowed", i)
        }
    }
    
    // Should reject 21st request
    if rl.getLimiter("127.0.0.1").Allow() {
        t.Error("Request 21 should be rate limited")
    }
}

Example integration test:

// internal/db/db_test.go
func TestGetTrack(t *testing.T) {
    db, err := NewDatabase("testdata/test.db")
    if err != nil {
        t.Fatal(err)
    }
    defer db.Close()
    
    track, err := db.GetTrack("test_track_id")
    if err != nil {
        t.Fatal(err)
    }
    
    if track.Name != "Test Track" {
        t.Errorf("Expected 'Test Track', got '%s'", track.Name)
    }
}

Error Handling

Error Patterns

Database errors:

rows, err := db.Query(query, args...)
if err != nil {
    slog.Error("Query failed", "error", err)
    http.Error(w, "Internal server error", http.StatusInternalServerError)
    return
}

JSON decode errors:

var req BatchRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
    http.Error(w, "Invalid JSON", http.StatusBadRequest)
    return
}

Validation errors:

if len(query) < 2 {
    http.Error(w, "Query too short", http.StatusBadRequest)
    return
}

Error Responses

Generic errors:

http.Error(w, "Internal server error", http.StatusInternalServerError)

Problem: No error details returned to client (security vs usability tradeoff)

Structured errors (not implemented):

type ErrorResponse struct {
    Error   string `json:"error"`
    Code    string `json:"code"`
    Details string `json:"details,omitempty"`
}

func writeError(w http.ResponseWriter, status int, code, message string) {
    w.WriteHeader(status)
    json.NewEncoder(w).Encode(ErrorResponse{
        Error: message,
        Code:  code,
    })
}

Code Quality

Strengths

Simplicity:

  • Small codebase (~1,100 lines)
  • Easy to understand
  • Minimal dependencies
  • No framework magic

Readability:

  • Clear function names
  • Logical file organization
  • Consistent style

Performance:

  • Batch optimization (343x fewer queries)
  • Connection pooling
  • Memory-mapped I/O

Weaknesses

No tests:

  • Zero test coverage
  • No regression protection
  • No documentation via tests

Hardcoded config:

  • No flexibility
  • Requires recompilation to change limits
  • No environment-specific config

Memory leak:

  • Rate limiter visitor map grows unbounded
  • Requires periodic restarts

Naive health check:

  • Doesn't verify database connectivity
  • False positives in monitoring

No metrics:

  • No visibility into performance
  • No error rate tracking
  • No usage analytics

Unused config:

  • LOG_LEVEL environment variable ignored
  • Misleading documentation

No CORS:

  • Browser-based clients blocked
  • Requires reverse proxy workaround

No authentication:

  • Public API (security risk)
  • No usage tracking per user

Code Smells

Magic numbers:

// What is 64000? Why 1073741824?
_cache_size=-64000&_mmap_size=1073741824

Fix: Use named constants

const (
    sqliteCacheSizeKB = 64000  // 64MB
    sqliteMmapSizeBytes = 1 << 30  // 1GB
)

Repeated code:

// Similar enrichment logic repeated for tracks, albums, artists
func enrichTrack(track *Track) { /* ... */ }
func enrichAlbum(album *Album) { /* ... */ }
func enrichArtist(artist *Artist) { /* ... */ }

Fix: Generic enrichment function

Global state:

// Rate limiter as global variable (not shown in code, but implied)
var rateLimiter *RateLimiter

Fix: Dependency injection

Dependencies

External Packages

modernc.org/sqlite v1.34.4:

  • Pure Go SQLite driver
  • No CGO required
  • 100% Go implementation
  • Larger binary size vs CGO version

golang.org/x/time v0.14.0:

  • Rate limiting (token bucket)
  • Part of Go extended stdlib
  • Minimal, focused package

Total dependencies: 2 direct + transitive dependencies

Dependency Management

go.mod:

module github.com/Aunali321/music-metadata-api

go 1.24

require (
    modernc.org/sqlite v1.34.4
    golang.org/x/time v0.14.0
)

Dependency updates:

# Check for updates
go list -u -m all

# Update dependencies
go get -u ./...
go mod tidy

Security scanning:

# Scan for vulnerabilities
go list -json -m all | nancy sleuth

Code Organization

Package Structure

music-metadata-api/
├── cmd/
│   └── server/          # Entry point
│       └── main.go      # CLI, server setup, graceful shutdown
│
├── internal/            # Private packages
│   ├── api/             # HTTP layer
│   │   ├── handlers.go  # Route handlers
│   │   ├── ratelimit.go # Rate limiting middleware
│   │   └── openapi.go   # OpenAPI spec
│   │
│   ├── db/              # Database layer
│   │   └── db.go        # Queries, enrichment, batch optimization
│   │
│   └── models/          # Data models
│       └── models.go    # Structs, JSON tags
│
├── Dockerfile           # Container build
├── docker-compose.yml   # Local deployment
├── go.mod               # Dependencies
└── .github/
    └── workflows/
        └── docker-publish.yml  # CI/CD

Separation of Concerns

Good:

  • Clear layer boundaries (API → DB → Models)
  • No circular dependencies
  • Database logic isolated from HTTP

Could improve:

  • Extract configuration to separate package
  • Extract validation to separate package
  • Extract error handling to separate package

Performance Characteristics

Bottlenecks

Search queries:

  • LIKE %query% full table scan
  • 10-second timeout (can be hit)
  • CPU-bound (string matching)

Rate limiter:

  • RWMutex contention under high load
  • Map lookup on every request

Database:

  • Single SQLite file (no sharding)
  • 8 connection limit (conservative)

Optimizations

Batch queries:

  • 343x fewer queries (400 items: 7 queries vs 2,800)
  • IN clause for bulk lookups

Connection pooling:

  • Reuse connections (no overhead)
  • 8 warm connections

Memory-mapped I/O:

  • 1GB mmap (faster than read() syscalls)
  • OS handles paging

Read-only mode:

  • No write locks
  • Safe concurrent reads

Maintainability

Documentation

Code comments: Minimal

README: Basic (installation, usage)

OpenAPI spec: Comprehensive (all endpoints documented)

No inline documentation:

// No function comments
func enrichTrack(track *Track) {
    // No explanation of enrichment logic
}

Recommendation: Add godoc comments

// enrichTrack populates track with related entities (artists, album, track files).
// It performs batch queries to minimize database round-trips.
func enrichTrack(track *Track) {
    // ...
}

Extensibility

Easy to extend:

  • Add new endpoints (register route)
  • Add new models (define struct)
  • Add new queries (write SQL)

Hard to extend:

  • Change rate limiting strategy (tightly coupled)
  • Add authentication (no middleware chain)
  • Add metrics (no instrumentation points)

Technical Debt

High priority:

  1. Fix rate limiter memory leak
  2. Implement proper health check
  3. Add test coverage
  4. Use LOG_LEVEL environment variable

Medium priority:

  1. Extract hardcoded config
  2. Add metrics/monitoring
  3. Implement CORS support
  4. Add authentication

Low priority:

  1. Improve search performance (FTS5)
  2. Add caching layer
  3. Structured error responses
  4. Request logging