- gRPC service with MusicBrainz provider - PostgreSQL schema with migrations - Service layer with database-first caching - Repository pattern for data access - YAML configuration support - Research documentation for 17 music metadata projects
20 KiB
Music Metadata API - Codebase Analysis
Codebase Overview
Music Metadata API is a small, focused Go codebase with minimal complexity:
Total lines of code: ~1,100 lines (excluding tests, which don't exist)
File breakdown:
cmd/server/main.go- 62 lines (entry point)internal/db/db.go- 907 lines (database layer, largest file)internal/models/models.go- 65 lines (data structures)internal/api/handlers.go- ~150 lines (HTTP handlers)internal/api/ratelimit.go- ~80 lines (rate limiting)internal/api/openapi.go- ~100 lines (OpenAPI spec)
Characteristics:
- No web framework (stdlib only)
- No ORM (raw SQL)
- No test files (zero test coverage)
- No configuration files (CLI flags only)
- Minimal dependencies (2 external packages)
Configuration
CLI Flags
Defined in: cmd/server/main.go
var (
dbPath = flag.String("db", "", "path to database file (required)")
addr = flag.String("addr", ":8080", "HTTP server address")
)
Usage:
./metadata-api -db /data/main_database.sqlite3 -addr :8080
Limitations:
- Only 2 configurable parameters
- No environment variable support
- No configuration file support
- All timeouts hardcoded
- All limits hardcoded
Hardcoded Configuration
Timeouts:
// Graceful shutdown timeout
shutdownTimeout := 10 * time.Second
// Search query timeout
ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
Rate limiting:
// Hardcoded in api/ratelimit.go
rateLimiter := NewRateLimiter(100, 200) // 100 req/s, 200 burst
Database connection pool:
// Hardcoded in db/db.go
db.SetMaxOpenConns(8)
db.SetMaxIdleConns(8)
db.SetConnMaxLifetime(0)
Search limits:
// Hardcoded in api/handlers.go
const (
minQueryLength = 2
maxSearchLimit = 50
defaultLimit = 10
)
Batch limits:
// Hardcoded in api/handlers.go
const maxBatchItems = 400
SQLite PRAGMAs:
// Hardcoded in db/db.go
dsn := fmt.Sprintf("file:%s?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true", dbPath)
Recommendation: Extract to configuration struct for flexibility.
Environment Variables
docker-compose.yml defines:
environment:
- LOG_LEVEL=info
BUG: LOG_LEVEL is not used in code. No log level control implemented.
Expected behavior: Filter logs by level (debug, info, warn, error)
Actual behavior: All logs output (no filtering)
Fix required:
// Add to main.go
logLevel := os.Getenv("LOG_LEVEL")
if logLevel == "" {
logLevel = "info"
}
var level slog.Level
switch logLevel {
case "debug":
level = slog.LevelDebug
case "info":
level = slog.LevelInfo
case "warn":
level = slog.LevelWarn
case "error":
level = slog.LevelError
}
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: level}))
Logging
Implementation
Package: Go stdlib log/slog (structured logging)
Usage pattern:
slog.Error("Database query failed", "error", err, "query", query)
Output format:
{"time":"2024-01-15T10:30:00Z","level":"ERROR","msg":"Database query failed","error":"no such table","query":"SELECT * FROM tracks"}
Logging Locations
Error logging only:
- Database query failures
- JSON decode errors
- HTTP handler errors
- Graceful shutdown errors
No info/debug logging:
- Request logging (no access logs)
- Query execution logging
- Performance metrics
- Startup messages
Example from db.go:
rows, err := d.mainDB.Query(query, args...)
if err != nil {
slog.Error("Query failed", "error", err, "query", query)
return nil, err
}
Log Level Control
Current: No log level filtering (all logs output)
Missing:
- Debug logs (query details, timing)
- Info logs (startup, shutdown, requests)
- Warn logs (rate limiting, slow queries)
Recommendation: Implement log level control via environment variable.
Health Checks
Naive Implementation
Endpoint: GET /health
Code:
func handleHealth(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}
Response:
{"status":"ok"}
Problem: Always returns 200 OK, even if database is unreachable.
Test:
# Stop database (simulate failure)
mv /data/main_database.sqlite3 /data/main_database.sqlite3.bak
# Health check still returns OK
curl http://localhost:8080/health
# {"status":"ok"}
# But actual queries fail
curl http://localhost:8080/lookup/track/abc123
# 500 Internal Server Error
Improved Health Check
Recommendation:
func handleHealth(db *sql.DB) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
// Ping database
ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
defer cancel()
if err := db.PingContext(ctx); err != nil {
w.WriteHeader(http.StatusServiceUnavailable)
json.NewEncoder(w).Encode(map[string]string{
"status": "unhealthy",
"error": "database unavailable",
})
return
}
// Optional: Test query
var count int
err := db.QueryRowContext(ctx, "SELECT COUNT(*) FROM tracks LIMIT 1").Scan(&count)
if err != nil {
w.WriteHeader(http.StatusServiceUnavailable)
json.NewEncoder(w).Encode(map[string]string{
"status": "unhealthy",
"error": "database query failed",
})
return
}
json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}
}
Rate Limiting
Implementation
File: internal/api/ratelimit.go
Algorithm: Token bucket per IP
Data structure:
type RateLimiter struct {
visitors map[string]*rate.Limiter // IP -> limiter
mu sync.RWMutex // Protects visitors map
rate rate.Limit // Tokens per second
burst int // Burst capacity
}
Middleware:
func (rl *RateLimiter) Limit(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Extract IP
ip := getIP(r)
// Get or create limiter for IP
limiter := rl.getLimiter(ip)
// Check if allowed
if !limiter.Allow() {
w.Header().Set("Retry-After", "1")
http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
return
}
next.ServeHTTP(w, r)
})
}
IP extraction:
func getIP(r *http.Request) string {
// Check X-Forwarded-For header (proxy/load balancer)
forwarded := r.Header.Get("X-Forwarded-For")
if forwarded != "" {
// Take first IP if comma-separated
ips := strings.Split(forwarded, ",")
return strings.TrimSpace(ips[0])
}
// Fallback to RemoteAddr
ip, _, _ := net.SplitHostPort(r.RemoteAddr)
return ip
}
Memory Leak
Problem: Visitor map grows unbounded. No cleanup for inactive IPs.
Code:
func (rl *RateLimiter) getLimiter(ip string) *rate.Limiter {
rl.mu.Lock()
defer rl.mu.Unlock()
limiter, exists := rl.visitors[ip]
if !exists {
limiter = rate.NewLimiter(rl.rate, rl.burst)
rl.visitors[ip] = limiter // BUG: Never removed
}
return limiter
}
Impact:
- Long-running servers accumulate IPs
- Memory usage grows over time
- No expiration for inactive IPs
Example:
- 1 million unique IPs over 1 month
- ~100 bytes per limiter
- ~100MB memory leak
Fix:
type visitor struct {
limiter *rate.Limiter
lastSeen time.Time
}
func (rl *RateLimiter) cleanup() {
ticker := time.NewTicker(1 * time.Hour)
defer ticker.Stop()
for range ticker.C {
rl.mu.Lock()
for ip, v := range rl.visitors {
// Remove visitors inactive for 24 hours
if time.Since(v.lastSeen) > 24*time.Hour {
delete(rl.visitors, ip)
}
}
rl.mu.Unlock()
}
}
// Start cleanup goroutine in NewRateLimiter
go rl.cleanup()
Rate Limit Configuration
Current: Hardcoded (100 req/s, 200 burst)
Recommendation: Make configurable via CLI flags or environment variables.
// CLI flags
var (
rateLimit = flag.Int("rate-limit", 100, "requests per second")
rateBurst = flag.Int("rate-burst", 200, "burst capacity")
)
// Usage
rateLimiter := api.NewRateLimiter(rate.Limit(*rateLimit), *rateBurst)
Search Implementation
Query Pattern
Track search:
query := `
SELECT id, name, isrc, duration_ms, popularity, album_rowid
FROM tracks
WHERE name LIKE ? COLLATE NOCASE
ORDER BY popularity DESC
LIMIT ?
`
args := []interface{}{"%" + searchQuery + "%", limit}
Artist search:
query := `
SELECT id, name, followers_total, popularity
FROM artists
WHERE name LIKE ? COLLATE NOCASE
ORDER BY followers_total DESC
LIMIT ?
`
args := []interface{}{"%" + searchQuery + "%", limit}
Performance Characteristics
LIKE %query% problems:
- Can't use indexes (full table scan)
- Slow on 256M rows
- CPU-intensive (string matching)
Benchmark (estimated):
- Common query ("love"): 5-10 seconds
- Specific query ("bohemian rhapsody"): 1-2 seconds
- Rare query ("xyzabc"): 10+ seconds (full scan)
10-second timeout:
ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
defer cancel()
rows, err := db.QueryContext(ctx, query, args...)
if err == context.DeadlineExceeded {
http.Error(w, "Search timeout", http.StatusGatewayTimeout)
return
}
Search Validation
Minimum query length:
if len(searchQuery) < 2 {
http.Error(w, "Query must be at least 2 characters", http.StatusBadRequest)
return
}
Maximum limit:
if limit > 50 {
http.Error(w, "Limit cannot exceed 50", http.StatusBadRequest)
return
}
Default limit:
limit := 10
if limitParam := r.URL.Query().Get("limit"); limitParam != "" {
limit, _ = strconv.Atoi(limitParam)
}
Full-Text Search Alternative
Not implemented: SQLite FTS5 (Full-Text Search)
FTS5 benefits:
- Indexed search (much faster)
- Relevance ranking
- Phrase search
- Boolean operators
Why not used:
- Requires writable database (to create FTS5 table)
- Databases are read-only
- Would need separate FTS5 database
Workaround:
-- Create separate FTS5 database (one-time setup)
CREATE VIRTUAL TABLE tracks_fts USING fts5(id, name, content=tracks);
INSERT INTO tracks_fts SELECT id, name FROM tracks;
-- Fast search
SELECT * FROM tracks_fts WHERE name MATCH 'bohemian';
Implementation:
- Create FTS5 database during database preparation
- Open second database connection in code
- Query FTS5 for search, then fetch full data from main DB
Testing
Test Coverage
Test files: 0
Test coverage: 0%
Test framework: None
Evidence:
# No test files in repository
find . -name "*_test.go"
# (no output)
.gitignore includes:
coverage.out
Implication: Testing was planned but never implemented.
CI/CD Testing
GitHub Actions workflow: .github/workflows/docker-publish.yml
Steps:
- Checkout code
- Build Docker image
- Push to registry
Missing: No test step
Expected workflow:
- name: Run tests
run: go test -v ./...
- name: Check coverage
run: go test -cover ./...
Manual Testing
Only testing: Manual API calls
Example:
# Health check
curl http://localhost:8080/health
# Track lookup
curl http://localhost:8080/lookup/track/abc123
# Search
curl "http://localhost:8080/search/track?q=test"
No automated testing:
- No unit tests
- No integration tests
- No end-to-end tests
- No performance tests
- No load tests
Testing Recommendations
Unit tests needed:
- Rate limiter logic
- IP extraction
- Query building
- Data enrichment
- JSON serialization
Integration tests needed:
- Database queries
- HTTP handlers
- Batch operations
- Search functionality
Example unit test:
// internal/api/ratelimit_test.go
func TestRateLimiter(t *testing.T) {
rl := NewRateLimiter(10, 20) // 10 req/s, 20 burst
// Should allow burst
for i := 0; i < 20; i++ {
if !rl.getLimiter("127.0.0.1").Allow() {
t.Errorf("Request %d should be allowed", i)
}
}
// Should reject 21st request
if rl.getLimiter("127.0.0.1").Allow() {
t.Error("Request 21 should be rate limited")
}
}
Example integration test:
// internal/db/db_test.go
func TestGetTrack(t *testing.T) {
db, err := NewDatabase("testdata/test.db")
if err != nil {
t.Fatal(err)
}
defer db.Close()
track, err := db.GetTrack("test_track_id")
if err != nil {
t.Fatal(err)
}
if track.Name != "Test Track" {
t.Errorf("Expected 'Test Track', got '%s'", track.Name)
}
}
Error Handling
Error Patterns
Database errors:
rows, err := db.Query(query, args...)
if err != nil {
slog.Error("Query failed", "error", err)
http.Error(w, "Internal server error", http.StatusInternalServerError)
return
}
JSON decode errors:
var req BatchRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, "Invalid JSON", http.StatusBadRequest)
return
}
Validation errors:
if len(query) < 2 {
http.Error(w, "Query too short", http.StatusBadRequest)
return
}
Error Responses
Generic errors:
http.Error(w, "Internal server error", http.StatusInternalServerError)
Problem: No error details returned to client (security vs usability tradeoff)
Structured errors (not implemented):
type ErrorResponse struct {
Error string `json:"error"`
Code string `json:"code"`
Details string `json:"details,omitempty"`
}
func writeError(w http.ResponseWriter, status int, code, message string) {
w.WriteHeader(status)
json.NewEncoder(w).Encode(ErrorResponse{
Error: message,
Code: code,
})
}
Code Quality
Strengths
Simplicity:
- Small codebase (~1,100 lines)
- Easy to understand
- Minimal dependencies
- No framework magic
Readability:
- Clear function names
- Logical file organization
- Consistent style
Performance:
- Batch optimization (343x fewer queries)
- Connection pooling
- Memory-mapped I/O
Weaknesses
No tests:
- Zero test coverage
- No regression protection
- No documentation via tests
Hardcoded config:
- No flexibility
- Requires recompilation to change limits
- No environment-specific config
Memory leak:
- Rate limiter visitor map grows unbounded
- Requires periodic restarts
Naive health check:
- Doesn't verify database connectivity
- False positives in monitoring
No metrics:
- No visibility into performance
- No error rate tracking
- No usage analytics
Unused config:
LOG_LEVELenvironment variable ignored- Misleading documentation
No CORS:
- Browser-based clients blocked
- Requires reverse proxy workaround
No authentication:
- Public API (security risk)
- No usage tracking per user
Code Smells
Magic numbers:
// What is 64000? Why 1073741824?
_cache_size=-64000&_mmap_size=1073741824
Fix: Use named constants
const (
sqliteCacheSizeKB = 64000 // 64MB
sqliteMmapSizeBytes = 1 << 30 // 1GB
)
Repeated code:
// Similar enrichment logic repeated for tracks, albums, artists
func enrichTrack(track *Track) { /* ... */ }
func enrichAlbum(album *Album) { /* ... */ }
func enrichArtist(artist *Artist) { /* ... */ }
Fix: Generic enrichment function
Global state:
// Rate limiter as global variable (not shown in code, but implied)
var rateLimiter *RateLimiter
Fix: Dependency injection
Dependencies
External Packages
modernc.org/sqlite v1.34.4:
- Pure Go SQLite driver
- No CGO required
- 100% Go implementation
- Larger binary size vs CGO version
golang.org/x/time v0.14.0:
- Rate limiting (token bucket)
- Part of Go extended stdlib
- Minimal, focused package
Total dependencies: 2 direct + transitive dependencies
Dependency Management
go.mod:
module github.com/Aunali321/music-metadata-api
go 1.24
require (
modernc.org/sqlite v1.34.4
golang.org/x/time v0.14.0
)
Dependency updates:
# Check for updates
go list -u -m all
# Update dependencies
go get -u ./...
go mod tidy
Security scanning:
# Scan for vulnerabilities
go list -json -m all | nancy sleuth
Code Organization
Package Structure
music-metadata-api/
├── cmd/
│ └── server/ # Entry point
│ └── main.go # CLI, server setup, graceful shutdown
│
├── internal/ # Private packages
│ ├── api/ # HTTP layer
│ │ ├── handlers.go # Route handlers
│ │ ├── ratelimit.go # Rate limiting middleware
│ │ └── openapi.go # OpenAPI spec
│ │
│ ├── db/ # Database layer
│ │ └── db.go # Queries, enrichment, batch optimization
│ │
│ └── models/ # Data models
│ └── models.go # Structs, JSON tags
│
├── Dockerfile # Container build
├── docker-compose.yml # Local deployment
├── go.mod # Dependencies
└── .github/
└── workflows/
└── docker-publish.yml # CI/CD
Separation of Concerns
Good:
- Clear layer boundaries (API → DB → Models)
- No circular dependencies
- Database logic isolated from HTTP
Could improve:
- Extract configuration to separate package
- Extract validation to separate package
- Extract error handling to separate package
Performance Characteristics
Bottlenecks
Search queries:
LIKE %query%full table scan- 10-second timeout (can be hit)
- CPU-bound (string matching)
Rate limiter:
- RWMutex contention under high load
- Map lookup on every request
Database:
- Single SQLite file (no sharding)
- 8 connection limit (conservative)
Optimizations
Batch queries:
- 343x fewer queries (400 items: 7 queries vs 2,800)
- IN clause for bulk lookups
Connection pooling:
- Reuse connections (no overhead)
- 8 warm connections
Memory-mapped I/O:
- 1GB mmap (faster than read() syscalls)
- OS handles paging
Read-only mode:
- No write locks
- Safe concurrent reads
Maintainability
Documentation
Code comments: Minimal
README: Basic (installation, usage)
OpenAPI spec: Comprehensive (all endpoints documented)
No inline documentation:
// No function comments
func enrichTrack(track *Track) {
// No explanation of enrichment logic
}
Recommendation: Add godoc comments
// enrichTrack populates track with related entities (artists, album, track files).
// It performs batch queries to minimize database round-trips.
func enrichTrack(track *Track) {
// ...
}
Extensibility
Easy to extend:
- Add new endpoints (register route)
- Add new models (define struct)
- Add new queries (write SQL)
Hard to extend:
- Change rate limiting strategy (tightly coupled)
- Add authentication (no middleware chain)
- Add metrics (no instrumentation points)
Technical Debt
High priority:
- Fix rate limiter memory leak
- Implement proper health check
- Add test coverage
- Use LOG_LEVEL environment variable
Medium priority:
- Extract hardcoded config
- Add metrics/monitoring
- Implement CORS support
- Add authentication
Low priority:
- Improve search performance (FTS5)
- Add caching layer
- Structured error responses
- Request logging