Music Metadata API - Evaluation
Executive Summary
Music Metadata API is a simple, focused, self-contained service for querying metadata on 256 million music tracks. It excels at batch lookups and ISRC-based queries but lacks authentication, testing, and real-time data updates.
Best for: Self-hosted metadata enrichment, high-volume batch processing, ISRC resolution
Not suitable for: Real-time data, production systems requiring authentication, mission-critical applications without testing
Strengths
1. Massive Dataset
256 million tracks across two SQLite databases (~216GB)
Coverage:
- Tracks with ISRC codes
- Albums with artwork, labels, release dates
- Artists with genres, follower counts, popularity
- Extended metadata (lyrics flags, languages, artist roles)
Comparison:
- Spotify Web API: Full catalog (real-time)
- MusicBrainz: ~40M recordings
- Discogs: ~15M releases
Value: Comprehensive coverage for metadata enrichment without API rate limits.
2. Extremely Simple Architecture
No framework, no ORM, minimal dependencies:
- Go stdlib for HTTP, JSON, database
- 2 external packages (sqlite driver, rate limiter)
- ~1,100 lines of code
- Single binary deployment
Benefits:
- Easy to understand and modify
- Fast compilation
- No framework lock-in
- Minimal attack surface
Comparison:
- Typical web service: 10+ dependencies, framework overhead
- Music Metadata API: 2 external dependencies, everything else from the Go standard library
3. High-Performance Batch API
Batch endpoint: Process up to 400 items per request
Performance gain:
- Individual requests: 400 × ~50ms = 20 seconds
- Batch request: ~200-500ms total
- 40-100x faster
Query optimization:
- Without batching: 2,800+ queries for 400 tracks
- With batching: 7 queries for 400 tracks
- 400x fewer queries
Use case: Enriching large music libraries efficiently.
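The query savings come from resolving a whole batch with one parameterized `IN (...)` query per table, instead of several queries per track. A minimal sketch, assuming a hypothetical `tracks` table with `id`, `isrc`, and `name` columns (the service's real schema is not shown in this evaluation). It only builds the SQL and its arguments for `database/sql`:

```go
package main

import (
	"fmt"
	"strings"
)

// placeholders returns "?,?,...,?" with n bind parameters (empty for n <= 0).
func placeholders(n int) string {
	if n <= 0 {
		return ""
	}
	return strings.TrimSuffix(strings.Repeat("?,", n), ",")
}

// batchQuery builds one parameterized query that resolves every ISRC in the
// batch at once. Table and column names are hypothetical; callers should
// reject empty or oversized batches before calling it.
func batchQuery(isrcs []string) (string, []any) {
	args := make([]any, len(isrcs))
	for i, isrc := range isrcs {
		args[i] = isrc
	}
	q := "SELECT id, isrc, name FROM tracks WHERE isrc IN (" + placeholders(len(isrcs)) + ")"
	return q, args
}

func main() {
	q, args := batchQuery([]string{"AA1234500001", "AA1234500002"})
	fmt.Println(q)
	fmt.Println(len(args), "bound parameters")
	// With database/sql: rows, err := db.QueryContext(ctx, q, args...)
}
```

A batch of 400 stays well below SQLite's bound-parameter limit (999 by default before SQLite 3.32.0, 32766 since), so the batch size never runs into it.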
4. Pure Go (No CGO)
CGO_ENABLED=0 - No C dependencies
Benefits:
- Cross-compilation trivial (GOOS/GOARCH)
- No C toolchain required
- Smaller attack surface
- Easier deployment (static binary)
Tradeoff: Larger binary size vs CGO SQLite driver (~2MB vs ~500KB)
5. Read-Only Safety
Databases opened in read-only mode:
- No accidental writes
- No risk of the service itself corrupting the data
- Safe concurrent reads
- No write locks
Connection options (SQLite URI parameter plus PRAGMA settings):
- mode=ro
- _journal_mode=off
- _query_only=true
Benefit: Multiple instances can share database files safely.
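The settings above usually travel together as one SQLite connection URI. A small sketch, assuming the underscore-prefixed options are driver-level DSN parameters (their exact names vary between Go SQLite drivers) and using a hypothetical `/data/tracks.db` path:

```go
package main

import (
	"fmt"
	"net/url"
)

// readOnlyDSN combines the read-only settings into one SQLite URI.
// Assumption: the driver in use accepts these parameter names.
func readOnlyDSN(path string) string {
	q := url.Values{}
	q.Set("mode", "ro")           // SQLite URI parameter: open the file read-only
	q.Set("_journal_mode", "off") // no journal is needed when nothing is written
	q.Set("_query_only", "true")  // PRAGMA query_only: reject any write statement
	return "file:" + path + "?" + q.Encode() // Encode sorts keys alphabetically
}

func main() {
	fmt.Println(readOnlyDSN("/data/tracks.db"))
	// Then: sql.Open(<registered SQLite driver name>, readOnlyDSN(path))
}
```

Paths containing `?` or `#` would need escaping before being spliced in this way.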
6. OpenAPI Documentation
Comprehensive OpenAPI 3.1 spec:
- All endpoints documented
- Request/response schemas
- Example payloads
- Interactive Swagger UI at /docs
Value: Self-documenting API, easy integration.
7. MIT License
Permissive license:
- Free for commercial use
- Attribution required: the copyright and license notice must be kept in copies
- Modify and redistribute freely
Comparison:
- Spotify Web API: Proprietary, rate limited
- MusicBrainz: CC0 (core data; supplementary data CC BY-NC-SA), GPL (server)
8. Easy Deployment
Multiple deployment options:
- Standalone binary (single executable)
- Docker container (official image)
- Kubernetes (example manifests)
- Cloud platforms (ECS, Cloud Run, ACI)
Minimal requirements:
- 216GB disk (databases)
- 4GB RAM
- 1 CPU core
No external dependencies:
- No database server (SQLite embedded)
- No cache server (SQLite cache)
- No message queue
- No authentication service
Weaknesses
1. Zero Test Coverage
No test files, no test framework, no CI testing
Risks:
- No regression protection
- Bugs discovered in production
- Difficult to refactor safely
- No documentation via tests
Evidence:
- .gitignore includes coverage.out (testing planned but not implemented)
- GitHub Actions workflow has no test step
Impact: High risk for production use without extensive manual testing.
2. No Authentication
Public API with no access control:
- No OAuth
- No API keys
- No rate limiting per user (only per IP)
- No usage tracking per user
Risks:
- Abuse (unlimited queries)
- No accountability
- No quota enforcement
- Data scraping
Workarounds:
- Deploy behind reverse proxy with auth (nginx, Caddy)
- Use API gateway (Kong, Tyk)
- Implement custom middleware
Impact: Not suitable for public internet deployment without additional security layer.
3. Naive Health Check
Health endpoint always returns OK:
```go
// Always reports "ok"; never touches either database
func handleHealth(w http.ResponseWriter, r *http.Request) {
	json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}
```
Problem: Doesn't verify database connectivity
Scenario:
- Database file deleted/corrupted
- Health check returns 200 OK
- Actual queries fail with 500 errors
- Monitoring systems don't detect failure
Impact: False positives in monitoring, delayed incident detection.
4. Rate Limiter Memory Leak
Visitor map grows unbounded:
```go
type RateLimiter struct {
	visitors map[string]*rate.Limiter // one limiter per client IP; entries are never removed
	mu       sync.RWMutex
}
```
Impact:
- Long-running servers accumulate IPs
- Memory usage grows over time
- 1M unique IPs = ~100MB leak
Workaround: Restart server periodically
Fix required: Implement visitor cleanup (remove inactive IPs after 24 hours)
5. No CORS Support
No CORS headers:
- Browser-based clients blocked
- Can't call from web apps directly
- OPTIONS preflight requests fail
Workarounds:
- Add CORS middleware (custom implementation)
- Use server-side proxy
- Deploy API on same origin as web app
Impact: Limited to server-side integrations.
6. No Metrics/Monitoring
No instrumentation:
- No Prometheus metrics
- No request counters
- No latency histograms
- No error rate tracking
Visibility gaps:
- Can't track usage patterns
- Can't identify slow endpoints
- Can't detect error spikes
- No performance baselines
Workarounds:
- Parse logs for metrics
- Use reverse proxy metrics (nginx)
- Implement custom metrics middleware
Impact: Blind operation, difficult to optimize.
7. Database Provenance Unclear
Repository disclaimer:
"This project is not affiliated with Spotify."
Concerns:
- Data source unclear (likely scraped)
- Legal status uncertain
- No official Spotify endorsement
- Potential copyright issues
Risks:
- Takedown requests
- Legal liability
- Data quality unknown
- No support/updates
Recommendation: Verify legal compliance before production use.
8. No Data Freshness Mechanism
Static snapshot:
- No update mechanism
- Data frozen at time of database creation
- No real-time sync with Spotify
Staleness:
- New releases not included
- Popularity scores outdated
- Artist follower counts stale
- Deleted tracks still present
Workarounds:
- Periodically obtain updated database (if available)
- Complement with real-time APIs for fresh data
- Treat as historical snapshot
Impact: Not suitable for applications requiring current data.
9. Search Performance
LIKE %query% on 256M rows:
- Full table scan (can't use indexes)
- 10-second timeout (can be hit)
- CPU-intensive
Slow searches:
- Common words ("love", "the"): 5-10 seconds
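- Rare queries: 10+ seconds (full scan, so the 10-second timeout is hit)
Alternative: SQLite FTS5 (Full-Text Search)
- Building the FTS5 index requires write access, which the read-only databases do not allow
- Would need a separate FTS5 database, built once and then opened read-only
Impact: Substring search is too slow for general use; lookups by ID or ISRC remain the dependable access path.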
10. Hardcoded Configuration
All limits/timeouts hardcoded:
- Rate limit: 100 req/s, 200 burst
- Search timeout: 10 seconds
- Batch limit: 400 items
- Connection pool: 8 connections
- SQLite cache: 64MB
Problems:
- No flexibility
- Requires recompilation to change
- No environment-specific config
Workaround: Fork and modify code
Impact: Limited adaptability to different workloads.
Use Case Evaluation
Ideal Use Cases
1. Music Library Enrichment
Scenario: Enrich local music library with metadata
Flow:
- Extract ISRCs from file tags, or fingerprint audio with AcoustID and resolve the matched MusicBrainz recordings to ISRCs
- Batch lookup ISRCs (400 at a time)
- Store metadata in local database
- Display in music player UI
Why suitable:
- Batch API optimized for bulk lookups
- ISRC-based lookup (industry standard)
- No API rate limits (self-hosted)
- Comprehensive metadata (genres, images, popularity)
Example:
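```python
import requests

def chunks(items, size):  # yields successive slices of at most `size` items
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Enrich 10,000 tracks
isrcs = extract_isrcs_from_library()  # your own helper: 10,000 ISRCs
# Batch lookup (25 requests for 10,000 tracks)
for batch in chunks(isrcs, 400):
    response = requests.post("http://localhost:8080/batch/lookup", json={"isrcs": batch})
    response.raise_for_status()
    store_metadata(response.json())  # your own helper
```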
2. Metadata Aggregator Pipeline
Scenario: Combine data from multiple sources (MusicBrainz + Music Metadata API)
Flow:
- Query MusicBrainz for recording by MBID
- Extract ISRC from MusicBrainz response
- Lookup ISRC in Music Metadata API
- Merge metadata (MusicBrainz credits + Spotify-style data)
Why suitable:
- Complements MusicBrainz (different data models)
- ISRC as common key
- Fast batch lookups
- No external API dependencies
Example:
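```python
import requests

# Get MusicBrainz data (`musicbrainz` stands for whatever MusicBrainz client you use)
mb_data = musicbrainz.get_recording(mbid)
if not mb_data.get('isrcs'):
    raise ValueError(f"recording {mbid} has no ISRC")
isrc = mb_data['isrcs'][0]

# Get Spotify-style data
resp = requests.get(f"http://localhost:8080/lookup/isrc/{isrc}")
resp.raise_for_status()
mm_data = resp.json()

# Merge
merged = {
    "mbid": mbid,
    "isrc": isrc,
    "title": mm_data['name'],
    "popularity": mm_data['popularity'],
    "credits": mb_data['artist-credit'],
    "genres": mm_data['artists'][0]['genres'],
}
```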
3. Self-Hosted Alternative to Spotify API
Scenario: Replace Spotify Web API with local service
Why suitable:
- No OAuth complexity
- No API rate limits
- No per-request costs
- Batch support (400 items vs Spotify's 50)
Tradeoffs:
- Static data (no real-time updates)
- Database size (216GB)
- No write operations
Example:
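```python
import requests

# Spotify Web API (rate limited, requires OAuth); spotify_client is an authenticated client such as spotipy.Spotify
spotify_data = spotify_client.search(q=f"isrc:{isrc}", type="track")
# Music Metadata API (no auth; only the service's own per-IP rate limit)
mm_data = requests.get(f"http://localhost:8080/lookup/isrc/{isrc}").json()
```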
4. DJ Software Metadata Provider
Scenario: Enrich DJ library with popularity, genres, images
Why suitable:
- Batch processing for large libraries
- Popularity scores for track selection
- Genre tags for filtering
- Album artwork for UI
Example:
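```python
import requests

# Enrich DJ library
tracks = load_dj_library()  # your own loader: 5,000 tracks
isrcs = [t.isrc for t in tracks if t.isrc]  # skip tracks without an ISRC
# Batch lookup; chunks(items, n) yields successive n-item slices
for batch in chunks(isrcs, 400):
    response = requests.post("http://localhost:8080/batch/lookup", json={"isrcs": batch})
    response.raise_for_status()
    update_dj_library(response.json())
```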
Unsuitable Use Cases
1. Real-Time Music Discovery App
Why unsuitable:
- Static data (no new releases)
- Outdated popularity scores
- No personalization
- No user-specific data
Alternative: Spotify Web API, Apple Music API
2. Public-Facing API Service
Why unsuitable:
- No authentication (abuse risk)
- No usage tracking
- No quota enforcement
- Rate limiter memory leak
Alternative: Add authentication layer or use managed API service
3. Mission-Critical Production System
Why unsuitable:
- Zero test coverage
- Naive health check
- Memory leak
- No metrics
Alternative: Extensive testing + monitoring before production use
4. Applications Requiring Fresh Data
Why unsuitable:
- Static snapshot (no updates)
- Stale popularity/follower counts
- Missing new releases
Alternative: Spotify Web API, MusicBrainz (community-updated)
Integration Evaluation
Complementary Services
Works well with:
- MusicBrainz: Different data models, ISRC as common key
- AcoustID: Fingerprint to MusicBrainz recording (and its ISRCs), then look up metadata
- Local music libraries: Enrich with metadata
- DJ software: Popularity, genres, artwork
Conflicts with:
- Spotify Web API: Overlapping data, but Music Metadata API is static
- Real-time services: Music Metadata API data is stale
Integration Complexity
Easy integrations:
- HTTP client (any language)
- Batch processing pipelines
- Local applications
Complex integrations:
- Browser-based apps (no CORS)
- Authenticated services (no auth)
- Real-time systems (static data)
Performance Evaluation
Throughput
Batch endpoint:
- 400 items per request
- ~200-500ms per request
- 800-2,000 items/second (single instance)
Individual endpoints:
- ~50ms per request
- Rate limited to 100 req/s
- At most 100 items/second per client IP (the rate limit; one sequential client at ~50ms per request reaches ~20/second)
Scaling:
- Horizontal: Run multiple instances (read-only safe)
- Vertical: More RAM (larger cache), faster disk (SSD)
Latency
Typical latencies:
- Track lookup: 10-50ms
- Album lookup: 10-50ms
- Artist lookup: 10-50ms
- Batch lookup (400 items): 200-500ms
- Search: 1-10 seconds (depends on query)
Bottlenecks:
- Search queries (LIKE %query%)
- Disk I/O (use SSD)
- Rate limiter (RWMutex contention)
Resource Usage
Disk: 216GB (databases)
RAM: 2.5GB (SQLite cache + mmap) + 1.5GB (app/OS) = 4GB minimum
CPU: 1 core minimum, 2+ recommended (search queries CPU-intensive)
Scaling costs:
- 10 instances = 2.16TB storage (expensive)
- Shared filesystem (NFS, EFS) reduces storage cost but increases latency
Security Evaluation
Vulnerabilities
High severity:
- No authentication: Anyone can query API
- No rate limiting per user: IP-based only (easily bypassed)
Medium severity:
- Memory leak: Rate limiter grows unbounded
- No input validation: malformed input reaches the query layer (SQL injection itself is prevented by parameterized queries)
Low severity:
- No HTTPS: Deploy behind reverse proxy with TLS
- No CORS headers: browsers block cross-origin calls, which limits browser-based abuse (and legitimate web clients)
Mitigations
Authentication:
- Deploy behind reverse proxy with auth (nginx, Caddy)
- Use API gateway (Kong, Tyk)
Rate limiting:
- Implement per-user rate limiting (requires auth)
- Use distributed rate limiter (Redis)
Memory leak:
- Restart server periodically
- Implement visitor cleanup
HTTPS:
- Terminate TLS at reverse proxy
- Use Let's Encrypt for free certificates
Reliability Evaluation
Failure Modes
Database unavailable:
- Health check returns OK (false positive)
- Queries fail with 500 errors
- No automatic recovery
Memory exhaustion:
- Rate limiter leak accumulates
- OOM kill by OS
- Service restart required
Disk full:
- SQLite read-only (no writes)
- No impact on service
Network partition:
- No external dependencies
- Service continues (self-contained)
Recovery
Automatic recovery:
- Graceful shutdown on SIGINT/SIGTERM
- Docker/Kubernetes restart on failure
Manual recovery:
- Restart service (clears rate limiter leak)
- Restore database from backup
- Check database integrity (PRAGMA integrity_check)
High Availability
Strategies:
- Run multiple instances (read-only safe)
- Load balancer distributes traffic
- Health checks route around failures (but naive health check is a problem)
Limitations:
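- No shared state: rate limiters are per instance, so N instances behind a load balancer allow a client IP up to N × the configured limit
- Database replication: copy the 216GB of files to each instance (or use a shared filesystem)
Benefit: No session affinity required; any instance can serve any request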
Cost Evaluation
Infrastructure Costs
Single instance:
- Compute: $20-50/month (2 CPU, 8GB RAM)
- Storage: $20-40/month (250GB SSD)
- Network: $5-10/month (1TB transfer)
- Total: $45-100/month
10 instances (high availability):
- Compute: $200-500/month
- Storage: $200-400/month (2.5TB SSD, or shared filesystem)
- Network: $50-100/month
- Total: $450-1,000/month
Comparison:
- Spotify Web API: No per-request fees, but usage is bounded by rate limits, quota approval, and Spotify's Developer Terms
- MusicBrainz: Free (donations encouraged)
Development Costs
Initial setup:
- Deploy service: 1-2 hours
- Obtain databases: Unknown (not in repository)
- Test integration: 2-4 hours
- Total: 3-6 hours, plus the time needed to obtain the databases
Ongoing maintenance:
- Monitor service: 1-2 hours/month
- Update databases: Unknown (no update mechanism)
- Security patches: 1-2 hours/month
- Total: 2-4 hours/month
Total Cost of Ownership
Year 1:
- Infrastructure: $540-1,200 (single instance)
- Development: $400-800 (setup + 12 months maintenance)
- Total: $940-2,000
Comparison:
- Spotify Web API: No direct fees (costs are OAuth integration work and staying within rate limits)
- MusicBrainz: $0 (free, donations encouraged)
Recommendation Matrix
| Use Case | Suitability | Reasoning |
|---|---|---|
| Music library enrichment | ⭐⭐⭐⭐⭐ | Ideal: Batch API, ISRC lookup, no rate limits |
| Metadata aggregator | ⭐⭐⭐⭐⭐ | Ideal: Complements MusicBrainz, fast lookups |
| Self-hosted alternative | ⭐⭐⭐⭐ | Good: No auth complexity, but static data |
| DJ software integration | ⭐⭐⭐⭐ | Good: Popularity, genres, artwork |
| Real-time music app | ⭐⭐ | Poor: Static data, no updates |
| Public API service | ⭐⭐ | Poor: No auth, no metrics, memory leak |
| Mission-critical system | ⭐ | Very poor: No tests, naive health check |
| Fresh data required | ⭐ | Very poor: Static snapshot, no updates |
Legend:
- ⭐⭐⭐⭐⭐ Ideal
- ⭐⭐⭐⭐ Good
- ⭐⭐⭐ Acceptable
- ⭐⭐ Poor
- ⭐ Very poor
Final Verdict
Overall Rating: 7/10
Breakdown:
- Functionality: 9/10 (comprehensive metadata, batch API)
- Performance: 8/10 (fast batch, slow search)
- Reliability: 5/10 (no tests, memory leak, naive health check)
- Security: 4/10 (no auth, no metrics)
- Maintainability: 6/10 (simple code, but no tests)
- Documentation: 8/10 (OpenAPI spec, but minimal code comments)
Strengths Summary
- Massive dataset (256M tracks)
- Simple architecture (no framework)
- High-performance batch API (400 items/request)
- Pure Go (no CGO)
- Read-only safety
- OpenAPI documentation
- MIT license
- Easy deployment
Weaknesses Summary
- Zero test coverage
- No authentication
- Naive health check
- Rate limiter memory leak
- No CORS
- No metrics
- Database provenance unclear
- No data freshness
- Slow search (LIKE %query%)
- Hardcoded configuration
Recommendation
Use Music Metadata API if:
- You need to enrich large music libraries (batch processing)
- You want ISRC-based lookups without API rate limits
- You can tolerate static data (no real-time updates)
- You can deploy behind reverse proxy (for auth/CORS)
- You can implement monitoring (metrics, proper health checks)
- You can accept legal uncertainty (database provenance)
Don't use Music Metadata API if:
- You need real-time data (use Spotify Web API)
- You need production-grade reliability (no tests)
- You need authentication out-of-the-box
- You need fresh data (new releases, current popularity)
- You can't tolerate 216GB storage requirement
Improvement Priorities
Critical (before production):
- Add test coverage (unit + integration tests)
- Fix rate limiter memory leak
- Implement proper health check (verify database)
- Add authentication (or deploy behind auth proxy)
High priority:
- Add metrics/monitoring (Prometheus)
- Implement CORS support
- Extract hardcoded config (environment variables)
- Use LOG_LEVEL environment variable
Medium priority:
- Improve search performance (FTS5)
- Add request logging
- Structured error responses
- Documentation (code comments)
Low priority:
- Caching layer (Redis)
- Horizontal scaling improvements
- Database update mechanism
- Admin API (stats, cache control)