# Music Metadata API - Overview

## Project Identity

- **Name:** Music Metadata API
- **Repository:** https://github.com/Aunali321/music-metadata-api
- **License:** MIT
- **Language:** Go 1.24
- **Maintainer:** Single maintainer (Aunali321)
- **Status:** Active, production-ready

## Purpose

Music Metadata API provides a self-hosted HTTP service for querying metadata on 256 million music tracks. The service operates entirely from pre-populated SQLite databases and makes no external API calls at runtime. It is designed as a high-performance alternative to commercial music metadata APIs such as Spotify's Web API.

## Core Technology Stack

### Runtime Dependencies

| Component | Version | Purpose | Notes |
|-----------|---------|---------|-------|
| Go | 1.24 | Runtime & stdlib HTTP server | Uses Go 1.22+ enhanced routing |
| modernc.org/sqlite | v1.34.4 | Pure Go SQLite driver | No CGO required |
| golang.org/x/time | v0.14.0 | Rate limiting (token bucket) | Only other external dependency |

### Build Configuration

```bash
CGO_ENABLED=0 go build -ldflags="-s -w" ./cmd/server
```

**Flags explained:**

- `CGO_ENABLED=0`: Pure Go binary, no C dependencies
- `-s -w`: Omit the symbol table (`-s`) and DWARF debug information (`-w`) for a smaller binary

## Data Scale

### Database Files

| Database | Size | Purpose | Records |
|----------|------|---------|---------|
| main_database.sqlite3 | ~117GB | Core metadata (tracks, albums, artists) | 256M tracks |
| track_files.sqlite3 | ~99GB | Extended track data (lyrics flags, languages, roles) | 256M track files |
| **Total** | **~216GB** | Combined storage requirement | - |

### Dataset Coverage

- **256 million tracks** across all databases
- Album metadata with images, labels, release dates
- Artist metadata with genres, follower counts, popularity scores
- ISRC codes for track identification
- Multi-language support (`language_of_performance` field)
- Artist role information (performer, composer, etc.)
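
As a rough capacity-planning aid, the figures in the Database Files table can be turned into an average per-track footprint. The sketch below assumes decimal gigabytes (10^9 bytes) and treats the approximate sizes as exact; the constant and function names are illustrative and not part of the project.

```go
package main

import "fmt"

// Figures taken from the Database Files table. Sizes are approximate and
// interpreted as decimal gigabytes (an assumption of this sketch).
const (
	mainDBBytes     = 117e9 // main_database.sqlite3
	trackFilesBytes = 99e9  // track_files.sqlite3
	trackCount      = 256e6 // tracks in the dataset
)

// bytesPerTrack returns the average on-disk bytes per track for a given size.
func bytesPerTrack(dbBytes float64) float64 {
	return dbBytes / trackCount
}

func main() {
	total := mainDBBytes + trackFilesBytes
	fmt.Printf("total storage: %.0f GB\n", total/1e9)
	fmt.Printf("main database: %.0f bytes per track\n", bytesPerTrack(mainDBBytes))
	fmt.Printf("track files:   %.0f bytes per track\n", bytesPerTrack(trackFilesBytes))
	fmt.Printf("combined:      %.0f bytes per track\n", bytesPerTrack(total))
}
```

With these inputs the combined average comes to roughly 844 bytes per track, which gives a sense of how disk usage would scale with catalog size.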
## Entry Points

### Command Line

**Entry point:** `cmd/server/main.go` (62 lines)

**Flags:**

```bash
-db string     Path to main database file (REQUIRED)
-addr string   HTTP server address (default ":8080")
```

**Example:**

```bash
./metadata-api -db /data/main_database.sqlite3 -addr :8080
```

### Docker

- **Image:** `ghcr.io/aunali321/music-metadata-api:latest`
- **Base:** Alpine Linux 3.21

**docker-compose.yml:**

```yaml
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    environment:
      - LOG_LEVEL=info  # NOTE: Not actually used in code
    command: ["-db", "/data/main_database.sqlite3"]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
```

## Architecture Layers

### Directory Structure

```
music-metadata-api/
├── cmd/
│   └── server/
│       └── main.go              # Entry point (62 lines)
├── internal/
│   ├── api/                     # HTTP handlers, routing, middleware
│   │   ├── handlers.go
│   │   ├── ratelimit.go
│   │   └── openapi.go
│   ├── db/
│   │   └── db.go                # Database layer (907 lines)
│   └── models/
│       └── models.go            # Data structures (65 lines)
├── Dockerfile
├── docker-compose.yml
└── .github/
    └── workflows/
        └── docker-publish.yml
```

### Layer Responsibilities

**API Layer** (`internal/api/`)

- HTTP request handling
- Rate limiting (token bucket, per-IP)
- OpenAPI specification serving
- Swagger UI hosting

**Database Layer** (`internal/db/`)

- SQLite connection management
- Query execution
- Data enrichment (joining related entities)
- Batch optimization

**Models Layer** (`internal/models/`)

- Data structure definitions
- JSON serialization tags
- Response formatting

## Key Features

### Performance Optimizations

1. **Read-only databases** - No write locks, safe concurrent reads
2. **Conservative PRAGMAs** - Optimized for read-heavy workloads
3. **Batch endpoints** - Process up to 400 items per request
4. **Connection pooling** - MaxOpenConns=8 for controlled resource usage
5. **Memory-mapped I/O** - 1GB mmap for faster reads

### API Capabilities

- **Batch lookup** - Retrieve multiple tracks/albums/artists in a single request
- **ISRC lookup** - Industry-standard track identification
- **Search** - Text search on tracks and artists (LIKE-based, not FTS; see Limitations)
- **Relationship traversal** - Album tracks, artist albums, track artists
- **OpenAPI documentation** - Interactive Swagger UI at `/docs`

### Operational Features

- **Graceful shutdown** - 10-second timeout for in-flight requests
- **Health checks** - `/health` endpoint for monitoring
- **Rate limiting** - 100 req/s with 200 burst capacity
- **Structured logging** - Go stdlib `log/slog` for error tracking

## Deployment Models

### Standalone Binary

**Pros:**

- Single executable, no dependencies
- Minimal resource footprint
- Direct filesystem access to databases

**Cons:**

- Manual process management
- No automatic restarts
- Manual log rotation

### Docker Container

**Pros:**

- Consistent runtime environment
- Built-in health checks
- Automatic restarts
- Easy horizontal scaling

**Cons:**

- Requires Docker runtime
- Additional layer of abstraction
- Volume mount for large databases

## Use Cases

### Primary Use Cases

1. **Music library enrichment** - Add metadata to existing track collections
2. **ISRC-based lookup** - Resolve ISRCs to full track metadata
3. **Batch processing** - Enrich large catalogs efficiently
4. **Self-hosted alternative** - Replace commercial APIs with a local service

### Integration Scenarios

- **Metadata aggregator pipelines** - Complement MusicBrainz with Spotify-style data
- **Music streaming services** - Populate track/album/artist information
- **DJ software** - Enrich track libraries with popularity, genres, images
- **Music analytics** - Analyze trends across 256M tracks

## Limitations

### Technical Constraints

- **Database size** - Requires 216GB disk space
- **No write operations** - Read-only, no data updates
- **No authentication** - Public API, no access control
- **No CORS** - Browser-based clients blocked
- **Memory leak** - Rate limiter visitor map grows unbounded

### Data Constraints

- **Database provenance unclear** - "Not affiliated with Spotify"
- **No freshness mechanism** - Static snapshot, no updates
- **Search performance** - LIKE queries slow on large datasets (no FTS)

### Operational Constraints

- **No metrics** - No Prometheus, no counters
- **Naive health check** - Doesn't verify database connectivity
- **Hardcoded config** - Timeouts, limits not configurable
- **No tests** - Zero test coverage

## Project Maturity

### Strengths

- Clean, simple codebase
- Production-ready Docker setup
- Comprehensive OpenAPI spec
- Massive dataset (256M tracks)
- Pure Go (no CGO complexity)

### Weaknesses

- Single maintainer
- No test suite
- No CI test step
- Unused config (LOG_LEVEL)
- Memory leak in rate limiter

## Comparison to Alternatives

| Feature | Music Metadata API | Spotify Web API | MusicBrainz API |
|---------|-------------------|-----------------|-----------------|
| Self-hosted | Yes | No | No |
| Authentication | None | OAuth required | Optional |
| Dataset size | 256M tracks | Full catalog | ~40M recordings |
| Rate limits | 100 req/s | Varies by tier | 1 req/s |
| Batch support | 400 items | 50 items | Limited |
| Cost | Free (MIT) | Free tier limited | Free |
| Data freshness | Static | Real-time | Community-updated |
| Identifier | ISRC, internal IDs | Spotify IDs | MBIDs |

## Getting Started

### Minimum Requirements

1. Go 1.24+ (for building from source)
2. 216GB disk space for databases
3. Database files (not included in repository)
4. 2GB+ RAM recommended

### Quick Start

```bash
# Clone repository
git clone https://github.com/Aunali321/music-metadata-api.git
cd music-metadata-api

# Build binary
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server

# Run server (assumes databases in /data)
./metadata-api -db /data/main_database.sqlite3 -addr :8080

# Test health endpoint
curl http://localhost:8080/health

# View API documentation (macOS; use xdg-open on Linux)
open http://localhost:8080/docs
```

### Docker Quick Start

```bash
# Pull image
docker pull ghcr.io/aunali321/music-metadata-api:latest

# Run container
docker run -d \
  -p 8080:8080 \
  -v /path/to/databases:/data:ro \
  ghcr.io/aunali321/music-metadata-api:latest \
  -db /data/main_database.sqlite3

# Check health
curl http://localhost:8080/health
```

## Documentation Resources

- **OpenAPI Spec:** http://localhost:8080/openapi.yaml
- **Interactive Docs:** http://localhost:8080/docs
- **GitHub Repository:** https://github.com/Aunali321/music-metadata-api
- **Docker Image:** ghcr.io/aunali321/music-metadata-api

## License

MIT License - Free for commercial and personal use with attribution.
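
Clients that enrich large catalogs need to stay within the 400-item batch limit noted under Performance Optimizations. A minimal client-side sketch of splitting an ID list into request-sized batches might look like the following; the `chunk` helper and the placeholder IDs are hypothetical, and no particular endpoint or request format is assumed.

```go
package main

import "fmt"

// maxBatch is the per-request item limit stated in this document.
const maxBatch = 400

// chunk splits ids into consecutive slices of at most size elements.
// It returns nil for an empty input or a non-positive size.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	// Placeholder IDs for illustration only; real IDs come from the dataset.
	ids := make([]string, 1000)
	for i := range ids {
		ids[i] = fmt.Sprintf("id-%d", i)
	}
	// Each batch would be sent as one request to a batch endpoint.
	for i, batch := range chunk(ids, maxBatch) {
		fmt.Printf("batch %d: %d ids\n", i, len(batch))
	}
}
```

For 1,000 IDs this yields three batches of 400, 400, and 200 items, so each request stays at or under the documented limit.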