Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


# Music Metadata API - Overview
## Project Identity
**Name:** Music Metadata API
**Repository:** https://github.com/Aunali321/music-metadata-api
**License:** MIT
**Language:** Go 1.24
**Maintainer:** Single maintainer (Aunali321)
**Status:** Active, production-ready
## Purpose
Music Metadata API provides a self-hosted HTTP service for querying metadata on 256 million music tracks. The service operates entirely from pre-populated SQLite databases, requiring no external API calls at runtime. It's designed as a high-performance alternative to commercial music metadata APIs like Spotify's Web API.
## Core Technology Stack
### Runtime Dependencies
| Component | Version | Purpose | Notes |
|-----------|---------|---------|-------|
| Go | 1.24 | Runtime & stdlib HTTP server | Uses Go 1.22+ enhanced routing |
| modernc.org/sqlite | v1.34.4 | Pure Go SQLite driver | No CGO required |
| golang.org/x/time | v0.14.0 | Rate limiting (token bucket) | Only other external dependency |
### Build Configuration
```bash
CGO_ENABLED=0 go build -ldflags="-s -w" ./cmd/server
```
**Flags explained:**
- `CGO_ENABLED=0`: Pure Go binary, no C dependencies
- `-s -w`: `-s` omits the symbol table, `-w` omits DWARF debug information (smaller binary)
## Data Scale
### Database Files
| Database | Size | Purpose | Records |
|----------|------|---------|---------|
| main_database.sqlite3 | ~117GB | Core metadata (tracks, albums, artists) | 256M tracks |
| track_files.sqlite3 | ~99GB | Extended track data (lyrics flags, languages, roles) | 256M track files |
| **Total** | **~216GB** | Combined storage requirement | - |
### Dataset Coverage
- **256 million tracks** across all databases
- Album metadata with images, labels, release dates
- Artist metadata with genres, follower counts, popularity scores
- ISRC codes for track identification
- Multi-language support (language_of_performance field)
- Artist role information (performer, composer, etc.)
## Entry Points
### Command Line
**Entry point source:** `cmd/server/main.go` (62 lines)
**Flags:**
```bash
-db string
    Path to main database file (REQUIRED)
-addr string
    HTTP server address (default ":8080")
```
**Example:**
```bash
./metadata-api -db /data/main_database.sqlite3 -addr :8080
```
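The two flags above could be parsed along the following lines. This is a minimal sketch using only the standard `flag` package, not the project's actual `main.go`; the `config` type and `parseFlags` function are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// config holds the two documented command-line options.
// (Hypothetical type; the real main.go may be organized differently.)
type config struct {
	dbPath string
	addr   string
}

// parseFlags parses args (without the program name) and rejects a missing -db.
func parseFlags(args []string) (config, error) {
	fs := flag.NewFlagSet("metadata-api", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides how to report errors
	var cfg config
	fs.StringVar(&cfg.dbPath, "db", "", "Path to main database file (REQUIRED)")
	fs.StringVar(&cfg.addr, "addr", ":8080", "HTTP server address")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	if cfg.dbPath == "" {
		return config{}, errors.New("missing required flag: -db")
	}
	return cfg, nil
}

func main() {
	// Fixed example arguments instead of os.Args, so the sketch runs as-is.
	cfg, err := parseFlags([]string{"-db", "/data/main_database.sqlite3"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("would open %s and listen on %s\n", cfg.dbPath, cfg.addr)
}
```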
### Docker
**Image:** `ghcr.io/aunali321/music-metadata-api:latest`
**Base:** Alpine Linux 3.21
**docker-compose.yml:**
```yaml
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    environment:
      - LOG_LEVEL=info  # NOTE: Not actually used in code
    command: ["-db", "/data/main_database.sqlite3"]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
```
## Architecture Layers
### Directory Structure
```
music-metadata-api/
├── cmd/
│   └── server/
│       └── main.go           # Entry point (62 lines)
├── internal/
│   ├── api/                  # HTTP handlers, routing, middleware
│   │   ├── handlers.go
│   │   ├── ratelimit.go
│   │   └── openapi.go
│   ├── db/
│   │   └── db.go             # Database layer (907 lines)
│   └── models/
│       └── models.go         # Data structures (65 lines)
├── Dockerfile
├── docker-compose.yml
└── .github/
    └── workflows/
        └── docker-publish.yml
```
```
### Layer Responsibilities
**API Layer** (`internal/api/`)
- HTTP request handling
- Rate limiting (token bucket, per-IP)
- OpenAPI specification serving
- Swagger UI hosting
**Database Layer** (`internal/db/`)
- SQLite connection management
- Query execution
- Data enrichment (joining related entities)
- Batch optimization
**Models Layer** (`internal/models/`)
- Data structure definitions
- JSON serialization tags
- Response formatting
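
As an illustration of what "JSON serialization tags" means in practice, the sketch below defines a hypothetical `Track` struct. The field names are assumptions for illustration and are not copied from the project's `models.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Track is a hypothetical response model. The struct tags control the JSON
// key names, and omitempty drops the ISRC key when the value is empty.
type Track struct {
	ID         string   `json:"id"`
	Name       string   `json:"name"`
	ISRC       string   `json:"isrc,omitempty"`
	Popularity int      `json:"popularity"`
	Artists    []string `json:"artists"`
}

// encodeTrack serializes a Track to its JSON representation.
func encodeTrack(t Track) (string, error) {
	b, err := json.Marshal(t)
	return string(b), err
}

func main() {
	out, err := encodeTrack(Track{ID: "t1", Name: "Example", Artists: []string{"Someone"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```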
## Key Features
### Performance Optimizations
1. **Read-only databases** - No write locks, safe concurrent reads
2. **Conservative PRAGMAs** - Optimized for read-heavy workloads
3. **Batch endpoints** - Process up to 400 items per request
4. **Connection pooling** - MaxOpenConns=8 for controlled resource usage
5. **Memory-mapped I/O** - 1GB mmap for faster reads
### API Capabilities
- **Batch lookup** - Retrieve multiple tracks/albums/artists in single request
- **ISRC lookup** - Industry-standard track identification
- **Search** - Substring search (SQL `LIKE`, no FTS index) on tracks and artists
- **Relationship traversal** - Album tracks, artist albums, track artists
- **OpenAPI documentation** - Interactive Swagger UI at `/docs`
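
Callers doing ISRC lookups may want to normalize input before sending it. An ISRC is 12 characters: a two-letter prefix, a three-character alphanumeric registrant code, a two-digit year, and a five-digit designation, often written with hyphens. The sketch below is a client-side helper under that assumption; `normalizeISRC` is a hypothetical name, and whether the API itself accepts hyphenated input is not stated in this document.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// isrcPattern matches the compact 12-character ISRC form.
var isrcPattern = regexp.MustCompile(`^[A-Z]{2}[A-Z0-9]{3}[0-9]{7}$`)

// normalizeISRC upper-cases the input, strips hyphens and surrounding
// whitespace, and reports whether the result is a well-formed ISRC.
func normalizeISRC(raw string) (string, bool) {
	s := strings.ToUpper(strings.TrimSpace(raw))
	s = strings.ReplaceAll(s, "-", "")
	if !isrcPattern.MatchString(s) {
		return "", false
	}
	return s, true
}

func main() {
	for _, in := range []string{"us-rc1-76-07839", "USRC17607839", "not-an-isrc"} {
		code, ok := normalizeISRC(in)
		fmt.Printf("%q -> %q (valid: %v)\n", in, code, ok)
	}
}
```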
### Operational Features
- **Graceful shutdown** - 10-second timeout for in-flight requests
- **Health checks** - `/health` endpoint for monitoring
- **Rate limiting** - 100 req/s per client IP, with a burst capacity of 200
- **Structured logging** - Go stdlib `log/slog` for error tracking
## Deployment Models
### Standalone Binary
**Pros:**
- Single executable, no dependencies
- Minimal resource footprint
- Direct filesystem access to databases
**Cons:**
- Manual process management
- No automatic restarts
- Manual log rotation
### Docker Container
**Pros:**
- Consistent runtime environment
- Built-in health checks
- Automatic restarts
- Easy horizontal scaling
**Cons:**
- Requires Docker runtime
- Additional layer of abstraction
- Volume mount for large databases
## Use Cases
### Primary Use Cases
1. **Music library enrichment** - Add metadata to existing track collections
2. **ISRC-based lookup** - Resolve ISRCs to full track metadata
3. **Batch processing** - Enrich large catalogs efficiently
4. **Self-hosted alternative** - Replace commercial APIs with local service
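
For batch processing, a client has to split a large catalog into requests that respect the 400-item limit. The sketch below shows only that client-side chunking step; `chunk` and `maxBatch` are hypothetical names, and the request itself is omitted because endpoint paths are not listed in this document.

```go
package main

import "fmt"

// maxBatch mirrors the 400-item batch limit described in this document.
const maxBatch = 400

// chunk splits items into consecutive slices of at most size elements.
// The returned slices share memory with items.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	ids := make([]string, 1000)
	for i := range ids {
		ids[i] = fmt.Sprintf("id-%04d", i)
	}
	for i, batch := range chunk(ids, maxBatch) {
		fmt.Printf("batch %d: %d ids\n", i, len(batch)) // 400, 400, 200
	}
}
```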
### Integration Scenarios
- **Metadata aggregator pipelines** - Complement MusicBrainz with Spotify-style data
- **Music streaming services** - Populate track/album/artist information
- **DJ software** - Enrich track libraries with popularity, genres, images
- **Music analytics** - Analyze trends across 256M tracks
## Limitations
### Technical Constraints
- **Database size** - Requires 216GB disk space
- **No write operations** - Read-only, no data updates
- **No authentication** - Public API, no access control
- **No CORS** - No CORS headers are sent, so cross-origin browser clients are blocked
- **Memory leak** - Rate limiter visitor map grows unbounded
### Data Constraints
- **Database provenance unclear** - "Not affiliated with Spotify"
- **No freshness mechanism** - Static snapshot, no updates
- **Search performance** - LIKE queries slow on large datasets (no FTS)
### Operational Constraints
- **No metrics** - No Prometheus, no counters
- **Naive health check** - Doesn't verify database connectivity
- **Hardcoded config** - Timeouts, limits not configurable
- **No tests** - Zero test coverage
## Project Maturity
### Strengths
- Clean, simple codebase
- Production-ready Docker setup
- Comprehensive OpenAPI spec
- Massive dataset (256M tracks)
- Pure Go (no CGO complexity)
### Weaknesses
- Single maintainer
- No test suite
- No CI test step
- Unused config (LOG_LEVEL)
- Memory leak in rate limiter
## Comparison to Alternatives
| Feature | Music Metadata API | Spotify Web API | MusicBrainz API |
|---------|-------------------|-----------------|-----------------|
| Self-hosted | Yes | No | Possible (mirror server) |
| Authentication | None | OAuth required | Optional |
| Dataset size | 256M tracks | Full catalog | ~40M recordings |
| Rate limits | 100 req/s | Varies by tier | 1 req/s |
| Batch support | 400 items | 50 items | Limited |
| Cost | Free (MIT) | Free tier limited | Free |
| Data freshness | Static | Real-time | Community-updated |
| Identifier | ISRC, internal IDs | Spotify IDs | MBIDs |
## Getting Started
### Minimum Requirements
1. Go 1.24+ (for building from source)
2. 216GB disk space for databases
3. Database files (not included in repository)
4. 2GB+ RAM recommended
### Quick Start
```bash
# Clone repository
git clone https://github.com/Aunali321/music-metadata-api.git
cd music-metadata-api
# Build binary
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server
# Run server (assumes databases in /data)
./metadata-api -db /data/main_database.sqlite3 -addr :8080
# Test health endpoint
curl http://localhost:8080/health
# View API documentation (`open` is macOS-specific; use `xdg-open` on Linux)
open http://localhost:8080/docs
```
### Docker Quick Start
```bash
# Pull image
docker pull ghcr.io/aunali321/music-metadata-api:latest
# Run container
docker run -d \
  -p 8080:8080 \
  -v /path/to/databases:/data:ro \
  ghcr.io/aunali321/music-metadata-api:latest \
  -db /data/main_database.sqlite3
# Check health
curl http://localhost:8080/health
```
## Documentation Resources
- **OpenAPI Spec:** http://localhost:8080/openapi.yaml
- **Interactive Docs:** http://localhost:8080/docs
- **GitHub Repository:** https://github.com/Aunali321/music-metadata-api
- **Docker Image:** ghcr.io/aunali321/music-metadata-api
## License
MIT License. Free for commercial and personal use, provided the copyright notice and license text are retained in copies.