# Music Metadata API - Overview

## Project Identity

**Name:** Music Metadata API

**Repository:** https://github.com/Aunali321/music-metadata-api

**License:** MIT

**Language:** Go 1.24

**Maintainer:** Single maintainer (Aunali321)

**Status:** Active, production-ready

## Purpose

Music Metadata API provides a self-hosted HTTP service for querying metadata on 256 million music tracks. The service operates entirely from pre-populated SQLite databases, requiring no external API calls at runtime. It's designed as a high-performance alternative to commercial music metadata APIs like Spotify's Web API.

## Core Technology Stack

### Runtime Dependencies

| Component | Version | Purpose | Notes |
|-----------|---------|---------|-------|
| Go | 1.24 | Runtime & stdlib HTTP server | Uses Go 1.22+ enhanced routing |
| modernc.org/sqlite | v1.34.4 | Pure Go SQLite driver | No CGO required |
| golang.org/x/time | v0.14.0 | Rate limiting (token bucket) | Only other external dependency |
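
### Build Configuration

```bash
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server
```

**Flags explained:**
- `CGO_ENABLED=0`: Pure Go binary, no C dependencies
- `-s -w`: Omit the symbol table and DWARF debug information (smaller binary)
- `-o metadata-api`: Output binary name used in the examples below (without it, `go build` names the binary `server`, after the package directory)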

## Data Scale

### Database Files

| Database | Size | Purpose | Records |
|----------|------|---------|---------|
| main_database.sqlite3 | ~117GB | Core metadata (tracks, albums, artists) | 256M tracks |
| track_files.sqlite3 | ~99GB | Extended track data (lyrics flags, languages, roles) | 256M track files |
| **Total** | **~216GB** | Combined storage requirement | - |

### Dataset Coverage

- **256 million tracks** across all databases
- Album metadata with images, labels, release dates
- Artist metadata with genres, follower counts, popularity scores
- ISRC codes for track identification
- Multi-language support (language_of_performance field)
- Artist role information (performer, composer, etc.)
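
ISRCs (ISO 3901) are 12 characters long:

- a two-letter country code
- a three-character alphanumeric registrant code
- two digits for the year of reference
- a five-digit designation code

They are often displayed with hyphens (`US-RC1-76-07839`), so a client feeding ISRC lookups might normalize input first. The hypothetical `normalizeISRC` helper below sketches that step. It is not part of the project, and it checks shape only, not whether a code was ever assigned.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// isrcPattern: 2 letters (country), 3 alphanumerics (registrant),
// then 7 digits (2 for year of reference + 5 for the designation code).
var isrcPattern = regexp.MustCompile(`^[A-Z]{2}[A-Z0-9]{3}[0-9]{7}$`)

// stripSeparators removes the hyphens and spaces used in display forms.
var stripSeparators = strings.NewReplacer("-", "", " ", "")

// normalizeISRC returns the compact upper-case form and whether it is well formed.
func normalizeISRC(s string) (string, bool) {
	s = strings.ToUpper(stripSeparators.Replace(strings.TrimSpace(s)))
	if !isrcPattern.MatchString(s) {
		return "", false
	}
	return s, true
}

func main() {
	for _, in := range []string{"us-rc1-76-07839", "USRC17607839", "US-RC1-76-0783"} {
		code, ok := normalizeISRC(in)
		fmt.Println(in, "->", code, ok)
	}
}
```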

## Entry Points

### Command Line

**Entry point:** `cmd/server/main.go` (62 lines)

**Flags:**
```text
  -db string
        Path to main database file (REQUIRED)

  -addr string
        HTTP server address (default ":8080")
```

**Example:**
```bash
./metadata-api -db /data/main_database.sqlite3 -addr :8080
```
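
These two flags map naturally onto Go's standard `flag` package. Below is a minimal sketch of how `-db` (required) and `-addr` (default `:8080`) could be parsed. The `config` type and `parseFlags` helper are hypothetical, and the real `main.go` may differ. For string flags with a non-empty default, `flag` appends `(default ":8080")` to the usage text on its own.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// config holds the two settings the server accepts (hypothetical type).
type config struct {
	DBPath string
	Addr   string
}

// parseFlags uses its own FlagSet so it can be exercised with arbitrary args.
func parseFlags(args []string) (config, error) {
	fs := flag.NewFlagSet("metadata-api", flag.ContinueOnError)
	var cfg config
	fs.StringVar(&cfg.DBPath, "db", "", "Path to main database file (REQUIRED)")
	fs.StringVar(&cfg.Addr, "addr", ":8080", "HTTP server address")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	// flag has no notion of required flags, so enforce -db manually.
	if cfg.DBPath == "" {
		return config{}, errors.New("-db is required")
	}
	return cfg, nil
}

func main() {
	// The real binary would pass os.Args[1:]; fixed args keep the demo deterministic.
	cfg, err := parseFlags([]string{"-db", "/data/main_database.sqlite3"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("db=%s addr=%s\n", cfg.DBPath, cfg.Addr)
}
```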

### Docker

**Image:** `ghcr.io/aunali321/music-metadata-api:latest`

**Base:** Alpine Linux 3.21

**docker-compose.yml:**
```yaml
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    environment:
      - LOG_LEVEL=info # NOTE: Not actually used in code
    command: ["-db", "/data/main_database.sqlite3"]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
```

## Architecture Layers

### Directory Structure

```
music-metadata-api/
├── cmd/
│   └── server/
│       └── main.go          # Entry point (62 lines)
├── internal/
│   ├── api/                 # HTTP handlers, routing, middleware
│   │   ├── handlers.go
│   │   ├── ratelimit.go
│   │   └── openapi.go
│   ├── db/
│   │   └── db.go            # Database layer (907 lines)
│   └── models/
│       └── models.go        # Data structures (65 lines)
├── Dockerfile
├── docker-compose.yml
└── .github/
    └── workflows/
        └── docker-publish.yml
```

### Layer Responsibilities

**API Layer** (`internal/api/`)
- HTTP request handling
- Rate limiting (token bucket, per-IP)
- OpenAPI specification serving
- Swagger UI hosting

**Database Layer** (`internal/db/`)
- SQLite connection management
- Query execution
- Data enrichment (joining related entities)
- Batch optimization

**Models Layer** (`internal/models/`)
- Data structure definitions
- JSON serialization tags
- Response formatting

## Key Features

### Performance Optimizations

1. **Read-only databases** - No write locks, safe concurrent reads
2. **Conservative PRAGMAs** - Optimized for read-heavy workloads
3. **Batch endpoints** - Process up to 400 items per request
4. **Connection pooling** - MaxOpenConns=8 for controlled resource usage
5. **Memory-mapped I/O** - 1GB mmap for faster reads
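
Several of these settings are applied when the database handle is opened. Below is a minimal sketch using `database/sql`, built on several assumptions:

- The driver is `modernc.org/sqlite`, which registers itself as `"sqlite"` and accepts repeated `_pragma=name(value)` DSN parameters.
- The driver passes `mode=ro` through as an SQLite URI parameter.
- The `query_only` PRAGMA, the 1 GiB reading of "1GB", and the helper names are illustrative choices, not the contents of `db.go`.

```go
package main

import (
	"database/sql"
	"fmt"
)

const (
	mmapBytes    = 1 << 30 // 1 GiB of memory-mapped I/O (assumed reading of "1GB")
	maxOpenConns = 8       // matches MaxOpenConns=8
)

// readOnlyDSN builds an SQLite URI that opens the file read-only and runs
// PRAGMAs on each new connection (assumes modernc.org/sqlite's _pragma syntax).
// The path is not URL-escaped, so keep it free of '?' and '#'.
func readOnlyDSN(path string) string {
	return fmt.Sprintf("file:%s?mode=ro&_pragma=query_only(1)&_pragma=mmap_size(%d)", path, mmapBytes)
}

// openReadOnly opens the database and bounds the pool. The caller must register
// a driver, e.g. import _ "modernc.org/sqlite" (driver name "sqlite").
func openReadOnly(driverName, path string) (*sql.DB, error) {
	db, err := sql.Open(driverName, readOnlyDSN(path))
	if err != nil {
		return nil, err // e.g. unknown driver
	}
	db.SetMaxOpenConns(maxOpenConns)
	db.SetMaxIdleConns(maxOpenConns) // reuse idle connections instead of reopening
	if err := db.Ping(); err != nil { // sql.Open is lazy; Ping forces a real connection
		db.Close()
		return nil, err
	}
	return db, nil
}

func main() {
	fmt.Println(readOnlyDSN("/data/main_database.sqlite3"))
	// No driver is imported in this sketch, so Open reports an unknown driver.
	if _, err := openReadOnly("sqlite", "/data/main_database.sqlite3"); err != nil {
		fmt.Println("open:", err)
	}
}
```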

### API Capabilities

- **Batch lookup** - Retrieve multiple tracks/albums/artists in a single request
- **ISRC lookup** - Industry-standard track identification
- **Search** - Text search on tracks and artists (`LIKE`-based; no full-text index)
- **Relationship traversal** - Album tracks, artist albums, track artists
- **OpenAPI documentation** - Interactive Swagger UI at `/docs`
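
Batch lookups against SQLite typically turn N IDs into one `WHERE id IN (?, ?, ...)` query instead of N round trips. Below is a sketch of that pattern with the 400-item cap. The `tracks` table, the `id`/`name` columns, and the `buildBatchQuery` helper are hypothetical. A cap of 400 also stays well under SQLite's default bound-parameter limit: 999 before SQLite 3.32.0, 32766 from 3.32.0 on.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const maxBatch = 400 // per-request cap of the batch endpoints

// buildBatchQuery returns one parameterized query plus its bind args for up to
// maxBatch IDs. Table and column names are hypothetical.
func buildBatchQuery(ids []string) (string, []any, error) {
	if len(ids) == 0 {
		return "", nil, errors.New("no ids")
	}
	if len(ids) > maxBatch {
		return "", nil, fmt.Errorf("too many ids: %d > %d", len(ids), maxBatch)
	}
	// "?,?,?" with one placeholder per ID; values are bound, never interpolated.
	placeholders := strings.TrimSuffix(strings.Repeat("?,", len(ids)), ",")
	args := make([]any, len(ids))
	for i, id := range ids {
		args[i] = id
	}
	q := "SELECT id, name FROM tracks WHERE id IN (" + placeholders + ")"
	return q, args, nil
}

func main() {
	q, args, err := buildBatchQuery([]string{"a", "b", "c"})
	fmt.Println(q, args, err)
}
```

The caller would then run `db.QueryContext(ctx, q, args...)`. Because `IN` does not preserve input order, results would need reordering to match the request.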

### Operational Features

- **Graceful shutdown** - 10-second timeout for in-flight requests
- **Health checks** - `/health` endpoint for monitoring
- **Rate limiting** - 100 req/s with 200 burst capacity
- **Structured logging** - Go stdlib `log/slog` for error tracking

## Deployment Models

### Standalone Binary

**Pros:**
- Single executable, no dependencies
- Minimal resource footprint
- Direct filesystem access to databases

**Cons:**
- Manual process management
- No automatic restarts
- Manual log rotation

### Docker Container

**Pros:**
- Consistent runtime environment
- Built-in health checks
- Automatic restarts
- Easy horizontal scaling

**Cons:**
- Requires Docker runtime
- Additional layer of abstraction
- Volume mount for large databases

## Use Cases

### Primary Use Cases

1. **Music library enrichment** - Add metadata to existing track collections
2. **ISRC-based lookup** - Resolve ISRCs to full track metadata
3. **Batch processing** - Enrich large catalogs efficiently
4. **Self-hosted alternative** - Replace commercial APIs with a local service

### Integration Scenarios

- **Metadata aggregator pipelines** - Complement MusicBrainz with Spotify-style data
- **Music streaming services** - Populate track/album/artist information
- **DJ software** - Enrich track libraries with popularity, genres, images
- **Music analytics** - Analyze trends across 256M tracks

## Limitations

### Technical Constraints

- **Database size** - Requires ~216GB of disk space
- **No write operations** - Read-only, no data updates
- **No authentication** - Public API, no access control
- **No CORS** - No CORS headers are sent, so cross-origin browser clients are blocked
- **Memory leak** - Rate limiter visitor map grows unbounded

### Data Constraints

- **Database provenance unclear** - The project states only that it is "Not affiliated with Spotify"; the data's origin is not documented
- **No freshness mechanism** - Static snapshot, no updates
- **Search performance** - `LIKE` queries are slow on large datasets (no FTS index)
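
A substring match such as `name LIKE '%term%'` starts with a wildcard. SQLite therefore cannot use a B-tree index and scans every row, which is expensive at 256M tracks. User input also needs `%` and `_` escaped so it matches literally. Below is a hypothetical helper for building the pattern safely. The table and column names in the comment are assumptions. Escaping does not make the scan faster; that would take a full-text index such as SQLite's FTS5.

```go
package main

import (
	"fmt"
	"strings"
)

// likeEscaper escapes LIKE wildcards (and the escape char itself) so user input
// matches literally; the query must declare ESCAPE '\'.
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// containsPattern builds the bound argument for a query such as
//   SELECT id, name FROM artists WHERE name LIKE ? ESCAPE '\' LIMIT 50
// (hypothetical table/columns). The leading % still forces a full scan.
func containsPattern(term string) string {
	return "%" + likeEscaper.Replace(term) + "%"
}

func main() {
	fmt.Println(containsPattern("100%_pure"))
}
```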

### Operational Constraints

- **No metrics** - No Prometheus, no counters
- **Naive health check** - Doesn't verify database connectivity
- **Hardcoded config** - Timeouts, limits not configurable
- **No tests** - Zero test coverage

## Project Maturity

### Strengths

- Clean, simple codebase
- Production-ready Docker setup
- Comprehensive OpenAPI spec
- Massive dataset (256M tracks)
- Pure Go (no CGO complexity)

### Weaknesses

- Single maintainer
- No test suite
- No CI test step
- Unused config (LOG_LEVEL)
- Memory leak in rate limiter

## Comparison to Alternatives

| Feature | Music Metadata API | Spotify Web API | MusicBrainz API |
|---------|-------------------|-----------------|-----------------|
| Self-hosted | Yes | No | Possible (self-hosted mirror server) |
| Authentication | None | OAuth required | Optional |
| Dataset size | 256M tracks | Full catalog | ~40M recordings |
| Rate limits | 100 req/s per IP | Varies by tier | 1 req/s |
| Batch support | 400 items | Up to 50 items (20 for albums) | Limited |
| Cost | Free (MIT) | Free tier limited | Free |
| Data freshness | Static | Real-time | Community-updated |
| Identifier | ISRC, internal IDs | Spotify IDs | MBIDs |

## Getting Started

### Minimum Requirements

1. Go 1.24+ (for building from source)
2. ~216GB of disk space for databases
3. Database files (not included in repository)
4. 2GB+ RAM recommended

### Quick Start

```bash
# Clone repository
git clone https://github.com/Aunali321/music-metadata-api.git
cd music-metadata-api

# Build binary
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server

# Run server (assumes databases in /data); runs in the foreground
./metadata-api -db /data/main_database.sqlite3 -addr :8080

# Test health endpoint (from a second terminal)
curl http://localhost:8080/health

# View API documentation (macOS `open`; use xdg-open on Linux)
open http://localhost:8080/docs
```

### Docker Quick Start

```bash
# Pull image
docker pull ghcr.io/aunali321/music-metadata-api:latest

# Run container
docker run -d \
  -p 8080:8080 \
  -v /path/to/databases:/data:ro \
  ghcr.io/aunali321/music-metadata-api:latest \
  -db /data/main_database.sqlite3

# Check health
curl http://localhost:8080/health
```

## Documentation Resources

- **OpenAPI Spec:** http://localhost:8080/openapi.yaml
- **Interactive Docs:** http://localhost:8080/docs
- **GitHub Repository:** https://github.com/Aunali321/music-metadata-api
- **Docker Image:** ghcr.io/aunali321/music-metadata-api

## License

MIT License - Free for commercial and personal use with attribution.