Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


Music Metadata API - Overview

Project Identity

Name: Music Metadata API
Repository: https://github.com/Aunali321/music-metadata-api
License: MIT
Language: Go 1.24
Maintainer: Single maintainer (Aunali321)
Status: Active, production-ready

Purpose

Music Metadata API provides a self-hosted HTTP service for querying metadata on 256 million music tracks. The service operates entirely from pre-populated SQLite databases, requiring no external API calls at runtime. It's designed as a high-performance alternative to commercial music metadata APIs like Spotify's Web API.

Core Technology Stack

Runtime Dependencies

| Component | Version | Purpose | Notes |
|---|---|---|---|
| Go | 1.24 | Runtime & stdlib HTTP server | Uses Go 1.22+ enhanced routing |
| modernc.org/sqlite | v1.34.4 | Pure Go SQLite driver | No CGO required |
| golang.org/x/time | v0.14.0 | Rate limiting (token bucket) | Only external dependency besides the SQLite driver |

Build Configuration

CGO_ENABLED=0 go build -ldflags="-s -w" ./cmd/server

Flags explained:

  • CGO_ENABLED=0: Pure Go binary, no C dependencies
  • -s -w: Strip debug symbols and DWARF tables (smaller binary)

Data Scale

Database Files

| Database | Size | Purpose | Records |
|---|---|---|---|
| main_database.sqlite3 | ~117GB | Core metadata (tracks, albums, artists) | 256M tracks |
| track_files.sqlite3 | ~99GB | Extended track data (lyrics flags, languages, roles) | 256M track files |
| Total | ~216GB | Combined storage requirement | - |

Dataset Coverage

  • 256 million tracks across all databases
  • Album metadata with images, labels, release dates
  • Artist metadata with genres, follower counts, popularity scores
  • ISRC codes for track identification
  • Multi-language support (language_of_performance field)
  • Artist role information (performer, composer, etc.)

Entry Points

Command Line

Entry point: cmd/server/main.go (62 lines)

Flags:

-db string
    Path to main database file (REQUIRED)
    
-addr string
    HTTP server address (default ":8080")

Example:

./metadata-api -db /data/main_database.sqlite3 -addr :8080
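
The flag handling above could be sketched roughly as follows. This is an illustration only: `config` and `parseFlags` are hypothetical names, not taken from the repository's main.go. Go's flag package has no notion of a required flag, so the check for `-db` is done by hand.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// config holds the two documented command-line options.
// The type and its field names are illustrative assumptions.
type config struct {
	dbPath string
	addr   string
}

// parseFlags parses args (without the program name) and rejects
// a missing -db, which the documentation marks as required.
func parseFlags(args []string) (config, error) {
	fs := flag.NewFlagSet("metadata-api", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // let the caller decide how to report errors

	var cfg config
	fs.StringVar(&cfg.dbPath, "db", "", "path to main database file (required)")
	fs.StringVar(&cfg.addr, "addr", ":8080", "HTTP server address")

	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	if cfg.dbPath == "" {
		return config{}, errors.New("-db is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Printf("would serve %s on %s\n", cfg.dbPath, cfg.addr)
}
```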

Docker

Image: ghcr.io/aunali321/music-metadata-api:latest
Base: Alpine Linux 3.21

docker-compose.yml:

services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    environment:
      - LOG_LEVEL=info  # NOTE: Not actually used in code
    command: ["-db", "/data/main_database.sqlite3"]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped

Architecture Layers

Directory Structure

music-metadata-api/
├── cmd/
│   └── server/
│       └── main.go          # Entry point (62 lines)
├── internal/
│   ├── api/                 # HTTP handlers, routing, middleware
│   │   ├── handlers.go
│   │   ├── ratelimit.go
│   │   └── openapi.go
│   ├── db/
│   │   └── db.go            # Database layer (907 lines)
│   └── models/
│       └── models.go        # Data structures (65 lines)
├── Dockerfile
├── docker-compose.yml
└── .github/
    └── workflows/
        └── docker-publish.yml

Layer Responsibilities

API Layer (internal/api/)

  • HTTP request handling
  • Rate limiting (token bucket, per-IP)
  • OpenAPI specification serving
  • Swagger UI hosting

Database Layer (internal/db/)

  • SQLite connection management
  • Query execution
  • Data enrichment (joining related entities)
  • Batch optimization

Models Layer (internal/models/)

  • Data structure definitions
  • JSON serialization tags
  • Response formatting
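
As a rough illustration of what a models-layer type with JSON tags might look like, the sketch below defines an artist record. The `Artist` type, its fields, and the JSON key names are assumptions derived from the dataset description (genres, follower counts, popularity scores); the actual definitions in models.go may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Artist is a hypothetical response model; field names and JSON keys
// are assumptions, not copied from internal/models/models.go.
type Artist struct {
	ID         string   `json:"id"`
	Name       string   `json:"name"`
	Genres     []string `json:"genres,omitempty"`
	Followers  int64    `json:"followers,omitempty"`
	Popularity int      `json:"popularity"`
}

// encodeArtist serializes an Artist using the struct tags above.
func encodeArtist(a Artist) (string, error) {
	b, err := json.Marshal(a)
	return string(b), err
}

func main() {
	out, err := encodeArtist(Artist{
		ID:         "a1",
		Name:       "Example Artist",
		Genres:     []string{"ambient"},
		Followers:  1200,
		Popularity: 42,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```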

Key Features

Performance Optimizations

  1. Read-only databases - No write locks, safe concurrent reads
  2. Conservative PRAGMAs - Optimized for read-heavy workloads
  3. Batch endpoints - Process up to 400 items per request
  4. Connection pooling - MaxOpenConns=8 for controlled resource usage
  5. Memory-mapped I/O - 1GB mmap for faster reads
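
The read-only and memory-mapping settings above could be expressed in the connection string. The sketch below only builds such a string with the standard library so it runs without extra modules. It assumes the driver accepts SQLite URI filenames (`mode=ro`) and a `_pragma` query parameter, as documented for modernc.org/sqlite; `readOnlyDSN` and the constants are hypothetical names, and the project's actual PRAGMA set is not shown in this overview.

```go
package main

import (
	"fmt"
	"net/url"
)

const (
	maxOpenConns = 8       // pool cap from the list above
	mmapBytes    = 1 << 30 // 1GB memory map from the list above
)

// readOnlyDSN builds a SQLite URI that opens path read-only and asks
// the driver to run "PRAGMA mmap_size(...)" on each new connection.
// The _pragma parameter is an assumption about the driver in use.
func readOnlyDSN(path string) string {
	q := url.Values{}
	q.Set("mode", "ro")
	q.Add("_pragma", fmt.Sprintf("mmap_size(%d)", int64(mmapBytes)))
	return "file:" + path + "?" + q.Encode()
}

func main() {
	dsn := readOnlyDSN("/data/main_database.sqlite3")
	fmt.Println(dsn)
	// With the driver imported, the pool would then be capped, e.g.:
	//   db, err := sql.Open("sqlite", dsn)
	//   db.SetMaxOpenConns(maxOpenConns)
	fmt.Println("max open connections:", maxOpenConns)
}
```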

API Capabilities

  • Batch lookup - Retrieve multiple tracks/albums/artists in single request
  • ISRC lookup - Industry-standard track identification
  • Search - Substring search (LIKE-based, no FTS index) on tracks and artists
  • Relationship traversal - Album tracks, artist albums, track artists
  • OpenAPI documentation - Interactive Swagger UI at /docs
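
Because batch endpoints accept at most 400 items per request, a client with a larger ID list has to split it first. A minimal client-side sketch follows; `chunk` and `maxBatch` are hypothetical helper names, and the endpoint paths are deliberately omitted because this overview does not list them.

```go
package main

import "fmt"

// maxBatch mirrors the documented per-request limit of 400 items.
const maxBatch = 400

// chunk splits ids into consecutive slices of at most size elements.
// The returned slices share memory with ids; nothing is copied.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for len(ids) > 0 {
		n := size
		if len(ids) < n {
			n = len(ids)
		}
		out = append(out, ids[:n])
		ids = ids[n:]
	}
	return out
}

func main() {
	ids := make([]string, 1000)
	for i := range ids {
		ids[i] = fmt.Sprintf("id-%04d", i)
	}
	for i, batch := range chunk(ids, maxBatch) {
		fmt.Printf("batch %d: %d ids\n", i, len(batch))
	}
}
```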

Operational Features

  • Graceful shutdown - 10-second timeout for in-flight requests
  • Health checks - /health endpoint for monitoring
  • Rate limiting - 100 req/s with 200 burst capacity
  • Structured logging - Go stdlib log/slog for error tracking
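
A graceful shutdown with a 10-second drain window is commonly written with `http.Server.Shutdown`. The sketch below shows one way to do it under that assumption; `serve` and `shutdownTimeout` are hypothetical names, and the repository's own main.go may be structured differently.

```go
package main

import (
	"context"
	"errors"
	"log/slog"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// shutdownTimeout matches the documented 10-second drain window.
const shutdownTimeout = 10 * time.Second

// serve runs an HTTP server on addr until done is closed, then gives
// in-flight requests up to shutdownTimeout to finish.
func serve(addr string, done <-chan struct{}) error {
	mux := http.NewServeMux()
	mux.HandleFunc("GET /health", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	srv := &http.Server{Addr: addr, Handler: mux}

	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case err := <-errCh:
		return err // failed to start or stopped unexpectedly
	case <-done:
	}

	ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return err // drain window elapsed with requests still active
	}
	// ListenAndServe reports ErrServerClosed after a clean shutdown.
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	if err := serve(":8080", ctx.Done()); err != nil {
		slog.Error("server exited", "err", err)
		os.Exit(1)
	}
}
```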

Deployment Models

Standalone Binary

Pros:

  • Single executable, no dependencies
  • Minimal resource footprint
  • Direct filesystem access to databases

Cons:

  • Manual process management
  • No automatic restarts
  • Manual log rotation

Docker Container

Pros:

  • Consistent runtime environment
  • Built-in health checks
  • Automatic restarts
  • Easy horizontal scaling

Cons:

  • Requires Docker runtime
  • Additional layer of abstraction
  • Volume mount for large databases

Use Cases

Primary Use Cases

  1. Music library enrichment - Add metadata to existing track collections
  2. ISRC-based lookup - Resolve ISRCs to full track metadata
  3. Batch processing - Enrich large catalogs efficiently
  4. Self-hosted alternative - Replace commercial APIs with local service
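
For ISRC-based lookup, callers usually normalize and sanity-check codes before querying. The sketch below assumes the commonly documented 12-character ISRC layout (two-letter prefix, three alphanumeric registrant characters, two-digit year, five-digit designation); `normalizeISRC` is a hypothetical helper and not part of the API.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// isrcPattern matches a hyphen-free, upper-case ISRC:
// 2 letters + 3 alphanumerics + 7 digits (year and designation).
var isrcPattern = regexp.MustCompile(`^[A-Z]{2}[A-Z0-9]{3}[0-9]{7}$`)

// normalizeISRC strips whitespace and hyphens, upper-cases the input,
// and reports whether the result has a valid ISRC shape.
func normalizeISRC(s string) (string, bool) {
	s = strings.ToUpper(strings.ReplaceAll(strings.TrimSpace(s), "-", ""))
	if !isrcPattern.MatchString(s) {
		return "", false
	}
	return s, true
}

func main() {
	for _, in := range []string{"us-rc1-76-07839", "USRC17607839", "not-an-isrc"} {
		code, ok := normalizeISRC(in)
		fmt.Printf("%q -> %q valid=%v\n", in, code, ok)
	}
}
```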

Integration Scenarios

  • Metadata aggregator pipelines - Complement MusicBrainz with Spotify-style data
  • Music streaming services - Populate track/album/artist information
  • DJ software - Enrich track libraries with popularity, genres, images
  • Music analytics - Analyze trends across 256M tracks

Limitations

Technical Constraints

  • Database size - Requires ~216GB disk space
  • No write operations - Read-only, no data updates
  • No authentication - Public API, no access control
  • No CORS headers - Cross-origin requests from browser-based clients are blocked
  • Memory leak - Rate limiter visitor map grows unbounded
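
One common way to keep a per-IP visitor map from growing without bound is to record when each visitor was last seen and evict idle entries periodically. The sketch below illustrates the idea with a hand-written token bucket so that it needs only the standard library; it is not the project's code, which uses golang.org/x/time/rate, and all type and function names here are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// bucket is one visitor's token bucket plus its last activity time.
type bucket struct {
	tokens   float64
	lastSeen time.Time
}

// limiter is a per-IP token bucket with idle-entry eviction.
type limiter struct {
	mu       sync.Mutex
	rate     float64       // tokens added per second
	burst    float64       // bucket capacity
	ttl      time.Duration // idle time after which a visitor is dropped
	visitors map[string]*bucket
}

func newLimiter(rate, burst float64, ttl time.Duration) *limiter {
	return &limiter{rate: rate, burst: burst, ttl: ttl, visitors: map[string]*bucket{}}
}

// allow refills the caller's bucket for the elapsed time and spends
// one token if available.
func (l *limiter) allow(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	b, ok := l.visitors[ip]
	if !ok {
		b = &bucket{tokens: l.burst, lastSeen: now}
		l.visitors[ip] = b
	}
	b.tokens = math.Min(l.burst, b.tokens+now.Sub(b.lastSeen).Seconds()*l.rate)
	b.lastSeen = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// evict removes visitors idle for at least ttl and returns how many
// were dropped. Run it from a ticker goroutine in a real server.
func (l *limiter) evict() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	removed := 0
	for ip, b := range l.visitors {
		if time.Since(b.lastSeen) >= l.ttl {
			delete(l.visitors, ip)
			removed++
		}
	}
	return removed
}

func main() {
	l := newLimiter(100, 200, 3*time.Minute) // documented rate and burst
	fmt.Println("first request allowed:", l.allow("203.0.113.7"))
	fmt.Println("evicted now:", l.evict())
}
```

The 3-minute idle window in `main` is an arbitrary illustrative value; a shorter window frees memory sooner at the cost of resetting a returning visitor's bucket to full burst.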

Data Constraints

  • Database provenance unclear - "Not affiliated with Spotify"
  • No freshness mechanism - Static snapshot, no updates
  • Search performance - LIKE queries slow on large datasets (no FTS)

Operational Constraints

  • No metrics - No Prometheus, no counters
  • Naive health check - Doesn't verify database connectivity
  • Hardcoded config - Timeouts, limits not configurable
  • No tests - Zero test coverage
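
A health check that verifies database connectivity can be sketched by pinging the database with a short timeout. The example below is an assumption-laden illustration, not the project's handler: `pinger`, `healthHandler`, `stubDB`, and `probe` are hypothetical names, and the 2-second timeout is arbitrary. A `*sql.DB` satisfies the `pinger` interface through its `PingContext` method.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pinger is the small part of *sql.DB the health check needs.
type pinger interface {
	PingContext(ctx context.Context) error
}

// healthHandler answers 200 only if the database responds to a ping
// within the timeout, and 503 otherwise.
func healthHandler(db pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		if err := db.PingContext(ctx); err != nil {
			http.Error(w, "database unavailable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	}
}

// stubDB stands in for a real database in this demonstration.
type stubDB struct{ err error }

func (s stubDB) PingContext(context.Context) error { return s.err }

var errDown = errors.New("database unreachable")

// probe issues an in-memory GET /health and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
	return rec.Code
}

func main() {
	fmt.Println("healthy database:", probe(healthHandler(stubDB{})))
	fmt.Println("failing database:", probe(healthHandler(stubDB{err: errDown})))
}
```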

Project Maturity

Strengths

  • Clean, simple codebase
  • Production-ready Docker setup
  • Comprehensive OpenAPI spec
  • Massive dataset (256M tracks)
  • Pure Go (no CGO complexity)

Weaknesses

  • Single maintainer
  • No test suite
  • No CI test step
  • Unused config (LOG_LEVEL)
  • Memory leak in rate limiter

Comparison to Alternatives

| Feature | Music Metadata API | Spotify Web API | MusicBrainz API |
|---|---|---|---|
| Self-hosted | Yes | No | No |
| Authentication | None | OAuth required | Optional |
| Dataset size | 256M tracks | Full catalog | ~40M recordings |
| Rate limits | 100 req/s | Varies by tier | 1 req/s |
| Batch support | 400 items | 50 items | Limited |
| Cost | Free (MIT) | Free tier limited | Free |
| Data freshness | Static | Real-time | Community-updated |
| Identifier | ISRC, internal IDs | Spotify IDs | MBIDs |

Getting Started

Minimum Requirements

  1. Go 1.24+ (for building from source)
  2. ~216GB disk space for databases
  3. Database files (not included in repository)
  4. 2GB+ RAM recommended

Quick Start

# Clone repository
git clone https://github.com/Aunali321/music-metadata-api.git
cd music-metadata-api

# Build binary
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server

# Run server (assumes databases in /data)
./metadata-api -db /data/main_database.sqlite3 -addr :8080

# Test health endpoint
curl http://localhost:8080/health

# View API documentation (macOS; use xdg-open on Linux)
open http://localhost:8080/docs

Docker Quick Start

# Pull image
docker pull ghcr.io/aunali321/music-metadata-api:latest

# Run container
docker run -d \
  -p 8080:8080 \
  -v /path/to/databases:/data:ro \
  ghcr.io/aunali321/music-metadata-api:latest \
  -db /data/main_database.sqlite3

# Check health
curl http://localhost:8080/health

Documentation Resources

License

MIT License - Free for commercial and personal use, provided the copyright and license notice is retained.