
# Music Metadata API - Overview

## Project Identity

- **Name:** Music Metadata API
- **Repository:** https://github.com/Aunali321/music-metadata-api
- **License:** MIT
- **Language:** Go 1.24
- **Maintainer:** Single maintainer (Aunali321)
- **Status:** Active, production-ready

## Purpose

Music Metadata API provides a self-hosted HTTP service for querying metadata on 256 million music tracks. The service operates entirely from pre-populated SQLite databases, requiring no external API calls at runtime. It's designed as a high-performance alternative to commercial music metadata APIs like Spotify's Web API.

## Core Technology Stack

### Runtime Dependencies

| Component | Version | Purpose | Notes |
|-----------|---------|---------|-------|
| Go | 1.24 | Runtime & stdlib HTTP server | Uses Go 1.22+ enhanced routing |
| modernc.org/sqlite | v1.34.4 | Pure Go SQLite driver | No CGO required |
| golang.org/x/time | v0.14.0 | Rate limiting (token bucket) | Only other external dependency besides the SQLite driver |

### Build Configuration

```bash
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server
```

**Flags explained:**
- `CGO_ENABLED=0`: Pure Go binary, no C dependencies
- `-s -w`: `-s` omits the symbol table and `-w` omits DWARF debug information (smaller binary)

## Data Scale

### Database Files

| Database | Size | Purpose | Records |
|----------|------|---------|---------|
| main_database.sqlite3 | ~117GB | Core metadata (tracks, albums, artists) | 256M tracks |
| track_files.sqlite3 | ~99GB | Extended track data (lyrics flags, languages, roles) | 256M track files |
| **Total** | **~216GB** | Combined storage requirement | - |

### Dataset Coverage

- **256 million tracks** across all databases
- Album metadata with images, labels, release dates
- Artist metadata with genres, follower counts, popularity scores
- ISRC codes for track identification
- Multi-language support (language_of_performance field)
- Artist role information (performer, composer, etc.)

## Entry Points

### Command Line

**Source:** `cmd/server/main.go` (62 lines)

**Flags:**
```bash
-db string
    Path to main database file (REQUIRED)

-addr string
    HTTP server address (default ":8080")
```

**Example:**
```bash
./metadata-api -db /data/main_database.sqlite3 -addr :8080
```
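
The two flags above map naturally onto the standard `flag` package. The sketch below shows one way a required `-db` flag and a defaulted `-addr` flag could be parsed and validated. The `config` type and `parseFlags` function are hypothetical names, and the project's `main.go` may be structured differently.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// config holds the two documented command-line settings.
type config struct {
	dbPath string
	addr   string
}

// parseFlags parses args (without the program name) and enforces that -db is set.
func parseFlags(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("metadata-api", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text quiet; errors are returned instead
	fs.StringVar(&cfg.dbPath, "db", "", "Path to main database file (REQUIRED)")
	fs.StringVar(&cfg.addr, "addr", ":8080", "HTTP server address")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	// The flag package has no notion of "required", so check explicitly.
	if cfg.dbPath == "" {
		return config{}, errors.New("-db is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseFlags([]string{"-db", "/data/main_database.sqlite3"})
	fmt.Println(cfg.dbPath, cfg.addr, err) // /data/main_database.sqlite3 :8080 <nil>
}
```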

### Docker

- **Image:** `ghcr.io/aunali321/music-metadata-api:latest`
- **Base:** Alpine Linux 3.21

**docker-compose.yml:**
```yaml
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    environment:
      - LOG_LEVEL=info  # NOTE: Not actually used in code
    command: ["-db", "/data/main_database.sqlite3"]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
```

## Architecture Layers

### Directory Structure

```
music-metadata-api/
├── cmd/
│   └── server/
│       └── main.go          # Entry point (62 lines)
├── internal/
│   ├── api/                 # HTTP handlers, routing, middleware
│   │   ├── handlers.go
│   │   ├── ratelimit.go
│   │   └── openapi.go
│   ├── db/
│   │   └── db.go            # Database layer (907 lines)
│   └── models/
│       └── models.go        # Data structures (65 lines)
├── Dockerfile
├── docker-compose.yml
└── .github/
    └── workflows/
        └── docker-publish.yml
```

### Layer Responsibilities

**API Layer** (`internal/api/`)
- HTTP request handling
- Rate limiting (token bucket, per-IP)
- OpenAPI specification serving
- Swagger UI hosting

**Database Layer** (`internal/db/`)
- SQLite connection management
- Query execution
- Data enrichment (joining related entities)
- Batch optimization

**Models Layer** (`internal/models/`)
- Data structure definitions
- JSON serialization tags
- Response formatting

## Key Features

### Performance Optimizations

1. **Read-only databases** - No write locks, safe concurrent reads
2. **Conservative PRAGMAs** - Optimized for read-heavy workloads
3. **Batch endpoints** - Process up to 400 items per request
4. **Connection pooling** - MaxOpenConns=8 for controlled resource usage
5. **Memory-mapped I/O** - 1GB mmap for faster reads
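
Items 1, 4, and 5 can be combined in the connection setup. The sketch below is an assumption about how that might look with `database/sql` and the `modernc.org/sqlite` driver's `_pragma` DSN parameter. The exact PRAGMAs the project sets are not listed here, so `query_only` and the DSN layout are illustrative, and `readOnlyDSN` and `openReadOnly` are hypothetical names. The driver import is left out so the example compiles with the standard library alone.

```go
package main

import (
	"database/sql"
	"fmt"
)

// readOnlyDSN builds a SQLite URI that opens the file read-only and requests
// a memory-mapped region of mmapBytes. The PRAGMA choices are assumptions.
func readOnlyDSN(path string, mmapBytes int64) string {
	return fmt.Sprintf("file:%s?mode=ro&_pragma=query_only(1)&_pragma=mmap_size(%d)", path, mmapBytes)
}

// openReadOnly opens the database and caps the connection pool at 8.
// A real program must register the driver with: import _ "modernc.org/sqlite"
func openReadOnly(path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite", readOnlyDSN(path, 1<<30)) // 1<<30 bytes = 1GB
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(8) // bounds concurrent SQLite connections
	db.SetMaxIdleConns(8) // keep them warm instead of reopening
	return db, nil
}

func main() {
	fmt.Println(readOnlyDSN("/data/main_database.sqlite3", 1<<30))
	if _, err := openReadOnly("/data/main_database.sqlite3"); err != nil {
		// Expected in this sketch because no driver is imported.
		fmt.Println("open failed:", err)
	}
}
```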

### API Capabilities

- **Batch lookup** - Retrieve multiple tracks/albums/artists in single request
- **ISRC lookup** - Industry-standard track identification
- **Search** - Substring search on tracks and artists (SQL `LIKE`, no full-text index)
- **Relationship traversal** - Album tracks, artist albums, track artists
- **OpenAPI documentation** - Interactive Swagger UI at `/docs`
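
Since batch endpoints accept at most 400 items per request, a client with a larger catalog has to split its ID list first. The helper below is a small client-side sketch of that step. The `chunk` function and `maxBatch` constant are hypothetical names, and the request format for the batch endpoints is deliberately not shown.

```go
package main

import "fmt"

// maxBatch mirrors the documented per-request limit of the batch endpoints.
const maxBatch = 400

// chunk splits items into consecutive slices of at most size elements.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for start := 0; start < len(items); start += size {
		end := min(start+size, len(items))
		out = append(out, items[start:end])
	}
	return out
}

func main() {
	ids := make([]string, 1000)
	for i := range ids {
		ids[i] = fmt.Sprintf("id-%04d", i)
	}
	// 1000 IDs become batches of 400, 400, and 200.
	for i, batch := range chunk(ids, maxBatch) {
		fmt.Printf("batch %d: %d ids\n", i, len(batch))
	}
}
```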

### Operational Features

- **Graceful shutdown** - 10-second timeout for in-flight requests
- **Health checks** - `/health` endpoint for monitoring
- **Rate limiting** - 100 req/s per IP with a burst capacity of 200
- **Structured logging** - Go stdlib `log/slog` for error tracking
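
Graceful shutdown with a fixed deadline is usually built on `http.Server.Shutdown`. The sketch below shows the general pattern with the 10-second grace period mentioned above. `serveUntil` and `demo` are hypothetical names, and in a real server the context would typically come from `signal.NotifyContext` rather than the short timer used here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// serveUntil serves on ln until ctx is cancelled, then gives in-flight
// requests up to grace to finish before returning.
func serveUntil(ctx context.Context, srv *http.Server, ln net.Listener, grace time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()

	select {
	case err := <-errCh:
		return err // the server failed before shutdown was requested
	case <-ctx.Done():
	}

	shutdownCtx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return err // grace period elapsed with requests still running
	}
	// Serve returns ErrServerClosed after a clean Shutdown.
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// demo starts a server on a random loopback port and stops it after 100ms.
func demo() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	srv := &http.Server{Handler: http.NotFoundHandler()}
	return serveUntil(ctx, srv, ln, 10*time.Second)
}

func main() {
	fmt.Println("shutdown error:", demo())
}
```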

## Deployment Models

### Standalone Binary

**Pros:**
- Single executable, no dependencies
- Minimal resource footprint
- Direct filesystem access to databases

**Cons:**
- Manual process management
- No automatic restarts
- Manual log rotation

### Docker Container

**Pros:**
- Consistent runtime environment
- Built-in health checks
- Automatic restarts
- Easy horizontal scaling

**Cons:**
- Requires Docker runtime
- Additional layer of abstraction
- Volume mount for large databases

## Use Cases

### Primary Use Cases

1. **Music library enrichment** - Add metadata to existing track collections
2. **ISRC-based lookup** - Resolve ISRCs to full track metadata
3. **Batch processing** - Enrich large catalogs efficiently
4. **Self-hosted alternative** - Replace commercial APIs with local service

### Integration Scenarios

- **Metadata aggregator pipelines** - Complement MusicBrainz with Spotify-style data
- **Music streaming services** - Populate track/album/artist information
- **DJ software** - Enrich track libraries with popularity, genres, images
- **Music analytics** - Analyze trends across 256M tracks

## Limitations

### Technical Constraints

- **Database size** - Requires ~216GB of disk space
- **No write operations** - Read-only, no data updates
- **No authentication** - Public API, no access control
- **No CORS** - No CORS headers are sent, so cross-origin browser clients are blocked
- **Memory leak** - Rate limiter visitor map grows unbounded
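
The unbounded visitor map is a common pitfall of per-IP limiters: entries are added for every new client and never removed. One usual remedy is to record when each client was last seen and periodically evict idle entries. The sketch below illustrates that idea with a small stdlib-only token bucket standing in for `golang.org/x/time/rate`. All names (`limiter`, `visitor`, `evictIdle`, `demo`) and the 3-minute idle threshold are assumptions, not the project's code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// visitor is one client's token bucket plus the time it was last seen.
type visitor struct {
	tokens   float64
	lastSeen time.Time
}

// limiter is a per-key token bucket with explicit eviction of idle keys.
type limiter struct {
	mu       sync.Mutex
	rate     float64 // tokens added per second
	burst    float64 // bucket capacity
	visitors map[string]*visitor
}

func newLimiter(rate, burst float64) *limiter {
	return &limiter{rate: rate, burst: burst, visitors: make(map[string]*visitor)}
}

// allow reports whether key may proceed at time now, consuming a token if so.
func (l *limiter) allow(key string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	v, ok := l.visitors[key]
	if !ok {
		v = &visitor{tokens: l.burst, lastSeen: now}
		l.visitors[key] = v
	}
	// Refill in proportion to elapsed time, capped at the burst size.
	v.tokens = min(l.burst, v.tokens+now.Sub(v.lastSeen).Seconds()*l.rate)
	v.lastSeen = now
	if v.tokens < 1 {
		return false
	}
	v.tokens--
	return true
}

// evictIdle drops visitors not seen for at least ttl and returns the count.
func (l *limiter) evictIdle(now time.Time, ttl time.Duration) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	n := 0
	for key, v := range l.visitors {
		if now.Sub(v.lastSeen) >= ttl {
			delete(l.visitors, key)
			n++
		}
	}
	return n
}

// demo fires 250 requests at one instant, then evicts after 10 idle minutes.
func demo() (allowed, evicted int) {
	l := newLimiter(100, 200)
	now := time.Now()
	for i := 0; i < 250; i++ {
		if l.allow("203.0.113.7", now) {
			allowed++
		}
	}
	return allowed, l.evictIdle(now.Add(10*time.Minute), 3*time.Minute)
}

func main() {
	fmt.Println(demo()) // 200 1
}
```

In a long-running server, `evictIdle` would be called from a goroutine driven by a `time.Ticker`, which keeps the map size proportional to recently active clients.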

### Data Constraints

- **Database provenance unclear** - The project carries a "Not affiliated with Spotify" disclaimer, but the origin of the data is unclear
- **No freshness mechanism** - Static snapshot, no updates
- **Search performance** - LIKE queries slow on large datasets (no FTS)

### Operational Constraints

- **No metrics** - No Prometheus, no counters
- **Naive health check** - Doesn't verify database connectivity
- **Hardcoded config** - Timeouts, limits not configurable
- **No tests** - Zero test coverage
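
A health check that verifies database connectivity can be as small as a ping with a timeout. The sketch below shows one possible shape, assuming the handler is given something with a `PingContext` method (which `*sql.DB` provides). The `pinger` interface, `healthHandler`, the `fakeDB` test double, and the response bodies are hypothetical and do not describe the current `/health` implementation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pinger is satisfied by *sql.DB, whose PingContext verifies a live connection.
type pinger interface {
	PingContext(ctx context.Context) error
}

// healthHandler returns 200 only if the database answers within timeout.
func healthHandler(db pinger, timeout time.Duration) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), timeout)
		defer cancel()
		if err := db.PingContext(ctx); err != nil {
			http.Error(w, "database unavailable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprint(w, "ok")
	}
}

// fakeDB stands in for *sql.DB so the sketch runs without a database file.
type fakeDB struct{ err error }

func (f fakeDB) PingContext(context.Context) error { return f.err }

// errDown simulates an unreachable database.
var errDown = errors.New("database is closed")

// status returns the HTTP status the handler produces for the given database.
func status(db pinger) int {
	rec := httptest.NewRecorder()
	healthHandler(db, 2*time.Second)(rec, httptest.NewRequest("GET", "/health", nil))
	return rec.Code
}

func main() {
	fmt.Println(status(fakeDB{}))             // 200
	fmt.Println(status(fakeDB{err: errDown})) // 503
}
```

Depending on a one-method interface rather than `*sql.DB` directly also makes the handler easy to unit test, which bears on the missing test coverage noted above.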

## Project Maturity

### Strengths

- Clean, simple codebase
- Production-ready Docker setup
- Comprehensive OpenAPI spec
- Massive dataset (256M tracks)
- Pure Go (no CGO complexity)

### Weaknesses

- Single maintainer
- No test suite
- No CI test step
- Unused config (LOG_LEVEL)
- Memory leak in rate limiter

## Comparison to Alternatives

| Feature | Music Metadata API | Spotify Web API | MusicBrainz API |
|---------|-------------------|-----------------|-----------------|
| Self-hosted | Yes | No | Optional (mirror server) |
| Authentication | None | OAuth required | Optional |
| Dataset size | 256M tracks | Full catalog | ~40M recordings |
| Rate limits | 100 req/s per IP | Varies by tier | 1 req/s |
| Batch support | 400 items | 50 items | Limited |
| Cost | Free (MIT) | Free tier limited | Free |
| Data freshness | Static | Real-time | Community-updated |
| Identifier | ISRC, internal IDs | Spotify IDs | MBIDs |

## Getting Started

### Minimum Requirements

1. Go 1.24+ (for building from source)
2. ~216GB disk space for databases
3. Database files (not included in repository)
4. 2GB+ RAM recommended

### Quick Start

```bash
# Clone repository
git clone https://github.com/Aunali321/music-metadata-api.git
cd music-metadata-api

# Build binary
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server

# Run server (assumes databases in /data)
./metadata-api -db /data/main_database.sqlite3 -addr :8080

# Test health endpoint
curl http://localhost:8080/health

# View API documentation (macOS; on Linux use xdg-open)
open http://localhost:8080/docs
```

### Docker Quick Start

```bash
# Pull image
docker pull ghcr.io/aunali321/music-metadata-api:latest

# Run container
docker run -d \
  -p 8080:8080 \
  -v /path/to/databases:/data:ro \
  ghcr.io/aunali321/music-metadata-api:latest \
  -db /data/main_database.sqlite3

# Check health
curl http://localhost:8080/health
```

## Documentation Resources

- **OpenAPI Spec:** http://localhost:8080/openapi.yaml
- **Interactive Docs:** http://localhost:8080/docs
- **GitHub Repository:** https://github.com/Aunali321/music-metadata-api
- **Docker Image:** ghcr.io/aunali321/music-metadata-api

## License

MIT License. Free for commercial and personal use, provided the copyright and license notice are retained in copies of the software.