Music Metadata API - Integrations

Integration Overview

Music Metadata API is a self-contained service with no external integrations at runtime. All data is served from pre-populated SQLite databases: the server makes no outbound API calls, uses no authentication services, and depends on no third-party services. Its only third-party code dependencies are two Go modules, both of which work offline.

┌─────────────────────────────────────────────────────────────┐
│                    Music Metadata API                        │
│                   (Self-Contained Service)                   │
│                                                              │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐           │
│  │   HTTP     │  │  Database  │  │   Models   │           │
│  │  Handlers  │→ │   Layer    │→ │   Layer    │           │
│  └────────────┘  └────────────┘  └────────────┘           │
│                         ↓                                    │
│                  ┌─────────────┐                            │
│                  │   SQLite    │                            │
│                  │  Databases  │                            │
│                  │  (216GB)    │                            │
│                  └─────────────┘                            │
└─────────────────────────────────────────────────────────────┘
                            │
                            │ NO external calls
                            ↓
                    (All data local)

Runtime Dependencies

Go Standard Library

Packages used:

  • net/http - HTTP server and routing
  • database/sql - Database interface
  • encoding/json - JSON serialization
  • log/slog - Structured logging
  • context - Request context and timeouts
  • sync - Concurrency primitives (RWMutex)
  • flag - CLI argument parsing
  • os/signal - Graceful shutdown

No external HTTP calls: net/http serves inbound requests only. Everything except SQLite access and rate limiting is implemented with the standard library.

External Go Modules

modernc.org/sqlite v1.34.4

  • Pure Go SQLite driver
  • No CGO required
  • No C dependencies
  • No external network calls

golang.org/x/time v0.14.0

  • Rate limiting (token bucket)
  • No external network calls
  • Pure algorithm implementation

Total external dependencies: 2 packages (both offline)

Data Sources

Pre-Populated Databases

Source: User must obtain databases separately (not included in repository)

Database files:

  • main_database.sqlite3 (~117GB)
  • track_files.sqlite3 (~99GB)

Provenance: Unclear (repository states "not affiliated with Spotify")

Update mechanism: None (static snapshot)

Implications:

  • No real-time data sync
  • No automatic updates
  • User responsible for obtaining databases
  • Legal status uncertain

No External APIs

What's NOT integrated:

  • Spotify Web API (no OAuth, no API calls)
  • MusicBrainz API (no lookups)
  • Last.fm API (no scrobbling)
  • Discogs API (no catalog queries)
  • AcoustID API (no fingerprinting)
  • Cover Art Archive (no image fetching)

All data served from local databases.

Browser-Side Dependencies

Swagger UI (Documentation Only)

Endpoint: /docs

External resources loaded by browser:

<!-- Loaded from unpkg.com CDN -->
<script src="https://unpkg.com/swagger-ui-dist@5/swagger-ui-bundle.js"></script>
<link rel="stylesheet" href="https://unpkg.com/swagger-ui-dist@5/swagger-ui.css" />

Characteristics:

  • Loaded client-side (browser fetches)
  • Server doesn't make requests to unpkg.com
  • May work offline after first load (if the browser has cached the assets)
  • Only affects /docs endpoint (not API functionality)

Implications:

  • Requires internet connection for first /docs visit
  • Subsequent visits may work offline (while the assets remain cached)
  • API endpoints work without internet

Image URLs (External CDN)

Image hosting: Spotify CDN (i.scdn.co)

Example URLs:

https://i.scdn.co/image/ab67616d0000b273ce4f1737bc8a646c8c4bd25a
https://i.scdn.co/image/af2b8e57f6d7b5d1c9a5f3e8d4c2b1a0e9f8d7c6

Characteristics:

  • API returns URLs (not image data)
  • Client responsible for fetching images
  • Server never fetches images
  • Images hosted externally (not by API)

Implications:

  • Image availability depends on Spotify CDN
  • No image caching by API
  • Clients need internet to display images
  • Broken links possible if Spotify removes images

No Authentication Integration

No OAuth

What's missing:

  • No OAuth 2.0 flow
  • No token validation
  • No user authentication
  • No API keys

Implications:

  • Public API (anyone can query)
  • No usage tracking per user
  • No quota enforcement per user
  • No access control

Workarounds:

  • Deploy behind reverse proxy with auth (nginx, Caddy)
  • Use API gateway (Kong, Tyk)
  • Implement custom middleware

No Authorization

What's missing:

  • No role-based access control (RBAC)
  • No permission system
  • No resource ownership

Implications:

  • All data accessible to all clients
  • No private/public data distinction
  • No user-specific data

No Monitoring Integration

No Metrics Exporters

What's missing:

  • No Prometheus metrics
  • No StatsD integration
  • No OpenTelemetry
  • No custom metrics endpoint

Implications:

  • No visibility into request rates
  • No error rate tracking
  • No latency percentiles
  • No resource usage metrics

Workarounds:

  • Parse logs for metrics
  • Use reverse proxy metrics (nginx, Envoy)
  • Implement custom metrics middleware

No Distributed Tracing

What's missing:

  • No Jaeger integration
  • No Zipkin support
  • No trace context propagation

Implications:

  • Can't trace requests across services
  • No performance profiling
  • No bottleneck identification

Workarounds:

  • Add custom tracing middleware
  • Use APM tools (Datadog, New Relic)

No Log Aggregation

What's missing:

  • No Elasticsearch integration
  • No Splunk forwarding
  • No CloudWatch Logs
  • No structured log shipping

Logging: Go stdlib log/slog to stdout

Implications:

  • Logs only in container/process stdout
  • No centralized log storage
  • No log search/analysis

Workarounds:

  • Docker log drivers (json-file, syslog, fluentd)
  • Kubernetes log collectors (Fluentd, Filebeat)
  • Redirect stdout to log aggregator
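Log collectors generally work best with one JSON object per line on stdout. The sketch below shows what that might look like with log/slog. The field names (`method`, `path`, `status`) and the helpers are illustrative assumptions, not the service's actual log schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log/slog"
	"os"
)

// newLogger returns a logger that writes one JSON object per record to w.
func newLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil))
}

// logRequest emits one access-log record; the field names are assumptions.
func logRequest(l *slog.Logger, method, path string, status int) {
	l.Info("request", "method", method, "path", path, "status", status)
}

// capture logs one record into a buffer and decodes it, to show the emitted shape.
func capture() (map[string]any, error) {
	var buf bytes.Buffer
	logRequest(newLogger(&buf), "GET", "/health", 200)
	var rec map[string]any
	err := json.Unmarshal(buf.Bytes(), &rec)
	return rec, err
}

func main() {
	// In the service this would be the only sink: stdout, picked up by the log driver.
	logRequest(newLogger(os.Stdout), "GET", "/health", 200)

	rec, err := capture()
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(rec["msg"], rec["path"], rec["status"])
}
```

With records in this shape, a Fluentd or Filebeat collector can index the fields without custom parsing rules.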

No Message Queue Integration

What's missing:

  • No RabbitMQ
  • No Kafka
  • No Redis Pub/Sub
  • No AWS SQS

Implications:

  • Synchronous request/response only
  • No async job processing
  • No event streaming
  • No background tasks

Use case: All queries processed synchronously (acceptable for read-only API)

No Cache Integration

No External Cache

What's missing:

  • No Redis
  • No Memcached
  • No Varnish

Caching: SQLite page cache only (64MB per connection)

Implications:

  • No shared cache across instances
  • No cache invalidation strategy
  • No cache warming
  • Cold start on each instance

Workarounds:

  • Add Redis layer for hot data
  • Use HTTP caching headers (not implemented)
  • Deploy CDN in front of API

No HTTP Caching

What's missing:

  • No Cache-Control headers
  • No ETag support
  • No Last-Modified headers

Implications:

  • Clients can't cache responses
  • Repeated requests hit database
  • No bandwidth savings

Workarounds:

  • Add caching middleware
  • Use reverse proxy with caching (Varnish, nginx)

No Database Replication

What's missing:

  • No master-slave replication
  • No read replicas
  • No database clustering

Database: Local SQLite files per instance (main_database.sqlite3 and track_files.sqlite3), with no shared database server

Implications:

  • Each instance has full database copy (216GB)
  • No shared database across instances
  • Horizontal scaling requires full database per instance

Workarounds:

  • Read-only databases safe to copy
  • Use network filesystem (NFS, EFS) for shared read-only access (SQLite file locking is unreliable on network filesystems, so avoid any writers)
  • Replicate databases to multiple instances

No Service Discovery

What's missing:

  • No Consul integration
  • No etcd
  • No Kubernetes service discovery
  • No DNS-based discovery

Deployment: Static configuration (IP:port)

Implications:

  • Manual load balancer configuration
  • No dynamic scaling
  • No health-based routing

Workarounds:

  • Use Kubernetes services (automatic discovery)
  • Use cloud load balancers (AWS ALB, GCP LB)
  • Use service mesh (Istio, Linkerd)

No Configuration Management

No External Config

What's missing:

  • No Consul KV
  • No etcd
  • No AWS Parameter Store
  • No HashiCorp Vault

Configuration: CLI flags only (-db, -addr)

Implications:

  • All config at startup
  • No dynamic reconfiguration
  • No secrets management
  • Hardcoded timeouts/limits

Workarounds:

  • Use environment variables (requires code changes)
  • Mount config files (requires code changes)
  • Use init containers to generate config

No Secrets Management

What's missing:

  • No Vault integration
  • No AWS Secrets Manager
  • No Kubernetes secrets
  • No encrypted config

Secrets: None required (no authentication)

Implications:

  • No sensitive data to protect
  • No credential rotation
  • No encryption at rest

Future consideration: If adding authentication, integrate secrets manager

Integration Patterns

Reverse Proxy Integration

Use case: Add authentication, CORS, caching, SSL

Example with nginx:

# proxy_cache_path is only valid in the http context, so it sits outside server {}
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m;

upstream metadata_api {
    server localhost:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;
    
    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;
    
    # CORS headers
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Methods "GET, POST, OPTIONS";
    
    # Caching (uses the api_cache zone defined above)
    proxy_cache api_cache;
    proxy_cache_valid 200 1h;
    
    # Authentication
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
    
    location / {
        proxy_pass http://metadata_api;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

API Gateway Integration

Use case: Rate limiting, authentication, analytics

Example with Kong:

_format_version: "3.0"  # required by Kong declarative configuration files
services:
  - name: metadata-api
    url: http://localhost:8080
    routes:
      - name: metadata-routes
        paths:
          - /
    plugins:
      - name: rate-limiting
        config:
          minute: 1000
          policy: local
      - name: key-auth
        config:
          key_names:
            - apikey
      - name: prometheus
        config:
          per_consumer: true

Load Balancer Integration

Use case: Distribute traffic across multiple instances

Example with HAProxy:

frontend metadata_frontend
    bind *:80
    default_backend metadata_backend

backend metadata_backend
    balance roundrobin
    option httpchk GET /health
    server api1 10.0.1.10:8080 check
    server api2 10.0.1.11:8080 check
    server api3 10.0.1.12:8080 check

Kubernetes Integration

Use case: Container orchestration, auto-scaling

Example deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metadata-api
  template:
    metadata:
      labels:
        app: metadata-api
    spec:
      containers:
      - name: api
        image: ghcr.io/aunali321/music-metadata-api:latest
        args: ["-db", "/data/main_database.sqlite3"]
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: database
          mountPath: /data
          readOnly: true
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 30
        resources:
          requests:
            memory: "4Gi"
            cpu: "1"
          limits:
            memory: "8Gi"
            cpu: "2"
      volumes:
      - name: database
        persistentVolumeClaim:
          # With replicas: 3, the claim must support ReadOnlyMany access
          claimName: metadata-db-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: metadata-api
spec:
  selector:
    app: metadata-api
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer

Monitoring Integration

Use case: Metrics, logs, traces

Example with Prometheus + Grafana:

1. Add metrics exporter (custom middleware):

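// Not implemented in current codebase
import "github.com/prometheus/client_golang/prometheus"

var (
    // requestsTotal counts requests by method, endpoint, and status code
    requestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "api_requests_total",
            Help: "Total number of HTTP requests.",
        },
        []string{"method", "endpoint", "status"},
    )
    // requestDuration observes handler latency in seconds
    requestDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "api_request_duration_seconds",
            Help: "HTTP request latency in seconds.",
        },
        []string{"method", "endpoint"},
    )
)

func init() {
    // Register with the default registry so promhttp.Handler() can serve them at /metrics
    prometheus.MustRegister(requestsTotal, requestDuration)
}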

2. Scrape metrics with Prometheus:

scrape_configs:
  - job_name: 'metadata-api'
    static_configs:
      - targets: ['localhost:8080']

3. Visualize in Grafana:

  • Request rate dashboard
  • Error rate dashboard
  • Latency percentiles (p50, p95, p99)

Logging Integration

Use case: Centralized log aggregation

Example with Fluentd:

1. Configure Docker logging driver:

services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224
        tag: metadata-api

2. Fluentd configuration:

<source>
  @type forward
  port 24224
</source>

<match metadata-api>
  @type elasticsearch
  host elasticsearch
  port 9200
  index_name metadata-api
  type_name _doc
</match>

Caching Integration

Use case: Reduce database load, improve latency

Example with Redis:

1. Add Redis middleware (custom implementation):

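// Not implemented in current codebase.
// Assumes redisClient is a go-redis v9 client (github.com/redis/go-redis/v9)
// and that net/http, net/http/httptest, and time are imported.
func cacheMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Only GET requests are cached
        if r.Method != http.MethodGet {
            next.ServeHTTP(w, r)
            return
        }
        // Key on the full request URI so different query strings do not collide
        key := "cache:" + r.URL.RequestURI()

        // Check Redis cache
        cached, err := redisClient.Get(r.Context(), key).Bytes()
        if err == nil {
            w.Header().Set("Content-Type", "application/json")
            w.Write(cached)
            return
        }

        // Cache miss, call handler and capture its response
        rec := httptest.NewRecorder()
        next.ServeHTTP(rec, r)

        // Store only successful responses in Redis (1 hour TTL)
        if rec.Code == http.StatusOK {
            redisClient.Set(r.Context(), key, rec.Body.Bytes(), time.Hour)
        }

        // Replay the captured headers, status, and body to the client
        for name, values := range rec.Header() {
            w.Header()[name] = values
        }
        w.WriteHeader(rec.Code)
        w.Write(rec.Body.Bytes())
    })
}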

2. Deploy Redis:

services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

Complementary Services

MusicBrainz Integration

Use case: Resolve MBIDs to ISRCs, then lookup in Music Metadata API

Flow:

1. Query MusicBrainz for recording by MBID
   ↓
2. Extract ISRC from MusicBrainz response
   ↓
3. Lookup ISRC in Music Metadata API
   ↓
4. Merge metadata (MusicBrainz credits + Spotify-style data)

Example:

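import requests

MBID = "abc-123"  # placeholder; real MBIDs are UUIDs
# MusicBrainz asks clients to send an identifying User-Agent (placeholder value)
HEADERS = {"User-Agent": "metadata-example/0.1 (you@example.com)"}

# Step 1: Get ISRCs and artist credits from MusicBrainz
mb_url = f"https://musicbrainz.org/ws/2/recording/{MBID}?fmt=json&inc=isrcs+artist-credits"
mb_response = requests.get(mb_url, headers=HEADERS, timeout=10)
mb_response.raise_for_status()
recording = mb_response.json()
if not recording.get('isrcs'):
    raise SystemExit(f"No ISRC registered for recording {MBID}")
isrc = recording['isrcs'][0]

# Step 2: Lookup in Music Metadata API
mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
mm_response = requests.get(mm_url, timeout=10).json()

# Step 3: Merge metadata
merged = {
    "mbid": MBID,
    "isrc": isrc,
    "title": mm_response['name'],
    "popularity": mm_response['popularity'],
    "credits": recording['artist-credit']
}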

AcoustID Integration

Use case: Fingerprint audio files, resolve to ISRCs

Flow:

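1. Generate audio fingerprint (chromaprint)
   ↓
2. Query AcoustID API with fingerprint
   ↓
3. Extract recording MBID from AcoustID response, then resolve it to an ISRC via MusicBrainz (AcoustID returns MusicBrainz IDs, not ISRCs)
   ↓
4. Lookup ISRC in Music Metadata API
   ↓
5. Tag audio file with metadata

Example:

import acoustid
import mutagen
import requests

API_KEY = "your-acoustid-api-key"  # placeholder
HEADERS = {"User-Agent": "metadata-example/0.1 (you@example.com)"}  # placeholder

# Step 1: Fingerprint audio file (requires chromaprint)
duration, fingerprint = acoustid.fingerprint_file('song.mp3')

# Step 2: Query AcoustID for matching MusicBrainz recording IDs
response = acoustid.lookup(API_KEY, fingerprint, duration, meta='recordingids')
mbid = response['results'][0]['recordings'][0]['id']

# Step 3: Resolve the recording MBID to an ISRC via MusicBrainz
mb_url = f"https://musicbrainz.org/ws/2/recording/{mbid}?fmt=json&inc=isrcs"
isrc = requests.get(mb_url, headers=HEADERS, timeout=10).json()['isrcs'][0]

# Step 4: Lookup in Music Metadata API
mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
metadata = requests.get(mm_url, timeout=10).json()

# Step 5: Tag file (easy=True exposes simple keys such as 'title' for MP3 files)
audio = mutagen.File('song.mp3', easy=True)
audio['title'] = metadata['name']
audio['artist'] = metadata['artists'][0]['name']
audio.save()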

Spotify Web API Integration

Use case: Get real-time data, then fall back to Music Metadata API

Flow:

1. Try Spotify Web API (requires OAuth)
   ↓
2. If rate limited or unavailable, fall back to Music Metadata API
   ↓
3. Return cached/static data from Music Metadata API

Example:

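import requests

# spotify_client is assumed to be an authenticated client such as spotipy.Spotify
def get_track_metadata(isrc):
    try:
        # Try Spotify Web API (real-time)
        spotify_data = spotify_client.search(q=f"isrc:{isrc}", type="track")
        return spotify_data['tracks']['items'][0]
    except Exception:
        # Fallback to Music Metadata API (static); also reached when Spotify has no match
        mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
        response = requests.get(mm_url, timeout=10)
        response.raise_for_status()
        return response.json()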

Deployment Integrations

Docker Compose

Use case: Local development, simple deployments

Example:

version: '3.8'
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    command: ["-db", "/data/main_database.sqlite3"]
    restart: unless-stopped
    
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - metadata-api

Kubernetes

Use case: Production deployments, auto-scaling

See Kubernetes Integration section above

Cloud Platforms

AWS ECS:

{
  "family": "metadata-api",
  "containerDefinitions": [{
    "name": "api",
    "image": "ghcr.io/aunali321/music-metadata-api:latest",
    "memory": 4096,
    "cpu": 1024,
    "portMappings": [{"containerPort": 8080}],
    "command": ["-db", "/data/main_database.sqlite3"],
    "mountPoints": [{
      "sourceVolume": "database",
      "containerPath": "/data",
      "readOnly": true
    }]
  }],
  "volumes": [{
    "name": "database",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-12345678"
    }
  }]
}

Google Cloud Run:

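apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: metadata-api
spec:
  template:
    metadata:
      annotations:
        # Volume mounts require the second-generation execution environment
        run.googleapis.com/execution-environment: gen2
    spec:
      containers:
      - image: ghcr.io/aunali321/music-metadata-api:latest
        args: ["-db", "/data/main_database.sqlite3"]
        volumeMounts:
        - name: database
          mountPath: /data
      volumes:
      - name: database
        # Cloud Run does not support gcePersistentDisk volumes; an NFS share
        # (for example Filestore) is one option. Server and path are placeholders.
        nfs:
          server: 10.0.0.2
          path: /metadata-db
          readOnly: true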

No Integration Advantages

Simplicity

Benefits:

  • No external service dependencies
  • No network calls (faster, more reliable)
  • No authentication complexity
  • No API rate limits (external)

Tradeoffs:

  • No real-time data
  • No automatic updates
  • No distributed features

Reliability

Benefits:

  • No cascading failures (no external dependencies)
  • No network timeouts (all local)
  • No third-party outages
  • Predictable performance

Tradeoffs:

  • Single point of failure (database file)
  • No redundancy (unless replicated)

Performance

Benefits:

  • No network latency (local database)
  • No API rate limits (self-imposed only)
  • Batch queries optimized (7 queries vs 2,800)

Tradeoffs:

  • Database size (216GB per instance)
  • Memory usage (2.5GB minimum)

Cost

Benefits:

  • No API subscription fees
  • No per-request charges
  • No data transfer costs (local)

Tradeoffs:

  • Storage costs (216GB)
  • Compute costs (self-hosted)

Future Integration Opportunities

Potential Additions

Authentication:

  • OAuth 2.0 provider (Keycloak, Auth0)
  • API key management (custom or Kong)

Monitoring:

  • Prometheus metrics exporter
  • OpenTelemetry tracing
  • Structured logging to Elasticsearch

Caching:

  • Redis for hot data
  • HTTP caching headers
  • CDN for static responses

Database:

  • PostgreSQL for writable data
  • Read replicas for scaling
  • Full-text search (Elasticsearch, Meilisearch)

Message Queue:

  • Background job processing (Celery, Sidekiq)
  • Event streaming (Kafka)

Configuration:

  • Environment variables
  • Config files (YAML, TOML)
  • Secrets management (Vault)

Integration Complexity

Current: Zero integrations (simplest possible)

With additions: Each integration adds:

  • Configuration complexity
  • Deployment dependencies
  • Failure modes
  • Maintenance burden

Recommendation: Only add integrations when necessary for specific use cases.