# Music Metadata API - Integrations

## Integration Overview

Music Metadata API is a **fully self-contained service** with zero external integrations at runtime. All data is served from pre-populated SQLite databases. There are no external API calls, no authentication services, and no third-party services; the only code dependencies are the Go standard library and two offline Go modules.

```
┌─────────────────────────────────────────────────────────────┐
│                     Music Metadata API                      │
│                  (Self-Contained Service)                   │
│                                                             │
│   ┌────────────┐   ┌────────────┐   ┌────────────┐          │
│   │    HTTP    │   │  Database  │   │   Models   │          │
│   │  Handlers  │ → │   Layer    │ → │   Layer    │          │
│   └────────────┘   └────────────┘   └────────────┘          │
│                          ↓                                  │
│                   ┌─────────────┐                           │
│                   │   SQLite    │                           │
│                   │  Databases  │                           │
│                   │   (216GB)   │                           │
│                   └─────────────┘                           │
└─────────────────────────────────────────────────────────────┘
                               │
                               │ NO external calls
                               ↓
                        (All data local)
```

## Runtime Dependencies

### Go Standard Library

**Packages used:**

- `net/http` - HTTP server and routing
- `database/sql` - Database interface
- `encoding/json` - JSON serialization
- `log/slog` - Structured logging
- `context` - Request context and timeouts
- `sync` - Concurrency primitives (RWMutex)
- `flag` - CLI argument parsing
- `os/signal` - Graceful shutdown

**No external HTTP calls:** The server makes no outbound HTTP requests; everything outside the two external modules is implemented with the standard library.
### External Go Modules

**modernc.org/sqlite v1.34.4**

- Pure Go SQLite driver
- No CGO required
- No C dependencies
- No external network calls

**golang.org/x/time v0.14.0**

- Rate limiting (token bucket)
- No external network calls
- Pure algorithm implementation

**Total external dependencies:** 2 direct modules (both offline)

## Data Sources

### Pre-Populated Databases

**Source:** User must obtain databases separately (not included in repository)

**Database files:**

- `main_database.sqlite3` (~117GB)
- `track_files.sqlite3` (~99GB)

**Provenance:** Unclear (repository states "not affiliated with Spotify")

**Update mechanism:** None (static snapshot)

**Implications:**

- No real-time data sync
- No automatic updates
- User responsible for obtaining databases
- Legal status uncertain

### No External APIs

**What's NOT integrated:**

- Spotify Web API (no OAuth, no API calls)
- MusicBrainz API (no lookups)
- Last.fm API (no scrobbling)
- Discogs API (no catalog queries)
- AcoustID API (no fingerprinting)
- Cover Art Archive (no image fetching)

**All data is served from the local databases.**

## Browser-Side Dependencies

### Swagger UI (Documentation Only)

**Endpoint:** `/docs`

**External resources loaded by browser:** the Swagger UI stylesheet and JavaScript bundle, fetched from unpkg.com when `/docs` is opened.

**Characteristics:**

- Loaded client-side (the browser fetches them)
- The server makes no requests to unpkg.com
- Usually works offline after the first load, as long as the assets stay in the browser cache
- Only affects the `/docs` endpoint, not API functionality

**Implications:**

- Requires an internet connection for the first `/docs` visit
- Subsequent visits can work offline if the assets are cached
- API endpoints work without internet

### Image URLs (External CDN)

**Image hosting:** Spotify CDN (`i.scdn.co`)

**Example URLs:**

```
https://i.scdn.co/image/ab67616d0000b273ce4f1737bc8a646c8c4bd25a
https://i.scdn.co/image/af2b8e57f6d7b5d1c9a5f3e8d4c2b1a0e9f8d7c6
```

**Characteristics:**

- API returns URLs (not image data)
- Client responsible for fetching images
- Server never fetches images
- Images hosted externally (not by API)
**Implications:**

- Image availability depends on Spotify CDN
- No image caching by API
- Clients need internet to display images
- Broken links possible if Spotify removes images

## No Authentication Integration

### No OAuth

**What's missing:**

- No OAuth 2.0 flow
- No token validation
- No user authentication
- No API keys

**Implications:**

- Public API (anyone who can reach it can query it)
- No usage tracking per user
- No quota enforcement per user
- No access control

**Workarounds:**

- Deploy behind a reverse proxy with auth (nginx, Caddy)
- Use an API gateway (Kong, Tyk)
- Implement custom middleware

### No Authorization

**What's missing:**

- No role-based access control (RBAC)
- No permission system
- No resource ownership

**Implications:**

- All data accessible to all clients
- No private/public data distinction
- No user-specific data

## No Monitoring Integration

### No Metrics Exporters

**What's missing:**

- No Prometheus metrics
- No StatsD integration
- No OpenTelemetry
- No custom metrics endpoint

**Implications:**

- No visibility into request rates
- No error rate tracking
- No latency percentiles
- No resource usage metrics

**Workarounds:**

- Parse logs for metrics
- Use reverse proxy metrics (nginx, Envoy)
- Implement custom metrics middleware

### No Distributed Tracing

**What's missing:**

- No Jaeger integration
- No Zipkin support
- No trace context propagation

**Implications:**

- Can't trace requests across services
- No per-request performance profiling
- No bottleneck identification

**Workarounds:**

- Add custom tracing middleware
- Use APM tools (Datadog, New Relic)

### No Log Aggregation

**What's missing:**

- No Elasticsearch integration
- No Splunk forwarding
- No CloudWatch Logs
- No structured log shipping

**Logging:** Go stdlib `log/slog` to stdout

**Implications:**

- Logs only in container/process stdout
- No centralized log storage
- No log search/analysis

**Workarounds:**

- Docker log drivers (json-file, syslog, fluentd)
- Kubernetes log collectors (Fluentd, Filebeat)
- Redirect stdout to a log aggregator

## No Message Queue Integration

**What's missing:**

- No RabbitMQ
- No Kafka
- No Redis Pub/Sub
- No AWS SQS

**Implications:**

- Synchronous request/response only
- No async job processing
- No event streaming
- No background tasks

**Use case:** All queries are processed synchronously, which is acceptable for a read-only API.

## No Cache Integration

### No External Cache

**What's missing:**

- No Redis
- No Memcached
- No Varnish

**Caching:** SQLite page cache only (64MB per connection)

**Implications:**

- No shared cache across instances
- No cache invalidation strategy
- No cache warming
- Cold start on each instance

**Workarounds:**

- Add a Redis layer for hot data
- Use HTTP caching headers (not implemented)
- Deploy a CDN in front of the API

### No HTTP Caching

**What's missing:**

- No `Cache-Control` headers
- No `ETag` support
- No `Last-Modified` headers

**Implications:**

- Clients can't cache responses
- Repeated requests hit the database
- No bandwidth savings

**Workarounds:**

- Add caching middleware
- Use a reverse proxy with caching (Varnish, nginx)

## No Database Replication

**What's missing:**

- No primary-replica replication
- No read replicas
- No database clustering

**Database:** One local copy of the SQLite files per instance

**Implications:**

- Each instance has a full database copy (216GB)
- No shared database across instances
- Horizontal scaling requires a full database per instance

**Workarounds:**

- Copy the databases freely: they are read-only, so copies never diverge
- Use a network filesystem (NFS, EFS) for shared access
- Replicate databases to multiple instances

## No Service Discovery

**What's missing:**

- No Consul integration
- No etcd
- No Kubernetes service discovery
- No DNS-based discovery

**Deployment:** Static configuration (IP:port)

**Implications:**

- Manual load balancer configuration
- No dynamic scaling
- No health-based routing

**Workarounds:**

- Use Kubernetes services (automatic discovery)
- Use cloud load balancers (AWS ALB, GCP LB)
- Use a service mesh (Istio, Linkerd)

## No Configuration Management

### No External Config

**What's missing:**

- No Consul KV
- No etcd
- No AWS Parameter Store
- No HashiCorp Vault

**Configuration:** CLI flags only (`-db`, `-addr`)

**Implications:**

- All config at startup
- No dynamic reconfiguration
- No secrets management
- Hardcoded timeouts/limits

**Workarounds:**

- Use environment variables (requires code changes)
- Mount config files (requires code changes)
- Use init containers to generate config

### No Secrets Management

**What's missing:**

- No Vault integration
- No AWS Secrets Manager
- No Kubernetes secrets
- No encrypted config

**Secrets:** None required (no authentication)

**Implications:**

- No sensitive data to protect
- No credential rotation
- No encryption at rest

**Future consideration:** If authentication is added, integrate a secrets manager.

## Integration Patterns

### Reverse Proxy Integration

**Use case:** Add authentication, CORS, caching, SSL

**Example with nginx:**

```nginx
# proxy_cache_path is only valid in the http context, so it sits outside the
# server block (this file is assumed to be included from inside http { }).
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m;

upstream metadata_api {
    server localhost:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    # CORS headers
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Methods "GET, POST, OPTIONS";

    # Caching (uses the zone declared above)
    proxy_cache api_cache;
    proxy_cache_valid 200 1h;

    # Authentication
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://metadata_api;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

### API Gateway Integration

**Use case:** Rate limiting, authentication, analytics

**Example with Kong (declarative config):**

```yaml
_format_version: "3.0"

services:
  - name: metadata-api
    url: http://localhost:8080
    routes:
      - name: metadata-routes
        paths:
          - /
    plugins:
      - name: rate-limiting
        config:
          minute: 1000
          policy: local
      - name: key-auth
        config:
          key_names:
            - apikey
      - name: prometheus
        config:
          per_consumer: true
```

### Load Balancer Integration
**Use case:** Distribute traffic across multiple instances

**Example with HAProxy:**

```
frontend metadata_frontend
    mode http
    bind *:80
    default_backend metadata_backend

backend metadata_backend
    mode http
    balance roundrobin
    option httpchk GET /health
    server api1 10.0.1.10:8080 check
    server api2 10.0.1.11:8080 check
    server api3 10.0.1.12:8080 check
```

### Kubernetes Integration

**Use case:** Container orchestration, auto-scaling

**Example deployment:**

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metadata-api
  template:
    metadata:
      labels:
        app: metadata-api
    spec:
      containers:
        - name: api
          image: ghcr.io/aunali321/music-metadata-api:latest
          args: ["-db", "/data/main_database.sqlite3"]
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: database
              mountPath: /data
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          resources:
            requests:
              memory: "4Gi"
              cpu: "1"
            limits:
              memory: "8Gi"
              cpu: "2"
      volumes:
        - name: database
          persistentVolumeClaim:
            claimName: metadata-db-pvc
            # Three replicas share this claim, so it needs ReadOnlyMany access
            readOnly: true
---
apiVersion: v1
kind: Service
metadata:
  name: metadata-api
spec:
  selector:
    app: metadata-api
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
```

### Monitoring Integration

**Use case:** Metrics, logs, traces

**Example with Prometheus + Grafana:**

**1. Add metrics exporter (custom middleware):**

```go
// Hypothetical package; not implemented in current codebase
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	requestsTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "api_requests_total", Help: "Total HTTP requests."},
		[]string{"method", "endpoint", "status"},
	)
	requestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{Name: "api_request_duration_seconds", Help: "HTTP request latency in seconds."},
		[]string{"method", "endpoint"},
	)
)

func init() {
	// Collectors only appear on /metrics once registered; the endpoint itself
	// is served by mounting promhttp.Handler() at /metrics.
	prometheus.MustRegister(requestsTotal, requestDuration)
}
```

**2. Scrape metrics with Prometheus:**

```yaml
scrape_configs:
  - job_name: 'metadata-api'
    static_configs:
      - targets: ['localhost:8080']
```

**3. Visualize in Grafana:**

- Request rate dashboard
- Error rate dashboard
- Latency percentiles (p50, p95, p99)

### Logging Integration

**Use case:** Centralized log aggregation

**Example with Fluentd:**

**1. Configure Docker logging driver:**

```yaml
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224
        tag: metadata-api
```

**2. Fluentd configuration:**

```
<source>
  @type forward
  port 24224
</source>

<match metadata-api>
  @type elasticsearch
  host elasticsearch
  port 9200
  index_name metadata-api
  type_name _doc
</match>
```

### Caching Integration

**Use case:** Reduce database load, improve latency

**Example with Redis:**

**1. Add Redis middleware (custom implementation):**

```go
// Hypothetical package; not implemented in current codebase.
// Assumes the github.com/redis/go-redis/v9 client.
package middleware

import (
	"net/http"
	"net/http/httptest"
	"time"

	"github.com/redis/go-redis/v9"
)

func cacheMiddleware(rdb *redis.Client, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		// Key on path + query string so different queries don't collide
		key := "cache:" + r.URL.RequestURI()

		// Cache hit: serve the stored JSON body
		if cached, err := rdb.Get(ctx, key).Bytes(); err == nil {
			w.Header().Set("Content-Type", "application/json")
			w.Write(cached)
			return
		}

		// Cache miss: capture the handler's response
		rec := httptest.NewRecorder()
		next.ServeHTTP(rec, r)

		// Only cache successful responses (1 hour TTL)
		if rec.Code == http.StatusOK {
			rdb.Set(ctx, key, rec.Body.Bytes(), time.Hour)
		}

		// Replay headers, status, and body to the client
		for k, v := range rec.Header() {
			w.Header()[k] = v
		}
		w.WriteHeader(rec.Code)
		w.Write(rec.Body.Bytes())
	})
}
```

**2. Deploy Redis:**

```yaml
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
```

## Complementary Services

### MusicBrainz Integration

**Use case:** Resolve MBIDs to ISRCs, then look them up in Music Metadata API

**Flow:**

```
1. Query MusicBrainz for recording by MBID
   ↓
2. Extract ISRC from MusicBrainz response
   ↓
3. Lookup ISRC in Music Metadata API
   ↓
4. Merge metadata (MusicBrainz credits + Spotify-style data)
```

**Example:**

```python
import requests

# MusicBrainz asks clients to send a descriptive User-Agent (placeholder below)
HEADERS = {"User-Agent": "metadata-example/0.1 (you@example.com)"}

# Step 1: Get ISRC (and artist credits) from MusicBrainz
mbid = "abc-123"
mb_url = f"https://musicbrainz.org/ws/2/recording/{mbid}?fmt=json&inc=isrcs+artist-credits"
mb_response = requests.get(mb_url, headers=HEADERS).json()
isrc = mb_response['isrcs'][0]

# Step 2: Lookup in Music Metadata API
mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
mm_response = requests.get(mm_url).json()

# Step 3: Merge metadata
merged = {
    "mbid": mbid,
    "isrc": isrc,
    "title": mm_response['name'],
    "popularity": mm_response['popularity'],
    "credits": mb_response['artist-credit'],
}
```

### AcoustID Integration

**Use case:** Fingerprint audio files, resolve to ISRCs

**Flow:**

```
1. Generate audio fingerprint (Chromaprint)
   ↓
2. Query AcoustID API with fingerprint
   ↓
3. Take the MusicBrainz recording ID from the AcoustID response and fetch its ISRC from MusicBrainz
   ↓
4. Lookup ISRC in Music Metadata API
   ↓
5. Tag audio file with metadata
```

**Example:**

```python
import acoustid
import mutagen
import requests

API_KEY = "your-acoustid-api-key"  # placeholder
HEADERS = {"User-Agent": "metadata-example/0.1 (you@example.com)"}

# Step 1: Fingerprint audio file (uses Chromaprint)
duration, fingerprint = acoustid.fingerprint_file('song.mp3')

# Step 2: Query AcoustID for matching MusicBrainz recordings
response = acoustid.lookup(API_KEY, fingerprint, duration, meta='recordings')
recording_id = response['results'][0]['recordings'][0]['id']

# Step 3: Resolve the recording's ISRC via MusicBrainz
mb_url = f"https://musicbrainz.org/ws/2/recording/{recording_id}?fmt=json&inc=isrcs"
isrc = requests.get(mb_url, headers=HEADERS).json()['isrcs'][0]

# Step 4: Lookup in Music Metadata API
mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
metadata = requests.get(mm_url).json()

# Step 5: Tag file (easy=True exposes simple string tags for MP3)
audio = mutagen.File('song.mp3', easy=True)
audio['title'] = metadata['name']
audio['artist'] = metadata['artists'][0]['name']
audio.save()
```

### Spotify Web API Integration

**Use case:** Get real-time data from Spotify, falling back to Music Metadata API

**Flow:**

```
1. Try Spotify Web API (requires OAuth)
   ↓
2. If rate limited or unavailable, fall back to Music Metadata API
   ↓
3. Return cached/static data from Music Metadata API
```

**Example:**

```python
import requests

# spotify_client is assumed to be an authenticated spotipy.Spotify instance
def get_track_metadata(isrc):
    try:
        # Try Spotify Web API (real-time)
        spotify_data = spotify_client.search(q=f"isrc:{isrc}", type="track")
        return spotify_data['tracks']['items'][0]
    except Exception:
        # Fall back to Music Metadata API (static); also covers "no match" (IndexError)
        mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
        return requests.get(mm_url).json()
```

## Deployment Integrations

### Docker Compose

**Use case:** Local development, simple deployments

**Example:**

```yaml
version: '3.8'

services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    command: ["-db", "/data/main_database.sqlite3"]
    restart: unless-stopped

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - metadata-api
```

### Kubernetes

**Use case:** Production deployments, auto-scaling

See the Kubernetes Integration section above.

### Cloud Platforms

**AWS ECS:**

```json
{
  "family": "metadata-api",
  "containerDefinitions": [{
    "name": "api",
    "image": "ghcr.io/aunali321/music-metadata-api:latest",
    "memory": 4096,
    "cpu": 1024,
    "portMappings": [{"containerPort": 8080}],
    "command": ["-db", "/data/main_database.sqlite3"],
    "mountPoints": [{
      "sourceVolume": "database",
      "containerPath": "/data",
      "readOnly": true
    }]
  }],
  "volumes": [{
    "name": "database",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-12345678"
    }
  }]
}
```

**Google Cloud Run:**

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: metadata-api
spec:
  template:
    metadata:
      annotations:
        # NFS volume mounts require the second-generation execution environment
        run.googleapis.com/execution-environment: gen2
    spec:
      containers:
        - image: ghcr.io/aunali321/music-metadata-api:latest
          args: ["-db", "/data/main_database.sqlite3"]
          volumeMounts:
            - name: database
              mountPath: /data
      volumes:
        - name: database
          # Cloud Run does not support gcePersistentDisk volumes; an NFS share
          # (e.g. Filestore) is one option. Server and path are placeholders.
          nfs:
            server: 10.0.0.2
            path: /metadata
            readOnly: true
```

## No Integration Advantages

### Simplicity

**Benefits:**

- No external service dependencies
- No network calls (faster, more reliable)
- No authentication complexity
- No API rate limits (external)

**Tradeoffs:**

- No real-time data
- No automatic updates
- No distributed features

### Reliability

**Benefits:**

- No cascading failures (no external dependencies)
- No network timeouts (all local)
- No third-party outages
- Predictable performance

**Tradeoffs:**

- Single point of failure (database file)
- No redundancy (unless replicated)

### Performance

**Benefits:**

- No network latency (local database)
- No API rate limits (self-imposed only)
- Batch queries optimized (7 queries vs 2,800)

**Tradeoffs:**

- Database size (216GB per instance)
- Memory usage (2.5GB minimum)

### Cost

**Benefits:**

- No API subscription fees
- No per-request charges
- No data transfer costs (local)

**Tradeoffs:**

- Storage costs (216GB)
- Compute costs (self-hosted)

## Future Integration Opportunities

### Potential Additions

**Authentication:**

- OAuth 2.0 provider (Keycloak, Auth0)
- API key management (custom or Kong)

**Monitoring:**

- Prometheus metrics exporter
- OpenTelemetry tracing
- Structured logging to Elasticsearch

**Caching:**

- Redis for hot data
- HTTP caching headers
- CDN for static responses

**Database:**

- PostgreSQL for writable data
- Read replicas for scaling
- Full-text search (Elasticsearch, Meilisearch)

**Message Queue:**

- Background job processing (Celery, Sidekiq)
- Event streaming (Kafka)

**Configuration:**

- Environment variables
- Config files (YAML, TOML)
- Secrets management (Vault)

### Integration Complexity

**Current:** Zero integrations (simplest possible)

**With additions:** Each integration adds:

- Configuration complexity
- Deployment dependencies
- Failure modes
- Maintenance burden

**Recommendation:** Only add integrations when necessary for specific use cases.