
# Music Metadata API - Integrations

## Integration Overview

Music Metadata API is a **fully self-contained service** with no external service integrations at runtime. All data is served from pre-populated SQLite databases: the server makes no outbound API calls, uses no authentication service, and depends on only two third-party Go modules (listed below), neither of which touches the network.

```
┌─────────────────────────────────────────────────────────────┐
│                    Music Metadata API                        │
│                   (Self-Contained Service)                   │
│                                                              │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐              │
│  │   HTTP     │  │  Database  │  │   Models   │              │
│  │  Handlers  │→ │   Layer    │→ │   Layer    │              │
│  └────────────┘  └────────────┘  └────────────┘              │
│                         ↓                                    │
│                  ┌─────────────┐                             │
│                  │   SQLite    │                             │
│                  │  Databases  │                             │
│                  │  (216GB)    │                             │
│                  └─────────────┘                             │
└─────────────────────────────────────────────────────────────┘
                            │
                            │ NO external calls
                            ↓
                    (All data local)
```

## Runtime Dependencies

### Go Standard Library

**Packages used:**
- `net/http` - HTTP server and routing
- `database/sql` - Database interface
- `encoding/json` - JSON serialization
- `log/slog` - Structured logging
- `context` - Request context and timeouts
- `sync` - Concurrency primitives (RWMutex)
- `flag` - CLI argument parsing
- `os/signal` - Graceful shutdown

**No outbound HTTP calls:** The server only accepts requests; it never initiates network connections to other services.

### External Go Modules

**modernc.org/sqlite v1.34.4**
- Pure Go SQLite driver
- No CGO required
- No C dependencies
- No external network calls

**golang.org/x/time v0.14.0**
- Rate limiting (token bucket)
- No external network calls
- Pure algorithm implementation

**Total external dependencies:** 2 Go modules (neither makes network calls)

## Data Sources

### Pre-Populated Databases

**Source:** User must obtain databases separately (not included in repository)

**Database files:**
- `main_database.sqlite3` (~117GB)
- `track_files.sqlite3` (~99GB)

**Provenance:** Unclear (repository states "not affiliated with Spotify")

**Update mechanism:** None (static snapshot)

**Implications:**
- No real-time data sync
- No automatic updates
- User responsible for obtaining databases
- Legal status uncertain

### No External APIs

**What's NOT integrated:**
- Spotify Web API (no OAuth, no API calls)
- MusicBrainz API (no lookups)
- Last.fm API (no scrobbling)
- Discogs API (no catalog queries)
- AcoustID API (no fingerprinting)
- Cover Art Archive (no image fetching)

**All data served from local databases.**

## Browser-Side Dependencies

### Swagger UI (Documentation Only)

**Endpoint:** `/docs`

**External resources loaded by browser:**
```html
<!-- Loaded from unpkg.com CDN -->
<script src="https://unpkg.com/swagger-ui-dist@5/swagger-ui-bundle.js"></script>
<link rel="stylesheet" href="https://unpkg.com/swagger-ui-dist@5/swagger-ui.css" />
```

**Characteristics:**
- Loaded client-side (browser fetches)
- Server doesn't make requests to unpkg.com
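- May work offline after the first load, but only if the browser cached the assets (not guaranteed: `@5` is a version range, not a pinned version)
- Only affects `/docs` endpoint (not API functionality)

**Implications:**
- `/docs` needs internet access to unpkg.com, at least on the first visit
- Later visits work offline only while the browser cache still holds the assets
- API endpoints work without internet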

### Image URLs (External CDN)

**Image hosting:** Spotify CDN (i.scdn.co)

**Example URLs:**
```
https://i.scdn.co/image/ab67616d0000b273ce4f1737bc8a646c8c4bd25a
https://i.scdn.co/image/af2b8e57f6d7b5d1c9a5f3e8d4c2b1a0e9f8d7c6
```

**Characteristics:**
- API returns URLs (not image data)
- Client responsible for fetching images
- Server never fetches images
- Images hosted externally (not by API)

**Implications:**
- Image availability depends on Spotify CDN
- No image caching by API
- Clients need internet to display images
- Broken links possible if Spotify removes images

## No Authentication Integration

### No OAuth

**What's missing:**
- No OAuth 2.0 flow
- No token validation
- No user authentication
- No API keys

**Implications:**
- Public API (anyone can query)
- No usage tracking per user
- No quota enforcement per user
- No access control

**Workarounds:**
- Deploy behind reverse proxy with auth (nginx, Caddy)
- Use API gateway (Kong, Tyk)
- Implement custom middleware

### No Authorization

**What's missing:**
- No role-based access control (RBAC)
- No permission system
- No resource ownership

**Implications:**
- All data accessible to all clients
- No private/public data distinction
- No user-specific data

## No Monitoring Integration

### No Metrics Exporters

**What's missing:**
- No Prometheus metrics
- No StatsD integration
- No OpenTelemetry
- No custom metrics endpoint

**Implications:**
- No visibility into request rates
- No error rate tracking
- No latency percentiles
- No resource usage metrics

**Workarounds:**
- Parse logs for metrics
- Use reverse proxy metrics (nginx, Envoy)
- Implement custom metrics middleware

### No Distributed Tracing

**What's missing:**
- No Jaeger integration
- No Zipkin support
- No trace context propagation

**Implications:**
- Can't trace requests across services
- No performance profiling
- No bottleneck identification

**Workarounds:**
- Add custom tracing middleware
- Use APM tools (Datadog, New Relic)

### No Log Aggregation

**What's missing:**
- No Elasticsearch integration
- No Splunk forwarding
- No CloudWatch Logs
- No structured log shipping

**Logging:** Go stdlib `log/slog` to stdout

**Implications:**
- Logs only in container/process stdout
- No centralized log storage
- No log search/analysis

**Workarounds:**
- Docker log drivers (json-file, syslog, fluentd)
- Kubernetes log collectors (Fluentd, Filebeat)
- Redirect stdout to log aggregator

## No Message Queue Integration

**What's missing:**
- No RabbitMQ
- No Kafka
- No Redis Pub/Sub
- No AWS SQS

**Implications:**
- Synchronous request/response only
- No async job processing
- No event streaming
- No background tasks

**Assessment:** Processing every query synchronously is acceptable for a read-only API.

## No Cache Integration

### No External Cache

**What's missing:**
- No Redis
- No Memcached
- No Varnish

**Caching:** SQLite page cache only (64MB per connection)

**Implications:**
- No shared cache across instances
- No cache invalidation strategy
- No cache warming
- Cold start on each instance

**Workarounds:**
- Add Redis layer for hot data
- Use HTTP caching headers (not implemented)
- Deploy CDN in front of API

### No HTTP Caching

**What's missing:**
- No `Cache-Control` headers
- No `ETag` support
- No `Last-Modified` headers

**Implications:**
- Clients can't cache responses
- Repeated requests hit database
- No bandwidth savings

**Workarounds:**
- Add caching middleware
- Use reverse proxy with caching (Varnish, nginx)

## No Database Replication

**What's missing:**
- No master-slave replication
- No read replicas
- No database clustering

**Database:** Local SQLite files (`main_database.sqlite3`, `track_files.sqlite3`) on each instance

**Implications:**
- Each instance has full database copy (216GB)
- No shared database across instances
- Horizontal scaling requires full database per instance

**Workarounds:**
- Read-only databases safe to copy
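- Use a network filesystem (NFS, EFS) for shared read-only access (SQLite's file locking is unreliable on network filesystems, so never write through one)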
- Replicate databases to multiple instances
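Because the snapshot is never written, each copy can be opened strictly read-only. SQLite's URI filename parameters `mode=ro` and `immutable=1` open the file read-only and, with `immutable`, skip all locking and change detection, which is what makes read-only use over a shared filesystem tolerable. A sketch of building such a DSN; whether the service's current connection string can be swapped for this is an assumption to check against the modernc.org/sqlite documentation.

```go
package main

import "net/url"

// readOnlyDSN turns a file path into a SQLite URI filename that opens the
// database read-only and immutable (no locking, no change detection).
// Only safe because the snapshot never changes while the service runs.
func readOnlyDSN(path string) string {
	// EscapedPath percent-encodes characters such as spaces, '?' and '#'
	escaped := (&url.URL{Path: path}).EscapedPath()
	return "file:" + escaped + "?mode=ro&immutable=1"
}
```

Usage would be along the lines of `sql.Open("sqlite", readOnlyDSN("/data/main_database.sqlite3"))`, `"sqlite"` being the driver name modernc.org/sqlite registers.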

## No Service Discovery

**What's missing:**
- No Consul integration
- No etcd
- No Kubernetes service discovery
- No DNS-based discovery

**Deployment:** Static configuration (IP:port)

**Implications:**
- Manual load balancer configuration
- No dynamic scaling
- No built-in health-based routing (an external load balancer must poll `/health`)

**Workarounds:**
- Use Kubernetes services (automatic discovery)
- Use cloud load balancers (AWS ALB, GCP LB)
- Use service mesh (Istio, Linkerd)

## No Configuration Management

### No External Config

**What's missing:**
- No Consul KV
- No etcd
- No AWS Parameter Store
- No HashiCorp Vault

**Configuration:** CLI flags only (`-db`, `-addr`)

**Implications:**
- All config at startup
- No dynamic reconfiguration
- No secrets management
- Hardcoded timeouts/limits

**Workarounds:**
- Use environment variables (requires code changes)
- Mount config files (requires code changes)
- Use init containers to generate config

### No Secrets Management

**What's missing:**
- No Vault integration
- No AWS Secrets Manager
- No Kubernetes secrets
- No encrypted config

**Secrets:** None required (no authentication)

**Implications:**
- No sensitive data to protect
- No credential rotation
- No encryption at rest

**Future consideration:** If adding authentication, integrate secrets manager

## Integration Patterns

### Reverse Proxy Integration

**Use case:** Add authentication, CORS, caching, SSL

**Example with nginx:**
```nginx
# proxy_cache_path is only valid in the http context (outside any server block)
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m;

upstream metadata_api {
    server localhost:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    # CORS headers
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Methods "GET, POST, OPTIONS";

    # Caching (uses the zone declared above)
    proxy_cache api_cache;
    proxy_cache_valid 200 1h;

    # Authentication
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://metadata_api;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

### API Gateway Integration

**Use case:** Rate limiting, authentication, analytics

**Example with Kong:**
```yaml
_format_version: "3.0"   # required by Kong's declarative (DB-less / decK) format
services:
  - name: metadata-api
    url: http://localhost:8080
    routes:
      - name: metadata-routes
        paths:
          - /
    plugins:
      - name: rate-limiting
        config:
          minute: 1000
          policy: local
      - name: key-auth
        config:
          key_names:
            - apikey
      - name: prometheus
        config:
          per_consumer: true
```

### Load Balancer Integration

**Use case:** Distribute traffic across multiple instances

**Example with HAProxy:**
```
frontend metadata_frontend
    bind *:80
    default_backend metadata_backend

backend metadata_backend
    balance roundrobin
    option httpchk GET /health
    server api1 10.0.1.10:8080 check
    server api2 10.0.1.11:8080 check
    server api3 10.0.1.12:8080 check
```

### Kubernetes Integration

**Use case:** Container orchestration, auto-scaling

**Example deployment:**
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metadata-api
  template:
    metadata:
      labels:
        app: metadata-api
    spec:
      containers:
      - name: api
        image: ghcr.io/aunali321/music-metadata-api:latest
        args: ["-db", "/data/main_database.sqlite3"]
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: database
          mountPath: /data
          readOnly: true
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 30
        resources:
          requests:
            memory: "4Gi"
            cpu: "1"
          limits:
            memory: "8Gi"
            cpu: "2"
      volumes:
      - name: database
        # With replicas scheduled on different nodes, the PVC needs the ReadOnlyMany access mode
        persistentVolumeClaim:
          claimName: metadata-db-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: metadata-api
spec:
  selector:
    app: metadata-api
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer
```

### Monitoring Integration

**Use case:** Metrics, logs, traces

**Example with Prometheus + Grafana:**

**1. Add metrics exporter (custom middleware):**
```go
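// Not implemented in current codebase
package main

import (
    "net/http"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// "endpoint" should be the route pattern, not the raw path, to keep label cardinality low
var (
    requestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{Name: "api_requests_total", Help: "HTTP requests served."},
        []string{"method", "endpoint", "status"},
    )
    requestDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{Name: "api_request_duration_seconds", Help: "HTTP request latency.", Buckets: prometheus.DefBuckets},
        []string{"method", "endpoint"},
    )
)

func init() {
    // Collectors only appear on /metrics once registered
    prometheus.MustRegister(requestsTotal, requestDuration)
}

// registerMetrics serves the default registry at /metrics, Prometheus' default scrape path
func registerMetrics(mux *http.ServeMux) { mux.Handle("/metrics", promhttp.Handler()) }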
```

**2. Scrape metrics with Prometheus:**
```yaml
scrape_configs:
  - job_name: 'metadata-api'
    static_configs:
      - targets: ['localhost:8080']
```

**3. Visualize in Grafana:**
- Request rate dashboard
- Error rate dashboard
- Latency percentiles (p50, p95, p99)

### Logging Integration

**Use case:** Centralized log aggregation

**Example with Fluentd:**

**1. Configure Docker logging driver:**
```yaml
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224
        tag: metadata-api
```

**2. Fluentd configuration:**
```
<source>
  @type forward
  port 24224
</source>

<match metadata-api>
  @type elasticsearch
  host elasticsearch
  port 9200
  index_name metadata-api
  type_name _doc
</match>
```

### Caching Integration

**Use case:** Reduce database load, improve latency

**Example with Redis:**

**1. Add Redis middleware (custom implementation):**
```go
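// Not implemented in current codebase; assumes go-redis v9 (context-aware API)
package main

import (
    "net/http"
    "net/http/httptest"
    "time"

    "github.com/redis/go-redis/v9"
)

var redisClient = redis.NewClient(&redis.Options{Addr: "localhost:6379"})

func cacheMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Key on path + query string so different searches don't collide
        key := "mmapi:" + r.URL.RequestURI()

        // Cache hit: serve the stored body (assumes JSON responses)
        if cached, err := redisClient.Get(r.Context(), key).Bytes(); err == nil {
            w.Header().Set("Content-Type", "application/json")
            w.Write(cached)
            return
        }

        // Cache miss: run the handler against a recorder
        rec := httptest.NewRecorder()
        next.ServeHTTP(rec, r)

        // Only cache successful responses (1 hour TTL)
        if rec.Code == http.StatusOK {
            redisClient.Set(r.Context(), key, rec.Body.Bytes(), time.Hour)
        }

        // Replay headers, status and body to the client
        for k, v := range rec.Header() {
            w.Header()[k] = v
        }
        w.WriteHeader(rec.Code)
        w.Write(rec.Body.Bytes())
    })
}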
```

**2. Deploy Redis:**
```yaml
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
```

## Complementary Services

### MusicBrainz Integration

**Use case:** Resolve MBIDs to ISRCs, then lookup in Music Metadata API

**Flow:**
```
1. Query MusicBrainz for recording by MBID
   ↓
2. Extract ISRC from MusicBrainz response
   ↓
3. Lookup ISRC in Music Metadata API
   ↓
4. Merge metadata (MusicBrainz credits + Spotify-style data)
```

**Example:**
```python
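import requests

MBID = "abc-123"  # placeholder recording MBID
# MusicBrainz asks API clients to identify themselves with a User-Agent (placeholder contact)
HEADERS = {"User-Agent": "metadata-aggregator/0.1 (you@example.com)"}

# Step 1: Get ISRCs and artist credits from MusicBrainz
# (artist-credit is only included when requested via inc=artist-credits)
mb_url = f"https://musicbrainz.org/ws/2/recording/{MBID}?fmt=json&inc=isrcs+artist-credits"
mb_resp = requests.get(mb_url, headers=HEADERS, timeout=10)
mb_resp.raise_for_status()
mb_response = mb_resp.json()
if not mb_response.get("isrcs"):
    raise SystemExit(f"Recording {MBID} has no ISRC")
isrc = mb_response["isrcs"][0]

# Step 2: Lookup in Music Metadata API
mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
mm_response = requests.get(mm_url, timeout=10).json()

# Step 3: Merge metadata
merged = {
    "mbid": MBID,
    "isrc": isrc,
    "title": mm_response["name"],
    "popularity": mm_response["popularity"],
    "credits": mb_response["artist-credit"],
}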

### AcoustID Integration

**Use case:** Fingerprint audio files, resolve to ISRCs

**Flow:**
```
1. Generate audio fingerprint (chromaprint)
   ↓
2. Query AcoustID API with fingerprint
   ↓
3. Take the MusicBrainz recording ID from the AcoustID response
   ↓
4. Resolve the recording's ISRC via MusicBrainz
   ↓
5. Lookup ISRC in Music Metadata API
   ↓
6. Tag audio file with metadata
```

**Example:**
```python
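import acoustid
import mutagen
import requests

API_KEY = "your-acoustid-api-key"  # placeholder
MB_HEADERS = {"User-Agent": "metadata-aggregator/0.1 (you@example.com)"}  # placeholder contact

# Step 1: Fingerprint audio file (needs Chromaprint's fpcalc or libchromaprint)
duration, fingerprint = acoustid.fingerprint_file("song.mp3")

# Step 2: Query AcoustID; the raw response nests recordings under each result
response = acoustid.lookup(API_KEY, fingerprint, duration, meta="recordings")

# Step 3: AcoustID returns MusicBrainz recording IDs, not ISRCs
recording_id = response["results"][0]["recordings"][0]["id"]

# Step 4: Resolve the ISRC through MusicBrainz
mb_url = f"https://musicbrainz.org/ws/2/recording/{recording_id}?fmt=json&inc=isrcs"
isrc = requests.get(mb_url, headers=MB_HEADERS, timeout=10).json()["isrcs"][0]

# Step 5: Lookup in Music Metadata API
mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
metadata = requests.get(mm_url, timeout=10).json()

# Step 6: Tag file (easy=True exposes friendly keys like "title"/"artist")
audio = mutagen.File("song.mp3", easy=True)
audio["title"] = metadata["name"]
audio["artist"] = metadata["artists"][0]["name"]
audio.save()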
```

### Spotify Web API Integration

**Use case:** Prefer real-time data from the Spotify Web API; fall back to Music Metadata API when Spotify is rate limited or unavailable

**Flow:**
```
1. Try Spotify Web API (requires OAuth)
   ↓
2. If rate limited or unavailable, fallback to Music Metadata API
   ↓
3. Return cached/static data from Music Metadata API
```

**Example:**
```python
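import requests
import spotipy  # assumes an already-authenticated spotipy.Spotify client


def get_track_metadata(spotify_client: spotipy.Spotify, isrc: str) -> dict:
    try:
        # Try Spotify Web API (real-time)
        spotify_data = spotify_client.search(q=f"isrc:{isrc}", type="track")
        return spotify_data["tracks"]["items"][0]
    except (spotipy.SpotifyException, requests.RequestException, IndexError):
        # Fallback to Music Metadata API (static snapshot);
        # IndexError covers "no Spotify match". Response shapes may differ.
        mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
        return requests.get(mm_url, timeout=10).json()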
```

## Deployment Integrations

### Docker Compose

**Use case:** Local development, simple deployments

**Example:**
```yaml
version: '3.8'
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    command: ["-db", "/data/main_database.sqlite3"]
    restart: unless-stopped

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - metadata-api
```

### Kubernetes

**Use case:** Production deployments, auto-scaling

**See Kubernetes Integration section above**

### Cloud Platforms

**AWS ECS:**
```json
{
  "family": "metadata-api",
  "containerDefinitions": [{
    "name": "api",
    "image": "ghcr.io/aunali321/music-metadata-api:latest",
    "memory": 4096,
    "cpu": 1024,
    "portMappings": [{"containerPort": 8080}],
    "command": ["-db", "/data/main_database.sqlite3"],
    "mountPoints": [{
      "sourceVolume": "database",
      "containerPath": "/data",
      "readOnly": true
    }]
  }],
  "volumes": [{
    "name": "database",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-12345678"
    }
  }]
}
```

**Google Cloud Run:**
```yaml
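apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: metadata-api
spec:
  template:
    metadata:
      annotations:
        # NFS volumes require the second-generation execution environment
        run.googleapis.com/execution-environment: gen2
    spec:
      containers:
      # Cloud Run pulls from Artifact Registry or Docker Hub; a ghcr.io image
      # may need mirroring (e.g. via an Artifact Registry remote repository)
      - image: ghcr.io/aunali321/music-metadata-api:latest
        args: ["-db", "/data/main_database.sqlite3"]
        volumeMounts:
        - name: database
          mountPath: /data
          readOnly: true
      volumes:
      # Cloud Run cannot mount GCE persistent disks; an NFS share such as
      # Filestore (reachable through VPC access) is one supported option
      - name: database
        nfs:
          server: 10.0.0.2     # placeholder Filestore IP
          path: /metadata_db   # placeholder export path
          readOnly: true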
```

## Advantages of No Integrations

### Simplicity

**Benefits:**
- No external service dependencies
- No outbound network calls (lower latency, fewer failure points)
- No authentication complexity
- No external API rate limits

**Tradeoffs:**
- No real-time data
- No automatic updates
- No distributed features

### Reliability

**Benefits:**
- No cascading failures (no external dependencies)
- No network timeouts (all local)
- No third-party outages
- Predictable performance

**Tradeoffs:**
- Single point of failure (database file)
- No redundancy (unless replicated)

### Performance

**Benefits:**
- No network latency (local database)
- No API rate limits (self-imposed only)
- Batch queries optimized (7 queries vs 2,800)
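The batching benefit comes from replacing per-row lookups with set-based queries. The service's actual queries are not shown here, so the sketch below only illustrates the general technique of chunking IDs into parameterized `IN (?, ...)` clauses; the `tracks` table, its columns, and the chunk size are hypothetical. Chunking also keeps each statement under SQLite's bound-parameter limit (999 before SQLite 3.32.0, 32766 since).

```go
package main

import "strings"

// batchQueries builds one parameterized "IN (?, ...)" query per chunk of ids
// instead of one query per id. The tracks table and its columns are hypothetical.
func batchQueries(ids []string, chunkSize int) (queries []string, args [][]any) {
	if chunkSize <= 0 {
		chunkSize = len(ids)
	}
	for start := 0; start < len(ids); start += chunkSize {
		end := min(start+chunkSize, len(ids))
		chunk := ids[start:end]

		// "?,?,?" with one placeholder per id in this chunk
		placeholders := strings.TrimSuffix(strings.Repeat("?,", len(chunk)), ",")
		queries = append(queries, "SELECT id, name FROM tracks WHERE id IN ("+placeholders+")")

		// database/sql takes the bound values as variadic ...any arguments
		chunkArgs := make([]any, len(chunk))
		for i, id := range chunk {
			chunkArgs[i] = id
		}
		args = append(args, chunkArgs)
	}
	return queries, args
}
```

Each pair would then run as `db.QueryContext(ctx, queries[i], args[i]...)`.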

**Tradeoffs:**
- Database size (216GB per instance)
- Memory usage (2.5GB minimum)

### Cost

**Benefits:**
- No API subscription fees
- No per-request charges
- No data transfer costs (local)

**Tradeoffs:**
- Storage costs (216GB)
- Compute costs (self-hosted)

## Future Integration Opportunities

### Potential Additions

**Authentication:**
- OAuth 2.0 provider (Keycloak, Auth0)
- API key management (custom or Kong)

**Monitoring:**
- Prometheus metrics exporter
- OpenTelemetry tracing
- Structured logging to Elasticsearch

**Caching:**
- Redis for hot data
- HTTP caching headers
- CDN for static responses

**Database:**
- PostgreSQL for writable data
- Read replicas for scaling
- Full-text search (Elasticsearch, Meilisearch)

**Message Queue:**
- Background job processing (Celery, Sidekiq)
- Event streaming (Kafka)

**Configuration:**
- Environment variables
- Config files (YAML, TOML)
- Secrets management (Vault)

### Integration Complexity

**Current:** Zero integrations (simplest possible)

**With additions:** Each integration adds:
- Configuration complexity
- Deployment dependencies
- Failure modes
- Maintenance burden

**Recommendation:** Only add integrations when necessary for specific use cases.