Music Metadata API - Integrations
Integration Overview
Music Metadata API is a fully self-contained service with zero external integrations at runtime. All data is served from pre-populated SQLite databases: the server makes no external API calls, uses no authentication services, and depends only on the Go standard library plus two offline Go modules.
┌─────────────────────────────────────────────────────────────┐
│                     Music Metadata API                      │
│                  (Self-Contained Service)                   │
│                                                             │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐             │
│  │    HTTP    │  │  Database  │  │   Models   │             │
│  │  Handlers  │→ │   Layer    │→ │   Layer    │             │
│  └────────────┘  └────────────┘  └────────────┘             │
│                         ↓                                   │
│                  ┌─────────────┐                            │
│                  │   SQLite    │                            │
│                  │  Databases  │                            │
│                  │   (216GB)   │                            │
│                  └─────────────┘                            │
└─────────────────────────────────────────────────────────────┘
                          │
                          │ NO external calls
                          ↓
                   (All data local)
Runtime Dependencies
Go Standard Library
Packages used:
- net/http - HTTP server and routing
- database/sql - Database interface
- encoding/json - JSON serialization
- log/slog - Structured logging
- context - Request context and timeouts
- sync - Concurrency primitives (RWMutex)
- flag - CLI argument parsing
- os/signal - Graceful shutdown
No external HTTP calls: net/http is used only to serve requests. Everything else relies on the standard library and the two Go modules listed under External Go Modules.
External Go Modules
modernc.org/sqlite v1.34.4
- Pure Go SQLite driver
- No CGO required
- No C dependencies
- No external network calls
golang.org/x/time v0.14.0
- Rate limiting (token bucket)
- No external network calls
- Pure algorithm implementation
Total external dependencies: 2 Go modules (both work fully offline)
Data Sources
Pre-Populated Databases
Source: User must obtain databases separately (not included in repository)
Database files:
- main_database.sqlite3 (~117GB)
- track_files.sqlite3 (~99GB)
Provenance: Unclear (repository states "not affiliated with Spotify")
Update mechanism: None (static snapshot)
Implications:
- No real-time data sync
- No automatic updates
- User responsible for obtaining databases
- Legal status uncertain
No External APIs
What's NOT integrated:
- Spotify Web API (no OAuth, no API calls)
- MusicBrainz API (no lookups)
- Last.fm API (no scrobbling)
- Discogs API (no catalog queries)
- AcoustID API (no fingerprinting)
- Cover Art Archive (no image fetching)
All data served from local databases.
Browser-Side Dependencies
Swagger UI (Documentation Only)
Endpoint: /docs
External resources loaded by browser:
<!-- Loaded from unpkg.com CDN -->
<script src="https://unpkg.com/swagger-ui-dist@5/swagger-ui-bundle.js"></script>
<link rel="stylesheet" href="https://unpkg.com/swagger-ui-dist@5/swagger-ui.css" />
Characteristics:
- Loaded client-side (browser fetches)
- Server doesn't make requests to unpkg.com
- Works offline after first load (browser cache)
- Only affects the /docs endpoint (not API functionality)
Implications:
- Requires an internet connection for the first /docs visit
- Subsequent visits work offline (cached)
- API endpoints work without internet
Image URLs (External CDN)
Image hosting: Spotify CDN (i.scdn.co)
Example URLs:
https://i.scdn.co/image/ab67616d0000b273ce4f1737bc8a646c8c4bd25a
https://i.scdn.co/image/af2b8e57f6d7b5d1c9a5f3e8d4c2b1a0e9f8d7c6
Characteristics:
- API returns URLs (not image data)
- Client responsible for fetching images
- Server never fetches images
- Images hosted externally (not by API)
Implications:
- Image availability depends on Spotify CDN
- No image caching by API
- Clients need internet to display images
- Broken links possible if Spotify removes images
No Authentication Integration
No OAuth
What's missing:
- No OAuth 2.0 flow
- No token validation
- No user authentication
- No API keys
Implications:
- Public API (anyone can query)
- No usage tracking per user
- No quota enforcement per user
- No access control
Workarounds:
- Deploy behind reverse proxy with auth (nginx, Caddy)
- Use API gateway (Kong, Tyk)
- Implement custom middleware
No Authorization
What's missing:
- No role-based access control (RBAC)
- No permission system
- No resource ownership
Implications:
- All data accessible to all clients
- No private/public data distinction
- No user-specific data
No Monitoring Integration
No Metrics Exporters
What's missing:
- No Prometheus metrics
- No StatsD integration
- No OpenTelemetry
- No custom metrics endpoint
Implications:
- No visibility into request rates
- No error rate tracking
- No latency percentiles
- No resource usage metrics
Workarounds:
- Parse logs for metrics
- Use reverse proxy metrics (nginx, Envoy)
- Implement custom metrics middleware
No Distributed Tracing
What's missing:
- No Jaeger integration
- No Zipkin support
- No trace context propagation
Implications:
- Can't trace requests across services
- No performance profiling
- No bottleneck identification
Workarounds:
- Add custom tracing middleware
- Use APM tools (Datadog, New Relic)
No Log Aggregation
What's missing:
- No Elasticsearch integration
- No Splunk forwarding
- No CloudWatch Logs
- No structured log shipping
Logging: Go stdlib log/slog to stdout
Implications:
- Logs only in container/process stdout
- No centralized log storage
- No log search/analysis
Workarounds:
- Docker log drivers (json-file, syslog, fluentd)
- Kubernetes log collectors (Fluentd, Filebeat)
- Redirect stdout to log aggregator
No Message Queue Integration
What's missing:
- No RabbitMQ
- No Kafka
- No Redis Pub/Sub
- No AWS SQS
Implications:
- Synchronous request/response only
- No async job processing
- No event streaming
- No background tasks
Use case: All queries processed synchronously (acceptable for read-only API)
No Cache Integration
No External Cache
What's missing:
- No Redis
- No Memcached
- No Varnish
Caching: SQLite page cache only (64MB per connection)
Implications:
- No shared cache across instances
- No cache invalidation strategy
- No cache warming
- Cold start on each instance
Workarounds:
- Add Redis layer for hot data
- Use HTTP caching headers (not implemented)
- Deploy CDN in front of API
No HTTP Caching
What's missing:
- No Cache-Control headers
- No ETag support
- No Last-Modified headers
Implications:
- Clients can't cache responses
- Repeated requests hit database
- No bandwidth savings
Workarounds:
- Add caching middleware
- Use reverse proxy with caching (Varnish, nginx)
No Database Replication
What's missing:
- No master-slave replication
- No read replicas
- No database clustering
Database: Local SQLite files, one full copy per instance
Implications:
- Each instance has full database copy (216GB)
- No shared database across instances
- Horizontal scaling requires full database per instance
Workarounds:
- Read-only databases safe to copy
- Use network filesystem (NFS, EFS) for shared access, mounted read-only (SQLite file locking is unreliable on many network filesystems)
- Replicate databases to multiple instances
No Service Discovery
What's missing:
- No Consul integration
- No etcd
- No Kubernetes service discovery
- No DNS-based discovery
Deployment: Static configuration (IP:port)
Implications:
- Manual load balancer configuration
- No dynamic scaling
- No health-based routing
Workarounds:
- Use Kubernetes services (automatic discovery)
- Use cloud load balancers (AWS ALB, GCP LB)
- Use service mesh (Istio, Linkerd)
No Configuration Management
No External Config
What's missing:
- No Consul KV
- No etcd
- No AWS Parameter Store
- No HashiCorp Vault
Configuration: CLI flags only (-db, -addr)
Implications:
- All config at startup
- No dynamic reconfiguration
- No secrets management
- Hardcoded timeouts/limits
Workarounds:
- Use environment variables (requires code changes)
- Mount config files (requires code changes)
- Use init containers to generate config
No Secrets Management
What's missing:
- No Vault integration
- No AWS Secrets Manager
- No Kubernetes secrets
- No encrypted config
Secrets: None required (no authentication)
Implications:
- No sensitive data to protect
- No credential rotation
- No encryption at rest
Future consideration: If adding authentication, integrate secrets manager
Integration Patterns
Reverse Proxy Integration
Use case: Add authentication, CORS, caching, SSL
Example with nginx:
# proxy_cache_path is only valid in the http context, so it sits outside server {}
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=api_cache:10m;

upstream metadata_api {
    server localhost:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;

    # CORS headers
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Methods "GET, POST, OPTIONS";

    # Caching
    proxy_cache api_cache;
    proxy_cache_valid 200 1h;

    # Authentication
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;

    location / {
        proxy_pass http://metadata_api;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
API Gateway Integration
Use case: Rate limiting, authentication, analytics
Example with Kong:
# Declarative (DB-less) configuration; adjust _format_version to your Kong release
_format_version: "3.0"
services:
  - name: metadata-api
    url: http://localhost:8080
    routes:
      - name: metadata-routes
        paths:
          - /
    plugins:
      - name: rate-limiting
        config:
          minute: 1000
          policy: local
      - name: key-auth
        config:
          key_names:
            - apikey
      - name: prometheus
        config:
          per_consumer: true
Load Balancer Integration
Use case: Distribute traffic across multiple instances
Example with HAProxy:
frontend metadata_frontend
    bind *:80
    default_backend metadata_backend

backend metadata_backend
    balance roundrobin
    option httpchk GET /health
    server api1 10.0.1.10:8080 check
    server api2 10.0.1.11:8080 check
    server api3 10.0.1.12:8080 check
Kubernetes Integration
Use case: Container orchestration, auto-scaling
Example deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metadata-api
  template:
    metadata:
      labels:
        app: metadata-api
    spec:
      containers:
        - name: api
          image: ghcr.io/aunali321/music-metadata-api:latest
          args: ["-db", "/data/main_database.sqlite3"]
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: database
              mountPath: /data
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          resources:
            requests:
              memory: "4Gi"
              cpu: "1"
            limits:
              memory: "8Gi"
              cpu: "2"
      volumes:
        - name: database
          persistentVolumeClaim:
            claimName: metadata-db-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: metadata-api
spec:
  selector:
    app: metadata-api
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
Monitoring Integration
Use case: Metrics, logs, traces
Example with Prometheus + Grafana:
1. Add metrics exporter (custom middleware):
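// Not implemented in current codebase
import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requestsTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "api_requests_total", Help: "Total HTTP requests."},
		[]string{"method", "endpoint", "status"},
	)
	requestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{Name: "api_request_duration_seconds", Help: "HTTP request latency."},
		[]string{"method", "endpoint"},
	)
)

func init() {
	// Collectors must be registered before they appear in the scrape output
	prometheus.MustRegister(requestsTotal, requestDuration)
	// Assumes http.DefaultServeMux; otherwise register on the service's own mux
	http.Handle("/metrics", promhttp.Handler())
}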
2. Scrape metrics with Prometheus:
scrape_configs:
  - job_name: 'metadata-api'
    static_configs:
      - targets: ['localhost:8080']
3. Visualize in Grafana:
- Request rate dashboard
- Error rate dashboard
- Latency percentiles (p50, p95, p99)
Logging Integration
Use case: Centralized log aggregation
Example with Fluentd:
1. Configure Docker logging driver:
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224
        tag: metadata-api
2. Fluentd configuration:
<source>
  @type forward
  port 24224
</source>

<match metadata-api>
  @type elasticsearch
  host elasticsearch
  port 9200
  index_name metadata-api
  type_name _doc
</match>
Caching Integration
Use case: Reduce database load, improve latency
Example with Redis:
1. Add Redis middleware (custom implementation):
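// Not implemented in current codebase.
// Assumes redisClient is a *redis.Client from github.com/redis/go-redis/v9.
func cacheMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		// Include the query string so different queries do not share an entry
		key := r.URL.RequestURI()

		// Check Redis cache
		if cached, err := redisClient.Get(ctx, key).Bytes(); err == nil {
			w.Header().Set("Content-Type", "application/json")
			w.Write(cached)
			return
		}

		// Cache miss: buffer the handler's response
		rec := httptest.NewRecorder()
		next.ServeHTTP(rec, r)

		// Store successful responses only (1 hour TTL)
		if rec.Code == http.StatusOK {
			redisClient.Set(ctx, key, rec.Body.Bytes(), time.Hour)
		}

		// Replay headers, status, and body to the real client
		for k, v := range rec.Header() {
			w.Header()[k] = v
		}
		w.WriteHeader(rec.Code)
		w.Write(rec.Body.Bytes())
	})
}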
2. Deploy Redis:
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
Complementary Services
MusicBrainz Integration
Use case: Resolve MBIDs to ISRCs, then lookup in Music Metadata API
Flow:
1. Query MusicBrainz for recording by MBID
↓
2. Extract ISRC from MusicBrainz response
↓
3. Lookup ISRC in Music Metadata API
↓
4. Merge metadata (MusicBrainz credits + Spotify-style data)
Example:
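import requests

MBID = "abc-123"  # placeholder; substitute a real recording MBID

# Step 1: Get ISRC from MusicBrainz
# inc=artist-credits is required for the 'artist-credit' field used in Step 3.
# MusicBrainz asks clients to identify themselves with a User-Agent header.
mb_url = f"https://musicbrainz.org/ws/2/recording/{MBID}?fmt=json&inc=isrcs+artist-credits"
headers = {"User-Agent": "example-app/0.1 (contact@example.com)"}
mb_response = requests.get(mb_url, headers=headers, timeout=10).json()
isrc = mb_response['isrcs'][0]  # raises IndexError if the recording has no ISRC

# Step 2: Lookup in Music Metadata API
mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
mm_response = requests.get(mm_url, timeout=10).json()

# Step 3: Merge metadata
merged = {
    "mbid": MBID,
    "isrc": isrc,
    "title": mm_response['name'],
    "popularity": mm_response['popularity'],
    "credits": mb_response['artist-credit']
}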
AcoustID Integration
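Use case: Fingerprint audio files, resolve to ISRCs
Flow:
1. Generate audio fingerprint (chromaprint)
↓
2. Query AcoustID API with fingerprint
↓
3. Extract MusicBrainz recording ID from AcoustID response (AcoustID returns recording IDs, not ISRCs)
↓
4. Resolve the recording ID to an ISRC via MusicBrainz
↓
5. Lookup ISRC in Music Metadata API
↓
6. Tag audio file with metadata
Example:
import acoustid
import mutagen
import requests

API_KEY = "your-acoustid-key"  # placeholder

# Step 1: Fingerprint audio file
duration, fingerprint = acoustid.fingerprint_file('song.mp3')
# Step 2: Query AcoustID
response = acoustid.lookup(API_KEY, fingerprint, duration, meta='recordings')
# Step 3: Extract MusicBrainz recording ID from the best match
mbid = response['results'][0]['recordings'][0]['id']
# Step 4: Resolve the recording ID to an ISRC via MusicBrainz
mb_url = f"https://musicbrainz.org/ws/2/recording/{mbid}?fmt=json&inc=isrcs"
headers = {"User-Agent": "example-app/0.1 (contact@example.com)"}
isrc = requests.get(mb_url, headers=headers, timeout=10).json()['isrcs'][0]
# Step 5: Lookup in Music Metadata API
mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
metadata = requests.get(mm_url, timeout=10).json()
# Step 6: Tag file (easy=True exposes plain keys such as 'title' for MP3 files)
audio = mutagen.File('song.mp3', easy=True)
audio['title'] = metadata['name']
audio['artist'] = metadata['artists'][0]['name']
audio.save()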
Spotify Web API Integration
Use case: Get real-time data, then fallback to Music Metadata API
Flow:
1. Try Spotify Web API (requires OAuth)
↓
2. If rate limited or unavailable, fallback to Music Metadata API
↓
3. Return cached/static data from Music Metadata API
Example:
import requests

# spotify_client is assumed to be an authenticated client such as spotipy.Spotify
def get_track_metadata(isrc):
    try:
        # Try Spotify Web API (real-time)
        spotify_data = spotify_client.search(q=f"isrc:{isrc}", type="track")
        return spotify_data['tracks']['items'][0]
    except Exception:
        # Fallback to Music Metadata API (static); also reached when Spotify finds no match
        mm_url = f"http://localhost:8080/lookup/isrc/{isrc}"
        return requests.get(mm_url, timeout=10).json()
Deployment Integrations
Docker Compose
Use case: Local development, simple deployments
Example:
version: '3.8'
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    command: ["-db", "/data/main_database.sqlite3"]
    restart: unless-stopped
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - metadata-api
Kubernetes
Use case: Production deployments, auto-scaling
See Kubernetes Integration section above
Cloud Platforms
AWS ECS:
{
  "family": "metadata-api",
  "containerDefinitions": [{
    "name": "api",
    "image": "ghcr.io/aunali321/music-metadata-api:latest",
    "memory": 4096,
    "cpu": 1024,
    "portMappings": [{"containerPort": 8080}],
    "command": ["-db", "/data/main_database.sqlite3"],
    "mountPoints": [{
      "sourceVolume": "database",
      "containerPath": "/data",
      "readOnly": true
    }]
  }],
  "volumes": [{
    "name": "database",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-12345678"
    }
  }]
}
Google Cloud Run:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: metadata-api
spec:
  template:
    spec:
      containers:
        - image: ghcr.io/aunali321/music-metadata-api:latest
          args: ["-db", "/data/main_database.sqlite3"]
          volumeMounts:
            - name: database
              mountPath: /data
              readOnly: true
      volumes:
        # Caution: verify that Cloud Run accepts this volume type before relying on it.
        # Its documented volume options are Cloud Storage, NFS, in-memory, and secrets.
        - name: database
          gcePersistentDisk:
            pdName: metadata-db
            readOnly: true
Advantages of Having No Integrations
Simplicity
Benefits:
- No external service dependencies
- No network calls (faster, more reliable)
- No authentication complexity
- No API rate limits (external)
Tradeoffs:
- No real-time data
- No automatic updates
- No distributed features
Reliability
Benefits:
- No cascading failures (no external dependencies)
- No network timeouts (all local)
- No third-party outages
- Predictable performance
Tradeoffs:
- Single point of failure (database file)
- No redundancy (unless replicated)
Performance
Benefits:
- No network latency (local database)
- No API rate limits (self-imposed only)
- Batch queries optimized (7 queries vs 2,800)
Tradeoffs:
- Database size (216GB per instance)
- Memory usage (2.5GB minimum)
Cost
Benefits:
- No API subscription fees
- No per-request charges
- No data transfer costs (local)
Tradeoffs:
- Storage costs (216GB)
- Compute costs (self-hosted)
Future Integration Opportunities
Potential Additions
Authentication:
- OAuth 2.0 provider (Keycloak, Auth0)
- API key management (custom or Kong)
Monitoring:
- Prometheus metrics exporter
- OpenTelemetry tracing
- Structured logging to Elasticsearch
Caching:
- Redis for hot data
- HTTP caching headers
- CDN for static responses
Database:
- PostgreSQL for writable data
- Read replicas for scaling
- Full-text search (Elasticsearch, Meilisearch)
Message Queue:
- Background job processing (Celery, Sidekiq)
- Event streaming (Kafka)
Configuration:
- Environment variables
- Config files (YAML, TOML)
- Secrets management (Vault)
Integration Complexity
Current: Zero integrations (simplest possible)
With additions: Each integration adds:
- Configuration complexity
- Deployment dependencies
- Failure modes
- Maintenance burden
Recommendation: Only add integrations when necessary for specific use cases.