Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


Music Metadata API - Deployment

Deployment Overview

Music Metadata API supports two primary deployment models:

  1. Standalone binary - Single executable with database files
  2. Docker container - Containerized deployment with orchestration support

Both models require ~216GB of database files; runtime needs are modest by comparison (4GB+ RAM, 1-2 CPU cores).

Build Process

Building from Source

Prerequisites:

  • Go 1.24+
  • Git

Build steps:

# Clone repository
git clone https://github.com/Aunali321/music-metadata-api.git
cd music-metadata-api

# Build binary (CGO disabled for static linking)
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server

# Verify binary
./metadata-api -h

Build flags explained:

Flag              | Purpose                         | Impact
CGO_ENABLED=0     | Disable CGO                     | Pure Go binary, no C dependencies
-ldflags="-s -w"  | Pass strip flags to the linker  | Smaller binary (~30% reduction)
-s                | Omit the symbol table           | Removes symbol table and debug information
-w                | Omit DWARF                      | Removes DWARF debugging info

Binary size: ~10-15MB (stripped)

Output: Single executable (metadata-api)

Cross-Compilation

Build for Linux (from macOS/Windows):

GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api-linux ./cmd/server

Build for ARM (Raspberry Pi, AWS Graviton):

GOOS=linux GOARCH=arm64 CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api-arm64 ./cmd/server

Supported platforms:

  • Linux (amd64, arm64)
  • macOS (amd64, arm64)
  • Windows (amd64)

Docker Build

Dockerfile

Multi-stage build:

# Stage 1: Build
FROM golang:1.24-alpine AS builder

WORKDIR /app

# Copy dependency files
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Build binary
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server

# Stage 2: Runtime
FROM alpine:3.21

# Install CA certificates (for HTTPS if needed)
RUN apk --no-cache add ca-certificates

WORKDIR /app

# Copy binary from builder
COPY --from=builder /app/metadata-api .

# Expose port
EXPOSE 8080

# Run as non-root user
RUN adduser -D -u 1000 apiuser
USER apiuser

# Entry point
ENTRYPOINT ["/app/metadata-api"]

Build characteristics:

  • Base image: Alpine Linux 3.21 (~5MB)
  • Final image size: ~15-20MB (without databases)
  • Security: Runs as non-root user
  • Layers: Optimized for caching (dependencies separate from code)

Building Docker Image

Build locally:

docker build -t metadata-api:latest .

Build with specific tag:

docker build -t metadata-api:v1.0.0 .

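Build for multiple platforms (without --push or --load the result may stay only in the build cache, and a multi-platform result cannot be loaded into the classic local image store, so push it to a registry; <registry> is a placeholder):

docker buildx build --platform linux/amd64,linux/arm64 -t <registry>/metadata-api:latest --push .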

Official Docker Image

Registry: GitHub Container Registry (ghcr.io)

Image: ghcr.io/aunali321/music-metadata-api:latest

Pull image:

docker pull ghcr.io/aunali321/music-metadata-api:latest

Image tags:

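  • latest - Most recent release; with the workflow in the CI/CD Pipeline section, latest is applied on version tag pushes, while pushes to the main branch are tagged main
  • Version tags - Semantic versions without the v prefix (e.g., 1.0.0 and 1.0 for git tag v1.0.0)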

CI/CD Pipeline

GitHub Actions Workflow

File: .github/workflows/docker-publish.yml

Triggers:

  • Push to main branch
  • Push tags matching v* (e.g., v1.0.0)
  • Pull requests (build only, no publish)

Workflow steps:

name: Docker Publish

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
      
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

Key features:

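  • Multi-platform builds: not enabled in the workflow as shown (the build step sets no platforms input), so it builds only for the runner's architecture (amd64 on ubuntu-latest)
  • Caching: GitHub Actions cache for faster builds
  • Automatic tagging: Branch name, semantic versions
  • Security: Uses the built-in GITHUB_TOKEN (no manually managed secrets)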

Notable omission: No test step (zero tests in codebase)

Release Process

Create release:

# Tag version
git tag v1.0.0
git push origin v1.0.0

# GitHub Actions automatically:
# 1. Builds Docker image
# 2. Tags it 1.0.0, 1.0, and latest (the configured semver patterns drop the "v" prefix, and no major-only pattern is configured)
# 3. Pushes to ghcr.io

Verify release:

docker pull ghcr.io/aunali321/music-metadata-api:1.0.0

Standalone Deployment

Prerequisites

System requirements:

  • Linux, macOS, or Windows
  • 216GB disk space (databases)
  • 4GB+ RAM
  • SSD recommended (HDD too slow)

Database files:

  • main_database.sqlite3 (~117GB)
  • track_files.sqlite3 (~99GB)
  • Must be obtained separately (not in repository)

Deployment Steps

1. Prepare environment:

# Create directory structure (/opt is normally writable only by root)
sudo mkdir -p /opt/metadata-api/data
cd /opt/metadata-api

# Copy databases
sudo cp /path/to/main_database.sqlite3 data/
sudo cp /path/to/track_files.sqlite3 data/

# Copy binary from the build directory
sudo cp /path/to/metadata-api /opt/metadata-api/
sudo chmod +x /opt/metadata-api/metadata-api

2. Run service:

./metadata-api -db /opt/metadata-api/data/main_database.sqlite3 -addr :8080

3. Verify:

curl http://localhost:8080/health
# Expected: {"status":"ok"}

Systemd Service

Create service file: /etc/systemd/system/metadata-api.service

[Unit]
Description=Music Metadata API
After=network.target

[Service]
Type=simple
User=apiuser
Group=apiuser
WorkingDirectory=/opt/metadata-api
ExecStart=/opt/metadata-api/metadata-api -db /opt/metadata-api/data/main_database.sqlite3 -addr :8080
Restart=on-failure
RestartSec=10s

# Resource limits
LimitNOFILE=65536
MemoryMax=8G

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=metadata-api

[Install]
WantedBy=multi-user.target

Enable and start:

# Create user
sudo useradd -r -s /bin/false apiuser
sudo chown -R apiuser:apiuser /opt/metadata-api

# Enable service
sudo systemctl daemon-reload
sudo systemctl enable metadata-api
sudo systemctl start metadata-api

# Check status
sudo systemctl status metadata-api

# View logs
sudo journalctl -u metadata-api -f

Docker Deployment

Docker Run

Basic run:

docker run -d \
  --name metadata-api \
  -p 8080:8080 \
  -v /path/to/databases:/data:ro \
  ghcr.io/aunali321/music-metadata-api:latest \
  -db /data/main_database.sqlite3

With resource limits:

docker run -d \
  --name metadata-api \
  -p 8080:8080 \
  -v /path/to/databases:/data:ro \
  --memory=8g \
  --cpus=2 \
  --restart=unless-stopped \
  ghcr.io/aunali321/music-metadata-api:latest \
  -db /data/main_database.sqlite3 \
  -addr :8080

Verify:

docker logs metadata-api
curl http://localhost:8080/health

Docker Compose

File: docker-compose.yml

version: '3.8'

services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    container_name: metadata-api
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    environment:
      - LOG_LEVEL=info  # NOTE: Not actually used in code
    command: ["-db", "/data/main_database.sqlite3"]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 8G
          cpus: '2'
        reservations:
          memory: 4G
          cpus: '1'

Deploy:

# Start services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

Health check details:

  • Command: wget --spider -q http://localhost:8080/health
  • Interval: Every 30 seconds
  • Timeout: 10 seconds
  • Retries: 3 failures before unhealthy
  • Start period: 10 seconds grace period

Limitation: Health check doesn't verify database connectivity (naive implementation)

Kubernetes Deployment

Deployment Manifest

File: k8s/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-api
  labels:
    app: metadata-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metadata-api
  template:
    metadata:
      labels:
        app: metadata-api
    spec:
      containers:
      - name: api
        image: ghcr.io/aunali321/music-metadata-api:latest
        args: ["-db", "/data/main_database.sqlite3", "-addr", ":8080"]
        ports:
        - containerPort: 8080
          name: http
        volumeMounts:
        - name: database
          mountPath: /data
          readOnly: true
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 30
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        resources:
          requests:
            memory: "4Gi"
            cpu: "1"
          limits:
            memory: "8Gi"
            cpu: "2"
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000
          readOnlyRootFilesystem: true
      volumes:
      - name: database
        persistentVolumeClaim:
          claimName: metadata-db-pvc

Service Manifest

File: k8s/service.yaml

apiVersion: v1
kind: Service
metadata:
  name: metadata-api
  labels:
    app: metadata-api
spec:
  type: LoadBalancer
  selector:
    app: metadata-api
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  sessionAffinity: None

Persistent Volume

File: k8s/pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: metadata-db-pvc
spec:
  accessModes:
    - ReadOnlyMany  # Multiple pods can read
  resources:
    requests:
      storage: 220Gi  # 216GB databases + overhead
  storageClassName: fast-ssd  # Use SSD storage class

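Storage options (the claim above requests ReadOnlyMany, so the volume type must support read-only attachment from multiple nodes):

  • AWS EBS: gp3 volumes (SSD), but EBS filesystem volumes attach to a single node, so ReadOnlyMany generally calls for EFS or per-node copies instead
  • GCP Persistent Disk: Use pd-ssd (supports read-only attachment to multiple nodes)
  • Azure Disk: Use Premium_LRS for single-node access; Azure Files is the usual choice for multi-node mounts
  • NFS: Shared filesystem (slower, but works)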

Horizontal Pod Autoscaler

File: k8s/hpa.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: metadata-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: metadata-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Deploy to Kubernetes

# Create namespace
kubectl create namespace metadata-api

# Apply manifests
kubectl apply -f k8s/pvc.yaml -n metadata-api
kubectl apply -f k8s/deployment.yaml -n metadata-api
kubectl apply -f k8s/service.yaml -n metadata-api
kubectl apply -f k8s/hpa.yaml -n metadata-api

# Verify deployment
kubectl get pods -n metadata-api
kubectl get svc -n metadata-api

# View logs
kubectl logs -f deployment/metadata-api -n metadata-api

# Get service URL
kubectl get svc metadata-api -n metadata-api -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

Cloud Platform Deployments

AWS ECS

Task definition:

{
  "family": "metadata-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "2048",
  "memory": "8192",
  "containerDefinitions": [{
    "name": "api",
    "image": "ghcr.io/aunali321/music-metadata-api:latest",
    "portMappings": [{
      "containerPort": 8080,
      "protocol": "tcp"
    }],
    "command": ["-db", "/data/main_database.sqlite3"],
    "mountPoints": [{
      "sourceVolume": "database",
      "containerPath": "/data",
      "readOnly": true
    }],
    "healthCheck": {
      "command": ["CMD-SHELL", "wget --spider -q http://localhost:8080/health || exit 1"],
      "interval": 30,
      "timeout": 5,
      "retries": 3,
      "startPeriod": 10
    },
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/metadata-api",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }],
  "volumes": [{
    "name": "database",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-12345678",
      "rootDirectory": "/databases",
      "transitEncryption": "ENABLED"
    }
  }]
}

Deploy:

# Create EFS filesystem (for databases)
aws efs create-file-system --tags Key=Name,Value=metadata-db

# Register task definition
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Create service
aws ecs create-service \
  --cluster metadata-cluster \
  --service-name metadata-api \
  --task-definition metadata-api \
  --desired-count 3 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-123],securityGroups=[sg-456]}"

Google Cloud Run

Deploy:

# Build and push image
gcloud builds submit --tag gcr.io/PROJECT_ID/metadata-api

# Create Cloud Filestore instance (for databases)
gcloud filestore instances create metadata-db \
  --zone=us-central1-a \
  --tier=BASIC_SSD \
  --file-share=name=databases,capacity=250GB

# Deploy to Cloud Run
gcloud run deploy metadata-api \
  --image gcr.io/PROJECT_ID/metadata-api \
  --platform managed \
  --region us-central1 \
  --memory 8Gi \
  --cpu 2 \
  --min-instances 1 \
  --max-instances 10 \
  --port 8080 \
  --args="-db,/data/main_database.sqlite3" \
  --execution-environment gen2 \
  --vpc-connector metadata-vpc

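Note: Cloud Run has no persistent disks of its own, so the databases must come from network storage such as Cloud Filestore, reached through the VPC connector. The deploy command above attaches the connector but does not mount the Filestore share, so /data/main_database.sqlite3 will not exist in the container until an NFS volume mount is added to the service configuration. Also check the minimum capacity of the chosen Filestore tier; the 250GB requested above may be below it.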

Azure Container Instances

Deploy:

# Create Azure Files share (for databases)
az storage share create --name metadata-db --quota 250

# Deploy container
az container create \
  --resource-group metadata-rg \
  --name metadata-api \
  --image ghcr.io/aunali321/music-metadata-api:latest \
  --cpu 2 \
  --memory 8 \
  --ports 8080 \
  --command-line "/app/metadata-api -db /data/main_database.sqlite3" \
  --azure-file-volume-account-name STORAGE_ACCOUNT \
  --azure-file-volume-account-key STORAGE_KEY \
  --azure-file-volume-share-name metadata-db \
  --azure-file-volume-mount-path /data

Resource Requirements

Minimum Requirements

Resource  | Minimum  | Recommended | Notes
CPU       | 1 core   | 2 cores     | Search queries are CPU-intensive
RAM       | 4GB      | 8GB         | 2.5GB for SQLite + 1.5GB for app/OS
Disk      | 220GB    | 250GB       | 216GB databases + overhead
Disk type | SSD      | NVMe SSD    | HDD too slow for 256M rows
Network   | 100 Mbps | 1 Gbps      | For serving JSON responses

Scaling Considerations

Vertical scaling:

  • More RAM: Larger SQLite cache (faster queries)
  • More CPU: Faster search queries (CPU-bound)
  • Faster disk: Lower query latency

Horizontal scaling:

  • Each instance needs access to the full 216GB dataset (a local copy, or a shared read-only volume)
  • Read-only safe (no write conflicts)
  • Load balancer distributes traffic
  • No shared state (rate limiter per-instance)

Cost implications:

  • 10 instances = 2.16TB storage (expensive)
  • Consider shared filesystem (NFS, EFS) for databases
  • Tradeoff: Shared storage slower than local SSD

Monitoring and Logging

Health Checks

Endpoint: GET /health

Response:

{"status":"ok"}

Limitation: Doesn't verify database connectivity

Improved health check (custom implementation):

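import (
    "context"
    "database/sql"
    "encoding/json"
    "net/http"
    "time"
)

// healthCheck returns 503 unless the database answers a ping within 2 seconds.
func healthCheck(db *sql.DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
        defer cancel()
        if err := db.PingContext(ctx); err != nil {
            http.Error(w, "Database unavailable", http.StatusServiceUnavailable)
            return
        }
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
    }
}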

Logging

Current implementation:

  • Go stdlib log/slog
  • Structured logging for errors
  • Output to stdout/stderr

Log format:

2024-01-15T10:30:00Z level=ERROR msg="Database query failed" error="no such table"

Docker logging:

# View logs
docker logs -f metadata-api

# Follow logs with timestamps
docker logs -f --timestamps metadata-api

# Last 100 lines
docker logs --tail 100 metadata-api

Kubernetes logging:

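# View logs (resources were deployed to the metadata-api namespace)
kubectl logs -f deployment/metadata-api -n metadata-api

# Logs from all pods
kubectl logs -f -l app=metadata-api -n metadata-api

# Previous container logs (after crash)
kubectl logs --previous pod/metadata-api-abc123 -n metadata-api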

Metrics (Not Implemented)

Missing metrics:

  • Request count by endpoint
  • Request duration percentiles
  • Error rate
  • Database query duration
  • Rate limiter rejections

Workaround: Use reverse proxy metrics (nginx, Envoy)

Security Considerations

Container Security

Best practices:

  • Run as non-root user (UID 1000)
  • Read-only root filesystem
  • Drop all capabilities
  • No privileged mode

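Enhanced Dockerfile (runtime stage only; the builder stage is unchanged):

FROM alpine:3.21

RUN apk --no-cache add ca-certificates && \
    adduser -D -u 1000 apiuser

WORKDIR /app
COPY --from=builder /app/metadata-api .

# Make the binary read-only and executable (must run before USER, while still root)
RUN chmod 555 /app/metadata-api

USER apiuser

# A read-only root filesystem is a runtime setting (e.g., docker run --read-only)
ENTRYPOINT ["/app/metadata-api"]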

Network Security

Recommendations:

  • Deploy behind reverse proxy (nginx, Traefik)
  • Use TLS/HTTPS (terminate at proxy)
  • Firewall rules (allow only necessary ports)
  • VPC/private network (not public internet)

Example nginx TLS:

server {
    listen 443 ssl http2;
    server_name api.example.com;
    
    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    
    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Database Security

Recommendations:

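  • Read-only volume mounts
  • Restrictive file permissions (chmod 440: readable by the owner and the service group, writable by no one)
  • Separate owner for database files (not the service user)
  • No write access for the application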

Example permissions:

sudo chown root:apiuser /data/main_database.sqlite3
sudo chmod 440 /data/main_database.sqlite3

Troubleshooting

Common Issues

Issue: Container fails to start

Diagnosis:

docker logs metadata-api

Common causes:

  • Database file not found (check volume mount)
  • Incorrect -db path
  • Insufficient memory

Solution:

# Verify volume mount
docker inspect metadata-api | grep Mounts -A 10

# Check database path
docker exec metadata-api ls -lh /data

Issue: High memory usage

Diagnosis:

docker stats metadata-api

Causes:

  • Rate limiter memory leak (unbounded visitor map)
  • Large result sets
  • Many concurrent requests

Solution:

  • Restart container periodically
  • Increase memory limit
  • Implement visitor cleanup (code change)

Issue: Slow queries

Diagnosis:

  • Check disk I/O (use SSD)
  • Monitor CPU usage
  • Review query patterns

Solution:

  • Use SSD storage
  • Increase SQLite cache size
  • Use batch endpoints (not individual lookups)

Backup and Recovery

Backup Strategy

Database backup:

# Stop service (optional, but safer)
systemctl stop metadata-api

# Copy databases
cp /data/main_database.sqlite3 /backup/main_database.sqlite3.$(date +%Y%m%d)
cp /data/track_files.sqlite3 /backup/track_files.sqlite3.$(date +%Y%m%d)

# Restart service
systemctl start metadata-api

Online backup (while running):

sqlite3 /data/main_database.sqlite3 ".backup /backup/main_database.sqlite3"

Recovery

Restore from backup:

# Stop service
systemctl stop metadata-api

# Restore databases
cp /backup/main_database.sqlite3.20240115 /data/main_database.sqlite3
cp /backup/track_files.sqlite3.20240115 /data/track_files.sqlite3

# Verify integrity
sqlite3 /data/main_database.sqlite3 "PRAGMA integrity_check;"

# Restart service
systemctl start metadata-api

Performance Tuning

Database Optimization

Increase cache size:

_cache_size=-128000  # ~128MB; a negative value is a size in KiB (from 64MB)

Increase mmap size:

_mmap_size=2147483648  # 2GB (from 1GB)

Connection pool:

db.SetMaxOpenConns(16)  // Increase from 8
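
The units of the two settings above are easy to mix up: a negative SQLite cache_size is a size in KiB, while mmap_size is in bytes. The sketch below builds a connection string from sizes given in MiB. The parameter names _cache_size and _mmap_size are copied from this document; whether the SQLite driver in use accepts them in this form is an assumption to verify, and buildDSN is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildDSN returns a "file:" DSN carrying cache and mmap settings.
// Parameter names are taken from the document and are not verified
// against any particular driver.
func buildDSN(path string, cacheMiB, mmapMiB int64) string {
	q := url.Values{}
	// A negative cache_size is interpreted by SQLite as KiB rather than pages.
	q.Set("_cache_size", strconv.FormatInt(-cacheMiB*1024, 10))
	// mmap_size is expressed in bytes.
	q.Set("_mmap_size", strconv.FormatInt(mmapMiB*1024*1024, 10))
	return "file:" + path + "?" + q.Encode()
}

func main() {
	fmt.Println(buildDSN("/data/main_database.sqlite3", 128, 2048))
	// prints "file:/data/main_database.sqlite3?_cache_size=-131072&_mmap_size=2147483648"
}
```

Note that exactly 128 MiB corresponds to -131072, slightly more than the rounded -128000 shown above; either value is fine as a tuning starting point.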

Container Optimization

CPU pinning (Docker):

docker run --cpuset-cpus="0-3" metadata-api

Memory limits:

docker run --memory=8g --memory-swap=8g metadata-api

I/O priority:

docker run --blkio-weight=1000 metadata-api