# Music Metadata API - Deployment

## Deployment Overview

Music Metadata API supports two primary deployment models:

1. **Standalone binary** - Single executable with database files
2. **Docker container** - Containerized deployment with orchestration support

Both models require ~216GB of database files and modest runtime resources (see Resource Requirements).

## Build Process

### Building from Source

**Prerequisites:**

- Go 1.24+
- Git

**Build steps:**

```bash
# Clone repository
git clone https://github.com/Aunali321/music-metadata-api.git
cd music-metadata-api

# Build binary (CGO disabled for static linking)
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server

# Verify binary
./metadata-api -h
```

**Build flags explained:**

| Flag | Purpose | Impact |
|------|---------|--------|
| `CGO_ENABLED=0` | Disable CGO | Pure Go binary, no C dependencies |
| `-ldflags="-s -w"` | Strip symbols | Smaller binary (~30% reduction) |
| `-s` | Strip debug symbols | Removes symbol table |
| `-w` | Strip DWARF | Removes debugging info |

**Binary size:** ~10-15MB (stripped)

**Output:** Single executable (`metadata-api`)

### Cross-Compilation

**Build for Linux (from macOS/Windows):**

```bash
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api-linux ./cmd/server
```

**Build for ARM64 (64-bit Raspberry Pi OS, AWS Graviton):**

```bash
GOOS=linux GOARCH=arm64 CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api-arm64 ./cmd/server
```

**Supported platforms:**

- Linux (amd64, arm64)
- macOS (amd64, arm64)
- Windows (amd64)

## Docker Build

### Dockerfile

**Multi-stage build:**

```dockerfile
# Stage 1: Build
FROM golang:1.24-alpine AS builder
WORKDIR /app

# Copy dependency files
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Build binary
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server

# Stage 2: Runtime
FROM alpine:3.21

# Install CA certificates (for HTTPS if needed)
RUN apk --no-cache add ca-certificates

WORKDIR /app

# Copy binary from builder
COPY --from=builder /app/metadata-api .

# Expose port
EXPOSE 8080

# Run as non-root user
RUN adduser -D -u 1000 apiuser
USER apiuser

# Entry point
ENTRYPOINT ["/app/metadata-api"]
```

**Build characteristics:**

- **Base image:** Alpine Linux 3.21 (~5MB)
- **Final image size:** ~15-20MB (without databases)
- **Security:** Runs as non-root user
- **Layers:** Optimized for caching (dependencies separate from code)

### Building Docker Image

**Build locally:**

```bash
docker build -t metadata-api:latest .
```

**Build with specific tag:**

```bash
docker build -t metadata-api:v1.0.0 .
```

**Build for multiple platforms:**

```bash
docker buildx build --platform linux/amd64,linux/arm64 -t metadata-api:latest .
```

Multi-platform images generally cannot be loaded into the classic local image store; add `--push` (with a registry-qualified tag) to publish the result.

### Official Docker Image

**Registry:** GitHub Container Registry (ghcr.io)

**Image:** `ghcr.io/aunali321/music-metadata-api:latest`

**Pull image:**

```bash
docker pull ghcr.io/aunali321/music-metadata-api:latest
```

**Image tags:**

- `latest` - Latest build from main branch
- `v*` - Semantic version tags (e.g., `v1.0.0`)

## CI/CD Pipeline

### GitHub Actions Workflow

**File:** `.github/workflows/docker-publish.yml`

**Triggers:**

- Push to `main` branch
- Push tags matching `v*` (e.g., `v1.0.0`)
- Pull requests (build only, no publish)

**Workflow steps:**

```yaml
name: Docker Publish

on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```

**Key features:**

- **Multi-platform builds:** amd64, arm64
- **Caching:** GitHub Actions cache for faster builds
- **Automatic tagging:** Branch name, semantic versions
- **Security:** Uses GitHub token (no manual secrets)

**Caveats:** The excerpt as shown sets no `platforms` input on the build step, so by itself it builds only for the runner's architecture; multi-platform output needs `platforms: linux/amd64,linux/arm64` plus QEMU setup. Likewise, with the `tags` rules shown, `docker/metadata-action` publishes branch pushes as `main` and strips the leading `v` from semver tags (`v1.0.0` yields `1.0.0`, `1.0`, and `latest`). The tag list in this guide (`v1.0.0`, `v1`, and `latest` from `main`) therefore assumes additional rules such as `type=semver,pattern=v{{version}}` or `type=raw,value=latest,enable={{is_default_branch}}`.

**Notable omission:** No test step (zero tests in codebase)

### Release Process

**Create release:**

```bash
# Tag version
git tag v1.0.0
git push origin v1.0.0

# GitHub Actions automatically:
# 1. Builds Docker image
# 2. Tags as v1.0.0, v1.0, v1, latest
# 3. Pushes to ghcr.io
```

**Verify release:**

```bash
docker pull ghcr.io/aunali321/music-metadata-api:v1.0.0
```

## Standalone Deployment

### Prerequisites

**System requirements:**

- Linux, macOS, or Windows
- 216GB disk space (databases)
- 4GB+ RAM
- SSD recommended (HDD too slow)

**Database files:**

- `main_database.sqlite3` (~117GB)
- `track_files.sqlite3` (~99GB)
- Must be obtained separately (not in repository)

### Deployment Steps

**1. Prepare environment:**

```bash
# Create directory structure
mkdir -p /opt/metadata-api/data

# Copy databases
cp /path/to/main_database.sqlite3 /opt/metadata-api/data/
cp /path/to/track_files.sqlite3 /opt/metadata-api/data/

# Copy binary (from the directory where it was built)
cp /path/to/metadata-api /opt/metadata-api/
chmod +x /opt/metadata-api/metadata-api

cd /opt/metadata-api
```

**2. Run service:**

```bash
./metadata-api -db /opt/metadata-api/data/main_database.sqlite3 -addr :8080
```

**3. Verify:**

```bash
curl http://localhost:8080/health
# Expected: {"status":"ok"}
```

### Systemd Service

**Create service file:** `/etc/systemd/system/metadata-api.service`

```ini
[Unit]
Description=Music Metadata API
After=network.target

[Service]
Type=simple
User=apiuser
Group=apiuser
WorkingDirectory=/opt/metadata-api
ExecStart=/opt/metadata-api/metadata-api -db /opt/metadata-api/data/main_database.sqlite3 -addr :8080
Restart=on-failure
RestartSec=10s

# Resource limits (MemoryMax replaces the deprecated MemoryLimit)
LimitNOFILE=65536
MemoryMax=8G

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=metadata-api

[Install]
WantedBy=multi-user.target
```

**Enable and start:**

```bash
# Create user
sudo useradd -r -s /bin/false apiuser
sudo chown -R apiuser:apiuser /opt/metadata-api

# Enable service
sudo systemctl daemon-reload
sudo systemctl enable metadata-api
sudo systemctl start metadata-api

# Check status
sudo systemctl status metadata-api

# View logs
sudo journalctl -u metadata-api -f
```

## Docker Deployment

### Docker Run

**Basic run:**

```bash
docker run -d \
  --name metadata-api \
  -p 8080:8080 \
  -v /path/to/databases:/data:ro \
  ghcr.io/aunali321/music-metadata-api:latest \
  -db /data/main_database.sqlite3
```

**With resource limits:**

```bash
docker run -d \
  --name metadata-api \
  -p 8080:8080 \
  -v /path/to/databases:/data:ro \
  --memory=8g \
  --cpus=2 \
  --restart=unless-stopped \
  ghcr.io/aunali321/music-metadata-api:latest \
  -db /data/main_database.sqlite3 \
  -addr :8080
```

**Verify:**

```bash
docker logs metadata-api
curl http://localhost:8080/health
```

### Docker Compose

**File:** `docker-compose.yml`

```yaml
version: '3.8'

services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    container_name: metadata-api
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    environment:
      - LOG_LEVEL=info  # NOTE: Not actually used in code
    command: ["-db", "/data/main_database.sqlite3"]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 8G
          cpus: '2'
        reservations:
          memory: 4G
          cpus: '1'
```

**Deploy:**

```bash
# Start services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

**Health check details:**

- **Command:** `wget --spider -q http://localhost:8080/health`
- **Interval:** Every 30 seconds
- **Timeout:** 10 seconds
- **Retries:** 3 failures before unhealthy
- **Start period:** 10 seconds grace period

**Limitation:** Health check doesn't verify database connectivity (naive implementation)

## Kubernetes Deployment

### Deployment Manifest

**File:** `k8s/deployment.yaml`

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-api
  labels:
    app: metadata-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metadata-api
  template:
    metadata:
      labels:
        app: metadata-api
    spec:
      containers:
        - name: api
          image: ghcr.io/aunali321/music-metadata-api:latest
          args: ["-db", "/data/main_database.sqlite3", "-addr", ":8080"]
          ports:
            - containerPort: 8080
              name: http
          volumeMounts:
            - name: database
              mountPath: /data
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          resources:
            requests:
              memory: "4Gi"
              cpu: "1"
            limits:
              memory: "8Gi"
              cpu: "2"
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            readOnlyRootFilesystem: true
      volumes:
        - name: database
          persistentVolumeClaim:
            claimName: metadata-db-pvc
```

### Service Manifest

**File:** `k8s/service.yaml`

```yaml
apiVersion: v1
kind: Service
metadata:
  name: metadata-api
  labels:
    app: metadata-api
spec:
  type: LoadBalancer
  selector:
    app: metadata-api
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http
  sessionAffinity: None
```

### Persistent Volume

**File:** `k8s/pvc.yaml`

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: metadata-db-pvc
spec:
  accessModes:
    - ReadOnlyMany  # Multiple pods can read
  resources:
    requests:
      storage: 220Gi  # 216GB databases + overhead
  storageClassName: fast-ssd  # Use SSD storage class
```

**Storage options:**

- **AWS EBS:** Use `gp3` volumes (SSD)
- **GCP Persistent Disk:** Use `pd-ssd`
- **Azure Disk:** Use `Premium_LRS`
- **NFS:** Shared filesystem (slower, but works)

`ReadOnlyMany` across nodes requires a backend that supports it. Block volumes such as EBS typically attach to one node at a time, so sharing a single claim between pods on different nodes usually means a network filesystem.

### Horizontal Pod Autoscaler

**File:** `k8s/hpa.yaml`

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: metadata-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: metadata-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

### Deploy to Kubernetes

```bash
# Create namespace
kubectl create namespace metadata-api

# Apply manifests
kubectl apply -f k8s/pvc.yaml -n metadata-api
kubectl apply -f k8s/deployment.yaml -n metadata-api
kubectl apply -f k8s/service.yaml -n metadata-api
kubectl apply -f k8s/hpa.yaml -n metadata-api

# Verify deployment
kubectl get pods -n metadata-api
kubectl get svc -n metadata-api

# View logs
kubectl logs -f deployment/metadata-api -n metadata-api

# Get service URL
kubectl get svc metadata-api -n metadata-api -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```

## Cloud Platform Deployments

### AWS ECS

**Task definition:**

```json
{
  "family": "metadata-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "2048",
  "memory": "8192",
  "containerDefinitions": [{
    "name": "api",
    "image": "ghcr.io/aunali321/music-metadata-api:latest",
    "portMappings": [{
      "containerPort": 8080,
      "protocol": "tcp"
    }],
    "command": ["-db", "/data/main_database.sqlite3"],
    "mountPoints": [{
      "sourceVolume": "database",
      "containerPath": "/data",
      "readOnly": true
    }],
    "healthCheck": {
      "command": ["CMD-SHELL", "wget --spider -q http://localhost:8080/health || exit 1"],
      "interval": 30,
      "timeout": 5,
      "retries": 3,
      "startPeriod": 10
    },
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/metadata-api",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }],
  "volumes": [{
    "name": "database",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-12345678",
      "rootDirectory": "/databases",
      "transitEncryption": "ENABLED"
    }
  }]
}
```

**Deploy:**

```bash
# Create EFS filesystem (for databases)
aws efs create-file-system --tags Key=Name,Value=metadata-db

# Register task definition
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Create service
aws ecs create-service \
  --cluster metadata-cluster \
  --service-name metadata-api \
  --task-definition metadata-api \
  --desired-count 3 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-123],securityGroups=[sg-456]}"
```

### Google Cloud Run

**Deploy:**

```bash
# Build and push image
gcloud builds submit --tag gcr.io/PROJECT_ID/metadata-api

# Create Cloud Filestore instance (for databases)
gcloud filestore instances create metadata-db \
  --zone=us-central1-a \
  --tier=BASIC_SSD \
  --file-share=name=databases,capacity=250GB

# Deploy to Cloud Run
gcloud run deploy metadata-api \
  --image gcr.io/PROJECT_ID/metadata-api \
  --platform managed \
  --region us-central1 \
  --memory 8Gi \
  --cpu 2 \
  --min-instances 1 \
  --max-instances 10 \
  --port 8080 \
  --args="-db,/data/main_database.sqlite3" \
  --execution-environment gen2 \
  --vpc-connector metadata-vpc
```

**Note:** Cloud Run has no native persistent disks. Serve the databases from Cloud Filestore over the VPC connector; the share still has to be mounted at `/data` for the `-db` path above to resolve.
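
When the databases arrive over a network share rather than a local disk, a missing or partial mount is an easy failure to hit, and `/health` will not reveal it. The following is a minimal sketch of a startup preflight check. `checkDBFile` and the 1 GiB size floor are hypothetical and not part of the project's code; only the `/data/main_database.sqlite3` path comes from this guide.

```go
package main

import (
	"fmt"
	"os"
)

// checkDBFile is a hypothetical preflight helper (not part of the project).
// It returns an error when path is missing, is a directory, or is smaller
// than minBytes, which usually points at a missing or partial volume mount.
func checkDBFile(path string, minBytes int64) error {
	info, err := os.Stat(path)
	if err != nil {
		return fmt.Errorf("database file %q not accessible: %w", path, err)
	}
	if info.IsDir() {
		return fmt.Errorf("database path %q is a directory, not a file", path)
	}
	if info.Size() < minBytes {
		return fmt.Errorf("database file %q is %d bytes, expected at least %d",
			path, info.Size(), minBytes)
	}
	return nil
}

func main() {
	// The path mirrors the -db argument used in this guide; the 1 GiB floor
	// is an arbitrary illustrative threshold.
	const dbPath = "/data/main_database.sqlite3"
	if err := checkDBFile(dbPath, 1<<30); err != nil {
		// A real service would likely exit non-zero here so the platform
		// restarts or fails the revision.
		fmt.Println("preflight failed:", err)
		return
	}
	fmt.Println("preflight ok:", dbPath)
}
```

Running a check like this before opening the database turns a confusing query-time error into an explicit startup message that shows up in the platform's logs.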
### Azure Container Instances

**Deploy:**

```bash
# Create Azure Files share (for databases)
az storage share create --name metadata-db --quota 250

# Deploy container
az container create \
  --resource-group metadata-rg \
  --name metadata-api \
  --image ghcr.io/aunali321/music-metadata-api:latest \
  --cpu 2 \
  --memory 8 \
  --ports 8080 \
  --command-line "/app/metadata-api -db /data/main_database.sqlite3" \
  --azure-file-volume-account-name STORAGE_ACCOUNT \
  --azure-file-volume-account-key STORAGE_KEY \
  --azure-file-volume-share-name metadata-db \
  --azure-file-volume-mount-path /data
```

## Resource Requirements

### Minimum Requirements

| Resource | Minimum | Recommended | Notes |
|----------|---------|-------------|-------|
| CPU | 1 core | 2 cores | Search queries CPU-intensive |
| RAM | 4GB | 8GB | 2.5GB for SQLite + 1.5GB for app/OS |
| Disk | 220GB | 250GB | 216GB databases + overhead |
| Disk Type | SSD | NVMe SSD | HDD too slow for 256M rows |
| Network | 100 Mbps | 1 Gbps | For serving JSON responses |

### Scaling Considerations

**Vertical scaling:**

- More RAM: Larger SQLite cache (faster queries)
- More CPU: Faster search queries (CPU-bound)
- Faster disk: Lower query latency

**Horizontal scaling:**

- Each instance needs access to the full 216GB of databases (its own copy unless shared storage is used)
- Read-only safe (no write conflicts)
- Load balancer distributes traffic
- No shared state (rate limiter is per-instance)

**Cost implications:**

- 10 instances with local copies = 2.16TB storage (expensive)
- Consider shared filesystem (NFS, EFS) for databases
- Tradeoff: Shared storage is slower than local SSD

## Monitoring and Logging

### Health Checks

**Endpoint:** `GET /health`

**Response:**

```json
{"status":"ok"}
```

**Limitation:** Doesn't verify database connectivity

**Improved health check (custom implementation):**

```go
import (
	"database/sql"
	"encoding/json"
	"net/http"
)

func healthCheck(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Ping the database, honoring cancellation of the request
		if err := db.PingContext(r.Context()); err != nil {
			http.Error(w, "Database unavailable", http.StatusServiceUnavailable)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
	}
}
```

### Logging

**Current implementation:**

- Go stdlib `log/slog`
- Structured logging for errors
- Output to stdout/stderr

**Log format:**

```
2024-01-15T10:30:00Z level=ERROR msg="Database query failed" error="no such table"
```

**Docker logging:**

```bash
# View logs
docker logs -f metadata-api

# Follow logs with timestamps
docker logs -f --timestamps metadata-api

# Last 100 lines
docker logs --tail 100 metadata-api
```

**Kubernetes logging:**

```bash
# View logs
kubectl logs -f deployment/metadata-api

# Logs from all pods
kubectl logs -f -l app=metadata-api

# Previous container logs (after crash)
kubectl logs --previous pod/metadata-api-abc123
```

### Metrics (Not Implemented)

**Missing metrics:**

- Request count by endpoint
- Request duration percentiles
- Error rate
- Database query duration
- Rate limiter rejections

**Workaround:** Use reverse proxy metrics (nginx, Envoy)

## Security Considerations

### Container Security

**Best practices:**

- Run as non-root user (UID 1000)
- Read-only root filesystem
- Drop all capabilities
- No privileged mode

**Enhanced Dockerfile (runtime stage; the `builder` stage is unchanged):**

```dockerfile
FROM alpine:3.21
RUN apk --no-cache add ca-certificates && \
    adduser -D -u 1000 apiuser
WORKDIR /app
COPY --from=builder /app/metadata-api .

# Make the binary read/execute only (must run as root, before USER)
RUN chmod 555 /app/metadata-api

USER apiuser
ENTRYPOINT ["/app/metadata-api"]
```

The read-only root filesystem itself is a runtime setting, applied with `docker run --read-only` or `readOnlyRootFilesystem: true` in Kubernetes, not by the Dockerfile.

### Network Security

**Recommendations:**

- Deploy behind reverse proxy (nginx, Traefik)
- Use TLS/HTTPS (terminate at proxy)
- Firewall rules (allow only necessary ports)
- VPC/private network (not public internet)

**Example nginx TLS:**

```nginx
server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

### Database Security

**Recommendations:**

- Read-only volume mounts
- Restrictive file permissions (chmod 440: owner and group may read, nobody may write)
- Separate owner for database files, with the service user granted read access through its group
- No write access for the application

**Example permissions:**

```bash
sudo chown root:apiuser /data/main_database.sqlite3
sudo chmod 440 /data/main_database.sqlite3
```

## Troubleshooting

### Common Issues

**Issue:** Container fails to start

**Diagnosis:**

```bash
docker logs metadata-api
```

**Common causes:**

- Database file not found (check volume mount)
- Incorrect `-db` path
- Insufficient memory

**Solution:**

```bash
# Verify volume mount
docker inspect metadata-api | grep Mounts -A 10

# Check database path
docker exec metadata-api ls -lh /data
```

**Issue:** High memory usage

**Diagnosis:**

```bash
docker stats metadata-api
```

**Causes:**

- Rate limiter memory leak (unbounded visitor map)
- Large result sets
- Many concurrent requests

**Solution:**

- Restart container periodically
- Increase memory limit
- Implement visitor cleanup (code change)

**Issue:** Slow queries

**Diagnosis:**

- Check disk I/O (use SSD)
- Monitor CPU usage
- Review query patterns

**Solution:**

- Use SSD storage
- Increase SQLite cache size
- Use batch endpoints (not individual lookups)

## Backup and Recovery

### Backup Strategy

**Database backup:**

```bash
# Stop service (optional, but safer)
systemctl stop metadata-api

# Copy databases
cp /data/main_database.sqlite3 /backup/main_database.sqlite3.$(date +%Y%m%d)
cp /data/track_files.sqlite3 /backup/track_files.sqlite3.$(date +%Y%m%d)

# Restart service
systemctl start metadata-api
```

**Online backup (while running):**

```bash
sqlite3 /data/main_database.sqlite3 ".backup /backup/main_database.sqlite3"
```

### Recovery

**Restore from backup:**

```bash
# Stop service
systemctl stop metadata-api

# Restore databases
cp /backup/main_database.sqlite3.20240115 /data/main_database.sqlite3
cp /backup/track_files.sqlite3.20240115 /data/track_files.sqlite3

# Verify integrity
sqlite3 /data/main_database.sqlite3 "PRAGMA integrity_check;"

# Restart service
systemctl start metadata-api
```

## Performance Tuning

### Database Optimization

**Increase cache size:**

```
_cache_size=-128000  # 128MB (from 64MB)
```

**Increase mmap size:**

```
_mmap_size=2147483648  # 2GB (from 1GB)
```

**Connection pool:**

```go
db.SetMaxOpenConns(16)  // Increase from 8
```

### Container Optimization

**CPU pinning (Docker):**

```bash
docker run --cpuset-cpus="0-3" metadata-api
```

**Memory limits:**

```bash
docker run --memory=8g --memory-swap=8g metadata-api
```

**I/O priority:**

```bash
docker run --blkio-weight=1000 metadata-api
```
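
The `_cache_size` and `_mmap_size` values under Database Optimization read like query parameters on the database connection string. As a rough sketch of how they might be assembled, the hypothetical `buildDSN` helper below appends both to a file path. Whether the project's SQLite driver accepts these exact keys, or a `file:` prefix, is an assumption; a negative cache size follows SQLite's convention of meaning "this many KiB" rather than a page count.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildDSN is a hypothetical helper (not taken from the project). It appends
// the two tuning parameters named in this guide to a database path.
// cacheKiB is emitted as a negative number because SQLite interprets a
// negative cache_size as a size in KiB instead of a number of pages.
func buildDSN(path string, cacheKiB, mmapBytes int64) string {
	q := url.Values{}
	q.Set("_cache_size", strconv.FormatInt(-cacheKiB, 10))
	q.Set("_mmap_size", strconv.FormatInt(mmapBytes, 10))
	// Encode sorts keys alphabetically, so the output is deterministic.
	return "file:" + path + "?" + q.Encode()
}

func main() {
	// Values from the guide: roughly 128MB of page cache and a 2GB memory map.
	fmt.Println(buildDSN("/data/main_database.sqlite3", 128000, 2<<30))
	// prints file:/data/main_database.sqlite3?_cache_size=-128000&_mmap_size=2147483648
}
```

Keeping the numbers in one helper makes it easier to raise them together with the container memory limit, since both the page cache and the memory map count toward the process's footprint.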