Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


# Music Metadata API - Deployment
## Deployment Overview
Music Metadata API supports two primary deployment models:
1. **Standalone binary** - Single executable with database files
2. **Docker container** - Containerized deployment with orchestration support
Both models require ~216GB of database files; runtime needs are otherwise modest (see Resource Requirements).
## Build Process
### Building from Source
**Prerequisites:**
- Go 1.24+
- Git
**Build steps:**
```bash
# Clone repository
git clone https://github.com/Aunali321/music-metadata-api.git
cd music-metadata-api
# Build binary (CGO disabled for static linking)
CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server
# Verify binary
./metadata-api -h
```
**Build flags explained:**
| Flag | Purpose | Impact |
|------|---------|--------|
| `CGO_ENABLED=0` | Disable CGO | Pure Go binary, no C dependencies |
| `-ldflags="-s -w"` | Strip symbols | Smaller binary (~30% reduction) |
| `-s` | Strip debug symbols | Removes symbol table |
| `-w` | Strip DWARF | Removes debugging info |
**Binary size:** ~10-15MB (stripped)
**Output:** Single executable (`metadata-api`)
### Cross-Compilation
**Build for Linux (from macOS/Windows):**
```bash
GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api-linux ./cmd/server
```
**Build for ARM (Raspberry Pi, AWS Graviton):**
```bash
GOOS=linux GOARCH=arm64 CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api-arm64 ./cmd/server
```
**Supported platforms:**
- Linux (amd64, arm64)
- macOS (amd64, arm64)
- Windows (amd64)
## Docker Build
### Dockerfile
**Multi-stage build:**
```dockerfile
# Stage 1: Build
FROM golang:1.24-alpine AS builder
WORKDIR /app
# Copy dependency files
COPY go.mod go.sum ./
RUN go mod download
# Copy source code
COPY . .
# Build binary
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o metadata-api ./cmd/server
# Stage 2: Runtime
FROM alpine:3.21
# Install CA certificates (for HTTPS if needed)
RUN apk --no-cache add ca-certificates
WORKDIR /app
# Copy binary from builder
COPY --from=builder /app/metadata-api .
# Expose port
EXPOSE 8080
# Run as non-root user
RUN adduser -D -u 1000 apiuser
USER apiuser
# Entry point
ENTRYPOINT ["/app/metadata-api"]
```
**Build characteristics:**
- **Base image:** Alpine Linux 3.21 (~5MB)
- **Final image size:** ~15-20MB (without databases)
- **Security:** Runs as non-root user
- **Layers:** Optimized for caching (dependencies separate from code)
### Building Docker Image
**Build locally:**
```bash
docker build -t metadata-api:latest .
```
**Build with specific tag:**
```bash
docker build -t metadata-api:v1.0.0 .
```
**Build for multiple platforms:**
```bash
docker buildx build --platform linux/amd64,linux/arm64 -t metadata-api:latest .
```
### Official Docker Image
**Registry:** GitHub Container Registry (ghcr.io)
**Image:** `ghcr.io/aunali321/music-metadata-api:latest`
**Pull image:**
```bash
docker pull ghcr.io/aunali321/music-metadata-api:latest
```
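**Image tags (as produced by the workflow configuration in this document):**
- `main` - Latest build from the `main` branch (`type=ref,event=branch`)
- `latest` - Most recent tagged release (`docker/metadata-action` adds it for semver tags by default)
- `1.0.0`, `1.0` - Semantic version tags; the `{{version}}` pattern drops the leading `v` of the Git tag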
## CI/CD Pipeline
### GitHub Actions Workflow
**File:** `.github/workflows/docker-publish.yml`
**Triggers:**
- Push to `main` branch
- Push tags matching `v*` (e.g., `v1.0.0`)
- Pull requests (build only, no publish)
**Workflow steps:**
```yaml
name: Docker Publish
on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=ref,event=branch
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```
**Key features:**
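- **Multi-platform builds:** Not configured in the workflow as shown, which builds only for the runner's architecture (amd64). Publishing amd64 and arm64 images requires `platforms: linux/amd64,linux/arm64` on the build step plus a QEMU setup step (`docker/setup-qemu-action`)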
- **Caching:** GitHub Actions cache for faster builds
- **Automatic tagging:** Branch name, semantic versions
- **Security:** Uses GitHub token (no manual secrets)
**Notable omission:** No test step (zero tests in codebase)
### Release Process
**Create release:**
```bash
# Tag version
git tag v1.0.0
git push origin v1.0.0
# GitHub Actions automatically:
# 1. Builds Docker image
# 2. Tags as 1.0.0, 1.0, latest (the {{version}} pattern drops the leading "v"; no {{major}} pattern is configured)
# 3. Pushes to ghcr.io
```
**Verify release:**
```bash
# Image tags carry no leading "v" with the {{version}} pattern
docker pull ghcr.io/aunali321/music-metadata-api:1.0.0
```
## Standalone Deployment
### Prerequisites
**System requirements:**
- Linux, macOS, or Windows
- 216GB disk space (databases)
- 4GB+ RAM
- SSD recommended (HDD too slow)
**Database files:**
- `main_database.sqlite3` (~117GB)
- `track_files.sqlite3` (~99GB)
- Must be obtained separately (not in repository)
### Deployment Steps
**1. Prepare environment:**
```bash
# Create directory structure
mkdir -p /opt/metadata-api/data
cd /opt/metadata-api
# Copy databases
cp /path/to/main_database.sqlite3 data/
cp /path/to/track_files.sqlite3 data/
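# Copy binary (absolute source path, because the working directory is now /opt/metadata-api)
cp /path/to/metadata-api /opt/metadata-api/
chmod +x /opt/metadata-api/metadata-api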
```
**2. Run service:**
```bash
./metadata-api -db /opt/metadata-api/data/main_database.sqlite3 -addr :8080
```
**3. Verify:**
```bash
curl http://localhost:8080/health
# Expected: {"status":"ok"}
```
### Systemd Service
**Create service file:** `/etc/systemd/system/metadata-api.service`
```ini
[Unit]
Description=Music Metadata API
After=network.target
[Service]
Type=simple
User=apiuser
Group=apiuser
WorkingDirectory=/opt/metadata-api
ExecStart=/opt/metadata-api/metadata-api -db /opt/metadata-api/data/main_database.sqlite3 -addr :8080
Restart=on-failure
RestartSec=10s
# Resource limits
LimitNOFILE=65536
# MemoryMax= replaces the deprecated MemoryLimit= on current systemd (cgroup v2)
MemoryMax=8G
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=metadata-api
[Install]
WantedBy=multi-user.target
```
**Enable and start:**
```bash
# Create user
sudo useradd -r -s /bin/false apiuser
sudo chown -R apiuser:apiuser /opt/metadata-api
# Enable service
sudo systemctl daemon-reload
sudo systemctl enable metadata-api
sudo systemctl start metadata-api
# Check status
sudo systemctl status metadata-api
# View logs
sudo journalctl -u metadata-api -f
```
## Docker Deployment
### Docker Run
**Basic run:**
```bash
docker run -d \
--name metadata-api \
-p 8080:8080 \
-v /path/to/databases:/data:ro \
ghcr.io/aunali321/music-metadata-api:latest \
-db /data/main_database.sqlite3
```
**With resource limits:**
```bash
docker run -d \
--name metadata-api \
-p 8080:8080 \
-v /path/to/databases:/data:ro \
--memory=8g \
--cpus=2 \
--restart=unless-stopped \
ghcr.io/aunali321/music-metadata-api:latest \
-db /data/main_database.sqlite3 \
-addr :8080
```
**Verify:**
```bash
docker logs metadata-api
curl http://localhost:8080/health
```
### Docker Compose
**File:** `docker-compose.yml`
```yaml
version: '3.8'
services:
  metadata-api:
    image: ghcr.io/aunali321/music-metadata-api:latest
    container_name: metadata-api
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data:ro
    environment:
      - LOG_LEVEL=info # NOTE: Not actually used in code
    command: ["-db", "/data/main_database.sqlite3"]
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 8G
          cpus: '2'
        reservations:
          memory: 4G
          cpus: '1'
```
**Deploy:**
```bash
# Start services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
```
**Health check details:**
- **Command:** `wget --spider -q http://localhost:8080/health`
- **Interval:** Every 30 seconds
- **Timeout:** 10 seconds
- **Retries:** 3 failures before unhealthy
- **Start period:** 10 seconds grace period
**Limitation:** Health check doesn't verify database connectivity (naive implementation)
## Kubernetes Deployment
### Deployment Manifest
**File:** `k8s/deployment.yaml`
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metadata-api
  labels:
    app: metadata-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: metadata-api
  template:
    metadata:
      labels:
        app: metadata-api
    spec:
      containers:
        - name: api
          image: ghcr.io/aunali321/music-metadata-api:latest
          args: ["-db", "/data/main_database.sqlite3", "-addr", ":8080"]
          ports:
            - containerPort: 8080
              name: http
          volumeMounts:
            - name: database
              mountPath: /data
              readOnly: true
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          resources:
            requests:
              memory: "4Gi"
              cpu: "1"
            limits:
              memory: "8Gi"
              cpu: "2"
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            readOnlyRootFilesystem: true
      volumes:
        - name: database
          persistentVolumeClaim:
            claimName: metadata-db-pvc
```
### Service Manifest
**File:** `k8s/service.yaml`
```yaml
apiVersion: v1
kind: Service
metadata:
  name: metadata-api
  labels:
    app: metadata-api
spec:
  type: LoadBalancer
  selector:
    app: metadata-api
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http
  sessionAffinity: None
```
### Persistent Volume
**File:** `k8s/pvc.yaml`
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: metadata-db-pvc
spec:
  accessModes:
    - ReadOnlyMany # Multiple pods can read
  resources:
    requests:
      storage: 220Gi # 216GB databases + overhead
  storageClassName: fast-ssd # Use SSD storage class
```
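**Storage options (the claim requests `ReadOnlyMany`, which not every backend supports):**
- **AWS EBS:** Use `gp3` volumes (SSD); filesystem volumes are `ReadWriteOnce` only, so pods on different nodes cannot share one volume
- **GCP Persistent Disk:** Use `pd-ssd`; supports `ReadOnlyMany`
- **Azure Disk:** Use `Premium_LRS`; `ReadWriteOnce` only, so use Azure Files for shared access
- **NFS / AWS EFS:** Shared filesystem with multi-node access (slower, but works)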
### Horizontal Pod Autoscaler
**File:** `k8s/hpa.yaml`
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: metadata-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: metadata-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
### Deploy to Kubernetes
```bash
# Create namespace
kubectl create namespace metadata-api
# Apply manifests
kubectl apply -f k8s/pvc.yaml -n metadata-api
kubectl apply -f k8s/deployment.yaml -n metadata-api
kubectl apply -f k8s/service.yaml -n metadata-api
kubectl apply -f k8s/hpa.yaml -n metadata-api
# Verify deployment
kubectl get pods -n metadata-api
kubectl get svc -n metadata-api
# View logs
kubectl logs -f deployment/metadata-api -n metadata-api
# Get service URL
kubectl get svc metadata-api -n metadata-api -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```
## Cloud Platform Deployments
### AWS ECS
**Task definition:**
```json
{
  "family": "metadata-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "2048",
  "memory": "8192",
  "containerDefinitions": [{
    "name": "api",
    "image": "ghcr.io/aunali321/music-metadata-api:latest",
    "portMappings": [{
      "containerPort": 8080,
      "protocol": "tcp"
    }],
    "command": ["-db", "/data/main_database.sqlite3"],
    "mountPoints": [{
      "sourceVolume": "database",
      "containerPath": "/data",
      "readOnly": true
    }],
    "healthCheck": {
      "command": ["CMD-SHELL", "wget --spider -q http://localhost:8080/health || exit 1"],
      "interval": 30,
      "timeout": 5,
      "retries": 3,
      "startPeriod": 10
    },
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/metadata-api",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }],
  "volumes": [{
    "name": "database",
    "efsVolumeConfiguration": {
      "fileSystemId": "fs-12345678",
      "rootDirectory": "/databases",
      "transitEncryption": "ENABLED"
    }
  }]
}
```
**Deploy:**
```bash
# Create EFS filesystem (for databases)
aws efs create-file-system --tags Key=Name,Value=metadata-db
# Register task definition
aws ecs register-task-definition --cli-input-json file://task-definition.json
# Create service
aws ecs create-service \
--cluster metadata-cluster \
--service-name metadata-api \
--task-definition metadata-api \
--desired-count 3 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-123],securityGroups=[sg-456]}"
```
### Google Cloud Run
**Deploy:**
```bash
# Build and push image
gcloud builds submit --tag gcr.io/PROJECT_ID/metadata-api
# Create Cloud Filestore instance (for databases)
gcloud filestore instances create metadata-db \
--zone=us-central1-a \
--tier=BASIC_SSD \
--file-share=name=databases,capacity=250GB
# Deploy to Cloud Run
gcloud run deploy metadata-api \
--image gcr.io/PROJECT_ID/metadata-api \
--platform managed \
--region us-central1 \
--memory 8Gi \
--cpu 2 \
--min-instances 1 \
--max-instances 10 \
--port 8080 \
--args="-db,/data/main_database.sqlite3" \
--execution-environment gen2 \
--vpc-connector metadata-vpc
```
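**Note:** Cloud Run has no persistent disks. The deploy command above only attaches the VPC connector; the Filestore share still has to be mounted into the container at `/data` (for example as an NFS volume, which requires the second-generation execution environment) before the `-db` path resolves. Also check the minimum capacity of the chosen Filestore tier, since the Basic tiers start well above 250GB.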
### Azure Container Instances
**Deploy:**
```bash
# Create Azure Files share (for databases)
az storage share create --name metadata-db --quota 250
# Deploy container
az container create \
--resource-group metadata-rg \
--name metadata-api \
--image ghcr.io/aunali321/music-metadata-api:latest \
--cpu 2 \
--memory 8 \
--ports 8080 \
--command-line "/app/metadata-api -db /data/main_database.sqlite3" \
--azure-file-volume-account-name STORAGE_ACCOUNT \
--azure-file-volume-account-key STORAGE_KEY \
--azure-file-volume-share-name metadata-db \
--azure-file-volume-mount-path /data
```
## Resource Requirements
### Minimum Requirements
| Resource | Minimum | Recommended | Notes |
|----------|---------|-------------|-------|
| CPU | 1 core | 2 cores | Search queries CPU-intensive |
| RAM | 4GB | 8GB | 2.5GB for SQLite + 1.5GB for app/OS |
| Disk | 220GB | 250GB | 216GB databases + overhead |
| Disk Type | SSD | NVMe SSD | HDD too slow for 256M rows |
| Network | 100 Mbps | 1 Gbps | For serving JSON responses |
### Scaling Considerations
**Vertical scaling:**
- More RAM: Larger SQLite cache (faster queries)
- More CPU: Faster search queries (CPU-bound)
- Faster disk: Lower query latency
**Horizontal scaling:**
- Each instance needs access to the full 216GB of databases (a local copy, or a shared read-only volume)
- Read-only safe (no write conflicts)
- Load balancer distributes traffic
- No shared state (rate limiter per-instance)
**Cost implications:**
- 10 instances = 2.16TB storage (expensive)
- Consider shared filesystem (NFS, EFS) for databases
- Tradeoff: Shared storage slower than local SSD
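The storage arithmetic behind these trade-offs can be sketched in a few lines of Go. The 216GB figure comes from this document; the per-instance rate limit passed to `aggregateRateLimit` is a hypothetical value, since the actual limit is not stated here.

```go
package main

import "fmt"

// dbSizeGB is the combined size of both SQLite files, as stated in this document.
const dbSizeGB = 216

// localStorageGB is the total disk needed when every instance keeps its own copy.
func localStorageGB(instances int) int {
	return instances * dbSizeGB
}

// sharedStorageGB is the disk needed when all instances mount one shared copy.
func sharedStorageGB(instances int) int {
	if instances <= 0 {
		return 0
	}
	return dbSizeGB
}

// aggregateRateLimit is the effective cluster-wide limit when each instance
// enforces its own limit independently (no shared state).
func aggregateRateLimit(perInstance, instances int) int {
	return perInstance * instances
}

func main() {
	for _, n := range []int{1, 3, 10} {
		fmt.Printf("%2d instances: local=%dGB shared=%dGB\n",
			n, localStorageGB(n), sharedStorageGB(n))
	}
	// Hypothetical limit of 100 requests/min per instance.
	fmt.Println("aggregate limit:", aggregateRateLimit(100, 3))
}
```

Ten instances with local copies come to 2160GB, matching the 2.16TB figure above. Because the rate limiter holds no shared state, a client spread across instances by the load balancer can be allowed up to the per-instance limit multiplied by the instance count.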
## Monitoring and Logging
### Health Checks
**Endpoint:** `GET /health`
**Response:**
```json
{"status":"ok"}
```
**Limitation:** Doesn't verify database connectivity
**Improved health check (custom implementation):**
```go
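import (
	"database/sql"
	"encoding/json"
	"net/http"
)

// healthCheck returns 503 when the database cannot be reached.
func healthCheck(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Ping database
		if err := db.Ping(); err != nil {
			http.Error(w, "Database unavailable", http.StatusServiceUnavailable)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
	}
}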
```
### Logging
**Current implementation:**
- Go stdlib `log/slog`
- Structured logging for errors
- Output to stdout/stderr
**Log format:**
```
2024-01-15T10:30:00Z level=ERROR msg="Database query failed" error="no such table"
```
**Docker logging:**
```bash
# View logs
docker logs -f metadata-api
# Follow logs with timestamps
docker logs -f --timestamps metadata-api
# Last 100 lines
docker logs --tail 100 metadata-api
```
**Kubernetes logging:**
```bash
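# View logs (the manifests above were applied to the metadata-api namespace)
kubectl logs -f deployment/metadata-api -n metadata-api
# Logs from all pods
kubectl logs -f -l app=metadata-api -n metadata-api
# Previous container logs (after crash)
kubectl logs --previous pod/metadata-api-abc123 -n metadata-api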
```
### Metrics (Not Implemented)
**Missing metrics:**
- Request count by endpoint
- Request duration percentiles
- Error rate
- Database query duration
- Rate limiter rejections
**Workaround:** Use reverse proxy metrics (nginx, Envoy)
## Security Considerations
### Container Security
**Best practices:**
- Run as non-root user (UID 1000)
- Read-only root filesystem
- Drop all capabilities
- No privileged mode
**Enhanced Dockerfile:**
```dockerfile
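# Runtime stage; assumes the builder stage from the Dockerfile in the Docker Build section
FROM alpine:3.21
RUN apk --no-cache add ca-certificates && \
    adduser -D -u 1000 apiuser
WORKDIR /app
COPY --from=builder /app/metadata-api .
# Make the binary read/execute-only while still root (chmod would fail after USER)
RUN chmod 555 /app/metadata-api
USER apiuser
# A read-only root filesystem is enforced at runtime (e.g. docker run --read-only), not in the image
ENTRYPOINT ["/app/metadata-api"]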
```
### Network Security
**Recommendations:**
- Deploy behind reverse proxy (nginx, Traefik)
- Use TLS/HTTPS (terminate at proxy)
- Firewall rules (allow only necessary ports)
- VPC/private network (not public internet)
**Example nginx TLS:**
```nginx
server {
    listen 443 ssl http2;
    server_name api.example.com;
    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```
### Database Security
**Recommendations:**
- Read-only volume mounts
- Restrictive file permissions (chmod 440: owner and group read-only, no access for others)
- Separate user for database files
- No write access to application
**Example permissions:**
```bash
sudo chown root:apiuser /data/main_database.sqlite3
sudo chmod 440 /data/main_database.sqlite3
```
## Troubleshooting
### Common Issues
**Issue:** Container fails to start
**Diagnosis:**
```bash
docker logs metadata-api
```
**Common causes:**
- Database file not found (check volume mount)
- Incorrect `-db` path
- Insufficient memory
**Solution:**
```bash
# Verify volume mount
docker inspect metadata-api | grep Mounts -A 10
# Check database path
docker exec metadata-api ls -lh /data
```
**Issue:** High memory usage
**Diagnosis:**
```bash
docker stats metadata-api
```
**Causes:**
- Rate limiter memory leak (unbounded visitor map)
- Large result sets
- Many concurrent requests
**Solution:**
- Restart container periodically
- Increase memory limit
- Implement visitor cleanup (code change)
**Issue:** Slow queries
**Diagnosis:**
- Check disk I/O (use SSD)
- Monitor CPU usage
- Review query patterns
**Solution:**
- Use SSD storage
- Increase SQLite cache size
- Use batch endpoints (not individual lookups)
## Backup and Recovery
### Backup Strategy
**Database backup:**
```bash
# Stop service (optional, but safer)
systemctl stop metadata-api
# Copy databases
cp /data/main_database.sqlite3 /backup/main_database.sqlite3.$(date +%Y%m%d)
cp /data/track_files.sqlite3 /backup/track_files.sqlite3.$(date +%Y%m%d)
# Restart service
systemctl start metadata-api
```
**Online backup (while running):**
```bash
sqlite3 /data/main_database.sqlite3 ".backup /backup/main_database.sqlite3"
```
### Recovery
**Restore from backup:**
```bash
# Stop service
systemctl stop metadata-api
# Restore databases
cp /backup/main_database.sqlite3.20240115 /data/main_database.sqlite3
cp /backup/track_files.sqlite3.20240115 /data/track_files.sqlite3
# Verify integrity
sqlite3 /data/main_database.sqlite3 "PRAGMA integrity_check;"
# Restart service
systemctl start metadata-api
```
## Performance Tuning
### Database Optimization
**Increase cache size:**
```
_cache_size=-128000 # 128MB (from 64MB)
```
**Increase mmap size:**
```
_mmap_size=2147483648 # 2GB (from 1GB)
```
**Connection pool:**
```go
db.SetMaxOpenConns(16) // Increase from 8
```
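The three settings above interact, because SQLite's page cache is allocated per connection. The Go sketch below builds a DSN from the parameter names shown in this section and estimates the worst-case cache memory. Whether the project's SQLite driver accepts `_cache_size` and `_mmap_size` in this form is an assumption carried over from the text, and `buildDSN` and `worstCaseCacheMiB` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildDSN assembles a file: DSN with the cache and mmap parameters.
// A negative cache size tells SQLite the value is in KiB rather than pages.
func buildDSN(path string, cacheKiB int, mmapBytes int64) string {
	q := url.Values{}
	q.Set("_cache_size", strconv.Itoa(-cacheKiB))
	q.Set("_mmap_size", strconv.FormatInt(mmapBytes, 10))
	return "file:" + path + "?" + q.Encode()
}

// worstCaseCacheMiB estimates page-cache memory if every pooled connection
// fills its own cache.
func worstCaseCacheMiB(maxOpenConns, cacheKiB int) int {
	return maxOpenConns * cacheKiB / 1024
}

func main() {
	fmt.Println(buildDSN("/data/main_database.sqlite3", 128000, 2<<30))
	fmt.Println(worstCaseCacheMiB(16, 128000), "MiB")
}
```

With 16 connections and a 128000 KiB cache each, the estimate is 2000 MiB, so raising both settings together should be checked against the container memory limit.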
### Container Optimization
**CPU pinning (Docker):**
```bash
docker run --cpuset-cpus="0-3" metadata-api
```
**Memory limits:**
```bash
docker run --memory=8g --memory-swap=8g metadata-api
```
**I/O priority:**
```bash
docker run --blkio-weight=1000 metadata-api
```