Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


MusicBrainz Server Deployment

Docker Architecture

Build System

Template Engine: M4 macros
Base Image: Ubuntu Noble (24.04 LTS)
Dockerfile Location: docker/Dockerfile.template

Template Processing:

# Generate Dockerfile from template
m4 docker/Dockerfile.template > docker/Dockerfile

M4 Macros:

  • INSTALL_PERL_DEPENDENCIES - Install Perl modules via carton
  • INSTALL_NODE_DEPENDENCIES - Install Node.js packages via yarn
  • COMPILE_RESOURCES - Compile static assets
  • SETUP_DATABASE - Initialize PostgreSQL schema

Multi-Stage Build:

  1. Base stage - Install system dependencies
  2. Build stage - Compile assets and dependencies
  3. Runtime stage - Copy artifacts, minimal runtime

Container Types

website:

  • Main web application
  • Serves HTML pages via Template Toolkit
  • Handles user authentication and sessions
  • Port: 5000

webservice:

  • API endpoints (/ws/2/)
  • JSON/XML serialization
  • OAuth authentication
  • Port: 5001

tests:

  • Run test suites
  • Perl unit tests
  • JavaScript tests
  • pgTAP database tests
  • No exposed ports (ephemeral)

cron:

  • Scheduled tasks
  • Statistics calculation
  • Data cleanup
  • Replication packet export
  • No exposed ports

sitemaps:

  • Generate XML sitemaps
  • Update search engine indexes
  • Run daily
  • No exposed ports

json-dump:

  • Export database to JSON
  • Generate data dumps for download
  • Run weekly
  • No exposed ports

solr-backup:

  • Backup Solr indexes
  • Run daily
  • No exposed ports

template-renderer:

  • Isolated Template Toolkit renderer
  • Forked from main process
  • Prevents template errors from crashing main app
  • IPC via Unix socket

Docker Compose

File: docker-compose.yml

Services:

services:
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: musicbrainz
      POSTGRES_PASSWORD: musicbrainz
      POSTGRES_DB: musicbrainz_db
    ports:
      - "5432:5432"

  redis:
    image: redis:7
    volumes:
      - redisdata:/data
    ports:
      - "6379:6379"

  solr:
    image: solr:8.11
    volumes:
      - solrdata:/var/solr
    ports:
      - "8983:8983"

  website:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: website
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5000:5000"
    environment:
      MUSICBRAINZ_SERVER_PROCESSES: 10
      MUSICBRAINZ_USE_PROXY: 1

  webservice:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: webservice
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5001:5001"

volumes:
  pgdata:
  redisdata:
  solrdata:

Image Layers

Base Layer (Ubuntu Noble):

  • System packages (build-essential, libpq-dev, etc.)
  • Perl 5.38
  • Node.js 20
  • PostgreSQL client libraries

Dependency Layer:

  • Perl modules (via carton)
  • Node.js packages (via yarn)
  • Cached for faster rebuilds

Application Layer:

  • Application code
  • Compiled assets
  • Configuration templates

Runtime Layer:

  • Minimal runtime dependencies
  • No build tools
  • Smaller image size

PSGI Server Configuration

Starlet

Server: Starlet (high-performance PSGI server)
Protocol: HTTP/1.1
Concurrency: Pre-forking worker model

Configuration:

# Start Starlet with 10 workers, each recycled after a random 30-90 requests
plackup -s Starlet \
        --max-workers 10 \
        --min-reqs-per-child 30 \
        --max-reqs-per-child 90 \
        --port 5000 \
        app.psgi

Worker Settings:

  • Workers: 10 (configurable via MUSICBRAINZ_SERVER_PROCESSES)
  • Max Requests per Worker: 30-90 (randomized so workers do not all exit and respawn at the same moment)
  • Worker Timeout: 300 seconds (5 minutes)
  • Keepalive: Enabled (60 seconds)

Worker Lifecycle:

  1. Master process forks 10 workers
  2. Each worker handles requests until max_requests reached
  3. Worker exits gracefully
  4. Master forks new worker to replace it
  5. Prevents memory leaks from accumulating
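The recycling scheme above can be sketched in shell. This illustrates the idea and is not Starlet's code: it assumes each worker draws its limit uniformly from the inclusive range [min, max], in the spirit of Starlet's --min-reqs-per-child and --max-reqs-per-child options. pick_request_limit is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: pick a per-worker request limit uniformly from the
# inclusive range [min, max], seeded per worker (e.g. by its PID).
# Usage: pick_request_limit MIN MAX SEED
pick_request_limit() {
    awk -v min="$1" -v max="$2" -v seed="$3" 'BEGIN {
        srand(seed)
        # rand() is in [0, 1), so the offset ranges over 0 .. max - min
        print min + int(rand() * (max - min + 1))
    }'
}

# Simulate a master forking 10 workers with limits between 30 and 90.
# Different seeds usually give different limits, so workers reach their
# limits (and get replaced) at different times.
for worker in 1 2 3 4 5 6 7 8 9 10; do
    echo "worker $worker recycles after $(pick_request_limit 30 90 "$worker") requests"
done
```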

Server::Starter (Zero-Downtime Restarts)

Purpose: Enable zero-downtime deployments

Mechanism:

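  1. Server::Starter binds to the port
  2. Forks Starlet, which inherits the listening socket
  3. On restart signal (HUP):
    • Start a new Starlet process
    • New process inherits the same listening socket (it does not rebind)
    • start_server signals the old process (SIGTERM by default)
    • Old process finishes in-flight requests, then exits
    • No dropped connections: new connections queue on the shared socket throughout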
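Socket inheritance works because start_server exports the inherited descriptors in the SERVER_STARTER_PORT environment variable, as addr=fd pairs separated by semicolons (for example 5000=3); Starlet reads this variable instead of binding the port itself. The sketch below only decodes that variable, which can help when debugging a restart. list_inherited_sockets is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print one "address fd" line per socket that
# start_server passed down via SERVER_STARTER_PORT (format: addr=fd;addr=fd).
# An explicit spec may be given as $1 instead of reading the environment.
list_inherited_sockets() {
    spec=${1:-${SERVER_STARTER_PORT:-}}
    if [ -z "$spec" ]; then
        echo "not running under start_server" >&2
        return 1
    fi
    printf '%s\n' "$spec" | tr ';' '\n' | while IFS= read -r pair; do
        [ -n "$pair" ] || continue
        # Split on the last '=' so "host:port" addresses stay intact.
        addr=${pair%=*}
        fd=${pair##*=}
        echo "$addr $fd"
    done
}
```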

Command:

start_server \
  --port 5000 \
  --pid-file /var/run/musicbrainz.pid \
  --status-file /var/run/musicbrainz.status \
  -- \
  plackup -s Starlet --max-workers 10 app.psgi

Restart:

# Send HUP signal to trigger graceful restart
kill -HUP $(cat /var/run/musicbrainz.pid)

Status Check:

# Check server status
cat /var/run/musicbrainz.status
# Output: one GENERATION:PID line per running generation, e.g. 1:1234
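A deploy script can combine the restart and status check above. This sketch assumes Server::Starter's status-file format of one GENERATION:PID line per running generation; a restart has settled once a generation newer than the one recorded before the HUP is the only line left. newest_generation, newest_pid and restart_settled are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: print the newest generation number in a status file.
newest_generation() {
    sort -t: -k1,1n "$1" | tail -n 1 | cut -d: -f1
}

# Hypothetical helper: print the PID serving the newest generation.
newest_pid() {
    sort -t: -k1,1n "$1" | tail -n 1 | cut -d: -f2
}

# Hypothetical helper: succeed when exactly one generation is listed and
# it is newer than $2, i.e. the old process has drained and exited.
restart_settled() {
    [ "$(grep -c ':' "$1")" -eq 1 ] && [ "$(newest_generation "$1")" -gt "$2" ]
}

# Example deploy step (paths from the start_server command above):
#   old=$(newest_generation /var/run/musicbrainz.status)
#   kill -HUP "$(cat /var/run/musicbrainz.pid)"
#   until restart_settled /var/run/musicbrainz.status "$old"; do sleep 1; done
#   echo "now serving from PID $(newest_pid /var/run/musicbrainz.status)"
```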

Reverse Proxy

Production Setup: Nginx reverse proxy in front of Starlet

Nginx Configuration:

upstream musicbrainz {
    server localhost:5000;
    keepalive 32;
}

server {
    listen 80;
    server_name musicbrainz.org;

    location / {
        proxy_pass http://musicbrainz;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    location /static/ {
        alias /var/www/musicbrainz/root/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}

Benefits:

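  • SSL termination (requires a listen 443 ssl server block, not shown above)
  • Static file serving
  • Gzip compression (requires gzip on, not shown above)
  • Request buffering
  • Load balancing (multiple Starlet instances listed in the upstream block)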

CI/CD Pipeline

GitHub Actions

Workflow File: .github/workflows/test.yml

Triggers:

  • Push to main branch
  • Pull requests
  • Manual workflow dispatch

Build Stage

Job: build-tests-image

Steps:

  1. Checkout code
  2. Set up Docker Buildx
  3. Build test Docker image
  4. Push to GitHub Container Registry
  5. Cache layers for faster rebuilds

Dockerfile: docker/Dockerfile.test

Caching:

  • Perl dependencies cached by cpanfile.snapshot hash
  • Node dependencies cached by yarn.lock hash
  • Docker layer caching via GitHub Actions cache

Test Stages

Job: js-perl-and-pgtap

Matrix:

  • Perl 5.38.0 (stable)
  • Perl 5.42.0 (latest)

Steps:

  1. Pull test image from registry
  2. Start PostgreSQL container
  3. Start Redis container
  4. Initialize test database
  5. Run Perl tests (prove -lr t/)
  6. Run JavaScript tests (yarn test)
  7. Run pgTAP tests (pg_prove -d musicbrainz_test t/pgtap/)
  8. Upload coverage reports

Parallelization: Tests run in parallel across matrix

Selenium Tests

Jobs: selenium-1, selenium-2, selenium-3, selenium-4

Partitioning: Tests split into 4 partitions for parallel execution

Steps:

  1. Pull test image
  2. Start PostgreSQL, Redis, Solr
  3. Start Selenium standalone Chrome
  4. Initialize test database with sample data
  5. Start MusicBrainz server
  6. Run Selenium tests for partition
  7. Upload screenshots on failure

Partition Strategy:

# Partition 1: Artist and release tests
# Partition 2: Recording and work tests
# Partition 3: Edit and relationship tests
# Partition 4: Search and browse tests
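The partitions above are grouped by feature area, which needs manual rebalancing as tests are added. A swapped-in alternative, sketched here, is round-robin assignment over a sorted test list, which keeps partitions equal in file count (though not necessarily in runtime). select_partition and the t/selenium/*.html layout are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read test file names on stdin and print those that
# belong to partition $1 of $2 (1-based). Files are sorted first so every
# CI job computes the same assignment, then dealt out round-robin.
select_partition() {
    part=$1
    total=$2
    sort | awk -v part="$part" -v total="$total" '(NR - 1) % total == part - 1'
}

# Example: the job selenium-2 would run
#   ls t/selenium/*.html | select_partition 2 4
```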

Selenium Configuration:

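# t/selenium.pl
use strict;
use warnings;
use Selenium::Remote::Driver;

# Connect to the Selenium standalone Chrome container on port 4444
my $driver = Selenium::Remote::Driver->new(
    remote_server_addr => 'localhost',
    port               => 4444,
    browser_name       => 'chrome',
    extra_capabilities => {
        # W3C-mode ChromeDriver expects the vendor-prefixed capability key
        'goog:chromeOptions' => {
            args => ['--headless', '--no-sandbox', '--disable-dev-shm-usage'],
        },
    },
);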

Second-Tier Tests

Job: second-perl-and-pgtap

Purpose: Test against Perl 5.42.0 (latest stable)

Trigger: After main tests pass

Allowed to Fail: Yes (informational only)

Report Generation

Job: generate-reports

Steps:

  1. Download coverage reports from all test jobs
  2. Merge coverage data
  3. Generate HTML coverage report
  4. Upload to Codecov
  5. Comment on PR with coverage summary

Coverage Tools:

  • Perl: Devel::Cover
  • JavaScript: Istanbul/nyc

Build Process

Step 1: Install Perl Dependencies

# Install Carton (Perl dependency manager)
cpanm --notest Carton

# Install dependencies from cpanfile.snapshot
carton install --deployment

Dependencies Installed:

  • Catalyst framework
  • Moose object system
  • DBD::Pg database driver
  • Template Toolkit (the Template module)
  • JSON::XS
  • XML::LibXML
  • Redis client
  • ~200 total CPAN modules

Installation Time: ~10 minutes (first time), ~1 minute (cached)

Step 2: Install Node.js Dependencies

# Install Yarn (if not present)
npm install -g yarn

# Install dependencies from yarn.lock
yarn install --frozen-lockfile

Dependencies Installed:

  • React 19.2.4
  • Redux
  • Webpack 5
  • Babel 7
  • Jest (testing)
  • ESLint (linting)
  • ~500 total npm packages

Installation Time: ~5 minutes (first time), ~30 seconds (cached)

Step 3: Compile Static Resources

# Compile CSS, images, fonts
./script/compile_resources.sh

Tasks:

  • Compile LESS to CSS
  • Optimize images (pngcrush, optipng)
  • Copy fonts to static directory
  • Generate CSS sprites
  • Minify CSS

Output: root/static/styles/, root/static/images/

Time: ~2 minutes

Step 4: Build JavaScript Bundles

# Build production bundles with Webpack
yarn run build

# Or for development (with source maps)
yarn run build:dev

Webpack Configuration:

  • Entry points: root/static/scripts/main.js, root/static/scripts/edit.js
  • Output: root/static/build/
  • Loaders: Babel (JSX, ES6+), CSS, file-loader
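  • Plugins: DefinePlugin, plus JS minification and CSS extraction (UglifyJS and ExtractTextPlugin are pre-webpack-5 plugins; under webpack 5 the usual equivalents are TerserPlugin and mini-css-extract-plugin)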
  • Code splitting: Vendor bundle, async chunks

Output Files:

  • main.bundle.js - Main application code
  • vendor.bundle.js - Third-party libraries
  • edit.bundle.js - Edit interface code
  • *.chunk.js - Async-loaded chunks

Time: ~3 minutes (production), ~30 seconds (development)

Step 5: Initialize Database

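# Create the database, load the schema (tables, keys, indexes, triggers)
# and import data dumps in one step
./admin/InitDb.pl --createdb --import /path/to/mbdump*.tar.bz2 --echo

# Or create an empty database with the schema only
./admin/InitDb.pl --createdb --clean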

Schema Loading:

  • 375 tables created
  • 500+ foreign keys added
  • Indexes created
  • Triggers installed

Initial Data:

  • Countries and areas
  • Languages
  • Relationship types
  • Instrument types
  • Genre definitions

Time: ~10 minutes (schema), ~30 minutes (sample data)

Step 6: Build Search Indexes

# Build Solr indexes for all entities
./admin/BuildSearchIndexes.pl --all

Indexes Built:

  • Artist index
  • Release index
  • Recording index
  • Work index
  • Label index
  • Area, event, place, series, instrument indexes

Time: ~2 hours (full production data), ~5 minutes (sample data)

System Requirements

Minimum Requirements (Development)

CPU: 2 cores
RAM: 4 GB
Disk: 20 GB
Database: PostgreSQL 16+
Cache: Redis 6.0+
Search: Solr 8.11+

Recommended Requirements (Production)

CPU: 8+ cores
RAM: 16+ GB
Disk: 500+ GB SSD

  • 350 GB for PostgreSQL database
  • 50 GB for Solr indexes
  • 50 GB for backups
  • 50 GB for logs and temp files

Database: PostgreSQL 16+ with:

  • shared_buffers = 4GB
  • effective_cache_size = 12GB
  • work_mem = 64MB
  • maintenance_work_mem = 1GB

Cache: Redis 6.0+ with:

  • maxmemory = 2GB
  • maxmemory-policy = allkeys-lru

Search: Solr 8.11+ with:

  • Java heap = 4GB
  • Solr cache = 512MB per core

Network Requirements

Bandwidth: 100 Mbps+ (for replication and API traffic)

Ports:

  • 5000 - Website
  • 5001 - Web service API
  • 5432 - PostgreSQL
  • 6379 - Redis
  • 8983 - Solr

Firewall:

  • Allow inbound 80/443 (HTTP/HTTPS)
  • Allow outbound 80/443 (external APIs)
  • Restrict 5432, 6379, 8983 to localhost
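One way to verify the last rule is to inspect listening sockets. The sketch below reads output in the shape printed by ss -ltn (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port) and flags any restricted port bound to a non-loopback address. check_restricted_listeners is a hypothetical helper. Note that Docker-published ports such as "5432:5432" in the Compose file above bind to all interfaces unless written as "127.0.0.1:5432:5432".

```shell
#!/bin/sh
# Hypothetical helper: read `ss -ltn` style output on stdin and report
# listeners on PostgreSQL (5432), Redis (6379) or Solr (8983) that are
# not bound to loopback. Exits non-zero if any are found.
check_restricted_listeners() {
    awk '
        $1 == "LISTEN" {
            addr = $4
            port = addr
            sub(/.*:/, "", port)    # keep the text after the last colon
            host = substr(addr, 1, length(addr) - length(port) - 1)
            if ((port == "5432" || port == "6379" || port == "8983") &&
                host != "127.0.0.1" && host != "[::1]") {
                print "exposed: " addr
                bad = 1
            }
        }
        END { exit bad }
    '
}

# Example: ss -ltn | check_restricted_listeners
```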

Software Requirements

Operating System:

  • Ubuntu 24.04 LTS (Noble) - recommended
  • Debian 12 (Bookworm)
  • Any Linux with Perl 5.38+ and Node.js 20+

Perl: 5.38.0 or later (5.42.0 tested)

Node.js: 20.9.0 or later

PostgreSQL: 16.0 or later (16.3 recommended)

Redis: 6.0 or later (7.0 recommended)

Solr: 8.11 or later

Optional:

  • Docker 24.0+
  • Docker Compose 2.0+
  • Nginx 1.24+ (reverse proxy)
  • RabbitMQ 3.12+ (background jobs)

Deployment Strategies

Single Server

Use Case: Development, small mirrors

Architecture:

  • All services on one server
  • PostgreSQL, Redis, Solr, MusicBrainz on localhost
  • Nginx reverse proxy

Pros:

  • Simple setup
  • Low cost
  • Easy to manage

Cons:

  • Single point of failure
  • Limited scalability
  • Resource contention

Multi-Server

Use Case: Production, high-traffic mirrors

Architecture:

  • Web tier: 2+ servers running MusicBrainz (load balanced)
  • Database tier: PostgreSQL primary + replicas
  • Cache tier: Redis (possibly clustered)
  • Search tier: Solr (possibly sharded)

Pros:

  • High availability
  • Horizontal scalability
  • Better performance

Cons:

  • Complex setup
  • Higher cost
  • Requires load balancer

Docker Swarm / Kubernetes

Use Case: Large-scale deployments, cloud environments

Architecture:

  • Container orchestration
  • Auto-scaling
  • Service discovery
  • Health checks

Pros:

  • Automated deployment
  • Self-healing
  • Easy scaling

Cons:

  • Steep learning curve
  • Operational complexity
  • Overhead

Monitoring and Logging

Logging

Framework: Log::Dispatch

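Log Levels (Log::Dispatch names, lowest to highest):

  • debug - Verbose debugging
  • info - Informational messages
  • notice - Normal but significant events
  • warning - Warnings
  • error - Errors
  • critical, alert, emergency - Severe and fatal conditions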

Log Destinations:

  • STDOUT (development)
  • File (production): /var/log/musicbrainz/server.log
  • Syslog (optional)

Log Rotation:

  • Daily rotation
  • Keep 30 days
  • Compress old logs

Error Tracking

Platform: Sentry

Integration:

  • Server-side: Perl Sentry SDK
  • Client-side: JavaScript Sentry SDK

Captured:

  • Exceptions
  • Error messages
  • Stack traces
  • Request context
  • User context

Metrics

Current State: No Prometheus/metrics endpoint

Workaround: Parse logs for metrics

Future: Prometheus exporter planned

Health Checks

Current State: No dedicated health check endpoint

Workaround: Check / returns 200

Future: /health endpoint planned
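Until a dedicated endpoint exists, the workaround above can be scripted. This sketch uses curl (-s, -o, -w '%{http_code}', --max-time) and treats only HTTP 200 as healthy; mb_healthcheck and its default URL are assumptions. A probe like this could back a Docker HEALTHCHECK or a Kubernetes exec probe.

```shell
#!/bin/sh
# Hypothetical probe: succeed only if GET <url> returns HTTP 200 within
# the timeout (seconds). Defaults to the website port used above.
# Usage: mb_healthcheck [URL] [TIMEOUT]
mb_healthcheck() {
    url=${1:-http://localhost:5000/}
    timeout=${2:-5}
    # On connection failure curl prints "000" and exits non-zero;
    # "|| true" keeps that from aborting scripts that use `set -e`.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "$timeout" "$url") || true
    if [ "$code" = "200" ]; then
        return 0
    fi
    echo "unhealthy: $url returned ${code:-no response}" >&2
    return 1
}
```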