
Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00

14 KiB


MusicBrainz Server Deployment

Docker Architecture

Build System

Template Engine: M4 macros
Base Image: Ubuntu Noble (24.04 LTS)
Dockerfile Location: docker/Dockerfile.template

Template Processing:

# Generate Dockerfile from template
m4 docker/Dockerfile.template > docker/Dockerfile

M4 Macros:

INSTALL_PERL_DEPENDENCIES - Install Perl modules via carton
INSTALL_NODE_DEPENDENCIES - Install Node.js packages via yarn
COMPILE_RESOURCES - Compile static assets
SETUP_DATABASE - Initialize PostgreSQL schema
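As a rough model of the template step, the Python sketch below mimics what m4 does with these argument-less macros: every bare macro name in the template is replaced by its definition. The macro bodies are hypothetical placeholders, not the project's real definitions, and real m4 also handles arguments, quoting and rescanning of expanded text, which this toy skips.

```python
import re

# Hypothetical macro bodies; the real definitions live in the project's m4 sources.
MACROS = {
    "INSTALL_PERL_DEPENDENCIES": "RUN carton install --deployment",
    "INSTALL_NODE_DEPENDENCIES": "RUN yarn install --frozen-lockfile",
    "COMPILE_RESOURCES": "RUN ./script/compile_resources.sh",
}

# Match whole upper-case identifiers only, so "MY_COMPILE_RESOURCES_X" stays one token.
TOKEN = re.compile(r"\b[A-Z][A-Z0-9_]*\b")


def expand(template: str, macros: dict[str, str]) -> str:
    """Replace every known macro name with its body; leave other tokens untouched."""
    return TOKEN.sub(lambda m: macros.get(m.group(0), m.group(0)), template)


template = """FROM ubuntu:noble
INSTALL_PERL_DEPENDENCIES
INSTALL_NODE_DEPENDENCIES
COMPILE_RESOURCES
"""
print(expand(template, MACROS))
```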

Multi-Stage Build:

Base stage - Install system dependencies
Build stage - Compile assets and dependencies
Runtime stage - Copy artifacts, minimal runtime

Container Types

website:

Main web application
Serves HTML pages via Template Toolkit
Handles user authentication and sessions
Port: 5000

webservice:

API endpoints (/ws/2/)
JSON/XML serialization
OAuth authentication
Port: 5001

tests:

Run test suites
Perl unit tests
JavaScript tests
pgTAP database tests
No exposed ports (ephemeral)

cron:

Scheduled tasks
Statistics calculation
Data cleanup
Replication packet export
No exposed ports

sitemaps:

Generate XML sitemaps
Update search engine indexes
Run daily
No exposed ports

json-dump:

Export database to JSON
Generate data dumps for download
Run weekly
No exposed ports

solr-backup:

Backup Solr indexes
Run daily
No exposed ports

template-renderer:

Isolated Template Toolkit renderer
Forked from main process
Prevents template errors from crashing main app
IPC via Unix socket

Docker Compose

File: docker-compose.yml

Services:

services:
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: musicbrainz
      POSTGRES_PASSWORD: musicbrainz
      POSTGRES_DB: musicbrainz_db
    ports:
      - "5432:5432"

  redis:
    image: redis:7
    volumes:
      - redisdata:/data
    ports:
      - "6379:6379"

  solr:
    image: solr:8.11
    volumes:
      - solrdata:/var/solr
    ports:
      - "8983:8983"

  website:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: website
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5000:5000"
    environment:
      MUSICBRAINZ_SERVER_PROCESSES: 10
      MUSICBRAINZ_USE_PROXY: 1

  webservice:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: webservice
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5001:5001"

volumes:
  pgdata:
  redisdata:
  solrdata:
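A quick consistency check of the service map above can be sketched in Python. It works on a plain dict mirroring the YAML (reading the real file would need a YAML parser such as PyYAML, which is not in the standard library) and flags depends_on entries that name undefined services, plus host ports published more than once. The check is an illustration, not part of the project.

```python
from collections import Counter
from typing import Optional

# Mirrors the services in docker-compose.yml above (only the keys the checks need).
SERVICES = {
    "db": {"ports": ["5432:5432"]},
    "redis": {"ports": ["6379:6379"]},
    "solr": {"ports": ["8983:8983"]},
    "website": {"depends_on": ["db", "redis", "solr"], "ports": ["5000:5000"]},
    "webservice": {"depends_on": ["db", "redis", "solr"], "ports": ["5001:5001"]},
}


def host_port(spec: str) -> Optional[str]:
    """Host side of "HOST:CONTAINER" or "IP:HOST:CONTAINER"; None for a bare container port."""
    parts = spec.split(":")
    return parts[-2] if len(parts) >= 2 else None


def check(services: dict) -> list[str]:
    problems = []
    for name, svc in services.items():
        for dep in svc.get("depends_on", []):
            if dep not in services:
                problems.append(f"{name}: depends on undefined service {dep!r}")
    # Count each published host port across all services.
    published = Counter(
        port
        for svc in services.values()
        for spec in svc.get("ports", [])
        if (port := host_port(spec)) is not None
    )
    problems += [f"host port {p} published {n} times" for p, n in published.items() if n > 1]
    return problems


print(check(SERVICES) or "no problems found")
```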

Image Layers

Base Layer (Ubuntu Noble):

System packages (build-essential, libpq-dev, etc.)
Perl 5.38
Node.js 20
PostgreSQL client libraries

Dependency Layer:

Perl modules (via carton)
Node.js packages (via yarn)
Cached for faster rebuilds

Application Layer:

Application code
Compiled assets
Configuration templates

Runtime Layer:

Minimal runtime dependencies
No build tools
Smaller image size

PSGI Server Configuration

Starlet

Server: Starlet (high-performance PSGI server)
Protocol: HTTP/1.1
Concurrency: Pre-forking worker model

Configuration:

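# Start Starlet (through plackup) with 10 workers; each worker exits after
# a random number of requests between 30 and 90
plackup -s Starlet \
        --max-workers=10 \
        --min-reqs-per-child=30 \
        --max-reqs-per-child=90 \
        --listen :5000 \
        app.psgi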

Worker Settings:

Workers: 10 (configurable via MUSICBRAINZ_SERVER_PROCESSES)
Max Requests per Worker: 30-90 (randomized per worker so that workers do not all recycle at the same time)
Worker Timeout: 300 seconds (5 minutes)
Keepalive: Enabled (60 seconds)

Worker Lifecycle:

Master process forks 10 workers
Each worker handles requests until it reaches its own request limit
Worker exits gracefully
Master forks new worker to replace it
Limits how much memory a leaking worker can accumulate
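The recycling behaviour can be illustrated with a small Python simulation (not Starlet code). Each worker draws a request limit between 30 and 90 (uniformly here, which is an assumption about the distribution), and the master replaces any worker that reaches its limit. Round-robin dispatch is a simplification; real pre-forked workers compete for connections on the shared socket.

```python
import random


def simulate(workers: int = 10, requests: int = 1000,
             min_reqs: int = 30, max_reqs: int = 90, seed: int = 1) -> list[int]:
    """Return the (0-based) request numbers at which a worker was recycled."""
    rng = random.Random(seed)
    # One [served, limit] pair per worker; each limit is drawn when the worker is forked.
    slots = [[0, rng.randint(min_reqs, max_reqs)] for _ in range(workers)]
    recycles = []
    for n in range(requests):
        slot = slots[n % workers]              # simplification: round-robin dispatch
        slot[0] += 1
        if slot[0] >= slot[1]:                 # limit reached: worker exits gracefully
            recycles.append(n)
            slot[:] = [0, rng.randint(min_reqs, max_reqs)]  # master forks a replacement
    return recycles


events = simulate()
print(f"{len(events)} recycles, first at requests {events[:5]}")
```

With min_reqs=max_reqs=60 and 600 requests, the ten recycles land on requests 590 to 599, one after another: that synchronized restart is what the randomized limit spreads out.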

Server::Starter (Zero-Downtime Restarts)

Purpose: Enable zero-downtime deployments

Mechanism:

Server::Starter binds to the port
Forks Starlet, which inherits the listening socket
On restart signal (HUP):
- Start a new Starlet process that inherits the same socket (no rebind)
- Once the new process is up, send the old one TERM (the default --signal-on-hup)
- Old process finishes in-flight requests
- Old process exits
- No dropped connections
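What makes the restart safe is that the listening socket is created once, by Server::Starter, and handed to each Starlet generation as an inherited file descriptor. Server::Starter advertises these in the SERVER_STARTER_PORT environment variable as port=fd pairs separated by ";". The Python sketch below shows the child side of that handshake: parse the variable and wrap each inherited descriptor in a socket object without calling bind(). It illustrates the mechanism and is not code from either project.

```python
import os
import socket
from collections.abc import Mapping
from typing import Optional


def parse_server_starter_port(value: str) -> dict[str, int]:
    """Parse "5000=3" or "127.0.0.1:5000=3;5001=4" into {address: fd}."""
    listeners = {}
    for pair in filter(None, value.split(";")):
        addr, fd = pair.rsplit("=", 1)
        listeners[addr] = int(fd)
    return listeners


def adopt_listeners(env: Optional[Mapping[str, str]] = None) -> dict[str, socket.socket]:
    """Wrap each inherited descriptor in a socket object; no bind() is needed."""
    env = os.environ if env is None else env
    value = env.get("SERVER_STARTER_PORT", "")
    # socket.socket(fileno=...) detects family and type from the descriptor itself.
    return {addr: socket.socket(fileno=fd)
            for addr, fd in parse_server_starter_port(value).items()}
```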

Command:

start_server \
  --port 5000 \
  --pid-file /var/run/musicbrainz.pid \
  --status-file /var/run/musicbrainz.status \
  -- \
  plackup -s Starlet --max-workers=10 app.psgi

Restart:

# Send HUP signal to trigger graceful restart
kill -HUP $(cat /var/run/musicbrainz.pid)

Status Check:

# Check server status
cat /var/run/musicbrainz.status
# Output: 1:1234 (GENERATION:PID, one line per running generation)

Reverse Proxy

Production Setup: Nginx reverse proxy in front of Starlet

Nginx Configuration:

upstream musicbrainz {
    server localhost:5000;
    keepalive 32;
}

server {
    listen 80;
    server_name musicbrainz.org;

    location / {
        proxy_pass http://musicbrainz;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    location /static/ {
        alias /var/www/musicbrainz/root/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
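Because Starlet only sees connections from Nginx, the application has to recover the client address from the X-Forwarded-For header built by $proxy_add_x_forwarded_for (which is what MUSICBRAINZ_USE_PROXY in the Compose file appears to enable). Nginx appends the address it saw to whatever value the client sent, so only entries added by trusted proxies can be believed. A minimal Python sketch of that rule, with the trusted-proxy set as an assumption:

```python
import ipaddress
from typing import Optional

# Assumption: Nginx runs on the same host, so only loopback peers are trusted proxies.
TRUSTED_PROXIES = [ipaddress.ip_network("127.0.0.0/8"), ipaddress.ip_network("::1/128")]


def is_trusted(addr: str) -> bool:
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False                          # e.g. "unknown" or a malformed entry
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(peer_addr: str, x_forwarded_for: Optional[str]) -> str:
    """Pick the client address, believing X-Forwarded-For only when a trusted proxy sent it."""
    if not x_forwarded_for or not is_trusted(peer_addr):
        return peer_addr                      # direct connection: ignore any client-supplied header
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Walk right to left: skip entries appended by trusted proxies; the first
    # untrusted entry is the address the outermost trusted proxy actually saw.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return hops[0] if hops else peer_addr     # every hop is a trusted proxy
```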

Benefits:

SSL termination
Static file serving
Gzip compression
Request buffering
Load balancing (multiple Starlet instances)
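With more than one Starlet instance listed in the upstream block, Nginx distributes requests round-robin by default (weighted, with every server at weight 1 unless configured otherwise). An unweighted picker in Python shows the rotation; the backend addresses are hypothetical.

```python
from itertools import cycle

# Hypothetical Starlet instances, e.g. two "server" lines in the upstream block.
BACKENDS = ["10.0.0.11:5000", "10.0.0.12:5000"]


class RoundRobin:
    """Hand out backends in a fixed rotation, one per request."""

    def __init__(self, backends: list[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._ring = cycle(backends)

    def pick(self) -> str:
        return next(self._ring)


lb = RoundRobin(BACKENDS)
print([lb.pick() for _ in range(4)])
```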

CI/CD Pipeline

GitHub Actions

Workflow File: .github/workflows/test.yml

Triggers:

Push to main branch
Pull requests
Manual workflow dispatch

Build Stage

Job: build-tests-image

Steps:

Checkout code
Set up Docker Buildx
Build test Docker image
Push to GitHub Container Registry
Cache layers for faster rebuilds

Dockerfile: docker/Dockerfile.test

Caching:

Perl dependencies cached by cpanfile.snapshot hash
Node dependencies cached by yarn.lock hash
Docker layer caching via GitHub Actions cache
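Keying a cache on a lockfile hash means the cached dependency directory is reused until cpanfile.snapshot or yarn.lock changes. A Python sketch of that key derivation follows; the prefixes such as "perl-deps" are made up for illustration (inside a GitHub Actions workflow the same idea is usually expressed with the hashFiles() expression in the cache key).

```python
import hashlib
from pathlib import Path


def cache_key(prefix: str, *lockfiles: Path) -> str:
    """Build a cache key that changes whenever any lockfile's contents change."""
    digest = hashlib.sha256()
    for path in sorted(lockfiles):              # stable order gives a stable key
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        digest.update(f"{path.name}:{file_hash}\n".encode())
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

For example, cache_key("perl-deps", Path("cpanfile.snapshot")) stays the same across runs until the snapshot is edited.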

Test Stages

Job: js-perl-and-pgtap

Matrix:

Perl 5.38.0 (stable)
Perl 5.42.0 (latest)

Steps:

Pull test image from registry
Start PostgreSQL container
Start Redis container
Initialize test database
Run Perl tests (prove -lr t/)
Run JavaScript tests (yarn test)
Run pgTAP tests (pg_prove -d musicbrainz_test t/pgtap/)
Upload coverage reports

Parallelization: Each matrix entry runs as a separate job, in parallel

Selenium Tests

Jobs: selenium-1, selenium-2, selenium-3, selenium-4

Partitioning: Tests split into 4 partitions for parallel execution

Steps:

Pull test image
Start PostgreSQL, Redis, Solr
Start Selenium standalone Chrome
Initialize test database with sample data
Start MusicBrainz server
Run Selenium tests for partition
Upload screenshots on failure

Partition Strategy:

# Partition 1: Artist and release tests
# Partition 2: Recording and work tests
# Partition 3: Edit and relationship tests
# Partition 4: Search and browse tests

Selenium Configuration:

# t/selenium.pl
use strict;
use warnings;
use Selenium::Remote::Driver;

my $driver = Selenium::Remote::Driver->new(
    remote_server_addr => 'localhost',
    port               => 4444,
    browser_name       => 'chrome',
    extra_capabilities => {
        # W3C-mode ChromeDriver expects the vendor-prefixed capability name
        'goog:chromeOptions' => {
            args => ['--headless', '--no-sandbox', '--disable-dev-shm-usage'],
        },
    },
);

Second-Tier Tests

Job: second-perl-and-pgtap

Purpose: Test against Perl 5.42.0 (latest stable)

Trigger: After main tests pass

Allowed to Fail: Yes (informational only)

Report Generation

Job: generate-reports

Steps:

Download coverage reports from all test jobs
Merge coverage data
Generate HTML coverage report
Upload to Codecov
Comment on PR with coverage summary
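Merging coverage from parallel jobs boils down to summing hit counts per file and line, so a line counts as covered if any job executed it. A Python sketch over a simplified {file: {line: hits}} structure (an assumption; Devel::Cover and nyc each have their own formats and merge tooling):

```python
from collections import defaultdict

Coverage = dict[str, dict[int, int]]  # file -> line number -> hit count


def merge(*reports: Coverage) -> Coverage:
    merged = defaultdict(lambda: defaultdict(int))
    for report in reports:
        for path, lines in report.items():
            for line, hits in lines.items():
                merged[path][line] += hits   # a line missed by one job may be hit by another
    return {path: dict(lines) for path, lines in merged.items()}


def percent_covered(report: Coverage) -> float:
    total = sum(len(lines) for lines in report.values())
    hit = sum(1 for lines in report.values() for h in lines.values() if h > 0)
    return 100.0 * hit / total if total else 0.0
```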

Coverage Tools:

Perl: Devel::Cover
JavaScript: Istanbul/nyc

Build Process

Step 1: Install Perl Dependencies

# Install Carton (Perl dependency manager)
cpanm --notest Carton

# Install dependencies from cpanfile.snapshot
carton install --deployment

Dependencies Installed:

Catalyst framework
Moose object system
DBD::Pg database driver
Template Toolkit
JSON::XS
XML::LibXML
Redis client
~200 total CPAN modules

Installation Time: ~10 minutes (first time), ~1 minute (cached)

Step 2: Install Node.js Dependencies

# Install Yarn (if not present)
npm install -g yarn

# Install dependencies from yarn.lock
yarn install --frozen-lockfile

Dependencies Installed:

React 19.2.4
Redux
Webpack 5
Babel 7
Jest (testing)
ESLint (linting)
~500 total npm packages

Installation Time: ~5 minutes (first time), ~30 seconds (cached)

Step 3: Compile Static Resources

# Compile CSS, images, fonts
./script/compile_resources.sh

Tasks:

Compile LESS to CSS
Optimize images (pngcrush, optipng)
Copy fonts to static directory
Generate CSS sprites
Minify CSS

Output: root/static/styles/, root/static/images/

Time: ~2 minutes

Step 4: Build JavaScript Bundles

# Build production bundles with Webpack
yarn run build

# Or for development (with source maps)
yarn run build:dev

Webpack Configuration:

Entry points: root/static/scripts/main.js, root/static/scripts/edit.js
Output: root/static/build/
Loaders: Babel (JSX, ES6+), CSS, file-loader
Plugins: UglifyJS, ExtractTextPlugin, DefinePlugin
Code splitting: Vendor bundle, async chunks

Output Files:

main.bundle.js - Main application code
vendor.bundle.js - Third-party libraries
edit.bundle.js - Edit interface code
*.chunk.js - Async-loaded chunks

Time: ~3 minutes (production), ~30 seconds (development)

Step 5: Initialize Database

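# Option 1: create the database and load the table definitions by hand
createdb musicbrainz_db
psql musicbrainz_db < admin/sql/CreateTables.sql

# Option 2: let InitDb.pl create the database, load the schema and import
# the data dumps (it creates the database itself, so don't combine with option 1)
./admin/InitDb.pl --createdb --import /path/to/mbdump*.tar.bz2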

Schema Loading:

375 tables created
500+ foreign keys added
Indexes created
Triggers installed

Initial Data:

Countries and areas
Languages
Relationship types
Instrument types
Genre definitions

Time: ~10 minutes (schema), ~30 minutes (sample data)

Step 6: Build Search Indexes

# Build Solr indexes for all entities
./admin/BuildSearchIndexes.pl --all

Indexes Built:

Artist index
Release index
Recording index
Work index
Label index
Area, event, place, series, instrument indexes

Time: ~2 hours (full production data), ~5 minutes (sample data)

System Requirements

Minimum Requirements (Development)

CPU: 2 cores
RAM: 4 GB
Disk: 20 GB
Database: PostgreSQL 16+
Cache: Redis 6.0+
Search: Solr 8.11+

Recommended Requirements (Production)

CPU: 8+ cores
RAM: 16+ GB
Disk: 500+ GB SSD

350 GB for PostgreSQL database
50 GB for Solr indexes
50 GB for backups
50 GB for logs and temp files

Database: PostgreSQL 16+ with:

shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 64MB
maintenance_work_mem = 1GB
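The two cache settings follow a common rule of thumb for a dedicated database host: about 25% of RAM for shared_buffers (the starting point the PostgreSQL documentation suggests) and about 75% for effective_cache_size, which is only a planner hint and allocates nothing. On 16 GB that gives exactly the 4GB and 12GB above. A Python helper for other machine sizes, treating both ratios as assumptions:

```python
def _fmt(mb: int) -> str:
    """Render megabytes in postgresql.conf units (e.g. 4GB, 512MB)."""
    return f"{mb // 1024}GB" if mb % 1024 == 0 else f"{mb}MB"


def postgres_memory_settings(ram_gb: float, shared_ratio: float = 0.25,
                             cache_ratio: float = 0.75) -> dict[str, str]:
    """Rule-of-thumb values for a host dedicated to PostgreSQL."""
    ram_mb = ram_gb * 1024
    return {
        "shared_buffers": _fmt(int(ram_mb * shared_ratio)),
        "effective_cache_size": _fmt(int(ram_mb * cache_ratio)),
    }


for name, value in postgres_memory_settings(16).items():
    print(f"{name} = {value}")
```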

Cache: Redis 6.0+ with (redis.conf syntax):

maxmemory 2gb
maxmemory-policy allkeys-lru
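With allkeys-lru, once Redis reaches maxmemory it evicts the keys it considers least recently used, from the whole keyspace rather than only keys with a TTL. Redis approximates LRU by sampling a few keys per eviction (maxmemory-samples); the exact LRU below, built on OrderedDict and capped by entry count instead of bytes, is only a model of the policy.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Exact LRU with an entry-count cap (Redis caps by memory and samples instead)."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._data = OrderedDict()           # oldest entry first, most recent last

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # a read makes the key most recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # evict the least recently used key
```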

Search: Solr 8.11+ with:

Java heap = 4GB
Solr cache = 512MB per core

Network Requirements

Bandwidth: 100 Mbps+ (for replication and API traffic)

Ports:

5000 - Website
5001 - Web service API
5432 - PostgreSQL
6379 - Redis
8983 - Solr

Firewall:

Allow inbound 80/443 (HTTP/HTTPS)
Allow outbound 80/443 (external APIs)
Restrict 5432, 6379, 8983 to localhost
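The localhost restriction can be verified against the host's listening sockets (for example as listed by ss -ltn). This matters for containers too, since ports published by Docker are opened through Docker's own iptables rules and can bypass host firewall tools such as ufw. A Python sketch that takes (address, port) pairs, however they were collected, and reports internal services bound to a non-loopback address; the sample data is invented:

```python
import ipaddress

INTERNAL_PORTS = {5432: "PostgreSQL", 6379: "Redis", 8983: "Solr"}


def exposed_services(listeners: list[tuple[str, int]]) -> list[str]:
    """Warn about every internal port bound to a non-loopback address.

    Addresses must be literal IPs; map a wildcard such as "*" to "0.0.0.0" first.
    """
    warnings = []
    for address, port in listeners:
        if port in INTERNAL_PORTS and not ipaddress.ip_address(address).is_loopback:
            warnings.append(f"{INTERNAL_PORTS[port]} listens on {address}:{port}")
    return warnings


sample = [("127.0.0.1", 5432), ("0.0.0.0", 6379), ("::1", 8983), ("0.0.0.0", 443)]
print(exposed_services(sample))
```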

Software Requirements

Operating System:

Ubuntu 24.04 LTS (Noble) - recommended
Debian 12 (Bookworm)
Any Linux with Perl 5.38+ and Node.js 20+

Perl: 5.38.0 or later (5.42.0 tested)

Node.js: 20.9.0 or later

PostgreSQL: 16.0 or later (16.3 recommended)

Redis: 6.0 or later (7.0 recommended)

Solr: 8.11 or later

Optional:

Docker 24.0+
Docker Compose 2.0+
Nginx 1.24+ (reverse proxy)
RabbitMQ 3.12+ (background jobs)

Deployment Strategies

Single Server

Use Case: Development, small mirrors

Architecture:

All services on one server
PostgreSQL, Redis, Solr, MusicBrainz on localhost
Nginx reverse proxy

Pros:

Simple setup
Low cost
Easy to manage

Cons:

Single point of failure
Limited scalability
Resource contention

Multi-Server

Use Case: Production, high-traffic mirrors

Architecture:

Web tier: 2+ servers running MusicBrainz (load balanced)
Database tier: PostgreSQL primary + replicas
Cache tier: Redis (possibly clustered)
Search tier: Solr (possibly sharded)

Pros:

High availability
Horizontal scalability
Better performance

Cons:

Complex setup
Higher cost
Requires load balancer

Docker Swarm / Kubernetes

Use Case: Large-scale deployments, cloud environments

Architecture:

Container orchestration
Auto-scaling
Service discovery
Health checks

Pros:

Automated deployment
Self-healing
Easy scaling

Cons:

Steep learning curve
Operational complexity
Resource overhead of the orchestration layer itself

Monitoring and Logging

Logging

Framework: Log::Dispatch

Log Levels:

DEBUG - Verbose debugging
INFO - Informational messages
WARN - Warnings
ERROR - Errors
FATAL - Fatal errors

Log Destinations:

STDOUT (development)
File (production): /var/log/musicbrainz/server.log
Syslog (optional)

Log Rotation:

Daily rotation
Keep 30 days
Compress old logs
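For comparison only, the same rotation policy (roll over daily, keep 30 files, gzip the old ones) expressed with Python's standard logging module looks like this; the Perl application would configure it through its own logging stack or logrotate. The file path comes from the section above, the format string is an assumption.

```python
import gzip
import logging
import os
import shutil
from logging.handlers import TimedRotatingFileHandler


def gzip_namer(default_name: str) -> str:
    """Name rotated files server.log.YYYY-MM-DD.gz instead of server.log.YYYY-MM-DD."""
    return default_name + ".gz"


def gzip_rotator(source: str, dest: str) -> None:
    """Compress the file being rotated out and delete the uncompressed original."""
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(source)


def make_handler(path: str = "/var/log/musicbrainz/server.log") -> TimedRotatingFileHandler:
    # Roll over at midnight and keep the 30 most recent rotated files.
    handler = TimedRotatingFileHandler(path, when="midnight", backupCount=30)
    handler.namer = gzip_namer
    handler.rotator = gzip_rotator
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    return handler
```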

Error Tracking

Platform: Sentry

Integration:

Server-side: Perl Sentry SDK
Client-side: JavaScript Sentry SDK

Captured:

Exceptions
Error messages
Stack traces
Request context
User context

Metrics

Current State: No Prometheus/metrics endpoint

Workaround: Parse logs for metrics

Future: Prometheus exporter planned
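Until an exporter exists, log parsing is the fallback. Assuming a line format of "timestamp LEVEL message" (hypothetical; adjust the pattern to whatever the configured formatter actually emits), counting entries per level takes a few lines of Python:

```python
import re
from collections import Counter
from collections.abc import Iterable

# Hypothetical line format: "2026-04-28 16:28:53 ERROR message text"
LINE = re.compile(r"^\S+ \S+ (DEBUG|INFO|WARN|ERROR|FATAL)\b")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log entries per level; lines that don't match the format are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open handle on /var/log/musicbrainz/server.log returns a Counter keyed by level, which a cron job could forward to a metrics store.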

Health Checks

Current State: No dedicated health check endpoint

Workaround: Check that / returns HTTP 200

Future: /health endpoint planned
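The current workaround can be scripted as a probe for Docker or a load balancer. A Python sketch using only the standard library; the default URL and the 5-second timeout are assumptions:

```python
import http.client
import urllib.request


def is_healthy(url: str = "http://localhost:5000/", timeout: float = 5.0) -> bool:
    """True if GET url answers with HTTP 200 (after any redirects) within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), refused connections and timeouts are all OSError
        return False
```

For a Docker HEALTHCHECK, a wrapper such as sys.exit(0 if is_healthy() else 1) turns the result into an exit status.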
