# MusicBrainz Server Deployment

## Docker Architecture

### Build System

**Template Engine:** M4 macros

**Base Image:** Ubuntu Noble (24.04 LTS)

**Dockerfile Location:** `docker/Dockerfile.template`

**Template Processing:**

```bash
# Generate Dockerfile from template
m4 docker/Dockerfile.template > docker/Dockerfile
```

**M4 Macros:**

- `INSTALL_PERL_DEPENDENCIES` - Install Perl modules via carton
- `INSTALL_NODE_DEPENDENCIES` - Install Node.js packages via yarn
- `COMPILE_RESOURCES` - Compile static assets
- `SETUP_DATABASE` - Initialize PostgreSQL schema

**Multi-Stage Build:**

1. Base stage - Install system dependencies
2. Build stage - Compile assets and dependencies
3. Runtime stage - Copy artifacts, minimal runtime

### Container Types

**website:**

- Main web application
- Serves HTML pages via Template Toolkit
- Handles user authentication and sessions
- Port: 5000

**webservice:**

- API endpoints (/ws/2/)
- JSON/XML serialization
- OAuth authentication
- Port: 5001

**tests:**

- Run test suites
- Perl unit tests
- JavaScript tests
- pgTAP database tests
- No exposed ports (ephemeral)

**cron:**

- Scheduled tasks
- Statistics calculation
- Data cleanup
- Replication packet export
- No exposed ports

**sitemaps:**

- Generate XML sitemaps
- Update search engine indexes
- Run daily
- No exposed ports

**json-dump:**

- Export database to JSON
- Generate data dumps for download
- Run weekly
- No exposed ports

**solr-backup:**

- Backup Solr indexes
- Run daily
- No exposed ports

**template-renderer:**

- Isolated Template Toolkit renderer
- Forked from main process
- Prevents template errors from crashing main app
- IPC via Unix socket

### Docker Compose

**File:** `docker-compose.yml`

**Services:**

```yaml
services:
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: musicbrainz
      POSTGRES_PASSWORD: musicbrainz
      POSTGRES_DB: musicbrainz_db
    ports:
      - "5432:5432"

  redis:
    image: redis:7
    volumes:
      - redisdata:/data
    ports:
      - "6379:6379"

  solr:
    image: solr:8.11
    volumes:
      - solrdata:/var/solr
    ports:
      - "8983:8983"

  website:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: website
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5000:5000"
    environment:
      MUSICBRAINZ_SERVER_PROCESSES: 10
      MUSICBRAINZ_USE_PROXY: 1

  webservice:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: webservice
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5001:5001"

volumes:
  pgdata:
  redisdata:
  solrdata:
```

### Image Layers

**Base Layer (Ubuntu Noble):**

- System packages (build-essential, libpq-dev, etc.)
- Perl 5.38
- Node.js 20
- PostgreSQL client libraries

**Dependency Layer:**

- Perl modules (via carton)
- Node.js packages (via yarn)
- Cached for faster rebuilds

**Application Layer:**

- Application code
- Compiled assets
- Configuration templates

**Runtime Layer:**

- Minimal runtime dependencies
- No build tools
- Smaller image size

## PSGI Server Configuration

### Starlet

**Server:** Starlet (high-performance PSGI server)

**Protocol:** HTTP/1.1

**Concurrency:** Pre-forking worker model

**Configuration:**

```bash
# Start Starlet with 10 workers via plackup
plackup -s Starlet \
  --max-workers 10 \
  --min-reqs-per-child 30 \
  --max-reqs-per-child 90 \
  --port 5000 \
  app.psgi
```

**Worker Settings:**

- **Workers:** 10 (configurable via `MUSICBRAINZ_SERVER_PROCESSES`)
- **Max Requests per Worker:** 30-90 (randomized per worker so that workers do not all restart at once)
- **Worker Timeout:** 300 seconds (5 minutes)
- **Keepalive:** Enabled (60 seconds)

**Worker Lifecycle:**

1. Master process forks 10 workers
2. Each worker handles requests until its request limit is reached
3. Worker exits gracefully
4. Master forks a new worker to replace it
5. This recycling keeps memory leaks from accumulating

### Server::Starter (Zero-Downtime Restarts)

**Purpose:** Enable zero-downtime deployments

**Mechanism:**

1. Server::Starter binds to the port
2. Forks Starlet with the inherited socket
3. On restart signal (HUP):
   - Start new Starlet process
   - New process inherits the same listening socket
   - Old process finishes existing requests
   - Old process exits
   - No dropped connections

**Command:**

```bash
start_server \
  --port 5000 \
  --pid-file /var/run/musicbrainz.pid \
  --status-file /var/run/musicbrainz.status \
  -- \
  plackup -s Starlet --max-workers 10 app.psgi
```

**Restart:**

```bash
# Send HUP signal to trigger graceful restart
kill -HUP $(cat /var/run/musicbrainz.pid)
```

**Status Check:**

```bash
# Check server status
cat /var/run/musicbrainz.status
# Output: one GENERATION:PID line per running server, e.g. 1:1234
```

### Reverse Proxy

**Production Setup:** Nginx reverse proxy in front of Starlet

**Nginx Configuration:**

```nginx
upstream musicbrainz {
    server localhost:5000;
    keepalive 32;
}

server {
    listen 80;
    server_name musicbrainz.org;

    location / {
        proxy_pass http://musicbrainz;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    location /static/ {
        alias /var/www/musicbrainz/root/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
```

**Benefits:**

- SSL termination
- Static file serving
- Gzip compression
- Request buffering
- Load balancing (multiple Starlet instances)

## CI/CD Pipeline

### GitHub Actions

**Workflow File:** `.github/workflows/test.yml`

**Triggers:**

- Push to main branch
- Pull requests
- Manual workflow dispatch

### Build Stage

**Job:** `build-tests-image`

**Steps:**

1. Checkout code
2. Set up Docker Buildx
3. Build test Docker image
4. Push to GitHub Container Registry
5. Cache layers for faster rebuilds

**Dockerfile:** `docker/Dockerfile.test`

**Caching:**

- Perl dependencies cached by cpanfile.snapshot hash
- Node dependencies cached by yarn.lock hash
- Docker layer caching via GitHub Actions cache

### Test Stages

**Job:** `js-perl-and-pgtap`

**Matrix:**

- Perl 5.38.0 (stable)
- Perl 5.42.0 (latest)

**Steps:**

1. Pull test image from registry
2. Start PostgreSQL container
3. Start Redis container
4. Initialize test database
5. Run Perl tests (`prove -lr t/`)
6. Run JavaScript tests (`yarn test`)
7. Run pgTAP tests (`pg_prove -d musicbrainz_test t/pgtap/`)
8. Upload coverage reports

**Parallelization:** Tests run in parallel across matrix

### Selenium Tests

**Jobs:** `selenium-1`, `selenium-2`, `selenium-3`, `selenium-4`

**Partitioning:** Tests split into 4 partitions for parallel execution

**Steps:**

1. Pull test image
2. Start PostgreSQL, Redis, Solr
3. Start Selenium standalone Chrome
4. Initialize test database with sample data
5. Start MusicBrainz server
6. Run Selenium tests for partition
7. Upload screenshots on failure

**Partition Strategy:**

```bash
# Partition 1: Artist and release tests
# Partition 2: Recording and work tests
# Partition 3: Edit and relationship tests
# Partition 4: Search and browse tests
```

**Selenium Configuration:**

```perl
# t/selenium.pl
use strict;
use warnings;
use Selenium::Remote::Driver;

# Connect to the Selenium standalone Chrome server started by the CI job
my $driver = Selenium::Remote::Driver->new(
    remote_server_addr => 'localhost',
    port               => 4444,
    browser_name       => 'chrome',
    extra_capabilities => {
        chromeOptions => {
            args => ['--headless', '--no-sandbox', '--disable-dev-shm-usage'],
        },
    },
);
```

### Second-Tier Tests

**Job:** `second-perl-and-pgtap`

**Purpose:** Test against Perl 5.42.0 (latest stable)

**Trigger:** After main tests pass

**Allowed to Fail:** Yes (informational only)

### Report Generation

**Job:** `generate-reports`

**Steps:**

1. Download coverage reports from all test jobs
2. Merge coverage data
3. Generate HTML coverage report
4. Upload to Codecov
5. Comment on PR with coverage summary

**Coverage Tools:**

- Perl: Devel::Cover
- JavaScript: Istanbul/nyc

## Build Process

### Step 1: Install Perl Dependencies

```bash
# Install Carton (Perl dependency manager)
cpanm --notest Carton

# Install dependencies from cpanfile.snapshot
carton install --deployment
```

**Dependencies Installed:**

- Catalyst framework
- Moose object system
- DBD::Pg database driver
- Template::Toolkit
- JSON::XS
- XML::LibXML
- Redis client
- ~200 total CPAN modules

**Installation Time:** ~10 minutes (first time), ~1 minute (cached)

### Step 2: Install Node.js Dependencies

```bash
# Install Yarn (if not present)
npm install -g yarn

# Install dependencies from yarn.lock
yarn install --frozen-lockfile
```

**Dependencies Installed:**

- React 19.2.4
- Redux
- Webpack 5
- Babel 7
- Jest (testing)
- ESLint (linting)
- ~500 total npm packages

**Installation Time:** ~5 minutes (first time), ~30 seconds (cached)

### Step 3: Compile Static Resources

```bash
# Compile CSS, images, fonts
./script/compile_resources.sh
```

**Tasks:**

- Compile LESS to CSS
- Optimize images (pngcrush, optipng)
- Copy fonts to static directory
- Generate CSS sprites
- Minify CSS

**Output:** `root/static/styles/`, `root/static/images/`

**Time:** ~2 minutes

### Step 4: Build JavaScript Bundles

```bash
# Build production bundles with Webpack
yarn run build

# Or for development (with source maps)
yarn run build:dev
```

**Webpack Configuration:**

- Entry points: `root/static/scripts/main.js`, `root/static/scripts/edit.js`
- Output: `root/static/build/`
- Loaders: Babel (JSX, ES6+), CSS, file-loader
- Plugins: UglifyJS, ExtractTextPlugin, DefinePlugin
- Code splitting: Vendor bundle, async chunks

**Output Files:**

- `main.bundle.js` - Main application code
- `vendor.bundle.js` - Third-party libraries
- `edit.bundle.js` - Edit interface code
- `*.chunk.js` - Async-loaded chunks

**Time:** ~3 minutes (production), ~30 seconds (development)

### Step 5: Initialize Database

```bash
# Create database
createdb musicbrainz_db

# Load schema
psql musicbrainz_db < admin/sql/CreateTables.sql

# Load initial data
./admin/InitDb.pl --createdb --import
```

**Schema Loading:**

- 375 tables created
- 500+ foreign keys added
- Indexes created
- Triggers installed

**Initial Data:**

- Countries and areas
- Languages
- Relationship types
- Instrument types
- Genre definitions

**Time:** ~10 minutes (schema), ~30 minutes (sample data)

### Step 6: Build Search Indexes

```bash
# Build Solr indexes for all entities
./admin/BuildSearchIndexes.pl --all
```

**Indexes Built:**

- Artist index
- Release index
- Recording index
- Work index
- Label index
- Area, event, place, series, instrument indexes

**Time:** ~2 hours (full production data), ~5 minutes (sample data)

## System Requirements

### Minimum Requirements (Development)

**CPU:** 2 cores

**RAM:** 4 GB

**Disk:** 20 GB

**Database:** PostgreSQL 16+

**Cache:** Redis 6.0+

**Search:** Solr 8.11+

### Recommended Requirements (Production)

**CPU:** 8+ cores

**RAM:** 16+ GB

**Disk:** 500+ GB SSD

- 350 GB for PostgreSQL database
- 50 GB for Solr indexes
- 50 GB for backups
- 50 GB for logs and temp files

**Database:** PostgreSQL 16+ with:

- shared_buffers = 4GB
- effective_cache_size = 12GB
- work_mem = 64MB
- maintenance_work_mem = 1GB

**Cache:** Redis 6.0+ with:

- maxmemory = 2GB
- maxmemory-policy = allkeys-lru

**Search:** Solr 8.11+ with:

- Java heap = 4GB
- Solr cache = 512MB per core

### Network Requirements

**Bandwidth:** 100 Mbps+ (for replication and API traffic)

**Ports:**

- 5000 - Website
- 5001 - Web service API
- 5432 - PostgreSQL
- 6379 - Redis
- 8983 - Solr

**Firewall:**

- Allow inbound 80/443 (HTTP/HTTPS)
- Allow outbound 80/443 (external APIs)
- Restrict 5432, 6379, 8983 to localhost

### Software Requirements

**Operating System:**

- Ubuntu 24.04 LTS (Noble) - recommended
- Debian 12 (Bookworm)
- Any Linux with Perl 5.38+ and Node.js 20+

**Perl:** 5.38.0 or later (5.42.0 tested)

**Node.js:** 20.9.0 or later

**PostgreSQL:** 16.0 or later (16.3 recommended)

**Redis:** 6.0 or later (7.0 recommended)

**Solr:** 8.11 or later

**Optional:**

- Docker 24.0+
- Docker Compose 2.0+
- Nginx 1.24+ (reverse proxy)
- RabbitMQ 3.12+ (background jobs)

## Deployment Strategies

### Single Server

**Use Case:** Development, small mirrors

**Architecture:**

- All services on one server
- PostgreSQL, Redis, Solr, MusicBrainz on localhost
- Nginx reverse proxy

**Pros:**

- Simple setup
- Low cost
- Easy to manage

**Cons:**

- Single point of failure
- Limited scalability
- Resource contention

### Multi-Server

**Use Case:** Production, high-traffic mirrors

**Architecture:**

- Web tier: 2+ servers running MusicBrainz (load balanced)
- Database tier: PostgreSQL primary + replicas
- Cache tier: Redis (possibly clustered)
- Search tier: Solr (possibly sharded)

**Pros:**

- High availability
- Horizontal scalability
- Better performance

**Cons:**

- Complex setup
- Higher cost
- Requires load balancer

### Docker Swarm / Kubernetes

**Use Case:** Large-scale deployments, cloud environments

**Architecture:**

- Container orchestration
- Auto-scaling
- Service discovery
- Health checks

**Pros:**

- Automated deployment
- Self-healing
- Easy scaling

**Cons:**

- Steep learning curve
- Operational complexity
- Overhead

## Monitoring and Logging

### Logging

**Framework:** Log::Dispatch

**Log Levels:**

- DEBUG - Verbose debugging
- INFO - Informational messages
- WARN - Warnings
- ERROR - Errors
- FATAL - Fatal errors

**Log Destinations:**

- STDOUT (development)
- File (production): `/var/log/musicbrainz/server.log`
- Syslog (optional)

**Log Rotation:**

- Daily rotation
- Keep 30 days
- Compress old logs

### Error Tracking

**Platform:** Sentry

**Integration:**

- Server-side: Perl Sentry SDK
- Client-side: JavaScript Sentry SDK

**Captured:**

- Exceptions
- Error messages
- Stack traces
- Request context
- User context

### Metrics

**Current State:** No Prometheus/metrics endpoint

**Workaround:** Parse logs for metrics

**Future:** Prometheus exporter planned

### Health Checks

**Current State:** No dedicated health check endpoint

**Workaround:** Check `/` returns 200

**Future:** `/health` endpoint planned
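
**Example Probe:** Until a dedicated endpoint exists, the workaround of checking that `/` returns 200 can be scripted. The following is a minimal sketch, assuming `curl` is installed and the website listens on port 5000 as listed under Network Requirements. The function name `check_health`, the default URL, and the 10-second timeout are illustrative choices, not part of MusicBrainz Server.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MusicBrainz Server):
# reports "healthy" only when a GET on the given URL returns HTTP 200.

check_health() {
    # Assumed default: the website container on localhost:5000
    local url="${1:-http://localhost:5000/}"
    local status

    # -s: silent, -o /dev/null: discard the body,
    # -w '%{http_code}': print only the status code,
    # --max-time 10: give up after 10 seconds.
    # If curl itself fails (e.g. connection refused), report status 000.
    status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")" || status="000"

    if [ "$status" = "200" ]; then
        echo "healthy"
        return 0
    fi

    echo "unhealthy (HTTP $status)"
    return 1
}
```

Calling `check_health` with no arguments probes the assumed default URL; a different URL can be passed as the first argument. The exit status (0 when healthy, 1 otherwise) is suitable for a Docker `HEALTHCHECK` or a monitoring cron job.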