feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
@@ -0,0 +1,707 @@
# MusicBrainz Server Deployment

## Docker Architecture

### Build System

**Template Engine:** M4 macros
**Base Image:** Ubuntu Noble (24.04 LTS)
**Dockerfile Location:** `docker/Dockerfile.template`

**Template Processing:**
```bash
# Generate Dockerfile from template
m4 docker/Dockerfile.template > docker/Dockerfile
```

**M4 Macros:**
- `INSTALL_PERL_DEPENDENCIES` - Install Perl modules via carton
- `INSTALL_NODE_DEPENDENCIES` - Install Node.js packages via yarn
- `COMPILE_RESOURCES` - Compile static assets
- `SETUP_DATABASE` - Initialize PostgreSQL schema

**Multi-Stage Build:**
1. Base stage - Install system dependencies
2. Build stage - Compile assets and dependencies
3. Runtime stage - Copy artifacts, minimal runtime

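To illustrate how a macro such as `INSTALL_PERL_DEPENDENCIES` could expand into Dockerfile instructions, the sketch below defines a stand-in macro inline and runs it through `m4`. The macro body, the `FROM` line, and the helper names `demo_template` and `render_template` are assumptions made for the example; they are not taken from the real `docker/Dockerfile.template`.

```bash
#!/usr/bin/env bash
# Sketch: expand a hypothetical M4 macro into a Dockerfile instruction.

render_template() {
    # m4 reads the template on stdin and writes the expanded text to stdout.
    m4
}

demo_template() {
    # The quoted heredoc keeps the M4 quote characters (` and ') literal.
    cat <<'EOF'
define(`INSTALL_PERL_DEPENDENCIES', `RUN carton install --deployment')dnl
FROM ubuntu:noble
INSTALL_PERL_DEPENDENCIES
EOF
}

# Only run the demo when m4 is installed.
if command -v m4 >/dev/null 2>&1; then
    demo_template | render_template
fi
```

The `dnl` after the definition discards the rest of that line, so the definition itself leaves no blank line in the generated Dockerfile.
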
### Container Types

**website:**
- Main web application
- Serves HTML pages via Template Toolkit
- Handles user authentication and sessions
- Port: 5000

**webservice:**
- API endpoints (/ws/2/)
- JSON/XML serialization
- OAuth authentication
- Port: 5001

**tests:**
- Run test suites
- Perl unit tests
- JavaScript tests
- pgTAP database tests
- No exposed ports (ephemeral)

**cron:**
- Scheduled tasks
- Statistics calculation
- Data cleanup
- Replication packet export
- No exposed ports

**sitemaps:**
- Generate XML sitemaps
- Update search engine indexes
- Run daily
- No exposed ports

**json-dump:**
- Export database to JSON
- Generate data dumps for download
- Run weekly
- No exposed ports

**solr-backup:**
- Backup Solr indexes
- Run daily
- No exposed ports

**template-renderer:**
- Isolated Template Toolkit renderer
- Forked from main process
- Prevents template errors from crashing main app
- IPC via Unix socket

### Docker Compose

**File:** `docker-compose.yml`

**Services:**
```yaml
services:
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: musicbrainz
      POSTGRES_PASSWORD: musicbrainz
      POSTGRES_DB: musicbrainz_db
    ports:
      - "5432:5432"

  redis:
    image: redis:7
    volumes:
      - redisdata:/data
    ports:
      - "6379:6379"

  solr:
    image: solr:8.11
    volumes:
      - solrdata:/var/solr
    ports:
      - "8983:8983"

  website:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: website
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5000:5000"
    environment:
      MUSICBRAINZ_SERVER_PROCESSES: 10
      MUSICBRAINZ_USE_PROXY: 1

  webservice:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: webservice
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5001:5001"

volumes:
  pgdata:
  redisdata:
  solrdata:
```

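The short `depends_on` form used above only controls start order; it does not wait until a dependency accepts connections. One way to bridge that gap is a small wait loop in the container entrypoint. The sketch below is an assumption about how this could be done: `wait_for_port` is a hypothetical helper, and it relies on Bash's `/dev/tcp` redirection rather than any tool shipped with MusicBrainz.

```bash
#!/usr/bin/env bash
# Sketch: block until a TCP port accepts connections, or give up after N attempts.
wait_for_port() {
    local host="$1" port="$2" attempts="${3:-30}" delay="${4:-1}"
    local i
    for ((i = 1; i <= attempts; i++)); do
        # Bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT.
        # The subshell closes the descriptor again as soon as it exits.
        if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Example (not executed here), using the service names from the compose file:
#   wait_for_port db 5432 && wait_for_port redis 6379 && wait_for_port solr 8983
```

An open port shows only that the process is listening. PostgreSQL or Solr may still be initializing, so a stricter check would run a real query against each service.
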
### Image Layers

**Base Layer (Ubuntu Noble):**
- System packages (build-essential, libpq-dev, etc.)
- Perl 5.38
- Node.js 20
- PostgreSQL client libraries

**Dependency Layer:**
- Perl modules (via carton)
- Node.js packages (via yarn)
- Cached for faster rebuilds

**Application Layer:**
- Application code
- Compiled assets
- Configuration templates

**Runtime Layer:**
- Minimal runtime dependencies
- No build tools
- Smaller image size

## PSGI Server Configuration

### Starlet

**Server:** Starlet (high-performance PSGI server)
**Protocol:** HTTP/1.1
**Concurrency:** Pre-forking worker model

**Configuration:**
```bash
# Start Starlet with 10 workers; each worker exits after a random 30-90 requests
plackup -s Starlet \
  --max-workers=10 \
  --min-reqs-per-child=30 \
  --max-reqs-per-child=90 \
  --port=5000 \
  app.psgi
```

**Worker Settings:**
- **Workers:** 10 (configurable via `MUSICBRAINZ_SERVER_PROCESSES`)
- **Max Requests per Worker:** 30-90 (randomized per worker so that workers do not all restart at the same moment)
- **Worker Timeout:** 300 seconds (5 minutes)
- **Keepalive:** Enabled (60 seconds)

**Worker Lifecycle:**
1. Master process forks 10 workers
2. Each worker handles requests until its request limit is reached
3. Worker exits gracefully
4. Master forks new worker to replace it
5. Periodic replacement keeps memory leaks from accumulating

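The effect of a randomized request limit can be illustrated with a few lines of shell. This is only a sketch of the idea: `pick_max_requests` is a hypothetical helper, and the server performs the equivalent choice internally rather than through a script like this.

```bash
#!/usr/bin/env bash
# Sketch: pick a per-worker request limit from the inclusive range [min, max].
pick_max_requests() {
    local min="$1" max="$2"
    # RANDOM yields 0..32767; the modulo maps it onto the requested range.
    echo $(( min + RANDOM % (max - min + 1) ))
}

# Each of the ten workers draws its own limit, so they retire at different times.
for worker in $(seq 1 10); do
    echo "worker ${worker}: exits after $(pick_max_requests 30 90) requests"
done
```

With a single fixed limit, workers started together would also tend to exit together, and the master would briefly have to re-fork most of the pool at once.
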
### Server::Starter (Zero-Downtime Restarts)

**Purpose:** Enable zero-downtime deployments

**Mechanism:**
1. Server::Starter binds to port
2. Forks Starlet with inherited socket
3. On restart signal (HUP):
   - Start new Starlet process
   - New process inherits the same listening socket
   - Old process finishes existing requests
   - Old process exits
   - No dropped connections

**Command:**
```bash
start_server \
  --port 5000 \
  --pid-file /var/run/musicbrainz.pid \
  --status-file /var/run/musicbrainz.status \
  -- \
  plackup -s Starlet --max-workers=10 app.psgi
```

**Restart:**
```bash
# Send HUP signal to trigger graceful restart
kill -HUP $(cat /var/run/musicbrainz.pid)
```

**Status Check:**
```bash
# Check server status
cat /var/run/musicbrainz.status
# Output: one line per running generation, e.g. 1:1234 (GENERATION:PID)
```

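The restart command above fails with an unhelpful error if the PID file is missing or stale. A slightly more defensive wrapper might look like the following sketch; `graceful_restart` is a hypothetical helper name, and the PID file path is the one used in this document.

```bash
#!/usr/bin/env bash
# Sketch: send HUP to the start_server supervisor named in a PID file.
graceful_restart() {
    local pid_file="$1" pid
    if [ ! -r "$pid_file" ]; then
        echo "PID file not readable: ${pid_file}" >&2
        return 1
    fi
    pid="$(cat "$pid_file")"
    # kill -0 delivers no signal; it only checks that the process exists.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no running process with PID ${pid}" >&2
        return 1
    fi
    kill -HUP "$pid"
}

# Example (not executed here):
#   graceful_restart /var/run/musicbrainz.pid
```
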
### Reverse Proxy

**Production Setup:** Nginx reverse proxy in front of Starlet

**Nginx Configuration:**
```nginx
upstream musicbrainz {
    server localhost:5000;
    keepalive 32;
}

server {
    listen 80;
    server_name musicbrainz.org;

    location / {
        proxy_pass http://musicbrainz;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    location /static/ {
        alias /var/www/musicbrainz/root/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
```

**Benefits:**
- SSL termination
- Static file serving
- Gzip compression
- Request buffering
- Load balancing (multiple Starlet instances)

## CI/CD Pipeline

### GitHub Actions

**Workflow File:** `.github/workflows/test.yml`

**Triggers:**
- Push to main branch
- Pull requests
- Manual workflow dispatch

### Build Stage

**Job:** `build-tests-image`

**Steps:**
1. Checkout code
2. Set up Docker Buildx
3. Build test Docker image
4. Push to GitHub Container Registry
5. Cache layers for faster rebuilds

**Dockerfile:** `docker/Dockerfile.test`

**Caching:**
- Perl dependencies cached by cpanfile.snapshot hash
- Node dependencies cached by yarn.lock hash
- Docker layer caching via GitHub Actions cache

### Test Stages

**Job:** `js-perl-and-pgtap`

**Matrix:**
- Perl 5.38.0 (stable)
- Perl 5.42.0 (latest)

**Steps:**
1. Pull test image from registry
2. Start PostgreSQL container
3. Start Redis container
4. Initialize test database
5. Run Perl tests (`prove -lr t/`)
6. Run JavaScript tests (`yarn test`)
7. Run pgTAP tests (`pg_prove -d musicbrainz_test t/pgtap/`)
8. Upload coverage reports

**Parallelization:** Tests run in parallel across matrix

### Selenium Tests

**Jobs:** `selenium-1`, `selenium-2`, `selenium-3`, `selenium-4`

**Partitioning:** Tests split into 4 partitions for parallel execution

**Steps:**
1. Pull test image
2. Start PostgreSQL, Redis, Solr
3. Start Selenium standalone Chrome
4. Initialize test database with sample data
5. Start MusicBrainz server
6. Run Selenium tests for partition
7. Upload screenshots on failure

**Partition Strategy:**
```bash
# Partition 1: Artist and release tests
# Partition 2: Recording and work tests
# Partition 3: Edit and relationship tests
# Partition 4: Search and browse tests
```

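The partitions above are grouped by topic. If the split were instead computed mechanically, a round-robin assignment over the sorted file list is one simple option. The sketch below assumes that approach; `partition_tests` and the file names are hypothetical and not taken from the workflow.

```bash
#!/usr/bin/env bash
# Sketch: print the test files that belong to one of N partitions.
# Files are sorted first so that every runner computes the same split.
partition_tests() {
    local index="$1" total="$2"   # index is 1-based
    shift 2
    printf '%s\n' "$@" | sort |
        awk -v i="$index" -v n="$total" '(NR - 1) % n == i - 1'
}

# Partition 2 of 4 over six hypothetical test files.
partition_tests 2 4 artist.t release.t recording.t work.t edit.t search.t
```

Round-robin keeps partition sizes within one file of each other, but it ignores how long each test takes, so a topic-based or duration-based split may balance wall-clock time better.
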
**Selenium Configuration:**
```perl
# t/selenium.pl
use Selenium::Remote::Driver;

my $driver = Selenium::Remote::Driver->new(
    remote_server_addr => 'localhost',
    port               => 4444,
    browser_name       => 'chrome',
    extra_capabilities => {
        chromeOptions => {
            args => ['--headless', '--no-sandbox', '--disable-dev-shm-usage'],
        },
    },
);
```

### Second-Tier Tests

**Job:** `second-perl-and-pgtap`

**Purpose:** Test against Perl 5.42.0 (latest stable)

**Trigger:** After main tests pass

**Allowed to Fail:** Yes (informational only)

### Report Generation

**Job:** `generate-reports`

**Steps:**
1. Download coverage reports from all test jobs
2. Merge coverage data
3. Generate HTML coverage report
4. Upload to Codecov
5. Comment on PR with coverage summary

**Coverage Tools:**
- Perl: Devel::Cover
- JavaScript: Istanbul/nyc

## Build Process

### Step 1: Install Perl Dependencies

```bash
# Install Carton (Perl dependency manager)
cpanm --notest Carton

# Install dependencies from cpanfile.snapshot
carton install --deployment
```

**Dependencies Installed:**
- Catalyst framework
- Moose object system
- DBD::Pg database driver
- Template::Toolkit
- JSON::XS
- XML::LibXML
- Redis client
- ~200 total CPAN modules

**Installation Time:** ~10 minutes (first time), ~1 minute (cached)

### Step 2: Install Node.js Dependencies

```bash
# Install Yarn (if not present)
npm install -g yarn

# Install dependencies from yarn.lock
yarn install --frozen-lockfile
```

**Dependencies Installed:**
- React 19.2.4
- Redux
- Webpack 5
- Babel 7
- Jest (testing)
- ESLint (linting)
- ~500 total npm packages

**Installation Time:** ~5 minutes (first time), ~30 seconds (cached)

### Step 3: Compile Static Resources

```bash
# Compile CSS, images, fonts
./script/compile_resources.sh
```

**Tasks:**
- Compile LESS to CSS
- Optimize images (pngcrush, optipng)
- Copy fonts to static directory
- Generate CSS sprites
- Minify CSS

**Output:** `root/static/styles/`, `root/static/images/`

**Time:** ~2 minutes

### Step 4: Build JavaScript Bundles

```bash
# Build production bundles with Webpack
yarn run build

# Or for development (with source maps)
yarn run build:dev
```

**Webpack Configuration:**
- Entry points: `root/static/scripts/main.js`, `root/static/scripts/edit.js`
- Output: `root/static/build/`
- Loaders: Babel (JSX, ES6+), CSS, file-loader
- Plugins: UglifyJS, ExtractTextPlugin, DefinePlugin
- Code splitting: Vendor bundle, async chunks

**Output Files:**
- `main.bundle.js` - Main application code
- `vendor.bundle.js` - Third-party libraries
- `edit.bundle.js` - Edit interface code
- `*.chunk.js` - Async-loaded chunks

**Time:** ~3 minutes (production), ~30 seconds (development)

### Step 5: Initialize Database

```bash
# Create database
createdb musicbrainz_db

# Load schema
psql musicbrainz_db < admin/sql/CreateTables.sql

# Load initial data
./admin/InitDb.pl --createdb --import
```

**Schema Loading:**
- 375 tables created
- 500+ foreign keys added
- Indexes created
- Triggers installed

**Initial Data:**
- Countries and areas
- Languages
- Relationship types
- Instrument types
- Genre definitions

**Time:** ~10 minutes (schema), ~30 minutes (sample data)

### Step 6: Build Search Indexes

```bash
# Build Solr indexes for all entities
./admin/BuildSearchIndexes.pl --all
```

**Indexes Built:**
- Artist index
- Release index
- Recording index
- Work index
- Label index
- Area, event, place, series, instrument indexes

**Time:** ~2 hours (full production data), ~5 minutes (sample data)

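Because each step depends on the previous one, it can help to chain the first four steps in a wrapper that stops at the first failure and reports how long each step took. The following is a minimal sketch under that assumption; `run_step` and `build_all` are hypothetical helpers, while the commands they wrap are the ones listed in Steps 1 to 4.

```bash
#!/usr/bin/env bash
# Sketch: run named build steps in order and stop at the first failure.
run_step() {
    local name="$1"
    shift
    local start=$SECONDS
    echo "==> ${name}"
    if ! "$@"; then
        echo "FAILED: ${name}" >&2
        return 1
    fi
    echo "<== ${name} finished in $((SECONDS - start))s"
}

build_all() {
    # The && chain skips every later step once one step fails.
    run_step "Perl dependencies"    carton install --deployment &&
    run_step "Node.js dependencies" yarn install --frozen-lockfile &&
    run_step "Static resources"     ./script/compile_resources.sh &&
    run_step "JavaScript bundles"   yarn run build
}

# Example (not executed here):
#   build_all
```

The database and search index steps are left out of `build_all` because they need running PostgreSQL and Solr instances and take far longer than the other steps.
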
## System Requirements

### Minimum Requirements (Development)

**CPU:** 2 cores
**RAM:** 4 GB
**Disk:** 20 GB
**Database:** PostgreSQL 16+
**Cache:** Redis 6.0+
**Search:** Solr 8.11+

### Recommended Requirements (Production)

**CPU:** 8+ cores
**RAM:** 16+ GB
**Disk:** 500+ GB SSD
- 350 GB for PostgreSQL database
- 50 GB for Solr indexes
- 50 GB for backups
- 50 GB for logs and temp files

**Database:** PostgreSQL 16+ with:
- shared_buffers = 4GB
- effective_cache_size = 12GB
- work_mem = 64MB
- maintenance_work_mem = 1GB

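The `shared_buffers` and `effective_cache_size` values above match a common rule of thumb of roughly 25% and 75% of RAM on a 16 GB server. The sketch below applies that rule to other RAM sizes; the ratios are an assumption inferred from the listed values, and `pg_memory_settings` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: derive two PostgreSQL memory settings from total RAM in whole GB.
# Integer division is used, so RAM sizes that are multiples of 4 GB work best.
pg_memory_settings() {
    local ram_gb="$1"
    echo "shared_buffers = $(( ram_gb / 4 ))GB"
    echo "effective_cache_size = $(( ram_gb * 3 / 4 ))GB"
}

pg_memory_settings 16
```

For 16 GB this prints `shared_buffers = 4GB` and `effective_cache_size = 12GB`. Treat the results as starting points to tune against the actual workload.
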
**Cache:** Redis 6.0+ with:
- maxmemory = 2GB
- maxmemory-policy = allkeys-lru

**Search:** Solr 8.11+ with:
- Java heap = 4GB
- Solr cache = 512MB per core

### Network Requirements

**Bandwidth:** 100 Mbps+ (for replication and API traffic)

**Ports:**
- 5000 - Website
- 5001 - Web service API
- 5432 - PostgreSQL
- 6379 - Redis
- 8983 - Solr

**Firewall:**
- Allow inbound 80/443 (HTTP/HTTPS)
- Allow outbound 80/443 (external APIs)
- Restrict 5432, 6379, 8983 to localhost

### Software Requirements

**Operating System:**
- Ubuntu 24.04 LTS (Noble) - recommended
- Debian 12 (Bookworm)
- Any Linux with Perl 5.38+ and Node.js 20+

**Perl:** 5.38.0 or later (5.42.0 tested)

**Node.js:** 20.9.0 or later

**PostgreSQL:** 16.0 or later (16.3 recommended)

**Redis:** 6.0 or later (7.0 recommended)

**Solr:** 8.11 or later

**Optional:**
- Docker 24.0+
- Docker Compose 2.0+
- Nginx 1.24+ (reverse proxy)
- RabbitMQ 3.12+ (background jobs)

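Minimum versions like these are easy to check in a preflight script. The sketch below compares dotted version strings with `sort -V`, which is available in GNU coreutils; `version_ge` and `check_tool` are hypothetical helpers, and the installed versions are passed in as arguments rather than detected.

```bash
#!/usr/bin/env bash
# Sketch: succeed when version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings; if the smaller one is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_tool() {
    local name="$1" found="$2" required="$3"
    if version_ge "$found" "$required"; then
        echo "OK      ${name} ${found} (>= ${required})"
    else
        echo "TOO OLD ${name} ${found} (need >= ${required})"
        return 1
    fi
}

check_tool Perl 5.42.0 5.38.0
check_tool Node.js 20.9.0 20.9.0
```

A plain string comparison would rank `20.10.0` below `20.9.0`, which is why a version-aware sort is used here.
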
## Deployment Strategies

### Single Server

**Use Case:** Development, small mirrors

**Architecture:**
- All services on one server
- PostgreSQL, Redis, Solr, MusicBrainz on localhost
- Nginx reverse proxy

**Pros:**
- Simple setup
- Low cost
- Easy to manage

**Cons:**
- Single point of failure
- Limited scalability
- Resource contention

### Multi-Server

**Use Case:** Production, high-traffic mirrors

**Architecture:**
- Web tier: 2+ servers running MusicBrainz (load balanced)
- Database tier: PostgreSQL primary + replicas
- Cache tier: Redis (possibly clustered)
- Search tier: Solr (possibly sharded)

**Pros:**
- High availability
- Horizontal scalability
- Better performance

**Cons:**
- Complex setup
- Higher cost
- Requires load balancer

### Docker Swarm / Kubernetes

**Use Case:** Large-scale deployments, cloud environments

**Architecture:**
- Container orchestration
- Auto-scaling
- Service discovery
- Health checks

**Pros:**
- Automated deployment
- Self-healing
- Easy scaling

**Cons:**
- Steep learning curve
- Operational complexity
- Overhead

## Monitoring and Logging

### Logging

**Framework:** Log::Dispatch

**Log Levels:**
- DEBUG - Verbose debugging
- INFO - Informational messages
- WARN - Warnings
- ERROR - Errors
- FATAL - Fatal errors

**Log Destinations:**
- STDOUT (development)
- File (production): `/var/log/musicbrainz/server.log`
- Syslog (optional)

**Log Rotation:**
- Daily rotation
- Keep 30 days
- Compress old logs

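The rotation policy above maps directly onto a `logrotate` rule. The sketch below prints such a rule from a shell function; `logrotate_rule` is a hypothetical helper, and the `delaycompress`, `missingok` and `notifempty` directives are assumptions added for robustness rather than part of the stated policy.

```bash
#!/usr/bin/env bash
# Sketch: emit a logrotate rule for daily rotation, 30 days kept, compressed.
logrotate_rule() {
    local log_file="${1:-/var/log/musicbrainz/server.log}"
    cat <<EOF
${log_file} {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
}
EOF
}

logrotate_rule

# To install (not executed here):
#   logrotate_rule | sudo tee /etc/logrotate.d/musicbrainz
```

A long-running server that keeps its log file open may also need `copytruncate` or a `postrotate` hook so that it starts writing to the new file.
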
### Error Tracking

**Platform:** Sentry

**Integration:**
- Server-side: Perl Sentry SDK
- Client-side: JavaScript Sentry SDK

**Captured:**
- Exceptions
- Error messages
- Stack traces
- Request context
- User context

### Metrics

**Current State:** No Prometheus/metrics endpoint

**Workaround:** Parse logs for metrics

**Future:** Prometheus exporter planned

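As a rough illustration of the log-parsing workaround, the sketch below counts log lines per level with `awk`. The line layout (`<date> <time> <LEVEL> <message>`) is an assumption made for the example, since the real log format is not described here, and `count_by_level` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: count log lines per level. Assumes the level is the third
# whitespace-separated field; adjust the field number to the real format.
count_by_level() {
    awk '{ count[$3]++ } END { for (level in count) print level, count[level] }' "$@" |
        sort
}

# Reads stdin when no file is given; a real run would pass the log path instead.
count_by_level <<'EOF'
2024-01-01 12:00:00 INFO request served
2024-01-01 12:00:01 ERROR database timeout
2024-01-01 12:00:02 INFO request served
EOF
```

For the sample input this prints `ERROR 1` and `INFO 2`. The same counts could be scraped periodically and fed into whatever monitoring system is in use.
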
### Health Checks

**Current State:** No dedicated health check endpoint

**Workaround:** Check `/` returns 200

**Future:** `/health` endpoint planned

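The workaround of probing `/` can be scripted with `curl`. The following is a minimal sketch: the default URL is an assumption based on the website port listed in this document, and `http_status`, `is_healthy` and `check_health` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: treat the site as healthy when GET / answers with HTTP 200.
http_status() {
    # -s hides progress output, -o discards the body, -w prints the status code.
    curl -s -o /dev/null --max-time "${2:-5}" -w '%{http_code}' "$1"
}

is_healthy() {
    [ "$1" = "200" ]
}

check_health() {
    local url="${1:-http://localhost:5000/}" status
    # If curl itself fails (connection refused, timeout), report status 000.
    status="$(http_status "$url")" || status="000"
    if is_healthy "$status"; then
        echo "healthy (${status})"
    else
        echo "unhealthy (${status})"
        return 1
    fi
}

# Example (not executed here):
#   check_health http://localhost:5000/
```

Rendering the full home page is a heavier probe than a dedicated endpoint would be, so a generous interval between checks is advisable.
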