feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
Alexander
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
@@ -0,0 +1,271 @@
# MusicBrainz Server Overview
## Project Identity
**Name:** MusicBrainz Server
**Repository:** https://github.com/metabrainz/musicbrainz-server
**License:** GPL-2.0+
**Description:** Open music encyclopedia that collects music metadata and makes it available to the public. Community-maintained database of music information including artists, releases, recordings, works, labels, and the relationships between them.
## Technology Stack
### Backend
**Primary Language:** Perl 5.38+
**Web Framework:** Catalyst (MVC framework)
**Object System:** Moose (modern Perl OOP)
**Core Perl Dependencies:**
- Catalyst::Runtime - Web application framework
- Moose - Modern object system for Perl
- DBD::Pg - PostgreSQL database driver
- Template::Toolkit - Template processing system
- Plack - PSGI toolkit and server adapters
- Redis - Perl Redis client
- JSON::XS - Fast JSON encoding/decoding
- XML::LibXML - XML processing
- DBIx::Connector - Fast, safe DBI connection management
- Readonly - Facility for creating read-only scalars, arrays, hashes
- Digest::SHA - SHA message digest algorithm
- LWP::UserAgent - HTTP client
- DateTime - Date and time object
- List::AllUtils - List manipulation utilities
- Try::Tiny - Minimal try/catch
- Class::Load - Load modules by name
- namespace::autoclean - Keep imports out of namespace
### Frontend
**Primary Language:** JavaScript (ES6+)
**UI Framework:** React 19.2.4
**State Management:** Redux
**Legacy Framework:** Knockout.js (still present in some views)
**Core JavaScript Dependencies:**
- React 19.2.4 - UI component library
- Redux - State management
- Webpack 5 - Module bundler
- Babel 7 - JavaScript compiler
- knockout - Legacy MVVM framework
- jQuery - DOM manipulation (legacy)
- lodash - Utility library
- immutable - Immutable data structures
- weight-balanced-tree - Efficient tree data structure
### Infrastructure
**Database:** PostgreSQL 16+
- 375 tables
- 500+ foreign key constraints
- Full-text search capabilities
- Custom replication via dbmirror2
**Cache:** Redis
- 16 separate databases
- Entity caching
- Session storage
- Pub/sub messaging
**Search:** Apache Solr
- Primary search engine
- PostgreSQL full-text as fallback
**Message Queue:** RabbitMQ (for background jobs)
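The primary/fallback search arrangement listed above can be illustrated with a minimal sketch. It is not MusicBrainz code: `search_with_fallback`, `solr_down`, and `postgres_fulltext` are hypothetical names, the catalog is made-up sample data, and a raised `ConnectionError` is assumed to be the signal that the primary engine is unavailable.

```python
from typing import Callable, List

SearchFn = Callable[[str], List[str]]


def search_with_fallback(query: str, primary: SearchFn, fallback: SearchFn) -> List[str]:
    """Try the primary backend; use the fallback only if the primary is unreachable."""
    try:
        return primary(query)
    except ConnectionError:
        # Primary engine (Solr in the list above) is down: degrade to the fallback.
        return fallback(query)


def solr_down(query: str) -> List[str]:
    # Hypothetical stand-in for a Solr client whose server cannot be reached.
    raise ConnectionError("solr unavailable")


def postgres_fulltext(query: str) -> List[str]:
    # Hypothetical stand-in for a PostgreSQL full-text query over sample titles.
    catalog = ["Abbey Road", "Road to Ruin", "Kid A"]
    return [title for title in catalog if query.lower() in title.lower()]


results = search_with_fallback("road", solr_down, postgres_fulltext)
```

Passing the two backends in as plain callables keeps the fallback rule testable without a running Solr or PostgreSQL instance.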
## System Prerequisites
**Required:**
- Perl 5.38+ (5.42.0 tested in CI)
- Node.js 20.9+
- PostgreSQL 16+
- Redis 6.0+
- Apache Solr 8.11+
**Optional:**
- Docker + Docker Compose (for containerized deployment)
- RabbitMQ (for background job processing)
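As a rough illustration of how the required minimums could be checked before installation, the sketch below compares reported version strings against the list above. The tool keys, `parse_version`, and `unmet_requirements` are hypothetical helpers, and the installed versions in the usage lines are invented sample input, not measurements.

```python
from typing import Dict, Tuple

# Minimum versions copied from the "Required" list above.
MINIMUM_VERSIONS: Dict[str, Tuple[int, ...]] = {
    "perl": (5, 38),
    "node": (20, 9),
    "postgresql": (16,),
    "redis": (6, 0),
    "solr": (8, 11),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '5.42.0' or 'v20.11.1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def unmet_requirements(installed: Dict[str, str]) -> Dict[str, str]:
    """Return {tool: reason} for every tool that is missing or too old."""
    problems = {}
    for tool, minimum in MINIMUM_VERSIONS.items():
        found = installed.get(tool)
        if found is None:
            problems[tool] = "not installed"
        elif parse_version(found) < minimum:
            needed = ".".join(str(part) for part in minimum)
            problems[tool] = f"found {found}, need {needed}+"
    return problems


# Sample input only: Node.js is too old and Solr is absent.
installed = {"perl": "5.42.0", "node": "v18.19.0", "postgresql": "16.2", "redis": "7.2.4"}
problems = unmet_requirements(installed)
```

Tuple comparison handles versions of different lengths, so `16.2` satisfies a minimum of `16`.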
## Entry Point
**File:** `app.psgi`
**Initialization Flow:**
1. `app.psgi` loads the Plack middleware stack
2. Initializes `MusicBrainz::Server` Catalyst application
3. Loads configuration from `DBDefs.pm`
4. Establishes database connections via `DBIx::Connector`
5. Initializes Redis connection pool
6. Forks template renderer process for isolation
7. Loads Catalyst controllers, models, and views
8. Mounts PSGI application
**Middleware Stack:**
- Plack::Middleware::ReverseProxy - Handle X-Forwarded headers
- Plack::Middleware::Static - Serve static files
- Plack::Middleware::Session - Session management
- Custom middleware for CSRF protection
- Custom middleware for request logging
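PSGI middleware wraps an application in layers, much as WSGI middleware does in Python. The sketch below is a Python analogy of that layering, not the Plack implementation: `reverse_proxy_middleware`, `logging_middleware`, and `hello_app` are hypothetical, and trusting the first `X-Forwarded-For` entry is a simplifying assumption that a real deployment would need to review.

```python
def reverse_proxy_middleware(app):
    """Wrap a WSGI app so it sees the address and scheme forwarded by a proxy."""
    def wrapped(environ, start_response):
        environ = dict(environ)  # work on a copy so the caller's dict is untouched
        forwarded_for = environ.get("HTTP_X_FORWARDED_FOR")
        if forwarded_for:
            # Assumption: the first entry is the original client address.
            environ["REMOTE_ADDR"] = forwarded_for.split(",")[0].strip()
        forwarded_proto = environ.get("HTTP_X_FORWARDED_PROTO")
        if forwarded_proto:
            environ["wsgi.url_scheme"] = forwarded_proto
        return app(environ, start_response)
    return wrapped


def logging_middleware(app, log):
    """Record one line per request before passing it on."""
    def wrapped(environ, start_response):
        log.append(
            f"{environ['REQUEST_METHOD']} {environ['PATH_INFO']} from {environ['REMOTE_ADDR']}"
        )
        return app(environ, start_response)
    return wrapped


def hello_app(environ, start_response):
    """Innermost application: echo the scheme and client address it was given."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"{environ['wsgi.url_scheme']}://{environ['REMOTE_ADDR']}".encode()]


request_log = []
# The outermost layer runs first, so the logger sees the rewritten address.
stack = reverse_proxy_middleware(logging_middleware(hello_app, request_log))
```

The order of wrapping matters: placing the logger outside the proxy layer would record the proxy's address rather than the client's.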
## Codebase Scale
**Perl:**
- 1,866 Perl files
- 53 controllers (13,000 lines)
- 106 Data modules (26,000 lines)
- 132 entity classes
- 43 form modules
- 4 view modules
**JavaScript:**
- 1,447 JavaScript files
- React components
- Redux reducers and actions
- Legacy Knockout view models
**Database:**
- 375 tables
- 332 migration files
- 4,068 lines in CreateTables.sql
**Tests:**
- Perl unit tests (t/)
- JavaScript tests (Jest)
- pgTAP database tests
- Selenium integration tests (4 partitions)
## Build Process
### Perl Dependencies
```bash
# Install Carton (Perl dependency manager)
cpanm Carton
# Install the Perl dependencies declared in cpanfile,
# using the versions pinned in cpanfile.snapshot when it is present
carton install
```
### JavaScript Dependencies
```bash
# Install Node.js dependencies
yarn install
```
### Asset Compilation
```bash
# Compile static resources (CSS, images, fonts)
./script/compile_resources.sh
# Build JavaScript bundles with Webpack
yarn run build
```
**Build Outputs:**
- `root/static/build/` - Compiled JavaScript bundles
- `root/static/styles/` - Compiled CSS
- `root/static/images/` - Optimized images
## Run Commands
### Development
```bash
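# Using plackup (development server); -r restarts the server when
# files under lib/ or the directory containing app.psgi change
plackup -Ilib -r app.psgi
# Also watch additional paths (here lib and root) with -R
plackup -Ilib -R lib,root -r app.psgi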
```
### Production
```bash
# Using Starman (production PSGI server)
starman --workers 10 --listen :5000 app.psgi
# Using Server::Starter for zero-downtime restarts
start_server --port 5000 -- starman --workers 10 app.psgi
```
### Docker
```bash
# Build Docker images
docker-compose build
# Start all services
docker-compose up -d
# Start specific service
docker-compose up -d website
```
**Available Services:**
- `website` - Main web application
- `webservice` - API service
- `cron` - Scheduled tasks
- `sitemaps` - Sitemap generation
- `json-dump` - JSON data dumps
- `solr-backup` - Solr index backup
- `tests` - Test runner
## Directory Structure
```
musicbrainz-server/
├── admin/                      # Database schema and migrations
│   └── sql/
│       ├── CreateTables.sql
│       └── updates/            # 332 migration files
├── lib/                        # Perl application code
│   └── MusicBrainz/
│       └── Server/
│           ├── Controller/     # 53 controllers
│           ├── Data/           # 106 data access modules
│           ├── Entity/         # 132 entity classes
│           ├── Form/           # 43 form handlers
│           ├── View/           # 4 view modules
│           ├── WebService/     # API implementation
│           └── Edit/           # Edit system
├── root/                       # Frontend assets
│   ├── static/                 # Static files
│   │   ├── scripts/            # JavaScript source
│   │   ├── styles/             # CSS/LESS
│   │   └── images/
│   └── layout.tt               # Main template
├── t/                          # Perl tests
├── docker/                     # Docker configuration
├── script/                     # Utility scripts
├── app.psgi                    # PSGI entry point
├── cpanfile                    # Perl dependencies
├── package.json                # Node.js dependencies
└── webpack.config.js           # Webpack configuration
```
## Configuration
**Primary Config:** `lib/DBDefs.pm`
**Two-Tier System:**
1. `lib/DBDefs/Default.pm` - Default values
2. `lib/DBDefs.pm` - Instance-specific overrides (not in git)
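The precedence rule of the two tiers (instance values win, defaults fill the gaps) can be sketched as a recursive dictionary merge. This is only an illustration of the rule: the real files are Perl modules, and every key and value below (`database`, `solr_url`, `db.internal`, and so on) is a hypothetical placeholder rather than an actual DBDefs setting.

```python
from typing import Any, Dict

# Hypothetical defaults, standing in for lib/DBDefs/Default.pm.
DEFAULTS: Dict[str, Any] = {
    "database": {"host": "localhost", "port": 5432, "name": "musicbrainz_db"},
    "solr_url": "http://localhost:8983/solr",
    "features": {"oauth2": False},
}

# Hypothetical instance overrides, standing in for lib/DBDefs.pm.
INSTANCE_OVERRIDES: Dict[str, Any] = {
    "database": {"host": "db.internal"},
    "features": {"oauth2": True},
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which override values take precedence over defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


config = merge_config(DEFAULTS, INSTANCE_OVERRIDES)
```

Only the overridden keys change; here `config["database"]` keeps the default port and name while taking the instance-specific host.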
**Key Configuration Areas:**
- Database connection strings
- Redis connection parameters
- Solr endpoints
- External service credentials (Cover Art Archive, Wikipedia, etc.)
- Session settings
- Email configuration
- OAuth2 settings
- Feature flags
## Status
**Active Development:** Continuous development since 2001 (20+ years)
**Production Status:** Stable, serving millions of requests daily
**Community:** Large open-source community with hundreds of contributors
**Data Quality:** Community-driven editing, with changes reviewed through a voting system
**API Usage:** Powers metadata for major music services and applications worldwide