feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
Alexander
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
@@ -0,0 +1,416 @@
# MusicBrainz Server API
## Base Endpoint
`/ws/2/{entity}/{mbid}`
**Version:** 2 (current stable)
**Protocol:** HTTPS (HTTP redirects to HTTPS)
**Base URL:** `https://musicbrainz.org/ws/2/`
## Endpoint Reference
### Core Entities (13)
| Entity | Endpoint | Description |
|--------|----------|-------------|
| artist | `/ws/2/artist/{mbid}` | Artists, bands, orchestras, choirs, characters |
| release | `/ws/2/release/{mbid}` | Physical or digital release of recordings |
| recording | `/ws/2/recording/{mbid}` | Unique audio recording |
| release-group | `/ws/2/release-group/{mbid}` | Logical grouping of releases |
| work | `/ws/2/work/{mbid}` | Musical composition or song |
| label | `/ws/2/label/{mbid}` | Record label or imprint |
| area | `/ws/2/area/{mbid}` | Geographic region (country, city, etc.) |
| event | `/ws/2/event/{mbid}` | Concert, festival, or other music event |
| place | `/ws/2/place/{mbid}` | Venue, studio, or other location |
| series | `/ws/2/series/{mbid}` | Ordered sequence of entities |
| instrument | `/ws/2/instrument/{mbid}` | Musical instrument |
| genre | `/ws/2/genre/{mbid}` | Music genre |
| url | `/ws/2/url/{mbid}` | External URL relationship |
### Identifier Lookups (3)
| Lookup | Endpoint | Description |
|--------|----------|-------------|
| discid | `/ws/2/discid/{discid}` | CD table of contents lookup |
| isrc | `/ws/2/isrc/{isrc}` | International Standard Recording Code |
| iswc | `/ws/2/iswc/{iswc}` | International Standard Musical Work Code |
### User Data Endpoints
| Endpoint | Methods | Description |
|----------|---------|-------------|
| `/ws/2/collection` | GET, POST, PUT, DELETE | User collections |
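| `/ws/2/tag` | POST | Submit user tags (read them back on a lookup with `inc=user-tags`) |
| `/ws/2/rating` | POST | Submit user ratings, 0-100 (read them back on a lookup with `inc=user-ratings`) |
| `/ws/2/annotation` | GET | Search annotations (a lookup returns the latest one with `inc=annotation`) |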
## HTTP Methods
### GET - Lookup
Retrieve a single entity by MBID:
```
GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da
```
### GET - Browse
Browse entities related to another entity:
```
GET /ws/2/release?artist=5b11f4ce-a62d-471e-81fc-a69a8278c7da
```
### GET - Search
Search entities using Lucene query syntax. The query is shown unencoded for readability; percent-encode it in real requests:
```
GET /ws/2/artist?query=artist:nirvana AND country:US
```
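The query above contains spaces and colons, so a client has to encode it before sending. A minimal sketch using only the Python standard library; `search_url` is a hypothetical helper introduced here for illustration, not part of any MusicBrainz client:

```python
from urllib.parse import urlencode

BASE_URL = "https://musicbrainz.org/ws/2/"

def search_url(entity, lucene_query, fmt="json"):
    """Build a search URL with the Lucene query safely encoded."""
    # urlencode turns spaces into '+' and ':' into '%3A'
    params = urlencode({"query": lucene_query, "fmt": fmt})
    return f"{BASE_URL}{entity}?{params}"

url = search_url("artist", "artist:nirvana AND country:US")
print(url)
```

The server decodes the value back into the original Lucene expression, so the query itself stays readable in application code.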
### POST - Submit
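Submit new data (requires authentication; the request body must be XML):
```
POST /ws/2/recording/?client={client_id}
Content-Type: application/xml; charset=utf-8
<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <recording-list>
    <recording id="{mbid}">
      <isrc-list count="1">
        <isrc id="USRC17607839" />
      </isrc-list>
    </recording>
  </recording-list>
</metadata>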
```
### PUT - Add to Collection
Add entities to a collection (semicolon-separated MBIDs):
```
PUT /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2};{mbid3}
```
### DELETE - Remove from Collection
Remove entities from a collection:
```
DELETE /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2}
```
## Query Parameters
### Format Parameter
**Parameter:** `fmt`
**Values:** `xml`, `json`
**Default:** `xml`
```
/ws/2/artist/{mbid}?fmt=json
```
### Include Parameters (inc)
Control which related data to include in the response. Multiple values separated by `+`.
**Common Includes (all entities):**
- `aliases` - Alternative names
- `annotation` - Latest annotation
- `tags` - Folksonomy tags
- `user-tags` - Tags submitted by authenticated user
- `genres` - Genre tags
- `user-genres` - Genres submitted by authenticated user
- `ratings` - Average rating
- `user-ratings` - Rating submitted by authenticated user
**Entity-Specific Includes:**
**Artist:**
- `recordings` - Recordings by this artist
- `releases` - Releases by this artist
- `release-groups` - Release groups by this artist
- `works` - Works by this artist
- `artist-rels` - Relationships to other artists
- `label-rels` - Relationships to labels
- `recording-rels` - Relationships to recordings
- `release-rels` - Relationships to releases
- `release-group-rels` - Relationships to release groups
- `url-rels` - Relationships to URLs
- `work-rels` - Relationships to works
**Release:**
- `artist-credits` - Artist credits for the release
- `labels` - Labels for the release
- `recordings` - Recordings on the release
- `release-groups` - Release group for this release
- `media` - Media (discs) in the release
- `discids` - Disc IDs associated with the release
- `isrcs` - ISRCs for recordings on the release
**Recording:**
- `artist-credits` - Artist credits for the recording
- `releases` - Releases containing this recording
- `isrcs` - ISRCs for this recording
- `work-rels` - Works this recording is a performance of
**Release Group:**
- `artist-credits` - Artist credits for the release group
- `releases` - Releases in this group
**Work:**
- `artist-rels` - Artists related to this work (composers, lyricists)
- `recording-rels` - Recordings of this work
**Example:**
```
/ws/2/release/{mbid}?inc=artist-credits+labels+recordings+media
```
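A lookup URL with several includes can be assembled as in the sketch below. `lookup_url` is a hypothetical helper; it joins the includes with a space and lets `urlencode` turn that into the `+` separator described above.

```python
from urllib.parse import urlencode

def lookup_url(entity, mbid, includes=(), fmt="json"):
    """Build a lookup URL; `includes` is an iterable of inc values."""
    params = {"fmt": fmt}
    if includes:
        # A space in the value is encoded as '+', the inc separator
        params["inc"] = " ".join(includes)
    return f"https://musicbrainz.org/ws/2/{entity}/{mbid}?{urlencode(params)}"

release_url = lookup_url(
    "release",
    "{mbid}",  # placeholder, as in the example above
    includes=["artist-credits", "labels", "recordings", "media"],
)
```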
### Browse Parameters
Browse entities related to another entity:
**Parameters:**
- `artist={mbid}` - Browse by artist
- `release={mbid}` - Browse by release
- `release-group={mbid}` - Browse by release group
- `recording={mbid}` - Browse by recording
- `work={mbid}` - Browse by work
- `label={mbid}` - Browse by label
- `area={mbid}` - Browse by area
- `collection={mbid}` - Browse by collection
- `track_artist={mbid}` - Browse by track artist
**Example:**
```
/ws/2/recording?artist=5b11f4ce-a62d-471e-81fc-a69a8278c7da&limit=100
```
### Pagination Parameters
**Parameters:**
- `limit` - Number of results (max 100, default 25)
- `offset` - Starting offset (default 0)
**Example:**
```
/ws/2/artist?query=nirvana&limit=100&offset=100
```
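Because a page holds at most 100 items, a client walks a long result set by advancing `offset`. The sketch below shows one way to do that. It assumes the JSON response carries the items under a list key and the total under a count key (for example `releases` and `release-count`); these key names are assumptions, and `fetch_page` is a hypothetical callable standing in for the actual HTTP request.

```python
from typing import Callable, Iterator

def paginate(fetch_page: Callable[[int, int], dict],
             list_key: str,
             count_key: str,
             limit: int = 100) -> Iterator[dict]:
    """Yield every item of a paged result, one page per fetch_page(limit, offset) call."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page.get(list_key, [])
        yield from items
        offset += len(items)
        # Stop on an empty page or once the reported total is reached
        if not items or offset >= page.get(count_key, 0):
            break

# Stand-in for the network call: 250 fake releases served in slices
calls = []
def fake_fetch(limit, offset):
    calls.append((limit, offset))
    data = [{"n": i} for i in range(250)]
    return {"releases": data[offset:offset + limit], "release-count": 250}

all_releases = list(paginate(fake_fetch, "releases", "release-count"))
```

In a real client each `fetch_page` call should also pass through the rate limiter, since every page is a separate request.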
### Search Parameter
**Parameter:** `query`
**Syntax:** Lucene query syntax
**Example:**
```
/ws/2/artist?query=artist:nirvana AND country:US AND type:group
```
## Response Formats
### XML Format
**Namespace:** `http://musicbrainz.org/ns/mmd-2.0#`
```xml
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
<artist id="5b11f4ce-a62d-471e-81fc-a69a8278c7da" type="Group">
<name>Nirvana</name>
<sort-name>Nirvana</sort-name>
<country>US</country>
<life-span>
<begin>1987</begin>
<end>1994-04-05</end>
<ended>true</ended>
</life-span>
</artist>
</metadata>
```
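Every element in the XML response lives in the `mmd-2.0#` namespace, so unqualified lookups such as `find("artist")` return nothing. A minimal parsing sketch with the Python standard library follows; the `mmd` prefix and the `parse_artist` helper are choices made for this example only.

```python
import xml.etree.ElementTree as ET

# Namespace from the response above; the "mmd" prefix is arbitrary
NS = {"mmd": "http://musicbrainz.org/ns/mmd-2.0#"}

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <artist id="5b11f4ce-a62d-471e-81fc-a69a8278c7da" type="Group">
    <name>Nirvana</name>
    <sort-name>Nirvana</sort-name>
    <country>US</country>
    <life-span>
      <begin>1987</begin>
      <end>1994-04-05</end>
      <ended>true</ended>
    </life-span>
  </artist>
</metadata>"""

def parse_artist(xml_bytes):
    """Extract a few artist fields from an MMD XML document."""
    root = ET.fromstring(xml_bytes)
    artist = root.find("mmd:artist", NS)
    return {
        "id": artist.get("id"),      # attributes are not namespaced
        "type": artist.get("type"),
        "name": artist.findtext("mmd:name", namespaces=NS),
        "country": artist.findtext("mmd:country", namespaces=NS),
        "ended": artist.findtext("mmd:life-span/mmd:ended", namespaces=NS) == "true",
    }

artist = parse_artist(SAMPLE)
```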
### JSON Format
```json
{
"id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
"type": "Group",
"name": "Nirvana",
"sort-name": "Nirvana",
"country": "US",
"life-span": {
"begin": "1987",
"end": "1994-04-05",
"ended": true
}
}
```
## Authentication
### OAuth2 Bearer Token
**Primary authentication method for user-specific operations.**
**Header:**
```
Authorization: Bearer {access_token}
```
**Token Endpoint:** `https://musicbrainz.org/oauth2/token`
**Authorization Endpoint:** `https://musicbrainz.org/oauth2/authorize`
**Grant Types:**
- Authorization Code (with PKCE)
- Refresh Token
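The first leg of the Authorization Code flow with PKCE can be sketched as follows. The parameter names follow the generic OAuth 2.0 and PKCE specifications (RFC 6749, RFC 7636) and are assumed to apply here unchanged; the client ID, redirect URI, and helper names are placeholders introduced for this example.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

AUTHORIZE_URL = "https://musicbrainz.org/oauth2/authorize"

def make_pkce_pair():
    """Return (code_verifier, code_challenge) using the S256 method."""
    verifier = secrets.token_urlsafe(32)  # 32 random bytes, 43 URL-safe characters
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Base64url without '=' padding, as PKCE requires
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge

def authorization_url(client_id, redirect_uri, scopes, challenge, state):
    """Build the URL the user is sent to in order to grant access."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"

verifier, challenge = make_pkce_pair()
auth_url = authorization_url(
    "my-client-id", "http://localhost:8080/callback",
    ["tag", "rating"], challenge, state=secrets.token_urlsafe(16),
)
```

The verifier stays on the client and is sent later to the token endpoint together with the returned authorization code.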
### HTTP Digest Authentication
**Legacy authentication method, still supported.**
**Header:**
```
Authorization: Digest username="user", realm="musicbrainz.org", ...
```
## OAuth Scopes
| Scope | Description |
|-------|-------------|
| `profile` | Read user profile information |
| `email` | Read user email address |
| `tag` | Submit and modify tags |
| `rating` | Submit and modify ratings |
| `collection` | Create and modify collections |
| `submit_barcode` | Submit barcodes to releases |
| `submit_isrc` | Submit ISRCs to recordings |
## Rate Limiting
**Limits:**
- Maximum 100 items per page
- 1 request per second (recommended)
- Client identification required for POST requests
**Client Identification:**
All POST requests must include a `client` parameter:
```
POST /ws/2/recording/?client=MyApp-1.0
```
**Format:** `{application_name}-{version}`
**Rate Limit Headers:**
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1609459200
```
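A client can stay within the recommended one request per second by spacing its calls. The sketch below is one simple way to do that, not the only one; `Throttle` is a hypothetical class, and the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import time

class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

# Demonstration with a fake clock instead of real time
fake_now = [0.0]
slept = []

def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds

throttle = Throttle(1.0, clock=lambda: fake_now[0], sleep=fake_sleep)
throttle.wait()        # first call passes immediately
fake_now[0] += 0.25    # 0.25 s of work between requests
throttle.wait()        # waits out the remaining 0.75 s
```

Call `throttle.wait()` immediately before each HTTP request; with the default arguments it uses the real monotonic clock.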
## CORS Support
**Enabled:** Yes
**Allowed Origins:** `*`
**Allowed Methods:** GET, POST, PUT, DELETE
**Allowed Headers:** Authorization, Content-Type
## Error Codes
| Code | Description |
|------|-------------|
| 400 | Bad Request - Invalid parameters or malformed request |
| 401 | Unauthorized - Authentication required |
| 403 | Forbidden - Insufficient permissions |
| 404 | Not Found - Entity does not exist |
| 405 | Method Not Allowed - HTTP method not supported for this endpoint |
| 406 | Not Acceptable - Requested format not available |
| 415 | Unsupported Media Type - Invalid Content-Type |
| 501 | Not Implemented - Feature not yet implemented |
| 503 | Service Unavailable - Server overloaded, under maintenance, or rate limit exceeded |
**Error Response (JSON):**
```json
{
"error": "Not Found",
"help": "For usage, please see: https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2"
}
```
## Example Requests
### Lookup Artist with Releases
```
GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases+release-groups&fmt=json
```
### Search for Recordings
```
GET /ws/2/recording?query=recording:"Smells Like Teen Spirit" AND artist:nirvana&fmt=json
```
### Browse Releases by Artist
```
GET /ws/2/release?artist=5b11f4ce-a62d-471e-81fc-a69a8278c7da&limit=100&offset=0&fmt=json
```
### Submit ISRC
```
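POST /ws/2/recording/?client=MyApp-1.0
Authorization: Bearer {token}
Content-Type: application/xml; charset=utf-8
<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <recording-list>
    <recording id="{recording_mbid}">
      <isrc-list count="1">
        <isrc id="USRC17607839" />
      </isrc-list>
    </recording>
  </recording-list>
</metadata>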
```
### Add Releases to Collection
```
PUT /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2};{mbid3}
Authorization: Bearer {token}
```
## Collection Management
Collections allow users to organize entities (releases, artists, etc.).
**List User Collections:**
```
GET /ws/2/collection?fmt=json
Authorization: Bearer {token}
```
**Get Collection Contents:**
```
GET /ws/2/collection/{collection_mbid}/releases?fmt=json
```
**Add to Collection (semicolon-separated MBIDs):**
```
PUT /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2};{mbid3}
```
**Remove from Collection:**
```
DELETE /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2}
```
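The collection paths above can be built programmatically. The sketch below validates each MBID as a UUID and joins them with semicolons. Both helpers are hypothetical, and the batch size passed to `chunked` is an arbitrary choice for the example, not a documented limit.

```python
import uuid

def collection_path(collection_mbid, entity_plural, mbids):
    """Return the path for a PUT or DELETE on a collection."""
    # uuid.UUID raises ValueError for strings it cannot parse as a UUID
    for value in (collection_mbid, *mbids):
        uuid.UUID(value)
    return f"/ws/2/collection/{collection_mbid}/{entity_plural}/{';'.join(mbids)}"

def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

collection = "11111111-2222-3333-4444-555555555555"  # made-up example value
releases = [
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",           # made-up example values
    "99999999-8888-7777-6666-555555555555",
]
path = collection_path(collection, "releases", releases)
```

Batching keeps the URL length bounded when a collection receives many entities at once.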
## Best Practices
1. **Always include a User-Agent header** identifying your application, its version, and a contact URL or email
2. **Respect rate limits** - 1 request per second recommended
3. **Use client parameter** for all POST requests
4. **Cache responses** when appropriate
5. **Use inc parameters** to minimize requests
6. **Handle errors gracefully** with exponential backoff
7. **Use HTTPS** for all requests (HTTP redirects to HTTPS)
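Point 6 can be sketched as a small retry wrapper. This is an illustration under assumptions: only status 503 is treated as retryable, the delay doubles after each failed attempt, and `send` is a hypothetical callable that performs one request and returns `(status, body)`.

```python
import time

RETRYABLE_STATUSES = {503}

def request_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body
        sleep(delay)   # wait before the next attempt
        delay *= 2     # exponential growth: 1 s, 2 s, 4 s, ...

# Demonstration: two overloaded responses, then success
responses = iter([(503, "busy"), (503, "busy"), (200, "ok")])
delays = []
result = request_with_backoff(lambda: next(responses), sleep=delays.append)
```

Adding random jitter to each delay is a common refinement when many clients retry at the same moment.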
@@ -0,0 +1,568 @@
# MusicBrainz Server Architecture
## Design Pattern
Hybrid MVC + Service Layer architecture built on the Catalyst web framework. The application follows a layered approach with clear separation of concerns between presentation, business logic, and data access.
## Directory Structure
```
lib/MusicBrainz/Server/
├── Controller/ # 53 controllers, 13,000 lines
│ ├── Artist.pm
│ ├── Release.pm
│ ├── Recording.pm
│ ├── WS/ # Web Service controllers
│ │ └── 2/ # API version 2
│ └── ...
├── Data/ # 106 modules, 26,000 lines
│ ├── Artist.pm
│ ├── Release.pm
│ ├── Recording.pm
│ ├── Relationship.pm
│ └── ...
├── Entity/ # 132 entity classes
│ ├── Artist.pm
│ ├── Release.pm
│ ├── Recording.pm
│ ├── Types.pm
│ └── ...
├── Form/ # 43 form handlers
│ ├── Artist.pm
│ ├── Release.pm
│ └── ...
├── View/ # 4 view modules
│ ├── Default.pm # Template Toolkit
│ ├── JSON.pm
│ ├── XML.pm
│ └── JSONLD.pm
├── WebService/ # API implementation
│ ├── Serializer/
│ │ ├── JSON/
│ │ ├── XML/
│ │ └── JSONLD/
│ └── Validator.pm
├── Edit/ # Edit system
│ ├── Artist/
│ ├── Release/
│ ├── Recording/
│ └── ...
├── Context.pm # Service layer coordinator
├── DBDefs.pm # Configuration
└── Sql.pm # SQL abstraction layer
admin/ # Database administration
├── sql/
│ ├── CreateTables.sql # Schema definition (4,068 lines)
│ └── updates/ # 332 migration files
root/ # Frontend assets
├── static/
│ ├── scripts/ # JavaScript source
│ │ ├── common/
│ │ ├── edit/
│ │ └── release/
│ ├── styles/ # CSS/LESS
│ └── images/
└── layout.tt # Main template
t/ # Tests
├── lib/ # Test utilities
├── pgtap/ # Database tests
└── selenium/ # Integration tests
```
## Architectural Layers
### Controller Layer (53 modules, 13,000 lines)
**Responsibility:** Handle HTTP requests, coordinate business logic, render responses.
**Key Controllers:**
- `Artist.pm` - Artist entity operations
- `Release.pm` - Release entity operations
- `Recording.pm` - Recording entity operations
- `ReleaseGroup.pm` - Release group operations
- `Work.pm` - Work entity operations
- `Label.pm` - Label entity operations
- `Edit.pm` - Edit submission and voting
- `Search.pm` - Search interface
- `WS::2::*` - Web service API endpoints
**Controller Pattern:**
```perl
package MusicBrainz::Server::Controller::Artist;
use Moose;
BEGIN { extends 'MusicBrainz::Server::Controller' }
sub show : Path Args(1) {
my ($self, $c, $gid) = @_;
my $artist = $c->model('Artist')->get_by_gid($gid);
$c->stash( artist => $artist );
}
```
**Responsibilities:**
- Request validation
- Authentication/authorization checks
- Coordinate Data layer calls
- Prepare data for views
- Handle form submissions
### Data Layer (106 modules, 26,000 lines)
**Responsibility:** Repository pattern for database access. Each entity has a corresponding Data module.
**Key Data Modules:**
- `Data::Artist` - Artist CRUD operations
- `Data::Release` - Release CRUD operations
- `Data::Recording` - Recording CRUD operations
- `Data::Relationship` - Relationship management
- `Data::Edit` - Edit persistence
- `Data::Search` - Search operations
**Data Module Pattern:**
```perl
package MusicBrainz::Server::Data::Artist;
use Moose;
extends 'MusicBrainz::Server::Data::Entity';
sub _table { 'artist' }
sub _entity_class { 'MusicBrainz::Server::Entity::Artist' }
sub get_by_gid {
my ($self, $gid) = @_;
return $self->_get_by_key('gid', $gid);
}
```
**Moose Roles:**
- `Role::Editable` - Entities that can be edited
- `Role::Taggable` - Entities that can be tagged
- `Role::Rateable` - Entities that can be rated
- `Role::Relatable` - Entities that can have relationships
- `Role::Aliasable` - Entities that can have aliases
- `Role::Annotation` - Entities that can be annotated
**Data Access Pattern:**
- No ORM (not DBIx::Class)
- Custom Moose-based abstraction
- Raw SQL via `DBD::Pg`
- `DBIx::Connector` for connection pooling
- `Sql.pm` provides query builder utilities
### Entity Layer (132 classes)
**Responsibility:** Domain objects representing database entities.
**Key Entities:**
- `Entity::Artist` - Artist domain object
- `Entity::Release` - Release domain object
- `Entity::Recording` - Recording domain object
- `Entity::ReleaseGroup` - Release group domain object
- `Entity::Work` - Work domain object
- `Entity::Label` - Label domain object
- `Entity::Relationship` - Relationship between entities
**Entity Pattern:**
```perl
package MusicBrainz::Server::Entity::Artist;
use Moose;
extends 'MusicBrainz::Server::Entity';
has 'name' => ( is => 'rw', isa => 'Str' );
has 'sort_name' => ( is => 'rw', isa => 'Str' );
has 'type_id' => ( is => 'rw', isa => 'Maybe[Int]' );
has 'country_id' => ( is => 'rw', isa => 'Maybe[Int]' );
has 'begin_date' => ( is => 'rw', isa => 'PartialDate' );
has 'end_date' => ( is => 'rw', isa => 'PartialDate' );
```
**Entity Characteristics:**
- Treated as immutable after construction by convention (the attributes themselves are declared `rw`)
- Type-safe via Moose type system
- Lazy loading of relationships
- No database logic (pure domain objects)
### Form Layer (43 modules)
**Responsibility:** Form validation and processing using HTML::FormHandler.
**Key Forms:**
- `Form::Artist` - Artist creation/editing
- `Form::Release` - Release creation/editing
- `Form::Recording` - Recording creation/editing
- `Form::Edit::*` - Edit-specific forms
**Form Pattern:**
```perl
package MusicBrainz::Server::Form::Artist;
use HTML::FormHandler::Moose;
extends 'MusicBrainz::Server::Form';
has_field 'name' => ( type => 'Text', required => 1 );
has_field 'sort_name' => ( type => 'Text', required => 1 );
has_field 'type_id' => ( type => 'Select' );
```
### View Layer (4 modules)
**Responsibility:** Render responses in different formats.
**Views:**
- `View::Default` - Template Toolkit for HTML
- `View::JSON` - JSON serialization
- `View::XML` - XML serialization
- `View::JSONLD` - JSON-LD serialization
## Edit System Architecture
**Pattern:** Command Pattern
**Concept:** All data modifications are represented as "edits" - versioned, votable changes that go through a review process.
**Edit Lifecycle:**
1. User submits edit via form
2. Edit is validated and persisted to `edit` table
3. Edit enters voting period (typically 7 days)
4. Community votes on edit (yes/no/abstain)
5. Auto-editors can approve immediately
6. Edit is applied or rejected based on votes
7. Full audit trail maintained
**Edit Types (examples):**
- `Edit::Artist::Create` - Create new artist
- `Edit::Artist::Edit` - Modify artist data
- `Edit::Artist::Delete` - Delete artist
- `Edit::Release::Create` - Create new release
- `Edit::Release::AddReleaseLabel` - Add label to release
- `Edit::Relationship::Create` - Create relationship
- `Edit::Relationship::Edit` - Modify relationship
- `Edit::Relationship::Delete` - Delete relationship
**Edit Structure:**
```perl
package MusicBrainz::Server::Edit::Artist::Edit;
use Moose;
extends 'MusicBrainz::Server::Edit';
sub edit_type { 1 } # Unique edit type ID
sub edit_name { 'Edit artist' }
sub initialize {
my ($self, %opts) = @_;
# Store old and new data
$self->data({
entity_id => $opts{artist_id},
old => { ... },
new => { ... },
});
}
sub accept {
my $self = shift;
# Apply the edit
$self->c->model('Artist')->update($self->data->{entity_id}, $self->data->{new});
}
```
**Edit Data Storage:**
- `edit` table - Edit metadata (type, status, votes)
- `edit_data` table - Edit-specific data (JSON)
- `vote` table - User votes on edits
**Edit Statuses:**
- Open - Awaiting votes
- Applied - Accepted and applied
- Failed Vote - Rejected by community
- Failed Dependency - Dependent edit failed
- Error - Application error
- Deleted - Cancelled by submitter
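The statuses above form a small state machine. The sketch below models it in Python under one assumption that the list implies but does not state: an edit leaves `Open` exactly once and every other status is terminal. The names are hypothetical and do not mirror the server's Perl constants.

```python
from enum import Enum

class EditStatus(Enum):
    OPEN = "Open"
    APPLIED = "Applied"
    FAILED_VOTE = "Failed Vote"
    FAILED_DEPENDENCY = "Failed Dependency"
    ERROR = "Error"
    DELETED = "Deleted"

def close_edit(current, outcome):
    """Move an open edit to a terminal status, rejecting any other transition."""
    if current is not EditStatus.OPEN:
        raise ValueError(f"edit already closed as {current.value}")
    if outcome is EditStatus.OPEN:
        raise ValueError("an edit cannot be closed as Open")
    return outcome

status = close_edit(EditStatus.OPEN, EditStatus.APPLIED)
```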
## Serialization Architecture
### JSON Serializer
**Location:** `lib/MusicBrainz/Server/WebService/Serializer/JSON/2/`
**Modules:**
- `Artist.pm` - Artist JSON serialization
- `Release.pm` - Release JSON serialization
- `Recording.pm` - Recording JSON serialization
- `Utils.pm` - Common serialization utilities
**Pattern:**
```perl
sub serialize {
my ($self, $entity, $inc, $opts) = @_;
my $data = {
id => $entity->gid,
name => $entity->name,
'sort-name' => $entity->sort_name,
};
if ($inc->artist_credits) {
$data->{'artist-credit'} = $self->serialize_artist_credit($entity->artist_credit);
}
return $data;
}
```
### XML Serializer
**Location:** `lib/MusicBrainz/Server/WebService/Serializer/XML/2/`
**Namespace:** `http://musicbrainz.org/ns/mmd-2.0#`
**Pattern:**
```perl
sub serialize {
my ($self, $entity, $inc, $opts) = @_;
my $xml = XML::LibXML::Element->new('artist');
$xml->setAttribute('id', $entity->gid);
$xml->appendTextChild('name', $entity->name);
$xml->appendTextChild('sort-name', $entity->sort_name);
return $xml;
}
```
### JSON-LD Serializer
**Location:** `lib/MusicBrainz/Server/WebService/Serializer/JSONLD/`
**Context:** Schema.org vocabulary
**Pattern:**
```perl
sub serialize {
my ($self, $entity) = @_;
return {
'@context' => 'http://schema.org',
'@type' => 'MusicGroup',
'@id' => 'https://musicbrainz.org/artist/' . $entity->gid,
'name' => $entity->name,
};
}
```
## Frontend Architecture
### Template Toolkit (Server-Side Rendering)
**Location:** `root/`
**Main Template:** `root/layout.tt`
**Template Structure:**
```
root/
├── layout.tt # Main layout
├── artist/
│ ├── index.tt # Artist listing
│ ├── show.tt # Artist detail
│ └── edit.tt # Artist edit form
├── release/
│ ├── index.tt
│ ├── show.tt
│ └── edit.tt
└── components/
├── header.tt
├── footer.tt
└── sidebar.tt
```
**Template Pattern:**
```tt2
[% WRAPPER 'layout.tt' title=artist.name %]
<h1>[% artist.name %]</h1>
<p>Sort name: [% artist.sort_name %]</p>
[% IF artist.releases.size %]
<h2>Releases</h2>
<ul>
[% FOR release IN artist.releases %]
<li><a href="/release/[% release.gid %]">[% release.name %]</a></li>
[% END %]
</ul>
[% END %]
[% END %]
```
### React (Progressive Enhancement)
**Location:** `root/static/scripts/`
**Strategy:** Progressive enhancement - server renders HTML, React hydrates for interactivity.
**Component Structure:**
```
root/static/scripts/
├── common/
│ ├── components/
│ │ ├── EntityLink.js
│ │ ├── Autocomplete.js
│ │ └── ReleaseList.js
│ └── utility/
├── edit/
│ ├── components/
│ │ ├── EditNote.js
│ │ └── VotingSection.js
│ └── reducers/
└── release/
├── components/
│ ├── ReleaseHeader.js
│ └── TrackList.js
└── reducers/
```
**React Pattern:**
```javascript
import React from 'react';
import ReactDOM from 'react-dom';
const ReleaseList = ({ releases }) => (
<ul>
{releases.map(release => (
<li key={release.gid}>
<a href={`/release/${release.gid}`}>{release.name}</a>
</li>
))}
</ul>
);
// Hydrate server-rendered content
const container = document.getElementById('release-list');
if (container) {
const releases = JSON.parse(container.dataset.releases);
ReactDOM.hydrate(<ReleaseList releases={releases} />, container);
}
```
### Legacy Knockout.js
**Status:** Being phased out, but still present in some views.
**Location:** `root/static/scripts/` (mixed with React)
**Pattern:**
```javascript
ko.applyBindings({
releases: ko.observableArray([...]),
addRelease: function() { ... }
});
```
## Service Layer (Context)
**File:** `lib/MusicBrainz/Server/Context.pm`
**Responsibility:** Coordinate operations across multiple Data modules, manage transactions, provide unified interface.
**Pattern:**
```perl
my $artist = $c->model('Artist')->get_by_gid($gid);
$c->model('ArtistCredit')->load($artist);
$c->model('Release')->load_for_artist($artist);
$c->model('Relationship')->load($artist);
```
**Context Provides:**
- Database connection management
- Transaction handling
- Model access (`$c->model('Artist')`)
- Configuration access (`$c->config`)
- Session management
- Request/response handling
## Key Design Patterns
### Repository Pattern
**Implementation:** Data layer modules
**Purpose:** Abstract database access, provide clean interface for entity operations.
**Example:**
```perl
# Instead of raw SQL everywhere:
my $artist = $c->model('Artist')->get_by_gid($gid);
# Data::Artist handles the SQL:
sub get_by_gid {
my ($self, $gid) = @_;
return $self->sql->select_single_row_hash(
'SELECT * FROM artist WHERE gid = ?', $gid
);
}
```
### Command Pattern
**Implementation:** Edit system
**Purpose:** Encapsulate all data modifications as objects, enabling undo, audit trails, and voting.
**Example:**
```perl
my $edit = $c->model('Edit')->create(
edit_type => $EDIT_ARTIST_EDIT,
editor_id => $c->user->id,
artist_id => $artist->id,
old => { name => 'Old Name' },
new => { name => 'New Name' },
);
```
### Service Pattern
**Implementation:** Context object
**Purpose:** Coordinate operations across multiple repositories, manage transactions.
**Example:**
```perl
$c->model('MB')->with_transaction(sub {
my $artist = $c->model('Artist')->insert({ name => 'New Artist' });
$c->model('Edit')->create(
edit_type => $EDIT_ARTIST_CREATE,
entity_id => $artist->id,
);
});
```
## Data Access Layer
**No ORM:** MusicBrainz does not use DBIx::Class or any traditional ORM.
**Custom Abstraction:**
- Moose-based Data modules
- Raw SQL via `DBD::Pg`
- `DBIx::Connector` for connection pooling
- `Sql.pm` provides query builder utilities
**Rationale:**
- Performance - Direct SQL is faster
- Flexibility - Complex queries easier to write
- Control - Full control over query execution
- Legacy - Codebase predates modern ORMs
**SQL Abstraction Example:**
```perl
# lib/MusicBrainz/Server/Data/Sql.pm
sub select_single_row_hash {
my ($self, $query, @args) = @_;
my $row = $self->dbh->selectrow_hashref($query, undef, @args);
return $row;
}
sub select_list_of_hashes {
my ($self, $query, @args) = @_;
my $rows = $self->dbh->selectall_arrayref($query, { Slice => {} }, @args);
return $rows;
}
```
@@ -0,0 +1,736 @@
# MusicBrainz Server Codebase
## Configuration System
### Two-Tier Architecture
**File:** `lib/DBDefs.pm`
**Structure:**
1. `lib/DBDefs/Default.pm` - Base defaults (in git)
2. `lib/DBDefs.pm` - Instance-specific overrides (not in git)
**Pattern:**
```perl
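package DBDefs;
use strict;
use warnings;
use parent 'DBDefs::Default';
# Import the RT_* replication type constants used below
use MusicBrainz::Server::Replication ':replication_type';
# Override defaults for this instance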
sub DB_SCHEMA_SEQUENCE { 28 }
sub DB_STAGING_SERVER { 0 }
sub REPLICATION_TYPE { RT_MASTER }
```
### Configuration Categories
**Database Configuration:**
```perl
# Primary database
sub READWRITE_DATABASE {
return {
database => 'musicbrainz_db',
host => 'localhost',
port => 5432,
username => 'musicbrainz',
password => 'musicbrainz',
};
}
# Read-only replica (optional)
sub READONLY_DATABASE { READWRITE_DATABASE }
# System user for maintenance
sub SYSTEM_USER { 'musicbrainz' }
# Schema version
sub DB_SCHEMA_SEQUENCE { 28 }
# Staging server flag
sub DB_STAGING_SERVER { 0 }
```
**Redis Configuration:**
```perl
# Redis server
sub REDIS_SERVER { 'localhost:6379' }
# Redis namespace (prefix for all keys)
sub REDIS_NAMESPACE { 'MB' }
# Redis databases (0-15)
sub REDIS_DATABASE_CACHE { 0 }
sub REDIS_DATABASE_SESSION { 1 }
sub REDIS_DATABASE_SEARCH { 2 }
sub REDIS_DATABASE_STATS { 3 }
```
**Solr Configuration:**
```perl
# Solr server
sub SOLR_SERVER { 'http://localhost:8983/solr' }
# Solr cores
sub SOLR_CORE_ARTIST { 'artist' }
sub SOLR_CORE_RELEASE { 'release' }
sub SOLR_CORE_RECORDING { 'recording' }
# ... (13 cores total)
```
**Web Server Configuration:**
```perl
# Server processes
sub WEB_SERVER_PROCESSES { 10 }
# Server host
sub WEB_SERVER_HOST { 'localhost' }
# Server port
sub WEB_SERVER_PORT { 5000 }
# Use reverse proxy
sub WEB_SERVER_USED_IN_REVERSE_PROXY { 1 }
```
**Mail Configuration:**
```perl
# SMTP server
sub SMTP_SERVER { 'localhost' }
# From address
sub EMAIL_SUPPORT_ADDRESS { 'support@musicbrainz.org' }
# Noreply address
sub EMAIL_NOREPLY_ADDRESS { 'noreply@musicbrainz.org' }
# Bugs address
sub EMAIL_BUGS_ADDRESS { 'bugs@musicbrainz.org' }
```
**External Service Configuration:**
```perl
# Cover Art Archive
sub COVER_ART_ARCHIVE_ACCESS_KEY { '' }
sub COVER_ART_ARCHIVE_SECRET_KEY { '' }
sub COVER_ART_ARCHIVE_UPLOAD_PREFIXER { 'MB' }
sub COVER_ART_ARCHIVE_DOWNLOAD_PREFIX { 'https://coverartarchive.org' }
# Wikipedia
sub WIKIPEDIA_CACHE_TIMEOUT { 259200 } # 3 days
# Discourse SSO
sub DISCOURSE_SSO_SECRET { '' }
sub DISCOURSE_SERVER { 'https://community.metabrainz.org' }
# MetaBrainz OAuth
sub OAUTH2_ENFORCE_TLS { 1 }
```
**Replication Configuration:**
```perl
# Replication type
sub REPLICATION_TYPE { RT_STANDALONE } # RT_MASTER, RT_MIRROR, RT_STANDALONE
# Replication access token
sub REPLICATION_ACCESS_TOKEN { '' }
# Replication URL
sub REPLICATION_URL { 'https://data.musicbrainz.org/replication' }
```
**Session Configuration:**
```perl
# Session expiry (10 hours)
sub SESSION_EXPIRE { 36000 }
# Session idle timeout (3 hours)
sub SESSION_IDLE_TIMEOUT { 10800 }
# Session cookie name
sub SESSION_COOKIE { 'AF_SID' }
# Session cookie domain
sub SESSION_DOMAIN { '.musicbrainz.org' }
```
**Feature Flags:**
```perl
# Enable beta features
sub BETA_FEATURES { 0 }
# Enable development mode
sub DEVELOPMENT_SERVER { 0 }
# Enable debug mode
sub DEBUG { 0 }
# Put the database into read-only mode
sub DB_READ_ONLY { 0 }
```
**Rate Limiting:**
```perl
# API rate limit (requests per second)
sub API_RATE_LIMIT { 1 }
# Web rate limit (requests per second)
sub WEB_RATE_LIMIT { 10 }
```
**Caching:**
```perl
# Cache TTL for entities (seconds)
sub CACHE_TTL_ENTITY { 3600 } # 1 hour
# Cache TTL for search results (seconds)
sub CACHE_TTL_SEARCH { 900 } # 15 minutes
# Cache TTL for statistics (seconds)
sub CACHE_TTL_STATS { 3600 } # 1 hour
```
## Logging System
### Log::Dispatch Framework
**Configuration:**
```perl
use Log::Dispatch;
my $log = Log::Dispatch->new(
outputs => [
[
'Screen',
min_level => 'debug',
stderr => 1,
newline => 1,
],
[
'File',
min_level => 'info',
filename => '/var/log/musicbrainz/server.log',
mode => 'append',
newline => 1,
],
],
);
```
### Log Levels
**DEBUG:** Verbose debugging information
```perl
$log->debug("Loading artist with GID: $gid");
```
**INFO:** Informational messages
```perl
$log->info("User $username logged in");
```
**WARN:** Warning messages
```perl
$log->warn("Cache miss for entity $gid");
```
**ERROR:** Error messages
```perl
$log->error("Failed to connect to database: $error");
```
**FATAL:** Fatal errors
```perl
$log->fatal("Database connection lost, shutting down");
```
### Message Limit
**Maximum Size:** 16KB per log message
**Truncation:** Messages exceeding 16KB are truncated with "..." suffix
**Rationale:** Prevent log flooding from large data dumps
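The truncation rule can be sketched as follows. Two details are assumptions, since the text above does not settle them: the limit is measured in UTF-8 bytes, and the `...` suffix counts toward the 16KB total.

```python
MAX_BYTES = 16 * 1024
SUFFIX = "..."

def truncate_message(message, max_bytes=MAX_BYTES):
    """Return message unchanged if it fits, else cut it and append the suffix."""
    data = message.encode("utf-8")
    if len(data) <= max_bytes:
        return message
    keep = max_bytes - len(SUFFIX)
    # errors="ignore" drops a multi-byte character that was cut in half
    return data[:keep].decode("utf-8", errors="ignore") + SUFFIX

short = truncate_message("cache miss")
long_ascii = truncate_message("a" * 20000)
long_accented = truncate_message("é" * 10000)  # 2 bytes per character
```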
### Lazy Evaluation
**Pattern:**
```perl
# Expensive operation only executed if debug level enabled
$log->debug(sub {
my $data = expensive_serialization($object);
return "Object data: $data";
});
```
**Benefits:**
- Avoid expensive operations when logging disabled
- Reduce CPU usage in production
### Stack Traces
**Automatic:** Stack traces included for ERROR and FATAL levels
**Format:**
```
ERROR: Failed to load artist
Stack trace:
at MusicBrainz::Server::Data::Artist::get_by_gid line 123
at MusicBrainz::Server::Controller::Artist::show line 45
at Catalyst::Action::execute line 67
```
### Log Rotation
**Tool:** logrotate
**Configuration:**
```
/var/log/musicbrainz/*.log {
daily
rotate 30
compress
delaycompress
notifempty
create 0640 musicbrainz musicbrainz
sharedscripts
postrotate
/usr/bin/killall -HUP starman
endscript
}
```
## Error Tracking (Sentry)
### Server-Side Integration
**Library:** Sentry::Raven (Perl SDK)
**Configuration:**
```perl
use Sentry::Raven;
my $raven = Sentry::Raven->new(
sentry_dsn => 'https://public_key@sentry.io/project_id',
environment => 'production',
release => '2024.01.15',
);
```
**Capture Exception:**
```perl
eval {
# Code that might fail
$c->model('Artist')->get_by_gid($gid);
};
if ($@) {
$raven->capture_exception($@, {
request => {
url => $c->req->uri,
method => $c->req->method,
headers => $c->req->headers,
},
user => {
id => $c->user->id,
username => $c->user->name,
},
extra => {
gid => $gid,
},
});
}
```
### Client-Side Integration
**Library:** @sentry/browser (JavaScript SDK)
**Configuration:**
```javascript
import * as Sentry from '@sentry/browser';
Sentry.init({
dsn: 'https://public_key@sentry.io/project_id',
environment: 'production',
release: '2024.01.15',
integrations: [
new Sentry.BrowserTracing(),
],
tracesSampleRate: 0.1,
});
```
**Capture Exception:**
```javascript
try {
// Code that might fail
loadArtist(gid);
} catch (error) {
Sentry.captureException(error, {
tags: {
component: 'ArtistPage',
},
extra: {
gid: gid,
},
});
}
```
### Context Enrichment
**Request Context:**
- URL
- HTTP method
- Headers
- Query parameters
- POST data (sanitized)
**User Context:**
- User ID
- Username
- Email (hashed)
- IP address (anonymized)
**Custom Context:**
- Entity GID
- Edit ID
- Search query
- API endpoint
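The sanitizing steps named above (sanitized POST data, hashed email, anonymized IP) could look like the sketch below. All specifics are assumptions made for illustration: SHA-256 for the email hash, /24 and /48 masks for IPv4 and IPv6, and the set of redacted field names.

```python
import hashlib
import ipaddress

SENSITIVE_FIELDS = {"password", "csrf_token", "access_token"}  # assumed list

def hash_email(email):
    """Hash a normalized email so events can be correlated without storing it."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

def anonymize_ip(ip):
    """Zero the host part of an address: /24 for IPv4, /48 for IPv6."""
    address = ipaddress.ip_address(ip)
    prefix = 24 if address.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

def sanitize_form(form):
    """Replace the values of sensitive fields before attaching POST data."""
    return {
        key: "[redacted]" if key.lower() in SENSITIVE_FIELDS else value
        for key, value in form.items()
    }

user_context = {
    "email": hash_email("User@Example.org"),
    "ip_address": anonymize_ip("203.0.113.77"),
}
```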
## Monitoring
### Current State
**Metrics Endpoint:** None (no Prometheus exporter)
**Health Check Endpoint:** None (no `/health` endpoint)
**Workarounds:**
- Monitor HTTP 200 responses on `/`
- Parse logs for error rates
- Monitor database connection count
- Monitor Redis memory usage
### Planned Improvements
**Prometheus Exporter:**
- Request count by endpoint
- Request duration histogram
- Database query count
- Database query duration
- Cache hit/miss ratio
- Edit submission rate
- Vote count
**Health Check Endpoint:**
- Database connectivity
- Redis connectivity
- Solr connectivity
- Disk space
- Memory usage
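A health check endpoint of the planned kind usually runs a set of independent probes and reports an aggregate status. The sketch below shows that aggregation only. The probe callables, the response shape, and the use of 503 for a failing dependency are assumptions, since no such endpoint exists yet.

```python
def run_health_checks(checks):
    """Run each probe; a probe signals failure by raising an exception."""
    results = {}
    for name, probe in checks.items():
        try:
            probe()
            results[name] = "ok"
        except Exception as exc:  # any failure marks that dependency as down
            results[name] = f"error: {exc}"
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "ok" if healthy else "degraded", "checks": results}
    return (200 if healthy else 503), body

# Stand-in probes; real ones would ping PostgreSQL, Redis, and Solr
def database_probe():
    return None

def redis_probe():
    raise RuntimeError("connection refused")

status_code, body = run_health_checks({"database": database_probe, "redis": redis_probe})
```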
## Session Management
### Redis-Backed Sessions
**Storage:** Redis database 1
**Session Key:** `session:{session_id}`
**Session Data:**
```json
{
"user_id": 12345,
"username": "user",
"csrf_token": "abc123...",
"last_activity": 1609459200,
"preferences": {
"language": "en",
"timezone": "UTC"
}
}
```
### Session Lifecycle
**Creation:**
```perl
my $session_id = generate_session_id(); # Random 32-byte hex
my $session_data = {
user_id => $user->id,
csrf_token => generate_csrf_token(),
last_activity => time(),
};
$redis->setex(
"session:$session_id",
36000, # 10 hours
encode_json($session_data)
);
$c->res->cookies->{AF_SID} = {
value => $session_id,
path => '/',
domain => '.musicbrainz.org',
secure => 1,
httponly => 1,
samesite => 'Lax',
};
```
**Validation:**
```perl
my $cookie = $c->req->cookies->{AF_SID};
return undef unless $cookie;        # No session cookie was sent
my $session_id = $cookie->value;    # Catalyst returns a cookie object, not a string
my $session_json = $redis->get("session:$session_id");
if (!$session_json) {
    # Session expired or invalid
    return undef;
}
my $session_data = decode_json($session_json);
# Check idle timeout
my $idle_time = time() - $session_data->{last_activity};
if ($idle_time > 10800) { # 3 hours
    $redis->del("session:$session_id");
    return undef;
}
# Update last activity and reset the Redis TTL to the full 10 hours
$session_data->{last_activity} = time();
$redis->setex("session:$session_id", 36000, encode_json($session_data));
return $session_data;
```
**Destruction:**
```perl
$redis->del("session:$session_id");
$c->res->cookies->{AF_SID} = {
value => '',
expires => '-1d',
};
```
### Session Expiry
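**Absolute Expiry:** 10 hours (36,000 seconds), applied as the Redis key TTL. The validation code calls `setex` with the full 36,000 seconds on every request, so as written the TTL slides with activity rather than capping total session age.
**Idle Timeout:** 3 hours (10,800 seconds), checked against `last_activity` on each request
**Sliding Window:** Last activity updated on each request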
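A minimal Python sketch of how the two limits could be checked independently. It assumes the session also records a `created_at` timestamp, which is a hypothetical field and not part of the session data shown earlier; the constant and function names are likewise illustrative.

```python
ABSOLUTE_LIFETIME = 36_000  # 10 hours, in seconds
IDLE_TIMEOUT = 10_800       # 3 hours, in seconds


def session_is_valid(created_at: int, last_activity: int, now: int) -> bool:
    """Return True only if neither the absolute nor the idle limit is exceeded."""
    if now - created_at > ABSOLUTE_LIFETIME:
        return False  # Session is older than 10 hours, regardless of activity
    if now - last_activity > IDLE_TIMEOUT:
        return False  # No request seen for more than 3 hours
    return True


def remaining_ttl(created_at: int, now: int) -> int:
    """TTL to hand to the store so the key never outlives the absolute limit."""
    return max(0, ABSOLUTE_LIFETIME - (now - created_at))
```

Passing `remaining_ttl(...)` instead of a fixed 36,000 seconds when refreshing the key would keep the absolute limit intact while still updating the last-activity timestamp.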
### Cookie Configuration
**Name:** `AF_SID`
**Attributes:**
- `Secure` - HTTPS only
- `HttpOnly` - Not accessible via JavaScript
- `SameSite=Lax` - CSRF protection
- `Domain=.musicbrainz.org` - Shared across subdomains
- `Path=/` - Available site-wide
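For reference, the attribute list above corresponds to a `Set-Cookie` header like the one produced by this short Python sketch. It uses only the standard library; the function name is illustrative, and attribute ordering in the output is up to `http.cookies`.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str) -> str:
    """Return the value of a Set-Cookie header for the AF_SID session cookie."""
    cookie = SimpleCookie()
    cookie["AF_SID"] = session_id
    morsel = cookie["AF_SID"]
    morsel["path"] = "/"
    morsel["domain"] = ".musicbrainz.org"
    morsel["secure"] = True      # Sent over HTTPS only
    morsel["httponly"] = True    # Hidden from JavaScript
    morsel["samesite"] = "Lax"   # Requires Python 3.8 or later
    return morsel.OutputString()
```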
## Security
### CSRF Protection
**Token Generation:**
```perl
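use Digest::SHA qw(sha256_hex);
# random_bytes is assumed to come from Bytes::Random::Secure
use Bytes::Random::Secure qw(random_bytes);
my $csrf_token = sha256_hex(
    $session_id .
    $user_id .
    time() .
    random_bytes(32)
);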
```
**Token Storage:** Stored in session data
**Token Validation:**
```perl
sub validate_csrf_token {
    my ($c, $submitted_token) = @_;
    my $session_token = $c->session->{csrf_token};
    # Reject when either token is missing or the two differ
    if (!$session_token || !defined $submitted_token
            || $submitted_token ne $session_token) {
        $c->detach('/error_403');
    }
}
```
**Form Inclusion:**
```html
<form method="POST" action="/edit/artist/create">
<input type="hidden" name="csrf_token" value="[% csrf_token %]">
<!-- form fields -->
</form>
```
**AJAX Requests:**
```javascript
fetch('/api/endpoint', {
method: 'POST',
headers: {
'X-CSRF-Token': csrfToken,
'Content-Type': 'application/json',
},
body: JSON.stringify(data),
});
```
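The token comparison is the security-sensitive step. As a hedged illustration in Python (not the server's Perl code), a random token can be generated directly from the operating system's CSPRNG and compared in constant time, which avoids leaking how many leading characters matched. The function names are hypothetical.

```python
import hmac
import secrets
from typing import Optional


def generate_csrf_token() -> str:
    """Return 32 random bytes as a 64-character hex string."""
    return secrets.token_hex(32)


def csrf_token_matches(session_token: Optional[str],
                       submitted_token: Optional[str]) -> bool:
    """Compare tokens in constant time; missing tokens never match."""
    if not session_token or not submitted_token:
        return False
    return hmac.compare_digest(
        session_token.encode("utf-8"),
        submitted_token.encode("utf-8"),
    )
```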
### Content Security Policy (CSP)
**Header:**
```
Content-Security-Policy:
default-src 'self';
script-src 'self' 'unsafe-inline' https://www.google-analytics.com;
style-src 'self' 'unsafe-inline';
img-src 'self' data: https:;
font-src 'self' data:;
connect-src 'self' https://sentry.io;
frame-ancestors 'none';
```
**Directives:**
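- `default-src 'self'` - Only load resources from the same origin unless a more specific directive applies
- `script-src` - Allow scripts from self and Google Analytics; `'unsafe-inline'` also permits inline scripts, which weakens XSS protection
- `style-src` - Allow styles from self (inline allowed for legacy)
- `img-src` - Allow images from self, `data:` URIs, and any HTTPS origin (cover art, etc.)
- `font-src` - Allow fonts from self and `data:` URIs
- `connect-src` - Allow AJAX to self and Sentry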
- `frame-ancestors 'none'` - Prevent clickjacking
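Keeping the policy as structured data and serializing it into the header value makes individual directives easier to review and change. The following Python sketch reproduces the header shown above; the `CSP_POLICY` and `build_csp_header` names are illustrative.

```python
# Directive -> list of sources, in the same order as the header above.
CSP_POLICY = {
    "default-src": ["'self'"],
    "script-src": ["'self'", "'unsafe-inline'", "https://www.google-analytics.com"],
    "style-src": ["'self'", "'unsafe-inline'"],
    "img-src": ["'self'", "data:", "https:"],
    "font-src": ["'self'", "data:"],
    "connect-src": ["'self'", "https://sentry.io"],
    "frame-ancestors": ["'none'"],
}


def build_csp_header(policy: dict) -> str:
    """Serialize the policy into a single Content-Security-Policy value."""
    return "; ".join(
        f"{directive} {' '.join(sources)}"
        for directive, sources in policy.items()
    )
```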
### Authentication
**Realms:**
1. Session-based (cookie)
2. HTTP Digest (legacy)
3. OAuth2 Bearer token
**Session Authentication:**
```perl
sub authenticate_session {
    my ($c) = @_;
    # Catalyst returns a cookie object; bail out if the cookie is absent
    my $cookie = $c->req->cookies->{AF_SID} or return;
    my $session = $c->model('Session')->load($cookie->value);
    if ($session) {
        my $user = $c->model('Editor')->get_by_id($session->{user_id});
        $c->set_authenticated_user($user);
    }
}
```
**OAuth2 Authentication:**
```perl
sub authenticate_oauth2 {
    my ($c) = @_;
    my $auth_header = $c->req->header('Authorization');
    # The header may be absent, so guard before matching
    if (defined $auth_header && $auth_header =~ /^Bearer (.+)$/) {
        my $token = $1;
        my $token_info = $c->model('OAuth2')->introspect($token);
        if ($token_info->{active}) {
            my $user = $c->model('Editor')->get_by_id($token_info->{sub});
            $c->set_authenticated_user($user);
        }
    }
}
```
### Password Hashing
**Algorithm:** Bcrypt
**Cost Factor:** 12 (2^12 = 4096 iterations)
**Hashing:**
```perl
use Crypt::Eksblowfish::Bcrypt qw(bcrypt en_base64);
sub hash_password {
my ($password) = @_;
my $salt = generate_salt(); # 16 random bytes
my $settings = '$2a$12$' . en_base64($salt);
return bcrypt($password, $settings);
}
```
**Verification:**
```perl
sub verify_password {
my ($password, $hash) = @_;
my $computed_hash = bcrypt($password, $hash);
return $computed_hash eq $hash;
}
```
**Password Requirements:**
- Minimum 8 characters
- No maximum length
- No complexity requirements (user choice)
### Editor Privileges
**Privilege Flags (Bitmask):**
| Flag | Value | Description |
|------|-------|-------------|
| `UNTRUSTED` | 1 | New user, limited privileges |
| `AUTOEDITOR` | 2 | Auto-editor, edits auto-approved |
| `BOT` | 4 | Bot account |
| `UNTRUSTED_BOT` | 5 | Untrusted bot: the combination `UNTRUSTED` + `BOT` (1 + 4), not a separate bit |
| `RELATIONSHIP_EDITOR` | 8 | Can edit relationships |
| `WIKI_TRANSCLUSION` | 16 | Can transclude wiki content |
| `MBID_SUBMITTER` | 32 | Can submit MBIDs |
| `ACCOUNT_ADMIN` | 64 | Can manage user accounts |
| `LOCATION_EDITOR` | 128 | Can edit locations |
| `BANNER_EDITOR` | 256 | Can edit site banners |
| `EDITING_DISABLED` | 512 | Editing disabled (banned) |
| `ADDING_NOTES_DISABLED` | 1024 | Cannot add edit notes |
| `SPAMMER` | 2048 | Marked as spammer |
| `AUTO_EDITOR_ELECTIONS` | 4096 | Can vote in auto-editor elections |
| `DONT_NAG` | 8192 | Don't show donation nag |
**Privilege Check:**
```perl
sub is_auto_editor {
my ($user) = @_;
return ($user->privs & 2) != 0;
}
sub can_edit_relationships {
my ($user) = @_;
return ($user->privs & 8) != 0;
}
```
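The same checks can be written against named flags instead of literal numbers. This Python sketch copies the values from the table above as listed there; it is an illustration of the bitmask technique and the `Privilege` and `has_privilege` names are hypothetical.

```python
from enum import IntFlag


class Privilege(IntFlag):
    """Flag values as listed in the privilege table above."""
    UNTRUSTED = 1
    AUTOEDITOR = 2
    BOT = 4
    RELATIONSHIP_EDITOR = 8
    WIKI_TRANSCLUSION = 16
    MBID_SUBMITTER = 32
    ACCOUNT_ADMIN = 64
    LOCATION_EDITOR = 128
    BANNER_EDITOR = 256
    EDITING_DISABLED = 512
    ADDING_NOTES_DISABLED = 1024
    SPAMMER = 2048
    AUTO_EDITOR_ELECTIONS = 4096
    DONT_NAG = 8192


def has_privilege(privs: int, flag: Privilege) -> bool:
    """True when every bit of `flag` is set in the stored bitmask."""
    return (privs & flag) == flag
```

A stored value of `10`, for example, decodes to `AUTOEDITOR | RELATIONSHIP_EDITOR`, and a combined flag such as `Privilege.UNTRUSTED | Privilege.BOT` can be tested in one call.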
### Auto-Editor Election System
**Eligibility:**
- 100+ accepted edits
- Member for 2+ weeks
- No recent failed votes
**Election Process:**
1. User nominates self or is nominated
2. 1-week voting period
3. Existing auto-editors vote
4. 75% approval required
5. Minimum 5 votes required
**Auto-Editor Benefits:**
- Edits auto-approved (no voting period)
- Can vote in elections
- Can approve/reject edits
- Higher trust level
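The election rules above (75% approval, at least 5 votes) can be expressed as a small predicate. This Python sketch assumes that approval is measured as yes votes over yes plus no votes and that abstentions are not counted; neither detail is stated above.

```python
def election_passes(yes_votes: int, no_votes: int,
                    min_votes: int = 5, approval_percent: int = 75) -> bool:
    """Apply the quorum and approval threshold using integer arithmetic."""
    total = yes_votes + no_votes
    if total < min_votes:
        return False  # Quorum not reached
    # Equivalent to yes / total >= approval_percent / 100, without floats
    return yes_votes * 100 >= approval_percent * total
```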
@@ -0,0 +1,618 @@
# MusicBrainz Server Data Layer
## Database Overview
**Engine:** PostgreSQL 16+
**Tables:** 375
**Foreign Key Constraints:** 500+
**Schema Definition:** `admin/sql/CreateTables.sql` (4,068 lines)
**Production Size:** ~350GB (full dataset with indexes)
## PostgreSQL Schema
### Core Entity Tables
**Artists:**
- `artist` - Artist entities (bands, musicians, orchestras, etc.)
- `artist_alias` - Alternative names for artists
- `artist_credit` - Artist credit configurations
- `artist_credit_name` - Individual artists in a credit
- `artist_type` - Artist type enumeration (person, group, etc.)
- `artist_tag` - Folksonomy tags
- `artist_rating_raw` - User ratings
- `artist_annotation` - User annotations
- `artist_gid_redirect` - MBID redirects after merges
**Releases:**
- `release` - Release entities (albums, singles, etc.)
- `release_alias` - Alternative release names
- `release_group` - Logical grouping of releases
- `release_group_primary_type` - Album, Single, EP, etc.
- `release_group_secondary_type` - Compilation, Live, Remix, etc.
- `release_status` - Official, Promotion, Bootleg, etc.
- `release_packaging` - Jewel Case, Digipak, etc.
- `release_label` - Labels associated with release
- `release_country` - Release events by country
- `release_tag` - Folksonomy tags
- `release_rating_raw` - User ratings
- `release_annotation` - User annotations
- `release_gid_redirect` - MBID redirects
**Recordings:**
- `recording` - Recording entities (unique audio recordings)
- `recording_alias` - Alternative recording names
- `recording_tag` - Folksonomy tags
- `recording_rating_raw` - User ratings
- `recording_annotation` - User annotations
- `recording_gid_redirect` - MBID redirects
- `isrc` - International Standard Recording Codes
- `recording_isrc` - Recording to ISRC mapping
**Works:**
- `work` - Musical composition entities
- `work_alias` - Alternative work names
- `work_type` - Song, Symphony, Opera, etc.
- `work_attribute` - Work attributes (key, tempo, etc.)
- `work_attribute_type` - Attribute type definitions
- `work_tag` - Folksonomy tags
- `work_rating_raw` - User ratings
- `work_annotation` - User annotations
- `work_gid_redirect` - MBID redirects
- `iswc` - International Standard Musical Work Codes
- `work_iswc` - Work to ISWC mapping
**Labels:**
- `label` - Record label entities
- `label_alias` - Alternative label names
- `label_type` - Original Production, Bootleg Production, etc.
- `label_tag` - Folksonomy tags
- `label_rating_raw` - User ratings
- `label_annotation` - User annotations
- `label_gid_redirect` - MBID redirects
**Geographic:**
- `area` - Geographic areas (countries, cities, etc.)
- `area_alias` - Alternative area names
- `area_type` - Country, Subdivision, City, etc.
- `area_tag` - Folksonomy tags
- `area_annotation` - User annotations
- `area_gid_redirect` - MBID redirects
- `country_area` - ISO country code mapping
- `iso_3166_1` - ISO 3166-1 country codes
- `iso_3166_2` - ISO 3166-2 subdivision codes
- `iso_3166_3` - ISO 3166-3 former country codes
**Events:**
- `event` - Event entities (concerts, festivals, etc.)
- `event_alias` - Alternative event names
- `event_type` - Concert, Festival, etc.
- `event_tag` - Folksonomy tags
- `event_rating_raw` - User ratings
- `event_annotation` - User annotations
- `event_gid_redirect` - MBID redirects
**Places:**
- `place` - Venue/location entities
- `place_alias` - Alternative place names
- `place_type` - Venue, Studio, etc.
- `place_tag` - Folksonomy tags
- `place_annotation` - User annotations
- `place_gid_redirect` - MBID redirects
**Series:**
- `series` - Ordered sequence entities
- `series_alias` - Alternative series names
- `series_type` - Release group series, etc.
- `series_ordering_type` - Automatic, Manual
- `series_tag` - Folksonomy tags
- `series_annotation` - User annotations
- `series_gid_redirect` - MBID redirects
**Instruments:**
- `instrument` - Musical instrument entities
- `instrument_alias` - Alternative instrument names
- `instrument_type` - Wind, String, Percussion, etc.
- `instrument_tag` - Folksonomy tags
- `instrument_annotation` - User annotations
- `instrument_gid_redirect` - MBID redirects
**Genres:**
- `genre` - Genre entities
- `genre_alias` - Alternative genre names
- `genre_annotation` - User annotations
- `genre_gid_redirect` - MBID redirects
**URLs:**
- `url` - External URL entities
- `url_gid_redirect` - MBID redirects
### Relationship Tables (l_* tables)
**Pattern:** `l_{entity1}_{entity2}` for relationships between entities.
**Examples:**
- `l_artist_artist` - Artist-to-artist relationships (member of, collaboration, etc.)
- `l_artist_recording` - Artist-to-recording relationships (performer, conductor, etc.)
- `l_artist_release` - Artist-to-release relationships
- `l_artist_release_group` - Artist-to-release-group relationships
- `l_artist_work` - Artist-to-work relationships (composer, lyricist, etc.)
- `l_artist_url` - Artist-to-URL relationships (official homepage, social media, etc.)
- `l_recording_work` - Recording-to-work relationships (performance of)
- `l_release_release_group` - Release-to-release-group relationships
- `l_release_url` - Release-to-URL relationships (purchase links, streaming, etc.)
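Every example above lists the two entity types in alphabetical order, so a client can derive the table name for any pair. The Python sketch below assumes that ordering holds for all `l_*` tables, which the examples suggest but do not prove; `link_table_name` is a hypothetical helper.

```python
def link_table_name(entity_a: str, entity_b: str) -> str:
    """Build the l_* table name for a pair of entity types.

    Assumes entity types are ordered alphabetically, as in the examples.
    """
    def normalize(entity: str) -> str:
        # API names use hyphens (release-group); table names use underscores
        return entity.strip().lower().replace("-", "_")

    first, second = sorted([normalize(entity_a), normalize(entity_b)])
    return f"l_{first}_{second}"
```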
**Relationship Support Tables:**
- `link` - Link instances
- `link_type` - Relationship type definitions
- `link_attribute` - Relationship attributes
- `link_attribute_type` - Attribute type definitions
- `link_crediting` - Custom relationship credits
- `link_text_attribute` - Text attributes for relationships
### Media Tables
**Physical Media:**
- `medium` - Physical media (CDs, vinyl, etc.)
- `medium_format` - CD, Vinyl, Digital Media, etc.
- `medium_cdtoc` - CD table of contents
- `cdtoc` - CD TOC data
- `cdtoc_raw` - Raw CD TOC data
**Tracks:**
- `track` - Individual tracks on media
- `track_gid_redirect` - Track MBID redirects
### Metadata Tables
**Tags:**
- `tag` - Tag definitions
- `tag_relation` - Tag relationships
- `{entity}_tag` - Tags per entity type
- `{entity}_tag_raw` - Raw user tag submissions
**Ratings:**
- `{entity}_rating_raw` - Raw user ratings per entity type
**Annotations:**
- `annotation` - Annotation text
- `{entity}_annotation` - Annotations per entity type
**Collections:**
- `editor_collection` - User collections
- `editor_collection_type` - Collection type (release, artist, etc.)
- `editor_collection_{entity}` - Collection contents per entity type
### Editorial Tables
**Edits:**
- `edit` - Edit submissions
- `edit_data` - Edit-specific data (JSON)
- `edit_{entity}` - Edit to entity mappings
- `vote` - User votes on edits
- `edit_note` - Discussion notes on edits
- `edit_note_recipient` - Edit note notifications
**Editors:**
- `editor` - User accounts
- `editor_preference` - User preferences
- `editor_language` - User language preferences
- `editor_subscribe_artist` - Artist subscriptions
- `editor_subscribe_collection` - Collection subscriptions
- `editor_subscribe_label` - Label subscriptions
- `editor_subscribe_series` - Series subscriptions
- `editor_subscribe_editor` - Editor subscriptions
- `editor_oauth_token` - OAuth tokens
- `application` - OAuth applications
**Moderation:**
- `autoeditor_election` - Auto-editor elections
- `autoeditor_election_vote` - Election votes
- `editor_watch_preferences` - Watchlist preferences
- `editor_watch_artist` - Artist watchlist
- `editor_watch_release_group_type` - Release group type filters
- `editor_watch_release_status` - Release status filters
### Identifier Tables
**Standard Identifiers:**
- `isrc` - International Standard Recording Code
- `iswc` - International Standard Musical Work Code
- `recording_isrc` - Recording to ISRC mapping
- `work_iswc` - Work to ISWC mapping
**MusicBrainz Identifiers:**
- `{entity}_gid_redirect` - MBID redirects after merges
**Barcodes:**
- `release_barcode` - Release barcodes (EAN, UPC)
### Replication Tables (dbmirror2)
**Replication System:**
- `dbmirror_pending` - Pending replication packets
- `dbmirror_pendingdata` - Replication data
- `replication_control` - Replication state tracking
**Modes:**
- `RT_MASTER` - Master database (generates replication packets)
- `RT_MIRROR` - Mirror database (consumes replication packets)
- `RT_STANDALONE` - Standalone database (no replication)
### Auxiliary Tables
**Statistics:**
- `statistic` - Cached statistics
- `statistic_event` - Statistic calculation events
**Documentation:**
- `documentation.l_{entity1}_{entity2}_example` - Relationship examples
**Deprecated:**
- Various `_deleted` tables for soft deletes
## Schema Management
### CreateTables.sql
**Location:** `admin/sql/CreateTables.sql`
**Size:** 4,068 lines
**Purpose:** Complete schema definition for fresh installations
**Structure:**
```sql
-- Core entity tables
CREATE TABLE artist (...);
CREATE TABLE release (...);
CREATE TABLE recording (...);
-- Indexes
CREATE INDEX artist_idx_name ON artist (name);
CREATE INDEX artist_idx_gid ON artist (gid);
-- Foreign keys
ALTER TABLE artist_credit_name
ADD CONSTRAINT artist_credit_name_fk_artist
FOREIGN KEY (artist) REFERENCES artist(id);
-- Triggers
CREATE TRIGGER a_ins_artist AFTER INSERT ON artist ...;
```
### Migration System
**Location:** `admin/sql/updates/`
**Count:** 332 migration files
**Naming:** Date-based (`YYYYMMDD-description.sql`, usually with a ticket reference such as `mbs-NNNNN` after the date)
**Example Filenames:**
- `20230115-mbs-12345-add-genre-table.sql`
- `20230220-mbs-12346-add-event-series-relationship.sql`
- `20230315-mbs-12347-add-recording-length-index.sql`
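Because the date leads the filename, lexical order matches chronological order. The Python sketch below parses names shaped like the three examples above; the regular expression is inferred from those examples only and may not cover every file in the directory.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Pattern inferred from the example filenames; an assumption, not a spec.
FILENAME_RE = re.compile(
    r"^(?P<date>\d{8})-(?P<ticket>mbs-\d+)-(?P<description>[a-z0-9-]+)\.sql$"
)


class Migration(NamedTuple):
    created: date
    ticket: str
    description: str


def parse_migration_filename(name: str) -> Optional[Migration]:
    """Split a migration filename into date, ticket, and description."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    raw = match.group("date")
    created = date(int(raw[:4]), int(raw[4:6]), int(raw[6:8]))
    return Migration(created, match.group("ticket").upper(),
                     match.group("description"))
```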
**Migration Structure:**
```sql
\set ON_ERROR_STOP 1
BEGIN;
-- Schema changes
ALTER TABLE artist ADD COLUMN disambiguation TEXT;
-- Data migrations
UPDATE artist SET disambiguation = '' WHERE disambiguation IS NULL;
-- Constraints
ALTER TABLE artist ALTER COLUMN disambiguation SET NOT NULL;
COMMIT;
```
**Schema Change Variants:**
- `schema-change/` subdirectory contains master/mirror variants
- Master migrations may include replication setup
- Mirror migrations skip replication-specific changes
**Migration Tracking:**
- Migrations are tracked in the database
- Applied migrations recorded to prevent re-application
- Rollback not supported (forward-only migrations)
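Forward-only tracking of this kind can be sketched in a few lines. The Python example below uses an in-memory SQLite database purely for illustration; the `applied_migration` table and the `apply_pending` function are hypothetical and do not describe how MusicBrainz records applied migrations.

```python
import sqlite3


def apply_pending(conn: sqlite3.Connection, migrations: dict) -> list:
    """Apply unapplied migrations in name order and record each one.

    `conn` is expected to be in autocommit mode (isolation_level=None) so
    that BEGIN/COMMIT/ROLLBACK below are the only transaction boundaries.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_migration (name TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT name FROM applied_migration")}
    newly_applied = []
    for name in sorted(migrations):
        if name in applied:
            continue  # Already recorded, so never re-applied
        conn.execute("BEGIN")
        try:
            conn.execute(migrations[name])
            conn.execute("INSERT INTO applied_migration (name) VALUES (?)", (name,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(name)
    return newly_applied
```

Running the function twice against the same database applies each migration once; the second call returns an empty list.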
## Custom ORM (Moose-based Data Layer)
### Architecture
**NOT DBIx::Class** - MusicBrainz uses a custom Moose-based data access layer.
**Components:**
- 106 Data modules in `lib/MusicBrainz/Server/Data/`
- `DBIx::Connector` for connection management (a persistent handle with automatic reconnection, not a connection pool)
- `Sql.pm` for query abstraction
- Raw SQL via `DBD::Pg`
### Data Module Pattern
**Base Class:** `MusicBrainz::Server::Data::Entity`
**Example:**
```perl
package MusicBrainz::Server::Data::Artist;
use Moose;
extends 'MusicBrainz::Server::Data::Entity';
with 'MusicBrainz::Server::Data::Role::Editable';
with 'MusicBrainz::Server::Data::Role::LinksToEdit';
with 'MusicBrainz::Server::Data::Role::Merge';
sub _table { 'artist' }
sub _entity_class { 'MusicBrainz::Server::Entity::Artist' }
sub _columns {
return 'id, gid, name, sort_name, begin_date_year, begin_date_month,
begin_date_day, end_date_year, end_date_month, end_date_day,
type, area, gender, comment, edits_pending, last_updated,
ended, begin_area, end_area';
}
sub _column_mapping {
return {
id => 'id',
gid => 'gid',
name => 'name',
sort_name => 'sort_name',
type_id => 'type',
area_id => 'area',
gender_id => 'gender',
comment => 'comment',
edits_pending => 'edits_pending',
last_updated => 'last_updated',
ended => 'ended',
begin_area_id => 'begin_area',
end_area_id => 'end_area',
};
}
sub get_by_gid {
my ($self, $gid) = @_;
return $self->_get_by_key('gid', $gid);
}
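sub insert {
    my ($self, $data) = @_;
    my $row = $self->_hash_to_row($data);
    # insert_row returns the generated primary key; keep it on the row
    # so the returned entity carries its id
    $row->{id} = $self->sql->insert_row('artist', $row, 'id');
    return $self->_new_from_row($row);
}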
```
### Moose Roles
**Role::Editable:**
- Entities that can be edited via the edit system
- Provides `load_meta()` for edit counts
**Role::Taggable:**
- Entities that support folksonomy tags
- Provides `tags()`, `add_tags()`, `remove_tags()`
**Role::Rateable:**
- Entities that can be rated (0-100 scale)
- Provides `rating()`, `user_rating()`
**Role::Relatable:**
- Entities that can have relationships
- Provides `relationships()`, `add_relationship()`
**Role::Aliasable:**
- Entities that can have alternative names
- Provides `aliases()`, `add_alias()`
**Role::Annotation:**
- Entities that can be annotated
- Provides `latest_annotation()`
### Sql.pm Abstraction
**Location:** `lib/MusicBrainz/Server/Sql.pm`
**Purpose:** Thin abstraction over DBI for common query patterns.
**Methods:**
```perl
# Single row
my $row = $sql->select_single_row_hash(
'SELECT * FROM artist WHERE gid = ?', $gid
);
# Multiple rows
my $rows = $sql->select_list_of_hashes(
'SELECT * FROM artist WHERE area = ?', $area_id
);
# Insert
my $id = $sql->insert_row('artist', {
gid => $gid,
name => $name,
sort_name => $sort_name,
}, 'id');
# Update
$sql->update_row('artist', {
name => $new_name,
}, { id => $artist_id });
# Delete
$sql->delete_row('artist', { id => $artist_id });
# Transaction
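$sql->begin;
# Return a true value from eval so failure is detected even if $@ is clobbered
my $ok = eval {
    $sql->insert_row(...);
    $sql->update_row(...);
    $sql->commit;
    1;
};
if (!$ok) {
    my $error = $@ || 'unknown error';
    $sql->rollback;
    die $error;
}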
```
### DBIx::Connector
**Purpose:** Fast, safe DBI connection management with automatic reconnection.
**Configuration:**
```perl
my $conn = DBIx::Connector->new(
$dsn, $username, $password,
{
RaiseError => 1,
AutoCommit => 1,
pg_enable_utf8 => 1,
}
);
# Execute with automatic reconnection
$conn->run(sub {
my $dbh = $_;
$dbh->do('SELECT ...');
});
```
## Search Infrastructure
### Apache Solr (Primary)
**Purpose:** Full-text search across all entities
**Cores:**
- `artist` - Artist search
- `release` - Release search
- `release-group` - Release group search
- `recording` - Recording search
- `work` - Work search
- `label` - Label search
- `area` - Area search
- `event` - Event search
- `place` - Place search
- `series` - Series search
- `instrument` - Instrument search
- `tag` - Tag search
**Indexing:**
- Incremental updates via edit system
- Full reindex via `admin/BuildSearchIndexes.pl`
- Real-time updates for new entities
**Query Features:**
- Fuzzy matching
- Phrase search
- Boolean operators (AND, OR, NOT)
- Field-specific search (artist:nirvana)
- Wildcards (nirv*)
- Proximity search ("smells spirit"~5)
### PostgreSQL Full-Text (Fallback)
**Purpose:** Fallback when Solr is unavailable
**Implementation:**
- `mb_simple_tsvector` function for text vectorization
- GIN indexes on tsvector columns
- `to_tsquery()` for query parsing
**Example:**
```sql
CREATE INDEX artist_idx_name_txt ON artist
USING gin(mb_simple_tsvector(name));
SELECT * FROM artist
WHERE mb_simple_tsvector(name) @@ to_tsquery('simple', 'nirvana');
```
**Limitations:**
- Less sophisticated than Solr
- No fuzzy matching
- Limited ranking
- Used only as emergency fallback
## Redis Caching
### Architecture
**Databases:** 16 separate Redis databases (0-15)
**Database Allocation:**
- DB 0: Entity cache (GID lookups)
- DB 1: Session storage
- DB 2-15: Various caches (search, statistics, etc.)
### Entity Cache (GID Cache)
**Purpose:** Cache entity lookups by MBID (GID)
**Pattern:**
```perl
# Cache key: {entity}:gid:{gid}
my $cache_key = "artist:gid:$gid";
# Try cache first
my $cached = $redis->get($cache_key);
if ($cached) {
    return decode_json($cached);
}
# Cache miss - load from database
my $artist = $self->sql->select_single_row_hash(
    'SELECT * FROM artist WHERE gid = ?', $gid
);
# Store in cache (1 hour TTL); skip unknown GIDs so undef is never encoded
$redis->setex($cache_key, 3600, encode_json($artist)) if $artist;
return $artist;
```
**TTL:** 1 hour (3600 seconds)
**Invalidation:** On edit application
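The cache-aside flow above, including the TTL and invalidation, can be exercised without a Redis server. The Python sketch below uses a small in-memory stand-in; `TTLCache` and `get_artist_by_gid` are hypothetical names, and the injectable clock exists only to make expiry testable.

```python
import json
import time


class TTLCache:
    """In-memory stand-in for the Redis calls used above (get, setex, del)."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # Expired entries behave like misses
            return None
        return value

    def setex(self, key, ttl, value):
        self._store[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._store.pop(key, None)


def get_artist_by_gid(cache, load_from_db, gid, ttl=3600):
    """Return the artist from cache, falling back to the database loader."""
    key = f"artist:gid:{gid}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    artist = load_from_db(gid)
    if artist is not None:  # Do not cache lookups for unknown GIDs
        cache.setex(key, ttl, json.dumps(artist))
    return artist
```

Calling `cache.delete("artist:gid:...")` after an edit is applied corresponds to the invalidation step: the next lookup misses and reloads from the database.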
### Session Storage
**Purpose:** Store user sessions
**Pattern:**
```perl
# Session key: session:{session_id}
my $session_key = "session:$session_id";
# Store session
$redis->setex($session_key, 36000, encode_json({
user_id => $user_id,
csrf_token => $csrf_token,
last_activity => time(),
}));
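# Retrieve session (undef when the key has expired or never existed)
my $session_json = $redis->get($session_key);
my $session = $session_json ? decode_json($session_json) : undef;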
```
**TTL:** 10 hours absolute, 3 hours idle
**Cookie:** `AF_SID` (SameSite=Lax, Secure, HttpOnly)
### Cache Invalidation
**Strategy:** Invalidate on write
**Example:**
```perl
# After updating artist
$self->sql->update_row('artist', { name => $new_name }, { id => $id });
# Invalidate cache
$redis->del("artist:gid:$gid");
```
**Bulk Invalidation:**
- Pattern-based deletion via `SCAN` + `DEL`
- Used for relationship changes affecting multiple entities
@@ -0,0 +1,707 @@
# MusicBrainz Server Deployment
## Docker Architecture
### Build System
**Template Engine:** M4 macros
**Base Image:** Ubuntu Noble (24.04 LTS)
**Dockerfile Location:** `docker/Dockerfile.template`
**Template Processing:**
```bash
# Generate Dockerfile from template
m4 docker/Dockerfile.template > docker/Dockerfile
```
**M4 Macros:**
- `INSTALL_PERL_DEPENDENCIES` - Install Perl modules via carton
- `INSTALL_NODE_DEPENDENCIES` - Install Node.js packages via yarn
- `COMPILE_RESOURCES` - Compile static assets
- `SETUP_DATABASE` - Initialize PostgreSQL schema
**Multi-Stage Build:**
1. Base stage - Install system dependencies
2. Build stage - Compile assets and dependencies
3. Runtime stage - Copy artifacts, minimal runtime
### Container Types
**website:**
- Main web application
- Serves HTML pages via Template Toolkit
- Handles user authentication and sessions
- Port: 5000
**webservice:**
- API endpoints (/ws/2/)
- JSON/XML serialization
- OAuth authentication
- Port: 5001
**tests:**
- Run test suites
- Perl unit tests
- JavaScript tests
- pgTAP database tests
- No exposed ports (ephemeral)
**cron:**
- Scheduled tasks
- Statistics calculation
- Data cleanup
- Replication packet export
- No exposed ports
**sitemaps:**
- Generate XML sitemaps
- Update search engine indexes
- Run daily
- No exposed ports
**json-dump:**
- Export database to JSON
- Generate data dumps for download
- Run weekly
- No exposed ports
**solr-backup:**
- Backup Solr indexes
- Run daily
- No exposed ports
**template-renderer:**
- Isolated Template Toolkit renderer
- Forked from main process
- Prevents template errors from crashing main app
- IPC via Unix socket
### Docker Compose
**File:** `docker-compose.yml`
**Services:**
```yaml
services:
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: musicbrainz
      POSTGRES_PASSWORD: musicbrainz
      POSTGRES_DB: musicbrainz_db
    ports:
      - "5432:5432"
  redis:
    image: redis:7
    volumes:
      - redisdata:/data
    ports:
      - "6379:6379"
  solr:
    image: solr:8.11
    volumes:
      - solrdata:/var/solr
    ports:
      - "8983:8983"
  website:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: website
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5000:5000"
    environment:
      MUSICBRAINZ_SERVER_PROCESSES: 10
      MUSICBRAINZ_USE_PROXY: 1
  webservice:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: webservice
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5001:5001"
volumes:
  pgdata:
  redisdata:
  solrdata:
```
### Image Layers
**Base Layer (Ubuntu Noble):**
- System packages (build-essential, libpq-dev, etc.)
- Perl 5.38
- Node.js 20
- PostgreSQL client libraries
**Dependency Layer:**
- Perl modules (via carton)
- Node.js packages (via yarn)
- Cached for faster rebuilds
**Application Layer:**
- Application code
- Compiled assets
- Configuration templates
**Runtime Layer:**
- Minimal runtime dependencies
- No build tools
- Smaller image size
## PSGI Server Configuration
### Starlet
**Server:** Starlet (high-performance PSGI server)
**Protocol:** HTTP/1.1
**Concurrency:** Pre-forking worker model
**Configuration:**
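```bash
# Start Starlet with 10 workers; each worker exits after a random
# number of requests between 30 and 90
plackup -s Starlet \
  --port 5000 \
  --max-workers 10 \
  --min-reqs-per-child 30 \
  --max-reqs-per-child 90 \
  app.psgi
```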
**Worker Settings:**
- **Workers:** 10 (configurable via `MUSICBRAINZ_SERVER_PROCESSES`)
- **Max Requests per Worker:** 30-90 (random to prevent thundering herd)
- **Worker Timeout:** 300 seconds (5 minutes)
- **Keepalive:** Enabled (60 seconds)
**Worker Lifecycle:**
1. Master process forks 10 workers
2. Each worker handles requests until max_requests reached
3. Worker exits gracefully
4. Master forks new worker to replace it
5. Prevents memory leaks from accumulating
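The effect of a randomized per-worker request limit can be illustrated with a small simulation. This Python sketch assumes requests are spread round-robin across workers, which is a simplification of how a pre-forking server hands out connections; the function names are hypothetical.

```python
import random


def pick_request_limit(rng: random.Random, low: int = 30, high: int = 90) -> int:
    """Choose how many requests a freshly forked worker serves before exiting."""
    return rng.randint(low, high)


def simulate_worker_restarts(total_requests: int, workers: int = 10,
                             seed: int = 0) -> int:
    """Count how many worker replacements occur while serving the requests."""
    rng = random.Random(seed)
    limits = [pick_request_limit(rng) for _ in range(workers)]
    served = [0] * workers
    restarts = 0
    for request_number in range(total_requests):
        worker = request_number % workers  # Round-robin assignment (assumption)
        served[worker] += 1
        if served[worker] >= limits[worker]:
            # Worker exits; the master forks a replacement with a new limit
            restarts += 1
            served[worker] = 0
            limits[worker] = pick_request_limit(rng)
    return restarts
```

Because each worker draws its own limit, replacements are staggered instead of all ten workers exiting after the same request count.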
### Server::Starter (Zero-Downtime Restarts)
**Purpose:** Enable zero-downtime deployments
**Mechanism:**
1. Server::Starter binds to the port
2. It forks Starlet, which inherits the listening socket
3. On restart signal (HUP):
   - Server::Starter starts a new Starlet process
   - The new process inherits the same listening socket (no rebind)
   - The old process finishes its in-flight requests
   - The old process exits
   - No connections are dropped
**Command:**
```bash
start_server \
--port 5000 \
--pid-file /var/run/musicbrainz.pid \
--status-file /var/run/musicbrainz.status \
-- \
  plackup -s Starlet --max-workers 10 app.psgi
```
**Restart:**
```bash
# Send HUP signal to trigger graceful restart
kill -HUP $(cat /var/run/musicbrainz.pid)
```
**Status Check:**
```bash
# Check server status
cat /var/run/musicbrainz.status
# Output: one "generation:pid" line per running server generation, e.g. 1:1234
```
### Reverse Proxy
**Production Setup:** Nginx reverse proxy in front of Starlet
**Nginx Configuration:**
```nginx
upstream musicbrainz {
    server localhost:5000;
    keepalive 32;
}
server {
    listen 80;
    server_name musicbrainz.org;
    location / {
        proxy_pass http://musicbrainz;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
    location /static/ {
        alias /var/www/musicbrainz/root/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
```
**Benefits:**
- SSL termination
- Static file serving
- Gzip compression
- Request buffering
- Load balancing (multiple Starlet instances)
## CI/CD Pipeline
### GitHub Actions
**Workflow File:** `.github/workflows/test.yml`
**Triggers:**
- Push to main branch
- Pull requests
- Manual workflow dispatch
### Build Stage
**Job:** `build-tests-image`
**Steps:**
1. Checkout code
2. Set up Docker Buildx
3. Build test Docker image
4. Push to GitHub Container Registry
5. Cache layers for faster rebuilds
**Dockerfile:** `docker/Dockerfile.test`
**Caching:**
- Perl dependencies cached by cpanfile.snapshot hash
- Node dependencies cached by yarn.lock hash
- Docker layer caching via GitHub Actions cache
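Keying a cache by lockfile hash means the cached dependencies are reused until the lockfile changes. A minimal Python sketch of the idea follows; the key format and the 16-character truncation are assumptions, and the real workflow may compute its keys differently.

```python
import hashlib


def cache_key(prefix: str, lockfile_content: bytes) -> str:
    """Derive a cache key that changes whenever the lockfile content changes."""
    digest = hashlib.sha256(lockfile_content).hexdigest()
    return f"{prefix}-{digest[:16]}"
```

In practice the bytes would come from reading `cpanfile.snapshot` or `yarn.lock`, with a distinct prefix per dependency set so the two caches never collide.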
### Test Stages
**Job:** `js-perl-and-pgtap`
**Matrix:**
- Perl 5.38.0 (stable)
- Perl 5.42.0 (latest)
**Steps:**
1. Pull test image from registry
2. Start PostgreSQL container
3. Start Redis container
4. Initialize test database
5. Run Perl tests (`prove -lr t/`)
6. Run JavaScript tests (`yarn test`)
7. Run pgTAP tests (`pg_prove -d musicbrainz_test t/pgtap/`)
8. Upload coverage reports
**Parallelization:** Tests run in parallel across matrix
### Selenium Tests
**Jobs:** `selenium-1`, `selenium-2`, `selenium-3`, `selenium-4`
**Partitioning:** Tests split into 4 partitions for parallel execution
**Steps:**
1. Pull test image
2. Start PostgreSQL, Redis, Solr
3. Start Selenium standalone Chrome
4. Initialize test database with sample data
5. Start MusicBrainz server
6. Run Selenium tests for partition
7. Upload screenshots on failure
**Partition Strategy:**
```bash
# Partition 1: Artist and release tests
# Partition 2: Recording and work tests
# Partition 3: Edit and relationship tests
# Partition 4: Search and browse tests
```
**Selenium Configuration:**
```perl
# t/selenium.pl
use Selenium::Remote::Driver;
my $driver = Selenium::Remote::Driver->new(
remote_server_addr => 'localhost',
port => 4444,
browser_name => 'chrome',
extra_capabilities => {
chromeOptions => {
args => ['--headless', '--no-sandbox', '--disable-dev-shm-usage'],
},
},
);
```
### Second-Tier Tests
**Job:** `second-perl-and-pgtap`
**Purpose:** Test against Perl 5.42.0 (latest stable)
**Trigger:** After main tests pass
**Allowed to Fail:** Yes (informational only)
### Report Generation
**Job:** `generate-reports`
**Steps:**
1. Download coverage reports from all test jobs
2. Merge coverage data
3. Generate HTML coverage report
4. Upload to Codecov
5. Comment on PR with coverage summary
**Coverage Tools:**
- Perl: Devel::Cover
- JavaScript: Istanbul/nyc
## Build Process
### Step 1: Install Perl Dependencies
```bash
# Install Carton (Perl dependency manager)
cpanm --notest Carton
# Install dependencies from cpanfile.snapshot
carton install --deployment
```
**Dependencies Installed:**
- Catalyst framework
- Moose object system
- DBD::Pg database driver
- Template::Toolkit
- JSON::XS
- XML::LibXML
- Redis client
- ~200 total CPAN modules
**Installation Time:** ~10 minutes (first time), ~1 minute (cached)
### Step 2: Install Node.js Dependencies
```bash
# Install Yarn (if not present)
npm install -g yarn
# Install dependencies from yarn.lock
yarn install --frozen-lockfile
```
**Dependencies Installed:**
- React 19.2.4
- Redux
- Webpack 5
- Babel 7
- Jest (testing)
- ESLint (linting)
- ~500 total npm packages
**Installation Time:** ~5 minutes (first time), ~30 seconds (cached)
### Step 3: Compile Static Resources
```bash
# Compile CSS, images, fonts
./script/compile_resources.sh
```
**Tasks:**
- Compile LESS to CSS
- Optimize images (pngcrush, optipng)
- Copy fonts to static directory
- Generate CSS sprites
- Minify CSS
**Output:** `root/static/styles/`, `root/static/images/`
**Time:** ~2 minutes
### Step 4: Build JavaScript Bundles
```bash
# Build production bundles with Webpack
yarn run build
# Or for development (with source maps)
yarn run build:dev
```
**Webpack Configuration:**
- Entry points: `root/static/scripts/main.js`, `root/static/scripts/edit.js`
- Output: `root/static/build/`
- Loaders: Babel (JSX, ES6+), CSS, file-loader
- Plugins: UglifyJS, ExtractTextPlugin, DefinePlugin
- Code splitting: Vendor bundle, async chunks
**Output Files:**
- `main.bundle.js` - Main application code
- `vendor.bundle.js` - Third-party libraries
- `edit.bundle.js` - Edit interface code
- `*.chunk.js` - Async-loaded chunks
**Time:** ~3 minutes (production), ~30 seconds (development)
### Step 5: Initialize Database
```bash
# Create database
createdb musicbrainz_db
# Load schema
psql musicbrainz_db < admin/sql/CreateTables.sql
# Load initial data
./admin/InitDb.pl --createdb --import
```
**Schema Loading:**
- 375 tables created
- 500+ foreign keys added
- Indexes created
- Triggers installed
**Initial Data:**
- Countries and areas
- Languages
- Relationship types
- Instrument types
- Genre definitions
**Time:** ~10 minutes (schema), ~30 minutes (sample data)
### Step 6: Build Search Indexes
```bash
# Build Solr indexes for all entities
./admin/BuildSearchIndexes.pl --all
```
**Indexes Built:**
- Artist index
- Release index
- Recording index
- Work index
- Label index
- Area, event, place, series, instrument indexes
**Time:** ~2 hours (full production data), ~5 minutes (sample data)
## System Requirements
### Minimum Requirements (Development)
**CPU:** 2 cores
**RAM:** 4 GB
**Disk:** 20 GB
**Database:** PostgreSQL 16+
**Cache:** Redis 6.0+
**Search:** Solr 8.11+
### Recommended Requirements (Production)
**CPU:** 8+ cores
**RAM:** 16+ GB
**Disk:** 500+ GB SSD
- 350 GB for PostgreSQL database
- 50 GB for Solr indexes
- 50 GB for backups
- 50 GB for logs and temp files
**Database:** PostgreSQL 16+ with:
- shared_buffers = 4GB
- effective_cache_size = 12GB
- work_mem = 64MB
- maintenance_work_mem = 1GB
**Cache:** Redis 6.0+ with:
- maxmemory = 2GB
- maxmemory-policy = allkeys-lru
**Search:** Solr 8.11+ with:
- Java heap = 4GB
- Solr cache = 512MB per core
### Network Requirements
**Bandwidth:** 100 Mbps+ (for replication and API traffic)
**Ports:**
- 5000 - Website
- 5001 - Web service API
- 5432 - PostgreSQL
- 6379 - Redis
- 8983 - Solr
**Firewall:**
- Allow inbound 80/443 (HTTP/HTTPS)
- Allow outbound 80/443 (external APIs)
- Restrict 5432, 6379, 8983 to localhost
### Software Requirements
**Operating System:**
- Ubuntu 24.04 LTS (Noble) - recommended
- Debian 12 (Bookworm)
- Any Linux with Perl 5.38+ and Node.js 20+
**Perl:** 5.38.0 or later (5.42.0 tested)
**Node.js:** 20.9.0 or later
**PostgreSQL:** 16.0 or later (16.3 recommended)
**Redis:** 6.0 or later (7.0 recommended)
**Solr:** 8.11 or later
**Optional:**
- Docker 24.0+
- Docker Compose 2.0+
- Nginx 1.24+ (reverse proxy)
- RabbitMQ 3.12+ (background jobs)
## Deployment Strategies
### Single Server
**Use Case:** Development, small mirrors
**Architecture:**
- All services on one server
- PostgreSQL, Redis, Solr, MusicBrainz on localhost
- Nginx reverse proxy
**Pros:**
- Simple setup
- Low cost
- Easy to manage
**Cons:**
- Single point of failure
- Limited scalability
- Resource contention
### Multi-Server
**Use Case:** Production, high-traffic mirrors
**Architecture:**
- Web tier: 2+ servers running MusicBrainz (load balanced)
- Database tier: PostgreSQL primary + replicas
- Cache tier: Redis (possibly clustered)
- Search tier: Solr (possibly sharded)
**Pros:**
- High availability
- Horizontal scalability
- Better performance
**Cons:**
- Complex setup
- Higher cost
- Requires load balancer
### Docker Swarm / Kubernetes
**Use Case:** Large-scale deployments, cloud environments
**Architecture:**
- Container orchestration
- Auto-scaling
- Service discovery
- Health checks
**Pros:**
- Automated deployment
- Self-healing
- Easy scaling
**Cons:**
- Steep learning curve
- Operational complexity
- Overhead
## Monitoring and Logging
### Logging
**Framework:** Log::Dispatch
**Log Levels:**
- DEBUG - Verbose debugging
- INFO - Informational messages
- WARN - Warnings
- ERROR - Errors
- FATAL - Fatal errors
**Log Destinations:**
- STDOUT (development)
- File (production): `/var/log/musicbrainz/server.log`
- Syslog (optional)
**Log Rotation:**
- Daily rotation
- Keep 30 days
- Compress old logs
### Error Tracking
**Platform:** Sentry
**Integration:**
- Server-side: Perl Sentry SDK
- Client-side: JavaScript Sentry SDK
**Captured:**
- Exceptions
- Error messages
- Stack traces
- Request context
- User context
### Metrics
**Current State:** No Prometheus/metrics endpoint
**Workaround:** Parse logs for metrics
**Future:** Prometheus exporter planned
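
The log-parsing workaround could be approximated as follows. This is a minimal sketch, not the project's tooling: the `[LEVEL]` line layout is an assumption about the log format, and `count_levels` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line layout: "2024-01-01 12:00:00 [ERROR] message".
# The real server log format may differ; adjust the pattern accordingly.
LEVEL_PATTERN = re.compile(r"\[(DEBUG|INFO|WARN|ERROR|FATAL)\]")


def count_levels(lines):
    """Count log lines per level; lines without a recognised level are skipped."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "2024-01-01 12:00:00 [INFO] request served",
    "2024-01-01 12:00:01 [ERROR] database timeout",
    "2024-01-01 12:00:02 [ERROR] database timeout",
]
print(count_levels(sample))
```

The resulting counts can be pushed to whatever monitoring system is in use; an error-rate alert only needs the `ERROR` and `FATAL` buckets.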
### Health Checks
**Current State:** No dedicated health check endpoint
**Workaround:** Check `/` returns 200
**Future:** `/health` endpoint planned
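
A probe for the "`/` returns 200" workaround might look like the sketch below. `is_healthy` and its `opener` parameter are hypothetical names introduced for illustration; the injectable opener only exists so the function can be exercised without a running server.

```python
import urllib.error
import urllib.request


def is_healthy(base_url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if GET <base_url>/ answers with HTTP 200."""
    try:
        with opener(base_url.rstrip("/") + "/", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTPError
        return False
```

For a local development instance this would be called as `is_healthy("http://localhost:5000")`, using the website port listed under Network Requirements.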
@@ -0,0 +1,513 @@
# MusicBrainz Server Evaluation
## Strengths
### 1. Canonical Music Metadata Source
**Evidence:** MusicBrainz is the de facto standard for music metadata. Used by:
- Spotify (artist/release matching)
- Last.fm (scrobbling normalization)
- Roon (music library management)
- Picard (music tagging)
- Beets (music organization)
- Hundreds of other music applications
**Impact:** Any music metadata aggregator must include MusicBrainz data to be comprehensive. It's the foundation that other services build upon.
**Data Quality:** Community-driven editing with voting system ensures high accuracy. Over 2 million edits per year, with auto-editors providing quality control.
### 2. Massive, Comprehensive Dataset
**Scale (as of 2024):**
- 2.1+ million artists
- 3.5+ million releases
- 30+ million recordings
- 1.5+ million works
- 250,000+ labels
- 100+ million relationships
**Coverage:** Extensive coverage across:
- All genres (classical, jazz, rock, electronic, world music, etc.)
- All eras (historical recordings to latest releases)
- All regions (global coverage with strong international community)
- All formats (vinyl, CD, digital, cassette, etc.)
**Relationships:** Rich relationship data connecting:
- Artists to recordings (performer, conductor, engineer, etc.)
- Recordings to works (performance of composition)
- Artists to artists (member of, collaboration, etc.)
- Releases to labels, areas, events, etc.
**Identifiers:** Comprehensive identifier coverage:
- ISRCs (International Standard Recording Code)
- ISWCs (International Standard Musical Work Code)
- Barcodes (EAN, UPC)
- Disc IDs (CD table of contents)
- External links (Wikipedia, Discogs, AllMusic, etc.)
### 3. Mature, Battle-Tested Codebase
**Age:** 20+ years of continuous development (since 2001)
**Stability:** Proven reliability serving millions of requests daily with minimal downtime.
**Evolution:** Gradual modernization while maintaining backward compatibility:
- Started with Template Toolkit (still used)
- Added Knockout.js (being phased out)
- Migrating to React (ongoing)
- API has remained stable since v2 (2011)
**Community:** Large, active open-source community:
- 500+ contributors on GitHub
- Active development (commits daily)
- Responsive to issues and pull requests
- Strong documentation culture
### 4. Comprehensive, Well-Designed API
**Maturity:** API v2 stable since 2011, widely adopted
**Formats:** Multiple serialization formats:
- JSON (modern, widely supported)
- XML (legacy, still used by many clients)
- JSON-LD (semantic web, Schema.org vocabulary)
**Features:**
- Lookup by MBID (unique identifier)
- Browse by relationships (all releases by artist, etc.)
- Search with Lucene query syntax
- Include parameters for fine-grained control
- Pagination for large result sets
- CORS enabled for browser clients
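
Browse requests and pagination combine as sketched below. The `limit`, `offset`, and `fmt` query parameters and the `releases` / `release-count` response keys are part of the web service; `browse_url`, `browse_all`, and the caller-supplied `fetch_json` function are hypothetical names for this illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://musicbrainz.org/ws/2/"


def browse_url(entity, link_entity, mbid, limit=100, offset=0):
    """Build a browse URL, e.g. all releases linked to an artist MBID."""
    query = urlencode(
        {link_entity: mbid, "limit": limit, "offset": offset, "fmt": "json"}
    )
    return f"{BASE_URL}{entity}?{query}"


def browse_all(fetch_json, entity, link_entity, mbid, limit=100):
    """Yield every item of a browse result by following offset pagination.

    fetch_json takes a URL and returns the decoded JSON body; it is expected
    to handle the User-Agent header and rate limiting itself.
    """
    offset = 0
    while True:
        page = fetch_json(browse_url(entity, link_entity, mbid, limit, offset))
        items = page.get(f"{entity}s", [])
        yield from items
        offset += len(items)
        # Stop on an empty page or once the reported total has been reached
        if not items or offset >= page.get(f"{entity}-count", 0):
            break
```

The sketch derives the JSON list key by appending `s` to the entity name, which holds for `release`, `recording`, and `release-group` but not for every entity (for example `series`).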
**Rate Limiting:** Clearly documented limit of, on average, 1 request per second per client IP; requests above that rate are declined with HTTP 503
**Authentication:** Modern OAuth2 with PKCE for user-specific operations
**Documentation:** Comprehensive API docs with examples at musicbrainz.org/doc/Development/XML_Web_Service/Version_2
### 5. Transparent Edit/Voting System
**Command Pattern:** All modifications are versioned edits, providing:
- Full audit trail (who changed what, when, why)
- Rollback capability (edits can be reverted)
- Transparency (all edits publicly visible)
- Accountability (editors build reputation)
**Community Quality Control:**
- 7-day voting period for most edits
- Community votes yes/no/abstain
- Auto-editors can approve immediately (earned privilege)
- Failed edits can be resubmitted with improvements
**Edit Types:** 100+ edit types covering all operations:
- Create/edit/delete entities
- Add/edit/delete relationships
- Merge duplicates
- Add identifiers (ISRC, barcode, etc.)
**Benefits:**
- High data quality through peer review
- Prevents vandalism and spam
- Encourages collaboration and discussion
- Builds trust in the data
### 6. Replication Support for Mirrors
**Architecture:** Master-Mirror via dbmirror2 packet system
**Use Cases:**
- Organizations needing local copy (reduced latency, offline access)
- High-volume API users (avoid rate limits)
- Research projects (full dataset access)
- Backup/disaster recovery
**Replication Packets:**
- Incremental updates (not full dumps)
- Hourly packets available
- Efficient bandwidth usage
- Verifiable integrity
**Mirror Benefits:**
- Full read access to entire dataset
- No rate limiting
- Custom queries and analytics
- Integration with internal systems
### 7. Rich Relationship Model
**Advanced Relationships:** Not just artist-to-release, but:
- Artist-to-artist (member of, collaboration, married to, etc.)
- Recording-to-work (performance of composition)
- Release-to-event (recorded at festival, etc.)
- Work-to-work (arrangement of, medley of, etc.)
**Relationship Attributes:**
- Dates (begin/end)
- Credits (custom artist credits)
- Instruments (performer played guitar, etc.)
- Roles (producer, engineer, etc.)
**Use Cases:**
- Music discovery (find similar artists)
- Discography completeness (all releases by artist)
- Session musician tracking (who played on what)
- Classical music (composer, conductor, orchestra, etc.)
## Weaknesses
### 1. Perl Language Ecosystem Decline
**Evidence:**
- Perl ranked #19 in TIOBE index (down from top 5 in 2000s)
- Declining CPAN module releases (peak 2014, declining since)
- Fewer Perl developers entering workforce
- Most new web projects use Python, JavaScript, Go, Rust
**Impact:**
- Harder to recruit Perl developers
- Smaller pool of contributors
- Slower adoption of modern practices
- Dependency on aging CPAN modules
**Mitigation:**
- MusicBrainz has stable, experienced Perl team
- Codebase is well-documented
- Gradual migration to JavaScript on frontend
- API allows language-agnostic integration
**Reality Check:** While Perl is declining, MusicBrainz's Perl codebase is mature and stable. The bigger risk is long-term maintainability (10+ years), not immediate functionality.
### 2. Heavy Infrastructure Requirements
**Database Size:** ~350GB for production dataset (with indexes)
**Resource Requirements:**
- 8+ CPU cores
- 16+ GB RAM
- 500+ GB SSD storage
- PostgreSQL 16+ (specific version requirement)
- Redis (16 databases)
- Apache Solr (13 cores)
**Deployment Complexity:**
- Multiple services to coordinate
- Complex build process (Perl + Node.js)
- Long initial setup (schema load, index build)
- Replication setup requires FTP server
**Cost Implications:**
- Self-hosting requires dedicated server (~$200+/month)
- Cloud hosting even more expensive
- Bandwidth costs for replication
- Operational overhead (backups, monitoring, updates)
**Practical Impact:** For most use cases, using the public API is far more practical than self-hosting. Only large organizations with specific needs (high volume, custom queries, offline access) should consider self-hosting.
### 3. No Modern Observability
**Missing:**
- Prometheus metrics endpoint
- Structured logging (JSON logs)
- Distributed tracing (OpenTelemetry)
- Health check endpoint
- Readiness/liveness probes
**Current State:**
- Plain text logs
- No metrics export
- Manual log parsing for monitoring
- No standardized health checks
**Impact:**
- Harder to integrate with modern monitoring stacks (Grafana, Datadog, etc.)
- Limited visibility into performance bottlenecks
- Difficult to debug production issues
- No SLO/SLA tracking
**Workarounds:**
- Parse logs with Logstash/Fluentd
- Monitor HTTP responses
- Database query monitoring
- Custom metrics collection
**Future:** Prometheus exporter is planned but not yet implemented.
### 4. Incomplete Frontend Modernization
**Legacy Code:**
- Knockout.js still present in many views
- jQuery used extensively
- Inline JavaScript in templates
- Mixed Template Toolkit + React
**Evidence:**
- `root/static/scripts/` contains both Knockout and React
- Some pages fully React, others fully Knockout, some mixed
- Inconsistent UI patterns across pages
**Impact:**
- Larger JavaScript bundle size
- Maintenance burden (two frameworks)
- Inconsistent user experience
- Harder for new contributors
**Migration Status:**
- New features use React
- Old features gradually migrated
- No timeline for complete migration
- Knockout removal is low priority
**Reality Check:** This is a maintainability cost rather than a functional defect. The site works well despite the mixed frontend, and for API users it is irrelevant.
### 5. Custom ORM Instead of Standard
**Architecture:** Custom Moose-based data layer, not DBIx::Class
**Characteristics:**
- 106 Data modules (26,000 lines)
- Raw SQL via DBD::Pg
- Custom query builder (Sql.pm)
- Moose roles for common patterns
**Drawbacks:**
- Steeper learning curve for new contributors
- No ecosystem of plugins/extensions
- Manual query construction
- No automatic migrations
**Benefits:**
- Better performance (no ORM overhead)
- Full control over SQL
- Simpler for complex queries
- Fewer dependencies
**Reality Check:** The custom data layer is well-designed and battle-tested. Its cost lies in onboarding and long-term maintainability, not in functionality. For a project this mature, switching to a standard ORM such as DBIx::Class would be a massive undertaking with little benefit.
### 6. Limited Real-Time Capabilities
**Current State:**
- No WebSocket support
- No Server-Sent Events
- No real-time notifications
- Polling required for updates
**Impact:**
- Edit notifications delayed
- Search results not live-updated
- Collaborative editing limited
- Higher server load from polling
**Workarounds:**
- Redis pub/sub for internal events
- Periodic polling from clients
- Email notifications for edits
**Future:** Real-time features not prioritized (low demand).
## Integration Considerations
### API Integration (Recommended)
**Best For:**
- Most use cases
- Low to medium volume (<1M requests/month)
- No custom query requirements
- Budget-conscious projects
**Approach:**
```python
import requests
# Lookup artist by MBID
response = requests.get(
    'https://musicbrainz.org/ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da',
    params={'fmt': 'json', 'inc': 'releases+recordings'},
    headers={'User-Agent': 'MyApp/1.0 (contact@example.com)'},
    timeout=10,
)
response.raise_for_status()  # surface 4xx/5xx (e.g. 503 when rate limited)
artist = response.json()
```
**Advantages:**
- No infrastructure to manage
- Always up-to-date data
- No storage costs
- Simple integration
**Limitations:**
- Rate limiting (on average 1 request/second per IP; excess requests receive HTTP 503)
- Network latency
- No custom queries
- Dependent on MusicBrainz uptime
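
To put the rate limit in perspective, the arithmetic below shows the ceiling a single client can reach when sending continuously at 1 request per second. `max_requests` is a hypothetical helper used only for this calculation.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_requests(days, requests_per_second=1.0):
    """Upper bound on requests when sending continuously at the given rate."""
    return int(days * SECONDS_PER_DAY * requests_per_second)


print(max_requests(1))   # 86400
print(max_requests(30))  # 2592000
```

A 30-day month therefore tops out at roughly 2.6 million requests, so the "<1M requests/month" guidance leaves headroom for retries and bursts, while volumes above that ceiling cannot be served from the public API at all.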
**Best Practices:**
- Cache responses aggressively
- Respect rate limits
- Include User-Agent with contact info
- Handle errors gracefully
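
Respecting the rate limit on the client side can be as simple as spacing calls out. The following is a minimal sketch under the assumption of a single-threaded client; `RateLimiter` is a hypothetical class, and the injectable `clock` and `sleep` arguments exist only to make it testable.

```python
import time


class RateLimiter:
    """Space calls at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until the next request may be sent, then record the send time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` immediately before each HTTP request keeps the client at or below one request per second without any shared state.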
### Replication/Mirror (Advanced)
**Best For:**
- High volume (>10M requests/month)
- Custom queries and analytics
- Offline access required
- Research projects
**Approach:**
1. Set up PostgreSQL 16+ server (500GB+ storage)
2. Download initial database dump
3. Load schema and data
4. Configure replication (RT_MIRROR mode)
5. Download and apply hourly replication packets
**Advantages:**
- No rate limiting
- Full dataset access
- Custom queries
- Low latency
**Disadvantages:**
- High infrastructure cost (~$200+/month)
- Operational overhead
- Replication lag (minutes to hours)
- Storage requirements (350GB+)
**Maintenance:**
- Apply replication packets hourly
- Monitor replication lag
- Rebuild indexes periodically
- Backup database regularly
### Hybrid Approach (Optimal)
**Strategy:**
- Use API for lookups and searches
- Cache frequently accessed data locally
- Replicate subset of data for custom queries
- Fall back to API for cache misses
**Example:**
```python
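import requests
def get_artist(mbid, cache):
    # `cache` is any object exposing get(key) and set(key, value, ttl=...)
    # Check local cache first
    artist = cache.get(f'artist:{mbid}')
    if artist is None:
        # Cache miss - fetch from API (fmt=json is required; the default is XML)
        response = requests.get(
            f'https://musicbrainz.org/ws/2/artist/{mbid}',
            params={'fmt': 'json'},
            headers={'User-Agent': 'MyApp/1.0 (contact@example.com)'},
            timeout=10,
        )
        response.raise_for_status()
        artist = response.json()
        # Cache for 1 hour
        cache.set(f'artist:{mbid}', artist, ttl=3600)
    return artist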
```
**Benefits:**
- Lower API usage (respect rate limits)
- Faster response times
- Reduced infrastructure costs
- Graceful degradation
## Relevance to Metadata Aggregator Project
### Primary Data Source
**Role:** MusicBrainz is the foundational open music metadata source. Many other music metadata projects reference or build upon it:
- **Discogs:** Cross-references MusicBrainz IDs
- **Last.fm:** Uses MusicBrainz for artist/track normalization
- **AcousticBrainz:** Audio analysis keyed by MusicBrainz recording ID (project discontinued in 2022)
- **ListenBrainz:** Listening history using MusicBrainz IDs
- **CritiqueBrainz:** Reviews keyed by MusicBrainz release ID
**Implication:** A metadata aggregator without MusicBrainz is incomplete. MusicBrainz provides the canonical identifiers (MBIDs) that link data across services.
### Integration Priority: Critical
**Rationale:**
1. **Canonical IDs:** MBIDs are the standard for music entity identification
2. **Comprehensive Coverage:** Largest open music metadata database
3. **Relationship Data:** Rich connections between entities
4. **Community Trust:** High data quality through peer review
5. **API Stability:** Mature, stable API with long-term support
**Recommended Integration:**
- Use MusicBrainz API as primary metadata source
- Cache responses locally (1-hour TTL)
- Use MBIDs as primary keys in aggregator database
- Cross-reference with other sources (Discogs, Last.fm, etc.)
- Contribute improvements back to MusicBrainz
### Data Model Alignment
**MusicBrainz Entities Map Well to Aggregator Needs:**
| MusicBrainz Entity | Aggregator Use Case |
|-------------------|---------------------|
| Artist | Artist profiles, discographies |
| Release | Album/single metadata |
| Recording | Track metadata, audio fingerprinting |
| Work | Composition metadata, cover detection |
| Label | Label discographies, release attribution |
| Relationship | Music discovery, session musician tracking |
**Identifiers:**
- MBID as primary key
- ISRC for recording matching
- Barcode for release matching
- Disc ID for CD identification
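
Because MBIDs are UUIDs, they can be validated and normalized before being stored as primary keys. The sketch below assumes the aggregator wants the canonical lowercase, hyphenated form only; `normalize_mbid` is a hypothetical helper.

```python
import uuid


def normalize_mbid(value):
    """Return the canonical lowercase MBID string, or raise ValueError."""
    text = value.strip()
    parsed = uuid.UUID(text)  # raises ValueError on malformed input
    canonical = str(parsed)
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and missing hyphens;
    # reject those so only the canonical 36-character form is stored.
    if canonical != text.lower():
        raise ValueError(f"not a canonical MBID: {value!r}")
    return canonical
```

Normalizing at the boundary avoids duplicate rows that differ only in letter case.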
### Complementary Data Sources
**MusicBrainz Strengths:**
- Canonical entity IDs
- Relationship data
- Release metadata
- Identifier coverage
**MusicBrainz Gaps (fill with other sources):**
- Album reviews → CritiqueBrainz, AllMusic
- Listening statistics → Last.fm, Spotify
- Audio features → AcousticBrainz (discontinued in 2022; archived data only), Spotify
- Lyrics → Genius (LyricWiki shut down in 2020)
- Album art → Cover Art Archive (integrated)
- Popularity metrics → Last.fm, Spotify
### Implementation Roadmap
**Phase 1: Basic Integration**
1. Implement MusicBrainz API client
2. Cache artist/release/recording lookups
3. Store MBIDs as primary keys
4. Handle rate limiting gracefully
**Phase 2: Enhanced Integration**
1. Implement relationship traversal
2. Add search functionality
3. Integrate Cover Art Archive
4. Add identifier lookups (ISRC, barcode)
**Phase 3: Advanced Integration**
1. Consider replication for high volume
2. Contribute improvements to MusicBrainz
3. Implement edit submission (if applicable)
4. Add real-time update monitoring
**Phase 4: Ecosystem Integration**
1. Integrate complementary services (Last.fm, etc.)
2. Cross-reference data across sources
3. Resolve conflicts and duplicates
4. Build unified metadata view
## Conclusion
**Overall Assessment:** MusicBrainz is an essential, high-quality music metadata source with a mature codebase and comprehensive API. While it has some technical debt (Perl, legacy frontend, custom ORM), these are manageable and don't impact its value as a data source.
**Recommendation for Metadata Aggregator:**
- **Priority:** Critical - integrate early
- **Approach:** API-based with aggressive caching
- **Timeline:** Phase 1 in first sprint
- **Resources:** Low (API integration is straightforward)
**Key Takeaway:** MusicBrainz is the foundation of music metadata. Any serious music metadata aggregator must integrate MusicBrainz to be comprehensive and credible.
@@ -0,0 +1,529 @@
# MusicBrainz Server Integrations
## Cover Art Archive
### Overview
**Service:** Cover Art Archive (coverartarchive.org)
**Storage:** Internet Archive (accessed through its S3-compatible API)
**Purpose:** Store and serve album cover artwork
### Upload Process
**Method:** Signed POST to the Internet Archive's S3-compatible upload endpoint
**Authentication:** HMAC-SHA1 signed policy
**Configuration:**
```perl
# DBDefs.pm
sub COVER_ART_ARCHIVE_ACCESS_KEY { 'access_key' }
sub COVER_ART_ARCHIVE_SECRET_KEY { 'secret_key' }
sub COVER_ART_ARCHIVE_UPLOAD_PREFIXER { 'MB' }
sub COVER_ART_ARCHIVE_DOWNLOAD_PREFIX { 'https://coverartarchive.org' }
```
**Upload Flow:**
1. User uploads image via MusicBrainz interface
2. Server generates S3 policy document
3. Policy signed with HMAC-SHA1 using secret key
4. Browser POSTs directly to the Internet Archive's S3-compatible endpoint with the signed policy
5. The Internet Archive stores the image in the release's `mbid-{release_mbid}` bucket
6. Image becomes available at coverartarchive.org
**Policy Document:**
```json
{
"expiration": "2024-12-31T23:59:59Z",
"conditions": [
{"bucket": "mbid-{release_mbid}"},
{"acl": "public-read"},
["starts-with", "$key", "mbid-{release_mbid}/"],
["content-length-range", 0, 10485760]
]
}
```
**Signature:**
```perl
use Digest::SHA qw(hmac_sha1_base64);
use MIME::Base64 qw(encode_base64);
my $policy_b64 = encode_base64($policy_json, ''); # '' suppresses the default line breaks
my $signature = hmac_sha1_base64($policy_b64, $secret_key);
$signature .= '=' while length($signature) % 4; # Digest::SHA omits base64 padding
```
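
For aggregator code written in Python, the same signing scheme (base64-encode the policy, then HMAC-SHA1 the encoded string) can be sketched as below. `sign_policy` is a hypothetical helper and the sample key is a placeholder; this illustrates the scheme described above rather than the server's actual upload code.

```python
import base64
import hashlib
import hmac
import json


def sign_policy(policy, secret_key):
    """Return (policy_b64, signature_b64) for a POST policy document."""
    policy_json = json.dumps(policy, separators=(",", ":"))
    policy_b64 = base64.b64encode(policy_json.encode("utf-8")).decode("ascii")
    digest = hmac.new(
        secret_key.encode("utf-8"), policy_b64.encode("ascii"), hashlib.sha1
    ).digest()
    # b64encode pads its output, so no manual '=' padding is needed here
    signature_b64 = base64.b64encode(digest).decode("ascii")
    return policy_b64, signature_b64
```

Both returned values go into the upload form: the encoded policy as the `policy` field and the signature alongside it.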
### Retrieval
**URL Pattern:** `https://coverartarchive.org/release/{mbid}/front`
**Image Types:**
- `front` - Front cover
- `back` - Back cover
- `{id}` - Specific image by ID
**Sizes:**
- Original (full resolution)
- `250` - 250px thumbnail
- `500` - 500px thumbnail
- `1200` - 1200px large
**Example:**
```
https://coverartarchive.org/release/76df3287-6cda-33eb-8e9a-044b5e15ffdd/front-250.jpg
```
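
A small URL builder keeps the pattern and the allowed thumbnail sizes in one place. This sketch follows the URL forms shown above (including the `.jpg` suffix used in the example) and only covers `front` and `back`; `cover_art_url` is a hypothetical helper.

```python
CAA_BASE = "https://coverartarchive.org"
THUMBNAIL_SIZES = {250, 500, 1200}


def cover_art_url(release_mbid, image="front", size=None):
    """Build a Cover Art Archive URL for a release's front or back image."""
    if image not in ("front", "back"):
        raise ValueError(f"unsupported image type: {image!r}")
    if size is None:
        # Original, full-resolution image
        return f"{CAA_BASE}/release/{release_mbid}/{image}"
    if size not in THUMBNAIL_SIZES:
        raise ValueError(f"unsupported thumbnail size: {size!r}")
    return f"{CAA_BASE}/release/{release_mbid}/{image}-{size}.jpg"


print(cover_art_url("76df3287-6cda-33eb-8e9a-044b5e15ffdd", size=250))
```

Requests for releases without artwork return an error status, so callers should treat a missing cover as a normal case rather than a failure.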
## Wikipedia/Wikidata/Wikimedia Commons
### MediaWiki API Integration
**Purpose:** Fetch article extracts, images, and structured data
**Endpoints:**
- Wikipedia: `https://{lang}.wikipedia.org/w/api.php`
- Wikidata: `https://www.wikidata.org/w/api.php`
- Commons: `https://commons.wikimedia.org/w/api.php`
### Wikipedia Extracts
**API Action:** `query` with `prop=extracts`
**Request:**
```perl
my $url = "https://en.wikipedia.org/w/api.php?" .
"action=query&" .
"prop=extracts&" .
"exintro=1&" .
"explaintext=1&" .
"titles=" . uri_escape($artist_name) .
"&format=json";
my $response = $ua->get($url);
my $data = decode_json($response->content);
```
**Caching:** 3 days for extracts
**Display:** Artist/release pages show Wikipedia extract in sidebar
### Language Links
**API Action:** `query` with `prop=langlinks`
**Purpose:** Find Wikipedia articles in different languages
**Request:**
```perl
my $url = "https://en.wikipedia.org/w/api.php?" .
"action=query&" .
"prop=langlinks&" .
"titles=" . uri_escape($title) .
"&lllimit=500" .
"&format=json";
```
**Caching:** 7 days for language links
**Usage:** Display Wikipedia links in user's preferred language
### Wikidata Integration
**Purpose:** Fetch structured data (birth dates, locations, etc.)
**API Action:** `wbgetentities`
**Request:**
```perl
my $url = "https://www.wikidata.org/w/api.php?" .
"action=wbgetentities&" .
"ids=Q$wikidata_id&" . # $wikidata_id holds the numeric part of the item ID
"format=json";
```
**Data Extracted:**
- Birth/death dates
- Birth/death places
- Occupations
- Genres
- Record labels
- Official websites
### Wikimedia Commons Images
**Purpose:** Fetch artist/band photos
**API Action:** `query` with `prop=imageinfo`
**Request:**
```perl
my $url = "https://commons.wikimedia.org/w/api.php?" .
"action=query&" .
"prop=imageinfo&" .
"iiprop=url|size|mime&" .
"titles=File:" . uri_escape($filename) .
"&format=json";
```
**Display:** Artist pages show Commons images in sidebar
## CritiqueBrainz
### Overview
**Service:** CritiqueBrainz (critiquebrainz.org)
**Purpose:** User-generated music reviews
### Integration
**Method:** URL linking
**Pattern:** `https://critiquebrainz.org/release/{mbid}`
**Display:** Release pages show link to CritiqueBrainz reviews
**Embedding:** Review count and average rating displayed on release pages
**API:** CritiqueBrainz API used to fetch review statistics
**Request:**
```perl
my $url = "https://critiquebrainz.org/ws/1/release/$mbid";
my $response = $ua->get($url);
my $data = decode_json($response->content);
my $review_count = $data->{review_count};
my $avg_rating = $data->{average_rating};
```
## Event Art Archive
### Overview
**Service:** Event Art Archive
**Purpose:** Store event posters and promotional materials
**Architecture:** Similar to Cover Art Archive (Internet Archive storage via its S3-compatible API)
**URL Pattern:** `https://eventartarchive.org/event/{mbid}`
## Discourse SSO
### Overview
**Service:** MusicBrainz Community Forum (community.metabrainz.org)
**Protocol:** Discourse SSO (Single Sign-On)
### Authentication Flow
**Method:** HMAC-SHA256 signed payload
**Flow:**
1. User clicks "Log in" on Discourse
2. Discourse redirects to MusicBrainz with nonce
3. MusicBrainz authenticates user
4. MusicBrainz generates SSO payload
5. Payload signed with HMAC-SHA256
6. User redirected back to Discourse with signed payload
7. Discourse verifies signature and logs in user
**Configuration:**
```perl
# DBDefs.pm
sub DISCOURSE_SSO_SECRET { 'shared_secret' }
sub DISCOURSE_SERVER { 'https://community.metabrainz.org' }
```
**Payload Generation:**
```perl
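use Digest::SHA qw(hmac_sha256_hex);
use MIME::Base64 qw(encode_base64);
use URI::Escape qw(uri_escape uri_escape_utf8);
# URL-encode each value, then base64-encode the query string without line breaks
my $payload = encode_base64(
    "nonce=" . uri_escape_utf8($nonce) .
    "&email=" . uri_escape_utf8($email) .
    "&external_id=" . uri_escape_utf8($user_id) .
    "&username=" . uri_escape_utf8($username) .
    "&name=" . uri_escape_utf8($name),
    ''
);
my $signature = hmac_sha256_hex($payload, $sso_secret);
my $redirect_url = "$discourse_server/session/sso_login?" .
    "sso=" . uri_escape($payload) .
    "&sig=$signature";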
```
**User Data Synced:**
- Email address
- Username
- Display name
- User ID (external_id)
- Avatar URL (optional)
- Admin status (optional)
- Moderator status (optional)
## MetaBrainz OAuth
### Overview
**Service:** Centralized OAuth provider for MetaBrainz services
**Protocol:** OAuth 2.0 with token introspection
### Token Introspection
**Endpoint:** `https://musicbrainz.org/oauth2/introspect`
**Method:** POST
**Request:**
```perl
my $response = $ua->post(
'https://musicbrainz.org/oauth2/introspect',
{
token => $access_token,
client_id => $client_id,
client_secret => $client_secret,
}
);
my $data = decode_json($response->content);
```
**Response:**
```json
{
"active": true,
"scope": "profile email tag rating collection",
"client_id": "client_id",
"username": "username",
"token_type": "Bearer",
"exp": 1609459200,
"iat": 1609372800,
"sub": "user_id"
}
```
**Usage:** Other MetaBrainz services (ListenBrainz, BookBrainz, etc.) validate tokens via introspection
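
A consuming service would typically check three things in the introspection response before honoring a request: that the token is active, that it has not expired, and that it carries the required scope. The sketch below works on the response shape shown above; `token_allows` is a hypothetical helper.

```python
import time


def token_allows(introspection, required_scope, now=None):
    """Return True if the token is active, unexpired, and grants required_scope."""
    if not introspection.get("active"):
        return False
    now = time.time() if now is None else now
    exp = introspection.get("exp")
    if exp is not None and exp <= now:
        return False
    # "scope" is a space-separated list of granted scopes
    granted = introspection.get("scope", "").split()
    return required_scope in granted
```

Passing `now` explicitly makes the expiry check deterministic in tests; production callers can omit it.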
### Services Using MetaBrainz OAuth
- ListenBrainz (listening history)
- BookBrainz (book metadata)
- CritiqueBrainz (music reviews)
- AcousticBrainz (audio analysis)
- Picard (music tagger)
## Replication System
### Overview
**Purpose:** Synchronize database changes from master to mirrors
**Protocol:** dbmirror2 packet system
### Replication Modes
**RT_MASTER:**
- Generates replication packets
- Writes to `dbmirror_pending` and `dbmirror_pendingdata` tables
- Exports packets for mirrors
**RT_MIRROR:**
- Consumes replication packets
- Applies changes from master
- Read-only (no edits)
**RT_STANDALONE:**
- No replication
- Fully independent database
**Configuration:**
```perl
# DBDefs.pm
sub REPLICATION_TYPE { RT_MASTER } # or RT_MIRROR or RT_STANDALONE
sub REPLICATION_ACCESS_TOKEN { 'secret_token' }
```
### Packet Structure
**Tables:**
- `dbmirror_pending` - Pending transactions
- `dbmirror_pendingdata` - Data changes (INSERT/UPDATE/DELETE)
**Packet Format:**
```
SeqId: 12345
TransactionId: 67890
Operation: i # i=INSERT, u=UPDATE, d=DELETE
TableName: artist
Data: {"id":123,"gid":"...","name":"..."}
```
### Replication Flow
**Master Side:**
1. Edit applied to database
2. Triggers capture changes to `dbmirror_pending`
3. Export script generates replication packets
4. Packets uploaded to FTP server
**Mirror Side:**
1. Download replication packets from FTP
2. Apply packets in sequence order
3. Update replication state
4. Verify data integrity
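
The "apply packets in sequence order" step implies that a mirror must stop at the first missing packet instead of skipping ahead. A minimal sketch of that planning logic, with `plan_packets` as a hypothetical helper operating on plain sequence numbers:

```python
def plan_packets(current_sequence, available):
    """Return the packet sequence numbers that can be applied now, in order.

    Stops at the first gap, because applying packets out of order would
    leave the mirror inconsistent with the master.
    """
    plan = []
    expected = current_sequence + 1
    for sequence in sorted(set(available)):
        if sequence < expected:
            continue  # already applied
        if sequence != expected:
            break  # gap: wait for the missing packet
        plan.append(sequence)
        expected += 1
    return plan


print(plan_packets(100, [103, 101, 102, 105]))  # [101, 102, 103]
```

Packet 105 is held back in the example until 104 arrives, which is also what makes replication lag visible: the mirror's current sequence simply stops advancing.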
**Packet Export:**
```bash
# On master
./admin/replication/ExportReplicationPackets
# Generates packets in replication/ directory
# Uploads to FTP server
```
**Packet Import:**
```bash
# On mirror
./admin/replication/LoadReplicationChanges
# Downloads packets from FTP
# Applies changes to database
```
### Replication Lag
**Monitoring:** Mirrors track replication lag (time behind master)
**Typical Lag:** Minutes to hours depending on packet size and network
**Status Endpoint:** `/replication-status` shows current replication state
## Redis Integration
### Architecture
**Connection:** Single Redis instance, 16 databases (0-15)
**Configuration:**
```perl
# DBDefs.pm
sub REDIS_SERVER { 'localhost:6379' }
sub REDIS_NAMESPACE { 'MB' }
```
### Use Cases
**Session Management (DB 1):**
- Store user sessions
- 10 hour absolute expiry
- 3 hour idle timeout
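
The two session limits interact: a session ends at whichever limit is hit first. The sketch below expresses that rule with Unix timestamps; `session_is_valid` and the constant names are hypothetical, and only the 10-hour and 3-hour values come from the description above.

```python
ABSOLUTE_LIMIT = 10 * 60 * 60  # 10 hours since the session was created
IDLE_LIMIT = 3 * 60 * 60       # 3 hours since the last request


def session_is_valid(created_at, last_seen_at, now):
    """All arguments are Unix timestamps in seconds."""
    if now - created_at >= ABSOLUTE_LIMIT:
        return False  # absolute expiry reached, regardless of activity
    return now - last_seen_at < IDLE_LIMIT
```

In Redis the idle timeout maps naturally onto a key TTL that is refreshed on each request, while the absolute expiry needs the creation time stored with the session.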
**Entity Cache (DB 0):**
- Cache entity lookups by MBID
- 1 hour TTL
- Invalidate on edit
**Search Cache (DB 2):**
- Cache search results
- 15 minute TTL
**Statistics Cache (DB 3):**
- Cache homepage statistics
- 1 hour TTL
**Rate Limiting (DB 4):**
- Track API request counts
- 1 second sliding window
**Pub/Sub (DB 5):**
- Real-time notifications
- Edit submission events
- Cache invalidation events
### Connection Pooling
**Library:** Redis.pm with connection pooling
**Pool Size:** 10 connections per worker
**Reconnection:** Automatic reconnection on connection loss
## HTTP Client
### LWP::UserAgent
**Purpose:** HTTP client for external service communication
**Configuration:**
```perl
use LWP::UserAgent;
my $ua = LWP::UserAgent->new(
agent => 'MusicBrainz/1.0 (https://musicbrainz.org)',
timeout => 30,
max_redirect => 5,
);
```
**User-Agent:** Always identifies as MusicBrainz with contact URL
**Timeout:** 30 seconds default
**Redirects:** Follow up to 5 redirects
**SSL Verification:** Enabled by default
### Rate Limiting
**External Services:** Respect rate limits via delays
**Wikipedia API:** 1 request per second (recommended)
**Wikidata API:** 1 request per second (recommended)
**Implementation:**
```perl
use Time::HiRes qw(sleep time); # import time too, so elapsed time has sub-second resolution
my $last_request_time = 0;
sub rate_limited_request {
my ($url) = @_;
my $elapsed = time() - $last_request_time;
if ($elapsed < 1) {
sleep(1 - $elapsed);
}
my $response = $ua->get($url);
$last_request_time = time();
return $response;
}
```
### Error Handling
**Retry Logic:** Exponential backoff for transient errors
**Timeouts:** Fail gracefully on timeout
**Logging:** Log all external service errors to Sentry
**Example:**
```perl
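my $response;
my $retries = 3;
for my $attempt (1..$retries) {
    # LWP::UserAgent returns an error response instead of dying,
    # so check the status rather than wrapping the call in try/catch
    $response = $ua->get($url);
    last if $response->is_success;
    warn "Request failed (attempt $attempt): " . $response->status_line;
    sleep(2 ** $attempt) if $attempt < $retries; # Exponential backoff
}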
```
@@ -0,0 +1,271 @@
# MusicBrainz Server Overview
## Project Identity
**Name:** MusicBrainz Server
**Repository:** https://github.com/metabrainz/musicbrainz-server
**License:** GPL-2.0+
**Description:** Open music encyclopedia that collects music metadata and makes it available to the public. Community-maintained database of music information including artists, releases, recordings, works, labels, and the relationships between them.
## Technology Stack
### Backend
**Primary Language:** Perl 5.38+
**Web Framework:** Catalyst (MVC framework)
**Object System:** Moose (modern Perl OOP)
**Core Perl Dependencies:**
- Catalyst::Runtime - Web application framework
- Moose - Modern object system for Perl
- DBD::Pg - PostgreSQL database driver
- Template::Toolkit - Template processing system
- Plack - PSGI toolkit and server adapters
- Redis - Perl Redis client
- JSON::XS - Fast JSON encoding/decoding
- XML::LibXML - XML processing
- DBIx::Connector - Fast, safe DBI connection management
- Readonly - Facility for creating read-only scalars, arrays, hashes
- Digest::SHA - SHA message digest algorithm
- LWP::UserAgent - HTTP client
- DateTime - Date and time object
- List::AllUtils - List manipulation utilities
- Try::Tiny - Minimal try/catch
- Class::Load - Load modules by name
- namespace::autoclean - Keep imports out of namespace
### Frontend
**Primary Language:** JavaScript (ES6+)
**UI Framework:** React 19.2.4
**State Management:** Redux
**Legacy Framework:** Knockout.js (still present in some views)
**Core JavaScript Dependencies:**
- React 19.2.4 - UI component library
- Redux - State management
- Webpack 5 - Module bundler
- Babel 7 - JavaScript compiler
- knockout - Legacy MVVM framework
- jQuery - DOM manipulation (legacy)
- lodash - Utility library
- immutable - Immutable data structures
- weight-balanced-tree - Efficient tree data structure
### Infrastructure
**Database:** PostgreSQL 16+
- 375 tables
- 500+ foreign key constraints
- Full-text search capabilities
- Custom replication via dbmirror2
**Cache:** Redis
- 16 separate databases
- Entity caching
- Session storage
- Pub/sub messaging
**Search:** Apache Solr
- Primary search engine
- PostgreSQL full-text as fallback
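
The primary/fallback arrangement for search can be sketched as an ordered list of backends that are tried in turn. This is an illustration of the pattern only, under the assumption that a failing backend raises an exception; `search_with_fallback`, `$solr_search`, and `$pg_search` are hypothetical stand-ins and do not talk to Solr or PostgreSQL.

```perl
use strict;
use warnings;

# Try each search backend in order and return the first result set obtained.
# Each backend is [name, coderef]; a backend signals failure by dying.
sub search_with_fallback {
    my ($query, @backends) = @_;
    for my $backend (@backends) {
        my ($name, $search) = @$backend;
        my $hits = eval { $search->($query) };
        return { backend => $name, hits => $hits } if defined $hits;
        warn "Search backend '$name' failed: $@";
    }
    die "All search backends failed for query '$query'\n";
}

# Hypothetical stand-ins: the primary is unreachable, the fallback answers.
my $solr_search = sub { die "connection refused\n" };
my $pg_search   = sub { my ($q) = @_; return ["row matching '$q'"] };

my $result = search_with_fallback(
    'nirvana',
    [solr       => $solr_search],
    [postgresql => $pg_search],
);
print "Answered by $result->{backend}: @{ $result->{hits} }\n";
```
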
**Message Queue:** RabbitMQ (for background jobs)
## System Prerequisites
**Required:**
- Perl 5.38+ (5.42.0 tested in CI)
- Node.js 20.9+
- PostgreSQL 16+
- Redis 6.0+
- Apache Solr 8.11+
**Optional:**
- Docker + Docker Compose (for containerized deployment)
- RabbitMQ (for background job processing)
## Entry Point
**File:** `app.psgi`
**Initialization Flow:**
1. `app.psgi` loads the Plack middleware stack
2. Initializes `MusicBrainz::Server` Catalyst application
3. Loads configuration from `DBDefs.pm`
4. Establishes database connections via `DBIx::Connector`
5. Initializes Redis connection pool
6. Forks template renderer process for isolation
7. Loads Catalyst controllers, models, and views
8. Mounts PSGI application
**Middleware Stack:**
- Plack::Middleware::ReverseProxy - Handle X-Forwarded headers
- Plack::Middleware::Static - Serve static files
- Plack::Middleware::Session - Session management
- Custom middleware for CSRF protection
- Custom middleware for request logging
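
To make the layering concrete: under PSGI, an application is a code reference that takes an environment hash and returns `[status, headers, body]`, and a middleware is a function that wraps one such code reference in another. The sketch below uses only core Perl; `$app`, `with_forwarded_host`, and `with_request_log` are hypothetical, simplified stand-ins, not the actual MusicBrainz middleware or the Plack implementations.

```perl
use strict;
use warnings;

# A PSGI application: takes the request environment and returns
# [status, [header pairs], [body chunks]].
my $app = sub {
    my ($env) = @_;
    my $host = $env->{HTTP_HOST} // 'unknown';
    return [200, ['Content-Type' => 'text/plain'], ["Hello from $host\n"]];
};

# Middleware that takes the host from X-Forwarded-Host when a reverse proxy
# supplied one (a simplified stand-in for reverse-proxy handling).
sub with_forwarded_host {
    my ($inner) = @_;
    return sub {
        my ($env) = @_;
        my %copy = %$env;
        $copy{HTTP_HOST} = $copy{HTTP_X_FORWARDED_HOST}
            if defined $copy{HTTP_X_FORWARDED_HOST};
        return $inner->(\%copy);
    };
}

# Middleware that records method, path, and response status per request.
sub with_request_log {
    my ($inner, $log) = @_;
    return sub {
        my ($env) = @_;
        my $res = $inner->($env);
        push @$log, "$env->{REQUEST_METHOD} $env->{PATH_INFO} -> $res->[0]";
        return $res;
    };
}

# Wrap from the inside out: the outermost middleware sees the request first.
my @log;
my $stack = with_request_log(with_forwarded_host($app), \@log);

my $res = $stack->({
    REQUEST_METHOD        => 'GET',
    PATH_INFO             => '/ws/2/artist',
    HTTP_HOST             => 'backend:5000',
    HTTP_X_FORWARDED_HOST => 'musicbrainz.org',
});
print $res->[2][0];
```

Because each layer only sees a code reference, layers can be reordered or removed without changing the application itself.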
## Codebase Scale
**Perl:**
- 1,866 Perl files
- 53 controllers (13,000 lines)
- 106 Data modules (26,000 lines)
- 132 entity classes
- 43 form modules
- 4 view modules
**JavaScript:**
- 1,447 JavaScript files
- React components
- Redux reducers and actions
- Legacy Knockout view models
**Database:**
- 375 tables
- 332 migration files
- 4,068 lines in CreateTables.sql
**Tests:**
- Perl unit tests (t/)
- JavaScript tests (Jest)
- pgTAP database tests
- Selenium integration tests (4 partitions)
## Build Process
### Perl Dependencies
```bash
# Install Carton (Perl dependency manager)
cpanm Carton
# Install Perl dependencies from cpanfile.snapshot
carton install
```
### JavaScript Dependencies
```bash
# Install Node.js dependencies
yarn install
```
### Asset Compilation
```bash
# Compile static resources (CSS, images, fonts)
./script/compile_resources.sh
# Build JavaScript bundles with Webpack
yarn run build
```
**Build Outputs:**
- `root/static/build/` - Compiled JavaScript bundles
- `root/static/styles/` - Compiled CSS
- `root/static/images/` - Optimized images
## Run Commands
### Development
```bash
# Using plackup (development server); -r restarts on changes under lib/ and the app.psgi directory
plackup -Ilib -r app.psgi
# Watch additional directories for changes with -R
plackup -Ilib -R lib,root -r app.psgi
```
### Production
```bash
# Using Starman (production PSGI server)
starman --workers 10 --listen :5000 app.psgi
# Using Server::Starter for zero-downtime restarts
start_server --port 5000 -- starman --workers 10 app.psgi
```
### Docker
```bash
# Build Docker images
docker-compose build
# Start all services
docker-compose up -d
# Start specific service
docker-compose up -d website
```
**Available Services:**
- `website` - Main web application
- `webservice` - API service
- `cron` - Scheduled tasks
- `sitemaps` - Sitemap generation
- `json-dump` - JSON data dumps
- `solr-backup` - Solr index backup
- `tests` - Test runner
## Directory Structure
```
musicbrainz-server/
├── admin/ # Database schema and migrations
│ ├── sql/
│ │ ├── CreateTables.sql
│ │ └── updates/ # 332 migration files
├── lib/ # Perl application code
│ └── MusicBrainz/
│ └── Server/
│ ├── Controller/ # 53 controllers
│ ├── Data/ # 106 data access modules
│ ├── Entity/ # 132 entity classes
│ ├── Form/ # 43 form handlers
│ ├── View/ # 4 view modules
│ ├── WebService/ # API implementation
│ └── Edit/ # Edit system
├── root/ # Frontend assets
│ ├── static/ # Static files
│ │ ├── scripts/ # JavaScript source
│ │ ├── styles/ # CSS/LESS
│ │ └── images/
│ └── layout.tt # Main template
├── t/ # Perl tests
├── docker/ # Docker configuration
├── script/ # Utility scripts
├── app.psgi # PSGI entry point
├── cpanfile # Perl dependencies
├── package.json # Node.js dependencies
└── webpack.config.js # Webpack configuration
```
## Configuration
**Primary Config:** `lib/DBDefs.pm`
**Two-Tier System:**
1. `lib/DBDefs/Default.pm` - Default values
2. `lib/DBDefs.pm` - Instance-specific overrides (not in git)
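
A two-tier layout like this maps naturally onto Perl inheritance: defaults live in a base class as methods, and the instance file subclasses it and overrides only what differs. The sketch below is an assumption about the mechanism, not a copy of the real files; the package names `My::Defaults` and `My::Config`, the settings `WEB_SERVER`, `REDIS_SERVER`, and `DB_READ_ONLY`, and all values are hypothetical.

```perl
use strict;
use warnings;

# Tier 1: defaults, expressed as class methods so any of them can be overridden.
package My::Defaults;
sub WEB_SERVER   { 'localhost:5000' }
sub REDIS_SERVER { '127.0.0.1:6379' }
sub DB_READ_ONLY { 0 }

# Tier 2: instance-specific overrides; everything not listed is inherited.
package My::Config;
our @ISA = ('My::Defaults');
sub WEB_SERVER   { 'music.example.org' }
sub DB_READ_ONLY { 1 }

package main;

# Callers always ask the instance tier; method resolution falls back to defaults.
for my $key (qw(WEB_SERVER REDIS_SERVER DB_READ_ONLY)) {
    printf "%-13s = %s\n", $key, My::Config->$key();
}
```

Keeping the override file out of version control, as noted above, means credentials and host names never need to be committed alongside the defaults.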
**Key Configuration Areas:**
- Database connection strings
- Redis connection parameters
- Solr endpoints
- External service credentials (Cover Art Archive, Wikipedia, etc.)
- Session settings
- Email configuration
- OAuth2 settings
- Feature flags
## Status
**Active Development:** Continuous development since 2001 (20+ years)
**Production Status:** Stable, serving millions of requests daily
**Community:** Large open-source community with hundreds of contributors
**Data Quality:** Community-driven editing, with changes reviewed through a voting system
**API Usage:** Powers metadata for major music services and applications worldwide