feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
@@ -0,0 +1,416 @@
# MusicBrainz Server API

## Base Endpoint

`/ws/2/{entity}/{mbid}`

**Version:** 2 (current stable)
**Protocol:** HTTPS (HTTP redirects to HTTPS)
**Base URL:** `https://musicbrainz.org/ws/2/`

## Endpoint Reference

### Core Entities (13)

| Entity | Endpoint | Description |
|--------|----------|-------------|
| artist | `/ws/2/artist/{mbid}` | Artists, bands, orchestras, choirs, characters |
| release | `/ws/2/release/{mbid}` | Physical or digital release of recordings |
| recording | `/ws/2/recording/{mbid}` | Unique audio recording |
| release-group | `/ws/2/release-group/{mbid}` | Logical grouping of releases |
| work | `/ws/2/work/{mbid}` | Musical composition or song |
| label | `/ws/2/label/{mbid}` | Record label or imprint |
| area | `/ws/2/area/{mbid}` | Geographic region (country, city, etc.) |
| event | `/ws/2/event/{mbid}` | Concert, festival, or other music event |
| place | `/ws/2/place/{mbid}` | Venue, studio, or other location |
| series | `/ws/2/series/{mbid}` | Ordered sequence of entities |
| instrument | `/ws/2/instrument/{mbid}` | Musical instrument |
| genre | `/ws/2/genre/{mbid}` | Music genre |
| url | `/ws/2/url/{mbid}` | External URL relationship |

### Identifier Lookups (3)

| Lookup | Endpoint | Description |
|--------|----------|-------------|
| discid | `/ws/2/discid/{discid}` | CD table of contents lookup |
| isrc | `/ws/2/isrc/{isrc}` | International Standard Recording Code |
| iswc | `/ws/2/iswc/{iswc}` | International Standard Musical Work Code |

### User Data Endpoints

| Endpoint | Methods | Description |
|----------|---------|-------------|
| `/ws/2/collection` | GET, POST, PUT, DELETE | User collections |
| `/ws/2/{entity}/{mbid}/tags` | GET, POST | User tags |
| `/ws/2/{entity}/{mbid}/ratings` | GET, POST | User ratings (0-100) |
| `/ws/2/{entity}/{mbid}/annotation` | GET | User annotations |

## HTTP Methods

### GET - Lookup

Retrieve a single entity by MBID:

```
GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da
```

### GET - Browse

Browse entities related to another entity:

```
GET /ws/2/release?artist=5b11f4ce-a62d-471e-81fc-a69a8278c7da
```

### GET - Search

Search entities using Lucene query syntax. The query is shown unencoded for readability; in a real request it must be percent-encoded:

```
GET /ws/2/artist?query=artist:nirvana AND country:US
```

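Lucene queries contain spaces and colons, so the `query` value needs percent-encoding before it is sent. The following is a minimal JavaScript sketch; `searchUrl` is a hypothetical helper name, not part of any MusicBrainz client library.

```javascript
// Hypothetical helper: builds a search URL for the given entity type.
const SEARCH_BASE_URL = 'https://musicbrainz.org/ws/2/';

function searchUrl(entity, luceneQuery, fmt = 'json') {
  // encodeURIComponent escapes the spaces, colons, and quotes in the query.
  const query = encodeURIComponent(luceneQuery);
  return `${SEARCH_BASE_URL}${entity}?query=${query}&fmt=${fmt}`;
}

console.log(searchUrl('artist', 'artist:nirvana AND country:US'));
// https://musicbrainz.org/ws/2/artist?query=artist%3Anirvana%20AND%20country%3AUS&fmt=json
```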
### POST - Submit

Submit new data (requires authentication):

```
POST /ws/2/recording/{mbid}?client={client_id}
Content-Type: application/json

{
  "isrcs": ["USRC17607839"]
}
```

### PUT - Add to Collection

Add entities to a collection (semicolon-separated MBIDs):

```
PUT /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2};{mbid3}
```

### DELETE - Remove from Collection

Remove entities from a collection:

```
DELETE /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2}
```

## Query Parameters

### Format Parameter

**Parameter:** `fmt`
**Values:** `xml`, `json`
**Default:** `xml`

```
/ws/2/artist/{mbid}?fmt=json
```

### Include Parameters (inc)

Control which related data is included in the response. Multiple values are separated by `+`.

**Common Includes (all entities):**
- `aliases` - Alternative names
- `annotation` - Latest annotation
- `tags` - Folksonomy tags
- `user-tags` - Tags submitted by authenticated user
- `genres` - Genre tags
- `user-genres` - Genres submitted by authenticated user
- `ratings` - Average rating
- `user-ratings` - Rating submitted by authenticated user

**Entity-Specific Includes:**

**Artist:**
- `recordings` - Recordings by this artist
- `releases` - Releases by this artist
- `release-groups` - Release groups by this artist
- `works` - Works by this artist
- `artist-rels` - Relationships to other artists
- `label-rels` - Relationships to labels
- `recording-rels` - Relationships to recordings
- `release-rels` - Relationships to releases
- `release-group-rels` - Relationships to release groups
- `url-rels` - Relationships to URLs
- `work-rels` - Relationships to works

**Release:**
- `artist-credits` - Artist credits for the release
- `labels` - Labels for the release
- `recordings` - Recordings on the release
- `release-groups` - Release group for this release
- `media` - Media (discs) in the release
- `discids` - Disc IDs associated with the release
- `isrcs` - ISRCs for recordings on the release

**Recording:**
- `artist-credits` - Artist credits for the recording
- `releases` - Releases containing this recording
- `isrcs` - ISRCs for this recording
- `work-rels` - Works this recording is a performance of

**Release Group:**
- `artist-credits` - Artist credits for the release group
- `releases` - Releases in this group

**Work:**
- `artist-rels` - Artists related to this work (composers, lyricists)
- `recording-rels` - Recordings of this work

**Example:**
```
/ws/2/release/{mbid}?inc=artist-credits+labels+recordings+media
```

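The include list can be assembled programmatically. This JavaScript sketch joins the include names with a literal `+`, as in the example above; `lookupUrl` and its argument order are assumptions made for illustration.

```javascript
// Hypothetical helper: builds a lookup URL with optional includes.
function lookupUrl(entity, mbid, includes = [], fmt = 'json') {
  const params = [];
  if (includes.length > 0) {
    // Include names consist of letters and hyphens only, so they can be
    // joined with "+" without further escaping.
    params.push(`inc=${includes.join('+')}`);
  }
  params.push(`fmt=${fmt}`);
  return `https://musicbrainz.org/ws/2/${entity}/${mbid}?${params.join('&')}`;
}

console.log(lookupUrl(
  'release',
  '{mbid}',
  ['artist-credits', 'labels', 'recordings', 'media'],
));
```

Building the query string by hand avoids `URLSearchParams`, which would encode the `+` separator as `%2B`.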
### Browse Parameters

Browse entities related to another entity:

**Parameters:**
- `artist={mbid}` - Browse by artist
- `release={mbid}` - Browse by release
- `release-group={mbid}` - Browse by release group
- `recording={mbid}` - Browse by recording
- `work={mbid}` - Browse by work
- `label={mbid}` - Browse by label
- `area={mbid}` - Browse by area
- `collection={mbid}` - Browse by collection
- `track_artist={mbid}` - Browse by track artist

**Example:**
```
/ws/2/recording?artist=5b11f4ce-a62d-471e-81fc-a69a8278c7da&limit=100
```

### Pagination Parameters

**Parameters:**
- `limit` - Number of results (max 100, default 25)
- `offset` - Starting offset (default 0)

**Example:**
```
/ws/2/artist?query=nirvana&limit=100&offset=100
```

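Paging through a full result set means issuing one request per offset. The sketch below computes those offsets in JavaScript; `pageOffsets` is a hypothetical helper, and the total count is assumed to come from a previous response.

```javascript
// Maximum page size stated in the parameter list above.
const MAX_LIMIT = 100;

// Hypothetical helper: lists the offsets needed to fetch `total` results.
function pageOffsets(total, limit = 25) {
  const size = Math.min(Math.max(limit, 1), MAX_LIMIT); // clamp to 1..100
  const offsets = [];
  for (let offset = 0; offset < total; offset += size) {
    offsets.push(offset);
  }
  return offsets;
}

console.log(pageOffsets(250, 100)); // [ 0, 100, 200 ]
```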
### Search Parameter

**Parameter:** `query`
**Syntax:** Lucene query syntax

**Example:**
```
/ws/2/artist?query=artist:nirvana AND country:US AND type:group
```

## Response Formats

### XML Format

**Namespace:** `http://musicbrainz.org/ns/mmd-2.0#`

```xml
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#">
  <artist id="5b11f4ce-a62d-471e-81fc-a69a8278c7da" type="Group">
    <name>Nirvana</name>
    <sort-name>Nirvana</sort-name>
    <country>US</country>
    <life-span>
      <begin>1987</begin>
      <end>1994-04-05</end>
      <ended>true</ended>
    </life-span>
  </artist>
</metadata>
```

### JSON Format

```json
{
  "id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "type": "Group",
  "name": "Nirvana",
  "sort-name": "Nirvana",
  "country": "US",
  "life-span": {
    "begin": "1987",
    "end": "1994-04-05",
    "ended": true
  }
}
```

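A consumer of the JSON format has to account for the hyphenated keys. This JavaScript sketch parses the sample payload above; `summarizeArtist` and the shape of its return value are assumptions for illustration, not the aggregator's actual model.

```javascript
// Sample payload copied from the JSON example above.
const artistJson = `{
  "id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "type": "Group",
  "name": "Nirvana",
  "sort-name": "Nirvana",
  "country": "US",
  "life-span": { "begin": "1987", "end": "1994-04-05", "ended": true }
}`;

// Hypothetical helper: flattens the fields a client might store.
function summarizeArtist(json) {
  const artist = JSON.parse(json);
  const span = artist['life-span'] || {}; // tolerate a missing life-span
  return {
    mbid: artist.id,
    name: artist.name,
    sortName: artist['sort-name'], // hyphenated keys need bracket access
    active: span.ended !== true,
  };
}

console.log(summarizeArtist(artistJson));
```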
## Authentication

### OAuth2 Bearer Token

**Primary authentication method for user-specific operations.**

**Header:**
```
Authorization: Bearer {access_token}
```

**Token Endpoint:** `https://musicbrainz.org/oauth2/token`
**Authorization Endpoint:** `https://musicbrainz.org/oauth2/authorize`

**Grant Types:**
- Authorization Code (with PKCE)
- Refresh Token

### HTTP Digest Authentication

**Legacy authentication method, still supported.**

**Header:**
```
Authorization: Digest username="user", realm="musicbrainz.org", ...
```

## OAuth Scopes

| Scope | Description |
|-------|-------------|
| `profile` | Read user profile information |
| `email` | Read user email address |
| `tag` | Submit and modify tags |
| `rating` | Submit and modify ratings |
| `collection` | Create and modify collections |
| `submit_barcode` | Submit barcodes to releases |
| `submit_isrc` | Submit ISRCs to recordings |

## Rate Limiting

**Limits:**
- Maximum 100 items per page
- 1 request per second (recommended)
- Client identification required for POST requests

**Client Identification:**

All POST requests must include a `client` parameter:

```
POST /ws/2/recording/{mbid}?client=MyApp-1.0
```

**Format:** `{application_name}-{version}`

**Rate Limit Headers:**
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1609459200
```

## CORS Support

**Enabled:** Yes
**Allowed Origins:** `*`
**Allowed Methods:** GET, POST, PUT, DELETE
**Allowed Headers:** Authorization, Content-Type

## Error Codes

| Code | Description |
|------|-------------|
| 400 | Bad Request - Invalid parameters or malformed request |
| 401 | Unauthorized - Authentication required |
| 403 | Forbidden - Insufficient permissions |
| 404 | Not Found - Entity does not exist |
| 405 | Method Not Allowed - HTTP method not supported for this endpoint |
| 406 | Not Acceptable - Requested format not available |
| 415 | Unsupported Media Type - Invalid Content-Type |
| 501 | Not Implemented - Feature not yet implemented |
| 503 | Service Unavailable - Server overloaded or maintenance |

**Error Response (JSON):**
```json
{
  "error": "Not Found",
  "help": "For usage, please see: https://musicbrainz.org/doc/Development/XML_Web_Service/Version_2"
}
```

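A client usually maps these codes to a small set of reactions. The JavaScript sketch below is one possible mapping, under the assumption that only 503 is transient and worth retrying; the category names and `classifyStatus` are hypothetical.

```javascript
// Assumption: of the codes in the table, only 503 signals a transient
// condition. The others point at a problem with the request itself.
const RETRYABLE_STATUSES = new Set([503]);

// Hypothetical helper: maps an HTTP status to a client-side reaction.
function classifyStatus(status) {
  if (status >= 200 && status < 300) return 'ok';
  if (RETRYABLE_STATUSES.has(status)) return 'retry';
  if (status === 401 || status === 403) return 'auth';
  return 'fail';
}

console.log([200, 401, 404, 503].map(classifyStatus));
// [ 'ok', 'auth', 'fail', 'retry' ]
```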
## Example Requests

### Lookup Artist with Releases

```
GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases+release-groups&fmt=json
```

### Search for Recordings

```
GET /ws/2/recording?query=recording:"Smells Like Teen Spirit" AND artist:nirvana&fmt=json
```

### Browse Releases by Artist

```
GET /ws/2/release?artist=5b11f4ce-a62d-471e-81fc-a69a8278c7da&limit=100&offset=0&fmt=json
```

### Submit ISRC

```
POST /ws/2/recording/5b11f4ce-a62d-471e-81fc-a69a8278c7da?client=MyApp-1.0
Authorization: Bearer {token}
Content-Type: application/json

{
  "isrcs": ["USRC17607839"]
}
```

### Add Releases to Collection

```
PUT /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2};{mbid3}
Authorization: Bearer {token}
```

## Collection Management

Collections allow users to organize entities (releases, artists, etc.).

**List User Collections:**
```
GET /ws/2/collection?fmt=json
Authorization: Bearer {token}
```

**Get Collection Contents:**
```
GET /ws/2/collection/{collection_mbid}/releases?fmt=json
```

**Add to Collection (semicolon-separated MBIDs):**
```
PUT /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2};{mbid3}
```

**Remove from Collection:**
```
DELETE /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2}
```

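The add and remove requests share one path shape, so a single builder covers both. This is a JavaScript sketch with a hypothetical `collectionPath` helper; the caller chooses the HTTP method (PUT or DELETE).

```javascript
// Hypothetical helper: builds the path for adding or removing members.
function collectionPath(collectionMbid, entityPlural, mbids) {
  if (mbids.length === 0) {
    throw new Error('at least one MBID is required');
  }
  // MBIDs are joined with semicolons, as in the requests above.
  return `/ws/2/collection/${collectionMbid}/${entityPlural}/${mbids.join(';')}`;
}

console.log(collectionPath('{collection_mbid}', 'releases', ['{mbid1}', '{mbid2}']));
// /ws/2/collection/{collection_mbid}/releases/{mbid1};{mbid2}
```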
## Best Practices

1. **Always include a User-Agent header** identifying your application
2. **Respect rate limits** - 1 request per second recommended
3. **Use client parameter** for all POST requests
4. **Cache responses** when appropriate
5. **Use inc parameters** to minimize requests
6. **Handle errors gracefully** with exponential backoff
7. **Use HTTPS** for all requests (HTTP redirects to HTTPS)

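Exponential backoff, as recommended in item 6, can be reduced to a delay schedule. The JavaScript sketch below assumes a one-second base (matching the recommended request rate) and a 30-second cap; both values and the `backoffDelay` name are illustrative choices, not documented MusicBrainz requirements.

```javascript
// Hypothetical helper: delay in milliseconds before retry number `attempt`
// (0-based). The delay doubles on each attempt until it reaches `maxMs`.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

console.log([0, 1, 2, 3].map((n) => backoffDelay(n)));
// [ 1000, 2000, 4000, 8000 ]
```

A production client would typically add random jitter to these delays so that many clients do not retry in lockstep.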
@@ -0,0 +1,568 @@
# MusicBrainz Server Architecture

## Design Pattern

Hybrid MVC + Service Layer architecture built on the Catalyst web framework. The application follows a layered approach with clear separation of concerns between presentation, business logic, and data access.

## Directory Structure

```
lib/MusicBrainz/Server/
├── Controller/          # 53 controllers, 13,000 lines
│   ├── Artist.pm
│   ├── Release.pm
│   ├── Recording.pm
│   ├── WS/              # Web Service controllers
│   │   └── 2/           # API version 2
│   └── ...
├── Data/                # 106 modules, 26,000 lines
│   ├── Artist.pm
│   ├── Release.pm
│   ├── Recording.pm
│   ├── Relationship.pm
│   └── ...
├── Entity/              # 132 entity classes
│   ├── Artist.pm
│   ├── Release.pm
│   ├── Recording.pm
│   ├── Types.pm
│   └── ...
├── Form/                # 43 form handlers
│   ├── Artist.pm
│   ├── Release.pm
│   └── ...
├── View/                # 4 view modules
│   ├── Default.pm       # Template Toolkit
│   ├── JSON.pm
│   ├── XML.pm
│   └── JSONLD.pm
├── WebService/          # API implementation
│   ├── Serializer/
│   │   ├── JSON/
│   │   ├── XML/
│   │   └── JSONLD/
│   └── Validator.pm
├── Edit/                # Edit system
│   ├── Artist/
│   ├── Release/
│   ├── Recording/
│   └── ...
├── Context.pm           # Service layer coordinator
├── DBDefs.pm            # Configuration
└── Sql.pm               # SQL abstraction layer

admin/                   # Database administration
├── sql/
│   ├── CreateTables.sql # Schema definition (4,068 lines)
│   └── updates/         # 332 migration files

root/                    # Frontend assets
├── static/
│   ├── scripts/         # JavaScript source
│   │   ├── common/
│   │   ├── edit/
│   │   └── release/
│   ├── styles/          # CSS/LESS
│   └── images/
└── layout.tt            # Main template

t/                       # Tests
├── lib/                 # Test utilities
├── pgtap/               # Database tests
└── selenium/            # Integration tests
```

## Architectural Layers

### Controller Layer (53 modules, 13,000 lines)

**Responsibility:** Handle HTTP requests, coordinate business logic, render responses.

**Key Controllers:**
- `Artist.pm` - Artist entity operations
- `Release.pm` - Release entity operations
- `Recording.pm` - Recording entity operations
- `ReleaseGroup.pm` - Release group operations
- `Work.pm` - Work entity operations
- `Label.pm` - Label entity operations
- `Edit.pm` - Edit submission and voting
- `Search.pm` - Search interface
- `WS::2::*` - Web service API endpoints

**Controller Pattern:**
```perl
package MusicBrainz::Server::Controller::Artist;
use Moose;
BEGIN { extends 'MusicBrainz::Server::Controller' }

sub show : Path Args(1) {
    my ($self, $c, $gid) = @_;
    my $artist = $c->model('Artist')->get_by_gid($gid);
    $c->stash( artist => $artist );
}
```

**Responsibilities:**
- Request validation
- Authentication/authorization checks
- Coordinate Data layer calls
- Prepare data for views
- Handle form submissions

### Data Layer (106 modules, 26,000 lines)

**Responsibility:** Repository pattern for database access. Each entity has a corresponding Data module.

**Key Data Modules:**
- `Data::Artist` - Artist CRUD operations
- `Data::Release` - Release CRUD operations
- `Data::Recording` - Recording CRUD operations
- `Data::Relationship` - Relationship management
- `Data::Edit` - Edit persistence
- `Data::Search` - Search operations

**Data Module Pattern:**
```perl
package MusicBrainz::Server::Data::Artist;
use Moose;
extends 'MusicBrainz::Server::Data::Entity';

sub _table { 'artist' }
sub _entity_class { 'MusicBrainz::Server::Entity::Artist' }

sub get_by_gid {
    my ($self, $gid) = @_;
    return $self->_get_by_key('gid', $gid);
}
```

**Moose Roles:**
- `Role::Editable` - Entities that can be edited
- `Role::Taggable` - Entities that can be tagged
- `Role::Rateable` - Entities that can be rated
- `Role::Relatable` - Entities that can have relationships
- `Role::Aliasable` - Entities that can have aliases
- `Role::Annotation` - Entities that can be annotated

**Data Access Pattern:**
- No ORM (not DBIx::Class)
- Custom Moose-based abstraction
- Raw SQL via `DBD::Pg`
- `DBIx::Connector` for persistent connection management
- `Sql.pm` provides query builder utilities

### Entity Layer (132 classes)

**Responsibility:** Domain objects representing database entities.

**Key Entities:**
- `Entity::Artist` - Artist domain object
- `Entity::Release` - Release domain object
- `Entity::Recording` - Recording domain object
- `Entity::ReleaseGroup` - Release group domain object
- `Entity::Work` - Work domain object
- `Entity::Label` - Label domain object
- `Entity::Relationship` - Relationship between entities

**Entity Pattern:**
```perl
package MusicBrainz::Server::Entity::Artist;
use Moose;
extends 'MusicBrainz::Server::Entity';

has 'name' => ( is => 'rw', isa => 'Str' );
has 'sort_name' => ( is => 'rw', isa => 'Str' );
has 'type_id' => ( is => 'rw', isa => 'Maybe[Int]' );
has 'country_id' => ( is => 'rw', isa => 'Maybe[Int]' );
has 'begin_date' => ( is => 'rw', isa => 'PartialDate' );
has 'end_date' => ( is => 'rw', isa => 'PartialDate' );
```

**Entity Characteristics:**
- Immutable after construction (mostly)
- Type-safe via Moose type system
- Lazy loading of relationships
- No database logic (pure domain objects)

### Form Layer (43 modules)

**Responsibility:** Form validation and processing using HTML::FormHandler.

**Key Forms:**
- `Form::Artist` - Artist creation/editing
- `Form::Release` - Release creation/editing
- `Form::Recording` - Recording creation/editing
- `Form::Edit::*` - Edit-specific forms

**Form Pattern:**
```perl
package MusicBrainz::Server::Form::Artist;
use HTML::FormHandler::Moose;
extends 'MusicBrainz::Server::Form';

has_field 'name' => ( type => 'Text', required => 1 );
has_field 'sort_name' => ( type => 'Text', required => 1 );
has_field 'type_id' => ( type => 'Select' );
```

### View Layer (4 modules)

**Responsibility:** Render responses in different formats.

**Views:**
- `View::Default` - Template Toolkit for HTML
- `View::JSON` - JSON serialization
- `View::XML` - XML serialization
- `View::JSONLD` - JSON-LD serialization

## Edit System Architecture

**Pattern:** Command Pattern

**Concept:** All data modifications are represented as "edits" - versioned, votable changes that go through a review process.

**Edit Lifecycle:**
1. User submits edit via form
2. Edit is validated and persisted to `edit` table
3. Edit enters voting period (typically 7 days)
4. Community votes on edit (yes/no/abstain)
5. Auto-editors can approve immediately
6. Edit is applied or rejected based on votes
7. Full audit trail maintained

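The lifecycle above can be read as a small decision function. The JavaScript sketch below is illustrative only: the source does not state how votes are tallied, so a simple "more yes than no" rule is assumed, and `resolveEdit` with its status strings is hypothetical.

```javascript
// Assumed rule, for illustration only: once voting has closed, an edit is
// applied when it has more "yes" than "no" votes. Abstentions are ignored.
function resolveEdit({ votes = [], autoEditorApproved = false, votingClosed = false }) {
  if (autoEditorApproved) return 'applied'; // step 5: immediate approval
  if (!votingClosed) return 'open';         // steps 3-4: still collecting votes
  const yes = votes.filter((v) => v === 'yes').length;
  const no = votes.filter((v) => v === 'no').length;
  return yes > no ? 'applied' : 'failed vote'; // step 6
}

console.log(resolveEdit({ votes: ['yes', 'yes', 'no', 'abstain'], votingClosed: true }));
// applied
```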
**Edit Types (examples):**
- `Edit::Artist::Create` - Create new artist
- `Edit::Artist::Edit` - Modify artist data
- `Edit::Artist::Delete` - Delete artist
- `Edit::Release::Create` - Create new release
- `Edit::Release::AddReleaseLabel` - Add label to release
- `Edit::Relationship::Create` - Create relationship
- `Edit::Relationship::Edit` - Modify relationship
- `Edit::Relationship::Delete` - Delete relationship

**Edit Structure:**
```perl
package MusicBrainz::Server::Edit::Artist::Edit;
use Moose;
extends 'MusicBrainz::Server::Edit';

sub edit_type { 1 } # Unique edit type ID
sub edit_name { 'Edit artist' }

sub initialize {
    my ($self, %opts) = @_;
    # Store old and new data
    $self->data({
        entity_id => $opts{artist_id},
        old => { ... },
        new => { ... },
    });
}

sub accept {
    my $self = shift;
    # Apply the edit
    $self->c->model('Artist')->update($self->data->{entity_id}, $self->data->{new});
}
```

**Edit Data Storage:**
- `edit` table - Edit metadata (type, status, votes)
- `edit_data` table - Edit-specific data (JSON)
- `vote` table - User votes on edits

**Edit Statuses:**
- Open - Awaiting votes
- Applied - Accepted and applied
- Failed Vote - Rejected by community
- Failed Dependency - Dependent edit failed
- Error - Application error
- Deleted - Cancelled by submitter

## Serialization Architecture

### JSON Serializer

**Location:** `lib/MusicBrainz/Server/WebService/Serializer/JSON/2/`

**Modules:**
- `Artist.pm` - Artist JSON serialization
- `Release.pm` - Release JSON serialization
- `Recording.pm` - Recording JSON serialization
- `Utils.pm` - Common serialization utilities

**Pattern:**
```perl
sub serialize {
    my ($self, $entity, $inc, $opts) = @_;

    my $data = {
        id => $entity->gid,
        name => $entity->name,
        'sort-name' => $entity->sort_name,
    };

    if ($inc->artist_credits) {
        $data->{'artist-credit'} = $self->serialize_artist_credit($entity->artist_credit);
    }

    return $data;
}
```

### XML Serializer

**Location:** `lib/MusicBrainz/Server/WebService/Serializer/XML/2/`

**Namespace:** `http://musicbrainz.org/ns/mmd-2.0#`

**Pattern:**
```perl
sub serialize {
    my ($self, $entity, $inc, $opts) = @_;

    my $xml = XML::LibXML::Element->new('artist');
    $xml->setAttribute('id', $entity->gid);
    $xml->appendTextChild('name', $entity->name);
    $xml->appendTextChild('sort-name', $entity->sort_name);

    return $xml;
}
```

### JSON-LD Serializer

**Location:** `lib/MusicBrainz/Server/WebService/Serializer/JSONLD/`

**Context:** Schema.org vocabulary

**Pattern:**
```perl
sub serialize {
    my ($self, $entity) = @_;

    return {
        '@context' => 'http://schema.org',
        '@type' => 'MusicGroup',
        '@id' => 'https://musicbrainz.org/artist/' . $entity->gid,
        'name' => $entity->name,
    };
}
```

## Frontend Architecture

### Template Toolkit (Server-Side Rendering)

**Location:** `root/`

**Main Template:** `root/layout.tt`

**Template Structure:**
```
root/
├── layout.tt            # Main layout
├── artist/
│   ├── index.tt         # Artist listing
│   ├── show.tt          # Artist detail
│   └── edit.tt          # Artist edit form
├── release/
│   ├── index.tt
│   ├── show.tt
│   └── edit.tt
└── components/
    ├── header.tt
    ├── footer.tt
    └── sidebar.tt
```

**Template Pattern:**
```tt2
[% WRAPPER 'layout.tt' title=artist.name %]
  <h1>[% artist.name %]</h1>
  <p>Sort name: [% artist.sort_name %]</p>

  [% IF artist.releases.size %]
    <h2>Releases</h2>
    <ul>
      [% FOR release IN artist.releases %]
        <li><a href="/release/[% release.gid %]">[% release.name %]</a></li>
      [% END %]
    </ul>
  [% END %]
[% END %]
```

### React (Progressive Enhancement)

**Location:** `root/static/scripts/`

**Strategy:** Progressive enhancement - server renders HTML, React hydrates for interactivity.

**Component Structure:**
```
root/static/scripts/
├── common/
│   ├── components/
│   │   ├── EntityLink.js
│   │   ├── Autocomplete.js
│   │   └── ReleaseList.js
│   └── utility/
├── edit/
│   ├── components/
│   │   ├── EditNote.js
│   │   └── VotingSection.js
│   └── reducers/
└── release/
    ├── components/
    │   ├── ReleaseHeader.js
    │   └── TrackList.js
    └── reducers/
```

**React Pattern:**
```javascript
import React from 'react';
import ReactDOM from 'react-dom';

const ReleaseList = ({ releases }) => (
  <ul>
    {releases.map(release => (
      <li key={release.gid}>
        <a href={`/release/${release.gid}`}>{release.name}</a>
      </li>
    ))}
  </ul>
);

// Hydrate server-rendered content
const container = document.getElementById('release-list');
if (container) {
  const releases = JSON.parse(container.dataset.releases);
  ReactDOM.hydrate(<ReleaseList releases={releases} />, container);
}
```

### Legacy Knockout.js

**Status:** Being phased out, but still present in some views.

**Location:** `root/static/scripts/` (mixed with React)

**Pattern:**
```javascript
ko.applyBindings({
  releases: ko.observableArray([...]),
  addRelease: function() { ... }
});
```

## Service Layer (Context)

**File:** `lib/MusicBrainz/Server/Context.pm`

**Responsibility:** Coordinate operations across multiple Data modules, manage transactions, provide unified interface.

**Pattern:**
```perl
my $artist = $c->model('Artist')->get_by_gid($gid);
$c->model('ArtistCredit')->load($artist);
$c->model('Release')->load_for_artist($artist);
$c->model('Relationship')->load($artist);
```

**Context Provides:**
- Database connection management
- Transaction handling
- Model access (`$c->model('Artist')`)
- Configuration access (`$c->config`)
- Session management
- Request/response handling

## Key Design Patterns

### Repository Pattern

**Implementation:** Data layer modules

**Purpose:** Abstract database access, provide clean interface for entity operations.

**Example:**
```perl
# Instead of raw SQL everywhere:
my $artist = $c->model('Artist')->get_by_gid($gid);

# Data::Artist handles the SQL:
sub get_by_gid {
    my ($self, $gid) = @_;
    return $self->sql->select_single_row_hash(
        'SELECT * FROM artist WHERE gid = ?', $gid
    );
}
```

### Command Pattern

**Implementation:** Edit system

**Purpose:** Encapsulate all data modifications as objects, enabling undo, audit trails, and voting.

**Example:**
```perl
my $edit = $c->model('Edit')->create(
    edit_type => $EDIT_ARTIST_EDIT,
    editor_id => $c->user->id,
    artist_id => $artist->id,
    old => { name => 'Old Name' },
    new => { name => 'New Name' },
);
```

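The reason the edit stores both `old` and `new` values is that a command object can then be applied, reverted, and logged. The JavaScript sketch below shows the idea with an in-memory store; `EditArtistCommand`, `artists`, and `auditTrail` are hypothetical names and do not mirror the MusicBrainz implementation.

```javascript
// Illustrative in-memory store standing in for the artist table.
const artists = new Map([[1, { name: 'Old Name' }]]);
const auditTrail = [];

// Hypothetical command object: captures old and new values so the change
// can be applied, reverted, and recorded.
class EditArtistCommand {
  constructor(artistId, oldData, newData) {
    this.artistId = artistId;
    this.oldData = oldData;
    this.newData = newData;
  }

  accept() {
    Object.assign(artists.get(this.artistId), this.newData);
    auditTrail.push({ action: 'accept', artistId: this.artistId });
  }

  revert() {
    Object.assign(artists.get(this.artistId), this.oldData);
    auditTrail.push({ action: 'revert', artistId: this.artistId });
  }
}

const edit = new EditArtistCommand(1, { name: 'Old Name' }, { name: 'New Name' });
edit.accept();
console.log(artists.get(1).name); // New Name
```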
### Service Pattern

**Implementation:** Context object

**Purpose:** Coordinate operations across multiple repositories, manage transactions.

**Example:**
```perl
$c->model('MB')->with_transaction(sub {
    my $artist = $c->model('Artist')->insert({ name => 'New Artist' });
    $c->model('Edit')->create(
        edit_type => $EDIT_ARTIST_CREATE,
        entity_id => $artist->id,
    );
});
```

## Data Access Layer

**No ORM:** MusicBrainz does not use DBIx::Class or any traditional ORM.

**Custom Abstraction:**
- Moose-based Data modules
- Raw SQL via `DBD::Pg`
- `DBIx::Connector` for persistent connection management
- `Sql.pm` provides query builder utilities

**Rationale:**
- Performance - Direct SQL is faster
- Flexibility - Complex queries easier to write
- Control - Full control over query execution
- Legacy - Codebase predates modern ORMs

**SQL Abstraction Example:**
```perl
# lib/MusicBrainz/Server/Data/Sql.pm
sub select_single_row_hash {
    my ($self, $query, @args) = @_;
    my $row = $self->dbh->selectrow_hashref($query, undef, @args);
    return $row;
}

sub select_list_of_hashes {
    my ($self, $query, @args) = @_;
    my $rows = $self->dbh->selectall_arrayref($query, { Slice => {} }, @args);
    return $rows;
}
```

@@ -0,0 +1,736 @@
# MusicBrainz Server Codebase

## Configuration System

### Two-Tier Architecture

**File:** `lib/DBDefs.pm`

**Structure:**
1. `lib/DBDefs/Default.pm` - Base defaults (in git)
2. `lib/DBDefs.pm` - Instance-specific overrides (not in git)

**Pattern:**
```perl
package DBDefs;
use parent 'DBDefs::Default';
# Imports RT_MASTER and the other replication-type constants
use MusicBrainz::Server::Replication ':replication_type';

# Override defaults for this instance
sub DB_SCHEMA_SEQUENCE { 28 }
sub DB_STAGING_SERVER { 0 }
sub REPLICATION_TYPE { RT_MASTER }
```

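The two-tier scheme is ordinary method overriding: anything the instance file does not redefine falls through to the defaults. The JavaScript sketch below shows the same mechanism; the class names, method names, and values are illustrative and do not come from the MusicBrainz codebase.

```javascript
// Base defaults, analogous to DBDefs::Default (values are illustrative).
class DefaultConfig {
  dbSchemaSequence() { return 28; }
  dbStagingServer() { return false; }
  webServerPort() { return 5000; }
}

// Instance-specific overrides, analogous to the untracked DBDefs.pm.
class InstanceConfig extends DefaultConfig {
  dbStagingServer() { return true; }
}

const config = new InstanceConfig();
console.log(config.dbStagingServer()); // true (overridden)
console.log(config.webServerPort());   // 5000 (inherited default)
```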
### Configuration Categories

**Database Configuration:**
```perl
# Primary database
sub READWRITE_DATABASE {
    return {
        database => 'musicbrainz_db',
        host => 'localhost',
        port => 5432,
        username => 'musicbrainz',
        password => 'musicbrainz',
    };
}

# Read-only replica (optional)
sub READONLY_DATABASE { READWRITE_DATABASE }

# System user for maintenance
sub SYSTEM_USER { 'musicbrainz' }

# Schema version
sub DB_SCHEMA_SEQUENCE { 28 }

# Staging server flag
sub DB_STAGING_SERVER { 0 }
```

**Redis Configuration:**
```perl
# Redis server
sub REDIS_SERVER { 'localhost:6379' }

# Redis namespace (prefix for all keys)
sub REDIS_NAMESPACE { 'MB' }

# Redis databases (0-15)
sub REDIS_DATABASE_CACHE { 0 }
sub REDIS_DATABASE_SESSION { 1 }
sub REDIS_DATABASE_SEARCH { 2 }
sub REDIS_DATABASE_STATS { 3 }
```

**Solr Configuration:**
|
||||
```perl
|
||||
# Solr server
|
||||
sub SOLR_SERVER { 'http://localhost:8983/solr' }
|
||||
|
||||
# Solr cores
|
||||
sub SOLR_CORE_ARTIST { 'artist' }
|
||||
sub SOLR_CORE_RELEASE { 'release' }
|
||||
sub SOLR_CORE_RECORDING { 'recording' }
|
||||
# ... (13 cores total)
|
||||
```
|
||||
|
||||
**Web Server Configuration:**
|
||||
```perl
|
||||
# Server processes
|
||||
sub WEB_SERVER_PROCESSES { 10 }
|
||||
|
||||
# Server host
|
||||
sub WEB_SERVER_HOST { 'localhost' }
|
||||
|
||||
# Server port
|
||||
sub WEB_SERVER_PORT { 5000 }
|
||||
|
||||
# Use reverse proxy
|
||||
sub WEB_SERVER_USED_IN_REVERSE_PROXY { 1 }
|
||||
```
|
||||
|
||||
**Mail Configuration:**
|
||||
```perl
|
||||
# SMTP server
|
||||
sub SMTP_SERVER { 'localhost' }
|
||||
|
||||
# From address
|
||||
sub EMAIL_SUPPORT_ADDRESS { 'support@musicbrainz.org' }
|
||||
|
||||
# Noreply address
|
||||
sub EMAIL_NOREPLY_ADDRESS { 'noreply@musicbrainz.org' }
|
||||
|
||||
# Bugs address
|
||||
sub EMAIL_BUGS_ADDRESS { 'bugs@musicbrainz.org' }
|
||||
```
|
||||
|
||||
**External Service Configuration:**
|
||||
```perl
|
||||
# Cover Art Archive
|
||||
sub COVER_ART_ARCHIVE_ACCESS_KEY { '' }
|
||||
sub COVER_ART_ARCHIVE_SECRET_KEY { '' }
|
||||
sub COVER_ART_ARCHIVE_UPLOAD_PREFIXER { 'MB' }
|
||||
sub COVER_ART_ARCHIVE_DOWNLOAD_PREFIX { 'https://coverartarchive.org' }
|
||||
|
||||
# Wikipedia
|
||||
sub WIKIPEDIA_CACHE_TIMEOUT { 259200 } # 3 days
|
||||
|
||||
# Discourse SSO
|
||||
sub DISCOURSE_SSO_SECRET { '' }
|
||||
sub DISCOURSE_SERVER { 'https://community.metabrainz.org' }
|
||||
|
||||
# MetaBrainz OAuth
|
||||
sub OAUTH2_ENFORCE_TLS { 1 }
|
||||
```
|
||||
|
||||
**Replication Configuration:**
|
||||
```perl
|
||||
# Replication type
|
||||
sub REPLICATION_TYPE { RT_STANDALONE } # RT_MASTER, RT_MIRROR, RT_STANDALONE
|
||||
|
||||
# Replication access token
|
||||
sub REPLICATION_ACCESS_TOKEN { '' }
|
||||
|
||||
# Replication URL
|
||||
sub REPLICATION_URL { 'https://data.musicbrainz.org/replication' }
|
||||
```
|
||||
|
||||
**Session Configuration:**
|
||||
```perl
|
||||
# Session expiry (10 hours)
|
||||
sub SESSION_EXPIRE { 36000 }
|
||||
|
||||
# Session idle timeout (3 hours)
|
||||
sub SESSION_IDLE_TIMEOUT { 10800 }
|
||||
|
||||
# Session cookie name
|
||||
sub SESSION_COOKIE { 'AF_SID' }
|
||||
|
||||
# Session cookie domain
|
||||
sub SESSION_DOMAIN { '.musicbrainz.org' }
|
||||
```
|
||||
|
||||
**Feature Flags:**
|
||||
```perl
|
||||
# Enable beta features
|
||||
sub BETA_FEATURES { 0 }
|
||||
|
||||
# Enable development mode
|
||||
sub DEVELOPMENT_SERVER { 0 }
|
||||
|
||||
# Enable debug mode
|
||||
sub DEBUG { 0 }
|
||||
|
||||
# Read-only database mode (rejects writes when set to 1)
|
||||
sub DB_READ_ONLY { 0 }
|
||||
```
|
||||
|
||||
**Rate Limiting:**
|
||||
```perl
|
||||
# API rate limit (requests per second)
|
||||
sub API_RATE_LIMIT { 1 }
|
||||
|
||||
# Web rate limit (requests per second)
|
||||
sub WEB_RATE_LIMIT { 10 }
|
||||
```
|
||||
|
||||
**Caching:**
|
||||
```perl
|
||||
# Cache TTL for entities (seconds)
|
||||
sub CACHE_TTL_ENTITY { 3600 } # 1 hour
|
||||
|
||||
# Cache TTL for search results (seconds)
|
||||
sub CACHE_TTL_SEARCH { 900 } # 15 minutes
|
||||
|
||||
# Cache TTL for statistics (seconds)
|
||||
sub CACHE_TTL_STATS { 3600 } # 1 hour
|
||||
```
|
||||
|
||||
## Logging System
|
||||
|
||||
### Log::Dispatch Framework
|
||||
|
||||
**Configuration:**
|
||||
```perl
|
||||
use Log::Dispatch;
|
||||
|
||||
my $log = Log::Dispatch->new(
|
||||
outputs => [
|
||||
[
|
||||
'Screen',
|
||||
min_level => 'debug',
|
||||
stderr => 1,
|
||||
newline => 1,
|
||||
],
|
||||
[
|
||||
'File',
|
||||
min_level => 'info',
|
||||
filename => '/var/log/musicbrainz/server.log',
|
||||
mode => 'append',
|
||||
newline => 1,
|
||||
],
|
||||
],
|
||||
);
|
||||
```
|
||||
|
||||
### Log Levels
|
||||
|
||||
**DEBUG:** Verbose debugging information
|
||||
```perl
|
||||
$log->debug("Loading artist with GID: $gid");
|
||||
```
|
||||
|
||||
**INFO:** Informational messages
|
||||
```perl
|
||||
$log->info("User $username logged in");
|
||||
```
|
||||
|
||||
**WARN:** Warning messages
|
||||
```perl
|
||||
$log->warn("Cache miss for entity $gid");
|
||||
```
|
||||
|
||||
**ERROR:** Error messages
|
||||
```perl
|
||||
$log->error("Failed to connect to database: $error");
|
||||
```
|
||||
|
||||
**FATAL:** Fatal errors
|
||||
```perl
|
||||
$log->fatal("Database connection lost, shutting down");
|
||||
```
|
||||
|
||||
### Message Limit
|
||||
|
||||
**Maximum Size:** 16KB per log message
|
||||
|
||||
**Truncation:** Messages longer than 16 KB are cut off and end with a "..." suffix
|
||||
|
||||
**Rationale:** Prevent log flooding from large data dumps
|
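A truncation rule like this could look as follows. This is only a sketch: `truncate_log_message` and `MAX_LOG_BYTES` are hypothetical names, the text does not say whether the suffix counts toward the limit (here it does), and `length` counts characters, so the result matches a byte limit only for single-byte text.

```perl
use strict;
use warnings;

use constant MAX_LOG_BYTES => 16 * 1024;   # 16 KB limit from the text above

# Hypothetical helper: cap a message at MAX_LOG_BYTES, ending with "...".
sub truncate_log_message {
    my ($message) = @_;
    return $message if length($message) <= MAX_LOG_BYTES;
    my $suffix = '...';
    return substr($message, 0, MAX_LOG_BYTES - length($suffix)) . $suffix;
}

my $long = 'x' x 20_000;
print length(truncate_log_message($long)), "\n";   # prints 16384
```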
||||
|
||||
### Lazy Evaluation
|
||||
|
||||
**Pattern:**
|
||||
```perl
|
||||
# Expensive operation only executed if debug level enabled
|
||||
$log->debug(sub {
|
||||
my $data = expensive_serialization($object);
|
||||
return "Object data: $data";
|
||||
});
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- Avoid expensive operations when logging disabled
|
||||
- Reduce CPU usage in production
|
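The effect of passing a callback instead of a string can be shown with a tiny stand-in logger. This sketch does not use Log::Dispatch; `make_logger` and the level table are hypothetical and exist only to show that the callback is skipped when the level is disabled.

```perl
use strict;
use warnings;

my %LEVEL = (debug => 0, info => 1, warn => 2, error => 3);

# Hypothetical minimal logger; Log::Dispatch itself is not used here.
sub make_logger {
    my ($min_level, $sink) = @_;
    return sub {
        my ($level, $message) = @_;
        return if $LEVEL{$level} < $LEVEL{$min_level};      # level disabled: skip
        $message = $message->() if ref $message eq 'CODE';  # build text lazily
        push @$sink, "$level: $message";
        return;
    };
}

my @lines;
my $calls = 0;
my $log = make_logger('info', \@lines);

$log->(debug => sub { $calls++; 'expensive dump' });  # callback never runs
$log->(info  => sub { $calls++; 'cheap summary' });   # callback runs once

print "callbacks run: $calls, lines logged: ", scalar(@lines), "\n";
```

With the minimum level set to `info`, only the second callback is invoked, so the cost of building the debug message is never paid.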
||||
|
||||
### Stack Traces
|
||||
|
||||
**Automatic:** Stack traces included for ERROR and FATAL levels
|
||||
|
||||
**Format:**
|
||||
```
|
||||
ERROR: Failed to load artist
|
||||
Stack trace:
|
||||
at MusicBrainz::Server::Data::Artist::get_by_gid line 123
|
||||
at MusicBrainz::Server::Controller::Artist::show line 45
|
||||
at Catalyst::Action::execute line 67
|
||||
```
|
||||
|
||||
### Log Rotation
|
||||
|
||||
**Tool:** logrotate
|
||||
|
||||
**Configuration:**
|
||||
```
|
||||
/var/log/musicbrainz/*.log {
|
||||
daily
|
||||
rotate 30
|
||||
compress
|
||||
delaycompress
|
||||
notifempty
|
||||
create 0640 musicbrainz musicbrainz
|
||||
sharedscripts
|
||||
postrotate
|
||||
/usr/bin/killall -HUP starman
|
||||
endscript
|
||||
}
|
||||
```
|
||||
|
||||
## Error Tracking (Sentry)
|
||||
|
||||
### Server-Side Integration
|
||||
|
||||
**Library:** Sentry::Raven (Perl SDK)
|
||||
|
||||
**Configuration:**
|
||||
```perl
|
||||
use Sentry::Raven;
|
||||
|
||||
my $raven = Sentry::Raven->new(
|
||||
sentry_dsn => 'https://public_key@sentry.io/project_id',
|
||||
environment => 'production',
|
||||
release => '2024.01.15',
|
||||
);
|
||||
```
|
||||
|
||||
**Capture Exception:**
|
||||
```perl
|
||||
eval {
    # Code that might fail
    $c->model('Artist')->get_by_gid($gid);
    1;
} or do {
    my $error = $@ || 'Unknown error';

    # Sentry::Raven takes context as a flat list built by its *_context helpers
    $raven->capture_exception(
        $error,
        Sentry::Raven->request_context(
            $c->req->uri->as_string,
            method  => $c->req->method,
            headers => {
                map { $_ => scalar $c->req->header($_) }
                    $c->req->headers->header_field_names
            },
        ),
        (
            # Only attach user data when someone is logged in
            $c->user_exists
            ? Sentry::Raven->user_context(
                id       => $c->user->id,
                username => $c->user->name,
              )
            : ()
        ),
        extra => { gid => $gid },
    );
};
|
||||
```
|
||||
|
||||
### Client-Side Integration
|
||||
|
||||
**Library:** @sentry/browser (JavaScript SDK)
|
||||
|
||||
**Configuration:**
|
||||
```javascript
|
||||
import * as Sentry from '@sentry/browser';
|
||||
|
||||
Sentry.init({
|
||||
dsn: 'https://public_key@sentry.io/project_id',
|
||||
environment: 'production',
|
||||
release: '2024.01.15',
|
||||
integrations: [
|
||||
new Sentry.BrowserTracing(),
|
||||
],
|
||||
tracesSampleRate: 0.1,
|
||||
});
|
||||
```
|
||||
|
||||
**Capture Exception:**
|
||||
```javascript
|
||||
try {
|
||||
// Code that might fail
|
||||
loadArtist(gid);
|
||||
} catch (error) {
|
||||
Sentry.captureException(error, {
|
||||
tags: {
|
||||
component: 'ArtistPage',
|
||||
},
|
||||
extra: {
|
||||
gid: gid,
|
||||
},
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
### Context Enrichment
|
||||
|
||||
**Request Context:**
|
||||
- URL
|
||||
- HTTP method
|
||||
- Headers
|
||||
- Query parameters
|
||||
- POST data (sanitized)
|
||||
|
||||
**User Context:**
|
||||
- User ID
|
||||
- Username
|
||||
- Email (hashed)
|
||||
- IP address (anonymized)
|
||||
|
||||
**Custom Context:**
|
||||
- Entity GID
|
||||
- Edit ID
|
||||
- Search query
|
||||
- API endpoint
|
||||
|
||||
## Monitoring
|
||||
|
||||
### Current State
|
||||
|
||||
**Metrics Endpoint:** None (no Prometheus exporter)
|
||||
|
||||
**Health Check Endpoint:** None (no `/health` endpoint)
|
||||
|
||||
**Workarounds:**
|
||||
- Monitor HTTP 200 responses on `/`
|
||||
- Parse logs for error rates
|
||||
- Monitor database connection count
|
||||
- Monitor Redis memory usage
|
||||
|
||||
### Planned Improvements
|
||||
|
||||
**Prometheus Exporter:**
|
||||
- Request count by endpoint
|
||||
- Request duration histogram
|
||||
- Database query count
|
||||
- Database query duration
|
||||
- Cache hit/miss ratio
|
||||
- Edit submission rate
|
||||
- Vote count
|
||||
|
||||
**Health Check Endpoint:**
|
||||
- Database connectivity
|
||||
- Redis connectivity
|
||||
- Solr connectivity
|
||||
- Disk space
|
||||
- Memory usage
|
||||
|
||||
## Session Management
|
||||
|
||||
### Redis-Backed Sessions
|
||||
|
||||
**Storage:** Redis database 1
|
||||
|
||||
**Session Key:** `session:{session_id}`
|
||||
|
||||
**Session Data:**
|
||||
```json
|
||||
{
|
||||
"user_id": 12345,
|
||||
"username": "user",
|
||||
"csrf_token": "abc123...",
|
||||
"last_activity": 1609459200,
|
||||
"preferences": {
|
||||
"language": "en",
|
||||
"timezone": "UTC"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Session Lifecycle
|
||||
|
||||
**Creation:**
|
||||
```perl
|
||||
my $session_id = generate_session_id(); # Random 32-byte hex
|
||||
my $session_data = {
|
||||
user_id => $user->id,
|
||||
csrf_token => generate_csrf_token(),
|
||||
last_activity => time(),
|
||||
};
|
||||
|
||||
$redis->setex(
|
||||
"session:$session_id",
|
||||
36000, # 10 hours
|
||||
encode_json($session_data)
|
||||
);
|
||||
|
||||
$c->res->cookies->{AF_SID} = {
|
||||
value => $session_id,
|
||||
path => '/',
|
||||
domain => '.musicbrainz.org',
|
||||
secure => 1,
|
||||
httponly => 1,
|
||||
samesite => 'Lax',
|
||||
};
|
||||
```
|
||||
|
||||
**Validation:**
|
||||
```perl
|
||||
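my $cookie     = $c->req->cookies->{AF_SID};   # Catalyst returns a cookie object
my $session_id = $cookie ? $cookie->value : undef;
return undef unless defined $session_id;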
|
||||
my $session_json = $redis->get("session:$session_id");
|
||||
|
||||
if (!$session_json) {
|
||||
# Session expired or invalid
|
||||
return undef;
|
||||
}
|
||||
|
||||
my $session_data = decode_json($session_json);
|
||||
|
||||
# Check idle timeout
|
||||
my $idle_time = time() - $session_data->{last_activity};
|
||||
if ($idle_time > 10800) { # 3 hours
|
||||
$redis->del("session:$session_id");
|
||||
return undef;
|
||||
}
|
||||
|
||||
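# Update last activity, keeping the remaining TTL so that the 10-hour
# absolute expiry is not extended by each request
$session_data->{last_activity} = time();
my $ttl = $redis->ttl("session:$session_id");
$redis->setex("session:$session_id", $ttl, encode_json($session_data)) if $ttl > 0;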
|
||||
|
||||
return $session_data;
|
||||
```
|
||||
|
||||
**Destruction:**
|
||||
```perl
|
||||
$redis->del("session:$session_id");
|
||||
$c->res->cookies->{AF_SID} = {
|
||||
value => '',
|
||||
expires => '-1d',
|
||||
};
|
||||
```
|
||||
|
||||
### Session Expiry
|
||||
|
||||
**Absolute Expiry:** 10 hours (36,000 seconds)
|
||||
|
||||
**Idle Timeout:** 3 hours (10,800 seconds)
|
||||
|
||||
**Sliding Window:** Last activity updated on each request
|
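The two limits interact: a session ends when either one is exceeded. A minimal sketch of that decision follows. `session_status` is a hypothetical helper, and the `$created` timestamp is an assumption, since the session data shown in this document stores only `last_activity`.

```perl
use strict;
use warnings;

use constant {
    SESSION_EXPIRE       => 36_000,   # absolute lifetime, 10 hours
    SESSION_IDLE_TIMEOUT => 10_800,   # idle timeout, 3 hours
};

# Hypothetical helper. $created is an assumed field: the session data shown
# in this document only stores last_activity.
sub session_status {
    my ($created, $last_activity, $now) = @_;
    return 'expired' if ($now - $created)       > SESSION_EXPIRE;
    return 'idle'    if ($now - $last_activity) > SESSION_IDLE_TIMEOUT;
    return 'active';
}

my $t0 = 1_609_459_200;
print session_status($t0, $t0 + 3_600,  $t0 + 7_200),  "\n";   # active
print session_status($t0, $t0,          $t0 + 11_000), "\n";   # idle
print session_status($t0, $t0 + 35_000, $t0 + 36_001), "\n";   # expired
```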
||||
|
||||
### Cookie Configuration
|
||||
|
||||
**Name:** `AF_SID`
|
||||
|
||||
**Attributes:**
|
||||
- `Secure` - HTTPS only
|
||||
- `HttpOnly` - Not accessible via JavaScript
|
||||
- `SameSite=Lax` - Cookie is withheld from most cross-site requests (partial CSRF mitigation)
|
||||
- `Domain=.musicbrainz.org` - Shared across subdomains
|
||||
- `Path=/` - Available site-wide
|
||||
|
||||
## Security
|
||||
|
||||
### CSRF Protection
|
||||
|
||||
**Token Generation:**
|
||||
```perl
|
||||
use Digest::SHA qw(sha256_hex);
use Bytes::Random::Secure qw(random_bytes);   # assumed source of random_bytes()
|
||||
|
||||
my $csrf_token = sha256_hex(
|
||||
$session_id .
|
||||
$user_id .
|
||||
time() .
|
||||
random_bytes(32)
|
||||
);
|
||||
```
|
||||
|
||||
**Token Storage:** Stored in session data
|
||||
|
||||
**Token Validation:**
|
||||
```perl
|
||||
sub validate_csrf_token {
|
||||
my ($c, $submitted_token) = @_;
|
||||
|
||||
my $session_token = $c->session->{csrf_token};
|
||||
|
||||
if (!$session_token || !defined $submitted_token || $submitted_token ne $session_token) {
|
||||
$c->detach('/error_403');
|
||||
}
|
||||
}
|
||||
```
|
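Token comparison with `ne` stops at the first differing character, which can in principle leak timing information. A common hardening step is a constant-time comparison. The sketch below is not taken from the MusicBrainz code; `constant_time_eq` is a hypothetical helper written with core Perl only.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two strings without returning early at the
# first differing character, so timing does not reveal how much matched.
sub constant_time_eq {
    my ($x, $y) = @_;
    return 0 unless defined $x && defined $y;
    return 0 unless length($x) == length($y);   # length is not secret here
    my $diff = 0;
    $diff |= ord(substr($x, $_, 1)) ^ ord(substr($y, $_, 1))
        for 0 .. length($x) - 1;
    return $diff == 0 ? 1 : 0;
}

print constant_time_eq('abc123', 'abc123') ? "match\n" : "mismatch\n";
print constant_time_eq('abc123', 'abc124') ? "match\n" : "mismatch\n";
```

In a validation routine this would replace the `ne` test, with a failed comparison leading to the same 403 response.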
||||
|
||||
**Form Inclusion:**
|
||||
```html
|
||||
<form method="POST" action="/edit/artist/create">
|
||||
<input type="hidden" name="csrf_token" value="[% csrf_token %]">
|
||||
<!-- form fields -->
|
||||
</form>
|
||||
```
|
||||
|
||||
**AJAX Requests:**
|
||||
```javascript
|
||||
fetch('/api/endpoint', {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'X-CSRF-Token': csrfToken,
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
body: JSON.stringify(data),
|
||||
});
|
||||
```
|
||||
|
||||
### Content Security Policy (CSP)
|
||||
|
||||
**Header:**
|
||||
```
|
||||
Content-Security-Policy:
|
||||
default-src 'self';
|
||||
script-src 'self' 'unsafe-inline' https://www.google-analytics.com;
|
||||
style-src 'self' 'unsafe-inline';
|
||||
img-src 'self' data: https:;
|
||||
font-src 'self' data:;
|
||||
connect-src 'self' https://sentry.io;
|
||||
frame-ancestors 'none';
|
||||
```
|
||||
|
||||
**Directives:**
|
||||
- `default-src 'self'` - Fallback for any directive not listed: only load resources from the same origin
|
||||
- `script-src` - Allow scripts from self, inline scripts (`'unsafe-inline'`), and Google Analytics
|
||||
- `style-src` - Allow styles from self (inline allowed for legacy)
|
||||
- `img-src` - Allow images from self, `data:` URIs, and any HTTPS host (cover art, etc.)
|
||||
- `connect-src` - Allow AJAX to self and Sentry
|
||||
- `frame-ancestors 'none'` - Prevent clickjacking
|
||||
|
||||
### Authentication
|
||||
|
||||
**Realms:**
|
||||
1. Session-based (cookie)
|
||||
2. HTTP Digest (legacy)
|
||||
3. OAuth2 Bearer token
|
||||
|
||||
**Session Authentication:**
|
||||
```perl
|
||||
sub authenticate_session {
|
||||
my ($c) = @_;
|
||||
|
||||
my $cookie     = $c->req->cookies->{AF_SID};   # Catalyst returns a cookie object
my $session_id = $cookie ? $cookie->value : undef;
|
||||
my $session = $c->model('Session')->load($session_id);
|
||||
|
||||
if ($session) {
|
||||
my $user = $c->model('Editor')->get_by_id($session->{user_id});
|
||||
$c->set_authenticated_user($user);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**OAuth2 Authentication:**
|
||||
```perl
|
||||
sub authenticate_oauth2 {
|
||||
my ($c) = @_;
|
||||
|
||||
my $auth_header = $c->req->header('Authorization');
|
||||
if (defined $auth_header && $auth_header =~ /^Bearer (.+)$/) {
|
||||
my $token = $1;
|
||||
my $token_info = $c->model('OAuth2')->introspect($token);
|
||||
|
||||
if ($token_info->{active}) {
|
||||
my $user = $c->model('Editor')->get_by_id($token_info->{sub});
|
||||
$c->set_authenticated_user($user);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Password Hashing
|
||||
|
||||
**Algorithm:** Bcrypt
|
||||
|
||||
**Cost Factor:** 12 (2^12 = 4096 iterations)
|
||||
|
||||
**Hashing:**
|
||||
```perl
|
||||
use Crypt::Eksblowfish::Bcrypt qw(bcrypt en_base64);
|
||||
|
||||
sub hash_password {
|
||||
my ($password) = @_;
|
||||
|
||||
my $salt = generate_salt(); # 16 random bytes
|
||||
my $settings = '$2a$12$' . en_base64($salt);
|
||||
|
||||
return bcrypt($password, $settings);
|
||||
}
|
||||
```
|
||||
|
||||
**Verification:**
|
||||
```perl
|
||||
sub verify_password {
|
||||
my ($password, $hash) = @_;
|
||||
|
||||
my $computed_hash = bcrypt($password, $hash);
|
||||
|
||||
return $computed_hash eq $hash;
|
||||
}
|
||||
```
|
||||
|
||||
**Password Requirements:**
|
||||
- Minimum 8 characters
|
||||
- No maximum length enforced (note that bcrypt itself only uses the first 72 bytes of input)
|
||||
- No complexity requirements (user choice)
|
||||
|
||||
### Editor Privileges
|
||||
|
||||
**Privilege Flags (Bitmask):**
|
||||
|
||||
| Flag | Value | Description |
|
||||
|------|-------|-------------|
|
||||
| `UNTRUSTED` | 1 | New user, limited privileges |
|
||||
| `AUTOEDITOR` | 2 | Auto-editor, edits auto-approved |
|
||||
| `BOT` | 4 | Bot account |
|
||||
| `UNTRUSTED_BOT` | 5 | Combination of `UNTRUSTED` and `BOT` (1 + 4), not a separate bit |
|
||||
| `RELATIONSHIP_EDITOR` | 8 | Can edit relationships |
|
||||
| `WIKI_TRANSCLUSION` | 16 | Can transclude wiki content |
|
||||
| `MBID_SUBMITTER` | 32 | Can submit MBIDs |
|
||||
| `ACCOUNT_ADMIN` | 64 | Can manage user accounts |
|
||||
| `LOCATION_EDITOR` | 128 | Can edit locations |
|
||||
| `BANNER_EDITOR` | 256 | Can edit site banners |
|
||||
| `EDITING_DISABLED` | 512 | Editing disabled (banned) |
|
||||
| `ADDING_NOTES_DISABLED` | 1024 | Cannot add edit notes |
|
||||
| `SPAMMER` | 2048 | Marked as spammer |
|
||||
| `AUTO_EDITOR_ELECTIONS` | 4096 | Can vote in auto-editor elections |
|
||||
| `DONT_NAG` | 8192 | Don't show donation nag |
|
||||
|
||||
**Privilege Check:**
|
||||
```perl
|
||||
sub is_auto_editor {
|
||||
my ($user) = @_;
|
||||
return ($user->privs & 2) != 0;
|
||||
}
|
||||
|
||||
sub can_edit_relationships {
|
||||
my ($user) = @_;
|
||||
return ($user->privs & 8) != 0;
|
||||
}
|
||||
```
|
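The literal masks in these checks can be replaced by named constants and a single helper. The sketch below copies the first four flag values from the table in this document; the `PRIV_*` names and `has_privs` are hypothetical and not taken from the server code.

```perl
use strict;
use warnings;

# Flag values copied from the table above; the constant names are illustrative.
use constant {
    PRIV_UNTRUSTED           => 1,
    PRIV_AUTOEDITOR          => 2,
    PRIV_BOT                 => 4,
    PRIV_RELATIONSHIP_EDITOR => 8,
};

# True only when every bit in $flags is set in $privs.
sub has_privs {
    my ($privs, $flags) = @_;
    return ($privs & $flags) == $flags ? 1 : 0;
}

my $privs = PRIV_UNTRUSTED | PRIV_BOT;   # 5, the UNTRUSTED_BOT row
print has_privs($privs, PRIV_BOT)                  ? "bot\n"           : "not bot\n";
print has_privs($privs, PRIV_AUTOEDITOR)           ? "autoeditor\n"    : "not autoeditor\n";
print has_privs($privs, PRIV_UNTRUSTED | PRIV_BOT) ? "untrusted bot\n" : "other\n";
```

Comparing against `$flags` rather than against zero lets the same helper test combined values such as 5.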
||||
|
||||
### Auto-Editor Election System
|
||||
|
||||
**Eligibility:**
|
||||
- 100+ accepted edits
|
||||
- Member for 2+ weeks
|
||||
- No recent failed votes
|
||||
|
||||
**Election Process:**
|
||||
1. User nominates self or is nominated
|
||||
2. 1-week voting period
|
||||
3. Existing auto-editors vote
|
||||
4. 75% approval required
|
||||
5. Minimum 5 votes required
|
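The two thresholds at the end of the process can be expressed as a small tally function. This is a sketch under stated assumptions: approval is taken to be yes / (yes + no), abstentions are not counted, and `election_passes` is a hypothetical name. The rules above do not specify any of these details.

```perl
use strict;
use warnings;

use constant {
    MIN_VOTES         => 5,
    REQUIRED_APPROVAL => 0.75,
};

# Hypothetical tally: assumes approval is yes / (yes + no) and that
# abstentions are not counted; the rules above do not specify either.
sub election_passes {
    my ($yes, $no) = @_;
    my $total = $yes + $no;
    return 0 if $total < MIN_VOTES;            # too few votes cast
    return ($yes / $total) >= REQUIRED_APPROVAL ? 1 : 0;
}

print election_passes(6, 2) ? "pass\n" : "fail\n";   # 8 votes, 75% approval
print election_passes(3, 0) ? "pass\n" : "fail\n";   # only 3 votes
print election_passes(5, 3) ? "pass\n" : "fail\n";   # 62.5% approval
```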
||||
|
||||
**Auto-Editor Benefits:**
|
||||
- Edits auto-approved (no voting period)
|
||||
- Can vote in elections
|
||||
- Can approve/reject edits
|
||||
- Higher trust level
|
||||
@@ -0,0 +1,618 @@
|
||||
# MusicBrainz Server Data Layer
|
||||
|
||||
## Database Overview
|
||||
|
||||
**Engine:** PostgreSQL 16+
|
||||
**Tables:** 375
|
||||
**Foreign Key Constraints:** 500+
|
||||
**Schema Definition:** `admin/sql/CreateTables.sql` (4,068 lines)
|
||||
**Production Size:** ~350GB (full dataset with indexes)
|
||||
|
||||
## PostgreSQL Schema
|
||||
|
||||
### Core Entity Tables
|
||||
|
||||
**Artists:**
|
||||
- `artist` - Artist entities (bands, musicians, orchestras, etc.)
|
||||
- `artist_alias` - Alternative names for artists
|
||||
- `artist_credit` - Artist credit configurations
|
||||
- `artist_credit_name` - Individual artists in a credit
|
||||
- `artist_type` - Artist type enumeration (person, group, etc.)
|
||||
- `artist_tag` - Folksonomy tags
|
||||
- `artist_rating_raw` - User ratings
|
||||
- `artist_annotation` - User annotations
|
||||
- `artist_gid_redirect` - MBID redirects after merges
|
||||
|
||||
**Releases:**
|
||||
- `release` - Release entities (albums, singles, etc.)
|
||||
- `release_alias` - Alternative release names
|
||||
- `release_group` - Logical grouping of releases
|
||||
- `release_group_primary_type` - Album, Single, EP, etc.
|
||||
- `release_group_secondary_type` - Compilation, Live, Remix, etc.
|
||||
- `release_status` - Official, Promotion, Bootleg, etc.
|
||||
- `release_packaging` - Jewel Case, Digipak, etc.
|
||||
- `release_label` - Labels associated with release
|
||||
- `release_country` - Release events by country
|
||||
- `release_tag` - Folksonomy tags
|
||||
- `release_rating_raw` - User ratings
|
||||
- `release_annotation` - User annotations
|
||||
- `release_gid_redirect` - MBID redirects
|
||||
|
||||
**Recordings:**
|
||||
- `recording` - Recording entities (unique audio recordings)
|
||||
- `recording_alias` - Alternative recording names
|
||||
- `recording_tag` - Folksonomy tags
|
||||
- `recording_rating_raw` - User ratings
|
||||
- `recording_annotation` - User annotations
|
||||
- `recording_gid_redirect` - MBID redirects
|
||||
- `isrc` - International Standard Recording Codes
|
||||
- `recording_isrc` - Recording to ISRC mapping
|
||||
|
||||
**Works:**
|
||||
- `work` - Musical composition entities
|
||||
- `work_alias` - Alternative work names
|
||||
- `work_type` - Song, Symphony, Opera, etc.
|
||||
- `work_attribute` - Work attributes (key, tempo, etc.)
|
||||
- `work_attribute_type` - Attribute type definitions
|
||||
- `work_tag` - Folksonomy tags
|
||||
- `work_rating_raw` - User ratings
|
||||
- `work_annotation` - User annotations
|
||||
- `work_gid_redirect` - MBID redirects
|
||||
- `iswc` - International Standard Musical Work Codes
|
||||
- `work_iswc` - Work to ISWC mapping
|
||||
|
||||
**Labels:**
|
||||
- `label` - Record label entities
|
||||
- `label_alias` - Alternative label names
|
||||
- `label_type` - Original Production, Bootleg Production, etc.
|
||||
- `label_tag` - Folksonomy tags
|
||||
- `label_rating_raw` - User ratings
|
||||
- `label_annotation` - User annotations
|
||||
- `label_gid_redirect` - MBID redirects
|
||||
|
||||
**Geographic:**
|
||||
- `area` - Geographic areas (countries, cities, etc.)
|
||||
- `area_alias` - Alternative area names
|
||||
- `area_type` - Country, Subdivision, City, etc.
|
||||
- `area_tag` - Folksonomy tags
|
||||
- `area_annotation` - User annotations
|
||||
- `area_gid_redirect` - MBID redirects
|
||||
- `country_area` - ISO country code mapping
|
||||
- `iso_3166_1` - ISO 3166-1 country codes
|
||||
- `iso_3166_2` - ISO 3166-2 subdivision codes
|
||||
- `iso_3166_3` - ISO 3166-3 former country codes
|
||||
|
||||
**Events:**
|
||||
- `event` - Event entities (concerts, festivals, etc.)
|
||||
- `event_alias` - Alternative event names
|
||||
- `event_type` - Concert, Festival, etc.
|
||||
- `event_tag` - Folksonomy tags
|
||||
- `event_rating_raw` - User ratings
|
||||
- `event_annotation` - User annotations
|
||||
- `event_gid_redirect` - MBID redirects
|
||||
|
||||
**Places:**
|
||||
- `place` - Venue/location entities
|
||||
- `place_alias` - Alternative place names
|
||||
- `place_type` - Venue, Studio, etc.
|
||||
- `place_tag` - Folksonomy tags
|
||||
- `place_annotation` - User annotations
|
||||
- `place_gid_redirect` - MBID redirects
|
||||
|
||||
**Series:**
|
||||
- `series` - Ordered sequence entities
|
||||
- `series_alias` - Alternative series names
|
||||
- `series_type` - Release group series, etc.
|
||||
- `series_ordering_type` - Automatic, Manual
|
||||
- `series_tag` - Folksonomy tags
|
||||
- `series_annotation` - User annotations
|
||||
- `series_gid_redirect` - MBID redirects
|
||||
|
||||
**Instruments:**
|
||||
- `instrument` - Musical instrument entities
|
||||
- `instrument_alias` - Alternative instrument names
|
||||
- `instrument_type` - Wind, String, Percussion, etc.
|
||||
- `instrument_tag` - Folksonomy tags
|
||||
- `instrument_annotation` - User annotations
|
||||
- `instrument_gid_redirect` - MBID redirects
|
||||
|
||||
**Genres:**
|
||||
- `genre` - Genre entities
|
||||
- `genre_alias` - Alternative genre names
|
||||
- `genre_annotation` - User annotations
|
||||
- `genre_gid_redirect` - MBID redirects
|
||||
|
||||
**URLs:**
|
||||
- `url` - External URL entities
|
||||
- `url_gid_redirect` - MBID redirects
|
||||
|
||||
### Relationship Tables (l_* tables)
|
||||
|
||||
**Pattern:** `l_{entity1}_{entity2}` for relationships between entities.
|
||||
|
||||
**Examples:**
|
||||
- `l_artist_artist` - Artist-to-artist relationships (member of, collaboration, etc.)
|
||||
- `l_artist_recording` - Artist-to-recording relationships (performer, conductor, etc.)
|
||||
- `l_artist_release` - Artist-to-release relationships
|
||||
- `l_artist_release_group` - Artist-to-release-group relationships
|
||||
- `l_artist_work` - Artist-to-work relationships (composer, lyricist, etc.)
|
||||
- `l_artist_url` - Artist-to-URL relationships (official homepage, social media, etc.)
|
||||
- `l_recording_work` - Recording-to-work relationships (performance of)
|
||||
- `l_release_release_group` - Release-to-release-group relationships
|
||||
- `l_release_url` - Release-to-URL relationships (purchase links, streaming, etc.)
|
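Every example above lists the two entity types in alphabetical order, so a table name can be derived from an unordered pair. The sketch below relies on that observation as an assumption; `link_table_name` is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the l_* table name for a pair of entity types.
# Assumes the two types appear in alphabetical order, which matches every
# example listed above.
sub link_table_name {
    my @types = sort @_[0, 1];
    return 'l_' . join('_', @types);
}

print link_table_name('work', 'recording'), "\n";         # l_recording_work
print link_table_name('release_group', 'artist'), "\n";   # l_artist_release_group
print link_table_name('url', 'release'), "\n";            # l_release_url
```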
||||
|
||||
**Relationship Support Tables:**
|
||||
- `link` - Link instances
|
||||
- `link_type` - Relationship type definitions
|
||||
- `link_attribute` - Relationship attributes
|
||||
- `link_attribute_type` - Attribute type definitions
|
||||
- `link_crediting` - Custom relationship credits
|
||||
- `link_text_attribute` - Text attributes for relationships
|
||||
|
||||
### Media Tables
|
||||
|
||||
**Physical Media:**
|
||||
- `medium` - Physical media (CDs, vinyl, etc.)
|
||||
- `medium_format` - CD, Vinyl, Digital Media, etc.
|
||||
- `medium_cdtoc` - CD table of contents
|
||||
- `cdtoc` - CD TOC data
|
||||
- `cdtoc_raw` - Raw CD TOC data
|
||||
|
||||
**Tracks:**
|
||||
- `track` - Individual tracks on media
|
||||
- `track_gid_redirect` - Track MBID redirects
|
||||
|
||||
### Metadata Tables
|
||||
|
||||
**Tags:**
|
||||
- `tag` - Tag definitions
|
||||
- `tag_relation` - Tag relationships
|
||||
- `{entity}_tag` - Tags per entity type
|
||||
- `{entity}_tag_raw` - Raw user tag submissions
|
||||
|
||||
**Ratings:**
|
||||
- `{entity}_rating_raw` - Raw user ratings per entity type
|
||||
|
||||
**Annotations:**
|
||||
- `annotation` - Annotation text
|
||||
- `{entity}_annotation` - Annotations per entity type
|
||||
|
||||
**Collections:**
|
||||
- `editor_collection` - User collections
|
||||
- `editor_collection_type` - Collection type (release, artist, etc.)
|
||||
- `editor_collection_{entity}` - Collection contents per entity type
|
||||
|
||||
### Editorial Tables
|
||||
|
||||
**Edits:**
|
||||
- `edit` - Edit submissions
|
||||
- `edit_data` - Edit-specific data (JSON)
|
||||
- `edit_{entity}` - Edit to entity mappings
|
||||
- `vote` - User votes on edits
|
||||
- `edit_note` - Discussion notes on edits
|
||||
- `edit_note_recipient` - Edit note notifications
|
||||
|
||||
**Editors:**
|
||||
- `editor` - User accounts
|
||||
- `editor_preference` - User preferences
|
||||
- `editor_language` - User language preferences
|
||||
- `editor_subscribe_artist` - Artist subscriptions
|
||||
- `editor_subscribe_collection` - Collection subscriptions
|
||||
- `editor_subscribe_label` - Label subscriptions
|
||||
- `editor_subscribe_series` - Series subscriptions
|
||||
- `editor_subscribe_editor` - Editor subscriptions
|
||||
- `editor_oauth_token` - OAuth tokens
|
||||
- `application` - OAuth applications
|
||||
|
||||
**Moderation:**
|
||||
- `autoeditor_election` - Auto-editor elections
|
||||
- `autoeditor_election_vote` - Election votes
|
||||
- `editor_watch_preferences` - Watchlist preferences
|
||||
- `editor_watch_artist` - Artist watchlist
|
||||
- `editor_watch_release_group_type` - Release group type filters
|
||||
- `editor_watch_release_status` - Release status filters
|
||||
|
||||
### Identifier Tables
|
||||
|
||||
**Standard Identifiers:**
|
||||
- `isrc` - International Standard Recording Code
|
||||
- `iswc` - International Standard Musical Work Code
|
||||
- `recording_isrc` - Recording to ISRC mapping
|
||||
- `work_iswc` - Work to ISWC mapping
|
||||
|
||||
**MusicBrainz Identifiers:**
|
||||
- `{entity}_gid_redirect` - MBID redirects after merges
|
||||
|
||||
**Barcodes:**
|
||||
- `release_barcode` - Release barcodes (EAN, UPC)
|
||||
|
||||
### Replication Tables (dbmirror2)
|
||||
|
||||
**Replication System:**
|
||||
- `dbmirror_pending` - Pending replication packets
|
||||
- `dbmirror_pendingdata` - Replication data
|
||||
- `replication_control` - Replication state tracking
|
||||
|
||||
**Modes:**
|
||||
- `RT_MASTER` - Master database (generates replication packets)
|
||||
- `RT_MIRROR` - Mirror database (consumes replication packets)
|
||||
- `RT_STANDALONE` - Standalone database (no replication)
|
||||
|
||||
### Auxiliary Tables
|
||||
|
||||
**Statistics:**
|
||||
- `statistic` - Cached statistics
|
||||
- `statistic_event` - Statistic calculation events
|
||||
|
||||
**Documentation:**
|
||||
- `documentation.l_{entity1}_{entity2}_example` - Relationship examples
|
||||
|
||||
**Deprecated:**
|
||||
- Various `_deleted` tables for soft deletes
|
||||
|
||||
## Schema Management
|
||||
|
||||
### CreateTables.sql
|
||||
|
||||
**Location:** `admin/sql/CreateTables.sql`
|
||||
**Size:** 4,068 lines
|
||||
**Purpose:** Complete schema definition for fresh installations
|
||||
|
||||
**Structure:**
|
||||
```sql
|
||||
-- Core entity tables
|
||||
CREATE TABLE artist (...);
|
||||
CREATE TABLE release (...);
|
||||
CREATE TABLE recording (...);
|
||||
|
||||
-- Indexes
|
||||
CREATE INDEX artist_idx_name ON artist (name);
|
||||
CREATE INDEX artist_idx_gid ON artist (gid);
|
||||
|
||||
-- Foreign keys
|
||||
ALTER TABLE artist_credit_name
|
||||
ADD CONSTRAINT artist_credit_name_fk_artist
|
||||
FOREIGN KEY (artist) REFERENCES artist(id);
|
||||
|
||||
-- Triggers
|
||||
CREATE TRIGGER a_ins_artist AFTER INSERT ON artist ...;
|
||||
```
|
||||
|
||||
### Migration System
|
||||
|
||||
**Location:** `admin/sql/updates/`
|
||||
**Count:** 332 migration files
|
||||
**Naming:** Date-prefixed (`YYYYMMDD-description.sql`, typically with the MBS ticket number in the description)
|
||||
|
||||
**Example Filenames:**
|
||||
- `20230115-mbs-12345-add-genre-table.sql`
|
||||
- `20230220-mbs-12346-add-event-series-relationship.sql`
|
||||
- `20230315-mbs-12347-add-recording-length-index.sql`
|
||||
|
||||
**Migration Structure:**
|
||||
```sql
|
||||
\set ON_ERROR_STOP 1
|
||||
|
||||
BEGIN;
|
||||
|
||||
-- Schema changes
|
||||
ALTER TABLE artist ADD COLUMN disambiguation TEXT;
|
||||
|
||||
-- Data migrations
|
||||
UPDATE artist SET disambiguation = '' WHERE disambiguation IS NULL;
|
||||
|
||||
-- Constraints
|
||||
ALTER TABLE artist ALTER COLUMN disambiguation SET NOT NULL;
|
||||
|
||||
COMMIT;
|
||||
```
|
||||
|
||||
**Schema Change Variants:**
|
||||
- `schema-change/` subdirectory contains master/mirror variants
|
||||
- Master migrations may include replication setup
|
||||
- Mirror migrations skip replication-specific changes
|
||||
|
||||
**Migration Tracking:**
|
||||
- Migrations are tracked in the database
|
||||
- Applied migrations recorded to prevent re-application
|
||||
- Rollback not supported (forward-only migrations)
|
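Forward-only tracking amounts to running whatever has not been recorded yet, oldest first. The following sketch shows that selection step only. `pending_migrations` is a hypothetical helper, and how the server actually stores the list of applied migrations is not described here.

```perl
use strict;
use warnings;

# Hypothetical helper: given every migration filename and the set already
# recorded as applied, return the ones still to run, oldest first.
sub pending_migrations {
    my ($all_files, $applied) = @_;
    my %done = map { ($_ => 1) } @$applied;
    my @todo = grep { !$done{$_} } @$all_files;
    # The YYYYMMDD prefix makes a plain string sort chronological.
    return [ sort @todo ];
}

my $pending = pending_migrations(
    [
        '20230315-mbs-12347-add-recording-length-index.sql',
        '20230115-mbs-12345-add-genre-table.sql',
        '20230220-mbs-12346-add-event-series-relationship.sql',
    ],
    ['20230115-mbs-12345-add-genre-table.sql'],
);
print "$_\n" for @$pending;
```

Because there is no rollback, nothing is ever removed from the applied set; a mistake is corrected by adding a later migration.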
||||
|
||||
## Custom Data Access Layer (Moose-based)
|
||||
|
||||
### Architecture
|
||||
|
||||
**NOT DBIx::Class** - MusicBrainz uses a custom Moose-based data access layer.
|
||||
|
||||
**Components:**
|
||||
- 106 Data modules in `lib/MusicBrainz/Server/Data/`
|
||||
- `DBIx::Connector` for persistent, fork-safe connection handling (it manages one connection per process, not a pool)
|
||||
- `Sql.pm` for query abstraction
|
||||
- Raw SQL via `DBD::Pg`
|
||||
|
||||
### Data Module Pattern
|
||||
|
||||
**Base Class:** `MusicBrainz::Server::Data::Entity`
|
||||
|
||||
**Example:**
|
||||
```perl
|
||||
package MusicBrainz::Server::Data::Artist;
use Moose;
extends 'MusicBrainz::Server::Data::Entity';

with 'MusicBrainz::Server::Data::Role::Editable';
with 'MusicBrainz::Server::Data::Role::LinksToEdit';
with 'MusicBrainz::Server::Data::Role::Merge';

# Table and entity class handled by this data module
sub _table { 'artist' }
sub _entity_class { 'MusicBrainz::Server::Entity::Artist' }

# Columns selected when loading artist rows
sub _columns {
    return 'id, gid, name, sort_name, begin_date_year, begin_date_month,
            begin_date_day, end_date_year, end_date_month, end_date_day,
            type, area, gender, comment, edits_pending, last_updated,
            ended, begin_area, end_area';
}

# Entity attribute => database column
sub _column_mapping {
    return {
        id            => 'id',
        gid           => 'gid',
        name          => 'name',
        sort_name     => 'sort_name',
        type_id       => 'type',
        area_id       => 'area',
        gender_id     => 'gender',
        comment       => 'comment',
        edits_pending => 'edits_pending',
        last_updated  => 'last_updated',
        ended         => 'ended',
        begin_area_id => 'begin_area',
        end_area_id   => 'end_area',
    };
}

sub get_by_gid {
    my ($self, $gid) = @_;
    return $self->_get_by_key('gid', $gid);
}

sub insert {
    my ($self, $data) = @_;
    my $row = $self->_hash_to_row($data);
    # insert_row returns the generated primary key; keep it on the row
    # so that the entity built from it carries its new id
    $row->{id} = $self->sql->insert_row('artist', $row, 'id');
    return $self->_new_from_row($row);
}
|
||||
```
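
To make the role of `_column_mapping` concrete, here is a minimal sketch of how an attribute-to-column mapping can be applied to a database row. It is written in Python for illustration only; `row_to_attributes` is a hypothetical helper, the mapping is a shortened copy of the one above, and the row values other than the name are made up.

```python
def row_to_attributes(row, column_mapping):
    """Rename database columns to entity attribute names.

    column_mapping has the same orientation as the Perl example:
    entity attribute => database column.
    """
    return {
        attr: row[column]
        for attr, column in column_mapping.items()
        if column in row
    }


ARTIST_MAPPING = {
    "id": "id",
    "name": "name",
    "type_id": "type",
    "area_id": "area",
}

# Illustrative row as it might come back from a SELECT on the artist table
row = {"id": 5, "name": "Nirvana", "type": 2, "area": 222}
attrs = row_to_attributes(row, ARTIST_MAPPING)
```

The result exposes `type_id` and `area_id` instead of the raw `type` and `area` columns, which is why the mapping is needed in addition to the column list.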
|
||||
|
||||
### Moose Roles
|
||||
|
||||
**Role::Editable:**
|
||||
- Entities that can be edited via the edit system
|
||||
- Provides `load_meta()` for edit counts
|
||||
|
||||
**Role::Taggable:**
|
||||
- Entities that support folksonomy tags
|
||||
- Provides `tags()`, `add_tags()`, `remove_tags()`
|
||||
|
||||
**Role::Rateable:**
|
||||
- Entities that can be rated (0-100 scale)
|
||||
- Provides `rating()`, `user_rating()`
|
||||
|
||||
**Role::Relatable:**
|
||||
- Entities that can have relationships
|
||||
- Provides `relationships()`, `add_relationship()`
|
||||
|
||||
**Role::Aliasable:**
|
||||
- Entities that can have alternative names
|
||||
- Provides `aliases()`, `add_alias()`
|
||||
|
||||
**Role::Annotation:**
|
||||
- Entities that can be annotated
|
||||
- Provides `latest_annotation()`
|
||||
|
||||
### Sql.pm Abstraction
|
||||
|
||||
**Location:** `lib/MusicBrainz/Server/Sql.pm`
|
||||
|
||||
**Purpose:** Thin abstraction over DBI for common query patterns.
|
||||
|
||||
**Methods:**
|
||||
```perl
|
||||
# Single row
my $row = $sql->select_single_row_hash(
    'SELECT * FROM artist WHERE gid = ?', $gid
);

# Multiple rows
my $rows = $sql->select_list_of_hashes(
    'SELECT * FROM artist WHERE area = ?', $area_id
);

# Insert
my $id = $sql->insert_row('artist', {
    gid       => $gid,
    name      => $name,
    sort_name => $sort_name,
}, 'id');

# Update
$sql->update_row('artist', {
    name => $new_name,
}, { id => $artist_id });

# Delete
$sql->delete_row('artist', { id => $artist_id });

# Transaction
$sql->begin;
eval {
    $sql->insert_row(...);
    $sql->update_row(...);
    $sql->commit;
};
if ($@) {
    $sql->rollback;
    die $@;
}
|
||||
```
|
||||
|
||||
### DBIx::Connector
|
||||
|
||||
**Purpose:** Fast, safe DBI connection management with automatic reconnection.
|
||||
|
||||
**Configuration:**
|
||||
```perl
|
||||
my $conn = DBIx::Connector->new(
    $dsn, $username, $password,
    {
        RaiseError     => 1,
        AutoCommit     => 1,
        pg_enable_utf8 => 1,
    }
);

# Execute with automatic reconnection
$conn->run(sub {
    my $dbh = $_;
    $dbh->do('SELECT ...');
});
|
||||
```
|
||||
|
||||
## Search Infrastructure
|
||||
|
||||
### Apache Solr (Primary)
|
||||
|
||||
**Purpose:** Full-text search across all entities
|
||||
|
||||
**Cores:**
|
||||
- `artist` - Artist search
|
||||
- `release` - Release search
|
||||
- `release-group` - Release group search
|
||||
- `recording` - Recording search
|
||||
- `work` - Work search
|
||||
- `label` - Label search
|
||||
- `area` - Area search
|
||||
- `event` - Event search
|
||||
- `place` - Place search
|
||||
- `series` - Series search
|
||||
- `instrument` - Instrument search
|
||||
- `tag` - Tag search
|
||||
|
||||
**Indexing:**
|
||||
- Incremental updates via edit system
|
||||
- Full reindex via `admin/BuildSearchIndexes.pl`
|
||||
- Real-time updates for new entities
|
||||
|
||||
**Query Features:**
|
||||
- Fuzzy matching
|
||||
- Phrase search
|
||||
- Boolean operators (AND, OR, NOT)
|
||||
- Field-specific search (artist:nirvana)
|
||||
- Wildcards (nirv*)
|
||||
- Proximity search ("smells spirit"~5)
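
The query features above combine into a single Lucene-style query string. The following sketch shows one way a client might assemble such a string safely. The helpers `escape_term` and `field` are hypothetical, and the field names `artist` and `release` are assumed to be valid for the index being searched.

```python
import re

# Characters and operators with special meaning in Lucene query syntax
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_term(text):
    """Backslash-escape Lucene special characters in user-supplied text."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def field(name, value):
    """Build a field-specific phrase clause such as artist:"nirvana"."""
    return f'{name}:"{escape_term(value)}"'


# Boolean combination of two field-specific clauses
query = " AND ".join([field("artist", "nirvana"), field("release", "nevermind")])

# Wildcard and proximity queries are written literally, without escaping
wildcard_query = "nirv*"
proximity_query = '"smells spirit"~5'
```

Escaping matters mostly for names that contain operators themselves: `escape_term("AC/DC")` yields `AC\/DC`, so the slash is not interpreted as the start of a regular expression.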
|
||||
|
||||
### PostgreSQL Full-Text (Fallback)
|
||||
|
||||
**Purpose:** Fallback when Solr is unavailable
|
||||
|
||||
**Implementation:**
|
||||
- `mb_simple_tsvector` function for text vectorization
|
||||
- GIN indexes on tsvector columns
|
||||
- `to_tsquery()` for query parsing
|
||||
|
||||
**Example:**
|
||||
```sql
|
||||
CREATE INDEX artist_idx_name_txt ON artist
|
||||
USING gin(mb_simple_tsvector(name));
|
||||
|
||||
SELECT * FROM artist
|
||||
WHERE mb_simple_tsvector(name) @@ to_tsquery('simple', 'nirvana');
|
||||
```
|
||||
|
||||
**Limitations:**
|
||||
- Less sophisticated than Solr
|
||||
- No fuzzy matching
|
||||
- Limited ranking
|
||||
- Used only as emergency fallback
|
||||
|
||||
## Redis Caching
|
||||
|
||||
### Architecture
|
||||
|
||||
**Databases:** 16 separate Redis databases (0-15)
|
||||
|
||||
**Database Allocation:**
|
||||
- DB 0: Entity cache (GID lookups)
|
||||
- DB 1: Session storage
|
||||
- DB 2-15: Various caches (search, statistics, etc.)
|
||||
|
||||
### Entity Cache (GID Cache)
|
||||
|
||||
**Purpose:** Cache entity lookups by MBID (GID)
|
||||
|
||||
**Pattern:**
|
||||
```perl
|
||||
# Cache key: {entity_type}:gid:{gid}
my $cache_key = "artist:gid:$gid";

# Try cache first
my $cached = $redis->get($cache_key);
if (defined $cached) {
    return decode_json($cached);
}

# Cache miss - load from database
my $artist = $self->sql->select_single_row_hash(
    'SELECT * FROM artist WHERE gid = ?', $gid
);

# Nothing to cache if no artist has this GID
return unless $artist;

# Store in cache (1 hour TTL)
$redis->setex($cache_key, 3600, encode_json($artist));

return $artist;
|
||||
```
|
||||
|
||||
**TTL:** 1 hour (3600 seconds)
|
||||
|
||||
**Invalidation:** On edit application
|
||||
|
||||
### Session Storage
|
||||
|
||||
**Purpose:** Store user sessions
|
||||
|
||||
**Pattern:**
|
||||
```perl
|
||||
# Session key: session:{session_id}
my $session_key = "session:$session_id";

# Store session (36000 seconds = 10 hours)
$redis->setex($session_key, 36000, encode_json({
    user_id       => $user_id,
    csrf_token    => $csrf_token,
    last_activity => time(),
}));

# Retrieve session; GET returns undef once the key has expired
my $raw = $redis->get($session_key);
my $session = defined $raw ? decode_json($raw) : undef;
|
||||
```
|
||||
|
||||
**TTL:** 10 hours absolute, 3 hours idle
|
||||
|
||||
**Cookie:** `AF_SID` (SameSite=Lax, Secure, HttpOnly)
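
The Redis TTL only enforces the absolute limit. The idle limit has to be checked against `last_activity` when the session is read. A minimal sketch of that check follows; it assumes the session also records a creation timestamp (`created_at` is a hypothetical field, not shown in the payload above).

```python
ABSOLUTE_TTL = 10 * 3600  # 10 hours since the session was created
IDLE_TTL = 3 * 3600       # 3 hours since the last request


def session_is_valid(created_at, last_activity, now):
    """Return True while neither the absolute nor the idle limit is exceeded.

    All three arguments are Unix timestamps in seconds.
    """
    within_absolute = (now - created_at) < ABSOLUTE_TTL
    within_idle = (now - last_activity) < IDLE_TTL
    return within_absolute and within_idle
```

A session that is used every hour therefore still ends after 10 hours, while an unused one ends after 3.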
|
||||
|
||||
### Cache Invalidation
|
||||
|
||||
**Strategy:** Invalidate on write
|
||||
|
||||
**Example:**
|
||||
```perl
|
||||
# After updating artist
|
||||
$self->sql->update_row('artist', { name => $new_name }, { id => $id });
|
||||
|
||||
# Invalidate cache
|
||||
$redis->del("artist:gid:$gid");
|
||||
```
|
||||
|
||||
**Bulk Invalidation:**
|
||||
- Pattern-based deletion via `SCAN` + `DEL`
|
||||
- Used for relationship changes affecting multiple entities
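
Pattern-based invalidation with `SCAN` + `DEL` can be sketched as below. The function expects a client object exposing `scan_iter(match=..., count=...)` and `delete(*keys)`, which is the shape of the redis-py client. `InMemoryClient` is a hypothetical stand-in included only so that the sketch runs without a Redis server, and the key pattern follows the `artist:gid:{gid}` scheme shown earlier.

```python
import fnmatch


def invalidate_pattern(client, pattern, batch_size=500):
    """Delete every key matching pattern and return how many were removed.

    Keys are collected with SCAN (non-blocking, cursor based) and removed
    in batches, rather than with a single blocking KEYS call.
    """
    deleted = 0
    batch = []
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            deleted += client.delete(*batch)
            batch = []
    if batch:
        deleted += client.delete(*batch)
    return deleted


class InMemoryClient:
    """Dictionary-backed stand-in with the two methods used above (demo only)."""

    def __init__(self, data):
        self.data = dict(data)

    def scan_iter(self, match=None, count=None):
        for key in list(self.data):
            if match is None or fnmatch.fnmatchcase(key, match):
                yield key

    def delete(self, *keys):
        return sum(1 for key in keys if self.data.pop(key, None) is not None)
```

For example, `invalidate_pattern(client, "artist:gid:*")` removes all cached artists while leaving release entries untouched.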
|
||||
@@ -0,0 +1,707 @@
|
||||
# MusicBrainz Server Deployment
|
||||
|
||||
## Docker Architecture
|
||||
|
||||
### Build System
|
||||
|
||||
**Template Engine:** M4 macros
|
||||
**Base Image:** Ubuntu Noble (24.04 LTS)
|
||||
**Dockerfile Location:** `docker/Dockerfile.template`
|
||||
|
||||
**Template Processing:**
|
||||
```bash
|
||||
# Generate Dockerfile from template
|
||||
m4 docker/Dockerfile.template > docker/Dockerfile
|
||||
```
|
||||
|
||||
**M4 Macros:**
|
||||
- `INSTALL_PERL_DEPENDENCIES` - Install Perl modules via carton
|
||||
- `INSTALL_NODE_DEPENDENCIES` - Install Node.js packages via yarn
|
||||
- `COMPILE_RESOURCES` - Compile static assets
|
||||
- `SETUP_DATABASE` - Initialize PostgreSQL schema
|
||||
|
||||
**Multi-Stage Build:**
|
||||
1. Base stage - Install system dependencies
|
||||
2. Build stage - Compile assets and dependencies
|
||||
3. Runtime stage - Copy artifacts, minimal runtime
|
||||
|
||||
### Container Types
|
||||
|
||||
**website:**
|
||||
- Main web application
|
||||
- Serves HTML pages via Template Toolkit
|
||||
- Handles user authentication and sessions
|
||||
- Port: 5000
|
||||
|
||||
**webservice:**
|
||||
- API endpoints (/ws/2/)
|
||||
- JSON/XML serialization
|
||||
- OAuth authentication
|
||||
- Port: 5001
|
||||
|
||||
**tests:**
|
||||
- Run test suites
|
||||
- Perl unit tests
|
||||
- JavaScript tests
|
||||
- pgTAP database tests
|
||||
- No exposed ports (ephemeral)
|
||||
|
||||
**cron:**
|
||||
- Scheduled tasks
|
||||
- Statistics calculation
|
||||
- Data cleanup
|
||||
- Replication packet export
|
||||
- No exposed ports
|
||||
|
||||
**sitemaps:**
|
||||
- Generate XML sitemaps
|
||||
- Update search engine indexes
|
||||
- Run daily
|
||||
- No exposed ports
|
||||
|
||||
**json-dump:**
|
||||
- Export database to JSON
|
||||
- Generate data dumps for download
|
||||
- Run weekly
|
||||
- No exposed ports
|
||||
|
||||
**solr-backup:**
|
||||
- Backup Solr indexes
|
||||
- Run daily
|
||||
- No exposed ports
|
||||
|
||||
**template-renderer:**
|
||||
- Isolated Template Toolkit renderer
|
||||
- Forked from main process
|
||||
- Prevents template errors from crashing main app
|
||||
- IPC via Unix socket
|
||||
|
||||
### Docker Compose
|
||||
|
||||
**File:** `docker-compose.yml`
|
||||
|
||||
**Services:**
|
||||
```yaml
|
||||
services:
  db:
    image: postgres:16
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: musicbrainz
      POSTGRES_PASSWORD: musicbrainz
      POSTGRES_DB: musicbrainz_db
    ports:
      - "5432:5432"

  redis:
    image: redis:7
    volumes:
      - redisdata:/data
    ports:
      - "6379:6379"

  solr:
    image: solr:8.11
    volumes:
      - solrdata:/var/solr
    ports:
      - "8983:8983"

  website:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: website
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5000:5000"
    environment:
      MUSICBRAINZ_SERVER_PROCESSES: 10
      MUSICBRAINZ_USE_PROXY: 1

  webservice:
    build:
      context: .
      dockerfile: docker/Dockerfile
      target: webservice
    depends_on:
      - db
      - redis
      - solr
    ports:
      - "5001:5001"

volumes:
  pgdata:
  redisdata:
  solrdata:
|
||||
```
|
||||
|
||||
### Image Layers
|
||||
|
||||
**Base Layer (Ubuntu Noble):**
|
||||
- System packages (build-essential, libpq-dev, etc.)
|
||||
- Perl 5.38
|
||||
- Node.js 20
|
||||
- PostgreSQL client libraries
|
||||
|
||||
**Dependency Layer:**
|
||||
- Perl modules (via carton)
|
||||
- Node.js packages (via yarn)
|
||||
- Cached for faster rebuilds
|
||||
|
||||
**Application Layer:**
|
||||
- Application code
|
||||
- Compiled assets
|
||||
- Configuration templates
|
||||
|
||||
**Runtime Layer:**
|
||||
- Minimal runtime dependencies
|
||||
- No build tools
|
||||
- Smaller image size
|
||||
|
||||
## PSGI Server Configuration
|
||||
|
||||
### Starlet
|
||||
|
||||
**Server:** Starlet (high-performance PSGI server)
|
||||
**Protocol:** HTTP/1.1
|
||||
**Concurrency:** Pre-forking worker model
|
||||
|
||||
**Configuration:**
|
||||
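```bash
# Start Starlet with 10 workers via plackup; each worker exits after a
# random number of requests between the min and max limits
plackup -s Starlet \
  --max-workers=10 \
  --min-reqs-per-child=30 \
  --max-reqs-per-child=90 \
  --port=5000 \
  app.psgi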
|
||||
```
|
||||
|
||||
**Worker Settings:**
|
||||
- **Workers:** 10 (configurable via `MUSICBRAINZ_SERVER_PROCESSES`)
|
||||
- **Max Requests per Worker:** 30-90 (random to prevent thundering herd)
|
||||
- **Worker Timeout:** 300 seconds (5 minutes)
|
||||
- **Keepalive:** Enabled (60 seconds)
|
||||
|
||||
**Worker Lifecycle:**
|
||||
1. Master process forks 10 workers
|
||||
2. Each worker handles requests until max_requests reached
|
||||
3. Worker exits gracefully
|
||||
4. Master forks new worker to replace it
|
||||
5. Prevents memory leaks from accumulating
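
The effect of the randomized per-worker request limit can be illustrated with a small simulation. This is a sketch of the idea rather than Starlet's implementation: `pick_request_budget` and `simulate_recycling` are hypothetical helpers, and requests are assumed to be spread evenly across workers.

```python
import random


def pick_request_budget(rng, low=30, high=90):
    """Number of requests a freshly forked worker serves before exiting."""
    return rng.randint(low, high)


def simulate_recycling(total_requests, workers=10, seed=0):
    """Distribute requests round-robin and count worker replacements."""
    rng = random.Random(seed)
    remaining = [pick_request_budget(rng) for _ in range(workers)]
    replacements = 0
    for i in range(total_requests):
        slot = i % workers
        remaining[slot] -= 1
        if remaining[slot] == 0:
            # The worker exits; the master forks a replacement with a new budget
            remaining[slot] = pick_request_budget(rng)
            replacements += 1
    return replacements
```

Because every worker draws its own budget, the workers reach their limits at different times. With a fixed limit and evenly distributed traffic, all ten workers would exit at nearly the same moment and the pool would briefly have no capacity.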
|
||||
|
||||
### Server::Starter (Zero-Downtime Restarts)
|
||||
|
||||
**Purpose:** Enable zero-downtime deployments
|
||||
|
||||
**Mechanism:**
|
||||
1. Server::Starter binds to port
|
||||
2. Forks Starlet with inherited socket
|
||||
3. On restart signal (HUP):
|
||||
- Start new Starlet process
|
||||
- New process inherits the same listening socket
|
||||
- Old process finishes existing requests
|
||||
- Old process exits
|
||||
- No dropped connections
|
||||
|
||||
**Command:**
|
||||
```bash
|
||||
start_server \
|
||||
--port 5000 \
|
||||
--pid-file /var/run/musicbrainz.pid \
|
||||
--status-file /var/run/musicbrainz.status \
|
||||
-- \
|
||||
plackup -s Starlet --max-workers=10 app.psgi
|
||||
```
|
||||
|
||||
**Restart:**
|
||||
```bash
|
||||
# Send HUP signal to trigger graceful restart
|
||||
kill -HUP $(cat /var/run/musicbrainz.pid)
|
||||
```
|
||||
|
||||
**Status Check:**
|
||||
```bash
|
||||
# Check server status
|
||||
cat /var/run/musicbrainz.status
|
||||
# Output: one GENERATION:PID line per running server generation, e.g. 1:1234
|
||||
```
|
||||
|
||||
### Reverse Proxy
|
||||
|
||||
**Production Setup:** Nginx reverse proxy in front of Starlet
|
||||
|
||||
**Nginx Configuration:**
|
||||
```nginx
|
||||
upstream musicbrainz {
    server localhost:5000;
    keepalive 32;
}

server {
    listen 80;
    server_name musicbrainz.org;

    location / {
        proxy_pass http://musicbrainz;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    location /static/ {
        alias /var/www/musicbrainz/root/static/;
        expires 1y;
        add_header Cache-Control "public, immutable";
    }
}
|
||||
```
|
||||
|
||||
**Benefits:**
|
||||
- SSL termination
|
||||
- Static file serving
|
||||
- Gzip compression
|
||||
- Request buffering
|
||||
- Load balancing (multiple Starlet instances)
|
||||
|
||||
## CI/CD Pipeline
|
||||
|
||||
### GitHub Actions
|
||||
|
||||
**Workflow File:** `.github/workflows/test.yml`
|
||||
|
||||
**Triggers:**
|
||||
- Push to main branch
|
||||
- Pull requests
|
||||
- Manual workflow dispatch
|
||||
|
||||
### Build Stage
|
||||
|
||||
**Job:** `build-tests-image`
|
||||
|
||||
**Steps:**
|
||||
1. Checkout code
|
||||
2. Set up Docker Buildx
|
||||
3. Build test Docker image
|
||||
4. Push to GitHub Container Registry
|
||||
5. Cache layers for faster rebuilds
|
||||
|
||||
**Dockerfile:** `docker/Dockerfile.test`
|
||||
|
||||
**Caching:**
|
||||
- Perl dependencies cached by cpanfile.snapshot hash
|
||||
- Node dependencies cached by yarn.lock hash
|
||||
- Docker layer caching via GitHub Actions cache
|
||||
|
||||
### Test Stages
|
||||
|
||||
**Job:** `js-perl-and-pgtap`
|
||||
|
||||
**Matrix:**
|
||||
- Perl 5.38.0 (stable)
|
||||
- Perl 5.42.0 (latest)
|
||||
|
||||
**Steps:**
|
||||
1. Pull test image from registry
|
||||
2. Start PostgreSQL container
|
||||
3. Start Redis container
|
||||
4. Initialize test database
|
||||
5. Run Perl tests (`prove -lr t/`)
|
||||
6. Run JavaScript tests (`yarn test`)
|
||||
7. Run pgTAP tests (`pg_prove -d musicbrainz_test t/pgtap/`)
|
||||
8. Upload coverage reports
|
||||
|
||||
**Parallelization:** Tests run in parallel across matrix
|
||||
|
||||
### Selenium Tests
|
||||
|
||||
**Jobs:** `selenium-1`, `selenium-2`, `selenium-3`, `selenium-4`
|
||||
|
||||
**Partitioning:** Tests split into 4 partitions for parallel execution
|
||||
|
||||
**Steps:**
|
||||
1. Pull test image
|
||||
2. Start PostgreSQL, Redis, Solr
|
||||
3. Start Selenium standalone Chrome
|
||||
4. Initialize test database with sample data
|
||||
5. Start MusicBrainz server
|
||||
6. Run Selenium tests for partition
|
||||
7. Upload screenshots on failure
|
||||
|
||||
**Partition Strategy:**
|
||||
```bash
|
||||
# Partition 1: Artist and release tests
|
||||
# Partition 2: Recording and work tests
|
||||
# Partition 3: Edit and relationship tests
|
||||
# Partition 4: Search and browse tests
|
||||
```
|
||||
|
||||
**Selenium Configuration:**
|
||||
```perl
|
||||
# t/selenium.pl
use Selenium::Remote::Driver;

my $driver = Selenium::Remote::Driver->new(
    remote_server_addr => 'localhost',
    port               => 4444,
    browser_name       => 'chrome',
    extra_capabilities => {
        chromeOptions => {
            args => ['--headless', '--no-sandbox', '--disable-dev-shm-usage'],
        },
    },
);
|
||||
```
|
||||
|
||||
### Second-Tier Tests
|
||||
|
||||
**Job:** `second-perl-and-pgtap`
|
||||
|
||||
**Purpose:** Test against Perl 5.42.0 (latest stable)
|
||||
|
||||
**Trigger:** After main tests pass
|
||||
|
||||
**Allowed to Fail:** Yes (informational only)
|
||||
|
||||
### Report Generation
|
||||
|
||||
**Job:** `generate-reports`
|
||||
|
||||
**Steps:**
|
||||
1. Download coverage reports from all test jobs
|
||||
2. Merge coverage data
|
||||
3. Generate HTML coverage report
|
||||
4. Upload to Codecov
|
||||
5. Comment on PR with coverage summary
|
||||
|
||||
**Coverage Tools:**
|
||||
- Perl: Devel::Cover
|
||||
- JavaScript: Istanbul/nyc
|
||||
|
||||
## Build Process
|
||||
|
||||
### Step 1: Install Perl Dependencies
|
||||
|
||||
```bash
|
||||
# Install Carton (Perl dependency manager)
|
||||
cpanm --notest Carton
|
||||
|
||||
# Install dependencies from cpanfile.snapshot
|
||||
carton install --deployment
|
||||
```
|
||||
|
||||
**Dependencies Installed:**
|
||||
- Catalyst framework
|
||||
- Moose object system
|
||||
- DBD::Pg database driver
|
||||
- Template::Toolkit
|
||||
- JSON::XS
|
||||
- XML::LibXML
|
||||
- Redis client
|
||||
- ~200 total CPAN modules
|
||||
|
||||
**Installation Time:** ~10 minutes (first time), ~1 minute (cached)
|
||||
|
||||
### Step 2: Install Node.js Dependencies
|
||||
|
||||
```bash
|
||||
# Install Yarn (if not present)
|
||||
npm install -g yarn
|
||||
|
||||
# Install dependencies from yarn.lock
|
||||
yarn install --frozen-lockfile
|
||||
```
|
||||
|
||||
**Dependencies Installed:**
|
||||
- React 19.2.4
|
||||
- Redux
|
||||
- Webpack 5
|
||||
- Babel 7
|
||||
- Jest (testing)
|
||||
- ESLint (linting)
|
||||
- ~500 total npm packages
|
||||
|
||||
**Installation Time:** ~5 minutes (first time), ~30 seconds (cached)
|
||||
|
||||
### Step 3: Compile Static Resources
|
||||
|
||||
```bash
|
||||
# Compile CSS, images, fonts
|
||||
./script/compile_resources.sh
|
||||
```
|
||||
|
||||
**Tasks:**
|
||||
- Compile LESS to CSS
|
||||
- Optimize images (pngcrush, optipng)
|
||||
- Copy fonts to static directory
|
||||
- Generate CSS sprites
|
||||
- Minify CSS
|
||||
|
||||
**Output:** `root/static/styles/`, `root/static/images/`
|
||||
|
||||
**Time:** ~2 minutes
|
||||
|
||||
### Step 4: Build JavaScript Bundles
|
||||
|
||||
```bash
|
||||
# Build production bundles with Webpack
|
||||
yarn run build
|
||||
|
||||
# Or for development (with source maps)
|
||||
yarn run build:dev
|
||||
```
|
||||
|
||||
**Webpack Configuration:**
|
||||
- Entry points: `root/static/scripts/main.js`, `root/static/scripts/edit.js`
|
||||
- Output: `root/static/build/`
|
||||
- Loaders: Babel (JSX, ES6+), CSS, file-loader
|
||||
- Plugins: UglifyJS, ExtractTextPlugin, DefinePlugin
|
||||
- Code splitting: Vendor bundle, async chunks
|
||||
|
||||
**Output Files:**
|
||||
- `main.bundle.js` - Main application code
|
||||
- `vendor.bundle.js` - Third-party libraries
|
||||
- `edit.bundle.js` - Edit interface code
|
||||
- `*.chunk.js` - Async-loaded chunks
|
||||
|
||||
**Time:** ~3 minutes (production), ~30 seconds (development)
|
||||
|
||||
### Step 5: Initialize Database
|
||||
|
||||
```bash
|
||||
# Create database
|
||||
createdb musicbrainz_db
|
||||
|
||||
# Load schema
|
||||
psql musicbrainz_db < admin/sql/CreateTables.sql
|
||||
|
||||
# Load initial data (--import expects the paths of the downloaded dump archives as arguments)
|
||||
./admin/InitDb.pl --createdb --import
|
||||
```
|
||||
|
||||
**Schema Loading:**
|
||||
- 375 tables created
|
||||
- 500+ foreign keys added
|
||||
- Indexes created
|
||||
- Triggers installed
|
||||
|
||||
**Initial Data:**
|
||||
- Countries and areas
|
||||
- Languages
|
||||
- Relationship types
|
||||
- Instrument types
|
||||
- Genre definitions
|
||||
|
||||
**Time:** ~10 minutes (schema), ~30 minutes (sample data)
|
||||
|
||||
### Step 6: Build Search Indexes
|
||||
|
||||
```bash
|
||||
# Build Solr indexes for all entities
|
||||
./admin/BuildSearchIndexes.pl --all
|
||||
```
|
||||
|
||||
**Indexes Built:**
|
||||
- Artist index
|
||||
- Release index
|
||||
- Recording index
|
||||
- Work index
|
||||
- Label index
|
||||
- Area, event, place, series, instrument indexes
|
||||
|
||||
**Time:** ~2 hours (full production data), ~5 minutes (sample data)
|
||||
|
||||
## System Requirements
|
||||
|
||||
### Minimum Requirements (Development)
|
||||
|
||||
**CPU:** 2 cores
|
||||
**RAM:** 4 GB
|
||||
**Disk:** 20 GB
|
||||
**Database:** PostgreSQL 16+
|
||||
**Cache:** Redis 6.0+
|
||||
**Search:** Solr 8.11+
|
||||
|
||||
### Recommended Requirements (Production)
|
||||
|
||||
**CPU:** 8+ cores
|
||||
**RAM:** 16+ GB
|
||||
**Disk:** 500+ GB SSD
|
||||
- 350 GB for PostgreSQL database
|
||||
- 50 GB for Solr indexes
|
||||
- 50 GB for backups
|
||||
- 50 GB for logs and temp files
|
||||
|
||||
**Database:** PostgreSQL 16+ with:
|
||||
- shared_buffers = 4GB
|
||||
- effective_cache_size = 12GB
|
||||
- work_mem = 64MB
|
||||
- maintenance_work_mem = 1GB
|
||||
|
||||
**Cache:** Redis 6.0+ with:
|
||||
- maxmemory = 2GB
|
||||
- maxmemory-policy = allkeys-lru
|
||||
|
||||
**Search:** Solr 8.11+ with:
|
||||
- Java heap = 4GB
|
||||
- Solr cache = 512MB per core
|
||||
|
||||
### Network Requirements
|
||||
|
||||
**Bandwidth:** 100 Mbps+ (for replication and API traffic)
|
||||
|
||||
**Ports:**
|
||||
- 5000 - Website
|
||||
- 5001 - Web service API
|
||||
- 5432 - PostgreSQL
|
||||
- 6379 - Redis
|
||||
- 8983 - Solr
|
||||
|
||||
**Firewall:**
|
||||
- Allow inbound 80/443 (HTTP/HTTPS)
|
||||
- Allow outbound 80/443 (external APIs)
|
||||
- Restrict 5432, 6379, 8983 to localhost
|
||||
|
||||
### Software Requirements
|
||||
|
||||
**Operating System:**
|
||||
- Ubuntu 24.04 LTS (Noble) - recommended
|
||||
- Debian 12 (Bookworm)
|
||||
- Any Linux with Perl 5.38+ and Node.js 20+
|
||||
|
||||
**Perl:** 5.38.0 or later (5.42.0 tested)
|
||||
|
||||
**Node.js:** 20.9.0 or later
|
||||
|
||||
**PostgreSQL:** 16.0 or later (16.3 recommended)
|
||||
|
||||
**Redis:** 6.0 or later (7.0 recommended)
|
||||
|
||||
**Solr:** 8.11 or later
|
||||
|
||||
**Optional:**
|
||||
- Docker 24.0+
|
||||
- Docker Compose 2.0+
|
||||
- Nginx 1.24+ (reverse proxy)
|
||||
- RabbitMQ 3.12+ (background jobs)
|
||||
|
||||
## Deployment Strategies
|
||||
|
||||
### Single Server
|
||||
|
||||
**Use Case:** Development, small mirrors
|
||||
|
||||
**Architecture:**
|
||||
- All services on one server
|
||||
- PostgreSQL, Redis, Solr, MusicBrainz on localhost
|
||||
- Nginx reverse proxy
|
||||
|
||||
**Pros:**
|
||||
- Simple setup
|
||||
- Low cost
|
||||
- Easy to manage
|
||||
|
||||
**Cons:**
|
||||
- Single point of failure
|
||||
- Limited scalability
|
||||
- Resource contention
|
||||
|
||||
### Multi-Server
|
||||
|
||||
**Use Case:** Production, high-traffic mirrors
|
||||
|
||||
**Architecture:**
|
||||
- Web tier: 2+ servers running MusicBrainz (load balanced)
|
||||
- Database tier: PostgreSQL primary + replicas
|
||||
- Cache tier: Redis (possibly clustered)
|
||||
- Search tier: Solr (possibly sharded)
|
||||
|
||||
**Pros:**
|
||||
- High availability
|
||||
- Horizontal scalability
|
||||
- Better performance
|
||||
|
||||
**Cons:**
|
||||
- Complex setup
|
||||
- Higher cost
|
||||
- Requires load balancer
|
||||
|
||||
### Docker Swarm / Kubernetes
|
||||
|
||||
**Use Case:** Large-scale deployments, cloud environments
|
||||
|
||||
**Architecture:**
|
||||
- Container orchestration
|
||||
- Auto-scaling
|
||||
- Service discovery
|
||||
- Health checks
|
||||
|
||||
**Pros:**
|
||||
- Automated deployment
|
||||
- Self-healing
|
||||
- Easy scaling
|
||||
|
||||
**Cons:**
|
||||
- Steep learning curve
|
||||
- Operational complexity
|
||||
- Overhead
|
||||
|
||||
## Monitoring and Logging
|
||||
|
||||
### Logging
|
||||
|
||||
**Framework:** Log::Dispatch
|
||||
|
||||
**Log Levels:**
|
||||
- DEBUG - Verbose debugging
|
||||
- INFO - Informational messages
|
||||
- WARN - Warnings
|
||||
- ERROR - Errors
|
||||
- FATAL - Fatal errors
|
||||
|
||||
**Log Destinations:**
|
||||
- STDOUT (development)
|
||||
- File (production): `/var/log/musicbrainz/server.log`
|
||||
- Syslog (optional)
|
||||
|
||||
**Log Rotation:**
|
||||
- Daily rotation
|
||||
- Keep 30 days
|
||||
- Compress old logs
|
||||
|
||||
### Error Tracking
|
||||
|
||||
**Platform:** Sentry
|
||||
|
||||
**Integration:**
|
||||
- Server-side: Perl Sentry SDK
|
||||
- Client-side: JavaScript Sentry SDK
|
||||
|
||||
**Captured:**
|
||||
- Exceptions
|
||||
- Error messages
|
||||
- Stack traces
|
||||
- Request context
|
||||
- User context
|
||||
|
||||
### Metrics
|
||||
|
||||
**Current State:** No Prometheus/metrics endpoint
|
||||
|
||||
**Workaround:** Parse logs for metrics
|
||||
|
||||
**Future:** Prometheus exporter planned
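
The log-parsing workaround can be as simple as counting messages per level. The sketch below assumes that each log line contains its level in square brackets, for example `[ERROR]`. The actual format of `server.log` is not documented here, so the pattern would need adjusting; `count_levels` is a hypothetical helper.

```python
import re
from collections import Counter

# Levels listed in the logging section; the bracketed format is an assumption
LEVEL_PATTERN = re.compile(r"\[(DEBUG|INFO|WARN|ERROR|FATAL)\]")


def count_levels(lines):
    """Count log lines per level, ignoring lines without a recognizable level."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "2024-05-01T12:00:00 [INFO] request completed",
    "2024-05-01T12:00:01 [ERROR] database timeout",
    "2024-05-01T12:00:02 [INFO] request completed",
    "continuation line without a level",
]
counts = count_levels(sample)
```

Feeding the counts into an external monitoring system on a schedule gives a rough error-rate metric until a native metrics endpoint exists.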
|
||||
|
||||
### Health Checks
|
||||
|
||||
**Current State:** No dedicated health check endpoint
|
||||
|
||||
**Workaround:** Check `/` returns 200
|
||||
|
||||
**Future:** `/health` endpoint planned
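
The workaround of checking that `/` returns 200 can be scripted with the standard library alone. In this sketch `is_healthy` is a hypothetical helper, and the base URL in the usage note assumes the website container's port 5000 from the sections above.

```python
import urllib.request


def is_healthy(base_url, timeout=5.0):
    """Return True if GET <base_url>/ answers with HTTP 200 within the timeout."""
    url = base_url.rstrip("/") + "/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and HTTP error
        # statuses (urllib raises HTTPError, an OSError subclass, for 4xx/5xx)
        return False
```

A probe would call `is_healthy("http://localhost:5000")`. Since rendering the home page is more expensive than a dedicated endpoint would be, a generous probe interval is advisable.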
|
||||
@@ -0,0 +1,513 @@
|
||||
# MusicBrainz Server Evaluation
|
||||
|
||||
## Strengths
|
||||
|
||||
### 1. Canonical Music Metadata Source
|
||||
|
||||
**Evidence:** MusicBrainz is the de facto standard for music metadata. Used by:
|
||||
- Spotify (artist/release matching)
|
||||
- Last.fm (scrobbling normalization)
|
||||
- Roon (music library management)
|
||||
- Picard (music tagging)
|
||||
- Beets (music organization)
|
||||
- Hundreds of other music applications
|
||||
|
||||
**Impact:** Any music metadata aggregator must include MusicBrainz data to be comprehensive. It's the foundation that other services build upon.
|
||||
|
||||
**Data Quality:** Community-driven editing with voting system ensures high accuracy. Over 2 million edits per year, with auto-editors providing quality control.
|
||||
|
||||
### 2. Massive, Comprehensive Dataset
|
||||
|
||||
**Scale (as of 2024):**
|
||||
- 2.1+ million artists
|
||||
- 3.5+ million releases
|
||||
- 30+ million recordings
|
||||
- 1.5+ million works
|
||||
- 1.3+ million labels
|
||||
- 100+ million relationships
|
||||
|
||||
**Coverage:** Extensive coverage across:
|
||||
- All genres (classical, jazz, rock, electronic, world music, etc.)
|
||||
- All eras (historical recordings to latest releases)
|
||||
- All regions (global coverage with strong international community)
|
||||
- All formats (vinyl, CD, digital, cassette, etc.)
|
||||
|
||||
**Relationships:** Rich relationship data connecting:
|
||||
- Artists to recordings (performer, conductor, engineer, etc.)
|
||||
- Recordings to works (performance of composition)
|
||||
- Artists to artists (member of, collaboration, etc.)
|
||||
- Releases to labels, areas, events, etc.
|
||||
|
||||
**Identifiers:** Comprehensive identifier coverage:
|
||||
- ISRCs (International Standard Recording Code)
|
||||
- ISWCs (International Standard Musical Work Code)
|
||||
- Barcodes (EAN, UPC)
|
||||
- Disc IDs (CD table of contents)
|
||||
- External links (Wikipedia, Discogs, AllMusic, etc.)
|
||||
|
||||
### 3. Mature, Battle-Tested Codebase
|
||||
|
||||
**Age:** 20+ years of continuous development (since 2001)
|
||||
|
||||
**Stability:** Proven reliability serving millions of requests daily with minimal downtime.
|
||||
|
||||
**Evolution:** Gradual modernization while maintaining backward compatibility:
|
||||
- Started with Template Toolkit (still used)
|
||||
- Added Knockout.js (being phased out)
|
||||
- Migrating to React (ongoing)
|
||||
- API has remained stable since v2 (2011)
|
||||
|
||||
**Community:** Large, active open-source community:
|
||||
- 500+ contributors on GitHub
|
||||
- Active development (commits daily)
|
||||
- Responsive to issues and pull requests
|
||||
- Strong documentation culture
|
||||
|
||||
### 4. Comprehensive, Well-Designed API
|
||||
|
||||
**Maturity:** API v2 stable since 2011, widely adopted
|
||||
|
||||
**Formats:** Multiple serialization formats:
|
||||
- JSON (modern, widely supported)
|
||||
- XML (legacy, still used by many clients)
|
||||
- JSON-LD (semantic web, Schema.org vocabulary)
|
||||
|
||||
**Features:**
|
||||
- Lookup by MBID (unique identifier)
|
||||
- Browse by relationships (all releases by artist, etc.)
|
||||
- Search with Lucene query syntax
|
||||
- Include parameters for fine-grained control
|
||||
- Pagination for large result sets
|
||||
- CORS enabled for browser clients
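
As an illustration of the lookup, browse, include, and pagination features, the sketch below builds request URLs against the base URL given in the API notes. `lookup_url` and `browse_url` are hypothetical helper names. The query parameters `inc`, `fmt`, `limit`, and `offset` are those of the web service, and the MBID in the usage note is the one for Nirvana.

```python
from urllib.parse import urlencode

BASE_URL = "https://musicbrainz.org/ws/2/"


def lookup_url(entity, mbid, inc=()):
    """URL for a single entity, optionally with inc= sub-resources."""
    params = {"fmt": "json"}
    if inc:
        # Multiple includes are joined with "+", which must stay unescaped
        params["inc"] = "+".join(inc)
    return f"{BASE_URL}{entity}/{mbid}?{urlencode(params, safe='+')}"


def browse_url(entity, linked_entity, mbid, limit=25, offset=0):
    """URL for one page of entities linked to another entity."""
    params = {linked_entity: mbid, "limit": limit, "offset": offset, "fmt": "json"}
    return f"{BASE_URL}{entity}?{urlencode(params)}"
```

For example, `browse_url("release", "artist", "5b11f4ce-a62d-471e-81fc-a69a8278c7da", limit=100, offset=200)` addresses the third page of 100 releases by that artist.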
|
||||
|
||||
**Rate Limiting:** Reasonable limits (1 req/sec recommended) with clear documentation
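
A client can respect the recommended rate by spacing its own requests. The following is a minimal single-threaded sketch. `Throttle` is a hypothetical class, and the injectable `clock` and `sleep` parameters exist only to make it testable without real delays.

```python
import time


class Throttle:
    """Block so that calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been made yet

    def wait(self):
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            # Too early: sleep for the remainder of the interval
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

Calling `throttle.wait()` immediately before each HTTP request keeps a sequential client at or below one request per second with the default interval.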
|
||||
|
||||
**Authentication:** Modern OAuth2 with PKCE for user-specific operations
|
||||
|
||||
**Documentation:** Comprehensive API docs with examples at musicbrainz.org/doc/Development/XML_Web_Service/Version_2
|
||||
|
||||
### 5. Transparent Edit/Voting System
|
||||
|
||||
**Command Pattern:** All modifications are versioned edits, providing:
|
||||
- Full audit trail (who changed what, when, why)
|
||||
- Rollback capability (edits can be reverted)
|
||||
- Transparency (all edits publicly visible)
|
||||
- Accountability (editors build reputation)
|
||||
|
||||
**Community Quality Control:**
|
||||
- 7-day voting period for most edits
|
||||
- Community votes yes/no/abstain
|
||||
- Auto-editors can approve immediately (earned privilege)
|
||||
- Failed edits can be resubmitted with improvements
|
||||
|
||||
**Edit Types:** 100+ edit types covering all operations:
|
||||
- Create/edit/delete entities
|
||||
- Add/edit/delete relationships
|
||||
- Merge duplicates
|
||||
- Add identifiers (ISRC, barcode, etc.)
|
||||
|
||||
**Benefits:**
|
||||
- High data quality through peer review
|
||||
- Prevents vandalism and spam
|
||||
- Encourages collaboration and discussion
|
||||
- Builds trust in the data
|
||||
|
||||
### 6. Replication Support for Mirrors
|
||||
|
||||
**Architecture:** Master-Mirror via dbmirror2 packet system
|
||||
|
||||
**Use Cases:**
|
||||
- Organizations needing local copy (reduced latency, offline access)
|
||||
- High-volume API users (avoid rate limits)
|
||||
- Research projects (full dataset access)
|
||||
- Backup/disaster recovery
|
||||
|
||||
**Replication Packets:**
|
||||
- Incremental updates (not full dumps)
|
||||
- Hourly packets available
|
||||
- Efficient bandwidth usage
|
||||
- Verifiable integrity
|
||||
|
||||
**Mirror Benefits:**
|
||||
- Full read access to entire dataset
|
||||
- No rate limiting
|
||||
- Custom queries and analytics
|
||||
- Integration with internal systems
|
||||
|
||||
### 7. Rich Relationship Model
|
||||
|
||||
**Advanced Relationships:** Not just artist-to-release, but:
|
||||
- Artist-to-artist (member of, collaboration, married to, etc.)
|
||||
- Recording-to-work (performance of composition)
|
||||
- Release-to-event (recorded at festival, etc.)
|
||||
- Work-to-work (arrangement of, medley of, etc.)
|
||||
|
||||
**Relationship Attributes:**
|
||||
- Dates (begin/end)
|
||||
- Credits (custom artist credits)
|
||||
- Instruments (performer played guitar, etc.)
|
||||
- Roles (producer, engineer, etc.)
|
||||
|
||||
**Use Cases:**
|
||||
- Music discovery (find similar artists)
|
||||
- Discography completeness (all releases by artist)
|
||||
- Session musician tracking (who played on what)
|
||||
- Classical music (composer, conductor, orchestra, etc.)
|
||||
|
||||
## Weaknesses
|
||||
|
||||
### 1. Perl Language Ecosystem Decline
|
||||
|
||||
**Evidence:**
|
||||
- Perl ranked #19 in TIOBE index (down from top 5 in 2000s)
|
||||
- Declining CPAN module releases (peak 2014, declining since)
|
||||
- Fewer Perl developers entering workforce
|
||||
- Most new web projects use Python, JavaScript, Go, Rust
|
||||
|
||||
**Impact:**
|
||||
- Harder to recruit Perl developers
|
||||
- Smaller pool of contributors
|
||||
- Slower adoption of modern practices
|
||||
- Dependency on aging CPAN modules
|
||||
|
||||
**Mitigation:**
|
||||
- MusicBrainz has stable, experienced Perl team
|
||||
- Codebase is well-documented
|
||||
- Gradual migration to JavaScript on frontend
|
||||
- API allows language-agnostic integration
|
||||
|
||||
**Reality Check:** While Perl is declining, MusicBrainz's Perl codebase is mature and stable. The bigger risk is long-term maintainability (10+ years), not immediate functionality.
|
||||
|
||||
### 2. Heavy Infrastructure Requirements
|
||||
|
||||
**Database Size:** ~350GB for production dataset (with indexes)
|
||||
|
||||
**Resource Requirements:**
|
||||
- 8+ CPU cores
|
||||
- 16+ GB RAM
|
||||
- 500+ GB SSD storage
|
||||
- PostgreSQL 16+ (specific version requirement)
|
||||
- Redis (16 databases)
|
||||
- Apache Solr (13 cores)
|
||||
|
||||
**Deployment Complexity:**
|
||||
- Multiple services to coordinate
|
||||
- Complex build process (Perl + Node.js)
|
||||
- Long initial setup (schema load, index build)
|
||||
- Replication setup requires FTP server
|
||||
|
||||
**Cost Implications:**
|
||||
- Self-hosting requires dedicated server (~$200+/month)
|
||||
- Cloud hosting even more expensive
|
||||
- Bandwidth costs for replication
|
||||
- Operational overhead (backups, monitoring, updates)
|
||||
|
||||
**Practical Impact:** For most use cases, using the public API is far more practical than self-hosting. Only large organizations with specific needs (high volume, custom queries, offline access) should consider self-hosting.
|
||||
|
||||
### 3. No Modern Observability
|
||||
|
||||
**Missing:**
|
||||
- Prometheus metrics endpoint
|
||||
- Structured logging (JSON logs)
|
||||
- Distributed tracing (OpenTelemetry)
|
||||
- Health check endpoint
|
||||
- Readiness/liveness probes
|
||||
|
||||
**Current State:**
|
||||
- Plain text logs
|
||||
- No metrics export
|
||||
- Manual log parsing for monitoring
|
||||
- No standardized health checks
|
||||
|
||||
**Impact:**
|
||||
- Harder to integrate with modern monitoring stacks (Grafana, Datadog, etc.)
|
||||
- Limited visibility into performance bottlenecks
|
||||
- Difficult to debug production issues
|
||||
- No SLO/SLA tracking
|
||||
|
||||
**Workarounds:**
|
||||
- Parse logs with Logstash/Fluentd
|
||||
- Monitor HTTP responses
|
||||
- Database query monitoring
|
||||
- Custom metrics collection
|
||||
|
||||
**Future:** Prometheus exporter is planned but not yet implemented.
|
||||
|
||||
### 4. Incomplete Frontend Modernization
|
||||
|
||||
**Legacy Code:**
|
||||
- Knockout.js still present in many views
|
||||
- jQuery used extensively
|
||||
- Inline JavaScript in templates
|
||||
- Mixed Template Toolkit + React
|
||||
|
||||
**Evidence:**
|
||||
- `root/static/scripts/` contains both Knockout and React
|
||||
- Some pages fully React, others fully Knockout, some mixed
|
||||
- Inconsistent UI patterns across pages
|
||||
|
||||
**Impact:**
|
||||
- Larger JavaScript bundle size
|
||||
- Maintenance burden (two frameworks)
|
||||
- Inconsistent user experience
|
||||
- Harder for new contributors
|
||||
|
||||
**Migration Status:**
|
||||
- New features use React
|
||||
- Old features gradually migrated
|
||||
- No timeline for complete migration
|
||||
- Knockout removal is low priority
|
||||
|
||||
**Reality Check:** This is a maintenance and consistency issue rather than a functional one. The site works well despite the mixed frontend, and for API users it is irrelevant.
|
||||
|
||||
### 5. Custom ORM Instead of Standard
|
||||
|
||||
**Architecture:** Custom Moose-based data layer, not DBIx::Class
|
||||
|
||||
**Characteristics:**
|
||||
- 106 Data modules (26,000 lines)
|
||||
- Raw SQL via DBD::Pg
|
||||
- Custom query builder (Sql.pm)
|
||||
- Moose roles for common patterns
|
||||
|
||||
**Drawbacks:**
|
||||
- Steeper learning curve for new contributors
|
||||
- No ecosystem of plugins/extensions
|
||||
- Manual query construction
|
||||
- No automatic migrations
|
||||
|
||||
**Benefits:**
|
||||
- Better performance (no ORM overhead)
|
||||
- Full control over SQL
|
||||
- Simpler for complex queries
|
||||
- Fewer dependencies
|
||||
|
||||
**Reality Check:** The custom ORM is well-designed and battle-tested. It's not a weakness in functionality, but in onboarding and maintainability. For a project this mature, changing to a standard ORM would be a massive undertaking with little benefit.
|
||||
|
||||
### 6. Limited Real-Time Capabilities
|
||||
|
||||
**Current State:**
|
||||
- No WebSocket support
|
||||
- No Server-Sent Events
|
||||
- No real-time notifications
|
||||
- Polling required for updates
|
||||
|
||||
**Impact:**
|
||||
- Edit notifications delayed
|
||||
- Search results not live-updated
|
||||
- Collaborative editing limited
|
||||
- Higher server load from polling
|
||||
|
||||
**Workarounds:**
|
||||
- Redis pub/sub for internal events
|
||||
- Periodic polling from clients
|
||||
- Email notifications for edits
|
||||
|
||||
**Future:** Real-time features not prioritized (low demand).
|
||||
|
||||
## Integration Considerations
|
||||
|
||||
### API Integration (Recommended)
|
||||
|
||||
**Best For:**
|
||||
- Most use cases
|
||||
- Low to medium volume (<1M requests/month)
|
||||
- No custom query requirements
|
||||
- Budget-conscious projects
|
||||
|
||||
**Approach:**
|
||||
```python
import requests

# Look up an artist by MBID
response = requests.get(
    'https://musicbrainz.org/ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da',
    params={'fmt': 'json', 'inc': 'releases+recordings'},
    headers={'User-Agent': 'MyApp/1.0 (contact@example.com)'},
    timeout=10,
)
response.raise_for_status()  # Fail loudly on 4xx/5xx (e.g. 503 when rate limited)
artist = response.json()
```
|
||||
|
||||
**Advantages:**
|
||||
- No infrastructure to manage
|
||||
- Always up-to-date data
|
||||
- No storage costs
|
||||
- Simple integration
|
||||
|
||||
**Limitations:**
|
||||
- Rate limiting (roughly 1 request per second per client)
|
||||
- Network latency
|
||||
- No custom queries
|
||||
- Dependent on MusicBrainz uptime
|
||||
|
||||
**Best Practices:**
|
||||
- Cache responses aggressively
|
||||
- Respect rate limits
|
||||
- Include User-Agent with contact info
|
||||
- Handle errors gracefully
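
The "respect rate limits" practice can be enforced on the client side. The following is a minimal sketch, assuming a one-second spacing between requests; the `Throttle` class and its `clock`/`sleep` parameters are hypothetical names introduced for illustration and are not part of any MusicBrainz library.

```python
import time


class Throttle:
    """Client-side limiter that spaces calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous call, None before the first one

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay:
            self._sleep(delay)
        self._last = self._clock()
        return delay
```

Calling `throttle.wait()` immediately before each `requests.get(...)` keeps a single process at or below one request per second; multiple processes sharing one IP address would need a shared limiter instead.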
|
||||
|
||||
### Replication/Mirror (Advanced)
|
||||
|
||||
**Best For:**
|
||||
- High volume (>10M requests/month)
|
||||
- Custom queries and analytics
|
||||
- Offline access required
|
||||
- Research projects
|
||||
|
||||
**Approach:**
|
||||
1. Set up PostgreSQL 16+ server (500GB+ storage)
|
||||
2. Download initial database dump
|
||||
3. Load schema and data
|
||||
4. Configure replication (RT_MIRROR mode)
|
||||
5. Download and apply hourly replication packets
|
||||
|
||||
**Advantages:**
|
||||
- No rate limiting
|
||||
- Full dataset access
|
||||
- Custom queries
|
||||
- Low latency
|
||||
|
||||
**Disadvantages:**
|
||||
- High infrastructure cost (~$200+/month)
|
||||
- Operational overhead
|
||||
- Replication lag (minutes to hours)
|
||||
- Storage requirements (350GB+)
|
||||
|
||||
**Maintenance:**
|
||||
- Apply replication packets hourly
|
||||
- Monitor replication lag
|
||||
- Rebuild indexes periodically
|
||||
- Backup database regularly
|
||||
|
||||
### Hybrid Approach (Optimal)
|
||||
|
||||
**Strategy:**
|
||||
- Use API for lookups and searches
|
||||
- Cache frequently accessed data locally
|
||||
- Replicate subset of data for custom queries
|
||||
- Fall back to API for cache misses
|
||||
|
||||
**Example:**
|
||||
```python
import requests

def get_artist(mbid, cache):
    """Return artist data, checking the local cache before calling the API."""
    artist = cache.get(f'artist:{mbid}')

    if not artist:
        # Cache miss - fetch from API (JSON must be requested explicitly)
        response = requests.get(
            f'https://musicbrainz.org/ws/2/artist/{mbid}',
            params={'fmt': 'json'},
            headers={'User-Agent': 'MyApp/1.0 (contact@example.com)'},
            timeout=10,
        )
        response.raise_for_status()
        artist = response.json()

        # Cache for 1 hour
        cache.set(f'artist:{mbid}', artist, ttl=3600)

    return artist
```
|
||||
|
||||
**Benefits:**
|
||||
- Lower API usage (respect rate limits)
|
||||
- Faster response times
|
||||
- Reduced infrastructure costs
|
||||
- Graceful degradation
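
The example above relies on a `cache` object exposing `get(key)` and `set(key, value, ttl=...)`. As a rough sketch of what such an object could look like, here is an in-memory version; `TTLCache` and its `invalidate` method are hypothetical names for illustration, and a production deployment would more likely use Redis or a similar shared store.

```python
import time


class TTLCache:
    """In-memory key/value store whose entries expire after a per-entry TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable for testing
        self._items = {}      # key -> (expires_at, value)

    def get(self, key):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]   # drop stale entries lazily on read
            return None
        return value

    def set(self, key, value, ttl):
        """Store `value` under `key` for `ttl` seconds."""
        self._items[key] = (self._clock() + ttl, value)

    def invalidate(self, key):
        """Remove an entry early, e.g. when the upstream record is known to have changed."""
        self._items.pop(key, None)
```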
|
||||
|
||||
## Relevance to Metadata Aggregator Project
|
||||
|
||||
### Primary Data Source
|
||||
|
||||
**Role:** MusicBrainz is the foundational music metadata source. All other music metadata projects reference or build upon MusicBrainz:
|
||||
|
||||
- **Discogs:** Cross-references MusicBrainz IDs
|
||||
- **Last.fm:** Uses MusicBrainz for artist/track normalization
|
||||
- **AcousticBrainz:** Audio analysis keyed by MusicBrainz recording ID
|
||||
- **ListenBrainz:** Listening history using MusicBrainz IDs
|
||||
- **CritiqueBrainz:** Reviews keyed by MusicBrainz release ID
|
||||
|
||||
**Implication:** A metadata aggregator without MusicBrainz is incomplete. MusicBrainz provides the canonical identifiers (MBIDs) that link data across services.
|
||||
|
||||
### Integration Priority: Critical
|
||||
|
||||
**Rationale:**
|
||||
1. **Canonical IDs:** MBIDs are the standard for music entity identification
|
||||
2. **Comprehensive Coverage:** Largest open music metadata database
|
||||
3. **Relationship Data:** Rich connections between entities
|
||||
4. **Community Trust:** High data quality through peer review
|
||||
5. **API Stability:** Mature, stable API with long-term support
|
||||
|
||||
**Recommended Integration:**
|
||||
- Use MusicBrainz API as primary metadata source
|
||||
- Cache responses locally (1-hour TTL)
|
||||
- Use MBIDs as primary keys in aggregator database
|
||||
- Cross-reference with other sources (Discogs, Last.fm, etc.)
|
||||
- Contribute improvements back to MusicBrainz
|
||||
|
||||
### Data Model Alignment
|
||||
|
||||
**MusicBrainz Entities Map Well to Aggregator Needs:**
|
||||
|
||||
| MusicBrainz Entity | Aggregator Use Case |
|-------------------|---------------------|
| Artist | Artist profiles, discographies |
| Release | Album/single metadata |
| Recording | Track metadata, audio fingerprinting |
| Work | Composition metadata, cover detection |
| Label | Label discographies, release attribution |
| Relationship | Music discovery, session musician tracking |
|
||||
|
||||
**Identifiers:**
|
||||
- MBID as primary key
|
||||
- ISRC for recording matching
|
||||
- Barcode for release matching
|
||||
- Disc ID for CD identification
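
Because MBIDs are UUIDs, an aggregator that uses them as primary keys can validate and normalize them before storage. The sketch below assumes the canonical form is the lowercase, hyphenated 36-character string; `normalize_mbid` is a hypothetical helper name.

```python
import uuid


def normalize_mbid(value):
    """Return the lowercase hyphenated form of an MBID, or None if it is not one."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return None
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and un-hyphenated hex;
    # reject those so only one spelling of each ID ever reaches the database.
    if str(parsed) != value.lower():
        return None
    return str(parsed)
```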
|
||||
|
||||
### Complementary Data Sources
|
||||
|
||||
**MusicBrainz Strengths:**
|
||||
- Canonical entity IDs
|
||||
- Relationship data
|
||||
- Release metadata
|
||||
- Identifier coverage
|
||||
|
||||
**MusicBrainz Gaps (fill with other sources):**
|
||||
- Album reviews → CritiqueBrainz, AllMusic
|
||||
- Listening statistics → Last.fm, Spotify
|
||||
- Audio features → AcousticBrainz, Spotify
|
||||
- Lyrics → Genius (LyricWiki shut down in 2020)
|
||||
- Album art → Cover Art Archive (integrated)
|
||||
- Popularity metrics → Last.fm, Spotify
|
||||
|
||||
### Implementation Roadmap
|
||||
|
||||
**Phase 1: Basic Integration**
|
||||
1. Implement MusicBrainz API client
|
||||
2. Cache artist/release/recording lookups
|
||||
3. Store MBIDs as primary keys
|
||||
4. Handle rate limiting gracefully
|
||||
|
||||
**Phase 2: Enhanced Integration**
|
||||
1. Implement relationship traversal
|
||||
2. Add search functionality
|
||||
3. Integrate Cover Art Archive
|
||||
4. Add identifier lookups (ISRC, barcode)
|
||||
|
||||
**Phase 3: Advanced Integration**
|
||||
1. Consider replication for high volume
|
||||
2. Contribute improvements to MusicBrainz
|
||||
3. Implement edit submission (if applicable)
|
||||
4. Add real-time update monitoring
|
||||
|
||||
**Phase 4: Ecosystem Integration**
|
||||
1. Integrate complementary services (Last.fm, etc.)
|
||||
2. Cross-reference data across sources
|
||||
3. Resolve conflicts and duplicates
|
||||
4. Build unified metadata view
|
||||
|
||||
## Conclusion
|
||||
|
||||
**Overall Assessment:** MusicBrainz is an essential, high-quality music metadata source with a mature codebase and comprehensive API. While it has some technical debt (Perl, legacy frontend, custom ORM), these are manageable and don't impact its value as a data source.
|
||||
|
||||
**Recommendation for Metadata Aggregator:**
|
||||
- **Priority:** Critical - integrate early
|
||||
- **Approach:** API-based with aggressive caching
|
||||
- **Timeline:** Phase 1 in first sprint
|
||||
- **Resources:** Low (API integration is straightforward)
|
||||
|
||||
**Key Takeaway:** MusicBrainz is the foundation of music metadata. Any serious music metadata aggregator must integrate MusicBrainz to be comprehensive and credible.
|
||||
@@ -0,0 +1,529 @@
|
||||
# MusicBrainz Server Integrations
|
||||
|
||||
## Cover Art Archive
|
||||
|
||||
### Overview
|
||||
|
||||
**Service:** Cover Art Archive (coverartarchive.org)
|
||||
**Storage:** Amazon S3 + Internet Archive
|
||||
**Purpose:** Store and serve album cover artwork
|
||||
|
||||
### Upload Process
|
||||
|
||||
**Method:** Signed POST to S3
|
||||
|
||||
**Authentication:** HMAC-SHA1 signed policy
|
||||
|
||||
**Configuration:**
|
||||
```perl
|
||||
# DBDefs.pm
|
||||
sub COVER_ART_ARCHIVE_ACCESS_KEY { 'access_key' }
|
||||
sub COVER_ART_ARCHIVE_SECRET_KEY { 'secret_key' }
|
||||
sub COVER_ART_ARCHIVE_UPLOAD_PREFIXER { 'MB' }
|
||||
sub COVER_ART_ARCHIVE_DOWNLOAD_PREFIX { 'https://coverartarchive.org' }
|
||||
```
|
||||
|
||||
**Upload Flow:**
|
||||
1. User uploads image via MusicBrainz interface
|
||||
2. Server generates S3 policy document
|
||||
3. Policy signed with HMAC-SHA1 using secret key
|
||||
4. Browser POSTs directly to S3 with signed policy
|
||||
5. S3 stores image and forwards to Internet Archive
|
||||
6. Image becomes available at coverartarchive.org
|
||||
|
||||
**Policy Document:**
|
||||
```json
|
||||
{
|
||||
"expiration": "2024-12-31T23:59:59Z",
|
||||
"conditions": [
|
||||
{"bucket": "mbid-{release_mbid}"},
|
||||
{"acl": "public-read"},
|
||||
["starts-with", "$key", "mbid-{release_mbid}/"],
|
||||
["content-length-range", 0, 10485760]
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Signature:**
|
||||
```perl
use Digest::SHA qw(hmac_sha1_base64);
use MIME::Base64 qw(encode_base64);

# Empty second argument suppresses the newlines encode_base64 inserts by default
my $policy_b64 = encode_base64($policy_json, '');
my $signature = hmac_sha1_base64($policy_b64, $secret_key);
$signature .= '=' while length($signature) % 4;  # Digest::SHA omits base64 padding
```
|
||||
|
||||
### Retrieval
|
||||
|
||||
**URL Pattern:** `https://coverartarchive.org/release/{mbid}/front`
|
||||
|
||||
**Image Types:**
|
||||
- `front` - Front cover
|
||||
- `back` - Back cover
|
||||
- `{id}` - Specific image by ID
|
||||
|
||||
**Sizes:**
|
||||
- Original (full resolution)
|
||||
- `250` - 250px thumbnail
|
||||
- `500` - 500px thumbnail
|
||||
- `1200` - 1200px large
|
||||
|
||||
**Example:**
|
||||
```
|
||||
https://coverartarchive.org/release/76df3287-6cda-33eb-8e9a-044b5e15ffdd/front-250.jpg
|
||||
```
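
The URL pattern, image types, and sizes listed above can be combined in a small helper. This is a sketch only: `cover_art_url` is a hypothetical function, and the `.jpg` suffix on thumbnails simply mirrors the example URL in this section.

```python
CAA_BASE = "https://coverartarchive.org"
THUMBNAIL_SIZES = {250, 500, 1200}   # sizes listed in this section


def cover_art_url(release_mbid, image="front", size=None):
    """Build a Cover Art Archive URL for a release image.

    `image` is "front", "back", or a specific image ID.
    `size` is None for the original image, or one of THUMBNAIL_SIZES.
    """
    if size is None:
        return f"{CAA_BASE}/release/{release_mbid}/{image}"
    if size not in THUMBNAIL_SIZES:
        raise ValueError(f"unsupported thumbnail size: {size}")
    return f"{CAA_BASE}/release/{release_mbid}/{image}-{size}.jpg"
```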
|
||||
|
||||
## Wikipedia/Wikidata/Wikimedia Commons
|
||||
|
||||
### MediaWiki API Integration
|
||||
|
||||
**Purpose:** Fetch article extracts, images, and structured data
|
||||
|
||||
**Endpoints:**
|
||||
- Wikipedia: `https://{lang}.wikipedia.org/w/api.php`
|
||||
- Wikidata: `https://www.wikidata.org/w/api.php`
|
||||
- Commons: `https://commons.wikimedia.org/w/api.php`
|
||||
|
||||
### Wikipedia Extracts
|
||||
|
||||
**API Action:** `query` with `prop=extracts`
|
||||
|
||||
**Request:**
|
||||
```perl
|
||||
my $url = "https://en.wikipedia.org/w/api.php?" .
|
||||
"action=query&" .
|
||||
"prop=extracts&" .
|
||||
"exintro=1&" .
|
||||
"explaintext=1&" .
|
||||
"titles=" . uri_escape($artist_name) .
|
||||
"&format=json";
|
||||
|
||||
my $response = $ua->get($url);
|
||||
my $data = decode_json($response->content);
|
||||
```
|
||||
|
||||
**Caching:** 3 days for extracts
|
||||
|
||||
**Display:** Artist/release pages show Wikipedia extract in sidebar
|
||||
|
||||
### Language Links
|
||||
|
||||
**API Action:** `query` with `prop=langlinks`
|
||||
|
||||
**Purpose:** Find Wikipedia articles in different languages
|
||||
|
||||
**Request:**
|
||||
```perl
my $url = "https://en.wikipedia.org/w/api.php?" .
    "action=query&" .
    "prop=langlinks&" .
    "titles=" . uri_escape($title) .
    "&lllimit=500" .
    "&format=json";
```
|
||||
|
||||
**Caching:** 7 days for language links
|
||||
|
||||
**Usage:** Display Wikipedia links in user's preferred language
|
||||
|
||||
### Wikidata Integration
|
||||
|
||||
**Purpose:** Fetch structured data (birth dates, locations, etc.)
|
||||
|
||||
**API Action:** `wbgetentities`
|
||||
|
||||
**Request:**
|
||||
```perl
# $wikidata_id holds the numeric part of the item ID (e.g. 42 for Q42)
my $url = "https://www.wikidata.org/w/api.php?" .
    "action=wbgetentities&" .
    "ids=Q$wikidata_id&" .
    "format=json";
```
|
||||
|
||||
**Data Extracted:**
|
||||
- Birth/death dates
|
||||
- Birth/death places
|
||||
- Occupations
|
||||
- Genres
|
||||
- Record labels
|
||||
- Official websites
|
||||
|
||||
### Wikimedia Commons Images
|
||||
|
||||
**Purpose:** Fetch artist/band photos
|
||||
|
||||
**API Action:** `query` with `prop=imageinfo`
|
||||
|
||||
**Request:**
|
||||
```perl
|
||||
my $url = "https://commons.wikimedia.org/w/api.php?" .
|
||||
"action=query&" .
|
||||
"prop=imageinfo&" .
|
||||
"iiprop=url|size|mime&" .
|
||||
"titles=File:" . uri_escape($filename) .
|
||||
"&format=json";
|
||||
```
|
||||
|
||||
**Display:** Artist pages show Commons images in sidebar
|
||||
|
||||
## CritiqueBrainz
|
||||
|
||||
### Overview
|
||||
|
||||
**Service:** CritiqueBrainz (critiquebrainz.org)
|
||||
**Purpose:** User-generated music reviews
|
||||
|
||||
### Integration
|
||||
|
||||
**Method:** URL linking
|
||||
|
||||
**Pattern:** `https://critiquebrainz.org/release/{mbid}`
|
||||
|
||||
**Display:** Release pages show link to CritiqueBrainz reviews
|
||||
|
||||
**Embedding:** Review count and average rating displayed on release pages
|
||||
|
||||
**API:** CritiqueBrainz API used to fetch review statistics
|
||||
|
||||
**Request:**
|
||||
```perl
|
||||
my $url = "https://critiquebrainz.org/ws/1/release/$mbid";
|
||||
my $response = $ua->get($url);
|
||||
my $data = decode_json($response->content);
|
||||
|
||||
my $review_count = $data->{review_count};
|
||||
my $avg_rating = $data->{average_rating};
|
||||
```
|
||||
|
||||
## Event Art Archive
|
||||
|
||||
### Overview
|
||||
|
||||
**Service:** Event Art Archive
|
||||
**Purpose:** Store event posters and promotional materials
|
||||
|
||||
**Architecture:** Similar to Cover Art Archive (S3 + Internet Archive)
|
||||
|
||||
**URL Pattern:** `https://eventartarchive.org/event/{mbid}`
|
||||
|
||||
## Discourse SSO
|
||||
|
||||
### Overview
|
||||
|
||||
**Service:** MusicBrainz Community Forum (community.metabrainz.org)
|
||||
**Protocol:** Discourse SSO (Single Sign-On)
|
||||
|
||||
### Authentication Flow
|
||||
|
||||
**Method:** HMAC-SHA256 signed payload
|
||||
|
||||
**Flow:**
|
||||
1. User clicks "Log in" on Discourse
|
||||
2. Discourse redirects to MusicBrainz with nonce
|
||||
3. MusicBrainz authenticates user
|
||||
4. MusicBrainz generates SSO payload
|
||||
5. Payload signed with HMAC-SHA256
|
||||
6. User redirected back to Discourse with signed payload
|
||||
7. Discourse verifies signature and logs in user
|
||||
|
||||
**Configuration:**
|
||||
```perl
|
||||
# DBDefs.pm
|
||||
sub DISCOURSE_SSO_SECRET { 'shared_secret' }
|
||||
sub DISCOURSE_SERVER { 'https://community.metabrainz.org' }
|
||||
```
|
||||
|
||||
**Payload Generation:**
|
||||
```perl
use Digest::SHA qw(hmac_sha256_hex);
use MIME::Base64 qw(encode_base64);
use URI::Escape qw(uri_escape_utf8);

# URL-encode each value so characters such as "&" or "+" survive the round trip
my $raw = join '&',
    'nonce='       . uri_escape_utf8($nonce),
    'email='       . uri_escape_utf8($email),
    'external_id=' . uri_escape_utf8($user_id),
    'username='    . uri_escape_utf8($username),
    'name='        . uri_escape_utf8($name);

my $payload = encode_base64($raw, '');  # '' = no line breaks in the output
my $signature = hmac_sha256_hex($payload, $sso_secret);

my $redirect_url = "$discourse_server/session/sso_login?" .
    "sso=" . uri_escape_utf8($payload) .
    "&sig=$signature";
```
|
||||
|
||||
**User Data Synced:**
|
||||
- Email address
|
||||
- Username
|
||||
- Display name
|
||||
- User ID (external_id)
|
||||
- Avatar URL (optional)
|
||||
- Admin status (optional)
|
||||
- Moderator status (optional)
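
To make the sign-and-verify steps of the flow concrete, here is a rough Python sketch of both sides: building a signed payload and checking one that has been received. The function names are hypothetical, the field names follow the list above, and the code illustrates the HMAC-SHA256 scheme described in this section rather than reproducing the MusicBrainz or Discourse implementation.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode


def sign_payload(fields, secret):
    """Return (base64_payload, hex_signature) for a dict of SSO fields."""
    raw = urlencode(fields).encode("utf-8")            # URL-encodes every value
    payload = base64.b64encode(raw).decode("ascii")    # no line breaks
    sig = hmac.new(secret.encode("utf-8"), payload.encode("ascii"),
                   hashlib.sha256).hexdigest()
    return payload, sig


def verify_payload(payload, sig, secret):
    """Return the decoded fields if the signature matches, otherwise None."""
    expected = hmac.new(secret.encode("utf-8"), payload.encode("ascii"),
                        hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    if not hmac.compare_digest(expected, sig):
        return None
    raw = base64.b64decode(payload).decode("utf-8")
    return {key: values[0] for key, values in parse_qs(raw).items()}
```

The signature is computed over the base64 text rather than the raw query string, which matches the order of operations in the Perl snippet above.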
|
||||
|
||||
## MetaBrainz OAuth
|
||||
|
||||
### Overview
|
||||
|
||||
**Service:** Centralized OAuth provider for MetaBrainz services
|
||||
**Protocol:** OAuth 2.0 with token introspection
|
||||
|
||||
### Token Introspection
|
||||
|
||||
**Endpoint:** `https://musicbrainz.org/oauth2/introspect`
|
||||
|
||||
**Method:** POST
|
||||
|
||||
**Request:**
|
||||
```perl
|
||||
my $response = $ua->post(
|
||||
'https://musicbrainz.org/oauth2/introspect',
|
||||
{
|
||||
token => $access_token,
|
||||
client_id => $client_id,
|
||||
client_secret => $client_secret,
|
||||
}
|
||||
);
|
||||
|
||||
my $data = decode_json($response->content);
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"active": true,
|
||||
"scope": "profile email tag rating collection",
|
||||
"client_id": "client_id",
|
||||
"username": "username",
|
||||
"token_type": "Bearer",
|
||||
"exp": 1609459200,
|
||||
"iat": 1609372800,
|
||||
"sub": "user_id"
|
||||
}
|
||||
```
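
A consuming service would typically check more than the `active` flag. The following sketch shows one way to evaluate a response shaped like the JSON above; `token_allows` is a hypothetical helper, and treating `scope` as a space-separated list follows the sample response.

```python
import time


def token_allows(introspection, required_scope, now=None):
    """Return True if the introspected token is active, unexpired, and has the scope."""
    now = time.time() if now is None else now
    if not introspection.get("active"):
        return False
    exp = introspection.get("exp")
    if exp is not None and exp <= now:
        return False          # expired even though the server reported it as active
    granted = set(introspection.get("scope", "").split())
    return required_scope in granted
```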
|
||||
|
||||
**Usage:** Other MetaBrainz services (ListenBrainz, BookBrainz, etc.) validate tokens via introspection
|
||||
|
||||
### Services Using MetaBrainz OAuth
|
||||
|
||||
- ListenBrainz (listening history)
|
||||
- BookBrainz (book metadata)
|
||||
- CritiqueBrainz (music reviews)
|
||||
- AcousticBrainz (audio analysis)
|
||||
- Picard (music tagger)
|
||||
|
||||
## Replication System
|
||||
|
||||
### Overview
|
||||
|
||||
**Purpose:** Synchronize database changes from master to mirrors
|
||||
**Protocol:** dbmirror2 packet system
|
||||
|
||||
### Replication Modes
|
||||
|
||||
**RT_MASTER:**
|
||||
- Generates replication packets
|
||||
- Writes to `dbmirror_pending` and `dbmirror_pendingdata` tables
|
||||
- Exports packets for mirrors
|
||||
|
||||
**RT_MIRROR:**
|
||||
- Consumes replication packets
|
||||
- Applies changes from master
|
||||
- Read-only (no edits)
|
||||
|
||||
**RT_STANDALONE:**
|
||||
- No replication
|
||||
- Fully independent database
|
||||
|
||||
**Configuration:**
|
||||
```perl
|
||||
# DBDefs.pm
|
||||
sub REPLICATION_TYPE { RT_MASTER } # or RT_MIRROR or RT_STANDALONE
|
||||
sub REPLICATION_ACCESS_TOKEN { 'secret_token' }
|
||||
```
|
||||
|
||||
### Packet Structure
|
||||
|
||||
**Tables:**
|
||||
- `dbmirror_pending` - Pending transactions
|
||||
- `dbmirror_pendingdata` - Data changes (INSERT/UPDATE/DELETE)
|
||||
|
||||
**Packet Format:**
|
||||
```
|
||||
SeqId: 12345
|
||||
TransactionId: 67890
|
||||
Operation: i # i=INSERT, u=UPDATE, d=DELETE
|
||||
TableName: artist
|
||||
Data: {"id":123,"gid":"...","name":"..."}
|
||||
```
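
For illustration, the record layout shown above can be parsed into a structure that a mirror could apply in sequence order. This sketch treats the layout as plain `Key: value` text, which is an assumption based solely on the listing above; the actual on-disk packet format may differ, and `parse_record` and `in_apply_order` are hypothetical names.

```python
import json

OPERATIONS = {"i": "INSERT", "u": "UPDATE", "d": "DELETE"}


def parse_record(text):
    """Parse one 'Key: value' record into a dict with typed fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first colon only, since the JSON in Data contains colons too
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return {
        "seq_id": int(fields["SeqId"]),
        "transaction_id": int(fields["TransactionId"]),
        "operation": OPERATIONS[fields["Operation"]],
        "table": fields["TableName"],
        "data": json.loads(fields["Data"]),
    }


def in_apply_order(records):
    """Return records sorted by sequence ID, the order a mirror applies them in."""
    return sorted(records, key=lambda record: record["seq_id"])
```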
|
||||
|
||||
### Replication Flow
|
||||
|
||||
**Master Side:**
|
||||
1. Edit applied to database
|
||||
2. Triggers capture changes to `dbmirror_pending`
|
||||
3. Export script generates replication packets
|
||||
4. Packets uploaded to FTP server
|
||||
|
||||
**Mirror Side:**
|
||||
1. Download replication packets from FTP
|
||||
2. Apply packets in sequence order
|
||||
3. Update replication state
|
||||
4. Verify data integrity
|
||||
|
||||
**Packet Export:**
|
||||
```bash
|
||||
# On master
|
||||
./admin/replication/ExportReplicationPackets
|
||||
|
||||
# Generates packets in replication/ directory
|
||||
# Uploads to FTP server
|
||||
```
|
||||
|
||||
**Packet Import:**
|
||||
```bash
|
||||
# On mirror
|
||||
./admin/replication/LoadReplicationChanges
|
||||
|
||||
# Downloads packets from FTP
|
||||
# Applies changes to database
|
||||
```
|
||||
|
||||
### Replication Lag
|
||||
|
||||
**Monitoring:** Mirrors track replication lag (time behind master)
|
||||
|
||||
**Typical Lag:** Minutes to hours depending on packet size and network
|
||||
|
||||
**Status Endpoint:** `/replication-status` shows current replication state
|
||||
|
||||
## Redis Integration
|
||||
|
||||
### Architecture
|
||||
|
||||
**Connection:** Single Redis instance, 16 databases (0-15)
|
||||
|
||||
**Configuration:**
|
||||
```perl
|
||||
# DBDefs.pm
|
||||
sub REDIS_SERVER { 'localhost:6379' }
|
||||
sub REDIS_NAMESPACE { 'MB' }
|
||||
```
|
||||
|
||||
### Use Cases
|
||||
|
||||
**Session Management (DB 1):**
|
||||
- Store user sessions
|
||||
- 10 hour absolute expiry
|
||||
- 3 hour idle timeout
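
The two session limits combine as a simple conjunction: a session must be younger than the absolute expiry and must have been used within the idle timeout. A minimal sketch, with hypothetical names and timestamps expressed in seconds:

```python
ABSOLUTE_EXPIRY = 10 * 3600   # 10 hours since the session was created
IDLE_TIMEOUT = 3 * 3600       # 3 hours since the session was last used


def session_is_valid(created_at, last_seen_at, now):
    """Return True only if neither the absolute nor the idle limit has been reached."""
    within_lifetime = (now - created_at) < ABSOLUTE_EXPIRY
    recently_used = (now - last_seen_at) < IDLE_TIMEOUT
    return within_lifetime and recently_used
```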
|
||||
|
||||
**Entity Cache (DB 0):**
|
||||
- Cache entity lookups by MBID
|
||||
- 1 hour TTL
|
||||
- Invalidate on edit
|
||||
|
||||
**Search Cache (DB 2):**
|
||||
- Cache search results
|
||||
- 15 minute TTL
|
||||
|
||||
**Statistics Cache (DB 3):**
|
||||
- Cache homepage statistics
|
||||
- 1 hour TTL
|
||||
|
||||
**Rate Limiting (DB 4):**
|
||||
- Track API request counts
|
||||
- 1 second sliding window
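
A sliding-window limiter keeps the timestamps of recent requests per client and rejects a request once the window is full. The sketch below is an in-memory illustration of that idea, not the server's Redis-backed implementation; `SlidingWindowLimiter` and its parameters are hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any `window` seconds."""

    def __init__(self, limit, window=1.0):
        self.limit = limit
        self.window = window
        self._hits = {}   # client id -> deque of accepted request timestamps

    def allow(self, client, now):
        """Record and allow the request at time `now`, or return False if over the limit."""
        hits = self._hits.setdefault(client, deque())
        # Discard timestamps that have slid out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```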
|
||||
|
||||
**Pub/Sub (DB 5):**
|
||||
- Real-time notifications
|
||||
- Edit submission events
|
||||
- Cache invalidation events
|
||||
|
||||
### Connection Pooling
|
||||
|
||||
**Library:** Redis.pm with connection pooling
|
||||
|
||||
**Pool Size:** 10 connections per worker
|
||||
|
||||
**Reconnection:** Automatic reconnection on connection loss
|
||||
|
||||
## HTTP Client
|
||||
|
||||
### LWP::UserAgent
|
||||
|
||||
**Purpose:** HTTP client for external service communication
|
||||
|
||||
**Configuration:**
|
||||
```perl
|
||||
use LWP::UserAgent;
|
||||
|
||||
my $ua = LWP::UserAgent->new(
|
||||
agent => 'MusicBrainz/1.0 (https://musicbrainz.org)',
|
||||
timeout => 30,
|
||||
max_redirect => 5,
|
||||
);
|
||||
```
|
||||
|
||||
**User-Agent:** Always identifies as MusicBrainz with contact URL
|
||||
|
||||
**Timeout:** 30 seconds default
|
||||
|
||||
**Redirects:** Follow up to 5 redirects
|
||||
|
||||
**SSL Verification:** Enabled by default
|
||||
|
||||
### Rate Limiting
|
||||
|
||||
**External Services:** Respect rate limits via delays
|
||||
|
||||
**Wikipedia API:** 1 request per second (recommended)
|
||||
|
||||
**Wikidata API:** 1 request per second (recommended)
|
||||
|
||||
**Implementation:**
|
||||
```perl
use Time::HiRes qw(sleep time);  # Sub-second sleep() and time()

my $last_request_time = 0;

sub rate_limited_request {
    my ($url) = @_;

    # Wait until at least one second has passed since the previous request
    my $elapsed = time() - $last_request_time;
    if ($elapsed < 1) {
        sleep(1 - $elapsed);
    }

    my $response = $ua->get($url);
    $last_request_time = time();

    return $response;
}
```
|
||||
|
||||
### Error Handling
|
||||
|
||||
**Retry Logic:** Exponential backoff for transient errors
|
||||
|
||||
**Timeouts:** Fail gracefully on timeout
|
||||
|
||||
**Logging:** Log all external service errors to Sentry
|
||||
|
||||
**Example:**
|
||||
```perl
my $response;
my $retries = 3;

for my $attempt (1..$retries) {
    # LWP::UserAgent returns a response object on HTTP errors instead of dying,
    # so check is_success rather than relying on an exception handler.
    $response = $ua->get($url);
    last if $response->is_success;

    warn "Request failed (attempt $attempt): " . $response->status_line;
    sleep(2 ** $attempt) if $attempt < $retries;  # Exponential backoff
}
```
|
||||
@@ -0,0 +1,271 @@
|
||||
# MusicBrainz Server Overview
|
||||
|
||||
## Project Identity
|
||||
|
||||
**Name:** MusicBrainz Server
|
||||
**Repository:** https://github.com/metabrainz/musicbrainz-server
|
||||
**License:** GPL-2.0+
|
||||
**Description:** Open music encyclopedia that collects music metadata and makes it available to the public. Community-maintained database of music information including artists, releases, recordings, works, labels, and the relationships between them.
|
||||
|
||||
## Technology Stack
|
||||
|
||||
### Backend
|
||||
|
||||
**Primary Language:** Perl 5.38+
|
||||
**Web Framework:** Catalyst (MVC framework)
|
||||
**Object System:** Moose (modern Perl OOP)
|
||||
|
||||
**Core Perl Dependencies:**
|
||||
- Catalyst::Runtime - Web application framework
|
||||
- Moose - Modern object system for Perl
|
||||
- DBD::Pg - PostgreSQL database driver
|
||||
- Template::Toolkit - Template processing system
|
||||
- Plack - PSGI toolkit and server adapters
|
||||
- Redis - Perl Redis client
|
||||
- JSON::XS - Fast JSON encoding/decoding
|
||||
- XML::LibXML - XML processing
|
||||
- DBIx::Connector - Fast, safe DBI connection management
|
||||
- Readonly - Facility for creating read-only scalars, arrays, hashes
|
||||
- Digest::SHA - SHA message digest algorithm
|
||||
- LWP::UserAgent - HTTP client
|
||||
- DateTime - Date and time object
|
||||
- List::AllUtils - List manipulation utilities
|
||||
- Try::Tiny - Minimal try/catch
|
||||
- Class::Load - Load modules by name
|
||||
- namespace::autoclean - Keep imports out of namespace
|
||||
|
||||
### Frontend
|
||||
|
||||
**Primary Language:** JavaScript (ES6+)
|
||||
**UI Framework:** React 19.2.4
|
||||
**State Management:** Redux
|
||||
**Legacy Framework:** Knockout.js (still present in some views)
|
||||
|
||||
**Core JavaScript Dependencies:**
|
||||
- React 19.2.4 - UI component library
|
||||
- Redux - State management
|
||||
- Webpack 5 - Module bundler
|
||||
- Babel 7 - JavaScript compiler
|
||||
- knockout - Legacy MVVM framework
|
||||
- jQuery - DOM manipulation (legacy)
|
||||
- lodash - Utility library
|
||||
- immutable - Immutable data structures
|
||||
- weight-balanced-tree - Efficient tree data structure
|
||||
|
||||
### Infrastructure
|
||||
|
||||
**Database:** PostgreSQL 16+
|
||||
- 375 tables
|
||||
- 500+ foreign key constraints
|
||||
- Full-text search capabilities
|
||||
- Custom replication via dbmirror2
|
||||
|
||||
**Cache:** Redis
|
||||
- 16 separate databases
|
||||
- Entity caching
|
||||
- Session storage
|
||||
- Pub/sub messaging
|
||||
|
||||
**Search:** Apache Solr
|
||||
- Primary search engine
|
||||
- PostgreSQL full-text as fallback
|
||||
|
||||
**Message Queue:** RabbitMQ (for background jobs)
|
||||
|
||||
## System Prerequisites
|
||||
|
||||
**Required:**
|
||||
- Perl 5.38+ (5.42.0 tested in CI)
|
||||
- Node.js 20.9+
|
||||
- PostgreSQL 16+
|
||||
- Redis 6.0+
|
||||
- Apache Solr 8.11+
|
||||
|
||||
**Optional:**
|
||||
- Docker + Docker Compose (for containerized deployment)
|
||||
- RabbitMQ (for background job processing)
|
||||
|
||||
## Entry Point
|
||||
|
||||
**File:** `app.psgi`
|
||||
|
||||
**Initialization Flow:**
|
||||
1. `app.psgi` loads the Plack middleware stack
|
||||
2. Initializes `MusicBrainz::Server` Catalyst application
|
||||
3. Loads configuration from `DBDefs.pm`
|
||||
4. Establishes database connections via `DBIx::Connector`
|
||||
5. Initializes Redis connection pool
|
||||
6. Forks template renderer process for isolation
|
||||
7. Loads Catalyst controllers, models, and views
|
||||
8. Mounts PSGI application
|
||||
|
||||
**Middleware Stack:**
|
||||
- Plack::Middleware::ReverseProxy - Handle X-Forwarded headers
|
||||
- Plack::Middleware::Static - Serve static files
|
||||
- Plack::Middleware::Session - Session management
|
||||
- Custom middleware for CSRF protection
|
||||
- Custom middleware for request logging
|
||||
|
||||
## Codebase Scale
|
||||
|
||||
**Perl:**
|
||||
- 1,866 Perl files
|
||||
- 53 controllers (13,000 lines)
|
||||
- 106 Data modules (26,000 lines)
|
||||
- 132 entity classes
|
||||
- 43 form modules
|
||||
- 4 view modules
|
||||
|
||||
**JavaScript:**
|
||||
- 1,447 JavaScript files
|
||||
- React components
|
||||
- Redux reducers and actions
|
||||
- Legacy Knockout view models
|
||||
|
||||
**Database:**
|
||||
- 375 tables
|
||||
- 332 migration files
|
||||
- 4,068 lines in CreateTables.sql
|
||||
|
||||
**Tests:**
|
||||
- Perl unit tests (t/)
|
||||
- JavaScript tests (Jest)
|
||||
- pgTAP database tests
|
||||
- Selenium integration tests (4 partitions)
|
||||
|
||||
## Build Process

### Perl Dependencies

```bash
# Install Carton (Perl dependency manager)
cpanm Carton

# Install the Perl dependencies listed in cpanfile
# (versions are pinned by cpanfile.snapshot)
carton install
```

### JavaScript Dependencies

```bash
# Install Node.js dependencies
yarn install
```

### Asset Compilation

```bash
# Compile static resources (CSS, images, fonts)
./script/compile_resources.sh

# Build JavaScript bundles with Webpack
yarn run build
```

**Build Outputs:**
- `root/static/build/` - Compiled JavaScript bundles
- `root/static/styles/` - Compiled CSS
- `root/static/images/` - Optimized images

## Run Commands

### Development

```bash
# Using plackup (development server); -r restarts the server when files
# under lib/ or next to app.psgi change
plackup -Ilib -r app.psgi

# Also watch specific directories (comma-separated) with -R
plackup -Ilib -R lib,root -r app.psgi
```

### Production

```bash
# Using Starman (production PSGI server)
starman --workers 10 --listen :5000 app.psgi

# Using Server::Starter for zero-downtime restarts
start_server --port 5000 -- starman --workers 10 app.psgi
```

### Docker

```bash
# Build Docker images
docker-compose build

# Start all services
docker-compose up -d

# Start specific service
docker-compose up -d website
```

**Available Services:**
- `website` - Main web application
- `webservice` - API service
- `cron` - Scheduled tasks
- `sitemaps` - Sitemap generation
- `json-dump` - JSON data dumps
- `solr-backup` - Solr index backup
- `tests` - Test runner

## Directory Structure

```
musicbrainz-server/
├── admin/           # Database schema and migrations
│   └── sql/
│       ├── CreateTables.sql
│       └── updates/ # 332 migration files
├── lib/             # Perl application code
│   └── MusicBrainz/
│       └── Server/
│           ├── Controller/  # 53 controllers
│           ├── Data/        # 106 data access modules
│           ├── Entity/      # 132 entity classes
│           ├── Form/        # 43 form handlers
│           ├── View/        # 4 view modules
│           ├── WebService/  # API implementation
│           └── Edit/        # Edit system
├── root/            # Frontend assets
│   ├── static/      # Static files
│   │   ├── scripts/ # JavaScript source
│   │   ├── styles/  # CSS/LESS
│   │   └── images/
│   └── layout.tt    # Main template
├── t/               # Perl tests
├── docker/          # Docker configuration
├── script/          # Utility scripts
├── app.psgi         # PSGI entry point
├── cpanfile         # Perl dependencies
├── package.json     # Node.js dependencies
└── webpack.config.js # Webpack configuration
```

## Configuration

**Primary Config:** `lib/DBDefs.pm`

**Two-Tier System:**
1. `lib/DBDefs/Default.pm` - Default values
2. `lib/DBDefs.pm` - Instance-specific overrides (not in git)

**Key Configuration Areas:**
- Database connection strings
- Redis connection parameters
- Solr endpoints
- External service credentials (Cover Art Archive, Wikipedia, etc.)
- Session settings
- Email configuration
- OAuth2 settings
- Feature flags

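The two-tier lookup can be pictured as "check the instance overrides first, then fall back to the defaults". In the real server both tiers are Perl modules; the Python sketch below only illustrates the lookup order. All keys, values, and the `load_settings` helper are hypothetical and do not reflect actual `DBDefs` option names.

```python
from collections import ChainMap

# Stand-in for lib/DBDefs/Default.pm (keys and values are hypothetical).
DEFAULTS = {
    "DB_HOST": "localhost",
    "DB_PORT": 5432,
    "REDIS_HOST": "localhost",
    "SOLR_URL": "http://localhost:8983/solr",
}

# Stand-in for the untracked, instance-specific lib/DBDefs.pm.
INSTANCE_OVERRIDES = {
    "DB_HOST": "db.internal.example",
    "REDIS_HOST": "redis.internal.example",
}


def load_settings(overrides, defaults=DEFAULTS):
    """Return a view where overrides win and defaults fill the gaps."""
    # ChainMap searches its mappings left to right and leaves both
    # underlying dictionaries unmodified.
    return ChainMap(overrides, defaults)


settings = load_settings(INSTANCE_OVERRIDES)
```

Here `settings["DB_HOST"]` comes from the instance tier, while `settings["DB_PORT"]` falls through to the default. Keeping the override tier out of version control means credentials and host names stay local to each deployment.
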
## Status

- **Active Development:** Continuous development since 2001 (20+ years)
- **Production Status:** Stable, serving millions of requests daily
- **Community:** Large open-source community with hundreds of contributors
- **Data Quality:** Community-driven editing with a voting system that reviews changes before they are applied
- **API Usage:** Powers metadata for major music services and applications worldwide