feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
Alexander
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
@@ -0,0 +1,568 @@
# MusicBrainz Server Architecture
## Design Pattern
Hybrid MVC + Service Layer architecture built on the Catalyst web framework. The application follows a layered approach with clear separation of concerns between presentation, business logic, and data access.
## Directory Structure
```
lib/MusicBrainz/Server/
├── Controller/ # 53 controllers, 13,000 lines
│ ├── Artist.pm
│ ├── Release.pm
│ ├── Recording.pm
│ ├── WS/ # Web Service controllers
│ │ └── 2/ # API version 2
│ └── ...
├── Data/ # 106 modules, 26,000 lines
│ ├── Artist.pm
│ ├── Release.pm
│ ├── Recording.pm
│ ├── Relationship.pm
│ └── ...
├── Entity/ # 132 entity classes
│ ├── Artist.pm
│ ├── Release.pm
│ ├── Recording.pm
│ ├── Types.pm
│ └── ...
├── Form/ # 43 form handlers
│ ├── Artist.pm
│ ├── Release.pm
│ └── ...
├── View/ # 4 view modules
│ ├── Default.pm # Template Toolkit
│ ├── JSON.pm
│ ├── XML.pm
│ └── JSONLD.pm
├── WebService/ # API implementation
│ ├── Serializer/
│ │ ├── JSON/
│ │ ├── XML/
│ │ └── JSONLD/
│ └── Validator.pm
├── Edit/ # Edit system
│ ├── Artist/
│ ├── Release/
│ ├── Recording/
│ └── ...
├── Context.pm # Service layer coordinator
├── DBDefs.pm # Configuration
└── Sql.pm # SQL abstraction layer
admin/ # Database administration
├── sql/
│ ├── CreateTables.sql # Schema definition (4,068 lines)
│ └── updates/ # 332 migration files
root/ # Frontend assets
├── static/
│ ├── scripts/ # JavaScript source
│ │ ├── common/
│ │ ├── edit/
│ │ └── release/
│ ├── styles/ # CSS/LESS
│ └── images/
└── layout.tt # Main template
t/ # Tests
├── lib/ # Test utilities
├── pgtap/ # Database tests
└── selenium/ # Integration tests
```
## Architectural Layers
### Controller Layer (53 modules, 13,000 lines)
**Responsibility:** Handle HTTP requests, coordinate business logic, render responses.
**Key Controllers:**
- `Artist.pm` - Artist entity operations
- `Release.pm` - Release entity operations
- `Recording.pm` - Recording entity operations
- `ReleaseGroup.pm` - Release group operations
- `Work.pm` - Work entity operations
- `Label.pm` - Label entity operations
- `Edit.pm` - Edit submission and voting
- `Search.pm` - Search interface
- `WS::2::*` - Web service API endpoints
**Controller Pattern:**
```perl
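package MusicBrainz::Server::Controller::Artist;
use Moose;
# extends must run at compile time so Catalyst can see the action attributes
BEGIN { extends 'MusicBrainz::Server::Controller' }
# Args(1) captures one path segment (the artist GID) after the controller path
sub show : Path Args(1) {
    my ($self, $c, $gid) = @_;
    my $artist = $c->model('Artist')->get_by_gid($gid);
    # Anything placed in the stash is available to the view
    $c->stash( artist => $artist );
}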
```
**Responsibilities:**
- Request validation
- Authentication/authorization checks
- Coordinate Data layer calls
- Prepare data for views
- Handle form submissions
### Data Layer (106 modules, 26,000 lines)
**Responsibility:** Repository pattern for database access. Each entity has a corresponding Data module.
**Key Data Modules:**
- `Data::Artist` - Artist CRUD operations
- `Data::Release` - Release CRUD operations
- `Data::Recording` - Recording CRUD operations
- `Data::Relationship` - Relationship management
- `Data::Edit` - Edit persistence
- `Data::Search` - Search operations
**Data Module Pattern:**
```perl
package MusicBrainz::Server::Data::Artist;
use Moose;
extends 'MusicBrainz::Server::Data::Entity';
sub _table { 'artist' }
sub _entity_class { 'MusicBrainz::Server::Entity::Artist' }
sub get_by_gid {
    my ($self, $gid) = @_;
    return $self->_get_by_key('gid', $gid);
}
```
**Moose Roles:**
- `Role::Editable` - Entities that can be edited
- `Role::Taggable` - Entities that can be tagged
- `Role::Rateable` - Entities that can be rated
- `Role::Relatable` - Entities that can have relationships
- `Role::Aliasable` - Entities that can have aliases
- `Role::Annotation` - Entities that can be annotated
**Data Access Pattern:**
- No ORM (not DBIx::Class)
- Custom Moose-based abstraction
- Raw SQL via `DBD::Pg`
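- `DBIx::Connector` for persistent connection and transaction management (it keeps one connection alive and reconnects as needed; it is not a connection pool)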
- `Sql.pm` provides query builder utilities
### Entity Layer (132 classes)
**Responsibility:** Domain objects representing database entities.
**Key Entities:**
- `Entity::Artist` - Artist domain object
- `Entity::Release` - Release domain object
- `Entity::Recording` - Recording domain object
- `Entity::ReleaseGroup` - Release group domain object
- `Entity::Work` - Work domain object
- `Entity::Label` - Label domain object
- `Entity::Relationship` - Relationship between entities
**Entity Pattern:**
```perl
package MusicBrainz::Server::Entity::Artist;
use Moose;
extends 'MusicBrainz::Server::Entity';
has 'name' => ( is => 'rw', isa => 'Str' );
has 'sort_name' => ( is => 'rw', isa => 'Str' );
has 'type_id' => ( is => 'rw', isa => 'Maybe[Int]' );
has 'country_id' => ( is => 'rw', isa => 'Maybe[Int]' );
has 'begin_date' => ( is => 'rw', isa => 'PartialDate' );
has 'end_date' => ( is => 'rw', isa => 'PartialDate' );
```
**Entity Characteristics:**
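- Treated as read-only once loaded, by convention (the attributes above are still declared `rw`)
- Runtime type checks via Moose type constraints (`Str`, `Maybe[Int]`, `PartialDate`)
- Related data is attached explicitly by Data modules (for example `$c->model('Relationship')->load($artist)`), not fetched by the entity itself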
- No database logic (pure domain objects)
### Form Layer (43 modules)
**Responsibility:** Form validation and processing using HTML::FormHandler.
**Key Forms:**
- `Form::Artist` - Artist creation/editing
- `Form::Release` - Release creation/editing
- `Form::Recording` - Recording creation/editing
- `Form::Edit::*` - Edit-specific forms
**Form Pattern:**
```perl
package MusicBrainz::Server::Form::Artist;
use HTML::FormHandler::Moose;
extends 'MusicBrainz::Server::Form';
has_field 'name' => ( type => 'Text', required => 1 );
has_field 'sort_name' => ( type => 'Text', required => 1 );
has_field 'type_id' => ( type => 'Select' );
```
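The `required => 1` flags above boil down to a presence check per field. A minimal sketch of that check in plain Perl, without HTML::FormHandler: the `@fields` list and the `validate` function are hypothetical stand-ins, not part of the MusicBrainz codebase.
```perl
use strict;
use warnings;

# Field specs mirroring the has_field declarations (hypothetical structure).
my @fields = (
    { name => 'name',      required => 1 },
    { name => 'sort_name', required => 1 },
    { name => 'type_id',   required => 0 },
);

# Returns a list of error strings; an empty list means the params are valid.
sub validate {
    my ($params) = @_;
    my @errors;
    for my $field (@fields) {
        my $value = $params->{ $field->{name} };
        # Undefined or whitespace-only values count as missing.
        my $blank = !defined $value || $value !~ /\S/;
        push @errors, "$field->{name} is required"
            if $field->{required} && $blank;
    }
    return @errors;
}

my @errors = validate({ name => 'Example Artist', sort_name => '  ' });
print "$_\n" for @errors;   # prints "sort_name is required"
```
HTML::FormHandler layers type coercion, select options, and rendering on top of this; the sketch only covers the required-field rule.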
### View Layer (4 modules)
**Responsibility:** Render responses in different formats.
**Views:**
- `View::Default` - Template Toolkit for HTML
- `View::JSON` - JSON serialization
- `View::XML` - XML serialization
- `View::JSONLD` - JSON-LD serialization
## Edit System Architecture
**Pattern:** Command Pattern
**Concept:** All data modifications are represented as "edits" - versioned, votable changes that go through a review process.
**Edit Lifecycle:**
1. User submits edit via form
2. Edit is validated and persisted to `edit` table
3. Edit enters voting period (typically 7 days)
4. Community votes on edit (yes/no/abstain)
5. Auto-editors can approve immediately
6. Edit is applied or rejected based on votes
7. Full audit trail maintained
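The lifecycle above can be read as a small state machine. The following sketch is an illustration only: `resolve_edit`, the hash keys, and the status strings are hypothetical, and the tie-breaking rule (an edit passes unless "no" votes outnumber "yes" votes) is an assumption rather than the documented MusicBrainz rule.
```perl
use strict;
use warnings;

# Status names follow this document; the string values are illustrative.
use constant {
    STATUS_OPEN        => 'open',
    STATUS_APPLIED     => 'applied',
    STATUS_FAILED_VOTE => 'failed_vote',
};

# Hypothetical resolver. $edit is a hash ref with:
#   votes                   - array ref of 'yes' / 'no' / 'abstain'
#   expires_at              - end of the voting period (epoch seconds)
#   approved_by_auto_editor - true if an auto-editor approved the edit
sub resolve_edit {
    my ($edit, $now) = @_;

    # Step 5: auto-editor approval short-circuits the voting period.
    return STATUS_APPLIED if $edit->{approved_by_auto_editor};

    # Step 3: the edit stays open while the voting period is running.
    return STATUS_OPEN if $now < $edit->{expires_at};

    # Steps 4 and 6: tally the votes; abstentions do not affect the result.
    my %tally = (yes => 0, no => 0, abstain => 0);
    $tally{$_}++ for @{ $edit->{votes} };

    # Assumption: ties and empty ballots let the edit through.
    return $tally{no} > $tally{yes} ? STATUS_FAILED_VOTE : STATUS_APPLIED;
}

my $week = 7 * 24 * 60 * 60;
my $edit = { votes => [qw(yes yes no abstain)], expires_at => $week };

print resolve_edit($edit, 0), "\n";           # inside the voting period
print resolve_edit($edit, $week + 1), "\n";   # voting closed, yes outvotes no
```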
**Edit Types (examples):**
- `Edit::Artist::Create` - Create new artist
- `Edit::Artist::Edit` - Modify artist data
- `Edit::Artist::Delete` - Delete artist
- `Edit::Release::Create` - Create new release
- `Edit::Release::AddReleaseLabel` - Add label to release
- `Edit::Relationship::Create` - Create relationship
- `Edit::Relationship::Edit` - Modify relationship
- `Edit::Relationship::Delete` - Delete relationship
**Edit Structure:**
```perl
package MusicBrainz::Server::Edit::Artist::Edit;
use Moose;
extends 'MusicBrainz::Server::Edit';
sub edit_type { 1 } # Unique edit type ID
sub edit_name { 'Edit artist' }
sub initialize {
    my ($self, %opts) = @_;
    # Store old and new data
    $self->data({
        entity_id => $opts{artist_id},
        old => { ... },
        new => { ... },
    });
}
sub accept {
    my $self = shift;
    # Apply the edit
    $self->c->model('Artist')->update($self->data->{entity_id}, $self->data->{new});
}
```
**Edit Data Storage:**
- `edit` table - Edit metadata (type, status, votes)
- `edit_data` table - Edit-specific data (JSON)
- `vote` table - User votes on edits
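Because the edit payload is stored as JSON, it has to survive a round trip through text. A minimal sketch using the core `JSON::PP` module: the payload shape follows the `old`/`new` structure shown in the edit example, while the variable names and the choice of `JSON::PP` are assumptions made for illustration.
```perl
use strict;
use warnings;
use JSON::PP;

# canonical sorts keys, so the stored text is stable across runs.
my $json = JSON::PP->new->canonical;

my $payload = {
    entity_id => 42,
    old       => { name => 'Old Name' },
    new       => { name => 'New Name' },
};

my $stored   = $json->encode($payload);   # text as it might sit in edit_data
my $restored = $json->decode($stored);    # back to a Perl structure

print "$stored\n";
print $restored->{new}{name}, "\n";       # prints "New Name"
```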
**Edit Statuses:**
- Open - Awaiting votes
- Applied - Accepted and applied
- Failed Vote - Rejected by community
- Failed Dependency - Dependent edit failed
- Error - Application error
- Deleted - Cancelled by submitter
## Serialization Architecture
### JSON Serializer
**Location:** `lib/MusicBrainz/Server/WebService/Serializer/JSON/2/`
**Modules:**
- `Artist.pm` - Artist JSON serialization
- `Release.pm` - Release JSON serialization
- `Recording.pm` - Recording JSON serialization
- `Utils.pm` - Common serialization utilities
**Pattern:**
```perl
sub serialize {
    my ($self, $entity, $inc, $opts) = @_;
    my $data = {
        id => $entity->gid,
        name => $entity->name,
        'sort-name' => $entity->sort_name,
    };
    if ($inc->artist_credits) {
        $data->{'artist-credit'} = $self->serialize_artist_credit($entity->artist_credit);
    }
    return $data;
}
```
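The `$inc` argument in the pattern above decides which optional sections end up in the output. A self-contained sketch of that idea, under stated assumptions: `serialize_artist`, the plain-hash `$inc`, and the `aliases` include are hypothetical simplifications of the real entity and include objects.
```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical stand-in: $artist is a plain hash ref and $inc a hash of
# requested includes, instead of the real entity and include objects.
sub serialize_artist {
    my ($artist, $inc) = @_;
    my %data = (
        id          => $artist->{gid},
        name        => $artist->{name},
        'sort-name' => $artist->{sort_name},
    );
    # Optional sections appear only when the caller asked for them.
    if ($inc->{aliases}) {
        $data{aliases} = [ map { +{ name => $_ } } @{ $artist->{aliases} || [] } ];
    }
    return \%data;
}

my $artist = {
    gid       => '00000000-0000-0000-0000-000000000001',   # placeholder GID
    name      => 'Example Artist',
    sort_name => 'Artist, Example',
    aliases   => ['Ex. Artist'],
};

my $json = JSON::PP->new->canonical;   # stable key order
print $json->encode(serialize_artist($artist, {})), "\n";
print $json->encode(serialize_artist($artist, { aliases => 1 })), "\n";
```
Keeping the core fields unconditional and gating everything else on `$inc` keeps default responses small while letting clients opt in to heavier data.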
### XML Serializer
**Location:** `lib/MusicBrainz/Server/WebService/Serializer/XML/2/`
**Namespace:** `http://musicbrainz.org/ns/mmd-2.0#`
**Pattern:**
```perl
sub serialize {
    my ($self, $entity, $inc, $opts) = @_;
    my $xml = XML::LibXML::Element->new('artist');
    $xml->setAttribute('id', $entity->gid);
    $xml->appendTextChild('name', $entity->name);
    $xml->appendTextChild('sort-name', $entity->sort_name);
    return $xml;
}
```
### JSON-LD Serializer
**Location:** `lib/MusicBrainz/Server/WebService/Serializer/JSONLD/`
**Context:** Schema.org vocabulary
**Pattern:**
```perl
sub serialize {
    my ($self, $entity) = @_;
    return {
        '@context' => 'http://schema.org',
        '@type' => 'MusicGroup',
        '@id' => 'https://musicbrainz.org/artist/' . $entity->gid,
        'name' => $entity->name,
    };
}
```
## Frontend Architecture
### Template Toolkit (Server-Side Rendering)
**Location:** `root/`
**Main Template:** `root/layout.tt`
**Template Structure:**
```
root/
├── layout.tt # Main layout
├── artist/
│ ├── index.tt # Artist listing
│ ├── show.tt # Artist detail
│ └── edit.tt # Artist edit form
├── release/
│ ├── index.tt
│ ├── show.tt
│ └── edit.tt
└── components/
  ├── header.tt
  ├── footer.tt
  └── sidebar.tt
```
**Template Pattern:**
```tt2
[% WRAPPER 'layout.tt' title=artist.name %]
  <h1>[% artist.name %]</h1>
  <p>Sort name: [% artist.sort_name %]</p>
  [% IF artist.releases.size %]
    <h2>Releases</h2>
    <ul>
      [% FOR release IN artist.releases %]
        <li><a href="/release/[% release.gid %]">[% release.name %]</a></li>
      [% END %]
    </ul>
  [% END %]
[% END %]
```
### React (Progressive Enhancement)
**Location:** `root/static/scripts/`
**Strategy:** Progressive enhancement - server renders HTML, React hydrates for interactivity.
**Component Structure:**
```
root/static/scripts/
├── common/
│ ├── components/
│ │ ├── EntityLink.js
│ │ ├── Autocomplete.js
│ │ └── ReleaseList.js
│ └── utility/
├── edit/
│ ├── components/
│ │ ├── EditNote.js
│ │ └── VotingSection.js
│ └── reducers/
└── release/
  ├── components/
  │ ├── ReleaseHeader.js
  │ └── TrackList.js
  └── reducers/
```
**React Pattern:**
```javascript
import React from 'react';
import ReactDOM from 'react-dom';
const ReleaseList = ({ releases }) => (
  <ul>
    {releases.map(release => (
      <li key={release.gid}>
        <a href={`/release/${release.gid}`}>{release.name}</a>
      </li>
    ))}
  </ul>
);
// Hydrate server-rendered content.
// ReactDOM.hydrate is the React 17 and earlier API; React 18+ uses hydrateRoot from 'react-dom/client'.
const container = document.getElementById('release-list');
if (container) {
  // The server embeds the release data in a data-releases attribute
  const releases = JSON.parse(container.dataset.releases);
  ReactDOM.hydrate(<ReleaseList releases={releases} />, container);
}
```
### Legacy Knockout.js
**Status:** Being phased out, but still present in some views.
**Location:** `root/static/scripts/` (mixed with React)
**Pattern:**
```javascript
ko.applyBindings({
  releases: ko.observableArray([...]),
  addRelease: function() { ... }
});
```
## Service Layer (Context)
**File:** `lib/MusicBrainz/Server/Context.pm`
**Responsibility:** Coordinate operations across multiple Data modules, manage transactions, provide unified interface.
**Pattern:**
```perl
my $artist = $c->model('Artist')->get_by_gid($gid);
$c->model('ArtistCredit')->load($artist);
$c->model('Release')->load_for_artist($artist);
$c->model('Relationship')->load($artist);
```
**Context Provides:**
- Database connection management
- Transaction handling
- Model access (`$c->model('Artist')`)
- Configuration access (`$c->config`)
- Session management
- Request/response handling
## Key Design Patterns
### Repository Pattern
**Implementation:** Data layer modules
**Purpose:** Abstract database access, provide clean interface for entity operations.
**Example:**
```perl
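# Instead of raw SQL everywhere:
my $artist = $c->model('Artist')->get_by_gid($gid);
# Data::Artist handles the SQL:
sub get_by_gid {
    my ($self, $gid) = @_;
    # Simplified: this returns the raw row hash; the Data module pattern
    # shown earlier inflates rows into entity objects via _get_by_key
    return $self->sql->select_single_row_hash(
        'SELECT * FROM artist WHERE gid = ?', $gid
    );
}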
```
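To show why the repository boundary is useful, here is a self-contained sketch in which the SQL helper is swapped for an in-memory stub and the row is inflated into a domain object. Everything under the `Sketch::` namespace is hypothetical and written in plain Perl (no Moose) so that it runs without dependencies; it is not the MusicBrainz implementation.
```perl
use strict;
use warnings;

package Sketch::Entity::Artist;   # pure domain object, no database logic
sub new  { my ($class, %args) = @_; return bless { %args }, $class }
sub gid  { $_[0]{gid} }
sub name { $_[0]{name} }

package Sketch::StubSql;          # stands in for the SQL helper; no database
sub new { my ($class, @rows) = @_; return bless { rows => [@rows] }, $class }
sub select_single_row_hash {
    my ($self, $query, @args) = @_;
    # Ignores $query and matches on the first bind value only.
    my ($row) = grep { $_->{gid} eq $args[0] } @{ $self->{rows} };
    return $row;
}

package Sketch::Data::Artist;     # the repository
sub new { my ($class, %args) = @_; return bless { sql => $args{sql} }, $class }
sub get_by_gid {
    my ($self, $gid) = @_;
    my $row = $self->{sql}->select_single_row_hash(
        'SELECT gid, name FROM artist WHERE gid = ?', $gid
    );
    # Inflate the raw row into an entity; undef when nothing matched.
    return $row ? Sketch::Entity::Artist->new(%$row) : undef;
}

package main;

my $sql  = Sketch::StubSql->new({ gid => 'abc-123', name => 'Example Artist' });
my $repo = Sketch::Data::Artist->new(sql => $sql);

my $artist = $repo->get_by_gid('abc-123');
print $artist->name, "\n";
print defined $repo->get_by_gid('missing') ? "found\n" : "not found\n";
```
Callers only ever see `get_by_gid` and entity objects, so the storage behind the repository can be replaced (as the stub does here) without touching them.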
### Command Pattern
**Implementation:** Edit system
**Purpose:** Encapsulate all data modifications as objects, enabling undo, audit trails, and voting.
**Example:**
```perl
my $edit = $c->model('Edit')->create(
    edit_type => $EDIT_ARTIST_EDIT,
    editor_id => $c->user->id,
    artist_id => $artist->id,
    old => { name => 'Old Name' },
    new => { name => 'New Name' },
);
```
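The reason a command object enables undo and audit trails is that it carries both the old and the new state. A minimal sketch of that property, assuming a hypothetical `Sketch::Edit::RenameArtist` class and a plain hash as the data store:
```perl
use strict;
use warnings;

package Sketch::Edit::RenameArtist;

sub new {
    my ($class, %args) = @_;
    # The command carries everything needed to apply or revert the change.
    return bless {
        entity_id => $args{entity_id},
        old       => $args{old},
        new       => $args{new},
    }, $class;
}

# Apply the change to a store (here a hash keyed by entity id).
sub accept {
    my ($self, $store) = @_;
    $store->{ $self->{entity_id} }{name} = $self->{new}{name};
}

# The captured old value makes the command reversible.
sub revert {
    my ($self, $store) = @_;
    $store->{ $self->{entity_id} }{name} = $self->{old}{name};
}

package main;

my %artists = (42 => { name => 'Old Name' });
my @audit_log;

my $edit = Sketch::Edit::RenameArtist->new(
    entity_id => 42,
    old       => { name => 'Old Name' },
    new       => { name => 'New Name' },
);

$edit->accept(\%artists);
push @audit_log, $edit;           # the command doubles as the audit record
print $artists{42}{name}, "\n";   # prints "New Name"

$edit->revert(\%artists);
print $artists{42}{name}, "\n";   # prints "Old Name"
```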
### Service Pattern
**Implementation:** Context object
**Purpose:** Coordinate operations across multiple repositories, manage transactions.
**Example:**
```perl
$c->model('MB')->with_transaction(sub {
    my $artist = $c->model('Artist')->insert({ name => 'New Artist' });
    $c->model('Edit')->create(
        edit_type => $EDIT_ARTIST_CREATE,
        entity_id => $artist->id,
    );
});
```
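A transaction wrapper of this kind typically commits when the callback returns and rolls back when it dies. The sketch below illustrates that contract with a fake handle that only records calls; `Sketch::FakeDbh` and this standalone `with_transaction` function are hypothetical, although the method names `begin_work`, `commit`, and `rollback` match DBI's.
```perl
use strict;
use warnings;

package Sketch::FakeDbh;   # records calls instead of talking to PostgreSQL
sub new        { my $class = shift; return bless { log => [] }, $class }
sub begin_work { push @{ $_[0]{log} }, 'BEGIN' }
sub commit     { push @{ $_[0]{log} }, 'COMMIT' }
sub rollback   { push @{ $_[0]{log} }, 'ROLLBACK' }

package main;

# Hypothetical helper: run $code inside a transaction on $dbh.
sub with_transaction {
    my ($dbh, $code) = @_;
    $dbh->begin_work;
    my $result;
    # The trailing 1 distinguishes "died" from "returned a false value".
    my $ok = eval { $result = $code->(); 1 };
    unless ($ok) {
        my $err = $@ || 'unknown error';
        $dbh->rollback;
        die $err;          # re-throw so the caller still sees the failure
    }
    $dbh->commit;
    return $result;
}

my $dbh = Sketch::FakeDbh->new;

with_transaction($dbh, sub { return 'ok' });
eval { with_transaction($dbh, sub { die "insert failed\n" }) };

print join(' ', @{ $dbh->{log} }), "\n";   # BEGIN COMMIT BEGIN ROLLBACK
```
Re-throwing after the rollback matters: the artist insert and the edit record either both persist or neither does, and the caller still learns that the operation failed.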
## Data Access Layer
**No ORM:** MusicBrainz does not use DBIx::Class or any traditional ORM.
**Custom Abstraction:**
- Moose-based Data modules
- Raw SQL via `DBD::Pg`
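- `DBIx::Connector` for persistent connection and transaction management (it keeps one connection alive and reconnects as needed; it is not a connection pool)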
- `Sql.pm` provides query builder utilities
**Rationale:**
- Performance - Hand-written SQL avoids the query-generation and object-inflation overhead an ORM adds
- Flexibility - Complex queries easier to write
- Control - Full control over query execution
- Legacy - Codebase predates modern ORMs
**SQL Abstraction Example:**
```perl
# lib/MusicBrainz/Server/Sql.pm
# Returns the first matching row as a hash ref, or undef if there is none
sub select_single_row_hash {
    my ($self, $query, @args) = @_;
    my $row = $self->dbh->selectrow_hashref($query, undef, @args);
    return $row;
}
# Returns an array ref of hash refs, one per row (Slice => {} selects hash rows)
sub select_list_of_hashes {
    my ($self, $query, @args) = @_;
    my $rows = $self->dbh->selectall_arrayref($query, { Slice => {} }, @args);
    return $rows;
}
```