Files
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

619 lines
16 KiB
Markdown

# MusicBrainz Server Data Layer
## Database Overview
**Engine:** PostgreSQL 16+
**Tables:** 375
**Foreign Key Constraints:** 500+
**Schema Definition:** `admin/sql/CreateTables.sql` (4,068 lines)
**Production Size:** ~350GB (full dataset with indexes)
## PostgreSQL Schema
### Core Entity Tables
**Artists:**
- `artist` - Artist entities (bands, musicians, orchestras, etc.)
- `artist_alias` - Alternative names for artists
- `artist_credit` - Artist credit configurations
- `artist_credit_name` - Individual artists in a credit
- `artist_type` - Artist type enumeration (person, group, etc.)
- `artist_tag` - Folksonomy tags
- `artist_rating_raw` - User ratings
- `artist_annotation` - User annotations
- `artist_gid_redirect` - MBID redirects after merges
**Releases:**
- `release` - Release entities (albums, singles, etc.)
- `release_alias` - Alternative release names
- `release_group` - Logical grouping of releases
- `release_group_primary_type` - Album, Single, EP, etc.
- `release_group_secondary_type` - Compilation, Live, Remix, etc.
- `release_status` - Official, Promotion, Bootleg, etc.
- `release_packaging` - Jewel Case, Digipak, etc.
- `release_label` - Labels associated with release
- `release_country` - Release events by country
- `release_tag` - Folksonomy tags
- `release_rating_raw` - User ratings
- `release_annotation` - User annotations
- `release_gid_redirect` - MBID redirects
**Recordings:**
- `recording` - Recording entities (unique audio recordings)
- `recording_alias` - Alternative recording names
- `recording_tag` - Folksonomy tags
- `recording_rating_raw` - User ratings
- `recording_annotation` - User annotations
- `recording_gid_redirect` - MBID redirects
- `isrc` - International Standard Recording Codes
- `recording_isrc` - Recording to ISRC mapping
**Works:**
- `work` - Musical composition entities
- `work_alias` - Alternative work names
- `work_type` - Song, Symphony, Opera, etc.
- `work_attribute` - Work attributes (key, tempo, etc.)
- `work_attribute_type` - Attribute type definitions
- `work_tag` - Folksonomy tags
- `work_rating_raw` - User ratings
- `work_annotation` - User annotations
- `work_gid_redirect` - MBID redirects
- `iswc` - International Standard Musical Work Codes
- `work_iswc` - Work to ISWC mapping
**Labels:**
- `label` - Record label entities
- `label_alias` - Alternative label names
- `label_type` - Original Production, Bootleg Production, etc.
- `label_tag` - Folksonomy tags
- `label_rating_raw` - User ratings
- `label_annotation` - User annotations
- `label_gid_redirect` - MBID redirects
**Geographic:**
- `area` - Geographic areas (countries, cities, etc.)
- `area_alias` - Alternative area names
- `area_type` - Country, Subdivision, City, etc.
- `area_tag` - Folksonomy tags
- `area_annotation` - User annotations
- `area_gid_redirect` - MBID redirects
- `country_area` - ISO country code mapping
- `iso_3166_1` - ISO 3166-1 country codes
- `iso_3166_2` - ISO 3166-2 subdivision codes
- `iso_3166_3` - ISO 3166-3 former country codes
**Events:**
- `event` - Event entities (concerts, festivals, etc.)
- `event_alias` - Alternative event names
- `event_type` - Concert, Festival, etc.
- `event_tag` - Folksonomy tags
- `event_rating_raw` - User ratings
- `event_annotation` - User annotations
- `event_gid_redirect` - MBID redirects
**Places:**
- `place` - Venue/location entities
- `place_alias` - Alternative place names
- `place_type` - Venue, Studio, etc.
- `place_tag` - Folksonomy tags
- `place_annotation` - User annotations
- `place_gid_redirect` - MBID redirects
**Series:**
- `series` - Ordered sequence entities
- `series_alias` - Alternative series names
- `series_type` - Release group series, etc.
- `series_ordering_type` - Automatic, Manual
- `series_tag` - Folksonomy tags
- `series_annotation` - User annotations
- `series_gid_redirect` - MBID redirects
**Instruments:**
- `instrument` - Musical instrument entities
- `instrument_alias` - Alternative instrument names
- `instrument_type` - Wind, String, Percussion, etc.
- `instrument_tag` - Folksonomy tags
- `instrument_annotation` - User annotations
- `instrument_gid_redirect` - MBID redirects
**Genres:**
- `genre` - Genre entities
- `genre_alias` - Alternative genre names
- `genre_annotation` - User annotations
- `genre_gid_redirect` - MBID redirects
**URLs:**
- `url` - External URL entities
- `url_gid_redirect` - MBID redirects
### Relationship Tables (l_* tables)
**Pattern:** `l_{entity1}_{entity2}` for relationships between entities.
**Examples:**
- `l_artist_artist` - Artist-to-artist relationships (member of, collaboration, etc.)
- `l_artist_recording` - Artist-to-recording relationships (performer, conductor, etc.)
- `l_artist_release` - Artist-to-release relationships
- `l_artist_release_group` - Artist-to-release-group relationships
- `l_artist_work` - Artist-to-work relationships (composer, lyricist, etc.)
- `l_artist_url` - Artist-to-URL relationships (official homepage, social media, etc.)
- `l_recording_work` - Recording-to-work relationships (performance of)
- `l_release_release_group` - Release-to-release-group relationships
- `l_release_url` - Release-to-URL relationships (purchase links, streaming, etc.)
**Relationship Support Tables:**
- `link` - Link instances
- `link_type` - Relationship type definitions
- `link_attribute` - Relationship attributes
- `link_attribute_type` - Attribute type definitions
- `link_crediting` - Custom relationship credits
- `link_text_attribute` - Text attributes for relationships
### Media Tables
**Physical Media:**
- `medium` - Physical media (CDs, vinyl, etc.)
- `medium_format` - CD, Vinyl, Digital Media, etc.
- `medium_cdtoc` - CD table of contents
- `cdtoc` - CD TOC data
- `cdtoc_raw` - Raw CD TOC data
**Tracks:**
- `track` - Individual tracks on media
- `track_gid_redirect` - Track MBID redirects
### Metadata Tables
**Tags:**
- `tag` - Tag definitions
- `tag_relation` - Tag relationships
- `{entity}_tag` - Tags per entity type
- `{entity}_tag_raw` - Raw user tag submissions
**Ratings:**
- `{entity}_rating_raw` - Raw user ratings per entity type
**Annotations:**
- `annotation` - Annotation text
- `{entity}_annotation` - Annotations per entity type
**Collections:**
- `editor_collection` - User collections
- `editor_collection_type` - Collection type (release, artist, etc.)
- `editor_collection_{entity}` - Collection contents per entity type
### Editorial Tables
**Edits:**
- `edit` - Edit submissions
- `edit_data` - Edit-specific data (JSON)
- `edit_{entity}` - Edit to entity mappings
- `vote` - User votes on edits
- `edit_note` - Discussion notes on edits
- `edit_note_recipient` - Edit note notifications
**Editors:**
- `editor` - User accounts
- `editor_preference` - User preferences
- `editor_language` - User language preferences
- `editor_subscribe_artist` - Artist subscriptions
- `editor_subscribe_collection` - Collection subscriptions
- `editor_subscribe_label` - Label subscriptions
- `editor_subscribe_series` - Series subscriptions
- `editor_subscribe_editor` - Editor subscriptions
- `editor_oauth_token` - OAuth tokens
- `application` - OAuth applications
**Moderation:**
- `autoeditor_election` - Auto-editor elections
- `autoeditor_election_vote` - Election votes
- `editor_watch_preferences` - Watchlist preferences
- `editor_watch_artist` - Artist watchlist
- `editor_watch_release_group_type` - Release group type filters
- `editor_watch_release_status` - Release status filters
### Identifier Tables
**Standard Identifiers:**
- `isrc` - International Standard Recording Code
- `iswc` - International Standard Musical Work Code
- `recording_isrc` - Recording to ISRC mapping
- `work_iswc` - Work to ISWC mapping
**MusicBrainz Identifiers:**
- `{entity}_gid_redirect` - MBID redirects after merges
**Barcodes:**
- `release_barcode` - Release barcodes (EAN, UPC)
### Replication Tables (dbmirror2)
**Replication System:**
- `dbmirror_pending` - Pending replication packets
- `dbmirror_pendingdata` - Replication data
- `replication_control` - Replication state tracking
**Modes:**
- `RT_MASTER` - Master database (generates replication packets)
- `RT_MIRROR` - Mirror database (consumes replication packets)
- `RT_STANDALONE` - Standalone database (no replication)
### Auxiliary Tables
**Statistics:**
- `statistic` - Cached statistics
- `statistic_event` - Statistic calculation events
**Documentation:**
- `documentation.l_{entity1}_{entity2}_example` - Relationship examples
**Deprecated:**
- Various `_deleted` tables for soft deletes
## Schema Management
### CreateTables.sql
**Location:** `admin/sql/CreateTables.sql`
**Size:** 4,068 lines
**Purpose:** Complete schema definition for fresh installations
**Structure:**
```sql
-- Core entity tables
CREATE TABLE artist (...);
CREATE TABLE release (...);
CREATE TABLE recording (...);
-- Indexes
CREATE INDEX artist_idx_name ON artist (name);
CREATE INDEX artist_idx_gid ON artist (gid);
-- Foreign keys
ALTER TABLE artist_credit_name
ADD CONSTRAINT artist_credit_name_fk_artist
FOREIGN KEY (artist) REFERENCES artist(id);
-- Triggers
CREATE TRIGGER a_ins_artist AFTER INSERT ON artist ...;
```
### Migration System
**Location:** `admin/sql/updates/`
**Count:** 332 migration files
**Naming:** Date-based (YYYYMMDD-HHMMSS-description.sql)
**Example Filenames:**
- `20230115-mbs-12345-add-genre-table.sql`
- `20230220-mbs-12346-add-event-series-relationship.sql`
- `20230315-mbs-12347-add-recording-length-index.sql`
**Migration Structure:**
```sql
\set ON_ERROR_STOP 1
BEGIN;
-- Schema changes
ALTER TABLE artist ADD COLUMN disambiguation TEXT;
-- Data migrations
UPDATE artist SET disambiguation = '' WHERE disambiguation IS NULL;
-- Constraints
ALTER TABLE artist ALTER COLUMN disambiguation SET NOT NULL;
COMMIT;
```
**Schema Change Variants:**
- `schema-change/` subdirectory contains master/mirror variants
- Master migrations may include replication setup
- Mirror migrations skip replication-specific changes
**Migration Tracking:**
- Migrations are tracked in the database
- Applied migrations recorded to prevent re-application
- Rollback not supported (forward-only migrations)
## Custom ORM (Moose-based Data Layer)
### Architecture
**NOT DBIx::Class** - MusicBrainz uses a custom Moose-based data access layer.
**Components:**
- 106 Data modules in `lib/MusicBrainz/Server/Data/`
- `DBIx::Connector` for connection pooling
- `Sql.pm` for query abstraction
- Raw SQL via `DBD::Pg`
### Data Module Pattern
**Base Class:** `MusicBrainz::Server::Data::Entity`
**Example:**
```perl
package MusicBrainz::Server::Data::Artist;
use Moose;
extends 'MusicBrainz::Server::Data::Entity';
with 'MusicBrainz::Server::Data::Role::Editable';
with 'MusicBrainz::Server::Data::Role::LinksToEdit';
with 'MusicBrainz::Server::Data::Role::Merge';
sub _table { 'artist' }
sub _entity_class { 'MusicBrainz::Server::Entity::Artist' }
sub _columns {
return 'id, gid, name, sort_name, begin_date_year, begin_date_month,
begin_date_day, end_date_year, end_date_month, end_date_day,
type, area, gender, comment, edits_pending, last_updated,
ended, begin_area, end_area';
}
sub _column_mapping {
return {
id => 'id',
gid => 'gid',
name => 'name',
sort_name => 'sort_name',
type_id => 'type',
area_id => 'area',
gender_id => 'gender',
comment => 'comment',
edits_pending => 'edits_pending',
last_updated => 'last_updated',
ended => 'ended',
begin_area_id => 'begin_area',
end_area_id => 'end_area',
};
}
sub get_by_gid {
my ($self, $gid) = @_;
return $self->_get_by_key('gid', $gid);
}
sub insert {
my ($self, $data) = @_;
my $row = $self->_hash_to_row($data);
my $id = $self->sql->insert_row('artist', $row, 'id');
return $self->_new_from_row($row);
}
```
### Moose Roles
**Role::Editable:**
- Entities that can be edited via the edit system
- Provides `load_meta()` for edit counts
**Role::Taggable:**
- Entities that support folksonomy tags
- Provides `tags()`, `add_tags()`, `remove_tags()`
**Role::Rateable:**
- Entities that can be rated (0-100 scale)
- Provides `rating()`, `user_rating()`
**Role::Relatable:**
- Entities that can have relationships
- Provides `relationships()`, `add_relationship()`
**Role::Aliasable:**
- Entities that can have alternative names
- Provides `aliases()`, `add_alias()`
**Role::Annotation:**
- Entities that can be annotated
- Provides `latest_annotation()`
### Sql.pm Abstraction
**Location:** `lib/MusicBrainz/Server/Sql.pm`
**Purpose:** Thin abstraction over DBI for common query patterns.
**Methods:**
```perl
# Single row
my $row = $sql->select_single_row_hash(
'SELECT * FROM artist WHERE gid = ?', $gid
);
# Multiple rows
my $rows = $sql->select_list_of_hashes(
'SELECT * FROM artist WHERE area = ?', $area_id
);
# Insert
my $id = $sql->insert_row('artist', {
gid => $gid,
name => $name,
sort_name => $sort_name,
}, 'id');
# Update
$sql->update_row('artist', {
name => $new_name,
}, { id => $artist_id });
# Delete
$sql->delete_row('artist', { id => $artist_id });
# Transaction
$sql->begin;
eval {
$sql->insert_row(...);
$sql->update_row(...);
$sql->commit;
};
if ($@) {
$sql->rollback;
die $@;
}
```
### DBIx::Connector
**Purpose:** Fast, safe DBI connection management with automatic reconnection.
**Configuration:**
```perl
my $conn = DBIx::Connector->new(
$dsn, $username, $password,
{
RaiseError => 1,
AutoCommit => 1,
pg_enable_utf8 => 1,
}
);
# Execute with automatic reconnection
$conn->run(sub {
my $dbh = $_;
$dbh->do('SELECT ...');
});
```
## Search Infrastructure
### Apache Solr (Primary)
**Purpose:** Full-text search across all entities
**Cores:**
- `artist` - Artist search
- `release` - Release search
- `release-group` - Release group search
- `recording` - Recording search
- `work` - Work search
- `label` - Label search
- `area` - Area search
- `event` - Event search
- `place` - Place search
- `series` - Series search
- `instrument` - Instrument search
- `tag` - Tag search
**Indexing:**
- Incremental updates via edit system
- Full reindex via `admin/BuildSearchIndexes.pl`
- Real-time updates for new entities
**Query Features:**
- Fuzzy matching
- Phrase search
- Boolean operators (AND, OR, NOT)
- Field-specific search (artist:nirvana)
- Wildcards (nirv*)
- Proximity search ("smells spirit"~5)
### PostgreSQL Full-Text (Fallback)
**Purpose:** Fallback when Solr is unavailable
**Implementation:**
- `mb_simple_tsvector` function for text vectorization
- GIN indexes on tsvector columns
- `to_tsquery()` for query parsing
**Example:**
```sql
CREATE INDEX artist_idx_name_txt ON artist
USING gin(mb_simple_tsvector(name));
SELECT * FROM artist
WHERE mb_simple_tsvector(name) @@ to_tsquery('simple', 'nirvana');
```
**Limitations:**
- Less sophisticated than Solr
- No fuzzy matching
- Limited ranking
- Used only as emergency fallback
## Redis Caching
### Architecture
**Databases:** 16 separate Redis databases (0-15)
**Database Allocation:**
- DB 0: Entity cache (GID lookups)
- DB 1: Session storage
- DB 2-15: Various caches (search, statistics, etc.)
### Entity Cache (GID Cache)
**Purpose:** Cache entity lookups by MBID (GID)
**Pattern:**
```perl
# Cache key: entity:gid:{gid}
my $cache_key = "artist:gid:$gid";
# Try cache first
my $cached = $redis->get($cache_key);
if ($cached) {
return decode_json($cached);
}
# Cache miss - load from database
my $artist = $self->sql->select_single_row_hash(
'SELECT * FROM artist WHERE gid = ?', $gid
);
# Store in cache (1 hour TTL)
$redis->setex($cache_key, 3600, encode_json($artist));
return $artist;
```
**TTL:** 1 hour (3600 seconds)
**Invalidation:** On edit application
### Session Storage
**Purpose:** Store user sessions
**Pattern:**
```perl
# Session key: session:{session_id}
my $session_key = "session:$session_id";
# Store session
$redis->setex($session_key, 36000, encode_json({
user_id => $user_id,
csrf_token => $csrf_token,
last_activity => time(),
}));
# Retrieve session
my $session = decode_json($redis->get($session_key));
```
**TTL:** 10 hours absolute, 3 hours idle
**Cookie:** `AF_SID` (SameSite=Lax, Secure, HttpOnly)
### Cache Invalidation
**Strategy:** Invalidate on write
**Example:**
```perl
# After updating artist
$self->sql->update_row('artist', { name => $new_name }, { id => $id });
# Invalidate cache
$redis->del("artist:gid:$gid");
```
**Bulk Invalidation:**
- Pattern-based deletion via `SCAN` + `DEL`
- Used for relationship changes affecting multiple entities