# MusicBrainz Server Data Layer ## Database Overview **Engine:** PostgreSQL 16+ **Tables:** 375 **Foreign Key Constraints:** 500+ **Schema Definition:** `admin/sql/CreateTables.sql` (4,068 lines) **Production Size:** ~350GB (full dataset with indexes) ## PostgreSQL Schema ### Core Entity Tables **Artists:** - `artist` - Artist entities (bands, musicians, orchestras, etc.) - `artist_alias` - Alternative names for artists - `artist_credit` - Artist credit configurations - `artist_credit_name` - Individual artists in a credit - `artist_type` - Artist type enumeration (person, group, etc.) - `artist_tag` - Folksonomy tags - `artist_rating_raw` - User ratings - `artist_annotation` - User annotations - `artist_gid_redirect` - MBID redirects after merges **Releases:** - `release` - Release entities (albums, singles, etc.) - `release_alias` - Alternative release names - `release_group` - Logical grouping of releases - `release_group_primary_type` - Album, Single, EP, etc. - `release_group_secondary_type` - Compilation, Live, Remix, etc. - `release_status` - Official, Promotion, Bootleg, etc. - `release_packaging` - Jewel Case, Digipak, etc. - `release_label` - Labels associated with release - `release_country` - Release events by country - `release_tag` - Folksonomy tags - `release_rating_raw` - User ratings - `release_annotation` - User annotations - `release_gid_redirect` - MBID redirects **Recordings:** - `recording` - Recording entities (unique audio recordings) - `recording_alias` - Alternative recording names - `recording_tag` - Folksonomy tags - `recording_rating_raw` - User ratings - `recording_annotation` - User annotations - `recording_gid_redirect` - MBID redirects - `isrc` - International Standard Recording Codes - `recording_isrc` - Recording to ISRC mapping **Works:** - `work` - Musical composition entities - `work_alias` - Alternative work names - `work_type` - Song, Symphony, Opera, etc. - `work_attribute` - Work attributes (key, tempo, etc.) - `work_attribute_type` - Attribute type definitions - `work_tag` - Folksonomy tags - `work_rating_raw` - User ratings - `work_annotation` - User annotations - `work_gid_redirect` - MBID redirects - `iswc` - International Standard Musical Work Codes - `work_iswc` - Work to ISWC mapping **Labels:** - `label` - Record label entities - `label_alias` - Alternative label names - `label_type` - Original Production, Bootleg Production, etc. - `label_tag` - Folksonomy tags - `label_rating_raw` - User ratings - `label_annotation` - User annotations - `label_gid_redirect` - MBID redirects **Geographic:** - `area` - Geographic areas (countries, cities, etc.) - `area_alias` - Alternative area names - `area_type` - Country, Subdivision, City, etc. - `area_tag` - Folksonomy tags - `area_annotation` - User annotations - `area_gid_redirect` - MBID redirects - `country_area` - ISO country code mapping - `iso_3166_1` - ISO 3166-1 country codes - `iso_3166_2` - ISO 3166-2 subdivision codes - `iso_3166_3` - ISO 3166-3 former country codes **Events:** - `event` - Event entities (concerts, festivals, etc.) - `event_alias` - Alternative event names - `event_type` - Concert, Festival, etc. - `event_tag` - Folksonomy tags - `event_rating_raw` - User ratings - `event_annotation` - User annotations - `event_gid_redirect` - MBID redirects **Places:** - `place` - Venue/location entities - `place_alias` - Alternative place names - `place_type` - Venue, Studio, etc. - `place_tag` - Folksonomy tags - `place_annotation` - User annotations - `place_gid_redirect` - MBID redirects **Series:** - `series` - Ordered sequence entities - `series_alias` - Alternative series names - `series_type` - Release group series, etc. - `series_ordering_type` - Automatic, Manual - `series_tag` - Folksonomy tags - `series_annotation` - User annotations - `series_gid_redirect` - MBID redirects **Instruments:** - `instrument` - Musical instrument entities - `instrument_alias` - Alternative instrument names - `instrument_type` - Wind, String, Percussion, etc. - `instrument_tag` - Folksonomy tags - `instrument_annotation` - User annotations - `instrument_gid_redirect` - MBID redirects **Genres:** - `genre` - Genre entities - `genre_alias` - Alternative genre names - `genre_annotation` - User annotations - `genre_gid_redirect` - MBID redirects **URLs:** - `url` - External URL entities - `url_gid_redirect` - MBID redirects ### Relationship Tables (l_* tables) **Pattern:** `l_{entity1}_{entity2}` for relationships between entities. **Examples:** - `l_artist_artist` - Artist-to-artist relationships (member of, collaboration, etc.) - `l_artist_recording` - Artist-to-recording relationships (performer, conductor, etc.) - `l_artist_release` - Artist-to-release relationships - `l_artist_release_group` - Artist-to-release-group relationships - `l_artist_work` - Artist-to-work relationships (composer, lyricist, etc.) - `l_artist_url` - Artist-to-URL relationships (official homepage, social media, etc.) - `l_recording_work` - Recording-to-work relationships (performance of) - `l_release_release_group` - Release-to-release-group relationships - `l_release_url` - Release-to-URL relationships (purchase links, streaming, etc.) **Relationship Support Tables:** - `link` - Link instances - `link_type` - Relationship type definitions - `link_attribute` - Relationship attributes - `link_attribute_type` - Attribute type definitions - `link_crediting` - Custom relationship credits - `link_text_attribute` - Text attributes for relationships ### Media Tables **Physical Media:** - `medium` - Physical media (CDs, vinyl, etc.) - `medium_format` - CD, Vinyl, Digital Media, etc. - `medium_cdtoc` - CD table of contents - `cdtoc` - CD TOC data - `cdtoc_raw` - Raw CD TOC data **Tracks:** - `track` - Individual tracks on media - `track_gid_redirect` - Track MBID redirects ### Metadata Tables **Tags:** - `tag` - Tag definitions - `tag_relation` - Tag relationships - `{entity}_tag` - Tags per entity type - `{entity}_tag_raw` - Raw user tag submissions **Ratings:** - `{entity}_rating_raw` - Raw user ratings per entity type **Annotations:** - `annotation` - Annotation text - `{entity}_annotation` - Annotations per entity type **Collections:** - `editor_collection` - User collections - `editor_collection_type` - Collection type (release, artist, etc.) - `editor_collection_{entity}` - Collection contents per entity type ### Editorial Tables **Edits:** - `edit` - Edit submissions - `edit_data` - Edit-specific data (JSON) - `edit_{entity}` - Edit to entity mappings - `vote` - User votes on edits - `edit_note` - Discussion notes on edits - `edit_note_recipient` - Edit note notifications **Editors:** - `editor` - User accounts - `editor_preference` - User preferences - `editor_language` - User language preferences - `editor_subscribe_artist` - Artist subscriptions - `editor_subscribe_collection` - Collection subscriptions - `editor_subscribe_label` - Label subscriptions - `editor_subscribe_series` - Series subscriptions - `editor_subscribe_editor` - Editor subscriptions - `editor_oauth_token` - OAuth tokens - `application` - OAuth applications **Moderation:** - `autoeditor_election` - Auto-editor elections - `autoeditor_election_vote` - Election votes - `editor_watch_preferences` - Watchlist preferences - `editor_watch_artist` - Artist watchlist - `editor_watch_release_group_type` - Release group type filters - `editor_watch_release_status` - Release status filters ### Identifier Tables **Standard Identifiers:** - `isrc` - International Standard Recording Code - `iswc` - International Standard Musical Work Code - `recording_isrc` - Recording to ISRC mapping - `work_iswc` - Work to ISWC mapping **MusicBrainz Identifiers:** - `{entity}_gid_redirect` - MBID redirects after merges **Barcodes:** - `release_barcode` - Release barcodes (EAN, UPC) ### Replication Tables (dbmirror2) **Replication System:** - `dbmirror_pending` - Pending replication packets - `dbmirror_pendingdata` - Replication data - `replication_control` - Replication state tracking **Modes:** - `RT_MASTER` - Master database (generates replication packets) - `RT_MIRROR` - Mirror database (consumes replication packets) - `RT_STANDALONE` - Standalone database (no replication) ### Auxiliary Tables **Statistics:** - `statistic` - Cached statistics - `statistic_event` - Statistic calculation events **Documentation:** - `documentation.l_{entity1}_{entity2}_example` - Relationship examples **Deprecated:** - Various `_deleted` tables for soft deletes ## Schema Management ### CreateTables.sql **Location:** `admin/sql/CreateTables.sql` **Size:** 4,068 lines **Purpose:** Complete schema definition for fresh installations **Structure:** ```sql -- Core entity tables CREATE TABLE artist (...); CREATE TABLE release (...); CREATE TABLE recording (...); -- Indexes CREATE INDEX artist_idx_name ON artist (name); CREATE INDEX artist_idx_gid ON artist (gid); -- Foreign keys ALTER TABLE artist_credit_name ADD CONSTRAINT artist_credit_name_fk_artist FOREIGN KEY (artist) REFERENCES artist(id); -- Triggers CREATE TRIGGER a_ins_artist AFTER INSERT ON artist ...; ``` ### Migration System **Location:** `admin/sql/updates/` **Count:** 332 migration files **Naming:** Date-based (YYYYMMDD-HHMMSS-description.sql) **Example Filenames:** - `20230115-mbs-12345-add-genre-table.sql` - `20230220-mbs-12346-add-event-series-relationship.sql` - `20230315-mbs-12347-add-recording-length-index.sql` **Migration Structure:** ```sql \set ON_ERROR_STOP 1 BEGIN; -- Schema changes ALTER TABLE artist ADD COLUMN disambiguation TEXT; -- Data migrations UPDATE artist SET disambiguation = '' WHERE disambiguation IS NULL; -- Constraints ALTER TABLE artist ALTER COLUMN disambiguation SET NOT NULL; COMMIT; ``` **Schema Change Variants:** - `schema-change/` subdirectory contains master/mirror variants - Master migrations may include replication setup - Mirror migrations skip replication-specific changes **Migration Tracking:** - Migrations are tracked in the database - Applied migrations recorded to prevent re-application - Rollback not supported (forward-only migrations) ## Custom ORM (Moose-based Data Layer) ### Architecture **NOT DBIx::Class** - MusicBrainz uses a custom Moose-based data access layer. **Components:** - 106 Data modules in `lib/MusicBrainz/Server/Data/` - `DBIx::Connector` for connection pooling - `Sql.pm` for query abstraction - Raw SQL via `DBD::Pg` ### Data Module Pattern **Base Class:** `MusicBrainz::Server::Data::Entity` **Example:** ```perl package MusicBrainz::Server::Data::Artist; use Moose; extends 'MusicBrainz::Server::Data::Entity'; with 'MusicBrainz::Server::Data::Role::Editable'; with 'MusicBrainz::Server::Data::Role::LinksToEdit'; with 'MusicBrainz::Server::Data::Role::Merge'; sub _table { 'artist' } sub _entity_class { 'MusicBrainz::Server::Entity::Artist' } sub _columns { return 'id, gid, name, sort_name, begin_date_year, begin_date_month, begin_date_day, end_date_year, end_date_month, end_date_day, type, area, gender, comment, edits_pending, last_updated, ended, begin_area, end_area'; } sub _column_mapping { return { id => 'id', gid => 'gid', name => 'name', sort_name => 'sort_name', type_id => 'type', area_id => 'area', gender_id => 'gender', comment => 'comment', edits_pending => 'edits_pending', last_updated => 'last_updated', ended => 'ended', begin_area_id => 'begin_area', end_area_id => 'end_area', }; } sub get_by_gid { my ($self, $gid) = @_; return $self->_get_by_key('gid', $gid); } sub insert { my ($self, $data) = @_; my $row = $self->_hash_to_row($data); my $id = $self->sql->insert_row('artist', $row, 'id'); return $self->_new_from_row($row); } ``` ### Moose Roles **Role::Editable:** - Entities that can be edited via the edit system - Provides `load_meta()` for edit counts **Role::Taggable:** - Entities that support folksonomy tags - Provides `tags()`, `add_tags()`, `remove_tags()` **Role::Rateable:** - Entities that can be rated (0-100 scale) - Provides `rating()`, `user_rating()` **Role::Relatable:** - Entities that can have relationships - Provides `relationships()`, `add_relationship()` **Role::Aliasable:** - Entities that can have alternative names - Provides `aliases()`, `add_alias()` **Role::Annotation:** - Entities that can be annotated - Provides `latest_annotation()` ### Sql.pm Abstraction **Location:** `lib/MusicBrainz/Server/Sql.pm` **Purpose:** Thin abstraction over DBI for common query patterns. **Methods:** ```perl # Single row my $row = $sql->select_single_row_hash( 'SELECT * FROM artist WHERE gid = ?', $gid ); # Multiple rows my $rows = $sql->select_list_of_hashes( 'SELECT * FROM artist WHERE area = ?', $area_id ); # Insert my $id = $sql->insert_row('artist', { gid => $gid, name => $name, sort_name => $sort_name, }, 'id'); # Update $sql->update_row('artist', { name => $new_name, }, { id => $artist_id }); # Delete $sql->delete_row('artist', { id => $artist_id }); # Transaction $sql->begin; eval { $sql->insert_row(...); $sql->update_row(...); $sql->commit; }; if ($@) { $sql->rollback; die $@; } ``` ### DBIx::Connector **Purpose:** Fast, safe DBI connection management with automatic reconnection. **Configuration:** ```perl my $conn = DBIx::Connector->new( $dsn, $username, $password, { RaiseError => 1, AutoCommit => 1, pg_enable_utf8 => 1, } ); # Execute with automatic reconnection $conn->run(sub { my $dbh = $_; $dbh->do('SELECT ...'); }); ``` ## Search Infrastructure ### Apache Solr (Primary) **Purpose:** Full-text search across all entities **Cores:** - `artist` - Artist search - `release` - Release search - `release-group` - Release group search - `recording` - Recording search - `work` - Work search - `label` - Label search - `area` - Area search - `event` - Event search - `place` - Place search - `series` - Series search - `instrument` - Instrument search - `tag` - Tag search **Indexing:** - Incremental updates via edit system - Full reindex via `admin/BuildSearchIndexes.pl` - Real-time updates for new entities **Query Features:** - Fuzzy matching - Phrase search - Boolean operators (AND, OR, NOT) - Field-specific search (artist:nirvana) - Wildcards (nirv*) - Proximity search ("smells spirit"~5) ### PostgreSQL Full-Text (Fallback) **Purpose:** Fallback when Solr is unavailable **Implementation:** - `mb_simple_tsvector` function for text vectorization - GIN indexes on tsvector columns - `to_tsquery()` for query parsing **Example:** ```sql CREATE INDEX artist_idx_name_txt ON artist USING gin(mb_simple_tsvector(name)); SELECT * FROM artist WHERE mb_simple_tsvector(name) @@ to_tsquery('simple', 'nirvana'); ``` **Limitations:** - Less sophisticated than Solr - No fuzzy matching - Limited ranking - Used only as emergency fallback ## Redis Caching ### Architecture **Databases:** 16 separate Redis databases (0-15) **Database Allocation:** - DB 0: Entity cache (GID lookups) - DB 1: Session storage - DB 2-15: Various caches (search, statistics, etc.) ### Entity Cache (GID Cache) **Purpose:** Cache entity lookups by MBID (GID) **Pattern:** ```perl # Cache key: entity:gid:{gid} my $cache_key = "artist:gid:$gid"; # Try cache first my $cached = $redis->get($cache_key); if ($cached) { return decode_json($cached); } # Cache miss - load from database my $artist = $self->sql->select_single_row_hash( 'SELECT * FROM artist WHERE gid = ?', $gid ); # Store in cache (1 hour TTL) $redis->setex($cache_key, 3600, encode_json($artist)); return $artist; ``` **TTL:** 1 hour (3600 seconds) **Invalidation:** On edit application ### Session Storage **Purpose:** Store user sessions **Pattern:** ```perl # Session key: session:{session_id} my $session_key = "session:$session_id"; # Store session $redis->setex($session_key, 36000, encode_json({ user_id => $user_id, csrf_token => $csrf_token, last_activity => time(), })); # Retrieve session my $session = decode_json($redis->get($session_key)); ``` **TTL:** 10 hours absolute, 3 hours idle **Cookie:** `AF_SID` (SameSite=Lax, Secure, HttpOnly) ### Cache Invalidation **Strategy:** Invalidate on write **Example:** ```perl # After updating artist $self->sql->update_row('artist', { name => $new_name }, { id => $id }); # Invalidate cache $redis->del("artist:gid:$gid"); ``` **Bulk Invalidation:** - Pattern-based deletion via `SCAN` + `DEL` - Used for relationship changes affecting multiple entities