MusicBrainz Server Data Layer
Database Overview
Engine: PostgreSQL 16+
Tables: 375
Foreign Key Constraints: 500+
Schema Definition: admin/sql/CreateTables.sql (4,068 lines)
Production Size: ~350GB (full dataset with indexes)
PostgreSQL Schema
Core Entity Tables
Artists:
- artist - Artist entities (bands, musicians, orchestras, etc.)
- artist_alias - Alternative names for artists
- artist_credit - Artist credit configurations
- artist_credit_name - Individual artists in a credit
- artist_type - Artist type enumeration (person, group, etc.)
- artist_tag - Folksonomy tags
- artist_rating_raw - User ratings
- artist_annotation - User annotations
- artist_gid_redirect - MBID redirects after merges
Releases:
- release - Release entities (albums, singles, etc.)
- release_alias - Alternative release names
- release_group - Logical grouping of releases
- release_group_primary_type - Album, Single, EP, etc.
- release_group_secondary_type - Compilation, Live, Remix, etc.
- release_status - Official, Promotion, Bootleg, etc.
- release_packaging - Jewel Case, Digipak, etc.
- release_label - Labels associated with a release
- release_country - Release events by country
- release_tag - Folksonomy tags
- release_rating_raw - User ratings
- release_annotation - User annotations
- release_gid_redirect - MBID redirects
Recordings:
- recording - Recording entities (unique audio recordings)
- recording_alias - Alternative recording names
- recording_tag - Folksonomy tags
- recording_rating_raw - User ratings
- recording_annotation - User annotations
- recording_gid_redirect - MBID redirects
- isrc - International Standard Recording Codes
- recording_isrc - Recording to ISRC mapping
Works:
- work - Musical composition entities
- work_alias - Alternative work names
- work_type - Song, Symphony, Opera, etc.
- work_attribute - Work attributes (key, tempo, etc.)
- work_attribute_type - Attribute type definitions
- work_tag - Folksonomy tags
- work_rating_raw - User ratings
- work_annotation - User annotations
- work_gid_redirect - MBID redirects
- iswc - International Standard Musical Work Codes
- work_iswc - Work to ISWC mapping
Labels:
- label - Record label entities
- label_alias - Alternative label names
- label_type - Original Production, Bootleg Production, etc.
- label_tag - Folksonomy tags
- label_rating_raw - User ratings
- label_annotation - User annotations
- label_gid_redirect - MBID redirects
Geographic:
- area - Geographic areas (countries, cities, etc.)
- area_alias - Alternative area names
- area_type - Country, Subdivision, City, etc.
- area_tag - Folksonomy tags
- area_annotation - User annotations
- area_gid_redirect - MBID redirects
- country_area - ISO country code mapping
- iso_3166_1 - ISO 3166-1 country codes
- iso_3166_2 - ISO 3166-2 subdivision codes
- iso_3166_3 - ISO 3166-3 former country codes
Events:
- event - Event entities (concerts, festivals, etc.)
- event_alias - Alternative event names
- event_type - Concert, Festival, etc.
- event_tag - Folksonomy tags
- event_rating_raw - User ratings
- event_annotation - User annotations
- event_gid_redirect - MBID redirects
Places:
- place - Venue/location entities
- place_alias - Alternative place names
- place_type - Venue, Studio, etc.
- place_tag - Folksonomy tags
- place_annotation - User annotations
- place_gid_redirect - MBID redirects
Series:
- series - Ordered sequence entities
- series_alias - Alternative series names
- series_type - Release group series, etc.
- series_ordering_type - Automatic, Manual
- series_tag - Folksonomy tags
- series_annotation - User annotations
- series_gid_redirect - MBID redirects
Instruments:
- instrument - Musical instrument entities
- instrument_alias - Alternative instrument names
- instrument_type - Wind, String, Percussion, etc.
- instrument_tag - Folksonomy tags
- instrument_annotation - User annotations
- instrument_gid_redirect - MBID redirects
Genres:
- genre - Genre entities
- genre_alias - Alternative genre names
- genre_annotation - User annotations
- genre_gid_redirect - MBID redirects
URLs:
- url - External URL entities
- url_gid_redirect - MBID redirects
Relationship Tables (l_* tables)
Pattern: l_{entity1}_{entity2} for relationships between entities.
Examples:
- l_artist_artist - Artist-to-artist relationships (member of, collaboration, etc.)
- l_artist_recording - Artist-to-recording relationships (performer, conductor, etc.)
- l_artist_release - Artist-to-release relationships
- l_artist_release_group - Artist-to-release-group relationships
- l_artist_work - Artist-to-work relationships (composer, lyricist, etc.)
- l_artist_url - Artist-to-URL relationships (official homepage, social media, etc.)
- l_recording_work - Recording-to-work relationships (performance of)
- l_release_release_group - Release-to-release-group relationships
- l_release_url - Release-to-URL relationships (purchase links, streaming, etc.)
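Every example above names the two entity types in alphabetical order (artist before url, release before release_group). That means a table name can be derived from an unordered pair of types. Below is a minimal Python sketch, assuming that ordering rule holds for all pairs. `link_table_name` is a hypothetical helper, not part of MusicBrainz.

```python
def link_table_name(entity_a: str, entity_b: str) -> str:
    """Return the l_* relationship table name for two entity types.

    Assumes the entity types are ordered alphabetically, as in every
    example above, so the argument order does not matter.
    """
    first, second = sorted((entity_a, entity_b))
    return f"l_{first}_{second}"


print(link_table_name("work", "recording"))         # l_recording_work
print(link_table_name("release_group", "release"))  # l_release_release_group
print(link_table_name("url", "artist"))             # l_artist_url
```

Plain string sorting is enough here. Because "release" is a prefix of "release_group", it sorts first.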
Relationship Support Tables:
- link - Link instances
- link_type - Relationship type definitions
- link_attribute - Relationship attributes
- link_attribute_type - Attribute type definitions
- link_crediting - Custom relationship credits
- link_text_attribute - Text attributes for relationships
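As we understand the layout, these support tables split a relationship into layers:

- an l_* row points at two entities and at a link row;
- the link row points at its link_type.

The sketch below walks that chain for l_artist_recording in an in-memory SQLite database. The column names link, entity0, entity1 and link_type reflect that understanding but are cut down to the minimum. Dates, attributes and credits are omitted, and the sample data is made up.

```python
import sqlite3
from typing import List, Tuple

# Reduced, hypothetical versions of the tables; real ones have many more columns.
SCHEMA = """
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE recording (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE link_type (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE link (id INTEGER PRIMARY KEY,
                   link_type INTEGER REFERENCES link_type(id));
CREATE TABLE l_artist_recording (
    id INTEGER PRIMARY KEY,
    link INTEGER REFERENCES link(id),
    entity0 INTEGER REFERENCES artist(id),     -- first type in the table name
    entity1 INTEGER REFERENCES recording(id)   -- second type in the table name
);
"""


def recordings_for_artist(conn: sqlite3.Connection, artist_id: int) -> List[Tuple[str, str]]:
    """Return (relationship type, recording name) pairs for one artist."""
    return conn.execute(
        """
        SELECT lt.name, r.name
        FROM l_artist_recording lar
        JOIN link l ON l.id = lar.link
        JOIN link_type lt ON lt.id = l.link_type
        JOIN recording r ON r.id = lar.entity1
        WHERE lar.entity0 = ?
        ORDER BY r.name
        """,
        (artist_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO artist VALUES (1, 'Nirvana');
INSERT INTO recording VALUES (10, 'Lithium'), (11, 'Polly');
INSERT INTO link_type VALUES (100, 'performer');
INSERT INTO link VALUES (500, 100);  -- one link row shared by both relationships
INSERT INTO l_artist_recording VALUES (1, 500, 1, 10), (2, 500, 1, 11);
""")
print(recordings_for_artist(conn, 1))
# [('performer', 'Lithium'), ('performer', 'Polly')]
```

Here the link row only carries the relationship type (in the full schema, also its dates and attributes), so both relationships can share one link row.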
Media Tables
Physical Media:
- medium - Physical media (CDs, vinyl, etc.)
- medium_format - CD, Vinyl, Digital Media, etc.
- medium_cdtoc - CD table of contents
- cdtoc - CD TOC data
- cdtoc_raw - Raw CD TOC data
Tracks:
- track - Individual tracks on media
- track_gid_redirect - Track MBID redirects
Metadata Tables
Tags:
- tag - Tag definitions
- tag_relation - Tag relationships
- {entity}_tag - Tags per entity type
- {entity}_tag_raw - Raw user tag submissions
Ratings:
- {entity}_rating_raw - Raw user ratings per entity type
Annotations:
- annotation - Annotation text
- {entity}_annotation - Annotations per entity type
Collections:
- editor_collection - User collections
- editor_collection_type - Collection type (release, artist, etc.)
- editor_collection_{entity} - Collection contents per entity type
Editorial Tables
Edits:
- edit - Edit submissions
- edit_data - Edit-specific data (JSON)
- edit_{entity} - Edit to entity mappings
- vote - User votes on edits
- edit_note - Discussion notes on edits
- edit_note_recipient - Edit note notifications
Editors:
- editor - User accounts
- editor_preference - User preferences
- editor_language - User language preferences
- editor_subscribe_artist - Artist subscriptions
- editor_subscribe_collection - Collection subscriptions
- editor_subscribe_label - Label subscriptions
- editor_subscribe_series - Series subscriptions
- editor_subscribe_editor - Editor subscriptions
- editor_oauth_token - OAuth tokens
- application - OAuth applications
Moderation:
- autoeditor_election - Auto-editor elections
- autoeditor_election_vote - Election votes
- editor_watch_preferences - Watchlist preferences
- editor_watch_artist - Artist watchlist
- editor_watch_release_group_type - Release group type filters
- editor_watch_release_status - Release status filters
Identifier Tables
Standard Identifiers:
- isrc - International Standard Recording Code
- iswc - International Standard Musical Work Code
- recording_isrc - Recording to ISRC mapping
- work_iswc - Work to ISWC mapping
MusicBrainz Identifiers:
- {entity}_gid_redirect - MBID redirects after merges
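When entities are merged, the surviving entity keeps its own MBID. The old MBID lives on in the matching {entity}_gid_redirect table, so a lookup checks the entity table first and the redirect table second.

Below is a minimal SQLite sketch for artists. It rests on these assumptions:

- the gid and new_id column names;
- the sample MBIDs;
- redirects always point directly at a live row, so no chains need to be followed.

```python
import sqlite3
from typing import Optional


def resolve_artist_id(conn: sqlite3.Connection, gid: str) -> Optional[int]:
    """Map an MBID to the current artist row id, following a merge redirect."""
    row = conn.execute("SELECT id FROM artist WHERE gid = ?", (gid,)).fetchone()
    if row is not None:
        return row[0]
    # Not a live MBID: it may belong to an artist that was merged away.
    row = conn.execute(
        "SELECT new_id FROM artist_gid_redirect WHERE gid = ?", (gid,)
    ).fetchone()
    return row[0] if row is not None else None


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT UNIQUE NOT NULL, name TEXT);
CREATE TABLE artist_gid_redirect (
    gid TEXT PRIMARY KEY,
    new_id INTEGER NOT NULL REFERENCES artist(id)
);
INSERT INTO artist VALUES (1, 'mbid-surviving', 'Example Artist');
INSERT INTO artist_gid_redirect VALUES ('mbid-merged-away', 1);
""")
print(resolve_artist_id(conn, "mbid-merged-away"))  # 1
```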
Barcodes:
- release_barcode - Release barcodes (EAN, UPC)
Replication Tables (dbmirror2)
Replication System:
- dbmirror_pending - Pending replication packets
- dbmirror_pendingdata - Replication data
- replication_control - Replication state tracking
Modes:
- RT_MASTER - Master database (generates replication packets)
- RT_MIRROR - Mirror database (consumes replication packets)
- RT_STANDALONE - Standalone database (no replication)
Auxiliary Tables
Statistics:
- statistic - Cached statistics
- statistic_event - Statistic calculation events
Documentation:
- documentation.l_{entity1}_{entity2}_example - Relationship examples
Deprecated:
- Various _deleted tables for soft deletes
Schema Management
CreateTables.sql
Location: admin/sql/CreateTables.sql
Size: 4,068 lines
Purpose: Complete schema definition for fresh installations
Structure:
-- Core entity tables
CREATE TABLE artist (...);
CREATE TABLE release (...);
CREATE TABLE recording (...);
-- Indexes
CREATE INDEX artist_idx_name ON artist (name);
CREATE INDEX artist_idx_gid ON artist (gid);
-- Foreign keys
ALTER TABLE artist_credit_name
ADD CONSTRAINT artist_credit_name_fk_artist
FOREIGN KEY (artist) REFERENCES artist(id);
-- Triggers
CREATE TRIGGER a_ins_artist AFTER INSERT ON artist ...;
Migration System
Location: admin/sql/updates/
Count: 332 migration files
Naming: Date-prefixed, followed by a ticket reference (YYYYMMDD-mbs-NNNNN-description.sql)
Example Filenames:
- 20230115-mbs-12345-add-genre-table.sql
- 20230220-mbs-12346-add-event-series-relationship.sql
- 20230315-mbs-12347-add-recording-length-index.sql
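The example filenames share a fixed shape: an 8-digit date, an mbs-<number> ticket reference, and a slug. The Python sketch below parses that shape and orders files by date, then by ticket number. The regular expression is inferred from the three examples only, so some real filenames may not match it.

```python
import re
from typing import Iterable, List, NamedTuple

# Inferred from the examples above: YYYYMMDD-mbs-<ticket>-<slug>.sql
MIGRATION_RE = re.compile(r"^(\d{8})-mbs-(\d+)-([a-z0-9-]+)\.sql$")


class Migration(NamedTuple):
    date: str      # YYYYMMDD
    ticket: int    # number after "mbs-"
    slug: str
    filename: str


def parse_migration(filename: str) -> Migration:
    match = MIGRATION_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected migration filename: {filename}")
    date, ticket, slug = match.groups()
    return Migration(date, int(ticket), slug, filename)


def in_order(filenames: Iterable[str]) -> List[Migration]:
    """Sort by date, then numerically by ticket (so 9999 sorts before 12345)."""
    return sorted((parse_migration(f) for f in filenames),
                  key=lambda mig: (mig.date, mig.ticket))


files = [
    "20230315-mbs-12347-add-recording-length-index.sql",
    "20230115-mbs-12345-add-genre-table.sql",
]
print([mig.slug for mig in in_order(files)])
# ['add-genre-table', 'add-recording-length-index']
```

The numeric ticket comparison only matters for files sharing a date. Plain string sorting would put mbs-12345 before mbs-9999.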
Migration Structure:
\set ON_ERROR_STOP 1
BEGIN;
-- Schema changes
ALTER TABLE artist ADD COLUMN disambiguation TEXT;
-- Data migrations
UPDATE artist SET disambiguation = '' WHERE disambiguation IS NULL;
-- Constraints
ALTER TABLE artist ALTER COLUMN disambiguation SET NOT NULL;
COMMIT;
Schema Change Variants:
- schema-change/ subdirectory contains master/mirror variants
- Master migrations may include replication setup
- Mirror migrations skip replication-specific changes
Migration Tracking:
- Migrations are tracked in the database
- Applied migrations recorded to prevent re-application
- Rollback not supported (forward-only migrations)
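A forward-only scheme only needs a record of which files have already run. The sketch below shows one way to keep that record, using SQLite and a hypothetical applied_migration table. This document does not say how MusicBrainz records applied migrations.

Each file's statements and its bookkeeping row commit together. A failed migration therefore leaves no record and can be fixed and re-run.

```python
import sqlite3
from typing import Dict, List


def apply_pending(conn: sqlite3.Connection,
                  migrations: Dict[str, List[str]]) -> List[str]:
    """Apply unrecorded migrations in filename order; return those applied.

    Forward-only: there is no down step, just a record of what has run.
    Expects a connection opened with isolation_level=None so that BEGIN
    and COMMIT below are the only transaction control.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_migration (filename TEXT PRIMARY KEY)")
    done = {row[0] for row in conn.execute("SELECT filename FROM applied_migration")}
    applied = []
    for filename in sorted(migrations):  # the date prefix sorts chronologically
        if filename in done:
            continue
        conn.execute("BEGIN")
        try:
            for statement in migrations[filename]:
                conn.execute(statement)
            conn.execute(
                "INSERT INTO applied_migration (filename) VALUES (?)", (filename,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        applied.append(filename)
    return applied


conn = sqlite3.connect(":memory:", isolation_level=None)
pending = {
    "20230115-mbs-12345-add-genre-table.sql": [
        "CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    ],
}
print(apply_pending(conn, pending))  # ['20230115-mbs-12345-add-genre-table.sql']
print(apply_pending(conn, pending))  # [] (already recorded)
```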
Custom ORM (Moose-based Data Layer)
Architecture
NOT DBIx::Class - MusicBrainz uses a custom Moose-based data access layer.
Components:
- 106 Data modules in lib/MusicBrainz/Server/Data/
- DBIx::Connector for connection and transaction management
- Sql.pm for query abstraction
- Raw SQL via DBD::Pg
Data Module Pattern
Base Class: MusicBrainz::Server::Data::Entity
Example:
package MusicBrainz::Server::Data::Artist;
use Moose;
extends 'MusicBrainz::Server::Data::Entity';
with 'MusicBrainz::Server::Data::Role::Editable';
with 'MusicBrainz::Server::Data::Role::LinksToEdit';
with 'MusicBrainz::Server::Data::Role::Merge';
sub _table { 'artist' }
sub _entity_class { 'MusicBrainz::Server::Entity::Artist' }
sub _columns {
return 'id, gid, name, sort_name, begin_date_year, begin_date_month,
begin_date_day, end_date_year, end_date_month, end_date_day,
type, area, gender, comment, edits_pending, last_updated,
ended, begin_area, end_area';
}
sub _column_mapping {
return {
id => 'id',
gid => 'gid',
name => 'name',
sort_name => 'sort_name',
type_id => 'type',
area_id => 'area',
gender_id => 'gender',
comment => 'comment',
edits_pending => 'edits_pending',
last_updated => 'last_updated',
ended => 'ended',
begin_area_id => 'begin_area',
end_area_id => 'end_area',
};
}
sub get_by_gid {
my ($self, $gid) = @_;
return $self->_get_by_key('gid', $gid);
}
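sub insert {
my ($self, $data) = @_;
my $row = $self->_hash_to_row($data);
my $id = $self->sql->insert_row('artist', $row, 'id');
# Copy the generated id into the row so the returned entity carries it
$row->{id} = $id;
return $self->_new_from_row($row);
}
__PACKAGE__->meta->make_immutable;
1;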
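The point of _column_mapping is that entity attribute names (type_id, area_id) need not match column names (type, area). Below is the same idea in Python. It is a rough analog of _new_from_row rather than a port of it, and the Artist dataclass is hypothetical, cut down to six attributes.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Artist:
    id: int
    gid: str
    name: str
    sort_name: str
    type_id: Optional[int] = None
    area_id: Optional[int] = None


# attribute name -> column name, the same direction as _column_mapping
COLUMN_MAPPING = {
    "id": "id",
    "gid": "gid",
    "name": "name",
    "sort_name": "sort_name",
    "type_id": "type",
    "area_id": "area",
}


def new_from_row(row: Dict[str, Any]) -> Artist:
    """Build an Artist from a row dict, renaming columns to attributes."""
    return Artist(**{attr: row.get(column) for attr, column in COLUMN_MAPPING.items()})


row = {"id": 1, "gid": "mbid-1", "name": "Nirvana", "sort_name": "Nirvana",
       "type": 2, "area": None, "comment": ""}
artist = new_from_row(row)
print(artist.type_id)  # 2
```

Columns without a mapping entry, such as comment here, are simply ignored.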
Moose Roles
Role::Editable:
- Entities that can be edited via the edit system
- Provides load_meta() for edit counts
Role::Taggable:
- Entities that support folksonomy tags
- Provides tags(), add_tags(), remove_tags()
Role::Rateable:
- Entities that can be rated (0-100 scale)
- Provides rating(), user_rating()
Role::Relatable:
- Entities that can have relationships
- Provides relationships(), add_relationship()
Role::Aliasable:
- Entities that can have alternative names
- Provides aliases(), add_alias()
Role::Annotation:
- Entities that can be annotated
- Provides latest_annotation()
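Role::Rateable works on a 0-100 scale. Assume a five-star interface where each star is worth 20 points (an assumption, not something this document states). Converting and aggregating ratings then takes only a few lines. stars_to_rating and average_rating are hypothetical helpers.

```python
from typing import List, Optional

POINTS_PER_STAR = 20  # assumption: 5 stars map onto the 0-100 scale


def stars_to_rating(stars: int) -> int:
    """Convert a 1-5 star vote to the 0-100 storage scale."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    return stars * POINTS_PER_STAR


def average_rating(raw_ratings: List[int]) -> Optional[float]:
    """Mean of raw 0-100 ratings, or None when nobody has rated yet."""
    if not raw_ratings:
        return None
    return sum(raw_ratings) / len(raw_ratings)


votes = [stars_to_rating(s) for s in (3, 4, 5)]
print(votes)                  # [60, 80, 100]
print(average_rating(votes))  # 80.0
```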
Sql.pm Abstraction
Location: lib/MusicBrainz/Server/Sql.pm
Purpose: Thin abstraction over DBI for common query patterns.
Methods:
# Single row
my $row = $sql->select_single_row_hash(
'SELECT * FROM artist WHERE gid = ?', $gid
);
# Multiple rows
my $rows = $sql->select_list_of_hashes(
'SELECT * FROM artist WHERE area = ?', $area_id
);
# Insert
my $id = $sql->insert_row('artist', {
gid => $gid,
name => $name,
sort_name => $sort_name,
}, 'id');
# Update
$sql->update_row('artist', {
name => $new_name,
}, { id => $artist_id });
# Delete
$sql->delete_row('artist', { id => $artist_id });
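# Transaction (copy $@ first: code run by rollback can overwrite it)
$sql->begin;
eval {
$sql->insert_row('artist', { gid => $gid, name => $name, sort_name => $sort_name }, 'id');
$sql->update_row('artist', { name => $new_name }, { id => $artist_id });
$sql->commit;
1;
} or do {
my $err = $@;
$sql->rollback;
die $err;
};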
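The begin/commit/rollback pattern above maps directly onto other database APIs. For comparison, here is a sketch with Python's built-in sqlite3 module. The artist and audit tables are made up. The CHECK constraint exists only to force a failure part-way through, so the rollback is visible.

```python
import sqlite3


def rename_artist(conn: sqlite3.Connection, artist_id: int, new_name: str) -> None:
    """Update an artist and log the change atomically: both writes or neither."""
    try:
        # sqlite3 opens a transaction implicitly before the UPDATE
        conn.execute("UPDATE artist SET name = ? WHERE id = ?", (new_name, artist_id))
        conn.execute("INSERT INTO audit (artist, new_name) VALUES (?, ?)",
                     (artist_id, new_name))
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE audit (artist INTEGER, new_name TEXT CHECK (new_name <> ''));
INSERT INTO artist VALUES (1, 'Nirvana');
""")
try:
    rename_artist(conn, 1, "")  # the audit INSERT violates the CHECK
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT name FROM artist WHERE id = 1").fetchone()[0])  # Nirvana
```

In everyday Python code, using the connection as a context manager (`with conn:`) gives the same commit-on-success, rollback-on-error behavior.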
DBIx::Connector
Purpose: Fast, safe DBI connection management with automatic reconnection.
Configuration:
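use DBIx::Connector;
my $conn = DBIx::Connector->new(
$dsn, $username, $password,
{
RaiseError => 1,
AutoCommit => 1,
pg_enable_utf8 => 1,
}
);
# 'fixup' mode: if the block dies because the connection was lost,
# reconnect and run the block once more
my $rows = $conn->run(fixup => sub {
my $dbh = $_;
return $dbh->selectall_arrayref('SELECT ...');
});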
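DBIx::Connector's run() accepts a mode argument. In 'fixup' mode it runs the block without pinging first. If the block fails because the connection has gone away, it reconnects and runs the block one more time.

Below is a Python sketch of that retry-once idea. Connector, ConnectionLost and the fake connect function are hypothetical. Detecting a dropped connection is reduced to catching one exception type.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ConnectionLost(Exception):
    """Stands in for a driver error meaning the server connection dropped."""


class Connector:
    """Retry-once wrapper loosely modelled on DBIx::Connector's fixup mode."""

    def __init__(self, connect: Callable[[], object]) -> None:
        self._connect = connect
        self._conn: Optional[object] = None

    def run_fixup(self, block: Callable[[object], T]) -> T:
        if self._conn is None:
            self._conn = self._connect()
        try:
            return block(self._conn)  # no ping before running
        except ConnectionLost:
            # Reconnect and re-run once; a second failure propagates.
            self._conn = self._connect()
            return block(self._conn)


connections = []


def fake_connect() -> int:
    connections.append(len(connections) + 1)
    return connections[-1]


def query(conn: object) -> str:
    if conn == 1:  # pretend the first connection died
        raise ConnectionLost("server closed the connection")
    return f"rows via connection {conn}"


connector = Connector(fake_connect)
print(connector.run_fixup(query))  # rows via connection 2
```

Re-running is only safe when a failed attempt left nothing half-done. This mode therefore suits idempotent blocks or blocks wrapped in a transaction.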
Search Infrastructure
Apache Solr (Primary)
Purpose: Full-text search across all entities
Cores:
- artist - Artist search
- release - Release search
- release-group - Release group search
- recording - Recording search
- work - Work search
- label - Label search
- area - Area search
- event - Event search
- place - Place search
- series - Series search
- instrument - Instrument search
- tag - Tag search
Indexing:
- Incremental updates via the edit system
- Full reindex via admin/BuildSearchIndexes.pl
- Real-time updates for new entities
Query Features:
- Fuzzy matching
- Phrase search
- Boolean operators (AND, OR, NOT)
- Field-specific search (artist:nirvana)
- Wildcards (nirv*)
- Proximity search ("smells spirit"~5)
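These features come from the Lucene query syntax accepted by Solr's standard query parser, so user input has to be escaped before it goes into a query. The sketch below escapes the characters the classic Lucene parser treats as syntax. It also builds a field-specific phrase with an optional proximity distance. escape_term and field_phrase are hypothetical helpers, and this document does not say how MusicBrainz itself escapes queries.

```python
# Characters the classic Lucene query parser treats as syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(text: str) -> str:
    """Backslash-escape query syntax so user input is matched literally."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def field_phrase(field: str, text: str, slop: int = 0) -> str:
    """Build field:"phrase", adding ~slop for a proximity search when slop > 0."""
    inner = text.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{inner}"'
    return f"{query}~{slop}" if slop > 0 else query


print(escape_term("AC/DC"))                           # AC\/DC
print(field_phrase("recording", "smells spirit", 5))  # recording:"smells spirit"~5
```

Escaping every special character also disables wildcards such as nirv*. A search box that wants to support them has to choose which characters to let through.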
PostgreSQL Full-Text (Fallback)
Purpose: Fallback when Solr is unavailable
Implementation:
- mb_simple_tsvector function for text vectorization
- GIN indexes on tsvector columns
- to_tsquery() for query parsing
Example:
CREATE INDEX artist_idx_name_txt ON artist
USING gin(mb_simple_tsvector(name));
SELECT * FROM artist
WHERE mb_simple_tsvector(name) @@ to_tsquery('simple', 'nirvana');
Limitations:
- Less sophisticated than Solr
- No fuzzy matching
- Limited ranking
- Used only as emergency fallback
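The fallback rule itself is simple: use Solr, and run the PostgreSQL full-text query only when Solr cannot be reached. Below is a sketch of that control flow. SearchUnavailable and the backend functions are hypothetical stand-ins. Other errors, such as a malformed query, deliberately do not trigger the fallback.

```python
from typing import Callable, List


class SearchUnavailable(Exception):
    """Hypothetical error raised when the Solr backend cannot be reached."""


def search(query: str,
           solr: Callable[[str], List[str]],
           postgres: Callable[[str], List[str]]) -> List[str]:
    """Query Solr; fall back to PostgreSQL full-text only if Solr is down."""
    try:
        return solr(query)
    except SearchUnavailable:
        return postgres(query)


def solr_down(query: str) -> List[str]:
    raise SearchUnavailable("connection refused")


def solr_up(query: str) -> List[str]:
    return [f"solr:{query}"]


def pg_fulltext(query: str) -> List[str]:
    return [f"pg:{query}"]


print(search("nirvana", solr_up, pg_fulltext))    # ['solr:nirvana']
print(search("nirvana", solr_down, pg_fulltext))  # ['pg:nirvana']
```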
Redis Caching
Architecture
Databases: 16 separate Redis databases (0-15)
Database Allocation:
- DB 0: Entity cache (GID lookups)
- DB 1: Session storage
- DB 2-15: Various caches (search, statistics, etc.)
Entity Cache (GID Cache)
Purpose: Cache entity lookups by MBID (GID)
Pattern:
# Cache key: entity:gid:{gid}
my $cache_key = "artist:gid:$gid";
# Try cache first
my $cached = $redis->get($cache_key);
if (defined $cached) {
return decode_json($cached);
}
# Cache miss - load from database
my $artist = $self->sql->select_single_row_hash(
'SELECT * FROM artist WHERE gid = ?', $gid
);
# Unknown MBID: nothing to cache
return unless $artist;
# Store in cache (1 hour TTL)
$redis->setex($cache_key, 3600, encode_json($artist));
return $artist;
TTL: 1 hour (3600 seconds)
Invalidation: On edit application
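The get-then-load-then-setex flow above is the cache-aside pattern. Below is a self-contained Python sketch of it, in which unknown MBIDs are not cached. TTLCache is a hypothetical in-memory stand-in for the three Redis commands involved (GET, SETEX, DEL). It takes an injectable clock, so expiry can be shown without waiting an hour.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis GET / SETEX / DEL (not a Redis client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[float, str]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(key, None)  # drop expired entries lazily
            return None
        return entry[1]

    def setex(self, key: str, ttl: int, value: str) -> None:
        self._data[key] = (self._clock() + ttl, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def get_artist(cache: TTLCache, load: Callable[[str], Optional[dict]],
               gid: str) -> Optional[dict]:
    key = f"artist:gid:{gid}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    artist = load(gid)  # cache miss: query the database
    if artist is not None:
        cache.setex(key, 3600, json.dumps(artist))  # 1 hour TTL
    return artist


now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db_calls = []


def load_from_db(gid: str) -> dict:
    db_calls.append(gid)
    return {"gid": gid, "name": "Nirvana"}


get_artist(cache, load_from_db, "mbid-1")  # miss: loads from the database
get_artist(cache, load_from_db, "mbid-1")  # hit: served from the cache
now[0] = 3601.0                            # one hour and one second later
get_artist(cache, load_from_db, "mbid-1")  # expired: loads again
print(len(db_calls))  # 2
```

Invalidation on edit is then just `cache.delete("artist:gid:...")`, which forces the next read back to the database.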
Session Storage
Purpose: Store user sessions
Pattern:
# Session key: session:{session_id}
my $session_key = "session:$session_id";
# Store session
$redis->setex($session_key, 36000, encode_json({
user_id => $user_id,
csrf_token => $csrf_token,
last_activity => time(),
}));
# Retrieve session (get returns undef once the key has expired)
my $json = $redis->get($session_key);
my $session = defined $json ? decode_json($json) : undef;
TTL: 10 hours absolute, 3 hours idle
Cookie: AF_SID (SameSite=Lax, Secure, HttpOnly)
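The storage pattern above sets a single Redis TTL of 36000 seconds, which covers the 10-hour absolute limit. The 3-hour idle limit needs a separate check against last_activity.

Below is one way to combine the two, as a sketch. Session, is_valid and touch are hypothetical names. This document does not say whether MusicBrainz enforces the limits exactly this way.

```python
from dataclasses import dataclass

ABSOLUTE_TTL = 10 * 3600  # seconds since the session was created
IDLE_TTL = 3 * 3600       # seconds since the last request


@dataclass
class Session:
    created: float
    last_activity: float


def is_valid(session: Session, now: float) -> bool:
    """A session is usable only while both limits still hold."""
    return (now - session.created < ABSOLUTE_TTL
            and now - session.last_activity < IDLE_TTL)


def touch(session: Session, now: float) -> None:
    """Record activity: extends the idle window, never the absolute one."""
    session.last_activity = now


HOUR = 3600
session = Session(created=0.0, last_activity=0.0)
for hour in (2, 4, 6, 8):  # a request every two hours
    assert is_valid(session, hour * HOUR)
    touch(session, hour * HOUR)
print(is_valid(session, 9 * HOUR))   # True: idle 1 h, age 9 h
print(is_valid(session, 10 * HOUR))  # False: absolute limit reached
```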
Cache Invalidation
Strategy: Invalidate on write
Example:
# After updating artist
$self->sql->update_row('artist', { name => $new_name }, { id => $id });
# Invalidate cache
$redis->del("artist:gid:$gid");
Bulk Invalidation:
- Pattern-based deletion via SCAN + DEL
- Used for relationship changes affecting multiple entities
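SCAN walks the keyspace in pages instead of blocking the server the way KEYS would, and each page of matches is deleted before moving on. The sketch below imitates that loop over a plain dict. A few caveats apply:

- delete_matching is hypothetical.
- Paging over a sorted snapshot stands in for the SCAN cursor.
- Python's fnmatchcase is close to, but not identical to, Redis MATCH glob syntax.

With a real client, the page loop would come from something like redis-py's scan_iter.

```python
from fnmatch import fnmatchcase
from typing import Dict


def delete_matching(store: Dict[str, str], pattern: str, page_size: int = 100) -> int:
    """Delete keys matching a glob pattern one page at a time; return count."""
    snapshot = sorted(store)  # stands in for SCAN's cursor over the keyspace
    deleted = 0
    for start in range(0, len(snapshot), page_size):
        page = snapshot[start:start + page_size]
        matches = [key for key in page if fnmatchcase(key, pattern)]
        for key in matches:  # one DEL per page in the Redis version
            if key in store:
                del store[key]
                deleted += 1
    return deleted


cache = {
    "artist:gid:a1": "{}",
    "artist:gid:b2": "{}",
    "release:gid:c3": "{}",
}
print(delete_matching(cache, "artist:gid:*", page_size=2))  # 2
print(sorted(cache))  # ['release:gid:c3']
```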