# MusicBrainz Ingestion

Architecture documentation for ingesting music metadata from MusicBrainz.

---

## Overview

**MusicBrainz** is an open music encyclopedia maintained by the MetaBrainz Foundation. It serves as the canonical source for music metadata with community-curated data covering artists, releases, recordings, and works.

| Attribute | Value |
|-----------|-------|
| Data Quality | High (community-curated) |
| Coverage | ~2M artists, ~3M releases, ~30M recordings |
| Update Frequency | Real-time edits, twice-weekly dumps |
| API Style | REST with Lucene search |
| Cost | Free (rate-limited) |

---

## Data Model

MusicBrainz uses a hierarchical model that separates abstract concepts from concrete manifestations.

### Entity Hierarchy

```
                       ┌──────────┐
                       │   WORK   │  ← Composition (the song as written)
                       │  (ISWC)  │    "Bohemian Rhapsody" by Freddie Mercury
                       └────┬─────┘
                            │ performed as
                            ▼
                       ┌──────────┐
                       │RECORDING │  ← Unique audio (specific performance)
                       │  (ISRC)  │    Studio version, live version, demo
                       └────┬─────┘
                            │ appears on
                            ▼
┌──────────┐           ┌──────────┐
│  ARTIST  │◄─────────►│ RELEASE  │  ← Physical/digital product
│  (MBID)  │ credited  │  (UPC)   │    US CD, UK Vinyl, Spotify release
└──────────┘    on     └────┬─────┘
                            │ variant of
                            ▼
                       ┌──────────┐
                       │ RELEASE  │  ← Abstract album concept
                       │  GROUP   │    "A Night at the Opera" (all editions)
                       └──────────┘
```

### Core Entities

| Entity | Description | Identifier | Example |
|--------|-------------|------------|---------|
| **Artist** | Musician, band, orchestra, composer | MBID | Queen, Freddie Mercury |
| **Work** | Abstract composition | ISWC | "Bohemian Rhapsody" (the song) |
| **Recording** | Specific audio performance | ISRC | Studio recording of Bohemian Rhapsody |
| **Release** | Concrete product (CD, vinyl, digital) | Barcode/UPC | 1975 UK vinyl pressing |
| **Release Group** | Abstract album (all editions) | MBID | "A Night at the Opera" |
| **Label** | Record label or imprint | MBID | EMI, Hollywood Records |

### Key Distinction: Release vs Release Group

**Release Group** = The abstract album concept
- "Nevermind" by Nirvana

**Release** = A specific physical or digital product
- 1991 US CD (DGC)
- 1991 UK CD (Geffen)
- 2011 Deluxe Edition (2 CDs)
- 2021 30th Anniversary Super Deluxe

This separation allows tracking all variants while maintaining a single "album" identity.

### Key Distinction: Recording vs Work

**Work** = The composition (what was written)
- Composer: Kurt Cobain
- ISWC identifier
- No audio - just the abstract song

**Recording** = A specific audio capture
- Performer: Nirvana
- ISRC identifier
- Has duration, audio characteristics
- Multiple recordings of same work (studio, live, acoustic)

---

## Relationship System

MusicBrainz uses **Advanced Relationships (ARs)** to connect entities with typed, attributed links.

### Relationship Types

**Artist ↔ Artist:**
- `member of band` (with dates)
- `collaboration`
- `teacher of`

**Artist ↔ Recording:**
- `performer` (with instrument)
- `producer`
- `engineer`
- `mix`

**Artist ↔ Work:**
- `composer`
- `lyricist`
- `writer`

**Recording ↔ Work:**
- `performance of`

**Artist ↔ URL:**
- `official homepage`
- `social network` (Facebook, Instagram, etc.)
- `streaming` (Spotify, etc.)
- `youtube`

### Relationship Attributes

Relationships carry attributes providing detail:

```
Artist: John Lennon
  └─► Recording: "Come Together"
      Relationship: performer
      Attributes:
        - instrument: vocals
        - instrument: rhythm guitar
```

---

## API Access Patterns

### Three Methods

| Method | Purpose | Use Case |
|--------|---------|----------|
| **Lookup** | Fetch single entity by MBID | Known entity, need full details |
| **Browse** | Paginate related entities | All albums by artist, all tracks on album |
| **Search** | Find entities by criteria | Find artist by name, recording by ISRC |

### Lookup

Direct fetch by MusicBrainz ID (MBID). Returns single entity with optional related data via `inc` parameter.

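
As a rough illustration of how `inc` shapes a lookup request, the sketch below only builds the request URL and performs no network call. The `https://musicbrainz.org/ws/2` base path and the `fmt=json` parameter follow the public web service conventions; the `lookup_url` helper and the placeholder MBID are invented for this example and are not part of any client library.

```python
import uuid

BASE_URL = "https://musicbrainz.org/ws/2"

# Placeholder UUID for illustration only; it is not a real MusicBrainz entity.
PLACEHOLDER_MBID = "00000000-0000-4000-8000-000000000000"


def lookup_url(entity: str, mbid: str, inc: tuple = ()) -> str:
    """Build a lookup URL for one entity, optionally requesting related data."""
    # MBIDs are UUIDs, so normalise the value and fail early on malformed input.
    mbid = str(uuid.UUID(mbid))
    url = f"{BASE_URL}/{entity}/{mbid}?fmt=json"
    if inc:
        # Multiple inc values are joined with '+' in the query string.
        url += "&inc=" + "+".join(inc)
    return url


if __name__ == "__main__":
    print(lookup_url("artist", PLACEHOLDER_MBID, inc=("url-rels", "genres")))
```

Validating the MBID before building the URL keeps malformed identifiers from consuming one of the rate-limited requests.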
Related data options: `releases`, `recordings`, `url-rels`, `artist-rels`, `genres`, `labels`, `media`, `isrcs`

**Limitation:** Related entities capped at 25 per request. Use Browse for complete lists.

### Browse

Paginated fetch of entities related to another entity. Supports up to 100 items per request. Must iterate with offset for complete data.

### Search

Lucene-syntax queries across entity fields. Useful for:
- Finding entities by name (fuzzy matching)
- Looking up by external identifier (ISRC, barcode)
- Filtering by attributes (country, type, date)

---

## Rate Limiting

| Rule | Limit |
|------|-------|
| Requests per second | **1** (hard limit) |
| Burst allowance | None |
| Violation penalty | HTTP 503 until rate drops |
| User-Agent | **Required** (blocked without) |

User-Agent format: `AppName/Version ( contact-url-or-email )`

---

## Entity Mapping to Internal Schema

### Artist

| MusicBrainz | Internal | Notes |
|-------------|----------|-------|
| `id` | `source_id` | MBID stored as external reference |
| `name` | `name` | |
| `sort-name` | `sort_name` | |
| `type` | `artist_type` | Person, Group, Orchestra, etc. |
| `country` | `country` | ISO code |
| `life-span.begin` | `formed_date` | |
| `life-span.end` | `disbanded_date` | |
| `disambiguation` | `description` | Short disambiguator |
| URL relationship (image) | `image_url` | From Wikimedia Commons link |

### Album (from Release Group)

| MusicBrainz | Internal | Notes |
|-------------|----------|-------|
| `id` | `source_id` | Release Group MBID |
| `title` | `title` | |
| `primary-type` | `album_type` | Album, EP, Single |
| `first-release-date` | `release_date` | Earliest release |
| Label from release | `label_id` | From canonical release |

### Track (from Recording)

| MusicBrainz | Internal | Notes |
|-------------|----------|-------|
| `id` | `source_id` | Recording MBID |
| `title` | `title` | |
| `length` | `duration_ms` | In milliseconds |
| `isrcs[0]` | `isrc` | First ISRC if multiple |
| Work relationship | `work_id` | Link to composition |

### Work

| MusicBrainz | Internal | Notes |
|-------------|----------|-------|
| `id` | `source_id` | Work MBID |
| `title` | `title` | |
| `type` | `work_type` | Song, Symphony, Opera, etc. |
| `language` | `language` | ISO code |

### Label

| MusicBrainz | Internal | Notes |
|-------------|----------|-------|
| `id` | `source_id` | Label MBID |
| `name` | `name` | |
| `country` | `country` | ISO code |
| `life-span.begin` | `founded_date` | |

---

## Ingestion Flow

### Artist Discovery

```
INPUT: Artist name
                  │
                  ▼
┌─────────────────────────────────────┐
│  SEARCH by name                     │
│  → Ranked matches with scores       │
│  → Select highest + verify          │
└─────────────────┬───────────────────┘
                  │ MBID
                  ▼
┌─────────────────────────────────────┐
│  LOOKUP with relationships          │
│  → URLs, genres, band members       │
└─────────────────┬───────────────────┘
                  │
                  ▼
STORE: artist + external_id + genres
```

### Discography Sync

```
INPUT: Artist MBID
                  │
                  ▼
┌─────────────────────────────────────┐
│  BROWSE all release-groups          │
│  → Filter: album, ep, single        │
│  → Paginate until exhausted         │
└─────────────────┬───────────────────┘
                  │ for each
                  ▼
┌─────────────────────────────────────┐
│  LOOKUP release-group               │
│  → Get releases list                │
│  → Select canonical release         │
└─────────────────┬───────────────────┘
                  │ release MBID
                  ▼
┌─────────────────────────────────────┐
│  LOOKUP release with tracks         │
│  → Media structure (discs)          │
│  → Track positions                  │
│  → ISRCs, label info                │
└─────────────────┬───────────────────┘
                  │
                  ▼
STORE: album + tracks + positions
```

### Canonical Release Selection

When a release-group has multiple releases, select one as canonical:

| Priority | Criteria |
|----------|----------|
| 1 | Status: Official > Promotional > Bootleg |
| 2 | Format: Digital > CD > Vinyl |
| 3 | Completeness: Has barcode, has label |
| 4 | Date: Original release preferred |

---

## Cover Art

Album artwork served by **Cover Art Archive** (coverartarchive.org), not MusicBrainz directly.

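
Because the artwork lives on a separate host, ingestion code typically derives image URLs from the release MBID. The helper below is a minimal sketch of that idea: `front_cover_url` and its constants are hypothetical names, the host is the `coverartarchive.org` domain named above, and the accepted sizes are the 250, 500, and 1200 suffixes from the table that follows.

```python
from typing import Optional

COVER_ART_HOST = "https://coverartarchive.org"

# Thumbnail suffixes taken from the URL pattern table in this section.
THUMBNAIL_SIZES = (250, 500, 1200)


def front_cover_url(release_mbid: str, size: Optional[int] = None) -> str:
    """Return the front-cover URL for a release; size=None means the original."""
    path = f"/release/{release_mbid}/front"
    if size is not None:
        if size not in THUMBNAIL_SIZES:
            raise ValueError(f"unsupported thumbnail size: {size}")
        path += f"-{size}"
    return COVER_ART_HOST + path
```

A URL built this way can still point at nothing, since not every release has artwork, so callers would normally check availability before storing it.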

| Size | URL Pattern |
|------|-------------|
| Original | `/release/{release_mbid}/front` |
| Thumbnail | `/release/{release_mbid}/front-250` |
| Medium | `/release/{release_mbid}/front-500` |
| Large | `/release/{release_mbid}/front-1200` |

Not all releases have cover art. Check availability via release metadata.

---

## Bulk Data Access

For large-scale ingestion, database dumps avoid rate limits.

| Source | Format | Frequency | Use Case |
|--------|--------|-----------|----------|
| JSON dumps | JSONL (xz-compressed tar) | 2x/week | Initial seeding |
| PostgreSQL dumps | SQL | 2x/week | Full mirror |
| Replication packets | Incremental | Hourly | Staying in sync |

### Recommended Strategy

| Phase | Method |
|-------|--------|
| Initial load | JSON dumps |
| On-demand | Live API with caching |
| Periodic refresh | JSON dumps monthly |

---

## Caching

| Entity | TTL | Rationale |
|--------|-----|-----------|
| Artist | 30 days | Rarely changes |
| Album | 30 days | Rarely changes |
| Track | 30 days | Rarely changes |
| Search results | 24 hours | New entries may appear |

---

## External ID Storage

Store in `*_external_ids` tables:

| Field | Value |
|-------|-------|
| `source` | `"musicbrainz"` |
| `source_id` | MBID (UUID) |
| `url` | `https://musicbrainz.org/{entity}/{mbid}` |

Enables:
- Cross-source deduplication
- Lookup by MBID from other services
- Link back for verification

---

## Go Client

Recommended: `go.uploadedlobster.com/musicbrainzws2`
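
Whichever client is used, it still has to respect the limits from the Rate Limiting section: one request per second and a mandatory User-Agent. As a language-neutral illustration (written in Python rather than Go, and not taken from `musicbrainzws2`), the sketch below spaces calls at least one second apart. The `Throttle` class, its injectable `clock` and `sleep` parameters, and the example User-Agent string are all assumptions made for this sketch.

```python
import time


class Throttle:
    """Space successive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was allowed

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Hypothetical User-Agent following the AppName/Version ( contact ) format.
HEADERS = {"User-Agent": "ExampleIngest/0.1 ( ops@example.com )"}
```

Calling `wait()` before every request keeps a single process under the limit. Injecting `clock` and `sleep` lets the behaviour be tested without real delays.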