Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


MusicBrainz Server Integrations

Cover Art Archive

Overview

Service: Cover Art Archive (coverartarchive.org)
Storage: Amazon S3 + Internet Archive
Purpose: Store and serve album cover artwork

Upload Process

Method: Signed POST to S3

Authentication: HMAC-SHA1 signed policy

Configuration:

# DBDefs.pm
sub COVER_ART_ARCHIVE_ACCESS_KEY { 'access_key' }
sub COVER_ART_ARCHIVE_SECRET_KEY { 'secret_key' }
sub COVER_ART_ARCHIVE_UPLOAD_PREFIXER { 'MB' }
sub COVER_ART_ARCHIVE_DOWNLOAD_PREFIX { 'https://coverartarchive.org' }

Upload Flow:

  1. User uploads image via MusicBrainz interface
  2. Server generates S3 policy document
  3. Policy signed with HMAC-SHA1 using secret key
  4. Browser POSTs directly to S3 with signed policy
  5. S3 stores image and forwards to Internet Archive
  6. Image becomes available at coverartarchive.org

Policy Document:

{
  "expiration": "2024-12-31T23:59:59Z",
  "conditions": [
    {"bucket": "mbid-{release_mbid}"},
    {"acl": "public-read"},
    ["starts-with", "$key", "mbid-{release_mbid}/"],
    ["content-length-range", 0, 10485760]
  ]
}

Signature:

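use Digest::SHA qw(hmac_sha1_base64);
use MIME::Base64 qw(encode_base64);

# Empty second argument: no line breaks inside the Base64 output
my $policy_b64 = encode_base64($policy_json, '');
my $signature = hmac_sha1_base64($policy_b64, $secret_key);
$signature .= '=' while length($signature) % 4;  # Digest::SHA omits padding; pad to multiple of 4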
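
Putting the policy document and the signature step together, a minimal sketch might look like the following. The helper name `sign_upload_policy` and its argument order are hypothetical, and the policy fields simply mirror the example above rather than the server's actual code. Only core Perl modules are used.

```perl
use strict;
use warnings;
use JSON::PP ();
use MIME::Base64 qw(encode_base64);
use Digest::SHA qw(hmac_sha1_base64);

# Hypothetical helper: build and sign an upload policy for one release.
sub sign_upload_policy {
    my ($release_mbid, $secret_key, $expiration) = @_;

    my $policy = {
        expiration => $expiration,
        conditions => [
            { bucket => "mbid-$release_mbid" },
            { acl    => 'public-read' },
            [ 'starts-with', '$key', "mbid-$release_mbid/" ],
            [ 'content-length-range', 0, 10 * 1024 * 1024 ],
        ],
    };

    # canonical() sorts hash keys so the same input always signs identically.
    my $policy_json = JSON::PP->new->canonical->encode($policy);

    # The empty second argument suppresses the line breaks that
    # encode_base64 would otherwise insert every 76 characters.
    my $policy_b64 = encode_base64($policy_json, '');

    # Digest::SHA leaves off Base64 padding, so restore it.
    my $signature = hmac_sha1_base64($policy_b64, $secret_key);
    $signature .= '=' while length($signature) % 4;

    return ($policy_b64, $signature);
}

my ($policy_b64, $signature) = sign_upload_policy(
    '76df3287-6cda-33eb-8e9a-044b5e15ffdd', 'secret_key', '2024-12-31T23:59:59Z',
);
print "policy=$policy_b64\nsignature=$signature\n";
```

The browser would then submit both values as form fields alongside the file; the exact field names depend on the S3-compatible endpoint and are not shown here.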

Retrieval

URL Pattern: https://coverartarchive.org/release/{mbid}/front

Image Types:

  • front - Front cover
  • back - Back cover
  • {id} - Specific image by ID

Sizes:

  • Original (full resolution)
  • 250 - 250px thumbnail
  • 500 - 500px thumbnail
  • 1200 - 1200px large

Example:

https://coverartarchive.org/release/76df3287-6cda-33eb-8e9a-044b5e15ffdd/front-250.jpg
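
As an illustration of the URL pattern, image types, and sizes listed above, here is a small sketch of a URL builder. The function name `cover_art_url` is hypothetical, and the `.jpg` suffix on thumbnails is taken from the example URL above rather than from a specification.

```perl
use strict;
use warnings;

# Hypothetical helper: build a Cover Art Archive URL for a release image.
# $type is "front", "back", or an image ID; $size is optional.
sub cover_art_url {
    my ($release_mbid, $type, $size) = @_;
    my %valid_size = map { $_ => 1 } (250, 500, 1200);

    my $url = "https://coverartarchive.org/release/$release_mbid/$type";
    if (defined $size) {
        die "Unsupported thumbnail size: $size\n" unless $valid_size{$size};
        $url .= "-$size.jpg";   # suffix format assumed from the example URL
    }
    return $url;
}

print cover_art_url('76df3287-6cda-33eb-8e9a-044b5e15ffdd', 'front', 250), "\n";
```

Omitting the size argument yields the original, full-resolution image URL.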

Wikipedia/Wikidata/Wikimedia Commons

MediaWiki API Integration

Purpose: Fetch article extracts, images, and structured data

Endpoints:

  • Wikipedia: https://{lang}.wikipedia.org/w/api.php
  • Wikidata: https://www.wikidata.org/w/api.php
  • Commons: https://commons.wikimedia.org/w/api.php

Wikipedia Extracts

API Action: query with prop=extracts

Request:

my $url = "https://en.wikipedia.org/w/api.php?" . 
          "action=query&" .
          "prop=extracts&" .
          "exintro=1&" .
          "explaintext=1&" .
          "titles=" . uri_escape($artist_name) .
          "&format=json";

my $response = $ua->get($url);
my $data = decode_json($response->content);
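
The decoded response nests the extract under `query.pages.<page id>`, so the caller has to walk the pages hash. A minimal sketch of that step follows; the helper name `first_extract` is hypothetical and the sample JSON is hand-written in the shape the API returns, not real output.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: return the first non-empty extract in a
# "query" response, or nothing if the title has no article.
sub first_extract {
    my ($json_text) = @_;
    my $data  = decode_json($json_text);
    my $pages = $data->{query}{pages} || {};

    for my $page_id (sort keys %$pages) {
        my $page = $pages->{$page_id};
        next if exists $page->{missing};   # the API flags unknown titles this way
        return $page->{extract}
            if defined $page->{extract} && length $page->{extract};
    }
    return;
}

# Hand-written sample response; not real API output.
my $extract_sample = '{"query":{"pages":{"1234":{"pageid":1234,"title":"Example Artist","extract":"Example Artist is a band."}}}}';
print first_extract($extract_sample), "\n";
```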

Caching: 3 days for extracts

Display: Artist/release pages show Wikipedia extract in sidebar

Wikipedia Language Links

API Action: query with prop=langlinks

Purpose: Find Wikipedia articles in different languages

Request:

my $url = "https://en.wikipedia.org/w/api.php?" .
          "action=query&" .
          "prop=langlinks&" .
          "titles=" . uri_escape($title) .
          "&lllimit=500" .
          "&format=json";

Caching: 7 days for language links

Usage: Display Wikipedia links in user's preferred language
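
Choosing a link in the user's preferred language could be sketched as below. In the default JSON format each `langlinks` entry carries the language code under `lang` and the article title under `*`. The helper name `pick_langlink` and the sample data are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: return (lang, title) for the first preferred
# language that has an article, or an empty list if none match.
sub pick_langlink {
    my ($json_text, @preferred) = @_;
    my $data = decode_json($json_text);

    my %title_for;
    for my $page (values %{ $data->{query}{pages} || {} }) {
        for my $link (@{ $page->{langlinks} || [] }) {
            $title_for{ $link->{lang} } = $link->{'*'};
        }
    }

    for my $lang (@preferred) {
        return ($lang, $title_for{$lang}) if exists $title_for{$lang};
    }
    return;
}

# Hand-written sample response; not real API output.
my $langlinks_sample = '{"query":{"pages":{"1234":{"title":"Example Artist","langlinks":[{"lang":"de","*":"Beispiel (Band)"},{"lang":"fr","*":"Exemple (groupe)"}]}}}}';

my ($lang, $title) = pick_langlink($langlinks_sample, 'es', 'fr', 'de');
print "https://$lang.wikipedia.org/wiki/... for article '$title'\n";
```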

Wikidata Integration

Purpose: Fetch structured data (birth dates, locations, etc.)

API Action: wbgetentities

Request:

# $wikidata_id holds the numeric part of the entity ID (e.g. 42 for Q42)
my $url = "https://www.wikidata.org/w/api.php?" .
          "action=wbgetentities&" .
          "ids=Q${wikidata_id}&" .
          "format=json";

Data Extracted:

  • Birth/death dates
  • Birth/death places
  • Occupations
  • Genres
  • Record labels
  • Official websites
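
Each of these fields is stored as a claim on a Wikidata property; for example, P569 is "date of birth". A rough sketch of pulling claim values out of a `wbgetentities` response follows. The helper name `claim_values` is hypothetical, and the sample JSON is a hand-abbreviated response, not the full API output.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: collect the plain values of one property's claims.
sub claim_values {
    my ($entity, $property) = @_;
    my @values;
    for my $claim (@{ $entity->{claims}{$property} || [] }) {
        my $snak = $claim->{mainsnak} or next;
        # Skip "novalue" and "somevalue" snaks, which carry no datavalue.
        next unless ($snak->{snaktype} // '') eq 'value';
        push @values, $snak->{datavalue}{value};
    }
    return @values;
}

# Hand-abbreviated sample in the shape of a wbgetentities response.
my $wikidata_sample = <<'JSON';
{"entities":{"Q42":{"id":"Q42","claims":{"P569":[{"mainsnak":{"snaktype":"value","property":"P569","datavalue":{"type":"time","value":{"time":"+1952-03-11T00:00:00Z","precision":11}}}}]}}}}
JSON

my $entity = decode_json($wikidata_sample)->{entities}{Q42};
my ($birth) = claim_values($entity, 'P569');
print "born: $birth->{time}\n";
```

Time values carry a `precision` field (11 means day precision), which a caller would need to respect before displaying a full date.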

Wikimedia Commons Images

Purpose: Fetch artist/band photos

API Action: query with prop=imageinfo

Request:

my $url = "https://commons.wikimedia.org/w/api.php?" .
          "action=query&" .
          "prop=imageinfo&" .
          "iiprop=url|size|mime&" .
          "titles=File:" . uri_escape($filename) .
          "&format=json";

Display: Artist pages show Commons images in sidebar
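
The `imageinfo` property comes back as an array under each page, so reading the image URL and dimensions might look like this sketch. The helper name `first_image_info`, the file name, and the URL in the sample are all hypothetical.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: return the first imageinfo record in a response.
sub first_image_info {
    my ($json_text) = @_;
    my $pages = decode_json($json_text)->{query}{pages} || {};
    for my $page (values %$pages) {
        my $info = $page->{imageinfo} or next;
        return $info->[0];
    }
    return;
}

# Hand-written sample response; the URL is a placeholder.
my $commons_sample = '{"query":{"pages":{"-1":{"title":"File:Example.jpg","imageinfo":[{"size":1024,"width":640,"height":480,"url":"https://upload.wikimedia.org/example.jpg","mime":"image/jpeg"}]}}}}';

my $image = first_image_info($commons_sample);
print "$image->{url} ($image->{width}x$image->{height}, $image->{mime})\n";
```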

CritiqueBrainz

Overview

Service: CritiqueBrainz (critiquebrainz.org)
Purpose: User-generated music reviews

Integration

Method: URL linking

Pattern: https://critiquebrainz.org/release/{mbid}

Display: Release pages show link to CritiqueBrainz reviews

Embedding: Review count and average rating displayed on release pages

API: CritiqueBrainz API used to fetch review statistics

Request:

my $url = "https://critiquebrainz.org/ws/1/release/$mbid";
my $response = $ua->get($url);
my $data = decode_json($response->content);

my $review_count = $data->{review_count};
my $avg_rating = $data->{average_rating};

Event Art Archive

Overview

Service: Event Art Archive
Purpose: Store event posters and promotional materials

Architecture: Similar to Cover Art Archive (S3 + Internet Archive)

URL Pattern: https://eventartarchive.org/event/{mbid}

Discourse SSO

Overview

Service: MusicBrainz Community Forum (community.metabrainz.org)
Protocol: Discourse SSO (Single Sign-On)

Authentication Flow

Method: HMAC-SHA256 signed payload

Flow:

  1. User clicks "Log in" on Discourse
  2. Discourse redirects to MusicBrainz with nonce
  3. MusicBrainz authenticates user
  4. MusicBrainz generates SSO payload
  5. Payload signed with HMAC-SHA256
  6. User redirected back to Discourse with signed payload
  7. Discourse verifies signature and logs in user

Configuration:

# DBDefs.pm
sub DISCOURSE_SSO_SECRET { 'shared_secret' }
sub DISCOURSE_SERVER { 'https://community.metabrainz.org' }

Payload Generation:

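use Digest::SHA qw(hmac_sha256_hex);
use MIME::Base64 qw(encode_base64);
use URI::Escape qw(uri_escape uri_escape_utf8);

# URL-encode each value so characters such as "&" or "+" cannot corrupt the query string
my $query = join '&', map { "$_->[0]=" . uri_escape_utf8($_->[1]) } (
    [ nonce       => $nonce ],
    [ email       => $email ],
    [ external_id => $user_id ],
    [ username    => $username ],
    [ name        => $name ],
);

# Empty second argument: no line breaks inside the Base64 output
my $payload = encode_base64($query, '');

# The signature covers the Base64 text, not the raw query string
my $signature = hmac_sha256_hex($payload, $sso_secret);

my $redirect_url = "$discourse_server/session/sso_login?" .
                   "sso=" . uri_escape($payload) .
                   "&sig=$signature";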

User Data Synced:

  • Email address
  • Username
  • Display name
  • User ID (external_id)
  • Avatar URL (optional)
  • Admin status (optional)
  • Moderator status (optional)
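
Before generating a payload, the provider side also has to check the request that Discourse sent in step 2: the incoming `sso` parameter is signed with the same shared secret, and the nonce inside it must be echoed back. A minimal sketch under those assumptions follows. The helper names are hypothetical, and `$sso` is expected to be the already URL-decoded parameter value.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);
use MIME::Base64 qw(decode_base64 encode_base64);

# Compare two strings without stopping at the first differing character.
sub constant_time_eq {
    my ($left, $right) = @_;
    return 0 unless length($left) == length($right);
    my $diff = 0;
    $diff |= ord(substr($left, $_, 1)) ^ ord(substr($right, $_, 1))
        for 0 .. length($left) - 1;
    return $diff == 0;
}

# Hypothetical helper: verify the incoming signature and return the nonce,
# or undef if the signature does not match.
sub extract_nonce {
    my ($sso, $sig, $secret) = @_;
    my $expected = hmac_sha256_hex($sso, $secret);
    return unless constant_time_eq($expected, lc $sig);

    my $query = decode_base64($sso);
    my ($nonce) = $query =~ /(?:^|&)nonce=([^&]+)/;
    return $nonce;
}

# Simulate what Discourse would send, using a made-up nonce.
my $sso_secret   = 'shared_secret';
my $incoming_sso = encode_base64('nonce=abc123&return_sso_url=https%3A%2F%2Fcommunity.metabrainz.org%2Fsession%2Fsso_login', '');
my $incoming_sig = hmac_sha256_hex($incoming_sso, $sso_secret);

print "nonce: ", extract_nonce($incoming_sso, $incoming_sig, $sso_secret), "\n";
```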

MetaBrainz OAuth

Overview

Service: Centralized OAuth provider for MetaBrainz services
Protocol: OAuth 2.0 with token introspection

Token Introspection

Endpoint: https://musicbrainz.org/oauth2/introspect

Method: POST

Request:

my $response = $ua->post(
    'https://musicbrainz.org/oauth2/introspect',
    {
        token => $access_token,
        client_id => $client_id,
        client_secret => $client_secret,
    }
);

my $data = decode_json($response->content);

Response:

{
  "active": true,
  "scope": "profile email tag rating collection",
  "client_id": "client_id",
  "username": "username",
  "token_type": "Bearer",
  "exp": 1609459200,
  "iat": 1609372800,
  "sub": "user_id"
}

Usage: Other MetaBrainz services (ListenBrainz, BookBrainz, etc.) validate tokens via introspection
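
A consuming service would typically check three things in the introspection response: that the token is active, that it has not expired, and that it carries the scope the operation needs. The following is a sketch of that check using the response fields shown above; the function name `token_grants` is hypothetical.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: true if the introspected token is usable for $required_scope.
sub token_grants {
    my ($data, $required_scope, $now) = @_;
    $now //= time;

    return 0 unless $data->{active};
    return 0 if defined $data->{exp} && $data->{exp} <= $now;

    # "scope" is a space-separated list.
    my %granted = map { $_ => 1 } split ' ', ($data->{scope} // '');
    return $granted{$required_scope} ? 1 : 0;
}

my $introspection = decode_json(<<'JSON');
{"active": true, "scope": "profile email tag rating collection",
 "client_id": "client_id", "username": "username", "token_type": "Bearer",
 "exp": 1609459200, "iat": 1609372800, "sub": "user_id"}
JSON

# Pass a fixed "now" so the example is reproducible.
print token_grants($introspection, 'rating', 1609400000) ? "allowed\n" : "denied\n";
```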

Services Using MetaBrainz OAuth

  • ListenBrainz (listening history)
  • BookBrainz (book metadata)
  • CritiqueBrainz (music reviews)
  • AcousticBrainz (audio analysis)
  • Picard (music tagger)

Replication System

Overview

Purpose: Synchronize database changes from master to mirrors
Protocol: dbmirror2 packet system

Replication Modes

RT_MASTER:

  • Generates replication packets
  • Writes to dbmirror_pending and dbmirror_pendingdata tables
  • Exports packets for mirrors

RT_MIRROR:

  • Consumes replication packets
  • Applies changes from master
  • Read-only (no edits)

RT_STANDALONE:

  • No replication
  • Fully independent database

Configuration:

# DBDefs.pm
sub REPLICATION_TYPE { RT_MASTER }  # or RT_MIRROR or RT_STANDALONE
sub REPLICATION_ACCESS_TOKEN { 'secret_token' }

Packet Structure

Tables:

  • dbmirror_pending - Pending transactions
  • dbmirror_pendingdata - Data changes (INSERT/UPDATE/DELETE)

Packet Format:

SeqId: 12345
TransactionId: 67890
Operation: i  # i=INSERT, u=UPDATE, d=DELETE
TableName: artist
Data: {"id":123,"gid":"...","name":"..."}

Replication Flow

Master Side:

  1. Edit applied to database
  2. Triggers capture changes to dbmirror_pending
  3. Export script generates replication packets
  4. Packets uploaded to FTP server

Mirror Side:

  1. Download replication packets from FTP
  2. Apply packets in sequence order
  3. Update replication state
  4. Verify data integrity
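
To illustrate what "apply packets in sequence order" implies, the sketch below sorts change records by `SeqId` and groups them by `TransactionId`, so that each transaction's changes can be applied together. This is a simplification built only on the fields shown in the packet format above; the function name `group_in_apply_order` is hypothetical and this is not the actual loader logic.

```perl
use strict;
use warnings;

# Hypothetical helper: order records by SeqId, then group them by
# transaction, keeping transactions in order of first appearance.
sub group_in_apply_order {
    my (@records) = @_;
    my (%by_transaction, @transaction_order);

    for my $record (sort { $a->{SeqId} <=> $b->{SeqId} } @records) {
        my $xid = $record->{TransactionId};
        push @transaction_order, $xid unless exists $by_transaction{$xid};
        push @{ $by_transaction{$xid} }, $record;
    }
    return map { $by_transaction{$_} } @transaction_order;
}

# Made-up records, deliberately out of order.
my @change_records = (
    { SeqId => 12347, TransactionId => 67891, Operation => 'u', TableName => 'artist' },
    { SeqId => 12345, TransactionId => 67890, Operation => 'i', TableName => 'artist' },
    { SeqId => 12346, TransactionId => 67890, Operation => 'd', TableName => 'release' },
);

for my $transaction (group_in_apply_order(@change_records)) {
    print join(' ', map { "$_->{SeqId}:$_->{Operation}:$_->{TableName}" } @$transaction), "\n";
}
```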

Packet Export:

# On master
./admin/replication/ExportReplicationPackets

# Generates packets in replication/ directory
# Uploads to FTP server

Packet Import:

# On mirror
./admin/replication/LoadReplicationChanges

# Downloads packets from FTP
# Applies changes to database

Replication Lag

Monitoring: Mirrors track replication lag (time behind master)

Typical Lag: Minutes to hours depending on packet size and network

Status Endpoint: /replication-status shows current replication state

Redis Integration

Architecture

Connection: Single Redis instance, 16 databases (0-15)

Configuration:

# DBDefs.pm
sub REDIS_SERVER { 'localhost:6379' }
sub REDIS_NAMESPACE { 'MB' }

Use Cases

Session Management (DB 1):

  • Store user sessions
  • 10 hour absolute expiry
  • 3 hour idle timeout
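
The two session limits interact: a session ends at whichever comes first, 10 hours after creation or 3 hours after the last activity. A small sketch of that rule, with hypothetical names and timestamps in epoch seconds:

```perl
use strict;
use warnings;

use constant {
    ABSOLUTE_EXPIRY => 10 * 60 * 60,   # 10 hours after the session was created
    IDLE_TIMEOUT    => 3 * 60 * 60,    # 3 hours after the last request
};

# Hypothetical helper: true if the session should no longer be honored.
sub session_expired {
    my ($created_at, $last_seen_at, $now) = @_;
    return 1 if $now - $created_at   >= ABSOLUTE_EXPIRY;
    return 1 if $now - $last_seen_at >= IDLE_TIMEOUT;
    return 0;
}

# An active user is still logged out once the absolute limit is reached.
print session_expired(0, 9.5 * 3600, 10 * 3600) ? "expired\n" : "valid\n";
```

In a Redis-backed store the idle timeout would usually be enforced by refreshing the key's TTL on each request, capped so that it never extends past the absolute limit.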

Entity Cache (DB 0):

  • Cache entity lookups by MBID
  • 1 hour TTL
  • Invalidate on edit

Search Cache (DB 2):

  • Cache search results
  • 15 minute TTL

Statistics Cache (DB 3):

  • Cache homepage statistics
  • 1 hour TTL

Rate Limiting (DB 4):

  • Track API request counts
  • 1 second sliding window

Pub/Sub (DB 5):

  • Real-time notifications
  • Edit submission events
  • Cache invalidation events

Connection Pooling

Library: Redis.pm with connection pooling

Pool Size: 10 connections per worker

Reconnection: Automatic reconnection on connection loss

HTTP Client

LWP::UserAgent

Purpose: HTTP client for external service communication

Configuration:

use LWP::UserAgent;

my $ua = LWP::UserAgent->new(
    agent => 'MusicBrainz/1.0 (https://musicbrainz.org)',
    timeout => 30,
    max_redirect => 5,
);

User-Agent: Always identifies as MusicBrainz with contact URL

Timeout: 30 seconds default

Redirects: Follow up to 5 redirects

SSL Verification: Enabled by default

Rate Limiting

External Services: Respect rate limits via delays

Wikipedia API: 1 request per second (recommended)

Wikidata API: 1 request per second (recommended)

Implementation:

use Time::HiRes qw(sleep time);  # sub-second versions of both functions

my $last_request_time = 0;

sub rate_limited_request {
    my ($url) = @_;

    # Wait until at least one second has passed since the previous request
    my $elapsed = time() - $last_request_time;
    if ($elapsed < 1) {
        sleep(1 - $elapsed);
    }

    my $response = $ua->get($url);
    $last_request_time = time();

    return $response;
}

Error Handling

Retry Logic: Exponential backoff for transient errors

Timeouts: Fail gracefully on timeout

Logging: Log all external service errors to Sentry

Example:

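my $response;
my $retries = 3;

for my $attempt (1..$retries) {
    # LWP::UserAgent does not die on HTTP or network errors; it reports
    # them through the response object, so check is_success directly.
    $response = $ua->get($url);
    last if $response->is_success;

    warn "Request failed (attempt $attempt): " . $response->status_line;
    sleep(2 ** $attempt) if $attempt < $retries;  # Exponential backoff: 2s, then 4s
}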