# MusicBrainz Server Integrations

## Cover Art Archive

### Overview

**Service:** Cover Art Archive (coverartarchive.org)

**Storage:** Internet Archive (uploads go through the Archive's S3-compatible API)

**Purpose:** Store and serve album cover artwork

### Upload Process

**Method:** Signed POST to the Internet Archive's S3-compatible endpoint (`s3.us.archive.org`)

**Authentication:** HMAC-SHA1 signed policy

**Configuration:**

```perl
# DBDefs.pm
sub COVER_ART_ARCHIVE_ACCESS_KEY { 'access_key' }
sub COVER_ART_ARCHIVE_SECRET_KEY { 'secret_key' }
# Builds the upload host for a bucket, e.g. //mbid-<release MBID>.s3.us.archive.org/
sub COVER_ART_ARCHIVE_UPLOAD_PREFIXER { shift; sprintf('//%s.s3.us.archive.org/', shift) }
sub COVER_ART_ARCHIVE_DOWNLOAD_PREFIX { 'https://coverartarchive.org' }
```

**Upload Flow:**

1. User uploads image via MusicBrainz interface
2. Server generates S3 policy document
3. Policy signed with HMAC-SHA1 using secret key
4. Browser POSTs image directly to the Internet Archive's S3-compatible endpoint with signed policy
5. Internet Archive stores image in the release's `mbid-{release_mbid}` item
6. Image becomes available at coverartarchive.org

**Policy Document:**

```json
{
  "expiration": "2024-12-31T23:59:59Z",
  "conditions": [
    {"bucket": "mbid-{release_mbid}"},
    {"acl": "public-read"},
    ["starts-with", "$key", "mbid-{release_mbid}/"],
    ["content-length-range", 0, 10485760]
  ]
}
```

**Signature:**

```perl
use Digest::SHA qw(hmac_sha1_base64);
use MIME::Base64 qw(encode_base64);

# Empty end-of-line argument: by default encode_base64 inserts a newline every 76 characters
my $policy_b64 = encode_base64($policy_json, '');
my $signature = hmac_sha1_base64($policy_b64, $secret_key);
$signature .= '=' while length($signature) % 4; # Digest::SHA omits Base64 padding; pad to a multiple of 4
```

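
The policy, encoding and signing steps fit together roughly as follows on the server side. This sketch uses only core Perl modules. Several details are assumptions rather than anything taken from MusicBrainz: the helper name `build_upload_form`, the one-hour policy lifetime, and the returned field names (`key`, `acl`, `AWSAccessKeyId`, `policy`, `signature`, which are the usual S3 POST-form fields).

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha1_base64);
use JSON::PP ();
use MIME::Base64 qw(encode_base64);
use POSIX qw(strftime);

# Hypothetical helper: builds the form fields the browser POSTs together
# with the image file (field names follow the S3 POST-policy convention).
sub build_upload_form {
    my (%args) = @_;
    my $bucket = "mbid-$args{release_mbid}";
    my $key    = "$bucket/$args{filename}";

    my $policy = {
        # Assumed lifetime: one hour from $args{now} (Unix seconds)
        expiration => strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($args{now} + 3600)),
        conditions => [
            { bucket => $bucket },
            { acl    => 'public-read' },
            [ 'starts-with', '$key', "$bucket/" ],       # '$key' is literal text here
            [ 'content-length-range', 0, 10_485_760 ],  # 10 MiB upper bound
        ],
    };

    # canonical() sorts object keys so the same input always yields the same policy
    my $policy_b64 = encode_base64(JSON::PP->new->canonical->encode($policy), '');
    my $signature  = hmac_sha1_base64($policy_b64, $args{secret_key});
    $signature .= '=' while length($signature) % 4;    # restore Base64 padding

    return {
        key            => $key,
        acl            => 'public-read',
        AWSAccessKeyId => $args{access_key},
        policy         => $policy_b64,
        signature      => $signature,
    };
}

my $form = build_upload_form(
    release_mbid => '76df3287-6cda-33eb-8e9a-044b5e15ffdd',
    filename     => 'front.jpg',
    access_key   => 'access_key',
    secret_key   => 'secret_key',
    now          => time(),
);
print "$form->{key}\n";   # mbid-76df3287-6cda-33eb-8e9a-044b5e15ffdd/front.jpg
```

Canonical key ordering is only a convenience for reproducible output. What the endpoint checks is that the signature matches the exact Base64 string sent in the `policy` field.
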

### Retrieval

**URL Pattern:** `https://coverartarchive.org/release/{mbid}/front`

**Image Types:**

- `front` - Front cover
- `back` - Back cover
- `{id}` - Specific image by ID

**Sizes:**

- Original (full resolution)
- `250` - 250px thumbnail
- `500` - 500px thumbnail
- `1200` - 1200px large

**Example:**

```
https://coverartarchive.org/release/76df3287-6cda-33eb-8e9a-044b5e15ffdd/front-250.jpg
```

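
A small helper can wrap the retrieval rules above and reject malformed input before any request is made. This is an illustrative sketch. `caa_image_url` is a hypothetical name, and the `-{size}.jpg` suffix simply mirrors the example URL.

```perl
use strict;
use warnings;

my %VALID_SIZE = map { $_ => 1 } qw(250 500 1200);
my $MBID_RE = qr/\A[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\z/;

# Hypothetical helper: $type is 'front', 'back' or a numeric image ID;
# $size is optional (omit it for the original, full-resolution file).
sub caa_image_url {
    my ($mbid, $type, $size) = @_;
    die "invalid MBID: $mbid\n"       unless $mbid =~ $MBID_RE;
    die "invalid image type: $type\n" unless $type =~ /\A(?:front|back|\d+)\z/;

    my $url = "https://coverartarchive.org/release/$mbid/$type";
    return $url unless defined $size;

    die "unsupported size: $size\n" unless $VALID_SIZE{$size};
    return "$url-$size.jpg";   # thumbnail form, as in the example URL
}

print caa_image_url('76df3287-6cda-33eb-8e9a-044b5e15ffdd', 'front', 250), "\n";
```
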

## Wikipedia/Wikidata/Wikimedia Commons

### MediaWiki API Integration

**Purpose:** Fetch article extracts, images, and structured data

**Endpoints:**

- Wikipedia: `https://{lang}.wikipedia.org/w/api.php`
- Wikidata: `https://www.wikidata.org/w/api.php`
- Commons: `https://commons.wikimedia.org/w/api.php`

### Wikipedia Extracts

**API Action:** `query` with `prop=extracts`

**Request:**

```perl
use JSON qw(decode_json);
use URI::Escape qw(uri_escape_utf8);

# uri_escape_utf8 handles non-ASCII titles; plain uri_escape dies on wide characters
my $url = "https://en.wikipedia.org/w/api.php?" .
    "action=query&" .
    "prop=extracts&" .
    "exintro=1&" .      # lead section only
    "explaintext=1&" .  # plain text instead of HTML
    "titles=" . uri_escape_utf8($artist_name) .
    "&format=json";

my $response = $ua->get($url);
my $data = decode_json($response->content);
```

**Caching:** 3 days for extracts

**Display:** Artist/release pages show Wikipedia extract in sidebar

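
In the JSON response, the extract sits under `query.pages`, keyed by page ID. With the default `formatversion=1`, a title that does not exist comes back under a negative ID with a `missing` key. The sketch below reads the extract and applies the 3-day cache window. `%cache`, `extract_from_response` and `cached_extract` are hypothetical stand-ins. A real deployment would keep the cache in a shared store rather than a per-process hash.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

use constant EXTRACT_TTL => 3 * 24 * 60 * 60;   # 3 days, matching the caching note

my %cache;   # hypothetical in-memory cache: title => { text, fetched_at }

# Returns the extract from a decoded prop=extracts response, or undef.
sub extract_from_response {
    my ($data) = @_;
    my $pages = $data->{query}{pages} or return;
    for my $page (values %$pages) {
        next if exists $page->{missing};          # no article with that title
        return $page->{extract} if defined $page->{extract};
    }
    return;
}

# $fetch is a code ref that performs the HTTP request and returns the raw JSON body.
sub cached_extract {
    my ($title, $fetch, $now) = @_;
    my $entry = $cache{$title};
    return $entry->{text}
        if $entry && $now - $entry->{fetched_at} < EXTRACT_TTL;

    my $text = extract_from_response(decode_json($fetch->($title)));
    $cache{$title} = { text => $text, fetched_at => $now };
    return $text;
}
```

Passing the fetcher in as a code ref keeps the cache logic testable without network access.
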

### Language Links

**API Action:** `query` with `prop=langlinks`

**Purpose:** Find Wikipedia articles in different languages

**Request:**

```perl
use URI::Escape qw(uri_escape_utf8);

my $url = "https://en.wikipedia.org/w/api.php?" .
    "action=query&" .
    "prop=langlinks&" .
    "titles=" . uri_escape_utf8($title) .
    "&lllimit=500" .   # maximum allowed for non-bot clients
    "&format=json";
```

**Caching:** 7 days for language links

**Usage:** Display Wikipedia links in user's preferred language

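
To turn a `langlinks` response into a link in the user's preferred language, collect the language/title pairs and walk the preference list. This is a minimal sketch. The helper name and its arguments are hypothetical, and the URL encoding is a simplified version of what MediaWiki itself produces.

```perl
use strict;
use warnings;

# Hypothetical helper: given a decoded prop=langlinks response, the title it
# was requested for, and the user's languages in order of preference, return
# the URL of the best matching Wikipedia article (or undef).
sub preferred_wikipedia_url {
    my ($data, $source_lang, $source_title, @preferred) = @_;

    my %title_for = ($source_lang => $source_title);
    for my $page (values %{ $data->{query}{pages} || {} }) {
        # With the default formatversion=1 the linked title is under the "*" key
        $title_for{ $_->{lang} } = $_->{'*'} for @{ $page->{langlinks} || [] };
    }

    for my $lang (@preferred, $source_lang) {        # fall back to the source wiki
        next unless defined $title_for{$lang};
        (my $path = $title_for{$lang}) =~ tr/ /_/;   # MediaWiki uses underscores in URLs
        utf8::encode($path);                         # percent-encode UTF-8 bytes
        $path =~ s{([^A-Za-z0-9_.~()/:,\-])}{sprintf('%%%02X', ord $1)}ge;
        return "https://$lang.wikipedia.org/wiki/$path";
    }
    return;
}
```
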

### Wikidata Integration

**Purpose:** Fetch structured data (birth dates, locations, etc.)

**API Action:** `wbgetentities`

**Request:**

```perl
# $wikidata_id is the full item ID, including the "Q" prefix (e.g. 'Q1299')
my $url = "https://www.wikidata.org/w/api.php?" .
    "action=wbgetentities&" .
    "ids=$wikidata_id&" .
    "format=json";
```

**Data Extracted:**

- Birth/death dates
- Birth/death places
- Occupations
- Genres
- Record labels
- Official websites

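
Each of these fields corresponds to a Wikidata property:

- P569/P570: date of birth/death
- P19/P20: place of birth/death
- P106: occupation
- P136: genre
- P264: record label
- P856: official website

Claims live under `entities.{id}.claims.{property}`, and each value is in `mainsnak.datavalue`. The sketch below flattens them. The output field names and helper functions are hypothetical, and the sketch handles only the three value types these properties use.

```perl
use strict;
use warnings;

# Wikidata property IDs for the fields listed above
my %PROPERTY = (
    P569 => 'birth_date',   P570 => 'death_date',
    P19  => 'birth_place',  P20  => 'death_place',
    P106 => 'occupation',   P136 => 'genre',
    P264 => 'record_label', P856 => 'official_website',
);

# Turns one datavalue into a plain scalar: dates become YYYY-MM-DD
# (truncated to the stated precision), items become their Q-ID.
sub simplify_value {
    my ($dv) = @_;
    my $type = $dv->{type} // '';
    my $v    = $dv->{value};
    if ($type eq 'time') {
        my ($y, $m, $d) = $v->{time} =~ /\A[+-](\d+)-(\d\d)-(\d\d)T/ or return;
        return $v->{precision} >= 11 ? "$y-$m-$d"    # day precision
             : $v->{precision} == 10 ? "$y-$m"       # month precision
             :                         $y;           # year or coarser
    }
    return $v->{id} if $type eq 'wikibase-entityid';
    return $v       if $type eq 'string';            # e.g. official website URLs
    return;
}

# Hypothetical helper: extract the mapped fields from a wbgetentities response.
sub extract_wikidata_fields {
    my ($data, $qid) = @_;
    my $claims = $data->{entities}{$qid}{claims} || {};
    my %out;
    for my $pid (sort keys %PROPERTY) {
        for my $claim (@{ $claims->{$pid} || [] }) {
            next if ($claim->{rank} // '') eq 'deprecated';
            my $dv = $claim->{mainsnak}{datavalue} or next;   # "no value" / "unknown value"
            my $value = simplify_value($dv);
            push @{ $out{ $PROPERTY{$pid} } }, $value if defined $value;
        }
    }
    return \%out;
}
```
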

### Wikimedia Commons Images

**Purpose:** Fetch artist/band photos

**API Action:** `query` with `prop=imageinfo`

**Request:**

```perl
use URI::Escape qw(uri_escape_utf8);

my $url = "https://commons.wikimedia.org/w/api.php?" .
    "action=query&" .
    "prop=imageinfo&" .
    "iiprop=url|size|mime&" .
    "titles=File:" . uri_escape_utf8($filename) .
    "&format=json";
```

**Display:** Artist pages show Commons images in sidebar

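
The image URL, dimensions and MIME type come back in `query.pages.{id}.imageinfo[0]`. The sketch below picks out a displayable image using a hypothetical helper, `commons_image_from_response`. If the request also passes `iiurlwidth`, the API adds scaled `thumburl`, `thumbwidth` and `thumbheight` fields, and the sketch prefers those for sidebar display.

```perl
use strict;
use warnings;

# Hypothetical helper: returns { url, width, height, mime } for the first
# usable image in a decoded prop=imageinfo response, or undef.
sub commons_image_from_response {
    my ($data) = @_;
    for my $page (values %{ $data->{query}{pages} || {} }) {
        next if exists $page->{missing};
        my $info = $page->{imageinfo} && $page->{imageinfo}[0] or next;
        next unless ($info->{mime} // '') =~ m{\Aimage/};   # skip audio, video, PDF
        return {
            url    => $info->{thumburl}    // $info->{url},
            width  => $info->{thumbwidth}  // $info->{width},
            height => $info->{thumbheight} // $info->{height},
            mime   => $info->{mime},
        };
    }
    return;
}
```
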

## CritiqueBrainz

### Overview

**Service:** CritiqueBrainz (critiquebrainz.org)

**Purpose:** User-generated music reviews

### Integration

**Method:** URL linking

**Pattern:** `https://critiquebrainz.org/release-group/{mbid}`

**Display:** Release group pages show link to CritiqueBrainz reviews

**Embedding:** Review count and average rating displayed on release group pages

**API:** CritiqueBrainz API used to fetch review statistics

**Request:**

```perl
use JSON qw(decode_json);

# Reviews are attached to release groups, so $mbid is a release group MBID
my $url = "https://critiquebrainz.org/ws/1/review/?entity_id=$mbid&limit=1";
my $response = $ua->get($url);
die 'CritiqueBrainz request failed: ' . $response->status_line unless $response->is_success;
my $data = decode_json($response->content);

my $review_count = $data->{count};
my $avg_rating = $data->{average_rating}{rating};
```

## Event Art Archive

### Overview

**Service:** Event Art Archive

**Purpose:** Store event posters and promotional materials

**Architecture:** Similar to the Cover Art Archive (signed uploads, Internet Archive storage)

**URL Pattern:** `https://eventartarchive.org/event/{mbid}`

## Discourse SSO

### Overview

**Service:** MusicBrainz Community Forum (community.metabrainz.org)

**Protocol:** Discourse SSO (Single Sign-On, now called DiscourseConnect)

### Authentication Flow

**Method:** HMAC-SHA256 signed payload

**Flow:**

1. User clicks "Log in" on Discourse
2. Discourse redirects to MusicBrainz with a Base64-encoded, signed payload containing a nonce
3. MusicBrainz verifies the signature and authenticates the user
4. MusicBrainz generates SSO payload echoing the nonce
5. Payload signed with HMAC-SHA256
6. User redirected back to Discourse with signed payload
7. Discourse verifies signature and nonce, then logs in user

**Configuration:**

```perl
# DBDefs.pm
sub DISCOURSE_SSO_SECRET { 'shared_secret' }
sub DISCOURSE_SERVER { 'https://community.metabrainz.org' }
```

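**Payload Generation:**

```perl
use Digest::SHA qw(hmac_sha256_hex);
use MIME::Base64 qw(encode_base64);
use URI::Escape qw(uri_escape uri_escape_utf8);

# The payload is itself a query string, so each value must be URL-encoded
my $query = join '&', map { $_->[0] . '=' . uri_escape_utf8($_->[1]) } (
    [ nonce       => $nonce ],     # echoed back from Discourse's request
    [ email       => $email ],
    [ external_id => $user_id ],
    [ username    => $username ],
    [ name        => $name ],
);

# Empty end-of-line argument keeps the Base64 string on a single line
my $payload = encode_base64($query, '');

# The signature is the hex HMAC-SHA256 of the Base64 payload
my $signature = hmac_sha256_hex($payload, $sso_secret);

my $redirect_url = "$discourse_server/session/sso_login?" .
    "sso=" . uri_escape($payload) .
    "&sig=$signature";
```
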
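
Steps 2 and 3 of the flow run in the opposite direction. Before trusting the nonce, MusicBrainz has to check the `sig` Discourse sent and decode the `sso` payload. The following is a hedged sketch using only core modules. The helper names are hypothetical, and `return_sso_url` is the DiscourseConnect field carrying the URL to send the user back to. That field name comes from Discourse's protocol, not from this document.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);
use MIME::Base64 qw(decode_base64);

# Compares without stopping at the first mismatch, so response timing does
# not reveal how much of a forged signature was correct.
sub constant_time_eq {
    my ($x, $y) = @_;
    return 0 unless length($x) == length($y);
    my $diff = 0;
    $diff |= ord(substr($x, $_, 1)) ^ ord(substr($y, $_, 1)) for 0 .. length($x) - 1;
    return $diff == 0;
}

# Minimal decoder for the query string inside the Base64 payload.
sub parse_query {
    my ($qs) = @_;
    my %params;
    for my $pair (split /&/, $qs) {
        my ($k, $v) = split /=/, $pair, 2;
        for ($k, $v) {
            next unless defined;
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        }
        $params{$k} = $v;
    }
    return \%params;
}

# Hypothetical handler: $sso and $sig are the (already URL-decoded) query
# parameters Discourse sent. Returns (nonce, return URL) or dies.
sub verify_discourse_request {
    my ($sso, $sig, $secret) = @_;
    die "bad signature\n"
        unless constant_time_eq(lc $sig, hmac_sha256_hex($sso, $secret));
    my $params = parse_query(decode_base64($sso));
    die "missing nonce\n" unless defined $params->{nonce};
    return ($params->{nonce}, $params->{return_sso_url});
}
```
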

**User Data Synced:**

- Email address
- Username
- Display name
- User ID (external_id)
- Avatar URL (optional)
- Admin status (optional)
- Moderator status (optional)

## MetaBrainz OAuth

### Overview

**Service:** Centralized OAuth provider for MetaBrainz services

**Protocol:** OAuth 2.0 with token introspection (RFC 7662)

### Token Introspection

**Endpoint:** `https://musicbrainz.org/oauth2/introspect`

**Method:** POST

**Request:**

```perl
use JSON qw(decode_json);

# Passing a hash ref makes LWP send an application/x-www-form-urlencoded body
my $response = $ua->post(
    'https://musicbrainz.org/oauth2/introspect',
    {
        token         => $access_token,
        client_id     => $client_id,
        client_secret => $client_secret,
    }
);
die 'Introspection failed: ' . $response->status_line unless $response->is_success;

my $data = decode_json($response->content);
```

**Response:**

```json
{
  "active": true,
  "scope": "profile email tag rating collection",
  "client_id": "client_id",
  "username": "username",
  "token_type": "Bearer",
  "exp": 1609459200,
  "iat": 1609372800,
  "sub": "user_id"
}
```

**Usage:** Other MetaBrainz services (ListenBrainz, BookBrainz, etc.) validate tokens via introspection

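
A service receiving a token still has to interpret the introspection result. The token must be `active`, must not be past `exp`, and must carry every scope the request needs. The sketch below is minimal, and `token_allows` is a hypothetical helper. `decode_json` maps JSON `true` to a boolean object that tests true in Perl, so the `active` check works on the decoded response directly.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a decoded introspection response
# authorises a request that needs all of @required_scopes.
sub token_allows {
    my ($info, $now, @required_scopes) = @_;
    return 0 unless $info->{active};                           # revoked or unknown token
    return 0 if defined $info->{exp} && $info->{exp} <= $now;  # expired (Unix seconds)

    # "scope" is one space-separated string
    my %granted = map { $_ => 1 } split ' ', ($info->{scope} // '');
    for my $scope (@required_scopes) {
        return 0 unless $granted{$scope};
    }
    return 1;
}
```
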

### Services Using MetaBrainz OAuth

- ListenBrainz (listening history)
- BookBrainz (book metadata)
- CritiqueBrainz (music reviews)
- AcousticBrainz (audio analysis)
- Picard (music tagger)

## Replication System

### Overview

**Purpose:** Synchronize database changes from master to mirrors

**Protocol:** dbmirror2 packet system

### Replication Modes

**RT_MASTER:**

- Generates replication packets
- Writes to `dbmirror_pending` and `dbmirror_pendingdata` tables
- Exports packets for mirrors

**RT_MIRROR:**

- Consumes replication packets
- Applies changes from master
- Read-only (no edits)

**RT_STANDALONE:**

- No replication
- Fully independent database

**Configuration:**

```perl
# DBDefs.pm
sub REPLICATION_TYPE { RT_MASTER } # or RT_MIRROR or RT_STANDALONE
sub REPLICATION_ACCESS_TOKEN { 'secret_token' }
```

### Packet Structure

**Tables:**

- `dbmirror_pending` - Pending transactions
- `dbmirror_pendingdata` - Data changes (INSERT/UPDATE/DELETE)

**Packet Format:**

```
SeqId: 12345
TransactionId: 67890
Operation: i # i=INSERT, u=UPDATE, d=DELETE
TableName: artist
Data: {"id":123,"gid":"...","name":"..."}
```

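
On a mirror, each change record is an insert, update or delete of one row. The sketch below treats the format above literally as `Key: value` lines, without the inline comment, and applies records to an in-memory hash. The on-disk packet layout and the real apply logic in MusicBrainz will differ. `parse_change`, `apply_change` and the `table => { id => row }` store are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Parses one change record in the simplified "Key: value" layout shown above.
sub parse_change {
    my ($text) = @_;
    my %f = $text =~ /^(\w+):[ \t]*(.*?)[ \t]*$/mg;
    return {
        seq   => $f{SeqId},
        xid   => $f{TransactionId},
        op    => $f{Operation},
        table => $f{TableName},
        row   => decode_json($f{Data}),
    };
}

# Applies one change to %$db (table => { id => row }); dies on inconsistencies
# so a broken packet stops replication instead of silently diverging.
sub apply_change {
    my ($db, $c) = @_;
    my $id = $c->{row}{id};
    if ($c->{op} eq 'i') {
        die "duplicate key $c->{table}/$id\n" if exists $db->{ $c->{table} }{$id};
        $db->{ $c->{table} }{$id} = $c->{row};
    }
    elsif ($c->{op} eq 'u') {
        my $existing = $db->{ $c->{table} }{$id} or die "missing row $c->{table}/$id\n";
        %$existing = (%$existing, %{ $c->{row} });   # merge changed columns
    }
    elsif ($c->{op} eq 'd') {
        delete $db->{ $c->{table} }{$id} or die "missing row $c->{table}/$id\n";
    }
    else {
        die "unknown operation '$c->{op}'\n";
    }
}
```
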

### Replication Flow

**Master Side:**

1. Edit applied to database
2. Triggers capture changes to `dbmirror_pending`
3. Export script generates replication packets
4. Packets uploaded to FTP server

**Mirror Side:**

1. Download replication packets from FTP
2. Apply packets in sequence order
3. Update replication state
4. Verify data integrity

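
Applying packets "in sequence order" means a mirror moves forward one packet at a time and must stop at a gap rather than skip it. The sketch below shows that rule. `catch_up`, the `current_seq` field and the `$apply` callback are hypothetical names, not the interface of `LoadReplicationChanges`.

```perl
use strict;
use warnings;

# Hypothetical mirror-side loop: $state->{current_seq} is the last packet
# applied; @$available holds packet sequence numbers found on the server;
# $apply applies one packet (ideally inside a single database transaction).
sub catch_up {
    my ($state, $available, $apply) = @_;
    my %have = map { $_ => 1 } @$available;
    my $applied = 0;

    while (1) {
        my $next = $state->{current_seq} + 1;
        last unless $have{$next};        # stop at the first gap or the newest packet
        $apply->($next);                 # dies on failure, so state is not advanced
        $state->{current_seq} = $next;   # record progress only after success
        $applied++;
    }
    return $applied;
}
```
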

**Packet Export:**

```bash
# On master
./admin/replication/ExportReplicationPackets

# Generates packets in replication/ directory
# Uploads to FTP server
```

**Packet Import:**

```bash
# On mirror
./admin/replication/LoadReplicationChanges

# Downloads packets from FTP
# Applies changes to database
```

### Replication Lag

**Monitoring:** Mirrors track replication lag (time behind master)

**Typical Lag:** Minutes to hours depending on packet size and network

**Status Endpoint:** `/replication-status` shows current replication state

## Redis Integration

### Architecture

**Connection:** Single Redis instance, 16 databases (0-15)

**Configuration:**

```perl
# DBDefs.pm
sub REDIS_SERVER { 'localhost:6379' }
sub REDIS_NAMESPACE { 'MB' }
```

### Use Cases

**Session Management (DB 1):**

- Store user sessions
- 10 hour absolute expiry
- 3 hour idle timeout

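
The two session limits interact. A session expires at whichever comes first: 10 hours after login, or 3 hours after the last request. The sketch below computes the remaining lifetime, which could then serve as the Redis key expiry (for example via `SET key value EX seconds`). The field names `created_at` and `last_seen_at` are hypothetical.

```perl
use strict;
use warnings;

use constant {
    ABSOLUTE_LIFETIME => 10 * 60 * 60,   # 10 hours from login
    IDLE_TIMEOUT      =>  3 * 60 * 60,   # 3 hours since the last request
};

# Seconds the session has left; whichever limit is reached first wins.
sub session_ttl {
    my ($session, $now) = @_;
    my $absolute_left = $session->{created_at}   + ABSOLUTE_LIFETIME - $now;
    my $idle_left     = $session->{last_seen_at} + IDLE_TIMEOUT      - $now;
    my $ttl = $absolute_left < $idle_left ? $absolute_left : $idle_left;
    return $ttl > 0 ? $ttl : 0;
}

# On each request: refuse expired sessions, otherwise refresh last_seen_at
# and return the new TTL to use as the key expiry.
sub touch_session {
    my ($session, $now) = @_;
    return 0 if session_ttl($session, $now) == 0;
    $session->{last_seen_at} = $now;
    return session_ttl($session, $now);
}
```
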
**Entity Cache (DB 0):**

- Cache entity lookups by MBID
- 1 hour TTL
- Invalidate on edit

**Search Cache (DB 2):**

- Cache search results
- 15 minute TTL

**Statistics Cache (DB 3):**

- Cache homepage statistics
- 1 hour TTL

**Rate Limiting (DB 4):**

- Track API request counts
- 1 second sliding window

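
A 1-second sliding window allows at most N requests in any trailing 1-second interval, rather than per fixed clock second. The sketch below implements a sliding-window log in memory. With Redis, the same idea is commonly built on one sorted set per client: `ZREMRANGEBYSCORE` drops old timestamps, `ZCARD` counts, and `ZADD` records. This document does not say which structure MusicBrainz actually uses, and `allow_request` is a hypothetical name.

```perl
use strict;
use warnings;

my %log;   # client key => [ timestamps of accepted requests in the window ]

# Returns 1 if the request is allowed, 0 if the client is over $limit
# requests within the last $window seconds (default 1).
sub allow_request {
    my ($client, $now, $limit, $window) = @_;
    $window //= 1;
    my $times = $log{$client} //= [];
    shift @$times while @$times && $times->[0] <= $now - $window;   # slide the window
    return 0 if @$times >= $limit;   # rejected requests are not recorded
    push @$times, $now;
    return 1;
}
```
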
**Pub/Sub:**

- Real-time notifications
- Edit submission events
- Cache invalidation events
- Not tied to a database index: Redis Pub/Sub channels are shared by all databases on the instance

### Connection Pooling

**Library:** Redis.pm with connection pooling

**Pool Size:** 10 connections per worker

**Reconnection:** Automatic reconnection on connection loss

## HTTP Client

### LWP::UserAgent

**Purpose:** HTTP client for external service communication

**Configuration:**

```perl
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(
    agent        => 'MusicBrainz/1.0 (https://musicbrainz.org)',
    timeout      => 30,
    max_redirect => 5,
);
```

**User-Agent:** Always identifies as MusicBrainz with contact URL

**Timeout:** 30 seconds default

**Redirects:** Follow up to 5 redirects

**SSL Verification:** Enabled by default

### Rate Limiting

**External Services:** Respect rate limits via delays

**Wikipedia API:** 1 request per second (recommended)

**Wikidata API:** 1 request per second (recommended)

**Implementation:**

```perl
use LWP::UserAgent;
# Import time() too: the core time() returns whole seconds, which is far
# too coarse to space requests one second apart
use Time::HiRes qw(sleep time);

my $ua = LWP::UserAgent->new(timeout => 30);
my $last_request_time = 0;

sub rate_limited_request {
    my ($url) = @_;

    # Wait until at least 1 second has passed since the previous request finished
    my $elapsed = time() - $last_request_time;
    if ($elapsed < 1) {
        sleep(1 - $elapsed);
    }

    my $response = $ua->get($url);
    $last_request_time = time();

    return $response;
}
```

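### Error Handling

**Retry Logic:** Exponential backoff for transient errors

**Timeouts:** Fail gracefully on timeout

**Logging:** Log all external service errors to Sentry

**Example:**

```perl
my $response;
my $max_attempts = 3;

for my $attempt (1 .. $max_attempts) {
    # LWP::UserAgent does not die on HTTP errors or timeouts; it returns a
    # response object (a timeout comes back as an internal 500 response),
    # so check the status instead of catching exceptions
    $response = $ua->get($url);
    last if $response->is_success;

    # Retry only transient failures: 5xx responses and 429 Too Many Requests
    last unless $response->is_server_error || $response->code == 429;

    warn "Request failed (attempt $attempt): " . $response->status_line . "\n";
    sleep(2 ** $attempt) if $attempt < $max_attempts;   # exponential backoff: 2s, 4s
}
```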