
# MusicBrainz Server Integrations
## Cover Art Archive
### Overview
**Service:** Cover Art Archive (coverartarchive.org)
**Storage:** Internet Archive, accessed through its S3-compatible API
**Purpose:** Store and serve album cover artwork
### Upload Process
**Method:** Signed POST to the Internet Archive's S3-compatible endpoint
**Authentication:** HMAC-SHA1 signed policy
**Configuration:**
```perl
# DBDefs.pm
sub COVER_ART_ARCHIVE_ACCESS_KEY { 'access_key' }
sub COVER_ART_ARCHIVE_SECRET_KEY { 'secret_key' }
sub COVER_ART_ARCHIVE_UPLOAD_PREFIXER { 'MB' }
sub COVER_ART_ARCHIVE_DOWNLOAD_PREFIX { 'https://coverartarchive.org' }
```
**Upload Flow:**
1. User uploads image via MusicBrainz interface
2. Server generates S3 policy document
3. Policy signed with HMAC-SHA1 using secret key
4. Browser POSTs directly to the Internet Archive's S3-compatible endpoint with the signed policy
5. The Internet Archive stores the image in the release's bucket
6. Image becomes available at coverartarchive.org
**Policy Document:**
```json
{
  "expiration": "2024-12-31T23:59:59Z",
  "conditions": [
    {"bucket": "mbid-{release_mbid}"},
    {"acl": "public-read"},
    ["starts-with", "$key", "mbid-{release_mbid}/"],
    ["content-length-range", 0, 10485760]
  ]
}
```
**Signature:**
```perl
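use Digest::SHA qw(hmac_sha1_base64);
use MIME::Base64 qw(encode_base64);
# Pass '' as the line ending so the encoded policy contains no newlines
my $policy_b64 = encode_base64($policy_json, '');
my $signature = hmac_sha1_base64($policy_b64, $secret_key);
# Digest::SHA omits base64 padding, so pad to a multiple of 4
$signature .= '=' while length($signature) % 4;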
```
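The same signing step can be sketched in Python. This is a minimal illustration under stated assumptions, not the server's code: `sign_policy` and the sample bucket name `mbid-example` are hypothetical, and the policy is assumed to be a plain dictionary shaped like the document above.

```python
import base64
import hashlib
import hmac
import json


def sign_policy(policy, secret_key):
    """Return (policy_b64, signature_b64) for an S3-style POST policy."""
    policy_json = json.dumps(policy, separators=(",", ":"))
    # b64encode never inserts line breaks, so no extra argument is needed.
    policy_b64 = base64.b64encode(policy_json.encode("utf-8")).decode("ascii")
    digest = hmac.new(
        secret_key.encode("utf-8"),
        policy_b64.encode("ascii"),
        hashlib.sha1,
    ).digest()
    # b64encode already pads its output to a multiple of 4 characters.
    signature = base64.b64encode(digest).decode("ascii")
    return policy_b64, signature


# Hypothetical sample policy; the bucket name is a placeholder.
policy = {
    "expiration": "2024-12-31T23:59:59Z",
    "conditions": [
        {"bucket": "mbid-example"},
        {"acl": "public-read"},
        ["content-length-range", 0, 10485760],
    ],
}
policy_b64, signature = sign_policy(policy, "secret_key")
```

The signature covers the base64 text of the policy rather than the raw JSON, so the client has to submit exactly the string that was signed.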
### Retrieval
**URL Pattern:** `https://coverartarchive.org/release/{mbid}/front`
**Image Types:**
- `front` - Front cover
- `back` - Back cover
- `{id}` - Specific image by ID
**Sizes:**
- Original (full resolution)
- `250` - 250px thumbnail
- `500` - 500px thumbnail
- `1200` - 1200px large
**Example:**
```
https://coverartarchive.org/release/76df3287-6cda-33eb-8e9a-044b5e15ffdd/front-250.jpg
```
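A small helper can assemble these URLs from the pattern, image types, and sizes listed above. This is only a sketch: `cover_art_url` is a hypothetical name, and the `-{size}.jpg` suffix simply mirrors the example URL.

```python
from typing import Optional

CAA_BASE = "https://coverartarchive.org"
THUMBNAIL_SIZES = (250, 500, 1200)


def cover_art_url(release_mbid: str, image: str = "front",
                  size: Optional[int] = None) -> str:
    """Build a URL for 'front', 'back', or a specific image ID."""
    url = f"{CAA_BASE}/release/{release_mbid}/{image}"
    if size is None:
        return url  # original, full-resolution image
    if size not in THUMBNAIL_SIZES:
        raise ValueError(f"unsupported thumbnail size: {size}")
    # Mirrors the example above: size and a .jpg suffix are appended.
    return f"{url}-{size}.jpg"
```

Rejecting unknown sizes locally avoids a request that the archive would answer with an error.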
## Wikipedia/Wikidata/Wikimedia Commons
### MediaWiki API Integration
**Purpose:** Fetch article extracts, images, and structured data
**Endpoints:**
- Wikipedia: `https://{lang}.wikipedia.org/w/api.php`
- Wikidata: `https://www.wikidata.org/w/api.php`
- Commons: `https://commons.wikimedia.org/w/api.php`
### Wikipedia Extracts
**API Action:** `query` with `prop=extracts`
**Request:**
```perl
my $url = "https://en.wikipedia.org/w/api.php?" .
"action=query&" .
"prop=extracts&" .
"exintro=1&" .
"explaintext=1&" .
"titles=" . uri_escape($artist_name) .
"&format=json";
my $response = $ua->get($url);
my $data = decode_json($response->content);
```
**Caching:** 3 days for extracts
**Display:** Artist/release pages show Wikipedia extract in sidebar
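The request and the 3-day cache rule can be sketched as follows. The function names are hypothetical. The response shape (`query.pages.<pageid>.extract`) is the standard MediaWiki JSON layout, and how the server stores the fetch time is an assumption.

```python
from urllib.parse import urlencode

EXTRACT_TTL_SECONDS = 3 * 24 * 60 * 60  # 3 days, as stated above


def extract_url(title, lang="en"):
    """Build the prop=extracts query URL for one article title."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,
        "explaintext": 1,
        "titles": title,
        "format": "json",
    }
    # urlencode percent-encodes non-ASCII titles as UTF-8.
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def first_extract(data):
    """Return the extract of the first page that has one, else None."""
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None  # e.g. the title does not exist


def is_fresh(fetched_at, now):
    """True while a cached extract is younger than the 3-day TTL."""
    return now - fetched_at < EXTRACT_TTL_SECONDS
```

A caller would refetch only when `is_fresh` returns false, which keeps traffic to Wikipedia low for popular artists.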
### Language Links
**API Action:** `query` with `prop=langlinks`
**Purpose:** Find Wikipedia articles in different languages
**Request:**
```perl
my $url = "https://en.wikipedia.org/w/api.php?" .
    "action=query&" .
    "prop=langlinks&" .
    "titles=" . uri_escape($title) .
    "&lllimit=500" .
    "&format=json";
```
**Caching:** 7 days for language links
**Usage:** Display Wikipedia links in user's preferred language
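Choosing a link for the user's preferred language might look like the sketch below. `pick_langlink` and the ordered preference list are assumptions. With `format=json` in its default format version, each language link carries its title under the `*` key.

```python
def pick_langlink(data, preferred):
    """Return (lang, title) for the first preferred language available."""
    pages = data.get("query", {}).get("pages", {})
    available = {}
    for page in pages.values():
        for link in page.get("langlinks", []):
            # Default JSON format: the article title is under the "*" key.
            available[link["lang"]] = link["*"]
    for lang in preferred:
        if lang in available:
            return lang, available[lang]
    return None  # caller falls back to the original article
```

The preference list is walked in order, so a user who prefers German and then French gets the German article when both exist.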
### Wikidata Integration
**Purpose:** Fetch structured data (birth dates, locations, etc.)
**API Action:** `wbgetentities`
**Request:**
```perl
my $url = "https://www.wikidata.org/w/api.php?" .
    "action=wbgetentities&" .
    "ids=$wikidata_id&" .  # $wikidata_id holds the full item ID, e.g. "Q42"
    "format=json";
```
**Data Extracted:**
- Birth/death dates
- Birth/death places
- Occupations
- Genres
- Record labels
- Official websites
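The fields listed above correspond to Wikidata properties, and a `wbgetentities` response can be reduced to them roughly as follows. This is a sketch: the field names and helper functions are hypothetical, and the property mapping is an assumption about which properties are read.

```python
# Wikidata property IDs for the fields listed above.
PROPERTIES = {
    "P569": "birth_date",
    "P570": "death_date",
    "P19": "birth_place",
    "P20": "death_place",
    "P106": "occupation",
    "P136": "genre",
    "P264": "record_label",
    "P856": "official_website",
}


def snak_value(claim):
    """Reduce a claim to a time string, an item ID, or a plain string."""
    snak = claim.get("mainsnak", {})
    if snak.get("snaktype") != "value":
        return None  # "novalue" and "somevalue" snaks carry no datavalue
    value = snak["datavalue"]["value"]
    if isinstance(value, dict):
        # Time values carry "time"; item references carry "id".
        return value.get("time") or value.get("id")
    return value


def extract_fields(data, entity_id):
    """Map the known properties of one entity to lists of values."""
    claims = data["entities"][entity_id].get("claims", {})
    result = {}
    for prop, field in PROPERTIES.items():
        values = [snak_value(claim) for claim in claims.get(prop, [])]
        values = [value for value in values if value is not None]
        if values:
            result[field] = values
    return result
```

Places, occupations, genres, and labels come back as item IDs such as `Q84`, so showing their names needs a second lookup.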
### Wikimedia Commons Images
**Purpose:** Fetch artist/band photos
**API Action:** `query` with `prop=imageinfo`
**Request:**
```perl
my $url = "https://commons.wikimedia.org/w/api.php?" .
"action=query&" .
"prop=imageinfo&" .
"iiprop=url|size|mime&" .
"titles=File:" . uri_escape($filename) .
"&format=json";
```
**Display:** Artist pages show Commons images in sidebar
## CritiqueBrainz
### Overview
**Service:** CritiqueBrainz (critiquebrainz.org)
**Purpose:** User-generated music reviews
### Integration
**Method:** URL linking
**Pattern:** `https://critiquebrainz.org/release/{mbid}`
**Display:** Release pages show link to CritiqueBrainz reviews
**Embedding:** Review count and average rating displayed on release pages
**API:** CritiqueBrainz API used to fetch review statistics
**Request:**
```perl
my $url = "https://critiquebrainz.org/ws/1/release/$mbid";
my $response = $ua->get($url);
my $data = decode_json($response->content);
my $review_count = $data->{review_count};
my $avg_rating = $data->{average_rating};
```
## Event Art Archive
### Overview
**Service:** Event Art Archive
**Purpose:** Store event posters and promotional materials
**Architecture:** Similar to Cover Art Archive (S3 + Internet Archive)
**URL Pattern:** `https://eventartarchive.org/event/{mbid}`
## Discourse SSO
### Overview
**Service:** MusicBrainz Community Forum (community.metabrainz.org)
**Protocol:** Discourse SSO (Single Sign-On)
### Authentication Flow
**Method:** HMAC-SHA256 signed payload
**Flow:**
1. User clicks "Log in" on Discourse
2. Discourse redirects to MusicBrainz with a signed payload containing a nonce
3. MusicBrainz verifies the signature and authenticates the user
4. MusicBrainz generates SSO payload
5. Payload signed with HMAC-SHA256
6. User redirected back to Discourse with signed payload
7. Discourse verifies signature and logs in user
**Configuration:**
```perl
# DBDefs.pm
sub DISCOURSE_SSO_SECRET { 'shared_secret' }
sub DISCOURSE_SERVER { 'https://community.metabrainz.org' }
```
**Payload Generation:**
```perl
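use Digest::SHA qw(hmac_sha256_hex);
use MIME::Base64 qw(encode_base64);
use URI::Escape qw(uri_escape uri_escape_utf8);
# URL-encode each value so characters such as "&", "=" or "+" cannot corrupt the payload
my $payload = encode_base64(
    "nonce=" . uri_escape_utf8($nonce) .
    "&email=" . uri_escape_utf8($email) .
    "&external_id=" . uri_escape_utf8($user_id) .
    "&username=" . uri_escape_utf8($username) .
    "&name=" . uri_escape_utf8($name),
    ''  # no line breaks in the encoded payload
);
# The signature is computed over the base64 text, not the raw query string
my $signature = hmac_sha256_hex($payload, $sso_secret);
my $redirect_url = "$discourse_server/session/sso_login?" .
    "sso=" . uri_escape($payload) .
    "&sig=$signature";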
```
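Both halves of the exchange, checking the payload that arrives from Discourse and building the signed reply, can be sketched in Python. The function names are hypothetical, and this illustrates the protocol rather than the server's implementation.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode


def sign(payload_b64, secret):
    """HMAC-SHA256 over the base64 payload text, as lowercase hex."""
    return hmac.new(secret.encode("utf-8"), payload_b64.encode("ascii"),
                    hashlib.sha256).hexdigest()


def read_nonce(sso, sig, secret):
    """Verify the payload received from Discourse and return its nonce."""
    if not hmac.compare_digest(sign(sso, secret), sig):
        raise ValueError("bad SSO signature")
    query = base64.b64decode(sso).decode("utf-8")
    return parse_qs(query)["nonce"][0]


def build_response(nonce, user, secret):
    """Return the query string to append to /session/sso_login."""
    fields = {"nonce": nonce, **user}
    # urlencode escapes each value, so "&" or "+" in a name stays intact.
    raw = urlencode(fields).encode("utf-8")
    payload_b64 = base64.b64encode(raw).decode("ascii")
    return urlencode({"sso": payload_b64, "sig": sign(payload_b64, secret)})
```

Echoing the nonce back unchanged lets Discourse match the reply to the login attempt it started. `hmac.compare_digest` is used so the comparison does not leak timing information.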
**User Data Synced:**
- Email address
- Username
- Display name
- User ID (external_id)
- Avatar URL (optional)
- Admin status (optional)
- Moderator status (optional)
## MetaBrainz OAuth
### Overview
**Service:** Centralized OAuth provider for MetaBrainz services
**Protocol:** OAuth 2.0 with token introspection
### Token Introspection
**Endpoint:** `https://musicbrainz.org/oauth2/introspect`
**Method:** POST
**Request:**
```perl
my $response = $ua->post(
    'https://musicbrainz.org/oauth2/introspect',
    {
        token         => $access_token,
        client_id     => $client_id,
        client_secret => $client_secret,
    }
);
my $data = decode_json($response->content);
```
**Response:**
```json
{
  "active": true,
  "scope": "profile email tag rating collection",
  "client_id": "client_id",
  "username": "username",
  "token_type": "Bearer",
  "exp": 1609459200,
  "iat": 1609372800,
  "sub": "user_id"
}
```
**Usage:** Other MetaBrainz services (ListenBrainz, BookBrainz, etc.) validate tokens via introspection
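A consuming service could evaluate an introspection response along these lines. `token_allows` is a hypothetical helper, and checking `exp` locally in addition to `active` is a defensive assumption rather than documented behaviour.

```python
def token_allows(introspection, required_scope, now):
    """True if the token is active, unexpired, and grants required_scope."""
    if not introspection.get("active"):
        return False
    exp = introspection.get("exp")
    if exp is not None and exp <= now:
        return False  # expired according to the caller's clock
    # Scopes arrive as one space-separated string.
    granted = set(introspection.get("scope", "").split())
    return required_scope in granted
```

Passing `now` in explicitly, for example `time.time()`, keeps the check easy to test.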
### Services Using MetaBrainz OAuth
- ListenBrainz (listening history)
- BookBrainz (book metadata)
- CritiqueBrainz (music reviews)
- AcousticBrainz (audio analysis)
- Picard (music tagger)
## Replication System
### Overview
**Purpose:** Synchronize database changes from master to mirrors
**Protocol:** dbmirror2 packet system
### Replication Modes
**RT_MASTER:**
- Generates replication packets
- Writes to `dbmirror_pending` and `dbmirror_pendingdata` tables
- Exports packets for mirrors
**RT_MIRROR:**
- Consumes replication packets
- Applies changes from master
- Read-only (no edits)
**RT_STANDALONE:**
- No replication
- Fully independent database
**Configuration:**
```perl
# DBDefs.pm
sub REPLICATION_TYPE { RT_MASTER } # or RT_MIRROR or RT_STANDALONE
sub REPLICATION_ACCESS_TOKEN { 'secret_token' }
```
### Packet Structure
**Tables:**
- `dbmirror_pending` - Pending transactions
- `dbmirror_pendingdata` - Data changes (INSERT/UPDATE/DELETE)
**Packet Format:**
```
SeqId: 12345
TransactionId: 67890
Operation: i # i=INSERT, u=UPDATE, d=DELETE
TableName: artist
Data: {"id":123,"gid":"...","name":"..."}
```
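To make the fields concrete, the sketch below parses one record written in the layout shown above. Treating that layout as the literal input format is an assumption made for illustration, and `Change` and `parse_change` are hypothetical names.

```python
import json
from dataclasses import dataclass

OPERATIONS = {"i": "INSERT", "u": "UPDATE", "d": "DELETE"}


@dataclass
class Change:
    seq_id: int
    transaction_id: int
    operation: str
    table: str
    data: dict


def parse_change(text):
    """Parse 'Key: value' lines into a Change record."""
    fields = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the first colon only; the JSON in Data has more.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    # Drop a trailing "# ..." comment from the Operation line.
    code = fields["Operation"].split("#", 1)[0].strip()
    if code not in OPERATIONS:
        raise ValueError(f"unknown operation: {code!r}")
    return Change(
        seq_id=int(fields["SeqId"]),
        transaction_id=int(fields["TransactionId"]),
        operation=OPERATIONS[code],
        table=fields["TableName"],
        data=json.loads(fields["Data"]),
    )
```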
### Replication Flow
**Master Side:**
1. Edit applied to database
2. Triggers capture changes to `dbmirror_pending`
3. Export script generates replication packets
4. Packets uploaded to FTP server
**Mirror Side:**
1. Download replication packets from FTP
2. Apply packets in sequence order
3. Update replication state
4. Verify data integrity
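Applying packets strictly in sequence order is the key constraint on the mirror side. The sketch below shows steps 2 and 3 under assumptions: `apply_packets` is a hypothetical name, packets are keyed by sequence number, and `apply` stands in for whatever writes a packet to the database.

```python
def apply_packets(current_seq, available, apply):
    """Apply packets current_seq+1, +2, ... and return the new sequence.

    Stops at the first missing number so that a gap is never skipped.
    """
    next_seq = current_seq + 1
    while next_seq in available:
        apply(available[next_seq])  # step 2: apply in sequence order
        current_seq = next_seq      # step 3: update replication state
        next_seq += 1
    return current_seq
```

Stopping at a gap means a mirror that has packets 101, 102, and 104 applies only the first two and waits for 103, instead of applying 104 against an inconsistent state.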
**Packet Export:**
```bash
# On master
./admin/replication/ExportReplicationPackets
# Generates packets in replication/ directory
# Uploads to FTP server
```
**Packet Import:**
```bash
# On mirror
./admin/replication/LoadReplicationChanges
# Downloads packets from FTP
# Applies changes to database
```
### Replication Lag
**Monitoring:** Mirrors track replication lag (time behind master)
**Typical Lag:** Minutes to hours depending on packet size and network
**Status Endpoint:** `/replication-status` shows current replication state
## Redis Integration
### Architecture
**Connection:** Single Redis instance, 16 databases (0-15)
**Configuration:**
```perl
# DBDefs.pm
sub REDIS_SERVER { 'localhost:6379' }
sub REDIS_NAMESPACE { 'MB' }
```
### Use Cases
**Session Management (DB 1):**
- Store user sessions
- 10 hour absolute expiry
- 3 hour idle timeout
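The two session limits combine as follows: a session ends when either limit is reached. This is a sketch with hypothetical names, and timestamps are assumed to be seconds since the epoch.

```python
ABSOLUTE_EXPIRY = 10 * 60 * 60  # 10 hours after the session was created
IDLE_TIMEOUT = 3 * 60 * 60      # 3 hours after the last request


def session_is_valid(created_at, last_seen_at, now):
    """True while neither the absolute nor the idle limit has passed."""
    within_lifetime = now - created_at < ABSOLUTE_EXPIRY
    recently_active = now - last_seen_at < IDLE_TIMEOUT
    return within_lifetime and recently_active
```

An active user is therefore still logged out 10 hours after signing in, while an inactive one is logged out after 3 hours.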
**Entity Cache (DB 0):**
- Cache entity lookups by MBID
- 1 hour TTL
- Invalidate on edit
**Search Cache (DB 2):**
- Cache search results
- 15 minute TTL
**Statistics Cache (DB 3):**
- Cache homepage statistics
- 1 hour TTL
**Rate Limiting (DB 4):**
- Track API request counts
- 1 second sliding window
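A sliding-window counter can be illustrated with an in-memory stand-in for Redis. The class name and the per-key request limit are assumptions, since the text above states only the window length.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window` seconds."""

    def __init__(self, limit, window=1.0):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> request timestamps

    def allow(self, key, now):
        hits = self._hits[key]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Unlike a fixed window that resets on the second, this counts the requests made in the last second at any moment, so a burst straddling a boundary is still limited.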
**Pub/Sub (DB 5):**
- Real-time notifications
- Edit submission events
- Cache invalidation events
### Connection Pooling
**Library:** Redis.pm with connection pooling
**Pool Size:** 10 connections per worker
**Reconnection:** Automatic reconnection on connection loss
## HTTP Client
### LWP::UserAgent
**Purpose:** HTTP client for external service communication
**Configuration:**
```perl
use LWP::UserAgent;
my $ua = LWP::UserAgent->new(
    agent        => 'MusicBrainz/1.0 (https://musicbrainz.org)',
    timeout      => 30,
    max_redirect => 5,
);
```
**User-Agent:** Always identifies as MusicBrainz with contact URL
**Timeout:** 30 seconds default
**Redirects:** Follow up to 5 redirects
**SSL Verification:** Enabled by default
### Rate Limiting
**External Services:** Respect rate limits via delays
**Wikipedia API:** 1 request per second (recommended)
**Wikidata API:** 1 request per second (recommended)
**Implementation:**
```perl
use Time::HiRes qw(sleep time);  # sub-second sleep() and time()
my $last_request_time = 0;
sub rate_limited_request {
    my ($url) = @_;
    # Wait until at least 1 second has passed since the previous request
    my $elapsed = time() - $last_request_time;
    if ($elapsed < 1) {
        sleep(1 - $elapsed);
    }
    my $response = $ua->get($url);
    $last_request_time = time();
    return $response;
}
```
### Error Handling
**Retry Logic:** Exponential backoff for transient errors
**Timeouts:** Fail gracefully on timeout
**Logging:** Log all external service errors to Sentry
**Example:**
```perl
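my $response;
my $retries = 3;
for my $attempt (1..$retries) {
    # LWP::UserAgent returns an error response instead of dying,
    # so check the response rather than catching an exception
    $response = $ua->get($url);
    last if $response->is_success;
    warn "Request failed (attempt $attempt): " . $response->status_line;
    sleep(2 ** $attempt) if $attempt < $retries;  # Exponential backoff
}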
```
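The same retry policy can be sketched in Python with an injectable `sleep`, which makes the backoff schedule testable without waiting. `fetch_with_retry` and the choice of which exceptions count as transient are assumptions.

```python
import time


def fetch_with_retry(fetch, url, retries=3, sleep=time.sleep):
    """Call fetch(url) up to `retries` times with exponential backoff."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            if attempt < retries:
                sleep(2 ** attempt)  # 2s, then 4s, ...
    # Every attempt failed: surface the final error to the caller.
    raise last_error
```

Only connection and timeout errors are retried here. Other exceptions propagate at once, since repeating a request that fails for a permanent reason only adds delay.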