feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
@@ -0,0 +1,529 @@
# MusicBrainz Server Integrations

## Cover Art Archive

### Overview

**Service:** Cover Art Archive (coverartarchive.org)
**Storage:** Internet Archive, written through its S3-compatible upload API
**Purpose:** Store and serve album cover artwork

### Upload Process

**Method:** Signed POST to the Internet Archive's S3-compatible endpoint

**Authentication:** HMAC-SHA1 signed policy

**Configuration:**
```perl
# DBDefs.pm
sub COVER_ART_ARCHIVE_ACCESS_KEY { 'access_key' }
sub COVER_ART_ARCHIVE_SECRET_KEY { 'secret_key' }
sub COVER_ART_ARCHIVE_UPLOAD_PREFIXER { 'MB' }
sub COVER_ART_ARCHIVE_DOWNLOAD_PREFIX { 'https://coverartarchive.org' }
```

**Upload Flow:**
1. User uploads image via MusicBrainz interface
2. Server generates an S3-style policy document
3. Policy signed with HMAC-SHA1 using secret key
4. Browser POSTs directly to the S3-compatible endpoint with the signed policy
5. The Internet Archive stores the image in the release's `mbid-{release_mbid}` bucket
6. Image becomes available at coverartarchive.org

**Policy Document:**
```json
{
  "expiration": "2024-12-31T23:59:59Z",
  "conditions": [
    {"bucket": "mbid-{release_mbid}"},
    {"acl": "public-read"},
    ["starts-with", "$key", "mbid-{release_mbid}/"],
    ["content-length-range", 0, 10485760]
  ]
}
```

**Signature:**
```perl
use Digest::SHA qw(hmac_sha1_base64);
use MIME::Base64 qw(encode_base64);

# Pass '' as the line ending so the encoded policy contains no newlines
my $policy_b64 = encode_base64($policy_json, '');
my $signature = hmac_sha1_base64($policy_b64, $secret_key);
$signature .= '=' while length($signature) % 4;  # Digest::SHA omits base64 padding
```
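
Putting the policy document and the signature step together, a minimal sketch might look like the following. The helper name `build_signed_policy`, the one-hour lifetime, and the sample secret are assumptions for illustration, not the server's actual code; only core modules are used.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha1_base64);
use JSON::PP;
use MIME::Base64 qw(encode_base64 decode_base64);
use POSIX qw(strftime);

# Hypothetical helper: build and sign an upload policy for one release.
sub build_signed_policy {
    my ($release_mbid, $secret_key, $ttl_seconds) = @_;
    my $bucket = "mbid-$release_mbid";

    my $policy = {
        expiration => strftime('%Y-%m-%dT%H:%M:%SZ', gmtime(time() + $ttl_seconds)),
        conditions => [
            { bucket => $bucket },
            { acl    => 'public-read' },
            [ 'starts-with', '$key', "$bucket/" ],
            [ 'content-length-range', 0, 10 * 1024 * 1024 ],
        ],
    };

    # canonical() keeps key order stable; '' disables base64 line wrapping.
    my $policy_b64 = encode_base64(JSON::PP->new->canonical->encode($policy), '');
    my $signature  = hmac_sha1_base64($policy_b64, $secret_key);
    $signature .= '=' while length($signature) % 4;   # restore base64 padding

    return { policy => $policy_b64, signature => $signature };
}

my $fields = build_signed_policy(
    '76df3287-6cda-33eb-8e9a-044b5e15ffdd', 'secret_key', 3600,
);
print "policy=$fields->{policy}\nsignature=$fields->{signature}\n";
```

The two returned values would be placed in the upload form as the policy and signature fields, alongside the access key.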

### Retrieval

**URL Pattern:** `https://coverartarchive.org/release/{mbid}/front`

**Image Types:**
- `front` - Front cover
- `back` - Back cover
- `{id}` - Specific image by ID

**Sizes:** (thumbnails are requested by appending `-{size}` to the image type)
- Original (full resolution)
- `250` - 250px thumbnail
- `500` - 500px thumbnail
- `1200` - 1200px large

**Example:**
```
https://coverartarchive.org/release/76df3287-6cda-33eb-8e9a-044b5e15ffdd/front-250.jpg
```
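
As a rough illustration of the URL pattern, image types, and sizes above, a small builder could look like this. `cover_art_url` is a hypothetical helper, and the `.jpg` suffix on thumbnails simply mirrors the example URL above rather than a confirmed requirement.

```perl
use strict;
use warnings;

my %VALID_SIZE = map { $_ => 1 } (250, 500, 1200);

# Hypothetical helper: $image is 'front', 'back', or an image ID;
# $size is optional and must be one of the thumbnail sizes listed above.
sub cover_art_url {
    my ($mbid, $image, $size) = @_;
    die "unsupported size: $size\n" if defined $size && !$VALID_SIZE{$size};

    my $url = "https://coverartarchive.org/release/$mbid/$image";
    $url .= "-$size.jpg" if defined $size;   # suffix follows the example above
    return $url;
}

my $mbid = '76df3287-6cda-33eb-8e9a-044b5e15ffdd';
print cover_art_url($mbid, 'front'), "\n";        # original image
print cover_art_url($mbid, 'front', 250), "\n";   # 250px thumbnail
```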

## Wikipedia/Wikidata/Wikimedia Commons

### MediaWiki API Integration

**Purpose:** Fetch article extracts, images, and structured data

**Endpoints:**
- Wikipedia: `https://{lang}.wikipedia.org/w/api.php`
- Wikidata: `https://www.wikidata.org/w/api.php`
- Commons: `https://commons.wikimedia.org/w/api.php`

### Wikipedia Extracts

**API Action:** `query` with `prop=extracts`

**Request:**
```perl
use JSON qw(decode_json);
use URI::Escape qw(uri_escape_utf8);

# uri_escape_utf8 handles non-ASCII names; plain uri_escape dies on wide characters
my $url = "https://en.wikipedia.org/w/api.php?" .
    "action=query&" .
    "prop=extracts&" .
    "exintro=1&" .
    "explaintext=1&" .
    "titles=" . uri_escape_utf8($artist_name) .
    "&format=json";

my $response = $ua->get($url);
die "Wikipedia request failed: " . $response->status_line unless $response->is_success;
my $data = decode_json($response->content);
```

**Caching:** 3 days for extracts

**Display:** Artist/release pages show Wikipedia extract in sidebar
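
The decoded response nests pages under `query.pages`, keyed by page ID. A minimal sketch of pulling the extract out of it, assuming the default JSON format (where a missing title comes back with a `missing` key); the sample body and the helper name `first_extract` are illustrative only:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Illustrative response body; a real one comes from the request above.
my $body = <<'JSON';
{"query":{"pages":{"1234":{"pageid":1234,"title":"Example Artist",
  "extract":"Example Artist is a band."}}}}
JSON

# Return the first extract found, or undef when the page is missing.
sub first_extract {
    my ($data) = @_;
    my $pages = $data->{query}{pages} || {};
    for my $page_id (sort keys %$pages) {
        my $page = $pages->{$page_id};
        next if exists $page->{missing};
        return $page->{extract} if defined $page->{extract};
    }
    return;
}

my $extract = first_extract(decode_json($body));
print defined $extract ? "$extract\n" : "no extract\n";
```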

### Language Links

**API Action:** `query` with `prop=langlinks`

**Purpose:** Find Wikipedia articles in different languages

**Request:**
```perl
my $url = "https://en.wikipedia.org/w/api.php?" .
    "action=query&" .
    "prop=langlinks&" .
    "titles=" . uri_escape_utf8($title) .
    "&lllimit=500" .
    "&format=json";
```

**Caching:** 7 days for language links

**Usage:** Display Wikipedia links in user's preferred language
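
To show how the language links might be used, here is a sketch that picks the first link matching a list of preferred languages. It assumes the default JSON format, in which each `langlinks` entry carries the language code in `lang` and the title under `*`; the sample data and `preferred_article` are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Illustrative response body for prop=langlinks.
my $body = <<'JSON';
{"query":{"pages":{"1234":{"title":"Example Artist","langlinks":[
  {"lang":"de","*":"Beispiel (Band)"},
  {"lang":"fr","*":"Artiste exemple"}]}}}}
JSON

# Return (lang, title) for the first preferred language that has a link.
sub preferred_article {
    my ($data, @preferred_langs) = @_;
    my %title_for;
    for my $page (values %{ $data->{query}{pages} || {} }) {
        $title_for{ $_->{lang} } = $_->{'*'} for @{ $page->{langlinks} || [] };
    }
    for my $lang (@preferred_langs) {
        return ($lang, $title_for{$lang}) if exists $title_for{$lang};
    }
    return;
}

my ($lang, $title) = preferred_article(decode_json($body), 'fr', 'de');
if (defined $lang) {
    (my $slug = $title) =~ tr/ /_/;   # wiki URLs use underscores for spaces
    print "https://$lang.wikipedia.org/wiki/$slug\n";
}
```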

### Wikidata Integration

**Purpose:** Fetch structured data (birth dates, locations, etc.)

**API Action:** `wbgetentities`

**Request:**
```perl
# $wikidata_id is the numeric part of the entity ID (e.g. 42 for Q42)
my $url = "https://www.wikidata.org/w/api.php?" .
    "action=wbgetentities&" .
    "ids=Q${wikidata_id}&" .
    "format=json";
```

**Data Extracted:**
- Birth/death dates
- Birth/death places
- Occupations
- Genres
- Record labels
- Official websites
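
In a `wbgetentities` response, each of these fields is a claim keyed by property ID (for example `P569` for date of birth and `P856` for official website), with the value under `mainsnak.datavalue.value`. A sketch of reading two of them follows; the entity ID and the values in the sample are placeholders, and `first_claim_value` is a hypothetical helper.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Illustrative, heavily trimmed response; the ID and values are placeholders.
my $body = <<'JSON';
{"entities":{"Q12345":{"claims":{
  "P569":[{"mainsnak":{"snaktype":"value","datavalue":{"type":"time",
    "value":{"time":"+1940-10-09T00:00:00Z","precision":11}}}}],
  "P856":[{"mainsnak":{"snaktype":"value","datavalue":{"type":"string",
    "value":"https://example.org/"}}}]
}}}}
JSON

# Return the first concrete value for a property, or undef if there is none.
sub first_claim_value {
    my ($entity, $property) = @_;
    for my $claim (@{ $entity->{claims}{$property} || [] }) {
        my $snak = $claim->{mainsnak} or next;
        next unless ($snak->{snaktype} // '') eq 'value';   # skip "unknown"/"no value"
        return $snak->{datavalue}{value};
    }
    return;
}

my $entity  = decode_json($body)->{entities}{Q12345};
my $birth   = first_claim_value($entity, 'P569');
my $website = first_claim_value($entity, 'P856');

# Time values look like "+YYYY-MM-DDT00:00:00Z"; keep only the date part.
my ($birth_date) = ($birth->{time} // '') =~ /^\+?(\d{4}-\d{2}-\d{2})/;
print "born: $birth_date\nwebsite: $website\n";
```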

### Wikimedia Commons Images

**Purpose:** Fetch artist/band photos

**API Action:** `query` with `prop=imageinfo`

**Request:**
```perl
my $url = "https://commons.wikimedia.org/w/api.php?" .
    "action=query&" .
    "prop=imageinfo&" .
    "iiprop=url|size|mime&" .
    "titles=File:" . uri_escape_utf8($filename) .
    "&format=json";
```

**Display:** Artist pages show Commons images in sidebar

## CritiqueBrainz

### Overview

**Service:** CritiqueBrainz (critiquebrainz.org)
**Purpose:** User-generated music reviews

### Integration

**Method:** URL linking

**Pattern:** `https://critiquebrainz.org/release/{mbid}`

**Display:** Release pages show link to CritiqueBrainz reviews

**Embedding:** Review count and average rating displayed on release pages

**API:** CritiqueBrainz API used to fetch review statistics

**Request:**
```perl
my $url = "https://critiquebrainz.org/ws/1/release/$mbid";
my $response = $ua->get($url);
my $data = decode_json($response->content);

my $review_count = $data->{review_count};
my $avg_rating = $data->{average_rating};
```

## Event Art Archive

### Overview

**Service:** Event Art Archive
**Purpose:** Store event posters and promotional materials

**Architecture:** Similar to Cover Art Archive (S3 + Internet Archive)

**URL Pattern:** `https://eventartarchive.org/event/{mbid}`

## Discourse SSO

### Overview

**Service:** MetaBrainz Community Forum (community.metabrainz.org)
**Protocol:** Discourse SSO (Single Sign-On)

### Authentication Flow

**Method:** HMAC-SHA256 signed payload

**Flow:**
1. User clicks "Log in" on Discourse
2. Discourse redirects to MusicBrainz with nonce
3. MusicBrainz authenticates user
4. MusicBrainz generates SSO payload
5. Payload signed with HMAC-SHA256
6. User redirected back to Discourse with signed payload
7. Discourse verifies signature and logs in user

**Configuration:**
```perl
# DBDefs.pm
sub DISCOURSE_SSO_SECRET { 'shared_secret' }
sub DISCOURSE_SERVER { 'https://community.metabrainz.org' }
```

**Payload Generation:**
```perl
use Digest::SHA qw(hmac_sha256_hex);
use MIME::Base64 qw(encode_base64);
use URI::Escape qw(uri_escape_utf8);

# URL-encode each value, then base64-encode without line breaks ('')
my $payload = encode_base64(
    "nonce=" . uri_escape_utf8($nonce) .
    "&email=" . uri_escape_utf8($email) .
    "&external_id=" . uri_escape_utf8($user_id) .
    "&username=" . uri_escape_utf8($username) .
    "&name=" . uri_escape_utf8($name),
    ''
);

# The signature covers the base64 payload, not the raw query string
my $signature = hmac_sha256_hex($payload, $sso_secret);

my $redirect_url = "$discourse_server/session/sso_login?" .
    "sso=" . uri_escape_utf8($payload) .
    "&sig=$signature";
```

**User Data Synced:**
- Email address
- Username
- Display name
- User ID (external_id)
- Avatar URL (optional)
- Admin status (optional)
- Moderator status (optional)
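
Before generating the payload, the incoming request from step 2 of the flow has to be checked: Discourse sends its own `sso` (base64) and `sig` (hex HMAC-SHA256) parameters, and the nonce is read from the decoded `sso` value. A minimal sketch of that check, using core modules only; `nonce_from_sso_request` and `constant_time_eq` are hypothetical names, and the simulated request at the end stands in for what Discourse would send.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);
use MIME::Base64 qw(encode_base64 decode_base64);

# Compare two strings without stopping at the first differing character.
sub constant_time_eq {
    my ($x, $y) = @_;
    return 0 unless length($x) == length($y);
    my $diff = 0;
    $diff |= ord(substr($x, $_, 1)) ^ ord(substr($y, $_, 1)) for 0 .. length($x) - 1;
    return $diff == 0;
}

# Return the nonce from a valid request, or undef if the signature is wrong.
sub nonce_from_sso_request {
    my ($sso, $sig, $secret) = @_;
    return unless constant_time_eq(hmac_sha256_hex($sso, $secret), lc $sig);

    my %param;
    for my $pair (split /&/, decode_base64($sso)) {
        my ($key, $value) = split /=/, $pair, 2;
        $param{$key} = $value;   # the nonce is plain hex, so no URL-decoding here
    }
    return $param{nonce};
}

# Simulate what Discourse would send (values are made up for illustration).
my $secret = 'shared_secret';
my $sso    = encode_base64('nonce=abc123&return_sso_url=https%3A%2F%2Fexample.org%2F', '');
my $sig    = hmac_sha256_hex($sso, $secret);

my $nonce = nonce_from_sso_request($sso, $sig, $secret);
print defined $nonce ? "nonce: $nonce\n" : "rejected\n";
```

Discourse performs the mirror-image check in step 7, using the same shared secret on the payload it receives back.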

## MetaBrainz OAuth

### Overview

**Service:** Centralized OAuth provider for MetaBrainz services
**Protocol:** OAuth 2.0 with token introspection

### Token Introspection

**Endpoint:** `https://musicbrainz.org/oauth2/introspect`

**Method:** POST

**Request:**
```perl
my $response = $ua->post(
    'https://musicbrainz.org/oauth2/introspect',
    {
        token => $access_token,
        client_id => $client_id,
        client_secret => $client_secret,
    }
);

my $data = decode_json($response->content);
```

**Response:**
```json
{
  "active": true,
  "scope": "profile email tag rating collection",
  "client_id": "client_id",
  "username": "username",
  "token_type": "Bearer",
  "exp": 1609459200,
  "iat": 1609372800,
  "sub": "user_id"
}
```

**Usage:** Other MetaBrainz services (ListenBrainz, BookBrainz, etc.) validate tokens via introspection
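
A consuming service would typically look at three fields of this response: `active`, `exp`, and `scope`. The following is a sketch of such a check, not the validation any MetaBrainz service is known to run; `token_allows` is a hypothetical helper and the sample body reuses the response above.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: accept a token only if it is active, unexpired,
# and carries every scope the caller requires.
sub token_allows {
    my ($introspection, $now, @required_scopes) = @_;
    return 0 unless $introspection->{active};
    return 0 if defined $introspection->{exp} && $introspection->{exp} <= $now;

    my %granted = map { $_ => 1 } split ' ', $introspection->{scope} // '';
    return (grep { !$granted{$_} } @required_scopes) ? 0 : 1;
}

my $data = decode_json(<<'JSON');
{"active": true, "scope": "profile email tag rating collection",
 "token_type": "Bearer", "exp": 1609459200, "iat": 1609372800, "sub": "user_id"}
JSON

my $now = 1609400000;   # fixed timestamp between iat and exp, for the example
print token_allows($data, $now, 'profile', 'tag') ? "allowed\n" : "denied\n";
```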

### Services Using MetaBrainz OAuth

- ListenBrainz (listening history)
- BookBrainz (book metadata)
- CritiqueBrainz (music reviews)
- AcousticBrainz (audio analysis)
- Picard (music tagger)

## Replication System

### Overview

**Purpose:** Synchronize database changes from master to mirrors
**Protocol:** dbmirror2 packet system

### Replication Modes

**RT_MASTER:**
- Generates replication packets
- Writes to `dbmirror_pending` and `dbmirror_pendingdata` tables
- Exports packets for mirrors

**RT_MIRROR:**
- Consumes replication packets
- Applies changes from master
- Read-only (no edits)

**RT_STANDALONE:**
- No replication
- Fully independent database

**Configuration:**
```perl
# DBDefs.pm
sub REPLICATION_TYPE { RT_MASTER }  # or RT_MIRROR or RT_STANDALONE
sub REPLICATION_ACCESS_TOKEN { 'secret_token' }
```

### Packet Structure

**Tables:**
- `dbmirror_pending` - Pending transactions
- `dbmirror_pendingdata` - Data changes (INSERT/UPDATE/DELETE)

**Packet Format:**
```
SeqId: 12345
TransactionId: 67890
Operation: i  # i=INSERT, u=UPDATE, d=DELETE
TableName: artist
Data: {"id":123,"gid":"...","name":"..."}
```
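
To make the record layout concrete, here is a sketch that parses one record written in the textual form shown above. That `Name: value` form is this document's illustration and may not match the on-disk packet format, so `parse_change` is best read as a hypothetical helper for the fields, not a packet reader.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

my %OPERATION = (i => 'INSERT', u => 'UPDATE', d => 'DELETE');

my $record = <<'TEXT';
SeqId: 12345
TransactionId: 67890
Operation: i  # i=INSERT, u=UPDATE, d=DELETE
TableName: artist
Data: {"id":123,"gid":"...","name":"..."}
TEXT

# Parse one "Name: value" record into a hash of typed fields.
sub parse_change {
    my ($text) = @_;
    my %field;
    for my $line (split /\n/, $text) {
        $line =~ s/\s+#.*$// unless $line =~ /^Data:/;   # drop trailing comments
        my ($name, $value) = $line =~ /^(\w+):\s*(.*)$/ or next;
        $field{$name} = $value;
    }

    my $operation = $OPERATION{ $field{Operation} // '' }
        or die "unknown operation code\n";

    return {
        seq_id         => $field{SeqId} + 0,
        transaction_id => $field{TransactionId} + 0,
        operation      => $operation,
        table          => $field{TableName},
        data           => decode_json($field{Data}),
    };
}

my $change = parse_change($record);
print "$change->{operation} on $change->{table}, row id $change->{data}{id}\n";
```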

### Replication Flow

**Master Side:**
1. Edit applied to database
2. Triggers capture changes to `dbmirror_pending`
3. Export script generates replication packets
4. Packets uploaded to FTP server

**Mirror Side:**
1. Download replication packets from FTP
2. Apply packets in sequence order
3. Update replication state
4. Verify data integrity
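
The "apply in sequence order" step implies that a mirror must not skip ahead when a packet is missing. A sketch of that ordering rule, assuming each downloaded packet is represented as a hash with a numeric `sequence` field (a hypothetical shape chosen for the example):

```perl
use strict;
use warnings;

# Return the packets that can be applied now: sorted by sequence number,
# starting right after $last_applied and stopping at the first gap.
sub next_applicable {
    my ($last_applied, @packets) = @_;
    my @ready;
    for my $packet (sort { $a->{sequence} <=> $b->{sequence} } @packets) {
        next if $packet->{sequence} <= $last_applied;       # already applied
        last if $packet->{sequence} != $last_applied + 1;   # gap: wait for the missing packet
        push @ready, $packet;
        $last_applied = $packet->{sequence};
    }
    return @ready;
}

my @downloaded = map { { sequence => $_ } } (13, 11, 12, 15, 9);
my @ready = next_applicable(10, @downloaded);
print join(',', map { $_->{sequence} } @ready), "\n";
```

Here packet 15 is held back until packet 14 arrives, and the updated replication state would record 13 as the last applied sequence.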

**Packet Export:**
```bash
# On master
./admin/replication/ExportReplicationPackets

# Generates packets in replication/ directory
# Uploads to FTP server
```

**Packet Import:**
```bash
# On mirror
./admin/replication/LoadReplicationChanges

# Downloads packets from FTP
# Applies changes to database
```

### Replication Lag

**Monitoring:** Mirrors track replication lag (time behind master)

**Typical Lag:** Minutes to hours depending on packet size and network

**Status Endpoint:** `/replication-status` shows current replication state

## Redis Integration

### Architecture

**Connection:** Single Redis instance, 16 databases (0-15)

**Configuration:**
```perl
# DBDefs.pm
sub REDIS_SERVER { 'localhost:6379' }
sub REDIS_NAMESPACE { 'MB' }
```

### Use Cases

**Session Management (DB 1):**
- Store user sessions
- 10 hour absolute expiry
- 3 hour idle timeout
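
The two session limits interact: a session ends at whichever comes first, 10 hours after creation or 3 hours after the last request. A small sketch of that rule, with hypothetical helper names and timestamps in epoch seconds; the second helper shows the TTL a store such as Redis could be given on each request.

```perl
use strict;
use warnings;

use constant {
    ABSOLUTE_EXPIRY => 10 * 60 * 60,   # 10 hours after creation
    IDLE_TIMEOUT    => 3 * 60 * 60,    # 3 hours after the last request
};

# A session is valid only while both limits hold.
sub session_is_valid {
    my ($created_at, $last_seen_at, $now) = @_;
    return 0 if $now - $created_at   >= ABSOLUTE_EXPIRY;
    return 0 if $now - $last_seen_at >= IDLE_TIMEOUT;
    return 1;
}

# Seconds until the session expires: the smaller of the two remaining limits.
sub session_ttl {
    my ($created_at, $last_seen_at, $now) = @_;
    my $absolute_left = $created_at + ABSOLUTE_EXPIRY - $now;
    my $idle_left     = $last_seen_at + IDLE_TIMEOUT - $now;
    my $ttl = $absolute_left < $idle_left ? $absolute_left : $idle_left;
    return $ttl > 0 ? $ttl : 0;
}

my $hour = 60 * 60;
print session_is_valid(0, 8 * $hour, 9 * $hour) ? "valid\n" : "expired\n";
print session_ttl(0, 8 * $hour, 9 * $hour), " seconds left\n";
```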

**Entity Cache (DB 0):**
- Cache entity lookups by MBID
- 1 hour TTL
- Invalidate on edit

**Search Cache (DB 2):**
- Cache search results
- 15 minute TTL

**Statistics Cache (DB 3):**
- Cache homepage statistics
- 1 hour TTL

**Rate Limiting (DB 4):**
- Track API request counts
- 1 second sliding window

**Pub/Sub (DB 5):**
- Real-time notifications
- Edit submission events
- Cache invalidation events
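
For the rate-limiting use case, a sliding window keeps the timestamps of recent requests per client and rejects a request once the window is full. The sketch below does this in memory to show the logic only; it does not talk to Redis, and the limit of two requests per window and the name `make_limiter` are assumptions for the example.

```perl
use strict;
use warnings;

# Build a limiter that allows at most $limit requests per $window seconds.
sub make_limiter {
    my ($limit, $window) = @_;
    my %hits;   # client key => timestamps of accepted requests
    return sub {
        my ($key, $now) = @_;
        my $times = $hits{$key} ||= [];
        # Forget requests that have slid out of the window.
        shift @$times while @$times && $times->[0] <= $now - $window;
        return 0 if @$times >= $limit;
        push @$times, $now;
        return 1;
    };
}

my $allow = make_limiter(2, 1);   # 2 requests per 1-second window
for my $t (0.0, 0.5, 0.9, 1.0) {
    printf "t=%.1f %s\n", $t, $allow->('client-a', $t) ? 'allowed' : 'rejected';
}
```

A Redis-backed version would keep the same timestamps in a per-client key so that all workers share one window.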

### Connection Pooling

**Library:** Redis.pm with connection pooling

**Pool Size:** 10 connections per worker

**Reconnection:** Automatic reconnection on connection loss

## HTTP Client

### LWP::UserAgent

**Purpose:** HTTP client for external service communication

**Configuration:**
```perl
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(
    agent => 'MusicBrainz/1.0 (https://musicbrainz.org)',
    timeout => 30,
    max_redirect => 5,
);
```

**User-Agent:** Always identifies as MusicBrainz with contact URL

**Timeout:** 30 seconds default

**Redirects:** Follow up to 5 redirects

**SSL Verification:** Enabled by default

### Rate Limiting

**External Services:** Respect rate limits via delays

**Wikipedia API:** 1 request per second (recommended)

**Wikidata API:** 1 request per second (recommended)

**Implementation:**
```perl
use Time::HiRes qw(sleep time);  # sub-second versions of both sleep() and time()

my $last_request_time = 0;

sub rate_limited_request {
    my ($url) = @_;

    # Wait out the remainder of the one-second interval, if any
    my $elapsed = time() - $last_request_time;
    if ($elapsed < 1) {
        sleep(1 - $elapsed);
    }

    my $response = $ua->get($url);
    $last_request_time = time();

    return $response;
}
```

### Error Handling

**Retry Logic:** Exponential backoff for transient errors

**Timeouts:** Fail gracefully on timeout

**Logging:** Log all external service errors to Sentry

**Example:**
```perl
my $response;
my $retries = 3;

for my $attempt (1..$retries) {
    # LWP::UserAgent reports failures (including timeouts) through the
    # response object instead of dying, so check is_success directly
    $response = $ua->get($url);
    last if $response->is_success;

    warn "Request failed (attempt $attempt): " . $response->status_line;
    sleep(2 ** $attempt) if $attempt < $retries;  # Exponential backoff
}
```
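
The same retry pattern can be factored into a reusable helper. The version below is a sketch under stated assumptions: `with_retries` and its named arguments are hypothetical, the request and the success check are passed in as code references, and the sleep function is injectable so the backoff schedule can be exercised without waiting.

```perl
use strict;
use warnings;

# Call $arg{request} up to $arg{attempts} times, doubling the delay after
# each failure (base, 2*base, 4*base, ...). Returns the last result.
sub with_retries {
    my (%arg) = @_;
    my $sleep = $arg{sleep} || sub { sleep $_[0] };
    my $result;
    for my $attempt (1 .. $arg{attempts}) {
        $result = $arg{request}->();
        return $result if $arg{is_success}->($result);
        $sleep->($arg{base_delay} * 2 ** ($attempt - 1)) if $attempt < $arg{attempts};
    }
    return $result;
}

# Demonstration with a fake request that fails twice, then succeeds.
my $calls = 0;
my @delays;
my $result = with_retries(
    attempts   => 3,
    base_delay => 1,
    request    => sub { ++$calls < 3 ? 'error' : 'ok' },
    is_success => sub { $_[0] eq 'ok' },
    sleep      => sub { push @delays, $_[0] },   # record delays instead of sleeping
);
print "result=$result calls=$calls delays=@delays\n";
```

With `LWP::UserAgent`, `request` would wrap `$ua->get($url)` and `is_success` would call `$_[0]->is_success`.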