feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
Alexander
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
# GraphBrainz Integrations
## Integration Architecture
GraphBrainz integrates with five external APIs. MusicBrainz is the core data source, and the other four are built-in extensions that share a unified extension system:
| Integration | Type | Authentication | Rate Limit |
|-------------|------|----------------|------------|
| MusicBrainz | Core | None | 5 req/5.5s |
| Cover Art Archive | Built-in | None | 10 req/s |
| fanart.tv | Built-in | API key | 10 req/s |
| MediaWiki | Built-in | None | 10 req/s |
| TheAudioDB | Built-in | API key | 10 req/s |
External extensions (separate npm packages):
| Extension | Package | Authentication |
|-----------|---------|----------------|
| Last.fm | graphbrainz-extension-lastfm | API key |
| Discogs | graphbrainz-extension-discogs | API key |
| Spotify | graphbrainz-extension-spotify | OAuth |
## MusicBrainz REST API
### Overview
| Property | Value |
|----------|-------|
| Base URL | http://musicbrainz.org/ws/2/ |
| Protocol | REST (JSON) |
| Authentication | None |
| Rate Limit | 5 requests per 5.5 seconds |
| Documentation | https://musicbrainz.org/doc/MusicBrainz_API |
### Operations
#### Lookup
Retrieve a single entity by its MBID.
**Endpoint Pattern**:
```
GET /ws/2/{entity}/{mbid}?inc={relationships}&fmt=json
```
**Supported Entities**:
- area, artist, collection, event, instrument, label, place, recording, release, release-group, series, url, work
**Example**:
```
GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases+recordings&fmt=json
```
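The following is a minimal sketch of building a lookup URL from the pattern above. The `lookupURL` helper, its entity whitelist, and the loose MBID check are illustrative assumptions, not GraphBrainz code. The default base URL is taken from the Overview table, and multiple `inc` values are joined with `+` as in the example.

```javascript
// Entities that support lookup by MBID, per the list above.
const LOOKUP_ENTITIES = new Set([
  'area', 'artist', 'collection', 'event', 'instrument', 'label', 'place',
  'recording', 'release', 'release-group', 'series', 'url', 'work'
]);

// Loose MBID check: a UUID in canonical 8-4-4-4-12 hex form.
const MBID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: build GET /ws/2/{entity}/{mbid}?inc=...&fmt=json
function lookupURL(entity, mbid, inc = [], baseURL = 'http://musicbrainz.org/ws/2/') {
  if (!LOOKUP_ENTITIES.has(entity)) {
    throw new Error(`Unsupported lookup entity: ${entity}`);
  }
  if (!MBID_RE.test(mbid)) {
    throw new Error(`Not an MBID: ${mbid}`);
  }
  // inc values are joined with "+"; the parameter is omitted when empty.
  const params = inc.length > 0 ? `inc=${inc.join('+')}&fmt=json` : 'fmt=json';
  return `${baseURL}${entity}/${mbid}?${params}`;
}

console.log(lookupURL('artist', '5b11f4ce-a62d-471e-81fc-a69a8278c7da', ['releases', 'recordings']));
// → http://musicbrainz.org/ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases+recordings&fmt=json
```

Validating the entity and MBID locally means a malformed request never consumes one of the rate-limited slots.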
#### Browse
Retrieve the entities linked to a parent entity (for example, all releases by one artist).
**Endpoint Pattern**:
```
GET /ws/2/{entity}?{parent-entity}={mbid}&limit={limit}&offset={offset}&inc={relationships}&fmt=json
```
**Example**:
```
GET /ws/2/release?artist=5b11f4ce-a62d-471e-81fc-a69a8278c7da&limit=25&offset=0&fmt=json
```
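Browse results are paged with `limit` and `offset`. The sketch below uses two hypothetical helpers, `browseURL` and `nextOffset`. It assumes the caller reads the total from the `<entity>-count` field of each browse response (for example `release-count`). It also clamps `limit` to MusicBrainz's maximum page size of 100.

```javascript
const MAX_LIMIT = 100; // MusicBrainz caps browse/search page size at 100

// Hypothetical helper: build GET /ws/2/{entity}?{parent}={mbid}&limit=..&offset=..
function browseURL(entity, parent, mbid, { limit = 25, offset = 0 } = {}, baseURL = 'http://musicbrainz.org/ws/2/') {
  const size = Math.min(Math.max(limit, 1), MAX_LIMIT);
  return `${baseURL}${entity}?${parent}=${mbid}&limit=${size}&offset=${offset}&fmt=json`;
}

// Offset of the next page, or null once offset + limit reaches the total.
// `count` is the "<entity>-count" value from the previous browse response.
function nextOffset(offset, limit, count) {
  const next = offset + limit;
  return next < count ? next : null;
}

// Offsets needed to walk 60 releases in pages of 25.
const offsets = [];
for (let o = 0; o !== null; o = nextOffset(o, 25, 60)) {
  offsets.push(o);
}
console.log(offsets); // → [ 0, 25, 50 ]
```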
#### Search
Lucene-based full-text search.
**Endpoint Pattern**:
```
GET /ws/2/{entity}?query={lucene-query}&limit={limit}&offset={offset}&fmt=json
```
**Example**:
```
GET /ws/2/artist?query=artist:Radiohead%20AND%20country:GB&limit=25&fmt=json
```
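The following sketch assembles a search query from field/value pairs. It assumes every value should be matched as a Lucene phrase, which avoids escaping most operators: inside double quotes only `\` and `"` need a backslash. `phrase`, `searchQuery` and `searchURL` are hypothetical helpers. The encoded URL is equivalent to the example above, even though `:` and `"` are percent-encoded.

```javascript
// Quote a value as a Lucene phrase; only backslash and double quote
// need escaping inside the quotes.
function phrase(value) {
  return `"${String(value).replace(/[\\"]/g, '\\$&')}"`;
}

// { artist: 'Radiohead', country: 'GB' } → artist:"Radiohead" AND country:"GB"
function searchQuery(fields) {
  return Object.entries(fields)
    .map(([field, value]) => `${field}:${phrase(value)}`)
    .join(' AND ');
}

// Hypothetical helper: build GET /ws/2/{entity}?query=...&limit=..&offset=..
function searchURL(entity, fields, { limit = 25, offset = 0 } = {}, baseURL = 'http://musicbrainz.org/ws/2/') {
  const query = encodeURIComponent(searchQuery(fields));
  return `${baseURL}${entity}?query=${query}&limit=${limit}&offset=${offset}&fmt=json`;
}

console.log(searchQuery({ artist: 'Radiohead', country: 'GB' }));
// → artist:"Radiohead" AND country:"GB"
```

Quoting also keeps multi-word names such as `Sigur Rós` bound to their field. Unquoted, only the first word would be searched in `artist`.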
### Rate Limiting
**Policy**: 5 requests per 5.5 seconds (0.909 req/s average)
**Implementation**:
```javascript
// At most 5 requests per 5,500 ms, one request in flight at a time.
const musicbrainzLimiter = new RateLimiter({
  limit: 5,
  interval: 5500,
  concurrency: 1
});
```
**Compliance Strategy**:
- Token bucket algorithm
- Sequential requests (no parallelization)
- Priority queue for request ordering
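The limiter implementation is not shown above. As a rough sketch of the policy, the hypothetical `SlidingWindowLimiter` below keeps a log of recent acquisition times. It admits a new request only when fewer than `limit` acquisitions fall inside the last `interval` ms. This is a sliding-window log, not the token bucket named above. It serves waiters in FIFO order without priorities and serializes only slot acquisition, not request completion. Its `acquire()` method matches the call made by the base `Client` class.

```javascript
// Sliding-window limiter: at most `limit` acquisitions in any `interval` ms.
// Waiters are served strictly in FIFO order (no priorities in this sketch).
class SlidingWindowLimiter {
  constructor({ limit, interval }) {
    this.limit = limit;
    this.interval = interval;
    this.timestamps = []; // times of recent acquisitions, oldest first
    this.queue = Promise.resolve(); // chains waiters so they run one at a time
  }

  acquire() {
    const turn = this.queue.then(() => this.waitForSlot());
    // Keep the chain alive even if a waiter were to reject.
    this.queue = turn.catch(() => {});
    return turn;
  }

  async waitForSlot() {
    for (;;) {
      const now = Date.now();
      // Forget acquisitions that have left the window.
      while (this.timestamps.length > 0 && now - this.timestamps[0] >= this.interval) {
        this.timestamps.shift();
      }
      if (this.timestamps.length < this.limit) {
        this.timestamps.push(now);
        return;
      }
      // Sleep until the oldest acquisition leaves the window, then re-check.
      const wait = this.interval - (now - this.timestamps[0]);
      await new Promise(resolve => setTimeout(resolve, wait));
    }
  }
}

// MusicBrainz policy from the table above: 5 requests per 5.5 seconds.
const musicbrainzLimiter = new SlidingWindowLimiter({ limit: 5, interval: 5500 });
```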
### Local Mirror Support
To avoid the public server's rate limit, GraphBrainz can be pointed at a local MusicBrainz mirror:
```bash
MUSICBRAINZ_BASE_URL=http://localhost:5000/ws/2/
```
**Benefits**:
- No rate limiting
- Reduced latency
- Offline operation
- Full dataset access
**Setup**: See https://musicbrainz.org/doc/MusicBrainz_Server/Setup
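The following sketch shows one way to honor that variable, using a hypothetical `musicbrainzBaseURL` helper. It falls back to the public endpoint and forces a trailing slash so that relative paths such as `artist/{mbid}` resolve under `/ws/2/`. The default value comes from the Overview table; the helper itself is an assumption.

```javascript
const DEFAULT_MUSICBRAINZ_BASE_URL = 'http://musicbrainz.org/ws/2/';

// Resolve the MusicBrainz endpoint, preferring a local mirror when configured.
// A trailing slash is enforced so relative paths like "artist/<mbid>" resolve
// under /ws/2/ instead of replacing its last path segment.
function musicbrainzBaseURL(env = process.env) {
  const raw = env.MUSICBRAINZ_BASE_URL || DEFAULT_MUSICBRAINZ_BASE_URL;
  const url = new URL(raw); // throws on malformed values
  if (!url.pathname.endsWith('/')) {
    url.pathname += '/';
  }
  return url.toString();
}

// With MUSICBRAINZ_BASE_URL=http://localhost:5000/ws/2 (no trailing slash):
const base = musicbrainzBaseURL({ MUSICBRAINZ_BASE_URL: 'http://localhost:5000/ws/2' });
console.log(new URL('artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da', base).href);
// → http://localhost:5000/ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da
```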
## Cover Art Archive
### Overview
| Property | Value |
|----------|-------|
| Base URL | http://coverartarchive.org/ |
| Protocol | REST (JSON) |
| Authentication | None |
| Rate Limit | 10 requests per second |
| Documentation | https://musicbrainz.org/doc/Cover_Art_Archive/API |
### Purpose
Provides album artwork and thumbnails for MusicBrainz releases.
### Schema Extension
Adds `coverArtArchive` field to `Release` type:
```graphql
extend type Release {
  coverArtArchive: CoverArtArchiveRelease
}

type CoverArtArchiveRelease {
  front: Boolean
  back: Boolean
  artwork: Boolean
  count: Int
  release: String
  images: [CoverArtArchiveImage]
}

type CoverArtArchiveImage {
  fileID: String
  image: String
  thumbnails: CoverArtArchiveThumbnails
  front: Boolean
  back: Boolean
  types: [String]
  edit: Int
  approved: Boolean
  comment: String
}

type CoverArtArchiveThumbnails {
  small: String # 250px
  large: String # 500px
}
```
### API Endpoints
#### Release Cover Art
**Endpoint**:
```
GET /release/{mbid}
```
**Response**:
```json
{
  "images": [
    {
      "id": "12345",
      "image": "http://coverartarchive.org/release/{mbid}/12345.jpg",
      "thumbnails": {
        "small": "http://coverartarchive.org/release/{mbid}/12345-250.jpg",
        "large": "http://coverartarchive.org/release/{mbid}/12345-500.jpg"
      },
      "front": true,
      "back": false,
      "types": ["Front"],
      "approved": true
    }
  ],
  "release": "http://musicbrainz.org/release/{mbid}"
}
```
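The following sketch maps this response onto the `CoverArtArchiveRelease` type defined above. Field names come from the schema and the sample response. Deriving the release-level `front`, `back`, `artwork` and `count` values from the image list is an assumption of this sketch, and `toCoverArtArchiveRelease` is a hypothetical name.

```javascript
// Map a Cover Art Archive /release/{mbid} response onto the
// CoverArtArchiveRelease shape. Release-level flags are derived from the
// image list (an assumption of this sketch).
function toCoverArtArchiveRelease(json) {
  const images = (json.images ?? []).map(img => ({
    fileID: String(img.id),
    image: img.image,
    thumbnails: {
      small: img.thumbnails?.small ?? null,
      large: img.thumbnails?.large ?? null
    },
    front: Boolean(img.front),
    back: Boolean(img.back),
    types: img.types ?? [],
    edit: img.edit ?? null,
    approved: Boolean(img.approved),
    comment: img.comment ?? null
  }));
  return {
    front: images.some(img => img.front),
    back: images.some(img => img.back),
    artwork: images.length > 0,
    count: images.length,
    release: json.release,
    images
  };
}
```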
#### Front Cover (Direct)
**Endpoint**:
```
GET /release/{mbid}/front
GET /release/{mbid}/front-250 # Small thumbnail
GET /release/{mbid}/front-500 # Large thumbnail
```
Responds with an HTTP 307 redirect to the image file (JPEG or PNG). Clients that follow redirects receive the image bytes.
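The following hypothetical helper builds these direct URLs. It restricts the size suffix to the two thumbnail widths listed above; the default base URL is taken from the Overview table.

```javascript
// Build a direct front-cover URL: size omitted → full image,
// 250 or 500 → the matching thumbnail endpoint.
function frontCoverURL(mbid, size, baseURL = 'http://coverartarchive.org/') {
  if (size !== undefined && size !== 250 && size !== 500) {
    throw new RangeError(`Unsupported thumbnail size: ${size}`);
  }
  const suffix = size === undefined ? 'front' : `front-${size}`;
  return `${baseURL}release/${mbid}/${suffix}`;
}

console.log(frontCoverURL('f0c8b1e5-c3b6-46c0-9641-25fd3c00e56a', 250));
// → http://coverartarchive.org/release/f0c8b1e5-c3b6-46c0-9641-25fd3c00e56a/front-250
```

Node's built-in `fetch` follows HTTP redirects by default, so `await (await fetch(frontCoverURL(mbid, 250))).arrayBuffer()` yields the image bytes.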
### Configuration
| Environment Variable | Default | Purpose |
|---------------------|---------|---------|
| COVERART_CACHE_SIZE | 8192 | LRU cache size (max entries) |
| COVERART_CACHE_TTL | 86400000 | Cache TTL in milliseconds (1 day) |
### Example Query
```graphql
{
  lookup {
    release(mbid: "f0c8b1e5-c3b6-46c0-9641-25fd3c00e56a") {
      title
      coverArtArchive {
        front
        back
        count
        images {
          image
          thumbnails {
            large
          }
          types
          front
        }
      }
    }
  }
}
```
### Implementation
**File**: `src/extensions/cover-art-archive/index.js`
**Client**: Custom HTTP client extending base `Client` class
**Resolver**:
```javascript
const resolvers = {
  Release: {
    // Loaded through the extension's DataLoader, keyed by release MBID.
    coverArtArchive(release, args, context) {
      return context.coverArtArchive.loader.load(release.id);
    }
  }
};
```
## fanart.tv
### Overview
| Property | Value |
|----------|-------|
| Base URL | http://webservice.fanart.tv/v3/ |
| Protocol | REST (JSON) |
| Authentication | API key (required) |
| Rate Limit | 10 requests per second |
| Documentation | https://fanart.tv/api-docs/ |
### Purpose
Provides high-quality artist images: backgrounds, banners, logos, thumbnails.
### Schema Extension
Adds `fanArt` field to `Artist` type:
```graphql
extend type Artist {
  fanArt: FanArtImages
}

type FanArtImages {
  backgrounds: [FanArtImage]
  banners: [FanArtImage]
  logos: [FanArtLabelImage]
  logosHD: [FanArtLabelImage]
  thumbnails: [FanArtImage]
}

type FanArtImage {
  imageID: String
  url: String
  likes: Int
}

type FanArtLabelImage {
  imageID: String
  url: String
  likes: Int
  color: String
}
```
### API Endpoints
#### Artist Images
**Endpoint**:
```
GET /music/{mbid}?api_key={key}
```
**Response**:
```json
{
  "name": "Radiohead",
  "mbid_id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "artistbackground": [
    {
      "id": "12345",
      "url": "https://assets.fanart.tv/fanart/music/5b11f4ce.../artistbackground/...",
      "likes": "42"
    }
  ],
  "hdmusiclogo": [
    {
      "id": "67890",
      "url": "https://assets.fanart.tv/fanart/music/5b11f4ce.../hdmusiclogo/...",
      "likes": "128",
      "colour": "FFFFFF"
    }
  ],
  "artistthumb": [...],
  "musicbanner": [...]
}
```
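The following sketch maps the fanart.tv response onto the `FanArtImages` type above. The key mapping is inferred from the field names and is an assumption:

- `artistbackground` to `backgrounds`
- `musicbanner` to `banners`
- `musiclogo` and `hdmusiclogo` to `logos` and `logosHD`
- `artistthumb` to `thumbnails`

fanart.tv returns `likes` as strings, so they are parsed into integers. All function names are hypothetical.

```javascript
// One fanart.tv image entry → FanArtImage (likes arrive as strings).
function toImage(entry) {
  return {
    imageID: entry.id,
    url: entry.url,
    likes: Number.parseInt(entry.likes, 10) || 0
  };
}

// Logo entries may carry a British-spelled "colour" → FanArtLabelImage.color.
function toLogo(entry) {
  return { ...toImage(entry), color: entry.colour ?? null };
}

// Assumed mapping from fanart.tv keys to the FanArtImages fields.
function toFanArtImages(json) {
  return {
    backgrounds: (json.artistbackground ?? []).map(toImage),
    banners: (json.musicbanner ?? []).map(toImage),
    logos: (json.musiclogo ?? []).map(toLogo),
    logosHD: (json.hdmusiclogo ?? []).map(toLogo),
    thumbnails: (json.artistthumb ?? []).map(toImage)
  };
}

// GET /music/{mbid}?api_key={key}
function fanArtURL(mbid, apiKey, baseURL = 'http://webservice.fanart.tv/v3/') {
  return `${baseURL}music/${mbid}?api_key=${encodeURIComponent(apiKey)}`;
}
```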
### Configuration
| Environment Variable | Required | Default | Purpose |
|---------------------|----------|---------|---------|
| FANART_API_KEY | Yes | - | API authentication |
| FANART_CACHE_SIZE | No | 8192 | LRU cache size (max entries) |
| FANART_CACHE_TTL | No | 86400000 | Cache TTL in milliseconds (1 day) |
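The following is a minimal sketch of reading these variables. It assumes every extension follows the same `<PREFIX>_API_KEY`, `<PREFIX>_CACHE_SIZE`, `<PREFIX>_CACHE_TTL` naming seen in the configuration tables. The hypothetical `readExtensionConfig` applies the documented defaults and fails fast when a required key is missing. It also rejects non-numeric sizes rather than passing `NaN` to the cache.

```javascript
const DEFAULT_CACHE_SIZE = 8192;
const DEFAULT_CACHE_TTL = 86400000; // 1 day in milliseconds

// Read <PREFIX>_API_KEY, <PREFIX>_CACHE_SIZE and <PREFIX>_CACHE_TTL from an
// environment object, applying the defaults from the table above.
function readExtensionConfig(env, prefix, { requireApiKey = false } = {}) {
  const apiKey = env[`${prefix}_API_KEY`];
  if (requireApiKey && !apiKey) {
    throw new Error(`${prefix}_API_KEY is required`);
  }
  const positiveInt = (name, fallback) => {
    const raw = env[`${prefix}_${name}`];
    if (raw === undefined || raw === '') return fallback;
    const n = Number(raw);
    if (!Number.isInteger(n) || n <= 0) {
      throw new Error(`${prefix}_${name} must be a positive integer, got "${raw}"`);
    }
    return n;
  };
  return {
    apiKey,
    cacheSize: positiveInt('CACHE_SIZE', DEFAULT_CACHE_SIZE),
    cacheTTL: positiveInt('CACHE_TTL', DEFAULT_CACHE_TTL)
  };
}

// Usage: readExtensionConfig(process.env, 'FANART', { requireApiKey: true })
```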
### Example Query
```graphql
{
  lookup {
    artist(mbid: "5b11f4ce-a62d-471e-81fc-a69a8278c7da") {
      name
      fanArt {
        backgrounds {
          url
          likes
        }
        logosHD {
          url
          color
          likes
        }
        banners {
          url
        }
      }
    }
  }
}
```
### Implementation
**File**: `src/extensions/fanart/index.js`
**Client**: `FanArtClient` extending base `Client`
**Resolver**:
```javascript
const resolvers = {
  Artist: {
    // Loaded through the extension's DataLoader, keyed by artist MBID.
    fanArt(artist, args, context) {
      return context.fanart.loader.load(artist.id);
    }
  }
};
```
## MediaWiki
### Overview
| Property | Value |
|----------|-------|
| Base URL | https://musicbrainz.org/w/api.php |
| Protocol | MediaWiki API |
| Authentication | None |
| Rate Limit | 10 requests per second |
| Documentation | https://www.mediawiki.org/wiki/API |
### Purpose
Retrieves images from MusicBrainz Wiki for artists, including EXIF metadata and license information.
### Schema Extension
Adds `mediaWikiImages` field to `Artist` type:
```graphql
extend type Artist {
  mediaWikiImages: [MediaWikiImage]
}

type MediaWikiImage {
  url: String
  descriptionURL: String
  title: String
  user: String
  size: Int
  width: Int
  height: Int
  canonicalTitle: String
  objectName: String
  descriptionShortURL: String
  metadata: [MediaWikiImageMetadata]
}

type MediaWikiImageMetadata {
  name: String
  value: String
}
```
### API Endpoints
#### Image Search
**Endpoint**:
```
GET /w/api.php?action=query&titles={artist-name}&prop=images&format=json
```
**Response**:
```json
{
  "query": {
    "pages": {
      "12345": {
        "title": "Radiohead",
        "images": [
          {
            "title": "File:Radiohead.jpg"
          }
        ]
      }
    }
  }
}
```
#### Image Info
**Endpoint**:
```
GET /w/api.php?action=query&titles=File:{filename}&prop=imageinfo&iiprop=url|size|metadata|user&format=json
```
**Response**:
```json
{
  "query": {
    "pages": {
      "67890": {
        "imageinfo": [
          {
            "url": "https://musicbrainz.org/w/images/...",
            "descriptionurl": "https://musicbrainz.org/w/File:...",
            "width": 1200,
            "height": 800,
            "size": 245678,
            "user": "WikiUser",
            "metadata": [
              { "name": "DateTime", "value": "2020:01:15 10:30:00" },
              { "name": "Artist", "value": "Photographer Name" }
            ]
          }
        ]
      }
    }
  }
}
```
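The following sketch issues the image-info request and flattens the response into objects shaped like `MediaWikiImage`. `imageInfoURL` and `toMediaWikiImages` are hypothetical. Fields the sample response does not include (such as `canonicalTitle`) are left out. `URLSearchParams` percent-encodes the `|` separators, which MediaWiki decodes.

```javascript
// GET /w/api.php?action=query&titles=File:...&prop=imageinfo&iiprop=...&format=json
function imageInfoURL(fileTitle, baseURL = 'https://musicbrainz.org/w/api.php') {
  const params = new URLSearchParams({
    action: 'query',
    titles: fileTitle,
    prop: 'imageinfo',
    iiprop: 'url|size|metadata|user',
    format: 'json'
  });
  return `${baseURL}?${params}`;
}

// Flatten query.pages[*].imageinfo[*] into MediaWikiImage-like objects;
// fields missing from the response become null.
function toMediaWikiImages(json) {
  const pages = Object.values(json.query?.pages ?? {});
  return pages.flatMap(page =>
    (page.imageinfo ?? []).map(info => ({
      url: info.url ?? null,
      descriptionURL: info.descriptionurl ?? null,
      title: page.title ?? null,
      user: info.user ?? null,
      size: info.size ?? null,
      width: info.width ?? null,
      height: info.height ?? null,
      metadata: (info.metadata ?? []).map(({ name, value }) => ({ name, value: String(value) }))
    }))
  );
}
```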
### Configuration
| Environment Variable | Default | Purpose |
|---------------------|---------|---------|
| MEDIAWIKI_CACHE_SIZE | 8192 | LRU cache size (max entries) |
| MEDIAWIKI_CACHE_TTL | 86400000 | Cache TTL in milliseconds (1 day) |
### Example Query
```graphql
{
  lookup {
    artist(mbid: "5b11f4ce-a62d-471e-81fc-a69a8278c7da") {
      name
      mediaWikiImages {
        url
        width
        height
        user
        metadata {
          name
          value
        }
      }
    }
  }
}
```
### Implementation
**File**: `src/extensions/mediawiki/index.js`
**Client**: `MediaWikiClient` extending base `Client`
**Resolver**:
```javascript
const resolvers = {
  Artist: {
    // Keyed by artist name, which is used as the wiki page title.
    mediaWikiImages(artist, args, context) {
      return context.mediawiki.loader.load(artist.name);
    }
  }
};
```
## TheAudioDB
### Overview
| Property | Value |
|----------|-------|
| Base URL | http://www.theaudiodb.com/api/v1/json/ |
| Protocol | REST (JSON) |
| Authentication | API key (required) |
| Rate Limit | 10 requests per second |
| Documentation | https://www.theaudiodb.com/api_guide.php |
### Purpose
Provides artist biographies, logos, and additional metadata.
### Schema Extension
Adds `theAudioDB` field to `Artist` type:
```graphql
extend type Artist {
  theAudioDB: TheAudioDBArtist
}

type TheAudioDBArtist {
  artistID: String
  biography: String
  biographyEN: String
  memberCount: Int
  banner: String
  logo: String
  thumbnail: String
  fanArt: [TheAudioDBImage]
}

type TheAudioDBImage {
  url: String
}
```
### API Endpoints
#### Artist by MBID
**Endpoint**:
```
GET /{api-key}/artist-mb.php?i={mbid}
```
**Response**:
```json
{
  "artists": [
    {
      "idArtist": "111239",
      "strArtist": "Radiohead",
      "strArtistMBID": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
      "strBiographyEN": "Radiohead are an English rock band...",
      "intMembers": "5",
      "strArtistBanner": "https://www.theaudiodb.com/images/media/artist/banner/...",
      "strArtistLogo": "https://www.theaudiodb.com/images/media/artist/logo/...",
      "strArtistThumb": "https://www.theaudiodb.com/images/media/artist/thumb/...",
      "strArtistFanart": "https://www.theaudiodb.com/images/media/artist/fanart/...",
      "strArtistFanart2": "https://www.theaudiodb.com/images/media/artist/fanart2/...",
      "strArtistFanart3": "https://www.theaudiodb.com/images/media/artist/fanart3/..."
    }
  ]
}
```
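The following sketch maps this payload onto `TheAudioDBArtist`. TheAudioDB encodes numbers as strings (`"intMembers": "5"`), and the sketch assumes that empty strings and `null` both mean "missing". Two further choices are assumptions: populating `biography` from the English text, and collecting the three `strArtistFanart*` fields into `fanArt`. The helper names are hypothetical.

```javascript
// Treat undefined, null and "" as missing.
function blankToNull(value) {
  return value === undefined || value === null || value === '' ? null : value;
}

// Map the first entry of an artist-mb.php response onto TheAudioDBArtist.
function toTheAudioDBArtist(json) {
  const a = json.artists?.[0];
  if (!a) return null; // unknown MBID: "artists" is null or empty
  const members = Number.parseInt(a.intMembers, 10);
  return {
    artistID: blankToNull(a.idArtist),
    biography: blankToNull(a.strBiographyEN), // English fallback (assumption)
    biographyEN: blankToNull(a.strBiographyEN),
    memberCount: Number.isNaN(members) ? null : members,
    banner: blankToNull(a.strArtistBanner),
    logo: blankToNull(a.strArtistLogo),
    thumbnail: blankToNull(a.strArtistThumb),
    // strArtistFanart, strArtistFanart2, strArtistFanart3 → [{ url }]
    fanArt: ['strArtistFanart', 'strArtistFanart2', 'strArtistFanart3']
      .map(key => blankToNull(a[key]))
      .filter(url => url !== null)
      .map(url => ({ url }))
  };
}

// GET /{api-key}/artist-mb.php?i={mbid}
function theAudioDBURL(apiKey, mbid, baseURL = 'http://www.theaudiodb.com/api/v1/json/') {
  return `${baseURL}${encodeURIComponent(apiKey)}/artist-mb.php?i=${mbid}`;
}
```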
### Configuration
| Environment Variable | Required | Default | Purpose |
|---------------------|----------|---------|---------|
| THEAUDIODB_API_KEY | Yes | - | API authentication |
| THEAUDIODB_CACHE_SIZE | No | 8192 | LRU cache size (max entries) |
| THEAUDIODB_CACHE_TTL | No | 86400000 | Cache TTL in milliseconds (1 day) |
### Example Query
```graphql
{
  lookup {
    artist(mbid: "5b11f4ce-a62d-471e-81fc-a69a8278c7da") {
      name
      theAudioDB {
        biographyEN
        memberCount
        logo
        banner
        fanArt {
          url
        }
      }
    }
  }
}
```
### Implementation
**File**: `src/extensions/theaudiodb/index.js`
**Client**: `TheAudioDBClient` extending base `Client`
**Resolver**:
```javascript
const resolvers = {
  Artist: {
    // Loaded through the extension's DataLoader, keyed by artist MBID.
    theAudioDB(artist, args, context) {
      return context.theaudiodb.loader.load(artist.id);
    }
  }
};
```
## Extension Pattern
All extensions follow a consistent pattern for integration.
### Extension Interface
```javascript
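/**
 * Shape of an extension object.
 * @typedef {Object} Extension
 * @property {string} name - Extension identifier
 * @property {string} description - Human-readable description
 * @property {(context: Object, options: Object) => Object} extendContext -
 *   Adds the HTTP client, DataLoader and cache to the request context
 * @property {(schema: Object, options: Object) => Object} extendSchema -
 *   Adds GraphQL types and resolvers
 */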
```
### Context Extension
```javascript
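import DataLoader from 'dataloader';
import { LRUCache as LRU } from 'lru-cache'; // lru-cache v10+

// extensionName, ExtensionClient and batchFetch are placeholders for the
// concrete extension's context key, client class and batch function.
const extension = {
  extendContext(context, options) {
    // Build the cache first so the client can read and write it in get().
    const cache = new LRU({
      max: options.cacheSize ?? 8192,
      ttl: options.cacheTTL ?? 86400000 // 1 day, in milliseconds
    });
    const client = new ExtensionClient({
      baseURL: options.baseURL,
      apiKey: options.apiKey,
      timeout: options.timeout,
      cache,
      limiter: options.limiter
    });
    // DataLoader still batches keys requested in the same tick, but its
    // per-request memoization is off: responses are cached in the LRU instead.
    // batchFetch must resolve to one result per key, in key order.
    const loader = new DataLoader(
      keys => batchFetch(client, keys),
      { cache: false }
    );
    return {
      ...context,
      [extensionName]: { client, loader, cache }
    };
  }
};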
```
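To make the `{ cache: false }` choice concrete, here is a hand-rolled stand-in for what `loader.load(key)` does in that mode. Keys requested in the same tick are collected and passed to the batch function in one call, with no memoization between batches. `createBatchLoader` is hypothetical and omits DataLoader features such as per-key errors and maximum batch sizes.

```javascript
// Collect load(key) calls made in the same tick and resolve them with one
// call to batchFn(keys), which must return results in key order.
function createBatchLoader(batchFn) {
  let pending = [];
  return {
    load(key) {
      return new Promise((resolve, reject) => {
        pending.push({ key, resolve, reject });
        if (pending.length === 1) {
          // First key of a new batch: dispatch once the current tick ends.
          process.nextTick(async () => {
            const batch = pending;
            pending = [];
            try {
              const results = await batchFn(batch.map(item => item.key));
              if (results.length !== batch.length) {
                throw new Error('batchFn must return one result per key');
              }
              batch.forEach((item, i) => item.resolve(results[i]));
            } catch (err) {
              batch.forEach(item => item.reject(err));
            }
          });
        }
      });
    }
  };
}
```

Two `load()` calls made by sibling resolvers therefore cost one upstream round trip. Repeated keys in later ticks are served by the LRU cache rather than by the loader.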
### Schema Extension
```javascript
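import { mergeSchemas } from '@graphql-tools/schema';

const extension = {
  // Returns a new schema with the extension's types and resolvers merged in.
  extendSchema(schema, options) {
    const typeDefs = /* GraphQL */ `
      extend type Artist {
        extensionField: ExtensionType
      }

      type ExtensionType {
        field1: String
        field2: Int
      }
    `;
    const resolvers = {
      Artist: {
        // extensionName is a placeholder for the extension's context key.
        extensionField(artist, args, context) {
          return context.extensionName.loader.load(artist.id);
        }
      }
    };
    // graphql-js's own extendSchema() takes a parsed document, not
    // { typeDefs, resolvers }, so merge through graphql-tools instead.
    return mergeSchemas({ schemas: [schema], typeDefs, resolvers });
  }
};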
```
### Client Base Class
All extension clients extend a base `Client` class:
**File**: `src/client.js`
```javascript
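import got from 'got';
import createDebug from 'debug';

class Client {
  constructor(options) {
    // One debug namespace per subclass, e.g. DEBUG=FanArtClient.
    this.debug = createDebug(this.constructor.name);
    this.client = got.extend({
      prefixUrl: options.baseURL, // request paths must not start with "/"
      headers: options.headers,
      responseType: 'json', // so response.body is parsed JSON, not a string
      timeout: { request: options.timeout ?? 30000 },
      retry: { limit: 3 },
      hooks: {
        beforeRequest: [this.beforeRequest.bind(this)],
        afterResponse: [this.afterResponse.bind(this)]
      }
    });
    this.cache = options.cache; // LRU-style object with get()/set()
    this.limiter = options.limiter; // any object exposing an async acquire()
  }

  async get(path, options = {}) {
    const cacheKey = this.getCacheKey(path, options);
    // Compare with undefined so falsy cached values still count as hits.
    const cached = this.cache.get(cacheKey);
    if (cached !== undefined) {
      return cached;
    }
    // Only cache misses consume rate-limit budget.
    await this.limiter.acquire();
    const response = await this.client.get(path, options);
    const data = response.body;
    this.cache.set(cacheKey, data);
    return data;
  }

  getCacheKey(path, options) {
    return `${path}:${JSON.stringify(options)}`;
  }

  beforeRequest(options) {
    this.debug(`${options.method} ${options.url}`);
  }

  afterResponse(response) {
    return response;
  }
}

export default Client;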
```
## External Extensions
### Last.fm
**Package**: `graphbrainz-extension-lastfm`
**Installation**:
```bash
npm install graphbrainz-extension-lastfm
```
**Configuration**:
```bash
LASTFM_API_KEY=your-api-key
```
**Schema Additions**:
- `Artist.lastFM` - Scrobble statistics, similar artists
- `Recording.lastFM` - Play counts, listener counts
### Discogs
**Package**: `graphbrainz-extension-discogs`
**Installation**:
```bash
npm install graphbrainz-extension-discogs
```
**Configuration**:
```bash
DISCOGS_API_KEY=your-api-key
```
**Schema Additions**:
- `Release.discogs` - Marketplace data, pricing, community ratings
### Spotify
**Package**: `graphbrainz-extension-spotify`
**Installation**:
```bash
npm install graphbrainz-extension-spotify
```
**Configuration**:
```bash
SPOTIFY_CLIENT_ID=your-client-id
SPOTIFY_CLIENT_SECRET=your-client-secret
```
**Schema Additions**:
- `Artist.spotify` - Popularity, followers, genres
- `Recording.spotify` - Audio features, preview URLs
## Integration Best Practices
### Error Handling
Each extension implements custom error classes:
```javascript
class FanArtError extends Error {
  constructor(message, statusCode) {
    super(message);
    this.name = 'FanArtError';
    this.statusCode = statusCode;
  }
}
```
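The following sketch shows where such an error is raised: a hypothetical `fetchFanArt` wrapper that turns non-2xx responses into `FanArtError` with the HTTP status attached. The class is repeated so the block runs on its own. Treating 404 as "no images" rather than an error is an assumption. `fetchImpl` defaults to Node's global `fetch` (Node 18+) so it can be swapped in tests.

```javascript
class FanArtError extends Error {
  constructor(message, statusCode) {
    super(message);
    this.name = 'FanArtError';
    this.statusCode = statusCode;
  }
}

// Fetch JSON from fanart.tv and convert HTTP failures into FanArtError.
async function fetchFanArt(url, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url);
  if (res.status === 404) {
    return null; // no images for this MBID (assumption: not an error)
  }
  if (!res.ok) {
    throw new FanArtError(`fanart.tv responded ${res.status} ${res.statusText}`, res.status);
  }
  return res.json();
}
```

Callers can then branch on `err.statusCode`, for example backing off on 429 while treating 401 as a configuration problem with `FANART_API_KEY`.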
### Graceful Degradation
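Extension failures don't break core queries. Extension fields are nullable, so if an upstream API fails, GraphQL resolves that field to `null` and reports the failure in the response's `errors` array, while the core fields still resolve: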
```graphql
{
  lookup {
    artist(mbid: "...") {
      name          # Core field, resolved from MusicBrainz
      fanArt {      # null (plus an entry in "errors") if fanart.tv fails
        backgrounds {
          url
        }
      }
    }
  }
}
```
### Rate Limit Coordination
Each extension has its own rate limiter, so requests to one API never consume another API's quota:
```javascript
const fanartLimiter = new RateLimiter({ limit: 10, interval: 1000 });
const theaudiodbLimiter = new RateLimiter({ limit: 10, interval: 1000 });
```
### Cache Isolation
Each extension also has its own cache, so a burst of entries from one API cannot evict another API's cached responses:
```javascript
const fanartCache = new LRU({ max: 8192 });
const theaudiodbCache = new LRU({ max: 8192 });
```