GraphBrainz Data Layer
Data Source Architecture
GraphBrainz is a stateless proxy with no persistent database. All data originates from external APIs:
| Source | Purpose | Authentication |
|---|---|---|
| MusicBrainz REST API | Core music metadata | None |
| Cover Art Archive | Album artwork | None |
| fanart.tv | Artist images | API key required |
| MediaWiki | Wiki images | None |
| TheAudioDB | Artist biographies | API key required |
MusicBrainz Backend
Base URL Configuration
| Environment Variable | Default | Purpose |
|---|---|---|
| MUSICBRAINZ_BASE_URL | http://musicbrainz.org/ws/2/ | API endpoint |
Local Mirror Support:
MUSICBRAINZ_BASE_URL=http://localhost:5000/ws/2/
Using a local MusicBrainz mirror avoids the public API's rate limits and reduces latency.
API Operations
GraphBrainz uses three MusicBrainz API operations:
1. Lookup
Retrieve a single entity by MBID.
URL Pattern:
GET /ws/2/{entity}/{mbid}?inc={relationships}
Example:
GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases+recordings
Supported Entities: area, artist, collection, event, instrument, label, place, recording, release, release-group, series, url, work
2. Browse
Retrieve entities linked to a parent entity.
URL Pattern:
GET /ws/2/{entity}?{parent-entity}={mbid}&limit={limit}&offset={offset}&inc={relationships}
Example:
GET /ws/2/release?artist=5b11f4ce-a62d-471e-81fc-a69a8278c7da&limit=25&offset=0
Supported Relationships: See API.md for full matrix
3. Search
Lucene-based full-text search.
URL Pattern:
GET /ws/2/{entity}?query={lucene-query}&limit={limit}&offset={offset}
Example:
GET /ws/2/artist?query=artist:Radiohead%20AND%20country:GB&limit=25
Supported Entities: area, artist, event, instrument, label, place, recording, release, release-group, work
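The three URL patterns above can be sketched with Node's built-in `URL` class. This is a minimal illustration, not GraphBrainz's actual request code: the helper names (`buildUrl`, `lookupUrl`, `browseUrl`, `searchUrl`) are hypothetical, and it assumes multiple `inc` values are joined with a space, which the query serializer emits as `+`.

```javascript
// Hypothetical helpers for illustration; these names are not part of GraphBrainz.
const BASE_URL = process.env.MUSICBRAINZ_BASE_URL || 'http://musicbrainz.org/ws/2/';

function buildUrl(path, params) {
  const url = new URL(path, BASE_URL);
  for (const [name, value] of Object.entries(params)) {
    // Skip parameters that were not supplied.
    if (value !== undefined && value !== '') url.searchParams.set(name, String(value));
  }
  return url.toString();
}

// Lookup: a single entity by MBID.
function lookupUrl(entity, mbid, inc = []) {
  return buildUrl(`${entity}/${mbid}`, { inc: inc.join(' ') });
}

// Browse: entities linked to a parent entity, with paging.
function browseUrl(entity, parentEntity, mbid, { limit = 25, offset = 0, inc = [] } = {}) {
  return buildUrl(entity, { [parentEntity]: mbid, limit, offset, inc: inc.join(' ') });
}

// Search: Lucene query string, percent-encoded by the serializer.
function searchUrl(entity, query, { limit = 25, offset = 0 } = {}) {
  return buildUrl(entity, { query, limit, offset });
}
```

For example, `lookupUrl('artist', mbid, ['releases', 'recordings'])` yields a URL whose query string is `inc=releases+recordings`, matching the lookup example above.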
Include Parameters
GraphBrainz resolvers inspect the GraphQL AST to determine which inc parameters are needed:
| Parameter | Description | Entities |
|---|---|---|
| aliases | Alternative names | All |
| annotation | Editorial notes | All |
| tags | User-generated tags | All |
| ratings | User ratings | All |
| genres | Genre classifications | All |
| artist-credits | Artist credit details | Recording, Release, ReleaseGroup, Track |
| artists | Related artists | Recording, Release, ReleaseGroup, Work |
| collections | Collections containing entity | All |
| labels | Record labels | Release |
| recordings | Recordings | Artist, Release, Work |
| releases | Releases | Artist, Label, Recording, ReleaseGroup |
| release-groups | Release groups | Artist, Release |
| works | Musical works | Artist, Recording |
| discids | Disc IDs | Release |
| media | Media/tracks | Release |
| isrcs | ISRC codes | Recording |
| url-rels | URL relationships | All |
| artist-rels | Artist relationships | All |
| label-rels | Label relationships | All |
| recording-rels | Recording relationships | All |
| release-rels | Release relationships | All |
| release-group-rels | Release group relationships | All |
| work-rels | Work relationships | All |
| area-rels | Area relationships | All |
| place-rels | Place relationships | All |
| event-rels | Event relationships | All |
| series-rels | Series relationships | All |
| instrument-rels | Instrument relationships | All |
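As a rough sketch of how selected fields could translate into `inc` values, the snippet below maps a list of field names to the parameters in the table. It is deliberately simplified: `selectedFields` stands in for names read from the GraphQL AST, and both `FIELD_TO_INC` and `incForSelection` are hypothetical, not GraphBrainz internals.

```javascript
// Hypothetical mapping from GraphQL field names to MusicBrainz inc values.
const FIELD_TO_INC = new Map([
  ['aliases', 'aliases'],
  ['tags', 'tags'],
  ['releases', 'releases'],
  ['recordings', 'recordings'],
  ['releaseGroups', 'release-groups'],
  ['artistCredits', 'artist-credits'],
]);

// selectedFields stands in for the field names found in the query's selection set.
function incForSelection(selectedFields) {
  const inc = new Set();
  for (const field of selectedFields) {
    const value = FIELD_TO_INC.get(field);
    // Scalar fields such as "name" need no inc parameter and are skipped.
    if (value !== undefined) inc.add(value);
  }
  // Sorted output keeps the result stable regardless of field order.
  return [...inc].sort();
}
```

Requesting only what the query selects keeps responses small and lets equivalent queries share cache entries.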
Response Format
MusicBrainz returns JSON with entity-specific structure:
{
"id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
"name": "Radiohead",
"sort-name": "Radiohead",
"type": "Group",
"country": "GB",
"life-span": {
"begin": "1985"
},
"releases": [
{
"id": "...",
"title": "OK Computer",
"date": "1997-05-21"
}
]
}
GraphBrainz transforms this into a GraphQL-friendly format (camelCase field names, nested objects).
Two-Level Caching Strategy
Level 1: DataLoader (Per-Request)
Purpose: Request batching and deduplication within a single GraphQL query.
Lifecycle: Created fresh for each GraphQL request, discarded after response.
Implementation:
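import DataLoader from 'dataloader';

// fetchArtist(mbid, inc) is assumed to be defined elsewhere and to return a Promise.
const artistLoader = new DataLoader(
  async (keys) =>
    // Map each key to a value or an Error so one failed lookup does not reject the whole batch.
    Promise.all(keys.map((key) => fetchArtist(key.mbid, key.inc).catch((error) => error))),
  {
    // Keys are objects, so derive a string key; otherwise equal { mbid, inc } pairs are not deduplicated.
    cacheKeyFn: (key) => `${key.mbid}:${key.inc}`
  }
);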
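Benefits:
- Collects load() calls for the same entity type into one batch (the batch function above still issues one lookup per distinct key)
- Deduplicates identical requests within a query
- Avoids refetching an entity that a query references in several places (the N+1 pattern)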
Example:
{
lookup {
release(mbid: "...") {
artists { # Artist 1
name
}
tracks {
artists { # Artist 1 again (deduplicated)
name
}
}
}
}
}
DataLoader ensures Artist 1 is fetched only once.
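The deduplication in this example can be illustrated without the library. The sketch below is a simplified stand-in for DataLoader's per-request memoization (it does no batching); `createRequestLoader` and `fetchFn` are hypothetical names.

```javascript
// Simplified stand-in for DataLoader's per-request cache; not the real library.
function createRequestLoader(fetchFn) {
  const pending = new Map(); // string key -> Promise of the entity

  return {
    load({ mbid, inc = [] }) {
      // Build a string key so structurally equal requests share one entry.
      const key = `${mbid}:${[...inc].sort().join(',')}`;
      if (!pending.has(key)) {
        pending.set(key, fetchFn(mbid, inc));
      }
      return pending.get(key);
    },
  };
}
```

Creating one such loader per GraphQL request means two resolvers asking for the same artist with the same `inc` values share a single promise, so the fetch function runs once.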
Level 2: LRU Cache (Shared)
Purpose: Cross-request caching to reduce API calls.
Lifecycle: Shared across all requests, persists for configured TTL.
Configuration:
| Parameter | Environment Variable | Default |
|---|---|---|
| Size | GRAPHBRAINZ_CACHE_SIZE | 8192 items |
| TTL | GRAPHBRAINZ_CACHE_TTL | 86400000 ms (1 day) |
Implementation:
import LRU from 'lru-cache';
const cache = new LRU({
max: 8192,
ttl: 86400000, // 1 day
updateAgeOnGet: true,
updateAgeOnHas: true
});
Cache Key Strategy:
Keys combine entity type, MBID, and inc parameters to prevent collisions:
artist:5b11f4ce-a62d-471e-81fc-a69a8278c7da:releases,recordings
release:f0c8b1e5-...:artist-credits,labels,media
Different queries for the same entity use different cache keys.
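A key builder along these lines would produce the format shown above. It is a sketch only: `cacheKey` is a hypothetical name, and sorting and de-duplicating the `inc` list is an added assumption (the example keys above are not sorted) so that the same set of parameters always maps to the same key.

```javascript
// Hypothetical helper: builds "entity:mbid:inc1,inc2" cache keys.
function cacheKey(entity, mbid, inc = []) {
  // Sort and de-duplicate so ['releases', 'recordings'] and
  // ['recordings', 'releases'] resolve to the same key (an assumption).
  const normalized = [...new Set(inc)].sort().join(',');
  return `${entity}:${mbid}:${normalized}`;
}
```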
Cache Invalidation:
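- Time-based: Items expire after the TTL (default 1 day). Because updateAgeOnGet and updateAgeOnHas are enabled in the configuration above, each read resets an item's age, so frequently read items can be served for longer than one TTL after they were fetched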
- Size-based: LRU eviction when cache exceeds max size
- No manual invalidation: GraphBrainz assumes MusicBrainz data is relatively stable
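The time-based and size-based rules can be demonstrated with a small Map-based cache. This is an illustrative sketch, not the lru-cache implementation: `TinyLruCache` is a hypothetical class, the `now` option exists only to make the clock injectable, and unlike the configuration above it does not reset an item's age on read.

```javascript
// Illustrative only: a tiny cache showing TTL expiry and LRU eviction.
class TinyLruCache {
  constructor({ max, ttl, now = Date.now }) {
    this.max = max;
    this.ttl = ttl;
    this.now = now;
    this.entries = new Map(); // Map insertion order doubles as recency order
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.ttl) {
      this.entries.delete(key); // time-based expiry
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, storedAt: this.now() });
    if (this.entries.size > this.max) {
      // Size-based eviction: the first key is the least recently used.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}
```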
Cache Hit Ratio:
Typical hit ratios for production workloads:
- Lookup queries: 60-80% (popular artists cached)
- Browse queries: 40-60% (pagination reduces hits)
- Search queries: 10-30% (diverse queries)
Extension Caching
Each extension maintains its own LRU cache with separate configuration.
Cover Art Archive
| Parameter | Environment Variable | Default |
|---|---|---|
| Size | COVERART_CACHE_SIZE | 8192 |
| TTL | COVERART_CACHE_TTL | 86400000 ms |
Cache Key: coverart:{release-mbid}
fanart.tv
| Parameter | Environment Variable | Default |
|---|---|---|
| Size | FANART_CACHE_SIZE | 8192 |
| TTL | FANART_CACHE_TTL | 86400000 ms |
Cache Key: fanart:{artist-mbid}
TheAudioDB
| Parameter | Environment Variable | Default |
|---|---|---|
| Size | THEAUDIODB_CACHE_SIZE | 8192 |
| TTL | THEAUDIODB_CACHE_TTL | 86400000 ms |
Cache Key: theaudiodb:{artist-mbid}
MediaWiki
| Parameter | Environment Variable | Default |
|---|---|---|
| Size | MEDIAWIKI_CACHE_SIZE | 8192 |
| TTL | MEDIAWIKI_CACHE_TTL | 86400000 ms |
Cache Key: mediawiki:{artist-name}
Data Flow
Complete request flow from GraphQL query to response when both caches miss:
1. GraphQL Query Received
↓
2. Resolver Inspects AST
↓ (determines required inc parameters)
3. DataLoader.load({ mbid, inc })
↓
4. Check DataLoader Cache (per-request)
↓ (miss)
5. Check LRU Cache (shared)
↓ (miss)
6. Rate Limiter Queue
↓ (acquire token)
7. HTTP Request via got
↓
8. MusicBrainz API Response
↓
9. Store in LRU Cache
↓
10. Return to DataLoader
↓
11. Return to Resolver
↓
12. GraphQL Response
DataLoader Cache Hit Path:
1. GraphQL Query Received
↓
2. Resolver Inspects AST
↓
3. DataLoader.load({ mbid, inc })
↓
4. Check DataLoader Cache (per-request)
↓ (hit - return immediately)
5. GraphQL Response
Shared Cache Hit Path:
1. GraphQL Query Received
↓
2. Resolver Inspects AST
↓
3. DataLoader.load({ mbid, inc })
↓
4. Check DataLoader Cache (per-request)
↓ (miss)
5. Check LRU Cache (shared)
↓ (hit - return immediately)
6. Store in DataLoader Cache
↓
7. GraphQL Response
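The three paths above reduce to one lookup order: per-request cache, then shared cache, then the API. The sketch below shows that order under stated assumptions: `loadEntity` is a hypothetical function, both caches are anything with Map-like `get`/`set`/`has`, and `fetchFromApi` stands in for the rate-limited HTTP call.

```javascript
// Sketch of the lookup order; names and signatures are assumptions.
async function loadEntity(key, { requestCache, sharedCache, fetchFromApi }) {
  // Level 1: per-request cache (the DataLoader role).
  if (requestCache.has(key)) return requestCache.get(key);

  const promise = (async () => {
    // Level 2: shared cache across requests.
    const cached = sharedCache.get(key);
    if (cached !== undefined) return cached;

    // Miss on both levels: call the API, then populate the shared cache.
    const fresh = await fetchFromApi(key);
    sharedCache.set(key, fresh);
    return fresh;
  })();

  // Store the promise so concurrent loads in this request share it.
  requestCache.set(key, promise);
  return promise;
}
```

A new `requestCache` would be created per GraphQL request, while `sharedCache` lives for the lifetime of the process.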
Rate Limiting
GraphBrainz implements custom rate limiting to comply with API policies.
MusicBrainz Rate Limits
Policy: 5 requests per 5.5 seconds (approximately 0.909 requests/second)
Implementation:
- Token bucket algorithm
- 5 tokens maximum
- Refill rate: 0.909 tokens/second
- Sequential requests (concurrency: 1)
Configuration:
const musicbrainzLimiter = new RateLimiter({
limit: 5,
interval: 5500, // milliseconds
concurrency: 1
});
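For readers unfamiliar with the algorithm, a token bucket with these parameters can be sketched as follows. This is not the `RateLimiter` class referenced above: `TokenBucket` is a hypothetical, simplified version without queuing or concurrency control, and the `now` option is only there to make the clock injectable.

```javascript
// Minimal token bucket; a sketch, not GraphBrainz's RateLimiter.
class TokenBucket {
  constructor({ limit, interval, now = Date.now }) {
    this.capacity = limit;     // maximum tokens, e.g. 5
    this.interval = interval;  // time to refill a full bucket, e.g. 5500 ms
    this.tokens = limit;       // start with a full bucket
    this.now = now;
    this.lastRefill = now();
  }

  refill() {
    const current = this.now();
    const elapsed = current - this.lastRefill;
    // Tokens accrue continuously: capacity / interval per millisecond.
    const gained = (elapsed * this.capacity) / this.interval;
    this.tokens = Math.min(this.capacity, this.tokens + gained);
    this.lastRefill = current;
  }

  // Consumes one token and returns true, or returns false if none is available.
  tryAcquire() {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

With `limit: 5` and `interval: 5500`, a full bucket allows a burst of five requests, after which one token becomes available roughly every 1.1 seconds.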
Extension Rate Limits
Default Policy: 10 requests per second
Implementation:
- Token bucket algorithm
- 10 tokens maximum
- Refill rate: 10 tokens/second
- Parallel requests (concurrency: 5)
Per-Extension Configuration:
| Extension | Rate Limit | Concurrency |
|---|---|---|
| Cover Art Archive | 10 req/s | 5 |
| fanart.tv | 10 req/s | 5 |
| MediaWiki | 10 req/s | 5 |
| TheAudioDB | 10 req/s | 5 |
Priority Queue
Requests are queued with priority levels when the rate limit is reached:
| Priority | Query Type | Rationale |
|---|---|---|
| High | Lookup | Direct MBID access, user-initiated |
| Medium | Browse | Relationship traversal, pagination |
| Low | Search | Full-text search, exploratory |
Higher priority requests are processed first when tokens become available.
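The ordering in the table can be sketched as a small queue that sorts by priority and then by arrival order. `PriorityRequestQueue` and the numeric `PRIORITY` values are hypothetical; the real queue may be implemented differently.

```javascript
// Lower number = served first. Values are illustrative assumptions.
const PRIORITY = new Map([
  ['lookup', 0],
  ['browse', 1],
  ['search', 2],
]);

class PriorityRequestQueue {
  constructor() {
    this.items = [];
    this.sequence = 0; // arrival counter used as a FIFO tie-breaker
  }

  enqueue(type, request) {
    if (!PRIORITY.has(type)) throw new Error(`Unknown query type: ${type}`);
    this.items.push({ priority: PRIORITY.get(type), order: this.sequence++, request });
    // Sort by priority first, then by arrival order within a priority.
    this.items.sort((a, b) => a.priority - b.priority || a.order - b.order);
  }

  // Returns the next request to run, or undefined when the queue is empty.
  dequeue() {
    const next = this.items.shift();
    return next ? next.request : undefined;
  }
}
```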
Rate Limit Errors
When the rate limit is exceeded and the queue is full:
HTTP Response:
HTTP/1.1 429 Too Many Requests
Retry-After: 5
GraphQL Error:
{
"errors": [
{
"message": "Rate limit exceeded",
"extensions": {
"code": "RATE_LIMIT",
"retryAfter": 5
}
}
]
}
HTTP Client
GraphBrainz uses got v11.8.2 for HTTP requests.
Client Configuration
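import got from 'got';
import createDebug from 'debug';

// Create the logger once instead of on every request.
const debug = createDebug('graphbrainz:api/client');

const client = got.extend({
  // Fall back to the documented default when the variable is unset.
  prefixUrl: process.env.MUSICBRAINZ_BASE_URL || 'http://musicbrainz.org/ws/2/',
  headers: {
    'User-Agent': 'GraphBrainz/9.0.0 (https://github.com/exogen/graphbrainz)',
    Accept: 'application/json'
  },
  timeout: {
    request: 30000 // 30 seconds for the whole request
  },
  retry: {
    limit: 3,
    methods: ['GET'],
    statusCodes: [408, 413, 429, 500, 502, 503, 504]
  },
  hooks: {
    beforeRequest: [
      (options) => {
        // Log the method and URL of each outgoing request.
        debug(`${options.method} ${options.url}`);
      }
    ]
  }
});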
Request Headers
| Header | Value | Purpose |
|---|---|---|
| User-Agent | GraphBrainz/9.0.0 (...) | API identification |
| Accept | application/json | Response format |
Timeout Handling
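- Request timeout: 30 seconds (the only timeout set in the configuration above)
- Connection and read timeouts: not set separately; got applies no timeout by default, so the 30-second request timeout bounds the whole request
Timeout errors are propagated as GraphQL errors.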
Retry Logic
Automatic retry for transient failures:
- Max retries: 3
- Retry methods: GET only
- Retry status codes: 408, 413, 429, 500, 502, 503, 504
- Backoff: Exponential (1s, 2s, 4s)
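The retry rules above amount to a small decision function plus a doubling delay. The sketch below restates them in plain JavaScript; it is not got's internal code, the names `shouldRetry` and `backoffDelayMs` are hypothetical, and it ignores jitter and any `Retry-After` header.

```javascript
const RETRY_STATUS_CODES = new Set([408, 413, 429, 500, 502, 503, 504]);
const MAX_RETRIES = 3;

// attempt is 1 for the first retry, 2 for the second, and so on.
function shouldRetry(method, statusCode, attempt) {
  return method === 'GET' && attempt <= MAX_RETRIES && RETRY_STATUS_CODES.has(statusCode);
}

// Exponential backoff: the delay doubles with each retry attempt.
function backoffDelayMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** (attempt - 1);
}
```

Under these rules a request that keeps failing with a retryable status waits 1 s, 2 s, and 4 s before the three retries, then gives up.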
Data Transformation
MusicBrainz API responses are transformed into a GraphQL-friendly format:
Field Name Conversion
| MusicBrainz | GraphQL |
|---|---|
| sort-name | sortName |
| life-span | lifeSpan |
| artist-credit | artistCredit |
| release-group | releaseGroup |
| iso-3166-1-codes | iso31661Codes |
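The conversions in the table follow one rule: drop each hyphen and uppercase the character after it. A minimal sketch of that rule, applied recursively to a response object, is shown below; `toCamelCase` and `camelizeKeys` are hypothetical names rather than GraphBrainz functions.

```javascript
// Convert a hyphenated key such as "sort-name" to "sortName".
function toCamelCase(key) {
  return key.replace(/-([a-z0-9])/g, (_, char) => char.toUpperCase());
}

// Recursively rename keys in objects and arrays; other values pass through.
function camelizeKeys(value) {
  if (Array.isArray(value)) return value.map(camelizeKeys);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [toCamelCase(key), camelizeKeys(inner)])
    );
  }
  return value;
}
```

Digits have no uppercase form, so `iso-3166-1-codes` becomes `iso31661Codes`, as in the table.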
Nested Objects
Nested objects keep their structure; only the keys are renamed.
MusicBrainz:
{
"life-span": {
"begin": "1985",
"end": null
}
}
GraphQL:
{
"lifeSpan": {
"begin": "1985",
"end": null
}
}
Array Normalization
MusicBrainz:
{
"releases": [
{ "id": "...", "title": "..." }
]
}
GraphQL (Relay connection):
{
"releases": {
"edges": [
{
"node": { "id": "...", "title": "..." },
"cursor": "..."
}
],
"pageInfo": { ... },
"totalCount": 1
}
}
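Wrapping a plain array in this connection shape could look like the sketch below. The cursor format is an assumption (an opaque base64 string derived from the item's offset), and `encodeCursor` and `toConnection` are hypothetical names; the actual cursor encoding may differ.

```javascript
// Assumption: cursors are opaque base64 strings derived from the item's offset.
function encodeCursor(offset) {
  return Buffer.from(`offset:${offset}`).toString('base64');
}

// Wrap one page of items in a Relay-style connection object.
function toConnection(items, { offset = 0, totalCount = items.length } = {}) {
  const edges = items.map((node, index) => ({
    node,
    cursor: encodeCursor(offset + index),
  }));
  return {
    edges,
    pageInfo: {
      hasPreviousPage: offset > 0,
      hasNextPage: offset + items.length < totalCount,
      startCursor: edges.length ? edges[0].cursor : null,
      endCursor: edges.length ? edges[edges.length - 1].cursor : null,
    },
    totalCount,
  };
}
```

The `offset` and `totalCount` values would come from the paging fields of a browse or search response.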
Relationship Expansion
Entries in the MusicBrainz relations array are exposed through a relationships connection, with each target resolved to a typed entity:
MusicBrainz:
{
"relations": [
{
"type": "member of band",
"target": "5b11f4ce-...",
"artist": { "name": "Radiohead" }
}
]
}
GraphQL query:
{
relationships {
edges {
node {
type
target {
... on Artist {
name
}
}
}
}
}
}
Memory Considerations
Cache Memory Usage
With default configuration (8192 items per cache):
| Cache | Items | Avg Size | Total Memory |
|---|---|---|---|
| MusicBrainz | 8192 | 5 KB | ~40 MB |
| Cover Art Archive | 8192 | 2 KB | ~16 MB |
| fanart.tv | 8192 | 3 KB | ~24 MB |
| MediaWiki | 8192 | 4 KB | ~32 MB |
| TheAudioDB | 8192 | 2 KB | ~16 MB |
| Total | 40960 | - | ~128 MB |
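The totals in the table are simply items multiplied by average size, summed across caches. The sketch below reproduces that arithmetic so it can be rerun with different sizes; `estimateCacheMemoryMb` is a hypothetical helper and the average sizes are the estimates from the table, not measurements.

```javascript
// Sum items × average size (KB) across caches and convert to MB.
function estimateCacheMemoryMb(caches) {
  const totalKb = caches.reduce((sum, { items, avgSizeKb }) => sum + items * avgSizeKb, 0);
  return totalKb / 1024;
}

// Figures taken from the table above.
const defaultCaches = [
  { name: 'MusicBrainz', items: 8192, avgSizeKb: 5 },
  { name: 'Cover Art Archive', items: 8192, avgSizeKb: 2 },
  { name: 'fanart.tv', items: 8192, avgSizeKb: 3 },
  { name: 'MediaWiki', items: 8192, avgSizeKb: 4 },
  { name: 'TheAudioDB', items: 8192, avgSizeKb: 2 },
];
```

Calling `estimateCacheMemoryMb(defaultCaches)` gives 128, matching the table's total.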
DataLoader Memory Usage
DataLoader instances are created per-request and garbage collected after response:
- Per-request overhead: ~1-5 MB (depends on query complexity)
- Concurrent requests: 100 requests × 5 MB = 500 MB peak
Recommended Memory Allocation
| Deployment | Heap Size | Rationale |
|---|---|---|
| Development | 512 MB | Single user, low traffic |
| Production (low) | 1 GB | 10-50 req/s, shared cache |
| Production (high) | 2 GB | 100+ req/s, full cache |
Node.js Configuration:
node --max-old-space-size=2048 cli.js
Data Freshness
GraphBrainz does not implement cache invalidation beyond TTL expiration and LRU eviction. Data freshness therefore depends on how often the upstream data changes relative to the cache TTL:
| Data Type | Typical Update Frequency | Cache TTL | Staleness Risk |
|---|---|---|---|
| Artist metadata | Weeks to months | 1 day | Low |
| Release metadata | Days to weeks | 1 day | Low |
| Relationships | Weeks to months | 1 day | Low |
| Cover art | Months to years | 1 day | Very low |
| Artist images | Months to years | 1 day | Very low |
| Biographies | Months to years | 1 day | Very low |
For real-time data requirements, reduce cache TTL:
GRAPHBRAINZ_CACHE_TTL=3600000 # 1 hour
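Or disable caching entirely (this assumes GraphBrainz treats a size of 0 as "no cache"; lru-cache itself does not interpret max: 0 as an empty cache, so verify the behavior for your version):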
GRAPHBRAINZ_CACHE_SIZE=0