metadata-agregator/docs/research/graphbrainz/analysis/ARCHITECTURE.md
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

GraphBrainz Architecture

Schema Construction Strategy

GraphBrainz employs a hybrid schema construction approach:

  • Core Schema: Programmatic construction using GraphQL.js constructors
  • Extensions: SDL (Schema Definition Language) strings merged via extendSchema()

This strategy provides type safety and runtime flexibility for the core while allowing extensions to use the more ergonomic SDL syntax.

Why Programmatic Construction?

| Benefit | Description |
|---|---|
| Type Safety | Compile-time validation of schema structure |
| Dynamic Fields | Runtime field generation based on configuration |
| AST Inspection | Direct access to GraphQL AST for resolver optimization |
| Extension Points | Programmatic hooks for schema modification |

Entity Type System

GraphBrainz defines 17 entity types in src/types/ (~2000 lines of code):

| Entity Type | File Path | Purpose |
|---|---|---|
| Area | src/types/area.js | Geographic regions |
| Artist | src/types/artist.js | Musicians and groups |
| Collection | src/types/collection.js | User-curated lists |
| Disc | src/types/disc.js | Physical media |
| Event | src/types/event.js | Concerts and performances |
| Instrument | src/types/instrument.js | Musical instruments |
| Label | src/types/label.js | Record labels |
| Place | src/types/place.js | Venues and locations |
| Recording | src/types/recording.js | Audio recordings |
| Release | src/types/release.js | Album releases |
| ReleaseGroup | src/types/release-group.js | Release groupings |
| Series | src/types/series.js | Ordered collections |
| Tag | src/types/tag.js | User-generated tags |
| Track | src/types/track.js | Individual tracks |
| URL | src/types/url.js | External links |
| Work | src/types/work.js | Musical compositions |
| Relationships | src/types/relationships.js | Entity connections |

Each type file exports a GraphQL object type with field definitions, resolvers, and relationship mappings.

Query Type Hierarchy

GraphBrainz exposes four primary query patterns:

1. Lookup Queries

Direct entity retrieval by MusicBrainz ID (MBID).

Supported Entities: 13 types

lookup {
  area(mbid: String!)
  artist(mbid: String!)
  collection(mbid: String!)
  event(mbid: String!)
  instrument(mbid: String!)
  label(mbid: String!)
  place(mbid: String!)
  recording(mbid: String!)
  release(mbid: String!)
  releaseGroup(mbid: String!)
  series(mbid: String!)
  url(mbid: String!)
  work(mbid: String!)
}

2. Browse Queries

Retrieve entities linked to a parent entity with cursor-based pagination.

Supported Entities: 9 types

browse {
  areas(collection: String, first: Int, after: String)
  artists(area: String, collection: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  collections(area: String, artist: String, editor: String, event: String, label: String, place: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  events(area: String, artist: String, collection: String, place: String, first: Int, after: String)
  labels(area: String, collection: String, release: String, first: Int, after: String)
  places(area: String, collection: String, first: Int, after: String)
  recordings(artist: String, collection: String, release: String, first: Int, after: String)
  releases(area: String, artist: String, collection: String, label: String, recording: String, releaseGroup: String, track: String, trackArtist: String, first: Int, after: String)
  releaseGroups(artist: String, collection: String, release: String, first: Int, after: String)
}

3. Search Queries

Lucene-based full-text search across entity types.

Supported Entities: 10 types

search {
  areas(query: String!, first: Int, after: String)
  artists(query: String!, first: Int, after: String)
  events(query: String!, first: Int, after: String)
  instruments(query: String!, first: Int, after: String)
  labels(query: String!, first: Int, after: String)
  places(query: String!, first: Int, after: String)
  recordings(query: String!, first: Int, after: String)
  releases(query: String!, first: Int, after: String)
  releaseGroups(query: String!, first: Int, after: String)
  works(query: String!, first: Int, after: String)
}

4. Node Query (Relay)

Global object identification via the Relay-compliant Node interface.

node(id: ID!)

Resolver Architecture

GraphBrainz implements a three-tier resolver structure:

Tier 1: Query Resolvers

Entry points for lookup, browse, search, and node queries. Responsibilities:

  • Validate input parameters
  • Construct MusicBrainz API URLs
  • Delegate to DataLoader
  • Return raw API responses

Location: src/resolvers/query.js
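
The first two responsibilities above (validating input and constructing the API URL) can be sketched as follows. This is a minimal illustration, not the code in src/resolvers/query.js: the helper name `buildLookupUrl` and the `MBID_PATTERN` constant are hypothetical, and the sketch assumes the public `https://musicbrainz.org/ws/2/` endpoint with JSON output requested via `fmt=json`.

```javascript
// Hypothetical helper: validates an MBID and builds a MusicBrainz lookup URL.
// MBIDs are UUIDs, so a UUID-shaped pattern is used for validation.
const MBID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function buildLookupUrl(entity, mbid, inc = []) {
  if (!MBID_PATTERN.test(mbid)) {
    throw new TypeError(`Invalid MBID: ${mbid}`);
  }
  const url = new URL(`https://musicbrainz.org/ws/2/${entity}/${mbid}`);
  if (inc.length > 0) {
    // MusicBrainz separates inc values with "+", which is how
    // URLSearchParams serializes a space.
    url.searchParams.set('inc', inc.join(' '));
  }
  url.searchParams.set('fmt', 'json');
  return url.toString();
}

console.log(buildLookupUrl('artist', '5b11f4ce-a62d-471e-81fc-a69a8278c7da', ['releases']));
```

Rejecting malformed MBIDs before a request is queued avoids spending a rate-limit token on a call that cannot succeed.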

Tier 2: Field Resolvers

Resolve individual fields on entity types. Responsibilities:

  • Extract field values from parent object
  • Trigger subqueries for related entities
  • Apply field-level transformations
  • Handle null/undefined cases

Location: src/types/*.js (per entity type)

Tier 3: Subquery Resolvers

Handle nested entity relationships. Responsibilities:

  • Inspect GraphQL AST for required fields
  • Determine MusicBrainz inc parameters
  • Batch related entity requests
  • Resolve circular dependencies

Location: src/resolvers/subquery.js

AST Inspection for Query Optimization

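GraphBrainz resolvers inspect the GraphQL AST to determine which MusicBrainz inc parameters are needed. Each API request then asks only for the relationships the query actually selects, which reduces over-fetching and avoids follow-up requests for data that was left out.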

Example

GraphQL Query:

{
  lookup {
    artist(mbid: "5b11f4ce-a62d-471e-81fc-a69a8278c7da") {
      name
      releases {
        title
        date
      }
    }
  }
}

AST Inspection Result:

  • Detects releases field in selection set
  • Adds inc=releases to MusicBrainz API request
  • Avoids fetching recordings, works, or other unneeded relationships

MusicBrainz API Call:

GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases

Implementation

AST inspection occurs in resolver functions via info.fieldNodes:

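function resolveArtist(parent, args, context, info) {
  // Top-level selections under the artist field (empty if there is no selection set).
  const selections = info.fieldNodes[0].selectionSet?.selections ?? [];
  const inc = [];

  for (const selection of selections) {
    // Only plain fields are handled here; inline fragments have no name,
    // so reading selection.name.value on them would throw.
    if (selection.kind !== 'Field') continue;
    if (selection.name.value === 'releases') {
      inc.push('releases');
    }
    if (selection.name.value === 'recordings') {
      inc.push('recordings');
    }
  }

  // The loader key carries both the MBID and the inc list.
  return context.loaders.artist.load({ mbid: args.mbid, inc });
}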
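
The resolver above only looks at plain fields directly under the artist field. When a query uses inline fragments or named fragments, the requested fields sit one level deeper in the AST. The sketch below shows one way to collect them. The names `INC_BY_FIELD`, `collectFieldNames` and `incFromInfo` are hypothetical and the field-to-inc mapping is illustrative; only the node kinds (`Field`, `InlineFragment`, `FragmentSpread`) and the `info.fieldNodes` / `info.fragments` properties come from GraphQL.js.

```javascript
// Hypothetical mapping from GraphQL field names to MusicBrainz inc values.
const INC_BY_FIELD = new Map([
  ['releases', 'releases'],
  ['recordings', 'recordings'],
  ['works', 'works']
]);

// Recursively collect field names from a selection set, following
// inline fragments and named fragment spreads.
function collectFieldNames(selectionSet, fragments, names = new Set()) {
  if (!selectionSet) return names;
  for (const selection of selectionSet.selections) {
    if (selection.kind === 'Field') {
      names.add(selection.name.value);
    } else if (selection.kind === 'InlineFragment') {
      collectFieldNames(selection.selectionSet, fragments, names);
    } else if (selection.kind === 'FragmentSpread') {
      const fragment = fragments[selection.name.value];
      if (fragment) collectFieldNames(fragment.selectionSet, fragments, names);
    }
  }
  return names;
}

// Derive the inc list for the field currently being resolved.
function incFromInfo(info) {
  const names = collectFieldNames(info.fieldNodes[0].selectionSet, info.fragments ?? {});
  return [...names].filter(name => INC_BY_FIELD.has(name)).map(name => INC_BY_FIELD.get(name));
}
```

Using a Set means a field requested both directly and through a fragment contributes a single inc value.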

Extension System

Extensions modify the schema and context in two phases:

Phase 1: Context Extension

Extensions add custom HTTP clients, DataLoaders, and caches to the GraphQL context.

Interface:

{
  extendContext(context, options) {
    return {
      ...context,
      [extensionName]: {
        client: new ExtensionClient(options),
        loader: new DataLoader(batchFn),
        cache: new LRUCache(options)
      }
    };
  }
}

Phase 2: Schema Extension

Extensions add fields to existing types or define new types via SDL.

Interface:

{
  extendSchema(schema, options) {
    const typeDefs = `
      extend type Artist {
        fanArt: FanArtImages
      }
      
      type FanArtImages {
        backgrounds: [FanArtImage]
        logos: [FanArtImage]
      }
    `;
    
    const resolvers = {
      Artist: {
        fanArt(artist, args, context) {
          return context.fanart.loader.load(artist.id);
        }
      }
    };
    
    return extendSchema(schema, { typeDefs, resolvers });
  }
}

Extension Loading

Extensions are loaded via environment variable or programmatic options:

Environment Variable:

GRAPHBRAINZ_EXTENSIONS="cover-art-archive,fanart,mediawiki,theaudiodb"

Programmatic:

import { middleware } from 'graphbrainz';
import lastfm from 'graphbrainz-extension-lastfm';

app.use('/graphql', middleware({
  extensions: [lastfm]
}));
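
The environment variable form is a comma-separated list of extension names. A minimal sketch of turning that string into a list is shown below. The function name `parseExtensionList` is hypothetical, and how GraphBrainz resolves each name to a module is not covered here.

```javascript
// Hypothetical helper: split a comma-separated extension list such as the
// value of GRAPHBRAINZ_EXTENSIONS, ignoring blanks and stray whitespace.
function parseExtensionList(value) {
  if (!value) return [];
  return value
    .split(',')
    .map(name => name.trim())
    .filter(name => name.length > 0);
}

console.log(parseExtensionList('cover-art-archive,fanart,mediawiki,theaudiodb'));
```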

DataLoader Integration

GraphBrainz uses DataLoader for request batching and deduplication.

Per-Request Batching

Each GraphQL request receives a fresh DataLoader instance. This ensures:

  • Requests within a single query are batched
  • Duplicate requests are deduplicated
  • Cache is scoped to request lifecycle
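
To make the three properties above concrete, here is a dependency-free sketch of the batching and deduplication technique. It is not the DataLoader library and not GraphBrainz code: `TinyLoader` and `createContext` are hypothetical names, and the sketch batches only the `load` calls made in the same synchronous run.

```javascript
// Minimal sketch of batching plus deduplication (not the DataLoader library).
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.cache = new Map(); // key -> promise; lives as long as this instance
    this.pending = [];      // entries waiting for the next dispatch
  }

  load(key) {
    // Deduplication: a repeated key reuses the promise already created for it.
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      if (this.pending.length === 0) {
        // Dispatch once the current synchronous work has finished.
        queueMicrotask(() => this.dispatch());
      }
      this.pending.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    return promise;
  }

  async dispatch() {
    const batch = this.pending;
    this.pending = [];
    try {
      // One batch call for every key collected so far, results in key order.
      const values = await this.batchFn(batch.map(item => item.key));
      batch.forEach((item, index) => item.resolve(values[index]));
    } catch (error) {
      batch.forEach(item => item.reject(error));
    }
  }
}

// A fresh loader per request keeps the cache scoped to that request.
function createContext() {
  const fetchArtists = async mbids => mbids.map(mbid => ({ id: mbid }));
  return { loaders: { artist: new TinyLoader(fetchArtists) } };
}
```

Because the cache lives on the loader instance, discarding the context at the end of a request also discards every cached promise.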

Batch Functions

Each entity type has a batch function that:

  1. Receives array of keys (MBIDs or query parameters)
  2. Groups keys by API endpoint
  3. Makes batched HTTP requests
  4. Returns array of results in same order as keys

Example:

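import DataLoader from 'dataloader';
import got from 'got';

// Issues one request per key and returns results in key order, as DataLoader requires.
async function batchArtists(keys) {
  return Promise.all(
    keys.map(key => {
      // MusicBrainz separates inc values with "+"; omit the parameter when nothing is requested.
      const inc = key.inc.length > 0 ? `&inc=${key.inc.join('+')}` : '';
      return got(`https://musicbrainz.org/ws/2/artist/${key.mbid}?fmt=json${inc}`)
        .json()
        .catch(error => error); // a failed key yields an Error instead of rejecting the whole batch
    })
  );
}

// Keys are objects, so a cacheKeyFn is needed for equal keys to be deduplicated.
const artistLoader = new DataLoader(batchArtists, {
  cacheKeyFn: key => `${key.mbid}:${key.inc.join(',')}`
});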

LRU Cache Layer

A shared LRU cache sits above DataLoader and provides cross-request caching.

Configuration

| Parameter | Environment Variable | Default |
|---|---|---|
| Size | GRAPHBRAINZ_CACHE_SIZE | 8192 items |
| TTL | GRAPHBRAINZ_CACHE_TTL | 86400000 ms (1 day) |
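
The sketch below shows how a size-bounded cache with a TTL can be built and configured from these two variables. It is an illustration of the technique only: `TtlLruCache` and `cacheOptionsFromEnv` are hypothetical names, and GraphBrainz may rely on an existing LRU library instead.

```javascript
// Minimal LRU cache with a per-entry TTL, built on Map insertion order.
class TtlLruCache {
  constructor({ max = 8192, ttl = 86400000 } = {}) {
    this.max = max; // maximum number of entries
    this.ttl = ttl; // entry lifetime in milliseconds
    this.entries = new Map();
  }

  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (entry.expires <= now) return undefined; // expired: stays deleted
    this.entries.set(key, entry);               // re-insert as most recently used
    return entry.value;
  }

  set(key, value, now = Date.now()) {
    this.entries.delete(key);
    this.entries.set(key, { value, expires: now + this.ttl });
    if (this.entries.size > this.max) {
      // The first key in a Map is the least recently used one.
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}

// Read the documented variables, falling back to the documented defaults.
function cacheOptionsFromEnv(env = process.env) {
  return {
    max: Number.parseInt(env.GRAPHBRAINZ_CACHE_SIZE ?? '8192', 10),
    ttl: Number.parseInt(env.GRAPHBRAINZ_CACHE_TTL ?? '86400000', 10)
  };
}

const cache = new TtlLruCache(cacheOptionsFromEnv());
```

The optional `now` argument exists only so expiry can be exercised without waiting for real time to pass.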

Cache Key Strategy

Cache keys combine entity type, MBID, and inc parameters:

artist:5b11f4ce-a62d-471e-81fc-a69a8278c7da:releases,recordings

This ensures different queries for the same entity don't collide.
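
A key of this shape can be built with a small helper. The sketch below is an assumption, not GraphBrainz code: the name `cacheKey` is hypothetical, and sorting the inc values is a choice made here so that the same set of parameters always yields the same key. The source does not say whether GraphBrainz normalizes the order, and its example key above lists the values unsorted.

```javascript
// Hypothetical helper: build a cache key from entity type, MBID and inc values.
function cacheKey(entityType, mbid, inc = []) {
  // Deduplicate and sort so that the order in which fields were requested
  // does not produce two different keys for the same data.
  const normalized = [...new Set(inc)].sort();
  return `${entityType}:${mbid}:${normalized.join(',')}`;
}

console.log(cacheKey('artist', '5b11f4ce-a62d-471e-81fc-a69a8278c7da', ['releases', 'recordings']));
```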

Per-Extension Caches

Each extension maintains its own LRU cache with separate configuration:

  • FANART_CACHE_SIZE / FANART_CACHE_TTL
  • THEAUDIODB_CACHE_SIZE / THEAUDIODB_CACHE_TTL
  • COVERART_CACHE_SIZE / COVERART_CACHE_TTL

Rate Limiting

A custom priority-queue implementation keeps outgoing requests within the rate limits of each upstream API.

MusicBrainz Rate Limits

  • Limit: 5 requests per 5.5 seconds
  • Strategy: Token bucket with 5 tokens, refill rate 0.909 tokens/second
  • Concurrency: 1 (sequential requests)

Extension Rate Limits

  • Limit: 10 requests per second (default)
  • Strategy: Token bucket with 10 tokens, refill rate 10 tokens/second
  • Concurrency: 5 (parallel requests)
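
The refill rates quoted in both lists follow directly from the limit and the interval. A short calculation, with the hypothetical helper name `bucketParameters`, makes the arithmetic explicit:

```javascript
// Derive the refill rate and the time needed to earn one token.
function bucketParameters(limit, intervalMs) {
  const refillPerSecond = limit / (intervalMs / 1000);
  const msPerToken = intervalMs / limit;
  return { refillPerSecond, msPerToken };
}

const musicbrainz = bucketParameters(5, 5500);  // 5 requests per 5.5 seconds
const extension = bucketParameters(10, 1000);   // 10 requests per second

console.log(musicbrainz.refillPerSecond.toFixed(3), musicbrainz.msPerToken); // 0.909 1100
console.log(extension.refillPerSecond, extension.msPerToken);                // 10 100
```

Once the bucket is empty, a MusicBrainz request therefore waits about 1.1 seconds for the next token, while an extension request waits about 0.1 seconds.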

Priority Queue

Requests are queued with priority levels:

  1. High: Lookup queries (direct MBID access)
  2. Medium: Browse queries (relationship traversal)
  3. Low: Search queries (full-text search)

When the rate limit has been reached, queued requests with a higher priority are released first.
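
The RateLimiter class shown under Implementation relies on a `PriorityQueue` with `enqueue`, `dequeue` and `length`, but the queue itself is not shown. The sketch below is one possible shape for it, under the assumption that priorities are the strings `high`, `medium` and `low` and that requests of equal priority are served first-in, first-out. `PRIORITY_RANK` is a hypothetical name.

```javascript
// Lower rank means earlier service.
const PRIORITY_RANK = { high: 0, medium: 1, low: 2 };

class PriorityQueue {
  constructor() {
    // One FIFO bucket per priority level.
    this.buckets = [[], [], []];
  }

  get length() {
    return this.buckets.reduce((total, bucket) => total + bucket.length, 0);
  }

  enqueue(item) {
    // Unknown priorities are treated as medium.
    const rank = PRIORITY_RANK[item.priority] ?? PRIORITY_RANK.medium;
    this.buckets[rank].push(item);
  }

  dequeue() {
    // Take the oldest item from the highest-priority non-empty bucket.
    for (const bucket of this.buckets) {
      if (bucket.length > 0) return bucket.shift();
    }
    return undefined;
  }
}
```

With only three levels, fixed buckets are simpler than a heap and keep arrival order within each level.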

Implementation

Location: src/rate-limit.js

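class RateLimiter {
  // options.limit: bucket capacity; options.interval: window length in milliseconds.
  constructor(options) {
    this.tokens = options.limit;
    this.limit = options.limit;
    this.refillRate = options.limit / options.interval; // tokens per millisecond
    this.lastRefill = Date.now();
    this.queue = new PriorityQueue(); // expected to dequeue the highest priority first
  }

  async acquire(priority = 'medium') {
    this.refill();
    // A whole token is required; a fractional token must not admit a request.
    if (this.tokens >= 1) {
      this.tokens--;
      return;
    }

    return new Promise(resolve => {
      this.queue.enqueue({ resolve, priority });
    });
  }

  // Must also be called periodically (for example from a timer) so that
  // queued requests are released when no new requests arrive.
  refill() {
    const now = Date.now();
    const earned = (now - this.lastRefill) * this.refillRate;
    this.tokens = Math.min(this.limit, this.tokens + earned);
    this.lastRefill = now;
    while (this.tokens >= 1 && this.queue.length > 0) {
      const { resolve } = this.queue.dequeue();
      this.tokens--;
      resolve();
    }
  }
}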

File Structure

src/
├── index.js                 # Entry point, start() function
├── schema.js                # Schema construction
├── context.js               # Context factory
├── types/                   # Entity type definitions
│   ├── area.js
│   ├── artist.js
│   ├── collection.js
│   ├── disc.js
│   ├── event.js
│   ├── instrument.js
│   ├── label.js
│   ├── place.js
│   ├── recording.js
│   ├── release.js
│   ├── release-group.js
│   ├── series.js
│   ├── tag.js
│   ├── track.js
│   ├── url.js
│   ├── work.js
│   └── relationships.js
├── resolvers/               # Resolver implementations
│   ├── query.js
│   └── subquery.js
├── loaders/                 # DataLoader batch functions
│   └── musicbrainz.js
├── rate-limit.js            # Rate limiter implementation
├── client.js                # Base HTTP client
└── extensions/              # Built-in extensions
    ├── cover-art-archive/
    ├── fanart/
    ├── mediawiki/
    └── theaudiodb/

Relay Compliance

GraphBrainz implements the Relay specification for cursor-based pagination:

Connection Pattern

All list fields return connection types:

type ArtistConnection {
  edges: [ArtistEdge]
  nodes: [Artist]
  pageInfo: PageInfo!
  totalCount: Int
}

type ArtistEdge {
  node: Artist
  cursor: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

Pagination Arguments

  • first: Int - Number of items to return
  • after: String - Cursor for pagination
  • last: Int - Number of items from end (not implemented)
  • before: String - Cursor for reverse pagination (not implemented)
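
MusicBrainz itself paginates with `limit` and `offset` parameters (default 25, maximum 100 per request), so the forward-only arguments above have to be translated. The sketch below shows one way to do that and to wrap a page of results in a connection. The cursor format (base64 of `offset:<n>`) and the helper names are assumptions for illustration. The source does not describe how GraphBrainz encodes its cursors.

```javascript
// Assumption: a cursor is the base64 encoding of "offset:<n>", where n is the
// zero-based position of the item in the full result list.
function encodeCursor(offset) {
  return Buffer.from(`offset:${offset}`, 'utf8').toString('base64');
}

function decodeCursor(cursor) {
  const match = /^offset:(\d+)$/.exec(Buffer.from(cursor, 'base64').toString('utf8'));
  if (!match) throw new TypeError(`Invalid cursor: ${cursor}`);
  return Number(match[1]);
}

// Translate Relay arguments into MusicBrainz limit/offset parameters.
function toLimitOffset({ first = 25, after } = {}) {
  const offset = after ? decodeCursor(after) + 1 : 0;
  return { limit: Math.min(first, 100), offset };
}

// Wrap one page of results in a connection object.
function toConnection(items, offset, totalCount) {
  const edges = items.map((node, index) => ({ node, cursor: encodeCursor(offset + index) }));
  return {
    edges,
    nodes: items,
    totalCount,
    pageInfo: {
      hasNextPage: offset + items.length < totalCount,
      hasPreviousPage: offset > 0,
      startCursor: edges.length > 0 ? edges[0].cursor : null,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null
    }
  };
}
```

Passing the previous page's `endCursor` as `after` yields the offset of the next item, so consecutive pages neither overlap nor skip entries.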

Node Interface

Global object identification via the node(id: ID!) query:

interface Node {
  id: ID!
}

All entity types implement the Node interface with globally unique IDs.
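
A globally unique ID has to carry both the entity type and the MBID so that the node query can route it to the right loader. The sketch below assumes the common Relay convention of base64-encoding `TypeName:localId`. Whether GraphBrainz uses exactly this encoding is not stated in the source, and the function names are hypothetical.

```javascript
// Assumption: global IDs follow the common Relay convention of
// base64("TypeName:localId").
function toGlobalId(typeName, mbid) {
  return Buffer.from(`${typeName}:${mbid}`, 'utf8').toString('base64');
}

function fromGlobalId(globalId) {
  const decoded = Buffer.from(globalId, 'base64').toString('utf8');
  const separator = decoded.indexOf(':');
  if (separator === -1) throw new TypeError(`Invalid global ID: ${globalId}`);
  return {
    typeName: decoded.slice(0, separator),
    mbid: decoded.slice(separator + 1)
  };
}

const id = toGlobalId('Artist', '5b11f4ce-a62d-471e-81fc-a69a8278c7da');
console.log(fromGlobalId(id));
```

A node resolver can then decode the ID, pick the loader that matches `typeName`, and load the entity by its MBID.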