Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00


GraphBrainz Architecture

Schema Construction Strategy

GraphBrainz employs a hybrid schema construction approach:

Core Schema: Programmatic construction using GraphQL.js constructors
Extensions: SDL (Schema Definition Language) strings merged via extendSchema()

This strategy provides type safety and runtime flexibility for the core while allowing extensions to use the more ergonomic SDL syntax.

Why Programmatic Construction?

Benefit	Description
Type Safety	Schema structure is validated when the schema is constructed, before any query runs
Dynamic Fields	Runtime field generation based on configuration
AST Inspection	Direct access to GraphQL AST for resolver optimization
Extension Points	Programmatic hooks for schema modification

Entity Type System

GraphBrainz defines 17 entity types in src/types/ (~2000 lines of code):

Entity Type	File Path	Purpose
Area	src/types/area.js	Geographic regions
Artist	src/types/artist.js	Musicians and groups
Collection	src/types/collection.js	User-curated lists
Disc	src/types/disc.js	Physical media
Event	src/types/event.js	Concerts and performances
Instrument	src/types/instrument.js	Musical instruments
Label	src/types/label.js	Record labels
Place	src/types/place.js	Venues and locations
Recording	src/types/recording.js	Audio recordings
Release	src/types/release.js	Album releases
ReleaseGroup	src/types/release-group.js	Release groupings
Series	src/types/series.js	Ordered collections
Tag	src/types/tag.js	User-generated tags
Track	src/types/track.js	Individual tracks
URL	src/types/url.js	External links
Work	src/types/work.js	Musical compositions
Relationships	src/types/relationships.js	Entity connections

Each type file exports a GraphQL object type with field definitions, resolvers, and relationship mappings.

Query Type Hierarchy

GraphBrainz exposes four primary query patterns:

1. Lookup Queries

Direct entity retrieval by MusicBrainz ID (MBID).

Supported Entities: 13 types

lookup {
  area(mbid: String!)
  artist(mbid: String!)
  collection(mbid: String!)
  event(mbid: String!)
  instrument(mbid: String!)
  label(mbid: String!)
  place(mbid: String!)
  recording(mbid: String!)
  release(mbid: String!)
  releaseGroup(mbid: String!)
  series(mbid: String!)
  url(mbid: String!)
  work(mbid: String!)
}

2. Browse Queries

Retrieve entities linked to a parent entity with cursor-based pagination.

Supported Entities: 9 types

browse {
  areas(collection: String, first: Int, after: String)
  artists(area: String, collection: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  collections(area: String, artist: String, editor: String, event: String, label: String, place: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  events(area: String, artist: String, collection: String, place: String, first: Int, after: String)
  labels(area: String, collection: String, release: String, first: Int, after: String)
  places(area: String, collection: String, first: Int, after: String)
  recordings(artist: String, collection: String, release: String, first: Int, after: String)
  releases(area: String, artist: String, collection: String, label: String, recording: String, releaseGroup: String, track: String, trackArtist: String, first: Int, after: String)
  releaseGroups(artist: String, collection: String, release: String, first: Int, after: String)
}

3. Search Queries

Lucene-based full-text search across entity types.

Supported Entities: 10 types

search {
  areas(query: String!, first: Int, after: String)
  artists(query: String!, first: Int, after: String)
  events(query: String!, first: Int, after: String)
  instruments(query: String!, first: Int, after: String)
  labels(query: String!, first: Int, after: String)
  places(query: String!, first: Int, after: String)
  recordings(query: String!, first: Int, after: String)
  releases(query: String!, first: Int, after: String)
  releaseGroups(query: String!, first: Int, after: String)
  works(query: String!, first: Int, after: String)
}

4. Node Query (Relay)

Global object identification via Relay-compliant node interface.

node(id: ID!)

Resolver Architecture

GraphBrainz implements a three-tier resolver structure:

Tier 1: Query Resolvers

Entry points for lookup, browse, search, and node queries. Responsibilities:

Validate input parameters
Construct MusicBrainz API URLs
Delegate to DataLoader
Return raw API responses

Location: src/resolvers/query.js
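
The first two responsibilities above (input validation and URL construction) might look roughly like the sketch below. It is an illustration only, not the code in src/resolvers/query.js: the helper names assertMbid and buildLookupPath are hypothetical, and the comma used to join inc values simply follows the batch function example later in this document.

```javascript
// MBIDs are UUIDs in the canonical 8-4-4-4-12 hexadecimal form.
const MBID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: reject malformed IDs before any HTTP request is made.
function assertMbid(mbid) {
  if (typeof mbid !== 'string' || !MBID_PATTERN.test(mbid)) {
    throw new TypeError(`Invalid MBID: ${String(mbid)}`);
  }
  return mbid.toLowerCase();
}

// Hypothetical helper: build a lookup path such as /ws/2/artist/<mbid>?inc=releases.
function buildLookupPath(entity, mbid, inc = []) {
  const path = `/ws/2/${entity}/${assertMbid(mbid)}`;
  return inc.length > 0 ? `${path}?inc=${inc.join(',')}` : path;
}
```

Validating before building the path means a malformed ID fails fast inside the resolver instead of consuming a rate-limited upstream request.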

Tier 2: Field Resolvers

Resolve individual fields on entity types. Responsibilities:

Extract field values from parent object
Trigger subqueries for related entities
Apply field-level transformations
Handle null/undefined cases

Location: src/types/*.js (per entity type)

Tier 3: Subquery Resolvers

Handle nested entity relationships. Responsibilities:

Inspect GraphQL AST for required fields
Determine MusicBrainz inc parameters
Batch related entity requests
Resolve circular dependencies

Location: src/resolvers/subquery.js

AST Inspection for Query Optimization

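GraphBrainz resolvers inspect the GraphQL AST to determine which MusicBrainz inc parameters are needed. This avoids fetching relationships the query never selects (over-fetching) and avoids follow-up requests for relationships it does select (under-fetching).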

Example

GraphQL Query:

{
  lookup {
    artist(mbid: "5b11f4ce-a62d-471e-81fc-a69a8278c7da") {
      name
      releases {
        title
        date
      }
    }
  }
}

AST Inspection Result:

Detects releases field in selection set
Adds inc=releases to MusicBrainz API request
Avoids fetching recordings, works, or other unneeded relationships

MusicBrainz API Call:

GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases

Implementation

AST inspection occurs in resolver functions via info.fieldNodes:

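function resolveArtist(parent, args, context, info) {
  // A field can be backed by several nodes, and a node may lack a selection set.
  const selections = info.fieldNodes.flatMap(
    node => node.selectionSet?.selections ?? []
  );
  const inc = new Set();

  for (const selection of selections) {
    // Inline fragments have no name, so only inspect plain fields.
    if (selection.kind !== 'Field') continue;
    if (selection.name.value === 'releases') {
      inc.add('releases');
    }
    if (selection.name.value === 'recordings') {
      inc.add('recordings');
    }
  }

  return context.loaders.artist.load({ mbid: args.mbid, inc: [...inc] });
}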
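
A query can also reach the same fields through fragments, which a flat loop over the top-level selections does not see. The sketch below shows one way to follow fragment spreads and inline fragments. It is a hedged illustration: collectInc, incFromInfo, and the INC_FIELDS mapping are hypothetical names, and the code only relies on the node kinds (Field, FragmentSpread, InlineFragment) and the info.fragments map that GraphQL.js passes to resolvers.

```javascript
// Field names assumed to map one-to-one to a MusicBrainz inc value.
const INC_FIELDS = new Set(['releases', 'recordings', 'works']);

// Walk a selection set and collect inc values, following fragments.
function collectInc(selectionSet, fragments, found = new Set()) {
  if (!selectionSet) return found;
  for (const selection of selectionSet.selections) {
    if (selection.kind === 'Field') {
      if (INC_FIELDS.has(selection.name.value)) found.add(selection.name.value);
    } else if (selection.kind === 'InlineFragment') {
      collectInc(selection.selectionSet, fragments, found);
    } else if (selection.kind === 'FragmentSpread') {
      const fragment = fragments[selection.name.value];
      if (fragment) collectInc(fragment.selectionSet, fragments, found);
    }
  }
  return found;
}

// Merge the results for every node that represents the resolved field.
function incFromInfo(info) {
  const found = new Set();
  for (const node of info.fieldNodes) {
    collectInc(node.selectionSet, info.fragments, found);
  }
  return [...found];
}
```

Only the direct children of the resolved field are inspected here; nested entities would be handled by their own resolvers.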

Extension System

Extensions modify the schema and context in two phases:

Phase 1: Context Extension

Extensions add custom HTTP clients, DataLoaders, and caches to the GraphQL context.

Interface:

{
  extendContext(context, options) {
    return {
      ...context,
      [extensionName]: {
        client: new ExtensionClient(options),
        loader: new DataLoader(batchFn),
        cache: new LRUCache(options)
      }
    };
  }
}
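
To show how the interface above composes across several extensions, the following sketch threads a base context through each extension in turn. It is an assumption about how such a hook could be applied, not GraphBrainz's actual context factory: buildContext, fanartExtension, schemaOnlyExtension, and the fanartApiKey option are all hypothetical.

```javascript
// A hypothetical extension that follows the extendContext(context, options) shape.
const fanartExtension = {
  extendContext(context, options) {
    return { ...context, fanart: { apiKey: options.fanartApiKey ?? null } };
  },
};

// A hypothetical extension that only touches the schema and has no context hook.
const schemaOnlyExtension = {};

// Apply every extension's extendContext in order, threading the context through.
function buildContext(baseContext, extensions, options = {}) {
  return extensions.reduce((context, extension) => {
    if (typeof extension.extendContext !== 'function') return context;
    return extension.extendContext(context, options);
  }, baseContext);
}

const exampleContext = buildContext(
  { loaders: {} },
  [fanartExtension, schemaOnlyExtension],
  { fanartApiKey: 'example-key' }
);
// exampleContext now carries both `loaders` and `fanart`.
```

Because each extension returns a new object that spreads the previous context, later extensions can see what earlier ones added without mutating shared state.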

Phase 2: Schema Extension

Extensions add fields to existing types or define new types via SDL.

Interface:

{
  extendSchema(schema, options) {
    const typeDefs = `
      extend type Artist {
        fanArt: FanArtImages
      }
      
      type FanArtImages {
        backgrounds: [FanArtImage]
        logos: [FanArtImage]
      }
    `;
    
    const resolvers = {
      Artist: {
        fanArt(artist, args, context) {
          return context.fanart.loader.load(artist.id);
        }
      }
    };
    
    return extendSchema(schema, { typeDefs, resolvers });
  }
}

Extension Loading

Extensions are loaded via environment variable or programmatic options:

Environment Variable:

GRAPHBRAINZ_EXTENSIONS="cover-art-archive,fanart,mediawiki,theaudiodb"

Programmatic:

import { middleware } from 'graphbrainz';
import lastfm from 'graphbrainz-extension-lastfm';

app.use('/graphql', middleware({
  extensions: [lastfm]
}));
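
The environment-variable form implies splitting a comma-separated list of names. A minimal sketch of that step is shown below; parseExtensionList is a hypothetical helper, and how each name is then resolved to a module is not described in this document, so it is left out.

```javascript
// Split a comma-separated extension list, trimming whitespace and dropping empty entries.
function parseExtensionList(value) {
  if (!value) return [];
  return value
    .split(',')
    .map(name => name.trim())
    .filter(name => name.length > 0);
}

const extensionNames = parseExtensionList('cover-art-archive,fanart,mediawiki,theaudiodb');
```

In a real process the argument would be process.env.GRAPHBRAINZ_EXTENSIONS; an unset variable yields an empty list.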

DataLoader Integration

GraphBrainz uses DataLoader for request batching and deduplication.

Per-Request Batching

Each GraphQL request receives a fresh DataLoader instance. This ensures:

Requests within a single query are batched
Duplicate requests are deduplicated
Cache is scoped to request lifecycle
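
The three properties above can be illustrated with a very small stand-in for a batching loader. This is not the dataloader package and is simpler than it (it batches only keys requested in the same synchronous run); TinyLoader and createRequestLoaders are hypothetical names.

```javascript
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.cache = new Map(); // key -> Promise; lives only as long as this instance
    this.pending = [];      // keys collected before the next dispatch
  }

  load(key) {
    // Deduplicate: a repeated key reuses the promise created the first time.
    if (this.cache.has(key)) return this.cache.get(key);
    const promise = new Promise((resolve, reject) => {
      this.pending.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    // Schedule one dispatch when the first key of a new batch arrives.
    if (this.pending.length === 1) queueMicrotask(() => this.dispatch());
    return promise;
  }

  async dispatch() {
    const batch = this.pending;
    this.pending = [];
    try {
      const values = await this.batchFn(batch.map(item => item.key));
      batch.forEach((item, index) => item.resolve(values[index]));
    } catch (error) {
      batch.forEach(item => item.reject(error));
    }
  }
}

// Called once per GraphQL request so that no cache state leaks between requests.
function createRequestLoaders(fetchArtists) {
  return { artist: new TinyLoader(fetchArtists) };
}
```

Creating the loaders inside the per-request context factory is what scopes the cache to the request lifecycle: when the request ends, the loader and its Map become unreachable.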

Batch Functions

Each entity type has a batch function that:

Receives array of keys (MBIDs or query parameters)
Groups keys by API endpoint
Makes batched HTTP requests
Returns array of results in same order as keys

Example:

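// `got` is assumed to be an HTTP client instance already configured with the
// MusicBrainz base URL and JSON response parsing.
async function batchArtists(keys) {
  return Promise.all(
    keys.map(({ mbid, inc = [] }) => {
      // Omit the inc parameter entirely when no includes were requested.
      const query = inc.length > 0 ? `?inc=${inc.join(',')}` : '';
      return got(`/ws/2/artist/${mbid}${query}`)
        .then(r => r.body)
        // DataLoader expects one value or Error per key, so a single
        // failed request must not reject the whole batch.
        .catch(err => err);
    })
  );
}

// Keys are objects, so a cacheKeyFn is required for deduplication to work.
const artistLoader = new DataLoader(batchArtists, {
  cacheKeyFn: ({ mbid, inc = [] }) => `${mbid}:${[...inc].sort().join(',')}`
});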

LRU Cache Layer

A shared LRU cache sits above the per-request DataLoaders and caches responses across requests.

Configuration

Parameter	Environment Variable	Default
Size	GRAPHBRAINZ_CACHE_SIZE	8192 items
TTL	GRAPHBRAINZ_CACHE_TTL	86400000 ms (1 day)
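
For readers unfamiliar with the combination of a size bound and a TTL, the sketch below shows one way both limits can be enforced with a plain Map. It is not the cache library GraphBrainz uses; the LruCache class and its option names are hypothetical, and only the two default values come from the table above.

```javascript
class LruCache {
  constructor({ maxSize = 8192, ttl = 86400000, now = Date.now } = {}) {
    this.maxSize = maxSize;
    this.ttl = ttl;            // milliseconds
    this.now = now;            // injectable clock, useful for tests
    this.entries = new Map();  // insertion order doubles as recency order
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired entries are dropped lazily on read
      return undefined;
    }
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttl });
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry, which is the first key in the Map.
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}
```

With the defaults, an entry is evicted either when 8192 newer or more recently read entries exist, or one day after it was written, whichever comes first.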

Cache Key Strategy

Cache keys combine entity type, MBID, and inc parameters:

artist:5b11f4ce-a62d-471e-81fc-a69a8278c7da:releases,recordings

This ensures different queries for the same entity don't collide.
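
The key format can be reproduced with a one-line helper. The function name cacheKey is hypothetical; the output format is the one shown in the example key above.

```javascript
// Build "<entityType>:<mbid>:<inc,inc,...>" from its three parts.
function cacheKey(entityType, mbid, inc = []) {
  return [entityType, mbid, inc.join(',')].join(':');
}

const exampleKey = cacheKey(
  'artist',
  '5b11f4ce-a62d-471e-81fc-a69a8278c7da',
  ['releases', 'recordings']
);
```

This sketch keeps the inc values in caller order, matching the example key. Sorting them first would make the key independent of selection order, but the source does not say whether GraphBrainz does that.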

Per-Extension Caches

Each extension maintains its own LRU cache with separate configuration:

FANART_CACHE_SIZE / FANART_CACHE_TTL
THEAUDIODB_CACHE_SIZE / THEAUDIODB_CACHE_TTL
COVERART_CACHE_SIZE / COVERART_CACHE_TTL
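
Since every cache follows the same PREFIX_CACHE_SIZE / PREFIX_CACHE_TTL naming pattern, the settings can be read with one shared helper. The sketch below is an assumption about how that could be done: cacheOptionsFromEnv is a hypothetical name, and falling back to the core defaults (8192 items, 86400000 ms) for extensions is a guess, since the source does not state the extension defaults.

```javascript
// Read "<PREFIX>_CACHE_SIZE" and "<PREFIX>_CACHE_TTL" from an environment-like object.
function cacheOptionsFromEnv(prefix, env, defaults = { size: 8192, ttl: 86400000 }) {
  const readInt = (name, fallback) => {
    const parsed = Number.parseInt(env[name] ?? '', 10);
    // Unset or non-numeric values fall back to the default.
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  return {
    size: readInt(`${prefix}_CACHE_SIZE`, defaults.size),
    ttl: readInt(`${prefix}_CACHE_TTL`, defaults.ttl),
  };
}

const fanartCacheOptions = cacheOptionsFromEnv('FANART', { FANART_CACHE_SIZE: '512' });
```

Passing the environment as an argument (process.env in production) keeps the helper easy to test.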

Rate Limiting

A custom priority queue keeps outgoing requests within each API's rate limit.

MusicBrainz Rate Limits

Limit: 5 requests per 5.5 seconds
Strategy: Token bucket with 5 tokens, refill rate 0.909 tokens/second
Concurrency: 1 (sequential requests)

Extension Rate Limits

Limit: 10 requests per second (default)
Strategy: Token bucket with 10 tokens, refill rate 10 tokens/second
Concurrency: 5 (parallel requests)
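
The token bucket figures above follow from dividing the limit by its window: 5 requests / 5.5 s ≈ 0.909 tokens per second. A minimal time-based bucket using those numbers is sketched below. The TokenBucket class and its option names are hypothetical and independent of the project's own rate limiter.

```javascript
class TokenBucket {
  constructor({ capacity, refillPerSecond, now = Date.now }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;            // injectable clock, useful for tests
    this.tokens = capacity;    // the bucket starts full
    this.lastRefill = now();
  }

  // Add tokens for the time elapsed since the last call, capped at capacity.
  refill() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = current;
  }

  // Take one token if a whole token is available; report whether it succeeded.
  tryRemove() {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const musicbrainzBucket = new TokenBucket({ capacity: 5, refillPerSecond: 5 / 5.5 });
const extensionBucket = new TokenBucket({ capacity: 10, refillPerSecond: 10 });
```

The bucket allows a burst of up to its capacity and then settles at the refill rate, which is why a full MusicBrainz bucket permits five immediate requests but only about one per 1.1 seconds afterwards.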

Priority Queue

Requests are queued with priority levels:

High: Lookup queries (direct MBID access)
Medium: Browse queries (relationship traversal)
Low: Search queries (full-text search)

Higher priority requests are processed first when rate limit is reached.
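
One simple way to get this ordering is to keep a first-in, first-out list per priority level and always drain the highest non-empty level. The sketch below is an illustration under that assumption, not the project's queue; it exposes the enqueue, dequeue, and length members that a rate limiter would need.

```javascript
const PRIORITY_RANK = { high: 0, medium: 1, low: 2 };

class PriorityQueue {
  constructor() {
    // One FIFO list per priority level keeps ordering stable within a level.
    this.buckets = [[], [], []];
  }

  get length() {
    return this.buckets.reduce((total, bucket) => total + bucket.length, 0);
  }

  enqueue(item) {
    // Unknown or missing priorities are treated as medium.
    const rank = PRIORITY_RANK[item.priority] ?? PRIORITY_RANK.medium;
    this.buckets[rank].push(item);
  }

  // Remove and return the oldest item from the highest non-empty level.
  dequeue() {
    for (const bucket of this.buckets) {
      if (bucket.length > 0) return bucket.shift();
    }
    return undefined;
  }
}
```

With only three levels, separate lists are simpler than a heap and preserve arrival order among requests of equal priority.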

Implementation

Location: src/rate-limit.js

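class RateLimiter {
  constructor(options) {
    this.tokens = options.limit;
    this.limit = options.limit;
    // Tokens added per refill() call. As written, this is tokens per unit of
    // options.interval, so refill() is expected to run once per such unit
    // (for example from a timer).
    this.refillRate = options.limit / options.interval;
    // PriorityQueue is the project's custom queue, not shown in this excerpt.
    this.queue = new PriorityQueue();
  }
  
  async acquire(priority = 'medium') {
    // Tokens can be fractional after a refill, so require a whole token.
    if (this.tokens >= 1) {
      this.tokens--;
      return Promise.resolve();
    }
    
    // No token available: wait in the queue until refill() releases us.
    return new Promise(resolve => {
      this.queue.enqueue({ resolve, priority });
    });
  }
  
  refill() {
    this.tokens = Math.min(this.limit, this.tokens + this.refillRate);
    // Release queued requests, highest priority first, one token each.
    while (this.tokens >= 1 && this.queue.length > 0) {
      const { resolve } = this.queue.dequeue();
      this.tokens--;
      resolve();
    }
  }
}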

File Structure

src/
├── index.js                 # Entry point, start() function
├── schema.js                # Schema construction
├── context.js               # Context factory
├── types/                   # Entity type definitions
│   ├── area.js
│   ├── artist.js
│   ├── collection.js
│   ├── disc.js
│   ├── event.js
│   ├── instrument.js
│   ├── label.js
│   ├── place.js
│   ├── recording.js
│   ├── release.js
│   ├── release-group.js
│   ├── series.js
│   ├── tag.js
│   ├── track.js
│   ├── url.js
│   ├── work.js
│   └── relationships.js
├── resolvers/               # Resolver implementations
│   ├── query.js
│   └── subquery.js
├── loaders/                 # DataLoader batch functions
│   └── musicbrainz.js
├── rate-limit.js            # Rate limiter implementation
├── client.js                # Base HTTP client
└── extensions/              # Built-in extensions
    ├── cover-art-archive/
    ├── fanart/
    ├── mediawiki/
    └── theaudiodb/

Relay Compliance

GraphBrainz implements the Relay specification for cursor-based pagination:

Connection Pattern

All list fields return connection types:

type ArtistConnection {
  edges: [ArtistEdge]
  nodes: [Artist]
  pageInfo: PageInfo!
  totalCount: Int
}

type ArtistEdge {
  node: Artist
  cursor: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

Pagination Arguments

first: Int - Number of items to return
after: String - Cursor for pagination
last: Int - Number of items from end (not implemented)
before: String - Cursor for reverse pagination (not implemented)
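
Forward pagination with first and after has to be translated into whatever paging the upstream API supports. The sketch below assumes offset-based paging and an opaque cursor that encodes the item's offset; the cursor format, the default page size of 25, and all function names are assumptions made for illustration rather than GraphBrainz's actual encoding.

```javascript
// Opaque cursors: base64 of "offset:<n>" (an assumed encoding).
function offsetToCursor(offset) {
  return Buffer.from(`offset:${offset}`).toString('base64');
}

function cursorToOffset(cursor) {
  const decoded = Buffer.from(cursor, 'base64').toString('utf8');
  const match = /^offset:(\d+)$/.exec(decoded);
  if (!match) throw new Error(`Invalid cursor: ${cursor}`);
  return Number(match[1]);
}

// Translate first/after into a limit/offset pair; paging starts after the cursor's item.
function toLimitOffset({ first = 25, after } = {}) {
  return { limit: first, offset: after ? cursorToOffset(after) + 1 : 0 };
}

// Wrap one page of results in the connection shape shown above.
function toConnection(items, offset, totalCount) {
  const edges = items.map((node, index) => ({
    node,
    cursor: offsetToCursor(offset + index),
  }));
  return {
    edges,
    nodes: items,
    totalCount,
    pageInfo: {
      hasNextPage: offset + items.length < totalCount,
      hasPreviousPage: offset > 0,
      startCursor: edges.length > 0 ? edges[0].cursor : null,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
  };
}
```

A client fetches the next page by passing the previous page's endCursor as after, so it never needs to know that offsets are used underneath.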

Node Interface

Global object identification via node(id: ID!) query:

interface Node {
  id: ID!
}

All entity types implement the Node interface with globally unique IDs.
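
An MBID alone does not say which entity type it belongs to, so a global ID usually carries the type name as well. The sketch below uses the common Relay convention of base64-encoding "TypeName:id". Whether GraphBrainz uses exactly this encoding is an assumption, and toGlobalId and fromGlobalId are written out here rather than imported from any library.

```javascript
// Encode a type name and MBID into one opaque, globally unique ID.
function toGlobalId(typeName, mbid) {
  return Buffer.from(`${typeName}:${mbid}`).toString('base64');
}

// Decode a global ID back into its type name and MBID.
function fromGlobalId(globalId) {
  const decoded = Buffer.from(globalId, 'base64').toString('utf8');
  const separator = decoded.indexOf(':');
  if (separator === -1) throw new Error(`Invalid global ID: ${globalId}`);
  return {
    typeName: decoded.slice(0, separator),
    mbid: decoded.slice(separator + 1),
  };
}

const artistGlobalId = toGlobalId('Artist', '5b11f4ce-a62d-471e-81fc-a69a8278c7da');
const decodedArtistId = fromGlobalId(artistGlobalId);
```

A node(id: ID!) resolver can decode the ID this way and dispatch to the lookup for the matching entity type.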
