# GraphBrainz Architecture

## Schema Construction Strategy

GraphBrainz employs a hybrid schema construction approach:

- **Core Schema**: Programmatic construction using GraphQL.js constructors
- **Extensions**: SDL (Schema Definition Language) strings merged via `extendSchema()`

This strategy provides type safety and runtime flexibility for the core while allowing extensions to use the more ergonomic SDL syntax.

### Why Programmatic Construction?

| Benefit | Description |
|---------|-------------|
| Type Safety | Compile-time validation of schema structure |
| Dynamic Fields | Runtime field generation based on configuration |
| AST Inspection | Direct access to GraphQL AST for resolver optimization |
| Extension Points | Programmatic hooks for schema modification |

## Entity Type System

GraphBrainz defines 17 entity types in `src/types/` (~2000 lines of code):

| Entity Type | File Path | Purpose |
|-------------|-----------|---------|
| Area | src/types/area.js | Geographic regions |
| Artist | src/types/artist.js | Musicians and groups |
| Collection | src/types/collection.js | User-curated lists |
| Disc | src/types/disc.js | Physical media |
| Event | src/types/event.js | Concerts and performances |
| Instrument | src/types/instrument.js | Musical instruments |
| Label | src/types/label.js | Record labels |
| Place | src/types/place.js | Venues and locations |
| Recording | src/types/recording.js | Audio recordings |
| Release | src/types/release.js | Album releases |
| ReleaseGroup | src/types/release-group.js | Release groupings |
| Series | src/types/series.js | Ordered collections |
| Tag | src/types/tag.js | User-generated tags |
| Track | src/types/track.js | Individual tracks |
| URL | src/types/url.js | External links |
| Work | src/types/work.js | Musical compositions |
| Relationships | src/types/relationships.js | Entity connections |

Each type file exports a GraphQL object type with field definitions, resolvers, and relationship mappings.

## Query Type Hierarchy

GraphBrainz exposes four primary query patterns:

### 1. Lookup Queries

Direct entity retrieval by MusicBrainz ID (MBID).

**Supported Entities**: 13 types

```
lookup {
  area(mbid: String!)
  artist(mbid: String!)
  collection(mbid: String!)
  event(mbid: String!)
  instrument(mbid: String!)
  label(mbid: String!)
  place(mbid: String!)
  recording(mbid: String!)
  release(mbid: String!)
  releaseGroup(mbid: String!)
  series(mbid: String!)
  url(mbid: String!)
  work(mbid: String!)
}
```

### 2. Browse Queries

Retrieve entities linked to a parent entity with cursor-based pagination.

**Supported Entities**: 9 types

```
browse {
  areas(collection: String, first: Int, after: String)
  artists(area: String, collection: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  collections(area: String, artist: String, editor: String, event: String, label: String, place: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  events(area: String, artist: String, collection: String, place: String, first: Int, after: String)
  labels(area: String, collection: String, release: String, first: Int, after: String)
  places(area: String, collection: String, first: Int, after: String)
  recordings(artist: String, collection: String, release: String, first: Int, after: String)
  releases(area: String, artist: String, collection: String, label: String, recording: String, releaseGroup: String, track: String, trackArtist: String, first: Int, after: String)
  releaseGroups(artist: String, collection: String, release: String, first: Int, after: String)
}
```

### 3. Search Queries

Lucene-based full-text search across entity types.

**Supported Entities**: 10 types

```
search {
  areas(query: String!, first: Int, after: String)
  artists(query: String!, first: Int, after: String)
  events(query: String!, first: Int, after: String)
  instruments(query: String!, first: Int, after: String)
  labels(query: String!, first: Int, after: String)
  places(query: String!, first: Int, after: String)
  recordings(query: String!, first: Int, after: String)
  releases(query: String!, first: Int, after: String)
  releaseGroups(query: String!, first: Int, after: String)
  works(query: String!, first: Int, after: String)
}
```

### 4. Node Query (Relay)

Global object identification via Relay-compliant node interface.

```
node(id: ID!)
```

## Resolver Architecture

GraphBrainz implements a three-tier resolver structure:

### Tier 1: Query Resolvers

Entry points for lookup, browse, search, and node queries.

Responsibilities:

- Validate input parameters
- Construct MusicBrainz API URLs
- Delegate to DataLoader
- Return raw API responses

**Location**: `src/resolvers/query.js`

### Tier 2: Field Resolvers

Resolve individual fields on entity types.

Responsibilities:

- Extract field values from parent object
- Trigger subqueries for related entities
- Apply field-level transformations
- Handle null/undefined cases

**Location**: `src/types/*.js` (per entity type)

### Tier 3: Subquery Resolvers

Handle nested entity relationships.

Responsibilities:

- Inspect GraphQL AST for required fields
- Determine MusicBrainz `inc` parameters
- Batch related entity requests
- Resolve circular dependencies

**Location**: `src/resolvers/subquery.js`

## AST Inspection for Query Optimization

GraphBrainz resolvers inspect the GraphQL AST to determine which MusicBrainz `inc` parameters are needed. This eliminates over-fetching and under-fetching.

### Example

**GraphQL Query**:

```graphql
{
  lookup {
    artist(mbid: "5b11f4ce-a62d-471e-81fc-a69a8278c7da") {
      name
      releases {
        title
        date
      }
    }
  }
}
```

**AST Inspection Result**:

- Detects `releases` field in selection set
- Adds `inc=releases` to MusicBrainz API request
- Avoids fetching recordings, works, or other unneeded relationships

**MusicBrainz API Call**:

```
GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases
```

### Implementation

AST inspection occurs in resolver functions via `info.fieldNodes`:

```javascript
function resolveArtist(parent, args, context, info) {
  // Direct selections on the `artist` field. Fields reached only through
  // fragment spreads are not expanded in this simplified version.
  const selections = info.fieldNodes[0].selectionSet.selections;
  const inc = [];

  for (const selection of selections) {
    // Inline fragments have no `name`, so only plain fields are inspected.
    if (selection.kind !== 'Field') continue;
    if (selection.name.value === 'releases') {
      inc.push('releases');
    }
    if (selection.name.value === 'recordings') {
      inc.push('recordings');
    }
  }

  return context.loaders.artist.load({ mbid: args.mbid, inc });
}
```

## Extension System

Extensions modify the schema and context in two phases:

### Phase 1: Context Extension

Extensions add custom HTTP clients, DataLoaders, and caches to the GraphQL context.

**Interface**:

```javascript
// `extensionName`, `ExtensionClient`, and `batchFn` are placeholders that
// each extension supplies.
const extension = {
  extendContext(context, options) {
    return {
      ...context,
      [extensionName]: {
        client: new ExtensionClient(options),
        loader: new DataLoader(batchFn),
        cache: new LRUCache(options)
      }
    };
  }
};
```

### Phase 2: Schema Extension

Extensions add fields to existing types or define new types via SDL.

**Interface**:

```javascript
const extension = {
  extendSchema(schema, options) {
    // `FanArtImage` is assumed to be defined elsewhere in the extension's SDL.
    const typeDefs = `
      extend type Artist {
        fanArt: FanArtImages
      }

      type FanArtImages {
        backgrounds: [FanArtImage]
        logos: [FanArtImage]
      }
    `;

    const resolvers = {
      Artist: {
        fanArt(artist, args, context) {
          return context.fanart.loader.load(artist.id);
        }
      }
    };

    // This call assumes a helper that accepts SDL plus resolvers. GraphQL.js's
    // own `extendSchema` takes a parsed document AST and no resolver map.
    return extendSchema(schema, { typeDefs, resolvers });
  }
};
```

### Extension Loading

Extensions are loaded via environment variable or programmatic options:

**Environment Variable**:

```bash
GRAPHBRAINZ_EXTENSIONS="cover-art-archive,fanart,mediawiki,theaudiodb"
```

**Programmatic**:

```javascript
import express from 'express';
import { middleware } from 'graphbrainz';
import lastfm from 'graphbrainz-extension-lastfm';

// Assumes an Express application.
const app = express();

app.use('/graphql', middleware({ extensions: [lastfm] }));
```

## DataLoader Integration

GraphBrainz uses DataLoader for request batching and deduplication.

### Per-Request Batching

Each GraphQL request receives a fresh DataLoader instance. This ensures:

- Requests within a single query are batched
- Duplicate requests are deduplicated
- Cache is scoped to request lifecycle

### Batch Functions

Each entity type has a batch function that:

1. Receives array of keys (MBIDs or query parameters)
2. Groups keys by API endpoint
3. Makes batched HTTP requests
4. Returns array of results in same order as keys

**Example**:

```javascript
// `got` is assumed to be a client preconfigured with the MusicBrainz base URL
// and JSON response parsing.
async function batchArtists(keys) {
  // One request per key; Promise.all preserves key order, as DataLoader requires.
  const results = await Promise.all(
    keys.map(key =>
      got(`/ws/2/artist/${key.mbid}?inc=${key.inc.join(',')}`)
    )
  );
  return results.map(r => r.body);
}

// Keys are objects, so a string cache key is needed for deduplication to work.
const artistLoader = new DataLoader(batchArtists, {
  cacheKeyFn: key => `${key.mbid}:${key.inc.join(',')}`
});
```

## LRU Cache Layer

A shared LRU cache sits above DataLoader and caches responses across requests.

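As a rough illustration of how a shared cache could sit in front of a per-request loader, here is a minimal sketch. `TtlLruCache` and `cachedLoad` are hypothetical names, not GraphBrainz's actual implementation, and the sketch assumes that `Map` insertion order is an acceptable stand-in for recency tracking.

```javascript
// Hypothetical sketch: a tiny LRU cache with TTL, built on Map insertion order.
class TtlLruCache {
  constructor({ max, ttl, now = Date.now }) {
    this.max = max;         // maximum number of entries
    this.ttl = ttl;         // time to live, in the same unit as now()
    this.now = now;         // injectable clock, useful for testing
    this.entries = new Map();
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expires) {
      this.entries.delete(key);
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, expires: this.now() + this.ttl });
    if (this.entries.size > this.max) {
      // The first key in a Map is the least recently used one.
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}

// Check the shared cache first; fall back to the per-request loader on a miss.
async function cachedLoad(cache, loader, key) {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await loader.load(key);
  cache.set(key, value);
  return value;
}
```

With the defaults listed under Configuration, such a cache would be created as `new TtlLruCache({ max: 8192, ttl: 86400000 })` and shared by every request, while each request still gets its own loader.
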
### Configuration

| Parameter | Environment Variable | Default |
|-----------|---------------------|---------|
| Size | GRAPHBRAINZ_CACHE_SIZE | 8192 items |
| TTL | GRAPHBRAINZ_CACHE_TTL | 86400000 ms (1 day) |

### Cache Key Strategy

Cache keys combine entity type, MBID, and `inc` parameters:

```
artist:5b11f4ce-a62d-471e-81fc-a69a8278c7da:releases,recordings
```

This ensures that queries requesting different `inc` sets for the same entity do not collide.

### Per-Extension Caches

Each extension maintains its own LRU cache with separate configuration:

- `FANART_CACHE_SIZE` / `FANART_CACHE_TTL`
- `THEAUDIODB_CACHE_SIZE` / `THEAUDIODB_CACHE_TTL`
- `COVERART_CACHE_SIZE` / `COVERART_CACHE_TTL`

## Rate Limiting

A custom priority queue implementation keeps outgoing requests within upstream API rate limits.

### MusicBrainz Rate Limits

- **Limit**: 5 requests per 5.5 seconds
- **Strategy**: Token bucket with 5 tokens, refill rate 0.909 tokens/second
- **Concurrency**: 1 (sequential requests)

### Extension Rate Limits

- **Limit**: 10 requests per second (default)
- **Strategy**: Token bucket with 10 tokens, refill rate 10 tokens/second
- **Concurrency**: 5 (parallel requests)

### Priority Queue

Requests are queued with priority levels:

1. **High**: Lookup queries (direct MBID access)
2. **Medium**: Browse queries (relationship traversal)
3. **Low**: Search queries (full-text search)

Higher priority requests are processed first when the rate limit is reached.

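The ordering above can be sketched with a small queue that exposes `enqueue`, `dequeue`, and `length`. This is an illustrative assumption, not the project's code: the `PRIORITY_RANK` table, the first-in, first-out tie-break, and the sample request names are all hypothetical.

```javascript
// Hypothetical rank table: a lower number is served first.
const PRIORITY_RANK = { high: 0, medium: 1, low: 2 };

class PriorityQueue {
  constructor() {
    this.items = [];
    this.counter = 0; // tie-breaker so equal priorities stay first-in, first-out
  }

  get length() {
    return this.items.length;
  }

  enqueue(item) {
    // Unknown priorities fall back to medium.
    const rank = PRIORITY_RANK[item.priority] ?? PRIORITY_RANK.medium;
    this.items.push({ item, rank, seq: this.counter++ });
    // Sort by rank first, then by arrival order.
    this.items.sort((a, b) => a.rank - b.rank || a.seq - b.seq);
  }

  dequeue() {
    const entry = this.items.shift();
    return entry ? entry.item : undefined;
  }
}

const queue = new PriorityQueue();
queue.enqueue({ name: 'search artists', priority: 'low' });
queue.enqueue({ name: 'browse releases', priority: 'medium' });
queue.enqueue({ name: 'lookup artist', priority: 'high' });
queue.enqueue({ name: 'lookup release', priority: 'high' });

const order = [];
while (queue.length > 0) order.push(queue.dequeue().name);
console.log(order);
// → [ 'lookup artist', 'lookup release', 'browse releases', 'search artists' ]
```

Sorting on every insert keeps the sketch short. A binary heap would be the usual choice if the queue were expected to grow large.
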
### Implementation

**Location**: `src/rate-limit.js`

```javascript
class RateLimiter {
  constructor(options) {
    this.tokens = options.limit;
    this.limit = options.limit;
    // Tokens added per unit of `options.interval`; refill() is expected to be
    // called once per that unit (for example, from a timer).
    this.refillRate = options.limit / options.interval;
    this.queue = new PriorityQueue();
  }

  async acquire(priority = 'medium') {
    // Tokens can be fractional after a refill, so require a whole token.
    if (this.tokens >= 1) {
      this.tokens--;
      return;
    }

    return new Promise(resolve => {
      this.queue.enqueue({ resolve, priority });
    });
  }

  refill() {
    this.tokens = Math.min(this.limit, this.tokens + this.refillRate);

    // Release queued requests while whole tokens remain.
    while (this.tokens >= 1 && this.queue.length > 0) {
      const { resolve } = this.queue.dequeue();
      this.tokens--;
      resolve();
    }
  }
}
```

## File Structure

```
src/
├── index.js              # Entry point, start() function
├── schema.js             # Schema construction
├── context.js            # Context factory
├── types/                # Entity type definitions
│   ├── area.js
│   ├── artist.js
│   ├── collection.js
│   ├── disc.js
│   ├── event.js
│   ├── instrument.js
│   ├── label.js
│   ├── place.js
│   ├── recording.js
│   ├── release.js
│   ├── release-group.js
│   ├── series.js
│   ├── tag.js
│   ├── track.js
│   ├── url.js
│   ├── work.js
│   └── relationships.js
├── resolvers/            # Resolver implementations
│   ├── query.js
│   └── subquery.js
├── loaders/              # DataLoader batch functions
│   └── musicbrainz.js
├── rate-limit.js         # Rate limiter implementation
├── client.js             # Base HTTP client
└── extensions/           # Built-in extensions
    ├── cover-art-archive/
    ├── fanart/
    ├── mediawiki/
    └── theaudiodb/
```

## Relay Compliance

GraphBrainz implements the Relay specification for cursor-based pagination:

### Connection Pattern

All list fields return connection types:

```graphql
type ArtistConnection {
  edges: [ArtistEdge]
  nodes: [Artist]
  pageInfo: PageInfo!
  totalCount: Int
}

type ArtistEdge {
  node: Artist
  cursor: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
```

### Pagination Arguments

- `first: Int` - Number of items to return
- `after: String` - Cursor for pagination
- `last: Int` - Number of items from end (not implemented)
- `before: String` - Cursor for reverse pagination (not implemented)

### Node Interface

Global object identification via `node(id: ID!)` query:

```graphql
interface Node {
  id: ID!
}
```

All entity types implement the Node interface with globally unique IDs.
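
To make global IDs and cursors more concrete, here is a sketch of one common encoding: a base64-encoded `Type:id` string for node IDs and a base64-wrapped offset for cursors. Whether GraphBrainz uses exactly this layout is an assumption. The helper names, the `offset:` cursor prefix, and the default page size of 25 are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helpers; the "Type:id" base64 layout is an assumed convention.
function toGlobalId(type, id) {
  return Buffer.from(`${type}:${id}`, 'utf8').toString('base64');
}

function fromGlobalId(globalId) {
  const decoded = Buffer.from(globalId, 'base64').toString('utf8');
  const separator = decoded.indexOf(':');
  return {
    type: decoded.slice(0, separator),
    id: decoded.slice(separator + 1)
  };
}

// Cursors are opaque to clients; here they wrap a zero-based offset.
function offsetToCursor(offset) {
  return Buffer.from(`offset:${offset}`, 'utf8').toString('base64');
}

function cursorToOffset(cursor) {
  const decoded = Buffer.from(cursor, 'base64').toString('utf8');
  return Number(decoded.split(':')[1]);
}

// Translate Relay-style `first`/`after` into limit/offset paging parameters.
function toPageParams({ first = 25, after } = {}) {
  const offset = after ? cursorToOffset(after) + 1 : 0;
  return { limit: first, offset };
}

const globalId = toGlobalId('Artist', '5b11f4ce-a62d-471e-81fc-a69a8278c7da');
console.log(fromGlobalId(globalId).type); // → Artist

const page = toPageParams({ first: 10, after: offsetToCursor(24) });
console.log(page); // → { limit: 10, offset: 25 }
```

A `node(id: ID!)` resolver built on this scheme would decode the ID, then dispatch to the lookup loader for the decoded type.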