# GraphBrainz Architecture

## Schema Construction Strategy

GraphBrainz employs a hybrid schema construction approach:

- **Core Schema**: Programmatic construction using GraphQL.js constructors
- **Extensions**: SDL (Schema Definition Language) strings merged via `extendSchema()`

This strategy provides type safety and runtime flexibility for the core while allowing extensions to use the more ergonomic SDL syntax.

### Why Programmatic Construction?

| Benefit | Description |
|---------|-------------|
| Type Safety | Compile-time validation of schema structure |
| Dynamic Fields | Runtime field generation based on configuration |
| AST Inspection | Direct access to GraphQL AST for resolver optimization |
| Extension Points | Programmatic hooks for schema modification |

## Entity Type System

GraphBrainz defines 17 entity types in `src/types/` (~2000 lines of code):

| Entity Type | File Path | Purpose |
|-------------|-----------|---------|
| Area | src/types/area.js | Geographic regions |
| Artist | src/types/artist.js | Musicians and groups |
| Collection | src/types/collection.js | User-curated lists |
| Disc | src/types/disc.js | Physical media |
| Event | src/types/event.js | Concerts and performances |
| Instrument | src/types/instrument.js | Musical instruments |
| Label | src/types/label.js | Record labels |
| Place | src/types/place.js | Venues and locations |
| Recording | src/types/recording.js | Audio recordings |
| Release | src/types/release.js | Album releases |
| ReleaseGroup | src/types/release-group.js | Release groupings |
| Series | src/types/series.js | Ordered collections |
| Tag | src/types/tag.js | User-generated tags |
| Track | src/types/track.js | Individual tracks |
| URL | src/types/url.js | External links |
| Work | src/types/work.js | Musical compositions |
| Relationships | src/types/relationships.js | Entity connections |

Each type file exports a GraphQL object type with field definitions, resolvers, and relationship mappings.

## Query Type Hierarchy

GraphBrainz exposes four primary query patterns:

### 1. Lookup Queries

Direct entity retrieval by MusicBrainz ID (MBID).

**Supported Entities**: 13 types

```
lookup {
  area(mbid: String!)
  artist(mbid: String!)
  collection(mbid: String!)
  event(mbid: String!)
  instrument(mbid: String!)
  label(mbid: String!)
  place(mbid: String!)
  recording(mbid: String!)
  release(mbid: String!)
  releaseGroup(mbid: String!)
  series(mbid: String!)
  url(mbid: String!)
  work(mbid: String!)
}
```

### 2. Browse Queries

Retrieve entities linked to a parent entity with cursor-based pagination.

**Supported Entities**: 9 types

```
browse {
  areas(collection: String, first: Int, after: String)
  artists(area: String, collection: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  collections(area: String, artist: String, editor: String, event: String, label: String, place: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  events(area: String, artist: String, collection: String, place: String, first: Int, after: String)
  labels(area: String, collection: String, release: String, first: Int, after: String)
  places(area: String, collection: String, first: Int, after: String)
  recordings(artist: String, collection: String, release: String, first: Int, after: String)
  releases(area: String, artist: String, collection: String, label: String, recording: String, releaseGroup: String, track: String, trackArtist: String, first: Int, after: String)
  releaseGroups(artist: String, collection: String, release: String, first: Int, after: String)
}
```

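
As a rough illustration of how a browse query might translate into a MusicBrainz request, the sketch below builds a browse URL from one linked-entity argument plus paging values. The `buildBrowseUrl` helper, the `LINK_PARAMS` table, and the rule that exactly one linked entity must be supplied are assumptions made for this example, not GraphBrainz's actual code; `limit`, `offset`, and `fmt` are standard MusicBrainz web service parameters.

```javascript
// Hypothetical mapping from GraphQL argument names to MusicBrainz parameters.
const LINK_PARAMS = new Map([
  ['area', 'area'],
  ['artist', 'artist'],
  ['collection', 'collection'],
  ['label', 'label'],
  ['recording', 'recording'],
  ['release', 'release'],
  ['releaseGroup', 'release-group'],
  ['track', 'track'],
  ['trackArtist', 'track_artist'],
  ['work', 'work']
]);

function buildBrowseUrl(entity, args, { limit = 25, offset = 0 } = {}) {
  // Keep only the linked-entity arguments that were actually supplied.
  const links = [...LINK_PARAMS.keys()].filter(name => args[name] != null);
  if (links.length !== 1) {
    throw new Error('Expected exactly one linked entity argument');
  }
  const [name] = links;
  const params = new URLSearchParams({
    [LINK_PARAMS.get(name)]: args[name],
    limit: String(limit),
    offset: String(offset),
    fmt: 'json'
  });
  return `/ws/2/${entity}?${params}`;
}

console.log(
  buildBrowseUrl('release', { artist: '5b11f4ce-a62d-471e-81fc-a69a8278c7da' })
);
```

Rejecting zero or multiple linked entities up front keeps an ambiguous query from reaching the rate-limited API.
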
### 3. Search Queries

Lucene-based full-text search across entity types.

**Supported Entities**: 10 types

```
search {
  areas(query: String!, first: Int, after: String)
  artists(query: String!, first: Int, after: String)
  events(query: String!, first: Int, after: String)
  instruments(query: String!, first: Int, after: String)
  labels(query: String!, first: Int, after: String)
  places(query: String!, first: Int, after: String)
  recordings(query: String!, first: Int, after: String)
  releases(query: String!, first: Int, after: String)
  releaseGroups(query: String!, first: Int, after: String)
  works(query: String!, first: Int, after: String)
}
```

### 4. Node Query (Relay)

Global object identification via Relay-compliant node interface.

```
node(id: ID!)
```

## Resolver Architecture

GraphBrainz implements a three-tier resolver structure:

### Tier 1: Query Resolvers

Entry points for lookup, browse, search, and node queries. Responsibilities:

- Validate input parameters
- Construct MusicBrainz API URLs
- Delegate to DataLoader
- Return raw API responses

**Location**: `src/resolvers/query.js`

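
The first and third responsibilities above (validation and delegation) could look roughly like the sketch below. The names `assertValidMbid` and `createLookupResolver`, and the shape of `context.loaders`, are hypothetical and chosen only for illustration; the contents of `src/resolvers/query.js` are not shown in this document.

```javascript
// MBIDs are UUIDs: 8-4-4-4-12 hexadecimal digits.
const MBID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function assertValidMbid(mbid) {
  if (typeof mbid !== 'string' || !MBID_PATTERN.test(mbid)) {
    throw new Error(`Invalid MBID: ${mbid}`);
  }
}

// Hypothetical lookup resolver factory: validate first, then delegate.
function createLookupResolver(entity) {
  return function resolveLookup(parent, args, context) {
    assertValidMbid(args.mbid);
    // `context.loaders[entity]` is assumed to expose a DataLoader-style load().
    return context.loaders[entity].load({ mbid: args.mbid, inc: [] });
  };
}
```

Validating before calling the loader means a malformed ID fails immediately instead of consuming a rate-limited request.
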
### Tier 2: Field Resolvers

Resolve individual fields on entity types. Responsibilities:

- Extract field values from parent object
- Trigger subqueries for related entities
- Apply field-level transformations
- Handle null/undefined cases

**Location**: `src/types/*.js` (per entity type)

### Tier 3: Subquery Resolvers

Handle nested entity relationships. Responsibilities:

- Inspect GraphQL AST for required fields
- Determine MusicBrainz `inc` parameters
- Batch related entity requests
- Resolve circular dependencies

**Location**: `src/resolvers/subquery.js`

## AST Inspection for Query Optimization

GraphBrainz resolvers inspect the GraphQL AST to determine which MusicBrainz `inc` parameters are needed. This avoids requesting relationships the query never reads (over-fetching) and avoids follow-up requests for relationships that were left out (under-fetching).

### Example

**GraphQL Query**:

```graphql
{
  lookup {
    artist(mbid: "5b11f4ce-a62d-471e-81fc-a69a8278c7da") {
      name
      releases {
        title
        date
      }
    }
  }
}
```

**AST Inspection Result**:

- Detects `releases` field in selection set
- Adds `inc=releases` to MusicBrainz API request
- Avoids fetching recordings, works, or other unneeded relationships

**MusicBrainz API Call**:

```
GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases
```

### Implementation

AST inspection occurs in resolver functions via `info.fieldNodes`:

```javascript
function resolveArtist(parent, args, context, info) {
  // Selections requested on the `artist` field of the incoming query.
  const selections = info.fieldNodes[0].selectionSet.selections;
  const inc = [];

  for (const selection of selections) {
    // Inline fragments have no `name`, so only plain fields are matched here.
    if (selection.kind !== 'Field') continue;

    if (selection.name.value === 'releases') {
      inc.push('releases');
    }
    if (selection.name.value === 'recordings') {
      inc.push('recordings');
    }
  }

  // Only the relationships named in the query end up in `inc`.
  return context.loaders.artist.load({ mbid: args.mbid, inc });
}
```

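
A query may also request relationship fields through fragments, which a flat loop over the top-level selections does not see. The sketch below shows one way the same idea could be generalized; `collectIncludes` and the `FIELD_TO_INC` table are hypothetical names, and the mapping covers only a few artist relationships for illustration. It works on the plain AST node shapes that GraphQL.js exposes (`Field`, `InlineFragment`, `FragmentSpread`).

```javascript
// Hypothetical mapping from GraphQL field names to MusicBrainz `inc` values.
const FIELD_TO_INC = new Map([
  ['releases', 'releases'],
  ['recordings', 'recordings'],
  ['releaseGroups', 'release-groups'],
  ['works', 'works']
]);

function collectIncludes(selectionSet, fragments = {}, found = new Set()) {
  for (const selection of selectionSet.selections) {
    if (selection.kind === 'Field') {
      const inc = FIELD_TO_INC.get(selection.name.value);
      if (inc) found.add(inc);
    } else if (selection.kind === 'InlineFragment') {
      collectIncludes(selection.selectionSet, fragments, found);
    } else if (selection.kind === 'FragmentSpread') {
      // Named fragments are looked up by name, as in `info.fragments`.
      const fragment = fragments[selection.name.value];
      if (fragment) collectIncludes(fragment.selectionSet, fragments, found);
    }
  }
  return [...found];
}
```

Inside a resolver this would be called as `collectIncludes(info.fieldNodes[0].selectionSet, info.fragments)`. Using a `Set` keeps each `inc` value unique even when a field appears both directly and inside a fragment.
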
## Extension System

Extensions modify the schema and context in two phases:

### Phase 1: Context Extension

Extensions add custom HTTP clients, DataLoaders, and caches to the GraphQL context.

**Interface**:

```javascript
{
  extendContext(context, options) {
    return {
      ...context,
      [extensionName]: {
        client: new ExtensionClient(options),
        loader: new DataLoader(batchFn),
        cache: new LRUCache(options)
      }
    };
  }
}
```

### Phase 2: Schema Extension

Extensions add fields to existing types or define new types via SDL.

**Interface**:

```javascript
{
  extendSchema(schema, options) {
    const typeDefs = `
      extend type Artist {
        fanArt: FanArtImages
      }

      type FanArtImages {
        backgrounds: [FanArtImage]
        logos: [FanArtImage]
      }
    `;

    const resolvers = {
      Artist: {
        fanArt(artist, args, context) {
          return context.fanart.loader.load(artist.id);
        }
      }
    };

    return extendSchema(schema, { typeDefs, resolvers });
  }
}
```

### Extension Loading

Extensions are loaded via environment variable or programmatic options:

**Environment Variable**:

```bash
GRAPHBRAINZ_EXTENSIONS="cover-art-archive,fanart,mediawiki,theaudiodb"
```

**Programmatic**:

```javascript
import { middleware } from 'graphbrainz';
import lastfm from 'graphbrainz-extension-lastfm';

app.use('/graphql', middleware({
  extensions: [lastfm]
}));
```

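
For the environment-variable route, the comma-separated value has to be split into individual extension names before anything can be loaded. A minimal sketch of that step follows; `parseExtensionList` is a hypothetical helper, and how GraphBrainz actually resolves each name to a module is not covered here.

```javascript
// Split a comma-separated extension list, dropping blanks and duplicates.
function parseExtensionList(value) {
  if (!value) return [];
  const names = value
    .split(',')
    .map(name => name.trim())
    .filter(Boolean);
  return [...new Set(names)];
}

console.log(parseExtensionList(process.env.GRAPHBRAINZ_EXTENSIONS));
```

Trimming and de-duplicating makes the setting tolerant of stray spaces or a name listed twice.
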
## DataLoader Integration

GraphBrainz uses DataLoader for request batching and deduplication.

### Per-Request Batching

Each GraphQL request receives a fresh DataLoader instance. This ensures:

- Requests within a single query are batched
- Duplicate requests are deduplicated
- Cache is scoped to request lifecycle

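
To make the three guarantees above concrete, the sketch below implements a deliberately simplified stand-in for DataLoader and a context factory that creates a fresh loader per request. It is not the DataLoader library and not GraphBrainz's code; `createLoader` and `createContext` are hypothetical names, keys are assumed to be primitives such as strings, and `batchFn` is assumed to return a promise of results in key order.

```javascript
// Simplified stand-in for DataLoader, for illustration only.
function createLoader(batchFn) {
  const cache = new Map(); // key -> promise, scoped to this loader instance
  let pending = [];        // entries waiting for the next batch

  function dispatch() {
    const batch = pending;
    pending = [];
    batchFn(batch.map(entry => entry.key)).then(
      values => batch.forEach((entry, i) => entry.resolve(values[i])),
      error => batch.forEach(entry => entry.reject(error))
    );
  }

  return {
    load(key) {
      if (cache.has(key)) return cache.get(key); // deduplicate repeated keys
      const promise = new Promise((resolve, reject) => {
        // The first key of a batch schedules a single dispatch.
        if (pending.length === 0) queueMicrotask(dispatch);
        pending.push({ key, resolve, reject });
      });
      cache.set(key, promise);
      return promise;
    }
  };
}

// Hypothetical context factory: call once per incoming GraphQL request.
function createContext(batchArtists) {
  return { loaders: { artist: createLoader(batchArtists) } };
}
```

Because the cache lives inside the loader, creating a new context per request is what scopes the cache to the request lifecycle.
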
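### Batch Functions

Each entity type has a batch function that:

1. Receives an array of keys (MBIDs or query parameters)
2. Groups keys by API endpoint
3. Makes batched HTTP requests
4. Returns an array of results in the same order as the keys

**Example**:

```javascript
import DataLoader from 'dataloader';
import got from 'got';

// Base URL is an assumption; the original snippet used a relative path.
const BASE_URL = 'https://musicbrainz.org';

async function batchArtists(keys) {
  // Results must be returned in the same order as `keys`.
  return Promise.all(
    keys.map(key =>
      got(`${BASE_URL}/ws/2/artist/${key.mbid}?inc=${key.inc.join(',')}`, {
        responseType: 'json'
      }).then(r => r.body, err => err) // a failed key resolves to an Error
    )
  );
}

// Keys are objects, so a string cache key is needed for deduplication.
const artistLoader = new DataLoader(batchArtists, {
  cacheKeyFn: key => `${key.mbid}:${key.inc.join(',')}`
});
```
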
## LRU Cache Layer

A shared LRU cache sits above DataLoader for cross-request caching.

### Configuration

| Parameter | Environment Variable | Default |
|-----------|---------------------|---------|
| Size | GRAPHBRAINZ_CACHE_SIZE | 8192 items |
| TTL | GRAPHBRAINZ_CACHE_TTL | 86400000 ms (1 day) |

### Cache Key Strategy

Cache keys combine entity type, MBID, and `inc` parameters:

```
artist:5b11f4ce-a62d-471e-81fc-a69a8278c7da:releases,recordings
```

This ensures that queries for the same entity with different `inc` parameters don't collide.

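
The key format and the size/TTL settings can be illustrated with a small self-contained cache. This is a sketch under stated assumptions rather than the cache GraphBrainz uses: `cacheKey` and `TtlLruCache` are hypothetical names, and sorting the `inc` values (so that argument order does not produce distinct keys) is an added assumption. The defaults mirror the configuration table above.

```javascript
// Build a key like "artist:<mbid>:recordings,releases".
// Sorting `inc` (an assumption) makes the key independent of argument order.
function cacheKey(type, mbid, inc = []) {
  return `${type}:${mbid}:${[...inc].sort().join(',')}`;
}

// Minimal LRU cache with TTL; a Map iterates in insertion order.
class TtlLruCache {
  constructor({ max = 8192, ttl = 86400000 } = {}) {
    this.max = max;
    this.ttl = ttl;
    this.entries = new Map();
  }

  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (entry.expires <= now) return undefined; // expired: drop it
    this.entries.set(key, entry);               // re-insert as most recent
    return entry.value;
  }

  set(key, value, now = Date.now()) {
    this.entries.delete(key);
    this.entries.set(key, { value, expires: now + this.ttl });
    if (this.entries.size > this.max) {
      // The first key in iteration order is the least recently used.
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}
```

The optional `now` parameter exists only so that expiry can be exercised deterministically in tests.
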
### Per-Extension Caches

Each extension maintains its own LRU cache with separate configuration:

- `FANART_CACHE_SIZE` / `FANART_CACHE_TTL`
- `THEAUDIODB_CACHE_SIZE` / `THEAUDIODB_CACHE_TTL`
- `COVERART_CACHE_SIZE` / `COVERART_CACHE_TTL`

## Rate Limiting

A custom priority queue implementation keeps outgoing requests within each API's rate limits.

### MusicBrainz Rate Limits

- **Limit**: 5 requests per 5.5 seconds
- **Strategy**: Token bucket with 5 tokens, refill rate 0.909 tokens/second
- **Concurrency**: 1 (sequential requests)

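
The refill rate follows from the limit: 5 tokens / 5.5 seconds ≈ 0.909 tokens per second. The sketch below shows a token bucket that refills in proportion to elapsed time using those numbers. `TokenBucket` and its option names are hypothetical and serve only to illustrate the arithmetic.

```javascript
// Token bucket that refills in proportion to elapsed time.
class TokenBucket {
  constructor({ capacity, intervalMs, now = Date.now() }) {
    this.capacity = capacity;
    this.tokens = capacity;
    // 5 tokens per 5500 ms is about 0.909 tokens per second.
    this.ratePerMs = capacity / intervalMs;
    this.updatedAt = now;
  }

  // Try to take one token; returns true when the request may proceed.
  tryTake(now = Date.now()) {
    const elapsed = now - this.updatedAt;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.ratePerMs);
    this.updatedAt = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket({ capacity: 5, intervalMs: 5500 });
console.log(bucket.tryTake()); // true while tokens remain
```

Starting with a full bucket allows a burst of 5 requests, after which throughput settles at the refill rate.
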
### Extension Rate Limits

- **Limit**: 10 requests per second (default)
- **Strategy**: Token bucket with 10 tokens, refill rate 10 tokens/second
- **Concurrency**: 5 (parallel requests)

### Priority Queue

Requests are queued with priority levels:

1. **High**: Lookup queries (direct MBID access)
2. **Medium**: Browse queries (relationship traversal)
3. **Low**: Search queries (full-text search)

When the rate limit is reached, queued requests with higher priority are processed first.

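
A queue with these three levels can be sketched with one first-in, first-out list per priority. The class below is an assumption-laden illustration, not the project's implementation; it exposes `enqueue()`, `dequeue()`, and `length`, and expects each item to carry a `priority` of `'high'`, `'medium'`, or `'low'`.

```javascript
// Lower rank means higher priority.
const PRIORITY_RANK = new Map([
  ['high', 0],
  ['medium', 1],
  ['low', 2]
]);

class PriorityQueue {
  constructor() {
    // One FIFO list per priority level keeps ordering stable within a level.
    this.buckets = [[], [], []];
  }

  get length() {
    return this.buckets.reduce((total, bucket) => total + bucket.length, 0);
  }

  enqueue(item) {
    const rank = PRIORITY_RANK.get(item.priority);
    if (rank === undefined) {
      throw new Error(`Unknown priority: ${item.priority}`);
    }
    this.buckets[rank].push(item);
  }

  dequeue() {
    // Scan from highest to lowest priority and take the oldest item found.
    for (const bucket of this.buckets) {
      if (bucket.length > 0) return bucket.shift();
    }
    return undefined;
  }
}
```

With only three fixed levels, separate lists are simpler than a heap and preserve arrival order within each level.
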
### Implementation

**Location**: `src/rate-limit.js`

```javascript
// `PriorityQueue` is assumed to provide enqueue(), dequeue(), and length.
class RateLimiter {
  constructor(options) {
    this.tokens = options.limit;
    this.limit = options.limit;
    // Tokens added per call to refill(), e.g. 5 / 5.5 ≈ 0.909.
    this.refillRate = options.limit / options.interval;
    this.queue = new PriorityQueue();
  }

  async acquire(priority = 'medium') {
    // Tokens can be fractional, so require a whole token before proceeding.
    if (this.tokens >= 1) {
      this.tokens--;
      return;
    }

    // No token available: wait in the queue until refill() releases us.
    return new Promise(resolve => {
      this.queue.enqueue({ resolve, priority });
    });
  }

  // Assumed to be called once per unit of `interval` (for example, every second).
  refill() {
    this.tokens = Math.min(this.limit, this.tokens + this.refillRate);
    while (this.tokens >= 1 && this.queue.length > 0) {
      const { resolve } = this.queue.dequeue();
      this.tokens--;
      resolve();
    }
  }
}
```

## File Structure

```
src/
├── index.js              # Entry point, start() function
├── schema.js             # Schema construction
├── context.js            # Context factory
├── types/                # Entity type definitions
│   ├── area.js
│   ├── artist.js
│   ├── collection.js
│   ├── disc.js
│   ├── event.js
│   ├── instrument.js
│   ├── label.js
│   ├── place.js
│   ├── recording.js
│   ├── release.js
│   ├── release-group.js
│   ├── series.js
│   ├── tag.js
│   ├── track.js
│   ├── url.js
│   ├── work.js
│   └── relationships.js
├── resolvers/            # Resolver implementations
│   ├── query.js
│   └── subquery.js
├── loaders/              # DataLoader batch functions
│   └── musicbrainz.js
├── rate-limit.js         # Rate limiter implementation
├── client.js             # Base HTTP client
└── extensions/           # Built-in extensions
    ├── cover-art-archive/
    ├── fanart/
    ├── mediawiki/
    └── theaudiodb/
```

## Relay Compliance

GraphBrainz implements the Relay specification for cursor-based pagination:

### Connection Pattern

All list fields return connection types:

```graphql
type ArtistConnection {
  edges: [ArtistEdge]
  nodes: [Artist]
  pageInfo: PageInfo!
  totalCount: Int
}

type ArtistEdge {
  node: Artist
  cursor: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
```

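
To show how one page of API results could be shaped into these types, the sketch below builds a connection object from an array of items, the page's starting offset, and a total count. `toConnection` and `encodeCursor` are hypothetical helpers, and encoding the cursor as a base64 string of the item's offset is an assumption; the document does not specify the cursor format.

```javascript
// Cursors here are base64-encoded offsets (an assumed, opaque encoding).
const encodeCursor = offset =>
  Buffer.from(`offset:${offset}`).toString('base64');

function toConnection(items, { offset = 0, totalCount = null } = {}) {
  const edges = items.map((node, i) => ({
    node,
    cursor: encodeCursor(offset + i)
  }));
  const end = offset + items.length;
  return {
    edges,
    nodes: items,
    totalCount,
    pageInfo: {
      // Without a known total, no further page is reported.
      hasNextPage: totalCount != null && end < totalCount,
      hasPreviousPage: offset > 0,
      startCursor: edges.length > 0 ? edges[0].cursor : null,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null
    }
  };
}
```

Exposing both `edges` and `nodes` from the same array lets clients skip the edge wrapper when they do not need per-item cursors.
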
### Pagination Arguments

- `first: Int` - Number of items to return
- `after: String` - Cursor for pagination
- `last: Int` - Number of items from end (not implemented)
- `before: String` - Cursor for reverse pagination (not implemented)

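
MusicBrainz itself pages with `limit` and `offset`, so the two implemented arguments have to be translated. A minimal sketch of that translation follows, assuming cursors are base64 strings of the form `offset:<n>` (an illustrative choice, not a documented GraphBrainz format). `toPagingParams` and `decodeCursor` are hypothetical names; the default of 25 and maximum of 100 reflect the MusicBrainz web service limits.

```javascript
const DEFAULT_PAGE_SIZE = 25; // MusicBrainz default for `limit`
const MAX_PAGE_SIZE = 100;    // MusicBrainz maximum for `limit`

function decodeCursor(cursor) {
  const text = Buffer.from(cursor, 'base64').toString('utf8');
  const match = /^offset:(\d+)$/.exec(text);
  if (!match) throw new Error('Invalid cursor');
  return Number(match[1]);
}

function toPagingParams({ first, after } = {}) {
  const requested = first ?? DEFAULT_PAGE_SIZE;
  const limit = Math.min(Math.max(requested, 1), MAX_PAGE_SIZE);
  // `after` points at the last item already seen, so start one past it.
  const offset = after ? decodeCursor(after) + 1 : 0;
  return { limit, offset };
}
```

Clamping `first` keeps a client from requesting more items than the upstream API will return in one call.
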
### Node Interface

Global object identification via `node(id: ID!)` query:

```graphql
interface Node {
  id: ID!
}
```

All entity types implement the Node interface with globally unique IDs.

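
One common way to make IDs globally unique is to combine the type name with the MBID and encode the result as an opaque string. The sketch below follows that convention; `toGlobalId` and `fromGlobalId` are hypothetical helpers, and the `Type:mbid` layout is an assumption rather than a documented GraphBrainz format.

```javascript
// Encode a type name and MBID into an opaque global ID (assumed format).
function toGlobalId(type, mbid) {
  return Buffer.from(`${type}:${mbid}`).toString('base64');
}

// Decode a global ID back into its type name and MBID.
function fromGlobalId(globalId) {
  const text = Buffer.from(globalId, 'base64').toString('utf8');
  const separator = text.indexOf(':');
  if (separator === -1) throw new Error('Invalid global ID');
  return {
    type: text.slice(0, separator),
    mbid: text.slice(separator + 1)
  };
}

const id = toGlobalId('Artist', '5b11f4ce-a62d-471e-81fc-a69a8278c7da');
console.log(fromGlobalId(id).type); // "Artist"
```

A `node(id: ID!)` resolver can then decode the ID and dispatch to the lookup for the decoded type.
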