feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
@@ -0,0 +1,499 @@
# GraphBrainz Architecture

## Schema Construction Strategy

GraphBrainz employs a hybrid schema construction approach:

- **Core Schema**: Programmatic construction using GraphQL.js constructors
- **Extensions**: SDL (Schema Definition Language) strings merged via `extendSchema()`

This strategy provides type safety and runtime flexibility for the core while allowing extensions to use the more ergonomic SDL syntax.

### Why Programmatic Construction?

| Benefit | Description |
|---------|-------------|
| Type Safety | Schema structure is checked as the schema is constructed at startup (plain JavaScript has no compile step) |
| Dynamic Fields | Runtime field generation based on configuration |
| AST Inspection | Direct access to GraphQL AST for resolver optimization |
| Extension Points | Programmatic hooks for schema modification |

## Entity Type System

GraphBrainz defines 17 entity types in `src/types/` (~2000 lines of code):

| Entity Type | File Path | Purpose |
|-------------|-----------|---------|
| Area | src/types/area.js | Geographic regions |
| Artist | src/types/artist.js | Musicians and groups |
| Collection | src/types/collection.js | User-curated lists |
| Disc | src/types/disc.js | Physical media |
| Event | src/types/event.js | Concerts and performances |
| Instrument | src/types/instrument.js | Musical instruments |
| Label | src/types/label.js | Record labels |
| Place | src/types/place.js | Venues and locations |
| Recording | src/types/recording.js | Audio recordings |
| Release | src/types/release.js | Album releases |
| ReleaseGroup | src/types/release-group.js | Release groupings |
| Series | src/types/series.js | Ordered collections |
| Tag | src/types/tag.js | User-generated tags |
| Track | src/types/track.js | Individual tracks |
| URL | src/types/url.js | External links |
| Work | src/types/work.js | Musical compositions |
| Relationships | src/types/relationships.js | Entity connections |

Each type file exports a GraphQL object type with field definitions, resolvers, and relationship mappings.

## Query Type Hierarchy

GraphBrainz exposes four primary query patterns:

### 1. Lookup Queries

Direct entity retrieval by MusicBrainz ID (MBID).

**Supported Entities**: 13 types

```
lookup {
  area(mbid: String!)
  artist(mbid: String!)
  collection(mbid: String!)
  event(mbid: String!)
  instrument(mbid: String!)
  label(mbid: String!)
  place(mbid: String!)
  recording(mbid: String!)
  release(mbid: String!)
  releaseGroup(mbid: String!)
  series(mbid: String!)
  url(mbid: String!)
  work(mbid: String!)
}
```

### 2. Browse Queries

Retrieve entities linked to a parent entity with cursor-based pagination.

**Supported Entities**: 9 types

```
browse {
  areas(collection: String, first: Int, after: String)
  artists(area: String, collection: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  collections(area: String, artist: String, editor: String, event: String, label: String, place: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  events(area: String, artist: String, collection: String, place: String, first: Int, after: String)
  labels(area: String, collection: String, release: String, first: Int, after: String)
  places(area: String, collection: String, first: Int, after: String)
  recordings(artist: String, collection: String, release: String, first: Int, after: String)
  releases(area: String, artist: String, collection: String, label: String, recording: String, releaseGroup: String, track: String, trackArtist: String, first: Int, after: String)
  releaseGroups(artist: String, collection: String, release: String, first: Int, after: String)
}
```

### 3. Search Queries

Lucene-based full-text search across entity types.

**Supported Entities**: 10 types

```
search {
  areas(query: String!, first: Int, after: String)
  artists(query: String!, first: Int, after: String)
  events(query: String!, first: Int, after: String)
  instruments(query: String!, first: Int, after: String)
  labels(query: String!, first: Int, after: String)
  places(query: String!, first: Int, after: String)
  recordings(query: String!, first: Int, after: String)
  releases(query: String!, first: Int, after: String)
  releaseGroups(query: String!, first: Int, after: String)
  works(query: String!, first: Int, after: String)
}
```

### 4. Node Query (Relay)

Global object identification via Relay-compliant node interface.

```
node(id: ID!)
```

## Resolver Architecture

GraphBrainz implements a three-tier resolver structure:

### Tier 1: Query Resolvers

Entry points for lookup, browse, search, and node queries. Responsibilities:

- Validate input parameters
- Construct MusicBrainz API URLs
- Delegate to DataLoader
- Return raw API responses

**Location**: `src/resolvers/query.js`

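
The first two responsibilities listed for Tier 1 can be sketched as a small helper. This is a minimal illustration, not the code in `src/resolvers/query.js`: the names `MBID_PATTERN` and `buildLookupUrl` are hypothetical, and the sketch assumes that an MBID is a UUID and that multiple `inc` values are joined with `+`, as the MusicBrainz web service documentation describes.

```javascript
// Hypothetical helper: validates an MBID and builds a MusicBrainz lookup path.
// MBIDs are UUIDs, so a UUID-shaped pattern serves as the validity check.
const MBID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function buildLookupUrl(entity, mbid, inc = []) {
  if (!MBID_PATTERN.test(mbid)) {
    throw new TypeError(`Malformed MBID: ${mbid}`);
  }
  const path = `/ws/2/${entity}/${mbid}`;
  // Only add `inc` when the query actually needs related data.
  if (inc.length === 0) return path;
  return `${path}?inc=${inc.map(encodeURIComponent).join('+')}`;
}

const url = buildLookupUrl('artist', '5b11f4ce-a62d-471e-81fc-a69a8278c7da', ['releases']);
console.log(url); // /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases
```

Rejecting malformed IDs before building the URL keeps an obviously invalid lookup from consuming one of the rate-limited upstream requests.
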
### Tier 2: Field Resolvers

Resolve individual fields on entity types. Responsibilities:

- Extract field values from parent object
- Trigger subqueries for related entities
- Apply field-level transformations
- Handle null/undefined cases

**Location**: `src/types/*.js` (per entity type)

### Tier 3: Subquery Resolvers

Handle nested entity relationships. Responsibilities:

- Inspect GraphQL AST for required fields
- Determine MusicBrainz `inc` parameters
- Batch related entity requests
- Resolve circular dependencies

**Location**: `src/resolvers/subquery.js`

## AST Inspection for Query Optimization

GraphBrainz resolvers inspect the GraphQL AST to determine which MusicBrainz `inc` parameters are needed. This avoids requesting relationships the query never reads (over-fetching) and avoids follow-up requests for data that was left out (under-fetching).

### Example

**GraphQL Query**:
```graphql
{
  lookup {
    artist(mbid: "5b11f4ce-a62d-471e-81fc-a69a8278c7da") {
      name
      releases {
        title
        date
      }
    }
  }
}
```

**AST Inspection Result**:
- Detects `releases` field in selection set
- Adds `inc=releases` to MusicBrainz API request
- Avoids fetching recordings, works, or other unneeded relationships

**MusicBrainz API Call**:
```
GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases
```

### Implementation

AST inspection occurs in resolver functions via `info.fieldNodes`:

```javascript
function resolveArtist(parent, args, context, info) {
  // Fields selected directly under `artist` in the incoming query.
  const selections = info.fieldNodes[0].selectionSet.selections;
  // Selected fields that correspond to a MusicBrainz `inc` value.
  const includable = ['releases', 'recordings'];
  const inc = [];

  for (const selection of selections) {
    // Inline fragments have no `name`, and a fragment spread's name is the
    // fragment's name rather than a field, so only plain fields are checked.
    if (selection.kind !== 'Field') continue;
    const name = selection.name.value;
    if (includable.includes(name) && !inc.includes(name)) {
      inc.push(name);
    }
  }

  return context.loaders.artist.load({ mbid: args.mbid, inc });
}
```

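
Queries can also select fields through fragments, which a flat loop over `selections` does not see. The sketch below walks inline fragments and named fragment spreads as well. The helper name `collectIncludes` and the `INCLUDABLE` set are hypothetical, and the `info` object is a hand-built stand-in; `fieldNodes` and `fragments` are the properties GraphQL.js passes in the resolver `info` argument.

```javascript
// Hypothetical set of selected fields that map to a MusicBrainz `inc` value.
const INCLUDABLE = new Set(['releases', 'recordings']);

// Walk a selection set, following fragments, and collect matching field names.
function collectIncludes(selectionSet, fragments, found = new Set()) {
  for (const selection of selectionSet.selections) {
    if (selection.kind === 'Field') {
      if (INCLUDABLE.has(selection.name.value)) found.add(selection.name.value);
    } else if (selection.kind === 'InlineFragment') {
      collectIncludes(selection.selectionSet, fragments, found);
    } else if (selection.kind === 'FragmentSpread') {
      const fragment = fragments[selection.name.value];
      if (fragment) collectIncludes(fragment.selectionSet, fragments, found);
    }
  }
  return found;
}

// Hand-built stand-in for `info`, shaped like the parsed form of:
//   artist { name ...ArtistParts }
//   fragment ArtistParts on Artist { releases { title } }
const info = {
  fieldNodes: [{
    kind: 'Field',
    name: { value: 'artist' },
    selectionSet: { selections: [
      { kind: 'Field', name: { value: 'name' } },
      { kind: 'FragmentSpread', name: { value: 'ArtistParts' } }
    ] }
  }],
  fragments: {
    ArtistParts: { selectionSet: { selections: [
      { kind: 'Field', name: { value: 'releases' }, selectionSet: { selections: [] } }
    ] } }
  }
};

const inc = [...collectIncludes(info.fieldNodes[0].selectionSet, info.fragments)];
console.log(inc); // [ 'releases' ]
```

Using a `Set` keeps each `inc` value unique even when the same field is selected both directly and through a fragment.
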
## Extension System

Extensions modify the schema and context in two phases:

### Phase 1: Context Extension

Extensions add custom HTTP clients, DataLoaders, and caches to the GraphQL context.

**Interface**:
```javascript
{
  extendContext(context, options) {
    return {
      ...context,
      [extensionName]: {
        client: new ExtensionClient(options),
        loader: new DataLoader(batchFn),
        cache: new LRUCache(options)
      }
    };
  }
}
```

### Phase 2: Schema Extension

Extensions add fields to existing types or define new types via SDL.

**Interface**:
```javascript
{
  extendSchema(schema, options) {
    // `FanArtImage` is assumed to be defined elsewhere in this extension's SDL.
    const typeDefs = `
      extend type Artist {
        fanArt: FanArtImages
      }

      type FanArtImages {
        backgrounds: [FanArtImage]
        logos: [FanArtImage]
      }
    `;

    const resolvers = {
      Artist: {
        fanArt(artist, args, context) {
          return context.fanart.loader.load(artist.id);
        }
      }
    };

    return extendSchema(schema, { typeDefs, resolvers });
  }
}
```

### Extension Loading

Extensions are loaded via an environment variable or programmatic options:

**Environment Variable**:
```bash
GRAPHBRAINZ_EXTENSIONS="cover-art-archive,fanart,mediawiki,theaudiodb"
```

**Programmatic**:
```javascript
import { middleware } from 'graphbrainz';
import lastfm from 'graphbrainz-extension-lastfm';

// `app` is assumed to be an existing Express application.
app.use('/graphql', middleware({
  extensions: [lastfm]
}));
```

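
To illustrate how a loaded extension list could feed the context phase, the sketch below parses the comma-separated environment value and then applies each extension's `extendContext` in order. The names `parseExtensionList` and `applyContextExtensions` and the toy `fanart` object are hypothetical; only the `extendContext(context, options)` signature comes from the interface shown earlier.

```javascript
// Split a comma-separated GRAPHBRAINZ_EXTENSIONS value into extension names.
function parseExtensionList(value = '') {
  return value.split(',').map(name => name.trim()).filter(Boolean);
}

// Phase 1: thread the context through every extension that defines extendContext.
function applyContextExtensions(baseContext, extensions, options = {}) {
  return extensions.reduce(
    (context, extension) =>
      typeof extension.extendContext === 'function'
        ? extension.extendContext(context, options)
        : context,
    baseContext
  );
}

// Toy extension used only for this example.
const fanart = {
  name: 'fanart',
  extendContext(context, options) {
    return { ...context, fanart: { apiKey: options.fanartApiKey } };
  }
};

const names = parseExtensionList('cover-art-archive, fanart,,mediawiki');
const context = applyContextExtensions({ loaders: {} }, [fanart, { name: 'noop' }], {
  fanartApiKey: 'demo-key'
});
console.log(names);          // [ 'cover-art-archive', 'fanart', 'mediawiki' ]
console.log(context.fanart); // { apiKey: 'demo-key' }
```

Because each extension returns a new context object, an extension that defines no `extendContext` simply passes the context through unchanged.
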
## DataLoader Integration

GraphBrainz uses DataLoader for request batching and deduplication.

### Per-Request Batching

Each GraphQL request receives a fresh DataLoader instance. This ensures:

- Requests within a single query are batched
- Duplicate requests are deduplicated
- Cache is scoped to request lifecycle

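
These three properties can be seen in a stripped-down stand-in for DataLoader. This is not the `dataloader` package: `MiniLoader` and `createRequestContext` are hypothetical names, the scheduling is simplified to a single microtask, and the batch function returns canned objects instead of calling MusicBrainz.

```javascript
// Minimal stand-in for DataLoader: batches loads issued before the next
// microtask runs and memoizes by key for the lifetime of the instance.
class MiniLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.cache = new Map(); // key -> Promise of value
    this.pending = [];      // loads waiting for the next batch
  }

  load(key) {
    if (this.cache.has(key)) return this.cache.get(key); // deduplicate
    const promise = new Promise((resolve, reject) => {
      // The first load of a batch schedules the dispatch.
      if (this.pending.length === 0) queueMicrotask(() => this.dispatch());
      this.pending.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    return promise;
  }

  async dispatch() {
    const batch = this.pending;
    this.pending = [];
    try {
      const values = await this.batchFn(batch.map(item => item.key));
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (error) {
      batch.forEach(item => item.reject(error));
    }
  }
}

// A fresh loader per GraphQL request keeps the cache request-scoped.
const batchCalls = [];
function createRequestContext() {
  const artist = new MiniLoader(async mbids => {
    batchCalls.push(mbids);
    return mbids.map(mbid => ({ mbid }));
  });
  return { loaders: { artist } };
}

const ctx = createRequestContext();
const done = Promise.all([
  ctx.loaders.artist.load('a'),
  ctx.loaders.artist.load('b'),
  ctx.loaders.artist.load('a') // same key: served from the first load
]).then(results => {
  console.log(batchCalls.length, batchCalls[0]); // 1 [ 'a', 'b' ]
  return results;
});
```

Calling `createRequestContext()` once per incoming request means nothing cached for one request is visible to the next, which is what scopes the cache to the request lifecycle.
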
### Batch Functions

Each entity type has a batch function that:

1. Receives array of keys (MBIDs or query parameters)
2. Groups keys by API endpoint
3. Makes batched HTTP requests
4. Returns array of results in same order as keys

**Example**:
```javascript
import DataLoader from 'dataloader';
import got from 'got';

// Base URL is an assumption; point it at the MusicBrainz instance in use.
const client = got.extend({ prefixUrl: 'https://musicbrainz.org', responseType: 'json' });

// This example issues one request per key; results keep the order of `keys`.
async function batchArtists(keys) {
  const results = await Promise.all(
    keys.map(key =>
      client(`ws/2/artist/${key.mbid}?fmt=json&inc=${key.inc.join(',')}`)
    )
  );
  return results.map(r => r.body);
}

// Keys are objects, so derive a string cache key; otherwise equal keys never deduplicate.
const artistLoader = new DataLoader(batchArtists, {
  cacheKeyFn: key => `${key.mbid}:${key.inc.join(',')}`
});
```

## LRU Cache Layer

A shared LRU cache sits above DataLoader and provides cross-request caching.

### Configuration

| Parameter | Environment Variable | Default |
|-----------|---------------------|---------|
| Size | GRAPHBRAINZ_CACHE_SIZE | 8192 items |
| TTL | GRAPHBRAINZ_CACHE_TTL | 86400000 ms (1 day) |

### Cache Key Strategy

Cache keys combine entity type, MBID, and `inc` parameters:

```
artist:5b11f4ce-a62d-471e-81fc-a69a8278c7da:releases,recordings
```

This ensures that requests for the same entity with different `inc` sets do not collide.

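
A small sketch of this key scheme, paired with a Map-based LRU that honors a TTL, is shown below. It is an illustration under assumptions, not GraphBrainz's cache: `cacheKey` and `TinyLRU` are hypothetical names, and sorting the `inc` values (so that the same set in a different order shares one entry) is a design choice of this example rather than something the key format above requires. The defaults mirror the configuration table.

```javascript
// Build a key from entity type, MBID, and a normalized list of inc values.
function cacheKey(entity, mbid, inc = []) {
  return `${entity}:${mbid}:${[...inc].sort().join(',')}`;
}

// Tiny LRU with per-entry expiry. A Map iterates in insertion order, so the
// first key is always the least recently used one.
class TinyLRU {
  constructor({ max = 8192, ttl = 86400000 } = {}) {
    this.max = max;
    this.ttl = ttl;
    this.entries = new Map();
  }

  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (now >= entry.expires) return undefined; // expired: drop it
    this.entries.set(key, entry);               // re-insert as most recent
    return entry.value;
  }

  set(key, value, now = Date.now()) {
    this.entries.delete(key);
    if (this.entries.size >= this.max) {
      // Evict the least recently used entry.
      this.entries.delete(this.entries.keys().next().value);
    }
    this.entries.set(key, { value, expires: now + this.ttl });
  }
}

const cache = new TinyLRU({ max: 2, ttl: 1000 });
const mbid = '5b11f4ce-a62d-471e-81fc-a69a8278c7da';
cache.set(cacheKey('artist', mbid, ['releases']), 'A', 0);
cache.set(cacheKey('artist', mbid, ['releases', 'recordings']), 'B', 0);
cache.get(cacheKey('artist', mbid, ['releases']), 10); // touch 'A'
cache.set(cacheKey('artist', mbid), 'C', 10);          // evicts 'B'
```

The explicit `now` parameter exists only to make expiry deterministic in the example; callers would normally rely on the `Date.now()` default.
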
### Per-Extension Caches

Each extension maintains its own LRU cache with separate configuration:

- `FANART_CACHE_SIZE` / `FANART_CACHE_TTL`
- `THEAUDIODB_CACHE_SIZE` / `THEAUDIODB_CACHE_TTL`
- `COVERART_CACHE_SIZE` / `COVERART_CACHE_TTL`

## Rate Limiting

A custom priority queue keeps outgoing requests within each API's rate limit.

### MusicBrainz Rate Limits

- **Limit**: 5 requests per 5.5 seconds
- **Strategy**: Token bucket with 5 tokens, refill rate 0.909 tokens/second (5 / 5.5)
- **Concurrency**: 1 (sequential requests)

### Extension Rate Limits

- **Limit**: 10 requests per second (default)
- **Strategy**: Token bucket with 10 tokens, refill rate 10 tokens/second
- **Concurrency**: 5 (parallel requests)

### Priority Queue

Requests are queued with priority levels:

1. **High**: Lookup queries (direct MBID access)
2. **Medium**: Browse queries (relationship traversal)
3. **Low**: Search queries (full-text search)

When the rate limit has been reached, queued requests with higher priority are released first.

### Implementation

**Location**: `src/rate-limit.js`

```javascript
class RateLimiter {
  constructor(options) {
    this.tokens = options.limit;
    this.limit = options.limit;
    // Tokens restored per unit of `interval`; refill() is assumed to run once per unit.
    this.refillRate = options.limit / options.interval;
    this.queue = new PriorityQueue();
  }

  async acquire(priority = 'medium') {
    // A request needs a whole token; fractional tokens accumulate between refills.
    if (this.tokens >= 1) {
      this.tokens--;
      return;
    }

    return new Promise(resolve => {
      this.queue.enqueue({ resolve, priority });
    });
  }

  refill() {
    this.tokens = Math.min(this.limit, this.tokens + this.refillRate);
    while (this.tokens >= 1 && this.queue.length > 0) {
      const { resolve } = this.queue.dequeue();
      this.tokens--;
      resolve();
    }
  }
}
```

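
The `RateLimiter` relies on a `PriorityQueue` that offers `enqueue`, `dequeue`, and `length`, but that class is not shown. One possible shape is sketched below, assuming the three priority names from the list above and first-in-first-out order within a level. The class body and the `PRIORITY_RANK` table are assumptions, not the code in `src/rate-limit.js`.

```javascript
// Lower rank is served first.
const PRIORITY_RANK = { high: 0, medium: 1, low: 2 };

class PriorityQueue {
  constructor() {
    this.items = []; // kept sorted by rank, oldest first within a rank
  }

  get length() {
    return this.items.length;
  }

  enqueue(item) {
    // Unknown priority names fall back to medium.
    const rank = PRIORITY_RANK[item.priority] ?? PRIORITY_RANK.medium;
    // Insert before the first entry with a strictly larger rank, which
    // preserves arrival order among entries of equal rank.
    const index = this.items.findIndex(other => other.rank > rank);
    const entry = { item, rank };
    if (index === -1) this.items.push(entry);
    else this.items.splice(index, 0, entry);
  }

  dequeue() {
    const entry = this.items.shift();
    return entry ? entry.item : undefined;
  }
}

const queue = new PriorityQueue();
queue.enqueue({ priority: 'low', id: 'search' });
queue.enqueue({ priority: 'high', id: 'lookup-1' });
queue.enqueue({ priority: 'medium', id: 'browse' });
queue.enqueue({ priority: 'high', id: 'lookup-2' });

const order = [];
while (queue.length > 0) order.push(queue.dequeue().id);
console.log(order); // [ 'lookup-1', 'lookup-2', 'browse', 'search' ]
```

A sorted array is enough at this scale, since the queue only holds requests waiting on a handful of tokens; a binary heap would matter only for much longer queues.
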
## File Structure

```
src/
├── index.js              # Entry point, start() function
├── schema.js             # Schema construction
├── context.js            # Context factory
├── types/                # Entity type definitions
│   ├── area.js
│   ├── artist.js
│   ├── collection.js
│   ├── disc.js
│   ├── event.js
│   ├── instrument.js
│   ├── label.js
│   ├── place.js
│   ├── recording.js
│   ├── release.js
│   ├── release-group.js
│   ├── series.js
│   ├── tag.js
│   ├── track.js
│   ├── url.js
│   ├── work.js
│   └── relationships.js
├── resolvers/            # Resolver implementations
│   ├── query.js
│   └── subquery.js
├── loaders/              # DataLoader batch functions
│   └── musicbrainz.js
├── rate-limit.js         # Rate limiter implementation
├── client.js             # Base HTTP client
└── extensions/           # Built-in extensions
    ├── cover-art-archive/
    ├── fanart/
    ├── mediawiki/
    └── theaudiodb/
```

## Relay Compliance

GraphBrainz implements the Relay specification for cursor-based pagination:

### Connection Pattern

All list fields return connection types:

```graphql
type ArtistConnection {
  edges: [ArtistEdge]
  nodes: [Artist]
  pageInfo: PageInfo!
  totalCount: Int
}

type ArtistEdge {
  node: Artist
  cursor: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
```

### Pagination Arguments

- `first: Int` - Number of items to return
- `after: String` - Cursor for pagination
- `last: Int` - Number of items from end (not implemented)
- `before: String` - Cursor for reverse pagination (not implemented)

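
MusicBrainz browse and search endpoints page with `limit` and `offset`, so `first` and `after` have to be translated before a request is made. A minimal sketch follows, assuming an opaque cursor that is simply the base64-encoded offset of an item. The helper names and the cursor format are assumptions, not GraphBrainz's actual encoding; the default page size of 25 mirrors the MusicBrainz default `limit`.

```javascript
// Encode/decode an opaque cursor that wraps a zero-based item offset.
function toCursor(offset) {
  return Buffer.from(`offset:${offset}`).toString('base64');
}

function fromCursor(cursor) {
  const decoded = Buffer.from(cursor, 'base64').toString('utf8');
  const match = /^offset:(\d+)$/.exec(decoded);
  if (!match) throw new TypeError(`Invalid cursor: ${cursor}`);
  return Number(match[1]);
}

// Translate Relay-style arguments into limit/offset parameters.
function toLimitOffset({ first = 25, after } = {}) {
  // `after` names the last item already seen, so the page starts one past it.
  const offset = after ? fromCursor(after) + 1 : 0;
  return { limit: first, offset };
}

// Build pageInfo for one page of results.
function toPageInfo(offset, pageLength, totalCount) {
  return {
    hasNextPage: offset + pageLength < totalCount,
    hasPreviousPage: offset > 0,
    startCursor: pageLength > 0 ? toCursor(offset) : null,
    endCursor: pageLength > 0 ? toCursor(offset + pageLength - 1) : null
  };
}

const page1 = toLimitOffset({ first: 10 });
const info1 = toPageInfo(page1.offset, 10, 25);
const page2 = toLimitOffset({ first: 10, after: info1.endCursor });
console.log(page2); // { limit: 10, offset: 10 }
```

Keeping the cursor opaque lets the server change its encoding later without breaking clients, which only ever pass `endCursor` back as `after`.
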
### Node Interface

Global object identification via `node(id: ID!)` query:

```graphql
interface Node {
  id: ID!
}
```

All entity types implement the Node interface with globally unique IDs.

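
A common Relay convention, assumed here rather than taken from GraphBrainz's source, is to derive the global ID from the type name and the MBID and to base64-encode the pair. The helper names below are hypothetical.

```javascript
// Encode a type name and MBID into one opaque, globally unique ID.
function toGlobalId(typeName, mbid) {
  return Buffer.from(`${typeName}:${mbid}`).toString('base64');
}

// Decode a global ID back into the parts needed to route a node(id:) query.
function fromGlobalId(globalId) {
  const decoded = Buffer.from(globalId, 'base64').toString('utf8');
  const separator = decoded.indexOf(':');
  if (separator === -1) throw new TypeError(`Invalid global ID: ${globalId}`);
  return {
    typeName: decoded.slice(0, separator),
    mbid: decoded.slice(separator + 1)
  };
}

const globalId = toGlobalId('Artist', '5b11f4ce-a62d-471e-81fc-a69a8278c7da');
const parts = fromGlobalId(globalId);
console.log(parts.typeName, parts.mbid);
```

Including the type name matters because an MBID alone does not say which lookup endpoint to call; with it, a `node` resolver can dispatch to the matching entity loader.
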