feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
Alexander
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
@@ -0,0 +1,499 @@
# GraphBrainz Architecture
## Schema Construction Strategy
GraphBrainz employs a hybrid schema construction approach:
- **Core Schema**: Programmatic construction using GraphQL.js constructors
- **Extensions**: SDL (Schema Definition Language) strings merged via `extendSchema()`
This strategy provides type safety and runtime flexibility for the core while allowing extensions to use the more ergonomic SDL syntax.
### Why Programmatic Construction?
| Benefit | Description |
|---------|-------------|
| Type Safety | Schema structure is validated when the schema objects are constructed, before any query runs |
| Dynamic Fields | Runtime field generation based on configuration |
| AST Inspection | Direct access to GraphQL AST for resolver optimization |
| Extension Points | Programmatic hooks for schema modification |
## Entity Type System
GraphBrainz defines 17 entity types in `src/types/` (~2000 lines of code):
| Entity Type | File Path | Purpose |
|-------------|-----------|---------|
| Area | src/types/area.js | Geographic regions |
| Artist | src/types/artist.js | Musicians and groups |
| Collection | src/types/collection.js | User-curated lists |
| Disc | src/types/disc.js | Physical media |
| Event | src/types/event.js | Concerts and performances |
| Instrument | src/types/instrument.js | Musical instruments |
| Label | src/types/label.js | Record labels |
| Place | src/types/place.js | Venues and locations |
| Recording | src/types/recording.js | Audio recordings |
| Release | src/types/release.js | Album releases |
| ReleaseGroup | src/types/release-group.js | Release groupings |
| Series | src/types/series.js | Ordered collections |
| Tag | src/types/tag.js | User-generated tags |
| Track | src/types/track.js | Individual tracks |
| URL | src/types/url.js | External links |
| Work | src/types/work.js | Musical compositions |
| Relationships | src/types/relationships.js | Entity connections |
Each type file exports a GraphQL object type with field definitions, resolvers, and relationship mappings.
## Query Type Hierarchy
GraphBrainz exposes four primary query patterns:
### 1. Lookup Queries
Direct entity retrieval by MusicBrainz ID (MBID).
**Supported Entities**: 13 types
```
lookup {
  area(mbid: String!)
  artist(mbid: String!)
  collection(mbid: String!)
  event(mbid: String!)
  instrument(mbid: String!)
  label(mbid: String!)
  place(mbid: String!)
  recording(mbid: String!)
  release(mbid: String!)
  releaseGroup(mbid: String!)
  series(mbid: String!)
  url(mbid: String!)
  work(mbid: String!)
}
```
### 2. Browse Queries
Retrieve entities linked to a parent entity with cursor-based pagination.
**Supported Entities**: 9 types
```
browse {
  areas(collection: String, first: Int, after: String)
  artists(area: String, collection: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  collections(area: String, artist: String, editor: String, event: String, label: String, place: String, recording: String, release: String, releaseGroup: String, work: String, first: Int, after: String)
  events(area: String, artist: String, collection: String, place: String, first: Int, after: String)
  labels(area: String, collection: String, release: String, first: Int, after: String)
  places(area: String, collection: String, first: Int, after: String)
  recordings(artist: String, collection: String, release: String, first: Int, after: String)
  releases(area: String, artist: String, collection: String, label: String, recording: String, releaseGroup: String, track: String, trackArtist: String, first: Int, after: String)
  releaseGroups(artist: String, collection: String, release: String, first: Int, after: String)
}
```
### 3. Search Queries
Lucene-based full-text search across entity types.
**Supported Entities**: 10 types
```
search {
  areas(query: String!, first: Int, after: String)
  artists(query: String!, first: Int, after: String)
  events(query: String!, first: Int, after: String)
  instruments(query: String!, first: Int, after: String)
  labels(query: String!, first: Int, after: String)
  places(query: String!, first: Int, after: String)
  recordings(query: String!, first: Int, after: String)
  releases(query: String!, first: Int, after: String)
  releaseGroups(query: String!, first: Int, after: String)
  works(query: String!, first: Int, after: String)
}
```
### 4. Node Query (Relay)
Global object identification via Relay-compliant node interface.
```
node(id: ID!)
```
## Resolver Architecture
GraphBrainz implements a three-tier resolver structure:
### Tier 1: Query Resolvers
Entry points for lookup, browse, search, and node queries. Responsibilities:
- Validate input parameters
- Construct MusicBrainz API URLs
- Delegate to DataLoader
- Return raw API responses
**Location**: `src/resolvers/query.js`
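As a rough illustration of the first two responsibilities, a lookup resolver might validate the MBID and build the request URL along these lines. The helper names `isValidMBID` and `buildLookupUrl` and the base URL constant are assumptions made for this sketch; they are not taken from `src/resolvers/query.js`.

```javascript
// Hypothetical helpers; names are illustrative, not from the GraphBrainz source.
const MUSICBRAINZ_BASE_URL = 'https://musicbrainz.org/ws/2/';

// MBIDs are UUIDs: 8-4-4-4-12 hexadecimal digits.
const MBID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isValidMBID(value) {
  return typeof value === 'string' && MBID_PATTERN.test(value);
}

function buildLookupUrl(entity, mbid, inc = []) {
  if (!isValidMBID(mbid)) {
    throw new TypeError(`Invalid MBID: ${mbid}`);
  }
  const url = new URL(`${entity}/${mbid}`, MUSICBRAINZ_BASE_URL);
  if (inc.length > 0) {
    // MusicBrainz separates inc values with "+". URLSearchParams serializes
    // a space as "+", so joining with a space yields inc=a+b on the wire.
    url.searchParams.set('inc', inc.join(' '));
  }
  return url.toString();
}
```

Rejecting malformed MBIDs before a request is queued avoids spending a rate-limited API call on input that cannot succeed.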
### Tier 2: Field Resolvers
Resolve individual fields on entity types. Responsibilities:
- Extract field values from parent object
- Trigger subqueries for related entities
- Apply field-level transformations
- Handle null/undefined cases
**Location**: `src/types/*.js` (per entity type)
### Tier 3: Subquery Resolvers
Handle nested entity relationships. Responsibilities:
- Inspect GraphQL AST for required fields
- Determine MusicBrainz `inc` parameters
- Batch related entity requests
- Resolve circular dependencies
**Location**: `src/resolvers/subquery.js`
## AST Inspection for Query Optimization
GraphBrainz resolvers inspect the GraphQL AST to determine which MusicBrainz `inc` parameters a query needs. Each API request then includes only the relationships the query selects, which avoids both over-fetching and follow-up requests for missing data.
### Example
**GraphQL Query**:
```graphql
{
  lookup {
    artist(mbid: "5b11f4ce-a62d-471e-81fc-a69a8278c7da") {
      name
      releases {
        title
        date
      }
    }
  }
}
```
**AST Inspection Result**:
- Detects `releases` field in selection set
- Adds `inc=releases` to MusicBrainz API request
- Avoids fetching recordings, works, or other unneeded relationships
**MusicBrainz API Call**:
```
GET /ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?inc=releases
```
### Implementation
AST inspection occurs in resolver functions via `info.fieldNodes`:
```javascript
function resolveArtist(parent, args, context, info) {
  // Selections made directly on the `artist` field.
  const selections = info.fieldNodes[0].selectionSet.selections;
  const inc = [];
  for (const selection of selections) {
    // Inline fragments have no `name`, so only inspect plain fields.
    if (selection.kind !== 'Field') continue;
    if (selection.name.value === 'releases') {
      inc.push('releases');
    }
    if (selection.name.value === 'recordings') {
      inc.push('recordings');
    }
  }
  return context.loaders.artist.load({ mbid: args.mbid, inc });
}
```
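The resolver above looks only at fields selected directly on `artist`. Queries that reach `releases` through a named or inline fragment need a recursive walk. The following is a minimal sketch of such a walk, assuming a hypothetical `INC_BY_FIELD` mapping and helper names; it relies only on the `fieldNodes` and `fragments` properties that GraphQL.js provides on the resolver `info` argument.

```javascript
// Assumed mapping from GraphQL field names to MusicBrainz inc values.
const INC_BY_FIELD = new Map([
  ['releases', 'releases'],
  ['recordings', 'recordings'],
  ['works', 'works']
]);

// Collect field names from a selection set, following fragment spreads
// and inline fragments.
function collectFieldNames(selectionSet, fragments, names = new Set()) {
  for (const selection of selectionSet.selections) {
    if (selection.kind === 'Field') {
      names.add(selection.name.value);
    } else if (selection.kind === 'InlineFragment') {
      collectFieldNames(selection.selectionSet, fragments, names);
    } else if (selection.kind === 'FragmentSpread') {
      const fragment = fragments[selection.name.value];
      if (fragment) {
        collectFieldNames(fragment.selectionSet, fragments, names);
      }
    }
  }
  return names;
}

// Derive the inc list for the field currently being resolved.
function incFromInfo(info) {
  const names = new Set();
  for (const node of info.fieldNodes) {
    if (node.selectionSet) {
      collectFieldNames(node.selectionSet, info.fragments, names);
    }
  }
  return [...names]
    .filter(name => INC_BY_FIELD.has(name))
    .map(name => INC_BY_FIELD.get(name));
}
```

Using a `Set` keeps the result free of duplicates when the same field is selected both directly and through a fragment.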
## Extension System
Extensions modify the schema and context in two phases:
### Phase 1: Context Extension
Extensions add custom HTTP clients, DataLoaders, and caches to the GraphQL context.
**Interface**:
```javascript
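// Shape of an extension's context hook. `extensionName`, `ExtensionClient`,
// and `batchFn` are placeholders supplied by the extension itself.
{
  extendContext(context, options) {
    return {
      ...context,
      [extensionName]: {
        client: new ExtensionClient(options),
        loader: new DataLoader(batchFn),
        cache: new LRUCache(options)
      }
    };
  }
}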
```
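One way to picture Phase 1 is as a fold over the loaded extensions, each one receiving the context produced by the previous one. The sketch below assumes a hypothetical `applyContextExtensions` helper, a `name` property on each extension, and options keyed by that name; none of these are confirmed by the source.

```javascript
// Hypothetical helper: fold every extension's extendContext over a base context.
function applyContextExtensions(baseContext, extensions, options = {}) {
  return extensions.reduce((context, extension) => {
    if (typeof extension.extendContext !== 'function') {
      return context; // this extension only extends the schema
    }
    // Options are assumed to be keyed by extension name.
    return extension.extendContext(context, options[extension.name] || {});
  }, baseContext);
}

// Two stand-in extensions, for illustration only.
const fanartExtension = {
  name: 'fanart',
  extendContext(context, options) {
    return { ...context, fanart: { cacheSize: options.cacheSize ?? 8192 } };
  }
};
const schemaOnlyExtension = { name: 'schema-only' };

const extendedContext = applyContextExtensions(
  { loaders: {} },
  [fanartExtension, schemaOnlyExtension],
  { fanart: { cacheSize: 100 } }
);
```

Because each hook returns a new object rather than mutating its input, the base context stays untouched and extensions cannot interfere with one another through shared state.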
### Phase 2: Schema Extension
Extensions add fields to existing types or define new types via SDL.
**Interface**:
```javascript
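{
  extendSchema(schema, options) {
    const typeDefs = `
      extend type Artist {
        fanArt: FanArtImages
      }
      type FanArtImages {
        backgrounds: [FanArtImage]
        logos: [FanArtImage]
      }
      # FanArtImage is assumed to be defined elsewhere in the extension's SDL.
    `;
    const resolvers = {
      Artist: {
        fanArt(artist, args, context) {
          // Uses the loader registered by this extension in Phase 1.
          return context.fanart.loader.load(artist.id);
        }
      }
    };
    // The `extendSchema` called here is assumed to be a merge helper in scope
    // that accepts SDL plus resolvers; it is not the method being defined.
    return extendSchema(schema, { typeDefs, resolvers });
  }
}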
```
### Extension Loading
Extensions are loaded via environment variable or programmatic options:
**Environment Variable**:
```bash
GRAPHBRAINZ_EXTENSIONS="cover-art-archive,fanart,mediawiki,theaudiodb"
```
**Programmatic**:
```javascript
import express from 'express';
import { middleware } from 'graphbrainz';
import lastfm from 'graphbrainz-extension-lastfm';
const app = express();
app.use('/graphql', middleware({
  extensions: [lastfm]
}));
```
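For the environment-variable route, the comma-separated value has to be split into individual extension names before anything can be loaded. A minimal sketch follows, assuming a hypothetical `parseExtensionList` helper; how each name is then resolved to a module is not specified here and is left out.

```javascript
// Hypothetical parser for a comma-separated extension list.
function parseExtensionList(value) {
  if (!value) {
    return [];
  }
  return value
    .split(',')
    .map(name => name.trim())        // tolerate "a, b" as well as "a,b"
    .filter(name => name.length > 0); // ignore empty entries such as "a,,b"
}

// Reads the variable shown above; yields [] when it is unset.
const extensionNames = parseExtensionList(process.env.GRAPHBRAINZ_EXTENSIONS);
```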
## DataLoader Integration
GraphBrainz uses DataLoader for request batching and deduplication.
### Per-Request Batching
Each GraphQL request receives a fresh DataLoader instance. This ensures:
- Requests within a single query are batched
- Duplicate requests are deduplicated
- Cache is scoped to request lifecycle
### Batch Functions
Each entity type has a batch function that:
1. Receives array of keys (MBIDs or query parameters)
2. Groups keys by API endpoint
3. Makes batched HTTP requests
4. Returns array of results in same order as keys
**Example**:
```javascript
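import DataLoader from 'dataloader';
import got from 'got';
const client = got.extend({ prefixUrl: 'https://musicbrainz.org/ws/2/' });
async function batchArtists(keys) {
  // One request per key; results keep the same order as `keys`.
  return Promise.all(
    keys.map(key =>
      // MusicBrainz expects multiple `inc` values joined with "+".
      client(`artist/${key.mbid}?fmt=json&inc=${key.inc.join('+')}`)
        .json()
        .catch(error => error) // fail this key only, not the whole batch
    )
  );
}
// Keys are objects, so a string cache key is needed for deduplication.
const artistLoader = new DataLoader(batchArtists, {
  cacheKeyFn: key => `${key.mbid}:${key.inc.join('+')}`
});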
```
## LRU Cache Layer
A shared LRU cache sits above DataLoader and provides cross-request caching.
### Configuration
| Parameter | Environment Variable | Default |
|-----------|---------------------|---------|
| Size | GRAPHBRAINZ_CACHE_SIZE | 8192 items |
| TTL | GRAPHBRAINZ_CACHE_TTL | 86400000 ms (1 day) |
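To make the two parameters concrete, here is a minimal sketch of an LRU cache with a per-entry TTL, built on a plain `Map`. The `SimpleLRU` class is a stand-in written for this illustration, not the cache library GraphBrainz uses; only the environment variable names and defaults come from the table.

```javascript
// Minimal LRU cache with TTL; a stand-in for a real LRU library.
class SimpleLRU {
  constructor({ max, ttl, now = Date.now }) {
    this.max = max;           // maximum number of entries
    this.ttl = ttl;           // entry lifetime in milliseconds
    this.now = now;           // injectable clock, useful for testing
    this.entries = new Map(); // Map keeps insertion order: oldest first
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) {
      return undefined;
    }
    if (this.now() - entry.storedAt > this.ttl) {
      this.entries.delete(key); // expired
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }
  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, storedAt: this.now() });
    if (this.entries.size > this.max) {
      // Evict the least recently used entry (the first key in the Map).
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}

// Defaults mirror the configuration table.
const sharedCache = new SimpleLRU({
  max: Number(process.env.GRAPHBRAINZ_CACHE_SIZE) || 8192,
  ttl: Number(process.env.GRAPHBRAINZ_CACHE_TTL) || 86400000
});
```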
### Cache Key Strategy
Cache keys combine entity type, MBID, and `inc` parameters:
```
artist:5b11f4ce-a62d-471e-81fc-a69a8278c7da:releases,recordings
```
This ensures different queries for the same entity don't collide.
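A key builder for this scheme could look like the sketch below. The `cacheKey` name is hypothetical, and sorting the `inc` values is an added normalization that the source does not state (the example key above lists them unsorted); it is included so that the same set of includes always maps to one cache entry regardless of order.

```javascript
// Hypothetical key builder: "<entity>:<mbid>:<inc list>".
function cacheKey(entityType, mbid, inc = []) {
  // Dedupe and sort so ['releases', 'recordings'] and
  // ['recordings', 'releases'] produce the same key (an assumption).
  const normalizedInc = [...new Set(inc)].sort().join(',');
  return `${entityType}:${mbid}:${normalizedInc}`;
}
```

Without the normalization, two queries that select the same relationships in a different order would be cached twice.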
### Per-Extension Caches
Each extension maintains its own LRU cache with separate configuration:
- `FANART_CACHE_SIZE` / `FANART_CACHE_TTL`
- `THEAUDIODB_CACHE_SIZE` / `THEAUDIODB_CACHE_TTL`
- `COVERART_CACHE_SIZE` / `COVERART_CACHE_TTL`
## Rate Limiting
A custom priority queue keeps outgoing requests within each upstream API's rate limits.
### MusicBrainz Rate Limits
- **Limit**: 5 requests per 5.5 seconds
- **Strategy**: Token bucket with 5 tokens, refill rate 0.909 tokens/second
- **Concurrency**: 1 (sequential requests)
### Extension Rate Limits
- **Limit**: 10 requests per second (default)
- **Strategy**: Token bucket with 10 tokens, refill rate 10 tokens/second
- **Concurrency**: 5 (parallel requests)
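The refill rates follow directly from the limits: 5 tokens per 5.5 seconds is 5 / 5.5 ≈ 0.909 tokens per second, and 10 tokens per second is 10 / 1 = 10. The sketch below shows a token bucket that refills lazily from elapsed time. The `TokenBucket` class and its option names are assumptions for illustration, and it models only the token accounting, not the concurrency setting.

```javascript
// Token bucket that refills lazily from elapsed time (milliseconds).
class TokenBucket {
  constructor({ limit, intervalMs, now = Date.now }) {
    this.limit = limit;                  // bucket capacity
    this.ratePerMs = limit / intervalMs; // tokens regained per millisecond
    this.now = now;                      // injectable clock
    this.tokens = limit;                 // start with a full bucket
    this.lastRefill = now();
  }
  // Returns true and consumes a token if a whole token is available.
  tryTake() {
    const current = this.now();
    const regained = (current - this.lastRefill) * this.ratePerMs;
    this.tokens = Math.min(this.limit, this.tokens + regained);
    this.lastRefill = current;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Buckets matching the two limits listed above.
const musicbrainzBucket = new TokenBucket({ limit: 5, intervalMs: 5500 });
const extensionBucket = new TokenBucket({ limit: 10, intervalMs: 1000 });
```

Requiring a whole token before releasing a request matters because the balance is fractional between refills.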
### Priority Queue
Requests are queued with priority levels:
1. **High**: Lookup queries (direct MBID access)
2. **Medium**: Browse queries (relationship traversal)
3. **Low**: Search queries (full-text search)
When the rate limit is reached, queued requests are released in priority order, highest first.
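A queue with these three levels can be sketched as one FIFO bucket per priority. The class below is an illustrative assumption rather than the contents of `src/rate-limit.js`; it exposes `enqueue()`, `dequeue()`, and `length`, and treats unknown priorities as medium.

```javascript
// Assumed ranking: lower number is served first.
const PRIORITY_RANK = new Map([
  ['high', 0],
  ['medium', 1],
  ['low', 2]
]);

class PriorityQueue {
  constructor() {
    // One FIFO bucket per level preserves arrival order within a level.
    this.buckets = [[], [], []];
  }
  get length() {
    return this.buckets.reduce((total, bucket) => total + bucket.length, 0);
  }
  enqueue(item) {
    const rank = PRIORITY_RANK.get(item.priority) ?? PRIORITY_RANK.get('medium');
    this.buckets[rank].push(item);
  }
  // Returns the oldest item of the highest non-empty level, or undefined.
  dequeue() {
    for (const bucket of this.buckets) {
      if (bucket.length > 0) {
        return bucket.shift();
      }
    }
    return undefined;
  }
}
```

With only three fixed levels, separate buckets are simpler than a heap and keep requests of equal priority in first-come, first-served order.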
### Implementation
**Location**: `src/rate-limit.js`
```javascript
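// PriorityQueue is assumed to expose enqueue(), dequeue(), and length,
// with dequeue() returning the highest-priority entry first.
class RateLimiter {
  constructor(options) {
    this.tokens = options.limit;
    this.limit = options.limit;
    // Tokens regained per millisecond, assuming `interval` is in milliseconds.
    this.refillRate = options.limit / options.interval;
    this.queue = new PriorityQueue();
  }
  async acquire(priority = 'medium') {
    // The balance can be fractional after a refill, so require a whole token.
    if (this.tokens >= 1) {
      this.tokens--;
      return;
    }
    return new Promise(resolve => {
      this.queue.enqueue({ resolve, priority });
    });
  }
  // Call periodically with the milliseconds elapsed since the previous call.
  refill(elapsedMs) {
    this.tokens = Math.min(this.limit, this.tokens + this.refillRate * elapsedMs);
    while (this.tokens >= 1 && this.queue.length > 0) {
      const { resolve } = this.queue.dequeue();
      this.tokens--;
      resolve();
    }
  }
}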
```
## File Structure
```
src/
├── index.js              # Entry point, start() function
├── schema.js             # Schema construction
├── context.js            # Context factory
├── types/                # Entity type definitions
│   ├── area.js
│   ├── artist.js
│   ├── collection.js
│   ├── disc.js
│   ├── event.js
│   ├── instrument.js
│   ├── label.js
│   ├── place.js
│   ├── recording.js
│   ├── release.js
│   ├── release-group.js
│   ├── series.js
│   ├── tag.js
│   ├── track.js
│   ├── url.js
│   ├── work.js
│   └── relationships.js
├── resolvers/            # Resolver implementations
│   ├── query.js
│   └── subquery.js
├── loaders/              # DataLoader batch functions
│   └── musicbrainz.js
├── rate-limit.js         # Rate limiter implementation
├── client.js             # Base HTTP client
└── extensions/           # Built-in extensions
    ├── cover-art-archive/
    ├── fanart/
    ├── mediawiki/
    └── theaudiodb/
```
## Relay Compliance
GraphBrainz implements the Relay specification for cursor-based pagination:
### Connection Pattern
All list fields return connection types:
```graphql
type ArtistConnection {
  edges: [ArtistEdge]
  nodes: [Artist]
  pageInfo: PageInfo!
  totalCount: Int
}
type ArtistEdge {
  node: Artist
  cursor: String!
}
type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
```
### Pagination Arguments
- `first: Int` - Number of items to return
- `after: String` - Cursor for pagination
- `last: Int` - Number of items from end (not implemented)
- `before: String` - Cursor for reverse pagination (not implemented)
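MusicBrainz browse and search requests page with `limit` and `offset`, so `first` and `after` have to be translated into those parameters. The sketch below assumes an opaque cursor that base64-encodes an item offset, in the style of the `graphql-relay` array connection helpers; the helper names, the cursor format, and the default page size of 25 are assumptions, not confirmed GraphBrainz behavior.

```javascript
// Assumed cursor format: base64("arrayconnection:<offset>").
const CURSOR_PREFIX = 'arrayconnection:';

function offsetToCursor(offset) {
  return Buffer.from(`${CURSOR_PREFIX}${offset}`, 'utf8').toString('base64');
}

function cursorToOffset(cursor) {
  const decoded = Buffer.from(cursor, 'base64').toString('utf8');
  if (!decoded.startsWith(CURSOR_PREFIX)) {
    throw new Error(`Invalid cursor: ${cursor}`);
  }
  const offset = Number.parseInt(decoded.slice(CURSOR_PREFIX.length), 10);
  if (!Number.isInteger(offset) || offset < 0) {
    throw new Error(`Invalid cursor: ${cursor}`);
  }
  return offset;
}

// Translate Relay arguments into limit/offset request parameters.
function toLimitOffset({ first, after } = {}, defaultLimit = 25) {
  return {
    limit: first ?? defaultLimit,
    // The cursor marks the last item already seen, so start one past it.
    offset: after ? cursorToOffset(after) + 1 : 0
  };
}
```

For example, a client that has received 25 items passes the cursor of the last one (offset 24) as `after`, and the next request starts at offset 25.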
### Node Interface
Global object identification via `node(id: ID!)` query:
```graphql
interface Node {
  id: ID!
}
```
All entity types implement the Node interface with globally unique IDs.
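Resolving `node(id: ID!)` requires recovering both the entity type and the MBID from a single ID, since the type decides which lookup to run. A common scheme, used by the `graphql-relay` helpers, is to base64-encode `Type:id`. The sketch below hand-rolls that scheme; whether GraphBrainz uses exactly this encoding is an assumption.

```javascript
// Encode "<TypeName>:<mbid>" as an opaque global ID.
function toGlobalId(typeName, mbid) {
  return Buffer.from(`${typeName}:${mbid}`, 'utf8').toString('base64');
}

// Decode a global ID back into its type name and MBID.
function fromGlobalId(globalId) {
  const decoded = Buffer.from(globalId, 'base64').toString('utf8');
  const separator = decoded.indexOf(':');
  if (separator === -1) {
    throw new Error(`Invalid global ID: ${globalId}`);
  }
  return {
    type: decoded.slice(0, separator),
    id: decoded.slice(separator + 1)
  };
}
```

A `node` resolver can then decode the ID and dispatch to the matching lookup, for instance `Artist` to the artist loader.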