metadata-agregator/docs/research/graphbrainz/analysis/CODEBASE.md
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


GraphBrainz Codebase

Configuration System

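GraphBrainz reads all of its configuration from environment variables. src/config.js also loads a .env file through dotenv, so the same variables can be set there.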

Core Configuration

Variable                 Type            Default                        Purpose
NODE_ENV                 string          development                    Environment mode
PORT                     number          3000                           Server port
GRAPHBRAINZ_PATH         string          /                              GraphQL endpoint path
GRAPHBRAINZ_CORS_ORIGIN  string/boolean  false                          CORS origin (false, *, or a URL)
GRAPHBRAINZ_GRAPHIQL     boolean         true if NODE_ENV=development   Enable the GraphiQL interface
GRAPHBRAINZ_EXTENSIONS   string          (none)                         Comma-separated extension list

Cache Configuration

Variable                Type    Default   Purpose
GRAPHBRAINZ_CACHE_SIZE  number  8192      LRU cache max items
GRAPHBRAINZ_CACHE_TTL   number  86400000  Cache TTL in milliseconds (1 day)
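These two settings describe an LRU cache bounded by item count whose entries also expire after a fixed TTL. The sketch below shows how the two limits combine, using a plain JavaScript Map (which iterates in insertion order) rather than whatever LRU library GraphBrainz actually depends on. The class name TtlLruCache and the injectable clock are illustrative assumptions.

```javascript
// Hypothetical LRU cache with a per-entry TTL, modeling
// GRAPHBRAINZ_CACHE_SIZE (max items) and GRAPHBRAINZ_CACHE_TTL (milliseconds).
class TtlLruCache {
  constructor({ max = 8192, ttl = 86400000, now = () => Date.now() } = {}) {
    this.max = max;
    this.ttl = ttl;
    this.now = now; // injectable clock, useful in tests
    this.entries = new Map(); // key -> { value, expiresAt }, least recently used first
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired entries are dropped lazily when read.
      this.entries.delete(key);
      return undefined;
    }
    // Re-insert so the key moves to the most recently used end.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttl });
    // Evict least recently used keys (the front of the Map) past the limit.
    while (this.entries.size > this.max) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}
```

With max: 2, setting a third key evicts whichever of the first two was read least recently; with the default TTL of 86400000 ms, an entry written at time t is served until t + 24 h.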

MusicBrainz Configuration

Variable              Type    Default                       Purpose
MUSICBRAINZ_BASE_URL  string  http://musicbrainz.org/ws/2/  MusicBrainz API endpoint

Extension Configuration

Cover Art Archive

Variable             Type    Default   Purpose
COVERART_CACHE_SIZE  number  8192      LRU cache max items
COVERART_CACHE_TTL   number  86400000  Cache TTL in milliseconds

fanart.tv

Variable           Type    Default   Purpose
FANART_API_KEY     string  (none)    API authentication (required)
FANART_CACHE_SIZE  number  8192      LRU cache max items
FANART_CACHE_TTL   number  86400000  Cache TTL in milliseconds

MediaWiki

Variable              Type    Default   Purpose
MEDIAWIKI_CACHE_SIZE  number  8192      LRU cache max items
MEDIAWIKI_CACHE_TTL   number  86400000  Cache TTL in milliseconds

TheAudioDB

Variable               Type    Default   Purpose
THEAUDIODB_API_KEY     string  (none)    API authentication (required)
THEAUDIODB_CACHE_SIZE  number  8192      LRU cache max items
THEAUDIODB_CACHE_TTL   number  86400000  Cache TTL in milliseconds
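Every extension repeats the same pattern: a PREFIX_CACHE_SIZE / PREFIX_CACHE_TTL pair plus, for fanart.tv and TheAudioDB, a required API key. A helper along these lines captures that pattern. The name extensionConfig, the requiresKey flag, and the choice to report a missing key instead of throwing are assumptions for illustration, not GraphBrainz code.

```javascript
// Hypothetical helper: builds one extension's settings from environment
// variables that share a prefix (COVERART, FANART, MEDIAWIKI, THEAUDIODB).
function extensionConfig(prefix, { requiresKey = false } = {}, env = process.env) {
  // Fall back only when the variable is absent or not a number.
  const int = (name, fallback) => {
    const parsed = Number.parseInt(env[name], 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  const apiKey = env[`${prefix}_API_KEY`];
  return {
    apiKey,
    // A required key that is unset leaves the extension unusable.
    missingKey: requiresKey && !apiKey,
    cacheSize: int(`${prefix}_CACHE_SIZE`, 8192),
    cacheTTL: int(`${prefix}_CACHE_TTL`, 86400000),
  };
}
```

For example, extensionConfig('FANART', { requiresKey: true }) flags missingKey when FANART_API_KEY is unset, while extensionConfig('MEDIAWIKI') never does.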

Configuration Loading

File: src/config.js

import dotenv from 'dotenv';

dotenv.config();

export default {
  port: parseInt(process.env.PORT, 10) || 3000,
  path: process.env.GRAPHBRAINZ_PATH || '/',
  corsOrigin: process.env.GRAPHBRAINZ_CORS_ORIGIN === 'false' 
    ? false 
    : process.env.GRAPHBRAINZ_CORS_ORIGIN || false,
  graphiql: process.env.GRAPHBRAINZ_GRAPHIQL === 'true' 
    || process.env.NODE_ENV === 'development',
  extensions: process.env.GRAPHBRAINZ_EXTENSIONS 
    ? process.env.GRAPHBRAINZ_EXTENSIONS.split(',') 
    : [],
  cache: {
    size: parseInt(process.env.GRAPHBRAINZ_CACHE_SIZE, 10) || 8192,
    ttl: parseInt(process.env.GRAPHBRAINZ_CACHE_TTL, 10) || 86400000
  },
  musicbrainz: {
    baseURL: process.env.MUSICBRAINZ_BASE_URL || 'http://musicbrainz.org/ws/2/'
  }
};
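One subtlety in the loader above: parseInt(value, 10) || fallback treats every falsy result as missing, so PORT=0 (which tells Node's server.listen to pick an ephemeral port) silently becomes 3000, and split(',') keeps the leading space in "a, b". A minimal sketch of stricter helpers follows; intFromEnv and listFromEnv are hypothetical names, not part of GraphBrainz.

```javascript
// Hypothetical env helpers: fall back only when a value is absent or invalid.
function intFromEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return fallback;
  const parsed = Number.parseInt(raw, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function listFromEnv(env, name) {
  const raw = env[name];
  if (!raw) return [];
  return raw
    .split(',')
    .map((item) => item.trim()) // accept "a, b" as well as "a,b"
    .filter((item) => item.length > 0); // drop empty entries from "a,,b"
}
```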

Logging System

GraphBrainz uses the debug package for namespace-based logging.

Debug Namespaces

Namespace                          Purpose                Location
graphbrainz:schema                 Schema construction    src/schema.js
graphbrainz:context                Context creation       src/context.js
graphbrainz:loaders                DataLoader operations  src/loaders/*.js
graphbrainz:rate-limit             Rate limiter activity  src/rate-limit.js
graphbrainz:api/client             HTTP requests          src/client.js
graphbrainz:extensions:coverart    Cover Art Archive      src/extensions/cover-art-archive/
graphbrainz:extensions:fanart      fanart.tv              src/extensions/fanart/
graphbrainz:extensions:mediawiki   MediaWiki              src/extensions/mediawiki/
graphbrainz:extensions:theaudiodb  TheAudioDB             src/extensions/theaudiodb/

Enabling Debug Logging

All Namespaces:

DEBUG=graphbrainz:* node cli.js

Specific Namespace:

DEBUG=graphbrainz:api/client node cli.js

Multiple Namespaces:

DEBUG=graphbrainz:schema,graphbrainz:loaders node cli.js

Exclude Namespaces:

DEBUG=graphbrainz:*,-graphbrainz:api/client node cli.js
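The DEBUG value is a comma- or space-separated list of patterns in which * is a wildcard and a leading - excludes matching namespaces; exclusions win over inclusions. The sketch below reimplements that rule in isolation to show why graphbrainz:*,-graphbrainz:api/client silences only the HTTP client. It is a simplified model of the debug package's behavior, not its source; isEnabled and compilePattern are hypothetical names.

```javascript
// Escape regex metacharacters except '*', which is treated as a wildcard.
function escapeRegExp(text) {
  return text.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
}

function compilePattern(pattern) {
  const source = escapeRegExp(pattern).replace(/\*/g, '.*?');
  return new RegExp(`^${source}$`);
}

// Simplified model of how a DEBUG pattern list enables one namespace.
function isEnabled(debugValue, namespace) {
  const patterns = debugValue.split(/[\s,]+/).filter(Boolean);
  const skips = patterns
    .filter((p) => p.startsWith('-'))
    .map((p) => compilePattern(p.slice(1)));
  const names = patterns.filter((p) => !p.startsWith('-')).map(compilePattern);
  // Exclusions are checked first, so they override any inclusion.
  if (skips.some((re) => re.test(namespace))) return false;
  return names.some((re) => re.test(namespace));
}
```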

Debug Output Format

graphbrainz:api/client GET http://musicbrainz.org/ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da +0ms
graphbrainz:loaders Artist loader: batching 3 requests +5ms
graphbrainz:rate-limit Acquired token (4 remaining) +10ms
graphbrainz:extensions:fanart GET http://webservice.fanart.tv/v3/music/5b11f4ce-a62d-471e-81fc-a69a8278c7da +150ms

Implementation

File: src/client.js

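import debug from 'debug';

const log = debug('graphbrainz:api/client');

class Client {
  // `httpClient` is assumed to be a got-style instance whose responses
  // expose `statusCode`; the excerpt does not show how it is created.
  constructor(httpClient) {
    this.client = httpClient;
  }

  async get(url, options) {
    log(`GET ${url}`);
    const response = await this.client.get(url, options);
    log(`Response: ${response.statusCode}`);
    return response;
  }
}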

Error Handling

GraphBrainz implements custom error classes for different failure modes.

Error Class Hierarchy

Error (built-in)
└── ExtendableError (es6-error)
    └── GraphBrainzError (base)
        ├── MusicBrainzError
        ├── CoverArtArchiveError
        ├── FanArtError
        ├── MediaWikiError
        ├── TheAudioDBError
        └── ValidationError

Custom Error Classes

File: src/errors.js

import ExtendableError from 'es6-error';

export class GraphBrainzError extends ExtendableError {
  constructor(message, statusCode) {
    super(message);
    this.statusCode = statusCode;
  }
}

export class MusicBrainzError extends GraphBrainzError {
  constructor(message, statusCode) {
    super(message, statusCode);
    this.name = 'MusicBrainzError';
  }
}

export class FanArtError extends GraphBrainzError {
  constructor(message, statusCode) {
    super(message, statusCode);
    this.name = 'FanArtError';
  }
}

export class TheAudioDBError extends GraphBrainzError {
  constructor(message, statusCode) {
    super(message, statusCode);
    this.name = 'TheAudioDBError';
  }
}

export class CoverArtArchiveError extends GraphBrainzError {
  constructor(message, statusCode) {
    super(message, statusCode);
    this.name = 'CoverArtArchiveError';
  }
}

export class ValidationError extends GraphBrainzError {
  constructor(message) {
    super(message, 400);
    this.name = 'ValidationError';
  }
}
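Because every provider error extends GraphBrainzError, callers can catch at whichever level they need. The dependency-free sketch below mirrors two branches of the hierarchy with plain extends Error (standing in for es6-error's ExtendableError, which likewise sets name from the constructor) and shows the resulting instanceof checks and name values. The describe helper is hypothetical.

```javascript
// Plain-Error stand-ins for classes in src/errors.js.
class GraphBrainzError extends Error {
  constructor(message, statusCode) {
    super(message);
    this.name = this.constructor.name; // e.g. "MusicBrainzError" in subclasses
    this.statusCode = statusCode;
  }
}
class MusicBrainzError extends GraphBrainzError {}
class ValidationError extends GraphBrainzError {
  constructor(message) {
    super(message, 400);
  }
}

// Hypothetical helper: classify an error by checking the most specific class first.
function describe(error) {
  if (error instanceof ValidationError) return `bad input (${error.statusCode})`;
  if (error instanceof GraphBrainzError) {
    return `${error.name} upstream failure (${error.statusCode})`;
  }
  return 'unexpected error';
}
```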

Error Handling in Resolvers

import { MusicBrainzError } from '../errors.js';

async function resolveArtist(parent, args, context) {
  try {
    return await context.loaders.artist.load(args.mbid);
  } catch (error) {
    if (error.statusCode === 404) {
      return null;  // Artist not found
    }
    throw new MusicBrainzError(
      `Failed to fetch artist: ${error.message}`,
      error.statusCode
    );
  }
}
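The resolver's policy (a 404 becomes null, anything else becomes a MusicBrainzError) can be exercised without a network by handing it a stub loader. In the sketch below, the minimal MusicBrainzError, the context.loaders.artist shape, and the failingContext stub are assumptions made for illustration.

```javascript
// Minimal stand-in for the MusicBrainzError class from src/errors.js.
class MusicBrainzError extends Error {
  constructor(message, statusCode) {
    super(message);
    this.name = 'MusicBrainzError';
    this.statusCode = statusCode;
  }
}

async function resolveArtist(parent, args, context) {
  try {
    return await context.loaders.artist.load(args.mbid);
  } catch (error) {
    if (error.statusCode === 404) return null; // not found is not an error
    throw new MusicBrainzError(`Failed to fetch artist: ${error.message}`, error.statusCode);
  }
}

// Hypothetical stub: a context whose artist loader always fails with `statusCode`.
function failingContext(statusCode) {
  const error = Object.assign(new Error(`HTTP ${statusCode}`), { statusCode });
  return {
    loaders: {
      artist: {
        load: async () => {
          throw error;
        },
      },
    },
  };
}
```

With this stub, resolveArtist(null, { mbid }, failingContext(404)) resolves to null, while failingContext(503) produces a rejected promise carrying statusCode 503.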

Scalar Validation Errors

File: src/scalars.js

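import { GraphQLScalarType, Kind } from 'graphql';
import { ValidationError } from './errors.js';

const MBID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Shared by parseValue and parseLiteral. A module-level function avoids
// depending on how graphql-js binds `this` when it calls these hooks.
function validateMBID(value) {
  if (typeof value !== 'string' || !MBID_PATTERN.test(value)) {
    throw new ValidationError(`Invalid MBID format: ${value}`);
  }
  return value;
}

export const MBID = new GraphQLScalarType({
  name: 'MBID',
  description: 'MusicBrainz ID (UUID format)',

  serialize(value) {
    return value;
  },

  // Values supplied through query variables.
  parseValue(value) {
    return validateMBID(value);
  },

  // Values written inline in the query document.
  parseLiteral(ast) {
    if (ast.kind !== Kind.STRING) {
      throw new ValidationError('MBID must be a string');
    }
    return validateMBID(ast.value);
  }
});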
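The MBID rule can be checked on its own, without building a schema. The sketch below is a dependency-free copy of the validation (parseMBID is a hypothetical name); it throws a TypeError instead of ValidationError to stay self-contained, and it rejects non-string input, which RegExp.prototype.test would otherwise coerce to a string.

```javascript
const MBID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical standalone validator mirroring the MBID scalar's rule.
function parseMBID(value) {
  if (typeof value !== 'string' || !MBID_PATTERN.test(value)) {
    throw new TypeError(`Invalid MBID format: ${value}`);
  }
  return value;
}
```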

GraphQL Error Formatting

File: src/index.js

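import { formatError } from 'graphql';
// express-graphql 0.10+ exposes the middleware as the named export graphqlHTTP.
import { graphqlHTTP as expressGraphQL } from 'express-graphql';
// Assumed export names; the excerpt does not show these modules' exports.
import { schema } from './schema.js';
import { context } from './context.js';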

function customFormatError(error) {
  const formatted = formatError(error);
  
  // Include stack trace in development only
  if (process.env.NODE_ENV === 'development') {
    formatted.stack = error.stack;
  }
  
  // Add custom error code
  if (error.originalError) {
    formatted.extensions = {
      ...formatted.extensions,
      code: error.originalError.name,
      statusCode: error.originalError.statusCode
    };
  }
  
  return formatted;
}

export const middleware = (options) => {
  return expressGraphQL({
    schema,
    context,
    graphiql: options.graphiql,
    customFormatErrorFn: customFormatError
  });
};

Error Response Format

Development:

{
  "errors": [
    {
      "message": "Failed to fetch artist: Network error",
      "locations": [{ "line": 2, "column": 3 }],
      "path": ["lookup", "artist"],
      "extensions": {
        "code": "MusicBrainzError",
        "statusCode": 503
      },
      "stack": "MusicBrainzError: Failed to fetch artist: Network error\n    at resolveArtist (src/resolvers/artist.js:15:11)\n    ..."
    }
  ],
  "data": null
}

Production:

{
  "errors": [
    {
      "message": "Failed to fetch artist: Network error",
      "locations": [{ "line": 2, "column": 3 }],
      "path": ["lookup", "artist"],
      "extensions": {
        "code": "MusicBrainzError",
        "statusCode": 503
      }
    }
  ],
  "data": null
}
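Because extensions.code and extensions.statusCode survive into production responses, a client can branch on them without parsing messages. The sketch below is a hypothetical client-side helper; treating 429 and 5xx as retryable is an assumption about sensible client policy, not GraphBrainz behavior.

```javascript
// Hypothetical client-side triage of a GraphBrainz error response.
function triageErrors(response) {
  return (response.errors || []).map((error) => {
    const { code = 'UNKNOWN', statusCode } = error.extensions || {};
    // Rate limiting and upstream server failures are worth retrying later.
    const retryable = statusCode === 429 || (statusCode >= 500 && statusCode < 600);
    return { path: (error.path || []).join('.'), code, statusCode, retryable };
  });
}
```

Applied to the production response above, this yields one entry with path "lookup.artist", code "MusicBrainzError", and retryable set to true.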

Testing Infrastructure

GraphBrainz uses the AVA test framework, with ava-nock for HTTP mocking.

Test Framework

Tool      Purpose        Version
AVA       Test runner    Latest
ava-nock  HTTP mocking   Latest
c8        Code coverage  Latest

Test Configuration

File: package.json

{
  "ava": {
    "files": [
      "test/**/*.test.js"
    ],
    "timeout": "30s",
    "verbose": true,
    "require": [
      "dotenv/config"
    ]
  }
}

HTTP Mocking with ava-nock

ava-nock provides three modes:

Mode    Purpose          Behavior
play    Replay fixtures  Use cached HTTP responses
record  Record fixtures  Make real HTTP requests and save the responses
cache   Hybrid           Use the cached response if available; record it if missing
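The three modes differ only in whether a request may reach the network and whether its response is written back to disk. That decision table can be expressed as a small function. planRequest is a hypothetical illustration of the semantics described in the table, not ava-nock's internals, and treating a missing fixture in play mode as an error is an assumption.

```javascript
// Hypothetical model of the play / record / cache decision for one request.
function planRequest(mode, fixtureExists) {
  switch (mode) {
    case 'play':
      // Replay only; without a fixture the request cannot be satisfied.
      return fixtureExists
        ? { source: 'fixture', save: false }
        : { source: 'error', save: false };
    case 'record':
      // Always hit the real API and write the response to the fixture.
      return { source: 'network', save: true };
    case 'cache':
      // Reuse a fixture when present, otherwise record a new one.
      return fixtureExists
        ? { source: 'fixture', save: false }
        : { source: 'network', save: true };
    default:
      throw new RangeError(`Unknown mode: ${mode}`);
  }
}
```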

Configuration:

import test from 'ava';
import nock from 'ava-nock';

test.before(() => {
  nock.setupTests({
    mode: 'play',  // or 'record', 'cache'
    fixtures: 'test/fixtures'
  });
});

Test Fixtures

Location: test/fixtures/*.nock

Format: JSON files containing HTTP request/response pairs

Example: test/fixtures/artist-lookup.nock

[
  {
    "scope": "http://musicbrainz.org:80",
    "method": "GET",
    "path": "/ws/2/artist/5b11f4ce-a62d-471e-81fc-a69a8278c7da?fmt=json",
    "status": 200,
    "response": {
      "id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
      "name": "Radiohead",
      "sort-name": "Radiohead",
      "type": "Group",
      "country": "GB"
    }
  }
]
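Replaying a fixture amounts to matching an outgoing request against the recorded scope, method, and path. The sketch below shows that matching for the Radiohead fixture above; findFixture is hypothetical, and real matching (query-string order, request bodies, headers) is more involved.

```javascript
// Hypothetical matcher for recorded request/response pairs.
function findFixture(fixtures, method, url) {
  const target = new URL(url);
  return fixtures.find((fixture) => {
    const scope = new URL(fixture.scope);
    return (
      fixture.method === method.toUpperCase() &&
      // URL drops default ports, so "http://host:80" and "http://host" compare equal.
      scope.origin === target.origin &&
      fixture.path === target.pathname + target.search
    );
  });
}
```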

Test Suite Structure

File: test/base-schema.test.js (~800 lines)

import test from 'ava';
import { graphql } from 'graphql';
import { schema, context } from '../src/index.js';

test('lookup artist by MBID', async t => {
  const query = `
    {
      lookup {
        artist(mbid: "5b11f4ce-a62d-471e-81fc-a69a8278c7da") {
          name
          country
        }
      }
    }
  `;
  
  const result = await graphql({
    schema,
    source: query,
    contextValue: context
  });
  
  t.is(result.errors, undefined);
  t.is(result.data.lookup.artist.name, 'Radiohead');
  t.is(result.data.lookup.artist.country, 'GB');
});

test('browse releases by artist', async t => {
  const query = `
    {
      browse {
        releases(artist: "5b11f4ce-a62d-471e-81fc-a69a8278c7da", first: 5) {
          edges {
            node {
              title
            }
          }
          totalCount
        }
      }
    }
  `;
  
  const result = await graphql({
    schema,
    source: query,
    contextValue: context
  });
  
  t.is(result.errors, undefined);
  t.true(result.data.browse.releases.edges.length > 0);
  t.true(result.data.browse.releases.totalCount > 0);
});

test('search artists', async t => {
  const query = `
    {
      search {
        artists(query: "artist:Radiohead", first: 5) {
          edges {
            score
            node {
              name
            }
          }
        }
      }
    }
  `;
  
  const result = await graphql({
    schema,
    source: query,
    contextValue: context
  });
  
  t.is(result.errors, undefined);
  t.true(result.data.search.artists.edges.length > 0);
  t.is(result.data.search.artists.edges[0].node.name, 'Radiohead');
});

Extension Tests

File: test/extended-schema.test.js (~675 lines)

import test from 'ava';
import { graphql } from 'graphql';
import { schema, context } from '../src/index.js';

test('Cover Art Archive extension', async t => {
  const query = `
    {
      lookup {
        release(mbid: "f0c8b1e5-c3b6-46c0-9641-25fd3c00e56a") {
          title
          coverArtArchive {
            front
            images {
              image
              thumbnails {
                large
              }
            }
          }
        }
      }
    }
  `;
  
  const result = await graphql({
    schema,
    source: query,
    contextValue: context
  });
  
  t.is(result.errors, undefined);
  t.true(result.data.lookup.release.coverArtArchive.front);
  t.true(result.data.lookup.release.coverArtArchive.images.length > 0);
});

Test Separation

GraphBrainz separates tests into two categories:

Test File                     Purpose                         Lines
test/base-schema.test.js      Core schema without extensions  ~800
test/extended-schema.test.js  Schema with all extensions      ~675

Coverage Configuration

File: package.json

{
  "scripts": {
    "test": "c8 ava",
    "coverage": "c8 report --reporter=text-lcov > coverage/lcov.info"
  },
  "c8": {
    "include": [
      "src/**/*.js"
    ],
    "exclude": [
      "test/**/*.js"
    ],
    "reporter": [
      "text",
      "lcov",
      "html"
    ],
    "all": true
  }
}

Coverage Reporting

Services: Codecov and Coveralls.

Upload:

npm run coverage
npx codecov
npx coveralls < coverage/lcov.info

File Structure

graphbrainz/
├── cli.js                          # CLI entry point
├── package.json                    # NPM package configuration
├── schema.json                     # Schema introspection JSON
├── schema.graphql                  # Schema SDL
├── Procfile                        # Heroku process definition
├── .travis.yml                     # Travis CI configuration
├── .env.example                    # Example environment variables
├── src/
│   ├── index.js                    # Main module exports
│   ├── schema.js                   # Schema construction
│   ├── context.js                  # Context factory
│   ├── config.js                   # Configuration loading
│   ├── client.js                   # Base HTTP client
│   ├── rate-limit.js               # Rate limiter implementation
│   ├── errors.js                   # Custom error classes
│   ├── scalars.js                  # Custom scalar types
│   ├── types/                      # Entity type definitions
│   │   ├── area.js
│   │   ├── artist.js
│   │   ├── collection.js
│   │   ├── disc.js
│   │   ├── event.js
│   │   ├── instrument.js
│   │   ├── label.js
│   │   ├── place.js
│   │   ├── recording.js
│   │   ├── release.js
│   │   ├── release-group.js
│   │   ├── series.js
│   │   ├── tag.js
│   │   ├── track.js
│   │   ├── url.js
│   │   ├── work.js
│   │   └── relationships.js
│   ├── resolvers/                  # Resolver implementations
│   │   ├── query.js
│   │   └── subquery.js
│   ├── loaders/                    # DataLoader batch functions
│   │   └── musicbrainz.js
│   └── extensions/                 # Built-in extensions
│       ├── cover-art-archive/
│       │   ├── index.js
│       │   ├── client.js
│       │   └── schema.js
│       ├── fanart/
│       │   ├── index.js
│       │   ├── client.js
│       │   └── schema.js
│       ├── mediawiki/
│       │   ├── index.js
│       │   ├── client.js
│       │   └── schema.js
│       └── theaudiodb/
│           ├── index.js
│           ├── client.js
│           └── schema.js
├── test/
│   ├── base-schema.test.js         # Core schema tests (~800 lines)
│   ├── extended-schema.test.js     # Extension tests (~675 lines)
│   └── fixtures/                   # HTTP mock fixtures
│       ├── artist-lookup.nock
│       ├── release-browse.nock
│       ├── artist-search.nock
│       └── ...
├── scripts/
│   ├── deploy.sh                   # Heroku deployment script
│   ├── generate-readme-toc.js      # README table of contents
│   ├── generate-schema-docs.js     # Schema documentation
│   ├── generate-type-docs.js       # Type documentation
│   └── generate-extension-docs.js  # Extension documentation
├── docs/                           # Generated documentation
│   ├── schema.md
│   ├── types.md
│   └── extensions.md
└── coverage/                       # Code coverage reports
    ├── lcov.info
    └── index.html

Code Metrics

Metric            Value
Total Lines       ~5000
Entity Types      17
Type Definitions  ~2000 lines
Test Suite        1475+ lines
Extensions        4 built-in
Dependencies      10 core

No Metrics/APM

GraphBrainz does not include:

  • Prometheus metrics
  • StatsD integration
  • APM (Application Performance Monitoring)
  • Health check endpoints
  • Readiness probes
  • Liveness probes

These would need to be added for production observability.
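As a starting point, liveness and readiness can be separated: liveness only reports that the process is serving HTTP, while readiness also reflects whether the service can do useful work. The /healthz and /readyz paths, the state object, and the choice to report "not ready" while the MusicBrainz API is failing are all assumptions; nothing like this exists in GraphBrainz.

```javascript
// Hypothetical probe logic, kept as a pure function so it can be wired into
// any HTTP server, such as the Express app that hosts the GraphQL middleware.
function handleProbe(path, state) {
  if (path === '/healthz') {
    // Liveness: the process is up and answering requests.
    return { status: 200, body: { status: 'ok' } };
  }
  if (path === '/readyz') {
    // Readiness: the schema is built and the upstream API is not known to be down.
    const ready = Boolean(state.schemaLoaded) && !state.upstreamDown;
    return {
      status: ready ? 200 : 503,
      body: { status: ready ? 'ready' : 'not ready', ...state },
    };
  }
  return null; // not a probe path; let the GraphQL handler respond
}
```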

No Structured Logging

GraphBrainz uses the debug package for logging, which is:

  • Namespace-based (good)
  • Opt-in via DEBUG env var (good)
  • Plain text output (not structured)
  • No log levels (only on/off per namespace)
  • No log aggregation support

For production, consider migrating to structured logging:

import pino from 'pino';

const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level: (label) => ({ level: label })
  }
});

logger.info({ mbid: '...', duration: 150 }, 'Artist lookup completed');
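To make concrete what the migration buys, here is a dependency-free sketch of the two missing properties: JSON lines that an aggregator can parse, and numeric level thresholds. The level numbers mirror pino's defaults (debug 20, info 30, warn 40, error 50). createLogger, the ns field that carries over the old debug namespace, and the injectable write sink are hypothetical.

```javascript
const LEVELS = { debug: 20, info: 30, warn: 40, error: 50 };

// Hypothetical structured logger: one JSON object per line, filtered by level.
function createLogger({
  level = 'info',
  ns,
  write = (line) => process.stdout.write(line + '\n'),
  now = () => Date.now(),
} = {}) {
  const threshold = LEVELS[level];
  const logger = {};
  for (const [name, value] of Object.entries(LEVELS)) {
    logger[name] = (fields, msg) => {
      if (value < threshold) return; // below the configured level
      write(JSON.stringify({ level: name, time: now(), ns, ...fields, msg }));
    };
  }
  return logger;
}
```

Keeping the debug namespace as a field preserves the per-component filtering the current setup has, while the level threshold replaces the on/off switch.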