Harmony - Architecture Analysis

System Architecture Overview

Harmony implements a 4-stage pipeline architecture for metadata aggregation and harmonization:

┌──────────┐     ┌────────────┐     ┌───────┐     ┌──────┐
│  LOOKUP  │ --> │ HARMONIZE  │ --> │ MERGE │ --> │ SEED │
└──────────┘     └────────────┘     └───────┘     └──────┘
     │                 │                 │             │
  Parallel         Provider          3-phase      MusicBrainz
  Multi-source     Conversion        Merge        Format
  Queries          to Harmony        Algorithm    Conversion

Each stage has distinct responsibilities and operates on well-defined data structures.

Stage 1: LOOKUP

CombinedReleaseLookup

The entry point for all metadata retrieval operations.

Location: harmonizer/combined_lookup.ts

Responsibilities:

  • Accepts GTIN, URLs, or provider-specific IDs
  • Determines which providers to query based on input
  • Executes provider lookups in parallel
  • Handles provider failures gracefully via Promise.allSettled
  • Returns array of provider-specific release objects

Input Types:

interface LookupInput {
	gtin?: string;           // Global Trade Item Number (barcode)
	urls?: string[];         // Provider URLs
	region?: string[];       // Market regions (e.g., ['GB', 'US', 'JP'])
	category?: string;       // Provider category filter
	providerIds?: Record<string, string>; // Provider-specific IDs
}

Parallel Execution:

// Conceptual flow: Promise.allSettled records each rejection itself,
// so no per-promise catch is needed
const lookupPromises = providers.map(provider => provider.lookup(input));
const results = await Promise.allSettled(lookupPromises);

Output: Array of provider-native release objects (Spotify, Deezer, iTunes formats, etc.)
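
As a minimal sketch of the graceful-failure behaviour described above, the following separates fulfilled lookups from failed ones. The `SketchProvider` and `LookupOutcome` shapes and the `lookupAll` function are hypothetical names introduced for illustration; they are not taken from Harmony's source.

```typescript
// Hypothetical minimal provider shape, for illustration only.
interface SketchProvider {
	name: string;
	lookup(gtin: string): Promise<unknown>;
}

interface LookupOutcome {
	releases: { provider: string; release: unknown }[];
	failures: { provider: string; reason: string }[];
}

async function lookupAll(providers: SketchProvider[], gtin: string): Promise<LookupOutcome> {
	// allSettled waits for every lookup and never rejects itself,
	// so one failing provider cannot discard the other results.
	const settled = await Promise.allSettled(providers.map((p) => p.lookup(gtin)));
	const outcome: LookupOutcome = { releases: [], failures: [] };
	settled.forEach((result, i) => {
		const provider = providers[i]?.name ?? 'unknown';
		if (result.status === 'fulfilled') {
			outcome.releases.push({ provider, release: result.value });
		} else {
			outcome.failures.push({ provider, reason: String(result.reason) });
		}
	});
	return outcome;
}
```

Keeping the failures alongside the releases lets a caller surface them as warnings instead of dropping them silently.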

Provider Selection Logic

  1. URL-based: Extract provider from URL pattern matching
  2. GTIN-based: Query all providers supporting GTIN lookup
  3. Category filtering: Apply user preferences (all/default/preferred)
  4. Region filtering: Pass region codes to region-aware providers
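
The selection steps above could be sketched roughly as follows. This is an illustration under assumptions: `ProviderInfo`, `SelectionInput` and `selectProviders` are hypothetical, a plain `RegExp` stands in for URL pattern matching, and the choice to apply the category filter only to GTIN lookups is a guess, not confirmed behaviour.

```typescript
type Category = 'all' | 'default' | 'preferred';

// Hypothetical provider descriptor; real providers expose more than this.
interface ProviderInfo {
	name: string;
	urlPattern: RegExp; // assumption: a regex stands in for URL pattern matching
	supportsGtin: boolean;
	categories: Category[];
}

interface SelectionInput {
	gtin?: string;
	urls?: string[];
	category?: Category;
}

function selectProviders(all: ProviderInfo[], input: SelectionInput): string[] {
	const selected = new Set<string>();
	// 1. URL-based: the first provider whose pattern matches handles the URL.
	for (const url of input.urls ?? []) {
		const match = all.find((p) => p.urlPattern.test(url));
		if (match) selected.add(match.name);
	}
	// 2 + 3. GTIN-based: every GTIN-capable provider in the requested category.
	if (input.gtin) {
		const category = input.category ?? 'default';
		for (const p of all) {
			const inCategory = category === 'all' || p.categories.includes(category);
			if (p.supportsGtin && inCategory) selected.add(p.name);
		}
	}
	return Array.from(selected);
}
```

A `Set` keeps a provider from being queried twice when it matches both a URL and the GTIN rule.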

Stage 2: HARMONIZE

Provider Conversion

Each provider implements a harmonize() method that converts its native format to HarmonyRelease.

Location: Individual provider files in providers/

Conversion Responsibilities:

  • Map provider-specific field names to Harmony schema
  • Normalize data types (dates, durations, ISRCs)
  • Extract nested structures (artists, labels, media)
  • Detect language and script from metadata
  • Resolve release types (album, single, EP, etc.)
  • Extract external links and identifiers
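
To make the "normalize data types" step concrete, here is a small sketch of three normalizers. The `PartialDate` shape and all three function names are assumptions made for this example; Harmony's own helpers may differ.

```typescript
// Assumed shape for a partial date: year is required, month and day optional.
interface PartialDate {
	year: number;
	month?: number;
	day?: number;
}

// Accepts 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'; anything else yields undefined.
function parsePartialDate(value: string): PartialDate | undefined {
	const match = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/.exec(value.trim());
	if (!match) return undefined;
	const date: PartialDate = { year: Number(match[1]) };
	if (match[2]) date.month = Number(match[2]);
	if (match[3]) date.day = Number(match[3]);
	return date;
}

// An ISRC is 2 letters, 3 alphanumerics, then 7 digits (year + designation).
function normalizeIsrc(raw: string): string | undefined {
	const compact = raw.replace(/[\s-]/g, '').toUpperCase();
	return /^[A-Z]{2}[A-Z0-9]{3}\d{7}$/.test(compact) ? compact : undefined;
}

// For a provider that reports durations in seconds; the schema uses milliseconds.
function secondsToMilliseconds(seconds: number): number {
	return Math.round(seconds * 1000);
}
```

Returning `undefined` for unparseable input lets the caller decide whether to drop the field or record a warning.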

Example Provider Conversion (conceptual):

class SpotifyProvider extends MetadataApiProvider {
	harmonize(spotifyAlbum: SpotifyAlbum): HarmonyRelease {
		return {
			title: spotifyAlbum.name,
			artists: this.convertArtists(spotifyAlbum.artists),
			gtin: spotifyAlbum.external_ids?.upc,
			media: this.convertTracks(spotifyAlbum.tracks),
			releaseDate: this.parseDate(spotifyAlbum.release_date),
			images: this.convertImages(spotifyAlbum.images),
			externalLinks: [{
				url: spotifyAlbum.external_urls.spotify,
				types: ['streaming']
			}],
			// ... additional fields
		};
	}
}

HarmonyRelease Schema

Location: harmonizer/types.ts (273 lines)

Core Structure:

interface HarmonyRelease {
	// Basic metadata
	title: string;
	artists: ArtistCreditName[];
	gtin?: string;
	
	// Media and tracks
	media: HarmonyMedium[];
	
	// Release details
	language?: string;
	script?: string;
	status?: ReleaseStatus;
	types: ReleaseType[];
	releaseDate?: PartialDate;
	
	// Commercial info
	labels: Label[];
	packaging?: PackagingType;
	copyright?: string;
	
	// Distribution
	availableIn?: string[];      // Country codes
	excludedFrom?: string[];     // Country codes
	
	// Visual assets
	images: Image[];
	
	// Links and identifiers
	externalLinks: ExternalLink[];
	
	// Metadata about metadata
	info: {
		providers: string[];           // Which providers contributed
		messages: Message[];           // Warnings, errors
		sourceMap?: SourceMap;         // Property -> provider mapping
		incompatibleData?: IncompatibilityInfo;
	};
}

Key Sub-structures:

ArtistCreditName

interface ArtistCreditName {
	name: string;              // Display name
	creditedName?: string;     // Alternative credit
	joinPhrase?: string;       // Separator (e.g., " & ", " feat. ")
	mbid?: string;             // MusicBrainz ID
}

HarmonyMedium

interface HarmonyMedium {
	title?: string;
	format?: MediumFormat;     // CD, Vinyl, Digital, etc.
	position: number;
	tracks: HarmonyTrack[];
}

HarmonyTrack

interface HarmonyTrack {
	title: string;
	artists?: ArtistCreditName[];
	position: number;
	length?: number;           // Duration in milliseconds
	isrc?: string;             // International Standard Recording Code
}

Label

interface Label {
	name: string;
	catalogNumber?: string;
	mbid?: string;
}

Image

interface Image {
	url: string;
	types: ImageType[];        // 'front', 'back', 'medium', etc.
	width?: number;
	height?: number;
	comment?: string;
}

Harmonizer Modules

Location: harmonizer/ directory

| Module | Purpose | Lines |
|--------|---------|-------|
| types.ts | HarmonyRelease schema and type definitions | 273 |
| merge.ts | 3-phase merge algorithm | ~200 |
| compatibility.ts | Conflict detection and resolution | ~150 |
| deduplicate.ts | Remove duplicate entries | ~100 |
| isrc.ts | ISRC validation and normalization | ~50 |
| language_script.ts | Auto-detect language and script | ~100 |
| release_label.ts | Label normalization | ~80 |
| release_types.ts | Release type inference | ~120 |
| tracklist_gap.ts | Detect missing tracks | ~60 |

Stage 3: MERGE

3-Phase Merge Algorithm

Location: harmonizer/merge.ts

The merge algorithm combines multiple HarmonyRelease objects into a single MergedHarmonyRelease using provider preferences and compatibility checking.

Phase 1: Property Collection

Collect all values for each property across all releases:

// Conceptual
const propertyValues = {
	title: ['Album Title', 'Album Title (Deluxe)', 'Album Title'],
	gtin: ['0602537347377', '0602537347377'],
	releaseDate: ['2014-11-24', '2014-11-24', '2014-11-25'],
	// ... all properties
};
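
One way the collection phase could look in code is sketched below. `ProviderRelease`, `Collected` and `collectPropertyValues` are hypothetical names, and releases are simplified to flat records rather than full HarmonyRelease objects.

```typescript
// Simplified inputs: each release is a flat record tagged with its provider.
type ProviderRelease = { provider: string; release: Record<string, unknown> };
type Collected = Record<string, { provider: string; value: unknown }[]>;

function collectPropertyValues(inputs: ProviderRelease[]): Collected {
	const collected: Collected = {};
	for (const { provider, release } of inputs) {
		for (const [property, value] of Object.entries(release)) {
			// Skip properties a provider did not supply.
			if (value === undefined || value === null) continue;
			const bucket = collected[property] ?? [];
			bucket.push({ provider, value });
			collected[property] = bucket;
		}
	}
	return collected;
}
```

Tagging each value with its provider at this point is what makes a later property-to-provider source map possible.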

Phase 2: Compatibility Checking

For each property, check if values are compatible:

interface CompatibilityCheck {
	compatible: boolean;
	canonicalValue?: any;
	conflicts?: ConflictInfo[];
}

Compatibility Rules:

  • Strings: Case-insensitive comparison, whitespace normalization
  • Dates: Partial date matching (year-only vs. full date)
  • Arrays: Set comparison (order-independent)
  • Numbers: Exact match or within tolerance
  • Objects: Recursive field comparison

Example Compatibility:

// Compatible
'2014-11-24' ≈ '2014-11'       // Partial date match
'Album Title' ≈ 'album title'  // Case-insensitive

// Incompatible
'2014-11-24' ≠ '2014-11-25'    // Date conflict
'Album' ≠ 'EP'                 // Type conflict
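
The string, date and array rules can be sketched as three small predicates. These are illustrative implementations under the rules listed above, with hypothetical function names; the actual checks in compatibility.ts may be more elaborate.

```typescript
// Strings: ignore case and collapse runs of whitespace before comparing.
function normalizeText(value: string): string {
	return value.trim().replace(/\s+/g, ' ').toLowerCase();
}

function stringsCompatible(a: string, b: string): boolean {
	return normalizeText(a) === normalizeText(b);
}

// Dates: 'YYYY[-MM[-DD]]' values agree when every component
// that both sides specify is equal.
function datesCompatible(a: string, b: string): boolean {
	const partsA = a.split('-');
	const partsB = b.split('-');
	const shared = Math.min(partsA.length, partsB.length);
	for (let i = 0; i < shared; i++) {
		if (partsA[i] !== partsB[i]) return false;
	}
	return true;
}

// Arrays: compare as sets, so element order does not matter.
function setsCompatible(a: string[], b: string[]): boolean {
	const setA = new Set(a);
	const setB = new Set(b);
	return setA.size === setB.size && a.every((item) => setB.has(item));
}
```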

Phase 3: Value Selection

For each property, select the best value using provider preferences:

Provider Preference Order (configurable):

  1. MusicBrainz (template/reference)
  2. Spotify (high quality, comprehensive)
  3. Tidal (high quality audio metadata)
  4. Deezer (good coverage)
  5. iTunes (region-specific)
  6. Bandcamp (artist-verified)
  7. Beatport (electronic music specialist)
  8. Mora (Japan specialist)
  9. Ototoy (Japan specialist)

Selection Logic:

function selectBestValue(values: PropertyValues, preferences: string[]): any {
	// 1. Filter to compatible values only
	const compatible = values.filter(v => v.isCompatible);
	
	// 2. If no compatible values, mark as conflict
	if (compatible.length === 0) {
		return { conflict: true, values };
	}
	
	// 3. Select from highest-preference provider
	for (const provider of preferences) {
		const value = compatible.find(v => v.provider === provider);
		if (value) return value.data;
	}
	
	// 4. Fallback to first compatible value
	return compatible[0].data;
}

MergedHarmonyRelease

Extends HarmonyRelease with merge metadata:

interface MergedHarmonyRelease extends HarmonyRelease {
	sourceMap: SourceMap;              // Property -> provider mapping
	incompatibleData?: IncompatibilityInfo;
}

interface SourceMap {
	[propertyPath: string]: string;    // e.g., "title" -> "spotify"
}

interface IncompatibilityInfo {
	conflicts: Conflict[];
	warnings: string[];
}

interface Conflict {
	property: string;
	values: Array<{
		provider: string;
		value: any;
	}>;
}
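
A rough sketch of how a source map and conflict list might be filled during value selection is shown below. It assumes string-valued properties and a lenient policy (the preferred provider still wins when values disagree); `Candidate`, `SketchMergeResult` and `mergeWithSourceMap` are hypothetical names.

```typescript
interface Candidate { provider: string; value: string; }
interface SketchConflict { property: string; values: Candidate[]; }
interface SketchMergeResult {
	merged: Record<string, string>;
	sourceMap: Record<string, string>;
	conflicts: SketchConflict[];
}

function mergeWithSourceMap(
	collected: Record<string, Candidate[]>,
	preferences: string[],
): SketchMergeResult {
	// Providers missing from the preference list rank last.
	const rank = (provider: string): number => {
		const index = preferences.indexOf(provider);
		return index === -1 ? preferences.length : index;
	};
	const result: SketchMergeResult = { merged: {}, sourceMap: {}, conflicts: [] };
	for (const [property, candidates] of Object.entries(collected)) {
		const winner = candidates.slice().sort((a, b) => rank(a.provider) - rank(b.provider))[0];
		if (!winner) continue; // no provider supplied this property
		// Record a conflict when the normalized values are not all equal.
		const distinct = new Set(candidates.map((c) => c.value.trim().toLowerCase()));
		if (distinct.size > 1) result.conflicts.push({ property, values: candidates });
		result.merged[property] = winner.value;
		result.sourceMap[property] = winner.provider;
	}
	return result;
}
```

The source map records where each merged value came from, so a reviewer can trace a questionable field back to its provider.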

Deduplication

Location: harmonizer/deduplicate.ts

Removes duplicate entries in arrays:

  • Artists: Match by name (case-insensitive) or MBID
  • Labels: Match by name and catalog number
  • Tracks: Match by position and title
  • Images: Match by URL or dimensions
  • External links: Match by URL
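
For the artist rule, a minimal sketch could look like this. `ArtistCredit` is a trimmed-down stand-in for ArtistCreditName, and keeping the MBID from a later duplicate is an assumption about the desired behaviour.

```typescript
// Trimmed-down stand-in for ArtistCreditName.
interface ArtistCredit {
	name: string;
	mbid?: string;
}

function deduplicateArtists(artists: ArtistCredit[]): ArtistCredit[] {
	const unique: ArtistCredit[] = [];
	for (const artist of artists) {
		// Two credits refer to the same artist if their MBIDs match
		// or their names match ignoring case.
		const existing = unique.find((u) =>
			(u.mbid !== undefined && u.mbid === artist.mbid) ||
			u.name.toLowerCase() === artist.name.toLowerCase()
		);
		if (!existing) {
			unique.push({ ...artist });
		} else if (!existing.mbid && artist.mbid) {
			// Assumption: prefer the richer entry by adopting the MBID.
			existing.mbid = artist.mbid;
		}
	}
	return unique;
}
```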

Compatibility Checking

Location: harmonizer/compatibility.ts

Detects and reports incompatible data:

Incompatibility Types:

  1. Value conflicts: Different values for same property
  2. Type conflicts: Different data types
  3. Structural conflicts: Different array lengths, missing required fields
  4. Semantic conflicts: Logically incompatible values (e.g., release date before artist birth)

Handling:

  • Strict mode: Reject merge if any conflicts
  • Lenient mode: Prefer highest-quality provider, log warnings
  • User override: Allow manual conflict resolution

Stage 4: SEED

MusicBrainz Seeding

Location: musicbrainz/seeding.ts

Converts MergedHarmonyRelease to MusicBrainz import format.

Conversion Steps:

  1. Map HarmonyRelease fields to MusicBrainz schema
  2. Generate edit notes with provider URLs
  3. Create permalink for reproducibility
  4. Build annotation with extra data (copyright, availability)
  5. Format for MusicBrainz seeder form

MusicBrainz Mapping:

| Harmony Field | MusicBrainz Field | Notes |
|---------------|-------------------|-------|
| title | Release name | Direct mapping |
| artists | Artist credit | Join with joinPhrase |
| gtin | Barcode | Validate format |
| releaseDate | Release events | Per-country events |
| labels | Release labels | With catalog numbers |
| media | Mediums | With format and tracks |
| types | Release group types | Primary + secondary |
| language | Language | ISO 639-3 code |
| script | Script | ISO 15924 code |
| packaging | Packaging | Jewel case, digipak, etc. |
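
Two of the mapping notes lend themselves to short examples: joining an artist credit with its join phrases, and validating a barcode. The sketch below uses the standard GS1 check-digit rule; `Credit`, `formatArtistCredit` and `isValidGtin` are hypothetical names, not Harmony functions.

```typescript
interface Credit {
	name: string;
	creditedName?: string;
	joinPhrase?: string;
}

// Each credit contributes its displayed name followed by its join phrase.
function formatArtistCredit(credits: Credit[]): string {
	return credits.map((c) => (c.creditedName ?? c.name) + (c.joinPhrase ?? '')).join('');
}

// GS1 check digit: weight the digits 3,1,3,1,... starting from the digit
// next to the check digit; the total plus the check digit is a multiple of 10.
function isValidGtin(gtin: string): boolean {
	if (!/^(\d{8}|\d{12,14})$/.test(gtin)) return false;
	const digits = gtin.split('').map(Number);
	const checkDigit = digits.pop();
	let sum = 0;
	digits.reverse().forEach((digit, i) => {
		sum += digit * (i % 2 === 0 ? 3 : 1);
	});
	return (10 - (sum % 10)) % 10 === checkDigit;
}
```

The example GTIN used throughout this document, 0602537347377, passes this check.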

Edit Note Generation:

function generateEditNote(release: MergedHarmonyRelease, permalink: string): string {
	const sources = release.info.providers.join(', ');
	return `
Imported from ${sources} via Harmony
Permalink: ${permalink}
${release.externalLinks.map(link => link.url).join('\n')}
	`.trim();
}

MBID Resolution

Location: musicbrainz/mbid_mapping.ts

Resolves external URLs to MusicBrainz IDs (MBIDs).

Batch Lookup:

  • Collects up to 100 URLs
  • Single MusicBrainz API request: GET /ws/2/url?resource={url1}&resource={url2}&...
  • Caches results in localStorage (dev) or sessionStorage (prod)
  • Returns MBID mappings
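
The batching step might be sketched as follows: external URLs are split into chunks of at most 100 and each chunk becomes one request URL. The function name, the default API base and the added `fmt=json` parameter are assumptions for illustration.

```typescript
const MAX_URLS_PER_REQUEST = 100;

// Builds one MusicBrainz URL-lookup request per chunk of up to 100 URLs.
function buildUrlLookupRequests(
	externalUrls: string[],
	apiBase: string = 'https://musicbrainz.org/ws/2/',
): string[] {
	const requests: string[] = [];
	for (let start = 0; start < externalUrls.length; start += MAX_URLS_PER_REQUEST) {
		const chunk = externalUrls.slice(start, start + MAX_URLS_PER_REQUEST);
		// Each URL is passed as its own percent-encoded resource parameter.
		const query = chunk.map((url) => `resource=${encodeURIComponent(url)}`).join('&');
		requests.push(`${apiBase}url?${query}&fmt=json`);
	}
	return requests;
}
```

With 250 URLs this yields three requests carrying 100, 100 and 50 `resource` parameters.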

Duplicate Detection:

  • Checks if release already exists in MusicBrainz
  • Warns user before creating duplicate
  • Provides link to existing release

Cache Strategy:

interface MBIDCache {
	[externalUrl: string]: {
		mbid: string;
		type: 'release' | 'release-group' | 'recording' | 'artist';
		cached: number;  // Timestamp
	};
}

Annotation Builder

Location: musicbrainz/annotation.ts

Generates MusicBrainz annotation text for additional metadata:

Included Data:

  • Copyright information
  • Availability/exclusion regions
  • Provider-specific notes
  • Compatibility warnings
  • Image URLs (if not added as cover art)

Format:

Copyright: © 2014 Record Label
Available in: US, GB, DE, JP
Excluded from: CN

Sources:
- Spotify: https://open.spotify.com/album/xyz
- Deezer: https://www.deezer.com/album/123

Notes:
- Release date conflict: Spotify (2014-11-24) vs iTunes (2014-11-25)

Provider Architecture

Base Class Hierarchy

MetadataProvider (abstract)
├── MetadataApiProvider (OAuth2 support)
│   ├── SpotifyProvider
│   └── TidalProvider
├── ReleaseLookup (GTIN/URL/ID support)
│   ├── DeezerProvider
│   ├── iTunesProvider
│   ├── BandcampProvider
│   ├── BeatportProvider
│   ├── MoraProvider
│   └── OtotoyProvider
└── ReleaseApiLookup (multi-region support)
    ├── iTunesProvider
    └── DeezerProvider

MetadataProvider (Abstract Base)

Location: providers/base.ts

Core Responsibilities:

  • URL pattern matching via URLPattern
  • Rate limiting with configurable delays
  • HTTP response caching via snap_storage
  • Error handling and retry logic
  • Feature quality ratings

Key Methods:

// Signature outline only: method bodies are omitted
declare abstract class MetadataProvider {
	// URL pattern matching
	abstract urlPattern: URLPattern;
	matchesUrl(url: string): boolean;
	
	// Lookup methods
	abstract lookupByUrl(url: string): Promise<Release>;
	abstract lookupByGtin(gtin: string, region?: string): Promise<Release>;
	
	// Harmonization
	abstract harmonize(release: Release): HarmonyRelease;
	
	// Rate limiting
	protected rateLimit: RateLimiter;
	protected throttle(): Promise<void>;
	
	// Caching
	protected cache: SnapStorage;
	protected getCached(key: string): Promise<Response | null>;
	protected setCached(key: string, response: Response): Promise<void>;
	
	// Feature quality
	abstract featureQuality: FeatureQualityMap;
}
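
Rate limiting with a configurable delay can be illustrated with a small limiter that spaces calls at least `minDelayMs` apart. This is a sketch, not Harmony's RateLimiter: the class name is hypothetical, and the injectable `now` and `sleep` functions are an assumption that makes the timing testable.

```typescript
class SimpleRateLimiter {
	private nextAllowed = 0;
	private readonly minDelayMs: number;
	private readonly now: () => number;
	private readonly sleep: (ms: number) => Promise<void>;

	constructor(
		minDelayMs: number,
		now: () => number = () => Date.now(),
		sleep: (ms: number) => Promise<void> = (ms) =>
			new Promise<void>((resolve) => { setTimeout(resolve, ms); }),
	) {
		this.minDelayMs = minDelayMs;
		this.now = now;
		this.sleep = sleep;
	}

	// Resolves once the caller may send its request.
	async throttle(): Promise<void> {
		const current = this.now();
		const wait = Math.max(0, this.nextAllowed - current);
		// Reserve the next slot before awaiting, so concurrent callers queue up.
		this.nextAllowed = current + wait + this.minDelayMs;
		if (wait > 0) await this.sleep(wait);
	}
}
```

A provider would call `await limiter.throttle()` before each outgoing request; the first call passes immediately and later calls wait for their reserved slot.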

MetadataApiProvider (OAuth2)

Location: providers/api_base.ts

Additional Responsibilities:

  • OAuth2 token acquisition and refresh
  • Token caching in localStorage
  • Automatic token renewal
  • API client configuration

OAuth2 Flow:

abstract class MetadataApiProvider extends MetadataProvider {
	protected async getAccessToken(): Promise<string> {
		// 1. Check cache (localStorage stores strings, so parse the JSON first)
		const cached = localStorage.getItem(`${this.name}_token`);
		if (cached) {
			const cachedToken: OAuth2Token = JSON.parse(cached);
			if (!this.isTokenExpired(cachedToken)) {
				return cachedToken.access_token;
			}
		}
		
		// 2. Request new token
		const token = await this.requestToken();
		
		// 3. Cache token
		localStorage.setItem(`${this.name}_token`, JSON.stringify(token));
		
		return token.access_token;
	}
	
	// Abstract methods cannot carry the async modifier; the return type suffices
	protected abstract requestToken(): Promise<OAuth2Token>;
}

ReleaseLookup

Location: providers/release_lookup.ts

Lookup Methods:

interface ReleaseLookup {
	lookupByUrl(url: string): Promise<Release>;
	lookupByGtin(gtin: string): Promise<Release>;
	lookupById(id: string): Promise<Release>;
}

ReleaseApiLookup (Multi-Region)

Location: providers/release_api_lookup.ts

Region Handling:

abstract class ReleaseApiLookup extends ReleaseLookup {
	protected supportedRegions: string[];  // ['US', 'GB', 'JP', ...]
	
	async lookupByGtin(gtin: string, regions: string[]): Promise<Release[]> {
		const lookups = regions
			.filter(r => this.supportedRegions.includes(r))
			.map(r => this.lookupInRegion(gtin, r));
		
		const results = await Promise.allSettled(lookups);
		return results
			// The type guard narrows each result so that .value is accessible
			.filter((r): r is PromiseFulfilledResult<Release> => r.status === 'fulfilled')
			.map(r => r.value);
	}
	
	protected abstract lookupInRegion(gtin: string, region: string): Promise<Release>;
}

Provider Registry

Location: providers/registry.ts

Manages provider instantiation and categorization.

Registry Structure:

class ProviderRegistry {
	private providers: Map<string, MetadataProvider>;
	private categories: Map<string, string[]>;  // category -> provider names
	
	register(provider: MetadataProvider, category: string): void;
	get(name: string): MetadataProvider | undefined;
	getByCategory(category: string): MetadataProvider[];
	getByUrl(url: string): MetadataProvider | undefined;
	getByGtin(): MetadataProvider[];  // All GTIN-supporting providers
}

Categories:

  • default: Commonly used providers (Spotify, Deezer, iTunes)
  • preferred: High-quality providers (Spotify, Tidal, MusicBrainz)
  • all: All registered providers
  • japan: Japan-specific providers (Mora, Ototoy)
  • electronic: Electronic music specialists (Beatport)
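
A compact sketch of such a registry is given below. It deviates from the outline above in one labeled way: `register` accepts several categories, since the category list places Spotify in both default and preferred. `RegisteredProvider` and `SketchRegistry` are hypothetical names.

```typescript
// Hypothetical minimal provider shape for the registry sketch.
interface RegisteredProvider {
	name: string;
	supportsGtin: boolean;
	matchesUrl(url: string): boolean;
}

class SketchRegistry {
	private readonly providers = new Map<string, RegisteredProvider>();
	private readonly categories = new Map<string, string[]>();

	// Every provider also lands in the implicit 'all' category.
	register(provider: RegisteredProvider, ...categories: string[]): void {
		this.providers.set(provider.name, provider);
		for (const category of ['all', ...categories]) {
			const names = this.categories.get(category) ?? [];
			if (!names.includes(provider.name)) names.push(provider.name);
			this.categories.set(category, names);
		}
	}

	getByCategory(category: string): RegisteredProvider[] {
		const result: RegisteredProvider[] = [];
		for (const name of this.categories.get(category) ?? []) {
			const provider = this.providers.get(name);
			if (provider) result.push(provider);
		}
		return result;
	}

	getByUrl(url: string): RegisteredProvider | undefined {
		return Array.from(this.providers.values()).find((p) => p.matchesUrl(url));
	}

	getByGtin(): RegisteredProvider[] {
		return Array.from(this.providers.values()).filter((p) => p.supportsGtin);
	}
}
```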

Feature Quality Ratings

Each provider declares quality ratings for supported features:

interface FeatureQualityMap {
	gtin: FeatureQuality;
	title: FeatureQuality;
	artists: FeatureQuality;
	releaseDate: FeatureQuality;
	labels: FeatureQuality;
	media: FeatureQuality;
	tracks: FeatureQuality;
	isrc: FeatureQuality;
	images: FeatureQuality | number;  // Number = max dimension
	copyright: FeatureQuality;
	availability: FeatureQuality;
}

enum FeatureQuality {
	MISSING = 0,
	BAD = 1,
	PRESENT = 2,
	GOOD = 3,
}

Example (Spotify):

featureQuality = {
	gtin: FeatureQuality.GOOD,
	title: FeatureQuality.GOOD,
	artists: FeatureQuality.GOOD,
	releaseDate: FeatureQuality.GOOD,
	labels: FeatureQuality.PRESENT,
	media: FeatureQuality.GOOD,
	tracks: FeatureQuality.GOOD,
	isrc: FeatureQuality.GOOD,
	images: 2000,  // Max 2000px
	copyright: FeatureQuality.PRESENT,
	availability: FeatureQuality.GOOD,
};
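
One plausible use of these ratings is choosing which provider to trust for a given feature. The sketch below is an assumption about usage, not documented behaviour; it uses a plain constant object instead of the enum, and the ratings passed to it are made up for the example.

```typescript
// Numeric levels mirroring the FeatureQuality values above.
const Quality = { MISSING: 0, BAD: 1, PRESENT: 2, GOOD: 3 } as const;

type QualityMap = Record<string, number>;

// Returns the provider with the highest rating for a feature, or undefined
// if no provider rates it above MISSING. For images, a pixel dimension such
// as 2000 compares as a plain number, so the larger maximum size wins.
function bestProviderFor(
	feature: string,
	ratings: Record<string, QualityMap>,
): string | undefined {
	let best: string | undefined;
	let bestScore: number = Quality.MISSING;
	for (const [provider, map] of Object.entries(ratings)) {
		const score = map[feature] ?? Quality.MISSING;
		if (score > bestScore) {
			best = provider;
			bestScore = score;
		}
	}
	return best;
}
```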

Server Architecture (Fresh Framework)

Fresh Islands Architecture

Fresh uses a hybrid rendering model:

  • Server-side rendering (SSR): Default for all components
  • Islands: Client-side interactive components

Benefits:

  • Minimal JavaScript shipped to client
  • Fast initial page load
  • Progressive enhancement
  • SEO-friendly

Route Structure

Location: routes/ directory

| Route File | URL | Purpose |
|------------|-----|---------|
| index.tsx | / | Landing page |
| release.tsx | /release | Main lookup interface |
| release/actions.tsx | /release/actions | ISRC/cover submission |
| about.tsx | /about | Provider documentation |
| settings.tsx | /settings | User preferences |

Components

Location: components/ directory

22 Static Components (server-rendered):

  • Layout components (Header, Footer, Navigation)
  • Display components (ReleaseInfo, TrackList, ArtistCredit)
  • Comparison components (ProviderTable, FeatureMatrix)
  • Form components (LookupForm, SeederForm)

5 Interactive Islands (client-side):

  • LookupForm.tsx: Dynamic form with validation
  • ProviderSelector.tsx: Provider category filtering
  • RegionSelector.tsx: Multi-region selection
  • PermalinkGenerator.tsx: Timestamp-based permalink creation
  • SeederForm.tsx: MusicBrainz import form with copy-to-clipboard

Request Flow

1. Browser Request
   ↓
2. Fresh Router (routes/release.tsx)
   ↓
3. CombinedReleaseLookup (parallel provider queries)
   ↓
4. Provider Harmonization (convert to HarmonyRelease)
   ↓
5. Merge Algorithm (combine releases)
   ↓
6. Server-Side Rendering (generate HTML)
   ↓
7. HTTP Response (HTML delivered to the browser)
   ↓
8. Island Hydration (activate interactive components in the browser)

Data Flow Diagram

┌─────────────────────────────────────────────────────────────┐
│                        User Input                            │
│  GTIN: 0602537347377  URLs: [spotify, deezer]  Region: US   │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                  CombinedReleaseLookup                       │
│  - Parse input                                               │
│  - Select providers (Spotify, Deezer)                        │
│  - Execute parallel lookups                                  │
└────────────────────────┬────────────────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         ▼               ▼               ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Spotify   │  │   Deezer    │  │   iTunes    │
│   Provider  │  │   Provider  │  │   Provider  │
│             │  │             │  │             │
│ - API call  │  │ - API call  │  │ - API call  │
│ - Cache     │  │ - Cache     │  │ - Cache     │
│ - Parse     │  │ - Parse     │  │ - Parse     │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │
       ▼                ▼                ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  Harmonize  │  │  Harmonize  │  │  Harmonize  │
│  (Spotify)  │  │  (Deezer)   │  │  (iTunes)   │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │
       └────────────────┼────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                     Merge Algorithm                          │
│  Phase 1: Collect property values from all releases         │
│  Phase 2: Check compatibility                                │
│  Phase 3: Select best value per property                     │
└────────────────────────┬────────────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                  MergedHarmonyRelease                        │
│  - Unified metadata                                          │
│  - Source map (property -> provider)                         │
│  - Incompatibility warnings                                  │
└────────────────────────┬────────────────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         ▼                               ▼
┌─────────────────┐              ┌─────────────────┐
│  Web UI Display │              │  MusicBrainz    │
│  - Comparison   │              │  Seeding        │
│  - Warnings     │              │  - Convert      │
│  - Permalink    │              │  - Edit note    │
└─────────────────┘              │  - Annotation   │
                                 └─────────────────┘

Summary

Harmony's architecture demonstrates:

  1. Clear separation of concerns: 4-stage pipeline with distinct responsibilities
  2. Provider abstraction: Base classes handle common functionality (caching, rate limiting, OAuth2)
  3. Type safety: a shared HarmonyRelease schema (harmonizer/types.ts, 273 lines) gives every stage a common typed contract
  4. Intelligent merging: 3-phase algorithm with compatibility checking and provider preferences
  5. Graceful degradation: Promise.allSettled ensures partial results on provider failures
  6. MusicBrainz integration: Seamless conversion to MB format with MBID resolution
  7. Modern web stack: Fresh framework with SSR and islands for optimal performance

These properties make the architecture a useful reference for building metadata aggregation systems.