Harmony - Project Overview

Project Identity

| Property | Value |
| --- | --- |
| Name | Harmony |
| Repository | https://github.com/kellnerd/harmony |
| License | MIT (2022-2024 David Kellner) |
| Language | TypeScript |
| Runtime | Deno |
| Primary Framework | Fresh 1.6.8 |
| UI Library | Preact 10.19.6 |
| Purpose | Music metadata aggregator and MusicBrainz importer |

Core Purpose

Harmony addresses two problems in music metadata management:

  1. Multi-source metadata aggregation: Fetches release information from 9 music platforms and merges it into a single harmonized dataset
  2. MusicBrainz import facilitation: Converts the aggregated metadata into a MusicBrainz-compatible format for seeding new releases or improving existing entries

The project targets MusicBrainz editors and music metadata enthusiasts who need to cross-reference multiple sources when adding or verifying release information.

Technical Stack

Runtime and Framework

  • Deno: Modern TypeScript/JavaScript runtime with built-in tooling
  • Fresh 1.6.8: Deno-native web framework with server-side rendering and islands architecture
  • Preact 10.19.6: Lightweight React alternative for interactive UI components

Key Dependencies

| Dependency | Purpose |
| --- | --- |
| @kellnerd/musicbrainz | MusicBrainz API client and data structures |
| snap-storage | HTTP response caching with SQLite backend |
| @std/* | Deno standard library modules (log, testing, http, etc.) |
| preact | UI rendering and component system |
| preact-render-to-string | Server-side rendering |

Entry Points

The project provides three distinct entry points for different use cases:

1. Web Server (Production)

# File: server/main.ts
deno task server

Starts the Fresh web application for interactive metadata lookup and comparison.

2. Development Server

# File: server/dev.ts
deno task dev

Runs the web server with auto-reload on file changes.

3. Command-Line Interface

# File: cli.ts
deno task cli

Provides terminal-based GTIN/URL lookup for testing and automation.

Available Tasks

The deno.json configuration defines the following tasks:

| Task | Command | Purpose |
| --- | --- | --- |
| check | `deno fmt --check && deno lint && deno check **/*.ts` | Verify code formatting, linting, and type checking |
| ok | `deno fmt && deno lint && deno check **/*.ts && deno test -A` | Format, lint, check, and test in one command |
| cli | `deno run -A cli.ts` | Run command-line interface |
| dev | `deno run -A --watch=static/,routes/ server/dev.ts` | Start development server with auto-reload |
| build | `deno run -A server/dev.ts build` | Build static assets |
| server | `DENO_DEPLOYMENT_ID=$(git describe --tags --always) deno run -A server/main.ts` | Start production server |

Provider Ecosystem

Harmony integrates with 9 music metadata providers, categorized by access method:

API-Based Providers (5)

| Provider | Authentication | Rate Limit | Max Image Size | GTIN Support |
| --- | --- | --- | --- | --- |
| Spotify | OAuth2 | Not specified | 2000px | Yes (UPC) |
| Deezer | Public API | 50 req/5s | 1400px | Yes |
| iTunes | Public API | Not specified | Varies | Yes |
| Tidal | OAuth2 | Not specified | 1280px | Yes |
| MusicBrainz | Public API | 5 req/5s | N/A | Yes (barcode) |
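A limit such as "50 req/5s" can be enforced on the client with a sliding window. The following is a minimal sketch under that assumption; `SlidingWindowLimiter` and `reserve` are hypothetical names and do not describe how Harmony itself throttles requests. The clock is passed in explicitly so the behavior is deterministic.

```typescript
// Hypothetical sliding-window limiter: at most `maxRequests` per `windowMs`.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  /**
   * Returns 0 and records the request if it may be sent at `now` (ms).
   * Otherwise returns how many milliseconds the caller should wait.
   */
  reserve(now: number): number {
    // Forget requests that have left the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length < this.maxRequests) {
      this.timestamps.push(now);
      return 0;
    }
    // The oldest request in the window decides when a slot frees up.
    return this.timestamps[0] + this.windowMs - now;
  }
}

// A limiter matching the "50 req/5s" entry in the table above.
const deezerLimiter = new SlidingWindowLimiter(50, 5000);
```

A caller would sleep for the returned delay and then call `reserve` again before issuing the request.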

HTML Scraping Providers (4)

| Provider | Region | Max Image Size | GTIN Support | Notes |
| --- | --- | --- | --- | --- |
| Bandcamp | Global | 3000px | No | JSON-LD extraction |
| Beatport | Global | Varies | Yes | Electronic music focus |
| Mora | Japan | Varies | Yes | Japanese market |
| Ototoy | Japan | Varies | Yes | Japanese market |

Not Implemented

  • KKBOX: Mentioned in documentation but not implemented

Architecture Highlights

Harmony employs a 4-stage pipeline for metadata processing:

  1. LOOKUP: CombinedReleaseLookup queries multiple providers in parallel
  2. HARMONIZE: Each provider converts its native format to HarmonyRelease schema
  3. MERGE: Combines releases from multiple providers using configurable preferences
  4. SEED: Converts harmonized data to MusicBrainz import format

This pipeline ensures:

  • Parallel provider queries for performance
  • Standardized internal data representation
  • Intelligent conflict resolution
  • MusicBrainz-compatible output
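The four stages can be sketched in a few lines. This is an illustration only: the `HarmonyRelease` shape here is a heavily reduced stand-in for the real schema, `Provider`, `mergeReleases`, `toSeed` and `runPipeline` are hypothetical names, and the seed field names are loosely modeled on MusicBrainz release editor parameters rather than taken from Harmony's code.

```typescript
// Reduced, hypothetical stand-in for the real HarmonyRelease schema.
interface HarmonyRelease {
  title: string;
  gtin?: string;
  labels?: string[];
}

interface Provider {
  name: string;
  // LOOKUP + HARMONIZE: fetch the native record and return it harmonized.
  lookup(gtin: string): Promise<HarmonyRelease>;
}

// MERGE: per property, take the value of the first preferred provider that has one.
function mergeReleases(byProvider: Map<string, HarmonyRelease>, preferences: string[]): HarmonyRelease {
  const ordered = preferences
    .map((name) => byProvider.get(name))
    .filter((release): release is HarmonyRelease => release !== undefined);
  const [first] = ordered;
  if (first === undefined) throw new Error("No preferred provider returned a release");
  return {
    title: first.title,
    gtin: ordered.find((r) => r.gtin !== undefined)?.gtin,
    labels: ordered.find((r) => r.labels !== undefined)?.labels,
  };
}

// SEED: flatten the merged release into form fields.
function toSeed(release: HarmonyRelease): Record<string, string> {
  const seed: Record<string, string> = { name: release.title };
  if (release.gtin) seed["barcode"] = release.gtin;
  (release.labels ?? []).forEach((label, i) => {
    seed[`labels.${i}.name`] = label;
  });
  return seed;
}

async function runPipeline(gtin: string, providers: Provider[], preferences: string[]) {
  // All providers are queried in parallel.
  const results = await Promise.all(
    providers.map(async (p) => [p.name, await p.lookup(gtin)] as const),
  );
  return toSeed(mergeReleases(new Map(results), preferences));
}
```

This sketch uses `Promise.all` for brevity, so a single failing provider rejects the whole lookup; a production pipeline would need to tolerate partial failures.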

Data Storage Strategy

Harmony uses a cache-first approach with no application database:

  • snap_storage: SQLite-backed HTTP response cache (snaps.db + snaps/ directory)
  • 24-hour default cache policy: Reduces API calls and enables permalink functionality
  • Permalink system: ts parameter replays cached lookups for reproducible results
  • In-memory processing: All data transformations happen in memory; nothing is persisted beyond the HTTP response cache

This design prioritizes:

  • Reproducibility (permalinks)
  • API rate limit compliance
  • Simplicity (no database migrations)
  • Statelessness (no user data storage)
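One way to picture the permalink mechanism is a store that keeps every snapshot of a response together with its timestamp. The sketch below assumes that a `ts` value selects the newest snapshot taken at or before that time and that the 24-hour policy is a simple maximum age. `SnapshotCache`, `replay` and `fresh` are hypothetical names and are not the snap_storage API.

```typescript
interface Snapshot {
  timestamp: number; // Unix time in seconds
  body: string;
}

// Hypothetical in-memory model of a timestamped response cache.
class SnapshotCache {
  private snaps = new Map<string, Snapshot[]>();

  store(url: string, snap: Snapshot): void {
    const list = this.snaps.get(url) ?? [];
    list.push(snap);
    list.sort((a, b) => a.timestamp - b.timestamp);
    this.snaps.set(url, list);
  }

  /** Permalink replay: the newest snapshot taken at or before `ts`. */
  replay(url: string, ts: number): Snapshot | undefined {
    let found: Snapshot | undefined;
    for (const snap of this.snaps.get(url) ?? []) {
      if (snap.timestamp <= ts) found = snap;
    }
    return found;
  }

  /** Regular lookup: the newest snapshot, but only if it is young enough. */
  fresh(url: string, now: number, maxAgeSeconds = 24 * 60 * 60): Snapshot | undefined {
    const latest = this.replay(url, now);
    return latest !== undefined && now - latest.timestamp <= maxAgeSeconds ? latest : undefined;
  }
}
```

Because old snapshots are never overwritten in this model, the same `ts` always yields the same response, which is what makes a permalink reproducible.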

Deployment Model

Harmony is designed for self-hosted deployment without containerization:

Production Deployment

deno run -A server/main.ts

Environment variables:

  • PORT: Server port (8000 in the example configuration)
  • DENO_DEPLOYMENT_ID: Version identifier (auto-set from git tags)
  • HARMONY_SPOTIFY_CLIENT_ID / HARMONY_SPOTIFY_CLIENT_SECRET
  • HARMONY_TIDAL_CLIENT_ID / HARMONY_TIDAL_CLIENT_SECRET
  • HARMONY_MB_API_URL: MusicBrainz API endpoint
  • HARMONY_MB_TARGET_URL: MusicBrainz target instance
  • HARMONY_DATA_DIR: Data directory for cache storage

CI/CD Pipeline

GitHub Actions workflow (deno.yml):

  1. Test stage: Format check, lint, type check, unit tests
  2. Deploy stage: SSH to server, rsync code, systemd service restart
  3. Trigger: Runs only for tagged releases (v*) pushed by authorized users

No Docker

The project intentionally avoids containerization:

  • Deno provides consistent runtime across environments
  • Fresh framework handles asset bundling
  • Simple systemd service management
  • Direct SSH deployment

CLI Usage

The command-line interface supports GTIN and URL lookups:

# GTIN lookup
deno task cli --gtin 0602537347377

# URL lookup
deno task cli --url https://open.spotify.com/album/xyz

# Multiple URLs
deno task cli --url https://open.spotify.com/album/xyz --url https://www.deezer.com/album/123

# Region-specific lookup
deno task cli --gtin 0602537347377 --region JP,US

Output includes:

  • Harmonized release metadata
  • Provider comparison
  • Compatibility warnings
  • MusicBrainz seeding data

Web Interface

The Fresh-based web UI provides:

Main Route: /release

Query parameters:

  • gtin: Global Trade Item Number (barcode)
  • url: Provider URL(s) - supports multiple
  • region: Market regions (default: GB,US,DE,JP)
  • category: Provider category filter (all/default/preferred)
  • [provider_name]: Provider-specific ID or GTIN lookup
  • [provider_name]!: Template mode for provider
  • ts: Timestamp for permalink replay
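A lookup link can be assembled from these parameters with the standard `URL` API. This is a sketch under assumptions: `buildReleaseLink` and `ReleaseQuery` are hypothetical names, `https://harmony.example` is a placeholder host, and regions are assumed to be sent as one comma-separated value, as the default shown above suggests.

```typescript
interface ReleaseQuery {
  gtin?: string;
  urls?: string[];
  regions?: string[];
  ts?: number; // permalink timestamp
}

// Hypothetical helper that builds a /release link for a Harmony instance.
function buildReleaseLink(base: string, query: ReleaseQuery): string {
  const link = new URL("/release", base);
  if (query.gtin) link.searchParams.set("gtin", query.gtin);
  // `url` may be given several times, once per provider URL.
  for (const providerUrl of query.urls ?? []) link.searchParams.append("url", providerUrl);
  if (query.regions && query.regions.length > 0) {
    link.searchParams.set("region", query.regions.join(","));
  }
  if (query.ts !== undefined) link.searchParams.set("ts", String(query.ts));
  return link.toString();
}

const exampleLink = buildReleaseLink("https://harmony.example", {
  gtin: "0602537347377",
  regions: ["GB", "US"],
});
```

Using `searchParams` rather than string concatenation keeps provider URLs correctly percent-encoded inside the query string.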

Additional Routes

| Route | Purpose |
| --- | --- |
| / | Landing page with documentation |
| /release/actions | ISRC/cover submission for existing MusicBrainz releases |
| /about | Provider documentation and feature comparison |
| /settings | User preferences (stored in cookies) |

UI Components

  • 22 static components: Server-rendered UI elements
  • 5 interactive islands: Client-side interactive features (Fresh islands architecture)

Feature Quality System

Providers are rated on feature quality using a standardized scale:

| Rating | Meaning |
| --- | --- |
| MISSING | Feature not available |
| BAD | Feature present but unreliable/incomplete |
| PRESENT | Feature available with acceptable quality |
| GOOD | Feature available with high quality |
| Numeric | Specific measurements (e.g., image dimensions) |

This system enables:

  • Informed provider selection
  • Merge algorithm prioritization
  • User transparency about data quality
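To show how such ratings could drive provider selection, the sketch below ranks the four qualitative levels and picks the best provider for one feature. The ordering, the names `rank` and `bestProviderFor`, and the provider keys in the example are assumptions for illustration; numeric ratings such as image dimensions are left out because they can be compared directly.

```typescript
// Hypothetical ordering, from worst to best.
const QUALITY_ORDER = ["MISSING", "BAD", "PRESENT", "GOOD"] as const;
type Quality = (typeof QUALITY_ORDER)[number];

function rank(quality: Quality): number {
  return QUALITY_ORDER.indexOf(quality);
}

/**
 * Returns the provider with the highest rating for one feature,
 * or undefined if every provider is rated MISSING.
 * On a tie the provider listed first wins.
 */
function bestProviderFor(ratings: Record<string, Quality>): string | undefined {
  let best: string | undefined;
  let bestRank = rank("MISSING");
  for (const [provider, quality] of Object.entries(ratings)) {
    if (rank(quality) > bestRank) {
      best = provider;
      bestRank = rank(quality);
    }
  }
  return best;
}

// Made-up ratings for a single feature.
const preferred = bestProviderFor({ providerA: "BAD", providerB: "GOOD", providerC: "PRESENT" });
```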

Development Workflow

Code Quality Standards

# Format code (tabs, single quotes, 120 char width)
deno fmt

# Lint code
deno lint

# Type check
deno check **/*.ts

# Run tests
deno test -A

# All-in-one
deno task ok

Testing Infrastructure

  • 38 test files: Comprehensive test coverage
  • Declarative provider specs: describeProvider helper for consistent provider testing
  • Snapshot testing: Verify output stability
  • Offline mode: 43 cached responses in testdata/ directory
  • Download flag: --download to fetch fresh test data

Logging System

5 specialized loggers using Deno std/log:

| Logger | Level | Purpose |
| --- | --- | --- |
| harmony.lookup | INFO | Release lookup operations |
| harmony.mbid | DEBUG | MusicBrainz ID resolution |
| harmony.provider | DEBUG/INFO | Provider interactions |
| harmony.server | INFO | Server lifecycle events |
| requests | INFO/WARN | HTTP request logging |

All loggers use ConsoleHandler with color formatting for readability.

Error Handling Philosophy

Harmony uses a graceful degradation approach:

Error Hierarchy

LookupError (base)
└── ProviderError
    ├── ResponseError (HTTP/API errors)
    ├── CompatibilityError (data conflicts)
    └── CacheMissError (cache lookup failures)
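The tree above maps naturally onto class inheritance. The following sketch mirrors the class names from the hierarchy; the constructor signatures, the `providerName` field and the `describeFailure` helper are assumptions and may differ from Harmony's actual definitions.

```typescript
class LookupError extends Error {
  constructor(message: string) {
    super(message);
    // Keep `instanceof` and `name` correct for subclasses on older compile targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}

class ProviderError extends LookupError {
  readonly providerName: string;
  constructor(providerName: string, message: string) {
    super(`${providerName}: ${message}`);
    this.providerName = providerName;
  }
}

class ResponseError extends ProviderError {}
class CompatibilityError extends ProviderError {}
class CacheMissError extends ProviderError {}

// Hypothetical helper: callers can branch on the most specific class first.
function describeFailure(error: unknown): string {
  if (error instanceof CacheMissError) return "no cached response for this permalink";
  if (error instanceof ResponseError) return "provider request failed";
  if (error instanceof CompatibilityError) return "provider data conflicts with other sources";
  if (error instanceof LookupError) return "lookup failed";
  return "unexpected error";
}
```

Because every provider error is also a `LookupError`, a single `catch` can handle the whole family while still reporting which provider failed.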

Resilience Strategy

  • Promise.allSettled: Continue processing even if some providers fail
  • Rate limit handling: Parse Retry-After headers, dynamic delay adjustment
  • Partial results: Return available data even with provider failures
  • User feedback: Display warnings for failed providers
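The first three points of this strategy can be illustrated with `Promise.allSettled` and a small `Retry-After` parser. This is a sketch, not Harmony's code: `lookupAll`, `LookupOutcome` and `retryDelayMs` are hypothetical names, and the one-second fallback delay is an arbitrary choice.

```typescript
interface LookupOutcome<T> {
  results: Map<string, T>;
  warnings: string[];
}

// Runs all lookups in parallel and keeps whatever succeeded.
async function lookupAll<T>(lookups: Record<string, () => Promise<T>>): Promise<LookupOutcome<T>> {
  const entries = Object.entries(lookups);
  const settled = await Promise.allSettled(entries.map(async ([, run]) => run()));
  const outcome: LookupOutcome<T> = { results: new Map(), warnings: [] };
  settled.forEach((result, i) => {
    const name = entries[i][0];
    if (result.status === "fulfilled") {
      outcome.results.set(name, result.value);
    } else {
      const reason = result.reason instanceof Error ? result.reason.message : String(result.reason);
      outcome.warnings.push(`${name}: ${reason}`);
    }
  });
  return outcome;
}

// Retry-After is either a number of seconds or an HTTP date.
function retryDelayMs(retryAfter: string | null, now: Date = new Date(), fallbackMs = 1000): number {
  if (retryAfter === null) return fallbackMs;
  const value = retryAfter.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const date = Date.parse(value);
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - now.getTime());
}
```

The warnings collected by `lookupAll` are what a UI would surface to the user, while the successful results continue through the rest of the pipeline.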

Project Maturity

Strengths

  • Single developer project: Consistent vision and architecture
  • Active maintenance: Recent Tidal v1 deprecation handling (2025-01-21)
  • Production-ready: Used by MusicBrainz community
  • Well-tested: 38 test files with offline test data
  • Type-safe: Full TypeScript coverage with 273-line HarmonyRelease schema

Limitations

  • No REST API: Web UI only, no programmatic JSON endpoints
  • No authentication: Public access only
  • No metrics/monitoring: No health endpoint, no Sentry integration
  • Scraping fragility: HTML-based providers break when sites change
  • Deno-only: Fresh framework ties project to Deno ecosystem

Relevance to Metadata Aggregation

Harmony is a strong reference implementation for multi-source music metadata aggregation:

Architectural Lessons

  1. Provider abstraction: Base classes with URLPattern matching, rate limiting, caching
  2. Harmonized schema: HarmonyRelease as universal internal format
  3. Intelligent merging: 3-phase merge with provider preferences
  4. Permalink system: Timestamp-based cache replay for reproducibility
  5. Quality ratings: Per-feature, per-provider quality assessment

Adoption Recommendations

  • HarmonyRelease schema: Adopt as internal data model
  • Merge algorithm: Study 3-phase merge with compatibility checking
  • Provider base classes: Reuse abstraction patterns
  • MBID resolution: Batch URL lookup (100 per request) is efficient
  • Testing framework: Declarative provider specs with offline mode
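The batch lookup recommendation boils down to splitting the list of provider URLs into groups of at most 100 and issuing one request per group. The sketch below assumes the MusicBrainz `url` endpoint with repeated `resource` parameters; `chunk` and `buildUrlLookupRequests` are hypothetical helper names, and the `inc` value is an illustrative choice.

```typescript
// Splits a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Builds one request URL per batch of provider URLs.
function buildUrlLookupRequests(apiBase: string, urls: string[], batchSize = 100): string[] {
  return chunk(urls, batchSize).map((batch) => {
    const params = new URLSearchParams({ fmt: "json", inc: "release-rels" });
    for (const resource of batch) params.append("resource", resource);
    return `${apiBase}/url?${params.toString()}`;
  });
}
```

With 250 provider URLs this yields three requests instead of 250, which matters when the API is rate limited.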

Configuration Management

Environment Variables

# OAuth2 Credentials
HARMONY_SPOTIFY_CLIENT_ID=your_client_id
HARMONY_SPOTIFY_CLIENT_SECRET=your_client_secret
HARMONY_TIDAL_CLIENT_ID=your_client_id
HARMONY_TIDAL_CLIENT_SECRET=your_client_secret

# MusicBrainz Integration
HARMONY_MB_API_URL=https://musicbrainz.org/ws/2
HARMONY_MB_TARGET_URL=https://musicbrainz.org

# Storage
HARMONY_DATA_DIR=/path/to/data

# Server
PORT=8000
FORWARD_PROTO=https

Configuration Helpers

Located in utils/config.ts:

  • getFromEnv(key, defaultValue): String environment variables
  • getBooleanFromEnv(key, defaultValue): Boolean parsing
  • getUrlFromEnv(key, defaultValue): URL validation

Template

.env.example provides a complete configuration template for new deployments.

Community and Licensing

  • License: MIT (permissive, commercial-friendly)
  • Copyright: 2022-2024 David Kellner
  • Community: MusicBrainz editor community
  • Contribution: Single maintainer, open to contributions
  • Documentation: Comprehensive inline comments and type definitions

Summary

Harmony is a production-ready, TypeScript-based music metadata aggregator that demonstrates best practices in:

  • Multi-source data integration
  • Intelligent conflict resolution
  • MusicBrainz ecosystem integration
  • Type-safe architecture
  • Graceful error handling

Its 4-stage pipeline (LOOKUP → HARMONIZE → MERGE → SEED) and provider abstraction system make it the most relevant reference project for building a comprehensive metadata aggregation system.