metadata-agregator/docs/research/harmony/analysis/OVERVIEW.md
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


# Harmony - Project Overview
## Project Identity
| Property | Value |
|----------|-------|
| **Name** | Harmony |
| **Repository** | https://github.com/kellnerd/harmony |
| **License** | MIT (2022-2024 David Kellner) |
| **Language** | TypeScript |
| **Runtime** | Deno |
| **Primary Framework** | Fresh 1.6.8 |
| **UI Library** | Preact 10.19.6 |
| **Purpose** | Music metadata aggregator and MusicBrainz importer |
## Core Purpose
Harmony is a specialized tool that addresses two problems in music metadata management:
1. **Multi-source metadata aggregation**: Fetches release information from 9 different music platforms and merges it into a single harmonized release
2. **MusicBrainz import facilitation**: Converts aggregated metadata into MusicBrainz-compatible format for seeding new releases or improving existing entries
The project targets MusicBrainz editors and music metadata enthusiasts who need to cross-reference multiple sources when adding or verifying release information.
## Technical Stack
### Runtime and Framework
- **Deno**: Modern TypeScript/JavaScript runtime with built-in tooling
- **Fresh 1.6.8**: Deno-native web framework with server-side rendering and islands architecture
- **Preact 10.19.6**: Lightweight React alternative for interactive UI components
### Key Dependencies
| Dependency | Purpose |
|------------|---------|
| `@kellnerd/musicbrainz` | MusicBrainz API client and data structures |
| `snap_storage` | HTTP response caching with SQLite backend |
| `@std/*` | Deno standard library modules (log, testing, http, etc.) |
| `preact` | UI rendering and component system |
| `preact-render-to-string` | Server-side rendering |
## Entry Points
The project provides three distinct entry points for different use cases:
### 1. Web Server (Production)
```bash
# File: server/main.ts
deno task server
```
Starts the Fresh web application for interactive metadata lookup and comparison.
### 2. Development Server
```bash
# File: server/dev.ts
deno task dev
```
Runs the web server with auto-reload on file changes.
### 3. Command-Line Interface
```bash
# File: cli.ts
deno task cli
```
Provides terminal-based GTIN/URL lookup for testing and automation.
## Available Tasks
The `deno.json` configuration defines the following tasks:
| Task | Command | Purpose |
|------|---------|---------|
| `check` | `deno fmt --check && deno lint && deno check **/*.ts` | Verify code formatting, linting, and type checking |
| `ok` | `deno fmt && deno lint && deno check **/*.ts && deno test -A` | Format, lint, check, and test in one command |
| `cli` | `deno run -A cli.ts` | Run command-line interface |
| `dev` | `deno run -A --watch=static/,routes/ server/dev.ts` | Start development server with auto-reload |
| `build` | `deno run -A server/dev.ts build` | Build static assets |
| `server` | `DENO_DEPLOYMENT_ID=$(git describe --tags --always) deno run -A server/main.ts` | Start production server |
## Provider Ecosystem
Harmony integrates with 9 music metadata providers, categorized by access method:
### API-Based Providers (5)
| Provider | Authentication | Rate Limit | Max Image Size | GTIN Support |
|----------|---------------|------------|----------------|--------------|
| **Spotify** | OAuth2 | Not specified | 2000px | Yes (UPC) |
| **Deezer** | Public API | 50 req/5s | 1400px | Yes |
| **iTunes** | Public API | Not specified | Varies | Yes |
| **Tidal** | OAuth2 | Not specified | 1280px | Yes |
| **MusicBrainz** | Public API | 5 req/5s | N/A | Yes (barcode) |
### HTML Scraping Providers (4)
| Provider | Region | Max Image Size | GTIN Support | Notes |
|----------|--------|----------------|--------------|-------|
| **Bandcamp** | Global | 3000px | No | JSON-LD extraction |
| **Beatport** | Global | Varies | Yes | Electronic music focus |
| **Mora** | Japan | Varies | Yes | Japanese market |
| **Ototoy** | Japan | Varies | Yes | Japanese market |
### Not Implemented
- **KKBOX**: Mentioned in documentation but not implemented
## Architecture Highlights
Harmony employs a **4-stage pipeline** for metadata processing:
1. **LOOKUP**: `CombinedReleaseLookup` queries multiple providers in parallel
2. **HARMONIZE**: Each provider converts its native format to `HarmonyRelease` schema
3. **MERGE**: Combines releases from multiple providers using configurable preferences
4. **SEED**: Converts harmonized data to MusicBrainz import format
This pipeline ensures:
- Parallel provider queries for performance
- Standardized internal data representation
- Intelligent conflict resolution
- MusicBrainz-compatible output
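
The four stages above can be sketched roughly as follows. This is a minimal illustration and not Harmony's actual code: `Release`, `Provider`, `mergeReleases`, `toSeed`, and `runPipeline` are hypothetical names, the `Release` type is a tiny stand-in for `HarmonyRelease`, and the merge rule (the first provider in preference order that has a value wins) is an assumption.

```typescript
// Hypothetical, heavily simplified stand-in for the HarmonyRelease schema.
interface Release {
  title?: string;
  gtin?: string;
  artists?: string[];
}

// Hypothetical provider: LOOKUP and HARMONIZE collapsed into one call.
interface Provider {
  name: string;
  lookup(gtin: string): Promise<Release>;
}

// MERGE (assumed rule): per field, the first preferred provider with a value wins.
function mergeReleases(byProvider: Map<string, Release>, preference: string[]): Release {
  const merged: Release = {};
  for (const name of preference) {
    const release = byProvider.get(name);
    if (!release) continue;
    merged.title ??= release.title;
    merged.gtin ??= release.gtin;
    merged.artists ??= release.artists;
  }
  return merged;
}

// SEED: flatten the merged release into key/value pairs (keys are illustrative).
function toSeed(release: Release): Record<string, string> {
  const seed: Record<string, string> = {};
  if (release.title) seed['name'] = release.title;
  if (release.gtin) seed['barcode'] = release.gtin;
  (release.artists ?? []).forEach((artist, index) => {
    seed[`artist_credit.names.${index}.name`] = artist;
  });
  return seed;
}

// LOOKUP + HARMONIZE run in parallel; providers that fail are skipped.
async function runPipeline(
  gtin: string,
  providers: Provider[],
  preference: string[],
): Promise<Record<string, string>> {
  const settled = await Promise.allSettled(
    providers.map(async (p) => ({ name: p.name, release: await p.lookup(gtin) })),
  );
  const byProvider = new Map<string, Release>();
  for (const outcome of settled) {
    if (outcome.status === 'fulfilled') byProvider.set(outcome.value.name, outcome.value.release);
  }
  return toSeed(mergeReleases(byProvider, preference));
}
```

In this sketch, a field missing from the most preferred provider is filled from the next one, so a provider without GTIN support can still contribute titles and artists.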
## Data Storage Strategy
Harmony uses a **cache-first** approach with no separate application database:
- **snap_storage**: SQLite-backed HTTP response cache (`snaps.db` + `snaps/` directory)
- **24-hour default cache policy**: Reduces API calls and enables permalink functionality
- **Permalink system**: `ts` parameter replays cached lookups for reproducible results
- **In-memory processing**: All data transformations happen in memory; nothing is persisted apart from the cached HTTP responses
This design prioritizes:
- Reproducibility (permalinks)
- API rate limit compliance
- Simplicity (no database migrations)
- Statelessness (no user data storage)
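
To make the interplay of the 24-hour cache policy and the `ts` permalink parameter concrete, here is a minimal in-memory sketch. It does not reproduce the `snap_storage` API: `SnapshotCache`, `get`, and `replay` are hypothetical names, and timestamps are assumed to be Unix seconds.

```typescript
interface Snapshot {
  timestamp: number; // Unix seconds at which the response was fetched
  body: string;
}

// Hypothetical in-memory cache; the real cache is SQLite-backed.
class SnapshotCache {
  private snaps = new Map<string, Snapshot[]>(); // per URL, oldest first
  private fetcher: (url: string) => Promise<string>;
  private maxAgeSeconds: number;

  constructor(fetcher: (url: string) => Promise<string>, maxAgeSeconds = 24 * 60 * 60) {
    this.fetcher = fetcher;
    this.maxAgeSeconds = maxAgeSeconds;
  }

  // Normal lookup: reuse a snapshot within the max age, otherwise fetch and store.
  async get(url: string, now: number): Promise<Snapshot> {
    const history = this.snaps.get(url) ?? [];
    const latest = history[history.length - 1];
    if (latest && now - latest.timestamp <= this.maxAgeSeconds) return latest;
    const fresh: Snapshot = { timestamp: now, body: await this.fetcher(url) };
    this.snaps.set(url, [...history, fresh]);
    return fresh;
  }

  // Permalink replay: newest snapshot taken at or before `ts`, never hits the network.
  replay(url: string, ts: number): Snapshot {
    const match = (this.snaps.get(url) ?? []).filter((s) => s.timestamp <= ts).pop();
    if (!match) throw new Error(`No snapshot of ${url} at or before ${ts}`);
    return match;
  }
}
```

Because `replay` only reads stored snapshots, the same `ts` value keeps returning the same response even after newer data has been fetched, which is what makes a permalink reproducible.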
## Deployment Model
Harmony is designed for **self-hosted deployment** without containerization:
### Production Deployment
```bash
deno run -A server/main.ts
```
Environment variables:
- `PORT`: Server port (the configuration example in this document uses `8000`)
- `DENO_DEPLOYMENT_ID`: Version identifier (auto-set from git tags)
- `HARMONY_SPOTIFY_CLIENT_ID` / `HARMONY_SPOTIFY_CLIENT_SECRET`
- `HARMONY_TIDAL_CLIENT_ID` / `HARMONY_TIDAL_CLIENT_SECRET`
- `HARMONY_MB_API_URL`: MusicBrainz API endpoint
- `HARMONY_MB_TARGET_URL`: MusicBrainz target instance
- `HARMONY_DATA_DIR`: Data directory for cache storage
### CI/CD Pipeline
GitHub Actions workflow (`deno.yml`):
1. **Test stage**: Format check, lint, type check, unit tests
2. **Deploy stage**: SSH to server, rsync code, systemd service restart
3. **Trigger**: The workflow runs only for tagged releases (`v*`) and only for authorized users
### No Docker
The project intentionally avoids containerization:
- Deno provides consistent runtime across environments
- Fresh framework handles asset bundling
- Simple systemd service management
- Direct SSH deployment
## CLI Usage
The command-line interface supports GTIN and URL lookups:
```bash
# GTIN lookup
deno task cli --gtin 0602537347377
# URL lookup
deno task cli --url https://open.spotify.com/album/xyz
# Multiple URLs
deno task cli --url https://open.spotify.com/album/xyz --url https://www.deezer.com/album/123
# Region-specific lookup
deno task cli --gtin 0602537347377 --region JP,US
```
Output includes:
- Harmonized release metadata
- Provider comparison
- Compatibility warnings
- MusicBrainz seeding data
## Web Interface
The Fresh-based web UI provides:
### Main Route: `/release`
Query parameters:
- `gtin`: Global Trade Item Number (barcode)
- `url`: Provider URL(s) - supports multiple
- `region`: Market regions (default: GB,US,DE,JP)
- `category`: Provider category filter (all/default/preferred)
- `[provider_name]`: Provider-specific ID or GTIN lookup
- `[provider_name]!`: Template mode for provider
- `ts`: Timestamp for permalink replay
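
As an illustration of how these parameters combine, the following sketch builds a `/release` URL. The helper name `buildReleaseUrl` and the host `https://harmony.example` are hypothetical, and the comma-separated `region` value is an assumption based on the default shown above.

```typescript
interface ReleaseQuery {
  gtin?: string;
  urls?: string[]; // provider URLs, sent as repeated `url` parameters
  regions?: string[]; // assumed to be joined with commas, e.g. GB,US
  ts?: number; // permalink timestamp
}

function buildReleaseUrl(base: string, query: ReleaseQuery): string {
  const url = new URL('/release', base);
  if (query.gtin) url.searchParams.set('gtin', query.gtin);
  for (const providerUrl of query.urls ?? []) url.searchParams.append('url', providerUrl);
  if (query.regions?.length) url.searchParams.set('region', query.regions.join(','));
  if (query.ts !== undefined) url.searchParams.set('ts', String(query.ts));
  return url.href;
}

// Example with a hypothetical host; the comma in `region` is percent-encoded as %2C.
const example = buildReleaseUrl('https://harmony.example', {
  gtin: '0602537347377',
  regions: ['GB', 'US'],
});
```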
### Additional Routes
| Route | Purpose |
|-------|---------|
| `/` | Landing page with documentation |
| `/release/actions` | ISRC/cover submission for existing MusicBrainz releases |
| `/about` | Provider documentation and feature comparison |
| `/settings` | User preferences (stored in cookies) |
### UI Components
- **22 static components**: Server-rendered UI elements
- **5 interactive islands**: Client-side interactive features (Fresh islands architecture)
## Feature Quality System
Providers are rated on feature quality using a standardized scale:
| Rating | Meaning |
|--------|---------|
| `MISSING` | Feature not available |
| `BAD` | Feature present but unreliable/incomplete |
| `PRESENT` | Feature available with acceptable quality |
| `GOOD` | Feature available with high quality |
| Numeric | Specific measurements (e.g., image dimensions) |
This system enables:
- Informed provider selection
- Merge algorithm prioritization
- User transparency about data quality
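
One way such ratings could drive provider selection is sketched below. The type and function names are hypothetical, and the rule for mixing kinds (any numeric measurement outranks the named levels) is an assumption made only for this example, not Harmony's documented behavior.

```typescript
type NamedQuality = 'MISSING' | 'BAD' | 'PRESENT' | 'GOOD';
type FeatureRating = NamedQuality | number; // number = measurement, e.g. max image size in px

const NAMED_LEVELS: NamedQuality[] = ['MISSING', 'BAD', 'PRESENT', 'GOOD'];

// Assumption: measurements rank above all named levels; larger measurements rank higher.
function score(rating: FeatureRating): number {
  return typeof rating === 'number' ? NAMED_LEVELS.length + rating : NAMED_LEVELS.indexOf(rating);
}

// Returns the provider with the best rating for one feature, ignoring MISSING.
function bestProvider(ratings: Record<string, FeatureRating>): string | undefined {
  let best: string | undefined;
  let bestScore = -Infinity;
  for (const [name, rating] of Object.entries(ratings)) {
    if (rating === 'MISSING') continue;
    const current = score(rating);
    if (current > bestScore) {
      best = name;
      bestScore = current;
    }
  }
  return best;
}
```

With the image sizes from the provider tables above, `bestProvider({ Bandcamp: 3000, Deezer: 1400, Tidal: 1280 })` selects Bandcamp.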
## Development Workflow
### Code Quality Standards
```bash
# Format code (tabs, single quotes, 120 char width)
deno fmt
# Lint code
deno lint
# Type check
deno check **/*.ts
# Run tests
deno test -A
# All-in-one
deno task ok
```
### Testing Infrastructure
- **38 test files**: Comprehensive test coverage
- **Declarative provider specs**: `describeProvider` helper for consistent provider testing
- **Snapshot testing**: Verify output stability
- **Offline mode**: 43 cached responses in `testdata/` directory
- **Download flag**: `--download` to fetch fresh test data
### Logging System
5 specialized loggers using Deno std/log:
| Logger | Level | Purpose |
|--------|-------|---------|
| `harmony.lookup` | INFO | Release lookup operations |
| `harmony.mbid` | DEBUG | MusicBrainz ID resolution |
| `harmony.provider` | DEBUG/INFO | Provider interactions |
| `harmony.server` | INFO | Server lifecycle events |
| `requests` | INFO/WARN | HTTP request logging |
All loggers use `ConsoleHandler` with color formatting for readability.
## Error Handling Philosophy
Harmony uses a **graceful degradation** approach:
### Error Hierarchy
```
LookupError (base)
└── ProviderError
    ├── ResponseError (HTTP/API errors)
    ├── CompatibilityError (data conflicts)
    └── CacheMissError (cache lookup failures)
```
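
Read as a class hierarchy, the tree could look like the sketch below. It assumes that the three specific errors derive from `ProviderError`, as the tree suggests. The constructor signatures, the `providerName` field, and the `describeFailure` helper are hypothetical and not taken from Harmony's source.

```typescript
class LookupError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name; // report the concrete subclass name
  }
}

class ProviderError extends LookupError {
  readonly providerName: string; // hypothetical field for this sketch
  constructor(providerName: string, message: string) {
    super(`${providerName}: ${message}`);
    this.providerName = providerName;
  }
}

class ResponseError extends ProviderError {} // HTTP/API errors
class CompatibilityError extends ProviderError {} // data conflicts
class CacheMissError extends ProviderError {} // cache lookup failures

// Check the most specific class first, then fall back to the base classes.
function describeFailure(error: unknown): string {
  if (error instanceof CacheMissError) return 'no cached snapshot';
  if (error instanceof ProviderError) return `provider ${error.providerName} failed`;
  if (error instanceof LookupError) return 'lookup failed';
  return 'unexpected error';
}
```

A shared base class lets callers catch every lookup-related failure with a single `instanceof LookupError` check while still distinguishing specific cases where needed.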
### Resilience Strategy
- `Promise.allSettled`: Continue processing even if some providers fail
- Rate limit handling: Parse `Retry-After` headers, dynamic delay adjustment
- Partial results: Return available data even with provider failures
- User feedback: Display warnings for failed providers
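
The following sketch illustrates two of these points under stated assumptions: parsing a `Retry-After` header (which may carry either delay seconds or an HTTP date) and collecting partial results with warnings. `parseRetryAfter` and `lookupAll` are hypothetical helper names, not Harmony functions.

```typescript
// Returns the delay in milliseconds, or undefined if the header is absent or unparseable.
function parseRetryAfter(header: string | null, now: number = Date.now()): number | undefined {
  if (!header) return undefined;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // delay-seconds form
  const date = Date.parse(trimmed); // HTTP-date form
  return Number.isNaN(date) ? undefined : Math.max(0, date - now);
}

interface PartialResults<T> {
  results: Map<string, T>;
  warnings: string[];
}

// Runs all provider tasks in parallel and keeps whatever succeeded.
async function lookupAll<T>(tasks: Record<string, () => Promise<T>>): Promise<PartialResults<T>> {
  const settled = await Promise.allSettled(
    Object.entries(tasks).map(async ([name, run]) => {
      try {
        return { name, value: await run() };
      } catch (cause) {
        // Prefix the provider name so the warning is meaningful to the user.
        throw new Error(`${name}: ${cause instanceof Error ? cause.message : String(cause)}`);
      }
    }),
  );
  const results = new Map<string, T>();
  const warnings: string[] = [];
  for (const outcome of settled) {
    if (outcome.status === 'fulfilled') results.set(outcome.value.name, outcome.value.value);
    else warnings.push((outcome.reason as Error).message);
  }
  return { results, warnings };
}
```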
## Project Maturity
### Strengths
- **Single developer project**: Consistent vision and architecture
- **Active maintenance**: Recent Tidal v1 deprecation handling (2025-01-21)
- **Production-ready**: Used by MusicBrainz community
- **Well-tested**: 38 test files with offline test data
- **Type-safe**: Full TypeScript coverage with 273-line `HarmonyRelease` schema
### Limitations
- **No REST API**: Web UI only, no programmatic JSON endpoints
- **No authentication**: Public access only
- **No metrics/monitoring**: No health endpoint, no Sentry integration
- **Scraping fragility**: HTML-based providers break when sites change
- **Deno-only**: Fresh framework ties project to Deno ecosystem
## Relevance to Metadata Aggregation
Harmony is the most relevant reference project for multi-source music metadata aggregation:
### Architectural Lessons
1. **Provider abstraction**: Base classes with URLPattern matching, rate limiting, caching
2. **Harmonized schema**: `HarmonyRelease` as universal internal format
3. **Intelligent merging**: 3-phase merge with provider preferences
4. **Permalink system**: Timestamp-based cache replay for reproducibility
5. **Quality ratings**: Per-feature, per-provider quality assessment
### Adoption Recommendations
- **HarmonyRelease schema**: Adopt as internal data model
- **Merge algorithm**: Study 3-phase merge with compatibility checking
- **Provider base classes**: Reuse abstraction patterns
- **MBID resolution**: Batch URL lookup (100 per request) is efficient
- **Testing framework**: Declarative provider specs with offline mode
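
The batching idea behind the MBID resolution point can be sketched as follows. The helper names are hypothetical, and the request shape (repeated `resource` query parameters on a `url` endpoint with `fmt=json`) is an assumption about the MusicBrainz web service that should be checked against its documentation before use.

```typescript
// Splits a list into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError('size must be at least 1');
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) chunks.push(items.slice(i, i + size));
  return chunks;
}

// Builds one request URL per batch of up to `batchSize` provider URLs.
function buildUrlLookupRequests(apiBase: string, resources: string[], batchSize = 100): string[] {
  return chunk(resources, batchSize).map((batch) => {
    const query = new URLSearchParams({ fmt: 'json' });
    for (const resource of batch) query.append('resource', resource);
    return `${apiBase}/url?${query.toString()}`;
  });
}
```

With 250 provider URLs this produces three requests instead of 250, which matters under a strict rate limit.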
## Configuration Management
### Environment Variables
```bash
# OAuth2 Credentials
HARMONY_SPOTIFY_CLIENT_ID=your_client_id
HARMONY_SPOTIFY_CLIENT_SECRET=your_client_secret
HARMONY_TIDAL_CLIENT_ID=your_client_id
HARMONY_TIDAL_CLIENT_SECRET=your_client_secret
# MusicBrainz Integration
HARMONY_MB_API_URL=https://musicbrainz.org/ws/2
HARMONY_MB_TARGET_URL=https://musicbrainz.org
# Storage
HARMONY_DATA_DIR=/path/to/data
# Server
PORT=8000
FORWARD_PROTO=https
```
### Configuration Helpers
Located in `utils/config.ts`:
- `getFromEnv(key, defaultValue)`: String environment variables
- `getBooleanFromEnv(key, defaultValue)`: Boolean parsing
- `getUrlFromEnv(key, defaultValue)`: URL validation
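
For orientation, helpers of this kind might look roughly like the sketch below. This is not the content of `utils/config.ts`: the functions are deliberately renamed (`readString`, `readBoolean`, `readUrl`), they take an explicit `env` map instead of reading the process environment so they can run anywhere, and the accepted boolean spellings are an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Returns the variable's value, or the default when it is unset or empty.
function readString(env: Env, key: string, defaultValue?: string): string | undefined {
  const value = env[key];
  return value !== undefined && value !== '' ? value : defaultValue;
}

// Assumed truthy spellings: 1, true, yes, on (case-insensitive).
function readBoolean(env: Env, key: string, defaultValue = false): boolean {
  const value = readString(env, key)?.toLowerCase();
  if (value === undefined) return defaultValue;
  return ['1', 'true', 'yes', 'on'].includes(value);
}

// Throws a TypeError if the resulting value is not a valid absolute URL.
function readUrl(env: Env, key: string, defaultValue: string): URL {
  return new URL(readString(env, key, defaultValue) ?? defaultValue);
}

// Usage with values from the configuration example above.
const env: Env = { HARMONY_MB_API_URL: 'https://musicbrainz.org/ws/2', PORT: '8000' };
const apiUrl = readUrl(env, 'HARMONY_MB_API_URL', 'https://musicbrainz.org/ws/2');
const port = Number(readString(env, 'PORT', '8000'));
```

Validating URLs at startup surfaces a misconfigured endpoint immediately instead of on the first lookup.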
### Template
`.env.example` provides a complete configuration template for new deployments.
## Community and Licensing
- **License**: MIT (permissive, commercial-friendly)
- **Copyright**: 2022-2024 David Kellner
- **Community**: MusicBrainz editor community
- **Contribution**: Single maintainer, open to contributions
- **Documentation**: Comprehensive inline comments and type definitions
## Summary
Harmony is a production-ready, TypeScript-based music metadata aggregator that demonstrates best practices in:
- Multi-source data integration
- Intelligent conflict resolution
- MusicBrainz ecosystem integration
- Type-safe architecture
- Graceful error handling
Its 4-stage pipeline (LOOKUP → HARMONIZE → MERGE → SEED) and provider abstraction system make it the most relevant reference project for building a comprehensive metadata aggregation system.