# Harmony - Project Overview

## Project Identity

| Property | Value |
|----------|-------|
| **Name** | Harmony |
| **Repository** | https://github.com/kellnerd/harmony |
| **License** | MIT (2022-2024 David Kellner) |
| **Language** | TypeScript |
| **Runtime** | Deno |
| **Primary Framework** | Fresh 1.6.8 |
| **UI Library** | Preact 10.19.6 |
| **Purpose** | Music metadata aggregator and MusicBrainz importer |

## Core Purpose

Harmony is a specialized tool designed to solve two problems in music metadata management:

1. **Multi-source metadata aggregation**: Fetches release information from 9 different music platforms and merges it into a unified, harmonized dataset
2. **MusicBrainz import facilitation**: Converts aggregated metadata into MusicBrainz-compatible format for seeding new releases or improving existing entries

The project targets MusicBrainz editors and music metadata enthusiasts who need to cross-reference multiple sources when adding or verifying release information.

## Technical Stack

### Runtime and Framework

- **Deno**: Modern TypeScript/JavaScript runtime with built-in tooling
- **Fresh 1.6.8**: Deno-native web framework with server-side rendering and islands architecture
- **Preact 10.19.6**: Lightweight React alternative for interactive UI components

### Key Dependencies

| Dependency | Purpose |
|------------|---------|
| `@kellnerd/musicbrainz` | MusicBrainz API client and data structures |
| `snap-storage` | HTTP response caching with SQLite backend |
| `@std/*` | Deno standard library modules (log, testing, http, etc.) |
| `preact` | UI rendering and component system |
| `preact-render-to-string` | Server-side rendering |

## Entry Points

The project provides three distinct entry points for different use cases:

### 1. Web Server (Production)

```bash
# File: server/main.ts
deno task server
```

Starts the Fresh web application for interactive metadata lookup and comparison.

### 2. Development Server

```bash
# File: server/dev.ts
deno task dev
```

Runs the web server with auto-reload on file changes.

### 3. Command-Line Interface

```bash
# File: cli.ts
deno task cli
```

Provides terminal-based GTIN/URL lookup for testing and automation.

## Available Tasks

The `deno.json` configuration defines the following tasks:

| Task | Command | Purpose |
|------|---------|---------|
| `check` | `deno fmt --check && deno lint && deno check **/*.ts` | Verify code formatting, linting, and type checking |
| `ok` | `deno fmt && deno lint && deno check **/*.ts && deno test -A` | Format, lint, check, and test in one command |
| `cli` | `deno run -A cli.ts` | Run command-line interface |
| `dev` | `deno run -A --watch=static/,routes/ server/dev.ts` | Start development server with auto-reload |
| `build` | `deno run -A server/dev.ts build` | Build static assets |
| `server` | `DENO_DEPLOYMENT_ID=$(git describe --tags --always) deno run -A server/main.ts` | Start production server |

## Provider Ecosystem

Harmony integrates with 9 music metadata providers, categorized by access method:

### API-Based Providers (5)

| Provider | Authentication | Rate Limit | Max Image Size | GTIN Support |
|----------|---------------|------------|----------------|--------------|
| **Spotify** | OAuth2 | Not specified | 2000px | Yes (UPC) |
| **Deezer** | Public API | 50 req/5s | 1400px | Yes |
| **iTunes** | Public API | Not specified | Varies | Yes |
| **Tidal** | OAuth2 | Not specified | 1280px | Yes |
| **MusicBrainz** | Public API | 5 req/5s | N/A | Yes (barcode) |

### HTML Scraping Providers (4)

| Provider | Region | Max Image Size | GTIN Support | Notes |
|----------|--------|----------------|--------------|-------|
| **Bandcamp** | Global | 3000px | No | JSON-LD extraction |
| **Beatport** | Global | Varies | Yes | Electronic music focus |
| **Mora** | Japan | Varies | Yes | Japanese market |
| **Ototoy** | Japan | Varies | Yes | Japanese market |

### Not Implemented

- **KKBOX**: Mentioned in documentation but not implemented

## Architecture Highlights

Harmony employs a **4-stage pipeline** for metadata processing:

1. **LOOKUP**: `CombinedReleaseLookup` queries multiple providers in parallel
2. **HARMONIZE**: Each provider converts its native format to `HarmonyRelease` schema
3. **MERGE**: Combines releases from multiple providers using configurable preferences
4. **SEED**: Converts harmonized data to MusicBrainz import format

This pipeline ensures:

- Parallel provider queries for performance
- Standardized internal data representation
- Intelligent conflict resolution
- MusicBrainz-compatible output

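The four stages can be illustrated with a deliberately small sketch. Everything in it is hypothetical: `SketchRelease`, `SketchProvider`, `mergeReleases`, `toSeed` and `runPipeline` are invented names, the release type is a tiny stand-in for the real `HarmonyRelease` schema, and the merge rule (the most preferred provider wins per property) is only one plausible reading of "configurable preferences", not Harmony's actual algorithm.

```typescript
// Tiny stand-in for the much larger HarmonyRelease schema (hypothetical fields).
interface SketchRelease {
	title: string;
	gtin?: string;
	source: string;
}

// LOOKUP + HARMONIZE live behind this interface: each provider fetches its
// native data and returns it in the shared schema.
interface SketchProvider {
	name: string;
	getRelease(gtin: string): Promise<SketchRelease>;
}

// MERGE: order releases by provider preference, then take each property from
// the first release that has a value for it.
function mergeReleases(releases: SketchRelease[], preference: string[]): SketchRelease {
	const rank = (name: string): number => {
		const index = preference.indexOf(name);
		return index === -1 ? preference.length : index;
	};
	const sorted = [...releases].sort((a, b) => rank(a.source) - rank(b.source));
	const [first] = sorted;
	if (first === undefined) throw new Error('No releases to merge');
	const merged: SketchRelease = {
		title: first.title,
		source: sorted.map((release) => release.source).join('+'),
	};
	const gtin = sorted.find((release) => release.gtin !== undefined)?.gtin;
	if (gtin !== undefined) merged.gtin = gtin;
	return merged;
}

// SEED: flatten the merged release into form-style key/value pairs
// (the key names here are illustrative).
function toSeed(release: SketchRelease): Record<string, string> {
	return release.gtin !== undefined
		? { name: release.title, barcode: release.gtin }
		: { name: release.title };
}

// Run all four stages; providers that fail are skipped instead of aborting the lookup.
async function runPipeline(
	providers: SketchProvider[],
	gtin: string,
	preference: string[],
): Promise<Record<string, string>> {
	const settled = await Promise.allSettled(providers.map((provider) => provider.getRelease(gtin)));
	const releases = settled
		.filter((result): result is PromiseFulfilledResult<SketchRelease> => result.status === 'fulfilled')
		.map((result) => result.value);
	return toSeed(mergeReleases(releases, preference));
}
```

Keeping the merge step a pure function of already-harmonized releases is what makes it easy to test without any network access.
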
## Data Storage Strategy

Harmony uses a **cache-first, no-database** approach:

- **snap_storage**: SQLite-backed HTTP response cache (`snaps.db` + `snaps/` directory)
- **24-hour default cache policy**: Reduces API calls and enables permalink functionality
- **Permalink system**: `ts` parameter replays cached lookups for reproducible results
- **In-memory processing**: All data transformations happen in memory, no persistent storage

This design prioritizes:

- Reproducibility (permalinks)
- API rate limit compliance
- Simplicity (no database migrations)
- Statelessness (no user data storage)

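How a timestamped cache can serve both the 24-hour policy and permalink replay is easiest to see in a toy model. The sketch below is an in-memory illustration only: `SnapshotCache`, `replay` and `fresh` are hypothetical names and do not reflect the snap_storage API, and the exact matching rule for `ts` is an assumption.

```typescript
interface Snapshot {
	timestamp: number; // Unix time in seconds
	body: string;
}

const DAY_IN_SECONDS = 24 * 60 * 60;

// Toy in-memory model of a snapshot cache keyed by request URL.
class SnapshotCache {
	private readonly snaps = new Map<string, Snapshot[]>();

	// Record a response body for a URL, keeping snapshots sorted by time.
	store(url: string, body: string, timestamp: number): void {
		const list = this.snaps.get(url) ?? [];
		list.push({ timestamp, body });
		list.sort((a, b) => a.timestamp - b.timestamp);
		this.snaps.set(url, list);
	}

	/** Permalink replay: newest snapshot taken at or before `ts`. */
	replay(url: string, ts: number): Snapshot | undefined {
		const candidates = (this.snaps.get(url) ?? []).filter((snap) => snap.timestamp <= ts);
		return candidates[candidates.length - 1];
	}

	/** Fresh lookup: reuse a snapshot only if it is no older than `maxAge` seconds. */
	fresh(url: string, now: number, maxAge: number = DAY_IN_SECONDS): Snapshot | undefined {
		const latest = this.replay(url, now);
		return latest !== undefined && now - latest.timestamp <= maxAge ? latest : undefined;
	}
}
```

In this model a permalink never triggers a network request: the same `ts` always resolves to the same snapshot, which is what makes the result reproducible.
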
## Deployment Model

Harmony is designed for **self-hosted deployment** without containerization:

### Production Deployment

```bash
deno run -A server/main.ts
```

Environment variables:

- `PORT`: Server port (default varies)
- `DENO_DEPLOYMENT_ID`: Version identifier (auto-set from git tags)
- `HARMONY_SPOTIFY_CLIENT_ID` / `HARMONY_SPOTIFY_CLIENT_SECRET`
- `HARMONY_TIDAL_CLIENT_ID` / `HARMONY_TIDAL_CLIENT_SECRET`
- `HARMONY_MB_API_URL`: MusicBrainz API endpoint
- `HARMONY_MB_TARGET_URL`: MusicBrainz target instance
- `HARMONY_DATA_DIR`: Data directory for cache storage

### CI/CD Pipeline

GitHub Actions workflow (`deno.yml`):

1. **Test stage**: Format check, lint, type check, unit tests
2. **Deploy stage**: SSH to server, rsync code, systemd service restart
3. **Trigger**: Tagged releases (`v*`) and authorized users only

### No Docker

The project intentionally avoids containerization:

- Deno provides consistent runtime across environments
- Fresh framework handles asset bundling
- Simple systemd service management
- Direct SSH deployment

## CLI Usage

The command-line interface supports GTIN and URL lookups:

```bash
# GTIN lookup
deno task cli --gtin 0602537347377

# URL lookup
deno task cli --url https://open.spotify.com/album/xyz

# Multiple URLs
deno task cli --url https://open.spotify.com/album/xyz --url https://www.deezer.com/album/123

# Region-specific lookup
deno task cli --gtin 0602537347377 --region JP,US
```

Output includes:

- Harmonized release metadata
- Provider comparison
- Compatibility warnings
- MusicBrainz seeding data

## Web Interface

The Fresh-based web UI provides:

### Main Route: `/release`

Query parameters:

- `gtin`: Global Trade Item Number (barcode)
- `url`: Provider URL(s) - supports multiple
- `region`: Market regions (default: GB,US,DE,JP)
- `category`: Provider category filter (all/default/preferred)
- `[provider_name]`: Provider-specific ID or GTIN lookup
- `[provider_name]!`: Template mode for provider
- `ts`: Timestamp for permalink replay

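As an illustration of how these parameters combine, the following sketch assembles a `/release` link with the standard `URL` and `URLSearchParams` APIs. The helper name `buildReleaseUrl` and the host `harmony.example` are hypothetical, and joining regions with commas is inferred from the default value listed above.

```typescript
interface ReleaseLinkOptions {
	gtin?: string;
	urls?: string[];
	regions?: string[];
	ts?: number;
}

// Build a /release link from the documented query parameters.
function buildReleaseUrl(base: string, options: ReleaseLinkOptions): string {
	const url = new URL('/release', base);
	if (options.gtin) url.searchParams.set('gtin', options.gtin);
	// `url` may be repeated, once per provider URL.
	for (const providerUrl of options.urls ?? []) url.searchParams.append('url', providerUrl);
	if (options.regions?.length) url.searchParams.set('region', options.regions.join(','));
	// Adding `ts` turns the link into a permalink that replays cached responses.
	if (options.ts !== undefined) url.searchParams.set('ts', String(options.ts));
	return url.toString();
}

const exampleLink = buildReleaseUrl('https://harmony.example', {
	urls: ['https://open.spotify.com/album/xyz', 'https://www.deezer.com/album/123'],
	regions: ['GB', 'US'],
	ts: 1700000000,
});
console.log(exampleLink);
```
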
### Additional Routes

| Route | Purpose |
|-------|---------|
| `/` | Landing page with documentation |
| `/release/actions` | ISRC/cover submission for existing MusicBrainz releases |
| `/about` | Provider documentation and feature comparison |
| `/settings` | User preferences (stored in cookies) |

### UI Components

- **22 static components**: Server-rendered UI elements
- **5 interactive islands**: Client-side interactive features (Fresh islands architecture)

## Feature Quality System

Providers are rated on feature quality using a standardized scale:

| Rating | Meaning |
|--------|---------|
| `MISSING` | Feature not available |
| `BAD` | Feature present but unreliable/incomplete |
| `PRESENT` | Feature available with acceptable quality |
| `GOOD` | Feature available with high quality |
| Numeric | Specific measurements (e.g., image dimensions) |

This system enables:

- Informed provider selection
- Merge algorithm prioritization
- User transparency about data quality

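A rating scale that mixes named levels with numeric measurements can be modelled as plain numbers, so that any two ratings are directly comparable. In the sketch below the numeric values behind `MISSING`, `BAD`, `PRESENT` and `GOOD` are assumptions chosen for illustration (Harmony's real values may differ), `bestProvider` is a hypothetical helper, and the cover sizes are the maximum image sizes listed in the provider tables.

```typescript
// Assumed numeric values: named levels sort below any positive measurement.
const FeatureQuality = { MISSING: -2, BAD: -1, PRESENT: 0, GOOD: 1 } as const;

// A rating is either one of the named levels or a measurement such as pixels.
type Quality = number;

const coverSizeRatings: Record<string, Quality> = {
	Spotify: 2000,
	Deezer: 1400,
	Tidal: 1280,
	Bandcamp: 3000,
	MusicBrainz: FeatureQuality.MISSING,
};

// Pick the provider with the highest rating, ignoring providers that lack the feature.
function bestProvider(ratings: Record<string, Quality>): string | undefined {
	let best: string | undefined;
	let bestQuality: number = FeatureQuality.MISSING;
	for (const [provider, quality] of Object.entries(ratings)) {
		if (quality > bestQuality) {
			best = provider;
			bestQuality = quality;
		}
	}
	return best;
}

console.log(bestProvider(coverSizeRatings)); // prints "Bandcamp"
```
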
## Development Workflow

### Code Quality Standards

```bash
# Format code (tabs, single quotes, 120 char width)
deno fmt

# Lint code
deno lint

# Type check
deno check **/*.ts

# Run tests
deno test -A

# All-in-one
deno task ok
```

### Testing Infrastructure

- **38 test files**: Comprehensive test coverage
- **Declarative provider specs**: `describeProvider` helper for consistent provider testing
- **Snapshot testing**: Verify output stability
- **Offline mode**: 43 cached responses in `testdata/` directory
- **Download flag**: `--download` to fetch fresh test data

### Logging System

5 specialized loggers using Deno std/log:

| Logger | Level | Purpose |
|--------|-------|---------|
| `harmony.lookup` | INFO | Release lookup operations |
| `harmony.mbid` | DEBUG | MusicBrainz ID resolution |
| `harmony.provider` | DEBUG/INFO | Provider interactions |
| `harmony.server` | INFO | Server lifecycle events |
| `requests` | INFO/WARN | HTTP request logging |

All loggers use `ConsoleHandler` with color formatting for readability.

## Error Handling Philosophy

Harmony uses a **graceful degradation** approach:

### Error Hierarchy

```
LookupError (base)
└── ProviderError
    ├── ResponseError (HTTP/API errors)
    ├── CompatibilityError (data conflicts)
    └── CacheMissError (cache lookup failures)
```

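Expressed as classes, such a hierarchy lets callers react to a specific failure or to a whole family of failures with `instanceof`. The sketch below assumes that the three specific errors extend `ProviderError`; the constructor signatures, the `providerName` field and the `describeFailure` helper are hypothetical and not taken from Harmony's source.

```typescript
class LookupError extends Error {
	constructor(message: string) {
		super(message);
		// Report the concrete subclass name in stack traces and logs.
		this.name = new.target.name;
	}
}

class ProviderError extends LookupError {
	readonly providerName: string;

	constructor(message: string, providerName: string) {
		super(message);
		this.providerName = providerName;
	}
}

class ResponseError extends ProviderError {}
class CompatibilityError extends ProviderError {}
class CacheMissError extends ProviderError {}

// Check the most specific class first, then fall back to the broader ones.
function describeFailure(error: unknown): string {
	if (error instanceof CacheMissError) return `${error.providerName}: not in cache`;
	if (error instanceof ProviderError) return `${error.providerName}: ${error.message}`;
	if (error instanceof LookupError) return error.message;
	return 'unexpected error';
}
```
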
### Resilience Strategy

- `Promise.allSettled`: Continue processing even if some providers fail
- Rate limit handling: Parse `Retry-After` headers, dynamic delay adjustment
- Partial results: Return available data even with provider failures
- User feedback: Display warnings for failed providers

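Two of these ideas are compact enough to sketch. The first function collects partial results plus one warning per failed provider using `Promise.allSettled`; the second turns a `Retry-After` header value (either delay seconds or an HTTP date) into a delay in milliseconds. Both `lookupAll` and `parseRetryAfter` are hypothetical helpers written for illustration, not Harmony's own code.

```typescript
interface LookupOutcome<T> {
	results: T[];
	warnings: string[];
}

// Run all provider lookups in parallel and keep whatever succeeded.
async function lookupAll<T>(lookups: Record<string, () => Promise<T>>): Promise<LookupOutcome<T>> {
	const entries = Object.entries(lookups);
	const settled = await Promise.allSettled(entries.map(([, lookup]) => lookup()));
	const outcome: LookupOutcome<T> = { results: [], warnings: [] };
	settled.forEach((result, index) => {
		const name = entries[index]?.[0] ?? 'unknown';
		if (result.status === 'fulfilled') {
			outcome.results.push(result.value);
		} else {
			const reason = result.reason instanceof Error ? result.reason.message : String(result.reason);
			outcome.warnings.push(`${name}: ${reason}`);
		}
	});
	return outcome;
}

// Convert a Retry-After header value into a delay in milliseconds.
// Returns undefined when the header is absent or unparseable.
function parseRetryAfter(value: string | null, now: number = Date.now()): number | undefined {
	const trimmed = value?.trim();
	if (!trimmed) return undefined;
	// Delay form: a non-negative integer number of seconds.
	if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
	// Date form: wait until the given point in time.
	const date = Date.parse(trimmed);
	return Number.isNaN(date) ? undefined : Math.max(0, date - now);
}
```

A caller would typically sleep for the returned delay before retrying the request, and surface `warnings` to the user alongside the partial results.
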
## Project Maturity

### Strengths

- **Single developer project**: Consistent vision and architecture
- **Active maintenance**: Recent Tidal v1 deprecation handling (2025-01-21)
- **Production-ready**: Used by MusicBrainz community
- **Well-tested**: 38 test files with offline test data
- **Type-safe**: Full TypeScript coverage with 273-line `HarmonyRelease` schema

### Limitations

- **No REST API**: Web UI only, no programmatic JSON endpoints
- **No authentication**: Public access only
- **No metrics/monitoring**: No health endpoint, no Sentry integration
- **Scraping fragility**: HTML-based providers break when sites change
- **Deno-only**: Fresh framework ties project to Deno ecosystem

## Relevance to Metadata Aggregation

Harmony represents the **gold standard** for multi-source music metadata aggregation:

### Architectural Lessons

1. **Provider abstraction**: Base classes with URLPattern matching, rate limiting, caching
2. **Harmonized schema**: `HarmonyRelease` as universal internal format
3. **Intelligent merging**: 3-phase merge with provider preferences
4. **Permalink system**: Timestamp-based cache replay for reproducibility
5. **Quality ratings**: Per-feature, per-provider quality assessment

### Adoption Recommendations

- **HarmonyRelease schema**: Adopt as internal data model
- **Merge algorithm**: Study 3-phase merge with compatibility checking
- **Provider base classes**: Reuse abstraction patterns
- **MBID resolution**: Batch URL lookup (100 per request) is efficient
- **Testing framework**: Declarative provider specs with offline mode

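The batching idea behind MBID resolution can be sketched as a chunking step followed by one request URL per chunk. The helpers `chunk` and `buildUrlLookupRequests` are hypothetical, and the request shape (a repeated `resource` parameter on the MusicBrainz `url` endpoint with `fmt=json`) is an assumption that should be checked against the MusicBrainz API documentation before use.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
	if (size < 1) throw new RangeError('size must be at least 1');
	const batches: T[][] = [];
	for (let start = 0; start < items.length; start += size) {
		batches.push(items.slice(start, start + size));
	}
	return batches;
}

// Build one lookup request URL per batch of external URLs (assumed request shape).
function buildUrlLookupRequests(apiBase: string, resources: string[], batchSize = 100): string[] {
	return chunk(resources, batchSize).map((batch) => {
		const params = new URLSearchParams();
		for (const resource of batch) params.append('resource', resource);
		params.set('fmt', 'json');
		return `${apiBase}/url?${params.toString()}`;
	});
}
```

With 250 external URLs and the default batch size, this yields three requests instead of 250, which matters under a strict rate limit.
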
## Configuration Management

### Environment Variables

```bash
# OAuth2 Credentials
HARMONY_SPOTIFY_CLIENT_ID=your_client_id
HARMONY_SPOTIFY_CLIENT_SECRET=your_client_secret
HARMONY_TIDAL_CLIENT_ID=your_client_id
HARMONY_TIDAL_CLIENT_SECRET=your_client_secret

# MusicBrainz Integration
HARMONY_MB_API_URL=https://musicbrainz.org/ws/2
HARMONY_MB_TARGET_URL=https://musicbrainz.org

# Storage
HARMONY_DATA_DIR=/path/to/data

# Server
PORT=8000
FORWARD_PROTO=https
```

### Configuration Helpers

Located in `utils/config.ts`:

- `getFromEnv(key, defaultValue)`: String environment variables
- `getBooleanFromEnv(key, defaultValue)`: Boolean parsing
- `getUrlFromEnv(key, defaultValue)`: URL validation

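Helpers of this kind might look roughly like the sketch below. These are hypothetical re-implementations, not the contents of `utils/config.ts`: the extra `env` parameter is added here so the functions run without a specific runtime (under Deno one could pass `Deno.env.toObject()`), and the accepted boolean spellings and the treatment of empty values as unset are assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Read a string variable, treating unset and empty values alike.
function getFromEnv(env: Env, key: string, defaultValue?: string): string | undefined {
	const value = env[key];
	return value !== undefined && value !== '' ? value : defaultValue;
}

// Parse common truthy spellings; anything else counts as false.
function getBooleanFromEnv(env: Env, key: string, defaultValue = false): boolean {
	const value = getFromEnv(env, key)?.toLowerCase();
	if (value === undefined) return defaultValue;
	return ['1', 'true', 'yes', 'on'].includes(value);
}

// Validate a URL variable; the URL constructor throws a TypeError on malformed
// input, which surfaces configuration mistakes at startup.
function getUrlFromEnv(env: Env, key: string, defaultValue: string): URL {
	return new URL(getFromEnv(env, key, defaultValue) ?? defaultValue);
}

const exampleEnv: Env = { HARMONY_MB_API_URL: 'https://test.musicbrainz.org/ws/2' };
const apiUrl = getUrlFromEnv(exampleEnv, 'HARMONY_MB_API_URL', 'https://musicbrainz.org/ws/2');
console.log(apiUrl.hostname);
```
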
### Template

`.env.example` provides a complete configuration template for new deployments.

## Community and Licensing

- **License**: MIT (permissive, commercial-friendly)
- **Copyright**: 2022-2024 David Kellner
- **Community**: MusicBrainz editor community
- **Contribution**: Single maintainer, open to contributions
- **Documentation**: Comprehensive inline comments and type definitions

## Summary

Harmony is a production-ready, TypeScript-based music metadata aggregator that demonstrates best practices in:

- Multi-source data integration
- Intelligent conflict resolution
- MusicBrainz ecosystem integration
- Type-safe architecture
- Graceful error handling

Its 4-stage pipeline (LOOKUP → HARMONIZE → MERGE → SEED) and provider abstraction system make it the most relevant reference project for building a comprehensive metadata aggregation system.