feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
Alexander
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
@@ -0,0 +1,751 @@
# Harmony - API and Interface Analysis
## API Architecture
Harmony is a **web UI-first application** built on the Fresh framework. It does not provide a traditional REST API or JSON endpoints. All interactions occur through server-side rendered HTML pages with embedded data.
### Framework: Fresh 1.6.8
Fresh is a Deno-native web framework with:
- **Server-side rendering (SSR)**: All pages rendered on server
- **Islands architecture**: Selective client-side interactivity
- **File-based routing**: Routes defined by file structure
- **Zero config**: No build step required for development
## Route Structure
### Main Application Routes
| Route | File | Method | Purpose |
|-------|------|--------|---------|
| `/` | `routes/index.tsx` | GET | Landing page with documentation |
| `/release` | `routes/release.tsx` | GET | Main lookup and comparison interface |
| `/release/actions` | `routes/release/actions.tsx` | GET | ISRC/cover submission for existing MB releases |
| `/about` | `routes/about.tsx` | GET | Provider documentation and feature matrix |
| `/settings` | `routes/settings.tsx` | GET/POST | User preferences (stored in cookies) |
### Static Assets
| Route | Purpose |
|-------|---------|
| `/static/*` | CSS, JavaScript, images |
| `/favicon.ico` | Site favicon |
## Primary Route: `/release`
The main interface for metadata lookup and harmonization.
### Query Parameters
#### Core Lookup Parameters
| Parameter | Type | Required | Description | Example |
|-----------|------|----------|-------------|---------|
| `gtin` | string | No* | Global Trade Item Number (barcode) | `0602537347377` |
| `url` | string[] | No* | Provider URL(s), supports multiple | `https://open.spotify.com/album/xyz` |
*At least one of `gtin`, `url`, or a provider-specific parameter (see below) must be provided.
#### Provider-Specific Parameters
| Parameter | Type | Description | Example |
|-----------|------|-------------|---------|
| `[provider_name]` | string | Provider-specific ID or GTIN lookup | `spotify=3DiDSNVBRYVzccLn2yqhMJ` |
| `[provider_name]!` | empty | Template mode for provider | `musicbrainz!` |
**Supported Provider Names**:
- `spotify`
- `deezer`
- `itunes`
- `tidal`
- `bandcamp`
- `beatport`
- `musicbrainz`
- `mora`
- `ototoy`
#### Filtering Parameters
| Parameter | Type | Default | Description | Values |
|-----------|------|---------|-------------|--------|
| `region` | string[] | `GB,US,DE,JP` | Market regions for lookup | ISO 3166-1 alpha-2 codes |
| `category` | string | `default` | Provider category filter | `all`, `default`, `preferred` |
#### Permalink Parameters
| Parameter | Type | Description | Example |
|-----------|------|-------------|---------|
| `ts` | number | Unix timestamp for cache replay | `1704067200` |
### Request Examples
#### GTIN Lookup (Default Regions)
```
GET /release?gtin=0602537347377
```
Queries all GTIN-supporting providers in default regions (GB, US, DE, JP).
#### GTIN Lookup (Specific Regions)
```
GET /release?gtin=0602537347377&region=JP,US
```
Queries only Japan and US regions.
#### URL Lookup (Single Provider)
```
GET /release?url=https://open.spotify.com/album/3DiDSNVBRYVzccLn2yqhMJ
```
Queries only Spotify using the provided URL.
#### URL Lookup (Multiple Providers)
```
GET /release?url=https://open.spotify.com/album/3DiDSNVBRYVzccLn2yqhMJ&url=https://www.deezer.com/album/123456
```
Queries both Spotify and Deezer.
#### Provider-Specific ID Lookup
```
GET /release?spotify=3DiDSNVBRYVzccLn2yqhMJ&deezer=123456
```
Queries Spotify and Deezer using their native IDs.
#### Template Mode (MusicBrainz)
```
GET /release?gtin=0602537347377&musicbrainz!
```
Uses MusicBrainz as template provider (reference data for merge).
#### Category Filtering
```
GET /release?gtin=0602537347377&category=preferred
```
Queries only preferred providers (Spotify, Tidal, MusicBrainz).
#### Permalink (Cache Replay)
```
GET /release?gtin=0602537347377&ts=1704067200
```
Replays cached lookup from timestamp 1704067200.
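The query parameters above can be read with the standard `URLSearchParams` API. The following is a minimal sketch, not Harmony's actual route handler: `parseReleaseQuery`, `ReleaseQuery` and `PROVIDER_NAMES` are hypothetical names, and treating a trailing `!` as the template-mode flag is an assumption based on the `musicbrainz!` example.

```typescript
// Hypothetical list, taken from the "Supported Provider Names" section.
const PROVIDER_NAMES = new Set([
  "spotify", "deezer", "itunes", "tidal", "bandcamp",
  "beatport", "musicbrainz", "mora", "ototoy",
]);

interface ReleaseQuery {
  gtin?: string;
  urls: string[];
  regions: string[];
  category: string;
  providerIds: Record<string, string>;
  templates: string[]; // providers requested in template mode
  ts?: number;         // permalink timestamp, if present
}

function parseReleaseQuery(query: string): ReleaseQuery {
  const params = new URLSearchParams(query);
  const result: ReleaseQuery = {
    gtin: params.get("gtin") ?? undefined,
    urls: params.getAll("url"),
    // Documented defaults: GB,US,DE,JP and the "default" category.
    regions: (params.get("region") ?? "GB,US,DE,JP").split(",").filter(Boolean),
    category: params.get("category") ?? "default",
    providerIds: {},
    templates: [],
  };
  const ts = params.get("ts");
  if (ts !== null && /^\d+$/.test(ts)) result.ts = Number(ts);

  params.forEach((value, key) => {
    if (key.endsWith("!") && PROVIDER_NAMES.has(key.slice(0, -1))) {
      result.templates.push(key.slice(0, -1)); // e.g. "musicbrainz!"
    } else if (PROVIDER_NAMES.has(key) && value !== "") {
      result.providerIds[key] = value;         // e.g. "spotify=<id>"
    }
  });
  return result;
}
```

For example, `parseReleaseQuery("gtin=0602537347377&musicbrainz!")` yields the default regions and lists `musicbrainz` under `templates`.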
### Response Format
The `/release` route returns an **HTML page** with embedded data, not JSON.
#### Response Sections
1. **Release Header**
- Title
- Artist credit
- Release date
- GTIN (if available)
2. **Provider Comparison Table**
- Side-by-side comparison of all providers
- Color-coded compatibility indicators
- Feature quality ratings
3. **Harmonized Metadata Display**
- Merged release information
- Track listing with ISRCs
- Label and catalog number information
- Cover art images
- Copyright and availability info
4. **MusicBrainz Seeder Form**
- Pre-filled form for MB import
- Edit note with provider URLs
- Annotation with extra data
- Copy-to-clipboard functionality
5. **Warnings and Messages**
- Compatibility conflicts
- Provider errors
- Missing data indicators
- Duplicate detection warnings
6. **Permalink**
- Timestamp-based URL for reproducibility
- Share button
#### Example Response Structure (HTML)
```html
<!DOCTYPE html>
<html>
<head>
<title>Album Title - Artist Name | Harmony</title>
<!-- Meta tags, CSS -->
</head>
<body>
<header>
<!-- Navigation -->
</header>
<main>
<!-- Release Header -->
<section class="release-header">
<h1>Album Title</h1>
<p class="artist-credit">Artist Name</p>
<p class="release-date">2014-11-24</p>
<p class="gtin">GTIN: 0602537347377</p>
</section>
<!-- Provider Comparison -->
<section class="provider-comparison">
<table>
<thead>
<tr>
<th>Property</th>
<th>Spotify</th>
<th>Deezer</th>
<th>iTunes</th>
<th>Merged</th>
</tr>
</thead>
<tbody>
<!-- Comparison rows -->
</tbody>
</table>
</section>
<!-- Harmonized Metadata -->
<section class="harmonized-release">
<!-- Track listing, labels, images, etc. -->
</section>
<!-- MusicBrainz Seeder -->
<section class="musicbrainz-seeder">
<form>
<!-- Pre-filled MB import form -->
</form>
</section>
<!-- Warnings -->
<section class="warnings">
<!-- Compatibility warnings, errors -->
</section>
<!-- Permalink -->
<section class="permalink">
<input type="text" readonly value="https://harmony.example.com/release?gtin=0602537347377&ts=1704067200">
<button>Copy</button>
</section>
</main>
<footer>
<!-- Footer content -->
</footer>
<!-- Island hydration scripts -->
<script type="module" src="/islands/LookupForm.js"></script>
<script type="module" src="/islands/SeederForm.js"></script>
</body>
</html>
```
### Error Handling
Errors are displayed inline in the HTML response:
#### Provider Errors
```html
<div class="provider-error">
<strong>Spotify:</strong> Rate limit exceeded. Retry after 60 seconds.
</div>
```
#### Lookup Errors
```html
<div class="lookup-error">
<strong>Error:</strong> No providers found for GTIN 0602537347377 in region CN.
</div>
```
#### Compatibility Warnings
```html
<div class="compatibility-warning">
<strong>Warning:</strong> Release date conflict:
<ul>
<li>Spotify: 2014-11-24</li>
<li>iTunes: 2014-11-25</li>
</ul>
Using Spotify value (higher preference).
</div>
```
## Secondary Routes
### `/` - Landing Page
**Purpose**: Introduction and quick start guide
**Content**:
- Project description
- Supported providers
- Usage examples
- Link to `/about` for detailed documentation
**No query parameters**
### `/release/actions` - ISRC/Cover Submission
**Purpose**: Submit ISRCs or cover art for existing MusicBrainz releases
**Query Parameters**:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `mbid` | string | Yes | MusicBrainz release ID |
| `action` | string | Yes | `isrc` or `cover` |
**Example**:
```
GET /release/actions?mbid=12345678-1234-1234-1234-123456789012&action=isrc
```
**Response**: Form for submitting ISRCs or cover art to MusicBrainz
### `/about` - Provider Documentation
**Purpose**: Detailed provider information and feature comparison
**Content**:
- Provider descriptions
- Feature quality matrix
- Rate limits and authentication requirements
- Supported regions
- Known limitations
**No query parameters**
**Feature Quality Matrix Example**:
| Provider | GTIN | Title | Artists | Date | Labels | Tracks | ISRC | Images | Copyright |
|----------|------|-------|---------|------|--------|--------|------|--------|-----------|
| Spotify | ✓ | ✓ | ✓ | ✓ | ~ | ✓ | ✓ | 2000px | ~ |
| Deezer | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 1400px | ✓ |
| iTunes | ✓ | ✓ | ✓ | ✓ | ~ | ✓ | ~ | Varies | ~ |
| Tidal | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 1280px | ✓ |
| Bandcamp | ✗ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | 3000px | ✓ |
Legend:
- ✓ = GOOD quality
- ~ = PRESENT quality
- ✗ = MISSING
### `/settings` - User Preferences
**Purpose**: Configure user preferences
**Method**: GET (display form), POST (save preferences)
**Preferences**:
| Setting | Type | Default | Description |
|---------|------|---------|-------------|
| `defaultRegions` | string[] | `['GB','US','DE','JP']` | Default regions for lookup |
| `defaultCategory` | string | `default` | Default provider category |
| `providerPreferences` | string[] | Custom order | Provider preference order for merge |
| `showCompatibilityWarnings` | boolean | `true` | Display compatibility warnings |
| `cacheStrategy` | string | `24h` | Cache duration |
**Storage**: Preferences stored in cookies (no server-side storage)
**Example Cookie**:
```
harmony_prefs={"defaultRegions":["JP","US"],"defaultCategory":"preferred","providerPreferences":["spotify","tidal","deezer"]}; Max-Age=31536000; Path=/
```
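Reading and writing such a cookie could look like the sketch below. It assumes the JSON value is percent-encoded (raw quotes and commas are not safe in cookie values); `serializePrefsCookie`, `parsePrefsCookie` and `DEFAULT_PREFS` are hypothetical names, not Harmony's code.

```typescript
interface HarmonyPrefs {
  defaultRegions: string[];
  defaultCategory: string;
  providerPreferences?: string[];
}

// Defaults as listed in the preferences table above.
const DEFAULT_PREFS: HarmonyPrefs = {
  defaultRegions: ["GB", "US", "DE", "JP"],
  defaultCategory: "default",
};

// Build a Set-Cookie header value with a one-year lifetime.
function serializePrefsCookie(prefs: HarmonyPrefs): string {
  const value = encodeURIComponent(JSON.stringify(prefs));
  return `harmony_prefs=${value}; Max-Age=31536000; Path=/`;
}

// Extract preferences from a request Cookie header, falling back to defaults.
function parsePrefsCookie(cookieHeader: string): HarmonyPrefs {
  for (const part of cookieHeader.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq).trim() !== "harmony_prefs") continue;
    try {
      const parsed = JSON.parse(decodeURIComponent(part.slice(eq + 1).trim()));
      return { ...DEFAULT_PREFS, ...parsed };
    } catch {
      return DEFAULT_PREFS; // malformed cookie: ignore it
    }
  }
  return DEFAULT_PREFS;
}
```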
## Islands (Client-Side Interactivity)
Fresh's islands architecture enables selective client-side interactivity.
### Island Components
#### 1. LookupForm Island
**File**: `islands/LookupForm.tsx`
**Purpose**: Dynamic lookup form with validation
**Features**:
- Real-time GTIN validation
- URL parsing and provider detection
- Region multi-select
- Category radio buttons
- Form submission with loading state
**Client-Side Logic**:
```typescript
// Conceptual
import { useState } from "preact/hooks";

function LookupForm() {
  const [gtin, setGtin] = useState('');
  const [urls, setUrls] = useState<string[]>([]);
  const [regions, setRegions] = useState(['GB', 'US', 'DE', 'JP']);

  const validateGtin = (value: string) => {
    // GTIN-13 format check only (13 digits); the check digit is not verified here
    return /^\d{13}$/.test(value);
  };

  const handleSubmit = async (e: Event) => {
    e.preventDefault();
    // Navigate to /release with query params
    const params = new URLSearchParams();
    if (gtin) params.set('gtin', gtin);
    urls.forEach(url => params.append('url', url));
    params.set('region', regions.join(','));
    window.location.href = `/release?${params}`;
  };

  return (
    <form onSubmit={handleSubmit}>
      {/* Form fields */}
    </form>
  );
}
```
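The regular expression in `validateGtin` only checks the length. A stricter validator would also verify the GS1 check digit, which is computed the same way for GTIN-8, GTIN-12, GTIN-13 and GTIN-14. The sketch below is illustrative; `isValidGtin` is a hypothetical helper and it is an assumption that Harmony performs this check.

```typescript
// Validate a GTIN by length and GS1 mod-10 check digit.
function isValidGtin(value: string): boolean {
  if (!/^(\d{8}|\d{12}|\d{13}|\d{14})$/.test(value)) return false;

  const digits = value.split("").map(Number);
  const checkDigit = digits.pop() as number;

  // Starting from the digit next to the check digit, weights alternate 3, 1, 3, ...
  let sum = 0;
  digits.reverse().forEach((digit, index) => {
    sum += digit * (index % 2 === 0 ? 3 : 1);
  });

  return (10 - (sum % 10)) % 10 === checkDigit;
}
```

For the GTIN used throughout this document, `0602537347377`, the weighted sum of the first twelve digits is 103, so the expected check digit is 7 and the value passes.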
#### 2. ProviderSelector Island
**File**: `islands/ProviderSelector.tsx`
**Purpose**: Provider category filtering
**Features**:
- Category selection (all/default/preferred)
- Individual provider checkboxes
- Real-time URL update
#### 3. RegionSelector Island
**File**: `islands/RegionSelector.tsx`
**Purpose**: Multi-region selection
**Features**:
- Checkbox list of supported regions
- Select all / deselect all
- Common region presets (US+GB, Japan, Europe)
#### 4. PermalinkGenerator Island
**File**: `islands/PermalinkGenerator.tsx`
**Purpose**: Generate timestamp-based permalink
**Features**:
- Current timestamp capture
- URL generation with `ts` parameter
- Copy to clipboard
- Share button
**Client-Side Logic**:
```typescript
function PermalinkGenerator({ currentUrl }: { currentUrl: string }) {
const [permalink, setPermalink] = useState('');
const generatePermalink = () => {
const url = new URL(currentUrl);
url.searchParams.set('ts', Math.floor(Date.now() / 1000).toString());
setPermalink(url.toString());
};
const copyToClipboard = () => {
navigator.clipboard.writeText(permalink);
};
return (
<div>
<button onClick={generatePermalink}>Generate Permalink</button>
{permalink && (
<>
<input type="text" readonly value={permalink} />
<button onClick={copyToClipboard}>Copy</button>
</>
)}
</div>
);
}
```
#### 5. SeederForm Island
**File**: `islands/SeederForm.tsx`
**Purpose**: MusicBrainz import form with copy functionality
**Features**:
- Pre-filled form fields
- Copy individual fields to clipboard
- Copy entire form as JSON
- Open MusicBrainz seeder in new tab
**Client-Side Logic**:
```typescript
function SeederForm({ release }: { release: MergedHarmonyRelease }) {
const copyField = (field: string, value: string) => {
navigator.clipboard.writeText(value);
};
const openSeeder = () => {
const mbUrl = `https://musicbrainz.org/release/add`;
const form = document.createElement('form');
form.method = 'POST';
form.action = mbUrl;
form.target = '_blank';
// Add form fields
Object.entries(release).forEach(([key, value]) => {
const input = document.createElement('input');
input.type = 'hidden';
input.name = key;
input.value = JSON.stringify(value);
form.appendChild(input);
});
document.body.appendChild(form);
form.submit();
document.body.removeChild(form);
};
return (
<div>
{/* Form fields with copy buttons */}
<button onClick={openSeeder}>Open in MusicBrainz</button>
</div>
);
}
```
## No REST API
Harmony **does not provide a REST API** or JSON endpoints. Key implications:
### No JSON Responses
All routes return HTML. There is no `Accept: application/json` support.
**Request**:
```
GET /release?gtin=0602537347377
Accept: application/json
```
**Response**:
```
HTTP/1.1 200 OK
Content-Type: text/html
<!DOCTYPE html>
<!-- HTML response, not JSON -->
```
### No Programmatic Access
Clients cannot fetch data programmatically without HTML parsing.
**Workaround** (not officially supported):
1. Fetch HTML response
2. Parse HTML with DOM parser
3. Extract data from structured elements
**Example** (conceptual):
```typescript
const response = await fetch('/release?gtin=0602537347377');
const html = await response.text();
const doc = new DOMParser().parseFromString(html, 'text/html');
const title = doc.querySelector('.release-header h1')?.textContent;
```
### No API Authentication
No API keys, no OAuth2 for API access (OAuth2 only used for provider authentication).
### No Rate Limiting on Server
The server does not enforce rate limits on incoming requests (providers have their own limits).
## Request/Response Flow
### Typical Request Flow
```
1. User submits lookup form
2. Browser sends GET /release?gtin=...&region=...
3. Fresh router matches route to routes/release.tsx
4. Route handler executes:
a. Parse query parameters
b. Call CombinedReleaseLookup
c. Parallel provider queries
d. Harmonize responses
e. Merge releases
f. Generate MusicBrainz seeding data
5. Server-side rendering:
a. Render components with data
b. Generate HTML
c. Inject island hydration scripts
6. HTTP response sent to browser
7. Browser renders HTML
8. Island hydration:
a. Load island JavaScript modules
b. Attach event listeners
c. Enable client-side interactivity
```
### Caching Strategy
#### Server-Side Caching
- **snap_storage**: Caches HTTP responses from providers
- **Cache key**: URL + query parameters
- **Cache duration**: 24 hours (configurable)
- **Cache storage**: SQLite database (`snaps.db`) + file directory (`snaps/`)
#### Client-Side Caching
- **Browser cache**: Standard HTTP caching headers
- **localStorage**: OAuth2 tokens, MBID mappings (dev mode)
- **sessionStorage**: MBID mappings (production mode)
- **Cookies**: User preferences
#### Permalink Caching
The `ts` parameter enables cache replay:
1. User performs lookup at timestamp T
2. Responses cached with timestamp T
3. Permalink generated: `/release?gtin=...&ts=T`
4. Future requests with `ts=T` replay cached responses
5. Ensures reproducible results even if provider data changes
**Cache Lookup Logic**:
```typescript
async function getCachedResponse(url: string, timestamp?: number): Promise<Response | null> {
if (timestamp) {
// Permalink mode: lookup by timestamp
return await cache.getByTimestamp(url, timestamp);
} else {
// Normal mode: lookup by recency
return await cache.getRecent(url, MAX_AGE);
}
}
```
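A minimal in-memory sketch of the two lookup modes follows. `SnapshotCache` is a hypothetical stand-in for `snap_storage`, and the rule that a permalink resolves to the latest snapshot taken at or before `ts` is an assumption, not confirmed behaviour.

```typescript
interface Snapshot<T> {
  timestamp: number; // Unix seconds
  body: T;
}

class SnapshotCache<T> {
  private snaps = new Map<string, Snapshot<T>[]>();

  // Store a snapshot, keeping each URL's list ordered by timestamp.
  put(url: string, timestamp: number, body: T): void {
    const list = this.snaps.get(url) ?? [];
    list.push({ timestamp, body });
    list.sort((a, b) => a.timestamp - b.timestamp);
    this.snaps.set(url, list);
  }

  // Permalink mode: latest snapshot taken at or before the given timestamp.
  getByTimestamp(url: string, timestamp: number): T | null {
    let found: Snapshot<T> | undefined;
    for (const snap of this.snaps.get(url) ?? []) {
      if (snap.timestamp <= timestamp) found = snap;
    }
    return found ? found.body : null;
  }

  // Normal mode: newest snapshot, unless it is older than maxAgeSeconds.
  getRecent(url: string, maxAgeSeconds: number, now: number): T | null {
    const list = this.snaps.get(url) ?? [];
    const latest = list[list.length - 1];
    if (!latest || now - latest.timestamp > maxAgeSeconds) return null;
    return latest.body;
  }
}
```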
## Error Responses
### HTTP Status Codes
| Status | Scenario |
|--------|----------|
| 200 | Success (even with partial provider failures) |
| 400 | Invalid query parameters |
| 404 | Route not found |
| 500 | Server error (unhandled exception) |
### Error Display
Provider and lookup errors are displayed inline in the HTML rather than signalled through HTTP status codes.
**Example**: All providers fail, but the response is still 200 OK with error messages in the HTML.
## Performance Considerations
### Parallel Provider Queries
All provider lookups execute in parallel via `Promise.allSettled`:
```typescript
const lookups = providers.map(p => p.lookup(input));
const results = await Promise.allSettled(lookups);
```
**Benefits**:
- Faster total response time
- Graceful degradation (partial results)
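Graceful degradation depends on separating fulfilled from rejected results after `Promise.allSettled`. The sketch below shows one way to do that; `partitionSettled`, `lookupAll` and `LookupOutcome` are hypothetical names, not Harmony's API.

```typescript
interface LookupOutcome<T> {
  releases: { provider: string; release: T }[];
  errors: { provider: string; reason: string }[];
}

// Split settled results into successes and per-provider error messages.
function partitionSettled<T>(
  providerNames: string[],
  results: PromiseSettledResult<T>[],
): LookupOutcome<T> {
  const outcome: LookupOutcome<T> = { releases: [], errors: [] };
  results.forEach((result, index) => {
    const provider = providerNames[index] ?? `provider#${index}`;
    if (result.status === "fulfilled") {
      outcome.releases.push({ provider, release: result.value });
    } else {
      const reason = result.reason instanceof Error
        ? result.reason.message
        : String(result.reason);
      outcome.errors.push({ provider, reason });
    }
  });
  return outcome;
}

// Run all lookups in parallel; one failing provider does not abort the others.
async function lookupAll<T>(
  lookups: Record<string, () => Promise<T>>,
): Promise<LookupOutcome<T>> {
  const entries = Object.entries(lookups);
  const results = await Promise.allSettled(entries.map(([, run]) => run()));
  return partitionSettled(entries.map(([name]) => name), results);
}
```

The `errors` list maps naturally onto the inline provider error boxes described earlier, while `releases` feeds the merge.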
**Typical Response Times**:
- Single provider: 200-500ms
- Multiple providers (parallel): 500-1500ms
- Cached response: <50ms
### Server-Side Rendering Overhead
Fresh SSR adds minimal overhead:
- Component rendering: 10-50ms
- HTML generation: 5-20ms
- Total SSR overhead: <100ms
### Island Hydration
Islands load asynchronously after initial page render:
- Initial HTML render: Immediate
- Island JavaScript load: 100-300ms
- Island hydration: 50-100ms
**User experience**: Page content is visible immediately; island components become interactive once hydration completes.
## Integration Patterns
### Embedding in Other Applications
Since Harmony has no REST API, integration requires:
1. **iFrame embedding**: Embed `/release` route in iFrame
2. **Redirect**: Redirect users to Harmony for lookup
3. **HTML parsing**: Fetch and parse HTML responses (fragile)
**iFrame Example**:
```html
<iframe src="https://harmony.example.com/release?gtin=0602537347377" width="100%" height="600"></iframe>
```
### MusicBrainz Integration
Harmony integrates with MusicBrainz via:
1. **Seeder form**: Pre-filled form for MB import
2. **Edit notes**: Include provider URLs and permalink
3. **Annotations**: Extra metadata not in main form
4. **MBID resolution**: Batch URL lookup to detect duplicates
**Workflow**:
```
1. User performs lookup in Harmony
2. Harmony displays harmonized release
3. User clicks "Open in MusicBrainz"
4. Seeder form opens in new tab
5. User reviews and submits to MusicBrainz
```
## Summary
Harmony's API design prioritizes:
1. **Web UI first**: No REST API, HTML-only responses
2. **Server-side rendering**: Fast initial load, SEO-friendly
3. **Islands architecture**: Selective client-side interactivity
4. **Permalink system**: Reproducible results via timestamp caching
5. **Graceful degradation**: Partial results on provider failures
6. **MusicBrainz integration**: Seamless seeding workflow
This design is optimized for human users (MusicBrainz editors) rather than programmatic API consumers. For a metadata aggregation system targeting API consumers, a REST API layer would need to be added.
@@ -0,0 +1,795 @@
# Harmony - Architecture Analysis
## System Architecture Overview
Harmony implements a **4-stage pipeline architecture** for metadata aggregation and harmonization:
```
┌──────────┐     ┌────────────┐     ┌───────┐     ┌──────┐
│  LOOKUP  │ --> │ HARMONIZE  │ --> │ MERGE │ --> │ SEED │
└──────────┘     └────────────┘     └───────┘     └──────┘
     │                 │                │            │
  Parallel          Provider         3-phase     MusicBrainz
 Multi-source      Conversion         Merge        Format
   Queries         to Harmony       Algorithm    Conversion
```
Each stage has distinct responsibilities and operates on well-defined data structures.
## Stage 1: LOOKUP
### CombinedReleaseLookup
The entry point for all metadata retrieval operations.
**Location**: `harmonizer/combined_lookup.ts`
**Responsibilities**:
- Accepts GTIN, URLs, or provider-specific IDs
- Determines which providers to query based on input
- Executes provider lookups in parallel
- Handles provider failures gracefully via `Promise.allSettled`
- Returns array of provider-specific release objects
**Input Types**:
```typescript
interface LookupInput {
gtin?: string; // Global Trade Item Number (barcode)
urls?: string[]; // Provider URLs
region?: string[]; // Market regions (e.g., ['GB', 'US', 'JP'])
category?: string; // Provider category filter
providerIds?: Record<string, string>; // Provider-specific IDs
}
```
**Parallel Execution**:
```typescript
// Conceptual flow: Promise.allSettled records rejections itself,
// so no per-promise catch is needed
const lookupPromises = providers.map(provider => provider.lookup(input));
const results = await Promise.allSettled(lookupPromises);
```
**Output**: Array of provider-native release objects (Spotify, Deezer, iTunes formats, etc.)
### Provider Selection Logic
1. **URL-based**: Extract provider from URL pattern matching
2. **GTIN-based**: Query all providers supporting GTIN lookup
3. **Category filtering**: Apply user preferences (all/default/preferred)
4. **Region filtering**: Pass region codes to region-aware providers
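The selection rules above can be sketched as a pure function. Everything here is illustrative: `ProviderInfo`, `selectProviders` and the hostname-based matching are assumptions (the document mentions `URLPattern`), and category filtering is applied only to GTIN lookups, matching the request examples.

```typescript
interface ProviderInfo {
  name: string;
  hosts: string[];       // hostnames whose URLs this provider handles
  supportsGtin: boolean;
  categories: string[];  // e.g. ["default", "preferred"]
}

interface SelectionInput {
  gtin?: string;
  urls?: string[];
  category?: string;
}

function selectProviders(providers: ProviderInfo[], input: SelectionInput): string[] {
  const selected = new Set<string>();

  // 1. URL-based: pick the provider whose host matches each URL.
  for (const raw of input.urls ?? []) {
    let host: string;
    try {
      host = new URL(raw).hostname;
    } catch {
      continue; // skip malformed URLs
    }
    for (const provider of providers) {
      if (provider.hosts.includes(host)) selected.add(provider.name);
    }
  }

  // 2. GTIN-based: every GTIN-capable provider that passes the category filter.
  if (input.gtin) {
    const category = input.category ?? "default";
    for (const provider of providers) {
      const inCategory = category === "all" || provider.categories.includes(category);
      if (provider.supportsGtin && inCategory) selected.add(provider.name);
    }
  }

  return Array.from(selected);
}
```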
## Stage 2: HARMONIZE
### Provider Conversion
Each provider implements a `harmonize()` method that converts its native format to `HarmonyRelease`.
**Location**: Individual provider files in `providers/`
**Conversion Responsibilities**:
- Map provider-specific field names to Harmony schema
- Normalize data types (dates, durations, ISRCs)
- Extract nested structures (artists, labels, media)
- Detect language and script from metadata
- Resolve release types (album, single, EP, etc.)
- Extract external links and identifiers
**Example Provider Conversion** (conceptual):
```typescript
class SpotifyProvider extends MetadataApiProvider {
harmonize(spotifyAlbum: SpotifyAlbum): HarmonyRelease {
return {
title: spotifyAlbum.name,
artists: this.convertArtists(spotifyAlbum.artists),
gtin: spotifyAlbum.external_ids?.upc,
media: this.convertTracks(spotifyAlbum.tracks),
releaseDate: this.parseDate(spotifyAlbum.release_date),
images: this.convertImages(spotifyAlbum.images),
externalLinks: [{
url: spotifyAlbum.external_urls.spotify,
types: ['streaming']
}],
// ... additional fields
};
}
}
```
### HarmonyRelease Schema
**Location**: `harmonizer/types.ts` (273 lines)
**Core Structure**:
```typescript
interface HarmonyRelease {
// Basic metadata
title: string;
artists: ArtistCreditName[];
gtin?: string;
// Media and tracks
media: HarmonyMedium[];
// Release details
language?: string;
script?: string;
status?: ReleaseStatus;
types: ReleaseType[];
releaseDate?: PartialDate;
// Commercial info
labels: Label[];
packaging?: PackagingType;
copyright?: string;
// Distribution
availableIn?: string[]; // Country codes
excludedFrom?: string[]; // Country codes
// Visual assets
images: Image[];
// Links and identifiers
externalLinks: ExternalLink[];
// Metadata about metadata
info: {
providers: string[]; // Which providers contributed
messages: Message[]; // Warnings, errors
sourceMap?: SourceMap; // Property -> provider mapping
incompatibleData?: IncompatibilityInfo;
};
}
```
**Key Sub-structures**:
#### ArtistCreditName
```typescript
interface ArtistCreditName {
name: string; // Display name
creditedName?: string; // Alternative credit
joinPhrase?: string; // Separator (e.g., " & ", " feat. ")
mbid?: string; // MusicBrainz ID
}
```
#### HarmonyMedium
```typescript
interface HarmonyMedium {
title?: string;
format?: MediumFormat; // CD, Vinyl, Digital, etc.
position: number;
tracks: HarmonyTrack[];
}
```
#### HarmonyTrack
```typescript
interface HarmonyTrack {
title: string;
artists?: ArtistCreditName[];
position: number;
length?: number; // Duration in milliseconds
isrc?: string; // International Standard Recording Code
}
```
#### Label
```typescript
interface Label {
name: string;
catalogNumber?: string;
mbid?: string;
}
```
#### Image
```typescript
interface Image {
url: string;
types: ImageType[]; // 'front', 'back', 'medium', etc.
width?: number;
height?: number;
comment?: string;
}
```
### Harmonizer Modules
**Location**: `harmonizer/` directory
| Module | Purpose | Lines |
|--------|---------|-------|
| `types.ts` | HarmonyRelease schema and type definitions | 273 |
| `merge.ts` | 3-phase merge algorithm | ~200 |
| `compatibility.ts` | Conflict detection and resolution | ~150 |
| `deduplicate.ts` | Remove duplicate entries | ~100 |
| `isrc.ts` | ISRC validation and normalization | ~50 |
| `language_script.ts` | Auto-detect language and script | ~100 |
| `release_label.ts` | Label normalization | ~80 |
| `release_types.ts` | Release type inference | ~120 |
| `tracklist_gap.ts` | Detect missing tracks | ~60 |
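As an illustration of what a module such as `isrc.ts` might do, the sketch below normalizes an ISRC to its compact 12-character form (two-letter prefix, three alphanumeric registrant characters, two-digit year, five-digit designation). `normalizeIsrc` is a hypothetical name; the actual module's behaviour is not shown in this document.

```typescript
// Strip separators, uppercase, and validate the 12-character ISRC layout.
// Returns null when the input does not look like an ISRC.
function normalizeIsrc(raw: string): string | null {
  const compact = raw.replace(/[\s-]/g, "").toUpperCase();
  return /^[A-Z]{2}[A-Z0-9]{3}\d{7}$/.test(compact) ? compact : null;
}
```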
## Stage 3: MERGE
### 3-Phase Merge Algorithm
**Location**: `harmonizer/merge.ts`
The merge algorithm combines multiple `HarmonyRelease` objects into a single `MergedHarmonyRelease` using provider preferences and compatibility checking.
#### Phase 1: Property Collection
Collect all values for each property across all releases:
```typescript
// Conceptual
const propertyValues = {
title: ['Album Title', 'Album Title (Deluxe)', 'Album Title'],
gtin: ['0602537347377', '0602537347377'],
releaseDate: ['2014-11-24', '2014-11-24', '2014-11-25'],
// ... all properties
};
```
#### Phase 2: Compatibility Checking
For each property, check if values are compatible:
```typescript
interface CompatibilityCheck {
compatible: boolean;
canonicalValue?: any;
conflicts?: ConflictInfo[];
}
```
**Compatibility Rules**:
- **Strings**: Case-insensitive comparison, whitespace normalization
- **Dates**: Partial date matching (year-only vs. full date)
- **Arrays**: Set comparison (order-independent)
- **Numbers**: Exact match or within tolerance
- **Objects**: Recursive field comparison
**Example Compatibility**:
```typescript
// Compatible
['2014-11-24', '2014-11'];      // Partial date match
['Album Title', 'album title']; // Case-insensitive
// Incompatible
['2014-11-24', '2014-11-25'];   // Date conflict
['Album', 'EP'];                // Type conflict
```
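The date and string rules can be expressed as small predicates. This is a sketch under assumptions: `PartialDate` is modelled as optional year/month/day numbers, and `parsePartialDate`, `datesCompatible` and `stringsCompatible` are hypothetical helpers.

```typescript
interface PartialDate {
  year?: number;
  month?: number;
  day?: number;
}

// Parse "YYYY", "YYYY-MM" or "YYYY-MM-DD"; anything else yields an empty date.
function parsePartialDate(text: string): PartialDate {
  const match = /^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$/.exec(text);
  if (!match) return {};
  const date: PartialDate = { year: Number(match[1]) };
  if (match[2] !== undefined) date.month = Number(match[2]);
  if (match[3] !== undefined) date.day = Number(match[3]);
  return date;
}

// Two partial dates are compatible when every component known to both agrees.
function datesCompatible(a: PartialDate, b: PartialDate): boolean {
  const fields: (keyof PartialDate)[] = ["year", "month", "day"];
  return fields.every(
    (field) => a[field] === undefined || b[field] === undefined || a[field] === b[field],
  );
}

// Case-insensitive comparison after collapsing whitespace.
function stringsCompatible(a: string, b: string): boolean {
  const normalize = (s: string) => s.trim().replace(/\s+/g, " ").toLowerCase();
  return normalize(a) === normalize(b);
}
```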
#### Phase 3: Value Selection
For each property, select the best value using provider preferences:
**Provider Preference Order** (configurable):
1. MusicBrainz (template/reference)
2. Spotify (high quality, comprehensive)
3. Tidal (high quality audio metadata)
4. Deezer (good coverage)
5. iTunes (region-specific)
6. Bandcamp (artist-verified)
7. Beatport (electronic music specialist)
8. Mora (Japan specialist)
9. Ototoy (Japan specialist)
**Selection Logic**:
```typescript
// Hypothetical shape of one candidate value (not from Harmony's source)
interface CandidateValue {
  provider: string;
  data: unknown;
  isCompatible: boolean;
}

function selectBestValue(values: CandidateValue[], preferences: string[]): unknown {
  // 1. Filter to compatible values only
  const compatible = values.filter(v => v.isCompatible);
  // 2. If no compatible values, mark as conflict
  if (compatible.length === 0) {
    return { conflict: true, values };
  }
  // 3. Select from highest-preference provider
  for (const provider of preferences) {
    const value = compatible.find(v => v.provider === provider);
    if (value) return value.data;
  }
  // 4. Fallback to first compatible value
  return compatible[0].data;
}
```
### MergedHarmonyRelease
Extends `HarmonyRelease` with merge metadata:
```typescript
interface MergedHarmonyRelease extends HarmonyRelease {
sourceMap: SourceMap; // Property -> provider mapping
incompatibleData?: IncompatibilityInfo;
}
interface SourceMap {
[propertyPath: string]: string; // e.g., "title" -> "spotify"
}
interface IncompatibilityInfo {
conflicts: Conflict[];
warnings: string[];
}
interface Conflict {
property: string;
values: Array<{
provider: string;
value: any;
}>;
}
```
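To show how a `SourceMap` could be populated, the sketch below merges flat property bags in preference order and records which provider supplied each value. It deliberately skips compatibility checking, and `mergeByPreference` and its types are hypothetical, not Harmony's implementation.

```typescript
type ProviderRelease = Record<string, unknown>;

interface MergeResult {
  merged: Record<string, unknown>;
  sourceMap: Record<string, string>; // property -> provider
}

// For each property, take the value from the highest-preference provider that has one.
function mergeByPreference(
  releases: Record<string, ProviderRelease>,
  preferences: string[],
): MergeResult {
  const result: MergeResult = { merged: {}, sourceMap: {} };
  for (const provider of preferences) {
    const release = releases[provider];
    if (!release) continue; // provider returned nothing
    for (const property of Object.keys(release)) {
      const value = release[property];
      const alreadySet = property in result.merged;
      if (value === undefined || value === null || alreadySet) continue;
      result.merged[property] = value;
      result.sourceMap[property] = provider;
    }
  }
  return result;
}
```

With Spotify preferred over iTunes, a conflicting `releaseDate` resolves to Spotify's value while a `copyright` string present only on iTunes is still carried over and attributed to `itunes`.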
### Deduplication
**Location**: `harmonizer/deduplicate.ts`
Removes duplicate entries in arrays:
- **Artists**: Match by name (case-insensitive) or MBID
- **Labels**: Match by name and catalog number
- **Tracks**: Match by position and title
- **Images**: Match by URL or dimensions
- **External links**: Match by URL
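The artist rule (match by case-insensitive name or by MBID) can be sketched as follows. `deduplicateArtists` is a hypothetical helper that keeps the first occurrence; how Harmony chooses which duplicate to keep is not stated here.

```typescript
interface ArtistCreditName {
  name: string;
  mbid?: string;
}

// Drop artists whose normalized name or MBID has already been seen.
function deduplicateArtists(artists: ArtistCreditName[]): ArtistCreditName[] {
  const seen = new Set<string>();
  const unique: ArtistCreditName[] = [];
  for (const artist of artists) {
    const keys = [`name:${artist.name.trim().toLowerCase()}`];
    if (artist.mbid) keys.push(`mbid:${artist.mbid}`);

    const isDuplicate = keys.some((key) => seen.has(key));
    // Remember all keys so later entries can match on either name or MBID.
    keys.forEach((key) => seen.add(key));
    if (!isDuplicate) unique.push(artist);
  }
  return unique;
}
```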
### Compatibility Checking
**Location**: `harmonizer/compatibility.ts`
Detects and reports incompatible data:
**Incompatibility Types**:
1. **Value conflicts**: Different values for same property
2. **Type conflicts**: Different data types
3. **Structural conflicts**: Different array lengths, missing required fields
4. **Semantic conflicts**: Logically incompatible values (e.g., release date before artist birth)
**Handling**:
- **Strict mode**: Reject merge if any conflicts
- **Lenient mode**: Prefer highest-quality provider, log warnings
- **User override**: Allow manual conflict resolution
## Stage 4: SEED
### MusicBrainz Seeding
**Location**: `musicbrainz/seeding.ts`
Converts `MergedHarmonyRelease` to MusicBrainz import format.
**Conversion Steps**:
1. Map HarmonyRelease fields to MusicBrainz schema
2. Generate edit notes with provider URLs
3. Create permalink for reproducibility
4. Build annotation with extra data (copyright, availability)
5. Format for MusicBrainz seeder form
**MusicBrainz Mapping**:
| Harmony Field | MusicBrainz Field | Notes |
|---------------|-------------------|-------|
| `title` | Release name | Direct mapping |
| `artists` | Artist credit | Join with `joinPhrase` |
| `gtin` | Barcode | Validate format |
| `releaseDate` | Release events | Per-country events |
| `labels` | Release labels | With catalog numbers |
| `media` | Mediums | With format and tracks |
| `types` | Release group types | Primary + secondary |
| `language` | Language | ISO 639-3 code |
| `script` | Script | ISO 15924 code |
| `packaging` | Packaging | Jewel case, digipak, etc. |
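The "Join with `joinPhrase`" note in the table can be illustrated with a short sketch. `formatArtistCredit` is a hypothetical helper, and preferring `creditedName` over `name` when both are present is an assumption.

```typescript
interface ArtistCreditName {
  name: string;
  creditedName?: string;
  joinPhrase?: string;
}

// Concatenate each credited name followed by its join phrase.
function formatArtistCredit(artists: ArtistCreditName[]): string {
  return artists
    .map((artist) => (artist.creditedName ?? artist.name) + (artist.joinPhrase ?? ""))
    .join("");
}
```

For example, two artists with the join phrase `" feat. "` on the first entry render as `Artist A feat. Artist B`.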
**Edit Note Generation**:
```typescript
function generateEditNote(release: MergedHarmonyRelease, permalink: string): string {
const sources = release.info.providers.join(', ');
return `
Imported from ${sources} via Harmony
Permalink: ${permalink}
${release.externalLinks.map(link => link.url).join('\n')}
`.trim();
}
```
### MBID Resolution
**Location**: `musicbrainz/mbid_mapping.ts`
Resolves external URLs to MusicBrainz IDs (MBIDs).
**Batch Lookup**:
- Collects up to 100 URLs
- Single MusicBrainz API request: `GET /ws/2/url?resource={url1}&resource={url2}&...`
- Caches results in localStorage (dev) or sessionStorage (prod)
- Returns MBID mappings
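Batching could be implemented by chunking the URL list and repeating the `resource` parameter, as in the sketch below. `buildUrlLookupRequests` is a hypothetical helper; it only builds request URLs and does not perform the requests or the caching.

```typescript
const MB_URL_ENDPOINT = "https://musicbrainz.org/ws/2/url";

// Build one request URL per batch of at most `batchSize` external URLs.
function buildUrlLookupRequests(urls: string[], batchSize = 100): string[] {
  const requests: string[] = [];
  for (let start = 0; start < urls.length; start += batchSize) {
    const params = new URLSearchParams();
    urls.slice(start, start + batchSize).forEach((url) => params.append("resource", url));
    params.set("fmt", "json"); // ask for a JSON response
    requests.push(`${MB_URL_ENDPOINT}?${params.toString()}`);
  }
  return requests;
}
```

A list of 250 URLs produces three requests carrying 100, 100 and 50 `resource` parameters.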
**Duplicate Detection**:
- Checks if release already exists in MusicBrainz
- Warns user before creating duplicate
- Provides link to existing release
**Cache Strategy**:
```typescript
interface MBIDCache {
[externalUrl: string]: {
mbid: string;
type: 'release' | 'release-group' | 'recording' | 'artist';
cached: number; // Timestamp
};
}
```
### Annotation Builder
**Location**: `musicbrainz/annotation.ts`
Generates MusicBrainz annotation text for additional metadata:
**Included Data**:
- Copyright information
- Availability/exclusion regions
- Provider-specific notes
- Compatibility warnings
- Image URLs (if not added as cover art)
**Format**:
```
Copyright: © 2014 Record Label
Available in: US, GB, DE, JP
Excluded from: CN
Sources:
- Spotify: https://open.spotify.com/album/xyz
- Deezer: https://www.deezer.com/album/123
Notes:
- Release date conflict: Spotify (2014-11-24) vs iTunes (2014-11-25)
```
## Provider Architecture
### Base Class Hierarchy
```
MetadataProvider (abstract)
├── MetadataApiProvider (OAuth2 support)
│   ├── SpotifyProvider
│   └── TidalProvider
├── ReleaseLookup (GTIN/URL/ID support)
│   ├── DeezerProvider
│   ├── iTunesProvider
│   ├── BandcampProvider
│   ├── BeatportProvider
│   ├── MoraProvider
│   └── OtotoyProvider
└── ReleaseApiLookup (multi-region support)
    ├── iTunesProvider
    └── DeezerProvider
```
### MetadataProvider (Abstract Base)
**Location**: `providers/base.ts`
**Core Responsibilities**:
- URL pattern matching via `URLPattern`
- Rate limiting with configurable delays
- HTTP response caching via `snap_storage`
- Error handling and retry logic
- Feature quality ratings
**Key Methods**:
```typescript
// Signature outline only (`declare`): method bodies are omitted.
declare abstract class MetadataProvider {
  // URL pattern matching
  abstract urlPattern: URLPattern;
  matchesUrl(url: string): boolean;

  // Lookup methods
  abstract lookupByUrl(url: string): Promise<Release>;
  abstract lookupByGtin(gtin: string, region?: string): Promise<Release>;

  // Harmonization
  abstract harmonize(release: Release): HarmonyRelease;

  // Rate limiting
  protected rateLimit: RateLimiter;
  protected throttle(): Promise<void>;

  // Caching
  protected cache: SnapStorage;
  protected getCached(key: string): Promise<Response | null>;
  protected setCached(key: string, response: Response): Promise<void>;

  // Feature quality
  abstract featureQuality: FeatureQualityMap;
}
```
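Rate limiting with a configurable delay can be reduced to one question: how long must the next request wait? The sketch below answers that with a fixed minimum interval between requests. `IntervalRateLimiter` and `reserve` are hypothetical names, and the real `RateLimiter` may use a different strategy.

```typescript
// Hypothetical helper: computes how long a caller should wait before its next request.
class IntervalRateLimiter {
  private nextSlot = 0; // earliest time (ms) at which the next request may start
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Reserves the next free slot and returns the delay (ms) the caller should wait.
  reserve(now: number = Date.now()): number {
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.minIntervalMs;
    return start - now;
  }
}
```

A `throttle()` method could then await a timer for the returned delay before issuing the request. Passing `now` explicitly keeps the logic testable without real timers.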
### MetadataApiProvider (OAuth2)
**Location**: `providers/api_base.ts`
**Additional Responsibilities**:
- OAuth2 token acquisition and refresh
- Token caching in localStorage
- Automatic token renewal
- API client configuration
**OAuth2 Flow**:
```typescript
abstract class MetadataApiProvider extends MetadataProvider {
  protected async getAccessToken(): Promise<string> {
    // 1. Check cache (localStorage stores strings, so parse before use)
    const cached = localStorage.getItem(`${this.name}_token`);
    if (cached) {
      const parsed: OAuth2Token = JSON.parse(cached);
      if (!this.isTokenExpired(parsed)) {
        return parsed.access_token;
      }
    }

    // 2. Request new token
    const token = await this.requestToken();

    // 3. Cache token
    localStorage.setItem(`${this.name}_token`, JSON.stringify(token));
    return token.access_token;
  }

  // `abstract` cannot be combined with `async`; implementations may still be async
  protected abstract requestToken(): Promise<OAuth2Token>;
}
```
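The expiry check in this flow depends on knowing when the token was obtained, which the OAuth2 response itself does not include. A minimal sketch follows, assuming the provider records an `obtained_at` timestamp next to the standard `access_token` and `expires_in` fields; the `StoredToken` shape and both function names are illustrative.

```typescript
// Assumed token shape: the OAuth2 response fields plus a locally recorded fetch time.
interface StoredToken {
  access_token: string;
  expires_in: number; // lifetime in seconds, as returned by the token endpoint
  obtained_at: number; // ms timestamp recorded when the token was received (assumption)
}

// Treats a token as expired slightly early so it cannot lapse in the middle of a request.
function isTokenExpired(token: StoredToken, now: number = Date.now(), safetyMarginMs = 60_000): boolean {
  const expiresAt = token.obtained_at + token.expires_in * 1000;
  return now >= expiresAt - safetyMarginMs;
}

// Parses a cached JSON string; returns null for missing, unparseable, or expired entries.
function readCachedToken(raw: string | null, now: number = Date.now()): StoredToken | null {
  if (raw === null) return null;
  try {
    const token = JSON.parse(raw) as StoredToken;
    return isTokenExpired(token, now) ? null : token;
  } catch {
    return null;
  }
}
```

The safety margin is a design choice: renewing a minute early costs one extra token request at most, while a token that expires mid-lookup costs a failed provider call.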
### ReleaseLookup
**Location**: `providers/release_lookup.ts`
**Lookup Methods**:
```typescript
interface ReleaseLookup {
lookupByUrl(url: string): Promise<Release>;
lookupByGtin(gtin: string): Promise<Release>;
lookupById(id: string): Promise<Release>;
}
```
### ReleaseApiLookup (Multi-Region)
**Location**: `providers/release_api_lookup.ts`
**Region Handling**:
```typescript
abstract class ReleaseApiLookup extends ReleaseLookup {
  protected abstract supportedRegions: string[]; // ['US', 'GB', 'JP', ...]

  async lookupByGtin(gtin: string, regions: string[]): Promise<Release[]> {
    const lookups = regions
      .filter((r) => this.supportedRegions.includes(r))
      .map((r) => this.lookupInRegion(gtin, r));
    const results = await Promise.allSettled(lookups);
    // The type predicate narrows to fulfilled results so `.value` is accessible
    return results
      .filter((r): r is PromiseFulfilledResult<Release> => r.status === 'fulfilled')
      .map((r) => r.value);
  }

  protected abstract lookupInRegion(gtin: string, region: string): Promise<Release>;
}
```
### Provider Registry
**Location**: `providers/registry.ts`
Manages provider instantiation and categorization.
**Registry Structure**:
```typescript
// Signature outline only (`declare`): method bodies are omitted.
declare class ProviderRegistry {
  private providers: Map<string, MetadataProvider>;
  private categories: Map<string, string[]>; // category -> provider names

  register(provider: MetadataProvider, category: string): void;
  get(name: string): MetadataProvider | undefined;
  getByCategory(category: string): MetadataProvider[];
  getByUrl(url: string): MetadataProvider | undefined;
  getByGtin(): MetadataProvider[]; // All GTIN-supporting providers
}
```
**Categories**:
- `default`: Commonly used providers (Spotify, Deezer, iTunes)
- `preferred`: High-quality providers (Spotify, Tidal, MusicBrainz)
- `all`: All registered providers
- `japan`: Japan-specific providers (Mora, Ototoy)
- `electronic`: Electronic music specialists (Beatport)
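The registry and its categories can be illustrated with an in-memory sketch. `SimpleRegistry` and `ProviderInfo` are hypothetical stand-ins; host matching uses a plain `RegExp` as a simplified substitute for the `URLPattern` matching that the providers are described as using.

```typescript
// Minimal stand-in for a provider; only the fields this sketch needs.
interface ProviderInfo {
  name: string;
  hostPattern: RegExp; // simplified substitute for URLPattern matching
}

class SimpleRegistry {
  private providers = new Map<string, ProviderInfo>();
  private categories = new Map<string, string[]>(); // category -> provider names

  register(provider: ProviderInfo, ...categories: string[]): void {
    this.providers.set(provider.name, provider);
    // Assumption: every provider is implicitly part of the "all" category.
    for (const category of ['all', ...categories]) {
      const names = this.categories.get(category) ?? [];
      if (!names.includes(provider.name)) names.push(provider.name);
      this.categories.set(category, names);
    }
  }

  getByCategory(category: string): ProviderInfo[] {
    return (this.categories.get(category) ?? [])
      .map((name) => this.providers.get(name))
      .filter((p): p is ProviderInfo => p !== undefined);
  }

  getByUrl(url: string): ProviderInfo | undefined {
    const host = new URL(url).hostname;
    return Array.from(this.providers.values()).find((p) => p.hostPattern.test(host));
  }
}
```

Storing provider names per category, rather than provider objects, mirrors the `Map<string, string[]>` field in the outline above and lets one provider belong to several categories.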
### Feature Quality Ratings
Each provider declares quality ratings for supported features:
```typescript
interface FeatureQualityMap {
gtin: FeatureQuality;
title: FeatureQuality;
artists: FeatureQuality;
releaseDate: FeatureQuality;
labels: FeatureQuality;
media: FeatureQuality;
tracks: FeatureQuality;
isrc: FeatureQuality;
images: FeatureQuality | number; // Number = max dimension
copyright: FeatureQuality;
availability: FeatureQuality;
}
enum FeatureQuality {
MISSING = 0,
BAD = 1,
PRESENT = 2,
GOOD = 3,
}
```
**Example** (Spotify):
```typescript
featureQuality = {
gtin: FeatureQuality.GOOD,
title: FeatureQuality.GOOD,
artists: FeatureQuality.GOOD,
releaseDate: FeatureQuality.GOOD,
labels: FeatureQuality.PRESENT,
media: FeatureQuality.GOOD,
tracks: FeatureQuality.GOOD,
isrc: FeatureQuality.GOOD,
images: 2000, // Max 2000px
copyright: FeatureQuality.PRESENT,
availability: FeatureQuality.GOOD,
};
```
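One way these ratings could drive source selection is to pick, per feature, the provider with the highest rating. The sketch below is an illustration under assumptions, not Harmony's merge logic: ratings are plain numbers mirroring the enum values, and an `images` value given as a pixel dimension is compared numerically, so any dimension outranks the enum ratings.

```typescript
// Numeric ratings mirroring the FeatureQuality enum values above.
const MISSING = 0;
const PRESENT = 2;
const GOOD = 3;

// feature name -> rating (or maximum image dimension for `images`)
type Ratings = Record<string, number>;

// Returns the provider with the highest rating for a feature, or undefined if none supports it.
function bestProviderFor(feature: string, providers: Record<string, Ratings>): string | undefined {
  let best: string | undefined;
  let bestScore = MISSING;
  for (const [name, ratings] of Object.entries(providers)) {
    const score = ratings[feature] ?? MISSING;
    if (score > bestScore) {
      best = name;
      bestScore = score;
    }
  }
  return best;
}
```

Ties go to the provider listed first, so the iteration order of `providers` acts as a secondary preference.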
## Server Architecture (Fresh Framework)
### Fresh Islands Architecture
Fresh uses a hybrid rendering model:
- **Server-side rendering (SSR)**: Default for all components
- **Islands**: Client-side interactive components
**Benefits**:
- Minimal JavaScript shipped to client
- Fast initial page load
- Progressive enhancement
- SEO-friendly
### Route Structure
**Location**: `routes/` directory
| Route File | URL | Purpose |
|------------|-----|---------|
| `index.tsx` | `/` | Landing page |
| `release.tsx` | `/release` | Main lookup interface |
| `release/actions.tsx` | `/release/actions` | ISRC/cover submission |
| `about.tsx` | `/about` | Provider documentation |
| `settings.tsx` | `/settings` | User preferences |
### Components
**Location**: `components/` directory
**22 Static Components** (server-rendered):
- Layout components (Header, Footer, Navigation)
- Display components (ReleaseInfo, TrackList, ArtistCredit)
- Comparison components (ProviderTable, FeatureMatrix)
- Form components (LookupForm, SeederForm)
**5 Interactive Islands** (client-side):
- `LookupForm.tsx`: Dynamic form with validation
- `ProviderSelector.tsx`: Provider category filtering
- `RegionSelector.tsx`: Multi-region selection
- `PermalinkGenerator.tsx`: Timestamp-based permalink creation
- `SeederForm.tsx`: MusicBrainz import form with copy-to-clipboard
### Request Flow
```
1. Browser Request
2. Fresh Router (routes/release.tsx)
3. CombinedReleaseLookup (parallel provider queries)
4. Provider Harmonization (convert to HarmonyRelease)
5. Merge Algorithm (combine releases)
6. Server-Side Rendering (generate HTML)
7. Response sent to browser
8. Island Hydration (interactive components activate in the browser)
```
## Data Flow Diagram
```
┌─────────────────────────────────────────────────────────────┐
│ User Input │
│ GTIN: 0602537347377 URLs: [spotify, deezer] Region: US │
└────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ CombinedReleaseLookup │
│ - Parse input │
│ - Select providers (Spotify, Deezer) │
│ - Execute parallel lookups │
└────────────────────────┬────────────────────────────────────┘
┌───────────────┼───────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Spotify │ │ Deezer │ │ iTunes │
│ Provider │ │ Provider │ │ Provider │
│ │ │ │ │ │
│ - API call │ │ - API call │ │ - API call │
│ - Cache │ │ - Cache │ │ - Cache │
│ - Parse │ │ - Parse │ │ - Parse │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Harmonize │ │ Harmonize │ │ Harmonize │
│ (Spotify) │ │ (Deezer) │ │ (iTunes) │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└────────────────┼────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Merge Algorithm │
│ Phase 1: Collect property values from all releases │
│ Phase 2: Check compatibility │
│ Phase 3: Select best value per property │
└────────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ MergedHarmonyRelease │
│ - Unified metadata │
│ - Source map (property -> provider) │
│ - Incompatibility warnings │
└────────────────────────┬────────────────────────────────────┘
┌───────────────┼───────────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Web UI Display │ │ MusicBrainz │
│ - Comparison │ │ Seeding │
│ - Warnings │ │ - Convert │
│ - Permalink │ │ - Edit note │
└─────────────────┘ │ - Annotation │
└─────────────────┘
```
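The three merge phases in the diagram can be sketched on flat string properties. This is a simplified illustration, not the real `merge.ts`: releases are reduced to `Record<string, string>`, "compatible" means "identical", and the provider preference order is an assumed policy. All names are hypothetical.

```typescript
type FlatRelease = Record<string, string | undefined>;

interface MergeResult {
  merged: Record<string, string>;
  sources: Record<string, string>; // property -> provider that supplied the value
  warnings: string[];
}

// `preference` lists provider names from most to least preferred (an assumed policy).
function mergeReleases(releases: Record<string, FlatRelease>, preference: string[]): MergeResult {
  // Phase 1: collect every value per property, remembering which provider supplied it.
  const collected = new Map<string, { provider: string; value: string }[]>();
  for (const provider of preference) {
    const release = releases[provider];
    if (!release) continue;
    for (const [property, value] of Object.entries(release)) {
      if (value === undefined) continue;
      const values = collected.get(property) ?? [];
      values.push({ provider, value });
      collected.set(property, values);
    }
  }

  const result: MergeResult = { merged: {}, sources: {}, warnings: [] };
  collected.forEach((values, property) => {
    // Phase 2: flag properties on which providers disagree.
    const distinct = new Set(values.map((v) => v.value));
    if (distinct.size > 1) {
      const detail = values.map((v) => `${v.provider} (${v.value})`).join(' vs ');
      result.warnings.push(`${property} conflict: ${detail}`);
    }
    // Phase 3: keep the value from the most preferred provider (collected first).
    result.merged[property] = values[0].value;
    result.sources[property] = values[0].provider;
  });
  return result;
}
```

Recording the source per property is what makes the "source map" in the merged release possible, and the warnings list corresponds to the incompatibility warnings shown in the UI.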
## Summary
Harmony's architecture demonstrates:
1. **Clear separation of concerns**: 4-stage pipeline with distinct responsibilities
2. **Provider abstraction**: Base classes handle common functionality (caching, rate limiting, OAuth2)
3. **Type safety**: 273-line HarmonyRelease schema ensures data consistency
4. **Intelligent merging**: 3-phase algorithm with compatibility checking and provider preferences
5. **Graceful degradation**: `Promise.allSettled` ensures partial results on provider failures
6. **MusicBrainz integration**: Seamless conversion to MB format with MBID resolution
7. **Modern web stack**: Fresh framework with SSR and islands for optimal performance
These properties make the architecture a useful reference for building metadata aggregation systems.
+832
@@ -0,0 +1,832 @@
# Harmony - Codebase and Implementation Analysis
## Project Structure
```
harmony/
├── cli.ts # CLI entry point
├── config.ts # Configuration management (36 lines)
├── deno.json # Deno configuration and tasks
├── deno.lock # Dependency lock file
├── .env.example # Environment variable template
├── .github/
│ └── workflows/
│ └── deno.yml # CI/CD pipeline
├── components/ # UI components (22 static)
│ ├── Header.tsx
│ ├── Footer.tsx
│ ├── ReleaseInfo.tsx
│ ├── TrackList.tsx
│ ├── ProviderTable.tsx
│ └── ...
├── islands/ # Interactive components (5 islands)
│ ├── LookupForm.tsx
│ ├── ProviderSelector.tsx
│ ├── RegionSelector.tsx
│ ├── PermalinkGenerator.tsx
│ └── SeederForm.tsx
├── routes/ # Fresh routes
│ ├── index.tsx # Landing page
│ ├── release.tsx # Main lookup interface
│ ├── about.tsx # Provider documentation
│ ├── settings.tsx # User preferences
│ └── release/
│ └── actions.tsx # ISRC/cover submission
├── static/ # Static assets
│ ├── styles.css
│ └── favicon.ico
├── server/ # Server entry points
│ ├── main.ts # Production server
│ └── dev.ts # Development server
├── providers/ # Provider implementations
│ ├── base.ts # MetadataProvider abstract class
│ ├── api_base.ts # MetadataApiProvider (OAuth2)
│ ├── release_lookup.ts # ReleaseLookup interface
│ ├── release_api_lookup.ts # ReleaseApiLookup (multi-region)
│ ├── registry.ts # ProviderRegistry
│ ├── spotify.ts # Spotify provider
│ ├── deezer.ts # Deezer provider
│ ├── itunes.ts # iTunes provider
│ ├── tidal.ts # Tidal provider
│ ├── musicbrainz.ts # MusicBrainz provider
│ ├── bandcamp.ts # Bandcamp provider
│ ├── beatport.ts # Beatport provider
│ ├── mora.ts # Mora provider
│ └── ototoy.ts # Ototoy provider
├── harmonizer/ # Harmonization modules
│ ├── types.ts # HarmonyRelease schema (273 lines)
│ ├── combined_lookup.ts # CombinedReleaseLookup
│ ├── merge.ts # 3-phase merge algorithm
│ ├── compatibility.ts # Compatibility checking
│ ├── deduplicate.ts # Deduplication
│ ├── isrc.ts # ISRC validation
│ ├── language_script.ts # Language/script detection
│ ├── release_label.ts # Label normalization
│ ├── release_types.ts # Release type inference
│ └── tracklist_gap.ts # Track gap detection
├── musicbrainz/ # MusicBrainz integration
│ ├── seeding.ts # MB format conversion
│ ├── mbid_mapping.ts # MBID resolution (batch 100)
│ ├── api_client.ts # MB API client
│ ├── annotation.ts # Annotation builder
│ └── edit_link.ts # Edit link generation
├── utils/ # Utility modules
│ ├── config.ts # Config helpers
│ ├── logger.ts # Logging setup
│ ├── rate_limiter.ts # Rate limiting
│ ├── cache.ts # Cache utilities
│ └── errors.ts # Error classes
├── testdata/ # Test fixtures (43 cached responses)
│ ├── spotify/
│ ├── deezer/
│ ├── itunes/
│ └── ...
└── tests/ # Test files (38 total)
├── providers/
│ ├── spotify_test.ts
│ ├── deezer_test.ts
│ └── ...
├── harmonizer/
│ ├── merge_test.ts
│ ├── compatibility_test.ts
│ └── ...
└── musicbrainz/
├── seeding_test.ts
└── mbid_mapping_test.ts
```
## Configuration Management
### config.ts (36 lines)
**Location**: `config.ts`
**Purpose**: Centralized configuration with environment variable loading
**Structure**:
```typescript
export const config = {
// OAuth2 Credentials
spotify: {
clientId: getFromEnv('HARMONY_SPOTIFY_CLIENT_ID'),
clientSecret: getFromEnv('HARMONY_SPOTIFY_CLIENT_SECRET')
},
tidal: {
clientId: getFromEnv('HARMONY_TIDAL_CLIENT_ID'),
clientSecret: getFromEnv('HARMONY_TIDAL_CLIENT_SECRET')
},
// MusicBrainz Configuration
musicbrainz: {
apiUrl: getUrlFromEnv('HARMONY_MB_API_URL', 'https://musicbrainz.org/ws/2'),
targetUrl: getUrlFromEnv('HARMONY_MB_TARGET_URL', 'https://musicbrainz.org')
},
// Data Storage
dataDir: getFromEnv('HARMONY_DATA_DIR', './'),
// Server Configuration
port: parseInt(getFromEnv('PORT', '8000')),
forwardProto: getFromEnv('FORWARD_PROTO'),
deploymentId: getFromEnv('DENO_DEPLOYMENT_ID')
};
```
### utils/config.ts
**Configuration Helpers**:
```typescript
export function getFromEnv(key: string, defaultValue?: string): string {
const value = Deno.env.get(key);
if (value === undefined) {
if (defaultValue !== undefined) {
return defaultValue;
}
throw new Error(`Environment variable ${key} is required but not set`);
}
return value;
}
export function getBooleanFromEnv(key: string, defaultValue: boolean): boolean {
const value = Deno.env.get(key);
if (value === undefined) return defaultValue;
return value.toLowerCase() === 'true' || value === '1';
}
export function getUrlFromEnv(key: string, defaultValue?: string): string {
const value = getFromEnv(key, defaultValue);
try {
new URL(value); // Validate URL format
return value;
} catch {
throw new Error(`Environment variable ${key} is not a valid URL: ${value}`);
}
}
```
### .env.example
**Template**:
```bash
# OAuth2 Credentials
# Get from: https://developer.spotify.com/dashboard
HARMONY_SPOTIFY_CLIENT_ID=
HARMONY_SPOTIFY_CLIENT_SECRET=
# Get from: https://developer.tidal.com/
HARMONY_TIDAL_CLIENT_ID=
HARMONY_TIDAL_CLIENT_SECRET=
# MusicBrainz Configuration
HARMONY_MB_API_URL=https://musicbrainz.org/ws/2
HARMONY_MB_TARGET_URL=https://musicbrainz.org
# Data Storage
HARMONY_DATA_DIR=/var/lib/harmony
# Server Configuration
PORT=8000
FORWARD_PROTO=https
```
## Logging System
### utils/logger.ts
**Logger Setup**:
```typescript
import * as log from '@std/log/mod.ts';
export async function setupLogging() {
await log.setup({
handlers: {
console: new log.handlers.ConsoleHandler('DEBUG', {
formatter: (record) => {
const timestamp = new Date(record.datetime).toISOString();
const level = record.levelName.padEnd(7);
const logger = record.loggerName.padEnd(20);
return `${timestamp} ${level} ${logger} ${record.msg}`;
},
useColors: true
})
},
loggers: {
'harmony.lookup': {
level: 'INFO',
handlers: ['console']
},
'harmony.mbid': {
level: 'DEBUG',
handlers: ['console']
},
'harmony.provider': {
level: 'INFO',
handlers: ['console']
},
'harmony.server': {
level: 'INFO',
handlers: ['console']
},
'requests': {
level: 'INFO',
handlers: ['console']
}
}
});
}
```
### Logger Usage
**Get logger**:
```typescript
import * as log from '@std/log/mod.ts';
const logger = log.getLogger('harmony.provider');
```
**Log levels**:
```typescript
logger.debug('Debug message');
logger.info('Info message');
logger.warning('Warning message');
logger.error('Error message');
logger.critical('Critical message');
```
**Interpolated messages** (template literals rather than structured fields):
```typescript
logger.info(`Fetching album ${albumId} from ${providerName}`);
logger.warning(`Rate limit exceeded, retrying after ${retryAfter}s`);
logger.error(`Provider ${providerName} failed: ${error.message}`);
```
### Color Formatting
**Console output** (with ANSI colors):
```
2024-01-01T12:00:00.000Z INFO    harmony.lookup       Looking up GTIN 0602537347377
2024-01-01T12:00:00.123Z INFO    harmony.provider     Spotify: Fetching album 3DiDSNVBRYVzccLn2yqhMJ
2024-01-01T12:00:00.456Z DEBUG   harmony.provider     Spotify: Using cached response
2024-01-01T12:00:00.789Z WARNING harmony.provider     iTunes: Rate limit exceeded
2024-01-01T12:00:01.234Z INFO    harmony.lookup       Merge complete: 3 providers
```
**Color scheme**:
- DEBUG: Gray
- INFO: Blue
- WARNING: Yellow
- ERROR: Red
- CRITICAL: Red + bold
## Error Handling
### Error Hierarchy
**File**: `utils/errors.ts`
```typescript
// Base error
export class LookupError extends Error {
constructor(message: string) {
super(message);
this.name = 'LookupError';
}
}
// Provider errors
export class ProviderError extends LookupError {
constructor(
public provider: string,
message: string
) {
super(`${provider}: ${message}`);
this.name = 'ProviderError';
}
}
// HTTP/API errors
export class ResponseError extends ProviderError {
constructor(
provider: string,
public status: number,
message: string
) {
super(provider, `HTTP ${status}: ${message}`);
this.name = 'ResponseError';
}
}
// Data compatibility errors
export class CompatibilityError extends LookupError {
constructor(
public property: string,
public values: any[]
) {
super(`Incompatible values for ${property}: ${JSON.stringify(values)}`);
this.name = 'CompatibilityError';
}
}
// Cache errors
export class CacheMissError extends LookupError {
constructor(
public key: string
) {
super(`Cache miss for key: ${key}`);
this.name = 'CacheMissError';
}
}
```
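Because the classes form an inheritance chain, `instanceof` checks must run from most specific to least specific. The sketch below re-declares three of the classes so that it is self-contained, then maps an error to a user-facing message. `describeError` and its message wording are hypothetical.

```typescript
class LookupError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'LookupError';
  }
}

class ProviderError extends LookupError {
  provider: string;
  constructor(provider: string, message: string) {
    super(`${provider}: ${message}`);
    this.name = 'ProviderError';
    this.provider = provider;
  }
}

class ResponseError extends ProviderError {
  status: number;
  constructor(provider: string, status: number, message: string) {
    super(provider, `HTTP ${status}: ${message}`);
    this.name = 'ResponseError';
    this.status = status;
  }
}

// Check the most specific class first: a ResponseError is also a ProviderError and a LookupError.
function describeError(error: unknown): string {
  if (error instanceof ResponseError) {
    return error.status === 429
      ? `${error.provider} is rate limiting requests`
      : `${error.provider} returned HTTP ${error.status}`;
  }
  if (error instanceof ProviderError) return `${error.provider} lookup failed`;
  if (error instanceof LookupError) return `Lookup failed: ${error.message}`;
  return 'Unexpected error';
}
```

Reversing the order of the checks would route every `ResponseError` into the generic `LookupError` branch and lose the status code.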
### Error Handling Patterns
#### Graceful Degradation
```typescript
// Use Promise.allSettled for parallel provider queries
const lookupPromises = providers.map((provider) =>
  provider.lookup(input).catch((error) => {
    logger.warning(`Provider ${provider.name} failed: ${error.message}`);
    return null; // Return null on error
  })
);
const results = await Promise.allSettled(lookupPromises);

// Filter successful results; the type predicate lets TypeScript narrow to `.value`
const releases = results
  .filter((r): r is PromiseFulfilledResult<Release> => r.status === 'fulfilled' && r.value !== null)
  .map((r) => r.value);

if (releases.length === 0) {
  throw new LookupError('All providers failed');
}
```
#### Rate Limit Handling
```typescript
async function fetchWithRetry(url: string, maxRetries = 3): Promise<Response> {
for (let attempt = 0; attempt < maxRetries; attempt++) {
const response = await fetch(url);
if (response.status === 429) {
// Rate limit exceeded
// Fall back to 60s if the header is missing or is not a plain number of seconds
const retryAfter = parseInt(response.headers.get('Retry-After') ?? '60', 10) || 60;
if (retryAfter > 300) {
// Don't wait more than 5 minutes
throw new ResponseError('provider', 429, `Rate limit exceeded, retry after ${retryAfter}s (too long)`);
}
logger.warning(`Rate limit exceeded, retrying after ${retryAfter}s`);
await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
continue;
}
if (!response.ok) {
throw new ResponseError('provider', response.status, response.statusText);
}
return response;
}
throw new ResponseError('provider', 429, 'Rate limit exceeded after max retries');
}
```
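The `Retry-After` header may carry either a number of seconds or an HTTP date, and `parseInt` alone only handles the first form. A small helper that accepts both is sketched below; `parseRetryAfter` is a hypothetical name and not part of Harmony.

```typescript
// Returns the wait time in seconds, or undefined if the header is missing or unparseable.
function parseRetryAfter(header: string | null, now: number = Date.now()): number | undefined {
  if (header === null) return undefined;
  const trimmed = header.trim();

  // Form 1: a non-negative integer number of seconds.
  if (/^\d+$/.test(trimmed)) return parseInt(trimmed, 10);

  // Form 2: an HTTP date; convert to seconds from now, never negative.
  const date = Date.parse(trimmed);
  if (Number.isNaN(date)) return undefined;
  return Math.max(0, Math.ceil((date - now) / 1000));
}
```

Returning `undefined` instead of a default lets the caller decide on the fallback, for example the 60 seconds used in `fetchWithRetry`.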
#### Error Propagation
```typescript
try {
  const release = await provider.lookup(input);
  return provider.harmonize(release);
} catch (error) {
  if (error instanceof ProviderError) {
    // Log and re-throw provider errors
    logger.error(error.message);
    throw error;
  } else {
    // Wrap unexpected errors; `error` is `unknown` under strict mode
    const message = error instanceof Error ? error.message : String(error);
    throw new ProviderError(provider.name, message);
  }
}
```
## Testing Infrastructure
### Test Framework
**Deno built-in testing** + `@std/testing`:
```typescript
import { assertEquals, assertExists } from '@std/assert/mod.ts';
import { describe, it } from '@std/testing/bdd.ts';
```
### Test Structure
**38 test files** organized by module:
```
tests/
├── providers/
│ ├── spotify_test.ts
│ ├── deezer_test.ts
│ ├── itunes_test.ts
│ ├── tidal_test.ts
│ ├── musicbrainz_test.ts
│ ├── bandcamp_test.ts
│ ├── beatport_test.ts
│ ├── mora_test.ts
│ └── ototoy_test.ts
├── harmonizer/
│ ├── merge_test.ts
│ ├── compatibility_test.ts
│ ├── deduplicate_test.ts
│ ├── isrc_test.ts
│ ├── language_script_test.ts
│ ├── release_label_test.ts
│ ├── release_types_test.ts
│ └── tracklist_gap_test.ts
└── musicbrainz/
├── seeding_test.ts
├── mbid_mapping_test.ts
├── annotation_test.ts
└── edit_link_test.ts
```
### Declarative Provider Tests
**File**: `tests/utils/describe_provider.ts`
**Purpose**: Consistent provider testing with minimal boilerplate
**Usage**:
```typescript
import { describeProvider } from '../utils/describe_provider.ts';
describeProvider({
name: 'Spotify',
provider: new SpotifyProvider(),
tests: {
urlMatching: [
{ url: 'https://open.spotify.com/album/3DiDSNVBRYVzccLn2yqhMJ', shouldMatch: true },
{ url: 'https://www.deezer.com/album/123456', shouldMatch: false }
],
gtinLookup: {
gtin: '0602537347377',
expectedTitle: 'Album Title',
expectedArtists: ['Artist Name']
},
urlLookup: {
url: 'https://open.spotify.com/album/3DiDSNVBRYVzccLn2yqhMJ',
expectedTitle: 'Album Title'
},
harmonization: {
input: spotifyAlbumFixture,
expectedFields: ['title', 'artists', 'gtin', 'media', 'images']
}
}
});
```
**Generated tests**:
- URL pattern matching
- GTIN lookup
- URL lookup
- Harmonization
- Feature quality validation
### Snapshot Testing
**Purpose**: Verify output stability across changes
**Example**:
```typescript
import { assertSnapshot } from '@std/testing/snapshot.ts';
Deno.test('Spotify harmonization snapshot', async (t) => {
const provider = new SpotifyProvider();
const spotifyAlbum = await loadFixture('spotify/album.json');
const harmonyRelease = provider.harmonize(spotifyAlbum);
await assertSnapshot(t, harmonyRelease);
});
```
**Snapshot file** (auto-generated):
```typescript
// __snapshots__/spotify_test.ts.snap
export const snapshot = {
"Spotify harmonization snapshot": {
title: "Album Title",
artists: [{ name: "Artist Name" }],
gtin: "0602537347377",
// ... full object
}
};
```
### Offline Testing
**Test data**: 43 cached responses in `testdata/`
**Structure**:
```
testdata/
├── spotify/
│ ├── album_3DiDSNVBRYVzccLn2yqhMJ.json
│ ├── album_search_upc_0602537347377.json
│ └── ...
├── deezer/
│ ├── album_123456.json
│ └── ...
├── itunes/
│ ├── lookup_us_123456.json
│ └── ...
└── ...
```
**Loading fixtures**:
```typescript
async function loadFixture(path: string): Promise<any> {
const content = await Deno.readTextFile(`testdata/${path}`);
return JSON.parse(content);
}
```
**Offline mode** (default):
```bash
deno test -A
```
Uses cached responses from `testdata/`, no network requests.
**Download mode** (fetch fresh data):
```bash
deno test -A -- --download
```
Fetches fresh responses from providers and updates `testdata/`.
### Test Coverage
**Run tests with coverage**:
```bash
deno test -A --coverage=coverage
deno coverage coverage
```
**Coverage report**:
```
file:///opt/harmony/providers/spotify.ts 95.2%
file:///opt/harmony/harmonizer/merge.ts 88.7%
file:///opt/harmony/musicbrainz/seeding.ts 92.3%
...
```
## Code Style
### Formatting Rules
**File**: `deno.json`
```json
{
"fmt": {
"useTabs": true,
"lineWidth": 120,
"indentWidth": 4,
"singleQuote": true,
"proseWrap": "preserve"
}
}
```
**Rules**:
- **Tabs**: Use tabs for indentation (not spaces)
- **Line width**: 120 characters maximum
- **Quotes**: Single quotes for strings
- **Semicolons**: Required
- **Trailing commas**: Allowed
**Format code**:
```bash
deno fmt
```
**Check formatting**:
```bash
deno fmt --check
```
### Linting Rules
**File**: `deno.json`
```json
{
"lint": {
"rules": {
"tags": ["recommended"],
"exclude": ["no-explicit-any"]
}
}
}
```
**Lint code**:
```bash
deno lint
```
**Common lint errors**:
- Unused variables
- Missing return types
- Unreachable code
- Prefer `const` over `let`
### Type Checking
**Strict mode** enabled:
```json
{
"compilerOptions": {
"strict": true,
"noImplicitAny": true,
"strictNullChecks": true,
"strictFunctionTypes": true
}
}
```
**Type check**:
```bash
deno check **/*.ts
```
## Dependency Management
### deno.json
**Import map**:
```json
{
"imports": {
"$fresh/": "https://deno.land/x/fresh@1.6.8/",
"preact": "https://esm.sh/preact@10.19.6",
"preact/": "https://esm.sh/preact@10.19.6/",
"@preact/signals": "https://esm.sh/@preact/signals@1.2.2",
"@kellnerd/musicbrainz": "https://deno.land/x/musicbrainz@v0.5.0/mod.ts",
"snap-storage": "https://deno.land/x/snap_storage@v0.2.0/mod.ts",
"@std/": "https://deno.land/std@0.208.0/"
}
}
```
**Key dependencies**:
| Dependency | Version | Purpose |
|------------|---------|---------|
| Fresh | 1.6.8 | Web framework |
| Preact | 10.19.6 | UI library |
| @kellnerd/musicbrainz | 0.5.0 | MusicBrainz API client |
| snap-storage | 0.2.0 | HTTP response caching |
| @std/* | 0.208.0 | Deno standard library |
### Lock File
**deno.lock**: Dependency integrity verification
**Update lock file**:
```bash
deno cache --reload --lock=deno.lock --lock-write deps.ts
```
## Tasks
### deno.json Tasks
```json
{
"tasks": {
"check": "deno fmt --check && deno lint && deno check **/*.ts",
"ok": "deno fmt && deno lint && deno check **/*.ts && deno test -A",
"cli": "deno run -A cli.ts",
"dev": "deno run -A --watch=static/,routes/ server/dev.ts",
"build": "deno run -A server/dev.ts build",
"server": "DENO_DEPLOYMENT_ID=$(git describe --tags --always) deno run -A server/main.ts"
}
}
```
**Task descriptions**:
| Task | Purpose | Usage |
|------|---------|-------|
| `check` | Verify code quality (format, lint, type check) | `deno task check` |
| `ok` | Format, lint, check, and test | `deno task ok` |
| `cli` | Run CLI | `deno task cli --gtin 0602537347377` |
| `dev` | Start development server | `deno task dev` |
| `build` | Build static assets | `deno task build` |
| `server` | Start production server | `deno task server` |
## No External Tooling
Harmony **does not use**:
- **Sentry**: No error tracking
- **Prometheus**: No metrics collection
- **Datadog/New Relic**: No APM
- **Webpack/Vite**: Fresh handles bundling
- **ESLint**: Deno lint built-in
- **Prettier**: Deno fmt built-in
- **Jest/Mocha**: Deno test built-in
**Rationale**: Deno provides all necessary tooling out-of-the-box.
## Performance Optimizations
### Parallel Provider Queries
```typescript
const lookups = providers.map(p => p.lookup(input));
const results = await Promise.allSettled(lookups);
```
**Benefit**: Reduce total response time from sum of provider latencies to max of provider latencies.
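As a worked example of the sum-versus-max claim, the following uses made-up latencies for three providers; the numbers are illustrative, not measurements.

```typescript
// Hypothetical per-provider latencies in milliseconds.
const latenciesMs: Record<string, number> = { Spotify: 300, Deezer: 450, iTunes: 1200 };
const latencyValues = Object.values(latenciesMs);

// Sequential queries: each lookup waits for the previous one to finish.
const sequentialMs = latencyValues.reduce((sum, ms) => sum + ms, 0);

// Parallel queries: total time is bounded by the slowest provider.
const parallelMs = Math.max(...latencyValues);

console.log(`sequential: ${sequentialMs} ms, parallel: ${parallelMs} ms`);
```

With these numbers the sequential total is 1950 ms and the parallel total is 1200 ms, so the slowest provider determines the response time.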
### HTTP Response Caching
```typescript
const cached = await cache.get(url);
if (cached) return cached;
const response = await fetch(url);
// Store a clone so the body of the returned response is still readable
await cache.set(url, response.clone());
return response;
```
**Benefit**: Avoid redundant API calls, comply with rate limits.
### OAuth2 Token Caching
```typescript
const cached = localStorage.getItem('spotify_token');
if (cached) {
  const token = JSON.parse(cached); // localStorage values are strings
  if (!isExpired(token)) {
    return token.access_token;
  }
}
```
**Benefit**: Reduce token requests, faster authentication.
### Server-Side Rendering
Fresh SSR generates HTML on server, reducing client-side JavaScript.
**Benefit**: Faster initial page load, better SEO.
### Islands Architecture
Only interactive components load JavaScript on client.
**Benefit**: Minimal JavaScript bundle size, faster page interactivity.
## Summary
Harmony's codebase demonstrates:
1. **Clean architecture**: Clear separation of concerns (providers, harmonizer, MusicBrainz)
2. **Type safety**: Full TypeScript coverage with strict mode
3. **Comprehensive testing**: 38 test files with declarative provider specs
4. **Offline testing**: 43 cached responses for reproducible tests
5. **Logging system**: 5 specialized loggers with color formatting
6. **Error hierarchy**: Structured error handling with graceful degradation
7. **Configuration management**: Environment variables with validation
8. **Code quality**: Deno fmt, lint, and type check enforced
9. **No external tooling**: Deno provides all necessary tools
10. **Performance optimizations**: Parallel queries, caching, SSR, islands
These properties make the codebase a useful reference for building type-safe, well-tested metadata aggregation systems.
+955
@@ -0,0 +1,955 @@
# Harmony - Data Model and Storage Analysis
## Storage Philosophy
Harmony employs a **cache-first, no-database** architecture:
- **No traditional database**: No PostgreSQL, MySQL, MongoDB, etc.
- **No persistent user data**: No accounts, no saved searches, no user-generated content
- **Cache as storage**: HTTP response caching via `snap_storage` library
- **In-memory processing**: All data transformations happen in memory
- **Stateless design**: Each request is independent
This approach prioritizes:
- **Simplicity**: No database migrations, no schema evolution
- **Reproducibility**: Permalink system enables exact result replay
- **API compliance**: Caching reduces provider API calls
- **Deployment ease**: No database server required
## Persistence Layer: snap_storage
### Overview
`snap_storage` is a Deno library for HTTP response caching with a SQLite backend.
**Repository**: https://github.com/kellnerd/snap-storage (same author as Harmony)
**Purpose**: Store HTTP responses with timestamps for later retrieval
### Storage Structure
#### SQLite Database: `snaps.db`
**Location**: `${HARMONY_DATA_DIR}/snaps.db` (default: `./snaps.db`)
**Schema** (conceptual):
```sql
CREATE TABLE snaps (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  key TEXT NOT NULL,
  url TEXT NOT NULL,
  timestamp INTEGER NOT NULL,
  status INTEGER NOT NULL,
  headers TEXT NOT NULL,
  body_path TEXT NOT NULL,
  created_at INTEGER NOT NULL,
  UNIQUE (key, timestamp) -- several snapshots per key, one per request time
);
CREATE INDEX idx_snaps_key ON snaps(key);
CREATE INDEX idx_snaps_timestamp ON snaps(timestamp);
CREATE INDEX idx_snaps_url ON snaps(url);
```
**Fields**:
- `key`: Cache key (hash of URL + parameters)
- `url`: Original request URL
- `timestamp`: Unix timestamp of request
- `status`: HTTP status code
- `headers`: JSON-encoded response headers
- `body_path`: Path to response body file in `snaps/` directory
- `created_at`: Record creation timestamp
#### File Directory: `snaps/`
**Location**: `${HARMONY_DATA_DIR}/snaps/` (default: `./snaps/`)
**Structure**:
```
snaps/
├── 0a/
│ ├── 0a1b2c3d4e5f6g7h8i9j.json
│ └── 0a9f8e7d6c5b4a3.json
├── 1b/
│ └── 1b2c3d4e5f6g7h8i9j0a.json
└── ...
```
**File naming**: First 2 characters of hash as directory, full hash as filename
**File content**: Raw HTTP response body (JSON, HTML, XML, etc.)
### Cache Operations
#### Store Response
```typescript
interface CacheEntry {
  url: string;
  timestamp: number;
  response: Response;
}

async function storeResponse(entry: CacheEntry): Promise<void> {
  const key = await hashUrl(entry.url); // hashUrl is async (Web Crypto)
  const bodyDir = `snaps/${key.slice(0, 2)}`;
  const bodyPath = `${bodyDir}/${key}.json`;

  // Store body to file, creating the two-character prefix directory if needed
  await Deno.mkdir(bodyDir, { recursive: true });
  await Deno.writeTextFile(bodyPath, await entry.response.text());

  // Store metadata to database
  await db.execute(`
    INSERT INTO snaps (key, url, timestamp, status, headers, body_path, created_at)
    VALUES (?, ?, ?, ?, ?, ?, ?)
  `, [
    key,
    entry.url,
    entry.timestamp,
    entry.response.status,
    JSON.stringify(Object.fromEntries(entry.response.headers)),
    bodyPath,
    Date.now()
  ]);
}
```
#### Retrieve Response
```typescript
async function getResponse(url: string, timestamp?: number): Promise<Response | null> {
  const key = await hashUrl(url); // hashUrl is async (Web Crypto)
  let query = `SELECT * FROM snaps WHERE key = ?`;
  const params: (string | number)[] = [key];

  if (timestamp !== undefined) {
    // Permalink mode: exact timestamp match
    query += ` AND timestamp = ?`;
    params.push(timestamp);
  } else {
    // Normal mode: most recent within cache duration
    const maxAge = 24 * 60 * 60 * 1000; // 24 hours
    query += ` AND created_at > ? ORDER BY created_at DESC LIMIT 1`;
    params.push(Date.now() - maxAge);
  }

  const row = await db.queryOne(query, params);
  if (!row) return null;

  // Read body from file
  const body = await Deno.readTextFile(row.body_path);

  // Reconstruct Response object
  return new Response(body, {
    status: row.status,
    headers: JSON.parse(row.headers)
  });
}
```
### Cache Policy
#### Default Policy
- **Duration**: 24 hours
- **Eviction**: No automatic eviction (manual cleanup required)
- **Size limit**: No enforced limit (grows indefinitely)
#### Permalink Policy
- **Duration**: Indefinite (never evicted)
- **Purpose**: Enable reproducible results
- **Lookup**: Exact timestamp match
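The difference between the two policies can be shown on an in-memory list of snapshots. This sketch assumes a single millisecond `timestamp` per snapshot and uses hypothetical names (`Snapshot`, `selectSnapshot`); the actual lookup runs against the SQLite table.

```typescript
interface Snapshot {
  key: string;
  timestamp: number; // ms timestamp of the request (assumed unit)
}

interface SelectOptions {
  timestamp?: number; // set for permalink lookups
  now?: number;
  maxAgeMs?: number;
}

function selectSnapshot(snapshots: Snapshot[], key: string, options: SelectOptions = {}): Snapshot | undefined {
  const candidates = snapshots.filter((s) => s.key === key);

  if (options.timestamp !== undefined) {
    // Permalink policy: exact timestamp match, regardless of age.
    return candidates.find((s) => s.timestamp === options.timestamp);
  }

  // Default policy: newest snapshot that is still within the cache duration.
  const now = options.now ?? Date.now();
  const maxAgeMs = options.maxAgeMs ?? 24 * 60 * 60 * 1000;
  return candidates
    .filter((s) => now - s.timestamp <= maxAgeMs)
    .sort((a, b) => b.timestamp - a.timestamp)[0];
}
```

A permalink lookup can return a snapshot that the default policy would treat as stale, which is what allows an old result page to be reproduced exactly.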
#### Cache Key Generation
```typescript
// `crypto.subtle.digest` is asynchronous, so the function must be async
async function hashUrl(url: string): Promise<string> {
  // Normalize URL
  const normalized = new URL(url);
  normalized.searchParams.sort(); // Consistent parameter order

  // Hash normalized URL
  const encoder = new TextEncoder();
  const data = encoder.encode(normalized.toString());
  const hashBuffer = await crypto.subtle.digest('SHA-256', data);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map((b) => b.toString(16).padStart(2, '0')).join('');
}
```
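The normalization step is what makes two equivalent requests share one cache entry. The short sketch below isolates it from the hashing; `normalizeUrl` is a hypothetical helper name.

```typescript
// Returns a canonical string form of a URL with query parameters in sorted order.
function normalizeUrl(url: string): string {
  const normalized = new URL(url);
  normalized.searchParams.sort(); // stable sort by parameter name
  return normalized.toString();
}

const first = normalizeUrl('https://api.deezer.com/album?b=2&a=1');
const second = normalizeUrl('https://api.deezer.com/album?a=1&b=2');
console.log(first === second); // both normalize to the same string
```

Parsing with `URL` also lowercases the scheme and host, so differences in letter case there do not produce separate cache keys either.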
### Cache Management
#### Manual Cleanup
No automatic cleanup. Users must manually delete old cache entries:
```bash
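# Delete cache entries older than 30 days (created_at is in milliseconds, hence the appended 000; `date -d` is GNU syntax)
sqlite3 snaps.db "DELETE FROM snaps WHERE created_at < $(date -d '30 days ago' +%s)000"
# Delete body files not modified in 30 days (approximates the files orphaned by the query above)
find snaps/ -type f -mtime +30 -delete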
```
#### Cache Statistics
```bash
# Total cache entries
sqlite3 snaps.db "SELECT COUNT(*) FROM snaps"
# Cache size
du -sh snaps/
# Entries per request URL
sqlite3 snaps.db "SELECT url, COUNT(*) FROM snaps GROUP BY url"
```
## MBID Cache
### Purpose
Cache MusicBrainz ID (MBID) mappings for external URLs to avoid repeated API calls.
### Storage Location
- **Development**: `localStorage` (persistent across sessions)
- **Production**: `sessionStorage` (cleared on browser close)
**Rationale**: Development benefits from persistent cache, production prioritizes fresh data.
### Cache Structure
```typescript
interface MBIDCache {
[externalUrl: string]: MBIDCacheEntry;
}
interface MBIDCacheEntry {
mbid: string;
type: 'release' | 'release-group' | 'recording' | 'artist' | 'label';
cached: number; // Unix timestamp
}
```
### Cache Operations
#### Store MBID Mapping
```typescript
function cacheMBID(url: string, mbid: string, type: MBIDCacheEntry['type']): void {
const cache = getMBIDCache();
cache[url] = {
mbid,
type,
cached: Date.now()
};
setMBIDCache(cache);
}
function getMBIDCache(): MBIDCache {
const storage = DENO_DEPLOYMENT_ID ? sessionStorage : localStorage;
const cached = storage.getItem('harmony_mbid_cache');
return cached ? JSON.parse(cached) : {};
}
function setMBIDCache(cache: MBIDCache): void {
const storage = DENO_DEPLOYMENT_ID ? sessionStorage : localStorage;
storage.setItem('harmony_mbid_cache', JSON.stringify(cache));
}
```
#### Retrieve MBID Mapping
```typescript
function getCachedMBID(url: string): MBIDCacheEntry | null {
const cache = getMBIDCache();
const entry = cache[url];
if (!entry) return null;
// Check if cache is stale (24 hours)
const maxAge = 24 * 60 * 60 * 1000;
if (Date.now() - entry.cached > maxAge) {
delete cache[url];
setMBIDCache(cache);
return null;
}
return entry;
}
```
#### Batch MBID Lookup
The MusicBrainz API supports batch URL lookups (up to 100 URLs per request):
```typescript
async function resolveMBIDs(urls: string[]): Promise<Map<string, MBIDCacheEntry>> {
  const results = new Map<string, MBIDCacheEntry>();
  // Check cache first
  const uncached: string[] = [];
  for (const url of urls) {
    const cached = getCachedMBID(url);
    if (cached) {
      results.set(url, cached);
    } else {
      uncached.push(url);
    }
  }
  // Batch lookup uncached URLs (100 at a time)
  for (let i = 0; i < uncached.length; i += 100) {
    const batch = uncached.slice(i, i + 100);
    const params = batch.map(url => `resource=${encodeURIComponent(url)}`).join('&');
    // Request JSON explicitly (the API defaults to XML) and include release relationships
    const response = await fetch(`https://musicbrainz.org/ws/2/url?${params}&inc=release-rels&fmt=json`);
    if (!response.ok) continue; // Skip batches that fail (non-2xx status)
    const data = await response.json();
    // A multi-URL lookup is expected to return `urls`; fall back to a single URL entity otherwise
    for (const urlData of data.urls ?? [data]) {
      // Use the first relationship that points to a release
      const relation = urlData.relations?.find((r: { release?: { id: string } }) => r.release);
      const mbid = relation?.release?.id;
      if (mbid) {
        cacheMBID(urlData.resource, mbid, 'release');
        results.set(urlData.resource, { mbid, type: 'release', cached: Date.now() });
      }
    }
  }
  return results;
}
```
## Core Data Model: HarmonyRelease
### Schema Definition
**Location**: `harmonizer/types.ts` (273 lines)
**Full Interface**:
```typescript
interface HarmonyRelease {
// ===== Basic Metadata =====
title: string;
artists: ArtistCreditName[];
gtin?: string; // Global Trade Item Number (barcode)
// ===== Media and Tracks =====
media: HarmonyMedium[];
// ===== Release Details =====
language?: string; // ISO 639-3 code
script?: string; // ISO 15924 code
status?: ReleaseStatus;
types: ReleaseType[];
releaseDate?: PartialDate;
// ===== Commercial Information =====
labels: Label[];
packaging?: PackagingType;
copyright?: string;
// ===== Distribution =====
availableIn?: string[]; // ISO 3166-1 alpha-2 country codes
excludedFrom?: string[]; // ISO 3166-1 alpha-2 country codes
// ===== Visual Assets =====
images: Image[];
// ===== External Links =====
externalLinks: ExternalLink[];
// ===== Metadata About Metadata =====
info: ReleaseInfo;
}
```
### Sub-Structures
#### ArtistCreditName
```typescript
interface ArtistCreditName {
  name: string;          // Artist name
  creditedName?: string; // Name as credited on the release, if it differs from the artist name
  joinPhrase?: string;   // Separator (e.g., " & ", " feat. ", " vs. ")
  mbid?: string;         // MusicBrainz artist ID
}
```
**Example**:
```typescript
[
{ name: "Artist A", joinPhrase: " & " },
{ name: "Artist B", joinPhrase: " feat. " },
{ name: "Artist C", creditedName: "Artist C (DJ Set)" }
]
```
**Rendering**: "Artist A & Artist B feat. Artist C (DJ Set)"
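
The rendering rule above can be sketched as a small function. This is an illustration only: `renderArtistCredit` is a hypothetical helper name, and it assumes that `creditedName` takes precedence over `name` when both are present.

```typescript
interface ArtistCreditName {
  name: string;
  creditedName?: string;
  joinPhrase?: string;
  mbid?: string;
}

// Concatenate each credited name with the join phrase that follows it
function renderArtistCredit(credits: ArtistCreditName[]): string {
  return credits
    .map((credit) => (credit.creditedName ?? credit.name) + (credit.joinPhrase ?? ''))
    .join('');
}

const rendered = renderArtistCredit([
  { name: 'Artist A', joinPhrase: ' & ' },
  { name: 'Artist B', joinPhrase: ' feat. ' },
  { name: 'Artist C', creditedName: 'Artist C (DJ Set)' },
]);
// rendered === "Artist A & Artist B feat. Artist C (DJ Set)"
```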
#### HarmonyMedium
```typescript
interface HarmonyMedium {
title?: string; // Medium title (e.g., "Disc 1: The Album")
format?: MediumFormat;
position: number; // 1-indexed
tracks: HarmonyTrack[];
}
enum MediumFormat {
CD = 'CD',
Vinyl = 'Vinyl',
Digital = 'Digital Media',
Cassette = 'Cassette',
DVD = 'DVD',
BluRay = 'Blu-ray',
Other = 'Other'
}
```
#### HarmonyTrack
```typescript
interface HarmonyTrack {
title: string;
artists?: ArtistCreditName[]; // Track-specific artists (overrides release artists)
position: number; // 1-indexed within medium
length?: number; // Duration in milliseconds
isrc?: string; // International Standard Recording Code
}
```
**Example**:
```typescript
{
title: "Track Title",
artists: [{ name: "Track Artist" }],
position: 1,
length: 245000, // 4:05
isrc: "USRC17607839"
}
```
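
Because `length` is stored in milliseconds, displaying it requires a conversion. A minimal sketch, assuming a hypothetical `formatDuration` helper that rounds to the nearest second:

```typescript
// Convert a track length in milliseconds to a "m:ss" display string
function formatDuration(lengthMs: number): string {
  const totalSeconds = Math.round(lengthMs / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds.toString().padStart(2, '0')}`;
}

const formatted = formatDuration(245000); // "4:05"
```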
#### Label
```typescript
interface Label {
name: string;
catalogNumber?: string;
mbid?: string; // MusicBrainz label ID
}
```
**Example**:
```typescript
[
{ name: "Record Label", catalogNumber: "RL-12345" },
{ name: "Distributor", catalogNumber: "DIST-67890" }
]
```
#### Image
```typescript
interface Image {
url: string;
types: ImageType[];
width?: number;
height?: number;
comment?: string;
}
enum ImageType {
Front = 'front',
Back = 'back',
Medium = 'medium',
Tray = 'tray',
Booklet = 'booklet',
Obi = 'obi',
Spine = 'spine',
Track = 'track',
Liner = 'liner',
Sticker = 'sticker',
Poster = 'poster',
Watermark = 'watermark',
Raw = 'raw',
Unedited = 'unedited'
}
```
**Example**:
```typescript
[
{
url: "https://i.scdn.co/image/ab67616d0000b273...",
types: [ImageType.Front],
width: 2000,
height: 2000
},
{
url: "https://e-cdn-images.dzcdn.net/images/cover/...",
types: [ImageType.Front],
width: 1400,
height: 1400,
comment: "Deezer cover"
}
]
```
#### ExternalLink
```typescript
interface ExternalLink {
url: string;
types: LinkType[];
}
enum LinkType {
Streaming = 'streaming',
Purchase = 'purchase',
Download = 'download',
License = 'license',
Crowdfunding = 'crowdfunding',
Other = 'other'
}
```
**Example**:
```typescript
[
{
url: "https://open.spotify.com/album/xyz",
types: [LinkType.Streaming]
},
{
url: "https://bandcamp.com/album/xyz",
types: [LinkType.Streaming, LinkType.Purchase]
}
]
```
#### ReleaseInfo
```typescript
interface ReleaseInfo {
providers: string[]; // Provider names that contributed data
messages: Message[]; // Warnings, errors, info messages
sourceMap?: SourceMap; // Property -> provider mapping (only in MergedHarmonyRelease)
incompatibleData?: IncompatibilityInfo; // Conflicts (only in MergedHarmonyRelease)
}
interface Message {
level: 'error' | 'warning' | 'info';
text: string;
provider?: string;
}
```
**Example**:
```typescript
{
providers: ["spotify", "deezer", "itunes"],
messages: [
{
level: "warning",
text: "Release date conflict: Spotify (2014-11-24) vs iTunes (2014-11-25)",
provider: "itunes"
},
{
level: "info",
text: "Using Spotify value (higher preference)"
}
]
}
```
### Enumerations
#### ReleaseStatus
```typescript
enum ReleaseStatus {
Official = 'official',
Promotion = 'promotion',
Bootleg = 'bootleg',
PseudoRelease = 'pseudo-release'
}
```
#### ReleaseType
```typescript
enum ReleaseType {
// Primary types
Album = 'album',
Single = 'single',
EP = 'ep',
Broadcast = 'broadcast',
Other = 'other',
// Secondary types
Compilation = 'compilation',
Soundtrack = 'soundtrack',
Spokenword = 'spokenword',
Interview = 'interview',
Audiobook = 'audiobook',
AudioDrama = 'audio drama',
Live = 'live',
Remix = 'remix',
DJMix = 'dj-mix',
Mixtape = 'mixtape',
Demo = 'demo',
FieldRecording = 'field recording'
}
```
**Usage**: Array of types (primary + secondary)
```typescript
types: [ReleaseType.Album, ReleaseType.Live] // Live album
types: [ReleaseType.EP, ReleaseType.Remix] // Remix EP
```
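
Since primary and secondary types share one array, consumers need a way to tell them apart. The sketch below assumes a hypothetical `isPrimaryType` predicate based on the five primary values listed in the enum above, and works on the plain string values rather than the enum itself.

```typescript
// Primary type values as listed in the ReleaseType enum above
const PRIMARY_TYPES = new Set<string>(['album', 'single', 'ep', 'broadcast', 'other']);

function isPrimaryType(type: string): boolean {
  return PRIMARY_TYPES.has(type);
}

// Split a mixed type list into the first primary type and all secondary types
function splitReleaseTypes(types: string[]): { primary: string | undefined; secondary: string[] } {
  return {
    primary: types.find(isPrimaryType),
    secondary: types.filter((type) => !isPrimaryType(type)),
  };
}

const split = splitReleaseTypes(['album', 'live']);
// split.primary === 'album', split.secondary contains only 'live'
```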
#### PackagingType
```typescript
enum PackagingType {
JewelCase = 'jewel case',
SlimJewelCase = 'slim jewel case',
Digipak = 'digipak',
Cardboard = 'cardboard/paper sleeve',
KeepCase = 'keep case',
None = 'none',
Other = 'other'
}
```
#### PartialDate
```typescript
interface PartialDate {
year: number;
month?: number; // 1-12
day?: number; // 1-31
}
```
**Examples**:
```typescript
{ year: 2014 } // Year only
{ year: 2014, month: 11 } // Year and month
{ year: 2014, month: 11, day: 24 } // Full date
```
**Serialization**:
```typescript
function serializePartialDate(date: PartialDate): string {
let result = date.year.toString();
if (date.month) {
result += `-${date.month.toString().padStart(2, '0')}`;
if (date.day) {
result += `-${date.day.toString().padStart(2, '0')}`;
}
}
return result;
}
// Examples:
// { year: 2014 } -> "2014"
// { year: 2014, month: 11 } -> "2014-11"
// { year: 2014, month: 11, day: 24 } -> "2014-11-24"
```
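
The inverse direction, turning a provider's date string into a `PartialDate`, might look like the following sketch. The `parsePartialDate` name and the decision to return `undefined` for unrecognized input are assumptions for illustration.

```typescript
interface PartialDate {
  year: number;
  month?: number; // 1-12
  day?: number; // 1-31
}

// Parse "YYYY", "YYYY-MM", or "YYYY-MM-DD"; anything else yields undefined
function parsePartialDate(value: string): PartialDate | undefined {
  const match = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/.exec(value);
  if (!match) return undefined;
  const date: PartialDate = { year: Number(match[1]) };
  if (match[2] !== undefined) date.month = Number(match[2]);
  if (match[3] !== undefined) date.day = Number(match[3]);
  return date;
}

const yearMonth = parsePartialDate('2014-11');
const fullDate = parsePartialDate('2014-11-24');
```

This only checks the shape of the string; range checks such as the number of days in a month are left to a separate validation step.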
## MergedHarmonyRelease
Extends `HarmonyRelease` with merge metadata.
```typescript
interface MergedHarmonyRelease extends HarmonyRelease {
info: ReleaseInfo & {
sourceMap: SourceMap;
incompatibleData?: IncompatibilityInfo;
};
}
interface SourceMap {
[propertyPath: string]: string; // Property path -> provider name
}
interface IncompatibilityInfo {
conflicts: Conflict[];
warnings: string[];
}
interface Conflict {
property: string;
values: ConflictValue[];
}
interface ConflictValue {
provider: string;
value: any;
}
```
**Example**:
```typescript
{
title: "Album Title",
releaseDate: { year: 2014, month: 11, day: 24 },
// ... other fields
info: {
providers: ["spotify", "deezer", "itunes"],
sourceMap: {
"title": "spotify",
"releaseDate": "spotify",
"gtin": "deezer",
"media[0].tracks[0].isrc": "spotify"
},
incompatibleData: {
conflicts: [
{
property: "releaseDate",
values: [
{ provider: "spotify", value: { year: 2014, month: 11, day: 24 } },
{ provider: "itunes", value: { year: 2014, month: 11, day: 25 } }
]
}
],
warnings: [
"Release date conflict resolved using Spotify value (higher preference)"
]
},
messages: []
}
}
```
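
To make the relationship between provider preference, `sourceMap`, and `conflicts` concrete, here is a deliberately simplified sketch. It is not Harmony's actual merge algorithm: `mergeByPreference` is a hypothetical function that only handles flat, top-level properties and compares values by their JSON serialization.

```typescript
type ProviderValues = Record<string, unknown>; // property -> value reported by one provider

interface MergeResult {
  merged: Record<string, unknown>;
  sourceMap: Record<string, string>;
  conflicts: { property: string; values: { provider: string; value: unknown }[] }[];
}

function mergeByPreference(
  releases: Record<string, ProviderValues>, // provider name -> its values
  preference: string[], // providers, most preferred first
): MergeResult {
  const result: MergeResult = { merged: {}, sourceMap: {}, conflicts: [] };
  const properties = new Set<string>();
  for (const provider of preference) {
    Object.keys(releases[provider] ?? {}).forEach((property) => properties.add(property));
  }
  for (const property of Array.from(properties)) {
    // Collect values in preference order, skipping providers without a value
    const values = preference
      .filter((provider) => releases[provider]?.[property] !== undefined)
      .map((provider) => ({ provider, value: releases[provider][property] }));
    if (values.length === 0) continue;
    // The first entry comes from the most preferred provider that has a value
    result.merged[property] = values[0].value;
    result.sourceMap[property] = values[0].provider;
    // Serialized comparison (sensitive to key order) so that structured values can be compared
    const distinct = new Set(values.map((v) => JSON.stringify(v.value)));
    if (distinct.size > 1) result.conflicts.push({ property, values });
  }
  return result;
}

const outcome = mergeByPreference(
  {
    spotify: { title: 'Album Title', releaseDate: { year: 2014, month: 11, day: 24 } },
    itunes: { title: 'Album Title', releaseDate: { year: 2014, month: 11, day: 25 } },
    deezer: { title: 'Album Title', gtin: '0602537347377' },
  },
  ['spotify', 'deezer', 'itunes'],
);
```

With this input, `title` and `releaseDate` are attributed to `spotify`, `gtin` to `deezer`, and the only recorded conflict is the differing `releaseDate`.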
## Data Transformations
### Provider-Specific to HarmonyRelease
Each provider implements a `harmonize()` method:
```typescript
// Spotify example (conceptual)
class SpotifyProvider {
harmonize(spotifyAlbum: SpotifyAlbum): HarmonyRelease {
return {
title: spotifyAlbum.name,
artists: spotifyAlbum.artists.map(a => ({
name: a.name,
mbid: undefined // Spotify doesn't provide MBIDs
})),
gtin: spotifyAlbum.external_ids?.upc,
media: [{
format: MediumFormat.Digital,
position: 1,
tracks: spotifyAlbum.tracks.items.map((t, i) => ({
title: t.name,
position: i + 1,
length: t.duration_ms,
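          isrc: t.external_ids?.isrc // Note: album responses embed simplified tracks without external_ids; ISRCs need a separate track lookup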
}))
}],
releaseDate: this.parseDate(spotifyAlbum.release_date),
types: this.inferTypes(spotifyAlbum.album_type),
images: spotifyAlbum.images.map(img => ({
url: img.url,
types: [ImageType.Front],
width: img.width,
height: img.height
})),
externalLinks: [{
url: spotifyAlbum.external_urls.spotify,
types: [LinkType.Streaming]
}],
labels: spotifyAlbum.label ? [{ name: spotifyAlbum.label }] : [],
copyright: spotifyAlbum.copyrights?.[0]?.text,
availableIn: spotifyAlbum.available_markets,
info: {
providers: ["spotify"],
messages: []
}
};
}
}
```
### HarmonyRelease to MusicBrainz Format
**Location**: `musicbrainz/seeding.ts`
```typescript
interface MusicBrainzRelease {
name: string;
artist_credit: MBArtistCredit[];
barcode?: string;
release_events: MBReleaseEvent[];
labels: MBLabel[];
mediums: MBMedium[];
release_group: {
primary_type: string;
secondary_types: string[];
};
language?: string;
script?: string;
packaging?: string;
annotation?: string;
}
function convertToMusicBrainz(release: MergedHarmonyRelease): MusicBrainzRelease {
return {
name: release.title,
artist_credit: release.artists.map(a => ({
name: a.name,
credited_name: a.creditedName,
join_phrase: a.joinPhrase || '',
mbid: a.mbid
})),
barcode: release.gtin,
release_events: convertReleaseEvents(release.releaseDate, release.availableIn),
labels: release.labels.map(l => ({
name: l.name,
catalog_number: l.catalogNumber,
mbid: l.mbid
})),
mediums: release.media.map(m => ({
format: m.format,
position: m.position,
title: m.title,
tracks: m.tracks.map(t => ({
title: t.title,
position: t.position,
length: t.length,
isrc: t.isrc,
artist_credit: t.artists?.map(a => ({
name: a.name,
join_phrase: a.joinPhrase || ''
}))
}))
})),
release_group: {
primary_type: release.types.find(t => isPrimaryType(t)) || 'album',
secondary_types: release.types.filter(t => !isPrimaryType(t))
},
language: release.language,
script: release.script,
packaging: release.packaging,
annotation: buildAnnotation(release)
};
}
```
## Data Validation
### GTIN Validation
```typescript
function validateGTIN(gtin: string): boolean {
// GTIN-13 (EAN-13) validation
if (!/^\d{13}$/.test(gtin)) return false;
// Check digit validation
const digits = gtin.split('').map(Number);
const checksum = digits.slice(0, 12).reduce((sum, digit, i) => {
return sum + digit * (i % 2 === 0 ? 1 : 3);
}, 0);
const checkDigit = (10 - (checksum % 10)) % 10;
return checkDigit === digits[12];
}
```
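
The check above only accepts 13-digit codes. If GTIN-8, GTIN-12 (UPC), and GTIN-14 should be accepted as well, the same check-digit rule can be applied from the right-hand side. The following is a sketch with a hypothetical `isValidGtin` name, not code from Harmony:

```typescript
function isValidGtin(gtin: string): boolean {
  // GTINs are 8, 12, 13, or 14 digits long
  if (!/^(\d{8}|\d{12,14})$/.test(gtin)) return false;
  const digits = gtin.split('').map(Number);
  const checkDigit = digits[digits.length - 1];
  // Weights alternate 3, 1, 3, ... starting from the digit next to the check digit
  const sum = digits
    .slice(0, -1)
    .reverse()
    .reduce((acc, digit, i) => acc + digit * (i % 2 === 0 ? 3 : 1), 0);
  return (10 - (sum % 10)) % 10 === checkDigit;
}

const ean13Valid = isValidGtin('0602537347377'); // true
const upcValid = isValidGtin('602537347377'); // true (same code without the leading zero)
const corrupted = isValidGtin('0602537347378'); // false (wrong check digit)
```

Weighting from the right means that left-padding a code with zeros does not change its check digit, which is why the 12-digit and 13-digit forms validate identically.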
### ISRC Validation
```typescript
function validateISRC(isrc: string): boolean {
// Format: CC-XXX-YY-NNNNN
// CC: Country code (2 letters)
// XXX: Registrant code (3 alphanumeric)
// YY: Year (2 digits)
// NNNNN: Designation code (5 digits)
return /^[A-Z]{2}-?[A-Z0-9]{3}-?\d{2}-?\d{5}$/.test(isrc);
}
function normalizeISRC(isrc: string): string {
// Remove hyphens
return isrc.replace(/-/g, '');
}
```
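
Building on the format described in the comments above, an ISRC can also be split into its components, for example to display it in hyphenated form. This sketch uses hypothetical `parseISRC` and `formatISRC` helpers and, unlike the validator above, tolerates lowercase input by upper-casing it first.

```typescript
interface IsrcParts {
  country: string; // 2 letters
  registrant: string; // 3 alphanumeric characters
  year: string; // 2 digits
  designation: string; // 5 digits
}

function parseISRC(isrc: string): IsrcParts | undefined {
  const match = /^([A-Z]{2})-?([A-Z0-9]{3})-?(\d{2})-?(\d{5})$/.exec(isrc.trim().toUpperCase());
  if (!match) return undefined;
  const [, country, registrant, year, designation] = match;
  return { country, registrant, year, designation };
}

function formatISRC(parts: IsrcParts): string {
  return `${parts.country}-${parts.registrant}-${parts.year}-${parts.designation}`;
}

const parts = parseISRC('usrc17607839');
const hyphenated = parts ? formatISRC(parts) : undefined; // "US-RC1-76-07839"
```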
### Date Validation
```typescript
function validatePartialDate(date: PartialDate): boolean {
  if (date.year < 1000 || date.year > 9999) return false;
  // Compare against undefined so that an invalid value of 0 is not mistaken for "absent"
  if (date.month !== undefined && (date.month < 1 || date.month > 12)) return false;
  if (date.day !== undefined && (date.day < 1 || date.day > 31)) return false;
  // Validate day for specific month
  if (date.month !== undefined && date.day !== undefined) {
    // Day 0 of the following month is the last day of this month
    const daysInMonth = new Date(date.year, date.month, 0).getDate();
    if (date.day > daysInMonth) return false;
  }
  return true;
}
## Data Size Estimates
### Typical HarmonyRelease Size
**Single-disc album** (12 tracks):
- JSON serialized: ~15-25 KB
- With images: ~20-30 KB (image URLs only, not image data)
**Multi-disc compilation** (50 tracks):
- JSON serialized: ~50-80 KB
### Cache Size Estimates
**Provider response sizes**:
- Spotify album: ~10-20 KB
- Deezer album: ~15-25 KB
- iTunes album: ~20-30 KB
- Bandcamp page: ~50-100 KB (HTML)
**Daily cache growth** (100 lookups/day):
- Database: ~50 KB (metadata only)
- Files: ~2-5 MB (response bodies)
**Annual cache size** (36,500 lookups/year):
- Database: ~18 MB
- Files: ~730 MB - 1.8 GB
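
The annual figures follow directly from the daily ones. As a worked example, the sketch below reproduces the arithmetic; `estimateAnnualGrowth` is a hypothetical helper, and scaling the per-day figures linearly with the lookup rate is an assumption.

```typescript
interface GrowthEstimate {
  annualLookups: number;
  databaseMb: number;
  filesMinMb: number;
  filesMaxMb: number;
}

function estimateAnnualGrowth(lookupsPerDay: number): GrowthEstimate {
  const days = 365;
  // Per-day figures are quoted for 100 lookups/day and scaled linearly (assumption)
  const scale = lookupsPerDay / 100;
  return {
    annualLookups: lookupsPerDay * days,
    databaseMb: (50 * scale * days) / 1000, // ~50 KB of metadata per day
    filesMinMb: 2 * scale * days, // ~2 MB of response bodies per day
    filesMaxMb: 5 * scale * days, // ~5 MB of response bodies per day
  };
}

const estimate = estimateAnnualGrowth(100);
// 36,500 lookups, ~18 MB database, 730 MB to ~1.8 GB of files
```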
## No Migrations
Since Harmony has no traditional database, there are no schema migrations.
**Schema evolution strategy**:
1. Add new optional fields to `HarmonyRelease` interface
2. Update provider `harmonize()` methods to populate new fields
3. Update merge algorithm to handle new fields
4. No data migration required (old cached responses still valid)
**Breaking changes**:
1. Rename or remove fields in `HarmonyRelease`
2. Clear cache (delete `snaps.db` and `snaps/`)
3. Rebuild cache on next lookup
## Summary
Harmony's data architecture demonstrates:
1. **Cache-first design**: `snap_storage` eliminates need for traditional database
2. **Permalink system**: Timestamp-based cache replay enables reproducibility
3. **Rich data model**: 273-line `HarmonyRelease` schema covers all metadata needs
4. **Type safety**: Full TypeScript coverage ensures data consistency
5. **No migrations**: Schema evolution without data migration complexity
6. **Stateless processing**: All transformations in-memory, no persistent state
7. **MBID caching**: Efficient batch lookup reduces MusicBrainz API calls
This architecture is ideal for read-heavy, stateless applications where reproducibility and API compliance are priorities.
@@ -0,0 +1,777 @@
# Harmony - Deployment and Operations Analysis
## Deployment Philosophy
Harmony follows a **self-hosted, no-containerization** approach:
- **No Docker**: Direct Deno runtime execution
- **No Kubernetes**: Simple systemd service management
- **No cloud-native complexity**: Traditional server deployment
- **Deno Deploy compatible**: Can deploy to Deno's edge platform
This design prioritizes:
- **Simplicity**: Minimal deployment dependencies
- **Deno consistency**: Same runtime across dev and prod
- **Low overhead**: No container orchestration
- **Easy debugging**: Direct process access
## Production Deployment
### Prerequisites
1. **Deno runtime**: Version 1.37+ (Fresh 1.6.8 requirement)
2. **Git**: For version tracking and deployment
3. **systemd**: For service management (Linux)
4. **Environment variables**: OAuth2 credentials, configuration
### Installation Steps
#### 1. Clone Repository
```bash
cd /opt
git clone https://github.com/kellnerd/harmony.git
cd harmony
```
#### 2. Configure Environment
Create `.env` file from template:
```bash
cp .env.example .env
```
Edit `.env`:
```bash
# OAuth2 Credentials
HARMONY_SPOTIFY_CLIENT_ID=your_spotify_client_id
HARMONY_SPOTIFY_CLIENT_SECRET=your_spotify_client_secret
HARMONY_TIDAL_CLIENT_ID=your_tidal_client_id
HARMONY_TIDAL_CLIENT_SECRET=your_tidal_client_secret
# MusicBrainz Configuration
HARMONY_MB_API_URL=https://musicbrainz.org/ws/2
HARMONY_MB_TARGET_URL=https://musicbrainz.org
# Data Storage
HARMONY_DATA_DIR=/var/lib/harmony
# Server Configuration
PORT=8000
FORWARD_PROTO=https
```
#### 3. Create Data Directory
```bash
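# Create a dedicated system user if it does not exist yet (assumed name: harmony)
id harmony >/dev/null 2>&1 || useradd --system --user-group --shell /usr/sbin/nologin harmony
mkdir -p /var/lib/harmony/snaps
chown -R harmony:harmony /var/lib/harmony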
```
#### 4. Create systemd Service
Create `/etc/systemd/system/harmony.service`:
```ini
[Unit]
Description=Harmony Music Metadata Aggregator
After=network.target
[Service]
Type=simple
User=harmony
Group=harmony
WorkingDirectory=/opt/harmony
EnvironmentFile=/opt/harmony/.env
ExecStart=/usr/local/bin/deno run -A server/main.ts
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/harmony
[Install]
WantedBy=multi-user.target
```
#### 5. Enable and Start Service
```bash
systemctl daemon-reload
systemctl enable harmony
systemctl start harmony
systemctl status harmony
```
### Server Startup
**Command**:
```bash
deno run -A server/main.ts
```
**Flags**:
- `-A`: Allow all permissions (shorthand for `--allow-all`; grants network, file system, environment, and subprocess access, among others)
**Alternative** (granular permissions):
```bash
deno run \
--allow-net \
--allow-read=/opt/harmony,/var/lib/harmony \
--allow-write=/var/lib/harmony \
--allow-env \
server/main.ts
```
**Environment Variables**:
| Variable | Required | Default | Purpose |
|----------|----------|---------|---------|
| `PORT` | No | `8000` | HTTP server port |
| `DENO_DEPLOYMENT_ID` | No | Auto-generated | Version identifier |
| `HARMONY_SPOTIFY_CLIENT_ID` | Yes* | - | Spotify OAuth2 client ID |
| `HARMONY_SPOTIFY_CLIENT_SECRET` | Yes* | - | Spotify OAuth2 client secret |
| `HARMONY_TIDAL_CLIENT_ID` | Yes* | - | Tidal OAuth2 client ID |
| `HARMONY_TIDAL_CLIENT_SECRET` | Yes* | - | Tidal OAuth2 client secret |
| `HARMONY_MB_API_URL` | No | `https://musicbrainz.org/ws/2` | MusicBrainz API endpoint |
| `HARMONY_MB_TARGET_URL` | No | `https://musicbrainz.org` | MusicBrainz target instance |
| `HARMONY_DATA_DIR` | No | `./` | Data directory for cache |
| `FORWARD_PROTO` | No | - | Protocol for reverse proxy |
*Required only if using respective provider
**Version Identifier**:
The `DENO_DEPLOYMENT_ID` is auto-generated from git tags:
```bash
export DENO_DEPLOYMENT_ID=$(git describe --tags --always)
# Example: v1.2.3-5-g1a2b3c4
```
This identifier is used for:
- Cache invalidation on deployments
- Version display in UI
- Debugging and logging
### Reverse Proxy Configuration
#### Nginx
```nginx
server {
listen 80;
server_name harmony.example.com;
# Redirect HTTP to HTTPS
return 301 https://$server_name$request_uri;
}
server {
listen 443 ssl http2;
server_name harmony.example.com;
# SSL configuration
ssl_certificate /etc/letsencrypt/live/harmony.example.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/harmony.example.com/privkey.pem;
# Proxy to Harmony
location / {
proxy_pass http://localhost:8000;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_cache_bypass $http_upgrade;
}
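    # Static assets caching
    # Note: proxy_cache_valid only takes effect if a cache zone is enabled via proxy_cache
    location /static/ {
        proxy_pass http://localhost:8000;
        proxy_cache_valid 200 1d;
        add_header Cache-Control "public, max-age=86400, immutable";
    }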
}
```
#### Caddy
```caddy
harmony.example.com {
reverse_proxy localhost:8000
header /static/* {
Cache-Control "public, max-age=86400, immutable"
}
}
```
## CI/CD Pipeline
### GitHub Actions Workflow
**File**: `.github/workflows/deno.yml`
**Workflow Structure**:
```yaml
name: Deno CI/CD
on:
  push:
    branches: [main]
    tags: ['v*']
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Deno
        uses: denoland/setup-deno@v1
        with:
          deno-version: v1.x
      - name: Format check
        run: deno fmt --check
      - name: Lint
        run: deno lint
      - name: Type check
        run: deno check **/*.ts
      - name: Run tests
        run: deno test -A
  deploy:
    needs: test
    runs-on: ubuntu-latest
    if: startsWith(github.ref, 'refs/tags/v')
    steps:
      - uses: actions/checkout@v3
      - name: Deploy to server
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
          DEPLOY_HOST: ${{ secrets.DEPLOY_HOST }}
          DEPLOY_PORT: ${{ secrets.DEPLOY_PORT }}
          DEPLOY_USER: ${{ secrets.DEPLOY_USER }}
          DEPLOY_TARGET: ${{ secrets.DEPLOY_TARGET }}
          DEPLOY_SERVICE: ${{ secrets.DEPLOY_SERVICE }}
        run: |
          # Setup SSH
          mkdir -p ~/.ssh
          echo "$DEPLOY_KEY" > ~/.ssh/deploy_key
          chmod 600 ~/.ssh/deploy_key
          # Rsync code to server
          rsync -avz --delete \
            --exclude '/deno.lock' \
            --exclude '/.env' \
            --exclude '/snaps.db' \
            --exclude '/snaps/' \
            -e "ssh -i ~/.ssh/deploy_key -p $DEPLOY_PORT" \
            ./ "$DEPLOY_USER@$DEPLOY_HOST:$DEPLOY_TARGET"
          # Restart service
          ssh -i ~/.ssh/deploy_key -p "$DEPLOY_PORT" \
            "$DEPLOY_USER@$DEPLOY_HOST" \
            "systemctl restart $DEPLOY_SERVICE"
```
### Deployment Secrets
Configure in GitHub repository settings:
| Secret | Example | Purpose |
|--------|---------|---------|
| `DEPLOY_KEY` | SSH private key | SSH authentication |
| `DEPLOY_HOST` | `harmony.example.com` | Target server hostname |
| `DEPLOY_PORT` | `22` | SSH port |
| `DEPLOY_USER` | `harmony` | SSH user |
| `DEPLOY_TARGET` | `/opt/harmony` | Deployment directory |
| `DEPLOY_SERVICE` | `harmony` | systemd service name |
### Deployment Trigger
**Automatic deployment** on:
- Tagged releases: `v*` (e.g., `v1.2.3`)
- Authorized users only (repository collaborators)
**Triggering a deployment manually** (create and push a version tag):
```bash
git tag v1.2.3
git push origin v1.2.3
```
### Deployment Exclusions
Files excluded from rsync:
- `/deno.lock`: Lock file (regenerated on server)
- `/.env`: Environment variables (server-specific)
- `/snaps.db`: Cache database (preserved on server)
- `/snaps/`: Cache files (preserved on server)
**Rationale**: Preserve cache and configuration across deployments.
### Deployment Verification
After deployment, verify:
1. **Service status**:
```bash
systemctl status harmony
```
2. **Logs**:
```bash
journalctl -u harmony -f
```
3. **Health check**:
```bash
curl https://harmony.example.com/
```
4. **Version**:
Check `DENO_DEPLOYMENT_ID` in logs or UI
## Development Deployment
### Local Development
**Start development server**:
```bash
deno task dev
```
**Features**:
- Auto-reload on file changes
- Watch directories: `static/`, `routes/`
- Hot module replacement for islands
- Development logging (DEBUG level)
**Environment**:
- `DENO_DEPLOYMENT_ID`: Not set (enables localStorage for MBID cache)
- `PORT`: Default `8000`
### Testing
**Run all tests**:
```bash
deno task ok
```
**Equivalent to**:
```bash
deno fmt && deno lint && deno check **/*.ts && deno test -A
```
**Run specific test file**:
```bash
deno test -A providers/spotify_test.ts
```
**Offline testing** (the default; tests replay cached responses):
```bash
deno test -A
```
**Download fresh test data** (arguments after `--` are passed to the test files rather than to Deno):
```bash
deno test -A -- --download
```
## Deno Deploy (Edge Platform)
Harmony is compatible with Deno Deploy for edge deployment.
### Deployment Steps
1. **Create Deno Deploy project**:
- Visit https://dash.deno.com/new
- Connect GitHub repository
- Select `server/main.ts` as entry point
2. **Configure environment variables**:
- Add all `HARMONY_*` variables
- `PORT` does not need to be set (Deno Deploy configures it automatically)
3. **Deploy**:
- Automatic deployment on git push
- Edge distribution across global regions
### Deno Deploy Benefits
- **Global edge network**: Low latency worldwide
- **Automatic HTTPS**: Free SSL certificates
- **Auto-scaling**: Handle traffic spikes
- **Zero configuration**: No server management
### Deno Deploy Limitations
- **No persistent storage**: `snap_storage` cache not supported
- **Stateless only**: Each request independent
- **No systemd**: Different service management
**Workaround**: Use external cache (Redis, Cloudflare KV) instead of `snap_storage`.
## Monitoring and Logging
### Logging System
**Logger Configuration**:
```typescript
// utils/logger.ts
import * as log from 'std/log/mod.ts';
await log.setup({
handlers: {
console: new log.handlers.ConsoleHandler('DEBUG', {
formatter: (record) => {
const level = record.levelName.padEnd(7);
const logger = record.loggerName.padEnd(20);
return `${level} ${logger} ${record.msg}`;
},
useColors: true
})
},
loggers: {
'harmony.lookup': { level: 'INFO', handlers: ['console'] },
'harmony.mbid': { level: 'DEBUG', handlers: ['console'] },
'harmony.provider': { level: 'INFO', handlers: ['console'] },
'harmony.server': { level: 'INFO', handlers: ['console'] },
'requests': { level: 'INFO', handlers: ['console'] }
}
});
```
**Log Levels**:
| Logger | Level | Purpose |
|--------|-------|---------|
| `harmony.lookup` | INFO | Release lookup operations |
| `harmony.mbid` | DEBUG | MusicBrainz ID resolution |
| `harmony.provider` | INFO | Provider interactions |
| `harmony.server` | INFO | Server lifecycle events |
| `requests` | INFO | HTTP request logging |
**Example Logs**:
```
INFO harmony.server Server listening on http://localhost:8000
INFO harmony.lookup Looking up GTIN 0602537347377 in regions: GB,US,DE,JP
INFO harmony.provider Spotify: Fetching album 3DiDSNVBRYVzccLn2yqhMJ
INFO harmony.provider Spotify: Using cached response
INFO harmony.provider Deezer: Fetching album 123456
WARN harmony.provider iTunes: Rate limit exceeded, retrying after 60s
INFO harmony.lookup Merge complete: 3 providers, 1 conflict
DEBUG harmony.mbid Resolving MBIDs for 3 URLs
INFO requests GET /release?gtin=0602537347377 200 1234ms
```
### systemd Journal
**View logs**:
```bash
# Follow logs
journalctl -u harmony -f
# Last 100 lines
journalctl -u harmony -n 100
# Logs since yesterday
journalctl -u harmony --since yesterday
# Logs with priority ERROR or higher
journalctl -u harmony -p err
```
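**Log rotation**: Handled automatically by journald. By default the journal is capped at 10% of the file system size (at most 4 GB), and entries are only expired by age if `MaxRetentionSec` is configured.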
### Request Logging Middleware
**File**: `server/middleware/request_logger.ts`
```typescript
export async function requestLogger(req: Request, ctx: HandlerContext): Promise<Response> {
  const start = Date.now();
  const logger = log.getLogger('requests');
  // Run the remaining middleware and route handler, then log the outcome
  const response = await ctx.next();
  const duration = Date.now() - start;
  const message = `${req.method} ${new URL(req.url).pathname} ${response.status} ${duration}ms`;
  // Log client and server errors at WARN level, everything else at INFO
  if (response.status >= 400) {
    logger.warn(message);
  } else {
    logger.info(message);
  }
  return response;
}
```
### No Metrics or Monitoring
Harmony does **not include**:
- **Prometheus metrics**: No `/metrics` endpoint
- **Health checks**: No `/health` endpoint
- **APM integration**: No New Relic, Datadog, etc.
- **Error tracking**: No Sentry integration
- **Performance monitoring**: No tracing
**Workaround**: Add custom middleware for metrics collection.
**Example Health Check** (custom):
```typescript
// routes/health.ts
export const handler = {
GET: () => {
return new Response(JSON.stringify({
status: 'ok',
version: Deno.env.get('DENO_DEPLOYMENT_ID'),
timestamp: Date.now()
}), {
headers: { 'Content-Type': 'application/json' }
});
}
};
```
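
The custom-middleware workaround for metrics could start from an in-memory request counter that is rendered in the Prometheus text exposition format. This is a sketch under assumptions: the `RequestMetrics` class and the `harmony_http_requests_total` metric name are hypothetical, and the counter is reset whenever the process restarts.

```typescript
class RequestMetrics {
  private counts = new Map<string, number>();

  // Call once per handled request, e.g. from a logging middleware
  record(method: string, status: number): void {
    const key = `${method} ${status}`;
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // Render all counters as Prometheus text exposition format
  render(): string {
    const lines = ['# TYPE harmony_http_requests_total counter'];
    this.counts.forEach((count, key) => {
      const [method, status] = key.split(' ');
      lines.push(`harmony_http_requests_total{method="${method}",status="${status}"} ${count}`);
    });
    return lines.join('\n') + '\n';
  }
}

const metrics = new RequestMetrics();
metrics.record('GET', 200);
metrics.record('GET', 200);
metrics.record('GET', 404);
const exposition = metrics.render();
```

A `/metrics` route could then return `exposition` with a `text/plain` content type, in the same style as the health check route above.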
## Resource Requirements
### Minimum Requirements
- **CPU**: 1 core
- **RAM**: 512 MB
- **Disk**: 10 GB (for cache growth)
- **Network**: 10 Mbps
### Recommended Requirements
- **CPU**: 2 cores
- **RAM**: 2 GB
- **Disk**: 50 GB (for extensive cache)
- **Network**: 100 Mbps
### Resource Usage Estimates
**Idle**:
- CPU: <1%
- RAM: ~100 MB
**Under load** (10 req/sec):
- CPU: 10-20%
- RAM: ~200 MB
- Network: 1-5 Mbps
**Cache growth**:
- ~2-5 MB per day (100 lookups/day)
- ~730 MB - 1.8 GB per year
## Backup and Recovery
### Backup Strategy
**What to backup**:
1. **Cache database**: `/var/lib/harmony/snaps.db`
2. **Cache files**: `/var/lib/harmony/snaps/`
3. **Configuration**: `/opt/harmony/.env`
**What NOT to backup**:
- Application code (in git repository)
- Deno cache (regenerated automatically)
**Backup script**:
```bash
#!/bin/bash
# /usr/local/bin/harmony-backup.sh
BACKUP_DIR=/backup/harmony
DATE=$(date +%Y%m%d)
# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"
# Backup cache database
cp /var/lib/harmony/snaps.db "$BACKUP_DIR/$DATE/"
# Backup cache files (compressed)
tar -czf "$BACKUP_DIR/$DATE/snaps.tar.gz" /var/lib/harmony/snaps/
# Backup configuration
cp /opt/harmony/.env "$BACKUP_DIR/$DATE/"
# Delete dated backup directories older than 30 days (never the backup root itself)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
```
**Cron schedule**:
```cron
0 2 * * * /usr/local/bin/harmony-backup.sh
```
### Recovery
**Restore from backup**:
```bash
# Stop service
systemctl stop harmony
# Restore cache database
cp /backup/harmony/20240101/snaps.db /var/lib/harmony/
# Restore cache files
tar -xzf /backup/harmony/20240101/snaps.tar.gz -C /
# Restore configuration
cp /backup/harmony/20240101/.env /opt/harmony/
# Fix permissions
chown -R harmony:harmony /var/lib/harmony
# Start service
systemctl start harmony
```
## Security Considerations
### systemd Hardening
**Security options** in `harmony.service`:
```ini
[Service]
# Prevent privilege escalation
NoNewPrivileges=true
# Private /tmp
PrivateTmp=true
# Read-only system directories
ProtectSystem=strict
# No access to /home
ProtectHome=true
# Read-write access only to data directory
ReadWritePaths=/var/lib/harmony
```
### OAuth2 Credentials
**Storage**:
- Store in `.env` file (not in git)
- Restrict file permissions: `chmod 600 .env`
- Use environment variables in production
**Rotation**:
- Rotate credentials periodically
- Update `.env` and restart service
### HTTPS
**Always use HTTPS** in production:
- Reverse proxy (Nginx, Caddy) handles SSL
- Free certificates via Let's Encrypt
- Set `FORWARD_PROTO=https` environment variable
### Rate Limiting
**No built-in rate limiting** on server:
- Implement in reverse proxy (Nginx `limit_req`)
- Or use Cloudflare rate limiting
**Example Nginx rate limiting**:
```nginx
http {
limit_req_zone $binary_remote_addr zone=harmony:10m rate=10r/s;
server {
location / {
limit_req zone=harmony burst=20 nodelay;
proxy_pass http://localhost:8000;
}
}
}
```
## Troubleshooting
### Common Issues
#### Service won't start
**Check logs**:
```bash
journalctl -u harmony -n 50
```
**Common causes**:
- Missing environment variables
- Port already in use
- Permission issues on data directory
#### High memory usage
**Cause**: Large cache or memory leak
**Solution**:
```bash
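# Stop the service so that the cache is not written to while it is deleted
systemctl stop harmony
# Clear cache (note: this also discards the snapshots that permalinks depend on)
rm -rf /var/lib/harmony/snaps.db /var/lib/harmony/snaps/
# Start service
systemctl start harmony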
```
#### Provider errors
**Check provider status**:
- Spotify: https://developer.spotify.com/status
- Tidal: Check API version (v1 deprecated)
- MusicBrainz: https://musicbrainz.org/doc/MusicBrainz_Server/Status
**Verify credentials**:
```bash
# Test Spotify OAuth2
curl -X POST https://accounts.spotify.com/api/token \
-H "Authorization: Basic $(echo -n 'client_id:client_secret' | base64)" \
-d "grant_type=client_credentials"
```
## Summary
Harmony's deployment model demonstrates:
1. **Simplicity**: No Docker, no Kubernetes, direct Deno execution
2. **systemd integration**: Standard Linux service management
3. **CI/CD automation**: GitHub Actions with SSH deployment
4. **Deno Deploy compatibility**: Edge deployment option
5. **Comprehensive logging**: 5 specialized loggers with color formatting
6. **Security hardening**: systemd security options
7. **Backup strategy**: Cache and configuration backup
8. **No monitoring**: No built-in metrics or health checks (requires custom implementation)
This deployment approach is ideal for small to medium-scale deployments with minimal operational overhead.
@@ -0,0 +1,959 @@
# Harmony - Evaluation and Recommendations
## Executive Summary
Harmony is the **most relevant and architecturally sound** reference project for building a music metadata aggregation system. Its 4-stage pipeline (LOOKUP → HARMONIZE → MERGE → SEED), provider abstraction system, and intelligent merge algorithm represent best-in-class design patterns for multi-source data integration.
**Key Strengths**:
- Best-in-class multi-source aggregation architecture
- Intelligent 3-phase merge algorithm with provider preferences
- Comprehensive 273-line HarmonyRelease schema
- MusicBrainz integration with MBID resolution and seeding
- Type-safe TypeScript implementation with extensive unit test coverage (38 test files)
- Graceful degradation via Promise.allSettled
- Permalink system for reproducible results
**Key Limitations**:
- Web UI only (no REST/JSON API)
- Single developer project (bus factor = 1)
- No containerization (Docker)
- HTML scraping providers are fragile
- No monitoring/metrics infrastructure
**Recommendation**: **Adopt Harmony's architecture patterns** while addressing limitations through:
1. Add REST API layer for programmatic access
2. Containerize for easier deployment
3. Add monitoring and metrics
4. Expand provider ecosystem
5. Build community around project
## Detailed Evaluation
### Architecture (Score: 9.5/10)
#### Strengths
**1. 4-Stage Pipeline Design**
The LOOKUP → HARMONIZE → MERGE → SEED pipeline is exceptionally well-designed:
- **Clear separation of concerns**: Each stage has distinct responsibilities
- **Composable**: Stages can be used independently or combined
- **Testable**: Each stage can be tested in isolation
- **Extensible**: New providers or merge strategies can be added without affecting other stages
**Example Use Cases**:
- LOOKUP only: Fetch data from providers without harmonization
- LOOKUP + HARMONIZE: Get standardized data without merging
- Full pipeline: Complete aggregation and MusicBrainz seeding
**2. Provider Abstraction System**
The base class hierarchy is exemplary:
```
MetadataProvider (abstract)
├── MetadataApiProvider (OAuth2)
├── ReleaseLookup (GTIN/URL/ID)
└── ReleaseApiLookup (multi-region)
```
**Benefits**:
- **Consistent interface**: All providers implement same methods
- **Code reuse**: Common functionality (caching, rate limiting, OAuth2) in base classes
- **Easy provider addition**: New providers require minimal boilerplate
- **Feature quality ratings**: Transparent quality assessment
**3. Intelligent Merge Algorithm**
The 3-phase merge (collect → check compatibility → select best) is sophisticated:
- **Compatibility checking**: Detects conflicts before merging
- **Provider preferences**: Configurable priority order
- **Source tracking**: SourceMap records which provider contributed each field
- **Conflict reporting**: IncompatibilityInfo provides detailed conflict information
**Real-world value**: Solves the "which source wins" problem elegantly.
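The three phases can be sketched for a single field as follows. This is a minimal illustration under stated assumptions, not Harmony's actual merge code: `ProviderValue`, `MergedField`, and `mergeField` are hypothetical names, and compatibility is reduced to strict string equality.

```typescript
// Hypothetical types for illustration; not Harmony's real definitions.
interface ProviderValue {
  provider: string;
  value: string | undefined;
}

interface MergedField {
  value: string | undefined;
  source: string | undefined; // provider that contributed the value
  conflicts: string[];        // providers whose value disagreed with the winner
}

function mergeField(candidates: ProviderValue[], preferences: string[]): MergedField {
  // Phase 1: collect the candidates that actually carry a value.
  const present = candidates.filter((c) => c.value !== undefined);
  if (present.length === 0) {
    return { value: undefined, source: undefined, conflicts: [] };
  }

  // Phase 2: check compatibility (assumption: values are compatible only if equal).
  const distinctValues = new Set(present.map((c) => c.value));
  const compatible = distinctValues.size <= 1;

  // Phase 3: select the best candidate by provider preference.
  // Providers missing from the preference list rank after all listed ones.
  const rank = (provider: string): number => {
    const index = preferences.indexOf(provider);
    return index === -1 ? preferences.length : index;
  };
  const best = present.slice().sort((a, b) => rank(a.provider) - rank(b.provider))[0];

  const conflicts = compatible
    ? []
    : present.filter((c) => c.value !== best.value).map((c) => c.provider);

  return { value: best.value, source: best.provider, conflicts };
}
```

The `source` value plays the role of a `SourceMap` entry and `conflicts` the role of `IncompatibilityInfo`; a real implementation would need per-field compatibility rules rather than plain equality.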
**4. Type Safety**
Full TypeScript coverage with 273-line HarmonyRelease schema ensures:
- **Compile-time error detection**: Catch bugs before runtime
- **IDE autocomplete**: Better developer experience
- **Self-documenting**: Types serve as documentation
- **Refactoring safety**: Changes propagate through type system
#### Weaknesses
**1. No REST API**
Web UI only limits programmatic access:
- **Integration difficulty**: Other applications can't easily consume data
- **Automation challenges**: No API for batch processing
- **Mobile apps**: Can't build native mobile clients
**Mitigation**: Add REST API layer (see recommendations)
**2. Tight Coupling to Fresh Framework**
Fresh is Deno-only, limiting deployment options:
- **No Node.js support**: Can't run on Node.js infrastructure
- **Framework lock-in**: Migrating to another framework would be difficult
- **Smaller ecosystem**: Fresh has fewer resources than Next.js/Remix
**Mitigation**: Extract core logic into framework-agnostic library
### Data Model (Score: 9/10)
#### Strengths
**1. Comprehensive HarmonyRelease Schema**
273 lines covering all music metadata needs:
- **Basic metadata**: Title, artists, GTIN
- **Media structure**: Multi-disc support with tracks
- **Commercial info**: Labels, catalog numbers, copyright
- **Distribution**: Available/excluded countries
- **Visual assets**: Images with dimensions and types
- **External links**: Provider URLs with link types
- **Metadata about metadata**: Providers, messages, source map
**Coverage**: Matches or exceeds MusicBrainz schema.
**2. Partial Date Support**
`PartialDate` interface handles incomplete dates:
```typescript
{ year: 2014 } // Year only
{ year: 2014, month: 11 } // Year and month
{ year: 2014, month: 11, day: 24 } // Full date
```
**Real-world value**: Many releases have incomplete release dates.
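As an illustration, a provider date string could be converted into this shape roughly as follows. The `PartialDate` field names match the examples above, but their optionality and the `parsePartialDate` helper are assumptions made for this sketch.

```typescript
// Assumed shape, based on the examples above.
interface PartialDate {
  year?: number;
  month?: number;
  day?: number;
}

// Parses "YYYY", "YYYY-MM" or "YYYY-MM-DD"; returns undefined for anything else.
// Note: this sketch does not validate ranges (e.g. month 13 is accepted).
function parsePartialDate(input: string): PartialDate | undefined {
  const match = /^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$/.exec(input);
  if (!match) return undefined;
  const date: PartialDate = { year: Number(match[1]) };
  if (match[2] !== undefined) date.month = Number(match[2]);
  if (match[3] !== undefined) date.day = Number(match[3]);
  return date;
}
```

Missing components are simply left out of the result, so a year-only release date never turns into a fabricated January 1st.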
**3. Artist Credit System**
`ArtistCreditName[]` with join phrases:
```typescript
[
  { name: "Artist A", joinPhrase: " & " },
  { name: "Artist B", joinPhrase: " feat. " },
  { name: "Artist C" }
]
// Renders: "Artist A & Artist B feat. Artist C"
```
**Real-world value**: Handles complex artist credits (collaborations, features, etc.)
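Rendering such a credit is a matter of concatenating each name with its join phrase. A minimal sketch, assuming `joinPhrase` is optional and `renderArtistCredit` is a hypothetical helper name:

```typescript
// Assumed shape, based on the example above.
interface ArtistCreditName {
  name: string;
  joinPhrase?: string;
}

// Concatenates each name with the phrase that joins it to the next credit.
function renderArtistCredit(credits: ArtistCreditName[]): string {
  return credits.map((c) => c.name + (c.joinPhrase ?? "")).join("");
}
```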
**4. Source Tracking**
`SourceMap` records which provider contributed each field:
```typescript
{
  "title": "spotify",
  "releaseDate": "spotify",
  "gtin": "deezer",
  "media[0].tracks[0].isrc": "spotify"
}
```
**Real-world value**: Enables data provenance and debugging.
#### Weaknesses
**1. No Versioning**
Schema has no version field:
- **Breaking changes**: No way to detect schema version
- **Migration challenges**: Can't handle multiple schema versions simultaneously
**Mitigation**: Add `schemaVersion` field to HarmonyRelease
**2. Limited Extensibility**
No extension mechanism for provider-specific data:
- **Custom fields**: No way to store provider-specific metadata
- **Experimental features**: Can't add new fields without schema change
**Mitigation**: Add `extensions` object for provider-specific data
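Both mitigations could look roughly like this. Everything here is hypothetical: the version numbers, the `StoredRelease` and `VersionedRelease` shapes, and the `upgradeRelease` helper are assumptions for illustration, reduced to a single `title` field.

```typescript
const CURRENT_SCHEMA_VERSION = 2; // hypothetical version number

// A stored payload that may predate the version field.
interface StoredRelease {
  schemaVersion?: number;
  title: string;
  extensions?: Record<string, Record<string, unknown>>;
}

interface VersionedRelease {
  schemaVersion: number;
  title: string;
  // Provider-specific data keyed by provider name.
  extensions: Record<string, Record<string, unknown>>;
}

// Upgrades older payloads to the current version and rejects newer ones.
function upgradeRelease(raw: StoredRelease): VersionedRelease {
  const version = raw.schemaVersion ?? 1; // payloads without a version count as v1
  if (version > CURRENT_SCHEMA_VERSION) {
    throw new Error(`Unsupported schema version ${version}`);
  }
  return {
    title: raw.title,
    schemaVersion: CURRENT_SCHEMA_VERSION,
    extensions: raw.extensions ?? {},
  };
}
```

Rejecting unknown future versions keeps an older reader from silently misinterpreting fields it does not know about.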
### Provider Integration (Score: 8.5/10)
#### Strengths
**1. Diverse Provider Ecosystem**
9 providers covering major platforms:
- **Streaming**: Spotify, Deezer, Tidal
- **Purchase**: iTunes, Bandcamp, Beatport
- **Regional**: Mora, Ototoy (Japan)
- **Reference**: MusicBrainz
**Coverage**: Excellent global coverage with regional specialists.
**2. Multi-Access Methods**
Both API-based (5) and HTML scraping (4):
- **API-based**: Reliable, structured data
- **HTML scraping**: Access to platforms without APIs
**Flexibility**: Can integrate any platform regardless of API availability.
**3. OAuth2 Support**
Spotify and Tidal use OAuth2 with token caching:
- **Secure**: Industry-standard authentication
- **Efficient**: Token caching reduces auth requests
- **Automatic renewal**: Handles token expiration
**4. Rate Limiting**
Per-provider rate limiters with exponential backoff:
- **API compliance**: Respects provider rate limits
- **Retry-After support**: Parses and respects Retry-After headers
- **Configurable**: Different limits per provider
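The retry delay logic described above can be sketched as a pure function. This is an assumption-laden illustration, not Harmony's implementation: `retryDelayMs`, the 1-second base, and the 60-second cap are hypothetical, and only the delta-seconds form of `Retry-After` is handled (the HTTP-date form falls back to exponential backoff).

```typescript
// Returns how long to wait, in milliseconds, before retry number `attempt` (0-based).
// A numeric Retry-After header takes precedence over the computed backoff.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseMs: number = 1000,
  maxMs: number = 60000,
): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return seconds * 1000;
    }
  }
  // Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}
```

A caller would pass `response.headers.get("Retry-After")` after receiving a 429 and sleep for the returned duration before retrying.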
**5. Multi-Region Support**
iTunes queries multiple regions in parallel:
- **Global coverage**: Access region-specific releases
- **Parallel execution**: Faster than sequential queries
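Combined with the `Promise.allSettled` approach listed among the key strengths, a parallel multi-region lookup might be sketched like this. The names `RegionLookup`, `RegionResults`, and `lookupAllRegions` are hypothetical, and the per-region lookup is injected so the sketch stays independent of any real provider API.

```typescript
type RegionLookup<T> = (region: string) => Promise<T>;

interface RegionResults<T> {
  found: { region: string; value: T }[];
  failed: { region: string; reason: string }[];
}

// Queries all regions in parallel; one failing region does not abort the others.
async function lookupAllRegions<T>(
  regions: string[],
  lookup: RegionLookup<T>,
): Promise<RegionResults<T>> {
  const settled = await Promise.allSettled(regions.map((r) => lookup(r)));
  const results: RegionResults<T> = { found: [], failed: [] };
  settled.forEach((outcome, i) => {
    const region = regions[i];
    if (outcome.status === "fulfilled") {
      results.found.push({ region, value: outcome.value });
    } else {
      results.failed.push({ region, reason: String(outcome.reason) });
    }
  });
  return results;
}
```

Keeping the failures in the result, rather than discarding them, lets the caller surface them as warnings alongside the successful regions.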
#### Weaknesses
**1. HTML Scraping Fragility**
4 providers rely on HTML scraping:
- **Breaks on redesigns**: Site changes break scrapers
- **Maintenance burden**: Requires constant updates
- **No guarantees**: Sites can block scrapers
**Mitigation**: Add monitoring for scraper failures, fallback to other providers
**2. KKBOX Not Implemented**
Mentioned but not implemented:
- **Missing coverage**: No Taiwan/Hong Kong/Southeast Asia specialist
- **Incomplete**: Documentation mentions it but code doesn't include it
**Mitigation**: Implement KKBOX provider or remove from documentation
**3. No Provider Health Monitoring**
No system to track provider availability:
- **Silent failures**: Providers can fail without notification
- **No metrics**: Can't track provider reliability over time
**Mitigation**: Add provider health checks and metrics
### MusicBrainz Integration (Score: 9/10)
#### Strengths
**1. Batch MBID Resolution**
100 URLs per request:
- **Efficient**: Reduces API calls by up to 100x
- **Fast**: Single request instead of 100
- **Caching**: Results cached for future lookups
**Real-world value**: Essential for duplicate detection.
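The batching itself reduces to splitting the URL list into groups of at most 100. A minimal sketch, where `chunk` is a hypothetical helper and the actual MusicBrainz request is left out:

```typescript
// Splits `items` into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For example, 250 external URLs would be resolved with three requests (100, 100, and 50 URLs) instead of 250.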
**2. Duplicate Detection**
Checks if external URLs already linked to MusicBrainz:
- **Prevents duplicates**: Warns before creating duplicate releases
- **Links to existing**: Provides link to existing release
- **User-friendly**: Clear warning messages
**3. Seeding Integration**
Pre-filled form for MusicBrainz import:
- **Edit notes**: Include provider URLs and permalink
- **Annotation**: Extra metadata not in main form
- **Copy-to-clipboard**: Easy data transfer
**4. Template Provider Mode**
MusicBrainz as reference data:
- **Verification**: Compare external sources against MusicBrainz
- **Quality control**: Identify discrepancies
- **Improvement**: Find missing data in MusicBrainz
#### Weaknesses
**1. No Automatic Submission**
Manual copy-paste required:
- **Friction**: User must manually transfer data
- **Error-prone**: Copy-paste can introduce errors
**Mitigation**: Add MusicBrainz API submission (requires user authentication)
**2. No Edit Tracking**
No way to track submitted edits:
- **No feedback**: User doesn't know if edit was accepted
- **No metrics**: Can't measure Harmony's impact on MusicBrainz
**Mitigation**: Add edit tracking via MusicBrainz API
### Testing and Quality (Score: 9/10)
#### Strengths
**1. Comprehensive Test Coverage**
38 test files covering all modules:
- **Providers**: All 9 providers tested
- **Harmonizer**: Merge, compatibility, deduplication tested
- **MusicBrainz**: Seeding, MBID resolution tested
**2. Declarative Provider Tests**
`describeProvider` helper reduces boilerplate:
- **Consistent**: All providers tested the same way
- **Maintainable**: Changes to test structure affect all providers
- **Readable**: Tests are self-documenting
**3. Offline Testing**
43 cached responses in `testdata/`:
- **Fast**: No network requests during tests
- **Reproducible**: Same results every time
- **Offline-friendly**: Can test without internet
**4. Snapshot Testing**
Verify output stability:
- **Regression detection**: Catch unintended changes
- **Easy updates**: Update snapshots when changes are intentional
#### Weaknesses
**1. No Integration Tests**
Only unit tests, no end-to-end tests:
- **Missing coverage**: Full pipeline not tested together
- **Real-world scenarios**: Can't test actual provider interactions
**Mitigation**: Add integration tests with real provider calls (optional, gated by flag)
**2. No Performance Tests**
No benchmarks or performance tests:
- **No baselines**: Can't detect performance regressions
- **No optimization targets**: Don't know what to optimize
**Mitigation**: Add benchmark tests for critical paths (merge algorithm, provider lookups)
### Deployment and Operations (Score: 6/10)
#### Strengths
**1. Simple Deployment**
No Docker, no Kubernetes:
- **Low complexity**: Easy to understand and debug
- **Fast startup**: No container overhead
- **Direct access**: Can inspect process directly
**2. systemd Integration**
Standard Linux service management:
- **Familiar**: Most Linux admins know systemd
- **Reliable**: systemd handles restarts, logging
- **Secure**: systemd security hardening options
**3. CI/CD Automation**
GitHub Actions with SSH deployment:
- **Automated**: Deploy on git tag
- **Simple**: No complex orchestration
- **Reliable**: SSH is battle-tested
#### Weaknesses
**1. No Containerization**
No Docker support:
- **Deployment friction**: Requires Deno installation on server
- **Inconsistent environments**: Dev/prod differences possible
- **No orchestration**: Can't use Kubernetes, Docker Swarm
**Mitigation**: Add Dockerfile and docker-compose.yml
**2. No Monitoring**
No metrics, no health checks:
- **Blind operations**: Can't see system health
- **No alerting**: Can't detect issues proactively
- **No performance tracking**: Can't optimize without data
**Mitigation**: Add Prometheus metrics, health endpoint, logging aggregation
**3. No Horizontal Scaling**
Single-instance deployment:
- **Limited capacity**: Can't handle high traffic
- **No redundancy**: Single point of failure
- **No load balancing**: Can't distribute load
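**Mitigation**: Add load balancer support and document multi-instance deployment; the application design is already stateless, so the main open question is how instances share or duplicate the cache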
**4. Manual Cache Management**
No automatic cache cleanup:
- **Disk growth**: Cache grows indefinitely
- **Manual intervention**: Requires manual cleanup scripts
- **No monitoring**: Don't know cache size without checking
**Mitigation**: Add automatic cache eviction, cache size monitoring
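One possible eviction policy combines a maximum age with a size budget. The sketch below only decides which entries to delete; `CacheEntry`, `selectEvictions`, and both limits are assumptions, and whether snapshots referenced by permalinks may be evicted at all is a policy decision it does not address.

```typescript
// Hypothetical metadata about one cached snapshot.
interface CacheEntry {
  key: string;
  storedAt: number;  // Unix timestamp in milliseconds
  sizeBytes: number;
}

// Returns the keys to evict: expired entries, plus the oldest entries
// that do not fit into the size budget.
function selectEvictions(
  entries: CacheEntry[],
  now: number,
  maxAgeMs: number,
  maxTotalBytes: number,
): string[] {
  const evicted: string[] = [];
  // Walk from newest to oldest so that older entries lose when space runs out.
  const newestFirst = entries.slice().sort((a, b) => b.storedAt - a.storedAt);
  let keptBytes = 0;
  for (const entry of newestFirst) {
    const expired = now - entry.storedAt > maxAgeMs;
    if (expired || keptBytes + entry.sizeBytes > maxTotalBytes) {
      evicted.push(entry.key);
    } else {
      keptBytes += entry.sizeBytes;
    }
  }
  return evicted;
}
```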
### Documentation (Score: 7/10)
#### Strengths
**1. Inline Comments**
Code is well-commented:
- **Type definitions**: Comprehensive JSDoc comments
- **Complex logic**: Explanations for non-obvious code
- **Examples**: Usage examples in comments
**2. Type Definitions as Documentation**
273-line HarmonyRelease schema is self-documenting:
- **Clear structure**: Types show data model
- **IDE support**: Autocomplete and type hints
- **Always up-to-date**: Types can't be out of sync with code
**3. Test Specs as Documentation**
Declarative provider tests show usage:
- **Examples**: Tests demonstrate how to use providers
- **Expected behavior**: Tests document expected outputs
#### Weaknesses
**1. No Architecture Documentation**
No high-level architecture docs:
- **Onboarding difficulty**: New contributors must read code
- **No diagrams**: Visual learners have no reference
- **No decision records**: Don't know why choices were made
**Mitigation**: Add architecture documentation (this analysis addresses this)
**2. No API Documentation**
No OpenAPI/Swagger spec:
- **Integration difficulty**: Developers must read code to understand API
- **No interactive docs**: Can't try API in browser
**Mitigation**: Add OpenAPI spec (once REST API is added)
**3. No User Guide**
No end-user documentation:
- **Learning curve**: Users must figure out UI themselves
- **No tutorials**: No step-by-step guides
- **No FAQ**: Common questions not answered
**Mitigation**: Add user guide with screenshots and examples
## Comparison with Alternatives
### vs. Beets
**Beets**: Music library management tool with metadata fetching
| Aspect | Harmony | Beets |
|--------|---------|-------|
| **Purpose** | MusicBrainz seeding | Library management |
| **Architecture** | Web UI + CLI | CLI only |
| **Providers** | 9 providers | MusicBrainz + plugins |
| **Merge algorithm** | 3-phase intelligent merge | Plugin-based |
| **MusicBrainz integration** | Seeding focus | Lookup focus |
| **Language** | TypeScript/Deno | Python |
| **Deployment** | Self-hosted web app | Local CLI tool |
**Verdict**: Harmony is better for MusicBrainz seeding, Beets is better for library management.
### vs. Picard
**Picard**: MusicBrainz official tagger
| Aspect | Harmony | Picard |
|--------|---------|-------|
| **Purpose** | Multi-source aggregation | MusicBrainz tagging |
| **Architecture** | Web UI | Desktop GUI |
| **Providers** | 9 providers | MusicBrainz + AcoustID |
| **Merge algorithm** | Intelligent merge | MusicBrainz priority |
| **Use case** | Release research | File tagging |
| **Language** | TypeScript/Deno | Python/Qt |
**Verdict**: Harmony is better for release research, Picard is better for file tagging.
### vs. Custom Scraper
**Custom Scraper**: Ad-hoc provider integration
| Aspect | Harmony | Custom Scraper |
|--------|---------|----------------|
| **Architecture** | 4-stage pipeline | Ad-hoc |
| **Provider abstraction** | Base classes | None |
| **Merge algorithm** | 3-phase intelligent | Manual |
| **Type safety** | Full TypeScript | Varies |
| **Testing** | 38 test files | Varies |
| **Maintenance** | Single codebase | Per-scraper |
**Verdict**: Harmony's shared pipeline, provider abstraction, and test suite make it a much stronger foundation than ad-hoc custom scrapers.
## Adoption Recommendations
### What to Adopt
#### 1. Architecture Patterns (Priority: CRITICAL)
**Adopt**:
- 4-stage pipeline (LOOKUP → HARMONIZE → MERGE → SEED)
- Provider base class hierarchy
- Feature quality rating system
- Graceful degradation via Promise.allSettled
**Rationale**: These patterns are proven, well-designed, and solve real problems.
**Implementation**:
```typescript
// Release, HarmonyRelease, FeatureQualityMap, LookupInput and the helper
// functions below are placeholders for project-specific definitions.
// Adopt provider base class
abstract class MetadataProvider {
  abstract name: string;
  abstract urlPattern: URLPattern;
  abstract lookupByUrl(url: string): Promise<Release>;
  abstract harmonize(release: Release): HarmonyRelease;
  abstract featureQuality: FeatureQualityMap;
}
// Adopt 4-stage pipeline
async function aggregateMetadata(input: LookupInput): Promise<MergedHarmonyRelease> {
  // Stage 1: LOOKUP
  const releases = await combinedLookup(input);
  // Stage 2: HARMONIZE (already done in provider.lookup)
  // Stage 3: MERGE
  const merged = await mergeReleases(releases);
  // Stage 4: SEED (optional); the seed payload is derived from the merged
  // release and is not part of this function's return value
  const mbFormat = await convertToMusicBrainz(merged);
  return merged;
}
```
#### 2. Data Model (Priority: HIGH)
**Adopt**:
- HarmonyRelease schema (273 lines)
- PartialDate interface
- ArtistCreditName with join phrases
- SourceMap for data provenance
- IncompatibilityInfo for conflict reporting
**Rationale**: Comprehensive, well-designed, covers all metadata needs.
**Modifications**:
- Add `schemaVersion` field
- Add `extensions` object for provider-specific data
#### 3. Merge Algorithm (Priority: HIGH)
**Adopt**:
- 3-phase merge (collect → check compatibility → select best)
- Provider preference system
- Compatibility checking
- Conflict reporting
**Rationale**: Solves the "which source wins" problem elegantly.
**Enhancements**:
- Add user override mechanism
- Add machine learning for automatic preference learning
#### 4. Testing Patterns (Priority: MEDIUM)
**Adopt**:
- Declarative provider tests (`describeProvider`)
- Offline testing with cached responses
- Snapshot testing
**Rationale**: Reduces boilerplate, improves maintainability.
### What to Modify
#### 1. Add REST API (Priority: CRITICAL)
**Current**: Web UI only
**Proposed**: Add REST API layer
**Endpoints**:
```
GET /api/v1/release?gtin={gtin}&region={region}
GET /api/v1/release?url={url}
POST /api/v1/release/batch
GET /api/v1/providers
GET /api/v1/providers/{name}
```
**Response format**: JSON (HarmonyRelease or MergedHarmonyRelease)
**Benefits**:
- Programmatic access
- Integration with other applications
- Mobile app support
- Batch processing
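As a sketch of how the first two proposed endpoints might validate their input, the following function turns a request URL into a typed query. The `ReleaseQuery` type, the `parseReleaseQuery` name, and the 8-to-14-digit GTIN check are assumptions for illustration; none of this exists in Harmony today.

```typescript
type ReleaseQuery =
  | { kind: "gtin"; gtin: string; region?: string }
  | { kind: "url"; url: string }
  | { kind: "error"; message: string };

// Extracts either a GTIN lookup or a URL lookup from the query string.
function parseReleaseQuery(requestUrl: string): ReleaseQuery {
  const params = new URL(requestUrl).searchParams;
  const gtin = params.get("gtin");
  const url = params.get("url");
  if (gtin !== null) {
    if (!/^\d{8,14}$/.test(gtin)) {
      return { kind: "error", message: "gtin must be 8 to 14 digits" };
    }
    const region = params.get("region");
    return region !== null ? { kind: "gtin", gtin, region } : { kind: "gtin", gtin };
  }
  if (url !== null) {
    return { kind: "url", url };
  }
  return { kind: "error", message: "either gtin or url is required" };
}
```

A route handler could map the `error` variant to an HTTP 400 response and pass the other variants on to the lookup stage.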
#### 2. Add Containerization (Priority: HIGH)
**Current**: No Docker
**Proposed**: Add Dockerfile and docker-compose.yml
**Dockerfile**:
```dockerfile
FROM denoland/deno:1.37.0
WORKDIR /app
COPY . .
RUN deno cache server/main.ts
EXPOSE 8000
CMD ["deno", "run", "-A", "server/main.ts"]
```
**docker-compose.yml**:
```yaml
version: '3.8'
services:
  harmony:
    build: .
    ports:
      - "8000:8000"
    environment:
      - HARMONY_SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - HARMONY_SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}
    volumes:
      - ./data:/var/lib/harmony
```
**Benefits**:
- Consistent environments
- Easy deployment
- Orchestration support (Kubernetes)
#### 3. Add Monitoring (Priority: HIGH)
**Current**: No metrics, no health checks
**Proposed**: Add Prometheus metrics and health endpoint
**Metrics**:
- Request count by route
- Request duration by route
- Provider success/failure rate
- Cache hit/miss rate
- Merge conflict rate
**Health endpoint**:
```typescript
// GET /health
{
  "status": "ok",
  "version": "v1.2.3",
  "uptime": 3600,
  "providers": {
    "spotify": "ok",
    "deezer": "ok",
    "itunes": "degraded"
  }
}
```
**Benefits**:
- Proactive issue detection
- Performance optimization
- Capacity planning
#### 4. Add Provider Health Monitoring (Priority: MEDIUM)
**Current**: Silent provider failures
**Proposed**: Track provider availability and performance
**Implementation**:
```typescript
interface ProviderHealth {
  name: string;
  status: 'ok' | 'degraded' | 'down';
  successRate: number;      // Last 100 requests
  avgResponseTime: number;  // Milliseconds
  lastSuccess: number;      // Timestamp
  lastFailure: number;      // Timestamp
  lastError?: string;
}
```
**Benefits**:
- Identify unreliable providers
- Adjust provider preferences dynamically
- Alert on provider failures
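The `successRate` and `status` fields could be derived from a rolling window of recent request outcomes. The class below is a hypothetical sketch: `ProviderHealthWindow` is not an existing component, and the 95% and 50% thresholds are arbitrary assumptions that would need tuning.

```typescript
type HealthStatus = "ok" | "degraded" | "down";

// Tracks the outcome of the most recent requests to one provider.
class ProviderHealthWindow {
  private outcomes: boolean[] = [];
  private readonly windowSize: number;

  constructor(windowSize: number = 100) {
    this.windowSize = windowSize;
  }

  // Records one request outcome and drops the oldest one once the window is full.
  record(success: boolean): void {
    this.outcomes.push(success);
    if (this.outcomes.length > this.windowSize) {
      this.outcomes.shift();
    }
  }

  // Fraction of successful requests in the window; 1 when nothing was recorded yet.
  successRate(): number {
    if (this.outcomes.length === 0) return 1;
    return this.outcomes.filter((ok) => ok).length / this.outcomes.length;
  }

  // Assumed thresholds: at least 95% is "ok", at least 50% is "degraded".
  status(): HealthStatus {
    const rate = this.successRate();
    if (rate >= 0.95) return "ok";
    if (rate >= 0.5) return "degraded";
    return "down";
  }
}
```

One instance per provider would be updated after every lookup and read by the health endpoint.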
### What to Avoid
#### 1. Don't Add Database (Priority: HIGH)
**Current**: Cache-first, no database
**Recommendation**: Keep cache-first approach
**Rationale**:
- Simplicity is a strength
- No migrations to manage
- Stateless design enables horizontal scaling
- Permalink system works well with cache
**Exception**: If adding user accounts, use separate auth database (don't mix with metadata)
#### 2. Don't Add Complex Build System (Priority: MEDIUM)
**Current**: Deno handles everything
**Recommendation**: Keep Deno's built-in tooling
**Rationale**:
- Deno fmt, lint, test are sufficient
- No need for Webpack, Vite, etc.
- Fresh handles asset bundling
**Exception**: If migrating to Node.js, use Vite or similar
#### 3. Don't Rewrite in Another Language (Priority: HIGH)
**Current**: TypeScript/Deno
**Recommendation**: Keep TypeScript/Deno
**Rationale**:
- Type safety is critical for data aggregation
- Deno tooling is excellent
- Migration cost is high
- No significant benefits from other languages
**Exception**: If Deno becomes unmaintained (unlikely)
## Integration Strategy
### Phase 1: Study and Prototype (2-4 weeks)
**Goals**:
- Deep understanding of Harmony architecture
- Prototype key components in target stack
- Validate design decisions
**Tasks**:
1. Read all source code
2. Run Harmony locally
3. Test all providers
4. Prototype provider base class
5. Prototype merge algorithm
6. Prototype HarmonyRelease schema
**Deliverables**:
- Architecture documentation (this document)
- Prototype codebase
- Design decisions document
### Phase 2: Core Implementation (6-8 weeks)
**Goals**:
- Implement 4-stage pipeline
- Implement provider abstraction
- Implement merge algorithm
- Implement 3-5 providers
**Tasks**:
1. Implement MetadataProvider base class
2. Implement HarmonyRelease schema
3. Implement CombinedReleaseLookup
4. Implement merge algorithm
5. Implement Spotify provider
6. Implement Deezer provider
7. Implement MusicBrainz provider
8. Add comprehensive tests
**Deliverables**:
- Working 4-stage pipeline
- 3-5 providers implemented
- Test coverage >80%
### Phase 3: API and Deployment (4-6 weeks)
**Goals**:
- Add REST API
- Add containerization
- Add monitoring
- Deploy to production
**Tasks**:
1. Design REST API
2. Implement API endpoints
3. Add OpenAPI documentation
4. Create Dockerfile
5. Add Prometheus metrics
6. Add health endpoint
7. Deploy to staging
8. Load testing
9. Deploy to production
**Deliverables**:
- REST API with OpenAPI spec
- Docker images
- Monitoring dashboard
- Production deployment
### Phase 4: Expansion (Ongoing)
**Goals**:
- Add more providers
- Improve merge algorithm
- Add features
**Tasks**:
1. Add iTunes provider
2. Add Tidal provider
3. Add Bandcamp provider
4. Improve compatibility checking
5. Add machine learning for provider preferences
6. Add user feedback mechanism
**Deliverables**:
- 9+ providers
- Improved merge accuracy
- User feedback system
## Risk Assessment
### Technical Risks
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **Provider API changes** | High | High | Monitor provider APIs, add health checks, graceful degradation |
| **HTML scraping breaks** | High | Medium | Monitor scraper failures, fallback to other providers |
| **Rate limiting** | Medium | Medium | Respect rate limits, implement backoff, cache aggressively |
| **OAuth2 token expiration** | Low | Low | Automatic token renewal, error handling |
| **Merge conflicts** | Medium | Medium | Comprehensive compatibility checking, user override |
| **Performance degradation** | Low | Medium | Monitoring, caching, optimization |
### Operational Risks
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **Single developer dependency** | High | High | Build community, document architecture, onboard contributors |
| **Deno ecosystem changes** | Low | Medium | Monitor Deno releases, test before upgrading |
| **Fresh framework changes** | Medium | Medium | Pin Fresh version, test before upgrading |
| **Provider terms of service** | Low | High | Review ToS, add rate limiting, respect robots.txt |
| **Cache growth** | Medium | Low | Automatic cache eviction, monitoring |
### Business Risks
| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **Low adoption** | Medium | Medium | Marketing, documentation, community building |
| **Competition** | Low | Low | Focus on MusicBrainz integration, unique features |
| **Maintenance burden** | Medium | Medium | Automate testing, monitoring, deployment |
## Conclusion
Harmony is an **exceptional reference project** for music metadata aggregation. Its architecture, data model, and merge algorithm are best-in-class and should be adopted with minimal modifications.
**Key Takeaways**:
1. **Architecture**: 4-stage pipeline is proven and extensible
2. **Data Model**: HarmonyRelease schema is comprehensive and well-designed
3. **Merge Algorithm**: 3-phase merge with provider preferences solves real problems
4. **Provider Abstraction**: Base class hierarchy enables easy provider addition
5. **Type Safety**: Full TypeScript coverage prevents bugs
6. **Testing**: Declarative provider tests and offline testing are excellent patterns
**Critical Additions**:
1. **REST API**: Essential for programmatic access
2. **Containerization**: Simplifies deployment
3. **Monitoring**: Required for production operations
4. **Documentation**: Improves onboarding and adoption
**Adoption Path**:
1. Study Harmony architecture (2-4 weeks)
2. Implement core components (6-8 weeks)
3. Add API and deployment (4-6 weeks)
4. Expand providers and features (ongoing)
**Expected Outcome**: Production-ready metadata aggregation system with 9+ providers, intelligent merging, and MusicBrainz integration within 3-4 months.
## Relevance Score: 10/10
Harmony is the **most relevant project** for metadata aggregation:
- **Architecture**: Best-in-class multi-source aggregation
- **Data Model**: Comprehensive and well-designed
- **MusicBrainz Integration**: Seamless seeding workflow
- **Code Quality**: Type-safe, well-tested, maintainable
- **Production-Ready**: Used by MusicBrainz community
**Recommendation**: **Adopt Harmony's architecture as the foundation** for the metadata aggregation system. The investment in studying and adapting Harmony will pay dividends in reduced development time, fewer bugs, and better design decisions.
@@ -0,0 +1,895 @@
# Harmony - Provider Integrations Analysis
## Provider Ecosystem Overview
Harmony integrates with **9 music metadata providers** using two primary access methods:
1. **API-based providers (5)**: Structured data via REST APIs
2. **HTML scraping providers (4)**: Data extraction from web pages
All providers share a common base architecture with URL pattern matching, rate limiting, caching, and harmonization to the `HarmonyRelease` schema.
## Provider Summary Table
| Provider | Type | Auth | Rate Limit | GTIN | Max Image | Regions | Status |
|----------|------|------|------------|------|-----------|---------|--------|
| Spotify | API | OAuth2 | Not specified | Yes (UPC) | 2000px | Global | Active |
| Deezer | API | Public | 50 req/5s | Yes | 1400px | Global | Active |
| iTunes | API | Public | Not specified | Yes | Varies | Multi-region | Active |
| Tidal | API | OAuth2 | Not specified | Yes | 1280px | Global | Active (v2) |
| MusicBrainz | API | Public | 5 req/5s | Yes (barcode) | N/A | Global | Active |
| Bandcamp | Scraping | None | Not specified | No | 3000px | Global | Active |
| Beatport | Scraping | None | Not specified | Yes | Varies | Global | Active |
| Mora | Scraping | None | Not specified | Yes | Varies | Japan | Active |
| Ototoy | Scraping | None | Not specified | Yes | Varies | Japan | Active |
## API-Based Providers
### 1. Spotify
**File**: `providers/spotify.ts`
#### Authentication
- **Method**: OAuth2 Client Credentials Flow
- **Credentials**: `HARMONY_SPOTIFY_CLIENT_ID`, `HARMONY_SPOTIFY_CLIENT_SECRET`
- **Token endpoint**: `https://accounts.spotify.com/api/token`
- **Token caching**: localStorage (dev) / sessionStorage (prod)
- **Token lifetime**: 3600 seconds (1 hour)
**OAuth2 Flow**:
```typescript
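// clientId and clientSecret are read from HARMONY_SPOTIFY_CLIENT_ID and
// HARMONY_SPOTIFY_CLIENT_SECRET
async function getAccessToken(): Promise<string> {
  const response = await fetch('https://accounts.spotify.com/api/token', {
    method: 'POST',
    headers: {
      'Authorization': `Basic ${btoa(`${clientId}:${clientSecret}`)}`,
      'Content-Type': 'application/x-www-form-urlencoded'
    },
    body: 'grant_type=client_credentials'
  });
  // Fail loudly on bad credentials instead of returning an undefined token
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.access_token;
}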
```
#### API Endpoints
| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /v1/albums/{id}` | Album lookup by Spotify ID | `/v1/albums/3DiDSNVBRYVzccLn2yqhMJ` |
| `GET /v1/search` | Search by UPC | `/v1/search?q=upc:0602537347377&type=album` |
#### URL Pattern
```typescript
urlPattern = new URLPattern({
  hostname: 'open.spotify.com',
  pathname: '/album/:id'
});
```
**Matches**:
- `https://open.spotify.com/album/3DiDSNVBRYVzccLn2yqhMJ`
- `https://open.spotify.com/album/3DiDSNVBRYVzccLn2yqhMJ?si=xyz`
#### Feature Quality
```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,          // UPC in external_ids
  title: FeatureQuality.GOOD,         // Album name
  artists: FeatureQuality.GOOD,       // Artist array with names
  releaseDate: FeatureQuality.GOOD,   // release_date field
  labels: FeatureQuality.PRESENT,     // Label name (no catalog number)
  media: FeatureQuality.GOOD,         // Disc structure
  tracks: FeatureQuality.GOOD,        // Track listing with durations
  isrc: FeatureQuality.GOOD,          // ISRC per track
  images: 2000,                       // Max 2000x2000px
  copyright: FeatureQuality.PRESENT,  // Copyright array
  availability: FeatureQuality.GOOD   // available_markets array
};
```
#### Data Mapping
**Spotify Album Object** → **HarmonyRelease**:
| Spotify Field | Harmony Field | Transformation |
|---------------|---------------|----------------|
| `name` | `title` | Direct |
| `artists[].name` | `artists[].name` | Map array |
| `external_ids.upc` | `gtin` | Direct |
| `release_date` | `releaseDate` | Parse to PartialDate |
| `label` | `labels[0].name` | Single label |
| `tracks.items[]` | `media[0].tracks[]` | Map to HarmonyTrack |
| `images[]` | `images[]` | Map with dimensions |
| `copyrights[0].text` | `copyright` | First copyright |
| `available_markets[]` | `availableIn[]` | Direct |
| `external_urls.spotify` | `externalLinks[0].url` | Streaming link |
**Example Harmonization**:
```typescript
harmonize(spotifyAlbum: SpotifyAlbum): HarmonyRelease {
  return {
    title: spotifyAlbum.name,
    artists: spotifyAlbum.artists.map(a => ({ name: a.name })),
    gtin: spotifyAlbum.external_ids?.upc,
    media: [{
      format: MediumFormat.Digital,
      position: 1,
      tracks: spotifyAlbum.tracks.items.map((t, i) => ({
        title: t.name,
        position: i + 1,
        length: t.duration_ms,
        isrc: t.external_ids?.isrc,
        artists: t.artists.length !== spotifyAlbum.artists.length
          ? t.artists.map(a => ({ name: a.name }))
          : undefined
      }))
    }],
    releaseDate: this.parseDate(spotifyAlbum.release_date),
    types: this.inferTypes(spotifyAlbum.album_type),
    images: spotifyAlbum.images.map(img => ({
      url: img.url,
      types: [ImageType.Front],
      width: img.width,
      height: img.height
    })),
    labels: spotifyAlbum.label ? [{ name: spotifyAlbum.label }] : [],
    copyright: spotifyAlbum.copyrights?.[0]?.text,
    availableIn: spotifyAlbum.available_markets,
    externalLinks: [{
      url: spotifyAlbum.external_urls.spotify,
      types: [LinkType.Streaming]
    }],
    info: {
      providers: ['spotify'],
      messages: []
    }
  };
}
```
#### Rate Limiting
- **Limit**: Not publicly specified
- **Handling**: Retry on 429 status with `Retry-After` header
- **Caching**: 24-hour cache reduces API calls
### 2. Deezer
**File**: `providers/deezer.ts`
#### Authentication
- **Method**: Public API (no authentication required)
- **Base URL**: `https://api.deezer.com`
#### Rate Limiting
- **Limit**: 50 requests per 5 seconds
- **Enforcement**: Server-side (429 status on exceed)
- **Handling**: Exponential backoff with `Retry-After` header
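The combination of `Retry-After` and exponential backoff can be sketched as a single delay calculation. This is an illustrative helper, not Harmony's code: the name `retryDelayMs`, the 1-second base, and the 60-second cap are assumptions. Only the delta-seconds form of `Retry-After` is handled; the header may also carry an HTTP date, which this sketch ignores.

```typescript
// Returns how long to wait (in ms) before retrying a rate-limited request.
// Prefers the server's Retry-After value (delta-seconds form); otherwise
// falls back to exponential backoff: baseMs * 2^attempt, capped at maxMs.
function retryDelayMs(
  retryAfter: string | null,
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 60000,
): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, maxMs);
    }
  }
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}
```

A caller would pass `response.headers.get('Retry-After')` together with the number of attempts already made, then sleep for the returned duration before retrying.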
#### API Endpoints
| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /album/{id}` | Album lookup by Deezer ID | `/album/123456` |
| `GET /search/album` | Search by UPC | `/search/album?q=upc:0602537347377` |
#### URL Pattern
```typescript
urlPattern = new URLPattern({
hostname: 'www.deezer.com',
pathname: '/:locale/album/:id'
});
```
**Matches**:
- `https://www.deezer.com/en/album/123456`
- `https://www.deezer.com/fr/album/123456`
#### Feature Quality
```typescript
featureQuality = {
gtin: FeatureQuality.GOOD, // UPC field
title: FeatureQuality.GOOD, // Title field
artists: FeatureQuality.GOOD, // Artist object
releaseDate: FeatureQuality.GOOD, // release_date field
labels: FeatureQuality.GOOD, // Label with catalog number
media: FeatureQuality.GOOD, // Disc structure
tracks: FeatureQuality.GOOD, // Track listing
isrc: FeatureQuality.GOOD, // ISRC per track
images: 1400, // Max 1400x1400px
copyright: FeatureQuality.GOOD, // Copyright field
availability: FeatureQuality.PRESENT // Available countries (limited)
};
```
#### Data Mapping
**Deezer Album Object** → **HarmonyRelease**:
| Deezer Field | Harmony Field | Notes |
|--------------|---------------|-------|
| `title` | `title` | Direct |
| `artist.name` | `artists[0].name` | Single artist |
| `upc` | `gtin` | Direct |
| `release_date` | `releaseDate` | YYYY-MM-DD format |
| `label` | `labels[0].name` | Label name |
| `tracks.data[]` | `media[0].tracks[]` | Track array |
| `cover_xl` | `images[0].url` | 1400x1400px |
| `copyright` | `copyright` | Direct |
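The mapping in the table can be sketched as a small function. The input interfaces below contain only the fields listed in the table plus an assumed track shape (`title`, and `duration` in seconds); the name `harmonizeDeezer` and the simplified output are illustrative and do not reproduce the full `HarmonyRelease` schema.

```typescript
// Assumed, minimal input shapes: only the fields named in the table above.
interface DeezerTrackSketch {
  title: string;
  duration: number; // seconds (assumption)
}
interface DeezerAlbumSketch {
  title: string;
  artist: { name: string };
  upc?: string;
  release_date: string; // "YYYY-MM-DD"
  label?: string;
  tracks: { data: DeezerTrackSketch[] };
  cover_xl?: string;
  copyright?: string;
}

// Applies the field mapping from the table. The return type is left
// inferred because the real HarmonyRelease schema is much larger.
function harmonizeDeezer(album: DeezerAlbumSketch) {
  const [year, month, day] = album.release_date.split("-").map(Number);
  return {
    title: album.title,
    artists: [{ name: album.artist.name }],
    gtin: album.upc,
    releaseDate: { year, month, day },
    labels: album.label ? [{ name: album.label }] : [],
    media: [{
      position: 1,
      tracks: album.tracks.data.map((t, i) => ({
        title: t.title,
        position: i + 1,
        length: t.duration * 1000, // milliseconds, matching the Spotify example
      })),
    }],
    images: album.cover_xl ? [{ url: album.cover_xl }] : [],
    copyright: album.copyright,
  };
}
```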
### 3. iTunes (Apple Music)
**File**: `providers/itunes.ts`
#### Authentication
- **Method**: Public API (no authentication required)
- **Base URL**: `https://itunes.apple.com`
#### Multi-Region Support
iTunes API is region-specific. Harmony queries multiple regions in parallel.
**Supported Regions**:
- `US` (United States)
- `GB` (United Kingdom)
- `DE` (Germany)
- `JP` (Japan)
- `FR` (France)
- `CA` (Canada)
- `AU` (Australia)
**Region-Specific Endpoints**:
```
https://itunes.apple.com/us/lookup?id=123456
https://itunes.apple.com/gb/lookup?id=123456
https://itunes.apple.com/jp/lookup?id=123456
```
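Querying several regions in parallel while tolerating regional failures can be sketched with `Promise.allSettled`. The function name `lookupAcrossRegions` and the injected `lookup` callback are assumptions made to keep the example independent of any HTTP client; Harmony's actual implementation may be structured differently.

```typescript
// Result of one regional lookup; `T` stands in for the provider's release type.
interface RegionResult<T> {
  region: string;
  release: T;
}

// Queries every region in parallel and keeps only the lookups that succeeded.
// `lookup` is injected so the sketch does not depend on a specific HTTP client.
async function lookupAcrossRegions<T>(
  regions: string[],
  lookup: (region: string) => Promise<T>,
): Promise<{ found: RegionResult<T>[]; failed: string[] }> {
  const settled = await Promise.allSettled(regions.map((region) => lookup(region)));
  const found: RegionResult<T>[] = [];
  const failed: string[] = [];
  settled.forEach((result, index) => {
    // allSettled preserves input order, so the index maps back to the region
    const region = regions[index] ?? "";
    if (result.status === "fulfilled") {
      found.push({ region, release: result.value });
    } else {
      failed.push(region);
    }
  });
  return { found, failed };
}
```

A release that is unavailable in one storefront then shows up in `failed` without discarding the regions that did respond.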
#### API Endpoints
| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /{region}/lookup` | Album lookup by iTunes ID | `/us/lookup?id=123456` |
| `GET /{region}/lookup` | Lookup by UPC | `/us/lookup?upc=0602537347377` |
#### URL Pattern
```typescript
urlPattern = new URLPattern({
hostname: 'music.apple.com',
pathname: '/:region/album/:name/:id'
});
```
**Matches**:
- `https://music.apple.com/us/album/album-name/123456`
- `https://music.apple.com/jp/album/album-name/123456`
#### Feature Quality
```typescript
featureQuality = {
gtin: FeatureQuality.GOOD, // UPC in response
title: FeatureQuality.GOOD, // collectionName
artists: FeatureQuality.GOOD, // artistName
releaseDate: FeatureQuality.GOOD, // releaseDate
labels: FeatureQuality.PRESENT, // copyright (label name embedded)
media: FeatureQuality.GOOD, // Track listing
tracks: FeatureQuality.GOOD, // Track array
isrc: FeatureQuality.MISSING, // Not provided
images: 'varies', // 600x600 to 3000x3000
copyright: FeatureQuality.PRESENT,// copyright field
availability: FeatureQuality.GOOD // Region-specific
};
```
### 4. Tidal
**File**: `providers/tidal.ts`
#### Authentication
- **Method**: OAuth2 Client Credentials Flow
- **Credentials**: `HARMONY_TIDAL_CLIENT_ID`, `HARMONY_TIDAL_CLIENT_SECRET`
- **Token endpoint**: `https://auth.tidal.com/v1/oauth2/token`
- **API version**: v2 (v1 deprecated 2025-01-21)
#### API Version Migration
**v1 (deprecated 2025-01-21)**:
- Endpoint: `https://api.tidal.com/v1/albums/{id}`
- Status: No longer supported
**v2 (current)**:
- Endpoint: `https://openapi.tidal.com/v2/albums/{id}`
- Migration: Completed in Harmony codebase
#### API Endpoints
| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /v2/albums/{id}` | Album lookup by Tidal ID | `/v2/albums/123456` |
| `GET /v2/albums/byBarcode/{upc}` | Lookup by UPC | `/v2/albums/byBarcode/0602537347377` |
#### URL Pattern
```typescript
urlPattern = new URLPattern({
hostname: 'tidal.com',
pathname: '/browse/album/:id'
});
```
**Matches**:
- `https://tidal.com/browse/album/123456`
- `https://listen.tidal.com/album/123456` (needs a separate pattern; the one shown above only matches `tidal.com/browse/album/:id`)
#### Feature Quality
```typescript
featureQuality = {
gtin: FeatureQuality.GOOD, // barcode field
title: FeatureQuality.GOOD, // title field
artists: FeatureQuality.GOOD, // artists array
releaseDate: FeatureQuality.GOOD, // releaseDate
labels: FeatureQuality.GOOD, // label with catalog number
media: FeatureQuality.GOOD, // Media array
tracks: FeatureQuality.GOOD, // Track listing
isrc: FeatureQuality.GOOD, // ISRC per track
images: 1280, // Max 1280x1280px
copyright: FeatureQuality.GOOD, // copyright field
availability: FeatureQuality.GOOD // Available countries
};
```
### 5. MusicBrainz
**File**: `providers/musicbrainz.ts`
#### Authentication
- **Method**: Public API (no authentication required)
- **Base URL**: Configurable via `HARMONY_MB_API_URL` (default: `https://musicbrainz.org/ws/2`)
#### Rate Limiting
- **Limit**: 5 requests per 5 seconds (1 req/sec average)
- **Enforcement**: Server-side (503 status on exceed)
- **Handling**: Exponential backoff, respect `Retry-After` header
#### API Endpoints
| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /release/{mbid}` | Release lookup by MBID | `/release/12345678-1234-1234-1234-123456789012` |
| `GET /release?barcode={gtin}` | Search by barcode | `/release?barcode=0602537347377` |
| `GET /url?resource={url}` | MBID resolution | `/url?resource=https://open.spotify.com/album/xyz` |
#### URL Pattern
```typescript
urlPattern = new URLPattern({
hostname: 'musicbrainz.org',
pathname: '/release/:mbid'
});
```
**Matches**:
- `https://musicbrainz.org/release/12345678-1234-1234-1234-123456789012`
#### Feature Quality
```typescript
featureQuality = {
gtin: FeatureQuality.GOOD, // barcode field
title: FeatureQuality.GOOD, // title field
artists: FeatureQuality.GOOD, // artist-credit array
releaseDate: FeatureQuality.GOOD, // date field
labels: FeatureQuality.GOOD, // label-info array
media: FeatureQuality.GOOD, // media array
tracks: FeatureQuality.GOOD, // track array
isrc: FeatureQuality.GOOD, // ISRC per recording
images: FeatureQuality.MISSING, // No images in API
copyright: FeatureQuality.MISSING,// Not in API
availability: FeatureQuality.MISSING // Not tracked
};
```
#### Special Role: Template Provider
MusicBrainz serves as a **template provider** for the merge algorithm:
- **Purpose**: Provide reference data for comparison
- **Usage**: `musicbrainz!` parameter in URL
- **Behavior**: MusicBrainz data used as baseline, other providers compared against it
- **Use case**: Verify existing MusicBrainz releases against external sources
#### MBID Resolution
**Batch URL Lookup** (up to 100 URLs per request):
```typescript
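async function resolveMBIDs(urls: string[]): Promise<Map<string, string>> {
  // One `resource` parameter per URL (up to 100 per request)
  const params = urls.map(url => `resource=${encodeURIComponent(url)}`).join('&');
  // `fmt=json` is needed because the web service returns XML by default
  const response = await fetch(`https://musicbrainz.org/ws/2/url?${params}&inc=release-rels&fmt=json`);
  if (!response.ok) {
    throw new Error(`MusicBrainz URL lookup failed: ${response.status}`);
  }
  const data = await response.json();
  const mbids = new Map<string, string>();
  // Guard against a missing `urls` or `relations` array
  for (const urlData of data.urls ?? []) {
    const mbid = urlData.relations?.find((r: { type: string }) => r.type === 'streaming')?.release?.id;
    if (mbid) {
      mbids.set(urlData.resource, mbid);
    }
  }
  return mbids;
}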
```
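Because a single request accepts at most 100 URLs, longer lists have to be split into batches first. A generic helper for that could look like the following sketch; `chunk` is a hypothetical name, not a function from the Harmony codebase.

```typescript
// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch from `chunk(urls, 100)` can then be passed to `resolveMBIDs` in turn, with the rate limiter applied between requests.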
**Duplicate Detection**:
- Check if external URLs already linked to MusicBrainz releases
- Warn user before creating duplicate
- Provide link to existing release
## HTML Scraping Providers
### 6. Bandcamp
**File**: `providers/bandcamp.ts`
#### Scraping Method
- **Technique**: JSON-LD extraction from `<script type="application/ld+json">`
- **Fallback**: HTML parsing with CSS selectors
- **Reliability**: High (JSON-LD is stable)
#### URL Pattern
```typescript
urlPattern = new URLPattern({
hostname: '*.bandcamp.com',
pathname: '/album/:slug'
});
```
**Matches**:
- `https://artist.bandcamp.com/album/album-name`
- `https://label.bandcamp.com/album/album-name`
#### Data Extraction
**JSON-LD Schema.org MusicAlbum**:
```json
{
"@type": "MusicAlbum",
"name": "Album Title",
"byArtist": {
"@type": "MusicGroup",
"name": "Artist Name"
},
"datePublished": "2014-11-24",
"image": "https://f4.bcbits.com/img/a123456789_10.jpg",
"track": [
{
"@type": "MusicRecording",
"name": "Track 1",
"duration": "PT4M5S"
}
],
"recordLabel": {
"@type": "Organization",
"name": "Label Name"
}
}
```
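Two steps are implied by the JSON-LD example: pulling the script block out of the page and converting ISO 8601 durations such as `PT4M5S` into milliseconds. The sketch below shows one way to do both with regular expressions; the function names are hypothetical, and a real scraper would more likely use an HTML parser than a regex.

```typescript
// Converts an ISO 8601 duration such as "PT4M5S" to milliseconds.
// Only hours, minutes and seconds are handled (enough for track lengths);
// the "T" separator is treated as optional.
function parseIsoDurationMs(duration: string): number | undefined {
  const m = /^PT?(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$/.exec(duration);
  if (!m) return undefined;
  const hours = Number(m[1] ?? 0);
  const minutes = Number(m[2] ?? 0);
  const seconds = Number(m[3] ?? 0);
  return Math.round(((hours * 60 + minutes) * 60 + seconds) * 1000);
}

// Returns the parsed content of the first JSON-LD script block, if any.
function extractJsonLd(html: string): unknown {
  const m = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/i.exec(html);
  if (!m || m[1] === undefined) return undefined;
  return JSON.parse(m[1]);
}
```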
#### Feature Quality
```typescript
featureQuality = {
gtin: FeatureQuality.MISSING, // Not provided
title: FeatureQuality.GOOD, // name field
artists: FeatureQuality.GOOD, // byArtist
releaseDate: FeatureQuality.GOOD, // datePublished
labels: FeatureQuality.GOOD, // recordLabel
media: FeatureQuality.GOOD, // track array
tracks: FeatureQuality.GOOD, // Track listing
isrc: FeatureQuality.MISSING, // Not provided
images: 3000, // Max 3000x3000px (a123456789_10.jpg)
copyright: FeatureQuality.PRESENT,// publisher field
availability: FeatureQuality.MISSING // Not specified
};
```
#### Challenges
- **No GTIN**: Bandcamp doesn't display barcodes
- **Subdomain variability**: Each artist/label has unique subdomain
- **Rate limiting**: Not publicly specified, conservative approach
### 7. Beatport
**File**: `providers/beatport.ts`
#### Scraping Method
- **Technique**: HTML parsing with CSS selectors
- **Reliability**: Medium (HTML structure changes break scraper)
#### URL Pattern
```typescript
urlPattern = new URLPattern({
hostname: 'www.beatport.com',
pathname: '/release/:slug/:id'
});
```
**Matches**:
- `https://www.beatport.com/release/album-name/123456`
#### Data Extraction
**CSS Selectors**:
```typescript
const selectors = {
title: '.interior-release-chart-content-item h1',
artists: '.interior-release-chart-content-item .artist a',
releaseDate: '.interior-release-chart-content-item .release-date',
label: '.interior-release-chart-content-item .label a',
catalogNumber: '.interior-release-chart-content-item .catalog-number',
tracks: '.track-grid .track',
trackTitle: '.track-title',
trackArtists: '.track-artists a',
trackLength: '.track-length',
coverImage: '.interior-release-chart-artwork img'
};
```
#### Feature Quality
```typescript
featureQuality = {
gtin: FeatureQuality.PRESENT, // Sometimes in metadata
title: FeatureQuality.GOOD, // h1 element
artists: FeatureQuality.GOOD, // Artist links
releaseDate: FeatureQuality.GOOD, // Release date element
labels: FeatureQuality.GOOD, // Label + catalog number
media: FeatureQuality.GOOD, // Track grid
tracks: FeatureQuality.GOOD, // Track listing
isrc: FeatureQuality.MISSING, // Not displayed
images: 'varies', // Cover image
copyright: FeatureQuality.MISSING,// Not displayed
availability: FeatureQuality.MISSING // Not specified
};
```
#### Challenges
- **HTML structure changes**: Frequent redesigns break selectors
- **JavaScript rendering**: Some content loaded dynamically
- **Rate limiting**: Not specified, risk of IP blocking
### 8. Mora (Japan)
**File**: `providers/mora.ts`
#### Scraping Method
- **Technique**: HTML parsing with CSS selectors
- **Language**: Japanese (requires UTF-8 handling)
- **Reliability**: Medium
#### URL Pattern
```typescript
urlPattern = new URLPattern({
hostname: 'mora.jp',
pathname: '/package/:id'
});
```
**Matches**:
- `https://mora.jp/package/123456`
#### Data Extraction
**CSS Selectors** (Japanese labels):
```typescript
const selectors = {
title: '.productTitle',
artists: '.artistName a',
releaseDate: '.releaseDate',
label: '.labelName',
catalogNumber: '.catalogNumber',
tracks: '.trackList .track',
coverImage: '.productImage img'
};
```
#### Feature Quality
```typescript
featureQuality = {
gtin: FeatureQuality.PRESENT, // JAN code (Japanese barcode)
title: FeatureQuality.GOOD, // Product title
artists: FeatureQuality.GOOD, // Artist links
releaseDate: FeatureQuality.GOOD, // Release date
labels: FeatureQuality.GOOD, // Label + catalog number
media: FeatureQuality.GOOD, // Track list
tracks: FeatureQuality.GOOD, // Track details
isrc: FeatureQuality.MISSING, // Not displayed
images: 'varies', // Product image
copyright: FeatureQuality.PRESENT,// Copyright notice
availability: FeatureQuality.GOOD // Japan-specific
};
```
#### Challenges
- **Japanese text**: Requires proper encoding and language detection
- **JAN vs. UPC**: Japanese Article Number may differ from international UPC
- **Regional availability**: Japan-only releases
### 9. Ototoy (Japan)
**File**: `providers/ototoy.ts`
#### Scraping Method
- **Technique**: HTML parsing with CSS selectors
- **Language**: Japanese
- **Reliability**: Medium
#### URL Pattern
```typescript
urlPattern = new URLPattern({
hostname: 'ototoy.jp',
pathname: '/album/:id'
});
```
**Matches**:
- `https://ototoy.jp/album/123456`
#### Feature Quality
```typescript
featureQuality = {
gtin: FeatureQuality.PRESENT, // JAN code
title: FeatureQuality.GOOD, // Album title
artists: FeatureQuality.GOOD, // Artist name
releaseDate: FeatureQuality.GOOD, // Release date
labels: FeatureQuality.GOOD, // Label info
media: FeatureQuality.GOOD, // Track list
tracks: FeatureQuality.GOOD, // Track details
isrc: FeatureQuality.MISSING, // Not displayed
images: 'varies', // Album art
copyright: FeatureQuality.PRESENT,// Copyright info
availability: FeatureQuality.GOOD // Japan-specific
};
```
## Provider Base Architecture
### MetadataProvider (Abstract Base)
**File**: `providers/base.ts`
**Core Functionality**:
```typescript
abstract class MetadataProvider {
// Identity
abstract name: string;
abstract urlPattern: URLPattern;
// Lookup methods
abstract lookupByUrl(url: string): Promise<ProviderRelease>;
abstract lookupByGtin(gtin: string, region?: string): Promise<ProviderRelease>;
// Harmonization
abstract harmonize(release: ProviderRelease): HarmonyRelease;
// Feature quality
abstract featureQuality: FeatureQualityMap;
// Rate limiting
protected rateLimit: RateLimiter;
protected async throttle(): Promise<void> {
await this.rateLimit.wait();
}
// Caching
protected cache: SnapStorage;
protected async getCached(key: string): Promise<Response | null> {
return await this.cache.get(key);
}
protected async setCached(key: string, response: Response): Promise<void> {
await this.cache.set(key, response);
}
// URL matching
matchesUrl(url: string): boolean {
return this.urlPattern.test(url);
}
}
```
### MetadataApiProvider (OAuth2)
**File**: `providers/api_base.ts`
**OAuth2 Support**:
```typescript
abstract class MetadataApiProvider extends MetadataProvider {
  protected abstract clientId: string;
  protected abstract clientSecret: string;
  protected abstract tokenEndpoint: string;

  protected async getAccessToken(): Promise<string> {
    // Reuse the cached token while it is still valid
    const cached = this.getTokenFromCache();
    if (cached && !this.isTokenExpired(cached)) {
      return cached.access_token;
    }
    // Otherwise request a new token and cache it
    const token = await this.requestToken();
    this.cacheToken(token);
    return token.access_token;
  }

  // Abstract methods cannot carry the `async` modifier; the Promise return type is sufficient
  protected abstract requestToken(): Promise<OAuth2Token>;

  // Wraps the global fetch and adds the bearer token to every request
  protected async fetch(url: string, options?: RequestInit): Promise<Response> {
    const token = await this.getAccessToken();
    return await fetch(url, {
      ...options,
      headers: {
        ...options?.headers,
        'Authorization': `Bearer ${token}`
      }
    });
  }
}
```
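The `isTokenExpired` check used above is not shown. A plausible version compares the current time against the token's lifetime minus a small safety margin. The `CachedToken` shape, the `obtainedAt` field, and the 30-second margin are assumptions for this sketch.

```typescript
// Assumed token shape: standard OAuth2 fields plus the time it was obtained.
interface CachedToken {
  access_token: string;
  expires_in: number; // lifetime in seconds, as returned by the token endpoint
  obtainedAt: number; // epoch milliseconds, recorded by the client
}

// A token counts as expired slightly before its real expiry so that a request
// started just before the deadline does not fail mid-flight.
function isTokenExpired(
  token: CachedToken,
  now: number = Date.now(),
  marginMs: number = 30000,
): boolean {
  return now >= token.obtainedAt + token.expires_in * 1000 - marginMs;
}
```

Passing `now` as a parameter keeps the check deterministic in tests.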
### RateLimiter
**File**: `utils/rate_limiter.ts`
**Implementation**:
```typescript
class RateLimiter {
private queue: number[] = [];
private maxRequests: number;
private timeWindow: number; // milliseconds
constructor(maxRequests: number, timeWindow: number) {
this.maxRequests = maxRequests;
this.timeWindow = timeWindow;
}
async wait(): Promise<void> {
const now = Date.now();
// Remove old requests outside time window
this.queue = this.queue.filter(t => now - t < this.timeWindow);
// If at limit, wait until oldest request expires
if (this.queue.length >= this.maxRequests) {
const oldestRequest = this.queue[0];
const waitTime = this.timeWindow - (now - oldestRequest);
await new Promise(resolve => setTimeout(resolve, waitTime));
return this.wait(); // Recursive call after waiting
}
// Add current request to queue
this.queue.push(now);
}
}
// Usage
const deezerLimiter = new RateLimiter(50, 5000); // 50 req / 5 sec
const mbLimiter = new RateLimiter(5, 5000); // 5 req / 5 sec
```
## Provider Registry
**File**: `providers/registry.ts`
**Registration**:
```typescript
class ProviderRegistry {
private providers = new Map<string, MetadataProvider>();
private categories = new Map<string, string[]>();
register(provider: MetadataProvider, category: string): void {
this.providers.set(provider.name, provider);
if (!this.categories.has(category)) {
this.categories.set(category, []);
}
this.categories.get(category)!.push(provider.name);
}
get(name: string): MetadataProvider | undefined {
return this.providers.get(name);
}
getByCategory(category: string): MetadataProvider[] {
const names = this.categories.get(category) || [];
return names.map(name => this.providers.get(name)!);
}
getByUrl(url: string): MetadataProvider | undefined {
for (const provider of this.providers.values()) {
if (provider.matchesUrl(url)) {
return provider;
}
}
return undefined;
}
getByGtin(): MetadataProvider[] {
return Array.from(this.providers.values()).filter(p =>
p.featureQuality.gtin !== FeatureQuality.MISSING
);
}
}
// Initialize registry
const registry = new ProviderRegistry();
registry.register(new SpotifyProvider(), 'preferred');
registry.register(new DeezerProvider(), 'default');
registry.register(new iTunesProvider(), 'default');
registry.register(new TidalProvider(), 'preferred');
registry.register(new MusicBrainzProvider(), 'preferred');
registry.register(new BandcampProvider(), 'all');
registry.register(new BeatportProvider(), 'all');
registry.register(new MoraProvider(), 'japan');
registry.register(new OtotoyProvider(), 'japan');
```
## Not Implemented: KKBOX
**Status**: Mentioned in documentation but not implemented
**Reason**: Unknown (possibly API access issues or low priority)
**Potential Implementation**:
- **Region**: Taiwan, Hong Kong, Japan, Singapore, Malaysia
- **API**: Public API available
- **Authentication**: API key required
- **Data quality**: High (official metadata)
## Summary
Harmony's provider integration demonstrates:
1. **Diverse access methods**: API-based (5) and HTML scraping (4)
2. **Unified abstraction**: All providers implement common interface
3. **OAuth2 support**: Spotify and Tidal with token caching
4. **Rate limiting**: Per-provider rate limiters with exponential backoff
5. **Multi-region support**: iTunes queries multiple regions in parallel
6. **Feature quality ratings**: Transparent quality assessment per provider
7. **Graceful degradation**: `Promise.allSettled` ensures partial results
8. **MusicBrainz integration**: MBID resolution and duplicate detection
9. **Caching**: 24-hour HTTP response cache reduces API calls
This architecture is production-ready and serves as an excellent reference for building multi-source metadata aggregation systems.
+394
@@ -0,0 +1,394 @@
# Harmony - Project Overview
## Project Identity
| Property | Value |
|----------|-------|
| **Name** | Harmony |
| **Repository** | https://github.com/kellnerd/harmony |
| **License** | MIT (2022-2024 David Kellner) |
| **Language** | TypeScript |
| **Runtime** | Deno |
| **Primary Framework** | Fresh 1.6.8 |
| **UI Library** | Preact 10.19.6 |
| **Purpose** | Music metadata aggregator and MusicBrainz importer |
## Core Purpose
Harmony is a specialized tool designed to solve two critical problems in music metadata management:
1. **Multi-source metadata aggregation**: Fetches release information from 9 different music platforms and intelligently merges them into a unified, harmonized dataset
2. **MusicBrainz import facilitation**: Converts aggregated metadata into MusicBrainz-compatible format for seeding new releases or improving existing entries
The project targets MusicBrainz editors and music metadata enthusiasts who need to cross-reference multiple sources when adding or verifying release information.
## Technical Stack
### Runtime and Framework
- **Deno**: Modern TypeScript/JavaScript runtime with built-in tooling
- **Fresh 1.6.8**: Deno-native web framework with server-side rendering and islands architecture
- **Preact 10.19.6**: Lightweight React alternative for interactive UI components
### Key Dependencies
| Dependency | Purpose |
|------------|---------|
| `@kellnerd/musicbrainz` | MusicBrainz API client and data structures |
| `snap_storage` | HTTP response caching with SQLite backend |
| `@std/*` | Deno standard library modules (log, testing, http, etc.) |
| `preact` | UI rendering and component system |
| `preact-render-to-string` | Server-side rendering |
## Entry Points
The project provides three distinct entry points for different use cases:
### 1. Web Server (Production)
```bash
# File: server/main.ts
deno task server
```
Starts the Fresh web application for interactive metadata lookup and comparison.
### 2. Development Server
```bash
# File: server/dev.ts
deno task dev
```
Runs the web server with auto-reload on file changes.
### 3. Command-Line Interface
```bash
# File: cli.ts
deno task cli
```
Provides terminal-based GTIN/URL lookup for testing and automation.
## Available Tasks
The `deno.json` configuration defines the following tasks:
| Task | Command | Purpose |
|------|---------|---------|
| `check` | `deno fmt --check && deno lint && deno check **/*.ts` | Verify code formatting, linting, and type checking |
| `ok` | `deno fmt && deno lint && deno check **/*.ts && deno test -A` | Format, lint, check, and test in one command |
| `cli` | `deno run -A cli.ts` | Run command-line interface |
| `dev` | `deno run -A --watch=static/,routes/ server/dev.ts` | Start development server with auto-reload |
| `build` | `deno run -A server/dev.ts build` | Build static assets |
| `server` | `DENO_DEPLOYMENT_ID=$(git describe --tags --always) deno run -A server/main.ts` | Start production server |
## Provider Ecosystem
Harmony integrates with 9 music metadata providers, categorized by access method:
### API-Based Providers (5)
| Provider | Authentication | Rate Limit | Max Image Size | GTIN Support |
|----------|---------------|------------|----------------|--------------|
| **Spotify** | OAuth2 | Not specified | 2000px | Yes (UPC) |
| **Deezer** | Public API | 50 req/5s | 1400px | Yes |
| **iTunes** | Public API | Not specified | Varies | Yes |
| **Tidal** | OAuth2 | Not specified | 1280px | Yes |
| **MusicBrainz** | Public API | 5 req/5s | N/A | Yes (barcode) |
### HTML Scraping Providers (4)
| Provider | Region | Max Image Size | GTIN Support | Notes |
|----------|--------|----------------|--------------|-------|
| **Bandcamp** | Global | 3000px | No | JSON-LD extraction |
| **Beatport** | Global | Varies | Yes | Electronic music focus |
| **Mora** | Japan | Varies | Yes | Japanese market |
| **Ototoy** | Japan | Varies | Yes | Japanese market |
### Not Implemented
- **KKBOX**: Mentioned in documentation but not implemented
## Architecture Highlights
Harmony employs a **4-stage pipeline** for metadata processing:
1. **LOOKUP**: `CombinedReleaseLookup` queries multiple providers in parallel
2. **HARMONIZE**: Each provider converts its native format to `HarmonyRelease` schema
3. **MERGE**: Combines releases from multiple providers using configurable preferences
4. **SEED**: Converts harmonized data to MusicBrainz import format
This pipeline ensures:
- Parallel provider queries for performance
- Standardized internal data representation
- Intelligent conflict resolution
- MusicBrainz-compatible output
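To make the MERGE stage concrete, here is a deliberately simplified sketch of preference-based merging: for each field, the value from the highest-ranked provider that supplies one wins. The types and the name `mergeByPreference` are hypothetical, and Harmony's real merge is more involved (it includes compatibility checks that this sketch omits).

```typescript
// Simplified stand-in for HarmonyRelease: just a few optional fields.
interface ReleaseSketch {
  title?: string;
  gtin?: string;
  label?: string;
}

// For every field, take the value from the first provider in `preference`
// that supplies one, and record which provider won each field.
function mergeByPreference(
  releases: Record<string, ReleaseSketch>,
  preference: string[],
): { merged: ReleaseSketch; sources: Record<string, string> } {
  const merged: ReleaseSketch = {};
  const sources: Record<string, string> = {};
  const fields: (keyof ReleaseSketch)[] = ["title", "gtin", "label"];
  for (const field of fields) {
    for (const provider of preference) {
      const value = releases[provider]?.[field];
      if (value !== undefined && value !== "") {
        merged[field] = value;
        sources[field] = provider;
        break;
      }
    }
  }
  return { merged, sources };
}
```

Recording the winning provider per field is what lets a UI show where each value came from.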
## Data Storage Strategy
Harmony uses a **cache-first, no-database** approach:
- **snap_storage**: SQLite-backed HTTP response cache (`snaps.db` + `snaps/` directory)
- **24-hour default cache policy**: Reduces API calls and enables permalink functionality
- **Permalink system**: `ts` parameter replays cached lookups for reproducible results
- **In-memory processing**: All data transformations happen in memory; nothing is persisted apart from the response cache
This design prioritizes:
- Reproducibility (permalinks)
- API rate limit compliance
- Simplicity (no database migrations)
- Statelessness (no user data storage)
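The interplay between the 24-hour cache policy and permalink replay can be illustrated with a small in-memory model. This is not the `snap_storage` API: the class, its method names, and the rule "replay the newest snapshot taken at or before `ts`" are assumptions chosen for the sketch.

```typescript
// One cached response; timestamps are epoch seconds.
interface Snapshot {
  timestamp: number;
  body: string;
}

// Keeps every snapshot per key so that older lookups can be replayed later.
class SnapshotCacheSketch {
  private snaps = new Map<string, Snapshot[]>();

  store(key: string, body: string, timestamp: number): void {
    const list = this.snaps.get(key) ?? [];
    list.push({ timestamp, body });
    list.sort((a, b) => a.timestamp - b.timestamp);
    this.snaps.set(key, list);
  }

  // Normal lookup: newest snapshot, but only if it is younger than the max age.
  getFresh(key: string, now: number, maxAgeSeconds: number = 24 * 60 * 60): Snapshot | undefined {
    const list = this.snaps.get(key) ?? [];
    const latest = list[list.length - 1];
    return latest !== undefined && now - latest.timestamp <= maxAgeSeconds ? latest : undefined;
  }

  // Permalink replay: newest snapshot taken at or before `ts`.
  getAt(key: string, ts: number): Snapshot | undefined {
    const list = this.snaps.get(key) ?? [];
    let found: Snapshot | undefined;
    for (const snap of list) {
      if (snap.timestamp <= ts) found = snap;
    }
    return found;
  }
}
```

Keeping old snapshots instead of overwriting them is what makes a permalink reproducible after the upstream data has changed.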
## Deployment Model
Harmony is designed for **self-hosted deployment** without containerization:
### Production Deployment
```bash
deno run -A server/main.ts
```
Environment variables:
- `PORT`: Server port (default varies)
- `DENO_DEPLOYMENT_ID`: Version identifier (auto-set from git tags)
- `HARMONY_SPOTIFY_CLIENT_ID` / `HARMONY_SPOTIFY_CLIENT_SECRET`
- `HARMONY_TIDAL_CLIENT_ID` / `HARMONY_TIDAL_CLIENT_SECRET`
- `HARMONY_MB_API_URL`: MusicBrainz API endpoint
- `HARMONY_MB_TARGET_URL`: MusicBrainz target instance
- `HARMONY_DATA_DIR`: Data directory for cache storage
### CI/CD Pipeline
GitHub Actions workflow (`deno.yml`):
1. **Test stage**: Format check, lint, type check, unit tests
2. **Deploy stage**: SSH to server, rsync code, systemd service restart
3. **Trigger**: Tagged releases (`v*`) and authorized users only
### No Docker
The project intentionally avoids containerization:
- Deno provides consistent runtime across environments
- Fresh framework handles asset bundling
- Simple systemd service management
- Direct SSH deployment
## CLI Usage
The command-line interface supports GTIN and URL lookups:
```bash
# GTIN lookup
deno task cli --gtin 0602537347377
# URL lookup
deno task cli --url https://open.spotify.com/album/xyz
# Multiple URLs
deno task cli --url https://open.spotify.com/album/xyz --url https://www.deezer.com/album/123
# Region-specific lookup
deno task cli --gtin 0602537347377 --region JP,US
```
Output includes:
- Harmonized release metadata
- Provider comparison
- Compatibility warnings
- MusicBrainz seeding data
## Web Interface
The Fresh-based web UI provides:
### Main Route: `/release`
Query parameters:
- `gtin`: Global Trade Item Number (barcode)
- `url`: Provider URL(s) - supports multiple
- `region`: Market regions (default: GB,US,DE,JP)
- `category`: Provider category filter (all/default/preferred)
- `[provider_name]`: Provider-specific ID or GTIN lookup
- `[provider_name]!`: Template mode for provider
- `ts`: Timestamp for permalink replay
### Additional Routes
| Route | Purpose |
|-------|---------|
| `/` | Landing page with documentation |
| `/release/actions` | ISRC/cover submission for existing MusicBrainz releases |
| `/about` | Provider documentation and feature comparison |
| `/settings` | User preferences (stored in cookies) |
### UI Components
- **22 static components**: Server-rendered UI elements
- **5 interactive islands**: Client-side interactive features (Fresh islands architecture)
## Feature Quality System
Providers are rated on feature quality using a standardized scale:
| Rating | Meaning |
|--------|---------|
| `MISSING` | Feature not available |
| `BAD` | Feature present but unreliable/incomplete |
| `PRESENT` | Feature available with acceptable quality |
| `GOOD` | Feature available with high quality |
| Numeric | Specific measurements (e.g., image dimensions) |
This system enables:
- Informed provider selection
- Merge algorithm prioritization
- User transparency about data quality
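One way such ratings could drive provider selection is sketched below. Only the four rating names come from the table; the numeric ordering, the enum name, and `bestProviderFor` are assumptions, and numeric ratings such as image dimensions are left out for simplicity.

```typescript
// Assumed ordering: a higher value means better quality.
enum FeatureQualitySketch {
  MISSING = 0,
  BAD = 1,
  PRESENT = 2,
  GOOD = 3,
}

type QualityTable = Record<string, Record<string, FeatureQualitySketch>>;

// Returns the provider with the highest rating for a feature,
// or undefined when no provider offers it at all.
function bestProviderFor(feature: string, providers: QualityTable): string | undefined {
  let best: string | undefined;
  let bestQuality = FeatureQualitySketch.MISSING;
  for (const providerName of Object.keys(providers)) {
    const quality = providers[providerName]?.[feature] ?? FeatureQualitySketch.MISSING;
    if (quality > bestQuality) {
      best = providerName;
      bestQuality = quality;
    }
  }
  return best;
}
```

With the ratings listed earlier, a query for `copyright` would prefer Deezer (`GOOD`) over Spotify (`PRESENT`).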
## Development Workflow
### Code Quality Standards
```bash
# Format code (tabs, single quotes, 120 char width)
deno fmt
# Lint code
deno lint
# Type check
deno check **/*.ts
# Run tests
deno test -A
# All-in-one
deno task ok
```
### Testing Infrastructure
- **38 test files**: Comprehensive test coverage
- **Declarative provider specs**: `describeProvider` helper for consistent provider testing
- **Snapshot testing**: Verify output stability
- **Offline mode**: 43 cached responses in `testdata/` directory
- **Download flag**: `--download` to fetch fresh test data
### Logging System
5 specialized loggers using Deno std/log:
| Logger | Level | Purpose |
|--------|-------|---------|
| `harmony.lookup` | INFO | Release lookup operations |
| `harmony.mbid` | DEBUG | MusicBrainz ID resolution |
| `harmony.provider` | DEBUG/INFO | Provider interactions |
| `harmony.server` | INFO | Server lifecycle events |
| `requests` | INFO/WARN | HTTP request logging |
All loggers use `ConsoleHandler` with color formatting for readability.
## Error Handling Philosophy
Harmony uses a **graceful degradation** approach:
### Error Hierarchy
```
LookupError (base)
└── ProviderError
    ├── ResponseError (HTTP/API errors)
    ├── CompatibilityError (data conflicts)
    └── CacheMissError (cache lookup failures)
```
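Expressed as TypeScript classes, the hierarchy could look like the sketch below. It assumes that the three specific errors derive from `ProviderError`; the constructor signatures, the `providerName` field, and the `toWarning` helper are assumptions and not taken from Harmony's source.

```typescript
class LookupError extends Error {
  constructor(message: string) {
    super(message);
    this.name = new.target.name;
    // Keeps instanceof working when compiled to older JavaScript targets
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ProviderError extends LookupError {
  constructor(readonly providerName: string, message: string) {
    super(message);
  }
}

class ResponseError extends ProviderError {}
class CompatibilityError extends ProviderError {}
class CacheMissError extends ProviderError {}

// Turns any thrown value into a short warning suitable for display.
function toWarning(error: unknown): string {
  if (error instanceof ProviderError) return `${error.providerName}: ${error.message}`;
  if (error instanceof LookupError) return `Lookup failed: ${error.message}`;
  return "Unexpected error";
}
```

Checking the most specific class first lets callers distinguish a rate-limited provider from a genuine data conflict while still handling unknown failures.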
### Resilience Strategy
- `Promise.allSettled`: Continue processing even if some providers fail
- Rate limit handling: Parse `Retry-After` headers, dynamic delay adjustment
- Partial results: Return available data even with provider failures
- User feedback: Display warnings for failed providers
## Project Maturity
### Strengths
- **Single developer project**: Consistent vision and architecture
- **Active maintenance**: Recent Tidal v1 deprecation handling (2025-01-21)
- **Production-ready**: Used by MusicBrainz community
- **Well-tested**: 38 test files with offline test data
- **Type-safe**: Full TypeScript coverage with 273-line `HarmonyRelease` schema
### Limitations
- **No REST API**: Web UI only, no programmatic JSON endpoints
- **No authentication**: Public access only
- **No metrics/monitoring**: No health endpoint, no Sentry integration
- **Scraping fragility**: HTML-based providers break when sites change
- **Deno-only**: Fresh framework ties project to Deno ecosystem
## Relevance to Metadata Aggregation
Among the projects surveyed, Harmony is the most relevant reference for multi-source music metadata aggregation:
### Architectural Lessons
1. **Provider abstraction**: Base classes with URLPattern matching, rate limiting, caching
2. **Harmonized schema**: `HarmonyRelease` as universal internal format
3. **Intelligent merging**: 3-phase merge with provider preferences
4. **Permalink system**: Timestamp-based cache replay for reproducibility
5. **Quality ratings**: Per-feature, per-provider quality assessment
### Adoption Recommendations
- **HarmonyRelease schema**: Adopt as internal data model
- **Merge algorithm**: Study 3-phase merge with compatibility checking
- **Provider base classes**: Reuse abstraction patterns
- **MBID resolution**: Batch URL lookup (100 per request) is efficient
- **Testing framework**: Declarative provider specs with offline mode
## Configuration Management
### Environment Variables
```bash
# OAuth2 Credentials
HARMONY_SPOTIFY_CLIENT_ID=your_client_id
HARMONY_SPOTIFY_CLIENT_SECRET=your_client_secret
HARMONY_TIDAL_CLIENT_ID=your_client_id
HARMONY_TIDAL_CLIENT_SECRET=your_client_secret
# MusicBrainz Integration
HARMONY_MB_API_URL=https://musicbrainz.org/ws/2
HARMONY_MB_TARGET_URL=https://musicbrainz.org
# Storage
HARMONY_DATA_DIR=/path/to/data
# Server
PORT=8000
FORWARD_PROTO=https
```
### Configuration Helpers
Located in `utils/config.ts`:
- `getFromEnv(key, defaultValue)`: String environment variables
- `getBooleanFromEnv(key, defaultValue)`: Boolean parsing
- `getUrlFromEnv(key, defaultValue)`: URL validation
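A rough idea of what such helpers do is sketched below. Unlike the signatures listed above, these versions take the environment as an explicit first argument so the example runs on any runtime; the accepted boolean spellings and the treatment of empty strings are assumptions, not Harmony's documented behaviour.

```typescript
type EnvSketch = Record<string, string | undefined>;

// Returns the variable's value, or the default when it is unset or empty.
function getFromEnv(env: EnvSketch, key: string, defaultValue?: string): string | undefined {
  const value = env[key];
  return value !== undefined && value !== "" ? value : defaultValue;
}

// Accepts a few common truthy spellings (assumed); anything else is false.
function getBooleanFromEnv(env: EnvSketch, key: string, defaultValue: boolean = false): boolean {
  const value = getFromEnv(env, key);
  if (value === undefined) return defaultValue;
  return ["1", "true", "yes", "on"].indexOf(value.toLowerCase()) !== -1;
}

// Validates by construction: `new URL` throws a TypeError on malformed input.
function getUrlFromEnv(env: EnvSketch, key: string, defaultValue: string): URL {
  const raw = getFromEnv(env, key) ?? defaultValue;
  return new URL(raw);
}
```

In a Deno deployment, the `env` argument could be filled from `Deno.env.toObject()`.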
### Template
`.env.example` provides a complete configuration template for new deployments.
## Community and Licensing
- **License**: MIT (permissive, commercial-friendly)
- **Copyright**: 2022-2024 David Kellner
- **Community**: MusicBrainz editor community
- **Contribution**: Single maintainer, open to contributions
- **Documentation**: Comprehensive inline comments and type definitions
## Summary
Harmony is a production-ready, TypeScript-based music metadata aggregator that demonstrates best practices in:
- Multi-source data integration
- Intelligent conflict resolution
- MusicBrainz ecosystem integration
- Type-safe architecture
- Graceful error handling
Its 4-stage pipeline (LOOKUP → HARMONIZE → MERGE → SEED) and provider abstraction system make it the most relevant reference project for building a comprehensive metadata aggregation system.