feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
@@ -0,0 +1,895 @@
# Harmony - Provider Integrations Analysis

## Provider Ecosystem Overview

Harmony integrates with **9 music metadata providers** using two primary access methods:

1. **API-based providers (5)**: Structured data via REST APIs
2. **HTML scraping providers (4)**: Data extraction from web pages

All providers share a common base architecture with URL pattern matching, rate limiting, caching, and harmonization to the `HarmonyRelease` schema.

## Provider Summary Table

| Provider | Type | Auth | Rate Limit | GTIN | Max Image | Regions | Status |
|----------|------|------|------------|------|-----------|---------|--------|
| Spotify | API | OAuth2 | Not specified | Yes (UPC) | 2000px | Global | Active |
| Deezer | API | Public | 50 req/5s | Yes | 1400px | Global | Active |
| iTunes | API | Public | Not specified | Yes | Varies | Multi-region | Active |
| Tidal | API | OAuth2 | Not specified | Yes | 1280px | Global | Active (v2) |
| MusicBrainz | API | Public | 5 req/5s | Yes (barcode) | N/A | Global | Active |
| Bandcamp | Scraping | None | Not specified | No | 3000px | Global | Active |
| Beatport | Scraping | None | Not specified | Yes | Varies | Global | Active |
| Mora | Scraping | None | Not specified | Yes | Varies | Japan | Active |
| Ototoy | Scraping | None | Not specified | Yes | Varies | Japan | Active |

## API-Based Providers

### 1. Spotify

**File**: `providers/spotify.ts`

#### Authentication

- **Method**: OAuth2 Client Credentials Flow
- **Credentials**: `HARMONY_SPOTIFY_CLIENT_ID`, `HARMONY_SPOTIFY_CLIENT_SECRET`
- **Token endpoint**: `https://accounts.spotify.com/api/token`
- **Token caching**: localStorage (dev) / sessionStorage (prod)
- **Token lifetime**: 3600 seconds (1 hour)

**OAuth2 Flow**:

```typescript
// clientId and clientSecret are assumed to be loaded from
// HARMONY_SPOTIFY_CLIENT_ID and HARMONY_SPOTIFY_CLIENT_SECRET.
async function getAccessToken(): Promise<string> {
  const response = await fetch('https://accounts.spotify.com/api/token', {
    method: 'POST',
    headers: {
      // Client Credentials flow: HTTP Basic auth with base64("id:secret")
      'Authorization': `Basic ${btoa(`${clientId}:${clientSecret}`)}`,
      'Content-Type': 'application/x-www-form-urlencoded'
    },
    body: 'grant_type=client_credentials'
  });

  // Fail loudly instead of returning an undefined token
  if (!response.ok) {
    throw new Error(`Spotify token request failed: ${response.status}`);
  }

  const data = await response.json();
  return data.access_token;
}
```

#### API Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /v1/albums/{id}` | Album lookup by Spotify ID | `/v1/albums/3DiDSNVBRYVzccLn2yqhMJ` |
| `GET /v1/search` | Search by UPC | `/v1/search?q=upc:0602537347377&type=album` |

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'open.spotify.com',
  pathname: '/album/:id'
});
```

**Matches**:
- `https://open.spotify.com/album/3DiDSNVBRYVzccLn2yqhMJ`
- `https://open.spotify.com/album/3DiDSNVBRYVzccLn2yqhMJ?si=xyz`
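
As a minimal sketch of what the pattern above captures, the helper below pulls the `:id` segment out of an album URL. It uses the standard `URL` class rather than `URLPattern` so that it runs in any modern JavaScript runtime. The name `extractSpotifyAlbumId` is hypothetical and not taken from the Harmony codebase.

```typescript
// Hypothetical helper (not from the Harmony codebase): mirrors the
// hostname + pathname check of the URLPattern shown above.
function extractSpotifyAlbumId(input: string): string | undefined {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return undefined; // not a parseable absolute URL
  }
  if (url.hostname !== 'open.spotify.com') return undefined;

  // The query string (e.g. ?si=xyz) is not part of pathname, so it is ignored.
  const match = /^\/album\/([^/]+)$/.exec(url.pathname);
  return match ? match[1] : undefined;
}
```

Both example URLs listed above yield the same ID, because the `si` tracking parameter lives in the query string rather than the path.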

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,           // UPC in external_ids
  title: FeatureQuality.GOOD,          // Album name
  artists: FeatureQuality.GOOD,        // Artist array with names
  releaseDate: FeatureQuality.GOOD,    // release_date field
  labels: FeatureQuality.PRESENT,      // Label name (no catalog number)
  media: FeatureQuality.GOOD,          // Disc structure
  tracks: FeatureQuality.GOOD,         // Track listing with durations
  isrc: FeatureQuality.GOOD,           // ISRC per track
  images: 2000,                        // Max 2000x2000px
  copyright: FeatureQuality.PRESENT,   // Copyright array
  availability: FeatureQuality.GOOD    // available_markets array
};
```
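
The `featureQuality` map can drive decisions such as which provider to prefer for a given field. The sketch below assumes `FeatureQuality` is a numeric enum ordered from missing to good; the real enum values are not shown in this document, so the numbers, the reduced `Feature` type, and the `bestProviderFor` helper are all hypothetical.

```typescript
// Assumed ordering; the real enum values are not shown in this document.
enum FeatureQuality {
  MISSING = 0,
  PRESENT = 1,
  GOOD = 2,
}

type Feature = 'gtin' | 'labels' | 'isrc';
type QualityMap = Record<Feature, FeatureQuality>;

// Hypothetical helper: returns the name of the first provider with the
// highest rating for one feature, or undefined if every provider lacks it.
function bestProviderFor(
  feature: Feature,
  providers: Record<string, QualityMap>,
): string | undefined {
  let best: string | undefined;
  let bestQuality = FeatureQuality.MISSING;
  for (const [name, map] of Object.entries(providers)) {
    if (map[feature] > bestQuality) {
      best = name;
      bestQuality = map[feature];
    }
  }
  return best;
}

// Ratings copied from the Spotify, Deezer, and Bandcamp maps in this document.
const ratings: Record<string, QualityMap> = {
  spotify: { gtin: FeatureQuality.GOOD, labels: FeatureQuality.PRESENT, isrc: FeatureQuality.GOOD },
  deezer: { gtin: FeatureQuality.GOOD, labels: FeatureQuality.GOOD, isrc: FeatureQuality.GOOD },
  bandcamp: { gtin: FeatureQuality.MISSING, labels: FeatureQuality.GOOD, isrc: FeatureQuality.MISSING },
};
```

Ties go to the provider registered first, which is one possible policy; a category-based preference would be another.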

#### Data Mapping

**Spotify Album Object** → **HarmonyRelease**:

| Spotify Field | Harmony Field | Transformation |
|---------------|---------------|----------------|
| `name` | `title` | Direct |
| `artists[].name` | `artists[].name` | Map array |
| `external_ids.upc` | `gtin` | Direct |
| `release_date` | `releaseDate` | Parse to PartialDate |
| `label` | `labels[0].name` | Single label |
| `tracks.items[]` | `media[0].tracks[]` | Map to HarmonyTrack |
| `images[]` | `images[]` | Map with dimensions |
| `copyrights[0].text` | `copyright` | First copyright |
| `available_markets[]` | `availableIn[]` | Direct |
| `external_urls.spotify` | `externalLinks[0].url` | Streaming link |

**Example Harmonization**:
```typescript
harmonize(spotifyAlbum: SpotifyAlbum): HarmonyRelease {
  return {
    title: spotifyAlbum.name,
    artists: spotifyAlbum.artists.map(a => ({ name: a.name })),
    gtin: spotifyAlbum.external_ids?.upc,
    media: [{
      format: MediumFormat.Digital,
      position: 1,
      tracks: spotifyAlbum.tracks.items.map((t, i) => ({
        title: t.name,
        position: i + 1,
        length: t.duration_ms,
        // ISRC is only present if the track objects include external_ids
        isrc: t.external_ids?.isrc,
        // Track-level artists are kept only when their count differs from the album's
        artists: t.artists.length !== spotifyAlbum.artists.length
          ? t.artists.map(a => ({ name: a.name }))
          : undefined
      }))
    }],
    releaseDate: this.parseDate(spotifyAlbum.release_date),
    types: this.inferTypes(spotifyAlbum.album_type),
    images: spotifyAlbum.images.map(img => ({
      url: img.url,
      types: [ImageType.Front],
      width: img.width,
      height: img.height
    })),
    labels: spotifyAlbum.label ? [{ name: spotifyAlbum.label }] : [],
    copyright: spotifyAlbum.copyrights?.[0]?.text,
    availableIn: spotifyAlbum.available_markets,
    externalLinks: [{
      url: spotifyAlbum.external_urls.spotify,
      types: [LinkType.Streaming]
    }],
    info: {
      providers: ['spotify'],
      messages: []
    }
  };
}
```
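
The harmonization above calls `this.parseDate`, which is not shown. Spotify's `release_date` can carry year, month, or day precision (for example `1981`, `1981-12`, or `1981-12-15`), so the parser has to tolerate missing parts. The following is a minimal sketch under the assumption that `PartialDate` has optional `year`, `month`, and `day` fields; both the interface shape and the function name are hypothetical.

```typescript
// Assumed shape; the actual PartialDate type is not shown in this document.
interface PartialDate {
  year?: number;
  month?: number;
  day?: number;
}

// Hypothetical stand-in for parseDate: accepts YYYY, YYYY-MM, or YYYY-MM-DD.
function parsePartialDate(value: string): PartialDate | undefined {
  const match = /^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$/.exec(value);
  if (!match) return undefined;

  // Only set the parts that were actually present in the input.
  const date: PartialDate = { year: Number(match[1]) };
  if (match[2] !== undefined) date.month = Number(match[2]);
  if (match[3] !== undefined) date.day = Number(match[3]);
  return date;
}
```

Leaving absent parts undefined, rather than defaulting them to 1, keeps a year-only release from being presented as January 1st.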

#### Rate Limiting

- **Limit**: Not publicly specified
- **Handling**: Retry on 429 status with `Retry-After` header
- **Caching**: 24-hour cache reduces API calls
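
Honoring `Retry-After` means turning the header into a wait time. HTTP allows the header to carry either a number of seconds or an HTTP date, so a handler should accept both. The sketch below is one way to do that; `retryAfterToMs`, its `nowMs` parameter (there for testability), and the 1-second fallback are assumptions, not Harmony's actual implementation.

```typescript
// Hypothetical helper: converts a Retry-After header value into a delay in
// milliseconds. The header carries either delay-seconds or an HTTP date.
function retryAfterToMs(
  header: string | null,
  nowMs: number = Date.now(),
  fallbackMs = 1000,
): number {
  if (header === null) return fallbackMs;

  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000; // delay-seconds form
  }

  const dateMs = Date.parse(trimmed); // HTTP-date form
  if (Number.isNaN(dateMs)) return fallbackMs;
  return Math.max(0, dateMs - nowMs); // never wait a negative amount
}
```

A caller would typically pass `response.headers.get('Retry-After')` and sleep for the returned duration before retrying.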

### 2. Deezer

**File**: `providers/deezer.ts`

#### Authentication

- **Method**: Public API (no authentication required)
- **Base URL**: `https://api.deezer.com`

#### Rate Limiting

- **Limit**: 50 requests per 5 seconds
- **Enforcement**: Server-side (429 status on exceed)
- **Handling**: Exponential backoff with `Retry-After` header

#### API Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /album/{id}` | Album lookup by Deezer ID | `/album/123456` |
| `GET /search/album` | Search by UPC | `/search/album?q=upc:0602537347377` |

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'www.deezer.com',
  pathname: '/:locale/album/:id'
});
```

**Matches**:
- `https://www.deezer.com/en/album/123456`
- `https://www.deezer.com/fr/album/123456`

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,            // UPC field
  title: FeatureQuality.GOOD,           // Title field
  artists: FeatureQuality.GOOD,         // Artist object
  releaseDate: FeatureQuality.GOOD,     // release_date field
  labels: FeatureQuality.GOOD,          // Label with catalog number
  media: FeatureQuality.GOOD,           // Disc structure
  tracks: FeatureQuality.GOOD,          // Track listing
  isrc: FeatureQuality.GOOD,            // ISRC per track
  images: 1400,                         // Max 1400x1400px
  copyright: FeatureQuality.GOOD,       // Copyright field
  availability: FeatureQuality.PRESENT  // Available countries (limited)
};
```

#### Data Mapping

**Deezer Album Object** → **HarmonyRelease**:

| Deezer Field | Harmony Field | Notes |
|--------------|---------------|-------|
| `title` | `title` | Direct |
| `artist.name` | `artists[0].name` | Single artist |
| `upc` | `gtin` | Direct |
| `release_date` | `releaseDate` | YYYY-MM-DD format |
| `label` | `labels[0].name` | Label name |
| `tracks.data[]` | `media[0].tracks[]` | Track array |
| `cover_xl` | `images[0].url` | 1400x1400px |
| `copyright` | `copyright` | Direct |

### 3. iTunes (Apple Music)

**File**: `providers/itunes.ts`

#### Authentication

- **Method**: Public API (no authentication required)
- **Base URL**: `https://itunes.apple.com`

#### Multi-Region Support

The iTunes API is region-specific, so Harmony queries multiple regions in parallel.

**Supported Regions**:
- `US` (United States)
- `GB` (United Kingdom)
- `DE` (Germany)
- `JP` (Japan)
- `FR` (France)
- `CA` (Canada)
- `AU` (Australia)

**Region-Specific Endpoints**:
```
https://itunes.apple.com/us/lookup?id=123456
https://itunes.apple.com/gb/lookup?id=123456
https://itunes.apple.com/jp/lookup?id=123456
```
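
A parallel multi-region query can be sketched with `Promise.allSettled`, so that a region that fails or has no result does not discard the others. The helper names `buildLookupUrls` and `lookupAllRegions` and the injected `fetchJson` callback are hypothetical; they illustrate the idea rather than reproduce Harmony's code.

```typescript
// Hypothetical sketch: build one lookup URL per region, following the
// region-specific endpoint format shown above.
function buildLookupUrls(id: string, regions: string[]): string[] {
  return regions.map(
    region => `https://itunes.apple.com/${region.toLowerCase()}/lookup?id=${encodeURIComponent(id)}`,
  );
}

// Queries every region in parallel and keeps only the successful responses.
// The fetcher is injected to keep the sketch independent of any HTTP client.
async function lookupAllRegions<T>(
  id: string,
  regions: string[],
  fetchJson: (url: string) => Promise<T>,
): Promise<T[]> {
  const requests = buildLookupUrls(id, regions).map(url => fetchJson(url));
  const settled = await Promise.allSettled(requests);

  const results: T[] = [];
  for (const outcome of settled) {
    if (outcome.status === 'fulfilled') results.push(outcome.value);
  }
  return results;
}
```

In practice `fetchJson` could wrap `fetch(url).then(r => r.json())`, with the provider's rate limiter applied before each call.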

#### API Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /{region}/lookup` | Album lookup by iTunes ID | `/us/lookup?id=123456` |
| `GET /{region}/search` | Search by UPC | `/us/search?term=upc:0602537347377` |

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'music.apple.com',
  pathname: '/:region/album/:name/:id'
});
```

**Matches**:
- `https://music.apple.com/us/album/album-name/123456`
- `https://music.apple.com/jp/album/album-name/123456`

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,           // UPC in response
  title: FeatureQuality.GOOD,          // collectionName
  artists: FeatureQuality.GOOD,        // artistName
  releaseDate: FeatureQuality.GOOD,    // releaseDate
  labels: FeatureQuality.PRESENT,      // copyright (label name embedded)
  media: FeatureQuality.GOOD,          // Track listing
  tracks: FeatureQuality.GOOD,         // Track array
  isrc: FeatureQuality.MISSING,        // Not provided
  images: 'varies',                    // 600x600 to 3000x3000
  copyright: FeatureQuality.PRESENT,   // copyright field
  availability: FeatureQuality.GOOD    // Region-specific
};
```

### 4. Tidal

**File**: `providers/tidal.ts`

#### Authentication

- **Method**: OAuth2 Client Credentials Flow
- **Credentials**: `HARMONY_TIDAL_CLIENT_ID`, `HARMONY_TIDAL_CLIENT_SECRET`
- **Token endpoint**: `https://auth.tidal.com/v1/oauth2/token`
- **API version**: v2 (v1 deprecated 2025-01-21)

#### API Version Migration

**v1 (deprecated 2025-01-21)**:
- Endpoint: `https://api.tidal.com/v1/albums/{id}`
- Status: No longer supported

**v2 (current)**:
- Endpoint: `https://openapi.tidal.com/v2/albums/{id}`
- Migration: Completed in Harmony codebase

#### API Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /v2/albums/{id}` | Album lookup by Tidal ID | `/v2/albums/123456` |
| `GET /v2/albums/byBarcode/{upc}` | Lookup by UPC | `/v2/albums/byBarcode/0602537347377` |

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'tidal.com',
  pathname: '/browse/album/:id'
});
```

**Matches**:
- `https://tidal.com/browse/album/123456`
- `https://listen.tidal.com/album/123456` (not matched by the pattern above as written: the hostname and path differ, so this form presumably relies on an additional pattern)
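
The two URL shapes listed above differ in both hostname and path, so a single `hostname: 'tidal.com'` pattern cannot cover them. One way to accept both is a per-hostname path table, sketched below. The helper name `extractTidalAlbumId` is hypothetical, and the numeric-ID assumption comes only from the example URLs in this section.

```typescript
// Hypothetical helper: accepts both Tidal URL shapes listed above and
// returns the album ID, or undefined for anything else.
function extractTidalAlbumId(input: string): string | undefined {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return undefined; // not a parseable absolute URL
  }

  // Each supported hostname has its own path layout.
  // IDs are assumed to be numeric, as in the examples above.
  const pathByHost = new Map<string, RegExp>([
    ['tidal.com', /^\/browse\/album\/(\d+)$/],
    ['listen.tidal.com', /^\/album\/(\d+)$/],
  ]);

  const match = pathByHost.get(url.hostname)?.exec(url.pathname);
  return match ? match[1] : undefined;
}
```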

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,          // barcode field
  title: FeatureQuality.GOOD,         // title field
  artists: FeatureQuality.GOOD,       // artists array
  releaseDate: FeatureQuality.GOOD,   // releaseDate
  labels: FeatureQuality.GOOD,        // label with catalog number
  media: FeatureQuality.GOOD,         // Media array
  tracks: FeatureQuality.GOOD,        // Track listing
  isrc: FeatureQuality.GOOD,          // ISRC per track
  images: 1280,                       // Max 1280x1280px
  copyright: FeatureQuality.GOOD,     // copyright field
  availability: FeatureQuality.GOOD   // Available countries
};
```

### 5. MusicBrainz

**File**: `providers/musicbrainz.ts`

#### Authentication

- **Method**: Public API (no authentication required)
- **Base URL**: Configurable via `HARMONY_MB_API_URL` (default: `https://musicbrainz.org/ws/2`)

#### Rate Limiting

- **Limit**: 5 requests per 5 seconds (1 req/sec average)
- **Enforcement**: Server-side (503 status on exceed)
- **Handling**: Exponential backoff, respect `Retry-After` header

#### API Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /release/{mbid}` | Release lookup by MBID | `/release/12345678-1234-1234-1234-123456789012` |
| `GET /release?barcode={gtin}` | Search by barcode | `/release?barcode=0602537347377` |
| `GET /url?resource={url}` | MBID resolution | `/url?resource=https://open.spotify.com/album/xyz` |

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'musicbrainz.org',
  pathname: '/release/:mbid'
});
```

**Matches**:
- `https://musicbrainz.org/release/12345678-1234-1234-1234-123456789012`

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,             // barcode field
  title: FeatureQuality.GOOD,            // title field
  artists: FeatureQuality.GOOD,          // artist-credit array
  releaseDate: FeatureQuality.GOOD,      // date field
  labels: FeatureQuality.GOOD,           // label-info array
  media: FeatureQuality.GOOD,            // media array
  tracks: FeatureQuality.GOOD,           // track array
  isrc: FeatureQuality.GOOD,             // ISRC per recording
  images: FeatureQuality.MISSING,        // No images in API
  copyright: FeatureQuality.MISSING,     // Not in API
  availability: FeatureQuality.MISSING   // Not tracked
};
```

#### Special Role: Template Provider

MusicBrainz serves as a **template provider** for the merge algorithm:

- **Purpose**: Provide reference data for comparison
- **Usage**: `musicbrainz!` parameter in URL
- **Behavior**: MusicBrainz data is used as the baseline, and other providers are compared against it
- **Use case**: Verify existing MusicBrainz releases against external sources

#### MBID Resolution

**Batch URL Lookup** (up to 100 URLs per request):

```typescript
async function resolveMBIDs(urls: string[]): Promise<Map<string, string>> {
  // One `resource` query parameter per external URL
  const params = urls.map(url => `resource=${encodeURIComponent(url)}`).join('&');
  const response = await fetch(`https://musicbrainz.org/ws/2/url?${params}&inc=release-rels`);
  const data = await response.json();

  // Maps each external URL to the MBID of the release it is linked to
  const mbids = new Map<string, string>();
  for (const urlData of data.urls) {
    // relations may be absent for URLs without release links
    const mbid = urlData.relations?.find(r => r.type === 'streaming')?.release?.id;
    if (mbid) {
      mbids.set(urlData.resource, mbid);
    }
  }

  return mbids;
}
```
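
Because a single lookup accepts at most 100 URLs, longer lists have to be split into batches first. A minimal sketch of that step follows; the generic `chunk` helper is hypothetical and not taken from the Harmony codebase.

```typescript
// Hypothetical helper: splits a list into consecutive batches of at most
// `size` items, e.g. to respect the 100-URL limit mentioned above.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch would then be passed to the lookup function in turn, with the rate limiter applied between requests.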

**Duplicate Detection**:
- Check whether external URLs are already linked to MusicBrainz releases
- Warn the user before creating a duplicate
- Provide a link to the existing release

## HTML Scraping Providers

### 6. Bandcamp

**File**: `providers/bandcamp.ts`

#### Scraping Method

- **Technique**: JSON-LD extraction from `<script type="application/ld+json">`
- **Fallback**: HTML parsing with CSS selectors
- **Reliability**: High (JSON-LD is stable)
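
The JSON-LD technique can be sketched as follows: locate each `application/ld+json` script block and parse its content. The regular expression keeps the example dependency-free and is an assumption for illustration; a production scraper would more likely use an HTML parser. The name `extractJsonLd` is hypothetical.

```typescript
// Hypothetical helper: returns every JSON-LD object embedded in an HTML page.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

  const results: unknown[] = [];
  for (const match of html.matchAll(pattern)) {
    try {
      results.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks that are not valid JSON instead of failing the scrape.
    }
  }
  return results;
}
```

The caller can then pick the object whose `@type` is `MusicAlbum` and fall back to CSS selectors if none is found.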

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: '*.bandcamp.com',
  pathname: '/album/:slug'
});
```

**Matches**:
- `https://artist.bandcamp.com/album/album-name`
- `https://label.bandcamp.com/album/album-name`

#### Data Extraction

**JSON-LD Schema.org MusicAlbum**:
```json
{
  "@type": "MusicAlbum",
  "name": "Album Title",
  "byArtist": {
    "@type": "MusicGroup",
    "name": "Artist Name"
  },
  "datePublished": "2014-11-24",
  "image": "https://f4.bcbits.com/img/a123456789_10.jpg",
  "track": [
    {
      "@type": "MusicRecording",
      "name": "Track 1",
      "duration": "PT4M5S"
    }
  ],
  "recordLabel": {
    "@type": "Organization",
    "name": "Label Name"
  }
}
```
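
The `duration` value in the example above is an ISO 8601 duration (`PT4M5S` means 4 minutes 5 seconds), whereas the Spotify mapping earlier stores track length in milliseconds. A minimal conversion sketch follows; it handles only the hours/minutes/seconds form shown here, and the name `isoDurationToMs` is hypothetical.

```typescript
// Hypothetical helper: converts an ISO 8601 duration limited to hours,
// minutes, and seconds (such as "PT4M5S") into milliseconds.
function isoDurationToMs(duration: string): number | undefined {
  const match = /^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$/.exec(duration);
  if (!match || duration === 'PT') return undefined; // no components given

  const hours = Number(match[1] ?? 0);
  const minutes = Number(match[2] ?? 0);
  const seconds = Number(match[3] ?? 0);
  return Math.round(((hours * 60 + minutes) * 60 + seconds) * 1000);
}
```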

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.MISSING,          // Not provided
  title: FeatureQuality.GOOD,            // name field
  artists: FeatureQuality.GOOD,          // byArtist
  releaseDate: FeatureQuality.GOOD,      // datePublished
  labels: FeatureQuality.GOOD,           // recordLabel
  media: FeatureQuality.GOOD,            // track array
  tracks: FeatureQuality.GOOD,           // Track listing
  isrc: FeatureQuality.MISSING,          // Not provided
  images: 3000,                          // Max 3000x3000px (a123456789_10.jpg)
  copyright: FeatureQuality.PRESENT,     // publisher field
  availability: FeatureQuality.MISSING   // Not specified
};
```

#### Challenges

- **No GTIN**: Bandcamp doesn't display barcodes
- **Subdomain variability**: Each artist/label has a unique subdomain
- **Rate limiting**: Not publicly specified, so a conservative approach is taken

### 7. Beatport

**File**: `providers/beatport.ts`

#### Scraping Method

- **Technique**: HTML parsing with CSS selectors
- **Reliability**: Medium (HTML structure changes break scraper)

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'www.beatport.com',
  pathname: '/release/:slug/:id'
});
```

**Matches**:
- `https://www.beatport.com/release/album-name/123456`

#### Data Extraction

**CSS Selectors**:
```typescript
const selectors = {
  title: '.interior-release-chart-content-item h1',
  artists: '.interior-release-chart-content-item .artist a',
  releaseDate: '.interior-release-chart-content-item .release-date',
  label: '.interior-release-chart-content-item .label a',
  catalogNumber: '.interior-release-chart-content-item .catalog-number',
  tracks: '.track-grid .track',
  trackTitle: '.track-title',
  trackArtists: '.track-artists a',
  trackLength: '.track-length',
  coverImage: '.interior-release-chart-artwork img'
};
```

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.PRESENT,          // Sometimes in metadata
  title: FeatureQuality.GOOD,            // h1 element
  artists: FeatureQuality.GOOD,          // Artist links
  releaseDate: FeatureQuality.GOOD,      // Release date element
  labels: FeatureQuality.GOOD,           // Label + catalog number
  media: FeatureQuality.GOOD,            // Track grid
  tracks: FeatureQuality.GOOD,           // Track listing
  isrc: FeatureQuality.MISSING,          // Not displayed
  images: 'varies',                      // Cover image
  copyright: FeatureQuality.MISSING,     // Not displayed
  availability: FeatureQuality.MISSING   // Not specified
};
```

#### Challenges

- **HTML structure changes**: Frequent redesigns break selectors
- **JavaScript rendering**: Some content loaded dynamically
- **Rate limiting**: Not specified, risk of IP blocking

### 8. Mora (Japan)

**File**: `providers/mora.ts`

#### Scraping Method

- **Technique**: HTML parsing with CSS selectors
- **Language**: Japanese (requires UTF-8 handling)
- **Reliability**: Medium

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'mora.jp',
  pathname: '/package/:id'
});
```

**Matches**:
- `https://mora.jp/package/123456`

#### Data Extraction

**CSS Selectors** (Japanese labels):
```typescript
const selectors = {
  title: '.productTitle',
  artists: '.artistName a',
  releaseDate: '.releaseDate',
  label: '.labelName',
  catalogNumber: '.catalogNumber',
  tracks: '.trackList .track',
  coverImage: '.productImage img'
};
```

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.PRESENT,        // JAN code (Japanese barcode)
  title: FeatureQuality.GOOD,          // Product title
  artists: FeatureQuality.GOOD,        // Artist links
  releaseDate: FeatureQuality.GOOD,    // Release date
  labels: FeatureQuality.GOOD,         // Label + catalog number
  media: FeatureQuality.GOOD,          // Track list
  tracks: FeatureQuality.GOOD,         // Track details
  isrc: FeatureQuality.MISSING,        // Not displayed
  images: 'varies',                    // Product image
  copyright: FeatureQuality.PRESENT,   // Copyright notice
  availability: FeatureQuality.GOOD    // Japan-specific
};
```

#### Challenges

- **Japanese text**: Requires proper encoding and language detection
- **JAN vs. UPC**: Japanese Article Number may differ from international UPC
- **Regional availability**: Japan-only releases
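
JAN codes are 13-digit EAN codes and UPC-A codes have 12 digits; both belong to the GTIN family and use the same check-digit scheme, so they can be validated and compared in a common form. The sketch below shows one way to do that. The helper names `isValidGtin` and `normalizeGtin` are hypothetical, and the 8-digit GTIN-8 form is deliberately left out.

```typescript
// Hypothetical helper: validates the check digit of a UPC-A (12 digits),
// EAN-13/JAN (13 digits), or GTIN-14 code.
function isValidGtin(code: string): boolean {
  if (!/^(\d{12}|\d{13}|\d{14})$/.test(code)) return false;

  // Walk right to left over the digits before the check digit; weights
  // alternate 3, 1, 3, ... starting with the digit next to the check digit.
  let sum = 0;
  for (let i = code.length - 2, weight = 3; i >= 0; i--, weight = 4 - weight) {
    sum += Number(code[i]) * weight;
  }
  const expected = (10 - (sum % 10)) % 10;
  return expected === Number(code[code.length - 1]);
}

// Left-pads to 14 digits so a 12-digit UPC and its 13-digit EAN form
// compare equal.
function normalizeGtin(code: string): string {
  return code.padStart(14, '0');
}
```

Comparing normalized values avoids treating `602537347377` and `0602537347377` as two different releases.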

### 9. Ototoy (Japan)

**File**: `providers/ototoy.ts`

#### Scraping Method

- **Technique**: HTML parsing with CSS selectors
- **Language**: Japanese
- **Reliability**: Medium

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'ototoy.jp',
  pathname: '/album/:id'
});
```

**Matches**:
- `https://ototoy.jp/album/123456`

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.PRESENT,        // JAN code
  title: FeatureQuality.GOOD,          // Album title
  artists: FeatureQuality.GOOD,        // Artist name
  releaseDate: FeatureQuality.GOOD,    // Release date
  labels: FeatureQuality.GOOD,         // Label info
  media: FeatureQuality.GOOD,          // Track list
  tracks: FeatureQuality.GOOD,         // Track details
  isrc: FeatureQuality.MISSING,        // Not displayed
  images: 'varies',                    // Album art
  copyright: FeatureQuality.PRESENT,   // Copyright info
  availability: FeatureQuality.GOOD    // Japan-specific
};
```

## Provider Base Architecture

### MetadataProvider (Abstract Base)

**File**: `providers/base.ts`

**Core Functionality**:

```typescript
abstract class MetadataProvider {
  // Identity
  abstract name: string;
  abstract urlPattern: URLPattern;

  // Lookup methods
  abstract lookupByUrl(url: string): Promise<ProviderRelease>;
  abstract lookupByGtin(gtin: string, region?: string): Promise<ProviderRelease>;

  // Harmonization
  abstract harmonize(release: ProviderRelease): HarmonyRelease;

  // Feature quality
  abstract featureQuality: FeatureQualityMap;

  // Rate limiting
  protected rateLimit: RateLimiter;
  protected async throttle(): Promise<void> {
    await this.rateLimit.wait();
  }

  // Caching
  protected cache: SnapStorage;
  protected async getCached(key: string): Promise<Response | null> {
    return await this.cache.get(key);
  }
  protected async setCached(key: string, response: Response): Promise<void> {
    await this.cache.set(key, response);
  }

  // URL matching
  matchesUrl(url: string): boolean {
    return this.urlPattern.test(url);
  }
}
```

### MetadataApiProvider (OAuth2)

**File**: `providers/api_base.ts`

**OAuth2 Support**:

```typescript
abstract class MetadataApiProvider extends MetadataProvider {
  protected abstract clientId: string;
  protected abstract clientSecret: string;
  protected abstract tokenEndpoint: string;

  protected async getAccessToken(): Promise<string> {
    // Check cache
    const cached = this.getTokenFromCache();
    if (cached && !this.isTokenExpired(cached)) {
      return cached.access_token;
    }

    // Request new token
    const token = await this.requestToken();
    this.cacheToken(token);
    return token.access_token;
  }

  // Abstract methods cannot carry the async modifier; implementations return a Promise
  protected abstract requestToken(): Promise<OAuth2Token>;

  // Wraps the global fetch and attaches the bearer token to every request
  protected async fetch(url: string, options?: RequestInit): Promise<Response> {
    const token = await this.getAccessToken();
    return await fetch(url, {
      ...options,
      headers: {
        ...options?.headers,
        'Authorization': `Bearer ${token}`
      }
    });
  }
}
```
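
The class above relies on an `isTokenExpired` check that is not shown. A minimal sketch of such a check follows, assuming the cached token records when it was obtained. The `CachedToken` shape, the `obtainedAt` field, and the 60-second safety margin are assumptions for illustration; the actual `OAuth2Token` type is not given in this document.

```typescript
// Assumed token shape: the standard OAuth2 fields plus a locally recorded
// fetch time. The actual OAuth2Token type is not shown in this document.
interface CachedToken {
  access_token: string;
  expires_in: number; // lifetime in seconds, e.g. 3600 for Spotify
  obtainedAt: number; // epoch milliseconds when the token was received
}

// Hypothetical stand-in for isTokenExpired: treats the token as expired
// slightly early (safetyMarginMs) so it cannot lapse mid-request.
function isTokenExpired(
  token: CachedToken,
  nowMs: number = Date.now(),
  safetyMarginMs = 60_000,
): boolean {
  const expiresAt = token.obtainedAt + token.expires_in * 1000;
  return nowMs >= expiresAt - safetyMarginMs;
}
```

With a 3600-second lifetime this refreshes the token after 59 minutes, trading one extra token request per hour for fewer 401 responses.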

### RateLimiter

**File**: `utils/rate_limiter.ts`

**Implementation**:

```typescript
class RateLimiter {
  private queue: number[] = [];
  private maxRequests: number;
  private timeWindow: number; // milliseconds

  constructor(maxRequests: number, timeWindow: number) {
    this.maxRequests = maxRequests;
    this.timeWindow = timeWindow;
  }

  async wait(): Promise<void> {
    const now = Date.now();

    // Remove old requests outside time window
    this.queue = this.queue.filter(t => now - t < this.timeWindow);

    // If at limit, wait until oldest request expires
    if (this.queue.length >= this.maxRequests) {
      const oldestRequest = this.queue[0];
      const waitTime = this.timeWindow - (now - oldestRequest);
      await new Promise(resolve => setTimeout(resolve, waitTime));
      return this.wait(); // Recursive call after waiting
    }

    // Add current request to queue
    this.queue.push(now);
  }
}

// Usage
const deezerLimiter = new RateLimiter(50, 5000); // 50 req / 5 sec
const mbLimiter = new RateLimiter(5, 5000);      // 5 req / 5 sec
```
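
The limiter above waits with `setTimeout`, which makes its arithmetic hard to observe directly. The sketch below isolates the same sliding-window calculation as a pure function with an explicit clock, which is one way to reason about or unit-test the behavior. The function `delayUntilAllowed` is hypothetical and not part of the Harmony codebase.

```typescript
// Hypothetical pure variant of the sliding-window check: given the
// timestamps of earlier requests, returns how long to wait (in ms) before
// another request fits inside the window. 0 means "send now".
function delayUntilAllowed(
  timestamps: number[],
  maxRequests: number,
  timeWindowMs: number,
  nowMs: number,
): number {
  // Only requests inside the current window count against the limit.
  const recent = timestamps
    .filter(t => nowMs - t < timeWindowMs)
    .sort((a, b) => a - b);
  if (recent.length < maxRequests) return 0;

  // The request that must leave the window first is the one whose expiry
  // brings the count back under the limit.
  const blocking = recent[recent.length - maxRequests];
  return blocking + timeWindowMs - nowMs;
}
```

For the MusicBrainz settings (5 requests per 5000 ms), five requests sent within the first 400 ms leave a sixth request at t = 1000 ms waiting 4000 ms, until the first request drops out of the window.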

## Provider Registry

**File**: `providers/registry.ts`

**Registration**:

```typescript
class ProviderRegistry {
  private providers = new Map<string, MetadataProvider>();
  private categories = new Map<string, string[]>();

  register(provider: MetadataProvider, category: string): void {
    this.providers.set(provider.name, provider);

    if (!this.categories.has(category)) {
      this.categories.set(category, []);
    }
    this.categories.get(category)!.push(provider.name);
  }

  get(name: string): MetadataProvider | undefined {
    return this.providers.get(name);
  }

  getByCategory(category: string): MetadataProvider[] {
    const names = this.categories.get(category) || [];
    return names.map(name => this.providers.get(name)!);
  }

  getByUrl(url: string): MetadataProvider | undefined {
    for (const provider of this.providers.values()) {
      if (provider.matchesUrl(url)) {
        return provider;
      }
    }
    return undefined;
  }

  getByGtin(): MetadataProvider[] {
    return Array.from(this.providers.values()).filter(p =>
      p.featureQuality.gtin !== FeatureQuality.MISSING
    );
  }
}

// Initialize registry
const registry = new ProviderRegistry();
registry.register(new SpotifyProvider(), 'preferred');
registry.register(new DeezerProvider(), 'default');
registry.register(new iTunesProvider(), 'default');
registry.register(new TidalProvider(), 'preferred');
registry.register(new MusicBrainzProvider(), 'preferred');
registry.register(new BandcampProvider(), 'all');
registry.register(new BeatportProvider(), 'all');
registry.register(new MoraProvider(), 'japan');
registry.register(new OtotoyProvider(), 'japan');
```

## Not Implemented: KKBOX

**Status**: Mentioned in documentation but not implemented

**Reason**: Unknown (possibly API access issues or low priority)

**Potential Implementation**:
- **Region**: Taiwan, Hong Kong, Japan, Singapore, Malaysia
- **API**: Public API available
- **Authentication**: API key required
- **Data quality**: High (official metadata)

## Summary

Harmony's provider integration demonstrates:

1. **Diverse access methods**: API-based (5) and HTML scraping (4)
2. **Unified abstraction**: All providers implement common interface
3. **OAuth2 support**: Spotify and Tidal with token caching
4. **Rate limiting**: Per-provider rate limiters with exponential backoff
5. **Multi-region support**: iTunes queries multiple regions in parallel
6. **Feature quality ratings**: Transparent quality assessment per provider
7. **Graceful degradation**: `Promise.allSettled` ensures partial results
8. **MusicBrainz integration**: MBID resolution and duplicate detection
9. **Caching**: 24-hour HTTP response cache reduces API calls

This architecture is production-ready and serves as an excellent reference for building multi-source metadata aggregation systems.