# Harmony - Provider Integrations Analysis

## Provider Ecosystem Overview

Harmony integrates with **9 music metadata providers** using two primary access methods:

1. **API-based providers (5)**: Structured data via REST APIs
2. **HTML scraping providers (4)**: Data extraction from web pages

All providers share a common base architecture with URL pattern matching, rate limiting, caching, and harmonization to the `HarmonyRelease` schema.

## Provider Summary Table

| Provider | Type | Auth | Rate Limit | GTIN | Max Image | Regions | Status |
|----------|------|------|------------|------|-----------|---------|--------|
| Spotify | API | OAuth2 | Not specified | Yes (UPC) | 2000px | Global | Active |
| Deezer | API | Public | 50 req/5s | Yes | 1400px | Global | Active |
| iTunes | API | Public | Not specified | Yes | Varies | Multi-region | Active |
| Tidal | API | OAuth2 | Not specified | Yes | 1280px | Global | Active (v2) |
| MusicBrainz | API | Public | 5 req/5s | Yes (barcode) | N/A | Global | Active |
| Bandcamp | Scraping | None | Not specified | No | 3000px | Global | Active |
| Beatport | Scraping | None | Not specified | Sometimes | Varies | Global | Active |
| Mora | Scraping | None | Not specified | Yes (JAN) | Varies | Japan | Active |
| Ototoy | Scraping | None | Not specified | Yes (JAN) | Varies | Japan | Active |

## API-Based Providers

### 1. Spotify

**File**: `providers/spotify.ts`

#### Authentication

- **Method**: OAuth2 Client Credentials Flow
- **Credentials**: `HARMONY_SPOTIFY_CLIENT_ID`, `HARMONY_SPOTIFY_CLIENT_SECRET`
- **Token endpoint**: `https://accounts.spotify.com/api/token`
- **Token caching**: localStorage (dev) / sessionStorage (prod)
- **Token lifetime**: 3600 seconds (1 hour)

**OAuth2 Flow**:
```typescript
async function getAccessToken(): Promise<string> {
  const response = await fetch('https://accounts.spotify.com/api/token', {
    method: 'POST',
    headers: {
      'Authorization': `Basic ${btoa(`${clientId}:${clientSecret}`)}`,
      'Content-Type': 'application/x-www-form-urlencoded'
    },
    body: 'grant_type=client_credentials'
  });

  const data = await response.json();
  return data.access_token;
}
```

#### API Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /v1/albums/{id}` | Album lookup by Spotify ID | `/v1/albums/3DiDSNVBRYVzccLn2yqhMJ` |
| `GET /v1/search` | Search by UPC | `/v1/search?q=upc:0602537347377&type=album` |

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'open.spotify.com',
  pathname: '/album/:id'
});
```

**Matches**:
- `https://open.spotify.com/album/3DiDSNVBRYVzccLn2yqhMJ`
- `https://open.spotify.com/album/3DiDSNVBRYVzccLn2yqhMJ?si=xyz`

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,          // UPC in external_ids
  title: FeatureQuality.GOOD,         // Album name
  artists: FeatureQuality.GOOD,       // Artist array with names
  releaseDate: FeatureQuality.GOOD,   // release_date field
  labels: FeatureQuality.PRESENT,     // Label name (no catalog number)
  media: FeatureQuality.GOOD,         // Disc structure
  tracks: FeatureQuality.GOOD,        // Track listing with durations
  isrc: FeatureQuality.GOOD,          // ISRC per track
  images: 2000,                       // Max 2000x2000px
  copyright: FeatureQuality.PRESENT,  // Copyright array
  availability: FeatureQuality.GOOD   // available_markets array
};
```

#### Data Mapping

**Spotify Album Object** → **HarmonyRelease**:

| Spotify Field | Harmony Field | Transformation |
|---------------|---------------|----------------|
| `name` | `title` | Direct |
| `artists[].name` | `artists[].name` | Map array |
| `external_ids.upc` | `gtin` | Direct |
| `release_date` | `releaseDate` | Parse to PartialDate |
| `label` | `labels[0].name` | Single label |
| `tracks.items[]` | `media[0].tracks[]` | Map to HarmonyTrack |
| `images[]` | `images[]` | Map with dimensions |
| `copyrights[0].text` | `copyright` | First copyright |
| `available_markets[]` | `availableIn[]` | Direct |
| `external_urls.spotify` | `externalLinks[0].url` | Streaming link |

**Example Harmonization**:
```typescript
harmonize(spotifyAlbum: SpotifyAlbum): HarmonyRelease {
  return {
    title: spotifyAlbum.name,
    artists: spotifyAlbum.artists.map(a => ({ name: a.name })),
    gtin: spotifyAlbum.external_ids?.upc,
    media: [{
      format: MediumFormat.Digital,
      position: 1,
      tracks: spotifyAlbum.tracks.items.map((t, i) => ({
        title: t.name,
        position: i + 1,
        length: t.duration_ms,
        isrc: t.external_ids?.isrc,
        artists: t.artists.length !== spotifyAlbum.artists.length
          ? t.artists.map(a => ({ name: a.name }))
          : undefined
      }))
    }],
    releaseDate: this.parseDate(spotifyAlbum.release_date),
    types: this.inferTypes(spotifyAlbum.album_type),
    images: spotifyAlbum.images.map(img => ({
      url: img.url,
      types: [ImageType.Front],
      width: img.width,
      height: img.height
    })),
    labels: spotifyAlbum.label ? [{ name: spotifyAlbum.label }] : [],
    copyright: spotifyAlbum.copyrights?.[0]?.text,
    availableIn: spotifyAlbum.available_markets,
    externalLinks: [{
      url: spotifyAlbum.external_urls.spotify,
      types: [LinkType.Streaming]
    }],
    info: {
      providers: ['spotify'],
      messages: []
    }
  };
}
```
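
The mapping table lists `release_date` as "Parse to PartialDate", and Spotify reports release dates with year, month, or day precision. A minimal sketch of such a parser follows; the `PartialDate` shape and the `parsePartialDate` name are assumptions made for illustration, not Harmony's actual definitions.

```typescript
// Hypothetical shape; Harmony's real PartialDate type may differ.
interface PartialDate {
  year: number;
  month?: number;
  day?: number;
}

// Accepts "YYYY", "YYYY-MM", or "YYYY-MM-DD"; returns undefined otherwise.
function parsePartialDate(value: string): PartialDate | undefined {
  const match = /^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$/.exec(value.trim());
  if (!match) return undefined;

  const date: PartialDate = { year: Number(match[1]) };
  if (match[2] !== undefined) date.month = Number(match[2]);
  if (match[3] !== undefined) date.day = Number(match[3]);
  return date;
}
```

Keeping missing components undefined, rather than defaulting them to `1`, preserves the difference between "released in 1981" and "released on 1981-01-01".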

#### Rate Limiting

- **Limit**: Not publicly specified
- **Handling**: Retry on 429 status with `Retry-After` header
- **Caching**: 24-hour cache reduces API calls

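
One way to combine the `Retry-After` header with exponential backoff is sketched here. The function name `retryDelayMs` and its default values (1 second base, 60 second cap) are illustrative assumptions, not values taken from Harmony. The header may carry either a number of seconds or an HTTP date, so both forms are handled.

```typescript
// Hypothetical helper: computes how long to wait before retrying a request.
// retryAfter is the raw Retry-After header value (or null if absent),
// attempt is the zero-based retry counter.
function retryDelayMs(
  retryAfter: string | null,
  attempt: number,
  now: number = Date.now(),
  baseMs: number = 1000,
  maxMs: number = 60000,
): number {
  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    // Form 1: delay in whole seconds, e.g. "3"
    if (/^\d+$/.test(trimmed)) {
      return Math.min(Number(trimmed) * 1000, maxMs);
    }
    // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    const date = Date.parse(trimmed);
    if (!Number.isNaN(date)) {
      return Math.min(Math.max(date - now, 0), maxMs);
    }
  }
  // No usable header: fall back to exponential backoff (1s, 2s, 4s, ...).
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

A caller would typically sleep for the returned duration after a 429 (or 503) response and then retry, giving up after a fixed number of attempts.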

### 2. Deezer

**File**: `providers/deezer.ts`

#### Authentication

- **Method**: Public API (no authentication required)
- **Base URL**: `https://api.deezer.com`

#### Rate Limiting

- **Limit**: 50 requests per 5 seconds
- **Enforcement**: Server-side (429 status on exceed)
- **Handling**: Exponential backoff with `Retry-After` header

#### API Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /album/{id}` | Album lookup by Deezer ID | `/album/123456` |
| `GET /search/album` | Search by UPC | `/search/album?q=upc:0602537347377` |

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'www.deezer.com',
  pathname: '/:locale/album/:id'
});
```

**Matches**:
- `https://www.deezer.com/en/album/123456`
- `https://www.deezer.com/fr/album/123456`

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,            // UPC field
  title: FeatureQuality.GOOD,           // Title field
  artists: FeatureQuality.GOOD,         // Artist object
  releaseDate: FeatureQuality.GOOD,     // release_date field
  labels: FeatureQuality.GOOD,          // Label with catalog number
  media: FeatureQuality.GOOD,           // Disc structure
  tracks: FeatureQuality.GOOD,          // Track listing
  isrc: FeatureQuality.GOOD,            // ISRC per track
  images: 1400,                         // Max 1400x1400px
  copyright: FeatureQuality.GOOD,       // Copyright field
  availability: FeatureQuality.PRESENT  // Available countries (limited)
};
```

#### Data Mapping

**Deezer Album Object** → **HarmonyRelease**:

| Deezer Field | Harmony Field | Notes |
|--------------|---------------|-------|
| `title` | `title` | Direct |
| `artist.name` | `artists[0].name` | Single artist |
| `upc` | `gtin` | Direct |
| `release_date` | `releaseDate` | YYYY-MM-DD format |
| `label` | `labels[0].name` | Label name |
| `tracks.data[]` | `media[0].tracks[]` | Track array |
| `cover_xl` | `images[0].url` | 1400x1400px |
| `copyright` | `copyright` | Direct |
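
The Deezer mapping can be sketched as a plain function. The interfaces here are trimmed-down, hypothetical shapes that cover only the fields in the table; `SketchRelease` stands in for `HarmonyRelease`, and the assumption that track `duration` is given in seconds should be checked against the Deezer API before relying on it.

```typescript
// Hypothetical input shapes, limited to the fields in the mapping table.
interface DeezerTrack {
  title: string;
  duration: number; // assumed to be seconds
}

interface DeezerAlbum {
  title: string;
  artist: { name: string };
  upc?: string;
  release_date?: string;
  label?: string;
  cover_xl?: string;
  copyright?: string;
  tracks: { data: DeezerTrack[] };
}

// Stand-in for HarmonyRelease, reduced to what this sketch fills in.
interface SketchRelease {
  title: string;
  artists: { name: string }[];
  gtin?: string;
  releaseDate?: string;
  labels: { name: string }[];
  media: { position: number; tracks: { title: string; position: number; length: number }[] }[];
  images: { url: string }[];
  copyright?: string;
}

function harmonizeDeezer(album: DeezerAlbum): SketchRelease {
  return {
    title: album.title,
    artists: [{ name: album.artist.name }], // single artist, per the table
    gtin: album.upc,
    releaseDate: album.release_date,
    labels: album.label ? [{ name: album.label }] : [],
    media: [{
      position: 1,
      tracks: album.tracks.data.map((t, i) => ({
        title: t.title,
        position: i + 1,
        length: t.duration * 1000, // seconds -> milliseconds
      })),
    }],
    images: album.cover_xl ? [{ url: album.cover_xl }] : [],
    copyright: album.copyright,
  };
}
```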

### 3. iTunes (Apple Music)

**File**: `providers/itunes.ts`

#### Authentication

- **Method**: Public API (no authentication required)
- **Base URL**: `https://itunes.apple.com`

#### Multi-Region Support

The iTunes API is region-specific, so a release missing from one storefront may still be available in another. Harmony queries multiple regions in parallel.

**Supported Regions**:
- `US` (United States)
- `GB` (United Kingdom)
- `DE` (Germany)
- `JP` (Japan)
- `FR` (France)
- `CA` (Canada)
- `AU` (Australia)

**Region-Specific Endpoints**:
```
https://itunes.apple.com/us/lookup?id=123456
https://itunes.apple.com/gb/lookup?id=123456
https://itunes.apple.com/jp/lookup?id=123456
```
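
Querying several regions in parallel while tolerating per-region failures can be sketched with `Promise.allSettled`. The `lookup` callback is injected so the sketch stays independent of any HTTP client; `LookupFn`, `RegionResults`, `splitSettled`, and `lookupAllRegions` are hypothetical names, not Harmony's API.

```typescript
// Hypothetical signature for a single-region lookup.
type LookupFn<T> = (region: string, id: string) => Promise<T>;

interface RegionResults<T> {
  found: Map<string, T>;        // region -> successful result
  failed: Map<string, unknown>; // region -> rejection reason
}

// Pairs each settled result with its region (same order as the input).
function splitSettled<T>(
  regions: string[],
  settled: PromiseSettledResult<T>[],
): RegionResults<T> {
  const found = new Map<string, T>();
  const failed = new Map<string, unknown>();
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      found.set(regions[i], result.value);
    } else {
      failed.set(regions[i], result.reason);
    }
  });
  return { found, failed };
}

// Starts all lookups at once; one failing region does not abort the others.
async function lookupAllRegions<T>(
  id: string,
  regions: string[],
  lookup: LookupFn<T>,
): Promise<RegionResults<T>> {
  const settled = await Promise.allSettled(regions.map(r => lookup(r, id)));
  return splitSettled(regions, settled);
}
```

Unlike `Promise.all`, `Promise.allSettled` never rejects, so a 404 in one storefront still leaves the results from the others available.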

#### API Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /{region}/lookup` | Album lookup by iTunes ID | `/us/lookup?id=123456` |
| `GET /{region}/search` | Search by UPC | `/us/search?term=upc:0602537347377` |

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'music.apple.com',
  pathname: '/:region/album/:name/:id'
});
```

**Matches**:
- `https://music.apple.com/us/album/album-name/123456`
- `https://music.apple.com/jp/album/album-name/123456`

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,          // UPC in response
  title: FeatureQuality.GOOD,         // collectionName
  artists: FeatureQuality.GOOD,       // artistName
  releaseDate: FeatureQuality.GOOD,   // releaseDate
  labels: FeatureQuality.PRESENT,     // copyright (label name embedded)
  media: FeatureQuality.GOOD,         // Track listing
  tracks: FeatureQuality.GOOD,        // Track array
  isrc: FeatureQuality.MISSING,       // Not provided
  images: 'varies',                   // 600x600 to 3000x3000
  copyright: FeatureQuality.PRESENT,  // copyright field
  availability: FeatureQuality.GOOD   // Region-specific
};
```

### 4. Tidal

**File**: `providers/tidal.ts`

#### Authentication

- **Method**: OAuth2 Client Credentials Flow
- **Credentials**: `HARMONY_TIDAL_CLIENT_ID`, `HARMONY_TIDAL_CLIENT_SECRET`
- **Token endpoint**: `https://auth.tidal.com/v1/oauth2/token`
- **API version**: v2 (v1 deprecated 2025-01-21)

#### API Version Migration

**v1 (deprecated 2025-01-21)**:
- Endpoint: `https://api.tidal.com/v1/albums/{id}`
- Status: No longer supported

**v2 (current)**:
- Endpoint: `https://openapi.tidal.com/v2/albums/{id}`
- Migration: Completed in Harmony codebase

#### API Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /v2/albums/{id}` | Album lookup by Tidal ID | `/v2/albums/123456` |
| `GET /v2/albums/byBarcode/{upc}` | Lookup by UPC | `/v2/albums/byBarcode/0602537347377` |

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'tidal.com',
  pathname: '/browse/album/:id'
});
```

**Matches**:
- `https://tidal.com/browse/album/123456`

The pattern as written does not cover `https://listen.tidal.com/album/123456`, which uses a different hostname and has no `/browse` prefix. Accepting that form requires an additional or broader pattern.

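
The single pattern shown (`tidal.com` with `/browse/album/:id`) cannot by itself match the `listen.tidal.com/album/:id` form. One way to accept both URL shapes is sketched here with the standard `URL` class and a regular expression; `extractTidalAlbumId` is a hypothetical helper, and the assumption that album IDs are purely numeric comes only from the examples in this document.

```typescript
// Hypothetical helper: returns the album ID for either Tidal URL form,
// or undefined when the input is not a recognised Tidal album URL.
function extractTidalAlbumId(input: string): string | undefined {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return undefined; // not a parseable absolute URL
  }

  // Only the two hostnames that appear in this document are accepted.
  const hosts = ['tidal.com', 'listen.tidal.com'];
  if (!hosts.includes(url.hostname)) return undefined;

  // Optional "/browse" prefix, then "/album/<digits>", optional trailing slash.
  const match = /^\/(?:browse\/)?album\/(\d+)\/?$/.exec(url.pathname);
  return match ? match[1] : undefined;
}
```

Because only `hostname` and `pathname` are inspected, query strings and fragments on the input URL are ignored.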

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,         // barcode field
  title: FeatureQuality.GOOD,        // title field
  artists: FeatureQuality.GOOD,      // artists array
  releaseDate: FeatureQuality.GOOD,  // releaseDate
  labels: FeatureQuality.GOOD,       // label with catalog number
  media: FeatureQuality.GOOD,        // Media array
  tracks: FeatureQuality.GOOD,       // Track listing
  isrc: FeatureQuality.GOOD,         // ISRC per track
  images: 1280,                      // Max 1280x1280px
  copyright: FeatureQuality.GOOD,    // copyright field
  availability: FeatureQuality.GOOD  // Available countries
};
```

### 5. MusicBrainz

**File**: `providers/musicbrainz.ts`

#### Authentication

- **Method**: Public API (no authentication required)
- **Base URL**: Configurable via `HARMONY_MB_API_URL` (default: `https://musicbrainz.org/ws/2`)

#### Rate Limiting

- **Limit**: 5 requests per 5 seconds (1 req/sec average)
- **Enforcement**: Server-side (503 status on exceed)
- **Handling**: Exponential backoff, respect `Retry-After` header

#### API Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `GET /release/{mbid}` | Release lookup by MBID | `/release/12345678-1234-1234-1234-123456789012` |
| `GET /release?barcode={gtin}` | Search by barcode | `/release?barcode=0602537347377` |
| `GET /url?resource={url}` | MBID resolution | `/url?resource=https://open.spotify.com/album/xyz` |

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'musicbrainz.org',
  pathname: '/release/:mbid'
});
```

**Matches**:
- `https://musicbrainz.org/release/12345678-1234-1234-1234-123456789012`

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.GOOD,            // barcode field
  title: FeatureQuality.GOOD,           // title field
  artists: FeatureQuality.GOOD,         // artist-credit array
  releaseDate: FeatureQuality.GOOD,     // date field
  labels: FeatureQuality.GOOD,          // label-info array
  media: FeatureQuality.GOOD,           // media array
  tracks: FeatureQuality.GOOD,          // track array
  isrc: FeatureQuality.GOOD,            // ISRC per recording
  images: FeatureQuality.MISSING,       // No images in API
  copyright: FeatureQuality.MISSING,    // Not in API
  availability: FeatureQuality.MISSING  // Not tracked
};
```

#### Special Role: Template Provider

MusicBrainz serves as a **template provider** for the merge algorithm:

- **Purpose**: Provide reference data for comparison
- **Usage**: `musicbrainz!` parameter in URL
- **Behavior**: MusicBrainz data used as baseline, other providers compared against it
- **Use case**: Verify existing MusicBrainz releases against external sources

#### MBID Resolution

**Batch URL Lookup** (up to 100 URLs per request):

```typescript
async function resolveMBIDs(urls: string[]): Promise<Map<string, string>> {
  const params = urls.map(url => `resource=${encodeURIComponent(url)}`).join('&');
  // fmt=json is needed because the MusicBrainz web service returns XML by default
  const response = await fetch(`https://musicbrainz.org/ws/2/url?${params}&inc=release-rels&fmt=json`);
  const data = await response.json();

  const mbids = new Map<string, string>();
  // Guard against a missing `urls` array or entries without relations
  for (const urlData of data.urls ?? []) {
    const mbid = urlData.relations?.find(r => r.type === 'streaming')?.release?.id;
    if (mbid) {
      mbids.set(urlData.resource, mbid);
    }
  }

  return mbids;
}
```
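
Because a single lookup accepts at most 100 URLs, longer lists have to be split into batches first. A minimal sketch of that step follows; `chunk` and `buildUrlLookupQueries` are hypothetical helper names, and the query string mirrors the `resource`/`inc=release-rels` parameters used for MBID resolution, with `fmt=json` added to request JSON.

```typescript
// Splits an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Builds one query string per batch for the /ws/2/url endpoint.
function buildUrlLookupQueries(urls: string[], batchSize: number = 100): string[] {
  return chunk(urls, batchSize).map(batch => {
    const resources = batch.map(u => `resource=${encodeURIComponent(u)}`).join('&');
    return `${resources}&inc=release-rels&fmt=json`;
  });
}
```

Each resulting query still counts as one request against the MusicBrainz rate limit, so the batches should be sent through the same limiter as other lookups.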

**Duplicate Detection**:
- Check if external URLs already linked to MusicBrainz releases
- Warn user before creating duplicate
- Provide link to existing release

## HTML Scraping Providers

### 6. Bandcamp

**File**: `providers/bandcamp.ts`

#### Scraping Method

- **Technique**: JSON-LD extraction from `<script type="application/ld+json">`
- **Fallback**: HTML parsing with CSS selectors
- **Reliability**: High (JSON-LD is stable)

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: '*.bandcamp.com',
  pathname: '/album/:slug'
});
```

**Matches**:
- `https://artist.bandcamp.com/album/album-name`
- `https://label.bandcamp.com/album/album-name`

#### Data Extraction

**JSON-LD Schema.org MusicAlbum**:
```json
{
  "@type": "MusicAlbum",
  "name": "Album Title",
  "byArtist": {
    "@type": "MusicGroup",
    "name": "Artist Name"
  },
  "datePublished": "2014-11-24",
  "image": "https://f4.bcbits.com/img/a123456789_10.jpg",
  "track": [
    {
      "@type": "MusicRecording",
      "name": "Track 1",
      "duration": "PT4M5S"
    }
  ],
  "recordLabel": {
    "@type": "Organization",
    "name": "Label Name"
  }
}
```
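
Two small steps are implied by the JSON-LD approach: pulling the embedded JSON out of the page, and converting ISO 8601 durations such as `PT4M5S` to milliseconds. The sketch here uses a regular expression rather than a DOM parser to stay dependency-free; `extractJsonLd` and `isoDurationToMs` are hypothetical names, and only the `PT…H…M…S` duration form from the example is handled.

```typescript
// Returns every JSON-LD block in the page that parses as valid JSON.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Ignore script blocks whose content is not valid JSON.
    }
  }
  return blocks;
}

// Converts "PT4M5S"-style durations to milliseconds.
// Returns undefined for anything that does not match that form.
function isoDurationToMs(duration: string): number | undefined {
  const m = /^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$/.exec(duration);
  if (!m || (m[1] === undefined && m[2] === undefined && m[3] === undefined)) {
    return undefined;
  }
  const hours = Number(m[1] ?? 0);
  const minutes = Number(m[2] ?? 0);
  const seconds = Number(m[3] ?? 0);
  return Math.round((hours * 3600 + minutes * 60 + seconds) * 1000);
}
```

A regex is adequate for locating a well-formed `<script>` element, but the CSS-selector fallback described above would normally go through a real HTML parser.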

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.MISSING,         // Not provided
  title: FeatureQuality.GOOD,           // name field
  artists: FeatureQuality.GOOD,         // byArtist
  releaseDate: FeatureQuality.GOOD,     // datePublished
  labels: FeatureQuality.GOOD,          // recordLabel
  media: FeatureQuality.GOOD,           // track array
  tracks: FeatureQuality.GOOD,          // Track listing
  isrc: FeatureQuality.MISSING,         // Not provided
  images: 3000,                         // Max 3000x3000px (a123456789_10.jpg)
  copyright: FeatureQuality.PRESENT,    // publisher field
  availability: FeatureQuality.MISSING  // Not specified
};
```

#### Challenges

- **No GTIN**: Bandcamp doesn't display barcodes
- **Subdomain variability**: Each artist/label has unique subdomain
- **Rate limiting**: Not publicly specified, conservative approach

### 7. Beatport

**File**: `providers/beatport.ts`

#### Scraping Method

- **Technique**: HTML parsing with CSS selectors
- **Reliability**: Medium (HTML structure changes break scraper)

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'www.beatport.com',
  pathname: '/release/:slug/:id'
});
```

**Matches**:
- `https://www.beatport.com/release/album-name/123456`

#### Data Extraction

**CSS Selectors**:
```typescript
const selectors = {
  title: '.interior-release-chart-content-item h1',
  artists: '.interior-release-chart-content-item .artist a',
  releaseDate: '.interior-release-chart-content-item .release-date',
  label: '.interior-release-chart-content-item .label a',
  catalogNumber: '.interior-release-chart-content-item .catalog-number',
  tracks: '.track-grid .track',
  trackTitle: '.track-title',
  trackArtists: '.track-artists a',
  trackLength: '.track-length',
  coverImage: '.interior-release-chart-artwork img'
};
```

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.PRESENT,         // Sometimes in metadata
  title: FeatureQuality.GOOD,           // h1 element
  artists: FeatureQuality.GOOD,         // Artist links
  releaseDate: FeatureQuality.GOOD,     // Release date element
  labels: FeatureQuality.GOOD,          // Label + catalog number
  media: FeatureQuality.GOOD,           // Track grid
  tracks: FeatureQuality.GOOD,          // Track listing
  isrc: FeatureQuality.MISSING,         // Not displayed
  images: 'varies',                     // Cover image
  copyright: FeatureQuality.MISSING,    // Not displayed
  availability: FeatureQuality.MISSING  // Not specified
};
```

#### Challenges

- **HTML structure changes**: Frequent redesigns break selectors
- **JavaScript rendering**: Some content loaded dynamically
- **Rate limiting**: Not specified, risk of IP blocking

### 8. Mora (Japan)

**File**: `providers/mora.ts`

#### Scraping Method

- **Technique**: HTML parsing with CSS selectors
- **Language**: Japanese (requires UTF-8 handling)
- **Reliability**: Medium

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'mora.jp',
  pathname: '/package/:id'
});
```

**Matches**:
- `https://mora.jp/package/123456`

#### Data Extraction

**CSS Selectors** (Japanese labels):
```typescript
const selectors = {
  title: '.productTitle',
  artists: '.artistName a',
  releaseDate: '.releaseDate',
  label: '.labelName',
  catalogNumber: '.catalogNumber',
  tracks: '.trackList .track',
  coverImage: '.productImage img'
};
```

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.PRESENT,       // JAN code (Japanese barcode)
  title: FeatureQuality.GOOD,         // Product title
  artists: FeatureQuality.GOOD,       // Artist links
  releaseDate: FeatureQuality.GOOD,   // Release date
  labels: FeatureQuality.GOOD,        // Label + catalog number
  media: FeatureQuality.GOOD,         // Track list
  tracks: FeatureQuality.GOOD,        // Track details
  isrc: FeatureQuality.MISSING,       // Not displayed
  images: 'varies',                   // Product image
  copyright: FeatureQuality.PRESENT,  // Copyright notice
  availability: FeatureQuality.GOOD   // Japan-specific
};
```

#### Challenges

- **Japanese text**: Requires proper encoding and language detection
- **JAN vs. UPC**: Japanese Article Number may differ from international UPC
- **Regional availability**: Japan-only releases

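
JAN codes are typically 13-digit EAN-13 numbers, while a UPC-A barcode has 12 digits, so the two need to be normalized before they can be compared across providers. A minimal sketch follows, assuming the standard GS1 check-digit rule; `isValidGtin` and `sameGtin` are hypothetical helpers, not part of Harmony.

```typescript
// Validates the GS1 check digit of a GTIN-8, UPC-A (12), EAN-13, or GTIN-14.
function isValidGtin(code: string): boolean {
  if (!/^(\d{8}|\d{12}|\d{13}|\d{14})$/.test(code)) return false;

  const digits = code.split('').map(Number);
  const check = digits.pop() as number;

  // Starting from the digit next to the check digit, weights alternate 3, 1, 3, ...
  let sum = 0;
  digits.reverse().forEach((d, i) => {
    sum += d * (i % 2 === 0 ? 3 : 1);
  });
  return (10 - (sum % 10)) % 10 === check;
}

// Two codes denote the same GTIN if they match once left-padded to 14 digits.
function sameGtin(a: string, b: string): boolean {
  return a.padStart(14, '0') === b.padStart(14, '0');
}
```

With this normalization, the 12-digit UPC `602537347377` and the 13-digit form `0602537347377` used in the endpoint examples compare as equal.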

### 9. Ototoy (Japan)

**File**: `providers/ototoy.ts`

#### Scraping Method

- **Technique**: HTML parsing with CSS selectors
- **Language**: Japanese
- **Reliability**: Medium

#### URL Pattern

```typescript
urlPattern = new URLPattern({
  hostname: 'ototoy.jp',
  pathname: '/album/:id'
});
```

**Matches**:
- `https://ototoy.jp/album/123456`

#### Feature Quality

```typescript
featureQuality = {
  gtin: FeatureQuality.PRESENT,       // JAN code
  title: FeatureQuality.GOOD,         // Album title
  artists: FeatureQuality.GOOD,       // Artist name
  releaseDate: FeatureQuality.GOOD,   // Release date
  labels: FeatureQuality.GOOD,        // Label info
  media: FeatureQuality.GOOD,         // Track list
  tracks: FeatureQuality.GOOD,        // Track details
  isrc: FeatureQuality.MISSING,       // Not displayed
  images: 'varies',                   // Album art
  copyright: FeatureQuality.PRESENT,  // Copyright info
  availability: FeatureQuality.GOOD   // Japan-specific
};
```

## Provider Base Architecture

### MetadataProvider (Abstract Base)

**File**: `providers/base.ts`

**Core Functionality**:

```typescript
abstract class MetadataProvider {
  // Identity
  abstract name: string;
  abstract urlPattern: URLPattern;

  // Lookup methods
  abstract lookupByUrl(url: string): Promise<ProviderRelease>;
  abstract lookupByGtin(gtin: string, region?: string): Promise<ProviderRelease>;

  // Harmonization
  abstract harmonize(release: ProviderRelease): HarmonyRelease;

  // Feature quality
  abstract featureQuality: FeatureQualityMap;

  // Rate limiting
  protected rateLimit: RateLimiter;
  protected async throttle(): Promise<void> {
    await this.rateLimit.wait();
  }

  // Caching
  protected cache: SnapStorage;
  protected async getCached(key: string): Promise<Response | null> {
    return await this.cache.get(key);
  }
  protected async setCached(key: string, response: Response): Promise<void> {
    await this.cache.set(key, response);
  }

  // URL matching
  matchesUrl(url: string): boolean {
    return this.urlPattern.test(url);
  }
}
```
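
The caching hooks above delegate to `SnapStorage`, whose internals are not shown here. To illustrate the 24-hour expiry behaviour on its own terms, here is a small in-memory cache with a time-to-live; `TtlCache` is a hypothetical stand-in and makes no claim about how `SnapStorage` actually works. The clock is injectable so that expiry can be exercised without waiting.

```typescript
// Hypothetical in-memory cache with a fixed time-to-live per entry.
class TtlCache<V> {
  private entries = new Map<string, { value: V; storedAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if absent or older than the TTL.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt >= this.ttlMs) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, storedAt: this.now() });
  }
}

// Usage: a 24-hour cache keyed by request URL.
const DAY_MS = 24 * 60 * 60 * 1000;
const responseCache = new TtlCache<string>(DAY_MS);
responseCache.set('https://api.deezer.com/album/123456', '{"title":"Album Title"}');
```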

### MetadataApiProvider (OAuth2)

**File**: `providers/api_base.ts`

**OAuth2 Support**:

```typescript
abstract class MetadataApiProvider extends MetadataProvider {
  protected abstract clientId: string;
  protected abstract clientSecret: string;
  protected abstract tokenEndpoint: string;

  protected async getAccessToken(): Promise<string> {
    // Check cache
    const cached = this.getTokenFromCache();
    if (cached && !this.isTokenExpired(cached)) {
      return cached.access_token;
    }

    // Request new token
    const token = await this.requestToken();
    this.cacheToken(token);
    return token.access_token;
  }

  // Abstract methods cannot carry the `async` modifier; implementations return a Promise
  protected abstract requestToken(): Promise<OAuth2Token>;

  protected async fetch(url: string, options?: RequestInit): Promise<Response> {
    const token = await this.getAccessToken();
    // Calls the global fetch, adding the bearer token to any caller-supplied headers
    return await fetch(url, {
      ...options,
      headers: {
        ...options?.headers,
        'Authorization': `Bearer ${token}`
      }
    });
  }
}
```
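
The `isTokenExpired` check used in `getAccessToken` is not shown in the listing. A minimal sketch of one way to implement it follows, assuming the token response carries the standard OAuth2 `expires_in` field (in seconds) and that the time of issue is recorded when the token is cached. The `CachedToken` shape and the 60-second safety margin are assumptions for illustration.

```typescript
// Standard OAuth2 token response fields used by this sketch.
interface OAuth2Token {
  access_token: string;
  expires_in: number; // lifetime in seconds, e.g. 3600
}

// Hypothetical cached form: the token plus the time it was obtained.
interface CachedToken extends OAuth2Token {
  obtainedAt: number; // epoch milliseconds
}

// Treats the token as expired slightly early, so a request started just
// before expiry does not fail mid-flight.
function isTokenExpired(
  token: CachedToken,
  now: number = Date.now(),
  safetyMarginMs: number = 60000,
): boolean {
  const expiresAt = token.obtainedAt + token.expires_in * 1000;
  return now >= expiresAt - safetyMarginMs;
}
```

With a 3600-second lifetime, this refreshes the token after 59 minutes instead of waiting for the provider to reject it.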

### RateLimiter

**File**: `utils/rate_limiter.ts`

**Implementation**:

```typescript
class RateLimiter {
  private queue: number[] = [];
  private maxRequests: number;
  private timeWindow: number; // milliseconds

  constructor(maxRequests: number, timeWindow: number) {
    this.maxRequests = maxRequests;
    this.timeWindow = timeWindow;
  }

  async wait(): Promise<void> {
    const now = Date.now();

    // Remove old requests outside time window
    this.queue = this.queue.filter(t => now - t < this.timeWindow);

    // If at limit, wait until oldest request expires
    if (this.queue.length >= this.maxRequests) {
      const oldestRequest = this.queue[0];
      const waitTime = this.timeWindow - (now - oldestRequest);
      await new Promise(resolve => setTimeout(resolve, waitTime));
      return this.wait(); // Recursive call after waiting
    }

    // Add current request to queue
    this.queue.push(now);
  }
}

// Usage
const deezerLimiter = new RateLimiter(50, 5000); // 50 req / 5 sec
const mbLimiter = new RateLimiter(5, 5000);      // 5 req / 5 sec
```

## Provider Registry

**File**: `providers/registry.ts`

**Registration**:

```typescript
class ProviderRegistry {
  private providers = new Map<string, MetadataProvider>();
  private categories = new Map<string, string[]>();

  register(provider: MetadataProvider, category: string): void {
    this.providers.set(provider.name, provider);

    if (!this.categories.has(category)) {
      this.categories.set(category, []);
    }
    this.categories.get(category)!.push(provider.name);
  }

  get(name: string): MetadataProvider | undefined {
    return this.providers.get(name);
  }

  getByCategory(category: string): MetadataProvider[] {
    const names = this.categories.get(category) || [];
    return names.map(name => this.providers.get(name)!);
  }

  getByUrl(url: string): MetadataProvider | undefined {
    for (const provider of this.providers.values()) {
      if (provider.matchesUrl(url)) {
        return provider;
      }
    }
    return undefined;
  }

  getByGtin(): MetadataProvider[] {
    return Array.from(this.providers.values()).filter(p =>
      p.featureQuality.gtin !== FeatureQuality.MISSING
    );
  }
}

// Initialize registry
const registry = new ProviderRegistry();
registry.register(new SpotifyProvider(), 'preferred');
registry.register(new DeezerProvider(), 'default');
registry.register(new iTunesProvider(), 'default');
registry.register(new TidalProvider(), 'preferred');
registry.register(new MusicBrainzProvider(), 'preferred');
registry.register(new BandcampProvider(), 'all');
registry.register(new BeatportProvider(), 'all');
registry.register(new MoraProvider(), 'japan');
registry.register(new OtotoyProvider(), 'japan');
```

## Not Implemented: KKBOX

**Status**: Mentioned in documentation but not implemented

**Reason**: Unknown (possibly API access issues or low priority)

**Potential Implementation**:
- **Region**: Taiwan, Hong Kong, Japan, Singapore, Malaysia
- **API**: Public API available
- **Authentication**: API key required
- **Data quality**: High (official metadata)

## Summary

Harmony's provider integration demonstrates:

1. **Diverse access methods**: API-based (5) and HTML scraping (4)
2. **Unified abstraction**: All providers implement a common interface
3. **OAuth2 support**: Spotify and Tidal with token caching
4. **Rate limiting**: Per-provider rate limiters with exponential backoff
5. **Multi-region support**: iTunes queries multiple regions in parallel
6. **Feature quality ratings**: Transparent quality assessment per provider
7. **Graceful degradation**: `Promise.allSettled` ensures partial results
8. **MusicBrainz integration**: MBID resolution and duplicate detection
9. **Caching**: 24-hour HTTP response cache reduces API calls

Taken together, these patterns make the architecture a useful reference for building multi-source metadata aggregation systems.