# Harmony - Data Model and Storage Analysis

## Storage Philosophy

Harmony employs a **cache-first, no-application-database** architecture:

- **No application database**: No PostgreSQL, MySQL, MongoDB, etc.; the only on-disk store is the `snap_storage` response cache (an SQLite index plus response files)
- **No persistent user data**: No accounts, no saved searches, no user-generated content
- **Cache as storage**: Provider HTTP responses are cached via the `snap_storage` library
- **In-memory processing**: All data transformations happen in memory
- **Stateless design**: Each request is handled independently; the response cache is the only state shared between requests

This approach prioritizes:
- **Simplicity**: No application schema, so no database migrations and no schema evolution
- **Reproducibility**: The permalink system replays cached responses to reproduce exact results
- **API compliance**: Caching reduces the number of calls to provider APIs, which helps keep within their rate limits
- **Deployment ease**: No database server required

## Persistence Layer: snap_storage

### Overview

`snap_storage` is a Deno library for HTTP response caching with an SQLite backend.

**Repository**: https://github.com/kellnerd/snap-storage (same author as Harmony)

**Purpose**: Store HTTP responses with timestamps for later retrieval

### Storage Structure

#### SQLite Database: `snaps.db`

**Location**: `${HARMONY_DATA_DIR}/snaps.db` (default: `./snaps.db`)

**Schema** (conceptual):
```sql
CREATE TABLE snaps (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  key TEXT NOT NULL,        -- not unique: one URL can have several snapshots
  url TEXT NOT NULL,
  timestamp INTEGER NOT NULL,
  status INTEGER NOT NULL,
  headers TEXT NOT NULL,
  body_path TEXT NOT NULL,
  created_at INTEGER NOT NULL,
  UNIQUE (key, timestamp)   -- one snapshot per URL and timestamp
);

-- UNIQUE (key, timestamp) already provides an index for lookups by key
CREATE INDEX idx_snaps_timestamp ON snaps(timestamp);
CREATE INDEX idx_snaps_url ON snaps(url);
```

**Fields**:
- `key`: Cache key (SHA-256 hash of the normalized URL, including its query parameters); not unique on its own, because the permalink system keeps several snapshots of the same URL
- `url`: Original request URL
- `timestamp`: Unix timestamp of the request
- `status`: HTTP status code
- `headers`: JSON-encoded response headers
- `body_path`: Path to the response body file in the `snaps/` directory
- `created_at`: Record creation time (milliseconds since the Unix epoch, from `Date.now()`)

#### File Directory: `snaps/`

**Location**: `${HARMONY_DATA_DIR}/snaps/` (default: `./snaps/`)

**Structure**:
```
snaps/
├── 0a/
│   ├── 0a1b2c3d4e5f6a7b8c9d.json
│   └── 0a9f8e7d6c5b4a3f.json
├── 1b/
│   └── 1b2c3d4e5f6a7b8c9d0e.json
└── ...
```

**File naming**: The first 2 characters of the hash name the directory, and the full hash names the file (hashes are shortened in the listing above)

**File content**: Raw HTTP response body (JSON, HTML, XML, etc.)

### Cache Operations

#### Store Response

```typescript
// Conceptual sketch: `db` is a hypothetical minimal SQLite handle,
// and `hashUrl` is defined under "Cache Key Generation" below
declare const db: {
  execute(sql: string, params: unknown[]): Promise<void>;
};
declare function hashUrl(url: string): Promise<string>;

const DATA_DIR = Deno.env.get('HARMONY_DATA_DIR') ?? '.';

interface CacheEntry {
  url: string;
  timestamp: number;
  response: Response;
}

async function storeResponse(entry: CacheEntry): Promise<void> {
  const key = await hashUrl(entry.url); // hashing is asynchronous (Web Crypto)
  const dir = `${DATA_DIR}/snaps/${key.slice(0, 2)}`;
  const bodyPath = `${dir}/${key}.json`;

  // Store body to file (the two-character bucket directory may not exist yet)
  await Deno.mkdir(dir, { recursive: true });
  await Deno.writeTextFile(bodyPath, await entry.response.text());

  // Store metadata to database
  await db.execute(
    `INSERT INTO snaps (key, url, timestamp, status, headers, body_path, created_at)
     VALUES (?, ?, ?, ?, ?, ?, ?)`,
    [
      key,
      entry.url,
      entry.timestamp,
      entry.response.status,
      JSON.stringify(Object.fromEntries(entry.response.headers)),
      bodyPath,
      Date.now(),
    ],
  );
}
```

#### Retrieve Response

```typescript
// Conceptual sketch: `db` is a hypothetical minimal SQLite handle
declare const db: {
  queryOne(sql: string, params: unknown[]): Promise<SnapRow | undefined>;
};
declare function hashUrl(url: string): Promise<string>;

interface SnapRow {
  status: number;
  headers: string; // JSON-encoded
  body_path: string;
}

const MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours

async function getResponse(url: string, timestamp?: number): Promise<Response | null> {
  const key = await hashUrl(url);

  let query = `SELECT * FROM snaps WHERE key = ?`;
  const params: (string | number)[] = [key];

  if (timestamp !== undefined) {
    // Permalink mode: exact timestamp match
    query += ` AND timestamp = ?`;
    params.push(timestamp);
  } else {
    // Normal mode: most recent snapshot within the cache duration
    query += ` AND created_at > ? ORDER BY created_at DESC LIMIT 1`;
    params.push(Date.now() - MAX_AGE_MS);
  }

  const row = await db.queryOne(query, params);
  if (!row) return null;

  // Read body from file
  const body = await Deno.readTextFile(row.body_path);

  // Reconstruct Response object
  return new Response(body, {
    status: row.status,
    headers: JSON.parse(row.headers),
  });
}
```

### Cache Policy

#### Default Policy

- **Duration**: 24 hours
- **Eviction**: No automatic eviction (manual cleanup required)
- **Size limit**: None enforced (the cache grows indefinitely)

#### Permalink Policy

- **Duration**: Indefinite (never evicted automatically; the manual cleanup below can still delete snapshots that permalinks point to)
- **Purpose**: Enable reproducible results
- **Lookup**: Exact timestamp match
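
Both lookup modes can be read as one selection rule over the stored snapshots of a single URL. The following is a minimal in-memory sketch that mirrors the SQL in `getResponse`, not code from `snap_storage`; the `Snapshot` record and `selectSnapshot` function are hypothetical names.

```typescript
// Hypothetical in-memory mirror of the two lookup modes in getResponse
interface Snapshot {
  timestamp: number; // request timestamp (what a permalink refers to)
  createdAt: number; // milliseconds since the Unix epoch
  bodyPath: string;
}

const DEFAULT_MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24 hours

function selectSnapshot(
  snapshots: Snapshot[],
  now: number,
  permalinkTimestamp?: number,
): Snapshot | undefined {
  if (permalinkTimestamp !== undefined) {
    // Permalink mode: exact match, regardless of age
    return snapshots.find((s) => s.timestamp === permalinkTimestamp);
  }
  // Normal mode: newest snapshot created within the cache duration
  return snapshots
    .filter((s) => s.createdAt > now - DEFAULT_MAX_AGE_MS)
    .sort((a, b) => b.createdAt - a.createdAt)[0];
}

const day = 24 * 60 * 60 * 1000;
const now = 10 * day;
const snaps: Snapshot[] = [
  { timestamp: 1, createdAt: now - 3 * day, bodyPath: 'old.json' },
  { timestamp: 2, createdAt: now - 2 * 60 * 60 * 1000, bodyPath: 'fresh.json' },
];
console.log(selectSnapshot(snaps, now)?.bodyPath); // fresh.json
console.log(selectSnapshot(snaps, now, 1)?.bodyPath); // old.json
console.log(selectSnapshot(snaps, now + 2 * day)); // undefined
```

The permalink branch deliberately skips the age filter. This is why permalinks keep working after the 24-hour window, as long as nobody deletes the snapshot manually.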

#### Cache Key Generation

```typescript
async function hashUrl(url: string): Promise<string> {
  // Normalize URL: sort query parameters so their order does not change the key
  const normalized = new URL(url);
  normalized.searchParams.sort();

  // Hash the normalized URL with SHA-256 (Web Crypto, available globally in Deno)
  const data = new TextEncoder().encode(normalized.toString());
  const hashBuffer = await crypto.subtle.digest('SHA-256', data);

  // Hex-encode the 32-byte digest (64 characters)
  return Array.from(new Uint8Array(hashBuffer))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}
```

### Cache Management

#### Manual Cleanup

There is no automatic cleanup, so users must delete old cache entries manually:

```bash
# Delete cache rows older than 30 days (created_at is in milliseconds; `date -d` requires GNU date)
sqlite3 snaps.db "DELETE FROM snaps WHERE created_at < $(date -d '30 days ago' +%s)000"

# Delete body files older than 30 days (an age-based approximation, not a true orphan check)
find snaps/ -type f -mtime +30 -delete
```
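
The `find` command deletes files by age, which only approximates the set of rows removed from `snaps.db`. A true orphan check compares the files on disk with the `body_path` values that the database still references. The sketch below shows only that comparison, as a pure function. It assumes both lists use the same path form, and it omits how the lists are collected (for example, `sqlite3 snaps.db "SELECT body_path FROM snaps"` plus a directory walk). `findOrphanedBodies` is a hypothetical name.

```typescript
// Hypothetical helper: body files on disk that no snaps row references any more
function findOrphanedBodies(referencedPaths: Iterable<string>, filesOnDisk: string[]): string[] {
  const referenced = new Set(referencedPaths);
  return filesOnDisk.filter((path) => !referenced.has(path));
}

const referenced = ['snaps/0a/0a1b.json'];
const onDisk = ['snaps/0a/0a1b.json', 'snaps/1b/1b2c.json'];
console.log(findOrphanedBodies(referenced, onDisk).join('\n')); // snaps/1b/1b2c.json
```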

#### Cache Statistics

```bash
# Total cache entries
sqlite3 snaps.db "SELECT COUNT(*) FROM snaps"

# Cache size
du -sh snaps/

# Entries per provider host (assumes https:// URLs)
sqlite3 snaps.db "SELECT substr(url, 9, instr(substr(url, 9), '/') - 1) AS host, COUNT(*) FROM snaps WHERE url LIKE 'https://%/%' GROUP BY host ORDER BY COUNT(*) DESC"
```

## MBID Cache

### Purpose

Cache MusicBrainz ID (MBID) mappings for external URLs to avoid repeated API calls.

### Storage Location

- **Development**: `localStorage` (persisted to disk across restarts of the Deno process)
- **Production**: `sessionStorage` (kept in memory and cleared when the Deno process exits)

**Rationale**: Development benefits from a persistent cache, while production prioritizes fresh data.

### Cache Structure

```typescript
interface MBIDCache {
  [externalUrl: string]: MBIDCacheEntry;
}

interface MBIDCacheEntry {
  mbid: string;
  type: 'release' | 'release-group' | 'recording' | 'artist' | 'label';
  cached: number; // Unix timestamp in milliseconds (Date.now())
}
```

### Cache Operations

#### Store MBID Mapping

```typescript
// Deno Deploy sets the DENO_DEPLOYMENT_ID environment variable, which identifies production
function getStorage(): Storage {
  return Deno.env.get('DENO_DEPLOYMENT_ID') ? sessionStorage : localStorage;
}

function cacheMBID(url: string, mbid: string, type: MBIDCacheEntry['type']): void {
  const cache = getMBIDCache();
  cache[url] = {
    mbid,
    type,
    cached: Date.now(),
  };
  setMBIDCache(cache);
}

function getMBIDCache(): MBIDCache {
  const cached = getStorage().getItem('harmony_mbid_cache');
  return cached ? JSON.parse(cached) : {};
}

function setMBIDCache(cache: MBIDCache): void {
  getStorage().setItem('harmony_mbid_cache', JSON.stringify(cache));
}
```

#### Retrieve MBID Mapping

```typescript
function getCachedMBID(url: string): MBIDCacheEntry | null {
  const cache = getMBIDCache();
  const entry = cache[url];

  if (!entry) return null;

  // Check if cache is stale (24 hours)
  const maxAge = 24 * 60 * 60 * 1000;
  if (Date.now() - entry.cached > maxAge) {
    delete cache[url];
    setMBIDCache(cache);
    return null;
  }

  return entry;
}
```

#### Batch MBID Lookup

The MusicBrainz API supports batch URL lookups by repeating the `resource` parameter (up to 100 URLs per request):

```typescript
const MB_BATCH_SIZE = 100;

// Only the response fields read below
interface MBUrlRelation {
  'target-type'?: string;
  release?: { id: string };
}
interface MBUrlEntity {
  resource: string;
  relations?: MBUrlRelation[];
}

async function resolveMBIDs(urls: string[]): Promise<Map<string, MBIDCacheEntry>> {
  const results = new Map<string, MBIDCacheEntry>();

  // Check cache first
  const uncached: string[] = [];
  for (const url of urls) {
    const cached = getCachedMBID(url);
    if (cached) {
      results.set(url, cached);
    } else {
      uncached.push(url);
    }
  }

  // Batch lookup uncached URLs (100 at a time), requesting JSON with release relationships
  for (let i = 0; i < uncached.length; i += MB_BATCH_SIZE) {
    const params = new URLSearchParams({ inc: 'release-rels', fmt: 'json' });
    for (const url of uncached.slice(i, i + MB_BATCH_SIZE)) params.append('resource', url);

    const response = await fetch(`https://musicbrainz.org/ws/2/url?${params}`, {
      // MusicBrainz requires a descriptive User-Agent; this value is a placeholder
      headers: { 'User-Agent': 'ExampleApp/0.1 ( contact@example.org )' },
    });
    if (response.ok) {
      const data = await response.json();
      // Assumption: several resources yield { urls: [...] }, a single one yields the URL entity itself
      const entities: MBUrlEntity[] = data.urls ?? [data];

      // Parse response and cache results (the relation's target type, not its link type)
      for (const urlData of entities) {
        const mbid = urlData.relations?.find((r) => r['target-type'] === 'release')?.release?.id;
        if (mbid) {
          cacheMBID(urlData.resource, mbid, 'release');
          results.set(urlData.resource, { mbid, type: 'release', cached: Date.now() });
        }
      }
    }

    // Stay within the MusicBrainz rate limit (about one request per second)
    if (i + MB_BATCH_SIZE < uncached.length) await new Promise((r) => setTimeout(r, 1000));
  }

  return results;
}
```

## Core Data Model: HarmonyRelease

### Schema Definition

**Location**: `harmonizer/types.ts` (273 lines)

**Full Interface**:
```typescript
interface HarmonyRelease {
  // ===== Basic Metadata =====
  title: string;
  artists: ArtistCreditName[];
  gtin?: string; // Global Trade Item Number (barcode)

  // ===== Media and Tracks =====
  media: HarmonyMedium[];

  // ===== Release Details =====
  language?: string; // ISO 639-3 code
  script?: string; // ISO 15924 code
  status?: ReleaseStatus;
  types: ReleaseType[];
  releaseDate?: PartialDate;

  // ===== Commercial Information =====
  labels: Label[];
  packaging?: PackagingType;
  copyright?: string;

  // ===== Distribution =====
  availableIn?: string[]; // ISO 3166-1 alpha-2 country codes
  excludedFrom?: string[]; // ISO 3166-1 alpha-2 country codes

  // ===== Visual Assets =====
  images: Image[];

  // ===== External Links =====
  externalLinks: ExternalLink[];

  // ===== Metadata About Metadata =====
  info: ReleaseInfo;
}
```

### Sub-Structures

#### ArtistCreditName

```typescript
interface ArtistCreditName {
  name: string; // Artist name
  creditedName?: string; // Name as credited on this release, if it differs from `name`
  joinPhrase?: string; // Separator placed after this artist (e.g., " & ", " feat. ", " vs. ")
  mbid?: string; // MusicBrainz artist ID
}
```

**Example**:
```typescript
[
  { name: "Artist A", joinPhrase: " & " },
  { name: "Artist B", joinPhrase: " feat. " },
  { name: "Artist C", creditedName: "Artist C (DJ Set)" }
]
```

**Rendering**: "Artist A & Artist B feat. Artist C (DJ Set)"
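
This rendering comes from concatenating each entry's displayed name with its join phrase. The sketch below is a minimal version that assumes the displayed name is `creditedName` when present and `name` otherwise. `renderArtistCredit` is a hypothetical helper, not a function from Harmony's code.

```typescript
interface ArtistCreditName {
  name: string;
  creditedName?: string;
  joinPhrase?: string;
  mbid?: string;
}

// Hypothetical helper: concatenate displayed names and their join phrases
function renderArtistCredit(credits: ArtistCreditName[]): string {
  return credits
    .map((c) => (c.creditedName ?? c.name) + (c.joinPhrase ?? ''))
    .join('');
}

const credit = renderArtistCredit([
  { name: 'Artist A', joinPhrase: ' & ' },
  { name: 'Artist B', joinPhrase: ' feat. ' },
  { name: 'Artist C', creditedName: 'Artist C (DJ Set)' },
]);
console.log(credit); // Artist A & Artist B feat. Artist C (DJ Set)
```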

#### HarmonyMedium

```typescript
interface HarmonyMedium {
  title?: string; // Medium title (e.g., "Disc 1: The Album")
  format?: MediumFormat;
  position: number; // 1-indexed
  tracks: HarmonyTrack[];
}

enum MediumFormat {
  CD = 'CD',
  Vinyl = 'Vinyl',
  Digital = 'Digital Media',
  Cassette = 'Cassette',
  DVD = 'DVD',
  BluRay = 'Blu-ray',
  Other = 'Other'
}
```

#### HarmonyTrack

```typescript
interface HarmonyTrack {
  title: string;
  artists?: ArtistCreditName[]; // Track-specific artists (overrides release artists)
  position: number; // 1-indexed within medium
  length?: number; // Duration in milliseconds
  isrc?: string; // International Standard Recording Code
}
```

**Example**:
```typescript
{
  title: "Track Title",
  artists: [{ name: "Track Artist" }],
  position: 1,
  length: 245000, // 4:05
  isrc: "USRC17607839"
}
```
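
Because `length` is stored in milliseconds, a display form such as the `4:05` in the comment above must be derived from it. The sketch below uses a hypothetical `formatTrackLength` helper that rounds to whole seconds and does not roll minutes over into hours.

```typescript
// Hypothetical helper: milliseconds -> "m:ss"
function formatTrackLength(ms: number): string {
  const totalSeconds = Math.round(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds.toString().padStart(2, '0')}`;
}

console.log(formatTrackLength(245000)); // 4:05
console.log(formatTrackLength(59600)); // 1:00 (rounded up from 59.6 s)
```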

#### Label

```typescript
interface Label {
  name: string;
  catalogNumber?: string;
  mbid?: string; // MusicBrainz label ID
}
```

**Example**:
```typescript
[
  { name: "Record Label", catalogNumber: "RL-12345" },
  { name: "Distributor", catalogNumber: "DIST-67890" }
]
```

#### Image

```typescript
interface Image {
  url: string;
  types: ImageType[];
  width?: number;
  height?: number;
  comment?: string;
}

enum ImageType {
  Front = 'front',
  Back = 'back',
  Medium = 'medium',
  Tray = 'tray',
  Booklet = 'booklet',
  Obi = 'obi',
  Spine = 'spine',
  Track = 'track',
  Liner = 'liner',
  Sticker = 'sticker',
  Poster = 'poster',
  Watermark = 'watermark',
  Raw = 'raw',
  Unedited = 'unedited'
}
```

**Example**:
```typescript
[
  {
    url: "https://i.scdn.co/image/ab67616d0000b273...",
    types: [ImageType.Front],
    width: 2000,
    height: 2000
  },
  {
    url: "https://e-cdn-images.dzcdn.net/images/cover/...",
    types: [ImageType.Front],
    width: 1400,
    height: 1400,
    comment: "Deezer cover"
  }
]
```

#### ExternalLink

```typescript
interface ExternalLink {
  url: string;
  types: LinkType[];
}

enum LinkType {
  Streaming = 'streaming',
  Purchase = 'purchase',
  Download = 'download',
  License = 'license',
  Crowdfunding = 'crowdfunding',
  Other = 'other'
}
```

**Example**:
```typescript
[
  {
    url: "https://open.spotify.com/album/xyz",
    types: [LinkType.Streaming]
  },
  {
    url: "https://artist.bandcamp.com/album/xyz",
    types: [LinkType.Streaming, LinkType.Purchase]
  }
]
```

#### ReleaseInfo

```typescript
interface ReleaseInfo {
  providers: string[]; // Provider names that contributed data
  messages: Message[]; // Warnings, errors, info messages
  sourceMap?: SourceMap; // Property -> provider mapping (only in MergedHarmonyRelease)
  incompatibleData?: IncompatibilityInfo; // Conflicts (only in MergedHarmonyRelease)
}

interface Message {
  level: 'error' | 'warning' | 'info';
  text: string;
  provider?: string;
}
```

**Example**:
```typescript
{
  providers: ["spotify", "deezer", "itunes"],
  messages: [
    {
      level: "warning",
      text: "Release date conflict: Spotify (2014-11-24) vs iTunes (2014-11-25)",
      provider: "itunes"
    },
    {
      level: "info",
      text: "Using Spotify value (higher preference)"
    }
  ]
}
```

### Enumerations

#### ReleaseStatus

```typescript
enum ReleaseStatus {
  Official = 'official',
  Promotion = 'promotion',
  Bootleg = 'bootleg',
  PseudoRelease = 'pseudo-release'
}
```

#### ReleaseType

```typescript
enum ReleaseType {
  // Primary types
  Album = 'album',
  Single = 'single',
  EP = 'ep',
  Broadcast = 'broadcast',
  Other = 'other',

  // Secondary types
  Compilation = 'compilation',
  Soundtrack = 'soundtrack',
  Spokenword = 'spokenword',
  Interview = 'interview',
  Audiobook = 'audiobook',
  AudioDrama = 'audio drama',
  Live = 'live',
  Remix = 'remix',
  DJMix = 'dj-mix',
  Mixtape = 'mixtape',
  Demo = 'demo',
  FieldRecording = 'field recording'
}
```

**Usage**: Array of types (primary + secondary)
```typescript
types: [ReleaseType.Album, ReleaseType.Live] // Live album
types: [ReleaseType.EP, ReleaseType.Remix] // Remix EP
```
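
The MusicBrainz conversion later in this document calls an `isPrimaryType` helper to split this array back into one primary type and a list of secondary types, but its implementation is not shown. The sketch below is a minimal version that assumes the five primary values listed in the enum above. `splitReleaseTypes` is a hypothetical name, and it falls back to `'album'` just as the conversion code does.

```typescript
// String values of the primary ReleaseType members listed above
const PRIMARY_TYPES = new Set(['album', 'single', 'ep', 'broadcast', 'other']);

function isPrimaryType(type: string): boolean {
  return PRIMARY_TYPES.has(type);
}

// Hypothetical helper: first primary type (default 'album') plus all secondary types
function splitReleaseTypes(types: string[]): { primary: string; secondary: string[] } {
  return {
    primary: types.find(isPrimaryType) ?? 'album',
    secondary: types.filter((t) => !isPrimaryType(t)),
  };
}

console.log(JSON.stringify(splitReleaseTypes(['ep', 'remix']))); // {"primary":"ep","secondary":["remix"]}
```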

#### PackagingType

```typescript
enum PackagingType {
  JewelCase = 'jewel case',
  SlimJewelCase = 'slim jewel case',
  Digipak = 'digipak',
  Cardboard = 'cardboard/paper sleeve',
  KeepCase = 'keep case',
  None = 'none',
  Other = 'other'
}
```

#### PartialDate

```typescript
interface PartialDate {
  year: number;
  month?: number; // 1-12
  day?: number; // 1-31
}
```

**Examples**:
```typescript
{ year: 2014 } // Year only
{ year: 2014, month: 11 } // Year and month
{ year: 2014, month: 11, day: 24 } // Full date
```

**Serialization**:
```typescript
function serializePartialDate(date: PartialDate): string {
  let result = date.year.toString();
  if (date.month) {
    result += `-${date.month.toString().padStart(2, '0')}`;
    if (date.day) {
      result += `-${date.day.toString().padStart(2, '0')}`;
    }
  }
  return result;
}

// Examples:
// { year: 2014 } -> "2014"
// { year: 2014, month: 11 } -> "2014-11"
// { year: 2014, month: 11, day: 24 } -> "2014-11-24"
```

## MergedHarmonyRelease

Extends `HarmonyRelease` with merge metadata.

```typescript
interface MergedHarmonyRelease extends HarmonyRelease {
  info: ReleaseInfo & {
    sourceMap: SourceMap;
    incompatibleData?: IncompatibilityInfo;
  };
}

interface SourceMap {
  [propertyPath: string]: string; // Property path -> provider name
}

interface IncompatibilityInfo {
  conflicts: Conflict[];
  warnings: string[];
}

interface Conflict {
  property: string;
  values: ConflictValue[];
}

interface ConflictValue {
  provider: string;
  value: any;
}
```

**Example**:
```typescript
{
  title: "Album Title",
  releaseDate: { year: 2014, month: 11, day: 24 },
  // ... other fields
  info: {
    providers: ["spotify", "deezer", "itunes"],
    sourceMap: {
      "title": "spotify",
      "releaseDate": "spotify",
      "gtin": "deezer",
      "media[0].tracks[0].isrc": "spotify"
    },
    incompatibleData: {
      conflicts: [
        {
          property: "releaseDate",
          values: [
            { provider: "spotify", value: { year: 2014, month: 11, day: 24 } },
            { provider: "itunes", value: { year: 2014, month: 11, day: 25 } }
          ]
        }
      ],
      warnings: [
        "Release date conflict resolved using Spotify value (higher preference)"
      ]
    },
    messages: []
  }
}
```
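
The `sourceMap` and `conflicts` entries in this example can be produced one property at a time by a preference-ordered pick. The sketch below is a simplified illustration, not Harmony's merge algorithm. A hypothetical `mergeProperty` helper takes each provider's value for one property, keeps the value from the most preferred provider that has one, and records a conflict when providers disagree. Values are compared via `JSON.stringify`, which assumes a consistent key order.

```typescript
interface ConflictValue {
  provider: string;
  value: unknown;
}

interface MergeResult<T> {
  value?: T;
  source?: string; // provider name to record in sourceMap
  conflict?: ConflictValue[]; // all candidate values when providers disagree
}

// Hypothetical helper: pick one property's value by provider preference
function mergeProperty<T>(candidates: Record<string, T | undefined>, preference: string[]): MergeResult<T> {
  const present = preference
    .filter((p) => candidates[p] !== undefined)
    .map((p) => ({ provider: p, value: candidates[p] as T }));
  if (present.length === 0) return {};

  const distinct = new Set(present.map((c) => JSON.stringify(c.value)));
  return {
    value: present[0].value,
    source: present[0].provider,
    conflict: distinct.size > 1 ? present : undefined,
  };
}

const merged = mergeProperty(
  {
    spotify: { year: 2014, month: 11, day: 24 },
    itunes: { year: 2014, month: 11, day: 25 },
  },
  ['spotify', 'deezer', 'itunes'],
);
console.log(merged.source, merged.conflict?.length); // spotify 2
```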

## Data Transformations

### Provider-Specific to HarmonyRelease

Each provider implements a `harmonize()` method that maps its API response onto `HarmonyRelease`:

```typescript
// Spotify example (conceptual; parseDate and inferTypes are helpers not shown here)
class SpotifyProvider {
  harmonize(spotifyAlbum: SpotifyAlbum): HarmonyRelease {
    return {
      title: spotifyAlbum.name,
      artists: spotifyAlbum.artists.map((a) => ({
        name: a.name,
        mbid: undefined, // Spotify doesn't provide MBIDs
      })),
      gtin: spotifyAlbum.external_ids?.upc,
      // Simplification: all tracks go onto one digital medium (multi-disc albums
      // need grouping by disc_number; albums with many tracks need further pages)
      media: [{
        format: MediumFormat.Digital,
        position: 1,
        tracks: spotifyAlbum.tracks.items.map((t, i) => ({
          title: t.name,
          position: i + 1,
          length: t.duration_ms,
          // The album endpoint returns simplified track objects without external_ids,
          // so ISRCs are only available after fetching the full track objects
          isrc: t.external_ids?.isrc,
        })),
      }],
      releaseDate: this.parseDate(spotifyAlbum.release_date),
      types: this.inferTypes(spotifyAlbum.album_type),
      images: spotifyAlbum.images.map((img) => ({
        url: img.url,
        types: [ImageType.Front],
        width: img.width,
        height: img.height,
      })),
      externalLinks: [{
        url: spotifyAlbum.external_urls.spotify,
        types: [LinkType.Streaming],
      }],
      labels: spotifyAlbum.label ? [{ name: spotifyAlbum.label }] : [],
      copyright: spotifyAlbum.copyrights?.[0]?.text,
      availableIn: spotifyAlbum.available_markets,
      info: {
        providers: ['spotify'],
        messages: [],
      },
    };
  }
}
```
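
The `parseDate` helper called above is not shown. The sketch below is a minimal version that assumes the provider delivers dates as `YYYY`, `YYYY-MM`, or `YYYY-MM-DD` strings, which are the shapes Spotify documents for `release_date`. The standalone name `parsePartialDate` is hypothetical; the function is the inverse of `serializePartialDate`.

```typescript
interface PartialDate {
  year: number;
  month?: number; // 1-12
  day?: number; // 1-31
}

// Hypothetical helper: "2014" | "2014-11" | "2014-11-24" -> PartialDate
function parsePartialDate(value: string | undefined): PartialDate | undefined {
  const match = value?.match(/^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/);
  if (!match) return undefined;
  const [, year, month, day] = match;
  return {
    year: Number(year),
    ...(month ? { month: Number(month) } : {}),
    ...(day ? { day: Number(day) } : {}),
  };
}

console.log(JSON.stringify(parsePartialDate('2014-11'))); // {"year":2014,"month":11}
console.log(parsePartialDate('Nov 2014')); // undefined
```

Unrecognized formats return `undefined` rather than throwing, so a provider with an odd date simply contributes no `releaseDate` to the merge.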

### HarmonyRelease to MusicBrainz Format

**Location**: `musicbrainz/seeding.ts`

```typescript
interface MusicBrainzRelease {
  name: string;
  artist_credit: MBArtistCredit[];
  barcode?: string;
  release_events: MBReleaseEvent[];
  labels: MBLabel[];
  mediums: MBMedium[];
  release_group: {
    primary_type: string;
    secondary_types: string[];
  };
  language?: string;
  script?: string;
  packaging?: string;
  annotation?: string;
}

// convertReleaseEvents, isPrimaryType and buildAnnotation are helpers not shown here
function convertToMusicBrainz(release: MergedHarmonyRelease): MusicBrainzRelease {
  return {
    name: release.title,
    artist_credit: release.artists.map((a) => ({
      name: a.name,
      credited_name: a.creditedName,
      join_phrase: a.joinPhrase || '',
      mbid: a.mbid,
    })),
    barcode: release.gtin,
    release_events: convertReleaseEvents(release.releaseDate, release.availableIn),
    labels: release.labels.map((l) => ({
      name: l.name,
      catalog_number: l.catalogNumber,
      mbid: l.mbid,
    })),
    mediums: release.media.map((m) => ({
      format: m.format,
      position: m.position,
      title: m.title,
      tracks: m.tracks.map((t) => ({
        title: t.title,
        position: t.position,
        length: t.length,
        isrc: t.isrc,
        artist_credit: t.artists?.map((a) => ({
          name: a.name,
          join_phrase: a.joinPhrase || '',
        })),
      })),
    })),
    release_group: {
      primary_type: release.types.find((t) => isPrimaryType(t)) || 'album',
      secondary_types: release.types.filter((t) => !isPrimaryType(t)),
    },
    language: release.language,
    script: release.script,
    packaging: release.packaging,
    annotation: buildAnnotation(release),
  };
}
```
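
The code above does not show `convertReleaseEvents`. One plausible sketch follows, assuming one release event per country in `availableIn` (all sharing the release date) and a single country-less event when no countries are known. The `MBReleaseEvent` shape here is hypothetical and may differ from the real seeding format.

```typescript
interface PartialDate {
  year: number;
  month?: number;
  day?: number;
}

// Hypothetical shape of a seeded release event
interface MBReleaseEvent {
  date?: PartialDate;
  country?: string; // ISO 3166-1 alpha-2
}

function convertReleaseEvents(date?: PartialDate, countries?: string[]): MBReleaseEvent[] {
  if (countries && countries.length > 0) {
    return countries.map((country) => ({ date, country }));
  }
  // No distribution data: keep the date (if any) without a country
  return date ? [{ date }] : [];
}

const events = convertReleaseEvents({ year: 2014, month: 11, day: 24 }, ['GB', 'US']);
console.log(events.map((e) => e.country).join(',')); // GB,US
```

Streaming releases are often available in most markets, so a real implementation might collapse long country lists instead of emitting one event per country. This sketch does not attempt that.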

## Data Validation

### GTIN Validation

```typescript
function validateGTIN(gtin: string): boolean {
  // GTIN-8, GTIN-12 (UPC-A, e.g. Spotify's `upc`), GTIN-13 (EAN-13) and GTIN-14
  if (!/^(\d{8}|\d{12,14})$/.test(gtin)) return false;

  // Check digit validation: weights alternate 3, 1, 3, ... starting with
  // the digit immediately to the left of the check digit
  const digits = gtin.split('').map(Number);
  const checkDigit = digits.pop()!;
  const checksum = digits.reverse().reduce((sum, digit, i) => {
    return sum + digit * (i % 2 === 0 ? 3 : 1);
  }, 0);

  return (10 - (checksum % 10)) % 10 === checkDigit;
}
```

### ISRC Validation

```typescript
function validateISRC(isrc: string): boolean {
  // Format: CC-XXX-YY-NNNNN (expects uppercase; normalize user input first)
  // CC: Country code (2 letters)
  // XXX: Registrant code (3 alphanumeric)
  // YY: Year (2 digits)
  // NNNNN: Designation code (5 digits)
  return /^[A-Z]{2}-?[A-Z0-9]{3}-?\d{2}-?\d{5}$/.test(isrc);
}

function normalizeISRC(isrc: string): string {
  // Remove hyphens and uppercase, e.g. "us-rc1-76-07839" -> "USRC17607839"
  return isrc.replace(/-/g, '').toUpperCase();
}
```

### Date Validation

```typescript
function validatePartialDate(date: PartialDate): boolean {
  const { year, month, day } = date;
  if (!Number.isInteger(year) || year < 1000 || year > 9999) return false;
  // Compare against undefined so that 0 is rejected instead of treated as "missing"
  if (month !== undefined && (!Number.isInteger(month) || month < 1 || month > 12)) return false;
  if (day !== undefined && (!Number.isInteger(day) || day < 1 || day > 31)) return false;

  // Validate day for specific month (day 0 of the next month is the last day of this one)
  if (month !== undefined && day !== undefined) {
    const daysInMonth = new Date(year, month, 0).getDate();
    if (day > daysInMonth) return false;
  }

  return true;
}
```

## Data Size Estimates

### Typical HarmonyRelease Size

**Single-disc album** (12 tracks):
- JSON serialized: ~15-25 KB
- With images: ~20-30 KB (image URLs only, not image data)

**Multi-disc compilation** (50 tracks):
- JSON serialized: ~50-80 KB

### Cache Size Estimates

**Provider response sizes**:
- Spotify album: ~10-20 KB
- Deezer album: ~15-25 KB
- iTunes album: ~20-30 KB
- Bandcamp page: ~50-100 KB (HTML)

**Daily cache growth** (100 lookups/day):
- Database: ~50 KB (metadata only)
- Files: ~2-5 MB (response bodies)

**Annual cache size** (36,500 lookups/year, i.e. 365 × the daily figures):
- Database: ~18 MB
- Files: ~730 MB - 1.8 GB

## No Migrations

Harmony has no application database, so there are no schema migrations. The only on-disk schema belongs to `snap_storage`. Its snapshots hold raw provider responses rather than serialized `HarmonyRelease` objects, and every lookup runs `harmonize()` and the merge on those responses again.

**Schema evolution strategy**:
1. Add new optional fields to the `HarmonyRelease` interface
2. Update provider `harmonize()` methods to populate the new fields
3. Update the merge algorithm to handle the new fields
4. No data migration required (old cached responses remain valid because the new code re-harmonizes them)
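
**Breaking changes** (renaming or removing fields in `HarmonyRelease`):
1. Rename or remove the fields and update the code that reads them
2. Cached raw responses stay usable, since the cache does not depend on the `HarmonyRelease` shape
3. Clear the cache (delete `snaps.db` and `snaps/`) only if the snapshot format itself changes; the cache is then rebuilt on later lookups, but existing permalinks stop working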

## Summary

Harmony's data architecture demonstrates:

1. **Cache-first design**: `snap_storage` removes the need for an application database
2. **Permalink system**: Timestamp-based cache replay enables reproducibility
3. **Rich data model**: The `HarmonyRelease` schema (in the 273-line `harmonizer/types.ts`) covers credits, media, identifiers, labels, images, links, and provenance
4. **Type safety**: TypeScript interfaces keep provider mappings and the merged model consistent at compile time
5. **No migrations**: The schema evolves without data migrations, because the cache stores raw provider responses
6. **Stateless processing**: All transformations run in memory; the response cache is the only persistent state
7. **MBID caching**: Batch lookups reduce the number of MusicBrainz API calls

This architecture suits read-heavy, stateless applications where reproducibility and API compliance are priorities.