# Harmony - Data Model and Storage Analysis

## Storage Philosophy

Harmony employs a **cache-first** architecture with no application database:

- **No traditional database**: No PostgreSQL, MySQL, MongoDB, etc. The only database file is the SQLite index that `snap_storage` keeps for cached responses.
- **No persistent user data**: No accounts, no saved searches, no user-generated content
- **Cache as storage**: HTTP response caching via the `snap_storage` library
- **In-memory processing**: All data transformations happen in memory
- **Stateless design**: Each request is independent

This approach prioritizes:

- **Simplicity**: No database migrations, no schema evolution
- **Reproducibility**: Permalink system enables exact result replay
- **API compliance**: Caching reduces the number of calls made to provider APIs
- **Deployment ease**: No database server required

## Persistence Layer: snap_storage

### Overview

`snap_storage` is a Deno library for HTTP response caching with a SQLite backend.

**Repository**: https://github.com/kellnerd/snap-storage (same author as Harmony)

**Purpose**: Store HTTP responses with timestamps for later retrieval

### Storage Structure

#### SQLite Database: `snaps.db`

**Location**: `${HARMONY_DATA_DIR}/snaps.db` (default: `./snaps.db`)

**Schema** (conceptual):

```sql
CREATE TABLE snaps (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  key TEXT NOT NULL,
  url TEXT NOT NULL,
  timestamp INTEGER NOT NULL,
  status INTEGER NOT NULL,
  headers TEXT NOT NULL,
  body_path TEXT NOT NULL,
  created_at INTEGER NOT NULL,
  -- A URL may be snapshotted at several timestamps (needed for permalinks),
  -- so uniqueness applies to the (key, timestamp) pair rather than to key alone
  UNIQUE (key, timestamp)
);

CREATE INDEX idx_snaps_key ON snaps(key);
CREATE INDEX idx_snaps_timestamp ON snaps(timestamp);
CREATE INDEX idx_snaps_url ON snaps(url);
```

**Fields**:

- `key`: Cache key (hash of the normalized URL, including its query parameters)
- `url`: Original request URL
- `timestamp`: Unix timestamp of the request (matched exactly in permalink lookups)
- `status`: HTTP status code
- `headers`: JSON-encoded response headers
- `body_path`: Path to the response body file in the `snaps/` directory
- `created_at`: Record creation time (milliseconds since the Unix epoch, from `Date.now()`)

#### File Directory: `snaps/`

**Location**: `${HARMONY_DATA_DIR}/snaps/` (default: `./snaps/`)

**Structure**:

```
snaps/
├── 0a/
│   ├── 0a1b2c3d4e5f6a7b8c9d.json
│   └── 0a9f8e7d6c5b4a3f2e1d.json
├── 1b/
│   └── 1b2c3d4e5f6a7b8c9d0a.json
└── ...
```

**File naming**: First 2 characters of the hash as directory, full hash as filename (the hashes in the tree above are shortened for readability)

**File content**: Raw HTTP response body (JSON, HTML, XML, etc.)

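
The file naming rule can be sketched as a small helper. This is an illustration only, not `snap_storage`'s actual code: the name `snapBodyPath`, the `baseDir` parameter, and the fixed `.json` extension are assumptions based on the conceptual examples in this document.

```typescript
// Hypothetical helper: derive the body file path for a cache key.
// Assumes keys are lowercase hex strings (for example SHA-256 digests).
function snapBodyPath(key: string, baseDir = "snaps"): string {
  if (!/^[0-9a-f]{3,}$/.test(key)) {
    throw new Error(`Not a hex cache key: ${key}`);
  }
  // The first two characters of the key select the subdirectory.
  const shard = key.slice(0, 2);
  return `${baseDir}/${shard}/${key}.json`;
}

console.log(snapBodyPath("0a1b2c3d4e5f"));
// prints "snaps/0a/0a1b2c3d4e5f.json"
```

Splitting files across two-character prefixes presumably keeps any single directory from holding every cached body.
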
### Cache Operations

#### Store Response

```typescript
interface CacheEntry {
  url: string;
  timestamp: number;
  response: Response;
}

// `db` is assumed to be an open SQLite handle (not shown here)
async function storeResponse(entry: CacheEntry): Promise<void> {
  const key = await hashUrl(entry.url);
  const dir = `snaps/${key.slice(0, 2)}`;
  const bodyPath = `${dir}/${key}.json`;

  // Store body to file (create the subdirectory first if it does not exist)
  await Deno.mkdir(dir, { recursive: true });
  await Deno.writeTextFile(bodyPath, await entry.response.text());

  // Store metadata to database
  await db.execute(`
    INSERT INTO snaps (key, url, timestamp, status, headers, body_path, created_at)
    VALUES (?, ?, ?, ?, ?, ?, ?)
  `, [
    key,
    entry.url,
    entry.timestamp,
    entry.response.status,
    JSON.stringify(Object.fromEntries(entry.response.headers)),
    bodyPath,
    Date.now()
  ]);
}
```

#### Retrieve Response

```typescript
async function getResponse(url: string, timestamp?: number): Promise<Response | null> {
  const key = await hashUrl(url);

  let query = `SELECT * FROM snaps WHERE key = ?`;
  // The parameter list mixes the string key with numeric timestamps
  const params: (string | number)[] = [key];

  if (timestamp !== undefined) {
    // Permalink mode: exact timestamp match
    query += ` AND timestamp = ?`;
    params.push(timestamp);
  } else {
    // Normal mode: most recent within cache duration
    const maxAge = 24 * 60 * 60 * 1000; // 24 hours
    query += ` AND created_at > ? ORDER BY created_at DESC LIMIT 1`;
    params.push(Date.now() - maxAge);
  }

  const row = await db.queryOne(query, params);
  if (!row) return null;

  // Read body from file
  const body = await Deno.readTextFile(row.body_path);

  // Reconstruct Response object
  return new Response(body, {
    status: row.status,
    headers: JSON.parse(row.headers)
  });
}
```

### Cache Policy

#### Default Policy

- **Duration**: 24 hours (older entries are skipped on lookup but remain on disk)
- **Eviction**: No automatic eviction (manual cleanup required)
- **Size limit**: No enforced limit (grows indefinitely)

#### Permalink Policy

- **Duration**: Indefinite (never evicted automatically)
- **Purpose**: Enable reproducible results
- **Lookup**: Exact timestamp match

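
The two policies can be contrasted with a small in-memory sketch. It is not the library's code: `SnapRow`, `selectSnap`, and the millisecond unit for both time fields are assumptions made for illustration, with field names borrowed from the conceptual schema.

```typescript
// Minimal in-memory model of a snapshot row (assumed shape).
interface SnapRow {
  key: string;
  timestamp: number; // request time, ms since epoch (assumed unit)
  createdAt: number; // record creation time, ms since epoch
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical selection logic mirroring the two policies:
// - permalink mode: exact timestamp match, regardless of age
// - default mode: newest snapshot created within maxAge
function selectSnap(
  rows: SnapRow[],
  key: string,
  now: number,
  permalinkTimestamp?: number,
  maxAge: number = DAY_MS,
): SnapRow | null {
  const candidates = rows.filter((r) => r.key === key);
  if (permalinkTimestamp !== undefined) {
    return candidates.find((r) => r.timestamp === permalinkTimestamp) ?? null;
  }
  const fresh = candidates
    .filter((r) => now - r.createdAt < maxAge)
    .sort((a, b) => b.createdAt - a.createdAt);
  return fresh[0] ?? null;
}

const demoNow = 10 * DAY_MS;
const demoRows: SnapRow[] = [
  { key: "k", timestamp: 1 * DAY_MS, createdAt: 1 * DAY_MS },
  { key: "k", timestamp: 9.5 * DAY_MS, createdAt: 9.5 * DAY_MS },
];

// Default mode only sees the snapshot from the last 24 hours.
console.log(selectSnap(demoRows, "k", demoNow)?.timestamp === 9.5 * DAY_MS); // prints true
// Permalink mode can still reach the nine-day-old snapshot.
console.log(selectSnap(demoRows, "k", demoNow, 1 * DAY_MS)?.timestamp === 1 * DAY_MS); // prints true
```
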
#### Cache Key Generation

```typescript
// Declared async because crypto.subtle.digest returns a Promise
async function hashUrl(url: string): Promise<string> {
  // Normalize URL
  const normalized = new URL(url);
  normalized.searchParams.sort(); // Consistent parameter order

  // Hash normalized URL
  const encoder = new TextEncoder();
  const data = encoder.encode(normalized.toString());
  const hashBuffer = await crypto.subtle.digest('SHA-256', data);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  // Hex-encode the digest (64 characters for SHA-256)
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}
```

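
To see why the query parameters are sorted before hashing, the normalization step can be run on its own. The helper name `normalizeUrl` and the example URLs are made up for this sketch; hashing is omitted.

```typescript
// Sketch of the normalization step only: two URLs that differ merely in
// parameter order should normalize to the same string, and so to the same key.
function normalizeUrl(url: string): string {
  const normalized = new URL(url);
  normalized.searchParams.sort(); // sorts parameters by name
  return normalized.toString();
}

const urlA = normalizeUrl("https://api.example.com/albums?market=US&id=42");
const urlB = normalizeUrl("https://api.example.com/albums?id=42&market=US");

console.log(urlA === urlB); // prints true
console.log(urlA); // prints "https://api.example.com/albums?id=42&market=US"
```
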
### Cache Management

#### Manual Cleanup

There is no automatic cleanup; old cache entries must be deleted manually. Deleting by age also removes snapshots that existing permalinks refer to.

```bash
# Delete cache entries older than 30 days
# (created_at is in milliseconds, hence the appended 000; `date -d` requires GNU date)
sqlite3 snaps.db "DELETE FROM snaps WHERE created_at < $(date -d '30 days ago' +%s)000"

# Delete body files not modified in the last 30 days
# (approximates removing the files orphaned by the DELETE above)
find snaps/ -type f -mtime +30 -delete
```

#### Cache Statistics

```bash
# Total cache entries
sqlite3 snaps.db "SELECT COUNT(*) FROM snaps"

# Cache size
du -sh snaps/

# Snapshots per URL (group by host instead to get per-provider counts)
sqlite3 snaps.db "SELECT url, COUNT(*) FROM snaps GROUP BY url"
```

## MBID Cache

### Purpose

Cache MusicBrainz ID (MBID) mappings for external URLs to avoid repeated API calls.

### Storage Location

- **Development**: `localStorage` (persistent across sessions)
- **Production**: `sessionStorage` (cleared on browser close)

**Rationale**: Development benefits from persistent cache, production prioritizes fresh data.

### Cache Structure

```typescript
interface MBIDCache {
  [externalUrl: string]: MBIDCacheEntry;
}

interface MBIDCacheEntry {
  mbid: string;
  type: 'release' | 'release-group' | 'recording' | 'artist' | 'label';
  cached: number; // Unix timestamp
}
```

### Cache Operations

#### Store MBID Mapping

```typescript
// Assumption: DENO_DEPLOYMENT_ID is only set in production deployments
// and is read here from the environment variable of the same name
const DENO_DEPLOYMENT_ID = Deno.env.get('DENO_DEPLOYMENT_ID');

// `type` uses the entry's union type so that only known entity types are stored
function cacheMBID(url: string, mbid: string, type: MBIDCacheEntry['type']): void {
  const cache = getMBIDCache();
  cache[url] = {
    mbid,
    type,
    cached: Date.now()
  };
  setMBIDCache(cache);
}

function getMBIDCache(): MBIDCache {
  const storage = DENO_DEPLOYMENT_ID ? sessionStorage : localStorage;
  const cached = storage.getItem('harmony_mbid_cache');
  return cached ? JSON.parse(cached) : {};
}

function setMBIDCache(cache: MBIDCache): void {
  const storage = DENO_DEPLOYMENT_ID ? sessionStorage : localStorage;
  storage.setItem('harmony_mbid_cache', JSON.stringify(cache));
}
```

#### Retrieve MBID Mapping

```typescript
function getCachedMBID(url: string): MBIDCacheEntry | null {
  const cache = getMBIDCache();
  const entry = cache[url];

  if (!entry) return null;

  // Check if cache is stale (24 hours)
  const maxAge = 24 * 60 * 60 * 1000;
  if (Date.now() - entry.cached > maxAge) {
    delete cache[url];
    setMBIDCache(cache);
    return null;
  }

  return entry;
}
```

#### Batch MBID Lookup

The MusicBrainz API supports batch URL lookup (up to 100 URLs per request):

```typescript
async function resolveMBIDs(urls: string[]): Promise<Map<string, MBIDCacheEntry>> {
  const results = new Map<string, MBIDCacheEntry>();

  // Check cache first
  const uncached: string[] = [];
  for (const url of urls) {
    const cached = getCachedMBID(url);
    if (cached) {
      results.set(url, cached);
    } else {
      uncached.push(url);
    }
  }

  // Batch lookup uncached URLs (100 at a time)
  for (let i = 0; i < uncached.length; i += 100) {
    const batch = uncached.slice(i, i + 100);
    const params = batch.map(url => `resource=${encodeURIComponent(url)}`).join('&');
    // fmt=json requests JSON instead of the default XML;
    // inc=release-rels includes the releases linked to each URL
    const response = await fetch(
      `https://musicbrainz.org/ws/2/url?${params}&inc=release-rels&fmt=json`
    );
    const data = await response.json();

    // Parse response and cache results (only release relationships are handled here)
    for (const urlData of data.urls ?? []) {
      const mbid = urlData.relations?.[0]?.release?.id;
      if (mbid) {
        const type = 'release' as const;
        cacheMBID(urlData.resource, mbid, type);
        results.set(urlData.resource, { mbid, type, cached: Date.now() });
      }
    }
  }

  return results;
}
```

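
The batching step on its own is a plain chunking loop. The generic helper below is a sketch with a made-up name (`chunk`) and placeholder URLs; it performs no network requests.

```typescript
// Hypothetical helper: split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 250 placeholder URLs end up in batches of 100, 100 and 50.
const placeholderUrls = Array.from({ length: 250 }, (_, i) => `https://example.com/album/${i}`);
console.log(chunk(placeholderUrls, 100).map((b) => b.length)); // prints [ 100, 100, 50 ]
```
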
## Core Data Model: HarmonyRelease

### Schema Definition

**Location**: `harmonizer/types.ts` (273 lines)

**Full Interface**:

```typescript
interface HarmonyRelease {
  // ===== Basic Metadata =====
  title: string;
  artists: ArtistCreditName[];
  gtin?: string; // Global Trade Item Number (barcode)

  // ===== Media and Tracks =====
  media: HarmonyMedium[];

  // ===== Release Details =====
  language?: string; // ISO 639-3 code
  script?: string; // ISO 15924 code
  status?: ReleaseStatus;
  types: ReleaseType[];
  releaseDate?: PartialDate;

  // ===== Commercial Information =====
  labels: Label[];
  packaging?: PackagingType;
  copyright?: string;

  // ===== Distribution =====
  availableIn?: string[]; // ISO 3166-1 alpha-2 country codes
  excludedFrom?: string[]; // ISO 3166-1 alpha-2 country codes

  // ===== Visual Assets =====
  images: Image[];

  // ===== External Links =====
  externalLinks: ExternalLink[];

  // ===== Metadata About Metadata =====
  info: ReleaseInfo;
}
```

### Sub-Structures

#### ArtistCreditName

```typescript
interface ArtistCreditName {
  name: string; // Artist name
  creditedName?: string; // Name as credited, if it differs (e.g., "Artist C (DJ Set)")
  joinPhrase?: string; // Separator (e.g., " & ", " feat. ", " vs. ")
  mbid?: string; // MusicBrainz artist ID
}
```

**Example**:

```typescript
[
  { name: "Artist A", joinPhrase: " & " },
  { name: "Artist B", joinPhrase: " feat. " },
  { name: "Artist C", creditedName: "Artist C (DJ Set)" }
]
```

**Rendering**: "Artist A & Artist B feat. Artist C (DJ Set)" (each entry contributes its credited name, or its name if none is set, followed by its join phrase)

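
The rendering rule can be written as a short function. This is a sketch inferred from the example above rather than Harmony's own implementation; `renderArtistCredit` is a made-up name, and the interface is repeated so the block is self-contained.

```typescript
// Local copy of the interface from this section.
interface ArtistCreditName {
  name: string;
  creditedName?: string;
  joinPhrase?: string;
  mbid?: string;
}

// Hypothetical renderer: the credited name (or the name when no credited
// name is set), followed by the join phrase, concatenated for all entries.
function renderArtistCredit(credits: ArtistCreditName[]): string {
  return credits
    .map((c) => (c.creditedName ?? c.name) + (c.joinPhrase ?? ""))
    .join("");
}

const exampleCredit: ArtistCreditName[] = [
  { name: "Artist A", joinPhrase: " & " },
  { name: "Artist B", joinPhrase: " feat. " },
  { name: "Artist C", creditedName: "Artist C (DJ Set)" },
];

console.log(renderArtistCredit(exampleCredit));
// prints "Artist A & Artist B feat. Artist C (DJ Set)"
```
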
#### HarmonyMedium

```typescript
interface HarmonyMedium {
  title?: string; // Medium title (e.g., "Disc 1: The Album")
  format?: MediumFormat;
  position: number; // 1-indexed
  tracks: HarmonyTrack[];
}

enum MediumFormat {
  CD = 'CD',
  Vinyl = 'Vinyl',
  Digital = 'Digital Media',
  Cassette = 'Cassette',
  DVD = 'DVD',
  BluRay = 'Blu-ray',
  Other = 'Other'
}
```

#### HarmonyTrack

```typescript
interface HarmonyTrack {
  title: string;
  artists?: ArtistCreditName[]; // Track-specific artists (overrides release artists)
  position: number; // 1-indexed within medium
  length?: number; // Duration in milliseconds
  isrc?: string; // International Standard Recording Code
}
```

**Example**:

```typescript
{
  title: "Track Title",
  artists: [{ name: "Track Artist" }],
  position: 1,
  length: 245000, // 4:05
  isrc: "USRC17607839"
}
```

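
Since `length` is stored in milliseconds, displaying it needs a small conversion. The helper below is a sketch with a made-up name (`formatTrackLength`); rounding to the nearest second is an assumption.

```typescript
// Hypothetical helper: format a track length in milliseconds as m:ss.
function formatTrackLength(ms: number): string {
  const totalSeconds = Math.round(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}

console.log(formatTrackLength(245000)); // prints "4:05"
```
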
#### Label

```typescript
interface Label {
  name: string;
  catalogNumber?: string;
  mbid?: string; // MusicBrainz label ID
}
```

**Example**:

```typescript
[
  { name: "Record Label", catalogNumber: "RL-12345" },
  { name: "Distributor", catalogNumber: "DIST-67890" }
]
```

#### Image

```typescript
interface Image {
  url: string;
  types: ImageType[];
  width?: number;
  height?: number;
  comment?: string;
}

enum ImageType {
  Front = 'front',
  Back = 'back',
  Medium = 'medium',
  Tray = 'tray',
  Booklet = 'booklet',
  Obi = 'obi',
  Spine = 'spine',
  Track = 'track',
  Liner = 'liner',
  Sticker = 'sticker',
  Poster = 'poster',
  Watermark = 'watermark',
  Raw = 'raw',
  Unedited = 'unedited'
}
```

**Example**:

```typescript
[
  {
    url: "https://i.scdn.co/image/ab67616d0000b273...",
    types: [ImageType.Front],
    width: 2000,
    height: 2000
  },
  {
    url: "https://e-cdn-images.dzcdn.net/images/cover/...",
    types: [ImageType.Front],
    width: 1400,
    height: 1400,
    comment: "Deezer cover"
  }
]
```

#### ExternalLink

```typescript
interface ExternalLink {
  url: string;
  types: LinkType[];
}

enum LinkType {
  Streaming = 'streaming',
  Purchase = 'purchase',
  Download = 'download',
  License = 'license',
  Crowdfunding = 'crowdfunding',
  Other = 'other'
}
```

**Example**:

```typescript
[
  {
    url: "https://open.spotify.com/album/xyz",
    types: [LinkType.Streaming]
  },
  {
    url: "https://bandcamp.com/album/xyz",
    types: [LinkType.Streaming, LinkType.Purchase]
  }
]
```

#### ReleaseInfo

```typescript
interface ReleaseInfo {
  providers: string[]; // Provider names that contributed data
  messages: Message[]; // Warnings, errors, info messages
  sourceMap?: SourceMap; // Property -> provider mapping (only in MergedHarmonyRelease)
  incompatibleData?: IncompatibilityInfo; // Conflicts (only in MergedHarmonyRelease)
}

interface Message {
  level: 'error' | 'warning' | 'info';
  text: string;
  provider?: string;
}
```

**Example**:

```typescript
{
  providers: ["spotify", "deezer", "itunes"],
  messages: [
    {
      level: "warning",
      text: "Release date conflict: Spotify (2014-11-24) vs iTunes (2014-11-25)",
      provider: "itunes"
    },
    {
      level: "info",
      text: "Using Spotify value (higher preference)"
    }
  ]
}
```

### Enumerations

#### ReleaseStatus

```typescript
enum ReleaseStatus {
  Official = 'official',
  Promotion = 'promotion',
  Bootleg = 'bootleg',
  PseudoRelease = 'pseudo-release'
}
```

#### ReleaseType

```typescript
enum ReleaseType {
  // Primary types
  Album = 'album',
  Single = 'single',
  EP = 'ep',
  Broadcast = 'broadcast',
  Other = 'other',

  // Secondary types
  Compilation = 'compilation',
  Soundtrack = 'soundtrack',
  Spokenword = 'spokenword',
  Interview = 'interview',
  Audiobook = 'audiobook',
  AudioDrama = 'audio drama',
  Live = 'live',
  Remix = 'remix',
  DJMix = 'dj-mix',
  Mixtape = 'mixtape',
  Demo = 'demo',
  FieldRecording = 'field recording'
}
```

**Usage**: Array of types (primary + secondary)

```typescript
types: [ReleaseType.Album, ReleaseType.Live] // Live album
types: [ReleaseType.EP, ReleaseType.Remix] // Remix EP
```

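
Because primary and secondary types share one array, consumers have to separate them again. The sketch below shows one way to do that, assuming the primary types are exactly the five listed under "Primary types" above; `splitTypes` is a made-up helper, the implementation of `isPrimaryType` is a guess, and a string union stands in for the enum to keep the block short.

```typescript
// Subset of the ReleaseType values, as a string union for brevity.
type ReleaseTypeName =
  | "album" | "single" | "ep" | "broadcast" | "other"
  | "compilation" | "soundtrack" | "live" | "remix";

// Assumption: these five values are the primary types.
const PRIMARY_TYPES: ReadonlySet<string> = new Set([
  "album", "single", "ep", "broadcast", "other",
]);

function isPrimaryType(type: ReleaseTypeName): boolean {
  return PRIMARY_TYPES.has(type);
}

// Hypothetical helper: first primary type plus all secondary types.
function splitTypes(types: ReleaseTypeName[]): {
  primary: ReleaseTypeName | undefined;
  secondary: ReleaseTypeName[];
} {
  return {
    primary: types.find(isPrimaryType),
    secondary: types.filter((t) => !isPrimaryType(t)),
  };
}

console.log(splitTypes(["album", "live"]));
// prints { primary: "album", secondary: [ "live" ] }
```
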
#### PackagingType

```typescript
enum PackagingType {
  JewelCase = 'jewel case',
  SlimJewelCase = 'slim jewel case',
  Digipak = 'digipak',
  Cardboard = 'cardboard/paper sleeve',
  KeepCase = 'keep case',
  None = 'none',
  Other = 'other'
}
```

#### PartialDate

```typescript
interface PartialDate {
  year: number;
  month?: number; // 1-12
  day?: number; // 1-31
}
```

**Examples**:

```typescript
{ year: 2014 } // Year only
{ year: 2014, month: 11 } // Year and month
{ year: 2014, month: 11, day: 24 } // Full date
```

**Serialization**:

```typescript
function serializePartialDate(date: PartialDate): string {
  let result = date.year.toString();
  if (date.month) {
    result += `-${date.month.toString().padStart(2, '0')}`;
    if (date.day) {
      result += `-${date.day.toString().padStart(2, '0')}`;
    }
  }
  return result;
}

// Examples:
// { year: 2014 } -> "2014"
// { year: 2014, month: 11 } -> "2014-11"
// { year: 2014, month: 11, day: 24 } -> "2014-11-24"
```

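
The reverse direction, turning a provider's date string into a `PartialDate`, might look like the sketch below. `parsePartialDate` is a made-up name and the accepted formats (`YYYY`, `YYYY-MM`, `YYYY-MM-DD`) are an assumption; it checks the shape only, not whether the date exists in the calendar.

```typescript
// Local copy of the interface from this section.
interface PartialDate {
  year: number;
  month?: number;
  day?: number;
}

// Hypothetical inverse of serializePartialDate.
function parsePartialDate(text: string): PartialDate | null {
  const match = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/.exec(text);
  if (!match) return null;
  const date: PartialDate = { year: Number(match[1]) };
  // Groups 2 and 3 are undefined when the month or day is absent.
  if (match[2] !== undefined) date.month = Number(match[2]);
  if (match[3] !== undefined) date.day = Number(match[3]);
  return date;
}

console.log(JSON.stringify(parsePartialDate("2014-11")));
// prints {"year":2014,"month":11}
```
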
## MergedHarmonyRelease

Extends `HarmonyRelease` with merge metadata.

```typescript
interface MergedHarmonyRelease extends HarmonyRelease {
  info: ReleaseInfo & {
    sourceMap: SourceMap;
    incompatibleData?: IncompatibilityInfo;
  };
}

interface SourceMap {
  [propertyPath: string]: string; // Property path -> provider name
}

interface IncompatibilityInfo {
  conflicts: Conflict[];
  warnings: string[];
}

interface Conflict {
  property: string;
  values: ConflictValue[];
}

interface ConflictValue {
  provider: string;
  value: any;
}
```

**Example**:

```typescript
{
  title: "Album Title",
  releaseDate: { year: 2014, month: 11, day: 24 },
  // ... other fields
  info: {
    providers: ["spotify", "deezer", "itunes"],
    sourceMap: {
      "title": "spotify",
      "releaseDate": "spotify",
      "gtin": "deezer",
      "media[0].tracks[0].isrc": "spotify"
    },
    incompatibleData: {
      conflicts: [
        {
          property: "releaseDate",
          values: [
            { provider: "spotify", value: { year: 2014, month: 11, day: 24 } },
            { provider: "itunes", value: { year: 2014, month: 11, day: 25 } }
          ]
        }
      ],
      warnings: [
        "Release date conflict resolved using Spotify value (higher preference)"
      ]
    },
    messages: []
  }
}
```

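
How a source map and a conflict list could arise from a preference-ordered merge is sketched below. This is not Harmony's merge algorithm: `mergeByPreference` and its types are invented for illustration, it handles top-level properties only (no nested paths such as `media[0].tracks[0].isrc`), and it compares values by their JSON serialization, which is sensitive to key order.

```typescript
interface ProviderData {
  provider: string;
  data: Record<string, unknown>;
}

interface MergeSketch {
  merged: Record<string, unknown>;
  sourceMap: Record<string, string>;
  conflicts: { property: string; values: { provider: string; value: unknown }[] }[];
}

// Hypothetical merge: for each property, take the value from the most
// preferred provider that has one, and record disagreements as conflicts.
function mergeByPreference(inputs: ProviderData[], preference: string[]): MergeSketch {
  // Providers missing from the preference list rank last.
  const rank = (p: string): number => {
    const index = preference.indexOf(p);
    return index === -1 ? preference.length : index;
  };
  const ordered = [...inputs].sort((a, b) => rank(a.provider) - rank(b.provider));

  const keys = new Set<string>();
  for (const input of ordered) {
    for (const key of Object.keys(input.data)) keys.add(key);
  }

  const result: MergeSketch = { merged: {}, sourceMap: {}, conflicts: [] };
  for (const key of keys) {
    const values = ordered
      .filter((input) => input.data[key] !== undefined)
      .map((input) => ({ provider: input.provider, value: input.data[key] }));
    const [preferred] = values;
    if (!preferred) continue;
    result.merged[key] = preferred.value;
    result.sourceMap[key] = preferred.provider;
    const distinct = new Set(values.map((v) => JSON.stringify(v.value)));
    if (distinct.size > 1) result.conflicts.push({ property: key, values });
  }
  return result;
}

const mergeDemo = mergeByPreference(
  [
    { provider: "itunes", data: { releaseDate: { year: 2014, month: 11, day: 25 } } },
    { provider: "deezer", data: { gtin: "4006381333931" } },
    { provider: "spotify", data: { title: "Album Title", releaseDate: { year: 2014, month: 11, day: 24 } } },
  ],
  ["spotify", "deezer", "itunes"],
);

console.log(mergeDemo.sourceMap);
console.log(mergeDemo.conflicts.map((c) => c.property));
```

With the inputs above, `title` and `releaseDate` are attributed to Spotify, `gtin` to Deezer, and `releaseDate` is reported as the only conflict.
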
## Data Transformations

### Provider-Specific to HarmonyRelease

Each provider implements a `harmonize()` method:

```typescript
// Spotify example (conceptual)
// parseDate() and inferTypes() are provider helpers that are not shown here
class SpotifyProvider {
  harmonize(spotifyAlbum: SpotifyAlbum): HarmonyRelease {
    return {
      title: spotifyAlbum.name,
      artists: spotifyAlbum.artists.map(a => ({
        name: a.name,
        mbid: undefined // Spotify doesn't provide MBIDs
      })),
      gtin: spotifyAlbum.external_ids?.upc,
      media: [{
        format: MediumFormat.Digital,
        position: 1,
        tracks: spotifyAlbum.tracks.items.map((t, i) => ({
          title: t.name,
          position: i + 1,
          length: t.duration_ms,
          // Only set when the track object carries external IDs
          isrc: t.external_ids?.isrc
        }))
      }],
      releaseDate: this.parseDate(spotifyAlbum.release_date),
      types: this.inferTypes(spotifyAlbum.album_type),
      images: spotifyAlbum.images.map(img => ({
        url: img.url,
        types: [ImageType.Front],
        width: img.width,
        height: img.height
      })),
      externalLinks: [{
        url: spotifyAlbum.external_urls.spotify,
        types: [LinkType.Streaming]
      }],
      labels: spotifyAlbum.label ? [{ name: spotifyAlbum.label }] : [],
      copyright: spotifyAlbum.copyrights?.[0]?.text,
      availableIn: spotifyAlbum.available_markets,
      info: {
        providers: ["spotify"],
        messages: []
      }
    };
  }
}
```

### HarmonyRelease to MusicBrainz Format

**Location**: `musicbrainz/seeding.ts`

```typescript
interface MusicBrainzRelease {
  name: string;
  artist_credit: MBArtistCredit[];
  barcode?: string;
  release_events: MBReleaseEvent[];
  labels: MBLabel[];
  mediums: MBMedium[];
  release_group: {
    primary_type: string;
    secondary_types: string[];
  };
  language?: string;
  script?: string;
  packaging?: string;
  annotation?: string;
}

function convertToMusicBrainz(release: MergedHarmonyRelease): MusicBrainzRelease {
  return {
    name: release.title,
    artist_credit: release.artists.map(a => ({
      name: a.name,
      credited_name: a.creditedName,
      join_phrase: a.joinPhrase || '',
      mbid: a.mbid
    })),
    barcode: release.gtin,
    release_events: convertReleaseEvents(release.releaseDate, release.availableIn),
    labels: release.labels.map(l => ({
      name: l.name,
      catalog_number: l.catalogNumber,
      mbid: l.mbid
    })),
    mediums: release.media.map(m => ({
      format: m.format,
      position: m.position,
      title: m.title,
      tracks: m.tracks.map(t => ({
        title: t.title,
        position: t.position,
        length: t.length,
        isrc: t.isrc,
        artist_credit: t.artists?.map(a => ({
          name: a.name,
          join_phrase: a.joinPhrase || ''
        }))
      }))
    })),
    release_group: {
      primary_type: release.types.find(t => isPrimaryType(t)) || 'album',
      secondary_types: release.types.filter(t => !isPrimaryType(t))
    },
    language: release.language,
    script: release.script,
    packaging: release.packaging,
    annotation: buildAnnotation(release)
  };
}
```

## Data Validation

### GTIN Validation

```typescript
function validateGTIN(gtin: string): boolean {
  // GTIN-13 (EAN-13) validation only: other GTIN lengths,
  // such as 12-digit UPC-A codes, are rejected by this check
  if (!/^\d{13}$/.test(gtin)) return false;

  // Check digit validation (weights alternate 1, 3, 1, 3, ... from the left)
  const digits = gtin.split('').map(Number);
  const checksum = digits.slice(0, 12).reduce((sum, digit, i) => {
    return sum + digit * (i % 2 === 0 ? 1 : 3);
  }, 0);
  const checkDigit = (10 - (checksum % 10)) % 10;

  return checkDigit === digits[12];
}
```

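
Providers may also deliver shorter barcodes such as 12-digit UPC-A codes. The sketch below generalizes the check digit test to the four GTIN lengths by weighting from the right-hand side, as in the standard GS1 check digit calculation. The function name `isValidGtin` is made up, and whether Harmony validates these lengths this way is not stated in the source.

```typescript
// Hypothetical generalization: validates GTIN-8, GTIN-12 (UPC-A),
// GTIN-13 (EAN-13) and GTIN-14 check digits.
function isValidGtin(gtin: string): boolean {
  if (!/^(\d{8}|\d{12}|\d{13}|\d{14})$/.test(gtin)) return false;

  const digits = gtin.split("").map(Number);
  const checkDigit = digits[digits.length - 1];
  const dataDigits = digits.slice(0, -1);

  // Weights alternate 3, 1, 3, ... starting from the digit
  // immediately to the left of the check digit.
  const sum = dataDigits
    .reverse()
    .reduce((acc, digit, i) => acc + digit * (i % 2 === 0 ? 3 : 1), 0);

  return (10 - (sum % 10)) % 10 === checkDigit;
}

console.log(isValidGtin("4006381333931")); // prints true (EAN-13)
console.log(isValidGtin("036000291452")); // prints true (UPC-A)
console.log(isValidGtin("4006381333932")); // prints false (wrong check digit)
```

For 13-digit codes this gives the same result as weighting 1, 3, 1, 3 from the left, because the number of data digits is even.
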
### ISRC Validation

```typescript
function validateISRC(isrc: string): boolean {
  // Format: CC-XXX-YY-NNNNN
  // CC: Country code (2 letters)
  // XXX: Registrant code (3 alphanumeric)
  // YY: Year (2 digits)
  // NNNNN: Designation code (5 digits)
  return /^[A-Z]{2}-?[A-Z0-9]{3}-?\d{2}-?\d{5}$/.test(isrc);
}

function normalizeISRC(isrc: string): string {
  // Remove hyphens
  return isrc.replace(/-/g, '');
}
```

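
The four components named in the format comment can be pulled apart once the hyphens are removed. The sketch below uses made-up names (`IsrcParts`, `parseIsrc`) and takes its example value from the `HarmonyTrack` example earlier in this document.

```typescript
interface IsrcParts {
  country: string; // 2 letters
  registrant: string; // 3 alphanumeric characters
  year: string; // 2 digits
  designation: string; // 5 digits
}

// Hypothetical helper: split an ISRC (with or without hyphens) into its parts.
function parseIsrc(isrc: string): IsrcParts | null {
  if (!/^[A-Z]{2}-?[A-Z0-9]{3}-?\d{2}-?\d{5}$/.test(isrc)) return null;
  const compact = isrc.replace(/-/g, "");
  return {
    country: compact.slice(0, 2),
    registrant: compact.slice(2, 5),
    year: compact.slice(5, 7),
    designation: compact.slice(7, 12),
  };
}

console.log(parseIsrc("USRC17607839"));
// prints { country: "US", registrant: "RC1", year: "76", designation: "07839" }
```
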
### Date Validation

```typescript
function validatePartialDate(date: PartialDate): boolean {
  if (date.year < 1000 || date.year > 9999) return false;
  // Compare against undefined so that a value of 0 is rejected instead of skipped
  if (date.month !== undefined && (date.month < 1 || date.month > 12)) return false;
  if (date.day !== undefined && (date.day < 1 || date.day > 31)) return false;
  // A day is only meaningful together with a month
  if (date.day !== undefined && date.month === undefined) return false;

  // Validate day for specific month
  if (date.month !== undefined && date.day !== undefined) {
    // Day 0 of the following month is the last day of `date.month`
    const daysInMonth = new Date(date.year, date.month, 0).getDate();
    if (date.day > daysInMonth) return false;
  }

  return true;
}
```

## Data Size Estimates

### Typical HarmonyRelease Size

**Single-disc album** (12 tracks):

- JSON serialized: ~15-25 KB
- With images: ~20-30 KB (image URLs only, not image data)

**Multi-disc compilation** (50 tracks):

- JSON serialized: ~50-80 KB

### Cache Size Estimates

**Provider response sizes**:

- Spotify album: ~10-20 KB
- Deezer album: ~15-25 KB
- iTunes album: ~20-30 KB
- Bandcamp page: ~50-100 KB (HTML)

**Daily cache growth** (100 lookups/day):

- Database: ~50 KB (metadata only)
- Files: ~2-5 MB (response bodies)

**Annual cache size** (36,500 lookups/year):

- Database: ~18 MB
- Files: ~730 MB - 1.8 GB

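
The annual figures follow from multiplying the daily growth by 365. The short calculation below reproduces them, assuming decimal units (1 MB = 1000 KB) and constant daily growth; `annualMegabytes` is a made-up helper.

```typescript
// Hypothetical helper: project daily cache growth (in KB) to a yearly total (in MB).
function annualMegabytes(dailyKilobytes: number, daysPerYear = 365): number {
  return (dailyKilobytes * daysPerYear) / 1000;
}

console.log(annualMegabytes(50)); // database: prints 18.25 (about 18 MB)
console.log(annualMegabytes(2000)); // files, low end: prints 730
console.log(annualMegabytes(5000)); // files, high end: prints 1825 (about 1.8 GB)
```
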
## No Migrations

Since Harmony has no application database, there are no schema migrations for its data model.

**Schema evolution strategy**:

1. Add new optional fields to the `HarmonyRelease` interface
2. Update provider `harmonize()` methods to populate the new fields
3. Update the merge algorithm to handle the new fields
4. No data migration required (old cached responses remain valid)

**Breaking changes**:

1. Rename or remove fields in `HarmonyRelease`
2. Clear the cache (delete `snaps.db` and `snaps/`)
3. Rebuild the cache on the next lookup

## Summary

Harmony's data architecture demonstrates:

1. **Cache-first design**: `snap_storage` removes the need for a traditional database server
2. **Permalink system**: Timestamp-based cache replay enables reproducibility
3. **Rich data model**: The `HarmonyRelease` schema (`harmonizer/types.ts`, 273 lines) covers releases, media, tracks, labels, images, and external links
4. **Type safety**: TypeScript interfaces and enums help keep data consistent across providers
5. **No migrations**: Schema evolution without data migration complexity
6. **Stateless processing**: All transformations happen in memory, with no persistent application state
7. **MBID caching**: Batch lookup reduces MusicBrainz API calls

This architecture is well suited to read-heavy, stateless applications where reproducibility and API compliance are priorities.