# Harmony - Data Model and Storage Analysis

## Storage Philosophy

Harmony employs a **cache-first, no-database** architecture:

- **No traditional database**: No PostgreSQL, MySQL, MongoDB, etc.
- **No persistent user data**: No accounts, no saved searches, no user-generated content
- **Cache as storage**: HTTP response caching via the `snap_storage` library
- **In-memory processing**: All data transformations happen in memory
- **Stateless design**: Each request is independent

This approach prioritizes:

- **Simplicity**: No database migrations, no database schema to evolve
- **Reproducibility**: Permalink system enables exact result replay
- **API compliance**: Caching reduces provider API calls
- **Deployment ease**: No database server required

## Persistence Layer: snap_storage

### Overview

`snap_storage` is a Deno library for HTTP response caching with a SQLite backend.

**Repository**: https://github.com/kellnerd/snap-storage (same author as Harmony)

**Purpose**: Store HTTP responses with timestamps for later retrieval

### Storage Structure

#### SQLite Database: `snaps.db`

**Location**: `${HARMONY_DATA_DIR}/snaps.db` (default: `./snaps.db`)

**Schema** (conceptual):

```sql
CREATE TABLE snaps (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  key TEXT NOT NULL UNIQUE,
  url TEXT NOT NULL,
  timestamp INTEGER NOT NULL,
  status INTEGER NOT NULL,
  headers TEXT NOT NULL,
  body_path TEXT NOT NULL,
  created_at INTEGER NOT NULL
);

CREATE INDEX idx_snaps_key ON snaps(key);
CREATE INDEX idx_snaps_timestamp ON snaps(timestamp);
CREATE INDEX idx_snaps_url ON snaps(url);
```

**Fields**:

- `key`: Cache key (hash of URL + parameters)
- `url`: Original request URL
- `timestamp`: Unix timestamp of request
- `status`: HTTP status code
- `headers`: JSON-encoded response headers
- `body_path`: Path to response body file in `snaps/` directory
- `created_at`: Record creation timestamp (milliseconds, as produced by `Date.now()`)

#### File Directory: `snaps/`

**Location**: `${HARMONY_DATA_DIR}/snaps/` (default: `./snaps/`)

**Structure**:

```
snaps/
├── 0a/
│   ├── 0a1b2c3d4e5f6g7h8i9j.json
│   └── 0a9f8e7d6c5b4a3.json
├── 1b/
│   └── 1b2c3d4e5f6g7h8i9j0a.json
└── ...
```

**File naming**: First 2 characters of hash as directory, full hash as filename (the names in the tree above are shortened placeholders)

**File content**: Raw HTTP response body (JSON, HTML, XML, etc.)

### Cache Operations

#### Store Response

```typescript
interface CacheEntry {
  url: string;
  timestamp: number;
  response: Response;
}

// `db` stands for an already opened SQLite handle (conceptual).
async function storeResponse(entry: CacheEntry): Promise<void> {
  const key = await hashUrl(entry.url);
  const dir = `snaps/${key.slice(0, 2)}`;
  const bodyPath = `${dir}/${key}.json`;

  // Store body to file (create the two-character shard directory first)
  await Deno.mkdir(dir, { recursive: true });
  await Deno.writeTextFile(bodyPath, await entry.response.text());

  // Store metadata to database
  await db.execute(
    `INSERT INTO snaps (key, url, timestamp, status, headers, body_path, created_at)
     VALUES (?, ?, ?, ?, ?, ?, ?)`,
    [
      key,
      entry.url,
      entry.timestamp,
      entry.response.status,
      JSON.stringify(Object.fromEntries(entry.response.headers)),
      bodyPath,
      Date.now(),
    ],
  );
}
```

#### Retrieve Response

```typescript
async function getResponse(url: string, timestamp?: number): Promise<Response | null> {
  const key = await hashUrl(url);

  let query = `SELECT * FROM snaps WHERE key = ?`;
  const params: (string | number)[] = [key];

  if (timestamp !== undefined) {
    // Permalink mode: exact timestamp match
    query += ` AND timestamp = ?`;
    params.push(timestamp);
  } else {
    // Normal mode: most recent within cache duration
    const maxAge = 24 * 60 * 60 * 1000; // 24 hours
    query += ` AND created_at > ? ORDER BY created_at DESC LIMIT 1`;
    params.push(Date.now() - maxAge);
  }

  const row = await db.queryOne(query, params);
  if (!row) return null;

  // Read body from file
  const body = await Deno.readTextFile(row.body_path);

  // Reconstruct Response object
  return new Response(body, {
    status: row.status,
    headers: JSON.parse(row.headers),
  });
}
```

### Cache Policy

#### Default Policy

- **Duration**: 24 hours
- **Eviction**: No automatic eviction (manual cleanup required)
- **Size limit**: No enforced limit (grows indefinitely)

#### Permalink Policy

- **Duration**: Indefinite (never evicted)
- **Purpose**: Enable reproducible results
- **Lookup**: Exact timestamp match

#### Cache Key Generation

```typescript
// Async because crypto.subtle.digest returns a Promise.
async function hashUrl(url: string): Promise<string> {
  // Normalize URL
  const normalized = new URL(url);
  normalized.searchParams.sort(); // Consistent parameter order

  // Hash normalized URL
  const encoder = new TextEncoder();
  const data = encoder.encode(normalized.toString());
  const hashBuffer = await crypto.subtle.digest('SHA-256', data);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}
```

### Cache Management

#### Manual Cleanup

There is no automatic cleanup. Users must manually delete old cache entries. Note that the commands below also remove snapshots that existing permalinks may reference:

```bash
# Delete cache entries older than 30 days
# (created_at is in milliseconds; `date -d` is GNU date syntax)
sqlite3 snaps.db "DELETE FROM snaps WHERE created_at < $(date -d '30 days ago' +%s)000"

# Delete body files not modified in the last 30 days (approximates the rows removed above)
find snaps/ -type f -mtime +30 -delete
```

#### Cache Statistics

```bash
# Total cache entries
sqlite3 snaps.db "SELECT COUNT(*) FROM snaps"

# Cache size
du -sh snaps/

# Entries per request URL
sqlite3 snaps.db "SELECT url, COUNT(*) FROM snaps GROUP BY url"
```

## MBID Cache

### Purpose

Cache MusicBrainz ID (MBID) mappings for external URLs to avoid repeated API calls.

### Storage Location

- **Development**: `localStorage` (persistent across sessions)
- **Production**: `sessionStorage` (cleared on browser close)

**Rationale**: Development benefits from a persistent cache, while production prioritizes fresh data.

### Cache Structure

```typescript
interface MBIDCache {
  [externalUrl: string]: MBIDCacheEntry;
}

interface MBIDCacheEntry {
  mbid: string;
  type: 'release' | 'release-group' | 'recording' | 'artist' | 'label';
  cached: number; // Unix timestamp in milliseconds (Date.now())
}
```

### Cache Operations

#### Store MBID Mapping

```typescript
function cacheMBID(url: string, mbid: string, type: MBIDCacheEntry['type']): void {
  const cache = getMBIDCache();
  cache[url] = { mbid, type, cached: Date.now() };
  setMBIDCache(cache);
}

function getMBIDCache(): MBIDCache {
  // DENO_DEPLOYMENT_ID is read from the environment; it is set in production deployments
  const storage = Deno.env.get('DENO_DEPLOYMENT_ID') ? sessionStorage : localStorage;
  const cached = storage.getItem('harmony_mbid_cache');
  return cached ? JSON.parse(cached) : {};
}

function setMBIDCache(cache: MBIDCache): void {
  const storage = Deno.env.get('DENO_DEPLOYMENT_ID') ? sessionStorage : localStorage;
  storage.setItem('harmony_mbid_cache', JSON.stringify(cache));
}
```

#### Retrieve MBID Mapping

```typescript
function getCachedMBID(url: string): MBIDCacheEntry | null {
  const cache = getMBIDCache();
  const entry = cache[url];
  if (!entry) return null;

  // Check if cache is stale (24 hours)
  const maxAge = 24 * 60 * 60 * 1000;
  if (Date.now() - entry.cached > maxAge) {
    delete cache[url];
    setMBIDCache(cache);
    return null;
  }

  return entry;
}
```

#### Batch MBID Lookup

The MusicBrainz API supports batch URL lookup (up to 100 URLs per request):

```typescript
async function resolveMBIDs(urls: string[]): Promise<Map<string, MBIDCacheEntry>> {
  const results = new Map<string, MBIDCacheEntry>();

  // Check cache first
  const uncached: string[] = [];
  for (const url of urls) {
    const cached = getCachedMBID(url);
    if (cached) {
      results.set(url, cached);
    } else {
      uncached.push(url);
    }
  }

  // Batch lookup uncached URLs (100 at a time)
  for (let i = 0; i < uncached.length; i += 100) {
    const batch = uncached.slice(i, i + 100);
    const params = batch.map(url => `resource=${encodeURIComponent(url)}`).join('&');
    // fmt=json requests JSON (the API defaults to XML); inc=release-rels includes release relationships
    const response = await fetch(`https://musicbrainz.org/ws/2/url?${params}&inc=release-rels&fmt=json`);
    const data = await response.json();

    // Parse response and cache results (only release relationships are handled here)
    for (const urlData of data.urls ?? []) {
      const mbid = urlData.relations?.[0]?.release?.id;
      if (mbid) {
        cacheMBID(urlData.resource, mbid, 'release');
        results.set(urlData.resource, { mbid, type: 'release', cached: Date.now() });
      }
    }
  }

  return results;
}
```

## Core Data Model: HarmonyRelease

### Schema Definition

**Location**: `harmonizer/types.ts` (273 lines)

**Full Interface**:

```typescript
interface HarmonyRelease {
  // ===== Basic Metadata =====
  title: string;
  artists: ArtistCreditName[];
  gtin?: string; // Global Trade Item Number (barcode)

  // ===== Media and Tracks =====
  media: HarmonyMedium[];

  // ===== Release Details =====
  language?: string; // ISO 639-3 code
  script?: string; // ISO 15924 code
  status?: ReleaseStatus;
  types: ReleaseType[];
  releaseDate?: PartialDate;

  // ===== Commercial Information =====
  labels: Label[];
  packaging?: PackagingType;
  copyright?: string;

  // ===== Distribution =====
  availableIn?: string[]; // ISO 3166-1 alpha-2 country codes
  excludedFrom?: string[]; // ISO 3166-1 alpha-2 country codes

  // ===== Visual Assets =====
  images: Image[];

  // ===== External Links =====
  externalLinks: ExternalLink[];

  // ===== Metadata About Metadata =====
  info: ReleaseInfo;
}
```

### Sub-Structures

#### ArtistCreditName

```typescript
interface ArtistCreditName {
  name: string; // Artist name
  creditedName?: string; // Alternative credit (e.g., "feat. Artist")
  joinPhrase?: string; // Separator (e.g., " & ", " feat. ", " vs. ")
  mbid?: string; // MusicBrainz artist ID
}
```

**Example**:

```typescript
[
  { name: "Artist A", joinPhrase: " & " },
  { name: "Artist B", joinPhrase: " feat. " },
  { name: "Artist C", creditedName: "Artist C (DJ Set)" }
]
```

**Rendering**: "Artist A & Artist B feat. Artist C (DJ Set)"

#### HarmonyMedium

```typescript
interface HarmonyMedium {
  title?: string; // Medium title (e.g., "Disc 1: The Album")
  format?: MediumFormat;
  position: number; // 1-indexed
  tracks: HarmonyTrack[];
}

enum MediumFormat {
  CD = 'CD',
  Vinyl = 'Vinyl',
  Digital = 'Digital Media',
  Cassette = 'Cassette',
  DVD = 'DVD',
  BluRay = 'Blu-ray',
  Other = 'Other'
}
```

#### HarmonyTrack

```typescript
interface HarmonyTrack {
  title: string;
  artists?: ArtistCreditName[]; // Track-specific artists (overrides release artists)
  position: number; // 1-indexed within medium
  length?: number; // Duration in milliseconds
  isrc?: string; // International Standard Recording Code
}
```

**Example**:

```typescript
{
  title: "Track Title",
  artists: [{ name: "Track Artist" }],
  position: 1,
  length: 245000, // 4:05
  isrc: "USRC17607839"
}
```

#### Label

```typescript
interface Label {
  name: string;
  catalogNumber?: string;
  mbid?: string; // MusicBrainz label ID
}
```

**Example**:

```typescript
[
  { name: "Record Label", catalogNumber: "RL-12345" },
  { name: "Distributor", catalogNumber: "DIST-67890" }
]
```

#### Image

```typescript
interface Image {
  url: string;
  types: ImageType[];
  width?: number;
  height?: number;
  comment?: string;
}

enum ImageType {
  Front = 'front',
  Back = 'back',
  Medium = 'medium',
  Tray = 'tray',
  Booklet = 'booklet',
  Obi = 'obi',
  Spine = 'spine',
  Track = 'track',
  Liner = 'liner',
  Sticker = 'sticker',
  Poster = 'poster',
  Watermark = 'watermark',
  Raw = 'raw',
  Unedited = 'unedited'
}
```

**Example**:

```typescript
[
  {
    url: "https://i.scdn.co/image/ab67616d0000b273...",
    types: [ImageType.Front],
    width: 2000,
    height: 2000
  },
  {
    url: "https://e-cdn-images.dzcdn.net/images/cover/...",
    types: [ImageType.Front],
    width: 1400,
    height: 1400,
    comment: "Deezer cover"
  }
]
```

#### ExternalLink

```typescript
interface ExternalLink {
  url: string;
  types: LinkType[];
}

enum LinkType {
  Streaming = 'streaming',
  Purchase = 'purchase',
  Download = 'download',
  License = 'license',
  Crowdfunding = 'crowdfunding',
  Other = 'other'
}
```

**Example**:

```typescript
[
  { url: "https://open.spotify.com/album/xyz", types: [LinkType.Streaming] },
  { url: "https://bandcamp.com/album/xyz", types: [LinkType.Streaming, LinkType.Purchase] }
]
```

#### ReleaseInfo

```typescript
interface ReleaseInfo {
  providers: string[]; // Provider names that contributed data
  messages: Message[]; // Warnings, errors, info messages
  sourceMap?: SourceMap; // Property -> provider mapping (only in MergedHarmonyRelease)
  incompatibleData?: IncompatibilityInfo; // Conflicts (only in MergedHarmonyRelease)
}

interface Message {
  level: 'error' | 'warning' | 'info';
  text: string;
  provider?: string;
}
```

**Example**:

```typescript
{
  providers: ["spotify", "deezer", "itunes"],
  messages: [
    {
      level: "warning",
      text: "Release date conflict: Spotify (2014-11-24) vs iTunes (2014-11-25)",
      provider: "itunes"
    },
    {
      level: "info",
      text: "Using Spotify value (higher preference)"
    }
  ]
}
```

### Enumerations

#### ReleaseStatus

```typescript
enum ReleaseStatus {
  Official = 'official',
  Promotion = 'promotion',
  Bootleg = 'bootleg',
  PseudoRelease = 'pseudo-release'
}
```

#### ReleaseType

```typescript
enum ReleaseType {
  // Primary types
  Album = 'album',
  Single = 'single',
  EP = 'ep',
  Broadcast = 'broadcast',
  Other = 'other',

  // Secondary types
  Compilation = 'compilation',
  Soundtrack = 'soundtrack',
  Spokenword = 'spokenword',
  Interview = 'interview',
  Audiobook = 'audiobook',
  AudioDrama = 'audio drama',
  Live = 'live',
  Remix = 'remix',
  DJMix = 'dj-mix',
  Mixtape = 'mixtape',
  Demo = 'demo',
  FieldRecording = 'field recording'
}
```

**Usage**: Array of types (primary + secondary)

```typescript
types: [ReleaseType.Album, ReleaseType.Live] // Live album
types: [ReleaseType.EP, ReleaseType.Remix] // Remix EP
```

#### PackagingType

```typescript
enum PackagingType {
  JewelCase = 'jewel case',
  SlimJewelCase = 'slim jewel case',
  Digipak = 'digipak',
  Cardboard = 'cardboard/paper sleeve',
  KeepCase = 'keep case',
  None = 'none',
  Other = 'other'
}
```

#### PartialDate

```typescript
interface PartialDate {
  year: number;
  month?: number; // 1-12
  day?: number; // 1-31
}
```

**Examples**:

```typescript
{ year: 2014 } // Year only
{ year: 2014, month: 11 } // Year and month
{ year: 2014, month: 11, day: 24 } // Full date
```

**Serialization**:

```typescript
function serializePartialDate(date: PartialDate): string {
  let result = date.year.toString();
  if (date.month) {
    result += `-${date.month.toString().padStart(2, '0')}`;
    // A day is only meaningful when the month is known
    if (date.day) {
      result += `-${date.day.toString().padStart(2, '0')}`;
    }
  }
  return result;
}

// Examples:
// { year: 2014 } -> "2014"
// { year: 2014, month: 11 } -> "2014-11"
// { year: 2014, month: 11, day: 24 } -> "2014-11-24"
```

## MergedHarmonyRelease

Extends `HarmonyRelease` with merge metadata.

```typescript
interface MergedHarmonyRelease extends HarmonyRelease {
  info: ReleaseInfo & {
    sourceMap: SourceMap;
    incompatibleData?: IncompatibilityInfo;
  };
}

interface SourceMap {
  [propertyPath: string]: string; // Property path -> provider name
}

interface IncompatibilityInfo {
  conflicts: Conflict[];
  warnings: string[];
}

interface Conflict {
  property: string;
  values: ConflictValue[];
}

interface ConflictValue {
  provider: string;
  value: unknown;
}
```

**Example**:

```typescript
{
  title: "Album Title",
  releaseDate: { year: 2014, month: 11, day: 24 },
  // ... other fields
  info: {
    providers: ["spotify", "deezer", "itunes"],
    sourceMap: {
      "title": "spotify",
      "releaseDate": "spotify",
      "gtin": "deezer",
      "media[0].tracks[0].isrc": "spotify"
    },
    incompatibleData: {
      conflicts: [
        {
          property: "releaseDate",
          values: [
            { provider: "spotify", value: { year: 2014, month: 11, day: 24 } },
            { provider: "itunes", value: { year: 2014, month: 11, day: 25 } }
          ]
        }
      ],
      warnings: [
        "Release date conflict resolved using Spotify value (higher preference)"
      ]
    },
    messages: []
  }
}
```

## Data Transformations

### Provider-Specific to HarmonyRelease

Each provider implements a `harmonize()` method:

```typescript
// Spotify example (conceptual); parseDate and inferTypes are provider helpers not shown here
class SpotifyProvider {
  harmonize(spotifyAlbum: SpotifyAlbum): HarmonyRelease {
    return {
      title: spotifyAlbum.name,
      artists: spotifyAlbum.artists.map(a => ({
        name: a.name,
        mbid: undefined // Spotify doesn't provide MBIDs
      })),
      gtin: spotifyAlbum.external_ids?.upc,
      media: [{
        format: MediumFormat.Digital,
        position: 1,
        tracks: spotifyAlbum.tracks.items.map((t, i) => ({
          title: t.name,
          position: i + 1,
          length: t.duration_ms,
          isrc: t.external_ids?.isrc
        }))
      }],
      releaseDate: this.parseDate(spotifyAlbum.release_date),
      types: this.inferTypes(spotifyAlbum.album_type),
      images: spotifyAlbum.images.map(img => ({
        url: img.url,
        types: [ImageType.Front],
        width: img.width,
        height: img.height
      })),
      externalLinks: [{
        url: spotifyAlbum.external_urls.spotify,
        types: [LinkType.Streaming]
      }],
      labels: spotifyAlbum.label ? [{ name: spotifyAlbum.label }] : [],
      copyright: spotifyAlbum.copyrights?.[0]?.text,
      availableIn: spotifyAlbum.available_markets,
      info: {
        providers: ["spotify"],
        messages: []
      }
    };
  }
}
```

### HarmonyRelease to MusicBrainz Format

**Location**: `musicbrainz/seeding.ts`

```typescript
interface MusicBrainzRelease {
  name: string;
  artist_credit: MBArtistCredit[];
  barcode?: string;
  release_events: MBReleaseEvent[];
  labels: MBLabel[];
  mediums: MBMedium[];
  release_group: {
    primary_type: string;
    secondary_types: string[];
  };
  language?: string;
  script?: string;
  packaging?: string;
  annotation?: string;
}

// convertReleaseEvents, isPrimaryType and buildAnnotation are helpers not shown here
function convertToMusicBrainz(release: MergedHarmonyRelease): MusicBrainzRelease {
  return {
    name: release.title,
    artist_credit: release.artists.map(a => ({
      name: a.name,
      credited_name: a.creditedName,
      join_phrase: a.joinPhrase || '',
      mbid: a.mbid
    })),
    barcode: release.gtin,
    release_events: convertReleaseEvents(release.releaseDate, release.availableIn),
    labels: release.labels.map(l => ({
      name: l.name,
      catalog_number: l.catalogNumber,
      mbid: l.mbid
    })),
    mediums: release.media.map(m => ({
      format: m.format,
      position: m.position,
      title: m.title,
      tracks: m.tracks.map(t => ({
        title: t.title,
        position: t.position,
        length: t.length,
        isrc: t.isrc,
        artist_credit: t.artists?.map(a => ({
          name: a.name,
          join_phrase: a.joinPhrase || ''
        }))
      }))
    })),
    release_group: {
      // Fall back to 'album' when no primary type is present
      primary_type: release.types.find(t => isPrimaryType(t)) || 'album',
      secondary_types: release.types.filter(t => !isPrimaryType(t))
    },
    language: release.language,
    script: release.script,
    packaging: release.packaging,
    annotation: buildAnnotation(release)
  };
}
```

## Data Validation

### GTIN Validation

```typescript
function validateGTIN(gtin: string): boolean {
  // GTIN-13 (EAN-13) validation: exactly 13 digits
  if (!/^\d{13}$/.test(gtin)) return false;

  // Check digit validation: weights alternate 1, 3 from the leftmost digit
  const digits = gtin.split('').map(Number);
  const checksum = digits.slice(0, 12).reduce((sum, digit, i) => {
    return sum + digit * (i % 2 === 0 ? 1 : 3);
  }, 0);
  const checkDigit = (10 - (checksum % 10)) % 10;

  return checkDigit === digits[12];
}
```

### ISRC Validation

```typescript
function validateISRC(isrc: string): boolean {
  // Format: CC-XXX-YY-NNNNN
  // CC: Country code (2 letters)
  // XXX: Registrant code (3 alphanumeric)
  // YY: Year (2 digits)
  // NNNNN: Designation code (5 digits)
  return /^[A-Z]{2}-?[A-Z0-9]{3}-?\d{2}-?\d{5}$/.test(isrc);
}

function normalizeISRC(isrc: string): string {
  // Remove hyphens
  return isrc.replace(/-/g, '');
}
```

### Date Validation

```typescript
function validatePartialDate(date: PartialDate): boolean {
  if (date.year < 1000 || date.year > 9999) return false;
  if (date.month && (date.month < 1 || date.month > 12)) return false;
  if (date.day && (date.day < 1 || date.day > 31)) return false;

  // Validate day for specific month
  if (date.month && date.day) {
    // Day 0 of the following month is the last day of `date.month` (1-based)
    const daysInMonth = new Date(date.year, date.month, 0).getDate();
    if (date.day > daysInMonth) return false;
  }

  return true;
}
```

## Data Size Estimates

### Typical HarmonyRelease Size

**Single-disc album** (12 tracks):

- JSON serialized: ~15-25 KB
- With images: ~20-30 KB (image URLs only, not image data)

**Multi-disc compilation** (50 tracks):

- JSON serialized: ~50-80 KB

### Cache Size Estimates

**Provider response sizes**:

- Spotify album: ~10-20 KB
- Deezer album: ~15-25 KB
- iTunes album: ~20-30 KB
- Bandcamp page: ~50-100 KB (HTML)

**Daily cache growth** (100 lookups/day):

- Database: ~50 KB (metadata only)
- Files: ~2-5 MB (response bodies)

**Annual cache size** (36,500 lookups/year, i.e. 365 times the daily figures):

- Database: ~18 MB
- Files: ~730 MB to 1.8 GB

## No Migrations

Since Harmony has no traditional database, there are no schema migrations.

**Schema evolution strategy**:

1. Add new optional fields to the `HarmonyRelease` interface
2. Update provider `harmonize()` methods to populate the new fields
3. Update the merge algorithm to handle the new fields
4. No data migration required (old cached responses are still valid)

**Breaking changes**:

1. Rename or remove fields in `HarmonyRelease`
2. Clear cache (delete `snaps.db` and `snaps/`)
3. Rebuild cache on next lookup

## Summary

Harmony's data architecture demonstrates:

1. **Cache-first design**: `snap_storage` eliminates the need for a traditional database
2. **Permalink system**: Timestamp-based cache replay enables reproducibility
3. **Rich data model**: The 273-line `HarmonyRelease` schema covers release, track, label, image and link metadata
4. **Type safety**: TypeScript types throughout help keep data consistent
5. **No migrations**: Schema evolution without data migration complexity
6. **Stateless processing**: All transformations happen in memory, with no persistent state
7. **MBID caching**: Batch lookup and caching reduce MusicBrainz API calls

This architecture is well suited to read-heavy, stateless applications where reproducibility and API compliance are priorities.
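
### Appendix: Sketch of a Preference-Based Merge

The `sourceMap` and `conflicts` structures in the MergedHarmonyRelease section are shown only as data. The following is a minimal sketch of how such metadata could be produced, assuming a fixed provider preference order and flat, scalar-valued properties. It is not Harmony's actual merge algorithm: `mergeByPreference`, `FlatRelease` and `MergeResult` are hypothetical names introduced for illustration only.

```typescript
// Hypothetical, simplified sketch: not Harmony's actual merge implementation.
type Scalar = string | number | undefined;
type FlatRelease = Record<string, Scalar>;

interface ConflictValue { provider: string; value: Scalar; }
interface Conflict { property: string; values: ConflictValue[]; }
interface MergeResult {
  merged: FlatRelease;
  sourceMap: Record<string, string>; // property -> provider that supplied it
  conflicts: Conflict[];
}

function mergeByPreference(
  inputs: Record<string, FlatRelease>, // provider name -> its (flattened) release data
  preference: string[],                // provider names, most preferred first
): MergeResult {
  const merged: FlatRelease = {};
  const sourceMap: Record<string, string> = {};
  const conflicts: Conflict[] = [];

  // Collect every property name that any listed provider supplied.
  const properties = new Set<string>();
  for (const provider of preference) {
    for (const property of Object.keys(inputs[provider] ?? {})) {
      properties.add(property);
    }
  }

  for (const property of properties) {
    // Gather defined values in preference order.
    const values: ConflictValue[] = [];
    for (const provider of preference) {
      const value = inputs[provider]?.[property];
      if (value !== undefined) values.push({ provider, value });
    }
    if (values.length === 0) continue;

    // The first entry comes from the most preferred provider that has a value.
    merged[property] = values[0].value;
    sourceMap[property] = values[0].provider;

    // Record a conflict when providers disagree on the value.
    if (new Set(values.map(v => v.value)).size > 1) {
      conflicts.push({ property, values });
    }
  }

  return { merged, sourceMap, conflicts };
}

// Usage: mirrors the release date conflict from the MergedHarmonyRelease example.
const result = mergeByPreference(
  {
    spotify: { title: "Album Title", releaseDate: "2014-11-24" },
    deezer: { title: "Album Title", gtin: "0123456789012" },
    itunes: { title: "Album Title", releaseDate: "2014-11-25" },
  },
  ["spotify", "deezer", "itunes"],
);
console.log(result.sourceMap);
console.log(result.conflicts);
```

In this sketch the release date is taken from Spotify because it ranks first in the preference list, and the disagreement with iTunes is kept as a conflict entry rather than discarded. Dates are compared as serialized strings here to keep equality checks simple; nested structures such as `PartialDate` objects or track lists would need a deep comparison.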