Harmony - Data Model and Storage Analysis
Storage Philosophy
Harmony employs a cache-first, no-database architecture:
- No traditional database server: No PostgreSQL, MySQL, MongoDB, etc. (SQLite is used only as a local index for the HTTP cache)
- No persistent user data: No accounts, no saved searches, no user-generated content
- Cache as storage: HTTP response caching via the snap_storage library
- In-memory processing: All data transformations happen in memory
- Stateless design: Each request is independent
This approach prioritizes:
- Simplicity: No database migrations, no schema evolution
- Reproducibility: Permalink system enables exact result replay
- API compliance: Caching reduces provider API calls
- Deployment ease: No database server required
Persistence Layer: snap_storage
Overview
snap_storage is a Deno library for HTTP response caching with a SQLite backend.
Repository: https://github.com/kellnerd/snap-storage (same author as Harmony)
Purpose: Store HTTP responses with timestamps for later retrieval
Storage Structure
SQLite Database: snaps.db
Location: ${HARMONY_DATA_DIR}/snaps.db (default: ./snaps.db)
Schema (conceptual):
CREATE TABLE snaps (
id INTEGER PRIMARY KEY AUTOINCREMENT,
key TEXT NOT NULL UNIQUE,
url TEXT NOT NULL,
timestamp INTEGER NOT NULL,
status INTEGER NOT NULL,
headers TEXT NOT NULL,
body_path TEXT NOT NULL,
created_at INTEGER NOT NULL
);
CREATE INDEX idx_snaps_key ON snaps(key);
CREATE INDEX idx_snaps_timestamp ON snaps(timestamp);
CREATE INDEX idx_snaps_url ON snaps(url);
Fields:
- key: Cache key (hash of URL + parameters)
- url: Original request URL
- timestamp: Unix timestamp of request
- status: HTTP status code
- headers: JSON-encoded response headers
- body_path: Path to response body file in the snaps/ directory
- created_at: Record creation timestamp
File Directory: snaps/
Location: ${HARMONY_DATA_DIR}/snaps/ (default: ./snaps/)
Structure:
snaps/
├── 0a/
│   ├── 0a1b2c3d4e5f6a7b8c9d.json
│   └── 0a9f8e7d6c5b4a3.json
├── 1b/
│   └── 1b2c3d4e5f6a7b8c9d0a.json
└── ...
File naming: First 2 characters of hash as directory, full hash as filename
File content: Raw HTTP response body (JSON, HTML, XML, etc.)
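The sharding rule above can be sketched as a small path helper. The name `bodyPathForKey` and its `dataDir` parameter are hypothetical and chosen for illustration; they are not taken from snap_storage's API.

```typescript
// Hypothetical helper (not part of snap_storage): derive the body file path
// for a cache key, assuming keys are hex digests as in the listing above.
function bodyPathForKey(key: string, dataDir = "."): string {
  if (key.length < 2) {
    throw new Error("cache key must have at least 2 characters");
  }
  const shard = key.slice(0, 2); // first two hash characters name the directory
  return `${dataDir}/snaps/${shard}/${key}.json`;
}

// Usage: a key starting with "0a" lands in the "0a" shard directory.
const examplePath = bodyPathForKey("0a9f8e7d6c5b4a3");
```

Sharding by hash prefix keeps any single directory from accumulating every cached body, which some file systems handle poorly.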
Cache Operations
Store Response
interface CacheEntry {
url: string;
timestamp: number;
response: Response;
}
async function storeResponse(entry: CacheEntry): Promise<void> {
  // hashUrl uses crypto.subtle and is therefore asynchronous
  const key = await hashUrl(entry.url);
  const bodyDir = `snaps/${key.slice(0, 2)}`;
  const bodyPath = `${bodyDir}/${key}.json`;
  // Create the shard directory if needed, then store the body as a file
  await Deno.mkdir(bodyDir, { recursive: true });
  await Deno.writeTextFile(bodyPath, await entry.response.text());
  // Store metadata in the database
  await db.execute(`
    INSERT INTO snaps (key, url, timestamp, status, headers, body_path, created_at)
    VALUES (?, ?, ?, ?, ?, ?, ?)
  `, [
    key,
    entry.url,
    entry.timestamp,
    entry.response.status,
    JSON.stringify(Object.fromEntries(entry.response.headers)),
    bodyPath,
    Date.now()
  ]);
}
Retrieve Response
async function getResponse(url: string, timestamp?: number): Promise<Response | null> {
  const key = await hashUrl(url);
  let query = `SELECT * FROM snaps WHERE key = ?`;
  // Parameters mix the string key with numeric timestamps
  const params: (string | number)[] = [key];
  if (timestamp !== undefined) {
    // Permalink mode: exact timestamp match
    query += ` AND timestamp = ?`;
    params.push(timestamp);
  } else {
    // Normal mode: most recent within cache duration
    const maxAge = 24 * 60 * 60 * 1000; // 24 hours
    query += ` AND created_at > ? ORDER BY created_at DESC LIMIT 1`;
    params.push(Date.now() - maxAge);
  }
  const row = await db.queryOne(query, params);
  if (!row) return null;
  // Read body from file
  const body = await Deno.readTextFile(row.body_path);
  // Reconstruct Response object
  return new Response(body, {
    status: row.status,
    headers: JSON.parse(row.headers)
  });
}
Cache Policy
Default Policy
- Duration: 24 hours
- Eviction: No automatic eviction (manual cleanup required)
- Size limit: No enforced limit (grows indefinitely)
Permalink Policy
- Duration: Indefinite (never evicted)
- Purpose: Enable reproducible results
- Lookup: Exact timestamp match
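The two policies can be illustrated with an in-memory model of the lookup. This is a sketch under assumptions: `SnapRow` and `selectSnapshot` are hypothetical names, and all times are taken to be milliseconds since the epoch.

```typescript
// Minimal in-memory stand-in for a row of the conceptual snaps table.
interface SnapRow {
  key: string;
  timestamp: number; // identifies the snapshot for permalinks (assumed ms)
  createdAt: number; // record creation time (assumed ms)
}

const DEFAULT_MAX_AGE_MS = 24 * 60 * 60 * 1000; // default policy: 24 hours

// Hypothetical helper mirroring the two lookup modes.
function selectSnapshot(
  rows: SnapRow[],
  key: string,
  now: number,
  timestamp?: number,
): SnapRow | undefined {
  const candidates = rows.filter((row) => row.key === key);
  if (timestamp !== undefined) {
    // Permalink policy: exact timestamp match, regardless of age.
    return candidates.find((row) => row.timestamp === timestamp);
  }
  // Default policy: newest row that is younger than the cache duration.
  return candidates
    .filter((row) => row.createdAt > now - DEFAULT_MAX_AGE_MS)
    .sort((a, b) => b.createdAt - a.createdAt)[0];
}
```

Note that an expired row is only ignored by the default lookup, not deleted, which is why a permalink can still resolve it later.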
Cache Key Generation
// crypto.subtle.digest returns a promise, so the function must be async
async function hashUrl(url: string): Promise<string> {
  // Normalize URL
  const normalized = new URL(url);
  normalized.searchParams.sort(); // Consistent parameter order
  // Hash normalized URL
  const encoder = new TextEncoder();
  const data = encoder.encode(normalized.toString());
  const hashBuffer = await crypto.subtle.digest('SHA-256', data);
  // Hex-encode the digest (64 characters for SHA-256)
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}
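To see why the normalization step matters, the sketch below isolates it from the hashing. `normalizeUrl` is a hypothetical helper name and the example.com URLs are placeholders.

```typescript
// Hypothetical helper: returns the string that would be fed to the hash.
function normalizeUrl(url: string): string {
  const normalized = new URL(url);
  normalized.searchParams.sort(); // orders query parameters by name
  return normalized.toString();
}

// Two requests that differ only in parameter order normalize identically,
// so they would share one cache key.
const first = normalizeUrl("https://api.example.com/album?b=2&a=1");
const second = normalizeUrl("https://api.example.com/album?a=1&b=2");
const sameKeyInput = first === second;
```

Without the sort, the same logical request could be cached twice under different keys.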
Cache Management
Manual Cleanup
No automatic cleanup. Users must manually delete old cache entries:
# Delete cache older than 30 days
sqlite3 snaps.db "DELETE FROM snaps WHERE created_at < $(date -d '30 days ago' +%s)000"
# Remove body files older than 30 days (selects by file age, which only approximates orphan cleanup)
find snaps/ -type f -mtime +30 -delete
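The find command above selects files by age rather than by whether a database row still references them. An exact orphan check is a set difference between the files on disk and the body_path values in the database. A minimal sketch, assuming both lists have already been collected (the function name `findOrphanedFiles` is hypothetical):

```typescript
// Hypothetical helper: body files on disk that no database row references.
function findOrphanedFiles(
  filesOnDisk: string[],
  referencedPaths: string[],
): string[] {
  const referenced = new Set(referencedPaths); // O(1) membership checks
  return filesOnDisk.filter((path) => !referenced.has(path));
}

// Usage with illustrative paths: only the unreferenced file is reported.
const orphans = findOrphanedFiles(
  ["snaps/0a/0a9f.json", "snaps/1b/1b2c.json"],
  ["snaps/0a/0a9f.json"],
);
```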
Cache Statistics
# Total cache entries
sqlite3 snaps.db "SELECT COUNT(*) FROM snaps"
# Cache size
du -sh snaps/
# Entries per URL (group by host instead for per-provider counts)
sqlite3 snaps.db "SELECT url, COUNT(*) FROM snaps GROUP BY url"
MBID Cache
Purpose
Cache MusicBrainz ID (MBID) mappings for external URLs to avoid repeated API calls.
Storage Location
- Development: localStorage (persistent across sessions)
- Production: sessionStorage (cleared on browser close)
Rationale: Development benefits from persistent cache, production prioritizes fresh data.
Cache Structure
interface MBIDCache {
[externalUrl: string]: MBIDCacheEntry;
}
interface MBIDCacheEntry {
mbid: string;
type: 'release' | 'release-group' | 'recording' | 'artist' | 'label';
cached: number; // Unix timestamp
}
Cache Operations
Store MBID Mapping
function cacheMBID(url: string, mbid: string, type: MBIDCacheEntry['type']): void {
  const cache = getMBIDCache();
  cache[url] = {
    mbid,
    type,
    cached: Date.now()
  };
  setMBIDCache(cache);
}
function getMBIDCache(): MBIDCache {
  // DENO_DEPLOYMENT_ID is set on Deno Deploy, which is treated as production
  const storage = Deno.env.get('DENO_DEPLOYMENT_ID') ? sessionStorage : localStorage;
  const cached = storage.getItem('harmony_mbid_cache');
  return cached ? JSON.parse(cached) : {};
}
function setMBIDCache(cache: MBIDCache): void {
  const storage = Deno.env.get('DENO_DEPLOYMENT_ID') ? sessionStorage : localStorage;
  storage.setItem('harmony_mbid_cache', JSON.stringify(cache));
}
Retrieve MBID Mapping
function getCachedMBID(url: string): MBIDCacheEntry | null {
const cache = getMBIDCache();
const entry = cache[url];
if (!entry) return null;
// Check if cache is stale (24 hours)
const maxAge = 24 * 60 * 60 * 1000;
if (Date.now() - entry.cached > maxAge) {
delete cache[url];
setMBIDCache(cache);
return null;
}
return entry;
}
Batch MBID Lookup
The MusicBrainz API supports batch URL lookup (up to 100 URLs per request):
async function resolveMBIDs(urls: string[]): Promise<Map<string, MBIDCacheEntry>> {
  const results = new Map<string, MBIDCacheEntry>();
  // Check cache first
  const uncached: string[] = [];
  for (const url of urls) {
    const cached = getCachedMBID(url);
    if (cached) {
      results.set(url, cached);
    } else {
      uncached.push(url);
    }
  }
  // Batch lookup uncached URLs (100 at a time)
  for (let i = 0; i < uncached.length; i += 100) {
    const batch = uncached.slice(i, i + 100);
    const params = batch.map(url => `resource=${encodeURIComponent(url)}`).join('&');
    // Request JSON explicitly and include release relationships in the result
    const response = await fetch(`https://musicbrainz.org/ws/2/url?${params}&inc=release-rels&fmt=json`);
    if (!response.ok) continue; // Skip the batch on HTTP errors (e.g., no match)
    const data = await response.json();
    // Fall back to treating the body as a single URL object when no `urls` array is present
    const urlEntries = data.urls ?? [data];
    // Parse response and cache results
    for (const urlData of urlEntries) {
      const mbid = urlData.relations?.[0]?.release?.id;
      if (mbid) {
        // The MBID is read from the release relationship, so the entity type is 'release'
        const type = 'release' as const;
        cacheMBID(urlData.resource, mbid, type);
        results.set(urlData.resource, { mbid, type, cached: Date.now() });
      }
    }
  }
  return results;
}
Core Data Model: HarmonyRelease
Schema Definition
Location: harmonizer/types.ts (273 lines)
Full Interface:
interface HarmonyRelease {
// ===== Basic Metadata =====
title: string;
artists: ArtistCreditName[];
gtin?: string; // Global Trade Item Number (barcode)
// ===== Media and Tracks =====
media: HarmonyMedium[];
// ===== Release Details =====
language?: string; // ISO 639-3 code
script?: string; // ISO 15924 code
status?: ReleaseStatus;
types: ReleaseType[];
releaseDate?: PartialDate;
// ===== Commercial Information =====
labels: Label[];
packaging?: PackagingType;
copyright?: string;
// ===== Distribution =====
availableIn?: string[]; // ISO 3166-1 alpha-2 country codes
excludedFrom?: string[]; // ISO 3166-1 alpha-2 country codes
// ===== Visual Assets =====
images: Image[];
// ===== External Links =====
externalLinks: ExternalLink[];
// ===== Metadata About Metadata =====
info: ReleaseInfo;
}
Sub-Structures
ArtistCreditName
interface ArtistCreditName {
name: string; // Artist name
creditedName?: string; // Name as credited on the release, if it differs (e.g., "Artist C (DJ Set)")
joinPhrase?: string; // Separator (e.g., " & ", " feat. ", " vs. ")
mbid?: string; // MusicBrainz artist ID
}
Example:
[
{ name: "Artist A", joinPhrase: " & " },
{ name: "Artist B", joinPhrase: " feat. " },
{ name: "Artist C", creditedName: "Artist C (DJ Set)" }
]
Rendering: "Artist A & Artist B feat. Artist C (DJ Set)"
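The rendering rule implied by the example can be sketched as follows. `renderArtistCredit` is a hypothetical function name; the sketch assumes that `creditedName` replaces `name` when present and that a missing `joinPhrase` renders as an empty string.

```typescript
interface ArtistCreditName {
  name: string;
  creditedName?: string;
  joinPhrase?: string;
  mbid?: string;
}

// Hypothetical helper: concatenate each credited name with its join phrase.
function renderArtistCredit(credits: ArtistCreditName[]): string {
  return credits
    .map((credit) => (credit.creditedName ?? credit.name) + (credit.joinPhrase ?? ""))
    .join("");
}

// Reproduces the rendering shown above.
const rendered = renderArtistCredit([
  { name: "Artist A", joinPhrase: " & " },
  { name: "Artist B", joinPhrase: " feat. " },
  { name: "Artist C", creditedName: "Artist C (DJ Set)" },
]);
```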
HarmonyMedium
interface HarmonyMedium {
title?: string; // Medium title (e.g., "Disc 1: The Album")
format?: MediumFormat;
position: number; // 1-indexed
tracks: HarmonyTrack[];
}
enum MediumFormat {
CD = 'CD',
Vinyl = 'Vinyl',
Digital = 'Digital Media',
Cassette = 'Cassette',
DVD = 'DVD',
BluRay = 'Blu-ray',
Other = 'Other'
}
HarmonyTrack
interface HarmonyTrack {
title: string;
artists?: ArtistCreditName[]; // Track-specific artists (overrides release artists)
position: number; // 1-indexed within medium
length?: number; // Duration in milliseconds
isrc?: string; // International Standard Recording Code
}
Example:
{
title: "Track Title",
artists: [{ name: "Track Artist" }],
position: 1,
length: 245000, // 4:05
isrc: "USRC17607839"
}
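Since `length` is stored in milliseconds, displaying it requires a conversion. A minimal sketch (the helper name `formatDuration` is an assumption, as is rounding to the nearest second):

```typescript
// Hypothetical helper: format a millisecond duration as m:ss.
function formatDuration(ms: number): string {
  const totalSeconds = Math.round(ms / 1000); // round to the nearest second
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}

// The example track above: 245000 ms is 245 s, i.e. 4 minutes and 5 seconds.
const display = formatDuration(245000);
```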
Label
interface Label {
name: string;
catalogNumber?: string;
mbid?: string; // MusicBrainz label ID
}
Example:
[
{ name: "Record Label", catalogNumber: "RL-12345" },
{ name: "Distributor", catalogNumber: "DIST-67890" }
]
Image
interface Image {
url: string;
types: ImageType[];
width?: number;
height?: number;
comment?: string;
}
enum ImageType {
Front = 'front',
Back = 'back',
Medium = 'medium',
Tray = 'tray',
Booklet = 'booklet',
Obi = 'obi',
Spine = 'spine',
Track = 'track',
Liner = 'liner',
Sticker = 'sticker',
Poster = 'poster',
Watermark = 'watermark',
Raw = 'raw',
Unedited = 'unedited'
}
Example:
[
{
url: "https://i.scdn.co/image/ab67616d0000b273...",
types: [ImageType.Front],
width: 2000,
height: 2000
},
{
url: "https://e-cdn-images.dzcdn.net/images/cover/...",
types: [ImageType.Front],
width: 1400,
height: 1400,
comment: "Deezer cover"
}
]
ExternalLink
interface ExternalLink {
url: string;
types: LinkType[];
}
enum LinkType {
Streaming = 'streaming',
Purchase = 'purchase',
Download = 'download',
License = 'license',
Crowdfunding = 'crowdfunding',
Other = 'other'
}
Example:
[
{
url: "https://open.spotify.com/album/xyz",
types: [LinkType.Streaming]
},
{
url: "https://bandcamp.com/album/xyz",
types: [LinkType.Streaming, LinkType.Purchase]
}
]
ReleaseInfo
interface ReleaseInfo {
providers: string[]; // Provider names that contributed data
messages: Message[]; // Warnings, errors, info messages
sourceMap?: SourceMap; // Property -> provider mapping (only in MergedHarmonyRelease)
incompatibleData?: IncompatibilityInfo; // Conflicts (only in MergedHarmonyRelease)
}
interface Message {
level: 'error' | 'warning' | 'info';
text: string;
provider?: string;
}
Example:
{
providers: ["spotify", "deezer", "itunes"],
messages: [
{
level: "warning",
text: "Release date conflict: Spotify (2014-11-24) vs iTunes (2014-11-25)",
provider: "itunes"
},
{
level: "info",
text: "Using Spotify value (higher preference)"
}
]
}
Enumerations
ReleaseStatus
enum ReleaseStatus {
Official = 'official',
Promotion = 'promotion',
Bootleg = 'bootleg',
PseudoRelease = 'pseudo-release'
}
ReleaseType
enum ReleaseType {
// Primary types
Album = 'album',
Single = 'single',
EP = 'ep',
Broadcast = 'broadcast',
Other = 'other',
// Secondary types
Compilation = 'compilation',
Soundtrack = 'soundtrack',
Spokenword = 'spokenword',
Interview = 'interview',
Audiobook = 'audiobook',
AudioDrama = 'audio drama',
Live = 'live',
Remix = 'remix',
DJMix = 'dj-mix',
Mixtape = 'mixtape',
Demo = 'demo',
FieldRecording = 'field recording'
}
Usage: Array of types (primary + secondary)
types: [ReleaseType.Album, ReleaseType.Live] // Live album
types: [ReleaseType.EP, ReleaseType.Remix] // Remix EP
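Because primary and secondary types share one array, consumers need a way to tell them apart. The sketch below uses plain strings copied from the enum values above to stay self-contained; `PRIMARY_TYPES` and `splitReleaseTypes` are hypothetical names, and this `isPrimaryType` is one possible implementation rather than Harmony's own.

```typescript
// Primary type values copied from the ReleaseType enum above.
const PRIMARY_TYPES = new Set(["album", "single", "ep", "broadcast", "other"]);

function isPrimaryType(type: string): boolean {
  return PRIMARY_TYPES.has(type);
}

// Hypothetical helper: split a mixed array into primary and secondary types.
function splitReleaseTypes(
  types: string[],
): { primary: string | undefined; secondary: string[] } {
  return {
    primary: types.find((t) => isPrimaryType(t)), // first primary type wins
    secondary: types.filter((t) => !isPrimaryType(t)),
  };
}

// A live album: one primary type plus one secondary type.
const liveAlbum = splitReleaseTypes(["album", "live"]);
```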
PackagingType
enum PackagingType {
JewelCase = 'jewel case',
SlimJewelCase = 'slim jewel case',
Digipak = 'digipak',
Cardboard = 'cardboard/paper sleeve',
KeepCase = 'keep case',
None = 'none',
Other = 'other'
}
PartialDate
interface PartialDate {
year: number;
month?: number; // 1-12
day?: number; // 1-31
}
Examples:
{ year: 2014 } // Year only
{ year: 2014, month: 11 } // Year and month
{ year: 2014, month: 11, day: 24 } // Full date
Serialization:
function serializePartialDate(date: PartialDate): string {
let result = date.year.toString();
if (date.month) {
result += `-${date.month.toString().padStart(2, '0')}`;
if (date.day) {
result += `-${date.day.toString().padStart(2, '0')}`;
}
}
return result;
}
// Examples:
// { year: 2014 } -> "2014"
// { year: 2014, month: 11 } -> "2014-11"
// { year: 2014, month: 11, day: 24 } -> "2014-11-24"
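The inverse direction, turning a provider's date string into a PartialDate, could look like the sketch below. `parsePartialDate` is a hypothetical name, and the accepted formats (YYYY, YYYY-MM, YYYY-MM-DD) are an assumption based on the serialized forms above.

```typescript
interface PartialDate {
  year: number;
  month?: number; // 1-12
  day?: number; // 1-31
}

// Hypothetical helper: parse "YYYY", "YYYY-MM", or "YYYY-MM-DD".
// Returns null for any other shape; range checks are left to validation.
function parsePartialDate(text: string): PartialDate | null {
  const match = /^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$/.exec(text);
  if (!match) return null;
  const date: PartialDate = { year: Number(match[1]) };
  if (match[2] !== undefined) date.month = Number(match[2]);
  if (match[3] !== undefined) date.day = Number(match[3]);
  return date;
}
```

Absent components stay undefined instead of defaulting to 1, so a year-only date is not silently turned into January 1st.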
MergedHarmonyRelease
Extends HarmonyRelease with merge metadata.
interface MergedHarmonyRelease extends HarmonyRelease {
info: ReleaseInfo & {
sourceMap: SourceMap;
incompatibleData?: IncompatibilityInfo;
};
}
interface SourceMap {
[propertyPath: string]: string; // Property path -> provider name
}
interface IncompatibilityInfo {
conflicts: Conflict[];
warnings: string[];
}
interface Conflict {
property: string;
values: ConflictValue[];
}
interface ConflictValue {
provider: string;
value: any;
}
Example:
{
title: "Album Title",
releaseDate: { year: 2014, month: 11, day: 24 },
// ... other fields
info: {
providers: ["spotify", "deezer", "itunes"],
sourceMap: {
"title": "spotify",
"releaseDate": "spotify",
"gtin": "deezer",
"media[0].tracks[0].isrc": "spotify"
},
incompatibleData: {
conflicts: [
{
property: "releaseDate",
values: [
{ provider: "spotify", value: { year: 2014, month: 11, day: 24 } },
{ provider: "itunes", value: { year: 2014, month: 11, day: 25 } }
]
}
],
warnings: [
"Release date conflict resolved using Spotify value (higher preference)"
]
},
messages: []
}
}
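How a single property might end up in `sourceMap` and `incompatibleData` can be sketched with a preference-ordered merge. This is an illustration under assumptions, not Harmony's merge algorithm: `ProviderValue`, `MergedProperty`, and `mergeProperty` are hypothetical names, and values are compared by their JSON serialization.

```typescript
interface ProviderValue {
  provider: string;
  value: unknown;
}

interface MergedProperty {
  value: unknown;
  source?: string; // provider whose value was chosen
  conflicts: ProviderValue[]; // non-empty when providers disagree
}

// Hypothetical helper: choose the value from the most preferred provider
// that supplies one, and report all supplied values if they differ.
function mergeProperty(
  candidates: ProviderValue[],
  preference: string[],
): MergedProperty {
  const rank = (provider: string): number => {
    const index = preference.indexOf(provider);
    return index === -1 ? preference.length : index; // unknown providers rank last
  };
  const present = candidates
    .filter((candidate) => candidate.value !== undefined)
    .sort((a, b) => rank(a.provider) - rank(b.provider));
  const winner = present[0];
  if (winner === undefined) return { value: undefined, conflicts: [] };
  // Assumption: equal values serialize identically (same key order).
  const distinct = new Set(present.map((c) => JSON.stringify(c.value)));
  return {
    value: winner.value,
    source: winner.provider,
    conflicts: distinct.size > 1 ? present : [],
  };
}
```

Applied to the release date in the example, Spotify's value wins because it ranks first in the preference list, while both dates are kept in the conflict record.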
Data Transformations
Provider-Specific to HarmonyRelease
Each provider implements a harmonize() method:
// Spotify example (conceptual)
class SpotifyProvider {
harmonize(spotifyAlbum: SpotifyAlbum): HarmonyRelease {
return {
title: spotifyAlbum.name,
artists: spotifyAlbum.artists.map(a => ({
name: a.name,
mbid: undefined // Spotify doesn't provide MBIDs
})),
gtin: spotifyAlbum.external_ids?.upc,
media: [{
format: MediumFormat.Digital,
position: 1,
tracks: spotifyAlbum.tracks.items.map((t, i) => ({
title: t.name,
position: i + 1,
length: t.duration_ms,
isrc: t.external_ids?.isrc
}))
}],
releaseDate: this.parseDate(spotifyAlbum.release_date),
types: this.inferTypes(spotifyAlbum.album_type),
images: spotifyAlbum.images.map(img => ({
url: img.url,
types: [ImageType.Front],
width: img.width,
height: img.height
})),
externalLinks: [{
url: spotifyAlbum.external_urls.spotify,
types: [LinkType.Streaming]
}],
labels: spotifyAlbum.label ? [{ name: spotifyAlbum.label }] : [],
copyright: spotifyAlbum.copyrights?.[0]?.text,
availableIn: spotifyAlbum.available_markets,
info: {
providers: ["spotify"],
messages: []
}
};
}
}
HarmonyRelease to MusicBrainz Format
Location: musicbrainz/seeding.ts
interface MusicBrainzRelease {
name: string;
artist_credit: MBArtistCredit[];
barcode?: string;
release_events: MBReleaseEvent[];
labels: MBLabel[];
mediums: MBMedium[];
release_group: {
primary_type: string;
secondary_types: string[];
};
language?: string;
script?: string;
packaging?: string;
annotation?: string;
}
function convertToMusicBrainz(release: MergedHarmonyRelease): MusicBrainzRelease {
return {
name: release.title,
artist_credit: release.artists.map(a => ({
name: a.name,
credited_name: a.creditedName,
join_phrase: a.joinPhrase || '',
mbid: a.mbid
})),
barcode: release.gtin,
release_events: convertReleaseEvents(release.releaseDate, release.availableIn),
labels: release.labels.map(l => ({
name: l.name,
catalog_number: l.catalogNumber,
mbid: l.mbid
})),
mediums: release.media.map(m => ({
format: m.format,
position: m.position,
title: m.title,
tracks: m.tracks.map(t => ({
title: t.title,
position: t.position,
length: t.length,
isrc: t.isrc,
artist_credit: t.artists?.map(a => ({
name: a.name,
join_phrase: a.joinPhrase || ''
}))
}))
})),
release_group: {
primary_type: release.types.find(t => isPrimaryType(t)) || 'album',
secondary_types: release.types.filter(t => !isPrimaryType(t))
},
language: release.language,
script: release.script,
packaging: release.packaging,
annotation: buildAnnotation(release)
};
}
Data Validation
GTIN Validation
function validateGTIN(gtin: string): boolean {
// GTIN-13 (EAN-13) validation
if (!/^\d{13}$/.test(gtin)) return false;
// Check digit validation
const digits = gtin.split('').map(Number);
const checksum = digits.slice(0, 12).reduce((sum, digit, i) => {
return sum + digit * (i % 2 === 0 ? 1 : 3);
}, 0);
const checkDigit = (10 - (checksum % 10)) % 10;
return checkDigit === digits[12];
}
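The function above covers only 13-digit codes. GTINs also come in 8-, 12-, and 14-digit forms that use the same check digit scheme when weights are assigned from the right. A sketch of the generalized check, with `isValidGtin` as a hypothetical name:

```typescript
// Hypothetical generalization: validate GTIN-8, GTIN-12 (UPC-A),
// GTIN-13 (EAN-13), and GTIN-14 with one right-aligned weighting rule.
function isValidGtin(gtin: string): boolean {
  if (!/^(\d{8}|\d{12}|\d{13}|\d{14})$/.test(gtin)) return false;
  const digits = gtin.split("").map(Number);
  const checkDigit = digits.pop() as number; // last digit is the check digit
  // Weights alternate 3, 1, 3, ... starting next to the check digit.
  const sum = digits
    .reverse()
    .reduce((acc, digit, i) => acc + digit * (i % 2 === 0 ? 3 : 1), 0);
  return (10 - (sum % 10)) % 10 === checkDigit;
}
```

For 13-digit input this is equivalent to the left-aligned 1, 3, 1, 3 weighting used above, because an even number of data digits makes both alignments coincide.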
ISRC Validation
function validateISRC(isrc: string): boolean {
// Format: CC-XXX-YY-NNNNN
// CC: Country code (2 letters)
// XXX: Registrant code (3 alphanumeric)
// YY: Year (2 digits)
// NNNNN: Designation code (5 digits)
return /^[A-Z]{2}-?[A-Z0-9]{3}-?\d{2}-?\d{5}$/.test(isrc);
}
function normalizeISRC(isrc: string): string {
// Remove hyphens
return isrc.replace(/-/g, '');
}
Date Validation
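function validatePartialDate(date: PartialDate): boolean {
  if (date.year < 1000 || date.year > 9999) return false;
  // A day without a month is not a valid partial date
  if (date.day !== undefined && date.month === undefined) return false;
  // Compare against undefined so that 0 is rejected instead of skipped as falsy
  if (date.month !== undefined && (date.month < 1 || date.month > 12)) return false;
  if (date.day !== undefined && (date.day < 1 || date.day > 31)) return false;
  // Validate day for specific month (day 0 of the following month is the last day of this one)
  if (date.month !== undefined && date.day !== undefined) {
    const daysInMonth = new Date(date.year, date.month, 0).getDate();
    if (date.day > daysInMonth) return false;
  }
  return true;
}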
Data Size Estimates
Typical HarmonyRelease Size
Single-disc album (12 tracks):
- JSON serialized: ~15-25 KB
- With images: ~20-30 KB (image URLs only, not image data)
Multi-disc compilation (50 tracks):
- JSON serialized: ~50-80 KB
Cache Size Estimates
Provider response sizes:
- Spotify album: ~10-20 KB
- Deezer album: ~15-25 KB
- iTunes album: ~20-30 KB
- Bandcamp page: ~50-100 KB (HTML)
Daily cache growth (100 lookups/day):
- Database: ~50 KB (metadata only)
- Files: ~2-5 MB (response bodies)
Annual cache size (36,500 lookups/year):
- Database: ~18 MB
- Files: ~730 MB - 1.8 GB
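The annual figures follow from the daily ones by simple scaling. The sketch below makes the arithmetic explicit; the per-lookup constants are derived from the daily estimates above (50 KB and 2-5 MB per 100 lookups), `estimateCacheGrowth` is a hypothetical name, and 1 MB is taken as 1000 KB.

```typescript
interface GrowthEstimate {
  databaseKB: number;
  filesMinMB: number;
  filesMaxMB: number;
}

// Per-lookup sizes derived from the daily estimates for 100 lookups.
const METADATA_KB_PER_LOOKUP = 0.5; // 50 KB / 100
const BODY_MIN_KB_PER_LOOKUP = 20; // 2 MB / 100
const BODY_MAX_KB_PER_LOOKUP = 50; // 5 MB / 100

// Hypothetical helper: linear growth, assuming nothing is ever evicted.
function estimateCacheGrowth(lookupsPerDay: number, days: number): GrowthEstimate {
  const lookups = lookupsPerDay * days;
  return {
    databaseKB: lookups * METADATA_KB_PER_LOOKUP,
    filesMinMB: (lookups * BODY_MIN_KB_PER_LOOKUP) / 1000,
    filesMaxMB: (lookups * BODY_MAX_KB_PER_LOOKUP) / 1000,
  };
}

// 100 lookups/day for 365 days: 18250 KB of metadata, 730-1825 MB of bodies.
const annual = estimateCacheGrowth(100, 365);
```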
No Migrations
Since Harmony has no traditional database, there are no schema migrations.
Schema evolution strategy:
- Add new optional fields to the HarmonyRelease interface
- Update provider harmonize() methods to populate new fields
- Update merge algorithm to handle new fields
- No data migration required (old cached responses still valid)
Breaking changes:
- Rename or remove fields in HarmonyRelease
- Clear cache (delete snaps.db and snaps/)
- Rebuild cache on next lookup
Summary
Harmony's data architecture demonstrates:
- Cache-first design: snap_storage eliminates the need for a traditional database
- Permalink system: Timestamp-based cache replay enables reproducibility
- Rich data model: The HarmonyRelease schema in the 273-line harmonizer/types.ts covers release, medium, track, label, image, and link metadata
- Type safety: TypeScript types throughout help keep data consistent
- No migrations: Schema evolution without data migration complexity
- Stateless processing: All transformations in-memory, no persistent state
- MBID caching: Efficient batch lookup reduces MusicBrainz API calls
This architecture suits read-heavy, stateless applications where reproducibility and API compliance are priorities.