# Music Metadata Aggregator - Internal Structure A clean, unified schema for storing music metadata from multiple sources. ## Generated Diagrams | Format | File | |--------|------| | **PNG** | [proposed_erd.png](./proposed_erd.png) | | **SVG** | [proposed_erd.svg](./proposed_erd.svg) | | **Source** | [proposed_erd.puml](./proposed_erd.puml) | ![ERD Diagram](./proposed_erd.png) --- ## Design Principles 1. **Single internal structure** - All data from any source converts to this schema 2. **Provenance tracking** - Each record tracks `source` and `source_id` 3. **Duplicate tolerance** - Same entity from different sources stored separately 4. **Read-optimized** - Denormalized where beneficial for API serving --- ## Entity Overview ### Core Entities | Entity | Purpose | Key Fields | |--------|---------|------------| | **artists** | Musicians, bands, producers | name, type, country, formed_date | | **works** | Compositions (the song as written) | title, type, language | | **tracks** | Recordings (specific version of a work) | title, duration, isrc, explicit | | **albums** | Releases (LP, EP, Single, Compilation) | title, type, release_date, upc | | **labels** | Record labels/publishers | name, country | | **genres** | Hierarchical categorization | name, parent_id | ### Relationships | Relationship | Purpose | Key Fields | |--------------|---------|------------| | **track_artists** | Who performed on a track | role (primary, featured, remixer) | | **album_artists** | Who is credited on an album | role, position | | **album_tracks** | Track listing on an album | disc_number, track_number | | **work_artists** | Who wrote/composed a work | role (composer, lyricist) | | **artist_genres** | Artist's genres | - | | **album_genres** | Album's genres | - | | **similar_artists** | Artist recommendations | score (0-1) | ### Content | Entity | Purpose | |--------|---------| | **lyrics** | Song lyrics (plain + synced) | | **playlists** | Collections of tracks | | **playlist_tracks** | Tracks in a playlist | ### External IDs | Entity | Purpose | |--------|---------| | **artist_external_ids** | Spotify ID, MusicBrainz MBID, etc. | | **album_external_ids** | Provider-specific album IDs | | **track_external_ids** | Provider-specific track IDs | --- ## Data Flow ``` ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ Spotify │ │ MusicBrainz │ │ Manual │ │ API │ │ API │ │ Input │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │ │ └───────────────────┼───────────────────┘ │ ▼ ┌────────────────────────┐ │ Normalize & Convert │ │ to Internal Schema │ └────────────┬───────────┘ │ ▼ ┌────────────────────────┐ │ Internal Database │ │ (artists, albums, │ │ tracks, works...) │ └────────────────────────┘ ``` --- ## Entity Relationships ``` ┌─────────┐ │ works │ (composition) └────┬────┘ │ recorded as ▼ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ artists │◄───────►│ tracks │◄───────►│ albums │ └────┬────┘ └────┬────┘ └────┬────┘ │ │ │ │ ┌────┴────┐ │ │ │ lyrics │ │ │ └─────────┘ │ │ │ └──────────────┬───────────────────────┘ │ ┌────┴────┐ │ labels │ └─────────┘ ``` --- ## Provenance Strategy Each record includes: - `source` - Provider name (e.g., "spotify", "musicbrainz", "manual") - `source_id` - ID in the source system - `created_at` / `updated_at` - Timestamps **External IDs tables** allow linking the same entity across providers: ```sql -- Find all Spotify IDs for an artist SELECT source_id, url FROM artist_external_ids WHERE artist_id = ? AND source = 'spotify'; -- Find artist by MusicBrainz MBID SELECT a.* FROM artists a JOIN artist_external_ids e ON a.id = e.artist_id WHERE e.source = 'musicbrainz' AND e.source_id = ?; ``` --- ## Role Types ### Track Artist Roles - `primary` - Main performer - `featured` - Featured artist ("feat.") - `remixer` - Remixed the track - `producer` - Produced the track ### Work Artist Roles - `composer` - Wrote the music - `lyricist` - Wrote the lyrics - `writer` - Wrote both (singer-songwriter) ### Album Artist Roles - `primary` - Main artist - `compiler` - Compilation curator - `various` - Various artists --- ## SQL Schema ```sql -- Core Entities CREATE TABLE artists ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT NOT NULL, sort_name TEXT, artist_type TEXT, country TEXT, formed_date DATE, disbanded_date DATE, description TEXT, image_url TEXT, source TEXT NOT NULL, source_id TEXT, created_at TIMESTAMPTZ DEFAULT now(), updated_at TIMESTAMPTZ DEFAULT now() ); CREATE TABLE works ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), title TEXT NOT NULL, work_type TEXT, language TEXT, source TEXT NOT NULL, source_id TEXT, created_at TIMESTAMPTZ DEFAULT now(), updated_at TIMESTAMPTZ DEFAULT now() ); CREATE TABLE tracks ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), work_id UUID REFERENCES works(id), title TEXT NOT NULL, duration_ms INT, isrc TEXT, explicit BOOLEAN DEFAULT false, source TEXT NOT NULL, source_id TEXT, created_at TIMESTAMPTZ DEFAULT now(), updated_at TIMESTAMPTZ DEFAULT now() ); CREATE TABLE labels ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT NOT NULL, country TEXT, founded_date DATE, source TEXT NOT NULL, source_id TEXT, created_at TIMESTAMPTZ DEFAULT now(), updated_at TIMESTAMPTZ DEFAULT now() ); CREATE TABLE albums ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), label_id UUID REFERENCES labels(id), title TEXT NOT NULL, album_type TEXT, release_date DATE, upc TEXT, total_tracks INT, total_discs INT DEFAULT 1, cover_url TEXT, source TEXT NOT NULL, source_id TEXT, created_at TIMESTAMPTZ DEFAULT now(), updated_at TIMESTAMPTZ DEFAULT now() ); CREATE TABLE genres ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT NOT NULL UNIQUE, parent_id UUID REFERENCES genres(id) ); -- Relationships CREATE TABLE track_artists ( track_id UUID REFERENCES tracks(id) ON DELETE CASCADE, artist_id UUID REFERENCES artists(id) ON DELETE CASCADE, role TEXT DEFAULT 'primary', position INT DEFAULT 0, PRIMARY KEY (track_id, artist_id, role) ); CREATE TABLE album_artists ( album_id UUID REFERENCES albums(id) ON DELETE CASCADE, artist_id UUID REFERENCES artists(id) ON DELETE CASCADE, role TEXT DEFAULT 'primary', position INT DEFAULT 0, PRIMARY KEY (album_id, artist_id, role) ); CREATE TABLE album_tracks ( album_id UUID REFERENCES albums(id) ON DELETE CASCADE, track_id UUID REFERENCES tracks(id) ON DELETE CASCADE, disc_number INT DEFAULT 1, track_number INT NOT NULL, PRIMARY KEY (album_id, track_id) ); CREATE TABLE work_artists ( work_id UUID REFERENCES works(id) ON DELETE CASCADE, artist_id UUID REFERENCES artists(id) ON DELETE CASCADE, role TEXT DEFAULT 'writer', PRIMARY KEY (work_id, artist_id, role) ); CREATE TABLE artist_genres ( artist_id UUID REFERENCES artists(id) ON DELETE CASCADE, genre_id UUID REFERENCES genres(id) ON DELETE CASCADE, PRIMARY KEY (artist_id, genre_id) ); CREATE TABLE album_genres ( album_id UUID REFERENCES albums(id) ON DELETE CASCADE, genre_id UUID REFERENCES genres(id) ON DELETE CASCADE, PRIMARY KEY (album_id, genre_id) ); CREATE TABLE similar_artists ( artist_id UUID REFERENCES artists(id) ON DELETE CASCADE, similar_artist_id UUID REFERENCES artists(id) ON DELETE CASCADE, score REAL DEFAULT 0.5, PRIMARY KEY (artist_id, similar_artist_id) ); -- Content CREATE TABLE lyrics ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), track_id UUID REFERENCES tracks(id) ON DELETE CASCADE, content TEXT, synced_content JSONB, language TEXT, source TEXT NOT NULL, source_id TEXT, created_at TIMESTAMPTZ DEFAULT now() ); CREATE TABLE playlists ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT NOT NULL, description TEXT, is_public BOOLEAN DEFAULT true, cover_url TEXT, created_at TIMESTAMPTZ DEFAULT now(), updated_at TIMESTAMPTZ DEFAULT now() ); CREATE TABLE playlist_tracks ( playlist_id UUID REFERENCES playlists(id) ON DELETE CASCADE, track_id UUID REFERENCES tracks(id) ON DELETE CASCADE, position INT NOT NULL, added_at TIMESTAMPTZ DEFAULT now(), PRIMARY KEY (playlist_id, track_id) ); -- External IDs CREATE TABLE artist_external_ids ( artist_id UUID REFERENCES artists(id) ON DELETE CASCADE, source TEXT NOT NULL, source_id TEXT NOT NULL, url TEXT, fetched_at TIMESTAMPTZ DEFAULT now(), PRIMARY KEY (artist_id, source, source_id) ); CREATE TABLE album_external_ids ( album_id UUID REFERENCES albums(id) ON DELETE CASCADE, source TEXT NOT NULL, source_id TEXT NOT NULL, url TEXT, fetched_at TIMESTAMPTZ DEFAULT now(), PRIMARY KEY (album_id, source, source_id) ); CREATE TABLE track_external_ids ( track_id UUID REFERENCES tracks(id) ON DELETE CASCADE, source TEXT NOT NULL, source_id TEXT NOT NULL, url TEXT, fetched_at TIMESTAMPTZ DEFAULT now(), PRIMARY KEY (track_id, source, source_id) ); -- Indexes for common queries CREATE INDEX idx_artists_name ON artists(name); CREATE INDEX idx_artists_source ON artists(source, source_id); CREATE INDEX idx_tracks_isrc ON tracks(isrc) WHERE isrc IS NOT NULL; CREATE INDEX idx_tracks_source ON tracks(source, source_id); CREATE INDEX idx_albums_upc ON albums(upc) WHERE upc IS NOT NULL; CREATE INDEX idx_albums_source ON albums(source, source_id); CREATE INDEX idx_albums_release_date ON albums(release_date); ``` --- ## Example Queries ### Get album with all tracks and artists ```sql SELECT a.title as album_title, a.release_date, t.title as track_title, t.duration_ms, at.track_number, ar.name as artist_name, ta.role FROM albums a JOIN album_tracks at ON a.id = at.album_id JOIN tracks t ON at.track_id = t.id JOIN track_artists ta ON t.id = ta.track_id JOIN artists ar ON ta.artist_id = ar.id WHERE a.id = ? ORDER BY at.disc_number, at.track_number, ta.position; ``` ### Find all versions of a song (via work) ```sql SELECT t.title, t.duration_ms, a.name as artist, al.title as album, al.release_date FROM works w JOIN tracks t ON t.work_id = w.id JOIN track_artists ta ON t.id = ta.track_id AND ta.role = 'primary' JOIN artists a ON ta.artist_id = a.id LEFT JOIN album_tracks alt ON t.id = alt.track_id LEFT JOIN albums al ON alt.album_id = al.id WHERE w.title ILIKE '%bohemian rhapsody%' ORDER BY al.release_date; ``` ### Get artist discography ```sql SELECT al.title, al.album_type, al.release_date, al.total_tracks FROM artists ar JOIN album_artists aa ON ar.id = aa.artist_id JOIN albums al ON aa.album_id = al.id WHERE ar.id = ? AND aa.role = 'primary' ORDER BY al.release_date DESC; ```