a1f6701bac
- gRPC service with MusicBrainz provider - PostgreSQL schema with migrations - Service layer with database-first caching - Repository pattern for data access - YAML configuration support - Research documentation for 17 music metadata projects
12 KiB
12 KiB
Music Metadata Aggregator - Internal Structure
A clean, unified schema for storing music metadata from multiple sources.
Generated Diagrams
| Format | File |
|---|---|
| PNG | proposed_erd.png |
| SVG | proposed_erd.svg |
| Source | proposed_erd.puml |
Design Principles
- Single internal structure - All data from any source converts to this schema
- Provenance tracking - Each record tracks
sourceandsource_id - Duplicate tolerance - Same entity from different sources stored separately
- Read-optimized - Denormalized where beneficial for API serving
Entity Overview
Core Entities
| Entity | Purpose | Key Fields |
|---|---|---|
| artists | Musicians, bands, producers | name, type, country, formed_date |
| works | Compositions (the song as written) | title, type, language |
| tracks | Recordings (specific version of a work) | title, duration, isrc, explicit |
| albums | Releases (LP, EP, Single, Compilation) | title, type, release_date, upc |
| labels | Record labels/publishers | name, country |
| genres | Hierarchical categorization | name, parent_id |
Relationships
| Relationship | Purpose | Key Fields |
|---|---|---|
| track_artists | Who performed on a track | role (primary, featured, remixer) |
| album_artists | Who is credited on an album | role, position |
| album_tracks | Track listing on an album | disc_number, track_number |
| work_artists | Who wrote/composed a work | role (composer, lyricist) |
| artist_genres | Artist's genres | - |
| album_genres | Album's genres | - |
| similar_artists | Artist recommendations | score (0-1) |
Content
| Entity | Purpose |
|---|---|
| lyrics | Song lyrics (plain + synced) |
| playlists | Collections of tracks |
| playlist_tracks | Tracks in a playlist |
External IDs
| Entity | Purpose |
|---|---|
| artist_external_ids | Spotify ID, MusicBrainz MBID, etc. |
| album_external_ids | Provider-specific album IDs |
| track_external_ids | Provider-specific track IDs |
Data Flow
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Spotify │ │ MusicBrainz │ │ Manual │
│ API │ │ API │ │ Input │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└───────────────────┼───────────────────┘
│
▼
┌────────────────────────┐
│ Normalize & Convert │
│ to Internal Schema │
└────────────┬───────────┘
│
▼
┌────────────────────────┐
│ Internal Database │
│ (artists, albums, │
│ tracks, works...) │
└────────────────────────┘
Entity Relationships
┌─────────┐
│ works │ (composition)
└────┬────┘
│ recorded as
▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ artists │◄───────►│ tracks │◄───────►│ albums │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
│ ┌────┴────┐ │
│ │ lyrics │ │
│ └─────────┘ │
│ │
└──────────────┬───────────────────────┘
│
┌────┴────┐
│ labels │
└─────────┘
Provenance Strategy
Each record includes:
source- Provider name (e.g., "spotify", "musicbrainz", "manual")source_id- ID in the source systemcreated_at/updated_at- Timestamps
External IDs tables allow linking the same entity across providers:
-- Find all Spotify IDs for an artist
SELECT source_id, url
FROM artist_external_ids
WHERE artist_id = ? AND source = 'spotify';
-- Find artist by MusicBrainz MBID
SELECT a.*
FROM artists a
JOIN artist_external_ids e ON a.id = e.artist_id
WHERE e.source = 'musicbrainz' AND e.source_id = ?;
Role Types
Track Artist Roles
primary- Main performerfeatured- Featured artist ("feat.")remixer- Remixed the trackproducer- Produced the track
Work Artist Roles
composer- Wrote the musiclyricist- Wrote the lyricswriter- Wrote both (singer-songwriter)
Album Artist Roles
primary- Main artistcompiler- Compilation curatorvarious- Various artists
SQL Schema
-- Core Entities
CREATE TABLE artists (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
sort_name TEXT,
artist_type TEXT,
country TEXT,
formed_date DATE,
disbanded_date DATE,
description TEXT,
image_url TEXT,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE works (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
work_type TEXT,
language TEXT,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE tracks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_id UUID REFERENCES works(id),
title TEXT NOT NULL,
duration_ms INT,
isrc TEXT,
explicit BOOLEAN DEFAULT false,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE labels (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
country TEXT,
founded_date DATE,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE albums (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
label_id UUID REFERENCES labels(id),
title TEXT NOT NULL,
album_type TEXT,
release_date DATE,
upc TEXT,
total_tracks INT,
total_discs INT DEFAULT 1,
cover_url TEXT,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE genres (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL UNIQUE,
parent_id UUID REFERENCES genres(id)
);
-- Relationships
CREATE TABLE track_artists (
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
role TEXT DEFAULT 'primary',
position INT DEFAULT 0,
PRIMARY KEY (track_id, artist_id, role)
);
CREATE TABLE album_artists (
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
role TEXT DEFAULT 'primary',
position INT DEFAULT 0,
PRIMARY KEY (album_id, artist_id, role)
);
CREATE TABLE album_tracks (
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
disc_number INT DEFAULT 1,
track_number INT NOT NULL,
PRIMARY KEY (album_id, track_id)
);
CREATE TABLE work_artists (
work_id UUID REFERENCES works(id) ON DELETE CASCADE,
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
role TEXT DEFAULT 'writer',
PRIMARY KEY (work_id, artist_id, role)
);
CREATE TABLE artist_genres (
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
genre_id UUID REFERENCES genres(id) ON DELETE CASCADE,
PRIMARY KEY (artist_id, genre_id)
);
CREATE TABLE album_genres (
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
genre_id UUID REFERENCES genres(id) ON DELETE CASCADE,
PRIMARY KEY (album_id, genre_id)
);
CREATE TABLE similar_artists (
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
similar_artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
score REAL DEFAULT 0.5,
PRIMARY KEY (artist_id, similar_artist_id)
);
-- Content
CREATE TABLE lyrics (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
content TEXT,
synced_content JSONB,
language TEXT,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE playlists (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
description TEXT,
is_public BOOLEAN DEFAULT true,
cover_url TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE playlist_tracks (
playlist_id UUID REFERENCES playlists(id) ON DELETE CASCADE,
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
position INT NOT NULL,
added_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (playlist_id, track_id)
);
-- External IDs
CREATE TABLE artist_external_ids (
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
source TEXT NOT NULL,
source_id TEXT NOT NULL,
url TEXT,
fetched_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (artist_id, source, source_id)
);
CREATE TABLE album_external_ids (
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
source TEXT NOT NULL,
source_id TEXT NOT NULL,
url TEXT,
fetched_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (album_id, source, source_id)
);
CREATE TABLE track_external_ids (
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
source TEXT NOT NULL,
source_id TEXT NOT NULL,
url TEXT,
fetched_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (track_id, source, source_id)
);
-- Indexes for common queries
CREATE INDEX idx_artists_name ON artists(name);
CREATE INDEX idx_artists_source ON artists(source, source_id);
CREATE INDEX idx_tracks_isrc ON tracks(isrc) WHERE isrc IS NOT NULL;
CREATE INDEX idx_tracks_source ON tracks(source, source_id);
CREATE INDEX idx_albums_upc ON albums(upc) WHERE upc IS NOT NULL;
CREATE INDEX idx_albums_source ON albums(source, source_id);
CREATE INDEX idx_albums_release_date ON albums(release_date);
Example Queries
Get album with all tracks and artists
SELECT
a.title as album_title,
a.release_date,
t.title as track_title,
t.duration_ms,
at.track_number,
ar.name as artist_name,
ta.role
FROM albums a
JOIN album_tracks at ON a.id = at.album_id
JOIN tracks t ON at.track_id = t.id
JOIN track_artists ta ON t.id = ta.track_id
JOIN artists ar ON ta.artist_id = ar.id
WHERE a.id = ?
ORDER BY at.disc_number, at.track_number, ta.position;
Find all versions of a song (via work)
SELECT
t.title,
t.duration_ms,
a.name as artist,
al.title as album,
al.release_date
FROM works w
JOIN tracks t ON t.work_id = w.id
JOIN track_artists ta ON t.id = ta.track_id AND ta.role = 'primary'
JOIN artists a ON ta.artist_id = a.id
LEFT JOIN album_tracks alt ON t.id = alt.track_id
LEFT JOIN albums al ON alt.album_id = al.id
WHERE w.title ILIKE '%bohemian rhapsody%'
ORDER BY al.release_date;
Get artist discography
SELECT
al.title,
al.album_type,
al.release_date,
al.total_tracks
FROM artists ar
JOIN album_artists aa ON ar.id = aa.artist_id
JOIN albums al ON aa.album_id = al.id
WHERE ar.id = ? AND aa.role = 'primary'
ORDER BY al.release_date DESC;
