a1f6701bac
- gRPC service with MusicBrainz provider - PostgreSQL schema with migrations - Service layer with database-first caching - Repository pattern for data access - YAML configuration support - Research documentation for 17 music metadata projects
413 lines
12 KiB
Markdown
413 lines
12 KiB
Markdown
# Music Metadata Aggregator - Internal Structure
|
|
|
|
A clean, unified schema for storing music metadata from multiple sources.
|
|
|
|
## Generated Diagrams
|
|
|
|
| Format | File |
|
|
|--------|------|
|
|
| **PNG** | [proposed_erd.png](./proposed_erd.png) |
|
|
| **SVG** | [proposed_erd.svg](./proposed_erd.svg) |
|
|
| **Source** | [proposed_erd.puml](./proposed_erd.puml) |
|
|
|
|

|
|
|
|
---
|
|
|
|
## Design Principles
|
|
|
|
1. **Single internal structure** - All data from any source converts to this schema
|
|
2. **Provenance tracking** - Each record tracks `source` and `source_id`
|
|
3. **Duplicate tolerance** - Same entity from different sources stored separately
|
|
4. **Read-optimized** - Denormalized where beneficial for API serving
|
|
|
|
---
|
|
|
|
## Entity Overview
|
|
|
|
### Core Entities
|
|
|
|
| Entity | Purpose | Key Fields |
|
|
|--------|---------|------------|
|
|
| **artists** | Musicians, bands, producers | name, type, country, formed_date |
|
|
| **works** | Compositions (the song as written) | title, type, language |
|
|
| **tracks** | Recordings (specific version of a work) | title, duration, isrc, explicit |
|
|
| **albums** | Releases (LP, EP, Single, Compilation) | title, type, release_date, upc |
|
|
| **labels** | Record labels/publishers | name, country |
|
|
| **genres** | Hierarchical categorization | name, parent_id |
|
|
|
|
### Relationships
|
|
|
|
| Relationship | Purpose | Key Fields |
|
|
|--------------|---------|------------|
|
|
| **track_artists** | Who performed on a track | role (primary, featured, remixer) |
|
|
| **album_artists** | Who is credited on an album | role, position |
|
|
| **album_tracks** | Track listing on an album | disc_number, track_number |
|
|
| **work_artists** | Who wrote/composed a work | role (composer, lyricist) |
|
|
| **artist_genres** | Artist's genres | - |
|
|
| **album_genres** | Album's genres | - |
|
|
| **similar_artists** | Artist recommendations | score (0-1) |
|
|
|
|
### Content
|
|
|
|
| Entity | Purpose |
|
|
|--------|---------|
|
|
| **lyrics** | Song lyrics (plain + synced) |
|
|
| **playlists** | Collections of tracks |
|
|
| **playlist_tracks** | Tracks in a playlist |
|
|
|
|
### External IDs
|
|
|
|
| Entity | Purpose |
|
|
|--------|---------|
|
|
| **artist_external_ids** | Spotify ID, MusicBrainz MBID, etc. |
|
|
| **album_external_ids** | Provider-specific album IDs |
|
|
| **track_external_ids** | Provider-specific track IDs |
|
|
|
|
---
|
|
|
|
## Data Flow
|
|
|
|
```
|
|
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
|
│ Spotify │ │ MusicBrainz │ │ Manual │
|
|
│ API │ │ API │ │ Input │
|
|
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
|
|
│ │ │
|
|
└───────────────────┼───────────────────┘
|
|
│
|
|
▼
|
|
┌────────────────────────┐
|
|
│ Normalize & Convert │
|
|
│ to Internal Schema │
|
|
└────────────┬───────────┘
|
|
│
|
|
▼
|
|
┌────────────────────────┐
|
|
│ Internal Database │
|
|
│ (artists, albums, │
|
|
│ tracks, works...) │
|
|
└────────────────────────┘
|
|
```
|
|
|
|
---
|
|
|
|
## Entity Relationships
|
|
|
|
```
|
|
┌─────────┐
|
|
│ works │ (composition)
|
|
└────┬────┘
|
|
│ recorded as
|
|
▼
|
|
┌─────────┐ ┌─────────┐ ┌─────────┐
|
|
│ artists │◄───────►│ tracks │◄───────►│ albums │
|
|
└────┬────┘ └────┬────┘ └────┬────┘
|
|
│ │ │
|
|
│ ┌────┴────┐ │
|
|
│ │ lyrics │ │
|
|
│ └─────────┘ │
|
|
│ │
|
|
└──────────────┬───────────────────────┘
|
|
│
|
|
┌────┴────┐
|
|
│ labels │
|
|
└─────────┘
|
|
```
|
|
|
|
---
|
|
|
|
## Provenance Strategy
|
|
|
|
Each record includes:
|
|
- `source` - Provider name (e.g., "spotify", "musicbrainz", "manual")
|
|
- `source_id` - ID in the source system
|
|
- `created_at` / `updated_at` - Timestamps
|
|
|
|
**External IDs tables** allow linking the same entity across providers:
|
|
```sql
|
|
-- Find all Spotify IDs for an artist
|
|
SELECT source_id, url
|
|
FROM artist_external_ids
|
|
WHERE artist_id = ? AND source = 'spotify';
|
|
|
|
-- Find artist by MusicBrainz MBID
|
|
SELECT a.*
|
|
FROM artists a
|
|
JOIN artist_external_ids e ON a.id = e.artist_id
|
|
WHERE e.source = 'musicbrainz' AND e.source_id = ?;
|
|
```
|
|
|
|
---
|
|
|
|
## Role Types
|
|
|
|
### Track Artist Roles
|
|
- `primary` - Main performer
|
|
- `featured` - Featured artist ("feat.")
|
|
- `remixer` - Remixed the track
|
|
- `producer` - Produced the track
|
|
|
|
### Work Artist Roles
|
|
- `composer` - Wrote the music
|
|
- `lyricist` - Wrote the lyrics
|
|
- `writer` - Wrote both (singer-songwriter)
|
|
|
|
### Album Artist Roles
|
|
- `primary` - Main artist
|
|
- `compiler` - Compilation curator
|
|
- `various` - Various artists
|
|
|
|
---
|
|
|
|
## SQL Schema
|
|
|
|
```sql
|
|
-- Core Entities
|
|
CREATE TABLE artists (
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
name TEXT NOT NULL,
|
|
sort_name TEXT,
|
|
artist_type TEXT,
|
|
country TEXT,
|
|
formed_date DATE,
|
|
disbanded_date DATE,
|
|
description TEXT,
|
|
image_url TEXT,
|
|
source TEXT NOT NULL,
|
|
source_id TEXT,
|
|
created_at TIMESTAMPTZ DEFAULT now(),
|
|
updated_at TIMESTAMPTZ DEFAULT now()
|
|
);
|
|
|
|
CREATE TABLE works (
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
title TEXT NOT NULL,
|
|
work_type TEXT,
|
|
language TEXT,
|
|
source TEXT NOT NULL,
|
|
source_id TEXT,
|
|
created_at TIMESTAMPTZ DEFAULT now(),
|
|
updated_at TIMESTAMPTZ DEFAULT now()
|
|
);
|
|
|
|
CREATE TABLE tracks (
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
work_id UUID REFERENCES works(id),
|
|
title TEXT NOT NULL,
|
|
duration_ms INT,
|
|
isrc TEXT,
|
|
explicit BOOLEAN DEFAULT false,
|
|
source TEXT NOT NULL,
|
|
source_id TEXT,
|
|
created_at TIMESTAMPTZ DEFAULT now(),
|
|
updated_at TIMESTAMPTZ DEFAULT now()
|
|
);
|
|
|
|
CREATE TABLE labels (
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
name TEXT NOT NULL,
|
|
country TEXT,
|
|
founded_date DATE,
|
|
source TEXT NOT NULL,
|
|
source_id TEXT,
|
|
created_at TIMESTAMPTZ DEFAULT now(),
|
|
updated_at TIMESTAMPTZ DEFAULT now()
|
|
);
|
|
|
|
CREATE TABLE albums (
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
label_id UUID REFERENCES labels(id),
|
|
title TEXT NOT NULL,
|
|
album_type TEXT,
|
|
release_date DATE,
|
|
upc TEXT,
|
|
total_tracks INT,
|
|
total_discs INT DEFAULT 1,
|
|
cover_url TEXT,
|
|
source TEXT NOT NULL,
|
|
source_id TEXT,
|
|
created_at TIMESTAMPTZ DEFAULT now(),
|
|
updated_at TIMESTAMPTZ DEFAULT now()
|
|
);
|
|
|
|
CREATE TABLE genres (
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
name TEXT NOT NULL UNIQUE,
|
|
parent_id UUID REFERENCES genres(id)
|
|
);
|
|
|
|
-- Relationships
|
|
CREATE TABLE track_artists (
|
|
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
|
|
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
|
|
role TEXT DEFAULT 'primary',
|
|
position INT DEFAULT 0,
|
|
PRIMARY KEY (track_id, artist_id, role)
|
|
);
|
|
|
|
CREATE TABLE album_artists (
|
|
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
|
|
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
|
|
role TEXT DEFAULT 'primary',
|
|
position INT DEFAULT 0,
|
|
PRIMARY KEY (album_id, artist_id, role)
|
|
);
|
|
|
|
CREATE TABLE album_tracks (
|
|
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
|
|
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
|
|
disc_number INT DEFAULT 1,
|
|
track_number INT NOT NULL,
|
|
PRIMARY KEY (album_id, track_id)
|
|
);
|
|
|
|
CREATE TABLE work_artists (
|
|
work_id UUID REFERENCES works(id) ON DELETE CASCADE,
|
|
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
|
|
role TEXT DEFAULT 'writer',
|
|
PRIMARY KEY (work_id, artist_id, role)
|
|
);
|
|
|
|
CREATE TABLE artist_genres (
|
|
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
|
|
genre_id UUID REFERENCES genres(id) ON DELETE CASCADE,
|
|
PRIMARY KEY (artist_id, genre_id)
|
|
);
|
|
|
|
CREATE TABLE album_genres (
|
|
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
|
|
genre_id UUID REFERENCES genres(id) ON DELETE CASCADE,
|
|
PRIMARY KEY (album_id, genre_id)
|
|
);
|
|
|
|
CREATE TABLE similar_artists (
|
|
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
|
|
similar_artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
|
|
score REAL DEFAULT 0.5,
|
|
PRIMARY KEY (artist_id, similar_artist_id)
|
|
);
|
|
|
|
-- Content
|
|
CREATE TABLE lyrics (
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
|
|
content TEXT,
|
|
synced_content JSONB,
|
|
language TEXT,
|
|
source TEXT NOT NULL,
|
|
source_id TEXT,
|
|
created_at TIMESTAMPTZ DEFAULT now()
|
|
);
|
|
|
|
CREATE TABLE playlists (
|
|
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
|
name TEXT NOT NULL,
|
|
description TEXT,
|
|
is_public BOOLEAN DEFAULT true,
|
|
cover_url TEXT,
|
|
created_at TIMESTAMPTZ DEFAULT now(),
|
|
updated_at TIMESTAMPTZ DEFAULT now()
|
|
);
|
|
|
|
CREATE TABLE playlist_tracks (
|
|
playlist_id UUID REFERENCES playlists(id) ON DELETE CASCADE,
|
|
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
|
|
position INT NOT NULL,
|
|
added_at TIMESTAMPTZ DEFAULT now(),
|
|
PRIMARY KEY (playlist_id, track_id)
|
|
);
|
|
|
|
-- External IDs
|
|
CREATE TABLE artist_external_ids (
|
|
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
|
|
source TEXT NOT NULL,
|
|
source_id TEXT NOT NULL,
|
|
url TEXT,
|
|
fetched_at TIMESTAMPTZ DEFAULT now(),
|
|
PRIMARY KEY (artist_id, source, source_id)
|
|
);
|
|
|
|
CREATE TABLE album_external_ids (
|
|
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
|
|
source TEXT NOT NULL,
|
|
source_id TEXT NOT NULL,
|
|
url TEXT,
|
|
fetched_at TIMESTAMPTZ DEFAULT now(),
|
|
PRIMARY KEY (album_id, source, source_id)
|
|
);
|
|
|
|
CREATE TABLE track_external_ids (
|
|
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
|
|
source TEXT NOT NULL,
|
|
source_id TEXT NOT NULL,
|
|
url TEXT,
|
|
fetched_at TIMESTAMPTZ DEFAULT now(),
|
|
PRIMARY KEY (track_id, source, source_id)
|
|
);
|
|
|
|
-- Indexes for common queries
|
|
CREATE INDEX idx_artists_name ON artists(name);
|
|
CREATE INDEX idx_artists_source ON artists(source, source_id);
|
|
CREATE INDEX idx_tracks_isrc ON tracks(isrc) WHERE isrc IS NOT NULL;
|
|
CREATE INDEX idx_tracks_source ON tracks(source, source_id);
|
|
CREATE INDEX idx_albums_upc ON albums(upc) WHERE upc IS NOT NULL;
|
|
CREATE INDEX idx_albums_source ON albums(source, source_id);
|
|
CREATE INDEX idx_albums_release_date ON albums(release_date);
|
|
```
|
|
|
|
---
|
|
|
|
## Example Queries
|
|
|
|
### Get album with all tracks and artists
|
|
```sql
|
|
SELECT
|
|
a.title as album_title,
|
|
a.release_date,
|
|
t.title as track_title,
|
|
t.duration_ms,
|
|
at.track_number,
|
|
ar.name as artist_name,
|
|
ta.role
|
|
FROM albums a
|
|
JOIN album_tracks at ON a.id = at.album_id
|
|
JOIN tracks t ON at.track_id = t.id
|
|
JOIN track_artists ta ON t.id = ta.track_id
|
|
JOIN artists ar ON ta.artist_id = ar.id
|
|
WHERE a.id = ?
|
|
ORDER BY at.disc_number, at.track_number, ta.position;
|
|
```
|
|
|
|
### Find all versions of a song (via work)
|
|
```sql
|
|
SELECT
|
|
t.title,
|
|
t.duration_ms,
|
|
a.name as artist,
|
|
al.title as album,
|
|
al.release_date
|
|
FROM works w
|
|
JOIN tracks t ON t.work_id = w.id
|
|
JOIN track_artists ta ON t.id = ta.track_id AND ta.role = 'primary'
|
|
JOIN artists a ON ta.artist_id = a.id
|
|
LEFT JOIN album_tracks alt ON t.id = alt.track_id
|
|
LEFT JOIN albums al ON alt.album_id = al.id
|
|
WHERE w.title ILIKE '%bohemian rhapsody%'
|
|
ORDER BY al.release_date;
|
|
```
|
|
|
|
### Get artist discography
|
|
```sql
|
|
SELECT
|
|
al.title,
|
|
al.album_type,
|
|
al.release_date,
|
|
al.total_tracks
|
|
FROM artists ar
|
|
JOIN album_artists aa ON ar.id = aa.artist_id
|
|
JOIN albums al ON aa.album_id = al.id
|
|
WHERE ar.id = ? AND aa.role = 'primary'
|
|
ORDER BY al.release_date DESC;
|
|
```
|