Files
metadata-agregator/docs/PROPOSED_ERD.md
T
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

413 lines
12 KiB
Markdown

# Music Metadata Aggregator - Internal Structure
A clean, unified schema for storing music metadata from multiple sources.
## Generated Diagrams
| Format | File |
|--------|------|
| **PNG** | [proposed_erd.png](./proposed_erd.png) |
| **SVG** | [proposed_erd.svg](./proposed_erd.svg) |
| **Source** | [proposed_erd.puml](./proposed_erd.puml) |
![ERD Diagram](./proposed_erd.png)
---
## Design Principles
1. **Single internal structure** - All data from any source converts to this schema
2. **Provenance tracking** - Each record tracks `source` and `source_id`
3. **Duplicate tolerance** - Same entity from different sources stored separately
4. **Read-optimized** - Denormalized where beneficial for API serving
---
## Entity Overview
### Core Entities
| Entity | Purpose | Key Fields |
|--------|---------|------------|
| **artists** | Musicians, bands, producers | name, type, country, formed_date |
| **works** | Compositions (the song as written) | title, type, language |
| **tracks** | Recordings (specific version of a work) | title, duration, isrc, explicit |
| **albums** | Releases (LP, EP, Single, Compilation) | title, type, release_date, upc |
| **labels** | Record labels/publishers | name, country |
| **genres** | Hierarchical categorization | name, parent_id |
### Relationships
| Relationship | Purpose | Key Fields |
|--------------|---------|------------|
| **track_artists** | Who performed on a track | role (primary, featured, remixer) |
| **album_artists** | Who is credited on an album | role, position |
| **album_tracks** | Track listing on an album | disc_number, track_number |
| **work_artists** | Who wrote/composed a work | role (composer, lyricist) |
| **artist_genres** | Artist's genres | - |
| **album_genres** | Album's genres | - |
| **similar_artists** | Artist recommendations | score (0-1) |
### Content
| Entity | Purpose |
|--------|---------|
| **lyrics** | Song lyrics (plain + synced) |
| **playlists** | Collections of tracks |
| **playlist_tracks** | Tracks in a playlist |
### External IDs
| Entity | Purpose |
|--------|---------|
| **artist_external_ids** | Spotify ID, MusicBrainz MBID, etc. |
| **album_external_ids** | Provider-specific album IDs |
| **track_external_ids** | Provider-specific track IDs |
---
## Data Flow
```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Spotify │ │ MusicBrainz │ │ Manual │
│ API │ │ API │ │ Input │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└───────────────────┼───────────────────┘
┌────────────────────────┐
│ Normalize & Convert │
│ to Internal Schema │
└────────────┬───────────┘
┌────────────────────────┐
│ Internal Database │
│ (artists, albums, │
│ tracks, works...) │
└────────────────────────┘
```
---
## Entity Relationships
```
┌─────────┐
│ works │ (composition)
└────┬────┘
│ recorded as
┌─────────┐ ┌─────────┐ ┌─────────┐
│ artists │◄───────►│ tracks │◄───────►│ albums │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
│ ┌────┴────┐ │
│ │ lyrics │ │
│ └─────────┘ │
│ │
└──────────────┬───────────────────────┘
┌────┴────┐
│ labels │
└─────────┘
```
---
## Provenance Strategy
Each record includes:
- `source` - Provider name (e.g., "spotify", "musicbrainz", "manual")
- `source_id` - ID in the source system
- `created_at` / `updated_at` - Timestamps
**External IDs tables** allow linking the same entity across providers:
```sql
-- Find all Spotify IDs for an artist
SELECT source_id, url
FROM artist_external_ids
WHERE artist_id = ? AND source = 'spotify';
-- Find artist by MusicBrainz MBID
SELECT a.*
FROM artists a
JOIN artist_external_ids e ON a.id = e.artist_id
WHERE e.source = 'musicbrainz' AND e.source_id = ?;
```
---
## Role Types
### Track Artist Roles
- `primary` - Main performer
- `featured` - Featured artist ("feat.")
- `remixer` - Remixed the track
- `producer` - Produced the track
### Work Artist Roles
- `composer` - Wrote the music
- `lyricist` - Wrote the lyrics
- `writer` - Wrote both (singer-songwriter)
### Album Artist Roles
- `primary` - Main artist
- `compiler` - Compilation curator
- `various` - Various artists
---
## SQL Schema
```sql
-- Core Entities
CREATE TABLE artists (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
sort_name TEXT,
artist_type TEXT,
country TEXT,
formed_date DATE,
disbanded_date DATE,
description TEXT,
image_url TEXT,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE works (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
title TEXT NOT NULL,
work_type TEXT,
language TEXT,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE tracks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
work_id UUID REFERENCES works(id),
title TEXT NOT NULL,
duration_ms INT,
isrc TEXT,
explicit BOOLEAN DEFAULT false,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE labels (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
country TEXT,
founded_date DATE,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE albums (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
label_id UUID REFERENCES labels(id),
title TEXT NOT NULL,
album_type TEXT,
release_date DATE,
upc TEXT,
total_tracks INT,
total_discs INT DEFAULT 1,
cover_url TEXT,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE genres (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL UNIQUE,
parent_id UUID REFERENCES genres(id)
);
-- Relationships
CREATE TABLE track_artists (
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
role TEXT DEFAULT 'primary',
position INT DEFAULT 0,
PRIMARY KEY (track_id, artist_id, role)
);
CREATE TABLE album_artists (
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
role TEXT DEFAULT 'primary',
position INT DEFAULT 0,
PRIMARY KEY (album_id, artist_id, role)
);
CREATE TABLE album_tracks (
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
disc_number INT DEFAULT 1,
track_number INT NOT NULL,
PRIMARY KEY (album_id, track_id)
);
CREATE TABLE work_artists (
work_id UUID REFERENCES works(id) ON DELETE CASCADE,
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
role TEXT DEFAULT 'writer',
PRIMARY KEY (work_id, artist_id, role)
);
CREATE TABLE artist_genres (
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
genre_id UUID REFERENCES genres(id) ON DELETE CASCADE,
PRIMARY KEY (artist_id, genre_id)
);
CREATE TABLE album_genres (
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
genre_id UUID REFERENCES genres(id) ON DELETE CASCADE,
PRIMARY KEY (album_id, genre_id)
);
CREATE TABLE similar_artists (
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
similar_artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
score REAL DEFAULT 0.5,
PRIMARY KEY (artist_id, similar_artist_id)
);
-- Content
CREATE TABLE lyrics (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
content TEXT,
synced_content JSONB,
language TEXT,
source TEXT NOT NULL,
source_id TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE playlists (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
description TEXT,
is_public BOOLEAN DEFAULT true,
cover_url TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE playlist_tracks (
playlist_id UUID REFERENCES playlists(id) ON DELETE CASCADE,
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
position INT NOT NULL,
added_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (playlist_id, track_id)
);
-- External IDs
CREATE TABLE artist_external_ids (
artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
source TEXT NOT NULL,
source_id TEXT NOT NULL,
url TEXT,
fetched_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (artist_id, source, source_id)
);
CREATE TABLE album_external_ids (
album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
source TEXT NOT NULL,
source_id TEXT NOT NULL,
url TEXT,
fetched_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (album_id, source, source_id)
);
CREATE TABLE track_external_ids (
track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
source TEXT NOT NULL,
source_id TEXT NOT NULL,
url TEXT,
fetched_at TIMESTAMPTZ DEFAULT now(),
PRIMARY KEY (track_id, source, source_id)
);
-- Indexes for common queries
CREATE INDEX idx_artists_name ON artists(name);
CREATE INDEX idx_artists_source ON artists(source, source_id);
CREATE INDEX idx_tracks_isrc ON tracks(isrc) WHERE isrc IS NOT NULL;
CREATE INDEX idx_tracks_source ON tracks(source, source_id);
CREATE INDEX idx_albums_upc ON albums(upc) WHERE upc IS NOT NULL;
CREATE INDEX idx_albums_source ON albums(source, source_id);
CREATE INDEX idx_albums_release_date ON albums(release_date);
```
---
## Example Queries
### Get album with all tracks and artists
```sql
SELECT
a.title as album_title,
a.release_date,
t.title as track_title,
t.duration_ms,
at.track_number,
ar.name as artist_name,
ta.role
FROM albums a
JOIN album_tracks at ON a.id = at.album_id
JOIN tracks t ON at.track_id = t.id
JOIN track_artists ta ON t.id = ta.track_id
JOIN artists ar ON ta.artist_id = ar.id
WHERE a.id = ?
ORDER BY at.disc_number, at.track_number, ta.position;
```
### Find all versions of a song (via work)
```sql
SELECT
t.title,
t.duration_ms,
a.name as artist,
al.title as album,
al.release_date
FROM works w
JOIN tracks t ON t.work_id = w.id
JOIN track_artists ta ON t.id = ta.track_id AND ta.role = 'primary'
JOIN artists a ON ta.artist_id = a.id
LEFT JOIN album_tracks alt ON t.id = alt.track_id
LEFT JOIN albums al ON alt.album_id = al.id
WHERE w.title ILIKE '%bohemian rhapsody%'
ORDER BY al.release_date;
```
### Get artist discography
```sql
SELECT
al.title,
al.album_type,
al.release_date,
al.total_tracks
FROM artists ar
JOIN album_artists aa ON ar.id = aa.artist_id
JOIN albums al ON aa.album_id = al.id
WHERE ar.id = ? AND aa.role = 'primary'
ORDER BY al.release_date DESC;
```