Files
metadata-agregator/docs/PROPOSED_ERD.md
T
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

12 KiB

Music Metadata Aggregator - Internal Structure

A clean, unified schema for storing music metadata from multiple sources.

Generated Diagrams

Format File
PNG proposed_erd.png
SVG proposed_erd.svg
Source proposed_erd.puml

ERD Diagram


Design Principles

  1. Single internal structure - All data from any source converts to this schema
  2. Provenance tracking - Each record tracks source and source_id
  3. Duplicate tolerance - Same entity from different sources stored separately
  4. Read-optimized - Denormalized where beneficial for API serving

Entity Overview

Core Entities

Entity Purpose Key Fields
artists Musicians, bands, producers name, type, country, formed_date
works Compositions (the song as written) title, type, language
tracks Recordings (specific version of a work) title, duration, isrc, explicit
albums Releases (LP, EP, Single, Compilation) title, type, release_date, upc
labels Record labels/publishers name, country
genres Hierarchical categorization name, parent_id

Relationships

Relationship Purpose Key Fields
track_artists Who performed on a track role (primary, featured, remixer)
album_artists Who is credited on an album role, position
album_tracks Track listing on an album disc_number, track_number
work_artists Who wrote/composed a work role (composer, lyricist)
artist_genres Artist's genres -
album_genres Album's genres -
similar_artists Artist recommendations score (0-1)

Content

Entity Purpose
lyrics Song lyrics (plain + synced)
playlists Collections of tracks
playlist_tracks Tracks in a playlist

External IDs

Entity Purpose
artist_external_ids Spotify ID, MusicBrainz MBID, etc.
album_external_ids Provider-specific album IDs
track_external_ids Provider-specific track IDs

Data Flow

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Spotify   │     │ MusicBrainz │     │   Manual    │
│    API      │     │    API      │     │   Input     │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                           ▼
              ┌────────────────────────┐
              │   Normalize & Convert  │
              │   to Internal Schema   │
              └────────────┬───────────┘
                           │
                           ▼
              ┌────────────────────────┐
              │   Internal Database    │
              │   (artists, albums,    │
              │    tracks, works...)   │
              └────────────────────────┘

Entity Relationships

                    ┌─────────┐
                    │  works  │ (composition)
                    └────┬────┘
                         │ recorded as
                         ▼
┌─────────┐         ┌─────────┐         ┌─────────┐
│ artists │◄───────►│ tracks  │◄───────►│ albums  │
└────┬────┘         └────┬────┘         └────┬────┘
     │                   │                   │
     │              ┌────┴────┐              │
     │              │ lyrics  │              │
     │              └─────────┘              │
     │                                       │
     └──────────────┬───────────────────────┘
                    │
               ┌────┴────┐
               │ labels  │
               └─────────┘

Provenance Strategy

Each record includes:

  • source - Provider name (e.g., "spotify", "musicbrainz", "manual")
  • source_id - ID in the source system
  • created_at / updated_at - Timestamps

External IDs tables allow linking the same entity across providers:

-- Find all Spotify IDs for an artist
SELECT source_id, url 
FROM artist_external_ids 
WHERE artist_id = ? AND source = 'spotify';

-- Find artist by MusicBrainz MBID
SELECT a.* 
FROM artists a
JOIN artist_external_ids e ON a.id = e.artist_id
WHERE e.source = 'musicbrainz' AND e.source_id = ?;

Role Types

Track Artist Roles

  • primary - Main performer
  • featured - Featured artist ("feat.")
  • remixer - Remixed the track
  • producer - Produced the track

Work Artist Roles

  • composer - Wrote the music
  • lyricist - Wrote the lyrics
  • writer - Wrote both (singer-songwriter)

Album Artist Roles

  • primary - Main artist
  • compiler - Compilation curator
  • various - Various artists

SQL Schema

-- Core Entities
CREATE TABLE artists (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    sort_name TEXT,
    artist_type TEXT,
    country TEXT,
    formed_date DATE,
    disbanded_date DATE,
    description TEXT,
    image_url TEXT,
    source TEXT NOT NULL,
    source_id TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE works (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title TEXT NOT NULL,
    work_type TEXT,
    language TEXT,
    source TEXT NOT NULL,
    source_id TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE tracks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    work_id UUID REFERENCES works(id),
    title TEXT NOT NULL,
    duration_ms INT,
    isrc TEXT,
    explicit BOOLEAN DEFAULT false,
    source TEXT NOT NULL,
    source_id TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE labels (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    country TEXT,
    founded_date DATE,
    source TEXT NOT NULL,
    source_id TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE albums (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    label_id UUID REFERENCES labels(id),
    title TEXT NOT NULL,
    album_type TEXT,
    release_date DATE,
    upc TEXT,
    total_tracks INT,
    total_discs INT DEFAULT 1,
    cover_url TEXT,
    source TEXT NOT NULL,
    source_id TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE genres (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL UNIQUE,
    parent_id UUID REFERENCES genres(id)
);

-- Relationships
CREATE TABLE track_artists (
    track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
    artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
    role TEXT DEFAULT 'primary',
    position INT DEFAULT 0,
    PRIMARY KEY (track_id, artist_id, role)
);

CREATE TABLE album_artists (
    album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
    artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
    role TEXT DEFAULT 'primary',
    position INT DEFAULT 0,
    PRIMARY KEY (album_id, artist_id, role)
);

CREATE TABLE album_tracks (
    album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
    track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
    disc_number INT DEFAULT 1,
    track_number INT NOT NULL,
    PRIMARY KEY (album_id, track_id)
);

CREATE TABLE work_artists (
    work_id UUID REFERENCES works(id) ON DELETE CASCADE,
    artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
    role TEXT DEFAULT 'writer',
    PRIMARY KEY (work_id, artist_id, role)
);

CREATE TABLE artist_genres (
    artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
    genre_id UUID REFERENCES genres(id) ON DELETE CASCADE,
    PRIMARY KEY (artist_id, genre_id)
);

CREATE TABLE album_genres (
    album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
    genre_id UUID REFERENCES genres(id) ON DELETE CASCADE,
    PRIMARY KEY (album_id, genre_id)
);

CREATE TABLE similar_artists (
    artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
    similar_artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
    score REAL DEFAULT 0.5,
    PRIMARY KEY (artist_id, similar_artist_id)
);

-- Content
CREATE TABLE lyrics (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
    content TEXT,
    synced_content JSONB,
    language TEXT,
    source TEXT NOT NULL,
    source_id TEXT,
    created_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE playlists (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    description TEXT,
    is_public BOOLEAN DEFAULT true,
    cover_url TEXT,
    created_at TIMESTAMPTZ DEFAULT now(),
    updated_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE playlist_tracks (
    playlist_id UUID REFERENCES playlists(id) ON DELETE CASCADE,
    track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
    position INT NOT NULL,
    added_at TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (playlist_id, track_id)
);

-- External IDs
CREATE TABLE artist_external_ids (
    artist_id UUID REFERENCES artists(id) ON DELETE CASCADE,
    source TEXT NOT NULL,
    source_id TEXT NOT NULL,
    url TEXT,
    fetched_at TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (artist_id, source, source_id)
);

CREATE TABLE album_external_ids (
    album_id UUID REFERENCES albums(id) ON DELETE CASCADE,
    source TEXT NOT NULL,
    source_id TEXT NOT NULL,
    url TEXT,
    fetched_at TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (album_id, source, source_id)
);

CREATE TABLE track_external_ids (
    track_id UUID REFERENCES tracks(id) ON DELETE CASCADE,
    source TEXT NOT NULL,
    source_id TEXT NOT NULL,
    url TEXT,
    fetched_at TIMESTAMPTZ DEFAULT now(),
    PRIMARY KEY (track_id, source, source_id)
);

-- Indexes for common queries
CREATE INDEX idx_artists_name ON artists(name);
CREATE INDEX idx_artists_source ON artists(source, source_id);
CREATE INDEX idx_tracks_isrc ON tracks(isrc) WHERE isrc IS NOT NULL;
CREATE INDEX idx_tracks_source ON tracks(source, source_id);
CREATE INDEX idx_albums_upc ON albums(upc) WHERE upc IS NOT NULL;
CREATE INDEX idx_albums_source ON albums(source, source_id);
CREATE INDEX idx_albums_release_date ON albums(release_date);

Example Queries

Get album with all tracks and artists

SELECT 
    a.title as album_title,
    a.release_date,
    t.title as track_title,
    t.duration_ms,
    at.track_number,
    ar.name as artist_name,
    ta.role
FROM albums a
JOIN album_tracks at ON a.id = at.album_id
JOIN tracks t ON at.track_id = t.id
JOIN track_artists ta ON t.id = ta.track_id
JOIN artists ar ON ta.artist_id = ar.id
WHERE a.id = ?
ORDER BY at.disc_number, at.track_number, ta.position;

Find all versions of a song (via work)

SELECT 
    t.title,
    t.duration_ms,
    a.name as artist,
    al.title as album,
    al.release_date
FROM works w
JOIN tracks t ON t.work_id = w.id
JOIN track_artists ta ON t.id = ta.track_id AND ta.role = 'primary'
JOIN artists a ON ta.artist_id = a.id
LEFT JOIN album_tracks alt ON t.id = alt.track_id
LEFT JOIN albums al ON alt.album_id = al.id
WHERE w.title ILIKE '%bohemian rhapsody%'
ORDER BY al.release_date;

Get artist discography

SELECT 
    al.title,
    al.album_type,
    al.release_date,
    al.total_tracks
FROM artists ar
JOIN album_artists aa ON ar.id = aa.artist_id
JOIN albums al ON aa.album_id = al.id
WHERE ar.id = ? AND aa.role = 'primary'
ORDER BY al.release_date DESC;