
Aggregators - Entity Relationship Diagrams

Entity structure analysis for the 5 Tier 2 aggregator projects.

Overview

| Project | Type | Persistence | Entity Model |
|---------|------|-------------|--------------|
| Harmony | Multi-source merger | In-memory | Harmonized release structure |
| GraphBrainz | GraphQL layer | Cache only | MusicBrainz schema mirror |
| Bedrock-API | gRPC aggregator | PostgreSQL | Unified streaming model |
| minim | Python library | None | API response wrappers |
| MusicMetaLinker | Entity linker | None | Alignment/linking model |

1. Harmony

Purpose: Harmonizes release metadata from 10+ providers into a unified format for MusicBrainz seeding.

Storage: In-memory only (no database). Cached snapshots via permalinks.

erDiagram
    HarmonyRelease {
        string title
        GTIN gtin
        Language language
        ScriptFrequency script
        ReleaseStatus status
        ReleaseDate releaseDate
        ReleasePackaging packaging
        string credits
        string copyright
        CountryCode[] availableIn
        CountryCode[] excludedFrom
    }

    HarmonyMedium {
        string title
        int number
        MediumFormat format
    }

    HarmonyTrack {
        string title
        string number
        int length_ms
        TrackType type
        string isrc
        CountryCode[] availableIn
    }

    ArtistCreditName {
        string name
        string creditedName
        string joinPhrase
        string mbid
    }

    Label {
        string name
        string catalogNumber
        string mbid
    }

    Artwork {
        string url
        string thumbUrl
        ArtworkType[] types
        string comment
        string provider
    }

    ExternalLink {
        string url
        LinkType[] types
    }

    ExternalEntityId {
        string provider
        string type
        string id
        CountryCode region
        LinkType[] linkTypes
    }

    ProviderInfo {
        string name
        string internalName
        string id
        string url
        string apiUrl
        int processingTime
        int cacheTime
        string[] linkedReleases
        bool isTemplate
    }

    ReleaseInfo {
        ProviderMessage[] messages
    }

    ResolvableEntity {
        string name
        string mbid
    }

    HarmonyRelease ||--o{ HarmonyMedium : "media"
    HarmonyRelease ||--o{ ArtistCreditName : "artists"
    HarmonyRelease ||--o{ Label : "labels"
    HarmonyRelease ||--o{ Artwork : "images"
    HarmonyRelease ||--o{ ExternalLink : "externalLinks"
    HarmonyRelease ||--o| ResolvableEntity : "releaseGroup"
    HarmonyRelease ||--|| ReleaseInfo : "info"

    HarmonyMedium ||--o{ HarmonyTrack : "tracklist"

    HarmonyTrack ||--o{ ArtistCreditName : "artists"
    HarmonyTrack ||--o| ResolvableEntity : "recording"

    ArtistCreditName ||--o{ ExternalEntityId : "externalIds"
    Label ||--o{ ExternalEntityId : "externalIds"

    ReleaseInfo ||--o{ ProviderInfo : "providers"

Key Entities

| Entity | Description |
|--------|-------------|
| HarmonyRelease | Unified release from multiple providers |
| HarmonyMedium | Disc/media within release (CD, Vinyl, Digital) |
| HarmonyTrack | Individual track with ISRC |
| ArtistCreditName | Artist credit with join phrases ("feat.", "&") |
| Label | Record label with catalog number |
| ProviderInfo | Metadata about each source provider used |
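
The join phrases on ArtistCreditName are what turn a list of artists into a single display string. The sketch below is a minimal Python illustration of that idea, not Harmony's own code (Harmony is written in TypeScript). The class mirrors the diagram's fields in snake_case, and `render_artist_credit` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArtistCreditName:
    """Mirrors the ArtistCreditName entity from the diagram (snake_case fields)."""
    name: str
    credited_name: Optional[str] = None  # diagram field: creditedName
    join_phrase: str = ""                # diagram field: joinPhrase
    mbid: Optional[str] = None


def render_artist_credit(credits: list[ArtistCreditName]) -> str:
    """Concatenate each credited name followed by its join phrase."""
    parts = []
    for credit in credits:
        # Prefer the name as credited on the release, if one was given.
        parts.append(credit.credited_name or credit.name)
        parts.append(credit.join_phrase)
    return "".join(parts)


# Placeholder artists, for illustration only.
credits = [
    ArtistCreditName(name="Artist A", join_phrase=" feat. "),
    ArtistCreditName(name="Artist B", credited_name="B", join_phrase=" & "),
    ArtistCreditName(name="Artist C"),
]
print(render_artist_credit(credits))  # Artist A feat. B & Artist C
```

Keeping the join phrase on each credit, rather than on the release, lets the same structure serve both release-level and track-level credits.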

2. GraphBrainz

Purpose: GraphQL interface to MusicBrainz with extension support (Discogs, Spotify, Last.fm, etc.).

Storage: Configurable cache (Redis/memory). No persistent database; requests are proxied to the MusicBrainz API.
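
A cache in front of a proxied API can be pictured as a small time-to-live (TTL) store. The following is a rough Python sketch of that pattern only; GraphBrainz itself is JavaScript and its actual cache implementation is not shown in this document. `TTLCache`, `get_or_fetch`, and `fake_lookup` are hypothetical names.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny in-memory cache; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the upstream call
        value = fetch()
        # Store the value together with its expiry time.
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def fake_lookup() -> dict:
    # Hypothetical stand-in for a request to the MusicBrainz API.
    calls.append(1)
    return {"name": "Example Artist"}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_fetch("artist:example", fake_lookup)
second = cache.get_or_fetch("artist:example", fake_lookup)
print(len(calls))  # 1: the second read was served from the cache
```

The injectable `clock` parameter exists only to make expiry testable without sleeping.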

erDiagram
    Artist {
        string id
        string mbid
        string name
        string sortName
        string disambiguation
        string country
        string gender
        string type
        string[] ipis
        string[] isnis
    }

    ReleaseGroup {
        string id
        string mbid
        string title
        string disambiguation
        Date firstReleaseDate
        ReleaseGroupType primaryType
        ReleaseGroupType[] secondaryTypes
    }

    Release {
        string id
        string mbid
        string title
        string disambiguation
        Date date
        string country
        string asin
        string barcode
        ReleaseStatus status
        string packaging
        string quality
    }

    Recording {
        string id
        string mbid
        string title
        string disambiguation
        string[] isrcs
        int length
        bool video
    }

    Track {
        string mbid
        string title
        int position
        string number
        int length
    }

    Label {
        string id
        string mbid
        string name
        string sortName
        string disambiguation
        string country
        int labelCode
        string type
        string[] ipis
    }

    Work {
        string id
        string mbid
        string title
        string disambiguation
        string[] iswcs
        string language
        string type
    }

    Area {
        string id
        string mbid
        string name
        string type
    }

    ArtistCredit {
        string name
        string joinPhrase
    }

    Media {
        int position
        string format
        int trackCount
    }

    ReleaseEvent {
        Date date
        string country
    }

    LifeSpan {
        Date begin
        Date end
        bool ended
    }

    Relationship {
        string type
        string direction
        string[] attributes
    }

    Tag {
        string name
        int count
    }

    Rating {
        int voteCount
        float value
    }

    Artist ||--o{ ReleaseGroup : "releaseGroups"
    Artist ||--o{ Release : "releases"
    Artist ||--o{ Recording : "recordings"
    Artist ||--o{ Work : "works"
    Artist ||--o| Area : "area"
    Artist ||--o| Area : "beginArea"
    Artist ||--o| Area : "endArea"
    Artist ||--|| LifeSpan : "lifeSpan"
    Artist ||--o{ Tag : "tags"
    Artist ||--o| Rating : "rating"
    Artist ||--o{ Relationship : "relationships"

    ReleaseGroup ||--o{ Release : "releases"
    ReleaseGroup ||--o{ ArtistCredit : "artistCredits"
    ReleaseGroup ||--o{ Tag : "tags"
    ReleaseGroup ||--o| Rating : "rating"

    Release ||--o{ Media : "media"
    Release ||--o{ ReleaseEvent : "releaseEvents"
    Release ||--o{ ArtistCredit : "artistCredits"
    Release ||--o{ Label : "labels"
    Release ||--o{ Recording : "recordings"
    Release ||--o{ Tag : "tags"

    Media ||--o{ Track : "tracks"

    Track ||--|| Recording : "recording"

    Recording ||--o{ ArtistCredit : "artistCredits"
    Recording ||--o{ Release : "releases"
    Recording ||--o{ Tag : "tags"
    Recording ||--o| Rating : "rating"

    Label ||--o{ Release : "releases"
    Label ||--o| Area : "area"
    Label ||--|| LifeSpan : "lifeSpan"
    Label ||--o{ Tag : "tags"

    Work ||--o{ Artist : "artists"
    Work ||--o{ Tag : "tags"

    ArtistCredit }o--|| Artist : "artist"

Key Entities

| Entity | Description |
|--------|-------------|
| Artist | Musician, band, or music professional |
| ReleaseGroup | Logical album concept (all editions) |
| Release | Specific edition (CD, vinyl, digital) |
| Recording | Distinct audio (linked to tracks) |
| Track | Recording on a specific medium |
| Work | Abstract composition (song as written) |
| Label | Record label/imprint |
| Area | Geographic region |
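
The Release, Media, Track, and Recording entities form a chain that is easiest to see by walking it. The sketch below models only those four entities, with a reduced set of fields, as plain Python dataclasses. It is an illustration of the relationships in the diagram, not GraphBrainz's schema code; `recordings_in_order` is a hypothetical helper and the IDs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    mbid: str
    title: str


@dataclass
class Track:
    position: int
    title: str
    recording: Recording  # Track ||--|| Recording in the diagram


@dataclass
class Media:
    position: int
    format: str
    tracks: list[Track] = field(default_factory=list)


@dataclass
class Release:
    mbid: str
    title: str
    media: list[Media] = field(default_factory=list)


def recordings_in_order(release: Release) -> list[Recording]:
    """Walk Release -> Media -> Track -> Recording in playback order."""
    result = []
    for medium in sorted(release.media, key=lambda m: m.position):
        for track in sorted(medium.tracks, key=lambda t: t.position):
            result.append(track.recording)
    return result


rec_a = Recording(mbid="rec-a", title="Intro")
rec_b = Recording(mbid="rec-b", title="Outro")
release = Release(mbid="rel-1", title="Example", media=[
    Media(position=2, format="CD", tracks=[Track(1, "Outro", rec_b)]),
    Media(position=1, format="CD", tracks=[Track(1, "Intro", rec_a)]),
])
print([r.title for r in recordings_in_order(release)])  # ['Intro', 'Outro']
```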

3. Bedrock-API

Purpose: Multi-platform streaming aggregator with cross-platform track bridging.

Storage: PostgreSQL (users, listening stats). Providers are queried in real time.

erDiagram
    Track {
        string id "platform:native_id"
        string title
        string artist
        string album_title
        string cover_url
        int duration_ms
        string preview_url
        string external_url
        bool is_streamable
        int popularity
        string genre
        Platform source
        string platform_id
    }

    Artist {
        string id "platform:native_id"
        string name
        string image_url
        string[] genres
        int followers
        string external_url
        Platform source
    }

    Album {
        string id "platform:native_id"
        string title
        string artist
        string cover_url
        int total_tracks
        string release_date
        string external_url
        string album_type
        Platform source
        string platform_id
    }

    Playlist {
        string id "platform:native_id"
        string title
        string description
        string cover_url
        int total_tracks
        string owner
        string external_url
        Platform source
        string platform_id
    }

    User {
        string id
        string email
        string password_hash
        timestamp created_at
    }

    ListeningEvent {
        string id "uuid"
        string user_id
        string track_id
        string title
        string artist
        string artist_id
        int duration_s
        Platform source
        bool is_public
        timestamp created_at
    }

    Lyrics {
        string lyrics
        bool synced
        LyricsSource source
        string resolved_title
        string resolved_artist
        float similarity
        LyricsType type
    }

    LyricsLine {
        int time_ms
        string text
    }

    LyricAnnotation {
        int id
        string url
        string fragment
        string body
        int votes_total
        bool verified
        bool pinned
        int comment_count
        string created_at
    }

    AnnotationContributor {
        string login
        string url
        string avatar_url
        string role
        int iq
    }

    PopularTrackItem {
        int play_count
    }

    PopularArtistItem {
        string artist_name
        int play_count
        string cover_url
        string external_url
    }

    Track ||--o{ Artist : "artists"
    Album ||--o{ Artist : "artists"
    Album ||--o{ Track : "tracks"
    Playlist ||--o{ Track : "tracks"

    User ||--o{ ListeningEvent : "history"
    ListeningEvent }o--|| Track : "track"

    Lyrics ||--o{ LyricsLine : "synced_lines"
    LyricAnnotation ||--|| AnnotationContributor : "contributor"

    PopularTrackItem ||--|| Track : "track"

Key Entities

| Entity | Description |
|--------|-------------|
| Track | Unified track from any platform (Spotify, Deezer, SoundCloud, etc.) |
| Artist | Artist with platform-specific metadata |
| Album | Album with release info |
| Playlist | User/curated playlist |
| User | Authenticated user (JWT) |
| ListeningEvent | Play history for stats |
| Lyrics | Plain or synced lyrics (LrcLib, Genius) |
| LyricAnnotation | Genius community annotations |
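
One way to picture how ListeningEvent rows could feed the PopularTrackItem play counts is a simple aggregation. This is a Python sketch under assumptions: Bedrock-API is written in Go and its real query is not shown here, counting only events with `is_public` set is a guess, and `popular_tracks` is a hypothetical function name.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ListeningEvent:
    user_id: str
    track_id: str   # "platform:native_id", as in the diagram
    is_public: bool


def popular_tracks(events: list[ListeningEvent],
                   limit: int = 10) -> list[tuple[str, int]]:
    """Return (track_id, play_count) pairs, most played first.

    Assumption: only public events contribute to popularity.
    """
    counts = Counter(e.track_id for e in events if e.is_public)
    return counts.most_common(limit)


events = [
    ListeningEvent("u1", "spotify:1", True),
    ListeningEvent("u2", "spotify:1", True),
    ListeningEvent("u1", "deezer:9", True),
    ListeningEvent("u3", "deezer:9", False),  # private, not counted
]
print(popular_tracks(events))  # [('spotify:1', 2), ('deezer:9', 1)]
```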

Platform Enum

PLATFORM_SPOTIFY, PLATFORM_YANDEX, PLATFORM_VK,
PLATFORM_DEEZER, PLATFORM_SOUNDCLOUD, PLATFORM_YOUTUBE
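
The diagram annotates entity IDs as "platform:native_id", which suggests a composite key that can be split back into a platform and a provider-specific ID. A minimal Python sketch of that parsing follows. The lowercase prefix strings are an assumption (the document lists only the enum names), and `parse_entity_id` is a hypothetical helper.

```python
from enum import Enum


class Platform(Enum):
    # Values are assumed prefixes; the source lists only the PLATFORM_* names.
    SPOTIFY = "spotify"
    YANDEX = "yandex"
    VK = "vk"
    DEEZER = "deezer"
    SOUNDCLOUD = "soundcloud"
    YOUTUBE = "youtube"


def parse_entity_id(entity_id: str) -> tuple[Platform, str]:
    """Split "platform:native_id" on the first colon only."""
    prefix, sep, native_id = entity_id.partition(":")
    if not sep or not native_id:
        raise ValueError(f"expected 'platform:native_id', got {entity_id!r}")
    # Platform(prefix) raises ValueError for an unknown platform.
    return Platform(prefix), native_id


print(parse_entity_id("spotify:abc123"))
```

Splitting on the first colon only keeps native IDs that themselves contain colons intact.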

4. minim

Purpose: Python library providing a unified client interface to 7 music APIs.

Storage: None (library only). OAuth tokens cached locally.

erDiagram
    SpotifyTrack {
        string id
        string name
        int duration_ms
        int popularity
        bool explicit
        string preview_url
        string external_url
    }

    SpotifyArtist {
        string id
        string name
        string[] genres
        int followers
        int popularity
        string image_url
    }

    SpotifyAlbum {
        string id
        string name
        string album_type
        string release_date
        int total_tracks
        string[] genres
    }

    DeezerTrack {
        int id
        string title
        int duration
        int rank
        bool explicit
        string preview
        string link
    }

    DeezerArtist {
        int id
        string name
        int nb_fan
        string picture_url
    }

    DeezerAlbum {
        int id
        string title
        string release_date
        int nb_tracks
        string cover_url
    }

    TidalTrack {
        int id
        string title
        int duration
        int popularity
        bool explicit
        string isrc
    }

    TidalArtist {
        int id
        string name
        string picture_url
    }

    TidalAlbum {
        int id
        string title
        string releaseDate
        int numberOfTracks
        string cover_url
    }

    QobuzTrack {
        int id
        string title
        int duration
        bool hires
        string isrc
    }

    iTunesTrack {
        int trackId
        string trackName
        int trackTimeMillis
        string previewUrl
        string trackViewUrl
    }

    iTunesArtist {
        int artistId
        string artistName
        string artistLinkUrl
    }

    iTunesAlbum {
        int collectionId
        string collectionName
        string releaseDate
        int trackCount
    }

    AudioFile {
        string path
        string format
        int bitrate
        int sample_rate
        int channels
    }

    AudioMetadata {
        string title
        string artist
        string album
        int track_number
        int year
        string genre
        bytes cover_art
    }

    SpotifyAlbum ||--o{ SpotifyTrack : "tracks"
    SpotifyAlbum ||--o{ SpotifyArtist : "artists"
    SpotifyTrack ||--o{ SpotifyArtist : "artists"

    DeezerAlbum ||--o{ DeezerTrack : "tracks"
    DeezerAlbum ||--|| DeezerArtist : "artist"
    DeezerTrack ||--|| DeezerArtist : "artist"

    TidalAlbum ||--o{ TidalTrack : "tracks"
    TidalAlbum ||--o{ TidalArtist : "artists"

    AudioFile ||--|| AudioMetadata : "metadata"

API Modules

| Module | Provider | Auth |
|--------|----------|------|
| spotify | Spotify Web API | OAuth 2.0 (multiple grant types) |
| discogs | Discogs API | OAuth 1.0a |
| itunes | iTunes Search API | None |
| qobuz | Qobuz API | Password |
| tidal | TIDAL API | OAuth 2.0 |
| audio | Local files | N/A |
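
Because each provider's response uses its own field names (compare `name`/`duration_ms`, `title`/`duration`, and `trackName`/`trackTimeMillis` in the diagram), a consumer that wants one track shape needs a thin adapter per provider. The sketch below is a hypothetical adapter layer written for illustration; it is not part of minim's API. `UnifiedTrack` and the `from_*` functions are invented names, and treating the Deezer `duration` field as seconds is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UnifiedTrack:
    provider: str
    id: str
    title: str
    duration_ms: int


def from_spotify(raw: dict) -> UnifiedTrack:
    return UnifiedTrack("spotify", str(raw["id"]), raw["name"],
                        raw["duration_ms"])


def from_deezer(raw: dict) -> UnifiedTrack:
    # Assumption: Deezer reports duration in seconds, so convert to ms.
    return UnifiedTrack("deezer", str(raw["id"]), raw["title"],
                        raw["duration"] * 1000)


def from_itunes(raw: dict) -> UnifiedTrack:
    return UnifiedTrack("itunes", str(raw["trackId"]), raw["trackName"],
                        raw["trackTimeMillis"])


# Hand-written sample payloads using only fields from the diagram.
tracks = [
    from_spotify({"id": "sp1", "name": "Song", "duration_ms": 201000}),
    from_deezer({"id": 77, "title": "Song", "duration": 201}),
    from_itunes({"trackId": 5, "trackName": "Song", "trackTimeMillis": 201000}),
]
print({t.duration_ms for t in tracks})  # {201000}
```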

5. MusicMetaLinker

Purpose: Entity linking library that connects track metadata to external databases.

Storage: None (library only). Queries external APIs in real time.

erDiagram
    Align {
        string mbid_track
        string mbid_release
        string artist
        string album
        string track
        int track_number
        float duration
        string[] isrc
        bool strict
    }

    MusicBrainzLink {
        string mbid
        string artist
        string album
        string track
        int track_number
        float duration
        string[] isrc
        string release_date
    }

    DeezerLink {
        int id
        string link
        string artist_name
        string album_title
        string track_title
        int track_number
        float duration
        string isrc
        float bpm
        string release_date
    }

    YouTubeLink {
        string video_id
        string link
        string title
        string artist
        string album
        float duration
    }

    AcousticBrainzLink {
        string mbid
        string link
        float bpm
        string key
        float danceability
        float energy
    }

    LinkedTrack {
        string mbid
        string isrc
        int deezer_id
        string youtube_id
        string acousticbrainz_link
        string artist
        string album
        string track
        int track_number
        float duration
        string release_date
        float bpm
    }

    Align ||--|| MusicBrainzLink : "mb_link"
    Align ||--|| DeezerLink : "dz_link"
    Align ||--|| YouTubeLink : "yt_link"

    MusicBrainzLink ||--o| AcousticBrainzLink : "acousticbrainz"

    LinkedTrack }o--|| MusicBrainzLink : "musicbrainz"
    LinkedTrack }o--|| DeezerLink : "deezer"
    LinkedTrack }o--|| YouTubeLink : "youtube"
    LinkedTrack }o--|| AcousticBrainzLink : "acousticbrainz"

Linking Flow

Input (any combination):
  - MBID (MusicBrainz ID)
  - ISRC
  - Artist + Track + Album
  - Duration

        ┌─────────────────┐
        │     Align       │
        │  (coordinator)  │
        └────────┬────────┘
                 │
    ┌────────────┼────────────┐
    │            │            │
    ▼            ▼            ▼
┌────────┐  ┌────────┐  ┌────────┐
│MusicBr.│  │ Deezer │  │YouTube │
│  Link  │  │  Link  │  │  Link  │
└────┬───┘  └────────┘  └────────┘
     │
     ▼
┌────────────┐
│AcousticBr. │
│   Link     │
└────────────┘

Output:
  - Enriched metadata from all sources
  - Cross-platform IDs (MBID, Deezer ID, YouTube ID)
  - Additional data (BPM, key, etc.)
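
The flow above can be sketched as a coordinator that calls each linker in turn and merges what comes back. This is a simplified Python illustration under assumptions, not MusicMetaLinker's actual API. `Query`, `align`, and the `fake_*` linkers are hypothetical, the linkers return canned data instead of calling external services, and the "first source wins" merge rule is a guess.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Query:
    artist: Optional[str] = None
    track: Optional[str] = None
    mbid: Optional[str] = None
    isrc: Optional[str] = None


# A linker sees the query plus everything resolved so far.
Linker = Callable[[Query, dict], dict]


def align(query: Query, linkers: list[Linker]) -> dict:
    """Run linkers in order; keep the first non-None value per field."""
    linked: dict = {}
    for linker in linkers:
        for key, value in linker(query, linked).items():
            if value is not None and key not in linked:
                linked[key] = value
    return linked


def fake_musicbrainz(query: Query, linked: dict) -> dict:
    return {"mbid": "mbid-123", "isrc": "ISRC-PLACEHOLDER", "bpm": None}


def fake_acousticbrainz(query: Query, linked: dict) -> dict:
    # Needs an MBID, which only the MusicBrainz step provides.
    if "mbid" not in linked:
        return {}
    return {"bpm": 121.5, "key": "C"}


def fake_deezer(query: Query, linked: dict) -> dict:
    return {"deezer_id": 42, "bpm": 120.0}


def fake_youtube(query: Query, linked: dict) -> dict:
    return {"youtube_id": "vid-1"}


result = align(
    Query(artist="Example Artist", track="Example Track"),
    [fake_musicbrainz, fake_acousticbrainz, fake_deezer, fake_youtube],
)
print(result)
```

Passing the partial result to each linker is what lets the AcousticBrainz step depend on the MusicBrainz step, matching the arrow in the diagram.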

Supported Sources

| Source | ID Type | Data Retrieved |
|--------|---------|----------------|
| MusicBrainz | MBID | Track, artist, album, ISRC, release date |
| Deezer | Deezer ID | Track, BPM, ISRC, release date |
| YouTube Music | Video ID | Track, duration |
| AcousticBrainz | MBID | BPM, key, audio features |

Comparison

| Feature | Harmony | GraphBrainz | Bedrock-API | minim | MusicMetaLinker |
|---------|---------|-------------|-------------|-------|-----------------|
| Primary Use | MB seeding | GraphQL proxy | Streaming | API library | Entity linking |
| Database | None | Cache | PostgreSQL | None | None |
| Sources | 10+ | MB + extensions | 6 platforms | 7 APIs | 4 sources |
| Output | Merged release | GraphQL | gRPC/Protobuf | Python objects | Linked IDs |
| Language | TypeScript | JavaScript | Go | Python | Python |
| Unique Value | Intelligent merge | Schema stitching | Stream bridging | Unified interface | Cross-DB linking |