feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
Alexander
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
@@ -0,0 +1,792 @@
# Aggregators - Entity Relationship Diagrams
Entity structure analysis for the 5 Tier 2 aggregator projects.
## Overview
| Project | Type | Persistence | Entity Model |
|---------|------|-------------|--------------|
| **Harmony** | Multi-source merger | In-memory | Harmonized release structure |
| **GraphBrainz** | GraphQL layer | Cache only | MusicBrainz schema mirror |
| **Bedrock-API** | gRPC aggregator | PostgreSQL | Unified streaming model |
| **minim** | Python library | None | API response wrappers |
| **MusicMetaLinker** | Entity linker | None | Alignment/linking model |
---
## 1. Harmony
**Purpose**: Harmonizes release metadata from 10+ providers into a unified format for MusicBrainz seeding.
**Storage**: In-memory only (no database). Cached snapshots via permalinks.
```mermaid
erDiagram
HarmonyRelease {
string title
GTIN gtin
Language language
ScriptFrequency script
ReleaseStatus status
ReleaseDate releaseDate
ReleasePackaging packaging
string credits
string copyright
CountryCode[] availableIn
CountryCode[] excludedFrom
}
HarmonyMedium {
string title
int number
MediumFormat format
}
HarmonyTrack {
string title
string number
int length_ms
TrackType type
string isrc
CountryCode[] availableIn
}
ArtistCreditName {
string name
string creditedName
string joinPhrase
string mbid
}
Label {
string name
string catalogNumber
string mbid
}
Artwork {
string url
string thumbUrl
ArtworkType[] types
string comment
string provider
}
ExternalLink {
string url
LinkType[] types
}
ExternalEntityId {
string provider
string type
string id
CountryCode region
LinkType[] linkTypes
}
ProviderInfo {
string name
string internalName
string id
string url
string apiUrl
int processingTime
int cacheTime
string[] linkedReleases
bool isTemplate
}
ReleaseInfo {
ProviderMessage[] messages
}
ResolvableEntity {
string name
string mbid
}
HarmonyRelease ||--o{ HarmonyMedium : "media"
HarmonyRelease ||--o{ ArtistCreditName : "artists"
HarmonyRelease ||--o{ Label : "labels"
HarmonyRelease ||--o{ Artwork : "images"
HarmonyRelease ||--o{ ExternalLink : "externalLinks"
HarmonyRelease ||--o| ResolvableEntity : "releaseGroup"
HarmonyRelease ||--|| ReleaseInfo : "info"
HarmonyMedium ||--o{ HarmonyTrack : "tracklist"
HarmonyTrack ||--o{ ArtistCreditName : "artists"
HarmonyTrack ||--o| ResolvableEntity : "recording"
ArtistCreditName ||--o{ ExternalEntityId : "externalIds"
Label ||--o{ ExternalEntityId : "externalIds"
ReleaseInfo ||--o{ ProviderInfo : "providers"
```
### Key Entities
| Entity | Description |
|--------|-------------|
| `HarmonyRelease` | Unified release from multiple providers |
| `HarmonyMedium` | Disc/media within release (CD, Vinyl, Digital) |
| `HarmonyTrack` | Individual track with ISRC |
| `ArtistCreditName` | Artist credit with join phrases ("feat.", "&") |
| `Label` | Record label with catalog number |
| `ProviderInfo` | Metadata about each source provider used |
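
The join-phrase model in `ArtistCreditName` can be illustrated with a minimal Python sketch. It assumes the MusicBrainz convention that each join phrase follows the name it belongs to; the dataclass and `render_artist_credit` are hypothetical names for illustration, not Harmony's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArtistCreditName:
    """Subset of the ArtistCreditName fields shown in the diagram."""
    name: str
    credited_name: Optional[str] = None  # creditedName in the diagram
    join_phrase: str = ""                # joinPhrase in the diagram


def render_artist_credit(credits: list[ArtistCreditName]) -> str:
    """Concatenate each displayed name with the join phrase that follows it."""
    parts = []
    for credit in credits:
        # Prefer the name as credited on the release, fall back to the canonical name.
        parts.append(credit.credited_name or credit.name)
        parts.append(credit.join_phrase)
    return "".join(parts)


credit = [
    ArtistCreditName(name="Artist A", join_phrase=" feat. "),
    ArtistCreditName(name="Artist B", credited_name="B", join_phrase=" & "),
    ArtistCreditName(name="Artist C"),
]
print(render_artist_credit(credit))  # Artist A feat. B & Artist C
```

Storing the phrase on the credit rather than in a single display string keeps each artist individually resolvable to an MBID while still reproducing the printed credit.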
---
## 2. GraphBrainz
**Purpose**: GraphQL interface to MusicBrainz with extension support (Discogs, Spotify, Last.fm, etc.).
**Storage**: Configurable cache (Redis/memory). No persistent database; requests are proxied to the MusicBrainz API.
```mermaid
erDiagram
Artist {
string id
string mbid
string name
string sortName
string disambiguation
string country
string gender
string type
string[] ipis
string[] isnis
}
ReleaseGroup {
string id
string mbid
string title
string disambiguation
Date firstReleaseDate
ReleaseGroupType primaryType
ReleaseGroupType[] secondaryTypes
}
Release {
string id
string mbid
string title
string disambiguation
Date date
string country
string asin
string barcode
ReleaseStatus status
string packaging
string quality
}
Recording {
string id
string mbid
string title
string disambiguation
string[] isrcs
int length
bool video
}
Track {
string mbid
string title
int position
string number
int length
}
Label {
string id
string mbid
string name
string sortName
string disambiguation
string country
int labelCode
string type
string[] ipis
}
Work {
string id
string mbid
string title
string disambiguation
string[] iswcs
string language
string type
}
Area {
string id
string mbid
string name
string type
}
ArtistCredit {
string name
string joinPhrase
}
Media {
int position
string format
int trackCount
}
ReleaseEvent {
Date date
string country
}
LifeSpan {
Date begin
Date end
bool ended
}
Relationship {
string type
string direction
string[] attributes
}
Tag {
string name
int count
}
Rating {
int voteCount
float value
}
Artist ||--o{ ReleaseGroup : "releaseGroups"
Artist ||--o{ Release : "releases"
Artist ||--o{ Recording : "recordings"
Artist ||--o{ Work : "works"
Artist ||--o| Area : "area"
Artist ||--o| Area : "beginArea"
Artist ||--o| Area : "endArea"
Artist ||--|| LifeSpan : "lifeSpan"
Artist ||--o{ Tag : "tags"
Artist ||--o| Rating : "rating"
Artist ||--o{ Relationship : "relationships"
ReleaseGroup ||--o{ Release : "releases"
ReleaseGroup ||--o{ ArtistCredit : "artistCredits"
ReleaseGroup ||--o{ Tag : "tags"
ReleaseGroup ||--o| Rating : "rating"
Release ||--o{ Media : "media"
Release ||--o{ ReleaseEvent : "releaseEvents"
Release ||--o{ ArtistCredit : "artistCredits"
Release ||--o{ Label : "labels"
Release ||--o{ Recording : "recordings"
Release ||--o{ Tag : "tags"
Media ||--o{ Track : "tracks"
Track ||--|| Recording : "recording"
Recording ||--o{ ArtistCredit : "artistCredits"
Recording ||--o{ Release : "releases"
Recording ||--o{ Tag : "tags"
Recording ||--o| Rating : "rating"
Label ||--o{ Release : "releases"
Label ||--o| Area : "area"
Label ||--|| LifeSpan : "lifeSpan"
Label ||--o{ Tag : "tags"
Work ||--o{ Artist : "artists"
Work ||--o{ Tag : "tags"
ArtistCredit }o--|| Artist : "artist"
```
### Key Entities
| Entity | Description |
|--------|-------------|
| `Artist` | Musician, band, or music professional |
| `ReleaseGroup` | Logical album concept (all editions) |
| `Release` | Specific edition (CD, vinyl, digital) |
| `Recording` | Distinct audio (linked to tracks) |
| `Track` | Recording on a specific medium |
| `Work` | Abstract composition (song as written) |
| `Label` | Record label/imprint |
| `Area` | Geographic region |
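
The cache-only storage model described for GraphBrainz can be sketched as a small in-memory TTL cache placed in front of an upstream lookup. This is an illustrative Python sketch under stated assumptions: `TTLCache`, `get_or_fetch`, and `fake_lookup` are hypothetical names, and the real project's cache interface may differ.

```python
import time
from typing import Any, Callable


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]              # fresh entry: skip the upstream call
        value = fetch(key)             # miss or expired: go upstream
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def fake_lookup(key: str) -> dict:
    """Stand-in for an upstream MusicBrainz request (hypothetical)."""
    calls.append(key)
    return {"key": key, "name": "Example Artist"}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_fetch("artist:1234", fake_lookup)
second = cache.get_or_fetch("artist:1234", fake_lookup)
print(len(calls))  # 1
```

Because nothing is persisted, every entry can be rebuilt from the upstream API after a restart; the cache only reduces request volume.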
---
## 3. Bedrock-API
**Purpose**: Multi-platform streaming aggregator with cross-platform track bridging.
**Storage**: PostgreSQL (users, listening stats). Providers are queried in real time.
```mermaid
erDiagram
Track {
string id "platform:native_id"
string title
string artist
string album_title
string cover_url
int duration_ms
string preview_url
string external_url
bool is_streamable
int popularity
string genre
Platform source
string platform_id
}
Artist {
string id "platform:native_id"
string name
string image_url
string[] genres
int followers
string external_url
Platform source
}
Album {
string id "platform:native_id"
string title
string artist
string cover_url
int total_tracks
string release_date
string external_url
string album_type
Platform source
string platform_id
}
Playlist {
string id "platform:native_id"
string title
string description
string cover_url
int total_tracks
string owner
string external_url
Platform source
string platform_id
}
User {
string id
string email
string password_hash
timestamp created_at
}
ListeningEvent {
string id "uuid"
string user_id
string track_id
string title
string artist
string artist_id
int duration_s
Platform source
bool is_public
timestamp created_at
}
Lyrics {
string lyrics
bool synced
LyricsSource source
string resolved_title
string resolved_artist
float similarity
LyricsType type
}
LyricsLine {
int time_ms
string text
}
LyricAnnotation {
int id
string url
string fragment
string body
int votes_total
bool verified
bool pinned
int comment_count
string created_at
}
AnnotationContributor {
string login
string url
string avatar_url
string role
int iq
}
PopularTrackItem {
int play_count
}
PopularArtistItem {
string artist_name
int play_count
string cover_url
string external_url
}
Track ||--o{ Artist : "artists"
Album ||--o{ Artist : "artists"
Album ||--o{ Track : "tracks"
Playlist ||--o{ Track : "tracks"
User ||--o{ ListeningEvent : "history"
ListeningEvent }o--|| Track : "track"
Lyrics ||--o{ LyricsLine : "synced_lines"
LyricAnnotation ||--|| AnnotationContributor : "contributor"
PopularTrackItem ||--|| Track : "track"
```
### Key Entities
| Entity | Description |
|--------|-------------|
| `Track` | Unified track from any platform (Spotify, Deezer, SoundCloud, etc.) |
| `Artist` | Artist with platform-specific metadata |
| `Album` | Album with release info |
| `Playlist` | User/curated playlist |
| `User` | Authenticated user (JWT) |
| `ListeningEvent` | Play history for stats |
| `Lyrics` | Plain or synced lyrics (LrcLib, Genius) |
| `LyricAnnotation` | Genius community annotations |
### Platform Enum
```
PLATFORM_SPOTIFY, PLATFORM_YANDEX, PLATFORM_VK,
PLATFORM_DEEZER, PLATFORM_SOUNDCLOUD, PLATFORM_YOUTUBE
```
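
The `platform:native_id` identifier format used by `Track`, `Album`, and `Playlist` can be sketched in Python as follows. The lowercase prefix strings and the helper names `make_id` and `parse_id` are assumptions for illustration; the source only specifies the `platform:native_id` shape and the enum members.

```python
from enum import Enum


class Platform(Enum):
    # Prefix strings are assumed; only the enum members come from the list above.
    SPOTIFY = "spotify"
    YANDEX = "yandex"
    VK = "vk"
    DEEZER = "deezer"
    SOUNDCLOUD = "soundcloud"
    YOUTUBE = "youtube"


def make_id(platform: Platform, native_id: str) -> str:
    """Build a composite ID of the form platform:native_id."""
    return f"{platform.value}:{native_id}"


def parse_id(composite_id: str) -> tuple[Platform, str]:
    """Split a composite ID at the first colon, so native IDs may contain colons."""
    prefix, sep, native_id = composite_id.partition(":")
    if not sep or not native_id:
        raise ValueError(f"expected 'platform:native_id', got {composite_id!r}")
    return Platform(prefix), native_id  # unknown prefixes raise ValueError


print(make_id(Platform.SPOTIFY, "abc123"))  # spotify:abc123
platform, native_id = parse_id("deezer:3135556")
```

Encoding the platform in the ID lets a single lookup call route to the right provider without a separate `source` argument.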
---
## 4. minim
**Purpose**: Python library providing a unified client interface to 7 music APIs.
**Storage**: None (library only). OAuth tokens cached locally.
```mermaid
erDiagram
SpotifyTrack {
string id
string name
int duration_ms
int popularity
bool explicit
string preview_url
string external_url
}
SpotifyArtist {
string id
string name
string[] genres
int followers
int popularity
string image_url
}
SpotifyAlbum {
string id
string name
string album_type
string release_date
int total_tracks
string[] genres
}
DeezerTrack {
int id
string title
int duration
int rank
bool explicit
string preview
string link
}
DeezerArtist {
int id
string name
int nb_fan
string picture_url
}
DeezerAlbum {
int id
string title
string release_date
int nb_tracks
string cover_url
}
TidalTrack {
int id
string title
int duration
int popularity
bool explicit
string isrc
}
TidalArtist {
int id
string name
string picture_url
}
TidalAlbum {
int id
string title
string releaseDate
int numberOfTracks
string cover_url
}
QobuzTrack {
int id
string title
int duration
bool hires
string isrc
}
iTunesTrack {
int trackId
string trackName
int trackTimeMillis
string previewUrl
string trackViewUrl
}
iTunesArtist {
int artistId
string artistName
string artistLinkUrl
}
iTunesAlbum {
int collectionId
string collectionName
string releaseDate
int trackCount
}
AudioFile {
string path
string format
int bitrate
int sample_rate
int channels
}
AudioMetadata {
string title
string artist
string album
int track_number
int year
string genre
bytes cover_art
}
SpotifyAlbum ||--o{ SpotifyTrack : "tracks"
SpotifyAlbum ||--o{ SpotifyArtist : "artists"
SpotifyTrack ||--o{ SpotifyArtist : "artists"
DeezerAlbum ||--o{ DeezerTrack : "tracks"
DeezerAlbum ||--|| DeezerArtist : "artist"
DeezerTrack ||--|| DeezerArtist : "artist"
TidalAlbum ||--o{ TidalTrack : "tracks"
TidalAlbum ||--o{ TidalArtist : "artists"
AudioFile ||--|| AudioMetadata : "metadata"
```
### API Modules
| Module | Provider | Auth |
|--------|----------|------|
| `spotify` | Spotify Web API | OAuth 2.0 (multiple grant types) |
| `discogs` | Discogs API | OAuth 1.0a |
| `itunes` | iTunes Search API | None |
| `qobuz` | Qobuz API | Password |
| `tidal` | TIDAL API | OAuth 2.0 |
| `audio` | Local files | N/A |
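
The per-provider wrappers in the diagram expose track length under different field names (`duration_ms`, `duration`, `trackTimeMillis`). A consumer that wants one value has to normalize them, roughly as in this Python sketch. It works on plain dicts rather than minim's actual objects, and it assumes the bare `duration` fields are in seconds, which the source does not state.

```python
# Field name and multiplier to milliseconds, per provider.
# ASSUMPTION: the bare "duration" fields are expressed in seconds.
DURATION_FIELDS = {
    "spotify": ("duration_ms", 1),
    "deezer": ("duration", 1000),
    "tidal": ("duration", 1000),
    "qobuz": ("duration", 1000),
    "itunes": ("trackTimeMillis", 1),
}


def duration_ms(provider: str, track: dict) -> int:
    """Return the track length in milliseconds for a raw provider response."""
    field, factor = DURATION_FIELDS[provider]
    return int(track[field]) * factor


print(duration_ms("spotify", {"duration_ms": 215000}))  # 215000
print(duration_ms("deezer", {"duration": 215}))         # 215000
```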
---
## 5. MusicMetaLinker
**Purpose**: Entity linking library - connects track metadata to external databases.
**Storage**: None (library only). Queries external APIs in real time.
```mermaid
erDiagram
Align {
string mbid_track
string mbid_release
string artist
string album
string track
int track_number
float duration
string[] isrc
bool strict
}
MusicBrainzLink {
string mbid
string artist
string album
string track
int track_number
float duration
string[] isrc
string release_date
}
DeezerLink {
int id
string link
string artist_name
string album_title
string track_title
int track_number
float duration
string isrc
float bpm
string release_date
}
YouTubeLink {
string video_id
string link
string title
string artist
string album
float duration
}
AcousticBrainzLink {
string mbid
string link
float bpm
string key
float danceability
float energy
}
LinkedTrack {
string mbid
string isrc
int deezer_id
string youtube_id
string acousticbrainz_link
string artist
string album
string track
int track_number
float duration
string release_date
float bpm
}
Align ||--|| MusicBrainzLink : "mb_link"
Align ||--|| DeezerLink : "dz_link"
Align ||--|| YouTubeLink : "yt_link"
MusicBrainzLink ||--o| AcousticBrainzLink : "acousticbrainz"
LinkedTrack }o--|| MusicBrainzLink : "musicbrainz"
LinkedTrack }o--|| DeezerLink : "deezer"
LinkedTrack }o--|| YouTubeLink : "youtube"
LinkedTrack }o--|| AcousticBrainzLink : "acousticbrainz"
```
### Linking Flow
```
Input (any combination):
- MBID (MusicBrainz ID)
- ISRC
- Artist + Track + Album
- Duration
         ┌─────────────────┐
         │      Align      │
         │  (coordinator)  │
         └────────┬────────┘
     ┌────────────┼────────────┐
     │            │            │
     ▼            ▼            ▼
┌────────┐   ┌────────┐   ┌────────┐
│MusicBr.│   │ Deezer │   │YouTube │
│  Link  │   │  Link  │   │  Link  │
└────┬───┘   └────────┘   └────────┘
┌────────────┐
│AcousticBr. │
│    Link    │
└────────────┘
Output:
- Enriched metadata from all sources
- Cross-platform IDs (MBID, Deezer ID, YouTube ID)
- Additional data (BPM, key, etc.)
```
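
The flow above can be sketched in Python as a coordinator that fans a query out to per-source resolvers and merges the results. This is a minimal sketch under stated assumptions: `Query`, `link_track`, and the canned resolver functions are hypothetical, the merge rule (first source to supply a field wins) is an illustrative choice, and none of this is MusicMetaLinker's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Query:
    """Any combination of the inputs listed in the flow."""
    mbid: Optional[str] = None
    isrc: Optional[str] = None
    artist: Optional[str] = None
    track: Optional[str] = None


def _merge(linked: dict, found: dict) -> None:
    """Copy non-empty fields that no earlier source has already supplied."""
    for key, value in found.items():
        if value is not None:
            linked.setdefault(key, value)


def link_track(
    query: Query,
    resolvers: list[Callable[[Query], dict]],
    acousticbrainz: Optional[Callable[[str], dict]] = None,
) -> dict:
    linked: dict = {}
    for resolve in resolvers:
        _merge(linked, resolve(query))
    # AcousticBrainz is keyed by MBID, so it runs only once an MBID is known.
    if acousticbrainz is not None and linked.get("mbid"):
        _merge(linked, acousticbrainz(linked["mbid"]))
    return linked


# Hypothetical resolvers returning canned data instead of calling real APIs.
def musicbrainz(q: Query) -> dict:
    return {"mbid": "mb-1", "isrc": q.isrc, "release_date": "2001-03-12"}


def deezer(q: Query) -> dict:
    return {"deezer_id": 42, "bpm": 120.0, "isrc": q.isrc}


def youtube(q: Query) -> dict:
    return {"youtube_id": "yt-1"}


def acoustic(mbid: str) -> dict:
    return {"bpm": 121.5, "key": "C"}


result = link_track(Query(isrc="TEST00000001"), [musicbrainz, deezer, youtube], acoustic)
print(result["bpm"], result["key"])  # 120.0 C
```

Ordering the resolver list expresses source priority: here Deezer's BPM is kept because it is merged before the AcousticBrainz result.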
### Supported Sources
| Source | ID Type | Data Retrieved |
|--------|---------|----------------|
| MusicBrainz | MBID | Track, artist, album, ISRC, release date |
| Deezer | Deezer ID | Track, BPM, ISRC, release date |
| YouTube Music | Video ID | Track, duration |
| AcousticBrainz | MBID | BPM, key, audio features |
---
## Comparison
| Feature | Harmony | GraphBrainz | Bedrock-API | minim | MusicMetaLinker |
|---------|---------|-------------|-------------|-------|-----------------|
| **Primary Use** | MB seeding | GraphQL proxy | Streaming | API library | Entity linking |
| **Database** | None | Cache | PostgreSQL | None | None |
| **Sources** | 10+ | MB + extensions | 6 platforms | 7 APIs | 4 sources |
| **Output** | Merged release | GraphQL | gRPC/Protobuf | Python objects | Linked IDs |
| **Language** | TypeScript | JavaScript | Go | Python | Python |
| **Unique Value** | Intelligent merge | Schema stitching | Stream bridging | Unified interface | Cross-DB linking |