Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00

MiniMediaMetadataAPI - Data Layer Analysis

Database Technology

RDBMS: PostgreSQL
Driver: Npgsql 10.0.2
ORM: Dapper 2.1.72 (micro-ORM)
Extensions: pg_trgm (trigram similarity search)

Schema Ownership

Critical Constraint: This API does NOT own the database schema.

Schema Owner: MiniMediaScanner (separate project)
API Role: Read-only consumer
Migration Strategy: None (schema managed externally)

Implications

Pros:

Clear separation of concerns
API doesn't need provider API credentials
Simpler deployment (no migration coordination)
Sync complexity isolated in MiniMediaScanner

Cons:

No control over schema evolution
Breaking changes in MiniMediaScanner break API
Can't optimize schema for query patterns
Data freshness depends on external sync schedule

Coupling Points:

Table names hardcoded in SQL queries
Column names hardcoded in Dapper mappings
Foreign key relationships assumed in joins
Data types must match C# model properties

Connection Configuration

Connection String Format:

Host=localhost;
Database=minimediametadata;
Username=postgres;
Password=password;
MinPoolSize=5;
MaxPoolSize=100;
Timeout=30;
CommandTimeout=30;

Pooling Settings:

MinPoolSize: 5 connections kept alive
MaxPoolSize: 100 concurrent connections
Timeout: 30 seconds to establish a connection (Npgsql applies the same limit while waiting for a free pooled connection)
CommandTimeout: 30 seconds for query execution
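
The same settings can also be assembled from typed properties rather than a hand-written string. This is a minimal sketch, assuming the Npgsql package is referenced; `ConnectionSettings` is a hypothetical helper name, and the credentials are the placeholder values from the example above.

```csharp
using Npgsql;

public static class ConnectionSettings
{
    // Builds a connection string equivalent to the example above.
    public static string Build()
    {
        var builder = new NpgsqlConnectionStringBuilder
        {
            Host = "localhost",
            Database = "minimediametadata",
            Username = "postgres",
            Password = "password",  // placeholder; load from configuration in practice
            MinPoolSize = 5,        // idle connections kept open
            MaxPoolSize = 100,      // upper bound of pooled connections
            Timeout = 30,           // seconds to wait when opening a connection
            CommandTimeout = 30     // seconds before a running command times out
        };
        return builder.ConnectionString;
    }
}
```

Using the builder keeps the numeric settings type-checked and avoids typos in keyword names.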

Connection Lifecycle:

Connections created per repository method call
Returned to pool after query completion
No long-lived connections
No transaction management (read-only)

Fuzzy Search Implementation

pg_trgm Extension

Purpose: Trigram-based similarity search for fuzzy text matching

Configuration:

SET LOCAL pg_trgm.similarity_threshold = 0.5;

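Threshold: 0.5 (a similarity score of at least 0.5 is required; the pg_trgm default is 0.3). SET LOCAL lasts only until the end of the current transaction; issued outside a transaction block it produces a warning and has no effect.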

Operators:

% - Similarity operator (returns true if similarity >= threshold)
similarity(text, text) - Returns similarity score (0.0 to 1.0)

Search Query Pattern

Example (Artist Search):

SET LOCAL pg_trgm.similarity_threshold = 0.5;

SELECT 
    id,
    name,
    popularity,
    external_url,
    followers,
    genres,
    last_sync_time
FROM spotify_artist
WHERE lower(name) % lower(@searchTerm)
ORDER BY similarity(lower(name), lower(@searchTerm)) DESC
LIMIT 20 OFFSET @offset;

Key Features:

Case-insensitive matching (lower())
Similarity-based ordering (best matches first)
Pagination support (LIMIT/OFFSET)
Threshold filtering (only >= 50% similarity)
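
Because `SET LOCAL` lasts only until the end of the current transaction, the threshold statement and the search have to run inside the same transaction to be sure the setting applies. The following is a minimal sketch, assuming Npgsql and Dapper are referenced; `ArtistSearch`, `ArtistRow`, and the trimmed column list are hypothetical and not taken from the API's source.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Dapper;
using Npgsql;

// Hypothetical row type holding only the columns this sketch selects.
public sealed class ArtistRow
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
}

public static class ArtistSearch
{
    public const string ThresholdSql =
        "SET LOCAL pg_trgm.similarity_threshold = 0.5;";

    public const string SearchSql = @"
        SELECT id, name
        FROM spotify_artist
        WHERE lower(name) % lower(@searchTerm)
        ORDER BY similarity(lower(name), lower(@searchTerm)) DESC
        LIMIT 20 OFFSET @offset;";

    public static async Task<List<ArtistRow>> SearchAsync(
        string connectionString, string searchTerm, int offset)
    {
        // One pooled connection per call, returned to the pool on dispose.
        await using var connection = new NpgsqlConnection(connectionString);
        await connection.OpenAsync();

        // The threshold is scoped to this transaction and reverts when it ends.
        await using var transaction = await connection.BeginTransactionAsync();
        await connection.ExecuteAsync(ThresholdSql, transaction: transaction);
        var rows = await connection.QueryAsync<ArtistRow>(
            SearchSql, new { searchTerm, offset }, transaction);
        await transaction.CommitAsync();

        return rows.ToList();
    }
}
```

The transaction here exists only to scope the setting; the query itself remains read-only.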

Performance:

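Requires a GIN or GiST trigram index on the same expression the query filters on, lower(name)
Index creation: CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);
Query time: without the index every row is scanned and compared (O(n)); with it, the % filter is answered from the index and similarity() is computed only for the candidate rows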

Similarity Scoring

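Algorithm: Trigram overlap (shared trigrams divided by the distinct trigrams across both strings)

Example (pg_trgm lowercases each word and pads it with two leading spaces and one trailing space before extracting trigrams):

"Beatles" vs "Beetles"
Trigrams: ["  b", " be", "bea", "eat", "atl", "tle", "les", "es "] vs ["  b", " be", "bee", "eet", "etl", "tle", "les", "es "]
Overlap: ["  b", " be", "tle", "les", "es "] = 5 shared / 11 distinct = 0.45 (below threshold)

"Beatles" vs "The Beatles"
Trigrams: ["  b", " be", "bea", "eat", "atl", "tle", "les", "es "] vs ["  t", " th", "the", "he ", "  b", " be", "bea", "eat", "atl", "tle", "les", "es "]
Overlap: ["  b", " be", "bea", "eat", "atl", "tle", "les", "es "] = 8 shared / 12 distinct = 0.67 (above threshold)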

Tuning:

Lower threshold (0.3) = more results, more false positives
Higher threshold (0.7) = fewer results, more precision
Current 0.5 = balanced approach
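
To see how a threshold plays out on concrete names, the scoring can be reproduced outside the database. This sketch approximates pg_trgm's documented rules (lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide shared trigrams by distinct trigrams); it is not the extension's actual implementation, and `TrigramSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TrigramSketch
{
    // Collects the distinct three-character windows of every padded word.
    public static HashSet<string> Trigrams(string text)
    {
        // Treat every non-alphanumeric character as a word separator.
        var normalized = new string(text.ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray());

        var result = new HashSet<string>();
        foreach (var word in normalized.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            var padded = "  " + word + " ";
            for (var i = 0; i + 3 <= padded.Length; i++)
            {
                result.Add(padded.Substring(i, 3));
            }
        }
        return result;
    }

    // Shared trigrams divided by distinct trigrams across both strings.
    public static double Similarity(string a, string b)
    {
        var ta = Trigrams(a);
        var tb = Trigrams(b);
        if (ta.Count == 0 || tb.Count == 0) return 0.0;
        var shared = ta.Count(t => tb.Contains(t));
        return (double)shared / (ta.Count + tb.Count - shared);
    }

    // Mirrors the % operator: true when the score reaches the threshold.
    public static bool Matches(string a, string b, double threshold)
        => Similarity(a, b) >= threshold;
}
```

Under these rules "Beatles" vs "Beetles" scores 5/11 and "Beatles" vs "The Beatles" scores 8/12, so at a threshold of 0.3 both pairs match, at 0.5 only the second, and at 0.7 neither.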

Database Schema

Provider-Specific Tables

Each provider has its own isolated set of tables. There are no cross-provider foreign keys.

Spotify Schema

Tables:

spotify_artist - Artist metadata
spotify_artist_image - Artist images (1:N)
spotify_album - Album metadata
spotify_album_artist - Album-artist relationships (M:N)
spotify_album_image - Album artwork (1:N)
spotify_album_externalid - External identifiers (UPC, EAN) (1:N)
spotify_track - Track metadata
spotify_track_artist - Track-artist relationships (M:N)
spotify_track_externalid - External identifiers (ISRC) (1:N)

spotify_artist:

CREATE TABLE spotify_artist (
    id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    popularity INTEGER,
    external_url VARCHAR(500),
    followers INTEGER,
    genres TEXT[], -- PostgreSQL array
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);

spotify_artist_image:

CREATE TABLE spotify_artist_image (
    id SERIAL PRIMARY KEY,
    artist_id VARCHAR(255) REFERENCES spotify_artist(id),
    url VARCHAR(1000) NOT NULL,
    height INTEGER,
    width INTEGER
);

CREATE INDEX idx_spotify_artist_image_artist ON spotify_artist_image(artist_id);

spotify_album:

CREATE TABLE spotify_album (
    id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    popularity INTEGER,
    external_url VARCHAR(500),
    label VARCHAR(500),
    release_date VARCHAR(50), -- Stored as string (YYYY, YYYY-MM, or YYYY-MM-DD)
    total_tracks INTEGER,
    album_type VARCHAR(50), -- album, single, compilation
    copyright TEXT,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_spotify_album_name_trgm ON spotify_album USING gin(lower(name) gin_trgm_ops);

spotify_album_artist (junction table):

CREATE TABLE spotify_album_artist (
    id SERIAL PRIMARY KEY,
    album_id VARCHAR(255) REFERENCES spotify_album(id),
    artist_id VARCHAR(255) REFERENCES spotify_artist(id)
);

CREATE INDEX idx_spotify_album_artist_album ON spotify_album_artist(album_id);
CREATE INDEX idx_spotify_album_artist_artist ON spotify_album_artist(artist_id);

spotify_track:

CREATE TABLE spotify_track (
    id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    album_id VARCHAR(255) REFERENCES spotify_album(id),
    popularity INTEGER,
    external_url VARCHAR(500),
    duration_ms INTEGER,
    explicit BOOLEAN,
    disc_number INTEGER,
    track_number INTEGER,
    label VARCHAR(500),
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_spotify_track_name_trgm ON spotify_track USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_track_album ON spotify_track(album_id);

spotify_album_externalid:

CREATE TABLE spotify_album_externalid (
    id SERIAL PRIMARY KEY,
    album_id VARCHAR(255) REFERENCES spotify_album(id),
    type VARCHAR(50), -- upc, ean
    value VARCHAR(255)
);

CREATE INDEX idx_spotify_album_externalid_album ON spotify_album_externalid(album_id);
CREATE INDEX idx_spotify_album_externalid_value ON spotify_album_externalid(value);

spotify_track_externalid:

CREATE TABLE spotify_track_externalid (
    id SERIAL PRIMARY KEY,
    track_id VARCHAR(255) REFERENCES spotify_track(id),
    type VARCHAR(50), -- isrc
    value VARCHAR(255)
);

CREATE INDEX idx_spotify_track_externalid_track ON spotify_track_externalid(track_id);
CREATE INDEX idx_spotify_track_externalid_value ON spotify_track_externalid(value);

Tidal Schema

Tables:

tidal_artist - Artist metadata
tidal_artist_image_link - Artist image URLs (1:N)
tidal_album - Album metadata
tidal_album_external_link - External URLs (1:N)
tidal_album_image - Album artwork (1:N)
tidal_track - Track metadata
tidal_track_artist - Track-artist relationships (M:N)
tidal_track_external_link - External URLs (1:N)

Key Differences from Spotify:

ID type: INTEGER instead of VARCHAR
No popularity field
No genres field
External links instead of external IDs
Image links stored as separate table

tidal_artist:

CREATE TABLE tidal_artist (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    url VARCHAR(500),
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_tidal_artist_name_trgm ON tidal_artist USING gin(lower(name) gin_trgm_ops);

tidal_album:

CREATE TABLE tidal_album (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    artist_id INTEGER REFERENCES tidal_artist(id),
    url VARCHAR(500),
    release_date VARCHAR(50),
    total_tracks INTEGER,
    duration INTEGER, -- Total duration in seconds
    explicit BOOLEAN,
    upc VARCHAR(255),
    copyright TEXT,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_tidal_album_name_trgm ON tidal_album USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_tidal_album_artist ON tidal_album(artist_id);

MusicBrainz Schema

Tables:

musicbrainz_artist - Artist metadata
musicbrainz_release - Release (album) metadata
musicbrainz_release_label - Release-label relationships (M:N)
musicbrainz_label - Label metadata
musicbrainz_release_track - Track metadata
musicbrainz_release_track_artist - Track-artist relationships (M:N)

Key Differences:

ID type: UUID (Guid)
"Release" instead of "Album"
Sort name field for artists
Label as separate entity
No popularity or follower counts
No images (stored externally via Cover Art Archive)

musicbrainz_artist:

CREATE TABLE musicbrainz_artist (
    id UUID PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    sort_name VARCHAR(500), -- For alphabetical sorting (e.g., "Beatles, The")
    type VARCHAR(100), -- Person, Group, Orchestra, etc.
    country VARCHAR(2), -- ISO country code
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_musicbrainz_artist_name_trgm ON musicbrainz_artist USING gin(lower(name) gin_trgm_ops);

musicbrainz_release:

CREATE TABLE musicbrainz_release (
    id UUID PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    artist_id UUID REFERENCES musicbrainz_artist(id),
    release_date VARCHAR(50),
    country VARCHAR(2),
    barcode VARCHAR(255), -- Similar to UPC
    status VARCHAR(100), -- Official, Promotion, Bootleg, etc.
    packaging VARCHAR(100), -- Jewel Case, Digipak, etc.
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_musicbrainz_release_name_trgm ON musicbrainz_release USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_musicbrainz_release_artist ON musicbrainz_release(artist_id);

musicbrainz_label:

CREATE TABLE musicbrainz_label (
    id UUID PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    type VARCHAR(100), -- Original Production, Bootleg Production, etc.
    country VARCHAR(2),
    last_sync_time TIMESTAMP WITH TIME ZONE
);

musicbrainz_release_label (junction table):

CREATE TABLE musicbrainz_release_label (
    id SERIAL PRIMARY KEY,
    release_id UUID REFERENCES musicbrainz_release(id),
    label_id UUID REFERENCES musicbrainz_label(id),
    catalog_number VARCHAR(255)
);

CREATE INDEX idx_musicbrainz_release_label_release ON musicbrainz_release_label(release_id);
CREATE INDEX idx_musicbrainz_release_label_label ON musicbrainz_release_label(label_id);

Deezer Schema

Tables:

deezer_artist - Artist metadata
deezer_artist_image_link - Artist image URLs (1:N)
deezer_album - Album metadata
deezer_album_image_link - Album artwork URLs (1:N)
deezer_album_artist - Album-artist relationships (M:N)
deezer_track - Track metadata
deezer_track_artist - Track-artist relationships (M:N)

Key Differences:

ID type: BIGINT
Has popularity (called "fans")
Has genres
No UPC/ISRC fields
No label information

deezer_artist:

CREATE TABLE deezer_artist (
    id BIGINT PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    url VARCHAR(500),
    fans INTEGER, -- Similar to followers
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_deezer_artist_name_trgm ON deezer_artist USING gin(lower(name) gin_trgm_ops);

deezer_album:

CREATE TABLE deezer_album (
    id BIGINT PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    url VARCHAR(500),
    release_date VARCHAR(50),
    total_tracks INTEGER,
    duration INTEGER, -- Total duration in seconds
    explicit BOOLEAN,
    fans INTEGER,
    genres TEXT[], -- PostgreSQL array
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_deezer_album_name_trgm ON deezer_album USING gin(lower(name) gin_trgm_ops);

Discogs Schema

Tables:

discogs_artist - Artist metadata
discogs_artist_alias - Artist aliases (1:N)
discogs_artist_url - Artist URLs (1:N)
discogs_release - Release metadata
discogs_release_artist - Release-artist relationships (M:N)
discogs_release_identifier - Barcodes/identifiers (1:N)
discogs_release_track - Track metadata
discogs_label - Label metadata
discogs_label_sublabel - Label hierarchy (1:N)
discogs_label_url - Label URLs (1:N)

Key Differences:

ID type: INTEGER
Most comprehensive label data
Artist aliases tracked
Multiple identifiers per release (Barcode, Matrix, etc.)
No popularity metrics
No image URLs (stored externally)

discogs_artist:

CREATE TABLE discogs_artist (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    real_name VARCHAR(500), -- For pseudonyms
    profile TEXT, -- Biography
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_discogs_artist_name_trgm ON discogs_artist USING gin(lower(name) gin_trgm_ops);

discogs_artist_alias:

CREATE TABLE discogs_artist_alias (
    id SERIAL PRIMARY KEY,
    artist_id INTEGER REFERENCES discogs_artist(id),
    alias_name VARCHAR(500)
);

CREATE INDEX idx_discogs_artist_alias_artist ON discogs_artist_alias(artist_id);
CREATE INDEX idx_discogs_artist_alias_name_trgm ON discogs_artist_alias USING gin(lower(alias_name) gin_trgm_ops);

discogs_release:

CREATE TABLE discogs_release (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    released VARCHAR(50),
    country VARCHAR(100),
    notes TEXT,
    genres TEXT[],
    styles TEXT[], -- More specific than genres
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_discogs_release_name_trgm ON discogs_release USING gin(lower(name) gin_trgm_ops);

discogs_release_identifier:

CREATE TABLE discogs_release_identifier (
    id SERIAL PRIMARY KEY,
    release_id INTEGER REFERENCES discogs_release(id),
    type VARCHAR(100), -- Barcode, Matrix/Runout, Label Code, etc.
    value VARCHAR(500),
    description TEXT
);

CREATE INDEX idx_discogs_release_identifier_release ON discogs_release_identifier(release_id);
CREATE INDEX idx_discogs_release_identifier_value ON discogs_release_identifier(value);

discogs_label:

CREATE TABLE discogs_label (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    contact_info TEXT,
    profile TEXT,
    parent_label_id INTEGER REFERENCES discogs_label(id),
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_discogs_label_name_trgm ON discogs_label USING gin(lower(name) gin_trgm_ops);

SoundCloud Schema

Tables:

soundcloud_user - User/artist metadata
soundcloud_playlist - Playlist metadata
soundcloud_track - Track metadata
soundcloud_track_artist - Track-artist relationships (M:N)

Key Differences:

"User" instead of "Artist" (user-generated content platform)
Playlist as first-class entity
No album concept
Minimal metadata (no UPC, ISRC, labels)
ID type: BIGINT

soundcloud_user:

CREATE TABLE soundcloud_user (
    id BIGINT PRIMARY KEY,
    username VARCHAR(500) NOT NULL,
    full_name VARCHAR(500),
    url VARCHAR(500),
    avatar_url VARCHAR(1000),
    followers_count INTEGER,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_soundcloud_user_username_trgm ON soundcloud_user USING gin(lower(username) gin_trgm_ops);

soundcloud_playlist:

CREATE TABLE soundcloud_playlist (
    id BIGINT PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    user_id BIGINT REFERENCES soundcloud_user(id),
    url VARCHAR(500),
    artwork_url VARCHAR(1000),
    duration INTEGER, -- Total duration in milliseconds
    track_count INTEGER,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_soundcloud_playlist_title_trgm ON soundcloud_playlist USING gin(lower(title) gin_trgm_ops);
CREATE INDEX idx_soundcloud_playlist_user ON soundcloud_playlist(user_id);

soundcloud_track:

CREATE TABLE soundcloud_track (
    id BIGINT PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    user_id BIGINT REFERENCES soundcloud_user(id),
    url VARCHAR(500),
    artwork_url VARCHAR(1000),
    duration INTEGER, -- Duration in milliseconds
    genre VARCHAR(255),
    playback_count INTEGER,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_soundcloud_track_title_trgm ON soundcloud_track USING gin(lower(title) gin_trgm_ops);
CREATE INDEX idx_soundcloud_track_user ON soundcloud_track(user_id);

ID Type Comparison

Provider	Artist ID	Album ID	Track ID	Notes
Spotify	VARCHAR(255)	VARCHAR(255)	VARCHAR(255)	Base62 encoded (22 chars)
Tidal	INTEGER	INTEGER	INTEGER	Sequential integers
MusicBrainz	UUID	UUID	UUID	RFC 4122 UUIDs
Deezer	BIGINT	BIGINT	BIGINT	Large integers
Discogs	INTEGER	INTEGER	INTEGER	Sequential integers
SoundCloud	BIGINT	N/A	BIGINT	No album concept

Implications:

Cross-provider lookups by ID are impossible: each provider's IDs have their own type and namespace
ID parameter must match provider type
C# models use provider-specific types
No universal identifier system
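
Since the ID parameter must match the provider's column type, an incoming string has to be converted before it reaches a query. A minimal sketch of such a conversion, assuming IDs arrive as strings; the `Provider` enum and `ProviderId` helper are hypothetical, and the Spotify check relies only on the 22-character Base62 shape noted in the table above.

```csharp
using System;
using System.Globalization;
using System.Linq;

public enum Provider { Spotify, Tidal, MusicBrainz, Deezer, Discogs, SoundCloud }

public static class ProviderId
{
    // Returns the ID converted to the provider's column type, or null if malformed.
    public static object? Parse(Provider provider, string raw)
    {
        switch (provider)
        {
            case Provider.Spotify: // VARCHAR: 22 Base62 characters
                return raw.Length == 22 && raw.All(IsBase62) ? raw : null;

            case Provider.Tidal:
            case Provider.Discogs: // INTEGER
                return int.TryParse(raw, NumberStyles.None,
                    CultureInfo.InvariantCulture, out var i) ? (object)i : null;

            case Provider.Deezer:
            case Provider.SoundCloud: // BIGINT
                return long.TryParse(raw, NumberStyles.None,
                    CultureInfo.InvariantCulture, out var l) ? (object)l : null;

            case Provider.MusicBrainz: // UUID
                return Guid.TryParse(raw, out var g) ? (object)g : null;

            default:
                return null;
        }
    }

    private static bool IsBase62(char c) =>
        (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
```

Rejecting malformed IDs before querying avoids type-mismatch errors from PostgreSQL and lets the API answer with a client error instead.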

Data Type Patterns

Arrays (PostgreSQL Native)

Usage: Genres, styles (external IDs are stored in separate tables, not arrays)

Example:

genres TEXT[] -- e.g. '{rock,pop,alternative}'

Dapper Mapping:

public class SpotifyArtist
{
    public string[] Genres { get; set; } // Npgsql reads text[] as string[], which Dapper assigns directly
}

Timestamps

Type: TIMESTAMP WITH TIME ZONE
Purpose: Track last sync time from provider

Example:

last_sync_time TIMESTAMP WITH TIME ZONE DEFAULT NOW()

C# Mapping:

public DateTime? LastSyncTime { get; set; } // Npgsql 6+ returns timestamptz values as UTC DateTime

Variable-Length Dates

Type: VARCHAR(50)
Formats: YYYY, YYYY-MM, YYYY-MM-DD

Rationale: Providers return different precision levels

Examples:

"1969" - Year only
"1969-09" - Year and month
"1969-09-26" - Full date

C# Mapping:

public string ReleaseDate { get; set; } // Stored as string, parsed in application
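
Parsing in the application means trying each of the three formats and remembering which one matched, so that a year-only value is not later mistaken for January 1st. A minimal sketch under that assumption; `ReleaseDateParser` and `DatePrecision` are hypothetical names, not types from the API.

```csharp
using System;
using System.Globalization;

public enum DatePrecision { Year, Month, Day }

public static class ReleaseDateParser
{
    // Most specific format first, so a full date is never read as a year.
    private static readonly (string Format, DatePrecision Precision)[] Formats =
    {
        ("yyyy-MM-dd", DatePrecision.Day),
        ("yyyy-MM", DatePrecision.Month),
        ("yyyy", DatePrecision.Year),
    };

    // Returns false for null, empty, or unrecognized values.
    public static bool TryParse(string? value, out DateTime date, out DatePrecision precision)
    {
        foreach (var (format, candidate) in Formats)
        {
            if (DateTime.TryParseExact(value, format, CultureInfo.InvariantCulture,
                    DateTimeStyles.None, out date))
            {
                precision = candidate;
                return true;
            }
        }

        date = default;
        precision = default;
        return false;
    }
}
```

Missing components default to the first month or day, so "1969-09" parses to 1969-09-01 with `DatePrecision.Month`.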

Query Patterns

Artist Search

SET LOCAL pg_trgm.similarity_threshold = 0.5;

SELECT 
    a.id,
    a.name,
    a.popularity,
    a.external_url,
    a.followers,
    a.genres,
    a.last_sync_time,
    i.url AS image_url,
    i.height AS image_height,
    i.width AS image_width
FROM spotify_artist a
LEFT JOIN spotify_artist_image i ON a.id = i.artist_id
WHERE lower(a.name) % lower(@searchTerm)
ORDER BY similarity(lower(a.name), lower(@searchTerm)) DESC
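-- Note: LIMIT/OFFSET count joined rows, so an artist with several images uses several of the 20 slots
LIMIT 20 OFFSET @offset;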

Dapper Mapping:

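// Requires: using Dapper; (QueryAsync extension) and using System.Linq; (ToList)
// Collapses the joined rows (one per artist-image pair) into one artist per ID.
var artistDict = new Dictionary<string, SpotifyArtist>();

var results = await connection.QueryAsync<SpotifyArtist, SpotifyArtistImage, SpotifyArtist>(
    sql,
    (artist, image) =>
    {
        // First row for this artist: keep the instance and start its image list.
        if (!artistDict.TryGetValue(artist.Id, out var existingArtist))
        {
            existingArtist = artist;
            existingArtist.Images = new List<SpotifyArtistImage>();
            artistDict.Add(artist.Id, existingArtist);
        }

        // The LEFT JOIN yields a null image for artists without images.
        if (image != null)
        {
            existingArtist.Images.Add(image);
        }

        return existingArtist;
    },
    new { searchTerm, offset },
    splitOn: "image_url" // first column of the SpotifyArtistImage segment
);

// Dictionary enumeration order is not guaranteed; re-sort if similarity order matters.
return artistDict.Values.ToList();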

Album with Artists

SELECT 
    a.id,
    a.name,
    a.popularity,
    a.external_url,
    a.label,
    a.release_date,
    a.total_tracks,
    a.album_type,
    a.copyright,
    a.last_sync_time,
    ar.id AS artist_id,
    ar.name AS artist_name
FROM spotify_album a
LEFT JOIN spotify_album_artist aa ON a.id = aa.album_id
LEFT JOIN spotify_artist ar ON aa.artist_id = ar.id
WHERE a.id = @albumId;

Multi-Mapping: Album with nested artist list.
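
The query returns one row per album-artist pair, so the nested artist list has to be rebuilt by grouping those rows. A minimal sketch of that grouping step in plain LINQ, independent of Dapper; `AlbumArtistRow`, `ArtistRef`, `Album`, and `AlbumMapper` are hypothetical stand-ins for the API's models, reduced to a few columns.

```csharp
using System.Collections.Generic;
using System.Linq;

// One flat row as returned by the join (artist columns are null without a match).
public sealed record AlbumArtistRow(string Id, string Name, string? ArtistId, string? ArtistName);
public sealed record ArtistRef(string Id, string Name);
public sealed record Album(string Id, string Name, IReadOnlyList<ArtistRef> Artists);

public static class AlbumMapper
{
    // Folds all rows of a single album into one object; null when no row exists.
    public static Album? FromRows(IEnumerable<AlbumArtistRow> rows)
    {
        var list = rows.ToList();
        if (list.Count == 0) return null;

        var artists = list
            .Where(r => r.ArtistId != null) // LEFT JOIN: album without artists
            .Select(r => new ArtistRef(r.ArtistId!, r.ArtistName ?? ""))
            .Distinct()                     // records compare by value
            .ToList();

        return new Album(list[0].Id, list[0].Name, artists);
    }
}
```

The album columns repeat on every row, so they are read from the first row only.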

Track with Album and Artists

SELECT 
    t.id,
    t.name,
    t.popularity,
    t.external_url,
    t.duration_ms,
    t.explicit,
    t.disc_number,
    t.track_number,
    t.label,
    t.last_sync_time,
    a.id AS album_id,
    a.name AS album_name,
    a.release_date AS album_release_date,
    ar.id AS artist_id,
    ar.name AS artist_name
FROM spotify_track t
LEFT JOIN spotify_album a ON t.album_id = a.id
LEFT JOIN spotify_track_artist ta ON t.id = ta.track_id
LEFT JOIN spotify_artist ar ON ta.artist_id = ar.id
WHERE t.id = @trackId;

Multi-Mapping: Track with nested album and artist list.

External ID Lookup

SELECT 
    a.id,
    a.name,
    a.popularity,
    a.external_url,
    a.label,
    a.release_date,
    a.total_tracks,
    a.album_type,
    a.last_sync_time
FROM spotify_album a
INNER JOIN spotify_album_externalid e ON a.id = e.album_id
WHERE e.type = 'upc' AND e.value = @upc;

Use Case: Find album by UPC barcode.

Index Strategy

Required Indexes

Fuzzy Search (GIN trigram):

CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_album_name_trgm ON spotify_album USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_track_name_trgm ON spotify_track USING gin(lower(name) gin_trgm_ops);

Foreign Keys:

CREATE INDEX idx_spotify_album_artist_album ON spotify_album_artist(album_id);
CREATE INDEX idx_spotify_album_artist_artist ON spotify_album_artist(artist_id);
CREATE INDEX idx_spotify_track_album ON spotify_track(album_id);
CREATE INDEX idx_spotify_track_artist_track ON spotify_track_artist(track_id);
CREATE INDEX idx_spotify_track_artist_artist ON spotify_track_artist(artist_id);

External IDs:

CREATE INDEX idx_spotify_album_externalid_value ON spotify_album_externalid(value);
CREATE INDEX idx_spotify_track_externalid_value ON spotify_track_externalid(value);

Index Maintenance

Owned by: MiniMediaScanner (schema owner)
API Responsibility: None (read-only consumer)

Performance Impact:

GIN indexes: Large (the 2-3x table size figure is a rough estimate), slow writes, fast reads
B-tree indexes: Moderate size, fast writes, fast reads
No index = full table scan (unacceptable for fuzzy search)

Data Freshness

Sync Mechanism: MiniMediaScanner polls provider APIs

Sync Frequency: Unknown (configured in MiniMediaScanner)

Staleness Indicator: last_sync_time column

API Behavior:

Returns whatever data exists in database
No real-time provider API calls
No cache invalidation
No sync triggering

Client Considerations:

Check lastSyncTime in response
Stale data possible (hours to days old)
No guarantee of completeness
Provider outages affect sync, not queries
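
On the client side, the `lastSyncTime` check amounts to comparing the timestamp against an acceptable age. A minimal sketch, assuming the value is deserialized as a nullable UTC `DateTime`; `Freshness` and the choice of maximum age are hypothetical.

```csharp
using System;

public static class Freshness
{
    // True when the record was never synced or is older than maxAge.
    public static bool IsStale(DateTime? lastSyncTimeUtc, TimeSpan maxAge, DateTime utcNow)
    {
        if (lastSyncTimeUtc is null) return true; // never synced: treat as stale
        return utcNow - lastSyncTimeUtc.Value > maxAge;
    }
}
```

Passing the current time in as a parameter keeps the check deterministic in tests; callers would normally supply `DateTime.UtcNow`.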

Provider Feature Matrix

Feature	Spotify	Tidal	MusicBrainz	Deezer	Discogs	SoundCloud
Artist Data
Popularity	✓	✗	✗	✓ (fans)	✗	✗
Followers	✓	✗	✗	✓ (fans)	✗	✓
Genres	✓	✗	✓	✓	✗	✗
Images	✓	✓	✗	✓	✗	✓ (avatar)
Sort Name	✗	✗	✓	✗	✗	✗
Aliases	✗	✗	✗	✗	✓	✗
Album Data
Popularity	✓	✗	✗	✓ (fans)	✗	N/A
Images	✓	✓	✗	✓	✗	N/A
Label	✓	✗	✓	✗	✓	N/A
UPC	✓	✓	✗	✗	✓	N/A
Copyright	✓	✓	✗	✗	✗	N/A
Album Type	✓	✗	✓	✗	✗	N/A
Track Data
Popularity	✓	✗	✗	✓	✗	✓ (playback_count)
Duration	✓	✓	✓	✓	✓	✓
Explicit	✓	✓	✗	✓	✗	✗
ISRC	✓	✓	✓	✗	✗	✗
Disc/Track #	✓	✓	✓	✓	✓	✗

Database Size Estimates

Assumptions:

1 million artists
10 million albums
100 million tracks

Spotify Tables:

spotify_artist: ~500 MB
spotify_artist_image: ~200 MB
spotify_album: ~5 GB
spotify_album_artist: ~1 GB
spotify_album_image: ~2 GB
spotify_track: ~50 GB
spotify_track_artist: ~10 GB
Total: ~70 GB per provider

All Providers: ~420 GB (6 providers)

Indexes: ~200 GB (GIN indexes are large)

Total Database: ~620 GB for comprehensive catalog
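
The totals above follow from simple arithmetic: the listed Spotify tables sum to 68.7 GB, which the estimate rounds to ~70 GB per provider, and 6 × 70 GB plus 200 GB of indexes gives 620 GB. The sketch below just reproduces that calculation from the figures in the list; `SizeEstimate` is a hypothetical name.

```csharp
using System.Linq;

public static class SizeEstimate
{
    // Per-table figures in GB, copied from the Spotify list above.
    public static readonly (string Table, double Gb)[] SpotifyTables =
    {
        ("spotify_artist", 0.5),
        ("spotify_artist_image", 0.2),
        ("spotify_album", 5.0),
        ("spotify_album_artist", 1.0),
        ("spotify_album_image", 2.0),
        ("spotify_track", 50.0),
        ("spotify_track_artist", 10.0),
    };

    // Sum of the listed tables, before rounding.
    public static double SpotifyTablesGb() => SpotifyTables.Sum(t => t.Gb);

    // Table data for all providers plus the index estimate.
    public static double TotalGb(double perProviderGb, int providers, double indexGb)
        => perProviderGb * providers + indexGb;
}
```

The estimate assumes every provider is as large as Spotify, so it is an upper-bound style figure rather than a measurement.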

Implications:

Requires substantial storage
Backup/restore time significant
Index rebuilds time-consuming
Connection pooling critical

Performance Considerations

Query Performance

Fuzzy Search:

With GIN index: 10-50ms for 20 results
Without index: 5-30 seconds (full table scan)
Threshold tuning affects result count and speed

ID Lookup:

With primary key: <1ms
With foreign key index: 1-5ms

Join Queries:

Album with artists: 5-20ms
Track with album and artists: 10-30ms
Depends on relationship cardinality

Optimization Strategies

Implemented:

GIN indexes for fuzzy search
B-tree indexes for foreign keys
Connection pooling
Parameterized queries (SQL injection prevention)

Missing:

Query result caching (Redis/Memcached)
Materialized views for complex joins
Partitioning for large tables
Read replicas for horizontal scaling

Bottlenecks

GIN Index Size: Large memory footprint
Fuzzy Search: CPU-intensive similarity calculations
Multi-Provider Queries: 6 parallel database queries
No Caching: Every request hits database
Connection Pool Limit: 100 max connections per instance

Data Integrity

Constraints:

Primary keys on all entity tables
Foreign keys for relationships
NOT NULL on critical fields (id, name)

No Constraints:

No unique constraints on names (duplicates allowed)
No check constraints on data ranges
No triggers for data validation

Orphan Prevention:

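Foreign keys are assumed to cascade on delete, but the DDL above declares plain REFERENCES without ON DELETE CASCADE; under PostgreSQL's default (NO ACTION), deleting a parent row that still has child rows fails instead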
Junction tables maintain referential integrity

Data Quality:

Depends entirely on MiniMediaScanner sync quality
No validation in this API
Garbage in, garbage out

Backup and Recovery

Responsibility: Database administrator (not API)

Recommended Strategy:

Daily full backups
Continuous WAL archiving
Point-in-time recovery capability
Backup retention: 30 days

Recovery Time:

Full restore: Hours (620 GB database)
Index rebuild: Hours (GIN indexes)
Sync from providers: Days to weeks

Schema Evolution

Change Process:

MiniMediaScanner updates schema
MiniMediaScanner deploys migration
MiniMediaMetadataAPI updates models
MiniMediaMetadataAPI redeploys

Risk: Breaking changes require coordinated deployment.

Mitigation:

Additive changes only (new columns, tables)
Deprecation period for removals
Version compatibility checks
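
One way to implement a version compatibility check is to compare, at startup, the columns the API's queries rely on against the columns that actually exist (for example, as read from PostgreSQL's `information_schema.columns` view). The comparison itself is sketched below; `SchemaCheck` and the shape of its inputs are hypothetical, and loading the actual columns from the database is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SchemaCheck
{
    // Returns "table.column" entries the API expects but the database lacks.
    public static List<string> MissingColumns(
        IReadOnlyDictionary<string, string[]> expected,
        IEnumerable<(string Table, string Column)> actual)
    {
        var present = new HashSet<(string, string)>(actual);

        return expected
            .SelectMany(kv => kv.Value.Select(col => (Table: kv.Key, Column: col)))
            .Where(tc => !present.Contains(tc))
            .Select(tc => $"{tc.Table}.{tc.Column}")
            .OrderBy(s => s, StringComparer.Ordinal)
            .ToList();
    }
}
```

A non-empty result at startup signals that MiniMediaScanner's schema has moved ahead of (or behind) the models, before the first query fails at runtime.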

No Automated Migration: API has no migration framework.
