MiniMediaMetadataAPI - Data Layer Analysis
Database Technology
RDBMS: PostgreSQL
Driver: Npgsql 10.0.2
ORM: Dapper 2.1.72 (micro-ORM)
Extensions: pg_trgm (trigram similarity search)
Schema Ownership
Critical Constraint: This API does NOT own the database schema.
Schema Owner: MiniMediaScanner (separate project)
API Role: Read-only consumer
Migration Strategy: None (schema managed externally)
Implications
Pros:
- Clear separation of concerns
- API doesn't need provider API credentials
- Simpler deployment (no migration coordination)
- Sync complexity isolated in MiniMediaScanner
Cons:
- No control over schema evolution
- Breaking changes in MiniMediaScanner break API
- Can't optimize schema for query patterns
- Data freshness depends on external sync schedule
Coupling Points:
- Table names hardcoded in SQL queries
- Column names hardcoded in Dapper mappings
- Foreign key relationships assumed in joins
- Data types must match C# model properties
Connection Configuration
Connection String Format:
Host=localhost;
Database=minimediametadata;
Username=postgres;
Password=password;
MinPoolSize=5;
MaxPoolSize=100;
Timeout=30;
CommandTimeout=30;
Pooling Settings:
- MinPoolSize: 5 connections kept alive
- MaxPoolSize: 100 concurrent connections
- Timeout: 30 seconds to acquire connection
- CommandTimeout: 30 seconds for query execution
Connection Lifecycle:
- Connections created per repository method call
- Returned to pool after query completion
- No long-lived connections
- No transaction management (read-only)
Fuzzy Search Implementation
pg_trgm Extension
Purpose: Trigram-based similarity search for fuzzy text matching
Configuration:
SET LOCAL pg_trgm.similarity_threshold = 0.5;
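Threshold: 0.5 (50% similarity required). SET LOCAL lasts only until the end of the current transaction, so it must run in the same transaction as the search query.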
Operators:
- % - Similarity operator (returns true if similarity >= threshold)
- similarity(text, text) - Returns similarity score (0.0 to 1.0)
Search Query Pattern
Example (Artist Search):
SET LOCAL pg_trgm.similarity_threshold = 0.5;
SELECT
id,
name,
popularity,
external_url,
followers,
genres,
last_sync_time
FROM spotify_artist
WHERE lower(name) % lower(@searchTerm)
ORDER BY similarity(lower(name), lower(@searchTerm)) DESC
LIMIT 20 OFFSET @offset;
Key Features:
- Case-insensitive matching (lower())
- Similarity-based ordering (best matches first)
- Pagination support (LIMIT/OFFSET)
- Threshold filtering (only >= 50% similarity)
Performance:
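- Requires a GIN or GiST trigram index on the same lower(name) expression the query uses
- Index creation:
CREATE INDEX idx_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);
- Query time: sublinear with the index (candidate rows are located by trigram lookup and then rechecked), O(n) sequential scan without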
Similarity Scoring
Algorithm: Trigram overlap
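Example:
pg_trgm lowercases each word and pads it with two leading spaces and one trailing space before extracting trigrams. Similarity is the number of shared trigrams divided by the number of distinct trigrams across both strings.
"Beatles" vs "Beetles"
Trigrams: ["  b", " be", "bea", "eat", "atl", "tle", "les", "es "] vs ["  b", " be", "bee", "eet", "etl", "tle", "les", "es "]
Overlap: ["  b", " be", "tle", "les", "es "] = 5 shared / 11 distinct = 0.45 (below threshold)
"Beatles" vs "The Beatles"
Trigrams: ["  b", " be", "bea", "eat", "atl", "tle", "les", "es "] vs ["  t", " th", "the", "he ", "  b", " be", "bea", "eat", "atl", "tle", "les", "es "]
Overlap: ["  b", " be", "bea", "eat", "atl", "tle", "les", "es "] = 8 shared / 12 distinct = 0.67 (above threshold)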
Tuning:
- Lower threshold (0.3) = more results, more false positives
- Higher threshold (0.7) = fewer results, more precision
- Current 0.5 = balanced approach
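The scoring and threshold behaviour can be approximated outside the database. The sketch below is written in Python for brevity (the API itself is C#) and only approximates pg_trgm: it assumes words are split on anything that is not a letter or digit, and the function names `trigrams`, `similarity`, and `search` are illustrative, not part of the project.

```python
import re


def trigrams(text: str) -> set:
    """Return an approximation of the trigram set pg_trgm builds for `text`."""
    grams = set()
    # Lowercase, then treat every run of letters/digits as one word.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search(names, term, threshold=0.5, limit=20, offset=0):
    """Mimic the WHERE / ORDER BY / LIMIT shape of the search query."""
    scored = [(similarity(name, term), name) for name in names]
    hits = [pair for pair in scored if pair[0] >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # best matches first
    return [name for _, name in hits[offset:offset + limit]]


if __name__ == "__main__":
    catalog = ["The Beatles", "Beetles", "Beatles"]
    print(search(catalog, "beatles"))                 # default threshold 0.5
    print(search(catalog, "beatles", threshold=0.3))  # looser threshold
```

Running the same catalog through two thresholds makes the trade-off in the tuning list visible: a lower threshold admits near-misses that the default filters out.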
Database Schema
Provider-Specific Tables
Each provider has isolated table structure. No cross-provider foreign keys.
Spotify Schema
Tables:
- spotify_artist - Artist metadata
- spotify_artist_image - Artist images (1:N)
- spotify_album - Album metadata
- spotify_album_artist - Album-artist relationships (M:N)
- spotify_album_image - Album artwork (1:N)
- spotify_album_externalid - External identifiers (UPC, EAN) (1:N)
- spotify_track - Track metadata
- spotify_track_artist - Track-artist relationships (M:N)
- spotify_track_externalid - External identifiers (ISRC) (1:N)
spotify_artist:
CREATE TABLE spotify_artist (
id VARCHAR(255) PRIMARY KEY,
name VARCHAR(500) NOT NULL,
popularity INTEGER,
external_url VARCHAR(500),
followers INTEGER,
genres TEXT[], -- PostgreSQL array
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);
spotify_artist_image:
CREATE TABLE spotify_artist_image (
id SERIAL PRIMARY KEY,
artist_id VARCHAR(255) REFERENCES spotify_artist(id),
url VARCHAR(1000) NOT NULL,
height INTEGER,
width INTEGER
);
CREATE INDEX idx_spotify_artist_image_artist ON spotify_artist_image(artist_id);
spotify_album:
CREATE TABLE spotify_album (
id VARCHAR(255) PRIMARY KEY,
name VARCHAR(500) NOT NULL,
popularity INTEGER,
external_url VARCHAR(500),
label VARCHAR(500),
release_date VARCHAR(50), -- Stored as string (YYYY, YYYY-MM, or YYYY-MM-DD)
total_tracks INTEGER,
album_type VARCHAR(50), -- album, single, compilation
copyright TEXT,
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_spotify_album_name_trgm ON spotify_album USING gin(lower(name) gin_trgm_ops);
spotify_album_artist (junction table):
CREATE TABLE spotify_album_artist (
id SERIAL PRIMARY KEY,
album_id VARCHAR(255) REFERENCES spotify_album(id),
artist_id VARCHAR(255) REFERENCES spotify_artist(id)
);
CREATE INDEX idx_spotify_album_artist_album ON spotify_album_artist(album_id);
CREATE INDEX idx_spotify_album_artist_artist ON spotify_album_artist(artist_id);
spotify_track:
CREATE TABLE spotify_track (
id VARCHAR(255) PRIMARY KEY,
name VARCHAR(500) NOT NULL,
album_id VARCHAR(255) REFERENCES spotify_album(id),
popularity INTEGER,
external_url VARCHAR(500),
duration_ms INTEGER,
explicit BOOLEAN,
disc_number INTEGER,
track_number INTEGER,
label VARCHAR(500),
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_spotify_track_name_trgm ON spotify_track USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_track_album ON spotify_track(album_id);
spotify_album_externalid:
CREATE TABLE spotify_album_externalid (
id SERIAL PRIMARY KEY,
album_id VARCHAR(255) REFERENCES spotify_album(id),
type VARCHAR(50), -- upc, ean
value VARCHAR(255)
);
CREATE INDEX idx_spotify_album_externalid_album ON spotify_album_externalid(album_id);
CREATE INDEX idx_spotify_album_externalid_value ON spotify_album_externalid(value);
spotify_track_externalid:
CREATE TABLE spotify_track_externalid (
id SERIAL PRIMARY KEY,
track_id VARCHAR(255) REFERENCES spotify_track(id),
type VARCHAR(50), -- isrc
value VARCHAR(255)
);
CREATE INDEX idx_spotify_track_externalid_track ON spotify_track_externalid(track_id);
CREATE INDEX idx_spotify_track_externalid_value ON spotify_track_externalid(value);
Tidal Schema
Tables:
- tidal_artist - Artist metadata
- tidal_artist_image_link - Artist image URLs (1:N)
- tidal_album - Album metadata
- tidal_album_external_link - External URLs (1:N)
- tidal_album_image - Album artwork (1:N)
- tidal_track - Track metadata
- tidal_track_artist - Track-artist relationships (M:N)
- tidal_track_external_link - External URLs (1:N)
Key Differences from Spotify:
- ID type: INTEGER instead of VARCHAR
- No popularity field
- No genres field
- External links instead of external IDs
- Image links stored as separate table
tidal_artist:
CREATE TABLE tidal_artist (
id INTEGER PRIMARY KEY,
name VARCHAR(500) NOT NULL,
url VARCHAR(500),
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_tidal_artist_name_trgm ON tidal_artist USING gin(lower(name) gin_trgm_ops);
tidal_album:
CREATE TABLE tidal_album (
id INTEGER PRIMARY KEY,
name VARCHAR(500) NOT NULL,
artist_id INTEGER REFERENCES tidal_artist(id),
url VARCHAR(500),
release_date VARCHAR(50),
total_tracks INTEGER,
duration INTEGER, -- Total duration in seconds
explicit BOOLEAN,
upc VARCHAR(255),
copyright TEXT,
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_tidal_album_name_trgm ON tidal_album USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_tidal_album_artist ON tidal_album(artist_id);
MusicBrainz Schema
Tables:
- musicbrainz_artist - Artist metadata
- musicbrainz_release - Release (album) metadata
- musicbrainz_release_label - Release-label relationships (M:N)
- musicbrainz_label - Label metadata
- musicbrainz_release_track - Track metadata
- musicbrainz_release_track_artist - Track-artist relationships (M:N)
Key Differences:
- ID type: UUID (Guid)
- "Release" instead of "Album"
- Sort name field for artists
- Label as separate entity
- No popularity or follower counts
- No images (stored externally via Cover Art Archive)
musicbrainz_artist:
CREATE TABLE musicbrainz_artist (
id UUID PRIMARY KEY,
name VARCHAR(500) NOT NULL,
sort_name VARCHAR(500), -- For alphabetical sorting (e.g., "Beatles, The")
type VARCHAR(100), -- Person, Group, Orchestra, etc.
country VARCHAR(2), -- ISO country code
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_musicbrainz_artist_name_trgm ON musicbrainz_artist USING gin(lower(name) gin_trgm_ops);
musicbrainz_release:
CREATE TABLE musicbrainz_release (
id UUID PRIMARY KEY,
name VARCHAR(500) NOT NULL,
artist_id UUID REFERENCES musicbrainz_artist(id),
release_date VARCHAR(50),
country VARCHAR(2),
barcode VARCHAR(255), -- Similar to UPC
status VARCHAR(100), -- Official, Promotion, Bootleg, etc.
packaging VARCHAR(100), -- Jewel Case, Digipak, etc.
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_musicbrainz_release_name_trgm ON musicbrainz_release USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_musicbrainz_release_artist ON musicbrainz_release(artist_id);
musicbrainz_label:
CREATE TABLE musicbrainz_label (
id UUID PRIMARY KEY,
name VARCHAR(500) NOT NULL,
type VARCHAR(100), -- Original Production, Bootleg Production, etc.
country VARCHAR(2),
last_sync_time TIMESTAMP WITH TIME ZONE
);
musicbrainz_release_label (junction table):
CREATE TABLE musicbrainz_release_label (
id SERIAL PRIMARY KEY,
release_id UUID REFERENCES musicbrainz_release(id),
label_id UUID REFERENCES musicbrainz_label(id),
catalog_number VARCHAR(255)
);
CREATE INDEX idx_musicbrainz_release_label_release ON musicbrainz_release_label(release_id);
CREATE INDEX idx_musicbrainz_release_label_label ON musicbrainz_release_label(label_id);
Deezer Schema
Tables:
- deezer_artist - Artist metadata
- deezer_artist_image_link - Artist image URLs (1:N)
- deezer_album - Album metadata
- deezer_album_image_link - Album artwork URLs (1:N)
- deezer_album_artist - Album-artist relationships (M:N)
- deezer_track - Track metadata
- deezer_track_artist - Track-artist relationships (M:N)
Key Differences:
- ID type: BIGINT
- Has popularity (called "fans")
- Has genres
- No UPC/ISRC fields
- No label information
deezer_artist:
CREATE TABLE deezer_artist (
id BIGINT PRIMARY KEY,
name VARCHAR(500) NOT NULL,
url VARCHAR(500),
fans INTEGER, -- Similar to followers
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_deezer_artist_name_trgm ON deezer_artist USING gin(lower(name) gin_trgm_ops);
deezer_album:
CREATE TABLE deezer_album (
id BIGINT PRIMARY KEY,
name VARCHAR(500) NOT NULL,
url VARCHAR(500),
release_date VARCHAR(50),
total_tracks INTEGER,
duration INTEGER, -- Total duration in seconds
explicit BOOLEAN,
fans INTEGER,
genres TEXT[], -- PostgreSQL array
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_deezer_album_name_trgm ON deezer_album USING gin(lower(name) gin_trgm_ops);
Discogs Schema
Tables:
- discogs_artist - Artist metadata
- discogs_artist_alias - Artist aliases (1:N)
- discogs_artist_url - Artist URLs (1:N)
- discogs_release - Release metadata
- discogs_release_artist - Release-artist relationships (M:N)
- discogs_release_identifier - Barcodes/identifiers (1:N)
- discogs_release_track - Track metadata
- discogs_label - Label metadata
- discogs_label_sublabel - Label hierarchy (1:N)
- discogs_label_url - Label URLs (1:N)
Key Differences:
- ID type: INTEGER
- Most comprehensive label data
- Artist aliases tracked
- Multiple identifiers per release (Barcode, Matrix, etc.)
- No popularity metrics
- No image URLs (stored externally)
discogs_artist:
CREATE TABLE discogs_artist (
id INTEGER PRIMARY KEY,
name VARCHAR(500) NOT NULL,
real_name VARCHAR(500), -- For pseudonyms
profile TEXT, -- Biography
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_discogs_artist_name_trgm ON discogs_artist USING gin(lower(name) gin_trgm_ops);
discogs_artist_alias:
CREATE TABLE discogs_artist_alias (
id SERIAL PRIMARY KEY,
artist_id INTEGER REFERENCES discogs_artist(id),
alias_name VARCHAR(500)
);
CREATE INDEX idx_discogs_artist_alias_artist ON discogs_artist_alias(artist_id);
CREATE INDEX idx_discogs_artist_alias_name_trgm ON discogs_artist_alias USING gin(lower(alias_name) gin_trgm_ops);
discogs_release:
CREATE TABLE discogs_release (
id INTEGER PRIMARY KEY,
name VARCHAR(500) NOT NULL,
released VARCHAR(50),
country VARCHAR(100),
notes TEXT,
genres TEXT[],
styles TEXT[], -- More specific than genres
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_discogs_release_name_trgm ON discogs_release USING gin(lower(name) gin_trgm_ops);
discogs_release_identifier:
CREATE TABLE discogs_release_identifier (
id SERIAL PRIMARY KEY,
release_id INTEGER REFERENCES discogs_release(id),
type VARCHAR(100), -- Barcode, Matrix/Runout, Label Code, etc.
value VARCHAR(500),
description TEXT
);
CREATE INDEX idx_discogs_release_identifier_release ON discogs_release_identifier(release_id);
CREATE INDEX idx_discogs_release_identifier_value ON discogs_release_identifier(value);
discogs_label:
CREATE TABLE discogs_label (
id INTEGER PRIMARY KEY,
name VARCHAR(500) NOT NULL,
contact_info TEXT,
profile TEXT,
parent_label_id INTEGER REFERENCES discogs_label(id),
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_discogs_label_name_trgm ON discogs_label USING gin(lower(name) gin_trgm_ops);
SoundCloud Schema
Tables:
- soundcloud_user - User/artist metadata
- soundcloud_playlist - Playlist metadata
- soundcloud_track - Track metadata
- soundcloud_track_artist - Track-artist relationships (M:N)
Key Differences:
- "User" instead of "Artist" (user-generated content platform)
- Playlist as first-class entity
- No album concept
- Minimal metadata (no UPC, ISRC, labels)
- ID type: BIGINT
soundcloud_user:
CREATE TABLE soundcloud_user (
id BIGINT PRIMARY KEY,
username VARCHAR(500) NOT NULL,
full_name VARCHAR(500),
url VARCHAR(500),
avatar_url VARCHAR(1000),
followers_count INTEGER,
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_soundcloud_user_username_trgm ON soundcloud_user USING gin(lower(username) gin_trgm_ops);
soundcloud_playlist:
CREATE TABLE soundcloud_playlist (
id BIGINT PRIMARY KEY,
title VARCHAR(500) NOT NULL,
user_id BIGINT REFERENCES soundcloud_user(id),
url VARCHAR(500),
artwork_url VARCHAR(1000),
duration INTEGER, -- Total duration in milliseconds
track_count INTEGER,
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_soundcloud_playlist_title_trgm ON soundcloud_playlist USING gin(lower(title) gin_trgm_ops);
CREATE INDEX idx_soundcloud_playlist_user ON soundcloud_playlist(user_id);
soundcloud_track:
CREATE TABLE soundcloud_track (
id BIGINT PRIMARY KEY,
title VARCHAR(500) NOT NULL,
user_id BIGINT REFERENCES soundcloud_user(id),
url VARCHAR(500),
artwork_url VARCHAR(1000),
duration INTEGER, -- Duration in milliseconds
genre VARCHAR(255),
playback_count INTEGER,
last_sync_time TIMESTAMP WITH TIME ZONE
);
CREATE INDEX idx_soundcloud_track_title_trgm ON soundcloud_track USING gin(lower(title) gin_trgm_ops);
CREATE INDEX idx_soundcloud_track_user ON soundcloud_track(user_id);
ID Type Comparison
| Provider | Artist ID | Album ID | Track ID | Notes |
|---|---|---|---|---|
| Spotify | VARCHAR(255) | VARCHAR(255) | VARCHAR(255) | Base62 encoded (22 chars) |
| Tidal | INTEGER | INTEGER | INTEGER | Sequential integers |
| MusicBrainz | UUID | UUID | UUID | RFC 4122 UUIDs |
| Deezer | BIGINT | BIGINT | BIGINT | Large integers |
| Discogs | INTEGER | INTEGER | INTEGER | Sequential integers |
| SoundCloud | BIGINT | N/A | BIGINT | No album concept |
Implications:
- Cross-provider ID lookups impossible
- ID parameter must match provider type
- C# models use provider-specific types
- No universal identifier system
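Because each provider's ID column has a different type, a raw ID from a request has to be checked against the right shape before it reaches a query. The following is a minimal sketch in Python (the API itself is C#), using the types from the comparison table; the function name `parse_provider_id` and the choice to reject non-positive integers are assumptions made for illustration.

```python
import re
import uuid

SPOTIFY_ID = re.compile(r"[0-9A-Za-z]{22}")  # 22 base62 characters
DIGITS = re.compile(r"[0-9]+")
INT32_MAX = 2**31 - 1  # upper bound of PostgreSQL INTEGER
INT64_MAX = 2**63 - 1  # upper bound of PostgreSQL BIGINT

INTEGER_LIMITS = {
    "tidal": INT32_MAX,
    "discogs": INT32_MAX,
    "deezer": INT64_MAX,
    "soundcloud": INT64_MAX,
}


def parse_provider_id(provider: str, raw: str):
    """Convert a raw ID string to the type the provider's tables use.

    Raises ValueError when the value cannot be an ID for that provider.
    """
    provider = provider.lower()
    if provider == "spotify":
        if not SPOTIFY_ID.fullmatch(raw):
            raise ValueError("Spotify IDs are 22 base62 characters")
        return raw
    if provider == "musicbrainz":
        return uuid.UUID(raw)  # raises ValueError on malformed input
    if provider not in INTEGER_LIMITS:
        raise ValueError(f"unknown provider: {provider}")
    if not DIGITS.fullmatch(raw):
        raise ValueError(f"{provider} IDs are numeric")
    value = int(raw)
    # Assumption: provider IDs are positive and must fit the column type.
    if not 0 < value <= INTEGER_LIMITS[provider]:
        raise ValueError(f"{provider} ID out of range")
    return value
```

Rejecting a mismatched ID up front avoids sending a query that can only fail with a type error or return nothing.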
Data Type Patterns
Arrays (PostgreSQL Native)
Usage: Genres, styles
Example:
genres TEXT[] -- e.g. '{rock,pop,alternative}'
Dapper Mapping:
public class SpotifyArtist
{
public string[] Genres { get; set; } // Npgsql returns TEXT[] as string[], which Dapper assigns directly
}
Timestamps
Type: TIMESTAMP WITH TIME ZONE
Purpose: Track last sync time from provider
Example:
last_sync_time TIMESTAMP WITH TIME ZONE DEFAULT NOW()
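C# Mapping (assumes Dapper's DefaultTypeMap.MatchNamesWithUnderscores is enabled, or the column is aliased, so that last_sync_time binds to LastSyncTime):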
public DateTime? LastSyncTime { get; set; }
Variable-Length Dates
Type: VARCHAR(50)
Formats: YYYY, YYYY-MM, YYYY-MM-DD
Rationale: Providers return different precision levels
Examples:
- "1969" - Year only
- "1969-09" - Year and month
- "1969-09-26" - Full date
C# Mapping:
public string ReleaseDate { get; set; } // Stored as string, parsed in application
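Parsing in the application means handling all three precisions without inventing a month or day that the provider never supplied. A minimal sketch in Python (the API itself is C#) follows; `PartialDate` and `parse_release_date` are hypothetical names, and returning `None` for unparseable input is an assumed policy.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """A release date that may be known only to year or month precision."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


_RELEASE_DATE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def parse_release_date(value: Optional[str]) -> Optional[PartialDate]:
    """Parse YYYY, YYYY-MM, or YYYY-MM-DD; return None for anything else."""
    if not value:
        return None
    match = _RELEASE_DATE.fullmatch(value.strip())
    if match is None:
        return None
    year, month, day = (int(part) if part else None for part in match.groups())
    try:
        # Range-check by building a real date, defaulting missing parts to 1.
        date(year, 1 if month is None else month, 1 if day is None else day)
    except ValueError:
        return None
    return PartialDate(year, month, day)
```

Keeping the missing parts as `None` lets callers distinguish "1969" from "1969-01-01", which a plain date type cannot do.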
Query Patterns
Artist Search
SET LOCAL pg_trgm.similarity_threshold = 0.5;
SELECT
a.id,
a.name,
a.popularity,
a.external_url,
a.followers,
a.genres,
a.last_sync_time,
i.url AS image_url,
i.height AS image_height,
i.width AS image_width
FROM spotify_artist a
LEFT JOIN spotify_artist_image i ON a.id = i.artist_id
WHERE lower(a.name) % lower(@searchTerm)
ORDER BY similarity(lower(a.name), lower(@searchTerm)) DESC
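-- Caveat: LIMIT/OFFSET count joined rows, so an artist with several images uses several of the 20 rows.
LIMIT 20 OFFSET @offset;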
Dapper Mapping:
var artistDict = new Dictionary<string, SpotifyArtist>();
var results = await connection.QueryAsync<SpotifyArtist, SpotifyArtistImage, SpotifyArtist>(
sql,
(artist, image) =>
{
if (!artistDict.TryGetValue(artist.Id, out var existingArtist))
{
existingArtist = artist;
existingArtist.Images = new List<SpotifyArtistImage>();
artistDict.Add(artist.Id, existingArtist);
}
if (image != null)
{
existingArtist.Images.Add(image);
}
return existingArtist;
},
new { searchTerm, offset },
splitOn: "image_url"
);
return artistDict.Values.ToList();
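The mapping callback is a row-folding pattern: the join returns one row per artist-image pair, and the dictionary collapses those rows back into one artist with a list of images. The same idea as a minimal Python sketch (the API itself is C#), with rows modelled as plain dictionaries and `fold_artist_rows` as a hypothetical name:

```python
def fold_artist_rows(rows):
    """Group flat artist-image join rows into one record per artist."""
    artists = {}  # dicts keep insertion order, so the ranking order survives
    for row in rows:
        artist = artists.get(row["id"])
        if artist is None:
            artist = {"id": row["id"], "name": row["name"], "images": []}
            artists[row["id"]] = artist
        # A LEFT JOIN yields NULL image columns for artists without images.
        if row["image_url"] is not None:
            artist["images"].append({
                "url": row["image_url"],
                "height": row["image_height"],
                "width": row["image_width"],
            })
    return list(artists.values())


if __name__ == "__main__":
    rows = [
        {"id": "a1", "name": "Artist One", "image_url": "u1",
         "image_height": 640, "image_width": 640},
        {"id": "a1", "name": "Artist One", "image_url": "u2",
         "image_height": 320, "image_width": 320},
        {"id": "a2", "name": "Artist Two", "image_url": None,
         "image_height": None, "image_width": None},
    ]
    for artist in fold_artist_rows(rows):
        print(artist["name"], len(artist["images"]))
```

Three joined rows fold into two artists here, so the number of artists returned can be smaller than the number of rows the query's LIMIT allowed.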
Album with Artists
SELECT
a.id,
a.name,
a.popularity,
a.external_url,
a.label,
a.release_date,
a.total_tracks,
a.album_type,
a.copyright,
a.last_sync_time,
ar.id AS artist_id,
ar.name AS artist_name
FROM spotify_album a
LEFT JOIN spotify_album_artist aa ON a.id = aa.album_id
LEFT JOIN spotify_artist ar ON aa.artist_id = ar.id
WHERE a.id = @albumId;
Multi-Mapping: Album with nested artist list.
Track with Album and Artists
SELECT
t.id,
t.name,
t.popularity,
t.external_url,
t.duration_ms,
t.explicit,
t.disc_number,
t.track_number,
t.label,
t.last_sync_time,
a.id AS album_id,
a.name AS album_name,
a.release_date AS album_release_date,
ar.id AS artist_id,
ar.name AS artist_name
FROM spotify_track t
LEFT JOIN spotify_album a ON t.album_id = a.id
LEFT JOIN spotify_track_artist ta ON t.id = ta.track_id
LEFT JOIN spotify_artist ar ON ta.artist_id = ar.id
WHERE t.id = @trackId;
Multi-Mapping: Track with nested album and artist list.
External ID Lookup
SELECT
a.id,
a.name,
a.popularity,
a.external_url,
a.label,
a.release_date,
a.total_tracks,
a.album_type,
a.last_sync_time
FROM spotify_album a
INNER JOIN spotify_album_externalid e ON a.id = e.album_id
WHERE e.type = 'upc' AND e.value = @upc;
Use Case: Find album by UPC barcode.
Index Strategy
Required Indexes
Fuzzy Search (GIN trigram):
CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_album_name_trgm ON spotify_album USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_track_name_trgm ON spotify_track USING gin(lower(name) gin_trgm_ops);
Foreign Keys:
CREATE INDEX idx_spotify_album_artist_album ON spotify_album_artist(album_id);
CREATE INDEX idx_spotify_album_artist_artist ON spotify_album_artist(artist_id);
CREATE INDEX idx_spotify_track_album ON spotify_track(album_id);
CREATE INDEX idx_spotify_track_artist_track ON spotify_track_artist(track_id);
CREATE INDEX idx_spotify_track_artist_artist ON spotify_track_artist(artist_id);
External IDs:
CREATE INDEX idx_spotify_album_externalid_value ON spotify_album_externalid(value);
CREATE INDEX idx_spotify_track_externalid_value ON spotify_track_externalid(value);
Index Maintenance
Owned by: MiniMediaScanner (schema owner)
API Responsibility: None (read-only consumer)
Performance Impact:
- GIN trigram indexes: Large relative to the indexed text column, slow writes, fast reads
- B-tree indexes: Moderate size, fast writes, fast reads
- No index = full table scan (unacceptable for fuzzy search)
Data Freshness
Sync Mechanism: MiniMediaScanner polls provider APIs
Sync Frequency: Unknown (configured in MiniMediaScanner)
Staleness Indicator: last_sync_time column
API Behavior:
- Returns whatever data exists in database
- No real-time provider API calls
- No cache invalidation
- No sync triggering
Client Considerations:
- Check lastSyncTime in response
- Stale data possible (hours to days old)
- No guarantee of completeness
- Provider outages affect sync, not queries
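On the client side, the lastSyncTime check might look like the following minimal Python sketch. The function name `is_stale`, the idea of a caller-chosen maximum age, and treating a missing timestamp as stale are all assumptions; the API does not define a freshness policy.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(last_sync_time: Optional[datetime],
             max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """Return True when a record is older than the caller's tolerance.

    `last_sync_time` must be timezone-aware, matching the
    TIMESTAMP WITH TIME ZONE column. A missing value counts as stale.
    """
    if last_sync_time is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_sync_time > max_age


if __name__ == "__main__":
    synced = datetime(2024, 1, 1, tzinfo=timezone.utc)
    checked_at = datetime(2024, 1, 10, tzinfo=timezone.utc)
    print(is_stale(synced, timedelta(days=7), now=checked_at))
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it defaults to the current UTC time.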
Provider Feature Matrix
| Feature | Spotify | Tidal | MusicBrainz | Deezer | Discogs | SoundCloud |
|---|---|---|---|---|---|---|
| Artist Data | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | ✗ |
| Followers | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | ✓ |
| Genres | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ |
| Images | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ (avatar) |
| Sort Name | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Aliases | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
| Album Data | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | N/A |
| Images | ✓ | ✓ | ✗ | ✓ | ✗ | N/A |
| Label | ✓ | ✗ | ✓ | ✗ | ✓ | N/A |
| UPC | ✓ | ✓ | ✗ | ✗ | ✓ | N/A |
| Copyright | ✓ | ✓ | ✗ | ✗ | ✗ | N/A |
| Album Type | ✓ | ✗ | ✓ | ✗ | ✗ | N/A |
| Track Data | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ (playback_count) |
| Duration | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Explicit | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| ISRC | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Disc/Track # | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
Database Size Estimates
Assumptions:
- 1 million artists
- 10 million albums
- 100 million tracks
Spotify Tables:
- spotify_artist: ~500 MB
- spotify_artist_image: ~200 MB
- spotify_album: ~5 GB
- spotify_album_artist: ~1 GB
- spotify_album_image: ~2 GB
- spotify_track: ~50 GB
- spotify_track_artist: ~10 GB
- Total: ~70 GB per provider
All Providers: ~420 GB (6 providers)
Indexes: ~200 GB (GIN indexes are large)
Total Database: ~620 GB for comprehensive catalog
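The arithmetic behind these totals can be reproduced in a few lines. This Python sketch uses only the figures listed above; treating the Spotify total as representative of every provider is the estimate's own simplification, and the variable names are illustrative.

```python
# Per-table estimates for Spotify, in GB, as listed above.
SPOTIFY_TABLES_GB = {
    "spotify_artist": 0.5,
    "spotify_artist_image": 0.2,
    "spotify_album": 5,
    "spotify_album_artist": 1,
    "spotify_album_image": 2,
    "spotify_track": 50,
    "spotify_track_artist": 10,
}
PROVIDER_COUNT = 6
INDEX_GB = 200

exact_per_provider = sum(SPOTIFY_TABLES_GB.values())  # just under 70 GB
rounded_per_provider = 70                              # figure used in the estimate
all_providers_gb = rounded_per_provider * PROVIDER_COUNT
total_gb = all_providers_gb + INDEX_GB

if __name__ == "__main__":
    print(f"per provider: {exact_per_provider:.1f} GB (rounded to {rounded_per_provider})")
    print(f"all providers: {all_providers_gb} GB")
    print(f"with indexes: {total_gb} GB")
```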
Implications:
- Requires substantial storage
- Backup/restore time significant
- Index rebuilds time-consuming
- Connection pooling critical
Performance Considerations
Query Performance
Fuzzy Search:
- With GIN index: 10-50ms for 20 results
- Without index: 5-30 seconds (full table scan)
- Threshold tuning affects result count and speed
ID Lookup:
- With primary key: <1ms
- With foreign key index: 1-5ms
Join Queries:
- Album with artists: 5-20ms
- Track with album and artists: 10-30ms
- Depends on relationship cardinality
Optimization Strategies
Implemented:
- GIN indexes for fuzzy search
- B-tree indexes for foreign keys
- Connection pooling
- Parameterized queries (SQL injection prevention)
Missing:
- Query result caching (Redis/Memcached)
- Materialized views for complex joins
- Partitioning for large tables
- Read replicas for horizontal scaling
Bottlenecks
- GIN Index Size: Large memory footprint
- Fuzzy Search: CPU-intensive similarity calculations
- Multi-Provider Queries: 6 parallel database queries
- No Caching: Every request hits database
- Connection Pool Limit: 100 max connections per instance
Data Integrity
Constraints:
- Primary keys on all entity tables
- Foreign keys for relationships
- NOT NULL on critical fields (id, name)
No Constraints:
- No unique constraints on names (duplicates allowed)
- No check constraints on data ranges
- No triggers for data validation
Orphan Prevention:
- Foreign keys with CASCADE delete (assumed)
- Junction tables maintain referential integrity
Data Quality:
- Depends entirely on MiniMediaScanner sync quality
- No validation in this API
- Garbage in, garbage out
Backup and Recovery
Responsibility: Database administrator (not API)
Recommended Strategy:
- Daily full backups
- Continuous WAL archiving
- Point-in-time recovery capability
- Backup retention: 30 days
Recovery Time:
- Full restore: Hours (620 GB database)
- Index rebuild: Hours (GIN indexes)
- Sync from providers: Days to weeks
Schema Evolution
Change Process:
- MiniMediaScanner updates schema
- MiniMediaScanner deploys migration
- MiniMediaMetadataAPI updates models
- MiniMediaMetadataAPI redeploys
Risk: Breaking changes require coordinated deployment.
Mitigation:
- Additive changes only (new columns, tables)
- Deprecation period for removals
- Version compatibility checks
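One form a version compatibility check could take is a startup comparison between the columns the API's queries depend on and the columns that actually exist. The sketch below is in Python (the API itself is C#) and is purely illustrative: `missing_columns` is a hypothetical helper, and the `actual` mapping is assumed to be built from PostgreSQL's `information_schema.columns` view.

```python
def missing_columns(expected, actual):
    """Return {table: [missing column, ...]} for everything the API needs.

    `expected` maps table names to the columns the API's SQL references.
    `actual` maps table names to the columns found in the database, e.g.
    built from:
        SELECT table_name, column_name FROM information_schema.columns
        WHERE table_schema = 'public';  -- schema name is an assumption
    """
    problems = {}
    for table, columns in expected.items():
        missing = set(columns) - set(actual.get(table, ()))
        if missing:
            problems[table] = sorted(missing)
    return problems


if __name__ == "__main__":
    expected = {
        "spotify_artist": ["id", "name", "genres", "last_sync_time"],
        "spotify_album": ["id", "name"],
    }
    actual = {"spotify_artist": ["id", "name", "last_sync_time", "followers"]}
    print(missing_columns(expected, actual))
```

Extra columns in the database are ignored, which matches the additive-changes-only rule: new columns never fail the check, removed or renamed ones do.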
No Automated Migration: API has no migration framework.