# MiniMediaMetadataAPI - Data Layer Analysis

## Database Technology

- **RDBMS:** PostgreSQL
- **Driver:** Npgsql 10.0.2
- **ORM:** Dapper 2.1.72 (micro-ORM)
- **Extensions:** pg_trgm (trigram similarity search)

## Schema Ownership

**Critical Constraint:** This API does NOT own the database schema.

- **Schema Owner:** MiniMediaScanner (separate project)
- **API Role:** Read-only consumer
- **Migration Strategy:** None (schema managed externally)

### Implications

**Pros:**

- Clear separation of concerns
- API doesn't need provider API credentials
- Simpler deployment (no migration coordination)
- Sync complexity isolated in MiniMediaScanner

**Cons:**

- No control over schema evolution
- Breaking changes in MiniMediaScanner break API
- Can't optimize schema for query patterns
- Data freshness depends on external sync schedule

**Coupling Points:**

- Table names hardcoded in SQL queries
- Column names hardcoded in Dapper mappings
- Foreign key relationships assumed in joins
- Data types must match C# model properties

## Connection Configuration

**Connection String Format:**

```
Host=localhost;
Database=minimediametadata;
Username=postgres;
Password=password;
MinPoolSize=5;
MaxPoolSize=100;
Timeout=30;
CommandTimeout=30;
```

**Pooling Settings:**

- **MinPoolSize:** 5 connections kept alive
- **MaxPoolSize:** 100 concurrent connections
- **Timeout:** 30 seconds to establish or acquire a connection
- **CommandTimeout:** 30 seconds for query execution

**Connection Lifecycle:**

- Connections created per repository method call
- Returned to pool after query completion
- No long-lived connections
- No transaction management (read-only)

## Fuzzy Search Implementation

### pg_trgm Extension

**Purpose:** Trigram-based similarity search for fuzzy text matching

**Configuration:**

```sql
SET LOCAL pg_trgm.similarity_threshold = 0.5;
```

`SET LOCAL` lasts only for the current transaction, so the setting and the search query must run in the same transaction for the threshold to apply.

**Threshold:** 0.5 (50% similarity required)

**Operators and functions:**

- `%` - Similarity operator (returns true if similarity >= threshold)
- `similarity(text, text)` - Returns similarity
  score (0.0 to 1.0)

### Search Query Pattern

**Example (Artist Search):**

```sql
SET LOCAL pg_trgm.similarity_threshold = 0.5;

SELECT
    id, name, popularity, external_url, followers, genres, last_sync_time
FROM spotify_artist
WHERE lower(name) % lower(@searchTerm)
ORDER BY similarity(lower(name), lower(@searchTerm)) DESC
LIMIT 20 OFFSET @offset;
```

**Key Features:**

- Case-insensitive matching (`lower()`)
- Similarity-based ordering (best matches first)
- Pagination support (LIMIT/OFFSET)
- Threshold filtering (only >= 50% similarity)

**Performance:**

- Requires a GIN or GiST trigram index on the `lower(name)` expression, matching the expression used in the `WHERE` clause
- Index creation: `CREATE INDEX idx_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);`
- Query time: with the index, PostgreSQL only examines candidate rows that share trigrams with the search term; without it, every row is scanned (O(n))

### Similarity Scoring

**Algorithm:** Trigram overlap. pg_trgm lowercases the input, pads each word with two leading spaces and one trailing space, and extracts every three-character sequence. The score is the number of shared trigrams divided by the number of distinct trigrams across both strings.

**Example:**

```
"Beatles" vs "Beetles"
  beatles: "  b", " be", "bea", "eat", "atl", "tle", "les", "es "   (8 trigrams)
  beetles: "  b", " be", "bee", "eet", "etl", "tle", "les", "es "   (8 trigrams)
  Shared:  "  b", " be", "tle", "les", "es "                        (5 trigrams)
  Similarity: 5 / (8 + 8 - 5) = 5/11 ≈ 0.45 (below threshold)

"Beatles" vs "The Beatles"
  beatles:     the 8 trigrams listed above
  the beatles: "  t", " th", "the", "he " plus the 8 trigrams of "beatles"   (12 trigrams)
  Shared:      all 8 trigrams of "beatles"
  Similarity: 8 / (8 + 12 - 8) = 8/12 ≈ 0.67 (above threshold)
```

**Tuning:**

- Lower threshold (0.3, the pg_trgm default) = more results, more false positives
- Higher threshold (0.7) = fewer results, more precision
- Current 0.5 = balanced approach

## Database Schema

### Provider-Specific Tables

Each provider has an isolated table structure. There are no cross-provider foreign keys.

### Spotify Schema

**Tables:**

1. `spotify_artist` - Artist metadata
2. `spotify_artist_image` - Artist images (1:N)
3. `spotify_album` - Album metadata
4. `spotify_album_artist` - Album-artist relationships (M:N)
5. `spotify_album_image` - Album artwork (1:N)
6. `spotify_album_externalid` - External identifiers (UPC, EAN) (1:N)
7. `spotify_track` - Track metadata
8. `spotify_track_artist` - Track-artist relationships (M:N)
9. 
   `spotify_track_externalid` - External identifiers (ISRC) (1:N)

**spotify_artist:**

```sql
CREATE TABLE spotify_artist (
    id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    popularity INTEGER,
    external_url VARCHAR(500),
    followers INTEGER,
    genres TEXT[],  -- PostgreSQL array
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_spotify_artist_name_trgm
    ON spotify_artist USING gin(lower(name) gin_trgm_ops);
```

**spotify_artist_image:**

```sql
CREATE TABLE spotify_artist_image (
    id SERIAL PRIMARY KEY,
    artist_id VARCHAR(255) REFERENCES spotify_artist(id),
    url VARCHAR(1000) NOT NULL,
    height INTEGER,
    width INTEGER
);

CREATE INDEX idx_spotify_artist_image_artist
    ON spotify_artist_image(artist_id);
```

**spotify_album:**

```sql
CREATE TABLE spotify_album (
    id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    popularity INTEGER,
    external_url VARCHAR(500),
    label VARCHAR(500),
    release_date VARCHAR(50),  -- Stored as string (YYYY, YYYY-MM, or YYYY-MM-DD)
    total_tracks INTEGER,
    album_type VARCHAR(50),    -- album, single, compilation
    copyright TEXT,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_spotify_album_name_trgm
    ON spotify_album USING gin(lower(name) gin_trgm_ops);
```

**spotify_album_artist (junction table):**

```sql
CREATE TABLE spotify_album_artist (
    id SERIAL PRIMARY KEY,
    album_id VARCHAR(255) REFERENCES spotify_album(id),
    artist_id VARCHAR(255) REFERENCES spotify_artist(id)
);

CREATE INDEX idx_spotify_album_artist_album
    ON spotify_album_artist(album_id);
CREATE INDEX idx_spotify_album_artist_artist
    ON spotify_album_artist(artist_id);
```

**spotify_track:**

```sql
CREATE TABLE spotify_track (
    id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    album_id VARCHAR(255) REFERENCES spotify_album(id),
    popularity INTEGER,
    external_url VARCHAR(500),
    duration_ms INTEGER,
    explicit BOOLEAN,
    disc_number INTEGER,
    track_number INTEGER,
    label VARCHAR(500),
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_spotify_track_name_trgm
    ON spotify_track USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_track_album
    ON spotify_track(album_id);
```

**spotify_album_externalid:**

```sql
CREATE TABLE spotify_album_externalid (
    id SERIAL PRIMARY KEY,
    album_id VARCHAR(255) REFERENCES spotify_album(id),
    type VARCHAR(50),  -- upc, ean
    value VARCHAR(255)
);

CREATE INDEX idx_spotify_album_externalid_album
    ON spotify_album_externalid(album_id);
CREATE INDEX idx_spotify_album_externalid_value
    ON spotify_album_externalid(value);
```

**spotify_track_externalid:**

```sql
CREATE TABLE spotify_track_externalid (
    id SERIAL PRIMARY KEY,
    track_id VARCHAR(255) REFERENCES spotify_track(id),
    type VARCHAR(50),  -- isrc
    value VARCHAR(255)
);

CREATE INDEX idx_spotify_track_externalid_track
    ON spotify_track_externalid(track_id);
CREATE INDEX idx_spotify_track_externalid_value
    ON spotify_track_externalid(value);
```

### Tidal Schema

**Tables:**

1. `tidal_artist` - Artist metadata
2. `tidal_artist_image_link` - Artist image URLs (1:N)
3. `tidal_album` - Album metadata
4. `tidal_album_external_link` - External URLs (1:N)
5. `tidal_album_image` - Album artwork (1:N)
6. `tidal_track` - Track metadata
7. `tidal_track_artist` - Track-artist relationships (M:N)
8. 
   `tidal_track_external_link` - External URLs (1:N)

**Key Differences from Spotify:**

- ID type: INTEGER instead of VARCHAR
- No popularity field
- No genres field
- External links instead of external IDs
- Image links stored as separate table

**tidal_artist:**

```sql
CREATE TABLE tidal_artist (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    url VARCHAR(500),
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_tidal_artist_name_trgm
    ON tidal_artist USING gin(lower(name) gin_trgm_ops);
```

**tidal_album:**

```sql
CREATE TABLE tidal_album (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    artist_id INTEGER REFERENCES tidal_artist(id),
    url VARCHAR(500),
    release_date VARCHAR(50),
    total_tracks INTEGER,
    duration INTEGER,  -- Total duration in seconds
    explicit BOOLEAN,
    upc VARCHAR(255),
    copyright TEXT,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_tidal_album_name_trgm
    ON tidal_album USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_tidal_album_artist
    ON tidal_album(artist_id);
```

### MusicBrainz Schema

**Tables:**

1. `musicbrainz_artist` - Artist metadata
2. `musicbrainz_release` - Release (album) metadata
3. `musicbrainz_release_label` - Release-label relationships (M:N)
4. `musicbrainz_label` - Label metadata
5. `musicbrainz_release_track` - Track metadata
6. `musicbrainz_release_track_artist` - Track-artist relationships (M:N)

**Key Differences:**

- ID type: UUID (Guid)
- "Release" instead of "Album"
- Sort name field for artists
- Label as separate entity
- No popularity or follower counts
- No images (stored externally via Cover Art Archive)

**musicbrainz_artist:**

```sql
CREATE TABLE musicbrainz_artist (
    id UUID PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    sort_name VARCHAR(500),  -- For alphabetical sorting (e.g., "Beatles, The")
    type VARCHAR(100),       -- Person, Group, Orchestra, etc.
    country VARCHAR(2),      -- ISO country code
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_musicbrainz_artist_name_trgm
    ON musicbrainz_artist USING gin(lower(name) gin_trgm_ops);
```

**musicbrainz_release:**

```sql
CREATE TABLE musicbrainz_release (
    id UUID PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    artist_id UUID REFERENCES musicbrainz_artist(id),
    release_date VARCHAR(50),
    country VARCHAR(2),
    barcode VARCHAR(255),    -- Similar to UPC
    status VARCHAR(100),     -- Official, Promotion, Bootleg, etc.
    packaging VARCHAR(100),  -- Jewel Case, Digipak, etc.
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_musicbrainz_release_name_trgm
    ON musicbrainz_release USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_musicbrainz_release_artist
    ON musicbrainz_release(artist_id);
```

**musicbrainz_label:**

```sql
CREATE TABLE musicbrainz_label (
    id UUID PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    type VARCHAR(100),  -- Original Production, Bootleg Production, etc.
    country VARCHAR(2),
    last_sync_time TIMESTAMP WITH TIME ZONE
);
```

**musicbrainz_release_label (junction table):**

```sql
CREATE TABLE musicbrainz_release_label (
    id SERIAL PRIMARY KEY,
    release_id UUID REFERENCES musicbrainz_release(id),
    label_id UUID REFERENCES musicbrainz_label(id),
    catalog_number VARCHAR(255)
);

CREATE INDEX idx_musicbrainz_release_label_release
    ON musicbrainz_release_label(release_id);
CREATE INDEX idx_musicbrainz_release_label_label
    ON musicbrainz_release_label(label_id);
```

### Deezer Schema

**Tables:**

1. `deezer_artist` - Artist metadata
2. `deezer_artist_image_link` - Artist image URLs (1:N)
3. `deezer_album` - Album metadata
4. `deezer_album_image_link` - Album artwork URLs (1:N)
5. `deezer_album_artist` - Album-artist relationships (M:N)
6. `deezer_track` - Track metadata
7. 
   `deezer_track_artist` - Track-artist relationships (M:N)

**Key Differences:**

- ID type: BIGINT
- Has popularity (called "fans")
- Has genres
- No UPC/ISRC fields
- No label information

**deezer_artist:**

```sql
CREATE TABLE deezer_artist (
    id BIGINT PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    url VARCHAR(500),
    fans INTEGER,  -- Similar to followers
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_deezer_artist_name_trgm
    ON deezer_artist USING gin(lower(name) gin_trgm_ops);
```

**deezer_album:**

```sql
CREATE TABLE deezer_album (
    id BIGINT PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    url VARCHAR(500),
    release_date VARCHAR(50),
    total_tracks INTEGER,
    duration INTEGER,  -- Total duration in seconds
    explicit BOOLEAN,
    fans INTEGER,
    genres TEXT[],     -- PostgreSQL array
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_deezer_album_name_trgm
    ON deezer_album USING gin(lower(name) gin_trgm_ops);
```

### Discogs Schema

**Tables:**

1. `discogs_artist` - Artist metadata
2. `discogs_artist_alias` - Artist aliases (1:N)
3. `discogs_artist_url` - Artist URLs (1:N)
4. `discogs_release` - Release metadata
5. `discogs_release_artist` - Release-artist relationships (M:N)
6. `discogs_release_identifier` - Barcodes/identifiers (1:N)
7. `discogs_release_track` - Track metadata
8. `discogs_label` - Label metadata
9. `discogs_label_sublabel` - Label hierarchy (1:N)
10. `discogs_label_url` - Label URLs (1:N)

**Key Differences:**

- ID type: INTEGER
- Most comprehensive label data
- Artist aliases tracked
- Multiple identifiers per release (Barcode, Matrix, etc.)
- No popularity metrics
- No image URLs (stored externally)

**discogs_artist:**

```sql
CREATE TABLE discogs_artist (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    real_name VARCHAR(500),  -- For pseudonyms
    profile TEXT,            -- Biography
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_discogs_artist_name_trgm
    ON discogs_artist USING gin(lower(name) gin_trgm_ops);
```

**discogs_artist_alias:**

```sql
CREATE TABLE discogs_artist_alias (
    id SERIAL PRIMARY KEY,
    artist_id INTEGER REFERENCES discogs_artist(id),
    alias_name VARCHAR(500)
);

CREATE INDEX idx_discogs_artist_alias_artist
    ON discogs_artist_alias(artist_id);
CREATE INDEX idx_discogs_artist_alias_name_trgm
    ON discogs_artist_alias USING gin(lower(alias_name) gin_trgm_ops);
```

**discogs_release:**

```sql
CREATE TABLE discogs_release (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    released VARCHAR(50),
    country VARCHAR(100),
    notes TEXT,
    genres TEXT[],
    styles TEXT[],  -- More specific than genres
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_discogs_release_name_trgm
    ON discogs_release USING gin(lower(name) gin_trgm_ops);
```

**discogs_release_identifier:**

```sql
CREATE TABLE discogs_release_identifier (
    id SERIAL PRIMARY KEY,
    release_id INTEGER REFERENCES discogs_release(id),
    type VARCHAR(100),  -- Barcode, Matrix/Runout, Label Code, etc.
    value VARCHAR(500),
    description TEXT
);

CREATE INDEX idx_discogs_release_identifier_release
    ON discogs_release_identifier(release_id);
CREATE INDEX idx_discogs_release_identifier_value
    ON discogs_release_identifier(value);
```

**discogs_label:**

```sql
CREATE TABLE discogs_label (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    contact_info TEXT,
    profile TEXT,
    parent_label_id INTEGER REFERENCES discogs_label(id),
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_discogs_label_name_trgm
    ON discogs_label USING gin(lower(name) gin_trgm_ops);
```

### SoundCloud Schema

**Tables:**

1. `soundcloud_user` - User/artist metadata
2. 
   `soundcloud_playlist` - Playlist metadata
3. `soundcloud_track` - Track metadata
4. `soundcloud_track_artist` - Track-artist relationships (M:N)

**Key Differences:**

- "User" instead of "Artist" (user-generated content platform)
- Playlist as first-class entity
- No album concept
- Minimal metadata (no UPC, ISRC, labels)
- ID type: BIGINT

**soundcloud_user:**

```sql
CREATE TABLE soundcloud_user (
    id BIGINT PRIMARY KEY,
    username VARCHAR(500) NOT NULL,
    full_name VARCHAR(500),
    url VARCHAR(500),
    avatar_url VARCHAR(1000),
    followers_count INTEGER,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_soundcloud_user_username_trgm
    ON soundcloud_user USING gin(lower(username) gin_trgm_ops);
```

**soundcloud_playlist:**

```sql
CREATE TABLE soundcloud_playlist (
    id BIGINT PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    user_id BIGINT REFERENCES soundcloud_user(id),
    url VARCHAR(500),
    artwork_url VARCHAR(1000),
    duration INTEGER,  -- Total duration in milliseconds
    track_count INTEGER,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_soundcloud_playlist_title_trgm
    ON soundcloud_playlist USING gin(lower(title) gin_trgm_ops);
CREATE INDEX idx_soundcloud_playlist_user
    ON soundcloud_playlist(user_id);
```

**soundcloud_track:**

```sql
CREATE TABLE soundcloud_track (
    id BIGINT PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    user_id BIGINT REFERENCES soundcloud_user(id),
    url VARCHAR(500),
    artwork_url VARCHAR(1000),
    duration INTEGER,  -- Duration in milliseconds
    genre VARCHAR(255),
    playback_count INTEGER,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_soundcloud_track_title_trgm
    ON soundcloud_track USING gin(lower(title) gin_trgm_ops);
CREATE INDEX idx_soundcloud_track_user
    ON soundcloud_track(user_id);
```

## ID Type Comparison

| Provider | Artist ID | Album ID | Track ID | Notes |
|----------|-----------|----------|----------|-------|
| Spotify | VARCHAR(255) | VARCHAR(255) | VARCHAR(255) | Base62 encoded (22 chars) |
| Tidal | INTEGER | INTEGER | INTEGER | Sequential integers |
| MusicBrainz | UUID | UUID | UUID | RFC 4122 UUIDs |
| Deezer | BIGINT | BIGINT | BIGINT | Large integers |
| Discogs | INTEGER | INTEGER | INTEGER | Sequential integers |
| SoundCloud | BIGINT | N/A | BIGINT | No album concept |

**Implications:**

- Cross-provider ID lookups impossible
- ID parameter must match provider type
- C# models use provider-specific types
- No universal identifier system

## Data Type Patterns

### Arrays (PostgreSQL Native)

**Usage:** Genres, styles, external IDs

**Example:**

```sql
genres TEXT[]  -- ["rock", "pop", "alternative"]
```

**Dapper Mapping:**

```csharp
public class SpotifyArtist
{
    // Npgsql returns a PostgreSQL text[] as string[]; Dapper assigns it to the property.
    public string[] Genres { get; set; }
}
```

### Timestamps

**Type:** `TIMESTAMP WITH TIME ZONE`

**Purpose:** Track last sync time from provider

**Example:**

```sql
last_sync_time TIMESTAMP WITH TIME ZONE DEFAULT NOW()
```

**C# Mapping:**

```csharp
public DateTime? LastSyncTime { get; set; }
```

### Variable-Length Dates

**Type:** VARCHAR(50)

**Formats:** YYYY, YYYY-MM, YYYY-MM-DD

**Rationale:** Providers return different precision levels

**Examples:**

- `"1969"` - Year only
- `"1969-09"` - Year and month
- `"1969-09-26"` - Full date

**C# Mapping:**

```csharp
public string ReleaseDate { get; set; }  // Stored as string, parsed in application
```

## Query Patterns

### Artist Search

```sql
SET LOCAL pg_trgm.similarity_threshold = 0.5;

SELECT
    a.id, a.name, a.popularity, a.external_url, a.followers, a.genres, a.last_sync_time,
    i.url AS image_url, i.height AS image_height, i.width AS image_width
FROM spotify_artist a
LEFT JOIN spotify_artist_image i ON a.id = i.artist_id
WHERE lower(a.name) % lower(@searchTerm)
ORDER BY similarity(lower(a.name), lower(@searchTerm)) DESC
LIMIT 20 OFFSET @offset;
```

Because the join returns one row per artist image, `LIMIT 20 OFFSET @offset` counts joined rows rather than artists. An artist with several images consumes several of the 20 rows, so a page can contain fewer than 20 artists.

**Dapper Mapping:**

```csharp
// SpotifyArtistImage is an assumed name for the image model.
var artistDict = new Dictionary<string, SpotifyArtist>();

var results = await connection.QueryAsync<SpotifyArtist, SpotifyArtistImage, SpotifyArtist>(
    sql,
    (artist, image) =>
    {
        // Reuse the artist instance already seen for this ID so images accumulate on it.
        if (!artistDict.TryGetValue(artist.Id, out var existingArtist))
        {
            existingArtist = artist;
            existingArtist.Images = new List<SpotifyArtistImage>();
            artistDict.Add(artist.Id, existingArtist);
        }

        // The LEFT JOIN yields a null image for artists without images.
        if (image != null)
        {
            existingArtist.Images.Add(image);
        }

        return existingArtist;
    },
    new { searchTerm, offset },
    splitOn: "image_url"
);

return artistDict.Values.ToList();
```

### Album with Artists

```sql
SELECT
    a.id, a.name, a.popularity, a.external_url, a.label, a.release_date,
    a.total_tracks, a.album_type, a.copyright, a.last_sync_time,
    ar.id AS artist_id, ar.name AS artist_name
FROM spotify_album a
LEFT JOIN spotify_album_artist aa ON a.id = aa.album_id
LEFT JOIN spotify_artist ar ON aa.artist_id = ar.id
WHERE a.id = @albumId;
```

**Multi-Mapping:** Album with nested artist list.

### Track with Album and Artists

```sql
SELECT
    t.id, t.name, t.popularity, t.external_url, t.duration_ms, t.explicit,
    t.disc_number, t.track_number, t.label, t.last_sync_time,
    a.id AS album_id, a.name AS album_name, a.release_date AS album_release_date,
    ar.id AS artist_id, ar.name AS artist_name
FROM spotify_track t
LEFT JOIN spotify_album a ON t.album_id = a.id
LEFT JOIN spotify_track_artist ta ON t.id = ta.track_id
LEFT JOIN spotify_artist ar ON ta.artist_id = ar.id
WHERE t.id = @trackId;
```

**Multi-Mapping:** Track with nested album and artist list.

### External ID Lookup

```sql
SELECT
    a.id, a.name, a.popularity, a.external_url, a.label, a.release_date,
    a.total_tracks, a.album_type, a.last_sync_time
FROM spotify_album a
INNER JOIN spotify_album_externalid e ON a.id = e.album_id
WHERE e.type = 'upc' AND e.value = @upc;
```

**Use Case:** Find album by UPC barcode.
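
The album and track queries above return one row per related artist, so the nested artist list has to be rebuilt in application code. The following is a minimal sketch of that grouping step on its own, without Dapper or a database. `AlbumArtistRow`, `ArtistRef`, `AlbumDto` and `AlbumGrouping` are hypothetical names that do not appear in the source, and the sample values are placeholders.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical flattened row for the "Album with Artists" query:
// one row per (album, artist) pair. The artist columns are null when
// the LEFT JOIN finds no artist for the album.
public record AlbumArtistRow(string AlbumId, string AlbumName, string? ArtistId, string? ArtistName);

public record ArtistRef(string Id, string Name);

public record AlbumDto(string Id, string Name, List<ArtistRef> Artists);

public static class AlbumGrouping
{
    // Collapses flattened join rows into albums with nested artist lists,
    // keeping albums in the order in which they first appear.
    public static List<AlbumDto> Group(IEnumerable<AlbumArtistRow> rows)
    {
        var byId = new Dictionary<string, AlbumDto>();
        var ordered = new List<AlbumDto>();

        foreach (var row in rows)
        {
            if (!byId.TryGetValue(row.AlbumId, out var album))
            {
                album = new AlbumDto(row.AlbumId, row.AlbumName, new List<ArtistRef>());
                byId.Add(row.AlbumId, album);
                ordered.Add(album);
            }

            // Skip the all-null artist produced by an unmatched LEFT JOIN.
            if (row.ArtistId is not null && row.ArtistName is not null)
            {
                album.Artists.Add(new ArtistRef(row.ArtistId, row.ArtistName));
            }
        }

        return ordered;
    }
}

public static class Program
{
    public static void Main()
    {
        var rows = new[]
        {
            new AlbumArtistRow("album-1", "Album One", "artist-a", "Artist A"),
            new AlbumArtistRow("album-1", "Album One", "artist-b", "Artist B"),
            new AlbumArtistRow("album-2", "Album Two", null, null),
        };

        foreach (var album in AlbumGrouping.Group(rows))
        {
            Console.WriteLine($"{album.Name}: {album.Artists.Count} artist(s)");
        }
        // Prints:
        // Album One: 2 artist(s)
        // Album Two: 0 artist(s)
    }
}
```

The same dictionary-keyed-by-ID approach is what the artist search mapping uses inside the Dapper callback. Keeping it as a separate pure function makes the grouping testable without a database connection.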

## Index Strategy

### Required Indexes

**Fuzzy Search (GIN trigram):**

```sql
CREATE INDEX idx_spotify_artist_name_trgm
    ON spotify_artist USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_album_name_trgm
    ON spotify_album USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_track_name_trgm
    ON spotify_track USING gin(lower(name) gin_trgm_ops);
```

**Foreign Keys:**

```sql
CREATE INDEX idx_spotify_album_artist_album ON spotify_album_artist(album_id);
CREATE INDEX idx_spotify_album_artist_artist ON spotify_album_artist(artist_id);
CREATE INDEX idx_spotify_track_album ON spotify_track(album_id);
CREATE INDEX idx_spotify_track_artist_track ON spotify_track_artist(track_id);
CREATE INDEX idx_spotify_track_artist_artist ON spotify_track_artist(artist_id);
```

**External IDs:**

```sql
CREATE INDEX idx_spotify_album_externalid_value ON spotify_album_externalid(value);
CREATE INDEX idx_spotify_track_externalid_value ON spotify_track_externalid(value);
```

### Index Maintenance

- **Owned by:** MiniMediaScanner (schema owner)
- **API Responsibility:** None (read-only consumer)

**Performance Impact:**

- GIN indexes: Large (2-3x table size), slow writes, fast reads
- B-tree indexes: Moderate size, fast writes, fast reads
- No index = full table scan (unacceptable for fuzzy search)

## Data Freshness

- **Sync Mechanism:** MiniMediaScanner polls provider APIs
- **Sync Frequency:** Unknown (configured in MiniMediaScanner)
- **Staleness Indicator:** `last_sync_time` column

**API Behavior:**

- Returns whatever data exists in database
- No real-time provider API calls
- No cache invalidation
- No sync triggering

**Client Considerations:**

- Check `lastSyncTime` in response
- Stale data possible (hours to days old)
- No guarantee of completeness
- Provider outages affect sync, not queries

## Provider Feature Matrix

| Feature | Spotify | Tidal | MusicBrainz | Deezer | Discogs | SoundCloud |
|---------|---------|-------|-------------|--------|---------|------------|
| **Artist Data** | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | ✗ |
| Followers | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | ✓ |
| Genres | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ |
| Images | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ (avatar) |
| Sort Name | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Aliases | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
| **Album Data** | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | N/A |
| Images | ✓ | ✓ | ✗ | ✓ | ✗ | N/A |
| Label | ✓ | ✗ | ✓ | ✗ | ✓ | N/A |
| UPC | ✓ | ✓ | ✗ | ✗ | ✓ | N/A |
| Copyright | ✓ | ✓ | ✗ | ✗ | ✗ | N/A |
| Album Type | ✓ | ✗ | ✓ | ✗ | ✗ | N/A |
| **Track Data** | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ (playback_count) |
| Duration | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Explicit | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| ISRC | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Disc/Track # | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |

## Database Size Estimates

**Assumptions:**

- 1 million artists
- 10 million albums
- 100 million tracks

**Spotify Tables:**

- `spotify_artist`: ~500 MB
- `spotify_artist_image`: ~200 MB
- `spotify_album`: ~5 GB
- `spotify_album_artist`: ~1 GB
- `spotify_album_image`: ~2 GB
- `spotify_track`: ~50 GB
- `spotify_track_artist`: ~10 GB
- **Total:** ~70 GB per provider

**All Providers:** ~420 GB (6 providers)

**Indexes:** ~200 GB (GIN indexes are large)

**Total Database:** ~620 GB for comprehensive catalog

**Implications:**

- Requires substantial storage
- Backup/restore time significant
- Index rebuilds time-consuming
- Connection pooling critical

## Performance Considerations

### Query Performance

**Fuzzy Search:**

- With GIN index: 10-50ms for 20 results
- Without index: 5-30 seconds (full table scan)
- Threshold tuning affects result count and speed

**ID Lookup:**

- With primary key: <1ms
- With foreign key index: 1-5ms

**Join Queries:**

- Album with artists: 5-20ms
- Track with album and artists: 10-30ms
- Depends on relationship cardinality

### Optimization Strategies

**Implemented:**

- GIN indexes for fuzzy search
- B-tree indexes for foreign keys
- Connection pooling
- Parameterized queries (SQL injection prevention)

**Missing:**

- Query result
  caching (Redis/Memcached)
- Materialized views for complex joins
- Partitioning for large tables
- Read replicas for horizontal scaling

### Bottlenecks

1. **GIN Index Size:** Large memory footprint
2. **Fuzzy Search:** CPU-intensive similarity calculations
3. **Multi-Provider Queries:** 6 parallel database queries
4. **No Caching:** Every request hits database
5. **Connection Pool Limit:** 100 max connections per instance

## Data Integrity

**Constraints:**

- Primary keys on all entity tables
- Foreign keys for relationships
- NOT NULL on critical fields (id, name)

**No Constraints:**

- No unique constraints on names (duplicates allowed)
- No check constraints on data ranges
- No triggers for data validation

**Orphan Prevention:**

- Foreign keys with CASCADE delete (assumed; the table definitions in this document declare no `ON DELETE` action, in which case PostgreSQL defaults to `NO ACTION` and rejects deletion of a referenced row)
- Junction tables maintain referential integrity

**Data Quality:**

- Depends entirely on MiniMediaScanner sync quality
- No validation in this API
- Garbage in, garbage out

## Backup and Recovery

**Responsibility:** Database administrator (not API)

**Recommended Strategy:**

- Daily full backups
- Continuous WAL archiving
- Point-in-time recovery capability
- Backup retention: 30 days

**Recovery Time:**

- Full restore: Hours (620 GB database)
- Index rebuild: Hours (GIN indexes)
- Sync from providers: Days to weeks

## Schema Evolution

**Change Process:**

1. MiniMediaScanner updates schema
2. MiniMediaScanner deploys migration
3. MiniMediaMetadataAPI updates models
4. MiniMediaMetadataAPI redeploys

**Risk:** Breaking changes require coordinated deployment.

**Mitigation:**

- Additive changes only (new columns, tables)
- Deprecation period for removals
- Version compatibility checks

**No Automated Migration:** API has no migration framework.
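
One way the "version compatibility checks" mitigation could work is a startup check that compares the columns the API's hardcoded SQL depends on with the columns actually present in the database. The sketch below covers only the comparison step and assumes the actual column names have already been loaded (for example from `information_schema.columns`). `SchemaCheck`, `FindMissing` and the expected-column list are hypothetical and not part of the source.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class SchemaCheck
{
    // Returns "table.column" entries that the API expects but the database lacks.
    // A table that is absent from `actual` reports all of its expected columns.
    public static List<string> FindMissing(
        IReadOnlyDictionary<string, string[]> expected,
        IReadOnlyDictionary<string, HashSet<string>> actual)
    {
        var missing = new List<string>();

        foreach (var (table, columns) in expected)
        {
            actual.TryGetValue(table, out var present);

            foreach (var column in columns)
            {
                if (present is null || !present.Contains(column))
                {
                    missing.Add($"{table}.{column}");
                }
            }
        }

        return missing;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical subset of the columns the API's queries select.
        var expected = new Dictionary<string, string[]>
        {
            ["spotify_artist"] = new[] { "id", "name", "popularity", "genres" },
        };

        // Simulated database state in which `genres` has been dropped.
        var actual = new Dictionary<string, HashSet<string>>
        {
            ["spotify_artist"] = new HashSet<string> { "id", "name", "popularity" },
        };

        var missing = SchemaCheck.FindMissing(expected, actual);
        Console.WriteLine(missing.Count == 0
            ? "schema compatible"
            : "missing: " + string.Join(", ", missing));
        // Prints: missing: spotify_artist.genres
    }
}
```

A check like this only detects removed or renamed columns. Type changes and altered semantics would still need coordination with MiniMediaScanner releases.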