# MiniMediaMetadataAPI - Data Layer Analysis

## Database Technology

**RDBMS:** PostgreSQL

**Driver:** Npgsql 10.0.2

**ORM:** Dapper 2.1.72 (micro-ORM)

**Extensions:** pg_trgm (trigram similarity search)

## Schema Ownership

**Critical Constraint:** This API does NOT own the database schema.

**Schema Owner:** MiniMediaScanner (separate project)

**API Role:** Read-only consumer

**Migration Strategy:** None (schema managed externally)

### Implications

**Pros:**
- Clear separation of concerns
- API doesn't need provider API credentials
- Simpler deployment (no migration coordination)
- Sync complexity isolated in MiniMediaScanner

**Cons:**
- No control over schema evolution
- Breaking changes in MiniMediaScanner break the API
- Can't optimize the schema for this API's query patterns
- Data freshness depends on the external sync schedule

**Coupling Points:**
- Table names hardcoded in SQL queries
- Column names hardcoded in Dapper mappings
- Foreign key relationships assumed in joins
- Data types must match C# model properties

## Connection Configuration

**Connection String Format** (one key per line for readability; the actual value is a single semicolon-separated string):

```
Host=localhost;
Database=minimediametadata;
Username=postgres;
Password=password;
MinPoolSize=5;
MaxPoolSize=100;
Timeout=30;
CommandTimeout=30;
```

**Pooling Settings:**
- **MinPoolSize:** 5 connections kept alive
- **MaxPoolSize:** at most 100 open connections in the pool
- **Timeout:** 30 seconds to open a connection before failing
- **CommandTimeout:** 30 seconds for query execution

**Connection Lifecycle:**
- A connection is opened per repository method call
- It is returned to the pool when disposed after the query completes
- No long-lived connections
- No transaction management (read-only)

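## Fuzzy Search Implementation

### pg_trgm Extension

**Purpose:** Trigram-based similarity search for fuzzy text matching

**Configuration:**

```sql
SET LOCAL pg_trgm.similarity_threshold = 0.5;
```

`SET LOCAL` lasts only until the end of the current transaction; outside a transaction block PostgreSQL emits a warning and otherwise ignores it. Running the `SET LOCAL` and the search inside one explicit transaction guarantees the threshold applies. Whether it also takes effect when both statements are merely sent as one command should be verified against the driver and server version in use.

**Threshold:** 0.5 (the pg_trgm default is 0.3)

**Operators:**
- `%` - Similarity operator (returns true if similarity is greater than the current threshold)
- `similarity(text, text)` - Returns similarity score (0.0 to 1.0)
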
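### Search Query Pattern

**Example (Artist Search):**

```sql
SET LOCAL pg_trgm.similarity_threshold = 0.5;

SELECT
    id,
    name,
    popularity,
    external_url,
    followers,
    genres,
    last_sync_time
FROM spotify_artist
WHERE lower(name) % lower(@searchTerm)
ORDER BY similarity(lower(name), lower(@searchTerm)) DESC
LIMIT 20 OFFSET @offset;
```

**Key Features:**
- Case-insensitive matching (`lower()`)
- Similarity-based ordering (best matches first)
- Pagination support (LIMIT/OFFSET)
- Threshold filtering (only rows with similarity above 0.5)

**Performance:**
- Requires a GIN or GiST trigram index on `lower(name)`; the indexed expression must match the one used in the `WHERE` clause
- Index creation: `CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);`
- Without an index every row is scanned and scored (O(n)); with a GIN index PostgreSQL fetches only candidate rows sharing trigrams with the search term and rechecks them, so cost depends on how selective those trigrams are rather than on table size alone
- The GIN index accelerates the `%` filter only; `ORDER BY similarity(...)` still sorts every matching row. A GiST index (`gist_trgm_ops`) can additionally serve `ORDER BY lower(name) <-> lower(@searchTerm)` as an index-ordered nearest-neighbour search
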
### Similarity Scoring

**Algorithm:** Trigram overlap. pg_trgm lower-cases the text, pads each word with two leading spaces and one trailing space, and extracts every three-character sequence. The score is the number of shared trigrams divided by the number of distinct trigrams across both strings.

**Example:**

```
"Beatles" vs "Beetles"
beatles: "  b", " be", "bea", "eat", "atl", "tle", "les", "es "   (8)
beetles: "  b", " be", "bee", "eet", "etl", "tle", "les", "es "   (8)
Shared:  "  b", " be", "tle", "les", "es "                        (5)
Score:   5 / (8 + 8 - 5) = 5/11 ≈ 0.45 (below threshold)

"Beatles" vs "The Beatles"
beatles:     8 trigrams (as above)
the beatles: "  t", " th", "the", "he " + the 8 trigrams of "beatles"   (12)
Shared:      all 8 trigrams of "beatles"
Score:       8 / (8 + 12 - 8) = 8/12 ≈ 0.67 (above threshold)
```

**Tuning:**
- Lower threshold (0.3) = more results, more false positives
- Higher threshold (0.7) = fewer results, more precision
- Current 0.5 = balanced approach

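
To experiment with threshold choices outside the database, the scoring can be approximated in C#. The sketch below is a hypothetical helper (`TrigramSketch` is not part of the API or of pg_trgm). It assumes pg_trgm's default behaviour: non-alphanumeric characters separate words, words are lower-cased and padded with two leading spaces and one trailing space, and the score is shared trigrams over distinct trigrams of both strings. Locale handling and other pg_trgm build options are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper approximating pg_trgm's default similarity() for offline experiments.
public static class TrigramSketch
{
    // Split into alphanumeric words, lower-case them, pad each word with two
    // leading spaces and one trailing space, and collect the distinct trigrams.
    public static HashSet<string> Trigrams(string text)
    {
        var trigrams = new HashSet<string>(StringComparer.Ordinal);
        var word = new StringBuilder();

        void Flush()
        {
            if (word.Length == 0) return;
            string padded = "  " + word + " ";
            for (int i = 0; i + 3 <= padded.Length; i++)
            {
                trigrams.Add(padded.Substring(i, 3));
            }
            word.Clear();
        }

        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                word.Append(char.ToLowerInvariant(c));
            }
            else
            {
                Flush(); // any other character ends the current word
            }
        }
        Flush();
        return trigrams;
    }

    // Shared trigrams divided by the distinct trigrams found in either string.
    public static double Similarity(string a, string b)
    {
        HashSet<string> ta = Trigrams(a), tb = Trigrams(b);
        int shared = ta.Intersect(tb).Count();
        int union = ta.Count + tb.Count - shared;
        return union == 0 ? 0.0 : (double)shared / union;
    }

    // Mirrors the `%` operator: true only when the score is strictly above the threshold.
    public static bool Matches(string a, string b, double threshold = 0.5) =>
        Similarity(a, b) > threshold;
}
```

For example, `TrigramSketch.Matches("Beatles", "Beetles", 0.3)` returns true because 5/11 ≈ 0.45 clears a 0.3 threshold, which is the extra-match trade-off listed under Tuning.
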
## Database Schema

### Provider-Specific Tables

Each provider has an isolated table structure. There are no cross-provider foreign keys.

### Spotify Schema

**Tables:**
1. `spotify_artist` - Artist metadata
2. `spotify_artist_image` - Artist images (1:N)
3. `spotify_album` - Album metadata
4. `spotify_album_artist` - Album-artist relationships (M:N)
5. `spotify_album_image` - Album artwork (1:N)
6. `spotify_album_externalid` - External identifiers (UPC, EAN) (1:N)
7. `spotify_track` - Track metadata
8. `spotify_track_artist` - Track-artist relationships (M:N)
9. `spotify_track_externalid` - External identifiers (ISRC) (1:N)

**spotify_artist:**

```sql
CREATE TABLE spotify_artist (
    id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    popularity INTEGER,
    external_url VARCHAR(500),
    followers INTEGER,
    genres TEXT[], -- PostgreSQL array
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);
```

**spotify_artist_image:**

```sql
CREATE TABLE spotify_artist_image (
    id SERIAL PRIMARY KEY,
    artist_id VARCHAR(255) REFERENCES spotify_artist(id),
    url VARCHAR(1000) NOT NULL,
    height INTEGER,
    width INTEGER
);

CREATE INDEX idx_spotify_artist_image_artist ON spotify_artist_image(artist_id);
```

**spotify_album:**

```sql
CREATE TABLE spotify_album (
    id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    popularity INTEGER,
    external_url VARCHAR(500),
    label VARCHAR(500),
    release_date VARCHAR(50), -- Stored as string (YYYY, YYYY-MM, or YYYY-MM-DD)
    total_tracks INTEGER,
    album_type VARCHAR(50), -- album, single, compilation
    copyright TEXT,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_spotify_album_name_trgm ON spotify_album USING gin(lower(name) gin_trgm_ops);
```

**spotify_album_artist (junction table):**

```sql
CREATE TABLE spotify_album_artist (
    id SERIAL PRIMARY KEY,
    album_id VARCHAR(255) REFERENCES spotify_album(id),
    artist_id VARCHAR(255) REFERENCES spotify_artist(id)
);

CREATE INDEX idx_spotify_album_artist_album ON spotify_album_artist(album_id);
CREATE INDEX idx_spotify_album_artist_artist ON spotify_album_artist(artist_id);
```

**spotify_track:**

```sql
CREATE TABLE spotify_track (
    id VARCHAR(255) PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    album_id VARCHAR(255) REFERENCES spotify_album(id),
    popularity INTEGER,
    external_url VARCHAR(500),
    duration_ms INTEGER,
    explicit BOOLEAN,
    disc_number INTEGER,
    track_number INTEGER,
    label VARCHAR(500),
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_spotify_track_name_trgm ON spotify_track USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_track_album ON spotify_track(album_id);
```

**spotify_album_externalid:**

```sql
CREATE TABLE spotify_album_externalid (
    id SERIAL PRIMARY KEY,
    album_id VARCHAR(255) REFERENCES spotify_album(id),
    type VARCHAR(50), -- upc, ean
    value VARCHAR(255)
);

CREATE INDEX idx_spotify_album_externalid_album ON spotify_album_externalid(album_id);
CREATE INDEX idx_spotify_album_externalid_value ON spotify_album_externalid(value);
```

**spotify_track_externalid:**

```sql
CREATE TABLE spotify_track_externalid (
    id SERIAL PRIMARY KEY,
    track_id VARCHAR(255) REFERENCES spotify_track(id),
    type VARCHAR(50), -- isrc
    value VARCHAR(255)
);

CREATE INDEX idx_spotify_track_externalid_track ON spotify_track_externalid(track_id);
CREATE INDEX idx_spotify_track_externalid_value ON spotify_track_externalid(value);
```

### Tidal Schema

**Tables:**
1. `tidal_artist` - Artist metadata
2. `tidal_artist_image_link` - Artist image URLs (1:N)
3. `tidal_album` - Album metadata
4. `tidal_album_external_link` - External URLs (1:N)
5. `tidal_album_image` - Album artwork (1:N)
6. `tidal_track` - Track metadata
7. `tidal_track_artist` - Track-artist relationships (M:N)
8. `tidal_track_external_link` - External URLs (1:N)

**Key Differences from Spotify:**
- ID type: INTEGER instead of VARCHAR
- No popularity field
- No genres field
- External links instead of external IDs
- Image links stored as separate table

**tidal_artist:**

```sql
CREATE TABLE tidal_artist (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    url VARCHAR(500),
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_tidal_artist_name_trgm ON tidal_artist USING gin(lower(name) gin_trgm_ops);
```

**tidal_album:**

```sql
CREATE TABLE tidal_album (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    artist_id INTEGER REFERENCES tidal_artist(id),
    url VARCHAR(500),
    release_date VARCHAR(50),
    total_tracks INTEGER,
    duration INTEGER, -- Total duration in seconds
    explicit BOOLEAN,
    upc VARCHAR(255),
    copyright TEXT,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_tidal_album_name_trgm ON tidal_album USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_tidal_album_artist ON tidal_album(artist_id);
```

### MusicBrainz Schema

**Tables:**
1. `musicbrainz_artist` - Artist metadata
2. `musicbrainz_release` - Release (album) metadata
3. `musicbrainz_release_label` - Release-label relationships (M:N)
4. `musicbrainz_label` - Label metadata
5. `musicbrainz_release_track` - Track metadata
6. `musicbrainz_release_track_artist` - Track-artist relationships (M:N)

**Key Differences:**
- ID type: UUID (Guid)
- "Release" instead of "Album"
- Sort name field for artists
- Label as separate entity
- No popularity or follower counts
- No images (stored externally via Cover Art Archive)

**musicbrainz_artist:**

```sql
CREATE TABLE musicbrainz_artist (
    id UUID PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    sort_name VARCHAR(500), -- For alphabetical sorting (e.g., "Beatles, The")
    type VARCHAR(100), -- Person, Group, Orchestra, etc.
    country VARCHAR(2), -- ISO country code
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_musicbrainz_artist_name_trgm ON musicbrainz_artist USING gin(lower(name) gin_trgm_ops);
```

**musicbrainz_release:**

```sql
CREATE TABLE musicbrainz_release (
    id UUID PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    artist_id UUID REFERENCES musicbrainz_artist(id),
    release_date VARCHAR(50),
    country VARCHAR(2),
    barcode VARCHAR(255), -- Similar to UPC
    status VARCHAR(100), -- Official, Promotion, Bootleg, etc.
    packaging VARCHAR(100), -- Jewel Case, Digipak, etc.
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_musicbrainz_release_name_trgm ON musicbrainz_release USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_musicbrainz_release_artist ON musicbrainz_release(artist_id);
```

**musicbrainz_label:**

```sql
CREATE TABLE musicbrainz_label (
    id UUID PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    type VARCHAR(100), -- Original Production, Bootleg Production, etc.
    country VARCHAR(2),
    last_sync_time TIMESTAMP WITH TIME ZONE
);
```

**musicbrainz_release_label (junction table):**

```sql
CREATE TABLE musicbrainz_release_label (
    id SERIAL PRIMARY KEY,
    release_id UUID REFERENCES musicbrainz_release(id),
    label_id UUID REFERENCES musicbrainz_label(id),
    catalog_number VARCHAR(255)
);

CREATE INDEX idx_musicbrainz_release_label_release ON musicbrainz_release_label(release_id);
CREATE INDEX idx_musicbrainz_release_label_label ON musicbrainz_release_label(label_id);
```

### Deezer Schema

**Tables:**
1. `deezer_artist` - Artist metadata
2. `deezer_artist_image_link` - Artist image URLs (1:N)
3. `deezer_album` - Album metadata
4. `deezer_album_image_link` - Album artwork URLs (1:N)
5. `deezer_album_artist` - Album-artist relationships (M:N)
6. `deezer_track` - Track metadata
7. `deezer_track_artist` - Track-artist relationships (M:N)

**Key Differences:**
- ID type: BIGINT
- Has popularity (called "fans")
- Has genres
- No UPC/ISRC fields
- No label information

**deezer_artist:**

```sql
CREATE TABLE deezer_artist (
    id BIGINT PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    url VARCHAR(500),
    fans INTEGER, -- Similar to followers
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_deezer_artist_name_trgm ON deezer_artist USING gin(lower(name) gin_trgm_ops);
```

**deezer_album:**

```sql
CREATE TABLE deezer_album (
    id BIGINT PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    url VARCHAR(500),
    release_date VARCHAR(50),
    total_tracks INTEGER,
    duration INTEGER, -- Total duration in seconds
    explicit BOOLEAN,
    fans INTEGER,
    genres TEXT[], -- PostgreSQL array
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_deezer_album_name_trgm ON deezer_album USING gin(lower(name) gin_trgm_ops);
```

### Discogs Schema

**Tables:**
1. `discogs_artist` - Artist metadata
2. `discogs_artist_alias` - Artist aliases (1:N)
3. `discogs_artist_url` - Artist URLs (1:N)
4. `discogs_release` - Release metadata
5. `discogs_release_artist` - Release-artist relationships (M:N)
6. `discogs_release_identifier` - Barcodes/identifiers (1:N)
7. `discogs_release_track` - Track metadata
8. `discogs_label` - Label metadata
9. `discogs_label_sublabel` - Label hierarchy (1:N)
10. `discogs_label_url` - Label URLs (1:N)

**Key Differences:**
- ID type: INTEGER
- Most comprehensive label data
- Artist aliases tracked
- Multiple identifiers per release (Barcode, Matrix, etc.)
- No popularity metrics
- No image URLs (stored externally)

**discogs_artist:**

```sql
CREATE TABLE discogs_artist (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    real_name VARCHAR(500), -- For pseudonyms
    profile TEXT, -- Biography
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_discogs_artist_name_trgm ON discogs_artist USING gin(lower(name) gin_trgm_ops);
```

**discogs_artist_alias:**

```sql
CREATE TABLE discogs_artist_alias (
    id SERIAL PRIMARY KEY,
    artist_id INTEGER REFERENCES discogs_artist(id),
    alias_name VARCHAR(500)
);

CREATE INDEX idx_discogs_artist_alias_artist ON discogs_artist_alias(artist_id);
CREATE INDEX idx_discogs_artist_alias_name_trgm ON discogs_artist_alias USING gin(lower(alias_name) gin_trgm_ops);
```

**discogs_release:**

```sql
CREATE TABLE discogs_release (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    released VARCHAR(50),
    country VARCHAR(100),
    notes TEXT,
    genres TEXT[],
    styles TEXT[], -- More specific than genres
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_discogs_release_name_trgm ON discogs_release USING gin(lower(name) gin_trgm_ops);
```

**discogs_release_identifier:**

```sql
CREATE TABLE discogs_release_identifier (
    id SERIAL PRIMARY KEY,
    release_id INTEGER REFERENCES discogs_release(id),
    type VARCHAR(100), -- Barcode, Matrix/Runout, Label Code, etc.
    value VARCHAR(500),
    description TEXT
);

CREATE INDEX idx_discogs_release_identifier_release ON discogs_release_identifier(release_id);
CREATE INDEX idx_discogs_release_identifier_value ON discogs_release_identifier(value);
```

**discogs_label:**

```sql
CREATE TABLE discogs_label (
    id INTEGER PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    contact_info TEXT,
    profile TEXT,
    parent_label_id INTEGER REFERENCES discogs_label(id),
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_discogs_label_name_trgm ON discogs_label USING gin(lower(name) gin_trgm_ops);
```

### SoundCloud Schema

**Tables:**
1. `soundcloud_user` - User/artist metadata
2. `soundcloud_playlist` - Playlist metadata
3. `soundcloud_track` - Track metadata
4. `soundcloud_track_artist` - Track-artist relationships (M:N)

**Key Differences:**
- "User" instead of "Artist" (user-generated content platform)
- Playlist as first-class entity
- No album concept
- Minimal metadata (no UPC, ISRC, labels)
- ID type: BIGINT

**soundcloud_user:**

```sql
CREATE TABLE soundcloud_user (
    id BIGINT PRIMARY KEY,
    username VARCHAR(500) NOT NULL,
    full_name VARCHAR(500),
    url VARCHAR(500),
    avatar_url VARCHAR(1000),
    followers_count INTEGER,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_soundcloud_user_username_trgm ON soundcloud_user USING gin(lower(username) gin_trgm_ops);
```

**soundcloud_playlist:**

```sql
CREATE TABLE soundcloud_playlist (
    id BIGINT PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    user_id BIGINT REFERENCES soundcloud_user(id),
    url VARCHAR(500),
    artwork_url VARCHAR(1000),
    duration INTEGER, -- Total duration in milliseconds
    track_count INTEGER,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_soundcloud_playlist_title_trgm ON soundcloud_playlist USING gin(lower(title) gin_trgm_ops);
CREATE INDEX idx_soundcloud_playlist_user ON soundcloud_playlist(user_id);
```

**soundcloud_track:**

```sql
CREATE TABLE soundcloud_track (
    id BIGINT PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    user_id BIGINT REFERENCES soundcloud_user(id),
    url VARCHAR(500),
    artwork_url VARCHAR(1000),
    duration INTEGER, -- Duration in milliseconds
    genre VARCHAR(255),
    playback_count INTEGER,
    last_sync_time TIMESTAMP WITH TIME ZONE
);

CREATE INDEX idx_soundcloud_track_title_trgm ON soundcloud_track USING gin(lower(title) gin_trgm_ops);
CREATE INDEX idx_soundcloud_track_user ON soundcloud_track(user_id);
```

## ID Type Comparison

| Provider | Artist ID | Album ID | Track ID | Notes |
|----------|-----------|----------|----------|-------|
| Spotify | VARCHAR(255) | VARCHAR(255) | VARCHAR(255) | Base62 encoded (22 chars) |
| Tidal | INTEGER | INTEGER | INTEGER | Sequential integers |
| MusicBrainz | UUID | UUID | UUID | RFC 4122 UUIDs |
| Deezer | BIGINT | BIGINT | BIGINT | Large integers |
| Discogs | INTEGER | INTEGER | INTEGER | Sequential integers |
| SoundCloud | BIGINT | N/A | BIGINT | No album concept |

**Implications:**
- Cross-provider ID lookups impossible
- ID parameter must match provider type
- C# models use provider-specific types
- No universal identifier system

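
Because the ID parameter must match the provider's column type, a malformed ID can be rejected before any query runs. The sketch below is one possible pre-check, not the API's actual validation. The `Provider` enum, `ProviderIdValidator`, and the positive-ID rule for integer providers are assumptions; the Spotify rule (22 Base62 characters) and the column types come from the table above.

```csharp
using System;
using System.Linq;

// Hypothetical provider enum; the API may model providers differently.
public enum Provider { Spotify, Tidal, MusicBrainz, Deezer, Discogs, SoundCloud }

public static class ProviderIdValidator
{
    private const string Base62 =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // True when rawId can be bound to the provider's ID column type.
    public static bool IsValid(Provider provider, string rawId)
    {
        if (string.IsNullOrWhiteSpace(rawId))
        {
            return false;
        }

        return provider switch
        {
            // VARCHAR(255) column; Spotify IDs are 22 Base62 characters.
            Provider.Spotify => rawId.Length == 22 && rawId.All(c => Base62.Contains(c)),
            // UUID column.
            Provider.MusicBrainz => Guid.TryParse(rawId, out _),
            // INTEGER (32-bit) columns; positive IDs assumed.
            Provider.Tidal or Provider.Discogs => int.TryParse(rawId, out int id32) && id32 > 0,
            // BIGINT (64-bit) columns; positive IDs assumed.
            Provider.Deezer or Provider.SoundCloud => long.TryParse(rawId, out long id64) && id64 > 0,
            _ => false,
        };
    }
}
```

Rejecting `"9999999999"` for Tidal while accepting it for Deezer mirrors the INTEGER versus BIGINT column types: the value is outside the 32-bit range.
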
## Data Type Patterns

### Arrays (PostgreSQL Native)

**Usage:** Genres and styles (external IDs are stored in separate 1:N tables, not in arrays)

**Example:**

```sql
genres TEXT[] -- e.g. '{rock,pop,alternative}'
```

**Dapper Mapping:**

```csharp
public class SpotifyArtist
{
    // Npgsql reads text[] as string[]; Dapper assigns it to the property as-is
    public string[] Genres { get; set; }
}
```

### Timestamps

**Type:** `TIMESTAMP WITH TIME ZONE`

**Purpose:** Track last sync time from provider

**Example:**

```sql
last_sync_time TIMESTAMP WITH TIME ZONE DEFAULT NOW()
```

**C# Mapping:**

```csharp
// Npgsql 6.0 and later read timestamptz values as DateTime with Kind = Utc
public DateTime? LastSyncTime { get; set; }
```

### Variable-Length Dates

**Type:** VARCHAR(50)

**Formats:** YYYY, YYYY-MM, YYYY-MM-DD

**Rationale:** Providers return different precision levels

**Examples:**
- `"1969"` - Year only
- `"1969-09"` - Year and month
- `"1969-09-26"` - Full date

**C# Mapping:**

```csharp
public string ReleaseDate { get; set; } // Stored as string, parsed in application
```

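
Since precision varies by provider, parsing should keep only the components that were actually supplied instead of defaulting a missing month or day to January 1st. A minimal sketch, assuming only the three formats listed above occur; `PartialDate` and `ReleaseDateParser` are hypothetical names:

```csharp
#nullable enable
using System;
using System.Globalization;

// Hypothetical value type that keeps the precision the provider supplied.
public readonly record struct PartialDate(int Year, int? Month, int? Day);

public static class ReleaseDateParser
{
    // Longest format first so "1969-09-26" is not matched by a shorter pattern.
    private static readonly string[] Formats = { "yyyy-MM-dd", "yyyy-MM", "yyyy" };

    // Accepts "YYYY", "YYYY-MM" or "YYYY-MM-DD"; anything else is rejected.
    public static bool TryParseReleaseDate(string? value, out PartialDate result)
    {
        result = default;
        if (string.IsNullOrWhiteSpace(value)) return false;

        foreach (string format in Formats)
        {
            if (DateTime.TryParseExact(value, format, CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out DateTime parsed))
            {
                // Expose only the components that were present in the input.
                result = format switch
                {
                    "yyyy-MM-dd" => new PartialDate(parsed.Year, parsed.Month, parsed.Day),
                    "yyyy-MM" => new PartialDate(parsed.Year, parsed.Month, null),
                    _ => new PartialDate(parsed.Year, null, null),
                };
                return true;
            }
        }
        return false;
    }
}
```

An impossible date such as `"1969-02-30"` fails every format and is rejected rather than being truncated to a year.
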
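## Query Patterns

### Artist Search

```sql
SET LOCAL pg_trgm.similarity_threshold = 0.5;

SELECT
    a.id,
    a.name,
    a.popularity,
    a.external_url,
    a.followers,
    a.genres,
    a.last_sync_time,
    i.url AS image_url,
    i.height AS image_height,
    i.width AS image_width
FROM spotify_artist a
LEFT JOIN spotify_artist_image i ON a.id = i.artist_id
WHERE lower(a.name) % lower(@searchTerm)
ORDER BY similarity(lower(a.name), lower(@searchTerm)) DESC
LIMIT 20 OFFSET @offset;
```

`LIMIT 20` applies to the joined rows, not to artists: an artist with three images consumes three of the 20 rows, so a page can contain fewer than 20 artists and one artist's images can be split across pages. Paging the artists in a subquery before joining the images avoids this.

**Dapper Mapping:**

```csharp
using System.Collections.Generic;
using System.Linq;
using Dapper;

// Assumes DefaultTypeMap.MatchNamesWithUnderscores = true is set once at startup;
// without it Dapper does not map snake_case columns (last_sync_time, image_url)
// to PascalCase properties (LastSyncTime, ImageUrl).
var artistDict = new Dictionary<string, SpotifyArtist>();

await connection.QueryAsync<SpotifyArtist, SpotifyArtistImage, SpotifyArtist>(
    sql,
    (artist, image) =>
    {
        // The join yields one row per image; keep a single instance per artist.
        if (!artistDict.TryGetValue(artist.Id, out var existingArtist))
        {
            existingArtist = artist;
            existingArtist.Images = new List<SpotifyArtistImage>();
            artistDict.Add(artist.Id, existingArtist);
        }

        // Dapper supplies null when the LEFT JOIN found no image row (image columns are NULL).
        if (image != null)
        {
            existingArtist.Images.Add(image);
        }

        return existingArtist;
    },
    new { searchTerm, offset },
    splitOn: "image_url" // first column that belongs to SpotifyArtistImage
);

// The callback's return values repeat artists, so return the de-duplicated dictionary.
return artistDict.Values.ToList();
```
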
### Album with Artists

```sql
SELECT
    a.id,
    a.name,
    a.popularity,
    a.external_url,
    a.label,
    a.release_date,
    a.total_tracks,
    a.album_type,
    a.copyright,
    a.last_sync_time,
    ar.id AS artist_id,
    ar.name AS artist_name
FROM spotify_album a
LEFT JOIN spotify_album_artist aa ON a.id = aa.album_id
LEFT JOIN spotify_artist ar ON aa.artist_id = ar.id
WHERE a.id = @albumId;
```

**Multi-Mapping:** Album with nested artist list.

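
Because of the LEFT JOINs, this query returns one row per album-artist pair (or a single row with NULL artist columns when the album has no artist rows), and the multi-mapping step folds those rows back into one album. The folding logic is shown below without Dapper so it can be exercised on its own; `AlbumRow`, `Album`, `ArtistRef`, and `AlbumAssembler` are hypothetical stand-ins for the API's models.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row: album columns repeat on every row; artist columns are
// nullable because of the LEFT JOINs.
public sealed record AlbumRow(string Id, string Name, string? ArtistId, string? ArtistName);

public sealed record ArtistRef(string Id, string Name);

public sealed class Album
{
    public string Id { get; init; } = "";
    public string Name { get; init; } = "";
    public List<ArtistRef> Artists { get; } = new();
}

public static class AlbumAssembler
{
    // Collapses joined rows into one Album per id, skipping NULL artist columns
    // and duplicate artist ids.
    public static List<Album> Assemble(IEnumerable<AlbumRow> rows)
    {
        var albums = new Dictionary<string, Album>();
        foreach (var row in rows)
        {
            if (!albums.TryGetValue(row.Id, out var album))
            {
                album = new Album { Id = row.Id, Name = row.Name };
                albums.Add(row.Id, album);
            }

            if (row.ArtistId is not null && album.Artists.All(a => a.Id != row.ArtistId))
            {
                album.Artists.Add(new ArtistRef(row.ArtistId, row.ArtistName ?? ""));
            }
        }
        return albums.Values.ToList();
    }
}
```

In the Dapper version the same dictionary lookup runs inside the multi-mapping callback, as in the artist search example.
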
### Track with Album and Artists

```sql
SELECT
    t.id,
    t.name,
    t.popularity,
    t.external_url,
    t.duration_ms,
    t.explicit,
    t.disc_number,
    t.track_number,
    t.label,
    t.last_sync_time,
    a.id AS album_id,
    a.name AS album_name,
    a.release_date AS album_release_date,
    ar.id AS artist_id,
    ar.name AS artist_name
FROM spotify_track t
LEFT JOIN spotify_album a ON t.album_id = a.id
LEFT JOIN spotify_track_artist ta ON t.id = ta.track_id
LEFT JOIN spotify_artist ar ON ta.artist_id = ar.id
WHERE t.id = @trackId;
```

**Multi-Mapping:** Track with nested album and artist list.

### External ID Lookup

```sql
SELECT
    a.id,
    a.name,
    a.popularity,
    a.external_url,
    a.label,
    a.release_date,
    a.total_tracks,
    a.album_type,
    a.last_sync_time
FROM spotify_album a
INNER JOIN spotify_album_externalid e ON a.id = e.album_id
WHERE e.type = 'upc' AND e.value = @upc;
```

**Use Case:** Find album by UPC barcode.

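
The lookup compares `e.value` with the parameter as an exact string, so a mistyped barcode simply returns no rows. A caller could validate the input first with the GS1 mod-10 check digit shared by UPC-A (12 digits) and EAN-13 (13 digits). `BarcodeCheck` is a hypothetical helper, and how MiniMediaScanner normalises stored values (for example, leading zeros) is not documented here.

```csharp
using System.Linq;

public static class BarcodeCheck
{
    // GS1 mod-10 check: weights alternate 3, 1, 3, ... starting from the digit
    // immediately left of the check digit. Valid for UPC-A (12) and EAN-13 (13).
    public static bool HasValidCheckDigit(string code)
    {
        if (code is null || (code.Length != 12 && code.Length != 13)
            || !code.All(c => c >= '0' && c <= '9'))
        {
            return false;
        }

        int sum = 0;
        for (int i = code.Length - 2, weight = 3; i >= 0; i--, weight = 4 - weight)
        {
            sum += (code[i] - '0') * weight;
        }

        int expected = (10 - sum % 10) % 10;
        return code[^1] - '0' == expected;
    }
}
```

A UPC-A code and its EAN-13 form (the same digits with a leading `0`) both pass, because the extra leading zero adds nothing to the weighted sum.
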
## Index Strategy

### Required Indexes

**Fuzzy Search (GIN trigram):**

```sql
CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_album_name_trgm ON spotify_album USING gin(lower(name) gin_trgm_ops);
CREATE INDEX idx_spotify_track_name_trgm ON spotify_track USING gin(lower(name) gin_trgm_ops);
```

**Foreign Keys:**

```sql
CREATE INDEX idx_spotify_album_artist_album ON spotify_album_artist(album_id);
CREATE INDEX idx_spotify_album_artist_artist ON spotify_album_artist(artist_id);
CREATE INDEX idx_spotify_track_album ON spotify_track(album_id);
CREATE INDEX idx_spotify_track_artist_track ON spotify_track_artist(track_id);
CREATE INDEX idx_spotify_track_artist_artist ON spotify_track_artist(artist_id);
```

**External IDs:**

```sql
CREATE INDEX idx_spotify_album_externalid_value ON spotify_album_externalid(value);
CREATE INDEX idx_spotify_track_externalid_value ON spotify_track_externalid(value);
```

### Index Maintenance

**Owned by:** MiniMediaScanner (schema owner)

**API Responsibility:** None (read-only consumer)

**Performance Impact:**
- GIN indexes: Large (2-3x table size), slow writes, fast reads
- B-tree indexes: Moderate size, fast writes, fast reads
- No index = full table scan (unacceptable for fuzzy search)

## Data Freshness

**Sync Mechanism:** MiniMediaScanner polls provider APIs

**Sync Frequency:** Unknown (configured in MiniMediaScanner)

**Staleness Indicator:** `last_sync_time` column

**API Behavior:**
- Returns whatever data exists in the database
- No real-time provider API calls
- No cache invalidation
- No sync triggering

**Client Considerations:**
- Check `lastSyncTime` in the response
- Stale data possible (hours to days old)
- No guarantee of completeness
- Provider outages affect sync, not queries

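
A client can turn `lastSyncTime` into a staleness decision against a tolerance it chooses itself. A minimal sketch, assuming the value arrives as a nullable UTC `DateTime` and treating a missing timestamp as stale; `Freshness.IsStale` is hypothetical, and the 24-hour tolerance used in the checks is an arbitrary example:

```csharp
using System;

public static class Freshness
{
    // Hypothetical helper: stale when never synced, or when the last sync is
    // older than the caller's tolerance.
    public static bool IsStale(DateTime? lastSyncTimeUtc, DateTime nowUtc, TimeSpan maxAge) =>
        lastSyncTimeUtc is not DateTime synced || nowUtc - synced > maxAge;
}
```

Passing `nowUtc` explicitly, instead of reading `DateTime.UtcNow` inside the helper, keeps the check deterministic and easy to test.
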
## Provider Feature Matrix

| Feature | Spotify | Tidal | MusicBrainz | Deezer | Discogs | SoundCloud |
|---------|---------|-------|-------------|--------|---------|------------|
| **Artist Data** | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | ✗ |
| Followers | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | ✓ |
| Genres | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ |
| Images | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ (avatar) |
| Sort Name | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Aliases | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
| **Album Data** | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | N/A |
| Images | ✓ | ✓ | ✗ | ✓ | ✗ | N/A |
| Label | ✓ | ✗ | ✓ | ✗ | ✓ | N/A |
| UPC | ✓ | ✓ | ✓ (barcode) | ✗ | ✓ | N/A |
| Copyright | ✓ | ✓ | ✗ | ✗ | ✗ | N/A |
| Album Type | ✓ | ✗ | ✓ | ✗ | ✗ | N/A |
| **Track Data** | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ (playback_count) |
| Duration | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Explicit | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| ISRC | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Disc/Track # | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |

## Database Size Estimates

**Assumptions:**
- 1 million artists
- 10 million albums
- 100 million tracks

**Spotify Tables:**
- `spotify_artist`: ~500 MB
- `spotify_artist_image`: ~200 MB
- `spotify_album`: ~5 GB
- `spotify_album_artist`: ~1 GB
- `spotify_album_image`: ~2 GB
- `spotify_track`: ~50 GB
- `spotify_track_artist`: ~10 GB
- **Total:** ~70 GB per provider

**All Providers:** ~420 GB (6 providers)

**Indexes:** ~200 GB (GIN indexes are large)

**Total Database:** ~620 GB for comprehensive catalog

**Implications:**
- Requires substantial storage
- Backup/restore time significant
- Index rebuilds time-consuming
- Connection pooling critical

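
The totals can be re-derived from the per-table figures as a sanity check. The sketch below copies the Spotify estimates (in GB) from the list above and assumes, as the estimate does, that the other five providers are the same size; it also shows that the artist, album, and track figures each imply about 500 bytes per row. It checks the arithmetic only, not the underlying assumptions.

```csharp
using System.Linq;

public static class SizeEstimate
{
    // Spotify per-table estimates in GB: artist, artist_image, album,
    // album_artist, album_image, track, track_artist.
    public static readonly double[] SpotifyTablesGb = { 0.5, 0.2, 5, 1, 2, 50, 10 };

    public const int Providers = 6;
    public const double IndexesGb = 200;

    public static double PerProviderGb => SpotifyTablesGb.Sum();       // about 68.7
    public static double AllProvidersGb => PerProviderGb * Providers;  // about 412.2
    public static double TotalGb => AllProvidersGb + IndexesGb;        // about 612.2

    // Average row width implied by a table size and row count (1 GB = 1e9 bytes here).
    public static double BytesPerRow(double tableGb, double rows) => tableGb * 1e9 / rows;
}
```

The exact sum is about 68.7 GB per provider and about 612 GB overall, consistent with the rounded ~70 GB and ~620 GB figures.
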
## Performance Considerations

### Query Performance

**Fuzzy Search:**
- With GIN index: 10-50ms for 20 results
- Without index: 5-30 seconds (full table scan)
- Threshold tuning affects result count and speed

**ID Lookup:**
- With primary key: <1ms
- With foreign key index: 1-5ms

**Join Queries:**
- Album with artists: 5-20ms
- Track with album and artists: 10-30ms
- Depends on relationship cardinality

### Optimization Strategies

**Implemented:**
- GIN indexes for fuzzy search
- B-tree indexes for foreign keys
- Connection pooling
- Parameterized queries (SQL injection prevention)

**Missing:**
- Query result caching (Redis/Memcached)
- Materialized views for complex joins
- Partitioning for large tables
- Read replicas for horizontal scaling

### Bottlenecks

1. **GIN Index Size:** Large memory footprint
2. **Fuzzy Search:** CPU-intensive similarity calculations
3. **Multi-Provider Queries:** 6 parallel database queries per multi-provider search
4. **No Caching:** Every request hits the database
5. **Connection Pool Limit:** 100 max connections per instance

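
Bottlenecks 3 and 5 interact. If each multi-provider search holds one pooled connection per provider at the same time, the pool size caps how many such searches can run concurrently before further requests wait for a free connection (in Npgsql, eventually failing once the connection `Timeout` elapses). A back-of-the-envelope sketch, assuming one connection per provider query and no other traffic on the pool; `PoolBudget` is hypothetical:

```csharp
public static class PoolBudget
{
    // Number of fan-out searches that can hold all of their connections at once.
    public static int MaxConcurrentFanOuts(int maxPoolSize, int providersPerSearch) =>
        providersPerSearch <= 0 ? 0 : maxPoolSize / providersPerSearch;

    // Connections left for other work while `searches` fan-outs are active.
    public static int RemainingConnections(int maxPoolSize, int providersPerSearch, int searches) =>
        System.Math.Max(0, maxPoolSize - providersPerSearch * searches);
}
```

With the configured `MaxPoolSize=100` and six providers, 16 searches can run fully in parallel per instance, leaving 4 connections for other queries.
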
## Data Integrity

**Constraints:**
- Primary keys on all entity tables
- Foreign keys for relationships
- NOT NULL on critical fields (id, name)

**No Constraints:**
- No unique constraints on names (duplicates allowed)
- No check constraints on data ranges
- No triggers for data validation

**Orphan Prevention:**
- The DDL shown above declares plain `REFERENCES` clauses, which default to `ON DELETE NO ACTION`; whether MiniMediaScanner's actual schema uses `CASCADE` is unverified
- Junction tables maintain referential integrity through their foreign keys

**Data Quality:**
- Depends entirely on MiniMediaScanner sync quality
- No validation in this API
- Garbage in, garbage out

## Backup and Recovery

**Responsibility:** Database administrator (not API)

**Recommended Strategy:**
- Daily full backups
- Continuous WAL archiving
- Point-in-time recovery capability
- Backup retention: 30 days

**Recovery Time:**
- Full restore: Hours (620 GB database)
- Index rebuild: Hours (GIN indexes)
- Sync from providers: Days to weeks

## Schema Evolution

**Change Process:**
1. MiniMediaScanner updates schema
2. MiniMediaScanner deploys migration
3. MiniMediaMetadataAPI updates models
4. MiniMediaMetadataAPI redeploys

**Risk:** Breaking changes require coordinated deployment.

**Mitigation:**
- Additive changes only (new columns, tables)
- Deprecation period for removals
- Version compatibility checks

**No Automated Migration:** API has no migration framework.

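
One way to implement the version compatibility checks listed under Mitigation is a startup check that compares the columns the API's SQL relies on with what the database reports. A minimal sketch of the comparison step, assuming the actual `(table, column)` pairs have already been read, for example from `information_schema.columns`; `SchemaCheck` and the expected-column map are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SchemaCheck
{
    // Returns "table.column" entries the API expects but the database lacks.
    // Extra columns in the database are ignored, so additive changes pass.
    public static List<string> MissingColumns(
        IReadOnlyDictionary<string, string[]> expected,
        IEnumerable<(string Table, string Column)> actual)
    {
        var present = new HashSet<(string, string)>(
            actual.Select(a => (a.Table.ToLowerInvariant(), a.Column.ToLowerInvariant())));

        return expected
            .SelectMany(kv => kv.Value.Select(col => (Table: kv.Key, Column: col)))
            .Where(e => !present.Contains((e.Table.ToLowerInvariant(), e.Column.ToLowerInvariant())))
            .Select(e => $"{e.Table}.{e.Column}")
            .ToList();
    }
}
```

Because extra columns are ignored, additive changes from MiniMediaScanner pass the check, while a dropped or renamed column is reported before the first query fails.
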