feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
@@ -0,0 +1,980 @@
|
||||
# MiniMediaMetadataAPI - Data Layer Analysis
|
||||
|
||||
## Database Technology
|
||||
|
||||
**RDBMS:** PostgreSQL
|
||||
**Driver:** Npgsql 10.0.2
|
||||
**ORM:** Dapper 2.1.72 (micro-ORM)
|
||||
**Extensions:** pg_trgm (trigram similarity search)
|
||||
|
||||
## Schema Ownership
|
||||
|
||||
**Critical Constraint:** This API does NOT own the database schema.
|
||||
|
||||
**Schema Owner:** MiniMediaScanner (separate project)
|
||||
**API Role:** Read-only consumer
|
||||
**Migration Strategy:** None (schema managed externally)
|
||||
|
||||
### Implications
|
||||
|
||||
**Pros:**
|
||||
- Clear separation of concerns
|
||||
- API doesn't need provider API credentials
|
||||
- Simpler deployment (no migration coordination)
|
||||
- Sync complexity isolated in MiniMediaScanner
|
||||
|
||||
**Cons:**
|
||||
- No control over schema evolution
|
||||
- Breaking changes in MiniMediaScanner break API
|
||||
- Can't optimize schema for query patterns
|
||||
- Data freshness depends on external sync schedule
|
||||
|
||||
**Coupling Points:**
|
||||
- Table names hardcoded in SQL queries
|
||||
- Column names hardcoded in Dapper mappings
|
||||
- Foreign key relationships assumed in joins
|
||||
- Data types must match C# model properties
|
||||
|
||||
## Connection Configuration
|
||||
|
||||
**Connection String Format:**
|
||||
```
Host=localhost;
Database=minimediametadata;
Username=postgres;
Password=password;
MinPoolSize=5;
MaxPoolSize=100;
Timeout=30;
CommandTimeout=30;
```
|
||||
|
||||
**Pooling Settings:**
|
||||
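- **MinPoolSize:** 5 connections kept open in the pool, even when idle
- **MaxPoolSize:** at most 100 open connections per pool (one pool per connection string, per API process)
- **Timeout:** up to 30 seconds to establish a connection or wait for a free pooled one
- **CommandTimeout:** up to 30 seconds for a query to execute before it is abandoned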
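
As a rough illustration of how these settings fit together, the following Python sketch splits a `key=value;` connection string into a dictionary and runs two sanity checks on the pooling values. The helper names (`parse_connection_string`, `check_pool_settings`) are made up for this example, and the parser is deliberately naive: it does not handle quoted values or alternative key spellings that a real driver may accept.

```python
from typing import Dict, List


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value;' into a dict (no quoting support)."""
    settings: Dict[str, str] = {}
    for part in conn_str.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after a trailing ';'
        key, _, value = part.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_pool_settings(settings: Dict[str, str]) -> List[str]:
    """Return human-readable problems found in the pooling values."""
    problems: List[str] = []
    min_pool = int(settings["MinPoolSize"])
    max_pool = int(settings["MaxPoolSize"])
    if max_pool <= 0:
        problems.append("MaxPoolSize must be positive")
    if min_pool > max_pool:
        problems.append("MinPoolSize exceeds MaxPoolSize")
    return problems


# The example string from this section, joined onto fewer lines.
EXAMPLE = (
    "Host=localhost;Database=minimediametadata;Username=postgres;"
    "Password=password;MinPoolSize=5;MaxPoolSize=100;Timeout=30;CommandTimeout=30;"
)

settings = parse_connection_string(EXAMPLE)
print(settings["MaxPoolSize"], check_pool_settings(settings))
```

A check like this could run at startup so that a mistyped pool size fails fast instead of surfacing later as pool exhaustion.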
|
||||
|
||||
**Connection Lifecycle:**
|
||||
- Connections created per repository method call
|
||||
- Returned to pool after query completion
|
||||
- No long-lived connections
|
||||
- No transaction management (read-only)
|
||||
|
||||
## Fuzzy Search Implementation
|
||||
|
||||
### pg_trgm Extension
|
||||
|
||||
**Purpose:** Trigram-based similarity search for fuzzy text matching
|
||||
|
||||
**Configuration:**
|
||||
```sql
|
||||
SET LOCAL pg_trgm.similarity_threshold = 0.5;
|
||||
```
|
||||
|
||||
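**Threshold:** 0.5 (a name must score at least 0.5 similarity to match; the pg_trgm default is 0.3). `SET LOCAL` lasts only until the end of the current transaction, and PostgreSQL emits a warning and ignores it when issued outside a transaction block, so the statement has to run in the same transaction as the search query.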
|
||||
|
||||
**Operators:**
|
||||
- `%` - Similarity operator (returns true if similarity >= threshold)
|
||||
- `similarity(text, text)` - Returns similarity score (0.0 to 1.0)
|
||||
|
||||
### Search Query Pattern
|
||||
|
||||
**Example (Artist Search):**
|
||||
```sql
|
||||
SET LOCAL pg_trgm.similarity_threshold = 0.5;
|
||||
|
||||
SELECT
|
||||
id,
|
||||
name,
|
||||
popularity,
|
||||
external_url,
|
||||
followers,
|
||||
genres,
|
||||
last_sync_time
|
||||
FROM spotify_artist
|
||||
WHERE lower(name) % lower(@searchTerm)
|
||||
ORDER BY similarity(lower(name), lower(@searchTerm)) DESC
|
||||
LIMIT 20 OFFSET @offset;
|
||||
```
|
||||
|
||||
**Key Features:**
|
||||
- Case-insensitive matching (`lower()`)
|
||||
- Similarity-based ordering (best matches first)
|
||||
- Pagination support (LIMIT/OFFSET)
|
||||
- Threshold filtering (only >= 50% similarity)
|
||||
|
||||
**Performance:**
|
||||
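- Requires a GIN or GiST trigram index on the searched expression (`lower(name)`, matching the `WHERE` clause)
- Index creation: `CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);`
- Query time: with the index, only rows sharing trigrams with the search term are rechecked, so cost grows with the number of candidate rows rather than the table size; without it, every row is scanned (O(n))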
|
||||
|
||||
### Similarity Scoring
|
||||
|
||||
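**Algorithm:** Trigram overlap. pg_trgm lowercases the text, pads each word with two leading spaces and one trailing space, extracts every three-character sequence, and scores two strings as (shared trigrams) / (distinct trigrams across both strings).

**Example:**
```
"Beatles" vs "Beetles"
Trigrams: ["  b", " be", "bea", "eat", "atl", "tle", "les", "es "] vs ["  b", " be", "bee", "eet", "etl", "tle", "les", "es "]
Shared: ["  b", " be", "tle", "les", "es "] = 5; distinct across both = 11; similarity = 5/11 ≈ 0.45 (below threshold)

"Beatles" vs "The Beatles"
Trigrams: ["  b", " be", "bea", "eat", "atl", "tle", "les", "es "] vs ["  t", " th", "the", "he ", "  b", " be", "bea", "eat", "atl", "tle", "les", "es "]
Shared: all 8 trigrams of "beatles"; distinct across both = 12; similarity = 8/12 ≈ 0.67 (above threshold)
```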
|
||||
|
||||
**Tuning:**
|
||||
- Lower threshold (0.3) = more results, more false positives
|
||||
- Higher threshold (0.7) = fewer results, more precision
|
||||
- Current 0.5 = balanced approach
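
To make the scoring and the threshold trade-off concrete, here is a small Python approximation of pg_trgm's similarity. It is a sketch, not the extension's implementation: it assumes the documented behaviour (lowercasing, splitting on non-alphanumeric characters, padding each word with two leading spaces and one trailing space) and ignores locale-specific details. The function names are chosen for this example only.

```python
import re
from typing import Set


def trigrams(text: str) -> Set[str]:
    """Approximate pg_trgm's trigram set for a string."""
    result: Set[str] = set()
    # Words are runs of letters/digits; everything else separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by distinct trigrams across both strings."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0  # neither string contains any word characters
    return len(ta & tb) / len(union)


def matches(a: str, b: str, threshold: float = 0.5) -> bool:
    """Mimic the `%` operator for a given similarity threshold."""
    return similarity(a, b) >= threshold


for candidate in ("Beetles", "The Beatles"):
    score = similarity("Beatles", candidate)
    print(f"{candidate!r}: {score:.2f} match@0.5={matches('Beatles', candidate)} "
          f"match@0.3={matches('Beatles', candidate, 0.3)}")
```

Running candidate names through a helper like this is a cheap way to see which spellings a given threshold would admit before changing the database setting.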
|
||||
|
||||
## Database Schema
|
||||
|
||||
### Provider-Specific Tables
|
||||
|
||||
Each provider has isolated table structure. No cross-provider foreign keys.
|
||||
|
||||
### Spotify Schema
|
||||
|
||||
**Tables:**
|
||||
1. `spotify_artist` - Artist metadata
|
||||
2. `spotify_artist_image` - Artist images (1:N)
|
||||
3. `spotify_album` - Album metadata
|
||||
4. `spotify_album_artist` - Album-artist relationships (M:N)
|
||||
5. `spotify_album_image` - Album artwork (1:N)
|
||||
6. `spotify_album_externalid` - External identifiers (UPC, EAN) (1:N)
|
||||
7. `spotify_track` - Track metadata
|
||||
8. `spotify_track_artist` - Track-artist relationships (M:N)
|
||||
9. `spotify_track_externalid` - External identifiers (ISRC) (1:N)
|
||||
|
||||
**spotify_artist:**
|
||||
```sql
|
||||
CREATE TABLE spotify_artist (
|
||||
id VARCHAR(255) PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
popularity INTEGER,
|
||||
external_url VARCHAR(500),
|
||||
followers INTEGER,
|
||||
genres TEXT[], -- PostgreSQL array
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
**spotify_artist_image:**
|
||||
```sql
|
||||
CREATE TABLE spotify_artist_image (
|
||||
id SERIAL PRIMARY KEY,
|
||||
artist_id VARCHAR(255) REFERENCES spotify_artist(id),
|
||||
url VARCHAR(1000) NOT NULL,
|
||||
height INTEGER,
|
||||
width INTEGER
|
||||
);
|
||||
|
||||
CREATE INDEX idx_spotify_artist_image_artist ON spotify_artist_image(artist_id);
|
||||
```
|
||||
|
||||
**spotify_album:**
|
||||
```sql
|
||||
CREATE TABLE spotify_album (
|
||||
id VARCHAR(255) PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
popularity INTEGER,
|
||||
external_url VARCHAR(500),
|
||||
label VARCHAR(500),
|
||||
release_date VARCHAR(50), -- Stored as string (YYYY, YYYY-MM, or YYYY-MM-DD)
|
||||
total_tracks INTEGER,
|
||||
album_type VARCHAR(50), -- album, single, compilation
|
||||
copyright TEXT,
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_spotify_album_name_trgm ON spotify_album USING gin(lower(name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
**spotify_album_artist (junction table):**
|
||||
```sql
|
||||
CREATE TABLE spotify_album_artist (
|
||||
id SERIAL PRIMARY KEY,
|
||||
album_id VARCHAR(255) REFERENCES spotify_album(id),
|
||||
artist_id VARCHAR(255) REFERENCES spotify_artist(id)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_spotify_album_artist_album ON spotify_album_artist(album_id);
|
||||
CREATE INDEX idx_spotify_album_artist_artist ON spotify_album_artist(artist_id);
|
||||
```
|
||||
|
||||
**spotify_track:**
|
||||
```sql
|
||||
CREATE TABLE spotify_track (
|
||||
id VARCHAR(255) PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
album_id VARCHAR(255) REFERENCES spotify_album(id),
|
||||
popularity INTEGER,
|
||||
external_url VARCHAR(500),
|
||||
duration_ms INTEGER,
|
||||
explicit BOOLEAN,
|
||||
disc_number INTEGER,
|
||||
track_number INTEGER,
|
||||
label VARCHAR(500),
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_spotify_track_name_trgm ON spotify_track USING gin(lower(name) gin_trgm_ops);
|
||||
CREATE INDEX idx_spotify_track_album ON spotify_track(album_id);
|
||||
```
|
||||
|
||||
**spotify_album_externalid:**
|
||||
```sql
|
||||
CREATE TABLE spotify_album_externalid (
|
||||
id SERIAL PRIMARY KEY,
|
||||
album_id VARCHAR(255) REFERENCES spotify_album(id),
|
||||
type VARCHAR(50), -- upc, ean
|
||||
value VARCHAR(255)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_spotify_album_externalid_album ON spotify_album_externalid(album_id);
|
||||
CREATE INDEX idx_spotify_album_externalid_value ON spotify_album_externalid(value);
|
||||
```
|
||||
|
||||
**spotify_track_externalid:**
|
||||
```sql
|
||||
CREATE TABLE spotify_track_externalid (
|
||||
id SERIAL PRIMARY KEY,
|
||||
track_id VARCHAR(255) REFERENCES spotify_track(id),
|
||||
type VARCHAR(50), -- isrc
|
||||
value VARCHAR(255)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_spotify_track_externalid_track ON spotify_track_externalid(track_id);
|
||||
CREATE INDEX idx_spotify_track_externalid_value ON spotify_track_externalid(value);
|
||||
```
|
||||
|
||||
### Tidal Schema
|
||||
|
||||
**Tables:**
|
||||
1. `tidal_artist` - Artist metadata
|
||||
2. `tidal_artist_image_link` - Artist image URLs (1:N)
|
||||
3. `tidal_album` - Album metadata
|
||||
4. `tidal_album_external_link` - External URLs (1:N)
|
||||
5. `tidal_album_image` - Album artwork (1:N)
|
||||
6. `tidal_track` - Track metadata
|
||||
7. `tidal_track_artist` - Track-artist relationships (M:N)
|
||||
8. `tidal_track_external_link` - External URLs (1:N)
|
||||
|
||||
**Key Differences from Spotify:**
|
||||
- ID type: INTEGER instead of VARCHAR
|
||||
- No popularity field
|
||||
- No genres field
|
||||
- External links instead of external IDs
|
||||
- Image links stored as separate table
|
||||
|
||||
**tidal_artist:**
|
||||
```sql
|
||||
CREATE TABLE tidal_artist (
|
||||
id INTEGER PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
url VARCHAR(500),
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_tidal_artist_name_trgm ON tidal_artist USING gin(lower(name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
**tidal_album:**
|
||||
```sql
|
||||
CREATE TABLE tidal_album (
|
||||
id INTEGER PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
artist_id INTEGER REFERENCES tidal_artist(id),
|
||||
url VARCHAR(500),
|
||||
release_date VARCHAR(50),
|
||||
total_tracks INTEGER,
|
||||
duration INTEGER, -- Total duration in seconds
|
||||
explicit BOOLEAN,
|
||||
upc VARCHAR(255),
|
||||
copyright TEXT,
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_tidal_album_name_trgm ON tidal_album USING gin(lower(name) gin_trgm_ops);
|
||||
CREATE INDEX idx_tidal_album_artist ON tidal_album(artist_id);
|
||||
```
|
||||
|
||||
### MusicBrainz Schema
|
||||
|
||||
**Tables:**
|
||||
1. `musicbrainz_artist` - Artist metadata
|
||||
2. `musicbrainz_release` - Release (album) metadata
|
||||
3. `musicbrainz_release_label` - Release-label relationships (M:N)
|
||||
4. `musicbrainz_label` - Label metadata
|
||||
5. `musicbrainz_release_track` - Track metadata
|
||||
6. `musicbrainz_release_track_artist` - Track-artist relationships (M:N)
|
||||
|
||||
**Key Differences:**
|
||||
- ID type: UUID (Guid)
|
||||
- "Release" instead of "Album"
|
||||
- Sort name field for artists
|
||||
- Label as separate entity
|
||||
- No popularity or follower counts
|
||||
- No images (stored externally via Cover Art Archive)
|
||||
|
||||
**musicbrainz_artist:**
|
||||
```sql
|
||||
CREATE TABLE musicbrainz_artist (
|
||||
id UUID PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
sort_name VARCHAR(500), -- For alphabetical sorting (e.g., "Beatles, The")
|
||||
type VARCHAR(100), -- Person, Group, Orchestra, etc.
|
||||
country VARCHAR(2), -- ISO country code
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_musicbrainz_artist_name_trgm ON musicbrainz_artist USING gin(lower(name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
**musicbrainz_release:**
|
||||
```sql
|
||||
CREATE TABLE musicbrainz_release (
|
||||
id UUID PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
artist_id UUID REFERENCES musicbrainz_artist(id),
|
||||
release_date VARCHAR(50),
|
||||
country VARCHAR(2),
|
||||
barcode VARCHAR(255), -- Similar to UPC
|
||||
status VARCHAR(100), -- Official, Promotion, Bootleg, etc.
|
||||
packaging VARCHAR(100), -- Jewel Case, Digipak, etc.
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_musicbrainz_release_name_trgm ON musicbrainz_release USING gin(lower(name) gin_trgm_ops);
|
||||
CREATE INDEX idx_musicbrainz_release_artist ON musicbrainz_release(artist_id);
|
||||
```
|
||||
|
||||
**musicbrainz_label:**
|
||||
```sql
|
||||
CREATE TABLE musicbrainz_label (
|
||||
id UUID PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
type VARCHAR(100), -- Original Production, Bootleg Production, etc.
|
||||
country VARCHAR(2),
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
```
|
||||
|
||||
**musicbrainz_release_label (junction table):**
|
||||
```sql
|
||||
CREATE TABLE musicbrainz_release_label (
|
||||
id SERIAL PRIMARY KEY,
|
||||
release_id UUID REFERENCES musicbrainz_release(id),
|
||||
label_id UUID REFERENCES musicbrainz_label(id),
|
||||
catalog_number VARCHAR(255)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_musicbrainz_release_label_release ON musicbrainz_release_label(release_id);
|
||||
CREATE INDEX idx_musicbrainz_release_label_label ON musicbrainz_release_label(label_id);
|
||||
```
|
||||
|
||||
### Deezer Schema
|
||||
|
||||
**Tables:**
|
||||
1. `deezer_artist` - Artist metadata
|
||||
2. `deezer_artist_image_link` - Artist image URLs (1:N)
|
||||
3. `deezer_album` - Album metadata
|
||||
4. `deezer_album_image_link` - Album artwork URLs (1:N)
|
||||
5. `deezer_album_artist` - Album-artist relationships (M:N)
|
||||
6. `deezer_track` - Track metadata
|
||||
7. `deezer_track_artist` - Track-artist relationships (M:N)
|
||||
|
||||
**Key Differences:**
|
||||
- ID type: BIGINT
|
||||
- Has popularity (called "fans")
|
||||
- Has genres
|
||||
- No UPC/ISRC fields
|
||||
- No label information
|
||||
|
||||
**deezer_artist:**
|
||||
```sql
|
||||
CREATE TABLE deezer_artist (
|
||||
id BIGINT PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
url VARCHAR(500),
|
||||
fans INTEGER, -- Similar to followers
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_deezer_artist_name_trgm ON deezer_artist USING gin(lower(name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
**deezer_album:**
|
||||
```sql
|
||||
CREATE TABLE deezer_album (
|
||||
id BIGINT PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
url VARCHAR(500),
|
||||
release_date VARCHAR(50),
|
||||
total_tracks INTEGER,
|
||||
duration INTEGER, -- Total duration in seconds
|
||||
explicit BOOLEAN,
|
||||
fans INTEGER,
|
||||
genres TEXT[], -- PostgreSQL array
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_deezer_album_name_trgm ON deezer_album USING gin(lower(name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
### Discogs Schema
|
||||
|
||||
**Tables:**
|
||||
1. `discogs_artist` - Artist metadata
|
||||
2. `discogs_artist_alias` - Artist aliases (1:N)
|
||||
3. `discogs_artist_url` - Artist URLs (1:N)
|
||||
4. `discogs_release` - Release metadata
|
||||
5. `discogs_release_artist` - Release-artist relationships (M:N)
|
||||
6. `discogs_release_identifier` - Barcodes/identifiers (1:N)
|
||||
7. `discogs_release_track` - Track metadata
|
||||
8. `discogs_label` - Label metadata
|
||||
9. `discogs_label_sublabel` - Label hierarchy (1:N)
|
||||
10. `discogs_label_url` - Label URLs (1:N)
|
||||
|
||||
**Key Differences:**
|
||||
- ID type: INTEGER
|
||||
- Most comprehensive label data
|
||||
- Artist aliases tracked
|
||||
- Multiple identifiers per release (Barcode, Matrix, etc.)
|
||||
- No popularity metrics
|
||||
- No image URLs (stored externally)
|
||||
|
||||
**discogs_artist:**
|
||||
```sql
|
||||
CREATE TABLE discogs_artist (
|
||||
id INTEGER PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
real_name VARCHAR(500), -- For pseudonyms
|
||||
profile TEXT, -- Biography
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_discogs_artist_name_trgm ON discogs_artist USING gin(lower(name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
**discogs_artist_alias:**
|
||||
```sql
|
||||
CREATE TABLE discogs_artist_alias (
|
||||
id SERIAL PRIMARY KEY,
|
||||
artist_id INTEGER REFERENCES discogs_artist(id),
|
||||
alias_name VARCHAR(500)
|
||||
);
|
||||
|
||||
CREATE INDEX idx_discogs_artist_alias_artist ON discogs_artist_alias(artist_id);
|
||||
CREATE INDEX idx_discogs_artist_alias_name_trgm ON discogs_artist_alias USING gin(lower(alias_name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
**discogs_release:**
|
||||
```sql
|
||||
CREATE TABLE discogs_release (
|
||||
id INTEGER PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
released VARCHAR(50),
|
||||
country VARCHAR(100),
|
||||
notes TEXT,
|
||||
genres TEXT[],
|
||||
styles TEXT[], -- More specific than genres
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_discogs_release_name_trgm ON discogs_release USING gin(lower(name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
**discogs_release_identifier:**
|
||||
```sql
|
||||
CREATE TABLE discogs_release_identifier (
|
||||
id SERIAL PRIMARY KEY,
|
||||
release_id INTEGER REFERENCES discogs_release(id),
|
||||
type VARCHAR(100), -- Barcode, Matrix/Runout, Label Code, etc.
|
||||
value VARCHAR(500),
|
||||
description TEXT
|
||||
);
|
||||
|
||||
CREATE INDEX idx_discogs_release_identifier_release ON discogs_release_identifier(release_id);
|
||||
CREATE INDEX idx_discogs_release_identifier_value ON discogs_release_identifier(value);
|
||||
```
|
||||
|
||||
**discogs_label:**
|
||||
```sql
|
||||
CREATE TABLE discogs_label (
|
||||
id INTEGER PRIMARY KEY,
|
||||
name VARCHAR(500) NOT NULL,
|
||||
contact_info TEXT,
|
||||
profile TEXT,
|
||||
parent_label_id INTEGER REFERENCES discogs_label(id),
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_discogs_label_name_trgm ON discogs_label USING gin(lower(name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
### SoundCloud Schema
|
||||
|
||||
**Tables:**
|
||||
1. `soundcloud_user` - User/artist metadata
|
||||
2. `soundcloud_playlist` - Playlist metadata
|
||||
3. `soundcloud_track` - Track metadata
|
||||
4. `soundcloud_track_artist` - Track-artist relationships (M:N)
|
||||
|
||||
**Key Differences:**
|
||||
- "User" instead of "Artist" (user-generated content platform)
|
||||
- Playlist as first-class entity
|
||||
- No album concept
|
||||
- Minimal metadata (no UPC, ISRC, labels)
|
||||
- ID type: BIGINT
|
||||
|
||||
**soundcloud_user:**
|
||||
```sql
|
||||
CREATE TABLE soundcloud_user (
|
||||
id BIGINT PRIMARY KEY,
|
||||
username VARCHAR(500) NOT NULL,
|
||||
full_name VARCHAR(500),
|
||||
url VARCHAR(500),
|
||||
avatar_url VARCHAR(1000),
|
||||
followers_count INTEGER,
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_soundcloud_user_username_trgm ON soundcloud_user USING gin(lower(username) gin_trgm_ops);
|
||||
```
|
||||
|
||||
**soundcloud_playlist:**
|
||||
```sql
|
||||
CREATE TABLE soundcloud_playlist (
|
||||
id BIGINT PRIMARY KEY,
|
||||
title VARCHAR(500) NOT NULL,
|
||||
user_id BIGINT REFERENCES soundcloud_user(id),
|
||||
url VARCHAR(500),
|
||||
artwork_url VARCHAR(1000),
|
||||
duration INTEGER, -- Total duration in milliseconds
|
||||
track_count INTEGER,
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_soundcloud_playlist_title_trgm ON soundcloud_playlist USING gin(lower(title) gin_trgm_ops);
|
||||
CREATE INDEX idx_soundcloud_playlist_user ON soundcloud_playlist(user_id);
|
||||
```
|
||||
|
||||
**soundcloud_track:**
|
||||
```sql
|
||||
CREATE TABLE soundcloud_track (
|
||||
id BIGINT PRIMARY KEY,
|
||||
title VARCHAR(500) NOT NULL,
|
||||
user_id BIGINT REFERENCES soundcloud_user(id),
|
||||
url VARCHAR(500),
|
||||
artwork_url VARCHAR(1000),
|
||||
duration INTEGER, -- Duration in milliseconds
|
||||
genre VARCHAR(255),
|
||||
playback_count INTEGER,
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX idx_soundcloud_track_title_trgm ON soundcloud_track USING gin(lower(title) gin_trgm_ops);
|
||||
CREATE INDEX idx_soundcloud_track_user ON soundcloud_track(user_id);
|
||||
```
|
||||
|
||||
## ID Type Comparison
|
||||
|
||||
| Provider | Artist ID | Album ID | Track ID | Notes |
|----------|-----------|----------|----------|-------|
| Spotify | VARCHAR(255) | VARCHAR(255) | VARCHAR(255) | Base62 encoded (22 chars) |
| Tidal | INTEGER | INTEGER | INTEGER | Sequential integers |
| MusicBrainz | UUID | UUID | UUID | RFC 4122 UUIDs |
| Deezer | BIGINT | BIGINT | BIGINT | Large integers |
| Discogs | INTEGER | INTEGER | INTEGER | Sequential integers |
| SoundCloud | BIGINT | N/A | BIGINT | No album concept |
|
||||
|
||||
**Implications:**
|
||||
- Cross-provider lookups by native ID are impossible (the ID types and value spaces do not overlap)
|
||||
- ID parameter must match provider type
|
||||
- C# models use provider-specific types
|
||||
- No universal identifier system
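
The "ID parameter must match provider type" rule can be sketched as a small validation step. The Python below is illustrative only: the `ID_KINDS` mapping mirrors the table above, while the function name, the error messages, and the assumption that numeric IDs are positive are inventions of this example rather than anything taken from the API's code.

```python
import uuid
from typing import Union

INT32_MAX = 2**31 - 1  # upper bound of PostgreSQL INTEGER
INT64_MAX = 2**63 - 1  # upper bound of PostgreSQL BIGINT

# Column type of the primary key for each provider, as listed in the table above.
ID_KINDS = {
    "spotify": "varchar",
    "tidal": "integer",
    "musicbrainz": "uuid",
    "deezer": "bigint",
    "discogs": "integer",
    "soundcloud": "bigint",
}


def parse_provider_id(provider: str, raw: str) -> Union[str, int, uuid.UUID]:
    """Convert a raw ID string to the type the provider's tables expect."""
    kind = ID_KINDS.get(provider.lower())
    if kind is None:
        raise ValueError(f"unknown provider: {provider}")
    if kind == "varchar":
        if not raw or len(raw) > 255:
            raise ValueError("ID must be 1-255 characters")
        return raw
    if kind == "uuid":
        return uuid.UUID(raw)  # raises ValueError on malformed input
    value = int(raw)  # raises ValueError on non-numeric input
    limit = INT32_MAX if kind == "integer" else INT64_MAX
    if not 0 < value <= limit:  # assumption: IDs are positive
        raise ValueError(f"ID out of range for {kind}")
    return value


print(parse_provider_id("tidal", "12345"))
print(parse_provider_id("musicbrainz", "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"))
```

Rejecting a malformed ID before it reaches the database turns a driver-level type error into a clear client error.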
|
||||
|
||||
## Data Type Patterns
|
||||
|
||||
### Arrays (PostgreSQL Native)
|
||||
|
||||
**Usage:** Genres and styles (external IDs are stored in separate tables in the schemas above, not in arrays)
|
||||
|
||||
**Example:**
|
||||
```sql
|
||||
genres TEXT[]  -- e.g. '{rock,pop,alternative}'
|
||||
```
|
||||
|
||||
**Dapper Mapping:**
|
||||
```csharp
public class SpotifyArtist
{
    // Npgsql returns a TEXT[] column as string[]; Dapper assigns it to the property as-is
    public string[] Genres { get; set; }
}
```
|
||||
|
||||
### Timestamps
|
||||
|
||||
**Type:** `TIMESTAMP WITH TIME ZONE`
|
||||
**Purpose:** Track last sync time from provider
|
||||
|
||||
**Example:**
|
||||
```sql
|
||||
last_sync_time TIMESTAMP WITH TIME ZONE DEFAULT NOW()
|
||||
```
|
||||
|
||||
**C# Mapping:**
|
||||
```csharp
|
||||
public DateTime? LastSyncTime { get; set; }
|
||||
```
|
||||
|
||||
### Variable-Length Dates
|
||||
|
||||
**Type:** VARCHAR(50)
|
||||
**Formats:** YYYY, YYYY-MM, YYYY-MM-DD
|
||||
|
||||
**Rationale:** Providers return different precision levels
|
||||
|
||||
**Examples:**
|
||||
- `"1969"` - Year only
|
||||
- `"1969-09"` - Year and month
|
||||
- `"1969-09-26"` - Full date
|
||||
|
||||
**C# Mapping:**
|
||||
```csharp
|
||||
public string ReleaseDate { get; set; } // Stored as string, parsed in application
|
||||
```
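
Since the date is "parsed in application", a parser has to cope with all three precisions. The Python sketch below shows one way to do that; the function name and the choice to fill missing month/day with 1 while returning the precision separately are assumptions made for illustration, not the API's actual behaviour.

```python
import re
from datetime import date
from typing import Tuple

# YYYY, YYYY-MM, or YYYY-MM-DD; the day is only allowed when a month is present.
_RELEASE_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_release_date(raw: str) -> Tuple[date, str]:
    """Return (date, precision) where precision is 'year', 'month', or 'day'."""
    match = _RELEASE_DATE.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"unsupported release date format: {raw!r}")
    year, month, day = match.groups()
    precision = "day" if day else "month" if month else "year"
    # Missing parts default to 1; date() rejects impossible values such as month 13.
    return date(int(year), int(month or 1), int(day or 1)), precision


for value in ("1969", "1969-09", "1969-09-26"):
    print(value, parse_release_date(value))
```

Keeping the precision alongside the date avoids presenting `"1969"` to clients as if it meant 1 January 1969.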
|
||||
|
||||
## Query Patterns
|
||||
|
||||
### Artist Search
|
||||
|
||||
```sql
|
||||
SET LOCAL pg_trgm.similarity_threshold = 0.5;
|
||||
|
||||
SELECT
|
||||
a.id,
|
||||
a.name,
|
||||
a.popularity,
|
||||
a.external_url,
|
||||
a.followers,
|
||||
a.genres,
|
||||
a.last_sync_time,
|
||||
i.url AS image_url,
|
||||
i.height AS image_height,
|
||||
i.width AS image_width
|
||||
FROM spotify_artist a
|
||||
LEFT JOIN spotify_artist_image i ON a.id = i.artist_id
|
||||
WHERE lower(a.name) % lower(@searchTerm)
|
||||
ORDER BY similarity(lower(a.name), lower(@searchTerm)) DESC
|
||||
LIMIT 20 OFFSET @offset;
|
||||
```

**Note:** `LIMIT 20` is applied after the `LEFT JOIN`, so it caps joined rows rather than artists: an artist with three images consumes three of the 20 rows, and a page can contain fewer than 20 distinct artists.
|
||||
|
||||
**Dapper Mapping:**
|
||||
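```csharp
// Column names must line up with property names. For snake_case columns this
// presumably relies on Dapper.DefaultTypeMap.MatchNamesWithUnderscores = true
// (an assumption; the setting is not shown in the source).
var artistDict = new Dictionary<string, SpotifyArtist>();

// One row arrives per (artist, image) pair; columns from "image_url" onward
// are mapped to the second object.
var results = await connection.QueryAsync<SpotifyArtist, SpotifyArtistImage, SpotifyArtist>(
    sql,
    (artist, image) =>
    {
        // Reuse the instance already seen for this ID so images accumulate on it.
        if (!artistDict.TryGetValue(artist.Id, out var existingArtist))
        {
            existingArtist = artist;
            existingArtist.Images = new List<SpotifyArtistImage>();
            artistDict.Add(artist.Id, existingArtist);
        }

        // LEFT JOIN: image is null when the artist has no images.
        if (image != null)
        {
            existingArtist.Images.Add(image);
        }

        return existingArtist;
    },
    new { searchTerm, offset },
    splitOn: "image_url"
);

// results repeats an artist once per image row; the dictionary holds each artist once.
return artistDict.Values.ToList();
```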
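
The grouping step in the mapping above is independent of Dapper, so it can be shown in isolation. The following Python sketch collapses flattened join rows into artists with nested images; the `Artist` and `Image` classes and the sample rows are hypothetical stand-ins for the C# models and real query output.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Image:
    url: str
    height: Optional[int] = None
    width: Optional[int] = None


@dataclass
class Artist:
    id: str
    name: str
    images: List[Image] = field(default_factory=list)


def group_artist_rows(rows: List[dict]) -> List[Artist]:
    """Collapse one-row-per-(artist, image) results into one Artist per ID."""
    artists: Dict[str, Artist] = {}
    for row in rows:
        artist = artists.get(row["id"])
        if artist is None:
            artist = Artist(id=row["id"], name=row["name"])
            artists[row["id"]] = artist
        # A LEFT JOIN yields NULL image columns for artists without images.
        if row.get("image_url") is not None:
            artist.images.append(
                Image(row["image_url"], row.get("image_height"), row.get("image_width"))
            )
    # Dicts preserve insertion order, so the query's ordering is kept.
    return list(artists.values())


# Made-up rows in the shape the artist search query returns.
rows = [
    {"id": "a1", "name": "Artist One", "image_url": "https://example.com/1-large.jpg",
     "image_height": 640, "image_width": 640},
    {"id": "a1", "name": "Artist One", "image_url": "https://example.com/1-small.jpg",
     "image_height": 64, "image_width": 64},
    {"id": "a2", "name": "Artist Two", "image_url": None,
     "image_height": None, "image_width": None},
]

artists = group_artist_rows(rows)
print(len(rows), "rows ->", len(artists), "artists")
```

Three joined rows collapse to two artists here, which is why row counts and artist counts should not be treated as interchangeable when paginating.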
|
||||
|
||||
### Album with Artists
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
a.id,
|
||||
a.name,
|
||||
a.popularity,
|
||||
a.external_url,
|
||||
a.label,
|
||||
a.release_date,
|
||||
a.total_tracks,
|
||||
a.album_type,
|
||||
a.copyright,
|
||||
a.last_sync_time,
|
||||
ar.id AS artist_id,
|
||||
ar.name AS artist_name
|
||||
FROM spotify_album a
|
||||
LEFT JOIN spotify_album_artist aa ON a.id = aa.album_id
|
||||
LEFT JOIN spotify_artist ar ON aa.artist_id = ar.id
|
||||
WHERE a.id = @albumId;
|
||||
```
|
||||
|
||||
**Multi-Mapping:** Album with nested artist list.
|
||||
|
||||
### Track with Album and Artists
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
t.id,
|
||||
t.name,
|
||||
t.popularity,
|
||||
t.external_url,
|
||||
t.duration_ms,
|
||||
t.explicit,
|
||||
t.disc_number,
|
||||
t.track_number,
|
||||
t.label,
|
||||
t.last_sync_time,
|
||||
a.id AS album_id,
|
||||
a.name AS album_name,
|
||||
a.release_date AS album_release_date,
|
||||
ar.id AS artist_id,
|
||||
ar.name AS artist_name
|
||||
FROM spotify_track t
|
||||
LEFT JOIN spotify_album a ON t.album_id = a.id
|
||||
LEFT JOIN spotify_track_artist ta ON t.id = ta.track_id
|
||||
LEFT JOIN spotify_artist ar ON ta.artist_id = ar.id
|
||||
WHERE t.id = @trackId;
|
||||
```
|
||||
|
||||
**Multi-Mapping:** Track with nested album and artist list.
|
||||
|
||||
### External ID Lookup
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
a.id,
|
||||
a.name,
|
||||
a.popularity,
|
||||
a.external_url,
|
||||
a.label,
|
||||
a.release_date,
|
||||
a.total_tracks,
|
||||
a.album_type,
|
||||
a.last_sync_time
|
||||
FROM spotify_album a
|
||||
INNER JOIN spotify_album_externalid e ON a.id = e.album_id
|
||||
WHERE e.type = 'upc' AND e.value = @upc;
|
||||
```
|
||||
|
||||
**Use Case:** Find album by UPC barcode.
|
||||
|
||||
## Index Strategy
|
||||
|
||||
### Required Indexes
|
||||
|
||||
**Fuzzy Search (GIN trigram):**
|
||||
```sql
|
||||
CREATE INDEX idx_spotify_artist_name_trgm ON spotify_artist USING gin(lower(name) gin_trgm_ops);
|
||||
CREATE INDEX idx_spotify_album_name_trgm ON spotify_album USING gin(lower(name) gin_trgm_ops);
|
||||
CREATE INDEX idx_spotify_track_name_trgm ON spotify_track USING gin(lower(name) gin_trgm_ops);
|
||||
```
|
||||
|
||||
**Foreign Keys:**
|
||||
```sql
|
||||
CREATE INDEX idx_spotify_album_artist_album ON spotify_album_artist(album_id);
|
||||
CREATE INDEX idx_spotify_album_artist_artist ON spotify_album_artist(artist_id);
|
||||
CREATE INDEX idx_spotify_track_album ON spotify_track(album_id);
|
||||
CREATE INDEX idx_spotify_track_artist_track ON spotify_track_artist(track_id);
|
||||
CREATE INDEX idx_spotify_track_artist_artist ON spotify_track_artist(artist_id);
|
||||
```
|
||||
|
||||
**External IDs:**
|
||||
```sql
|
||||
CREATE INDEX idx_spotify_album_externalid_value ON spotify_album_externalid(value);
|
||||
CREATE INDEX idx_spotify_track_externalid_value ON spotify_track_externalid(value);
|
||||
```
|
||||
|
||||
### Index Maintenance
|
||||
|
||||
**Owned by:** MiniMediaScanner (schema owner)
|
||||
**API Responsibility:** None (read-only consumer)
|
||||
|
||||
**Performance Impact:**
|
||||
- GIN indexes: Large (2-3x table size), slow writes, fast reads
|
||||
- B-tree indexes: Moderate size, fast writes, fast reads
|
||||
- No index = full table scan (unacceptable for fuzzy search)
|
||||
|
||||
## Data Freshness
|
||||
|
||||
**Sync Mechanism:** MiniMediaScanner polls provider APIs
|
||||
|
||||
**Sync Frequency:** Unknown (configured in MiniMediaScanner)
|
||||
|
||||
**Staleness Indicator:** `last_sync_time` column
|
||||
|
||||
**API Behavior:**
|
||||
- Returns whatever data exists in database
|
||||
- No real-time provider API calls
|
||||
- No cache invalidation
|
||||
- No sync triggering
|
||||
|
||||
**Client Considerations:**
|
||||
- Check `lastSyncTime` in response
|
||||
- Stale data possible (hours to days old)
|
||||
- No guarantee of completeness
|
||||
- Provider outages affect sync, not queries
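
A client-side staleness check might look like the sketch below. It assumes the client has already parsed `lastSyncTime` into a timezone-aware `datetime`; the `is_stale` helper and the 24-hour tolerance are illustrative choices, not part of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(last_sync_time: Optional[datetime],
             max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """True when the record was never synced or was synced longer ago than max_age."""
    if last_sync_time is None:
        return True  # never synced: treat as stale
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_sync_time > max_age


# Fixed timestamps so the example is reproducible.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
recent = datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc)
old = datetime(2024, 1, 7, 12, 0, tzinfo=timezone.utc)

print(is_stale(recent, timedelta(hours=24), now))
print(is_stale(old, timedelta(hours=24), now))
```

Because the API never triggers a sync itself, a stale result can only be flagged to the user or refreshed from the provider directly.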
|
||||
|
||||
## Provider Feature Matrix
|
||||
|
||||
| Feature | Spotify | Tidal | MusicBrainz | Deezer | Discogs | SoundCloud |
|---------|---------|-------|-------------|--------|---------|------------|
| **Artist Data** | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | ✗ |
| Followers | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | ✓ |
| Genres | ✓ | ✗ | ✓ | ✓ | ✗ | ✗ |
| Images | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ (avatar) |
| Sort Name | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ |
| Aliases | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
| **Album Data** | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ (fans) | ✗ | N/A |
| Images | ✓ | ✓ | ✗ | ✓ | ✗ | N/A |
| Label | ✓ | ✗ | ✓ | ✗ | ✓ | N/A |
| UPC | ✓ | ✓ | ✗ | ✗ | ✓ | N/A |
| Copyright | ✓ | ✓ | ✗ | ✗ | ✗ | N/A |
| Album Type | ✓ | ✗ | ✓ | ✗ | ✗ | N/A |
| **Track Data** | | | | | | |
| Popularity | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ (playback_count) |
| Duration | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Explicit | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| ISRC | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| Disc/Track # | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
|
||||
|
||||
## Database Size Estimates
|
||||
|
||||
**Assumptions:**
|
||||
- 1 million artists
|
||||
- 10 million albums
|
||||
- 100 million tracks
|
||||
|
||||
**Spotify Tables:**
|
||||
- `spotify_artist`: ~500 MB
|
||||
- `spotify_artist_image`: ~200 MB
|
||||
- `spotify_album`: ~5 GB
|
||||
- `spotify_album_artist`: ~1 GB
|
||||
- `spotify_album_image`: ~2 GB
|
||||
- `spotify_track`: ~50 GB
|
||||
- `spotify_track_artist`: ~10 GB
|
||||
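- **Total:** ~70 GB (the rows above sum to 68.7 GB), used below as the per-provider figure

**All Providers:** ~420 GB (6 providers × ~70 GB, assuming every provider is roughly the size of the Spotify tables)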
|
||||
|
||||
**Indexes:** ~200 GB (GIN indexes are large)
|
||||
|
||||
**Total Database:** ~620 GB for comprehensive catalog
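
The arithmetic behind these figures can be reproduced in a few lines of Python. All inputs are the document's own estimates; the only assumption added here is the one the estimate already makes implicitly, namely that each of the six providers is about as large as the Spotify tables.

```python
# Per-table estimates for Spotify, in GB, copied from the list above.
SPOTIFY_TABLES_GB = {
    "spotify_artist": 0.5,
    "spotify_artist_image": 0.2,
    "spotify_album": 5,
    "spotify_album_artist": 1,
    "spotify_album_image": 2,
    "spotify_track": 50,
    "spotify_track_artist": 10,
}
PROVIDER_COUNT = 6
INDEX_GB = 200  # the document's estimate for all indexes

summed_gb = sum(SPOTIFY_TABLES_GB.values())  # exact sum of the listed tables
per_provider_gb = 70                         # the document's rounded figure
tables_gb = per_provider_gb * PROVIDER_COUNT
total_gb = tables_gb + INDEX_GB

print(f"{summed_gb:.1f} GB summed, {tables_gb} GB tables, {total_gb} GB total")
```

Tracks dominate the total, so the estimate is most sensitive to the 100 million track assumption.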
|
||||
|
||||
**Implications:**
|
||||
- Requires substantial storage
|
||||
- Backup/restore time significant
|
||||
- Index rebuilds time-consuming
|
||||
- Connection pooling critical
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Query Performance
|
||||
|
||||
**Fuzzy Search:**
|
||||
- With GIN index: 10-50ms for 20 results
|
||||
- Without index: 5-30 seconds (full table scan)
|
||||
- Threshold tuning affects result count and speed
|
||||
|
||||
**ID Lookup:**
|
||||
- With primary key: <1ms
|
||||
- With foreign key index: 1-5ms
|
||||
|
||||
**Join Queries:**
|
||||
- Album with artists: 5-20ms
|
||||
- Track with album and artists: 10-30ms
|
||||
- Depends on relationship cardinality
|
||||
|
||||
### Optimization Strategies
|
||||
|
||||
**Implemented:**
|
||||
- GIN indexes for fuzzy search
|
||||
- B-tree indexes for foreign keys
|
||||
- Connection pooling
|
||||
- Parameterized queries (SQL injection prevention)
|
||||
|
||||
**Missing:**
|
||||
- Query result caching (Redis/Memcached)
|
||||
- Materialized views for complex joins
|
||||
- Partitioning for large tables
|
||||
- Read replicas for horizontal scaling
|
||||
|
||||
### Bottlenecks
|
||||
|
||||
1. **GIN Index Size:** Large memory footprint
|
||||
2. **Fuzzy Search:** CPU-intensive similarity calculations
|
||||
3. **Multi-Provider Queries:** 6 parallel database queries
|
||||
4. **No Caching:** Every request hits database
|
||||
5. **Connection Pool Limit:** 100 max connections per instance
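
Bottlenecks 3 and 5 interact: if every request fans out to all six providers and each query holds its own pooled connection, the pool limits how many requests can run at once. The Python sketch below models that with `asyncio`; the semaphore stands in for the connection pool, `search_provider` is a stub rather than a real query, and the one-connection-per-query assumption is this example's, not a confirmed detail of the API.

```python
import asyncio
from typing import Dict, List

PROVIDERS = ["spotify", "tidal", "musicbrainz", "deezer", "discogs", "soundcloud"]


async def search_provider(provider: str, term: str, pool: asyncio.Semaphore) -> List[str]:
    """Stub for one provider query; holding the semaphore models a borrowed connection."""
    async with pool:
        await asyncio.sleep(0)  # placeholder for the database round trip
        return []  # a real implementation would return matching rows


async def search_all(term: str, max_pool_size: int = 100) -> Dict[str, List[str]]:
    """Query every provider concurrently and collect results by provider name."""
    pool = asyncio.Semaphore(max_pool_size)
    results = await asyncio.gather(*(search_provider(p, term, pool) for p in PROVIDERS))
    return dict(zip(PROVIDERS, results))  # gather() preserves argument order


def max_concurrent_requests(max_pool_size: int,
                            queries_per_request: int = len(PROVIDERS)) -> int:
    """Fan-out requests that fit in the pool if each query holds one connection."""
    return max_pool_size // queries_per_request


results = asyncio.run(search_all("beatles"))
print(sorted(results), max_concurrent_requests(100))
```

Under these assumptions, `MaxPoolSize=100` leaves room for about 16 simultaneous multi-provider requests per instance before callers start waiting on the pool.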
|
||||
|
||||
## Data Integrity
|
||||
|
||||
**Constraints:**
|
||||
- Primary keys on all entity tables
|
||||
- Foreign keys for relationships
|
||||
- NOT NULL on critical fields (id, name)
|
||||
|
||||
**No Constraints:**
|
||||
- No unique constraints on names (duplicates allowed)
|
||||
- No check constraints on data ranges
|
||||
- No triggers for data validation
|
||||
|
||||
**Orphan Prevention:**
|
||||
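- Foreign keys with CASCADE delete (assumed; the DDL shown in this document declares no `ON DELETE` action, in which case PostgreSQL defaults to `NO ACTION`)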
|
||||
- Junction tables maintain referential integrity
|
||||
|
||||
**Data Quality:**
|
||||
- Depends entirely on MiniMediaScanner sync quality
|
||||
- No validation in this API
|
||||
- Garbage in, garbage out
|
||||
|
||||
## Backup and Recovery
|
||||
|
||||
**Responsibility:** Database administrator (not API)
|
||||
|
||||
**Recommended Strategy:**
|
||||
- Daily full backups
|
||||
- Continuous WAL archiving
|
||||
- Point-in-time recovery capability
|
||||
- Backup retention: 30 days
|
||||
|
||||
**Recovery Time:**
|
||||
- Full restore: Hours (620 GB database)
|
||||
- Index rebuild: Hours (GIN indexes)
|
||||
- Sync from providers: Days to weeks
|
||||
|
||||
## Schema Evolution
|
||||
|
||||
**Change Process:**
|
||||
1. MiniMediaScanner updates schema
|
||||
2. MiniMediaScanner deploys migration
|
||||
3. MiniMediaMetadataAPI updates models
|
||||
4. MiniMediaMetadataAPI redeploys
|
||||
|
||||
**Risk:** Breaking changes require coordinated deployment.
|
||||
|
||||
**Mitigation:**
|
||||
- Additive changes only (new columns, tables)
|
||||
- Deprecation period for removals
|
||||
- Version compatibility checks
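
A "version compatibility check" could be as simple as comparing the columns the API's queries rely on with the columns that actually exist. The Python sketch below shows only the comparison; in practice the actual column sets would come from a catalog query such as `information_schema.columns`, and the `EXPECTED_COLUMNS` mapping and function name are hypothetical.

```python
from typing import Dict, Set

# Columns the API's SQL references, per table (only one table shown here).
EXPECTED_COLUMNS: Dict[str, Set[str]] = {
    "spotify_artist": {
        "id", "name", "popularity", "external_url",
        "followers", "genres", "last_sync_time",
    },
}


def find_missing_columns(expected: Dict[str, Set[str]],
                         actual: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per table, the expected columns that the live schema lacks."""
    missing: Dict[str, Set[str]] = {}
    for table, columns in expected.items():
        absent = columns - actual.get(table, set())
        if absent:
            missing[table] = absent
    return missing


# Simulated live schema after an upstream rename of 'followers'.
actual = {
    "spotify_artist": {
        "id", "name", "popularity", "external_url",
        "follower_count", "genres", "last_sync_time",
    },
}

print(find_missing_columns(EXPECTED_COLUMNS, actual))
```

Extra columns in the live schema are ignored, which matches the "additive changes only" policy: new columns are harmless, while removed or renamed ones are reported.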
|
||||
|
||||
**No Automated Migration:** API has no migration framework.
|
||||