feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider - PostgreSQL schema with migrations - Service layer with database-first caching - Repository pattern for data access - YAML configuration support - Research documentation for 17 music metadata projects
This commit is contained in:
@@ -0,0 +1,911 @@
|
||||
# Music Metadata API - Data Layer
|
||||
|
||||
## Database Architecture
|
||||
|
||||
Music Metadata API uses a dual-database architecture with two separate SQLite files:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Application Layer │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
┌───────────┴───────────┐
|
||||
▼ ▼
|
||||
┌──────────────────────────┐ ┌──────────────────────────┐
|
||||
│ main_database.sqlite3 │ │ track_files.sqlite3 │
|
||||
│ (~117GB) │ │ (~99GB) │
|
||||
│ │ │ │
|
||||
│ - tracks │ │ - track_files │
|
||||
│ - albums │ │ (extended metadata) │
|
||||
│ - artists │ │ │
|
||||
│ - track_artists │ │ │
|
||||
│ - artist_albums │ │ │
|
||||
│ - album_images │ │ │
|
||||
│ - artist_images │ │ │
|
||||
│ - artist_genres │ │ │
|
||||
└──────────────────────────┘ └──────────────────────────┘
|
||||
```
|
||||
|
||||
**Total storage:** ~216GB
|
||||
**Total tracks:** 256 million
|
||||
**Connection mode:** Read-only
|
||||
**Driver:** modernc.org/sqlite v1.34.4 (pure Go, no CGO)
|
||||
|
||||
## Connection Configuration
|
||||
|
||||
### Connection Strings
|
||||
|
||||
**Main database:**
|
||||
```
|
||||
file:/path/to/main_database.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
|
||||
```
|
||||
|
||||
**Track files database:**
|
||||
```
|
||||
file:/path/to/track_files.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
|
||||
```
|
||||
|
||||
### PRAGMA Settings
|
||||
|
||||
| PRAGMA | Value | Purpose | Impact |
|
||||
|--------|-------|---------|--------|
|
||||
| `mode=ro` | Read-only | Prevents writes | No write locks, safe concurrent reads |
|
||||
| `_journal_mode=off` | Disabled | No WAL/rollback journal | Faster reads, safe for read-only |
|
||||
| `_cache_size=-64000` | 64MB | Page cache size | Reduces disk I/O for hot data |
|
||||
| `_mmap_size=1073741824` | 1GB | Memory-mapped I/O | Faster reads via mmap |
|
||||
| `_query_only=true` | Enabled | Additional read-only enforcement | Extra safety layer |
|
||||
|
||||
**Cache size calculation:**
|
||||
- Negative value = kilobytes
|
||||
- `-64000` = 64,000 KB = 64 MB
|
||||
- Default SQLite cache is ~2MB (32x increase)
|
||||
|
||||
**Memory-mapped I/O:**
|
||||
- Maps 1GB of database file into process memory
|
||||
- OS handles paging (faster than read() syscalls)
|
||||
- Effective for frequently accessed data
|
||||
|
||||
### Connection Pool
|
||||
|
||||
```go
|
||||
db.SetMaxOpenConns(8) // Conservative limit (8 concurrent queries)
|
||||
db.SetMaxIdleConns(8) // Keep all connections warm
|
||||
db.SetConnMaxLifetime(0) // No expiration (read-only safe)
|
||||
```
|
||||
|
||||
**Rationale:**
|
||||
- Read-only workload (no write contention)
|
||||
- SQLite handles concurrent reads well
|
||||
- 8 connections balance throughput vs resource usage
|
||||
- No connection recycling needed (no state changes)
|
||||
|
||||
## Main Database Schema
|
||||
|
||||
### tracks Table
|
||||
|
||||
**Purpose:** Core track metadata
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `rowid` | INTEGER | SQLite internal row ID | No |
|
||||
| `id` | TEXT | Internal track ID | No |
|
||||
| `name` | TEXT | Track title | No |
|
||||
| `isrc` | TEXT | ISRC code | Yes |
|
||||
| `duration_ms` | INTEGER | Duration in milliseconds | No |
|
||||
| `explicit` | INTEGER | Explicit content flag (0/1) | No |
|
||||
| `track_number` | INTEGER | Track number on album | No |
|
||||
| `disc_number` | INTEGER | Disc number | No |
|
||||
| `popularity` | INTEGER | Popularity score (0-100) | No |
|
||||
| `preview_url` | TEXT | 30-second preview URL | Yes |
|
||||
| `album_rowid` | INTEGER | Foreign key to albums.rowid | No |
|
||||
|
||||
**Indexes:**
|
||||
- Primary key on `id`
|
||||
- Index on `isrc` (for ISRC lookups)
|
||||
- Index on `album_rowid` (for album track listings)
|
||||
|
||||
**Sample row:**
|
||||
```sql
|
||||
id: 4cOdK2wGLETKBW3PvgPWqT
|
||||
name: Bohemian Rhapsody
|
||||
isrc: GBUM71029604
|
||||
duration_ms: 354320
|
||||
explicit: 0
|
||||
track_number: 11
|
||||
disc_number: 1
|
||||
popularity: 89
|
||||
preview_url: https://p.scdn.co/mp3-preview/...
|
||||
album_rowid: 12345
|
||||
```
|
||||
|
||||
**Estimated rows:** 256 million
|
||||
|
||||
### albums Table
|
||||
|
||||
**Purpose:** Album metadata
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `rowid` | INTEGER | SQLite internal row ID | No |
|
||||
| `id` | TEXT | Internal album ID | No |
|
||||
| `name` | TEXT | Album title | No |
|
||||
| `album_type` | TEXT | "album", "single", "compilation" | No |
|
||||
| `label` | TEXT | Record label | Yes |
|
||||
| `release_date` | TEXT | ISO 8601 date (YYYY-MM-DD) | No |
|
||||
| `release_date_precision` | TEXT | "year", "month", "day" | No |
|
||||
| `external_id_upc` | TEXT | UPC barcode | Yes |
|
||||
| `total_tracks` | INTEGER | Total tracks on album | No |
|
||||
| `copyright_c` | TEXT | Copyright notice | Yes |
|
||||
| `copyright_p` | TEXT | Phonographic copyright | Yes |
|
||||
|
||||
**Indexes:**
|
||||
- Primary key on `id`
|
||||
- Index on `rowid` (for track joins)
|
||||
|
||||
**Sample row:**
|
||||
```sql
|
||||
id: 2ODvWsOgouMbaA5xf0RkJe
|
||||
name: A Night at the Opera
|
||||
album_type: album
|
||||
label: Hollywood Records
|
||||
release_date: 1975-11-21
|
||||
release_date_precision: day
|
||||
external_id_upc: 050087246679
|
||||
total_tracks: 12
|
||||
copyright_c: 1975 Queen Productions Ltd
|
||||
copyright_p: 1975 Queen Productions Ltd
|
||||
```
|
||||
|
||||
**Estimated rows:** Tens of millions (fewer than tracks)
|
||||
|
||||
### artists Table
|
||||
|
||||
**Purpose:** Artist metadata
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `rowid` | INTEGER | SQLite internal row ID | No |
|
||||
| `id` | TEXT | Internal artist ID | No |
|
||||
| `name` | TEXT | Artist name | No |
|
||||
| `followers_total` | INTEGER | Total followers | Yes |
|
||||
| `popularity` | INTEGER | Popularity score (0-100) | Yes |
|
||||
|
||||
**Indexes:**
|
||||
- Primary key on `id`
|
||||
- Index on `name` (for search)
|
||||
|
||||
**Sample row:**
|
||||
```sql
|
||||
id: 0TnOYISbd1XYRBk9myaseg
|
||||
name: Queen
|
||||
followers_total: 45000000
|
||||
popularity: 92
|
||||
```
|
||||
|
||||
**Estimated rows:** Millions (fewer than albums)
|
||||
|
||||
### track_artists Table
|
||||
|
||||
**Purpose:** Many-to-many relationship between tracks and artists
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `track_id` | TEXT | Foreign key to tracks.id | No |
|
||||
| `artist_id` | TEXT | Foreign key to artists.id | No |
|
||||
|
||||
**Indexes:**
|
||||
- Composite index on `(track_id, artist_id)`
|
||||
- Index on `artist_id` (for artist track listings)
|
||||
|
||||
**Sample rows:**
|
||||
```sql
|
||||
track_id: 4cOdK2wGLETKBW3PvgPWqT, artist_id: 0TnOYISbd1XYRBk9myaseg
|
||||
track_id: 4cOdK2wGLETKBW3PvgPWqT, artist_id: 1A2B3C4D5E6F7G8H9I0J
|
||||
```
|
||||
|
||||
**Estimated rows:** Hundreds of millions (tracks can have multiple artists)
|
||||
|
||||
### artist_albums Table
|
||||
|
||||
**Purpose:** Many-to-many relationship between artists and albums with ordering
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `artist_id` | TEXT | Foreign key to artists.id | No |
|
||||
| `album_id` | TEXT | Foreign key to albums.id | No |
|
||||
| `index_in_album` | INTEGER | Artist order on album | No |
|
||||
|
||||
**Indexes:**
|
||||
- Composite index on `(album_id, index_in_album)`
|
||||
- Index on `artist_id` (for artist discography)
|
||||
|
||||
**Sample rows:**
|
||||
```sql
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, album_id: 2ODvWsOgouMbaA5xf0RkJe, index_in_album: 0
|
||||
artist_id: 1A2B3C4D5E6F7G8H9I0J, album_id: 2ODvWsOgouMbaA5xf0RkJe, index_in_album: 1
|
||||
```
|
||||
|
||||
**Purpose of index_in_album:** Preserves artist order for multi-artist albums (e.g., "Artist A & Artist B")
|
||||
|
||||
### album_images Table
|
||||
|
||||
**Purpose:** Album artwork URLs
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `album_id` | TEXT | Foreign key to albums.id | No |
|
||||
| `url` | TEXT | Image URL | No |
|
||||
| `width` | INTEGER | Width in pixels | No |
|
||||
| `height` | INTEGER | Height in pixels | No |
|
||||
|
||||
**Indexes:**
|
||||
- Index on `album_id`
|
||||
|
||||
**Sample rows:**
|
||||
```sql
|
||||
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d0000b273..., width: 640, height: 640
|
||||
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d00001e02..., width: 300, height: 300
|
||||
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d00004851..., width: 64, height: 64
|
||||
```
|
||||
|
||||
**Typical sizes:** 640x640, 300x300, 64x64
|
||||
|
||||
**Image hosting:** External CDN (i.scdn.co), not hosted by API
|
||||
|
||||
### artist_images Table
|
||||
|
||||
**Purpose:** Artist images/photos
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `artist_id` | TEXT | Foreign key to artists.id | No |
|
||||
| `url` | TEXT | Image URL | No |
|
||||
| `width` | INTEGER | Width in pixels | No |
|
||||
| `height` | INTEGER | Height in pixels | No |
|
||||
|
||||
**Indexes:**
|
||||
- Index on `artist_id`
|
||||
|
||||
**Sample rows:**
|
||||
```sql
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, url: https://i.scdn.co/image/af2b8e57f6d7b5d..., width: 640, height: 640
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, url: https://i.scdn.co/image/c06971e9ff81696..., width: 320, height: 320
|
||||
```
|
||||
|
||||
### artist_genres Table
|
||||
|
||||
**Purpose:** Artist genre tags
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `artist_id` | TEXT | Foreign key to artists.id | No |
|
||||
| `genre` | TEXT | Genre name | No |
|
||||
|
||||
**Indexes:**
|
||||
- Index on `artist_id`
|
||||
|
||||
**Sample rows:**
|
||||
```sql
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: rock
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: classic rock
|
||||
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: glam rock
|
||||
```
|
||||
|
||||
**Genre characteristics:**
|
||||
- Multiple genres per artist
|
||||
- Lowercase, hyphenated (e.g., "indie-rock")
|
||||
- Spotify-style genre taxonomy
|
||||
|
||||
## Track Files Database Schema
|
||||
|
||||
### track_files Table
|
||||
|
||||
**Purpose:** Extended track metadata not in main database
|
||||
|
||||
| Column | Type | Description | Nullable |
|
||||
|--------|------|-------------|----------|
|
||||
| `track_id` | TEXT | Foreign key to tracks.id | No |
|
||||
| `has_lyrics` | INTEGER | Lyrics availability flag (0/1) | No |
|
||||
| `original_title` | TEXT | Original title (if different) | Yes |
|
||||
| `version_title` | TEXT | Version descriptor (e.g., "Radio Edit") | Yes |
|
||||
| `language_of_performance` | TEXT | JSON array of language codes | Yes |
|
||||
| `artist_roles` | TEXT | JSON object mapping artist IDs to roles | Yes |
|
||||
|
||||
**Indexes:**
|
||||
- Primary key on `track_id`
|
||||
|
||||
**Sample row:**
|
||||
```sql
|
||||
track_id: 4cOdK2wGLETKBW3PvgPWqT
|
||||
has_lyrics: 1
|
||||
original_title: Bohemian Rhapsody
|
||||
version_title: NULL
|
||||
language_of_performance: ["en"]
|
||||
artist_roles: {"0TnOYISbd1XYRBk9myaseg": ["performer", "composer"]}
|
||||
```
|
||||
|
||||
**JSON field parsing:**
|
||||
|
||||
**language_of_performance:**
|
||||
```json
|
||||
["en", "es"] // ISO 639-1 language codes
|
||||
```
|
||||
|
||||
**artist_roles:**
|
||||
```json
|
||||
{
|
||||
"artist_id_1": ["performer", "composer"],
|
||||
"artist_id_2": ["producer"],
|
||||
"artist_id_3": ["lyricist"]
|
||||
}
|
||||
```
|
||||
|
||||
**Common roles:**
|
||||
- `performer` - Main performer
|
||||
- `composer` - Music composer
|
||||
- `lyricist` - Lyrics writer
|
||||
- `producer` - Producer
|
||||
- `engineer` - Recording engineer
|
||||
- `mixer` - Mix engineer
|
||||
|
||||
**Estimated rows:** 256 million (one per track)
|
||||
|
||||
## Query Patterns
|
||||
|
||||
### Individual Track Lookup
|
||||
|
||||
```sql
|
||||
-- Step 1: Fetch track + album (single JOIN)
|
||||
SELECT
|
||||
t.id, t.name, t.isrc, t.duration_ms, t.explicit,
|
||||
t.track_number, t.disc_number, t.popularity, t.preview_url,
|
||||
a.id AS album_id, a.name AS album_name, a.album_type,
|
||||
a.label, a.release_date, a.release_date_precision,
|
||||
a.external_id_upc, a.total_tracks, a.copyright_c, a.copyright_p
|
||||
FROM tracks t
|
||||
JOIN albums a ON t.album_rowid = a.rowid
|
||||
WHERE t.id = ?
|
||||
|
||||
-- Step 2: Fetch album images
|
||||
SELECT url, width, height
|
||||
FROM album_images
|
||||
WHERE album_id = ?
|
||||
ORDER BY width DESC
|
||||
|
||||
-- Step 3: Fetch album artists
|
||||
SELECT a.id, a.name, a.followers_total, a.popularity
|
||||
FROM artists a
|
||||
JOIN artist_albums aa ON a.id = aa.artist_id
|
||||
WHERE aa.album_id = ?
|
||||
ORDER BY aa.index_in_album
|
||||
|
||||
-- Step 4: Fetch track artists
|
||||
SELECT a.id, a.name, a.followers_total, a.popularity
|
||||
FROM artists a
|
||||
JOIN track_artists ta ON a.id = ta.artist_id
|
||||
WHERE ta.track_id = ?
|
||||
|
||||
-- Step 5: Fetch artist genres (for each artist)
|
||||
SELECT genre
|
||||
FROM artist_genres
|
||||
WHERE artist_id = ?
|
||||
|
||||
-- Step 6: Fetch artist images (for each artist)
|
||||
SELECT url, width, height
|
||||
FROM artist_images
|
||||
WHERE artist_id = ?
|
||||
ORDER BY width DESC
|
||||
|
||||
-- Step 7: Fetch track files (from track_files.sqlite3)
|
||||
SELECT has_lyrics, original_title, version_title,
|
||||
language_of_performance, artist_roles
|
||||
FROM track_files
|
||||
WHERE track_id = ?
|
||||
```
|
||||
|
||||
**Total queries for single track:** 7+ (depending on number of artists)
|
||||
|
||||
### Batch ISRC Lookup
|
||||
|
||||
```sql
|
||||
-- Step 1: Fetch all tracks by ISRC (single query with IN clause)
|
||||
SELECT
|
||||
t.id, t.name, t.isrc, t.duration_ms, t.explicit,
|
||||
t.track_number, t.disc_number, t.popularity, t.preview_url,
|
||||
a.id AS album_id, a.name AS album_name, a.album_type,
|
||||
a.label, a.release_date, a.release_date_precision,
|
||||
a.external_id_upc, a.total_tracks, a.copyright_c, a.copyright_p
|
||||
FROM tracks t
|
||||
JOIN albums a ON t.album_rowid = a.rowid
|
||||
WHERE t.isrc IN (?, ?, ?, ...) -- Up to 400 placeholders
|
||||
|
||||
-- Step 2: Batch fetch album images (all albums at once)
|
||||
SELECT album_id, url, width, height
|
||||
FROM album_images
|
||||
WHERE album_id IN (?, ?, ?, ...)
|
||||
ORDER BY album_id, width DESC
|
||||
|
||||
-- Step 3: Batch fetch album artists
|
||||
SELECT aa.album_id, a.id, a.name, a.followers_total, a.popularity, aa.index_in_album
|
||||
FROM artists a
|
||||
JOIN artist_albums aa ON a.id = aa.artist_id
|
||||
WHERE aa.album_id IN (?, ?, ?, ...)
|
||||
ORDER BY aa.album_id, aa.index_in_album
|
||||
|
||||
-- Step 4: Batch fetch track artists
|
||||
SELECT ta.track_id, a.id, a.name, a.followers_total, a.popularity
|
||||
FROM artists a
|
||||
JOIN track_artists ta ON a.id = ta.artist_id
|
||||
WHERE ta.track_id IN (?, ?, ?, ...)
|
||||
|
||||
-- Step 5: Batch fetch artist genres
|
||||
SELECT artist_id, genre
|
||||
FROM artist_genres
|
||||
WHERE artist_id IN (?, ?, ?, ...)
|
||||
|
||||
-- Step 6: Batch fetch artist images
|
||||
SELECT artist_id, url, width, height
|
||||
FROM artist_images
|
||||
WHERE artist_id IN (?, ?, ?, ...)
|
||||
ORDER BY artist_id, width DESC
|
||||
|
||||
-- Step 7: Batch fetch track files
|
||||
SELECT track_id, has_lyrics, original_title, version_title,
|
||||
language_of_performance, artist_roles
|
||||
FROM track_files
|
||||
WHERE track_id IN (?, ?, ?, ...)
|
||||
```
|
||||
|
||||
**Total queries for 400 tracks:** 7 (vs 2,800+ for individual lookups)
|
||||
|
||||
**Performance gain:** 400x fewer queries
|
||||
|
||||
### Search Queries
|
||||
|
||||
**Track search:**
|
||||
```sql
|
||||
SELECT id, name, isrc, duration_ms, popularity, album_rowid
|
||||
FROM tracks
|
||||
WHERE name LIKE ? COLLATE NOCASE -- ? = '%query%'
|
||||
ORDER BY popularity DESC
|
||||
LIMIT ?
|
||||
```
|
||||
|
||||
**Artist search:**
|
||||
```sql
|
||||
SELECT id, name, followers_total, popularity
|
||||
FROM artists
|
||||
WHERE name LIKE ? COLLATE NOCASE -- ? = '%query%'
|
||||
ORDER BY followers_total DESC
|
||||
LIMIT ?
|
||||
```
|
||||
|
||||
**Search characteristics:**
|
||||
- `LIKE %query%` can't use indexes (full table scan)
|
||||
- `COLLATE NOCASE` for case-insensitive matching
|
||||
- Ordered by popularity/followers (most relevant first)
|
||||
- Limited to 50 results max
|
||||
- 10-second timeout via context deadline
|
||||
|
||||
**Performance concern:** Searching 256M tracks with `LIKE %query%` is slow. Full-text search (FTS5) would be faster but not implemented.
|
||||
|
||||
### Album Tracks Lookup
|
||||
|
||||
```sql
|
||||
-- Fetch all tracks for an album
|
||||
SELECT t.id, t.name, t.isrc, t.duration_ms, t.explicit,
|
||||
t.track_number, t.disc_number, t.popularity, t.preview_url
|
||||
FROM tracks t
|
||||
WHERE t.album_rowid = (
|
||||
SELECT rowid FROM albums WHERE id = ?
|
||||
)
|
||||
ORDER BY t.disc_number, t.track_number
|
||||
```
|
||||
|
||||
**Ordering:** Disc number first, then track number (preserves album order)
|
||||
|
||||
## Data Enrichment Strategy
|
||||
|
||||
### Enrichment Pipeline
|
||||
|
||||
```
|
||||
1. Fetch base entity (track/album/artist)
|
||||
↓
|
||||
2. Collect related entity IDs
|
||||
↓
|
||||
3. Batch fetch related entities
|
||||
↓
|
||||
4. Assemble nested structures
|
||||
↓
|
||||
5. Return enriched object
|
||||
```
|
||||
|
||||
### Batch Optimization Functions
|
||||
|
||||
**Implementation in db.go (907 lines):**
|
||||
|
||||
```go
|
||||
// Batch fetch album images for multiple albums
|
||||
func (d *Database) batchGetAlbumImages(albumIDs []string) map[string][]Image {
|
||||
// Build IN clause
|
||||
placeholders := strings.Repeat("?,", len(albumIDs)-1) + "?"
|
||||
query := fmt.Sprintf(`
|
||||
SELECT album_id, url, width, height
|
||||
FROM album_images
|
||||
WHERE album_id IN (%s)
|
||||
ORDER BY album_id, width DESC
|
||||
`, placeholders)
|
||||
|
||||
// Execute query
|
||||
rows, _ := d.mainDB.Query(query, albumIDs...)
|
||||
|
||||
// Group by album_id
|
||||
result := make(map[string][]Image)
|
||||
for rows.Next() {
|
||||
var albumID string
|
||||
var img Image
|
||||
rows.Scan(&albumID, &img.URL, &img.Width, &img.Height)
|
||||
result[albumID] = append(result[albumID], img)
|
||||
}
|
||||
|
||||
return result
|
||||
}
|
||||
```
|
||||
|
||||
**Similar functions:**
|
||||
- `batchGetAlbumArtists(albumIDs []string) map[string][]Artist`
|
||||
- `batchGetTrackArtists(trackIDs []string) map[string][]Artist`
|
||||
- `batchGetArtistGenres(artistIDs []string) map[string][]string`
|
||||
- `batchGetArtistImages(artistIDs []string) map[string][]Image`
|
||||
- `batchEnrichTrackFiles(trackIDs []string) map[string]*TrackFile`
|
||||
|
||||
**Pattern:**
|
||||
1. Build IN clause with placeholders
|
||||
2. Execute single query for all IDs
|
||||
3. Group results by parent ID
|
||||
4. Return map for O(1) lookup
|
||||
|
||||
### Why Batch Matters
|
||||
|
||||
**Without batching (400 tracks):**
|
||||
- 400 track queries
|
||||
- 400 album queries
|
||||
- 400 album image queries
|
||||
- 400 album artist queries
|
||||
- 400 track artist queries
|
||||
- ~800 artist genre queries (2 artists per track avg)
|
||||
- ~800 artist image queries
|
||||
- 400 track file queries
|
||||
- **Total: ~3,600 queries**
|
||||
|
||||
**With batching (400 tracks):**
|
||||
- 1 batch track query
|
||||
- 1 batch album image query
|
||||
- 1 batch album artist query
|
||||
- 1 batch track artist query
|
||||
- 1 batch artist genre query
|
||||
- 1 batch artist image query
|
||||
- 1 batch track file query
|
||||
- **Total: 7 queries**
|
||||
|
||||
**Performance gain: 514x fewer queries**
|
||||
|
||||
## Data Provenance
|
||||
|
||||
### Source
|
||||
|
||||
**Disclaimer from repository:**
|
||||
> "This project is not affiliated with Spotify."
|
||||
|
||||
**Implications:**
|
||||
- Data source unclear (likely scraped or obtained from third party)
|
||||
- Legal status uncertain
|
||||
- No official Spotify endorsement
|
||||
|
||||
### Data Freshness
|
||||
|
||||
**Static snapshot:**
|
||||
- No update mechanism
|
||||
- Data frozen at time of database creation
|
||||
- No real-time sync with Spotify
|
||||
|
||||
**Staleness concerns:**
|
||||
- New releases not included
|
||||
- Popularity scores outdated
|
||||
- Artist follower counts stale
|
||||
- Deleted tracks still present
|
||||
|
||||
**Mitigation:**
|
||||
- Treat as historical snapshot
|
||||
- Complement with real-time APIs for fresh data
|
||||
- Periodically obtain updated database (if available)
|
||||
|
||||
### Data Quality
|
||||
|
||||
**Strengths:**
|
||||
- 256M tracks (massive coverage)
|
||||
- Rich metadata (genres, images, roles)
|
||||
- ISRC codes for cross-referencing
|
||||
- Popularity/follower metrics
|
||||
|
||||
**Weaknesses:**
|
||||
- No data validation visible
|
||||
- Potential duplicates (not deduplicated)
|
||||
- Missing ISRCs for some tracks
|
||||
- Incomplete artist roles
|
||||
|
||||
## Storage Requirements
|
||||
|
||||
### Disk Space
|
||||
|
||||
| Component | Size | Compressible |
|
||||
|-----------|------|--------------|
|
||||
| main_database.sqlite3 | ~117GB | Minimal (already compact) |
|
||||
| track_files.sqlite3 | ~99GB | Minimal (JSON fields) |
|
||||
| **Total** | **~216GB** | - |
|
||||
|
||||
**Recommendations:**
|
||||
- SSD strongly recommended (HDD too slow for 256M rows)
|
||||
- NVMe for best performance
|
||||
- RAID not necessary (read-only, can rebuild from backup)
|
||||
|
||||
### Memory Usage
|
||||
|
||||
**SQLite memory:**
|
||||
- Page cache: 64MB per connection
|
||||
- 8 connections: 512MB cache total
|
||||
- Memory-mapped I/O: 1GB per database (2GB total)
|
||||
- **Total: ~2.5GB minimum**
|
||||
|
||||
**Application memory:**
|
||||
- Go runtime: ~50MB
|
||||
- Rate limiter map: Grows unbounded (leak)
|
||||
- Request buffers: ~10MB per concurrent request
|
||||
- **Total: ~100MB + leak**
|
||||
|
||||
**Recommended RAM:** 4GB+ (2.5GB for SQLite + 1.5GB for OS/app)
|
||||
|
||||
### I/O Characteristics
|
||||
|
||||
**Read patterns:**
|
||||
- Random reads (track lookups by ID/ISRC)
|
||||
- Sequential scans (search queries)
|
||||
- Batch reads (IN clause queries)
|
||||
|
||||
**Write patterns:**
|
||||
- None (read-only)
|
||||
|
||||
**Cache effectiveness:**
|
||||
- Hot data (popular tracks): High hit rate
|
||||
- Cold data (obscure tracks): Low hit rate
|
||||
- Search queries: Low hit rate (full scans)
|
||||
|
||||
## Database Maintenance
|
||||
|
||||
### No Maintenance Required
|
||||
|
||||
**Read-only benefits:**
|
||||
- No VACUUM needed (no fragmentation from deletes)
|
||||
- No ANALYZE needed (statistics static)
|
||||
- No REINDEX needed (indexes don't degrade)
|
||||
- No WAL checkpoint (journal disabled)
|
||||
|
||||
### Backup Strategy
|
||||
|
||||
**Simple backup:**
|
||||
```bash
|
||||
# Copy files (database must be idle)
|
||||
cp main_database.sqlite3 backup/
|
||||
cp track_files.sqlite3 backup/
|
||||
```
|
||||
|
||||
**Online backup (while running):**
|
||||
```bash
|
||||
# SQLite backup API (requires custom tool)
|
||||
sqlite3 main_database.sqlite3 ".backup backup/main_database.sqlite3"
|
||||
```
|
||||
|
||||
**Restore:**
|
||||
```bash
|
||||
# Simply replace files
|
||||
cp backup/main_database.sqlite3 .
|
||||
cp backup/track_files.sqlite3 .
|
||||
```
|
||||
|
||||
### Integrity Checks
|
||||
|
||||
**Verify database integrity:**
|
||||
```bash
|
||||
sqlite3 main_database.sqlite3 "PRAGMA integrity_check;"
|
||||
sqlite3 track_files.sqlite3 "PRAGMA integrity_check;"
|
||||
```
|
||||
|
||||
**Expected output:** `ok`
|
||||
|
||||
**Run periodically:** Monthly or after hardware issues
|
||||
|
||||
## Performance Tuning
|
||||
|
||||
### Query Optimization
|
||||
|
||||
**Indexes already present:**
|
||||
- Primary keys on all ID columns
|
||||
- Foreign key indexes (album_rowid, artist_id, etc.)
|
||||
- Search indexes (tracks.name, artists.name)
|
||||
|
||||
**Missing indexes (potential improvements):**
|
||||
- Full-text search index (FTS5) on track/artist names
|
||||
- Composite index on (popularity, name) for sorted searches
|
||||
|
||||
### Connection Pool Tuning
|
||||
|
||||
**Current settings:**
|
||||
```go
|
||||
MaxOpenConns: 8
|
||||
MaxIdleConns: 8
|
||||
ConnMaxLifetime: 0
|
||||
```
|
||||
|
||||
**Tuning considerations:**
|
||||
- Increase MaxOpenConns for higher concurrency (16-32)
|
||||
- Monitor CPU usage (SQLite is CPU-bound for searches)
|
||||
- No benefit beyond CPU core count
|
||||
|
||||
### Cache Tuning
|
||||
|
||||
**Current cache:** 64MB per connection (512MB total)
|
||||
|
||||
**Increase cache:**
|
||||
```
|
||||
_cache_size=-128000 // 128MB per connection
|
||||
```
|
||||
|
||||
**Tradeoff:** More memory usage vs fewer disk reads
|
||||
|
||||
**Recommendation:** Monitor cache hit rate, increase if low
|
||||
|
||||
### Memory-Mapped I/O Tuning
|
||||
|
||||
**Current mmap:** 1GB per database
|
||||
|
||||
**Increase mmap:**
|
||||
```
|
||||
_mmap_size=2147483648 // 2GB
|
||||
```
|
||||
|
||||
**Tradeoff:** More virtual memory vs faster reads
|
||||
|
||||
**Recommendation:** Set to database size if RAM allows (117GB not feasible)
|
||||
|
||||
## Data Model Comparison
|
||||
|
||||
### vs Spotify Web API
|
||||
|
||||
| Feature | Music Metadata API | Spotify Web API |
|
||||
|---------|-------------------|-----------------|
|
||||
| Track ID format | Spotify-compatible | Spotify IDs |
|
||||
| ISRC support | Yes | Yes |
|
||||
| Popularity | Static snapshot | Real-time |
|
||||
| Followers | Static snapshot | Real-time |
|
||||
| Images | External URLs | External URLs |
|
||||
| Genres | Artist-level | Artist-level |
|
||||
| Lyrics | Flag only | Not available |
|
||||
| Artist roles | Detailed | Limited |
|
||||
| Languages | Supported | Not available |
|
||||
|
||||
### vs MusicBrainz
|
||||
|
||||
| Feature | Music Metadata API | MusicBrainz |
|
||||
|---------|-------------------|-------------|
|
||||
| Identifier | Spotify IDs, ISRC | MBIDs |
|
||||
| Dataset size | 256M tracks | ~40M recordings |
|
||||
| Popularity | Yes | No |
|
||||
| Followers | Yes | No |
|
||||
| Images | Yes (external) | Yes (Cover Art Archive) |
|
||||
| Genres | Yes | Yes (tags) |
|
||||
| Relationships | Limited | Extensive |
|
||||
| Credits | Artist roles | Detailed credits |
|
||||
| Updates | Static | Community-driven |
|
||||
|
||||
## Integration Considerations
|
||||
|
||||
### Joining with Other Databases
|
||||
|
||||
**ISRC as common key:**
|
||||
```sql
|
||||
-- Join with local library
|
||||
SELECT l.file_path, m.name, m.popularity
|
||||
FROM local_library l
|
||||
JOIN music_metadata_api.tracks m ON l.isrc = m.isrc
|
||||
```
|
||||
|
||||
**Spotify ID as common key:**
|
||||
```sql
|
||||
-- Join with MusicBrainz
|
||||
SELECT mb.mbid, mm.name, mm.popularity
|
||||
FROM musicbrainz.recording mb
|
||||
JOIN musicbrainz.isrc i ON mb.id = i.recording
|
||||
JOIN music_metadata_api.tracks mm ON i.isrc = mm.isrc
|
||||
```
|
||||
|
||||
### Data Export
|
||||
|
||||
**Export to JSON:**
|
||||
```bash
|
||||
sqlite3 main_database.sqlite3 <<EOF
|
||||
.mode json
|
||||
.output tracks.json
|
||||
SELECT * FROM tracks LIMIT 1000;
|
||||
EOF
|
||||
```
|
||||
|
||||
**Export to CSV:**
|
||||
```bash
|
||||
sqlite3 main_database.sqlite3 <<EOF
|
||||
.mode csv
|
||||
.output tracks.csv
|
||||
SELECT id, name, isrc, popularity FROM tracks;
|
||||
EOF
|
||||
```
|
||||
|
||||
### Data Import
|
||||
|
||||
**Import from CSV:**
|
||||
```bash
|
||||
sqlite3 new_database.sqlite3 <<EOF
|
||||
.mode csv
|
||||
.import tracks.csv tracks
|
||||
EOF
|
||||
```
|
||||
|
||||
**Bulk insert from application:**
|
||||
```go
|
||||
tx, _ := db.Begin()
|
||||
stmt, _ := tx.Prepare("INSERT INTO tracks VALUES (?, ?, ?, ...)")
|
||||
for _, track := range tracks {
|
||||
stmt.Exec(track.ID, track.Name, track.ISRC, ...)
|
||||
}
|
||||
tx.Commit()
|
||||
```
|
||||
|
||||
## Limitations
|
||||
|
||||
### No Write Operations
|
||||
|
||||
**Implications:**
|
||||
- Can't add new tracks
|
||||
- Can't update popularity scores
|
||||
- Can't delete duplicates
|
||||
- Can't fix data errors
|
||||
|
||||
**Workarounds:**
|
||||
- Create separate writable database for local additions
|
||||
- Use views to merge read-only + writable data
|
||||
- Periodically obtain updated database snapshot
|
||||
|
||||
### No Full-Text Search
|
||||
|
||||
**Current search:** `LIKE %query%` (slow)
|
||||
|
||||
**FTS5 alternative:**
|
||||
```sql
|
||||
-- Create FTS5 virtual table (requires writable database)
|
||||
CREATE VIRTUAL TABLE tracks_fts USING fts5(name, content=tracks);
|
||||
INSERT INTO tracks_fts SELECT name FROM tracks;
|
||||
|
||||
-- Fast search
|
||||
SELECT * FROM tracks_fts WHERE name MATCH 'bohemian';
|
||||
```
|
||||
|
||||
**Limitation:** Can't create FTS5 on read-only database
|
||||
|
||||
**Workaround:** Create separate FTS5 database, sync periodically
|
||||
|
||||
### No Relationships Beyond Basics
|
||||
|
||||
**Missing relationships:**
|
||||
- Track-to-track (similar tracks, remixes)
|
||||
- Album-to-album (compilations, deluxe editions)
|
||||
- Artist-to-artist (collaborations, bands)
|
||||
|
||||
**Workaround:** Build relationship graph in separate database
|
||||
Reference in New Issue
Block a user