# Music Metadata API - Data Layer

## Database Architecture

Music Metadata API uses a dual-database architecture with two separate SQLite files:

```
┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
└─────────────────────────────────────────────────────────────┘
                               │
                   ┌───────────┴───────────┐
                   ▼                       ▼
  ┌──────────────────────────┐  ┌──────────────────────────┐
  │  main_database.sqlite3   │  │  track_files.sqlite3     │
  │  (~117GB)                │  │  (~99GB)                 │
  │                          │  │                          │
  │  - tracks                │  │  - track_files           │
  │  - albums                │  │    (extended metadata)   │
  │  - artists               │  │                          │
  │  - track_artists         │  │                          │
  │  - artist_albums         │  │                          │
  │  - album_images          │  │                          │
  │  - artist_images         │  │                          │
  │  - artist_genres         │  │                          │
  └──────────────────────────┘  └──────────────────────────┘
```

- **Total storage:** ~216GB
- **Total tracks:** 256 million
- **Connection mode:** Read-only
- **Driver:** modernc.org/sqlite v1.34.4 (pure Go, no CGO)

## Connection Configuration

### Connection Strings

**Main database:**
```
file:/path/to/main_database.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
```

**Track files database:**
```
file:/path/to/track_files.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
```

### PRAGMA Settings

| Parameter | Value | Purpose | Impact |
|-----------|-------|---------|--------|
| `mode=ro` | Read-only | Prevents writes | No write locks, safe concurrent reads |
| `_journal_mode=off` | Disabled | No WAL/rollback journal | Faster reads, safe for read-only |
| `_cache_size=-64000` | 64MB | Page cache size | Reduces disk I/O for hot data |
| `_mmap_size=1073741824` | 1GB | Memory-mapped I/O | Faster reads via mmap |
| `_query_only=true` | Enabled | Additional read-only enforcement | Extra safety layer |

`mode=ro` is a SQLite URI parameter rather than a PRAGMA; the remaining entries correspond to PRAGMA settings.

**Cache size calculation:**
- A negative value is a size in KiB (a positive value would be a page count)
- `-64000` = 64,000 KiB ≈ 64 MB
- Default SQLite cache is ~2MB (32x increase)

**Memory-mapped I/O:**
- Maps up to 1GB of each database file into process memory
- OS handles paging (faster than read() syscalls)
- Effective for frequently accessed data

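
The connection strings above can be assembled from their parts. The following is a minimal sketch, not the project's code: `ReadOnlyDSN` and its signature are hypothetical, and only the parameter names and values come from the table. Parameter handling differs between SQLite drivers, so it may be worth running `PRAGMA cache_size;` on an open connection to confirm the settings took effect.

```go
package main

import (
	"strconv"
	"strings"
)

// ReadOnlyDSN assembles a connection string in the documented format.
// The function is a hypothetical helper; the parameter names and values
// are the ones listed in the settings table.
func ReadOnlyDSN(path string, cacheKiB int, mmapBytes int64) string {
	params := []string{
		"mode=ro",
		"_journal_mode=off",
		"_cache_size=-" + strconv.Itoa(cacheKiB), // negative = size in KiB
		"_mmap_size=" + strconv.FormatInt(mmapBytes, 10),
		"_query_only=true",
	}
	return "file:" + path + "?" + strings.Join(params, "&")
}
```

Calling `ReadOnlyDSN("/path/to/main_database.sqlite3", 64000, 1<<30)` reproduces the main database string shown above.
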
### Connection Pool

```go
db.SetMaxOpenConns(8)    // Conservative limit (8 concurrent queries)
db.SetMaxIdleConns(8)    // Keep all connections warm
db.SetConnMaxLifetime(0) // No expiration (read-only safe)
```

**Rationale:**
- Read-only workload (no write contention)
- SQLite handles concurrent reads well
- 8 connections balance throughput vs resource usage
- No connection recycling needed (no state changes)

## Main Database Schema

### tracks Table

**Purpose:** Core track metadata

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `rowid` | INTEGER | SQLite internal row ID | No |
| `id` | TEXT | Internal track ID | No |
| `name` | TEXT | Track title | No |
| `isrc` | TEXT | ISRC code | Yes |
| `duration_ms` | INTEGER | Duration in milliseconds | No |
| `explicit` | INTEGER | Explicit content flag (0/1) | No |
| `track_number` | INTEGER | Track number on album | No |
| `disc_number` | INTEGER | Disc number | No |
| `popularity` | INTEGER | Popularity score (0-100) | No |
| `preview_url` | TEXT | 30-second preview URL | Yes |
| `album_rowid` | INTEGER | Foreign key to albums.rowid | No |

**Indexes:**
- Primary key on `id`
- Index on `isrc` (for ISRC lookups)
- Index on `album_rowid` (for album track listings)

**Sample row:**
```sql
id: 4cOdK2wGLETKBW3PvgPWqT
name: Bohemian Rhapsody
isrc: GBUM71029604
duration_ms: 354320
explicit: 0
track_number: 11
disc_number: 1
popularity: 89
preview_url: https://p.scdn.co/mp3-preview/...
album_rowid: 12345
```

**Estimated rows:** 256 million

### albums Table

**Purpose:** Album metadata

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `rowid` | INTEGER | SQLite internal row ID | No |
| `id` | TEXT | Internal album ID | No |
| `name` | TEXT | Album title | No |
| `album_type` | TEXT | "album", "single", "compilation" | No |
| `label` | TEXT | Record label | Yes |
| `release_date` | TEXT | ISO 8601 date (YYYY-MM-DD) | No |
| `release_date_precision` | TEXT | "year", "month", "day" | No |
| `external_id_upc` | TEXT | UPC barcode | Yes |
| `total_tracks` | INTEGER | Total tracks on album | No |
| `copyright_c` | TEXT | Copyright notice | Yes |
| `copyright_p` | TEXT | Phonographic copyright | Yes |

**Indexes:**
- Primary key on `id`
- Lookups by `rowid` (for track joins) use the table's built-in B-tree key, so no separate index is required

**Sample row:**
```sql
id: 2ODvWsOgouMbaA5xf0RkJe
name: A Night at the Opera
album_type: album
label: Hollywood Records
release_date: 1975-11-21
release_date_precision: day
external_id_upc: 050087246679
total_tracks: 12
copyright_c: 1975 Queen Productions Ltd
copyright_p: 1975 Queen Productions Ltd
```

**Estimated rows:** Tens of millions (fewer than tracks)

### artists Table

**Purpose:** Artist metadata

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `rowid` | INTEGER | SQLite internal row ID | No |
| `id` | TEXT | Internal artist ID | No |
| `name` | TEXT | Artist name | No |
| `followers_total` | INTEGER | Total followers | Yes |
| `popularity` | INTEGER | Popularity score (0-100) | Yes |

**Indexes:**
- Primary key on `id`
- Index on `name` (for search)

**Sample row:**
```sql
id: 0TnOYISbd1XYRBk9myaseg
name: Queen
followers_total: 45000000
popularity: 92
```

**Estimated rows:** Millions (fewer than albums)

### track_artists Table

**Purpose:** Many-to-many relationship between tracks and artists

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `track_id` | TEXT | Foreign key to tracks.id | No |
| `artist_id` | TEXT | Foreign key to artists.id | No |

**Indexes:**
- Composite index on `(track_id, artist_id)`
- Index on `artist_id` (for artist track listings)

**Sample rows:**
```sql
track_id: 4cOdK2wGLETKBW3PvgPWqT, artist_id: 0TnOYISbd1XYRBk9myaseg
track_id: 4cOdK2wGLETKBW3PvgPWqT, artist_id: 1A2B3C4D5E6F7G8H9I0J
```

**Estimated rows:** Hundreds of millions (tracks can have multiple artists)

### artist_albums Table

**Purpose:** Many-to-many relationship between artists and albums with ordering

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `artist_id` | TEXT | Foreign key to artists.id | No |
| `album_id` | TEXT | Foreign key to albums.id | No |
| `index_in_album` | INTEGER | Artist order on album | No |

**Indexes:**
- Composite index on `(album_id, index_in_album)`
- Index on `artist_id` (for artist discography)

**Sample rows:**
```sql
artist_id: 0TnOYISbd1XYRBk9myaseg, album_id: 2ODvWsOgouMbaA5xf0RkJe, index_in_album: 0
artist_id: 1A2B3C4D5E6F7G8H9I0J, album_id: 2ODvWsOgouMbaA5xf0RkJe, index_in_album: 1
```

**Purpose of index_in_album:** Preserves artist order for multi-artist albums (e.g., "Artist A & Artist B")

### album_images Table

**Purpose:** Album artwork URLs

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `album_id` | TEXT | Foreign key to albums.id | No |
| `url` | TEXT | Image URL | No |
| `width` | INTEGER | Width in pixels | No |
| `height` | INTEGER | Height in pixels | No |

**Indexes:**
- Index on `album_id`

**Sample rows:**
```sql
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d0000b273..., width: 640, height: 640
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d00001e02..., width: 300, height: 300
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d00004851..., width: 64, height: 64
```

**Typical sizes:** 640x640, 300x300, 64x64

**Image hosting:** External CDN (i.scdn.co), not hosted by API

### artist_images Table

**Purpose:** Artist images/photos

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `artist_id` | TEXT | Foreign key to artists.id | No |
| `url` | TEXT | Image URL | No |
| `width` | INTEGER | Width in pixels | No |
| `height` | INTEGER | Height in pixels | No |

**Indexes:**
- Index on `artist_id`

**Sample rows:**
```sql
artist_id: 0TnOYISbd1XYRBk9myaseg, url: https://i.scdn.co/image/af2b8e57f6d7b5d..., width: 640, height: 640
artist_id: 0TnOYISbd1XYRBk9myaseg, url: https://i.scdn.co/image/c06971e9ff81696..., width: 320, height: 320
```

### artist_genres Table

**Purpose:** Artist genre tags

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `artist_id` | TEXT | Foreign key to artists.id | No |
| `genre` | TEXT | Genre name | No |

**Indexes:**
- Index on `artist_id`

**Sample rows:**
```sql
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: rock
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: classic rock
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: glam rock
```

**Genre characteristics:**
- Multiple genres per artist
- Lowercase; multi-word names use spaces in the sample rows above (e.g., "classic rock"), and some may be hyphenated (e.g., "indie-rock")
- Spotify-style genre taxonomy

## Track Files Database Schema

### track_files Table

**Purpose:** Extended track metadata not in main database

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `track_id` | TEXT | Foreign key to tracks.id | No |
| `has_lyrics` | INTEGER | Lyrics availability flag (0/1) | No |
| `original_title` | TEXT | Original title (if different) | Yes |
| `version_title` | TEXT | Version descriptor (e.g., "Radio Edit") | Yes |
| `language_of_performance` | TEXT | JSON array of language codes | Yes |
| `artist_roles` | TEXT | JSON object mapping artist IDs to roles | Yes |

**Indexes:**
- Primary key on `track_id`

**Sample row:**
```sql
track_id: 4cOdK2wGLETKBW3PvgPWqT
has_lyrics: 1
original_title: Bohemian Rhapsody
version_title: NULL
language_of_performance: ["en"]
artist_roles: {"0TnOYISbd1XYRBk9myaseg": ["performer", "composer"]}
```

**JSON field parsing:**

**language_of_performance:** JSON array of ISO 639-1 language codes
```json
["en", "es"]
```

**artist_roles:** JSON object mapping each artist ID to a list of roles
```json
{
  "artist_id_1": ["performer", "composer"],
  "artist_id_2": ["producer"],
  "artist_id_3": ["lyricist"]
}
```

**Common roles:**
- `performer` - Main performer
- `composer` - Music composer
- `lyricist` - Lyrics writer
- `producer` - Producer
- `engineer` - Recording engineer
- `mixer` - Mix engineer

**Estimated rows:** 256 million (one per track)

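
Both JSON columns are nullable, so decoding has to tolerate NULL. The sketch below shows one way to do that in Go. `TrackFileJSON` and `ParseTrackFileJSON` are hypothetical names, not taken from the project; the `*string` parameters assume the columns were scanned into pointer destinations, which `database/sql` sets to nil for NULL.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TrackFileJSON holds the decoded JSON columns (hypothetical type).
type TrackFileJSON struct {
	Languages   []string            // from language_of_performance
	ArtistRoles map[string][]string // from artist_roles: artist ID -> roles
}

// ParseTrackFileJSON decodes the two nullable JSON columns.
// A nil pointer represents SQL NULL and leaves the field nil.
func ParseTrackFileJSON(languages, roles *string) (TrackFileJSON, error) {
	var out TrackFileJSON
	if languages != nil {
		if err := json.Unmarshal([]byte(*languages), &out.Languages); err != nil {
			return out, fmt.Errorf("language_of_performance: %w", err)
		}
	}
	if roles != nil {
		if err := json.Unmarshal([]byte(*roles), &out.ArtistRoles); err != nil {
			return out, fmt.Errorf("artist_roles: %w", err)
		}
	}
	return out, nil
}
```

Returning an error for malformed JSON, rather than silently dropping the field, is a design choice of this sketch; a service that prefers partial results could log the error and continue.
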
## Query Patterns

### Individual Track Lookup

```sql
-- Step 1: Fetch track + album (single JOIN)
SELECT
    t.id, t.name, t.isrc, t.duration_ms, t.explicit,
    t.track_number, t.disc_number, t.popularity, t.preview_url,
    a.id AS album_id, a.name AS album_name, a.album_type,
    a.label, a.release_date, a.release_date_precision,
    a.external_id_upc, a.total_tracks, a.copyright_c, a.copyright_p
FROM tracks t
JOIN albums a ON t.album_rowid = a.rowid
WHERE t.id = ?

-- Step 2: Fetch album images
SELECT url, width, height
FROM album_images
WHERE album_id = ?
ORDER BY width DESC

-- Step 3: Fetch album artists
SELECT a.id, a.name, a.followers_total, a.popularity
FROM artists a
JOIN artist_albums aa ON a.id = aa.artist_id
WHERE aa.album_id = ?
ORDER BY aa.index_in_album

-- Step 4: Fetch track artists
SELECT a.id, a.name, a.followers_total, a.popularity
FROM artists a
JOIN track_artists ta ON a.id = ta.artist_id
WHERE ta.track_id = ?

-- Step 5: Fetch artist genres (for each artist)
SELECT genre
FROM artist_genres
WHERE artist_id = ?

-- Step 6: Fetch artist images (for each artist)
SELECT url, width, height
FROM artist_images
WHERE artist_id = ?
ORDER BY width DESC

-- Step 7: Fetch track files (from track_files.sqlite3)
SELECT has_lyrics, original_title, version_title,
       language_of_performance, artist_roles
FROM track_files
WHERE track_id = ?
```

**Total queries for single track:** 7+ (depending on number of artists)

### Batch ISRC Lookup

```sql
-- Step 1: Fetch all tracks by ISRC (single query with IN clause)
SELECT
    t.id, t.name, t.isrc, t.duration_ms, t.explicit,
    t.track_number, t.disc_number, t.popularity, t.preview_url,
    a.id AS album_id, a.name AS album_name, a.album_type,
    a.label, a.release_date, a.release_date_precision,
    a.external_id_upc, a.total_tracks, a.copyright_c, a.copyright_p
FROM tracks t
JOIN albums a ON t.album_rowid = a.rowid
WHERE t.isrc IN (?, ?, ?, ...) -- Up to 400 placeholders

-- Step 2: Batch fetch album images (all albums at once)
SELECT album_id, url, width, height
FROM album_images
WHERE album_id IN (?, ?, ?, ...)
ORDER BY album_id, width DESC

-- Step 3: Batch fetch album artists
SELECT aa.album_id, a.id, a.name, a.followers_total, a.popularity, aa.index_in_album
FROM artists a
JOIN artist_albums aa ON a.id = aa.artist_id
WHERE aa.album_id IN (?, ?, ?, ...)
ORDER BY aa.album_id, aa.index_in_album

-- Step 4: Batch fetch track artists
SELECT ta.track_id, a.id, a.name, a.followers_total, a.popularity
FROM artists a
JOIN track_artists ta ON a.id = ta.artist_id
WHERE ta.track_id IN (?, ?, ?, ...)

-- Step 5: Batch fetch artist genres
SELECT artist_id, genre
FROM artist_genres
WHERE artist_id IN (?, ?, ?, ...)

-- Step 6: Batch fetch artist images
SELECT artist_id, url, width, height
FROM artist_images
WHERE artist_id IN (?, ?, ?, ...)
ORDER BY artist_id, width DESC

-- Step 7: Batch fetch track files
SELECT track_id, has_lyrics, original_title, version_title,
       language_of_performance, artist_roles
FROM track_files
WHERE track_id IN (?, ?, ?, ...)
```

**Total queries for 400 tracks:** 7 (vs 2,800+ for individual lookups)

**Performance gain:** At least 400x fewer queries (2,800 / 7); the ratio grows when tracks have more than one artist, because Steps 5 and 6 run once per artist in the individual path

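
Because Step 1 accepts at most 400 placeholders, a caller with a longer ISRC list has to split it first. The sketch below shows one way to prepare the input. `ISRCBatches` is a hypothetical helper, and trimming, upper-casing, and de-duplicating the codes are assumptions of this example rather than documented behaviour of the API.

```go
package main

import "strings"

// maxBatch is the placeholder cap quoted in the Step 1 query comment.
const maxBatch = 400

// ISRCBatches normalizes and de-duplicates ISRC codes, then splits them
// into batches of at most size entries (capped at maxBatch).
// Hypothetical helper: the normalization rules are assumptions.
func ISRCBatches(isrcs []string, size int) [][]string {
	if size <= 0 || size > maxBatch {
		size = maxBatch
	}
	seen := make(map[string]struct{}, len(isrcs))
	var unique []string
	for _, raw := range isrcs {
		code := strings.ToUpper(strings.TrimSpace(raw))
		if code == "" {
			continue // skip blank entries
		}
		if _, dup := seen[code]; dup {
			continue // keep the first occurrence only
		}
		seen[code] = struct{}{}
		unique = append(unique, code)
	}
	var batches [][]string
	for start := 0; start < len(unique); start += size {
		end := start + size
		if end > len(unique) {
			end = len(unique)
		}
		batches = append(batches, unique[start:end])
	}
	return batches
}
```

Each returned batch can then be bound to one execution of the Step 1 query, with the follow-up steps run once per batch.
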
### Search Queries

**Track search:**
```sql
SELECT id, name, isrc, duration_ms, popularity, album_rowid
FROM tracks
WHERE name LIKE ? COLLATE NOCASE -- ? = '%query%'
ORDER BY popularity DESC
LIMIT ?
```

**Artist search:**
```sql
SELECT id, name, followers_total, popularity
FROM artists
WHERE name LIKE ? COLLATE NOCASE -- ? = '%query%'
ORDER BY followers_total DESC
LIMIT ?
```

**Search characteristics:**
- `LIKE '%query%'` can't use indexes (the leading wildcard forces a full table scan)
- `COLLATE NOCASE` requests case-insensitive matching (SQLite's `LIKE` is already case-insensitive for ASCII characters by default)
- Ordered by popularity/followers (most relevant first)
- Limited to 50 results max
- 10-second timeout via context deadline

**Performance concern:** Searching 256M tracks with `LIKE '%query%'` requires a full table scan and is slow. Full-text search (FTS5) would be faster but is not implemented.

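
The result cap and the deadline described above can be applied before the query runs. This is a minimal sketch under stated assumptions: `SearchParams` and `SearchContext` are hypothetical helpers, and treating an out-of-range limit as "use the maximum" is a choice made for this example. Note that `%` and `_` typed by the user still act as wildcards in the pattern.

```go
package main

import (
	"context"
	"time"
)

const (
	maxSearchResults = 50               // result cap stated in the search characteristics
	searchTimeout    = 10 * time.Second // deadline stated in the search characteristics
)

// SearchParams returns the LIKE pattern and the clamped LIMIT value to bind
// to the two placeholders of a search query.
// Assumption: limits below 1 or above the cap fall back to the cap.
func SearchParams(query string, limit int) (pattern string, clamped int) {
	if limit < 1 || limit > maxSearchResults {
		limit = maxSearchResults
	}
	return "%" + query + "%", limit
}

// SearchContext derives a context that cancels the query once the
// timeout elapses. The caller must invoke the returned cancel function.
func SearchContext(parent context.Context) (context.Context, context.CancelFunc) {
	return context.WithTimeout(parent, searchTimeout)
}
```

The derived context would be passed to `QueryContext` on the `*sql.DB` handle, together with the pattern and limit, so that a slow scan is abandoned after ten seconds.
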
### Album Tracks Lookup

```sql
-- Fetch all tracks for an album
SELECT t.id, t.name, t.isrc, t.duration_ms, t.explicit,
       t.track_number, t.disc_number, t.popularity, t.preview_url
FROM tracks t
WHERE t.album_rowid = (
    SELECT rowid FROM albums WHERE id = ?
)
ORDER BY t.disc_number, t.track_number
```

**Ordering:** Disc number first, then track number (preserves album order)

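
The same ordering can be reproduced in application code, for example when tracks for several albums arrive from one batch query and are regrouped in memory. The sketch below is illustrative only: `TrackPosition` and `SortAlbumOrder` are hypothetical names.

```go
package main

import "sort"

// TrackPosition carries the two ordering columns from the tracks table.
// The struct itself is hypothetical.
type TrackPosition struct {
	ID          string
	DiscNumber  int
	TrackNumber int
}

// SortAlbumOrder sorts tracks in place by disc number, then track number,
// mirroring ORDER BY t.disc_number, t.track_number.
func SortAlbumOrder(tracks []TrackPosition) {
	sort.SliceStable(tracks, func(i, j int) bool {
		if tracks[i].DiscNumber != tracks[j].DiscNumber {
			return tracks[i].DiscNumber < tracks[j].DiscNumber
		}
		return tracks[i].TrackNumber < tracks[j].TrackNumber
	})
}
```
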
## Data Enrichment Strategy

### Enrichment Pipeline

```
1. Fetch base entity (track/album/artist)
   ↓
2. Collect related entity IDs
   ↓
3. Batch fetch related entities
   ↓
4. Assemble nested structures
   ↓
5. Return enriched object
```

### Batch Optimization Functions

**Implementation in db.go (907 lines):**

```go
// Batch fetch album images for multiple albums.
// Needs "fmt" and "strings"; Database and Image are defined elsewhere in db.go.
func (d *Database) batchGetAlbumImages(albumIDs []string) map[string][]Image {
	result := make(map[string][]Image)
	if len(albumIDs) == 0 {
		return result // strings.Repeat panics on a negative count
	}

	// Build IN clause
	placeholders := strings.Repeat("?,", len(albumIDs)-1) + "?"
	query := fmt.Sprintf(`
		SELECT album_id, url, width, height
		FROM album_images
		WHERE album_id IN (%s)
		ORDER BY album_id, width DESC
	`, placeholders)

	// Query takes ...any, so the []string must be converted first
	args := make([]any, len(albumIDs))
	for i, id := range albumIDs {
		args[i] = id
	}

	// Execute query
	rows, err := d.mainDB.Query(query, args...)
	if err != nil {
		return result // no error return in this signature; callers get an empty map
	}
	defer rows.Close()

	// Group by album_id
	for rows.Next() {
		var albumID string
		var img Image
		if err := rows.Scan(&albumID, &img.URL, &img.Width, &img.Height); err != nil {
			continue // skip rows that fail to scan
		}
		result[albumID] = append(result[albumID], img)
	}

	return result
}
```

**Similar functions:**
- `batchGetAlbumArtists(albumIDs []string) map[string][]Artist`
- `batchGetTrackArtists(trackIDs []string) map[string][]Artist`
- `batchGetArtistGenres(artistIDs []string) map[string][]string`
- `batchGetArtistImages(artistIDs []string) map[string][]Image`
- `batchEnrichTrackFiles(trackIDs []string) map[string]*TrackFile`

**Pattern:**
1. Build IN clause with placeholders
2. Execute single query for all IDs
3. Group results by parent ID
4. Return map for O(1) lookup

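
Steps 1, 3, and 4 of the pattern do not depend on the database and can be factored into small reusable pieces. The following is a sketch under stated assumptions, not code from db.go: `InClause`, `Pair`, and `GroupByParent` are hypothetical names, and the generic helper assumes Go 1.18 or later.

```go
package main

import "strings"

// InClause returns a "?,?,?" placeholder list for the given IDs together
// with the IDs converted to []any, the argument type that database/sql
// query methods accept. An empty input yields an empty string and nil.
func InClause(ids []string) (string, []any) {
	if len(ids) == 0 {
		return "", nil
	}
	args := make([]any, len(ids))
	for i, id := range ids {
		args[i] = id
	}
	return strings.Repeat("?,", len(ids)-1) + "?", args
}

// Pair is one (parent ID, value) row as returned by a batch query.
type Pair[V any] struct {
	ParentID string
	Value    V
}

// GroupByParent groups rows by parent ID, preserving row order within
// each group, so later lookups by parent are a single map access.
func GroupByParent[V any](rows []Pair[V]) map[string][]V {
	out := make(map[string][]V)
	for _, r := range rows {
		out[r.ParentID] = append(out[r.ParentID], r.Value)
	}
	return out
}
```

Because row order is preserved within each group, an `ORDER BY parent_id, width DESC` in the query carries through to the slices in the map.
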
### Why Batch Matters

**Without batching (400 tracks):**
- 400 track + album queries (one JOIN per track)
- 400 album image queries
- 400 album artist queries
- 400 track artist queries
- ~800 artist genre queries (2 artists per track avg)
- ~800 artist image queries
- 400 track file queries
- **Total: ~3,600 queries**

**With batching (400 tracks):**
- 1 batch track query
- 1 batch album image query
- 1 batch album artist query
- 1 batch track artist query
- 1 batch artist genre query
- 1 batch artist image query
- 1 batch track file query
- **Total: 7 queries**

**Performance gain: ~514x fewer queries** (3,600 / 7)

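
The totals above follow from simple arithmetic, counting the track and album fetch as one JOIN query as in Step 1 of the individual lookup. The sketch below makes the calculation explicit; the function and constant names are hypothetical.

```go
package main

const (
	// Queries issued once per track in the individual path:
	// track+album JOIN, album images, album artists, track artists, track files.
	perTrackQueries = 5
	// Queries issued once per artist: genres and images.
	perArtistQueries = 2
	// Queries issued in the batch path, regardless of track count.
	batchedQueries = 7
)

// UnbatchedQueries estimates the number of queries needed to enrich the
// given number of tracks one at a time.
func UnbatchedQueries(tracks, artistsPerTrack int) int {
	return tracks*perTrackQueries + tracks*artistsPerTrack*perArtistQueries
}

// Reduction returns how many times fewer queries the batch path issues,
// rounded down.
func Reduction(tracks, artistsPerTrack int) int {
	return UnbatchedQueries(tracks, artistsPerTrack) / batchedQueries
}
```

With 400 tracks and two artists per track this gives 3,600 queries and a 514x reduction; with one artist per track it gives 2,800 queries and a 400x reduction.
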
## Data Provenance

### Source

**Disclaimer from repository:**
> "This project is not affiliated with Spotify."

**Implications:**
- Data source unclear (likely scraped or obtained from third party)
- Legal status uncertain
- No official Spotify endorsement

### Data Freshness

**Static snapshot:**
- No update mechanism
- Data frozen at time of database creation
- No real-time sync with Spotify

**Staleness concerns:**
- New releases not included
- Popularity scores outdated
- Artist follower counts stale
- Deleted tracks still present

**Mitigation:**
- Treat as historical snapshot
- Complement with real-time APIs for fresh data
- Periodically obtain updated database (if available)

### Data Quality

**Strengths:**
- 256M tracks (massive coverage)
- Rich metadata (genres, images, roles)
- ISRC codes for cross-referencing
- Popularity/follower metrics

**Weaknesses:**
- No data validation visible
- Potential duplicates (not deduplicated)
- Missing ISRCs for some tracks
- Incomplete artist roles

## Storage Requirements

### Disk Space

| Component | Size | Compressible |
|-----------|------|--------------|
| main_database.sqlite3 | ~117GB | Minimal (already compact) |
| track_files.sqlite3 | ~99GB | Minimal (JSON fields) |
| **Total** | **~216GB** | - |

**Recommendations:**
- SSD strongly recommended (HDD too slow for 256M rows)
- NVMe for best performance
- RAID not necessary (read-only, can rebuild from backup)

### Memory Usage

**SQLite memory:**
- Page cache: 64MB per connection
- 8 connections: 512MB cache total
- Memory-mapped I/O: 1GB per database (2GB total)
- **Total: ~2.5GB once caches and mappings are fully populated**

**Application memory:**
- Go runtime: ~50MB
- Rate limiter map: Grows unbounded (leak)
- Request buffers: ~10MB per concurrent request
- **Total: ~100MB + leak**

**Recommended RAM:** 4GB+ (2.5GB for SQLite + 1.5GB for OS/app)

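
The SQLite figure can be recomputed when the cache or mmap settings change. The sketch below uses the rounded numbers from this section; `MemoryBudget` is a hypothetical type, and the result is best read as an upper bound, since memory-mapped pages are file-backed and the OS can reclaim them under pressure.

```go
package main

// MemoryBudget captures the settings that drive SQLite memory use.
// Hypothetical type; values are in megabytes, as rounded in the text.
type MemoryBudget struct {
	CacheMBPerConn int // page cache per connection (_cache_size)
	Connections    int // MaxOpenConns
	MmapMBPerDB    int // memory-mapped region per database (_mmap_size)
	Databases      int // number of database files
}

// SQLiteMB returns page cache plus mmap totals in megabytes.
func (b MemoryBudget) SQLiteMB() int {
	return b.CacheMBPerConn*b.Connections + b.MmapMBPerDB*b.Databases
}
```

For the documented settings, `MemoryBudget{64, 8, 1024, 2}.SQLiteMB()` yields 2,560 MB (~2.5GB). Doubling the per-connection cache to 128MB raises it to 3,072 MB.
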
### I/O Characteristics

**Read patterns:**
- Random reads (track lookups by ID/ISRC)
- Sequential scans (search queries)
- Batch reads (IN clause queries)

**Write patterns:**
- None (read-only)

**Cache effectiveness:**
- Hot data (popular tracks): High hit rate
- Cold data (obscure tracks): Low hit rate
- Search queries: Low hit rate (full scans)

## Database Maintenance

### No Maintenance Required

**Read-only benefits:**
- No VACUUM needed (no fragmentation from deletes)
- No ANALYZE needed (statistics static)
- No REINDEX needed (indexes don't degrade)
- No WAL checkpoint (journal disabled)

### Backup Strategy

**Simple backup:**
```bash
# Copy files (database must be idle)
cp main_database.sqlite3 backup/
cp track_files.sqlite3 backup/
```

**Online backup (while running):**
```bash
# SQLite backup API via the sqlite3 CLI's built-in .backup command
sqlite3 main_database.sqlite3 ".backup backup/main_database.sqlite3"
```

**Restore:**
```bash
# Simply replace files
cp backup/main_database.sqlite3 .
cp backup/track_files.sqlite3 .
```

### Integrity Checks

**Verify database integrity:**
```bash
sqlite3 main_database.sqlite3 "PRAGMA integrity_check;"
sqlite3 track_files.sqlite3 "PRAGMA integrity_check;"
```

**Expected output:** `ok`

**Run periodically:** Monthly or after hardware issues

## Performance Tuning

### Query Optimization

**Indexes already present:**
- Primary keys on all ID columns
- Foreign key indexes (album_rowid, artist_id, etc.)
- Name index on `artists.name` (the schema section lists no index on `tracks.name`; a B-tree index would not help `LIKE '%query%'` in either case)

**Missing indexes (potential improvements):**
- Full-text search index (FTS5) on track/artist names
- Composite index on (popularity, name) for sorted searches

### Connection Pool Tuning

**Current settings:**
```go
MaxOpenConns: 8
MaxIdleConns: 8
ConnMaxLifetime: 0
```

**Tuning considerations:**
- Increase MaxOpenConns for higher concurrency (16-32)
- Monitor CPU usage (SQLite is CPU-bound for searches)
- Little benefit beyond the CPU core count for CPU-bound queries

### Cache Tuning

**Current cache:** 64MB per connection (512MB total)

**Increase cache:**
```
_cache_size=-128000 // 128MB per connection
```

**Tradeoff:** More memory usage vs fewer disk reads

**Recommendation:** Monitor cache hit rate, increase if low

### Memory-Mapped I/O Tuning

**Current mmap:** 1GB per database

**Increase mmap:**
```
_mmap_size=2147483648 // 2GB
```

**Tradeoff:** More virtual memory vs faster reads

**Recommendation:** Set to database size if RAM allows (not feasible for the 117GB main database)

## Data Model Comparison

### vs Spotify Web API

| Feature | Music Metadata API | Spotify Web API |
|---------|-------------------|-----------------|
| Track ID format | Spotify-compatible | Spotify IDs |
| ISRC support | Yes | Yes |
| Popularity | Static snapshot | Real-time |
| Followers | Static snapshot | Real-time |
| Images | External URLs | External URLs |
| Genres | Artist-level | Artist-level |
| Lyrics | Flag only | Not available |
| Artist roles | Detailed | Limited |
| Languages | Supported | Not available |

### vs MusicBrainz

| Feature | Music Metadata API | MusicBrainz |
|---------|-------------------|-------------|
| Identifier | Spotify IDs, ISRC | MBIDs |
| Dataset size | 256M tracks | ~40M recordings |
| Popularity | Yes | No |
| Followers | Yes | No |
| Images | Yes (external) | Yes (Cover Art Archive) |
| Genres | Yes | Yes (tags) |
| Relationships | Limited | Extensive |
| Credits | Artist roles | Detailed credits |
| Updates | Static | Community-driven |

## Integration Considerations

### Joining with Other Databases

The queries below are illustrative: they assume the tables involved have been loaded into, or attached to, a single database engine.

**ISRC as common key:**
```sql
-- Join with local library
SELECT l.file_path, m.name, m.popularity
FROM local_library l
JOIN music_metadata_api.tracks m ON l.isrc = m.isrc
```

**ISRC as bridge to MusicBrainz:**
```sql
-- Join with MusicBrainz (recording.gid holds the MBID)
SELECT mb.gid AS mbid, mm.name, mm.popularity
FROM musicbrainz.recording mb
JOIN musicbrainz.isrc i ON mb.id = i.recording
JOIN music_metadata_api.tracks mm ON i.isrc = mm.isrc
```

### Data Export

**Export to JSON:**
```bash
sqlite3 main_database.sqlite3 <<EOF
.mode json
.output tracks.json
SELECT * FROM tracks LIMIT 1000;
EOF
```

**Export to CSV:**
```bash
sqlite3 main_database.sqlite3 <<EOF
.mode csv
.output tracks.csv
SELECT id, name, isrc, popularity FROM tracks;
EOF
```

### Data Import

**Import from CSV:**
```bash
sqlite3 new_database.sqlite3 <<EOF
.mode csv
.import tracks.csv tracks
EOF
```

**Bulk insert from application:**
```go
tx, _ := db.Begin()
stmt, _ := tx.Prepare("INSERT INTO tracks VALUES (?, ?, ?, ...)")
for _, track := range tracks {
	stmt.Exec(track.ID, track.Name, track.ISRC, ...)
}
tx.Commit()
```

## Limitations

### No Write Operations

**Implications:**
- Can't add new tracks
- Can't update popularity scores
- Can't delete duplicates
- Can't fix data errors

**Workarounds:**
- Create separate writable database for local additions
- Use views to merge read-only + writable data
- Periodically obtain updated database snapshot

### No Full-Text Search

**Current search:** `LIKE '%query%'` (slow)

**FTS5 alternative:**
```sql
-- Create FTS5 virtual table (requires writable database)
CREATE VIRTUAL TABLE tracks_fts USING fts5(name, content=tracks);
-- Copy rowid along with name so index entries map back to the right tracks rows
INSERT INTO tracks_fts(rowid, name) SELECT rowid, name FROM tracks;

-- Fast search
SELECT * FROM tracks_fts WHERE name MATCH 'bohemian';
```

**Limitation:** Can't create FTS5 on read-only database

**Workaround:** Create separate FTS5 database, sync periodically

### No Relationships Beyond Basics

**Missing relationships:**
- Track-to-track (similar tracks, remixes)
- Album-to-album (compilations, deluxe editions)
- Artist-to-artist (collaborations, bands)

**Workaround:** Build relationship graph in separate database