Files
metadata-agregator/docs/research/music-metadata-api/analysis/DATA.md
T
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

912 lines
25 KiB
Markdown

# Music Metadata API - Data Layer
## Database Architecture
Music Metadata API uses a dual-database architecture with two separate SQLite files:
```
┌─────────────────────────────────────────────────────────────┐
│ Application Layer │
└─────────────────────────────────────────────────────────────┘
┌───────────┴───────────┐
▼ ▼
┌──────────────────────────┐ ┌──────────────────────────┐
│ main_database.sqlite3 │ │ track_files.sqlite3 │
│ (~117GB) │ │ (~99GB) │
│ │ │ │
│ - tracks │ │ - track_files │
│ - albums │ │ (extended metadata) │
│ - artists │ │ │
│ - track_artists │ │ │
│ - artist_albums │ │ │
│ - album_images │ │ │
│ - artist_images │ │ │
│ - artist_genres │ │ │
└──────────────────────────┘ └──────────────────────────┘
```
**Total storage:** ~216GB
**Total tracks:** 256 million
**Connection mode:** Read-only
**Driver:** modernc.org/sqlite v1.34.4 (pure Go, no CGO)
## Connection Configuration
### Connection Strings
**Main database:**
```
file:/path/to/main_database.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
```
**Track files database:**
```
file:/path/to/track_files.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
```
### PRAGMA Settings
| PRAGMA | Value | Purpose | Impact |
|--------|-------|---------|--------|
| `mode=ro` | Read-only | Prevents writes | No write locks, safe concurrent reads |
| `_journal_mode=off` | Disabled | No WAL/rollback journal | Faster reads, safe for read-only |
| `_cache_size=-64000` | 64MB | Page cache size | Reduces disk I/O for hot data |
| `_mmap_size=1073741824` | 1GB | Memory-mapped I/O | Faster reads via mmap |
| `_query_only=true` | Enabled | Additional read-only enforcement | Extra safety layer |
**Cache size calculation:**
- Negative value = kilobytes
- `-64000` = 64,000 KB = 64 MB
- Default SQLite cache is ~2MB (32x increase)
**Memory-mapped I/O:**
- Maps 1GB of database file into process memory
- OS handles paging (faster than read() syscalls)
- Effective for frequently accessed data
### Connection Pool
```go
db.SetMaxOpenConns(8) // Conservative limit (8 concurrent queries)
db.SetMaxIdleConns(8) // Keep all connections warm
db.SetConnMaxLifetime(0) // No expiration (read-only safe)
```
**Rationale:**
- Read-only workload (no write contention)
- SQLite handles concurrent reads well
- 8 connections balance throughput vs resource usage
- No connection recycling needed (no state changes)
## Main Database Schema
### tracks Table
**Purpose:** Core track metadata
| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `rowid` | INTEGER | SQLite internal row ID | No |
| `id` | TEXT | Internal track ID | No |
| `name` | TEXT | Track title | No |
| `isrc` | TEXT | ISRC code | Yes |
| `duration_ms` | INTEGER | Duration in milliseconds | No |
| `explicit` | INTEGER | Explicit content flag (0/1) | No |
| `track_number` | INTEGER | Track number on album | No |
| `disc_number` | INTEGER | Disc number | No |
| `popularity` | INTEGER | Popularity score (0-100) | No |
| `preview_url` | TEXT | 30-second preview URL | Yes |
| `album_rowid` | INTEGER | Foreign key to albums.rowid | No |
**Indexes:**
- Primary key on `id`
- Index on `isrc` (for ISRC lookups)
- Index on `album_rowid` (for album track listings)
**Sample row:**
```sql
id: 4cOdK2wGLETKBW3PvgPWqT
name: Bohemian Rhapsody
isrc: GBUM71029604
duration_ms: 354320
explicit: 0
track_number: 11
disc_number: 1
popularity: 89
preview_url: https://p.scdn.co/mp3-preview/...
album_rowid: 12345
```
**Estimated rows:** 256 million
### albums Table
**Purpose:** Album metadata
| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `rowid` | INTEGER | SQLite internal row ID | No |
| `id` | TEXT | Internal album ID | No |
| `name` | TEXT | Album title | No |
| `album_type` | TEXT | "album", "single", "compilation" | No |
| `label` | TEXT | Record label | Yes |
| `release_date` | TEXT | ISO 8601 date (YYYY-MM-DD) | No |
| `release_date_precision` | TEXT | "year", "month", "day" | No |
| `external_id_upc` | TEXT | UPC barcode | Yes |
| `total_tracks` | INTEGER | Total tracks on album | No |
| `copyright_c` | TEXT | Copyright notice | Yes |
| `copyright_p` | TEXT | Phonographic copyright | Yes |
**Indexes:**
- Primary key on `id`
- Index on `rowid` (for track joins)
**Sample row:**
```sql
id: 2ODvWsOgouMbaA5xf0RkJe
name: A Night at the Opera
album_type: album
label: Hollywood Records
release_date: 1975-11-21
release_date_precision: day
external_id_upc: 050087246679
total_tracks: 12
copyright_c: 1975 Queen Productions Ltd
copyright_p: 1975 Queen Productions Ltd
```
**Estimated rows:** Tens of millions (fewer than tracks)
### artists Table
**Purpose:** Artist metadata
| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `rowid` | INTEGER | SQLite internal row ID | No |
| `id` | TEXT | Internal artist ID | No |
| `name` | TEXT | Artist name | No |
| `followers_total` | INTEGER | Total followers | Yes |
| `popularity` | INTEGER | Popularity score (0-100) | Yes |
**Indexes:**
- Primary key on `id`
- Index on `name` (for search)
**Sample row:**
```sql
id: 0TnOYISbd1XYRBk9myaseg
name: Queen
followers_total: 45000000
popularity: 92
```
**Estimated rows:** Millions (fewer than albums)
### track_artists Table
**Purpose:** Many-to-many relationship between tracks and artists
| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `track_id` | TEXT | Foreign key to tracks.id | No |
| `artist_id` | TEXT | Foreign key to artists.id | No |
**Indexes:**
- Composite index on `(track_id, artist_id)`
- Index on `artist_id` (for artist track listings)
**Sample rows:**
```sql
track_id: 4cOdK2wGLETKBW3PvgPWqT, artist_id: 0TnOYISbd1XYRBk9myaseg
track_id: 4cOdK2wGLETKBW3PvgPWqT, artist_id: 1A2B3C4D5E6F7G8H9I0J
```
**Estimated rows:** Hundreds of millions (tracks can have multiple artists)
### artist_albums Table
**Purpose:** Many-to-many relationship between artists and albums with ordering
| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `artist_id` | TEXT | Foreign key to artists.id | No |
| `album_id` | TEXT | Foreign key to albums.id | No |
| `index_in_album` | INTEGER | Artist order on album | No |
**Indexes:**
- Composite index on `(album_id, index_in_album)`
- Index on `artist_id` (for artist discography)
**Sample rows:**
```sql
artist_id: 0TnOYISbd1XYRBk9myaseg, album_id: 2ODvWsOgouMbaA5xf0RkJe, index_in_album: 0
artist_id: 1A2B3C4D5E6F7G8H9I0J, album_id: 2ODvWsOgouMbaA5xf0RkJe, index_in_album: 1
```
**Purpose of index_in_album:** Preserves artist order for multi-artist albums (e.g., "Artist A & Artist B")
### album_images Table
**Purpose:** Album artwork URLs
| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `album_id` | TEXT | Foreign key to albums.id | No |
| `url` | TEXT | Image URL | No |
| `width` | INTEGER | Width in pixels | No |
| `height` | INTEGER | Height in pixels | No |
**Indexes:**
- Index on `album_id`
**Sample rows:**
```sql
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d0000b273..., width: 640, height: 640
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d00001e02..., width: 300, height: 300
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d00004851..., width: 64, height: 64
```
**Typical sizes:** 640x640, 300x300, 64x64
**Image hosting:** External CDN (i.scdn.co), not hosted by API
### artist_images Table
**Purpose:** Artist images/photos
| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `artist_id` | TEXT | Foreign key to artists.id | No |
| `url` | TEXT | Image URL | No |
| `width` | INTEGER | Width in pixels | No |
| `height` | INTEGER | Height in pixels | No |
**Indexes:**
- Index on `artist_id`
**Sample rows:**
```sql
artist_id: 0TnOYISbd1XYRBk9myaseg, url: https://i.scdn.co/image/af2b8e57f6d7b5d..., width: 640, height: 640
artist_id: 0TnOYISbd1XYRBk9myaseg, url: https://i.scdn.co/image/c06971e9ff81696..., width: 320, height: 320
```
### artist_genres Table
**Purpose:** Artist genre tags
| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `artist_id` | TEXT | Foreign key to artists.id | No |
| `genre` | TEXT | Genre name | No |
**Indexes:**
- Index on `artist_id`
**Sample rows:**
```sql
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: rock
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: classic rock
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: glam rock
```
**Genre characteristics:**
- Multiple genres per artist
- Lowercase, hyphenated (e.g., "indie-rock")
- Spotify-style genre taxonomy
## Track Files Database Schema
### track_files Table
**Purpose:** Extended track metadata not in main database
| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `track_id` | TEXT | Foreign key to tracks.id | No |
| `has_lyrics` | INTEGER | Lyrics availability flag (0/1) | No |
| `original_title` | TEXT | Original title (if different) | Yes |
| `version_title` | TEXT | Version descriptor (e.g., "Radio Edit") | Yes |
| `language_of_performance` | TEXT | JSON array of language codes | Yes |
| `artist_roles` | TEXT | JSON object mapping artist IDs to roles | Yes |
**Indexes:**
- Primary key on `track_id`
**Sample row:**
```sql
track_id: 4cOdK2wGLETKBW3PvgPWqT
has_lyrics: 1
original_title: Bohemian Rhapsody
version_title: NULL
language_of_performance: ["en"]
artist_roles: {"0TnOYISbd1XYRBk9myaseg": ["performer", "composer"]}
```
**JSON field parsing:**
**language_of_performance:**
```json
["en", "es"] // ISO 639-1 language codes
```
**artist_roles:**
```json
{
"artist_id_1": ["performer", "composer"],
"artist_id_2": ["producer"],
"artist_id_3": ["lyricist"]
}
```
**Common roles:**
- `performer` - Main performer
- `composer` - Music composer
- `lyricist` - Lyrics writer
- `producer` - Producer
- `engineer` - Recording engineer
- `mixer` - Mix engineer
**Estimated rows:** 256 million (one per track)
## Query Patterns
### Individual Track Lookup
```sql
-- Step 1: Fetch track + album (single JOIN)
SELECT
t.id, t.name, t.isrc, t.duration_ms, t.explicit,
t.track_number, t.disc_number, t.popularity, t.preview_url,
a.id AS album_id, a.name AS album_name, a.album_type,
a.label, a.release_date, a.release_date_precision,
a.external_id_upc, a.total_tracks, a.copyright_c, a.copyright_p
FROM tracks t
JOIN albums a ON t.album_rowid = a.rowid
WHERE t.id = ?
-- Step 2: Fetch album images
SELECT url, width, height
FROM album_images
WHERE album_id = ?
ORDER BY width DESC
-- Step 3: Fetch album artists
SELECT a.id, a.name, a.followers_total, a.popularity
FROM artists a
JOIN artist_albums aa ON a.id = aa.artist_id
WHERE aa.album_id = ?
ORDER BY aa.index_in_album
-- Step 4: Fetch track artists
SELECT a.id, a.name, a.followers_total, a.popularity
FROM artists a
JOIN track_artists ta ON a.id = ta.artist_id
WHERE ta.track_id = ?
-- Step 5: Fetch artist genres (for each artist)
SELECT genre
FROM artist_genres
WHERE artist_id = ?
-- Step 6: Fetch artist images (for each artist)
SELECT url, width, height
FROM artist_images
WHERE artist_id = ?
ORDER BY width DESC
-- Step 7: Fetch track files (from track_files.sqlite3)
SELECT has_lyrics, original_title, version_title,
language_of_performance, artist_roles
FROM track_files
WHERE track_id = ?
```
**Total queries for single track:** 7+ (depending on number of artists)
### Batch ISRC Lookup
```sql
-- Step 1: Fetch all tracks by ISRC (single query with IN clause)
SELECT
t.id, t.name, t.isrc, t.duration_ms, t.explicit,
t.track_number, t.disc_number, t.popularity, t.preview_url,
a.id AS album_id, a.name AS album_name, a.album_type,
a.label, a.release_date, a.release_date_precision,
a.external_id_upc, a.total_tracks, a.copyright_c, a.copyright_p
FROM tracks t
JOIN albums a ON t.album_rowid = a.rowid
WHERE t.isrc IN (?, ?, ?, ...) -- Up to 400 placeholders
-- Step 2: Batch fetch album images (all albums at once)
SELECT album_id, url, width, height
FROM album_images
WHERE album_id IN (?, ?, ?, ...)
ORDER BY album_id, width DESC
-- Step 3: Batch fetch album artists
SELECT aa.album_id, a.id, a.name, a.followers_total, a.popularity, aa.index_in_album
FROM artists a
JOIN artist_albums aa ON a.id = aa.artist_id
WHERE aa.album_id IN (?, ?, ?, ...)
ORDER BY aa.album_id, aa.index_in_album
-- Step 4: Batch fetch track artists
SELECT ta.track_id, a.id, a.name, a.followers_total, a.popularity
FROM artists a
JOIN track_artists ta ON a.id = ta.artist_id
WHERE ta.track_id IN (?, ?, ?, ...)
-- Step 5: Batch fetch artist genres
SELECT artist_id, genre
FROM artist_genres
WHERE artist_id IN (?, ?, ?, ...)
-- Step 6: Batch fetch artist images
SELECT artist_id, url, width, height
FROM artist_images
WHERE artist_id IN (?, ?, ?, ...)
ORDER BY artist_id, width DESC
-- Step 7: Batch fetch track files
SELECT track_id, has_lyrics, original_title, version_title,
language_of_performance, artist_roles
FROM track_files
WHERE track_id IN (?, ?, ?, ...)
```
**Total queries for 400 tracks:** 7 (vs 2,800+ for individual lookups)
**Performance gain:** 400x fewer queries
### Search Queries
**Track search:**
```sql
SELECT id, name, isrc, duration_ms, popularity, album_rowid
FROM tracks
WHERE name LIKE ? COLLATE NOCASE -- ? = '%query%'
ORDER BY popularity DESC
LIMIT ?
```
**Artist search:**
```sql
SELECT id, name, followers_total, popularity
FROM artists
WHERE name LIKE ? COLLATE NOCASE -- ? = '%query%'
ORDER BY followers_total DESC
LIMIT ?
```
**Search characteristics:**
- `LIKE %query%` can't use indexes (full table scan)
- `COLLATE NOCASE` for case-insensitive matching
- Ordered by popularity/followers (most relevant first)
- Limited to 50 results max
- 10-second timeout via context deadline
**Performance concern:** Searching 256M tracks with `LIKE %query%` is slow. Full-text search (FTS5) would be faster but not implemented.
### Album Tracks Lookup
```sql
-- Fetch all tracks for an album
SELECT t.id, t.name, t.isrc, t.duration_ms, t.explicit,
t.track_number, t.disc_number, t.popularity, t.preview_url
FROM tracks t
WHERE t.album_rowid = (
SELECT rowid FROM albums WHERE id = ?
)
ORDER BY t.disc_number, t.track_number
```
**Ordering:** Disc number first, then track number (preserves album order)
## Data Enrichment Strategy
### Enrichment Pipeline
```
1. Fetch base entity (track/album/artist)
2. Collect related entity IDs
3. Batch fetch related entities
4. Assemble nested structures
5. Return enriched object
```
### Batch Optimization Functions
**Implementation in db.go (907 lines):**
```go
// Batch fetch album images for multiple albums
func (d *Database) batchGetAlbumImages(albumIDs []string) map[string][]Image {
// Build IN clause
placeholders := strings.Repeat("?,", len(albumIDs)-1) + "?"
query := fmt.Sprintf(`
SELECT album_id, url, width, height
FROM album_images
WHERE album_id IN (%s)
ORDER BY album_id, width DESC
`, placeholders)
// Execute query
rows, _ := d.mainDB.Query(query, albumIDs...)
// Group by album_id
result := make(map[string][]Image)
for rows.Next() {
var albumID string
var img Image
rows.Scan(&albumID, &img.URL, &img.Width, &img.Height)
result[albumID] = append(result[albumID], img)
}
return result
}
```
**Similar functions:**
- `batchGetAlbumArtists(albumIDs []string) map[string][]Artist`
- `batchGetTrackArtists(trackIDs []string) map[string][]Artist`
- `batchGetArtistGenres(artistIDs []string) map[string][]string`
- `batchGetArtistImages(artistIDs []string) map[string][]Image`
- `batchEnrichTrackFiles(trackIDs []string) map[string]*TrackFile`
**Pattern:**
1. Build IN clause with placeholders
2. Execute single query for all IDs
3. Group results by parent ID
4. Return map for O(1) lookup
### Why Batch Matters
**Without batching (400 tracks):**
- 400 track queries
- 400 album queries
- 400 album image queries
- 400 album artist queries
- 400 track artist queries
- ~800 artist genre queries (2 artists per track avg)
- ~800 artist image queries
- 400 track file queries
- **Total: ~3,600 queries**
**With batching (400 tracks):**
- 1 batch track query
- 1 batch album image query
- 1 batch album artist query
- 1 batch track artist query
- 1 batch artist genre query
- 1 batch artist image query
- 1 batch track file query
- **Total: 7 queries**
**Performance gain: 514x fewer queries**
## Data Provenance
### Source
**Disclaimer from repository:**
> "This project is not affiliated with Spotify."
**Implications:**
- Data source unclear (likely scraped or obtained from third party)
- Legal status uncertain
- No official Spotify endorsement
### Data Freshness
**Static snapshot:**
- No update mechanism
- Data frozen at time of database creation
- No real-time sync with Spotify
**Staleness concerns:**
- New releases not included
- Popularity scores outdated
- Artist follower counts stale
- Deleted tracks still present
**Mitigation:**
- Treat as historical snapshot
- Complement with real-time APIs for fresh data
- Periodically obtain updated database (if available)
### Data Quality
**Strengths:**
- 256M tracks (massive coverage)
- Rich metadata (genres, images, roles)
- ISRC codes for cross-referencing
- Popularity/follower metrics
**Weaknesses:**
- No data validation visible
- Potential duplicates (not deduplicated)
- Missing ISRCs for some tracks
- Incomplete artist roles
## Storage Requirements
### Disk Space
| Component | Size | Compressible |
|-----------|------|--------------|
| main_database.sqlite3 | ~117GB | Minimal (already compact) |
| track_files.sqlite3 | ~99GB | Minimal (JSON fields) |
| **Total** | **~216GB** | - |
**Recommendations:**
- SSD strongly recommended (HDD too slow for 256M rows)
- NVMe for best performance
- RAID not necessary (read-only, can rebuild from backup)
### Memory Usage
**SQLite memory:**
- Page cache: 64MB per connection
- 8 connections: 512MB cache total
- Memory-mapped I/O: 1GB per database (2GB total)
- **Total: ~2.5GB minimum**
**Application memory:**
- Go runtime: ~50MB
- Rate limiter map: Grows unbounded (leak)
- Request buffers: ~10MB per concurrent request
- **Total: ~100MB + leak**
**Recommended RAM:** 4GB+ (2.5GB for SQLite + 1.5GB for OS/app)
### I/O Characteristics
**Read patterns:**
- Random reads (track lookups by ID/ISRC)
- Sequential scans (search queries)
- Batch reads (IN clause queries)
**Write patterns:**
- None (read-only)
**Cache effectiveness:**
- Hot data (popular tracks): High hit rate
- Cold data (obscure tracks): Low hit rate
- Search queries: Low hit rate (full scans)
## Database Maintenance
### No Maintenance Required
**Read-only benefits:**
- No VACUUM needed (no fragmentation from deletes)
- No ANALYZE needed (statistics static)
- No REINDEX needed (indexes don't degrade)
- No WAL checkpoint (journal disabled)
### Backup Strategy
**Simple backup:**
```bash
# Copy files (database must be idle)
cp main_database.sqlite3 backup/
cp track_files.sqlite3 backup/
```
**Online backup (while running):**
```bash
# SQLite backup API (requires custom tool)
sqlite3 main_database.sqlite3 ".backup backup/main_database.sqlite3"
```
**Restore:**
```bash
# Simply replace files
cp backup/main_database.sqlite3 .
cp backup/track_files.sqlite3 .
```
### Integrity Checks
**Verify database integrity:**
```bash
sqlite3 main_database.sqlite3 "PRAGMA integrity_check;"
sqlite3 track_files.sqlite3 "PRAGMA integrity_check;"
```
**Expected output:** `ok`
**Run periodically:** Monthly or after hardware issues
## Performance Tuning
### Query Optimization
**Indexes already present:**
- Primary keys on all ID columns
- Foreign key indexes (album_rowid, artist_id, etc.)
- Search indexes (tracks.name, artists.name)
**Missing indexes (potential improvements):**
- Full-text search index (FTS5) on track/artist names
- Composite index on (popularity, name) for sorted searches
### Connection Pool Tuning
**Current settings:**
```go
MaxOpenConns: 8
MaxIdleConns: 8
ConnMaxLifetime: 0
```
**Tuning considerations:**
- Increase MaxOpenConns for higher concurrency (16-32)
- Monitor CPU usage (SQLite is CPU-bound for searches)
- No benefit beyond CPU core count
### Cache Tuning
**Current cache:** 64MB per connection (512MB total)
**Increase cache:**
```
_cache_size=-128000 // 128MB per connection
```
**Tradeoff:** More memory usage vs fewer disk reads
**Recommendation:** Monitor cache hit rate, increase if low
### Memory-Mapped I/O Tuning
**Current mmap:** 1GB per database
**Increase mmap:**
```
_mmap_size=2147483648 // 2GB
```
**Tradeoff:** More virtual memory vs faster reads
**Recommendation:** Set to database size if RAM allows (117GB not feasible)
## Data Model Comparison
### vs Spotify Web API
| Feature | Music Metadata API | Spotify Web API |
|---------|-------------------|-----------------|
| Track ID format | Spotify-compatible | Spotify IDs |
| ISRC support | Yes | Yes |
| Popularity | Static snapshot | Real-time |
| Followers | Static snapshot | Real-time |
| Images | External URLs | External URLs |
| Genres | Artist-level | Artist-level |
| Lyrics | Flag only | Not available |
| Artist roles | Detailed | Limited |
| Languages | Supported | Not available |
### vs MusicBrainz
| Feature | Music Metadata API | MusicBrainz |
|---------|-------------------|-------------|
| Identifier | Spotify IDs, ISRC | MBIDs |
| Dataset size | 256M tracks | ~40M recordings |
| Popularity | Yes | No |
| Followers | Yes | No |
| Images | Yes (external) | Yes (Cover Art Archive) |
| Genres | Yes | Yes (tags) |
| Relationships | Limited | Extensive |
| Credits | Artist roles | Detailed credits |
| Updates | Static | Community-driven |
## Integration Considerations
### Joining with Other Databases
**ISRC as common key:**
```sql
-- Join with local library
SELECT l.file_path, m.name, m.popularity
FROM local_library l
JOIN music_metadata_api.tracks m ON l.isrc = m.isrc
```
**Spotify ID as common key:**
```sql
-- Join with MusicBrainz
SELECT mb.mbid, mm.name, mm.popularity
FROM musicbrainz.recording mb
JOIN musicbrainz.isrc i ON mb.id = i.recording
JOIN music_metadata_api.tracks mm ON i.isrc = mm.isrc
```
### Data Export
**Export to JSON:**
```bash
sqlite3 main_database.sqlite3 <<EOF
.mode json
.output tracks.json
SELECT * FROM tracks LIMIT 1000;
EOF
```
**Export to CSV:**
```bash
sqlite3 main_database.sqlite3 <<EOF
.mode csv
.output tracks.csv
SELECT id, name, isrc, popularity FROM tracks;
EOF
```
### Data Import
**Import from CSV:**
```bash
sqlite3 new_database.sqlite3 <<EOF
.mode csv
.import tracks.csv tracks
EOF
```
**Bulk insert from application:**
```go
tx, _ := db.Begin()
stmt, _ := tx.Prepare("INSERT INTO tracks VALUES (?, ?, ?, ...)")
for _, track := range tracks {
stmt.Exec(track.ID, track.Name, track.ISRC, ...)
}
tx.Commit()
```
## Limitations
### No Write Operations
**Implications:**
- Can't add new tracks
- Can't update popularity scores
- Can't delete duplicates
- Can't fix data errors
**Workarounds:**
- Create separate writable database for local additions
- Use views to merge read-only + writable data
- Periodically obtain updated database snapshot
### No Full-Text Search
**Current search:** `LIKE %query%` (slow)
**FTS5 alternative:**
```sql
-- Create FTS5 virtual table (requires writable database)
CREATE VIRTUAL TABLE tracks_fts USING fts5(name, content=tracks);
INSERT INTO tracks_fts SELECT name FROM tracks;
-- Fast search
SELECT * FROM tracks_fts WHERE name MATCH 'bohemian';
```
**Limitation:** Can't create FTS5 on read-only database
**Workaround:** Create separate FTS5 database, sync periodically
### No Relationships Beyond Basics
**Missing relationships:**
- Track-to-track (similar tracks, remixes)
- Album-to-album (compilations, deluxe editions)
- Artist-to-artist (collaborations, bands)
**Workaround:** Build relationship graph in separate database