
# Music Metadata API - Data Layer

## Database Architecture

Music Metadata API uses a dual-database architecture with two separate SQLite files:

```
┌─────────────────────────────────────────────────────────────┐
│                    Application Layer                         │
└─────────────────────────────────────────────────────────────┘
                            │
                ┌───────────┴───────────┐
                ▼                       ▼
┌──────────────────────────┐  ┌──────────────────────────┐
│  main_database.sqlite3   │  │  track_files.sqlite3     │
│       (~117GB)           │  │       (~99GB)            │
│                          │  │                          │
│  - tracks                │  │  - track_files           │
│  - albums                │  │    (extended metadata)   │
│  - artists               │  │                          │
│  - track_artists         │  │                          │
│  - artist_albums         │  │                          │
│  - album_images          │  │                          │
│  - artist_images         │  │                          │
│  - artist_genres         │  │                          │
└──────────────────────────┘  └──────────────────────────┘
```

- **Total storage:** ~216GB
- **Total tracks:** 256 million
- **Connection mode:** Read-only
- **Driver:** modernc.org/sqlite v1.34.4 (pure Go, no CGO)

## Connection Configuration

### Connection Strings

**Main database:**
```
file:/path/to/main_database.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
```

**Track files database:**
```
file:/path/to/track_files.sqlite3?mode=ro&_journal_mode=off&_cache_size=-64000&_mmap_size=1073741824&_query_only=true
```

### PRAGMA Settings

| Parameter | Value | Purpose | Impact |
|-----------|-------|---------|--------|
| `mode=ro` | Read-only | SQLite URI parameter (not a PRAGMA) that opens the file read-only | No write locks, safe concurrent reads |
| `_journal_mode=off` | Disabled | No rollback journal | Nothing to journal on a read-only connection |
| `_cache_size=-64000` | 64MB | Page cache size | Reduces disk I/O for hot data |
| `_mmap_size=1073741824` | 1GB | Memory-mapped I/O | Faster reads via mmap |
| `_query_only=true` | Enabled | Additional read-only enforcement | Extra safety layer |

**Cache size calculation:**
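- Negative value = size in KiB (units of 1,024 bytes) rather than a page count
- `-64000` = 64,000 KiB (about 65.5 MB, rounded to 64MB in this document)
- Default SQLite cache is ~2MB (`-2000`), so this is a 32x increase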

**Memory-mapped I/O:**
- Maps 1GB of database file into process memory
- OS handles paging (faster than read() syscalls)
- Effective for frequently accessed data

### Connection Pool

```go
db.SetMaxOpenConns(8)      // Conservative limit (8 concurrent queries)
db.SetMaxIdleConns(8)      // Keep all connections warm
db.SetConnMaxLifetime(0)   // No expiration (read-only safe)
```

**Rationale:**
- Read-only workload (no write contention)
- SQLite handles concurrent reads well
- 8 connections balance throughput vs resource usage
- No connection recycling needed (no state changes)

## Main Database Schema

### tracks Table

**Purpose:** Core track metadata

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `rowid` | INTEGER | SQLite internal row ID | No |
| `id` | TEXT | Internal track ID | No |
| `name` | TEXT | Track title | No |
| `isrc` | TEXT | ISRC code | Yes |
| `duration_ms` | INTEGER | Duration in milliseconds | No |
| `explicit` | INTEGER | Explicit content flag (0/1) | No |
| `track_number` | INTEGER | Track number on album | No |
| `disc_number` | INTEGER | Disc number | No |
| `popularity` | INTEGER | Popularity score (0-100) | No |
| `preview_url` | TEXT | 30-second preview URL | Yes |
| `album_rowid` | INTEGER | Foreign key to albums.rowid | No |

**Indexes:**
- Primary key on `id`
- Index on `isrc` (for ISRC lookups)
- Index on `album_rowid` (for album track listings)

**Sample row:**
```sql
id: 4cOdK2wGLETKBW3PvgPWqT
name: Bohemian Rhapsody
isrc: GBUM71029604
duration_ms: 354320
explicit: 0
track_number: 11
disc_number: 1
popularity: 89
preview_url: https://p.scdn.co/mp3-preview/...
album_rowid: 12345
```

**Estimated rows:** 256 million

### albums Table

**Purpose:** Album metadata

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `rowid` | INTEGER | SQLite internal row ID | No |
| `id` | TEXT | Internal album ID | No |
| `name` | TEXT | Album title | No |
| `album_type` | TEXT | "album", "single", "compilation" | No |
| `label` | TEXT | Record label | Yes |
| `release_date` | TEXT | ISO 8601 date (YYYY-MM-DD) | No |
| `release_date_precision` | TEXT | "year", "month", "day" | No |
| `external_id_upc` | TEXT | UPC barcode | Yes |
| `total_tracks` | INTEGER | Total tracks on album | No |
| `copyright_c` | TEXT | Copyright notice | Yes |
| `copyright_p` | TEXT | Phonographic copyright | Yes |

**Indexes:**
- Primary key on `id`
- No separate index needed on `rowid`: it is the table's built-in B-tree key, so track joins on it are direct lookups

**Sample row:**
```sql
id: 2ODvWsOgouMbaA5xf0RkJe
name: A Night at the Opera
album_type: album
label: Hollywood Records
release_date: 1975-11-21
release_date_precision: day
external_id_upc: 050087246679
total_tracks: 12
copyright_c: 1975 Queen Productions Ltd
copyright_p: 1975 Queen Productions Ltd
```

**Estimated rows:** Tens of millions (fewer than tracks)

### artists Table

**Purpose:** Artist metadata

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `rowid` | INTEGER | SQLite internal row ID | No |
| `id` | TEXT | Internal artist ID | No |
| `name` | TEXT | Artist name | No |
| `followers_total` | INTEGER | Total followers | Yes |
| `popularity` | INTEGER | Popularity score (0-100) | Yes |

**Indexes:**
- Primary key on `id`
- Index on `name` (for search)

**Sample row:**
```sql
id: 0TnOYISbd1XYRBk9myaseg
name: Queen
followers_total: 45000000
popularity: 92
```

**Estimated rows:** Millions (fewer than albums)

### track_artists Table

**Purpose:** Many-to-many relationship between tracks and artists

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `track_id` | TEXT | Foreign key to tracks.id | No |
| `artist_id` | TEXT | Foreign key to artists.id | No |

**Indexes:**
- Composite index on `(track_id, artist_id)`
- Index on `artist_id` (for artist track listings)

**Sample rows:**
```sql
track_id: 4cOdK2wGLETKBW3PvgPWqT, artist_id: 0TnOYISbd1XYRBk9myaseg
track_id: 4cOdK2wGLETKBW3PvgPWqT, artist_id: 1A2B3C4D5E6F7G8H9I0J
```

**Estimated rows:** Hundreds of millions (tracks can have multiple artists)

### artist_albums Table

**Purpose:** Many-to-many relationship between artists and albums with ordering

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `artist_id` | TEXT | Foreign key to artists.id | No |
| `album_id` | TEXT | Foreign key to albums.id | No |
| `index_in_album` | INTEGER | Artist order on album | No |

**Indexes:**
- Composite index on `(album_id, index_in_album)`
- Index on `artist_id` (for artist discography)

**Sample rows:**
```sql
artist_id: 0TnOYISbd1XYRBk9myaseg, album_id: 2ODvWsOgouMbaA5xf0RkJe, index_in_album: 0
artist_id: 1A2B3C4D5E6F7G8H9I0J, album_id: 2ODvWsOgouMbaA5xf0RkJe, index_in_album: 1
```

**Purpose of index_in_album:** Preserves artist order for multi-artist albums (e.g., "Artist A & Artist B")

### album_images Table

**Purpose:** Album artwork URLs

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `album_id` | TEXT | Foreign key to albums.id | No |
| `url` | TEXT | Image URL | No |
| `width` | INTEGER | Width in pixels | No |
| `height` | INTEGER | Height in pixels | No |

**Indexes:**
- Index on `album_id`

**Sample rows:**
```sql
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d0000b273..., width: 640, height: 640
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d00001e02..., width: 300, height: 300
album_id: 2ODvWsOgouMbaA5xf0RkJe, url: https://i.scdn.co/image/ab67616d00004851..., width: 64, height: 64
```

**Typical sizes:** 640x640, 300x300, 64x64

**Image hosting:** External CDN (i.scdn.co), not hosted by API
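
Because each album carries several image sizes, a consumer usually has to pick one. The sketch below shows one possible selection rule (smallest image that is at least a requested width, otherwise the widest available). The `Image` fields mirror the table columns; `pickImage` is a hypothetical helper, not part of the analyzed code.

```go
package main

import "fmt"

// Image mirrors the url/width/height columns of album_images.
type Image struct {
	URL           string
	Width, Height int
}

// pickImage returns the smallest image at least minWidth wide, falling back
// to the widest image when none is large enough. ok is false for no images.
func pickImage(images []Image, minWidth int) (best Image, ok bool) {
	for _, img := range images {
		switch {
		case !ok:
			best, ok = img, true
		case best.Width < minWidth && img.Width > best.Width:
			best = img // current pick is too small: prefer anything wider
		case img.Width >= minWidth && img.Width < best.Width:
			best = img // large enough and a tighter fit
		}
	}
	return best, ok
}

func main() {
	images := []Image{
		{URL: "https://example.invalid/640.jpg", Width: 640, Height: 640},
		{URL: "https://example.invalid/300.jpg", Width: 300, Height: 300},
		{URL: "https://example.invalid/64.jpg", Width: 64, Height: 64},
	}
	if img, ok := pickImage(images, 200); ok {
		fmt.Println(img.URL, img.Width)
	}
}
```

The rule does not depend on input order, so it works whether or not the rows were fetched with `ORDER BY width DESC`.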

### artist_images Table

**Purpose:** Artist images/photos

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `artist_id` | TEXT | Foreign key to artists.id | No |
| `url` | TEXT | Image URL | No |
| `width` | INTEGER | Width in pixels | No |
| `height` | INTEGER | Height in pixels | No |

**Indexes:**
- Index on `artist_id`

**Sample rows:**
```sql
artist_id: 0TnOYISbd1XYRBk9myaseg, url: https://i.scdn.co/image/af2b8e57f6d7b5d..., width: 640, height: 640
artist_id: 0TnOYISbd1XYRBk9myaseg, url: https://i.scdn.co/image/c06971e9ff81696..., width: 320, height: 320
```

### artist_genres Table

**Purpose:** Artist genre tags

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `artist_id` | TEXT | Foreign key to artists.id | No |
| `genre` | TEXT | Genre name | No |

**Indexes:**
- Index on `artist_id`

**Sample rows:**
```sql
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: rock
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: classic rock
artist_id: 0TnOYISbd1XYRBk9myaseg, genre: glam rock
```

**Genre characteristics:**
- Multiple genres per artist
- Lowercase; multi-word names are space-separated in the samples above (e.g., "classic rock")
- Spotify-style genre taxonomy

## Track Files Database Schema

### track_files Table

**Purpose:** Extended track metadata not in main database

| Column | Type | Description | Nullable |
|--------|------|-------------|----------|
| `track_id` | TEXT | Foreign key to tracks.id | No |
| `has_lyrics` | INTEGER | Lyrics availability flag (0/1) | No |
| `original_title` | TEXT | Original title (if different) | Yes |
| `version_title` | TEXT | Version descriptor (e.g., "Radio Edit") | Yes |
| `language_of_performance` | TEXT | JSON array of language codes | Yes |
| `artist_roles` | TEXT | JSON object mapping artist IDs to roles | Yes |

**Indexes:**
- Primary key on `track_id`

**Sample row:**
```sql
track_id: 4cOdK2wGLETKBW3PvgPWqT
has_lyrics: 1
original_title: Bohemian Rhapsody
version_title: NULL
language_of_performance: ["en"]
artist_roles: {"0TnOYISbd1XYRBk9myaseg": ["performer", "composer"]}
```

**JSON field parsing:**

**language_of_performance** (ISO 639-1 language codes):
```json
["en", "es"]
```

**artist_roles:**
```json
{
  "artist_id_1": ["performer", "composer"],
  "artist_id_2": ["producer"],
  "artist_id_3": ["lyricist"]
}
```

**Common roles:**
- `performer` - Main performer
- `composer` - Music composer
- `lyricist` - Lyrics writer
- `producer` - Producer
- `engineer` - Recording engineer
- `mixer` - Mix engineer
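
Both JSON columns are stored as TEXT, so they have to be decoded after the row is scanned. A minimal sketch of that decoding with `encoding/json` follows. `TrackFileExtras` and `parseTrackFileJSON` are hypothetical names, and the sketch assumes a NULL column has already been mapped to an empty string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TrackFileExtras holds the two decoded JSON columns (hypothetical type).
type TrackFileExtras struct {
	Languages   []string            // from language_of_performance
	ArtistRoles map[string][]string // from artist_roles: artist ID -> roles
}

// parseTrackFileJSON decodes the raw column text. Empty strings stand in for
// SQL NULL and leave the corresponding field nil.
func parseTrackFileJSON(languages, roles string) (TrackFileExtras, error) {
	var out TrackFileExtras
	if languages != "" {
		if err := json.Unmarshal([]byte(languages), &out.Languages); err != nil {
			return out, fmt.Errorf("language_of_performance: %w", err)
		}
	}
	if roles != "" {
		if err := json.Unmarshal([]byte(roles), &out.ArtistRoles); err != nil {
			return out, fmt.Errorf("artist_roles: %w", err)
		}
	}
	return out, nil
}

func main() {
	extras, err := parseTrackFileJSON(
		`["en"]`,
		`{"0TnOYISbd1XYRBk9myaseg": ["performer", "composer"]}`,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(extras.Languages, extras.ArtistRoles["0TnOYISbd1XYRBk9myaseg"])
}
```

Returning an error per column makes malformed rows visible, which matters here because the document notes that no data validation is evident.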

**Estimated rows:** 256 million (one per track)

## Query Patterns

### Individual Track Lookup

```sql
-- Step 1: Fetch track + album (single JOIN)
SELECT
    t.id, t.name, t.isrc, t.duration_ms, t.explicit,
    t.track_number, t.disc_number, t.popularity, t.preview_url,
    a.id AS album_id, a.name AS album_name, a.album_type,
    a.label, a.release_date, a.release_date_precision,
    a.external_id_upc, a.total_tracks, a.copyright_c, a.copyright_p
FROM tracks t
JOIN albums a ON t.album_rowid = a.rowid
WHERE t.id = ?

-- Step 2: Fetch album images
SELECT url, width, height
FROM album_images
WHERE album_id = ?
ORDER BY width DESC

-- Step 3: Fetch album artists
SELECT a.id, a.name, a.followers_total, a.popularity
FROM artists a
JOIN artist_albums aa ON a.id = aa.artist_id
WHERE aa.album_id = ?
ORDER BY aa.index_in_album

-- Step 4: Fetch track artists
SELECT a.id, a.name, a.followers_total, a.popularity
FROM artists a
JOIN track_artists ta ON a.id = ta.artist_id
WHERE ta.track_id = ?

-- Step 5: Fetch artist genres (for each artist)
SELECT genre
FROM artist_genres
WHERE artist_id = ?

-- Step 6: Fetch artist images (for each artist)
SELECT url, width, height
FROM artist_images
WHERE artist_id = ?
ORDER BY width DESC

-- Step 7: Fetch track files (from track_files.sqlite3)
SELECT has_lyrics, original_title, version_title,
       language_of_performance, artist_roles
FROM track_files
WHERE track_id = ?
```

**Total queries for single track:** 5 + 2N, where N is the number of distinct artists (7 with a single artist), because steps 5 and 6 run once per artist

### Batch ISRC Lookup

```sql
-- Step 1: Fetch all tracks by ISRC (single query with IN clause)
SELECT
    t.id, t.name, t.isrc, t.duration_ms, t.explicit,
    t.track_number, t.disc_number, t.popularity, t.preview_url,
    a.id AS album_id, a.name AS album_name, a.album_type,
    a.label, a.release_date, a.release_date_precision,
    a.external_id_upc, a.total_tracks, a.copyright_c, a.copyright_p
FROM tracks t
JOIN albums a ON t.album_rowid = a.rowid
WHERE t.isrc IN (?, ?, ?, ...)  -- Up to 400 placeholders

-- Step 2: Batch fetch album images (all albums at once)
SELECT album_id, url, width, height
FROM album_images
WHERE album_id IN (?, ?, ?, ...)
ORDER BY album_id, width DESC

-- Step 3: Batch fetch album artists
SELECT aa.album_id, a.id, a.name, a.followers_total, a.popularity, aa.index_in_album
FROM artists a
JOIN artist_albums aa ON a.id = aa.artist_id
WHERE aa.album_id IN (?, ?, ?, ...)
ORDER BY aa.album_id, aa.index_in_album

-- Step 4: Batch fetch track artists
SELECT ta.track_id, a.id, a.name, a.followers_total, a.popularity
FROM artists a
JOIN track_artists ta ON a.id = ta.artist_id
WHERE ta.track_id IN (?, ?, ?, ...)

-- Step 5: Batch fetch artist genres
SELECT artist_id, genre
FROM artist_genres
WHERE artist_id IN (?, ?, ?, ...)

-- Step 6: Batch fetch artist images
SELECT artist_id, url, width, height
FROM artist_images
WHERE artist_id IN (?, ?, ?, ...)
ORDER BY artist_id, width DESC

-- Step 7: Batch fetch track files
SELECT track_id, has_lyrics, original_title, version_title,
       language_of_performance, artist_roles
FROM track_files
WHERE track_id IN (?, ?, ?, ...)
```

**Total queries for 400 tracks:** 7 (vs 2,800+ for individual lookups)

**Performance gain:** 400x fewer queries
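
The batch queries rely on two small mechanics: splitting the input into groups of at most 400 and generating one `?` per value. A minimal sketch follows, assuming the 400-placeholder limit quoted above. `chunk`, `inClause`, and `maxBatch` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

const maxBatch = 400 // placeholder limit quoted above

// chunk splits ids into consecutive slices of at most size elements.
func chunk(ids []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var out [][]string
	for len(ids) > size {
		out = append(out, ids[:size])
		ids = ids[size:]
	}
	if len(ids) > 0 {
		out = append(out, ids)
	}
	return out
}

// inClause returns "?,?,?" for the given values plus the argument slice in
// the []any form that database/sql expects. Callers should skip empty input.
func inClause(ids []string) (string, []any) {
	args := make([]any, len(ids))
	for i, id := range ids {
		args[i] = id
	}
	return strings.TrimSuffix(strings.Repeat("?,", len(ids)), ","), args
}

func main() {
	isrcs := make([]string, 950)
	for i := range isrcs {
		isrcs[i] = fmt.Sprintf("ISRC%08d", i) // synthetic values for illustration
	}
	for _, batch := range chunk(isrcs, maxBatch) {
		placeholders, args := inClause(batch)
		query := "SELECT id FROM tracks WHERE isrc IN (" + placeholders + ")"
		fmt.Println(len(args), "args,", len(query), "bytes of SQL")
	}
}
```

Only the placeholders are interpolated into the SQL text; the values themselves travel as bound arguments.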

### Search Queries

**Track search:**
```sql
SELECT id, name, isrc, duration_ms, popularity, album_rowid
FROM tracks
WHERE name LIKE ? COLLATE NOCASE  -- ? = '%query%'
ORDER BY popularity DESC
LIMIT ?
```

**Artist search:**
```sql
SELECT id, name, followers_total, popularity
FROM artists
WHERE name LIKE ? COLLATE NOCASE  -- ? = '%query%'
ORDER BY followers_total DESC
LIMIT ?
```

**Search characteristics:**
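- `LIKE '%query%'` has a leading wildcard, so it can't use indexes (full table scan)
- `COLLATE NOCASE` is intended for case-insensitive matching; SQLite's `LIKE` already ignores case for ASCII characters by default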
- Ordered by popularity/followers (most relevant first)
- Limited to 50 results max
- 10-second timeout via context deadline

**Performance concern:** Searching 256M tracks with `LIKE %query%` is slow. Full-text search (FTS5) would be faster but not implemented.
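
The search parameters need some preparation before they are bound. The sketch below shows one way to build the `%query%` pattern and enforce the 50-result cap. It additionally escapes `%` and `_` in user input, which assumes the SQL is extended with `ESCAPE '\'`; the queries above do not show that clause. `likePattern` and `clampLimit` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

const maxSearchResults = 50 // cap quoted above

// likePattern wraps a user query in % wildcards and escapes \, % and _ so
// they match literally. Assumes the statement uses: name LIKE ? ESCAPE '\'
func likePattern(q string) string {
	r := strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)
	return "%" + r.Replace(q) + "%"
}

// clampLimit keeps a requested limit within 1..maxSearchResults.
func clampLimit(n int) int {
	if n < 1 {
		return 1
	}
	if n > maxSearchResults {
		return maxSearchResults
	}
	return n
}

func main() {
	fmt.Println(likePattern("100% live"), clampLimit(500))
}
```

The 10-second timeout would typically be applied by passing a context created with `context.WithTimeout` to `QueryContext`.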

### Album Tracks Lookup

```sql
-- Fetch all tracks for an album
SELECT t.id, t.name, t.isrc, t.duration_ms, t.explicit,
       t.track_number, t.disc_number, t.popularity, t.preview_url
FROM tracks t
WHERE t.album_rowid = (
    SELECT rowid FROM albums WHERE id = ?
)
ORDER BY t.disc_number, t.track_number
```

**Ordering:** Disc number first, then track number (preserves album order)

## Data Enrichment Strategy

### Enrichment Pipeline

```
1. Fetch base entity (track/album/artist)
   ↓
2. Collect related entity IDs
   ↓
3. Batch fetch related entities
   ↓
4. Assemble nested structures
   ↓
5. Return enriched object
```
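
Steps 2 and 4 of the pipeline can be sketched in memory, independent of the database. The example assumes a simplified `Track` type (hypothetical) and uses a literal map in place of the step 3 batch query result.

```go
package main

import "fmt"

// Image mirrors the url/width/height columns of album_images.
type Image struct {
	URL           string
	Width, Height int
}

// Track is a simplified, hypothetical stand-in for the API's track type.
type Track struct {
	ID, AlbumID string
	AlbumImages []Image
}

// uniqueAlbumIDs is step 2: collect each album ID once, in first-seen order.
func uniqueAlbumIDs(tracks []Track) []string {
	seen := make(map[string]struct{}, len(tracks))
	var ids []string
	for _, t := range tracks {
		if _, ok := seen[t.AlbumID]; !ok {
			seen[t.AlbumID] = struct{}{}
			ids = append(ids, t.AlbumID)
		}
	}
	return ids
}

// attachImages is step 4: copy batch results onto each track via map lookup.
func attachImages(tracks []Track, byAlbum map[string][]Image) {
	for i := range tracks {
		tracks[i].AlbumImages = byAlbum[tracks[i].AlbumID]
	}
}

func main() {
	tracks := []Track{
		{ID: "t1", AlbumID: "a1"},
		{ID: "t2", AlbumID: "a1"},
		{ID: "t3", AlbumID: "a2"},
	}
	ids := uniqueAlbumIDs(tracks) // step 2
	// Step 3 would run one batch query for ids; a literal map stands in here.
	byAlbum := map[string][]Image{
		"a1": {{URL: "https://example.invalid/a1.jpg", Width: 640, Height: 640}},
	}
	attachImages(tracks, byAlbum) // step 4
	fmt.Println(ids, len(tracks[1].AlbumImages), len(tracks[2].AlbumImages))
}
```

Deduplicating IDs before the batch query keeps the `IN` list short when many tracks share an album.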

### Batch Optimization Functions

**Implementation in db.go (907 lines):**

```go
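// Requires imports: "fmt", "log", "strings"
// Batch fetch album images for multiple albums
func (d *Database) batchGetAlbumImages(albumIDs []string) map[string][]Image {
    result := make(map[string][]Image)
    if len(albumIDs) == 0 {
        return result // strings.Repeat panics on a negative count
    }

    // Build IN clause
    placeholders := strings.Repeat("?,", len(albumIDs)-1) + "?"
    query := fmt.Sprintf(`
        SELECT album_id, url, width, height
        FROM album_images
        WHERE album_id IN (%s)
        ORDER BY album_id, width DESC
    `, placeholders)

    // Query takes ...any, so convert the []string element by element
    args := make([]any, len(albumIDs))
    for i, id := range albumIDs {
        args[i] = id
    }

    // Execute query
    rows, err := d.mainDB.Query(query, args...)
    if err != nil {
        log.Printf("batchGetAlbumImages: query: %v", err)
        return result
    }
    defer rows.Close()

    // Group by album_id
    for rows.Next() {
        var albumID string
        var img Image
        if err := rows.Scan(&albumID, &img.URL, &img.Width, &img.Height); err != nil {
            log.Printf("batchGetAlbumImages: scan: %v", err)
            continue
        }
        result[albumID] = append(result[albumID], img)
    }
    if err := rows.Err(); err != nil {
        log.Printf("batchGetAlbumImages: rows: %v", err)
    }

    return result
}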
```

**Similar functions:**
- `batchGetAlbumArtists(albumIDs []string) map[string][]Artist`
- `batchGetTrackArtists(trackIDs []string) map[string][]Artist`
- `batchGetArtistGenres(artistIDs []string) map[string][]string`
- `batchGetArtistImages(artistIDs []string) map[string][]Image`
- `batchEnrichTrackFiles(trackIDs []string) map[string]*TrackFile`

**Pattern:**
1. Build IN clause with placeholders
2. Execute single query for all IDs
3. Group results by parent ID
4. Return map for O(1) lookup

### Why Batch Matters

**Without batching (400 tracks):**
- 400 track + album queries (one JOIN per track, as in step 1 of the individual lookup)
- 400 album image queries
- 400 album artist queries
- 400 track artist queries
- ~800 artist genre queries (2 artists per track avg)
- ~800 artist image queries
- 400 track file queries
- **Total: ~3,600 queries**

**With batching (400 tracks):**
- 1 batch track query
- 1 batch album image query
- 1 batch album artist query
- 1 batch track artist query
- 1 batch artist genre query
- 1 batch artist image query
- 1 batch track file query
- **Total: 7 queries**

**Performance gain: 514x fewer queries**
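
The totals can be reproduced with a few lines of arithmetic. The sketch counts the track and album fetch as the single JOIN from step 1 of the individual lookup, plus two per-artist queries (genres and images). `individualQueries` is a hypothetical helper.

```go
package main

import "fmt"

const batchQueries = 7 // one query per step, regardless of track count

// individualQueries counts per-track lookups: five fixed queries
// (track+album JOIN, album images, album artists, track artists, track
// files) plus genres and images for each artist.
func individualQueries(tracks, artistsPerTrack int) int {
	const fixedPerTrack = 5
	return tracks * (fixedPerTrack + 2*artistsPerTrack)
}

func main() {
	n := individualQueries(400, 2)
	fmt.Printf("%d individual vs %d batched (%.0fx)\n",
		n, batchQueries, float64(n)/batchQueries)
	// prints "3600 individual vs 7 batched (514x)"
}
```

With one artist per track the same formula gives 2,800 queries, which is where the 400x figure in the Batch ISRC Lookup section comes from; the 514x figure assumes two artists per track.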

## Data Provenance

### Source

**Disclaimer from repository:**
> "This project is not affiliated with Spotify."

**Implications:**
- Data source unclear (likely scraped or obtained from third party)
- Legal status uncertain
- No official Spotify endorsement

### Data Freshness

**Static snapshot:**
- No update mechanism
- Data frozen at time of database creation
- No real-time sync with Spotify

**Staleness concerns:**
- New releases not included
- Popularity scores outdated
- Artist follower counts stale
- Deleted tracks still present

**Mitigation:**
- Treat as historical snapshot
- Complement with real-time APIs for fresh data
- Periodically obtain updated database (if available)

### Data Quality

**Strengths:**
- 256M tracks (massive coverage)
- Rich metadata (genres, images, roles)
- ISRC codes for cross-referencing
- Popularity/follower metrics

**Weaknesses:**
- No data validation visible
- Potential duplicates (not deduplicated)
- Missing ISRCs for some tracks
- Incomplete artist roles

## Storage Requirements

### Disk Space

| Component | Size | Compressible |
|-----------|------|--------------|
| main_database.sqlite3 | ~117GB | Minimal (already compact) |
| track_files.sqlite3 | ~99GB | Minimal (JSON fields) |
| **Total** | **~216GB** | - |

**Recommendations:**
- SSD strongly recommended (HDD too slow for 256M rows)
- NVMe for best performance
- RAID not necessary (read-only, can rebuild from backup)

### Memory Usage

**SQLite memory:**
- Page cache: 64MB per connection
- 8 connections: 512MB cache total
- Memory-mapped I/O: 1GB per database (2GB total)
- **Total: ~2.5GB minimum**

**Application memory:**
- Go runtime: ~50MB
- Rate limiter map: Grows unbounded (leak)
- Request buffers: ~10MB per concurrent request
- **Total: ~100MB + leak**

**Recommended RAM:** 4GB+ (2.5GB for SQLite + 1.5GB for OS/app)
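
The SQLite figure above is simple arithmetic and can be recomputed when the pool or cache settings change. A minimal sketch follows, with `sqliteBudgetBytes` as a hypothetical helper. It treats the mmap window as if it were fully resident, which is an upper bound: mapped pages are backed by the OS page cache and are only resident once touched.

```go
package main

import "fmt"

// sqliteBudgetBytes estimates the upper bound described above: one page
// cache per connection plus one mmap window per database file.
func sqliteBudgetBytes(cacheKiB, conns, mmapBytes, databases int64) int64 {
	return cacheKiB*1024*conns + mmapBytes*databases
}

func main() {
	total := sqliteBudgetBytes(64000, 8, 1<<30, 2)
	fmt.Printf("%.2f GiB\n", float64(total)/(1<<30))
}
```

If each of the two database files has its own pool of 8 connections, the `conns` argument would be 16 and the cache share doubles; the document does not state which is the case.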

### I/O Characteristics

**Read patterns:**
- Random reads (track lookups by ID/ISRC)
- Sequential scans (search queries)
- Batch reads (IN clause queries)

**Write patterns:**
- None (read-only)

**Cache effectiveness:**
- Hot data (popular tracks): High hit rate
- Cold data (obscure tracks): Low hit rate
- Search queries: Low hit rate (full scans)

## Database Maintenance

### No Maintenance Required

**Read-only benefits:**
- No VACUUM needed (no fragmentation from deletes)
- No ANALYZE needed (statistics static)
- No REINDEX needed (indexes don't degrade)
- No WAL checkpoint (journal disabled)

### Backup Strategy

**Simple backup:**
```bash
# Copy files (safe as long as no process is writing to them, which holds for the read-only API)
cp main_database.sqlite3 backup/
cp track_files.sqlite3 backup/
```

**Online backup (while running):**
```bash
# Built-in .backup command of the sqlite3 CLI (uses the SQLite online backup API)
sqlite3 main_database.sqlite3 ".backup backup/main_database.sqlite3"
```

**Restore:**
```bash
# Simply replace files
cp backup/main_database.sqlite3 .
cp backup/track_files.sqlite3 .
```

### Integrity Checks

**Verify database integrity:**
```bash
sqlite3 main_database.sqlite3 "PRAGMA integrity_check;"
sqlite3 track_files.sqlite3 "PRAGMA integrity_check;"
```

**Expected output:** `ok`

**Run periodically:** Monthly or after hardware issues

## Performance Tuning

### Query Optimization

**Indexes already present:**
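- Primary keys on the `id` columns of `tracks`, `albums`, and `artists`, and on `track_files.track_id`
- Foreign key indexes (album_rowid, artist_id, etc.)
- Name index on `artists.name` only; the schema above lists no index on `tracks.name`, and neither would help a `LIKE '%query%'` scan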

**Missing indexes (potential improvements):**
- Full-text search index (FTS5) on track/artist names
- Composite index on (popularity, name) for sorted searches

### Connection Pool Tuning

**Current settings:**
```go
MaxOpenConns: 8
MaxIdleConns: 8
ConnMaxLifetime: 0
```

**Tuning considerations:**
- Increase MaxOpenConns (for example to 16-32) only if the host has that many CPU cores
- Monitor CPU usage (SQLite is CPU-bound for searches)
- No benefit beyond CPU core count, and each extra connection adds its own 64MB page cache

### Cache Tuning

**Current cache:** 64MB per connection (512MB total)

**Increase cache:**
```
_cache_size=-128000  // 128MB per connection
```

**Tradeoff:** More memory usage vs fewer disk reads

**Recommendation:** Monitor cache hit rate, increase if low

### Memory-Mapped I/O Tuning

**Current mmap:** 1GB per database

**Increase mmap:**
```
_mmap_size=2147483648  // 2GB
```

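**Tradeoff:** More virtual address space vs faster reads

**Recommendation:** Set to database size if RAM allows (117GB not feasible). Stock SQLite builds may also clamp the value at the compile-time `SQLITE_MAX_MMAP_SIZE`, which defaults to just under 2GB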

## Data Model Comparison

### vs Spotify Web API

| Feature | Music Metadata API | Spotify Web API |
|---------|-------------------|-----------------|
| Track ID format | Spotify-compatible | Spotify IDs |
| ISRC support | Yes | Yes |
| Popularity | Static snapshot | Real-time |
| Followers | Static snapshot | Real-time |
| Images | External URLs | External URLs |
| Genres | Artist-level | Artist-level |
| Lyrics | Flag only | Not available |
| Artist roles | Detailed | Limited |
| Languages | Supported | Not available |

### vs MusicBrainz

| Feature | Music Metadata API | MusicBrainz |
|---------|-------------------|-------------|
| Identifier | Spotify IDs, ISRC | MBIDs |
| Dataset size | 256M tracks | ~40M recordings |
| Popularity | Yes | No |
| Followers | Yes | No |
| Images | Yes (external) | Yes (Cover Art Archive) |
| Genres | Yes | Yes (tags) |
| Relationships | Limited | Extensive |
| Credits | Artist roles | Detailed credits |
| Updates | Static | Community-driven |

## Integration Considerations

### Joining with Other Databases

**ISRC as common key:**
```sql
-- Join with local library
SELECT l.file_path, m.name, m.popularity
FROM local_library l
JOIN music_metadata_api.tracks m ON l.isrc = m.isrc
```

**ISRC as bridge to MusicBrainz:**
```sql
-- Join with MusicBrainz
SELECT mb.gid AS mbid, mm.name, mm.popularity  -- MusicBrainz stores the MBID in the gid column
FROM musicbrainz.recording mb
JOIN musicbrainz.isrc i ON mb.id = i.recording
JOIN music_metadata_api.tracks mm ON i.isrc = mm.isrc
```

### Data Export

**Export to JSON:**
```bash
sqlite3 main_database.sqlite3 <<EOF
.mode json
.output tracks.json
SELECT * FROM tracks LIMIT 1000;
EOF
```

**Export to CSV:**
```bash
sqlite3 main_database.sqlite3 <<EOF
.mode csv
.output tracks.csv
SELECT id, name, isrc, popularity FROM tracks;
EOF
```

### Data Import

**Import from CSV:**
```bash
sqlite3 new_database.sqlite3 <<EOF
.mode csv
.import tracks.csv tracks
EOF
```

**Bulk insert from application:**
```go
tx, _ := db.Begin()
stmt, _ := tx.Prepare("INSERT INTO tracks VALUES (?, ?, ?, ...)")
for _, track := range tracks {
    stmt.Exec(track.ID, track.Name, track.ISRC, ...)
}
tx.Commit()
```

## Limitations

### No Write Operations

**Implications:**
- Can't add new tracks
- Can't update popularity scores
- Can't delete duplicates
- Can't fix data errors

**Workarounds:**
- Create separate writable database for local additions
- Use views to merge read-only + writable data
- Periodically obtain updated database snapshot

### No Full-Text Search

**Current search:** `LIKE %query%` (slow)

**FTS5 alternative:**
```sql
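-- Create FTS5 virtual table (requires a writable database that also contains tracks)
CREATE VIRTUAL TABLE tracks_fts USING fts5(name, content=tracks);
-- External-content tables need matching rowids, so copy rowid explicitly
INSERT INTO tracks_fts(rowid, name) SELECT rowid, name FROM tracks;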

-- Fast search
SELECT * FROM tracks_fts WHERE name MATCH 'bohemian';
```

**Limitation:** Can't create FTS5 on read-only database

**Workaround:** Create separate FTS5 database, sync periodically

### No Relationships Beyond Basics

**Missing relationships:**
- Track-to-track (similar tracks, remixes)
- Album-to-album (compilations, deluxe editions)
- Artist-to-artist (collaborations, bands)

**Workaround:** Build relationship graph in separate database