# Lidarr Metadata API - Architecture

## Architectural Overview

LidarrAPI.Metadata implements a layered architecture with clear separation of concerns:

```
┌────────────────────────────────────────────────────────────┐
│                       Cloudflare CDN                       │
│                     (Edge Cache Layer)                     │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│                     Quart Application                      │
│                     (app.py - Routes)                      │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│                         API Layer                          │
│                 (api.py - Business Logic)                  │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│                       Provider Layer                       │
│             (provider.py - Mixin Architecture)             │
│ ┌────────────────┬────────────────┬──────────────────────┐ │
│ │ MusicBrainz    │ Solr Search    │ External Providers   │ │
│ │ DB Direct      │ (Artist/       │ (FanArt, TheAudioDB, │ │
│ │                │ Album)         │ Wikipedia, Spotify)  │ │
│ └────────────────┴────────────────┴──────────────────────┘ │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│                        Cache Layer                         │
│              (cache.py - Multi-Tier Caching)               │
│ ┌────────────────┬────────────────┬──────────────────────┐ │
│ │ Redis          │ PostgreSQL     │ Compression          │ │
│ │ (Ephemeral)    │ (Persistent)   │ (zlib pickle)        │ │
│ └────────────────┴────────────────┴──────────────────────┘ │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│                        Data Sources                        │
│ ┌────────────────┬────────────────┬──────────────────────┐ │
│ │ MusicBrainz    │ Solr           │ External APIs        │ │
│ │ PostgreSQL     │ (Search)       │ (15+ integrations)   │ │
│ └────────────────┴────────────────┴──────────────────────┘ │
└────────────────────────────────────────────────────────────┘
```
## Mixin-Based Provider Architecture

The core architectural pattern is a mixin-based provider system that allows flexible composition of data source capabilities.

### Provider Mixin Hierarchy

```python
# Base capability mixins: each declares one async capability that a
# concrete provider may implement.


class ArtistByIdMixin:
    async def get_artist_by_id(self, mbid: str) -> dict:
        raise NotImplementedError


class ArtistNameSearchMixin:
    async def search_artist_name(self, query: str) -> list:
        raise NotImplementedError


class AlbumByIdMixin:
    async def get_album_by_id(self, mbid: str) -> dict:
        raise NotImplementedError


class AlbumNameSearchMixin:
    async def search_album_name(self, query: str) -> list:
        raise NotImplementedError


class ArtistOverviewMixin:
    async def get_artist_overview(self, mbid: str) -> str:
        raise NotImplementedError


class ArtistImagesMixin:
    async def get_artist_images(self, mbid: str) -> list:
        raise NotImplementedError


class AlbumImagesMixin:
    async def get_album_images(self, mbid: str) -> list:
        raise NotImplementedError


class ArtistLinksMixin:
    async def get_artist_links(self, mbid: str) -> list:
        raise NotImplementedError
```
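
Because each capability lives in its own class, calling code can use `isinstance` to check which capabilities a provider offers. The sketch below is self-contained: it repeats two of the mixins above and adds hypothetical stub providers and a hypothetical `providers_with` helper. None of these stubs come from the project.

```python
class ArtistImagesMixin:
    async def get_artist_images(self, mbid: str) -> list:
        raise NotImplementedError


class ArtistOverviewMixin:
    async def get_artist_overview(self, mbid: str) -> str:
        raise NotImplementedError


class StubImageProvider(ArtistImagesMixin):
    """Hypothetical provider that only supplies images."""

    async def get_artist_images(self, mbid: str) -> list:
        return [f"https://example.invalid/{mbid}.jpg"]


class StubOverviewProvider(ArtistOverviewMixin):
    """Hypothetical provider that only supplies overviews."""

    async def get_artist_overview(self, mbid: str) -> str:
        return "An example overview."


def providers_with(capability: type, providers: list) -> list:
    """Return the providers that implement the given capability mixin."""
    return [p for p in providers if isinstance(p, capability)]


providers = [StubImageProvider(), StubOverviewProvider()]
print([type(p).__name__ for p in providers_with(ArtistImagesMixin, providers)])
# ['StubImageProvider']
```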
### Provider Implementations

Each provider implements one or more mixins based on its capabilities:

#### MusicbrainzDbProvider

**Mixins**: `ArtistByIdMixin`, `AlbumByIdMixin`

**Purpose**: Authoritative source for core music metadata

**Implementation**:
- Direct asyncpg connection to MusicBrainz PostgreSQL database
- Complex SQL queries with JSON aggregation (`row_to_json`, `json_agg`)
- Read-only access to replicated database
- Custom indices on `last_updated` columns for change detection

**SQL files**:
- `lidarrmetadata/sql/artist.sql`: Artist metadata with releases
- `lidarrmetadata/sql/album.sql`: Album metadata with tracks
- `lidarrmetadata/sql/updated_artists.sql`: Change detection query
- `lidarrmetadata/sql/updated_albums.sql`: Change detection query

**Key tables accessed**:
- `artist`: Core artist data
- `release_group`: Album groupings
- `release`: Specific releases
- `medium`: Physical/digital media
- `track`: Track listings
- `recording`: Recording metadata
- `url`: External links
- `l_artist_url`: Artist-URL relationships
- `cover_art_archive.index_listing`: Cover art availability

#### SolrSearchProvider

**Mixins**: `ArtistNameSearchMixin`, `AlbumNameSearchMixin`

**Purpose**: Full-text search for artist and album discovery

**Implementation**:
- Async HTTP client to Solr REST API
- Two cores: `artist` and `release-group`
- DisMax query parser for relevance ranking
- 5-second timeout per query
- Real-time index updates via RabbitMQ + SIR (Search Index Rebuilder)

**Query structure** (JSON request body):
```python
{
    "query": query_string,
    "limit": 10,
    "params": {
        "defType": "dismax",
        "qf": "artist^2 sortname alias",  # weight artist-name matches twice as high
        "mm": "1",                        # at least one query term must match
    },
}
```
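
The sketch below shows how such a body might be assembled. It assumes a hypothetical Solr base URL (`http://solr:8983/solr`) and Solr's standard `/select` handler, which also accepts JSON request bodies. Only request construction is shown, so the example runs without a Solr server. The HTTP call and its 5-second timeout belong to whichever async client the provider uses.

```python
import json

SOLR_BASE_URL = "http://solr:8983/solr"  # hypothetical; not taken from the project config


def build_artist_search(query_string: str, limit: int = 10) -> tuple[str, str]:
    """Return (url, json_body) for a DisMax artist search on the `artist` core."""
    body = {
        "query": query_string,
        "limit": limit,
        "params": {
            "defType": "dismax",              # parser designed for simple user-typed phrases
            "qf": "artist^2 sortname alias",  # fields searched, with artist boosted
            "mm": "1",                        # minimum number of terms that must match
        },
    }
    return f"{SOLR_BASE_URL}/artist/select", json.dumps(body)


url, payload = build_artist_search("nirvana")
print(url)  # http://solr:8983/solr/artist/select
```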
#### FanArtTvProvider

**Mixins**: `ArtistImagesMixin`, `AlbumImagesMixin`

**Purpose**: High-quality fan art and promotional images

**Implementation**:
- REST API with API key authentication
- New images reach free project API keys only after a 7-day delay; personal API keys see them sooner
- 30-day cache TTL
- Image types: `poster`, `banner`, `logo`, `fanart`, `cover`
- Falls back to Cover Art Archive when fanart.tv is unavailable

**API endpoints**:
- `https://webservice.fanart.tv/v3/music/{mbid}`
- `https://webservice.fanart.tv/v3/music/albums/{mbid}`

#### TheAudioDbProvider

**Mixins**: `ArtistOverviewMixin`, `ArtistImagesMixin`, `AlbumImagesMixin`, `ArtistLinksMixin`

**Purpose**: Fallback provider for images and metadata

**Implementation**:
- REST API with the public API key `1`
- 10-second timeout
- Used as a fallback when FanArt.tv or Wikipedia is unavailable
- Provides artist biographies, images, and social media links

**API endpoints**:
- `https://theaudiodb.com/api/v1/json/1/artist-mb.php?i={mbid}`
- `https://theaudiodb.com/api/v1/json/1/album-mb.php?i={mbid}`

#### WikipediaProvider

**Mixins**: `ArtistOverviewMixin`

**Purpose**: Artist biographical information

**Implementation**:
- Multi-stage lookup: MusicBrainz → Wikidata → Wikipedia
- 32-language fallback chain (en, fr, de, es, it, ja, zh, ru, pt, nl, sv, fi, no, da, pl, cs, hu, ro, tr, el, he, ar, fa, hi, th, ko, vi, id, ms, tl, bn, ta)
- BeautifulSoup HTML parsing
- 2-second timeout per request
- Limited to 1 connection per host
- Extracts the first paragraph as the overview

**Lookup flow**:
```
MusicBrainz MBID → Wikidata entity → Wikipedia article → Extract summary
```
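
One way to implement the language fallback is to pick the first preferred language for which the Wikidata entity has a Wikipedia sitelink. Wikidata stores sitelinks under keys such as `enwiki` and `frwiki`, each with a `title`. The helper name, the shortened preference list, and the sample entity are illustrative assumptions, not the project's code.

```python
from urllib.parse import quote

# First few entries of the fallback chain described above (shortened for brevity)
LANGUAGE_PREFERENCE = ["en", "fr", "de", "es", "it", "ja"]


def pick_wikipedia_article(sitelinks: dict, languages=LANGUAGE_PREFERENCE):
    """Return (language, article URL) for the first preferred language with a
    sitelink, or None if the entity has no article in any of them."""
    for lang in languages:
        link = sitelinks.get(f"{lang}wiki")
        if link:
            # Wikipedia article paths use underscores in place of spaces
            title = link["title"].replace(" ", "_")
            return lang, f"https://{lang}.wikipedia.org/wiki/{quote(title)}"
    return None


# Hypothetical entity that only has French and German articles
sitelinks = {
    "frwiki": {"site": "frwiki", "title": "Exemple Groupe"},
    "dewiki": {"site": "dewiki", "title": "Beispiel Band"},
}
print(pick_wikipedia_article(sitelinks))
# ('fr', 'https://fr.wikipedia.org/wiki/Exemple_Groupe')
```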
#### SpotifyProvider

**Mixins**: `ArtistByIdMixin`, `AlbumByIdMixin`, `ArtistLinksMixin`

**Purpose**: Spotify ID mapping and cross-platform linking

**Implementation**:
- spotipy library with OAuth authentication
- Name-based lookups use Levenshtein similarity with a 0.8 threshold (a normalized 0-to-1 score, where 1 means identical names)
- Provides Spotify URIs for deep linking
- Used for chart data correlation

**Authentication flow**:
- OAuth client credentials flow
- A new access token is requested when the current one expires (the client credentials flow issues no refresh token)
- Tokens cached in Redis
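
A 0.8 threshold only makes sense for a normalized score, because raw Levenshtein distance is an unbounded count of edits. The sketch below assumes similarity is defined as `1 - distance / max(len(a), len(b))` on case-folded names. The project may use a library implementation or a different normalization.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical after case folding."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def is_match(query: str, candidate: str, threshold: float = 0.8) -> bool:
    return name_similarity(query, candidate) >= threshold


print(levenshtein("kitten", "sitting"))     # 3
print(is_match("Nirvana", "Nirvana (UK)"))  # False
print(is_match("Beyonce", "Beyoncé"))       # True
```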
## Layer Responsibilities

### 1. Quart Application Layer (app.py)

**Responsibilities**:
- HTTP request routing
- Request parameter validation
- Response serialization
- Cache-Control header management
- Error handling and HTTP status codes

**Key routes**:
```python
from quart import Quart

app = Quart(__name__)


@app.route('/')
async def root():
    """Health check and version info."""
    ...


@app.route('/artist/<mbid>')
async def get_artist(mbid):
    """Artist metadata endpoint."""
    ...


@app.route('/artist/<mbid>/refresh', methods=['POST'])
async def refresh_artist(mbid):
    """Cache invalidation endpoint."""
    ...


@app.route('/search/artist')
async def search_artist():
    """Artist search endpoint."""
    ...


@app.route('/chart/<name>/<type>/<selection>')
async def get_chart(name, type, selection):
    """Chart data endpoint."""
    ...
```

**Request flow**:
1. Parse and validate request parameters
2. Call API layer method
3. Set Cache-Control headers based on response metadata
4. Serialize response to JSON
5. Return HTTP response
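
Step 1 includes rejecting malformed IDs before they reach the database. MusicBrainz identifiers (MBIDs) are UUIDs, so a minimal validator can rely on Python's `uuid` module. The helper name and the choice to return the canonical lowercase form are assumptions for illustration.

```python
import uuid


def normalize_mbid(value):
    """Return the canonical lowercase, hyphenated form of an MBID, or None
    if the value is not a valid UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return None
    # uuid.UUID also accepts forms such as '{...}', 'urn:uuid:...' or bare hex;
    # only accept input that is already in the plain 36-character layout.
    canonical = str(parsed)
    return canonical if canonical == value.lower() else None


print(normalize_mbid("5B11F4CE-A62D-471E-81FC-A69A8278C7DA"))
# 5b11f4ce-a62d-471e-81fc-a69a8278c7da
print(normalize_mbid("not-an-mbid"))  # None
```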
### 2. API Layer (api.py)

**Responsibilities**:
- Business logic orchestration
- Provider coordination
- Data aggregation from multiple sources
- Cache management
- Response formatting

**Key methods**:
```python
async def get_artist_by_id(mbid, prim_types, sec_types, release_statuses):
    # 1. Check cache
    # 2. Query MusicBrainz DB
    # 3. Parallel fetch: images, overview, links
    # 4. Aggregate data
    # 5. Cache result
    # 6. Return formatted response
    ...


async def search_artist(query):
    # 1. Check cache
    # 2. Query Solr
    # 3. Enrich results with cached metadata
    # 4. Cache search results
    # 5. Return formatted response
    ...
```

**Parallel fetching pattern**:
```python
import asyncio

# Inside an async function: fetch multiple data sources concurrently.
# gather() awaits all three and returns results in argument order.
images, overview, links = await asyncio.gather(
    fanart_provider.get_artist_images(mbid),
    wikipedia_provider.get_artist_overview(mbid),
    spotify_provider.get_artist_links(mbid),
)
```
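
If one source fails, plain `gather()` raises the first exception and the other results are lost. The sketch below shows a variant that degrades gracefully, using stub coroutines in place of the real providers. With `return_exceptions=True`, failures come back as values, and each one is then replaced by an empty default.

```python
import asyncio


async def fetch_images(mbid):      # stand-in for a FanArt.tv call
    return [f"https://example.invalid/{mbid}/fanart.jpg"]


async def fetch_overview(mbid):    # stand-in for a Wikipedia call that times out
    raise asyncio.TimeoutError("wikipedia timed out")


async def fetch_links(mbid):       # stand-in for a Spotify call
    return ["https://open.spotify.com/artist/example"]


async def fetch_enrichment(mbid):
    results = await asyncio.gather(
        fetch_images(mbid), fetch_overview(mbid), fetch_links(mbid),
        return_exceptions=True,
    )
    defaults = ([], None, [])
    # Replace any exception with the empty default for that field
    images, overview, links = (
        default if isinstance(result, BaseException) else result
        for result, default in zip(results, defaults)
    )
    return {"Images": images, "Overview": overview, "Links": links}


print(asyncio.run(fetch_enrichment("abc")))
```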
### 3. Provider Layer (provider.py)

**Responsibilities**:
- Data source abstraction
- External API communication
- Error handling and retries
- Timeout management
- Data transformation to common format

**Provider composition**:
```python
class MetadataProvider(
    MusicbrainzDbProvider,
    SolrSearchProvider,
    FanArtTvProvider,
    TheAudioDbProvider,
    WikipediaProvider,
    SpotifyProvider,
):
    """Composite provider with all capabilities"""
```
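
When two bases implement the same mixin method, Python's method resolution order (MRO) decides which one the composite class uses, and the leftmost base wins. The self-contained sketch below uses synchronous stub classes; only their names mirror the providers above. With this base order, `get_artist_overview` resolves to TheAudioDB rather than Wikipedia, so any preference between providers has to be expressed in the base order or in explicit fallback code.

```python
class ArtistOverviewMixin:
    def get_artist_overview(self, mbid):
        raise NotImplementedError


class TheAudioDbProvider(ArtistOverviewMixin):
    def get_artist_overview(self, mbid):
        return "overview from TheAudioDB"


class WikipediaProvider(ArtistOverviewMixin):
    def get_artist_overview(self, mbid):
        return "overview from Wikipedia"


# Same relative order as in MetadataProvider above
class MetadataProvider(TheAudioDbProvider, WikipediaProvider):
    pass


print([cls.__name__ for cls in MetadataProvider.__mro__])
# ['MetadataProvider', 'TheAudioDbProvider', 'WikipediaProvider', 'ArtistOverviewMixin', 'object']
print(MetadataProvider().get_artist_overview("x"))  # overview from TheAudioDB
```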
**Fallback chain example**:

Each `try` guards only its own call, so a failure at any step falls through to the next provider:

```python
async def get_artist_images(self, mbid):
    # Primary: FanArt.tv
    try:
        return await self.fanart.get_artist_images(mbid)
    except Exception:  # timeout, HTTP error, bad payload, ...
        pass

    # Fallback: TheAudioDB
    try:
        return await self.theaudiodb.get_artist_images(mbid)
    except Exception:
        pass

    # Last resort: Cover Art Archive
    return await self.cover_art_archive.get_artist_images(mbid)
```
### 4. Cache Layer (cache.py)

**Responsibilities**:
- Multi-tier cache management
- Cache key generation
- TTL management
- Compression/decompression
- Cache invalidation
- Statistics tracking

**Cache tiers**:

#### Tier 1: Redis (Ephemeral)

**Configuration**:
- Namespace: `lm3.7`
- Memory limit: 512MB
- Eviction policy: LFU (Least Frequently Used)
- Default TTL: 7 days

**Use cases**:
- Hot data (frequently accessed artists/albums)
- Rate limiter state
- Sentry deduplication
- Invalidation locks

**Implementation**:
```python
import pickle
import zlib

NAMESPACE = "lm3.7"
DEFAULT_TTL = 7 * 24 * 60 * 60  # 7 days = 604800 seconds


class RedisCache:
    def __init__(self, redis):
        self.redis = redis  # async Redis client that returns raw bytes

    async def get(self, key):
        value = await self.redis.get(f"{NAMESPACE}:{key}")
        if value is not None:
            return pickle.loads(zlib.decompress(value))
        return None

    async def set(self, key, value, ttl=DEFAULT_TTL):
        compressed = zlib.compress(pickle.dumps(value))
        await self.redis.setex(f"{NAMESPACE}:{key}", ttl, compressed)
```
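
The zlib-over-pickle encoding used by both cache tiers can be tested on its own. This sketch uses a made-up, repetitive payload to show that decoding restores the original object and that compression shrinks it. Real ratios depend on the data, and the helper names are hypothetical. Because `pickle.loads` can execute code, this encoding is only safe for data the service wrote itself.

```python
import pickle
import zlib


def encode(value) -> bytes:
    return zlib.compress(pickle.dumps(value))


def decode(blob: bytes):
    return pickle.loads(zlib.decompress(blob))


# Hypothetical album list with the kind of repetition real metadata has
albums = [
    {"Id": f"album-{i}", "Type": "Album", "Status": "Official", "SecondaryTypes": []}
    for i in range(200)
]
raw = pickle.dumps(albums)
blob = encode(albums)

assert decode(blob) == albums
print(f"{len(raw)} bytes pickled -> {len(blob)} bytes compressed")
```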
#### Tier 2: PostgreSQL (Persistent)

**Schema** (`{cache_name}` is replaced with each table name):
```sql
-- Keeps the "updated" column current on every UPDATE
CREATE OR REPLACE FUNCTION update_updated_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE IF NOT EXISTS {cache_name} (
    key VARCHAR(255) PRIMARY KEY,
    expires TIMESTAMP,
    updated TIMESTAMP DEFAULT NOW(),
    value BYTEA  -- zlib-compressed pickle
);

CREATE TRIGGER update_timestamp
    BEFORE UPDATE ON {cache_name}
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_column();
```

**Auto-created tables**:
- `artist`: Artist metadata cache
- `album`: Album metadata cache
- `spotify`: Spotify lookup cache
- `fanart`: FanArt.tv image cache
- `tadb`: TheAudioDB metadata cache
- `wikipedia`: Wikipedia overview cache

**Use cases**:
- Long-term storage for all metadata
- Fallback when Redis evicts data
- Historical data for analytics
- Compressed storage (10:1 ratio typical)

**Implementation**:
```python
import pickle
import zlib
from datetime import datetime, timedelta


class PostgresCache:
    def __init__(self, conn, table):
        self.conn = conn    # asyncpg connection or pool
        self.table = table  # one of the fixed cache table names, never user input

    async def get(self, key):
        row = await self.conn.fetchrow(
            f"SELECT value, expires FROM {self.table} WHERE key = $1",
            key,
        )
        # A NULL expires column means the entry never expires
        if row and (row['expires'] is None or row['expires'] > datetime.now()):
            return pickle.loads(zlib.decompress(row['value']))
        return None

    async def set(self, key, value, ttl=None):
        compressed = zlib.compress(pickle.dumps(value))
        expires = datetime.now() + timedelta(seconds=ttl) if ttl else None
        await self.conn.execute(
            f"""
            INSERT INTO {self.table} (key, value, expires)
            VALUES ($1, $2, $3)
            ON CONFLICT (key) DO UPDATE
            SET value = EXCLUDED.value, expires = EXCLUDED.expires, updated = NOW()
            """,
            key, compressed, expires,
        )
```
#### Tier 3: Cloudflare CDN (Edge)

**Configuration**:
- Cache-Control header: `s-maxage=2592000, max-age=0` (shared caches such as the CDN keep responses for 30 days; browsers treat them as immediately stale)
- Programmatic purge via Cloudflare API
- Batch purge: 30 URLs per request

**Use cases**:
- Global edge caching for popular artists/albums
- Reduced origin load
- Low-latency responses worldwide

**Cache invalidation**:
```python
def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def invalidate_cdn_cache(urls):
    # Batch URLs into groups of 30
    for batch in chunks(urls, 30):
        await cloudflare_client.purge_cache(batch)
```
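
For reference, a purge-by-URL call to the Cloudflare v4 API is a `POST` to `/zones/{zone_id}/purge_cache`, sent with a bearer token and a JSON body of the form `{"files": [...]}`. The sketch below only builds the requests, so it runs offline. `build_purge_requests` is a hypothetical helper. The zone ID and token would come from `CLOUDFLARE_ZONE_ID` and `CLOUDFLARE_API_TOKEN`.

```python
import json

CLOUDFLARE_API = "https://api.cloudflare.com/client/v4"
BATCH_SIZE = 30  # URLs per purge request, as configured above


def build_purge_requests(zone_id: str, api_token: str, urls: list) -> list:
    """Return one (url, headers, body) tuple per batch of at most BATCH_SIZE URLs."""
    endpoint = f"{CLOUDFLARE_API}/zones/{zone_id}/purge_cache"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    return [
        (endpoint, headers, json.dumps({"files": urls[start:start + BATCH_SIZE]}))
        for start in range(0, len(urls), BATCH_SIZE)
    ]


artist_urls = [f"https://api.example.invalid/artist/{n}" for n in range(65)]
requests_to_send = build_purge_requests("ZONE_ID", "TOKEN", artist_urls)
print([len(json.loads(body)["files"]) for _, _, body in requests_to_send])  # [30, 30, 5]
```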
## Data Flow Patterns

### Artist Metadata Request Flow

```
1. Client → GET /artist/{mbid}
2. app.py → Validate MBID format
3. api.py → Check Redis cache
4. [CACHE MISS] → Check PostgreSQL cache
5. [CACHE MISS] → Query MusicBrainz DB (artist.sql)
6. api.py → Parallel fetch:
   - FanArt.tv images
   - Wikipedia overview
   - Spotify links
   - TheAudioDB metadata (fallback)
7. api.py → Aggregate data into Artist object
8. api.py → Store in PostgreSQL cache
9. api.py → Store in Redis cache
10. app.py → Set Cache-Control headers
11. app.py → Return JSON response
12. Cloudflare → Cache at edge
```
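
Steps 3 to 9 amount to a read-through lookup across the two server-side tiers. The sketch below uses dictionaries as stand-ins for Redis and PostgreSQL. The `TieredCache` name, the loader callback, and the choice to copy a PostgreSQL hit back into Redis are assumptions, not details taken from the project.

```python
import asyncio


class TieredCache:
    """Read-through cache: Redis-like tier first, then PostgreSQL-like tier,
    then the loader. Plain dicts stand in for both stores."""

    def __init__(self, loader):
        self.redis = {}      # stand-in for the ephemeral tier
        self.postgres = {}   # stand-in for the persistent tier
        self.loader = loader
        self.loads = 0       # how many times the loader actually ran

    async def get(self, key):
        if key in self.redis:                 # step 3: Redis hit
            return self.redis[key]
        if key in self.postgres:              # step 4: PostgreSQL hit
            value = self.postgres[key]
            self.redis[key] = value           # assumption: re-warm Redis
            return value
        value = await self.loader(key)        # steps 5-7: build from sources
        self.loads += 1
        self.postgres[key] = value            # step 8
        self.redis[key] = value               # step 9
        return value


async def load_artist(mbid):
    return {"Id": mbid, "ArtistName": "Example"}


async def main():
    cache = TieredCache(load_artist)
    await cache.get("abc")   # miss in both tiers: loader runs
    cache.redis.clear()      # simulate Redis eviction
    await cache.get("abc")   # served from the persistent tier
    return cache


cache = asyncio.run(main())
print(cache.loads)  # 1
```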
### Search Request Flow

```
1. Client → GET /search/artist?query=nirvana
2. app.py → Validate query parameter
3. api.py → Check Redis cache (key: search:artist:nirvana)
4. [CACHE MISS] → Query Solr artist core
5. Solr → Return list of MBIDs with scores
6. api.py → For each result:
   - Check cache for artist metadata
   - [CACHE MISS] → Fetch from MusicBrainz DB
7. api.py → Aggregate search results
8. api.py → Cache search results (TTL: 1 hour)
9. app.py → Return JSON response
```
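
The cache key in step 3 embeds the query text, so variants such as `Nirvana` and ` nirvana ` would occupy separate entries unless the query is normalized first. The sketch below assumes case folding and whitespace collapsing are acceptable normalizations. The helper and its rules are hypothetical.

```python
SEARCH_TTL = 60 * 60  # 1 hour, matching step 8


def search_cache_key(entity: str, query: str) -> str:
    """Build a cache key such as 'search:artist:nirvana' from raw user input."""
    normalized = " ".join(query.casefold().split())  # case-fold, collapse whitespace
    return f"search:{entity}:{normalized}"


print(search_cache_key("artist", "  Nirvana "))  # search:artist:nirvana
```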
### Cache Invalidation Flow

```
1. Crawler → Detect updated artist (updated_artists.sql)
2. Crawler → POST /artist/{mbid}/refresh
3. api.py → Verify INVALIDATE_APIKEY
4. api.py → Delete from Redis cache
5. api.py → Delete from PostgreSQL cache
6. api.py → Purge Cloudflare CDN cache
7. api.py → Return 200 OK
8. Next request → Cache miss → Fresh fetch
```
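
Step 3 compares a caller-supplied key with `INVALIDATE_APIKEY`. The sketch below performs that check with `hmac.compare_digest`, which is designed to resist timing attacks. The flow does not say how the key is transmitted (header or query parameter), so the hypothetical `is_authorized` helper simply takes it as an argument.

```python
import hmac


def is_authorized(supplied_key, expected_key) -> bool:
    """Constant-time check of the invalidation API key."""
    if not supplied_key or not expected_key:
        return False  # reject when either key is missing or empty
    return hmac.compare_digest(supplied_key.encode(), expected_key.encode())


print(is_authorized("s3cret", "s3cret"))  # True
print(is_authorized("guess", "s3cret"))   # False
```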
## Background Crawler Architecture

The crawler runs independently of the API server and proactively warms the cache.

### Crawler Types

#### 1. Wikipedia Overview Crawler

**Purpose**: Pre-fetch artist biographies

**Implementation**:
```python
import asyncio


async def crawl_wikipedia_overviews():
    # Get recently updated artists
    artists = await get_updated_artists(limit=100)

    for artist in artists:
        # Check if overview already cached
        if not await cache.exists(f"wikipedia:{artist['mbid']}"):
            # Fetch and cache overview for 30 days
            overview = await wikipedia_provider.get_artist_overview(artist['mbid'])
            await cache.set(f"wikipedia:{artist['mbid']}", overview, ttl=2592000)

        await asyncio.sleep(1)  # Rate limiting
```

#### 2. FanArt.tv Image Crawler

**Purpose**: Pre-fetch artist and album images

**Implementation**:
```python
import asyncio


async def crawl_fanart_images():
    artists = await get_updated_artists(limit=100)

    for artist in artists:
        if not await cache.exists(f"fanart:artist:{artist['mbid']}"):
            images = await fanart_provider.get_artist_images(artist['mbid'])
            await cache.set(f"fanart:artist:{artist['mbid']}", images, ttl=2592000)

        await asyncio.sleep(2)  # FanArt.tv rate limit
```

#### 3. TheAudioDB Metadata Crawler

**Purpose**: Pre-fetch fallback metadata

**Implementation**:
```python
import asyncio


async def crawl_theaudiodb_metadata():
    artists = await get_updated_artists(limit=100)

    for artist in artists:
        if not await cache.exists(f"tadb:{artist['mbid']}"):
            metadata = await theaudiodb_provider.get_artist_metadata(artist['mbid'])
            await cache.set(f"tadb:{artist['mbid']}", metadata, ttl=2592000)

        await asyncio.sleep(1)
```

#### 4. Artist Metadata Crawler

**Purpose**: Pre-fetch complete artist metadata

**Implementation**:
```python
import asyncio


async def crawl_artist_metadata():
    artists = await get_updated_artists(limit=50)

    for artist in artists:
        # Fetch complete artist metadata (triggers all providers);
        # get_artist_by_id caches the result itself
        await api.get_artist_by_id(artist['mbid'])

        await asyncio.sleep(5)  # Avoid overwhelming external APIs
```

#### 5. Album Metadata Crawler

**Purpose**: Pre-fetch album metadata for recently updated albums

**Implementation**:
```python
import asyncio


async def crawl_album_metadata():
    albums = await get_updated_albums(limit=50)

    for album in albums:
        # get_album_by_id caches the result itself
        await api.get_album_by_id(album['mbid'])
        await asyncio.sleep(5)
```
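
Crawlers 1 to 3 share one pattern: skip keys that are already cached, fetch the rest, and pause between external calls. The sketch below factors that loop into a single hypothetical helper and tests it with an in-memory cache and a zero delay. The `MemoryCache` class and its `exists`/`set` methods only imitate the cache interface used by the crawlers. This version pauses only after an actual fetch, so cache hits add no delay.

```python
import asyncio


class MemoryCache:
    """In-memory stand-in exposing the exists/set calls the crawlers use."""

    def __init__(self):
        self.data = {}

    async def exists(self, key):
        return key in self.data

    async def set(self, key, value, ttl=None):
        self.data[key] = value  # ttl ignored in this stand-in


async def warm_missing(cache, mbids, key_prefix, fetch, ttl=2592000, delay=1.0):
    """Fetch and cache values for MBIDs not already cached; return how many were fetched."""
    fetched = 0
    for mbid in mbids:
        key = f"{key_prefix}:{mbid}"
        if await cache.exists(key):
            continue
        await cache.set(key, await fetch(mbid), ttl=ttl)
        fetched += 1
        await asyncio.sleep(delay)  # throttle only after a real external call
    return fetched


async def demo():
    cache = MemoryCache()
    await cache.set("wikipedia:a", "cached overview")

    async def fake_fetch(mbid):
        return f"overview for {mbid}"

    count = await warm_missing(cache, ["a", "b", "c"], "wikipedia", fake_fetch, delay=0)
    return count, cache.data


count, data = asyncio.run(demo())
print(count)  # 2
```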
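### Crawler Scheduling

Crawlers run in repeating cycles; the interval below is one hour:

```python
# crawler.py
import asyncio
import logging

logger = logging.getLogger(__name__)

CYCLE_INTERVAL = 3600  # seconds between cycles (1 hour)


async def run_crawlers():
    while True:
        # Run all crawlers in parallel; return_exceptions=True keeps one
        # failing crawler from ending the loop
        results = await asyncio.gather(
            crawl_wikipedia_overviews(),
            crawl_fanart_images(),
            crawl_theaudiodb_metadata(),
            crawl_artist_metadata(),
            crawl_album_metadata(),
            return_exceptions=True,
        )
        for result in results:
            if isinstance(result, Exception):
                logger.error("Crawler failed: %r", result)

        # Sleep between cycles
        await asyncio.sleep(CYCLE_INTERVAL)
```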
## MusicBrainz Database Integration

### Direct Database Access

Unlike most MusicBrainz consumers, this project queries the database directly rather than using the web API.

**Advantages**:
- Complex joins and aggregations in SQL
- No web-service rate limit (the public MusicBrainz API allows roughly one request per second per client)
- Sub-second response times
- JSON aggregation in database

**Disadvantages**:
- Requires full MusicBrainz database replica (~100GB+)
- Must maintain custom indices
- Schema changes require SQL updates
### SQL Query Architecture

#### Artist Query (artist.sql)

**Purpose**: Fetch complete artist metadata with releases

**Key features**:
- JSON aggregation of releases
- Filtering by release type and status
- Link extraction
- Cover art availability check

In the MusicBrainz schema, a release group points to an `artist_credit` rather than to an artist, so release groups are reached through `artist_credit_name`. Releases and links are aggregated in separate correlated subqueries. This keeps the two one-to-many relations from multiplying each other's rows.

**Query structure** (simplified):
```sql
SELECT
    a.gid AS id,
    a.name AS artist_name,
    a.sort_name,
    a.comment AS disambiguation,  -- MusicBrainz stores disambiguation in "comment"
    atype.name AS artist_type,
    -- Aggregate the artist's release groups as a JSON array, newest first
    COALESCE((
        SELECT json_agg(
                   json_build_object(
                       'id', rg.gid,
                       'title', rg.name,
                       'type', rgpt.name,
                       -- Status belongs to individual releases, so collect distinct values
                       'statuses', (
                           SELECT json_agg(DISTINCT rs.name)
                           FROM release r
                           JOIN release_status rs ON rs.id = r.status
                           WHERE r.release_group = rg.id
                       ),
                       'date', concat_ws('-',
                                         rgm.first_release_date_year,
                                         lpad(rgm.first_release_date_month::text, 2, '0'),
                                         lpad(rgm.first_release_date_day::text, 2, '0'))
                   )
                   ORDER BY rgm.first_release_date_year DESC NULLS LAST,
                            rgm.first_release_date_month DESC NULLS LAST,
                            rgm.first_release_date_day DESC NULLS LAST
               )
        FROM artist_credit_name acn
        JOIN release_group rg ON rg.artist_credit = acn.artist_credit
        LEFT JOIN release_group_primary_type rgpt ON rgpt.id = rg.type
        LEFT JOIN release_group_meta rgm ON rgm.id = rg.id
        WHERE acn.artist = a.id
    ), '[]'::json) AS releases,
    -- Aggregate URL relationships as a JSON array
    COALESCE((
        SELECT json_agg(json_build_object('type', lt.name, 'url', u.url))
        FROM l_artist_url lau
        JOIN link l ON l.id = lau.link
        JOIN link_type lt ON lt.id = l.link_type
        JOIN url u ON u.id = lau.entity1
        WHERE lau.entity0 = a.id
    ), '[]'::json) AS links
FROM artist a
LEFT JOIN artist_type atype ON atype.id = a.type
WHERE a.gid = $1;
```
#### Album Query (album.sql)

**Purpose**: Fetch album metadata with tracks

**Key features**:
- Track listing aggregation
- Medium information (CD, Vinyl, Digital)
- Recording metadata
- Cover art availability

The artist name comes from `artist_credit.name`, which holds the full credited name. A multi-artist credit therefore produces one row rather than one per artist.

**Query structure** (simplified):
```sql
SELECT
    rg.gid AS id,
    rg.name AS title,
    ac.name AS artist_name,  -- full credited name, e.g. "Artist A feat. Artist B"
    -- Aggregate media (from every release in the group) as a JSON array
    COALESCE(
        json_agg(
            json_build_object(
                'release', r.gid,
                'position', m.position,
                'format', mf.name,
                'tracks', (
                    SELECT json_agg(
                               json_build_object(
                                   'position', t.position,
                                   'title', t.name,
                                   'duration', t.length  -- milliseconds
                               )
                               ORDER BY t.position
                           )
                    FROM track t
                    WHERE t.medium = m.id
                )
            )
            ORDER BY r.id, m.position
        ) FILTER (WHERE m.id IS NOT NULL),
        '[]'::json
    ) AS media
FROM release_group rg
JOIN artist_credit ac ON ac.id = rg.artist_credit
LEFT JOIN release r ON r.release_group = rg.id
LEFT JOIN medium m ON m.release = r.id
LEFT JOIN medium_format mf ON mf.id = m.format
WHERE rg.gid = $1
GROUP BY rg.id, ac.name;
```
#### Change Detection Queries

**updated_artists.sql**: Detects recently updated artists

**Change sources** (UNION of 5 queries):
1. Artists with updated metadata
2. Artists with new releases
3. Artists with updated releases
4. Artists with new links
5. Artists with updated cover art

**Query structure** (one row per artist, keyed by its most recent change):
```sql
-- $1: timestamp of the previous crawl, $2: maximum number of artists
SELECT gid, max(last_updated) AS last_updated
FROM (
    -- Source 1: Updated artist metadata
    SELECT a.gid, a.last_updated
    FROM artist a
    WHERE a.last_updated > $1

    UNION ALL

    -- Source 2: New or updated release groups
    SELECT a.gid, rg.last_updated
    FROM artist a
    JOIN artist_credit_name acn ON acn.artist = a.id
    JOIN release_group rg ON rg.artist_credit = acn.artist_credit
    WHERE rg.last_updated > $1

    UNION ALL

    -- Source 3: Updated releases
    SELECT a.gid, r.last_updated
    FROM artist a
    JOIN artist_credit_name acn ON acn.artist = a.id
    JOIN release_group rg ON rg.artist_credit = acn.artist_credit
    JOIN release r ON r.release_group = rg.id
    WHERE r.last_updated > $1

    UNION ALL

    -- Source 4: New links
    SELECT a.gid, lau.last_updated
    FROM artist a
    JOIN l_artist_url lau ON lau.entity0 = a.id
    WHERE lau.last_updated > $1

    UNION ALL

    -- Source 5: Updated cover art
    SELECT a.gid, caa.date_updated
    FROM artist a
    JOIN artist_credit_name acn ON acn.artist = a.id
    JOIN release_group rg ON rg.artist_credit = acn.artist_credit
    JOIN release r ON r.release_group = rg.id
    JOIN cover_art_archive.index_listing caa ON caa.release = r.id
    WHERE caa.date_updated > $1
) AS changes
GROUP BY gid
ORDER BY last_updated DESC
LIMIT $2;
```

**updated_albums.sql**: Similar structure for album change detection
### Custom Database Indices

To support efficient change detection queries:

```sql
-- Artist last_updated index
CREATE INDEX IF NOT EXISTS idx_artist_last_updated
    ON artist (last_updated DESC);

-- Release group last_updated index
CREATE INDEX IF NOT EXISTS idx_release_group_last_updated
    ON release_group (last_updated DESC);

-- Release last_updated index
CREATE INDEX IF NOT EXISTS idx_release_last_updated
    ON release (last_updated DESC);

-- Cover art date_updated index
CREATE INDEX IF NOT EXISTS idx_cover_art_date_updated
    ON cover_art_archive.index_listing (date_updated DESC);
```
## Configuration Architecture

### Metaclass-Based Configuration System

The project uses a metaclass-based configuration system that lets environment variables override class attributes, including individual keys of dictionary settings.

**Base configuration** (config.py):
```python
import os


class ConfigMeta(type):
    """Metaclass that allows environment variable overrides.

    <CLASSNAME>_<ATTR> replaces a whole attribute, and
    <CLASSNAME>_<ATTR>__<KEY> replaces one key of a dict attribute.
    Overridden values are returned as strings.
    """

    def __getattribute__(cls, name):
        value = super().__getattribute__(name)
        # Never override dunder/private names; reading cls.__name__ the
        # normal way here would re-enter this method and recurse forever
        if name.startswith('_'):
            return value

        class_name = type.__getattribute__(cls, '__name__')
        env_key = f"{class_name.upper()}_{name.upper()}"

        # Simple override, e.g. DEFAULTCONFIG_PORT=8080
        if env_key in os.environ:
            return os.environ[env_key]

        # Nested override (double underscore), e.g. DEFAULTCONFIG_DATABASE__HOST=db
        if isinstance(value, dict):
            value = dict(value)  # copy so the class default is not mutated
            for key in value:
                nested_key = f"{env_key}__{key.upper()}"
                if nested_key in os.environ:
                    value[key] = os.environ[nested_key]

        return value


class DefaultConfig(metaclass=ConfigMeta):
    # Application
    APPLICATION_ROOT = '/'
    PORT = 5001

    # Database
    DATABASE = {
        'host': 'localhost',
        'port': 5432,
        'database': 'musicbrainz',
        'user': 'abc',
        'password': 'abc',
    }

    # Cache
    CACHE = {
        'redis_url': 'redis://localhost:6379/0',
        'postgres_url': 'postgresql://abc:abc@localhost/lm_cache',
    }

    # External APIs
    FANART_API_KEY = None
    THEAUDIODB_API_KEY = '1'
    SPOTIFY_CLIENT_ID = None
    SPOTIFY_CLIENT_SECRET = None
    LASTFM_API_KEY = None

    # Cloudflare
    CLOUDFLARE_ZONE_ID = None
    CLOUDFLARE_API_TOKEN = None

    # Monitoring
    SENTRY_DSN = None
    STATSD_HOST = None
    STATSD_PORT = 8125
```
**Environment variable override examples**:
```bash
# Simple override
export DEFAULTCONFIG_PORT=8080

# Nested override (double underscore)
export DEFAULTCONFIG_DATABASE__HOST=musicbrainz-db
export DEFAULTCONFIG_CACHE__REDIS_URL=redis://redis:6379/1

# Select configuration class
export LIDARR_METADATA_CONFIG=lidarrmetadata.config.ProductionConfig
```
### Configuration Classes

**DevelopmentConfig**:
- Debug logging enabled
- Local database connections
- No Sentry
- No rate limiting

**TestConfig**:
- In-memory SQLite for cache
- Mock external APIs
- Synchronous execution for deterministic tests

**ProductionConfig**:
- Container-based service discovery
- Sentry enabled
- Redis rate limiting
- Cloudflare CDN integration
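
`LIDARR_METADATA_CONFIG` names the configuration class as a dotted path. The sketch below resolves such a path with `importlib`. The function name and the default path are assumptions. The demo resolves a standard-library class so that it runs outside the project.

```python
import importlib
import os


def load_config_class(default="lidarrmetadata.config.DefaultConfig"):
    """Import and return the class named by LIDARR_METADATA_CONFIG (or the default)."""
    dotted_path = os.environ.get("LIDARR_METADATA_CONFIG", default)
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


os.environ["LIDARR_METADATA_CONFIG"] = "collections.OrderedDict"  # stand-in class
print(load_config_class())  # <class 'collections.OrderedDict'>
```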
## Error Handling and Resilience

### Provider Timeout Handling

Each provider has configurable timeouts (in seconds):

```python
class ProviderConfig:
    MUSICBRAINZ_TIMEOUT = 30  # Complex SQL queries
    SOLR_TIMEOUT = 5          # Search queries
    FANART_TIMEOUT = 10       # Image API
    THEAUDIODB_TIMEOUT = 10   # Metadata API
    WIKIPEDIA_TIMEOUT = 2     # Per-request
    SPOTIFY_TIMEOUT = 5       # OAuth + API
```
### Fallback Chain Implementation

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def get_artist_images(self, mbid):
    # Providers in priority order
    providers = [
        (self.fanart, "FanArt.tv"),
        (self.theaudiodb, "TheAudioDB"),
        (self.cover_art_archive, "Cover Art Archive"),
    ]

    for provider, name in providers:
        try:
            images = await asyncio.wait_for(
                provider.get_artist_images(mbid),
                timeout=provider.timeout,
            )
            if images:
                logger.info("Got images from %s", name)
                return images
            # An empty result also falls through to the next provider
        except asyncio.TimeoutError:
            logger.warning("%s timeout, trying next provider", name)
        except Exception as e:
            logger.error("%s error: %s, trying next provider", name, e)

    return []  # No images available
```
### Graceful Degradation

When external services fail, the API returns partial data:

```python
{
    "Id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
    "ArtistName": "Nirvana",
    "Overview": None,  # Wikipedia unavailable (null in the JSON response)
    "Images": [],      # FanArt.tv and TheAudioDB unavailable
    "Links": [...],    # MusicBrainz links still available
    "Albums": [...],   # Core data from MusicBrainz DB
}
```
## Scalability Considerations

### Horizontal Scaling

The API processes keep no local state (shared state lives in Redis and PostgreSQL), so they can be scaled horizontally:

```yaml
# docker-compose.prod.yml
services:
  api-v0.3:
    image: ghcr.io/lidarr/lidarrapi.metadata:v0.3
    deploy:
      replicas: 4
    environment:
      - CACHE__REDIS_URL=redis://redis:6379/0
```
### Database Connection Pooling

asyncpg connection pool configuration:

```python
import asyncpg

# Inside an async startup function
pool = await asyncpg.create_pool(
    host=config.DATABASE['host'],
    port=config.DATABASE['port'],
    database=config.DATABASE['database'],
    user=config.DATABASE['user'],
    password=config.DATABASE['password'],
    min_size=10,
    max_size=50,
    command_timeout=30,  # seconds, same as MUSICBRAINZ_TIMEOUT
)
```
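### Redis Connection Pooling

aioredis connection pool (aioredis 1.x API):

```python
import aioredis

# Inside an async startup function
redis = await aioredis.create_redis_pool(
    config.CACHE['redis_url'],
    minsize=5,
    maxsize=20,
    # No `encoding` argument: cached values are zlib-compressed bytes, and a
    # text encoding would make the client try to decode them as UTF-8
)
```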
### Rate Limiting Architecture

Three rate limiter implementations:

#### NullRateLimiter (Default)

No rate limiting, maximum throughput.

#### SimpleRateLimiter

In-memory sliding-window rate limiting, using a queue of recent request timestamps:

```python
import asyncio
import time
from collections import deque


class SimpleRateLimiter:
    """Allow at most max_requests acquire() calls per time_window seconds."""

    def __init__(self, max_requests, time_window):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = deque()     # timestamps of recent requests, oldest first
        self.lock = asyncio.Lock()  # serialize concurrent acquirers

    async def acquire(self):
        async with self.lock:
            while True:
                now = time.monotonic()
                # Remove requests that have left the window
                while self.requests and self.requests[0] <= now - self.time_window:
                    self.requests.popleft()

                if len(self.requests) < self.max_requests:
                    self.requests.append(now)
                    return

                # Wait until the oldest request leaves the window, then re-check
                await asyncio.sleep(self.requests[0] + self.time_window - now)
```
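#### RedisRateLimiter

Distributed fixed-window rate limiting across multiple API instances:

```python
import time


class RateLimitExceeded(Exception):
    pass


class RedisRateLimiter:
    def __init__(self, redis, max_requests, time_window):
        self.redis = redis
        self.max_requests = max_requests
        self.time_window = time_window  # seconds

    async def acquire(self, key):
        now = time.time()  # wall clock, so all instances agree on window boundaries
        window_key = f"ratelimit:{key}:{int(now / self.time_window)}"

        count = await self.redis.incr(window_key)
        if count == 1:
            # First request in this window: expire the counter with the window
            await self.redis.expire(window_key, self.time_window)

        if count > self.max_requests:
            raise RateLimitExceeded(f"Rate limit exceeded for {key}")
```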
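
Because windows are aligned to multiples of `time_window`, a burst that straddles a boundary can get up to `2 × max_requests` calls through in a short span. A sliding window such as `SimpleRateLimiter` avoids this at the cost of storing timestamps. The sketch below isolates the window arithmetic in a hypothetical helper to make the boundary visible:

```python
def window_key(key: str, now: float, time_window: int) -> str:
    """Fixed-window counter key, computed the same way as in RedisRateLimiter."""
    return f"ratelimit:{key}:{int(now / time_window)}"


# With a 60-second window, requests 0.2 s apart can land in different windows
print(window_key("fanart", 119.9, 60))  # ratelimit:fanart:1
print(window_key("fanart", 120.1, 60))  # ratelimit:fanart:2
```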
## Conclusion

The architecture combines several patterns:

1. **Mixin-based provider composition**: Flexible, testable, extensible
2. **Multi-tier caching**: Redis (hot) + PostgreSQL (persistent) + CDN (edge)
3. **Direct database access**: Complex SQL aggregations for performance
4. **Async-first design**: Quart + asyncpg + aioredis for high concurrency
5. **Fallback chains**: Graceful degradation when external services fail
6. **Background crawling**: Proactive cache warming
7. **Change detection**: Efficient invalidation based on upstream updates

The mixin architecture builds providers from small capability classes rather than a deep single-inheritance hierarchy, which makes testing and mocking straightforward.

The three-tier caching strategy with compression is designed to keep hit rates high while keeping storage costs reasonable, and the crawler re-warms entries for recently updated artists and albums before clients request them.

Direct MusicBrainz database access with JSON-aggregating SQL is a key performance optimization that would be difficult to replicate with the rate-limited web API.