metadata-agregator/docs/research/lidarr-metadata-api/analysis/ARCHITECTURE.md
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


# Lidarr Metadata API - Architecture
## Architectural Overview
LidarrAPI.Metadata implements a layered architecture with clear separation of concerns:
```
┌─────────────────────────────────────────────────────────────┐
│ Cloudflare CDN │
│ (Edge Cache Layer) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Quart Application │
│ (app.py - Routes) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ API Layer │
│ (api.py - Business Logic) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Provider Layer │
│ (provider.py - Mixin Architecture) │
│ ┌──────────────┬──────────────┬──────────────────────┐ │
│ │ MusicBrainz │ Solr Search │ External Providers │ │
│ │ DB Direct │ (Artist/ │ (FanArt, TheAudioDB, │ │
│ │ │ Album) │ Wikipedia, Spotify) │ │
│ └──────────────┴──────────────┴──────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Cache Layer │
│ (cache.py - Multi-Tier Caching) │
│ ┌──────────────┬──────────────┬──────────────────────┐ │
│ │ Redis │ PostgreSQL │ Compression │ │
│ │ (Ephemeral) │ (Persistent) │ (zlib pickle) │ │
│ └──────────────┴──────────────┴──────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Data Sources │
│ ┌──────────────┬──────────────┬──────────────────────┐ │
│ │ MusicBrainz │ Solr │ External APIs │ │
│ │ PostgreSQL │ (Search) │ (15+ integrations) │ │
│ └──────────────┴──────────────┴──────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
## Mixin-Based Provider Architecture
The core architectural pattern is a mixin-based provider system that allows flexible composition of data source capabilities.
### Provider Mixin Hierarchy
```python
# Base capability mixins
class ArtistByIdMixin:
    async def get_artist_by_id(self, mbid: str) -> dict:
        raise NotImplementedError


class ArtistNameSearchMixin:
    async def search_artist_name(self, query: str) -> list:
        raise NotImplementedError


class AlbumByIdMixin:
    async def get_album_by_id(self, mbid: str) -> dict:
        raise NotImplementedError


class AlbumNameSearchMixin:
    async def search_album_name(self, query: str) -> list:
        raise NotImplementedError


class ArtistOverviewMixin:
    async def get_artist_overview(self, mbid: str) -> str:
        raise NotImplementedError


class ArtistImagesMixin:
    async def get_artist_images(self, mbid: str) -> list:
        raise NotImplementedError


class AlbumImagesMixin:
    async def get_album_images(self, mbid: str) -> list:
        raise NotImplementedError


class ArtistLinksMixin:
    async def get_artist_links(self, mbid: str) -> list:
        raise NotImplementedError
```
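One way to use capability mixins like these is to discover providers with `isinstance`. The sketch below is illustrative only: `InMemoryArtistProvider`, `providers_with`, and the sample data are hypothetical names introduced here, not part of LidarrAPI.Metadata.

```python
import asyncio


class ArtistByIdMixin:
    async def get_artist_by_id(self, mbid: str) -> dict:
        raise NotImplementedError


class ArtistLinksMixin:
    async def get_artist_links(self, mbid: str) -> list:
        raise NotImplementedError


class InMemoryArtistProvider(ArtistByIdMixin):
    """Hypothetical provider that serves artists from a dict."""

    def __init__(self, artists: dict):
        self._artists = artists

    async def get_artist_by_id(self, mbid: str) -> dict:
        return self._artists[mbid]


def providers_with(capability: type, providers: list) -> list:
    """Return the providers that implement the given capability mixin."""
    return [p for p in providers if isinstance(p, capability)]


providers = [InMemoryArtistProvider({"abc": {"ArtistName": "Nirvana"}})]
artist_sources = providers_with(ArtistByIdMixin, providers)
link_sources = providers_with(ArtistLinksMixin, providers)
artist = asyncio.run(artist_sources[0].get_artist_by_id("abc"))
```

Because each capability is its own class, a test double only has to implement the mixins the code under test actually calls.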
### Provider Implementations
Each provider implements one or more mixins based on its capabilities:
#### MusicbrainzDbProvider
**Mixins**: `ArtistByIdMixin`, `AlbumByIdMixin`
**Purpose**: Authoritative source for core music metadata
**Implementation**:
- Direct asyncpg connection to MusicBrainz PostgreSQL database
- Complex SQL queries with JSON aggregation (`row_to_json`, `json_agg`)
- Read-only access to replicated database
- Custom indices on `last_updated` columns for change detection
**SQL files**:
- `lidarrmetadata/sql/artist.sql`: Artist metadata with releases
- `lidarrmetadata/sql/album.sql`: Album metadata with tracks
- `lidarrmetadata/sql/updated_artists.sql`: Change detection query
- `lidarrmetadata/sql/updated_albums.sql`: Change detection query
**Key tables accessed**:
- `artist`: Core artist data
- `release_group`: Album groupings
- `release`: Specific releases
- `medium`: Physical/digital media
- `track`: Track listings
- `recording`: Recording metadata
- `url`: External links
- `l_artist_url`: Artist-URL relationships
- `cover_art_archive.index_listing`: Cover art availability
#### SolrSearchProvider
**Mixins**: `ArtistNameSearchMixin`, `AlbumNameSearchMixin`
**Purpose**: Full-text search for artist and album discovery
**Implementation**:
- Async HTTP client to Solr REST API
- Two cores: `artist` and `release-group`
- Dismax query parser for relevance ranking
- 5-second timeout per query
- Real-time index updates via RabbitMQ + SIR (Search Index Rebuilder)
**Query structure**:
```python
{
    "query": query_string,
    "limit": 10,
    "params": {
        "defType": "dismax",
        "qf": "artist^2 sortname alias",
        "mm": "1"
    }
}
```
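As a rough illustration, the request body above could be produced by a small helper. The function name `build_artist_query` and its `limit` parameter are assumptions made for this sketch; the field values are the ones listed above.

```python
def build_artist_query(query_string: str, limit: int = 10) -> dict:
    """Build a Solr JSON request body for an artist name search."""
    return {
        "query": query_string,
        "limit": limit,
        "params": {
            "defType": "dismax",              # DisMax query parser
            "qf": "artist^2 sortname alias",  # artist field is boosted by 2
            "mm": "1",                        # at least one clause must match
        },
    }


body = build_artist_query("nirvana")
```

Keeping the body in one place makes it easy to reuse the same parser settings for a second core with a different `qf` value.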
#### FanArtTvProvider
**Mixins**: `ArtistImagesMixin`, `AlbumImagesMixin`
**Purpose**: High-quality fan art and promotional images
**Implementation**:
- REST API with API key authentication
- 7-day lag for free API keys (personal keys have no lag)
- 30-day cache TTL
- Image types: `poster`, `banner`, `logo`, `fanart`, `cover`
- Fallback to Cover Art Archive if unavailable
**API endpoints**:
- `https://webservice.fanart.tv/v3/music/{mbid}`
- `https://webservice.fanart.tv/v3/music/albums/{mbid}`
#### TheAudioDbProvider
**Mixins**: `ArtistOverviewMixin`, `ArtistImagesMixin`, `AlbumImagesMixin`, `ArtistLinksMixin`
**Purpose**: Fallback provider for images and metadata
**Implementation**:
- REST API with API key "1" (public key)
- 10-second timeout
- Used as fallback when FanArt.tv or Wikipedia unavailable
- Provides artist biographies, images, and social media links
**API endpoints**:
- `https://theaudiodb.com/api/v1/json/1/artist-mb.php?i={mbid}`
- `https://theaudiodb.com/api/v1/json/1/album-mb.php?i={mbid}`
#### WikipediaProvider
**Mixins**: `ArtistOverviewMixin`
**Purpose**: Artist biographical information
**Implementation**:
- Multi-stage lookup: MusicBrainz → Wikidata → Wikipedia
- 32-language fallback chain (en, fr, de, es, it, ja, zh, ru, pt, nl, sv, fi, no, da, pl, cs, hu, ro, tr, el, he, ar, fa, hi, th, ko, vi, id, ms, tl, bn, ta)
- BeautifulSoup HTML parsing
- 2-second timeout per request
- 1 connection per host limit
- Extracts first paragraph as overview
**Lookup flow**:
```
MusicBrainz MBID → Wikidata entity → Wikipedia article → Extract summary
```
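The language fallback step can be sketched as picking the first language in the chain that has a sitelink on the Wikidata entity. This is a minimal sketch, not the project's code: `pick_article` is a hypothetical helper, the `LANGUAGES` list holds only the first few entries of the chain, and the input is assumed to be shaped like the `sitelinks` object of a Wikidata entity.

```python
from typing import Optional, Tuple

# First entries of the fallback chain listed above; extend as needed.
LANGUAGES = ["en", "fr", "de", "es", "it", "ja"]


def pick_article(sitelinks: dict, languages: list = LANGUAGES) -> Optional[Tuple[str, str]]:
    """Return (language, title) for the first language that has an article."""
    for lang in languages:
        link = sitelinks.get(f"{lang}wiki")
        if link:
            return lang, link["title"]
    return None


# Sample data: no English article, so the chain falls through to French.
sitelinks = {
    "dewiki": {"site": "dewiki", "title": "Nirvana (US-amerikanische Band)"},
    "frwiki": {"site": "frwiki", "title": "Nirvana (groupe)"},
}
choice = pick_article(sitelinks)
```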
#### SpotifyProvider
**Mixins**: `ArtistByIdMixin`, `AlbumByIdMixin`, `ArtistLinksMixin`
**Purpose**: Spotify ID mapping and cross-platform linking
**Implementation**:
- spotipy library with OAuth authentication
- Levenshtein similarity matching (0.8 threshold) for name-based lookups
- Provides Spotify URIs for deep linking
- Used for chart data correlation
**Authentication flow**:
- Client credentials OAuth
- Token refresh on expiration
- Tokens cached in Redis
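The name matching can be illustrated with a normalized edit-distance score. The normalization below (one minus distance divided by the longer length, after case folding) is an assumption; the project may compute its 0.8 threshold differently, for example through a library ratio function. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to a 0..1 score (1.0 means identical)."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def names_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """True when the two names are at least `threshold` similar."""
    return similarity(a, b) >= threshold
```

With this normalization, a single-character typo in a short name can fall below the threshold, which is one reason to prefer ID-based lookups when an ID is available.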
## Layer Responsibilities
### 1. Quart Application Layer (app.py)
**Responsibilities**:
- HTTP request routing
- Request parameter validation
- Response serialization
- Cache-Control header management
- Error handling and HTTP status codes
**Key routes**:
```python
from quart import Quart

app = Quart(__name__)

@app.route('/')
async def root():
    # Health check and version info
    ...

@app.route('/artist/<mbid>')
async def get_artist(mbid):
    # Artist metadata endpoint
    ...

@app.route('/artist/<mbid>/refresh', methods=['POST'])
async def refresh_artist(mbid):
    # Cache invalidation endpoint
    ...

@app.route('/search/artist')
async def search_artist():
    # Artist search endpoint
    ...

@app.route('/chart/<name>/<type>/<selection>')
async def get_chart(name, type, selection):
    # Chart data endpoint
    ...
```
**Request flow**:
1. Parse and validate request parameters
2. Call API layer method
3. Set Cache-Control headers based on response metadata
4. Serialize response to JSON
5. Return HTTP response
### 2. API Layer (api.py)
**Responsibilities**:
- Business logic orchestration
- Provider coordination
- Data aggregation from multiple sources
- Cache management
- Response formatting
**Key methods**:
```python
async def get_artist_by_id(mbid, prim_types, sec_types, release_statuses):
    # 1. Check cache
    # 2. Query MusicBrainz DB
    # 3. Parallel fetch: images, overview, links
    # 4. Aggregate data
    # 5. Cache result
    # 6. Return formatted response
    ...


async def search_artist(query):
    # 1. Check cache
    # 2. Query Solr
    # 3. Enrich results with cached metadata
    # 4. Cache search results
    # 5. Return formatted response
    ...
```
**Parallel fetching pattern**:
```python
# Fetch multiple data sources concurrently
images_task = asyncio.create_task(fanart_provider.get_artist_images(mbid))
overview_task = asyncio.create_task(wikipedia_provider.get_artist_overview(mbid))
links_task = asyncio.create_task(spotify_provider.get_artist_links(mbid))
images = await images_task
overview = await overview_task
links = await links_task
```
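A variant of this pattern uses `asyncio.gather` with `return_exceptions=True`, so that one failing source yields partial data instead of an exception. The sketch below uses stand-in coroutines (`fetch_images`, `fetch_overview`, `fetch_links`) and placeholder data; none of these names come from the project.

```python
import asyncio


async def fetch_images(mbid):      # stand-in for a FanArt.tv call
    return [{"CoverType": "Poster", "Url": "https://example.invalid/p.jpg"}]


async def fetch_overview(mbid):    # stand-in for a Wikipedia call that fails
    raise TimeoutError("wikipedia timed out")


async def fetch_links(mbid):       # stand-in for a Spotify call
    return [{"type": "spotify", "target": "spotify:artist:example"}]


async def fetch_enrichment(mbid):
    # return_exceptions=True hands failures back as values instead of
    # raising the first one, so the other results are still usable.
    images, overview, links = await asyncio.gather(
        fetch_images(mbid), fetch_overview(mbid), fetch_links(mbid),
        return_exceptions=True,
    )
    return {
        "Images": [] if isinstance(images, Exception) else images,
        "Overview": None if isinstance(overview, Exception) else overview,
        "Links": [] if isinstance(links, Exception) else links,
    }


result = asyncio.run(fetch_enrichment("5b11f4ce-a62d-471e-81fc-a69a8278c7da"))
```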
### 3. Provider Layer (provider.py)
**Responsibilities**:
- Data source abstraction
- External API communication
- Error handling and retries
- Timeout management
- Data transformation to common format
**Provider composition**:
```python
class MetadataProvider(
    MusicbrainzDbProvider,
    SolrSearchProvider,
    FanArtTvProvider,
    TheAudioDbProvider,
    WikipediaProvider,
    SpotifyProvider
):
    """Composite provider with all capabilities"""
    pass
```
**Fallback chain example**:
```python
async def get_artist_images(self, mbid):
    try:
        # Primary: FanArt.tv
        return await self.fanart.get_artist_images(mbid)
    except Exception:
        pass  # Timeout, HTTP error, or anything else: try the next source
    try:
        # Fallback: TheAudioDB
        return await self.theaudiodb.get_artist_images(mbid)
    except Exception:
        # Last resort: Cover Art Archive
        return await self.cover_art_archive.get_artist_images(mbid)
```
### 4. Cache Layer (cache.py)
**Responsibilities**:
- Multi-tier cache management
- Cache key generation
- TTL management
- Compression/decompression
- Cache invalidation
- Statistics tracking
**Cache tiers**:
#### Tier 1: Redis (Ephemeral)
**Configuration**:
- Namespace: `lm3.7`
- Memory limit: 512MB
- Eviction policy: LFU (Least Frequently Used)
- Default TTL: 7 days
**Use cases**:
- Hot data (frequently accessed artists/albums)
- Rate limiter state
- Sentry deduplication
- Invalidation locks
**Implementation**:
```python
import pickle
import zlib


class RedisCache:
    async def get(self, key):
        value = await self.redis.get(f"lm3.7:{key}")
        if value:
            return pickle.loads(zlib.decompress(value))
        return None

    async def set(self, key, value, ttl=604800):  # 7 days
        compressed = zlib.compress(pickle.dumps(value))
        await self.redis.setex(f"lm3.7:{key}", ttl, compressed)
```
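The serialization used by both cache tiers can be exercised without Redis. This is a minimal sketch of the pickle-then-compress round trip; the helper names `cache_key`, `encode`, and `decode` are hypothetical, and the sample artist is made up.

```python
import pickle
import zlib

NAMESPACE = "lm3.7"  # Redis key prefix from the configuration above


def cache_key(key: str) -> str:
    """Prefix a key with the cache namespace."""
    return f"{NAMESPACE}:{key}"


def encode(value) -> bytes:
    """Pickle, then compress, in the same order as set() above."""
    return zlib.compress(pickle.dumps(value))


def decode(blob: bytes):
    """Decompress, then unpickle. Only use on trusted cache contents."""
    return pickle.loads(zlib.decompress(blob))


artist = {
    "ArtistName": "Nirvana",
    "Albums": [{"Title": f"Album {i}"} for i in range(50)],
}
blob = encode(artist)
restored = decode(blob)
```

Unpickling executes code embedded in the payload, so this format is only appropriate when nothing untrusted can write to the cache.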
#### Tier 2: PostgreSQL (Persistent)
**Schema**:
```sql
CREATE TABLE IF NOT EXISTS {cache_name} (
    key VARCHAR(255) PRIMARY KEY,
    expires TIMESTAMP,
    updated TIMESTAMP DEFAULT NOW(),
    value BYTEA -- zlib compressed pickle
);

CREATE TRIGGER update_timestamp
    BEFORE UPDATE ON {cache_name}
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_column();
```
**Auto-created tables**:
- `artist`: Artist metadata cache
- `album`: Album metadata cache
- `spotify`: Spotify lookup cache
- `fanart`: FanArt.tv image cache
- `tadb`: TheAudioDB metadata cache
- `wikipedia`: Wikipedia overview cache
**Use cases**:
- Long-term storage for all metadata
- Fallback when Redis evicts data
- Historical data for analytics
- Compressed storage (10:1 ratio typical)
**Implementation**:
```python
import pickle
import zlib
from datetime import datetime, timedelta


class PostgresCache:
    async def get(self, key):
        row = await self.conn.fetchrow(
            f"SELECT value, expires FROM {self.table} WHERE key = $1",
            key
        )
        # A NULL expires column means the entry never expires
        if row and (not row['expires'] or row['expires'] > datetime.now()):
            return pickle.loads(zlib.decompress(row['value']))
        return None

    async def set(self, key, value, ttl=None):
        compressed = zlib.compress(pickle.dumps(value))
        expires = datetime.now() + timedelta(seconds=ttl) if ttl else None
        await self.conn.execute(
            f"""
            INSERT INTO {self.table} (key, value, expires)
            VALUES ($1, $2, $3)
            ON CONFLICT (key) DO UPDATE
            SET value = $2, expires = $3, updated = NOW()
            """,
            key, compressed, expires
        )
```
#### Tier 3: Cloudflare CDN (Edge)
**Configuration**:
- Cache-Control header: `s-maxage=2592000, max-age=0`
- Programmatic purge via Cloudflare API
- Batch purge: 30 URLs per request
**Use cases**:
- Global edge caching for popular artists/albums
- Reduced origin load
- Low-latency responses worldwide
**Cache invalidation**:
```python
async def invalidate_cdn_cache(urls):
    # Batch URLs into groups of 30
    for batch in chunks(urls, 30):
        await cloudflare_client.purge_cache(batch)
```
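The `chunks` helper is not shown in this document. A plausible definition, offered only as a sketch, slices the URL list into consecutive batches; the example URLs are placeholders.

```python
from typing import Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


urls = [f"https://api.example.invalid/artist/{i}" for i in range(65)]
batch_sizes = [len(batch) for batch in chunks(urls, 30)]
```

With 65 URLs and a batch size of 30, this produces three purge requests, the last one carrying the remaining 5 URLs.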
## Data Flow Patterns
### Artist Metadata Request Flow
```
1. Client → GET /artist/{mbid}
2. app.py → Validate MBID format
3. api.py → Check Redis cache
4. [CACHE MISS] → Check PostgreSQL cache
5. [CACHE MISS] → Query MusicBrainz DB (artist.sql)
6. api.py → Parallel fetch:
   - FanArt.tv images
   - Wikipedia overview
   - Spotify links
   - TheAudioDB metadata (fallback)
7. api.py → Aggregate data into Artist object
8. api.py → Store in PostgreSQL cache
9. api.py → Store in Redis cache
10. app.py → Set Cache-Control headers
11. app.py → Return JSON response
12. Cloudflare → Cache at edge
```
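Steps 3 to 9 of this flow amount to a read-through lookup across two tiers. The sketch below models both tiers with an in-memory `DictCache` stand-in; `get_artist`, `fake_fetch`, and `demo` are hypothetical names. Copying a PostgreSQL hit back into Redis is an assumption made here, since the flow above does not say whether the project does this.

```python
import asyncio


class DictCache:
    """In-memory stand-in for a cache tier (Redis or PostgreSQL)."""

    def __init__(self):
        self.data = {}

    async def get(self, key):
        return self.data.get(key)

    async def set(self, key, value):
        self.data[key] = value


async def get_artist(mbid, redis_cache, postgres_cache, fetch_fresh):
    key = f"artist:{mbid}"
    cached = await redis_cache.get(key)         # step 3
    if cached is not None:
        return cached
    cached = await postgres_cache.get(key)      # step 4
    if cached is not None:
        await redis_cache.set(key, cached)      # assumed: refill the hot tier
        return cached
    artist = await fetch_fresh(mbid)            # steps 5 to 7
    await postgres_cache.set(key, artist)       # step 8
    await redis_cache.set(key, artist)          # step 9
    return artist


calls = []


async def fake_fetch(mbid):
    calls.append(mbid)
    return {"Id": mbid, "ArtistName": "Nirvana"}


async def demo():
    redis_cache, postgres_cache = DictCache(), DictCache()
    first = await get_artist("abc", redis_cache, postgres_cache, fake_fetch)
    redis_cache.data.clear()  # simulate Redis evicting the entry
    second = await get_artist("abc", redis_cache, postgres_cache, fake_fetch)
    return first, second, redis_cache.data


first, second, hot = asyncio.run(demo())
```

After the simulated eviction, the second request is served from the persistent tier and the upstream fetch runs only once.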
### Search Request Flow
```
1. Client → GET /search/artist?query=nirvana
2. app.py → Validate query parameter
3. api.py → Check Redis cache (key: search:artist:nirvana)
4. [CACHE MISS] → Query Solr artist core
5. Solr → Return list of MBIDs with scores
6. api.py → For each result:
   - Check cache for artist metadata
   - [CACHE MISS] → Fetch from MusicBrainz DB
7. api.py → Aggregate search results
8. api.py → Cache search results (TTL: 1 hour)
9. app.py → Return JSON response
```
### Cache Invalidation Flow
```
1. Crawler → Detect updated artist (updated_artists.sql)
2. Crawler → POST /artist/{mbid}/refresh
3. api.py → Verify INVALIDATE_APIKEY
4. api.py → Delete from Redis cache
5. api.py → Delete from PostgreSQL cache
6. api.py → Purge Cloudflare CDN cache
7. api.py → Return 200 OK
8. Next request → Cache miss → Fresh fetch
```
## Background Crawler Architecture
The crawler runs independently of the API server and proactively warms the cache.
### Crawler Types
#### 1. Wikipedia Overview Crawler
**Purpose**: Pre-fetch artist biographies
**Implementation**:
```python
async def crawl_wikipedia_overviews():
    # Get recently updated artists
    artists = await get_updated_artists(limit=100)
    for artist in artists:
        # Check if overview already cached
        if not await cache.exists(f"wikipedia:{artist['mbid']}"):
            # Fetch and cache overview
            overview = await wikipedia_provider.get_artist_overview(artist['mbid'])
            await cache.set(f"wikipedia:{artist['mbid']}", overview, ttl=2592000)
            await asyncio.sleep(1)  # Rate limiting
```
#### 2. FanArt.tv Image Crawler
**Purpose**: Pre-fetch artist and album images
**Implementation**:
```python
async def crawl_fanart_images():
    artists = await get_updated_artists(limit=100)
    for artist in artists:
        if not await cache.exists(f"fanart:artist:{artist['mbid']}"):
            images = await fanart_provider.get_artist_images(artist['mbid'])
            await cache.set(f"fanart:artist:{artist['mbid']}", images, ttl=2592000)
            await asyncio.sleep(2)  # FanArt.tv rate limit
```
#### 3. TheAudioDB Metadata Crawler
**Purpose**: Pre-fetch fallback metadata
**Implementation**:
```python
async def crawl_theaudiodb_metadata():
    artists = await get_updated_artists(limit=100)
    for artist in artists:
        if not await cache.exists(f"tadb:{artist['mbid']}"):
            metadata = await theaudiodb_provider.get_artist_metadata(artist['mbid'])
            await cache.set(f"tadb:{artist['mbid']}", metadata, ttl=2592000)
            await asyncio.sleep(1)
```
#### 4. Artist Metadata Crawler
**Purpose**: Pre-fetch complete artist metadata
**Implementation**:
```python
async def crawl_artist_metadata():
    artists = await get_updated_artists(limit=50)
    for artist in artists:
        # Fetch complete artist metadata (triggers all providers)
        metadata = await api.get_artist_by_id(artist['mbid'])
        # Already cached by get_artist_by_id
        await asyncio.sleep(5)  # Avoid overwhelming external APIs
```
#### 5. Album Metadata Crawler
**Purpose**: Pre-fetch album metadata for recently updated albums
**Implementation**:
```python
async def crawl_album_metadata():
    albums = await get_updated_albums(limit=50)
    for album in albums:
        metadata = await api.get_album_by_id(album['mbid'])
        await asyncio.sleep(5)
```
### Crawler Scheduling
Crawlers run on configurable schedules:
```python
# crawler.py
async def run_crawlers():
    while True:
        # Run all crawlers in parallel
        await asyncio.gather(
            crawl_wikipedia_overviews(),
            crawl_fanart_images(),
            crawl_theaudiodb_metadata(),
            crawl_artist_metadata(),
            crawl_album_metadata()
        )
        # Sleep between cycles
        await asyncio.sleep(3600)  # 1 hour
```
## MusicBrainz Database Integration
### Direct Database Access
Unlike most MusicBrainz consumers, this project queries the database directly rather than using the web API.
**Advantages**:
- Complex joins and aggregations in SQL
- No rate limiting
- Sub-second response times
- JSON aggregation in database
**Disadvantages**:
- Requires full MusicBrainz database replica (~100GB+)
- Must maintain custom indices
- Schema changes require SQL updates
### SQL Query Architecture
#### Artist Query (artist.sql)
**Purpose**: Fetch complete artist metadata with releases
**Key features**:
- JSON aggregation of releases
- Filtering by release type and status
- Link extraction
- Cover art availability check
**Query structure**:
```sql
SELECT
    a.gid AS id,
    a.name AS artist_name,
    a.sort_name,
    a.disambiguation,
    a.type AS artist_type,
    -- Aggregate releases as JSON array
    COALESCE(
        json_agg(
            json_build_object(
                'id', rg.gid,
                'title', rg.name,
                'type', rgt.name,
                'status', rs.name,
                'date', rd.date_year || '-' || rd.date_month || '-' || rd.date_day
            )
            ORDER BY rd.date_year DESC, rd.date_month DESC, rd.date_day DESC
        ) FILTER (WHERE rg.id IS NOT NULL),
        '[]'::json
    ) AS releases,
    -- Aggregate links as JSON array
    COALESCE(
        json_agg(
            json_build_object(
                'type', lt.name,
                'url', u.url
            )
        ) FILTER (WHERE u.id IS NOT NULL),
        '[]'::json
    ) AS links
FROM artist a
LEFT JOIN release_group rg ON rg.artist_credit = a.id
LEFT JOIN release_group_type rgt ON rg.type = rgt.id
LEFT JOIN release r ON r.release_group = rg.id
LEFT JOIN release_status rs ON r.status = rs.id
LEFT JOIN l_artist_url lau ON lau.entity0 = a.id
LEFT JOIN url u ON lau.entity1 = u.id
LEFT JOIN link_type lt ON lau.link = lt.id
WHERE a.gid = $1
GROUP BY a.id;
```
#### Album Query (album.sql)
**Purpose**: Fetch album metadata with tracks
**Key features**:
- Track listing aggregation
- Medium information (CD, Vinyl, Digital)
- Recording metadata
- Cover art availability
**Query structure**:
```sql
SELECT
    rg.gid AS id,
    rg.name AS title,
    a.name AS artist_name,
    -- Aggregate media as JSON array
    COALESCE(
        json_agg(
            json_build_object(
                'position', m.position,
                'format', mf.name,
                'tracks', (
                    SELECT json_agg(
                        json_build_object(
                            'position', t.position,
                            'title', t.name,
                            'duration', t.length
                        )
                        ORDER BY t.position
                    )
                    FROM track t
                    WHERE t.medium = m.id
                )
            )
            ORDER BY m.position
        ) FILTER (WHERE m.id IS NOT NULL),
        '[]'::json
    ) AS media
FROM release_group rg
JOIN artist_credit ac ON rg.artist_credit = ac.id
JOIN artist a ON ac.id = a.id
LEFT JOIN release r ON r.release_group = rg.id
LEFT JOIN medium m ON m.release = r.id
LEFT JOIN medium_format mf ON m.format = mf.id
WHERE rg.gid = $1
GROUP BY rg.id, a.name;
```
#### Change Detection Queries
**updated_artists.sql**: Detects recently updated artists
**Change sources** (UNION of 5 queries):
1. Artists with updated metadata
2. Artists with new releases
3. Artists with updated releases
4. Artists with new links
5. Artists with updated cover art
**Query structure**:
```sql
-- Source 1: Updated artist metadata
SELECT DISTINCT a.gid, a.last_updated
FROM artist a
WHERE a.last_updated > $1
UNION
-- Source 2: New releases
SELECT DISTINCT a.gid, rg.last_updated
FROM artist a
JOIN release_group rg ON rg.artist_credit = a.id
WHERE rg.last_updated > $1
UNION
-- Source 3: Updated releases
SELECT DISTINCT a.gid, r.last_updated
FROM artist a
JOIN release_group rg ON rg.artist_credit = a.id
JOIN release r ON r.release_group = rg.id
WHERE r.last_updated > $1
UNION
-- Source 4: New links
SELECT DISTINCT a.gid, lau.last_updated
FROM artist a
JOIN l_artist_url lau ON lau.entity0 = a.id
WHERE lau.last_updated > $1
UNION
-- Source 5: Updated cover art
SELECT DISTINCT a.gid, caa.date_updated
FROM artist a
JOIN release_group rg ON rg.artist_credit = a.id
JOIN release r ON r.release_group = rg.id
JOIN cover_art_archive.index_listing caa ON caa.release = r.id
WHERE caa.date_updated > $1
ORDER BY last_updated DESC
LIMIT $2;
```
**updated_albums.sql**: Similar structure for album change detection
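Because `UNION` only removes rows that are identical in every column, an artist touched by several change sources can appear more than once in the `updated_artists.sql` result, each time with a different timestamp. A consumer could collapse such rows as sketched below; `latest_per_artist` and the sample rows are hypothetical, and whether the crawler does this is not stated here.

```python
from datetime import datetime


def latest_per_artist(rows):
    """Collapse (gid, last_updated) rows to the newest timestamp per gid."""
    latest = {}
    for gid, updated in rows:
        if gid not in latest or updated > latest[gid]:
            latest[gid] = updated
    # Newest first, matching the ORDER BY in the query.
    return sorted(latest.items(), key=lambda item: item[1], reverse=True)


rows = [
    ("artist-a", datetime(2024, 1, 1)),  # e.g. metadata edit
    ("artist-b", datetime(2024, 1, 3)),  # e.g. new link
    ("artist-a", datetime(2024, 1, 5)),  # e.g. new release
]
collapsed = latest_per_artist(rows)
```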
### Custom Database Indices
To support efficient change detection queries:
```sql
-- Artist last_updated index
CREATE INDEX IF NOT EXISTS idx_artist_last_updated
ON artist (last_updated DESC);
-- Release group last_updated index
CREATE INDEX IF NOT EXISTS idx_release_group_last_updated
ON release_group (last_updated DESC);
-- Release last_updated index
CREATE INDEX IF NOT EXISTS idx_release_last_updated
ON release (last_updated DESC);
-- Cover art date_updated index
CREATE INDEX IF NOT EXISTS idx_cover_art_date_updated
ON cover_art_archive.index_listing (date_updated DESC);
```
## Configuration Architecture
### Metaclass-Based Configuration System
The project uses a metaclass-based configuration system that lets environment variables override class attributes, including nested dictionary keys addressed with a double underscore.
**Base configuration** (config.py):
```python
import os


class ConfigMeta(type):
    """Metaclass that allows environment variable overrides"""

    def __getattribute__(cls, name):
        value = super().__getattribute__(name)
        # Only public settings are overridable. Skipping underscore names
        # also avoids infinite recursion when cls.__name__ is read below.
        if name.startswith('_'):
            return value
        env_key = f"{cls.__name__.upper()}_{name.upper()}"
        # Simple override, e.g. DEFAULTCONFIG_PORT (values arrive as strings)
        if env_key in os.environ:
            return os.environ[env_key]
        # Nested override (double underscore), e.g. DEFAULTCONFIG_DATABASE__HOST
        if isinstance(value, dict):
            nested_prefix = env_key + '__'
            overrides = {
                k[len(nested_prefix):].lower(): v
                for k, v in os.environ.items()
                if k.startswith(nested_prefix)
            }
            return {**value, **overrides}
        return value


class DefaultConfig(metaclass=ConfigMeta):
    # Application
    APPLICATION_ROOT = '/'
    PORT = 5001
    # Database
    DATABASE = {
        'host': 'localhost',
        'port': 5432,
        'database': 'musicbrainz',
        'user': 'abc',
        'password': 'abc'
    }
    # Cache
    CACHE = {
        'redis_url': 'redis://localhost:6379/0',
        'postgres_url': 'postgresql://abc:abc@localhost/lm_cache'
    }
    # External APIs
    FANART_API_KEY = None
    THEAUDIODB_API_KEY = '1'
    SPOTIFY_CLIENT_ID = None
    SPOTIFY_CLIENT_SECRET = None
    LASTFM_API_KEY = None
    # Cloudflare
    CLOUDFLARE_ZONE_ID = None
    CLOUDFLARE_API_TOKEN = None
    # Monitoring
    SENTRY_DSN = None
    STATSD_HOST = None
    STATSD_PORT = 8125
```
**Environment variable override examples**:
```bash
# Simple override
export DEFAULTCONFIG_PORT=8080
# Nested override (double underscore)
export DEFAULTCONFIG_DATABASE__HOST=musicbrainz-db
export DEFAULTCONFIG_CACHE__REDIS_URL=redis://redis:6379/1
# Select configuration class
export LIDARR_METADATA_CONFIG=lidarrmetadata.config.ProductionConfig
```
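Selecting a configuration class from `LIDARR_METADATA_CONFIG` comes down to importing a dotted path. The sketch below shows one way to do that with `importlib`; `resolve_dotted_path` and `load_config_class` are hypothetical helpers, and the default of `lidarrmetadata.config.DefaultConfig` is an assumption.

```python
import importlib
import os


def resolve_dotted_path(path: str):
    """Import `package.module.Name` and return the `Name` attribute."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.Class', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def load_config_class(env=os.environ,
                      default="lidarrmetadata.config.DefaultConfig"):
    """Resolve the class named by LIDARR_METADATA_CONFIG (assumed default)."""
    return resolve_dotted_path(env.get("LIDARR_METADATA_CONFIG", default))


# Demonstrated with a standard-library class so the snippet runs anywhere.
cls = load_config_class({"LIDARR_METADATA_CONFIG": "collections.OrderedDict"})
```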
### Configuration Classes
**DevelopmentConfig**:
- Debug logging enabled
- Local database connections
- No Sentry
- No rate limiting
**TestConfig**:
- In-memory SQLite for cache
- Mock external APIs
- Synchronous execution for deterministic tests
**ProductionConfig**:
- Container-based service discovery
- Sentry enabled
- Redis rate limiting
- Cloudflare CDN integration
## Error Handling and Resilience
### Provider Timeout Handling
Each provider has configurable timeouts:
```python
class ProviderConfig:
    MUSICBRAINZ_TIMEOUT = 30  # Complex SQL queries
    SOLR_TIMEOUT = 5          # Search queries
    FANART_TIMEOUT = 10       # Image API
    THEAUDIODB_TIMEOUT = 10   # Metadata API
    WIKIPEDIA_TIMEOUT = 2     # Per-request
    SPOTIFY_TIMEOUT = 5       # OAuth + API
```
### Fallback Chain Implementation
```python
async def get_artist_images(self, mbid):
    providers = [
        (self.fanart, "FanArt.tv"),
        (self.theaudiodb, "TheAudioDB"),
        (self.cover_art_archive, "Cover Art Archive")
    ]
    for provider, name in providers:
        try:
            images = await asyncio.wait_for(
                provider.get_artist_images(mbid),
                timeout=provider.timeout
            )
            if images:
                logger.info(f"Got images from {name}")
                return images
        except asyncio.TimeoutError:
            logger.warning(f"{name} timeout, trying next provider")
        except Exception as e:
            logger.error(f"{name} error: {e}, trying next provider")
    return []  # No images available
```
### Graceful Degradation
When external services fail, the API returns partial data:
```python
{
    "Id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
    "ArtistName": "Nirvana",
    "Overview": null,  # Wikipedia unavailable
    "Images": [],      # FanArt.tv and TheAudioDB unavailable
    "Links": [...],    # MusicBrainz links still available
    "Albums": [...]    # Core data from MusicBrainz DB
}
```
## Scalability Considerations
### Horizontal Scaling
The API is stateless and can be horizontally scaled:
```yaml
# docker-compose.prod.yml
services:
  api-v0.3:
    image: ghcr.io/lidarr/lidarrapi.metadata:v0.3
    deploy:
      replicas: 4
    environment:
      - CACHE__REDIS_URL=redis://redis:6379/0
```
### Database Connection Pooling
asyncpg connection pool configuration:
```python
pool = await asyncpg.create_pool(
    host=config.DATABASE['host'],
    port=config.DATABASE['port'],
    database=config.DATABASE['database'],
    user=config.DATABASE['user'],
    password=config.DATABASE['password'],
    min_size=10,
    max_size=50,
    command_timeout=30
)
```
### Redis Connection Pooling
aioredis connection pool:
```python
redis = await aioredis.create_redis_pool(
    config.CACHE['redis_url'],
    minsize=5,
    maxsize=20,
    encoding='utf-8'
)
```
### Rate Limiting Architecture
Three rate limiter implementations:
#### NullRateLimiter (Default)
No rate limiting, maximum throughput.
#### SimpleRateLimiter
In-memory queue-based rate limiting:
```python
import asyncio
import time


class SimpleRateLimiter:
    def __init__(self, max_requests, time_window):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = []

    async def acquire(self):
        now = time.time()
        # Remove old requests
        self.requests = [r for r in self.requests if r > now - self.time_window]
        if len(self.requests) >= self.max_requests:
            # Wait until the oldest request leaves the window
            sleep_time = self.requests[0] + self.time_window - now
            await asyncio.sleep(sleep_time)
            now = time.time()  # Record when the request actually proceeds
        self.requests.append(now)
```
#### RedisRateLimiter
Distributed rate limiting across multiple API instances:
```python
class RedisRateLimiter:
    async def acquire(self, key):
        now = time.time()
        window_key = f"ratelimit:{key}:{int(now / self.time_window)}"
        count = await self.redis.incr(window_key)
        if count == 1:
            await self.redis.expire(window_key, self.time_window)
        if count > self.max_requests:
            raise RateLimitExceeded(f"Rate limit exceeded for {key}")
```
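The fixed-window counting used by `RedisRateLimiter` can be tried out with a dict in place of Redis. `InMemoryFixedWindowLimiter` is a hypothetical stand-in written for this sketch; it takes the current time as an argument so the window boundaries are easy to see.

```python
class InMemoryFixedWindowLimiter:
    """Same fixed-window counting as above, with a dict instead of Redis."""

    def __init__(self, max_requests: int, time_window: int):
        self.max_requests = max_requests
        self.time_window = time_window
        self.counts = {}

    def window_key(self, key: str, now: float) -> str:
        # Every timestamp inside one window maps to the same counter key.
        return f"ratelimit:{key}:{int(now / self.time_window)}"

    def allow(self, key: str, now: float) -> bool:
        window = self.window_key(key, now)
        self.counts[window] = self.counts.get(window, 0) + 1
        return self.counts[window] <= self.max_requests


limiter = InMemoryFixedWindowLimiter(max_requests=2, time_window=60)
decisions = [limiter.allow("fanart", now) for now in (0.0, 10.0, 59.0, 60.0)]
```

The third call is rejected, while the fourth is accepted one second later because it falls into a new window. This is the usual trade-off of fixed windows: a client can send up to twice the limit in a short burst that straddles a window boundary.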
## Conclusion
The architecture demonstrates several advanced patterns:
1. **Mixin-based provider composition**: Flexible, testable, extensible
2. **Multi-tier caching**: Redis (hot) + PostgreSQL (persistent) + CDN (edge)
3. **Direct database access**: Complex SQL aggregations for performance
4. **Async-first design**: Quart + asyncpg + aioredis for high concurrency
5. **Fallback chains**: Graceful degradation when external services fail
6. **Background crawling**: Proactive cache warming for better UX
7. **Change detection**: Efficient invalidation based on upstream updates
The mixin architecture lets providers be composed by capability rather than through a deep inheritance hierarchy, so a test can substitute a stub that implements only the mixins it needs.
The three-tier caching strategy keeps hot entries in Redis, retains compressed copies in PostgreSQL for when Redis evicts them, and serves repeat requests from the CDN edge. The crawler keeps recently updated artists and albums warm in the cache.
Direct MusicBrainz database access with SQL JSON aggregation is a key performance optimization: it avoids the web API's rate limits and returns an artist together with its releases and links in a single query, which would be difficult to replicate with the web API.