Lidarr Metadata API - Architecture
Architectural Overview
LidarrAPI.Metadata implements a layered architecture with clear separation of concerns:
┌─────────────────────────────────────────────────────────────┐
│ Cloudflare CDN │
│ (Edge Cache Layer) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Quart Application │
│ (app.py - Routes) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ API Layer │
│ (api.py - Business Logic) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Provider Layer │
│ (provider.py - Mixin Architecture) │
│ ┌──────────────┬──────────────┬──────────────────────┐ │
│ │ MusicBrainz │ Solr Search │ External Providers │ │
│ │ DB Direct │ (Artist/ │ (FanArt, TheAudioDB, │ │
│ │ │ Album) │ Wikipedia, Spotify) │ │
│ └──────────────┴──────────────┴──────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Cache Layer │
│ (cache.py - Multi-Tier Caching) │
│ ┌──────────────┬──────────────┬──────────────────────┐ │
│ │ Redis │ PostgreSQL │ Compression │ │
│ │ (Ephemeral) │ (Persistent) │ (zlib pickle) │ │
│ └──────────────┴──────────────┴──────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Data Sources │
│ ┌──────────────┬──────────────┬──────────────────────┐ │
│ │ MusicBrainz │ Solr │ External APIs │ │
│ │ PostgreSQL │ (Search) │ (15+ integrations) │ │
│ └──────────────┴──────────────┴──────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Mixin-Based Provider Architecture
The core architectural pattern is a mixin-based provider system that allows flexible composition of data source capabilities.
Provider Mixin Hierarchy
```python
# Base capability mixins: each one declares a single capability
class ArtistByIdMixin:
    async def get_artist_by_id(self, mbid: str) -> dict:
        raise NotImplementedError

class ArtistNameSearchMixin:
    async def search_artist_name(self, query: str) -> list:
        raise NotImplementedError

class AlbumByIdMixin:
    async def get_album_by_id(self, mbid: str) -> dict:
        raise NotImplementedError

class AlbumNameSearchMixin:
    async def search_album_name(self, query: str) -> list:
        raise NotImplementedError

class ArtistOverviewMixin:
    async def get_artist_overview(self, mbid: str) -> str:
        raise NotImplementedError

class ArtistImagesMixin:
    async def get_artist_images(self, mbid: str) -> list:
        raise NotImplementedError

class AlbumImagesMixin:
    async def get_album_images(self, mbid: str) -> list:
        raise NotImplementedError

class ArtistLinksMixin:
    async def get_artist_links(self, mbid: str) -> list:
        raise NotImplementedError
```
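Because every capability is its own mixin, callers can find the providers that support an operation with a plain `isinstance` check. The sketch below is illustrative only: `providers_for()`, `first_overview()` and the two toy providers are hypothetical and not taken from the project.

```python
import asyncio

class ArtistImagesMixin:
    async def get_artist_images(self, mbid: str) -> list:
        raise NotImplementedError

class ArtistOverviewMixin:
    async def get_artist_overview(self, mbid: str) -> str:
        raise NotImplementedError

# Hypothetical toy providers for illustration
class ToyImageProvider(ArtistImagesMixin):
    async def get_artist_images(self, mbid: str) -> list:
        return [f"https://images.example/{mbid}.jpg"]

class ToyBioProvider(ArtistOverviewMixin, ArtistImagesMixin):
    async def get_artist_overview(self, mbid: str) -> str:
        return f"Overview for {mbid}"

    async def get_artist_images(self, mbid: str) -> list:
        return []

def providers_for(mixin, providers):
    """Return the providers that declare the given capability mixin, in order."""
    return [p for p in providers if isinstance(p, mixin)]

async def first_overview(mbid, providers):
    # Ask each overview-capable provider in turn; return the first non-empty answer
    for provider in providers_for(ArtistOverviewMixin, providers):
        overview = await provider.get_artist_overview(mbid)
        if overview:
            return overview
    return None

providers = [ToyImageProvider(), ToyBioProvider()]
overview = asyncio.run(first_overview("abc", providers))  # "Overview for abc"
```

Capability checks keep the calling code independent of which concrete providers are registered, which is also what makes substituting test doubles easy.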
Provider Implementations
Each provider implements one or more mixins based on its capabilities:
MusicbrainzDbProvider
Mixins: ArtistByIdMixin, AlbumByIdMixin
Purpose: Authoritative source for core music metadata
Implementation:
- Direct asyncpg connection to MusicBrainz PostgreSQL database
- Complex SQL queries with JSON aggregation (row_to_json, json_agg)
- Read-only access to replicated database
- Custom indices on last_updated columns for change detection
SQL files:
- lidarrmetadata/sql/artist.sql: Artist metadata with releases
- lidarrmetadata/sql/album.sql: Album metadata with tracks
- lidarrmetadata/sql/updated_artists.sql: Artist change detection query
- lidarrmetadata/sql/updated_albums.sql: Album change detection query
Key tables accessed:
- artist: Core artist data
- release_group: Album groupings
- release: Specific releases
- medium: Physical/digital media
- track: Track listings
- recording: Recording metadata
- url: External links
- l_artist_url: Artist-URL relationships
- cover_art_archive.index_listing: Cover art availability
SolrSearchProvider
Mixins: ArtistNameSearchMixin, AlbumNameSearchMixin
Purpose: Full-text search for artist and album discovery
Implementation:
- Async HTTP client to Solr REST API
- Two cores: artist and release-group
- Dismax query parser for relevance ranking
- 5-second timeout per query
- Real-time index updates via RabbitMQ + SIR (Search Index Rebuilder)
Query structure:
{
"query": query_string,
"limit": 10,
"params": {
"defType": "dismax",
"qf": "artist^2 sortname alias",
"mm": "1"
}
}
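On the wire, this structure maps onto ordinary query parameters of Solr's `/select` handler. A minimal sketch, assuming a Solr base URL of `http://localhost:8983/solr` and a core named `artist` (both deployment-specific assumptions), builds the request URL using only the standard library:

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: deployment-specific

def build_artist_search_url(query_string, limit=10, core="artist"):
    """Build a dismax search URL for a Solr core's /select handler."""
    params = {
        "q": query_string,
        "rows": limit,                    # Solr's name for the result limit
        "defType": "dismax",              # dismax parser: simple syntax, relevance-ranked
        "qf": "artist^2 sortname alias",  # fields to search; artist name boosted 2x
        "mm": "1",                        # at least one query clause must match
        "wt": "json",                     # JSON response format
    }
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"

url = build_artist_search_url("nirvana")
```

`urlencode` percent-encodes the `^` boost operator and turns spaces into `+`, so the field list survives the trip intact.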
FanArtTvProvider
Mixins: ArtistImagesMixin, AlbumImagesMixin
Purpose: High-quality fan art and promotional images
Implementation:
- REST API with API key authentication
- 7-day lag for free API keys (personal keys have no lag)
- 30-day cache TTL
- Image types: poster, banner, logo, fanart, cover
- Fallback to Cover Art Archive if unavailable
API endpoints:
- https://webservice.fanart.tv/v3/music/{mbid}
- https://webservice.fanart.tv/v3/music/albums/{mbid}
TheAudioDbProvider
Mixins: ArtistOverviewMixin, ArtistImagesMixin, AlbumImagesMixin, ArtistLinksMixin
Purpose: Fallback provider for images and metadata
Implementation:
- REST API with API key "1" (public key)
- 10-second timeout
- Used as fallback when FanArt.tv or Wikipedia unavailable
- Provides artist biographies, images, and social media links
API endpoints:
- https://theaudiodb.com/api/v1/json/1/artist-mb.php?i={mbid}
- https://theaudiodb.com/api/v1/json/1/album-mb.php?i={mbid}
WikipediaProvider
Mixins: ArtistOverviewMixin
Purpose: Artist biographical information
Implementation:
- Multi-stage lookup: MusicBrainz → Wikidata → Wikipedia
- 32-language fallback chain (en, fr, de, es, it, ja, zh, ru, pt, nl, sv, fi, no, da, pl, cs, hu, ro, tr, el, he, ar, fa, hi, th, ko, vi, id, ms, tl, bn, ta)
- BeautifulSoup HTML parsing
- 2-second timeout per request
- 1 connection per host limit
- Extracts first paragraph as overview
Lookup flow:
MusicBrainz MBID → Wikidata entity → Wikipedia article → Extract summary
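The language fallback step can be sketched as a pure function over a Wikidata entity's sitelinks, which map site IDs such as `enwiki` to article titles. The helper name and the shortened preference list are illustrative assumptions. The provider's actual selection logic may differ.

```python
from urllib.parse import quote

# Shortened preference order for illustration (the provider uses 32 languages)
LANGUAGES = ["en", "fr", "de", "es", "ja"]

def pick_wikipedia_article(sitelinks, languages=LANGUAGES):
    """Return (lang, url) for the first preferred language that has an article.

    `sitelinks` follows the Wikidata entity JSON shape:
    {"enwiki": {"site": "enwiki", "title": "Nirvana (band)"}, ...}
    """
    for lang in languages:
        link = sitelinks.get(f"{lang}wiki")
        if link:
            # Article URLs use underscores for spaces; other characters are escaped
            title = quote(link["title"].replace(" ", "_"))
            return lang, f"https://{lang}.wikipedia.org/wiki/{title}"
    return None  # no article in any preferred language

sitelinks = {"dewiki": {"site": "dewiki", "title": "Nirvana (Band)"}}
choice = pick_wikipedia_article(sitelinks)
```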
SpotifyProvider
Mixins: ArtistByIdMixin, AlbumByIdMixin, ArtistLinksMixin
Purpose: Spotify ID mapping and cross-platform linking
Implementation:
- spotipy library with OAuth authentication
- Levenshtein-based name similarity (normalized score, 0.8 threshold) for name-based lookups
- Provides Spotify URIs for deep linking
- Used for chart data correlation
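A 0.8 threshold only makes sense for a normalized score. The sketch below therefore assumes similarity is computed as `1 - distance / max(len(a), len(b))` over case-folded names. The exact normalization the provider uses is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def name_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical after case folding."""
    a, b = a.casefold().strip(), b.casefold().strip()
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

def is_match(candidate: str, target: str, threshold: float = 0.8) -> bool:
    return name_similarity(candidate, target) >= threshold
```

Normalizing by the longer name keeps the threshold meaningful across short and long names: one typo in an 8-character name still passes, while dropping a whole word usually does not.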
Authentication flow:
- Client credentials OAuth
- Token refresh on expiration
- Tokens cached in Redis
Layer Responsibilities
1. Quart Application Layer (app.py)
Responsibilities:
- HTTP request routing
- Request parameter validation
- Response serialization
- Cache-Control header management
- Error handling and HTTP status codes
Key routes:
```python
from quart import Quart

app = Quart(__name__)

# Route skeletons; handler bodies are omitted

@app.route('/')
async def root():
    """Health check and version info"""

@app.route('/artist/<mbid>')
async def get_artist(mbid):
    """Artist metadata endpoint"""

@app.route('/artist/<mbid>/refresh', methods=['POST'])
async def refresh_artist(mbid):
    """Cache invalidation endpoint"""

@app.route('/search/artist')
async def search_artist():
    """Artist search endpoint"""

@app.route('/chart/<name>/<type>/<selection>')
async def get_chart(name, type, selection):
    """Chart data endpoint"""
```
Request flow:
- Parse and validate request parameters
- Call API layer method
- Set Cache-Control headers based on response metadata
- Serialize response to JSON
- Return HTTP response
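The validation and header steps of this flow can be sketched without a web framework. The MBID check assumes MBIDs are UUIDs in the usual 8-4-4-4-12 hex form. `cache_control_for()` is a hypothetical helper that reproduces the `s-maxage=..., max-age=0` split described for the Cloudflare CDN tier.

```python
import re

# MBIDs are UUIDs: 8-4-4-4-12 hexadecimal groups
MBID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)

def is_valid_mbid(mbid: str) -> bool:
    # fullmatch rejects trailing garbage, including a trailing newline
    return MBID_RE.fullmatch(mbid) is not None

def cache_control_for(edge_ttl_seconds: int) -> str:
    """Let shared caches (the CDN) keep the response; browsers must revalidate."""
    if edge_ttl_seconds <= 0:
        return "no-store"
    return f"s-maxage={edge_ttl_seconds}, max-age=0"
```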
2. API Layer (api.py)
Responsibilities:
- Business logic orchestration
- Provider coordination
- Data aggregation from multiple sources
- Cache management
- Response formatting
Key methods:
```python
async def get_artist_by_id(mbid, prim_types, sec_types, release_statuses):
    # 1. Check cache
    # 2. Query MusicBrainz DB
    # 3. Parallel fetch: images, overview, links
    # 4. Aggregate data
    # 5. Cache result
    # 6. Return formatted response
    ...

async def search_artist(query):
    # 1. Check cache
    # 2. Query Solr
    # 3. Enrich results with cached metadata
    # 4. Cache search results
    # 5. Return formatted response
    ...
```
Parallel fetching pattern:
# Fetch multiple data sources concurrently
images_task = asyncio.create_task(fanart_provider.get_artist_images(mbid))
overview_task = asyncio.create_task(wikipedia_provider.get_artist_overview(mbid))
links_task = asyncio.create_task(spotify_provider.get_artist_links(mbid))
images = await images_task
overview = await overview_task
links = await links_task
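A variant of this pattern uses `asyncio.gather(..., return_exceptions=True)`. It assumes that a failed provider should degrade to an empty value rather than fail the whole request. The three fetch functions here are hypothetical stand-ins, and one of them simulates an outage.

```python
import asyncio

async def fetch_images(mbid):
    return [f"https://images.example/{mbid}.jpg"]

async def fetch_overview(mbid):
    raise TimeoutError("Wikipedia did not respond")  # simulated outage

async def fetch_links(mbid):
    return [{"type": "official homepage", "url": "https://example.com"}]

async def fetch_enrichment(mbid):
    # Run all fetches concurrently; exceptions are returned as values, not raised
    images, overview, links = await asyncio.gather(
        fetch_images(mbid), fetch_overview(mbid), fetch_links(mbid),
        return_exceptions=True,
    )
    # Replace each failure with an empty default so the response stays usable
    return {
        "Images": [] if isinstance(images, BaseException) else images,
        "Overview": None if isinstance(overview, BaseException) else overview,
        "Links": [] if isinstance(links, BaseException) else links,
    }

result = asyncio.run(fetch_enrichment("5b11f4ce-a62d-471e-81fc-a69a8278c7da"))
```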
3. Provider Layer (provider.py)
Responsibilities:
- Data source abstraction
- External API communication
- Error handling and retries
- Timeout management
- Data transformation to common format
Provider composition:
```python
class MetadataProvider(
    MusicbrainzDbProvider,
    SolrSearchProvider,
    FanArtTvProvider,
    TheAudioDbProvider,
    WikipediaProvider,
    SpotifyProvider,
):
    """Composite provider with all capabilities"""
```
Fallback chain example:
```python
async def get_artist_images(self, mbid):
    # Primary: FanArt.tv
    try:
        return await self.fanart.get_artist_images(mbid)
    except Exception:
        pass
    # Fallback: TheAudioDB. It needs its own try block, because an exception
    # raised inside one `except` clause is not caught by the clauses after it.
    try:
        return await self.theaudiodb.get_artist_images(mbid)
    except Exception:
        pass
    # Last resort: Cover Art Archive
    return await self.cover_art_archive.get_artist_images(mbid)
```
4. Cache Layer (cache.py)
Responsibilities:
- Multi-tier cache management
- Cache key generation
- TTL management
- Compression/decompression
- Cache invalidation
- Statistics tracking
Cache tiers:
Tier 1: Redis (Ephemeral)
Configuration:
- Namespace: lm3.7
- Memory limit: 512MB
- Eviction policy: LFU (Least Frequently Used)
- Default TTL: 7 days
Use cases:
- Hot data (frequently accessed artists/albums)
- Rate limiter state
- Sentry deduplication
- Invalidation locks
Implementation:
```python
import pickle
import zlib

class RedisCache:
    def __init__(self, redis):
        self.redis = redis  # async Redis client returning raw bytes (no decoding)

    async def get(self, key):
        value = await self.redis.get(f"lm3.7:{key}")
        if value is not None:
            return pickle.loads(zlib.decompress(value))
        return None

    async def set(self, key, value, ttl=604800):  # 7 days
        compressed = zlib.compress(pickle.dumps(value))
        await self.redis.setex(f"lm3.7:{key}", ttl, compressed)
```
Tier 2: PostgreSQL (Persistent)
Schema:
CREATE TABLE IF NOT EXISTS {cache_name} (
key VARCHAR(255) PRIMARY KEY,
expires TIMESTAMP,
updated TIMESTAMP DEFAULT NOW(),
value BYTEA -- zlib compressed pickle
);
CREATE TRIGGER update_timestamp
BEFORE UPDATE ON {cache_name}
FOR EACH ROW
EXECUTE FUNCTION update_updated_column();
Auto-created tables:
- artist: Artist metadata cache
- album: Album metadata cache
- spotify: Spotify lookup cache
- fanart: FanArt.tv image cache
- tadb: TheAudioDB metadata cache
- wikipedia: Wikipedia overview cache
Use cases:
- Long-term storage for all metadata
- Fallback when Redis evicts data
- Historical data for analytics
- Compressed storage (10:1 ratio typical)
Implementation:
```python
import pickle
import zlib
from datetime import datetime, timedelta

class PostgresCache:
    def __init__(self, conn, table):
        self.conn = conn    # asyncpg connection or pool
        self.table = table  # trusted internal table name (interpolated into SQL)

    async def get(self, key):
        row = await self.conn.fetchrow(
            f"SELECT value, expires FROM {self.table} WHERE key = $1",
            key,
        )
        # A NULL `expires` means the entry never expires
        if row and (row['expires'] is None or row['expires'] > datetime.now()):
            return pickle.loads(zlib.decompress(row['value']))
        return None

    async def set(self, key, value, ttl=None):
        compressed = zlib.compress(pickle.dumps(value))
        expires = datetime.now() + timedelta(seconds=ttl) if ttl else None
        await self.conn.execute(
            f"""
            INSERT INTO {self.table} (key, value, expires)
            VALUES ($1, $2, $3)
            ON CONFLICT (key) DO UPDATE
            SET value = $2, expires = $3, updated = NOW()
            """,
            key, compressed, expires,
        )
```
Tier 3: Cloudflare CDN (Edge)
Configuration:
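- Cache-Control header: s-maxage=2592000, max-age=0 (shared caches such as the CDN keep responses for 30 days; browsers treat them as stale immediately)
- Programmatic purge via Cloudflare API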
- Batch purge: 30 URLs per request
Use cases:
- Global edge caching for popular artists/albums
- Reduced origin load
- Low-latency responses worldwide
Cache invalidation:
```python
async def invalidate_cdn_cache(urls):
    # Batch URLs into groups of 30 (chunks() splits a list into fixed-size slices)
    for batch in chunks(urls, 30):
        await cloudflare_client.purge_cache(batch)
```
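A sketch of the batching, assuming Cloudflare's v4 purge endpoint (`POST /zones/{zone_id}/purge_cache` with a JSON `files` list and a bearer token). It only builds `urllib.request.Request` objects and never sends them. The zone ID, token, and helper names are placeholders.

```python
import json
from urllib.request import Request

API = "https://api.cloudflare.com/client/v4"

def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_purge_requests(zone_id, api_token, urls, batch_size=30):
    """Build one purge request per batch of at most `batch_size` URLs."""
    built = []
    for batch in chunks(list(urls), batch_size):
        body = json.dumps({"files": batch}).encode("utf-8")
        built.append(Request(
            f"{API}/zones/{zone_id}/purge_cache",
            data=body,
            method="POST",
            headers={
                "Authorization": f"Bearer {api_token}",
                "Content-Type": "application/json",
            },
        ))
    return built

urls = [f"https://api.example/artist/{i}" for i in range(65)]
reqs = build_purge_requests("ZONE_ID", "TOKEN", urls)  # 30 + 30 + 5 URLs
```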
Data Flow Patterns
Artist Metadata Request Flow
1. Client → GET /artist/{mbid}
2. app.py → Validate MBID format
3. api.py → Check Redis cache
4. [CACHE MISS] → Check PostgreSQL cache
5. [CACHE MISS] → Query MusicBrainz DB (artist.sql)
6. api.py → Parallel fetch:
- FanArt.tv images
- Wikipedia overview
- Spotify links
- TheAudioDB metadata (fallback)
7. api.py → Aggregate data into Artist object
8. api.py → Store in PostgreSQL cache
9. api.py → Store in Redis cache
10. app.py → Set Cache-Control headers
11. app.py → Return JSON response
12. Cloudflare → Cache at edge
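Steps 3 through 9 amount to a read-through lookup across two cache tiers, with write-back on a miss. The sketch below makes these assumptions:

- plain dicts stand in for Redis and PostgreSQL
- a hypothetical `load_artist` coroutine replaces the MusicBrainz query and provider fan-out
- TTLs are omitted
- a persistent-tier hit is promoted back into the hot tier (also an assumption)

```python
import asyncio

class TieredCache:
    """Read-through lookup: Redis-like tier, then Postgres-like tier, then source."""

    def __init__(self, load_from_source):
        self.redis = {}     # stand-in for the ephemeral Redis tier
        self.postgres = {}  # stand-in for the persistent PostgreSQL tier
        self.load_from_source = load_from_source
        self.source_calls = 0

    async def get(self, key):
        if key in self.redis:
            return self.redis[key]
        if key in self.postgres:
            # Assumption: promote persistent-tier hits back into the hot tier
            self.redis[key] = self.postgres[key]
            return self.postgres[key]
        self.source_calls += 1
        value = await self.load_from_source(key)
        # Store in the persistent tier, then the hot tier (steps 8 and 9)
        self.postgres[key] = value
        self.redis[key] = value
        return value

async def load_artist(key):
    return {"Id": key, "ArtistName": "Nirvana"}

cache = TieredCache(load_artist)

async def demo():
    await cache.get("artist:5b11f4ce")         # miss in both tiers: goes to source
    cache.redis.clear()                         # simulate a Redis eviction
    return await cache.get("artist:5b11f4ce")  # served from the Postgres stand-in

value = asyncio.run(demo())
```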
Search Request Flow
1. Client → GET /search/artist?query=nirvana
2. app.py → Validate query parameter
3. api.py → Check Redis cache (key: search:artist:nirvana)
4. [CACHE MISS] → Query Solr artist core
5. Solr → Return list of MBIDs with scores
6. api.py → For each result:
- Check cache for artist metadata
- [CACHE MISS] → Fetch from MusicBrainz DB
7. api.py → Aggregate search results
8. api.py → Cache search results (TTL: 1 hour)
9. app.py → Return JSON response
Cache Invalidation Flow
1. Crawler → Detect updated artist (updated_artists.sql)
2. Crawler → POST /artist/{mbid}/refresh
3. api.py → Verify INVALIDATE_APIKEY
4. api.py → Delete from Redis cache
5. api.py → Delete from PostgreSQL cache
6. api.py → Purge Cloudflare CDN cache
7. api.py → Return 200 OK
8. Next request → Cache miss → Fresh fetch
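Steps 3 to 6 can be sketched as a handler that checks the key in constant time and then clears every tier. `hmac.compare_digest` is the standard-library constant-time comparison. The dict-based tiers and the `purge_cdn` callback are hypothetical stand-ins for Redis, PostgreSQL and the Cloudflare client.

```python
import asyncio
import hmac

async def refresh_artist(mbid, supplied_key, expected_key, redis, postgres, purge_cdn):
    """Return an HTTP-style status code for POST /artist/{mbid}/refresh."""
    # Constant-time comparison avoids leaking the key through timing differences
    if not hmac.compare_digest(supplied_key.encode(), expected_key.encode()):
        return 401
    key = f"artist:{mbid}"
    redis.pop(key, None)     # step 4: drop the hot copy
    postgres.pop(key, None)  # step 5: drop the persistent copy
    await purge_cdn(mbid)    # step 6: purge the edge copy
    return 200

redis_tier, postgres_tier, purged = {"artist:abc": {}}, {"artist:abc": {}}, []

async def fake_purge(mbid):
    purged.append(mbid)

denied = asyncio.run(refresh_artist("abc", "wrong", "secret", redis_tier, postgres_tier, fake_purge))
ok = asyncio.run(refresh_artist("abc", "secret", "secret", redis_tier, postgres_tier, fake_purge))
```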
Background Crawler Architecture
The crawler runs independently of the API server and proactively warms the cache.
Crawler Types
1. Wikipedia Overview Crawler
Purpose: Pre-fetch artist biographies
Implementation:
```python
async def crawl_wikipedia_overviews():
    # Get recently updated artists
    artists = await get_updated_artists(limit=100)
    for artist in artists:
        # Check if overview already cached
        if not await cache.exists(f"wikipedia:{artist['mbid']}"):
            # Fetch and cache overview (30-day TTL)
            overview = await wikipedia_provider.get_artist_overview(artist['mbid'])
            await cache.set(f"wikipedia:{artist['mbid']}", overview, ttl=2592000)
            await asyncio.sleep(1)  # Rate limiting
```
2. FanArt.tv Image Crawler
Purpose: Pre-fetch artist and album images
Implementation:
```python
async def crawl_fanart_images():
    artists = await get_updated_artists(limit=100)
    for artist in artists:
        if not await cache.exists(f"fanart:artist:{artist['mbid']}"):
            images = await fanart_provider.get_artist_images(artist['mbid'])
            await cache.set(f"fanart:artist:{artist['mbid']}", images, ttl=2592000)
            await asyncio.sleep(2)  # FanArt.tv rate limit
```
3. TheAudioDB Metadata Crawler
Purpose: Pre-fetch fallback metadata
Implementation:
```python
async def crawl_theaudiodb_metadata():
    artists = await get_updated_artists(limit=100)
    for artist in artists:
        if not await cache.exists(f"tadb:{artist['mbid']}"):
            metadata = await theaudiodb_provider.get_artist_metadata(artist['mbid'])
            await cache.set(f"tadb:{artist['mbid']}", metadata, ttl=2592000)
            await asyncio.sleep(1)
```
4. Artist Metadata Crawler
Purpose: Pre-fetch complete artist metadata
Implementation:
```python
async def crawl_artist_metadata():
    artists = await get_updated_artists(limit=50)
    for artist in artists:
        # Fetch complete artist metadata (triggers all providers);
        # get_artist_by_id caches the result itself
        await api.get_artist_by_id(artist['mbid'])
        await asyncio.sleep(5)  # Avoid overwhelming external APIs
```
5. Album Metadata Crawler
Purpose: Pre-fetch album metadata for recently updated albums
Implementation:
```python
async def crawl_album_metadata():
    albums = await get_updated_albums(limit=50)
    for album in albums:
        await api.get_album_by_id(album['mbid'])  # cached as a side effect
        await asyncio.sleep(5)
```
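The five crawlers share one shape: take a batch of recently updated IDs, skip anything already cached, fetch the rest, and pause between external calls. A generic version is sketched below. `crawl()`, its parameters, and the dict-based cache are illustrative assumptions, not the project's API.

```python
import asyncio

async def crawl(ids, cache, key_for, fetch, ttl, delay):
    """Warm `cache` for every id not cached yet; return the ids that were fetched."""
    fetched = []
    for item_id in ids:
        key = key_for(item_id)
        if key in cache:
            continue  # already warm: no external call needed
        cache[key] = (await fetch(item_id), ttl)  # stand-in: store value with its TTL
        fetched.append(item_id)
        await asyncio.sleep(delay)  # per-provider politeness delay
    return fetched

async def fake_overview(mbid):
    return f"overview of {mbid}"

cache = {"wikipedia:a": ("cached", 2592000)}
fetched = asyncio.run(crawl(
    ["a", "b", "c"], cache,
    key_for=lambda mbid: f"wikipedia:{mbid}",
    fetch=fake_overview, ttl=2592000, delay=0,
))
```

Each crawler above then reduces to a call with its own key prefix, fetch function, TTL, and delay.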
Crawler Scheduling
Crawlers run on configurable schedules:
```python
# crawler.py
async def run_crawlers():
    while True:
        # Run all crawlers in parallel
        await asyncio.gather(
            crawl_wikipedia_overviews(),
            crawl_fanart_images(),
            crawl_theaudiodb_metadata(),
            crawl_artist_metadata(),
            crawl_album_metadata(),
        )
        # Sleep between cycles
        await asyncio.sleep(3600)  # 1 hour
```
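Because `asyncio.gather` waits for every crawler to finish before the hourly sleep starts, all crawlers effectively share one cycle. If per-crawler schedules are wanted, one option is to give each crawler its own loop. The sketch below does this with hypothetical names and intervals. The `cycles` argument exists only so the demo terminates.

```python
import asyncio

async def run_every(name, job, interval, cycles, log):
    """Run `job` every `interval` seconds, `cycles` times (None means forever)."""
    count = 0
    while cycles is None or count < cycles:
        try:
            await job()
            log.append(name)
        except Exception as exc:  # one failing crawler must not stop the others
            log.append(f"{name} failed: {exc}")
        count += 1
        if cycles is None or count < cycles:
            await asyncio.sleep(interval)

async def main(log):
    async def fast_job():
        pass

    async def failing_job():
        raise RuntimeError("provider down")

    await asyncio.gather(
        run_every("fast", fast_job, interval=0.01, cycles=3, log=log),
        run_every("flaky", failing_job, interval=0.02, cycles=2, log=log),
    )

log = []
asyncio.run(main(log))
```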
MusicBrainz Database Integration
Direct Database Access
Unlike most MusicBrainz consumers, this project queries the database directly rather than using the web API.
Advantages:
- Complex joins and aggregations in SQL
- No rate limiting
- Sub-second response times
- JSON aggregation in database
Disadvantages:
- Requires full MusicBrainz database replica (~100GB+)
- Must maintain custom indices
- Schema changes require SQL updates
SQL Query Architecture
Artist Query (artist.sql)
Purpose: Fetch complete artist metadata with releases
Key features:
- JSON aggregation of releases
- Filtering by release type and status
- Link extraction
- Cover art availability check
Query structure:
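```sql
SELECT
    a.gid AS id,
    a.name AS artist_name,
    a.sort_name,
    a.comment AS disambiguation,
    atype.name AS artist_type,
    -- Release groups credited to the artist, newest first. Correlated
    -- subqueries avoid the row multiplication (duplicate JSON entries)
    -- caused by joining releases and links in the same FROM clause.
    COALESCE((
        SELECT json_agg(
            json_build_object(
                'id', rg.gid,
                'title', rg.name,
                'type', rgpt.name,
                'statuses', (
                    SELECT json_agg(DISTINCT rs.name)
                    FROM release r
                    JOIN release_status rs ON r.status = rs.id
                    WHERE r.release_group = rg.id
                ),
                'year', rgm.first_release_date_year,
                'month', rgm.first_release_date_month,
                'day', rgm.first_release_date_day
            )
            ORDER BY rgm.first_release_date_year DESC NULLS LAST,
                     rgm.first_release_date_month DESC NULLS LAST,
                     rgm.first_release_date_day DESC NULLS LAST
        )
        -- release_group.artist_credit references artist_credit, not artist,
        -- so the artist is matched through artist_credit_name
        FROM artist_credit_name acn
        JOIN release_group rg ON rg.artist_credit = acn.artist_credit
        LEFT JOIN release_group_primary_type rgpt ON rg.type = rgpt.id
        LEFT JOIN release_group_meta rgm ON rgm.id = rg.id
        WHERE acn.artist = a.id
    ), '[]'::json) AS releases,
    -- URL relationships; the link type is reached through the link table
    COALESCE((
        SELECT json_agg(json_build_object('type', lt.name, 'url', u.url))
        FROM l_artist_url lau
        JOIN url u ON lau.entity1 = u.id
        JOIN link l ON lau.link = l.id
        JOIN link_type lt ON l.link_type = lt.id
        WHERE lau.entity0 = a.id
    ), '[]'::json) AS links
FROM artist a
LEFT JOIN artist_type atype ON a.type = atype.id
WHERE a.gid = $1;
```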
Album Query (album.sql)
Purpose: Fetch album metadata with tracks
Key features:
- Track listing aggregation
- Medium information (CD, Vinyl, Digital)
- Recording metadata
- Cover art availability
Query structure:
```sql
SELECT
    rg.gid AS id,
    rg.name AS title,
    ac.name AS artist_name,  -- credited name, e.g. "Artist A feat. Artist B"
    -- Media (with their tracks) from every release in the group
    COALESCE(
        json_agg(
            json_build_object(
                'release', r.gid,
                'position', m.position,
                'format', mf.name,
                'tracks', (
                    SELECT json_agg(
                        json_build_object(
                            'position', t.position,
                            'title', t.name,
                            'duration', t.length  -- milliseconds
                        )
                        ORDER BY t.position
                    )
                    FROM track t
                    WHERE t.medium = m.id
                )
            )
            ORDER BY r.id, m.position
        ) FILTER (WHERE m.id IS NOT NULL),
        '[]'::json
    ) AS media
FROM release_group rg
-- artist_credit.id is not an artist id; use the credit's own name instead
JOIN artist_credit ac ON rg.artist_credit = ac.id
LEFT JOIN release r ON r.release_group = rg.id
LEFT JOIN medium m ON m.release = r.id
LEFT JOIN medium_format mf ON m.format = mf.id
WHERE rg.gid = $1
GROUP BY rg.id, ac.id;
```
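asyncpg returns `json` columns as Python strings unless a type codec is registered, so the aggregated columns need decoding after the fetch. The sketch below works on a plain dict shaped like a fetched album row. `format_duration()` and `decode_album_row()` are hypothetical helpers. The sketch relies on MusicBrainz storing track lengths in milliseconds.

```python
import json

def format_duration(ms):
    """Format a track length in milliseconds as m:ss (None if unknown)."""
    if ms is None:
        return None
    minutes, seconds = divmod(round(ms / 1000), 60)
    return f"{minutes}:{seconds:02d}"

def decode_album_row(row):
    """Decode the JSON-aggregated `media` column and add readable durations."""
    media = json.loads(row["media"])
    for medium in media:
        for track in medium.get("tracks") or []:  # NULL when a medium has no tracks
            track["duration_text"] = format_duration(track["duration"])
    return {"id": row["id"], "title": row["title"], "media": media}

row = {
    "id": "example-release-group-id",
    "title": "Example Album",
    "media": json.dumps([{"position": 1, "format": "CD", "tracks": [
        {"position": 1, "title": "Track One", "duration": 301920},
        {"position": 2, "title": "Track Two", "duration": None},
    ]}]),
}
album = decode_album_row(row)
```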
Change Detection Queries
updated_artists.sql: Detects recently updated artists
Change sources (UNION of 5 queries):
- Artists with updated metadata
- Artists with new releases
- Artists with updated releases
- Artists with new links
- Artists with updated cover art
Query structure:
```sql
-- Each source yields (artist gid, change timestamp); the outer query keeps
-- one row per artist with its most recent change.
SELECT gid, MAX(last_updated) AS last_updated
FROM (
    -- Source 1: Updated artist metadata
    SELECT a.gid, a.last_updated
    FROM artist a
    WHERE a.last_updated > $1
    UNION ALL
    -- Source 2: New releases
    SELECT a.gid, rg.last_updated
    FROM artist a
    JOIN artist_credit_name acn ON acn.artist = a.id
    JOIN release_group rg ON rg.artist_credit = acn.artist_credit
    WHERE rg.last_updated > $1
    UNION ALL
    -- Source 3: Updated releases
    SELECT a.gid, r.last_updated
    FROM artist a
    JOIN artist_credit_name acn ON acn.artist = a.id
    JOIN release_group rg ON rg.artist_credit = acn.artist_credit
    JOIN release r ON r.release_group = rg.id
    WHERE r.last_updated > $1
    UNION ALL
    -- Source 4: New links
    SELECT a.gid, lau.last_updated
    FROM artist a
    JOIN l_artist_url lau ON lau.entity0 = a.id
    WHERE lau.last_updated > $1
    UNION ALL
    -- Source 5: Updated cover art
    SELECT a.gid, caa.date_updated
    FROM artist a
    JOIN artist_credit_name acn ON acn.artist = a.id
    JOIN release_group rg ON rg.artist_credit = acn.artist_credit
    JOIN release r ON r.release_group = rg.id
    JOIN cover_art_archive.index_listing caa ON caa.release = r.id
    WHERE caa.date_updated > $1
) AS changes
GROUP BY gid
ORDER BY last_updated DESC
LIMIT $2;
```
updated_albums.sql: Similar structure for album change detection
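A crawler can consume these queries incrementally by remembering the newest timestamp it has processed and passing it back as `$1` on the next run. With `ORDER BY last_updated DESC` and a `LIMIT`, a full batch returns only the newest rows, so advancing the cursor past it would skip older changes. The sketch therefore makes these assumptions:

- an ascending variant of the query
- a hypothetical `fetch_updated(since, limit)` coroutine wrapping that query
- an in-memory stand-in for the database

```python
import asyncio
from datetime import datetime, timedelta

async def drain_updates(fetch_updated, since, limit, handle):
    """Page through changes newer than `since`; return the new high-water mark.

    Assumes fetch_updated returns (gid, last_updated) rows in ASCENDING order.
    Rows sharing the boundary timestamp may be skipped by the strict `>`.
    """
    while True:
        rows = await fetch_updated(since, limit)
        for gid, _ in rows:
            await handle(gid)
        if rows:
            since = rows[-1][1]  # newest timestamp processed so far
        if len(rows) < limit:
            return since         # caught up with the database

# In-memory stand-in for the ascending change-detection query
base = datetime(2024, 1, 1)
CHANGES = [(f"artist-{i}", base + timedelta(minutes=i)) for i in range(7)]

async def fake_fetch(since, limit):
    return [row for row in CHANGES if row[1] > since][:limit]

seen = []

async def record(gid):
    seen.append(gid)

cursor = asyncio.run(drain_updates(fake_fetch, base - timedelta(days=1), 3, record))
```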
Custom Database Indices
To support efficient change detection queries:
-- Artist last_updated index
CREATE INDEX IF NOT EXISTS idx_artist_last_updated
ON artist (last_updated DESC);
-- Release group last_updated index
CREATE INDEX IF NOT EXISTS idx_release_group_last_updated
ON release_group (last_updated DESC);
-- Release last_updated index
CREATE INDEX IF NOT EXISTS idx_release_last_updated
ON release (last_updated DESC);
-- Cover art date_updated index
CREATE INDEX IF NOT EXISTS idx_cover_art_date_updated
ON cover_art_archive.index_listing (date_updated DESC);
Configuration Architecture
Metaclass-Based Configuration System
The project uses a sophisticated metaclass-based configuration system that allows environment variable overrides with nested key support.
Base configuration (config.py):
```python
import os

class ConfigMeta(type):
    """Metaclass that allows environment variable overrides.

    Overridden values are returned as strings.
    """
    def __getattribute__(cls, name):
        # Dunder/private lookups bypass the override logic; this also prevents
        # infinite recursion when the class name is read below
        if name.startswith('_'):
            return super().__getattribute__(name)
        value = super().__getattribute__(name)
        env_key = f"{super().__getattribute__('__name__').upper()}_{name.upper()}"
        # Simple override, e.g. DEFAULTCONFIG_PORT=8080
        if env_key in os.environ:
            return os.environ[env_key]
        # Nested override (double underscore), e.g. DEFAULTCONFIG_DATABASE__HOST
        if isinstance(value, dict):
            value = dict(value)  # copy so the class-level default is not mutated
            for key in value:
                nested_key = f"{env_key}__{key.upper()}"
                if nested_key in os.environ:
                    value[key] = os.environ[nested_key]
        return value

class DefaultConfig(metaclass=ConfigMeta):
    # Application
    APPLICATION_ROOT = '/'
    PORT = 5001

    # Database
    DATABASE = {
        'host': 'localhost',
        'port': 5432,
        'database': 'musicbrainz',
        'user': 'abc',
        'password': 'abc',
    }

    # Cache
    CACHE = {
        'redis_url': 'redis://localhost:6379/0',
        'postgres_url': 'postgresql://abc:abc@localhost/lm_cache',
    }

    # External APIs
    FANART_API_KEY = None
    THEAUDIODB_API_KEY = '1'
    SPOTIFY_CLIENT_ID = None
    SPOTIFY_CLIENT_SECRET = None
    LASTFM_API_KEY = None

    # Cloudflare
    CLOUDFLARE_ZONE_ID = None
    CLOUDFLARE_API_TOKEN = None

    # Monitoring
    SENTRY_DSN = None
    STATSD_HOST = None
    STATSD_PORT = 8125
```
Environment variable override examples:
# Simple override
export DEFAULTCONFIG_PORT=8080
# Nested override (double underscore)
export DEFAULTCONFIG_DATABASE__HOST=musicbrainz-db
export DEFAULTCONFIG_CACHE__REDIS_URL=redis://redis:6379/1
# Select configuration class
export LIDARR_METADATA_CONFIG=lidarrmetadata.config.ProductionConfig
Configuration Classes
DevelopmentConfig:
- Debug logging enabled
- Local database connections
- No Sentry
- No rate limiting
TestConfig:
- In-memory SQLite for cache
- Mock external APIs
- Synchronous execution for deterministic tests
ProductionConfig:
- Container-based service discovery
- Sentry enabled
- Redis rate limiting
- Cloudflare CDN integration
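Selecting one of these classes through `LIDARR_METADATA_CONFIG` amounts to importing a dotted path. A minimal sketch follows. It assumes the variable holds `package.module.ClassName`, and it falls back to an assumed default path when the variable is unset. The environment is passed in as a mapping so the function is easy to test.

```python
import importlib
import os

def load_config_class(env=os.environ, default="lidarrmetadata.config.DefaultConfig"):
    """Resolve the class named by LIDARR_METADATA_CONFIG (or `default`)."""
    dotted = env.get("LIDARR_METADATA_CONFIG", default)
    module_name, _, class_name = dotted.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Demonstration with a standard-library class standing in for a config class
config_cls = load_config_class({"LIDARR_METADATA_CONFIG": "collections.OrderedDict"})
```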
Error Handling and Resilience
Provider Timeout Handling
Each provider has configurable timeouts:
class ProviderConfig:
MUSICBRAINZ_TIMEOUT = 30 # Complex SQL queries
SOLR_TIMEOUT = 5 # Search queries
FANART_TIMEOUT = 10 # Image API
THEAUDIODB_TIMEOUT = 10 # Metadata API
WIKIPEDIA_TIMEOUT = 2 # Per-request
SPOTIFY_TIMEOUT = 5 # OAuth + API
Fallback Chain Implementation
```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def get_artist_images(self, mbid):
    # Providers in priority order; each exposes a `timeout` attribute in seconds
    providers = [
        (self.fanart, "FanArt.tv"),
        (self.theaudiodb, "TheAudioDB"),
        (self.cover_art_archive, "Cover Art Archive"),
    ]
    for provider, name in providers:
        try:
            images = await asyncio.wait_for(
                provider.get_artist_images(mbid),
                timeout=provider.timeout,
            )
            if images:
                logger.info("Got images from %s", name)
                return images
            # An empty result also falls through to the next provider
        except asyncio.TimeoutError:
            logger.warning("%s timeout, trying next provider", name)
        except Exception as e:
            logger.error("%s error: %s, trying next provider", name, e)
    return []  # No images available
```
Graceful Degradation
When external services fail, the API returns partial data:
```json
{
  "Id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "ArtistName": "Nirvana",
  "Overview": null,
  "Images": [],
  "Links": [...],
  "Albums": [...]
}
```
- Overview: null because Wikipedia is unavailable
- Images: empty because FanArt.tv and TheAudioDB are unavailable
- Links: MusicBrainz links are still available
- Albums: core data from the MusicBrainz DB
Scalability Considerations
Horizontal Scaling
The API is stateless and can be horizontally scaled:
# docker-compose.prod.yml
services:
api-v0.3:
image: ghcr.io/lidarr/lidarrapi.metadata:v0.3
deploy:
replicas: 4
environment:
- CACHE__REDIS_URL=redis://redis:6379/0
Database Connection Pooling
asyncpg connection pool configuration:
pool = await asyncpg.create_pool(
host=config.DATABASE['host'],
port=config.DATABASE['port'],
database=config.DATABASE['database'],
user=config.DATABASE['user'],
password=config.DATABASE['password'],
min_size=10,
max_size=50,
command_timeout=30
)
Redis Connection Pooling
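aioredis connection pool (aioredis 1.x API; aioredis 2.x removed create_redis_pool in favor of aioredis.from_url):
```python
# No `encoding` argument: cached values are zlib-compressed pickles, which are
# not valid UTF-8, so the client must return raw bytes
redis = await aioredis.create_redis_pool(
    config.CACHE['redis_url'],
    minsize=5,
    maxsize=20,
)
```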
Rate Limiting Architecture
Three rate limiter implementations:
NullRateLimiter (Default)
No rate limiting, maximum throughput.
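Assuming all three limiters share an `acquire()` coroutine, the null version can be a no-op that callers await unconditionally. The class body and the `call_with_limit` helper below are an illustrative sketch, not code taken from the project.

```python
import asyncio

class NullRateLimiter:
    """Same interface as the other limiters, but never waits or raises."""
    async def acquire(self, key=None):
        return None

async def call_with_limit(limiter, func, *args):
    # Callers always await acquire(), so swapping limiters needs no code changes
    await limiter.acquire()
    return await func(*args)

async def double(x):
    return x * 2

result = asyncio.run(call_with_limit(NullRateLimiter(), double, 21))
```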
SimpleRateLimiter
In-memory queue-based rate limiting:
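```python
import asyncio
import time

class SimpleRateLimiter:
    """Sliding window: at most max_requests acquisitions per time_window seconds."""

    def __init__(self, max_requests, time_window):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = []  # monotonic timestamps of recent acquisitions

    async def acquire(self):
        while True:
            now = time.monotonic()
            # Remove requests that have left the window
            self.requests = [r for r in self.requests if r > now - self.time_window]
            if len(self.requests) < self.max_requests:
                self.requests.append(now)
                return
            # Wait for the oldest request to expire, then re-check; re-reading
            # the clock keeps the recorded timestamp accurate after sleeping
            await asyncio.sleep(self.requests[0] + self.time_window - now)
```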
RedisRateLimiter
Distributed rate limiting across multiple API instances:
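```python
import time

class RateLimitExceeded(Exception):
    """Raised when a key exceeds its request budget for the current window."""

class RedisRateLimiter:
    def __init__(self, redis, max_requests, time_window):
        self.redis, self.max_requests, self.time_window = redis, max_requests, time_window

    async def acquire(self, key):
        # Fixed window: every API instance increments the same per-window counter
        window_key = f"ratelimit:{key}:{int(time.time() // self.time_window)}"
        count = await self.redis.incr(window_key)
        if count == 1:
            # First hit in this window: let the counter expire with the window
            await self.redis.expire(window_key, self.time_window)
        if count > self.max_requests:
            raise RateLimitExceeded(f"Rate limit exceeded for {key}")
```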
Conclusion
The architecture demonstrates several advanced patterns:
- Mixin-based provider composition: Flexible, testable, extensible
- Multi-tier caching: Redis (hot) + PostgreSQL (persistent) + CDN (edge)
- Direct database access: Complex SQL aggregations for performance
- Async-first design: Quart + asyncpg + aioredis for high concurrency
- Fallback chains: Graceful degradation when external services fail
- Background crawling: Proactive cache warming for better UX
- Change detection: Efficient invalidation based on upstream updates
The mixin architecture lets providers be composed from small capability interfaces rather than a deep single-inheritance hierarchy. This makes testing and mocking straightforward: a test double only has to implement the mixins it stands in for.
The three-tier caching strategy with compression is designed for high hit rates at reasonable storage cost, and the crawler keeps recently updated content warm before clients request it.
Direct MusicBrainz database access with JSON aggregation SQL is a key performance optimization that would be difficult to replicate with the web API.