Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00

Lidarr Metadata API - Architecture

Architectural Overview

LidarrAPI.Metadata implements a layered architecture with clear separation of concerns:

┌─────────────────────────────────────────────────────────────┐
│                     Cloudflare CDN                          │
│                  (Edge Cache Layer)                         │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                   Quart Application                         │
│                  (app.py - Routes)                          │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                    API Layer                                │
│              (api.py - Business Logic)                      │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                  Provider Layer                             │
│         (provider.py - Mixin Architecture)                  │
│  ┌──────────────┬──────────────┬──────────────────────┐    │
│  │ MusicBrainz  │ Solr Search  │ External Providers   │    │
│  │   DB Direct  │   (Artist/   │ (FanArt, TheAudioDB, │    │
│  │              │    Album)    │  Wikipedia, Spotify) │    │
│  └──────────────┴──────────────┴──────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                   Cache Layer                               │
│         (cache.py - Multi-Tier Caching)                     │
│  ┌──────────────┬──────────────┬──────────────────────┐    │
│  │    Redis     │  PostgreSQL  │   Compression        │    │
│  │  (Ephemeral) │ (Persistent) │   (zlib pickle)      │    │
│  └──────────────┴──────────────┴──────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                  Data Sources                               │
│  ┌──────────────┬──────────────┬──────────────────────┐    │
│  │ MusicBrainz  │     Solr     │   External APIs      │    │
│  │  PostgreSQL  │   (Search)   │ (15+ integrations)   │    │
│  └──────────────┴──────────────┴──────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

Mixin-Based Provider Architecture

The core architectural pattern is a mixin-based provider system that allows flexible composition of data source capabilities.

Provider Mixin Hierarchy

# Base capability mixins
class ArtistByIdMixin:
    async def get_artist_by_id(self, mbid: str) -> dict:
        raise NotImplementedError

class ArtistNameSearchMixin:
    async def search_artist_name(self, query: str) -> list:
        raise NotImplementedError

class AlbumByIdMixin:
    async def get_album_by_id(self, mbid: str) -> dict:
        raise NotImplementedError

class AlbumNameSearchMixin:
    async def search_album_name(self, query: str) -> list:
        raise NotImplementedError

class ArtistOverviewMixin:
    async def get_artist_overview(self, mbid: str) -> str:
        raise NotImplementedError

class ArtistImagesMixin:
    async def get_artist_images(self, mbid: str) -> list:
        raise NotImplementedError

class AlbumImagesMixin:
    async def get_album_images(self, mbid: str) -> list:
        raise NotImplementedError

class ArtistLinksMixin:
    async def get_artist_links(self, mbid: str) -> list:
        raise NotImplementedError
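One way such mixins can be used is capability discovery: a caller filters the registered providers by the mixin it needs. The sketch below is only an illustration of that idea. `DummyImages`, `DummySearch`, and `providers_for` are hypothetical names that do not appear in the project.

```python
import asyncio


class ArtistImagesMixin:
    async def get_artist_images(self, mbid: str) -> list:
        raise NotImplementedError


class ArtistNameSearchMixin:
    async def search_artist_name(self, query: str) -> list:
        raise NotImplementedError


class DummyImages(ArtistImagesMixin):
    """Hypothetical provider that only returns images."""

    async def get_artist_images(self, mbid: str) -> list:
        return [{"type": "poster", "url": f"https://example.invalid/{mbid}.jpg"}]


class DummySearch(ArtistNameSearchMixin):
    """Hypothetical provider that only searches by name."""

    async def search_artist_name(self, query: str) -> list:
        return [{"name": query}]


def providers_for(providers: list, capability: type) -> list:
    """Return the providers that implement the given capability mixin."""
    return [p for p in providers if isinstance(p, capability)]


registry = [DummyImages(), DummySearch()]
image_providers = providers_for(registry, ArtistImagesMixin)
images = asyncio.run(image_providers[0].get_artist_images("abc"))
```

Because each capability is a separate class, a test can register a stub that implements only the mixin under test.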

Provider Implementations

Each provider implements one or more mixins based on its capabilities:

MusicbrainzDbProvider

Mixins: ArtistByIdMixin, AlbumByIdMixin

Purpose: Authoritative source for core music metadata

Implementation:

Direct asyncpg connection to MusicBrainz PostgreSQL database
Complex SQL queries with JSON aggregation (row_to_json, json_agg)
Read-only access to replicated database
Custom indices on last_updated columns for change detection

SQL files:

lidarrmetadata/sql/artist.sql: Artist metadata with releases
lidarrmetadata/sql/album.sql: Album metadata with tracks
lidarrmetadata/sql/updated_artists.sql: Change detection query
lidarrmetadata/sql/updated_albums.sql: Change detection query

Key tables accessed:

artist: Core artist data
release_group: Album groupings
release: Specific releases
medium: Physical/digital media
track: Track listings
recording: Recording metadata
url: External links
l_artist_url: Artist-URL relationships
cover_art_archive.index_listing: Cover art availability

SolrSearchProvider

Mixins: ArtistNameSearchMixin, AlbumNameSearchMixin

Purpose: Full-text search for artist and album discovery

Implementation:

Async HTTP client to Solr REST API
Two cores: artist and release-group
Dismax query parser for relevance ranking
5-second timeout per query
Real-time index updates via RabbitMQ + SIR (Search Index Rebuilder)

Query structure:

{
    "query": query_string,
    "limit": 10,
    "params": {
        "defType": "dismax",
        "qf": "artist^2 sortname alias",
        "mm": "1"
    }
}
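The request body above could be produced by a small builder such as the following sketch. The function name `build_artist_query` is an assumption for illustration. In dismax, `qf` lists the fields to search, `artist^2` weights matches on the `artist` field twice as heavily as the others, and `mm` of `1` requires at least one query clause to match.

```python
import json


def build_artist_query(query_string: str, limit: int = 10) -> dict:
    """Build a Solr JSON request body for an artist name search."""
    return {
        "query": query_string,
        "limit": limit,
        "params": {
            "defType": "dismax",               # use the DisMax query parser
            "qf": "artist^2 sortname alias",   # searched fields, artist boosted 2x
            "mm": "1",                         # at least one clause must match
        },
    }


body = json.dumps(build_artist_query("nirvana"))
```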

FanArtTvProvider

Mixins: ArtistImagesMixin, AlbumImagesMixin

Purpose: High-quality fan art and promotional images

Implementation:

REST API with API key authentication
7-day lag for free API keys (personal keys have no lag)
30-day cache TTL
Image types: poster, banner, logo, fanart, cover
Fallback to Cover Art Archive if unavailable

API endpoints:

https://webservice.fanart.tv/v3/music/{mbid}
https://webservice.fanart.tv/v3/music/albums/{mbid}

TheAudioDbProvider

Mixins: ArtistOverviewMixin, ArtistImagesMixin, AlbumImagesMixin, ArtistLinksMixin

Purpose: Fallback provider for images and metadata

Implementation:

REST API with API key "1" (public key)
10-second timeout
Used as fallback when FanArt.tv or Wikipedia unavailable
Provides artist biographies, images, and social media links

API endpoints:

https://theaudiodb.com/api/v1/json/1/artist-mb.php?i={mbid}
https://theaudiodb.com/api/v1/json/1/album-mb.php?i={mbid}

WikipediaProvider

Mixins: ArtistOverviewMixin

Purpose: Artist biographical information

Implementation:

Multi-stage lookup: MusicBrainz → Wikidata → Wikipedia
32-language fallback chain (en, fr, de, es, it, ja, zh, ru, pt, nl, sv, fi, no, da, pl, cs, hu, ro, tr, el, he, ar, fa, hi, th, ko, vi, id, ms, tl, bn, ta)
BeautifulSoup HTML parsing
2-second timeout per request
1 connection per host limit
Extracts first paragraph as overview

Lookup flow:

MusicBrainz MBID → Wikidata entity → Wikipedia article → Extract summary
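The language fallback step can be sketched as a lookup over a Wikidata entity's sitelinks, where keys such as `enwiki` or `dewiki` map to article titles. The function name `pick_article`, the shortened language list, and the sample payload are assumptions for illustration; the project's actual selection logic may differ.

```python
from typing import Optional, Tuple

# First entries of the 32-language chain listed above.
LANGUAGES = ["en", "fr", "de", "es", "it", "ja"]


def pick_article(sitelinks: dict, languages=LANGUAGES) -> Optional[Tuple[str, str]]:
    """Return (language, title) for the first language that has an article."""
    for lang in languages:
        entry = sitelinks.get(f"{lang}wiki")
        if entry:
            return lang, entry["title"]
    return None


# Hypothetical payload shaped like the "sitelinks" map of a Wikidata entity.
sitelinks = {
    "dewiki": {"title": "Nirvana (US-amerikanische Band)"},
    "jawiki": {"title": "ニルヴァーナ"},
}
choice = pick_article(sitelinks)
```

Here no English or French article exists, so the German one is chosen because `de` comes before `ja` in the chain.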

SpotifyProvider

Mixins: ArtistByIdMixin, AlbumByIdMixin, ArtistLinksMixin

Purpose: Spotify ID mapping and cross-platform linking

Implementation:

spotipy library with OAuth authentication
Levenshtein similarity matching (ratio threshold 0.8) for name-based lookups
Provides Spotify URIs for deep linking
Used for chart data correlation
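The name matching can be illustrated with a plain-Python Levenshtein similarity. This is a sketch under assumptions: the source only states the 0.8 threshold, so the normalization (edit distance divided by the longer string's length) and the case folding are choices made here, and `similarity` and `is_match` are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after case folding."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    return similarity(a, b) >= threshold
```

With this normalization, "Sigur Rós" and "Sigur Ros" match (one edit over nine characters), while "Nirvana" and "Nirvana UK" do not (three edits over ten characters).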

Authentication flow:

Client credentials OAuth
Token refresh on expiration
Tokens cached in Redis

Layer Responsibilities

1. Quart Application Layer (app.py)

Responsibilities:

HTTP request routing
Request parameter validation
Response serialization
Cache-Control header management
Error handling and HTTP status codes

Key routes:

@app.route('/')
async def root():
    # Health check and version info
    ...

@app.route('/artist/<mbid>')
async def get_artist(mbid):
    # Artist metadata endpoint
    ...

@app.route('/artist/<mbid>/refresh', methods=['POST'])
async def refresh_artist(mbid):
    # Cache invalidation endpoint
    ...

@app.route('/search/artist')
async def search_artist():
    # Artist search endpoint
    ...

@app.route('/chart/<name>/<type>/<selection>')
async def get_chart(name, type, selection):
    # Chart data endpoint
    ...

Request flow:

Parse and validate request parameters
Call API layer method
Set Cache-Control headers based on response metadata
Serialize response to JSON
Return HTTP response
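The first step, parameter validation, typically means rejecting anything that is not a well-formed MBID before it reaches the API layer. MBIDs are UUIDs, so a minimal check can lean on the standard `uuid` module. The function name `is_valid_mbid` is hypothetical, and requiring the canonical hyphenated form is an assumption about what the route accepts.

```python
import uuid


def is_valid_mbid(value) -> bool:
    """Return True only for a canonical, hyphenated UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, "urn:uuid:" prefixes, and bare hex,
    # so compare against the canonical rendering to reject those forms.
    return str(parsed) == value.lower()
```

A route handler could return a 400 response when this check fails instead of issuing a database query.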

2. API Layer (api.py)

Responsibilities:

Business logic orchestration
Provider coordination
Data aggregation from multiple sources
Cache management
Response formatting

Key methods:

async def get_artist_by_id(mbid, prim_types, sec_types, release_statuses):
    # 1. Check cache
    # 2. Query MusicBrainz DB
    # 3. Parallel fetch: images, overview, links
    # 4. Aggregate data
    # 5. Cache result
    # 6. Return formatted response
    ...

async def search_artist(query):
    # 1. Check cache
    # 2. Query Solr
    # 3. Enrich results with cached metadata
    # 4. Cache search results
    # 5. Return formatted response
    ...

Parallel fetching pattern:

# Fetch multiple data sources concurrently
images_task = asyncio.create_task(fanart_provider.get_artist_images(mbid))
overview_task = asyncio.create_task(wikipedia_provider.get_artist_overview(mbid))
links_task = asyncio.create_task(spotify_provider.get_artist_links(mbid))

images = await images_task
overview = await overview_task
links = await links_task
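In the pattern above, an exception in any one task propagates out of its `await`. A variant that tolerates partial failure can use `asyncio.gather` with `return_exceptions=True` and substitute defaults for failed sources. The sketch below uses stand-in coroutines (`fetch_images`, `fetch_overview`, `fetch_links`) that are assumptions, not the project's providers.

```python
import asyncio


async def fetch_images(mbid):
    await asyncio.sleep(0.01)
    return [{"type": "poster"}]


async def fetch_overview(mbid):
    await asyncio.sleep(0.01)
    raise TimeoutError("overview source timed out")  # simulated failure


async def fetch_links(mbid):
    await asyncio.sleep(0.01)
    return [{"type": "official homepage"}]


async def fetch_enrichment(mbid):
    # All three coroutines run concurrently; exceptions come back as values.
    images, overview, links = await asyncio.gather(
        fetch_images(mbid),
        fetch_overview(mbid),
        fetch_links(mbid),
        return_exceptions=True,
    )
    return {
        "Images": [] if isinstance(images, Exception) else images,
        "Overview": None if isinstance(overview, Exception) else overview,
        "Links": [] if isinstance(links, Exception) else links,
    }


enrichment = asyncio.run(fetch_enrichment("5b11f4ce-a62d-471e-81fc-a69a8278c7da"))
```

The failed overview becomes `None` while images and links are still returned.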

3. Provider Layer (provider.py)

Responsibilities:

Data source abstraction
External API communication
Error handling and retries
Timeout management
Data transformation to common format

Provider composition:

class MetadataProvider(
    MusicbrainzDbProvider,
    SolrSearchProvider,
    FanArtTvProvider,
    TheAudioDbProvider,
    WikipediaProvider,
    SpotifyProvider
):
    """Composite provider with all capabilities"""
    pass

Fallback chain example:

async def get_artist_images(self, mbid):
    try:
        # Primary: FanArt.tv
        return await self.fanart.get_artist_images(mbid)
    except (TimeoutError, HTTPError):
        pass  # Fall through to the next provider
    try:
        # Fallback: TheAudioDB
        return await self.theaudiodb.get_artist_images(mbid)
    except (TimeoutError, HTTPError):
        # Last resort: Cover Art Archive
        return await self.cover_art_archive.get_artist_images(mbid)

4. Cache Layer (cache.py)

Responsibilities:

Multi-tier cache management
Cache key generation
TTL management
Compression/decompression
Cache invalidation
Statistics tracking

Cache tiers:

Tier 1: Redis (Ephemeral)

Configuration:

Namespace: lm3.7
Memory limit: 512MB
Eviction policy: LFU (Least Frequently Used)
Default TTL: 7 days

Use cases:

Hot data (frequently accessed artists/albums)
Rate limiter state
Sentry deduplication
Invalidation locks

Implementation:

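import pickle
import zlib

class RedisCache:
    async def get(self, key):
        value = await self.redis.get(f"lm3.7:{key}")
        if value is not None:
            # Only unpickle values this service wrote; pickle is unsafe on untrusted data
            return pickle.loads(zlib.decompress(value))
        return None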

    async def set(self, key, value, ttl=604800):  # 7 days
        compressed = zlib.compress(pickle.dumps(value))
        await self.redis.setex(f"lm3.7:{key}", ttl, compressed)
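The serialization used by both cache tiers can be exercised on its own. The following sketch round-trips a sample record through pickle and zlib and compares sizes. The helper names `pack` and `unpack` and the sample record are assumptions, and the compression ratio depends on the data.

```python
import pickle
import zlib


def pack(value) -> bytes:
    """Pickle a value, then compress the pickled bytes."""
    return zlib.compress(pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL))


def unpack(blob: bytes):
    """Reverse pack(). Only call this on data the service wrote itself,
    because unpickling untrusted bytes can execute arbitrary code."""
    return pickle.loads(zlib.decompress(blob))


# Hypothetical record with repetitive structure, as metadata often has.
record = {
    "ArtistName": "Nirvana",
    "Albums": [{"Title": f"Album {i}", "Type": "Album"} for i in range(200)],
}
blob = pack(record)
raw_size = len(pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL))
```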

Tier 2: PostgreSQL (Persistent)

Schema:

CREATE TABLE IF NOT EXISTS {cache_name} (
    key VARCHAR(255) PRIMARY KEY,
    expires TIMESTAMP,
    updated TIMESTAMP DEFAULT NOW(),
    value BYTEA  -- zlib compressed pickle
);

CREATE TRIGGER update_timestamp
    BEFORE UPDATE ON {cache_name}
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_column();

Auto-created tables:

artist: Artist metadata cache
album: Album metadata cache
spotify: Spotify lookup cache
fanart: FanArt.tv image cache
tadb: TheAudioDB metadata cache
wikipedia: Wikipedia overview cache

Use cases:

Long-term storage for all metadata
Fallback when Redis evicts data
Historical data for analytics
Compressed storage (10:1 ratio typical)

Implementation:

class PostgresCache:
    async def get(self, key):
        row = await self.conn.fetchrow(
            f"SELECT value, expires FROM {self.table} WHERE key = $1",
            key
        )
        if row and (not row['expires'] or row['expires'] > datetime.now()):
            return pickle.loads(zlib.decompress(row['value']))
        return None

    async def set(self, key, value, ttl=None):
        compressed = zlib.compress(pickle.dumps(value))
        expires = datetime.now() + timedelta(seconds=ttl) if ttl else None
        await self.conn.execute(
            f"""
            INSERT INTO {self.table} (key, value, expires)
            VALUES ($1, $2, $3)
            ON CONFLICT (key) DO UPDATE
            SET value = $2, expires = $3, updated = NOW()
            """,
            key, compressed, expires
        )

Tier 3: Cloudflare CDN (Edge)

Configuration:

Cache-Control header: s-maxage=2592000, max-age=0
Programmatic purge via Cloudflare API
Batch purge: 30 URLs per request

Use cases:

Global edge caching for popular artists/albums
Reduced origin load
Low-latency responses worldwide

Cache invalidation:

async def invalidate_cdn_cache(urls):
    # Batch URLs into groups of 30
    for batch in chunks(urls, 30):
        await cloudflare_client.purge_cache(batch)
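The `chunks` helper used above is not shown in the source. A minimal version, assuming it simply splits a sequence into consecutive batches of at most `size` items, could look like this:

```python
from typing import Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical URLs; 65 of them split into batches of 30, 30, and 5.
urls = [f"https://api.example.invalid/artist/{i}" for i in range(65)]
batches = list(chunks(urls, 30))
```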

Data Flow Patterns

Artist Metadata Request Flow

1. Client → GET /artist/{mbid}
2. app.py → Validate MBID format
3. api.py → Check Redis cache
4. [CACHE MISS] → Check PostgreSQL cache
5. [CACHE MISS] → Query MusicBrainz DB (artist.sql)
6. api.py → Parallel fetch:
   - FanArt.tv images
   - Wikipedia overview
   - Spotify links
   - TheAudioDB metadata (fallback)
7. api.py → Aggregate data into Artist object
8. api.py → Store in PostgreSQL cache
9. api.py → Store in Redis cache
10. app.py → Set Cache-Control headers
11. app.py → Return JSON response
12. Cloudflare → Cache at edge
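Steps 3 to 9 amount to a read-through lookup across two tiers. The sketch below models that with in-memory stand-ins. `DictCache` and `get_with_tiers` are hypothetical names, and copying a PostgreSQL hit back into Redis is an assumption that the source does not state.

```python
import asyncio


class DictCache:
    """In-memory stand-in for a cache tier."""

    def __init__(self):
        self.data = {}

    async def get(self, key):
        return self.data.get(key)

    async def set(self, key, value):
        self.data[key] = value


async def get_with_tiers(key, redis_tier, postgres_tier, fetch):
    """Return (value, source), checking Redis, then PostgreSQL, then origin."""
    value = await redis_tier.get(key)
    if value is not None:
        return value, "redis"
    value = await postgres_tier.get(key)
    if value is not None:
        await redis_tier.set(key, value)  # assumed: promote to the hot tier
        return value, "postgres"
    value = await fetch(key)
    await postgres_tier.set(key, value)   # persistent tier first
    await redis_tier.set(key, value)
    return value, "origin"


async def demo():
    redis_tier, postgres_tier = DictCache(), DictCache()

    async def fetch(key):
        return {"Id": key}

    _, first = await get_with_tiers("artist:1", redis_tier, postgres_tier, fetch)
    redis_tier.data.clear()  # simulate Redis evicting the entry
    _, second = await get_with_tiers("artist:1", redis_tier, postgres_tier, fetch)
    _, third = await get_with_tiers("artist:1", redis_tier, postgres_tier, fetch)
    return first, second, third


sources = asyncio.run(demo())
```

The three lookups are served from the origin, then PostgreSQL after the simulated eviction, then Redis.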

Search Request Flow

1. Client → GET /search/artist?query=nirvana
2. app.py → Validate query parameter
3. api.py → Check Redis cache (key: search:artist:nirvana)
4. [CACHE MISS] → Query Solr artist core
5. Solr → Return list of MBIDs with scores
6. api.py → For each result:
   - Check cache for artist metadata
   - [CACHE MISS] → Fetch from MusicBrainz DB
7. api.py → Aggregate search results
8. api.py → Cache search results (TTL: 1 hour)
9. app.py → Return JSON response

Cache Invalidation Flow

1. Crawler → Detect updated artist (updated_artists.sql)
2. Crawler → POST /artist/{mbid}/refresh
3. api.py → Verify INVALIDATE_APIKEY
4. api.py → Delete from Redis cache
5. api.py → Delete from PostgreSQL cache
6. api.py → Purge Cloudflare CDN cache
7. api.py → Return 200 OK
8. Next request → Cache miss → Fresh fetch

Background Crawler Architecture

The crawler runs independently of the API server and proactively warms the cache.

Crawler Types

1. Wikipedia Overview Crawler

Purpose: Pre-fetch artist biographies

Implementation:

async def crawl_wikipedia_overviews():
    # Get recently updated artists
    artists = await get_updated_artists(limit=100)
    
    for artist in artists:
        # Check if overview already cached
        if not await cache.exists(f"wikipedia:{artist['mbid']}"):
            # Fetch and cache overview
            overview = await wikipedia_provider.get_artist_overview(artist['mbid'])
            await cache.set(f"wikipedia:{artist['mbid']}", overview, ttl=2592000)
        
        await asyncio.sleep(1)  # Rate limiting

2. FanArt.tv Image Crawler

Purpose: Pre-fetch artist and album images

Implementation:

async def crawl_fanart_images():
    artists = await get_updated_artists(limit=100)
    
    for artist in artists:
        if not await cache.exists(f"fanart:artist:{artist['mbid']}"):
            images = await fanart_provider.get_artist_images(artist['mbid'])
            await cache.set(f"fanart:artist:{artist['mbid']}", images, ttl=2592000)
        
        await asyncio.sleep(2)  # FanArt.tv rate limit

3. TheAudioDB Metadata Crawler

Purpose: Pre-fetch fallback metadata

Implementation:

async def crawl_theaudiodb_metadata():
    artists = await get_updated_artists(limit=100)
    
    for artist in artists:
        if not await cache.exists(f"tadb:{artist['mbid']}"):
            metadata = await theaudiodb_provider.get_artist_metadata(artist['mbid'])
            await cache.set(f"tadb:{artist['mbid']}", metadata, ttl=2592000)
        
        await asyncio.sleep(1)

4. Artist Metadata Crawler

Purpose: Pre-fetch complete artist metadata

Implementation:

async def crawl_artist_metadata():
    artists = await get_updated_artists(limit=50)
    
    for artist in artists:
        # Fetch complete artist metadata (triggers all providers)
        metadata = await api.get_artist_by_id(artist['mbid'])
        # Already cached by get_artist_by_id
        
        await asyncio.sleep(5)  # Avoid overwhelming external APIs

5. Album Metadata Crawler

Purpose: Pre-fetch album metadata for recently updated albums

Implementation:

async def crawl_album_metadata():
    albums = await get_updated_albums(limit=50)
    
    for album in albums:
        metadata = await api.get_album_by_id(album['mbid'])
        await asyncio.sleep(5)

Crawler Scheduling

Crawlers run on configurable schedules:

# crawler.py
async def run_crawlers():
    while True:
        # Run all crawlers in parallel
        await asyncio.gather(
            crawl_wikipedia_overviews(),
            crawl_fanart_images(),
            crawl_theaudiodb_metadata(),
            crawl_artist_metadata(),
            crawl_album_metadata()
        )
        
        # Sleep between cycles
        await asyncio.sleep(3600)  # 1 hour

MusicBrainz Database Integration

Direct Database Access

Unlike most MusicBrainz consumers, this project queries the database directly rather than using the web API.

Advantages:

Complex joins and aggregations in SQL
No rate limiting
Sub-second response times
JSON aggregation in database

Disadvantages:

Requires full MusicBrainz database replica (~100GB+)
Must maintain custom indices
Schema changes require SQL updates

SQL Query Architecture

Artist Query (artist.sql)

Purpose: Fetch complete artist metadata with releases

Key features:

JSON aggregation of releases
Filtering by release type and status
Link extraction
Cover art availability check

Query structure:

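-- Simplified sketch; table and column names follow the MusicBrainz schema
SELECT
    a.gid AS id,
    a.name AS artist_name,
    a.sort_name,
    a.comment AS disambiguation,
    aty.name AS artist_type,
    -- Release groups as a JSON array. Each aggregate lives in its own
    -- subquery so that releases and links cannot multiply each other's rows.
    COALESCE((
        SELECT json_agg(
            json_build_object(
                'id', rg.gid,
                'title', rg.name,
                'type', rgt.name,
                -- Distinct statuses of the releases in this group
                'statuses', (
                    SELECT json_agg(DISTINCT rs.name)
                    FROM release r
                    JOIN release_status rs ON r.status = rs.id
                    WHERE r.release_group = rg.id
                ),
                -- concat_ws skips NULL parts, so partial dates still render
                'date', NULLIF(concat_ws('-',
                    rgm.first_release_date_year,
                    lpad(rgm.first_release_date_month::text, 2, '0'),
                    lpad(rgm.first_release_date_day::text, 2, '0')), '')
            )
            ORDER BY rgm.first_release_date_year DESC NULLS LAST,
                     rgm.first_release_date_month DESC NULLS LAST,
                     rgm.first_release_date_day DESC NULLS LAST
        )
        -- release_group.artist_credit points at artist_credit, so artists
        -- are reached through artist_credit_name
        FROM artist_credit_name acn
        JOIN release_group rg ON rg.artist_credit = acn.artist_credit
        LEFT JOIN release_group_primary_type rgt ON rg.type = rgt.id
        LEFT JOIN release_group_meta rgm ON rgm.id = rg.id
        WHERE acn.artist = a.id
    ), '[]'::json) AS releases,
    -- Links as a JSON array
    COALESCE((
        SELECT json_agg(json_build_object('type', lt.name, 'url', u.url))
        FROM l_artist_url lau
        JOIN link l ON lau.link = l.id
        JOIN link_type lt ON l.link_type = lt.id
        JOIN url u ON lau.entity1 = u.id
        WHERE lau.entity0 = a.id
    ), '[]'::json) AS links
FROM artist a
LEFT JOIN artist_type aty ON a.type = aty.id
WHERE a.gid = $1;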

Album Query (album.sql)

Purpose: Fetch album metadata with tracks

Key features:

Track listing aggregation
Medium information (CD, Vinyl, Digital)
Recording metadata
Cover art availability

Query structure:

SELECT
    rg.gid AS id,
    rg.name AS title,
    a.name AS artist_name,
    -- Aggregate media as JSON array
    COALESCE(
        json_agg(
            json_build_object(
                'position', m.position,
                'format', mf.name,
                'tracks', (
                    SELECT json_agg(
                        json_build_object(
                            'position', t.position,
                            'title', t.name,
                            'duration', t.length
                        )
                        ORDER BY t.position
                    )
                    FROM track t
                    WHERE t.medium = m.id
                )
            )
            ORDER BY m.position
        ) FILTER (WHERE m.id IS NOT NULL),
        '[]'::json
    ) AS media
FROM release_group rg
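JOIN artist_credit ac ON rg.artist_credit = ac.id
-- An artist credit can name several artists; take the first credited one
JOIN artist_credit_name acn ON acn.artist_credit = ac.id AND acn.position = 0
JOIN artist a ON acn.artist = a.id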
LEFT JOIN release r ON r.release_group = rg.id
LEFT JOIN medium m ON m.release = r.id
LEFT JOIN medium_format mf ON m.format = mf.id
WHERE rg.gid = $1
GROUP BY rg.id, a.name;

Change Detection Queries

updated_artists.sql: Detects recently updated artists

Change sources (UNION of 5 queries):

Artists with updated metadata
Artists with new releases
Artists with updated releases
Artists with new links
Artists with updated cover art

Query structure:

-- Source 1: Updated artist metadata
SELECT DISTINCT a.gid, a.last_updated
FROM artist a
WHERE a.last_updated > $1

UNION

-- Source 2: New releases
SELECT DISTINCT a.gid, rg.last_updated
FROM artist a
JOIN artist_credit_name acn ON acn.artist = a.id
JOIN release_group rg ON rg.artist_credit = acn.artist_credit
WHERE rg.last_updated > $1

UNION

-- Source 3: Updated releases
SELECT DISTINCT a.gid, r.last_updated
FROM artist a
JOIN artist_credit_name acn ON acn.artist = a.id
JOIN release_group rg ON rg.artist_credit = acn.artist_credit
JOIN release r ON r.release_group = rg.id
WHERE r.last_updated > $1

UNION

-- Source 4: New links
SELECT DISTINCT a.gid, lau.last_updated
FROM artist a
JOIN l_artist_url lau ON lau.entity0 = a.id
WHERE lau.last_updated > $1

UNION

-- Source 5: Updated cover art
SELECT DISTINCT a.gid, caa.date_updated
FROM artist a
JOIN artist_credit_name acn ON acn.artist = a.id
JOIN release_group rg ON rg.artist_credit = acn.artist_credit
JOIN release r ON r.release_group = rg.id
JOIN cover_art_archive.index_listing caa ON caa.release = r.id
WHERE caa.date_updated > $1

ORDER BY last_updated DESC
LIMIT $2;

updated_albums.sql: Similar structure for album change detection
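Because `UNION` removes duplicates only on the full `(gid, last_updated)` pair, an artist touched by several change sources can appear more than once with different timestamps. A consumer such as the crawler may want one row per artist. The sketch below shows one way to collapse the rows; the function name `latest_per_artist` and the sample data are assumptions.

```python
from datetime import datetime


def latest_per_artist(rows):
    """Collapse (gid, last_updated) rows to the newest timestamp per gid,
    ordered newest first."""
    latest = {}
    for gid, updated in rows:
        if gid not in latest or updated > latest[gid]:
            latest[gid] = updated
    return sorted(latest.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result set: artist "a" changed via two different sources.
rows = [
    ("a", datetime(2026, 1, 1)),
    ("b", datetime(2026, 1, 3)),
    ("a", datetime(2026, 1, 2)),
]
deduplicated = latest_per_artist(rows)
```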

Custom Database Indices

To support efficient change detection queries:

-- Artist last_updated index
CREATE INDEX IF NOT EXISTS idx_artist_last_updated
ON artist (last_updated DESC);

-- Release group last_updated index
CREATE INDEX IF NOT EXISTS idx_release_group_last_updated
ON release_group (last_updated DESC);

-- Release last_updated index
CREATE INDEX IF NOT EXISTS idx_release_last_updated
ON release (last_updated DESC);

-- Cover art date_updated index
CREATE INDEX IF NOT EXISTS idx_cover_art_date_updated
ON cover_art_archive.index_listing (date_updated DESC);

Configuration Architecture

Metaclass-Based Configuration System

The project uses a metaclass-based configuration system in which environment variables override class attributes, with double underscores addressing nested keys.

Base configuration (config.py):

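import os

class ConfigMeta(type):
    """Metaclass that allows environment variable overrides"""

    def __getattribute__(cls, name):
        # Private and dunder names bypass the override logic. Without this
        # guard, reading cls.__name__ below would re-enter this method and
        # recurse without end.
        if name.startswith('_'):
            return super().__getattribute__(name)

        # Check for environment variable override.
        # Overrides are returned as strings; callers convert types as needed.
        env_key = f"{cls.__name__.upper()}_{name.upper()}"
        if env_key in os.environ:
            return os.environ[env_key]

        # Nested lookup (double underscore): DATABASE__HOST -> DATABASE['host']
        if '__' in name:
            parts = name.split('__')
            value = super().__getattribute__(parts[0])
            for part in parts[1:]:
                value = value[part.lower()]
            return value

        return super().__getattribute__(name)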

class DefaultConfig(metaclass=ConfigMeta):
    # Application
    APPLICATION_ROOT = '/'
    PORT = 5001
    
    # Database
    DATABASE = {
        'host': 'localhost',
        'port': 5432,
        'database': 'musicbrainz',
        'user': 'abc',
        'password': 'abc'
    }
    
    # Cache
    CACHE = {
        'redis_url': 'redis://localhost:6379/0',
        'postgres_url': 'postgresql://abc:abc@localhost/lm_cache'
    }
    
    # External APIs
    FANART_API_KEY = None
    THEAUDIODB_API_KEY = '1'
    SPOTIFY_CLIENT_ID = None
    SPOTIFY_CLIENT_SECRET = None
    LASTFM_API_KEY = None
    
    # Cloudflare
    CLOUDFLARE_ZONE_ID = None
    CLOUDFLARE_API_TOKEN = None
    
    # Monitoring
    SENTRY_DSN = None
    STATSD_HOST = None
    STATSD_PORT = 8125

Environment variable override examples:

# Simple override
export DEFAULTCONFIG_PORT=8080

# Nested override (double underscore)
export DEFAULTCONFIG_DATABASE__HOST=musicbrainz-db
export DEFAULTCONFIG_CACHE__REDIS_URL=redis://redis:6379/1

# Select configuration class
export LIDARR_METADATA_CONFIG=lidarrmetadata.config.ProductionConfig
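Environment variables are always strings, so an override such as `DEFAULTCONFIG_PORT=8080` arrives as `'8080'` rather than the integer `8080`. One way to handle this is to coerce each override to the type of its default. The sketch below illustrates that idea; `coerce_override` and `read_setting` are hypothetical helpers, not part of the project's `config.py`.

```python
import os


def coerce_override(default, raw: str):
    """Convert a raw environment string to the type of the default value."""
    if isinstance(default, bool):  # check bool first: bool is a subclass of int
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    if isinstance(default, int):
        return int(raw)
    if isinstance(default, float):
        return float(raw)
    return raw  # strings and None defaults keep the raw string


def read_setting(prefix: str, name: str, default, environ=os.environ):
    """Look up PREFIX_NAME in the environment, falling back to the default."""
    raw = environ.get(f"{prefix}_{name}".upper())
    return default if raw is None else coerce_override(default, raw)


port = read_setting("DefaultConfig", "PORT", 5001,
                    environ={"DEFAULTCONFIG_PORT": "8080"})
```

Passing `environ` explicitly keeps the helper testable without modifying the real process environment.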

Configuration Classes

DevelopmentConfig:

Debug logging enabled
Local database connections
No Sentry
No rate limiting

TestConfig:

In-memory SQLite for cache
Mock external APIs
Synchronous execution for deterministic tests

ProductionConfig:

Container-based service discovery
Sentry enabled
Redis rate limiting
Cloudflare CDN integration

Error Handling and Resilience

Provider Timeout Handling

Each provider has configurable timeouts:

class ProviderConfig:
    MUSICBRAINZ_TIMEOUT = 30  # Complex SQL queries
    SOLR_TIMEOUT = 5          # Search queries
    FANART_TIMEOUT = 10       # Image API
    THEAUDIODB_TIMEOUT = 10   # Metadata API
    WIKIPEDIA_TIMEOUT = 2     # Per-request
    SPOTIFY_TIMEOUT = 5       # OAuth + API

Fallback Chain Implementation

async def get_artist_images(self, mbid):
    providers = [
        (self.fanart, "FanArt.tv"),
        (self.theaudiodb, "TheAudioDB"),
        (self.cover_art_archive, "Cover Art Archive")
    ]
    
    for provider, name in providers:
        try:
            images = await asyncio.wait_for(
                provider.get_artist_images(mbid),
                timeout=provider.timeout
            )
            if images:
                logger.info(f"Got images from {name}")
                return images
        except asyncio.TimeoutError:
            logger.warning(f"{name} timeout, trying next provider")
        except Exception as e:
            logger.error(f"{name} error: {e}, trying next provider")
    
    return []  # No images available

Graceful Degradation

When external services fail, the API returns partial data:

{
    "Id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
    "ArtistName": "Nirvana",
    "Overview": null,  # Wikipedia unavailable
    "Images": [],      # FanArt.tv and TheAudioDB unavailable
    "Links": [...],    # MusicBrainz links still available
    "Albums": [...]    # Core data from MusicBrainz DB
}

Scalability Considerations

Horizontal Scaling

The API is stateless and can be horizontally scaled:

# docker-compose.prod.yml
services:
  api-v0.3:
    image: ghcr.io/lidarr/lidarrapi.metadata:v0.3
    deploy:
      replicas: 4
    environment:
      - CACHE__REDIS_URL=redis://redis:6379/0

Database Connection Pooling

asyncpg connection pool configuration:

pool = await asyncpg.create_pool(
    host=config.DATABASE['host'],
    port=config.DATABASE['port'],
    database=config.DATABASE['database'],
    user=config.DATABASE['user'],
    password=config.DATABASE['password'],
    min_size=10,
    max_size=50,
    command_timeout=30
)

Redis Connection Pooling

aioredis connection pool:

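# No encoding is set: cached values are zlib-compressed pickles and must be
# returned as bytes, which a UTF-8 decoding pool would fail to decode.
redis = await aioredis.create_redis_pool(
    config.CACHE['redis_url'],
    minsize=5,
    maxsize=20
)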

Rate Limiting Architecture

Three rate limiter implementations:

NullRateLimiter (Default)

No rate limiting, maximum throughput.

SimpleRateLimiter

In-memory queue-based rate limiting:

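import asyncio
import time

class SimpleRateLimiter:
    def __init__(self, max_requests, time_window):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = []  # Timestamps of admitted requests, oldest first
        self.lock = asyncio.Lock()  # Serialize concurrent acquirers

    async def acquire(self):
        async with self.lock:
            while True:
                now = time.monotonic()
                # Remove requests that have left the window
                self.requests = [r for r in self.requests if r > now - self.time_window]
                if len(self.requests) < self.max_requests:
                    break
                # Wait until the oldest request expires, then re-check
                await asyncio.sleep(self.requests[0] + self.time_window - now)
            # Record the admission time after any wait, not the pre-sleep time
            self.requests.append(time.monotonic())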

RedisRateLimiter

Distributed rate limiting across multiple API instances:

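import time

class RateLimitExceeded(Exception):
    """Raised when a key exceeds its request budget for the current window."""

class RedisRateLimiter:
    def __init__(self, redis, max_requests, time_window):
        self.redis = redis
        self.max_requests = max_requests
        self.time_window = time_window  # Window length in whole seconds

    async def acquire(self, key):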
        now = time.time()
        window_key = f"ratelimit:{key}:{int(now / self.time_window)}"
        
        count = await self.redis.incr(window_key)
        if count == 1:
            await self.redis.expire(window_key, self.time_window)
        
        if count > self.max_requests:
            raise RateLimitExceeded(f"Rate limit exceeded for {key}")
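The Redis limiter above is a fixed-window counter: every request increments a counter keyed by the index of the current time window. The in-memory sketch below mirrors that logic with an injectable clock so the behavior at a window boundary is easy to see. `FixedWindowCounter` is a hypothetical stand-in, not the project's class.

```python
class FixedWindowCounter:
    """In-memory model of a fixed-window rate limiter."""

    def __init__(self, max_requests: int, time_window: int):
        self.max_requests = max_requests
        self.time_window = time_window
        self.counts = {}  # (key, window index) -> requests seen

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.time_window)
        bucket = (key, window)
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.max_requests


limiter = FixedWindowCounter(max_requests=2, time_window=60)
# Three requests in the first minute, then two just after the boundary.
decisions = [limiter.allow("client-a", now) for now in (0, 10, 59, 60, 61)]
```

The third request is rejected, but the counter resets at second 60. A fixed window can therefore admit up to twice `max_requests` in a short span that straddles a boundary, which is the usual trade-off for its simplicity.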

Conclusion

The architecture demonstrates several advanced patterns:

Mixin-based provider composition: Flexible, testable, extensible
Multi-tier caching: Redis (hot) + PostgreSQL (persistent) + CDN (edge)
Direct database access: Complex SQL aggregations for performance
Async-first design: Quart + asyncpg + aioredis for high concurrency
Fallback chains: Graceful degradation when external services fail
Background crawling: Proactive cache warming for better UX
Change detection: Efficient invalidation based on upstream updates

The mixin architecture lets providers be composed by capability rather than through a deep inheritance hierarchy, which keeps testing and mocking straightforward.

The three-tier caching strategy with compression is designed to keep hit rates high and storage costs reasonable. The crawler warms the cache for recently updated artists and albums, so those entries can usually be served without an origin fetch.

Direct MusicBrainz database access with JSON aggregation SQL is a key performance optimization that would be difficult to replicate with the web API.
