
# Lidarr Metadata API - Architecture

## Architectural Overview

LidarrAPI.Metadata implements a layered architecture with clear separation of concerns:

```
┌─────────────────────────────────────────────────────────────┐
│                     Cloudflare CDN                          │
│                  (Edge Cache Layer)                         │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                   Quart Application                         │
│                  (app.py - Routes)                          │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                    API Layer                                │
│              (api.py - Business Logic)                      │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                  Provider Layer                             │
│         (provider.py - Mixin Architecture)                  │
│  ┌──────────────┬──────────────┬──────────────────────┐    │
│  │ MusicBrainz  │ Solr Search  │ External Providers   │    │
│  │   DB Direct  │   (Artist/   │ (FanArt, TheAudioDB, │    │
│  │              │    Album)    │  Wikipedia, Spotify) │    │
│  └──────────────┴──────────────┴──────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                   Cache Layer                               │
│         (cache.py - Multi-Tier Caching)                     │
│  ┌──────────────┬──────────────┬──────────────────────┐    │
│  │    Redis     │  PostgreSQL  │   Compression        │    │
│  │  (Ephemeral) │ (Persistent) │   (zlib pickle)      │    │
│  └──────────────┴──────────────┴──────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                  Data Sources                               │
│  ┌──────────────┬──────────────┬──────────────────────┐    │
│  │ MusicBrainz  │     Solr     │   External APIs      │    │
│  │  PostgreSQL  │   (Search)   │ (15+ integrations)   │    │
│  └──────────────┴──────────────┴──────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
```

## Mixin-Based Provider Architecture

The core architectural pattern is a mixin-based provider system that allows flexible composition of data source capabilities.

### Provider Mixin Hierarchy

```python
# Base capability mixins
class ArtistByIdMixin:
    async def get_artist_by_id(self, mbid: str) -> dict:
        raise NotImplementedError

class ArtistNameSearchMixin:
    async def search_artist_name(self, query: str) -> list:
        raise NotImplementedError

class AlbumByIdMixin:
    async def get_album_by_id(self, mbid: str) -> dict:
        raise NotImplementedError

class AlbumNameSearchMixin:
    async def search_album_name(self, query: str) -> list:
        raise NotImplementedError

class ArtistOverviewMixin:
    async def get_artist_overview(self, mbid: str) -> str:
        raise NotImplementedError

class ArtistImagesMixin:
    async def get_artist_images(self, mbid: str) -> list:
        raise NotImplementedError

class AlbumImagesMixin:
    async def get_album_images(self, mbid: str) -> list:
        raise NotImplementedError

class ArtistLinksMixin:
    async def get_artist_links(self, mbid: str) -> list:
        raise NotImplementedError
```
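As a minimal sketch of how these mixins might be used, the example below shows a provider that implements a single capability and a helper that selects providers by capability with `isinstance`. `StaticOverviewProvider` and `providers_with` are hypothetical names introduced for illustration only; they are not taken from the project.

```python
import asyncio


class ArtistOverviewMixin:
    async def get_artist_overview(self, mbid: str) -> str:
        raise NotImplementedError


class ArtistImagesMixin:
    async def get_artist_images(self, mbid: str) -> list:
        raise NotImplementedError


class StaticOverviewProvider(ArtistOverviewMixin):
    """Hypothetical provider that can only return overviews."""

    def __init__(self, overviews: dict):
        self._overviews = overviews

    async def get_artist_overview(self, mbid: str) -> str:
        return self._overviews.get(mbid, "")


def providers_with(providers: list, capability: type) -> list:
    """Return the providers that implement the given capability mixin."""
    return [p for p in providers if isinstance(p, capability)]


providers = [StaticOverviewProvider({"abc": "A band from Aberdeen."})]
overview_sources = providers_with(providers, ArtistOverviewMixin)
image_sources = providers_with(providers, ArtistImagesMixin)
overview = asyncio.run(overview_sources[0].get_artist_overview("abc"))
```

Selecting by mixin type means a caller never needs to know which concrete provider sits behind a capability.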

### Provider Implementations

Each provider implements one or more mixins based on its capabilities:

#### MusicbrainzDbProvider

**Mixins**: `ArtistByIdMixin`, `AlbumByIdMixin`

**Purpose**: Authoritative source for core music metadata

**Implementation**:
- Direct asyncpg connection to MusicBrainz PostgreSQL database
- Complex SQL queries with JSON aggregation (`row_to_json`, `json_agg`)
- Read-only access to replicated database
- Custom indices on `last_updated` columns for change detection

**SQL files**:
- `lidarrmetadata/sql/artist.sql`: Artist metadata with releases
- `lidarrmetadata/sql/album.sql`: Album metadata with tracks
- `lidarrmetadata/sql/updated_artists.sql`: Change detection query
- `lidarrmetadata/sql/updated_albums.sql`: Change detection query

**Key tables accessed**:
- `artist`: Core artist data
- `release_group`: Album groupings
- `release`: Specific releases
- `medium`: Physical/digital media
- `track`: Track listings
- `recording`: Recording metadata
- `url`: External links
- `l_artist_url`: Artist-URL relationships
- `cover_art_archive.index_listing`: Cover art availability

#### SolrSearchProvider

**Mixins**: `ArtistNameSearchMixin`, `AlbumNameSearchMixin`

**Purpose**: Full-text search for artist and album discovery

**Implementation**:
- Async HTTP client to Solr REST API
- Two cores: `artist` and `release-group`
- Dismax query parser for relevance ranking
- 5-second timeout per query
- Real-time index updates via RabbitMQ + SIR (Search Index Rebuilder)

**Query structure**:
```python
{
    "query": query_string,
    "limit": 10,
    "params": {
        "defType": "dismax",
        "qf": "artist^2 sortname alias",
        "mm": "1"
    }
}
```
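The request body above could be produced by a small builder function. This is a sketch only: `build_artist_search_body` is a hypothetical helper, and the comments reflect the general meaning of the dismax parameters rather than anything specific to this project.

```python
import json


def build_artist_search_body(query_string: str, limit: int = 10) -> dict:
    """Build a Solr JSON request body with the dismax parameters shown above."""
    return {
        "query": query_string,
        "limit": limit,
        "params": {
            "defType": "dismax",
            # Search three fields, weighting matches on `artist` twice as much
            "qf": "artist^2 sortname alias",
            # Minimum-should-match: at least one query term must match
            "mm": "1",
        },
    }


body = build_artist_search_body("nirvana")
payload = json.dumps(body)  # what would be sent to the Solr core
```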

#### FanArtTvProvider

**Mixins**: `ArtistImagesMixin`, `AlbumImagesMixin`

**Purpose**: High-quality fan art and promotional images

**Implementation**:
- REST API with API key authentication
- 7-day lag for free API keys (personal keys have no lag)
- 30-day cache TTL
- Image types: `poster`, `banner`, `logo`, `fanart`, `cover`
- Fallback to Cover Art Archive if unavailable

**API endpoints**:
- `https://webservice.fanart.tv/v3/music/{mbid}`
- `https://webservice.fanart.tv/v3/music/albums/{mbid}`

#### TheAudioDbProvider

**Mixins**: `ArtistOverviewMixin`, `ArtistImagesMixin`, `AlbumImagesMixin`, `ArtistLinksMixin`

**Purpose**: Fallback provider for images and metadata

**Implementation**:
- REST API with API key "1" (public key)
- 10-second timeout
- Used as fallback when FanArt.tv or Wikipedia unavailable
- Provides artist biographies, images, and social media links

**API endpoints**:
- `https://theaudiodb.com/api/v1/json/1/artist-mb.php?i={mbid}`
- `https://theaudiodb.com/api/v1/json/1/album-mb.php?i={mbid}`

#### WikipediaProvider

**Mixins**: `ArtistOverviewMixin`

**Purpose**: Artist biographical information

**Implementation**:
- Multi-stage lookup: MusicBrainz → Wikidata → Wikipedia
- 32-language fallback chain (en, fr, de, es, it, ja, zh, ru, pt, nl, sv, fi, no, da, pl, cs, hu, ro, tr, el, he, ar, fa, hi, th, ko, vi, id, ms, tl, bn, ta)
- BeautifulSoup HTML parsing
- 2-second timeout per request
- 1 connection per host limit
- Extracts first paragraph as overview

**Lookup flow**:
```
MusicBrainz MBID → Wikidata entity → Wikipedia article → Extract summary
```
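The language fallback step can be sketched as follows, assuming the Wikidata entity's sitelinks are available as a dict keyed by site name (for example `enwiki`, `frwiki`), each with a `title`. `pick_sitelink` is a hypothetical helper, and the language chain is shortened for brevity.

```python
from typing import Optional, Sequence, Tuple

# Shortened version of the fallback chain listed above
LANGUAGE_CHAIN = ["en", "fr", "de", "es"]


def pick_sitelink(
    sitelinks: dict, languages: Sequence[str] = LANGUAGE_CHAIN
) -> Optional[Tuple[str, str]]:
    """Return (language, article title) for the first language with an article."""
    for lang in languages:
        entry = sitelinks.get(f"{lang}wiki")
        if entry and entry.get("title"):
            return lang, entry["title"]
    return None


sitelinks = {
    "dewiki": {"title": "Nirvana (US-amerikanische Band)"},
    "frwiki": {"title": "Nirvana (groupe)"},
}
choice = pick_sitelink(sitelinks)  # no English article, so French wins over German
missing = pick_sitelink({})
```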

#### SpotifyProvider

**Mixins**: `ArtistByIdMixin`, `AlbumByIdMixin`, `ArtistLinksMixin`

**Purpose**: Spotify ID mapping and cross-platform linking

**Implementation**:
- spotipy library with OAuth authentication
- Levenshtein-based name similarity matching (0.8 threshold) for name-based lookups
- Provides Spotify URIs for deep linking
- Used for chart data correlation

**Authentication flow**:
- Client credentials OAuth
- Token refresh on expiration
- Tokens cached in Redis
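
The name-matching threshold can be illustrated with a plain-Python edit distance. This is a sketch under assumptions: the source does not say how the distance is normalised, so the example uses one common choice (1 minus distance divided by the longer length), and `levenshtein`, `similarity`, and `is_match` are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalise the distance to a 0..1 score (assumed normalisation)."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    return similarity(a, b) >= threshold
```

With this normalisation, `is_match("Nirvana", "Nirvanna")` accepts a one-letter typo, while unrelated names fall well below the threshold.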

## Layer Responsibilities

### 1. Quart Application Layer (app.py)

**Responsibilities**:
- HTTP request routing
- Request parameter validation
- Response serialization
- Cache-Control header management
- Error handling and HTTP status codes

**Key routes**:
```python
from quart import Quart

app = Quart(__name__)

# Handler bodies are elided; each stub only documents the route's purpose.

@app.route('/')
async def root():
    """Health check and version info."""
    ...

@app.route('/artist/<mbid>')
async def get_artist(mbid):
    """Artist metadata endpoint."""
    ...

@app.route('/artist/<mbid>/refresh', methods=['POST'])
async def refresh_artist(mbid):
    """Cache invalidation endpoint."""
    ...

@app.route('/search/artist')
async def search_artist():
    """Artist search endpoint."""
    ...

@app.route('/chart/<name>/<type>/<selection>')
async def get_chart(name, type, selection):
    """Chart data endpoint."""
    ...

**Request flow**:
1. Parse and validate request parameters
2. Call API layer method
3. Set Cache-Control headers based on response metadata
4. Serialize response to JSON
5. Return HTTP response

### 2. API Layer (api.py)

**Responsibilities**:
- Business logic orchestration
- Provider coordination
- Data aggregation from multiple sources
- Cache management
- Response formatting

**Key methods**:
```python
async def get_artist_by_id(mbid, prim_types, sec_types, release_statuses):
    # 1. Check cache
    # 2. Query MusicBrainz DB
    # 3. Parallel fetch: images, overview, links
    # 4. Aggregate data
    # 5. Cache result
    # 6. Return formatted response
    ...

async def search_artist(query):
    # 1. Check cache
    # 2. Query Solr
    # 3. Enrich results with cached metadata
    # 4. Cache search results
    # 5. Return formatted response
    ...

**Parallel fetching pattern**:
```python
# Fetch multiple data sources concurrently
images_task = asyncio.create_task(fanart_provider.get_artist_images(mbid))
overview_task = asyncio.create_task(wikipedia_provider.get_artist_overview(mbid))
links_task = asyncio.create_task(spotify_provider.get_artist_links(mbid))

images = await images_task
overview = await overview_task
links = await links_task
```
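When tasks are awaited one by one as above, an exception from any provider propagates to the caller. One way to return partial data instead is `asyncio.gather` with `return_exceptions=True`. The sketch below uses hypothetical stub fetchers (`fetch_images`, `fetch_overview`, `fetch_links`) in place of the real providers.

```python
import asyncio


async def fetch_images(mbid: str) -> list:
    return [{"type": "poster", "url": f"https://example.invalid/{mbid}.jpg"}]


async def fetch_overview(mbid: str) -> str:
    # Simulate an unavailable overview source
    raise TimeoutError("overview source timed out")


async def fetch_links(mbid: str) -> list:
    return [{"type": "spotify", "url": "https://example.invalid/artist"}]


async def fetch_enrichment(mbid: str) -> dict:
    """Run all fetchers concurrently and substitute empty values for failures."""
    images, overview, links = await asyncio.gather(
        fetch_images(mbid),
        fetch_overview(mbid),
        fetch_links(mbid),
        return_exceptions=True,  # failures come back as exception objects
    )
    return {
        "Images": [] if isinstance(images, Exception) else images,
        "Overview": None if isinstance(overview, Exception) else overview,
        "Links": [] if isinstance(links, Exception) else links,
    }


enrichment = asyncio.run(fetch_enrichment("5b11f4ce-a62d-471e-81fc-a69a8278c7da"))
```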

### 3. Provider Layer (provider.py)

**Responsibilities**:
- Data source abstraction
- External API communication
- Error handling and retries
- Timeout management
- Data transformation to common format

**Provider composition**:
```python
class MetadataProvider(
    MusicbrainzDbProvider,
    SolrSearchProvider,
    FanArtTvProvider,
    TheAudioDbProvider,
    WikipediaProvider,
    SpotifyProvider
):
    """Composite provider with all capabilities"""
    pass
```

**Fallback chain example**:
```python
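async def get_artist_images(self, mbid):
    # HTTPError stands for whatever error type the HTTP client raises.
    try:
        # Primary: FanArt.tv
        return await self.fanart.get_artist_images(mbid)
    except (TimeoutError, HTTPError):
        pass  # fall through to the next provider

    try:
        # Fallback: TheAudioDB
        return await self.theaudiodb.get_artist_images(mbid)
    except (TimeoutError, HTTPError):
        pass  # an error raised inside an except block would skip later handlers

    # Last resort: Cover Art Archive
    return await self.cover_art_archive.get_artist_images(mbid)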

### 4. Cache Layer (cache.py)

**Responsibilities**:
- Multi-tier cache management
- Cache key generation
- TTL management
- Compression/decompression
- Cache invalidation
- Statistics tracking

**Cache tiers**:

#### Tier 1: Redis (Ephemeral)

**Configuration**:
- Namespace: `lm3.7`
- Memory limit: 512MB
- Eviction policy: LFU (Least Frequently Used)
- Default TTL: 7 days

**Use cases**:
- Hot data (frequently accessed artists/albums)
- Rate limiter state
- Sentry deduplication
- Invalidation locks

**Implementation**:
```python
import pickle
import zlib

class RedisCache:
    async def get(self, key):
        value = await self.redis.get(f"lm3.7:{key}")
        if value is not None:
            # Values are zlib-compressed pickles; only unpickle trusted data
            return pickle.loads(zlib.decompress(value))
        return None

    async def set(self, key, value, ttl=604800):  # 7 days
        compressed = zlib.compress(pickle.dumps(value))
        await self.redis.setex(f"lm3.7:{key}", ttl, compressed)
```
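The compression step can be checked in isolation. The sketch below round-trips a sample payload through the same `zlib` plus `pickle` encoding; `encode`, `decode`, and the sample artist dict are illustrative assumptions, and the achieved ratio depends entirely on the data.

```python
import pickle
import zlib


def encode(value) -> bytes:
    """Serialise with pickle, then compress with zlib."""
    return zlib.compress(pickle.dumps(value))


def decode(blob: bytes):
    """Inverse of encode(); only safe for blobs written by trusted code."""
    return pickle.loads(zlib.decompress(blob))


# Repetitive metadata, similar in shape to an artist with many albums
artist = {
    "ArtistName": "Nirvana",
    "Albums": [{"Title": f"Album {i}", "Type": "Album"} for i in range(200)],
}
raw = pickle.dumps(artist)
blob = encode(artist)
ratio = len(raw) / len(blob)
```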

#### Tier 2: PostgreSQL (Persistent)

**Schema**:
```sql
CREATE TABLE IF NOT EXISTS {cache_name} (
    key VARCHAR(255) PRIMARY KEY,
    expires TIMESTAMP,
    updated TIMESTAMP DEFAULT NOW(),
    value BYTEA  -- zlib compressed pickle
);

CREATE TRIGGER update_timestamp
    BEFORE UPDATE ON {cache_name}
    FOR EACH ROW
    EXECUTE FUNCTION update_updated_column();
```

**Auto-created tables**:
- `artist`: Artist metadata cache
- `album`: Album metadata cache
- `spotify`: Spotify lookup cache
- `fanart`: FanArt.tv image cache
- `tadb`: TheAudioDB metadata cache
- `wikipedia`: Wikipedia overview cache

**Use cases**:
- Long-term storage for all metadata
- Fallback when Redis evicts data
- Historical data for analytics
- Compressed storage (10:1 ratio typical)

**Implementation**:
```python
class PostgresCache:
    async def get(self, key):
        row = await self.conn.fetchrow(
            f"SELECT value, expires FROM {self.table} WHERE key = $1",
            key
        )
        if row and (not row['expires'] or row['expires'] > datetime.now()):
            return pickle.loads(zlib.decompress(row['value']))
        return None

    async def set(self, key, value, ttl=None):
        compressed = zlib.compress(pickle.dumps(value))
        expires = datetime.now() + timedelta(seconds=ttl) if ttl else None
        await self.conn.execute(
            f"""
            INSERT INTO {self.table} (key, value, expires)
            VALUES ($1, $2, $3)
            ON CONFLICT (key) DO UPDATE
            SET value = $2, expires = $3, updated = NOW()
            """,
            key, compressed, expires
        )
```

#### Tier 3: Cloudflare CDN (Edge)

**Configuration**:
- Cache-Control header: `s-maxage=2592000, max-age=0`
- Programmatic purge via Cloudflare API
- Batch purge: 30 URLs per request

**Use cases**:
- Global edge caching for popular artists/albums
- Reduced origin load
- Low-latency responses worldwide

**Cache invalidation**:
```python
async def invalidate_cdn_cache(urls):
    # Batch URLs into groups of 30
    for batch in chunks(urls, 30):
        await cloudflare_client.purge_cache(batch)
```
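The `chunks` helper used above is not shown in the snippet. A minimal version, assuming it simply yields consecutive batches of at most `size` items, could look like this:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunks(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


urls = [f"https://api.example.invalid/artist/{i}" for i in range(65)]
batch_sizes = [len(batch) for batch in chunks(urls, 30)]
```

Here 65 URLs produce two full batches and one partial batch, so three purge requests would be issued.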

## Data Flow Patterns

### Artist Metadata Request Flow

```
1. Client → GET /artist/{mbid}
2. app.py → Validate MBID format
3. api.py → Check Redis cache
4. [CACHE MISS] → Check PostgreSQL cache
5. [CACHE MISS] → Query MusicBrainz DB (artist.sql)
6. api.py → Parallel fetch:
   - FanArt.tv images
   - Wikipedia overview
   - Spotify links
   - TheAudioDB metadata (fallback)
7. api.py → Aggregate data into Artist object
8. api.py → Store in PostgreSQL cache
9. api.py → Store in Redis cache
10. app.py → Set Cache-Control headers
11. app.py → Return JSON response
12. Cloudflare → Cache at edge
```
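Steps 3 to 9 of this flow amount to a read-through lookup across two tiers. The sketch below uses in-memory dicts in place of Redis and PostgreSQL; `DictCache` and `get_artist` are hypothetical, and repopulating the hot tier after a PostgreSQL hit is an assumption that the flow above does not state.

```python
import asyncio


class DictCache:
    """In-memory stand-in for a cache tier."""

    def __init__(self):
        self.store = {}

    async def get(self, key):
        return self.store.get(key)

    async def set(self, key, value):
        self.store[key] = value


async def get_artist(mbid, redis_cache, postgres_cache, fetch_fresh):
    value = await redis_cache.get(mbid)
    if value is not None:
        return value, "redis"
    value = await postgres_cache.get(mbid)
    if value is not None:
        await redis_cache.set(mbid, value)  # assumed: refill the hot tier
        return value, "postgres"
    value = await fetch_fresh(mbid)
    await postgres_cache.set(mbid, value)
    await redis_cache.set(mbid, value)
    return value, "origin"


async def demo():
    redis_cache, postgres_cache = DictCache(), DictCache()

    async def fetch_fresh(mbid):
        return {"Id": mbid, "ArtistName": "Nirvana"}

    first = await get_artist("abc", redis_cache, postgres_cache, fetch_fresh)
    redis_cache.store.clear()  # simulate Redis eviction
    second = await get_artist("abc", redis_cache, postgres_cache, fetch_fresh)
    third = await get_artist("abc", redis_cache, postgres_cache, fetch_fresh)
    return first[1], second[1], third[1]


sources = asyncio.run(demo())
```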

### Search Request Flow

```
1. Client → GET /search/artist?query=nirvana
2. app.py → Validate query parameter
3. api.py → Check Redis cache (key: search:artist:nirvana)
4. [CACHE MISS] → Query Solr artist core
5. Solr → Return list of MBIDs with scores
6. api.py → For each result:
   - Check cache for artist metadata
   - [CACHE MISS] → Fetch from MusicBrainz DB
7. api.py → Aggregate search results
8. api.py → Cache search results (TTL: 1 hour)
9. app.py → Return JSON response
```

### Cache Invalidation Flow

```
1. Crawler → Detect updated artist (updated_artists.sql)
2. Crawler → POST /artist/{mbid}/refresh
3. api.py → Verify INVALIDATE_APIKEY
4. api.py → Delete from Redis cache
5. api.py → Delete from PostgreSQL cache
6. api.py → Purge Cloudflare CDN cache
7. api.py → Return 200 OK
8. Next request → Cache miss → Fresh fetch
```

## Background Crawler Architecture

The crawler runs independently of the API server and proactively warms the cache.

### Crawler Types

#### 1. Wikipedia Overview Crawler

**Purpose**: Pre-fetch artist biographies

**Implementation**:
```python
async def crawl_wikipedia_overviews():
    # Get recently updated artists
    artists = await get_updated_artists(limit=100)

    for artist in artists:
        # Check if overview already cached
        if not await cache.exists(f"wikipedia:{artist['mbid']}"):
            # Fetch and cache overview
            overview = await wikipedia_provider.get_artist_overview(artist['mbid'])
            await cache.set(f"wikipedia:{artist['mbid']}", overview, ttl=2592000)

        await asyncio.sleep(1)  # Rate limiting
```

#### 2. FanArt.tv Image Crawler

**Purpose**: Pre-fetch artist and album images

**Implementation**:
```python
async def crawl_fanart_images():
    artists = await get_updated_artists(limit=100)

    for artist in artists:
        if not await cache.exists(f"fanart:artist:{artist['mbid']}"):
            images = await fanart_provider.get_artist_images(artist['mbid'])
            await cache.set(f"fanart:artist:{artist['mbid']}", images, ttl=2592000)

        await asyncio.sleep(2)  # FanArt.tv rate limit
```

#### 3. TheAudioDB Metadata Crawler

**Purpose**: Pre-fetch fallback metadata

**Implementation**:
```python
async def crawl_theaudiodb_metadata():
    artists = await get_updated_artists(limit=100)

    for artist in artists:
        if not await cache.exists(f"tadb:{artist['mbid']}"):
            metadata = await theaudiodb_provider.get_artist_metadata(artist['mbid'])
            await cache.set(f"tadb:{artist['mbid']}", metadata, ttl=2592000)

        await asyncio.sleep(1)
```

#### 4. Artist Metadata Crawler

**Purpose**: Pre-fetch complete artist metadata

**Implementation**:
```python
async def crawl_artist_metadata():
    artists = await get_updated_artists(limit=50)

    for artist in artists:
        # Fetch complete artist metadata (triggers all providers)
        metadata = await api.get_artist_by_id(artist['mbid'])
        # Already cached by get_artist_by_id

        await asyncio.sleep(5)  # Avoid overwhelming external APIs
```

#### 5. Album Metadata Crawler

**Purpose**: Pre-fetch album metadata for recently updated albums

**Implementation**:
```python
async def crawl_album_metadata():
    albums = await get_updated_albums(limit=50)

    for album in albums:
        metadata = await api.get_album_by_id(album['mbid'])
        await asyncio.sleep(5)
```

### Crawler Scheduling

Crawlers run on configurable schedules:

```python
# crawler.py
async def run_crawlers():
    while True:
        # Run all crawlers in parallel
        await asyncio.gather(
            crawl_wikipedia_overviews(),
            crawl_fanart_images(),
            crawl_theaudiodb_metadata(),
            crawl_artist_metadata(),
            crawl_album_metadata()
        )

        # Sleep between cycles
        await asyncio.sleep(3600)  # 1 hour
```

## MusicBrainz Database Integration

### Direct Database Access

Unlike most MusicBrainz consumers, this project queries the database directly rather than using the web API.

**Advantages**:
- Complex joins and aggregations in SQL
- No rate limiting
- Sub-second response times
- JSON aggregation in database

**Disadvantages**:
- Requires full MusicBrainz database replica (~100GB+)
- Must maintain custom indices
- Schema changes require SQL updates

### SQL Query Architecture

#### Artist Query (artist.sql)

**Purpose**: Fetch complete artist metadata with releases

**Key features**:
- JSON aggregation of releases
- Filtering by release type and status
- Link extraction
- Cover art availability check

**Query structure**:
```sql
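SELECT
    a.gid AS id,
    a.name AS artist_name,
    a.sort_name,
    a.comment AS disambiguation,  -- MusicBrainz stores disambiguation in "comment"
    aty.name AS artist_type,
    -- Aggregate release groups as JSON array. Correlated subqueries keep the
    -- release and link aggregates from multiplying each other's rows.
    COALESCE((
        SELECT json_agg(
            json_build_object(
                'id', rg.gid,
                'title', rg.name,
                'type', rgt.name,
                'statuses', (
                    SELECT json_agg(DISTINCT rs.name)
                    FROM release r
                    JOIN release_status rs ON r.status = rs.id
                    WHERE r.release_group = rg.id
                ),
                -- Assumption: the date is the release group's first release date
                'date', concat_ws('-',
                    rgm.first_release_date_year,
                    rgm.first_release_date_month,
                    rgm.first_release_date_day)
            )
            ORDER BY rgm.first_release_date_year DESC NULLS LAST,
                     rgm.first_release_date_month DESC NULLS LAST,
                     rgm.first_release_date_day DESC NULLS LAST
        )
        FROM artist_credit_name acn
        JOIN release_group rg ON rg.artist_credit = acn.artist_credit
        LEFT JOIN release_group_primary_type rgt ON rg.type = rgt.id
        LEFT JOIN release_group_meta rgm ON rgm.id = rg.id
        WHERE acn.artist = a.id
    ), '[]'::json) AS releases,
    -- Aggregate links as JSON array
    COALESCE((
        SELECT json_agg(json_build_object('type', lt.name, 'url', u.url))
        FROM l_artist_url lau
        JOIN link l ON lau.link = l.id
        JOIN link_type lt ON l.link_type = lt.id
        JOIN url u ON lau.entity1 = u.id
        WHERE lau.entity0 = a.id
    ), '[]'::json) AS links
FROM artist a
LEFT JOIN artist_type aty ON a.type = aty.id
WHERE a.gid = $1;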

#### Album Query (album.sql)

**Purpose**: Fetch album metadata with tracks

**Key features**:
- Track listing aggregation
- Medium information (CD, Vinyl, Digital)
- Recording metadata
- Cover art availability

**Query structure**:
```sql
SELECT
    rg.gid AS id,
    rg.name AS title,
    a.name AS artist_name,
    -- Aggregate media as JSON array
    COALESCE(
        json_agg(
            json_build_object(
                'position', m.position,
                'format', mf.name,
                'tracks', (
                    SELECT json_agg(
                        json_build_object(
                            'position', t.position,
                            'title', t.name,
                            'duration', t.length
                        )
                        ORDER BY t.position
                    )
                    FROM track t
                    WHERE t.medium = m.id
                )
            )
            ORDER BY m.position
        ) FILTER (WHERE m.id IS NOT NULL),
        '[]'::json
    ) AS media
FROM release_group rg
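JOIN artist_credit ac ON rg.artist_credit = ac.id
-- Resolve the credit to its first-listed artist; artist_credit.id is not an artist id
JOIN artist_credit_name acn ON acn.artist_credit = ac.id AND acn.position = 0
JOIN artist a ON acn.artist = a.id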
LEFT JOIN release r ON r.release_group = rg.id
LEFT JOIN medium m ON m.release = r.id
LEFT JOIN medium_format mf ON m.format = mf.id
WHERE rg.gid = $1
GROUP BY rg.id, a.name;
```

#### Change Detection Queries

**updated_artists.sql**: Detects recently updated artists

**Change sources** (UNION of 5 queries):
1. Artists with updated metadata
2. Artists with new releases
3. Artists with updated releases
4. Artists with new links
5. Artists with updated cover art

**Query structure**:
```sql
-- Source 1: Updated artist metadata
SELECT DISTINCT a.gid, a.last_updated
FROM artist a
WHERE a.last_updated > $1

UNION

-- Source 2: New releases
SELECT DISTINCT a.gid, rg.last_updated
FROM artist a
JOIN artist_credit_name acn ON acn.artist = a.id
JOIN release_group rg ON rg.artist_credit = acn.artist_credit
WHERE rg.last_updated > $1

UNION

-- Source 3: Updated releases
SELECT DISTINCT a.gid, r.last_updated
FROM artist a
JOIN artist_credit_name acn ON acn.artist = a.id
JOIN release_group rg ON rg.artist_credit = acn.artist_credit
JOIN release r ON r.release_group = rg.id
WHERE r.last_updated > $1

UNION

-- Source 4: New links
SELECT DISTINCT a.gid, lau.last_updated
FROM artist a
JOIN l_artist_url lau ON lau.entity0 = a.id
WHERE lau.last_updated > $1

UNION

-- Source 5: Updated cover art
SELECT DISTINCT a.gid, caa.date_updated
FROM artist a
JOIN artist_credit_name acn ON acn.artist = a.id
JOIN release_group rg ON rg.artist_credit = acn.artist_credit
JOIN release r ON r.release_group = rg.id
JOIN cover_art_archive.index_listing caa ON caa.release = r.id
WHERE caa.date_updated > $1

ORDER BY last_updated DESC
LIMIT $2;
```
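Because `UNION` removes only fully identical rows, an artist touched by several change sources can appear more than once with different timestamps. A consumer may want to collapse these to one row per artist. The sketch below assumes rows arrive as `(gid, last_updated)` tuples; `latest_per_artist` is a hypothetical helper.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def latest_per_artist(
    rows: Iterable[Tuple[str, datetime]], limit: int
) -> List[Tuple[str, datetime]]:
    """Keep the newest timestamp per artist, newest artists first."""
    latest = {}
    for gid, last_updated in rows:
        if gid not in latest or last_updated > latest[gid]:
            latest[gid] = last_updated
    ordered = sorted(latest.items(), key=lambda item: item[1], reverse=True)
    return ordered[:limit]


rows = [
    ("artist-a", datetime(2024, 1, 1)),  # metadata change
    ("artist-b", datetime(2024, 1, 3)),  # new link
    ("artist-a", datetime(2024, 1, 5)),  # new release group
]
updated = latest_per_artist(rows, limit=10)
```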

**updated_albums.sql**: Similar structure for album change detection

### Custom Database Indices

To support efficient change detection queries:

```sql
-- Artist last_updated index
CREATE INDEX IF NOT EXISTS idx_artist_last_updated
ON artist (last_updated DESC);

-- Release group last_updated index
CREATE INDEX IF NOT EXISTS idx_release_group_last_updated
ON release_group (last_updated DESC);

-- Release last_updated index
CREATE INDEX IF NOT EXISTS idx_release_last_updated
ON release (last_updated DESC);

-- Cover art date_updated index
CREATE INDEX IF NOT EXISTS idx_cover_art_date_updated
ON cover_art_archive.index_listing (date_updated DESC);
```

## Configuration Architecture

### Metaclass-Based Configuration System

The project uses a metaclass-based configuration system that allows settings to be overridden through environment variables, including keys nested inside dictionary settings.

**Base configuration** (config.py):
```python
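import os

class ConfigMeta(type):
    """Metaclass that allows environment variable overrides"""

    def __getattribute__(cls, name):
        value = super().__getattribute__(name)
        # Only upper-case settings are overridable. This also keeps dunder
        # lookups such as __name__ out of the override logic.
        if not name.isupper():
            return value

        def coerce(raw, default):
            # Environment values are strings; keep integer settings as integers
            return int(raw) if isinstance(default, int) else raw

        # super() avoids re-entering this method while reading the class name
        prefix = f"{super().__getattribute__('__name__').upper()}_{name}"
        if prefix in os.environ:
            return coerce(os.environ[prefix], value)

        # Nested override (double underscore), e.g. DEFAULTCONFIG_DATABASE__HOST
        if isinstance(value, dict):
            value = dict(value)
            for key, default in value.items():
                env_key = f"{prefix}__{key.upper()}"
                if env_key in os.environ:
                    value[key] = coerce(os.environ[env_key], default)
        return value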

class DefaultConfig(metaclass=ConfigMeta):
    # Application
    APPLICATION_ROOT = '/'
    PORT = 5001

    # Database
    DATABASE = {
        'host': 'localhost',
        'port': 5432,
        'database': 'musicbrainz',
        'user': 'abc',
        'password': 'abc'
    }

    # Cache
    CACHE = {
        'redis_url': 'redis://localhost:6379/0',
        'postgres_url': 'postgresql://abc:abc@localhost/lm_cache'
    }

    # External APIs
    FANART_API_KEY = None
    THEAUDIODB_API_KEY = '1'
    SPOTIFY_CLIENT_ID = None
    SPOTIFY_CLIENT_SECRET = None
    LASTFM_API_KEY = None

    # Cloudflare
    CLOUDFLARE_ZONE_ID = None
    CLOUDFLARE_API_TOKEN = None

    # Monitoring
    SENTRY_DSN = None
    STATSD_HOST = None
    STATSD_PORT = 8125
```

**Environment variable override examples**:
```bash
# Simple override
export DEFAULTCONFIG_PORT=8080

# Nested override (double underscore)
export DEFAULTCONFIG_DATABASE__HOST=musicbrainz-db
export DEFAULTCONFIG_CACHE__REDIS_URL=redis://redis:6379/1

# Select configuration class
export LIDARR_METADATA_CONFIG=lidarrmetadata.config.ProductionConfig
```
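The override rules can also be expressed as a plain function, which is easier to test than a metaclass. This is a sketch under assumptions: `apply_env_overrides` is a hypothetical helper, nested keys are lower-cased to match the dictionary keys in the defaults, and integer defaults are coerced from their string form.

```python
import copy


def _coerce(raw: str, default):
    """Environment values are strings; keep integer settings as integers."""
    if isinstance(default, int) and not isinstance(default, bool):
        return int(raw)
    return raw


def apply_env_overrides(defaults: dict, environ: dict, prefix: str) -> dict:
    """Return a copy of `defaults` with PREFIX_KEY and PREFIX_KEY__SUBKEY applied."""
    settings = copy.deepcopy(defaults)
    for env_key, raw in environ.items():
        if not env_key.startswith(prefix + "_"):
            continue
        path = env_key[len(prefix) + 1:].split("__")
        top = path[0]
        if top not in settings:
            continue
        if len(path) == 1:
            settings[top] = _coerce(raw, settings[top])
        elif len(path) == 2 and isinstance(settings[top], dict):
            sub = path[1].lower()
            settings[top][sub] = _coerce(raw, settings[top].get(sub))
    return settings


defaults = {"PORT": 5001, "DATABASE": {"host": "localhost", "port": 5432}}
environ = {
    "DEFAULTCONFIG_PORT": "8080",
    "DEFAULTCONFIG_DATABASE__HOST": "musicbrainz-db",
    "HOME": "/root",  # unrelated variables are ignored
}
settings = apply_env_overrides(defaults, environ, "DEFAULTCONFIG")
```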

### Configuration Classes

**DevelopmentConfig**:
- Debug logging enabled
- Local database connections
- No Sentry
- No rate limiting

**TestConfig**:
- In-memory SQLite for cache
- Mock external APIs
- Synchronous execution for deterministic tests

**ProductionConfig**:
- Container-based service discovery
- Sentry enabled
- Redis rate limiting
- Cloudflare CDN integration

## Error Handling and Resilience

### Provider Timeout Handling

Each provider has configurable timeouts:

```python
class ProviderConfig:
    MUSICBRAINZ_TIMEOUT = 30  # Complex SQL queries
    SOLR_TIMEOUT = 5          # Search queries
    FANART_TIMEOUT = 10       # Image API
    THEAUDIODB_TIMEOUT = 10   # Metadata API
    WIKIPEDIA_TIMEOUT = 2     # Per-request
    SPOTIFY_TIMEOUT = 5       # OAuth + API
```

### Fallback Chain Implementation

```python
async def get_artist_images(self, mbid):
    providers = [
        (self.fanart, "FanArt.tv"),
        (self.theaudiodb, "TheAudioDB"),
        (self.cover_art_archive, "Cover Art Archive")
    ]

    for provider, name in providers:
        try:
            images = await asyncio.wait_for(
                provider.get_artist_images(mbid),
                timeout=provider.timeout
            )
            if images:
                logger.info(f"Got images from {name}")
                return images
        except asyncio.TimeoutError:
            logger.warning(f"{name} timeout, trying next provider")
        except Exception as e:
            logger.error(f"{name} error: {e}, trying next provider")

    return []  # No images available
```

### Graceful Degradation

When external services fail, the API returns partial data:

```python
{
    "Id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
    "ArtistName": "Nirvana",
    "Overview": null,  # Wikipedia unavailable
    "Images": [],      # FanArt.tv and TheAudioDB unavailable
    "Links": [...],    # MusicBrainz links still available
    "Albums": [...]    # Core data from MusicBrainz DB
}
```

## Scalability Considerations

### Horizontal Scaling

The API is stateless and can be horizontally scaled:

```yaml
# docker-compose.prod.yml
services:
  api-v0.3:
    image: ghcr.io/lidarr/lidarrapi.metadata:v0.3
    deploy:
      replicas: 4
    environment:
      - CACHE__REDIS_URL=redis://redis:6379/0
```

### Database Connection Pooling

asyncpg connection pool configuration:

```python
pool = await asyncpg.create_pool(
    host=config.DATABASE['host'],
    port=config.DATABASE['port'],
    database=config.DATABASE['database'],
    user=config.DATABASE['user'],
    password=config.DATABASE['password'],
    min_size=10,
    max_size=50,
    command_timeout=30
)
```

### Redis Connection Pooling

aioredis connection pool:

```python
redis = await aioredis.create_redis_pool(
    config.CACHE['redis_url'],
    minsize=5,
    maxsize=20
    # No `encoding`: cached values are zlib-compressed bytes, not UTF-8 text
)
```

### Rate Limiting Architecture

Three rate limiter implementations:

#### NullRateLimiter (Default)

No rate limiting, maximum throughput.

#### SimpleRateLimiter

In-memory queue-based rate limiting:

```python
import asyncio
import time

class SimpleRateLimiter:
    def __init__(self, max_requests, time_window):
        self.max_requests = max_requests
        self.time_window = time_window
        self.requests = []

    async def acquire(self):
        while True:
            now = time.monotonic()
            # Remove requests that have left the window
            self.requests = [r for r in self.requests if r > now - self.time_window]
            if len(self.requests) < self.max_requests:
                break
            # Wait until the oldest request expires, then re-check
            await asyncio.sleep(self.requests[0] + self.time_window - now)

        # Record the time the slot was actually granted, not the time of the call
        self.requests.append(now)

#### RedisRateLimiter

Distributed rate limiting across multiple API instances:

```python
class RedisRateLimiter:
    async def acquire(self, key):
        now = time.time()
        window_key = f"ratelimit:{key}:{int(now / self.time_window)}"

        count = await self.redis.incr(window_key)
        if count == 1:
            await self.redis.expire(window_key, self.time_window)

        if count > self.max_requests:
            raise RateLimitExceeded(f"Rate limit exceeded for {key}")
```
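The window arithmetic behind this limiter can be tried without Redis. The sketch below replaces `INCR` with a dict counter and takes the clock as a parameter so the behaviour is deterministic; `FixedWindowCounter` is a hypothetical stand-in, not the project's class.

```python
class FixedWindowCounter:
    """In-memory stand-in for the Redis INCR/EXPIRE pattern."""

    def __init__(self, max_requests: int, time_window: int):
        self.max_requests = max_requests
        self.time_window = time_window
        self.counts = {}

    def window_key(self, key: str, now: float) -> str:
        # All timestamps in the same window share one counter key
        return f"ratelimit:{key}:{int(now / self.time_window)}"

    def allow(self, key: str, now: float) -> bool:
        window_key = self.window_key(key, now)
        self.counts[window_key] = self.counts.get(window_key, 0) + 1
        return self.counts[window_key] <= self.max_requests


limiter = FixedWindowCounter(max_requests=2, time_window=10)
decisions = [limiter.allow("fanart", now) for now in (100.0, 101.0, 109.9, 110.0)]
```

The third request is rejected because it falls in the same 10-second window as the first two, while the fourth opens a new window. A fixed window therefore permits short bursts of up to twice the limit around a window boundary.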

## Conclusion

The architecture demonstrates several advanced patterns:

1. **Mixin-based provider composition**: Flexible, testable, extensible
2. **Multi-tier caching**: Redis (hot) + PostgreSQL (persistent) + CDN (edge)
3. **Direct database access**: Complex SQL aggregations for performance
4. **Async-first design**: Quart + asyncpg + aioredis for high concurrency
5. **Fallback chains**: Graceful degradation when external services fail
6. **Background crawling**: Proactive cache warming for better UX
7. **Change detection**: Efficient invalidation based on upstream updates

The mixin architecture lets each provider declare only the capabilities it actually supports, so a caller can depend on a capability such as `ArtistImagesMixin` rather than on a concrete provider class. This keeps providers easy to mock in tests.

The three-tier caching strategy combines compression with long TTLs and is designed to serve most requests from cache while keeping storage costs reasonable. The crawler warms the cache for recently updated artists and albums.

Direct MusicBrainz database access with JSON aggregation in SQL is a key performance optimization that would be difficult to replicate with the rate-limited web API.