Files
metadata-agregator/docs/research/lidarr-metadata-api/analysis/CODEBASE.md
T
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

28 KiB

Lidarr Metadata API - Codebase Analysis

Project Structure

LidarrAPI.Metadata/
├── lidarrmetadata/              # Main package
│   ├── __init__.py              # Version and package metadata
│   ├── server.py                # API server entry point
│   ├── crawler.py               # Background crawler entry point
│   ├── app.py                   # Quart application factory + routes
│   ├── api.py                   # Business logic layer
│   ├── provider.py              # Provider mixin architecture
│   ├── cache.py                 # Multi-tier cache implementation
│   ├── config.py                # Configuration metaclass system
│   ├── util.py                  # Utility functions
│   ├── sql/                     # MusicBrainz SQL queries
│   │   ├── artist.sql
│   │   ├── album.sql
│   │   ├── updated_artists.sql
│   │   └── updated_albums.sql
│   └── providers/               # Individual provider implementations
│       ├── __init__.py
│       ├── musicbrainz_db.py
│       ├── solr_search.py
│       ├── fanart.py
│       ├── theaudiodb.py
│       ├── wikipedia.py
│       └── spotify.py
├── tests/                       # Test suite
│   ├── __init__.py
│   ├── test_config.py           # Configuration tests (152 lines)
│   ├── test_provider.py         # Provider tests
│   ├── test_cache.py            # Cache tests
│   ├── test_api.py              # API endpoint tests
│   ├── test_util.py             # Utility function tests
│   └── test_app.py              # Application tests
├── docker/                      # Docker configuration
│   ├── Dockerfile
│   ├── docker-compose.yml
│   ├── docker-compose.dev.yml
│   ├── docker-compose.prod.yml
│   └── docker-compose.crawler.yml
├── scripts/                     # Deployment scripts
│   ├── init-db.sh
│   ├── setup-amqp.sh
│   ├── create-indices.sql
│   └── backup.sh
├── pyproject.toml               # Poetry dependencies
├── poetry.lock                  # Locked dependencies
├── azure-pipelines.yml          # CI/CD configuration
├── sonar-project.properties     # SonarCloud configuration
├── README.md                    # Project documentation
└── LICENSE                      # GPL-3.0 license

Configuration System

Metaclass-Based Configuration

File: lidarrmetadata/config.py

Design pattern: Metaclass with environment variable override support

Implementation:

import os
from typing import Any

class ConfigMeta(type):
    """Metaclass that allows environment variable overrides"""
    
    def __getattribute__(cls, name: str) -> Any:
        # Skip special attributes
        if name.startswith('_'):
            return super().__getattribute__(name)
        
        # Check for environment variable override
        env_key = f"{cls.__name__.upper()}_{name.upper()}"
        if env_key in os.environ:
            value = os.environ[env_key]
            
            # Type conversion
            original = super().__getattribute__(name)
            if isinstance(original, bool):
                return value.lower() in ('true', '1', 'yes')
            elif isinstance(original, int):
                return int(value)
            elif isinstance(original, float):
                return float(value)
            else:
                return value
        
        # Check for nested override (double underscore)
        if '__' in name:
            parts = name.split('__')
            value = super().__getattribute__(parts[0])
            for part in parts[1:]:
                if isinstance(value, dict):
                    value = value[part]
                else:
                    value = getattr(value, part)
            return value
        
        return super().__getattribute__(name)

Configuration Classes

DefaultConfig

Base configuration with sensible defaults:

class DefaultConfig(metaclass=ConfigMeta):
    """Default configuration"""
    
    # Application
    APPLICATION_ROOT = '/'
    PORT = 5001
    DEBUG = False
    TESTING = False
    
    # Database
    DATABASE = {
        'host': 'localhost',
        'port': 5432,
        'database': 'musicbrainz_db',
        'user': 'abc',
        'password': 'abc',
        'min_pool_size': 10,
        'max_pool_size': 50,
        'command_timeout': 30
    }
    
    # Cache
    CACHE = {
        'redis_url': 'redis://localhost:6379/0',
        'postgres_url': 'postgresql://abc:abc@localhost/lm_cache_db',
        'namespace': 'lm3.7',
        'default_ttl': 604800,  # 7 days
        'max_memory': '512mb'
    }
    
    # Solr
    SOLR = {
        'url': 'http://localhost:8983/solr',
        'artist_core': 'artist',
        'album_core': 'release-group',
        'timeout': 5,
        'rows': 10
    }
    
    # RabbitMQ
    RABBITMQ = {
        'host': 'localhost',
        'port': 5672,
        'user': 'abc',
        'password': 'abc',
        'exchange': 'search.index',
        'artist_queue': 'search.index.artist',
        'album_queue': 'search.index.album'
    }
    
    # External APIs
    FANART_API_KEY = None
    THEAUDIODB_API_KEY = '1'
    SPOTIFY_CLIENT_ID = None
    SPOTIFY_CLIENT_SECRET = None
    LASTFM_API_KEY = None
    LASTFM_API_SECRET = None
    
    # Wikipedia
    WIKIPEDIA = {
        'timeout': 2,
        'max_connections_per_host': 1,
        'user_agent': 'LidarrMetadataAPI/10.0.0 (https://github.com/Lidarr/LidarrAPI.Metadata)',
        'languages': ['en', 'fr', 'de', 'es', 'it', 'ja', 'zh', 'ru', 'pt', 'nl', 'sv', 'fi', 'no', 'da', 'pl', 'cs', 'hu', 'ro', 'tr', 'el', 'he', 'ar', 'fa', 'hi', 'th', 'ko', 'vi', 'id', 'ms', 'tl', 'bn', 'ta']
    }
    
    # Cloudflare
    CLOUDFLARE_ZONE_ID = None
    CLOUDFLARE_API_TOKEN = None
    
    # Monitoring
    SENTRY_DSN = None
    SENTRY_ENVIRONMENT = 'development'
    STATSD_HOST = None
    STATSD_PORT = 8125
    
    # Rate Limiting
    RATE_LIMITER = 'null'  # 'null', 'simple', 'redis'
    RATE_LIMIT_REQUESTS = 100
    RATE_LIMIT_WINDOW = 60
    
    # Crawler
    CRAWLER_INTERVAL = 3600  # 1 hour
    CRAWLER_BATCH_SIZE = 100
    CRAWLER_INVALIDATE_ONLY = False
    
    # Security
    INVALIDATE_APIKEY = 'replaceme'
    CORS_ORIGINS = ['*']

DevelopmentConfig

Development-specific overrides:

class DevelopmentConfig(DefaultConfig):
    """Development configuration"""
    
    DEBUG = True
    TESTING = False
    
    # Disable Sentry in development
    SENTRY_DSN = None
    
    # Use null rate limiter
    RATE_LIMITER = 'null'
    
    # Shorter cache TTL for testing
    CACHE = {
        **DefaultConfig.CACHE,
        'default_ttl': 300  # 5 minutes
    }

TestConfig

Test-specific configuration:

class TestConfig(DefaultConfig):
    """Test configuration"""
    
    DEBUG = False
    TESTING = True
    
    # In-memory SQLite for cache
    CACHE = {
        'redis_url': 'redis://localhost:6379/15',  # Separate DB
        'postgres_url': 'sqlite:///:memory:',
        'namespace': 'test',
        'default_ttl': 60
    }
    
    # Mock external APIs
    FANART_API_KEY = 'test-key'
    SPOTIFY_CLIENT_ID = 'test-client-id'
    SPOTIFY_CLIENT_SECRET = 'test-client-secret'
    
    # Disable Sentry
    SENTRY_DSN = None

ProductionConfig

Production-specific configuration:

class ProductionConfig(DefaultConfig):
    """Production configuration"""
    
    DEBUG = False
    TESTING = False
    
    # Use Redis rate limiter
    RATE_LIMITER = 'redis'
    
    # Longer cache TTL
    CACHE = {
        **DefaultConfig.CACHE,
        'default_ttl': 2592000  # 30 days
    }
    
    # Enable Sentry
    SENTRY_ENVIRONMENT = 'production'

Configuration Selection

Environment variable:

export LIDARR_METADATA_CONFIG=lidarrmetadata.config.ProductionConfig

Loading configuration:

import os
import importlib

def load_config():
    """Load configuration from environment variable"""
    config_path = os.environ.get(
        'LIDARR_METADATA_CONFIG',
        'lidarrmetadata.config.DefaultConfig'
    )
    
    module_path, class_name = config_path.rsplit('.', 1)
    module = importlib.import_module(module_path)
    config_class = getattr(module, class_name)
    
    return config_class

Environment Variable Override Examples

Simple override:

export DEFAULTCONFIG_PORT=8080
export DEFAULTCONFIG_DEBUG=true

Nested override (double underscore):

export DEFAULTCONFIG_DATABASE__HOST=musicbrainz-db
export DEFAULTCONFIG_DATABASE__PORT=5433
export DEFAULTCONFIG_CACHE__REDIS_URL=redis://redis:6379/1

Type conversion:

  • Booleans: true, 1, yesTrue
  • Integers: "5001"5001
  • Floats: "3.14"3.14
  • Strings: No conversion

Logging System

Logger Configuration

File: lidarrmetadata/util.py

Implementation:

import logging
import sys

def setup_logging(config):
    """Configure logging for application"""
    
    # Root logger
    root_logger = logging.getLogger()
    root_logger.setLevel(logging.DEBUG if config.DEBUG else logging.INFO)
    
    # Console handler
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(logging.DEBUG if config.DEBUG else logging.INFO)
    
    # Formatter
    formatter = logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    console_handler.setFormatter(formatter)
    
    root_logger.addHandler(console_handler)
    
    # Suppress noisy loggers
    logging.getLogger('asyncio').setLevel(logging.WARNING)
    logging.getLogger('aiohttp').setLevel(logging.WARNING)
    logging.getLogger('urllib3').setLevel(logging.WARNING)

Per-Module Loggers

Usage pattern:

import logging

logger = logging.getLogger(__name__)

# Debug logging
logger.debug(f"Fetching artist {mbid} from cache")

# Info logging
logger.info(f"Artist {mbid} cache miss, querying database")

# Warning logging
logger.warning(f"FanArt.tv timeout for artist {mbid}, using fallback")

# Error logging
logger.error(f"Failed to fetch artist {mbid}: {error}", exc_info=True)

Log Levels

Level Usage Example
DEBUG Detailed diagnostic info Cache key generation, SQL queries
INFO General informational messages Request summaries, cache operations
WARNING Warning messages Provider timeouts, fallback usage
ERROR Error messages Unhandled exceptions, data inconsistencies
CRITICAL Critical errors Database connection failures

Sentry Integration

Initialization

File: lidarrmetadata/server.py

Implementation:

import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

def init_sentry(config):
    """Initialize Sentry error tracking"""
    
    if not config.SENTRY_DSN:
        return
    
    sentry_sdk.init(
        dsn=config.SENTRY_DSN,
        integrations=[FlaskIntegration()],
        release=f"lidarr-metadata@{__version__}",
        environment=config.SENTRY_ENVIRONMENT,
        traces_sample_rate=0.1,  # 10% transaction sampling
        before_send=sentry_rate_limiter
    )

Redis-Based Rate Limiting

Purpose: Prevent duplicate error reports and alert fatigue

Implementation:

import hashlib
import aioredis

redis = None  # Global Redis connection

async def sentry_rate_limiter(event, hint):
    """Rate limit Sentry events using Redis"""
    
    if not redis:
        return event
    
    # Generate error hash
    error_type = event.get('exception', {}).get('type', 'unknown')
    error_value = event.get('exception', {}).get('value', 'unknown')
    error_hash = hashlib.md5(f"{error_type}:{error_value}".encode()).hexdigest()
    
    key = f"lm3.7:sentry:{error_hash}"
    
    # Check if error seen recently (1 hour window)
    if await redis.exists(key):
        logger.debug(f"Sentry event {error_hash} rate limited")
        return None  # Drop event
    
    # Mark error as seen
    await redis.setex(key, 3600, "1")
    
    return event

Error Context

Adding context to Sentry events:

from sentry_sdk import configure_scope

async def get_artist(mbid):
    """Get artist metadata with Sentry context"""
    
    with configure_scope() as scope:
        scope.set_tag('entity_type', 'artist')
        scope.set_tag('mbid', mbid)
        scope.set_context('request', {
            'mbid': mbid,
            'endpoint': '/artist/{mbid}'
        })
        
        try:
            artist = await fetch_artist(mbid)
            return artist
        except Exception as e:
            scope.set_extra('error_details', str(e))
            raise

Release Tracking

CI/CD integration:

# Create Sentry release
sentry-cli releases new "lidarr-metadata@${GIT_SHA}"

# Associate commits
sentry-cli releases set-commits "lidarr-metadata@${GIT_SHA}" --auto

# Finalize release
sentry-cli releases finalize "lidarr-metadata@${GIT_SHA}"

Telegraf Integration

StatsD Client

File: lidarrmetadata/util.py

Implementation:

import statsd

stats_client = None

def init_statsd(config):
    """Initialize StatsD client"""
    
    global stats_client
    
    if not config.STATSD_HOST:
        stats_client = statsd.StatsClient(prefix='lidarr.metadata')  # No-op client
        return
    
    stats_client = statsd.StatsClient(
        host=config.STATSD_HOST,
        port=config.STATSD_PORT,
        prefix='lidarr.metadata'
    )

Metrics Collection

Request counters:

# Increment request counter
stats_client.incr('requests.artist')
stats_client.incr('requests.album')
stats_client.incr('requests.search')

Response time tracking:

# Timer context manager
with stats_client.timer('response_time.artist'):
    artist = await get_artist(mbid)

# Manual timing
start = time.time()
artist = await get_artist(mbid)
duration = time.time() - start
stats_client.timing('response_time.artist', duration * 1000)  # milliseconds

Cache metrics:

# Cache hit
stats_client.incr('cache.hit')
stats_client.incr('cache.hit.redis')

# Cache miss
stats_client.incr('cache.miss')
stats_client.incr('cache.miss.redis')

# Cache size
stats_client.gauge('cache.size.redis', redis_key_count)

Provider metrics:

# Provider request
stats_client.incr('provider.fanart.request')
stats_client.incr('provider.wikipedia.request')

# Provider error
stats_client.incr('provider.fanart.error')

# Provider timeout
stats_client.incr('provider.fanart.timeout')

Health Checks

Health Endpoint

No dedicated health endpoint: The root endpoint (/) serves as health check

Implementation:

@app.route('/')
async def root():
    """Health check and version info"""
    return {
        'version': __version__,
        'status': 'healthy',
        'timestamp': datetime.utcnow().isoformat()
    }

Component Health Checks

Database health:

async def check_database_health():
    """Check MusicBrainz database connectivity"""
    try:
        async with db_pool.acquire() as conn:
            await conn.fetchval('SELECT 1')
        return True
    except Exception as e:
        logger.error(f"Database health check failed: {e}")
        return False

Redis health:

async def check_redis_health():
    """Check Redis connectivity"""
    try:
        await redis.ping()
        return True
    except Exception as e:
        logger.error(f"Redis health check failed: {e}")
        return False

Solr health:

async def check_solr_health():
    """Check Solr connectivity"""
    try:
        async with aiohttp.ClientSession() as session:
            url = f"{config.SOLR['url']}/admin/ping"
            async with session.get(url, timeout=5) as response:
                return response.status == 200
    except Exception as e:
        logger.error(f"Solr health check failed: {e}")
        return False

Authentication and Authorization

API Key Authentication

Implementation: Single API key for invalidation endpoint only

Configuration:

INVALIDATE_APIKEY = 'replaceme'  # MUST change in production

Middleware:

from quart import request, abort

async def require_api_key():
    """Require API key for protected endpoints"""
    
    api_key = request.headers.get('X-Api-Key')
    
    if not api_key:
        abort(401, 'Missing API key')
    
    if api_key != config.INVALIDATE_APIKEY:
        abort(401, 'Invalid API key')

Usage:

@app.route('/invalidate', methods=['POST'])
async def invalidate():
    """Invalidate cache (requires API key)"""
    
    await require_api_key()
    
    # Invalidation logic
    ...

CORS Configuration

Implementation: Permissive CORS by default

Configuration:

CORS_ORIGINS = ['*']  # Allow all origins

Middleware:

from quart_cors import cors

app = cors(app, allow_origin=config.CORS_ORIGINS)

Production recommendation: Restrict to specific origins

CORS_ORIGINS = ['https://lidarr.audio', 'https://app.lidarr.audio']

Rate Limiting

Rate Limiter Implementations

1. NullRateLimiter (Default)

Purpose: No rate limiting, maximum throughput

Implementation:

class NullRateLimiter:
    """No-op rate limiter"""
    
    async def acquire(self, key: str):
        """Always allow request"""
        pass
    
    async def release(self, key: str):
        """No-op"""
        pass

2. SimpleRateLimiter

Purpose: In-memory rate limiting (single instance)

Implementation:

import time
from collections import defaultdict

class SimpleRateLimiter:
    """In-memory rate limiter"""
    
    def __init__(self, max_requests: int, window: int):
        self.max_requests = max_requests
        self.window = window
        self.requests = defaultdict(list)
    
    async def acquire(self, key: str):
        """Check and update rate limit"""
        now = time.time()
        
        # Remove old requests
        self.requests[key] = [
            req_time for req_time in self.requests[key]
            if req_time > now - self.window
        ]
        
        # Check limit
        if len(self.requests[key]) >= self.max_requests:
            raise RateLimitExceeded(
                f"Rate limit exceeded: {len(self.requests[key])}/{self.max_requests}"
            )
        
        # Add current request
        self.requests[key].append(now)

3. RedisRateLimiter

Purpose: Distributed rate limiting (multiple instances)

Implementation:

class RedisRateLimiter:
    """Redis-based distributed rate limiter"""
    
    def __init__(self, redis, max_requests: int, window: int):
        self.redis = redis
        self.max_requests = max_requests
        self.window = window
    
    async def acquire(self, key: str):
        """Check and update rate limit using Redis"""
        now = time.time()
        window_key = f"lm3.7:ratelimit:{key}:{int(now / self.window)}"
        
        # Increment counter
        count = await self.redis.incr(window_key)
        
        # Set expiration on first request
        if count == 1:
            await self.redis.expire(window_key, self.window)
        
        # Check limit
        if count > self.max_requests:
            raise RateLimitExceeded(
                f"Rate limit exceeded: {count}/{self.max_requests}"
            )

Rate Limiter Selection

Configuration:

RATE_LIMITER = 'redis'  # 'null', 'simple', 'redis'
RATE_LIMIT_REQUESTS = 100
RATE_LIMIT_WINDOW = 60

Factory function:

def create_rate_limiter(config):
    """Create rate limiter based on configuration"""
    
    if config.RATE_LIMITER == 'null':
        return NullRateLimiter()
    elif config.RATE_LIMITER == 'simple':
        return SimpleRateLimiter(
            max_requests=config.RATE_LIMIT_REQUESTS,
            window=config.RATE_LIMIT_WINDOW
        )
    elif config.RATE_LIMITER == 'redis':
        return RedisRateLimiter(
            redis=redis,
            max_requests=config.RATE_LIMIT_REQUESTS,
            window=config.RATE_LIMIT_WINDOW
        )
    else:
        raise ValueError(f"Unknown rate limiter: {config.RATE_LIMITER}")

Testing

Test Suite Structure

Framework: pytest with pytest-asyncio

Test files:

File Lines Coverage Description
test_config.py 152 High Configuration system tests
test_provider.py 98 Medium Provider mixin tests
test_cache.py 87 Medium Cache layer tests
test_api.py 76 Low API endpoint tests
test_util.py 45 High Utility function tests
test_app.py 34 Low Application initialization tests

Configuration Tests (Most Comprehensive)

File: tests/test_config.py (152 lines)

Test cases:

import pytest
from lidarrmetadata.config import DefaultConfig, DevelopmentConfig, ProductionConfig

def test_default_config():
    """Test default configuration values"""
    assert DefaultConfig.PORT == 5001
    assert DefaultConfig.DEBUG is False
    assert DefaultConfig.DATABASE['host'] == 'localhost'

def test_environment_override(monkeypatch):
    """Test environment variable override"""
    monkeypatch.setenv('DEFAULTCONFIG_PORT', '8080')
    assert DefaultConfig.PORT == 8080

def test_nested_override(monkeypatch):
    """Test nested environment variable override"""
    monkeypatch.setenv('DEFAULTCONFIG_DATABASE__HOST', 'musicbrainz-db')
    assert DefaultConfig.DATABASE__HOST == 'musicbrainz-db'

def test_boolean_conversion(monkeypatch):
    """Test boolean type conversion"""
    monkeypatch.setenv('DEFAULTCONFIG_DEBUG', 'true')
    assert DefaultConfig.DEBUG is True
    
    monkeypatch.setenv('DEFAULTCONFIG_DEBUG', '1')
    assert DefaultConfig.DEBUG is True
    
    monkeypatch.setenv('DEFAULTCONFIG_DEBUG', 'false')
    assert DefaultConfig.DEBUG is False

def test_integer_conversion(monkeypatch):
    """Test integer type conversion"""
    monkeypatch.setenv('DEFAULTCONFIG_PORT', '9000')
    assert DefaultConfig.PORT == 9000
    assert isinstance(DefaultConfig.PORT, int)

def test_development_config():
    """Test development configuration"""
    assert DevelopmentConfig.DEBUG is True
    assert DevelopmentConfig.SENTRY_DSN is None

def test_production_config():
    """Test production configuration"""
    assert ProductionConfig.DEBUG is False
    assert ProductionConfig.RATE_LIMITER == 'redis'

Provider Tests

File: tests/test_provider.py

Test cases:

import pytest
from lidarrmetadata.provider import MusicbrainzDbProvider

@pytest.mark.asyncio
async def test_get_artist_by_id():
    """Test artist lookup by MBID"""
    provider = MusicbrainzDbProvider(config)
    artist = await provider.get_artist_by_id('5b11f4ce-a62d-471e-81fc-a69a8278c7da')
    
    assert artist is not None
    assert artist['ArtistName'] == 'Nirvana'

@pytest.mark.asyncio
async def test_get_artist_not_found():
    """Test artist lookup with invalid MBID"""
    provider = MusicbrainzDbProvider(config)
    artist = await provider.get_artist_by_id('00000000-0000-0000-0000-000000000000')
    
    assert artist is None

Cache Tests

File: tests/test_cache.py

Test cases:

import pytest
from lidarrmetadata.cache import RedisCache, PostgresCache

@pytest.mark.asyncio
async def test_redis_cache_set_get():
    """Test Redis cache set and get"""
    cache = RedisCache(redis, namespace='test')
    
    await cache.set('key1', {'value': 'test'}, ttl=60)
    value = await cache.get('key1')
    
    assert value == {'value': 'test'}

@pytest.mark.asyncio
async def test_redis_cache_expiration():
    """Test Redis cache expiration"""
    cache = RedisCache(redis, namespace='test')
    
    await cache.set('key2', {'value': 'test'}, ttl=1)
    await asyncio.sleep(2)
    value = await cache.get('key2')
    
    assert value is None

Test Configuration

pytest.ini:

[pytest]
asyncio_mode = auto
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts = -v --tb=short

Test dependencies (pyproject.toml):

[tool.poetry.dev-dependencies]
pytest = "^7.0"
pytest-asyncio = "^0.18"
pytest-cov = "^3.0"
pytest-mock = "^3.6"

CI/CD Test Execution

Azure Pipelines (commented out):

# - script: |
#     poetry run pytest tests/ --cov=lidarrmetadata --cov-report=xml
#   displayName: 'Run tests'

Reason for disabling: Tests require full infrastructure (MusicBrainz DB, Solr, Redis)

Code Quality

SonarCloud Integration

Configuration: sonar-project.properties

sonar.projectKey=Lidarr_LidarrAPI.Metadata
sonar.organization=lidarr
sonar.projectName=LidarrAPI.Metadata
sonar.projectVersion=10.0.0.0

sonar.sources=lidarrmetadata
sonar.tests=tests
sonar.python.version=3.9

sonar.coverage.exclusions=tests/**,lidarrmetadata/server.py,lidarrmetadata/crawler.py

Code Formatting

Tool: Black (Python code formatter)

Configuration: pyproject.toml

[tool.black]
line-length = 100
target-version = ['py39']
include = '\.pyi?$'
exclude = '''
/(
    \.git
  | \.venv
  | build
  | dist
)/
'''

Linting

Tool: Flake8

Configuration: .flake8

[flake8]
max-line-length = 100
exclude = .git,__pycache__,build,dist,.venv
ignore = E203,W503

Security Considerations

Hardcoded Credentials

Problem: Default credentials hardcoded throughout codebase

Instances:

  • Database: abc/abc
  • RabbitMQ: abc/abc
  • Redis: No password
  • API key: replaceme

Recommendation: Use environment variables or secrets management

No Authentication on Read Endpoints

Problem: All read endpoints are publicly accessible

Impact: Anyone can query the API without authentication

Recommendation: Implement API key authentication or OAuth

SQL Injection Protection

Protection: asyncpg parameterized queries

Example:

# Safe (parameterized)
await conn.fetchrow("SELECT * FROM artist WHERE gid = $1", mbid)

# Unsafe (string interpolation) - NOT USED
# await conn.fetchrow(f"SELECT * FROM artist WHERE gid = '{mbid}'")

Dependency Vulnerabilities

Outdated dependencies:

  • Python 3.9 (EOL October 2025)
  • aioredis 1.3.1 (deprecated)
  • Quart 0.14.1 (5 years old)
  • sentry-sdk 0.19.5 (major version behind)

Recommendation: Upgrade to latest versions

Performance Optimizations

Async/Await Throughout

Pattern: All I/O operations use async/await

Benefits:

  • High concurrency
  • Efficient resource usage
  • Non-blocking operations

Connection Pooling

Database:

pool = await asyncpg.create_pool(
    min_size=10,
    max_size=50
)

Redis:

redis = await aioredis.create_redis_pool(
    minsize=5,
    maxsize=20
)

Parallel Fetching

Pattern: Fetch multiple data sources concurrently

Example:

images_task = asyncio.create_task(get_images(mbid))
overview_task = asyncio.create_task(get_overview(mbid))
links_task = asyncio.create_task(get_links(mbid))

images = await images_task
overview = await overview_task
links = await links_task

Compression

Cache compression: zlib compression of pickled objects

Ratio: Typically 10:1 for JSON metadata

Trade-off: CPU time for storage savings

Conclusion

The codebase demonstrates several advanced patterns:

  1. Metaclass-based configuration: Elegant environment variable override system
  2. Comprehensive logging: Per-module loggers with appropriate levels
  3. Sentry integration: Redis-based rate limiting prevents alert fatigue
  4. Telegraf metrics: StatsD integration for operational visibility
  5. Multiple rate limiters: Pluggable rate limiting strategies
  6. Async-first design: All I/O operations use async/await
  7. Connection pooling: Efficient resource management

Key weaknesses:

  1. Tests disabled in CI: Infrastructure dependencies prevent automated testing
  2. Hardcoded credentials: Insecure defaults throughout
  3. No read authentication: Public API access
  4. Outdated dependencies: Python 3.9, deprecated libraries
  5. Limited test coverage: Only 6 test files, most are minimal

The configuration system is particularly well-designed, allowing flexible deployment across environments while maintaining type safety and validation.