Lidarr Metadata API - Overview
Project Identity
| Property | Value |
|---|---|
| Name | LidarrAPI.Metadata |
| Repository | https://github.com/Lidarr/LidarrAPI.Metadata |
| Version | 10.0.0.0 |
| License | GPL-3.0 |
| Primary Language | Python 3.9 |
| Purpose | Enriched metadata aggregation API for Lidarr music manager |
Core Purpose
LidarrAPI.Metadata serves as a metadata enrichment layer for the Lidarr music management application. It aggregates data from multiple authoritative sources (MusicBrainz, FanArt.tv, TheAudioDB, Wikipedia, Spotify, Last.fm, Billboard, Apple Music) to provide comprehensive artist and album metadata including:
- Artist biographical information
- Album release details
- High-quality cover art and artist images
- Genre classifications
- Music charts and trending data
- Cross-platform ID mappings (MusicBrainz, Spotify, TheAudioDB)
The API acts as an intelligent caching proxy that transforms raw MusicBrainz database records into enriched JSON responses suitable for consumption by Lidarr clients.
Technology Stack
Core Framework
| Component | Version | Purpose |
|---|---|---|
| Python | 3.9 | Runtime environment |
| Quart | 0.14.1 | Async web framework (Flask-compatible) |
| Gunicorn | Latest | HTTP server and process manager (runs Uvicorn workers) |
| Uvicorn | Latest | ASGI server (worker class) |
Data Layer
| Component | Version | Purpose |
|---|---|---|
| asyncpg | 0.26.0 | PostgreSQL async driver |
| aioredis | 1.3.1 | Redis async client |
| PostgreSQL | 12+ | MusicBrainz database + cache storage |
| Redis | 6+ | Ephemeral cache + rate limiting |
| Solr | 8.x | Full-text search engine |
External Integrations
| Library | Version | Purpose |
|---|---|---|
| spotipy | 2.16.1 | Spotify API client |
| pylast | 4.3.0 | Last.fm API client |
| billboard-py | 7.0.0 | Billboard chart scraper |
| beautifulsoup4 | Latest | HTML parsing (Wikipedia) |
| sentry-sdk | 0.19.5 | Error tracking |
Application Entry Points
The project provides two executable entry points:
1. API Server
lidarr-metadata-server
Implementation: lidarrmetadata/server.py
Starts the Quart web application serving the metadata API on port 5001. Supports configurable path prefix via APPLICATION_ROOT environment variable.
Production command:
gunicorn -w 1 -k uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:5001 \
    --access-logfile - \
    lidarrmetadata.server:app
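The same flags can also live in a Gunicorn configuration file, which is plain Python. The following is a sketch of an equivalent file; the file name gunicorn.conf.py and its presence in the project are assumptions, only the setting values come from the command above.

```python
# gunicorn.conf.py (hypothetical file, mirrors the command-line flags above)

# Address and port to listen on (--bind)
bind = "0.0.0.0:5001"

# Number of worker processes (-w)
workers = 1

# Run each worker as a Uvicorn ASGI worker (-k)
worker_class = "uvicorn.workers.UvicornWorker"

# Write the access log to stdout (--access-logfile -)
accesslog = "-"
```

With such a file in place, the server could be started with gunicorn -c gunicorn.conf.py lidarrmetadata.server:app.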
2. Background Crawler
lidarr-metadata-crawler
Implementation: lidarrmetadata/crawler.py
Runs background cache warming tasks to proactively fetch and cache metadata for recently updated artists and albums. Operates independently of the API server.
Crawler types:
- Wikipedia overview crawler
- FanArt.tv image crawler
- TheAudioDB metadata crawler
- Artist metadata crawler
- Album metadata crawler
Network Configuration
| Setting | Default | Configurable Via |
|---|---|---|
| Port | 5001 | Docker/Gunicorn bind |
| Path Prefix | / | APPLICATION_ROOT env var |
| Workers | 1 | Gunicorn -w flag |
| Worker Class | uvicorn.workers.UvicornWorker | Gunicorn -k flag |
Related Ecosystem Components
Lidarr Music Manager
The primary consumer of this API. Lidarr is an automated music collection manager for Usenet and BitTorrent users. It monitors multiple RSS feeds for new albums from favorite artists and grabs, sorts, and renames them.
Integration: Lidarr queries this API to enrich its local music library database with metadata, images, and biographical information.
MusicBrainz Database
The authoritative source for music metadata. MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.
Integration: Direct PostgreSQL connection to a replicated MusicBrainz database instance. The API does NOT use the MusicBrainz web API; it queries the database directly for performance.
Database size: ~100GB+ for full MusicBrainz dataset with hourly replication.
Cover Art Archive
A joint project between the Internet Archive and MusicBrainz providing cover art images for releases in the MusicBrainz database.
Integration: Images are proxied through imagecache.lidarr.audio CDN for performance and bandwidth optimization.
Deployment Architecture
The application is designed for containerized deployment with Docker Compose. A typical production deployment includes:
| Container | Purpose | Resource Requirements |
|---|---|---|
| musicbrainz | PostgreSQL with MusicBrainz schema | 100GB+ storage, 4GB+ RAM |
| solr | Search index (artist/album) | 8GB+ storage, 2GB+ RAM |
| redis | Cache + rate limiting | 512MB RAM limit |
| rabbitmq | Search index updates | 1GB RAM |
| indexer | Solr index updater (SIR) | 512MB RAM |
| api-v0.3 | Stable API version | 1GB+ RAM |
| api-testing | Development API version | 1GB+ RAM |
| crawler | Background cache warmer | 512MB RAM |
Version Strategy
The project uses semantic versioning with a unique dual-deployment strategy:
- v0.3: Stable production version
- testing: Development/staging version
Both versions run simultaneously in production, allowing gradual rollout and A/B testing of new features.
Configuration Management
Configuration is managed through a metaclass-based system with environment variable overrides:
# Select configuration class
LIDARR_METADATA_CONFIG=lidarrmetadata.config.ProductionConfig
# Override specific settings (double underscore for nesting)
CACHE__REDIS_URL=redis://redis:6379/0
DATABASE__HOST=musicbrainz
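To illustrate how double-underscore names might map onto nested settings, here is a minimal sketch. It does not reproduce the project's metaclass system; the function name apply_env_overrides and the plain-dict representation of the configuration are assumptions made for the example.

```python
import copy
from typing import Any, Dict, Mapping


def apply_env_overrides(defaults: Dict[str, Any], environ: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of `defaults` with SECTION__KEY style overrides applied."""
    result = copy.deepcopy(defaults)
    for name, value in environ.items():
        parts = name.split("__")
        # Ignore unrelated environment variables such as PATH or HOME.
        if parts[0] not in result:
            continue
        node = result
        # Walk (or create) the nested sections named by all but the last part.
        for part in parts[:-1]:
            if not isinstance(node.get(part), dict):
                node[part] = {}
            node = node[part]
        # Environment values are strings; type conversion is left out of this sketch.
        node[parts[-1]] = value
    return result


defaults = {
    "CACHE": {"REDIS_URL": "redis://localhost:6379/0"},
    "DATABASE": {"HOST": "localhost", "PORT": 5432},
}
env = {"CACHE__REDIS_URL": "redis://redis:6379/0", "DATABASE__HOST": "musicbrainz"}
settings = apply_env_overrides(defaults, env)
```

In real use the second argument would be os.environ. Copying the defaults first keeps the base configuration class untouched, so several configurations can share it.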
Key Features
Multi-Source Aggregation
Combines data from 15+ external sources into unified artist/album responses:
- Core metadata: MusicBrainz database (direct SQL)
- Images: Cover Art Archive, FanArt.tv, TheAudioDB
- Biographies: Wikipedia (32 language fallback)
- Cross-platform IDs: Spotify, TheAudioDB, MusicBrainz
- Charts: Last.fm, Billboard, Apple Music, iTunes
Intelligent Caching
Three-tier caching strategy:
- Redis: Ephemeral cache (7-day TTL, 512MB limit, LFU eviction)
- PostgreSQL: Persistent cache with zlib compression
- Cloudflare CDN: Edge caching with programmatic invalidation
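The lookup order of the first two tiers can be sketched as follows. This is an illustration under stated assumptions, not the project's cache.py: plain dictionaries stand in for Redis and the PostgreSQL cache table, the class name TieredCache is hypothetical, and the CDN tier is omitted because it sits in front of the application.

```python
import json
import time
import zlib
from typing import Any, Callable, Dict, Optional, Tuple


class TieredCache:
    """Ephemeral tier with a TTL in front of a compressed persistent tier."""

    def __init__(self, ttl_seconds: float = 7 * 24 * 3600) -> None:
        self.ttl_seconds = ttl_seconds
        # Stand-in for Redis: key -> (expiry timestamp, value)
        self.ephemeral: Dict[str, Tuple[float, Any]] = {}
        # Stand-in for a PostgreSQL table holding zlib-compressed JSON
        self.persistent: Dict[str, bytes] = {}

    def get(self, key: str, loader: Callable[[str], Any], now: Optional[float] = None) -> Any:
        now = time.time() if now is None else now
        hit = self.ephemeral.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # tier 1: fresh ephemeral entry
        blob = self.persistent.get(key)
        if blob is not None:
            # tier 2: decompress the stored JSON
            value = json.loads(zlib.decompress(blob).decode("utf-8"))
        else:
            # miss everywhere: build the value and persist it compressed
            value = loader(key)
            self.persistent[key] = zlib.compress(json.dumps(value).encode("utf-8"))
        self.ephemeral[key] = (now + self.ttl_seconds, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key from both tiers, e.g. after an upstream change."""
        self.ephemeral.pop(key, None)
        self.persistent.pop(key, None)
```

The expensive loader runs only when both tiers miss, which is the property that keeps cached requests far cheaper than cold ones.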
Change Detection
Monitors MusicBrainz replication stream to detect updated artists/albums and invalidate stale cache entries. SQL queries track changes across 5 different update sources per entity type.
Background Crawling
Proactive cache warming for recently updated entities. Crawlers run on configurable schedules to pre-fetch expensive metadata (Wikipedia overviews, FanArt images) before user requests.
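A cache-warming pass of this kind might look like the sketch below. The function name warm_cache, the dictionary used as a cache, and the fetch callable are all assumptions for illustration; the real crawlers talk to Redis, PostgreSQL, and the external providers.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Iterable, List


async def warm_cache(
    ids: Iterable[str],
    fetch: Callable[[str], Awaitable[Any]],
    cache: Dict[str, Any],
    concurrency: int = 4,
) -> List[str]:
    """Fetch and store entries missing from `cache`; return the IDs that failed."""
    # Bound the number of in-flight provider requests.
    semaphore = asyncio.Semaphore(concurrency)

    async def warm_one(entity_id: str):
        if entity_id in cache:
            return None  # already warm, nothing to do
        async with semaphore:
            try:
                cache[entity_id] = await fetch(entity_id)
            except Exception:
                # One failing entity should not abort the whole pass.
                return entity_id
        return None

    results = await asyncio.gather(*(warm_one(i) for i in ids))
    return [r for r in results if r is not None]
```

A scheduler would call this periodically with the IDs reported as recently updated and could retry the returned failures on the next run.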
Provider Fallback Chain
Graceful degradation when external services are unavailable. Each metadata type has a primary provider and optional fallback providers with timeout handling.
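One way to express such a chain is shown below. It is a sketch only: the function name first_available and the provider signature are assumptions, and the project's own mixin-based providers are not reproduced here.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Hypothetical provider shape: takes an entity ID, returns a value or None.
Provider = Callable[[str], Awaitable[Optional[str]]]


async def first_available(
    providers: Sequence[Provider], entity_id: str, timeout: float = 5.0
) -> Optional[str]:
    """Try providers in order and return the first non-empty result."""
    for provider in providers:
        try:
            result = await asyncio.wait_for(provider(entity_id), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError):
            # Slow or unreachable provider: fall through to the next one.
            continue
        if result is not None:
            return result
    # Every provider failed or had no data; the caller decides how to degrade.
    return None
```

Ordering the sequence from primary to fallback keeps the policy in data rather than in control flow, so a provider can be added or removed without touching the loop.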
Performance Characteristics
| Metric | Value | Notes |
|---|---|---|
| Cache hit rate | ~85%+ | With crawler enabled |
| Cold request latency | 2-5s | Multiple external API calls |
| Cached request latency | 50-200ms | Redis/PostgreSQL lookup |
| CDN request latency | 10-50ms | Cloudflare edge cache |
| Database size | 100GB+ | MusicBrainz full dataset |
| Cache database size | 10-50GB | Compressed metadata cache |
API Response Format
All endpoints return JSON with consistent structure:
{
  "Id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "ArtistName": "Nirvana",
  "Disambiguation": "90s US grunge band",
  "Overview": "Nirvana was an American rock band...",
  "Images": [
    {
      "Url": "https://imagecache.lidarr.audio/...",
      "CoverType": "poster",
      "Extension": ".jpg"
    }
  ],
  "Links": [
    {
      "Url": "https://www.spotify.com/artist/...",
      "Name": "spotify"
    }
  ],
  "Genres": ["Grunge", "Alternative Rock"],
  "Albums": [...]
}
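A client consuming this structure might pick out an image or a link as in the following sketch. The helper names image_url and link_url are hypothetical; only the field names are taken from the example response.

```python
from typing import Any, Dict, Optional


def image_url(artist: Dict[str, Any], cover_type: str) -> Optional[str]:
    """Return the URL of the first image with the given CoverType, if any."""
    for image in artist.get("Images", []):
        if image.get("CoverType") == cover_type:
            return image.get("Url")
    return None


def link_url(artist: Dict[str, Any], name: str) -> Optional[str]:
    """Return the URL of the first link with the given Name, if any."""
    for link in artist.get("Links", []):
        if link.get("Name") == name:
            return link.get("Url")
    return None
```

Both helpers tolerate missing keys, since enrichment fields can be absent when a provider has no data for an artist.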
Security Posture
Current state: Development-focused with insecure defaults.
| Aspect | Status | Details |
|---|---|---|
| API authentication | None | Read endpoints are public |
| Admin authentication | Single API key | /invalidate endpoint only |
| Database credentials | Hardcoded | abc/abc in multiple configs |
| RabbitMQ credentials | Hardcoded | abc/abc default |
| HTTPS | Not enforced | Relies on reverse proxy |
| Rate limiting | Optional | Disabled by default (NullRateLimiter) |
Production recommendation: Deploy behind authenticated reverse proxy (Cloudflare Access, OAuth2 Proxy, etc.).
Monitoring and Observability
Error Tracking
Sentry integration with custom rate limiting to prevent alert fatigue:
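import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

# config and __version__ are defined elsewhere in the lidarrmetadata package
sentry_sdk.init(
    dsn=config.SENTRY_DSN,
    integrations=[FlaskIntegration()],
    release=f"lidarr-metadata@{__version__}",
)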
Redis-backed deduplication prevents duplicate error reports.
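The deduplication idea can be sketched as a before_send hook, which the Sentry SDK calls with (event, hint) and which drops the event when it returns None. The code below is an assumption-laden illustration: make_before_send is a hypothetical name, a dictionary stands in for Redis, and the fingerprint scheme is invented for the example.

```python
import time
from typing import Any, Callable, Dict, Optional


def make_before_send(
    seen: Dict[str, float],
    window_seconds: float = 300.0,
    clock: Callable[[], float] = time.time,
) -> Callable[[Dict[str, Any], Dict[str, Any]], Optional[Dict[str, Any]]]:
    """Build a hook that suppresses repeats of an error within a time window."""

    def before_send(event: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        exc_info = hint.get("exc_info")
        if exc_info is None:
            return event  # not an exception event, always send
        exc_type, exc_value, _traceback = exc_info
        # Illustrative fingerprint: exception class plus message.
        fingerprint = f"{exc_type.__name__}:{exc_value}"
        now = clock()
        last = seen.get(fingerprint)
        if last is not None and now - last < window_seconds:
            return None  # seen recently: drop the duplicate
        seen[fingerprint] = now
        return event

    return before_send
```

The returned function would be passed as the before_send argument of sentry_sdk.init. With Redis as the store, the dictionary check could become a single SET with the NX and EX options so that all workers share one deduplication window.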
Metrics
StatsD/Telegraf integration for operational metrics:
- Provider request counts
- Response time histograms
- Cache hit/miss rates
- Rate limiter state
Logging
Python standard library logging with per-module handlers:
- DEBUG: Detailed request/response logging
- INFO: Request summaries, cache operations
- WARNING: Provider timeouts, fallback usage
- ERROR: Unhandled exceptions, data inconsistencies
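Per-module levels of this kind can be set up with the standard library alone. The sketch below assumes logger names that follow the module paths (for example lidarrmetadata.provider); the function name configure_logging and the chosen levels are illustrative.

```python
import logging
from typing import Dict


def configure_logging(levels: Dict[str, int]) -> None:
    """Attach a stream handler to each named logger at its own level."""
    formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    for name, level in levels.items():
        handler = logging.StreamHandler()
        handler.setFormatter(formatter)
        logger = logging.getLogger(name)
        logger.setLevel(level)
        logger.addHandler(handler)
        # Avoid duplicate lines if the root logger also has a handler.
        logger.propagate = False


configure_logging({
    "lidarrmetadata.provider": logging.WARNING,  # timeouts and fallbacks only
    "lidarrmetadata.cache": logging.INFO,        # cache operations
    "lidarrmetadata.api": logging.DEBUG,         # detailed request logging
})
```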
Development Workflow
Local Development
# Install dependencies
poetry install
# Start infrastructure
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d
# Run API server
LIDARR_METADATA_CONFIG=lidarrmetadata.config.DevelopmentConfig \
    python -m lidarrmetadata.server
# Run tests (currently disabled in CI)
pytest tests/
Testing
Test suite uses pytest with async support:
- tests/test_config.py: Configuration system (152 lines, most comprehensive)
- tests/test_provider.py: Provider mixin behavior
- tests/test_cache.py: Cache layer functionality
- tests/test_api.py: API endpoint responses
- tests/test_util.py: Utility functions
- tests/test_app.py: Application initialization
Note: Tests are commented out in Azure Pipelines CI configuration.
Project Maturity Assessment
| Aspect | Maturity | Evidence |
|---|---|---|
| Production readiness | High | Running in production for Lidarr ecosystem |
| Code quality | Medium | SonarCloud integration, but tests disabled |
| Security | Low | Hardcoded credentials, no auth on read endpoints |
| Documentation | Medium | README comprehensive, inline docs sparse |
| Dependency freshness | Low | Python 3.9, aioredis 1.x (deprecated) |
| Test coverage | Unknown | Tests disabled in CI |
| Operational maturity | High | Sentry, metrics, multi-tier caching, CDN integration |
Relevance to Metadata Aggregator Project
This codebase represents the closest real-world implementation of a production metadata aggregation service. Key learnings:
- Multi-source enrichment pattern: MusicBrainz as authoritative core + specialized providers for images/bios/charts
- Caching strategy: Three-tier approach with compression and invalidation is battle-tested
- Provider architecture: Mixin-based design allows flexible composition of data sources
- Change detection: Monitoring upstream data sources for cache invalidation is critical
- Background crawling: Proactive cache warming significantly improves user experience
- Direct database access: Querying MusicBrainz DB directly (vs API) enables complex aggregations
- SQL aggregation: Using row_to_json and json_agg to build nested JSON in the database is highly efficient
File Structure Overview
lidarrmetadata/
├── __init__.py # Version and package metadata
├── server.py # API server entry point
├── crawler.py # Background crawler entry point
├── app.py # Quart application factory + routes
├── api.py # Business logic layer
├── provider.py # Provider mixins and implementations
├── cache.py # Multi-tier cache implementation
├── config.py # Configuration metaclass system
├── util.py # Utility functions
├── sql/ # MusicBrainz SQL queries
│   ├── artist.sql
│   ├── album.sql
│   ├── updated_artists.sql
│   └── updated_albums.sql
└── providers/ # Individual provider implementations
    ├── musicbrainz_db.py
    ├── solr_search.py
    ├── fanart.py
    ├── theaudiodb.py
    ├── wikipedia.py
    └── spotify.py
Dependencies Analysis
Production Dependencies (17 total)
Web framework:
- quart==0.14.1 (async Flask alternative)
- hypercorn (ASGI server, Quart dependency)
Database:
- asyncpg==0.26.0 (PostgreSQL async driver)
- aioredis==1.3.1 (Redis async client, deprecated)
External APIs:
- spotipy==2.16.1 (Spotify)
- pylast==4.3.0 (Last.fm)
- billboard-py==7.0.0 (Billboard charts)
- beautifulsoup4 (Wikipedia scraping)
Utilities:
- python-dateutil (date parsing)
- pytz (timezone handling)
- requests (HTTP client for sync operations)
- lxml (XML parsing)
Monitoring:
- sentry-sdk==0.19.5 (error tracking)
- statsd (metrics)
Server:
- gunicorn (HTTP server and process manager for the Uvicorn workers)
- uvicorn (ASGI worker)
Development Dependencies
- pytest
- pytest-asyncio
- black (code formatting)
- flake8 (linting)
Dependency Concerns
- Python 3.9: End of life October 2025, should upgrade to 3.11+
- aioredis 1.3.1: Deprecated, merged into redis-py 4.2+
- Quart 0.14.1: Current version is 0.19+, missing 5 years of updates
- asyncpg 0.26.0: Current version is 0.29+
- sentry-sdk 0.19.5: Current version is 2.0+, missing major version
Conclusion
LidarrAPI.Metadata is a production-grade metadata aggregation service with sophisticated caching, multi-source enrichment, and operational maturity. While it has technical debt (outdated dependencies, disabled tests, insecure defaults), its architecture and patterns provide an excellent reference for building a modern metadata aggregator.
The direct MusicBrainz database integration, provider fallback chain, and three-tier caching strategy are particularly valuable patterns to adopt.