Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00

Lidarr Metadata API - External Integrations

Integration Overview

The Lidarr Metadata API integrates with 15 external systems to provide comprehensive music metadata aggregation:

Integration	Type	Purpose	Authentication	Rate Limit
MusicBrainz DB	Database	Core metadata	Read-only user	N/A
Solr Search	Search Engine	Full-text search	None	N/A
Cover Art Archive	CDN	Album cover art	None	N/A
FanArt.tv	REST API	Artist/album images	API key	7-day lag (free)
TheAudioDB	REST API	Metadata fallback	API key "1"	Unknown
Wikipedia	Web Scraping	Artist biographies	None	Polite crawling
Spotify	REST API	ID mapping, charts	OAuth	429 handling
Last.fm	REST API	Charts	API key	Unknown
Billboard	Web Scraping	Charts	None	Polite crawling
Apple Music	RSS API	Charts	None	N/A
RabbitMQ	Message Queue	Search index updates	Basic auth	N/A
Redis	Cache	Ephemeral cache	None	N/A
Sentry	Error Tracking	Monitoring	DSN	Redis-based
Telegraf	Metrics	StatsD metrics	None	N/A
Cloudflare	CDN	Edge caching	API token	1200 req/5min

1. MusicBrainz Database

Overview

Type: Direct PostgreSQL database access

Purpose: Authoritative source for all music metadata

Container: ghcr.io/lidarr/mb-postgres:1.0.10

Access method: Read-only asyncpg connection

Configuration

MUSICBRAINZ_DB = {
    'host': 'musicbrainz',
    'port': 5432,
    'database': 'musicbrainz_db',
    'user': 'musicbrainz_ro',  # Read-only user
    'password': 'abc',
    'min_pool_size': 10,
    'max_pool_size': 50,
    'command_timeout': 30
}

Connection Pool

import asyncpg

pool = await asyncpg.create_pool(
    host=config.MUSICBRAINZ_DB['host'],
    port=config.MUSICBRAINZ_DB['port'],
    database=config.MUSICBRAINZ_DB['database'],
    user=config.MUSICBRAINZ_DB['user'],
    password=config.MUSICBRAINZ_DB['password'],
    min_size=config.MUSICBRAINZ_DB['min_pool_size'],
    max_size=config.MUSICBRAINZ_DB['max_pool_size'],
    command_timeout=config.MUSICBRAINZ_DB['command_timeout']
)

Replication Setup

Replication method: MusicBrainz replication packets

Update frequency: Hourly

Replication script: Custom script in container

Process:

Check current replication sequence
Download replication packets from MusicBrainz FTP
Apply SQL changes
Update replication control table
Trigger search index updates

Monitoring replication lag:

SELECT
    current_replication_sequence,
    last_replication_date,
    NOW() - last_replication_date AS lag
FROM replication_control;
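
The lag value from this query can feed a simple staleness check. The sketch below is illustrative only: the helper name `replication_is_stale` and the two-hour threshold are assumptions, chosen so that one late hourly packet does not raise an alert.

```python
from datetime import datetime, timedelta


def replication_is_stale(last_replication_date: datetime,
                         now: datetime,
                         max_lag: timedelta = timedelta(hours=2)) -> bool:
    """Return True when the last applied packet is older than max_lag.

    Packets are applied hourly, so the default threshold (an assumed
    value) tolerates one late packet before reporting staleness.
    """
    return now - last_replication_date > max_lag
```

Both timestamps should come from the same clock (for example `NOW()` and `last_replication_date` from the query itself) so that clock skew between hosts does not distort the result.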

Database Size and Performance

Database size: 100GB+ (full MusicBrainz dataset)

Query performance:

Simple artist lookup: 50-100ms
Complex artist with releases: 100-500ms
Album with tracks: 200-1000ms
Change detection query: 500-2000ms

Optimization: Custom indices on last_updated columns

Security Considerations

Read-only access: User has SELECT-only permissions

Network isolation: Database accessible only within Docker network

Credentials: Hardcoded (insecure default, should be changed)
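
One way to move away from the hardcoded default is to read the credentials from the environment at startup. This is a minimal sketch under stated assumptions: the environment variable names and the helper `load_musicbrainz_db_config` are hypothetical and do not exist in the current configuration.

```python
import os
from typing import Mapping, Optional


def load_musicbrainz_db_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the MUSICBRAINZ_DB dict from environment variables.

    The variable names are hypothetical. The password has no default,
    so a missing secret fails at startup instead of falling back to 'abc'.
    """
    env = os.environ if env is None else env
    if 'MUSICBRAINZ_DB_PASSWORD' not in env:
        raise RuntimeError('MUSICBRAINZ_DB_PASSWORD must be set')
    return {
        'host': env.get('MUSICBRAINZ_DB_HOST', 'musicbrainz'),
        'port': int(env.get('MUSICBRAINZ_DB_PORT', '5432')),
        'database': env.get('MUSICBRAINZ_DB_NAME', 'musicbrainz_db'),
        'user': env.get('MUSICBRAINZ_DB_USER', 'musicbrainz_ro'),
        'password': env['MUSICBRAINZ_DB_PASSWORD'],
        'min_pool_size': 10,
        'max_pool_size': 50,
        'command_timeout': 30,
    }
```

The returned dict has the same keys as `MUSICBRAINZ_DB`, so the connection pool code can consume it unchanged.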

2. Solr Search

Overview

Type: Apache Solr 8.x search engine

Purpose: Full-text search for artists and albums

Container: ghcr.io/lidarr/mb-solr:3.3.1.9

Cores: artist, release-group

Configuration

SOLR = {
    'url': 'http://solr:8983/solr',
    'artist_core': 'artist',
    'album_core': 'release-group',
    'timeout': 5,
    'rows': 10
}

Query Interface

HTTP client: aiohttp

Query format: HTTP GET with query parameters; JSON response (wt=json)

Example query:

import aiohttp

async def search_artist(query, limit=10):
    async with aiohttp.ClientSession() as session:
        params = {
            'q': query,
            'defType': 'dismax',
            'qf': 'artist^2 sortname alias',
            'mm': '1',
            'rows': limit,
            'wt': 'json'
        }
        
        async with session.get(
            f"{config.SOLR['url']}/{config.SOLR['artist_core']}/select",
            params=params,
            timeout=aiohttp.ClientTimeout(total=config.SOLR['timeout'])
        ) as response:
            data = await response.json()
            return data['response']['docs']

Real-Time Index Updates

Update mechanism: RabbitMQ + SIR (Search Index Rebuilder)

Process:

MusicBrainz database changes trigger RabbitMQ messages
SIR consumes messages from queue
SIR queries MusicBrainz DB for updated entity
SIR posts update to Solr core
Solr performs soft commit (1 second)

Update latency: 1-5 seconds from database change

Index Maintenance

Full reindex: Required after schema changes

Reindex process:

# Stop SIR
docker-compose stop indexer

# Clear Solr cores
curl "http://solr:8983/solr/artist/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
curl "http://solr:8983/solr/release-group/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

# Rebuild indices
docker-compose run indexer rebuild-artist
docker-compose run indexer rebuild-album

# Restart SIR
docker-compose start indexer

Reindex duration: 4-8 hours for full MusicBrainz dataset

Performance Tuning

JVM heap size: 2GB

Solr cache settings:

<filterCache size="512" initialSize="512" autowarmCount="256"/>
<queryResultCache size="512" initialSize="512" autowarmCount="256"/>
<documentCache size="512" initialSize="512" autowarmCount="0"/>

Commit settings:

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

3. Cover Art Archive

Overview

Type: Image CDN

Purpose: Album cover art images

Base URL: https://coverartarchive.org

Proxy: https://imagecache.lidarr.audio

Image URL Format

Direct URL:

https://coverartarchive.org/release/{release-mbid}/front-500.jpg

Proxied URL:

https://imagecache.lidarr.audio/cover/{release-mbid}/front.jpg

Image Types

Type	Description	Typical Size
`front`	Front cover	500x500 - 1200x1200
`back`	Back cover	500x500 - 1200x1200
`booklet`	Booklet pages	Variable
`medium`	Disc/vinyl image	500x500
`tray`	CD tray card	Variable
`obi`	Japanese obi strip	Variable
`spine`	Spine image	Variable
`track`	Track listing	Variable
`liner`	Liner notes	Variable
`sticker`	Sticker image	Variable
`poster`	Poster image	Variable

Image Proxy Benefits

Advantages of using imagecache.lidarr.audio:

Caching: Images cached at edge for faster delivery
Resizing: Automatic image resizing via query parameters
Format conversion: WebP conversion for modern browsers
Bandwidth: Reduced load on Cover Art Archive
Reliability: Fallback to direct URL if proxy fails

Proxy query parameters:

https://imagecache.lidarr.audio/cover/{mbid}/front.jpg?size=500&format=webp
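
A small helper can assemble such URLs consistently. The sketch below assumes only the `size` and `format` parameters shown above; the helper name `build_proxy_cover_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

PROXY_BASE = 'https://imagecache.lidarr.audio/cover'


def build_proxy_cover_url(mbid: str,
                          size: Optional[int] = None,
                          fmt: Optional[str] = None) -> str:
    """Build a proxied front-cover URL with optional size/format parameters."""
    params = {}
    if size is not None:
        params['size'] = size
    if fmt is not None:
        params['format'] = fmt
    url = f"{PROXY_BASE}/{mbid}/front.jpg"
    # Only append a query string when at least one parameter was given
    return f"{url}?{urlencode(params)}" if params else url
```

Calling `build_proxy_cover_url(mbid, 500, 'webp')` yields a URL of the same shape as the example above.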

Integration Code

async def get_cover_art(release_mbid):
    """Fetch cover art URLs for release"""
    images = []
    
    # Try proxy first
    proxy_url = f"https://imagecache.lidarr.audio/cover/{release_mbid}/front.jpg"
    if await check_url_exists(proxy_url):
        images.append({
            'Url': proxy_url,
            'CoverType': 'cover',
            'Extension': '.jpg'
        })
    else:
        # Fallback to direct URL
        direct_url = f"https://coverartarchive.org/release/{release_mbid}/front-500.jpg"
        if await check_url_exists(direct_url):
            images.append({
                'Url': direct_url,
                'CoverType': 'cover',
                'Extension': '.jpg'
            })
    
    return images

Error Handling

404 Not Found: No cover art available for release

503 Service Unavailable: Cover Art Archive temporarily down

Fallback: Use FanArt.tv or TheAudioDB images
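
The fallback order can be expressed as a small provider chain. This is a sketch, not the existing implementation: `first_available_images` is a hypothetical helper, and each provider is assumed to be an async callable that takes an MBID and returns a list of image dicts.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

ImageProvider = Callable[[str], Awaitable[List[dict]]]


async def first_available_images(mbid: str,
                                 providers: Sequence[ImageProvider]) -> List[dict]:
    """Return images from the first provider that yields any.

    A provider that raises (for example on a 503) is treated like one
    with no images, so a failing source does not block the next one.
    """
    for provider in providers:
        try:
            images = await provider(mbid)
        except Exception:
            continue
        if images:
            return images
    return []

# Usage (provider names are placeholders):
#   asyncio.run(first_available_images(mbid, [cover_art_provider, fanart_provider]))
```

Swallowing every exception keeps the chain moving, but a real deployment would likely log the failure before continuing.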

4. FanArt.tv

Overview

Type: REST API

Purpose: High-quality artist and album images

Base URL: https://webservice.fanart.tv/v3

Authentication: API key

Configuration

FANART = {
    'api_key': 'your-api-key-here',
    'base_url': 'https://webservice.fanart.tv/v3',
    'timeout': 10,
    'cache_ttl': 2592000  # 30 days
}

API Key Types

Key Type	Cost	Rate Limit	Image Lag
Free	Free	Unknown	7 days
Personal	$2/month	Higher	No lag
Commercial	$5/month	Highest	No lag

7-day lag: Free API keys only return images added 7+ days ago

Endpoints

Artist Images

Endpoint: GET /music/{mbid}

Request:

async def get_fanart_artist_images(mbid):
    async with aiohttp.ClientSession() as session:
        headers = {'api-key': config.FANART['api_key']}
        url = f"{config.FANART['base_url']}/music/{mbid}"
        
        async with session.get(url, headers=headers, timeout=aiohttp.ClientTimeout(total=config.FANART['timeout'])) as response:
            if response.status == 404:
                return []
            response.raise_for_status()
            return await response.json()

Response:

{
  "name": "Nirvana",
  "mbid_id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "artistbackground": [
    {
      "id": "12345",
      "url": "https://assets.fanart.tv/fanart/music/5b11f4ce-a62d-471e-81fc-a69a8278c7da/artistbackground/nirvana-1.jpg",
      "likes": "42"
    }
  ],
  "artistthumb": [...],
  "hdmusiclogo": [...],
  "musicbanner": [...],
  "musiclogo": [...]
}

Album Images

Endpoint: GET /music/albums/{mbid}

Response:

{
  "albums": {
    "1b022e01-4da6-387b-8658-8678046e4cef": {
      "albumcover": [
        {
          "id": "67890",
          "url": "https://assets.fanart.tv/fanart/music/1b022e01-4da6-387b-8658-8678046e4cef/albumcover/nevermind-1.jpg",
          "likes": "156"
        }
      ],
      "cdart": [...]
    }
  }
}

Image Types

Artist images:

artistbackground: Background images (1920x1080)
artistthumb: Artist thumbnails (1000x1000)
hdmusiclogo: HD logos (transparent PNG)
musicbanner: Banners (1000x185)
musiclogo: Standard logos (transparent PNG)

Album images:

albumcover: Album covers (1000x1000)
cdart: CD art (transparent PNG)

Mapping to Lidarr Image Types

FANART_TYPE_MAPPING = {
    'artistbackground': 'fanart',
    'artistthumb': 'poster',
    'hdmusiclogo': 'logo',
    'musicbanner': 'banner',
    'musiclogo': 'logo',
    'albumcover': 'cover',
    'cdart': 'disc'
}
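
Applying this mapping to an artist response might look like the sketch below. The helper `map_fanart_images` is hypothetical; the output shape (`Url`, `CoverType`, `Extension`) follows the image dicts used elsewhere in this document, and the mapping is repeated so the block is self-contained.

```python
from posixpath import splitext

FANART_TYPE_MAPPING = {
    'artistbackground': 'fanart',
    'artistthumb': 'poster',
    'hdmusiclogo': 'logo',
    'musicbanner': 'banner',
    'musiclogo': 'logo',
    'albumcover': 'cover',
    'cdart': 'disc',
}


def map_fanart_images(payload: dict) -> list:
    """Convert a FanArt.tv artist payload into Lidarr-style image dicts."""
    images = []
    for fanart_type, cover_type in FANART_TYPE_MAPPING.items():
        # Image types missing from the payload are simply skipped
        for item in payload.get(fanart_type) or []:
            url = item.get('url')
            if not url:
                continue
            images.append({
                'Url': url,
                'CoverType': cover_type,
                # Derive the extension from the URL path, defaulting to .jpg
                'Extension': splitext(url)[1] or '.jpg',
            })
    return images
```

Because `hdmusiclogo` and `musiclogo` both map to `logo`, the result can contain several logo entries; whether to keep only the HD variant is a policy decision this sketch leaves open.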

Error Handling

404 Not Found: No images available for artist/album

429 Too Many Requests: Rate limit exceeded (retry with backoff)

503 Service Unavailable: FanArt.tv temporarily down

Fallback: Use TheAudioDB or Cover Art Archive
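
For the 429 case, "retry with backoff" is commonly implemented as capped exponential backoff with jitter. The sketch below shows one such schedule; the base delay, cap, and the helper name `backoff_delay` are assumptions rather than values taken from the service.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: bool = True) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    The delay grows as base * 2**attempt and is capped at `cap`. With
    jitter enabled, the value is drawn uniformly from [0, capped delay]
    so that many clients do not retry in lockstep.
    """
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay
```

A caller would sleep for `backoff_delay(attempt)` between attempts and give up after a fixed number of retries, at which point the fallback providers take over.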

Caching Strategy

Cache TTL: 30 days (images rarely change)

Cache key: fanart:artist:{mbid} or fanart:album:{mbid}

Invalidation: Manual only (images are immutable)

5. TheAudioDB

Overview

Type: REST API

Purpose: Fallback metadata and images

Base URL: https://theaudiodb.com/api/v1/json

Authentication: API key "1" (public key)

Configuration

THEAUDIODB = {
    'api_key': '1',
    'base_url': 'https://theaudiodb.com/api/v1/json',
    'timeout': 10,
    'cache_ttl': 2592000  # 30 days
}

Endpoints

Artist by MusicBrainz ID

Endpoint: GET /1/artist-mb.php?i={mbid}

Request:

async def get_theaudiodb_artist(mbid):
    async with aiohttp.ClientSession() as session:
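        # The API key is a path segment; use the configured key rather than a literal
        url = f"{config.THEAUDIODB['base_url']}/{config.THEAUDIODB['api_key']}/artist-mb.php"
        params = {'i': mbid}
        timeout = aiohttp.ClientTimeout(total=config.THEAUDIODB['timeout'])
        
        async with session.get(url, params=params, timeout=timeout) as response: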
            if response.status == 404:
                return None
            response.raise_for_status()
            data = await response.json()
            return data['artists'][0] if data['artists'] else None

Response:

{
  "artists": [
    {
      "idArtist": "111247",
      "strArtist": "Nirvana",
      "strArtistAlternate": "",
      "strLabel": "DGC Records",
      "idLabel": "45114",
      "intFormedYear": "1987",
      "intBornYear": "",
      "intDiedYear": "",
      "strDisbanded": "1994",
      "strStyle": "Grunge",
      "strGenre": "Rock",
      "strMood": "Angry",
      "strWebsite": "www.nirvana.com",
      "strFacebook": "www.facebook.com/Nirvana",
      "strTwitter": "twitter.com/nirvana",
      "strBiographyEN": "Nirvana was an American rock band...",
      "strBiographyDE": null,
      "strBiographyFR": null,
      "strGender": "Male",
      "strCountry": "United States",
      "strCountryCode": "US",
      "strArtistThumb": "https://www.theaudiodb.com/images/media/artist/thumb/uxrqxy1347913147.jpg",
      "strArtistLogo": "https://www.theaudiodb.com/images/media/artist/logo/urspuv1434553994.png",
      "strArtistFanart": "https://www.theaudiodb.com/images/media/artist/fanart/spvryu1347980801.jpg",
      "strArtistBanner": "https://www.theaudiodb.com/images/media/artist/banner/xuypqw1342640163.jpg",
      "strMusicBrainzID": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
      "strLastFMChart": "https://www.last.fm/music/Nirvana",
      "intCharted": "5",
      "strLocked": "unlocked"
    }
  ]
}

Album by MusicBrainz ID

Endpoint: GET /1/album-mb.php?i={mbid}

Response: Similar structure with album-specific fields

Data Extraction

Biography:

def extract_biography(artist_data):
    """Extract biography with language fallback"""
    languages = ['EN', 'DE', 'FR', 'ES', 'IT', 'JP']
    for lang in languages:
        bio = artist_data.get(f'strBiography{lang}')
        if bio:
            return bio
    return None

Images:

def extract_images(artist_data):
    """Extract image URLs"""
    images = []
    
    if artist_data.get('strArtistThumb'):
        images.append({
            'Url': artist_data['strArtistThumb'],
            'CoverType': 'poster',
            'Extension': '.jpg'
        })
    
    if artist_data.get('strArtistLogo'):
        images.append({
            'Url': artist_data['strArtistLogo'],
            'CoverType': 'logo',
            'Extension': '.png'
        })
    
    if artist_data.get('strArtistFanart'):
        images.append({
            'Url': artist_data['strArtistFanart'],
            'CoverType': 'fanart',
            'Extension': '.jpg'
        })
    
    if artist_data.get('strArtistBanner'):
        images.append({
            'Url': artist_data['strArtistBanner'],
            'CoverType': 'banner',
            'Extension': '.jpg'
        })
    
    return images

Links:

def extract_links(artist_data):
    """Extract social media links"""
    links = []
    
    if artist_data.get('strWebsite'):
        links.append({
            'Url': f"http://{artist_data['strWebsite']}",
            'Name': 'website'
        })
    
    if artist_data.get('strFacebook'):
        links.append({
            'Url': f"https://{artist_data['strFacebook']}",
            'Name': 'facebook'
        })
    
    if artist_data.get('strTwitter'):
        links.append({
            'Url': f"https://{artist_data['strTwitter']}",
            'Name': 'twitter'
        })
    
    return links

Error Handling

404 Not Found: Artist/album not in TheAudioDB

Timeout: 10-second timeout, fallback to other providers

Invalid JSON: Graceful degradation
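
Graceful degradation on a malformed body can be as simple as parsing defensively and returning nothing. The sketch below assumes the response shape shown earlier (an `artists` list, possibly null); `parse_artist_payload` is a hypothetical helper.

```python
import json
from typing import Optional


def parse_artist_payload(body: str) -> Optional[dict]:
    """Return the first artist dict, or None if the body is unusable."""
    try:
        data = json.loads(body)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all (for example an HTML error page)
        return None
    if not isinstance(data, dict):
        return None
    artists = data.get('artists')
    if not isinstance(artists, list) or not artists:
        # Covers a missing key, a null value, and an empty list
        return None
    first = artists[0]
    return first if isinstance(first, dict) else None
```

Returning `None` in every failure case lets the caller treat "invalid response" and "artist not found" the same way and move on to the next provider.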

6. Wikipedia

Overview

Type: Web scraping

Purpose: Artist biographical information

Base URL: https://{lang}.wikipedia.org

Authentication: None (public access)

Configuration

WIKIPEDIA = {
    'timeout': 2,
    'max_connections_per_host': 1,
    'user_agent': 'LidarrMetadataAPI/10.0.0 (https://github.com/Lidarr/LidarrAPI.Metadata)',
    'languages': ['en', 'fr', 'de', 'es', 'it', 'ja', 'zh', 'ru', 'pt', 'nl', 'sv', 'fi', 'no', 'da', 'pl', 'cs', 'hu', 'ro', 'tr', 'el', 'he', 'ar', 'fa', 'hi', 'th', 'ko', 'vi', 'id', 'ms', 'tl', 'bn', 'ta']
}

Lookup Process

Multi-stage lookup:

MusicBrainz → Wikidata: Extract Wikidata ID from MusicBrainz links
Wikidata → Wikipedia: Get Wikipedia article title from Wikidata
Wikipedia → Extract: Scrape and parse Wikipedia article

Wikidata Integration

Wikidata entity URL: https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json

Extract Wikipedia links:

async def get_wikipedia_title_from_wikidata(wikidata_id, language='en'):
    """Get Wikipedia article title from Wikidata entity"""
    async with aiohttp.ClientSession() as session:
        url = f"https://www.wikidata.org/wiki/Special:EntityData/{wikidata_id}.json"
        
        async with session.get(url, timeout=2) as response:
            data = await response.json()
            entity = data['entities'][wikidata_id]
            
            # Get Wikipedia link for language
            sitelinks = entity.get('sitelinks', {})
            wiki_key = f'{language}wiki'
            
            if wiki_key in sitelinks:
                return sitelinks[wiki_key]['title']
            
            return None

Wikipedia Article Extraction

Fetch article HTML:

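from urllib.parse import quote

async def get_wikipedia_article(title, language='en'):
    """Fetch Wikipedia article HTML"""
    async with aiohttp.ClientSession() as session:
        # Titles may contain spaces or reserved characters; article URLs use underscores
        url = f"https://{language}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"
        headers = {'User-Agent': config.WIKIPEDIA['user_agent']}
        timeout = aiohttp.ClientTimeout(total=config.WIKIPEDIA['timeout'])
        
        async with session.get(url, headers=headers, timeout=timeout) as response: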
            if response.status == 404:
                return None
            response.raise_for_status()
            return await response.text()

Parse and extract summary:

from bs4 import BeautifulSoup

def extract_wikipedia_summary(html):
    """Extract first paragraph as summary"""
    soup = BeautifulSoup(html, 'lxml')
    
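    # Find main content div
    content = soup.find('div', {'id': 'mw-content-text'})
    if not content:
        return None
    
    # Article paragraphs are normally direct children of the nested
    # 'mw-parser-output' div, not of 'mw-content-text' itself
    body = content.find('div', class_='mw-parser-output') or content
    
    # Find first paragraph (skip disambiguation notices)
    for p in body.find_all('p', recursive=False):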
        text = p.get_text().strip()
        
        # Skip empty paragraphs
        if not text:
            continue
        
        # Skip coordinate-only paragraphs
        if text.startswith('Coordinates:'):
            continue
        
        # Return first substantial paragraph
        if len(text) > 50:
            return text
    
    return None

Language Fallback

32-language fallback chain:

async def get_artist_overview(mbid):
    """Get artist overview with language fallback"""
    # Get Wikidata ID from MusicBrainz
    wikidata_id = await get_wikidata_id_from_musicbrainz(mbid)
    if not wikidata_id:
        return None
    
    # Try each language in order
    for language in config.WIKIPEDIA['languages']:
        try:
            # Get Wikipedia title for language
            title = await get_wikipedia_title_from_wikidata(wikidata_id, language)
            if not title:
                continue
            
            # Fetch and parse article
            html = await get_wikipedia_article(title, language)
            if not html:
                continue
            
            summary = extract_wikipedia_summary(html)
            if summary:
                return summary
        
        except Exception as e:
            logger.debug(f"Wikipedia lookup failed for {language}: {e}")
            continue
    
    return None

Rate Limiting

Polite crawling:

1 connection per host maximum
2-second timeout per request
User-Agent header identifies bot
Respect robots.txt (manual check)

No explicit rate limit: Wikipedia allows reasonable bot traffic

Error Handling

404 Not Found: Article doesn't exist in language

Timeout: 2-second timeout, try next language

Parse errors: Graceful degradation, try next language

Fallback: Use TheAudioDB biography if Wikipedia unavailable

7. Spotify

Overview

Type: REST API with OAuth

Purpose: ID mapping and cross-platform linking

Base URL: https://api.spotify.com/v1

Authentication: OAuth 2.0 Client Credentials

Library: spotipy 2.16.1

Configuration

SPOTIFY = {
    'client_id': 'your-client-id',
    'client_secret': 'your-client-secret',
    'redirect_uri': 'http://localhost:5001/spotify/callback',
    'timeout': 5
}

OAuth Flow

Client Credentials Grant (for server-to-server):

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

auth_manager = SpotifyClientCredentials(
    client_id=config.SPOTIFY['client_id'],
    client_secret=config.SPOTIFY['client_secret']
)

spotify = spotipy.Spotify(auth_manager=auth_manager)

Token caching: Tokens cached in Redis with automatic refresh

ID Mapping

MusicBrainz → Spotify:

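import asyncio

async def map_musicbrainz_to_spotify(mbid, artist_name):
    """Map MusicBrainz ID to Spotify ID"""
    # Search Spotify by artist name. spotipy is synchronous, so the call
    # runs in a worker thread to avoid blocking the event loop.
    loop = asyncio.get_running_loop()
    results = await loop.run_in_executor(
        None,
        lambda: spotify.search(q=f'artist:{artist_name}', type='artist', limit=10)
    )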
    
    if not results['artists']['items']:
        return None
    
    # Find best match using Levenshtein distance
    best_match = None
    best_score = 0
    
    for artist in results['artists']['items']:
        score = levenshtein_similarity(artist_name, artist['name'])
        if score > best_score and score >= 0.8:
            best_score = score
            best_match = artist
    
    return best_match['id'] if best_match else None

Levenshtein similarity:

from Levenshtein import ratio

def levenshtein_similarity(s1, s2):
    """Calculate Levenshtein similarity (0-1)"""
    return ratio(s1.lower(), s2.lower())

Threshold: 0.8 minimum similarity for match
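
To show how the threshold behaves without the third-party `Levenshtein` package, the sketch below swaps in `difflib.SequenceMatcher` from the standard library. Its ratio is not a Levenshtein ratio, so scores can differ slightly; the helper names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (difflib ratio, not Levenshtein)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_best_match(target: str, candidates: Iterable[str],
                    threshold: float = 0.8) -> Optional[str]:
    """Return the candidate most similar to target, or None below threshold."""
    best, best_score = None, 0.0
    for name in candidates:
        score = similarity(target, name)
        # Keep only candidates that clear the threshold and beat the current best
        if score >= threshold and score > best_score:
            best, best_score = name, score
    return best
```

Name similarity alone cannot distinguish two different artists with the same name, which is why a strict threshold only reduces, and does not eliminate, wrong mappings.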

Spotify API Endpoints

Get artist:

artist = spotify.artist('6olE6TJLqED3rqDCT0FyPh')

Get album:

album = spotify.album('2guirTSEqLizK7j9i1MTTZ')

Search:

results = spotify.search(q='nirvana', type='artist', limit=10)

Error Handling

429 Too Many Requests: Retry with exponential backoff

401 Unauthorized: Refresh OAuth token

404 Not Found: Artist/album not on Spotify

Timeout: 5-second timeout, graceful degradation

Caching Strategy

Cache TTL: 90 days (Spotify IDs rarely change)

Cache key: spotify:artist:{spotify_id} or spotify:mbid:{mbid}

8. Last.fm

Overview

Type: REST API

Purpose: Music charts and scrobble data

Base URL: https://ws.audioscrobbler.com/2.0

Authentication: API key

Library: pylast 4.3.0

Configuration

LASTFM = {
    'api_key': 'your-api-key',
    'api_secret': 'your-api-secret',
    'timeout': 5
}

pylast Integration

import pylast

network = pylast.LastFMNetwork(
    api_key=config.LASTFM['api_key'],
    api_secret=config.LASTFM['api_secret']
)

Chart Endpoints

Top artists:

def get_lastfm_top_artists(limit=50):
    """Get Last.fm top artists chart"""
    top_artists = network.get_top_artists(limit=limit)
    
    results = []
    for artist in top_artists:
        results.append({
            'name': artist.item.name,
            'playcount': artist.weight,
            'listeners': artist.item.get_listener_count()
        })
    
    return results

Top albums:

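def get_lastfm_top_albums(limit=50):
    """Get Last.fm top albums chart"""
    # Caution: the Last.fm chart API documents top artists, tags, and tracks;
    # confirm that your pylast version exposes get_top_albums on the network
    # object before relying on this call
    top_albums = network.get_top_albums(limit=limit)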
    
    results = []
    for album in top_albums:
        results.append({
            'name': album.item.title,
            'artist': album.item.artist.name,
            'playcount': album.weight
        })
    
    return results

Top tracks:

def get_lastfm_top_tracks(limit=50):
    """Get Last.fm top tracks chart"""
    top_tracks = network.get_top_tracks(limit=limit)
    
    results = []
    for track in top_tracks:
        results.append({
            'name': track.item.title,
            'artist': track.item.artist.name,
            'playcount': track.weight
        })
    
    return results

MusicBrainz Mapping

Map Last.fm artist to MusicBrainz:

async def map_lastfm_to_musicbrainz(lastfm_artist_name):
    """Map Last.fm artist to MusicBrainz ID"""
    # Search MusicBrainz via Solr
    results = await search_artist(lastfm_artist_name, limit=5)
    
    if not results:
        return None
    
    # Return best match (first result)
    return results[0]['Id']

Caching

Cache TTL: 6 hours (charts update daily)

Cache key: lastfm:chart:{type}:{limit}
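
The key format and TTL can be illustrated with a small in-memory stand-in for Redis. This is a sketch for clarity, not the service's cache layer: `TtlCache` and `chart_cache_key` are hypothetical names, and the injectable clock exists only to make expiry easy to reason about.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

CHART_TTL_SECONDS = 6 * 60 * 60  # 6 hours


def chart_cache_key(chart_type: str, limit: int) -> str:
    """Build a key in the lastfm:chart:{type}:{limit} format."""
    return f"lastfm:chart:{chart_type}:{limit}"


class TtlCache:
    """Minimal in-memory TTL cache standing in for Redis."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        # Store the absolute expiry time alongside the value
        self._store[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # Expired: evict lazily on read
            return None
        return value
```

Including `limit` in the key means a request for the top 50 and a request for the top 100 are cached separately, which trades some memory for simpler lookups.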

9. Billboard

Overview

Type: Web scraping

Purpose: Billboard music charts

Base URL: https://www.billboard.com/charts

Authentication: None

Library: billboard-py 7.0.0

billboard-py Integration

import billboard

def get_billboard_hot_100():
    """Get Billboard Hot 100 chart"""
    chart = billboard.ChartData('hot-100')
    
    results = []
    for entry in chart:
        results.append({
            'position': entry.rank,
            'title': entry.title,
            'artist': entry.artist,
            'last_position': entry.lastPos,
            'peak_position': entry.peakPos,
            'weeks_on_chart': entry.weeks
        })
    
    return results

Supported Charts

Chart Name	billboard-py ID	Type
Hot 100	`hot-100`	Tracks
Billboard 200	`billboard-200`	Albums
Artist 100	`artist-100`	Artists
Streaming Songs	`streaming-songs`	Tracks
Radio Songs	`radio-songs`	Tracks
Digital Song Sales	`digital-song-sales`	Tracks

MusicBrainz Mapping

Map Billboard entry to MusicBrainz:

async def map_billboard_to_musicbrainz(artist_name, track_title=None):
    """Map Billboard entry to MusicBrainz"""
    # Search artist
    artist_results = await search_artist(artist_name, limit=5)
    if not artist_results:
        return None
    
    artist_mbid = artist_results[0]['Id']
    
    # If track title provided, search for recording
    if track_title:
        # Search would require recording search (not implemented)
        pass
    
    return artist_mbid

Error Handling

HTTP errors: Retry with backoff

Parse errors: Graceful degradation

Rate limiting: Polite crawling (1 request per second)
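
The one-request-per-second rule can be enforced with a small throttle that spaces out request start times. A minimal sketch, assuming all scraping calls share one throttle instance inside a single event loop; the class name and the injectable `clock`/`sleep` parameters are illustrative.

```python
import asyncio
import time
from typing import Awaitable, Callable


class MinIntervalThrottle:
    """Space out callers so consecutive requests start >= interval apart."""

    def __init__(self, interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], Awaitable[None]] = asyncio.sleep):
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float('-inf')

    async def wait(self) -> None:
        now = self._clock()
        start = max(now, self._next_allowed)
        # Reserve the slot before sleeping; there is no await between the
        # read and the write, so concurrent tasks cannot claim the same slot
        self._next_allowed = start + self._interval
        if start > now:
            await self._sleep(start - now)
```

Each scraping call would `await throttle.wait()` immediately before issuing its HTTP request.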

Caching

Cache TTL: 6 hours (charts update weekly)

Cache key: billboard:chart:{chart_name}

10. Apple Music / iTunes

Overview

Type: RSS API

Purpose: Apple Music and iTunes charts

Base URL: https://rss.applemarketingtools.com/api/v2

Authentication: None

RSS Feed URLs

Top albums:

https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/albums.json

Top songs:

https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/songs.json

New releases:

https://rss.applemarketingtools.com/api/v2/us/music/new-releases/100/albums.json

Fetch and Parse

async def get_apple_music_chart(chart_type, limit=100):
    """Fetch Apple Music chart"""
    async with aiohttp.ClientSession() as session:
        url = f"https://rss.applemarketingtools.com/api/v2/us/music/most-played/{limit}/{chart_type}.json"
        
        async with session.get(url, timeout=5) as response:
            response.raise_for_status()
            data = await response.json()
            
            results = []
            for entry in data['feed']['results']:
                results.append({
                    'position': len(results) + 1,
                    'name': entry['name'],
                    'artist': entry['artistName'],
                    'url': entry['url'],
                    'artwork': entry['artworkUrl100']
                })
            
            return results
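
The function above always requests the most-played feed, so the new-releases URL listed earlier is not reachable through it. The sketch below builds any of the three documented feed URLs; the helper name `apple_music_feed_url` is hypothetical, and only the feed/type combinations shown in this document are accepted.

```python
APPLE_RSS_BASE = 'https://rss.applemarketingtools.com/api/v2'

# Only the combinations listed under "RSS Feed URLs" are allowed here
KNOWN_FEEDS = {
    ('most-played', 'albums'),
    ('most-played', 'songs'),
    ('new-releases', 'albums'),
}


def apple_music_feed_url(feed: str, chart_type: str,
                         limit: int = 100, country: str = 'us') -> str:
    """Build an Apple RSS feed URL for a documented feed/type pair."""
    if (feed, chart_type) not in KNOWN_FEEDS:
        raise ValueError(f"unsupported feed: {feed}/{chart_type}")
    return f"{APPLE_RSS_BASE}/{country}/music/{feed}/{limit}/{chart_type}.json"
```

Rejecting unknown combinations up front avoids issuing requests for feeds that may not exist.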

MusicBrainz Mapping

Map Apple Music entry to MusicBrainz: Similar to Billboard mapping

Caching

Cache TTL: 6 hours

Cache key: apple:chart:{type}:{limit}

11. RabbitMQ

Overview

Type: Message queue

Purpose: Real-time search index updates

Technology: RabbitMQ 3.x

Protocol: AMQP 0.9.1

Configuration

RABBITMQ = {
    'host': 'rabbitmq',
    'port': 5672,
    'user': 'abc',
    'password': 'abc',
    'exchange': 'search.index',
    'artist_queue': 'search.index.artist',
    'album_queue': 'search.index.album'
}

Message Format

Artist update message:

{
  "entity_type": "artist",
  "mbid": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "action": "update",
  "timestamp": "2025-04-28T12:34:56Z"
}

Album update message:

{
  "entity_type": "release_group",
  "mbid": "1b022e01-4da6-387b-8658-8678046e4cef",
  "action": "update",
  "timestamp": "2025-04-28T12:34:56Z"
}
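
A consumer would typically validate these messages before acting on them. The sketch below parses a message body into a typed object; the `IndexMessage` class and the entity-to-queue mapping are assumptions inferred from the queue names in the configuration above.

```python
import json
from dataclasses import dataclass

# Assumed routing, inferred from the configured queue names
QUEUE_BY_ENTITY = {
    'artist': 'search.index.artist',
    'release_group': 'search.index.album',
}


@dataclass(frozen=True)
class IndexMessage:
    entity_type: str
    mbid: str
    action: str
    timestamp: str

    @property
    def queue(self) -> str:
        """Queue this message is expected to arrive on."""
        return QUEUE_BY_ENTITY[self.entity_type]


def parse_index_message(body: bytes) -> IndexMessage:
    """Parse and validate a search-index update message."""
    data = json.loads(body)
    message = IndexMessage(
        entity_type=data['entity_type'],
        mbid=data['mbid'],
        action=data['action'],
        timestamp=data['timestamp'],
    )
    if message.entity_type not in QUEUE_BY_ENTITY:
        raise ValueError(f"unsupported entity_type: {message.entity_type!r}")
    return message
```

A missing field raises `KeyError` and an unknown entity type raises `ValueError`; a consumer could reject such messages rather than requeue them.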

SIR (Search Index Rebuilder)

Purpose: Consume RabbitMQ messages and update Solr

Process:

Connect to RabbitMQ
Subscribe to queues
Consume messages
Query MusicBrainz DB for entity
Post update to Solr
Acknowledge message

Container: Separate service in docker-compose

Monitoring

Queue depth:

rabbitmqctl list_queues name messages

Consumer count:

rabbitmqctl list_consumers

12. Redis

Overview

Type: In-memory cache

Purpose: Ephemeral cache and rate limiting

Technology: Redis 6+

Memory: 512MB limit

Configuration

REDIS = {
    'url': 'redis://redis:6379/0',
    'namespace': 'lm3.7',
    'max_memory': '512mb',
    'eviction_policy': 'allkeys-lfu'
}

Use Cases

Hot cache: Frequently accessed metadata
Rate limiting: Request counting
Sentry deduplication: Error tracking
Invalidation locks: Distributed locking

Connection Pool

import aioredis

redis = await aioredis.create_redis_pool(
    config.REDIS['url'],
    minsize=5,
    maxsize=20,
    encoding='utf-8'
)

13. Sentry

Overview

Type: Error tracking

Purpose: Application monitoring

Technology: Sentry SaaS

Library: sentry-sdk 0.19.5

Configuration

import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

sentry_sdk.init(
    dsn=config.SENTRY_DSN,
    integrations=[FlaskIntegration()],
    release=f"lidarr-metadata@{__version__}",
    environment=config.ENVIRONMENT,
    traces_sample_rate=0.1
)

Redis-Based Rate Limiting

Purpose: Prevent alert fatigue

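import hashlib

class SentryRedisTtlProcessor:
    """Rate limit Sentry events using Redis"""
    
    def __init__(self, redis, ttl=3600):
        self.redis = redis
        self.ttl = ttl
    
    # Note: sentry-sdk calls before_send hooks synchronously, so this
    # coroutine has to be awaited by the caller rather than registered directly
    async def __call__(self, event, hint):
        # Sentry events store exceptions under exception.values;
        # the last entry is the most recently raised exception
        values = event.get('exception', {}).get('values') or [{}]
        exc = values[-1]
        
        # Generate error hash
        error_hash = hashlib.md5(
            f"{exc.get('type')}:{exc.get('value')}".encode()
        ).hexdigest()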
        
        key = f"lm3.7:sentry:{error_hash}"
        
        # Check if error seen recently
        if await self.redis.exists(key):
            return None  # Drop event
        
        # Mark error as seen
        await self.redis.setex(key, self.ttl, "1")
        
        return event

Release Tracking

Sentry releases: Tied to git commits

CI/CD integration:

sentry-cli releases new "lidarr-metadata@${GIT_SHA}"
sentry-cli releases set-commits "lidarr-metadata@${GIT_SHA}" --auto
sentry-cli releases finalize "lidarr-metadata@${GIT_SHA}"

14. Telegraf

Overview

Type: Metrics collection

Purpose: StatsD metrics aggregation

Technology: Telegraf (InfluxData)

Protocol: StatsD

Configuration

TELEGRAF = {
    'host': 'telegraf',
    'port': 8125,
    'prefix': 'lidarr.metadata'
}

StatsD Client

import statsd

stats = statsd.StatsClient(
    host=config.TELEGRAF['host'],
    port=config.TELEGRAF['port'],
    prefix=config.TELEGRAF['prefix']
)

Metrics

Request counters:

stats.incr('requests.artist')
stats.incr('requests.album')
stats.incr('requests.search')

Response times:

with stats.timer('response_time.artist'):
    artist = await get_artist(mbid)

Cache hits/misses:

stats.incr('cache.hit')
stats.incr('cache.miss')

Provider requests:

stats.incr('provider.fanart.request')
stats.incr('provider.wikipedia.request')
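
On the wire, each of these calls becomes a single StatsD line of the form `name:value|type` sent over UDP to Telegraf. The sketch below formats such lines by hand to show what the client emits; the helper names are hypothetical and this is not a replacement for the `statsd` client.

```python
def statsd_counter(prefix: str, name: str, value: int = 1) -> bytes:
    """Format a counter increment, e.g. b'prefix.name:1|c'."""
    return f"{prefix}.{name}:{value}|c".encode('ascii')


def statsd_timer(prefix: str, name: str, millis: float) -> bytes:
    """Format a timing sample in milliseconds, e.g. b'prefix.name:12.5|ms'."""
    return f"{prefix}.{name}:{millis}|ms".encode('ascii')
```

With the configured prefix, `stats.incr('requests.artist')` corresponds to a line for the metric `lidarr.metadata.requests.artist`.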

15. Cloudflare

Overview

Type: CDN and edge caching

Purpose: Global content delivery

Technology: Cloudflare CDN

API: Cloudflare REST API v4

Configuration

CLOUDFLARE = {
    'zone_id': 'your-zone-id',
    'api_token': 'your-api-token',
    'base_url': 'https://api.cloudflare.com/client/v4'
}

Cache Purge

Purge by URL:

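def chunks(items, size):
    """Yield successive slices of at most `size` items"""
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def purge_cloudflare_cache(urls):
    """Purge Cloudflare cache for URLs"""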
    async with aiohttp.ClientSession() as session:
        headers = {
            'Authorization': f"Bearer {config.CLOUDFLARE['api_token']}",
            'Content-Type': 'application/json'
        }
        
        # Batch URLs (max 30 per request)
        for batch in chunks(urls, 30):
            data = {'files': batch}
            url = f"{config.CLOUDFLARE['base_url']}/zones/{config.CLOUDFLARE['zone_id']}/purge_cache"
            
            async with session.post(url, headers=headers, json=data) as response:
                response.raise_for_status()

Purge all:

async def purge_all_cloudflare_cache():
    """Purge entire Cloudflare cache"""
    async with aiohttp.ClientSession() as session:
        headers = {
            'Authorization': f"Bearer {config.CLOUDFLARE['api_token']}",
            'Content-Type': 'application/json'
        }
        
        data = {'purge_everything': True}
        url = f"{config.CLOUDFLARE['base_url']}/zones/{config.CLOUDFLARE['zone_id']}/purge_cache"
        
        async with session.post(url, headers=headers, json=data) as response:
            response.raise_for_status()

Rate Limits

Cloudflare API: 1200 requests per 5 minutes

Batch purging: Max 30 URLs per request
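
Taken together, the two limits bound how many URLs can be purged per window: 1200 requests of 30 URLs each is 36,000 URLs per 5 minutes. The sketch below works through that arithmetic; the helper names are hypothetical, and it assumes purge calls are the only traffic counted against the API limit.

```python
import math

MAX_URLS_PER_PURGE = 30
MAX_REQUESTS_PER_WINDOW = 1200
WINDOW_SECONDS = 5 * 60


def purge_requests_needed(url_count: int) -> int:
    """Number of purge requests needed for url_count URLs."""
    return math.ceil(url_count / MAX_URLS_PER_PURGE)


def min_wait_seconds(url_count: int) -> int:
    """Lower bound on time spent waiting for new rate-limit windows."""
    requests = purge_requests_needed(url_count)
    if requests == 0:
        return 0
    # Every full window of requests before the last batch costs one window
    full_windows = (requests - 1) // MAX_REQUESTS_PER_WINDOW
    return full_windows * WINDOW_SECONDS
```

For example, 100 URLs need 4 requests, and anything beyond 36,000 URLs spills into a second window and therefore takes at least five minutes.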

Cache-Control Headers

Set by API:

response.headers['Cache-Control'] = 's-maxage=2592000, max-age=0'

Interpretation:

s-maxage=2592000: CDN caches for 30 days
max-age=0: Clients must revalidate
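
The directive values are in seconds, so 2592000 / 86400 = 30 days. A short parsing sketch makes the two directives easy to inspect in tests; `parse_cache_control` is a hypothetical helper that handles only simple comma-separated directives.

```python
def parse_cache_control(header: str) -> dict:
    """Parse a Cache-Control header into a dict of directives.

    Numeric values become ints; valueless directives map to True.
    """
    directives = {}
    for part in header.split(','):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition('=')
        name = name.strip().lower()
        if not sep:
            directives[name] = True
        else:
            value = value.strip().strip('"')
            directives[name] = int(value) if value.isdigit() else value
    return directives
```

Parsing the header set by the API gives `s-maxage` as 2592000 and `max-age` as 0, matching the interpretation above.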

Integration Summary

The 15 integrations provide comprehensive metadata aggregation:

Core data: MusicBrainz DB (direct SQL)

Search: Solr (real-time via RabbitMQ)

Images: Cover Art Archive, FanArt.tv, TheAudioDB

Biographies: Wikipedia (32 languages), TheAudioDB

Charts: Last.fm, Billboard, Apple Music, Spotify

Cross-platform: Spotify ID mapping

Infrastructure: Redis (cache), PostgreSQL (persistent cache), RabbitMQ (messaging)

Monitoring: Sentry (errors), Telegraf (metrics)

CDN: Cloudflare (edge caching)

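Each integration sits behind its own provider with a defined fallback: Cover Art Archive falls back to FanArt.tv or TheAudioDB for images, and Wikipedia falls back to TheAudioDB for biographies. A single upstream outage is therefore meant to degrade results rather than fail requests.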
