metadata-agregator/docs/research/lidarr-metadata-api/analysis/INTEGRATIONS.md
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

Lidarr Metadata API - External Integrations

Integration Overview

The Lidarr Metadata API integrates with 15 external systems to provide comprehensive music metadata aggregation:

| Integration | Type | Purpose | Authentication | Rate Limit |
|---|---|---|---|---|
| MusicBrainz DB | Database | Core metadata | Read-only user | N/A |
| Solr Search | Search Engine | Full-text search | None | N/A |
| Cover Art Archive | CDN | Album cover art | None | N/A |
| FanArt.tv | REST API | Artist/album images | API key | 7-day lag (free) |
| TheAudioDB | REST API | Metadata fallback | API key "1" | Unknown |
| Wikipedia | Web Scraping | Artist biographies | None | Polite crawling |
| Spotify | REST API | ID mapping, charts | OAuth | 429 handling |
| Last.fm | REST API | Charts | API key | Unknown |
| Billboard | Web Scraping | Charts | None | Polite crawling |
| Apple Music | RSS API | Charts | None | N/A |
| RabbitMQ | Message Queue | Search index updates | Basic auth | N/A |
| Redis | Cache | Ephemeral cache | None | N/A |
| Sentry | Error Tracking | Monitoring | DSN | Redis-based |
| Telegraf | Metrics | StatsD metrics | None | N/A |
| Cloudflare | CDN | Edge caching | API token | 1200 req/5min |

1. MusicBrainz Database

Overview

Type: Direct PostgreSQL database access

Purpose: Authoritative source for all music metadata

Container: ghcr.io/lidarr/mb-postgres:1.0.10

Access method: Read-only asyncpg connection

Configuration

MUSICBRAINZ_DB = {
    'host': 'musicbrainz',
    'port': 5432,
    'database': 'musicbrainz_db',
    'user': 'musicbrainz_ro',  # Read-only user
    'password': 'abc',
    'min_pool_size': 10,
    'max_pool_size': 50,
    'command_timeout': 30
}

Connection Pool

import asyncpg

pool = await asyncpg.create_pool(
    host=config.MUSICBRAINZ_DB['host'],
    port=config.MUSICBRAINZ_DB['port'],
    database=config.MUSICBRAINZ_DB['database'],
    user=config.MUSICBRAINZ_DB['user'],
    password=config.MUSICBRAINZ_DB['password'],
    min_size=config.MUSICBRAINZ_DB['min_pool_size'],
    max_size=config.MUSICBRAINZ_DB['max_pool_size'],
    command_timeout=config.MUSICBRAINZ_DB['command_timeout']
)

Replication Setup

Replication method: MusicBrainz replication packets

Update frequency: Hourly

Replication script: Custom script in container

Process:

  1. Check current replication sequence
  2. Download replication packets from MusicBrainz FTP
  3. Apply SQL changes
  4. Update replication control table
  5. Trigger search index updates

Monitoring replication lag:

SELECT
    current_replication_sequence,
    last_replication_date,
    NOW() - last_replication_date AS lag
FROM replication_control;
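
The lag value returned by this query can drive a simple health check. The following is a minimal sketch, assuming the hourly update frequency stated above; the two-hour threshold and the name `replication_is_stale` are assumptions for illustration, not part of the original service.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: two missed hourly cycles count as stale.
MAX_REPLICATION_LAG = timedelta(hours=2)


def replication_is_stale(last_replication_date, now=None, max_lag=MAX_REPLICATION_LAG):
    """Return True when the last applied packet is older than max_lag."""
    now = now or datetime.now(timezone.utc)
    return (now - last_replication_date) > max_lag
```

Passing `now` explicitly keeps the check deterministic in tests; in production the default uses the current UTC time.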

Database Size and Performance

Database size: 100GB+ (full MusicBrainz dataset)

Query performance:

  • Simple artist lookup: 50-100ms
  • Complex artist with releases: 100-500ms
  • Album with tracks: 200-1000ms
  • Change detection query: 500-2000ms

Optimization: Custom indices on last_updated columns

Security Considerations

Read-only access: User has SELECT-only permissions

Network isolation: Database accessible only within Docker network

Credentials: Hardcoded in the configuration (an insecure default that should be overridden for each deployment)
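
One way to avoid hardcoded credentials is to read them from the environment. The sketch below is illustrative only: the `MB_DB_*` variable names and the function `load_musicbrainz_db_config` are hypothetical, and the defaults simply mirror the configuration shown earlier.

```python
import os


def load_musicbrainz_db_config(env=os.environ):
    """Build the connection settings from environment variables.

    The MB_DB_* names are hypothetical. The password has no default,
    so a missing value fails fast instead of falling back to 'abc'.
    """
    return {
        'host': env.get('MB_DB_HOST', 'musicbrainz'),
        'port': int(env.get('MB_DB_PORT', '5432')),
        'database': env.get('MB_DB_NAME', 'musicbrainz_db'),
        'user': env.get('MB_DB_USER', 'musicbrainz_ro'),
        'password': env['MB_DB_PASSWORD'],  # KeyError if unset
        'min_pool_size': 10,
        'max_pool_size': 50,
        'command_timeout': 30,
    }
```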

2. Solr Search

Overview

Type: Apache Solr 8.x search engine

Purpose: Full-text search for artists and albums

Container: ghcr.io/lidarr/mb-solr:3.3.1.9

Cores: artist, release-group

Configuration

SOLR = {
    'url': 'http://solr:8983/solr',
    'artist_core': 'artist',
    'album_core': 'release-group',
    'timeout': 5,
    'rows': 10
}

Query Interface

HTTP client: aiohttp

Query format: JSON API

Example query:

import aiohttp

async def search_artist(query, limit=10):
    async with aiohttp.ClientSession() as session:
        params = {
            'q': query,
            'defType': 'dismax',
            'qf': 'artist^2 sortname alias',
            'mm': '1',
            'rows': limit,
            'wt': 'json'
        }
        
        async with session.get(
            f"{config.SOLR['url']}/{config.SOLR['artist_core']}/select",
            params=params,
            timeout=aiohttp.ClientTimeout(total=config.SOLR['timeout'])
        ) as response:
            data = await response.json()
            return data['response']['docs']

Real-Time Index Updates

Update mechanism: RabbitMQ + SIR (Search Index Rebuilder)

Process:

  1. MusicBrainz database changes trigger RabbitMQ messages
  2. SIR consumes messages from queue
  3. SIR queries MusicBrainz DB for updated entity
  4. SIR posts update to Solr core
  5. Solr performs soft commit (1 second)

Update latency: 1-5 seconds from database change

Index Maintenance

Full reindex: Required after schema changes

Reindex process:

# Stop SIR
docker-compose stop indexer

# Clear Solr cores
curl "http://solr:8983/solr/artist/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
curl "http://solr:8983/solr/release-group/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

# Rebuild indices
docker-compose run indexer rebuild-artist
docker-compose run indexer rebuild-album

# Restart SIR
docker-compose start indexer

Reindex duration: 4-8 hours for full MusicBrainz dataset

Performance Tuning

JVM heap size: 2GB

Solr cache settings:

<filterCache size="512" initialSize="512" autowarmCount="256"/>
<queryResultCache size="512" initialSize="512" autowarmCount="256"/>
<documentCache size="512" initialSize="512" autowarmCount="0"/>

Commit settings:

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

3. Cover Art Archive

Overview

Type: Image CDN

Purpose: Album cover art images

Base URL: https://coverartarchive.org

Proxy: https://imagecache.lidarr.audio

Image URL Format

Direct URL:

https://coverartarchive.org/release/{release-mbid}/front-500.jpg

Proxied URL:

https://imagecache.lidarr.audio/cover/{release-mbid}/front.jpg

Image Types

| Type | Description | Typical Size |
|---|---|---|
| front | Front cover | 500x500 - 1200x1200 |
| back | Back cover | 500x500 - 1200x1200 |
| booklet | Booklet pages | Variable |
| medium | Disc/vinyl image | 500x500 |
| tray | CD tray card | Variable |
| obi | Japanese obi strip | Variable |
| spine | Spine image | Variable |
| track | Track listing | Variable |
| liner | Liner notes | Variable |
| sticker | Sticker image | Variable |
| poster | Poster image | Variable |

Image Proxy Benefits

Advantages of using imagecache.lidarr.audio:

  1. Caching: Images cached at edge for faster delivery
  2. Resizing: Automatic image resizing via query parameters
  3. Format conversion: WebP conversion for modern browsers
  4. Bandwidth: Reduced load on Cover Art Archive
  5. Reliability: Fallback to direct URL if proxy fails

Proxy query parameters:

https://imagecache.lidarr.audio/cover/{mbid}/front.jpg?size=500&format=webp
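
A small helper can assemble such URLs. This is a sketch that assumes only the `size` and `format` parameters shown above; the function name `build_proxy_cover_url` is hypothetical.

```python
from urllib.parse import urlencode

IMAGE_PROXY_BASE = "https://imagecache.lidarr.audio"


def build_proxy_cover_url(release_mbid, size=None, image_format=None):
    """Build a proxied front-cover URL, adding only the parameters given."""
    url = f"{IMAGE_PROXY_BASE}/cover/{release_mbid}/front.jpg"
    params = {}
    if size is not None:
        params['size'] = size
    if image_format is not None:
        params['format'] = image_format
    return f"{url}?{urlencode(params)}" if params else url
```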

Integration Code

async def get_cover_art(release_mbid):
    """Fetch cover art URLs for release"""
    images = []
    
    # Try proxy first
    proxy_url = f"https://imagecache.lidarr.audio/cover/{release_mbid}/front.jpg"
    if await check_url_exists(proxy_url):
        images.append({
            'Url': proxy_url,
            'CoverType': 'cover',
            'Extension': '.jpg'
        })
    else:
        # Fallback to direct URL
        direct_url = f"https://coverartarchive.org/release/{release_mbid}/front-500.jpg"
        if await check_url_exists(direct_url):
            images.append({
                'Url': direct_url,
                'CoverType': 'cover',
                'Extension': '.jpg'
            })
    
    return images

Error Handling

404 Not Found: No cover art available for release

503 Service Unavailable: Cover Art Archive temporarily down

Fallback: Use FanArt.tv or TheAudioDB images
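
The fallback rule can be expressed as an ordered provider chain. The sketch below is a minimal illustration, assuming each provider is an async callable that takes an MBID and returns a list of image dicts; `first_available_images` is a hypothetical name.

```python
import asyncio


async def first_available_images(release_mbid, providers):
    """Return images from the first provider that yields any.

    `providers` is an ordered list of async callables. A provider that
    raises (for example on a 503) is skipped and the next one is tried.
    """
    for provider in providers:
        try:
            images = await provider(release_mbid)
        except Exception:
            continue
        if images:
            return images
    return []
```

Ordering the list as Cover Art Archive, FanArt.tv, TheAudioDB would reproduce the fallback described above.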

4. FanArt.tv

Overview

Type: REST API

Purpose: High-quality artist and album images

Base URL: https://webservice.fanart.tv/v3

Authentication: API key

Configuration

FANART = {
    'api_key': 'your-api-key-here',
    'base_url': 'https://webservice.fanart.tv/v3',
    'timeout': 10,
    'cache_ttl': 2592000  # 30 days
}

API Key Types

| Key Type | Cost | Rate Limit | Image Lag |
|---|---|---|---|
| Free | Free | Unknown | 7 days |
| Personal | $2/month | Higher | No lag |
| Commercial | $5/month | Highest | No lag |

7-day lag: Free API keys only return images added 7+ days ago

Endpoints

Artist Images

Endpoint: GET /music/{mbid}

Request:

async def get_fanart_artist_images(mbid):
    async with aiohttp.ClientSession() as session:
        headers = {'api-key': config.FANART['api_key']}
        url = f"{config.FANART['base_url']}/music/{mbid}"
        
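        # Use the configured timeout instead of repeating the literal
        timeout = aiohttp.ClientTimeout(total=config.FANART['timeout'])
        async with session.get(url, headers=headers, timeout=timeout) as response: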
            if response.status == 404:
                return []
            response.raise_for_status()
            return await response.json()

Response:

{
  "name": "Nirvana",
  "mbid_id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "artistbackground": [
    {
      "id": "12345",
      "url": "https://assets.fanart.tv/fanart/music/5b11f4ce-a62d-471e-81fc-a69a8278c7da/artistbackground/nirvana-1.jpg",
      "likes": "42"
    }
  ],
  "artistthumb": [...],
  "hdmusiclogo": [...],
  "musicbanner": [...],
  "musiclogo": [...]
}

Album Images

Endpoint: GET /music/albums/{mbid}

Response:

{
  "albums": {
    "1b022e01-4da6-387b-8658-8678046e4cef": {
      "albumcover": [
        {
          "id": "67890",
          "url": "https://assets.fanart.tv/fanart/music/1b022e01-4da6-387b-8658-8678046e4cef/albumcover/nevermind-1.jpg",
          "likes": "156"
        }
      ],
      "cdart": [...]
    }
  }
}

Image Types

Artist images:

  • artistbackground: Background images (1920x1080)
  • artistthumb: Artist thumbnails (1000x1000)
  • hdmusiclogo: HD logos (transparent PNG)
  • musicbanner: Banners (1000x185)
  • musiclogo: Standard logos (transparent PNG)

Album images:

  • albumcover: Album covers (1000x1000)
  • cdart: CD art (transparent PNG)

Mapping to Lidarr Image Types

FANART_TYPE_MAPPING = {
    'artistbackground': 'fanart',
    'artistthumb': 'poster',
    'hdmusiclogo': 'logo',
    'musicbanner': 'banner',
    'musiclogo': 'logo',
    'albumcover': 'cover',
    'cdart': 'disc'
}
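
To show how the mapping might be applied, the sketch below converts an artist payload of the shape shown earlier into Lidarr-style image dicts. Keeping the most-liked image per type is an assumption (the selection rule is not documented here), and `fanart_to_lidarr_images` is a hypothetical name.

```python
import os
from urllib.parse import urlparse

FANART_TYPE_MAPPING = {
    'artistbackground': 'fanart',
    'artistthumb': 'poster',
    'hdmusiclogo': 'logo',
    'musicbanner': 'banner',
    'musiclogo': 'logo',
    'albumcover': 'cover',
    'cdart': 'disc',
}


def fanart_to_lidarr_images(payload):
    """Convert a FanArt.tv payload into Url/CoverType/Extension dicts."""
    images = []
    for fanart_type, cover_type in FANART_TYPE_MAPPING.items():
        candidates = payload.get(fanart_type) or []
        if not candidates:
            continue
        # "likes" arrives as a string in the response, so convert it
        best = max(candidates, key=lambda img: int(img.get('likes', 0)))
        extension = os.path.splitext(urlparse(best['url']).path)[1]
        images.append({
            'Url': best['url'],
            'CoverType': cover_type,
            'Extension': extension,
        })
    return images
```

Because both `hdmusiclogo` and `musiclogo` map to `logo`, a payload containing both yields two `logo` entries; a caller may want to keep only the first.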

Error Handling

404 Not Found: No images available for artist/album

429 Too Many Requests: Rate limit exceeded (retry with backoff)

503 Service Unavailable: FanArt.tv temporarily down

Fallback: Use TheAudioDB or Cover Art Archive
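
Retrying with backoff on a 429 could look like the sketch below. It is a generic illustration rather than the service's actual retry code: `RateLimited`, `retry_with_backoff`, the attempt count, and the delays are all assumptions.

```python
import asyncio
import random


class RateLimited(Exception):
    """Raised by a caller-supplied fetch function on HTTP 429 (hypothetical)."""


async def retry_with_backoff(fetch, max_attempts=4, base_delay=0.5, sleep=asyncio.sleep):
    """Call `fetch()` and retry on RateLimited with exponential backoff.

    Delays grow as base_delay * 2**attempt plus up to 0.1 s of jitter.
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return await fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await sleep(delay)
```

The `sleep` parameter exists so tests can substitute a fake and avoid real waiting.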

Caching Strategy

Cache TTL: 30 days (images rarely change)

Cache key: fanart:artist:{mbid} or fanart:album:{mbid}

Invalidation: Manual only (images are immutable)
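
The key scheme and TTL can be illustrated with an in-memory stand-in for the real cache. This is a sketch only: `TtlCache` and `fanart_cache_key` are hypothetical names, and a production deployment would use the Redis or PostgreSQL caches described elsewhere in this document.

```python
import time


class TtlCache:
    """In-memory stand-in for the real cache, for illustration only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return value

    def set(self, key, value, ttl):
        """Store a value that expires `ttl` seconds from now."""
        self._entries[key] = (value, self._clock() + ttl)


def fanart_cache_key(kind, mbid):
    """Build 'fanart:artist:{mbid}' or 'fanart:album:{mbid}'."""
    if kind not in ('artist', 'album'):
        raise ValueError(f"unsupported kind: {kind!r}")
    return f"fanart:{kind}:{mbid}"
```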

5. TheAudioDB

Overview

Type: REST API

Purpose: Fallback metadata and images

Base URL: https://theaudiodb.com/api/v1/json

Authentication: API key "1" (public key)

Configuration

THEAUDIODB = {
    'api_key': '1',
    'base_url': 'https://theaudiodb.com/api/v1/json',
    'timeout': 10,
    'cache_ttl': 2592000  # 30 days
}

Endpoints

Artist by MusicBrainz ID

Endpoint: GET /1/artist-mb.php?i={mbid}

Request:

async def get_theaudiodb_artist(mbid):
    async with aiohttp.ClientSession() as session:
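        # Build the path from the configured key instead of a literal "1"
        url = f"{config.THEAUDIODB['base_url']}/{config.THEAUDIODB['api_key']}/artist-mb.php"
        params = {'i': mbid}
        timeout = aiohttp.ClientTimeout(total=config.THEAUDIODB['timeout'])
        
        async with session.get(url, params=params, timeout=timeout) as response:
            if response.status == 404:
                return None
            response.raise_for_status()
            data = await response.json()
            # "artists" may be null or absent when the MBID is unknown
            artists = data.get('artists')
            return artists[0] if artists else None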

Response:

{
  "artists": [
    {
      "idArtist": "111247",
      "strArtist": "Nirvana",
      "strArtistAlternate": "",
      "strLabel": "DGC Records",
      "idLabel": "45114",
      "intFormedYear": "1987",
      "intBornYear": "",
      "intDiedYear": "",
      "strDisbanded": "1994",
      "strStyle": "Grunge",
      "strGenre": "Rock",
      "strMood": "Angry",
      "strWebsite": "www.nirvana.com",
      "strFacebook": "www.facebook.com/Nirvana",
      "strTwitter": "twitter.com/nirvana",
      "strBiographyEN": "Nirvana was an American rock band...",
      "strBiographyDE": null,
      "strBiographyFR": null,
      "strGender": "Male",
      "strCountry": "United States",
      "strCountryCode": "US",
      "strArtistThumb": "https://www.theaudiodb.com/images/media/artist/thumb/uxrqxy1347913147.jpg",
      "strArtistLogo": "https://www.theaudiodb.com/images/media/artist/logo/urspuv1434553994.png",
      "strArtistFanart": "https://www.theaudiodb.com/images/media/artist/fanart/spvryu1347980801.jpg",
      "strArtistBanner": "https://www.theaudiodb.com/images/media/artist/banner/xuypqw1342640163.jpg",
      "strMusicBrainzID": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
      "strLastFMChart": "https://www.last.fm/music/Nirvana",
      "intCharted": "5",
      "strLocked": "unlocked"
    }
  ]
}

Album by MusicBrainz ID

Endpoint: GET /1/album-mb.php?i={mbid}

Response: Similar structure with album-specific fields

Data Extraction

Biography:

def extract_biography(artist_data):
    """Extract biography with language fallback"""
    languages = ['EN', 'DE', 'FR', 'ES', 'IT', 'JP']
    for lang in languages:
        bio = artist_data.get(f'strBiography{lang}')
        if bio:
            return bio
    return None

Images:

def extract_images(artist_data):
    """Extract image URLs"""
    images = []
    
    if artist_data.get('strArtistThumb'):
        images.append({
            'Url': artist_data['strArtistThumb'],
            'CoverType': 'poster',
            'Extension': '.jpg'
        })
    
    if artist_data.get('strArtistLogo'):
        images.append({
            'Url': artist_data['strArtistLogo'],
            'CoverType': 'logo',
            'Extension': '.png'
        })
    
    if artist_data.get('strArtistFanart'):
        images.append({
            'Url': artist_data['strArtistFanart'],
            'CoverType': 'fanart',
            'Extension': '.jpg'
        })
    
    if artist_data.get('strArtistBanner'):
        images.append({
            'Url': artist_data['strArtistBanner'],
            'CoverType': 'banner',
            'Extension': '.jpg'
        })
    
    return images

Links:

def extract_links(artist_data):
    """Extract social media links"""
    links = []
    
    if artist_data.get('strWebsite'):
        links.append({
            'Url': f"http://{artist_data['strWebsite']}",
            'Name': 'website'
        })
    
    if artist_data.get('strFacebook'):
        links.append({
            'Url': f"https://{artist_data['strFacebook']}",
            'Name': 'facebook'
        })
    
    if artist_data.get('strTwitter'):
        links.append({
            'Url': f"https://{artist_data['strTwitter']}",
            'Name': 'twitter'
        })
    
    return links

Error Handling

404 Not Found: Artist/album not in TheAudioDB

Timeout: 10-second timeout, fallback to other providers

Invalid JSON: Graceful degradation

6. Wikipedia

Overview

Type: Web scraping

Purpose: Artist biographical information

Base URL: https://{lang}.wikipedia.org

Authentication: None (public access)

Configuration

WIKIPEDIA = {
    'timeout': 2,
    'max_connections_per_host': 1,
    'user_agent': 'LidarrMetadataAPI/10.0.0 (https://github.com/Lidarr/LidarrAPI.Metadata)',
    'languages': ['en', 'fr', 'de', 'es', 'it', 'ja', 'zh', 'ru', 'pt', 'nl', 'sv', 'fi', 'no', 'da', 'pl', 'cs', 'hu', 'ro', 'tr', 'el', 'he', 'ar', 'fa', 'hi', 'th', 'ko', 'vi', 'id', 'ms', 'tl', 'bn', 'ta']
}

Lookup Process

Multi-stage lookup:

  1. MusicBrainz → Wikidata: Extract Wikidata ID from MusicBrainz links
  2. Wikidata → Wikipedia: Get Wikipedia article title from Wikidata
  3. Wikipedia → Extract: Scrape and parse Wikipedia article

Wikidata Integration

Wikidata entity URL: https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json

Extract Wikipedia links:

async def get_wikipedia_title_from_wikidata(wikidata_id, language='en'):
    """Get Wikipedia article title from Wikidata entity"""
    async with aiohttp.ClientSession() as session:
        url = f"https://www.wikidata.org/wiki/Special:EntityData/{wikidata_id}.json"
        
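        # Use the configured timeout instead of repeating the literal
        timeout = aiohttp.ClientTimeout(total=config.WIKIPEDIA['timeout'])
        async with session.get(url, timeout=timeout) as response: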
            data = await response.json()
            entity = data['entities'][wikidata_id]
            
            # Get Wikipedia link for language
            sitelinks = entity.get('sitelinks', {})
            wiki_key = f'{language}wiki'
            
            if wiki_key in sitelinks:
                return sitelinks[wiki_key]['title']
            
            return None

Wikipedia Article Extraction

Fetch article HTML:

async def get_wikipedia_article(title, language='en'):
    """Fetch Wikipedia article HTML"""
    async with aiohttp.ClientSession() as session:
        url = f"https://{language}.wikipedia.org/wiki/{title}"
        headers = {'User-Agent': config.WIKIPEDIA['user_agent']}
        
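        # Use the configured timeout instead of repeating the literal
        timeout = aiohttp.ClientTimeout(total=config.WIKIPEDIA['timeout'])
        async with session.get(url, headers=headers, timeout=timeout) as response: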
            if response.status == 404:
                return None
            response.raise_for_status()
            return await response.text()

Parse and extract summary:

from bs4 import BeautifulSoup

def extract_wikipedia_summary(html):
    """Extract first paragraph as summary"""
    soup = BeautifulSoup(html, 'lxml')
    
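    # Find main content div
    content = soup.find('div', {'id': 'mw-content-text'})
    if not content:
        return None
    
    # On current Wikipedia markup, paragraphs are direct children of the
    # nested mw-parser-output div rather than of mw-content-text itself
    body = content.find('div', class_='mw-parser-output') or content
    
    # Find first paragraph (skip disambiguation notices)
    for p in body.find_all('p', recursive=False):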
        text = p.get_text().strip()
        
        # Skip empty paragraphs
        if not text:
            continue
        
        # Skip coordinate-only paragraphs
        if text.startswith('Coordinates:'):
            continue
        
        # Return first substantial paragraph
        if len(text) > 50:
            return text
    
    return None

Language Fallback

32-language fallback chain:

async def get_artist_overview(mbid):
    """Get artist overview with language fallback"""
    # Get Wikidata ID from MusicBrainz
    wikidata_id = await get_wikidata_id_from_musicbrainz(mbid)
    if not wikidata_id:
        return None
    
    # Try each language in order
    for language in config.WIKIPEDIA['languages']:
        try:
            # Get Wikipedia title for language
            title = await get_wikipedia_title_from_wikidata(wikidata_id, language)
            if not title:
                continue
            
            # Fetch and parse article
            html = await get_wikipedia_article(title, language)
            if not html:
                continue
            
            summary = extract_wikipedia_summary(html)
            if summary:
                return summary
        
        except Exception as e:
            logger.debug(f"Wikipedia lookup failed for {language}: {e}")
            continue
    
    return None
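
The loop above requests the Wikidata entity once per language, although a single entity document already lists every sitelink. A possible refinement, sketched here under that assumption, resolves the language order from one `sitelinks` mapping; `pick_wikipedia_sitelink` is a hypothetical helper.

```python
def pick_wikipedia_sitelink(sitelinks, languages):
    """Return (language, title) for the first preferred language present.

    `sitelinks` is the sitelinks mapping of a Wikidata entity, keyed as
    '<language>wiki' (for example 'enwiki'), as in the lookup code above.
    """
    for language in languages:
        link = sitelinks.get(f'{language}wiki')
        if link and link.get('title'):
            return language, link['title']
    return None
```

With this helper, the Wikidata request happens once, and only the Wikipedia article fetch repeats if a language yields no usable summary.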

Rate Limiting

Polite crawling:

  • 1 connection per host maximum
  • 2-second timeout per request
  • User-Agent header identifies bot
  • Respect robots.txt (manual check)

No explicit rate limit: Wikipedia allows reasonable bot traffic

Error Handling

404 Not Found: Article doesn't exist in language

Timeout: 2-second timeout, try next language

Parse errors: Graceful degradation, try next language

Fallback: Use TheAudioDB biography if Wikipedia unavailable

7. Spotify

Overview

Type: REST API with OAuth

Purpose: ID mapping and cross-platform linking

Base URL: https://api.spotify.com/v1

Authentication: OAuth 2.0 Client Credentials

Library: spotipy 2.16.1

Configuration

SPOTIFY = {
    'client_id': 'your-client-id',
    'client_secret': 'your-client-secret',
    'redirect_uri': 'http://localhost:5001/spotify/callback',
    'timeout': 5
}

OAuth Flow

Client Credentials Grant (for server-to-server):

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

auth_manager = SpotifyClientCredentials(
    client_id=config.SPOTIFY['client_id'],
    client_secret=config.SPOTIFY['client_secret']
)

spotify = spotipy.Spotify(auth_manager=auth_manager)

Token caching: Tokens cached in Redis with automatic refresh

ID Mapping

MusicBrainz → Spotify:

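import asyncio

async def map_musicbrainz_to_spotify(mbid, artist_name):
    """Map MusicBrainz ID to Spotify ID"""
    # Search Spotify by artist name. spotipy is synchronous, so the call
    # runs in a worker thread to keep the event loop free.
    loop = asyncio.get_running_loop()
    results = await loop.run_in_executor(
        None,
        lambda: spotify.search(q=f'artist:{artist_name}', type='artist', limit=10)
    )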
    
    if not results['artists']['items']:
        return None
    
    # Find best match using Levenshtein distance
    best_match = None
    best_score = 0
    
    for artist in results['artists']['items']:
        score = levenshtein_similarity(artist_name, artist['name'])
        if score > best_score and score >= 0.8:
            best_score = score
            best_match = artist
    
    return best_match['id'] if best_match else None

Levenshtein similarity:

from Levenshtein import ratio

def levenshtein_similarity(s1, s2):
    """Calculate Levenshtein similarity (0-1)"""
    return ratio(s1.lower(), s2.lower())

Threshold: 0.8 minimum similarity for match
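
The effect of the 0.8 threshold can be tried without the `Levenshtein` package. The sketch below swaps in the standard library's `difflib.SequenceMatcher`, which is a different similarity measure and will not give identical scores; `name_similarity` and `best_name_match` are hypothetical names.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive similarity in [0, 1] (difflib stand-in, not Levenshtein)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_name_match(target, candidates, threshold=0.8):
    """Return the candidate most similar to `target`, or None below threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = name_similarity(target, candidate)
        if score >= threshold and score > best_score:
            best, best_score = candidate, score
    return best
```

Under this measure "Nirvana" against "Nirvana UK" scores about 0.82, so a near-duplicate name passes the threshold whenever no exact match is present. Name-only matching at 0.8 may therefore need a secondary check.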

Spotify API Endpoints

Get artist:

artist = spotify.artist('6olE6TJLqED3rqDCT0FyPh')

Get album:

album = spotify.album('2guirTSEqLizK7j9i1MTTZ')

Search:

results = spotify.search(q='nirvana', type='artist', limit=10)

Error Handling

429 Too Many Requests: Retry with exponential backoff

401 Unauthorized: Refresh OAuth token

404 Not Found: Artist/album not on Spotify

Timeout: 5-second timeout, graceful degradation

Caching Strategy

Cache TTL: 90 days (Spotify IDs rarely change)

Cache key: spotify:artist:{spotify_id} or spotify:mbid:{mbid}

8. Last.fm

Overview

Type: REST API

Purpose: Music charts and scrobble data

Base URL: https://ws.audioscrobbler.com/2.0

Authentication: API key

Library: pylast 4.3.0

Configuration

LASTFM = {
    'api_key': 'your-api-key',
    'api_secret': 'your-api-secret',
    'timeout': 5
}

pylast Integration

import pylast

network = pylast.LastFMNetwork(
    api_key=config.LASTFM['api_key'],
    api_secret=config.LASTFM['api_secret']
)

Chart Endpoints

Top artists:

def get_lastfm_top_artists(limit=50):
    """Get Last.fm top artists chart"""
    top_artists = network.get_top_artists(limit=limit)
    
    results = []
    for artist in top_artists:
        results.append({
            'name': artist.item.name,
            'playcount': artist.weight,
            'listeners': artist.item.get_listener_count()
        })
    
    return results

Top albums:

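def get_lastfm_top_albums(limit=50):
    """Get Last.fm top albums chart"""
    # Assumes a network-level get_top_albums() is available; confirm this
    # against the installed pylast version, since the Last.fm chart API
    # documents only top artists, tags, and tracks.
    top_albums = network.get_top_albums(limit=limit)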
    
    results = []
    for album in top_albums:
        results.append({
            'name': album.item.title,
            'artist': album.item.artist.name,
            'playcount': album.weight
        })
    
    return results

Top tracks:

def get_lastfm_top_tracks(limit=50):
    """Get Last.fm top tracks chart"""
    top_tracks = network.get_top_tracks(limit=limit)
    
    results = []
    for track in top_tracks:
        results.append({
            'name': track.item.title,
            'artist': track.item.artist.name,
            'playcount': track.weight
        })
    
    return results

MusicBrainz Mapping

Map Last.fm artist to MusicBrainz:

async def map_lastfm_to_musicbrainz(lastfm_artist_name):
    """Map Last.fm artist to MusicBrainz ID"""
    # Search MusicBrainz via Solr
    results = await search_artist(lastfm_artist_name, limit=5)
    
    if not results:
        return None
    
    # Return best match (first result)
    return results[0]['Id']

Caching

Cache TTL: 6 hours (charts update daily)

Cache key: lastfm:chart:{type}:{limit}

9. Billboard

Overview

Type: Web scraping

Purpose: Billboard music charts

Base URL: https://www.billboard.com/charts

Authentication: None

Library: billboard-py 7.0.0

billboard-py Integration

import billboard

def get_billboard_hot_100():
    """Get Billboard Hot 100 chart"""
    chart = billboard.ChartData('hot-100')
    
    results = []
    for entry in chart:
        results.append({
            'position': entry.rank,
            'title': entry.title,
            'artist': entry.artist,
            'last_position': entry.lastPos,
            'peak_position': entry.peakPos,
            'weeks_on_chart': entry.weeks
        })
    
    return results

Supported Charts

| Chart Name | billboard-py ID | Type |
|---|---|---|
| Hot 100 | hot-100 | Tracks |
| Billboard 200 | billboard-200 | Albums |
| Artist 100 | artist-100 | Artists |
| Streaming Songs | streaming-songs | Tracks |
| Radio Songs | radio-songs | Tracks |
| Digital Song Sales | digital-song-sales | Tracks |

MusicBrainz Mapping

Map Billboard entry to MusicBrainz:

async def map_billboard_to_musicbrainz(artist_name, track_title=None):
    """Map Billboard entry to MusicBrainz"""
    # Search artist
    artist_results = await search_artist(artist_name, limit=5)
    if not artist_results:
        return None
    
    artist_mbid = artist_results[0]['Id']
    
    # If track title provided, search for recording
    if track_title:
        # Search would require recording search (not implemented)
        pass
    
    return artist_mbid

Error Handling

HTTP errors: Retry with backoff

Parse errors: Graceful degradation

Rate limiting: Polite crawling (1 request per second)
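
The one-request-per-second policy can be enforced with a small limiter. The sketch below is one possible approach, not the code the service uses; `MinIntervalLimiter` is a hypothetical name, and the injectable `clock` and `sleep` exist only to make it testable.

```python
import asyncio
import time


class MinIntervalLimiter:
    """Space calls at least `interval` seconds apart (1 second by default)."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=asyncio.sleep):
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    async def wait(self):
        """Suspend until this caller's request slot arrives."""
        now = self._clock()
        slot = max(now, self._next_allowed)
        # Reserve the slot before sleeping, so concurrent callers on the
        # same event loop each receive a distinct slot.
        self._next_allowed = slot + self._interval
        if slot > now:
            await self._sleep(slot - now)
```

A scraper would call `await limiter.wait()` immediately before each chart request.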

Caching

Cache TTL: 6 hours (charts update weekly)

Cache key: billboard:chart:{chart_name}

10. Apple Music / iTunes

Overview

Type: RSS API

Purpose: Apple Music and iTunes charts

Base URL: https://rss.applemarketingtools.com/api/v2

Authentication: None

RSS Feed URLs

Top albums:

https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/albums.json

Top songs:

https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/songs.json

New releases:

https://rss.applemarketingtools.com/api/v2/us/music/new-releases/100/albums.json

Fetch and Parse

async def get_apple_music_chart(chart_type, limit=100):
    """Fetch Apple Music chart"""
    async with aiohttp.ClientSession() as session:
        url = f"https://rss.applemarketingtools.com/api/v2/us/music/most-played/{limit}/{chart_type}.json"
        
        async with session.get(url, timeout=5) as response:
            response.raise_for_status()
            data = await response.json()
            
            results = []
            for entry in data['feed']['results']:
                results.append({
                    'position': len(results) + 1,
                    'name': entry['name'],
                    'artist': entry['artistName'],
                    'url': entry['url'],
                    'artwork': entry['artworkUrl100']
                })
            
            return results

MusicBrainz Mapping

Map Apple Music entry to MusicBrainz: Similar to Billboard mapping

Caching

Cache TTL: 6 hours

Cache key: apple:chart:{type}:{limit}

11. RabbitMQ

Overview

Type: Message queue

Purpose: Real-time search index updates

Technology: RabbitMQ 3.x

Protocol: AMQP 0.9.1

Configuration

RABBITMQ = {
    'host': 'rabbitmq',
    'port': 5672,
    'user': 'abc',
    'password': 'abc',
    'exchange': 'search.index',
    'artist_queue': 'search.index.artist',
    'album_queue': 'search.index.album'
}

Message Format

Artist update message:

{
  "entity_type": "artist",
  "mbid": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "action": "update",
  "timestamp": "2025-04-28T12:34:56Z"
}

Album update message:

{
  "entity_type": "release_group",
  "mbid": "1b022e01-4da6-387b-8658-8678046e4cef",
  "action": "update",
  "timestamp": "2025-04-28T12:34:56Z"
}

SIR (Search Index Rebuilder)

Purpose: Consume RabbitMQ messages and update Solr

Process:

  1. Connect to RabbitMQ
  2. Subscribe to queues
  3. Consume messages
  4. Query MusicBrainz DB for entity
  5. Post update to Solr
  6. Acknowledge message

Container: Separate service in docker-compose
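
The parsing and routing part of this process (steps 3 to 5) can be sketched as a pure function. This is an illustration based on the message format shown above, not SIR's actual code; `route_index_message` and `ENTITY_CORES` are hypothetical names.

```python
import json

# Entity type in the message -> Solr core, per the cores listed for Solr
ENTITY_CORES = {
    'artist': 'artist',
    'release_group': 'release-group',
}


def route_index_message(body):
    """Parse a queue message and return (solr_core, mbid, action).

    Raises ValueError for malformed JSON, missing fields, or an unknown
    entity type, so a consumer can reject the message instead of
    acknowledging it.
    """
    try:
        message = json.loads(body)
        entity_type = message['entity_type']
        mbid = message['mbid']
        action = message['action']
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise ValueError(f"malformed index message: {exc}") from exc
    if entity_type not in ENTITY_CORES:
        raise ValueError(f"unknown entity type: {entity_type!r}")
    return ENTITY_CORES[entity_type], mbid, action
```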

Monitoring

Queue depth:

rabbitmqctl list_queues name messages

Consumer count:

rabbitmqctl list_consumers

12. Redis

Overview

Type: In-memory cache

Purpose: Ephemeral cache and rate limiting

Technology: Redis 6+

Memory: 512MB limit

Configuration

REDIS = {
    'url': 'redis://redis:6379/0',
    'namespace': 'lm3.7',
    'max_memory': '512mb',
    'eviction_policy': 'allkeys-lfu'
}

Use Cases

  1. Hot cache: Frequently accessed metadata
  2. Rate limiting: Request counting
  3. Sentry deduplication: Error tracking
  4. Invalidation locks: Distributed locking

Connection Pool

import aioredis

redis = await aioredis.create_redis_pool(
    config.REDIS['url'],
    minsize=5,
    maxsize=20,
    encoding='utf-8'
)

13. Sentry

Overview

Type: Error tracking

Purpose: Application monitoring

Technology: Sentry SaaS

Library: sentry-sdk 0.19.5

Configuration

import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

sentry_sdk.init(
    dsn=config.SENTRY_DSN,
    integrations=[FlaskIntegration()],
    release=f"lidarr-metadata@{__version__}",
    environment=config.ENVIRONMENT,
    traces_sample_rate=0.1
)

Redis-Based Rate Limiting

Purpose: Prevent alert fatigue

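import hashlib

class SentryRedisTtlProcessor:
    """Rate limit Sentry events using Redis"""
    
    def __init__(self, redis, ttl=3600):
        # If registered as a before_send hook, this must be a synchronous
        # Redis client: sentry-sdk calls the hook without awaiting it
        self.redis = redis
        self.ttl = ttl
    
    def __call__(self, event, hint):
        # Exceptions are listed under event['exception']['values']
        values = event.get('exception', {}).get('values') or []
        if not values:
            return event  # Not an exception event: pass through
        
        # Generate error hash from the most recent exception in the chain
        exc = values[-1]
        error_hash = hashlib.md5(
            f"{exc.get('type')}:{exc.get('value')}".encode()
        ).hexdigest()
        
        key = f"lm3.7:sentry:{error_hash}"
        
        # Check if error seen recently
        if self.redis.exists(key):
            return None  # Drop event
        
        # Mark error as seen
        self.redis.setex(key, self.ttl, "1")
        
        return event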

Release Tracking

Sentry releases: Tied to git commits

CI/CD integration:

sentry-cli releases new "lidarr-metadata@${GIT_SHA}"
sentry-cli releases set-commits "lidarr-metadata@${GIT_SHA}" --auto
sentry-cli releases finalize "lidarr-metadata@${GIT_SHA}"

14. Telegraf

Overview

Type: Metrics collection

Purpose: StatsD metrics aggregation

Technology: Telegraf (InfluxData)

Protocol: StatsD

Configuration

TELEGRAF = {
    'host': 'telegraf',
    'port': 8125,
    'prefix': 'lidarr.metadata'
}

StatsD Client

import statsd

stats = statsd.StatsClient(
    host=config.TELEGRAF['host'],
    port=config.TELEGRAF['port'],
    prefix=config.TELEGRAF['prefix']
)

Metrics

Request counters:

stats.incr('requests.artist')
stats.incr('requests.album')
stats.incr('requests.search')

Response times:

with stats.timer('response_time.artist'):
    artist = await get_artist(mbid)

Cache hits/misses:

stats.incr('cache.hit')
stats.incr('cache.miss')

Provider requests:

stats.incr('provider.fanart.request')
stats.incr('provider.wikipedia.request')

15. Cloudflare

Overview

Type: CDN and edge caching

Purpose: Global content delivery

Technology: Cloudflare CDN

API: Cloudflare REST API v4

Configuration

CLOUDFLARE = {
    'zone_id': 'your-zone-id',
    'api_token': 'your-api-token',
    'base_url': 'https://api.cloudflare.com/client/v4'
}

Cache Purge

Purge by URL:

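def chunks(items, size):
    """Yield successive slices of at most `size` items"""
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def purge_cloudflare_cache(urls):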
    """Purge Cloudflare cache for URLs"""
    async with aiohttp.ClientSession() as session:
        headers = {
            'Authorization': f"Bearer {config.CLOUDFLARE['api_token']}",
            'Content-Type': 'application/json'
        }
        
        # Batch URLs (max 30 per request)
        for batch in chunks(urls, 30):
            data = {'files': batch}
            url = f"{config.CLOUDFLARE['base_url']}/zones/{config.CLOUDFLARE['zone_id']}/purge_cache"
            
            async with session.post(url, headers=headers, json=data) as response:
                response.raise_for_status()

Purge all:

async def purge_all_cloudflare_cache():
    """Purge entire Cloudflare cache"""
    async with aiohttp.ClientSession() as session:
        headers = {
            'Authorization': f"Bearer {config.CLOUDFLARE['api_token']}",
            'Content-Type': 'application/json'
        }
        
        data = {'purge_everything': True}
        url = f"{config.CLOUDFLARE['base_url']}/zones/{config.CLOUDFLARE['zone_id']}/purge_cache"
        
        async with session.post(url, headers=headers, json=data) as response:
            response.raise_for_status()

Rate Limits

Cloudflare API: 1200 requests per 5 minutes

Batch purging: Max 30 URLs per request
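
Taken together, the two limits bound how many URLs can be purged per window: 1200 requests of 30 URLs each gives 36,000 URLs per 5 minutes, assuming no other API calls share the budget. The sketch below works through that arithmetic; the function names are hypothetical.

```python
import math

MAX_URLS_PER_PURGE = 30         # batch limit stated above
API_REQUESTS_PER_WINDOW = 1200  # per 5-minute window, as stated above


def purge_requests_needed(url_count):
    """Number of purge API calls needed for `url_count` URLs."""
    return math.ceil(url_count / MAX_URLS_PER_PURGE)


def fits_in_one_window(url_count):
    """True if all batches fit in a single 5-minute rate-limit window."""
    return purge_requests_needed(url_count) <= API_REQUESTS_PER_WINDOW
```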

Cache-Control Headers

Set by API:

response.headers['Cache-Control'] = 's-maxage=2592000, max-age=0'

Interpretation:

  • s-maxage=2592000: CDN caches for 30 days
  • max-age=0: Clients must revalidate

Integration Summary

The 15 integrations provide comprehensive metadata aggregation:

Core data: MusicBrainz DB (direct SQL)

Search: Solr (real-time via RabbitMQ)

Images: Cover Art Archive, FanArt.tv, TheAudioDB

Biographies: Wikipedia (32 languages), TheAudioDB

Charts: Last.fm, Billboard, Apple Music, Spotify

Cross-platform: Spotify ID mapping

Infrastructure: Redis (cache), PostgreSQL (persistent cache), RabbitMQ (messaging)

Monitoring: Sentry (errors), Telegraf (metrics)

CDN: Cloudflare (edge caching)

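Each integration is confined to a single concern (core data, search, images, biographies, charts, infrastructure, monitoring, or CDN), and the image and biography providers are arranged in fallback chains so that one unavailable source does not leave a response empty.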