Lidarr Metadata API - External Integrations
Integration Overview
The Lidarr Metadata API integrates with 15 external systems to provide comprehensive music metadata aggregation:
| Integration | Type | Purpose | Authentication | Rate Limit |
|---|---|---|---|---|
| MusicBrainz DB | Database | Core metadata | Read-only user | N/A |
| Solr Search | Search Engine | Full-text search | None | N/A |
| Cover Art Archive | CDN | Album cover art | None | N/A |
| FanArt.tv | REST API | Artist/album images | API key | Unknown (free keys: 7-day image lag) |
| TheAudioDB | REST API | Metadata fallback | API key "1" | Unknown |
| Wikipedia | Web Scraping | Artist biographies | None | Polite crawling |
| Spotify | REST API | ID mapping, charts | OAuth | 429 handling |
| Last.fm | REST API | Charts | API key | Unknown |
| Billboard | Web Scraping | Charts | None | Polite crawling |
| Apple Music | RSS API | Charts | None | N/A |
| RabbitMQ | Message Queue | Search index updates | Basic auth | N/A |
| Redis | Cache | Ephemeral cache | None | N/A |
| Sentry | Error Tracking | Monitoring | DSN | Redis-based |
| Telegraf | Metrics | StatsD metrics | None | N/A |
| Cloudflare | CDN | Edge caching | API token | 1200 req/5min |
1. MusicBrainz Database
Overview
Type: Direct PostgreSQL database access
Purpose: Authoritative source for all music metadata
Container: ghcr.io/lidarr/mb-postgres:1.0.10
Access method: Read-only asyncpg connection
Configuration
MUSICBRAINZ_DB = {
'host': 'musicbrainz',
'port': 5432,
'database': 'musicbrainz_db',
'user': 'musicbrainz_ro', # Read-only user
'password': 'abc',
'min_pool_size': 10,
'max_pool_size': 50,
'command_timeout': 30
}
Connection Pool
import asyncpg
pool = await asyncpg.create_pool(
host=config.MUSICBRAINZ_DB['host'],
port=config.MUSICBRAINZ_DB['port'],
database=config.MUSICBRAINZ_DB['database'],
user=config.MUSICBRAINZ_DB['user'],
password=config.MUSICBRAINZ_DB['password'],
min_size=config.MUSICBRAINZ_DB['min_pool_size'],
max_size=config.MUSICBRAINZ_DB['max_pool_size'],
command_timeout=config.MUSICBRAINZ_DB['command_timeout']
)
Replication Setup
Replication method: MusicBrainz replication packets
Update frequency: Hourly
Replication script: Custom script in container
Process:
- Check current replication sequence
- Download replication packets from MusicBrainz FTP
- Apply SQL changes
- Update replication control table
- Trigger search index updates
Monitoring replication lag:
SELECT
current_replication_sequence,
last_replication_date,
NOW() - last_replication_date AS lag
FROM replication_control;
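The lag returned by this query can be compared against a threshold to decide when to alert. A minimal sketch, assuming the `lag` column arrives as a `datetime.timedelta` (the type asyncpg uses for PostgreSQL intervals) and that two missed hourly cycles count as stale; `replication_is_stale` and `MAX_LAG` are hypothetical names, not part of the service:

```python
from datetime import timedelta

# Assumption: replication runs hourly, so two missed cycles counts as stale.
MAX_LAG = timedelta(hours=2)


def replication_is_stale(lag, max_lag=MAX_LAG):
    """Return True when the replication lag exceeds the allowed threshold."""
    return lag > max_lag
```

The threshold is a judgment call; a tighter value gives earlier alerts at the cost of noise when a single replication packet is late.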
Database Size and Performance
Database size: 100GB+ (full MusicBrainz dataset)
Query performance:
- Simple artist lookup: 50-100ms
- Complex artist with releases: 100-500ms
- Album with tracks: 200-1000ms
- Change detection query: 500-2000ms
Optimization: Custom indices on last_updated columns
Security Considerations
Read-only access: User has SELECT-only permissions
Network isolation: Database accessible only within Docker network
Credentials: Hardcoded (insecure default, should be changed)
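One way to move away from the hardcoded default is to read the credentials from the environment. The sketch below is an illustration only: the `MB_DB_*` variable names and the `load_musicbrainz_db_config` helper are assumptions, not settings the project defines.

```python
import os


def load_musicbrainz_db_config(env=None):
    """Build the MUSICBRAINZ_DB dict, reading credentials from the environment."""
    env = os.environ if env is None else env
    return {
        'host': env.get('MB_DB_HOST', 'musicbrainz'),
        'port': int(env.get('MB_DB_PORT', '5432')),
        'database': env.get('MB_DB_NAME', 'musicbrainz_db'),
        'user': env.get('MB_DB_USER', 'musicbrainz_ro'),
        # No default on purpose: fail fast instead of shipping a hardcoded password.
        'password': env['MB_DB_PASSWORD'],
        'min_pool_size': 10,
        'max_pool_size': 50,
        'command_timeout': 30,
    }
```

Omitting a default for the password means a missing variable raises `KeyError` at startup rather than silently falling back to a known value.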
2. Solr Search
Overview
Type: Apache Solr 8.x search engine
Purpose: Full-text search for artists and albums
Container: ghcr.io/lidarr/mb-solr:3.3.1.9
Cores: artist, release-group
Configuration
SOLR = {
'url': 'http://solr:8983/solr',
'artist_core': 'artist',
'album_core': 'release-group',
'timeout': 5,
'rows': 10
}
Query Interface
HTTP client: aiohttp
Query format: JSON API
Example query:
import aiohttp

async def search_artist(query, limit=10):
    async with aiohttp.ClientSession() as session:
        params = {
            'q': query,
            'defType': 'dismax',
            'qf': 'artist^2 sortname alias',
            'mm': '1',
            'rows': limit,
            'wt': 'json'
        }
        async with session.get(
            f"{config.SOLR['url']}/{config.SOLR['artist_core']}/select",
            params=params,
            timeout=aiohttp.ClientTimeout(total=config.SOLR['timeout'])
        ) as response:
            data = await response.json()
            return data['response']['docs']
Real-Time Index Updates
Update mechanism: RabbitMQ + SIR (Search Index Rebuilder)
Process:
- MusicBrainz database changes trigger RabbitMQ messages
- SIR consumes messages from queue
- SIR queries MusicBrainz DB for updated entity
- SIR posts update to Solr core
- Solr performs soft commit (1 second)
Update latency: 1-5 seconds from database change
Index Maintenance
Full reindex: Required after schema changes
Reindex process:
# Stop SIR
docker-compose stop indexer
# Clear Solr cores
curl "http://solr:8983/solr/artist/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
curl "http://solr:8983/solr/release-group/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
# Rebuild indices
docker-compose run indexer rebuild-artist
docker-compose run indexer rebuild-album
# Restart SIR
docker-compose start indexer
Reindex duration: 4-8 hours for full MusicBrainz dataset
Performance Tuning
JVM heap size: 2GB
Solr cache settings:
<filterCache size="512" initialSize="512" autowarmCount="256"/>
<queryResultCache size="512" initialSize="512" autowarmCount="256"/>
<documentCache size="512" initialSize="512" autowarmCount="0"/>
Commit settings:
<autoCommit>
<maxTime>15000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>1000</maxTime>
</autoSoftCommit>
3. Cover Art Archive
Overview
Type: Image CDN
Purpose: Album cover art images
Base URL: https://coverartarchive.org
Proxy: https://imagecache.lidarr.audio
Image URL Format
Direct URL:
https://coverartarchive.org/release/{release-mbid}/front-500.jpg
Proxied URL:
https://imagecache.lidarr.audio/cover/{release-mbid}/front.jpg
Image Types
| Type | Description | Typical Size |
|---|---|---|
| front | Front cover | 500x500 - 1200x1200 |
| back | Back cover | 500x500 - 1200x1200 |
| booklet | Booklet pages | Variable |
| medium | Disc/vinyl image | 500x500 |
| tray | CD tray card | Variable |
| obi | Japanese obi strip | Variable |
| spine | Spine image | Variable |
| track | Track listing | Variable |
| liner | Liner notes | Variable |
| sticker | Sticker image | Variable |
| poster | Poster image | Variable |
Image Proxy Benefits
Advantages of using imagecache.lidarr.audio:
- Caching: Images cached at edge for faster delivery
- Resizing: Automatic image resizing via query parameters
- Format conversion: WebP conversion for modern browsers
- Bandwidth: Reduced load on Cover Art Archive
- Reliability: Fallback to direct URL if proxy fails
Proxy query parameters:
https://imagecache.lidarr.audio/cover/{mbid}/front.jpg?size=500&format=webp
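Building the proxied URL with its optional query parameters can be sketched as follows. This assumes `size` and `format` are the only parameters in use, as in the example URL; `proxied_cover_url` and `PROXY_BASE` are hypothetical names.

```python
from urllib.parse import urlencode

PROXY_BASE = "https://imagecache.lidarr.audio/cover"


def proxied_cover_url(release_mbid, size=None, image_format=None):
    """Build a proxied front-cover URL, adding only the parameters that are set."""
    url = f"{PROXY_BASE}/{release_mbid}/front.jpg"
    params = {}
    if size is not None:
        params['size'] = size
    if image_format is not None:
        params['format'] = image_format
    return f"{url}?{urlencode(params)}" if params else url
```

Leaving unset parameters out keeps the URL canonical, which helps cache hit rates at the edge.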
Integration Code
async def get_cover_art(release_mbid):
    """Fetch cover art URLs for release"""
    candidates = [
        # Try proxy first
        f"https://imagecache.lidarr.audio/cover/{release_mbid}/front.jpg",
        # Fallback to direct URL
        f"https://coverartarchive.org/release/{release_mbid}/front-500.jpg",
    ]
    for url in candidates:
        # check_url_exists is a helper that is not shown in this document
        if await check_url_exists(url):
            return [{
                'Url': url,
                'CoverType': 'cover',
                'Extension': '.jpg'
            }]
    return []
Error Handling
404 Not Found: No cover art available for release
503 Service Unavailable: Cover Art Archive temporarily down
Fallback: Use FanArt.tv or TheAudioDB images
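The fallback behaviour described here can be expressed as a small provider chain. The sketch below is an assumption about how such a chain might look, not the service's actual code: `first_non_empty` is a hypothetical helper, and each provider is assumed to be an async callable that returns a list of image dicts.

```python
import asyncio


async def first_non_empty(providers, *args):
    """Call async providers in order and return the first non-empty result."""
    for provider in providers:
        try:
            result = await provider(*args)
        except Exception:
            # A failing provider (404, 503, timeout) should not stop the chain.
            continue
        if result:
            return result
    return []
```

A call such as `await first_non_empty([get_cover_art, fanart_lookup, audiodb_lookup], mbid)` would then try each source in turn, where the last two names stand in for whatever provider functions exist.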
4. FanArt.tv
Overview
Type: REST API
Purpose: High-quality artist and album images
Base URL: https://webservice.fanart.tv/v3
Authentication: API key
Configuration
FANART = {
'api_key': 'your-api-key-here',
'base_url': 'https://webservice.fanart.tv/v3',
'timeout': 10,
'cache_ttl': 2592000 # 30 days
}
API Key Types
| Key Type | Cost | Rate Limit | Image Lag |
|---|---|---|---|
| Free | Free | Unknown | 7 days |
| Personal | $2/month | Higher | No lag |
| Commercial | $5/month | Highest | No lag |
7-day lag: Free API keys only return images added 7+ days ago
Endpoints
Artist Images
Endpoint: GET /music/{mbid}
Request:
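async def get_fanart_artist_images(mbid):
    """Fetch FanArt.tv images for an artist; returns [] when none exist"""
    headers = {'api-key': config.FANART['api_key']}
    url = f"{config.FANART['base_url']}/music/{mbid}"
    # Use the configured timeout rather than a hardcoded value
    timeout = aiohttp.ClientTimeout(total=config.FANART['timeout'])
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers, timeout=timeout) as response:
            if response.status == 404:
                return []
            response.raise_for_status()
            return await response.json()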
Response:
{
"name": "Nirvana",
"mbid_id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
"artistbackground": [
{
"id": "12345",
"url": "https://assets.fanart.tv/fanart/music/5b11f4ce-a62d-471e-81fc-a69a8278c7da/artistbackground/nirvana-1.jpg",
"likes": "42"
}
],
"artistthumb": [...],
"hdmusiclogo": [...],
"musicbanner": [...],
"musiclogo": [...]
}
Album Images
Endpoint: GET /music/albums/{mbid}
Response:
{
"albums": {
"1b022e01-4da6-387b-8658-8678046e4cef": {
"albumcover": [
{
"id": "67890",
"url": "https://assets.fanart.tv/fanart/music/1b022e01-4da6-387b-8658-8678046e4cef/albumcover/nevermind-1.jpg",
"likes": "156"
}
],
"cdart": [...]
}
}
}
Image Types
Artist images:
- artistbackground: Background images (1920x1080)
- artistthumb: Artist thumbnails (1000x1000)
- hdmusiclogo: HD logos (transparent PNG)
- musicbanner: Banners (1000x185)
- musiclogo: Standard logos (transparent PNG)
Album images:
- albumcover: Album covers (1000x1000)
- cdart: CD art (transparent PNG)
Mapping to Lidarr Image Types
FANART_TYPE_MAPPING = {
'artistbackground': 'fanart',
'artistthumb': 'poster',
'hdmusiclogo': 'logo',
'musicbanner': 'banner',
'musiclogo': 'logo',
'albumcover': 'cover',
'cdart': 'disc'
}
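Applying this mapping to an artist response might look like the sketch below. It assumes the response shape shown in the Artist Images example and the `Url`/`CoverType`/`Extension` image dict used elsewhere in this document; `fanart_to_lidarr_images` is a hypothetical helper name.

```python
import os
from urllib.parse import urlparse

FANART_TYPE_MAPPING = {
    'artistbackground': 'fanart',
    'artistthumb': 'poster',
    'hdmusiclogo': 'logo',
    'musicbanner': 'banner',
    'musiclogo': 'logo',
    'albumcover': 'cover',
    'cdart': 'disc',
}


def fanart_to_lidarr_images(payload):
    """Flatten a FanArt.tv artist response into Lidarr-style image dicts."""
    images = []
    for fanart_type, cover_type in FANART_TYPE_MAPPING.items():
        for item in payload.get(fanart_type, []):
            url = item['url']
            # Take the extension from the URL path; default to .jpg if absent.
            extension = os.path.splitext(urlparse(url).path)[1] or '.jpg'
            images.append({'Url': url, 'CoverType': cover_type, 'Extension': extension})
    return images
```

Keys in the payload that are not in the mapping (such as `name`) are ignored, so new FanArt.tv image types do not break the conversion.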
Error Handling
404 Not Found: No images available for artist/album
429 Too Many Requests: Rate limit exceeded (retry with backoff)
503 Service Unavailable: FanArt.tv temporarily down
Fallback: Use TheAudioDB or Cover Art Archive
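For the 429 case, "retry with backoff" usually means waiting exponentially longer between attempts. A minimal sketch of the delay schedule, with hypothetical defaults (`base`, `cap`, and `max_retries` are assumptions, not values taken from the service):

```python
import random


def backoff_delays(max_retries=5, base=1.0, cap=60.0, jitter=0.0):
    """Yield the wait in seconds before each retry: base * 2**attempt, capped."""
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        # Optional jitter spreads out retries from concurrent workers.
        yield delay + random.uniform(0, jitter)
```

A caller would sleep for each yielded delay before retrying and give up once the generator is exhausted, at which point the fallback providers take over.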
Caching Strategy
Cache TTL: 30 days (images rarely change)
Cache key: fanart:artist:{mbid} or fanart:album:{mbid}
Invalidation: Manual only (published image URLs are treated as immutable; entries otherwise expire with the TTL)
5. TheAudioDB
Overview
Type: REST API
Purpose: Fallback metadata and images
Base URL: https://theaudiodb.com/api/v1/json
Authentication: API key "1" (public key)
Configuration
THEAUDIODB = {
'api_key': '1',
'base_url': 'https://theaudiodb.com/api/v1/json',
'timeout': 10,
'cache_ttl': 2592000 # 30 days
}
Endpoints
Artist by MusicBrainz ID
Endpoint: GET /1/artist-mb.php?i={mbid}
Request:
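async def get_theaudiodb_artist(mbid):
    """Look up an artist by MusicBrainz ID; returns None when not found"""
    # Build the path from the configured key rather than a hardcoded "1"
    url = f"{config.THEAUDIODB['base_url']}/{config.THEAUDIODB['api_key']}/artist-mb.php"
    params = {'i': mbid}
    timeout = aiohttp.ClientTimeout(total=config.THEAUDIODB['timeout'])
    async with aiohttp.ClientSession() as session:
        async with session.get(url, params=params, timeout=timeout) as response:
            if response.status == 404:
                return None
            response.raise_for_status()
            data = await response.json()
            # "artists" may be null or missing when there is no match
            artists = data.get('artists')
            return artists[0] if artists else None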
Response:
{
"artists": [
{
"idArtist": "111247",
"strArtist": "Nirvana",
"strArtistAlternate": "",
"strLabel": "DGC Records",
"idLabel": "45114",
"intFormedYear": "1987",
"intBornYear": "",
"intDiedYear": "",
"strDisbanded": "1994",
"strStyle": "Grunge",
"strGenre": "Rock",
"strMood": "Angry",
"strWebsite": "www.nirvana.com",
"strFacebook": "www.facebook.com/Nirvana",
"strTwitter": "twitter.com/nirvana",
"strBiographyEN": "Nirvana was an American rock band...",
"strBiographyDE": null,
"strBiographyFR": null,
"strGender": "Male",
"strCountry": "United States",
"strCountryCode": "US",
"strArtistThumb": "https://www.theaudiodb.com/images/media/artist/thumb/uxrqxy1347913147.jpg",
"strArtistLogo": "https://www.theaudiodb.com/images/media/artist/logo/urspuv1434553994.png",
"strArtistFanart": "https://www.theaudiodb.com/images/media/artist/fanart/spvryu1347980801.jpg",
"strArtistBanner": "https://www.theaudiodb.com/images/media/artist/banner/xuypqw1342640163.jpg",
"strMusicBrainzID": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
"strLastFMChart": "https://www.last.fm/music/Nirvana",
"intCharted": "5",
"strLocked": "unlocked"
}
]
}
Album by MusicBrainz ID
Endpoint: GET /1/album-mb.php?i={mbid}
Response: Similar structure with album-specific fields
Data Extraction
Biography:
def extract_biography(artist_data):
    """Extract biography with language fallback"""
    languages = ['EN', 'DE', 'FR', 'ES', 'IT', 'JP']
    for lang in languages:
        bio = artist_data.get(f'strBiography{lang}')
        if bio:
            return bio
    return None
Images:
def extract_images(artist_data):
    """Extract image URLs"""
    # (TheAudioDB field, Lidarr cover type, file extension)
    fields = [
        ('strArtistThumb', 'poster', '.jpg'),
        ('strArtistLogo', 'logo', '.png'),
        ('strArtistFanart', 'fanart', '.jpg'),
        ('strArtistBanner', 'banner', '.jpg'),
    ]
    return [
        {'Url': artist_data[field], 'CoverType': cover_type, 'Extension': extension}
        for field, cover_type, extension in fields
        if artist_data.get(field)
    ]
Links:
def extract_links(artist_data):
    """Extract social media links"""
    # (TheAudioDB field, URL scheme to prepend, link name)
    fields = [
        ('strWebsite', 'http', 'website'),
        ('strFacebook', 'https', 'facebook'),
        ('strTwitter', 'https', 'twitter'),
    ]
    return [
        {'Url': f"{scheme}://{artist_data[field]}", 'Name': name}
        for field, scheme, name in fields
        if artist_data.get(field)
    ]
Error Handling
404 Not Found: Artist/album not in TheAudioDB
Timeout: 10-second timeout, fallback to other providers
Invalid JSON: Graceful degradation
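Graceful degradation on invalid JSON can be as simple as treating an unparseable body as "no data". The sketch below assumes the raw response text is available to the caller; `parse_artist_payload` is a hypothetical helper, not part of the service.

```python
import json


def parse_artist_payload(raw_text):
    """Return the first artist dict, or None if the payload is unusable."""
    try:
        data = json.loads(raw_text)
    except (TypeError, ValueError):
        # Invalid JSON: degrade to "no data" so the caller can try another provider.
        return None
    artists = data.get('artists') if isinstance(data, dict) else None
    return artists[0] if artists else None
```

Returning `None` for every failure mode lets the calling code use one check before moving on to the next provider.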
6. Wikipedia
Overview
Type: Web scraping
Purpose: Artist biographical information
Base URL: https://{lang}.wikipedia.org
Authentication: None (public access)
Configuration
WIKIPEDIA = {
'timeout': 2,
'max_connections_per_host': 1,
'user_agent': 'LidarrMetadataAPI/10.0.0 (https://github.com/Lidarr/LidarrAPI.Metadata)',
'languages': ['en', 'fr', 'de', 'es', 'it', 'ja', 'zh', 'ru', 'pt', 'nl', 'sv', 'fi', 'no', 'da', 'pl', 'cs', 'hu', 'ro', 'tr', 'el', 'he', 'ar', 'fa', 'hi', 'th', 'ko', 'vi', 'id', 'ms', 'tl', 'bn', 'ta']
}
Lookup Process
Multi-stage lookup:
- MusicBrainz → Wikidata: Extract Wikidata ID from MusicBrainz links
- Wikidata → Wikipedia: Get Wikipedia article title from Wikidata
- Wikipedia → Extract: Scrape and parse Wikipedia article
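The first stage depends on recognising a Wikidata link among an artist's MusicBrainz URL relationships. A minimal sketch of that step, assuming the links are available as plain URL strings; `wikidata_id_from_url` and `first_wikidata_id` are hypothetical helpers, and the URL in the usage note is only an example:

```python
import re

# Matches both the /wiki/ and /entity/ forms of a Wikidata item URL.
_WIKIDATA_URL = re.compile(r'^https?://(?:www\.)?wikidata\.org/(?:wiki|entity)/(Q\d+)$')


def wikidata_id_from_url(url):
    """Return the Wikidata entity ID (e.g. 'Q123') from a URL, or None."""
    match = _WIKIDATA_URL.match(url.strip())
    return match.group(1) if match else None


def first_wikidata_id(urls):
    """Return the first Wikidata ID found in a list of URLs, or None."""
    for url in urls:
        entity_id = wikidata_id_from_url(url)
        if entity_id:
            return entity_id
    return None
```

For example, `wikidata_id_from_url("https://www.wikidata.org/wiki/Q11649")` yields `"Q11649"`, which can then be passed to the Wikidata entity endpoint.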
Wikidata Integration
Wikidata entity URL: https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json
Extract Wikipedia links:
async def get_wikipedia_title_from_wikidata(wikidata_id, language='en'):
    """Get Wikipedia article title from Wikidata entity"""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{wikidata_id}.json"
    # Identify the bot here too, and use the configured timeout
    headers = {'User-Agent': config.WIKIPEDIA['user_agent']}
    timeout = aiohttp.ClientTimeout(total=config.WIKIPEDIA['timeout'])
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers, timeout=timeout) as response:
            data = await response.json()
    entity = data['entities'][wikidata_id]
    # Get Wikipedia link for language
    sitelinks = entity.get('sitelinks', {})
    wiki_key = f'{language}wiki'
    if wiki_key in sitelinks:
        return sitelinks[wiki_key]['title']
    return None
Wikipedia Article Extraction
Fetch article HTML:
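from urllib.parse import quote

async def get_wikipedia_article(title, language='en'):
    """Fetch Wikipedia article HTML"""
    # Titles can contain characters such as "?" or "#" that would otherwise
    # be parsed as URL syntax, so percent-encode the path segment
    path = quote(title.replace(' ', '_'))
    url = f"https://{language}.wikipedia.org/wiki/{path}"
    headers = {'User-Agent': config.WIKIPEDIA['user_agent']}
    timeout = aiohttp.ClientTimeout(total=config.WIKIPEDIA['timeout'])
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers, timeout=timeout) as response:
            if response.status == 404:
                return None
            response.raise_for_status()
            return await response.text()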
Parse and extract summary:
from bs4 import BeautifulSoup

def extract_wikipedia_summary(html):
    """Extract first paragraph as summary"""
    soup = BeautifulSoup(html, 'lxml')
    # Find main content div
    content = soup.find('div', {'id': 'mw-content-text'})
    if not content:
        return None
    # Article paragraphs normally sit inside a nested "mw-parser-output" div,
    # so search its direct children; fall back to the outer div if it is absent
    body = content.find('div', class_='mw-parser-output') or content
    # Find first paragraph (skip disambiguation notices)
    for p in body.find_all('p', recursive=False):
        text = p.get_text().strip()
        # Skip empty paragraphs
        if not text:
            continue
        # Skip coordinate-only paragraphs
        if text.startswith('Coordinates:'):
            continue
        # Return first substantial paragraph
        if len(text) > 50:
            return text
    return None
Language Fallback
32-language fallback chain:
async def get_artist_overview(mbid):
    """Get artist overview with language fallback"""
    # Get Wikidata ID from MusicBrainz
    wikidata_id = await get_wikidata_id_from_musicbrainz(mbid)
    if not wikidata_id:
        return None
    # Try each language in order
    for language in config.WIKIPEDIA['languages']:
        try:
            # Get Wikipedia title for language
            title = await get_wikipedia_title_from_wikidata(wikidata_id, language)
            if not title:
                continue
            # Fetch and parse article
            html = await get_wikipedia_article(title, language)
            if not html:
                continue
            summary = extract_wikipedia_summary(html)
            if summary:
                return summary
        except Exception as e:
            logger.debug(f"Wikipedia lookup failed for {language}: {e}")
            continue
    return None
Rate Limiting
Polite crawling:
- 1 connection per host maximum
- 2-second timeout per request
- User-Agent header identifies bot
- Respect robots.txt (manual check)
No explicit rate limit: Wikipedia allows reasonable bot traffic
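The one-connection-per-host rule can be enforced in the HTTP client itself (aiohttp's `TCPConnector` accepts a `limit_per_host` argument) or at the application level. The sketch below shows the application-level variant using only the standard library; `HostLimiter` is a hypothetical class, and `fetch` is assumed to be any async callable that takes a URL.

```python
import asyncio
from urllib.parse import urlparse


class HostLimiter:
    """Allow at most `per_host` concurrent requests to any single host."""

    def __init__(self, per_host=1):
        self._per_host = per_host
        self._semaphores = {}

    def _semaphore_for(self, url):
        host = urlparse(url).netloc
        # Create one semaphore per host the first time that host is seen.
        if host not in self._semaphores:
            self._semaphores[host] = asyncio.Semaphore(self._per_host)
        return self._semaphores[host]

    async def run(self, url, fetch):
        """Run fetch(url) while holding the semaphore for the URL's host."""
        async with self._semaphore_for(url):
            return await fetch(url)
```

Requests to different language editions (for example `en.wikipedia.org` and `de.wikipedia.org`) count as different hosts here, so they are limited independently.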
Error Handling
404 Not Found: Article doesn't exist in language
Timeout: 2-second timeout, try next language
Parse errors: Graceful degradation, try next language
Fallback: Use TheAudioDB biography if Wikipedia unavailable
7. Spotify
Overview
Type: REST API with OAuth
Purpose: ID mapping and cross-platform linking
Base URL: https://api.spotify.com/v1
Authentication: OAuth 2.0 Client Credentials
Library: spotipy 2.16.1
Configuration
SPOTIFY = {
'client_id': 'your-client-id',
'client_secret': 'your-client-secret',
'redirect_uri': 'http://localhost:5001/spotify/callback',
'timeout': 5
}
OAuth Flow
Client Credentials Grant (for server-to-server):
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials
auth_manager = SpotifyClientCredentials(
client_id=config.SPOTIFY['client_id'],
client_secret=config.SPOTIFY['client_secret']
)
spotify = spotipy.Spotify(auth_manager=auth_manager)
Token caching: Tokens cached in Redis with automatic refresh
ID Mapping
MusicBrainz → Spotify:
import asyncio

async def map_musicbrainz_to_spotify(mbid, artist_name):
    """Map MusicBrainz ID to Spotify ID"""
    # Note: mbid is not used in the lookup; matching is by artist name only
    # spotipy is synchronous, so run the search in a worker thread to avoid
    # blocking the event loop
    loop = asyncio.get_running_loop()
    results = await loop.run_in_executor(
        None,
        lambda: spotify.search(q=f'artist:{artist_name}', type='artist', limit=10)
    )
    if not results['artists']['items']:
        return None
    # Find best match using Levenshtein distance
    best_match = None
    best_score = 0
    for artist in results['artists']['items']:
        score = levenshtein_similarity(artist_name, artist['name'])
        if score > best_score and score >= 0.8:
            best_score = score
            best_match = artist
    return best_match['id'] if best_match else None
Levenshtein similarity:
from Levenshtein import ratio

def levenshtein_similarity(s1, s2):
    """Calculate Levenshtein similarity (0-1)"""
    return ratio(s1.lower(), s2.lower())
Threshold: 0.8 minimum similarity for match
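To see how the threshold behaves without installing the Levenshtein package, the selection logic can be tried with the standard library. Note the substitution: this sketch uses `difflib.SequenceMatcher` as a stand-in, which is a different similarity measure and will not give identical scores to `Levenshtein.ratio`. `similarity` and `pick_best_match` are hypothetical names.

```python
from difflib import SequenceMatcher


def similarity(s1, s2):
    """Case-insensitive similarity in [0, 1] (difflib stand-in for Levenshtein ratio)."""
    return SequenceMatcher(None, s1.lower(), s2.lower()).ratio()


def pick_best_match(target, candidates, threshold=0.8):
    """Return the candidate most similar to target, or None below the threshold."""
    best, best_score = None, 0.0
    for name in candidates:
        score = similarity(target, name)
        if score >= threshold and score > best_score:
            best, best_score = name, score
    return best
```

Because matching is by name only, two different artists with the same name cannot be told apart at this stage; the threshold only guards against loosely related search results.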
Spotify API Endpoints
Get artist:
artist = spotify.artist('6olE6TJLqED3rqDCT0FyPh')
Get album:
album = spotify.album('2guirTSEqLizK7j9i1MTTZ')
Search:
results = spotify.search(q='nirvana', type='artist', limit=10)
Error Handling
429 Too Many Requests: Retry with exponential backoff
401 Unauthorized: Refresh OAuth token
404 Not Found: Artist/album not on Spotify
Timeout: 5-second timeout, graceful degradation
Caching Strategy
Cache TTL: 90 days (Spotify IDs rarely change)
Cache key: spotify:artist:{spotify_id} or spotify:mbid:{mbid}
8. Last.fm
Overview
Type: REST API
Purpose: Music charts and scrobble data
Base URL: https://ws.audioscrobbler.com/2.0
Authentication: API key
Library: pylast 4.3.0
Configuration
LASTFM = {
'api_key': 'your-api-key',
'api_secret': 'your-api-secret',
'timeout': 5
}
pylast Integration
import pylast
network = pylast.LastFMNetwork(
api_key=config.LASTFM['api_key'],
api_secret=config.LASTFM['api_secret']
)
Chart Endpoints
Top artists:
def get_lastfm_top_artists(limit=50):
    """Get Last.fm top artists chart"""
    top_artists = network.get_top_artists(limit=limit)
    results = []
    for artist in top_artists:
        results.append({
            'name': artist.item.name,
            'playcount': artist.weight,
            'listeners': artist.item.get_listener_count()
        })
    return results
Top albums:
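def get_lastfm_top_albums(limit=50):
    """Get Last.fm top albums chart"""
    # NOTE: confirm that the installed pylast version provides a network-level
    # get_top_albums(); the Last.fm chart API documents only top artists,
    # tags and tracks
    top_albums = network.get_top_albums(limit=limit)
    results = []
    for album in top_albums:
        results.append({
            'name': album.item.title,
            'artist': album.item.artist.name,
            'playcount': album.weight
        })
    return results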
Top tracks:
def get_lastfm_top_tracks(limit=50):
    """Get Last.fm top tracks chart"""
    top_tracks = network.get_top_tracks(limit=limit)
    results = []
    for track in top_tracks:
        results.append({
            'name': track.item.title,
            'artist': track.item.artist.name,
            'playcount': track.weight
        })
    return results
MusicBrainz Mapping
Map Last.fm artist to MusicBrainz:
async def map_lastfm_to_musicbrainz(lastfm_artist_name):
    """Map Last.fm artist to MusicBrainz ID"""
    # Search MusicBrainz via Solr
    results = await search_artist(lastfm_artist_name, limit=5)
    if not results:
        return None
    # Return best match (first result)
    return results[0]['Id']
Caching
Cache TTL: 6 hours (charts update daily)
Cache key: lastfm:chart:{type}:{limit}
9. Billboard
Overview
Type: Web scraping
Purpose: Billboard music charts
Base URL: https://www.billboard.com/charts
Authentication: None
Library: billboard-py 7.0.0
billboard-py Integration
import billboard

def get_billboard_hot_100():
    """Get Billboard Hot 100 chart"""
    chart = billboard.ChartData('hot-100')
    results = []
    for entry in chart:
        results.append({
            'position': entry.rank,
            'title': entry.title,
            'artist': entry.artist,
            'last_position': entry.lastPos,
            'peak_position': entry.peakPos,
            'weeks_on_chart': entry.weeks
        })
    return results
Supported Charts
| Chart Name | billboard-py ID | Type |
|---|---|---|
| Hot 100 | hot-100 | Tracks |
| Billboard 200 | billboard-200 | Albums |
| Artist 100 | artist-100 | Artists |
| Streaming Songs | streaming-songs | Tracks |
| Radio Songs | radio-songs | Tracks |
| Digital Song Sales | digital-song-sales | Tracks |
MusicBrainz Mapping
Map Billboard entry to MusicBrainz:
async def map_billboard_to_musicbrainz(artist_name, track_title=None):
    """Map Billboard entry to MusicBrainz"""
    # Search artist
    artist_results = await search_artist(artist_name, limit=5)
    if not artist_results:
        return None
    artist_mbid = artist_results[0]['Id']
    # If track title provided, search for recording
    if track_title:
        # Search would require recording search (not implemented)
        pass
    return artist_mbid
Error Handling
HTTP errors: Retry with backoff
Parse errors: Graceful degradation
Rate limiting: Polite crawling (1 request per second)
Caching
Cache TTL: 6 hours (charts update weekly)
Cache key: billboard:chart:{chart_name}
10. Apple Music / iTunes
Overview
Type: RSS API
Purpose: Apple Music and iTunes charts
Base URL: https://rss.applemarketingtools.com/api/v2
Authentication: None
RSS Feed URLs
Top albums:
https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/albums.json
Top songs:
https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/songs.json
New releases:
https://rss.applemarketingtools.com/api/v2/us/music/new-releases/100/albums.json
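The three feed URLs share one pattern: country, feed name, result count, and result type. A small builder makes that explicit; `apple_music_feed_url` and `APPLE_RSS_BASE` are hypothetical names, and only the feed and type values shown in the URLs above are known to be valid.

```python
APPLE_RSS_BASE = "https://rss.applemarketingtools.com/api/v2"


def apple_music_feed_url(feed, result_type, limit=100, country="us"):
    """Build an Apple Music RSS feed URL from its four variable segments."""
    return f"{APPLE_RSS_BASE}/{country}/music/{feed}/{limit}/{result_type}.json"
```

For instance, `apple_music_feed_url("new-releases", "albums")` reproduces the new releases URL listed above.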
Fetch and Parse
async def get_apple_music_chart(chart_type, limit=100):
    """Fetch Apple Music chart"""
    url = f"https://rss.applemarketingtools.com/api/v2/us/music/most-played/{limit}/{chart_type}.json"
    timeout = aiohttp.ClientTimeout(total=5)
    async with aiohttp.ClientSession() as session:
        async with session.get(url, timeout=timeout) as response:
            response.raise_for_status()
            data = await response.json()
    results = []
    # Feed order is chart order, so the 1-based index is the position
    for position, entry in enumerate(data['feed']['results'], start=1):
        results.append({
            'position': position,
            'name': entry['name'],
            'artist': entry['artistName'],
            'url': entry['url'],
            'artwork': entry['artworkUrl100']
        })
    return results
MusicBrainz Mapping
Map Apple Music entry to MusicBrainz: Similar to Billboard mapping
Caching
Cache TTL: 6 hours
Cache key: apple:chart:{type}:{limit}
11. RabbitMQ
Overview
Type: Message queue
Purpose: Real-time search index updates
Technology: RabbitMQ 3.x
Protocol: AMQP 0.9.1
Configuration
RABBITMQ = {
'host': 'rabbitmq',
'port': 5672,
'user': 'abc',
'password': 'abc',
'exchange': 'search.index',
'artist_queue': 'search.index.artist',
'album_queue': 'search.index.album'
}
Message Format
Artist update message:
{
"entity_type": "artist",
"mbid": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
"action": "update",
"timestamp": "2025-04-28T12:34:56Z"
}
Album update message:
{
"entity_type": "release_group",
"mbid": "1b022e01-4da6-387b-8658-8678046e4cef",
"action": "update",
"timestamp": "2025-04-28T12:34:56Z"
}
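Producing and validating messages in this format might look like the sketch below. It assumes the four fields shown above are all required and that `artist` and `release_group` are the only entity types; `build_index_message` and `parse_index_message` are hypothetical helpers.

```python
import json
from datetime import datetime, timezone

VALID_ENTITY_TYPES = {"artist", "release_group"}
REQUIRED_FIELDS = {"entity_type", "mbid", "action", "timestamp"}


def build_index_message(entity_type, mbid, action="update", now=None):
    """Serialise an index update message with a UTC ISO 8601 timestamp."""
    if entity_type not in VALID_ENTITY_TYPES:
        raise ValueError(f"unknown entity_type: {entity_type}")
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return json.dumps({
        "entity_type": entity_type,
        "mbid": mbid,
        "action": action,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    })


def parse_index_message(body):
    """Decode a message body and check that all required fields are present."""
    message = json.loads(body)
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return message
```

Validating on the consumer side lets a malformed message be rejected before any database or Solr work is attempted.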
SIR (Search Index Rebuilder)
Purpose: Consume RabbitMQ messages and update Solr
Process:
- Connect to RabbitMQ
- Subscribe to queues
- Consume messages
- Query MusicBrainz DB for entity
- Post update to Solr
- Acknowledge message
Container: Separate service in docker-compose
Monitoring
Queue depth:
rabbitmqctl list_queues name messages
Consumer count:
rabbitmqctl list_consumers
12. Redis
Overview
Type: In-memory cache
Purpose: Ephemeral cache and rate limiting
Technology: Redis 6+
Memory: 512MB limit
Configuration
REDIS = {
'url': 'redis://redis:6379/0',
'namespace': 'lm3.7',
'max_memory': '512mb',
'eviction_policy': 'allkeys-lfu'
}
Use Cases
- Hot cache: Frequently accessed metadata
- Rate limiting: Request counting
- Sentry deduplication: Error tracking
- Invalidation locks: Distributed locking
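The cache keys and TTLs quoted in the provider sections can be collected in one place. The sketch below is an assumption about layout rather than the service's code: it prefixes every key with the configured namespace (as the Sentry key `lm3.7:sentry:{hash}` does), and `cache_key`, `cache_ttl`, and `CACHE_TTLS` are hypothetical names.

```python
NAMESPACE = "lm3.7"

# TTLs in seconds, taken from the caching notes for each provider.
CACHE_TTLS = {
    "fanart": 30 * 24 * 3600,   # 30 days
    "spotify": 90 * 24 * 3600,  # 90 days
    "lastfm": 6 * 3600,         # 6 hours
    "billboard": 6 * 3600,      # 6 hours
    "apple": 6 * 3600,          # 6 hours
}


def cache_key(provider, *parts, namespace=NAMESPACE):
    """Join namespace, provider and key parts with ':' separators."""
    return ":".join([namespace, provider, *map(str, parts)])


def cache_ttl(provider, default=3600):
    """Return the TTL for a provider, or a default for unknown providers."""
    return CACHE_TTLS.get(provider, default)
```

Versioning the namespace means a schema change can invalidate every cached entry at once by bumping the prefix.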
Connection Pool
import aioredis
redis = await aioredis.create_redis_pool(
config.REDIS['url'],
minsize=5,
maxsize=20,
encoding='utf-8'
)
13. Sentry
Overview
Type: Error tracking
Purpose: Application monitoring
Technology: Sentry SaaS
Library: sentry-sdk 0.19.5
Configuration
import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration
sentry_sdk.init(
dsn=config.SENTRY_DSN,
integrations=[FlaskIntegration()],
release=f"lidarr-metadata@{__version__}",
environment=config.ENVIRONMENT,
traces_sample_rate=0.1
)
Redis-Based Rate Limiting
Purpose: Prevent alert fatigue
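import hashlib

class SentryRedisTtlProcessor:
    """Rate limit Sentry events using Redis"""

    def __init__(self, redis, ttl=3600):
        # Sentry calls before_send hooks synchronously, so `redis` must be a
        # synchronous client (for example redis-py), not the aioredis pool
        self.redis = redis
        self.ttl = ttl

    def __call__(self, event, hint):
        # Exception details are nested under event['exception']['values']
        values = event.get('exception', {}).get('values') or []
        if not values:
            return event  # Not an exception event; pass it through
        # Generate error hash from the most recent (last) exception
        exc = values[-1]
        error_hash = hashlib.md5(
            f"{exc.get('type')}:{exc.get('value')}".encode()
        ).hexdigest()
        key = f"lm3.7:sentry:{error_hash}"
        # SET with NX and EX marks the error as seen atomically; a falsy
        # result means the key already existed, so drop the event
        if not self.redis.set(key, "1", ex=self.ttl, nx=True):
            return None
        return event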
Release Tracking
Sentry releases: Tied to git commits
CI/CD integration:
sentry-cli releases new "lidarr-metadata@${GIT_SHA}"
sentry-cli releases set-commits "lidarr-metadata@${GIT_SHA}" --auto
sentry-cli releases finalize "lidarr-metadata@${GIT_SHA}"
14. Telegraf
Overview
Type: Metrics collection
Purpose: StatsD metrics aggregation
Technology: Telegraf (InfluxData)
Protocol: StatsD
Configuration
TELEGRAF = {
'host': 'telegraf',
'port': 8125,
'prefix': 'lidarr.metadata'
}
StatsD Client
import statsd
stats = statsd.StatsClient(
host=config.TELEGRAF['host'],
port=config.TELEGRAF['port'],
prefix=config.TELEGRAF['prefix']
)
Metrics
Request counters:
stats.incr('requests.artist')
stats.incr('requests.album')
stats.incr('requests.search')
Response times:
with stats.timer('response_time.artist'):
    artist = await get_artist(mbid)
Cache hits/misses:
stats.incr('cache.hit')
stats.incr('cache.miss')
Provider requests:
stats.incr('provider.fanart.request')
stats.incr('provider.wikipedia.request')
15. Cloudflare
Overview
Type: CDN and edge caching
Purpose: Global content delivery
Technology: Cloudflare CDN
API: Cloudflare REST API v4
Configuration
CLOUDFLARE = {
'zone_id': 'your-zone-id',
'api_token': 'your-api-token',
'base_url': 'https://api.cloudflare.com/client/v4'
}
Cache Purge
Purge by URL:
def chunks(items, size):
    """Yield successive slices of at most `size` items"""
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def purge_cloudflare_cache(urls):
    """Purge Cloudflare cache for URLs"""
    headers = {
        'Authorization': f"Bearer {config.CLOUDFLARE['api_token']}",
        'Content-Type': 'application/json'
    }
    url = f"{config.CLOUDFLARE['base_url']}/zones/{config.CLOUDFLARE['zone_id']}/purge_cache"
    async with aiohttp.ClientSession() as session:
        # Batch URLs (max 30 per request)
        for batch in chunks(urls, 30):
            data = {'files': batch}
            async with session.post(url, headers=headers, json=data) as response:
                response.raise_for_status()
Purge all:
async def purge_all_cloudflare_cache():
    """Purge entire Cloudflare cache"""
    async with aiohttp.ClientSession() as session:
        headers = {
            'Authorization': f"Bearer {config.CLOUDFLARE['api_token']}",
            'Content-Type': 'application/json'
        }
        data = {'purge_everything': True}
        url = f"{config.CLOUDFLARE['base_url']}/zones/{config.CLOUDFLARE['zone_id']}/purge_cache"
        async with session.post(url, headers=headers, json=data) as response:
            response.raise_for_status()
Rate Limits
Cloudflare API: 1200 requests per 5 minutes
Batch purging: Max 30 URLs per request
Cache-Control Headers
Set by API:
response.headers['Cache-Control'] = 's-maxage=2592000, max-age=0'
Interpretation:
- s-maxage=2592000: CDN caches for 30 days
- max-age=0: Clients must revalidate
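The arithmetic behind the 30-day figure can be checked by parsing the header. A minimal sketch; `parse_cache_control` is a hypothetical helper that handles only simple `name=value` and bare directives, not every form the HTTP specification allows.

```python
def parse_cache_control(header):
    """Parse a Cache-Control header into a dict of directive -> value."""
    directives = {}
    for part in header.split(','):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition('=')
        name = name.strip().lower()
        value = value.strip().strip('"')
        # Numeric values become ints; directives without a value map to True.
        directives[name] = int(value) if value.isdigit() else (value or True)
    return directives
```

Dividing the parsed `s-maxage` value by 86400 seconds per day gives the number of days the CDN may serve the cached response.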
Integration Summary
The 15 integrations provide comprehensive metadata aggregation:
Core data: MusicBrainz DB (direct SQL)
Search: Solr (real-time via RabbitMQ)
Images: Cover Art Archive, FanArt.tv, TheAudioDB
Biographies: Wikipedia (32 languages), TheAudioDB
Charts: Last.fm, Billboard, Apple Music, Spotify
Cross-platform: Spotify ID mapping
Infrastructure: Redis (cache), PostgreSQL (persistent cache), RabbitMQ (messaging)
Monitoring: Sentry (errors), Telegraf (metrics)
CDN: Cloudflare (edge caching)
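Each integration is isolated behind its own configuration and client code, and image and biography lookups fall back across providers (Cover Art Archive, FanArt.tv, and TheAudioDB for images; Wikipedia, then TheAudioDB for biographies), so a single provider outage does not leave a response empty.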