# Lidarr Metadata API - External Integrations

## Integration Overview

The Lidarr Metadata API integrates with 15 external systems to provide comprehensive music metadata aggregation:

| Integration | Type | Purpose | Authentication | Rate Limit |
|-------------|------|---------|----------------|------------|
| **MusicBrainz DB** | Database | Core metadata | Read-only user | N/A |
| **Solr Search** | Search Engine | Full-text search | None | N/A |
| **Cover Art Archive** | CDN | Album cover art | None | N/A |
| **FanArt.tv** | REST API | Artist/album images | API key | 7-day lag (free) |
| **TheAudioDB** | REST API | Metadata fallback | API key "1" | Unknown |
| **Wikipedia** | Web Scraping | Artist biographies | None | Polite crawling |
| **Spotify** | REST API | ID mapping, charts | OAuth | 429 handling |
| **Last.fm** | REST API | Charts | API key | Unknown |
| **Billboard** | Web Scraping | Charts | None | Polite crawling |
| **Apple Music** | RSS API | Charts | None | N/A |
| **RabbitMQ** | Message Queue | Search index updates | Basic auth | N/A |
| **Redis** | Cache | Ephemeral cache | None | N/A |
| **Sentry** | Error Tracking | Monitoring | DSN | Redis-based |
| **Telegraf** | Metrics | StatsD metrics | None | N/A |
| **Cloudflare** | CDN | Edge caching | API token | 1200 req/5min |

## 1. MusicBrainz Database

### Overview

**Type**: Direct PostgreSQL database access

**Purpose**: Authoritative source for all music metadata

**Container**: `ghcr.io/lidarr/mb-postgres:1.0.10`

**Access method**: Read-only asyncpg connection

### Configuration

```python
MUSICBRAINZ_DB = {
    'host': 'musicbrainz',
    'port': 5432,
    'database': 'musicbrainz_db',
    'user': 'musicbrainz_ro',  # Read-only user
    'password': 'abc',
    'min_pool_size': 10,
    'max_pool_size': 50,
    'command_timeout': 30
}
```

### Connection Pool

```python
import asyncpg

async def create_musicbrainz_pool():
    """Create the read-only asyncpg connection pool"""
    return await asyncpg.create_pool(
        host=config.MUSICBRAINZ_DB['host'],
        port=config.MUSICBRAINZ_DB['port'],
        database=config.MUSICBRAINZ_DB['database'],
        user=config.MUSICBRAINZ_DB['user'],
        password=config.MUSICBRAINZ_DB['password'],
        min_size=config.MUSICBRAINZ_DB['min_pool_size'],
        max_size=config.MUSICBRAINZ_DB['max_pool_size'],
        command_timeout=config.MUSICBRAINZ_DB['command_timeout']
    )
```

### Replication Setup

**Replication method**: MusicBrainz replication packets

**Update frequency**: Hourly

**Replication script**: Custom script in container

**Process**:

1. Check current replication sequence
2. Download replication packets from MusicBrainz FTP
3. Apply SQL changes
4. Update replication control table
5. Trigger search index updates

**Monitoring replication lag**:

```sql
SELECT
    current_replication_sequence,
    last_replication_date,
    NOW() - last_replication_date AS lag
FROM replication_control;
```

### Database Size and Performance

**Database size**: 100GB+ (full MusicBrainz dataset)

**Query performance**:

- Simple artist lookup: 50-100ms
- Complex artist with releases: 100-500ms
- Album with tracks: 200-1000ms
- Change detection query: 500-2000ms

**Optimization**: Custom indices on `last_updated` columns

### Security Considerations

**Read-only access**: User has SELECT-only permissions

**Network isolation**: Database accessible only within Docker network

**Credentials**: Hardcoded (insecure default, should be changed)

## 2. Solr Search

### Overview

**Type**: Apache Solr 8.x search engine

**Purpose**: Full-text search for artists and albums

**Container**: `ghcr.io/lidarr/mb-solr:3.3.1.9`

**Cores**: `artist`, `release-group`

### Configuration

```python
SOLR = {
    'url': 'http://solr:8983/solr',
    'artist_core': 'artist',
    'album_core': 'release-group',
    'timeout': 5,
    'rows': 10
}
```

### Query Interface

**HTTP client**: aiohttp

**Query format**: HTTP query parameters with a JSON response (`wt=json`)

**Example query**:

```python
import aiohttp

async def search_artist(query, limit=10):
    async with aiohttp.ClientSession() as session:
        params = {
            'q': query,
            'defType': 'dismax',
            'qf': 'artist^2 sortname alias',
            'mm': '1',
            'rows': limit,
            'wt': 'json'
        }
        async with session.get(
            f"{config.SOLR['url']}/{config.SOLR['artist_core']}/select",
            params=params,
            timeout=aiohttp.ClientTimeout(total=config.SOLR['timeout'])
        ) as response:
            data = await response.json()
            return data['response']['docs']
```

### Real-Time Index Updates

**Update mechanism**: RabbitMQ + SIR (Search Index Rebuilder)

**Process**:

1. MusicBrainz database changes trigger RabbitMQ messages
2. SIR consumes messages from queue
3. SIR queries MusicBrainz DB for updated entity
4. SIR posts update to Solr core
5. Solr performs soft commit (1 second)

**Update latency**: 1-5 seconds from database change

### Index Maintenance

**Full reindex**: Required after schema changes

**Reindex process**:

```bash
# Stop SIR
docker-compose stop indexer

# Clear Solr cores
curl "http://solr:8983/solr/artist/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
curl "http://solr:8983/solr/release-group/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

# Rebuild indices
docker-compose run indexer rebuild-artist
docker-compose run indexer rebuild-album

# Restart SIR
docker-compose start indexer
```

**Reindex duration**: 4-8 hours for full MusicBrainz dataset

### Performance Tuning

**JVM heap size**: 2GB

**Solr cache settings**:

```xml
```

**Commit settings**:

```xml
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```

## 3. Cover Art Archive

### Overview

**Type**: Image CDN

**Purpose**: Album cover art images

**Base URL**: `https://coverartarchive.org`

**Proxy**: `https://imagecache.lidarr.audio`

### Image URL Format

**Direct URL**:

```
https://coverartarchive.org/release/{release-mbid}/front-500.jpg
```

**Proxied URL**:

```
https://imagecache.lidarr.audio/cover/{release-mbid}/front.jpg
```

### Image Types

| Type | Description | Typical Size |
|------|-------------|--------------|
| `front` | Front cover | 500x500 - 1200x1200 |
| `back` | Back cover | 500x500 - 1200x1200 |
| `booklet` | Booklet pages | Variable |
| `medium` | Disc/vinyl image | 500x500 |
| `tray` | CD tray card | Variable |
| `obi` | Japanese obi strip | Variable |
| `spine` | Spine image | Variable |
| `track` | Track listing | Variable |
| `liner` | Liner notes | Variable |
| `sticker` | Sticker image | Variable |
| `poster` | Poster image | Variable |

### Image Proxy Benefits

**Advantages of using imagecache.lidarr.audio**:

1. **Caching**: Images cached at edge for faster delivery
2. **Resizing**: Automatic image resizing via query parameters
3. **Format conversion**: WebP conversion for modern browsers
4. **Bandwidth**: Reduced load on Cover Art Archive
5. **Reliability**: Fallback to direct URL if proxy fails

**Proxy query parameters**:

```
https://imagecache.lidarr.audio/cover/{mbid}/front.jpg?size=500&format=webp
```

### Integration Code

```python
async def get_cover_art(release_mbid):
    """Fetch cover art URLs for release"""
    # check_url_exists is a helper assumed to be defined elsewhere
    images = []

    # Try proxy first
    proxy_url = f"https://imagecache.lidarr.audio/cover/{release_mbid}/front.jpg"
    if await check_url_exists(proxy_url):
        images.append({
            'Url': proxy_url,
            'CoverType': 'cover',
            'Extension': '.jpg'
        })
    else:
        # Fallback to direct URL
        direct_url = f"https://coverartarchive.org/release/{release_mbid}/front-500.jpg"
        if await check_url_exists(direct_url):
            images.append({
                'Url': direct_url,
                'CoverType': 'cover',
                'Extension': '.jpg'
            })

    return images
```

### Error Handling

**404 Not Found**: No cover art available for release

**503 Service Unavailable**: Cover Art Archive temporarily down

**Fallback**: Use FanArt.tv or TheAudioDB images

## 4. FanArt.tv

### Overview

**Type**: REST API

**Purpose**: High-quality artist and album images

**Base URL**: `https://webservice.fanart.tv/v3`

**Authentication**: API key

### Configuration

```python
FANART = {
    'api_key': 'your-api-key-here',
    'base_url': 'https://webservice.fanart.tv/v3',
    'timeout': 10,
    'cache_ttl': 2592000  # 30 days
}
```

### API Key Types

| Key Type | Cost | Rate Limit | Image Lag |
|----------|------|------------|-----------|
| **Free** | Free | Unknown | 7 days |
| **Personal** | $2/month | Higher | No lag |
| **Commercial** | $5/month | Highest | No lag |

**7-day lag**: Free API keys only return images added 7+ days ago

### Endpoints

#### Artist Images

**Endpoint**: `GET /music/{mbid}`

**Request**:

```python
async def get_fanart_artist_images(mbid):
    async with aiohttp.ClientSession() as session:
        headers = {'api-key': config.FANART['api_key']}
        url = f"{config.FANART['base_url']}/music/{mbid}"
        timeout = aiohttp.ClientTimeout(total=config.FANART['timeout'])
        async with session.get(url, headers=headers, timeout=timeout) as response:
            if response.status == 404:
                return []
            response.raise_for_status()
            return await response.json()
```

**Response**:

```json
{
  "name": "Nirvana",
  "mbid_id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "artistbackground": [
    {
      "id": "12345",
      "url": "https://assets.fanart.tv/fanart/music/5b11f4ce-a62d-471e-81fc-a69a8278c7da/artistbackground/nirvana-1.jpg",
      "likes": "42"
    }
  ],
  "artistthumb": [...],
  "hdmusiclogo": [...],
  "musicbanner": [...],
  "musiclogo": [...]
}
```

#### Album Images

**Endpoint**: `GET /music/albums/{mbid}`

**Response**:

```json
{
  "albums": {
    "1b022e01-4da6-387b-8658-8678046e4cef": {
      "albumcover": [
        {
          "id": "67890",
          "url": "https://assets.fanart.tv/fanart/music/1b022e01-4da6-387b-8658-8678046e4cef/albumcover/nevermind-1.jpg",
          "likes": "156"
        }
      ],
      "cdart": [...]
    }
  }
}
```

### Image Types

**Artist images**:

- `artistbackground`: Background images (1920x1080)
- `artistthumb`: Artist thumbnails (1000x1000)
- `hdmusiclogo`: HD logos (transparent PNG)
- `musicbanner`: Banners (1000x185)
- `musiclogo`: Standard logos (transparent PNG)

**Album images**:

- `albumcover`: Album covers (1000x1000)
- `cdart`: CD art (transparent PNG)

### Mapping to Lidarr Image Types

```python
FANART_TYPE_MAPPING = {
    'artistbackground': 'fanart',
    'artistthumb': 'poster',
    'hdmusiclogo': 'logo',
    'musicbanner': 'banner',
    'musiclogo': 'logo',
    'albumcover': 'cover',
    'cdart': 'disc'
}
```

### Error Handling

**404 Not Found**: No images available for artist/album

**429 Too Many Requests**: Rate limit exceeded (retry with backoff)

**503 Service Unavailable**: FanArt.tv temporarily down

**Fallback**: Use TheAudioDB or Cover Art Archive

### Caching Strategy

**Cache TTL**: 30 days (images rarely change)

**Cache key**: `fanart:artist:{mbid}` or `fanart:album:{mbid}`

**Invalidation**: Manual only (images are treated as immutable)

## 5. TheAudioDB

### Overview

**Type**: REST API

**Purpose**: Fallback metadata and images

**Base URL**: `https://theaudiodb.com/api/v1/json`

**Authentication**: API key "1" (public key)

### Configuration

```python
THEAUDIODB = {
    'api_key': '1',
    'base_url': 'https://theaudiodb.com/api/v1/json',
    'timeout': 10,
    'cache_ttl': 2592000  # 30 days
}
```

### Endpoints

#### Artist by MusicBrainz ID

**Endpoint**: `GET /1/artist-mb.php?i={mbid}`

**Request**:

```python
async def get_theaudiodb_artist(mbid):
    async with aiohttp.ClientSession() as session:
        # The API key is a path segment ("1" for the public key)
        url = f"{config.THEAUDIODB['base_url']}/{config.THEAUDIODB['api_key']}/artist-mb.php"
        params = {'i': mbid}
        timeout = aiohttp.ClientTimeout(total=config.THEAUDIODB['timeout'])
        async with session.get(url, params=params, timeout=timeout) as response:
            if response.status == 404:
                return None
            response.raise_for_status()
            data = await response.json()
            # "artists" may be missing or null when nothing matches
            artists = data.get('artists')
            return artists[0] if artists else None
```

**Response**:

```json
{
  "artists": [
    {
      "idArtist": "111247",
      "strArtist": "Nirvana",
      "strArtistAlternate": "",
      "strLabel": "DGC Records",
      "idLabel": "45114",
      "intFormedYear": "1987",
      "intBornYear": "",
      "intDiedYear": "",
      "strDisbanded": "1994",
      "strStyle": "Grunge",
      "strGenre": "Rock",
      "strMood": "Angry",
      "strWebsite": "www.nirvana.com",
      "strFacebook": "www.facebook.com/Nirvana",
      "strTwitter": "twitter.com/nirvana",
      "strBiographyEN": "Nirvana was an American rock band...",
      "strBiographyDE": null,
      "strBiographyFR": null,
      "strGender": "Male",
      "strCountry": "United States",
      "strCountryCode": "US",
      "strArtistThumb": "https://www.theaudiodb.com/images/media/artist/thumb/uxrqxy1347913147.jpg",
      "strArtistLogo": "https://www.theaudiodb.com/images/media/artist/logo/urspuv1434553994.png",
      "strArtistFanart": "https://www.theaudiodb.com/images/media/artist/fanart/spvryu1347980801.jpg",
      "strArtistBanner": "https://www.theaudiodb.com/images/media/artist/banner/xuypqw1342640163.jpg",
      "strMusicBrainzID": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
      "strLastFMChart": "https://www.last.fm/music/Nirvana",
      "intCharted": "5",
      "strLocked": "unlocked"
    }
  ]
}
```

#### Album by MusicBrainz ID

**Endpoint**: `GET /1/album-mb.php?i={mbid}`

**Response**: Similar structure with album-specific fields

### Data Extraction

**Biography**:

```python
def extract_biography(artist_data):
    """Extract biography with language fallback"""
    languages = ['EN', 'DE', 'FR', 'ES', 'IT', 'JP']
    for lang in languages:
        bio = artist_data.get(f'strBiography{lang}')
        if bio:
            return bio
    return None
```

**Images**:

```python
def extract_images(artist_data):
    """Extract image URLs"""
    images = []

    if artist_data.get('strArtistThumb'):
        images.append({
            'Url': artist_data['strArtistThumb'],
            'CoverType': 'poster',
            'Extension': '.jpg'
        })

    if artist_data.get('strArtistLogo'):
        images.append({
            'Url': artist_data['strArtistLogo'],
            'CoverType': 'logo',
            'Extension': '.png'
        })

    if artist_data.get('strArtistFanart'):
        images.append({
            'Url': artist_data['strArtistFanart'],
            'CoverType': 'fanart',
            'Extension': '.jpg'
        })

    if artist_data.get('strArtistBanner'):
        images.append({
            'Url': artist_data['strArtistBanner'],
            'CoverType': 'banner',
            'Extension': '.jpg'
        })

    return images
```

**Links**:

```python
def extract_links(artist_data):
    """Extract social media links"""
    links = []

    if artist_data.get('strWebsite'):
        links.append({
            'Url': f"http://{artist_data['strWebsite']}",
            'Name': 'website'
        })

    if artist_data.get('strFacebook'):
        links.append({
            'Url': f"https://{artist_data['strFacebook']}",
            'Name': 'facebook'
        })

    if artist_data.get('strTwitter'):
        links.append({
            'Url': f"https://{artist_data['strTwitter']}",
            'Name': 'twitter'
        })

    return links
```

### Error Handling

**404 Not Found**: Artist/album not in TheAudioDB

**Timeout**: 10-second timeout, fallback to other providers

**Invalid JSON**: Graceful degradation

## 6. Wikipedia

### Overview

**Type**: Web scraping

**Purpose**: Artist biographical information

**Base URL**: `https://{lang}.wikipedia.org`

**Authentication**: None (public access)

### Configuration

```python
WIKIPEDIA = {
    'timeout': 2,
    'max_connections_per_host': 1,
    'user_agent': 'LidarrMetadataAPI/10.0.0 (https://github.com/Lidarr/LidarrAPI.Metadata)',
    'languages': [
        'en', 'fr', 'de', 'es', 'it', 'ja', 'zh', 'ru',
        'pt', 'nl', 'sv', 'fi', 'no', 'da', 'pl', 'cs',
        'hu', 'ro', 'tr', 'el', 'he', 'ar', 'fa', 'hi',
        'th', 'ko', 'vi', 'id', 'ms', 'tl', 'bn', 'ta'
    ]
}
```

### Lookup Process

**Multi-stage lookup**:

1. **MusicBrainz → Wikidata**: Extract Wikidata ID from MusicBrainz links
2. **Wikidata → Wikipedia**: Get Wikipedia article title from Wikidata
3. **Wikipedia → Extract**: Scrape and parse Wikipedia article

### Wikidata Integration

**Wikidata entity URL**: `https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json`

**Extract Wikipedia links**:

```python
async def get_wikipedia_title_from_wikidata(wikidata_id, language='en'):
    """Get Wikipedia article title from Wikidata entity"""
    async with aiohttp.ClientSession() as session:
        url = f"https://www.wikidata.org/wiki/Special:EntityData/{wikidata_id}.json"
        timeout = aiohttp.ClientTimeout(total=config.WIKIPEDIA['timeout'])
        async with session.get(url, timeout=timeout) as response:
            response.raise_for_status()
            data = await response.json()
            entity = data['entities'][wikidata_id]

            # Get Wikipedia link for language
            sitelinks = entity.get('sitelinks', {})
            wiki_key = f'{language}wiki'
            if wiki_key in sitelinks:
                return sitelinks[wiki_key]['title']
            return None
```

### Wikipedia Article Extraction

**Fetch article HTML**:

```python
from urllib.parse import quote

async def get_wikipedia_article(title, language='en'):
    """Fetch Wikipedia article HTML"""
    async with aiohttp.ClientSession() as session:
        # Percent-encode the title; Wikipedia URLs use underscores for spaces
        url = f"https://{language}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"
        headers = {'User-Agent': config.WIKIPEDIA['user_agent']}
        timeout = aiohttp.ClientTimeout(total=config.WIKIPEDIA['timeout'])
        async with session.get(url, headers=headers, timeout=timeout) as response:
            if response.status == 404:
                return None
            response.raise_for_status()
            return await response.text()
```

**Parse and extract summary**:

```python
from bs4 import BeautifulSoup

def extract_wikipedia_summary(html):
    """Extract first paragraph as summary"""
    soup = BeautifulSoup(html, 'lxml')

    # Find main content div
    content = soup.find('div', {'id': 'mw-content-text'})
    if not content:
        return None

    # Article paragraphs are direct children of the parser-output wrapper
    # when it is present; otherwise fall back to the content div itself
    body = content.find('div', class_='mw-parser-output') or content

    # Find first paragraph (skip disambiguation notices)
    for p in body.find_all('p', recursive=False):
        text = p.get_text().strip()

        # Skip empty paragraphs
        if not text:
            continue

        # Skip coordinate-only paragraphs
        if text.startswith('Coordinates:'):
            continue

        # Return first substantial paragraph
        if len(text) > 50:
            return text

    return None
```

### Language Fallback

**32-language fallback chain**:

```python
async def get_artist_overview(mbid):
    """Get artist overview with language fallback"""
    # Get Wikidata ID from MusicBrainz
    wikidata_id = await get_wikidata_id_from_musicbrainz(mbid)
    if not wikidata_id:
        return None

    # Try each language in order
    for language in config.WIKIPEDIA['languages']:
        try:
            # Get Wikipedia title for language
            title = await get_wikipedia_title_from_wikidata(wikidata_id, language)
            if not title:
                continue

            # Fetch and parse article
            html = await get_wikipedia_article(title, language)
            if not html:
                continue

            summary = extract_wikipedia_summary(html)
            if summary:
                return summary
        except Exception as e:
            logger.debug(f"Wikipedia lookup failed for {language}: {e}")
            continue

    return None
```

### Rate Limiting

**Polite crawling**:

- 1 connection per host maximum
- 2-second timeout per request
- User-Agent header identifies bot
- Respect robots.txt (manual check)

**No explicit rate limit**: Wikipedia allows reasonable bot traffic

### Error Handling

**404 Not Found**: Article doesn't exist in language

**Timeout**: 2-second timeout, try next language

**Parse errors**: Graceful degradation, try next language

**Fallback**: Use TheAudioDB biography if Wikipedia unavailable

## 7. Spotify

### Overview

**Type**: REST API with OAuth

**Purpose**: ID mapping and cross-platform linking

**Base URL**: `https://api.spotify.com/v1`

**Authentication**: OAuth 2.0 Client Credentials

**Library**: spotipy 2.16.1

### Configuration

```python
SPOTIFY = {
    'client_id': 'your-client-id',
    'client_secret': 'your-client-secret',
    'redirect_uri': 'http://localhost:5001/spotify/callback',
    'timeout': 5
}
```

### OAuth Flow

**Client Credentials Grant** (for server-to-server):

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

auth_manager = SpotifyClientCredentials(
    client_id=config.SPOTIFY['client_id'],
    client_secret=config.SPOTIFY['client_secret']
)
spotify = spotipy.Spotify(auth_manager=auth_manager)
```

**Token caching**: Tokens cached in Redis with automatic refresh

### ID Mapping

**MusicBrainz → Spotify**:

```python
async def map_musicbrainz_to_spotify(mbid, artist_name):
    """Map MusicBrainz ID to Spotify ID"""
    # Search Spotify by artist name (spotipy calls are synchronous)
    results = spotify.search(q=f'artist:{artist_name}', type='artist', limit=10)

    if not results['artists']['items']:
        return None

    # Find best match using Levenshtein distance
    best_match = None
    best_score = 0

    for artist in results['artists']['items']:
        score = levenshtein_similarity(artist_name, artist['name'])
        if score > best_score and score >= 0.8:
            best_score = score
            best_match = artist

    return best_match['id'] if best_match else None
```

**Levenshtein similarity**:

```python
from Levenshtein import ratio

def levenshtein_similarity(s1, s2):
    """Calculate Levenshtein similarity (0-1)"""
    return ratio(s1.lower(), s2.lower())
```

**Threshold**: 0.8 minimum similarity for match

### Spotify API Endpoints

**Get artist**:

```python
artist = spotify.artist('6olE6TJLqED3rqDCT0FyPh')
```

**Get album**:

```python
album = spotify.album('2guirTSEqLizK7j9i1MTTZ')
```

**Search**:

```python
results = spotify.search(q='nirvana', type='artist', limit=10)
```

### Error Handling

**429 Too Many Requests**: Retry with exponential backoff

**401 Unauthorized**: Refresh OAuth token

**404 Not Found**: Artist/album not on Spotify

**Timeout**: 5-second timeout, graceful degradation

### Caching Strategy

**Cache TTL**: 90 days (Spotify IDs rarely change)

**Cache key**: `spotify:artist:{spotify_id}` or `spotify:mbid:{mbid}`

## 8. Last.fm

### Overview

**Type**: REST API

**Purpose**: Music charts and scrobble data

**Base URL**: `https://ws.audioscrobbler.com/2.0`

**Authentication**: API key

**Library**: pylast 4.3.0

### Configuration

```python
LASTFM = {
    'api_key': 'your-api-key',
    'api_secret': 'your-api-secret',
    'timeout': 5
}
```

### pylast Integration

```python
import pylast

network = pylast.LastFMNetwork(
    api_key=config.LASTFM['api_key'],
    api_secret=config.LASTFM['api_secret']
)
```

### Chart Endpoints

**Top artists**:

```python
def get_lastfm_top_artists(limit=50):
    """Get Last.fm top artists chart"""
    top_artists = network.get_top_artists(limit=limit)

    results = []
    for artist in top_artists:
        results.append({
            'name': artist.item.name,
            'playcount': artist.weight,
            'listeners': artist.item.get_listener_count()
        })

    return results
```

**Top albums**:

```python
def get_lastfm_top_albums(limit=50):
    """Get Last.fm top albums chart"""
    # Assumes the installed pylast exposes get_top_albums on the
    # network object; verify this before relying on it
    top_albums = network.get_top_albums(limit=limit)

    results = []
    for album in top_albums:
        results.append({
            'name': album.item.title,
            'artist': album.item.artist.name,
            'playcount': album.weight
        })

    return results
```

**Top tracks**:

```python
def get_lastfm_top_tracks(limit=50):
    """Get Last.fm top tracks chart"""
    top_tracks = network.get_top_tracks(limit=limit)

    results = []
    for track in top_tracks:
        results.append({
            'name': track.item.title,
            'artist': track.item.artist.name,
            'playcount': track.weight
        })

    return results
```

### MusicBrainz Mapping

**Map Last.fm artist to MusicBrainz**:

```python
async def map_lastfm_to_musicbrainz(lastfm_artist_name):
    """Map Last.fm artist to MusicBrainz ID"""
    # Search MusicBrainz via Solr
    results = await search_artist(lastfm_artist_name, limit=5)

    if not results:
        return None

    # Return best match (first result)
    return results[0]['Id']
```

### Caching

**Cache TTL**: 6 hours (charts update daily)

**Cache key**: `lastfm:chart:{type}:{limit}`

## 9. Billboard

### Overview

**Type**: Web scraping

**Purpose**: Billboard music charts

**Base URL**: `https://www.billboard.com/charts`

**Authentication**: None

**Library**: billboard-py 7.0.0

### billboard-py Integration

```python
import billboard

def get_billboard_hot_100():
    """Get Billboard Hot 100 chart"""
    chart = billboard.ChartData('hot-100')

    results = []
    for entry in chart:
        results.append({
            'position': entry.rank,
            'title': entry.title,
            'artist': entry.artist,
            'last_position': entry.lastPos,
            'peak_position': entry.peakPos,
            'weeks_on_chart': entry.weeks
        })

    return results
```

### Supported Charts

| Chart Name | billboard-py ID | Type |
|------------|-----------------|------|
| **Hot 100** | `hot-100` | Tracks |
| **Billboard 200** | `billboard-200` | Albums |
| **Artist 100** | `artist-100` | Artists |
| **Streaming Songs** | `streaming-songs` | Tracks |
| **Radio Songs** | `radio-songs` | Tracks |
| **Digital Song Sales** | `digital-song-sales` | Tracks |

### MusicBrainz Mapping

**Map Billboard entry to MusicBrainz**:

```python
async def map_billboard_to_musicbrainz(artist_name, track_title=None):
    """Map Billboard entry to MusicBrainz"""
    # Search artist
    artist_results = await search_artist(artist_name, limit=5)

    if not artist_results:
        return None

    artist_mbid = artist_results[0]['Id']

    # If track title provided, search for recording
    if track_title:
        # Search would require recording search (not implemented)
        pass

    return artist_mbid
```

### Error Handling

**HTTP errors**: Retry with backoff

**Parse errors**: Graceful degradation

**Rate limiting**: Polite crawling (1 request per second)

### Caching

**Cache TTL**: 6 hours (charts update weekly)

**Cache key**: `billboard:chart:{chart_name}`

## 10. Apple Music / iTunes

### Overview

**Type**: RSS API

**Purpose**: Apple Music and iTunes charts

**Base URL**: `https://rss.applemarketingtools.com/api/v2`

**Authentication**: None

### RSS Feed URLs

**Top albums**:

```
https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/albums.json
```

**Top songs**:

```
https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/songs.json
```

**New releases**:

```
https://rss.applemarketingtools.com/api/v2/us/music/new-releases/100/albums.json
```

### Fetch and Parse

```python
async def get_apple_music_chart(chart_type, limit=100):
    """Fetch Apple Music chart"""
    async with aiohttp.ClientSession() as session:
        url = f"https://rss.applemarketingtools.com/api/v2/us/music/most-played/{limit}/{chart_type}.json"
        timeout = aiohttp.ClientTimeout(total=5)
        async with session.get(url, timeout=timeout) as response:
            response.raise_for_status()
            data = await response.json()

            results = []
            # Feed order is chart order, so the position is the 1-based index
            for position, entry in enumerate(data['feed']['results'], start=1):
                results.append({
                    'position': position,
                    'name': entry['name'],
                    'artist': entry['artistName'],
                    'url': entry['url'],
                    'artwork': entry['artworkUrl100']
                })

            return results
```

### MusicBrainz Mapping

**Map Apple Music entry to MusicBrainz**: Similar to Billboard mapping

### Caching

**Cache TTL**: 6 hours

**Cache key**: `apple:chart:{type}:{limit}`

## 11. RabbitMQ

### Overview

**Type**: Message queue

**Purpose**: Real-time search index updates

**Technology**: RabbitMQ 3.x

**Protocol**: AMQP 0.9.1

### Configuration

```python
RABBITMQ = {
    'host': 'rabbitmq',
    'port': 5672,
    'user': 'abc',
    'password': 'abc',
    'exchange': 'search.index',
    'artist_queue': 'search.index.artist',
    'album_queue': 'search.index.album'
}
```

### Message Format

**Artist update message**:

```json
{
  "entity_type": "artist",
  "mbid": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "action": "update",
  "timestamp": "2025-04-28T12:34:56Z"
}
```

**Album update message**:

```json
{
  "entity_type": "release_group",
  "mbid": "1b022e01-4da6-387b-8658-8678046e4cef",
  "action": "update",
  "timestamp": "2025-04-28T12:34:56Z"
}
```

### SIR (Search Index Rebuilder)

**Purpose**: Consume RabbitMQ messages and update Solr

**Process**:

1. Connect to RabbitMQ
2. Subscribe to queues
3. Consume messages
4. Query MusicBrainz DB for entity
5. Post update to Solr
6. Acknowledge message

**Container**: Separate service in docker-compose

### Monitoring

**Queue depth**:

```bash
rabbitmqctl list_queues name messages
```

**Consumer count**:

```bash
rabbitmqctl list_consumers
```

## 12. Redis

### Overview

**Type**: In-memory cache

**Purpose**: Ephemeral cache and rate limiting

**Technology**: Redis 6+

**Memory**: 512MB limit

### Configuration

```python
REDIS = {
    'url': 'redis://redis:6379/0',
    'namespace': 'lm3.7',
    'max_memory': '512mb',
    'eviction_policy': 'allkeys-lfu'
}
```

### Use Cases

1. **Hot cache**: Frequently accessed metadata
2. **Rate limiting**: Request counting
3. **Sentry deduplication**: Error tracking
4. **Invalidation locks**: Distributed locking

### Connection Pool

```python
import aioredis

async def create_redis_pool():
    """Create the Redis connection pool (aioredis 1.x API)"""
    return await aioredis.create_redis_pool(
        config.REDIS['url'],
        minsize=5,
        maxsize=20,
        encoding='utf-8'
    )
```

## 13. Sentry

### Overview

**Type**: Error tracking

**Purpose**: Application monitoring

**Technology**: Sentry SaaS

**Library**: sentry-sdk 0.19.5

### Configuration

```python
import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

sentry_sdk.init(
    dsn=config.SENTRY_DSN,
    integrations=[FlaskIntegration()],
    release=f"lidarr-metadata@{__version__}",
    environment=config.ENVIRONMENT,
    traces_sample_rate=0.1
)
```

### Redis-Based Rate Limiting

**Purpose**: Prevent alert fatigue

```python
import hashlib

class SentryRedisTtlProcessor:
    """Rate limit Sentry events using Redis"""

    def __init__(self, redis, ttl=3600):
        self.redis = redis
        self.ttl = ttl

    async def __call__(self, event, hint):
        # Sentry stores exceptions under event['exception']['values'];
        # the last entry is the most recently raised exception
        values = event.get('exception', {}).get('values') or []
        if not values:
            return event  # Nothing to deduplicate on
        exc = values[-1]

        # Generate error hash
        error_hash = hashlib.md5(
            f"{exc.get('type')}:{exc.get('value')}".encode()
        ).hexdigest()

        key = f"lm3.7:sentry:{error_hash}"

        # Check if error seen recently
        if await self.redis.exists(key):
            return None  # Drop event

        # Mark error as seen
        await self.redis.setex(key, self.ttl, "1")
        return event
```

### Release Tracking

**Sentry releases**: Tied to git commits

**CI/CD integration**:

```bash
sentry-cli releases new "lidarr-metadata@${GIT_SHA}"
sentry-cli releases set-commits "lidarr-metadata@${GIT_SHA}" --auto
sentry-cli releases finalize "lidarr-metadata@${GIT_SHA}"
```

## 14. Telegraf

### Overview

**Type**: Metrics collection

**Purpose**: StatsD metrics aggregation

**Technology**: Telegraf (InfluxData)

**Protocol**: StatsD

### Configuration

```python
TELEGRAF = {
    'host': 'telegraf',
    'port': 8125,
    'prefix': 'lidarr.metadata'
}
```

### StatsD Client

```python
import statsd

stats = statsd.StatsClient(
    host=config.TELEGRAF['host'],
    port=config.TELEGRAF['port'],
    prefix=config.TELEGRAF['prefix']
)
```

### Metrics

**Request counters**:

```python
stats.incr('requests.artist')
stats.incr('requests.album')
stats.incr('requests.search')
```

**Response times**:

```python
with stats.timer('response_time.artist'):
    artist = await get_artist(mbid)
```

**Cache hits/misses**:

```python
stats.incr('cache.hit')
stats.incr('cache.miss')
```

**Provider requests**:

```python
stats.incr('provider.fanart.request')
stats.incr('provider.wikipedia.request')
```

## 15. Cloudflare

### Overview

**Type**: CDN and edge caching

**Purpose**: Global content delivery

**Technology**: Cloudflare CDN

**API**: Cloudflare REST API v4

### Configuration

```python
CLOUDFLARE = {
    'zone_id': 'your-zone-id',
    'api_token': 'your-api-token',
    'base_url': 'https://api.cloudflare.com/client/v4'
}
```

### Cache Purge

**Purge by URL**:

```python
def chunks(items, size):
    """Yield successive slices of at most `size` items"""
    for start in range(0, len(items), size):
        yield items[start:start + size]

async def purge_cloudflare_cache(urls):
    """Purge Cloudflare cache for URLs"""
    async with aiohttp.ClientSession() as session:
        headers = {
            'Authorization': f"Bearer {config.CLOUDFLARE['api_token']}",
            'Content-Type': 'application/json'
        }
        url = f"{config.CLOUDFLARE['base_url']}/zones/{config.CLOUDFLARE['zone_id']}/purge_cache"

        # Batch URLs (max 30 per request)
        for batch in chunks(urls, 30):
            data = {'files': batch}
            async with session.post(url, headers=headers, json=data) as response:
                response.raise_for_status()
```

**Purge all**:

```python
async def purge_all_cloudflare_cache():
    """Purge entire Cloudflare cache"""
    async with aiohttp.ClientSession() as session:
        headers = {
            'Authorization': f"Bearer {config.CLOUDFLARE['api_token']}",
            'Content-Type': 'application/json'
        }
        data = {'purge_everything': True}
        url = f"{config.CLOUDFLARE['base_url']}/zones/{config.CLOUDFLARE['zone_id']}/purge_cache"
        async with session.post(url, headers=headers, json=data) as response:
            response.raise_for_status()
```

### Rate Limits

**Cloudflare API**: 1200 requests per 5 minutes

**Batch purging**: Max 30 URLs per request

### Cache-Control Headers

**Set by API**:

```python
response.headers['Cache-Control'] = 's-maxage=2592000, max-age=0'
```

**Interpretation**:

- `s-maxage=2592000`: CDN caches for 30 days
- `max-age=0`: Clients must revalidate

## Integration Summary

The 15 integrations provide comprehensive metadata aggregation:

**Core data**: MusicBrainz DB (direct SQL)

**Search**: Solr (real-time via RabbitMQ)

**Images**: Cover Art Archive, FanArt.tv, TheAudioDB

**Biographies**: Wikipedia (32 languages), TheAudioDB

**Charts**: Last.fm, Billboard, Apple Music, Spotify

**Cross-platform**: Spotify ID mapping

**Infrastructure**: Redis (cache), PostgreSQL (persistent cache), RabbitMQ (messaging)

**Monitoring**: Sentry (errors), Telegraf (metrics)

**CDN**: Cloudflare (edge caching)

The integrations are separated by concern, and the image and biography providers are arranged in fallback chains for resilience.
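
The fallback chains referred to in this summary (for example, Wikipedia first and TheAudioDB second for biographies) can be sketched as a small helper that tries providers in order. This is a minimal illustration under stated assumptions, not the project's implementation: `first_available`, `wikipedia_stub`, `theaudiodb_stub`, and `demo` are hypothetical names, and the stubs only simulate a timeout and a successful lookup.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Sequence

logger = logging.getLogger(__name__)

# A provider takes an MBID and returns text, or None when it has no data
Provider = Callable[[str], Awaitable[Optional[str]]]


async def first_available(mbid: str, providers: Sequence[Provider]) -> Optional[str]:
    """Return the first non-empty result, trying providers in order."""
    for provider in providers:
        try:
            result = await provider(mbid)
        except Exception as exc:
            # A failing provider must not break the chain; log and move on
            name = getattr(provider, "__name__", repr(provider))
            logger.debug("provider %s failed: %s", name, exc)
            continue
        if result:
            return result
    return None


# Hypothetical stand-ins for the Wikipedia and TheAudioDB lookups
async def wikipedia_stub(mbid: str) -> Optional[str]:
    raise TimeoutError("simulated timeout")


async def theaudiodb_stub(mbid: str) -> Optional[str]:
    return "Nirvana was an American rock band..."


def demo() -> Optional[str]:
    """Run the chain with the primary provider failing."""
    return asyncio.run(first_available(
        "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
        [wikipedia_stub, theaudiodb_stub],
    ))
```

Calling `demo()` exercises the path where the first provider raises and the second supplies the biography. Keeping the chain as an ordered list of callables means a provider can be added, removed, or reordered without touching the lookup logic.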