
# Lidarr Metadata API - External Integrations

## Integration Overview

The Lidarr Metadata API integrates with 15 external systems to provide comprehensive music metadata aggregation:

| Integration | Type | Purpose | Authentication | Rate Limit |
|-------------|------|---------|----------------|------------|
| **MusicBrainz DB** | Database | Core metadata | Read-only user | N/A |
| **Solr Search** | Search Engine | Full-text search | None | N/A |
| **Cover Art Archive** | CDN | Album cover art | None | N/A |
| **FanArt.tv** | REST API | Artist/album images | API key | Unknown (free keys: 7-day image lag) |
| **TheAudioDB** | REST API | Metadata fallback | API key "1" | Unknown |
| **Wikipedia** | Web Scraping | Artist biographies | None | Polite crawling |
| **Spotify** | REST API | ID mapping, charts | OAuth | 429 handling |
| **Last.fm** | REST API | Charts | API key | Unknown |
| **Billboard** | Web Scraping | Charts | None | Polite crawling |
| **Apple Music** | RSS API | Charts | None | N/A |
| **RabbitMQ** | Message Queue | Search index updates | Basic auth | N/A |
| **Redis** | Cache | Ephemeral cache | None | N/A |
| **Sentry** | Error Tracking | Monitoring | DSN | Redis-based |
| **Telegraf** | Metrics | StatsD metrics | None | N/A |
| **Cloudflare** | CDN | Edge caching | API token | 1200 req/5min |

## 1. MusicBrainz Database

### Overview

**Type**: Direct PostgreSQL database access

**Purpose**: Authoritative source for all music metadata

**Container**: `ghcr.io/lidarr/mb-postgres:1.0.10`

**Access method**: Read-only asyncpg connection

### Configuration

```python
MUSICBRAINZ_DB = {
    'host': 'musicbrainz',
    'port': 5432,
    'database': 'musicbrainz_db',
    'user': 'musicbrainz_ro',  # Read-only user
    'password': 'abc',
    'min_pool_size': 10,
    'max_pool_size': 50,
    'command_timeout': 30
}
```

### Connection Pool

```python
import asyncpg

pool = await asyncpg.create_pool(
    host=config.MUSICBRAINZ_DB['host'],
    port=config.MUSICBRAINZ_DB['port'],
    database=config.MUSICBRAINZ_DB['database'],
    user=config.MUSICBRAINZ_DB['user'],
    password=config.MUSICBRAINZ_DB['password'],
    min_size=config.MUSICBRAINZ_DB['min_pool_size'],
    max_size=config.MUSICBRAINZ_DB['max_pool_size'],
    command_timeout=config.MUSICBRAINZ_DB['command_timeout']
)
```

### Replication Setup

**Replication method**: MusicBrainz replication packets

**Update frequency**: Hourly

**Replication script**: Custom script in container

**Process**:
1. Check current replication sequence
2. Download replication packets from MusicBrainz FTP
3. Apply SQL changes
4. Update replication control table
5. Trigger search index updates

**Monitoring replication lag**:
```sql
SELECT
    current_replication_sequence,
    last_replication_date,
    NOW() - last_replication_date AS lag
FROM replication_control;
```

### Database Size and Performance

**Database size**: 100GB+ (full MusicBrainz dataset)

**Query performance**:
- Simple artist lookup: 50-100ms
- Complex artist with releases: 100-500ms
- Album with tracks: 200-1000ms
- Change detection query: 500-2000ms

**Optimization**: Custom indices on `last_updated` columns

### Security Considerations

**Read-only access**: User has SELECT-only permissions

**Network isolation**: Database accessible only within Docker network

**Credentials**: Hardcoded (insecure default, should be changed)
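
One way to avoid the hardcoded credentials is to read them from the environment at startup. The sketch below is illustrative only: the `MB_DB_*` variable names and the `load_musicbrainz_db_config` helper are hypothetical and not part of the project.

```python
import os


def load_musicbrainz_db_config(env=None):
    """Build the MUSICBRAINZ_DB dict from environment variables.

    The MB_DB_* variable names are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env

    # Fail fast instead of silently falling back to a default password
    password = env.get('MB_DB_PASSWORD')
    if not password:
        raise RuntimeError('MB_DB_PASSWORD must be set')

    return {
        'host': env.get('MB_DB_HOST', 'musicbrainz'),
        'port': int(env.get('MB_DB_PORT', '5432')),
        'database': env.get('MB_DB_NAME', 'musicbrainz_db'),
        'user': env.get('MB_DB_USER', 'musicbrainz_ro'),
        'password': password,
        'min_pool_size': 10,
        'max_pool_size': 50,
        'command_timeout': 30,
    }
```

The resulting dict has the same keys as `MUSICBRAINZ_DB`, so it could be passed to the connection pool setup unchanged.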

## 2. Solr Search

### Overview

**Type**: Apache Solr 8.x search engine

**Purpose**: Full-text search for artists and albums

**Container**: `ghcr.io/lidarr/mb-solr:3.3.1.9`

**Cores**: `artist`, `release-group`

### Configuration

```python
SOLR = {
    'url': 'http://solr:8983/solr',
    'artist_core': 'artist',
    'album_core': 'release-group',
    'timeout': 5,
    'rows': 10
}
```

### Query Interface

**HTTP client**: aiohttp

**Query format**: JSON API

**Example query**:
```python
import aiohttp

async def search_artist(query, limit=10):
    async with aiohttp.ClientSession() as session:
        params = {
            'q': query,
            'defType': 'dismax',
            'qf': 'artist^2 sortname alias',
            'mm': '1',
            'rows': limit,
            'wt': 'json'
        }

        async with session.get(
            f"{config.SOLR['url']}/{config.SOLR['artist_core']}/select",
            params=params,
            timeout=aiohttp.ClientTimeout(total=config.SOLR['timeout'])
        ) as response:
            # Surface Solr HTTP errors instead of failing later on a missing key
            response.raise_for_status()
            data = await response.json()
            return data['response']['docs']
```

### Real-Time Index Updates

**Update mechanism**: RabbitMQ + SIR (Search Index Rebuilder)

**Process**:
1. MusicBrainz database changes trigger RabbitMQ messages
2. SIR consumes messages from queue
3. SIR queries MusicBrainz DB for updated entity
4. SIR posts update to Solr core
5. Solr performs soft commit (1 second)

**Update latency**: 1-5 seconds from database change

### Index Maintenance

**Full reindex**: Required after schema changes

**Reindex process**:
```bash
# Stop SIR
docker-compose stop indexer

# Clear Solr cores
curl "http://solr:8983/solr/artist/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'
curl "http://solr:8983/solr/release-group/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

# Rebuild indices
docker-compose run indexer rebuild-artist
docker-compose run indexer rebuild-album

# Restart SIR
docker-compose start indexer
```

**Reindex duration**: 4-8 hours for full MusicBrainz dataset

### Performance Tuning

**JVM heap size**: 2GB

**Solr cache settings**:
```xml
<filterCache size="512" initialSize="512" autowarmCount="256"/>
<queryResultCache size="512" initialSize="512" autowarmCount="256"/>
<documentCache size="512" initialSize="512" autowarmCount="0"/>
```

**Commit settings**:
```xml
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```

## 3. Cover Art Archive

### Overview

**Type**: Image CDN

**Purpose**: Album cover art images

**Base URL**: `https://coverartarchive.org`

**Proxy**: `https://imagecache.lidarr.audio`

### Image URL Format

**Direct URL**:
```
https://coverartarchive.org/release/{release-mbid}/front-500.jpg
```

**Proxied URL**:
```
https://imagecache.lidarr.audio/cover/{release-mbid}/front.jpg
```

### Image Types

| Type | Description | Typical Size |
|------|-------------|--------------|
| `front` | Front cover | 500x500 - 1200x1200 |
| `back` | Back cover | 500x500 - 1200x1200 |
| `booklet` | Booklet pages | Variable |
| `medium` | Disc/vinyl image | 500x500 |
| `tray` | CD tray card | Variable |
| `obi` | Japanese obi strip | Variable |
| `spine` | Spine image | Variable |
| `track` | Track listing | Variable |
| `liner` | Liner notes | Variable |
| `sticker` | Sticker image | Variable |
| `poster` | Poster image | Variable |

### Image Proxy Benefits

**Advantages of using imagecache.lidarr.audio**:
1. **Caching**: Images cached at edge for faster delivery
2. **Resizing**: Automatic image resizing via query parameters
3. **Format conversion**: WebP conversion for modern browsers
4. **Bandwidth**: Reduced load on Cover Art Archive
5. **Reliability**: Fallback to direct URL if proxy fails

**Proxy query parameters**:
```
https://imagecache.lidarr.audio/cover/{mbid}/front.jpg?size=500&format=webp
```
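
A small helper can assemble proxied URLs with the `size` and `format` parameters shown above. This is a sketch; `build_proxy_cover_url` is a hypothetical name, and the set of values the proxy accepts is not confirmed beyond the example.

```python
from urllib.parse import urlencode

IMAGE_PROXY_BASE = 'https://imagecache.lidarr.audio'


def build_proxy_cover_url(release_mbid, size=None, image_format=None):
    """Build a proxied front-cover URL with optional resize/format parameters."""
    url = f"{IMAGE_PROXY_BASE}/cover/{release_mbid}/front.jpg"

    # Only include parameters that were actually requested
    params = {}
    if size is not None:
        params['size'] = size
    if image_format is not None:
        params['format'] = image_format

    return f"{url}?{urlencode(params)}" if params else url
```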

### Integration Code

```python
async def get_cover_art(release_mbid):
    """Fetch cover art URLs for release"""
    images = []

    # Try proxy first
    proxy_url = f"https://imagecache.lidarr.audio/cover/{release_mbid}/front.jpg"
    if await check_url_exists(proxy_url):
        images.append({
            'Url': proxy_url,
            'CoverType': 'cover',
            'Extension': '.jpg'
        })
    else:
        # Fallback to direct URL
        direct_url = f"https://coverartarchive.org/release/{release_mbid}/front-500.jpg"
        if await check_url_exists(direct_url):
            images.append({
                'Url': direct_url,
                'CoverType': 'cover',
                'Extension': '.jpg'
            })

    return images
```

### Error Handling

**404 Not Found**: No cover art available for release

**503 Service Unavailable**: Cover Art Archive temporarily down

**Fallback**: Use FanArt.tv or TheAudioDB images
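
The fallback behaviour can be sketched as an ordered chain of providers, where the first one that returns any images wins. The provider interface here (async callables that take an MBID and return a list of image dicts) is an assumption made for illustration, not the project's actual abstraction.

```python
import asyncio


async def first_available_images(release_mbid, providers):
    """Return images from the first provider that yields any.

    `providers` is an ordered list of async callables taking an MBID and
    returning a list of image dicts (hypothetical interface for this sketch).
    """
    for provider in providers:
        try:
            images = await provider(release_mbid)
        except Exception:
            # Treat provider failures (e.g. 503, timeout) like "no images"
            continue
        if images:
            return images
    return []
```

A caller would pass the providers in priority order, for example Cover Art Archive first, then FanArt.tv, then TheAudioDB, and run the coroutine with `await` or `asyncio.run(...)`.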

## 4. FanArt.tv

### Overview

**Type**: REST API

**Purpose**: High-quality artist and album images

**Base URL**: `https://webservice.fanart.tv/v3`

**Authentication**: API key

### Configuration

```python
FANART = {
    'api_key': 'your-api-key-here',
    'base_url': 'https://webservice.fanart.tv/v3',
    'timeout': 10,
    'cache_ttl': 2592000  # 30 days
}
```

### API Key Types

| Key Type | Cost | Rate Limit | Image Lag |
|----------|------|------------|-----------|
| **Free** | Free | Unknown | 7 days |
| **Personal** | $2/month | Higher | No lag |
| **Commercial** | $5/month | Highest | No lag |

**7-day lag**: Free API keys only return images added 7+ days ago

### Endpoints

#### Artist Images

**Endpoint**: `GET /music/{mbid}`

**Request**:
```python
async def get_fanart_artist_images(mbid):
    async with aiohttp.ClientSession() as session:
        headers = {'api-key': config.FANART['api_key']}
        url = f"{config.FANART['base_url']}/music/{mbid}"

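        # Use the configured timeout; aiohttp expects a ClientTimeout object
        timeout = aiohttp.ClientTimeout(total=config.FANART['timeout'])
        async with session.get(url, headers=headers, timeout=timeout) as response:
            if response.status == 404:
                # Empty dict keeps the return type consistent with the JSON object
                return {}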
            response.raise_for_status()
            return await response.json()
```

**Response**:
```json
{
  "name": "Nirvana",
  "mbid_id": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "artistbackground": [
    {
      "id": "12345",
      "url": "https://assets.fanart.tv/fanart/music/5b11f4ce-a62d-471e-81fc-a69a8278c7da/artistbackground/nirvana-1.jpg",
      "likes": "42"
    }
  ],
  "artistthumb": [...],
  "hdmusiclogo": [...],
  "musicbanner": [...],
  "musiclogo": [...]
}
```

#### Album Images

**Endpoint**: `GET /music/albums/{mbid}`

**Response**:
```json
{
  "albums": {
    "1b022e01-4da6-387b-8658-8678046e4cef": {
      "albumcover": [
        {
          "id": "67890",
          "url": "https://assets.fanart.tv/fanart/music/1b022e01-4da6-387b-8658-8678046e4cef/albumcover/nevermind-1.jpg",
          "likes": "156"
        }
      ],
      "cdart": [...]
    }
  }
}
```

### Image Types

**Artist images**:
- `artistbackground`: Background images (1920x1080)
- `artistthumb`: Artist thumbnails (1000x1000)
- `hdmusiclogo`: HD logos (transparent PNG)
- `musicbanner`: Banners (1000x185)
- `musiclogo`: Standard logos (transparent PNG)

**Album images**:
- `albumcover`: Album covers (1000x1000)
- `cdart`: CD art (transparent PNG)

### Mapping to Lidarr Image Types

```python
FANART_TYPE_MAPPING = {
    'artistbackground': 'fanart',
    'artistthumb': 'poster',
    'hdmusiclogo': 'logo',
    'musicbanner': 'banner',
    'musiclogo': 'logo',
    'albumcover': 'cover',
    'cdart': 'disc'
}
```
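
To show how the mapping might be applied, the sketch below flattens an artist response into the `Url` / `CoverType` / `Extension` dicts used elsewhere in this document. `map_fanart_images` is a hypothetical helper, and deriving the extension from the URL path is an assumption.

```python
import os
from urllib.parse import urlparse

FANART_TYPE_MAPPING = {
    'artistbackground': 'fanart',
    'artistthumb': 'poster',
    'hdmusiclogo': 'logo',
    'musicbanner': 'banner',
    'musiclogo': 'logo',
    'albumcover': 'cover',
    'cdart': 'disc',
}


def map_fanart_images(fanart_response):
    """Flatten a FanArt.tv response object into Lidarr-style image dicts."""
    images = []
    for fanart_type, cover_type in FANART_TYPE_MAPPING.items():
        for item in fanart_response.get(fanart_type, []):
            url = item.get('url')
            if not url:
                continue
            # Derive the extension from the URL path, e.g. ".jpg" or ".png"
            extension = os.path.splitext(urlparse(url).path)[1]
            images.append({
                'Url': url,
                'CoverType': cover_type,
                'Extension': extension,
            })
    return images
```

For album responses, the same function would be called on each entry under the `albums` key rather than on the top-level object.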

### Error Handling

**404 Not Found**: No images available for artist/album

**429 Too Many Requests**: Rate limit exceeded (retry with backoff)

**503 Service Unavailable**: FanArt.tv temporarily down

**Fallback**: Use TheAudioDB or Cover Art Archive

### Caching Strategy

**Cache TTL**: 30 days (images rarely change)

**Cache key**: `fanart:artist:{mbid}` or `fanart:album:{mbid}`

**Invalidation**: Manual only (images are immutable)

## 5. TheAudioDB

### Overview

**Type**: REST API

**Purpose**: Fallback metadata and images

**Base URL**: `https://theaudiodb.com/api/v1/json`

**Authentication**: API key "1" (public key)

### Configuration

```python
THEAUDIODB = {
    'api_key': '1',
    'base_url': 'https://theaudiodb.com/api/v1/json',
    'timeout': 10,
    'cache_ttl': 2592000  # 30 days
}
```

### Endpoints

#### Artist by MusicBrainz ID

**Endpoint**: `GET /1/artist-mb.php?i={mbid}`

**Request**:
```python
async def get_theaudiodb_artist(mbid):
    async with aiohttp.ClientSession() as session:
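        # Build the path from the configured key instead of hardcoding "1"
        url = f"{config.THEAUDIODB['base_url']}/{config.THEAUDIODB['api_key']}/artist-mb.php"
        params = {'i': mbid}
        timeout = aiohttp.ClientTimeout(total=config.THEAUDIODB['timeout'])

        async with session.get(url, params=params, timeout=timeout) as response:
            if response.status == 404:
                return None
            response.raise_for_status()
            data = await response.json()
            # "artists" may be null or absent when nothing matches
            artists = data.get('artists')
            return artists[0] if artists else None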
```

**Response**:
```json
{
  "artists": [
    {
      "idArtist": "111247",
      "strArtist": "Nirvana",
      "strArtistAlternate": "",
      "strLabel": "DGC Records",
      "idLabel": "45114",
      "intFormedYear": "1987",
      "intBornYear": "",
      "intDiedYear": "",
      "strDisbanded": "1994",
      "strStyle": "Grunge",
      "strGenre": "Rock",
      "strMood": "Angry",
      "strWebsite": "www.nirvana.com",
      "strFacebook": "www.facebook.com/Nirvana",
      "strTwitter": "twitter.com/nirvana",
      "strBiographyEN": "Nirvana was an American rock band...",
      "strBiographyDE": null,
      "strBiographyFR": null,
      "strGender": "Male",
      "strCountry": "United States",
      "strCountryCode": "US",
      "strArtistThumb": "https://www.theaudiodb.com/images/media/artist/thumb/uxrqxy1347913147.jpg",
      "strArtistLogo": "https://www.theaudiodb.com/images/media/artist/logo/urspuv1434553994.png",
      "strArtistFanart": "https://www.theaudiodb.com/images/media/artist/fanart/spvryu1347980801.jpg",
      "strArtistBanner": "https://www.theaudiodb.com/images/media/artist/banner/xuypqw1342640163.jpg",
      "strMusicBrainzID": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
      "strLastFMChart": "https://www.last.fm/music/Nirvana",
      "intCharted": "5",
      "strLocked": "unlocked"
    }
  ]
}
```

#### Album by MusicBrainz ID

**Endpoint**: `GET /1/album-mb.php?i={mbid}`

**Response**: Similar structure with album-specific fields

### Data Extraction

**Biography**:
```python
def extract_biography(artist_data):
    """Extract biography with language fallback"""
    languages = ['EN', 'DE', 'FR', 'ES', 'IT', 'JP']
    for lang in languages:
        bio = artist_data.get(f'strBiography{lang}')
        if bio:
            return bio
    return None
```

**Images**:
```python
def extract_images(artist_data):
    """Extract image URLs"""
    images = []

    if artist_data.get('strArtistThumb'):
        images.append({
            'Url': artist_data['strArtistThumb'],
            'CoverType': 'poster',
            'Extension': '.jpg'
        })

    if artist_data.get('strArtistLogo'):
        images.append({
            'Url': artist_data['strArtistLogo'],
            'CoverType': 'logo',
            'Extension': '.png'
        })

    if artist_data.get('strArtistFanart'):
        images.append({
            'Url': artist_data['strArtistFanart'],
            'CoverType': 'fanart',
            'Extension': '.jpg'
        })

    if artist_data.get('strArtistBanner'):
        images.append({
            'Url': artist_data['strArtistBanner'],
            'CoverType': 'banner',
            'Extension': '.jpg'
        })

    return images
```

**Links**:
```python
def extract_links(artist_data):
    """Extract social media links"""
    links = []

    if artist_data.get('strWebsite'):
        links.append({
            'Url': f"http://{artist_data['strWebsite']}",
            'Name': 'website'
        })

    if artist_data.get('strFacebook'):
        links.append({
            'Url': f"https://{artist_data['strFacebook']}",
            'Name': 'facebook'
        })

    if artist_data.get('strTwitter'):
        links.append({
            'Url': f"https://{artist_data['strTwitter']}",
            'Name': 'twitter'
        })

    return links
```
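
`extract_links` prepends a scheme unconditionally, which would produce a malformed URL if a field already contained one. Assuming that can happen (the sample response only shows scheme-less values), a hypothetical `normalize_link` helper could guard against it:

```python
def normalize_link(raw_url, default_scheme='https'):
    """Return an absolute URL, adding a scheme only when one is missing."""
    url = raw_url.strip()
    if not url:
        return None
    if url.lower().startswith(('http://', 'https://')):
        return url
    # Also handles protocol-relative values such as "//example.com"
    return f"{default_scheme}://{url.lstrip('/')}"
```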

### Error Handling

**404 Not Found**: Artist/album not in TheAudioDB

**Timeout**: 10-second timeout, fallback to other providers

**Invalid JSON**: Graceful degradation

## 6. Wikipedia

### Overview

**Type**: Web scraping

**Purpose**: Artist biographical information

**Base URL**: `https://{lang}.wikipedia.org`

**Authentication**: None (public access)

### Configuration

```python
WIKIPEDIA = {
    'timeout': 2,
    'max_connections_per_host': 1,
    'user_agent': 'LidarrMetadataAPI/10.0.0 (https://github.com/Lidarr/LidarrAPI.Metadata)',
    'languages': ['en', 'fr', 'de', 'es', 'it', 'ja', 'zh', 'ru', 'pt', 'nl', 'sv', 'fi', 'no', 'da', 'pl', 'cs', 'hu', 'ro', 'tr', 'el', 'he', 'ar', 'fa', 'hi', 'th', 'ko', 'vi', 'id', 'ms', 'tl', 'bn', 'ta']
}
```

### Lookup Process

**Multi-stage lookup**:

1. **MusicBrainz → Wikidata**: Extract Wikidata ID from MusicBrainz links
2. **Wikidata → Wikipedia**: Get Wikipedia article title from Wikidata
3. **Wikipedia → Extract**: Scrape and parse Wikipedia article
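
The first stage amounts to finding a Wikidata URL among the artist's MusicBrainz links and pulling the entity ID out of it. A minimal sketch, assuming the links are available as a list of URL strings (`extract_wikidata_id` is a hypothetical helper):

```python
import re

# Matches both /wiki/Q123 and /entity/Q123 URL forms
_WIKIDATA_ID = re.compile(r'wikidata\.org/(?:wiki|entity)/(Q\d+)')


def extract_wikidata_id(urls):
    """Return the first Wikidata entity ID (e.g. "Q11649") found in a list of URLs."""
    for url in urls:
        match = _WIKIDATA_ID.search(url)
        if match:
            return match.group(1)
    return None
```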

### Wikidata Integration

**Wikidata entity URL**: `https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json`

**Extract Wikipedia links**:
```python
async def get_wikipedia_title_from_wikidata(wikidata_id, language='en'):
    """Get Wikipedia article title from Wikidata entity"""
    async with aiohttp.ClientSession() as session:
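        url = f"https://www.wikidata.org/wiki/Special:EntityData/{wikidata_id}.json"
        # Identify the client to Wikidata as well, and use the configured timeout
        headers = {'User-Agent': config.WIKIPEDIA['user_agent']}
        timeout = aiohttp.ClientTimeout(total=config.WIKIPEDIA['timeout'])

        async with session.get(url, headers=headers, timeout=timeout) as response:
            response.raise_for_status()
            data = await response.json()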
            entity = data['entities'][wikidata_id]

            # Get Wikipedia link for language
            sitelinks = entity.get('sitelinks', {})
            wiki_key = f'{language}wiki'

            if wiki_key in sitelinks:
                return sitelinks[wiki_key]['title']

            return None
```

### Wikipedia Article Extraction

**Fetch article HTML**:
```python
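from urllib.parse import quote

async def get_wikipedia_article(title, language='en'):
    """Fetch Wikipedia article HTML"""
    async with aiohttp.ClientSession() as session:
        # Titles may contain spaces, "?", or "#", so percent-encode the path
        path = quote(title.replace(' ', '_'))
        url = f"https://{language}.wikipedia.org/wiki/{path}"
        headers = {'User-Agent': config.WIKIPEDIA['user_agent']}
        timeout = aiohttp.ClientTimeout(total=config.WIKIPEDIA['timeout'])

        async with session.get(url, headers=headers, timeout=timeout) as response: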
            if response.status == 404:
                return None
            response.raise_for_status()
            return await response.text()
```

**Parse and extract summary**:
```python
from bs4 import BeautifulSoup

def extract_wikipedia_summary(html):
    """Extract first paragraph as summary"""
    soup = BeautifulSoup(html, 'lxml')

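    # Find main content div
    content = soup.find('div', {'id': 'mw-content-text'})
    if not content:
        return None

    # Article paragraphs are usually direct children of the nested
    # "mw-parser-output" div rather than of "mw-content-text" itself
    body = content.find('div', class_='mw-parser-output') or content

    # Find first paragraph (skip disambiguation notices)
    for p in body.find_all('p', recursive=False):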
        text = p.get_text().strip()

        # Skip empty paragraphs
        if not text:
            continue

        # Skip coordinate-only paragraphs
        if text.startswith('Coordinates:'):
            continue

        # Return first substantial paragraph
        if len(text) > 50:
            return text

    return None
```

### Language Fallback

**32-language fallback chain**:

```python
async def get_artist_overview(mbid):
    """Get artist overview with language fallback"""
    # Get Wikidata ID from MusicBrainz
    wikidata_id = await get_wikidata_id_from_musicbrainz(mbid)
    if not wikidata_id:
        return None

    # Try each language in order
    for language in config.WIKIPEDIA['languages']:
        try:
            # Get Wikipedia title for language
            title = await get_wikipedia_title_from_wikidata(wikidata_id, language)
            if not title:
                continue

            # Fetch and parse article
            html = await get_wikipedia_article(title, language)
            if not html:
                continue

            summary = extract_wikipedia_summary(html)
            if summary:
                return summary

        except Exception as e:
            logger.debug(f"Wikipedia lookup failed for {language}: {e}")
            continue

    return None
```
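
As written, `get_artist_overview` calls `get_wikipedia_title_from_wikidata` once per language, so the same Wikidata entity may be fetched up to 32 times. A possible refinement is to fetch the entity's `sitelinks` once and resolve the language order locally. The helper below is a hypothetical sketch of that selection step only:

```python
def ordered_wikipedia_sitelinks(sitelinks, languages):
    """Return [(language, title), ...] for available articles, in preference order."""
    candidates = []
    for language in languages:
        # Wikidata keys Wikipedia sitelinks as "<language>wiki", e.g. "enwiki"
        entry = sitelinks.get(f'{language}wiki')
        if entry and entry.get('title'):
            candidates.append((language, entry['title']))
    return candidates
```

The caller can then iterate over the returned pairs and stop at the first article that yields a usable summary, which preserves the fallback behaviour while issuing a single Wikidata request.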

### Rate Limiting

**Polite crawling**:
- 1 connection per host maximum
- 2-second timeout per request
- User-Agent header identifies bot
- Respect robots.txt (manual check)

**No explicit rate limit**: Wikipedia allows reasonable bot traffic

### Error Handling

**404 Not Found**: Article doesn't exist in language

**Timeout**: 2-second timeout, try next language

**Parse errors**: Graceful degradation, try next language

**Fallback**: Use TheAudioDB biography if Wikipedia unavailable

## 7. Spotify

### Overview

**Type**: REST API with OAuth

**Purpose**: ID mapping and cross-platform linking

**Base URL**: `https://api.spotify.com/v1`

**Authentication**: OAuth 2.0 Client Credentials

**Library**: spotipy 2.16.1

### Configuration

```python
SPOTIFY = {
    'client_id': 'your-client-id',
    'client_secret': 'your-client-secret',
    'redirect_uri': 'http://localhost:5001/spotify/callback',
    'timeout': 5
}
```

### OAuth Flow

**Client Credentials Grant** (for server-to-server):

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

auth_manager = SpotifyClientCredentials(
    client_id=config.SPOTIFY['client_id'],
    client_secret=config.SPOTIFY['client_secret']
)

spotify = spotipy.Spotify(auth_manager=auth_manager)
```

**Token caching**: Tokens cached in Redis with automatic refresh

### ID Mapping

**MusicBrainz → Spotify**:

```python
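import asyncio

async def map_musicbrainz_to_spotify(mbid, artist_name):
    """Map MusicBrainz ID to Spotify ID"""
    # spotipy is synchronous; run the search in a worker thread (Python 3.9+)
    # so it does not block the event loop
    results = await asyncio.to_thread(
        spotify.search, q=f'artist:{artist_name}', type='artist', limit=10
    )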

    if not results['artists']['items']:
        return None

    # Find best match using Levenshtein distance
    best_match = None
    best_score = 0

    for artist in results['artists']['items']:
        score = levenshtein_similarity(artist_name, artist['name'])
        if score > best_score and score >= 0.8:
            best_score = score
            best_match = artist

    return best_match['id'] if best_match else None
```

**Levenshtein similarity**:
```python
from Levenshtein import ratio

def levenshtein_similarity(s1, s2):
    """Calculate Levenshtein similarity (0-1)"""
    return ratio(s1.lower(), s2.lower())
```

**Threshold**: 0.8 minimum similarity for match
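
To illustrate why the loop keeps the highest-scoring candidate rather than the first one above the threshold, here is a dependency-free sketch. It swaps in `difflib.SequenceMatcher` from the standard library as a stand-in for `Levenshtein.ratio`; the two produce different scores, so the 0.8 threshold would need re-checking if this substitution were used for real. `best_name_match` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def similarity(s1, s2):
    """Case-insensitive similarity in [0, 1] (difflib stand-in for Levenshtein.ratio)."""
    return SequenceMatcher(None, s1.lower(), s2.lower()).ratio()


def best_name_match(target, candidates, threshold=0.8):
    """Return the candidate most similar to `target`, or None if none reach the threshold."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = similarity(target, candidate)
        if score >= threshold and score > best_score:
            best, best_score = candidate, score
    return best
```

With candidates such as `'Nirvana UK'` and `'Nirvana'`, both can clear the threshold, so picking the maximum rather than the first acceptable score matters.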

### Spotify API Endpoints

**Get artist**:
```python
artist = spotify.artist('6olE6TJLqED3rqDCT0FyPh')
```

**Get album**:
```python
album = spotify.album('2guirTSEqLizK7j9i1MTTZ')
```

**Search**:
```python
results = spotify.search(q='nirvana', type='artist', limit=10)
```

### Error Handling

**429 Too Many Requests**: Retry with exponential backoff

**401 Unauthorized**: Refresh OAuth token

**404 Not Found**: Artist/album not on Spotify

**Timeout**: 5-second timeout, graceful degradation
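
The retry delay for 429 responses can be sketched as exponential backoff that defers to a `Retry-After` value when the server supplies one in seconds. The function name, base delay, and cap below are assumptions for illustration, not values taken from the project.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, jitter=True):
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Prefer the server's Retry-After hint when it is a valid number of seconds
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # e.g. an HTTP-date; fall back to computed backoff

    delay = min(cap, base * (2 ** attempt))
    # Full jitter spreads retries out to avoid synchronized bursts
    return random.uniform(0, delay) if jitter else delay
```

A retry loop would sleep for `backoff_delay(attempt, response_headers.get('Retry-After'))` between attempts and give up after a fixed number of tries.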

### Caching Strategy

**Cache TTL**: 90 days (Spotify IDs rarely change)

**Cache key**: `spotify:artist:{spotify_id}` or `spotify:mbid:{mbid}`

## 8. Last.fm

### Overview

**Type**: REST API

**Purpose**: Music charts and scrobble data

**Base URL**: `https://ws.audioscrobbler.com/2.0`

**Authentication**: API key

**Library**: pylast 4.3.0

### Configuration

```python
LASTFM = {
    'api_key': 'your-api-key',
    'api_secret': 'your-api-secret',
    'timeout': 5
}
```

### pylast Integration

```python
import pylast

network = pylast.LastFMNetwork(
    api_key=config.LASTFM['api_key'],
    api_secret=config.LASTFM['api_secret']
)
```

### Chart Endpoints

**Top artists**:
```python
def get_lastfm_top_artists(limit=50):
    """Get Last.fm top artists chart"""
    top_artists = network.get_top_artists(limit=limit)

    results = []
    for artist in top_artists:
        # Note: get_listener_count() typically issues one extra API request per artist
        results.append({
            'name': artist.item.name,
            'playcount': artist.weight,
            'listeners': artist.item.get_listener_count()
        })

    return results
```

**Top albums**:
```python
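def get_lastfm_top_albums(limit=50):
    """Get Last.fm top albums chart"""
    # NOTE: verify this call against the installed pylast version; Last.fm's
    # documented chart methods cover artists, tracks, and tags
    top_albums = network.get_top_albums(limit=limit)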

    results = []
    for album in top_albums:
        results.append({
            'name': album.item.title,
            'artist': album.item.artist.name,
            'playcount': album.weight
        })

    return results
```

**Top tracks**:
```python
def get_lastfm_top_tracks(limit=50):
    """Get Last.fm top tracks chart"""
    top_tracks = network.get_top_tracks(limit=limit)

    results = []
    for track in top_tracks:
        results.append({
            'name': track.item.title,
            'artist': track.item.artist.name,
            'playcount': track.weight
        })

    return results
```

### MusicBrainz Mapping

**Map Last.fm artist to MusicBrainz**:
```python
async def map_lastfm_to_musicbrainz(lastfm_artist_name):
    """Map Last.fm artist to MusicBrainz ID"""
    # Search MusicBrainz via Solr
    results = await search_artist(lastfm_artist_name, limit=5)

    if not results:
        return None

    # Return best match (first result)
    return results[0]['Id']
```

### Caching

**Cache TTL**: 6 hours (charts update daily)

**Cache key**: `lastfm:chart:{type}:{limit}`

## 9. Billboard

### Overview

**Type**: Web scraping

**Purpose**: Billboard music charts

**Base URL**: `https://www.billboard.com/charts`

**Authentication**: None

**Library**: billboard-py 7.0.0

### billboard-py Integration

```python
import billboard

def get_billboard_hot_100():
    """Get Billboard Hot 100 chart"""
    chart = billboard.ChartData('hot-100')

    results = []
    for entry in chart:
        results.append({
            'position': entry.rank,
            'title': entry.title,
            'artist': entry.artist,
            'last_position': entry.lastPos,
            'peak_position': entry.peakPos,
            'weeks_on_chart': entry.weeks
        })

    return results
```

### Supported Charts

| Chart Name | billboard-py ID | Type |
|------------|-----------------|------|
| **Hot 100** | `hot-100` | Tracks |
| **Billboard 200** | `billboard-200` | Albums |
| **Artist 100** | `artist-100` | Artists |
| **Streaming Songs** | `streaming-songs` | Tracks |
| **Radio Songs** | `radio-songs` | Tracks |
| **Digital Song Sales** | `digital-song-sales` | Tracks |

### MusicBrainz Mapping

**Map Billboard entry to MusicBrainz**:
```python
async def map_billboard_to_musicbrainz(artist_name, track_title=None):
    """Map Billboard entry to MusicBrainz"""
    # Search artist
    artist_results = await search_artist(artist_name, limit=5)
    if not artist_results:
        return None

    artist_mbid = artist_results[0]['Id']

    # If track title provided, search for recording
    if track_title:
        # Search would require recording search (not implemented)
        pass

    return artist_mbid
```

### Error Handling

**HTTP errors**: Retry with backoff

**Parse errors**: Graceful degradation

**Rate limiting**: Polite crawling (1 request per second)

### Caching

**Cache TTL**: 6 hours (charts update weekly)

**Cache key**: `billboard:chart:{chart_name}`

## 10. Apple Music / iTunes

### Overview

**Type**: RSS API

**Purpose**: Apple Music and iTunes charts

**Base URL**: `https://rss.applemarketingtools.com/api/v2`

**Authentication**: None

### RSS Feed URLs

**Top albums**:
```
https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/albums.json
```

**Top songs**:
```
https://rss.applemarketingtools.com/api/v2/us/music/most-played/100/songs.json
```

**New releases**:
```
https://rss.applemarketingtools.com/api/v2/us/music/new-releases/100/albums.json
```

### Fetch and Parse

```python
async def get_apple_music_chart(chart_type, limit=100):
    """Fetch Apple Music chart"""
    async with aiohttp.ClientSession() as session:
        url = f"https://rss.applemarketingtools.com/api/v2/us/music/most-played/{limit}/{chart_type}.json"

        timeout = aiohttp.ClientTimeout(total=5)
        async with session.get(url, timeout=timeout) as response:
            response.raise_for_status()
            data = await response.json()

            results = []
            # Feed order is chart order, so the 1-based index is the position
            for position, entry in enumerate(data['feed']['results'], start=1):
                results.append({
                    'position': position,
                    'name': entry['name'],
                    'artist': entry['artistName'],
                    'url': entry['url'],
                    'artwork': entry['artworkUrl100']
                })

            return results
```
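
`get_apple_music_chart` hardcodes the `most-played` feed, so it cannot request the new-releases URL listed earlier. A hypothetical URL builder that takes the feed name as a parameter might look like this; the `country` argument is an assumption based on the `us` segment in the listed URLs.

```python
APPLE_RSS_BASE = 'https://rss.applemarketingtools.com/api/v2'


def apple_feed_url(feed, chart_type, limit=100, country='us'):
    """Build a feed URL such as .../us/music/most-played/100/albums.json."""
    return f"{APPLE_RSS_BASE}/{country}/music/{feed}/{limit}/{chart_type}.json"


def parse_apple_feed(data):
    """Convert a decoded feed payload into positioned chart entries."""
    return [
        {'position': position, 'name': entry['name'], 'artist': entry['artistName']}
        for position, entry in enumerate(data['feed']['results'], start=1)
    ]
```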

### MusicBrainz Mapping

**Map Apple Music entry to MusicBrainz**: Similar to Billboard mapping

### Caching

**Cache TTL**: 6 hours

**Cache key**: `apple:chart:{type}:{limit}`

## 11. RabbitMQ

### Overview

**Type**: Message queue

**Purpose**: Real-time search index updates

**Technology**: RabbitMQ 3.x

**Protocol**: AMQP 0.9.1

### Configuration

```python
RABBITMQ = {
    'host': 'rabbitmq',
    'port': 5672,
    'user': 'abc',
    'password': 'abc',
    'exchange': 'search.index',
    'artist_queue': 'search.index.artist',
    'album_queue': 'search.index.album'
}
```

### Message Format

**Artist update message**:
```json
{
  "entity_type": "artist",
  "mbid": "5b11f4ce-a62d-471e-81fc-a69a8278c7da",
  "action": "update",
  "timestamp": "2025-04-28T12:34:56Z"
}
```

**Album update message**:
```json
{
  "entity_type": "release_group",
  "mbid": "1b022e01-4da6-387b-8658-8678046e4cef",
  "action": "update",
  "timestamp": "2025-04-28T12:34:56Z"
}
```
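
A consumer would need to decode and validate these messages before touching Solr. The sketch below assumes only the fields shown in the two examples; `parse_index_message` and the mapping from `entity_type` to Solr core name are illustrative, not taken from SIR.

```python
import json
from datetime import datetime

VALID_ENTITY_TYPES = {'artist', 'release_group'}


def parse_index_message(body):
    """Decode and validate an index-update message body (bytes or str)."""
    message = json.loads(body)

    entity_type = message.get('entity_type')
    if entity_type not in VALID_ENTITY_TYPES:
        raise ValueError(f"unexpected entity_type: {entity_type!r}")
    if not message.get('mbid'):
        raise ValueError('missing mbid')

    # fromisoformat() accepts a trailing "Z" only on newer Python versions,
    # so normalize it to an explicit UTC offset first
    timestamp = datetime.fromisoformat(message['timestamp'].replace('Z', '+00:00'))

    # Route to the matching Solr core name
    core = 'artist' if entity_type == 'artist' else 'release-group'
    return {
        'core': core,
        'mbid': message['mbid'],
        'action': message.get('action'),
        'timestamp': timestamp,
    }
```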

### SIR (Search Index Rebuilder)

**Purpose**: Consume RabbitMQ messages and update Solr

**Process**:
1. Connect to RabbitMQ
2. Subscribe to queues
3. Consume messages
4. Query MusicBrainz DB for entity
5. Post update to Solr
6. Acknowledge message

**Container**: Separate service in docker-compose

### Monitoring

**Queue depth**:
```bash
rabbitmqctl list_queues name messages
```

**Consumer count**:
```bash
rabbitmqctl list_consumers
```

## 12. Redis

### Overview

**Type**: In-memory cache

**Purpose**: Ephemeral cache and rate limiting

**Technology**: Redis 6+

**Memory**: 512MB limit

### Configuration

```python
REDIS = {
    'url': 'redis://redis:6379/0',
    'namespace': 'lm3.7',
    'max_memory': '512mb',
    'eviction_policy': 'allkeys-lfu'
}
```

### Use Cases

1. **Hot cache**: Frequently accessed metadata
2. **Rate limiting**: Request counting
3. **Sentry deduplication**: Error tracking
4. **Invalidation locks**: Distributed locking

### Connection Pool

```python
import aioredis

redis = await aioredis.create_redis_pool(
    config.REDIS['url'],
    minsize=5,
    maxsize=20,
    encoding='utf-8'
)
```

## 13. Sentry

### Overview

**Type**: Error tracking

**Purpose**: Application monitoring

**Technology**: Sentry SaaS

**Library**: sentry-sdk 0.19.5

### Configuration

```python
import sentry_sdk
from sentry_sdk.integrations.flask import FlaskIntegration

sentry_sdk.init(
    dsn=config.SENTRY_DSN,
    integrations=[FlaskIntegration()],
    release=f"lidarr-metadata@{__version__}",
    environment=config.ENVIRONMENT,
    traces_sample_rate=0.1
)
```

### Redis-Based Rate Limiting

**Purpose**: Prevent alert fatigue

```python
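import hashlib


class SentryRedisTtlProcessor:
    """Rate limit Sentry events using Redis.

    Intended for use as a `before_send` hook, which Sentry calls
    synchronously, so `redis` is assumed to be a synchronous redis-py client.
    """

    def __init__(self, redis, ttl=3600):
        self.redis = redis
        self.ttl = ttl

    def __call__(self, event, hint):
        # Exception details are nested under event['exception']['values']
        values = event.get('exception', {}).get('values') or []
        if not values:
            return event  # Not an exception event; pass it through

        # Generate error hash from the last (outermost) exception
        exc = values[-1]
        error_hash = hashlib.md5(
            f"{exc.get('type')}:{exc.get('value')}".encode()
        ).hexdigest()

        key = f"lm3.7:sentry:{error_hash}"

        # SET NX EX marks the error as seen atomically; a falsy result
        # means the key already existed, so this event is a repeat
        if not self.redis.set(key, "1", nx=True, ex=self.ttl):
            return None  # Drop event

        return event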

### Release Tracking

**Sentry releases**: Tied to git commits

**CI/CD integration**:
```bash
sentry-cli releases new "lidarr-metadata@${GIT_SHA}"
sentry-cli releases set-commits "lidarr-metadata@${GIT_SHA}" --auto
sentry-cli releases finalize "lidarr-metadata@${GIT_SHA}"
```

## 14. Telegraf

### Overview

**Type**: Metrics collection

**Purpose**: StatsD metrics aggregation

**Technology**: Telegraf (InfluxData)

**Protocol**: StatsD

### Configuration

```python
TELEGRAF = {
    'host': 'telegraf',
    'port': 8125,
    'prefix': 'lidarr.metadata'
}
```

### StatsD Client

```python
import statsd

stats = statsd.StatsClient(
    host=config.TELEGRAF['host'],
    port=config.TELEGRAF['port'],
    prefix=config.TELEGRAF['prefix']
)
```

### Metrics

**Request counters**:
```python
stats.incr('requests.artist')
stats.incr('requests.album')
stats.incr('requests.search')
```

**Response times**:
```python
with stats.timer('response_time.artist'):
    artist = await get_artist(mbid)
```

**Cache hits/misses**:
```python
stats.incr('cache.hit')
stats.incr('cache.miss')
```

**Provider requests**:
```python
stats.incr('provider.fanart.request')
stats.incr('provider.wikipedia.request')
```

## 15. Cloudflare

### Overview

**Type**: CDN and edge caching

**Purpose**: Global content delivery

**Technology**: Cloudflare CDN

**API**: Cloudflare REST API v4

### Configuration

```python
CLOUDFLARE = {
    'zone_id': 'your-zone-id',
    'api_token': 'your-api-token',
    'base_url': 'https://api.cloudflare.com/client/v4'
}
```

### Cache Purge

**Purge by URL**:
```python
async def purge_cloudflare_cache(urls):
    """Purge Cloudflare cache for URLs"""
    async with aiohttp.ClientSession() as session:
        headers = {
            'Authorization': f"Bearer {config.CLOUDFLARE['api_token']}",
            'Content-Type': 'application/json'
        }

        # Batch URLs (max 30 per request)
        for batch in chunks(urls, 30):
            data = {'files': batch}
            url = f"{config.CLOUDFLARE['base_url']}/zones/{config.CLOUDFLARE['zone_id']}/purge_cache"

            async with session.post(url, headers=headers, json=data) as response:
                response.raise_for_status()
```
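
`purge_cloudflare_cache` relies on a `chunks` helper that is not defined in this document. A minimal version, assuming it simply splits the URL list into batches of at most `size` items, could be:

```python
def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError('size must be at least 1')
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With 65 URLs and a batch size of 30, this produces three purge requests of 30, 30, and 5 URLs.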

**Purge all**:
```python
async def purge_all_cloudflare_cache():
    """Purge entire Cloudflare cache"""
    async with aiohttp.ClientSession() as session:
        headers = {
            'Authorization': f"Bearer {config.CLOUDFLARE['api_token']}",
            'Content-Type': 'application/json'
        }

        data = {'purge_everything': True}
        url = f"{config.CLOUDFLARE['base_url']}/zones/{config.CLOUDFLARE['zone_id']}/purge_cache"

        async with session.post(url, headers=headers, json=data) as response:
            response.raise_for_status()
```

### Rate Limits

**Cloudflare API**: 1200 requests per 5 minutes

**Batch purging**: Max 30 URLs per request

### Cache-Control Headers

**Set by API**:
```python
response.headers['Cache-Control'] = 's-maxage=2592000, max-age=0'
```

**Interpretation**:
- `s-maxage=2592000`: CDN caches for 30 days
- `max-age=0`: Clients must revalidate

## Integration Summary

The 15 integrations provide comprehensive metadata aggregation:

**Core data**: MusicBrainz DB (direct SQL)

**Search**: Solr (real-time via RabbitMQ)

**Images**: Cover Art Archive, FanArt.tv, TheAudioDB

**Biographies**: Wikipedia (32 languages), TheAudioDB

**Charts**: Last.fm, Billboard, Apple Music, Spotify

**Cross-platform**: Spotify ID mapping

**Infrastructure**: Redis (cache), PostgreSQL (persistent cache), RabbitMQ (messaging)

**Monitoring**: Sentry (errors), Telegraf (metrics)

**CDN**: Cloudflare (edge caching)

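The architecture separates concerns by role (core data, search, images, biographies, charts, infrastructure, monitoring) and relies on fallback chains for resilience: Cover Art Archive falls back to FanArt.tv or TheAudioDB for images, and Wikipedia falls back to TheAudioDB for biographies.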