# Meelo Integrations

## Integration Overview

Meelo integrates with 8 metadata providers and 2 scrobbling services. The Matcher service handles provider queries, while the Server handles scrobbling. All integrations are configurable via settings.json and .env.

## Metadata Providers

### MusicBrainz

**Type**: Primary music database
**Library**: musicbrainzngs (Python)
**Authentication**: None (public API)
**Rate Limit**: 1 request/second
**Priority**: Highest (primary source)

#### Capabilities

- Artist metadata (name, sort name, areas, relationships)
- Album metadata (title, type, release date, labels)
- Track metadata (title, duration, ISRC)
- Recording relationships (covers, remixes, versions)
- Release groups and releases
- Area data (countries, cities with ISO 3166 codes)

#### Matching Strategy

1. Query by AcoustID fingerprint (most accurate)
2. If no fingerprint, search by artist + album + track title
3. Extract MBID (MusicBrainz ID) for future queries
4. Store MBID in LocalIdentifiers table

#### Data Extraction

**Artist**:

```python
import musicbrainzngs as mb

# musicbrainzngs requires a user agent before any request
mb.set_useragent('Meelo', '1.0')

# 'areas' is not an accepted include for artist lookups in musicbrainzngs;
# the artist's area is part of the default lookup result.
artist_data = mb.get_artist_by_id(mbid, includes=['aliases'])
artist_entity = artist_data['artist']
artist = {
    'name': artist_entity['name'],
    'sortName': artist_entity['sort-name'],
    'areas': [artist_entity[key]['name'] for key in ('area', 'begin-area') if key in artist_entity]
}
```

**Album**:

```python
# 'labels' is only accepted for release lookups, not release groups
release_group = mb.get_release_group_by_id(mbid, includes=['releases'])
album = {
    'name': release_group['release-group']['title'],
    'type': release_group['release-group']['type'],
    'releaseDate': release_group['release-group']['first-release-date'],
    'releases': [...]
}
```

**Track**:

```python
recording = mb.get_recording_by_id(mbid, includes=['isrcs', 'releases'])
track = {
    'title': recording['recording']['title'],
    'duration': recording['recording'].get('length'),
    # `or [None]` also covers an empty ISRC list
    'isrc': (recording['recording'].get('isrc-list') or [None])[0]
}
```

#### Rate Limiting

The musicbrainzngs library enforces 1 request/second automatically. No additional limiting is needed.

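
As a complement to the track extraction under Data Extraction, here is a minimal sketch of a helper that tolerates missing fields. It assumes the lookup result has the shape `{'recording': {...}}`, with `length` as a string of milliseconds and `isrc-list` as a list of strings. The name `extract_track` and the sample data are hypothetical and only serve the illustration.

```python
from typing import Any, Optional


def extract_track(response: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Map a recording lookup result to title, duration and ISRC."""
    recording = response["recording"]
    # Assumption: 'length' is a string of milliseconds and may be absent.
    length = recording.get("length")
    # 'isrc-list' may be missing or empty; `or [None]` covers both cases.
    isrcs = recording.get("isrc-list") or [None]
    return {
        "title": recording["title"],
        "duration": int(length) if length is not None else None,
        "isrc": isrcs[0],
    }


# Hypothetical sample: a recording with a length but no ISRC.
sample = {"recording": {"title": "Yesterday", "length": "125000", "isrc-list": []}}
print(extract_track(sample))
```

Returning `None` for absent fields keeps the result shape stable, so later aggregation steps can treat "no value" uniformly instead of catching `KeyError` or `IndexError`.
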
#### Error Handling

- **404 Not Found**: No match, skip provider
- **503 Service Unavailable**: Retry with exponential backoff (max 3 attempts)
- **Rate Limit Exceeded**: Wait and retry

### Genius

**Type**: Lyrics and song descriptions
**Library**: lyricsgenius (Python)
**Authentication**: API token (GENIUS_ACCESS_TOKEN)
**Rate Limit**: 10 requests/second
**Priority**: High (for lyrics)

#### Capabilities

- Song lyrics (plain text)
- Song descriptions and annotations
- Artist biographies
- Album descriptions

#### Matching Strategy

1. Search by artist + song title
2. Extract song ID from search results
3. Fetch full song data including lyrics
4. Store lyrics in Lyrics table

#### Data Extraction

**Lyrics**:

```python
import lyricsgenius

genius = lyricsgenius.Genius(token)
song = genius.search_song(title, artist)
lyrics = {
    'plain': song.lyrics,
    'description': song.description
}
```

**Artist Bio**:

```python
artist = genius.search_artist(name)
bio = {
    'description': artist.description
}
```

#### Rate Limiting

Implemented using aiolimiter:

```python
from aiolimiter import AsyncLimiter

limiter = AsyncLimiter(10, 1)  # 10 requests per second

async def fetch_genius_limited(*args):
    # Each call waits here until the limiter has capacity
    async with limiter:
        return await fetch_genius(*args)
```

#### Error Handling

- **404 Not Found**: No lyrics available, skip
- **401 Unauthorized**: Invalid token, log error
- **Rate Limit**: Wait and retry

### Wikipedia

**Type**: Artist and album context
**Library**: wikipedia (Python)
**Authentication**: None
**Rate Limit**: 5 requests/second (self-imposed)
**Priority**: Medium (for descriptions)

#### Capabilities

- Artist biographies
- Album background and reception
- Contextual information (formation, breakup, influences)

#### Matching Strategy

1. Search Wikipedia by artist/album name
2. Extract first paragraph as description
3. Store full URL as source

#### Data Extraction

**Artist Bio**:

```python
import wikipedia

page = wikipedia.page(artist_name)
bio = {
    'description': page.summary,
    'url': page.url
}
```

**Album Context**:

```python
page = wikipedia.page(f"{album_name} ({artist_name} album)")
context = {
    'description': page.summary,
    'url': page.url
}
```

#### Disambiguation

Wikipedia often returns disambiguation pages. Handle by:

1. Detect disambiguation page (check for "may refer to")
2. Search for most likely option (e.g., add "band" or "musician")
3. If still ambiguous, skip

#### Rate Limiting

```python
limiter = AsyncLimiter(5, 1)  # 5 requests per second
```

#### Error Handling

- **PageError**: No Wikipedia page, skip
- **DisambiguationError**: Try disambiguation, or skip
- **HTTPError**: Retry with backoff

### Wikidata

**Type**: Structured data
**Library**: SPARQLWrapper (Python)
**Authentication**: None
**Rate Limit**: None self-imposed (the public endpoint applies its own usage limits)
**Priority**: Medium (for structured data)

#### Capabilities

- Artist relationships (members, collaborators)
- Area data (countries, cities, ISO codes)
- Dates (birth, death, formation, dissolution)
- External IDs (MusicBrainz, Discogs, AllMusic)

#### Matching Strategy

1. Query by MusicBrainz ID (if available)
2. Extract Wikidata entity ID
3. Query for additional properties
4. Store structured data

#### Data Extraction

**Artist Data**:

```sparql
SELECT ?property ?value WHERE {
  ?artist wdt:P434 "MBID" .    # MusicBrainz artist ID
  ?artist ?property ?value .
}
```

**Area Hierarchy**:

```sparql
SELECT ?area ?parent ?iso WHERE {
  ?area wdt:P31 wd:Q515 .    # instance of city
  ?area wdt:P131 ?parent .   # located in
  ?area wdt:P300 ?iso .      # ISO 3166-2 code
}
```

#### Rate Limiting

No self-imposed rate limit. The public SPARQL endpoint is generally fast, but it enforces its own usage limits, so an HTTP 429 response should be handled like any other rate-limit error (wait and retry).

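
The artist query above uses a literal `"MBID"` placeholder. The following is a minimal sketch of substituting a real identifier, assuming the MBID arrives as a string and should be validated before it is interpolated into SPARQL. The function name `build_artist_query` is hypothetical, and the all-zero UUID is a placeholder rather than a real artist.

```python
import uuid


def build_artist_query(mbid: str) -> str:
    """Return a SPARQL query selecting all statements for one MusicBrainz artist ID."""
    # uuid.UUID raises ValueError for anything that is not a well-formed UUID,
    # so arbitrary text never reaches the query string.
    canonical = str(uuid.UUID(mbid))
    return (
        "SELECT ?property ?value WHERE {\n"
        f'  ?artist wdt:P434 "{canonical}" .\n'
        "  ?artist ?property ?value .\n"
        "}"
    )


# Placeholder MBID, used only to show the generated query text.
print(build_artist_query("00000000-0000-0000-0000-000000000000"))
```

The resulting string can then be handed to whatever SPARQL client is in use. Validating first means a malformed ID fails locally with `ValueError` instead of producing a SPARQL error on the endpoint.
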
#### Error Handling

- **No Results**: Entity not in Wikidata, skip
- **Timeout**: Retry with simpler query
- **SPARQL Error**: Log and skip

### Discogs

**Type**: Release information
**Library**: discogs_client (Python)
**Authentication**: API token (DISCOGS_ACCESS_TOKEN)
**Rate Limit**: 60 requests/minute
**Priority**: Low (optional)

#### Capabilities

- Release details (catalog number, barcode, format)
- Label information
- Release variations (country, format)
- Marketplace data (not used)

#### Matching Strategy

1. Search by artist + album title
2. Filter by format (CD, Vinyl, etc.)
3. Extract release details
4. Store in Release.extensions JSON

#### Data Extraction

**Release**:

```python
import discogs_client

d = discogs_client.Client('Meelo/1.0', user_token=token)
results = d.search(artist=artist, release_title=album, type='release')
release = results[0]
details = {
    'catalogNumber': release.data['catno'],
    'barcode': release.data.get('barcode'),
    'format': release.formats[0]['name'],
    'country': release.country,
    'label': release.labels[0].name
}
```

#### Rate Limiting

```python
limiter = AsyncLimiter(60, 60)  # 60 requests per minute
```

#### Error Handling

- **404 Not Found**: No Discogs entry, skip
- **401 Unauthorized**: Invalid token, log error
- **Rate Limit**: Wait 60 seconds and retry

### AllMusic

**Type**: Editorial reviews and ratings
**Library**: BeautifulSoup (web scraping)
**Authentication**: None
**Rate Limit**: 1 request/second (self-imposed, no official API)
**Priority**: Low (optional)

#### Capabilities

- Album reviews
- Album ratings (1-5 stars)
- Artist biographies
- Genre classifications

#### Matching Strategy

1. Search AllMusic by artist + album
2. Scrape search results page
3. Extract review and rating
4. Store rating normalized to 0-100 scale

#### Data Extraction

**Album Review**:

```python
from urllib.parse import quote_plus

from bs4 import BeautifulSoup
import httpx

# URL-encode the search terms so spaces and punctuation are safe in the path
query = quote_plus(f"{artist} {album}")
url = f"https://www.allmusic.com/search/albums/{query}"
response = httpx.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

rating_elem = soup.select_one('.allmusic-rating')
rating = len(rating_elem.select('.star-rating.full'))  # Count full stars

review_elem = soup.select_one('.review-text')
review = review_elem.text.strip()

result = {
    'rating': rating * 20,  # Scale 1-5 stars to 20-100 on the 0-100 scale
    'description': review
}
```

#### Rate Limiting

```python
limiter = AsyncLimiter(1, 1)  # 1 request per second
```

#### Error Handling

- **404 Not Found**: No AllMusic page, skip
- **Parsing Error**: HTML structure changed, log and skip
- **Timeout**: Retry with backoff

#### Scraping Risks

AllMusic has no official API. Scraping may break if the HTML structure changes. Disabled by default in settings.json.

### Metacritic

**Type**: Aggregated critic scores
**Library**: BeautifulSoup (web scraping)
**Authentication**: None
**Rate Limit**: 1 request/second (self-imposed)
**Priority**: Low (optional)

#### Capabilities

- Album critic scores (0-100)
- User scores (not used)
- Critic reviews (not extracted)

#### Matching Strategy

1. Search Metacritic by artist + album
2. Scrape album page
3. Extract Metascore
4. Store as rating (already 0-100 scale)

#### Data Extraction

**Album Score**:

```python
from bs4 import BeautifulSoup
import httpx

url = f"https://www.metacritic.com/music/{album_slug}/{artist_slug}"
response = httpx.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

score_elem = soup.select_one('.metascore_w')
score = int(score_elem.text.strip())

result = {
    'rating': score
}
```

#### Rate Limiting

```python
limiter = AsyncLimiter(1, 1)  # 1 request per second
```

#### Error Handling

- **404 Not Found**: Album not on Metacritic, skip
- **Parsing Error**: HTML structure changed, log and skip
- **Timeout**: Retry with backoff

#### Scraping Risks

Same as AllMusic. Disabled by default.

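
Because scraped markup can change without notice, the score text is worth validating before it is stored. The following is a minimal sketch, assuming the caller passes the text of the score element, or `None` when the selector matched nothing. The helper name `parse_score` is hypothetical.

```python
import re
from typing import Optional


def parse_score(raw: Optional[str]) -> Optional[int]:
    """Return a 0-100 score, or None when the scraped text is unusable."""
    if raw is None:
        # The selector matched nothing: the page layout probably changed.
        return None
    # Accept only one to three ASCII digits, ignoring surrounding whitespace.
    match = re.fullmatch(r"\s*(\d{1,3})\s*", raw, flags=re.ASCII)
    if match is None:
        return None
    score = int(match.group(1))
    return score if score <= 100 else None


print(parse_score(" 85 "))  # 85
print(parse_score("tbd"))   # None
print(parse_score(None))    # None
```

With the scraping snippet above, a call could look like `parse_score(score_elem.text if score_elem else None)`. A `None` result then maps onto the "log and skip" handling for parsing errors.
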
### LrcLib

**Type**: Synced lyrics
**Library**: httpx (direct API calls)
**Authentication**: None
**Rate Limit**: 10 requests/second (self-imposed)
**Priority**: High (for synced lyrics)

#### Capabilities

- Synced lyrics in .lrc format
- Plain lyrics (fallback)
- Lyrics by duration matching (improves accuracy)

#### Matching Strategy

1. Search by artist + title + duration
2. Parse .lrc format to JSON
3. Store in Lyrics.synced field

#### Data Extraction

**Synced Lyrics**:

```python
import re

import httpx

url = "https://lrclib.net/api/get"
params = {
    'artist_name': artist,
    'track_name': title,
    'duration': duration
}
response = httpx.get(url, params=params)
data = response.json()
# syncedLyrics can be missing or null when only plain lyrics exist
lrc_text = data.get('syncedLyrics') or ''

# Parse .lrc format: "[mm:ss.xx] text" becomes {'time': ms, 'text': ...}
lines = []
for line in lrc_text.split('\n'):
    match = re.match(r'\[(\d+):(\d+\.\d+)\](.*)', line)
    if match:
        minutes, seconds, text = match.groups()
        time_ms = (int(minutes) * 60 + float(seconds)) * 1000
        lines.append({'time': int(time_ms), 'text': text.strip()})

lyrics = {
    'synced': lines,
    'plain': data.get('plainLyrics')
}
```

#### Rate Limiting

```python
limiter = AsyncLimiter(10, 1)  # 10 requests per second
```

#### Error Handling

- **404 Not Found**: No synced lyrics, try plain lyrics
- **Parsing Error**: Invalid .lrc format, skip
- **Timeout**: Retry with backoff

## Scrobbling Services

### Last.fm

**Type**: Scrobbling service
**Library**: pylast (Python)
**Authentication**: OAuth (LASTFM_API_KEY, LASTFM_API_SECRET)
**Rate Limit**: None specified
**Integration**: Server (NestJS)

#### Capabilities

- Scrobble track plays
- Update "now playing" status
- Retrieve user listening history (not implemented)

#### OAuth Flow

1. User clicks "Connect Last.fm" in settings
2. Server redirects to Last.fm OAuth page
3. User authorizes Meelo
4. Last.fm redirects to callback with token
5. Server exchanges token for session key
6. Session key stored in UserScrobbler.data JSON

#### Scrobbling

**Now Playing**:

```typescript
await lastfm.updateNowPlaying({
  artist: track.song.artist.name,
  track: track.song.name,
  album: track.release.album.name,
  duration: track.duration
});
```

**Scrobble**:

```typescript
await lastfm.scrobble({
  artist: track.song.artist.name,
  track: track.song.name,
  album: track.release.album.name,
  timestamp: Math.floor(Date.now() / 1000)
});
```

#### Scrobble Rules

- Track must play for at least 30 seconds or 50% of duration (whichever is shorter)
- Scrobble sent when track ends or user skips past 50%
- "Now playing" sent immediately on play

#### Error Handling

- **Invalid Session**: Re-authenticate user
- **Network Error**: Queue scrobble for retry
- **Rate Limit**: Wait and retry

### ListenBrainz

**Type**: Open-source scrobbling service
**Library**: pylistenbrainz (Python)
**Authentication**: User token
**Rate Limit**: None specified
**Integration**: Server (NestJS)

#### Capabilities

- Submit listens (scrobbles)
- Retrieve listening history (not implemented)
- Statistics and recommendations (not implemented)

#### Authentication

1. User obtains token from ListenBrainz settings
2. User enters token in Meelo settings
3. Token stored in UserScrobbler.data JSON
4. No OAuth flow needed

#### Submitting Listens

**Single Listen**:

```typescript
await listenbrainz.submitListen({
  listened_at: Math.floor(Date.now() / 1000),
  track_metadata: {
    artist_name: track.song.artist.name,
    track_name: track.song.name,
    release_name: track.release.album.name,
    additional_info: {
      duration_ms: track.duration * 1000,
      tracknumber: track.trackIndex
    }
  }
});
```

#### Listen Types

- **Single**: Submit one listen (used for scrobbling)
- **Playing Now**: Update current track (not implemented)
- **Import**: Bulk import (not used)

#### Error Handling

- **Invalid Token**: Notify user to re-enter token
- **Network Error**: Queue listen for retry
- **Rate Limit**: Wait and retry

## Provider Configuration

### settings.json

```json
{
  "providers": {
    "musicbrainz": { "enabled": true },
    "genius": { "enabled": true },
    "wikipedia": { "enabled": true },
    "wikidata": { "enabled": true },
    "discogs": { "enabled": false },
    "allmusic": { "enabled": false },
    "metacritic": { "enabled": false },
    "lrclib": { "enabled": true }
  },
  "metadata": {
    "source": "providers",
    "order": ["musicbrainz", "genius", "wikipedia", "lrclib", "wikidata"]
  }
}
```

**Fields**:

- `providers.<name>.enabled`: Enable/disable provider
- `metadata.source`: Prefer "embedded" tags or "providers"
- `metadata.order`: Provider priority for conflicting data

### .env

```bash
# Genius
GENIUS_ACCESS_TOKEN=your_genius_token

# Discogs
DISCOGS_ACCESS_TOKEN=your_discogs_token

# Last.fm
LASTFM_API_KEY=your_lastfm_key
LASTFM_API_SECRET=your_lastfm_secret

# Public URL for OAuth callbacks
PUBLIC_URL=https://meelo.example.com
```

## Provider Priority

When multiple providers return conflicting data, Matcher uses the priority from `metadata.order`:

1. **MusicBrainz**: Highest priority (most accurate)
2. **Genius**: High priority for lyrics
3. **Wikipedia**: Medium priority for descriptions
4. **LrcLib**: High priority for synced lyrics
5. **Wikidata**: Medium priority for structured data
6. **Discogs**: Low priority (optional)
7. **AllMusic**: Low priority (optional)
8. **Metacritic**: Low priority (optional)

## Data Aggregation

### Descriptions

Concatenate descriptions from multiple providers:

```
MusicBrainz: "The Beatles were an English rock band..."
Wikipedia: "Formed in Liverpool in 1960..."
Genius: "Known for their innovative songwriting..."

Result: "The Beatles were an English rock band... Formed in Liverpool in 1960... Known for their innovative songwriting..."
```

### Ratings

Average ratings from multiple providers:

```
AllMusic: 90/100
Metacritic: 85/100

Result: (90 + 85) / 2 = 87.5 → 88/100
```

### Lyrics

Prefer synced lyrics over plain:

```
LrcLib: Synced lyrics available → Use synced
Genius: Plain lyrics available → Use as fallback
```

If both are available, store both in the Lyrics table.

## Matching Workflow

1. **Scanner** registers file with Server
2. **Scanner** publishes `file.added` event to RabbitMQ
3. **Matcher** consumes event
4. **Matcher** fetches file metadata from Server
5. **Matcher** queries enabled providers in parallel:
   - MusicBrainz by AcoustID fingerprint
   - Genius by artist + title
   - Wikipedia by artist name
   - LrcLib by artist + title + duration
   - Wikidata by MusicBrainz ID (if found)
   - Discogs by artist + album (if enabled)
   - AllMusic by artist + album (if enabled)
   - Metacritic by artist + album (if enabled)
6. **Matcher** aggregates results based on priority
7. **Matcher** pushes enriched metadata to Server
8. **Server** updates database and search index

## Error Recovery

### Provider Failures

If a provider fails:

1. Log error with provider name and reason
2. Continue with other providers
3. Push partial metadata to Server
4. Mark track as "partially matched"

### Retry Logic

For transient errors (network, rate limit):

1. Retry with exponential backoff
2. Max 3 attempts per provider
3. If all attempts fail, skip provider

### Manual Refresh

Users can trigger a metadata refresh via the Scanner API:

```bash
POST /scanner/refresh
```

This re-queries all providers for existing tracks.

## Performance Optimization

### Parallel Queries

Matcher queries all providers in parallel using asyncio:

```python
import asyncio

async def enrich_metadata(file_id):
    tasks = [
        fetch_musicbrainz(file_id),
        fetch_genius(file_id),
        fetch_wikipedia(file_id),
        fetch_lrclib(file_id),
        fetch_wikidata(file_id)
    ]
    # return_exceptions=True keeps one failing provider from cancelling the rest
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return aggregate_results(results)
```

### Caching

Provider responses are cached in memory for 1 hour:

- Reduces duplicate queries during batch scans
- Invalidated on manual refresh

### Rate Limit Coordination

Rate limiters are shared across all workers:

- Prevents exceeding provider limits
- Uses token bucket algorithm

## Privacy Considerations

### Data Sent to Providers

- **MusicBrainz**: AcoustID fingerprint, artist/album/track names
- **Genius**: Artist and track names
- **Wikipedia**: Artist and album names
- **Wikidata**: MusicBrainz IDs
- **Discogs**: Artist and album names
- **AllMusic**: Artist and album names
- **Metacritic**: Artist and album names
- **LrcLib**: Artist, track name, duration

No file paths or user data are sent.

### Scrobbling Privacy

- **Last.fm**: Track plays sent with timestamp
- **ListenBrainz**: Track plays sent with timestamp

Users control scrobbling via settings. Disabled by default.

## Future Enhancements

### Additional Providers

Potential providers to add:

- **Spotify**: Metadata and popularity scores
- **Apple Music**: Editorial content
- **Bandcamp**: Independent artist data
- **RateYourMusic**: User ratings and reviews

### Provider Plugins

Allow users to add custom providers via a plugin system.

### Offline Mode

Cache provider responses for offline access.

### Provider Statistics

Track provider accuracy and response times. Display in admin panel.

## Summary

Meelo's integration architecture separates concerns: Matcher handles provider queries, Server handles scrobbling. The provider pattern enables easy addition of new sources. Parallel queries and rate limiting optimize performance. Priority-based aggregation ensures data quality. OAuth flows and token management handle authentication. The system is flexible (enable/disable providers), resilient (retry logic, partial results), and privacy-conscious (no file paths sent).