Meelo Integrations
Integration Overview
Meelo integrates with 8 metadata providers and 2 scrobbling services. The Matcher service handles provider queries, while the Server handles scrobbling. All integrations are configurable via settings.json and .env.
Metadata Providers
MusicBrainz
Type: Primary music database
Library: musicbrainzngs (Python)
Authentication: None (public API)
Rate Limit: 1 request/second
Priority: Highest (primary source)
Capabilities
- Artist metadata (name, sort name, areas, relationships)
- Album metadata (title, type, release date, labels)
- Track metadata (title, duration, ISRC)
- Recording relationships (covers, remixes, versions)
- Release groups and releases
- Area data (countries, cities with ISO 3166 codes)
Matching Strategy
- Query by AcoustID fingerprint (most accurate)
- If no fingerprint, search by artist + album + track title
- Extract MBID (MusicBrainz ID) for future queries
- Store MBID in LocalIdentifiers table
Data Extraction
Artist:
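# 'areas' is not an accepted include for artist lookups; the area is part of the base payload
artist_data = mb.get_artist_by_id(mbid, includes=['aliases'])
artist = artist_data['artist']
{
    'name': artist['name'],
    'sortName': artist['sort-name'],
    # The artist payload carries a single 'area' dict and, optionally, a 'begin-area' dict
    'areas': [artist[key]['name'] for key in ('area', 'begin-area') if key in artist]
}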
Album:
# 'labels' is only accepted for release lookups, not for release groups
release_group = mb.get_release_group_by_id(mbid, includes=['releases'])
{
    'name': release_group['release-group']['title'],
    'type': release_group['release-group']['type'],
    'releaseDate': release_group['release-group']['first-release-date'],
    'releases': [...]
}
Track:
recording = mb.get_recording_by_id(mbid, includes=['isrcs', 'releases'])
{
'title': recording['recording']['title'],
'duration': recording['recording']['length'],
'isrc': recording['recording'].get('isrc-list', [None])[0]
}
Rate Limiting
The musicbrainzngs library enforces 1 request/second by default, so no additional limiter is needed.
Error Handling
- 404 Not Found: No match, skip provider
- 503 Service Unavailable: Retry with exponential backoff (max 3 attempts)
- Rate Limit Exceeded: Wait and retry
Genius
Type: Lyrics and song descriptions
Library: lyricsgenius (Python)
Authentication: API token (GENIUS_ACCESS_TOKEN)
Rate Limit: 10 requests/second
Priority: High (for lyrics)
Capabilities
- Song lyrics (plain text)
- Song descriptions and annotations
- Artist biographies
- Album descriptions
Matching Strategy
- Search by artist + song title
- Extract song ID from search results
- Fetch full song data including lyrics
- Store lyrics in Lyrics table
Data Extraction
Lyrics:
genius = lyricsgenius.Genius(token)
song = genius.search_song(title, artist)
{
'plain': song.lyrics,
'description': song.description
}
Artist Bio:
artist = genius.search_artist(name)
{
'description': artist.description
}
Rate Limiting
Implemented using aiolimiter:
from aiolimiter import AsyncLimiter
limiter = AsyncLimiter(10, 1)  # 10 requests per second
async with limiter:
    result = await fetch_genius(...)
Error Handling
- 404 Not Found: No lyrics available, skip
- 401 Unauthorized: Invalid token, log error
- Rate Limit: Wait and retry
Wikipedia
Type: Artist and album context
Library: wikipedia (Python)
Authentication: None
Rate Limit: 5 requests/second (self-imposed)
Priority: Medium (for descriptions)
Capabilities
- Artist biographies
- Album background and reception
- Contextual information (formation, breakup, influences)
Matching Strategy
- Search Wikipedia by artist/album name
- Extract first paragraph as description
- Store full URL as source
Data Extraction
Artist Bio:
import wikipedia
page = wikipedia.page(artist_name)
{
'description': page.summary,
'url': page.url
}
Album Context:
page = wikipedia.page(f"{album_name} ({artist_name} album)")
{
'description': page.summary,
'url': page.url
}
Disambiguation
Wikipedia often returns disambiguation pages. Handle by:
- Detect disambiguation page (check for "may refer to")
- Search for most likely option (e.g., add "band" or "musician")
- If still ambiguous, skip
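The disambiguation steps above can be sketched as a small selection function. This is a minimal sketch, not Meelo's actual code: the function name and the hint list are assumptions, and the candidate titles are assumed to come from the `options` attribute of the `DisambiguationError` raised by the wikipedia library.

```python
from typing import Optional, Sequence

# Hypothetical hint words; the text above only mentions "band" and "musician".
MUSIC_HINTS = ("band", "musician")


def pick_disambiguation_option(
    options: Sequence[str],
    hints: Sequence[str] = MUSIC_HINTS,
) -> Optional[str]:
    """Return the one option whose title contains a music hint, else None."""
    matches = [
        option for option in options
        if any(hint in option.lower() for hint in hints)
    ]
    # Exactly one match is treated as a confident pick. Zero or several
    # matches mean the name is still ambiguous, so the provider is skipped.
    return matches[0] if len(matches) == 1 else None
```

Returning None rather than guessing keeps a wrong biography from being attached to an artist, which matches the "if still ambiguous, skip" rule.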
Rate Limiting
limiter = AsyncLimiter(5, 1) # 5 requests per second
Error Handling
- PageError: No Wikipedia page, skip
- DisambiguationError: Try disambiguation, or skip
- HTTPError: Retry with backoff
Wikidata
Type: Structured data
Library: SPARQLWrapper (Python)
Authentication: None
Rate Limit: None self-imposed (the public SPARQL endpoint enforces its own query-time and concurrency limits)
Priority: Medium (for structured data)
Capabilities
- Artist relationships (members, collaborators)
- Area data (countries, cities, ISO codes)
- Dates (birth, death, formation, dissolution)
- External IDs (MusicBrainz, Discogs, AllMusic)
Matching Strategy
- Query by MusicBrainz ID (if available)
- Extract Wikidata entity ID
- Query for additional properties
- Store structured data
Data Extraction
Artist Data:
SELECT ?property ?value WHERE {
?artist wdt:P434 "MBID" . # MusicBrainz artist ID
?artist ?property ?value .
}
Area Hierarchy:
SELECT ?area ?parent ?iso WHERE {
?area wdt:P31 wd:Q515 . # instance of city
?area wdt:P131 ?parent . # located in
?area wdt:P300 ?iso . # ISO 3166 code
}
Rate Limiting
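No client-side limiter is configured. The public SPARQL endpoint is fast, but it throttles heavy clients with HTTP 429, which should be treated as a transient error.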
Error Handling
- No Results: Entity not in Wikidata, skip
- Timeout: Retry with simpler query
- SPARQL Error: Log and skip
Discogs
Type: Release information
Library: discogs_client (Python)
Authentication: API token (DISCOGS_ACCESS_TOKEN)
Rate Limit: 60 requests/minute
Priority: Low (optional)
Capabilities
- Release details (catalog number, barcode, format)
- Label information
- Release variations (country, format)
- Marketplace data (not used)
Matching Strategy
- Search by artist + album title
- Filter by format (CD, Vinyl, etc.)
- Extract release details
- Store in Release.extensions JSON
Data Extraction
Release:
import discogs_client
d = discogs_client.Client('Meelo/1.0', user_token=token)
results = d.search(artist=artist, release_title=album, type='release')
release = results[0]
{
'catalogNumber': release.data['catno'],
'barcode': release.data.get('barcode'),
'format': release.formats[0]['name'],
'country': release.country,
'label': release.labels[0].name
}
Rate Limiting
limiter = AsyncLimiter(60, 60) # 60 requests per minute
Error Handling
- 404 Not Found: No Discogs entry, skip
- 401 Unauthorized: Invalid token, log error
- Rate Limit: Wait 60 seconds and retry
AllMusic
Type: Editorial reviews and ratings
Library: BeautifulSoup (web scraping)
Authentication: None
Rate Limit: 1 request/second (self-imposed, no official API)
Priority: Low (optional)
Capabilities
- Album reviews
- Album ratings (1-5 stars)
- Artist biographies
- Genre classifications
Matching Strategy
- Search AllMusic by artist + album
- Scrape search results page
- Extract review and rating
- Store rating normalized to 0-100 scale
Data Extraction
Album Review:
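from urllib.parse import quote_plus
from bs4 import BeautifulSoup
import httpx
# URL-encode the query so spaces and punctuation in names produce a valid URL
url = f"https://www.allmusic.com/search/albums/{quote_plus(f'{artist} {album}')}"
response = httpx.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# select_one returns None when the page layout no longer matches the selector
rating_elem = soup.select_one('.allmusic-rating')
rating = len(rating_elem.select('.star-rating.full')) if rating_elem else None  # Count full stars
review_elem = soup.select_one('.review-text')
review = review_elem.text.strip() if review_elem else None
{
    'rating': rating * 20 if rating is not None else None,  # Convert 1-5 stars to the 0-100 scale
    'description': review
}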
Rate Limiting
limiter = AsyncLimiter(1, 1) # 1 request per second
Error Handling
- 404 Not Found: No AllMusic page, skip
- Parsing Error: HTML structure changed, log and skip
- Timeout: Retry with backoff
Scraping Risks
AllMusic has no official API. Scraping may break if HTML structure changes. Disabled by default in settings.json.
Metacritic
Type: Aggregated critic scores
Library: BeautifulSoup (web scraping)
Authentication: None
Rate Limit: 1 request/second (self-imposed)
Priority: Low (optional)
Capabilities
- Album critic scores (0-100)
- User scores (not used)
- Critic reviews (not extracted)
Matching Strategy
- Search Metacritic by artist + album
- Scrape album page
- Extract Metascore
- Store as rating (already 0-100 scale)
Data Extraction
Album Score:
url = f"https://www.metacritic.com/music/{album_slug}/{artist_slug}"
response = httpx.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
score_elem = soup.select_one('.metascore_w')
score = int(score_elem.text.strip())
{
'rating': score
}
Rate Limiting
limiter = AsyncLimiter(1, 1) # 1 request per second
Error Handling
- 404 Not Found: Album not on Metacritic, skip
- Parsing Error: HTML structure changed, log and skip
- Timeout: Retry with backoff
Scraping Risks
Same as AllMusic. Disabled by default.
LrcLib
Type: Synced lyrics
Library: httpx (direct API calls)
Authentication: None
Rate Limit: 10 requests/second (self-imposed)
Priority: High (for synced lyrics)
Capabilities
- Synced lyrics in .lrc format
- Plain lyrics (fallback)
- Lyrics by duration matching (improves accuracy)
Matching Strategy
- Search by artist + title + duration
- Parse .lrc format to JSON
- Store in Lyrics.synced field
Data Extraction
Synced Lyrics:
import re
import httpx
url = "https://lrclib.net/api/get"
params = {
    'artist_name': artist,
    'track_name': title,
    'duration': duration
}
response = httpx.get(url, params=params)
data = response.json()
lrc_text = data.get('syncedLyrics') or ''  # null when only plain lyrics exist
# Parse .lrc format: each line looks like [mm:ss.xx]text
lines = []
for line in lrc_text.split('\n'):
    match = re.match(r'\[(\d+):(\d+\.\d+)\](.*)', line)
    if match:
        minutes, seconds, text = match.groups()
        time_ms = (int(minutes) * 60 + float(seconds)) * 1000
        # round() avoids off-by-one milliseconds from float truncation
        lines.append({'time': round(time_ms), 'text': text.strip()})
{
    'synced': lines,
    'plain': data.get('plainLyrics')
}
Rate Limiting
limiter = AsyncLimiter(10, 1) # 10 requests per second
Error Handling
- 404 Not Found: No synced lyrics, try plain lyrics
- Parsing Error: Invalid .lrc format, skip
- Timeout: Retry with backoff
Scrobbling Services
Last.fm
Type: Scrobbling service
Library: pylast (Python)
Authentication: OAuth (LASTFM_API_KEY, LASTFM_API_SECRET)
Rate Limit: None specified
Integration: Server (NestJS)
Capabilities
- Scrobble track plays
- Update "now playing" status
- Retrieve user listening history (not implemented)
OAuth Flow
- User clicks "Connect Last.fm" in settings
- Server redirects to Last.fm OAuth page
- User authorizes Meelo
- Last.fm redirects to callback with token
- Server exchanges token for session key
- Session key stored in UserScrobbler.data JSON
Scrobbling
Now Playing:
await lastfm.updateNowPlaying({
artist: track.song.artist.name,
track: track.song.name,
album: track.release.album.name,
duration: track.duration
});
Scrobble:
await lastfm.scrobble({
artist: track.song.artist.name,
track: track.song.name,
album: track.release.album.name,
timestamp: Math.floor(Date.now() / 1000)
});
Scrobble Rules
- Track must play for at least 30 seconds or 50% of duration (whichever is shorter)
- Scrobble sent when track ends or user skips past 50%
- "Now playing" sent immediately on play
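The threshold rule above reduces to a single comparison. The sketch below is written in Python to match the other examples in this document, although the Server itself is NestJS. The function name and the handling of an unknown duration are assumptions.

```python
def should_scrobble(played_seconds: float, duration_seconds: float) -> bool:
    """Apply the rule: 30 seconds or 50% of the track, whichever is shorter."""
    if duration_seconds <= 0:
        # Assumption: a track with unknown duration is never scrobbled.
        return False
    threshold = min(30.0, duration_seconds * 0.5)
    return played_seconds >= threshold
```

For a 4-minute track the threshold is 30 seconds. For a 20-second interlude it is 10 seconds.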
Error Handling
- Invalid Session: Re-authenticate user
- Network Error: Queue scrobble for retry
- Rate Limit: Wait and retry
ListenBrainz
Type: Open-source scrobbling service
Library: pylistenbrainz (Python)
Authentication: User token
Rate Limit: None specified
Integration: Server (NestJS)
Capabilities
- Submit listens (scrobbles)
- Retrieve listening history (not implemented)
- Statistics and recommendations (not implemented)
Authentication
- User obtains token from ListenBrainz settings
- User enters token in Meelo settings
- Token stored in UserScrobbler.data JSON
- No OAuth flow needed
Submitting Listens
Single Listen:
await listenbrainz.submitListen({
  listened_at: Math.floor(Date.now() / 1000),
  track_metadata: {
    artist_name: track.song.artist.name,
    track_name: track.song.name,
    release_name: track.release.album.name,
    additional_info: {
      duration_ms: track.duration * 1000,
      tracknumber: track.trackIndex
    }
  }
});
Listen Types
- Single: Submit one listen (used for scrobbling)
- Playing Now: Update current track (not implemented)
- Import: Bulk import (not used)
Error Handling
- Invalid Token: Notify user to re-enter token
- Network Error: Queue listen for retry
- Rate Limit: Wait and retry
Provider Configuration
settings.json
{
  "providers": {
    "musicbrainz": {
      "enabled": true
    },
    "genius": {
      "enabled": true
    },
    "wikipedia": {
      "enabled": true
    },
    "wikidata": {
      "enabled": true
    },
    "discogs": {
      "enabled": false
    },
    "allmusic": {
      "enabled": false
    },
    "metacritic": {
      "enabled": false
    },
    "lrclib": {
      "enabled": true
    }
  },
  "metadata": {
    "source": "providers",
    "order": ["musicbrainz", "genius", "wikipedia", "lrclib", "wikidata"]
  }
}
Fields:
- providers.<name>.enabled: Enable/disable provider
- metadata.source: Prefer "embedded" tags or "providers"
- metadata.order: Provider priority for conflicting data
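As an illustration of how these fields might be consumed, the sketch below derives the list of enabled providers in priority order from a parsed settings document. The function name is hypothetical, and appending enabled providers that are missing from metadata.order at the end is an assumption, since the source does not say how that case is handled.

```python
import json
from typing import List

# Trimmed sample in the same shape as settings.json above.
SAMPLE_SETTINGS = json.loads("""
{
  "providers": {
    "musicbrainz": {"enabled": true},
    "genius": {"enabled": true},
    "discogs": {"enabled": false},
    "lrclib": {"enabled": true}
  },
  "metadata": {"source": "providers", "order": ["musicbrainz", "lrclib"]}
}
""")


def enabled_providers(settings: dict) -> List[str]:
    """List enabled providers, ordered by metadata.order."""
    providers = settings.get("providers", {})
    order = settings.get("metadata", {}).get("order", [])

    def is_enabled(name: str) -> bool:
        return providers.get(name, {}).get("enabled", False)

    ranked = [name for name in order if is_enabled(name)]
    # Assumption: enabled providers absent from "order" rank last.
    unranked = [n for n in providers if is_enabled(n) and n not in ranked]
    return ranked + unranked
```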
.env
# Genius
GENIUS_ACCESS_TOKEN=your_genius_token
# Discogs
DISCOGS_ACCESS_TOKEN=your_discogs_token
# Last.fm
LASTFM_API_KEY=your_lastfm_key
LASTFM_API_SECRET=your_lastfm_secret
# Public URL for OAuth callbacks
PUBLIC_URL=https://meelo.example.com
Provider Priority
When multiple providers return conflicting data, Matcher uses priority from metadata.order:
- MusicBrainz: Highest priority (most accurate)
- Genius: High priority for lyrics
- Wikipedia: Medium priority for descriptions
- LrcLib: High priority for synced lyrics
- Wikidata: Medium priority for structured data
- Discogs: Low priority (optional)
- AllMusic: Low priority (optional)
- Metacritic: Low priority (optional)
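One plausible reading of priority-based resolution is "the first provider in the order that has a non-empty value wins". The sketch below assumes that reading. The function name and the shape of the per-provider result dictionaries are hypothetical.

```python
from typing import Any, Dict, List, Optional


def resolve_field(
    field: str,
    results: Dict[str, Dict[str, Any]],
    order: List[str],
) -> Optional[Any]:
    """Return the value of `field` from the highest-priority provider that has one."""
    for provider in order:
        value = results.get(provider, {}).get(field)
        # Empty strings and empty lists count as "no data" here (an assumption).
        if value not in (None, "", []):
            return value
    return None
```

With this approach a lower-priority provider only contributes a field when every higher-priority provider left it empty.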
Data Aggregation
Descriptions
Concatenate descriptions from multiple providers:
MusicBrainz: "The Beatles were an English rock band..."
Wikipedia: "Formed in Liverpool in 1960..."
Genius: "Known for their innovative songwriting..."
Result: "The Beatles were an English rock band... Formed in Liverpool in 1960... Known for their innovative songwriting..."
Ratings
Average ratings from multiple providers:
AllMusic: 90/100
Metacritic: 85/100
Result: (90 + 85) / 2 = 87.5 → 88/100
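The averaging step can be sketched as follows. The function name is hypothetical, and rounding halves up is an assumption inferred from the 87.5 → 88 example. Python's built-in round() uses banker's rounding and would turn 86.5 into 86, so the sketch rounds explicitly.

```python
import math
from typing import Iterable, Optional


def average_rating(ratings: Iterable[Optional[float]]) -> Optional[int]:
    """Average the available 0-100 ratings, rounding halves up."""
    present = [rating for rating in ratings if rating is not None]
    if not present:
        return None  # no provider returned a rating
    return math.floor(sum(present) / len(present) + 0.5)
```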
Lyrics
Prefer synced lyrics over plain:
LrcLib: Synced lyrics available → Use synced
Genius: Plain lyrics available → Use as fallback
If both available, store both in Lyrics table.
Matching Workflow
- Scanner registers file with Server
- Scanner publishes file.added event to RabbitMQ
- Matcher consumes event
- Matcher fetches file metadata from Server
- Matcher queries enabled providers in parallel:
- MusicBrainz by AcoustID fingerprint
- Genius by artist + title
- Wikipedia by artist name
- LrcLib by artist + title + duration
- Wikidata by MusicBrainz ID (if found)
- Discogs by artist + album (if enabled)
- AllMusic by artist + album (if enabled)
- Metacritic by artist + album (if enabled)
- Matcher aggregates results based on priority
- Matcher pushes enriched metadata to Server
- Server updates database and search index
Error Recovery
Provider Failures
If a provider fails:
- Log error with provider name and reason
- Continue with other providers
- Push partial metadata to Server
- Mark track as "partially matched"
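A minimal sketch of this behaviour with asyncio is shown below. The function name, the logger name, and the fetcher callables are hypothetical. The point is that one failing provider is logged and flagged without discarding the results of the others.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, Tuple

logger = logging.getLogger("matcher")  # hypothetical logger name


async def query_providers(
    fetchers: Dict[str, Callable[[], Awaitable[Any]]],
) -> Tuple[Dict[str, Any], bool]:
    """Run all fetchers concurrently; return (results, partially_matched)."""
    names = list(fetchers)
    outcomes = await asyncio.gather(
        *(fetchers[name]() for name in names), return_exceptions=True
    )
    results: Dict[str, Any] = {}
    partial = False
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            logger.warning("provider %s failed: %r", name, outcome)
            partial = True  # the caller marks the track "partially matched"
        else:
            results[name] = outcome
    return results, partial


# Usage with two stand-in fetchers, one of which fails.
async def _ok() -> dict:
    return {"title": "Yesterday"}


async def _fail() -> dict:
    raise TimeoutError("provider timed out")


results, partial = asyncio.run(
    query_providers({"musicbrainz": _ok, "genius": _fail})
)
```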
Retry Logic
For transient errors (network, rate limit):
- Retry with exponential backoff
- Max 3 attempts per provider
- If all attempts fail, skip provider
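The retry policy above can be sketched as a small wrapper. The exception class, the function name, and the 1-second base delay are assumptions, since the source only fixes the backoff strategy and the limit of 3 attempts. The sleep function is injectable so the usage example does not actually wait.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for network or rate-limit failures."""


async def call_with_retries(
    call: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Optional[T]:
    """Retry `call` with exponential backoff; return None to skip the provider."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except TransientError:
            if attempt == max_attempts:
                return None  # all attempts failed: skip this provider
            # Delays double each time: base_delay, 2 * base_delay, ...
            await sleep(base_delay * 2 ** (attempt - 1))
    return None


# Usage: a provider that fails twice, then succeeds on the third attempt.
delays: List[float] = []
state = {"attempts": 0}


async def fake_sleep(seconds: float) -> None:
    delays.append(seconds)


async def flaky_provider() -> str:
    state["attempts"] += 1
    if state["attempts"] < 3:
        raise TransientError("temporary failure")
    return "metadata"


result = asyncio.run(call_with_retries(flaky_provider, sleep=fake_sleep))
```

Only errors classified as transient are retried. Anything else propagates to the caller, which fits the per-provider rules such as "404 Not Found: skip".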
Manual Refresh
Users can trigger metadata refresh via Scanner API:
POST /scanner/refresh
This re-queries all providers for existing tracks.
Performance Optimization
Parallel Queries
Matcher queries all providers in parallel using asyncio:
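import asyncio

async def enrich_metadata(file_id):
    tasks = [
        fetch_musicbrainz(file_id),
        fetch_genius(file_id),
        fetch_wikipedia(file_id),
        fetch_lrclib(file_id),
        fetch_wikidata(file_id)
    ]
    # return_exceptions=True yields failures as exception objects instead of raising,
    # so aggregate_results must filter them out
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return aggregate_results(results)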
Caching
Provider responses are cached in memory for 1 hour:
- Reduces duplicate queries during batch scans
- Invalidated on manual refresh
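A one-hour in-memory cache of this kind might look like the sketch below. The class name and the key format are hypothetical, and the clock is injectable only so that expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry lazily
            return None
        return value

    def clear(self) -> None:
        """Invalidate everything, e.g. on a manual refresh."""
        self._entries.clear()


# Usage with a fake clock: the entry is served, then expires after an hour.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set(("musicbrainz", "some-mbid"), {"name": "The Beatles"})
fresh = cache.get(("musicbrainz", "some-mbid"))
now[0] = 3600.0
expired = cache.get(("musicbrainz", "some-mbid"))
```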
Rate Limit Coordination
Rate limiters are shared across all workers:
- Prevents exceeding provider limits
- Uses token bucket algorithm
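For reference, a token bucket can be sketched in a few lines. This is an illustrative single-process version with hypothetical names, not Meelo's limiter. Sharing one bucket across separate worker processes would need external state, which the source does not describe.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Take `tokens` if available; return False without blocking otherwise."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False


# Usage with a fake clock: a burst of 2 is allowed, the 3rd request is refused,
# and one token is available again after one second.
now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2.0, clock=lambda: now[0])
burst = [bucket.try_acquire() for _ in range(3)]
now[0] = 1.0
after_refill = bucket.try_acquire()
```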
Privacy Considerations
Data Sent to Providers
- MusicBrainz: AcoustID fingerprint, artist/album/track names
- Genius: Artist and track names
- Wikipedia: Artist and album names
- Wikidata: MusicBrainz IDs
- Discogs: Artist and album names
- AllMusic: Artist and album names
- Metacritic: Artist and album names
- LrcLib: Artist, track name, duration
No file paths or user data sent.
Scrobbling Privacy
- Last.fm: Track plays sent with timestamp
- ListenBrainz: Track plays sent with timestamp
Users control scrobbling via settings. Disabled by default.
Future Enhancements
Additional Providers
Potential providers to add:
- Spotify: Metadata and popularity scores
- Apple Music: Editorial content
- Bandcamp: Independent artist data
- RateYourMusic: User ratings and reviews
Provider Plugins
Allow users to add custom providers via plugin system.
Offline Mode
Cache provider responses for offline access.
Provider Statistics
Track provider accuracy and response times. Display in admin panel.
Summary
Meelo's integration architecture separates concerns: Matcher handles provider queries, Server handles scrobbling. The provider pattern enables easy addition of new sources. Parallel queries and rate limiting optimize performance. Priority-based aggregation ensures data quality. OAuth flows and token management handle authentication. The system is flexible (enable/disable providers), resilient (retry logic, partial results), and privacy-conscious (no file paths sent).