MusicMetaLinker API Reference
API Type
MusicMetaLinker exposes a Python library API only: no REST API, no GraphQL, and no command-line interface for the core library functionality.
Batch processing has a CLI (link_partitions.py), but the core library is Python-only.
Primary Interface: Align Class
Constructor
from musicmetalinker.linking import Align
linker = Align(
    mbid_track=None,
    mbid_release=None,
    artist=None,
    album=None,
    track=None,
    track_number=None,
    duration=None,
    isrc=None,
    strict=False,
)
Parameters:
mbid_track (str, optional): MusicBrainz recording ID. If provided, MusicBrainz is queried first and treated as authoritative.
mbid_release (str, optional): MusicBrainz release ID. Used for album-level metadata.
artist (str, optional): Artist name. Used for metadata-based search when identifiers are unavailable.
album (str, optional): Album name. Used for filtering and matching.
track (str, optional): Track name. Primary search term for metadata-based queries.
track_number (int, optional): Track position on album. Used for filtering multiple matches.
duration (int or float, optional): Track duration in seconds. Critical for filtering: Deezer candidates are accepted within a ±3 second threshold by default.
isrc (str, optional): International Standard Recording Code. If provided, used for direct lookup on Deezer and MusicBrainz.
strict (bool, optional): Strict matching mode. Behavior not fully documented. Likely affects fuzzy matching thresholds.
Returns: Align instance. No exceptions are raised during construction. Queries execute lazily, when a getter is called.
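The lazy behavior described above can be pictured with a small stand-in class. This is only a sketch: `LazyLinker`, `_query_mbid`, and the `queries` counter are hypothetical names made up for illustration, not part of MusicMetaLinker, and whether the real library also stores a None result is an assumption.

```python
from typing import Optional

_UNSET = object()  # sentinel: "not queried yet", distinct from a stored None


class LazyLinker:
    """Hypothetical stand-in for Align that defers lookups to the getters."""

    def __init__(self, artist: str, track: str) -> None:
        self.artist = artist
        self.track = track
        self.queries = 0     # counts simulated service calls
        self._mbid = _UNSET  # nothing is fetched at construction time

    def _query_mbid(self) -> Optional[str]:
        # A real implementation would call a remote service here.
        self.queries += 1
        return None  # simulate "no match found"

    def get_mbid(self) -> Optional[str]:
        if self._mbid is _UNSET:
            self._mbid = self._query_mbid()
        return self._mbid


linker = LazyLinker("Radiohead", "Creep")
print(linker.queries)  # construction alone issued no query
linker.get_mbid()
linker.get_mbid()
print(linker.queries)  # the second getter call reused the stored result
```

The sentinel matters because None is itself a valid result: without it, every call that found nothing would trigger a fresh query.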
Usage patterns:
Minimal input (metadata only):
linker = Align(artist="Radiohead", track="Creep")
With identifiers (preferred):
linker = Align(
    mbid_track="6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e",
    isrc="GBAYE9200070",
)
Full metadata for best matching:
linker = Align(
    artist="The Beatles",
    track="Hey Jude",
    album="Hey Jude",
    duration=431,
    track_number=1,
)
Metadata Getter Methods
All getters return None if the data is unavailable. No exceptions are raised.
get_artist()
artist = linker.get_artist()
Returns: str or None. Artist name from best available source (MusicBrainz > Deezer > YouTube > input).
Behavior:
- If MBID available, returns MusicBrainz artist
- Falls back to Deezer artist if found
- Falls back to YouTube artist if found
- Returns input artist if no services matched
- Returns None if no artist information available
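The fallback order above amounts to "first non-empty value wins". A minimal sketch of that rule follows; `first_available` is a hypothetical helper written for illustration and is not a function in the library.

```python
from typing import Optional


def first_available(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is neither None nor empty."""
    for value in candidates:
        if value:
            return value
    return None


# Candidates are passed in precedence order:
# MusicBrainz, Deezer, YouTube, then the caller's own input.
artist = first_available(None, "Radiohead", None, "radiohead")
print(artist)
```

Here MusicBrainz produced nothing, so the Deezer value is used and the lower-priority input string is ignored.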
get_album()
album = linker.get_album()
Returns: str or None. Album/release name.
Behavior: Same cascading fallback as get_artist().
get_track()
track = linker.get_track()
Returns: str or None. Track/recording name.
Behavior: Same cascading fallback as get_artist().
get_track_number()
track_number = linker.get_track_number()
Returns: int or None. Track position on album.
Behavior:
- Returns MusicBrainz track number if available
- Falls back to input track_number
- Returns None if unavailable
get_duration()
duration = linker.get_duration()
Returns: int, float, or None. Track duration in seconds.
Behavior:
- Returns MusicBrainz duration if available (milliseconds converted to seconds)
- Falls back to Deezer duration
- Falls back to input duration
- Returns None if unavailable
Note: MusicBrainz stores duration in milliseconds. The library converts to seconds for consistency.
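The unit conversion is simple, but the None case is easy to miss. A sketch, assuming the millisecond value arrives as an integer or None; `ms_to_seconds` is a hypothetical name, and whether the library rounds the result is not documented.

```python
from typing import Optional


def ms_to_seconds(length_ms: Optional[int]) -> Optional[float]:
    """Convert a millisecond length to seconds, passing None through."""
    if length_ms is None:
        return None
    return length_ms / 1000.0


print(ms_to_seconds(431000))
print(ms_to_seconds(None))
```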
get_release_date()
release_date = linker.get_release_date()
Returns: str or None. Release date in ISO format (YYYY-MM-DD) or year only (YYYY).
Behavior:
- Returns MusicBrainz release date if available
- Falls back to Deezer release date
- Returns None if unavailable
Format inconsistency: MusicBrainz may return full date, Deezer typically returns year only.
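Callers who need comparable values across sources can reduce both formats to a year. The helper below is a sketch under that assumption; `release_year` is a hypothetical name, not part of the library.

```python
import re
from typing import Optional

# Accepts YYYY, YYYY-MM, or YYYY-MM-DD and captures the year.
_DATE_RE = re.compile(r"^(\d{4})(?:-\d{2}(?:-\d{2})?)?$")


def release_year(release_date: Optional[str]) -> Optional[int]:
    """Extract the year from a release date string, or None if unparseable."""
    if not release_date:
        return None
    match = _DATE_RE.match(release_date.strip())
    return int(match.group(1)) if match else None


print(release_year("1968-08-26"))
print(release_year("1968"))
```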
get_isrc()
isrc = linker.get_isrc()
Returns: str or None. International Standard Recording Code.
Behavior:
- Returns input ISRC if provided
- Extracts from MusicBrainz recording if available
- Extracts from Deezer result if available
- Returns None if unavailable
Format: Standard ISRC format (e.g., "GBAYE9200070"). No validation performed.
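Since the library performs no validation, a caller may want to check the shape of an ISRC before passing it in. The sketch below assumes the conventional 12-character layout (2 letters, 3 alphanumerics, 7 digits) and tolerates hyphenated input; `is_valid_isrc` is a hypothetical helper. It checks format only, not whether the code was ever issued.

```python
import re

# Prefix (2 letters) + registrant (3 alphanumerics)
# + year (2 digits) + designation (5 digits).
_ISRC_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")


def is_valid_isrc(isrc: str) -> bool:
    """Return True if the string has the shape of an ISRC."""
    normalized = isrc.replace("-", "").upper()
    return _ISRC_RE.fullmatch(normalized) is not None


print(is_valid_isrc("GBAYE9200070"))
print(is_valid_isrc("GBAYE92000"))
```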
get_bpm()
bpm = linker.get_bpm()
Returns: int, float, or None. Tempo in beats per minute.
Behavior:
- Returns Deezer BPM if available
- Returns None if unavailable
Note: MusicBrainz doesn't provide BPM in standard queries, so Deezer is the only source.
Identifier Getter Methods
get_mbid()
mbid = linker.get_mbid()
Returns: str or None. MusicBrainz recording ID (UUID format).
Behavior:
- Returns input mbid_track if provided
- Queries MusicBrainz by ISRC if available
- Queries MusicBrainz by metadata if ISRC unavailable
- Returns None if no match found
Format: UUID string (e.g., "6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e").
get_deezer_id()
deezer_id = linker.get_deezer_id()
Returns: int or None. Deezer track ID.
Behavior:
- Queries Deezer by ISRC if available
- Queries Deezer by metadata if ISRC unavailable
- Filters by duration (±3 seconds)
- Returns None if no match found
Format: Integer (e.g., 123456789).
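The duration filter in the list above can be sketched as a plain list comprehension. The candidate tuples, the `filter_by_duration` name, and the inclusive comparison are assumptions for illustration; the library's actual data structures and boundary handling may differ.

```python
from typing import Iterable, List, Optional, Tuple

Candidate = Tuple[int, float]  # (hypothetical track ID, duration in seconds)


def filter_by_duration(
    candidates: Iterable[Candidate],
    target: Optional[float],
    threshold: float = 3.0,
) -> List[Candidate]:
    """Keep candidates whose duration is within `threshold` seconds of `target`."""
    if target is None:
        return list(candidates)  # no reference duration, so nothing to filter on
    return [c for c in candidates if abs(c[1] - target) <= threshold]


candidates = [(101, 238.0), (102, 245.0), (103, 236.5)]
kept = filter_by_duration(candidates, target=239)
print(kept)
```

With a target of 239 seconds, the 245-second candidate is 6 seconds off and is dropped; the other two fall inside the window.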
get_deezer_link()
deezer_link = linker.get_deezer_link()
Returns: str or None. Full Deezer track URL.
Behavior:
- Calls get_deezer_id() internally
- Constructs URL: f"https://www.deezer.com/track/{deezer_id}"
- Returns None if no Deezer ID available
Format: Full URL (e.g., "https://www.deezer.com/track/123456789").
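The URL construction and its None handling fit in a few lines. `deezer_track_url` is a hypothetical free function that mirrors the behavior listed above, not the library's own method.

```python
from typing import Optional


def deezer_track_url(deezer_id: Optional[int]) -> Optional[str]:
    """Build a Deezer track URL, or return None when no ID is known."""
    if deezer_id is None:
        return None
    return f"https://www.deezer.com/track/{deezer_id}"


print(deezer_track_url(123456789))
```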
get_youtube_link()
youtube_link = linker.get_youtube_link()
Returns: str or None. YouTube Music track URL.
Behavior:
- Queries YouTube Music by metadata (artist, track, album)
- Returns first result (no sophisticated ranking)
- Returns None if no results
Format: Full YouTube URL (e.g., "https://www.youtube.com/watch?v=dQw4w9WgXcQ").
Warning: YouTube matching is weak. First result assumed correct. No duration filtering.
get_acousticbrainz_link()
acousticbrainz_link = linker.get_acousticbrainz_link()
Returns: str or None. AcousticBrainz URL.
Behavior:
- Requires MBID (calls get_mbid() internally)
- Checks if https://acousticbrainz.org/{mbid} returns HTTP 200
- Returns URL if exists, None otherwise
Critical issue: AcousticBrainz shut down in 2022. This method always returns None. Dead code.
Internal Service Methods
Not part of public API but exposed in service classes.
MusicBrainzAlign Methods
get_recording(mbid): Direct MusicBrainz recording lookup by MBID.
get_best_match(artist, track, album, duration): Search MusicBrainz by metadata with filtering.
get_iswc(): Retrieve International Standard Musical Work Code.
Implementation details:
from musicmetalinker.linking import MusicBrainzAlign
mbid = "..."  # MusicBrainz recording ID
mb = MusicBrainzAlign(mbid=mbid)
recording = mb.get_recording(mbid)
# Returns dict with artist, album, track, duration, isrcs, etc.
Not intended for direct use. Align class wraps these methods.
DeezerAlign Methods
best_match(artist, track, album, duration, duration_threshold=3): Search Deezer with duration filtering.
get_rank(): Retrieve Deezer popularity rank.
Implementation details:
from musicmetalinker.linking import DeezerAlign
artist, track, album, duration = "...", "...", "...", 123
deezer = DeezerAlign(artist=artist, track=track, album=album, duration=duration)
match = deezer.best_match(artist, track, album, duration)
# Returns Deezer track object or None
Duration threshold defaults to 3 seconds. Adjustable for stricter/looser matching.
YouTubeAlign Methods
get_best_match(artist, track, album): Search YouTube Music.
get_youtube_id(): Extract video ID from search results.
Implementation details:
from musicmetalinker.linking import YouTubeAlign
artist, track, album = "...", "...", "..."
yt = YouTubeAlign(artist=artist, track=track, album=album)
match = yt.get_best_match(artist, track, album)
# Returns YouTube Music result dict or None
No duration parameter. No filtering. First result returned.
Batch Processing API
link_partitions.py CLI
python link_partitions.py <directory> [options]
Arguments:
directory (positional): Path to directory containing JAMS files.
Options:
--save: Write enriched JAMS files back to disk. Without this flag, only the CSV output is generated.
--limit audio: Only process JAMS files with audio content. Skip annotation-only files.
--overwrite: Overwrite existing enriched JAMS files. Without this flag, existing files are skipped.
Output:
CSV file with columns:
- jams_file: Original JAMS filename
- track_name, artist_name, album_name: Metadata
- track_number, duration, release_year: Attributes
- musicbrainz: MBID
- isrc: ISRC
- deezer_id, deezer_url: Deezer identifiers
- youtube_url: YouTube Music link
- acousticbrainz: AcousticBrainz link (always None)
- spotify_id: Spotify ID (if available)
Log file: link_partitions.log in current directory.
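The CSV output can be inspected with the standard library. The sketch below uses made-up sample rows with a subset of the columns listed above, and it assumes missing values appear either as an empty field or as the literal text "None"; the actual representation in the file is not documented here.

```python
import csv
import io

# Made-up sample rows using a subset of the documented columns.
sample_csv = io.StringIO(
    "jams_file,track_name,artist_name,musicbrainz,isrc\n"
    "a.jams,Creep,Radiohead,6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e,GBAYE9200070\n"
    "b.jams,Unknown Track,Unknown Artist,,\n"
)

rows = list(csv.DictReader(sample_csv))

# Treat an empty field or the literal "None" as "no match" (an assumption).
unmatched = [row["jams_file"] for row in rows if row["musicbrainz"] in ("", "None")]

print(f"{len(rows) - len(unmatched)} of {len(rows)} tracks have an MBID")
print("unmatched:", unmatched)
```

For a real run, replace the `io.StringIO` object with an open file handle on the generated CSV.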
JAMSProcessor API
from musicmetalinker.preprocessor import JAMSProcessor
processor = JAMSProcessor(jams_file_path)
metadata = processor.extract_metadata()
# Returns dict with artist, track, album, duration, etc.
processor.enrich_jams(align_instance)
processor.write_jams(output_path)
extract_metadata(): Parses JAMS file and returns metadata dict.
enrich_jams(align): Takes Align instance and adds identifiers to JAMS structure.
write_jams(path): Writes enriched JAMS to file.
Error Handling
No exceptions raised by public API. All errors silently suppressed.
Pattern:
- Service query fails: Returns None
- Network error: Returns None
- Invalid input: Returns None
- No match found: Returns None
Implications:
- No distinction between error types
- No error messages
- No logging of failures (except in batch mode)
- Caller cannot determine why None returned
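Because every failure collapses to None, the most a caller can do from outside is record which getters came back empty. A sketch of that bookkeeping follows; `missing_fields` is a hypothetical helper and `FakeLinker` is a stand-in for Align so the example runs without network access.

```python
from typing import Callable, Dict, List, Optional


def missing_fields(getters: Dict[str, Callable[[], Optional[object]]]) -> List[str]:
    """Call each getter and return the names of those that returned None."""
    return [name for name, getter in getters.items() if getter() is None]


class FakeLinker:
    """Stand-in for Align; returns canned values instead of querying services."""

    def get_mbid(self) -> Optional[str]:
        return "6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e"

    def get_deezer_id(self) -> Optional[int]:
        return None

    def get_bpm(self) -> Optional[float]:
        return None


linker = FakeLinker()
report = missing_fields({
    "mbid": linker.get_mbid,
    "deezer_id": linker.get_deezer_id,
    "bpm": linker.get_bpm,
})
print(report)
```

This tells the caller what is missing, not why: a network error and a genuine non-match remain indistinguishable.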
Debugging:
- Enable logging to see internal errors (outside batch mode, failures may not be logged at all)
- Check link_partitions.log for batch processing errors
- Add print statements to source code
Rate Limiting
No rate limiting implemented.
Risks:
- MusicBrainz rate limits: 1 request/second recommended, not enforced by the library
- Deezer rate limits: Unknown, not enforced by the library
- YouTube Music rate limits: Unknown, not enforced by the library
Batch processing: No delays between requests. High risk of rate limiting or IP bans.
Recommendation: Add manual delays in batch processing loops.
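One way to add such delays is a small generator that spaces out iteration. This is a sketch, not part of the library: `throttled` is a hypothetical helper, and the `sleep` and `clock` parameters exist only so the timing can be replaced in tests.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield items no faster than one per `min_interval` seconds."""
    last: Optional[float] = None
    for item in items:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


# Short interval here so the demo finishes quickly.
for name in throttled(["a.jams", "b.jams"], min_interval=0.05):
    print(name)
```

A single track may trigger several service requests (one per getter), so an interval sized for one request per second per track is a lower bound rather than a guarantee.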
Caching
Results are cached for the lifetime of an Align instance. There is no cross-instance caching.
Behavior:
- First call to get_mbid() queries MusicBrainz
- Second call to get_mbid() returns cached value
- Creating new Align instance queries again
No persistent cache: No disk cache, no Redis, no memcached.
Batch processing: Each track creates new Align instance. No cache reuse across tracks.
Thread Safety
Not thread-safe. No synchronization primitives.
Unsafe operations:
- Concurrent calls to same Align instance
- Concurrent batch processing of same directory
Safe operations:
- Multiple Align instances in separate threads (each queries independently)
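The safe pattern, one instance per task with nothing shared, can be sketched with a thread pool. `FakeAlign` and `link_one` are hypothetical stand-ins so the example runs offline; with the real class, each worker would issue its own service requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


class FakeAlign:
    """Stand-in for Align; holds per-instance state only."""

    def __init__(self, artist: str, track: str) -> None:
        self.artist = artist
        self.track = track

    def get_track(self) -> Optional[str]:
        return self.track


def link_one(item: Dict[str, str]) -> Optional[str]:
    # A fresh instance per task: no object is shared between threads.
    return FakeAlign(**item).get_track()


tracks: List[Dict[str, str]] = [
    {"artist": "Radiohead", "track": "Creep"},
    {"artist": "The Beatles", "track": "Hey Jude"},
]

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(link_one, tracks))  # map preserves input order

print(results)
```

Running lookups concurrently multiplies the request rate, so this pattern makes the rate-limit risk worse unless the workers are throttled.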
Authentication
MusicBrainz: No authentication. User-Agent header required ("elka/0.1" hardcoded).
Deezer: No authentication for search API.
YouTube Music: No authentication. Uses unofficial API.
Spotify: OAuth2 client credentials required. Configured in external mml_secrets.py file.
Spotify usage: Limited to ISRC extraction in Billboard dataset cleaning. Not used in main Align workflow.
API Versioning
No API versioning. Library version 0.0.1 indicates pre-release.
Breaking changes: Possible in any release. No stability guarantees.
Compatibility: No backward compatibility promises.
Dependencies for API Usage
Minimum dependencies for using Align class:
- musicbrainzngs
- deezer-python
- ytmusicapi
- requests
Optional dependencies:
- jams (for JAMS file support)
- pandas (for batch CSV output)
- spotipy (for Spotify integration)
Performance Characteristics
Query latency:
- MusicBrainz: 100-500ms per query
- Deezer: 50-200ms per query
- YouTube Music: 100-300ms per query
Total latency: Sum of all service queries (sequential execution). Expect 250-1000ms per track.
Batch processing: Linear scaling. 1000 tracks = 1000x single track latency.
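To put the linear scaling in concrete terms, the per-track range above can be multiplied out. The figures are the estimates quoted in this section, not measurements; `batch_latency_seconds` is a hypothetical helper.

```python
def batch_latency_seconds(n_tracks: int, per_track_ms: float) -> float:
    """Estimate total sequential latency for a batch, in seconds."""
    return n_tracks * per_track_ms / 1000.0


low = batch_latency_seconds(1000, 250)    # best case: 250 ms per track
high = batch_latency_seconds(1000, 1000)  # worst case: 1000 ms per track
print(f"{low:.0f} to {high:.0f} seconds")
print(f"{low / 60:.1f} to {high / 60:.1f} minutes")
```

Any delay added for rate limiting comes on top of this estimate.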
API Limitations
- No bulk queries: Each track requires separate Align instance
- No async support: Synchronous only
- No streaming results: All-or-nothing queries
- No partial updates: Can't update single field
- No validation: No input validation, no output validation
- No error details: Only None on failure
- Dead integrations: AcousticBrainz non-functional
- Weak YouTube matching: First result assumed correct
API Strengths
- Simple interface: Single class, clear getters
- Flexible input: Works with identifiers or metadata
- Cascading fallback: Graceful degradation
- Lazy evaluation: Only query when needed
- JAMS support: Academic standard format
API Design Recommendations
For production use:
- Add exceptions: Raise specific errors instead of returning None
- Add validation: Validate input parameters
- Add async API: Async versions of all getters
- Add bulk API: Process multiple tracks in single call
- Add configuration: Runtime configuration for thresholds
- Add logging: Structured logging with correlation IDs
- Add rate limiting: Respect API limits
- Remove dead code: Delete AcousticBrainz methods
- Add documentation: Docstrings for all public methods
- Add type hints: Full type annotations
The API surface is clean and simple. The implementation needs hardening.