MusicMetaLinker API Reference

API Type

MusicMetaLinker is a Python library, not a web service. It has no REST API, no GraphQL endpoint, and no command-line interface for the core library functionality.

Batch processing is available through a CLI script (link_partitions.py), but the core library is Python-only.

Primary Interface: Align Class

Constructor

from musicmetalinker.linking import Align

linker = Align(
    mbid_track=None,
    mbid_release=None,
    artist=None,
    album=None,
    track=None,
    track_number=None,
    duration=None,
    isrc=None,
    strict=False
)

Parameters:

mbid_track (str, optional): MusicBrainz recording ID. If provided, MusicBrainz is queried first and treated as authoritative.

mbid_release (str, optional): MusicBrainz release ID. Used for album-level metadata.

artist (str, optional): Artist name. Used for metadata-based search when identifiers unavailable.

album (str, optional): Album name. Used for filtering and matching.

track (str, optional): Track name. Primary search term for metadata-based queries.

track_number (int, optional): Track position on album. Used for filtering multiple matches.

duration (int or float, optional): Track duration in seconds. Critical for filtering. Deezer uses ±3 second threshold.

isrc (str, optional): International Standard Recording Code. If provided, used for direct lookup on Deezer and MusicBrainz.

strict (bool, optional): Strict matching mode. Behavior not fully documented. Likely affects fuzzy matching thresholds.

Returns: Align instance. No exceptions raised during construction. Queries execute lazily when getters called.
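
The lazy behavior described above can be pictured with a small stand-in class. This is a minimal sketch, not MusicMetaLinker's actual implementation: LazyLinker and fake_lookup are hypothetical names, and the lookup is a local function rather than a network call.

```python
class LazyLinker:
    """Hypothetical stand-in for Align: stores inputs, queries on first getter call."""

    def __init__(self, lookup, artist=None, track=None):
        self._lookup = lookup      # callable standing in for a service query
        self._artist = artist
        self._track = track
        self._result = None
        self._queried = False      # nothing is fetched at construction time

    def _ensure_queried(self):
        # Run the lookup at most once per instance.
        if not self._queried:
            self._result = self._lookup(self._artist, self._track)
            self._queried = True

    def get_mbid(self):
        self._ensure_queried()
        return self._result


calls = []


def fake_lookup(artist, track):
    """Records each call so the laziness is observable; returns a dummy ID."""
    calls.append((artist, track))
    return "dummy-id"


linker = LazyLinker(fake_lookup, artist="Radiohead", track="Creep")
calls_after_construction = len(calls)  # construction triggers no lookup
first = linker.get_mbid()              # first getter call runs the lookup
second = linker.get_mbid()             # second call reuses the stored result
```

In this sketch the lookup runs once, on the first getter call, and later calls on the same instance reuse the stored value.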

Usage patterns:

Minimal input (metadata only):

linker = Align(artist="Radiohead", track="Creep")

With identifiers (preferred):

linker = Align(
    mbid_track="6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e",
    isrc="GBAYE9200070"
)

Full metadata for best matching:

linker = Align(
    artist="The Beatles",
    track="Hey Jude",
    album="Hey Jude",
    duration=431,
    track_number=1
)

Metadata Getter Methods

All getters return None if data unavailable. No exceptions raised.

get_artist()

artist = linker.get_artist()

Returns: str or None. Artist name from best available source (MusicBrainz > Deezer > YouTube > input).

Behavior:

  • If MBID available, returns MusicBrainz artist
  • Falls back to Deezer artist if found
  • Falls back to YouTube artist if found
  • Returns input artist if no services matched
  • Returns None if no artist information available
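
The cascading fallback can be sketched as "take the first source that has a value". The helper name first_available is hypothetical, and the literal values stand in for what each service might return.

```python
def first_available(*candidates):
    """Return the first candidate that is not None, or None if all are missing."""
    for value in candidates:
        if value is not None:
            return value
    return None


# Argument order mirrors the documented priority:
# MusicBrainz > Deezer > YouTube > input.
artist = first_available(None, "Radiohead", None, "radiohead")
nothing = first_available(None, None, None, None)
```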

get_album()

album = linker.get_album()

Returns: str or None. Album/release name.

Behavior: Same cascading fallback as get_artist().

get_track()

track = linker.get_track()

Returns: str or None. Track/recording name.

Behavior: Same cascading fallback as get_artist().

get_track_number()

track_number = linker.get_track_number()

Returns: int or None. Track position on album.

Behavior:

  • Returns MusicBrainz track number if available
  • Falls back to input track_number
  • Returns None if unavailable

get_duration()

duration = linker.get_duration()

Returns: int, float, or None. Track duration in seconds.

Behavior:

  • Returns MusicBrainz duration if available (milliseconds converted to seconds)
  • Falls back to Deezer duration
  • Falls back to input duration
  • Returns None if unavailable

Note: MusicBrainz stores duration in milliseconds. The library converts to seconds for consistency.
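
A minimal sketch of the unit conversion, assuming the raw MusicBrainz length may arrive as an int or as a numeric string. The function name ms_to_seconds is hypothetical and not part of the library.

```python
def ms_to_seconds(length_ms):
    """Convert a millisecond length (int or numeric string) to seconds."""
    if length_ms is None:
        return None
    return int(length_ms) / 1000


seconds_from_int = ms_to_seconds(431000)
seconds_from_str = ms_to_seconds("238000")
```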

get_release_date()

release_date = linker.get_release_date()

Returns: str or None. Release date in ISO format (YYYY-MM-DD) or year only (YYYY).

Behavior:

  • Returns MusicBrainz release date if available
  • Falls back to Deezer release date
  • Returns None if unavailable

Format inconsistency: MusicBrainz may return full date, Deezer typically returns year only.
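
Callers that need a uniform value can reduce both formats to a year. A minimal sketch, assuming the string always starts with a four-digit year when present; release_year is a hypothetical helper, not a library function.

```python
def release_year(date_str):
    """Extract the year from 'YYYY-MM-DD' or 'YYYY'; return None otherwise."""
    if not date_str:
        return None
    year = date_str[:4]
    # Require exactly four digits so short or malformed strings are rejected.
    return int(year) if len(year) == 4 and year.isdigit() else None


full = release_year("1992-09-21")
year_only = release_year("1992")
```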

get_isrc()

isrc = linker.get_isrc()

Returns: str or None. International Standard Recording Code.

Behavior:

  • Returns input ISRC if provided
  • Extracts from MusicBrainz recording if available
  • Extracts from Deezer result if available
  • Returns None if unavailable

Format: Standard ISRC format (e.g., "GBAYE9200070"). No validation performed.
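
Since the library performs no validation, a caller can check the format before passing an ISRC in. This sketch checks structure only (two-letter prefix, three alphanumeric registrant characters, two-digit year, five-digit designation); normalize_isrc is a hypothetical helper.

```python
import re

# 2 letters + 3 alphanumerics + 7 digits (2-digit year, 5-digit designation).
ISRC_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")


def normalize_isrc(value):
    """Strip hyphens, uppercase, and return the ISRC if well-formed, else None."""
    if value is None:
        return None
    candidate = value.replace("-", "").strip().upper()
    return candidate if ISRC_PATTERN.fullmatch(candidate) else None


clean = normalize_isrc("GBAYE9200070")
hyphenated = normalize_isrc("gb-aye-92-00070")
bad = normalize_isrc("12345")
```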

get_bpm()

bpm = linker.get_bpm()

Returns: int, float, or None. Tempo in beats per minute.

Behavior:

  • Returns Deezer BPM if available
  • Returns None if unavailable

Note: MusicBrainz doesn't provide BPM in standard queries. Only Deezer source.

Identifier Getter Methods

get_mbid()

mbid = linker.get_mbid()

Returns: str or None. MusicBrainz recording ID (UUID format).

Behavior:

  • Returns input mbid_track if provided
  • Queries MusicBrainz by ISRC if available
  • Queries MusicBrainz by metadata if ISRC unavailable
  • Returns None if no match found

Format: UUID string (e.g., "6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e").
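
The UUID format can be checked with the standard library before an MBID is passed to Align. A minimal sketch; is_valid_mbid is a hypothetical helper and only checks the format, not whether the recording exists.

```python
import uuid


def is_valid_mbid(value):
    """True if value is a canonical hyphenated UUID string."""
    try:
        # Round-trip through uuid.UUID and compare with the input, so that
        # unhyphenated or brace-wrapped variants are rejected.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


ok = is_valid_mbid("6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e")
not_ok = is_valid_mbid("not-a-uuid")
```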

get_deezer_id()

deezer_id = linker.get_deezer_id()

Returns: int or None. Deezer track ID.

Behavior:

  • Queries Deezer by ISRC if available
  • Queries Deezer by metadata if ISRC unavailable
  • Filters by duration (±3 seconds)
  • Returns None if no match found

Format: Integer (e.g., 123456789).
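
The ±3 second duration filter can be sketched as follows. The candidate dicts and the function name filter_by_duration are illustrative assumptions; the library's internal data structures may differ.

```python
def filter_by_duration(candidates, target_seconds, threshold=3):
    """Keep candidates whose duration is within ±threshold seconds of the target."""
    return [
        c for c in candidates
        if abs(c["duration"] - target_seconds) <= threshold
    ]


# Made-up search results; only the duration field matters here.
candidates = [
    {"id": 1, "duration": 431},
    {"id": 2, "duration": 240},
    {"id": 3, "duration": 429},
]
matches = filter_by_duration(candidates, target_seconds=431)
match_ids = [c["id"] for c in matches]
```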

get_deezer_link()

deezer_link = linker.get_deezer_link()

Returns: str or None. Full Deezer track URL.

Behavior:

Format: Full URL (e.g., "https://www.deezer.com/track/123456789").

get_youtube_link()

youtube_link = linker.get_youtube_link()

Returns: str or None. YouTube Music track URL.

Behavior:

  • Queries YouTube Music by metadata (artist, track, album)
  • Returns first result (no sophisticated ranking)
  • Returns None if no results

Format: Full YouTube URL (e.g., "https://www.youtube.com/watch?v=dQw4w9WgXcQ").

Warning: YouTube matching is weak. First result assumed correct. No duration filtering.

get_acousticbrainz_link()

acousticbrainz_link = linker.get_acousticbrainz_link()

Returns: str or None. AcousticBrainz URL.

Behavior:

Critical issue: AcousticBrainz shut down in 2022. This method always returns None. Dead code.

Internal Service Methods

Not part of public API but exposed in service classes.

MusicBrainzAlign Methods

get_recording(mbid): Direct MusicBrainz recording lookup by MBID.

get_best_match(artist, track, album, duration): Search MusicBrainz by metadata with filtering.

get_iswc(): Retrieve International Standard Musical Work Code.

Implementation details:

from musicmetalinker.linking import MusicBrainzAlign

mbid = "..."  # placeholder MusicBrainz recording ID
mb = MusicBrainzAlign(mbid=mbid)
recording = mb.get_recording(mbid)
# Returns dict with artist, album, track, duration, isrcs, etc.

Not intended for direct use. Align class wraps these methods.

DeezerAlign Methods

best_match(artist, track, album, duration, duration_threshold=3): Search Deezer with duration filtering.

get_rank(): Retrieve Deezer popularity rank.

Implementation details:

from musicmetalinker.linking import DeezerAlign

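# Placeholder search terms; duration is in seconds
artist, track, album, duration = "...", "...", "...", 123
deezer = DeezerAlign(artist=artist, track=track, album=album, duration=duration)
match = deezer.best_match(artist, track, album, duration)
# Returns Deezer track object or None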

Duration threshold defaults to 3 seconds. Adjustable for stricter/looser matching.

YouTubeAlign Methods

get_best_match(artist, track, album): Search YouTube Music.

get_youtube_id(): Extract video ID from search results.

Implementation details:

from musicmetalinker.linking import YouTubeAlign

# Placeholder search terms
artist, track, album = "...", "...", "..."
yt = YouTubeAlign(artist=artist, track=track, album=album)
match = yt.get_best_match(artist, track, album)
# Returns YouTube Music result dict or None

No duration parameter. No filtering. First result returned.

Batch Processing API

python link_partitions.py <directory> [options]

Arguments:

directory (positional): Path to directory containing JAMS files.

Options:

--save: Write enriched JAMS files back to disk. Without this flag, only CSV output generated.

--limit audio: Only process JAMS files with audio content. Skip annotation-only files.

--overwrite: Overwrite existing enriched JAMS files. Without this flag, existing files skipped.

Output:

CSV file with columns:

  • jams_file: Original JAMS filename
  • track_name, artist_name, album_name: Metadata
  • track_number, duration, release_year: Attributes
  • musicbrainz: MBID
  • isrc: ISRC
  • deezer_id, deezer_url: Deezer identifiers
  • youtube_url: YouTube Music link
  • acousticbrainz: AcousticBrainz link (always None)
  • spotify_id: Spotify ID (if available)

Log file: link_partitions.log in current directory.
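
The CSV output can be post-processed with the standard library, for example to list files that got no MusicBrainz match. This is a sketch on made-up rows using a subset of the documented columns. How the script serializes a missing value (empty cell or the text "None") is an assumption, so both are treated as missing.

```python
import csv
import io

# In-memory stand-in for the CSV written by link_partitions.py (made-up rows).
sample_csv = io.StringIO(
    "jams_file,track_name,artist_name,musicbrainz,deezer_id\n"
    "a.jams,Track A,Artist A,,\n"
    "b.jams,Track B,Artist B,placeholder-mbid,123456789\n"
    "c.jams,Track C,Artist C,None,\n"
)

MISSING = {"", "None"}  # assumed representations of a missing value

rows = list(csv.DictReader(sample_csv))
unmatched = [r["jams_file"] for r in rows if r["musicbrainz"] in MISSING]
```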

JAMSProcessor API

from musicmetalinker.preprocessor import JAMSProcessor

processor = JAMSProcessor(jams_file_path)
metadata = processor.extract_metadata()
# Returns dict with artist, track, album, duration, etc.

processor.enrich_jams(align_instance)
processor.write_jams(output_path)

extract_metadata(): Parses JAMS file and returns metadata dict.

enrich_jams(align): Takes Align instance and adds identifiers to JAMS structure.

write_jams(path): Writes enriched JAMS to file.

Error Handling

No exceptions raised by public API. All errors silently suppressed.

Pattern:

  • Service query fails: Returns None
  • Network error: Returns None
  • Invalid input: Returns None
  • No match found: Returns None

Implications:

  • No distinction between error types
  • No error messages
  • No logging of failures (except in batch mode)
  • Caller cannot determine why None returned

Debugging:

  • Enable logging to see internal errors
  • Check link_partitions.log for batch processing errors
  • Add print statements to source code

Rate Limiting

No rate limiting implemented.

Risks:

  • MusicBrainz rate limits: 1 request/second recommended, not enforced by the library
  • Deezer rate limits: Unknown, not enforced by the library
  • YouTube Music rate limits: Unknown, not enforced by the library

Batch processing: No delays between requests. High risk of rate limiting or IP bans.

Recommendation: Add manual delays in batch processing loops.
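
One way to add such delays is a small throttle that enforces a minimum interval between calls. This is a sketch, not part of MusicMetaLinker: the Throttle class is hypothetical, and the clock and sleep arguments exist only so the behavior can be tested without real waiting.

```python
import time


class Throttle:
    """Block so that successive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Intended use in a batch loop (not executed here):
#   throttle = Throttle(min_interval=1.0)
#   for item in items:
#       throttle.wait()
#       ...create an Align instance and call its getters...
```

A single interval per loop iteration is conservative: one Align instance may query several services, so a per-service throttle would be finer-grained.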

Caching

Results cached within Align instance lifetime. No cross-instance caching.

Behavior:

  • First call to get_mbid() queries MusicBrainz
  • Second call to get_mbid() returns cached value
  • Creating new Align instance queries again

No persistent cache: No disk cache, no Redis, no memcached.

Batch processing: Each track creates new Align instance. No cache reuse across tracks.
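
Callers who want reuse across instances can add an in-process cache in front of the lookup. A minimal sketch using functools.lru_cache; slow_lookup is a hypothetical stand-in for "create an Align instance and call a getter", and the cache lives only for the current process.

```python
from functools import lru_cache

lookup_calls = []


def slow_lookup(artist, track):
    """Stand-in for an expensive service query; records each real call."""
    lookup_calls.append((artist, track))
    return f"id-for-{artist}-{track}"


@lru_cache(maxsize=1024)
def cached_lookup(artist, track):
    # Arguments must be hashable; identical (artist, track) pairs hit the cache.
    return slow_lookup(artist, track)


a = cached_lookup("Radiohead", "Creep")
b = cached_lookup("Radiohead", "Creep")  # served from the cache
```

Note that lru_cache also stores None results, so a transient failure would be cached for the life of the process.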

Thread Safety

Not thread-safe. No synchronization primitives.

Unsafe operations:

  • Concurrent calls to same Align instance
  • Concurrent batch processing of same directory

Safe operations:

  • Multiple Align instances in separate threads (each queries independently)
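
The safe pattern, one instance per task, can be sketched with a thread pool. StubLinker is a hypothetical stand-in for Align so the example runs without network access; with the real class, each worker would issue its own service queries.

```python
from concurrent.futures import ThreadPoolExecutor


class StubLinker:
    """Hypothetical stand-in for Align; holds its own state, shares nothing."""

    def __init__(self, artist, track):
        self.artist = artist
        self.track = track

    def get_track(self):
        return self.track


def link_one(item):
    artist, track = item
    linker = StubLinker(artist=artist, track=track)  # new instance per task
    return linker.get_track()


tracks = [("Radiohead", "Creep"), ("The Beatles", "Hey Jude")]
with ThreadPoolExecutor(max_workers=2) as pool:
    # pool.map returns results in input order.
    results = list(pool.map(link_one, tracks))
```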

Authentication

MusicBrainz: No authentication. User-Agent header required ("elka/0.1" hardcoded).

Deezer: No authentication for search API.

YouTube Music: No authentication. Uses unofficial API.

Spotify: OAuth2 client credentials required. Configured in external mml_secrets.py file.

Spotify usage: Limited to ISRC extraction in Billboard dataset cleaning. Not used in main Align workflow.

API Versioning

No API versioning. Library version 0.0.1 indicates pre-release.

Breaking changes: Possible in any release. No stability guarantees.

Compatibility: No backward compatibility promises.

Dependencies for API Usage

Minimum dependencies for using Align class:

  • musicbrainzngs
  • deezer-python
  • ytmusicapi
  • requests

Optional dependencies:

  • jams (for JAMS file support)
  • pandas (for batch CSV output)
  • spotipy (for Spotify integration)

Performance Characteristics

Query latency:

  • MusicBrainz: 100-500ms per query
  • Deezer: 50-200ms per query
  • YouTube Music: 100-300ms per query

Total latency: Sum of all service queries (sequential execution). Expect 250-1000ms per track.

Batch processing: Linear scaling. 1000 tracks = 1000x single track latency.
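
Worked out for the 1000-track case, using the 250-1000ms per-track estimate above and ignoring any added delays or retries:

```python
def batch_seconds(n_tracks, per_track_ms):
    """Total sequential time in seconds for n_tracks at per_track_ms each."""
    return n_tracks * per_track_ms / 1000


low = batch_seconds(1000, 250)     # best case: 250 seconds
high = batch_seconds(1000, 1000)   # worst case: 1000 seconds
low_minutes = round(low / 60, 1)
high_minutes = round(high / 60, 1)
```

That is roughly 4 to 17 minutes for 1000 tracks before any rate-limit delays are added.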

API Limitations

  1. No bulk queries: Each track requires separate Align instance
  2. No async support: Synchronous only
  3. No streaming results: All-or-nothing queries
  4. No partial updates: Can't update single field
  5. No validation: No input validation, no output validation
  6. No error details: Only None on failure
  7. Dead integrations: AcousticBrainz non-functional
  8. Weak YouTube matching: First result assumed correct

API Strengths

  1. Simple interface: Single class, clear getters
  2. Flexible input: Works with identifiers or metadata
  3. Cascading fallback: Graceful degradation
  4. Lazy evaluation: Only query when needed
  5. JAMS support: Academic standard format

API Design Recommendations

For production use:

  1. Add exceptions: Raise specific errors instead of returning None
  2. Add validation: Validate input parameters
  3. Add async API: Async versions of all getters
  4. Add bulk API: Process multiple tracks in single call
  5. Add configuration: Runtime configuration for thresholds
  6. Add logging: Structured logging with correlation IDs
  7. Add rate limiting: Respect API limits
  8. Remove dead code: Delete AcousticBrainz methods
  9. Add documentation: Docstrings for all public methods
  10. Add type hints: Full type annotations
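
Recommendation 1 could look something like the following. This is an illustrative sketch only: none of these exception classes exist in MusicMetaLinker, and require is a hypothetical caller-side helper that turns a None result into an explicit error.

```python
class LinkingError(Exception):
    """Base class for all (hypothetical) linking failures."""


class ServiceError(LinkingError):
    """A backing service could not be reached or returned an error."""


class NoMatchError(LinkingError):
    """The services responded, but nothing matched the input."""


def require(value, field):
    """Return value unchanged, or raise NoMatchError if it is None."""
    if value is None:
        raise NoMatchError(f"no value found for {field}")
    return value


try:
    require(None, "mbid")
except NoMatchError as exc:
    message = str(exc)
```

Separate subclasses let callers retry on ServiceError while treating NoMatchError as final.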

The API surface is clean and simple. The implementation needs hardening.