
# MusicMetaLinker API Reference
## API Type
MusicMetaLinker exposes a Python library API only. No REST API, no GraphQL, no command-line interface for the core library functionality.
Batch processing has a CLI (link_partitions.py), but the core library is Python-only.
## Primary Interface: Align Class
### Constructor
```python
from musicmetalinker.linking import Align

linker = Align(
    mbid_track=None,
    mbid_release=None,
    artist=None,
    album=None,
    track=None,
    track_number=None,
    duration=None,
    isrc=None,
    strict=False,
)
```
**Parameters:**
**mbid_track** (str, optional): MusicBrainz recording ID. If provided, MusicBrainz is queried first and treated as authoritative.
**mbid_release** (str, optional): MusicBrainz release ID. Used for album-level metadata.
**artist** (str, optional): Artist name. Used for metadata-based search when identifiers are unavailable.
**album** (str, optional): Album name. Used for filtering and matching.
**track** (str, optional): Track name. Primary search term for metadata-based queries.
**track_number** (int, optional): Track position on album. Used for filtering multiple matches.
**duration** (int or float, optional): Track duration in seconds. Critical for filtering: Deezer matching uses a ±3 second threshold by default.
**isrc** (str, optional): International Standard Recording Code. If provided, used for direct lookup on Deezer and MusicBrainz.
**strict** (bool, optional): Strict matching mode. Behavior not fully documented. Likely affects fuzzy matching thresholds.
**Returns:** Align instance. No exceptions raised during construction. Queries execute lazily, when a getter is first called.
**Usage patterns:**
Minimal input (metadata only):
```python
linker = Align(artist="Radiohead", track="Creep")
```
With identifiers (preferred):
```python
linker = Align(
    mbid_track="6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e",
    isrc="GBAYE9200070",
)
```
Full metadata for best matching:
```python
linker = Align(
    artist="The Beatles",
    track="Hey Jude",
    album="Hey Jude",
    duration=431,
    track_number=1,
)
```
### Metadata Getter Methods
All getters return None if data unavailable. No exceptions raised.
#### get_artist()
```python
artist = linker.get_artist()
```
**Returns:** str or None. Artist name from best available source (MusicBrainz > Deezer > YouTube > input).
**Behavior:**
- If MBID available, returns MusicBrainz artist
- Falls back to Deezer artist if found
- Falls back to YouTube artist if found
- Returns input artist if no services matched
- Returns None if no artist information available
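The cascade above amounts to "first non-None value wins". The following is a minimal sketch of that idea, not the library's actual code: `first_available` is a hypothetical helper, and the per-source values are placeholders.

```python
from typing import Optional


def first_available(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is not None, or None if all are missing."""
    for value in candidates:
        if value is not None:
            return value
    return None


# Hypothetical per-source values, ordered by priority:
# MusicBrainz > Deezer > YouTube > caller input.
artist = first_available(None, "Radiohead", None, "radiohead")
print(artist)  # Radiohead
```

Note that this sketch treats an empty string as a valid value; whether the library does the same is not documented.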
#### get_album()
```python
album = linker.get_album()
```
**Returns:** str or None. Album/release name.
**Behavior:** Same cascading fallback as get_artist().
#### get_track()
```python
track = linker.get_track()
```
**Returns:** str or None. Track/recording name.
**Behavior:** Same cascading fallback as get_artist().
#### get_track_number()
```python
track_number = linker.get_track_number()
```
**Returns:** int or None. Track position on album.
**Behavior:**
- Returns MusicBrainz track number if available
- Falls back to input track_number
- Returns None if unavailable
#### get_duration()
```python
duration = linker.get_duration()
```
**Returns:** int, float, or None. Track duration in seconds.
**Behavior:**
- Returns MusicBrainz duration if available (milliseconds converted to seconds)
- Falls back to Deezer duration
- Falls back to input duration
- Returns None if unavailable
**Note:** MusicBrainz stores duration in milliseconds. The library converts to seconds for consistency.
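The unit conversion can be sketched as follows. `ms_to_seconds` is a hypothetical helper; whether the library returns a float or truncates to an int is not documented, so the float result here is an assumption.

```python
from typing import Optional, Union

Number = Union[int, float]


def ms_to_seconds(length_ms: Optional[Number]) -> Optional[float]:
    """Convert a millisecond duration to seconds, passing None through."""
    if length_ms is None:
        return None
    return length_ms / 1000


print(ms_to_seconds(431000))  # 431.0
```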
#### get_release_date()
```python
release_date = linker.get_release_date()
```
**Returns:** str or None. Release date in ISO format (YYYY-MM-DD) or year only (YYYY).
**Behavior:**
- Returns MusicBrainz release date if available
- Falls back to Deezer release date
- Returns None if unavailable
**Format inconsistency:** MusicBrainz may return full date, Deezer typically returns year only.
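Because the returned string may be a full date or a year only, callers that need a consistent value can reduce it to a year. A minimal sketch, assuming the string follows one of the shapes `YYYY`, `YYYY-MM`, or `YYYY-MM-DD` (the `YYYY-MM` case is included because MusicBrainz allows partial dates); `release_year` is a hypothetical helper, not part of the library.

```python
import re
from typing import Optional

# Year, optionally followed by month, optionally followed by day.
_DATE_RE = re.compile(r"([0-9]{4})(?:-[0-9]{2}(?:-[0-9]{2})?)?")


def release_year(release_date: Optional[str]) -> Optional[int]:
    """Extract the year from 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD'; None otherwise."""
    if release_date is None:
        return None
    match = _DATE_RE.fullmatch(release_date.strip())
    return int(match.group(1)) if match else None


print(release_year("1992-09-21"))  # 1992
print(release_year("1968"))        # 1968
```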
#### get_isrc()
```python
isrc = linker.get_isrc()
```
**Returns:** str or None. International Standard Recording Code.
**Behavior:**
- Returns input ISRC if provided
- Extracts from MusicBrainz recording if available
- Extracts from Deezer result if available
- Returns None if unavailable
**Format:** Standard ISRC format (e.g., "GBAYE9200070"). No validation performed.
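Since the library performs no validation, a caller may want a shape check before passing an ISRC in or trusting one that comes back. A minimal sketch; `looks_like_isrc` is a hypothetical helper that checks only the 12-character layout (2-letter country code, 3-character registrant, 2-digit year, 5-digit designation), not whether the code is actually registered.

```python
import re

# Country code (2 letters), registrant (3 alphanumerics),
# year (2 digits), designation (5 digits): 12 characters in total.
_ISRC_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{2}[0-9]{5}")


def looks_like_isrc(value: str) -> bool:
    """Return True if value has the shape of an ISRC (hyphens and case ignored)."""
    compact = value.replace("-", "").upper()
    return _ISRC_RE.fullmatch(compact) is not None


print(looks_like_isrc("GBAYE9200070"))  # True
```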
#### get_bpm()
```python
bpm = linker.get_bpm()
```
**Returns:** int, float, or None. Tempo in beats per minute.
**Behavior:**
- Returns Deezer BPM if available
- Returns None if unavailable
**Note:** MusicBrainz doesn't provide BPM in standard queries. Deezer is the only source.
### Identifier Getter Methods
#### get_mbid()
```python
mbid = linker.get_mbid()
```
**Returns:** str or None. MusicBrainz recording ID (UUID format).
**Behavior:**
- Returns input mbid_track if provided
- Queries MusicBrainz by ISRC if available
- Queries MusicBrainz by metadata if ISRC unavailable
- Returns None if no match found
**Format:** UUID string (e.g., "6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e").
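A caller can check that a string has the canonical UUID shape before using it as `mbid_track`. A minimal sketch using only the standard library; `looks_like_mbid` is a hypothetical helper and does not confirm that the recording exists in MusicBrainz.

```python
import uuid


def looks_like_mbid(value: str) -> bool:
    """Return True if value is a canonical, hyphenated UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, 'urn:uuid:' prefixes, and unhyphenated hex;
    # comparing against the canonical rendering rejects those forms.
    return str(parsed) == value.lower()


print(looks_like_mbid("6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e"))  # True
```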
#### get_deezer_id()
```python
deezer_id = linker.get_deezer_id()
```
**Returns:** int or None. Deezer track ID.
**Behavior:**
- Queries Deezer by ISRC if available
- Queries Deezer by metadata if ISRC unavailable
- Filters by duration (±3 seconds)
- Returns None if no match found
**Format:** Integer (e.g., 123456789).
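The duration filter can be illustrated with a small sketch. The candidate tuples and the `filter_by_duration` helper are hypothetical, and treating the ±3 second bound as inclusive is an assumption; the library's exact comparison is not documented.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[int, float]  # (track id, duration in seconds)


def filter_by_duration(
    target: float, candidates: Sequence[Candidate], threshold: float = 3.0
) -> List[Candidate]:
    """Keep candidates whose duration is within +/- threshold seconds of target."""
    return [c for c in candidates if abs(c[1] - target) <= threshold]


candidates = [(1, 236), (2, 242), (3, 241)]
print(filter_by_duration(238, candidates))  # [(1, 236), (3, 241)]
```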
#### get_deezer_link()
```python
deezer_link = linker.get_deezer_link()
```
**Returns:** str or None. Full Deezer track URL.
**Behavior:**
- Calls get_deezer_id() internally
- Constructs URL: f"https://www.deezer.com/track/{deezer_id}"
- Returns None if no Deezer ID available
**Format:** Full URL (e.g., "https://www.deezer.com/track/123456789").
#### get_youtube_link()
```python
youtube_link = linker.get_youtube_link()
```
**Returns:** str or None. YouTube Music track URL.
**Behavior:**
- Queries YouTube Music by metadata (artist, track, album)
- Returns first result (no sophisticated ranking)
- Returns None if no results
**Format:** Full YouTube URL (e.g., "https://www.youtube.com/watch?v=dQw4w9WgXcQ").
**Warning:** YouTube matching is weak. First result assumed correct. No duration filtering.
#### get_acousticbrainz_link()
```python
acousticbrainz_link = linker.get_acousticbrainz_link()
```
**Returns:** str or None. AcousticBrainz URL.
**Behavior:**
- Requires MBID (calls get_mbid() internally)
- Checks if https://acousticbrainz.org/{mbid} returns HTTP 200
- Returns URL if exists, None otherwise
**Critical issue:** AcousticBrainz shut down in 2022. This method always returns None. Dead code.
### Internal Service Methods
Not part of public API but exposed in service classes.
#### MusicBrainzAlign Methods
**get_recording(mbid):** Direct MusicBrainz recording lookup by MBID.
**get_best_match(artist, track, album, duration):** Search MusicBrainz by metadata with filtering.
**get_iswc():** Retrieve International Standard Musical Work Code.
**Implementation details:**
```python
from musicmetalinker.linking import MusicBrainzAlign

mbid = "..."  # MusicBrainz recording ID (UUID string)
mb = MusicBrainzAlign(mbid=mbid)
recording = mb.get_recording(mbid)
# Returns dict with artist, album, track, duration, isrcs, etc.
```
Not intended for direct use. Align class wraps these methods.
#### DeezerAlign Methods
**best_match(artist, track, album, duration, duration_threshold=3):** Search Deezer with duration filtering.
**get_rank():** Retrieve Deezer popularity rank.
**Implementation details:**
```python
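from musicmetalinker.linking import DeezerAlign

# Placeholder inputs; duration is in seconds
artist, track, album, duration = "...", "...", "...", 123
deezer = DeezerAlign(artist=artist, track=track, album=album, duration=duration)
match = deezer.best_match(artist, track, album, duration)
# Returns Deezer track object or None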
```
Duration threshold defaults to 3 seconds. Adjustable for stricter/looser matching.
#### YouTubeAlign Methods
**get_best_match(artist, track, album):** Search YouTube Music.
**get_youtube_id():** Extract video ID from search results.
**Implementation details:**
```python
from musicmetalinker.linking import YouTubeAlign

# Placeholder inputs
artist, track, album = "...", "...", "..."
yt = YouTubeAlign(artist=artist, track=track, album=album)
match = yt.get_best_match(artist, track, album)
# Returns YouTube Music result dict or None
```
No duration parameter. No filtering. First result returned.
### Batch Processing API
#### link_partitions.py CLI
```bash
python link_partitions.py <directory> [options]
```
**Arguments:**
**directory** (positional): Path to directory containing JAMS files.
**Options:**
**--save:** Write enriched JAMS files back to disk. Without this flag, only CSV output generated.
**--limit audio:** Only process JAMS files with audio content. Skip annotation-only files.
**--overwrite:** Overwrite existing enriched JAMS files. Without this flag, existing files skipped.
**Output:**
CSV file with columns:
- jams_file: Original JAMS filename
- track_name, artist_name, album_name: Metadata
- track_number, duration, release_year: Attributes
- musicbrainz: MBID
- isrc: ISRC
- deezer_id, deezer_url: Deezer identifiers
- youtube_url: YouTube Music link
- acousticbrainz: AcousticBrainz link (always None)
- spotify_id: Spotify ID (if available)
Log file: link_partitions.log in current directory.
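Once a batch run has produced the CSV, a quick coverage check shows how many tracks were actually linked. A minimal sketch using the standard `csv` module: the sample rows are made up, only a subset of the columns listed above is shown, and treating empty cells and the literal string `None` as missing is an assumption about how the file is written.

```python
import csv
import io
from typing import Dict, Iterable

# Made-up sample with a subset of the documented columns.
SAMPLE_CSV = """jams_file,track_name,artist_name,musicbrainz,deezer_id
a.jams,Creep,Radiohead,6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e,123456789
b.jams,Hey Jude,The Beatles,,
"""


def coverage(rows: Iterable[Dict[str, str]], column: str) -> float:
    """Fraction of rows with a non-missing value in the given column."""
    rows = list(rows)
    if not rows:
        return 0.0
    filled = sum(
        1 for row in rows if (row.get(column) or "").strip() not in ("", "None")
    )
    return filled / len(rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(coverage(rows, "musicbrainz"))  # 0.5
```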
#### JAMSProcessor API
```python
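from musicmetalinker.linking import Align
from musicmetalinker.preprocessor import JAMSProcessor

jams_file_path = "..."  # path to the input JAMS file
output_path = "..."     # path for the enriched JAMS file

processor = JAMSProcessor(jams_file_path)
metadata = processor.extract_metadata()
# Returns dict with artist, track, album, duration, etc.

align_instance = Align(artist="...", track="...")  # built from the extracted metadata
processor.enrich_jams(align_instance)
processor.write_jams(output_path)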
```
**extract_metadata():** Parses JAMS file and returns metadata dict.
**enrich_jams(align):** Takes Align instance and adds identifiers to JAMS structure.
**write_jams(path):** Writes enriched JAMS to file.
### Error Handling
No exceptions raised by public API. All errors silently suppressed.
**Pattern:**
- Service query fails: Returns None
- Network error: Returns None
- Invalid input: Returns None
- No match found: Returns None
**Implications:**
- No distinction between error types
- No error messages
- No logging of failures (except in batch mode)
- Caller cannot determine why None returned
**Debugging:**
- Enable logging to see internal errors
- Check link_partitions.log for batch processing errors
- Add print statements to source code
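Because every failure collapses to None, the most a caller can do without patching the library is record which getters came back empty. A minimal sketch: `FakeLinker` is a stand-in so the example runs without network access, and `missing_fields` is a hypothetical helper that works with any object exposing the getter names documented above.

```python
from typing import List, Sequence


class FakeLinker:
    """Stand-in for an Align instance; returns fixed values, no network calls."""

    def get_mbid(self):
        return "6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e"

    def get_isrc(self):
        return None

    def get_deezer_id(self):
        return 123456789


def missing_fields(
    linker, getters: Sequence[str] = ("get_mbid", "get_isrc", "get_deezer_id")
) -> List[str]:
    """Return the names of the getters that returned None."""
    return [name for name in getters if getattr(linker, name)() is None]


print(missing_fields(FakeLinker()))  # ['get_isrc']
```

This records what is missing, not why; the cause (network error, no match, bad input) remains unavailable.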
### Rate Limiting
No rate limiting implemented.
**Risks:**
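- MusicBrainz rate limits: 1 request/second recommended; not enforced by MusicMetaLinker itself (the musicbrainzngs client applies its own default limit of one request per second unless that has been disabled)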
- Deezer rate limits: Unknown, not enforced
- YouTube Music rate limits: Unknown, not enforced
**Batch processing:** No delays between requests. High risk of rate limiting or IP bans.
**Recommendation:** Add manual delays in batch processing loops.
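One way to add such delays is a small throttle that enforces a minimum interval between calls. A minimal sketch using only the standard library; `Throttle` is a hypothetical helper, and the injectable `clock` and `sleep` arguments exist only to make it testable without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay


throttle = Throttle(min_interval=1.0)
throttle.wait()  # first call returns immediately
```

In a batch loop, calling `throttle.wait()` before constructing each Align instance would space tracks at least one second apart. Since a single track can trigger several service queries, this bounds the per-track rate, not the per-request rate.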
### Caching
Results cached within Align instance lifetime. No cross-instance caching.
**Behavior:**
- First call to get_mbid() queries MusicBrainz
- Second call to get_mbid() returns cached value
- Creating new Align instance queries again
**No persistent cache:** No disk cache, no Redis, no memcached.
**Batch processing:** Each track creates new Align instance. No cache reuse across tracks.
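Cache reuse across tracks can be added outside the library by memoising on the input metadata. A minimal sketch: `LookupCache` and `fake_lookup` are hypothetical, and normalising the key by case and surrounding whitespace is an assumption about what counts as "the same track".

```python
from typing import Callable, Dict, Optional, Tuple


class LookupCache:
    """In-memory cache keyed on normalised (artist, track) pairs."""

    def __init__(self, lookup: Callable[[str, str], Optional[str]]):
        self._lookup = lookup
        self._store: Dict[Tuple[str, str], Optional[str]] = {}
        self.misses = 0

    def get(self, artist: str, track: str) -> Optional[str]:
        key = (artist.strip().casefold(), track.strip().casefold())
        if key not in self._store:
            self.misses += 1
            # None results are stored too, so failed lookups are not retried.
            self._store[key] = self._lookup(artist, track)
        return self._store[key]


def fake_lookup(artist: str, track: str) -> Optional[str]:
    """Stand-in for creating an Align instance and calling one getter."""
    return f"{artist}:{track}"


cache = LookupCache(fake_lookup)
cache.get("Radiohead", "Creep")
cache.get("radiohead ", "creep")  # served from the cache
print(cache.misses)  # 1
```

Caching None results avoids repeated failing queries but also hides transient network errors for the lifetime of the cache.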
### Thread Safety
Not thread-safe. No synchronization primitives.
**Unsafe operations:**
- Concurrent calls to same Align instance
- Concurrent batch processing of same directory
**Safe operations:**
- Multiple Align instances in separate threads (each queries independently)
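The safe pattern, one instance per task with nothing shared, can be sketched with a thread pool. `resolve_all` is a hypothetical helper and `fake_resolve` is a stand-in for code that would build a fresh Align instance per track; no library calls are made here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, Tuple

Track = Tuple[str, str]  # (artist, track title)


def resolve_all(
    tracks: Iterable[Track],
    resolve_one: Callable[[Track], Optional[str]],
    max_workers: int = 4,
) -> List[Optional[str]]:
    """Run resolve_one on each track in a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(resolve_one, tracks))


def fake_resolve(track: Track) -> Optional[str]:
    """Stand-in: real code would create its own Align instance here."""
    artist, title = track
    return f"{artist} / {title}"


print(resolve_all([("Radiohead", "Creep"), ("The Beatles", "Hey Jude")], fake_resolve))
```

Running lookups concurrently multiplies the request rate, so this should be combined with some form of throttling.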
### Authentication
**MusicBrainz:** No authentication. User-Agent header required ("elka/0.1" hardcoded).
**Deezer:** No authentication for search API.
**YouTube Music:** No authentication. Uses unofficial API.
**Spotify:** OAuth2 client credentials required. Configured in external mml_secrets.py file.
**Spotify usage:** Limited to ISRC extraction in Billboard dataset cleaning. Not used in main Align workflow.
### API Versioning
No API versioning. Library version 0.0.1 indicates pre-release.
**Breaking changes:** Possible in any release. No stability guarantees.
**Compatibility:** No backward compatibility promises.
### Dependencies for API Usage
Minimum dependencies for using Align class:
- musicbrainzngs
- deezer-python
- ytmusicapi
- requests
Optional dependencies:
- jams (for JAMS file support)
- pandas (for batch CSV output)
- spotipy (for Spotify integration)
### Performance Characteristics
**Query latency:**
- MusicBrainz: 100-500ms per query
- Deezer: 50-200ms per query
- YouTube Music: 100-300ms per query
**Total latency:** Sum of all service queries (sequential execution). Expect 250-1000ms per track.
**Batch processing:** Linear scaling. 1000 tracks = 1000x single track latency.
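As a worked example of the linear scaling, the per-track range above gives a rough wall-clock estimate for a batch. `batch_latency_seconds` is a hypothetical helper, and the estimate ignores retries, throttling, and any rate-limit back-off.

```python
from typing import Tuple


def batch_latency_seconds(
    n_tracks: int, per_track_ms: Tuple[float, float] = (250, 1000)
) -> Tuple[float, float]:
    """Return (low, high) total latency in seconds for n_tracks sequential lookups."""
    low_ms, high_ms = per_track_ms
    return n_tracks * low_ms / 1000, n_tracks * high_ms / 1000


print(batch_latency_seconds(1000))  # (250.0, 1000.0)
```

For 1000 tracks this is roughly 4 to 17 minutes of sequential querying.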
### API Limitations
1. **No bulk queries:** Each track requires separate Align instance
2. **No async support:** Synchronous only
3. **No streaming results:** All-or-nothing queries
4. **No partial updates:** Can't update single field
5. **No validation:** No input validation, no output validation
6. **No error details:** Only None on failure
7. **Dead integrations:** AcousticBrainz non-functional
8. **Weak YouTube matching:** First result assumed correct
### API Strengths
1. **Simple interface:** Single class, clear getters
2. **Flexible input:** Works with identifiers or metadata
3. **Cascading fallback:** Graceful degradation
4. **Lazy evaluation:** Only query when needed
5. **JAMS support:** Academic standard format
### API Design Recommendations
For production use:
1. **Add exceptions:** Raise specific errors instead of returning None
2. **Add validation:** Validate input parameters
3. **Add async API:** Async versions of all getters
4. **Add bulk API:** Process multiple tracks in single call
5. **Add configuration:** Runtime configuration for thresholds
6. **Add logging:** Structured logging with correlation IDs
7. **Add rate limiting:** Respect API limits
8. **Remove dead code:** Delete AcousticBrainz methods
9. **Add documentation:** Docstrings for all public methods
10. **Add type hints:** Full type annotations
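Until the library raises its own errors, the first recommendation can be approximated at the call site. A minimal sketch: `MetadataUnavailable` and `require` are hypothetical names, not part of MusicMetaLinker.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


class MetadataUnavailable(LookupError):
    """Raised when a getter yields None for a field the caller needs."""

    def __init__(self, field: str):
        super().__init__(f"no value available for {field!r}")
        self.field = field


def require(value: Optional[T], field: str) -> T:
    """Return value unchanged, or raise MetadataUnavailable if it is None."""
    if value is None:
        raise MetadataUnavailable(field)
    return value


print(require("GBAYE9200070", "isrc"))  # GBAYE9200070
```

A call such as `require(linker.get_mbid(), "mbid")` then fails loudly instead of propagating None. It still cannot say why the value is missing, because the library discards that information.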
The API surface is clean and simple. The implementation needs hardening.