# MusicMetaLinker API Reference

## API Type

MusicMetaLinker is exposed as a Python library only. No REST API, no GraphQL, no command-line interface for library functionality.

Batch processing has a CLI (link_partitions.py), but the core library is Python-only.

## Primary Interface: Align Class

### Constructor

```python
from musicmetalinker.linking import Align

linker = Align(
    mbid_track=None,
    mbid_release=None,
    artist=None,
    album=None,
    track=None,
    track_number=None,
    duration=None,
    isrc=None,
    strict=False
)
```

**Parameters:**

**mbid_track** (str, optional): MusicBrainz recording ID. If provided, MusicBrainz is queried first and treated as authoritative.

**mbid_release** (str, optional): MusicBrainz release ID. Used for album-level metadata.

**artist** (str, optional): Artist name. Used for metadata-based search when identifiers are unavailable.

**album** (str, optional): Album name. Used for filtering and matching.

**track** (str, optional): Track name. Primary search term for metadata-based queries.

**track_number** (int, optional): Track position on album. Used for filtering multiple matches.

**duration** (int or float, optional): Track duration in seconds. Critical for filtering: Deezer matching uses a ±3 second threshold.

**isrc** (str, optional): International Standard Recording Code. If provided, used for direct lookup on Deezer and MusicBrainz.

**strict** (bool, optional): Strict matching mode. Behavior is not fully documented; it likely affects fuzzy matching thresholds.

**Returns:** Align instance. No exceptions raised during construction. Queries execute lazily when getters are called.

**Usage patterns:**

Minimal input (metadata only):
```python
linker = Align(artist="Radiohead", track="Creep")
```

With identifiers (preferred):
```python
linker = Align(
    mbid_track="6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e",
    isrc="GBAYE9200070"
)
```

Full metadata for best matching:
```python
linker = Align(
    artist="The Beatles",
    track="Hey Jude",
    album="Hey Jude",
    duration=431,
    track_number=1
)
```

### Metadata Getter Methods

All getters return None if data unavailable. No exceptions raised.

#### get_artist()

```python
artist = linker.get_artist()
```

**Returns:** str or None. Artist name from best available source (MusicBrainz > Deezer > YouTube > input).

**Behavior:**
- If MBID available, returns MusicBrainz artist
- Falls back to Deezer artist if found
- Falls back to YouTube artist if found
- Returns input artist if no services matched
- Returns None if no artist information available

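The cascading fallback listed for get_artist() amounts to picking the first source that produced a value. The following is a minimal sketch of that idea, not the library's actual implementation; `first_available` and the sample values are hypothetical.

```python
from typing import Optional


def first_available(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is not None, or None if all are missing."""
    for value in candidates:
        if value is not None:
            return value
    return None


# Hypothetical per-source results, ordered MusicBrainz > Deezer > YouTube > input.
artist = first_available(None, "Radiohead", None, "radiohead")
print(artist)  # Radiohead
```

Ordering the arguments by source priority keeps the precedence rule in one visible place.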
#### get_album()

```python
album = linker.get_album()
```

**Returns:** str or None. Album/release name.

**Behavior:** Same cascading fallback as get_artist().

#### get_track()

```python
track = linker.get_track()
```

**Returns:** str or None. Track/recording name.

**Behavior:** Same cascading fallback as get_artist().

#### get_track_number()

```python
track_number = linker.get_track_number()
```

**Returns:** int or None. Track position on album.

**Behavior:**
- Returns MusicBrainz track number if available
- Falls back to input track_number
- Returns None if unavailable

#### get_duration()

```python
duration = linker.get_duration()
```

**Returns:** int, float, or None. Track duration in seconds.

**Behavior:**
- Returns MusicBrainz duration if available (milliseconds converted to seconds)
- Falls back to Deezer duration
- Falls back to input duration
- Returns None if unavailable

**Note:** MusicBrainz stores duration in milliseconds. The library converts to seconds for consistency.

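The unit conversion mentioned in the note is a division by 1000 that passes missing values through. A small sketch, assuming a hypothetical helper name `ms_to_seconds` (the library's own conversion code is not shown in this reference):

```python
from typing import Optional


def ms_to_seconds(length_ms: Optional[int]) -> Optional[float]:
    """Convert a millisecond length to seconds, passing None through unchanged."""
    if length_ms is None:
        return None
    return length_ms / 1000.0


print(ms_to_seconds(431000))  # 431.0
print(ms_to_seconds(None))    # None
```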
#### get_release_date()

```python
release_date = linker.get_release_date()
```

**Returns:** str or None. Release date in ISO format (YYYY-MM-DD) or year only (YYYY).

**Behavior:**
- Returns MusicBrainz release date if available
- Falls back to Deezer release date
- Returns None if unavailable

**Format inconsistency:** MusicBrainz may return full date, Deezer typically returns year only.

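Since the returned string may be either a full date or a bare year, callers that only need the year can normalize both shapes. This is a caller-side sketch under that assumption; `release_year` is a hypothetical helper, not part of the library.

```python
import re
from typing import Optional

# Accepts YYYY, YYYY-MM, or YYYY-MM-DD and captures the year.
_DATE_RE = re.compile(r"(\d{4})(?:-\d{2}(?:-\d{2})?)?")


def release_year(release_date: Optional[str]) -> Optional[int]:
    """Extract the year from a release date string, or return None if unparseable."""
    if release_date is None:
        return None
    match = _DATE_RE.fullmatch(release_date.strip())
    return int(match.group(1)) if match else None


print(release_year("1968-08-26"))  # 1968
print(release_year("1968"))        # 1968
```

Unrecognized strings map to None, which matches the library's convention of signalling missing data with None.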
#### get_isrc()

```python
isrc = linker.get_isrc()
```

**Returns:** str or None. International Standard Recording Code.

**Behavior:**
- Returns input ISRC if provided
- Extracts from MusicBrainz recording if available
- Extracts from Deezer result if available
- Returns None if unavailable

**Format:** Standard ISRC format (e.g., "GBAYE9200070"). No validation performed.

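Because the library performs no validation, a caller may want to check the shape of an ISRC before passing it in. The sketch below assumes the usual 12-character layout (2 letters, 3 alphanumerics, 7 digits) and tolerates hyphenated input; `normalize_isrc` is a hypothetical helper.

```python
import re
from typing import Optional

# Country code (2 letters) + registrant (3 alphanumerics) + year and designation (7 digits).
_ISRC_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")


def normalize_isrc(raw: Optional[str]) -> Optional[str]:
    """Return an upper-cased, hyphen-free ISRC, or None if the shape is wrong."""
    if raw is None:
        return None
    candidate = raw.replace("-", "").strip().upper()
    return candidate if _ISRC_RE.fullmatch(candidate) else None


print(normalize_isrc("gb-aye-92-00070"))  # GBAYE9200070
print(normalize_isrc("not-an-isrc"))      # None
```

This checks format only. It does not confirm that the code is actually registered.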
#### get_bpm()

```python
bpm = linker.get_bpm()
```

**Returns:** int, float, or None. Tempo in beats per minute.

**Behavior:**
- Returns Deezer BPM if available
- Returns None if unavailable

**Note:** MusicBrainz doesn't provide BPM in standard queries. Deezer is the only source.

### Identifier Getter Methods

#### get_mbid()

```python
mbid = linker.get_mbid()
```

**Returns:** str or None. MusicBrainz recording ID (UUID format).

**Behavior:**
- Returns input mbid_track if provided
- Queries MusicBrainz by ISRC if available
- Queries MusicBrainz by metadata if ISRC unavailable
- Returns None if no match found

**Format:** UUID string (e.g., "6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e").

#### get_deezer_id()

```python
deezer_id = linker.get_deezer_id()
```

**Returns:** int or None. Deezer track ID.

**Behavior:**
- Queries Deezer by ISRC if available
- Queries Deezer by metadata if ISRC unavailable
- Filters by duration (±3 seconds)
- Returns None if no match found

**Format:** Integer (e.g., 123456789).

#### get_deezer_link()

```python
deezer_link = linker.get_deezer_link()
```

**Returns:** str or None. Full Deezer track URL.

**Behavior:**
- Calls get_deezer_id() internally
- Constructs URL: f"https://www.deezer.com/track/{deezer_id}"
- Returns None if no Deezer ID available

**Format:** Full URL (e.g., "https://www.deezer.com/track/123456789").

#### get_youtube_link()

```python
youtube_link = linker.get_youtube_link()
```

**Returns:** str or None. YouTube Music track URL.

**Behavior:**
- Queries YouTube Music by metadata (artist, track, album)
- Returns first result (no sophisticated ranking)
- Returns None if no results

**Format:** Full YouTube URL (e.g., "https://www.youtube.com/watch?v=dQw4w9WgXcQ").

**Warning:** YouTube matching is weak. First result assumed correct. No duration filtering.

#### get_acousticbrainz_link()

```python
acousticbrainz_link = linker.get_acousticbrainz_link()
```

**Returns:** str or None. AcousticBrainz URL.

**Behavior:**
- Requires MBID (calls get_mbid() internally)
- Checks if https://acousticbrainz.org/{mbid} returns HTTP 200
- Returns URL if exists, None otherwise

**Critical issue:** AcousticBrainz shut down in 2022. This method always returns None. Dead code.

### Internal Service Methods

Not part of public API but exposed in service classes.

#### MusicBrainzAlign Methods

**get_recording(mbid):** Direct MusicBrainz recording lookup by MBID.

**get_best_match(artist, track, album, duration):** Search MusicBrainz by metadata with filtering.

**get_iswc():** Retrieve International Standard Musical Work Code.

**Implementation details:**

```python
from musicmetalinker.linking import MusicBrainzAlign

mbid = "..."  # MusicBrainz recording ID
mb = MusicBrainzAlign(mbid=mbid)
recording = mb.get_recording(mbid)
# Returns dict with artist, album, track, duration, isrcs, etc.
```

Not intended for direct use. Align class wraps these methods.

#### DeezerAlign Methods

**best_match(artist, track, album, duration, duration_threshold=3):** Search Deezer with duration filtering.

**get_rank():** Retrieve Deezer popularity rank.

**Implementation details:**

```python
from musicmetalinker.linking import DeezerAlign

# Placeholder search terms; duration is in seconds
artist, track, album, duration = "...", "...", "...", 123
deezer = DeezerAlign(artist=artist, track=track, album=album, duration=duration)
match = deezer.best_match(artist, track, album, duration)
# Returns Deezer track object or None
```

Duration threshold defaults to 3 seconds. Adjustable for stricter/looser matching.

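The duration threshold can be pictured as a simple tolerance filter over candidate results. This is a sketch of the idea only, assuming candidates are plain dicts with a `duration` key in seconds; `filter_by_duration` and the sample data are hypothetical and do not reflect DeezerAlign internals.

```python
from typing import Dict, List


def filter_by_duration(
    candidates: List[Dict], target: float, threshold: float = 3.0
) -> List[Dict]:
    """Keep candidates whose duration is within +/- threshold seconds of target."""
    return [c for c in candidates if abs(c["duration"] - target) <= threshold]


# Hypothetical search results for a 431-second track.
results = [
    {"id": 1, "duration": 428},  # 3 s off: kept
    {"id": 2, "duration": 431},  # exact: kept
    {"id": 3, "duration": 435},  # 4 s off: dropped
]
kept = filter_by_duration(results, target=431)
print([c["id"] for c in kept])  # [1, 2]
```

A tighter threshold reduces false positives from remasters and live versions at the cost of rejecting legitimate matches with slightly different fade-outs.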
#### YouTubeAlign Methods

**get_best_match(artist, track, album):** Search YouTube Music.

**get_youtube_id():** Extract video ID from search results.

**Implementation details:**

```python
from musicmetalinker.linking import YouTubeAlign

artist, track, album = "...", "...", "..."  # placeholder search terms
yt = YouTubeAlign(artist=artist, track=track, album=album)
match = yt.get_best_match(artist, track, album)
# Returns YouTube Music result dict or None
```

No duration parameter. No filtering. First result returned.

### Batch Processing API

#### link_partitions.py CLI

```bash
python link_partitions.py <directory> [options]
```

**Arguments:**

**directory** (positional): Path to directory containing JAMS files.

**Options:**

**--save:** Write enriched JAMS files back to disk. Without this flag, only CSV output generated.

**--limit audio:** Only process JAMS files with audio content. Skip annotation-only files.

**--overwrite:** Overwrite existing enriched JAMS files. Without this flag, existing files skipped.

**Output:**

CSV file with columns:
- jams_file: Original JAMS filename
- track_name, artist_name, album_name: Metadata
- track_number, duration, release_year: Attributes
- musicbrainz: MBID
- isrc: ISRC
- deezer_id, deezer_url: Deezer identifiers
- youtube_url: YouTube Music link
- acousticbrainz: AcousticBrainz link (always None)
- spotify_id: Spotify ID (if available)

Log file: link_partitions.log in current directory.

#### JAMSProcessor API

```python
from musicmetalinker.preprocessor import JAMSProcessor

# jams_file_path, align_instance, and output_path are supplied by the caller
processor = JAMSProcessor(jams_file_path)
metadata = processor.extract_metadata()
# Returns dict with artist, track, album, duration, etc.

processor.enrich_jams(align_instance)
processor.write_jams(output_path)
```

**extract_metadata():** Parses JAMS file and returns metadata dict.

**enrich_jams(align):** Takes Align instance and adds identifiers to JAMS structure.

**write_jams(path):** Writes enriched JAMS to file.

### Error Handling

No exceptions raised by public API. All errors silently suppressed.

**Pattern:**
- Service query fails: Returns None
- Network error: Returns None
- Invalid input: Returns None
- No match found: Returns None

**Implications:**
- No distinction between error types
- No error messages
- No logging of failures (except in batch mode)
- Caller cannot determine why None returned

**Debugging:**
- Enable logging to see internal errors
- Check link_partitions.log for batch processing errors
- Add print statements to source code

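Because None is the only failure signal, a caller can at least record which fields came back empty for each track. The following is a caller-side sketch under that assumption; `collect_fields` is a hypothetical helper, and the lambdas stand in for bound getters such as `linker.get_mbid`.

```python
from typing import Any, Callable, Dict, List, Tuple


def collect_fields(
    getters: Dict[str, Callable[[], Any]]
) -> Tuple[Dict[str, Any], List[str]]:
    """Call each getter once; split results into found values and missing names."""
    found: Dict[str, Any] = {}
    missing: List[str] = []
    for name, getter in getters.items():
        value = getter()
        if value is None:
            missing.append(name)
        else:
            found[name] = value
    return found, missing


# Lambdas stand in for bound methods such as linker.get_mbid (hypothetical results).
found, missing = collect_fields({
    "mbid": lambda: None,
    "isrc": lambda: "GBAYE9200070",
    "bpm": lambda: None,
})
print(found)    # {'isrc': 'GBAYE9200070'}
print(missing)  # ['mbid', 'bpm']
```

This does not recover the cause of a failure, but it makes gaps visible per track instead of leaving them buried in the output.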
### Rate Limiting

No rate limiting implemented.

**Risks:**
- MusicBrainz rate limits: 1 request/second recommended, not enforced by the library
- Deezer rate limits: Unknown, not enforced by the library
- YouTube Music rate limits: Unknown, not enforced by the library

**Batch processing:** No delays between requests. High risk of rate limiting or IP bans.

**Recommendation:** Add manual delays in batch processing loops.

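One way to add the recommended delay is a small throttle that enforces a minimum interval between calls. This is a minimal sketch, not part of MusicMetaLinker; the `Throttle` class is hypothetical, and the short interval below is only so the demonstration finishes quickly (about 1.0 second would match the MusicBrainz recommendation).

```python
import time
from typing import Optional


class Throttle:
    """Block so that consecutive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last_call: Optional[float] = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


throttle = Throttle(min_interval=0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # in a real loop, construct Align and call getters after this line
elapsed = time.monotonic() - start
print(f"3 iterations took {elapsed:.2f}s")
```

Calling `wait()` once per track, before the first getter, spaces out tracks but not the individual service queries made within one track.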
### Caching

Results cached within Align instance lifetime. No cross-instance caching.

**Behavior:**
- First call to get_mbid() queries MusicBrainz
- Second call to get_mbid() returns cached value
- Creating new Align instance queries again

**No persistent cache:** No disk cache, no Redis, no memcached.

**Batch processing:** Each track creates new Align instance. No cache reuse across tracks.

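Callers who need reuse across instances can memoize lookups themselves. The sketch below uses `functools.lru_cache` around a stand-in function; `lookup_mbid` is hypothetical and takes the place of constructing an Align instance and calling `get_mbid()`.

```python
from functools import lru_cache
from typing import Optional

CALLS = {"count": 0}  # counts how often the underlying lookup actually runs


def lookup_mbid(artist: str, track: str) -> Optional[str]:
    """Hypothetical stand-in for Align(artist=..., track=...).get_mbid()."""
    CALLS["count"] += 1
    return f"mbid-for-{artist}-{track}"


@lru_cache(maxsize=4096)
def cached_mbid(artist: str, track: str) -> Optional[str]:
    """In-process cache keyed by the (artist, track) pair."""
    return lookup_mbid(artist, track)


first = cached_mbid("Radiohead", "Creep")
second = cached_mbid("Radiohead", "Creep")  # served from the cache
print(first == second, CALLS["count"])  # True 1
```

Note that `lru_cache` also stores None results, so a transient network failure would be remembered for the life of the process unless the cache is cleared with `cached_mbid.cache_clear()`.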
### Thread Safety

Not thread-safe. No synchronization primitives.

**Unsafe operations:**
- Concurrent calls to same Align instance
- Concurrent batch processing of same directory

**Safe operations:**
- Multiple Align instances in separate threads (each queries independently)

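The safe pattern is to build one instance per task and never share it between threads. The sketch below illustrates this with a thread pool; `StubLinker` is a hypothetical stand-in for Align so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional


class StubLinker:
    """Hypothetical stand-in for Align; holds per-instance state only."""

    def __init__(self, artist: str, track: str) -> None:
        self.artist = artist
        self.track = track

    def get_track(self) -> Optional[str]:
        return self.track


def link_one(item: Dict[str, str]) -> Optional[str]:
    linker = StubLinker(**item)  # fresh instance per task, never shared
    return linker.get_track()


tracks: List[Dict[str, str]] = [
    {"artist": "Radiohead", "track": "Creep"},
    {"artist": "The Beatles", "track": "Hey Jude"},
]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(link_one, tracks))  # map preserves input order
print(results)  # ['Creep', 'Hey Jude']
```

Running lookups in parallel multiplies the request rate against each service, so the rate limiting risks apply even more strongly here.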
### Authentication

**MusicBrainz:** No authentication. User-Agent header required ("elka/0.1" hardcoded).

**Deezer:** No authentication for search API.

**YouTube Music:** No authentication. Uses unofficial API.

**Spotify:** OAuth2 client credentials required. Configured in external mml_secrets.py file.

**Spotify usage:** Limited to ISRC extraction in Billboard dataset cleaning. Not used in main Align workflow.

### API Versioning

No API versioning. Library version 0.0.1 indicates pre-release.

**Breaking changes:** Possible in any release. No stability guarantees.

**Compatibility:** No backward compatibility promises.

### Dependencies for API Usage

Minimum dependencies for using Align class:
- musicbrainzngs
- deezer-python
- ytmusicapi
- requests

Optional dependencies:
- jams (for JAMS file support)
- pandas (for batch CSV output)
- spotipy (for Spotify integration)

### Performance Characteristics

**Query latency:**
- MusicBrainz: 100-500ms per query
- Deezer: 50-200ms per query
- YouTube Music: 100-300ms per query

**Total latency:** Sum of all service queries (sequential execution). Expect 250-1000ms per track.

**Batch processing:** Linear scaling. 1000 tracks = 1000x single track latency.

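The per-track and batch figures follow directly from summing the per-service ranges. A short worked calculation, assuming one query per service per track and no added delays (`batch_latency_seconds` is a hypothetical helper):

```python
from typing import Tuple

# Per-service latency ranges in milliseconds, taken from the figures above.
LATENCY_MS = {
    "MusicBrainz": (100, 500),
    "Deezer": (50, 200),
    "YouTube Music": (100, 300),
}


def batch_latency_seconds(n_tracks: int) -> Tuple[float, float]:
    """Return (best case, worst case) total seconds for sequential processing."""
    low_ms = sum(low for low, _ in LATENCY_MS.values())     # 250 ms per track
    high_ms = sum(high for _, high in LATENCY_MS.values())  # 1000 ms per track
    return n_tracks * low_ms / 1000, n_tracks * high_ms / 1000


print(batch_latency_seconds(1000))  # (250.0, 1000.0)
```

So a 1000-track batch takes roughly 4 to 17 minutes before any manual rate-limiting delay is added.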
### API Limitations

1. **No bulk queries:** Each track requires separate Align instance
2. **No async support:** Synchronous only
3. **No streaming results:** All-or-nothing queries
4. **No partial updates:** Can't update single field
5. **No validation:** No input validation, no output validation
6. **No error details:** Only None on failure
7. **Dead integrations:** AcousticBrainz non-functional
8. **Weak YouTube matching:** First result assumed correct

### API Strengths

1. **Simple interface:** Single class, clear getters
2. **Flexible input:** Works with identifiers or metadata
3. **Cascading fallback:** Graceful degradation
4. **Lazy evaluation:** Only query when needed
5. **JAMS support:** Academic standard format

### API Design Recommendations

For production use:

1. **Add exceptions:** Raise specific errors instead of returning None
2. **Add validation:** Validate input parameters
3. **Add async API:** Async versions of all getters
4. **Add bulk API:** Process multiple tracks in single call
5. **Add configuration:** Runtime configuration for thresholds
6. **Add logging:** Structured logging with correlation IDs
7. **Add rate limiting:** Respect API limits
8. **Remove dead code:** Delete AcousticBrainz methods
9. **Add documentation:** Docstrings for all public methods
10. **Add type hints:** Full type annotations

The API surface is clean and simple. The implementation needs hardening.