feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
Alexander
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
@@ -0,0 +1,714 @@
# minim: Architecture
## Architectural Pattern
minim follows a **library architecture**, not a client-server or microservices pattern. There is no daemon, no HTTP server, no background processes. The library runs entirely within the caller's Python process.
**Invocation Model:**
```python
from minim import spotify, tidal, qobuz
from minim.audio import Audio
# Instantiate API client
client = spotify.WebAPI(client_id="...", client_secret="...")
# Make API calls
results = client.search("Radiohead", types=["track"])
# Process audio files
audio = Audio("track.flac")
audio.set_metadata_using_spotify(results["tracks"]["items"][0])
audio.write_metadata()
```
All operations are synchronous, blocking calls. No event loop, no async/await in v1.
## Module Organization
The codebase is organized into eight top-level modules:
```
minim/
├── __init__.py # Package initialization, version info
├── audio.py # Audio file handling, metadata I/O
├── discogs.py # Discogs API client
├── itunes.py # iTunes Search API client
├── qobuz.py # Qobuz API client
├── spotify.py # Spotify Web API + private lyrics
├── tidal.py # TIDAL public + private API
└── utility.py # Shared utilities
```
**No Subpackages:** All modules are at the top level. No hierarchical organization despite 35K+ lines of code.
**Module Independence:** Each API client module is self-contained. No cross-dependencies between `spotify.py`, `tidal.py`, etc. They share only `utility.py` and standard library imports.
## Class Hierarchy
### Audio Module
```
Audio (base class)
├── FLAC
├── MP3
├── MP4
├── OggVorbis
└── WAVE
```
**Factory Pattern:** `Audio(filepath)` auto-detects format and returns appropriate subclass instance.
**Detection Logic:**
1. Check file extension (`.flac`, `.mp3`, `.m4a`, `.ogg`, `.wav`)
2. If ambiguous, read magic bytes from file header
3. Instantiate corresponding subclass
4. Raise `ValueError` if format unsupported
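The detection steps above can be sketched as follows. This is a minimal illustration, not minim's actual code: `detect_format`, `sniff_magic`, and the `EXTENSIONS` table are hypothetical names, and an unknown extension is treated as the "ambiguous" case. MP3 files without an ID3 header are not recognized by this simplified check.

```python
from pathlib import Path

# Extension-to-format table taken from the list above (hypothetical helper).
EXTENSIONS = {".flac": "flac", ".mp3": "mp3", ".m4a": "mp4", ".ogg": "ogg", ".wav": "wav"}


def sniff_magic(header: bytes):
    """Guess a format from the first bytes of a file, or return None."""
    if header.startswith(b"fLaC"):
        return "flac"
    if header.startswith(b"ID3"):  # only MP3 files that begin with an ID3v2 tag
        return "mp3"
    if header.startswith(b"OggS"):
        return "ogg"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[4:8] == b"ftyp":  # ISO base media container (MP4/M4A)
        return "mp4"
    return None


def detect_format(filepath: str) -> str:
    """Extension first, magic bytes as fallback, ValueError if both fail."""
    fmt = EXTENSIONS.get(Path(filepath).suffix.lower())
    if fmt is None:
        with open(filepath, "rb") as f:
            fmt = sniff_magic(f.read(12))
    if fmt is None:
        raise ValueError(f"Unsupported audio format: {filepath}")
    return fmt
```

A factory could then look up the subclass by the returned format name.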
**Shared Interface:** All subclasses implement:
- `read_metadata()`: Parse tags from file
- `write_metadata()`: Write tags to file
- `convert(output_path, format)`: Transcode via FFmpeg
- `set_metadata_using_{service}(data)`: Map service JSON to tags
### API Client Classes
Each service module defines one or more API client classes:
**discogs.py:**
- `API`: Main Discogs API client (database, marketplace, collection, wantlist)
**itunes.py:**
- `SearchAPI`: iTunes Search API client
**qobuz.py:**
- `PrivateAPI`: Qobuz API client (uses undocumented endpoints)
**spotify.py:**
- `WebAPI`: Official Spotify Web API client
- `PrivateLyricsService`: Undocumented Musixmatch integration for lyrics
**tidal.py:**
- `API`: Public TIDAL API (documented endpoints)
- `PrivateAPI`: Private TIDAL API (undocumented endpoints for streaming URLs, lyrics, credits)
**Naming Convention:** "Private" indicates use of undocumented endpoints. These are reverse-engineered from web/mobile apps and may break without notice.
## Authentication Flow
All API clients follow a consistent initialization and authentication pattern:
### 1. Initialization (`__init__`)
```python
def __init__(self, client_id=None, client_secret=None, access_token=None, ...):
    # Check environment variables
    self.client_id = client_id or os.getenv("SERVICE_CLIENT_ID")
    self.client_secret = client_secret or os.getenv("SERVICE_CLIENT_SECRET")
    # Load from config file (tokens stay None if the section is missing)
    self.access_token = None
    self.refresh_token = None
    config = ConfigParser()
    config.read(os.path.expanduser("~/minim.cfg"))
    if config.has_section("service"):
        self.access_token = config.get("service", "access_token", fallback=None)
        self.refresh_token = config.get("service", "refresh_token", fallback=None)
    # Use provided tokens if available
    if access_token:
        self.access_token = access_token
```
**Precedence:** Explicit parameters > environment variables > config file
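The precedence rule can be expressed as a single lookup helper. The sketch below is illustrative only: `resolve_setting` is a hypothetical function, not part of minim, and the environment variable name is supplied by the caller.

```python
import os
from configparser import ConfigParser


def resolve_setting(explicit, env_var, config, section, option):
    """Return the first available value: explicit > environment > config file."""
    if explicit is not None:
        return explicit
    env_value = os.getenv(env_var)
    if env_value is not None:
        return env_value
    # fallback=None covers both a missing section and a missing option
    return config.get(section, option, fallback=None)
```

For example, `resolve_setting(client_id, "SPOTIFY_CLIENT_ID", config, "spotify", "client_id")` returns `None` only when all three sources are empty.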
### 2. Flow Selection (`set_flow`)
```python
def set_flow(self, flow_type="authorization_code", redirect_uri="http://localhost:8888", ...):
    self.flow_type = flow_type
    self.redirect_uri = redirect_uri
    self.scopes = scopes
```
**Supported Flows (Spotify example):**
- `authorization_code`: Full user access, requires user login
- `pkce`: Proof Key for Code Exchange, for mobile/desktop apps
- `client_credentials`: App-only access, no user context
- `web_player`: Extract token from browser cookie (undocumented)
### 3. Token Acquisition (`set_access_token`)
```python
def set_access_token(self, method="http.server"):
    if self.flow_type == "authorization_code":
        # Generate authorization URL
        auth_url = self._build_auth_url()
        # Open browser or print URL
        webbrowser.open(auth_url)
        # Start callback server
        if method == "http.server":
            code = self._listen_http_server()
        elif method == "flask":
            code = self._listen_flask()
        elif method == "playwright":
            code = self._automate_browser()
        # Exchange code for token
        token_response = self._exchange_code(code)
        self.access_token = token_response["access_token"]
        self.refresh_token = token_response.get("refresh_token")
        # Save to config
        self._save_config()
```
**Callback Methods:**
**http.server (default):**
```python
def _listen_http_server(self):
    server = HTTPServer(("localhost", 8888), CallbackHandler)
    server.handle_request()  # Block until callback received
    return server.authorization_code
```
**Flask:**
```python
def _listen_flask(self):
    app = Flask(__name__)

    @app.route("/callback")
    def callback():
        code = request.args.get("code")
        # Store code and shutdown
        return "Authorization successful"

    app.run(port=8888)
```
**Playwright:**
```python
def _automate_browser(self):
    from playwright.sync_api import sync_playwright
    # Build the authorization URL locally; it is not passed in as an argument
    auth_url = self._build_auth_url()
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Navigate to auth URL
        page.goto(auth_url)
        # Fill login form (service-specific selectors)
        page.fill("#username", self.email)
        page.fill("#password", self.password)
        page.click("button[type=submit]")
        # Wait for redirect
        page.wait_for_url(f"{self.redirect_uri}*")
        # Extract code from URL
        code = parse_qs(urlparse(page.url).query)["code"][0]
        browser.close()
        return code
```
### 4. Token Persistence
```python
def _save_config(self):
    config = ConfigParser()
    config.read(os.path.expanduser("~/minim.cfg"))
    if not config.has_section("service"):
        config.add_section("service")
    config.set("service", "access_token", self.access_token)
    if self.refresh_token:
        config.set("service", "refresh_token", self.refresh_token)
    with open(os.path.expanduser("~/minim.cfg"), "w") as f:
        config.write(f)
```
**File Format (INI):**
```ini
[spotify]
client_id = abc123
client_secret = def456
access_token = BQC...
refresh_token = AQD...
expires_at = 1672531200
[tidal]
client_id = xyz789
access_token = eyJ...
refresh_token = eyJ...
```
**Security:** Tokens are stored in plain text. The file is created with default permissions (typically 0644 on Unix, which lets other local users read it). No encryption, no OS keychain integration.
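A caller who wants tighter permissions could write the file with an owner-only mode. The following is a sketch under that assumption; `save_config_private` is a hypothetical helper and not something minim provides, and the permission bits only have this meaning on POSIX systems.

```python
import os
import stat
from configparser import ConfigParser


def save_config_private(config: ConfigParser, path: str) -> None:
    """Write config so that only the owning user can read or write it."""
    # The mode argument (0600) only takes effect when the file is newly created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        config.write(f)
    # Tighten permissions in case the file already existed with a looser mode.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

This narrows who can read the tokens but still leaves them unencrypted on disk.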
### 5. Token Refresh
```python
def _request(self, method, url, **kwargs):
    # Check if token expired
    if self.expires_at and time.time() >= self.expires_at:
        self._refresh_access_token()
    # Make request with current token
    response = requests.request(
        method, url,
        headers=self._get_headers(),
        **kwargs
    )
    # Handle 401 Unauthorized (token invalid)
    if response.status_code == 401:
        self._refresh_access_token()
        # Retry request
        response = requests.request(method, url, headers=self._get_headers(), **kwargs)
    return response

def _refresh_access_token(self):
    response = requests.post(
        self.token_url,
        data={
            "grant_type": "refresh_token",
            "refresh_token": self.refresh_token,
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }
    )
    token_data = response.json()
    self.access_token = token_data["access_token"]
    self.expires_at = time.time() + token_data["expires_in"]
    # Update refresh token if provided
    if "refresh_token" in token_data:
        self.refresh_token = token_data["refresh_token"]
    self._save_config()
```
**Automatic Refresh:** Transparent to caller. If a request fails with 401, the client refreshes the token and retries automatically.
## Request Handling
All API clients implement a common `_request()` method:
```python
def _request(self, method: str, url: str, **kwargs) -> dict:
    """
    Make HTTP request with authentication.

    Args:
        method: HTTP method (GET, POST, PUT, DELETE)
        url: Full URL or path (prepended with base_url if relative)
        **kwargs: Passed to requests.request()

    Returns:
        JSON response as dict

    Raises:
        RuntimeError: If response status is not 2xx
    """
    # Prepend base URL if path is relative
    if not url.startswith("http"):
        url = self.base_url + url
    # Add authentication headers
    headers = kwargs.pop("headers", {})
    headers.update(self._get_headers())
    # Make request
    response = requests.request(method, url, headers=headers, **kwargs)
    # Check status
    if not response.ok:
        raise RuntimeError(
            f"{method} {url} failed: {response.status_code} {response.text}"
        )
    # Parse JSON
    return response.json()
```
**Header Injection:** Each service implements `_get_headers()`:
**Spotify (Bearer token):**
```python
def _get_headers(self):
    return {"Authorization": f"Bearer {self.access_token}"}
```
**Discogs (OAuth 1.0a signature):**
```python
def _get_headers(self):
    oauth = OAuth1(
        self.consumer_key,
        client_secret=self.consumer_secret,
        resource_owner_key=self.access_token,
        resource_owner_secret=self.access_token_secret
    )
    # OAuth1 is a requests auth object, not a header dict: it must be passed
    # to requests via auth=..., where requests-oauthlib signs each request
    return oauth
```
**Qobuz (X-App-Id header + Bearer token):**
```python
def _get_headers(self):
    return {
        "X-App-Id": self.app_id,
        "Authorization": f"Bearer {self.access_token}"
    }
```
**Error Handling:** All HTTP errors raise `RuntimeError` with status code and response body. No typed exceptions, no retry logic, no exponential backoff.
**Rate Limiting:** Not implemented. Caller responsible for respecting service rate limits.
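Since neither retries nor rate limiting are built in, a caller can wrap individual calls. The sketch below shows one way to add exponential backoff around any callable that raises `RuntimeError`; `call_with_backoff` and its parameters are hypothetical and not part of minim.

```python
import time


def call_with_backoff(func, *, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on RuntimeError wait base_delay * 2**attempt, then retry.

    The sleep function is injectable so the helper can be tested without waiting.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except RuntimeError:
            if attempt == retries:
                raise  # retries exhausted, surface the last error
            sleep(base_delay * 2 ** attempt)
```

Usage would look like `call_with_backoff(lambda: client.search("Radiohead"))`. Because every HTTP failure is a `RuntimeError`, this also retries errors that will never succeed (such as a 404), which is a limitation of the untyped error model.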
## Metadata Mapping Architecture
The `Audio` class provides service-specific metadata setters that normalize API responses to a common schema:
```python
class Audio:
    def set_metadata_using_spotify(self, track_data: dict):
        """Map Spotify track object to audio metadata."""
        self.title = track_data["name"]
        self.artist = ", ".join(a["name"] for a in track_data["artists"])
        self.album = track_data["album"]["name"]
        self.date = track_data["album"]["release_date"]
        self.track_number = track_data["track_number"]
        self.disc_number = track_data["disc_number"]
        self.isrc = track_data.get("external_ids", {}).get("isrc")
        # Fetch artwork
        if track_data["album"]["images"]:
            artwork_url = track_data["album"]["images"][0]["url"]
            self.artwork = requests.get(artwork_url).content

    def set_metadata_using_tidal(self, track_data: dict):
        """Map TIDAL track object to audio metadata."""
        self.title = track_data["title"]
        self.artist = ", ".join(a["name"] for a in track_data["artists"])
        self.album = track_data["album"]["title"]
        self.date = track_data["streamStartDate"][:10]  # ISO date to YYYY-MM-DD
        self.track_number = track_data["trackNumber"]
        self.disc_number = track_data["volumeNumber"]
        self.isrc = track_data.get("isrc")
        # Fetch artwork (construct URL from cover ID)
        if track_data["album"]["cover"]:
            cover_id = track_data["album"]["cover"].replace("-", "/")
            artwork_url = f"https://resources.tidal.com/images/{cover_id}/1280x1280.jpg"
            self.artwork = requests.get(artwork_url).content
```
**Normalization Challenges:**
1. **Artist Representation:**
- Spotify: Array of objects `[{"name": "Artist"}]`
- TIDAL: Array of objects `[{"name": "Artist"}]`
- iTunes: String `"Artist"`
- Qobuz: Object `{"name": "Artist"}` (single artist)
2. **Date Formats:**
- Spotify: ISO 8601 `"2023-01-15"` or year-only `"2023"`
- TIDAL: ISO 8601 with time `"2023-01-15T00:00:00.000Z"`
- iTunes: ISO 8601 `"2023-01-15T00:00:00Z"`
- Qobuz: Unix timestamp or ISO 8601
3. **Artwork URLs:**
- Spotify: Array of images with different sizes `[{"url": "...", "width": 640, "height": 640}]`
- TIDAL: Cover ID requiring URL construction
- iTunes: Direct URL `"artworkUrl100"`, `"artworkUrl600"`
- Qobuz: Direct URL with size parameter
4. **Track/Disc Numbers:**
- Spotify: Separate `track_number` and `disc_number` fields
- TIDAL: `trackNumber` and `volumeNumber`
- iTunes: Separate integer fields `trackNumber` and `trackCount` (e.g., track 3 of 12), plus `discNumber` and `discCount`
- Qobuz: Separate `track_number` and `media_number`
**Mapping Strategy:** Each `set_metadata_using_*()` method handles service-specific quirks and normalizes to the `Audio` class's internal representation.
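As an illustration of one such quirk, the date formats listed above could be reduced to a common string form roughly as follows. This is a sketch, not minim's implementation: `normalize_date` is a hypothetical helper, Unix timestamps are assumed to be in seconds and are interpreted as UTC, and year-month values are accepted as an extra case beyond the formats listed above.

```python
from datetime import datetime, timezone


def normalize_date(value) -> str:
    """Reduce a service-specific date to YYYY-MM-DD, YYYY-MM, or YYYY."""
    if isinstance(value, (int, float)):
        # Unix timestamp in seconds (assumed), interpreted as UTC
        return datetime.fromtimestamp(value, tz=timezone.utc).strftime("%Y-%m-%d")
    # Drop any time component such as T00:00:00.000Z
    text = str(value).strip()[:10]
    for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
        try:
            datetime.strptime(text, fmt)  # validation only
            return text
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {value!r}")
```

Keeping reduced-precision dates as-is avoids inventing a month or day that the service never supplied.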
## Audio File I/O Architecture
The `Audio` class uses `mutagen` for reading and writing metadata:
```python
class Audio:
    def __init__(self, filepath: str):
        self.filepath = filepath
        self._file = mutagen.File(filepath)
        if isinstance(self._file, mutagen.flac.FLAC):
            self.__class__ = FLAC
        elif isinstance(self._file, mutagen.mp3.MP3):
            self.__class__ = MP3
        elif isinstance(self._file, mutagen.mp4.MP4):
            self.__class__ = MP4
        # ... etc

    def write_metadata(self):
        """Write metadata to file. Implemented by subclasses."""
        raise NotImplementedError

class FLAC(Audio):
    def write_metadata(self):
        """Write Vorbis Comments to FLAC file."""
        self._file["TITLE"] = self.title
        self._file["ARTIST"] = self.artist
        self._file["ALBUM"] = self.album
        self._file["DATE"] = self.date
        self._file["TRACKNUMBER"] = str(self.track_number)
        self._file["DISCNUMBER"] = str(self.disc_number)
        if self.artwork:
            picture = mutagen.flac.Picture()
            picture.data = self.artwork
            picture.type = 3  # Front cover
            picture.mime = "image/jpeg"
            self._file.add_picture(picture)
        self._file.save()

class MP3(Audio):
    def write_metadata(self):
        """Write ID3v2 tags to MP3 file."""
        from mutagen.id3 import TIT2, TPE1, TALB, TDRC, TRCK, TPOS, APIC
        self._file["TIT2"] = TIT2(encoding=3, text=self.title)
        self._file["TPE1"] = TPE1(encoding=3, text=self.artist)
        self._file["TALB"] = TALB(encoding=3, text=self.album)
        self._file["TDRC"] = TDRC(encoding=3, text=self.date)
        self._file["TRCK"] = TRCK(encoding=3, text=str(self.track_number))
        self._file["TPOS"] = TPOS(encoding=3, text=str(self.disc_number))
        if self.artwork:
            self._file["APIC"] = APIC(
                encoding=3,
                mime="image/jpeg",
                type=3,  # Front cover
                desc="Cover",
                data=self.artwork
            )
        self._file.save()
```
**Tag Format Mapping:**
| Field | FLAC (Vorbis) | MP3 (ID3v2) | MP4 (Atoms) |
|-------|---------------|-------------|-------------|
| Title | `TITLE` | `TIT2` | `\xa9nam` |
| Artist | `ARTIST` | `TPE1` | `\xa9ART` |
| Album | `ALBUM` | `TALB` | `\xa9alb` |
| Date | `DATE` | `TDRC` | `\xa9day` |
| Track # | `TRACKNUMBER` | `TRCK` | `trkn` |
| Disc # | `DISCNUMBER` | `TPOS` | `disk` |
| Artwork | `METADATA_BLOCK_PICTURE` | `APIC` | `covr` |
**Format Conversion:**
```python
def convert(self, output_path: str, format: str, **ffmpeg_options):
    """Convert audio file to different format using FFmpeg."""
    import subprocess
    cmd = [
        "ffmpeg",
        "-i", self.filepath,
        "-c:a", self._get_codec(format),
        # Unpack the extra arguments into the list with *, not **
        *self._build_ffmpeg_args(ffmpeg_options),
        output_path
    ]
    subprocess.run(cmd, check=True)
    # Copy metadata to converted file
    converted = Audio(output_path)
    converted.title = self.title
    converted.artist = self.artist
    # ... copy all fields
    converted.write_metadata()

def _get_codec(self, format: str) -> str:
    """Map format to FFmpeg codec."""
    codecs = {
        "flac": "flac",
        "mp3": "libmp3lame",
        "m4a": "aac",
        "ogg": "libvorbis",
        "wav": "pcm_s16le"
    }
    return codecs.get(format, format)
```
## Configuration Architecture
**File Location:** `~/minim.cfg` (expands to user's home directory)
**Format:** INI-style via Python's `ConfigParser`
**Structure:**
```ini
[discogs]
consumer_key = ...
consumer_secret = ...
access_token = ...
access_token_secret = ...
[qobuz]
app_id = ...
app_secret = ...
email = user@example.com
password = ...
access_token = ...
[spotify]
client_id = ...
client_secret = ...
access_token = ...
refresh_token = ...
expires_at = 1672531200
[tidal]
client_id = ...
client_secret = ...
access_token = ...
refresh_token = ...
user_id = 12345
country_code = US
```
**Reading:**
```python
config = ConfigParser()
config.read(os.path.expanduser("~/minim.cfg"))
if config.has_section("spotify"):
    access_token = config.get("spotify", "access_token", fallback=None)
    refresh_token = config.get("spotify", "refresh_token", fallback=None)
```
**Writing:**
```python
config = ConfigParser()
config.read(os.path.expanduser("~/minim.cfg"))
if not config.has_section("spotify"):
    config.add_section("spotify")
config.set("spotify", "access_token", new_token)
with open(os.path.expanduser("~/minim.cfg"), "w") as f:
    config.write(f)
```
**Thread Safety:** Not thread-safe. Concurrent writes from multiple processes can corrupt the file. No file locking implemented.
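One way a caller could reduce the risk of a torn config file is to write to a temporary file and rename it into place. The sketch below assumes that approach; `save_config_atomic` and `_config_lock` are hypothetical names, not part of minim. The lock only serializes threads within one process, and the atomic rename prevents half-written files but not lost updates between processes.

```python
import os
import tempfile
import threading
from configparser import ConfigParser

_config_lock = threading.Lock()  # serializes writers within this process only


def save_config_atomic(config: ConfigParser, path: str) -> None:
    """Write config to a temp file, then atomically replace the target."""
    directory = os.path.dirname(os.path.abspath(path))
    with _config_lock:
        # The temp file must live in the same directory for the rename to be atomic
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                config.write(f)
            os.replace(tmp_path, path)
        except BaseException:
            # Clean up the temp file if writing or renaming failed
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)
            raise
```

Readers then see either the old file or the new one, never a partially written mix.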
## Error Handling Architecture
**Strategy:** Fail-fast with `RuntimeError`
**API Errors:**
```python
def _request(self, method, url, **kwargs):
    response = requests.request(method, url, **kwargs)
    if not response.ok:
        raise RuntimeError(
            f"{method} {url} failed with status {response.status_code}: {response.text}"
        )
    return response.json()
```
**File Errors:**
```python
def __init__(self, filepath):
    if not os.path.exists(filepath):
        raise FileNotFoundError(f"Audio file not found: {filepath}")
    self._file = mutagen.File(filepath)
    if self._file is None:
        raise ValueError(f"Unsupported audio format: {filepath}")
```
**No Typed Exceptions:** All errors are generic `RuntimeError`, `ValueError`, `FileNotFoundError`. No custom exception hierarchy.
**No Retry Logic:** Failed requests are not retried. Caller must implement retry logic if needed.
**No Logging:** Errors are raised, not logged. Python's `logging` module is not used; non-critical issues are reported only through `warnings.warn`.
## Dependency Injection
minim does not use formal dependency injection. Configuration is passed via:
1. **Constructor parameters:** `WebAPI(client_id="...", client_secret="...")`
2. **Environment variables:** `os.getenv("SPOTIFY_CLIENT_ID")`
3. **Config file:** `ConfigParser().read(os.path.expanduser("~/minim.cfg"))`
**No DI Framework:** No use of `injector`, `dependency-injector`, or similar libraries.
**Testing Implications:** Difficult to mock API clients. Tests use real API calls with credentials from environment variables or config file.
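Because every client funnels its traffic through `_request`, one possible workaround is to patch that method in tests. The sketch below uses `unittest.mock` against a hypothetical stand-in class (`DemoAPI`, which is not part of minim); it illustrates the idea rather than describing how minim's own tests work.

```python
from unittest import mock


class DemoAPI:
    """Hypothetical stand-in for a minim-style client (not part of minim)."""

    def _request(self, method, url, **kwargs):
        raise NotImplementedError("network access is out of scope for this sketch")

    def get_track(self, track_id):
        return self._request("GET", f"/tracks/{track_id}")


def track_name_offline(api, track_id, canned_response):
    """Call get_track() with _request replaced by a canned response."""
    with mock.patch.object(api, "_request", return_value=canned_response) as fake:
        track = api.get_track(track_id)
    # Verify the client built the expected request without touching the network
    fake.assert_called_once_with("GET", f"/tracks/{track_id}")
    return track["name"]
```

The same patching pattern should apply to any class that exposes a single request method, at the cost of no longer detecting upstream API changes.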
## Concurrency Model
**Synchronous Only:** All operations are blocking, synchronous calls.
**No Async Support:** No `async`/`await`, no `asyncio`, no `aiohttp`.
**Threading:** Not thread-safe. Shared state (config file, token refresh) can cause race conditions.
**Multiprocessing:** Safe for read-only operations. Token refresh in multiple processes can corrupt config file.
## Extensibility
**Adding New Services:**
1. Create new module (e.g., `apple_music.py`)
2. Define API client class with `__init__`, `set_flow`, `set_access_token`, `_request`, `_get_headers`
3. Implement service-specific methods (`search`, `get_track`, etc.)
4. Add `set_metadata_using_apple_music()` to `Audio` class
**No Plugin System:** No formal extension mechanism. New services require modifying the library code.
**Subclassing:** API client classes can be subclassed to override behavior:
```python
class CustomSpotifyAPI(spotify.WebAPI):
    def _request(self, method, url, **kwargs):
        # Add custom logging
        print(f"Making request: {method} {url}")
        return super()._request(method, url, **kwargs)
```
## Deployment Architecture
**Not Applicable:** minim is a library, not a deployable service. No server, no containers, no orchestration.
**Distribution:** Install via pip from source repository.
**Runtime:** Runs in caller's Python process. No separate runtime environment.
## Summary
minim's architecture is straightforward and pragmatic:
- **Library pattern** with no server components
- **Synchronous, blocking** operations throughout
- **Consistent authentication flow** across all services
- **Automatic token management** with file-based persistence
- **Service-specific metadata mapping** to common schema
- **Format-agnostic audio I/O** via mutagen
- **Fail-fast error handling** with generic exceptions
The architecture prioritizes simplicity and ease of use over scalability and robustness. It's well-suited for personal projects, scripts, and research but lacks features needed for production services (async, rate limiting, typed exceptions, secure storage).
The v2 rewrite on the `dev` branch addresses many architectural limitations while preserving the core design philosophy.
@@ -0,0 +1,904 @@
# minim: Codebase Analysis
## Repository Structure
```
minim/
├── .github/
│ └── workflows/
│     └── ci.yml # GitHub Actions CI/CD
├── docs/
│ ├── conf.py # Sphinx configuration
│ ├── index.rst # Documentation index
│ └── ... # Additional documentation
├── minim/
│ ├── __init__.py # Package initialization (65 lines)
│ ├── audio.py # Audio file handling (1,860 lines)
│ ├── discogs.py # Discogs API client (5,501 lines)
│ ├── itunes.py # iTunes API client (575 lines)
│ ├── qobuz.py # Qobuz API client (5,579 lines)
│ ├── spotify.py # Spotify API client (9,862 lines)
│ ├── tidal.py # TIDAL API client (12,338 lines)
│ └── utility.py # Shared utilities (136 lines)
├── tests/
│ ├── test_audio.py # Audio module tests
│ ├── test_discogs.py # Discogs tests
│ ├── test_itunes.py # iTunes tests
│ ├── test_qobuz.py # Qobuz tests
│ ├── test_spotify.py # Spotify tests
│ └── test_tidal.py # TIDAL tests
├── .coveragerc # Coverage configuration
├── .gitignore # Git ignore patterns
├── environment.yml # Conda environment
├── LICENSE # GPL-3.0 license
├── README.md # Project README
└── setup.py # Package setup
```
**Total Source Lines:** 35,916 (excluding tests, docs, config)
**Module Distribution:**
- `tidal.py`: 34.4% (12,338 lines)
- `spotify.py`: 27.5% (9,862 lines)
- `qobuz.py`: 15.5% (5,579 lines)
- `discogs.py`: 15.3% (5,501 lines)
- `audio.py`: 5.2% (1,860 lines)
- `itunes.py`: 1.6% (575 lines)
- `utility.py`: 0.4% (136 lines)
- `__init__.py`: 0.2% (65 lines)
**Observation:** `tidal.py` is disproportionately large. This suggests either comprehensive API coverage or a need for refactoring into submodules.
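The total and the percentages above follow directly from the per-file line counts in the repository tree. A short sketch of the arithmetic, where `LINE_COUNTS` and `distribution` are illustrative names:

```python
# Line counts as listed in the repository structure above.
LINE_COUNTS = {
    "__init__.py": 65,
    "audio.py": 1860,
    "discogs.py": 5501,
    "itunes.py": 575,
    "qobuz.py": 5579,
    "spotify.py": 9862,
    "tidal.py": 12338,
    "utility.py": 136,
}


def distribution(counts):
    """Return each module's share of the total in percent, largest first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {name: round(100 * lines / total, 1) for name, lines in ranked}


for name, share in distribution(LINE_COUNTS).items():
    print(f"{name}: {share}%")
```

The two streaming clients, `tidal.py` and `spotify.py`, together account for roughly 62% of the source.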
## Code Organization
### Package Initialization (`__init__.py`)
**Purpose:** Package metadata and version info
**Contents:**
```python
"""
minim: Comprehensive music metadata library
"""
__version__ = "1.1.0"
__author__ = "Benjamin Ye"
__email__ = "bbye98@gmail.com"
__license__ = "GPL-3.0"
__url__ = "https://github.com/bbye98/minim"
# No automatic imports (users import specific modules)
```
**Design Choice:** No automatic imports. Users explicitly import modules:
```python
from minim import spotify  # then spotify.WebAPI(...); a bare "import minim" does not load submodules
```
### Utility Module (`utility.py`)
**Purpose:** Shared utilities across all modules
**Functions:**
**Config File Handling:**
```python
def get_config_path() -> str:
    """Get path to minim config file."""
    return os.path.expanduser("~/minim.cfg")

def load_config() -> ConfigParser:
    """Load config file."""
    config = ConfigParser()
    config.read(get_config_path())
    return config

def save_config(config: ConfigParser) -> None:
    """Save config file."""
    with open(get_config_path(), "w") as f:
        config.write(f)
```
**String Formatting:**
```python
def format_duration(seconds: int) -> str:
    """Format duration in seconds to MM:SS or HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours > 0:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    else:
        return f"{minutes}:{seconds:02d}"

def sanitize_filename(filename: str) -> str:
    """Remove invalid characters from filename."""
    invalid_chars = '<>:"/\\|?*'
    for char in invalid_chars:
        filename = filename.replace(char, "_")
    return filename
```
**URL Handling:**
```python
def build_url(base: str, path: str, params: dict = None) -> str:
    """Build URL with path and query parameters."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    if params:
        query = "&".join(f"{k}={v}" for k, v in params.items() if v is not None)
        url += "?" + query
    return url
```
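Note that `build_url` as shown joins keys and values verbatim, so a query containing spaces, `&`, or `/` would produce a malformed URL. A variant that percent-encodes the parameters could look like the sketch below; `build_url_encoded` is a hypothetical name and not part of minim.

```python
from urllib.parse import urlencode


def build_url_encoded(base: str, path: str, params: dict = None) -> str:
    """Like build_url, but percent-encodes query keys and values."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    # Drop parameters whose value is None, as the original does
    filtered = {k: v for k, v in (params or {}).items() if v is not None}
    if filtered:
        url += "?" + urlencode(filtered)
    return url
```

In practice this matters little when URLs are passed to `requests` with a `params=` argument, since `requests` performs the encoding itself.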
**Minimal Utilities:** Only 136 lines. Most logic is self-contained within each module.
## Configuration Management
### Config File Format
**Location:** `~/minim.cfg`
**Parser:** Python's `ConfigParser` (INI format)
**Structure:**
```ini
[section_name]
key = value
key2 = value2
```
**Reading:**
```python
from configparser import ConfigParser
import os
config = ConfigParser()
config.read(os.path.expanduser("~/minim.cfg"))
value = config.get("section", "key", fallback=None)
int_value = config.getint("section", "key", fallback=0)
bool_value = config.getboolean("section", "key", fallback=False)
```
**Writing:**
```python
if not config.has_section("section"):
    config.add_section("section")
config.set("section", "key", "value")
with open(os.path.expanduser("~/minim.cfg"), "w") as f:
    config.write(f)
```
### Environment Variables
**Pattern:** `{SERVICE}_{FIELD}` in uppercase
**Examples:**
- `SPOTIFY_CLIENT_ID`
- `TIDAL_ACCESS_TOKEN`
- `QOBUZ_EMAIL`
**Reading:**
```python
import os
client_id = os.getenv("SPOTIFY_CLIENT_ID")
client_secret = os.getenv("SPOTIFY_CLIENT_SECRET")
```
**Precedence in Code:**
```python
def __init__(self, client_id=None, client_secret=None):
    # 1. Explicit parameter
    self.client_id = client_id
    # 2. Environment variable
    if not self.client_id:
        self.client_id = os.getenv("SPOTIFY_CLIENT_ID")
    # 3. Config file
    if not self.client_id:
        config = load_config()
        if config.has_section("spotify"):
            self.client_id = config.get("spotify", "client_id", fallback=None)
```
## Logging and Error Handling
### Logging
**No Structured Logging:** minim does not use Python's `logging` module.
**Warnings:**
```python
import warnings
warnings.warn("Token will expire soon", UserWarning)
```
**Use Cases:**
- Non-critical issues (token expiration warnings)
- Deprecated features
- Fallback behavior
**No Debug Logging:** No verbose output for debugging. Users must add their own logging.
### Error Handling
**Strategy:** Fail-fast with exceptions
**Exception Types:**
- `RuntimeError`: API errors, HTTP failures
- `ValueError`: Invalid input, unsupported formats
- `FileNotFoundError`: Missing audio files
- `KeyError`: Missing required fields in API responses
**No Custom Exceptions:** All errors use built-in exception types.
**Example:**
```python
def _request(self, method, url, **kwargs):
    response = requests.request(method, url, **kwargs)
    if not response.ok:
        raise RuntimeError(
            f"{method} {url} failed: {response.status_code} {response.text}"
        )
    return response.json()
```
**Error Messages:**
- Include HTTP method and URL
- Include status code and response body
- No error codes or structured error objects
**Caller Responsibility:**
```python
try:
    track = api.get_track(12345)
except RuntimeError as e:
    # Parse error message to determine cause
    if "404" in str(e):
        print("Track not found")
    elif "401" in str(e):
        print("Authentication failed")
    else:
        print(f"Unknown error: {e}")
```
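Substring checks such as `"404" in str(e)` can misfire when the URL itself contains those digits. A caller could instead extract the status code from the message format shown above and map it to its own exception types. The classes and the `classify` helper below are hypothetical and not part of minim.

```python
import re


class APIError(RuntimeError):
    """Caller-defined base error carrying the parsed HTTP status code."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


class NotFoundError(APIError):
    pass


class AuthenticationError(APIError):
    pass


# Matches both "... failed: 404 ..." and "... failed with status 404: ..."
_STATUS_RE = re.compile(r" failed(?: with status)?:? (\d{3})\b")


def classify(error: RuntimeError) -> APIError:
    """Convert a generic RuntimeError into a typed APIError."""
    match = _STATUS_RE.search(str(error))
    status = int(match.group(1)) if match else None
    cls = {401: AuthenticationError, 404: NotFoundError}.get(status, APIError)
    return cls(str(error), status)
```

A caller would then write `except RuntimeError as e: raise classify(e) from e`. This still depends on the message format staying stable, so it is a stopgap rather than a substitute for typed exceptions in the library.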
## Testing Infrastructure
### Test Framework
**Tool:** pytest
**Test Files:**
- `tests/test_audio.py`: Audio file handling tests
- `tests/test_discogs.py`: Discogs API tests
- `tests/test_itunes.py`: iTunes API tests
- `tests/test_qobuz.py`: Qobuz API tests
- `tests/test_spotify.py`: Spotify API tests
- `tests/test_tidal.py`: TIDAL API tests
**Test Structure:**
```python
import os

import pytest
from minim import spotify

class TestSpotifyWebAPI:
    @classmethod
    def setup_class(cls):
        """Set up API client for all tests."""
        cls.api = spotify.WebAPI(
            client_id=os.getenv("SPOTIFY_CLIENT_ID"),
            client_secret=os.getenv("SPOTIFY_CLIENT_SECRET")
        )
        cls.api.set_flow("client_credentials")
        cls.api.set_access_token()

    def test_search(self):
        """Test search functionality."""
        results = self.api.search("Radiohead", types=["artist"], limit=1)
        assert "artists" in results
        assert len(results["artists"]["items"]) > 0
        assert results["artists"]["items"][0]["name"] == "Radiohead"

    def test_get_artist(self):
        """Test get artist by ID."""
        artist = self.api.get_artist("4Z8W4fKeB5YxbusRsdQVPb")
        assert artist["name"] == "Radiohead"
        assert artist["type"] == "artist"

    def test_invalid_id(self):
        """Test error handling for invalid ID."""
        with pytest.raises(RuntimeError):
            self.api.get_artist("invalid_id")
```
**Class-Based Tests:**
- `setup_class()`: Run once before all tests in class
- `teardown_class()`: Run once after all tests in class
- Shared API client across tests (reduces authentication overhead)
**Real API Calls:**
- Tests make actual HTTP requests to services
- Requires valid credentials in environment variables
- May fail if services are down or rate limits exceeded
**No Mocking:** Tests do not use `unittest.mock` or `responses` library. All API calls are real.
**Pros:**
- Tests verify actual API behavior
- Catches API changes immediately
**Cons:**
- Slow (network latency)
- Flaky (depends on service availability)
- Rate limiting issues
- Requires credentials
### Coverage Configuration
**File:** `.coveragerc`
```ini
[run]
source = minim
omit =
    */tests/*
    */__init__.py
    */site-packages/*
[report]
exclude_lines =
    pragma: no cover
    def __repr__
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:
    if TYPE_CHECKING:
precision = 2
show_missing = True
```
**Coverage Execution:**
```bash
coverage run -m pytest tests/
coverage report
coverage html
```
**Coverage Metrics:** Not documented in repository. Estimated 60-80% based on test file count and module complexity.
### Continuous Integration
**Platform:** GitHub Actions
**Workflow:** `.github/workflows/ci.yml`
**Triggers:**
- Push to `main` or `dev` branches
- Pull requests to `main`
**Jobs:**
**Linting:**
```yaml
- name: Lint with ruff
  run: ruff check .
```
**Testing:**
```yaml
- name: Run tests
  env:
    SPOTIFY_CLIENT_ID: ${{ secrets.SPOTIFY_CLIENT_ID }}
    SPOTIFY_CLIENT_SECRET: ${{ secrets.SPOTIFY_CLIENT_SECRET }}
    TIDAL_CLIENT_ID: ${{ secrets.TIDAL_CLIENT_ID }}
    TIDAL_CLIENT_SECRET: ${{ secrets.TIDAL_CLIENT_SECRET }}
  run: pytest tests/
```
**Environment:**
- OS: Ubuntu 22.04
- Python: 3.9
- FFmpeg: Installed via apt
**Secrets:** API credentials stored in GitHub Secrets, injected as environment variables.
## Code Style
### Linting
**Tool:** ruff (modern, fast Python linter)
**Replaces:** flake8, pylint, isort, pyupgrade
**Configuration:** `pyproject.toml` or `ruff.toml`
```toml
[tool.ruff]
line-length = 88
target-version = "py39"
[tool.ruff.lint]
select = [
    "E",  # pycodestyle errors
    "W",  # pycodestyle warnings
    "F",  # pyflakes
    "I",  # isort
    "N",  # pep8-naming
    "UP", # pyupgrade
]
ignore = [
    "E501", # line too long (handled by formatter)
]
```
**Execution:**
```bash
ruff check .
ruff check --fix . # Auto-fix issues
```
### Formatting
**No Formatter:** minim does not use `black`, `autopep8`, or similar formatters.
**Style:** Follows PEP 8 with manual formatting.
**Line Length:** Approximately 88 characters (black default), but not enforced.
### Type Hints
**Partial Coverage:** Type hints used inconsistently.
**Examples:**
**With Type Hints:**
```python
def search(self, query: str, types: list[str] = ["track"], limit: int = 20) -> dict:
    """Search Spotify catalog."""
    ...
```
**Without Type Hints:**
```python
def _request(self, method, url, **kwargs):
    """Make HTTP request."""
    ...
```
**No Type Checking:** Does not use `mypy` or `pyright` for static type checking.
**Recommendation for v2:** Add comprehensive type hints and integrate `mypy` into CI.
### Docstrings
**Format:** Google-style docstrings
**Example:**
```python
def get_track(self, track_id: str, market: str = None) -> dict:
    """
    Get track details.

    Args:
        track_id: Spotify track ID
        market: ISO 3166-1 alpha-2 country code

    Returns:
        Track object with metadata

    Raises:
        RuntimeError: If API request fails

    Example:
        >>> api = WebAPI(client_id="...", client_secret="...")
        >>> track = api.get_track("3n3Ppam7vgaVa1iaRUc9Lp")
        >>> print(track["name"])
        Creep
    """
    params = {}
    if market:
        params["market"] = market
    return self._request("GET", f"/tracks/{track_id}", params=params)
```
**Coverage:** Most public methods have docstrings. Private methods (`_request`, `_get_headers`) often lack documentation.
**Sphinx Integration:** Docstrings parsed by Sphinx for ReadTheDocs documentation.
## Code Patterns
### API Client Pattern
**Common Structure:**
```python
import os

import requests


class API:
    def __init__(self, client_id=None, client_secret=None, access_token=None):
        # Load credentials from parameters, env vars, or config file
        self.client_id = client_id or os.getenv("SERVICE_CLIENT_ID")
        self.client_secret = client_secret or os.getenv("SERVICE_CLIENT_SECRET")
        self.access_token = access_token

        # Load from config file if not provided
        config = load_config()
        if config.has_section("service"):
            self.access_token = self.access_token or config.get("service", "access_token")

        # API base URL
        self.base_url = "https://api.service.com/v1"

    def set_flow(self, flow_type="authorization_code", **kwargs):
        """Configure OAuth flow."""
        self.flow_type = flow_type
        # Store flow-specific parameters

    def set_access_token(self, method="http.server"):
        """Obtain access token via OAuth flow."""
        # Implement OAuth flow
        # Save token to config file

    def _get_headers(self) -> dict:
        """Get HTTP headers with authentication."""
        return {"Authorization": f"Bearer {self.access_token}"}

    def _request(self, method: str, url: str, **kwargs) -> dict:
        """Make authenticated HTTP request."""
        if not url.startswith("http"):
            url = self.base_url + url
        headers = kwargs.pop("headers", {})
        headers.update(self._get_headers())
        response = requests.request(method, url, headers=headers, **kwargs)
        if not response.ok:
            raise RuntimeError(f"{method} {url} failed: {response.status_code}")
        return response.json()

    # Public API methods
    def search(self, query: str, **kwargs) -> dict:
        """Search catalog."""
        return self._request("GET", "/search", params={"q": query, **kwargs})

    def get_track(self, track_id: str) -> dict:
        """Get track details."""
        return self._request("GET", f"/tracks/{track_id}")
```
**Consistency:** All API clients (`discogs.py`, `spotify.py`, `tidal.py`, `qobuz.py`) follow this pattern with minor variations.
### Audio File Pattern
**Base Class with Subclasses:**
```python
import mutagen
import mutagen.flac
import mutagen.mp3


class Audio:
    def __init__(self, filepath: str):
        self.filepath = filepath
        self._file = mutagen.File(filepath)
        # Auto-detect format and change class
        if isinstance(self._file, mutagen.flac.FLAC):
            self.__class__ = FLAC
        elif isinstance(self._file, mutagen.mp3.MP3):
            self.__class__ = MP3
        # ... etc
        self.read_metadata()

    def read_metadata(self):
        """Read metadata from file. Implemented by subclasses."""
        raise NotImplementedError

    def write_metadata(self):
        """Write metadata to file. Implemented by subclasses."""
        raise NotImplementedError


class FLAC(Audio):
    def read_metadata(self):
        self.title = self._file.get("TITLE", [None])[0]
        self.artist = self._file.get("ARTIST", [None])[0]
        # ... etc

    def write_metadata(self):
        self._file["TITLE"] = self.title
        self._file["ARTIST"] = self.artist
        # ... etc
        self._file.save()
```
**Dynamic Class Change:** `self.__class__ = FLAC` changes instance class after initialization. Unusual pattern but works for format auto-detection.
### OAuth Callback Pattern
**Three Implementations:**
**1. http.server:**
```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def _listen_http_server(self):
    class CallbackHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Extract the authorization code from the redirect's query string
            query = parse_qs(urlparse(self.path).query)
            self.server.authorization_code = query.get("code", [None])[0]
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Authorization successful. You may close this window.")

    server = HTTPServer(("localhost", 8888), CallbackHandler)
    server.handle_request()  # Serve exactly one request, then return
    return server.authorization_code
```
**2. Flask:**
```python
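from flask import Flask, request


def _listen_flask(self):
    app = Flask(__name__)
    authorization_code = None

    @app.route("/callback")
    def callback():
        nonlocal authorization_code
        authorization_code = request.args.get("code")
        # Note: the "werkzeug.server.shutdown" hook was removed in Werkzeug 2.1;
        # on newer versions this lookup returns None and the server keeps running.
        shutdown = request.environ.get("werkzeug.server.shutdown")
        if shutdown:
            shutdown()
        return "Authorization successful. You may close this window."

    app.run(port=8888)  # Blocks until the server is shut down
    return authorization_code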
```
**3. Playwright:**
```python
def _automate_browser(self):
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(self.auth_url)
        page.fill("#username", self.email)
        page.fill("#password", self.password)
        page.click("button[type=submit]")
        page.wait_for_url(f"{self.redirect_uri}*")
        code = parse_qs(urlparse(page.url).query)["code"][0]
        browser.close()
        return code
```
**Flexibility:** Users choose callback method based on environment (headless server, desktop, etc.).
## Code Quality Issues
### Large Monolithic Files
**Problem:** `tidal.py` is 12,338 lines (34% of codebase).
**Impact:**
- Difficult to navigate
- Slow to load in editors
- Hard to maintain
- Merge conflicts more likely
**Recommendation:** Split into submodules:
```
minim/tidal/
├── __init__.py
├── auth.py # Authentication
├── catalog.py # Catalog endpoints
├── streaming.py # Streaming URLs
├── lyrics.py # Lyrics endpoints
├── user.py # User library
└── models.py # Data models
```
### Generic Error Handling
**Problem:** All errors are `RuntimeError` with string messages.
**Impact:**
- Caller must parse error messages to determine cause
- No structured error handling
- Difficult to distinguish error types
**Recommendation:** Define custom exceptions:
```python
class MinimError(Exception):
    """Base exception for minim."""


class APIError(MinimError):
    """API request failed."""

    def __init__(self, status_code: int, message: str):
        self.status_code = status_code
        self.message = message
        super().__init__(f"API error {status_code}: {message}")


class AuthenticationError(MinimError):
    """Authentication failed."""


class RateLimitError(APIError):
    """Rate limit exceeded."""

    def __init__(self, retry_after: int):
        self.retry_after = retry_after
        super().__init__(429, f"Rate limit exceeded. Retry after {retry_after}s")
```
### No Rate Limiting
**Problem:** No built-in rate limiting. Caller responsible for tracking.
**Impact:**
- Easy to exceed service rate limits
- No automatic backoff
- Tests may fail due to rate limiting
**Recommendation:** Implement rate limiter:
```python
from time import time, sleep


class RateLimiter:
    def __init__(self, requests_per_minute: int):
        self.requests_per_minute = requests_per_minute
        self.requests = []

    def wait_if_needed(self):
        now = time()
        # Remove requests older than 1 minute
        self.requests = [t for t in self.requests if now - t < 60]
        if len(self.requests) >= self.requests_per_minute:
            sleep_time = 60 - (now - self.requests[0])
            if sleep_time > 0:
                sleep(sleep_time)
        self.requests.append(time())


# Usage in API client
class API:
    def __init__(self):
        self.rate_limiter = RateLimiter(60)  # 60 requests per minute

    def _request(self, method, url, **kwargs):
        self.rate_limiter.wait_if_needed()
        # Make request
```
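The limiter above throttles outgoing requests but does not cover the "no automatic backoff" point. A minimal sketch of a retry delay calculation follows, assuming the service may send a `Retry-After` header expressed in seconds; the function name `backoff_delay` and its parameters are hypothetical and not part of minim.

```python
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Return the number of seconds to wait before retry `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Honor a Retry-After value given as a number of seconds.
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            # The HTTP-date form of Retry-After is not handled in this sketch.
            pass
    # Fall back to exponential backoff: base, 2*base, 4*base, ... up to cap.
    return min(base * (2 ** attempt), cap)
```

A client could call this after a 429 response and sleep for the returned duration before retrying. Capping the delay keeps a misbehaving header from stalling the caller indefinitely.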
### Plain Text Token Storage
**Problem:** Tokens stored unencrypted in `~/minim.cfg`.
**Impact:**
- Security risk on shared systems
- Tokens readable by any process running as the user
- Passwords stored in plain text (Qobuz)
**Recommendation:** Use OS keychain:
```python
import keyring
# Store token
keyring.set_password("minim", "spotify_access_token", access_token)
# Retrieve token
access_token = keyring.get_password("minim", "spotify_access_token")
```
### Inconsistent Type Hints
**Problem:** Some functions have type hints, others don't.
**Impact:**
- Reduced IDE autocomplete support
- No static type checking
- Harder to understand function signatures
**Recommendation:** Add comprehensive type hints and enable `mypy`:
```python
from typing import Any, Dict, List, Optional


def search(
    self,
    query: str,
    types: Optional[List[str]] = None,
    limit: int = 20,
    offset: int = 0,
) -> Dict[str, Any]:
    """Search catalog."""
    types = types or ["track"]  # Avoid a shared mutable default argument
    ...
```
## Code Metrics
### Complexity
**Cyclomatic Complexity:** Not measured. Likely moderate to high in large modules (`tidal.py`, `spotify.py`).
**Recommendation:** Use `radon` to measure complexity:
```bash
pip install radon
radon cc minim/ -a # Average complexity
radon cc minim/ -n D # Show only blocks ranked D or worse (high complexity)
```
### Duplication
**Code Duplication:** Likely present across API clients (authentication, request handling).
**Recommendation:** Extract common patterns to base class:
```python
class BaseAPI:
    def __init__(self, service_name: str):
        self.service_name = service_name
        self.load_credentials()

    def load_credentials(self):
        # Common credential loading logic
        ...

    def _request(self, method, url, **kwargs):
        # Common request handling
        ...


class SpotifyAPI(BaseAPI):
    def __init__(self):
        super().__init__("spotify")
        self.base_url = "https://api.spotify.com/v1"
```
### Dependencies
**Direct Dependencies:** 3 (cryptography, mutagen, requests)
**Optional Dependencies:** 6 (ffmpeg, flask, levenshtein, numpy, pillow, playwright)
**Dependency Graph:** Flat (no transitive dependencies within minim modules).
**Recommendation:** Keep dependencies minimal. Current approach is good.
## Summary
minim's codebase is well-structured for a personal project but shows signs of organic growth:
**Strengths:**
- Consistent API client pattern across modules
- Comprehensive test coverage with real API calls
- Good documentation (docstrings, ReadTheDocs)
- Minimal dependencies
- CI/CD with GitHub Actions
**Weaknesses:**
- Large monolithic files (`tidal.py` at 12K lines)
- Generic error handling (all `RuntimeError`)
- No rate limiting
- Plain text token storage
- Inconsistent type hints
- No static type checking
**Recommendations for v2:**
- Split large modules into subpackages
- Define custom exception hierarchy
- Implement rate limiting and backoff
- Use OS keychain for token storage
- Add comprehensive type hints
- Integrate `mypy` for static type checking
- Extract common patterns to base classes
- Add code complexity and duplication metrics to CI
The codebase is production-ready for personal use but requires hardening for commercial or large-scale deployment. The v2 rewrite on the `dev` branch addresses many of these issues.
@@ -0,0 +1,664 @@
# minim: Data Management
## Data Storage Architecture
minim does **not use a database**. All data is either:
1. **Ephemeral:** API responses held in memory during execution
2. **Token Storage:** OAuth tokens persisted to `~/minim.cfg`
3. **Audio Metadata:** Written to audio file tags via mutagen
There is no SQL database, no NoSQL store, no caching layer, no persistent data beyond configuration and audio files.
## Token Storage
### File Location
**Path:** `~/minim.cfg` (expands to user's home directory)
**Format:** INI-style configuration file via Python's `ConfigParser`
**Permissions:** Default file permissions (typically 0644 on Unix: writable by the owner, readable by group and all other users)
**Security:** Plain text storage. No encryption, no obfuscation, no OS keychain integration.
### File Structure
```ini
[discogs]
consumer_key = Abcd1234Efgh5678
consumer_secret = IjklMnopQrstUvwx
access_token = YzabCdefGhijKlmn
access_token_secret = OpqrStuvWxyzAbcd
[qobuz]
app_id = 123456789
app_secret = abcdefghijklmnopqrstuvwxyz
email = user@example.com
password = MySecurePassword123
access_token = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
expires_at = 1672531200
[spotify]
client_id = 1234567890abcdef1234567890abcdef
client_secret = fedcba0987654321fedcba0987654321
redirect_uri = http://localhost:8888
access_token = BQDxK7...truncated...
refresh_token = AQBz3...truncated...
expires_at = 1672527600
scopes = user-library-read,playlist-read-private,user-read-playback-state
[tidal]
client_id = abcdefgh-1234-5678-90ab-cdefghijklmn
client_secret = ijklmnop-qrst-uvwx-yzab-cdefghijklmn
access_token = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
refresh_token = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
user_id = 12345678
country_code = US
expires_at = 1672534800
```
### Data Fields
**Common Fields (OAuth 2.0):**
- `client_id`: Application identifier
- `client_secret`: Application secret
- `access_token`: Bearer token for API requests
- `refresh_token`: Token for obtaining new access tokens
- `expires_at`: Unix timestamp when access token expires
**Service-Specific Fields:**
**Discogs (OAuth 1.0a):**
- `consumer_key`: OAuth consumer key
- `consumer_secret`: OAuth consumer secret
- `access_token`: OAuth access token
- `access_token_secret`: OAuth access token secret
- `personal_access_token`: Alternative to OAuth (from Discogs settings)
**Qobuz:**
- `app_id`: Qobuz application ID (extracted from web player)
- `app_secret`: Qobuz application secret (extracted from web player)
- `email`: User email for password grant
- `password`: User password (stored in plain text)
**Spotify:**
- `redirect_uri`: OAuth redirect URI
- `scopes`: Comma-separated list of permission scopes
**TIDAL:**
- `user_id`: TIDAL user ID (numeric)
- `country_code`: Two-letter country code for content availability
### Read/Write Operations
**Reading:**
```python
from configparser import ConfigParser
import os
config = ConfigParser()
config.read(os.path.expanduser("~/minim.cfg"))
if config.has_section("spotify"):
    access_token = config.get("spotify", "access_token", fallback=None)
    refresh_token = config.get("spotify", "refresh_token", fallback=None)
    expires_at = config.getint("spotify", "expires_at", fallback=0)
```
**Writing:**
```python
import os
import time
from configparser import ConfigParser

config_path = os.path.expanduser("~/minim.cfg")
config = ConfigParser()
config.read(config_path)
if not config.has_section("spotify"):
    config.add_section("spotify")
# new_access_token and new_refresh_token come from the token response
config.set("spotify", "access_token", new_access_token)
config.set("spotify", "refresh_token", new_refresh_token)
config.set("spotify", "expires_at", str(int(time.time()) + 3600))
with open(config_path, "w") as f:
    config.write(f)
```
**Concurrency:** Not thread-safe. Concurrent writes from multiple processes can corrupt the file. No file locking, no atomic writes.
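One common way to avoid a half-written config file is to write to a temporary file in the same directory and then rename it over the target. The sketch below illustrates that technique; `save_config_atomically` is a hypothetical helper, not something minim provides. It does not add locking, so two processes doing read-modify-write can still overwrite each other's changes.

```python
import os
import tempfile
from configparser import ConfigParser


def save_config_atomically(config: ConfigParser, path: str) -> None:
    """Write `config` to `path` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable only by the current user.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".minim-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            config.write(tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Renaming within one filesystem replaces the target in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the temporary file is created with owner-only permissions, the resulting config file also ends up with mode 0600 on POSIX systems.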
### Security Implications
**Risks:**
1. **Plain Text Passwords:** Qobuz passwords stored unencrypted
2. **Token Exposure:** Access tokens readable by any process running as the user
3. **No Expiration Cleanup:** Expired tokens remain in file indefinitely
4. **File Permissions:** Default permissions may allow group/other read access
**Mitigations (Not Implemented):**
- Store sensitive fields in the OS credential store (GNOME Keyring, macOS Keychain, Windows Credential Manager) instead of the config file
- Set restrictive file permissions (0600, user-only read/write)
- Use environment variables for sensitive credentials
- Implement token rotation and cleanup
**Recommendation:** For production use, replace file-based storage with secure credential management (AWS Secrets Manager, HashiCorp Vault, OS keychain).
## Audio Metadata Storage
### Tag Formats
minim writes metadata to audio files using format-specific tag systems:
| Format | Tag System | Implementation |
|--------|------------|----------------|
| FLAC | Vorbis Comments | `mutagen.flac.FLAC` |
| MP3 | ID3v2.4 | `mutagen.id3.ID3` |
| MP4/M4A | MP4 Atoms | `mutagen.mp4.MP4` |
| Ogg Vorbis | Vorbis Comments | `mutagen.oggvorbis.OggVorbis` |
| WAVE | ID3v2 (non-standard) | `mutagen.wave.WAVE` |
### Field Mapping
**FLAC (Vorbis Comments):**
```
TITLE = Track title
ARTIST = Primary artist(s)
ALBUMARTIST = Album artist
ALBUM = Album title
DATE = Release date (YYYY-MM-DD or YYYY)
GENRE = Genre
TRACKNUMBER = Track number
DISCNUMBER = Disc number
ISRC = International Standard Recording Code
BARCODE = UPC/EAN barcode
LYRICS = Song lyrics
COMMENT = Freeform comment
COPYRIGHT = Copyright notice
PICTURE block = Embedded artwork (native FLAC metadata block, not a Vorbis comment)
```
**MP3 (ID3v2.4):**
```
TIT2 = Track title
TPE1 = Primary artist(s)
TPE2 = Album artist
TALB = Album title
TDRC = Release date
TCON = Genre
TRCK = Track number (format: "3" or "3/12")
TPOS = Disc number (format: "1" or "1/2")
TSRC = ISRC
TXXX:BARCODE = UPC/EAN barcode (custom frame)
USLT = Unsynchronized lyrics
COMM = Comment
TCOP = Copyright
APIC = Attached picture (artwork)
```
**MP4 (Atoms):**
```
©nam = Track title
©ART = Primary artist(s)
aART = Album artist
©alb = Album title
©day = Release date
©gen = Genre
trkn = Track number (tuple: (track, total))
disk = Disc number (tuple: (disc, total))
----:com.apple.iTunes:ISRC = ISRC (custom atom)
----:com.apple.iTunes:BARCODE = UPC/EAN barcode
©lyr = Lyrics
©cmt = Comment
cprt = Copyright
covr = Cover art
```
**Ogg Vorbis (Vorbis Comments):**
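Same text fields as FLAC (both use Vorbis Comments). Ogg Vorbis has no native picture block, so artwork is stored as a base64-encoded `METADATA_BLOCK_PICTURE` comment.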
**WAVE (ID3v2):**
Same as MP3 (WAVE files can contain ID3v2 tags, though non-standard).
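The per-format listings above can be condensed into a lookup table. The following is a minimal sketch built only from the keys listed in this section; the names `FIELD_KEYS`, `FORMAT_TAG_SYSTEM`, and `tag_key` are hypothetical and are not claimed to match minim's internals.

```python
# Unified field name -> tag key per tag system (taken from the listings above).
FIELD_KEYS = {
    "title": {"vorbis": "TITLE", "id3": "TIT2", "mp4": "©nam"},
    "artist": {"vorbis": "ARTIST", "id3": "TPE1", "mp4": "©ART"},
    "album_artist": {"vorbis": "ALBUMARTIST", "id3": "TPE2", "mp4": "aART"},
    "album": {"vorbis": "ALBUM", "id3": "TALB", "mp4": "©alb"},
    "date": {"vorbis": "DATE", "id3": "TDRC", "mp4": "©day"},
    "genre": {"vorbis": "GENRE", "id3": "TCON", "mp4": "©gen"},
    "isrc": {"vorbis": "ISRC", "id3": "TSRC", "mp4": "----:com.apple.iTunes:ISRC"},
}

# Container format -> tag system, following the Tag Formats table.
FORMAT_TAG_SYSTEM = {
    "flac": "vorbis",
    "ogg": "vorbis",
    "mp3": "id3",
    "wav": "id3",
    "m4a": "mp4",
}


def tag_key(field: str, audio_format: str) -> str:
    """Return the format-specific tag key for a unified field name."""
    return FIELD_KEYS[field][FORMAT_TAG_SYSTEM[audio_format]]
```

For example, `tag_key("title", "flac")` yields `"TITLE"` while `tag_key("title", "mp3")` yields `"TIT2"`. Keeping the mapping in data rather than in per-class code makes it easier to audit against the tables above.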
### Write Operations
**FLAC Example:**
```python
import mutagen.flac
audio = mutagen.flac.FLAC("track.flac")
# Text fields
audio["TITLE"] = "Creep"
audio["ARTIST"] = "Radiohead"
audio["ALBUM"] = "Pablo Honey"
audio["DATE"] = "1993"
audio["TRACKNUMBER"] = "2"
audio["DISCNUMBER"] = "1"
audio["ISRC"] = "GBAYE9200070"
# Artwork
picture = mutagen.flac.Picture()
picture.type = 3 # Front cover
picture.mime = "image/jpeg"
picture.desc = "Cover"
picture.data = open("cover.jpg", "rb").read()
audio.add_picture(picture)
audio.save()
```
**MP3 Example:**
```python
from mutagen.id3 import ID3, TIT2, TPE1, TALB, TDRC, TRCK, APIC
audio = ID3("track.mp3")
audio["TIT2"] = TIT2(encoding=3, text="Creep")
audio["TPE1"] = TPE1(encoding=3, text="Radiohead")
audio["TALB"] = TALB(encoding=3, text="Pablo Honey")
audio["TDRC"] = TDRC(encoding=3, text="1993")
audio["TRCK"] = TRCK(encoding=3, text="2/12")
audio["APIC"] = APIC(
encoding=3,
mime="image/jpeg",
type=3,
desc="Cover",
data=open("cover.jpg", "rb").read()
)
audio.save()
```
**MP4 Example:**
```python
import mutagen.mp4
audio = mutagen.mp4.MP4("track.m4a")
audio["©nam"] = "Creep"
audio["©ART"] = "Radiohead"
audio["©alb"] = "Pablo Honey"
audio["©day"] = "1993"
audio["trkn"] = [(2, 12)] # Track 2 of 12
audio["disk"] = [(1, 1)] # Disc 1 of 1
audio["covr"] = [
mutagen.mp4.MP4Cover(
open("cover.jpg", "rb").read(),
imageformat=mutagen.mp4.MP4Cover.FORMAT_JPEG
)
]
audio.save()
```
### Read Operations
**Auto-Detection:**
```python
import mutagen
audio = mutagen.File("track.flac")
# Tag keys are format-specific: use the line that matches the detected format
title = audio.get("TITLE", [None])[0]  # FLAC/Ogg
title = audio.get("TIT2", None)        # MP3
title = audio.get("©nam", [None])[0]   # MP4
```
**minim Abstraction:**
```python
from minim.audio import Audio
audio = Audio("track.flac") # Auto-detects format
# Unified interface
print(audio.title)
print(audio.artist)
print(audio.album)
print(audio.track_number)
```
### Artwork Handling
**Fetching from API:**
```python
import requests
# Spotify example
track = spotify_api.get_track("3n3Ppam7vgaVa1iaRUc9Lp")
artwork_url = track["album"]["images"][0]["url"] # Largest image
artwork_data = requests.get(artwork_url).content
# TIDAL example
track = tidal_api.get_track(12345678)
cover_id = track["album"]["cover"].replace("-", "/")
artwork_url = f"https://resources.tidal.com/images/{cover_id}/1280x1280.jpg"
artwork_data = requests.get(artwork_url).content
```
**Embedding in File:**
```python
audio = Audio("track.flac")
audio.artwork = artwork_data # bytes
audio.write_metadata()
```
**Image Formats:** JPEG and PNG supported by all tag formats. JPEG preferred for smaller file size.
**Size Considerations:** Large artwork (>1MB) significantly increases file size. Recommendation: 600x600 to 1200x1200 pixels, JPEG quality 85-90%.
## Data Flow
### API Response to Audio File
**Complete Workflow:**
```python
from minim import spotify
from minim.audio import Audio
# 1. Authenticate
api = spotify.WebAPI(client_id="...", client_secret="...")
api.set_flow("client_credentials")
api.set_access_token()
# 2. Search for track
results = api.search("Radiohead Creep", types=["track"], limit=1)
track = results["tracks"]["items"][0]
# 3. Load audio file
audio = Audio("track.flac")
# 4. Map API response to metadata
audio.set_metadata_using_spotify(track)
# 5. Write to file
audio.write_metadata()
```
**Data Transformations:**
**Step 4 (Mapping):**
```python
def set_metadata_using_spotify(self, track_data: dict):
    # Direct mappings
    self.title = track_data["name"]
    self.album = track_data["album"]["name"]
    self.date = track_data["album"]["release_date"]
    self.track_number = track_data["track_number"]
    self.disc_number = track_data["disc_number"]

    # Array to string
    self.artist = ", ".join(a["name"] for a in track_data["artists"])

    # Nested object
    self.isrc = track_data.get("external_ids", {}).get("isrc")

    # Fetch external resource
    if track_data["album"]["images"]:
        artwork_url = track_data["album"]["images"][0]["url"]
        self.artwork = requests.get(artwork_url).content
```
**Step 5 (Writing):**
```python
# FLAC implementation
def write_metadata(self):
    self._file["TITLE"] = self.title
    self._file["ARTIST"] = self.artist
    self._file["ALBUM"] = self.album
    self._file["DATE"] = self.date
    self._file["TRACKNUMBER"] = str(self.track_number)
    self._file["DISCNUMBER"] = str(self.disc_number)
    if self.isrc:
        self._file["ISRC"] = self.isrc
    if self.artwork:
        picture = mutagen.flac.Picture()
        picture.data = self.artwork
        picture.type = 3
        picture.mime = "image/jpeg"
        self._file.add_picture(picture)
    self._file.save()
```
### Service-Specific Normalization
**Artist Handling:**
**Spotify (array of objects):**
```json
{
"artists": [
{"name": "Radiohead", "id": "4Z8W4fKeB5YxbusRsdQVPb"},
{"name": "Thom Yorke", "id": "3WrFJ7ztbogyGnTHbHJFl2"}
]
}
```
**Normalization:** `", ".join(a["name"] for a in artists)` → `"Radiohead, Thom Yorke"`
**TIDAL (array of objects):**
```json
{
"artists": [
{"name": "Radiohead", "id": 4050}
]
}
```
**Normalization:** Same as Spotify.
**iTunes (string):**
```json
{
"artistName": "Radiohead"
}
```
**Normalization:** Direct assignment.
**Qobuz (object):**
```json
{
"performer": {"name": "Radiohead", "id": 12345}
}
```
**Normalization:** `performer["name"]`
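The four artist shapes above can be reduced to a single string with a small dispatcher. This is a sketch under the payload shapes shown in this section; `normalize_artist` is a hypothetical name, not a minim function.

```python
def normalize_artist(service: str, payload: dict) -> str:
    """Return a display string for the artist(s) in a service payload."""
    if service in ("spotify", "tidal"):
        # Array of artist objects: join the names.
        return ", ".join(artist["name"] for artist in payload["artists"])
    if service == "itunes":
        # Plain string field.
        return payload["artistName"]
    if service == "qobuz":
        # Single nested performer object.
        return payload["performer"]["name"]
    raise ValueError(f"Unsupported service: {service}")
```

Raising on an unknown service, rather than returning an empty string, keeps a misspelled service name from silently producing blank tags.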
**Date Handling:**
**Spotify:**
- Full date: `"2023-01-15"` → `"2023-01-15"`
- Year only: `"2023"` → `"2023"`
- Month precision: `"2023-01"` → `"2023-01"`
**TIDAL:**
- ISO 8601 with time: `"2023-01-15T00:00:00.000Z"` → `"2023-01-15"` (strip time)
**iTunes:**
- ISO 8601: `"2023-01-15T00:00:00Z"` → `"2023-01-15"`
**Qobuz:**
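- Unix timestamp: `1673740800` → `datetime.fromtimestamp(1673740800, tz=timezone.utc).strftime("%Y-%m-%d")` → `"2023-01-15"` (convert in UTC so the date does not shift with the local timezone)
- ISO 8601: `"2023-01-15"` → `"2023-01-15"`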
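The date rules listed above fit in one function. The following is a minimal sketch, assuming timestamps are interpreted in UTC and that string inputs are already ISO 8601; `normalize_release_date` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Union


def normalize_release_date(value: Union[int, float, str]) -> str:
    """Normalize a service release date to YYYY, YYYY-MM, or YYYY-MM-DD."""
    if isinstance(value, (int, float)):
        # Unix timestamp: convert in UTC so the result is timezone-independent.
        return datetime.fromtimestamp(value, tz=timezone.utc).strftime("%Y-%m-%d")
    # ISO 8601 string: drop any time component, keep the original precision.
    return value.split("T", 1)[0]
```

Year-only and month-precision strings pass through unchanged, which matches the Spotify behavior described above.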
**Track/Disc Number Handling:**
**Spotify:**
```json
{
"track_number": 3,
"disc_number": 1
}
```
**Normalization:** Direct assignment.
**TIDAL:**
```json
{
"trackNumber": 3,
"volumeNumber": 1
}
```
**Normalization:** `track_number = trackNumber`, `disc_number = volumeNumber`
**iTunes:**
```json
{
"trackNumber": 3,
"trackCount": 12
}
```
**Normalization:** `track_number = trackNumber` (ignore `trackCount`)
**Qobuz:**
```json
{
"track_number": 3,
"media_number": 1
}
```
**Normalization:** `track_number = track_number`, `disc_number = media_number`
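The track and disc number keys differ only by name across services, so a key table is enough to normalize them. This sketch uses the payload keys shown above; `NUMBER_KEYS` and `normalize_numbers` are hypothetical names, and the iTunes entry has no disc key because none is listed in this section.

```python
from typing import Optional, Tuple

# Service -> (track number key, disc number key or None if not listed above).
NUMBER_KEYS = {
    "spotify": ("track_number", "disc_number"),
    "tidal": ("trackNumber", "volumeNumber"),
    "itunes": ("trackNumber", None),
    "qobuz": ("track_number", "media_number"),
}


def normalize_numbers(service: str, payload: dict) -> Tuple[Optional[int], Optional[int]]:
    """Return (track_number, disc_number) from a service payload."""
    track_key, disc_key = NUMBER_KEYS[service]
    track_number = payload.get(track_key)
    disc_number = payload.get(disc_key) if disc_key else None
    return track_number, disc_number
```

Fields that are absent come back as `None`, leaving it to the caller to decide whether to skip the tag or apply a default.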
## Format Conversion
### FFmpeg Integration
**Conversion Workflow:**
```python
audio = Audio("track.flac")
# Convert to MP3
mp3_audio = audio.convert("track.mp3", "mp3", bitrate="320k")
# Convert to AAC
m4a_audio = audio.convert("track.m4a", "m4a", bitrate="256k")
# Convert to Ogg Vorbis
ogg_audio = audio.convert("track.ogg", "ogg", quality=10)
```
**FFmpeg Command Construction:**
```python
def convert(self, output_path: str, format: str, **options):
    cmd = ["ffmpeg", "-i", self.filepath]

    # Codec selection
    codec_map = {
        "flac": "flac",
        "mp3": "libmp3lame",
        "m4a": "aac",
        "ogg": "libvorbis",
        "wav": "pcm_s16le",
    }
    cmd.extend(["-c:a", codec_map[format]])

    # Options
    if "bitrate" in options:
        cmd.extend(["-b:a", options["bitrate"]])
    if "quality" in options:
        cmd.extend(["-q:a", str(options["quality"])])
    if "sample_rate" in options:
        cmd.extend(["-ar", str(options["sample_rate"])])

    cmd.append(output_path)
    subprocess.run(cmd, check=True)
```
**Metadata Preservation:**
```python
# After conversion, copy metadata
converted = Audio(output_path)
converted.title = self.title
converted.artist = self.artist
converted.album = self.album
# ... copy all fields
converted.artwork = self.artwork
converted.write_metadata()
```
**Lossy to Lossless:** Converting lossy formats (MP3, AAC) to lossless (FLAC) does not improve quality. The FLAC encoding step itself discards nothing, but it can only preserve the already-degraded audio, and the resulting file is larger.
**Lossless to Lossy:** Converting FLAC to MP3/AAC reduces file size but loses audio information. Irreversible.
## Data Validation
**No Validation:** minim does not validate metadata before writing to files.
**Potential Issues:**
- Invalid dates (e.g., `"2023-13-45"`) written as-is
- Track numbers exceeding album track count
- Non-numeric values in numeric fields
- Oversized artwork (multi-megabyte images)
**Recommendation:** Implement validation layer:
```python
import warnings
from datetime import datetime

from minim.audio import Audio


def validate_metadata(audio: Audio):
    # Date validation: accept YYYY-MM-DD, YYYY-MM, or YYYY
    if audio.date:
        for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
            try:
                datetime.strptime(audio.date, fmt)
                break
            except ValueError:
                continue
        else:
            raise ValueError(f"Invalid date format: {audio.date}")

    # Track number validation (also rejects 0, which is falsy)
    if audio.track_number is not None and audio.track_number < 1:
        raise ValueError(f"Invalid track number: {audio.track_number}")

    # Artwork size validation
    if audio.artwork and len(audio.artwork) > 2 * 1024 * 1024:  # 2MB
        warnings.warn(f"Large artwork: {len(audio.artwork)} bytes")
```
## Data Retention
**Token Expiration:** Access tokens expire (typically 1 hour for OAuth 2.0). Refresh tokens used to obtain new access tokens without re-authentication.
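Since `expires_at` is stored as a Unix timestamp, deciding when to refresh is a simple comparison. The sketch below adds a small safety margin so a token is not used in its final seconds; `is_token_expired` and the `leeway` parameter are hypothetical, not part of minim.

```python
import time
from typing import Optional


def is_token_expired(expires_at: int, now: Optional[float] = None, leeway: int = 60) -> bool:
    """Return True if the token has expired or will expire within `leeway` seconds."""
    current = time.time() if now is None else now
    return current >= expires_at - leeway
```

Passing `now` explicitly makes the check easy to test. A missing `expires_at` read with a fallback of `0` is treated as expired.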
**Token Cleanup:** Expired tokens remain in `~/minim.cfg` indefinitely. No automatic cleanup.
**Audio Metadata:** Persists in files until overwritten or file deleted.
**API Response Caching:** Not implemented. Every request hits the API.
## Data Privacy
**Sensitive Data in Config File:**
- User passwords (Qobuz)
- Access tokens (all services)
- Refresh tokens (OAuth 2.0 services)
- User IDs and email addresses
**Exposure Risks:**
- Backup systems may copy `~/minim.cfg` to cloud storage
- Version control systems may accidentally commit config file
- Malware can read tokens and impersonate user
**Recommendations:**
1. Keep `minim.cfg` out of version control (add `minim.cfg` to `.gitignore` if a copy ever lives inside a repository)
2. Exclude from cloud backup or encrypt backups
3. Use environment variables for CI/CD
4. Rotate tokens regularly
5. Revoke tokens when no longer needed
## Summary
minim's data management is minimal and file-based:
- **No database:** All data is ephemeral or file-based
- **Token storage:** Plain text INI file at `~/minim.cfg`
- **Audio metadata:** Written to file tags via mutagen
- **No caching:** API responses not persisted
- **No validation:** Metadata written as-is without checks
This approach is simple and suitable for personal use but lacks security and robustness for production systems. The v2 rewrite addresses security concerns with OS keychain integration and adds validation layers.
For a metadata aggregator project, consider:
- Secure credential storage (OS keychain, secrets manager)
- Database for caching API responses (reduce API calls)
- Metadata validation before writing to files
- Audit logging for data access and modifications
@@ -0,0 +1,703 @@
# minim: Deployment
## Deployment Model
minim is a **Python library**, not a deployable service. There is no server, no daemon, no container to deploy. Users install the library and import it into their own Python code.
**Installation Target:** Developer workstations, scripts, Jupyter notebooks, personal automation tools.
**Not Applicable:** Production web servers, cloud deployments, containerized services, serverless functions.
## Installation Methods
### From Source (Current)
**Clone Repository:**
```bash
git clone https://github.com/bbye98/minim.git
cd minim
```
**Install in Development Mode:**
```bash
python -m pip install -e .
```
**Install in Production Mode:**
```bash
python -m pip install .
```
**Editable Install (`-e`):**
- Changes to source code immediately reflected without reinstalling
- Useful for development and testing
- Links the installed package to the source directory (via a path entry in site-packages) instead of copying files
**Production Install:**
- Copies files to site-packages
- Requires reinstall after code changes
- Cleaner for end users
### Via Conda
**Environment File:**
```yaml
# environment.yml
name: minim
channels:
  - conda-forge
  - defaults
dependencies:
  - python>=3.9
  - cryptography
  - mutagen
  - requests
  - pip
  - pip:
      - -e .
```
**Create Environment:**
```bash
conda env create -f environment.yml
conda activate minim
```
**Update Environment:**
```bash
conda env update -f environment.yml
```
### Via PyPI (Planned for v2)
**Not Yet Available.** minim is not published to PyPI as of v1.1.0.
**Planned for v2:**
```bash
pip install minim
```
**Package Metadata (setup.py):**
```python
from setuptools import setup, find_packages

setup(
    name="minim",
    version="1.1.0",
    author="Benjamin Ye",
    author_email="bbye98@gmail.com",
    description="Comprehensive music metadata library",
    long_description=open("README.md").read(),
    long_description_content_type="text/markdown",
    url="https://github.com/bbye98/minim",
    packages=find_packages(),
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
    ],
    python_requires=">=3.9",
    install_requires=[
        "cryptography",
        "mutagen",
        "requests",
    ],
    extras_require={
        "full": [
            "ffmpeg-python",
            "flask",
            "levenshtein",
            "numpy",
            "pillow",
            "playwright",
        ],
    },
)
```
## Dependencies
### Required (Core)
**cryptography:**
- Purpose: TIDAL manifest decryption, secure token handling
- Version: Not pinned (latest compatible)
- Install: `pip install cryptography`
**mutagen:**
- Purpose: Audio file metadata reading/writing
- Version: Not pinned
- Install: `pip install mutagen`
**requests:**
- Purpose: HTTP client for all API calls
- Version: Not pinned
- Install: `pip install requests`
### Optional (Features)
**ffmpeg:**
- Purpose: Audio format conversion
- Type: System binary (not Python package)
- Install: `apt install ffmpeg` (Ubuntu), `brew install ffmpeg` (macOS), download from ffmpeg.org (Windows)
- Detection: `shutil.which("ffmpeg")`
**flask:**
- Purpose: OAuth callback server (alternative to http.server)
- Install: `pip install flask`
**levenshtein:**
- Purpose: Fuzzy string matching for search results
- Install: `pip install levenshtein`
**numpy:**
- Purpose: Audio analysis features
- Install: `pip install numpy`
**pillow:**
- Purpose: Image processing for album artwork
- Install: `pip install pillow`
**playwright:**
- Purpose: Browser automation for OAuth flows
- Install: `pip install playwright && playwright install chromium`
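Because all of these are optional, a caller may want to check what is available before enabling a feature. The sketch below combines the `shutil.which("ffmpeg")` detection mentioned above with import checks; `OPTIONAL_MODULES` and `detect_optional_features` are hypothetical names, and the import names (`Levenshtein`, `PIL`) are the modules those packages install.

```python
import importlib.util
import shutil

# Feature name -> importable module name provided by the package.
OPTIONAL_MODULES = {
    "flask": "flask",
    "levenshtein": "Levenshtein",
    "numpy": "numpy",
    "pillow": "PIL",
    "playwright": "playwright",
}


def detect_optional_features() -> dict:
    """Return a mapping of optional feature name to availability."""
    # FFmpeg is a system binary, so look it up on PATH rather than importing it.
    features = {"ffmpeg": shutil.which("ffmpeg") is not None}
    for name, module in OPTIONAL_MODULES.items():
        # find_spec returns None when a top-level module is not installed.
        features[name] = importlib.util.find_spec(module) is not None
    return features
```

Checking with `find_spec` avoids importing heavy packages such as numpy just to learn whether they are installed.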
### Dependency Management
**No Lock File:** minim does not use `requirements.txt` or `Pipfile.lock` for version pinning.
**Version Constraints:** None specified. Uses latest compatible versions.
**Risk:** Dependency updates may introduce breaking changes.
**Recommendation for Production:**
```bash
# Generate lock file
pip freeze > requirements.txt
# Install from lock file
pip install -r requirements.txt
```
## System Requirements
### Python Version
**Minimum:** Python 3.9
**Tested:** Python 3.9, 3.10, 3.11
**Recommended:** Python 3.11 (latest stable)
**Version-Specific Features:**
- Type hints (PEP 585): `list[str]` instead of `List[str]` (requires 3.9+)
- Union operator: `str | None` instead of `Optional[str]` (requires 3.10+, not used in v1)
### Operating Systems
**Supported:**
- Linux (Ubuntu 20.04+, Debian 11+, Fedora 35+, Arch)
- macOS (10.15 Catalina+)
- Windows (10, 11)
**Tested in CI:** Ubuntu 22.04 (GitHub Actions)
**Platform-Specific Considerations:**
**Linux:**
- FFmpeg available via package manager (`apt`, `dnf`, `pacman`)
- Config file at `~/minim.cfg` (that is, `/home/username/minim.cfg`)
**macOS:**
- FFmpeg via Homebrew (`brew install ffmpeg`)
- Config file at `~/minim.cfg` (that is, `/Users/username/minim.cfg`)
**Windows:**
- FFmpeg requires manual download and PATH configuration
- Config file at `C:\Users\username\minim.cfg`
- Path handling uses `os.path.expanduser("~")` (cross-platform)
### External Dependencies
**FFmpeg (Optional):**
- Required for audio format conversion
- Not required for metadata reading/writing or API access
- Version: 4.0+ recommended
**Browser (Optional):**
- Required for OAuth flows using Playwright
- Chromium installed via `playwright install chromium`
- Not required for http.server or Flask callback methods
## Configuration
### Environment Variables
minim checks environment variables for credentials:
**Discogs:**
- `DISCOGS_CONSUMER_KEY`
- `DISCOGS_CONSUMER_SECRET`
- `DISCOGS_ACCESS_TOKEN`
- `DISCOGS_ACCESS_TOKEN_SECRET`
- `DISCOGS_PERSONAL_ACCESS_TOKEN`
**Qobuz:**
- `QOBUZ_APP_ID`
- `QOBUZ_APP_SECRET`
- `QOBUZ_EMAIL`
- `QOBUZ_PASSWORD`
**Spotify:**
- `SPOTIFY_CLIENT_ID`
- `SPOTIFY_CLIENT_SECRET`
- `SPOTIFY_REDIRECT_URI`
**TIDAL:**
- `TIDAL_CLIENT_ID`
- `TIDAL_CLIENT_SECRET`
- `TIDAL_REDIRECT_URI`
**Precedence:** Environment variables > config file > constructor parameters
**Use Case:** CI/CD pipelines, containerized environments, shared systems
**Example:**
```bash
export SPOTIFY_CLIENT_ID="abc123"
export SPOTIFY_CLIENT_SECRET="def456"
python script.py # Automatically uses environment variables
```
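The precedence rule stated above can be sketched as a small lookup function. This is an illustration of the documented order (environment variable, then config file, then constructor parameter), not minim's actual lookup code; `resolve_credential` and its parameters are hypothetical names:

```python
import os
from typing import Mapping, Optional


def resolve_credential(
    env_name: str,
    config_value: Optional[str] = None,
    constructor_value: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the first non-empty value in the order env > config > constructor."""
    for candidate in (environ.get(env_name), config_value, constructor_value):
        if candidate:
            return candidate
    return None
```

Passing `environ` explicitly keeps the function easy to test without touching the real process environment.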
### Config File
**Location:** `~/minim.cfg`
**Format:** INI-style (ConfigParser)
**Auto-Creation:** Created automatically when tokens are saved via `set_access_token()`
**Manual Creation:**
```ini
[spotify]
client_id = abc123
client_secret = def456
access_token = BQC...
refresh_token = AQD...
expires_at = 1672531200
```
**Permissions:** Default (0644 on Unix). Recommendation: `chmod 600 ~/minim.cfg` for security.
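Because the file is plain INI, a caller can inspect it with the standard library. A minimal sketch, assuming the section and key names shown in the manual example (`access_token`, `expires_at` as a Unix timestamp); `load_valid_token` is a hypothetical helper, not part of minim:

```python
import configparser
import time
from typing import Optional


def load_valid_token(
    path: str, section: str = "spotify", now: Optional[float] = None
) -> Optional[str]:
    """Return the stored access token, or None if it is absent or expired."""
    # interpolation=None so a '%' inside a token is not treated as a placeholder
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is skipped without raising
    if not parser.has_section(section):
        return None
    token = parser.get(section, "access_token", fallback=None)
    expires_at = parser.getfloat(section, "expires_at", fallback=0.0)
    current = time.time() if now is None else now
    return token if token and current < expires_at else None
```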
## CI/CD
### GitHub Actions
**Workflow File:** `.github/workflows/ci.yml`
```yaml
name: CI
on:
  push:
    branches: [main, dev]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
      - name: Install FFmpeg
        run: sudo apt-get update && sudo apt-get install -y ffmpeg
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .
          pip install pytest ruff
      - name: Lint with ruff
        run: ruff check .
      - name: Run tests
        env:
          SPOTIFY_CLIENT_ID: ${{ secrets.SPOTIFY_CLIENT_ID }}
          SPOTIFY_CLIENT_SECRET: ${{ secrets.SPOTIFY_CLIENT_SECRET }}
          TIDAL_CLIENT_ID: ${{ secrets.TIDAL_CLIENT_ID }}
          TIDAL_CLIENT_SECRET: ${{ secrets.TIDAL_CLIENT_SECRET }}
        run: pytest tests/
```
**Secrets Management:**
- API credentials stored in GitHub Secrets
- Accessed via `${{ secrets.SECRET_NAME }}`
- Not exposed in logs
**Test Execution:**
- Real API calls (not mocked)
- Requires valid credentials
- May fail if rate limits exceeded or services change APIs
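Since the tests depend on real credentials, a guard that skips them when the variables are missing keeps local runs and forked pull requests from failing for the wrong reason. A minimal sketch using only the standard library (pytest also collects `unittest` classes); the class and helper names are hypothetical and not taken from minim's test suite:

```python
import os
import unittest

REQUIRED_VARS = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")


def missing_credentials(environ=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]


@unittest.skipIf(missing_credentials(), "Spotify credentials not configured")
class TestSpotifySearch(unittest.TestCase):
    def test_search_placeholder(self):
        # A real test would construct the client and call the API here
        pass
```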
**Linting:**
- `ruff`: Fast Python linter (replaces flake8, pylint)
- Configuration in `pyproject.toml` or `ruff.toml`
### Coverage
**Tool:** `coverage.py`
**Configuration:** `.coveragerc`
```ini
[run]
source = minim
omit =
    */tests/*
    */__init__.py
    */site-packages/*

[report]
exclude_lines =
    pragma: no cover
    def __repr__
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:
```
**Execution:**
```bash
coverage run -m pytest tests/
coverage report
coverage html # Generate HTML report
```
**Current Coverage:** Not documented in the repository. The 60-80% figure is a rough guess based on test file count alone; run `coverage report` to measure it.
## Documentation
### ReadTheDocs
**URL:** https://minim.readthedocs.io
**Build System:** Sphinx
**Configuration:** `docs/conf.py`
```python
project = 'minim'
author = 'Benjamin Ye'
release = '1.1.0'
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
]
html_theme = 'sphinx_rtd_theme'
```
**Auto-Deploy:**
- Triggered on push to `main` branch
- Builds from `docs/` directory
- Parses docstrings from source code
**Docstring Format:** Google-style
```python
def search(self, query: str, types: list[str] = ["track"]) -> dict:
    """
    Search Spotify catalog.

    Args:
        query: Search query string
        types: Result types (track, album, artist, playlist)

    Returns:
        Dict with type-specific results arrays

    Raises:
        RuntimeError: If API request fails

    Example:
        >>> api = WebAPI(client_id="...", client_secret="...")
        >>> results = api.search("Radiohead", types=["artist"])
        >>> print(results["artists"]["items"][0]["name"])
        Radiohead
    """
```
### Local Documentation Build
```bash
cd docs
pip install sphinx sphinx_rtd_theme
make html
open _build/html/index.html  # macOS; use xdg-open on Linux
```
## Versioning
**Scheme:** Semantic Versioning (SemVer)
**Format:** `MAJOR.MINOR.PATCH`
**Current Version:** 1.1.0
**Version History:**
- 1.0.0: Initial release
- 1.1.0: Bug fixes, minor feature additions
**Version Location:** `minim/__init__.py`
```python
__version__ = "1.1.0"
```
**Git Tags:**
```bash
git tag v1.1.0
git push origin v1.1.0
```
## Release Process
**Current (Manual):**
1. Update version in `minim/__init__.py`
2. Update `CHANGELOG.md` (if exists)
3. Commit changes: `git commit -m "Bump version to 1.1.0"`
4. Create tag: `git tag v1.1.0`
5. Push: `git push origin main --tags`
6. GitHub automatically triggers ReadTheDocs build
**Planned for v2 (Automated):**
1. Create release branch: `git checkout -b release/1.2.0`
2. Update version and changelog
3. Open pull request
4. Merge to main
5. GitHub Actions workflow:
- Run tests
- Build package: `python -m build`
- Publish to PyPI: `twine upload dist/*`
- Create GitHub release with changelog
- Trigger ReadTheDocs build
## Distribution Channels
### Current
**GitHub Releases:**
- Source code archives (`.tar.gz`, `.zip`)
- No pre-built binaries
- Download: https://github.com/bbye98/minim/releases
**ReadTheDocs:**
- Documentation only
- No package distribution
### Planned (v2)
**PyPI:**
- `pip install minim`
- Versioned releases
- Automatic dependency resolution
**Conda-Forge:**
- `conda install -c conda-forge minim`
- Cross-platform binaries
- Dependency management via conda
## Containerization
**Not Applicable:** minim is a library, not a service.
**Hypothetical Use Case:** Containerized script using minim
**Dockerfile:**
```dockerfile
FROM python:3.11-slim
# Install FFmpeg
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*
# Install minim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -e .
# Run script
CMD ["python", "script.py"]
```
**Docker Compose:**
```yaml
version: '3.8'
services:
  minim-script:
    build: .
    environment:
      - SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}
    volumes:
      - ./audio:/app/audio
      - ./minim.cfg:/root/minim.cfg
```
## Monitoring and Logging
**Not Applicable:** minim is a library. Monitoring and logging are the responsibility of the calling application.
**Library Behavior:**
- No built-in logging (uses `warnings` module for non-critical issues)
- Errors raised as exceptions (caller handles logging)
- No metrics, no telemetry, no health checks
**Caller Responsibility:**
```python
import logging
from minim import spotify
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Use minim with logging
try:
    api = spotify.WebAPI(client_id="...", client_secret="...")
    api.set_flow("client_credentials")
    api.set_access_token()
    results = api.search("Radiohead", types=["artist"])
    logger.info(f"Found {len(results['artists']['items'])} artists")
except RuntimeError as e:
    logger.error(f"API error: {e}")
except Exception as e:
    logger.exception(f"Unexpected error: {e}")
```
## Security Considerations
### Credential Storage
**Risk:** Plain text tokens in `~/minim.cfg`
**Mitigation (Not Implemented):**
- Encrypt config file
- Use OS keychain (Keyring library)
- Use environment variables only
- Set restrictive file permissions (`chmod 600`)
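Of the mitigations listed, restrictive permissions are the simplest for a caller to apply. A minimal sketch, assuming the INI layout described earlier; `write_private_config` is a hypothetical helper and does not reflect how minim itself writes the file:

```python
import configparser
import os


def write_private_config(path: str, section: str, values: dict) -> None:
    """Write an INI file that only the owning user can read or write."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # keep existing sections if the file already exists
    parser[section] = values
    # Creating the file with mode 0o600 avoids a window where it is world-readable
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        parser.write(handle)
    if os.name == "posix":
        os.chmod(path, 0o600)  # also tighten a file that already existed
```

On Windows the mode bits are largely ignored, so file ACLs or an OS keychain would be needed there.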
### Dependency Vulnerabilities
**Risk:** Outdated dependencies with known CVEs
**Mitigation:**
```bash
# Scan for vulnerabilities
pip install safety
safety check
# Update dependencies
pip install --upgrade cryptography mutagen requests
```
### API Key Exposure
**Risk:** Hardcoded credentials in scripts
**Mitigation:**
- Use environment variables
- Use config file outside version control
- Add `minim.cfg` to `.gitignore`
### Private API Usage
**Risk:** Terms of service violations (Qobuz, TIDAL, Spotify lyrics)
**Mitigation:**
- Use only public APIs in production
- Document risks in README
- Obtain official API access if possible
## Scalability
**Not Applicable:** minim is a library for personal use, not a scalable service.
**Limitations:**
- Synchronous, blocking operations
- No connection pooling
- No rate limiting
- No caching
- Single-threaded
**For High-Volume Use:**
- Implement async version using `aiohttp`
- Add connection pooling
- Implement rate limiting and backoff
- Cache API responses (Redis, Memcached)
- Use task queue (Celery, RQ) for background processing
## Backup and Recovery
**Config File Backup:**
```bash
# Backup
cp ~/minim.cfg ~/minim.cfg.backup
# Restore
cp ~/minim.cfg.backup ~/minim.cfg
```
**Recommendation:** Exclude from cloud backup (contains sensitive tokens) or encrypt backups.
## Maintenance
**Current Status:** v1 in maintenance mode
**Maintenance Activities:**
- Bug fixes for critical issues
- Security updates for dependencies
- No new features
**Active Development:** v2 rewrite on `dev` branch
**Support Channels:**
- GitHub Issues: https://github.com/bbye98/minim/issues
- GitHub Discussions: https://github.com/bbye98/minim/discussions
## Summary
minim deployment is straightforward:
1. **Install from source:** `git clone` + `pip install -e .`
2. **Configure credentials:** Environment variables or `~/minim.cfg`
3. **Import and use:** `from minim import spotify`
No server deployment, no containers, no orchestration. The library runs in the caller's Python process.
For production use cases requiring scalability, security, and robustness, consider:
- Wrapping minim in a web service (Flask, FastAPI)
- Implementing async operations
- Adding rate limiting and caching
- Using secure credential storage
- Monitoring and logging
The v2 rewrite addresses many deployment concerns (PyPI publication, async support, secure storage) while maintaining the simple library architecture.
@@ -0,0 +1,735 @@
# minim: Evaluation
## Executive Summary
minim is a comprehensive Python library for music service API integration and audio metadata management. It provides unified access to five music services (Discogs, iTunes, Qobuz, Spotify, TIDAL) with automatic authentication handling and metadata normalization. The codebase demonstrates solid engineering practices but has clear limitations for production use.
**Overall Assessment:** Excellent reference implementation for personal projects and research. Requires hardening for commercial or large-scale deployment.
**Recommendation:** Use as-is for personal projects. Extract patterns and adapt (respecting GPL-3.0) for production systems. Monitor v2 development for production-ready features.
## Strengths
### 1. Comprehensive API Coverage
**Five Services Integrated:**
- Discogs: Database, marketplace, collection, wantlist
- iTunes: Public search and lookup
- Qobuz: High-resolution streaming and downloads
- Spotify: Full Web API, playback control, audio features, lyrics
- TIDAL: High-fidelity streaming, lyrics, credits
**Depth of Coverage:**
- Spotify: 30+ endpoints, 4 OAuth flows, audio features, playback control
- TIDAL: Public + private API, streaming URLs, lyrics, credits
- Discogs: Database, collection, wantlist CRUD operations
- Qobuz: Catalog, streaming, playlists, favorites
- iTunes: Search and lookup (complete public API)
**Comparison:** Most music libraries focus on one or two services. minim provides unified access to five, covering both metadata and streaming.
### 2. Unified Authentication Pattern
**Consistent Flow Across Services:**
1. Initialize with credentials (parameters, env vars, or config file)
2. Set OAuth flow type
3. Obtain access token (automatic browser flow or manual)
4. Automatic token refresh and persistence
**Example:**
```python
# Same pattern for all services
api = spotify.WebAPI(client_id="...", client_secret="...")
api.set_flow("authorization_code", scopes=["user-library-read"])
api.set_access_token() # Opens browser, handles callback, saves token
# Token automatically refreshed on expiration
results = api.search("Radiohead") # Just works
```
**Benefit:** Users learn one authentication pattern, apply to all services.
### 3. Automatic Token Management
**Features:**
- Token caching to `~/minim.cfg`
- Automatic refresh on expiration
- Transparent to caller (no manual token handling)
- Persistent across sessions
**Implementation:**
```python
import time
import requests

def _request(self, method, url, **kwargs):
    # Refresh proactively if the stored token has expired
    if self.expires_at and time.time() >= self.expires_at:
        self._refresh_access_token()
    # Make request
    response = requests.request(method, url, headers=self._get_headers(), **kwargs)
    # Handle 401 (token invalid): refresh once and retry
    if response.status_code == 401:
        self._refresh_access_token()
        response = requests.request(method, url, headers=self._get_headers(), **kwargs)
    return response
```
**Benefit:** Users don't need to implement token refresh logic. It just works.
### 4. Audio Metadata Integration
**Direct API-to-File Mapping:**
```python
# Fetch metadata from Spotify
track = spotify_api.get_track("3n3Ppam7vgaVa1iaRUc9Lp")
# Load audio file
audio = Audio("track.flac")
# Map and write metadata
audio.set_metadata_using_spotify(track)
audio.write_metadata()
```
**Normalization Across Services:**
- Handles different field names (artist vs. performer vs. artistName)
- Normalizes date formats (ISO 8601, Unix timestamp, year-only)
- Converts arrays to strings (multiple artists)
- Fetches external resources (artwork URLs)
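The date and artist normalization steps above can be illustrated with two small functions. This is a sketch of the general technique, not minim's actual mapping code; the function names are hypothetical, and it assumes numeric input is a Unix timestamp in seconds (so a year must be passed as a string):

```python
from datetime import datetime, timezone
from typing import Union


def normalize_date(value: Union[str, int, float]) -> str:
    """Normalize a release date to YYYY-MM-DD, or YYYY when only the year is known."""
    if isinstance(value, (int, float)):
        # Numbers are treated as Unix timestamps in seconds (UTC)
        return datetime.fromtimestamp(value, tz=timezone.utc).date().isoformat()
    text = value.strip()
    if len(text) == 4 and text.isdigit():
        return text  # year-only value, returned unchanged
    # Keep only the date part of an ISO 8601 timestamp
    return datetime.fromisoformat(text[:10]).date().isoformat()


def join_artists(artists: list, separator: str = ", ") -> str:
    """Flatten a list of {"name": ...} dicts (or plain strings) into one string."""
    names = [a["name"] if isinstance(a, dict) else str(a) for a in artists]
    return separator.join(names)
```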
**Format Support:**
- FLAC (Vorbis Comments)
- MP3 (ID3v2)
- MP4/M4A (MP4 atoms)
- Ogg Vorbis (Vorbis Comments)
- WAVE (ID3v2)
**Benefit:** Single interface for metadata management across formats and services.
### 5. Multiple OAuth Callback Methods
**Three Options:**
**1. http.server (default):**
- No dependencies
- Works on any system with browser
- Simple implementation
**2. Flask:**
- Better error handling
- Customizable callback page
- Requires Flask dependency
**3. Playwright:**
- Fully automated (no manual login)
- Works in headless environments
- Handles complex login flows (CAPTCHA, 2FA)
- Requires Playwright dependency
**Flexibility:** Users choose method based on environment (desktop, server, CI/CD).
### 6. Pure Python Implementation
**Minimal Dependencies:**
- Core: cryptography, mutagen, requests (3 packages)
- Optional: ffmpeg, flask, levenshtein, numpy, pillow, playwright (6 packages)
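**No Native Extensions in minim Itself:** The library is all Python code with no C extensions, so nothing needs compiling. Some dependencies (cryptography, numpy) do contain native code but install from prebuilt wheels on common platforms.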
**Benefits:**
- Easy to install (`pip install -e .`)
- Cross-platform (Linux, macOS, Windows)
- Easy to modify and debug
- No build toolchain required
### 7. Good Test Coverage
**Test Infrastructure:**
- pytest framework
- 6 test files (one per module)
- Class-based tests with shared setup
- Real API calls (not mocked)
- CI/CD with GitHub Actions
**Coverage:**
- Not measured; 60-80% is a rough guess based on test file count alone
- Tests cover authentication, search, retrieval, metadata mapping
- Tests verify actual API behavior (catches breaking changes)
**Benefit:** High confidence in functionality. Tests serve as usage examples.
### 8. Comprehensive Documentation
**ReadTheDocs:**
- Auto-generated from docstrings
- API reference for all modules
- Usage examples
- Auto-deployed on push
**Docstrings:**
- Google-style format
- Parameters, return values, exceptions documented
- Usage examples in docstrings
**README:**
- Installation instructions
- Quick start guide
- Feature overview
- License information
**Benefit:** Users can learn the library without reading source code.
## Weaknesses
### 1. GPL-3.0 License (Copyleft)
**Implications:**
- Derivative works must be GPL-3.0
- Cannot be used in proprietary software without releasing source
- Cannot be incorporated into projects distributed under permissive licenses (MIT, Apache 2.0) without the combined work falling under GPL-3.0
**Impact:**
- Limits adoption in commercial projects
- Requires legal review for corporate use
- Cannot be combined with libraries whose licenses are GPL-incompatible
**Comparison:** Most Python libraries use permissive licenses (MIT, Apache 2.0, BSD).
**Recommendation:** Consider dual licensing (GPL-3.0 + commercial license) or relicensing to LGPL-3.0 (allows proprietary software to link against the library, subject to the LGPL's conditions).
### 2. Not Published to PyPI
**Current Installation:**
```bash
git clone https://github.com/bbye98/minim.git
cd minim
pip install -e .
```
**Impact:**
- Harder to discover (not searchable on PyPI)
- No simple version pinning (`pip install minim==1.1.0` is not available; pinning requires a Git tag or commit)
- Downstream projects cannot declare minim as an ordinary PyPI dependency
- Requires git and manual cloning
**Comparison:** Most Python libraries are on PyPI (`pip install library-name`).
**Status:** Planned for v2.
### 3. v1 in Maintenance Mode
**Current Status:**
- Bug fixes only
- No new features
- Active development on v2 (dev branch)
**Impact:**
- New features delayed until v2 release
- Users must wait for v2 or fork v1
- Uncertainty about v2 timeline
**Recommendation:** Communicate v2 roadmap and timeline clearly.
### 4. Plain Text Token Storage
**Security Issue:**
```ini
# ~/minim.cfg (plain text)
[qobuz]
email = user@example.com
password = MyPassword123
access_token = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
```
**Risks:**
- Passwords readable by any process running as user
- Tokens exposed in backups
- Accidental commit to version control
- Malware can steal credentials
**Impact:**
- Unsuitable for shared systems
- Unsuitable for production deployments
- Security audit failure
**Mitigation (Not Implemented):**
- OS keychain integration (Keyring library)
- Encryption of config file
- Environment variables only (no file storage)
**Status:** Planned for v2 (OS keychain integration).
### 5. Private API Dependency
**Services Using Private APIs:**
- Qobuz: App ID/secret extraction, all endpoints undocumented
- Spotify: Lyrics via Musixmatch integration (undocumented)
- TIDAL: Streaming URLs, lyrics, credits (undocumented)
**Risks:**
- APIs can break without notice
- Terms of service violations
- Account suspension risk
- Legal liability
**Impact:**
- Unreliable for production use
- Requires monitoring for breaking changes
- Cannot be used in commercial products
**Recommendation:** Use only public APIs in production. Document private API risks clearly.
### 6. No Rate Limiting
**Problem:**
```python
# No rate limiting
for track_id in track_ids:  # 1000 tracks
    track = api.get_track(track_id)  # Exceeds rate limit
```
**Impact:**
- Easy to exceed service rate limits
- HTTP 429 errors (Too Many Requests)
- Temporary or permanent account blocks
- Tests fail due to rate limiting
**Comparison:** Client-side throttling is commonly added with packages such as `ratelimit` or `pyrate-limiter`; minim v1 ships nothing equivalent.
**Recommendation:** Implement rate limiter with configurable limits per service.
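One way to follow this recommendation is a sliding-window limiter that the caller invokes before each request. A minimal, single-threaded sketch using only the standard library; the class name and the injectable `clock` and `sleep` parameters are assumptions made for testability, not an existing minim API:

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `period`-second window."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window
            self._sleep(self.period - (now - self._calls[0]))
```

A caller would create one limiter per service (for example `SlidingWindowLimiter(60, 60)` for a 60 requests per minute limit) and call `acquire()` before each API request.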
### 7. Generic Error Handling
**Problem:**
```python
# All errors are RuntimeError
try:
    track = api.get_track("invalid_id")
except RuntimeError as e:
    # Must parse error message to determine cause
    if "404" in str(e):
        print("Not found")
    elif "401" in str(e):
        print("Unauthorized")
```
**Impact:**
- No structured error handling
- Difficult to distinguish error types
- Cannot catch specific errors (404, 401, 429)
- Error messages not machine-readable
**Comparison:** Modern libraries use typed exceptions (e.g., `requests.HTTPError`, `spotipy.SpotifyException`).
**Recommendation:** Define exception hierarchy:
```python
class MinimError(Exception): pass
class APIError(MinimError): pass
class NotFoundError(APIError): pass
class AuthenticationError(APIError): pass
class RateLimitError(APIError): pass
```
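A hierarchy like this becomes useful once HTTP status codes are mapped onto it in one place. A hedged sketch of that mapping; the `status_code` attribute and the `raise_for_status` helper are hypothetical additions for illustration and do not exist in minim v1:

```python
class MinimError(Exception):
    """Base class for all library errors."""


class APIError(MinimError):
    """An HTTP error response from a service."""

    def __init__(self, status_code, message=""):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


class NotFoundError(APIError):
    pass


class AuthenticationError(APIError):
    pass


class RateLimitError(APIError):
    pass


_STATUS_TO_ERROR = {401: AuthenticationError, 404: NotFoundError, 429: RateLimitError}


def raise_for_status(status_code, message=""):
    """Raise the most specific exception for an HTTP error status; no-op below 400."""
    if status_code < 400:
        return
    raise _STATUS_TO_ERROR.get(status_code, APIError)(status_code, message)
```

Callers can then write `except NotFoundError:` or `except RateLimitError:` instead of inspecting message strings.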
### 8. Large Monolithic Files
**Problem:**
- `tidal.py`: 12,338 lines (34% of codebase)
- `spotify.py`: 9,862 lines (27% of codebase)
**Impact:**
- Difficult to navigate
- Slow to load in editors
- Hard to maintain
- Merge conflicts more likely
- Intimidating for contributors
**Comparison:** A common rule of thumb is to split a module once it grows past roughly 500-1000 lines.
**Recommendation:** Split into subpackages:
```
minim/tidal/
├── __init__.py
├── auth.py
├── catalog.py
├── streaming.py
├── lyrics.py
└── user.py
```
### 9. No Async Support
**Problem:**
```python
# Synchronous, blocking
for track_id in track_ids:  # 100 tracks
    track = api.get_track(track_id)  # 100 sequential requests
# Total time: 100 * 200ms = 20 seconds
```
**Impact:**
- Slow for bulk operations
- Cannot leverage async/await
- Blocks event loop in async applications
- Poor performance for high-volume use
**Comparison:** Async HTTP clients such as `aiohttp` and `httpx` let a library issue requests concurrently; minim v1 builds only on the synchronous `requests`.
**Recommendation:** Implement async API clients:
```python
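import asyncio
import aiohttp

async def get_track(self, track_id: str) -> dict:
    # Sketch only: a real client would reuse one ClientSession across requests
    async with aiohttp.ClientSession() as session:
        async with session.get(f"{self.base_url}/tracks/{track_id}") as response:
            return await response.json()

# Usage (inside a coroutine)
tracks = await asyncio.gather(*[api.get_track(track_id) for track_id in track_ids])
# Total time: roughly one round trip (~200ms) if the service allows that concurrency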
```
**Status:** Planned for v2.
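Unbounded concurrency would run straight into the rate limits discussed earlier, so an async client usually caps the number of requests in flight. A minimal sketch with the standard library only; `fetch_all` is a hypothetical helper and `fake_get_track` stands in for a real network call:

```python
import asyncio


async def fetch_all(fetch, ids, max_concurrency=10):
    """Run `fetch(item_id)` for every id, keeping at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(item_id):
        async with semaphore:
            return await fetch(item_id)

    # gather returns results in the same order as the input ids
    return await asyncio.gather(*(bounded(item_id) for item_id in ids))


async def fake_get_track(track_id):
    # Stand-in for a network call; a real client would await an HTTP request here
    await asyncio.sleep(0.01)
    return {"id": track_id}


if __name__ == "__main__":
    tracks = asyncio.run(fetch_all(fake_get_track, ["a", "b", "c"], max_concurrency=2))
    print(tracks)
```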
### 10. No Caching
**Problem:**
```python
# Same request made multiple times
track1 = api.get_track("123") # API call
track2 = api.get_track("123") # API call (duplicate)
```
**Impact:**
- Wastes API quota
- Slower performance
- Higher rate limit usage
- Increased latency
**Comparison:** Libraries like `requests-cache` provide transparent caching.
**Recommendation:** Implement caching layer:
```python
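from functools import lru_cache

# Caveat: lru_cache on a method also keys on `self` and keeps the instance alive;
# the cached dict is shared between callers, so treat it as read-only
@lru_cache(maxsize=1000)
def get_track(self, track_id: str) -> dict:
    return self._request("GET", f"/tracks/{track_id}")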
```
Or use external cache (Redis, Memcached) for persistent caching.
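Between an unbounded in-process cache and an external store sits a small time-to-live cache, which suits metadata that changes rarely but should not be kept forever. A minimal sketch; `TTLCache` and `get_or_fetch` are hypothetical names, and the injectable `clock` exists only to make the behavior testable:

```python
import time


class TTLCache:
    """Cache values per key and refetch them after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry time, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`, calling `fetch(key)` on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

A caller could wrap lookups as `cache.get_or_fetch(track_id, api.get_track)`, so duplicate requests inside the TTL never reach the service.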
## Integration Potential
### For Metadata Aggregator Project
**Highly Valuable:**
**1. OAuth Implementation Reference:**
- Authorization Code flow (Spotify, TIDAL)
- PKCE flow (TIDAL, Spotify)
- Client Credentials flow (Spotify)
- OAuth 1.0a (Discogs)
- Password Grant (Qobuz)
**Reusable Patterns:**
- Token acquisition and refresh
- Callback server implementations
- Config file persistence
- Environment variable handling
**2. Token Management Pattern:**
- Automatic refresh on expiration
- Persistent storage (config file)
- Transparent to caller
- Multi-service support
**Adaptation:**
- Replace file storage with database or keychain
- Add encryption for sensitive fields
- Implement token rotation
**3. Metadata Normalization:**
- Service-specific to common schema mapping
- Field name translation
- Date format normalization
- Array to string conversion
- External resource fetching (artwork)
**Reusable Logic:**
- `set_metadata_using_*()` methods show field mappings
- Artwork URL construction (TIDAL)
- ISRC/UPC extraction
- Multi-artist handling
**4. Audio File Handling:**
- Format auto-detection
- Tag reading/writing across formats
- Metadata to tag mapping
- Artwork embedding
**Adaptation:**
- Extract audio module as standalone library
- Add validation layer
- Support additional formats (ALAC, WMA)
**5. API Request Patterns:**
- Base URL + path construction
- Query parameter handling
- Header injection (authentication)
- JSON response parsing
- Error handling
**Reusable Code:**
- `_request()` method structure
- `_get_headers()` pattern
- URL building logic
### Limitations for Production Use
**Must Address:**
1. **Security:** Replace plain text storage with encrypted storage or OS keychain
2. **Rate Limiting:** Implement per-service rate limiters with backoff
3. **Error Handling:** Define typed exception hierarchy
4. **Async Support:** Add async API clients for high-volume use
5. **Caching:** Implement response caching to reduce API calls
6. **Monitoring:** Add logging, metrics, and health checks
7. **Private APIs:** Replace with public APIs or obtain official access
**GPL-3.0 Compliance:**
- Cannot copy code directly into proprietary software
- Must release derivative works as GPL-3.0
- Consider clean-room reimplementation of patterns
- Or negotiate commercial license with author
## Comparison with Alternatives
### vs. Spotipy (Spotify-only)
**Spotipy Advantages:**
- Focused on Spotify (more comprehensive)
- MIT license (permissive)
- Published to PyPI
- Larger community
- More frequent updates
**minim Advantages:**
- Multi-service support (5 services vs. 1)
- Audio file integration
- Unified authentication pattern
- Lyrics support (private API)
**Verdict:** Use Spotipy for Spotify-only projects. Use minim for multi-service integration.
### vs. Tidalapi (TIDAL-only)
**Tidalapi Advantages:**
- Focused on TIDAL
- LGPL-3.0 license (weak copyleft, less restrictive than GPL-3.0)
- Published to PyPI
- Active development
**minim Advantages:**
- Multi-service support
- Audio file integration
- More comprehensive TIDAL coverage (lyrics, credits)
**Verdict:** Use Tidalapi for TIDAL-only projects. Use minim for multi-service integration.
### vs. Mutagen (Audio-only)
**Mutagen Advantages:**
- Focused on audio files
- GPL-2.0+ license
- Published to PyPI
- Mature and stable
- No API dependencies
**minim Advantages:**
- API integration
- Service-specific metadata mapping
- Unified interface for API + audio
**Verdict:** Use Mutagen for audio-only projects. Use minim for API + audio integration.
### vs. Custom Implementation
**Custom Implementation Advantages:**
- Full control
- License flexibility
- Optimized for specific use case
- No unnecessary features
**minim Advantages:**
- Faster development (ready-made)
- Tested and documented
- Multi-service support out of the box
- Community support (issues, discussions)
**Verdict:** Use minim for rapid prototyping and personal projects. Build custom for production systems with specific requirements.
## Use Case Suitability
### Excellent For:
1. **Personal Music Library Management:**
- Fetch metadata from streaming services
- Write to local audio files
- Sync playlists between services
- Download high-resolution tracks (within terms of service)
2. **Research and Prototyping:**
- Explore music service APIs
- Test metadata quality across services
- Compare audio features (Spotify)
- Analyze credits (TIDAL)
3. **Learning OAuth Flows:**
- Reference implementation for OAuth 2.0
- Multiple flow types demonstrated
- Callback server examples
- Token management patterns
4. **Audio Metadata Normalization:**
- Understand field mapping across services
- Learn tag format differences
- Artwork handling examples
### Acceptable For:
1. **Internal Tools:**
- Company music library management
- Playlist curation tools
- Metadata quality auditing
- With security hardening (keychain, rate limiting)
2. **Academic Projects:**
- Music information retrieval research
- Metadata analysis
- Audio feature extraction
- With proper attribution (GPL-3.0)
### Not Suitable For:
1. **Commercial Products:**
- GPL-3.0 license requires source release
- Private API usage violates terms of service
- Plain text token storage is security risk
- No SLA or support
2. **High-Volume Services:**
- No async support (slow for bulk operations)
- No rate limiting (will exceed limits)
- No caching (wastes API quota)
- No connection pooling
3. **Production Web Services:**
- Security vulnerabilities (plain text tokens)
- No monitoring or metrics
- No health checks
- Generic error handling
## Recommendations
### For Personal Use:
**Use as-is.** minim is ready to use for personal projects. Install from source, configure credentials, and start using.
**Best Practices:**
- Set restrictive permissions on config file (`chmod 600 ~/minim.cfg`)
- Use environment variables in shared environments
- Implement rate limiting in your code
- Monitor for API changes (especially private APIs)
### For Research:
**Excellent reference.** Study the code to understand:
- OAuth flow implementations
- API request patterns
- Metadata normalization strategies
- Audio file handling
**Extract Patterns:**
- Token management logic
- Service-specific field mappings
- Error handling approaches
- Testing strategies
### For Production:
**Do not use directly.** Instead:
1. **Extract Patterns:** Study authentication, request handling, metadata mapping
2. **Reimplement:** Build production-ready version with:
- Secure credential storage (OS keychain, secrets manager)
- Rate limiting and backoff
- Typed exceptions
- Async support
- Caching layer
- Monitoring and logging
3. **Use Public APIs Only:** Avoid private APIs (Qobuz, Spotify lyrics, TIDAL private)
4. **License Compliance:** Respect GPL-3.0 or negotiate commercial license
### For Contributors:
**Wait for v2.** Active development is on `dev` branch. Contributing to v1 (maintenance mode) has limited impact.
**v2 Improvements:**
- Async support
- Typed exceptions
- Rate limiting
- Secure storage
- PyPI publication
- Modular architecture
**Contribute to v2:**
- Review dev branch
- Test new features
- Report issues
- Submit pull requests
## Final Verdict
**Overall Rating: 8/10**
**Breakdown:**
- **Functionality:** 9/10 (comprehensive API coverage, audio integration)
- **Code Quality:** 7/10 (good structure, but large files and generic errors)
- **Documentation:** 9/10 (excellent docstrings and ReadTheDocs)
- **Security:** 4/10 (plain text tokens, private APIs)
- **Performance:** 6/10 (synchronous only, no caching, no rate limiting)
- **Maintainability:** 7/10 (good tests, but large monolithic files)
- **Usability:** 9/10 (simple API, automatic token management)
- **License:** 6/10 (GPL-3.0 limits commercial use)
**Strengths Summary:**
- Comprehensive multi-service integration
- Unified authentication pattern
- Automatic token management
- Audio metadata integration
- Good documentation and tests
**Weaknesses Summary:**
- GPL-3.0 license (copyleft)
- Plain text token storage
- Private API dependency
- No rate limiting
- Generic error handling
- Large monolithic files
- No async support
**Recommendation:**
- **Personal Projects:** Use as-is (8/10)
- **Research:** Excellent reference (9/10)
- **Production:** Extract patterns, reimplement (5/10 as-is, 8/10 adapted)
**Future Outlook:**
v2 development addresses most weaknesses (async, typed exceptions, rate limiting, secure storage, PyPI publication). Monitor dev branch for production-ready release.
**For Metadata Aggregator Project:**
minim is an invaluable reference for:
- OAuth implementations per service
- Token management patterns
- Metadata normalization strategies
- Audio file handling
Extract patterns and adapt (respecting GPL-3.0) rather than using directly. The authentication flows, field mappings, and request handling patterns are particularly valuable.
@@ -0,0 +1,922 @@
# minim: Service Integrations
## Overview
minim integrates with five music services, each with different authentication methods, API coverage, and capabilities:
| Service | Auth Method | API Type | Streaming | Lyrics | Credits | Rate Limit |
|---------|-------------|----------|-----------|--------|---------|------------|
| Discogs | OAuth 1.0a, Personal Token | Public | No | No | Yes | 60/min (auth), 25/min (unauth) |
| iTunes | None | Public | No | No | No | ~20/min |
| Qobuz | Password Grant | Private | Yes | No | No | Unknown |
| Spotify | OAuth 2.0 (4 flows) | Public + Private | No | Yes | No | 180/30sec |
| TIDAL | OAuth 2.0 (PKCE, Client Creds) | Public + Private | Yes | Yes | Yes | Unknown |
## Discogs Integration
### Service Overview
**Purpose:** Music database, marketplace, collection management
**Website:** https://www.discogs.com
**API Documentation:** https://www.discogs.com/developers
**API Type:** Public, documented, RESTful
### Authentication
**Method 1: OAuth 1.0a**
**Setup:**
1. Create app at https://www.discogs.com/settings/developers
2. Obtain consumer key and consumer secret
3. Implement OAuth 1.0a flow (request token → user authorization → access token)
**Implementation:**
```python
from minim import discogs
api = discogs.API(
consumer_key="Abcd1234Efgh5678",
consumer_secret="IjklMnopQrstUvwx"
)
# OAuth flow (opens browser)
api.set_access_token()
# Tokens saved to ~/minim.cfg
```
**Method 2: Personal Access Token**
**Setup:**
1. Generate token at https://www.discogs.com/settings/developers
2. Use token directly (no OAuth flow)
**Implementation:**
```python
api = discogs.API(personal_access_token="YourPersonalToken")
```
**Comparison:**
- OAuth 1.0a: Required for write operations (collection, wantlist); authenticated rate limit (60/min)
- Personal Token: Read-only, simpler setup; same authenticated rate limit as OAuth
### API Coverage
**Database:**
- Search releases, artists, labels, masters
- Get detailed information (tracklist, credits, images, identifiers)
- Browse by genre, style, format, year, country
**Marketplace:**
- Search listings
- Get listing details (price, condition, seller)
- Write operations not implemented in minim (read-only marketplace access)
**Collection:**
- Get user's collection folders
- List items in folder
- Add/remove releases
- Update notes and rating
**Wantlist:**
- Get user's wantlist
- Add/remove releases
- Not implemented: Update notes
**User:**
- Get user profile
- Get user submissions (releases, artists, labels)
- Not implemented in minim
### Data Model
**Release Object:**
```json
{
"id": 249504,
"title": "OK Computer",
"artists": [{"name": "Radiohead", "id": 3840}],
"labels": [{"name": "Parlophone", "catno": "7243 8 55229 2 5"}],
"formats": [{"name": "CD", "qty": "1", "descriptions": ["Album"]}],
"year": 1997,
"country": "UK",
"genres": ["Electronic", "Rock"],
"styles": ["Alternative Rock", "Experimental"],
"tracklist": [
{"position": "1", "title": "Airbag", "duration": "4:44"},
{"position": "2", "title": "Paranoid Android", "duration": "6:23"}
],
"identifiers": [
{"type": "Barcode", "value": "724385522925"}
],
"images": [
{"type": "primary", "uri": "https://...", "width": 600, "height": 600}
]
}
```
**Artist Object:**
```json
{
"id": 3840,
"name": "Radiohead",
"profile": "English rock band formed in 1985...",
"members": [
{"name": "Thom Yorke", "id": 239},
{"name": "Jonny Greenwood", "id": 240}
],
"urls": ["https://www.radiohead.com"],
"images": [...]
}
```
### Rate Limiting
**Authenticated:** 60 requests per minute
**Unauthenticated:** 25 requests per minute
**Headers:**
- `X-Discogs-Ratelimit`: Total requests allowed per minute
- `X-Discogs-Ratelimit-Remaining`: Requests remaining in current window
- `X-Discogs-Ratelimit-Used`: Requests used in current window
**Enforcement:** HTTP 429 (Too Many Requests) when limit exceeded.
**minim Implementation:** Does not check rate limit headers. Caller responsible for tracking.
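Because minim leaves this tracking to the caller, one option is to inspect the headers directly. The following is a minimal sketch, not part of minim: `seconds_to_wait` and its `low_water` threshold are hypothetical, and it assumes the caller has the response headers available as a mapping (for example from its own `requests` call, since the request helper described for minim returns only parsed JSON).

```python
from typing import Mapping


def seconds_to_wait(headers: Mapping[str, str],
                    window: float = 60.0,
                    low_water: int = 5) -> float:
    """Suggest a pause based on X-Discogs-Ratelimit-Remaining.

    Hypothetical helper: waits a full window when the budget is spent,
    spreads the last few requests over the window, otherwise no pause.
    """
    raw = headers.get("X-Discogs-Ratelimit-Remaining")
    if raw is None or not raw.strip().isdigit():
        return 0.0  # header missing or malformed: nothing to act on
    remaining = int(raw)
    if remaining == 0:
        return window  # budget exhausted: wait for the window to roll over
    if remaining <= low_water:
        return window / remaining  # slow down as the budget runs out
    return 0.0
```

Note that a plain `dict` lookup is case-sensitive, whereas the header mapping exposed by `requests` is not; the sketch assumes the header name is spelled as listed above.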
### Use Cases
1. **Metadata Enrichment:** Get detailed release information (catalog numbers, barcodes, formats)
2. **Collection Management:** Sync local library with Discogs collection
3. **Credits Extraction:** Get producer, engineer, musician credits from tracklist
4. **Format Identification:** Determine pressing details (country, year, label, catalog number)
### Limitations
- No streaming or preview URLs
- No lyrics
- Marketplace write operations not implemented
- User-submitted data (quality varies)
- Rate limiting requires manual tracking
## iTunes Integration
### Service Overview
**Purpose:** Public music catalog search and lookup
**Website:** https://www.apple.com/itunes
**API Documentation:** https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/iTuneSearchAPI
**API Type:** Public, documented, RESTful
### Authentication
**None required.** iTunes Search API is completely public.
**Implementation:**
```python
from minim import itunes
api = itunes.SearchAPI()
results = api.search("Radiohead", media="music", entity="musicArtist")
```
### API Coverage
**Search:**
- Search by term across all media types
- Filter by media (music, movie, podcast, audiobook, etc.)
- Filter by entity (song, album, artist, etc.)
- Filter by attribute (artist name, album name, song name, etc.)
**Lookup:**
- Lookup by iTunes ID
- Lookup by UPC (album barcode)
- Lookup by ISBN (books)
- Lookup by AMG (All Music Guide) ID
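For illustration, the lookup variants above map onto query parameters of Apple's public lookup endpoint. This is a sketch of the raw HTTP interface, not of minim's wrapper: `build_lookup_url` is a hypothetical helper, and the parameter names (`id`, `upc`, `isbn`, `amgArtistId`, `entity`) are those of the iTunes Search API itself.

```python
from urllib.parse import urlencode

LOOKUP_URL = "https://itunes.apple.com/lookup"


def build_lookup_url(*, itunes_id=None, upc=None, isbn=None,
                     amg_artist_id=None, entity=None) -> str:
    """Build a lookup URL from exactly one identifier (hypothetical helper)."""
    candidates = {
        "id": itunes_id,
        "upc": upc,
        "isbn": isbn,
        "amgArtistId": amg_artist_id,
    }
    params = {key: value for key, value in candidates.items() if value is not None}
    if len(params) != 1:
        raise ValueError("pass exactly one identifier")
    if entity is not None:
        params["entity"] = entity  # e.g. "album" to list an artist's albums
    return f"{LOOKUP_URL}?{urlencode(params)}"
```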
**Not Available:**
- Streaming URLs
- Lyrics
- User library access
- Playlist management
### Data Model
**Track Object:**
```json
{
"trackId": 1109731797,
"trackName": "Creep",
"artistName": "Radiohead",
"collectionName": "Pablo Honey",
"collectionId": 1109731533,
"artistId": 657515,
"trackNumber": 2,
"trackCount": 12,
"discNumber": 1,
"discCount": 1,
"releaseDate": "1993-02-22T08:00:00Z",
"primaryGenreName": "Alternative",
"trackTimeMillis": 238640,
"country": "USA",
"isrc": "GBAYE9200070",
"artworkUrl30": "https://.../30x30bb.jpg",
"artworkUrl60": "https://.../60x60bb.jpg",
"artworkUrl100": "https://.../100x100bb.jpg",
"previewUrl": "https://.../preview.m4a",
"trackViewUrl": "https://music.apple.com/us/album/creep/1109731533?i=1109731797"
}
```
**Album Object:**
```json
{
"collectionId": 1109731533,
"collectionName": "Pablo Honey",
"artistName": "Radiohead",
"artistId": 657515,
"trackCount": 12,
"releaseDate": "1993-02-22T08:00:00Z",
"primaryGenreName": "Alternative",
"copyright": "℗ 1993 XL Recordings Ltd.",
"country": "USA",
"artworkUrl100": "https://.../100x100bb.jpg",
"collectionViewUrl": "https://music.apple.com/us/album/pablo-honey/1109731533"
}
```
### Rate Limiting
**Limit:** Approximately 20 requests per minute (undocumented)
**Enforcement:** HTTP 403 (Forbidden) when limit exceeded
**Headers:** No rate limit headers provided
**Recommendation:** Implement exponential backoff on 403 errors.
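One way to read this recommendation is as a capped doubling schedule of delays between retries. The sketch below only computes the schedule; `backoff_delays` and its defaults are illustrative assumptions, not values documented by Apple or used by minim.

```python
import random


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0,
                   jitter: float = 0.0, rng=random.random) -> list:
    """Return the delay (seconds) to sleep before each retry.

    Delays double from `base` and never exceed `cap` before jitter.
    `jitter` adds up to that fraction of each delay as random noise so
    that concurrent clients do not retry in lockstep.
    """
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay + jitter * rng() * delay)
    return delays
```

A caller would sleep for each delay in turn after a 403 response and give up once the list is exhausted.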
### Use Cases
1. **Quick Metadata Lookup:** Get basic track/album info without authentication
2. **UPC Lookup:** Find albums by barcode
3. **Preview URLs:** Get 30-second preview clips (M4A format)
4. **Artwork:** Get album artwork in multiple sizes (30x30, 60x60, 100x100)
### Limitations
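- No high-resolution artwork in API responses (largest returned size is 100x100 pixels; larger images such as 600x600 can be requested by editing the size in the artwork URL, which is undocumented behavior)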
- No streaming URLs (only 30-second previews)
- No lyrics
- No user-specific data
- Rate limit is undocumented and may change
- Search results limited to 200 items
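The artwork workaround mentioned in the list above amounts to rewriting the size segment at the end of an `artworkUrl100` value. A minimal sketch, assuming the URL ends in a `<width>x<height>bb.jpg` segment as in the sample objects; `resize_artwork_url` is a hypothetical helper, and since the behavior is undocumented it may stop working.

```python
import re

# Matches a trailing size segment such as "/100x100bb.jpg".
_SIZE_SEGMENT = re.compile(r"/(\d+)x(\d+)(bb)?\.(jpg|png)$")


def resize_artwork_url(url: str, size: int = 600) -> str:
    """Swap the size segment of an iTunes artwork URL (hypothetical helper)."""
    new_url, count = _SIZE_SEGMENT.subn(
        lambda m: f"/{size}x{size}{m.group(3) or ''}.{m.group(4)}", url
    )
    if count == 0:
        raise ValueError("URL does not end in a recognizable size segment")
    return new_url
```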
## Qobuz Integration
### Service Overview
**Purpose:** High-resolution music streaming and downloads
**Website:** https://www.qobuz.com
**API Documentation:** None (private API)
**API Type:** Private, undocumented, reverse-engineered
### Authentication
**Method:** Password Grant OAuth 2.0
**Setup:**
1. Qobuz account with active subscription
2. App ID and secret auto-extracted from web player JavaScript
3. Email and password for authentication
**Implementation:**
```python
from minim import qobuz
api = qobuz.PrivateAPI(
    email="user@example.com",
    password="YourPassword"
)
# Automatic app_id/secret extraction and token acquisition
api.set_access_token()
```
**App ID/Secret Extraction:**
```python
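import re
import requests

def _get_app_credentials(self):
    # Fetch the Qobuz web player page
    html = requests.get("https://play.qobuz.com").text
    # Extract the bundle URL from the HTML
    bundle_url_match = re.search(r'<script src="(/resources/\d+\.\d+\.\d+-[a-z]\d+/bundle\.js)"', html)
    if bundle_url_match is None:
        raise RuntimeError("Qobuz bundle URL not found; the page layout may have changed")
    bundle_url = "https://play.qobuz.com" + bundle_url_match.group(1)
    # Fetch the bundle JavaScript
    bundle_js = requests.get(bundle_url).text
    # Extract app_id and every 32-character hex string as a candidate secret
    app_id_match = re.search(r'production:{api:{appId:"(\d+)"', bundle_js)
    if app_id_match is None:
        raise RuntimeError("Qobuz app_id not found in bundle")
    app_id = app_id_match.group(1)
    secrets = re.findall(r'[a-f0-9]{32}', bundle_js)
    # Test candidates until one is accepted
    for secret in secrets:
        if self._test_secret(app_id, secret):
            return app_id, secret
    # Fail loudly instead of implicitly returning None
    raise RuntimeError("No valid Qobuz app secret found")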
**Security Note:** This method violates Qobuz terms of service. Use at your own risk.
### API Coverage
**Catalog:**
- Search tracks, albums, artists, playlists
- Get detailed information
- Browse by genre, new releases, charts
**Streaming:**
- Get streaming URLs with quality selection
- Quality levels: MP3 320kbps, FLAC 16/44.1, FLAC 24/96, FLAC Hi-Res (up to 24/192)
- Download tracks (within subscription terms)
**User Library:**
- Get user playlists
- Create, update, delete playlists
- Add/remove tracks from playlists
- Get favorites (tracks, albums, artists)
- Add/remove favorites
**Not Available:**
- Lyrics
- Credits (producer, engineer, etc.)
- User playback history
### Data Model
**Track Object:**
```json
{
"id": 12345678,
"title": "Creep",
"duration": 238,
"track_number": 2,
"media_number": 1,
"isrc": "GBAYE9200070",
"performer": {"name": "Radiohead", "id": 12345},
"album": {
"id": "0060254734729",
"title": "Pablo Honey",
"release_date_original": "1993-02-22",
"upc": "0060254734729",
"image": {
"small": "https://.../230x230.jpg",
"large": "https://.../600x600.jpg"
},
"label": {"name": "XL Recordings"}
},
"maximum_bit_depth": 16,
"maximum_sampling_rate": 44.1
}
```
**Streaming URL Response:**
```json
{
"url": "https://streaming.qobuz.com/...",
"format_id": 6,
"mime_type": "audio/flac",
"sampling_rate": 44.1,
"bit_depth": 16,
"restrictions": []
}
```
### Quality Levels
| Format ID | Quality | Codec | Bitrate/Depth | Subscription Tier |
|-----------|---------|-------|---------------|-------------------|
| 5 | MP3 | MP3 | 320 kbps | Studio |
| 6 | CD | FLAC | 16-bit/44.1kHz | Studio |
| 7 | Hi-Res | FLAC | 24-bit/96kHz | Studio Sublime |
| 27 | Hi-Res | FLAC | Up to 24-bit/192kHz | Studio Sublime |
**Availability:** Quality depends on:
1. User's subscription tier
2. Album's available formats
3. Geographic restrictions
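As a rough illustration of how the table and the first two availability factors combine, the sketch below picks a format ID from a track's `maximum_bit_depth` and `maximum_sampling_rate` (in kHz, as in the track object above). `pick_format_id` and its boolean flags are hypothetical; geographic restrictions are not modeled.

```python
def pick_format_id(max_bit_depth: int, max_sampling_rate: float,
                   hires_allowed: bool, lossless_allowed: bool = True) -> int:
    """Choose the best Qobuz format ID a user could request (sketch)."""
    if not lossless_allowed:
        return 5  # MP3 320 kbps
    if hires_allowed and max_bit_depth > 16:
        # 27 covers material above 96 kHz, 7 covers 24-bit up to 96 kHz
        return 27 if max_sampling_rate > 96 else 7
    return 6  # CD quality FLAC, 16-bit/44.1 kHz
```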
### Rate Limiting
**Unknown.** Private API does not document rate limits.
**Observation:** Aggressive usage (>100 requests/minute) may trigger temporary blocks.
**Recommendation:** Implement conservative rate limiting (10-20 requests/minute).
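A conservative limit of this kind can be enforced on the caller's side with a minimum interval between requests. The class below is a minimal sketch under that assumption; `MinIntervalThrottle` is a hypothetical name, and the clock and sleep functions are injectable only to make the behavior easy to test.

```python
import time


class MinIntervalThrottle:
    """Enforce a minimum spacing between calls (hypothetical helper)."""

    def __init__(self, per_minute: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request made yet

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

Calling `throttle.wait()` immediately before each API call keeps a single-threaded client at or below the chosen rate.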
### Use Cases
1. **High-Resolution Downloads:** Get FLAC files up to 24-bit/192kHz
2. **Metadata Enrichment:** Get detailed album info (label, UPC, release date)
3. **Playlist Management:** Sync playlists between services
4. **Favorites Sync:** Export/import favorite tracks
### Limitations
- Private API (may break without notice)
- Requires active subscription
- No lyrics or credits
- Geographic restrictions on content
- Terms of service violations (use at own risk)
## Spotify Integration
### Service Overview
**Purpose:** Music streaming, discovery, and social features
**Website:** https://www.spotify.com
**API Documentation:** https://developer.spotify.com/documentation/web-api
**API Type:** Public (Web API) + Private (Lyrics)
### Authentication
**Method:** OAuth 2.0 with four flow types
**Flow 1: Authorization Code**
- Full user access with refresh token
- Requires user login via browser
- Best for web applications
**Flow 2: PKCE (Proof Key for Code Exchange)**
- For mobile and desktop apps
- No client secret required
- Enhanced security for public clients
**Flow 3: Client Credentials**
- App-only access (no user context)
- No user login required
- Limited to catalog endpoints (no user library, playlists, playback)
**Flow 4: Web Player (Undocumented)**
- Extract `sp_dc` cookie from browser
- Access private endpoints (lyrics)
- Violates terms of service
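To make Flow 2 more concrete: PKCE replaces the client secret with a one-time code verifier and a derived challenge. The sketch below shows the standard S256 derivation from RFC 7636 using only the standard library; the function names are hypothetical and this is not necessarily how minim generates its values.

```python
import base64
import hashlib
import secrets


def challenge_for(verifier: str) -> str:
    """Derive the S256 code challenge: base64url(SHA-256(verifier)), unpadded."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair(n_bytes: int = 32):
    """Return (code_verifier, code_challenge) for one authorization attempt."""
    raw = secrets.token_bytes(n_bytes)
    verifier = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return verifier, challenge_for(verifier)
```

The challenge is sent with the authorization request and the verifier with the token request, so an intercepted authorization code is useless without the verifier.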
**Implementation:**
```python
from minim import spotify
# Authorization Code flow
api = spotify.WebAPI(
    client_id="your_client_id",
    client_secret="your_client_secret",
    redirect_uri="http://localhost:8888"
)
api.set_flow("authorization_code", scopes=[
    "user-library-read",
    "playlist-read-private",
    "user-read-playback-state"
])
api.set_access_token() # Opens browser
# Client Credentials flow (no user login)
api = spotify.WebAPI(client_id="...", client_secret="...")
api.set_flow("client_credentials")
api.set_access_token()
```
### API Coverage
**Catalog:**
- Search tracks, albums, artists, playlists, shows, episodes
- Get detailed information
- Get related artists
- Get artist top tracks
- Get album tracks
- Get audio features (danceability, energy, tempo, etc.)
- Get audio analysis (detailed beat/bar/section analysis)
**User Library:**
- Get saved tracks, albums, shows, episodes
- Save/remove items
- Check if items are saved
**Playlists:**
- Get user playlists
- Get playlist details and tracks
- Create, update, delete playlists
- Add/remove tracks
- Reorder tracks
- Upload custom cover image
**Playback:**
- Get current playback state
- Get available devices
- Start/pause/skip playback
- Seek to position
- Set volume
- Toggle shuffle/repeat
- Transfer playback between devices
**Personalization:**
- Get top artists and tracks
- Get recently played tracks
- Get recommendations based on seeds
**Follow:**
- Follow/unfollow artists, users, playlists
- Check if following
- Get followed artists
**Browse:**
- Get featured playlists
- Get new releases
- Get categories
- Get category playlists
**Lyrics (Private API):**
- Get synchronized lyrics via Musixmatch integration
- Requires `sp_dc` cookie
### Data Model
**Track Object:**
```json
{
"id": "3n3Ppam7vgaVa1iaRUc9Lp",
"name": "Creep",
"artists": [
{"name": "Radiohead", "id": "4Z8W4fKeB5YxbusRsdQVPb"}
],
"album": {
"name": "Pablo Honey",
"id": "6AZv3m27uyRxi8KyJSfUxL",
"release_date": "1993-02-22",
"images": [
{"url": "https://.../640x640.jpg", "width": 640, "height": 640}
]
},
"duration_ms": 238640,
"track_number": 2,
"disc_number": 1,
"explicit": false,
"external_ids": {"isrc": "GBAYE9200070"},
"popularity": 82,
"preview_url": "https://.../preview.mp3"
}
```
**Audio Features Object:**
```json
{
"id": "3n3Ppam7vgaVa1iaRUc9Lp",
"danceability": 0.456,
"energy": 0.789,
"key": 7,
"loudness": -6.234,
"mode": 1,
"speechiness": 0.034,
"acousticness": 0.123,
"instrumentalness": 0.000012,
"liveness": 0.089,
"valence": 0.234,
"tempo": 92.456,
"duration_ms": 238640,
"time_signature": 4
}
```
**Lyrics Object (Private API):**
```json
{
"lyrics": {
"syncType": "LINE_SYNCED",
"lines": [
{"startTimeMs": "0", "words": "When you were here before", "syllables": []},
{"startTimeMs": "5230", "words": "Couldn't look you in the eye", "syllables": []}
],
"language": "en"
}
}
```
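A response shaped like the object above can be turned into LRC-style text with a few lines of code. This is a minimal sketch assuming exactly the fields shown (`lyrics.lines[].startTimeMs` as a string and `words`); `lines_to_lrc` is a hypothetical helper, not a minim function.

```python
def lines_to_lrc(lyrics_response: dict) -> str:
    """Render line-synced lyrics as "[mm:ss.xx] text" lines."""
    rendered = []
    for line in lyrics_response["lyrics"]["lines"]:
        total_ms = int(line["startTimeMs"])  # the API returns a string
        minutes, remainder = divmod(total_ms, 60_000)
        seconds, millis = divmod(remainder, 1000)
        stamp = f"[{minutes:02d}:{seconds:02d}.{millis // 10:02d}]"
        rendered.append(f"{stamp} {line['words']}")
    return "\n".join(rendered)
```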
### Scopes
Spotify uses OAuth scopes to control API access. Common scopes:
**Library:**
- `user-library-read`: Read saved tracks/albums
- `user-library-modify`: Save/remove tracks/albums
**Playlists:**
- `playlist-read-private`: Read private playlists
- `playlist-read-collaborative`: Read collaborative playlists
- `playlist-modify-public`: Modify public playlists
- `playlist-modify-private`: Modify private playlists
**Playback:**
- `user-read-playback-state`: Read playback state
- `user-modify-playback-state`: Control playback
- `user-read-currently-playing`: Read currently playing track
**Personalization:**
- `user-top-read`: Read top artists and tracks
- `user-read-recently-played`: Read recently played tracks
**Follow:**
- `user-follow-read`: Read followed artists/users
- `user-follow-modify`: Follow/unfollow artists/users
**User:**
- `user-read-private`: Read user profile (country, product)
- `user-read-email`: Read user email
### Rate Limiting
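**Limit:** Varies by endpoint and is evaluated over a rolling 30-second window; roughly 180 requests per 30 seconds is typical, but Spotify does not publish an exact figure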
**Enforcement:** HTTP 429 (Too Many Requests)
**Headers:**
- `Retry-After`: Seconds to wait before retrying
**Recommendation:** Implement exponential backoff and respect `Retry-After` header.
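Respecting `Retry-After` mostly comes down to parsing the header defensively. The sketch below assumes the header carries a number of seconds, as described above; `retry_after_seconds` and its fallback default are hypothetical.

```python
from typing import Mapping, Optional


def retry_after_seconds(status_code: int, headers: Mapping[str, str],
                        default: float = 1.0) -> Optional[float]:
    """Return how long to wait after a 429, or None for other statuses."""
    if status_code != 429:
        return None
    raw = headers.get("Retry-After", "")
    try:
        return max(0.0, float(raw))
    except ValueError:
        # Header missing or not a plain number: fall back to a default pause
        return default
```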
### Use Cases
1. **Comprehensive Metadata:** Get detailed track info, audio features, related artists
2. **Playlist Management:** Create, sync, and manage playlists
3. **Playback Control:** Build custom music players
4. **Music Discovery:** Get recommendations, browse new releases
5. **Lyrics Integration:** Display synchronized lyrics (via private API)
### Limitations
- No streaming URLs (only 30-second previews)
- No download capability
- Lyrics require private API (terms of service violation)
- Rate limiting varies by endpoint
- Some features require premium subscription
## TIDAL Integration
### Service Overview
**Purpose:** High-fidelity music streaming with MQA support
**Website:** https://www.tidal.com
**API Documentation:** Limited (mostly undocumented)
**API Type:** Public (limited) + Private (extensive)
### Authentication
**Method:** OAuth 2.0 (PKCE or Client Credentials)
**Public API:**
- Client credentials flow
- Limited endpoints (catalog search, basic info)
**Private API:**
- PKCE flow with user login
- Full access (streaming URLs, lyrics, credits)
**Implementation:**
```python
from minim import tidal
# Private API (full access)
api = tidal.PrivateAPI(
    client_id="your_client_id",
    client_secret="your_client_secret"
)
api.set_flow("pkce")
api.set_access_token() # Opens browser for login
```
**Client ID/Secret:**
- Extracted from TIDAL desktop/mobile apps
- Not officially provided by TIDAL
- Use at own risk (terms of service violation)
### API Coverage
**Catalog:**
- Search tracks, albums, artists, playlists, videos
- Get detailed information
- Get artist top tracks and albums
- Get similar artists
- Get album review and credits
**Streaming:**
- Get streaming URLs with quality selection
- Quality levels: LOW (96kbps AAC), HIGH (320kbps AAC), LOSSLESS (FLAC 16/44.1), HI_RES (MQA), HI_RES_LOSSLESS (FLAC up to 24/192)
- Manifest decryption for protected streams
**Lyrics:**
- Get synchronized lyrics (LRC format)
- Get plain text lyrics
**Credits:**
- Get detailed credits (producers, engineers, musicians, composers, etc.)
- Role-based organization
**User Library:**
- Get user playlists
- Create, update, delete playlists
- Add/remove tracks
- Get favorites (tracks, albums, artists, videos)
- Add/remove favorites
**Not Available:**
- Playback control (no remote control API)
- Audio features/analysis
- Recommendations (limited)
### Data Model
**Track Object:**
```json
{
"id": 12345678,
"title": "Creep",
"duration": 238,
"trackNumber": 2,
"volumeNumber": 1,
"isrc": "GBAYE9200070",
"explicit": false,
"audioQuality": "HI_RES",
"artists": [
{"name": "Radiohead", "id": 4050}
],
"album": {
"id": 1234567,
"title": "Pablo Honey",
"releaseDate": "1993-02-22",
"cover": "01234567-89ab-cdef-0123-456789abcdef",
"upc": "0060254734729",
"numberOfTracks": 12,
"audioQuality": "HI_RES"
},
"streamStartDate": "1993-02-22T00:00:00.000Z"
}
```
**Streaming Manifest:**
```json
{
"mimeType": "audio/flac",
"codecs": "flac",
"encryptionType": "NONE",
"urls": ["https://streaming.tidal.com/..."],
"soundQuality": "HI_RES",
"bitDepth": 24,
"sampleRate": 96000
}
```
**Lyrics Object:**
```json
{
"trackId": 12345678,
"lyricsProvider": "Musixmatch",
"providerLyricsId": "12345",
"lyrics": "When you were here before\nCouldn't look you in the eye...",
"subtitles": "[00:00.00] When you were here before\n[00:05.23] Couldn't look you in the eye..."
}
```
**Credits Object:**
```json
{
"credits": [
{
"type": "Producers",
"contributors": [
{"name": "Sean Slade", "id": 12345},
{"name": "Paul Q. Kolderie", "id": 67890}
]
},
{
"type": "Performers",
"contributors": [
{"name": "Thom Yorke", "id": 111, "role": "Vocals"},
{"name": "Jonny Greenwood", "id": 222, "role": "Guitar"}
]
}
]
}
```
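For tagging purposes, a credits response like the one above is often easier to use once flattened into one list of names per credit type. A minimal sketch assuming the shape shown (`credits[].type`, `contributors[].name`, optional `role`); `credits_by_type` is a hypothetical helper.

```python
def credits_by_type(credits_response: dict) -> dict:
    """Map each credit type to a list of contributor names.

    A contributor's role, when present, is appended in parentheses.
    """
    result = {}
    for group in credits_response.get("credits", []):
        names = []
        for person in group.get("contributors", []):
            role = person.get("role")
            names.append(f"{person['name']} ({role})" if role else person["name"])
        result[group["type"]] = names
    return result
```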
### Quality Levels
| Quality | Codec | Bitrate/Depth | Subscription Tier |
|---------|-------|---------------|-------------------|
| LOW | AAC | 96 kbps | Free (trial) |
| HIGH | AAC | 320 kbps | HiFi |
| LOSSLESS | FLAC | 16-bit/44.1kHz | HiFi |
| HI_RES | MQA (FLAC container) | 24-bit/96kHz+ (MQA) | HiFi Plus |
| HI_RES_LOSSLESS | FLAC | Up to 24-bit/192kHz | HiFi Plus |
**MQA (Master Quality Authenticated):**
- Proprietary format originally developed by Meridian Audio
- Requires MQA-compatible DAC for full unfolding
- Software decoding performs only the first unfold (up to 24-bit/96kHz)
### Manifest Decryption
Some streaming URLs are encrypted. minim handles decryption:
```python
def _decrypt_manifest(self, manifest: dict) -> str:
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
    from cryptography.hazmat.backends import default_backend
    import base64
    # Extract encrypted URL
    encrypted_url = manifest["urls"][0]
    # Decrypt using AES-128-CTR
    key = base64.b64decode(manifest["encryptionKey"])
    nonce = base64.b64decode(manifest["nonce"])
    cipher = Cipher(
        algorithms.AES(key),
        modes.CTR(nonce),
        backend=default_backend()
    )
    decryptor = cipher.decryptor()
    decrypted = decryptor.update(base64.b64decode(encrypted_url)) + decryptor.finalize()
    return decrypted.decode("utf-8")
```
### Rate Limiting
**Unknown.** Private API does not document rate limits.
**Observation:** Moderate usage (<50 requests/minute) appears safe.
**Recommendation:** Implement conservative rate limiting and exponential backoff on errors.
### Use Cases
1. **High-Fidelity Streaming:** Get FLAC files up to 24-bit/192kHz and MQA
2. **Comprehensive Credits:** Get detailed production credits
3. **Synchronized Lyrics:** Display time-synced lyrics
4. **Metadata Enrichment:** Get authoritative release dates, ISRCs, UPCs
5. **Playlist Management:** Sync playlists between services
### Limitations
- Private API (may break without notice)
- Requires active subscription (HiFi or HiFi Plus for lossless)
- Client ID/secret extraction violates terms of service
- No playback control API
- No audio features/analysis
- Geographic restrictions on content
## Integration Comparison
### Authentication Complexity
**Simplest:** iTunes (no auth)
**Simple:** Discogs (personal token), Spotify (client credentials)
**Moderate:** Spotify (authorization code), TIDAL (PKCE)
**Complex:** Qobuz (app_id extraction + password grant)
### API Documentation Quality
**Best:** Spotify (comprehensive, well-maintained)
**Good:** Discogs, iTunes
**Poor:** TIDAL (limited public docs)
**None:** Qobuz (fully reverse-engineered)
### Streaming Capability
**High-Resolution:** Qobuz (up to 24/192 FLAC), TIDAL (up to 24/192 FLAC + MQA)
**Lossless:** TIDAL, Qobuz (16/44.1 FLAC)
**Lossy:** TIDAL (AAC), Qobuz (MP3)
**Preview Only:** Spotify (30sec MP3), iTunes (30sec M4A)
**None:** Discogs
### Metadata Richness
**Credits:** TIDAL (excellent), Discogs (good), others (none)
**Lyrics:** TIDAL (synced), Spotify (synced, private API), others (none)
**Audio Features:** Spotify (excellent), others (none)
**Catalog Info:** All services (good)
### Terms of Service Compliance
**Compliant:** Discogs (public API), iTunes (public API), Spotify (public Web API)
**Questionable:** Spotify (private lyrics API)
**Violation:** Qobuz (app_id extraction, private API), TIDAL (client_id extraction, private API)
## Summary
minim provides comprehensive integration with five music services, each serving different use cases:
- **Discogs:** Best for credits, catalog numbers, and collection management
- **iTunes:** Best for quick, unauthenticated metadata lookup
- **Qobuz:** Best for high-resolution downloads (within subscription terms)
- **Spotify:** Best for comprehensive metadata, audio features, and playlist management
- **TIDAL:** Best for high-fidelity streaming, credits, and synchronized lyrics
All integrations follow consistent patterns (authentication, request handling, error raising) while exposing service-specific features. The private API usage (Qobuz, Spotify lyrics, TIDAL) provides powerful capabilities but carries legal and stability risks.
For a metadata aggregator project, prioritize public APIs (Spotify Web API, Discogs, iTunes) for production use, and use private APIs only for research or personal projects.
@@ -0,0 +1,312 @@
# minim: Overview
## Project Identity
**Name:** minim
**Version:** 1.1.0
**License:** GPL-3.0
**Language:** Python 3.9+
**Type:** Library (not a server or standalone application)
**Repository:** https://github.com/bbye98/minim
**Author:** Benjamin Ye
**Documentation:** https://minim.readthedocs.io
minim is a Python library for interacting with music streaming service APIs and managing audio file metadata. It provides unified interfaces to five major music platforms and tools for reading, writing, and converting audio metadata across multiple formats.
## Architecture Type
minim is a **library**, not a server or service. Users import modules directly into Python code:
```python
from minim import spotify, tidal, qobuz
from minim.audio import Audio
```
There is no HTTP API, no daemon process, no deployment infrastructure. The library runs in the caller's process space.
## Core Dependencies
**Required:**
- `cryptography`: TIDAL manifest decryption, secure token handling
- `mutagen`: Audio file metadata reading/writing (ID3v2, Vorbis Comments, MP4 atoms)
- `requests`: HTTP client for all API calls
**Optional:**
- `ffmpeg`: Audio format conversion between codecs
- `flask`: OAuth callback server (alternative to http.server)
- `levenshtein`: Fuzzy string matching for search results
- `numpy`: Audio analysis features
- `pillow`: Image processing for album artwork
- `playwright`: Browser automation for OAuth flows
The core library works with just the three required dependencies. Optional dependencies enable specific features but aren't mandatory for basic API access.
## Codebase Structure
**Total Lines:** 35,916 across 8 modules
### Module Breakdown
| Module | Lines | Purpose |
|--------|-------|---------|
| `audio.py` | 1,860 | Audio file handling, metadata reading/writing, format conversion |
| `discogs.py` | 5,501 | Discogs API client (database, marketplace, collection, wantlist) |
| `itunes.py` | 575 | iTunes Search API client (search, lookup) |
| `qobuz.py` | 5,579 | Qobuz API client (streaming, catalog, playlists, favorites) |
| `spotify.py` | 9,862 | Spotify Web API + private lyrics service |
| `tidal.py` | 12,338 | TIDAL public + private API (streaming, lyrics, credits) |
| `utility.py` | 136 | Shared utilities (config parsing, string formatting) |
| `__init__.py` | 65 | Package initialization, version info |
**Observation:** `tidal.py` is disproportionately large at 12,338 lines (34% of total codebase). This suggests either comprehensive API coverage or a need for refactoring into submodules.
## API Client Coverage
minim provides clients for five music services:
### 1. Discogs
- **Auth:** OAuth 1.0a (consumer key/secret + access token) or personal access token
- **Scope:** Music database (artists, releases, labels), marketplace, user collection/wantlist
- **Rate Limits:** 60 requests/minute (authenticated), 25/minute (unauthenticated)
### 2. iTunes Search API
- **Auth:** None required
- **Scope:** Public search and lookup across iTunes catalog
- **Rate Limits:** 20 requests/minute (approximate)
### 3. Qobuz
- **Auth:** Password grant OAuth (email/password)
- **Scope:** Catalog search, streaming URLs, playlists, favorites
- **Special:** Auto-extracts `app_id` and `app_secret` from web player JavaScript
### 4. Spotify
- **Auth:** Four OAuth 2.0 flows
- Authorization Code (full user access)
- PKCE (mobile/desktop apps)
- Client Credentials (app-only, no user context)
- Web Player (via `sp_dc` cookie, undocumented)
- **Scope:** Full Web API coverage (30+ permission scopes), private lyrics via Musixmatch integration
- **Special:** Most comprehensive API client in the library
### 5. TIDAL
- **Auth:** Public API (client credentials or PKCE) + Private API (additional endpoints)
- **Scope:** Catalog, streaming URLs with quality selection, lyrics, credits
- **Quality Levels:** LOW, HIGH, LOSSLESS, HI_RES, HI_RES_LOSSLESS
- **Special:** Manifest decryption for streaming URLs using `cryptography`
## Audio Format Support
minim handles five audio container formats through the `Audio` class:
- **FLAC:** Free Lossless Audio Codec, Vorbis Comments metadata
- **MP3:** MPEG-1 Audio Layer III, ID3v2 tags
- **MP4/M4A:** MPEG-4 Part 14, MP4 atom metadata
- **Ogg Vorbis:** Ogg container, Vorbis Comments metadata
- **WAVE:** Waveform Audio File Format, ID3v2 tags (non-standard but supported)
**Format Conversion:** FFmpeg integration allows transcoding between formats while preserving metadata.
**Auto-Detection:** The `Audio` class automatically detects format from file extension and magic bytes, instantiating the appropriate subclass (`FLAC`, `MP3`, `MP4`, `OggVorbis`, `WAVE`).
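The magic-byte half of this detection can be sketched with well-known file signatures. This is an illustration of the general technique, not minim's actual detection code: `sniff_audio_format` is hypothetical, and it returns the subclass names listed above purely as labels.

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess the container from the first bytes of a file (sketch).

    Caveats: an "OggS" stream may carry codecs other than Vorbis, and
    files with a leading ID3 tag are reported as MP3.
    """
    if header.startswith(b"fLaC"):
        return "FLAC"
    if header.startswith(b"OggS"):
        return "OggVorbis"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAVE"
    if header[4:8] == b"ftyp":
        return "MP4"
    is_frame_sync = (len(header) >= 2 and header[0] == 0xFF
                     and (header[1] & 0xE0) == 0xE0)
    if header.startswith(b"ID3") or is_frame_sync:
        return "MP3"
    return None
```

In practice a caller would read the first 12 bytes of the file and fall back to the file extension when the function returns `None`.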
## Authentication Pattern
All API clients follow a consistent authentication flow:
1. **Initialization:** `__init__()` checks for existing tokens in `~/minim.cfg`
2. **Flow Selection:** `set_flow()` configures OAuth flow type (auth code, PKCE, client credentials, etc.)
3. **Token Acquisition:** `set_access_token()` performs OAuth handshake or uses provided credentials
4. **Persistence:** Tokens saved to `~/minim.cfg` via `ConfigParser`
5. **Auto-Refresh:** Expired tokens automatically refreshed on next API call
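The persistence and expiry steps above can be sketched with `ConfigParser` alone. This is a simplified illustration rather than minim's implementation: the function names and the `expiry` key are hypothetical, and the refresh request itself is left to the caller.

```python
import configparser
import time
from pathlib import Path


def save_token(path: Path, service: str, access_token: str,
               expires_in: float, now=None) -> None:
    """Store a token and its absolute expiry time in an INI file."""
    config = configparser.ConfigParser()
    config.read(path)  # a missing file is silently treated as empty
    if not config.has_section(service):
        config.add_section(service)
    issued = time.time() if now is None else now
    config[service]["access_token"] = access_token
    config[service]["expiry"] = str(issued + expires_in)  # hypothetical key
    with open(path, "w") as handle:
        config.write(handle)


def load_valid_token(path: Path, service: str, now=None):
    """Return the cached token, or None if absent or expired."""
    config = configparser.ConfigParser()
    config.read(path)
    if not config.has_section(service):
        return None
    current = time.time() if now is None else now
    section = config[service]
    if float(section.get("expiry", "0")) <= current:
        return None  # expired: the caller should refresh and save again
    return section.get("access_token")
```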
**Token Storage Location:** `~/minim.cfg` (plain text ConfigParser file)
**Security Note:** Tokens stored in plain text. No encryption, no OS keychain integration. Suitable for personal use, not production systems handling user credentials.
## Request Handling Pattern
All API modules use a common `_request()` method:
```python
def _request(self, method: str, url: str, **kwargs) -> dict:
    response = requests.request(method, url, headers=self._get_headers(), **kwargs)
    if not response.ok:
        raise RuntimeError(f"API error: {response.status_code} {response.text}")
    return response.json()
```
**Error Handling:** All API errors raise `RuntimeError` with status code and response text. No typed exceptions, no retry logic, no rate limit handling.
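Since only a generic `RuntimeError` is raised, a caller who needs to distinguish failures has to recover the status code from the message. The sketch below assumes the message format shown in the snippet above (`"API error: <status> <body>"`); the exception classes and `classify` are hypothetical and not part of minim.

```python
import re


class APIError(RuntimeError):
    """Base class carrying the HTTP status code and response body."""

    def __init__(self, status_code: int, body: str):
        super().__init__(f"API error: {status_code} {body}")
        self.status_code = status_code
        self.body = body


class Unauthorized(APIError):
    pass


class RateLimited(APIError):
    pass


_MESSAGE = re.compile(r"^API error: (\d{3}) (.*)$", re.DOTALL)


def classify(error: RuntimeError) -> APIError:
    """Convert a generic RuntimeError into a typed exception instance."""
    match = _MESSAGE.match(str(error))
    if match is None:
        raise ValueError("not an API error message") from error
    status, body = int(match.group(1)), match.group(2)
    cls = {401: Unauthorized, 429: RateLimited}.get(status, APIError)
    return cls(status, body)
```

A caller could wrap API calls in `try`/`except RuntimeError as exc` and then `raise classify(exc)`, so that rate-limit errors can be retried while other failures propagate.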
**Headers:** Each service implements `_get_headers()` to inject authentication (Bearer tokens, OAuth signatures, API keys).
## Metadata Mapping
The `Audio` class provides service-specific metadata setters:
- `set_metadata_using_itunes(data: dict)`
- `set_metadata_using_qobuz(data: dict)`
- `set_metadata_using_spotify(data: dict)`
- `set_metadata_using_tidal(data: dict)`
Each method maps service-specific JSON responses to standardized audio file tags:
- `album`: Album title
- `artist`: Primary artist(s)
- `title`: Track title
- `isrc`: International Standard Recording Code
- `artwork`: Album cover image (bytes)
- `date`: Release date
- `genre`: Genre classification
- `track_number`: Position in album
- `disc_number`: Disc number (for multi-disc albums)
**Normalization:** Handles differences in field names, date formats, artist arrays vs. strings, and artwork URL fetching.
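To illustrate what this normalization involves, the sketch below maps two of the service payloads from the integrations document onto one tag dictionary. The field names come from those sample objects; the functions themselves are hypothetical and much simpler than the `set_metadata_using_*` methods (no artwork fetching, no genre handling).

```python
def normalize_itunes(track: dict) -> dict:
    """Map an iTunes Search API track object to standardized tags."""
    return {
        "title": track["trackName"],
        "artist": track["artistName"],
        "album": track["collectionName"],
        "date": track["releaseDate"][:10],  # keep YYYY-MM-DD, drop the time
        "isrc": track.get("isrc"),
        "track_number": track["trackNumber"],
        "disc_number": track["discNumber"],
    }


def normalize_spotify(track: dict) -> dict:
    """Map a Spotify track object to the same standardized tags."""
    return {
        "title": track["name"],
        "artist": ", ".join(artist["name"] for artist in track["artists"]),
        "album": track["album"]["name"],
        "date": track["album"]["release_date"],
        "isrc": track.get("external_ids", {}).get("isrc"),
        "track_number": track["track_number"],
        "disc_number": track["disc_number"],
    }
```

With the sample "Creep" objects, both functions yield the same dictionary, which is the point of normalizing before writing tags.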
## Configuration Management
**File:** `~/minim.cfg`
**Format:** INI-style via Python's `ConfigParser`
**Structure:**
```ini
[discogs]
access_token = ...
access_token_secret = ...
[qobuz]
email = user@example.com
password = ...
access_token = ...
[spotify]
client_id = ...
client_secret = ...
access_token = ...
refresh_token = ...
[tidal]
client_id = ...
access_token = ...
refresh_token = ...
```
**Environment Variables:** Each service also checks for credentials in environment variables (e.g., `SPOTIFY_CLIENT_ID`, `TIDAL_CLIENT_SECRET`). Environment variables take precedence over config file.
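The precedence rule can be expressed in a few lines. A minimal sketch, assuming the variable naming pattern suggested by the examples above (`<SERVICE>_<KEY>` in upper case); `resolve_credential` is a hypothetical helper.

```python
import configparser
import os
from typing import Mapping, Optional


def resolve_credential(service: str, key: str,
                       config: configparser.ConfigParser,
                       env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Look up a credential, preferring the environment over the config file."""
    env_name = f"{service}_{key}".upper()  # e.g. SPOTIFY_CLIENT_ID
    value = env.get(env_name)
    if value:
        return value
    # Fall back to the INI file; None when section or option is missing
    return config.get(service, key, fallback=None)
```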
## OAuth Callback Methods
minim supports three methods for handling OAuth redirect URIs:
### 1. http.server (default)
- Spawns temporary HTTP server on localhost
- Listens for OAuth callback
- Extracts authorization code from query parameters
- Shuts down after receiving callback
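The four steps above can be reproduced with the standard library. The following is a minimal sketch of the technique, not minim's code: the handler and function names are hypothetical, and it handles exactly one request without validating the `state` parameter.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class _CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Pull ?code=... out of the redirect URL and stash it on the server
        query = parse_qs(urlparse(self.path).query)
        self.server.auth_code = query.get("code", [None])[0]
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"Authorization received. You can close this tab.")

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_callback_server(port: int = 0) -> HTTPServer:
    """Bind a localhost server; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), _CallbackHandler)
    server.auth_code = None
    return server


def wait_for_code(server: HTTPServer):
    """Serve a single request, shut down, and return the captured code."""
    try:
        server.handle_request()
    finally:
        server.server_close()
    return server.auth_code
```

In a real flow the port would be fixed to match the registered redirect URI (for example 8888), and the browser would be opened on the authorization URL before calling `wait_for_code`.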
### 2. Flask
- Uses Flask development server for callback handling
- Same flow as http.server but with Flask routing
- Requires `flask` optional dependency
### 3. Playwright
- Launches headless browser
- Automates login flow
- Intercepts redirect URL
- Extracts authorization code
- Requires `playwright` optional dependency
**Use Case:** Playwright is useful for services requiring CAPTCHA or complex login flows. Flask or http.server is sufficient for standard OAuth.
## Testing Infrastructure
**Framework:** pytest
**Coverage:** 6 test files, one per major module
**Test Style:** Class-based with `setup_class()` for authentication
**API Calls:** Tests make real API calls (not mocked)
**CI Environment:** GitHub Actions with Ubuntu, Python 3.9, FFmpeg installed
**Coverage Configuration:** `.coveragerc` excludes test files and `__init__.py` from coverage reports.
**Limitation:** Real API calls in CI require valid credentials. Tests may fail if rate limits exceeded or services change APIs.
## Development Status
**Current Version:** 1.1.0 (maintenance mode)
**Active Development:** v2 rewrite on `dev` branch
**Maintenance:** Bug fixes and security updates only for v1
**v2 Changes (from dev branch inspection):**
- Modular architecture (split large files like `tidal.py`)
- Typed exceptions instead of generic `RuntimeError`
- Rate limiting built-in
- Async support via `aiohttp`
- Secure token storage via OS keychain
- PyPI publication planned
## Distribution
**Current:** Install from source only
```bash
git clone https://github.com/bbye98/minim.git
cd minim
python -m pip install -e .
```
**Conda:** `environment.yml` provided for conda users
**PyPI:** Not yet published (planned for v2)
**Documentation:** Auto-deployed to ReadTheDocs on every push to main branch
## Use Cases
minim is designed for:
1. **Personal Music Library Management:** Fetch metadata from streaming services, write to local audio files
2. **Playlist Synchronization:** Export playlists from one service, import to another
3. **Audio File Tagging:** Bulk metadata updates using authoritative sources
4. **Music Discovery:** Search across multiple services, compare results
5. **Streaming URL Extraction:** Download tracks from Qobuz/TIDAL (within terms of service)
**Not Designed For:**
- Production web services (no rate limiting, plain text tokens)
- Real-time streaming (no playback engine)
- Large-scale automation (no async, no connection pooling)
## Key Strengths
1. **Unified Interface:** Consistent API across five different services
2. **Comprehensive Coverage:** Implements most endpoints for each service
3. **Automatic Token Management:** Caching, refresh, persistence handled transparently
4. **Audio Metadata Integration:** Direct mapping from API responses to file tags
5. **Multiple OAuth Flows:** Flexibility in authentication methods
6. **Pure Python:** Minimal dependencies, easy to install and modify
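The token management pattern (cache, check expiry, refresh, persist) can be sketched as follows. This is an illustration, not minim's code: it stores a JSON file rather than minim's own config format, the `refresh` callable and the record fields are assumptions, and the file is created with owner-only permissions as one small mitigation for plain text storage.

```python
import json
import os
import time
from pathlib import Path


def load_token(path):
    """Return the cached token record, or None if no cache file exists."""
    path = Path(path)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def save_token(path, record):
    """Write the record as JSON, creating the file with mode 0o600."""
    # The mode applies only when the file is created; Windows largely ignores it.
    fd = os.open(str(path), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(record, handle)


def get_access_token(path, refresh, now=None, leeway=60.0):
    """Return a valid access token, refreshing and persisting it if needed.

    `refresh` is a hypothetical caller-supplied callable. It receives the old
    record (or None) and returns a dict with "access_token", "expires_in",
    and optionally "refresh_token".
    """
    now = time.time() if now is None else now
    record = load_token(path)
    # Treat the token as expired slightly early to avoid mid-request expiry.
    if record is None or record["expires_at"] - leeway <= now:
        fresh = refresh(record)
        previous_refresh = (record or {}).get("refresh_token")
        record = {
            "access_token": fresh["access_token"],
            "refresh_token": fresh.get("refresh_token", previous_refresh),
            "expires_at": now + fresh["expires_in"],
        }
        save_token(path, record)
    return record["access_token"]
```

Storing an absolute `expires_at` instead of the relative `expires_in` means the cache stays meaningful across process restarts.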
## Key Limitations
1. **GPL-3.0 License:** Copyleft requires distributed derivative works to be licensed under GPL-3.0 as well
2. **Plain Text Token Storage:** Security risk for shared systems
3. **No Rate Limiting:** Caller responsible for respecting API limits
4. **Generic Error Handling:** All errors are `RuntimeError`, no typed exceptions
5. **Synchronous Only:** No async support in v1
6. **Private API Dependency:** Qobuz, Spotify lyrics, TIDAL private endpoints can break without notice
7. **Monolithic Files:** `tidal.py` at 12K lines is difficult to navigate and maintain
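Because rate limiting is left to the caller, a thin client-side throttle is one way to work around limitation 3. The sketch below enforces a minimum interval between calls. The `Throttle` class is hypothetical and not part of minim, and the appropriate rate for each service is an assumption the caller must supply.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls."""

    def __init__(self, calls_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / calls_per_second
        self._clock = clock    # Injectable for testing.
        self._sleep = sleep    # Injectable for testing.
        self._last_call = None

    def wait(self):
        """Sleep just long enough to honor the minimum interval."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now

    def call(self, func, *args, **kwargs):
        """Wait if necessary, then invoke func and return its result."""
        self.wait()
        return func(*args, **kwargs)
```

Usage would look like `throttle = Throttle(calls_per_second=2)` followed by `throttle.call(client.search, "Radiohead")`, where `client` is any API client instance. Since v1 is synchronous, the blocking sleep is consistent with the rest of the library.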
## Integration Potential
For a metadata aggregator project, minim provides:
- **Reference Implementations:** OAuth flows for each service
- **Token Management Pattern:** Config file caching, auto-refresh logic
- **Metadata Normalization:** Field mapping from service-specific to standardized schemas
- **Audio File Handling:** Reading/writing tags across formats
**Reusability:** Code can be extracted and adapted (respecting GPL-3.0). The authentication patterns, request handling, and metadata mapping are particularly valuable.
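The metadata normalization idea can be sketched as a set of per-service mapping functions that all produce one shared record type. This is an illustration written for this document, not code extracted from minim: `NormalizedTrack` and both mapper functions are hypothetical, and the input dictionaries only approximate the shape of each service's track objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NormalizedTrack:
    """Hypothetical service-independent track record."""
    title: str
    artists: List[str]
    album: Optional[str]
    isrc: Optional[str]
    source: str


def from_spotify(track: dict) -> NormalizedTrack:
    """Map a Spotify-style track object to the shared schema."""
    return NormalizedTrack(
        title=track["name"],
        artists=[artist["name"] for artist in track.get("artists", [])],
        album=track.get("album", {}).get("name"),
        isrc=track.get("external_ids", {}).get("isrc"),
        source="spotify",
    )


def from_tidal(track: dict) -> NormalizedTrack:
    """Map a TIDAL-style track object (assumed shape) to the shared schema."""
    return NormalizedTrack(
        title=track["title"],
        artists=[artist["name"] for artist in track.get("artists", [])],
        album=track.get("album", {}).get("title"),
        isrc=track.get("isrc"),
        source="tidal",
    )
```

Keeping one small mapper per provider isolates service-specific field names at the edge, so the aggregator's storage and matching logic only ever sees `NormalizedTrack`.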
**Caution:** Private API usage (Qobuz app_id extraction, Spotify lyrics, TIDAL private endpoints) may violate terms of service. Use only documented public APIs in production systems.
## Conclusion
minim is a mature, feature-rich library for music service API integration and audio metadata management. It excels at providing unified access to multiple streaming platforms with automatic authentication handling. The codebase demonstrates solid engineering practices (testing, documentation, CI/CD) but shows signs of organic growth (large monolithic files, generic error handling).
For personal projects and research, minim is ready to use as-is. For commercial or large-scale use, the v2 rewrite targets the critical limitations (async, rate limiting, secure storage, typed exceptions), but it is still in development on the `dev` branch. The GPL-3.0 license requires careful consideration for proprietary projects.
As a reference implementation, minim is invaluable for understanding how to integrate with music streaming APIs, handle OAuth flows, and normalize metadata across services.