minim: Codebase Analysis
Repository Structure
minim/
├── .github/
│ └── workflows/
│ └── ci.yml # GitHub Actions CI/CD
├── docs/
│ ├── conf.py # Sphinx configuration
│ ├── index.rst # Documentation index
│ └── ... # Additional documentation
├── minim/
│ ├── __init__.py # Package initialization (65 lines)
│ ├── audio.py # Audio file handling (1,860 lines)
│ ├── discogs.py # Discogs API client (5,501 lines)
│ ├── itunes.py # iTunes API client (575 lines)
│ ├── qobuz.py # Qobuz API client (5,579 lines)
│ ├── spotify.py # Spotify API client (9,862 lines)
│ ├── tidal.py # TIDAL API client (12,338 lines)
│ └── utility.py # Shared utilities (136 lines)
├── tests/
│ ├── test_audio.py # Audio module tests
│ ├── test_discogs.py # Discogs tests
│ ├── test_itunes.py # iTunes tests
│ ├── test_qobuz.py # Qobuz tests
│ ├── test_spotify.py # Spotify tests
│ └── test_tidal.py # TIDAL tests
├── .coveragerc # Coverage configuration
├── .gitignore # Git ignore patterns
├── environment.yml # Conda environment
├── LICENSE # GPL-3.0 license
├── README.md # Project README
└── setup.py # Package setup
Total Source Lines: 35,916 (excluding tests, docs, config)
Module Distribution:
- tidal.py: 34.4% (12,338 lines)
- spotify.py: 27.5% (9,862 lines)
- qobuz.py: 15.5% (5,579 lines)
- discogs.py: 15.3% (5,501 lines)
- audio.py: 5.2% (1,860 lines)
- itunes.py: 1.6% (575 lines)
- utility.py: 0.4% (136 lines)
- __init__.py: 0.2% (65 lines)
Observation: tidal.py is disproportionately large. This suggests either comprehensive API coverage or a need for refactoring into submodules.
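The shares above can be recomputed from the per-module line counts. The following is a small sketch with the counts copied from the repository tree; the variable names are illustrative only.

```python
# Line counts copied from the repository tree above.
line_counts = {
    "tidal.py": 12_338,
    "spotify.py": 9_862,
    "qobuz.py": 5_579,
    "discogs.py": 5_501,
    "audio.py": 1_860,
    "itunes.py": 575,
    "utility.py": 136,
    "__init__.py": 65,
}

total = sum(line_counts.values())
# Share of the package, in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in line_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}% ({line_counts[name]:,} lines)")
print(f"total: {total:,} lines")
```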
Code Organization
Package Initialization (__init__.py)
Purpose: Package metadata and version info
Contents:
"""
minim: Comprehensive music metadata library
"""
__version__ = "1.1.0"
__author__ = "Benjamin Ye"
__email__ = "bbye98@gmail.com"
__license__ = "GPL-3.0"
__url__ = "https://github.com/bbye98/minim"
# No automatic imports (users import specific modules)
Design Choice: No automatic imports. Users explicitly import modules:
from minim import spotify  # Used as spotify.WebAPI(...); "import minim" alone does not load submodules
Utility Module (utility.py)
Purpose: Shared utilities across all modules
Functions:
Config File Handling:
import os
from configparser import ConfigParser

def get_config_path() -> str:
    """Get path to minim config file."""
    return os.path.expanduser("~/minim.cfg")

def load_config() -> ConfigParser:
    """Load config file."""
    config = ConfigParser()
    config.read(get_config_path())
    return config

def save_config(config: ConfigParser) -> None:
    """Save config file."""
    with open(get_config_path(), "w") as f:
        config.write(f)
String Formatting:
def format_duration(seconds: int) -> str:
    """Format duration in seconds to MM:SS or HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours > 0:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    else:
        return f"{minutes}:{seconds:02d}"

def sanitize_filename(filename: str) -> str:
    """Remove invalid characters from filename."""
    invalid_chars = '<>:"/\\|?*'
    for char in invalid_chars:
        filename = filename.replace(char, "_")
    return filename
URL Handling:
def build_url(base: str, path: str, params: dict = None) -> str:
    """Build URL with path and query parameters."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    if params:
        query = "&".join(f"{k}={v}" for k, v in params.items() if v is not None)
        url += "?" + query
    return url
Minimal Utilities: Only 136 lines. Most logic is self-contained within each module.
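As written, build_url joins raw key=value pairs, so a value containing a space, "&", or "/" would not be escaped. The sketch below shows one way to percent-encode the query with the standard library. The name build_url_encoded and the example host are hypothetical and not part of minim.

```python
from typing import Optional
from urllib.parse import urlencode


def build_url_encoded(base: str, path: str, params: Optional[dict] = None) -> str:
    """Join base and path, then append percent-encoded query parameters."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    # Drop None values, mirroring the filter in build_url.
    filtered = {k: v for k, v in (params or {}).items() if v is not None}
    if filtered:
        url += "?" + urlencode(filtered)
    return url


print(
    build_url_encoded(
        "https://api.example.com/v1/",
        "/search",
        {"q": "AC/DC & friends", "limit": 5, "market": None},
    )
)
```

Note that urlencode encodes spaces as "+", which is the form-encoding convention; whether a given service expects "+" or "%20" is something to check per API.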
Configuration Management
Config File Format
Location: ~/minim.cfg
Parser: Python's ConfigParser (INI format)
Structure:
[section_name]
key = value
key2 = value2
Reading:
from configparser import ConfigParser
import os
config = ConfigParser()
config.read(os.path.expanduser("~/minim.cfg"))
value = config.get("section", "key", fallback=None)
int_value = config.getint("section", "key", fallback=0)
bool_value = config.getboolean("section", "key", fallback=False)
Writing:
if not config.has_section("section"):
    config.add_section("section")
config.set("section", "key", "value")
with open(os.path.expanduser("~/minim.cfg"), "w") as f:
    config.write(f)
Environment Variables
Pattern: {SERVICE}_{FIELD} in uppercase
Examples:
- SPOTIFY_CLIENT_ID
- TIDAL_ACCESS_TOKEN
- QOBUZ_EMAIL
Reading:
import os
client_id = os.getenv("SPOTIFY_CLIENT_ID")
client_secret = os.getenv("SPOTIFY_CLIENT_SECRET")
Precedence in Code:
def __init__(self, client_id=None, client_secret=None):
    # 1. Explicit parameter
    self.client_id = client_id
    # 2. Environment variable
    if not self.client_id:
        self.client_id = os.getenv("SPOTIFY_CLIENT_ID")
    # 3. Config file
    if not self.client_id:
        config = load_config()
        if config.has_section("spotify"):
            self.client_id = config.get("spotify", "client_id", fallback=None)
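The same precedence (explicit argument, then environment variable, then config file) could be factored into one helper so each credential does not repeat the three checks. This is a sketch only: resolve_setting is a hypothetical name, and the config is built in memory rather than read from ~/minim.cfg.

```python
import os
from configparser import ConfigParser
from typing import Optional


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    config: ConfigParser,
    section: str,
    key: str,
) -> Optional[str]:
    """Return the first value found: explicit argument, env var, config file."""
    if explicit:
        return explicit
    from_env = os.getenv(env_var)
    if from_env:
        return from_env
    # fallback=None also covers a missing section, so has_section is not needed.
    return config.get(section, key, fallback=None)


# In-memory config standing in for the file on disk.
config = ConfigParser()
config.read_dict({"spotify": {"client_id": "id-from-config"}})

os.environ.pop("SPOTIFY_CLIENT_ID", None)
print(resolve_setting(None, "SPOTIFY_CLIENT_ID", config, "spotify", "client_id"))

os.environ["SPOTIFY_CLIENT_ID"] = "id-from-env"
print(resolve_setting(None, "SPOTIFY_CLIENT_ID", config, "spotify", "client_id"))
print(resolve_setting("id-from-arg", "SPOTIFY_CLIENT_ID", config, "spotify", "client_id"))
```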
Logging and Error Handling
Logging
No Structured Logging: minim does not use Python's logging module.
Warnings:
import warnings
warnings.warn("Token will expire soon", UserWarning)
Use Cases:
- Non-critical issues (token expiration warnings)
- Deprecated features
- Fallback behavior
No Debug Logging: No verbose output for debugging. Users must add their own logging.
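Because the library reports non-critical issues through warnings.warn, a caller who wants them in a log can redirect warnings into the logging module with the standard logging.captureWarnings function. The sketch below assumes nothing about minim itself; the ListHandler class is a hypothetical stand-in for a real log handler.

```python
import logging
import warnings

# Route warnings.warn(...) calls to the "py.warnings" logger.
logging.captureWarnings(True)

records = []


class ListHandler(logging.Handler):
    """Collect formatted messages in memory (stand-in for a file or stream handler)."""

    def emit(self, record: logging.LogRecord) -> None:
        records.append(record.getMessage())


warnings_logger = logging.getLogger("py.warnings")
warnings_logger.addHandler(ListHandler())
warnings_logger.setLevel(logging.WARNING)

# Show every occurrence instead of only the first one per call site.
warnings.simplefilter("always")
warnings.warn("Token will expire soon", UserWarning)

print(len(records), "warning(s) captured")
```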
Error Handling
Strategy: Fail-fast with exceptions
Exception Types:
- RuntimeError: API errors, HTTP failures
- ValueError: Invalid input, unsupported formats
- FileNotFoundError: Missing audio files
- KeyError: Missing required fields in API responses
No Custom Exceptions: All errors use built-in exception types.
Example:
def _request(self, method, url, **kwargs):
    response = requests.request(method, url, **kwargs)
    if not response.ok:
        raise RuntimeError(
            f"{method} {url} failed: {response.status_code} {response.text}"
        )
    return response.json()
Error Messages:
- Include HTTP method and URL
- Include status code and response body
- No error codes or structured error objects
Caller Responsibility:
try:
    track = api.get_track(12345)
except RuntimeError as e:
    # Parse error message to determine cause
    if "404" in str(e):
        print("Track not found")
    elif "401" in str(e):
        print("Authentication failed")
    else:
        print(f"Unknown error: {e}")
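Substring checks such as "404" in str(e) can misfire when the same digits appear in the URL or response body. A slightly more robust sketch, assuming the "{method} {url} failed: {status_code} ..." message format shown above, extracts the status code with a regular expression. The helper name status_from_error is hypothetical.

```python
import re
from typing import Optional

# Matches the status code that follows " failed: " in the error message.
_STATUS_RE = re.compile(r" failed: (\d{3})\b")


def status_from_error(error: Exception) -> Optional[int]:
    """Return the HTTP status embedded in the message, or None if absent."""
    match = _STATUS_RE.search(str(error))
    return int(match.group(1)) if match else None


err = RuntimeError(
    "GET https://api.example.com/v1/tracks/404 failed: 401 Unauthorized"
)
print(status_from_error(err))  # 401, even though "404" appears in the URL
```

This remains a workaround for string-only errors; it breaks if the message format changes.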
Testing Infrastructure
Test Framework
Tool: pytest
Test Files:
- tests/test_audio.py: Audio file handling tests
- tests/test_discogs.py: Discogs API tests
- tests/test_itunes.py: iTunes API tests
- tests/test_qobuz.py: Qobuz API tests
- tests/test_spotify.py: Spotify API tests
- tests/test_tidal.py: TIDAL API tests
Test Structure:
import os

import pytest
from minim import spotify

class TestSpotifyWebAPI:
    @classmethod
    def setup_class(cls):
        """Set up API client for all tests."""
        cls.api = spotify.WebAPI(
            client_id=os.getenv("SPOTIFY_CLIENT_ID"),
            client_secret=os.getenv("SPOTIFY_CLIENT_SECRET")
        )
        cls.api.set_flow("client_credentials")
        cls.api.set_access_token()

    def test_search(self):
        """Test search functionality."""
        results = self.api.search("Radiohead", types=["artist"], limit=1)
        assert "artists" in results
        assert len(results["artists"]["items"]) > 0
        assert results["artists"]["items"][0]["name"] == "Radiohead"

    def test_get_artist(self):
        """Test get artist by ID."""
        artist = self.api.get_artist("4Z8W4fKeB5YxbusRsdQVPb")
        assert artist["name"] == "Radiohead"
        assert artist["type"] == "artist"

    def test_invalid_id(self):
        """Test error handling for invalid ID."""
        with pytest.raises(RuntimeError):
            self.api.get_artist("invalid_id")
Class-Based Tests:
- setup_class(): Run once before all tests in the class
- teardown_class(): Run once after all tests in the class
- Shared API client across tests (reduces authentication overhead)
Real API Calls:
- Tests make actual HTTP requests to services
- Requires valid credentials in environment variables
- May fail if services are down or rate limits exceeded
No Mocking: Tests do not use unittest.mock or responses library. All API calls are real.
Pros:
- Tests verify actual API behavior
- Catches API changes immediately
Cons:
- Slow (network latency)
- Flaky (depends on service availability)
- Rate limiting issues
- Requires credentials
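For comparison, an offline test could replace the network-facing method with a canned response using unittest.mock from the standard library. The sketch below is illustrative only: FakeableClient is a hypothetical stand-in with the same _request/get_artist shape described in this document, not minim's actual class.

```python
from unittest.mock import Mock


class FakeableClient:
    """Hypothetical stand-in for a minim API client; not the real class."""

    def _request(self, method: str, url: str, **kwargs) -> dict:
        raise RuntimeError("network access is not available in this sketch")

    def get_artist(self, artist_id: str) -> dict:
        return self._request("GET", f"/artists/{artist_id}")


def test_get_artist_offline():
    client = FakeableClient()
    # Replace the network-facing method with a canned response.
    client._request = Mock(return_value={"name": "Radiohead", "type": "artist"})

    artist = client.get_artist("4Z8W4fKeB5YxbusRsdQVPb")

    assert artist["name"] == "Radiohead"
    client._request.assert_called_once_with(
        "GET", "/artists/4Z8W4fKeB5YxbusRsdQVPb"
    )


test_get_artist_offline()
```

Such a test runs without credentials or network access, at the cost of no longer detecting upstream API changes.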
Coverage Configuration
File: .coveragerc
[run]
source = minim
omit =
    */tests/*
    */__init__.py
    */site-packages/*
[report]
exclude_lines =
    pragma: no cover
    def __repr__
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:
    if TYPE_CHECKING:
precision = 2
show_missing = True
Coverage Execution:
coverage run -m pytest tests/
coverage report
coverage html
Coverage Metrics: Not documented in repository. Estimated 60-80% based on test file count and module complexity.
Continuous Integration
Platform: GitHub Actions
Workflow: .github/workflows/ci.yml
Triggers:
- Push to main or dev branches
- Pull requests to main
Jobs:
Linting:
- name: Lint with ruff
  run: ruff check .
Testing:
- name: Run tests
  env:
    SPOTIFY_CLIENT_ID: ${{ secrets.SPOTIFY_CLIENT_ID }}
    SPOTIFY_CLIENT_SECRET: ${{ secrets.SPOTIFY_CLIENT_SECRET }}
    TIDAL_CLIENT_ID: ${{ secrets.TIDAL_CLIENT_ID }}
    TIDAL_CLIENT_SECRET: ${{ secrets.TIDAL_CLIENT_SECRET }}
  run: pytest tests/
Environment:
- OS: Ubuntu 22.04
- Python: 3.9
- FFmpeg: Installed via apt
Secrets: API credentials stored in GitHub Secrets, injected as environment variables.
Code Style
Linting
Tool: ruff (modern, fast Python linter)
Replaces: flake8, pylint, isort, pyupgrade
Configuration: pyproject.toml or ruff.toml
[tool.ruff]
line-length = 88
target-version = "py39"
[tool.ruff.lint]
select = [
"E", # pycodestyle errors
"W", # pycodestyle warnings
"F", # pyflakes
"I", # isort
"N", # pep8-naming
"UP", # pyupgrade
]
ignore = [
"E501", # line too long (handled by formatter)
]
Execution:
ruff check .
ruff check --fix . # Auto-fix issues
Formatting
No Formatter: minim does not use black, autopep8, or similar formatters.
Style: Follows PEP 8 with manual formatting.
Line Length: Approximately 88 characters (black default), but not enforced.
Type Hints
Partial Coverage: Type hints used inconsistently.
Examples:
With Type Hints:
def search(self, query: str, types: list[str] = ["track"], limit: int = 20) -> dict:
    """Search Spotify catalog."""
    ...
Without Type Hints:
def _request(self, method, url, **kwargs):
    """Make HTTP request."""
    ...
No Type Checking: Does not use mypy or pyright for static type checking.
Recommendation for v2: Add comprehensive type hints and integrate mypy into CI.
Docstrings
Format: Google-style docstrings
Example:
def get_track(self, track_id: str, market: str = None) -> dict:
    """
    Get track details.

    Args:
        track_id: Spotify track ID
        market: ISO 3166-1 alpha-2 country code

    Returns:
        Track object with metadata

    Raises:
        RuntimeError: If API request fails

    Example:
        >>> api = WebAPI(client_id="...", client_secret="...")
        >>> track = api.get_track("3n3Ppam7vgaVa1iaRUc9Lp")
        >>> print(track["name"])
        Creep
    """
    params = {}
    if market:
        params["market"] = market
    return self._request("GET", f"/tracks/{track_id}", params=params)
Coverage: Most public methods have docstrings. Private methods (_request, _get_headers) often lack documentation.
Sphinx Integration: Docstrings parsed by Sphinx for ReadTheDocs documentation.
Code Patterns
API Client Pattern
Common Structure:
class API:
    def __init__(self, client_id=None, client_secret=None, access_token=None):
        # Load credentials from parameters, env vars, or config file
        self.client_id = client_id or os.getenv("SERVICE_CLIENT_ID")
        self.client_secret = client_secret or os.getenv("SERVICE_CLIENT_SECRET")
        self.access_token = access_token
        # Load from config file if not provided
        config = load_config()
        if config.has_section("service"):
            # fallback=None avoids NoOptionError when the key is absent
            self.access_token = self.access_token or config.get(
                "service", "access_token", fallback=None
            )
        # API base URL
        self.base_url = "https://api.service.com/v1"

    def set_flow(self, flow_type="authorization_code", **kwargs):
        """Configure OAuth flow."""
        self.flow_type = flow_type
        # Store flow-specific parameters

    def set_access_token(self, method="http.server"):
        """Obtain access token via OAuth flow."""
        # Implement OAuth flow
        # Save token to config file

    def _get_headers(self) -> dict:
        """Get HTTP headers with authentication."""
        return {"Authorization": f"Bearer {self.access_token}"}

    def _request(self, method: str, url: str, **kwargs) -> dict:
        """Make authenticated HTTP request."""
        if not url.startswith("http"):
            url = self.base_url + url
        headers = kwargs.pop("headers", {})
        headers.update(self._get_headers())
        response = requests.request(method, url, headers=headers, **kwargs)
        if not response.ok:
            raise RuntimeError(f"{method} {url} failed: {response.status_code}")
        return response.json()

    # Public API methods
    def search(self, query: str, **kwargs) -> dict:
        """Search catalog."""
        return self._request("GET", "/search", params={"q": query, **kwargs})

    def get_track(self, track_id: str) -> dict:
        """Get track details."""
        return self._request("GET", f"/tracks/{track_id}")
Consistency: All API clients (discogs.py, spotify.py, tidal.py, qobuz.py) follow this pattern with minor variations.
Audio File Pattern
Base Class with Subclasses:
class Audio:
    def __init__(self, filepath: str):
        self.filepath = filepath
        self._file = mutagen.File(filepath)
        # Auto-detect format and change class
        if isinstance(self._file, mutagen.flac.FLAC):
            self.__class__ = FLAC
        elif isinstance(self._file, mutagen.mp3.MP3):
            self.__class__ = MP3
        # ... etc
        self.read_metadata()

    def read_metadata(self):
        """Read metadata from file. Implemented by subclasses."""
        raise NotImplementedError

    def write_metadata(self):
        """Write metadata to file. Implemented by subclasses."""
        raise NotImplementedError

class FLAC(Audio):
    def read_metadata(self):
        self.title = self._file.get("TITLE", [None])[0]
        self.artist = self._file.get("ARTIST", [None])[0]
        # ... etc

    def write_metadata(self):
        self._file["TITLE"] = self.title
        self._file["ARTIST"] = self.artist
        # ... etc
        self._file.save()
Dynamic Class Change: self.__class__ = FLAC changes instance class after initialization. Unusual pattern but works for format auto-detection.
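A more conventional alternative to reassigning __class__ is a factory that selects the subclass before construction. The sketch below is not how minim does it: it dispatches on the file extension instead of the mutagen object type so that it stays self-contained, and the open classmethod and suffix registry are hypothetical.

```python
from pathlib import Path


class Audio:
    """Hypothetical base class; the real minim classes wrap mutagen objects."""

    _registry: dict = {}

    def __init_subclass__(cls, suffix: str = "", **kwargs):
        super().__init_subclass__(**kwargs)
        if suffix:
            # Each subclass registers the file suffix it handles.
            Audio._registry[suffix] = cls

    def __init__(self, filepath: str):
        self.filepath = filepath

    @classmethod
    def open(cls, filepath: str) -> "Audio":
        """Pick the subclass before construction instead of changing __class__."""
        suffix = Path(filepath).suffix.lower()
        try:
            subclass = cls._registry[suffix]
        except KeyError:
            raise ValueError(f"Unsupported format: {suffix!r}") from None
        return subclass(filepath)


class FLAC(Audio, suffix=".flac"):
    pass


class MP3(Audio, suffix=".mp3"):
    pass


track = Audio.open("album/01 - Airbag.FLAC")
print(type(track).__name__)  # FLAC
```

The trade-off is that callers write Audio.open(path) rather than Audio(path), which would be a breaking change for existing users.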
OAuth Callback Pattern
Three Implementations:
1. http.server:
def _listen_http_server(self):
    class CallbackHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = parse_qs(urlparse(self.path).query)
            self.server.authorization_code = query.get("code", [None])[0]
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Authorization successful. You may close this window.")

    server = HTTPServer(("localhost", 8888), CallbackHandler)
    server.handle_request()
    return server.authorization_code
2. Flask:
def _listen_flask(self):
    app = Flask(__name__)
    authorization_code = None

    @app.route("/callback")
    def callback():
        nonlocal authorization_code
        authorization_code = request.args.get("code")
        shutdown = request.environ.get("werkzeug.server.shutdown")
        if shutdown:
            shutdown()
        return "Authorization successful. You may close this window."

    app.run(port=8888)
    return authorization_code
3. Playwright:
def _automate_browser(self):
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(self.auth_url)
        page.fill("#username", self.email)
        page.fill("#password", self.password)
        page.click("button[type=submit]")
        page.wait_for_url(f"{self.redirect_uri}*")
        code = parse_qs(urlparse(page.url).query)["code"][0]
        browser.close()
        return code
Flexibility: Users choose callback method based on environment (headless server, desktop, etc.).
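All three variants end by pulling the code parameter out of the redirect URL. A shared helper for that step might look like the sketch below. The state check is an assumption added here (the OAuth 2.0 state parameter guards against forged callbacks) and is not shown in the snippets above; extract_code is a hypothetical name.

```python
import secrets
from urllib.parse import parse_qs, urlparse


def extract_code(redirect_url: str, expected_state: str) -> str:
    """Return the authorization code from a redirect URL, validating state."""
    query = parse_qs(urlparse(redirect_url).query)
    if "error" in query:
        raise RuntimeError(f"Authorization failed: {query['error'][0]}")
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("State mismatch; the callback may be forged")
    codes = query.get("code")
    if not codes:
        raise ValueError("Callback did not include an authorization code")
    return codes[0]


# The state value is generated before redirecting the user to the provider.
state = secrets.token_urlsafe(16)
url = f"http://localhost:8888/callback?code=abc123&state={state}"
print(extract_code(url, state))  # abc123
```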
Code Quality Issues
Large Monolithic Files
Problem: tidal.py is 12,338 lines (34% of codebase).
Impact:
- Difficult to navigate
- Slow to load in editors
- Hard to maintain
- Merge conflicts more likely
Recommendation: Split into submodules:
minim/tidal/
├── __init__.py
├── auth.py # Authentication
├── catalog.py # Catalog endpoints
├── streaming.py # Streaming URLs
├── lyrics.py # Lyrics endpoints
├── user.py # User library
└── models.py # Data models
Generic Error Handling
Problem: All API and HTTP errors are raised as RuntimeError with string messages.
Impact:
- Caller must parse error messages to determine cause
- No structured error handling
- Difficult to distinguish error types
Recommendation: Define custom exceptions:
class MinimError(Exception):
    """Base exception for minim."""

class APIError(MinimError):
    """API request failed."""
    def __init__(self, status_code: int, message: str):
        self.status_code = status_code
        self.message = message
        super().__init__(f"API error {status_code}: {message}")

class AuthenticationError(MinimError):
    """Authentication failed."""

class RateLimitError(APIError):
    """Rate limit exceeded."""
    def __init__(self, retry_after: int):
        self.retry_after = retry_after
        super().__init__(429, f"Rate limit exceeded. Retry after {retry_after}s")
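To show how such a hierarchy might be wired into request handling, the sketch below repeats the classes so it is self-contained and adds a hypothetical raise_for_status helper that maps an HTTP status to an exception. The mapping (401 to AuthenticationError, 429 to RateLimitError) is an assumption for illustration, not minim's behavior.

```python
from typing import Optional


class MinimError(Exception):
    """Base exception for minim."""


class APIError(MinimError):
    """API request failed."""

    def __init__(self, status_code: int, message: str):
        self.status_code = status_code
        self.message = message
        super().__init__(f"API error {status_code}: {message}")


class AuthenticationError(MinimError):
    """Authentication failed."""


class RateLimitError(APIError):
    """Rate limit exceeded."""

    def __init__(self, retry_after: int):
        self.retry_after = retry_after
        super().__init__(429, f"Rate limit exceeded. Retry after {retry_after}s")


def raise_for_status(
    status_code: int, body: str, retry_after: Optional[str] = None
) -> None:
    """Hypothetical helper: translate an HTTP status into the hierarchy above."""
    if status_code < 400:
        return
    if status_code == 401:
        raise AuthenticationError(body)
    if status_code == 429:
        # Retry-After may also be an HTTP date; this sketch only handles seconds.
        seconds = int(retry_after) if retry_after and retry_after.isdecimal() else 0
        raise RateLimitError(seconds)
    raise APIError(status_code, body)


try:
    raise_for_status(429, "Too Many Requests", retry_after="30")
except RateLimitError as exc:
    print(exc.status_code, exc.retry_after)  # 429 30
```

Callers can then catch RateLimitError or AuthenticationError directly instead of inspecting message strings.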
No Rate Limiting
Problem: No built-in rate limiting. Caller responsible for tracking.
Impact:
- Easy to exceed service rate limits
- No automatic backoff
- Tests may fail due to rate limiting
Recommendation: Implement rate limiter:
from time import time, sleep

class RateLimiter:
    def __init__(self, requests_per_minute: int):
        self.requests_per_minute = requests_per_minute
        self.requests = []

    def wait_if_needed(self):
        now = time()
        # Remove requests older than 1 minute
        self.requests = [t for t in self.requests if now - t < 60]
        if len(self.requests) >= self.requests_per_minute:
            sleep_time = 60 - (now - self.requests[0])
            if sleep_time > 0:
                sleep(sleep_time)
        self.requests.append(time())

# Usage in API client
class API:
    def __init__(self):
        self.rate_limiter = RateLimiter(60)  # 60 requests per minute

    def _request(self, method, url, **kwargs):
        self.rate_limiter.wait_if_needed()
        # Make request
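The limiter above throttles outgoing calls but does not retry after a failure. A complementary sketch of exponential backoff with full jitter follows. The function name call_with_backoff, its parameters, and the choice to retry on RuntimeError (the exception type the current code raises) are all assumptions for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RuntimeError with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except RuntimeError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Demo: fail twice, then succeed. Delays are recorded instead of slept.
attempts = {"count": 0}
recorded_delays = []


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("GET /tracks/1 failed: 429")
    return "ok"


print(call_with_backoff(flaky_request, sleep=recorded_delays.append))
print(attempts["count"], "attempts,", len(recorded_delays), "waits")
```

Passing sleep as a parameter keeps the retry logic testable without real delays.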
Plain Text Token Storage
Problem: Tokens stored unencrypted in ~/minim.cfg.
Impact:
- Security risk on shared systems
- Tokens readable by any process
- Passwords stored in plain text (Qobuz)
Recommendation: Use OS keychain:
import keyring
# Store token
keyring.set_password("minim", "spotify_access_token", access_token)
# Retrieve token
access_token = keyring.get_password("minim", "spotify_access_token")
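Where a keychain backend is unavailable (for example on a headless server), a weaker stopgap is to make the config file readable only by its owner. This is a different technique from the keyring recommendation and does not encrypt anything. The helper name save_config_private is hypothetical, and the permission bits are only meaningful on POSIX systems.

```python
import os
import stat
import tempfile
from configparser import ConfigParser


def save_config_private(config: ConfigParser, path: str) -> None:
    """Write config so that only the owning user can read or write it."""
    # Creating with mode 0o600 avoids a window where the new file is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        config.write(f)
    # Also tighten a pre-existing file (limited effect on Windows).
    os.chmod(path, 0o600)


# Demo in a temporary directory rather than the real ~/minim.cfg.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "minim.cfg")
    demo_config = ConfigParser()
    demo_config.read_dict({"spotify": {"access_token": "example-token"}})
    save_config_private(demo_config, demo_path)
    demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
    print(oct(demo_mode))
```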
Inconsistent Type Hints
Problem: Some functions have type hints, others don't.
Impact:
- Reduced IDE autocomplete support
- No static type checking
- Harder to understand function signatures
Recommendation: Add comprehensive type hints and enable mypy:
from typing import Any, Dict, List, Optional

def search(
    self,
    query: str,
    types: Optional[List[str]] = None,  # None avoids a shared mutable default
    limit: int = 20,
    offset: int = 0,
) -> Dict[str, Any]:
    """Search catalog."""
    if types is None:
        types = ["track"]
    ...
Code Metrics
Complexity
Cyclomatic Complexity: Not measured. Likely moderate to high in large modules (tidal.py, spotify.py).
Recommendation: Use radon to measure complexity:
pip install radon
radon cc minim/ -a # Average complexity
radon cc minim/ -n D # Show only blocks ranked D or worse (high complexity)
Duplication
Code Duplication: Likely present across API clients (authentication, request handling).
Recommendation: Extract common patterns to base class:
class BaseAPI:
    def __init__(self, service_name: str):
        self.service_name = service_name
        self.load_credentials()

    def load_credentials(self):
        # Common credential loading logic
        ...

    def _request(self, method, url, **kwargs):
        # Common request handling
        ...

class SpotifyAPI(BaseAPI):
    def __init__(self):
        super().__init__("spotify")
        self.base_url = "https://api.spotify.com/v1"
Dependencies
Direct Dependencies: 3 (cryptography, mutagen, requests)
Optional Dependencies: 6 (ffmpeg, flask, levenshtein, numpy, pillow, playwright)
Dependency Graph: Flat (no transitive dependencies within minim modules).
Recommendation: Keep dependencies minimal. Current approach is good.
Summary
minim's codebase is well-structured for a personal project but shows signs of organic growth:
Strengths:
- Consistent API client pattern across modules
- Test suites for every API client and the audio module, run against real APIs
- Good documentation (docstrings, ReadTheDocs)
- Minimal dependencies
- CI/CD with GitHub Actions
Weaknesses:
- Large monolithic files (tidal.py at 12K lines)
- Generic error handling (all RuntimeError)
- No rate limiting
- Plain text token storage
- Inconsistent type hints
- No static type checking
Recommendations for v2:
- Split large modules into subpackages
- Define custom exception hierarchy
- Implement rate limiting and backoff
- Use OS keychain for token storage
- Add comprehensive type hints
- Integrate mypy for static type checking
- Extract common patterns to base classes
- Add code complexity and duplication metrics to CI
The codebase is production-ready for personal use but requires hardening for commercial or large-scale deployment. The v2 rewrite on the dev branch addresses many of these issues.