# minim: Codebase Analysis ## Repository Structure ``` minim/ ├── .github/ │ └── workflows/ │ └── ci.yml # GitHub Actions CI/CD ├── docs/ │ ├── conf.py # Sphinx configuration │ ├── index.rst # Documentation index │ └── ... # Additional documentation ├── minim/ │ ├── __init__.py # Package initialization (65 lines) │ ├── audio.py # Audio file handling (1,860 lines) │ ├── discogs.py # Discogs API client (5,501 lines) │ ├── itunes.py # iTunes API client (575 lines) │ ├── qobuz.py # Qobuz API client (5,579 lines) │ ├── spotify.py # Spotify API client (9,862 lines) │ ├── tidal.py # TIDAL API client (12,338 lines) │ └── utility.py # Shared utilities (136 lines) ├── tests/ │ ├── test_audio.py # Audio module tests │ ├── test_discogs.py # Discogs tests │ ├── test_itunes.py # iTunes tests │ ├── test_qobuz.py # Qobuz tests │ ├── test_spotify.py # Spotify tests │ └── test_tidal.py # TIDAL tests ├── .coveragerc # Coverage configuration ├── .gitignore # Git ignore patterns ├── environment.yml # Conda environment ├── LICENSE # GPL-3.0 license ├── README.md # Project README └── setup.py # Package setup ``` **Total Source Lines:** 35,916 (excluding tests, docs, config) **Module Distribution:** - `tidal.py`: 34.4% (12,338 lines) - `spotify.py`: 27.5% (9,862 lines) - `discogs.py`: 15.3% (5,501 lines) - `qobuz.py`: 15.5% (5,579 lines) - `audio.py`: 5.2% (1,860 lines) - `itunes.py`: 1.6% (575 lines) - `utility.py`: 0.4% (136 lines) - `__init__.py`: 0.2% (65 lines) **Observation:** `tidal.py` is disproportionately large. This suggests either comprehensive API coverage or a need for refactoring into submodules. ## Code Organization ### Package Initialization (`__init__.py`) **Purpose:** Package metadata and version info **Contents:** ```python """ minim: Comprehensive music metadata library """ __version__ = "1.1.0" __author__ = "Benjamin Ye" __email__ = "bbye98@gmail.com" __license__ = "GPL-3.0" __url__ = "https://github.com/bbye98/minim" # No automatic imports (users import specific modules) ``` **Design Choice:** No automatic imports. Users explicitly import modules: ```python from minim import spotify # Not: from minim.spotify import WebAPI ``` ### Utility Module (`utility.py`) **Purpose:** Shared utilities across all modules **Functions:** **Config File Handling:** ```python def get_config_path() -> str: """Get path to minim config file.""" return os.path.expanduser("~/minim.cfg") def load_config() -> ConfigParser: """Load config file.""" config = ConfigParser() config.read(get_config_path()) return config def save_config(config: ConfigParser) -> None: """Save config file.""" with open(get_config_path(), "w") as f: config.write(f) ``` **String Formatting:** ```python def format_duration(seconds: int) -> str: """Format duration in seconds to MM:SS or HH:MM:SS.""" hours, remainder = divmod(seconds, 3600) minutes, seconds = divmod(remainder, 60) if hours > 0: return f"{hours}:{minutes:02d}:{seconds:02d}" else: return f"{minutes}:{seconds:02d}" def sanitize_filename(filename: str) -> str: """Remove invalid characters from filename.""" invalid_chars = '<>:"/\\|?*' for char in invalid_chars: filename = filename.replace(char, "_") return filename ``` **URL Handling:** ```python def build_url(base: str, path: str, params: dict = None) -> str: """Build URL with path and query parameters.""" url = base.rstrip("/") + "/" + path.lstrip("/") if params: query = "&".join(f"{k}={v}" for k, v in params.items() if v is not None) url += "?" + query return url ``` **Minimal Utilities:** Only 136 lines. Most logic is self-contained within each module. ## Configuration Management ### Config File Format **Location:** `~/minim.cfg` **Parser:** Python's `ConfigParser` (INI format) **Structure:** ```ini [section_name] key = value key2 = value2 ``` **Reading:** ```python from configparser import ConfigParser import os config = ConfigParser() config.read(os.path.expanduser("~/minim.cfg")) value = config.get("section", "key", fallback=None) int_value = config.getint("section", "key", fallback=0) bool_value = config.getboolean("section", "key", fallback=False) ``` **Writing:** ```python if not config.has_section("section"): config.add_section("section") config.set("section", "key", "value") with open(os.path.expanduser("~/minim.cfg"), "w") as f: config.write(f) ``` ### Environment Variables **Pattern:** `{SERVICE}_{FIELD}` in uppercase **Examples:** - `SPOTIFY_CLIENT_ID` - `TIDAL_ACCESS_TOKEN` - `QOBUZ_EMAIL` **Reading:** ```python import os client_id = os.getenv("SPOTIFY_CLIENT_ID") client_secret = os.getenv("SPOTIFY_CLIENT_SECRET") ``` **Precedence in Code:** ```python def __init__(self, client_id=None, client_secret=None): # 1. Explicit parameter self.client_id = client_id # 2. Environment variable if not self.client_id: self.client_id = os.getenv("SPOTIFY_CLIENT_ID") # 3. Config file if not self.client_id: config = load_config() if config.has_section("spotify"): self.client_id = config.get("spotify", "client_id", fallback=None) ``` ## Logging and Error Handling ### Logging **No Structured Logging:** minim does not use Python's `logging` module. **Warnings:** ```python import warnings warnings.warn("Token will expire soon", UserWarning) ``` **Use Cases:** - Non-critical issues (token expiration warnings) - Deprecated features - Fallback behavior **No Debug Logging:** No verbose output for debugging. Users must add their own logging. ### Error Handling **Strategy:** Fail-fast with exceptions **Exception Types:** - `RuntimeError`: API errors, HTTP failures - `ValueError`: Invalid input, unsupported formats - `FileNotFoundError`: Missing audio files - `KeyError`: Missing required fields in API responses **No Custom Exceptions:** All errors use built-in exception types. **Example:** ```python def _request(self, method, url, **kwargs): response = requests.request(method, url, **kwargs) if not response.ok: raise RuntimeError( f"{method} {url} failed: {response.status_code} {response.text}" ) return response.json() ``` **Error Messages:** - Include HTTP method and URL - Include status code and response body - No error codes or structured error objects **Caller Responsibility:** ```python try: track = api.get_track(12345) except RuntimeError as e: # Parse error message to determine cause if "404" in str(e): print("Track not found") elif "401" in str(e): print("Authentication failed") else: print(f"Unknown error: {e}") ``` ## Testing Infrastructure ### Test Framework **Tool:** pytest **Test Files:** - `tests/test_audio.py`: Audio file handling tests - `tests/test_discogs.py`: Discogs API tests - `tests/test_itunes.py`: iTunes API tests - `tests/test_qobuz.py`: Qobuz API tests - `tests/test_spotify.py`: Spotify API tests - `tests/test_tidal.py`: TIDAL API tests **Test Structure:** ```python import pytest from minim import spotify class TestSpotifyWebAPI: @classmethod def setup_class(cls): """Set up API client for all tests.""" cls.api = spotify.WebAPI( client_id=os.getenv("SPOTIFY_CLIENT_ID"), client_secret=os.getenv("SPOTIFY_CLIENT_SECRET") ) cls.api.set_flow("client_credentials") cls.api.set_access_token() def test_search(self): """Test search functionality.""" results = self.api.search("Radiohead", types=["artist"], limit=1) assert "artists" in results assert len(results["artists"]["items"]) > 0 assert results["artists"]["items"][0]["name"] == "Radiohead" def test_get_artist(self): """Test get artist by ID.""" artist = self.api.get_artist("4Z8W4fKeB5YxbusRsdQVPb") assert artist["name"] == "Radiohead" assert artist["type"] == "artist" def test_invalid_id(self): """Test error handling for invalid ID.""" with pytest.raises(RuntimeError): self.api.get_artist("invalid_id") ``` **Class-Based Tests:** - `setup_class()`: Run once before all tests in class - `teardown_class()`: Run once after all tests in class - Shared API client across tests (reduces authentication overhead) **Real API Calls:** - Tests make actual HTTP requests to services - Requires valid credentials in environment variables - May fail if services are down or rate limits exceeded **No Mocking:** Tests do not use `unittest.mock` or `responses` library. All API calls are real. **Pros:** - Tests verify actual API behavior - Catches API changes immediately **Cons:** - Slow (network latency) - Flaky (depends on service availability) - Rate limiting issues - Requires credentials ### Coverage Configuration **File:** `.coveragerc` ```ini [run] source = minim omit = */tests/* */__init__.py */site-packages/* [report] exclude_lines = pragma: no cover def __repr__ raise AssertionError raise NotImplementedError if __name__ == .__main__.: if TYPE_CHECKING: precision = 2 show_missing = True ``` **Coverage Execution:** ```bash coverage run -m pytest tests/ coverage report coverage html ``` **Coverage Metrics:** Not documented in repository. Estimated 60-80% based on test file count and module complexity. ### Continuous Integration **Platform:** GitHub Actions **Workflow:** `.github/workflows/ci.yml` **Triggers:** - Push to `main` or `dev` branches - Pull requests to `main` **Jobs:** **Linting:** ```yaml - name: Lint with ruff run: ruff check . ``` **Testing:** ```yaml - name: Run tests env: SPOTIFY_CLIENT_ID: ${{ secrets.SPOTIFY_CLIENT_ID }} SPOTIFY_CLIENT_SECRET: ${{ secrets.SPOTIFY_CLIENT_SECRET }} TIDAL_CLIENT_ID: ${{ secrets.TIDAL_CLIENT_ID }} TIDAL_CLIENT_SECRET: ${{ secrets.TIDAL_CLIENT_SECRET }} run: pytest tests/ ``` **Environment:** - OS: Ubuntu 22.04 - Python: 3.9 - FFmpeg: Installed via apt **Secrets:** API credentials stored in GitHub Secrets, injected as environment variables. ## Code Style ### Linting **Tool:** ruff (modern, fast Python linter) **Replaces:** flake8, pylint, isort, pyupgrade **Configuration:** `pyproject.toml` or `ruff.toml` ```toml [tool.ruff] line-length = 88 target-version = "py39" [tool.ruff.lint] select = [ "E", # pycodestyle errors "W", # pycodestyle warnings "F", # pyflakes "I", # isort "N", # pep8-naming "UP", # pyupgrade ] ignore = [ "E501", # line too long (handled by formatter) ] ``` **Execution:** ```bash ruff check . ruff check --fix . # Auto-fix issues ``` ### Formatting **No Formatter:** minim does not use `black`, `autopep8`, or similar formatters. **Style:** Follows PEP 8 with manual formatting. **Line Length:** Approximately 88 characters (black default), but not enforced. ### Type Hints **Partial Coverage:** Type hints used inconsistently. **Examples:** **With Type Hints:** ```python def search(self, query: str, types: list[str] = ["track"], limit: int = 20) -> dict: """Search Spotify catalog.""" ... ``` **Without Type Hints:** ```python def _request(self, method, url, **kwargs): """Make HTTP request.""" ... ``` **No Type Checking:** Does not use `mypy` or `pyright` for static type checking. **Recommendation for v2:** Add comprehensive type hints and integrate `mypy` into CI. ### Docstrings **Format:** Google-style docstrings **Example:** ```python def get_track(self, track_id: str, market: str = None) -> dict: """ Get track details. Args: track_id: Spotify track ID market: ISO 3166-1 alpha-2 country code Returns: Track object with metadata Raises: RuntimeError: If API request fails Example: >>> api = WebAPI(client_id="...", client_secret="...") >>> track = api.get_track("3n3Ppam7vgaVa1iaRUc9Lp") >>> print(track["name"]) Creep """ params = {} if market: params["market"] = market return self._request("GET", f"/tracks/{track_id}", params=params) ``` **Coverage:** Most public methods have docstrings. Private methods (`_request`, `_get_headers`) often lack documentation. **Sphinx Integration:** Docstrings parsed by Sphinx for ReadTheDocs documentation. ## Code Patterns ### API Client Pattern **Common Structure:** ```python class API: def __init__(self, client_id=None, client_secret=None, access_token=None): # Load credentials from parameters, env vars, or config file self.client_id = client_id or os.getenv("SERVICE_CLIENT_ID") self.client_secret = client_secret or os.getenv("SERVICE_CLIENT_SECRET") self.access_token = access_token # Load from config file if not provided config = load_config() if config.has_section("service"): self.access_token = self.access_token or config.get("service", "access_token") # API base URL self.base_url = "https://api.service.com/v1" def set_flow(self, flow_type="authorization_code", **kwargs): """Configure OAuth flow.""" self.flow_type = flow_type # Store flow-specific parameters def set_access_token(self, method="http.server"): """Obtain access token via OAuth flow.""" # Implement OAuth flow # Save token to config file def _get_headers(self) -> dict: """Get HTTP headers with authentication.""" return {"Authorization": f"Bearer {self.access_token}"} def _request(self, method: str, url: str, **kwargs) -> dict: """Make authenticated HTTP request.""" if not url.startswith("http"): url = self.base_url + url headers = kwargs.pop("headers", {}) headers.update(self._get_headers()) response = requests.request(method, url, headers=headers, **kwargs) if not response.ok: raise RuntimeError(f"{method} {url} failed: {response.status_code}") return response.json() # Public API methods def search(self, query: str, **kwargs) -> dict: """Search catalog.""" return self._request("GET", "/search", params={"q": query, **kwargs}) def get_track(self, track_id: str) -> dict: """Get track details.""" return self._request("GET", f"/tracks/{track_id}") ``` **Consistency:** All API clients (`discogs.py`, `spotify.py`, `tidal.py`, `qobuz.py`) follow this pattern with minor variations. ### Audio File Pattern **Base Class with Subclasses:** ```python class Audio: def __init__(self, filepath: str): self.filepath = filepath self._file = mutagen.File(filepath) # Auto-detect format and change class if isinstance(self._file, mutagen.flac.FLAC): self.__class__ = FLAC elif isinstance(self._file, mutagen.mp3.MP3): self.__class__ = MP3 # ... etc self.read_metadata() def read_metadata(self): """Read metadata from file. Implemented by subclasses.""" raise NotImplementedError def write_metadata(self): """Write metadata to file. Implemented by subclasses.""" raise NotImplementedError class FLAC(Audio): def read_metadata(self): self.title = self._file.get("TITLE", [None])[0] self.artist = self._file.get("ARTIST", [None])[0] # ... etc def write_metadata(self): self._file["TITLE"] = self.title self._file["ARTIST"] = self.artist # ... etc self._file.save() ``` **Dynamic Class Change:** `self.__class__ = FLAC` changes instance class after initialization. Unusual pattern but works for format auto-detection. ### OAuth Callback Pattern **Three Implementations:** **1. http.server:** ```python def _listen_http_server(self): class CallbackHandler(BaseHTTPRequestHandler): def do_GET(self): query = parse_qs(urlparse(self.path).query) self.server.authorization_code = query.get("code", [None])[0] self.send_response(200) self.end_headers() self.wfile.write(b"Authorization successful. You may close this window.") server = HTTPServer(("localhost", 8888), CallbackHandler) server.handle_request() return server.authorization_code ``` **2. Flask:** ```python def _listen_flask(self): app = Flask(__name__) authorization_code = None @app.route("/callback") def callback(): nonlocal authorization_code authorization_code = request.args.get("code") shutdown = request.environ.get("werkzeug.server.shutdown") if shutdown: shutdown() return "Authorization successful. You may close this window." app.run(port=8888) return authorization_code ``` **3. Playwright:** ```python def _automate_browser(self): from playwright.sync_api import sync_playwright with sync_playwright() as p: browser = p.chromium.launch(headless=True) page = browser.new_page() page.goto(self.auth_url) page.fill("#username", self.email) page.fill("#password", self.password) page.click("button[type=submit]") page.wait_for_url(f"{self.redirect_uri}*") code = parse_qs(urlparse(page.url).query)["code"][0] browser.close() return code ``` **Flexibility:** Users choose callback method based on environment (headless server, desktop, etc.). ## Code Quality Issues ### Large Monolithic Files **Problem:** `tidal.py` is 12,338 lines (34% of codebase). **Impact:** - Difficult to navigate - Slow to load in editors - Hard to maintain - Merge conflicts more likely **Recommendation:** Split into submodules: ``` minim/tidal/ ├── __init__.py ├── auth.py # Authentication ├── catalog.py # Catalog endpoints ├── streaming.py # Streaming URLs ├── lyrics.py # Lyrics endpoints ├── user.py # User library └── models.py # Data models ``` ### Generic Error Handling **Problem:** All errors are `RuntimeError` with string messages. **Impact:** - Caller must parse error messages to determine cause - No structured error handling - Difficult to distinguish error types **Recommendation:** Define custom exceptions: ```python class MinimError(Exception): """Base exception for minim.""" class APIError(MinimError): """API request failed.""" def __init__(self, status_code: int, message: str): self.status_code = status_code self.message = message super().__init__(f"API error {status_code}: {message}") class AuthenticationError(MinimError): """Authentication failed.""" class RateLimitError(APIError): """Rate limit exceeded.""" def __init__(self, retry_after: int): self.retry_after = retry_after super().__init__(429, f"Rate limit exceeded. Retry after {retry_after}s") ``` ### No Rate Limiting **Problem:** No built-in rate limiting. Caller responsible for tracking. **Impact:** - Easy to exceed service rate limits - No automatic backoff - Tests may fail due to rate limiting **Recommendation:** Implement rate limiter: ```python from time import time, sleep class RateLimiter: def __init__(self, requests_per_minute: int): self.requests_per_minute = requests_per_minute self.requests = [] def wait_if_needed(self): now = time() # Remove requests older than 1 minute self.requests = [t for t in self.requests if now - t < 60] if len(self.requests) >= self.requests_per_minute: sleep_time = 60 - (now - self.requests[0]) if sleep_time > 0: sleep(sleep_time) self.requests.append(time()) # Usage in API client class API: def __init__(self): self.rate_limiter = RateLimiter(60) # 60 requests per minute def _request(self, method, url, **kwargs): self.rate_limiter.wait_if_needed() # Make request ``` ### Plain Text Token Storage **Problem:** Tokens stored unencrypted in `~/minim.cfg`. **Impact:** - Security risk on shared systems - Tokens readable by any process - Passwords stored in plain text (Qobuz) **Recommendation:** Use OS keychain: ```python import keyring # Store token keyring.set_password("minim", "spotify_access_token", access_token) # Retrieve token access_token = keyring.get_password("minim", "spotify_access_token") ``` ### Inconsistent Type Hints **Problem:** Some functions have type hints, others don't. **Impact:** - Reduced IDE autocomplete support - No static type checking - Harder to understand function signatures **Recommendation:** Add comprehensive type hints and enable `mypy`: ```python from typing import Optional, Dict, List, Any def search( self, query: str, types: List[str] = ["track"], limit: int = 20, offset: int = 0 ) -> Dict[str, Any]: """Search catalog.""" ... ``` ## Code Metrics ### Complexity **Cyclomatic Complexity:** Not measured. Likely moderate to high in large modules (`tidal.py`, `spotify.py`). **Recommendation:** Use `radon` to measure complexity: ```bash pip install radon radon cc minim/ -a # Average complexity radon cc minim/ -n D # Show functions with complexity > D (high) ``` ### Duplication **Code Duplication:** Likely present across API clients (authentication, request handling). **Recommendation:** Extract common patterns to base class: ```python class BaseAPI: def __init__(self, service_name: str): self.service_name = service_name self.load_credentials() def load_credentials(self): # Common credential loading logic ... def _request(self, method, url, **kwargs): # Common request handling ... class SpotifyAPI(BaseAPI): def __init__(self): super().__init__("spotify") self.base_url = "https://api.spotify.com/v1" ``` ### Dependencies **Direct Dependencies:** 3 (cryptography, mutagen, requests) **Optional Dependencies:** 6 (ffmpeg, flask, levenshtein, numpy, pillow, playwright) **Dependency Graph:** Flat (no transitive dependencies within minim modules). **Recommendation:** Keep dependencies minimal. Current approach is good. ## Summary minim's codebase is well-structured for a personal project but shows signs of organic growth: **Strengths:** - Consistent API client pattern across modules - Comprehensive test coverage with real API calls - Good documentation (docstrings, ReadTheDocs) - Minimal dependencies - CI/CD with GitHub Actions **Weaknesses:** - Large monolithic files (`tidal.py` at 12K lines) - Generic error handling (all `RuntimeError`) - No rate limiting - Plain text token storage - Inconsistent type hints - No static type checking **Recommendations for v2:** - Split large modules into subpackages - Define custom exception hierarchy - Implement rate limiting and backoff - Use OS keychain for token storage - Add comprehensive type hints - Integrate `mypy` for static type checking - Extract common patterns to base classes - Add code complexity and duplication metrics to CI The codebase is production-ready for personal use but requires hardening for commercial or large-scale deployment. The v2 rewrite on the `dev` branch addresses many of these issues.