Files

T

Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00

24 KiB

Raw Blame History

minim: Codebase Analysis

Repository Structure

minim/
├── .github/
│   └── workflows/
│       └── ci.yml                 # GitHub Actions CI/CD
├── docs/
│   ├── conf.py                    # Sphinx configuration
│   ├── index.rst                  # Documentation index
│   └── ...                        # Additional documentation
├── minim/
│   ├── __init__.py                # Package initialization (65 lines)
│   ├── audio.py                   # Audio file handling (1,860 lines)
│   ├── discogs.py                 # Discogs API client (5,501 lines)
│   ├── itunes.py                  # iTunes API client (575 lines)
│   ├── qobuz.py                   # Qobuz API client (5,579 lines)
│   ├── spotify.py                 # Spotify API client (9,862 lines)
│   ├── tidal.py                   # TIDAL API client (12,338 lines)
│   └── utility.py                 # Shared utilities (136 lines)
├── tests/
│   ├── test_audio.py              # Audio module tests
│   ├── test_discogs.py            # Discogs tests
│   ├── test_itunes.py             # iTunes tests
│   ├── test_qobuz.py              # Qobuz tests
│   ├── test_spotify.py            # Spotify tests
│   └── test_tidal.py              # TIDAL tests
├── .coveragerc                    # Coverage configuration
├── .gitignore                     # Git ignore patterns
├── environment.yml                # Conda environment
├── LICENSE                        # GPL-3.0 license
├── README.md                      # Project README
└── setup.py                       # Package setup

Total Source Lines: 35,916 (excluding tests, docs, config)

Module Distribution:

tidal.py: 34.4% (12,338 lines)
spotify.py: 27.5% (9,862 lines)
discogs.py: 15.3% (5,501 lines)
qobuz.py: 15.5% (5,579 lines)
audio.py: 5.2% (1,860 lines)
itunes.py: 1.6% (575 lines)
utility.py: 0.4% (136 lines)
__init__.py: 0.2% (65 lines)

Observation: tidal.py is disproportionately large. This suggests either comprehensive API coverage or a need for refactoring into submodules.

Code Organization

Package Initialization (`init.py`)

Purpose: Package metadata and version info

Contents:

"""
minim: Comprehensive music metadata library
"""

__version__ = "1.1.0"
__author__ = "Benjamin Ye"
__email__ = "bbye98@gmail.com"
__license__ = "GPL-3.0"
__url__ = "https://github.com/bbye98/minim"

# No automatic imports (users import specific modules)

Design Choice: No automatic imports. Users explicitly import modules:

from minim import spotify  # Not: from minim.spotify import WebAPI

Utility Module (`utility.py`)

Purpose: Shared utilities across all modules

Functions:

Config File Handling:

def get_config_path() -> str:
    """Get path to minim config file."""
    return os.path.expanduser("~/minim.cfg")

def load_config() -> ConfigParser:
    """Load config file."""
    config = ConfigParser()
    config.read(get_config_path())
    return config

def save_config(config: ConfigParser) -> None:
    """Save config file."""
    with open(get_config_path(), "w") as f:
        config.write(f)

String Formatting:

def format_duration(seconds: int) -> str:
    """Format duration in seconds to MM:SS or HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    
    if hours > 0:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    else:
        return f"{minutes}:{seconds:02d}"

def sanitize_filename(filename: str) -> str:
    """Remove invalid characters from filename."""
    invalid_chars = '<>:"/\\|?*'
    for char in invalid_chars:
        filename = filename.replace(char, "_")
    return filename

URL Handling:

def build_url(base: str, path: str, params: dict = None) -> str:
    """Build URL with path and query parameters."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    
    if params:
        query = "&".join(f"{k}={v}" for k, v in params.items() if v is not None)
        url += "?" + query
    
    return url

Minimal Utilities: Only 136 lines. Most logic is self-contained within each module.

Configuration Management

Config File Format

Location: ~/minim.cfg

Parser: Python's ConfigParser (INI format)

Structure:

[section_name]
key = value
key2 = value2

Reading:

from configparser import ConfigParser
import os

config = ConfigParser()
config.read(os.path.expanduser("~/minim.cfg"))

value = config.get("section", "key", fallback=None)
int_value = config.getint("section", "key", fallback=0)
bool_value = config.getboolean("section", "key", fallback=False)

Writing:

if not config.has_section("section"):
    config.add_section("section")

config.set("section", "key", "value")

with open(os.path.expanduser("~/minim.cfg"), "w") as f:
    config.write(f)

Environment Variables

Pattern: {SERVICE}_{FIELD} in uppercase

Examples:

SPOTIFY_CLIENT_ID
TIDAL_ACCESS_TOKEN
QOBUZ_EMAIL

Reading:

import os

client_id = os.getenv("SPOTIFY_CLIENT_ID")
client_secret = os.getenv("SPOTIFY_CLIENT_SECRET")

Precedence in Code:

def __init__(self, client_id=None, client_secret=None):
    # 1. Explicit parameter
    self.client_id = client_id
    
    # 2. Environment variable
    if not self.client_id:
        self.client_id = os.getenv("SPOTIFY_CLIENT_ID")
    
    # 3. Config file
    if not self.client_id:
        config = load_config()
        if config.has_section("spotify"):
            self.client_id = config.get("spotify", "client_id", fallback=None)

Logging and Error Handling

Logging

No Structured Logging: minim does not use Python's logging module.

Warnings:

import warnings

warnings.warn("Token will expire soon", UserWarning)

Use Cases:

Non-critical issues (token expiration warnings)
Deprecated features
Fallback behavior

No Debug Logging: No verbose output for debugging. Users must add their own logging.

Error Handling

Strategy: Fail-fast with exceptions

Exception Types:

RuntimeError: API errors, HTTP failures
ValueError: Invalid input, unsupported formats
FileNotFoundError: Missing audio files
KeyError: Missing required fields in API responses

No Custom Exceptions: All errors use built-in exception types.

Example:

def _request(self, method, url, **kwargs):
    response = requests.request(method, url, **kwargs)
    
    if not response.ok:
        raise RuntimeError(
            f"{method} {url} failed: {response.status_code} {response.text}"
        )
    
    return response.json()

Error Messages:

Include HTTP method and URL
Include status code and response body
No error codes or structured error objects

Caller Responsibility:

try:
    track = api.get_track(12345)
except RuntimeError as e:
    # Parse error message to determine cause
    if "404" in str(e):
        print("Track not found")
    elif "401" in str(e):
        print("Authentication failed")
    else:
        print(f"Unknown error: {e}")

Testing Infrastructure

Test Framework

Tool: pytest

Test Files:

tests/test_audio.py: Audio file handling tests
tests/test_discogs.py: Discogs API tests
tests/test_itunes.py: iTunes API tests
tests/test_qobuz.py: Qobuz API tests
tests/test_spotify.py: Spotify API tests
tests/test_tidal.py: TIDAL API tests

Test Structure:

import pytest
from minim import spotify

class TestSpotifyWebAPI:
    @classmethod
    def setup_class(cls):
        """Set up API client for all tests."""
        cls.api = spotify.WebAPI(
            client_id=os.getenv("SPOTIFY_CLIENT_ID"),
            client_secret=os.getenv("SPOTIFY_CLIENT_SECRET")
        )
        cls.api.set_flow("client_credentials")
        cls.api.set_access_token()
    
    def test_search(self):
        """Test search functionality."""
        results = self.api.search("Radiohead", types=["artist"], limit=1)
        
        assert "artists" in results
        assert len(results["artists"]["items"]) > 0
        assert results["artists"]["items"][0]["name"] == "Radiohead"
    
    def test_get_artist(self):
        """Test get artist by ID."""
        artist = self.api.get_artist("4Z8W4fKeB5YxbusRsdQVPb")
        
        assert artist["name"] == "Radiohead"
        assert artist["type"] == "artist"
    
    def test_invalid_id(self):
        """Test error handling for invalid ID."""
        with pytest.raises(RuntimeError):
            self.api.get_artist("invalid_id")

Class-Based Tests:

setup_class(): Run once before all tests in class
teardown_class(): Run once after all tests in class
Shared API client across tests (reduces authentication overhead)

Real API Calls:

Tests make actual HTTP requests to services
Requires valid credentials in environment variables
May fail if services are down or rate limits exceeded

No Mocking: Tests do not use unittest.mock or responses library. All API calls are real.

Pros:

Tests verify actual API behavior
Catches API changes immediately

Cons:

Slow (network latency)
Flaky (depends on service availability)
Rate limiting issues
Requires credentials

Coverage Configuration

File: .coveragerc

[run]
source = minim
omit =
    */tests/*
    */__init__.py
    */site-packages/*

[report]
exclude_lines =
    pragma: no cover
    def __repr__
    raise AssertionError
    raise NotImplementedError
    if __name__ == .__main__.:
    if TYPE_CHECKING:

precision = 2
show_missing = True

Coverage Execution:

coverage run -m pytest tests/
coverage report
coverage html

Coverage Metrics: Not documented in repository. Estimated 60-80% based on test file count and module complexity.

Continuous Integration

Platform: GitHub Actions

Workflow: .github/workflows/ci.yml

Triggers:

Push to main or dev branches
Pull requests to main

Jobs:

Linting:

- name: Lint with ruff
  run: ruff check .

Testing:

- name: Run tests
  env:
    SPOTIFY_CLIENT_ID: ${{ secrets.SPOTIFY_CLIENT_ID }}
    SPOTIFY_CLIENT_SECRET: ${{ secrets.SPOTIFY_CLIENT_SECRET }}
    TIDAL_CLIENT_ID: ${{ secrets.TIDAL_CLIENT_ID }}
    TIDAL_CLIENT_SECRET: ${{ secrets.TIDAL_CLIENT_SECRET }}
  run: pytest tests/

Environment:

OS: Ubuntu 22.04
Python: 3.9
FFmpeg: Installed via apt

Secrets: API credentials stored in GitHub Secrets, injected as environment variables.

Code Style

Linting

Tool: ruff (modern, fast Python linter)

Replaces: flake8, pylint, isort, pyupgrade

Configuration: pyproject.toml or ruff.toml

[tool.ruff]
line-length = 88
target-version = "py39"

[tool.ruff.lint]
select = [
    "E",   # pycodestyle errors
    "W",   # pycodestyle warnings
    "F",   # pyflakes
    "I",   # isort
    "N",   # pep8-naming
    "UP",  # pyupgrade
]
ignore = [
    "E501",  # line too long (handled by formatter)
]

Execution:

ruff check .
ruff check --fix .  # Auto-fix issues

Formatting

No Formatter: minim does not use black, autopep8, or similar formatters.

Style: Follows PEP 8 with manual formatting.

Line Length: Approximately 88 characters (black default), but not enforced.

Type Hints

Partial Coverage: Type hints used inconsistently.

Examples:

With Type Hints:

def search(self, query: str, types: list[str] = ["track"], limit: int = 20) -> dict:
    """Search Spotify catalog."""
    ...

Without Type Hints:

def _request(self, method, url, **kwargs):
    """Make HTTP request."""
    ...

No Type Checking: Does not use mypy or pyright for static type checking.

Recommendation for v2: Add comprehensive type hints and integrate mypy into CI.

Docstrings

Format: Google-style docstrings

Example:

def get_track(self, track_id: str, market: str = None) -> dict:
    """
    Get track details.
    
    Args:
        track_id: Spotify track ID
        market: ISO 3166-1 alpha-2 country code
    
    Returns:
        Track object with metadata
    
    Raises:
        RuntimeError: If API request fails
    
    Example:
        >>> api = WebAPI(client_id="...", client_secret="...")
        >>> track = api.get_track("3n3Ppam7vgaVa1iaRUc9Lp")
        >>> print(track["name"])
        Creep
    """
    params = {}
    if market:
        params["market"] = market
    
    return self._request("GET", f"/tracks/{track_id}", params=params)

Coverage: Most public methods have docstrings. Private methods (_request, _get_headers) often lack documentation.

Sphinx Integration: Docstrings parsed by Sphinx for ReadTheDocs documentation.

Code Patterns

API Client Pattern

Common Structure:

class API:
    def __init__(self, client_id=None, client_secret=None, access_token=None):
        # Load credentials from parameters, env vars, or config file
        self.client_id = client_id or os.getenv("SERVICE_CLIENT_ID")
        self.client_secret = client_secret or os.getenv("SERVICE_CLIENT_SECRET")
        self.access_token = access_token
        
        # Load from config file if not provided
        config = load_config()
        if config.has_section("service"):
            self.access_token = self.access_token or config.get("service", "access_token")
        
        # API base URL
        self.base_url = "https://api.service.com/v1"
    
    def set_flow(self, flow_type="authorization_code", **kwargs):
        """Configure OAuth flow."""
        self.flow_type = flow_type
        # Store flow-specific parameters
    
    def set_access_token(self, method="http.server"):
        """Obtain access token via OAuth flow."""
        # Implement OAuth flow
        # Save token to config file
    
    def _get_headers(self) -> dict:
        """Get HTTP headers with authentication."""
        return {"Authorization": f"Bearer {self.access_token}"}
    
    def _request(self, method: str, url: str, **kwargs) -> dict:
        """Make authenticated HTTP request."""
        if not url.startswith("http"):
            url = self.base_url + url
        
        headers = kwargs.pop("headers", {})
        headers.update(self._get_headers())
        
        response = requests.request(method, url, headers=headers, **kwargs)
        
        if not response.ok:
            raise RuntimeError(f"{method} {url} failed: {response.status_code}")
        
        return response.json()
    
    # Public API methods
    def search(self, query: str, **kwargs) -> dict:
        """Search catalog."""
        return self._request("GET", "/search", params={"q": query, **kwargs})
    
    def get_track(self, track_id: str) -> dict:
        """Get track details."""
        return self._request("GET", f"/tracks/{track_id}")

Consistency: All API clients (discogs.py, spotify.py, tidal.py, qobuz.py) follow this pattern with minor variations.

Audio File Pattern

Base Class with Subclasses:

class Audio:
    def __init__(self, filepath: str):
        self.filepath = filepath
        self._file = mutagen.File(filepath)
        
        # Auto-detect format and change class
        if isinstance(self._file, mutagen.flac.FLAC):
            self.__class__ = FLAC
        elif isinstance(self._file, mutagen.mp3.MP3):
            self.__class__ = MP3
        # ... etc
        
        self.read_metadata()
    
    def read_metadata(self):
        """Read metadata from file. Implemented by subclasses."""
        raise NotImplementedError
    
    def write_metadata(self):
        """Write metadata to file. Implemented by subclasses."""
        raise NotImplementedError

class FLAC(Audio):
    def read_metadata(self):
        self.title = self._file.get("TITLE", [None])[0]
        self.artist = self._file.get("ARTIST", [None])[0]
        # ... etc
    
    def write_metadata(self):
        self._file["TITLE"] = self.title
        self._file["ARTIST"] = self.artist
        # ... etc
        self._file.save()

Dynamic Class Change: self.__class__ = FLAC changes instance class after initialization. Unusual pattern but works for format auto-detection.

OAuth Callback Pattern

Three Implementations:

1. http.server:

def _listen_http_server(self):
    class CallbackHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = parse_qs(urlparse(self.path).query)
            self.server.authorization_code = query.get("code", [None])[0]
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Authorization successful. You may close this window.")
    
    server = HTTPServer(("localhost", 8888), CallbackHandler)
    server.handle_request()
    return server.authorization_code

2. Flask:

def _listen_flask(self):
    app = Flask(__name__)
    authorization_code = None
    
    @app.route("/callback")
    def callback():
        nonlocal authorization_code
        authorization_code = request.args.get("code")
        shutdown = request.environ.get("werkzeug.server.shutdown")
        if shutdown:
            shutdown()
        return "Authorization successful. You may close this window."
    
    app.run(port=8888)
    return authorization_code

3. Playwright:

def _automate_browser(self):
    from playwright.sync_api import sync_playwright
    
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        
        page.goto(self.auth_url)
        page.fill("#username", self.email)
        page.fill("#password", self.password)
        page.click("button[type=submit]")
        
        page.wait_for_url(f"{self.redirect_uri}*")
        code = parse_qs(urlparse(page.url).query)["code"][0]
        
        browser.close()
        return code

Flexibility: Users choose callback method based on environment (headless server, desktop, etc.).

Code Quality Issues

Large Monolithic Files

Problem: tidal.py is 12,338 lines (34% of codebase).

Impact:

Difficult to navigate
Slow to load in editors
Hard to maintain
Merge conflicts more likely

Recommendation: Split into submodules:

minim/tidal/
├── __init__.py
├── auth.py          # Authentication
├── catalog.py       # Catalog endpoints
├── streaming.py     # Streaming URLs
├── lyrics.py        # Lyrics endpoints
├── user.py          # User library
└── models.py        # Data models

Generic Error Handling

Problem: All errors are RuntimeError with string messages.

Impact:

Caller must parse error messages to determine cause
No structured error handling
Difficult to distinguish error types

Recommendation: Define custom exceptions:

class MinimError(Exception):
    """Base exception for minim."""

class APIError(MinimError):
    """API request failed."""
    def __init__(self, status_code: int, message: str):
        self.status_code = status_code
        self.message = message
        super().__init__(f"API error {status_code}: {message}")

class AuthenticationError(MinimError):
    """Authentication failed."""

class RateLimitError(APIError):
    """Rate limit exceeded."""
    def __init__(self, retry_after: int):
        self.retry_after = retry_after
        super().__init__(429, f"Rate limit exceeded. Retry after {retry_after}s")

No Rate Limiting

Problem: No built-in rate limiting. Caller responsible for tracking.

Impact:

Easy to exceed service rate limits
No automatic backoff
Tests may fail due to rate limiting

Recommendation: Implement rate limiter:

from time import time, sleep

class RateLimiter:
    def __init__(self, requests_per_minute: int):
        self.requests_per_minute = requests_per_minute
        self.requests = []
    
    def wait_if_needed(self):
        now = time()
        # Remove requests older than 1 minute
        self.requests = [t for t in self.requests if now - t < 60]
        
        if len(self.requests) >= self.requests_per_minute:
            sleep_time = 60 - (now - self.requests[0])
            if sleep_time > 0:
                sleep(sleep_time)
        
        self.requests.append(time())

# Usage in API client
class API:
    def __init__(self):
        self.rate_limiter = RateLimiter(60)  # 60 requests per minute
    
    def _request(self, method, url, **kwargs):
        self.rate_limiter.wait_if_needed()
        # Make request

Plain Text Token Storage

Problem: Tokens stored unencrypted in ~/minim.cfg.

Impact:

Security risk on shared systems
Tokens readable by any process
Passwords stored in plain text (Qobuz)

Recommendation: Use OS keychain:

import keyring

# Store token
keyring.set_password("minim", "spotify_access_token", access_token)

# Retrieve token
access_token = keyring.get_password("minim", "spotify_access_token")

Inconsistent Type Hints

Problem: Some functions have type hints, others don't.

Impact:

Reduced IDE autocomplete support
No static type checking
Harder to understand function signatures

Recommendation: Add comprehensive type hints and enable mypy:

from typing import Optional, Dict, List, Any

def search(
    self,
    query: str,
    types: List[str] = ["track"],
    limit: int = 20,
    offset: int = 0
) -> Dict[str, Any]:
    """Search catalog."""
    ...

Code Metrics

Complexity

Cyclomatic Complexity: Not measured. Likely moderate to high in large modules (tidal.py, spotify.py).

Recommendation: Use radon to measure complexity:

pip install radon
radon cc minim/ -a  # Average complexity
radon cc minim/ -n D  # Show functions with complexity > D (high)

Duplication

Code Duplication: Likely present across API clients (authentication, request handling).

Recommendation: Extract common patterns to base class:

class BaseAPI:
    def __init__(self, service_name: str):
        self.service_name = service_name
        self.load_credentials()
    
    def load_credentials(self):
        # Common credential loading logic
        ...
    
    def _request(self, method, url, **kwargs):
        # Common request handling
        ...

class SpotifyAPI(BaseAPI):
    def __init__(self):
        super().__init__("spotify")
        self.base_url = "https://api.spotify.com/v1"

Dependencies

Direct Dependencies: 3 (cryptography, mutagen, requests)

Optional Dependencies: 6 (ffmpeg, flask, levenshtein, numpy, pillow, playwright)

Dependency Graph: Flat (no transitive dependencies within minim modules).

Recommendation: Keep dependencies minimal. Current approach is good.

Summary

minim's codebase is well-structured for a personal project but shows signs of organic growth:

Strengths:

Consistent API client pattern across modules
Comprehensive test coverage with real API calls
Good documentation (docstrings, ReadTheDocs)
Minimal dependencies
CI/CD with GitHub Actions

Weaknesses:

Large monolithic files (tidal.py at 12K lines)
Generic error handling (all RuntimeError)
No rate limiting
Plain text token storage
Inconsistent type hints
No static type checking

Recommendations for v2:

Split large modules into subpackages
Define custom exception hierarchy
Implement rate limiting and backoff
Use OS keychain for token storage
Add comprehensive type hints
Integrate mypy for static type checking
Extract common patterns to base classes
Add code complexity and duplication metrics to CI

The codebase is production-ready for personal use but requires hardening for commercial or large-scale deployment. The v2 rewrite on the dev branch addresses many of these issues.

24 KiB Raw Blame History

minim: Codebase Analysis

Repository Structure

Code Organization

Package Initialization (__init__.py)

Utility Module (utility.py)

Configuration Management

Config File Format

Environment Variables

Logging and Error Handling

Logging

Error Handling

Testing Infrastructure

Test Framework

Coverage Configuration

Continuous Integration

Code Style

Linting

Formatting

Type Hints

Docstrings

Code Patterns

API Client Pattern

Audio File Pattern

OAuth Callback Pattern

Code Quality Issues

Large Monolithic Files

Generic Error Handling

No Rate Limiting

Plain Text Token Storage

Inconsistent Type Hints

Code Metrics

Complexity

Duplication

Dependencies

Summary

24 KiB

Raw Blame History

Package Initialization (`init.py`)

Utility Module (`utility.py`)