
minim: Overview

Project Identity

Name: minim
Version: 1.1.0
License: GPL-3.0
Language: Python 3.9+
Type: Library (not a server or standalone application)
Repository: https://github.com/bbye98/minim
Author: Benjamin Ye
Documentation: https://minim.readthedocs.io

minim is a Python library for interacting with music streaming service APIs and managing audio file metadata. It provides unified interfaces to five major music platforms and tools for reading, writing, and converting audio metadata across multiple formats.

Architecture Type

minim is a library, not a server or service. Users import modules directly into Python code:

from minim import spotify, tidal, qobuz
from minim.audio import Audio

There is no HTTP API, no daemon process, no deployment infrastructure. The library runs in the caller's process space.

Core Dependencies

Required:

  • cryptography: TIDAL manifest decryption, secure token handling
  • mutagen: Audio file metadata reading/writing (ID3v2, Vorbis Comments, MP4 atoms)
  • requests: HTTP client for all API calls

Optional:

  • ffmpeg: Audio format conversion between codecs
  • flask: OAuth callback server (alternative to http.server)
  • levenshtein: Fuzzy string matching for search results
  • numpy: Audio analysis features
  • pillow: Image processing for album artwork
  • playwright: Browser automation for OAuth flows

The core library works with just the three required dependencies. Optional dependencies enable specific features but aren't mandatory for basic API access.
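A caller can check which optional features are usable before relying on them. The sketch below is not part of minim: the helper name `available_extras` is hypothetical, and the import names in the mapping (for example `PIL` for pillow) are assumptions about how each package is imported. FFmpeg is a command-line binary, so it is looked up on `PATH` instead.

```python
import shutil
from importlib.util import find_spec

# Assumed import names for the optional Python packages listed above.
OPTIONAL_IMPORTS = {
    "flask": "flask",
    "levenshtein": "Levenshtein",
    "numpy": "numpy",
    "pillow": "PIL",
    "playwright": "playwright",
}


def available_extras(imports=OPTIONAL_IMPORTS):
    """Return {name: bool} without actually importing the packages."""
    found = {pkg: find_spec(mod) is not None for pkg, mod in imports.items()}
    # ffmpeg is an external executable, not a Python module.
    found["ffmpeg"] = shutil.which("ffmpeg") is not None
    return found


if __name__ == "__main__":
    for name, ok in sorted(available_extras().items()):
        print(f"{name:<12} {'yes' if ok else 'no'}")
```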

Codebase Structure

Total Lines: 35,916 across 8 modules

Module Breakdown

| Module | Lines | Purpose |
|---|---|---|
| audio.py | 1,860 | Audio file handling, metadata reading/writing, format conversion |
| discogs.py | 5,501 | Discogs API client (database, marketplace, collection, wantlist) |
| itunes.py | 575 | iTunes Search API client (search, lookup) |
| qobuz.py | 5,579 | Qobuz API client (streaming, catalog, playlists, favorites) |
| spotify.py | 9,862 | Spotify Web API + private lyrics service |
| tidal.py | 12,338 | TIDAL public + private API (streaming, lyrics, credits) |
| utility.py | 136 | Shared utilities (config parsing, string formatting) |
| __init__.py | 65 | Package initialization, version info |

Observation: tidal.py is disproportionately large at 12,338 lines (34% of total codebase). This suggests either comprehensive API coverage or a need for refactoring into submodules.
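The percentages can be checked directly from the line counts in the module breakdown. This is a small worked calculation, not something shipped with minim:

```python
# Line counts copied from the module breakdown.
MODULE_LINES = {
    "audio.py": 1860,
    "discogs.py": 5501,
    "itunes.py": 575,
    "qobuz.py": 5579,
    "spotify.py": 9862,
    "tidal.py": 12338,
    "utility.py": 136,
    "__init__.py": 65,
}

total = sum(MODULE_LINES.values())
shares = {name: 100 * count / total for name, count in MODULE_LINES.items()}

# Print modules from largest to smallest share of the codebase.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {pct:5.1f}%")
```

The counts sum to the stated 35,916 lines, and tidal.py and spotify.py together account for roughly 62% of the codebase.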

API Client Coverage

minim provides clients for five music services:

1. Discogs

  • Auth: OAuth 1.0a (consumer key/secret + access token) or personal access token
  • Scope: Music database (artists, releases, labels), marketplace, user collection/wantlist
  • Rate Limits: 60 requests/minute (authenticated), 25/minute (unauthenticated)

2. iTunes Search API

  • Auth: None required
  • Scope: Public search and lookup across iTunes catalog
  • Rate Limits: 20 requests/minute (approximate)

3. Qobuz

  • Auth: Password grant OAuth (email/password)
  • Scope: Catalog search, streaming URLs, playlists, favorites
  • Special: Auto-extracts app_id and app_secret from web player JavaScript

4. Spotify

  • Auth: Four authorization flows (three standard OAuth 2.0 flows plus one undocumented cookie-based flow)
    • Authorization Code (full user access)
    • PKCE (mobile/desktop apps)
    • Client Credentials (app-only, no user context)
    • Web Player (via sp_dc cookie, undocumented)
  • Scope: Full Web API coverage (30+ permission scopes), private lyrics via Musixmatch integration
  • Special: Most comprehensive API client in the library

5. TIDAL

  • Auth: Public API (client credentials or PKCE) + Private API (additional endpoints)
  • Scope: Catalog, streaming URLs with quality selection, lyrics, credits
  • Quality Levels: LOW, HIGH, LOSSLESS, HI_RES, HI_RES_LOSSLESS
  • Special: Manifest decryption for streaming URLs using cryptography

Audio Format Support

minim handles five audio formats through the Audio class:

  • FLAC: Free Lossless Audio Codec, Vorbis Comments metadata
  • MP3: MPEG-1 Audio Layer III, ID3v2 tags
  • MP4/M4A: MPEG-4 Part 14, MP4 atom metadata
  • Ogg Vorbis: Ogg container, Vorbis Comments metadata
  • WAVE: Waveform Audio File Format, ID3v2 tags (non-standard but supported)

Format Conversion: FFmpeg integration allows transcoding between formats while preserving metadata.

Auto-Detection: The Audio class automatically detects format from file extension and magic bytes, instantiating the appropriate subclass (FLAC, MP3, MP4, OggVorbis, WAVE).
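Magic-byte detection of the kind described above can be sketched as follows. This is a heuristic illustration with a hypothetical function name, not the logic the Audio class actually uses. The signatures themselves are standard: `fLaC` for FLAC, `OggS` for Ogg, `RIFF....WAVE` for WAVE, an `ftyp` box at offset 4 for MP4, and an ID3v2 header or MPEG frame sync for MP3.

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess the format from the first 12 bytes of a file (heuristic)."""
    if header[:4] == b"fLaC":
        return "FLAC"
    if header[:4] == b"OggS":
        # Ogg container only; the codec inside (Vorbis, Opus, ...) is not checked.
        return "Ogg"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAVE"
    if header[4:8] == b"ftyp":
        return "MP4"
    # An ID3v2 tag usually means MP3, although other formats can carry one.
    if header[:3] == b"ID3":
        return "MP3"
    # MPEG audio frame sync: 11 set bits. This can also match other MPEG audio.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "MP3"
    return None
```

In practice the result would be combined with the file extension, since the MP3 checks in particular are ambiguous.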

Authentication Pattern

All API clients that require authentication (every service except the iTunes Search API) follow a consistent flow:

  1. Initialization: __init__() checks for existing tokens in ~/minim.cfg
  2. Flow Selection: set_flow() configures OAuth flow type (auth code, PKCE, client credentials, etc.)
  3. Token Acquisition: set_access_token() performs OAuth handshake or uses provided credentials
  4. Persistence: Tokens saved to ~/minim.cfg via ConfigParser
  5. Auto-Refresh: Expired tokens automatically refreshed on next API call

Token Storage Location: ~/minim.cfg (plain text ConfigParser file)

Security Note: Tokens are stored in plain text, with no encryption and no OS keychain integration. This is acceptable for personal use but not for production systems that handle user credentials.
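The persistence and expiry steps can be illustrated with the standard library's `configparser`. This is a sketch under assumptions, not minim's code: the function names and the `expires_at` key are hypothetical, and restricting the file to owner-only permissions is an extra precaution that the document does not attribute to minim.

```python
import configparser
import os
import time


def save_token(path, service, access_token, expires_in):
    """Write a token and its absolute expiry time under [service]."""
    # interpolation=None so that '%' characters in tokens are stored verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is silently skipped
    if not config.has_section(service):
        config.add_section(service)
    config[service]["access_token"] = access_token
    config[service]["expires_at"] = str(int(time.time()) + int(expires_in))
    with open(path, "w") as f:
        config.write(f)
    os.chmod(path, 0o600)  # owner-only on POSIX; limited effect on Windows


def load_token(path, service, now=None):
    """Return the stored token, or None if it is missing or expired."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)
    if not config.has_option(service, "access_token"):
        return None
    now = time.time() if now is None else now
    if config.getfloat(service, "expires_at", fallback=0.0) <= now:
        return None  # caller should refresh or re-authenticate
    return config.get(service, "access_token")
```

A `None` result is the point at which a client would run its refresh or full authorization flow and then call `save_token()` again.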

Request Handling Pattern

All API modules use a common _request() method:

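import requests

def _request(self, method: str, url: str, **kwargs) -> dict:
    # Send the request with the service-specific authentication headers.
    response = requests.request(method, url, headers=self._get_headers(), **kwargs)
    # Any 4xx or 5xx status is surfaced as a generic RuntimeError.
    if not response.ok:
        raise RuntimeError(f"API error: {response.status_code} {response.text}")
    return response.json()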

Error Handling: All API errors raise RuntimeError with status code and response text. No typed exceptions, no retry logic, no rate limit handling.

Headers: Each service implements _get_headers() to inject authentication (Bearer tokens, OAuth signatures, API keys).
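To make the cost of the generic RuntimeError concrete, here is one way a caller (or a fork) could layer typed exceptions on top of the same status codes. All class and function names are hypothetical; they are not taken from minim v1 or from the v2 branch. The base class subclasses RuntimeError so that existing `except RuntimeError` handlers keep working.

```python
class APIError(RuntimeError):
    """Base error carrying the HTTP status code and response text."""

    def __init__(self, status_code, text=""):
        super().__init__(f"API error: {status_code} {text}")
        self.status_code = status_code
        self.text = text


class AuthenticationError(APIError):
    """401 or 403: missing, invalid, or insufficient credentials."""


class NotFoundError(APIError):
    """404: the requested resource does not exist."""


class RateLimitError(APIError):
    """429: the caller should back off before retrying."""


class ServerError(APIError):
    """5xx: a failure on the service side, often worth retrying."""


_BY_STATUS = {
    401: AuthenticationError,
    403: AuthenticationError,
    404: NotFoundError,
    429: RateLimitError,
}


def error_for(status_code, text=""):
    """Build the most specific exception instance for a status code."""
    if status_code in _BY_STATUS:
        cls = _BY_STATUS[status_code]
    elif 500 <= status_code < 600:
        cls = ServerError
    else:
        cls = APIError
    return cls(status_code, text)
```

With this in place, retry logic can catch `RateLimitError` and `ServerError` specifically instead of parsing the message of a RuntimeError.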

Metadata Mapping

The Audio class provides service-specific metadata setters:

  • set_metadata_using_itunes(data: dict)
  • set_metadata_using_qobuz(data: dict)
  • set_metadata_using_spotify(data: dict)
  • set_metadata_using_tidal(data: dict)

Each method maps service-specific JSON responses to standardized audio file tags:

  • album: Album title
  • artist: Primary artist(s)
  • title: Track title
  • isrc: International Standard Recording Code
  • artwork: Album cover image (bytes)
  • date: Release date
  • genre: Genre classification
  • track_number: Position in album
  • disc_number: Disc number (for multi-disc albums)

Normalization: Handles differences in field names, date formats, artist arrays vs. strings, and artwork URL fetching.
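As an illustration of this kind of mapping, the function below converts a Spotify Web API track object into the standardized tag names listed above. It is a simplified sketch with a hypothetical name, not the body of `set_metadata_using_spotify()`. Artwork and genre are left out: artwork requires fetching an image URL, and the track object itself does not carry a genre.

```python
def tags_from_spotify_track(track: dict) -> dict:
    """Map a Spotify track object to standardized tag names."""
    album = track.get("album", {})
    return {
        "album": album.get("name"),
        # Spotify returns a list of artist objects; keep just the names.
        "artist": [a["name"] for a in track.get("artists", [])],
        "title": track.get("name"),
        "isrc": track.get("external_ids", {}).get("isrc"),
        # May be "YYYY", "YYYY-MM", or "YYYY-MM-DD" depending on precision.
        "date": album.get("release_date"),
        "track_number": track.get("track_number"),
        "disc_number": track.get("disc_number"),
    }
```

Each other service would need its own mapping function, since the field names and nesting differ per API.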

Configuration Management

File: ~/minim.cfg
Format: INI-style via Python's ConfigParser

Structure:

[discogs]
access_token = ...
access_token_secret = ...

[qobuz]
email = user@example.com
password = ...
access_token = ...

[spotify]
client_id = ...
client_secret = ...
access_token = ...
refresh_token = ...

[tidal]
client_id = ...
access_token = ...
refresh_token = ...

Environment Variables: Each service also checks for credentials in environment variables (e.g., SPOTIFY_CLIENT_ID, TIDAL_CLIENT_SECRET). Environment variables take precedence over config file.
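The precedence rule can be expressed in a few lines. This is a sketch, not minim's lookup code: the function name is hypothetical, and the `SERVICE_KEY` naming scheme is inferred from the two example variable names above.

```python
import configparser
import os


def resolve_credential(service, key, config, environ=os.environ):
    """Return a credential, preferring the environment over the config file."""
    # Assumed naming scheme: ("spotify", "client_id") -> SPOTIFY_CLIENT_ID
    env_name = f"{service}_{key}".upper()
    value = environ.get(env_name)
    if value:
        return value
    # Fall back to the [service] section; None if section or key is absent.
    return config.get(service, key, fallback=None)


if __name__ == "__main__":
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read(os.path.expanduser("~/minim.cfg"))
    print(resolve_credential("spotify", "client_id", cfg) is not None)
```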

OAuth Callback Methods

minim supports three methods for handling OAuth redirect URIs:

1. http.server (default)

  • Spawns temporary HTTP server on localhost
  • Listens for OAuth callback
  • Extracts authorization code from query parameters
  • Shuts down after receiving callback

2. Flask

  • Uses Flask development server for callback handling
  • Same flow as http.server but with Flask routing
  • Requires flask optional dependency

3. Playwright

  • Launches headless browser
  • Automates login flow
  • Intercepts redirect URL
  • Extracts authorization code
  • Requires playwright optional dependency

Use Case: Playwright is useful for services that require a CAPTCHA or a complex login flow. Flask or http.server is sufficient for standard OAuth.
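The http.server approach can be sketched with the standard library alone. The code below is a generic illustration rather than minim's implementation: the names, the port, and the response text are assumptions. A real client should also verify the `state` parameter before trusting the code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit


def extract_code(request_path):
    """Return the `code` query parameter from a redirect path, or None."""
    params = parse_qs(urlsplit(request_path).query)
    values = params.get("code")
    return values[0] if values else None


class CallbackHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Stash the code on the server object so the caller can read it.
        self.server.auth_code = extract_code(self.path)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"Authorization received. You can close this tab.")

    def log_message(self, format, *args):
        pass  # suppress per-request logging to stderr


def wait_for_code(port=8888, host="127.0.0.1"):
    """Serve exactly one request on localhost, then return the captured code."""
    with HTTPServer((host, port), CallbackHandler) as server:
        server.auth_code = None
        server.handle_request()  # blocks until the browser is redirected back
        return server.auth_code
```

The redirect URI registered with the service would then have to match the host and port used here, for example `http://127.0.0.1:8888/callback`.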

Testing Infrastructure

Framework: pytest
Coverage: 6 test files, one per major module
Test Style: Class-based with setup_class() for authentication
API Calls: Tests make real API calls (not mocked)
CI Environment: GitHub Actions with Ubuntu, Python 3.9, FFmpeg installed

Coverage Configuration: .coveragerc excludes test files and __init__.py from coverage reports.

Limitation: Real API calls in CI require valid credentials. Tests may fail if rate limits are exceeded or a service changes its API.

Development Status

Current Version: 1.1.0 (maintenance mode)
Active Development: v2 rewrite on dev branch
Maintenance: Bug fixes and security updates only for v1

v2 Changes (from dev branch inspection):

  • Modular architecture (split large files like tidal.py)
  • Typed exceptions instead of generic RuntimeError
  • Rate limiting built-in
  • Async support via aiohttp
  • Secure token storage via OS keychain
  • PyPI publication planned

Distribution

Current: Install from source only

git clone https://github.com/bbye98/minim.git
cd minim
python -m pip install -e .

Conda: environment.yml provided for conda users

PyPI: Not yet published (planned for v2)

Documentation: Auto-deployed to ReadTheDocs on every push to main branch

Use Cases

minim is designed for:

  1. Personal Music Library Management: Fetch metadata from streaming services, write to local audio files
  2. Playlist Synchronization: Export playlists from one service, import to another
  3. Audio File Tagging: Bulk metadata updates using authoritative sources
  4. Music Discovery: Search across multiple services, compare results
  5. Streaming URL Extraction: Download tracks from Qobuz/TIDAL (within terms of service)

Not Designed For:

  • Production web services (no rate limiting, plain text tokens)
  • Real-time streaming (no playback engine)
  • Large-scale automation (no async, no connection pooling)

Key Strengths

  1. Unified Interface: Consistent API across five different services
  2. Comprehensive Coverage: Implements most endpoints for each service
  3. Automatic Token Management: Caching, refresh, persistence handled transparently
  4. Audio Metadata Integration: Direct mapping from API responses to file tags
  5. Multiple OAuth Flows: Flexibility in authentication methods
  6. Pure Python: Minimal dependencies, easy to install and modify

Key Limitations

  1. GPL-3.0 License: Copyleft requires derivative works to be GPL-3.0
  2. Plain Text Token Storage: Security risk for shared systems
  3. No Rate Limiting: Caller responsible for respecting API limits
  4. Generic Error Handling: All errors are RuntimeError, no typed exceptions
  5. Synchronous Only: No async support in v1
  6. Private API Dependency: Qobuz, Spotify lyrics, TIDAL private endpoints can break without notice
  7. Monolithic Files: tidal.py at 12K lines is difficult to navigate and maintain

Integration Potential

For a metadata aggregator project, minim provides:

  • Reference Implementations: OAuth flows for each service
  • Token Management Pattern: Config file caching, auto-refresh logic
  • Metadata Normalization: Field mapping from service-specific to standardized schemas
  • Audio File Handling: Reading/writing tags across formats

Reusability: Code can be extracted and adapted (respecting GPL-3.0). The authentication patterns, request handling, and metadata mapping are particularly valuable.

Caution: Private API usage (Qobuz app_id extraction, Spotify lyrics, TIDAL private endpoints) may violate terms of service. Use only documented public APIs in production systems.

Conclusion

minim is a mature, feature-rich library for music service API integration and audio metadata management. It excels at providing unified access to multiple streaming platforms with automatic authentication handling. The codebase demonstrates solid engineering practices (testing, documentation, CI/CD) but shows signs of organic growth (large monolithic files, generic error handling).

For personal projects and research, minim is ready to use as-is. For commercial or large-scale use, the v2 rewrite aims to address the main limitations (async, rate limiting, secure storage, typed exceptions). The GPL-3.0 license requires careful consideration for proprietary projects.

As a reference implementation, minim is invaluable for understanding how to integrate with music streaming APIs, handle OAuth flows, and normalize metadata across services.