# minim: Data Management

## Data Storage Architecture

minim does **not use a database**. All data is either:

1. **Ephemeral:** API responses held in memory during execution
2. **Token Storage:** OAuth tokens and credentials persisted to `~/minim.cfg`
3. **Audio Metadata:** Written to audio file tags via mutagen

There is no SQL database, no NoSQL store, no caching layer, and no persistent data beyond configuration and audio files.

## Token Storage

### File Location

**Path:** `~/minim.cfg` (expands to user's home directory)

**Format:** INI-style configuration file via Python's `ConfigParser`

**Permissions:** Default file permissions (typically 0644 on Unix: owner read/write, group and others read-only)

**Security:** Plain text storage. No encryption, no obfuscation, no OS keychain integration.

### File Structure

```ini
[discogs]
consumer_key = Abcd1234Efgh5678
consumer_secret = IjklMnopQrstUvwx
access_token = YzabCdefGhijKlmn
access_token_secret = OpqrStuvWxyzAbcd

[qobuz]
app_id = 123456789
app_secret = abcdefghijklmnopqrstuvwxyz
email = user@example.com
password = MySecurePassword123
access_token = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
expires_at = 1672531200

[spotify]
client_id = 1234567890abcdef1234567890abcdef
client_secret = fedcba0987654321fedcba0987654321
redirect_uri = http://localhost:8888
access_token = BQDxK7...truncated...
refresh_token = AQBz3...truncated...
expires_at = 1672527600
scopes = user-library-read,playlist-read-private,user-read-playback-state

[tidal]
client_id = abcdefgh-1234-5678-90ab-cdefghijklmn
client_secret = ijklmnop-qrst-uvwx-yzab-cdefghijklmn
access_token = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
refresh_token = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
user_id = 12345678
country_code = US
expires_at = 1672534800
```

### Data Fields

**Common Fields (OAuth 2.0):**

- `client_id`: Application identifier
- `client_secret`: Application secret
- `access_token`: Bearer token for API requests
- `refresh_token`: Token for obtaining new access tokens
- `expires_at`: Unix timestamp when access token expires

**Service-Specific Fields:**

**Discogs (OAuth 1.0a):**

- `consumer_key`: OAuth consumer key
- `consumer_secret`: OAuth consumer secret
- `access_token`: OAuth access token
- `access_token_secret`: OAuth access token secret
- `personal_access_token`: Alternative to OAuth (from Discogs settings)

**Qobuz:**

- `app_id`: Qobuz application ID (extracted from web player)
- `app_secret`: Qobuz application secret (extracted from web player)
- `email`: User email for password grant
- `password`: User password (stored in plain text)

**Spotify:**

- `redirect_uri`: OAuth redirect URI
- `scopes`: Comma-separated list of permission scopes

**TIDAL:**

- `user_id`: TIDAL user ID (numeric)
- `country_code`: Two-letter country code for content availability

### Read/Write Operations

**Reading:**

```python
from configparser import ConfigParser
import os

config = ConfigParser()
config.read(os.path.expanduser("~/minim.cfg"))  # a missing file is silently ignored

if config.has_section("spotify"):
    access_token = config.get("spotify", "access_token", fallback=None)
    refresh_token = config.get("spotify", "refresh_token", fallback=None)
    expires_at = config.getint("spotify", "expires_at", fallback=0)
```

**Writing:**

```python
import os
import time
from configparser import ConfigParser

# new_access_token and new_refresh_token come from the service's token response
config = ConfigParser()
config.read(os.path.expanduser("~/minim.cfg"))

if not config.has_section("spotify"):
    config.add_section("spotify")

config.set("spotify", "access_token", new_access_token)
config.set("spotify", "refresh_token", new_refresh_token)
config.set("spotify", "expires_at", str(int(time.time()) + 3600))

# Rewrites the whole file, including the sections that were not changed
with open(os.path.expanduser("~/minim.cfg"), "w") as f:
    config.write(f)
```

**Concurrency:** Not thread-safe.
Concurrent writes from multiple processes can corrupt the file: there is no file locking and no atomic write.

### Security Implications

**Risks:**

1. **Plain Text Passwords:** Qobuz passwords stored unencrypted
2. **Token Exposure:** Access tokens readable by any process running as the user
3. **No Expiration Cleanup:** Expired tokens remain in file indefinitely
4. **File Permissions:** Default permissions may allow group/other read access

**Mitigations (Not Implemented):**

- Protect sensitive fields with the OS keychain (Keyring, Keychain Access, Windows Credential Manager)
- Set restrictive file permissions (0600, user-only read/write)
- Use environment variables for sensitive credentials
- Implement token rotation and cleanup

**Recommendation:** For production use, replace file-based storage with secure credential management (AWS Secrets Manager, HashiCorp Vault, OS keychain).

## Audio Metadata Storage

### Tag Formats

minim writes metadata to audio files using format-specific tag systems:

| Format | Tag System | Implementation |
|--------|------------|----------------|
| FLAC | Vorbis Comments | `mutagen.flac.FLAC` |
| MP3 | ID3v2.4 | `mutagen.id3.ID3` |
| MP4/M4A | MP4 Atoms | `mutagen.mp4.MP4` |
| Ogg Vorbis | Vorbis Comments | `mutagen.oggvorbis.OggVorbis` |
| WAVE | ID3v2 (non-standard) | `mutagen.wave.WAVE` |

### Field Mapping

**FLAC (Vorbis Comments):**

```
TITLE = Track title
ARTIST = Primary artist(s)
ALBUMARTIST = Album artist
ALBUM = Album title
DATE = Release date (YYYY-MM-DD or YYYY)
GENRE = Genre
TRACKNUMBER = Track number
DISCNUMBER = Disc number
ISRC = International Standard Recording Code
BARCODE = UPC/EAN barcode
LYRICS = Song lyrics
COMMENT = Freeform comment
COPYRIGHT = Copyright notice
PICTURE block = Embedded artwork (native FLAC metadata block, not a Vorbis comment)
```

**MP3 (ID3v2.4):**

```
TIT2 = Track title
TPE1 = Primary artist(s)
TPE2 = Album artist
TALB = Album title
TDRC = Release date
TCON = Genre
TRCK = Track number (format: "3" or "3/12")
TPOS = Disc number (format: "1" or "1/2")
TSRC = ISRC
TXXX:BARCODE = UPC/EAN barcode (custom frame)
USLT = Unsynchronized lyrics
COMM = Comment
TCOP = Copyright
APIC = Attached picture (artwork)
```

**MP4 (Atoms):**

```
©nam = Track title
©ART = Primary artist(s)
aART = Album artist
©alb = Album title
©day = Release date
©gen = Genre
trkn = Track number (tuple: (track, total))
disk = Disc number (tuple: (disc, total))
----:com.apple.iTunes:ISRC = ISRC (custom atom)
----:com.apple.iTunes:BARCODE = UPC/EAN barcode
©lyr = Lyrics
©cmt = Comment
cprt = Copyright
covr = Cover art
```

**Ogg Vorbis (Vorbis Comments):** Same text fields as FLAC (both use Vorbis Comments). Artwork is stored as a base64-encoded `METADATA_BLOCK_PICTURE` comment rather than a native picture block.

**WAVE (ID3v2):** Same as MP3 (WAVE files can contain ID3v2 tags, though non-standard).

### Write Operations

**FLAC Example:**

```python
import mutagen.flac

audio = mutagen.flac.FLAC("track.flac")

# Text fields
audio["TITLE"] = "Creep"
audio["ARTIST"] = "Radiohead"
audio["ALBUM"] = "Pablo Honey"
audio["DATE"] = "1993"
audio["TRACKNUMBER"] = "2"
audio["DISCNUMBER"] = "1"
audio["ISRC"] = "GBAYE9200070"

# Artwork
picture = mutagen.flac.Picture()
picture.type = 3  # Front cover
picture.mime = "image/jpeg"
picture.desc = "Cover"
with open("cover.jpg", "rb") as f:  # close the image file after reading
    picture.data = f.read()
audio.add_picture(picture)

audio.save()
```

**MP3 Example:**

```python
from mutagen.id3 import ID3, TIT2, TPE1, TALB, TDRC, TRCK, APIC

audio = ID3("track.mp3")  # raises ID3NoHeaderError if the file has no ID3 tag

# encoding=3 selects UTF-8
audio["TIT2"] = TIT2(encoding=3, text="Creep")
audio["TPE1"] = TPE1(encoding=3, text="Radiohead")
audio["TALB"] = TALB(encoding=3, text="Pablo Honey")
audio["TDRC"] = TDRC(encoding=3, text="1993")
audio["TRCK"] = TRCK(encoding=3, text="2/12")

with open("cover.jpg", "rb") as f:
    # add() stores the frame under its full key ("APIC:Cover")
    audio.add(APIC(encoding=3, mime="image/jpeg", type=3, desc="Cover", data=f.read()))

audio.save()
```

**MP4 Example:**

```python
import mutagen.mp4

audio = mutagen.mp4.MP4("track.m4a")

audio["©nam"] = "Creep"
audio["©ART"] = "Radiohead"
audio["©alb"] = "Pablo Honey"
audio["©day"] = "1993"
audio["trkn"] = [(2, 12)]  # Track 2 of 12
audio["disk"] = [(1, 1)]  # Disc 1 of 1

with open("cover.jpg", "rb") as f:
    audio["covr"] = [
        mutagen.mp4.MP4Cover(f.read(), imageformat=mutagen.mp4.MP4Cover.FORMAT_JPEG)
    ]

audio.save()
```

### Read Operations

**Auto-Detection:**

```python
import mutagen

audio = mutagen.File("track.flac")

# Tag keys differ by format; use the lookup that matches the file type
title = audio.get("TITLE", [None])[0]  # FLAC/Ogg
title = audio.get("TIT2", None)        # MP3 (returns a TIT2 frame, not a str)
title = audio.get("©nam", [None])[0]   # MP4
```

**minim Abstraction:**

```python
from minim.audio import Audio

audio = Audio("track.flac")  # Auto-detects format

# Unified interface
print(audio.title)
print(audio.artist)
print(audio.album)
print(audio.track_number)
```

### Artwork Handling

**Fetching from API:**

```python
import requests

# spotify_api and tidal_api are already-authenticated client objects

# Spotify example
track = spotify_api.get_track("3n3Ppam7vgaVa1iaRUc9Lp")
artwork_url = track["album"]["images"][0]["url"]  # Largest image
artwork_data = requests.get(artwork_url).content

# TIDAL example
track = tidal_api.get_track(12345678)
cover_id = track["album"]["cover"].replace("-", "/")
artwork_url = f"https://resources.tidal.com/images/{cover_id}/1280x1280.jpg"
artwork_data = requests.get(artwork_url).content
```

**Embedding in File:**

```python
audio = Audio("track.flac")
audio.artwork = artwork_data  # bytes
audio.write_metadata()
```

**Image Formats:** JPEG and PNG supported by all tag formats. JPEG preferred for smaller file size.

**Size Considerations:** Embedded artwork is stored in every file that carries it, so large images (>1MB) noticeably increase file size. Recommendation: 600x600 to 1200x1200 pixels, JPEG quality 85-90%.

## Data Flow

### API Response to Audio File

**Complete Workflow:**

```python
from minim import spotify
from minim.audio import Audio

# 1. Authenticate
api = spotify.WebAPI(client_id="...", client_secret="...")
api.set_flow("client_credentials")
api.set_access_token()

# 2. Search for track
results = api.search("Radiohead Creep", types=["track"], limit=1)
track = results["tracks"]["items"][0]

# 3. Load audio file
audio = Audio("track.flac")

# 4. Map API response to metadata
audio.set_metadata_using_spotify(track)

# 5. Write to file
audio.write_metadata()
```

**Data Transformations:**

**Step 4 (Mapping):**

```python
import requests

def set_metadata_using_spotify(self, track_data: dict):
    # Direct mappings
    self.title = track_data["name"]
    self.album = track_data["album"]["name"]
    self.date = track_data["album"]["release_date"]
    self.track_number = track_data["track_number"]
    self.disc_number = track_data["disc_number"]

    # Array to string
    self.artist = ", ".join(a["name"] for a in track_data["artists"])

    # Nested object (None when the track has no ISRC)
    self.isrc = track_data.get("external_ids", {}).get("isrc")

    # Fetch external resource
    if track_data["album"]["images"]:
        artwork_url = track_data["album"]["images"][0]["url"]
        self.artwork = requests.get(artwork_url).content
```

**Step 5 (Writing):**

```python
# FLAC implementation
def write_metadata(self):
    self._file["TITLE"] = self.title
    self._file["ARTIST"] = self.artist
    self._file["ALBUM"] = self.album
    self._file["DATE"] = self.date
    self._file["TRACKNUMBER"] = str(self.track_number)
    self._file["DISCNUMBER"] = str(self.disc_number)

    if self.isrc:
        self._file["ISRC"] = self.isrc

    if self.artwork:
        picture = mutagen.flac.Picture()
        picture.data = self.artwork
        picture.type = 3  # Front cover
        picture.mime = "image/jpeg"  # assumes the artwork bytes are JPEG
        self._file.add_picture(picture)

    self._file.save()
```

### Service-Specific Normalization

**Artist Handling:**

**Spotify (array of objects):**

```json
{
  "artists": [
    {"name": "Radiohead", "id": "4Z8W4fKeB5YxbusRsdQVPb"},
    {"name": "Thom Yorke", "id": "3WrFJ7ztbogyGnTHbHJFl2"}
  ]
}
```

**Normalization:** `", ".join(a["name"] for a in artists)` → `"Radiohead, Thom Yorke"`

**TIDAL (array of objects):**

```json
{
  "artists": [
    {"name": "Radiohead", "id": 4050}
  ]
}
```

**Normalization:** Same as Spotify.

**iTunes (string):**

```json
{
  "artistName": "Radiohead"
}
```

**Normalization:** Direct assignment.

**Qobuz (object):**

```json
{
  "performer": {"name": "Radiohead", "id": 12345}
}
```

**Normalization:** `performer["name"]`

**Date Handling:**

**Spotify:**

- Full date: `"2023-01-15"` → `"2023-01-15"`
- Year only: `"2023"` → `"2023"`
- Month precision: `"2023-01"` → `"2023-01"`

**TIDAL:**

- ISO 8601 with time: `"2023-01-15T00:00:00.000Z"` → `"2023-01-15"` (strip time)

**iTunes:**

- ISO 8601: `"2023-01-15T00:00:00Z"` → `"2023-01-15"`

**Qobuz:**

- Unix timestamp: `1673740800` → `datetime.fromtimestamp(1673740800, tz=timezone.utc).strftime("%Y-%m-%d")` (without an explicit UTC timezone, the result depends on the local timezone and can fall on the previous day)
- ISO 8601: `"2023-01-15"` → `"2023-01-15"`

**Track/Disc Number Handling:**

**Spotify:**

```json
{
  "track_number": 3,
  "disc_number": 1
}
```

**Normalization:** Direct assignment.

**TIDAL:**

```json
{
  "trackNumber": 3,
  "volumeNumber": 1
}
```

**Normalization:** `track_number = trackNumber`, `disc_number = volumeNumber`

**iTunes:**

```json
{
  "trackNumber": 3,
  "trackCount": 12
}
```

**Normalization:** `track_number = trackNumber` (ignore `trackCount`)

**Qobuz:**

```json
{
  "track_number": 3,
  "media_number": 1
}
```

**Normalization:** `track_number` is assigned directly; `disc_number = media_number`.

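The per-service rules above can be collected into a few small helpers. The following is a minimal sketch, not minim's actual code: the function names `normalize_date`, `normalize_artists`, and `normalize_numbers` are hypothetical, and the field names are taken only from the JSON samples in this section.

```python
from datetime import datetime, timezone


def normalize_date(value):
    """Return a date string as YYYY, YYYY-MM, or YYYY-MM-DD (hypothetical helper)."""
    if isinstance(value, (int, float)):
        # Unix timestamp (Qobuz): interpret in UTC so the local timezone
        # cannot shift the calendar day.
        return datetime.fromtimestamp(value, tz=timezone.utc).strftime("%Y-%m-%d")
    # ISO 8601 with a time part (TIDAL, iTunes): keep only the date.
    # Partial dates such as "2023" or "2023-01" (Spotify) pass through unchanged.
    return value.split("T", 1)[0]


def normalize_artists(data):
    """Return a single artist string from any of the four response shapes."""
    if "artists" in data:        # Spotify, TIDAL: array of objects
        return ", ".join(a["name"] for a in data["artists"])
    if "artistName" in data:     # iTunes: plain string
        return data["artistName"]
    if "performer" in data:      # Qobuz: single object
        return data["performer"]["name"]
    raise KeyError("no recognised artist field")


def normalize_numbers(data):
    """Return (track_number, disc_number); either may be None if absent."""
    track = data.get("track_number", data.get("trackNumber"))
    disc = data.get("disc_number", data.get("volumeNumber", data.get("media_number")))
    return track, disc
```

Dispatching on which key is present keeps the sketch short; a real implementation would more likely dispatch on the known source service, since key names could collide between APIs.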
## Format Conversion

### FFmpeg Integration

**Conversion Workflow:**

```python
audio = Audio("track.flac")

# Convert to MP3
mp3_audio = audio.convert("track.mp3", "mp3", bitrate="320k")

# Convert to AAC
m4a_audio = audio.convert("track.m4a", "m4a", bitrate="256k")

# Convert to Ogg Vorbis
ogg_audio = audio.convert("track.ogg", "ogg", quality=10)
```

**FFmpeg Command Construction:**

```python
import subprocess

def convert(self, output_path: str, format: str, **options):
    cmd = ["ffmpeg", "-i", self.filepath]

    # Codec selection (an unknown format raises KeyError)
    codec_map = {
        "flac": "flac",
        "mp3": "libmp3lame",
        "m4a": "aac",
        "ogg": "libvorbis",
        "wav": "pcm_s16le"
    }
    cmd.extend(["-c:a", codec_map[format]])

    # Options
    if "bitrate" in options:
        cmd.extend(["-b:a", options["bitrate"]])
    if "quality" in options:
        cmd.extend(["-q:a", str(options["quality"])])
    if "sample_rate" in options:
        cmd.extend(["-ar", str(options["sample_rate"])])

    cmd.append(output_path)

    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(cmd, check=True)
```

**Metadata Preservation:**

```python
# After conversion, copy metadata
converted = Audio(output_path)
converted.title = self.title
converted.artist = self.artist
converted.album = self.album
# ... copy all fields
converted.artwork = self.artwork
converted.write_metadata()
```

**Lossy to Lossless:** Converting lossy formats (MP3, AAC) to lossless (FLAC) does not improve quality. The FLAC encoding step itself loses nothing, but the audio already discarded by the lossy source cannot be recovered, and the output file is typically larger.

**Lossless to Lossy:** Converting FLAC to MP3/AAC reduces file size but discards audio information. This is irreversible.

## Data Validation

**No Validation:** minim does not validate metadata before writing to files.

**Potential Issues:**

- Invalid dates (e.g., `"2023-13-45"`) written as-is
- Track numbers exceeding album track count
- Non-numeric values in numeric fields
- Oversized artwork (multi-megabyte images)

**Recommendation:** Implement validation layer:

```python
import warnings
from datetime import datetime

from minim.audio import Audio


def validate_metadata(audio: Audio):
    # Date validation: accept YYYY-MM-DD, YYYY-MM, or YYYY
    if audio.date:
        for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
            try:
                datetime.strptime(audio.date, fmt)
                break
            except ValueError:
                continue
        else:
            raise ValueError(f"Invalid date format: {audio.date}")

    # Track number validation (explicit None check so that 0 is rejected)
    if audio.track_number is not None and audio.track_number < 1:
        raise ValueError(f"Invalid track number: {audio.track_number}")

    # Artwork size validation
    if audio.artwork and len(audio.artwork) > 2 * 1024 * 1024:  # 2MB
        warnings.warn(f"Large artwork: {len(audio.artwork)} bytes")
```

## Data Retention

**Token Expiration:** Access tokens expire (typically after 1 hour for the OAuth 2.0 services). Refresh tokens are used to obtain new access tokens without re-authentication.

**Token Cleanup:** Expired tokens remain in `~/minim.cfg` indefinitely. No automatic cleanup.

**Audio Metadata:** Persists in files until overwritten or file deleted.

**API Response Caching:** Not implemented. Every request hits the API.

## Data Privacy

**Sensitive Data in Config File:**

- User passwords (Qobuz)
- Access tokens (all services)
- Refresh tokens (OAuth 2.0 services)
- User IDs and email addresses

**Exposure Risks:**

- Backup systems may copy `~/minim.cfg` to cloud storage
- Version control systems may accidentally commit config file
- Malware can read tokens and impersonate user

**Recommendations:**

1. Add `minim.cfg` to `.gitignore` if your home directory or dotfiles are under version control
2. Exclude from cloud backup or encrypt backups
3. Use environment variables for CI/CD
4. Rotate tokens regularly
5. Revoke tokens when no longer needed

## Summary

minim's data management is minimal and file-based:

- **No database:** All data is ephemeral or file-based
- **Token storage:** Plain text INI file at `~/minim.cfg`
- **Audio metadata:** Written to file tags via mutagen
- **No caching:** API responses not persisted
- **No validation:** Metadata written as-is without checks

This approach is simple and suitable for personal use but lacks security and robustness for production systems. The v2 rewrite addresses security concerns with OS keychain integration and adds validation layers.

For a metadata aggregator project, consider:

- Secure credential storage (OS keychain, secrets manager)
- Database for caching API responses (reduce API calls)
- Metadata validation before writing to files
- Audit logging for data access and modifications
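
Two of the gaps noted under Token Storage, the missing atomic write and the permissive default file mode, can be narrowed without changing the storage format. The following is a minimal sketch assuming a POSIX system; `write_config_atomically` is a hypothetical helper and not part of minim.

```python
import os
import tempfile
from configparser import ConfigParser


def write_config_atomically(config: ConfigParser, path: str) -> None:
    """Write config to path through a temporary file in the same directory."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable only by the current user.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".minim-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            config.write(f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        # On POSIX the rename is atomic: readers see the old or the new file,
        # never a partially written one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling `write_config_atomically(config, os.path.expanduser("~/minim.cfg"))` in place of the `open(..., "w")` block would leave the file with mode 0600. This prevents torn files, but it does not prevent lost updates when two processes read, modify, and write at the same time; that would still need a lock.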