feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
@@ -0,0 +1,664 @@
# minim: Data Management

## Data Storage Architecture

minim does **not use a database**. All data is either:

1. **Ephemeral:** API responses held in memory during execution
2. **Token Storage:** OAuth tokens persisted to `~/minim.cfg`
3. **Audio Metadata:** Written to audio file tags via mutagen

There is no SQL database, no NoSQL store, no caching layer, and no persistent data beyond configuration and audio files.

## Token Storage

### File Location

**Path:** `~/minim.cfg` (expands to user's home directory)

**Format:** INI-style configuration file via Python's `ConfigParser`

**Permissions:** Default file permissions (typically 0644 on Unix with the common umask of 022: writable by the owner, readable by group and others)

**Security:** Plain text storage. No encryption, no obfuscation, no OS keychain integration.

### File Structure

```ini
[discogs]
consumer_key = Abcd1234Efgh5678
consumer_secret = IjklMnopQrstUvwx
access_token = YzabCdefGhijKlmn
access_token_secret = OpqrStuvWxyzAbcd

[qobuz]
app_id = 123456789
app_secret = abcdefghijklmnopqrstuvwxyz
email = user@example.com
password = MySecurePassword123
access_token = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
expires_at = 1672531200

[spotify]
client_id = 1234567890abcdef1234567890abcdef
client_secret = fedcba0987654321fedcba0987654321
redirect_uri = http://localhost:8888
access_token = BQDxK7...truncated...
refresh_token = AQBz3...truncated...
expires_at = 1672527600
scopes = user-library-read,playlist-read-private,user-read-playback-state

[tidal]
client_id = abcdefgh-1234-5678-90ab-cdefghijklmn
client_secret = ijklmnop-qrst-uvwx-yzab-cdefghijklmn
access_token = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
refresh_token = eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
user_id = 12345678
country_code = US
expires_at = 1672534800
```

### Data Fields

**Common Fields (OAuth 2.0):**

- `client_id`: Application identifier
- `client_secret`: Application secret
- `access_token`: Bearer token for API requests
- `refresh_token`: Token for obtaining new access tokens
- `expires_at`: Unix timestamp when access token expires

**Service-Specific Fields:**

**Discogs (OAuth 1.0a):**

- `consumer_key`: OAuth consumer key
- `consumer_secret`: OAuth consumer secret
- `access_token`: OAuth access token
- `access_token_secret`: OAuth access token secret
- `personal_access_token`: Alternative to OAuth (from Discogs settings)

**Qobuz:**

- `app_id`: Qobuz application ID (extracted from web player)
- `app_secret`: Qobuz application secret (extracted from web player)
- `email`: User email for password grant
- `password`: User password (stored in plain text)

**Spotify:**

- `redirect_uri`: OAuth redirect URI
- `scopes`: Comma-separated list of permission scopes

**TIDAL:**

- `user_id`: TIDAL user ID (numeric)
- `country_code`: Two-letter country code for content availability

### Read/Write Operations

**Reading:**

```python
from configparser import ConfigParser
import os

config = ConfigParser()
config.read(os.path.expanduser("~/minim.cfg"))

if config.has_section("spotify"):
    access_token = config.get("spotify", "access_token", fallback=None)
    refresh_token = config.get("spotify", "refresh_token", fallback=None)
    expires_at = config.getint("spotify", "expires_at", fallback=0)
```

**Writing:**

```python
import os
import time
from configparser import ConfigParser

# new_access_token / new_refresh_token come from the token endpoint response
config = ConfigParser()
config.read(os.path.expanduser("~/minim.cfg"))

if not config.has_section("spotify"):
    config.add_section("spotify")

config.set("spotify", "access_token", new_access_token)
config.set("spotify", "refresh_token", new_refresh_token)
config.set("spotify", "expires_at", str(int(time.time()) + 3600))

with open(os.path.expanduser("~/minim.cfg"), "w") as f:
    config.write(f)
```

**Concurrency:** Neither thread-safe nor process-safe. There is no file locking and no atomic write, so concurrent writers can corrupt the file or silently overwrite each other's changes.

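The missing atomic write could be approached as in the sketch below. It is not part of minim: `write_config_atomically` is a hypothetical helper that writes to a temporary file in the same directory and then swaps it into place with `os.replace`, so a reader never sees a half-written file.

```python
import os
import tempfile
from configparser import ConfigParser


def write_config_atomically(config: ConfigParser, path: str) -> None:
    """Write config to a temp file next to path, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable by the owner only.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".minim-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            config.write(tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic when both paths are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

This only protects against torn writes. Two processes that each read, modify, and write the file can still lose one another's updates unless a lock is added as well.
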
### Security Implications

**Risks:**

1. **Plain Text Passwords:** Qobuz passwords stored unencrypted
2. **Token Exposure:** Access tokens readable by any process running as the user
3. **No Expiration Cleanup:** Expired tokens remain in the file indefinitely
4. **File Permissions:** Default permissions may allow group/other read access

**Mitigations (Not Implemented):**

- Store sensitive fields in the OS credential store (macOS Keychain, Windows Credential Manager, or a Linux keyring) instead of the config file
- Set restrictive file permissions (0600, user-only read/write)
- Use environment variables for sensitive credentials
- Implement token rotation and cleanup

**Recommendation:** For production use, replace file-based storage with secure credential management (AWS Secrets Manager, HashiCorp Vault, OS keychain).

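As an illustration of the file-permission mitigation, the sketch below tightens an existing config file to mode 0600. `restrict_to_owner` is a hypothetical helper, not a minim function, and it assumes a POSIX system (on Windows, `os.chmod` only toggles the read-only flag).

```python
import os
import stat


def restrict_to_owner(path: str) -> bool:
    """Set mode 0600 if any group/other bit is set. Returns True if changed."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        # Owner read/write only.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
        return True
    return False
```

A caller might run `restrict_to_owner(os.path.expanduser("~/minim.cfg"))` once after each write.
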
## Audio Metadata Storage

### Tag Formats

minim writes metadata to audio files using format-specific tag systems:

| Format | Tag System | Implementation |
|--------|------------|----------------|
| FLAC | Vorbis Comments | `mutagen.flac.FLAC` |
| MP3 | ID3v2.4 | `mutagen.id3.ID3` |
| MP4/M4A | MP4 Atoms | `mutagen.mp4.MP4` |
| Ogg Vorbis | Vorbis Comments | `mutagen.oggvorbis.OggVorbis` |
| WAVE | ID3v2 (non-standard) | `mutagen.wave.WAVE` |

### Field Mapping

**FLAC (Vorbis Comments):**

```
TITLE = Track title
ARTIST = Primary artist(s)
ALBUMARTIST = Album artist
ALBUM = Album title
DATE = Release date (YYYY-MM-DD or YYYY)
GENRE = Genre
TRACKNUMBER = Track number
DISCNUMBER = Disc number
ISRC = International Standard Recording Code
BARCODE = UPC/EAN barcode
LYRICS = Song lyrics
COMMENT = Freeform comment
COPYRIGHT = Copyright notice
METADATA_BLOCK_PICTURE = Embedded artwork (base64-encoded)
```

Note that mutagen's `FLAC.add_picture` stores artwork in a native FLAC `PICTURE` metadata block; the base64-encoded `METADATA_BLOCK_PICTURE` comment is the convention used by Ogg Vorbis.

**MP3 (ID3v2.4):**

```
TIT2 = Track title
TPE1 = Primary artist(s)
TPE2 = Album artist
TALB = Album title
TDRC = Release date
TCON = Genre
TRCK = Track number (format: "3" or "3/12")
TPOS = Disc number (format: "1" or "1/2")
TSRC = ISRC
TXXX:BARCODE = UPC/EAN barcode (custom frame)
USLT = Unsynchronized lyrics
COMM = Comment
TCOP = Copyright
APIC = Attached picture (artwork)
```

**MP4 (Atoms):**

```
©nam = Track title
©ART = Primary artist(s)
aART = Album artist
©alb = Album title
©day = Release date
©gen = Genre
trkn = Track number (tuple: (track, total))
disk = Disc number (tuple: (disc, total))
----:com.apple.iTunes:ISRC = ISRC (custom atom)
----:com.apple.iTunes:BARCODE = UPC/EAN barcode
©lyr = Lyrics
©cmt = Comment
cprt = Copyright
covr = Cover art
```

**Ogg Vorbis (Vorbis Comments):**

Same as FLAC (both use Vorbis Comments).

**WAVE (ID3v2):**

Same as MP3 (WAVE files can contain ID3v2 tags, though non-standard).

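The mappings above can be collapsed into a single lookup table. The sketch below is illustrative only: `TAG_KEYS`, `TAG_SYSTEM`, and `tag_key` are hypothetical names, not minim's API, and only a subset of fields is transcribed.

```python
# Logical field -> tag key per tag system, transcribed from the mappings above.
TAG_KEYS = {
    "title": {"vorbis": "TITLE", "id3": "TIT2", "mp4": "©nam"},
    "artist": {"vorbis": "ARTIST", "id3": "TPE1", "mp4": "©ART"},
    "album_artist": {"vorbis": "ALBUMARTIST", "id3": "TPE2", "mp4": "aART"},
    "album": {"vorbis": "ALBUM", "id3": "TALB", "mp4": "©alb"},
    "date": {"vorbis": "DATE", "id3": "TDRC", "mp4": "©day"},
    "isrc": {"vorbis": "ISRC", "id3": "TSRC", "mp4": "----:com.apple.iTunes:ISRC"},
}

# Container format -> tag system, following the Tag Formats table.
TAG_SYSTEM = {
    "flac": "vorbis",
    "ogg": "vorbis",
    "mp3": "id3",
    "wav": "id3",
    "m4a": "mp4",
    "mp4": "mp4",
}


def tag_key(field: str, file_format: str) -> str:
    """Return the tag key that stores `field` in files of `file_format`."""
    system = TAG_SYSTEM[file_format.lower()]
    return TAG_KEYS[field][system]
```

For example, `tag_key("title", "flac")` yields the Vorbis key and `tag_key("title", "mp3")` the ID3 frame name, which keeps format-specific strings out of the calling code.
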
### Write Operations

**FLAC Example:**

```python
import mutagen.flac

audio = mutagen.flac.FLAC("track.flac")

# Text fields
audio["TITLE"] = "Creep"
audio["ARTIST"] = "Radiohead"
audio["ALBUM"] = "Pablo Honey"
audio["DATE"] = "1993"
audio["TRACKNUMBER"] = "2"
audio["DISCNUMBER"] = "1"
audio["ISRC"] = "GBAYE9200070"

# Artwork
picture = mutagen.flac.Picture()
picture.type = 3  # Front cover
picture.mime = "image/jpeg"
picture.desc = "Cover"
with open("cover.jpg", "rb") as f:  # Close the file handle after reading
    picture.data = f.read()
audio.add_picture(picture)

audio.save()
```

**MP3 Example:**

```python
from mutagen.id3 import ID3, TIT2, TPE1, TALB, TDRC, TRCK, APIC

audio = ID3("track.mp3")  # Raises ID3NoHeaderError if the file has no ID3 tag

# encoding=3 selects UTF-8
audio["TIT2"] = TIT2(encoding=3, text="Creep")
audio["TPE1"] = TPE1(encoding=3, text="Radiohead")
audio["TALB"] = TALB(encoding=3, text="Pablo Honey")
audio["TDRC"] = TDRC(encoding=3, text="1993")
audio["TRCK"] = TRCK(encoding=3, text="2/12")

with open("cover.jpg", "rb") as f:
    cover_data = f.read()

# add() stores the frame under its full key ("APIC:Cover")
audio.add(
    APIC(
        encoding=3,
        mime="image/jpeg",
        type=3,  # Front cover
        desc="Cover",
        data=cover_data,
    )
)

audio.save()
```

**MP4 Example:**

```python
import mutagen.mp4

audio = mutagen.mp4.MP4("track.m4a")

audio["©nam"] = "Creep"
audio["©ART"] = "Radiohead"
audio["©alb"] = "Pablo Honey"
audio["©day"] = "1993"
audio["trkn"] = [(2, 12)]  # Track 2 of 12
audio["disk"] = [(1, 1)]  # Disc 1 of 1

with open("cover.jpg", "rb") as f:  # Close the file handle after reading
    cover_data = f.read()

audio["covr"] = [
    mutagen.mp4.MP4Cover(
        cover_data,
        imageformat=mutagen.mp4.MP4Cover.FORMAT_JPEG
    )
]

audio.save()
```

### Read Operations

**Auto-Detection:**

```python
import mutagen

audio = mutagen.File("track.flac")  # Returns None if the format is not recognized

# The key and the value type depend on the detected tag format.
# Only the line matching the file's format applies:
title = audio.get("TITLE", [None])[0]   # FLAC/Ogg: list of strings
# title = audio.get("TIT2", None)       # MP3: a TIT2 frame (text in .text) or None
# title = audio.get("©nam", [None])[0]  # MP4: list of strings
```

**minim Abstraction:**

```python
from minim.audio import Audio

audio = Audio("track.flac")  # Auto-detects format

# Unified interface
print(audio.title)
print(audio.artist)
print(audio.album)
print(audio.track_number)
```

### Artwork Handling

**Fetching from API:**

```python
import requests

# `spotify_api` and `tidal_api` stand for already-authenticated API clients.

# Spotify example
track = spotify_api.get_track("3n3Ppam7vgaVa1iaRUc9Lp")
artwork_url = track["album"]["images"][0]["url"]  # Largest image
artwork_data = requests.get(artwork_url, timeout=10).content

# TIDAL example
track = tidal_api.get_track(12345678)
cover_id = track["album"]["cover"].replace("-", "/")
artwork_url = f"https://resources.tidal.com/images/{cover_id}/1280x1280.jpg"
artwork_data = requests.get(artwork_url, timeout=10).content
```

**Embedding in File:**

```python
audio = Audio("track.flac")
audio.artwork = artwork_data  # bytes
audio.write_metadata()
```

**Image Formats:** JPEG and PNG are supported by all tag formats. JPEG is preferred for its smaller file size.

**Size Considerations:** Large artwork (>1MB) significantly increases file size. Recommendation: 600x600 to 1200x1200 pixels, JPEG quality 85-90%.

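Because both JPEG and PNG are accepted, the MIME type written into the tag should match the actual image bytes. A minimal sketch, assuming the artwork is already in memory as `bytes`; `sniff_image_mime` is a hypothetical helper that checks the standard file signatures of the two formats.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_mime(data: bytes) -> str:
    """Return the MIME type for JPEG or PNG data based on its leading bytes."""
    if data.startswith(JPEG_MAGIC):
        return "image/jpeg"
    if data.startswith(PNG_MAGIC):
        return "image/png"
    raise ValueError("Unsupported artwork format: expected JPEG or PNG")
```

The result can be assigned to the picture's MIME field instead of a hard-coded `"image/jpeg"`.
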
## Data Flow

### API Response to Audio File

**Complete Workflow:**

```python
from minim import spotify
from minim.audio import Audio

# 1. Authenticate
api = spotify.WebAPI(client_id="...", client_secret="...")
api.set_flow("client_credentials")
api.set_access_token()

# 2. Search for track
results = api.search("Radiohead Creep", types=["track"], limit=1)
track = results["tracks"]["items"][0]

# 3. Load audio file
audio = Audio("track.flac")

# 4. Map API response to metadata
audio.set_metadata_using_spotify(track)

# 5. Write to file
audio.write_metadata()
```

**Data Transformations:**

**Step 4 (Mapping):**

```python
def set_metadata_using_spotify(self, track_data: dict):
    # Direct mappings
    self.title = track_data["name"]
    self.album = track_data["album"]["name"]
    self.date = track_data["album"]["release_date"]
    self.track_number = track_data["track_number"]
    self.disc_number = track_data["disc_number"]

    # Array to string
    self.artist = ", ".join(a["name"] for a in track_data["artists"])

    # Nested object
    self.isrc = track_data.get("external_ids", {}).get("isrc")

    # Fetch external resource
    if track_data["album"]["images"]:
        artwork_url = track_data["album"]["images"][0]["url"]
        self.artwork = requests.get(artwork_url).content
```

**Step 5 (Writing):**

```python
# FLAC implementation
def write_metadata(self):
    self._file["TITLE"] = self.title
    self._file["ARTIST"] = self.artist
    self._file["ALBUM"] = self.album
    self._file["DATE"] = self.date
    self._file["TRACKNUMBER"] = str(self.track_number)
    self._file["DISCNUMBER"] = str(self.disc_number)

    if self.isrc:
        self._file["ISRC"] = self.isrc

    if self.artwork:
        picture = mutagen.flac.Picture()
        picture.data = self.artwork
        picture.type = 3
        picture.mime = "image/jpeg"
        self._file.add_picture(picture)

    self._file.save()
```

### Service-Specific Normalization

**Artist Handling:**

**Spotify (array of objects):**

```json
{
  "artists": [
    {"name": "Radiohead", "id": "4Z8W4fKeB5YxbusRsdQVPb"},
    {"name": "Thom Yorke", "id": "3WrFJ7ztbogyGnTHbHJFl2"}
  ]
}
```

**Normalization:** `", ".join(a["name"] for a in artists)` → `"Radiohead, Thom Yorke"`

**TIDAL (array of objects):**

```json
{
  "artists": [
    {"name": "Radiohead", "id": 4050}
  ]
}
```

**Normalization:** Same as Spotify.

**iTunes (string):**

```json
{
  "artistName": "Radiohead"
}
```

**Normalization:** Direct assignment.

**Qobuz (object):**

```json
{
  "performer": {"name": "Radiohead", "id": 12345}
}
```

**Normalization:** `performer["name"]`

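The four artist shapes above can be handled by one function. This is a sketch rather than minim's code: `normalize_artist` is a hypothetical name, and it tells the services apart purely by which key is present in the payload.

```python
def normalize_artist(payload: dict) -> str:
    """Reduce a service-specific artist field to a single display string."""
    if "artists" in payload:  # Spotify, TIDAL: array of objects
        return ", ".join(a["name"] for a in payload["artists"])
    if "artistName" in payload:  # iTunes: plain string
        return payload["artistName"]
    if "performer" in payload:  # Qobuz: single object
        return payload["performer"]["name"]
    raise KeyError("No recognized artist field in payload")
```

Raising on an unknown shape, rather than returning an empty string, keeps a schema change in one of the APIs from silently writing blank artist tags.
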
**Date Handling:**

**Spotify:**

- Full date: `"2023-01-15"` → `"2023-01-15"`
- Year only: `"2023"` → `"2023"`
- Month precision: `"2023-01"` → `"2023-01"`

**TIDAL:**

- ISO 8601 with time: `"2023-01-15T00:00:00.000Z"` → `"2023-01-15"` (strip time)

**iTunes:**

- ISO 8601: `"2023-01-15T00:00:00Z"` → `"2023-01-15"`

**Qobuz:**

- Unix timestamp: `1673740800` → `datetime.fromtimestamp(1673740800, tz=timezone.utc).strftime("%Y-%m-%d")` → `"2023-01-15"` (convert in UTC; a local-time conversion can shift the date by one day)
- ISO 8601: `"2023-01-15"` → `"2023-01-15"`

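The date rules above reduce to two cases: numeric timestamps and ISO 8601 strings. A minimal sketch, assuming timestamps are in seconds and should be interpreted in UTC; `normalize_release_date` is a hypothetical helper, not a minim function.

```python
from datetime import datetime, timezone


def normalize_release_date(value) -> str:
    """Return YYYY-MM-DD (or the partial date given) for a service date value."""
    if isinstance(value, (int, float)):
        # Unix timestamp (Qobuz): convert in UTC so the date does not
        # depend on the machine's time zone.
        return datetime.fromtimestamp(value, tz=timezone.utc).strftime("%Y-%m-%d")
    # ISO 8601 string: drop any time component, keep partial dates unchanged.
    return value.split("T", 1)[0]
```

Partial dates such as `"2023"` and `"2023-01"` pass through unchanged, which matches the Spotify behaviour listed above.
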
**Track/Disc Number Handling:**

**Spotify:**

```json
{
  "track_number": 3,
  "disc_number": 1
}
```

**Normalization:** Direct assignment.

**TIDAL:**

```json
{
  "trackNumber": 3,
  "volumeNumber": 1
}
```

**Normalization:** `track_number = trackNumber`, `disc_number = volumeNumber`

**iTunes:**

```json
{
  "trackNumber": 3,
  "trackCount": 12
}
```

**Normalization:** `track_number = trackNumber` (ignore `trackCount`)

**Qobuz:**

```json
{
  "track_number": 3,
  "media_number": 1
}
```

**Normalization:** `track_number` is assigned directly; `disc_number = media_number`

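The per-service key names above lend themselves to a small table-driven helper. The sketch below is an assumption about how this could be organised, not minim's implementation; `NUMBER_KEYS` and `normalize_numbers` are hypothetical names, and the iTunes entry has no disc key only because the sample payload above does not show one.

```python
from typing import Optional, Tuple

# (track key, disc key) per service, taken from the sample payloads above.
NUMBER_KEYS = {
    "spotify": ("track_number", "disc_number"),
    "tidal": ("trackNumber", "volumeNumber"),
    "itunes": ("trackNumber", None),
    "qobuz": ("track_number", "media_number"),
}


def normalize_numbers(service: str, payload: dict) -> Tuple[int, Optional[int]]:
    """Return (track_number, disc_number) for a service payload."""
    track_key, disc_key = NUMBER_KEYS[service]
    track = int(payload[track_key])
    # The disc number is optional: None when the service or payload lacks it.
    disc = int(payload[disc_key]) if disc_key and disc_key in payload else None
    return track, disc
```
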
## Format Conversion

### FFmpeg Integration

**Conversion Workflow:**

```python
audio = Audio("track.flac")

# Convert to MP3
mp3_audio = audio.convert("track.mp3", "mp3", bitrate="320k")

# Convert to AAC
m4a_audio = audio.convert("track.m4a", "m4a", bitrate="256k")

# Convert to Ogg Vorbis
ogg_audio = audio.convert("track.ogg", "ogg", quality=10)
```

**FFmpeg Command Construction:**

```python
import subprocess


def convert(self, output_path: str, format: str, **options):
    cmd = ["ffmpeg", "-i", self.filepath]

    # Codec selection (an unknown format raises KeyError)
    codec_map = {
        "flac": "flac",
        "mp3": "libmp3lame",
        "m4a": "aac",
        "ogg": "libvorbis",
        "wav": "pcm_s16le"
    }
    cmd.extend(["-c:a", codec_map[format]])

    # Options
    if "bitrate" in options:
        cmd.extend(["-b:a", options["bitrate"]])
    if "quality" in options:
        cmd.extend(["-q:a", str(options["quality"])])
    if "sample_rate" in options:
        cmd.extend(["-ar", str(options["sample_rate"])])

    cmd.append(output_path)

    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(cmd, check=True)
```

**Metadata Preservation:**

```python
# After conversion, copy metadata
converted = Audio(output_path)
converted.title = self.title
converted.artist = self.artist
converted.album = self.album
# ... copy all fields
converted.artwork = self.artwork
converted.write_metadata()
```

**Lossy to Lossless:** Converting lossy formats (MP3, AAC) to lossless (FLAC) does not improve quality. The FLAC encoding step adds no further loss, but it cannot restore the information the lossy encoder already discarded, and the resulting file is larger.

**Lossless to Lossy:** Converting FLAC to MP3/AAC reduces file size but discards audio information. This is irreversible.

## Data Validation

**No Validation:** minim does not validate metadata before writing to files.

**Potential Issues:**

- Invalid dates (e.g., `"2023-13-45"`) written as-is
- Track numbers exceeding album track count
- Non-numeric values in numeric fields
- Oversized artwork (multi-megabyte images)

**Recommendation:** Implement a validation layer:

```python
import warnings
from datetime import datetime

from minim.audio import Audio


def validate_metadata(audio: Audio):
    # Date validation: accept full date, year-month, or year only
    if audio.date:
        for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
            try:
                datetime.strptime(audio.date, fmt)
                break
            except ValueError:
                continue
        else:
            raise ValueError(f"Invalid date format: {audio.date}")

    # Track number validation ("is not None" so that 0 is rejected too)
    if audio.track_number is not None and audio.track_number < 1:
        raise ValueError(f"Invalid track number: {audio.track_number}")

    # Artwork size validation
    if audio.artwork and len(audio.artwork) > 2 * 1024 * 1024:  # 2MB
        warnings.warn(f"Large artwork: {len(audio.artwork)} bytes")
```

## Data Retention

**Token Expiration:** Access tokens expire (commonly after about 1 hour for OAuth 2.0, depending on the service). Refresh tokens are used to obtain new access tokens without re-authentication.

**Token Cleanup:** Expired tokens remain in `~/minim.cfg` indefinitely. No automatic cleanup.

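A cleanup pass over the config file could look like the sketch below. It is not part of minim: `is_expired` and `drop_expired_access_tokens` are hypothetical helpers that rely only on the `access_token` and `expires_at` fields described earlier, and the 60-second `leeway` is an arbitrary safety margin.

```python
import time
from configparser import ConfigParser
from typing import List, Optional


def is_expired(config: ConfigParser, section: str,
               now: Optional[float] = None, leeway: int = 60) -> bool:
    """True if the section's token expires within `leeway` seconds of `now`."""
    now = time.time() if now is None else now
    expires_at = config.getint(section, "expires_at", fallback=0)
    return expires_at - leeway <= now


def drop_expired_access_tokens(config: ConfigParser,
                               now: Optional[float] = None) -> List[str]:
    """Remove expired access tokens in place; return the affected sections."""
    removed = []
    for section in config.sections():
        # Sections without expires_at (e.g. OAuth 1.0a) are left untouched.
        has_both = (config.has_option(section, "access_token")
                    and config.has_option(section, "expires_at"))
        if has_both and is_expired(config, section, now=now):
            config.remove_option(section, "access_token")
            config.remove_option(section, "expires_at")
            removed.append(section)
    return removed
```

Refresh tokens are deliberately kept, since they are what allows a new access token to be requested without re-authentication.
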
**Audio Metadata:** Persists in files until overwritten or file deleted.

**API Response Caching:** Not implemented. Every request hits the API.

## Data Privacy

**Sensitive Data in Config File:**

- User passwords (Qobuz)
- Access tokens (all services)
- Refresh tokens (OAuth 2.0 services)
- User IDs and email addresses

**Exposure Risks:**

- Backup systems may copy `~/minim.cfg` to cloud storage
- The config file may be committed to version control by accident (for example, in a dotfiles repository)
- Malware can read tokens and impersonate the user

**Recommendations:**

1. Keep `minim.cfg` out of version control: if the home directory or a dotfiles repository is tracked, add `minim.cfg` to its `.gitignore` (gitignore patterns do not expand `~`)
2. Exclude from cloud backup or encrypt backups
3. Use environment variables for CI/CD
4. Rotate tokens regularly
5. Revoke tokens when no longer needed

## Summary

minim's data management is minimal and file-based:

- **No database:** All data is ephemeral or file-based
- **Token storage:** Plain text INI file at `~/minim.cfg`
- **Audio metadata:** Written to file tags via mutagen
- **No caching:** API responses not persisted
- **No validation:** Metadata written as-is without checks

This approach is simple and suitable for personal use but lacks security and robustness for production systems. The v2 rewrite addresses security concerns with OS keychain integration and adds validation layers.

For a metadata aggregator project, consider:

- Secure credential storage (OS keychain, secrets manager)
- Database for caching API responses (reduce API calls)
- Metadata validation before writing to files
- Audit logging for data access and modifications

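To make the caching suggestion concrete, here is a minimal sketch of a response cache with a time-to-live. It uses the standard-library `sqlite3` module purely as a stand-in for whatever database the aggregator adopts; the `ResponseCache` class, its table layout, and the one-hour default TTL are all assumptions for illustration.

```python
import json
import sqlite3
import time
from typing import Any, Optional


class ResponseCache:
    """Tiny TTL cache for JSON-serializable API responses."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 3600):
        self._db = sqlite3.connect(path)
        self._ttl = ttl_seconds
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            "key TEXT PRIMARY KEY, body TEXT NOT NULL, fetched_at REAL NOT NULL)"
        )

    def get(self, key: str, now: Optional[float] = None) -> Optional[Any]:
        """Return the cached body, or None if missing or older than the TTL."""
        now = time.time() if now is None else now
        row = self._db.execute(
            "SELECT body, fetched_at FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if row is None or now - row[1] > self._ttl:
            return None
        return json.loads(row[0])

    def put(self, key: str, body: Any, now: Optional[float] = None) -> None:
        """Store or overwrite the body for key with the current timestamp."""
        now = time.time() if now is None else now
        self._db.execute(
            "INSERT OR REPLACE INTO responses (key, body, fetched_at) VALUES (?, ?, ?)",
            (key, json.dumps(body), now),
        )
        self._db.commit()
```

A caller would check `cache.get(key)` before issuing an API request and call `cache.put(key, response)` afterwards, with the key built from the service name and request parameters.
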