Metadata Overlay: Design Doc
Authors: AI Assistant
Status: Draft
Last Updated: 2026-05-17
Reviewers: [TBD]
Approvers: [TBD]
Prerequisites: architecture.md, requirements.md
[TOC]
1. Abstract
Metadata Overlay enables MusicFS to serve modified audio metadata to consumers (Jellyfin, Plex, mpv, VLC) while preserving original files on origin storage. When a media server reads a file through the FUSE mount, it receives metadata headers generated on-the-fly from the database, seamlessly spliced with original audio data from the origin.
Key constraints:
- Never modify origin files (read-only architecture)
- Never duplicate entire files (storage-efficient)
- Support all audio formats via pluggable format handlers
- Transparent to consumers (standard file I/O)
Solution approach: Store metadata as individual database columns. On
read(), generate format-specific headers on-the-fly (~10-50 μs) and splice
them with original audio bytes using offset translation. No pre-generated
headers are stored.
2. Background
2.1 Current State
MusicFS serves files with their original embedded metadata. The metadata extraction flow is:
Origin File → symphonia parser → AudioMeta struct → SQLite DB → Virtual paths
↓
FUSE read() ← CAS chunks ← Origin (unchanged bytes)
The database stores metadata for virtual path generation and search, but file content is served verbatim from origin/CAS. Only 12 metadata fields are stored: title, artist, album, album_artist, genre, year, track, disc, duration_ms, bitrate, sample_rate, format.
2.2 Pain Points
| Problem | Impact |
|---|---|
| Cannot fix incorrect tags | Jellyfin shows wrong artist/album |
| Cannot add missing metadata | Files with no tags appear as "Unknown" |
| Origin is authoritative | User edits require modifying source files |
| Breaks torrent seeding | Modifying origin invalidates checksums |
| Missing fields | Only 12 of ~30 standard fields stored |
2.3 User Stories
- Tag Correction: "Origin files have 'The Beatles' tagged as 'Beatles, The'. I want Jellyfin to see the correct name without modifying my NAS."
- Missing Metadata: "My FLAC rips have no album art or year. I want to add them in MusicFS so Plex displays proper covers."
- Torrent Preservation: "My music is seeding. I can't modify files but want correct tags in my media server."
- Multi-Library Views: "I want one physical file to appear in both 'Classical' and 'Relaxation' collections with different metadata."
3. Goals & Non-Goals
3.1 Goals
| ID | Goal | Success Metric |
|---|---|---|
| G1 | Serve modified metadata transparently | Players read edited tags without special handling |
| G2 | Zero origin modification | Origin files byte-identical before/after |
| G3 | Zero storage overhead for headers | No pre-generated header blobs stored |
| G4 | MP3 and FLAC out of the box | Other formats added on demand via plugins |
| G5 | Pluggable format handlers | Add new format support without core changes |
| G6 | Unified metadata model | Single API regardless of underlying format |
| G7 | Sub-second edit latency | Metadata changes reflected on next read |
3.2 Non-Goals
| ID | Non-Goal | Rationale |
|---|---|---|
| NG1 | Audio transcoding | Out of scope; separate feature |
| NG2 | Lossless round-trip | Synthesized headers may differ structurally from original |
| NG3 | Writing back to origin | Violates read-only principle |
| NG4 | Video file support | Focus on audio; defer to future |
| NG5 | Metadata sync to external DBs | Jellyfin/Plex have their own; not our concern |
4. Proposed Design
4.1 High-Level Architecture
@startuml
!theme plain
skinparam componentStyle rectangle
package "FUSE Layer" {
[getattr()] as GA
[read()] as RD
}
package "Overlay Engine" {
[OverlayReader] as OR
[FormatHandlerRegistry] as FHR
}
package "Storage" {
database "SQLite\n(metadata columns\n+ format_layout)" as DB
[CAS\n(origin audio chunks)] as CAS
}
package "Format Handlers (Pluggable)" {
[Id3v2Handler] as H1
[FlacHandler] as H2
[WavHandler\n(on demand)] as H3
[OggHandler\n(on demand)] as H4
[Mp4Handler\n(on demand)] as H5
}
GA --> OR : virtual_size?
RD --> OR : read(ino, offset, size)
OR --> DB : get metadata + layout
OR --> FHR : synthesize(metadata, layout)
FHR --> H1
FHR --> H2
FHR --> H3
FHR --> H4
FHR --> H5
OR --> CAS : read audio bytes
note right of OR
On-the-fly generation:
1. Read metadata from DB columns
2. Generate header (~10-50 μs)
3. Splice header + CAS audio
4. Return to FUSE
end note
@enduml
4.2 Core Flows
4.2.1 Flow: Initial Ingest (Origin Scan)
Triggered on mount or rescan. Extracts metadata from origin files and populates all database columns.
@startuml
!theme plain
participant "Origin\nFederation" as OF
participant "CAS" as CAS
participant "Format\nHandler" as FH
participant "Metadata\nParser" as MP
database "SQLite" as DB
participant "Tantivy" as TI
participant "Virtual\nTree" as VT
OF -> OF : Scan origin directory
loop for each audio file
OF -> CAS : Fetch file header (first 256KB)
CAS -> CAS : Chunk and store full file
CAS --> OF : ChunkManifest
OF -> FH : analyze(header_bytes, file_size)
note right of FH
Detects format, returns
FormatLayout with audio_start,
audio_end, format_data
(e.g. STREAMINFO for FLAC)
end note
FH --> OF : FormatLayout
OF -> MP : extract(header_bytes)
note right of MP
Uses symphonia to parse
all embedded tags
end note
MP --> OF : metadata fields
OF -> DB : INSERT INTO files\n(all metadata columns,\nformat_layout, chunk_manifest)
OF -> TI : Index metadata
OF -> VT : Add virtual tree node
end
@enduml
4.2.2 Flow: FUSE read() with Overlay
The core read path. Headers are generated on-the-fly from DB columns — nothing pre-computed is stored.
@startuml
!theme plain
participant "FUSE\nKernel" as FK
participant "Overlay\nReader" as OR
database "SQLite" as DB
participant "Format\nHandler" as FH
participant "CAS" as CAS
FK -> OR : read(ino, offset, size)
OR -> OR : Lookup file by inode
OR -> DB : SELECT metadata columns,\nformat_layout WHERE id = ?
note right of DB : ~1 μs via page cache
DB --> OR : FileMetadataRow
OR -> FH : synthesize(metadata, layout)
note right of FH
On-the-fly generation
~10-50 μs, pure CPU
end note
FH --> OR : synthetic_header bytes
OR -> OR : header_len = synthetic_header.len()\nvirtual_size = header_len + audio_len
alt offset falls in header region
OR -> OR : Slice from synthetic_header
else offset falls in audio region
OR -> OR : origin_offset = audio_start\n+ (offset - header_len)
OR -> CAS : read(file_id, origin_offset, size)
CAS --> OR : audio bytes
else offset spans boundary
OR -> OR : Take header tail
OR -> CAS : read(file_id, audio_start, remaining)
CAS --> OR : audio bytes
OR -> OR : Concatenate header + audio
end
OR --> FK : reply.data(spliced bytes)
@enduml
4.2.3 Flow: FUSE getattr() with Overlay
Returns the virtual file size (synthetic header + audio) instead of the origin file size.
@startuml
!theme plain
participant "FUSE\nKernel" as FK
participant "Virtual\nTree" as VT
database "SQLite" as DB
participant "Format\nHandler" as FH
FK -> VT : getattr(ino)
VT -> VT : Lookup VirtualNode
VT -> DB : Read format_layout for file_id
DB --> VT : FormatLayout
VT -> FH : estimate_header_size(metadata)
note right of FH
Fast estimate without
full header synthesis
end note
FH --> VT : estimated_header_len
VT -> VT : virtual_size = estimated_header_len\n+ (audio_end - audio_start)
VT --> FK : FileAttr with virtual_size
@enduml
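A handler can compute the header size without building the header. The sketch below does this for an ID3v2.4 tag made only of UTF-8 text frames, using the sizes fixed by the ID3v2.4 structure (10-byte tag header, 10-byte frame header, 1 text-encoding byte). The function name, the one-frame-per-value model, and the `padding` parameter are assumptions for illustration; the size actually written by the tag library may include terminators or padding that this sketch does not model.

```rust
/// Estimate the byte size of an ID3v2.4 tag that stores each value in its own
/// UTF-8 text frame. Hypothetical helper, not part of the MusicFS API.
pub fn estimate_id3v2_size(text_values: &[&str], padding: usize) -> usize {
    const TAG_HEADER: usize = 10; // "ID3" + version (2) + flags (1) + synchsafe size (4)
    const FRAME_HEADER: usize = 10; // frame ID (4) + size (4) + flags (2)
    const ENCODING_BYTE: usize = 1; // text encoding marker at the start of a text frame

    let frames: usize = text_values
        .iter()
        .map(|v| FRAME_HEADER + ENCODING_BYTE + v.len()) // v.len() is the UTF-8 byte length
        .sum();
    TAG_HEADER + frames + padding
}
```

Because getattr() reports this value as part of the file size, an estimate that differs from the length of the header that read() later synthesizes would make the reported size disagree with the bytes served. Computing the size exactly, as above, is the safer choice wherever the format allows it.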
4.2.4 Flow: Metadata Update (User Edits Tags)
Triggered via gRPC API or CLI. Updates DB columns directly. Next read() generates a new header automatically.
@startuml
!theme plain
actor "User" as U
participant "CLI /\ngRPC" as API
participant "Metadata\nService" as MS
database "SQLite" as DB
participant "Tantivy" as TI
participant "Virtual\nTree" as VT
participant "Event\nBus" as EB
U -> API : musicfs metadata set\n--title "Fix" --artist "Fix"
API -> MS : UpdateMetadata(file_id, fields)
alt merge mode (default)
MS -> DB : SELECT current metadata
DB --> MS : current row
MS -> MS : Overwrite only provided fields
end
MS -> DB : UPDATE files SET title=?, artist=?\nWHERE id=?
DB --> MS : ok
MS -> TI : Re-index updated file
MS -> VT : Recompute virtual path
note right of VT
If artist/album/title changed
the file moves in the tree
end note
MS -> EB : Emit MetadataChanged
note right of EB
FUSE attr cache invalidation,
gRPC event subscribers
end note
MS --> API : success
API --> U : done
@enduml
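Merge mode means the UPDATE touches only the columns the caller provided. A minimal sketch of building such a statement follows; `build_update` is a hypothetical helper, and it assumes column names come from a fixed allow-list in the service rather than from user input, since they are interpolated into the SQL text.

```rust
/// Build a parameterized UPDATE that sets only the provided columns.
/// Returns the SQL text and the parameter values in order, or None when
/// no field was provided. The file id is bound by the caller as the last
/// numbered parameter.
pub fn build_update(fields: &[(&str, Option<&str>)]) -> Option<(String, Vec<String>)> {
    // Keep only the fields the caller actually set.
    let provided: Vec<(&str, &str)> = fields
        .iter()
        .filter_map(|(col, val)| val.map(|v| (*col, v)))
        .collect();
    if provided.is_empty() {
        return None;
    }

    // SQLite numbered parameters: ?1, ?2, ...
    let set_clause = provided
        .iter()
        .enumerate()
        .map(|(i, (col, _))| format!("{col} = ?{}", i + 1))
        .collect::<Vec<_>>()
        .join(", ");
    let sql = format!("UPDATE files SET {set_clause} WHERE id = ?{}", provided.len() + 1);
    let params = provided.iter().map(|(_, v)| v.to_string()).collect();
    Some((sql, params))
}
```

Columns that were not provided never appear in the SET clause, so their stored values are left as they are.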
4.2.5 Flow: Metadata Clear (Revert to Original)
Removes user overrides. File reverts to serving original embedded metadata.
@startuml
!theme plain
actor "User" as U
participant "CLI /\ngRPC" as API
participant "Metadata\nService" as MS
participant "CAS" as CAS
participant "Metadata\nParser" as MP
database "SQLite" as DB
participant "Tantivy" as TI
participant "Virtual\nTree" as VT
participant "Event\nBus" as EB
U -> API : musicfs metadata clear <path>
API -> MS : ClearOverlay(file_id)
MS -> CAS : Read origin file header
CAS --> MS : header bytes
MS -> MP : extract(header_bytes)
MP --> MS : original metadata
MS -> DB : UPDATE files SET all columns\nto original values
DB --> MS : ok
MS -> TI : Re-index with original metadata
MS -> VT : Recompute virtual path
MS -> EB : Emit MetadataCleared
MS --> API : success
API --> U : done
@enduml
4.2.6 Flow: Batch Import
Import metadata from external source (CSV, JSON, MusicBrainz dump).
@startuml
!theme plain
actor "User" as U
participant "CLI /\ngRPC" as API
participant "Import\nEngine" as IE
database "SQLite" as DB
participant "Tantivy" as TI
participant "Event\nBus" as EB
U -> API : musicfs metadata import\n--format csv metadata.csv
API -> IE : ImportMetadata(file, format)
IE -> IE : Parse source file (CSV/JSON)
IE -> IE : Match rows to files by\npath, ISRC, or MusicBrainz ID
IE -> DB : BEGIN TRANSACTION
loop for each matched row
IE -> DB : UPDATE files SET matched columns
IE -> TI : Re-index file
IE --> API : stream progress
end
IE -> DB : COMMIT
IE -> EB : Emit BatchImportComplete
IE --> API : final summary
API --> U : updated N, skipped M, errors K
@enduml
4.3 Offset Translation
Virtual File (what consumer sees):
┌─────────────────────┬────────────────────────────────────────────┐
│ Synthetic Header    │ Original Audio                             │
│ (N bytes)           │ (M bytes)                                  │
│ generated on-fly    │ from CAS                                   │
└─────────────────────┴────────────────────────────────────────────┘
0                     N                                          N+M
                      ↑                                            ↑
                 header_len                             virtual_size
Origin File (on storage):
┌─────────────────────┬────────────────────────────────────────────┐
│ Original Header     │ Original Audio                             │
│ (X bytes)           │ (M bytes)                                  │
└─────────────────────┴────────────────────────────────────────────┘
0                     X                                          X+M
                      ↑                                            ↑
             layout.audio_start                     layout.audio_end
Offset Translation:
virtual_offset → origin_offset
    if virtual_offset < N:
        return synthetic_header[virtual_offset]
    else:
        origin_offset = X + (virtual_offset - N)
        return cas_read(file_id, origin_offset)
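The translation above works on single offsets; a read request covers a range and may straddle the header/audio boundary. The following sketch splits a virtual read into at most two segments. `Segment` and `split_read` are illustrative names, not part of the design; `header_len`, `audio_start`, and `audio_len` correspond to N, X, and M in the diagram.

```rust
/// Where one piece of a virtual read is served from.
#[derive(Debug, PartialEq)]
pub enum Segment {
    /// Half-open byte range within the synthetic header.
    Header { start: u64, end: u64 },
    /// Half-open byte range within the origin file (absolute origin offsets).
    Origin { start: u64, end: u64 },
}

/// Split the virtual read `[offset, offset + size)` into header and origin parts.
pub fn split_read(
    offset: u64,
    size: u64,
    header_len: u64,
    audio_start: u64,
    audio_len: u64,
) -> Vec<Segment> {
    let virtual_size = header_len + audio_len;
    // Clamp the end of the read to the virtual file size.
    let end = offset.saturating_add(size).min(virtual_size);
    let mut out = Vec::new();
    if offset >= end {
        return out; // empty read, or read starting at or past EOF
    }
    if offset < header_len {
        out.push(Segment::Header { start: offset, end: end.min(header_len) });
    }
    if end > header_len {
        // First virtual byte of this read that lies in the audio region.
        let v_start = offset.max(header_len);
        out.push(Segment::Origin {
            start: audio_start + (v_start - header_len),
            end: audio_start + (end - header_len),
        });
    }
    out
}
```

For example, with a 100-byte synthetic header and audio starting at origin offset 4096, a 20-byte read at virtual offset 90 yields header bytes 90..100 followed by origin bytes 4096..4106.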
4.4 Format Handler Plugin System
4.4.1 Handler Trait
/// Trait for format-specific metadata handling.
///
/// Implementations handle:
/// 1. Analyzing original files to find audio boundaries
/// 2. Synthesizing new headers from database metadata
///
/// Plugins implement this trait and register via FormatHandlerRegistry.
pub trait FormatHandler: Send + Sync + 'static {
    fn id(&self) -> &'static str;
    fn name(&self) -> &'static str;
    fn extensions(&self) -> &[&'static str];
    fn mime_types(&self) -> &[&'static str];

    /// Analyze file bytes to determine audio layout.
    fn analyze(&self, data: &[u8], file_size: u64) -> Result<FormatLayout, FormatError>;

    /// Synthesize header bytes from metadata. Called on every read().
    fn synthesize(
        &self,
        metadata: &FileMetadataRow,
        layout: &FormatLayout,
    ) -> Result<Vec<u8>, FormatError>;

    /// Extract metadata from header bytes (for initial ingest).
    fn extract(&self, data: &[u8]) -> Result<ExtractedMetadata, FormatError>;

    /// Estimate header size without full synthesis (for getattr).
    /// The default ignores the metadata and returns a fixed size.
    fn estimate_header_size(&self, _metadata: &FileMetadataRow) -> usize {
        10 * 1024 // 10 KiB default
    }
}
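As an illustration of what analyze() has to do for MP3, the sketch below computes where audio begins when a file starts with an ID3v2 tag. It relies only on the ID3v2 header layout (3-byte "ID3" marker, 2 version bytes, 1 flags byte, 4-byte synchsafe size, and an optional 10-byte footer signalled by flag bit 0x10 in ID3v2.4). The function names are hypothetical, and a full handler would also need to consider a trailing ID3v1 tag when determining audio_end.

```rust
/// Decode a 28-bit synchsafe integer (7 usable bits per byte).
/// Returns None if any byte has its high bit set, which is invalid.
pub fn synchsafe_to_u32(b: [u8; 4]) -> Option<u32> {
    if b.iter().any(|&x| x & 0x80 != 0) {
        return None;
    }
    Some(((b[0] as u32) << 21) | ((b[1] as u32) << 14) | ((b[2] as u32) << 7) | (b[3] as u32))
}

/// Return the origin offset of the first byte after a leading ID3v2 tag.
/// Files without a leading tag yield Some(0); a truncated or malformed
/// tag header yields None.
pub fn id3v2_audio_start(data: &[u8]) -> Option<u64> {
    if !data.starts_with(b"ID3") {
        return Some(0); // no tag: audio starts at the beginning of the file
    }
    let hdr = data.get(..10)?; // the tag header is always 10 bytes
    let flags = hdr[5];
    // The size field excludes the 10-byte header and the optional footer.
    let size = synchsafe_to_u32([hdr[6], hdr[7], hdr[8], hdr[9]])?;
    let footer = if flags & 0x10 != 0 { 10 } else { 0 };
    Some(10 + u64::from(size) + footer)
}
```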
4.4.2 Handler Registry
use std::collections::HashMap;
use std::path::Path;
use std::sync::Arc;

pub struct FormatHandlerRegistry {
    handlers: HashMap<String, Arc<dyn FormatHandler>>,
    extension_map: HashMap<String, String>,
}

impl FormatHandlerRegistry {
    pub fn new() -> Self {
        // Only MP3 and FLAC shipped by default.
        // Other handlers registered via load_plugins() or register().
        let mut r = Self {
            handlers: HashMap::new(),
            extension_map: HashMap::new(),
        };
        r.register(Arc::new(Id3v2Handler::new())); // .mp3
        r.register(Arc::new(FlacHandler::new())); // .flac
        r
    }

    pub fn register(&mut self, handler: Arc<dyn FormatHandler>) { /* ... */ }
    pub fn get_by_extension(&self, ext: &str) -> Option<Arc<dyn FormatHandler>> { /* ... */ }
    pub fn load_plugins(&mut self, plugin_dir: &Path) -> Result<usize> { /* ... */ }
}
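The bodies of register() and get_by_extension() are elided above. One plausible shape is sketched below with a cut-down trait so that it compiles on its own. It assumes that `handlers` is keyed by handler id and that `extension_map` maps a lowercase extension to a handler id; neither is stated in the design, and the `Handler`, `Registry`, and `Mp3` names are stand-ins.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Cut-down stand-in for FormatHandler, reduced to what lookup needs.
pub trait Handler: Send + Sync {
    fn id(&self) -> &'static str;
    fn extensions(&self) -> &[&'static str];
}

#[derive(Default)]
pub struct Registry {
    handlers: HashMap<String, Arc<dyn Handler>>, // handler id -> handler (assumed)
    extension_map: HashMap<String, String>,      // lowercase extension -> handler id (assumed)
}

impl Registry {
    /// Register a handler under its id and index all of its extensions.
    /// A later registration for the same extension replaces the earlier one.
    pub fn register(&mut self, handler: Arc<dyn Handler>) {
        for ext in handler.extensions() {
            self.extension_map
                .insert(ext.to_ascii_lowercase(), handler.id().to_string());
        }
        self.handlers.insert(handler.id().to_string(), handler);
    }

    /// Look up a handler by extension, ignoring case and a leading dot.
    pub fn get_by_extension(&self, ext: &str) -> Option<Arc<dyn Handler>> {
        let key = ext.trim_start_matches('.').to_ascii_lowercase();
        let id = self.extension_map.get(&key)?;
        self.handlers.get(id).cloned()
    }
}

/// Example handler used only to exercise the registry.
pub struct Mp3;

impl Handler for Mp3 {
    fn id(&self) -> &'static str {
        "id3v2"
    }
    fn extensions(&self) -> &[&'static str] {
        &["mp3"]
    }
}
```

Letting a later registration win for a given extension would allow a plugin to override a built-in handler; whether that is desirable is a policy decision the design leaves open.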
4.4.3 Format Complexity Summary
| Format | Handler | Complexity | Shipped |
|---|---|---|---|
| MP3 | Id3v2Handler | Low | Yes (built-in) |
| FLAC | FlacHandler | Low | Yes (built-in) |
| WAV | WavHandler | Low | On demand |
| OGG/Opus | OggHandler | Medium | On demand |
| M4A/MP4 | Mp4Handler | High | On demand |
MP3 and FLAC cover the vast majority of music libraries. Other formats
use the same FormatHandler trait and can be added as plugins or built-in
handlers when needed — the architecture does not change.
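FLAC rates as low complexity partly because its metadata region is simple to walk: a 4-byte "fLaC" marker followed by metadata blocks, each with a 4-byte header (1 bit last-block flag, 7 bits block type, 24-bit big-endian length). The sketch below finds the first audio frame from those headers alone. The function name is hypothetical and the return convention is an assumption.

```rust
/// Walk the FLAC metadata block headers and return the offset of the first
/// audio frame. Returns None if the "fLaC" marker is missing or if `data`
/// ends before the header of the last metadata block has been seen.
pub fn flac_audio_start(data: &[u8]) -> Option<u64> {
    if !data.starts_with(b"fLaC") {
        return None;
    }
    let mut pos = 4usize;
    loop {
        // Each block header is 4 bytes: flag + type, then a 24-bit length.
        let hdr = data.get(pos..pos + 4)?;
        let is_last = hdr[0] & 0x80 != 0;
        let len = ((hdr[1] as usize) << 16) | ((hdr[2] as usize) << 8) | (hdr[3] as usize);
        pos += 4 + len; // skip the header and the block body
        if is_last {
            return Some(pos as u64);
        }
    }
}
```

Only block headers are inspected, so the body of the last block may extend past the bytes fetched at ingest. A large block that is not the last one (embedded artwork, for instance) can still push the next header beyond the fetched prefix, in which case this sketch returns None and the caller would need to read further.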
4.5 Database Schema
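All metadata fields are individual columns. A NULL column in SQLite costs one byte in the record header and no payload bytes, so sparse columns are nearly free.
Only format_layout, chunk_manifest, and custom_tags hold serialized data (custom_tags as JSON text, the other two as blobs).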
PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA synchronous = NORMAL;
CREATE TABLE IF NOT EXISTS files (
id INTEGER PRIMARY KEY,
origin_id TEXT NOT NULL,
real_path TEXT NOT NULL,
virtual_path TEXT NOT NULL,
-- ═══ Core Identification ═══
title TEXT,
artist TEXT,
album TEXT,
album_artist TEXT,
track_number INTEGER,
track_total INTEGER,
disc_number INTEGER,
disc_total INTEGER,
date TEXT, -- "2024" or "2024-05-17"
year INTEGER, -- extracted for convenience
genre TEXT,
-- ═══ Credits ═══
composer TEXT,
comment TEXT,
lyrics TEXT,
copyright TEXT,
compilation INTEGER, -- 0/1
-- ═══ Sorting ═══
artist_sort TEXT,
album_artist_sort TEXT,
album_sort TEXT,
title_sort TEXT,
-- ═══ MusicBrainz IDs ═══
mb_recording_id TEXT, -- Recording MBID
mb_album_id TEXT, -- Release MBID
mb_artist_id TEXT, -- Artist MBID
mb_album_artist_id TEXT, -- Album Artist MBID
mb_release_group_id TEXT, -- Release Group MBID
-- ═══ ReplayGain ═══
replaygain_track_gain REAL, -- dB
replaygain_track_peak REAL, -- 0.0-1.0+
replaygain_album_gain REAL,
replaygain_album_peak REAL,
-- ═══ Technical (from audio stream, read-only) ═══
duration_ms INTEGER,
bitrate INTEGER, -- kbps
sample_rate INTEGER, -- Hz
channels INTEGER,
bits_per_sample INTEGER,
format TEXT, -- "flac", "mp3", etc.
encoder TEXT, -- encoding software
-- ═══ Custom Tags (overflow for non-standard fields) ═══
custom_tags TEXT, -- JSON: {"ISRC":"US1234","LABEL":"Sony"}
-- ═══ Format Layout (for byte-range splicing) ═══
-- Stored as msgpack blob. Contains audio_start, audio_end,
-- format_data (STREAMINFO for FLAC, stco for MP4, etc.)
format_layout BLOB,
-- ═══ Sync State ═══
origin_mtime INTEGER NOT NULL,
origin_size INTEGER NOT NULL,
content_hash TEXT,
chunk_manifest BLOB,
last_sync INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
-- ═══ Trash (existing feature) ═══
trashed INTEGER NOT NULL DEFAULT 0,
original_path TEXT,
trashed_at INTEGER,
UNIQUE(origin_id, real_path)
);
-- ═══ Indexes ═══
CREATE INDEX IF NOT EXISTS idx_files_virtual ON files(virtual_path);
CREATE INDEX IF NOT EXISTS idx_files_artist_album ON files(artist, album);
CREATE INDEX IF NOT EXISTS idx_files_content_hash ON files(content_hash);
CREATE INDEX IF NOT EXISTS idx_files_real ON files(origin_id, real_path);
CREATE INDEX IF NOT EXISTS idx_files_origin ON files(origin_id);
CREATE INDEX IF NOT EXISTS idx_files_last_sync ON files(last_sync);
CREATE INDEX IF NOT EXISTS idx_files_trashed ON files(trashed) WHERE trashed = 1;
CREATE INDEX IF NOT EXISTS idx_files_mb_album ON files(mb_album_id);
CREATE INDEX IF NOT EXISTS idx_files_mb_artist ON files(mb_artist_id);
CREATE INDEX IF NOT EXISTS idx_files_genre ON files(genre);
CREATE INDEX IF NOT EXISTS idx_files_year ON files(year);
CREATE INDEX IF NOT EXISTS idx_files_composer ON files(composer);
-- ═══ Artwork (unchanged, separate table) ═══
CREATE TABLE IF NOT EXISTS artwork (
id INTEGER PRIMARY KEY,
file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
art_type TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
width INTEGER,
height INTEGER,
mime_type TEXT,
UNIQUE(file_id, art_type)
);
CREATE INDEX IF NOT EXISTS idx_artwork_file ON artwork(file_id);
-- ═══ Collections (unchanged) ═══
CREATE TABLE IF NOT EXISTS collections (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE,
query_json TEXT NOT NULL,
created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now')),
updated_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);
-- ═══ Directories (unchanged) ═══
CREATE TABLE IF NOT EXISTS directories (
id INTEGER PRIMARY KEY,
path TEXT NOT NULL UNIQUE,
created_at INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))
);
CREATE INDEX IF NOT EXISTS idx_directories_path ON directories(path);
4.6 Read Algorithm
use bytes::{Bytes, BytesMut};

impl OverlayReader {
    /// Serve up to `size` bytes of the virtual file starting at `offset`.
    pub async fn read(
        &self,
        file_id: FileId,
        offset: u64,
        size: u32,
    ) -> Result<Bytes, ReaderError> {
        let file = self.db.get_file(file_id)?;
        let layout: FormatLayout = rmp_serde::from_slice(&file.format_layout)?;
        // The registry (4.4.2) exposes get_by_extension(), which returns an Option.
        // ReaderError::NoHandler is an assumed variant name.
        let handler = self
            .registry
            .get_by_extension(&file.format)
            .ok_or(ReaderError::NoHandler)?;

        // Generate header on-the-fly (~10-50 μs)
        let header = handler.synthesize(&file, &layout)?;
        let header_len = header.len() as u64;
        let audio_len = layout.audio_end - layout.audio_start;
        let virtual_size = header_len + audio_len;

        // Reads at or past EOF return no data.
        if offset >= virtual_size {
            return Ok(Bytes::new());
        }
        // Clamp the end of the read to the virtual file size.
        let virtual_end = (offset + size as u64).min(virtual_size);
        let mut result = BytesMut::with_capacity((virtual_end - offset) as usize);

        // Region 1: Synthetic header
        if offset < header_len {
            let end = virtual_end.min(header_len);
            result.extend_from_slice(&header[offset as usize..end as usize]);
        }

        // Region 2: Origin audio data
        if virtual_end > header_len {
            // Position within the audio region, relative to its first byte.
            let audio_offset = header_len.max(offset) - header_len;
            let audio_size = (virtual_end - header_len.max(offset)) as u32;
            let origin_offset = layout.audio_start + audio_offset;
            let audio = self.cas_reader.read(file_id, origin_offset, audio_size).await?;
            result.extend_from_slice(&audio);
        }

        Ok(result.freeze())
    }
}
4.7 API Design
4.7.1 gRPC Extensions
service MetadataService {
rpc GetMetadata(GetMetadataRequest) returns (MetadataResponse);
rpc UpdateMetadata(UpdateMetadataRequest) returns (UpdateMetadataResponse);
rpc ClearOverlay(ClearOverlayRequest) returns (ClearOverlayResponse);
rpc BatchUpdateMetadata(BatchUpdateRequest) returns (stream BatchUpdateProgress);
rpc ImportMetadata(ImportMetadataRequest) returns (stream ImportProgress);
}
message UpdateMetadataRequest {
int64 file_id = 1;
// Only set fields you want to change.
// Unset fields are left as-is (merge behavior).
optional string title = 2;
optional string artist = 3;
optional string album = 4;
optional string album_artist = 5;
optional uint32 track_number = 6;
optional uint32 disc_number = 7;
optional string date = 8;
optional string genre = 9;
optional string composer = 10;
optional string comment = 11;
optional string lyrics = 12;
optional string copyright = 13;
optional bool compilation = 14;
optional string artist_sort = 15;
optional string album_artist_sort = 16;
optional string album_sort = 17;
optional string title_sort = 18;
optional string mb_recording_id = 20;
optional string mb_album_id = 21;
optional string mb_artist_id = 22;
optional float replaygain_track_gain = 30;
optional float replaygain_track_peak = 31;
optional float replaygain_album_gain = 32;
optional float replaygain_album_peak = 33;
map<string, string> custom_tags = 50;
}
4.7.2 CLI Interface
Two ways to set metadata: flags for quick single-field edits, JSON for bulk or complex updates. Both can be combined.
# ── View ──
# Print all metadata as JSON
musicfs metadata get "/Artist/Album/01 - Track.flac"
# Print specific field
musicfs metadata get "/Artist/Album/01 - Track.flac" --field artist
# ── Edit via flags (one field at a time or several) ──
musicfs metadata set "/Artist/Album/01 - Track.flac" \
--title "Corrected Title"
musicfs metadata set "/Artist/Album/01 - Track.flac" \
--artist "Corrected Artist" \
--album-artist "Corrected Artist" \
--year 2024 \
--genre "Rock"
# Every metadata column listed below has a corresponding flag:
# --title, --artist, --album, --album-artist,
# --track-number, --track-total, --disc-number, --disc-total,
# --date, --year, --genre,
# --composer, --comment, --lyrics, --copyright, --compilation,
# --artist-sort, --album-artist-sort, --album-sort, --title-sort,
# --mb-recording-id, --mb-album-id, --mb-artist-id,
# --mb-album-artist-id, --mb-release-group-id,
# --replaygain-track-gain, --replaygain-track-peak,
# --replaygain-album-gain, --replaygain-album-peak,
# --encoder
# Set a custom tag (anything not in the standard set)
musicfs metadata set "/path/to/file" --custom ISRC=US1234567890
# ── Edit via JSON (any number of fields at once) ──
# Inline JSON
musicfs metadata set "/Artist/Album/01 - Track.flac" --json '{
"title": "Corrected Title",
"artist": "Corrected Artist",
"year": 2024,
"custom_tags": {"ISRC": "US1234567890", "LABEL": "Sony"}
}'
# From file
musicfs metadata set "/Artist/Album/01 - Track.flac" --json @metadata.json
# Flags and JSON can be combined (flags take precedence)
musicfs metadata set "/path/to/file" --json @base.json --year 2025
# ── Revert ──
# Revert to original embedded metadata
musicfs metadata clear "/Artist/Album/01 - Track.flac"
# ── Diff ──
# Show what changed vs original
musicfs metadata diff "/Artist/Album/01 - Track.flac"
# ── Batch ──
# Import from CSV (columns map to field names)
musicfs metadata import --format csv metadata.csv
# Import from JSON (array of objects with "path" or "file_id" key)
musicfs metadata import --format json metadata.json
# Export
musicfs metadata export --output metadata.json
musicfs metadata export --query "artist:Beatles" --output beatles.json
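The precedence rule for combining `--json` with flags amounts to a map merge in which flag values overwrite JSON values for the same field. A minimal sketch, assuming both sources have already been parsed into string maps (the function name and the string-valued representation are illustrative only):

```rust
use std::collections::BTreeMap;

/// Combine fields parsed from --json with fields parsed from flags.
/// On a key collision the flag value wins.
pub fn combine_sources(
    json_fields: &BTreeMap<String, String>,
    flag_fields: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut merged = json_fields.clone();
    // extend() overwrites existing keys, which gives flags precedence.
    merged.extend(flag_fields.iter().map(|(k, v)| (k.clone(), v.clone())));
    merged
}
```

With `--json @base.json --year 2025`, a `year` of 2024 in base.json would be replaced by 2025 while every other field from the file is kept.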
5. Cross-Cutting Concerns
5.1 Security & Privacy
| Concern | Mitigation |
|---|---|
| Plugin isolation | Native plugins require explicit trust; future WASM sandboxing |
| No credential exposure | Overlays contain only metadata, never auth tokens |
| Backup/restore | All data in SQLite, included in standard backup |
5.2 Observability
Metrics:
musicfs_overlay_files_modified # Files with user-edited metadata
musicfs_overlay_generation_us # Histogram: header generation time
musicfs_overlay_read_total # Reads served via overlay
Logging:
INFO overlay.update file_id=123 fields=[title,artist]
DEBUG overlay.read file_id=123 offset=0 size=65536 generation_us=42
WARN overlay.format file_id=456 error="No handler for format=opus"
5.3 Scalability & Performance
| Metric | Target | Notes |
|---|---|---|
| Header generation | <100 μs | ~10-50 μs typical, pure CPU |
| read() overhead vs passthrough | <5% | One DB read + one synthesize |
| getattr() overhead | <1 μs | estimate_header_size(), no full synthesis |
| Storage per file | 0 extra | Metadata already in columns |
| Memory (LRU cache) | Optional | Cache hot headers if profiling shows need |
5.4 Testing Plan
| Test Type | Coverage |
|---|---|
| Unit | FormatHandler implementations, offset arithmetic |
| Integration | Full read path with overlays, DB round-trip |
| Format Matrix | Each format × {overlay on, overlay off} |
| Fuzzing | Malformed files, boundary offsets, huge metadata |
| Player Compat | mpv, VLC, Jellyfin, Plex, ffprobe |
6. Alternatives Considered
6.1 Alternative A: Pre-generate and Store Headers in DB
Description: Synthesize headers on metadata update, store as BLOB.
Rejected Because:
- 1-10 KB per file × 1M files = 1-10 GB unnecessary storage
- Cache invalidation complexity (must regenerate on any field change)
- Generation is <100 μs — faster than a SQLite BLOB read of that size
- More moving parts for no measurable benefit
6.2 Alternative B: NFO Sidecar Files
Description: Generate .nfo XML files alongside audio files.
Rejected Because:
- Only works with players that support NFO (Jellyfin, Plex)
- mpv, VLC, foobar2000 read embedded tags only
- Not transparent to all consumers
6.3 Alternative C: Full File Rewrite + CAS Cache
Description: Rewrite entire file with new metadata, cache in CAS.
Rejected Because:
- Doubles storage for modified files
- High CPU/memory on first access
- Defeats CAS deduplication
6.4 Alternative D: Metadata Blobs Instead of Columns
Description: Store metadata as a single msgpack/JSON blob per file.
Rejected Because:
- Not directly queryable (no WHERE artist = ?)
- Not indexable
- SQLite NULL columns cost one header byte each, so blobs save almost no space
- Schema is self-documenting with columns
- Virtual path templates can reference any column directly
7. Implementation Plan
7.1 Phase 1: Schema Migration + Core Types (3 days)
| Deliverable | Details |
|---|---|
| Schema migration | Add new columns to files table |
| FormatLayout struct | Audio boundary description |
| FormatHandler trait | Plugin interface |
| FormatHandlerRegistry | Built-in handler registration |
Exit Criteria: DB migrates cleanly, types compile.
7.2 Phase 2: Ingest Pipeline Update (3 days)
| Deliverable | Details |
|---|---|
| Update symphonia parser | Extract all new fields |
| Format analysis on ingest | Run analyze() → store format_layout |
| Populate new DB columns | All fields written on scan |
Exit Criteria: Full rescan populates all metadata columns.
7.3 Phase 3: Read Path + MP3/FLAC (5 days)
| Deliverable | Details |
|---|---|
| OverlayReader | Splice logic in FUSE read() |
| Id3v2Handler | analyze + synthesize for MP3 |
| FlacHandler | analyze + synthesize for FLAC |
| FUSE getattr() | Return virtual_size |
Exit Criteria: ffprobe/mpv reads modified MP3 and FLAC tags.
7.4 Phase 4: API + CLI (3 days)
| Deliverable | Details |
|---|---|
| gRPC MetadataService | get, set, clear, batch, import |
| CLI commands | musicfs metadata {get,set,clear,diff,import,export} |
Exit Criteria: Full API functional end-to-end.
7.5 Rollout
[experimental]
metadata_overlay = true # Enable overlay feature
[metadata_overlay]
# Additional format handlers loaded from this directory
plugin_dir = "/etc/musicfs/format-plugins/"
Files with no registered handler for their format are served with original bytes unchanged (passthrough). No error, no degradation.
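The passthrough rule can be expressed as a small decision made before the read path runs. The sketch below is illustrative only: `ServeMode`, `choose_mode`, and the handler ids are hypothetical, and the hard-coded match stands in for a registry lookup.

```rust
/// How a file's bytes are served.
#[derive(Debug, PartialEq)]
pub enum ServeMode {
    /// Synthetic header spliced with origin audio.
    Overlay { handler_id: &'static str },
    /// Original bytes served unchanged.
    Passthrough,
}

/// Decide how to serve a file, given the feature flag and the stored format.
pub fn choose_mode(overlay_enabled: bool, format: &str) -> ServeMode {
    if !overlay_enabled {
        return ServeMode::Passthrough; // feature flag off: behave as before
    }
    // Stand-in for a registry lookup; only the built-in formats match.
    match format.to_ascii_lowercase().as_str() {
        "mp3" => ServeMode::Overlay { handler_id: "id3v2" },
        "flac" => ServeMode::Overlay { handler_id: "flac" },
        _ => ServeMode::Passthrough, // no handler: original bytes, no error
    }
}
```

In passthrough mode getattr() would also need to report the origin file size rather than a virtual size, so that size and content stay consistent.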
8. Glossary & References
8.1 Glossary
| Term | Definition |
|---|---|
| Overlay | Mode where file serves user-edited metadata instead of original |
| Synthetic Header | Format-specific metadata bytes generated on-the-fly |
| Format Layout | Description of audio/metadata byte boundaries in origin file |
| Offset Translation | Converting virtual file offset to origin file offset |
8.2 References
| Document | Link |
|---|---|
| ID3v2.4 Specification | https://id3.org/id3v2.4.0-structure |
| FLAC Format | https://xiph.org/flac/format.html |
| OGG Encapsulation | https://xiph.org/ogg/doc/rfc3533.txt |
| MP4 Specification | ISO/IEC 14496-12 |
| MusicBrainz Picard Tag Mapping | https://picard-docs.musicbrainz.org/en/appendices/tag_mapping.html |
| symphonia StandardTagKey | https://docs.rs/symphonia-core/0.5.4/symphonia_core/meta/enum.StandardTagKey.html |
| lofty-rs | https://github.com/Serial-ATA/lofty-rs |
| MusicFS Architecture | architecture.md |
8.3 New Dependencies
| Crate | Version | Purpose |
|---|---|---|
| lofty | 0.24+ | Metadata header generation (all formats) |