Compare commits

4 Commits

Author SHA1 Message Date
Alexander 154f85bd9b chore(flake): add embedme to dev shell and pre-commit hooks
Keeps README code blocks in sync with source files (config.example.toml, dist/musicfs.service) on every commit.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/claude-agent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-18 13:43:08 +02:00
Alexander 61457e1f89 docs: add comprehensive project README
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/claude-agent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-18 13:43:03 +02:00
Alexander 4a1b68981e Forgotten fixes 2026-05-18 13:31:31 +02:00
Alexander b88583707d feat: add metadata enrichment integration with music-agregator
- Add SyncedFile message and subdir scoping to RescanOrigin proto
- Add label, album_type, cover_url fields to UpdateMetadataRequest/MetadataResponse
- Implement OriginScanner: walk, hash, diff, ingest with live FUSE tree and content fetcher registration
- Add enrichment DB columns: enrichment_source, enriched_at, enrichment_attempts, genres_json, label, album_type, cover_url
- Add EnrichmentUpdate struct and update_enrichment DB method
- Wire BatchUpdateMetadata to write enrichment fields alongside audio metadata
- Wire gRPC server into CLI mount command with --grpc-port flag
- Pass VirtualTree and ContentFetcher to scanner so rescanned files are immediately visible and readable via FUSE
2026-05-17 23:32:18 +02:00
15 changed files with 2063 additions and 42 deletions
Generated
+3
@@ -2050,8 +2050,11 @@ dependencies = [
"hex",
"hmac",
"musicfs-cache",
"musicfs-cas",
"musicfs-core",
"musicfs-metadata",
"musicfs-search",
"parking_lot 0.12.5",
"prost",
"reqwest",
"serde",
+879
@@ -0,0 +1,879 @@
# MusicFS
> A read-only FUSE filesystem that presents your music library organized by metadata — artist, album, track — regardless of how files are stored on disk.
Browse `/Artist/Album/Track.flac` in any media player or file manager. Original files are never touched.
---
## What It Does
MusicFS mounts as a virtual filesystem. Point it at your music storage (local drive, NFS share, S3 bucket, SFTP server) and it exposes a clean metadata-based directory tree:
```
/mnt/music/
├── Metallica/
│ └── 72 Seasons (2023) [FLAC]/
│ ├── 01 - 72 Seasons.flac
│ ├── 02 - Shadows Follow.flac
│ └── cover.jpg
├── Pink Floyd/
│ └── The Wall (1979) [FLAC]/
│ ├── 01 - In the Flesh?.flac
│ └── ...
└── .search/
└── (full-text search — see Search section)
```
Files are read directly from origin storage with local chunk caching. Once cached, playback works entirely offline. Write operations return `EROFS` — origin files are always safe.
---
## Features
| Feature | Details |
|---------|---------|
| **Instant mount** | O(1) regardless of library size (<500ms) |
| **Metadata-organized paths** | Configurable path templates via `$artist`, `$album`, `$year`, etc. |
| **Multi-origin federation** | Local, NFS, SMB, S3, SFTP — automatic failover by priority |
| **Content-addressable cache** | Chunk-level deduplication, LRU eviction, delta sync (>90% bandwidth savings) |
| **Full-text search** | `/.search/metallica/` returns instant results across 1M+ tracks |
| **Metadata overlay** | Set/override tags in the virtual layer without modifying originals |
| **Album art** | Virtual `cover.jpg` per album, extracted from embedded tags |
| **Plugin system** | Native `.so` and WASM plugins for custom origins, formats, metadata sources |
| **gRPC control API** | Cache stats, origin health, live event streaming, metadata management |
| **systemd integration** | `sd_notify` ready, journald logging, clean SIGTERM handling |
**Supported formats:** FLAC, MP3, OGG, WAV, M4A, AAC, Opus
---
## Quick Start
```bash
# 1. Enter dev environment (provides Rust, FUSE3, SQLite, everything)
nix develop
# 2. Build
cargo build
# 3. Mount your music library
./target/debug/musicfs mount /mnt/music --origin /path/to/your/music
# 4. Browse
ls /mnt/music
mpv /mnt/music/Artist/Album/01\ -\ Track.flac
# 5. Unmount
fusermount -u /mnt/music
```
No `rustup`, no `apt install`. The Nix flake provides the full toolchain.
---
## Installation
### From Nix (recommended)
```bash
# Development shell — everything you need
nix develop
# Or install the binary into your profile
nix profile install .#musicfs
```
### From Source
**Prerequisites (non-Nix):**
- Rust 1.75+
- `libfuse3-dev` / `fuse3` (package name varies by distro)
- `libsqlite3-dev`
- `libssl-dev`
- `protobuf-compiler` (for gRPC)
- `clang` + `lld`
```bash
git clone https://github.com/user/musicfs
cd musicfs/musicfs
cargo build --release
sudo cp target/release/musicfs /usr/local/bin/
```
### System Requirements
| Resource | Minimum | Recommended |
|----------|---------|-------------|
| CPU | 1 core | 4 cores |
| RAM | 256 MB | 2 GB |
| Disk (cache) | 1 GB | 50 GB |
| Linux kernel | 4.x+ | 5.x+ |
| FUSE module | required | — |
---
## Configuration
MusicFS can be configured via file (`--config`), CLI flags, or environment variables (`RUST_LOG` for log level).
### Minimal Config
```toml
mount_point = "/mnt/music"
cache_dir = "/home/user/.cache/musicfs"
[[origins]]
id = "local"
origin_type = "local"
priority = 1
path = "/mnt/nas/music"
```
```bash
musicfs mount --config /etc/musicfs/config.toml
```
### Full Config Reference
<!-- embedme config.example.toml -->
```toml
# MusicFS Configuration
# Copy to /etc/musicfs/config.toml or ~/.config/musicfs/config.toml
# Required: where to mount the virtual filesystem
mount_point = "/mnt/music"
# Required: directory for cache data (CAS chunks, metadata, search index)
cache_dir = "/var/cache/musicfs"
# ------------------------------------------------------------------------------
# Origins - music sources (at least one required)
# Supported types: local, nfs, smb, s3, sftp
# Lower priority number = preferred source for failover
# ------------------------------------------------------------------------------
[[origins]]
id = "local-music"
origin_type = "local"
priority = 1
enabled = true
path = "/home/user/Music"
[[origins]]
id = "nas-nfs"
origin_type = "nfs"
priority = 2
enabled = true
path = "/mnt/nas/music"
[[origins]]
id = "nas-smb"
origin_type = "smb"
priority = 3
enabled = false
path = "/mnt/smb/music"
[[origins]]
id = "cloud-backup"
origin_type = "s3"
priority = 10
enabled = false
bucket = "my-music-backup"
region = "us-east-1"
[[origins]]
id = "remote-server"
origin_type = "sftp"
priority = 10
enabled = false
host = "music.example.com"
port = 22
user = "musicfs"
path = "/srv/music"
# ------------------------------------------------------------------------------
# Cache settings
# ------------------------------------------------------------------------------
[cache]
# In-memory metadata cache size (artist/album/track info)
metadata_cache_mb = 100
# On-disk content cache size (audio chunks)
content_cache_gb = 10
# ------------------------------------------------------------------------------
# Health monitoring for origin failover
# ------------------------------------------------------------------------------
[health]
# How often to check origin health
check_interval_secs = 30
# Timeout for health check probes
timeout_ms = 5000
# Consecutive failures before marking origin unhealthy
unhealthy_threshold = 3
# Per-origin type thresholds (overrides unhealthy_threshold)
[health.per_origin_thresholds]
local = 1
nfs = 3
smb = 3
s3 = 3
sftp = 3
# ------------------------------------------------------------------------------
# Logging
# ------------------------------------------------------------------------------
[logging]
# Directory for log files
log_dir = "/var/log/musicfs"
# Output logs as JSON (for log aggregators)
json_output = false
# Send logs to systemd journal
journald = true
# Log level filter (tracing format)
# Examples: "info", "debug", "musicfs=debug,warn", "musicfs_fuse=trace"
level = "musicfs=info,warn"
# Trace sampling rate for performance tracing (0.0 to 1.0)
trace_sample_rate = 1.0
```
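The `[health]` settings above can be read as a small state machine per origin. The following is a minimal sketch of that reading, not the project's implementation: the `HealthTracker` type and its methods are hypothetical, and it assumes an origin becomes unhealthy once its consecutive failure count reaches the threshold and recovers on the first successful probe.

```rust
use std::collections::HashMap;

/// Hypothetical tracker of consecutive probe failures per origin.
struct HealthTracker {
    default_threshold: u32,
    /// Overrides keyed by origin type ("local", "nfs", ...), as in
    /// `[health.per_origin_thresholds]`.
    per_type: HashMap<String, u32>,
    /// Origin id -> consecutive failed probes.
    failures: HashMap<String, u32>,
}

impl HealthTracker {
    fn new(default_threshold: u32, per_type: HashMap<String, u32>) -> Self {
        Self { default_threshold, per_type, failures: HashMap::new() }
    }

    /// Per-type threshold if configured, otherwise `unhealthy_threshold`.
    fn threshold_for(&self, origin_type: &str) -> u32 {
        self.per_type
            .get(origin_type)
            .copied()
            .unwrap_or(self.default_threshold)
    }

    /// Records one probe result and returns whether the origin still counts as healthy.
    fn record(&mut self, origin_id: &str, origin_type: &str, ok: bool) -> bool {
        let threshold = self.threshold_for(origin_type);
        let count = self.failures.entry(origin_id.to_string()).or_insert(0);
        if ok {
            *count = 0; // assumed: one success resets the streak
        } else {
            *count = count.saturating_add(1);
        }
        *count < threshold
    }
}
```

With `local = 1`, a single failed probe marks a local origin unhealthy, while an NFS origin under the default of 3 tolerates two failures in a row.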
### Cache Layout on Disk
```
~/.cache/musicfs/
├── musicfs.db # SQLite: file metadata, virtual tree, overlay data
├── musicfs.lock # Single-instance lock
├── musicfs.pid # Daemon PID
├── chunks/ # Content-addressable chunk files
│ ├── aa/ # 256 subdirs (first 2 hex chars of hash)
│ │ └── aa1b2c… # 64 KB average chunk
│ └── ...
├── search.idx/ # Tantivy full-text search index
└── chunks.sled/ # Sled KV: content hash → chunk location
```
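The two-level `chunks/` layout can be sketched as a small path helper. This is an illustration only: `chunk_path` is a hypothetical name, and the hash is assumed to arrive as a hex string.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps a hex content hash to
/// `<cache_dir>/chunks/<first two hex chars>/<full hash>`.
/// Returns `None` when the input is too short or not hexadecimal.
fn chunk_path(cache_dir: &Path, hex_hash: &str) -> Option<PathBuf> {
    if hex_hash.len() < 2 || !hex_hash.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Slicing by bytes is safe here because the string is pure ASCII.
    let shard = &hex_hash[..2];
    Some(cache_dir.join("chunks").join(shard).join(hex_hash))
}
```

Sharding by the first two hex characters keeps any single directory from holding every chunk, which is the usual motivation for this layout.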
---
## CLI Reference
```
musicfs [OPTIONS] <COMMAND>
OPTIONS:
-l, --log-level <LEVEL> Log verbosity [default: info]
```
### `mount` — Start the filesystem
```bash
# From CLI flags (quick start)
musicfs mount /mnt/music --origin /path/to/music
# From config file
musicfs mount --config /etc/musicfs/config.toml
# All flags
musicfs mount [MOUNTPOINT] \
--config <path> # Config file (overrides flags)
--origin <path> # Source music directory
--cache-dir <path> # Cache location [default: ~/.cache/musicfs]
--grpc-port <port> # gRPC server port [default: 50052]
```
### `status` — Daemon status
```bash
musicfs status
```
### `cache` — Cache management
```bash
musicfs cache stats # Hit rate, size, dedup ratio
musicfs cache clear # Clear all caches
musicfs cache clear <origin-id> # Clear cache for one origin
musicfs cache prefetch <path> [path…] # Pre-warm cache for paths
```
### `search` — Full-text search
```bash
musicfs search "metallica" # Search across all metadata
musicfs search "dark side" --limit 20 # Limit results [default: 100]
```
Search results are also browsable as a virtual directory (see [Search](#search)).
### `origin` — Origin management
```bash
musicfs origin list # List all configured origins
musicfs origin health <id> # Check health of one origin
musicfs origin rescan <id> # Force re-scan and re-index
```
### `metadata` — Metadata overlay
```bash
# Requires running daemon
musicfs metadata get "/Artist/Album/01 - Track.flac"
musicfs metadata get "/Artist/Album/01 - Track.flac" --field artist
musicfs metadata set "/Artist/Album/01 - Track.flac" \
--title "New Title" \
--artist "New Artist" \
--album "New Album" \
--track 1 \
--genre "Rock" \
--date "2023"
# Set from JSON
musicfs metadata set "/path/to/file.flac" --json '{"title":"foo","year":2023}'
# Show current (overlaid) metadata
musicfs metadata diff "/path/to/file.flac"
# Revert overlay — restore original metadata
musicfs metadata clear "/path/to/file.flac"
# Bulk import/export
musicfs metadata import library.csv
musicfs metadata import library.json
musicfs metadata export --output library.json
musicfs metadata export --output library.csv --query "artist:Metallica"
```
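> **Note:** The `--endpoint` flag (default `http://[::1]:50051`) selects the gRPC server to talk to. `musicfs mount` listens on port `50052` unless `--grpc-port` is given, so pass an `--endpoint` whose port matches the running daemon.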
### `trash` — Deleted file recovery
When files disappear from the origin, MusicFS moves them to a virtual trash rather than removing them immediately.
```bash
musicfs trash list --config /etc/musicfs/config.toml
musicfs trash list --since 7d # Deleted in last 7 days
musicfs trash list --origin local # Filter by origin
musicfs trash list --path "/Metallica" # Filter by path prefix
musicfs trash restore "/Metallica/72 Seasons" # Restore folder
musicfs trash restore --all # Restore everything
musicfs trash empty --older-than 30d # Permanently delete old entries
musicfs trash empty --pattern "/Unknown*" # Delete by pattern
```
### `events` — Live event stream
```bash
musicfs events # All events
musicfs events --type file_added # Filter by type
# Event types: file_added, file_removed, file_modified,
# origin_connected, origin_disconnected,
# sync_started, sync_completed, cache_eviction
```
### `shutdown` — Stop the daemon
```bash
musicfs shutdown # Graceful (drain in-flight ops)
musicfs shutdown --graceful false # Immediate
musicfs shutdown --timeout 60 # Max drain timeout seconds
```
---
## Storage Origins
### Local Filesystem
```toml
[[origins]]
id = "local"
origin_type = "local"
priority = 1
path = "/mnt/nas/music"
```
Changes detected via `inotify`. Zero-latency access.
### NFS
```toml
[[origins]]
id = "nfs"
origin_type = "nfs"
priority = 2
host = "nas.local"
export = "/exports/music"
```
### SMB / CIFS
```toml
[[origins]]
id = "smb"
origin_type = "smb"
priority = 3
host = "nas.local"
share = "music"
```
### S3 (stub — not yet functional)
```toml
[[origins]]
id = "s3"
origin_type = "s3"
priority = 4
bucket = "my-music"
region = "us-east-1"
# Credentials via AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars
```
### SFTP (stub — not yet functional)
```toml
[[origins]]
id = "sftp"
origin_type = "sftp"
priority = 4
host = "server.example.com"
port = 22
username = "alice"
# Auth via SSH agent or key file — never store passwords in config
```
### Multi-Origin Failover
Multiple origins are federated into a single virtual tree. MusicFS selects origins by priority, falling back automatically when one becomes unhealthy. Health is polled every `check_interval_secs` (default: 30s). When all origins for a file are unavailable, cached data is served seamlessly.
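Priority-based selection can be sketched in a few lines. This is a minimal illustration under stated assumptions: `OriginState` and `select_origin` are hypothetical names, and "preferred" follows the config comment that a lower priority number wins.

```rust
/// Hypothetical snapshot of one configured origin.
#[derive(Debug, Clone, PartialEq)]
struct OriginState {
    id: String,
    priority: u32,
    enabled: bool,
    healthy: bool,
}

/// Picks the enabled, healthy origin with the lowest priority number.
/// On ties, the first origin in configuration order is returned.
/// `None` means no origin is usable and the caller must rely on cached data.
fn select_origin(origins: &[OriginState]) -> Option<&OriginState> {
    origins
        .iter()
        .filter(|o| o.enabled && o.healthy)
        .min_by_key(|o| o.priority)
}
```

If the priority-1 local origin is unhealthy, the priority-2 NFS origin is chosen; disabled origins are never considered, whatever their health.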
---
## Virtual Filesystem Layout
### Path Templates
The virtual path for each file is built from its audio metadata using a configurable template. Variables are sanitized (no `/`, `\`, `:`).
**Default template:**
```
$artist/$album ($year) [$format_upper]/$track - $title.$format
```
**Template variables:**
| Variable | Description | Example |
|----------|-------------|---------|
| `$artist` | Track artist | `Metallica` |
| `$album` | Album name | `72 Seasons` |
| `$title` | Track title | `Lux Æterna` |
| `$track` | Track number (zero-padded) | `03` |
| `$disc` | Disc number | `1` |
| `$year` | Release year | `2023` |
| `$genre` | Genre | `Metal` |
| `$format` | File extension (lowercase) | `flac` |
| `$format_upper` | File extension (uppercase) | `FLAC` |
Files with missing metadata fall back to `Unknown Artist/Unknown Album/filename`.
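Template expansion can be sketched as a single pass over the template string. This is not the project's code: `sanitize` and `render_template` are hypothetical names, replacing disallowed characters with `_` is an assumption (the README only says they are not allowed), and unknown variables are left as written.

```rust
use std::collections::HashMap;

/// Replaces the characters listed as disallowed (`/`, `\`, `:`) with `_`.
/// The replacement character is an assumption made for this sketch.
fn sanitize(value: &str) -> String {
    value
        .chars()
        .map(|c| if matches!(c, '/' | '\\' | ':') { '_' } else { c })
        .collect()
}

/// Expands `$name` variables from `vars`. A name is the longest run of
/// ASCII letters and underscores after `$`, so `$format_upper` is never
/// mistaken for `$format` followed by "_upper".
fn render_template(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        let len = after
            .find(|c: char| !(c.is_ascii_alphabetic() || c == '_'))
            .unwrap_or(after.len());
        let name = &after[..len];
        match vars.get(name) {
            Some(value) => out.push_str(&sanitize(value)),
            None => {
                // Unknown variable: keep the original text untouched.
                out.push('$');
                out.push_str(name);
            }
        }
        rest = &after[len..];
    }
    out.push_str(rest);
    out
}
```

Sanitizing each value before substitution matters because an artist such as `AC/DC` would otherwise introduce an extra directory level.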
### Album Art
Each album directory includes a virtual `cover.jpg` extracted from the embedded tags of the first track. No files are written to disk by MusicFS — the image is synthesized on read.
### Search
The `/.search/` virtual directory exposes full-text search as filesystem paths:
```bash
# Search via filesystem — use the query as a directory name
ls "/mnt/music/.search/dark side of the moon/"
# → Returns matching tracks as symlinks to their virtual paths
# Or use the CLI
musicfs search "dark side of the moon"
musicfs search "artist:Metallica" --limit 50
```
**Query syntax** (powered by [tantivy](https://github.com/quickwit-oss/tantivy)):
| Syntax | Example | Matches |
|--------|---------|---------|
| Simple terms | `metallica sandman` | All fields contain both words |
| Field-specific | `artist:Metallica` | Artist field only |
| Phrase | `album:"Master of Puppets"` | Exact phrase in album |
| Fuzzy | `metalica~1` | Within Levenshtein distance 1 |
| Range | `year:[1980 TO 1989]` | Numeric range |
| Boolean | `genre:Metal AND year:[1980 TO 1989]` | Combined conditions |
Indexed fields: `title`, `artist`, `album`, `album_artist`, `genre`, `composer`, `year`.
Results cached for 5 minutes. Max 1000 results per query. Queries capped at 256 characters.
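The stated limits can be expressed as two small checks. This is a sketch under assumptions: the constant and function names are hypothetical, and rejecting an over-long query (rather than truncating it) is a guess at the behavior.

```rust
/// Limits taken from the text above; the names are hypothetical.
const MAX_QUERY_CHARS: usize = 256;
const MAX_RESULTS: usize = 1000;
/// Default of the CLI `--limit` flag.
const DEFAULT_LIMIT: usize = 100;

/// Trims the query and rejects empty or over-long input.
fn validate_query(query: &str) -> Result<&str, String> {
    let trimmed = query.trim();
    if trimmed.is_empty() {
        return Err("query is empty".to_string());
    }
    // Count characters, not bytes, so non-ASCII titles are measured fairly.
    let len = trimmed.chars().count();
    if len > MAX_QUERY_CHARS {
        return Err(format!(
            "query has {} characters, limit is {}",
            len, MAX_QUERY_CHARS
        ));
    }
    Ok(trimmed)
}

/// Applies the default limit and clamps it to the hard maximum.
fn effective_limit(requested: Option<usize>) -> usize {
    requested.unwrap_or(DEFAULT_LIMIT).min(MAX_RESULTS)
}
```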
### Smart Collections
Built-in and custom query-based virtual folders appear alongside regular directories:
- **Recently Added** — tracks added in the last 30 days
- **80s Music** — year 1980–1989
- **90s Music** — year 1990–1999
Custom collections can be defined via the gRPC API with compound boolean queries over any indexed field.
---
## Metadata Overlay
MusicFS lets you override metadata in the virtual layer **without touching origin files**. Overlaid metadata is synthesized into the audio file header on read — players see your corrected tags, the origin file is unchanged.
```bash
# Fix a misnamed artist
musicfs metadata set "/Unknown/Best Of/01 - Track.flac" \
--artist "The Beatles" \
--album "Past Masters"
# Verify
musicfs metadata get "/The Beatles/Past Masters/01 - Track.flac"
# See what's been overlaid vs. original
musicfs metadata diff "/The Beatles/Past Masters/01 - Track.flac"
# Revert
musicfs metadata clear "/The Beatles/Past Masters/01 - Track.flac"
```
Supported fields: `title`, `artist`, `album`, `album-artist`, `track`, `disc`, `genre`, `date`, `composer`, `comment`, `lyrics`, `copyright`, `compilation`, sort fields (`artist-sort`, etc.), MusicBrainz IDs, ReplayGain values, and arbitrary custom tags.
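The overlay behavior can be pictured as a field-by-field merge in which set overlay values win. The sketch below is illustrative only: `Tags` is a hypothetical three-field stand-in for the real metadata type, and `effective_tags` is not a MusicFS function.

```rust
/// Hypothetical, heavily reduced tag set.
#[derive(Debug, Clone, Default, PartialEq)]
struct Tags {
    title: Option<String>,
    artist: Option<String>,
    album: Option<String>,
}

/// Overlay fields win when set; unset fields fall through to the original.
fn effective_tags(original: &Tags, overlay: &Tags) -> Tags {
    Tags {
        title: overlay.title.clone().or_else(|| original.title.clone()),
        artist: overlay.artist.clone().or_else(|| original.artist.clone()),
        album: overlay.album.clone().or_else(|| original.album.clone()),
    }
}
```

In this model, clearing the overlay corresponds to merging with an empty `Tags::default()`, which yields the original tags unchanged.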
---
## Plugin Development
Plugins extend MusicFS without modifying core code. Three plugin types:
| Type | Purpose | Examples |
|------|---------|---------|
| **Origin** | Custom storage backends | Google Drive, Dropbox, custom NAS protocol |
| **Metadata** | External tag enrichment | MusicBrainz, Discogs, Last.fm |
| **Format** | Custom audio formats | Game audio, proprietary codecs |
### Native Plugin (`.so`)
```toml
# Cargo.toml
[lib]
crate-type = ["cdylib"]
[dependencies]
musicfs-plugins = { path = "..." }
musicfs-core = { path = "..." } # needed for musicfs_core::AudioMeta below
semver = "1"
serde_json = "1"
```
```rust
use musicfs_plugins::{declare_plugin, Plugin, PluginType, FormatPlugin};
use musicfs_core::AudioMeta;
use semver::Version;
use serde_json::Value;
struct MyFormatPlugin;
impl Plugin for MyFormatPlugin {
fn name(&self) -> &str { "my-format" }
fn version(&self) -> Version { Version::new(1, 0, 0) }
fn plugin_type(&self) -> PluginType { PluginType::Format }
fn init(&mut self, _config: Value) -> musicfs_plugins::Result<()> { Ok(()) }
fn shutdown(&mut self) -> musicfs_plugins::Result<()> { Ok(()) }
}
impl FormatPlugin for MyFormatPlugin {
fn extensions(&self) -> &[&str] { &["xyz"] }
fn parse(&self, reader: &mut dyn std::io::Read) -> musicfs_plugins::Result<AudioMeta> {
// Parse your format and return metadata
todo!()
}
fn synthesize_header(&self, metadata: &AudioMeta) -> musicfs_plugins::Result<Vec<u8>> {
// Build a new file header with updated metadata
todo!()
}
}
// Required export — MusicFS calls this to instantiate the plugin
declare_plugin!(MyFormatPlugin, MyFormatPlugin);
```
```bash
cargo build --release
# produces target/release/libmy_format_plugin.so
```
### Loading Plugins
```toml
[plugins]
enabled = true
search_paths = ["/usr/lib/musicfs/plugins"] # Auto-discover .so files here
[plugins.plugins.my-format]
path = "/path/to/libmy_format_plugin.so"
enabled = true
config = { key = "value" } # Passed to Plugin::init()
```
### WASM Plugins (experimental)
```toml
[plugins.wasm]
enabled = true
max_memory_mb = 64
max_cpu_time_ms = 5000
```
Load a `.wasm` binary at runtime via the gRPC API or by placing it in a search path. WASM plugins run sandboxed inside [wasmtime](https://wasmtime.dev/).
### Plugin API Version
Current: `0.1.0`. Breaking changes will increment the major version. MusicFS checks `musicfs_plugin_api_version()` before loading any native plugin.
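A version gate consistent with the rule above might look like the following. This is a sketch, not the loader's actual check: `parse_version` and `is_compatible` are hypothetical, and "compatible" is assumed to mean "same major version".

```rust
/// Parses "major.minor.patch" into a tuple; returns `None` on malformed input.
fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Assumed rule: host and plugin agree when their major versions match.
/// Anything unparseable is treated as incompatible.
fn is_compatible(host_api: &str, plugin_api: &str) -> bool {
    match (parse_version(host_api), parse_version(plugin_api)) {
        (Some(host), Some(plugin)) => host.0 == plugin.0,
        _ => false,
    }
}
```

A real loader would more likely use the `semver` crate already listed in the plugin's dependencies; the hand-rolled parser only keeps the sketch free of external crates.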
---
## Control API (gRPC)
MusicFS exposes a gRPC API for programmatic control. The server starts automatically with the daemon.
**Default port:** `50052` (override with `--grpc-port`)
**Proto definition:** `crates/musicfs-grpc/proto/musicfs.proto`
### Available RPCs
```
MusicFS service:
GetStatus → daemon version, uptime, mount state, open handles
Shutdown → graceful or forced stop
GetCacheStats → hit rate, chunk count, dedup ratio, per-tier breakdown
ClearCache → clear all or per-origin, per-tier, dry-run supported
Prefetch → pre-warm cache for paths or search queries
ListOrigins → all configured origins with file count and health
GetOriginHealth → health status and latency for one origin
RescanOrigin → force re-scan with streaming progress
Search → full-text search (paginated or streaming)
SubscribeEvents → server-streaming live event feed
MetadataService:
GetMetadata → all tags for a virtual path
UpdateMetadata → set overlay tags for a file
ClearOverlay → revert to original metadata
ImportMetadata → bulk import from CSV/JSON (streaming progress)
```
### Query with `grpcurl`
```bash
# Daemon status
grpcurl -plaintext localhost:50052 musicfs.v1.MusicFS/GetStatus
# Search
grpcurl -plaintext -d '{"query": "metallica", "limit": 10}' \
localhost:50052 musicfs.v1.MusicFS/Search
# Cache stats
grpcurl -plaintext localhost:50052 musicfs.v1.MusicFS/GetCacheStats
# List origins
grpcurl -plaintext localhost:50052 musicfs.v1.MusicFS/ListOrigins
# Trigger rescan with live progress
grpcurl -plaintext -d '{"origin_id": "local"}' \
localhost:50052 musicfs.v1.MusicFS/RescanOrigin
# Live event stream
grpcurl -plaintext localhost:50052 musicfs.v1.MusicFS/SubscribeEvents
```
---
## Production Deployment
### systemd
```bash
sudo cp dist/musicfs.service /etc/systemd/system/
# Edit the service to match your paths:
# ExecStart=/usr/bin/musicfs mount --config /etc/musicfs/config.toml
sudo systemctl enable --now musicfs
sudo systemctl status musicfs
```
<!-- embedme dist/musicfs.service -->
```ini
[Unit]
Description=MusicFS - Virtual FUSE Filesystem for Music
After=network.target
[Service]
ExecStart=/usr/bin/musicfs mount /mnt/music --origin /path/to/music
ExecStopPost=/usr/bin/fusermount -u /mnt/music
Restart=on-failure
[Install]
WantedBy=multi-user.target
```
MusicFS sends `sd_notify(READY)` when the mount is live and `sd_notify(STOPPING)` during shutdown. Use `Type=notify` for precise readiness tracking.
### Signals
| Signal | Behavior |
|--------|---------|
| `SIGTERM` | Graceful shutdown — drains in-flight ops, unmounts |
| `SIGINT` | Graceful shutdown (same) |
| `SIGHUP` | Process pending file restores from trash |
### Security Notes
- Run as an **unprivileged user** — no root required.
- Store remote credentials in the **system keyring** or environment variables. Never put them in the config file.
- Credentials are redacted from logs and `RUST_LOG` output.
- WASM plugins run sandboxed. Native `.so` plugins have full process access — only load plugins you trust.
---
## Observability
### Logs
```bash
# Set level at startup
musicfs mount ... --log-level debug
# or via env
RUST_LOG=musicfs=debug,warn musicfs mount ...
```
| Level | Content |
|-------|---------|
| `error` | Unrecoverable failures, data corruption |
| `warn` | Recoverable failures, origin timeouts, skipped files |
| `info` | Mount/unmount, sync completion, config reload |
| `debug` | Cache hits/misses, origin selection, file scans |
| `trace` | Individual FUSE operations, chunk I/O |
Log files rotate daily in `log_dir` (default: `/var/log/musicfs/`). Structured JSON available with `json_output = true`. On Linux, logs forward to journald by default (`journald = true`).
### Prometheus Metrics
Metrics are exposed in Prometheus format via the gRPC API:
```
musicfs_fuse_ops_total{op="read"} 152341
musicfs_fuse_ops_total{op="readdir"} 8234
musicfs_fuse_latency_seconds{op="read",quantile="0.99"} 0.004
musicfs_cache_hits_total 142107
musicfs_cache_misses_total 10234
musicfs_cache_size_bytes 5368709120
musicfs_origin_health{origin="local"} 1
musicfs_origin_health{origin="s3"} 0
musicfs_sync_files_changed{origin="local"} 15
```
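As an example of consuming these metrics, the cache hit ratio can be derived from the two cache counters. The parser below is a deliberately small sketch: `parse_sample` and `cache_hit_ratio` are hypothetical helpers, and the parser assumes one `series value` pair per line with no trailing timestamp.

```rust
/// Parses one sample line such as `name{label="x"} 42` into (series, value).
/// Blank lines and `#` comment lines yield `None`.
fn parse_sample(line: &str) -> Option<(&str, f64)> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // Split at the last space so label values containing spaces survive.
    let (series, value) = line.rsplit_once(' ')?;
    Some((series.trim_end(), value.parse().ok()?))
}

/// hits / (hits + misses), or `None` when either counter is absent
/// or no lookups have happened yet.
fn cache_hit_ratio(exposition: &str) -> Option<f64> {
    let mut hits = None;
    let mut misses = None;
    for (series, value) in exposition.lines().filter_map(|l| parse_sample(l)) {
        match series {
            "musicfs_cache_hits_total" => hits = Some(value),
            "musicfs_cache_misses_total" => misses = Some(value),
            _ => {}
        }
    }
    let total = hits? + misses?;
    if total > 0.0 {
        Some(hits? / total)
    } else {
        None
    }
}
```

For the sample values above (142107 hits, 10234 misses) this works out to a hit ratio of roughly 93%.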
---
## Performance
| Operation | Target | Maximum |
|-----------|--------|---------|
| Mount (any library size) | <100ms | 500ms |
| `stat()` cached | <1ms | 5ms |
| `readdir()` cached | <10ms | 50ms |
| `open()` cached | <5ms | 20ms |
| `read()` cached | <1ms | 5ms |
| `read()` cache miss, local | <50ms | 200ms |
| `read()` cache miss, remote | <200ms | 1000ms |
| Search (1M tracks) | <500ms | 1000ms |
| Sequential read (cached) | >500 MB/s | — |
| Metadata ops | >1000 ops/s | — |
Memory: <50 MB idle, <200 MB with 1K files active, <500 MB peak.
Scales to 10M+ files with O(1) mount and O(log n) lookups.
---
## Known Limitations
These are tracked issues — see `docs/v2/plans/` for details.
| Issue | Impact | Workaround |
|-------|--------|-----------|
| **No persistent state on mount** | Every restart does a full origin scan (O(N)). SQLite/search index persist but are not loaded on startup. | — |
| **S3 and SFTP origins are stubs** | Only `local`, `nfs`, and `smb` have real implementations. | Use NFS/SMB mount as proxy for remote storage. |
| **No write-through for metadata** | Overlaid metadata exists only in MusicFS's database, not in the actual audio files. | Use a tagger (beets, mp3tag) to write back if needed. |
| **FUSE↔tokio deadlock risk** | `block_on()` in sync FUSE callbacks can stall under heavy concurrent load. | Keep concurrent open handles below ~500. |
| **No background task supervision** | Health monitor, watcher, and indexer are fire-and-forget. A crash silently stops background work. | Restart the daemon periodically in critical deployments. |
---
## Architecture
MusicFS is a workspace of 11 Rust crates:
```
musicfs-cli → binary, CLI parsing, startup wiring
musicfs-fuse → FUSE operations (fuser), virtual tree serving
musicfs-core → shared types, config, events, errors
musicfs-cache → SQLite metadata DB, virtual tree, format handlers
musicfs-cas → content-addressable chunk store (sled + xxHash64)
musicfs-origins → origin backends (local, NFS, SMB, S3 stub, SFTP stub)
musicfs-metadata → audio tag extraction (symphonia)
musicfs-sync → delta sync, CDC chunking (FastCDC), inotify watcher
musicfs-search → full-text index (tantivy), .search/ virtual dir
musicfs-grpc → gRPC server (tonic + prost), proto codegen
musicfs-plugins → plugin host, native .so loader, WASM sandbox
```
Data flow on a cache miss: `FUSE read()` → `VirtualPathResolver` → `CAS` (chunk lookup) → `OriginFederation` (fetch missing range) → CDC chunk → store → return.
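The miss path can be reduced to a read-through cache for illustration. Everything in this sketch is hypothetical (`Origin`, `ChunkCache`, `CountingOrigin`), and it omits path resolution, content-defined chunking, and hashing: chunks are simply looked up by an opaque key.

```rust
use std::collections::HashMap;

/// Hypothetical origin abstraction: returns the bytes for a chunk key.
trait Origin {
    fn fetch(&mut self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory stand-in for the on-disk chunk store.
struct ChunkCache {
    chunks: HashMap<String, Vec<u8>>,
}

impl ChunkCache {
    /// Serves from the cache when possible; on a miss, fetches from the
    /// origin, stores the bytes, and then serves them.
    fn read(&mut self, key: &str, origin: &mut dyn Origin) -> Option<&[u8]> {
        if !self.chunks.contains_key(key) {
            let data = origin.fetch(key)?; // miss: go to the origin
            self.chunks.insert(key.to_string(), data);
        }
        self.chunks.get(key).map(|v| v.as_slice())
    }
}

/// Test double that counts how often the origin is actually contacted.
struct CountingOrigin {
    data: HashMap<String, Vec<u8>>,
    fetches: usize,
}

impl Origin for CountingOrigin {
    fn fetch(&mut self, key: &str) -> Option<Vec<u8>> {
        self.fetches += 1;
        self.data.get(key).cloned()
    }
}
```

Reading the same key twice contacts the origin only once, which is the property that lets cached playback keep working when the origin is offline.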
Full design: [`docs/v2/architecture.md`](docs/v2/architecture.md)
Requirements: [`docs/v2/requirements.md`](docs/v2/requirements.md)
Roadmap: [`docs/v2/development-plan.md`](docs/v2/development-plan.md)
---
## Development
```bash
nix develop # Enter dev shell
cargo check # Fast compile check
cargo test # All 162 tests
cargo test -p musicfs-core # Single crate
cargo clippy # Lint
cargo fmt # Format
cargo nextest run # Parallel test runner (faster)
cargo watch -x check -x test # Watch mode
# Cargo aliases
cargo t # test
cargo c # check
cargo b # build
# gRPC codegen (runs via build.rs automatically)
cargo build -p musicfs-grpc
```
Pre-commit hooks (rustfmt + clippy) are installed automatically in the Nix dev shell.
---
## License
MIT OR Apache-2.0 — see [LICENSE-MIT](LICENSE-MIT) and [LICENSE-APACHE](LICENSE-APACHE).
+74 -1
@@ -786,6 +786,66 @@ impl Database {
Ok(())
}
pub fn update_enrichment(&self, file_id: FileId, enrichment: &EnrichmentUpdate) -> Result<()> {
let conn = self.conn.lock().unwrap();
let mut set_clauses = vec![
"label = ?1".to_string(),
"album_type = ?2".to_string(),
"cover_url = ?3".to_string(),
"enrichment_source = ?4".to_string(),
"enriched_at = strftime('%s', 'now')".to_string(),
"enrichment_attempts = 0".to_string(),
"last_enrichment_error = NULL".to_string(),
];
let mut params_vec: Vec<Box<dyn rusqlite::ToSql>> = vec![
Box::new(enrichment.label.clone()),
Box::new(enrichment.album_type.clone()),
Box::new(enrichment.cover_url.clone()),
Box::new(enrichment.source.clone()),
];
if let Some(ref genres) = enrichment.genres_json {
params_vec.push(Box::new(genres.clone()));
set_clauses.push(format!("genres_json = ?{}", params_vec.len()));
}
if let Some(ref genre) = enrichment.primary_genre {
params_vec.push(Box::new(genre.clone()));
set_clauses.push(format!("genre = ?{}", params_vec.len()));
}
params_vec.push(Box::new(file_id.0));
let id_param = params_vec.len();
let sql = format!(
"UPDATE files SET {} WHERE id = ?{}",
set_clauses.join(", "),
id_param
);
let params_refs: Vec<&dyn rusqlite::ToSql> =
params_vec.iter().map(|p| p.as_ref()).collect();
let rows = conn
.execute(&sql, params_refs.as_slice())
.map_err(|e| Error::Database(format!("update_enrichment failed: {}", e)))?;
if rows == 0 {
return Err(Error::FileNotFound(format!(
"file id {} not found",
file_id.0
)));
}
debug!(
id = file_id.0,
source = &enrichment.source,
"updated enrichment metadata"
);
Ok(())
}
pub fn clear_overlay(&self, file_id: FileId) -> Result<()> {
let conn = self.conn.lock().unwrap();
@@ -802,7 +862,10 @@ impl Database {
mb_recording_id = NULL, mb_album_id = NULL, mb_artist_id = NULL, mb_album_artist_id = NULL, mb_release_group_id = NULL,
replaygain_track_gain = NULL, replaygain_track_peak = NULL, replaygain_album_gain = NULL, replaygain_album_peak = NULL,
channels = NULL, bits_per_sample = NULL, encoder = NULL,
- custom_tags = NULL, format_layout = NULL
+ custom_tags = NULL, format_layout = NULL,
+ label = NULL, album_type = NULL, cover_url = NULL, genres_json = NULL,
+ enrichment_source = NULL, enriched_at = NULL,
+ enrichment_attempts = 0, last_enrichment_error = NULL
WHERE id = ?1
"#,
params![file_id.0],
@@ -948,6 +1011,16 @@ pub struct TrashedFile {
pub origin_id: OriginId,
}
#[derive(Debug, Clone, Default)]
pub struct EnrichmentUpdate {
pub label: Option<String>,
pub album_type: Option<String>,
pub cover_url: Option<String>,
pub genres_json: Option<String>,
pub primary_genre: Option<String>,
pub source: String,
}
#[derive(Debug, Clone, Default)]
pub struct TrashedFilter {
pub origin_id: Option<OriginId>,
+1 -1
@@ -11,7 +11,7 @@ mod prefetch;
mod tree;
pub use artwork::{ArtworkCache, ArtworkError, CachedArtwork};
- pub use db::{Database, TrashedFile, TrashedFilter};
+ pub use db::{Database, EnrichmentUpdate, TrashedFile, TrashedFilter};
pub use eviction::{EvictionError, EvictionPolicy, LruEviction};
pub use format_handler::{FormatError, FormatHandler, FormatHandlerRegistry};
pub use format_layout::FormatLayout;
+9
@@ -46,6 +46,15 @@ CREATE TABLE IF NOT EXISTS files (
encoder TEXT,
custom_tags TEXT,
format_layout BLOB,
label TEXT,
album_type TEXT,
cover_url TEXT,
genres_json TEXT,
enrichment_source TEXT,
enriched_at INTEGER,
enrichment_attempts INTEGER NOT NULL DEFAULT 0,
last_enrichment_error TEXT,
origin_mtime INTEGER NOT NULL,
origin_size INTEGER NOT NULL,
+47 -5
@@ -10,6 +10,7 @@ use musicfs_cache::{
use musicfs_cas::{CasConfig, CasStore, ContentFetcher, FileReader};
use musicfs_core::{FileId, FileMeta, LoggingConfig, OriginId, RealPath, VirtualPath};
use musicfs_fuse::MusicFs;
use musicfs_grpc::{MetadataServiceImpl, MusicFsServer as GrpcServer};
use musicfs_metadata::MetadataParser;
use musicfs_origins::{LocalOrigin, Origin};
use parking_lot::RwLock;
@@ -47,6 +48,8 @@ enum Commands {
origin: Option<PathBuf>,
#[arg(short = 'd', long, help = "Cache directory")]
cache_dir: Option<PathBuf>,
#[arg(long, default_value = "50052", help = "gRPC server port")]
grpc_port: u16,
},
Status,
Cache {
@@ -165,6 +168,7 @@ fn main() -> Result<()> {
mountpoint,
origin,
cache_dir,
grpc_port,
} => {
let mut config = if let Some(config_path) = config {
musicfs_core::Config::from_file(&config_path)?
@@ -213,7 +217,7 @@ fn main() -> Result<()> {
}
let _guard = init_logging(&config.logging)?;
- run_mount(config)
+ run_mount(config, grpc_port)
}
Commands::Status => {
init_basic_logging(&cli.log_level);
@@ -259,11 +263,11 @@ fn run_metadata(endpoint: String, command: MetadataCommand) -> Result<()> {
runtime.block_on(metadata::run_metadata(command, &endpoint))
}
- fn run_mount(config: musicfs_core::Config) -> Result<()> {
+ fn run_mount(config: musicfs_core::Config, grpc_port: u16) -> Result<()> {
let runtime = tokio::runtime::Runtime::new().context("Failed to create Tokio runtime")?;
let handle = runtime.handle().clone();
- let (tree, reader, db, overlay_reader) = runtime.block_on(async {
+ let (tree, reader, db, overlay_reader, origin_root, fetcher) = runtime.block_on(async {
info!(mountpoint = ?config.mount_point, "Mount configuration");
info!("Cache directory: {:?}", config.cache_dir);
@@ -364,7 +368,7 @@ fn run_mount(config: musicfs_core::Config) -> Result<()> {
let tree = Arc::new(RwLock::new(tree));
- let reader = Arc::new(FileReader::with_fetcher(store.clone(), fetcher));
+ let reader = Arc::new(FileReader::with_fetcher(store.clone(), fetcher.clone()));
// Create overlay reader for metadata synthesis
let overlay_reader = Arc::new(OverlayReader::new(
@@ -373,7 +377,15 @@ fn run_mount(config: musicfs_core::Config) -> Result<()> {
reader.clone(),
));
- Ok::<_, anyhow::Error>((tree, reader, db, overlay_reader))
+ let first_origin_root = config
+ .origins
+ .iter()
+ .find(|o| o.enabled && o.origin_type == musicfs_core::OriginType::Local)
+ .and_then(|o| o.settings.get("path").and_then(|v| v.as_str()))
+ .map(PathBuf::from)
+ .unwrap_or_else(|| PathBuf::from("/"));
+ Ok::<_, anyhow::Error>((tree, reader, db, overlay_reader, first_origin_root, fetcher))
})?;
check_stale_mount(&config.mount_point)?;
@@ -388,6 +400,8 @@ fn run_mount(config: musicfs_core::Config) -> Result<()> {
.context("Failed to write PID file")?;
info!(pid_path = ?pid_path, "PID file written");
let grpc_db = db.clone();
let tree_for_grpc = tree.clone();
let tree_for_restore = tree.clone();
let db_for_restore = db.clone();
@@ -411,6 +425,34 @@ fn run_mount(config: musicfs_core::Config) -> Result<()> {
let shutdown_token = tokio_util::sync::CancellationToken::new();
let event_bus = Arc::new(musicfs_core::EventBus::default());
let grpc_event_bus = event_bus.clone();
let grpc_origin_root = origin_root.clone();
let grpc_shutdown = shutdown_token.clone();
runtime.spawn(async move {
let addr = format!("0.0.0.0:{}", grpc_port).parse().unwrap();
let grpc_tree = tree_for_grpc.clone();
let grpc_fetcher = fetcher.clone();
let musicfs_server = GrpcServer::new(grpc_event_bus, grpc_db.clone(), grpc_tree, grpc_fetcher, grpc_origin_root);
let metadata_server = MetadataServiceImpl::new(grpc_db);
info!(%addr, "gRPC server starting");
let result = tonic::transport::Server::builder()
.add_service(musicfs_grpc::proto::musicfs::v1::music_fs_server::MusicFsServer::new(musicfs_server))
.add_service(musicfs_grpc::proto::musicfs::v1::metadata_service_server::MetadataServiceServer::new(metadata_server))
.serve_with_shutdown(addr, async move {
grpc_shutdown.cancelled().await;
})
.await;
if let Err(e) = result {
tracing::error!(error = %e, "gRPC server error");
}
});
runtime.block_on(async {
let mut sigterm =
tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())?;
+6
View File
@@ -387,6 +387,9 @@ async fn run_set(
replaygain_track_peak: fields.replaygain_track_peak,
replaygain_album_gain: fields.replaygain_album_gain,
replaygain_album_peak: fields.replaygain_album_peak,
label: None,
album_type: None,
cover_url: None,
custom_tags: fields.custom_tags,
}
} else {
@@ -416,6 +419,9 @@ async fn run_set(
replaygain_track_peak: None,
replaygain_album_gain: None,
replaygain_album_peak: None,
label: None,
album_type: None,
cover_url: None,
custom_tags: HashMap::new(),
}
};
+3
View File
@@ -5,8 +5,11 @@ edition.workspace = true
[dependencies]
musicfs-cache = { path = "../musicfs-cache" }
musicfs-cas = { path = "../musicfs-cas" }
musicfs-metadata = { path = "../musicfs-metadata" }
musicfs-search = { path = "../musicfs-search" }
musicfs-core = { path = "../musicfs-core" }
parking_lot.workspace = true
tonic.workspace = true
prost.workspace = true
tokio.workspace = true
+19
View File
@@ -2,6 +2,8 @@ syntax = "proto3";
package musicfs.v1;
option go_package = "homelab.lan/music-agregator/gen/musicfs/v1;musicfsv1";
service MusicFS {
rpc Search(SearchRequest) returns (SearchResponse);
rpc SearchStream(SearchRequest) returns (stream SearchResult);
@@ -152,6 +154,10 @@ message OriginInfo {
message OriginRequest {
string origin_id = 1;
// Optional subdirectory to scope the scan (relative to origin root).
// If empty, scans the entire origin.
// Example: "Metallica - Master of Puppets (1986) [FLAC]"
optional string subdir = 2;
}
message OriginHealthResponse {
@@ -167,6 +173,13 @@ message SyncProgress {
uint32 total = 3;
string current_path = 4;
uint64 bytes_synced = 5;
repeated SyncedFile new_files = 6;
}
message SyncedFile {
string path = 1;
int64 file_id = 2;
string virtual_path = 3;
}
message EventFilter {
@@ -226,6 +239,9 @@ message MetadataResponse {
optional uint32 channels = 34;
optional uint32 bits_per_sample = 35;
optional string encoder = 36;
optional string label = 40;
optional string album_type = 41;
optional string cover_url = 42;
map<string, string> custom_tags = 50;
}
@@ -255,6 +271,9 @@ message UpdateMetadataRequest {
optional float replaygain_track_peak = 31;
optional float replaygain_album_gain = 32;
optional float replaygain_album_peak = 33;
optional string label = 40;
optional string album_type = 41;
optional string cover_url = 42;
map<string, string> custom_tags = 50;
}
+1
View File
@@ -7,6 +7,7 @@ pub mod proto {
}
mod metadata;
pub mod scanner;
mod search_service;
mod server;
mod webhook;
+55 -15
View File
@@ -5,7 +5,7 @@ use crate::proto::musicfs::v1::{
ClearOverlayRequest, ClearOverlayResponse, GetMetadataRequest, ImportMetadataRequest,
ImportProgress, MetadataResponse, UpdateMetadataRequest, UpdateMetadataResponse,
};
use musicfs_cache::Database;
use musicfs_cache::{Database, EnrichmentUpdate};
use musicfs_core::{AudioMeta, FileId, VirtualPath};
use std::sync::Arc;
use tokio::sync::mpsc;
@@ -63,6 +63,9 @@ impl MetadataServiceImpl {
channels: meta.channels,
bits_per_sample: meta.bits_per_sample,
encoder: meta.encoder.clone(),
label: None,
album_type: None,
cover_url: None,
custom_tags: Default::default(),
}
}
@@ -160,24 +163,40 @@ impl MetadataService for MetadataServiceImpl {
let audio_meta = Self::request_to_audio_meta(&req);
match self.db.update_metadata(file_id, &audio_meta) {
Ok(()) => {
debug!(file_id = req.file_id, "Metadata updated successfully");
Ok(Response::new(UpdateMetadataResponse {
file_id: req.file_id,
success: true,
error_message: None,
}))
}
Err(e) => {
warn!(file_id = req.file_id, error = %e, "Failed to update metadata");
Ok(Response::new(UpdateMetadataResponse {
if let Err(e) = self.db.update_metadata(file_id, &audio_meta) {
warn!(file_id = req.file_id, error = %e, "Failed to update metadata");
return Ok(Response::new(UpdateMetadataResponse {
file_id: req.file_id,
success: false,
error_message: Some(e.to_string()),
}));
}
if req.label.is_some() || req.album_type.is_some() || req.cover_url.is_some() {
let enrichment = EnrichmentUpdate {
label: req.label.clone(),
album_type: req.album_type.clone(),
cover_url: req.cover_url.clone(),
genres_json: None,
primary_genre: None,
source: "orchestrator".to_string(),
};
if let Err(e) = self.db.update_enrichment(file_id, &enrichment) {
warn!(file_id = req.file_id, error = %e, "Failed to update enrichment");
return Ok(Response::new(UpdateMetadataResponse {
file_id: req.file_id,
success: false,
error_message: Some(e.to_string()),
}))
}));
}
}
debug!(file_id = req.file_id, "Metadata updated successfully");
Ok(Response::new(UpdateMetadataResponse {
file_id: req.file_id,
success: true,
error_message: None,
}))
}
#[instrument(level = "info", skip(self, request), fields(method = "clear_overlay"))]
@@ -239,7 +258,28 @@ impl MetadataService for MetadataServiceImpl {
let error_message = if let Some(ref metadata_req) = item.metadata {
let audio_meta = MetadataServiceImpl::request_to_audio_meta(metadata_req);
match db.update_metadata(file_id, &audio_meta) {
Ok(()) => None,
Ok(()) => {
if metadata_req.label.is_some()
|| metadata_req.album_type.is_some()
|| metadata_req.cover_url.is_some()
{
let enrichment = EnrichmentUpdate {
label: metadata_req.label.clone(),
album_type: metadata_req.album_type.clone(),
cover_url: metadata_req.cover_url.clone(),
genres_json: None,
primary_genre: None,
source: "orchestrator".to_string(),
};
if let Err(e) = db.update_enrichment(file_id, &enrichment) {
Some(e.to_string())
} else {
None
}
} else {
None
}
}
Err(e) => Some(e.to_string()),
}
} else {
+261
View File
@@ -0,0 +1,261 @@
use musicfs_cache::{Database, VirtualTree};
use musicfs_cas::ContentFetcher;
use musicfs_core::{
AudioMeta, Error, Event, EventBus, FileId, FileMeta, OriginId, RealPath, Result, VirtualPath,
};
use musicfs_metadata::MetadataParser;
use parking_lot::RwLock;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::time::UNIX_EPOCH;
use tokio::sync::mpsc;
use tracing::{info, warn};
pub struct ScanResult {
pub new_files: Vec<SyncedFileInfo>,
pub changed: u32,
pub deleted: u32,
pub unchanged: u32,
pub bytes_synced: u64,
}
pub struct SyncedFileInfo {
pub path: String,
pub file_id: FileId,
pub virtual_path: String,
}
#[derive(Debug, Clone)]
pub struct ScanProgress {
pub phase: String,
pub current: u32,
pub total: u32,
pub current_path: String,
pub bytes_synced: u64,
}
pub struct OriginScanner {
db: Arc<Database>,
event_bus: Arc<EventBus>,
tree: Arc<RwLock<VirtualTree>>,
fetcher: Arc<ContentFetcher>,
parser: MetadataParser,
}
impl OriginScanner {
pub fn new(
db: Arc<Database>,
event_bus: Arc<EventBus>,
tree: Arc<RwLock<VirtualTree>>,
fetcher: Arc<ContentFetcher>,
) -> Self {
Self {
db,
event_bus,
tree,
fetcher,
parser: MetadataParser,
}
}
pub async fn scan(
&self,
origin_id: &OriginId,
origin_root: &Path,
subdir: Option<&str>,
progress_tx: mpsc::Sender<ScanProgress>,
) -> Result<ScanResult> {
let scan_root = match subdir {
Some(sub) if !sub.is_empty() => origin_root.join(sub),
_ => origin_root.to_path_buf(),
};
if !scan_root.exists() {
return Err(Error::Origin(format!(
"scan path does not exist: {}",
scan_root.display()
)));
}
// Phase 1: Scanning
let audio_files = self.collect_audio_files(&scan_root, &progress_tx)?;
let total_files = audio_files.len() as u32;
info!(files = total_files, "scan phase complete");
// Phase 2: Hashing + categorization
let mut new_files = Vec::new();
let mut unchanged = 0u32;
for (i, abs_path) in audio_files.iter().enumerate() {
let _ = progress_tx.try_send(ScanProgress {
phase: "hashing".to_string(),
current: i as u32 + 1,
total: total_files,
current_path: abs_path.display().to_string(),
bytes_synced: 0,
});
let rel_path = abs_path.strip_prefix(origin_root).unwrap_or(abs_path);
let existing = self.db.get_file_by_real_path(origin_id, rel_path)?;
if existing.is_some() {
unchanged += 1;
continue;
}
let size = std::fs::metadata(abs_path).map(|m| m.len()).unwrap_or(0);
new_files.push(DiscoveredFile {
abs_path: abs_path.clone(),
rel_path: rel_path.to_path_buf(),
size,
});
}
info!(
new = new_files.len(),
unchanged = unchanged,
"hash phase complete"
);
// Phase 3: Indexing
let mut synced = Vec::new();
let mut bytes_synced = 0u64;
let ingest_total = new_files.len() as u32;
for (i, file) in new_files.iter().enumerate() {
let _ = progress_tx.try_send(ScanProgress {
phase: "indexing".to_string(),
current: i as u32 + 1,
total: ingest_total,
current_path: file.abs_path.display().to_string(),
bytes_synced,
});
let audio_meta = match self.parser.parse_file(&file.abs_path) {
Ok(meta) => meta,
Err(e) => {
warn!(path = %file.abs_path.display(), error = %e, "parse failed, using defaults");
AudioMeta::default()
}
};
let virtual_path = derive_virtual_path(&audio_meta, &file.rel_path);
let file_id = self.db.upsert_file(
origin_id,
&file.rel_path,
&virtual_path,
&audio_meta,
UNIX_EPOCH,
file.size,
)?;
let file_meta = FileMeta {
id: file_id,
virtual_path: virtual_path.clone(),
real_path: RealPath {
origin_id: origin_id.clone(),
path: file.rel_path.clone(),
},
size: file.size,
mtime: UNIX_EPOCH,
content_hash: None,
audio: Some(audio_meta),
};
{
let mut tree = self.tree.write();
tree.insert_file(&file_meta);
}
self.fetcher.register_file(file_meta.clone());
self.event_bus.publish(Event::FileAdded {
path: virtual_path.clone(),
origin_id: origin_id.clone(),
});
bytes_synced += file.size;
synced.push(SyncedFileInfo {
path: file.abs_path.display().to_string(),
file_id,
virtual_path: virtual_path.as_str().to_string(),
});
}
Ok(ScanResult {
new_files: synced,
changed: 0,
deleted: 0,
unchanged,
bytes_synced,
})
}
fn collect_audio_files(
&self,
scan_root: &Path,
progress_tx: &mpsc::Sender<ScanProgress>,
) -> Result<Vec<PathBuf>> {
let mut files = Vec::new();
self.walk_dir(scan_root, &mut files, progress_tx)?;
Ok(files)
}
fn walk_dir(
&self,
dir: &Path,
files: &mut Vec<PathBuf>,
progress_tx: &mpsc::Sender<ScanProgress>,
) -> Result<()> {
let entries = std::fs::read_dir(dir)
.map_err(|e| Error::Origin(format!("read_dir {}: {}", dir.display(), e)))?;
for entry in entries.flatten() {
let path = entry.path();
if path.is_dir() {
self.walk_dir(&path, files, progress_tx)?;
} else if is_audio_file(&path) {
files.push(path.clone());
let _ = progress_tx.try_send(ScanProgress {
phase: "scanning".to_string(),
current: files.len() as u32,
total: 0,
current_path: path.display().to_string(),
bytes_synced: 0,
});
}
}
Ok(())
}
}
fn derive_virtual_path(meta: &AudioMeta, rel_path: &Path) -> VirtualPath {
let artist = meta.artist.as_deref().unwrap_or("Unknown Artist");
let album = meta.album.as_deref().unwrap_or("Unknown Album");
let filename = rel_path
.file_name()
.and_then(|n| n.to_str())
.unwrap_or("unknown");
VirtualPath::new(format!("/{}/{}/{}", artist, album, filename))
}
fn is_audio_file(path: &Path) -> bool {
matches!(
path.extension()
.and_then(|e| e.to_str())
.map(|e| e.to_lowercase())
.as_deref(),
Some("flac" | "mp3" | "ogg" | "wav" | "m4a" | "aac" | "opus")
)
}
struct DiscoveredFile {
abs_path: PathBuf,
rel_path: PathBuf,
size: u64,
}
+115 -20
View File
@@ -2,11 +2,11 @@ use crate::proto::musicfs::v1::{
music_fs_server::MusicFs, CacheStats, ClearCacheRequest, ClearCacheResponse, Empty, Event,
EventFilter, HealthStatus, MountState, OriginHealthResponse, OriginRequest, OriginsResponse,
PrefetchProgress, PrefetchRequest, SearchRequest, SearchResponse, SearchResult,
ShutdownRequest, StatusResponse, SyncProgress, TierStats,
ShutdownRequest, StatusResponse, SyncProgress, SyncedFile, TierStats,
};
use musicfs_core::{Event as CoreEvent, EventBus};
use std::sync::Arc;
use std::time::{Duration, Instant};
use std::time::Instant;
use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;
use tonic::{Request, Response, Status};
@@ -16,14 +16,30 @@ pub struct MusicFsServer {
start_time: Instant,
event_bus: Arc<EventBus>,
version: String,
scanner: Arc<crate::scanner::OriginScanner>,
origin_root: std::path::PathBuf,
}
impl MusicFsServer {
pub fn new(event_bus: Arc<EventBus>) -> Self {
pub fn new(
event_bus: Arc<EventBus>,
db: Arc<musicfs_cache::Database>,
tree: Arc<parking_lot::RwLock<musicfs_cache::VirtualTree>>,
fetcher: Arc<musicfs_cas::ContentFetcher>,
origin_root: std::path::PathBuf,
) -> Self {
let scanner = Arc::new(crate::scanner::OriginScanner::new(
db,
event_bus.clone(),
tree,
fetcher,
));
Self {
start_time: Instant::now(),
event_bus,
version: env!("CARGO_PKG_VERSION").to_string(),
scanner,
origin_root,
}
}
@@ -368,24 +384,85 @@ impl MusicFs for MusicFsServer {
request: Request<OriginRequest>,
) -> Result<Response<Self::RescanOriginStream>, Status> {
let req = request.into_inner();
info!(origin_id = %req.origin_id, "gRPC rescan_origin started");
let subdir = req.subdir.as_deref().filter(|s| !s.is_empty());
info!(
origin_id = %req.origin_id,
subdir = ?subdir,
"gRPC rescan_origin started"
);
let (tx, rx) = mpsc::channel(32);
let (progress_tx, mut progress_rx) = mpsc::channel::<crate::scanner::ScanProgress>(64);
let origin_id = musicfs_core::OriginId::from(req.origin_id.as_str());
let scanner = self.scanner.clone();
let origin_root = self.origin_root.clone();
let subdir_owned = subdir.map(|s| s.to_string());
tokio::spawn(async move {
let phases = ["scanning", "indexing", "complete"];
for (i, phase) in phases.iter().enumerate() {
let progress = SyncProgress {
phase: phase.to_string(),
current: i as u32 + 1,
total: phases.len() as u32,
current_path: String::new(),
bytes_synced: 0,
};
if tx.send(Ok(progress)).await.is_err() {
break;
let forward_handle = {
let tx = tx.clone();
tokio::spawn(async move {
while let Some(progress) = progress_rx.recv().await {
let proto = SyncProgress {
phase: progress.phase,
current: progress.current,
total: progress.total,
current_path: progress.current_path,
bytes_synced: progress.bytes_synced,
new_files: vec![],
};
if tx.send(Ok(proto)).await.is_err() {
break;
}
}
})
};
let result = scanner
.scan(
&origin_id,
&origin_root,
subdir_owned.as_deref(),
progress_tx,
)
.await;
forward_handle.abort();
match result {
Ok(scan_result) => {
let synced_files: Vec<SyncedFile> = scan_result
.new_files
.iter()
.map(|f| SyncedFile {
path: f.path.clone(),
file_id: f.file_id.0,
virtual_path: f.virtual_path.clone(),
})
.collect();
let _ = tx
.send(Ok(SyncProgress {
phase: "complete".to_string(),
current: scan_result.new_files.len() as u32
+ scan_result.changed
+ scan_result.deleted,
total: scan_result.new_files.len() as u32
+ scan_result.changed
+ scan_result.deleted
+ scan_result.unchanged,
current_path: String::new(),
bytes_synced: scan_result.bytes_synced,
new_files: synced_files,
}))
.await;
}
Err(e) => {
let _ = tx
.send(Err(Status::internal(format!("rescan failed: {}", e))))
.await;
}
tokio::time::sleep(Duration::from_millis(100)).await;
}
});
@@ -438,10 +515,29 @@ impl MusicFs for MusicFsServer {
mod tests {
use super::*;
async fn make_test_server() -> (MusicFsServer, tempfile::TempDir) {
let event_bus = Arc::new(EventBus::new(16));
let db = Arc::new(musicfs_cache::Database::open_memory().unwrap());
let tree = Arc::new(parking_lot::RwLock::new(
musicfs_cache::TreeBuilder::new().build(),
));
let dir = tempfile::tempdir().unwrap();
let cfg = musicfs_cas::CasConfig {
chunks_dir: dir.path().join("chunks"),
..Default::default()
};
let store = Arc::new(musicfs_cas::CasStore::open(cfg).await.unwrap());
let fetcher = Arc::new(musicfs_cas::ContentFetcher::new(store));
let origin_root = std::path::PathBuf::from("/tmp/test-origin");
(
MusicFsServer::new(event_bus, db, tree, fetcher, origin_root),
dir,
)
}
#[tokio::test]
async fn test_get_status() {
let event_bus = Arc::new(EventBus::new(16));
let server = MusicFsServer::new(event_bus);
let (server, _dir) = make_test_server().await;
let response = server.get_status(Request::new(Empty {})).await.unwrap();
let status = response.into_inner();
@@ -452,8 +548,7 @@ mod tests {
#[tokio::test]
async fn test_get_cache_stats() {
let event_bus = Arc::new(EventBus::new(16));
let server = MusicFsServer::new(event_bus);
let (server, _dir) = make_test_server().await;
let response = server
.get_cache_stats(Request::new(Empty {}))
@@ -0,0 +1,579 @@
# Metadata Enrichment (Standalone Mode): Design Doc
**Authors:** Sisyphus
**Status:** Draft
**Last Updated:** 2026-05-18
**Reviewers:**
**Approvers:**
**Document Link:** `docs/v2/plans/metadata-enrichment-standalone.md`
**Prerequisites:** [architecture.md](../architecture.md), [week-12-external-metadata.md](week-12-external-metadata.md)
---
## 1. Abstract
When musicfs operates without the music-agregator orchestrator, it should
still be able to enrich file metadata (genres, label, artwork URL, album
type) by querying the metadata-agregator service directly. This document
describes a **built-in metadata provider** compiled into musicfs that
queries metadata-agregator's gRPC `SearchAlbums` endpoint using
artist + album names extracted from file tags. Enrichment is lazy and
non-blocking — file access always returns immediately using embedded
tags, while a background worker enriches metadata asynchronously.
This plan **supersedes** the week-12 plan's approach of embedding
MusicBrainz/Discogs/Last.fm HTTP clients directly into musicfs. Instead,
musicfs delegates all external metadata resolution to metadata-agregator,
which already handles provider APIs, rate limiting, and caching.
## 2. Background
### 2.1. Current State
musicfs extracts audio metadata via symphonia (FLAC, MP3, AAC, OGG,
Opus) and stores it in `AudioMeta`. This metadata is whatever the file
tags contain — typically title, artist, album, year, track number.
The existing plugin system (`musicfs-plugins`) defines a `MetadataPlugin`
trait for external metadata lookup, but:
- No plugins have been implemented yet.
- The plugin system only supports native `.so` and WASM plugins.
- A gRPC client to metadata-agregator would require bundling an async
runtime and tonic inside a `.so` — an awkward fit.
Meanwhile, metadata-agregator is a Go gRPC service that:
- Searches MusicBrainz by artist + album name (`SearchAlbums` RPC).
- Caches results in PostgreSQL.
- Returns rich metadata: genres, cover URL, label, release date, album
type, artist credits.
### 2.2. Pain Points
- musicfs files lack genres, artwork URLs, and label info unless the
original files were meticulously tagged.
- The week-12 plan proposed embedding 4 separate HTTP API clients
(MusicBrainz, Discogs, Last.fm, AcoustID) directly into musicfs,
duplicating what metadata-agregator already does.
- The `MetadataPlugin` trait is designed for `.so`/WASM plugins, which
makes it a poor fit for a gRPC client that is part of core infrastructure.
## 3. Goals & Non-Goals
### 3.1. Goals
- **G1:** Enrich file metadata with genres, label, album type, and cover
URL by querying metadata-agregator via gRPC.
- **G2:** Never block file access — enrichment happens in background.
- **G3:** Make the provider entirely optional — disabled by default,
musicfs works identically without it.
- **G4:** Respect enrichment source priority so orchestrator pushes
(from the full-system mode) are not overwritten.
### 3.2. Non-Goals
- **NG1:** Embedding MusicBrainz/Discogs/Last.fm HTTP clients directly
into musicfs (metadata-agregator handles this).
- **NG2:** Audio fingerprinting (AcoustID) — deferred to future work.
- **NG3:** Modifying the existing `MetadataPlugin` trait — the built-in
provider is separate from the plugin system.
- **NG4:** Bidirectional communication — musicfs only queries
metadata-agregator, never the reverse.
## 4. Proposed Design
### 4.1. High-Level Architecture
```plantuml
@startuml
!theme plain
skinparam componentStyle rectangle
package "musicfs" as mfs {
component "FUSE Layer\n(readdir/open/read)" as fuse
component "MetadataCache / DB" as db
component "OverlayReader\n(synthesize headers)" as overlay
component "EnrichmentQueue\n(bounded, async)" as queue
component "EnrichmentWorker\n(background)" as worker
}
component "metadata-agregator\nSearchAlbums(query, artist)" as meta
fuse -right-> db : lookup metadata
db -right-> overlay : serve with overlay
fuse -down-> queue : enriched_at NULL?\npush request
queue -down-> worker : dequeue
worker -down-> meta : gRPC:\nSearchAlbums(\n query=album,\n artist=artist)
meta -up-> worker : Album (genres,\nlabel, cover_url)
worker -up-> db : write enriched\nmetadata to overlay
note bottom of meta
metadata-agregator handles:
• MusicBrainz API
• rate limiting
• PostgreSQL cache
end note
note right of fuse
File access is never blocked.
Returns embedded tags immediately.
Enrichment happens async.
end note
@enduml
```
### 4.2. Enrichment Flow
```plantuml
@startuml
!theme plain
skinparam sequenceMessageAlign center
participant "Media Player" as mp
participant "FUSE Layer" as fuse
participant "MetadataCache\n(SQLite)" as db
participant "EnrichmentQueue" as queue
participant "EnrichmentWorker" as worker
participant "metadata-agregator" as meta
== File Access (non-blocking) ==
mp -> fuse : open("/Pink Floyd/The Wall/01 - In the Flesh.flac")
fuse -> db : lookup(virtual_path)
db --> fuse : AudioMeta(artist, album, title, ...)\nenriched_at = NULL
fuse -> queue : try_push(file_id, artist="Pink Floyd", album="The Wall")
note right of queue : non-blocking,\nbounded queue
fuse --> mp : return file handle\n(with embedded tags only)
== Background Enrichment (async) ==
queue -> worker : dequeue(file_id, artist, album)
worker -> worker : check enrichment_source\n(skip if 'orchestrator' or 'provider')
worker -> worker : dedup check:\nalready enriched same album?\n(reuse cached result)
worker -> meta : SearchAlbums(\n query="The Wall",\n artist="Pink Floyd",\n limit=1)
meta --> worker : Album(\n genres=["Progressive Rock", "Art Rock"],\n label="Harvest",\n cover_url="https://...",\n album_type="album")
worker -> db : update_metadata(\n file_id,\n genres, label, cover_url,\n enrichment_source='provider',\n enriched_at=now())
worker -> worker : publish EventBus::FileModified
note over mp : next access sees\nenriched metadata
@enduml
```
### 4.3. Detailed Design
#### 4.3.1. Configuration
Add `[metadata_provider]` section to `config.toml`:
```toml
[metadata_provider]
enabled = false # disabled by default
endpoint = "http://localhost:50051" # metadata-agregator gRPC
timeout_ms = 5000 # per-request timeout
retry_max = 3 # max retries on failure
retry_backoff_ms = 1000 # initial backoff between retries
queue_size = 256 # enrichment queue capacity
```
Config struct addition in `musicfs-core/src/config.rs`:
```rust
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct MetadataProviderConfig {
    #[serde(default)]
    pub enabled: bool,
    #[serde(default = "default_provider_endpoint")]
    pub endpoint: String,
    #[serde(default = "default_provider_timeout_ms")]
    pub timeout_ms: u64,
    #[serde(default = "default_retry_max")]
    pub retry_max: u32,
    #[serde(default = "default_retry_backoff_ms")]
    pub retry_backoff_ms: u64,
    #[serde(default = "default_queue_size")]
    pub queue_size: usize,
}
```
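The struct above names five default helpers without defining them. The following is a minimal sketch of what they might look like, assuming the values from the sample `config.toml`; the function bodies are hypothetical, and the serde attributes are left out so the block compiles with the standard library alone. It also swaps the derived `Default` for a manual one, because a derived `Default` would yield an empty `endpoint` and zeroes for the numeric fields rather than the documented defaults.

```rust
// Hypothetical default helpers. The names follow the `#[serde(default = "...")]`
// attributes in the design; the values follow the sample config.toml.
fn default_provider_endpoint() -> String {
    "http://localhost:50051".to_string()
}
fn default_provider_timeout_ms() -> u64 {
    5000
}
fn default_retry_max() -> u32 {
    3
}
fn default_retry_backoff_ms() -> u64 {
    1000
}
fn default_queue_size() -> usize {
    256
}

// Same fields as the design's struct; serde derives and attributes are
// omitted here to keep the sketch dependency-free.
#[derive(Debug, Clone, PartialEq)]
pub struct MetadataProviderConfig {
    pub enabled: bool,
    pub endpoint: String,
    pub timeout_ms: u64,
    pub retry_max: u32,
    pub retry_backoff_ms: u64,
    pub queue_size: usize,
}

// Manual `Default` so that `MetadataProviderConfig::default()` agrees with
// what a missing `[metadata_provider]` key would deserialize to.
impl Default for MetadataProviderConfig {
    fn default() -> Self {
        Self {
            enabled: false, // disabled by default (G3)
            endpoint: default_provider_endpoint(),
            timeout_ms: default_provider_timeout_ms(),
            retry_max: default_retry_max(),
            retry_backoff_ms: default_retry_backoff_ms(),
            queue_size: default_queue_size(),
        }
    }
}
```

Routing both the serde defaults and the `Default` impl through the same helpers keeps the two from drifting apart.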
#### 4.3.2. Built-in Metadata Provider
New module in `musicfs-metadata` (not a plugin, compiled in):
```rust
// musicfs-metadata/src/provider.rs
pub struct MetadataAgregatorProvider {
    client: MetadataServiceClient<Channel>,
    config: MetadataProviderConfig,
}

// Signatures only: method bodies are omitted in this design sketch.
impl MetadataAgregatorProvider {
    pub async fn connect(config: &MetadataProviderConfig)
        -> Result<Self>;

    /// Query metadata-agregator by artist + album names.
    /// Returns enriched metadata if a match is found.
    pub async fn lookup(
        &self,
        artist: &str,
        album: &str,
    ) -> Result<Option<EnrichedMetadata>>;
}
```
The `lookup` method calls `SearchAlbums(query=album, artist=artist,
limit=1)` on metadata-agregator. If a result is returned, it maps
the response to `EnrichedMetadata`:
```rust
pub struct EnrichedMetadata {
    pub genres: Vec<String>,
    pub label: Option<String>,
    pub album_type: Option<String>,
    pub cover_url: Option<String>,
    pub release_date: Option<String>,
    pub total_tracks: Option<u32>,
    pub total_discs: Option<u32>,
}
```
#### 4.3.3. ExternalMetadata Extension
Extend the existing `ExternalMetadata` in `musicfs-plugins/src/traits.rs`
to carry richer data:
```rust
pub struct ExternalMetadata {
    // existing fields...
    pub title: Option<String>,
    pub artist: Option<String>,
    pub album: Option<String>,
    pub album_artist: Option<String>,
    pub genre: Option<String>, // kept for backward compat
    pub year: Option<u32>,
    pub track: Option<u32>,
    pub disc: Option<u32>,
    pub musicbrainz_id: Option<String>,
    pub artwork_url: Option<String>,
    // new fields
    pub genres: Vec<String>,
    pub label: Option<String>,
    pub album_type: Option<String>,
    pub cover_url: Option<String>,
}
```
#### 4.3.4. Database Schema Changes
Add columns to `file_metadata` table in
`musicfs-cache/src/schema.sql`:
```sql
ALTER TABLE file_metadata ADD COLUMN enrichment_source TEXT;
-- 'embedded' | 'provider' | 'orchestrator'
ALTER TABLE file_metadata ADD COLUMN enriched_at INTEGER;
-- unix timestamp, NULL = not enriched
ALTER TABLE file_metadata ADD COLUMN enrichment_attempts INTEGER DEFAULT 0;
-- number of failed enrichment attempts
ALTER TABLE file_metadata ADD COLUMN last_enrichment_error TEXT;
-- last error message, NULL if no error
ALTER TABLE file_metadata ADD COLUMN genres_json TEXT;
-- JSON array: '["Progressive Rock","Art Rock"]'
-- separate from existing `genre` (singular) for backward compat
ALTER TABLE file_metadata ADD COLUMN label TEXT;
ALTER TABLE file_metadata ADD COLUMN album_type TEXT;
ALTER TABLE file_metadata ADD COLUMN cover_url TEXT;
```
> **Note:** The existing `genre TEXT` column (singular) is preserved
> for backward compatibility. `genres_json` stores the full list.
> The singular `genre` field is set to the first genre in the array
> when enriched.
#### 4.3.5. Background Enrichment Queue + Worker
```rust
// musicfs-metadata/src/enrichment.rs
pub struct EnrichmentQueue {
    tx: mpsc::Sender<EnrichmentRequest>,
    /// Tracks in-flight (artist, album) pairs to prevent duplicate
    /// API calls when multiple tracks from the same album are
    /// accessed simultaneously.
    in_flight: Arc<DashSet<(String, String)>>,
}

struct EnrichmentRequest {
    file_id: FileId,
    artist: String,
    album: String,
}

pub struct EnrichmentWorker {
    rx: mpsc::Receiver<EnrichmentRequest>,
    provider: Arc<MetadataAgregatorProvider>,
    db: Arc<Database>,
    event_bus: Arc<EventBus>,
    in_flight: Arc<DashSet<(String, String)>>,
    config: MetadataProviderConfig,
}
```
##### Enqueue-time dedup
When `EnrichmentQueue::try_push()` is called, it checks the
`in_flight` `DashSet` before pushing. If `(artist, album)` is
already in the set, the request is dropped (the worker will enrich
all files with the same album in one pass). This prevents 12
simultaneous track opens from making 12 identical API calls.
If `try_push` fails because the queue is full, log at WARN level
and increment `enrichment_queue_drops_total` metric.
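The enqueue-time dedup described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the planned implementation: it substitutes `Mutex<HashSet>` for `DashSet` and `std::sync::mpsc::sync_channel` for the Tokio channel, uses a plain `i64` in place of `FileId`, and introduces a hypothetical `PushOutcome` enum so callers can tell a duplicate from a dropped request.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::{Arc, Mutex};

#[derive(Debug, PartialEq)]
pub struct EnrichmentRequest {
    pub file_id: i64, // stand-in for FileId (assumption)
    pub artist: String,
    pub album: String,
}

/// Hypothetical result type; the design only says requests are "dropped".
#[derive(Debug, PartialEq)]
pub enum PushOutcome {
    Queued,
    Duplicate, // same (artist, album) already in flight
    Dropped,   // queue full or worker gone
}

pub struct EnrichmentQueue {
    tx: SyncSender<EnrichmentRequest>,
    in_flight: Arc<Mutex<HashSet<(String, String)>>>,
}

impl EnrichmentQueue {
    /// `capacity` should be at least 1; a capacity of 0 makes `sync_channel`
    /// a rendezvous channel, so every `try_send` without a waiting receiver fails.
    pub fn new(capacity: usize) -> (Self, Receiver<EnrichmentRequest>) {
        let (tx, rx) = sync_channel(capacity);
        let queue = Self {
            tx,
            in_flight: Arc::new(Mutex::new(HashSet::new())),
        };
        (queue, rx)
    }

    /// Never blocks on the channel: the request is queued, deduplicated, or dropped.
    pub fn try_push(&self, req: EnrichmentRequest) -> PushOutcome {
        let key = (req.artist.clone(), req.album.clone());
        // A poisoned lock is treated as fatal in this sketch.
        let mut in_flight = self.in_flight.lock().unwrap();
        // `insert` returns false if the key was already present, so the
        // check and the mark happen in one step under the lock.
        if !in_flight.insert(key.clone()) {
            return PushOutcome::Duplicate;
        }
        match self.tx.try_send(req) {
            Ok(()) => PushOutcome::Queued,
            Err(_) => {
                // Undo the mark so a later access can retry this album.
                in_flight.remove(&key);
                PushOutcome::Dropped
            }
        }
    }
}
```

Marking the pair in the same step that checks it avoids the window in which two concurrent opens both see the album as absent. The worker would remove the pair once the lookup finishes, as steps 6 and 7 of the worker loop describe.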
##### Worker loop (single-threaded, processes one at a time):
1. Dequeue `EnrichmentRequest`.
2. Check `enrichment_attempts` — skip if `>= retry_max`.
3. **Atomic conflict check** (applied when writing in step 6): the write uses conditional SQL:
```sql
UPDATE file_metadata SET
genres_json = ?, label = ?, album_type = ?, cover_url = ?,
genre = ?, -- first genre for backward compat
enrichment_source = 'provider',
enriched_at = strftime('%s', 'now'),
enrichment_attempts = 0,
last_enrichment_error = NULL
WHERE file_id = ?
AND (enrichment_source IS NULL OR enrichment_source = 'embedded')
```
This prevents the TOCTOU race: if the orchestrator wrote between
dequeue and the write, the `WHERE` clause matches no row and nothing is
overwritten. The UPDATE then reports zero affected rows, which the
worker treats as "skip, already enriched by higher-priority source".
4. Deduplicate by (artist, album) — if another file in the same album
was already enriched, reuse the cached `EnrichedMetadata` result
for all files with the same (artist, album) pair.
5. Call `provider.lookup(artist, album)`.
6. On success: execute atomic update (step 3) for all files with this
(artist, album). Publish `EventBus::FileModified` for each updated
file. Remove `(artist, album)` from `in_flight` set.
7. On failure: increment `enrichment_attempts`, set
`last_enrichment_error`. If `attempts < retry_max`, re-enqueue
with exponential backoff (`retry_backoff_ms * 2^attempts`).
If `attempts >= retry_max`, log at WARN and stop retrying.
Remove from `in_flight` set.
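The retry rule in step 7 reduces to a small pure function. The sketch below assumes `attempts` is the counter after it has been incremented for the current failure; the function name `retry_delay` is hypothetical, and the saturating arithmetic is an added safeguard the plan does not specify.

```rust
use std::time::Duration;

/// Delay before the next attempt: `retry_backoff_ms * 2^attempts`.
/// Returns `None` once `attempts` has reached `retry_max`, meaning "stop retrying".
pub fn retry_delay(retry_backoff_ms: u64, attempts: u32, retry_max: u32) -> Option<Duration> {
    if attempts >= retry_max {
        return None;
    }
    // `checked_shl` yields None for shifts of 64 bits or more; saturate instead of wrapping.
    let factor = 1u64.checked_shl(attempts).unwrap_or(u64::MAX);
    Some(Duration::from_millis(retry_backoff_ms.saturating_mul(factor)))
}
```

With the defaults from the configuration section (`retry_backoff_ms = 1000`, `retry_max = 3`), the delays for attempts 1 and 2 are two and four seconds, and the third failure ends the retries.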
##### Shutdown behavior
Queue contents are lost on shutdown. This is acceptable — files will
be re-queued on next access since `enriched_at` is still NULL.
Enrichment is idempotent.
#### 4.3.6. FUSE Integration Point
In the FUSE `readdir` / `getattr` / `open` path
(`musicfs-fuse/src/ops.rs`), after loading `AudioMeta` from DB:
```rust
if metadata_provider.is_enabled()
    && file_meta.enriched_at.is_none()
    && file_meta.enrichment_attempts < config.retry_max
{
    // Enrichment needs both tags. Clone them so `file_meta` stays usable
    // for the rest of the request.
    if let (Some(artist), Some(album)) =
        (file_meta.audio.artist.clone(), file_meta.audio.album.clone())
    {
        let request = EnrichmentRequest { file_id: file_meta.id, artist, album };
        if enrichment_queue.try_push(request).is_err() {
            // Queue full: the file will be retried on next access
            tracing::warn!(
                file_id = ?file_meta.id,
                "enrichment queue full, dropping request"
            );
            metrics::ENRICHMENT_QUEUE_DROPS.inc();
        }
    }
    // Non-blocking: returns immediately with embedded tags
}
```
The `enrichment_attempts < retry_max` check prevents files that have
permanently failed enrichment (e.g., metadata-agregator has no match)
from being re-queued on every access.
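Pulling the gating conditions into a pure function makes them easy to unit-test without a mounted filesystem. The following is a sketch under assumptions: `EnrichmentState` and `enrichment_key` are hypothetical names, and the struct carries only the fields the check reads.

```rust
/// Hypothetical view of the per-file fields the enqueue check needs.
pub struct EnrichmentState {
    pub enriched_at: Option<i64>, // unix timestamp, None = not enriched
    pub enrichment_attempts: u32,
    pub artist: Option<String>,
    pub album: Option<String>,
}

/// Returns the `(artist, album)` pair to enqueue, or `None` when the file
/// should be skipped (provider disabled, already enriched, retries
/// exhausted, or a tag missing).
pub fn enrichment_key(
    provider_enabled: bool,
    retry_max: u32,
    state: &EnrichmentState,
) -> Option<(String, String)> {
    if !provider_enabled
        || state.enriched_at.is_some()
        || state.enrichment_attempts >= retry_max
    {
        return None;
    }
    // `?` short-circuits to None if either tag is absent.
    Some((state.artist.clone()?, state.album.clone()?))
}
```

The FUSE path would call this once per access and hand any returned pair to the queue, so the hot path stays a handful of comparisons and two string clones.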
#### 4.3.7. Conflict Resolution
| Source | Priority | Writes When |
|--------|----------|-------------|
| `orchestrator` | Highest | Always overwrites (full-system mode push) |
| `provider` | Medium | Only if current source is NULL or `'embedded'` |
| `embedded` | Lowest | Implicit default from file tag parsing |
Conflict resolution is enforced **atomically at write time** using
conditional SQL (`WHERE enrichment_source IS NULL OR
enrichment_source = 'embedded'`), not at dequeue time. This prevents
the TOCTOU race where the orchestrator writes between the worker's
check and the worker's write.
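The priority table can also be expressed in code, which is useful for tests and for any in-memory path that mirrors the SQL guard. This is a sketch with hypothetical names (`EnrichmentSource`, `may_write`); the rule for an incoming `embedded` write is an assumption, since the table only calls it an implicit default.

```rust
/// Sources ordered from lowest to highest priority.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum EnrichmentSource {
    Embedded,
    Provider,
    Orchestrator,
}

impl EnrichmentSource {
    /// Maps the `enrichment_source` column value; NULL and unknown strings give None.
    pub fn from_column(value: Option<&str>) -> Option<Self> {
        match value? {
            "embedded" => Some(Self::Embedded),
            "provider" => Some(Self::Provider),
            "orchestrator" => Some(Self::Orchestrator),
            _ => None,
        }
    }
}

/// Mirrors the table: may `incoming` overwrite a row whose source is `current`?
pub fn may_write(current: Option<EnrichmentSource>, incoming: EnrichmentSource) -> bool {
    match incoming {
        // The orchestrator always overwrites.
        EnrichmentSource::Orchestrator => true,
        // Same condition as the SQL guard: source IS NULL OR source = 'embedded'.
        EnrichmentSource::Provider => {
            matches!(current, None | Some(EnrichmentSource::Embedded))
        }
        // Assumption: embedded tags only fill rows that have no source yet.
        EnrichmentSource::Embedded => current.is_none(),
    }
}
```

The conditional `UPDATE` remains the authority, because only the database can make the check and the write atomic; a function like this is a readable restatement of the same rule.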
#### 4.3.8. Proto Changes Required
The existing `UpdateMetadataRequest` in `musicfs.proto` must be
extended to carry the new enrichment fields:
```protobuf
// Add to UpdateMetadataRequest:
optional string label = 40;
optional string album_type = 41;
optional string cover_url = 42;
```
> **Note on genres:** metadata-agregator returns `repeated Genre`
> (objects with `id` + `name`). The provider extracts genre names
> and stores them as a JSON array in `genres_json`. The singular
> `genre` field in `UpdateMetadataRequest` (already exists at
> field 9) is set to the first/primary genre for backward compat.
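
The genre mapping in the note above could look roughly like the following. It is a std-only sketch so that it runs without dependencies; the `Genre` struct, the type of its `id` field, and the `map_genres` and `json_string` helpers are hypothetical, and real code would more likely serialize with a JSON library.

```rust
/// Hypothetical mirror of the provider's genre object (`id` + `name`).
pub struct Genre {
    pub id: String,
    pub name: String,
}

/// Minimal JSON string encoder: quotes the value and escapes quotes,
/// backslashes, and control characters.
pub fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Returns `(genres_json, genre)`: every genre name as a JSON array, plus
/// the first name for the singular backward-compatible field.
pub fn map_genres(genres: &[Genre]) -> (String, Option<String>) {
    let encoded: Vec<String> = genres.iter().map(|g| json_string(&g.name)).collect();
    let genres_json = format!("[{}]", encoded.join(","));
    let primary = genres.first().map(|g| g.name.clone());
    (genres_json, primary)
}
```

An empty result maps to `[]` and `None`, so a provider response without genres leaves the singular `genre` field unset in this sketch.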
#### 4.3.9. `cover_url` Usage
`cover_url` is stored in the metadata overlay but is **not used by
musicfs for artwork embedding or display** in this plan. It is
stored for consumption by external tools (e.g., media players that
query musicfs's gRPC `GetMetadata` and fetch artwork themselves).
Artwork download and caching is deferred to future work.
## 5. Cross-Cutting Concerns
### 5.1. Security & Privacy
- gRPC connection to metadata-agregator is plaintext (internal network).
TLS can be added via config if needed.
- No PII involved — only music metadata.
- No API keys stored in musicfs — metadata-agregator handles provider
auth.
### 5.2. Observability
New tracing spans and metrics:
| Metric | Type | Description |
|--------|------|-------------|
| `enrichment_queue_depth` | Gauge | Current queue size |
| `enrichment_queue_drops_total` | Counter | Requests dropped (queue full) |
| `enrichment_inflight_albums` | Gauge | In-flight (artist, album) dedup set size |
| `enrichment_lookups_total` | Counter | Total provider lookups |
| `enrichment_hits_total` | Counter | Successful matches |
| `enrichment_misses_total` | Counter | No match found |
| `enrichment_errors_total` | Counter | Provider errors |
| `enrichment_skipped_total` | Counter | Skipped (higher-priority source already wrote) |
| `enrichment_latency_ms` | Histogram | Lookup latency |
### 5.3. Scalability & Performance
- Queue is bounded (default 256); when it is full, `try_push` fails and
the request is dropped instead of blocking the caller.
- Album-level deduplication: 12 tracks in same album = 1 lookup.
- No impact on file read latency — enrichment is fully async.
- metadata-agregator caches in PostgreSQL, so repeated lookups are
cheap.
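
Album-level deduplication can be sketched as an in-flight set keyed by `(artist, album)`. The plan names `DashSet` for this; the version below substitutes a `Mutex<HashSet>` from the standard library so it stays dependency-free, and the `InFlight` type and its methods are hypothetical. It also assumes keys are compared exactly, with no case or whitespace normalization.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical in-flight set: one entry per (artist, album) being looked up.
#[derive(Default)]
pub struct InFlight {
    albums: Mutex<HashSet<(String, String)>>,
}

impl InFlight {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns true only for the first caller to claim an album; every
    /// other track of the same album gets false and skips the lookup.
    pub fn try_claim(&self, artist: &str, album: &str) -> bool {
        self.albums
            .lock()
            .unwrap()
            .insert((artist.to_string(), album.to_string()))
    }

    /// Called by the worker once the lookup has finished, hit or miss.
    pub fn release(&self, artist: &str, album: &str) {
        self.albums
            .lock()
            .unwrap()
            .remove(&(artist.to_string(), album.to_string()));
    }

    /// Current set size, the value an in-flight gauge would report.
    pub fn len(&self) -> usize {
        self.albums.lock().unwrap().len()
    }
}
```

With this shape, twelve tracks of one album opened together produce a single successful claim and therefore a single provider lookup.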
### 5.4. Testing Plan
| Test | Type | Validates |
|------|------|-----------|
| `test_provider_connect` | Unit | gRPC connection setup |
| `test_lookup_match` | Unit (mock) | SearchAlbums → EnrichedMetadata mapping |
| `test_lookup_no_match` | Unit (mock) | Graceful handling of empty results, increments attempts |
| `test_enrichment_queue_push` | Unit | Queue push + in_flight dedup |
| `test_enrichment_queue_full_drops` | Unit | try_push fails gracefully, logs, increments metric |
| `test_enrichment_worker_writes_db` | Integration | DB write after lookup |
| `test_enrichment_atomic_conflict` | Integration | Orchestrator writes between dequeue and worker write → worker does NOT overwrite |
| `test_enrichment_retry_backoff` | Unit | Failed attempts increment counter, exponential backoff |
| `test_enrichment_max_attempts_stop` | Unit | After retry_max failures, file not re-queued |
| `test_config_disabled` | Unit | No queue/worker when disabled |
| `test_album_dedup_simultaneous` | Integration | 12 tracks opened at once → 1 API call |
| `test_genre_backward_compat` | Unit | genres_json stored as array, genre set to first entry |
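
As an illustration of what `test_enrichment_retry_backoff` and `test_enrichment_max_attempts_stop` would exercise, here is one possible shape for the retry arithmetic. The plan does not specify the base delay, the cap, or the doubling factor, so all three are assumptions, as are the function names.

```rust
use std::time::Duration;

/// Delay before the next attempt: `base * 2^attempts`, clamped to `cap`.
/// Doubling and clamping are assumptions; the plan only says "exponential".
pub fn backoff_delay(attempts: u32, base: Duration, cap: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.checked_pow(attempts).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(cap, |delay| delay.min(cap))
}

/// Same comparison as the FUSE-path guard: stop once `retry_max` is reached.
pub fn may_retry(attempts: u32, retry_max: u32) -> bool {
    attempts < retry_max
}
```

Keeping the arithmetic in pure functions means both tests can run without a clock, a queue, or a database.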
## 6. Alternatives Considered
### 6.1. Native .so Plugin
Rejected. Requires bundling a separate async runtime + tonic gRPC
stack inside a dynamically loaded library. ABI instability, duplicate
runtimes, and deployment complexity outweigh the "purity" of using the
plugin system.
### 6.2. Direct MusicBrainz/Discogs/Last.fm HTTP Clients (week-12 plan)
Rejected. metadata-agregator already handles these providers with rate
limiting, caching, and deduplication. Embedding HTTP clients in musicfs
would duplicate this work and couple musicfs to specific provider APIs.
### 6.3. WASM Plugin
Rejected. WASI networking is immature. gRPC over WASM adds unnecessary
latency and complexity.
### 6.4. On-Demand Blocking Lookup
Rejected. Blocking file access while waiting for a gRPC response would
cause latency spikes and kill media player UX. Background async is the
only acceptable approach.
## 7. Implementation Plan
### Phase 1: Foundation (Day 1)
- [ ] Add `MetadataProviderConfig` to config.rs
- [ ] Add DB schema columns: `enrichment_source`, `enriched_at`,
`enrichment_attempts`, `last_enrichment_error`, `genres_json`,
`label`, `album_type`, `cover_url`
- [ ] Add `label`, `album_type`, `cover_url` fields to
`UpdateMetadataRequest` in `musicfs.proto`
- [ ] Extend `ExternalMetadata` struct
- [ ] Update `config.example.toml`
### Phase 2: Provider + Worker (Day 1-2)
- [ ] Implement `MetadataAgregatorProvider` (gRPC client wrapper)
- [ ] Implement `EnrichmentQueue` with `DashSet` in-flight dedup
- [ ] Implement `EnrichmentWorker` with:
- Atomic conditional write (`WHERE enrichment_source IS NULL OR ...`)
- Retry tracking (`enrichment_attempts`, exponential backoff)
- Album-level result caching
- [ ] Add queue drop logging + metrics
- [ ] Wire into startup (musicfs-cli) — conditional on config
### Phase 3: Integration + Tests (Day 2)
- [ ] Wire enrichment trigger in FUSE getattr/readdir path
(with `enrichment_attempts < retry_max` guard)
- [ ] Write unit tests: queue drops, retry backoff, max attempts,
genre backward compat
- [ ] Write integration tests: atomic conflict, 12-track simultaneous
dedup
- [ ] Write integration test with in-memory DB + mock gRPC server
- [ ] Update architecture.md with metadata provider component
## 8. Glossary / References
| Term | Definition |
|------|------------|
| metadata-agregator | Go gRPC service that searches MusicBrainz and caches results in PostgreSQL |
| Enrichment | Adding genres, label, artwork URL to file metadata beyond what's in file tags |
| Overlay | musicfs mechanism for serving modified metadata without changing origin files |
| `AudioMeta` | Core metadata struct extracted from file tags by symphonia |
| `ExternalMetadata` | Metadata returned by external providers (plugin trait) |
| `enrichment_source` | Tracks who last wrote metadata: `embedded`, `provider`, or `orchestrator` |
- [metadata-agregator proto](../../../../metadata-agregator/proto/metadata/v1/metadata.proto)
- [musicfs-plugins traits](../../crates/musicfs-plugins/src/traits.rs)
- [musicfs-cache overlay](../../crates/musicfs-cache/src/overlay.rs)
- [architecture.md](../architecture.md)
@@ -31,6 +31,15 @@
clippy = pkgs.clippy;
};
};
embedme = {
enable = true;
name = "embedme";
description = "Keep README code blocks in sync with source files";
entry = "${pkgs.nodePackages.embedme}/bin/embedme";
args = [ "README.md" ];
pass_filenames = false;
language = "system";
};
};
};
in {
@@ -73,6 +82,8 @@
protobuf
grpcurl
nodePackages.embedme
];
};
});