Update Control API specification to use gRPC over Unix socket instead of JSON-RPC 2.0.

gRPC provides better type safety, native streaming for events, and auto-generated clients for multi-language integration.

architecture.md:
- Add decision rationale table (JSON-RPC vs gRPC comparison)
- Add full .proto definitions (~200 lines) for musicfs.v1 package
- Define MusicFS service with 11 RPC methods:
  - Daemon: GetStatus, Shutdown
  - Cache: GetCacheStats, ClearCache, Prefetch (streaming)
  - Origins: ListOrigins, GetOriginHealth, RescanOrigin (streaming)
  - Search: Search, SearchStream
  - Events: SubscribeEvents (server-streaming)
- Add grpcurl debugging examples

requirements.md:
- FR-17.1: Clarify Unix socket uses gRPC
- FR-17.2: Upgrade from SHOULD to SHALL for gRPC requirement
MusicFS: Design Doc
Authors: [TBD]
Status: Draft
Last Updated: 2026-05-12
Reviewers: [TBD]
Approvers: [TBD]
Requirements: requirements.md
[TOC]
1. Abstract
MusicFS is a read-only FUSE filesystem that presents music libraries organized by metadata (artist/album/track) rather than physical file paths. It supports multiple origin storage backends (local, NFS, S3, SFTP), provides intelligent caching with delta synchronization, and exposes a plugin architecture for extensibility.
The system addresses limitations of the existing beetfs implementation:
- O(N) mount time → O(1) lazy loading
- Full file in RAM → streaming with content-addressable chunks
- Single origin → federated multi-origin with failover
- No offline support → cache-first with graceful degradation
Target users are media enthusiasts with large music collections (100K-10M+ tracks) distributed across multiple storage systems who want a unified, metadata-organized view without modifying original files.
2. Background
2.1 Current State
The existing beetfs implementation is a Python 2.7 FUSE plugin for beets that:
- Presents a virtual filesystem organized by metadata templates
- Overlays metadata from beets database onto file headers
- Supports metadata writes back to the beets database
2.2 Pain Points
| Problem | Impact |
|---|---|
| O(N) mount time (5-120s for large libraries) | Unusable for large collections |
| Loads entire file into RAM on open | OOM risk, 50-100MB per file |
| Python GIL limits concurrency | Poor performance under load |
| No caching between sessions | Repeated work on every mount |
| Single local origin only | Can't federate across storage |
| No offline support | Unusable without origin access |
| Critical bugs (nested methods, tree building) | Non-functional |
2.3 Related Systems
| System | Relationship |
|---|---|
| beets | Source of inspiration; potential import source |
| rclone mount | Similar FUSE + remote storage; no metadata organization |
| Plex/Jellyfin | Media servers with metadata; not filesystem-based |
3. Goals & Non-Goals
3.1 Goals
| ID | Goal | Success Metric |
|---|---|---|
| G1 | O(1) mount time | <500ms regardless of library size |
| G2 | Minimal memory footprint | <50MB idle, <500MB peak |
| G3 | Support multiple origins | ≥2 origins with automatic failover |
| G4 | Offline-first operation | Serve cached data when origin unavailable |
| G5 | Delta synchronization | >90% bandwidth reduction vs full sync |
| G6 | Plugin extensibility | Support custom origins, formats, metadata sources |
| G7 | Full-text search | Sub-second search across 1M+ tracks |
3.2 Design Requirements
The following quantitative requirements drive architectural decisions. Full specification in requirements.md.
3.2.1 Latency Requirements
| Operation | Target | Maximum | Requirement |
|---|---|---|---|
| stat() cached | <1ms | 5ms | NFR-1.1 |
| readdir() cached | <10ms | 50ms | NFR-1.2 |
| open() cached | <5ms | 20ms | NFR-1.3 |
| read() cached | <1ms | 5ms | NFR-1.4 |
| read() cache miss (local) | <50ms | 200ms | NFR-1.5 |
| read() cache miss (remote) | <200ms | 1000ms | NFR-1.6 |
| Mount completion | <100ms | 500ms | NFR-1.7 |
| Search query (1M files) | <500ms | 1000ms | FR-14 |
Design Response:
- Lazy loading eliminates mount-time I/O → O(1) mount
- In-memory LRU cache for hot metadata → <1ms stat
- SQLite with indexes → O(log n) lookups
- Async I/O via tokio → non-blocking operations
3.2.2 Throughput Requirements
| Metric | Target | Requirement |
|---|---|---|
| Sequential read (cached) | >500 MB/s | NFR-2.1 |
| Sequential read (local origin) | >200 MB/s | NFR-2.2 |
| Metadata ops/sec | >1000 | NFR-2.3 |
| Concurrent file handles | >1000 | NFR-2.4 |
Design Response:
- Memory-mapped chunk files → kernel-optimized reads
- No GIL (Rust) → true parallelism
- Async FUSE ops → handle many concurrent requests
3.2.3 Scalability Requirements
| Metric | Target | Stretch | Requirement |
|---|---|---|---|
| Library size | 1M files | 10M files | NFR-3.1, NFR-3.5 |
| Directory entries | 100K | 1M | NFR-3.2 |
| Concurrent clients | 10 | 100+ | NFR-3.6 |
| Mount time scaling | O(1) | O(1) | NFR-3.3 |
Design Response:
- Lazy tree loading → mount time independent of size
- SQLite indexes → O(log n) regardless of scale
- Streaming readdir → handle large directories
- Connection pooling → support many clients
3.2.4 Resource Requirements
| Resource | Idle | Active (1K files) | Peak | Requirement |
|---|---|---|---|---|
| Memory | <50 MB | <200 MB | <500 MB | NFR-4.1-4.3 |
| Per-file overhead | - | <1 KB | - | NFR-4.4 |
| Metadata cache | - | 100 MB default | configurable | NFR-5.1 |
| Content cache | - | 10 GB default | configurable | NFR-5.2 |
Design Response:
- Streaming reads → never load full file in memory
- Content-addressed chunks → bounded cache with LRU eviction
- Metadata in SQLite → minimal per-file RAM overhead
3.2.5 Efficiency Requirements
| Metric | Target | Requirement |
|---|---|---|
| Delta sync bandwidth reduction | >90% | NFR-6.4 |
| Cache hit rate (warm) | >95% | Derived |
| Deduplication ratio | >10% typical | FR-20 |
Design Response:
- CDC chunking → stable boundaries, minimal re-transfer
- Content-addressable storage → automatic deduplication
- Prefetch engine → anticipate access patterns
3.2.6 Reliability Requirements
| Scenario | Behavior | Requirement |
|---|---|---|
| Origin offline | Serve cached data | NFR-7.1 |
| Network failure | Graceful degradation, no crash | NFR-7.2 |
| Failed operation | Retry with backoff (100ms, 500ms, 2s) | NFR-7.3 |
| Malformed audio | Skip file, log error, don't crash | NFR-7.4 |
| Chunk corruption | Detect via checksum, re-fetch | NFR-8.1, NFR-8.4 |
| Interrupted sync | Resume from last good state | NFR-8.3 |
| Unclean unmount | Recover on next mount | NFR-8.2 |
Design Response:
- Cache-first architecture → offline operation by default
- Origin federation with health checks → survive single origin failure
- xxHash checksums on all chunks → detect corruption
- WAL mode SQLite → ACID transactions, crash recovery
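The retry row in the table above (100ms, 500ms, 2s) can be sketched as a small helper. This is a minimal illustration, not the daemon's actual code: the name `retry_with_backoff`, the injected `sleep` callback, and the reading of the schedule as "one initial attempt plus three retries" are all assumptions.

```rust
use std::time::Duration;

/// Backoff schedule from NFR-7.3: wait 100ms, then 500ms, then 2s.
const BACKOFF: [Duration; 3] = [
    Duration::from_millis(100),
    Duration::from_millis(500),
    Duration::from_secs(2),
];

/// Run `op` until it succeeds or the schedule is exhausted.
/// `sleep` is injected (hypothetical design choice) so callers can pass
/// `std::thread::sleep` in production and a recorder in tests.
fn retry_with_backoff<T, E, Op, Sleep>(mut op: Op, mut sleep: Sleep) -> Result<T, E>
where
    Op: FnMut() -> Result<T, E>,
    Sleep: FnMut(Duration),
{
    for delay in BACKOFF.iter() {
        match op() {
            Ok(value) => return Ok(value),
            // Failed attempt: wait for the scheduled delay, then try again.
            Err(_) => sleep(*delay),
        }
    }
    // Final attempt after the last delay; its error is returned to the caller.
    op()
}
```

A blocking caller could invoke it as `retry_with_backoff(|| fetch(), std::thread::sleep)`, where `fetch` stands for any fallible origin operation. In an async context the delay would have to come from the runtime's timer rather than a blocking sleep.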
3.2.7 Concurrent Access Requirements
| Scenario | Limit | Latency Impact | Requirement |
|---|---|---|---|
| Simultaneous open files | >1000 handles | None | NFR-2.4 |
| Parallel read ops | >100 concurrent | <2x p99 latency | Derived |
| Multiple clients | >10 (target 100+) | Linear degradation | NFR-3.6 |
| Readdir during sync | No blocking | Serve stale if needed | FR-9.2 |
Design Response:
- Async I/O (tokio) → non-blocking operations
- No GIL → true parallelism across cores
- Read-write locks on cache → readers don't block readers
- Stale-while-revalidate → serve cached during refresh
3.3 Non-Goals
| ID | Non-Goal | Rationale |
|---|---|---|
| NG1 | Write to origin files | Read-only by design; preserves originals |
| NG2 | Transcoding | Out of scope for MVP; plugin possible later |
| NG3 | Video file support | Focus on audio; deferred to future |
| NG4 | Distributed/clustered mode | Single-node for MVP; architecture supports later |
| NG5 | Mobile app | CLI/daemon only; filesystem interface |
4. Proposed Design
4.1 High-Level Architecture
@startuml
!theme plain
skinparam componentStyle rectangle
package "User Space" {
[Media Players\n(mpv, VLC, Plex)] as Apps
package "MusicFS Daemon" {
[FUSE Interface] as FUSE
[Control API] as Control
[Metrics] as Metrics
package "Core Services" {
[Virtual Path\nResolver] as VPR
[Event Bus] as Events
[Search Engine\n(tantivy)] as Search
}
package "Plugin Host" {
[Origin\nPlugins] as OriginPlugins
[Metadata\nPlugins] as MetaPlugins
[Format\nPlugins] as FormatPlugins
}
package "Storage Layer" {
[Content-Addressable\nStore (CAS)] as CAS
database "SQLite\n(metadata)" as SQLite
database "sled\n(chunks)" as Sled
}
[Origin\nFederation] as Federation
}
}
package "Origins (Read-Only)" {
[Local FS] as Local
[NFS] as NFS
[S3] as S3
[SFTP] as SFTP
}
Apps --> FUSE : POSIX
FUSE --> VPR
VPR --> Events
VPR --> Search
VPR --> CAS
CAS --> SQLite
CAS --> Sled
VPR --> Federation
Federation --> OriginPlugins
OriginPlugins --> Local
OriginPlugins --> NFS
OriginPlugins --> S3
OriginPlugins --> SFTP
Control --> Events
Metrics --> Events
@enduml
4.2 Component Overview
| Component | Responsibility | Technology |
|---|---|---|
| FUSE Interface | Translate POSIX ops to internal calls | fuser (Rust) |
| Virtual Path Resolver | Map virtual ↔ real paths | Custom |
| Event Bus | Decouple components, enable observability | tokio broadcast |
| Search Engine | Full-text metadata search | tantivy |
| Plugin Host | Load/manage plugins | Native + WASM |
| CAS | Content-addressed chunk storage | Custom + sled |
| Origin Federation | Multi-origin routing with failover | Custom |
4.3 Detailed Design
4.3.1 Virtual Path Resolution
The resolver maps metadata-based virtual paths to real origin paths.
@startuml
!theme plain
participant "FUSE" as F
participant "VirtualPathResolver" as VPR
participant "MetadataIndex" as MI
participant "TreeCache" as TC
participant "OriginFederation" as OF
F -> VPR : lookup("/Metallica/72 Seasons/01.flac")
VPR -> TC : get_cached(path)
alt cache hit
TC --> VPR : CachedEntry
else cache miss
VPR -> MI : query(artist="Metallica", album="72 Seasons", track=1)
MI --> VPR : FileRecord { origin_id, real_path, metadata }
VPR -> TC : store(path, entry)
end
VPR -> OF : resolve_origin(origin_id)
OF --> VPR : OriginHandle
VPR --> F : ResolvedPath { origin, real_path, inode }
@enduml
Path Template Grammar:
template = segment ("/" segment)*
segment = (literal | variable)+
variable = "$" identifier
identifier = "artist" | "album" | "title" | "track" | "year" | "genre"
| "format" | "format_upper" | "disc"
Default Template:
$artist/$album ($year) [$format_upper]/$track - $title.$format
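To make the grammar concrete, here is a minimal sketch of template expansion. The function name `render_template`, the use of a `HashMap` for field values, and the replacement of `/` inside values (so that an artist such as "AC/DC" does not create an extra directory level) are assumptions for illustration; the source does not specify how such values are handled.

```rust
use std::collections::HashMap;

/// Expand `$identifier` variables in a path template.
/// Unknown variables are left in place so that template mistakes stay visible.
fn render_template(template: &str, fields: &HashMap<&str, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut chars = template.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        // Consume the longest run of identifier characters after '$',
        // so `$format_upper` is not read as `$format` followed by "_upper".
        let start = i + 1;
        let mut end = start;
        while let Some(&(j, n)) = chars.peek() {
            if n.is_ascii_lowercase() || n == '_' {
                end = j + n.len_utf8();
                chars.next();
            } else {
                break;
            }
        }
        let name = &template[start..end];
        match fields.get(name) {
            // Assumption: path separators inside a value are replaced.
            Some(value) => out.push_str(&value.replace('/', "_")),
            None => {
                out.push('$');
                out.push_str(name);
            }
        }
    }
    out
}
```

With the default template and the fields of the track used in the sequence diagram, this would yield a path such as `Metallica/72 Seasons (2023) [FLAC]/01 - 72 Seasons.flac`.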
4.3.2 Content-Addressable Store (CAS)
All file content is stored as content-addressed chunks, enabling deduplication and efficient delta sync.
@startuml
!theme plain
package "Content-Addressable Store" {
component "Chunk Manager" as CM
component "CDC Chunker\n(FastCDC)" as CDC
component "Hash Index\n(xxHash64)" as Hash
database "Chunk Files\n~/.cache/musicfs/chunks/" as Chunks
database "Index DB\n(sled)" as Index
CM --> CDC : chunk data
CDC --> Hash : compute hash
Hash --> Index : store hash → location
CM --> Chunks : write chunk file
}
note right of CDC
Avg chunk: 64KB
Min: 16KB, Max: 256KB
Stable boundaries for delta sync
end note
@enduml
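The idea behind content-defined chunking can be sketched with a simplified gear-hash chunker. This is not FastCDC itself (the real algorithm uses a fixed gear table and normalized chunking, and the design uses the fastcdc crate); the table generator, the mask choice, and the function names are assumptions. Only the size bounds come from the diagram note.

```rust
/// Chunk-size bounds from the diagram note (16 KiB / 64 KiB / 256 KiB).
const MIN_CHUNK: usize = 16 * 1024;
const AVG_CHUNK: usize = 64 * 1024;
const MAX_CHUNK: usize = 256 * 1024;

/// Build a 256-entry table of pseudo-random values (splitmix64).
/// Illustration only: any fixed table of well-mixed values would do.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for slot in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *slot = z ^ (z >> 31);
    }
    table
}

/// Split `data` into content-defined chunks and return their lengths.
fn chunk_lengths(data: &[u8]) -> Vec<usize> {
    let gear = gear_table();
    // 16 mask bits: past MIN_CHUNK a cut point occurs roughly every 64 KiB.
    let mask: u64 = ((AVG_CHUNK as u64) - 1) << 16;
    let mut lengths = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let limit = (data.len() - start).min(MAX_CHUNK);
        let mut hash: u64 = 0;
        let mut len = limit; // forced cut if no boundary is found
        for i in 0..limit {
            // Rolling gear hash: old bytes shift out, the new byte mixes in.
            hash = (hash << 1).wrapping_add(gear[data[start + i] as usize]);
            if i + 1 >= MIN_CHUNK && (hash & mask) == 0 {
                len = i + 1;
                break;
            }
        }
        lengths.push(len);
        start += len;
    }
    lengths
}
```

Because a cut point depends only on the bytes just before it, an edit early in a file tends to shift only nearby boundaries, and later chunks keep their hashes. That property is what makes delta sync cheap.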
Chunk Storage Layout:
~/.cache/musicfs/
├── chunks/
│ ├── aa/
│ │ ├── aa1b2c3d4e5f6789... (64KB chunk)
│ │ └── aa9f8e7d6c5b4a32...
│ ├── ab/
│ └── ... (256 subdirs for distribution)
├── metadata.db (SQLite: file metadata, tree cache)
├── search.idx/ (tantivy: full-text index)
└── chunks.sled/ (sled: hash → chunk location)
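The two-character fan-out in the layout above could be computed as follows. This is a sketch under assumptions: the helper name `chunk_path`, the lowercase normalization, and the rejection of non-hex input are not specified by the source.

```rust
use std::path::{Path, PathBuf};

/// Map a hex-encoded chunk hash to its file path under the cache root,
/// using the first two hex digits as the shard directory (256 subdirs).
fn chunk_path(cache_root: &Path, hash_hex: &str) -> Option<PathBuf> {
    // Reject anything that is not plain hex, which also prevents
    // path traversal through a crafted "hash" string.
    if hash_hex.len() < 2 || !hash_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let name = hash_hex.to_ascii_lowercase();
    let shard = name[..2].to_string();
    Some(cache_root.join("chunks").join(shard).join(name))
}
```

For example, with the cache root `~/.cache/musicfs` expanded to an absolute path, the hash `aa1b2c3d4e5f6789` maps to `chunks/aa/aa1b2c3d4e5f6789` beneath it.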
4.3.3 Origin Federation
Multiple origins are managed with priority-based routing and health tracking.
@startuml
!theme plain
participant "VirtualPathResolver" as VPR
participant "OriginFederation" as OF
participant "HealthChecker" as HC
participant "Origin[Local]" as O1
participant "Origin[NFS]" as O2
participant "Origin[S3]" as O3
VPR -> OF : read(real_path, offset, size)
OF -> OF : select_origin(priority, health)
alt Origin[Local] healthy (pri=1)
OF -> O1 : read()
O1 --> OF : data
else Origin[Local] unhealthy, try NFS (pri=2)
OF -> O2 : read()
alt success
O2 --> OF : data
else failure
OF -> O3 : read()
O3 --> OF : data
end
end
OF --> VPR : data
note over HC
Background health checks
every 30s per origin
end note
@enduml
Origin Configuration:
[[origins]]
id = "local"
type = "local"
path = "/mnt/nas/music"
priority = 1
[[origins]]
id = "backup"
type = "s3"
bucket = "music-backup"
priority = 2
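The routing shown in the sequence diagram, combined with the `priority` values from the configuration, might look like the sketch below. The `Origin` struct, the `read_with_failover` function, and the closure-based `read` callback are hypothetical names; the sketch assumes a lower priority value means "preferred" and that origins marked unhealthy by the health checker are skipped.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Origin {
    id: String,
    priority: u32, // lower value = tried first (assumption based on the config)
    healthy: bool, // last result of the background health check
}

/// Try healthy origins in priority order and return the first successful read
/// together with the id of the origin that served it.
fn read_with_failover<F>(origins: &[Origin], mut read: F) -> Result<(String, Vec<u8>), String>
where
    F: FnMut(&Origin) -> Result<Vec<u8>, String>,
{
    let mut candidates: Vec<&Origin> = origins.iter().filter(|o| o.healthy).collect();
    candidates.sort_by_key(|o| o.priority);

    let mut last_error = String::from("no healthy origin available");
    for origin in candidates {
        match read(origin) {
            Ok(data) => return Ok((origin.id.clone(), data)),
            // Remember the failure and fall through to the next origin.
            Err(e) => last_error = format!("{}: {}", origin.id, e),
        }
    }
    Err(last_error)
}
```

Returning the serving origin's id alongside the data lets the caller record which origin was used, for example when emitting events or metrics.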
4.3.4 Plugin System
Plugins extend functionality without modifying core code.
@startuml
!theme plain
interface "Plugin" {
+name(): String
+version(): Version
+init(config)
+shutdown()
}
interface "OriginPlugin" {
+list_dir(path): Vec<DirEntry>
+read(path, offset, size): Vec<u8>
+stat(path): FileStat
+watch(path, callback): WatchHandle
}
interface "MetadataPlugin" {
+extract(data, format): Metadata
+can_handle(format): bool
}
interface "FormatPlugin" {
+extensions(): Vec<String>
+parse_header(data): AudioHeader
+synthesize_header(metadata): Vec<u8>
}
Plugin <|-- OriginPlugin
Plugin <|-- MetadataPlugin
Plugin <|-- FormatPlugin
class "LocalFSPlugin" implements OriginPlugin
class "S3Plugin" implements OriginPlugin
class "SymphoniaPlugin" implements MetadataPlugin
class "FlacPlugin" implements FormatPlugin
class "Mp3Plugin" implements FormatPlugin
@enduml
Plugin Loading:
- Built-in: Compiled into binary (Local, S3, SFTP, symphonia)
- Native: Dynamic libraries (.so/.dylib) loaded at runtime
- WASM: Sandboxed plugins via wasmtime (future)
4.3.5 Data Flow: Read Operation
@startuml
!theme plain
|FUSE|
start
:receive read(path, offset, size);
|VirtualPathResolver|
:resolve virtual path to real path;
:lookup file metadata;
|CAS|
:compute chunk range for [offset, offset+size];
if (all chunks cached?) then (yes)
:read from local chunk files;
else (no)
|OriginFederation|
:select healthy origin by priority;
:fetch missing byte range;
|CAS|
:chunk fetched data (CDC);
:store chunks by hash;
:update chunk manifest;
endif
|EventBus|
:emit FileAccessed event;
|FUSE|
:return data to application;
stop
@enduml
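The "compute chunk range" step in the flow above can be sketched as a binary search over the per-file chunk manifest. The `ChunkRef` struct and `chunks_for_range` function are illustrative names; the sketch assumes the manifest is sorted by offset and covers the file without gaps.

```rust
/// One manifest entry: (chunk_hash, offset, size), as stored in chunk_manifest.
#[derive(Debug, Clone, PartialEq)]
struct ChunkRef {
    hash: String,
    offset: u64,
    size: u64,
}

/// Return the manifest entries overlapping the byte range [offset, offset + size).
/// Assumes `manifest` is sorted by offset and contiguous.
fn chunks_for_range(manifest: &[ChunkRef], offset: u64, size: u64) -> &[ChunkRef] {
    if size == 0 {
        return &[];
    }
    let end = offset.saturating_add(size);
    // Index of the first chunk whose end lies beyond `offset`.
    let first = manifest.partition_point(|c| c.offset + c.size <= offset);
    // Index of the first chunk that starts at or after `end`.
    let last = manifest.partition_point(|c| c.offset < end);
    &manifest[first..last]
}
```

Each returned entry can then be checked against the chunk index; only the entries that are missing locally need a byte-range fetch from an origin.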
4.3.6 Data Schema
Metadata Index (SQLite):
CREATE TABLE files (
id INTEGER PRIMARY KEY,
origin_id TEXT NOT NULL,
real_path TEXT NOT NULL,
virtual_path TEXT NOT NULL,
-- Metadata (see FR-6 in requirements.md)
title TEXT,
artist TEXT,
album TEXT,
album_artist TEXT,
genre TEXT,
year INTEGER,
track INTEGER,
disc INTEGER,
duration_ms INTEGER,
bitrate INTEGER,
sample_rate INTEGER,
format TEXT,
-- Sync state
origin_mtime INTEGER,
origin_size INTEGER,
content_hash TEXT,
chunk_manifest BLOB, -- msgpack: [(chunk_hash, offset, size)]
last_sync INTEGER,
UNIQUE(origin_id, real_path)
);
CREATE INDEX idx_virtual ON files(virtual_path);
CREATE INDEX idx_artist_album ON files(artist, album);
CREATE INDEX idx_content_hash ON files(content_hash);
CREATE TABLE artwork (
id INTEGER PRIMARY KEY,
file_id INTEGER REFERENCES files(id),
art_type TEXT, -- 'front', 'back'
chunk_hash TEXT, -- reference to CAS
width INTEGER,
height INTEGER,
UNIQUE(file_id, art_type)
);
CREATE TABLE collections (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE,
query_json TEXT, -- smart collection query
created_at INTEGER
);
4.3.7 Control API
Protocol Choice: gRPC over Unix Socket
| Criterion | JSON-RPC | gRPC | Winner |
|---|---|---|---|
| Type safety | Runtime validation | Compile-time (protobuf) | gRPC |
| Schema evolution | Ad-hoc versioning | Built-in field numbering | gRPC |
| Streaming | Requires WebSocket/polling | Native bidirectional | gRPC |
| Client generation | Manual per language | Auto-gen 10+ languages | gRPC |
| Performance | JSON parse overhead | Compact binary encoding | gRPC |
| Debugging | Human-readable | Needs tooling (grpcurl) | JSON-RPC |
| Simplicity | Lower barrier | Requires protoc | JSON-RPC |
Decision: gRPC for primary API. Human-readable debugging via grpcurl and CLI wrapper.
Rationale:
- Event streaming - Native server-streaming for real-time sync/cache events without polling
- Multi-language clients - Auto-generated clients for Python (beets integration), Go, Node.js
- Schema evolution - Protobuf field numbering allows backward-compatible API changes
- Performance - Binary encoding avoids JSON serialization overhead on high-frequency event streams and status polling
Protocol Buffer Definitions:
syntax = "proto3";
package musicfs.v1;
// ============================================================================
// Core Services
// ============================================================================
service MusicFS {
// Daemon lifecycle
rpc GetStatus(Empty) returns (StatusResponse);
rpc Shutdown(ShutdownRequest) returns (Empty);
// Cache management
rpc GetCacheStats(Empty) returns (CacheStats);
rpc ClearCache(ClearCacheRequest) returns (ClearCacheResponse);
rpc Prefetch(PrefetchRequest) returns (stream PrefetchProgress);
// Origin management
rpc ListOrigins(Empty) returns (OriginsResponse);
rpc GetOriginHealth(OriginRequest) returns (OriginHealth);
rpc RescanOrigin(OriginRequest) returns (stream SyncProgress);
// Search
rpc Search(SearchRequest) returns (SearchResponse);
rpc SearchStream(SearchRequest) returns (stream SearchResult);
// Events (server-streaming)
rpc SubscribeEvents(EventFilter) returns (stream Event);
}
// ============================================================================
// Messages: Daemon
// ============================================================================
message Empty {}
message StatusResponse {
string version = 1;
uint64 uptime_seconds = 2;
string mount_point = 3;
MountState state = 4;
uint32 open_file_handles = 5;
uint64 fuse_ops_total = 6;
}
enum MountState {
MOUNT_STATE_UNKNOWN = 0;
MOUNT_STATE_MOUNTING = 1;
MOUNT_STATE_READY = 2;
MOUNT_STATE_SYNCING = 3;
MOUNT_STATE_DEGRADED = 4; // Some origins unavailable
MOUNT_STATE_UNMOUNTING = 5;
}
message ShutdownRequest {
bool force = 1; // Skip graceful drain
uint32 drain_timeout_ms = 2; // Max wait for in-flight ops (default: 5000)
}
// ============================================================================
// Messages: Cache
// ============================================================================
message CacheStats {
// Hit/miss counters
uint64 hits = 1;
uint64 misses = 2;
double hit_rate = 3;
// Storage
uint64 chunks_stored = 4;
uint64 chunks_unique = 5; // After deduplication
double dedup_ratio = 6; // Space saved by dedup
uint64 size_bytes = 7;
uint64 size_limit_bytes = 8;
// Metadata cache
uint64 metadata_entries = 9;
uint64 metadata_bytes = 10;
// Per-tier breakdown
TierStats l1_metadata = 11;
TierStats l2_headers = 12;
TierStats l3_chunks = 13;
}
message TierStats {
uint64 entries = 1;
uint64 bytes = 2;
uint64 evictions = 3;
}
message ClearCacheRequest {
optional string origin_id = 1; // Unset = all origins
CacheTier tier = 2; // Which tier to clear
bool dry_run = 3; // Report what would be cleared
}
enum CacheTier {
CACHE_TIER_ALL = 0;
CACHE_TIER_METADATA = 1;
CACHE_TIER_HEADERS = 2;
CACHE_TIER_CHUNKS = 3;
}
message ClearCacheResponse {
uint64 entries_cleared = 1;
uint64 bytes_freed = 2;
}
message PrefetchRequest {
repeated string paths = 1; // Virtual paths to prefetch
optional string query = 2; // Or search query
PrefetchStrategy strategy = 3;
}
enum PrefetchStrategy {
PREFETCH_METADATA_ONLY = 0; // Just stat info
PREFETCH_HEADERS = 1; // Metadata + audio headers
PREFETCH_FULL = 2; // Complete file content
}
message PrefetchProgress {
string path = 1;
uint64 bytes_fetched = 2;
uint64 bytes_total = 3;
bool complete = 4;
optional string error = 5;
}
// ============================================================================
// Messages: Origins
// ============================================================================
message OriginRequest {
string origin_id = 1;
}
message OriginsResponse {
repeated OriginInfo origins = 1;
}
message OriginInfo {
string id = 1;
string origin_type = 2; // "local", "nfs", "s3", "sftp"
string display_name = 3;
OriginHealth health = 4;
uint64 file_count = 5;
uint64 total_bytes = 6;
int64 last_sync_unix = 7;
}
message OriginHealth {
HealthStatus status = 1;
uint32 latency_ms = 2;
optional string error_message = 3;
int64 last_check_unix = 4;
}
enum HealthStatus {
HEALTH_UNKNOWN = 0;
HEALTH_HEALTHY = 1;
HEALTH_DEGRADED = 2; // Slow but working
HEALTH_UNHEALTHY = 3; // Connection failed
}
message SyncProgress {
string origin_id = 1;
SyncPhase phase = 2;
uint64 files_scanned = 3;
uint64 files_changed = 4;
uint64 files_total = 5;
uint64 bytes_transferred = 6;
optional string current_file = 7;
bool complete = 8;
optional string error = 9;
}
enum SyncPhase {
SYNC_PHASE_SCANNING = 0;
SYNC_PHASE_COMPARING = 1;
SYNC_PHASE_FETCHING = 2;
SYNC_PHASE_INDEXING = 3;
SYNC_PHASE_COMPLETE = 4;
}
// ============================================================================
// Messages: Search
// ============================================================================
message SearchRequest {
string query = 1; // Full-text query
uint32 limit = 2; // Max results (default: 100)
uint32 offset = 3; // Pagination
repeated string fields = 4; // Restrict to fields: artist, album, title
optional string origin_id = 5; // Filter by origin
}
message SearchResponse {
repeated SearchResult results = 1;
uint64 total_matches = 2;
uint32 query_time_ms = 3;
}
message SearchResult {
string virtual_path = 1;
string title = 2;
string artist = 3;
string album = 4;
float score = 5; // Relevance score
map<string, string> highlights = 6; // Field -> highlighted snippet
}
// ============================================================================
// Messages: Events
// ============================================================================
message EventFilter {
repeated EventType types = 1; // Empty = all events
optional string origin_id = 2; // Filter by origin
}
enum EventType {
EVENT_TYPE_ALL = 0;
EVENT_TYPE_FILE_ADDED = 1;
EVENT_TYPE_FILE_REMOVED = 2;
EVENT_TYPE_FILE_MODIFIED = 3;
EVENT_TYPE_ORIGIN_CONNECTED = 4;
EVENT_TYPE_ORIGIN_DISCONNECTED = 5;
EVENT_TYPE_SYNC_STARTED = 6;
EVENT_TYPE_SYNC_COMPLETED = 7;
EVENT_TYPE_CACHE_EVICTION = 8;
}
message Event {
EventType type = 1;
int64 timestamp_unix = 2;
string origin_id = 3;
optional string path = 4;
map<string, string> metadata = 5;
}
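The filter semantics stated in the comments above (empty `types` means all events, `origin_id` narrows by origin) could be applied on the server side roughly as follows. The Rust types here are hand-written stand-ins for the generated protobuf code, and the function name `matches` is hypothetical. Treating an explicit `EVENT_TYPE_ALL` entry as a wildcard is also an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventType {
    All,
    FileAdded,
    FileRemoved,
    FileModified,
    OriginConnected,
    OriginDisconnected,
    SyncStarted,
    SyncCompleted,
    CacheEviction,
}

struct EventFilter {
    types: Vec<EventType>,     // empty = all events
    origin_id: Option<String>, // None = any origin
}

struct Event {
    event_type: EventType,
    origin_id: String,
}

/// Decide whether `event` should be delivered to a subscriber with `filter`.
fn matches(filter: &EventFilter, event: &Event) -> bool {
    let type_ok = filter.types.is_empty()
        || filter.types.contains(&EventType::All)
        || filter.types.contains(&event.event_type);
    let origin_ok = filter
        .origin_id
        .as_ref()
        .map_or(true, |id| *id == event.origin_id);
    type_ok && origin_ok
}
```

A subscription created by `musicfs events --type=file_added` would correspond to a filter with `types = [FileAdded]` and no origin restriction.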
CLI Interface (wraps gRPC client):
musicfs mount /mnt/music # Mount filesystem
musicfs status # GetStatus()
musicfs cache stats # GetCacheStats()
musicfs cache clear --origin=local # ClearCache(origin_id="local")
musicfs search "metallica heavy" # Search(query="metallica heavy")
musicfs origin list # ListOrigins()
musicfs origin rescan local # RescanOrigin() with progress
musicfs events --type=file_added # SubscribeEvents() stream
Debugging:
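# Direct gRPC inspection via grpcurl
# Assumes the socket is served without TLS (-plaintext) and that server
# reflection is enabled; otherwise pass the schema with -proto.
# Flags must come before the socket path.
grpcurl -plaintext -unix /run/musicfs.sock musicfs.v1.MusicFS/GetStatus
grpcurl -plaintext -unix -d '{"query":"metallica"}' /run/musicfs.sock musicfs.v1.MusicFS/Search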
5. Cross-Cutting Concerns
5.1 Security & Privacy
| Concern | Mitigation |
|---|---|
| Credential storage | Use system keyring (secret-service) or env vars; never in config file |
| Credential exposure | Redact from logs; keep out of command-line arguments (visible in /proc/[pid]/cmdline) |
| Cache at rest | Optional encryption via age/libsodium (P3 requirement) |
| Plugin sandboxing | WASM plugins run in wasmtime sandbox; native plugins require trust |
| Access control | Respect origin permissions; run as unprivileged user |
| No PII handling | Filesystem metadata only; no user data collected |
5.2 Observability
Metrics (Prometheus format):
musicfs_fuse_ops_total{op="read"} 152341
musicfs_fuse_ops_total{op="readdir"} 8234
musicfs_fuse_latency_seconds{op="read",quantile="0.99"} 0.004
musicfs_cache_hits_total 142107
musicfs_cache_misses_total 10234
musicfs_cache_size_bytes 5368709120
musicfs_origin_health{origin="local"} 1
musicfs_origin_health{origin="s3"} 0
musicfs_sync_files_changed{origin="local"} 15
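As a worked example, the two cache counters above can be turned into the hit rate reported in `CacheStats.hit_rate`. The formula (hits divided by hits plus misses) and the function name are assumptions, since the source does not define how the field is computed.

```rust
/// Fraction of cache lookups served from cache; 0.0 when there were no lookups.
fn hit_rate(hits: u64, misses: u64) -> f64 {
    let total = hits + misses;
    if total == 0 {
        0.0
    } else {
        hits as f64 / total as f64
    }
}
```

With the sample values, `hit_rate(142_107, 10_234)` is about 0.933, which would fall short of the >95% warm-cache target in section 3.2.5. The sample numbers are illustrative only.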
Logging Levels:
| Level | Content |
|---|---|
| ERROR | Unrecoverable failures, data corruption |
| WARN | Recoverable failures, origin timeouts |
| INFO | Mount/unmount, sync completion, config reload |
| DEBUG | Cache hits/misses, origin selection |
| TRACE | Individual FUSE operations, chunk I/O |
Golden Signals Dashboard:
- Latency: p50/p95/p99 for read, stat, readdir
- Traffic: FUSE ops/sec, bytes read/sec
- Errors: Origin failures, cache corruption
- Saturation: Cache fullness, open file handles
5.3 Scalability & Performance
Expected Load:
| Metric | Target | Maximum |
|---|---|---|
| Library size | 1M files | 10M files |
| Concurrent clients | 10 | 100+ |
| FUSE ops/sec | 1,000 | 10,000 |
| Read throughput | 500 MB/s | 1 GB/s |
Scaling Strategy:
- Horizontal: Not supported (single daemon per mountpoint)
- Vertical: Increase cache size, add origins
Resource Requirements:
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 1 core | 4 cores |
| RAM | 256 MB | 2 GB |
| Disk (cache) | 1 GB | 50 GB |
| Network | 10 Mbps | 1 Gbps |
5.4 Testing Plan
| Test Type | Scope | Tools |
|---|---|---|
| Unit | Individual components | cargo test |
| Integration | Component interaction | cargo test --features integration |
| E2E | Full FUSE operations | pytest + real mount |
| Performance | Latency, throughput | criterion.rs, custom benchmarks |
| Stress | High load, large libraries | locust, custom generators |
| Chaos | Origin failures, network issues | toxiproxy |
Test Matrix:
Origins: [local, s3, sftp] × [healthy, degraded, offline]
Cache: [cold, warm, full]
Library: [100, 10K, 1M, 10M] files
Operations: [mount, readdir, stat, read, search]
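To gauge the size of this matrix, the combinations can be enumerated directly. The sketch below assumes the full cross product of all dimensions is intended; the `Case` struct and `test_matrix` function are illustrative names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Case {
    origin: &'static str,
    origin_state: &'static str,
    cache: &'static str,
    library_files: u64,
    operation: &'static str,
}

/// Enumerate the full cross product of the test matrix dimensions.
fn test_matrix() -> Vec<Case> {
    let origins = ["local", "s3", "sftp"];
    let states = ["healthy", "degraded", "offline"];
    let caches = ["cold", "warm", "full"];
    let libraries = [100u64, 10_000, 1_000_000, 10_000_000];
    let operations = ["mount", "readdir", "stat", "read", "search"];

    let mut cases = Vec::new();
    for &origin in origins.iter() {
        for &origin_state in states.iter() {
            for &cache in caches.iter() {
                for &library_files in libraries.iter() {
                    for &operation in operations.iter() {
                        cases.push(Case { origin, origin_state, cache, library_files, operation });
                    }
                }
            }
        }
    }
    cases
}
```

The full product is 3 × 3 × 3 × 4 × 5 = 540 cases, so the 10M-file rows in particular may need to be restricted to a nightly or on-demand run.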
6. Alternatives Considered
6.1 Alternative A: Extend beetfs (Python)
Description: Fix bugs in existing beetfs, add features incrementally.
Rejected Because:
- Python GIL fundamentally limits concurrency
- Python 2.7 EOL; migration to Python 3 substantial
- Architecture (full file in RAM) requires rewrite anyway
- No async I/O support in fuse-python
6.2 Alternative B: Use rclone mount
Description: Use rclone's FUSE mount with VFS caching.
Rejected Because:
- No metadata-based virtual path organization
- No metadata overlay functionality
- Limited plugin extensibility
- Would require forking and heavy modification
6.3 Alternative C: Build as Plex/Jellyfin Plugin
Description: Extend existing media server with virtual filesystem view.
Rejected Because:
- Tied to specific media server
- Not a true filesystem (no POSIX interface)
- Heavy runtime dependency
- Different use case (streaming vs filesystem)
6.4 Alternative D: Go Implementation
Description: Implement in Go using go-fuse.
Considered Trade-offs:
| Aspect | Rust | Go |
|---|---|---|
| Memory safety | Compile-time (ownership) | Runtime (GC) |
| Concurrency | async/await, no GC | goroutines, GC |
| FUSE library | fuser (mature) | go-fuse (mature) |
| Learning curve | Steeper | Gentler |
| Binary size | Smaller | Larger |
Decision: Rust chosen for zero-cost abstractions, no GC pauses during I/O, and a better fit for systems programming.
7. Implementation Plan
7.1 Phase 1: MVP (4 weeks)
Goal: Basic functional filesystem with single origin.
| Week | Deliverables |
|---|---|
| 1 | Project setup, FUSE skeleton, local origin plugin |
| 2 | Metadata extraction (symphonia), SQLite schema |
| 3 | Virtual path resolver, tree cache, basic readdir/stat/read |
| 4 | CAS implementation, chunk caching, integration tests |
Exit Criteria:
- Mount and browse local music library
- Play audio files through mounted filesystem
- Cache persists across restarts
7.2 Phase 2: Delta Sync & Multi-Origin (3 weeks)
Goal: Efficient synchronization and origin federation.
| Week | Deliverables |
|---|---|
| 5 | CDC chunking (FastCDC), delta detection |
| 6 | Origin federation, priority routing, health checks |
| 7 | S3 origin plugin, SFTP origin plugin |
Exit Criteria:
- Delta sync achieves >90% bandwidth reduction
- Automatic failover between origins
- Remote origins functional
7.3 Phase 3: Search & Smart Features (2 weeks)
Goal: Full-text search and intelligent caching.
| Week | Deliverables |
|---|---|
| 8 | tantivy integration, search indexing, /.search/ virtual dir |
| 9 | Smart collections, prefetch engine, access pattern learning |
Exit Criteria:
- Search returns results in <1s for 1M tracks
- Prefetch reduces cache misses by >50%
7.4 Phase 4: Plugin System & Polish (2 weeks)
Goal: Extensibility and production readiness.
| Week | Deliverables |
|---|---|
| 10 | Plugin host, plugin API stabilization, example plugins |
| 11 | Control API, metrics, documentation, packaging |
Exit Criteria:
- Custom origin plugin loadable at runtime
- Prometheus metrics exported
- systemd service functional
7.5 Rollout Strategy
@startuml
!theme plain
[*] --> Alpha
Alpha --> Beta : Internal testing complete
Beta --> GA : Community testing complete
state Alpha {
[*] --> DevTesting
DevTesting --> DogFood : Core features work
}
state Beta {
[*] --> LimitedRelease
LimitedRelease --> PublicBeta : No critical bugs
}
state GA {
[*] --> Stable
}
note right of Alpha : 2-4 weeks\nDevelopers only
note right of Beta : 4-8 weeks\nEarly adopters
note right of GA : Stable releases
@enduml
Feature Flags:
[features]
search_enabled = true
smart_collections = false # Beta
wasm_plugins = false # Experimental
Rollback: Binary replacement + cache clear; no data migration needed.
8. Glossary & References
8.1 Glossary
| Term | Definition |
|---|---|
| CAS | Content-Addressable Store; data stored/retrieved by hash |
| CDC | Content-Defined Chunking; chunking with stable boundaries |
| FUSE | Filesystem in Userspace; kernel interface for user-space filesystems |
| Origin | Source storage backend (local, S3, NFS, etc.) |
| Virtual Path | Metadata-derived path shown to users |
| Real Path | Actual path on origin storage |
8.2 References
| Document | Link |
|---|---|
| Requirements Specification | requirements.md |
| beetfs (Original) | beetsplug/beetFs.py |
| beetfs Features | v1/features.md |
| fuser (Rust FUSE) | https://github.com/cberner/fuser |
| tantivy (Search) | https://github.com/quickwit-oss/tantivy |
| symphonia (Audio) | https://github.com/pdeljanov/Symphonia |
| FastCDC | https://github.com/nlfiedler/fastcdc-rs |
| wasmtime | https://wasmtime.dev/ |
8.3 Dependencies
| Crate | Version | Purpose |
|---|---|---|
| fuser | 0.14+ | FUSE interface |
| tokio | 1.x | Async runtime |
| rusqlite | 0.31+ | SQLite bindings |
| sled | 0.34+ | Embedded key-value store |
| tantivy | 0.21+ | Full-text search |
| symphonia | 0.5+ | Audio metadata extraction |
| fastcdc | 3.x | Content-defined chunking |
| xxhash-rust | 0.8+ | Fast hashing |
| serde | 1.x | Serialization |
| toml | 0.8+ | Configuration |
| tracing | 0.1+ | Logging/instrumentation |
| metrics | 0.22+ | Prometheus metrics |