15 Commits

Author SHA1 Message Date
Alexander 90e9683076 Add persistent state implementation plan (SQLite)
Decision: SQLite (Option A) — existing schema, CRUD, row mapping,
and chunk_manifest column are already built but not wired into mount.

8-day plan to transform mount from O(N×origin_latency) to O(N×SQLite_read):
1. Database bulk load + manifest CRUD methods
2. Rewrite run_mount() with DB-load vs first-mount-scan paths
3. Persist chunk manifests via ManifestCached event
4. Wire tantivy + PatternStore + CollectionStore into mount
5. Background delta sync (origin vs DB reconciliation)
6. Shutdown WAL checkpoint
7-8. Integration testing + buffer
2026-05-13 16:02:25 +02:00
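Step 5's background delta sync can be pictured as a three-way diff between what the origin reports and what the database remembers. Below is a minimal sketch. It assumes each file is keyed by its path and compared by a hypothetical `(size, mtime)` fingerprint. `Fingerprint`, `Delta`, and `reconcile` are illustrative names, not the project's API.

```rust
use std::collections::HashMap;

/// Hypothetical per-file fingerprint used to detect changes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fingerprint {
    pub size: u64,
    pub mtime: i64,
}

/// Differences the background sync would need to apply to the database.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Delta {
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub modified: Vec<String>,
}

/// Compares the origin listing against the rows loaded from SQLite.
pub fn reconcile(
    origin: &HashMap<String, Fingerprint>,
    db: &HashMap<String, Fingerprint>,
) -> Delta {
    let mut delta = Delta::default();
    for (path, fp) in origin {
        match db.get(path) {
            None => delta.added.push(path.clone()),
            Some(old) if old != fp => delta.modified.push(path.clone()),
            Some(_) => {} // unchanged: no origin re-read needed
        }
    }
    for path in db.keys() {
        if !origin.contains_key(path) {
            delta.removed.push(path.clone());
        }
    }
    // Sort for deterministic output (HashMap iteration order is unspecified).
    delta.added.sort();
    delta.removed.sort();
    delta.modified.sort();
    delta
}
```

In this sketch, only the `added` and `modified` paths would need metadata re-extraction from the origin. That is what would keep a warm mount close to the O(N×SQLite_read) target.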
Alexander 3038c94b8c Add Phase C implementation plan (Production Hardening)
Merges practical items from resilience Phases C+D+E+F into one pass.
Turns all 6 remaining RED tests GREEN:
- D1/D2: Health check timeout + parallel join_all
- C6: Fix recursive CAS calculate_size()
- C7: FUSE read 30s timeout wrapper
- 6.4: Auto-re-fetch corrupt/missing chunks from origin
- 6.6: Passthrough fallback when CAS write fails
- C9: PID file with flock
- 5.3: Graceful handling of fd exhaustion
~4 days estimated.
2026-05-13 15:42:18 +02:00
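For item C7, one way to bound a blocking read is to run it on a worker thread and wait on a channel with a deadline. This is a std-only sketch. The wrapper in the plan may well be async, and `with_timeout` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `op` on a worker thread and waits at most `limit` for its result.
/// Returns `None` on timeout (or if `op` panics). The worker is not
/// cancelled: it keeps running in the background and its late result is dropped.
pub fn with_timeout<T, F>(limit: Duration, op: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails harmlessly if the caller has already given up.
        let _ = tx.send(op());
    });
    rx.recv_timeout(limit).ok()
}
```

A FUSE handler could then map `None` to an error reply instead of hanging the calling player. The 30 s limit comes from the plan, but the choice of errno would be an assumption.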
Alexander 4e394c60ec Add Phase B implementation plan (Crash Recovery)
BlueDoc covering 6 issues with TDD flow:
- 2.8: CAS size pre-check (StoreFull error variant)
- 2.4: SQLite PRAGMA integrity_check on open
- 2.4: tantivy open_with_recovery (detect + rebuild)
- 3.5: sled corruption repair + fallback recreate
- 2.3: Graceful shutdown with CancellationToken
- 2.6: TaskSupervisor (monitor, detect panic, restart)
Turns 5 RED tests GREEN and adds 4 new tests. ~5 days estimated.
2026-05-13 14:56:43 +02:00
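Item 2.8 checks capacity before writing, rather than failing partway through a write. Below is a minimal sketch of such a pre-check, assuming a simple byte budget. `CasBudget` and the fields of `StoreFull` are hypothetical. Only the variant name comes from the plan.

```rust
/// Errors from the (hypothetical) CAS write path.
#[derive(Debug, PartialEq, Eq)]
pub enum CasError {
    /// Not enough room left in the cache for this chunk.
    StoreFull { needed: u64, available: u64 },
}

/// Tracks how many bytes the chunk store may still accept.
pub struct CasBudget {
    max_bytes: u64,
    used_bytes: u64,
}

impl CasBudget {
    pub fn new(max_bytes: u64) -> Self {
        Self { max_bytes, used_bytes: 0 }
    }

    /// Reserves `len` bytes before the chunk is written, or reports
    /// `StoreFull` without touching the disk.
    pub fn reserve(&mut self, len: u64) -> Result<(), CasError> {
        let available = self.max_bytes.saturating_sub(self.used_bytes);
        if len > available {
            return Err(CasError::StoreFull { needed: len, available });
        }
        self.used_bytes += len;
        Ok(())
    }
}
```

A caller could react to `StoreFull` by triggering LRU eviction, or by falling back to passthrough reads as in Phase C item 6.6.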
Alexander 24086cc744 Add Phase A implementation plan (Stop Dying)
BlueDoc covering 6 critical resilience fixes with TDD flow:
- 2.9: RwLock → parking_lot (poison-free locks)
- 2.2: Panic hook with tracing integration
- 3.7+2.7: systemd ExecStopPost + stale mount cleanup
- 2.10: sd_notify READY/STOPPING
- 2.1: Signal handling via spawn_mount2 + tokio signals
Each with: stubs → RED tests → implementation → GREEN verify.
~5 days estimated; exact files and code patterns are specified.
2026-05-13 14:00:46 +02:00
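Item 2.10 relies on systemd's readiness protocol. The service sends newline-separated `KEY=VALUE` datagrams such as `READY=1` or `STOPPING=1` to the Unix socket named in `$NOTIFY_SOCKET`. Below is a std-only sketch. It assumes a filesystem-path socket and skips abstract sockets, whose names start with `@`. `notify_at` and `sd_notify` are hypothetical helpers written for illustration.

```rust
use std::env;
use std::io;
use std::os::unix::net::UnixDatagram;
use std::path::Path;

/// Sends one notification datagram (e.g. "READY=1") to `socket_path`.
pub fn notify_at(socket_path: &Path, state: &str) -> io::Result<()> {
    let sock = UnixDatagram::unbound()?;
    sock.send_to(state.as_bytes(), socket_path)?;
    Ok(())
}

/// Notifies systemd if `$NOTIFY_SOCKET` names a filesystem socket.
/// Returns Ok(false) when the variable is unset or names an abstract
/// socket (leading '@'), which this sketch does not handle.
pub fn sd_notify(state: &str) -> io::Result<bool> {
    match env::var_os("NOTIFY_SOCKET") {
        Some(path) if !path.to_string_lossy().starts_with('@') => {
            notify_at(Path::new(&path), state)?;
            Ok(true)
        }
        _ => Ok(false),
    }
}
```

With `Type=notify` in the unit file, systemd waits for `READY=1` before it treats the service as started. The mount would therefore send it only after the FUSE session is live, and send `STOPPING=1` when shutdown begins.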
Alexander 00f14930cd Consolidate resilience testing into BlueDoc format
Replace original resilience-testing.md with BlueDoc-structured version.
All code examples from original preserved in Appendix A (17 sections).
Added: Abstract, Background, Goals/Non-Goals, Cross-Cutting Concerns,
Alternatives Considered (Jepsen, proptest, loom, mockall), phased
implementation plan with rollout order. Removed v2 suffix.
2026-05-13 12:54:20 +02:00
Alexander c6aa47f440 Add resilience testing BlueDoc (v2)
Restructured resilience testing strategy into BlueDoc template format
with proper sections: Abstract, Background, Goals/Non-Goals, Proposed
Design, Cross-Cutting Concerns, Alternatives Considered, Implementation
Plan, and Glossary. Original resilience-testing.md preserved.
2026-05-13 12:46:25 +02:00
Alexander 0b97905826 Add resilience testing strategy
Maps all 34 resilience issues to concrete test approaches across 3 layers:
trait-based mocks + failpoints (fast), fork-kill crash recovery (medium),
and Toxiproxy + Docker integration (thorough). Includes tooling choices
(fail crate, rlimit, nix, wiremock), test organization, failpoint
instrumentation map, and coverage matrix.
2026-05-13 12:22:21 +02:00
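The fast test layer swaps real origins for trait-based test doubles. Below is a minimal sketch of that pattern. The `Origin` trait shape, `ScriptedOrigin`, and `read_with_failover` are hypothetical stand-ins. The failure injection is hand-rolled here rather than done with the `fail` crate's failpoints.

```rust
use std::cell::Cell;
use std::io;

/// Hypothetical slice of an origin interface.
pub trait Origin {
    fn read(&self, path: &str) -> io::Result<Vec<u8>>;
}

/// Test double: fails the first `failures` calls, then returns `data`.
pub struct ScriptedOrigin {
    pub failures: Cell<u32>,
    pub calls: Cell<u32>,
    pub data: Vec<u8>,
}

impl Origin for ScriptedOrigin {
    fn read(&self, _path: &str) -> io::Result<Vec<u8>> {
        self.calls.set(self.calls.get() + 1);
        if self.failures.get() > 0 {
            self.failures.set(self.failures.get() - 1);
            return Err(io::Error::new(io::ErrorKind::TimedOut, "injected failure"));
        }
        Ok(self.data.clone())
    }
}

/// Code under test: try each origin in priority order, return the first success.
pub fn read_with_failover(origins: &[&dyn Origin], path: &str) -> io::Result<Vec<u8>> {
    let mut last_err = io::Error::new(io::ErrorKind::NotFound, "no origins configured");
    for origin in origins {
        match origin.read(path) {
            Ok(bytes) => return Ok(bytes),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Because the double records its call count, a test can assert both the outcome and how many times each origin was hit.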
Alexander 87574ce008 Add resilience audit and persistent state plans
Comprehensive fault tolerance analysis covering 34 issues across 6 phases:
signal handling, crash recovery, cache corruption, network failures,
resource exhaustion, and the critical finding that no persistent state
is used on mount (every restart is a full origin rescan).

Persistent state plan covers storage engine options, mount flow redesign,
background delta sync, and the in-memory state inventory.
2026-05-13 12:09:41 +02:00
Alexander 5ac33987c0 Add comprehensive logging with tracing, file rotation, and systemd integration
- Add tracing-appender and tracing-journald for production logging
- Add LoggingConfig with trace_sample_rate, json_output, journald options
- Expand init_logging() with file rotation, journald, and stderr layers
- Add sanitize_path() helper for PII protection in logs
- Instrument FUSE operations with #[instrument] and trace decision points
- Instrument gRPC handlers (10 methods) with span correlation
- Add spawn instrumentation for health monitor, indexer, watcher tasks
- Add broadcast lag handling (RecvError::Lagged) in event subscribers
- Fix webhook.rs expect() calls with proper error handling
- Add logging to patterns.rs, collections.rs, artwork.rs database ops
- Add Drop impl logging for PluginManager and WatchHandle
- Update systemd service with rate limiting and journal output
- Add logrotate config and example config.toml with logging section
2026-05-13 11:21:51 +02:00
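The commit does not say how `sanitize_path()` redacts paths. One plausible policy masks every component but keeps the directory depth and the file extension. Logs would then still show that a `.flac` file three levels deep was read, without revealing artist, album, or user names. The following is a hypothetical sketch of that policy, not the project's implementation.

```rust
use std::path::{Component, Path};

/// Hypothetical redaction policy: keep depth and extension, mask names.
pub fn sanitize_path(path: &Path) -> String {
    let n = path.components().count();
    let mut out = String::new();
    for (i, comp) in path.components().enumerate() {
        match comp {
            Component::RootDir => out.push('/'),
            Component::Normal(name) if i + 1 == n => {
                // Last component: mask the stem, keep the extension if any.
                out.push('*');
                if let Some(ext) = Path::new(name).extension() {
                    out.push('.');
                    out.push_str(&ext.to_string_lossy());
                }
            }
            // Directories and any other components (".", "..", prefixes)
            // are masked the same way.
            _ => out.push_str("*/"),
        }
    }
    out
}
```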
Alexander bc9fa36646 Add Week 10 Plugin System and Week 11 Control API
Week 10 - Plugin System (FR-19):
- Plugin traits: Plugin, OriginPlugin, MetadataPlugin, FormatPlugin
- NativePluginHost with libloading for dynamic loading
- WasmPluginHost (feature-gated) with wasmtime runtime
- PluginManager coordinating both hosts with version checks
- OriginInstance::watch() with WatchHandle, WatchEvent for live updates
- FormatPlugin::synthesize_header() for metadata overlay

Week 11 - Control API & Production (FR-17, FR-18, NFR-6, NFR-10):
- gRPC server with full MusicFS service (status, cache, origins, events)
- Proto extended: MountState enum, TierStats, full StatusResponse/CacheStats
- WebhookHandler with HMAC-SHA256 signing and exponential retry
- Metrics with latency histograms (p50/p95/p99) and origin health gauges
- CLI with mount, status, cache, search, origin, events, shutdown commands
- E2E player compatibility tests (mpv, VLC, file manager)
- systemd service, PKGBUILD, RPM spec for packaging

Plans added for Weeks 10-14 covering P1 features.
All 154 tests passing.
2026-05-13 10:34:01 +02:00
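The p50/p95/p99 figures are order statistics over recorded latencies. Below is a minimal sketch using the nearest-rank definition over raw samples. A production histogram would typically bucket samples rather than keep them all, and `percentile` is a hypothetical helper.

```rust
/// Nearest-rank percentile (`p` in 0..=100) over latency samples.
/// Returns `None` for an empty sample set.
pub fn percentile(samples: &[u64], p: u32) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let n = sorted.len() as u64;
    let p = u64::from(p.min(100));
    // rank = ceil(p/100 * n), computed in integers, clamped to at least 1.
    let rank = ((p * n + 99) / 100).max(1);
    Some(sorted[(rank - 1) as usize])
}
```

For the samples 1 through 100, this returns 50, 95, and 99 for p50, p95, and p99.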
Alexander 3cb6dfcaf8 Add Week 8 Search API docs and Week 8-9 plans with Oracle fixes
- docs/api/search.md: FUSE and gRPC search API documentation
- Week 8 plan: Oracle fixes for IndexWriter pattern, moka cache, gRPC API
- Week 9 plan: Oracle fixes for artwork schema, spawn_blocking, access_log
- Week 7 performance review

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/claude-agent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-12 23:23:49 +02:00
Alexander 0e5a514015 Add Week 5-7 plans with Oracle review fixes
Week 5 (CDC & Delta Detection):
- Add read_full() method to avoid u32 overflow on >4GB files
- Add chunk_streaming() to avoid 200MB+ memory per file
- Implement scan_origin() recursive walk (was stub)
- Use spawn_blocking for watcher instead of separate runtime
- Add 200ms event debouncing
- Add >90% bandwidth reduction test
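The 200ms debouncing item coalesces bursts of watcher events for the same path, since a tagger may rewrite a file several times in quick succession. Below is a minimal sketch. It assumes per-path trailing-edge debouncing with caller-supplied timestamps, so it can be tested without sleeping. `Debouncer` is a hypothetical type.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Trailing-edge debouncer: a path is emitted once no new event for it
/// has arrived for `window`.
pub struct Debouncer {
    window: Duration,
    last_seen: HashMap<String, Instant>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, last_seen: HashMap::new() }
    }

    /// Records a raw filesystem event; a repeat event restarts the path's timer.
    pub fn record(&mut self, path: &str, now: Instant) {
        self.last_seen.insert(path.to_string(), now);
    }

    /// Removes and returns (sorted) every path that has been quiet for `window`.
    pub fn drain_ready(&mut self, now: Instant) -> Vec<String> {
        let window = self.window;
        let mut ready: Vec<String> = self
            .last_seen
            .iter()
            .filter(|(_, seen)| now.saturating_duration_since(**seen) >= window)
            .map(|(path, _)| path.clone())
            .collect();
        ready.sort();
        for path in &ready {
            self.last_seen.remove(path);
        }
        ready
    }
}
```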

Week 6 (Origin Federation):
- Define all-origins-unhealthy behavior (least-bad selection)
- Track watch handles for cleanup on unregister
- Clarify tuple-based priority routing
- Add per-origin-type health thresholds
- Align retry delays with NFR-7.3 spec (100ms, 500ms, 2000ms)
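The retry item fixes the delays at 100ms, 500ms, and 2000ms rather than using a computed exponential curve. Below is a minimal sketch of a retry loop driven by that fixed schedule. `retry_with_schedule` is hypothetical. The sleep function is injected so a test can check the schedule without waiting; in production it would be `std::thread::sleep` or an async equivalent.

```rust
use std::time::Duration;

/// Delays between attempts, per NFR-7.3 as cited in the plan.
pub const RETRY_DELAYS: [Duration; 3] = [
    Duration::from_millis(100),
    Duration::from_millis(500),
    Duration::from_millis(2000),
];

/// Calls `op` once, then once more after each delay in `delays`, stopping at
/// the first success. Returns the last error if every attempt fails.
pub fn retry_with_schedule<T, E>(
    delays: &[Duration],
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut result = op();
    for &delay in delays {
        if result.is_ok() {
            break;
        }
        sleep(delay);
        result = op();
    }
    result
}
```

With three delays, a permanently failing origin is tried four times in total before the error is surfaced.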

Week 7 (Remote Origins):
- Replace SFTP single mutex with connection pool
- Add 30s timeout to all remote operations
- Custom Debug impl to redact credentials
- SSH host verification against known_hosts
- Clamp S3 range requests to file size
- Use head_bucket for S3 health checks
2026-05-12 19:48:40 +02:00
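An HTTP `Range` header is inclusive on both ends (`bytes=start-end`). A range that starts at or past the end of the object is unsatisfiable (HTTP 416). Clamping therefore has to handle both the end offset and empty ranges. Below is a minimal sketch of that arithmetic. `clamp_range` and `range_header` are hypothetical helpers and do not touch any S3 SDK.

```rust
/// Turns a FUSE-style read (`offset`, `len`) into an inclusive byte range
/// clamped to `file_size`. Returns `None` when nothing can be read, so the
/// caller can reply with an empty buffer instead of sending a request.
pub fn clamp_range(offset: u64, len: u64, file_size: u64) -> Option<(u64, u64)> {
    if len == 0 || offset >= file_size {
        return None;
    }
    let end_exclusive = offset.saturating_add(len).min(file_size);
    Some((offset, end_exclusive - 1))
}

/// Formats a clamped range as an HTTP `Range` header value.
pub fn range_header(start: u64, end_inclusive: u64) -> String {
    format!("bytes={}-{}", start, end_inclusive)
}
```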
Alexander e575276b6f Add Week 4b plan: Origin-CAS connector for cache-miss handling
- Create week-04b-origin-connector.md with ContentFetcher design
- Update development-plan.md: Phase 1 now includes Week 4b
- Update architecture.md: Phase 1 table includes Week 4b
- Plan includes EventBus integration per FR-18.1 (Oracle-verified)
2026-05-12 18:55:58 +02:00
Alexander ffbb238633 Implement Week 4 CAS store with chunk deduplication and LRU eviction
- Add musicfs-cas crate: CasStore, ChunkHash, FileReader, ChunkManifest
- Add LruEviction policy to musicfs-cache for cache size management
- Integrate FileReader into FUSE filesystem for actual file reads
- Use xxHash64 for content hashing, sled for index, msgpack serialization
- Default cache path: ~/.cache/musicfs/chunks/, sharded into 256 subdirectories
- 20 new tests (14 CAS unit + 3 integration + 3 eviction), 54 total
2026-05-12 18:43:39 +02:00
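A 256-subdirectory layout falls out naturally if the first byte of the chunk hash (two hex digits) is used as the directory name. Below is a minimal sketch. It assumes chunks are named by the 16-hex-digit form of their xxHash64 value. The exact naming scheme is an assumption, and `chunk_path` is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Maps a 64-bit chunk hash to `<root>/<first two hex digits>/<full hex>`.
/// Two hex digits give 16 * 16 = 256 possible subdirectories.
pub fn chunk_path(root: &Path, hash: u64) -> PathBuf {
    let hex = format!("{:016x}", hash);
    root.join(&hex[..2]).join(&hex)
}
```

In this layout, identical chunk content always maps to the same path. A write can therefore be skipped when the chunk file already exists, which is the deduplication property.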
Alexander e08988f7f3 Add development plan and Oracle-validated weekly plans (Weeks 1-3)
development-plan.md (master plan):
- 11-week implementation broken into 4 phases
- 11 Rust crates with dependency graph
- Per-week deliverables, tests, exit criteria
- Deferred requirements (FR-21, FR-22) with rationale

plans/week-01-foundation.md:
- Workspace setup, core types, FUSE skeleton, local origin
- Origin trait with watch() method (arch 4.3.4)
- EventBus with FileAccessed event (FR-18.1)
- All EROFS handlers for read-only enforcement (FR-4.1-4.5)

plans/week-02-metadata.md:
- symphonia metadata extraction (FR-6.1-6.5)
- SQLite schema matching architecture 4.3.6 exactly
- Column names: track/disc (not track_number/disc_number)
- Hash columns as TEXT (hex-encoded, not BLOB)
- Added idx_files_real index (FR-7.3)

plans/week-03-virtual-tree.md:
- Path resolver with $var syntax (arch 4.3.1)
- Template vars: $artist, $album, $title, $track, $year, $disc, $genre, $format, $format_upper
- RefreshPolicy struct for FR-9.3 (TTL-based refresh)
- force_refresh() method for FR-9.4 (signal/API refresh)

All plans Oracle-validated against architecture.md and requirements.md
2026-05-12 17:52:33 +02:00
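The `$var` template syntax can be expanded with a single left-to-right scan. Below is a minimal sketch. It assumes variable names are runs of ASCII lowercase letters and underscores, so `$format_upper` is read as one name rather than `$format` followed by `_upper`. It also assumes unknown variables are left verbatim. `expand_template` is hypothetical, not the resolver's actual API.

```rust
use std::collections::HashMap;

/// Expands `$name` placeholders using `vars`. Names are the longest run of
/// `[a-z_]` after `$`; unknown names and a bare `$` are copied unchanged.
pub fn expand_template(template: &str, vars: &HashMap<&str, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        let name_len = after
            .find(|c: char| !(c.is_ascii_lowercase() || c == '_'))
            .unwrap_or(after.len());
        let name = &after[..name_len];
        match vars.get(name) {
            Some(value) if !name.is_empty() => out.push_str(value),
            _ => {
                out.push('$');
                out.push_str(name);
            }
        }
        rest = &after[name_len..];
    }
    out.push_str(rest);
    out
}
```

For example, `$artist/$album/$track - $title.$format` with typical tag values becomes `Radiohead/OK Computer/01 - Airbag.flac`.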