Implements all 6 critical resilience fixes from phase-a-stop-dying.md:
- Issue 2.9: Migrate std::sync::RwLock → parking_lot::RwLock (7 files)
Prevents lock poisoning cascade on writer panic
- Issue 2.2: Add install_panic_hook() to log panics via tracing
Ensures panics are captured in logs/journald before process death
- Issue 3.7: Add ExecStopPost to systemd service
Cleans up stale FUSE mounts on service stop
- Issue 2.7: Add check_stale_mount() detection on startup
Auto-cleans leftover mounts from previous crashes
- Issue 2.10: Integrate sd_notify for systemd lifecycle
Sends READY=1 after mount, STOPPING on shutdown
- Issue 2.1: Add signal handling with spawn_mount
Catches SIGTERM/SIGINT for clean shutdown instead of instant death
All 7 Phase A tests pass:
- test_poisoned_tree_lock_returns_eio_not_panic
- test_parking_lot_rwlock_survives_panic
- test_panic_hook_logs_to_tracing
- test_systemd_service_has_execstoppost
- test_stale_mount_check_function_exists
- test_sd_notify_ready_sent
- test_sigterm_triggers_shutdown
Phase 1 of resilience testing design doc implementation:
- New musicfs-test-utils crate with FaultyOrigin, FaultyCasStore, fixtures
- Failpoints instrumented in musicfs-cas/store.rs
- 16 resilience tests (13 RED for missing features, 3 GREEN for existing)
- 3 Docker/Toxiproxy network tests (RED until docker-compose up)
- docker-compose.yml for Toxiproxy + MinIO + SFTP test infrastructure
Tests properly fail-first (TDD): check_all() sequential, no health timeout,
missing corruption detection, no passthrough mode, etc.
- Add tracing-appender and tracing-journald for production logging
- Add LoggingConfig with trace_sample_rate, json_output, journald options
- Expand init_logging() with file rotation, journald, and stderr layers
- Add sanitize_path() helper for PII protection in logs
- Instrument FUSE operations with #[instrument] and trace decision points
- Instrument gRPC handlers (10 methods) with span correlation
- Add spawn instrumentation for health monitor, indexer, watcher tasks
- Add broadcast lag handling (RecvError::Lagged) in event subscribers
- Fix webhook.rs expect() calls with proper error handling
- Add logging to patterns.rs, collections.rs, artwork.rs database ops
- Add Drop impl logging for PluginManager and WatchHandle
- Update systemd service with rate limiting and journal output
- Add logrotate config and example config.toml with logging section
- Add CdcChunker using FastCDC v3 (16KB/64KB/256KB chunks)
- Add DeltaDetector with scan_origin() returning ScannedFile (no FileId assignment)
- Add OriginWatcher with inotify and 200ms debounce using tokio::spawn
- Fix LocalOrigin::read() to loop until all bytes read
- Add read_full() method to Origin trait
- Add mtime field to ChunkManifest
- Update ContentFetcher to use CDC chunking
- Update bandwidth reduction test to assert >90% (NFR-6.4)
Tests: 71 pass (+11 new)