Commit Graph

5 Commits

Author SHA1 Message Date
Alexander 0ff2a17ab7 Implement Phase C: Production Hardening
Implements phase-c-hardening.md to fix 6 RED resilience tests:

- D1/D2: Health check timeout (1.5s) + parallel execution via join_all
- C6: Recursive CAS calculate_size() to scan shard subdirectories
- C7: FUSE read timeout (30s) returns EIO instead of hanging
- 6.4: Auto-re-fetch corrupt/missing chunks from origin
- 6.6: Passthrough mode - continue even when CAS write fails
- C9: PID file with flock prevents concurrent mounts
- 5.3: fd exhaustion handling test

All 27 resilience tests now pass. Full test suite green.

Files changed:
- musicfs-origins/src/health.rs: timeout + join_all
- musicfs-origins/Cargo.toml: add futures dependency
- musicfs-cas/src/store.rs: recursive calculate_size
- musicfs-cas/src/reader.rs: auto-re-fetch on IntegrityError/NotFound
- musicfs-cas/src/fetcher.rs: passthrough fallback
- musicfs-fuse/src/filesystem.rs: 30s read timeout
- musicfs-cli/src/main.rs: PID file with flock
- musicfs-test-utils/tests/resilience.rs: updated tests
2026-05-13 15:55:22 +02:00
Alexander 6285eeb6c0 Implement Phase A: Stop Dying resilience fixes
Implements all 6 critical resilience fixes from phase-a-stop-dying.md:

- Issue 2.9: Migrate std::sync::RwLock → parking_lot::RwLock (7 files)
  Prevents lock poisoning cascade on writer panic

- Issue 2.2: Add install_panic_hook() to log panics via tracing
  Ensures panics are captured in logs/journald before process death

- Issue 3.7: Add ExecStopPost to systemd service
  Cleans up stale FUSE mounts on service stop

- Issue 2.7: Add check_stale_mount() detection on startup
  Auto-cleans leftover mounts from previous crashes

- Issue 2.10: Integrate sd_notify for systemd lifecycle
  Sends READY=1 after mount, STOPPING on shutdown

- Issue 2.1: Add signal handling with spawn_mount
  Catches SIGTERM/SIGINT for clean shutdown instead of instant death

All 7 Phase A tests pass:
- test_poisoned_tree_lock_returns_eio_not_panic
- test_parking_lot_rwlock_survives_panic
- test_panic_hook_logs_to_tracing
- test_systemd_service_has_execstoppost
- test_stale_mount_check_function_exists
- test_sd_notify_ready_sent
- test_sigterm_triggers_shutdown
2026-05-13 14:48:32 +02:00
Alexander 09f019730f Implement Week 7 Remote Origins with Oracle fixes
- Add credentials.rs with CredentialStore, redacted Debug (session_token shows [REDACTED])
- Add nfs.rs with ESTALE retry using Fn closure, 5s health timeout
- Add smb.rs with ENOTCONN retry handling, 5s health timeout
- Add s3.rs/sftp.rs feature-gated stubs with security documentation
- Add error variants: S3, Sftp, Timeout, Credential, NfsStaleHandle
- Fix delta.rs unused imports

Oracle fixes applied:
- SMB retry_on_disconnect for ENOTCONN (errno 107)
- session_token Debug shows [REDACTED] when Some, None otherwise
- NFS/SMB health checks wrapped with tokio::time::timeout(5s)

102 tests pass, 0 warnings.
2026-05-12 22:26:19 +02:00
Alexander d5ef68c9c9 Implement Week 6 Origin Federation with Oracle fixes
New files:
- musicfs-core/src/config.rs: Config, OriginConfig, HealthConfig
- musicfs-origins/src/registry.rs: OriginRegistry with watch cleanup
- musicfs-origins/src/router.rs: Priority router with (priority, latency) ordering
- musicfs-origins/src/health.rs: HealthMonitor with per-origin-type thresholds
- musicfs-origins/src/failover.rs: FailoverExecutor with NFR-7.3 backoff

Oracle fixes applied:
- Per-OriginType threshold: Local=1, Remote=3 (check_one uses threshold_for)
- AllOriginsUnhealthy event: Added to events.rs, emitted in select_with_fallback
- Unified OriginType: Removed duplicate from traits.rs, use musicfs_core::OriginType
- Watch handle cleanup: Tracked and dropped on unregister()
- Retry delays: 100ms, 500ms, 2000ms (NFR-7.3 compliant)

Tests: 91 pass (+20 new)
2026-05-12 20:15:56 +02:00
Alexander 76856b893a Implement Week 1 foundation: workspace, core types, FUSE skeleton, LocalOrigin
- musicfs-core: OriginId, FileId, VirtualPath, ContentHash, AudioMeta,
  FileMeta, EventBus with FileAccessed event (5 tests)
- musicfs-fuse: FUSE skeleton with EROFS handlers for write ops
- musicfs-origins: Origin trait with watch(), LocalOrigin impl (6 tests)
- flake.nix: Nix dev shell with rust toolchain, clang, lld, fuse3

All 11 tests pass. Build produces no warnings.
2026-05-12 18:01:47 +02:00