Implement Phase C: Production Hardening

Implements the plan in phase-c-hardening.md to fix 6 failing (RED) resilience tests:

- D1/D2: Health check timeout (1.5s) + parallel execution via join_all
- C6: Make CAS calculate_size() recursive so it scans shard subdirectories
- C7: FUSE read timeout (30s) returns EIO instead of hanging
- 6.4: Automatically re-fetch corrupt or missing chunks from origin
- 6.6: Passthrough mode, which keeps serving data even when the CAS write fails
- C9: PID file with flock prevents concurrent mounts
- 5.3: Add fd exhaustion handling test
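For C6, a non-recursive scan of the CAS root misses every chunk stored in a shard subdirectory. Below is a minimal std-only sketch of a recursive walk. The signature and the two-character shard layout used in the test are assumptions, not the actual musicfs-cas code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Total size in bytes of all regular files under `dir`, descending into
/// subdirectories such as CAS shard directories (hypothetical layout).
/// Symlinks are neither followed nor counted, so a link cycle cannot
/// cause infinite recursion.
pub fn calculate_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // `DirEntry::file_type` does not follow symlinks.
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            total += calculate_size(&entry.path())?;
        } else if file_type.is_file() {
            total += entry.metadata()?.len();
        }
    }
    Ok(total)
}
```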

All 27 resilience tests now pass, and the full test suite is green.

Files changed:
- musicfs-origins/src/health.rs: timeout + join_all
- musicfs-origins/Cargo.toml: add futures dependency
- musicfs-cas/src/store.rs: recursive calculate_size
- musicfs-cas/src/reader.rs: auto-re-fetch on IntegrityError/NotFound
- musicfs-cas/src/fetcher.rs: passthrough fallback
- musicfs-fuse/src/filesystem.rs: 30s read timeout
- musicfs-cli/src/main.rs: PID file with flock
- musicfs-test-utils/tests/resilience.rs: updated tests
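The reader.rs and fetcher.rs changes (6.4 and 6.6) combine into one read path. A chunk that is missing or corrupt locally is re-fetched from origin, and a failed CAS write no longer fails the read. The sketch below makes several assumptions. The `ChunkStore`/`Origin` traits, the `CasError` variants, and the in-memory doubles are hypothetical. `DefaultHasher` stands in for the real content hash, which is not part of the commit message.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical errors loosely mirroring the CAS NotFound/IntegrityError cases.
#[derive(Debug, PartialEq)]
pub enum CasError {
    NotFound,
    Integrity,
    Io(String),
}

/// Content address. DefaultHasher is NOT collision resistant; it only
/// keeps the sketch std-only (a real CAS would use e.g. SHA-256).
pub fn content_key(data: &[u8]) -> String {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    format!("{:016x}", h.finish())
}

pub trait ChunkStore {
    fn get(&self, key: &str) -> Result<Vec<u8>, CasError>;
    fn put(&mut self, key: &str, data: &[u8]) -> Result<(), CasError>;
}

pub trait Origin {
    fn fetch(&self, key: &str) -> Result<Vec<u8>, String>;
}

/// Read a chunk, re-fetching from origin when the local copy is missing or
/// corrupt, and serving origin data even if caching it fails (passthrough).
pub fn read_chunk(store: &mut dyn ChunkStore, origin: &dyn Origin, key: &str) -> Result<Vec<u8>, String> {
    match store.get(key) {
        // Fast path: local copy exists and its content matches its address.
        Ok(data) if content_key(&data) == key => return Ok(data),
        // Corrupt or missing: fall through to a re-fetch.
        Ok(_) | Err(CasError::NotFound) | Err(CasError::Integrity) => {}
        Err(CasError::Io(e)) => return Err(format!("local read failed: {e}")),
    }
    let data = origin.fetch(key)?;
    if content_key(&data) != key {
        return Err(format!("origin returned corrupt data for {key}"));
    }
    // Passthrough: a failed CAS write is reported but does not fail the read.
    if let Err(e) = store.put(key, &data) {
        eprintln!("warning: CAS write for {key} failed ({e:?}); serving uncached data");
    }
    Ok(data)
}

/// Hypothetical in-memory store; `fail_writes` simulates a full or broken disk.
pub struct MemStore {
    pub chunks: HashMap<String, Vec<u8>>,
    pub fail_writes: bool,
}

impl ChunkStore for MemStore {
    fn get(&self, key: &str) -> Result<Vec<u8>, CasError> {
        self.chunks.get(key).cloned().ok_or(CasError::NotFound)
    }
    fn put(&mut self, key: &str, data: &[u8]) -> Result<(), CasError> {
        if self.fail_writes {
            return Err(CasError::Io("simulated disk full".to_string()));
        }
        self.chunks.insert(key.to_string(), data.to_vec());
        Ok(())
    }
}

/// Hypothetical in-memory origin.
pub struct MemOrigin(pub HashMap<String, Vec<u8>>);

impl Origin for MemOrigin {
    fn fetch(&self, key: &str) -> Result<Vec<u8>, String> {
        self.0.get(key).cloned().ok_or_else(|| format!("origin has no chunk {key}"))
    }
}
```

Verifying the re-fetched bytes before caching them keeps a bad origin response from replacing one corrupt local chunk with another.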
Author: Alexander
Date: 2026-05-13 15:55:22 +02:00
parent 3038c94b8c
commit 0ff2a17ab7
11 changed files with 325 additions and 39 deletions
+11 -3
```diff
@@ -386,19 +386,27 @@ impl Filesystem for MusicFs {
         let handle = self.runtime_handle.clone();
         let result = std::thread::scope(|_| {
             handle.block_on(async {
-                reader.read(file_id, offset as u64, size).await
+                tokio::time::timeout(
+                    Duration::from_secs(30),
+                    reader.read(file_id, offset as u64, size),
+                )
+                .await
             })
         });
         match result {
-            Ok(data) => {
+            Ok(Ok(data)) => {
                 trace!(ino, offset, size_bytes = size, bytes_read = data.len(), "read successful");
                 reply.data(&data);
             }
-            Err(e) => {
+            Ok(Err(e)) => {
                 warn!(ino, offset, size_bytes = size, error = %e, "read failed");
                 reply.error(libc::EIO);
             }
+            Err(_timeout) => {
+                warn!(ino, offset, size_bytes = size, "read timed out after 30s");
+                reply.error(libc::EIO);
+            }
         }
     }
```
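Wrapping the read in `tokio::time::timeout` changes its result to `Result<Result<Vec<u8>, E>, Elapsed>`. That nesting is why the `match` gains `Ok(Ok(..))`, `Ok(Err(..))`, and `Err(_timeout)` arms. The sketch below reproduces that nested shape with std only, and the names in it are hypothetical. A `with_timeout` helper runs a blocking read on a worker thread and waits with `recv_timeout`. `to_reply` then collapses both failure modes into `EIO`. There is one real difference from tokio: tokio's timeout drops, and so cancels, the future, while this worker thread keeps running after the deadline.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// EIO is 5 on Linux and macOS; the real handler uses `libc::EIO`.
pub const EIO: i32 = 5;

/// Marker for an expired deadline (tokio's counterpart is `tokio::time::error::Elapsed`).
#[derive(Debug, PartialEq)]
pub struct Elapsed;

/// Run a blocking `op` on a worker thread and wait at most `limit` for it.
/// Outer `Result`: did an answer arrive in time? Inner value: `op`'s own result.
/// On timeout the worker is abandoned, not cancelled.
pub fn with_timeout<T, F>(limit: Duration, op: F) -> Result<T, Elapsed>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver is gone if we already timed out; ignore the send error.
        let _ = tx.send(op());
    });
    // For brevity, a panicked worker (disconnected channel) is also reported as Elapsed.
    rx.recv_timeout(limit).map_err(|_| Elapsed)
}

/// Collapse the three outcomes the FUSE `read` handler matches on into data or an errno.
pub fn to_reply<E: std::fmt::Display>(result: Result<Result<Vec<u8>, E>, Elapsed>) -> Result<Vec<u8>, i32> {
    match result {
        Ok(Ok(data)) => Ok(data),
        Ok(Err(e)) => {
            eprintln!("read failed: {e}");
            Err(EIO)
        }
        Err(Elapsed) => {
            eprintln!("read timed out");
            Err(EIO)
        }
    }
}
```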