Implement Phase B: Crash Recovery

Add startup integrity checks, corruption recovery, CAS size limits,
graceful shutdown orchestration, and a task supervisor. This turns 5
previously-RED resilience tests GREEN and adds 5 new tests.

- CAS: pre-check size limit in put(), add StoreFull error variant
- CAS: sled corruption recovery in open() (retry then recreate)
- SQLite: open_with_integrity_check() via PRAGMA integrity_check(1)
- tantivy: open_with_recovery() deletes and rebuilds corrupt index
- CLI: CancellationToken-based ordered shutdown sequence
- Core: TaskSupervisor with spawn_supervised/spawn_critical + backoff
- Tests: replace 4 todo!() stubs, add 5 new shutdown/supervisor tests
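
For illustration, the CAS size pre-check from the first bullet could look roughly like the sketch below. Only the names put() and StoreFull come from this commit message; the Cas struct, its fields, the in-memory map standing in for sled, and the u64 hash used as the content address are all assumptions made to keep the example self-contained, not the actual musicfs-cas code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fmt;
use std::hash::{Hash, Hasher};

/// Hypothetical error type; only the `StoreFull` variant name is taken
/// from the commit message, its fields are assumed.
#[derive(Debug, PartialEq)]
enum CasError {
    StoreFull { needed: u64, available: u64 },
}

impl fmt::Display for CasError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CasError::StoreFull { needed, available } => write!(
                f,
                "store full: need {} bytes, {} available",
                needed, available
            ),
        }
    }
}

/// Hypothetical in-memory stand-in for the sled-backed store.
struct Cas {
    max_bytes: u64,
    used_bytes: u64,
    blobs: HashMap<u64, Vec<u8>>,
}

impl Cas {
    fn new(max_bytes: u64) -> Self {
        Cas { max_bytes, used_bytes: 0, blobs: HashMap::new() }
    }

    /// Stores `data` under a hash of its content and returns that key.
    /// The size limit is checked before anything is written, so a
    /// rejected put leaves the store unchanged.
    fn put(&mut self, data: &[u8]) -> Result<u64, CasError> {
        // Toy content address: a 64-bit std hash (collisions ignored here).
        let mut hasher = DefaultHasher::new();
        data.hash(&mut hasher);
        let key = hasher.finish();

        // Identical content is already stored: nothing to add or count.
        if self.blobs.contains_key(&key) {
            return Ok(key);
        }

        // Pre-check: reject before writing if the blob would not fit.
        let needed = data.len() as u64;
        let available = self.max_bytes - self.used_bytes;
        if needed > available {
            return Err(CasError::StoreFull { needed, available });
        }

        self.blobs.insert(key, data.to_vec());
        self.used_bytes += needed;
        Ok(key)
    }
}
```

Checking the limit before the write, rather than after, means a caller that receives StoreFull can evict or give up without having to roll anything back.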
Alexander
2026-05-13 15:33:23 +02:00
parent 4e394c60ec
commit 5da96ffab2
12 changed files with 485 additions and 14 deletions
@@ -9,6 +9,7 @@ musicfs-core = { path = "../musicfs-core" }
musicfs-origins = { path = "../musicfs-origins" }
musicfs-cas = { path = "../musicfs-cas" }
musicfs-cache = { path = "../musicfs-cache" }
musicfs-search = { path = "../musicfs-search" }
async-trait.workspace = true
tokio = { workspace = true, features = ["full", "sync", "time"] }
@@ -37,5 +38,6 @@ full = ["failpoints", "process-tests", "resource-limits", "docker-tests"]
[dev-dependencies]
tokio-test = "0.4"
tokio-util.workspace = true
sd-notify.workspace = true
libc.workspace = true