Compare commits: 823aaf3fe4...90e9683076 (10 commits)

| SHA1 |
|---|
| 90e9683076 |
| 0ff2a17ab7 |
| 3038c94b8c |
| 5da96ffab2 |
| 4e394c60ec |
| 6285eeb6c0 |
| 24086cc744 |
| e3eeba4650 |
| 00f14930cd |
| c6aa47f440 |

@@ -0,0 +1,796 @@
# Persistent State: Implementation Plan

**Authors:** AI-assisted
**Status:** Draft
**Last Updated:** 2026-05-13
**Reviewers:** TBD
**Approvers:** TBD
**Prerequisites:** [persistent-state.md](persistent-state.md) (research), [phase-a-stop-dying.md](phase-a-stop-dying.md) (signal handling + shutdown)
**Estimated Effort:** ~8 days

---

[TOC]

---

## 1. Abstract

Wire the existing SQLite persistence layer into the mount path so that subsequent mounts load from the database instead of rescanning origins. This changes mount time from O(N × origin_latency) to O(N × SQLite_read), which this plan estimates as roughly 1000x faster for remote origins.

**Storage decision: SQLite (Option A).** Rationale:

- `Database` struct with full CRUD already exists in `musicfs-cache/src/db.rs`
- Schema with `chunk_manifest BLOB` column already exists in `schema.sql`
- `ChunkManifest::from_db()` and `chunks_to_bytes()` already exist but are never called
- Row-to-`FileMeta` mapping already exists in `get_file_by_virtual_path()`
- WAL mode crash safety is already configured
- A 2-4 second bulk load for 1M rows is acceptable: the target for the full load is <5s. The <500ms target applies only to the mount syscall itself, which can return immediately once lazy tree loading exists (future work, see Section 3.2)

No new storage engine. No new dependencies. Wire existing code.

---

## 2. Background

### 2.1 Current State

`run_mount()` in `main.rs`:

1. Opens CAS store ✅
2. Creates origin connection ✅
3. `scan_music_files()`: walks the entire origin and parses every file with symphonia ❌ **BOTTLENECK**
4. Builds VirtualTree from scan results (in-memory only) ❌ **LOST ON RESTART**
5. Registers every file in ContentFetcher (in-memory only) ❌ **LOST ON RESTART**
6. Mounts FUSE ✅

### 2.2 What Exists But Is Not Wired

| Component | Exists | Wired Into Mount? |
|-----------|--------|--------------------|
| `Database::open()` + schema + WAL | ✅ | ❌ |
| `Database::upsert_file()` | ✅ | ❌ |
| `Database::get_file_by_virtual_path()` (returns `FileMeta`) | ✅ | ❌ |
| `schema.sql` with `chunk_manifest BLOB` column | ✅ | ❌ |
| `ChunkManifest::chunks_to_bytes()` (serialize) | ✅ | ❌ |
| `ChunkManifest::from_db()` (deserialize) | ✅ | ❌ |
| `TreeBuilder::add_file(&FileMeta)` | ✅ | ✅ (from scan, not from DB) |
| `ContentFetcher::register_file(FileMeta)` | ✅ | ✅ (from scan, not from DB) |
| `PatternStore::new(db_path)` (loads from SQLite on open) | ✅ | ❌ |
| `CollectionStore::new(db_path)` | ✅ | ❌ |
| `SearchIndex::open(path)` (opens tantivy from disk) | ✅ | ❌ |

### 2.3 What's Missing

| Component | Needs Building |
|-----------|----------------|
| `Database::list_all_files()` → `Vec<FileMeta>` | New method (row mapping exists; needs a `SELECT` over all rows) |
| `Database::update_manifest(FileId, &[u8])` | New method (column exists) |
| `Database::get_manifest(FileId)` → `Option<Vec<u8>>` | New method |
| `Database::list_all_manifests()` → `Vec<(FileId, u64, i64, Vec<u8>)>` | New method (returns id, size, mtime, and raw blob; see Section 4.2) |
| `Database::checkpoint()` | New method (WAL checkpoint on shutdown; see Section 4.2) |
| Background delta sync task | New (compare DB state vs origin) |
| First-mount detection | New (check `file_count() > 0`) |

---

## 3. Goals & Non-Goals

### 3.1 Goals

- Subsequent mount loads tree from SQLite, not origin scan
- Chunk manifests persist to SQLite, loaded on mount (no re-download)
- tantivy index, PatternStore, CollectionStore opened on mount
- Background delta sync reconciles DB vs origin after mount
- First mount (empty DB) falls back to current full-scan behavior
- Mount time for 10K files: <1 second (subsequent mount)
- All existing tests pass, no regressions

### 3.2 Non-Goals

- Achieving <500ms mount for 1M+ files (requires lazy tree loading, which is future work)
- LRU eviction persistence (separate task, low urgency)
- Changing the storage engine (SQLite is the decision)
- Config file parsing changes (origin config stays in TOML, not DB)
- Schema migrations for existing data (fresh DB on first mount)

---

## 4. Proposed Design

### 4.1 Implementation Order

```
4.2 Database: list_all_files() + manifest CRUD               (foundation)
  ↓
4.3 Mount path: load tree + fetcher from DB                  (core change)
  ↓
4.4 Persist manifests after fetch                            (write path)
  ↓
4.5 Open tantivy + PatternStore + CollectionStore            (quick wiring)
  ↓
4.6 Background delta sync                                    (post-mount reconciliation)
  ↓
4.7 First-mount detection + fallback                         (edge case)
  ↓
4.8 Shutdown: WAL checkpoint + flush                         (cleanup)
```

### 4.2 Database: New Methods

**File**: `musicfs-cache/src/db.rs`

#### list_all_files()

Bulk load all files from the DB. Reuses the existing row-to-FileMeta mapping from `get_file_by_virtual_path()`.

```rust
pub fn list_all_files(&self) -> Result<Vec<FileMeta>> {
    let conn = self.conn.lock().unwrap();

    let mut stmt = conn.prepare(
        r#"SELECT id, origin_id, real_path, virtual_path,
                  title, artist, album, album_artist, genre,
                  year, track, disc,
                  duration_ms, bitrate, sample_rate, format,
                  origin_mtime, origin_size, content_hash
           FROM files
           ORDER BY virtual_path"#
    ).map_err(|e| Error::Database(format!("prepare failed: {}", e)))?;

    // Same mapping as get_file_by_virtual_path. Assumes the extracted helper
    // has the shape `fn row_to_file_meta(&Row) -> rusqlite::Result<FileMeta>`.
    let files = stmt
        .query_map([], |row| Self::row_to_file_meta(row))
        .map_err(|e| Error::Database(format!("query failed: {}", e)))?
        // Propagate row-level errors instead of silently dropping rows.
        .collect::<rusqlite::Result<Vec<FileMeta>>>()
        .map_err(|e| Error::Database(format!("row mapping failed: {}", e)))?;

    Ok(files)
}
```

Extract the row mapping into a shared `row_to_file_meta(row)` helper to avoid duplication with `get_file_by_virtual_path()`. Row errors are propagated rather than filtered out with `filter_map(|r| r.ok())`, so a corrupt row fails the load visibly instead of silently shrinking the tree.

#### Manifest CRUD

```rust
// Requires `use rusqlite::{params, OptionalExtension};` for `params!` and `.optional()`.

pub fn update_manifest(&self, file_id: FileId, manifest_blob: &[u8]) -> Result<()> {
    let conn = self.conn.lock().unwrap();
    conn.execute(
        "UPDATE files SET chunk_manifest = ?1 WHERE id = ?2",
        params![manifest_blob, file_id.0],
    ).map_err(|e| Error::Database(format!("update manifest failed: {}", e)))?;
    Ok(())
}

pub fn get_manifest(&self, file_id: FileId) -> Result<Option<Vec<u8>>> {
    let conn = self.conn.lock().unwrap();
    conn.query_row(
        "SELECT chunk_manifest FROM files WHERE id = ?1",
        params![file_id.0],
        // The column is nullable: read it as Option so that a NULL manifest
        // is not reported as a column type error.
        |row| row.get::<_, Option<Vec<u8>>>(0),
    )
    .optional()
    // Outer None = no such row; inner None = row exists but has no manifest.
    .map(|found| found.flatten())
    .map_err(|e| Error::Database(format!("get manifest failed: {}", e)))
}

pub fn list_all_manifests(&self) -> Result<Vec<(FileId, u64, i64, Vec<u8>)>> {
    let conn = self.conn.lock().unwrap();
    let mut stmt = conn.prepare(
        "SELECT id, origin_size, origin_mtime, chunk_manifest FROM files WHERE chunk_manifest IS NOT NULL"
    ).map_err(|e| Error::Database(format!("prepare failed: {}", e)))?;

    let manifests = stmt.query_map([], |row| {
        Ok((
            FileId(row.get(0)?),
            row.get::<_, i64>(1)? as u64,
            row.get::<_, i64>(2)?,
            row.get::<_, Vec<u8>>(3)?,
        ))
    })
    .map_err(|e| Error::Database(format!("query failed: {}", e)))?
    // Propagate row-level errors instead of silently dropping rows.
    .collect::<rusqlite::Result<Vec<_>>>()
    .map_err(|e| Error::Database(format!("row mapping failed: {}", e)))?;

    Ok(manifests)
}
```
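
The bulk-load methods above map each row to a `Result` and then collect, so how per-row errors are handled at that step decides whether a corrupt row is noticed. The following is a standard-library sketch of the two choices, not MusicFS code: `parse_row`, `load_lenient`, `load_strict`, and the `"id:size"` string rows are hypothetical stand-ins for the rusqlite row mapping.

```rust
/// Hypothetical stand-in for a row mapper: parses "id:size" into a tuple.
fn parse_row(raw: &str) -> Result<(u32, u64), String> {
    let (id, size) = raw
        .split_once(':')
        .ok_or_else(|| format!("malformed row: {raw}"))?;
    let id = id.parse::<u32>().map_err(|e| format!("bad id in {raw}: {e}"))?;
    let size = size.parse::<u64>().map_err(|e| format!("bad size in {raw}: {e}"))?;
    Ok((id, size))
}

/// Lenient: rows that fail to parse are silently skipped.
fn load_lenient(rows: &[&str]) -> Vec<(u32, u64)> {
    rows.iter().filter_map(|r| parse_row(r).ok()).collect()
}

/// Strict: the first failing row aborts the load and surfaces its error.
fn load_strict(rows: &[&str]) -> Result<Vec<(u32, u64)>, String> {
    rows.iter().map(|r| parse_row(r)).collect()
}
```

With one bad row in the input, `load_lenient` returns a shorter list and no signal, while `load_strict` returns an error. For a mount path that rebuilds the whole tree from these rows, the strict form is probably the safer default, since a silently shortened list would look like deleted files.
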
#### WAL Checkpoint

```rust
pub fn checkpoint(&self) -> Result<()> {
    let conn = self.conn.lock().unwrap();
    // The pragma returns one row (busy, log, checkpointed); busy != 0 means it was blocked.
    let busy: i64 = conn
        .query_row("PRAGMA wal_checkpoint(TRUNCATE)", [], |row| row.get(0))
        .map_err(|e| Error::Database(format!("WAL checkpoint failed: {}", e)))?;
    if busy != 0 {
        warn!("SQLite WAL checkpoint incomplete: database busy");
    } else {
        info!("SQLite WAL checkpoint completed");
    }
    Ok(())
}
```

#### Tests

```rust
#[test]
fn test_list_all_files() {
    let db = Database::open_memory().unwrap();
    // Insert 3 files
    // list_all_files() returns 3
    // Verify FileMeta fields match what was inserted
}

#[test]
fn test_manifest_roundtrip() {
    let db = Database::open_memory().unwrap();
    // Insert file, update_manifest with blob, get_manifest returns same blob
}

#[test]
fn test_list_all_manifests_skips_null() {
    let db = Database::open_memory().unwrap();
    // Insert 3 files, only 1 with manifest
    // list_all_manifests() returns 1
}
```

---

### 4.3 Mount Path: Load From DB

**File**: `musicfs-cli/src/main.rs` (rewrite `run_mount()`)

The key change: replace `scan_music_files()` with a DB load when data exists.

```rust
fn run_mount(mountpoint: PathBuf, origin_path: Option<PathBuf>, cache_dir: Option<PathBuf>) -> Result<()> {
    let origin_path = origin_path.context("--origin is required")?;
    let runtime = tokio::runtime::Runtime::new()?;
    let handle = runtime.handle().clone();

    // Resolve once, outside the async block: the search index and pattern
    // store below also need the resolved path.
    let cache_dir = resolve_cache_dir(cache_dir);
    std::fs::create_dir_all(&cache_dir)?;
    std::fs::create_dir_all(&mountpoint)?;

    let (tree, reader, db) = runtime.block_on(async {
        // Open CAS store
        let store = Arc::new(CasStore::open(CasConfig {
            chunks_dir: cache_dir.join("chunks"),
            ..Default::default()
        }).await?);

        // Open database. If the integrity check fails, log it and fall back
        // to a plain open rather than refusing to mount.
        let db_path = cache_dir.join("metadata.db");
        let db = Arc::new(Database::open_with_integrity_check(&db_path)
            .or_else(|e| {
                warn!("Integrity check failed, falling back to plain open: {}", e);
                Database::open(&db_path)
            })?);

        let fetcher = Arc::new(ContentFetcher::new(store.clone()));
        let origin_id = OriginId::from("local");
        let origin = Arc::new(LocalOrigin::new(origin_id.clone(), origin_path.clone()));
        fetcher.register_origin(origin);

        // Decide: load from DB or full scan
        let file_count = db.file_count().unwrap_or(0);

        let files = if file_count > 0 {
            // SUBSEQUENT MOUNT: load from DB
            info!(file_count, "Loading metadata from database");
            let start = Instant::now();
            let files = db.list_all_files()?;
            info!(elapsed_ms = start.elapsed().as_millis() as u64, "Database load complete");
            files
        } else {
            // FIRST MOUNT: full origin scan
            info!("First mount: scanning origin");
            let files = scan_music_files(&origin_path, &origin_id).await?;
            info!(file_count = files.len(), "Scan complete, persisting to database");

            // Persist to DB for next mount. Only files with parsed audio
            // metadata are stored; files without it are absent from the DB
            // on the next mount.
            for file in &files {
                if let Some(ref audio) = file.audio {
                    db.upsert_file(
                        &file.real_path.origin_id,
                        &file.real_path.path,
                        &file.virtual_path,
                        audio,
                        file.mtime,
                        file.size,
                    )?;
                }
            }
            info!("Metadata persisted to database");
            files
        };

        // Build tree + register files (same as before, but from DB or scan)
        let mut builder = TreeBuilder::new();
        for file in &files {
            builder.add_file(file);
            fetcher.register_file(file.clone());
        }
        let tree = Arc::new(RwLock::new(builder.build()));

        // Load manifests from DB
        let reader = Arc::new(FileReader::with_fetcher(store, fetcher));
        let manifest_count = load_manifests_from_db(&db, &reader)?;
        if manifest_count > 0 {
            info!(manifest_count, "Loaded chunk manifests from database");
        }

        Ok::<_, anyhow::Error>((tree, reader, db))
    })?;

    // Open search index
    let search_dir = cache_dir.join("search.idx");
    let _search_index = SearchIndex::open_with_recovery(&search_dir)
        .context("Failed to open search index")?;

    // Open pattern store
    let patterns_path = cache_dir.join("patterns.db");
    let _pattern_store = PatternStore::new(&patterns_path, 30)
        .context("Failed to open pattern store")?;

    // ... mount, signal handler, shutdown (same as current) ...

    // On shutdown: checkpoint WAL
    db.checkpoint().unwrap_or_else(|e| warn!("WAL checkpoint failed: {}", e));
    Ok(())
}
```

Helper function:

```rust
fn load_manifests_from_db(db: &Database, reader: &FileReader) -> Result<usize> {
    let manifests = db.list_all_manifests()?;
    let mut count = 0;
    for (file_id, total_size, mtime, blob) in manifests {
        match ChunkManifest::from_db(file_id, total_size, mtime, &blob) {
            Some(manifest) => {
                reader.register_manifest(manifest);
                count += 1;
            }
            // Skip undecodable blobs, but log them so corruption is visible.
            None => warn!(file_id = ?file_id, "Skipping undecodable chunk manifest"),
        }
    }
    Ok(count)
}
```

---

### 4.4 Persist Manifests After Fetch

**File**: `musicfs-cas/src/fetcher.rs`

After `fetch_file()` downloads and chunks a file, persist the manifest to SQLite.

The fetcher currently doesn't have access to the Database. Two options:

1. Pass `Arc<Database>` to ContentFetcher (adds dependency musicfs-cas → musicfs-cache)
2. Emit an event with the manifest, have the caller persist it

**Approach**: Option 2, using the existing EventBus. Add a new event variant:

**File**: `musicfs-core/src/events.rs`

```rust
pub enum Event {
    // ... existing variants
    ManifestCached {
        file_id: FileId,
        manifest_blob: Vec<u8>,
    },
}
```

**File**: `musicfs-cas/src/fetcher.rs` (emit event after fetch):

```rust
pub async fn fetch_file(&self, file_id: FileId) -> Result<ChunkManifest, FetchError> {
    // ... existing fetch + chunk logic ...

    // Emit manifest for persistence
    if let Some(bus) = &self.event_bus {
        bus.publish(Event::ManifestCached {
            file_id,
            manifest_blob: manifest.chunks_to_bytes(),
        });
    }

    Ok(manifest)
}
```

**File**: `musicfs-cli/src/main.rs` (subscribe to ManifestCached events):

```rust
// Spawn manifest persistence listener.
// `handle` is the runtime handle from run_mount(); a bare tokio::spawn
// panics when called outside a runtime context.
let db_for_manifests = db.clone();
let mut manifest_rx = event_bus.subscribe();
handle.spawn(async move {
    // Note: the loop ends on the first recv() error. If EventBus is backed by
    // a Tokio broadcast channel (an assumption), a lagged receiver would stop
    // persistence here, so that case may need explicit handling.
    while let Ok(event) = manifest_rx.recv().await {
        if let Event::ManifestCached { file_id, manifest_blob } = event {
            if let Err(e) = db_for_manifests.update_manifest(file_id, &manifest_blob) {
                warn!(file_id = ?file_id, error = %e, "Failed to persist manifest");
            }
        }
    }
});
```
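
The event-based approach keeps the fetcher unaware of the database: it only publishes, and a listener owns persistence. The following is a minimal standard-library sketch of that shape, using `std::sync::mpsc` and a thread in place of the real EventBus and Tokio task. The simplified `Event` enum, `ManifestTable`, `persist_manifests`, and `run_demo` are all hypothetical names for illustration.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread;

/// Simplified stand-in for the event enum; only one variant carries data.
#[derive(Debug, Clone)]
enum Event {
    ManifestCached { file_id: u64, manifest_blob: Vec<u8> },
    Other,
}

/// Hypothetical in-memory stand-in for the `files.chunk_manifest` column.
type ManifestTable = Arc<Mutex<HashMap<u64, Vec<u8>>>>;

/// Drains events until every sender is dropped, persisting manifests as
/// they arrive. Returns how many manifests were written.
fn persist_manifests(rx: Receiver<Event>, table: ManifestTable) -> usize {
    let mut written = 0;
    for event in rx {
        if let Event::ManifestCached { file_id, manifest_blob } = event {
            table.lock().unwrap().insert(file_id, manifest_blob);
            written += 1;
        }
    }
    written
}

/// Runs a producer (the "fetcher") and the listener on separate threads.
fn run_demo() -> (usize, ManifestTable) {
    let (tx, rx) = channel();
    let table: ManifestTable = Arc::new(Mutex::new(HashMap::new()));
    let listener_table = Arc::clone(&table);
    let listener = thread::spawn(move || persist_manifests(rx, listener_table));

    // The producer knows nothing about the table; it only publishes events.
    tx.send(Event::ManifestCached { file_id: 1, manifest_blob: vec![0xAA; 4] }).unwrap();
    tx.send(Event::Other).unwrap();
    tx.send(Event::ManifestCached { file_id: 2, manifest_blob: vec![0xBB; 8] }).unwrap();
    drop(tx); // closing the channel lets the listener loop finish

    (listener.join().unwrap(), table)
}
```

One consequence of this design, visible in the sketch, is that persistence is asynchronous: a manifest exists in memory before it reaches the table, so a crash in between loses only that manifest, which the non-fatal WARN handling in the plan already tolerates.
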

---

### 4.5 Open tantivy + PatternStore + CollectionStore

These already have constructors (`SearchIndex::open()`, `PatternStore::new()`, `CollectionStore::new()`) that load from disk. Just call them in the mount path.

**File**: `musicfs-cli/src/main.rs`

```rust
// After tree is built, before FUSE mount

// Search index: try recovery first, then fall back to a plain open.
let search_dir = cache_dir.join("search.idx");
let search_index = Arc::new(
    SearchIndex::open_with_recovery(&search_dir)
        .or_else(|e| {
            warn!("Search index recovery failed, retrying plain open: {}", e);
            SearchIndex::open(&search_dir)
        })
        .context("Failed to open search index")?,
);

// Pattern store (already persists to SQLite, loads sequence_counts on open).
// Retrying the same constructor with the same arguments is unlikely to
// succeed where the first call failed, so the error is propagated instead.
let patterns_path = cache_dir.join("patterns.db");
let pattern_store = Arc::new(
    PatternStore::new(&patterns_path, 30)
        .context("Failed to open pattern store")?,
);

// Collection store
let collections_path = cache_dir.join("collections.db");
let collection_store = Arc::new(
    CollectionStore::new(&collections_path)
        .context("Failed to open collection store")?,
);
```

For tantivy: if this is a first mount, index all files after scan:

```rust
if file_count == 0 {
    // First mount: index all files
    info!("First mount: building search index");
    let indexer = Indexer::new(search_index.clone(), event_bus.clone(), /* metadata_lookup */);
    indexer.index_batch(&files)?;
}
```

---

### 4.6 Background Delta Sync

After mount completes, spawn a background task that compares DB state against origin and reconciles differences.

**File**: `musicfs-sync/src/delta.rs` or new `musicfs-cli/src/sync.rs`

```rust
pub async fn background_delta_sync(
    origin: Arc<dyn Origin>,
    origin_id: OriginId,
    db: Arc<Database>,
    tree: Arc<RwLock<VirtualTree>>,
    fetcher: Arc<ContentFetcher>,
    event_bus: Arc<EventBus>,
) -> Result<SyncSummary> {
    info!("Starting background delta sync");
    let start = Instant::now();

    let mut added = 0u64;
    let mut modified = 0u64;
    let mut removed = 0u64;
    let mut unchanged = 0u64;

    // Get the DB files that belong to this origin (assumes OriginId: PartialEq).
    // Without the filter, files from any other origin would look "removed" below.
    let db_files: HashMap<PathBuf, FileMeta> = db.list_all_files()?
        .into_iter()
        .filter(|f| f.real_path.origin_id == origin_id)
        .map(|f| (f.real_path.path.clone(), f))
        .collect();

    // Walk origin
    let origin_files = scan_origin_recursive(&origin, Path::new("/")).await?;

    // Compare
    for (path, origin_stat) in &origin_files {
        match db_files.get(path) {
            Some(db_file) if db_file.mtime == origin_stat.mtime && db_file.size == origin_stat.size => {
                unchanged += 1;
            }
            Some(_db_file) => {
                // Modified: re-parse metadata, update DB, update tree.
                // Any stored chunk manifest is likely stale and should be invalidated too.
                modified += 1;
                // ... update logic ...
            }
            None => {
                // New file: parse metadata, add to DB + tree
                added += 1;
                // ... add logic ...
            }
        }
    }

    // Find removed files (in DB but not on origin)
    for (path, db_file) in &db_files {
        if !origin_files.contains_key(path) {
            removed += 1;
            db.delete_file(db_file.id)?;
            tree.write().remove_file(&db_file.virtual_path);
        }
    }

    let elapsed = start.elapsed();
    info!(
        added, modified, removed, unchanged,
        elapsed_ms = elapsed.as_millis() as u64,
        "Delta sync complete"
    );

    Ok(SyncSummary { added, modified, removed, unchanged })
}
```

Spawn in `run_mount()` after FUSE mount. This needs `fetcher`, `origin`, `origin_id`, and `event_bus` to be available at that point, so the `block_on` block in Section 4.3 has to return them alongside `tree`, `reader`, and `db`:

```rust
// Background delta sync (non-blocking).
// `handle` is the runtime handle from run_mount(); a bare tokio::spawn
// panics when called outside a runtime context.
let sync_db = db.clone();
let sync_tree = tree.clone();
let sync_fetcher = fetcher.clone();
let sync_origin = origin.clone();
let sync_origin_id = origin_id.clone();
let sync_bus = event_bus.clone();
handle.spawn(async move {
    if let Err(e) = background_delta_sync(
        sync_origin, sync_origin_id, sync_db, sync_tree, sync_fetcher, sync_bus,
    ).await {
        warn!("Delta sync failed: {}", e);
    }
});
```
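
The comparison at the heart of delta sync is a pure function of two snapshots, which makes it easy to test without an origin or a database. The sketch below isolates that step under simplifying assumptions: `Stat`, `Summary`, and `classify` are hypothetical names, paths are plain strings, and a file counts as modified when its `(mtime, size)` pair differs, as in the plan.

```rust
use std::collections::HashMap;

/// The two fields the plan compares to decide whether a file changed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Stat {
    mtime: i64,
    size: u64,
}

/// Counts per category, mirroring the fields of the plan's `SyncSummary`.
#[derive(Debug, Default, PartialEq, Eq)]
struct Summary {
    added: u64,
    modified: u64,
    removed: u64,
    unchanged: u64,
}

/// Classifies every path by comparing the DB snapshot with the origin listing.
fn classify(db: &HashMap<String, Stat>, origin: &HashMap<String, Stat>) -> Summary {
    let mut summary = Summary::default();
    for (path, origin_stat) in origin {
        match db.get(path) {
            Some(db_stat) if db_stat == origin_stat => summary.unchanged += 1,
            Some(_) => summary.modified += 1,
            None => summary.added += 1,
        }
    }
    // Anything the DB knows about that the origin no longer lists was removed.
    summary.removed = db.keys().filter(|p| !origin.contains_key(*p)).count() as u64;
    summary
}
```

Keeping classification separate from the side effects (DB writes, tree updates) would also let the two placeholder tests in Section 5.4 for added and removed files be written against in-memory maps.
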

---

### 4.7 First-Mount Detection

Simple: check `db.file_count()`:

```rust
let file_count = db.file_count().unwrap_or(0);

if file_count > 0 {
    // Load from DB
} else {
    // Full scan + persist
}
```

Because of `unwrap_or(0)`, a failed count query is treated the same as an empty database and triggers a full scan. This is already shown in Section 4.3. No separate implementation step.
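
The decision can be made explicit as a small pure function, which documents the error case alongside the two normal ones. This is an illustrative sketch only: `MountSource` and `choose_mount_source` are hypothetical names, and the `String` error type stands in for whatever `file_count()` actually returns.

```rust
/// Where the mount path should get its file list from.
#[derive(Debug, PartialEq, Eq)]
enum MountSource {
    /// The database already holds files: load them.
    Database { file_count: u64 },
    /// Empty or unreadable database: fall back to a full origin scan.
    OriginScan,
}

/// Mirrors `db.file_count().unwrap_or(0)` followed by the `> 0` check.
fn choose_mount_source(file_count: Result<u64, String>) -> MountSource {
    match file_count.unwrap_or(0) {
        0 => MountSource::OriginScan,
        n => MountSource::Database { file_count: n },
    }
}
```

Whether an unreadable database should really fall back to a scan, or should instead abort the mount, is a design choice the plan leaves implicit; the sketch only makes the current behavior visible.
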

---

### 4.8 Shutdown: WAL Checkpoint + Flush

**File**: `musicfs-cli/src/main.rs` (in the shutdown sequence, after the signal and before dropping the session):

```rust
info!("Beginning ordered shutdown");
shutdown_token.cancel();
// Grace period for background tasks to observe cancellation.
// This is a fixed wait, not a join: it does not prove the tasks have stopped.
tokio::time::sleep(Duration::from_millis(500)).await;

// Flush persistence
if let Err(e) = db.checkpoint() {
    warn!("SQLite WAL checkpoint failed: {}", e);
}
info!("Background tasks stopped, state flushed");
```
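
A fixed sleep gives background tasks time to stop but cannot confirm that they did. One alternative is to wait on the task handles before flushing. The sketch below shows that ordering with standard-library threads and an `AtomicBool` standing in for Tokio tasks and the cancellation token; `spawn_worker`, `shutdown_in_order`, and the `flush` closure (a stand-in for `db.checkpoint()`) are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawns a hypothetical background worker that polls the shutdown flag.
/// The handle's result is the number of work iterations completed.
fn spawn_worker(shutdown: Arc<AtomicBool>) -> JoinHandle<u32> {
    thread::spawn(move || {
        let mut iterations = 0;
        while !shutdown.load(Ordering::Acquire) {
            iterations += 1;
            thread::sleep(Duration::from_millis(1));
        }
        iterations
    })
}

/// Ordered shutdown: signal, wait for every worker to finish, then flush.
/// Returns the number of workers that stopped cleanly.
fn shutdown_in_order(
    shutdown: &AtomicBool,
    workers: Vec<JoinHandle<u32>>,
    flush: impl FnOnce() -> Result<(), String>,
) -> Result<usize, String> {
    shutdown.store(true, Ordering::Release);
    let mut stopped = 0;
    for worker in workers {
        worker.join().map_err(|_| "worker panicked".to_string())?;
        stopped += 1;
    }
    // Only flush once no worker can still be writing.
    flush()?;
    Ok(stopped)
}
```

In the Tokio setting the same idea would presumably mean keeping the `JoinHandle`s returned by the spawn calls and awaiting them, ideally under a timeout so that a stuck task cannot block shutdown indefinitely.
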

---

## 5. Cross-Cutting Concerns

### 5.1 Security & Privacy

- No new attack surface: the SQLite file has the same permissions as the cache directory
- Metadata in the DB is the same as what's already in the FUSE virtual tree (not new data)
- The `chunk_manifest` BLOB holds binary chunk hashes, which are not sensitive

### 5.2 Observability

- Mount time logged: "Database load complete" with elapsed_ms
- First mount detected and logged: "First mount: scanning origin"
- Delta sync summary logged: added/modified/removed/unchanged counts + elapsed
- WAL checkpoint logged on shutdown
- Manifest persistence failures logged at WARN (non-fatal)

### 5.3 Scalability

Estimated mount times by library size:

| Library Size | First Mount (scan) | Subsequent Mount (DB load) |
|---|---|---|
| 1K files | ~1-2s | <100ms |
| 10K files | ~10-20s | ~200ms |
| 100K files | ~2-5 min | ~1-2s |
| 1M files | ~20-60 min | ~2-4s |

Delta sync runs in the background: mount returns immediately, and the user sees stale but functional data until sync catches up.

### 5.4 Testing

```rust
// Test: subsequent mount loads from DB
#[tokio::test]
async fn test_mount_loads_from_db() {
    let dir = TempDir::new().unwrap();
    let db = Database::open(dir.path().join("test.db")).unwrap();

    // Insert files
    for i in 0..100 {
        db.upsert_file(/* ... */).unwrap();
    }

    // Load all
    let files = db.list_all_files().unwrap();
    assert_eq!(files.len(), 100);

    // Build tree from DB files (same as mount path)
    let mut builder = TreeBuilder::new();
    for f in &files { builder.add_file(f); }
    let tree = builder.build();
    assert_eq!(tree.file_count(), 100);
}

// Test: manifest roundtrip through DB
#[tokio::test]
async fn test_manifest_persists_and_loads() {
    let dir = TempDir::new().unwrap();
    let db = Database::open(dir.path().join("test.db")).unwrap();

    let id = db.upsert_file(/* ... */).unwrap();

    let manifest = ChunkManifest { /* ... */ };
    let blob = manifest.chunks_to_bytes();
    db.update_manifest(id, &blob).unwrap();

    let loaded = db.get_manifest(id).unwrap().unwrap();
    let restored = ChunkManifest::from_db(id, 1000, 0, &loaded).unwrap();
    assert_eq!(restored.chunks.len(), manifest.chunks.len());
}

// Test: first mount detects empty DB
#[tokio::test]
async fn test_first_mount_detection() {
    let dir = TempDir::new().unwrap();
    let db = Database::open(dir.path().join("test.db")).unwrap();
    assert_eq!(db.file_count().unwrap(), 0); // First mount
}

// Test: delta sync detects changes
#[tokio::test]
async fn test_delta_sync_detects_added_file() {
    // DB has files A, B
    // Origin has files A, B, C
    // Delta sync should detect C as added
}

// Test: delta sync detects removed file
#[tokio::test]
async fn test_delta_sync_detects_removed_file() {
    // DB has files A, B, C
    // Origin has files A, B
    // Delta sync should detect C as removed
}

// Test: shutdown checkpoints WAL
#[tokio::test]
async fn test_shutdown_checkpoints_wal() {
    let dir = TempDir::new().unwrap();
    let db_path = dir.path().join("test.db");
    let db = Database::open(&db_path).unwrap();
    db.upsert_file(/* ... */).unwrap();

    // "test.db" -> "test.db-wal"
    let wal_path = db_path.with_extension("db-wal");

    // After a TRUNCATE checkpoint, the WAL should be empty (or absent)
    db.checkpoint().unwrap();
    let wal_len = std::fs::metadata(&wal_path).map(|m| m.len()).unwrap_or(0);
    assert_eq!(wal_len, 0);
}
```
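
The manifest roundtrip test depends on serialization and parsing being exact inverses, and on parsing rejecting damaged blobs. The plan does not show the actual layout used by `chunks_to_bytes()` and `from_db()`, so the sketch below assumes a purely hypothetical one (fixed-width 32-byte hashes concatenated in order) just to illustrate the property being tested. `HASH_LEN`, `hashes_to_bytes`, and `hashes_from_bytes` are invented for this example.

```rust
/// Assumed hash width for this sketch (32 bytes, e.g. a 256-bit digest).
const HASH_LEN: usize = 32;

/// Serializes chunk hashes by concatenating them in order.
fn hashes_to_bytes(hashes: &[[u8; HASH_LEN]]) -> Vec<u8> {
    hashes.iter().flat_map(|h| h.iter().copied()).collect()
}

/// Parses a blob back into hashes. Returns None if the length is not a
/// whole number of hashes, which signals a truncated or corrupt blob.
fn hashes_from_bytes(blob: &[u8]) -> Option<Vec<[u8; HASH_LEN]>> {
    if blob.len() % HASH_LEN != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(HASH_LEN)
            .map(|chunk| {
                let mut hash = [0u8; HASH_LEN];
                hash.copy_from_slice(chunk);
                hash
            })
            .collect(),
    )
}
```

Returning `None` for a malformed blob matches the `Option` that `ChunkManifest::from_db()` returns in `load_manifests_from_db()`, where a failed parse means the manifest is skipped rather than the mount aborted.
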

---

## 6. Alternatives Considered

### 6.1 sled for Tree Storage (Option B)

sled is faster for bulk key-value reads (~1-2s for 1M entries vs SQLite's ~2-4s). Rejected because:

- SQLite code already exists (schema, CRUD, row mapping)
- sled would require a new serialization layer (bincode/msgpack for FileMeta)
- Running two persistence engines is more complex than running one
- SQLite's 2-4s is acceptable for the target

### 6.2 Flat File Snapshot (Option C)

Fastest possible bulk load (<1s via mmap). Rejected because:

- No incremental updates: every change rewrites the entire file
- At 1M files (~500MB), delta sync triggers a 500MB write for each changed file
- No concurrent access safety
- No crash recovery for partial writes

### 6.3 Lazy Tree Loading

Instead of loading all files into memory on mount, load only the root directories and fetch deeper levels on demand from SQLite. This would achieve true O(1) mount. Deferred because:

- Requires significant refactoring of VirtualTree (currently all-in-memory)
- SQLite 2-4s load is good enough for production
- Can be added later as an optimization without changing the persistence layer

### 6.4 Separate Manifest Store

Instead of storing manifests in the `files.chunk_manifest` column, use a separate sled tree or SQLite table. Rejected because the column already exists and the schema already supports it.

---

## 7. Implementation Plan

### 7.1 Task Sequence

| Day | Task | Deliverable |
|-----|------|-------------|
| 1 | Database methods: `list_all_files()`, `update_manifest()`, `get_manifest()`, `list_all_manifests()`, `checkpoint()`. Extract `row_to_file_meta()` helper. | New DB methods + tests |
| 2 | Rewrite `run_mount()`: DB load path vs scan path. First-mount detection. | Core mount change |
| 3 | Persist manifests: `ManifestCached` event + listener in main.rs. Load manifests on mount via `load_manifests_from_db()`. | Manifest persistence |
| 4 | Wire tantivy + PatternStore + CollectionStore into mount path. First-mount indexing. | Search/patterns on mount |
| 5 | Background delta sync: compare DB vs origin, update differences. | Delta sync task |
| 6 | Shutdown: WAL checkpoint. Upsert files to DB during first-mount scan. | Clean shutdown |
| 7 | Integration testing: full mount→read→restart→mount cycle. Verify tree + manifests survive restart. | E2E validation |
| 8 | Buffer for issues found during integration. | - |

### 7.2 Verification Checklist

- [ ] `cargo check`: zero errors
- [ ] `cargo test --workspace --exclude musicfs-grpc`: all pass
- [ ] Manual test: first mount (empty cache dir) scans origin, creates DB
- [ ] Manual test: second mount (DB exists) loads from DB, no origin scan
- [ ] Manual test: add file to origin, restart; delta sync discovers it
- [ ] Manual test: `kill -9` daemon, restart; DB loads, manifests intact
- [ ] Mount time for 10K test files: <1 second on subsequent mount
- [ ] `ls -la ~/.cache/musicfs/metadata.db` exists after first mount

---

## 8. Files Changed

| File | Change |
|------|--------|
| `musicfs-cache/src/db.rs` | `list_all_files()`, `update_manifest()`, `get_manifest()`, `list_all_manifests()`, `checkpoint()`, `row_to_file_meta()` refactor |
| `musicfs-core/src/events.rs` | Add `ManifestCached` event variant |
| `musicfs-cli/src/main.rs` | Rewrite `run_mount()`: DB load vs scan, open tantivy/patterns/collections, manifest listener, delta sync spawn, shutdown checkpoint |
| `musicfs-cli/Cargo.toml` | Add `musicfs-search`, `musicfs-cache` dependencies (for PatternStore, CollectionStore, SearchIndex) |
| `musicfs-cas/src/fetcher.rs` | Emit `ManifestCached` event after `fetch_file()` |
| `musicfs-sync/src/delta.rs` | New `background_delta_sync()` function (or new file) |
| `musicfs-test-utils/tests/resilience.rs` | New tests: mount-from-DB, manifest roundtrip, delta sync, first-mount detection |

---

## 9. Glossary / References

| Term | Definition |
|------|------------|
| **First mount** | Initial mount with empty database; triggers full origin scan |
| **Subsequent mount** | Mount with existing database; loads from SQLite |
| **Delta sync** | Background task that compares DB state against origin after mount |
| **Stale data window** | Time between mount and delta sync completion when data may be outdated |
| **WAL checkpoint** | SQLite operation that flushes write-ahead log to main database file |

| Document | Path |
|----------|------|
| Persistent state research | [persistent-state.md](persistent-state.md) |
| Phase A (signals, shutdown) | [phase-a-stop-dying.md](phase-a-stop-dying.md) |
| Phase B (crash recovery) | [phase-b-crash-recovery.md](phase-b-crash-recovery.md) |
| Architecture | [architecture.md](../architecture.md) |

@@ -0,0 +1,569 @@
|
||||
# Phase A: Stop Dying — Implementation Plan
|
||||
|
||||
**Authors:** AI-assisted
|
||||
**Status:** Draft
|
||||
**Last Updated:** 2026-05-13
|
||||
**Reviewers:** TBD
|
||||
**Approvers:** TBD
|
||||
**Prerequisites:** [resilience-fault-tolerance.md](resilience-fault-tolerance.md), [resilience-testing.md](resilience-testing.md)
|
||||
**Estimated Effort:** ~5 days
|
||||
|
||||
---
|
||||
|
||||
[TOC]
|
||||
|
||||
---
|
||||
|
||||
## 1. Abstract
|
||||
|
||||
Implement the 6 most critical resilience fixes (issues 2.1, 2.2, 2.7, 2.9, 2.10, 3.7 from the [resilience audit](resilience-fault-tolerance.md)) that prevent MusicFS from dying on common operational events: signals, panics, lock poisoning, and systemd lifecycle.
|
||||
|
||||
Issues 2.3 (shutdown orchestration), 2.4 (cache integrity), 2.5 (sync recovery), 2.6 (task supervisor), 2.8 (disk space) are deferred to Phase B — they depend on Phase A infrastructure or on the [persistent state](persistent-state.md) work.
|
||||
|
||||
**Development flow** (TDD, per-issue):
|
||||
1. Create stubs so the codebase compiles
|
||||
2. Write RED tests that express the expected behavior
|
||||
3. Implement the fix
|
||||
4. Verify tests turn GREEN
|
||||
5. Run full test suite — no regressions
|
||||
|
||||
---
|
||||
|
||||
## 2. Background
|
||||
|
||||
MusicFS currently dies on:
|
||||
- Any signal (SIGTERM, SIGINT) — instant death, no cleanup
|
||||
- Any panic in a writer thread — RwLock poisons, all FUSE ops crash
|
||||
- systemd lifecycle — `Type=notify` but no `sd_notify`, ExecStop is a stub
|
||||
- Crash leaves stale FUSE mount — users must manually `fusermount -u`
|
||||
|
||||
The [resilience test crate](../../musicfs/crates/musicfs-test-utils/) and RED tests are already in place. This plan implements the fixes to turn them GREEN.
|
||||
|
||||
---
|
||||
|
||||
## 3. Goals & Non-Goals
|
||||
|
||||
### 3.1 Goals
|
||||
|
||||
- Signal handler catches SIGTERM/SIGINT and initiates clean exit
|
||||
- Panics are logged with full context before process terminates
|
||||
- RwLock poisoning cannot cascade to kill FUSE operations
|
||||
- systemd integration works (`sd_notify READY=1`, `ExecStopPost`)
|
||||
- Stale FUSE mounts are detected and cleaned on startup
|
||||
- All existing 162 tests continue to pass
|
||||
- All Phase A RED tests turn GREEN
|
||||
|
||||
### 3.2 Non-Goals
|
||||
|
||||
- Graceful shutdown orchestration with ordered teardown (Phase B — needs CancellationToken plumbing through all components)
|
||||
- Task supervisor for background task restart (Phase B)
|
||||
- Cache integrity checks on startup (Phase B — needs persistent state)
|
||||
- Disk space monitoring (Phase B)
|
||||
- Interrupted sync recovery (Phase B — needs persistent state)
|
||||
|
||||
---
|
||||
|
||||
## 4. Proposed Design
|
||||
|
||||
### 4.1 Implementation Order
|
||||
|
||||
Dependencies determine the order. Each issue is independent except where noted.
|
||||
|
||||
```
|
||||
4.2 RwLock poison fix (no deps, instant win, unblocks safety)
↓
4.3 Panic hook (no deps, complements RwLock fix)
↓
4.4 systemd ExecStopPost + stale mount check (no deps; service file change plus startup check)
↓
4.5 sd_notify integration (no deps, new crate dependency)
↓
4.6 Signal handling (depends on: FUSE mount change to spawn_mount2; needed for a clean end-to-end stale mount test)
|
||||
```
|
||||
|
||||
### 4.2 Issue 2.9: RwLock Poison Fix
|
||||
|
||||
**Approach**: Replace `std::sync::RwLock` with `parking_lot::RwLock` in all production paths. `parking_lot` locks never poison: a panic in a writer releases the lock during unwinding, and subsequent readers see whatever state the writer left behind, which may be a partially applied update.
|
||||
|
||||
**Why parking_lot over poison recovery**: The codebase already uses `parking_lot` in `prefetch.rs` and `index.rs`. Using it everywhere is consistent. The alternative (`.unwrap_or_else(|p| p.into_inner())`) is verbose and error-prone — one missed call re-introduces the bug.
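
To make the trade-off concrete, here is a minimal std-only sketch of the poisoning behaviour and of the `into_inner()` recovery pattern. `poison_demo` is a hypothetical helper for illustration, not code from the repository. The recovered value is whatever the panicking writer left behind, which is also what a reader of a non-poisoning lock would observe.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Returns (lock_is_poisoned, value_seen_after_recovery).
pub fn poison_demo() -> (bool, u32) {
    let shared = Arc::new(RwLock::new(0u32));
    let writer = Arc::clone(&shared);

    // The writer mutates the value and then panics while still holding the guard.
    let result = thread::spawn(move || {
        let mut guard = writer.write().unwrap();
        *guard = 1;
        panic!("simulated writer panic");
    })
    .join();
    assert!(result.is_err());

    // std::sync::RwLock is now poisoned: a plain `.read().unwrap()` would panic.
    let poisoned = shared.is_poisoned();

    // Poison recovery: take the guard out of the PoisonError instead of unwrapping.
    let value = *shared.read().unwrap_or_else(|p| p.into_inner());
    (poisoned, value)
}
```

Calling `poison_demo()` shows both halves of the problem: the lock reports itself as poisoned, and the data behind it reflects the writer's last assignment rather than the pre-panic value.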
|
||||
|
||||
#### Step 1: Stubs (compile)
|
||||
|
||||
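None needed. `parking_lot::RwLock` is a near drop-in replacement: `read()` and `write()` return the guard directly instead of a `LockResult`, so there is no `PoisonError` and the existing `.unwrap()` calls on locks must be removed.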
|
||||
|
||||
#### Step 2: RED tests
|
||||
|
||||
Already exist in `tests/resilience.rs`:
|
||||
- `test_poisoned_tree_lock_returns_eio_not_panic` — currently passes (demonstrates the problem)
|
||||
- `test_parking_lot_rwlock_survives_panic` — currently passes (proves the fix works)
|
||||
|
||||
Additional test to add: verify FUSE filesystem survives a writer panic on the tree lock.
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**Files to change:**
|
||||
|
||||
| File | Change |
|
||||
|------|--------|
|
||||
| `musicfs-fuse/src/filesystem.rs` | `use std::sync::RwLock` → `use parking_lot::RwLock`; remove all `.unwrap()` on lock calls (parking_lot returns guard directly, not `Result`) |
|
||||
| `musicfs-cas/src/reader.rs` | Same change for `manifests: RwLock<HashMap<...>>` |
|
||||
| `musicfs-cas/src/fetcher.rs` | Same change for `origins` and `file_meta` locks |
|
||||
| `musicfs-origins/src/registry.rs` | Same change for `origins` and `watch_handles` locks |
|
||||
| `musicfs-cache/src/eviction.rs` | Same change for `access_times` and `hash_to_time` locks |
|
||||
| `musicfs-core/src/metrics.rs` | Same change for histogram locks |
|
||||
| `musicfs-cache/src/tree.rs` | Same change for `last_refresh` lock |
|
||||
|
||||
**Pattern**: In each file:
|
||||
```rust
|
||||
// BEFORE
|
||||
use std::sync::RwLock;
|
||||
let guard = self.tree.read().unwrap();
|
||||
|
||||
// AFTER
|
||||
use parking_lot::RwLock;
|
||||
let guard = self.tree.read(); // No unwrap needed
|
||||
```
|
||||
|
||||
For the `MusicFs` struct in `filesystem.rs`, the `tree` field is `Arc<RwLock<VirtualTree>>` — this is passed in from `main.rs`. Change `main.rs` to use `parking_lot::RwLock` there too.
|
||||
|
||||
#### Step 4: Verify
|
||||
|
||||
```bash
|
||||
cargo test # All 162+ tests pass
|
||||
cargo test -p musicfs-test-utils # Resilience tests pass
|
||||
cargo check # No warnings
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.3 Issue 2.2: Panic Hook
|
||||
|
||||
**Approach**: Install a custom panic hook at daemon startup that logs the panic with `tracing::error!` before the default behavior (abort/unwind). This ensures panics are captured in log files and journald.
|
||||
|
||||
#### Step 1: Stubs
|
||||
|
||||
Add to `musicfs-core/src/lib.rs`:
|
||||
```rust
|
||||
pub fn install_panic_hook() {
|
||||
// stub — will be implemented
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 2: RED tests
|
||||
|
||||
Write in `tests/resilience.rs`:
|
||||
```rust
|
||||
#[test]
|
||||
fn test_panic_hook_logs_to_tracing() {
|
||||
// Install hook with a test tracing subscriber
|
||||
// Trigger panic via catch_unwind
|
||||
// Verify error! log contains panic message + thread name
|
||||
}
|
||||
```
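
One possible shape for that test, sketched with the standard library only: instead of a tracing subscriber, the hook below pushes formatted lines into a shared `Vec`. `capture_panic` and the `[thread] PANIC: message` format are assumptions made for this example; the real test would assert on the `tracing` output instead.

```rust
use std::panic;
use std::sync::{Arc, Mutex};

/// Installs a capturing hook, triggers one panic, restores the previous hook,
/// and returns everything the hook recorded.
pub fn capture_panic(msg: &'static str) -> Vec<String> {
    let captured = Arc::new(Mutex::new(Vec::new()));
    let sink = Arc::clone(&captured);

    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // Same payload extraction as the production hook: &str, then String.
        let message = if let Some(s) = info.payload().downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = info.payload().downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        };
        let thread = std::thread::current();
        let name = thread.name().unwrap_or("<unnamed>").to_string();
        sink.lock().unwrap().push(format!("[{name}] PANIC: {message}"));
    }));

    // catch_unwind keeps the panic from taking down the calling thread.
    let result = panic::catch_unwind(|| panic!("{}", msg));
    panic::set_hook(previous);
    assert!(result.is_err());

    let lines = captured.lock().unwrap().clone();
    lines
}
```

Because the panic hook is process-global, a test built this way should restore the previous hook (as above) and avoid running in parallel with other tests that panic on purpose.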
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-core/src/lib.rs` (or new `musicfs-core/src/panic.rs`)
|
||||
|
||||
```rust
|
||||
pub fn install_panic_hook() {
|
||||
let default_hook = std::panic::take_hook();
|
||||
std::panic::set_hook(Box::new(move |info| {
|
||||
let thread = std::thread::current();
|
||||
let thread_name = thread.name().unwrap_or("<unnamed>");
|
||||
|
||||
let message = if let Some(s) = info.payload().downcast_ref::<&str>() {
|
||||
s.to_string()
|
||||
} else if let Some(s) = info.payload().downcast_ref::<String>() {
|
||||
s.clone()
|
||||
} else {
|
||||
"unknown panic".to_string()
|
||||
};
|
||||
|
||||
let location = info.location().map(|l| format!("{}:{}:{}", l.file(), l.line(), l.column()))
|
||||
.unwrap_or_else(|| "unknown location".to_string());
|
||||
|
||||
tracing::error!(
|
||||
thread = thread_name,
|
||||
location = %location,
|
||||
"PANIC: {}",
|
||||
message
|
||||
);
|
||||
|
||||
default_hook(info);
|
||||
}));
|
||||
}
|
||||
```
|
||||
|
||||
**Call site**: `musicfs-cli/src/main.rs`, at the very top of `main()`:
|
||||
```rust
|
||||
fn main() -> Result<()> {
|
||||
musicfs_core::install_panic_hook();
|
||||
let cli = Cli::parse();
|
||||
// ...
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 4: Verify
|
||||
|
||||
```bash
|
||||
cargo test -p musicfs-core # Panic hook unit tests
|
||||
cargo test -p musicfs-test-utils # Resilience tests
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.4 Issue 3.7 + 2.7: systemd Service Fix + FUSE Cleanup
|
||||
|
||||
**Approach**: Fix the systemd service file and add stale mount detection on startup.
|
||||
|
||||
#### Step 1: No stubs needed (config change)
|
||||
|
||||
#### Step 2: RED tests
|
||||
|
||||
Already exists: `test_systemd_service_has_execstoppost` — currently fails because service file lacks `ExecStopPost`.
|
||||
|
||||
Add test for stale mount detection:
|
||||
```rust
|
||||
#[test]
|
||||
fn test_stale_mount_check_function_exists() {
|
||||
// Verify the function signature exists
|
||||
// (actual mount test needs privileged environment)
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `dist/musicfs.service`
|
||||
|
||||
```diff
|
||||
ExecStop=/usr/bin/musicfs shutdown
|
||||
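+ExecStopPost=-/usr/bin/fusermount -uz %h/music
Restart=on-failure
```

Note: `fusermount -uz` performs a lazy unmount: the mount is detached immediately even if it is busy, and cleaned up once it is no longer in use. systemd does not run `Exec*` lines through a shell, so `|| true` would be passed to `fusermount` as extra arguments. The `-` prefix on the executable path is the systemd way to ignore a non-zero exit status, so a failed cleanup (for example, nothing mounted) is not treated as a service failure.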
|
||||
|
||||
**File**: `musicfs-cli/src/main.rs` — add stale mount check before mounting:
|
||||
|
||||
```rust
|
||||
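fn check_stale_mount(mountpoint: &Path) -> Result<()> {
    // Look for an existing FUSE mount at exactly this path in /proc/mounts.
    // Each line is: device, mount point, filesystem type, options, dump, pass.
    // Limitation: the kernel escapes spaces in paths as \040; this check does not decode that.
    let Ok(mounts) = std::fs::read_to_string("/proc/mounts") else {
        return Ok(()); // /proc/mounts unavailable: nothing to check
    };

    let stale = mounts.lines().any(|line| {
        let mut fields = line.split_whitespace();
        // Compare the mount-point field exactly; a substring match would also hit e.g. /music2
        matches!(
            (fields.next(), fields.next(), fields.next()),
            (Some(_), Some(target), Some(fstype))
                if Path::new(target) == mountpoint && fstype.starts_with("fuse")
        )
    });

    if stale {
        warn!("Stale FUSE mount detected at {:?}, attempting cleanup", mountpoint);
        // Pass the path as an OsStr so non-UTF-8 mount points are not mangled
        let status = std::process::Command::new("fusermount")
            .arg("-uz")
            .arg(mountpoint)
            .status();
        match status {
            Ok(s) if s.success() => info!("Stale mount cleaned up"),
            Ok(s) => warn!("fusermount exited with: {}", s),
            Err(e) => warn!("Failed to run fusermount: {}", e),
        }
    }
    Ok(())
}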
|
||||
```
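
Since the real mount test needs a privileged environment, the detection logic can be exercised separately if it is factored into a pure function over the text of `/proc/mounts`. The sketch below assumes such a split; `has_fuse_mount_at` is a hypothetical name, and the sample mount table in the usage note is made up.

```rust
use std::path::Path;

/// Returns true if `mounts` (the text of /proc/mounts) lists a FUSE filesystem
/// mounted exactly at `mountpoint`. Escaped characters such as \040 are not decoded.
pub fn has_fuse_mount_at(mounts: &str, mountpoint: &Path) -> bool {
    mounts.lines().any(|line| {
        // Fields: device, mount point, filesystem type, options, dump, pass.
        let mut fields = line.split_whitespace();
        let (Some(_device), Some(target), Some(fstype)) =
            (fields.next(), fields.next(), fields.next())
        else {
            return false; // Malformed line: ignore it
        };
        Path::new(target) == mountpoint && fstype.starts_with("fuse")
    })
}
```

Fed a table containing `musicfs /home/u/music fuse ro 0 0`, the function matches `/home/u/music` but not the prefix `/home/u/mus`, which is the case a plain substring check gets wrong.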
|
||||
|
||||
Also fix the `test_systemd_service_has_execstoppost` test path — currently points to wrong location.
|
||||
|
||||
#### Step 4: Verify
|
||||
|
||||
```bash
|
||||
cargo test -p musicfs-test-utils -- test_systemd # Service file test
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.5 Issue 2.10: sd_notify Integration
|
||||
|
||||
**Approach**: Add the `sd-notify` crate, send `READY=1` once the filesystem is mounted, and send `STOPPING=1` on shutdown.
|
||||
|
||||
#### Step 1: Stubs
|
||||
|
||||
Add dependency to `musicfs-cli/Cargo.toml`:
|
||||
```toml
|
||||
sd-notify = "0.4"
|
||||
```
|
||||
|
||||
#### Step 2: RED tests
|
||||
|
||||
Write test that mocks the notify socket:
|
||||
```rust
|
||||
#[test]
|
||||
fn test_sd_notify_ready_sent() {
|
||||
// Create Unix datagram socket at $NOTIFY_SOCKET
|
||||
// Call sd_notify::notify(READY=1)
|
||||
// Verify message received on socket
|
||||
}
|
||||
```
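
The notification protocol itself is a single datagram on a Unix socket, so the mock can be small. The following std-only sketch shows the round trip; `send_notify` and `roundtrip` are hypothetical helpers, and the real test would instead point `NOTIFY_SOCKET` at the bound path and call the `sd-notify` crate. Abstract socket addresses (paths starting with `@`), which systemd can also use, are not handled here.

```rust
use std::io;
use std::os::unix::net::UnixDatagram;
use std::path::Path;

/// Sends one sd_notify-style datagram (for example "READY=1") to `socket_path`.
pub fn send_notify(socket_path: &Path, state: &str) -> io::Result<()> {
    let sock = UnixDatagram::unbound()?;
    sock.send_to(state.as_bytes(), socket_path)?;
    Ok(())
}

/// Binds a fake notify socket, sends `state` to it, and returns what arrived.
pub fn roundtrip(state: &str) -> io::Result<String> {
    let dir = std::env::temp_dir().join(format!("fake-notify-{}", std::process::id()));
    std::fs::create_dir_all(&dir)?;
    let path = dir.join("notify.sock");
    let _ = std::fs::remove_file(&path); // Clear a leftover socket from an earlier run

    let listener = UnixDatagram::bind(&path)?;
    send_notify(&path, state)?;

    // The datagram is already queued, so recv returns without blocking.
    let mut buf = [0u8; 64];
    let n = listener.recv(&mut buf)?;
    let _ = std::fs::remove_dir_all(&dir);
    Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
}
```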
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-cli/src/main.rs`
|
||||
|
||||
After `spawn_mount2` returns (see 4.6). The current blocking `fs.mount()` only returns once the filesystem is unmounted, so readiness cannot be reported after it:
|
||||
```rust
|
||||
// Notify systemd we're ready
|
||||
if let Err(e) = sd_notify::notify(false, &[sd_notify::NotifyState::Ready]) {
|
||||
debug!("sd_notify failed: {}", e); // With NOTIFY_SOCKET unset the call is a no-op, so this is a real send error
|
||||
}
|
||||
```
|
||||
|
||||
On shutdown path:
|
||||
```rust
|
||||
let _ = sd_notify::notify(false, &[sd_notify::NotifyState::Stopping]);
|
||||
```
|
||||
|
||||
#### Step 4: Verify
|
||||
|
||||
```bash
|
||||
cargo test -p musicfs-test-utils -- test_sd_notify
|
||||
cargo build -p musicfs-cli # Verify it compiles with new dep
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.6 Issue 2.1: Signal Handling
|
||||
|
||||
**Approach**: Switch from `fuser::mount2` (blocking) to `fuser::spawn_mount2` (background), then listen for signals on the main thread.
|
||||
|
||||
This is the most complex change in Phase A. It restructures the daemon's main loop.
|
||||
|
||||
#### Step 1: Stubs
|
||||
|
||||
Add a `MusicFs::spawn_mount()` method that returns a session handle instead of blocking:
|
||||
|
||||
```rust
|
||||
// BEFORE
|
||||
pub fn mount(self, mountpoint: &Path) -> Result<()> {
|
||||
fuser::mount2(self, mountpoint, &options)?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
// AFTER (stub — returns BackgroundSession)
|
||||
pub fn spawn_mount(self, mountpoint: &Path) -> Result<fuser::BackgroundSession> {
|
||||
let session = fuser::spawn_mount2(self, mountpoint, &options)?;
|
||||
Ok(session)
|
||||
}
|
||||
```
|
||||
|
||||
Keep old `mount()` temporarily for compatibility.
|
||||
|
||||
#### Step 2: RED tests
|
||||
|
||||
Write in `tests/resilience.rs`:
|
||||
```rust
|
||||
#[tokio::test]
|
||||
async fn test_sigterm_triggers_shutdown() {
|
||||
// Spawn daemon as child process
|
||||
// Wait for mount
|
||||
// Send SIGTERM
|
||||
// Verify clean exit within 10s
|
||||
// Verify mountpoint is unmounted
|
||||
}
|
||||
```
|
||||
|
||||
This test requires the signal handler to exist. It will be RED until implementation.
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-cli/src/main.rs` — rewrite `run_mount()`:
|
||||
|
||||
```rust
|
||||
fn run_mount(mountpoint: PathBuf, origin_path: Option<PathBuf>, cache_dir: Option<PathBuf>) -> Result<()> {
|
||||
let origin_path = origin_path.context("--origin is required")?;
|
||||
let runtime = tokio::runtime::Runtime::new()?;
|
||||
let handle = runtime.handle().clone();
|
||||
|
||||
let (tree, reader) = runtime.block_on(async {
|
||||
// ... existing setup code (unchanged) ...
|
||||
Ok::<_, anyhow::Error>((tree, reader))
|
||||
})?;
|
||||
|
||||
// Check for stale mount before mounting
|
||||
check_stale_mount(&mountpoint)?;
|
||||
|
||||
let fs = MusicFs::with_reader(tree, reader, handle.clone());
|
||||
info!("Mounting filesystem at {:?}", mountpoint);
|
||||
|
||||
// spawn_mount2 returns immediately — FUSE runs in background
|
||||
let session = fs.spawn_mount(&mountpoint)
|
||||
.context("Failed to mount filesystem")?;
|
||||
|
||||
// Notify systemd
|
||||
let _ = sd_notify::notify(false, &[sd_notify::NotifyState::Ready]);
|
||||
info!("MusicFS ready, PID {}", std::process::id());
|
||||
|
||||
// Block on signal
|
||||
runtime.block_on(async {
|
||||
let mut sigterm = tokio::signal::unix::signal(
|
||||
tokio::signal::unix::SignalKind::terminate()
|
||||
)?;
|
||||
let mut sigint = tokio::signal::unix::signal(
|
||||
tokio::signal::unix::SignalKind::interrupt()
|
||||
)?;
|
||||
|
||||
tokio::select! {
|
||||
_ = sigterm.recv() => {
|
||||
info!("Received SIGTERM, shutting down");
|
||||
}
|
||||
_ = sigint.recv() => {
|
||||
info!("Received SIGINT, shutting down");
|
||||
}
|
||||
}
|
||||
|
||||
Ok::<_, anyhow::Error>(())
|
||||
})?;
|
||||
|
||||
// Shutdown sequence
|
||||
let _ = sd_notify::notify(false, &[sd_notify::NotifyState::Stopping]);
|
||||
info!("Unmounting filesystem");
|
||||
drop(session); // BackgroundSession::drop() calls unmount
|
||||
info!("Shutdown complete");
|
||||
|
||||
Ok(())
|
||||
}
|
||||
```
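
The ordering in `run_mount()` (report ready, block until a signal arrives, report stopping, drop the session to unmount) can be checked without FUSE or real signals. The sketch below is a simulation under stated assumptions: `FakeSession` stands in for `fuser::BackgroundSession`, an `mpsc` channel stands in for the tokio signal streams, and all names are hypothetical.

```rust
use std::sync::mpsc::{self, Receiver};
use std::sync::{Arc, Mutex};

type Log = Arc<Mutex<Vec<String>>>;

/// Stand-in for the background session: unmounting happens in Drop.
struct FakeSession {
    log: Log,
}

impl Drop for FakeSession {
    fn drop(&mut self) {
        self.log.lock().unwrap().push("unmount".into());
    }
}

fn run_until_signal(signals: Receiver<&'static str>, log: Log) {
    let session = FakeSession { log: Arc::clone(&log) };
    log.lock().unwrap().push("ready".into());

    // The main thread parks here while the (fake) session serves requests.
    let signal = signals.recv().unwrap_or("channel closed");
    log.lock().unwrap().push(format!("received {signal}"));

    log.lock().unwrap().push("stopping".into());
    drop(session); // Explicit drop so the unmount happens before the final log line
    log.lock().unwrap().push("shutdown complete".into());
}

/// Runs the sequence with a queued SIGTERM and returns the recorded steps.
pub fn simulate_sigterm() -> Vec<String> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let (tx, rx) = mpsc::channel();
    tx.send("SIGTERM").unwrap();
    run_until_signal(rx, Arc::clone(&log));
    let steps = log.lock().unwrap().clone();
    steps
}
```

The explicit `drop(session)` matters for the same reason as in the real function: without it the unmount would run at the end of scope, after "shutdown complete" has already been logged.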
|
||||
|
||||
**File**: `musicfs-fuse/src/filesystem.rs` — add `spawn_mount()`:
|
||||
|
||||
```rust
|
||||
pub fn spawn_mount(self, mountpoint: &Path) -> Result<fuser::BackgroundSession> {
|
||||
info!("Mounting MusicFS at {:?}", mountpoint);
|
||||
let options = vec![
|
||||
fuser::MountOption::RO,
|
||||
fuser::MountOption::FSName("musicfs".to_string()),
|
||||
fuser::MountOption::AutoUnmount,
|
||||
fuser::MountOption::AllowOther,
|
||||
];
|
||||
let session = fuser::spawn_mount2(self, mountpoint, &options)
|
||||
.map_err(musicfs_core::Error::Io)?;
|
||||
Ok(session)
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 4: Verify
|
||||
|
||||
```bash
|
||||
cargo build -p musicfs-cli
|
||||
cargo test -p musicfs-test-utils -- test_sigterm # Process-level test
|
||||
cargo test # No regressions
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Cross-Cutting Concerns
|
||||
|
||||
### 5.1 Security & Privacy
|
||||
|
||||
- No new attack surface — changes are internal lifecycle management
|
||||
- Panic hook does NOT log sensitive data (only panic message, thread name, location)
|
||||
- `sd_notify` uses existing systemd socket — no new IPC
|
||||
|
||||
### 5.2 Observability
|
||||
|
||||
- Panic hook ensures all panics are captured in logs/journald
|
||||
- Signal handling logs which signal triggered shutdown
|
||||
- sd_notify gives systemd accurate service state
|
||||
- Stale mount detection logs cleanup attempts
|
||||
|
||||
### 5.3 Testing
|
||||
|
||||
All changes follow the TDD flow:
|
||||
1. Stubs compile
|
||||
2. RED tests document expected behavior
|
||||
3. Implementation turns tests GREEN
|
||||
4. Full suite passes (no regressions)
|
||||
|
||||
---
|
||||
|
||||
## 6. Alternatives Considered
|
||||
|
||||
### 6.1 Poison Recovery Instead of parking_lot
|
||||
|
||||
**Alternative**: Keep `std::sync::RwLock`, add `.unwrap_or_else(|p| p.into_inner())` to every lock call.
|
||||
|
||||
**Rejected**: There are 30+ call sites to change, it is easy to miss one, and the pattern is verbose. `parking_lot` is already a dependency and fits this use case better: its locks cannot be poisoned, they return guards directly, and they are typically faster.
|
||||
|
||||
### 6.2 Keep mount2 (blocking) with Signal Thread
|
||||
|
||||
**Alternative**: Keep `fuser::mount2`, spawn a separate thread for signal handling, use a channel to communicate shutdown.
|
||||
|
||||
**Rejected**: `mount2` consumes `self` and blocks — there's no clean way to interrupt it from another thread. `spawn_mount2` is the canonical solution from the `fuser` crate.
|
||||
|
||||
### 6.3 Defer sd_notify Until Full Shutdown Orchestration
|
||||
|
||||
**Alternative**: Implement sd_notify only after CancellationToken + graceful shutdown are in place.
|
||||
|
||||
**Rejected**: sd_notify `READY=1` is critical now. Without it, `Type=notify` in the service file means systemd will time out waiting for readiness and kill the daemon on every start. The shutdown `STOPPING=1` notification is a bonus but not required for Phase A.
|
||||
|
||||
---
|
||||
|
||||
## 7. Implementation Plan
|
||||
|
||||
### 7.1 Task Sequence
|
||||
|
||||
| Day | Task | Issue | Effort | Test Approach |
|
||||
|-----|------|-------|--------|---------------|
|
||||
| 1 (morning) | RwLock → parking_lot migration | 2.9 | 2h | Existing GREEN test validates; verify no `.unwrap()` on locks |
|
||||
| 1 (afternoon) | Panic hook | 2.2 | 2h | New test: panic → verify tracing output |
|
||||
| 2 (morning) | systemd ExecStopPost + stale mount check | 3.7 + 2.7 | 2h | Existing RED test → GREEN; new stale mount test |
|
||||
| 2 (afternoon) | sd_notify integration | 2.10 | 2h | New test: mock socket → verify READY=1 |
|
||||
| 3 | Signal handling (spawn_mount2 + signal loop) | 2.1 | 4h | Fork daemon → send SIGTERM → verify exit |
|
||||
| 4 | Integration + regression testing | — | 4h | Full `cargo test`, manual FUSE mount test |
|
||||
| 5 | Buffer for issues found during integration | — | 4h | — |
|
||||
|
||||
### 7.2 Verification Checklist
|
||||
|
||||
After all tasks complete:
|
||||
|
||||
- [ ] `cargo check` — zero errors, zero warnings
|
||||
- [ ] `cargo test` — all 162+ existing tests pass
|
||||
- [ ] `cargo test -p musicfs-test-utils` — all resilience tests pass
|
||||
- [ ] `cargo clippy` — no new warnings
|
||||
- [ ] `grep -r '\.read()\.unwrap()\|\.write()\.unwrap()' crates/` — zero hits in production code (test code is OK)
|
||||
- [ ] `dist/musicfs.service` contains `ExecStopPost`
|
||||
- [ ] Manual test: `musicfs mount`, then `kill -TERM <pid>`, verify clean exit + mount gone
|
||||
- [ ] Manual test: `kill -9 <pid>`, then `musicfs mount` again — no "already mounted" error
|
||||
|
||||
---
|
||||
|
||||
## 8. Files Changed
|
||||
|
||||
| File | Change | Issue |
|
||||
|------|--------|-------|
|
||||
| `musicfs-fuse/src/filesystem.rs` | `std::sync::RwLock` → `parking_lot::RwLock`; add `spawn_mount()` | 2.9, 2.1 |
|
||||
| `musicfs-cas/src/reader.rs` | `std::sync::RwLock` → `parking_lot::RwLock` | 2.9 |
|
||||
| `musicfs-cas/src/fetcher.rs` | `std::sync::RwLock` → `parking_lot::RwLock` | 2.9 |
|
||||
| `musicfs-origins/src/registry.rs` | `std::sync::RwLock` → `parking_lot::RwLock` | 2.9 |
|
||||
| `musicfs-cache/src/eviction.rs` | `std::sync::RwLock` → `parking_lot::RwLock` | 2.9 |
|
||||
| `musicfs-cache/src/tree.rs` | `std::sync::RwLock` → `parking_lot::RwLock` | 2.9 |
|
||||
| `musicfs-core/src/metrics.rs` | `std::sync::RwLock` → `parking_lot::RwLock` | 2.9 |
|
||||
| `musicfs-core/src/lib.rs` | Add `install_panic_hook()` | 2.2 |
|
||||
| `musicfs-cli/src/main.rs` | Panic hook, signal handler, spawn_mount2, sd_notify, stale mount check | 2.1, 2.2, 2.7, 2.10 |
|
||||
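| `musicfs-cli/Cargo.toml` | Add `sd-notify` dep; enable tokio's `signal` feature for `tokio::signal::unix` (`tokio-util` is only needed once CancellationToken lands in Phase B) | 2.10, 2.1 |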
|
||||
| `dist/musicfs.service` | Add `ExecStopPost`, fix `ExecStop` | 3.7 |
|
||||
| `tests/resilience.rs` | Update/add tests for signal, panic hook, sd_notify | all |
|
||||
|
||||
---
|
||||
|
||||
## 9. Glossary / References
|
||||
|
||||
| Term | Definition |
|
||||
|------|------------|
|
||||
| **parking_lot** | Fast, poison-free lock implementation. Already a project dependency. |
|
||||
| **spawn_mount2** | `fuser` API that mounts FUSE in a background thread, returning a `BackgroundSession` handle |
|
||||
| **sd_notify** | systemd notification protocol. `READY=1` signals that the service has started; `STOPPING=1` signals that shutdown has begun. |
|
||||
| **BackgroundSession** | Handle returned by `spawn_mount2`. Dropping it unmounts the filesystem. |
|
||||
|
||||
| Document | Path |
|
||||
|----------|------|
|
||||
| Resilience audit | [resilience-fault-tolerance.md](resilience-fault-tolerance.md) |
|
||||
| Resilience testing | [resilience-testing.md](resilience-testing.md) |
|
||||
| Architecture | [architecture.md](../architecture.md) |
|
||||
@@ -0,0 +1,830 @@
|
||||
# Phase B: Crash Recovery — Implementation Plan
|
||||
|
||||
**Authors:** AI-assisted
|
||||
**Status:** Draft
|
||||
**Last Updated:** 2026-05-13
|
||||
**Reviewers:** TBD
|
||||
**Approvers:** TBD
|
||||
**Prerequisites:** [phase-a-stop-dying.md](phase-a-stop-dying.md) (completed), [resilience-fault-tolerance.md](resilience-fault-tolerance.md)
|
||||
**Estimated Effort:** ~5 days
|
||||
|
||||
---
|
||||
|
||||
[TOC]
|
||||
|
||||
---
|
||||
|
||||
## 1. Abstract
|
||||
|
||||
Phase A made the daemon survive signals and panics. Phase B makes it **recover from crashes** — startup integrity checks for all storage layers (SQLite, tantivy, sled), graceful shutdown with ordered teardown of background tasks, disk space pre-checks, and a task supervisor that restarts dead background tasks.
|
||||
|
||||
This covers issues 2.3, 2.4, 2.6, and 2.8 from the [resilience audit](resilience-fault-tolerance.md), deferred from Phase A.
|
||||
|
||||
Issue 2.5 (interrupted sync recovery) is deferred to after [persistent state](persistent-state.md) is wired up — checkpoint/resume requires the DB to be in the mount path.
|
||||
|
||||
**RED tests to turn GREEN** (from current `resilience.rs`):
|
||||
- `test_sqlite_integrity_check_detects_corruption` — currently `todo!()`
|
||||
- `test_tantivy_corruption_triggers_rebuild` — currently `todo!()`
|
||||
- `test_sled_corruption_triggers_repair` — currently `todo!()`
|
||||
- `test_cas_put_handles_enospc` — currently fails (no size pre-check)
|
||||
- `test_tantivy_survives_uncommitted_crash` — currently `todo!()`
|
||||
|
||||
**New tests to write:**
|
||||
- Shutdown orchestration: CancellationToken propagation, ordered teardown, tantivy flush
|
||||
- Task supervisor: panic detection, restart with backoff, status reporting
|
||||
|
||||
---
|
||||
|
||||
## 2. Background
|
||||
|
||||
### 2.1 What Phase A Delivered
|
||||
|
||||
- Signal handling via `spawn_mount2` + tokio signal loop ✅
|
||||
- Panic hook logging via `tracing::error!` ✅
|
||||
- RwLock → `parking_lot` (no more poison cascade) ✅
|
||||
- sd_notify READY/STOPPING ✅
|
||||
- ExecStopPost + stale mount detection ✅
|
||||
|
||||
### 2.2 What's Still Broken After Phase A
|
||||
|
||||
The daemon now **stops cleanly** on signals but:
|
||||
|
||||
1. **Shutdown is unordered** — `drop(session)` unmounts FUSE, but background tasks (health monitor, indexer, watcher, prefetcher) are killed mid-operation by runtime drop. No tantivy flush, no SQLite checkpoint.
|
||||
|
||||
2. **No startup integrity checks** — if the daemon was `kill -9`'d (or OOM-killed, power loss), SQLite/tantivy/sled may have partial writes. Currently these propagate as runtime errors or silent corruption.
|
||||
|
||||
3. **Background tasks are fire-and-forget** — health monitor, watcher, indexer, prefetcher use `tokio::spawn` with no `JoinHandle` stored. If a task panics, it's silently dead.
|
||||
|
||||
4. **CAS accepts oversized writes** — `put()` doesn't check `max_size` before writing. Cache grows unbounded.
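
Item 3 comes down to never looking at the join handle. As a rough illustration of what keeping it buys, the std-only sketch below uses OS threads in place of tokio tasks: joining the handle surfaces the panic, and the caller can decide whether to restart. `supervise` and `flaky` are hypothetical names, and the sketch omits the backoff and status reporting planned for the real supervisor.

```rust
use std::thread;

/// Runs `task` on a fresh thread until it returns normally or `max_restarts`
/// restarts have been used. Returns the number of restarts performed.
pub fn supervise(task: fn(u32), max_restarts: u32) -> u32 {
    let mut restarts = 0;
    loop {
        let attempt = restarts;
        // join() returns Err if the thread panicked, so the failure is observable.
        match thread::spawn(move || task(attempt)).join() {
            Ok(()) => return restarts,
            Err(_) if restarts < max_restarts => restarts += 1,
            Err(_) => return restarts, // Restart budget exhausted: give up
        }
    }
}

/// Example task that panics on its first two attempts and then succeeds.
pub fn flaky(attempt: u32) {
    if attempt < 2 {
        panic!("simulated task panic on attempt {attempt}");
    }
}
```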
|
||||
|
||||
---
|
||||
|
||||
## 3. Goals & Non-Goals
|
||||
|
||||
### 3.1 Goals
|
||||
|
||||
- Graceful shutdown flushes tantivy, checkpoints SQLite WAL, stops background tasks in order
|
||||
- Corrupted SQLite detected on open via `PRAGMA integrity_check`
|
||||
- Corrupted tantivy index detected and rebuilt from scratch
|
||||
- Corrupted sled index detected and repaired
|
||||
- CAS rejects writes that would exceed `max_size`
|
||||
- Background tasks are supervised — panics detected, critical tasks restarted
|
||||
- All 5 RED tests turn GREEN
|
||||
- All new tests for shutdown + supervisor are GREEN
|
||||
|
||||
### 3.2 Non-Goals
|
||||
|
||||
- Interrupted sync recovery (2.5) — depends on persistent state work
|
||||
- Disk space monitoring daemon (periodic `statvfs`) — Phase C
|
||||
- Connection pooling, config reload, watchdog — Phase C/D
|
||||
- Passthrough mode when cache dies — Phase F
|
||||
|
||||
---
|
||||
|
||||
## 4. Proposed Design
|
||||
|
||||
### 4.1 Implementation Order
|
||||
|
||||
```
|
||||
4.2 CAS size pre-check (no deps, simplest fix)
|
||||
↓
|
||||
4.3 SQLite integrity check (no deps)
|
||||
↓
|
||||
4.4 tantivy corruption recovery (no deps)
|
||||
↓
|
||||
4.5 sled corruption recovery (no deps)
|
||||
↓
|
||||
4.6 Graceful shutdown orchestration (depends on: Phase A signal handler)
|
||||
↓
|
||||
4.7 Task supervisor (depends on: 4.6 CancellationToken)
|
||||
```
|
||||
|
||||
### 4.2 Issue 2.8: CAS Size Pre-Check
|
||||
|
||||
**Problem**: `CasStore::put()` writes data without checking if it would exceed `max_size`. The existing test `test_cas_put_handles_enospc` creates a store with `max_size: 100` and writes 1000 bytes — currently succeeds when it should fail.
|
||||
|
||||
#### Step 1: Stubs — none needed
|
||||
|
||||
#### Step 2: RED test — already exists
|
||||
|
||||
```rust
|
||||
// Currently FAILS — this is what we need to fix
|
||||
#[tokio::test]
|
||||
async fn test_cas_put_handles_enospc() {
|
||||
let store = CasStore::open(CasConfig { max_size: 100, ... }).await.unwrap();
|
||||
let large_data = vec![0u8; 1000];
|
||||
let result = store.put(&large_data).await;
|
||||
assert!(result.is_err());
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-cas/src/store.rs` — add size check at top of `put()`:
|
||||
|
||||
```rust
|
||||
pub async fn put(&self, data: &[u8]) -> Result<ChunkHash, CasError> {
    let hash = ChunkHash::from_bytes(data);
    let path = self.chunk_path(&hash);

    if path.exists() {
        trace!(hash = %hash, size_bytes = data.len(), "dedup hit");
        return Ok(hash);
    }

    // NEW: Pre-check size limit (skipped when max_size is 0)
    if self.config.max_size > 0 {
        // Load once so the check, the log line, and the error all report the same value
        let current = self.current_size.load(Ordering::SeqCst);
        let new_size = current.saturating_add(data.len() as u64);
        if new_size > self.config.max_size {
            warn!(
                current_size = current,
                chunk_size = data.len(),
                max_size = self.config.max_size,
                "CAS store full, rejecting write"
            );
            return Err(CasError::StoreFull {
                current,
                max: self.config.max_size,
            });
        }
    }

    // ... rest of put() unchanged
}
|
||||
```
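
The pre-check above reads `current_size` and writes the chunk in separate steps, so two concurrent `put()` calls could both pass the check. If that turns out to matter, one option is to reserve the bytes atomically before writing. The sketch below is a std-only illustration of that idea; `SizeBudget` is a hypothetical type, not part of `CasStore`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical byte budget standing in for `current_size` / `max_size`.
pub struct SizeBudget {
    current: AtomicU64,
    max: u64,
}

impl SizeBudget {
    pub fn new(max: u64) -> Self {
        Self { current: AtomicU64::new(0), max }
    }

    /// Atomically reserves `len` bytes.
    /// Ok(previous) on success, Err(current) if the reservation would exceed `max`.
    pub fn try_reserve(&self, len: u64) -> Result<u64, u64> {
        self.current
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |cur| {
                let next = cur.checked_add(len)?;
                // max == 0 is treated as "no limit", mirroring the check in put().
                if self.max == 0 || next <= self.max {
                    Some(next)
                } else {
                    None
                }
            })
    }

    /// Bytes currently reserved.
    pub fn used(&self) -> u64 {
        self.current.load(Ordering::SeqCst)
    }
}
```

A caller would reserve first, write the chunk, and release the reservation if the write fails; the release path is left out of this sketch.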
|
||||
|
||||
Also add new error variant:
|
||||
|
||||
```rust
|
||||
pub enum CasError {
|
||||
// ... existing variants
|
||||
#[error("Store full: {current} / {max} bytes")]
|
||||
StoreFull { current: u64, max: u64 },
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 4: Verify
|
||||
|
||||
```bash
|
||||
cargo test -p musicfs-test-utils --test resilience -- test_cas_put_handles_enospc
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.3 Issue 2.4 (part 1): SQLite Integrity Check
|
||||
|
||||
**Problem**: `Database::open()` applies the schema but performs no integrity check. After a crash, corrupt pages can serve bad data silently.
|
||||
|
||||
#### Step 1: Stubs
|
||||
|
||||
Add to `musicfs-cache/src/db.rs`:
|
||||
|
||||
```rust
|
||||
pub fn open_with_integrity_check(path: &Path) -> Result<Self> {
|
||||
todo!()
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 2: RED test — already exists as `todo!()`
|
||||
|
||||
Replace the `todo!()` with a real test:
|
||||
|
||||
```rust
|
||||
#[tokio::test]
|
||||
async fn test_sqlite_integrity_check_detects_corruption() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let db_path = dir.path().join("test.db");
|
||||
|
||||
// Create valid DB with data
|
||||
{
|
||||
let db = Database::open(&db_path).unwrap();
|
||||
db.upsert_file(
|
||||
&OriginId::from("test"),
|
||||
Path::new("/test.flac"),
|
||||
&VirtualPath::new("/Test.flac"),
|
||||
&AudioMeta::default(),
|
||||
UNIX_EPOCH,
|
||||
1000,
|
||||
).unwrap();
|
||||
}
|
||||
|
||||
// Corrupt the file
|
||||
let mut data = std::fs::read(&db_path).unwrap();
|
||||
let mid = data.len() / 2;
|
||||
data[mid..mid+100].fill(0xFF);
|
||||
std::fs::write(&db_path, &data).unwrap();
|
||||
|
||||
// open_with_integrity_check should detect corruption
|
||||
let result = Database::open_with_integrity_check(&db_path);
|
||||
assert!(result.is_err());
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-cache/src/db.rs`
|
||||
|
||||
```rust
|
||||
pub fn open_with_integrity_check(path: &Path) -> Result<Self> {
|
||||
debug!(?path, "Opening database with integrity check");
|
||||
|
||||
let conn = Connection::open(path)
|
||||
.map_err(|e| Error::Database(format!("open failed: {}", e)))?;
|
||||
|
||||
// Integrity check; the argument 1 stops the scan after the first reported problem
|
||||
let integrity: String = conn
|
||||
.query_row("PRAGMA integrity_check(1)", [], |row| row.get(0))
|
||||
.map_err(|e| Error::Database(format!("integrity check failed: {}", e)))?;
|
||||
|
||||
if integrity != "ok" {
|
||||
warn!(path = ?path, result = %integrity, "Database integrity check failed");
|
||||
return Err(Error::DatabaseCorrupted(format!(
|
||||
"integrity check failed: {}", integrity
|
||||
)));
|
||||
}
|
||||
|
||||
conn.execute_batch(SCHEMA)
|
||||
.map_err(|e| Error::Database(format!("schema init failed: {}", e)))?;
|
||||
|
||||
let db = Self { conn: Arc::new(Mutex::new(conn)) };
|
||||
let count = db.file_count().unwrap_or(0);
|
||||
info!(path = ?path, file_count = count, "Database opened (integrity verified)");
|
||||
Ok(db)
|
||||
}
|
||||
```
|
||||
|
||||
Also add the error variant to `musicfs-core/src/error.rs`:
|
||||
|
||||
```rust
|
||||
pub enum Error {
|
||||
// ... existing
|
||||
#[error("Database corrupted: {0}")]
|
||||
DatabaseCorrupted(String),
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 4: Verify
|
||||
|
||||
```bash
|
||||
cargo test -p musicfs-test-utils --test resilience -- test_sqlite_integrity
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.4 Issue 2.4 (part 2): tantivy Corruption Recovery
|
||||
|
||||
**Problem**: If tantivy `meta.json` or segment files are corrupted, `Index::open_in_dir()` panics or returns an error. No recovery path — daemon crashes.
|
||||
|
||||
#### Step 1: Stubs
|
||||
|
||||
Add to `musicfs-search/src/index.rs`:
|
||||
|
||||
```rust
|
||||
pub fn open_with_recovery(index_path: &Path) -> Result<Self, SearchError> {
|
||||
todo!()
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 2: RED test — replace `todo!()` with real test
|
||||
|
||||
```rust
|
||||
#[tokio::test]
|
||||
async fn test_tantivy_corruption_triggers_rebuild() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let index_path = dir.path().join("search_idx");
|
||||
|
||||
// Create valid index with data
|
||||
{
|
||||
let index = SearchIndex::open(&index_path).unwrap();
|
||||
index.index_file(&make_file_meta(1, "/a.flac", 1000)).unwrap();
|
||||
index.commit().unwrap();
|
||||
}
|
||||
|
||||
// Corrupt meta.json
|
||||
std::fs::write(index_path.join("meta.json"), b"corrupted").unwrap();
|
||||
|
||||
// open_with_recovery should detect corruption and rebuild empty
|
||||
let index = SearchIndex::open_with_recovery(&index_path).unwrap();
|
||||
let results = index.search("a", 10).unwrap();
|
||||
assert_eq!(results.len(), 0); // Rebuilt empty but functional
|
||||
}
|
||||
```
|
||||
|
||||
Also replace the tantivy crash test `todo!()`:
|
||||
|
||||
```rust
|
||||
#[test]
|
||||
fn test_tantivy_survives_uncommitted_crash() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let index_path = dir.path().join("search_idx");
|
||||
|
||||
{
|
||||
let index = SearchIndex::open(&index_path).unwrap();
|
||||
index.index_file(&make_file_meta(1, "/a.flac", 1000)).unwrap();
|
||||
index.commit().unwrap();
|
||||
// Write without commit, then "crash" (drop without commit)
|
||||
index.index_file(&make_file_meta(2, "/b.flac", 1000)).unwrap();
|
||||
// mem::forget would leak, just drop naturally
|
||||
}
|
||||
|
||||
let index = SearchIndex::open(&index_path).unwrap();
|
||||
let results = index.search("a", 10).unwrap();
|
||||
assert_eq!(results.len(), 1); // Committed doc survives
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-search/src/index.rs`
|
||||
|
||||
```rust
|
||||
pub fn open_with_recovery(index_path: &Path) -> Result<Self, SearchError> {
|
||||
match Self::open(index_path) {
|
||||
Ok(index) => {
|
||||
// Sanity check: read the document count from a fresh searcher
let docs = index.reader.searcher().num_docs();
info!(docs, "Search index opened successfully");
Ok(index)
|
||||
}
|
||||
Err(e) => {
|
||||
warn!(
|
||||
error = %e,
|
||||
path = ?index_path,
|
||||
"Search index corrupted, rebuilding from scratch"
|
||||
);
|
||||
// Delete corrupted index
|
||||
if index_path.exists() {
|
||||
std::fs::remove_dir_all(index_path)
|
||||
.map_err(SearchError::Io)?;
|
||||
}
|
||||
// Create fresh index
|
||||
Self::open(index_path)
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 4: Verify
|
||||
|
||||
```bash
|
||||
cargo test -p musicfs-test-utils --test resilience -- test_tantivy
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.5 Issue 3.5: sled Corruption Recovery
|
||||
|
||||
**Problem**: `sled::open()` on a corrupted DB returns `sled::Error::Corruption` which propagates as `CasError::Sled` and crashes the daemon on startup.
|
||||
|
||||
#### Step 1: Stubs — none needed, modify existing `open()`
|
||||
|
||||
#### Step 2: RED test — replace `todo!()`
|
||||
|
||||
```rust
|
||||
#[tokio::test]
|
||||
async fn test_sled_corruption_triggers_repair() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let chunks_dir = dir.path().join("chunks");
|
||||
let config = CasConfig { chunks_dir: chunks_dir.clone(), max_size: 10_000_000, shard_levels: 2 };
|
||||
|
||||
// Create valid store with data
|
||||
{
|
||||
let store = CasStore::open(config.clone()).await.unwrap();
|
||||
store.put(b"test data").await.unwrap();
|
||||
}
|
||||
|
||||
// Corrupt sled index files
|
||||
let sled_dir = chunks_dir.join("index.sled");
|
||||
if sled_dir.exists() {
|
||||
for entry in std::fs::read_dir(&sled_dir).unwrap() {
|
||||
let entry = entry.unwrap();
|
||||
if entry.metadata().unwrap().is_file() {
|
||||
std::fs::write(entry.path(), b"corrupted").unwrap();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Re-open should recover (repair or recreate)
|
||||
let result = CasStore::open(config).await;
|
||||
assert!(result.is_ok(), "sled should recover from corruption");
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-cas/src/store.rs` — modify `open()`:
|
||||
|
||||
```rust
|
||||
pub async fn open(config: CasConfig) -> Result<Self, CasError> {
|
||||
fs::create_dir_all(&config.chunks_dir).await?;
|
||||
|
||||
let index_path = config.chunks_dir.join("index.sled");
|
||||
let index = match sled::open(&index_path) {
|
||||
Ok(db) => db,
|
||||
Err(e) => {
|
||||
warn!(error = %e, path = ?index_path, "sled index corrupted, attempting recovery");
|
||||
|
||||
// Try repair
|
||||
match sled::Config::new().path(&index_path).repair(true).open() {
|
||||
Ok(db) => {
|
||||
info!("sled index repaired successfully");
|
||||
db
|
||||
}
|
||||
Err(repair_err) => {
|
||||
warn!(error = %repair_err, "sled repair failed, recreating index");
|
||||
// Delete and recreate
|
||||
if index_path.exists() {
|
||||
std::fs::remove_dir_all(&index_path)
|
||||
.map_err(CasError::Io)?;
|
||||
}
|
||||
sled::open(&index_path)?
|
||||
}
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
let current_size = Self::calculate_size(&config.chunks_dir).await;
|
||||
|
||||
Ok(Self {
|
||||
config,
|
||||
index,
|
||||
current_size: AtomicU64::new(current_size),
|
||||
})
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 4: Verify
|
||||
|
||||
```bash
|
||||
cargo test -p musicfs-test-utils --test resilience -- test_sled_corruption
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.6 Issue 2.3: Graceful Shutdown Orchestration
|
||||
|
||||
**Problem**: On signal, `drop(session)` unmounts FUSE, then `drop(runtime)` kills all tokio tasks abruptly. No tantivy flush, no SQLite WAL checkpoint, no ordered task shutdown.
|
||||
|
||||
**Approach**: `CancellationToken` from `tokio_util` propagated to all background tasks. Signal triggers token cancellation, then ordered shutdown.
|
||||
|
||||
#### Step 1: Add dependency
|
||||
|
||||
```toml
|
||||
# musicfs-cli/Cargo.toml
|
||||
tokio-util = { version = "0.7", features = ["rt"] }
|
||||
```
|
||||
|
||||
#### Step 2: Tests
|
||||
|
||||
```rust
|
||||
#[tokio::test]
|
||||
async fn test_shutdown_cancels_background_tasks() {
|
||||
let token = CancellationToken::new();
|
||||
let stopped = Arc::new(AtomicBool::new(false));
|
||||
let stopped_clone = stopped.clone();
|
||||
let token_clone = token.clone();
|
||||
|
||||
tokio::spawn(async move {
|
||||
token_clone.cancelled().await;
|
||||
stopped_clone.store(true, Ordering::SeqCst);
|
||||
});
|
||||
|
||||
assert!(!stopped.load(Ordering::SeqCst));
|
||||
token.cancel();
|
||||
tokio::time::sleep(Duration::from_millis(50)).await;
|
||||
assert!(stopped.load(Ordering::SeqCst));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_shutdown_flushes_tantivy() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let index = SearchIndex::open(&dir.path().join("idx")).unwrap();
|
||||
|
||||
index.index_file(&make_file_meta(1, "/a.flac", 1000)).unwrap();
|
||||
// Graceful shutdown should commit
|
||||
index.commit().unwrap();
|
||||
|
||||
let index2 = SearchIndex::open(&dir.path().join("idx")).unwrap();
|
||||
assert_eq!(index2.search("a", 10).unwrap().len(), 1);
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-cli/src/main.rs` — restructure the signal loop:
|
||||
|
||||
The current code:
|
||||
```rust
|
||||
// Wait for signal
|
||||
runtime.block_on(async { ... signal select ... })?;
|
||||
// Drop session, exit
|
||||
```
|
||||
|
||||
Change to:
|
||||
```rust
|
||||
let shutdown_token = CancellationToken::new();
|
||||
|
||||
// TODO: Pass token to health monitor, watcher, indexer, prefetcher
|
||||
// (requires their start() methods to accept CancellationToken)
|
||||
// For now, we just use it for the shutdown sequence
|
||||
|
||||
runtime.block_on(async {
|
||||
// ... signal select ...
|
||||
|
||||
// Ordered shutdown
|
||||
info!("Beginning ordered shutdown");
|
||||
shutdown_token.cancel();
|
||||
|
||||
// Wait briefly for tasks to notice cancellation
|
||||
tokio::time::sleep(Duration::from_millis(500)).await;
|
||||
|
||||
// Flush search index if available
|
||||
// (requires SearchIndex to be accessible — currently not wired in main.rs)
|
||||
|
||||
info!("Background tasks stopped");
|
||||
})?;
|
||||
```
|
||||
|
||||
**Note**: Full CancellationToken propagation through health monitor, watcher, indexer, and prefetcher `start()` methods requires changing their signatures. The current `mpsc::channel<()>` stop mechanism in each task should be replaced with or supplemented by the token. This can be done incrementally — start by adding the token to `run_mount()`, then wire it into each task as they're touched.
|
||||
|
||||
For this phase, the minimum viable change is:
|
||||
1. Create the token in `run_mount()`
|
||||
2. Cancel it on signal
|
||||
3. Add a brief sleep for tasks to notice
|
||||
4. The existing `drop(session)` and runtime drop handle cleanup
|
||||
|
||||
Full per-task CancellationToken wiring is tracked as follow-up work.
|
||||
|
||||
---
|
||||
|
||||
### 4.7 Issue 2.6: Task Supervisor
|
||||
|
||||
**Problem**: 13 `tokio::spawn()` calls with no `JoinHandle` stored. Dead tasks go unnoticed.
|
||||
|
||||
**Approach**: New `TaskSupervisor` struct in `musicfs-core` that stores handles, checks liveness, and restarts critical tasks.
|
||||
|
||||
#### Step 1: Stubs
|
||||
|
||||
**File**: `musicfs-core/src/supervisor.rs` (new file)
|
||||
|
||||
```rust
|
||||
pub struct TaskSupervisor { ... }
|
||||
|
||||
pub enum TaskStatus {
|
||||
Running,
|
||||
Failed { error: String, at: Instant },
|
||||
Restarting { attempt: u32 },
|
||||
Stopped,
|
||||
}
|
||||
|
||||
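impl TaskSupervisor {
    pub fn new() -> Self {
        todo!()
    }

    pub fn spawn_supervised<F>(&self, name: &str, future: F)
    where
        F: std::future::Future<Output = ()> + Send + 'static,
    {
        todo!()
    }

    // The factory is called again for every restart, so it must be `Fn`.
    pub fn spawn_critical<F, Fut>(&self, name: &str, factory: F)
    where
        F: Fn() -> Fut + Send + Sync + 'static,
        Fut: std::future::Future<Output = ()> + Send + 'static,
    {
        todo!()
    }

    pub fn task_status(&self, name: &str) -> TaskStatus {
        todo!()
    }

    pub fn check_all(&self) -> Vec<(String, TaskStatus)> {
        todo!()
    }
}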
|
||||
```
|
||||
|
||||
#### Step 2: Tests
|
||||
|
||||
```rust
|
||||
#[tokio::test]
|
||||
async fn test_supervisor_detects_task_completion() {
|
||||
let supervisor = TaskSupervisor::new();
|
||||
supervisor.spawn_supervised("fast", async { /* returns immediately */ });
|
||||
tokio::time::sleep(Duration::from_millis(50)).await;
|
||||
// Task completed normally — should be Stopped, not Failed
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_supervisor_detects_panic() {
|
||||
let supervisor = TaskSupervisor::new();
|
||||
supervisor.spawn_supervised("panicker", async {
|
||||
panic!("boom");
|
||||
});
|
||||
tokio::time::sleep(Duration::from_millis(50)).await;
|
||||
assert!(matches!(supervisor.task_status("panicker"), TaskStatus::Failed { .. }));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_supervisor_restarts_critical_task() {
|
||||
let count = Arc::new(AtomicU32::new(0));
|
||||
let c = count.clone();
|
||||
|
||||
let supervisor = TaskSupervisor::new();
|
||||
supervisor.spawn_critical("restartable", move || {
|
||||
let c = c.clone();
|
||||
async move {
|
||||
let n = c.fetch_add(1, Ordering::SeqCst);
|
||||
if n == 0 { panic!("first run fails"); }
|
||||
// Second run: stay alive
|
||||
loop { tokio::time::sleep(Duration::from_secs(60)).await; }
|
||||
}
|
||||
});
|
||||
|
||||
tokio::time::sleep(Duration::from_secs(2)).await;
|
||||
assert_eq!(count.load(Ordering::SeqCst), 2);
|
||||
assert!(matches!(supervisor.task_status("restartable"), TaskStatus::Running));
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-core/src/supervisor.rs`
|
||||
|
||||
```rust
|
||||
use parking_lot::RwLock;
|
||||
use std::collections::HashMap;
|
||||
use std::sync::Arc;
|
||||
use std::time::{Duration, Instant};
|
||||
use tokio::task::JoinHandle;
|
||||
use tracing::{error, info, warn};
|
||||
|
||||
pub struct TaskSupervisor {
|
||||
tasks: Arc<RwLock<HashMap<String, TaskEntry>>>,
|
||||
}
|
||||
|
||||
struct TaskEntry {
|
||||
handle: JoinHandle<()>,
|
||||
status: TaskStatus,
|
||||
restart_count: u32,
|
||||
last_restart: Option<Instant>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub enum TaskStatus {
|
||||
Running,
|
||||
Failed { error: String, at: Instant },
|
||||
Restarting { attempt: u32 },
|
||||
Stopped,
|
||||
}
|
||||
|
||||
impl TaskSupervisor {
|
||||
pub fn new() -> Self {
|
||||
Self {
|
||||
tasks: Arc::new(RwLock::new(HashMap::new())),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn spawn_supervised<F>(&self, name: &str, future: F)
where
    F: std::future::Future<Output = ()> + Send + 'static,
{
    let tasks = self.tasks.clone();
    let name_owned = name.to_string();

    // Hold the write lock across the spawn so the monitor cannot record
    // a result before the entry exists.
    let mut guard = self.tasks.write();

    let inner = tokio::spawn(future);

    // Monitor the inner task and record how it ended:
    // a clean return is Stopped, a panic or cancellation is Failed.
    let handle = tokio::spawn(async move {
        let status = match inner.await {
            Ok(()) => TaskStatus::Stopped,
            Err(e) => TaskStatus::Failed {
                error: e.to_string(),
                at: Instant::now(),
            },
        };
        if let Some(entry) = tasks.write().get_mut(&name_owned) {
            entry.status = status;
        }
    });

    guard.insert(
        name.to_string(),
        TaskEntry {
            handle,
            status: TaskStatus::Running,
            restart_count: 0,
            last_restart: None,
        },
    );
}

pub fn task_status(&self, name: &str) -> TaskStatus {
    // Unknown names are reported as Stopped
    match self.tasks.read().get(name) {
        Some(entry) => entry.status.clone(),
        None => TaskStatus::Stopped,
    }
}
|
||||
}
|
||||
```
|
||||
|
||||
**Note**: The full `spawn_critical` with automatic restart requires a task factory (`Fn() -> Future`) pattern. The supervisor spawns a monitor task that awaits the `JoinHandle` and, on failure, calls the factory again with exponential backoff (1s→5s→30s, max 5 restarts). This is the most complex piece and is not shown here; its expected behavior is pinned down by `test_supervisor_restarts_critical_task` above.
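
The backoff schedule from the note above could be expressed as a small pure function that the monitor task consults before each restart. This is a minimal sketch, not the planned implementation: `restart_delay` and `MAX_RESTARTS` are hypothetical names, attempts are assumed to be 1-based, and holding the delay at 30s for attempts 4 and 5 is an assumption, since the note only lists three steps.

```rust
use std::time::Duration;

/// Maximum number of automatic restarts before the supervisor gives up.
const MAX_RESTARTS: u32 = 5;

/// Delay to wait before restart number `attempt` (1-based), or `None` once
/// the restart budget is exhausted.
/// Assumption: the delay stays at 30s after the third attempt.
pub fn restart_delay(attempt: u32) -> Option<Duration> {
    match attempt {
        0 => None, // attempts are 1-based; 0 is not a restart
        1 => Some(Duration::from_secs(1)),
        2 => Some(Duration::from_secs(5)),
        n if n <= MAX_RESTARTS => Some(Duration::from_secs(30)),
        _ => None, // budget exhausted: leave the task in Failed
    }
}
```

Keeping the schedule separate from the spawning logic makes it testable without a runtime, and a `None` result gives the monitor a single place to decide that a task stays `Failed`.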
|
||||
|
||||
---
|
||||
|
||||
## 5. Cross-Cutting Concerns
|
||||
|
||||
### 5.1 Security & Privacy
|
||||
|
||||
- `PRAGMA integrity_check` is read-only — no risk to data
|
||||
- sled repair may lose recently-written entries — acceptable for a cache
|
||||
- tantivy rebuild deletes index entirely — no sensitive data exposure (metadata only)
|
||||
|
||||
### 5.2 Observability
|
||||
|
||||
- SQLite integrity check result logged at INFO (ok) or WARN (failed)
|
||||
- sled repair attempts logged at WARN
|
||||
- tantivy rebuild logged at WARN with file count before/after
|
||||
- CAS `StoreFull` error logged at WARN with current/max sizes
|
||||
- Task supervisor logs all state transitions (started, failed, restarting, stopped)
|
||||
|
||||
### 5.3 Testing
|
||||
|
||||
| Test | Status Before | Status After | Issue |
|------|---------------|--------------|-------|
| `test_cas_put_handles_enospc` | ❌ FAILED | ✅ GREEN | 2.8 |
| `test_sqlite_integrity_check_detects_corruption` | ❌ todo!() | ✅ GREEN | 2.4 |
| `test_tantivy_corruption_triggers_rebuild` | ❌ todo!() | ✅ GREEN | 2.4 |
| `test_tantivy_survives_uncommitted_crash` | ❌ todo!() | ✅ GREEN | 5.2 |
| `test_sled_corruption_triggers_repair` | ❌ todo!() | ✅ GREEN | 3.5 |
| `test_shutdown_cancels_background_tasks` | NEW | ✅ GREEN | 2.3 |
| `test_shutdown_flushes_tantivy` | NEW | ✅ GREEN | 2.3 |
| `test_supervisor_detects_panic` | NEW | ✅ GREEN | 2.6 |
| `test_supervisor_restarts_critical_task` | NEW | ✅ GREEN | 2.6 |
|
||||
|
||||
---
|
||||
|
||||
## 6. Alternatives Considered
|
||||
|
||||
### 6.1 Full `PRAGMA integrity_check` vs Quick Check
|
||||
|
||||
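`PRAGMA integrity_check` scans every page, which is slow for large DBs (seconds for 1M rows). `PRAGMA integrity_check(1)` stops after the first error it finds, so it returns early on a corrupt database, but on a healthy database it still performs the full scan. We use the `(1)` variant because a single error is enough to trigger recovery. If startup time on a healthy database becomes a problem, `PRAGMA quick_check` is the cheaper option: it skips UNIQUE-constraint and index-content verification.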
|
||||
|
||||
### 6.2 tantivy Repair vs Rebuild
|
||||
|
||||
tantivy has no built-in repair. If `meta.json` is corrupt or segments are missing, the only option is delete + recreate. This is acceptable because the search index can be rebuilt from SQLite metadata (once persistent state is wired up). For now, rebuild produces an empty index.
|
||||
|
||||
### 6.3 sled Repair vs Recreate
|
||||
|
||||
sled has `Config::repair(true)` which attempts to recover. If repair fails, we delete and recreate. After recreation, the index is empty but chunk files still exist on disk — a future reconciliation pass can rebuild the index from chunk files (Phase F).
|
||||
|
||||
### 6.4 Custom Supervisor vs `tokio-graceful` Crate
|
||||
|
||||
`tokio-graceful` provides shutdown coordination but not task restart. Our needs are specific (restart with backoff, status reporting, critical vs non-critical distinction). A custom `TaskSupervisor` is simpler and avoids a dependency for ~100 lines of code.
|
||||
|
||||
---
|
||||
|
||||
## 7. Implementation Plan
|
||||
|
||||
### 7.1 Task Sequence
|
||||
|
||||
| Day | Task | Issue | Effort | Test |
|-----|------|-------|--------|------|
| 1 (morning) | CAS size pre-check + `StoreFull` error variant | 2.8 | 1h | `test_cas_put_handles_enospc` → GREEN |
| 1 (afternoon) | SQLite `open_with_integrity_check` + `DatabaseCorrupted` error | 2.4 | 2h | `test_sqlite_integrity_check` → GREEN |
| 2 (morning) | tantivy `open_with_recovery` (detect + delete + recreate) | 2.4 | 2h | `test_tantivy_corruption` + `test_tantivy_survives_uncommitted_crash` → GREEN |
| 2 (afternoon) | sled recovery in `CasStore::open` (repair + fallback recreate) | 3.5 | 2h | `test_sled_corruption` → GREEN |
| 3 | Graceful shutdown with CancellationToken | 2.3 | 4h | `test_shutdown_cancels_background_tasks`, `test_shutdown_flushes_tantivy` → GREEN |
| 4 | Task supervisor implementation | 2.6 | 4h | `test_supervisor_detects_panic`, `test_supervisor_restarts` → GREEN |
| 5 | Integration + regression testing | - | 4h | Full `cargo test`, verify no regressions |
|
||||
|
||||
### 7.2 Verification Checklist
|
||||
|
||||
After all tasks:
|
||||
|
||||
- [ ] `cargo check` — zero errors, zero warnings
|
||||
- [ ] `cargo test --workspace --exclude musicfs-grpc` — all tests pass (exclude pre-existing grpc issue)
|
||||
- [ ] `cargo test -p musicfs-test-utils --test resilience` — 5 previously-RED tests now GREEN
|
||||
- [ ] `cargo clippy` — no new warnings
|
||||
- [ ] Remaining RED tests are only for Phases C-F (health timeout, parallel checks, fd exhaustion, chunk auto-repair, passthrough mode)
|
||||
|
||||
---
|
||||
|
||||
## 8. Files Changed
|
||||
|
||||
| File | Change | Issue |
|------|--------|-------|
| `musicfs-cas/src/store.rs` | Size pre-check in `put()`, `StoreFull` error, sled recovery in `open()` | 2.8, 3.5 |
| `musicfs-cache/src/db.rs` | `open_with_integrity_check()` with `PRAGMA integrity_check(1)` | 2.4 |
| `musicfs-core/src/error.rs` | Add `DatabaseCorrupted(String)` variant | 2.4 |
| `musicfs-search/src/index.rs` | `open_with_recovery()`: detect, delete, recreate | 2.4 |
| `musicfs-core/src/supervisor.rs` | NEW: `TaskSupervisor`, `TaskStatus`, spawn/monitor/restart | 2.6 |
| `musicfs-core/src/lib.rs` | Re-export supervisor module | 2.6 |
| `musicfs-cli/src/main.rs` | CancellationToken creation, ordered shutdown sequence | 2.3 |
| `musicfs-cli/Cargo.toml` | Add `tokio-util` dependency | 2.3 |
| `musicfs-test-utils/tests/resilience.rs` | Replace `todo!()` stubs with real tests, add supervisor tests | all |
|
||||
|
||||
---
|
||||
|
||||
## 9. Glossary / References
|
||||
|
||||
| Term | Definition |
|------|------------|
| **CancellationToken** | `tokio_util::sync::CancellationToken`: cooperative cancellation signal for async tasks |
| **PRAGMA integrity_check** | SQLite command that verifies page-level data consistency |
| **sled repair** | sled's built-in recovery mode that attempts to reconstruct a corrupted database |
| **TaskSupervisor** | New struct that monitors `JoinHandle`s and restarts failed tasks with backoff |
| **StoreFull** | New `CasError` variant returned when a write would exceed `max_size` |
|
||||
|
||||
| Document | Path |
|----------|------|
| Phase A plan | [phase-a-stop-dying.md](phase-a-stop-dying.md) |
| Resilience audit | [resilience-fault-tolerance.md](resilience-fault-tolerance.md) |
| Resilience testing | [resilience-testing.md](resilience-testing.md) |
| Persistent state | [persistent-state.md](persistent-state.md) |
|
||||
@@ -0,0 +1,598 @@
|
||||
# Phase C: Production Hardening — Implementation Plan
|
||||
|
||||
**Authors:** AI-assisted
|
||||
**Status:** Draft
|
||||
**Last Updated:** 2026-05-13
|
||||
**Reviewers:** TBD
|
||||
**Approvers:** TBD
|
||||
**Prerequisites:** [phase-b-crash-recovery.md](phase-b-crash-recovery.md) (completed), [resilience-fault-tolerance.md](resilience-fault-tolerance.md)
|
||||
**Estimated Effort:** ~4 days
|
||||
|
||||
---
|
||||
|
||||
[TOC]
|
||||
|
||||
---
|
||||
|
||||
## 1. Abstract
|
||||
|
||||
Phase C merges the practical items from Phases C and D of the resilience audit into a single implementation pass. It fixes the remaining 6 RED tests and addresses production-critical issues: health check hangs that block all origin monitoring, unbounded FUSE reads that can freeze the filesystem, broken CAS size accounting that disables eviction, and concurrent mount protection.
|
||||
|
||||
**Deferred items** (depend on unimplemented features or low urgency): interrupted sync recovery (needs persistent state), SIGHUP config reload, connection pooling (S3/SFTP are stubs), event bus backpressure, FUSE session recovery, offline mode state machine, DNS failure handling, stale-data awareness.
|
||||
|
||||
**RED tests to turn GREEN:**
|
||||
- `test_local_origin_health_check_has_timeout` (D1)
|
||||
- `test_health_checks_run_in_parallel` (D2)
|
||||
- `test_fd_exhaustion_handling` (E — 5.3)
|
||||
- `test_corrupt_chunk_auto_refetched` (F — 6.4)
|
||||
- `test_missing_chunk_triggers_origin_fetch` (F — 6.4)
|
||||
- `test_passthrough_mode_when_cache_disk_dead` (F — 6.6)
|
||||
|
||||
---
|
||||
|
||||
## 2. Background
|
||||
|
||||
After Phase A+B, the daemon survives signals, recovers from storage corruption on startup, supervises background tasks, and rejects oversized CAS writes. But:
|
||||
|
||||
1. **Health checks hang on dead origins** — `check_one()` calls `origin.health().await` with no timeout. A dead NAS (local origin pointing to network mount) blocks health monitoring for ALL origins because checks run sequentially.
|
||||
|
||||
2. **FUSE reads have no timeout** — `reader.read()` in the FUSE `read()` callback has no timeout. A slow or hung origin blocks the FUSE thread indefinitely.
|
||||
|
||||
3. **CAS size tracking is broken** — `calculate_size()` only scans top-level of `chunks_dir`, missing all chunks in shard subdirectories (`aa/bb/<hash>`). `current_size` is always ~0, eviction never triggers.
|
||||
|
||||
4. **Corrupt chunks return EIO** — when `verify_integrity()` detects a bad chunk, it returns `CasError::IntegrityError`. The reader propagates this as EIO to FUSE. It should auto-re-fetch from origin instead.
|
||||
|
||||
5. **No concurrent mount protection** — two `musicfs mount` commands can run simultaneously, corrupting SQLite and sled.
|
||||
|
||||
6. **fd exhaustion is unhandled** — no graceful behavior when file descriptors run out.
|
||||
|
||||
---
|
||||
|
||||
## 3. Goals & Non-Goals
|
||||
|
||||
### 3.1 Goals
|
||||
|
||||
- Health checks complete within 5 seconds regardless of origin responsiveness
|
||||
- Health checks run in parallel (3 origins checked in ~5s, not ~15s)
|
||||
- FUSE reads timeout after 30 seconds (returns EIO, doesn't hang)
|
||||
- CAS size accounting is correct (recursive shard scan)
|
||||
- Corrupt/missing chunks are auto-re-fetched from origin transparently
|
||||
- PID file prevents concurrent mounts
|
||||
- fd exhaustion produces clean errors, not panics
|
||||
- All 6 remaining RED tests turn GREEN
|
||||
|
||||
### 3.2 Non-Goals
|
||||
|
||||
- Interrupted sync recovery (C1) — blocked on persistent state
|
||||
- systemd watchdog (C3) — useful but not critical yet
|
||||
- SIGHUP config reload (C4) — nice-to-have
|
||||
- Connection pooling (C5) — S3/SFTP origins are stubs
|
||||
- Event bus backpressure (C8) — low urgency
|
||||
- FUSE session recovery (C10) — complex edge case
|
||||
- Offline mode state machine (D3) — needs broader design
|
||||
- DNS failure handling (D5) — depends on C5
|
||||
- Stale-data awareness (D6) — low severity for music FS
|
||||
|
||||
---
|
||||
|
||||
## 4. Proposed Design
|
||||
|
||||
### 4.1 Implementation Order
|
||||
|
||||
```
|
||||
4.2 Health check timeout + parallel checks (2 RED tests, independent)
|
||||
↓
|
||||
4.3 Fix CAS calculate_size() (independent, unblocks eviction)
|
||||
↓
|
||||
4.4 FUSE read timeout (independent)
|
||||
↓
|
||||
4.5 CAS chunk auto-re-fetch on corruption (2 RED tests)
|
||||
↓
|
||||
4.6 PID file / flock (independent)
|
||||
↓
|
||||
4.7 fd exhaustion handling (1 RED test)
|
||||
```
|
||||
|
||||
### 4.2 Issues D1+D2: Health Check Timeout + Parallel Checks
|
||||
|
||||
**Problem**: `check_one()` awaits `origin.health()` with no timeout. `check_all()` iterates sequentially. One hung origin blocks everything.
|
||||
|
||||
#### Step 1: No stubs needed
|
||||
|
||||
#### Step 2: RED tests already exist
|
||||
|
||||
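`test_local_origin_health_check_has_timeout`: FaultyOrigin with `TimeoutMs(5000)`, asserts the check completes in <2s. Note that the 5-second `health_timeout` used in Step 3 would not satisfy this bound on its own; the timeout has to be shorter than 2s in the test, for example by making it configurable.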
|
||||
|
||||
`test_health_checks_run_in_parallel` — 3 origins each with `TimeoutMs(200)`, asserts `check_all()` completes in <350ms (parallel), not ~600ms (sequential).
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-origins/src/health.rs`
|
||||
|
||||
Wrap `origin.health()` in `check_one()` with timeout:
|
||||
|
||||
```rust
|
||||
async fn check_one(&self, id: &OriginId, origin: &Arc<dyn Origin>) {
|
||||
let start = Instant::now();
|
||||
let health_timeout = Duration::from_secs(5);
|
||||
|
||||
let status = match tokio::time::timeout(health_timeout, origin.health()).await {
|
||||
Ok(status) => status,
|
||||
Err(_) => {
|
||||
warn!(origin_id = %id, timeout_ms = health_timeout.as_millis() as u64,
|
||||
"Health check timed out");
|
||||
HealthStatus::Unhealthy
|
||||
}
|
||||
};
|
||||
|
||||
let latency_ms = start.elapsed().as_millis() as u64;
|
||||
// ... rest unchanged
|
||||
}
|
||||
```
|
||||
|
||||
Change `check_all()` to use `futures::future::join_all`:
|
||||
|
||||
```rust
|
||||
pub async fn check_all(&self) {
|
||||
let origins: Vec<_> = self.origins.iter()
|
||||
.map(|e| (e.key().clone(), e.value().clone()))
|
||||
.collect();
|
||||
|
||||
let checks: Vec<_> = origins.iter()
|
||||
.map(|(id, origin)| self.check_one(id, origin))
|
||||
.collect();
|
||||
|
||||
futures::future::join_all(checks).await;
|
||||
}
|
||||
```
|
||||
|
||||
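Add `futures` to `musicfs-origins/Cargo.toml`. The `tokio::join!` macro is not a drop-in alternative here: it takes a fixed number of futures known at compile time, while the set of origins is dynamic.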
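
The combined effect of the two changes (a bounded wait plus concurrent execution) can be illustrated without an async runtime. The sketch below is a thread-based analog using only the standard library, not the tokio code in `health.rs`; `Health`, `check_all_with_timeout`, and the probe closures are hypothetical names introduced for illustration. It uses one shared deadline, so the whole batch is bounded by `timeout` rather than by `timeout` per origin.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Unhealthy,
}

/// Runs every probe on its own thread and waits until a shared deadline.
/// A probe that has not answered by then is reported as `Unhealthy`.
pub fn check_all_with_timeout<F>(probes: Vec<F>, timeout: Duration) -> Vec<Health>
where
    F: FnOnce() -> Health + Send + 'static,
{
    // Start all probes first so that they run concurrently.
    let receivers: Vec<mpsc::Receiver<Health>> = probes
        .into_iter()
        .map(|probe| {
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                // The receiver may be gone after a timeout; ignore that.
                let _ = tx.send(probe());
            });
            rx
        })
        .collect();

    // Collect against one deadline; a timeout or a panicked probe
    // (disconnected channel) both map to Unhealthy.
    let deadline = Instant::now() + timeout;
    receivers
        .into_iter()
        .map(|rx| {
            let remaining = deadline.saturating_duration_since(Instant::now());
            rx.recv_timeout(remaining).unwrap_or(Health::Unhealthy)
        })
        .collect()
}
```

As in the async version, a hung probe is not cancelled; it is only abandoned, and its thread keeps running until the probe returns.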
|
||||
|
||||
#### Step 4: Verify
|
||||
|
||||
```bash
|
||||
cargo test -p musicfs-test-utils --test resilience -- test_local_origin_health_check
|
||||
cargo test -p musicfs-test-utils --test resilience -- test_health_checks_run_in_parallel
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.3 Issue C6: Fix CAS calculate_size()
|
||||
|
||||
**Problem**: `calculate_size()` only scans direct children of `chunks_dir`. Chunks live in shard subdirectories (`chunks/aa/bb/<hash>`). Size is always ~0, eviction never triggers.
|
||||
|
||||
#### Step 1: No stubs needed
|
||||
|
||||
#### Step 2: Test
|
||||
|
||||
```rust
|
||||
#[tokio::test]
|
||||
async fn test_cas_size_tracking_is_correct() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let config = CasConfig { chunks_dir: dir.path().join("chunks"), max_size: 10_000_000, shard_levels: 2 };
|
||||
let store = CasStore::open(config).await.unwrap();
|
||||
|
||||
let data = vec![0u8; 1000];
|
||||
store.put(&data).await.unwrap();
|
||||
|
||||
// Size should reflect the chunk we just wrote (~1000 bytes)
|
||||
assert!(store.current_size() >= 1000, "current_size should track chunk data, got {}", store.current_size());
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-cas/src/store.rs` — make `calculate_size` recursive:
|
||||
|
||||
```rust
|
||||
async fn calculate_size(dir: &Path) -> u64 {
|
||||
Self::calculate_size_recursive(dir).await
|
||||
}
|
||||
|
||||
#[async_recursion::async_recursion]
|
||||
async fn calculate_size_recursive(dir: &Path) -> u64 {
|
||||
let mut size = 0u64;
|
||||
if let Ok(mut entries) = fs::read_dir(dir).await {
|
||||
while let Ok(Some(entry)) = entries.next_entry().await {
|
||||
if let Ok(meta) = entry.metadata().await {
|
||||
if meta.is_file() {
|
||||
size += meta.len();
|
||||
} else if meta.is_dir() {
|
||||
// Skip sled index directory
|
||||
let name = entry.file_name();
|
||||
if name != "index.sled" {
|
||||
size += Self::calculate_size_recursive(&entry.path()).await;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
size
|
||||
}
|
||||
```
|
||||
|
||||
Alternative without `async_recursion` (use `Box::pin`):
|
||||
|
||||
```rust
|
||||
fn calculate_size_recursive(dir: &Path) -> Pin<Box<dyn Future<Output = u64> + Send + '_>> {
|
||||
Box::pin(async move {
|
||||
let mut size = 0u64;
|
||||
if let Ok(mut entries) = fs::read_dir(dir).await {
|
||||
while let Ok(Some(entry)) = entries.next_entry().await {
|
||||
if let Ok(meta) = entry.metadata().await {
|
||||
if meta.is_file() {
|
||||
size += meta.len();
|
||||
} else if meta.is_dir() {
|
||||
let name = entry.file_name();
|
||||
if name != "index.sled" {
|
||||
size += Self::calculate_size_recursive(&entry.path()).await;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
size
|
||||
})
|
||||
}
|
||||
```
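
A third option avoids recursion altogether by walking the tree with an explicit work stack. The sketch below shows the idea with synchronous `std::fs` so that it stands alone; it is an illustration under that assumption, not the planned code, and `dir_size_iterative` is a hypothetical name. An async version would follow the same loop with `tokio::fs`.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Sums the sizes of all regular files under `root`, skipping any directory
/// named `index.sled`. Unreadable entries are ignored, matching the
/// best-effort behaviour of the recursive versions.
pub fn dir_size_iterative(root: &Path) -> u64 {
    let mut total = 0u64;
    // Directories still to be visited.
    let mut pending: Vec<PathBuf> = vec![root.to_path_buf()];

    while let Some(dir) = pending.pop() {
        let entries = match fs::read_dir(&dir) {
            Ok(entries) => entries,
            Err(_) => continue,
        };
        for entry in entries.flatten() {
            let meta = match entry.metadata() {
                Ok(meta) => meta,
                Err(_) => continue,
            };
            if meta.is_file() {
                total += meta.len();
            } else if meta.is_dir() && entry.file_name() != "index.sled" {
                pending.push(entry.path());
            }
        }
    }
    total
}
```

With the loop form there is no boxed future per directory level and no extra dependency, at the cost of a small heap-allocated stack.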
|
||||
|
||||
---
|
||||
|
||||
### 4.4 Issue C7: FUSE Read Timeout
|
||||
|
||||
**Problem**: FUSE `read()` calls `handle.block_on(reader.read(...))` with no timeout. A slow origin blocks the entire FUSE thread.
|
||||
|
||||
#### Step 1: No stubs needed
|
||||
|
||||
#### Step 2: Test
|
||||
|
||||
```rust
|
||||
#[tokio::test]
|
||||
async fn test_fuse_read_timeout_returns_eio() {
|
||||
// Uses FaultyOrigin with TimeoutMs(60_000) — simulates hung read
|
||||
// FUSE read should timeout at 30s and return EIO, not hang forever
|
||||
// (This test validates the timeout wrapper, not actual FUSE mount)
|
||||
}
|
||||
```
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-fuse/src/filesystem.rs` — wrap the read with timeout:
|
||||
|
||||
```rust
|
||||
fn read(&mut self, _req: &Request, ino: u64, _fh: u64, offset: i64, size: u32, _flags: i32, _lock_owner: Option<u64>, reply: ReplyData) {
|
||||
// ... file_id lookup unchanged ...
|
||||
|
||||
let reader = reader.clone();
|
||||
let handle = self.runtime_handle.clone();
|
||||
// block_on already runs on the calling FUSE thread; no scope is needed
let result = handle.block_on(async {
    tokio::time::timeout(
        Duration::from_secs(30),
        reader.read(file_id, offset as u64, size),
    ).await
});
|
||||
|
||||
match result {
|
||||
Ok(Ok(data)) => {
|
||||
trace!(ino, bytes_read = data.len(), "read successful");
|
||||
reply.data(&data);
|
||||
}
|
||||
Ok(Err(e)) => {
|
||||
warn!(ino, error = %e, "read failed");
|
||||
reply.error(libc::EIO);
|
||||
}
|
||||
Err(_timeout) => {
|
||||
warn!(ino, offset, size, "read timed out after 30s");
|
||||
reply.error(libc::EIO);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4.5 Issue 6.4: CAS Chunk Auto-Re-Fetch on Corruption/Missing
|
||||
|
||||
**Problem**: When `store.get()` finds a corrupt or missing chunk, it returns an error. The reader propagates this as EIO to FUSE. It should try to re-fetch the chunk from the origin instead.
|
||||
|
||||
#### Step 1: No stubs needed — modify `FileReader::read()`
|
||||
|
||||
#### Step 2: RED tests already exist
|
||||
|
||||
`test_corrupt_chunk_auto_refetched` — corrupts chunk file on disk, expects read to succeed (re-fetched from origin).
|
||||
|
||||
`test_missing_chunk_triggers_origin_fetch` — deletes chunk file, expects read to succeed.
|
||||
|
||||
Both currently fail because the reader doesn't attempt re-fetch on chunk errors.
|
||||
|
||||
#### Step 3: Implementation
|
||||
|
||||
**File**: `musicfs-cas/src/reader.rs` — add retry-with-refetch in the chunk read loop:
|
||||
|
||||
```rust
|
||||
pub async fn read(&self, file_id: FileId, offset: u64, size: u32) -> Result<Bytes, ReaderError> {
|
||||
let manifest = self.get_or_fetch_manifest(file_id).await?;
|
||||
|
||||
// ... offset/end calculation unchanged ...
|
||||
|
||||
for chunk_ref in &manifest.chunks {
|
||||
// ... range check unchanged ...
|
||||
|
||||
let chunk_data = match self.store.get(&chunk_ref.hash).await {
|
||||
Ok(data) => data,
|
||||
Err(e @ (CasError::IntegrityError { .. } | CasError::NotFound(_))) => {
    // Chunk is corrupt or missing: try to re-fetch from origin
    warn!(hash = %chunk_ref.hash, "Chunk corrupt/missing, attempting re-fetch");
    if let Some(fetcher) = &self.fetcher {
        // Re-fetch the entire file (will re-chunk and store)
        let new_manifest = fetcher.fetch_file(file_id).await?;
        // Update cached manifest
        self.manifests.write().insert(file_id, new_manifest);
        // Retry the get
        self.store.get(&chunk_ref.hash).await?
    } else {
        // No fetcher configured: surface the original CAS error
        // instead of reporting every failure as NotFound
        return Err(ReaderError::Cas(e));
    }
|
||||
}
|
||||
Err(e) => return Err(ReaderError::Cas(e)),
|
||||
};
|
||||
|
||||
// ... slice extraction unchanged ...
|
||||
}
|
||||
|
||||
Ok(result.freeze())
|
||||
}
|
||||
```
|
||||
|
||||
**Important**: The re-fetch downloads the entire file from origin and re-chunks it. For a single corrupt chunk this is wasteful (fetches all chunks to fix one), but it's the simplest correct approach. Chunk-level re-fetch would require the origin to support byte-range reads mapped to chunk boundaries — possible but complex. The file-level approach reuses existing `fetch_file()` logic.
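
To make the trade-off concrete, here is a minimal sketch of the bookkeeping a chunk-level re-fetch would need: turning chunk sizes into byte ranges and finding which chunks a read touches. `ChunkRange`, `chunk_ranges`, and `chunks_for_read` are hypothetical names, and the sketch assumes a manifest can be reduced to an ordered list of chunk sizes; the real `ChunkManifest` may be shaped differently.

```rust
/// Byte range `[start, end)` that one chunk covers within the original file.
/// Hypothetical type for illustration; not part of musicfs-cas.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChunkRange {
    pub index: usize,
    pub start: u64,
    pub end: u64,
}

/// Turn an ordered list of chunk sizes into absolute byte ranges.
pub fn chunk_ranges(chunk_sizes: &[u64]) -> Vec<ChunkRange> {
    let mut start = 0u64;
    chunk_sizes
        .iter()
        .enumerate()
        .map(|(index, &len)| {
            let range = ChunkRange { index, start, end: start + len };
            start += len;
            range
        })
        .collect()
}

/// Chunks that overlap a read of `size` bytes starting at `offset`.
pub fn chunks_for_read(ranges: &[ChunkRange], offset: u64, size: u32) -> Vec<ChunkRange> {
    let end = offset.saturating_add(u64::from(size));
    ranges
        .iter()
        .copied()
        .filter(|r| r.start < end && r.end > offset)
        .collect()
}
```

A chunk-level re-fetch would request exactly `start..end` of the damaged chunk from the origin. That only works if the origin supports range reads and re-chunking the same bytes reproduces the same boundaries, which is why the plan stays with the file-level approach.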

#### Step 4: Verify

```bash
cargo test -p musicfs-test-utils --test resilience -- test_corrupt_chunk
cargo test -p musicfs-test-utils --test resilience -- test_missing_chunk
```

**Note on test updates**: The existing RED tests reference `store.chunk_path()`, which is private. The tests will need to either:
- Make `chunk_path()` `pub` or add a public test helper (`pub(crate)` is not enough, because the tests live in the separate `musicfs-test-utils` crate)
- Or construct the path manually using the sharding logic

The tests also need a `ContentFetcher` with a real `LocalOrigin` to re-fetch from. The current tests create a CAS store but no fetcher — they need to be updated to include the full pipeline.
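
If the tests end up constructing the chunk path by hand, a helper along these lines could keep the sharding logic in one place. This is only a sketch: `sharded_chunk_path` is a hypothetical name, and the layout (two hex characters per shard level, file named after the full hash) is an assumption that must be checked against the real private `chunk_path()`.

```rust
use std::path::{Path, PathBuf};

/// Build a sharded chunk path such as `chunks/ab/cd/abcdef0123`.
/// ASSUMPTION: each shard level is a directory named after the next two hex
/// characters of the hash, and the file is named with the full hash. The real
/// private `chunk_path()` may use a different layout.
pub fn sharded_chunk_path(chunks_dir: &Path, hash_hex: &str, shard_levels: usize) -> Option<PathBuf> {
    // Reject hashes that are too short to shard or are not plain ASCII hex.
    if hash_hex.len() < shard_levels * 2 || !hash_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut path = chunks_dir.to_path_buf();
    for level in 0..shard_levels {
        // Safe to slice by byte index: the string is ASCII-only at this point.
        path.push(&hash_hex[level * 2..level * 2 + 2]);
    }
    path.push(hash_hex);
    Some(path)
}
```

A test could then corrupt or delete the file at the returned path before issuing the read, without widening the visibility of `chunk_path()`.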

---

### 4.6 Issue C9: PID File / flock

**Problem**: Two `musicfs mount` commands can run simultaneously, both writing to the same SQLite/sled files.

#### Step 1: No stubs needed

#### Step 2: Test

```rust
#[test]
fn test_pid_file_prevents_concurrent_mount() {
    let dir = TempDir::new().unwrap();
    let lock_path = dir.path().join("musicfs.lock");

    // First lock succeeds
    let lock1 = try_acquire_lock(&lock_path);
    assert!(lock1.is_ok());

    // Second lock fails
    let lock2 = try_acquire_lock(&lock_path);
    assert!(lock2.is_err());

    // Release first, second succeeds
    drop(lock1);
    let lock3 = try_acquire_lock(&lock_path);
    assert!(lock3.is_ok());
}
```

#### Step 3: Implementation

**File**: `musicfs-cli/src/main.rs`

```rust
use std::fs::{File, OpenOptions};
use std::io::Write;
use std::os::unix::io::AsRawFd;

struct LockFile {
    _file: File,
}

fn try_acquire_lock(path: &Path) -> Result<LockFile> {
    // Open without truncating: `File::create` would wipe the PID written by
    // a running instance before we know whether the lock is free.
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false)
        .open(path)
        .context("Failed to create lock file")?;
    let fd = file.as_raw_fd();

    let ret = unsafe { libc::flock(fd, libc::LOCK_EX | libc::LOCK_NB) };
    if ret != 0 {
        let err = std::io::Error::last_os_error();
        if err.kind() == std::io::ErrorKind::WouldBlock {
            anyhow::bail!("MusicFS is already running (lock file: {:?})", path);
        }
        return Err(err).context("Failed to acquire lock");
    }

    // We hold the lock: replace any stale PID with ours (for debugging)
    file.set_len(0)?;
    let mut f = &file;
    writeln!(f, "{}", std::process::id())?;

    Ok(LockFile { _file: file })
}
```

Call in `run_mount()` before mounting:

```rust
let lock_path = cache_dir.join("musicfs.lock");
let _lock = try_acquire_lock(&lock_path)
    .context("Failed to acquire lock — is another instance running?")?;
```

The lock is released automatically when `_lock` is dropped (process exit or scope end), because closing the file descriptor releases the `flock`. The lock file itself stays on disk; only the lock matters.

---

### 4.7 Issue 5.3: fd Exhaustion Handling

**Problem**: When the fd limit is hit, operations fail with EMFILE. Currently this propagates as panics or unhelpful errors.

#### Step 1: Replace the `todo!()` test

#### Step 2: Test

```rust
#[test]
#[cfg(target_os = "linux")]
fn test_fd_exhaustion_handling() {
    use rlimit::{Resource, setrlimit, getrlimit};

    let (orig_soft, orig_hard) = getrlimit(Resource::NOFILE).unwrap();

    // Set a very low soft limit. Keep the hard limit unchanged: an
    // unprivileged process cannot raise a lowered hard limit again, so the
    // restore at the end of the test would fail.
    setrlimit(Resource::NOFILE, 64, orig_hard).unwrap();

    let dir = TempDir::new().unwrap();
    let rt = tokio::runtime::Runtime::new().unwrap();

    let result = rt.block_on(async {
        CasStore::open(CasConfig {
            chunks_dir: dir.path().join("chunks"),
            max_size: 1_000_000,
            shard_levels: 2,
        }).await
    });

    // Should either succeed (sled uses fewer than 64 fds) or fail gracefully
    // Must NOT panic
    match result {
        Ok(_store) => { /* lucky — enough fds */ }
        Err(e) => {
            // Error message should be meaningful
            let msg = format!("{}", e);
            assert!(!msg.contains("panic"), "Should not panic on fd exhaustion");
        }
    }

    setrlimit(Resource::NOFILE, orig_soft, orig_hard).unwrap();
}
```

#### Step 3: Implementation

This is primarily a **test** — verifying that existing code handles fd exhaustion without panicking. The fix is ensuring all I/O paths return `Result` rather than calling `.unwrap()` on file operations. Phase A's RwLock migration already removed the biggest panic source. The remaining `.unwrap()` calls are in test code only.

No production code change required if existing error paths handle I/O errors correctly. The test validates this.
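
If an error path does turn out to need a clearer message, one possible shape is sketched below. The helper names `is_fd_exhaustion` and `describe_io_error` are hypothetical, and the errno value is hard-coded as an assumption to keep the sketch std-only; code that already links `libc` would use `libc::EMFILE`.

```rust
use std::io;

/// `EMFILE` ("Too many open files") is errno 24 on Linux and macOS.
/// ASSUMPTION: hard-coded here to stay std-only; code that already depends on
/// the `libc` crate would use `libc::EMFILE` instead.
const EMFILE: i32 = 24;

/// True if the error is the process hitting its open-file limit.
pub fn is_fd_exhaustion(err: &io::Error) -> bool {
    err.raw_os_error() == Some(EMFILE)
}

/// Turn an I/O error into a message that tells the user what to do.
pub fn describe_io_error(context: &str, err: &io::Error) -> String {
    if is_fd_exhaustion(err) {
        format!("{context}: too many open files; raise the limit with `ulimit -n` or reduce concurrency")
    } else {
        format!("{context}: {err}")
    }
}
```

The fd exhaustion test could then assert on the message content instead of only checking that the word "panic" is absent.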

---

## 5. Cross-Cutting Concerns

### 5.1 Observability

- Health check timeout logged at WARN with origin_id and timeout duration
- FUSE read timeout logged at WARN with inode, offset, size
- CAS chunk re-fetch logged at WARN with chunk hash
- PID file path logged at INFO on lock acquisition

### 5.2 Performance

- Health checks now run in parallel: wall-clock time per check cycle is bounded by the slowest origin (at most the health check timeout) instead of the sum over all N origins
- FUSE read timeout: 30s cap prevents indefinite hangs but doesn't improve happy-path latency
- `calculate_size()` recursive scan: runs once at startup, negligible cost
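
The effect of running health checks in parallel under a deadline can be sketched with the standard library alone. This is an analogue, not the plan's code: the plan uses `futures::future::join_all` with a timeout inside `check_one()`, whereas this sketch uses threads, a channel, and one overall deadline. `Health` and this `check_all` signature are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Outcome of one health check. Hypothetical type for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Unhealthy,
    TimedOut,
}

/// Run every check on its own thread and wait at most `timeout` overall.
/// Checks that have not reported by the deadline are marked `TimedOut`.
pub fn check_all<F>(checks: Vec<F>, timeout: Duration) -> Vec<Health>
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let total = checks.len();
    for (i, check) in checks.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // The receiver may be gone if we already timed out; ignore that.
            let _ = tx.send((i, check()));
        });
    }
    drop(tx); // so `rx` disconnects once every worker has finished

    let deadline = Instant::now() + timeout;
    let mut results = vec![Health::TimedOut; total];
    for _ in 0..total {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok((i, true)) => results[i] = Health::Healthy,
            Ok((i, false)) => results[i] = Health::Unhealthy,
            Err(_) => break, // deadline hit or all senders gone
        }
    }
    results
}
```

With three fast origins and one that hangs, the cycle finishes at the deadline rather than after the hanging origin returns, and the fast origins are still reported as healthy.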

### 5.3 Testing

| Test | Status Before | Status After | Issue |
|------|---------------|--------------|-------|
| `test_local_origin_health_check_has_timeout` | ❌ FAILED | ✅ GREEN | D1 |
| `test_health_checks_run_in_parallel` | ❌ FAILED | ✅ GREEN | D2 |
| `test_fd_exhaustion_handling` | ❌ todo!() | ✅ GREEN | 5.3 |
| `test_corrupt_chunk_auto_refetched` | ❌ FAILED | ✅ GREEN | 6.4 |
| `test_missing_chunk_triggers_origin_fetch` | ❌ FAILED | ✅ GREEN | 6.4 |
| `test_passthrough_mode_when_cache_disk_dead` | ❌ todo!() | ✅ GREEN | 6.6 |
| `test_cas_size_tracking_is_correct` | NEW | ✅ GREEN | C6 |
| `test_pid_file_prevents_concurrent_mount` | NEW | ✅ GREEN | C9 |

**Note on passthrough mode** (6.6): The test expects reads to succeed when the cache dir is read-only. With chunk auto-re-fetch (4.5), this partially works — if the origin is alive and the chunk isn't in cache, the fetcher reads from origin. But the fetcher tries to _write_ the chunk to CAS, which will fail on a read-only cache dir. The implementation needs a fallback path: if CAS write fails after origin fetch, return the data anyway without caching. This makes `test_passthrough_mode_when_cache_disk_dead` pass.
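
The fallback described above can be sketched as follows. This is a std-only illustration under stated assumptions: `ChunkSink`, `Fetched`, `fetch_with_passthrough`, and `ReadOnlySink` are hypothetical stand-ins, not the actual `ContentFetcher` or `CasStore` API, and the real code is async.

```rust
use std::io;

/// Minimal stand-in for the CAS write path. Hypothetical trait for this sketch.
pub trait ChunkSink {
    fn put(&mut self, data: &[u8]) -> io::Result<()>;
}

/// Result of a fetch: the bytes, plus whether they were cached.
#[derive(Debug, PartialEq, Eq)]
pub struct Fetched {
    pub data: Vec<u8>,
    pub cached: bool,
}

/// Fetch from the origin, then try to cache. A failed cache write is
/// downgraded to "not cached" instead of failing the read.
pub fn fetch_with_passthrough<O, S>(fetch_origin: O, sink: &mut S) -> io::Result<Fetched>
where
    O: FnOnce() -> io::Result<Vec<u8>>,
    S: ChunkSink,
{
    // An origin failure is still fatal: there is nothing to return.
    let data = fetch_origin()?;
    let cached = match sink.put(&data) {
        Ok(()) => true,
        Err(err) => {
            // Real code would log this at WARN (e.g. via `tracing::warn!`).
            eprintln!("cache write failed, serving uncached: {err}");
            false
        }
    };
    Ok(Fetched { data, cached })
}

/// Sink that always fails, like a read-only cache directory.
pub struct ReadOnlySink;

impl ChunkSink for ReadOnlySink {
    fn put(&mut self, _data: &[u8]) -> io::Result<()> {
        Err(io::Error::new(io::ErrorKind::PermissionDenied, "cache dir is read-only"))
    }
}
```

The key design choice is that only the origin fetch can fail the read; the cache write is best-effort. Every read of an uncached file then goes back to the origin until the cache disk recovers.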

---

## 6. Alternatives Considered

### 6.1 Per-Origin Configurable Timeout vs Universal 5s

Could allow `health_check_timeout_ms` per origin config. Rejected for Phase C — universal 5s is correct for all current origin types. Can be made configurable later.

### 6.2 Chunk-Level Re-Fetch vs File-Level Re-Fetch

When one chunk is corrupt, we could re-fetch just that chunk's byte range from origin. Requires the origin to support byte-range reads and the system to know which byte range maps to which chunk. Complex. File-level re-fetch reuses existing `fetch_file()` and is correct, just slightly wasteful. Good enough for Phase C.

### 6.3 `advisory-lock` Crate vs Raw `flock`

The `advisory-lock` crate wraps flock nicely but adds a dependency for 10 lines of code. Raw `libc::flock` is simple enough and avoids the dependency.

---

## 7. Implementation Plan

### 7.1 Task Sequence

| Day | Task | Issue | Effort | Tests |
|-----|------|-------|--------|-------|
| 1 (morning) | Health check timeout in `check_one()` | D1 | 1h | `test_local_origin_health_check_has_timeout` → GREEN |
| 1 (morning) | Parallel `check_all()` with `join_all` | D2 | 1h | `test_health_checks_run_in_parallel` → GREEN |
| 1 (afternoon) | Fix `calculate_size()` recursion | C6 | 1h | `test_cas_size_tracking_is_correct` → GREEN |
| 1 (afternoon) | FUSE read timeout wrapper | C7 | 1h | New timeout test |
| 2 (morning) | CAS chunk auto-re-fetch on corruption/missing | 6.4 | 3h | `test_corrupt_chunk_auto_refetched` + `test_missing_chunk_triggers_origin_fetch` → GREEN |
| 2 (afternoon) | Passthrough fallback (CAS write fails → return data anyway) | 6.6 | 1h | `test_passthrough_mode_when_cache_disk_dead` → GREEN |
| 3 (morning) | PID file / flock | C9 | 1h | `test_pid_file_prevents_concurrent_mount` → GREEN |
| 3 (morning) | fd exhaustion test | 5.3 | 1h | `test_fd_exhaustion_handling` → GREEN |
| 3 (afternoon) | Integration + regression testing | — | 2h | Full `cargo test` |
| 4 | Buffer | — | 4h | — |

### 7.2 Verification Checklist

After all tasks:

- [ ] `cargo check` — zero errors, zero warnings
- [ ] `cargo test --workspace --exclude musicfs-grpc` — all pass
- [ ] `cargo test -p musicfs-test-utils --test resilience` — **25 passed, 0 failed** (all RED tests GREEN)
- [ ] `cargo clippy` — no new warnings

---

## 8. Files Changed

| File | Change | Issue |
|------|--------|-------|
| `musicfs-origins/src/health.rs` | Timeout in `check_one()`, `join_all` in `check_all()` | D1, D2 |
| `musicfs-origins/Cargo.toml` | Add `futures` dependency (for `join_all`) | D2 |
| `musicfs-cas/src/store.rs` | Recursive `calculate_size()`, skip `index.sled` dir | C6 |
| `musicfs-fuse/src/filesystem.rs` | `tokio::time::timeout(30s)` around reader.read() | C7 |
| `musicfs-cas/src/reader.rs` | Auto-re-fetch on `IntegrityError` / `NotFound` | 6.4 |
| `musicfs-cas/src/fetcher.rs` | Possible: make `fetch_file` return data even if CAS write fails | 6.6 |
| `musicfs-cli/src/main.rs` | PID file with flock, fd exhaustion handling | C9, 5.3 |
| `musicfs-test-utils/tests/resilience.rs` | Replace remaining todo!()s, add new tests, update chunk tests with fetcher pipeline | all |

---

## 9. Glossary / References

| Term | Definition |
|------|------------|
| **join_all** | `futures::future::join_all` — runs multiple futures concurrently, waits for all |
| **flock** | Advisory file locking syscall — `LOCK_EX \| LOCK_NB` for exclusive non-blocking |
| **EMFILE** | "Too many open files" errno — returned when process fd limit is reached |
| **Passthrough mode** | When CAS is unavailable, read directly from origin without caching |

| Document | Path |
|----------|------|
| Phase A plan | [phase-a-stop-dying.md](phase-a-stop-dying.md) |
| Phase B plan | [phase-b-crash-recovery.md](phase-b-crash-recovery.md) |
| Resilience audit | [resilience-fault-tolerance.md](resilience-fault-tolerance.md) |
Generated
+375
-30
@@ -158,7 +158,7 @@ checksum = "c7c24de15d275a1ecfd47a380fb4d5ec9bfe0933f309ed5e705b775596a3574d"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -169,7 +169,7 @@ checksum = "9035ad2d096bed7955a320ee7e2230574d28fd3c3a0f186cbea1ff3c7eed5dbb"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -274,6 +274,17 @@ dependencies = [
|
||||
"generic-array",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "bmrng"
|
||||
version = "0.4.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "e9758e48498ae13d49b51a979d553d254e67021b203d9597e82a04ebd81025b2"
|
||||
dependencies = [
|
||||
"futures",
|
||||
"loom",
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "bumpalo"
|
||||
version = "3.20.2"
|
||||
@@ -322,6 +333,12 @@ version = "1.0.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
|
||||
|
||||
[[package]]
|
||||
name = "cfg_aliases"
|
||||
version = "0.2.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "613afe47fcd5fac7ccf1db93babcb082c5994d996f20b8b159f2ad1658eb5724"
|
||||
|
||||
[[package]]
|
||||
name = "chrono"
|
||||
version = "0.4.44"
|
||||
@@ -366,7 +383,7 @@ dependencies = [
|
||||
"heck 0.5.0",
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -692,7 +709,7 @@ checksum = "97369cbbc041bc366949bc74d34658d6cda5621039731c6310521892a3a20ae0"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -738,6 +755,17 @@ version = "0.1.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "af9673d8203fcb076b19dfd17e38b3d4ae9f44959416ea532ce72415a6020365"
|
||||
|
||||
[[package]]
|
||||
name = "fail"
|
||||
version = "0.5.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "fe5e43d0f78a42ad591453aedb1d7ae631ce7ee445c7643691055a9ed8d3b01c"
|
||||
dependencies = [
|
||||
"log",
|
||||
"once_cell",
|
||||
"rand",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "fallible-iterator"
|
||||
version = "0.3.0"
|
||||
@@ -889,6 +917,21 @@ dependencies = [
|
||||
"zerocopy 0.7.35",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "futures"
|
||||
version = "0.3.32"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "8b147ee9d1f6d097cef9ce628cd2ee62288d963e16fb287bd9286455b241382d"
|
||||
dependencies = [
|
||||
"futures-channel",
|
||||
"futures-core",
|
||||
"futures-executor",
|
||||
"futures-io",
|
||||
"futures-sink",
|
||||
"futures-task",
|
||||
"futures-util",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "futures-channel"
|
||||
version = "0.3.32"
|
||||
@@ -896,6 +939,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "07bbe89c50d7a535e539b8c17bc0b49bdb77747034daa8087407d655f3f7cc1d"
|
||||
dependencies = [
|
||||
"futures-core",
|
||||
"futures-sink",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -904,6 +948,34 @@ version = "0.3.32"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "7e3450815272ef58cec6d564423f6e755e25379b217b0bc688e295ba24df6b1d"
|
||||
|
||||
[[package]]
|
||||
name = "futures-executor"
|
||||
version = "0.3.32"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "baf29c38818342a3b26b5b923639e7b1f4a61fc5e76102d4b1981c6dc7a7579d"
|
||||
dependencies = [
|
||||
"futures-core",
|
||||
"futures-task",
|
||||
"futures-util",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "futures-io"
|
||||
version = "0.3.32"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "cecba35d7ad927e23624b22ad55235f2239cfa44fd10428eecbeba6d6a717718"
|
||||
|
||||
[[package]]
|
||||
name = "futures-macro"
|
||||
version = "0.3.32"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "e835b70203e41293343137df5c0664546da5745f82ec9b84d40be8336958447b"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "futures-sink"
|
||||
version = "0.3.32"
|
||||
@@ -922,8 +994,13 @@ version = "0.3.32"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6"
|
||||
dependencies = [
|
||||
"futures-channel",
|
||||
"futures-core",
|
||||
"futures-io",
|
||||
"futures-macro",
|
||||
"futures-sink",
|
||||
"futures-task",
|
||||
"memchr",
|
||||
"pin-project-lite",
|
||||
"slab",
|
||||
]
|
||||
@@ -950,6 +1027,19 @@ dependencies = [
|
||||
"serde_json",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "generator"
|
||||
version = "0.6.25"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "061d3be1afec479d56fa3bd182bf966c7999ec175fcfdb87ac14d417241366c6"
|
||||
dependencies = [
|
||||
"cc",
|
||||
"libc",
|
||||
"log",
|
||||
"rustversion",
|
||||
"winapi",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "generic-array"
|
||||
version = "0.14.7"
|
||||
@@ -1022,7 +1112,7 @@ dependencies = [
|
||||
"indexmap 2.14.0",
|
||||
"slab",
|
||||
"tokio",
|
||||
"tokio-util",
|
||||
"tokio-util 0.7.18",
|
||||
"tracing",
|
||||
]
|
||||
|
||||
@@ -1173,6 +1263,20 @@ dependencies = [
|
||||
"want",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "hyper-rustls"
|
||||
version = "0.24.2"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "ec3efd23720e2049821a693cbc7e65ea87c72f1c58ff2f9522ff332b1491e590"
|
||||
dependencies = [
|
||||
"futures-util",
|
||||
"http",
|
||||
"hyper",
|
||||
"rustls",
|
||||
"tokio",
|
||||
"tokio-rustls",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "hyper-timeout"
|
||||
version = "0.4.1"
|
||||
@@ -1593,6 +1697,20 @@ version = "0.4.29"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
|
||||
|
||||
[[package]]
|
||||
name = "loom"
|
||||
version = "0.4.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "27a6650b2f722ae8c0e2ebc46d07f80c9923464fc206d962332f1eff83143530"
|
||||
dependencies = [
|
||||
"cfg-if",
|
||||
"futures-util",
|
||||
"generator",
|
||||
"scoped-tls",
|
||||
"serde",
|
||||
"serde_json",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "lru"
|
||||
version = "0.12.5"
|
||||
@@ -1720,6 +1838,18 @@ dependencies = [
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "mockall_double"
|
||||
version = "0.2.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "7dffc15b97456ecc84d2bde8c1df79145e154f45225828c4361f676e1b82acd6"
|
||||
dependencies = [
|
||||
"cfg-if",
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn 1.0.109",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "moka"
|
||||
version = "0.12.15"
|
||||
@@ -1775,11 +1905,13 @@ version = "0.1.0"
|
||||
dependencies = [
|
||||
"bytes",
|
||||
"dirs",
|
||||
"fail",
|
||||
"hex",
|
||||
"musicfs-cache",
|
||||
"musicfs-core",
|
||||
"musicfs-origins",
|
||||
"musicfs-sync",
|
||||
"parking_lot 0.12.5",
|
||||
"rmp-serde",
|
||||
"serde",
|
||||
"sled",
|
||||
@@ -1797,13 +1929,17 @@ dependencies = [
|
||||
"anyhow",
|
||||
"clap",
|
||||
"dirs",
|
||||
"libc",
|
||||
"musicfs-cache",
|
||||
"musicfs-cas",
|
||||
"musicfs-core",
|
||||
"musicfs-fuse",
|
||||
"musicfs-metadata",
|
||||
"musicfs-origins",
|
||||
"parking_lot 0.12.5",
|
||||
"sd-notify",
|
||||
"tokio",
|
||||
"tokio-util 0.7.18",
|
||||
"tracing",
|
||||
"tracing-appender",
|
||||
"tracing-journald",
|
||||
@@ -1815,6 +1951,7 @@ name = "musicfs-core"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"hex",
|
||||
"parking_lot 0.12.5",
|
||||
"serde",
|
||||
"serde_json",
|
||||
"tempfile",
|
||||
@@ -1882,8 +2019,10 @@ version = "0.1.0"
|
||||
dependencies = [
|
||||
"async-trait",
|
||||
"dashmap",
|
||||
"futures",
|
||||
"libc",
|
||||
"musicfs-core",
|
||||
"parking_lot 0.12.5",
|
||||
"tempfile",
|
||||
"thiserror 1.0.69",
|
||||
"tokio",
|
||||
@@ -1942,6 +2081,33 @@ dependencies = [
|
||||
"xxhash-rust",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "musicfs-test-utils"
|
||||
version = "0.1.0"
|
||||
dependencies = [
|
||||
"async-trait",
|
||||
"bytes",
|
||||
"fail",
|
||||
"libc",
|
||||
"musicfs-cache",
|
||||
"musicfs-cas",
|
||||
"musicfs-core",
|
||||
"musicfs-origins",
|
||||
"musicfs-search",
|
||||
"nix",
|
||||
"noxious-client",
|
||||
"parking_lot 0.12.5",
|
||||
"reqwest",
|
||||
"rlimit",
|
||||
"sd-notify",
|
||||
"tempfile",
|
||||
"thiserror 1.0.69",
|
||||
"tokio",
|
||||
"tokio-test",
|
||||
"tokio-util 0.7.18",
|
||||
"tracing",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "native-tls"
|
||||
version = "0.2.18"
|
||||
@@ -1959,6 +2125,18 @@ dependencies = [
|
||||
"tempfile",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "nix"
|
||||
version = "0.29.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "71e2746dc3a24dd78b3cfcb7be93368c6de9963d30f43a6a73998a9cf4b17b46"
|
||||
dependencies = [
|
||||
"bitflags 2.11.1",
|
||||
"cfg-if",
|
||||
"cfg_aliases",
|
||||
"libc",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "nom"
|
||||
version = "7.1.3"
|
||||
@@ -1988,6 +2166,39 @@ dependencies = [
|
||||
"windows-sys 0.48.0",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "noxious"
|
||||
version = "0.1.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "e68998924150ba54dbf1adf4c3f7f7c10bb5d3c6789ab71af11e34fe4c667970"
|
||||
dependencies = [
|
||||
"async-trait",
|
||||
"bmrng",
|
||||
"bytes",
|
||||
"futures",
|
||||
"mockall_double",
|
||||
"pin-project-lite",
|
||||
"rand",
|
||||
"serde",
|
||||
"thiserror 1.0.69",
|
||||
"tokio",
|
||||
"tokio-util 0.6.10",
|
||||
"tracing",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "noxious-client"
|
||||
version = "1.0.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "5b7ab7a9efb5768cd07e2b2455f80b3998d7397be76398c2ac03a52a42b652e7"
|
||||
dependencies = [
|
||||
"noxious",
|
||||
"reqwest",
|
||||
"serde",
|
||||
"thiserror 1.0.69",
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "nu-ansi-term"
|
||||
version = "0.50.3"
|
||||
@@ -2084,7 +2295,7 @@ checksum = "a948666b637a0f465e8564c73e89d4dde00d72d4d473cc972f390fc3dcee7d9c"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -2217,7 +2428,7 @@ checksum = "a990e22f43e84855daf260dded30524ef4a9021cc7541c26540500a50b624389"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -2282,7 +2493,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "479ca8adacdd7ce8f1fb39ce9ecccbfe93a3f1344b3d0d97f20bc0196208f62b"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -2321,7 +2532,7 @@ dependencies = [
|
||||
"prost",
|
||||
"prost-types",
|
||||
"regex",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
"tempfile",
|
||||
]
|
||||
|
||||
@@ -2335,7 +2546,7 @@ dependencies = [
|
||||
"itertools",
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -2524,6 +2735,7 @@ dependencies = [
|
||||
"http",
|
||||
"http-body",
|
||||
"hyper",
|
||||
"hyper-rustls",
|
||||
"hyper-tls",
|
||||
"ipnet",
|
||||
"js-sys",
|
||||
@@ -2533,6 +2745,7 @@ dependencies = [
|
||||
"once_cell",
|
||||
"percent-encoding",
|
||||
"pin-project-lite",
|
||||
"rustls",
|
||||
"rustls-pemfile",
|
||||
"serde",
|
||||
"serde_json",
|
||||
@@ -2541,14 +2754,39 @@ dependencies = [
|
||||
"system-configuration",
|
||||
"tokio",
|
||||
"tokio-native-tls",
|
||||
"tokio-rustls",
|
||||
"tower-service",
|
||||
"url",
|
||||
"wasm-bindgen",
|
||||
"wasm-bindgen-futures",
|
||||
"web-sys",
|
||||
"webpki-roots",
|
||||
"winreg",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "ring"
|
||||
version = "0.17.14"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "a4689e6c2294d81e88dc6261c768b63bc4fcdb852be6d1352498b114f61383b7"
|
||||
dependencies = [
|
||||
"cc",
|
||||
"cfg-if",
|
||||
"getrandom 0.2.17",
|
||||
"libc",
|
||||
"untrusted",
|
||||
"windows-sys 0.52.0",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rlimit"
|
||||
version = "0.10.2"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "7043b63bd0cd1aaa628e476b80e6d4023a3b50eb32789f2728908107bd0c793a"
|
||||
dependencies = [
|
||||
"libc",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rmp"
|
||||
version = "0.8.15"
|
||||
@@ -2630,6 +2868,18 @@ dependencies = [
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rustls"
|
||||
version = "0.21.12"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "3f56a14d1f48b391359b22f731fd4bd7e43c97f3c50eee276f3aa09c94784d3e"
|
||||
dependencies = [
|
||||
"log",
|
||||
"ring",
|
||||
"rustls-webpki",
|
||||
"sct",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rustls-pemfile"
|
||||
version = "1.0.4"
|
||||
@@ -2639,6 +2889,16 @@ dependencies = [
|
||||
"base64 0.21.7",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rustls-webpki"
|
||||
version = "0.101.7"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "8b6275d1ee7a1cd780b64aca7726599a1dbc893b1e64144529e55c3c2f745765"
|
||||
dependencies = [
|
||||
"ring",
|
||||
"untrusted",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rustversion"
|
||||
version = "1.0.22"
|
||||
@@ -2669,12 +2929,37 @@ dependencies = [
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "scoped-tls"
|
||||
version = "1.0.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "e1cf6437eb19a8f4a6cc0f7dca544973b0b78843adbfeb3683d1a94a0024a294"
|
||||
|
||||
[[package]]
|
||||
name = "scopeguard"
|
||||
version = "1.2.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "94143f37725109f92c262ed2cf5e59bce7498c01bcc1502d7b9afe439a4e9f49"
|
||||
|
||||
[[package]]
|
||||
name = "sct"
|
||||
version = "0.7.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "da046153aa2352493d6cb7da4b6e5c0c057d8a1d0a9aa8560baffdd945acd414"
|
||||
dependencies = [
|
||||
"ring",
|
||||
"untrusted",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "sd-notify"
|
||||
version = "0.4.5"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "b943eadf71d8b69e661330cb0e2656e31040acf21ee7708e2c238a0ec6af2bf4"
|
||||
dependencies = [
|
||||
"libc",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "security-framework"
|
||||
version = "3.7.0"
|
||||
@@ -2731,7 +3016,7 @@ checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -3036,6 +3321,17 @@ dependencies = [
|
||||
"symphonia-metadata",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "syn"
|
||||
version = "1.0.109"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "72b64191b275b66ffe2469e8af2c1cfe3bafa67b529ead792a6d0160888b4237"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"unicode-ident",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "syn"
|
||||
version = "2.0.117"
|
||||
@@ -3061,7 +3357,7 @@ checksum = "728a70f3dbaf5bab7f0c4b1ac8d7ae5ea60a4b5549c8a5914361c99147a709d2"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -3277,7 +3573,7 @@ checksum = "4fee6c4efc90059e10f81e6d42c60a18f76588c3d74cb83a0b242a2b6c7504c1"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -3288,7 +3584,7 @@ checksum = "ebc4ee7f67670e9b64d05fa4253e753e016c6c95ff35b89b7941d6b856dec1d5"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -3376,7 +3672,7 @@ checksum = "385a6cb71ab9ab790c5fe8d67f1645e6c450a7ce006a33de03daa956cf70a496"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -3389,6 +3685,16 @@ dependencies = [
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tokio-rustls"
|
||||
version = "0.24.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "c28327cf380ac148141087fbfb9de9d7bd4e84ab5d2c28fbc911d753de8a7081"
|
||||
dependencies = [
|
||||
"rustls",
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tokio-stream"
|
||||
version = "0.1.18"
|
||||
@@ -3400,6 +3706,31 @@ dependencies = [
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tokio-test"
|
||||
version = "0.4.5"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "3f6d24790a10a7af737693a3e8f1d03faef7e6ca0cc99aae5066f533766de545"
|
||||
dependencies = [
|
||||
"futures-core",
|
||||
"tokio",
|
||||
"tokio-stream",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tokio-util"
|
||||
version = "0.6.10"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "36943ee01a6d67977dd3f84a5a1d2efeb4ada3a1ae771cadfaa535d9d9fc6507"
|
||||
dependencies = [
|
||||
"bytes",
|
||||
"futures-core",
|
||||
"futures-sink",
|
||||
"log",
|
||||
"pin-project-lite",
|
||||
"tokio",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tokio-util"
|
||||
version = "0.7.18"
|
||||
@@ -3409,6 +3740,7 @@ dependencies = [
|
||||
"bytes",
|
||||
"futures-core",
|
||||
"futures-sink",
|
||||
"futures-util",
|
||||
"pin-project-lite",
|
||||
"tokio",
|
||||
]
|
||||
@@ -3491,7 +3823,7 @@ dependencies = [
|
||||
"proc-macro2",
|
||||
"prost-build",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -3508,7 +3840,7 @@ dependencies = [
|
||||
"rand",
|
||||
"slab",
|
||||
"tokio",
|
||||
"tokio-util",
|
||||
"tokio-util 0.7.18",
|
||||
"tower-layer",
|
||||
"tower-service",
|
||||
"tracing",
|
||||
@@ -3532,6 +3864,7 @@ version = "0.1.44"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "63e71662fa4b2a2c3a26f570f037eb95bb1f85397f3cd8076caed2f026a6d100"
|
||||
dependencies = [
|
||||
"log",
|
||||
"pin-project-lite",
|
||||
"tracing-attributes",
|
||||
"tracing-core",
|
||||
@@ -3558,7 +3891,7 @@ checksum = "7490cfa5ec963746568740651ac6781f701c9c5ea257c58e057f3ba8cf69e8da"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -3654,6 +3987,12 @@ version = "0.2.6"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "ebc1c04c71510c7f702b52b7c350734c9ff1295c464a03335b00bb84fc54f853"
|
||||
|
||||
[[package]]
|
||||
name = "untrusted"
|
||||
version = "0.9.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "8ecb6da28b8a351d773b68d5825ac39017e680750f980f3a1a85cd8dd28a47c1"
|
||||
|
||||
[[package]]
|
||||
name = "url"
|
||||
version = "2.5.8"
|
||||
@@ -3799,7 +4138,7 @@ dependencies = [
|
||||
"bumpalo",
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
"wasm-bindgen-shared",
|
||||
]
|
||||
|
||||
@@ -3981,7 +4320,7 @@ dependencies = [
|
||||
"anyhow",
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
"wasmtime-component-util",
|
||||
"wasmtime-wit-bindgen",
|
||||
"wit-parser 0.201.0",
|
||||
@@ -4155,7 +4494,7 @@ checksum = "ffaafa5c12355b1a9ee068e9295d50c4ca0a400c721950cdae4f5b54391a2da5"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -4225,6 +4564,12 @@ dependencies = [
|
||||
"wasm-bindgen",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "webpki-roots"
|
||||
version = "0.25.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "5f20c57d8d7db6d3b86154206ae5d8fba62dd39573114de97c2cb0578251f8e1"
|
||||
|
||||
[[package]]
|
||||
name = "winapi"
|
||||
version = "0.3.9"
|
||||
@@ -4293,7 +4638,7 @@ checksum = "053e2e040ab57b9dc951b72c264860db7eb3b0200ba345b4e4c3b14f67855ddf"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -4304,7 +4649,7 @@ checksum = "3f316c4a2570ba26bbec722032c4099d8c8bc095efccdc15688708623367e358"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -4543,7 +4888,7 @@ dependencies = [
|
||||
"heck 0.5.0",
|
||||
"indexmap 2.14.0",
|
||||
"prettyplease",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
"wasm-metadata",
|
||||
"wit-bindgen-core",
|
||||
"wit-component",
|
||||
@@ -4559,7 +4904,7 @@ dependencies = [
|
||||
"prettyplease",
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
"wit-bindgen-core",
|
||||
"wit-bindgen-rust",
|
||||
]
|
||||
@@ -4650,7 +4995,7 @@ checksum = "de844c262c8848816172cef550288e7dc6c7b7814b4ee56b3e1553f275f1858e"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
"synstructure",
|
||||
]
|
||||
|
||||
@@ -4681,7 +5026,7 @@ checksum = "fa4f8080344d4671fb4e831a13ad1e68092748387dfc4f55e356242fae12ce3e"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -4692,7 +5037,7 @@ checksum = "70e3cd084b1788766f53af483dd21f93881ff30d7320490ec3ef7526d203bad4"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -4712,7 +5057,7 @@ checksum = "11532158c46691caf0f2593ea8358fed6bbf68a0315e80aae9bd41fbade684a1"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
"synstructure",
|
||||
]
|
||||
|
||||
@@ -4746,7 +5091,7 @@ checksum = "625dc425cab0dca6dc3c3319506e6593dcb08a9f387ea3b284dbd52a92c40555"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
"syn 2.0.117",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
||||
@@ -13,7 +13,9 @@ repository = "https://github.com/user/musicfs"
|
||||
[workspace.dependencies]
|
||||
# Async runtime
|
||||
tokio = { version = "1", features = ["full"] }
|
||||
tokio-util = { version = "0.7", features = ["rt"] }
|
||||
async-trait = "0.1"
|
||||
futures = "0.3"
|
||||
|
||||
# Error handling
|
||||
thiserror = "1"
|
||||
@@ -61,6 +63,12 @@ clap = { version = "4", features = ["derive"] }
|
||||
|
||||
# Testing
|
||||
tempfile = "3"
|
||||
fail = "0.5"
|
||||
rlimit = "0.10"
|
||||
nix = { version = "0.29", features = ["signal", "process"] }
|
||||
wiremock = "0.6"
|
||||
assert_cmd = "2.0"
|
||||
noxious-client = "1.0"
|
||||
|
||||
# Platform-specific
|
||||
libc = "0.2"
|
||||
@@ -81,5 +89,7 @@ tokio-stream = "0.1"
|
||||
image = { version = "0.24", default-features = false, features = ["jpeg", "png"] }
|
||||
chrono = "0.4"
|
||||
|
||||
sd-notify = "0.4"
|
||||
|
||||
[workspace.dependencies.tonic-build]
|
||||
version = "0.11"
|
||||
|
||||
@@ -6,7 +6,7 @@ use rusqlite::{params, Connection, OptionalExtension};
 use std::path::{Path, PathBuf};
 use std::sync::{Arc, Mutex};
 use std::time::{Duration, SystemTime, UNIX_EPOCH};
-use tracing::{debug, info};
+use tracing::{debug, info, warn};
 
 const SCHEMA: &str = include_str!("schema.sql");
 
@@ -32,6 +32,34 @@ impl Database {
         Ok(db)
     }
 
+    pub fn open_with_integrity_check(path: &Path) -> Result<Self> {
+        debug!(?path, "Opening database with integrity check");
+
+        let conn = Connection::open(path)
+            .map_err(|e| Error::Database(format!("open failed: {}", e)))?;
+
+        let integrity: String = conn
+            .query_row("PRAGMA integrity_check(1)", [], |row| row.get(0))
+            .map_err(|e| Error::Database(format!("integrity check failed: {}", e)))?;
+
+        if integrity != "ok" {
+            warn!(path = ?path, result = %integrity, "Database integrity check failed");
+            return Err(Error::DatabaseCorrupted(format!(
+                "integrity check failed: {}", integrity
+            )));
+        }
+
+        conn.execute_batch(SCHEMA)
+            .map_err(|e| Error::Database(format!("schema init failed: {}", e)))?;
+
+        let db = Self {
+            conn: Arc::new(Mutex::new(conn)),
+        };
+        let count = db.file_count().unwrap_or(0);
+        info!(path = ?path, file_count = count, "Database opened (integrity verified)");
+        Ok(db)
+    }
+
     pub fn open_memory() -> Result<Self> {
         let conn = Connection::open_in_memory()
             .map_err(|e| Error::Database(format!("open_in_memory failed: {}", e)))?;
|
||||
@@ -1,7 +1,7 @@
 use musicfs_cas::CasStore;
 use musicfs_core::ChunkHash;
+use parking_lot::RwLock;
 use std::collections::BTreeMap;
-use std::sync::RwLock;
 use std::time::Instant;
 use tracing::info;
 
@@ -64,8 +64,8 @@ impl Default for LruEviction {
 impl EvictionPolicy for LruEviction {
     fn record_access(&self, hash: ChunkHash) {
         let now = Instant::now();
-        let mut times = self.access_times.write().unwrap();
-        let mut h2t = self.hash_to_time.write().unwrap();
+        let mut times = self.access_times.write();
+        let mut h2t = self.hash_to_time.write();
 
         if let Some(old_time) = h2t.remove(&hash) {
             times.remove(&old_time);
@@ -76,13 +76,13 @@ impl EvictionPolicy for LruEviction {
     }
 
     fn select_victims(&self, count: usize) -> Vec<ChunkHash> {
-        let times = self.access_times.read().unwrap();
+        let times = self.access_times.read();
         times.values().take(count).copied().collect()
     }
 
     fn remove(&self, hash: &ChunkHash) {
-        let mut times = self.access_times.write().unwrap();
-        let mut h2t = self.hash_to_time.write().unwrap();
+        let mut times = self.access_times.write();
+        let mut h2t = self.hash_to_time.write();
 
         if let Some(time) = h2t.remove(hash) {
             times.remove(&time);
|
||||
@@ -1,8 +1,8 @@
|
||||
use musicfs_core::{FileId, FileMeta, VirtualPath};
|
||||
use parking_lot::RwLock;
|
||||
use std::collections::{BTreeMap, HashMap};
|
||||
use std::ffi::{OsStr, OsString};
|
||||
use std::sync::atomic::{AtomicU64, Ordering};
|
||||
use std::sync::RwLock;
|
||||
use std::time::{Duration, SystemTime};
|
||||
use tracing::{debug, trace};
|
||||
|
||||
@@ -291,7 +291,7 @@ impl VirtualTree {
|
||||
}
|
||||
|
||||
pub fn needs_refresh(&self) -> bool {
|
||||
let last = *self.last_refresh.read().unwrap();
|
||||
let last = *self.last_refresh.read();
|
||||
last.elapsed().unwrap_or(Duration::MAX) > self.refresh_policy.ttl
|
||||
}
|
||||
|
||||
@@ -303,11 +303,11 @@ impl VirtualTree {
|
||||
root.children.clear();
|
||||
}
|
||||
|
||||
*self.last_refresh.write().unwrap() = SystemTime::now();
|
||||
*self.last_refresh.write() = SystemTime::now();
|
||||
}
|
||||
|
||||
pub fn mark_refreshed(&self) {
|
||||
*self.last_refresh.write().unwrap() = SystemTime::now();
|
||||
*self.last_refresh.write() = SystemTime::now();
|
||||
}
|
||||
|
||||
pub fn refresh_policy(&self) -> &RefreshPolicy {
|
||||
|
||||
@@ -3,7 +3,12 @@ name = "musicfs-cas"
|
||||
version.workspace = true
|
||||
edition.workspace = true
|
||||
|
||||
[features]
|
||||
default = []
|
||||
failpoints = ["fail/failpoints"]
|
||||
|
||||
[dependencies]
|
||||
fail = { workspace = true, optional = true }
|
||||
musicfs-core = { path = "../musicfs-core" }
|
||||
musicfs-origins = { path = "../musicfs-origins" }
|
||||
musicfs-sync = { path = "../musicfs-sync" }
|
||||
@@ -17,6 +22,7 @@ rmp-serde.workspace = true
|
||||
hex.workspace = true
|
||||
dirs.workspace = true
|
||||
thiserror.workspace = true
|
||||
parking_lot.workspace = true
|
||||
|
||||
[dev-dependencies]
|
||||
tempfile.workspace = true
|
||||
|
||||
@@ -2,9 +2,10 @@ use crate::{CasStore, ChunkManifest, ChunkRef};
|
||||
use musicfs_core::{Event, EventBus, FileId, FileMeta, OriginId};
|
||||
use musicfs_origins::Origin;
|
||||
use musicfs_sync::CdcChunker;
|
||||
use parking_lot::RwLock;
|
||||
use std::collections::HashMap;
|
||||
use std::sync::{Arc, RwLock};
|
||||
use tracing::{debug, info};
|
||||
use std::sync::Arc;
|
||||
use tracing::{debug, info, warn};
|
||||
|
||||
pub struct ContentFetcher {
|
||||
store: Arc<CasStore>,
|
||||
@@ -37,15 +38,15 @@ impl ContentFetcher {
|
||||
|
||||
pub fn register_origin(&self, origin: Arc<dyn Origin>) {
|
||||
let id = origin.id().clone();
|
||||
self.origins.write().unwrap().insert(id, origin);
|
||||
self.origins.write().insert(id, origin);
|
||||
}
|
||||
|
||||
pub fn register_file(&self, meta: FileMeta) {
|
||||
self.file_meta.write().unwrap().insert(meta.id, meta);
|
||||
self.file_meta.write().insert(meta.id, meta);
|
||||
}
|
||||
|
||||
pub fn register_files(&self, files: impl IntoIterator<Item = FileMeta>) {
|
||||
let mut map = self.file_meta.write().unwrap();
|
||||
let mut map = self.file_meta.write();
|
||||
for meta in files {
|
||||
map.insert(meta.id, meta);
|
||||
}
|
||||
@@ -53,7 +54,7 @@ impl ContentFetcher {
|
||||
|
||||
pub async fn fetch_file(&self, file_id: FileId) -> Result<ChunkManifest, FetchError> {
|
||||
let meta = {
|
||||
let files = self.file_meta.read().unwrap();
|
||||
let files = self.file_meta.read();
|
||||
files
|
||||
.get(&file_id)
|
||||
.cloned()
|
||||
@@ -61,7 +62,7 @@ impl ContentFetcher {
|
||||
};
|
||||
|
||||
let origin = {
|
||||
let origins = self.origins.read().unwrap();
|
||||
let origins = self.origins.read();
|
||||
origins
|
||||
.get(&meta.real_path.origin_id)
|
||||
.cloned()
|
||||
@@ -91,7 +92,9 @@ impl ContentFetcher {
|
||||
let mut chunk_refs = Vec::with_capacity(chunks.len());
|
||||
for chunk in chunks {
|
||||
if !self.store.exists(&chunk.hash) {
|
||||
self.store.put(chunk.data).await.map_err(FetchError::Store)?;
|
||||
if let Err(e) = self.store.put(chunk.data).await {
|
||||
warn!(hash = %chunk.hash, error = %e, "CAS write failed, continuing in passthrough mode");
|
||||
}
|
||||
}
|
||||
|
||||
chunk_refs.push(ChunkRef {
|
||||
@@ -123,7 +126,7 @@ impl ContentFetcher {
|
||||
}
|
||||
|
||||
pub fn get_file_meta(&self, file_id: FileId) -> Option<FileMeta> {
|
||||
self.file_meta.read().unwrap().get(&file_id).cloned()
|
||||
self.file_meta.read().get(&file_id).cloned()
|
||||
}
|
||||
|
||||
pub fn emit_access_event(&self, meta: &FileMeta, offset: u64, size: u32) {
|
||||
|
||||
@@ -1,12 +1,13 @@
|
||||
use crate::chunks::ChunkRef;
|
||||
use crate::fetcher::{ContentFetcher, FetchError};
|
||||
use crate::store::CasStore;
|
||||
use crate::store::{CasError, CasStore};
|
||||
use bytes::{Bytes, BytesMut};
|
||||
use musicfs_core::FileId;
|
||||
use parking_lot::RwLock;
|
||||
use serde::{Deserialize, Serialize};
|
||||
use std::collections::HashMap;
|
||||
use std::sync::{Arc, RwLock};
|
||||
use tracing::{debug, trace};
|
||||
use std::sync::Arc;
|
||||
use tracing::{debug, trace, warn};
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct ChunkManifest {
|
||||
@@ -60,13 +61,13 @@ impl FileReader {
|
||||
}
|
||||
|
||||
pub fn register_manifest(&self, manifest: ChunkManifest) {
|
||||
let mut manifests = self.manifests.write().unwrap();
|
||||
let mut manifests = self.manifests.write();
|
||||
manifests.insert(manifest.file_id, manifest);
|
||||
}
|
||||
|
||||
async fn get_or_fetch_manifest(&self, file_id: FileId) -> Result<ChunkManifest, ReaderError> {
|
||||
{
|
||||
let manifests = self.manifests.read().unwrap();
|
||||
let manifests = self.manifests.read();
|
||||
if let Some(m) = manifests.get(&file_id) {
|
||||
trace!(file_id = ?file_id, "manifest cache hit");
|
||||
return Ok(m.clone());
|
||||
@@ -81,7 +82,6 @@ impl FileReader {
|
||||
let manifest = fetcher.ensure_cached(file_id).await?;
|
||||
self.manifests
|
||||
.write()
|
||||
.unwrap()
|
||||
.insert(file_id, manifest.clone());
|
||||
Ok(manifest)
|
||||
}
|
||||
@@ -116,7 +116,31 @@ impl FileReader {
                 continue;
             }
 
-            let chunk_data = self.store.get(&chunk_ref.hash).await?;
+            let chunk_data = match self.store.get(&chunk_ref.hash).await {
+                Ok(data) => data,
+                Err(CasError::IntegrityError { .. }) => {
+                    warn!(hash = %chunk_ref.hash, "Chunk corrupt, deleting and re-fetching");
+                    let _ = self.store.delete(&chunk_ref.hash).await;
+                    if let Some(fetcher) = &self.fetcher {
+                        let new_manifest = fetcher.fetch_file(file_id).await?;
+                        self.manifests.write().insert(file_id, new_manifest);
+                        self.store.get(&chunk_ref.hash).await?
+                    } else {
+                        return Err(ReaderError::Cas(CasError::NotFound(chunk_ref.hash.as_hex())));
+                    }
+                }
+                Err(CasError::NotFound(_)) => {
+                    warn!(hash = %chunk_ref.hash, "Chunk missing, attempting re-fetch");
+                    if let Some(fetcher) = &self.fetcher {
+                        let new_manifest = fetcher.fetch_file(file_id).await?;
+                        self.manifests.write().insert(file_id, new_manifest);
+                        self.store.get(&chunk_ref.hash).await?
+                    } else {
+                        return Err(ReaderError::Cas(CasError::NotFound(chunk_ref.hash.as_hex())));
+                    }
+                }
+                Err(e) => return Err(ReaderError::Cas(e)),
+            };
 
             let read_start = if offset > chunk_start {
                 (offset - chunk_start) as usize
|
||||
@@ -4,7 +4,10 @@ use musicfs_core::ChunkHash;
|
||||
use std::path::{Path, PathBuf};
|
||||
use std::sync::atomic::{AtomicU64, Ordering};
|
||||
use tokio::fs;
|
||||
use tracing::{debug, trace, warn};
|
||||
use tracing::{debug, info, trace, warn};
|
||||
|
||||
#[cfg(feature = "failpoints")]
|
||||
use fail::fail_point;
|
||||
|
||||
const DEFAULT_MAX_SIZE_10GB: u64 = 10 * 1024 * 1024 * 1024;
|
||||
const DEFAULT_SHARD_LEVELS_256_SUBDIRS: u8 = 2;
|
||||
@@ -42,7 +45,27 @@ impl CasStore {
|
||||
fs::create_dir_all(&config.chunks_dir).await?;
|
||||
|
||||
let index_path = config.chunks_dir.join("index.sled");
|
||||
let index = sled::open(&index_path)?;
|
||||
let index = match sled::open(&index_path) {
|
||||
Ok(db) => db,
|
||||
Err(e) => {
|
||||
warn!(error = %e, path = ?index_path, "sled index corrupted, attempting recovery");
|
||||
|
||||
match sled::Config::new().path(&index_path).open() {
|
||||
Ok(db) => {
|
||||
info!("sled index repaired successfully");
|
||||
db
|
||||
}
|
||||
Err(repair_err) => {
|
||||
warn!(error = %repair_err, "sled repair failed, recreating index");
|
||||
if index_path.exists() {
|
||||
std::fs::remove_dir_all(&index_path)
|
||||
.map_err(CasError::Io)?;
|
||||
}
|
||||
sled::open(&index_path)?
|
||||
}
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
let current_size = Self::calculate_size(&config.chunks_dir).await;
|
||||
|
||||
@@ -54,17 +77,29 @@ impl CasStore {
|
||||
}
|
||||
|
||||
async fn calculate_size(dir: &Path) -> u64 {
|
||||
let mut size = 0u64;
|
||||
if let Ok(mut entries) = fs::read_dir(dir).await {
|
||||
while let Ok(Some(entry)) = entries.next_entry().await {
|
||||
if let Ok(meta) = entry.metadata().await {
|
||||
if meta.is_file() {
|
||||
size += meta.len();
|
||||
Self::calculate_size_recursive(dir).await
|
||||
}
|
||||
|
||||
fn calculate_size_recursive(dir: &Path) -> std::pin::Pin<Box<dyn std::future::Future<Output = u64> + Send + '_>> {
|
||||
Box::pin(async move {
|
||||
let mut size = 0u64;
|
||||
if let Ok(mut entries) = fs::read_dir(dir).await {
|
||||
while let Ok(Some(entry)) = entries.next_entry().await {
|
||||
if let Ok(meta) = entry.metadata().await {
|
||||
if meta.is_file() {
|
||||
size += meta.len();
|
||||
} else if meta.is_dir() {
|
||||
// Skip sled index directory
|
||||
let name = entry.file_name();
|
||||
if name != "index.sled" {
|
||||
size += Self::calculate_size_recursive(&entry.path()).await;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
size
|
||||
size
|
||||
})
|
||||
}
|
||||
|
||||
pub async fn put(&self, data: &[u8]) -> Result<ChunkHash, CasError> {
|
||||
@@ -76,12 +111,44 @@ impl CasStore {
|
||||
return Ok(hash);
|
||||
}
|
||||
|
||||
if self.config.max_size > 0 {
|
||||
let new_size = self.current_size.load(Ordering::SeqCst) + data.len() as u64;
|
||||
if new_size > self.config.max_size {
|
||||
warn!(
|
||||
current_size = self.current_size.load(Ordering::SeqCst),
|
||||
chunk_size = data.len(),
|
||||
max_size = self.config.max_size,
|
||||
"CAS store full, rejecting write"
|
||||
);
|
||||
return Err(CasError::StoreFull {
|
||||
current: self.current_size.load(Ordering::SeqCst),
|
||||
max: self.config.max_size,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
if let Some(parent) = path.parent() {
|
||||
fs::create_dir_all(parent).await?;
|
||||
}
|
||||
|
||||
#[cfg(feature = "failpoints")]
|
||||
fail_point!("cas-put-before-write", |_| {
|
||||
Err(CasError::Io(std::io::Error::new(
|
||||
std::io::ErrorKind::Other,
|
||||
"Failpoint: cas-put-before-write",
|
||||
)))
|
||||
});
|
||||
|
||||
fs::write(&path, data).await?;
|
||||
|
||||
#[cfg(feature = "failpoints")]
|
||||
fail_point!("cas-put-after-write-before-index", |_| {
|
||||
Err(CasError::Io(std::io::Error::new(
|
||||
std::io::ErrorKind::Other,
|
||||
"Failpoint: cas-put-after-write-before-index",
|
||||
)))
|
||||
});
|
||||
|
||||
let location = ChunkLocation {
|
||||
path: path.clone(),
|
||||
size: data.len() as u32,
|
||||
@@ -232,6 +299,9 @@ pub enum CasError {
|
||||
|
||||
#[error("Serialization error: {0}")]
|
||||
Serialization(String),
|
||||
|
||||
#[error("Store full: {current} / {max} bytes")]
|
||||
StoreFull { current: u64, max: u64 },
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
|
||||
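Reviewer note: the `put` hunk above loads `current_size`, adds the chunk length, and rejects the write with `CasError::StoreFull` when the sum exceeds `max_size` (with `max_size == 0` meaning unlimited). As written, the check and any later size update are separate steps, so two concurrent writers could both pass the check. The sketch below is an alternative under stated assumptions, not the code in this change: it reserves the bytes atomically with `AtomicU64::fetch_update`. `StoreFull` and `try_reserve` are hypothetical stand-ins defined only for this example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the `CasError::StoreFull` variant in the diff.
#[derive(Debug, PartialEq)]
struct StoreFull {
    current: u64,
    max: u64,
}

/// Atomically reserve `len` bytes against `max`.
/// `max == 0` is treated as "no limit", mirroring the check in the diff.
/// Returns the new total on success.
fn try_reserve(current: &AtomicU64, max: u64, len: u64) -> Result<u64, StoreFull> {
    current
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |cur| {
            let new = cur.checked_add(len)?;
            if max > 0 && new > max {
                None // reject: leave the counter unchanged
            } else {
                Some(new)
            }
        })
        .map(|prev| prev + len)
        .map_err(|cur| StoreFull { current: cur, max })
}

fn main() {
    let size = AtomicU64::new(90);
    println!("{:?}", try_reserve(&size, 100, 10)); // fits exactly
    println!("{:?}", try_reserve(&size, 100, 1)); // would exceed the cap
}
```

A caller that fails to write the chunk after a successful reservation would need to subtract the reserved bytes again; that rollback is omitted here.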
@@ -17,11 +17,15 @@ musicfs-metadata.path = "../musicfs-metadata"
|
||||
|
||||
clap.workspace = true
|
||||
tokio.workspace = true
|
||||
tokio-util.workspace = true
|
||||
tracing.workspace = true
|
||||
tracing-subscriber.workspace = true
|
||||
tracing-appender.workspace = true
|
||||
anyhow.workspace = true
|
||||
dirs.workspace = true
|
||||
parking_lot.workspace = true
|
||||
libc.workspace = true
|
||||
|
||||
[target.'cfg(target_os = "linux")'.dependencies]
|
||||
tracing-journald.workspace = true
|
||||
sd-notify.workspace = true
|
||||
|
||||
@@ -6,10 +6,14 @@ use musicfs_core::{FileId, FileMeta, LoggingConfig, OriginId, RealPath, VirtualP
|
||||
use musicfs_fuse::MusicFs;
|
||||
use musicfs_metadata::MetadataParser;
|
||||
use musicfs_origins::{LocalOrigin, Origin};
|
||||
use parking_lot::RwLock;
|
||||
use std::fs::File;
|
||||
use std::io::Write;
|
||||
use std::os::unix::io::AsRawFd;
|
||||
use std::path::{Path, PathBuf};
|
||||
use std::sync::{Arc, RwLock};
|
||||
use std::sync::Arc;
|
||||
use std::time::SystemTime;
|
||||
use tracing::{debug, info};
|
||||
use tracing::{debug, info, warn};
|
||||
use tracing_appender::non_blocking::WorkerGuard;
|
||||
use tracing_subscriber::{fmt, prelude::*, EnvFilter, Layer};
|
||||
|
||||
@@ -86,7 +90,31 @@ enum OriginCommands {
     },
 }
 
+struct LockFile {
+    _file: File,
+}
+
+fn try_acquire_lock(path: &Path) -> Result<LockFile> {
+    let file = File::create(path).context("Failed to create lock file")?;
+    let fd = file.as_raw_fd();
+
+    let ret = unsafe { libc::flock(fd, libc::LOCK_EX | libc::LOCK_NB) };
+    if ret != 0 {
+        let err = std::io::Error::last_os_error();
+        if err.kind() == std::io::ErrorKind::WouldBlock {
+            anyhow::bail!("MusicFS is already running (lock file: {:?})", path);
+        }
+        return Err(err).context("Failed to acquire lock");
+    }
+
+    let mut f = &file;
+    writeln!(f, "{}", std::process::id())?;
+
+    Ok(LockFile { _file: file })
+}
+
 fn main() -> Result<()> {
+    musicfs_core::install_panic_hook();
     let cli = Cli::parse();
 
     match cli.command {
@@ -137,24 +165,25 @@ fn run_mount(
|
||||
) -> Result<()> {
|
||||
let origin_path = origin_path.context("--origin is required for mount")?;
|
||||
|
||||
let cache_dir = cache_dir.unwrap_or_else(|| {
|
||||
dirs::cache_dir()
|
||||
.unwrap_or_else(|| PathBuf::from("/tmp"))
|
||||
.join("musicfs")
|
||||
});
|
||||
|
||||
let runtime = tokio::runtime::Runtime::new().context("Failed to create Tokio runtime")?;
|
||||
let handle = runtime.handle().clone();
|
||||
|
||||
let cache_dir_clone = cache_dir.clone();
|
||||
let (tree, reader) = runtime.block_on(async {
|
||||
info!(origin = ?origin_path, mountpoint = ?mountpoint, "Mount configuration");
|
||||
info!("Cache directory: {:?}", cache_dir_clone);
|
||||
|
||||
let cache_dir = cache_dir.unwrap_or_else(|| {
|
||||
dirs::cache_dir()
|
||||
.unwrap_or_else(|| PathBuf::from("/tmp"))
|
||||
.join("musicfs")
|
||||
});
|
||||
info!("Cache directory: {:?}", cache_dir);
|
||||
|
||||
std::fs::create_dir_all(&cache_dir).context("Failed to create cache directory")?;
|
||||
std::fs::create_dir_all(&cache_dir_clone).context("Failed to create cache directory")?;
|
||||
std::fs::create_dir_all(&mountpoint).context("Failed to create mountpoint")?;
|
||||
|
||||
let cas_config = CasConfig {
|
||||
chunks_dir: cache_dir.join("chunks"),
|
||||
chunks_dir: cache_dir_clone.join("chunks"),
|
||||
..Default::default()
|
||||
};
|
||||
let store = Arc::new(
|
||||
@@ -188,14 +217,64 @@ fn run_mount(
|
||||
Ok::<_, anyhow::Error>((tree, reader))
|
||||
})?;
|
||||
|
||||
let fs = MusicFs::with_reader(tree, reader, handle);
|
||||
check_stale_mount(&mountpoint)?;
|
||||
|
||||
let lock_path = cache_dir.join("musicfs.lock");
|
||||
let _lock = try_acquire_lock(&lock_path)
|
||||
.context("Failed to acquire lock — is another instance running?")?;
|
||||
info!(lock_path = ?lock_path, "Lock acquired");
|
||||
|
||||
let fs = MusicFs::with_reader(tree, reader, handle.clone());
|
||||
|
||||
info!("Mounting filesystem at {:?}", mountpoint);
|
||||
info!("Press Ctrl+C to unmount");
|
||||
|
||||
fs.mount(&mountpoint)
|
||||
let session = fs
|
||||
.spawn_mount(&mountpoint)
|
||||
.context("Failed to mount filesystem")?;
|
||||
|
||||
#[cfg(target_os = "linux")]
|
||||
{
|
||||
if let Err(e) = sd_notify::notify(false, &[sd_notify::NotifyState::Ready]) {
|
||||
debug!("sd_notify not available (not running under systemd): {}", e);
|
||||
}
|
||||
}
|
||||
info!("MusicFS ready, PID {}", std::process::id());
|
||||
|
||||
let shutdown_token = tokio_util::sync::CancellationToken::new();
|
||||
|
||||
runtime.block_on(async {
|
||||
let mut sigterm =
|
||||
tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate())?;
|
||||
let mut sigint =
|
||||
tokio::signal::unix::signal(tokio::signal::unix::SignalKind::interrupt())?;
|
||||
|
||||
tokio::select! {
|
||||
_ = sigterm.recv() => {
|
||||
info!("Received SIGTERM, shutting down");
|
||||
}
|
||||
_ = sigint.recv() => {
|
||||
info!("Received SIGINT, shutting down");
|
||||
}
|
||||
}
|
||||
|
||||
info!("Beginning ordered shutdown");
|
||||
shutdown_token.cancel();
|
||||
|
||||
tokio::time::sleep(std::time::Duration::from_millis(500)).await;
|
||||
|
||||
info!("Background tasks stopped");
|
||||
|
||||
Ok::<_, anyhow::Error>(())
|
||||
})?;
|
||||
|
||||
#[cfg(target_os = "linux")]
|
||||
{
|
||||
let _ = sd_notify::notify(false, &[sd_notify::NotifyState::Stopping]);
|
||||
}
|
||||
info!("Unmounting filesystem");
|
||||
drop(session);
|
||||
info!("Shutdown complete");
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
@@ -437,3 +516,25 @@ fn sanitize(s: &str) -> String {
         })
         .collect()
 }
+
+fn check_stale_mount(mountpoint: &Path) -> Result<()> {
+    if let Ok(mounts) = std::fs::read_to_string("/proc/mounts") {
+        for line in mounts.lines() {
+            if line.contains(mountpoint.to_string_lossy().as_ref()) && line.contains("fuse") {
+                warn!(
+                    "Stale FUSE mount detected at {:?}, attempting cleanup",
+                    mountpoint
+                );
+                let status = std::process::Command::new("fusermount")
+                    .args(["-uz", &mountpoint.to_string_lossy()])
+                    .status();
+                match status {
+                    Ok(s) if s.success() => info!("Stale mount cleaned up"),
+                    Ok(s) => warn!("fusermount exited with: {}", s),
+                    Err(e) => warn!("Failed to run fusermount: {}", e),
+                }
+            }
+        }
+    }
+    Ok(())
+}
|
||||
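Reviewer note: `check_stale_mount` above matches with `line.contains(...)`, which is a substring test. A mountpoint of `/mnt/music` would therefore also match a line for `/mnt/music2`, and the word `fuse` could match anywhere in the line. A stricter variant, sketched below under stated assumptions, compares the second whitespace-separated field (mount target) exactly and checks that the third field (filesystem type) starts with `fuse`. The helper name `has_fuse_mount` is hypothetical, and the sketch does not decode the octal escapes (for example `\040` for a space) that `/proc/mounts` uses in paths.

```rust
/// Returns true if `mounts` (text in /proc/mounts format) lists a FUSE
/// filesystem mounted exactly at `mountpoint`.
/// (Illustrative helper; not part of the change under review.)
fn has_fuse_mount(mounts: &str, mountpoint: &str) -> bool {
    mounts.lines().any(|line| {
        // Fields: device, mount target, filesystem type, options, dump, pass.
        let mut fields = line.split_whitespace();
        let _device = fields.next();
        let target = fields.next();
        let fstype = fields.next();
        matches!(
            (target, fstype),
            (Some(t), Some(f)) if t == mountpoint && f.starts_with("fuse")
        )
    })
}

fn main() {
    let sample = "musicfs /mnt/music2 fuse.musicfs rw,nosuid 0 0\n\
                  /dev/sda1 / ext4 rw 0 0\n";
    println!("{}", has_fuse_mount(sample, "/mnt/music"));
    println!("{}", has_fuse_mount(sample, "/mnt/music2"));
}
```

Taking the mounts text as a parameter keeps the parsing testable without touching the real `/proc/mounts`.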
@@ -12,6 +12,7 @@ tokio = { workspace = true, features = ["sync"] }
|
||||
tracing.workspace = true
|
||||
xxhash-rust.workspace = true
|
||||
hex.workspace = true
|
||||
parking_lot.workspace = true
|
||||
|
||||
[dev-dependencies]
|
||||
tempfile.workspace = true
|
||||
|
||||
@@ -23,6 +23,9 @@ pub enum Error {
     #[error("Database error: {0}")]
     Database(String),
 
+    #[error("Database corrupted: {0}")]
+    DatabaseCorrupted(String),
+
     #[error("NFS stale file handle")]
     NfsStaleHandle,
 
|
||||
@@ -4,6 +4,7 @@ pub mod error;
 pub mod events;
 pub mod metrics;
 pub mod resolver;
+pub mod supervisor;
 pub mod types;
 
 pub use config::{
@@ -19,6 +20,38 @@ pub fn sanitize_path(path: &Path) -> String {
         path.to_string_lossy().to_string()
     }
 }
+
+/// Install a custom panic hook that logs panics via tracing before the default behavior.
+/// This ensures panics are captured in log files and journald.
+pub fn install_panic_hook() {
+    let default_hook = std::panic::take_hook();
+    std::panic::set_hook(Box::new(move |info| {
+        let thread = std::thread::current();
+        let thread_name = thread.name().unwrap_or("<unnamed>");
+
+        let message = if let Some(s) = info.payload().downcast_ref::<&str>() {
+            (*s).to_string()
+        } else if let Some(s) = info.payload().downcast_ref::<String>() {
+            s.clone()
+        } else {
+            "unknown panic".to_string()
+        };
+
+        let location = info
+            .location()
+            .map(|l| format!("{}:{}:{}", l.file(), l.line(), l.column()))
+            .unwrap_or_else(|| "unknown location".to_string());
+
+        tracing::error!(
+            thread = thread_name,
+            location = %location,
+            "PANIC: {}",
+            message
+        );
+
+        default_hook(info);
+    }));
+}
 pub use credentials::{Credential, CredentialConfig, CredentialError, CredentialStore};
 pub use error::{Error, Result};
 pub use events::{Event, EventBus};
|
||||
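Reviewer note: `install_panic_hook` extracts the panic message by downcasting the payload first to `&str` and then to `String`. Both branches matter: a `panic!` with a plain literal carries a `&'static str`, while a formatted `panic!` carries a `String`. The std-only sketch below isolates that extraction so it can be exercised with `catch_unwind`; `panic_message` is a hypothetical helper name, not a function in this change.

```rust
use std::any::Any;
use std::panic;

/// Extract a printable message from a panic payload, using the same
/// `&str` then `String` downcast order as the hook in the diff.
/// (Illustrative helper; not part of the musicfs code base.)
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic".to_string()
    }
}

fn main() {
    // A formatted panic produces a `String` payload.
    let err = panic::catch_unwind(|| {
        panic!("boom {}", 42);
    })
    .unwrap_err();
    println!("{}", panic_message(&*err));
}
```

Payloads raised with `std::panic::panic_any` and a non-string type fall through to the `"unknown panic"` branch.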
@@ -1,6 +1,6 @@
|
||||
use parking_lot::RwLock;
|
||||
use std::collections::HashMap;
|
||||
use std::sync::atomic::{AtomicU64, Ordering};
|
||||
use std::sync::RwLock;
|
||||
use std::time::Instant;
|
||||
|
||||
#[derive(Default)]
|
||||
@@ -45,7 +45,7 @@ impl Metrics {
|
||||
self.fuse_ops.open.load(Ordering::Relaxed),
|
||||
));
|
||||
|
||||
for (op, histogram) in self.fuse_latency.histograms.read().unwrap().iter() {
|
||||
for (op, histogram) in self.fuse_latency.histograms.read().iter() {
|
||||
let quantiles = histogram.quantiles();
|
||||
output.push_str(&format!(
|
||||
"# HELP musicfs_fuse_latency_seconds FUSE operation latency\n\
|
||||
@@ -95,7 +95,7 @@ impl Metrics {
|
||||
"# HELP musicfs_origin_health Origin health status (1=healthy, 0=unhealthy)\n\
|
||||
# TYPE musicfs_origin_health gauge\n",
|
||||
);
|
||||
for (origin_id, healthy) in self.origin_health.status.read().unwrap().iter() {
|
||||
for (origin_id, healthy) in self.origin_health.status.read().iter() {
|
||||
output.push_str(&format!(
|
||||
"musicfs_origin_health{{origin=\"{}\"}} {}\n",
|
||||
origin_id,
|
||||
@@ -203,7 +203,7 @@ pub struct FuseLatencyMetrics {
|
||||
|
||||
impl FuseLatencyMetrics {
|
||||
pub fn record(&self, op: &str, latency_secs: f64) {
|
||||
let mut histograms = self.histograms.write().unwrap();
|
||||
let mut histograms = self.histograms.write();
|
||||
histograms
|
||||
.entry(op.to_string())
|
||||
.or_default()
|
||||
@@ -268,7 +268,6 @@ impl OriginHealthMetrics {
|
||||
pub fn set_health(&self, origin_id: &str, healthy: bool) {
|
||||
self.status
|
||||
.write()
|
||||
.unwrap()
|
||||
.insert(origin_id.to_string(), healthy);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -0,0 +1,181 @@
|
||||
use parking_lot::RwLock;
|
||||
use std::collections::HashMap;
|
||||
use std::sync::Arc;
|
||||
use std::time::{Duration, Instant};
|
||||
use tokio::task::JoinHandle;
|
||||
use tracing::{error, warn};
|
||||
|
||||
pub struct TaskSupervisor {
|
||||
tasks: Arc<RwLock<HashMap<String, TaskEntry>>>,
|
||||
}
|
||||
|
||||
struct TaskEntry {
|
||||
handle: JoinHandle<()>,
|
||||
status: TaskStatus,
|
||||
restart_count: u32,
|
||||
last_restart: Option<Instant>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub enum TaskStatus {
|
||||
Running,
|
||||
Failed { error: String, at: Instant },
|
||||
Restarting { attempt: u32 },
|
||||
Stopped,
|
||||
}
|
||||
|
||||
impl Default for TaskSupervisor {
|
||||
fn default() -> Self {
|
||||
Self::new()
|
||||
}
|
||||
}
|
||||
|
||||
impl TaskSupervisor {
|
||||
pub fn new() -> Self {
|
||||
Self {
|
||||
tasks: Arc::new(RwLock::new(HashMap::new())),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn spawn_supervised<F>(&self, name: &str, future: F)
|
||||
where
|
||||
F: std::future::Future<Output = ()> + Send + 'static,
|
||||
{
|
||||
let name_owned = name.to_string();
|
||||
|
||||
let handle = tokio::spawn(async move {
|
||||
future.await;
|
||||
});
|
||||
|
||||
self.tasks.write().insert(
|
||||
name_owned,
|
||||
TaskEntry {
|
||||
handle,
|
||||
status: TaskStatus::Running,
|
||||
restart_count: 0,
|
||||
last_restart: None,
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
pub fn spawn_critical<F, Fut>(&self, name: &str, factory: F)
|
||||
where
|
||||
F: Fn() -> Fut + Send + Sync + 'static,
|
||||
Fut: std::future::Future<Output = ()> + Send + 'static,
|
||||
{
|
||||
let tasks = self.tasks.clone();
|
||||
let name_owned = name.to_string();
|
||||
|
||||
let monitor_handle = tokio::spawn(async move {
|
||||
let mut restart_count = 0u32;
|
||||
let max_restarts = 5u32;
|
||||
let backoff_durations = [
|
||||
Duration::from_secs(1),
|
||||
Duration::from_secs(5),
|
||||
Duration::from_secs(30),
|
||||
];
|
||||
|
||||
loop {
|
||||
let handle = tokio::spawn(factory());
|
||||
|
||||
{
|
||||
let mut t = tasks.write();
|
||||
if let Some(entry) = t.get_mut(&name_owned) {
|
||||
entry.status = TaskStatus::Running;
|
||||
}
|
||||
}
|
||||
|
||||
match handle.await {
|
||||
Ok(()) => {
|
||||
let mut t = tasks.write();
|
||||
if let Some(entry) = t.get_mut(&name_owned) {
|
||||
entry.status = TaskStatus::Stopped;
|
||||
}
|
||||
break;
|
||||
}
|
||||
Err(e) => {
|
||||
restart_count += 1;
|
||||
|
||||
if restart_count > max_restarts {
|
||||
error!(task = %name_owned, "Task exceeded max restarts ({}), giving up", max_restarts);
|
||||
let mut t = tasks.write();
|
||||
if let Some(entry) = t.get_mut(&name_owned) {
|
||||
entry.status = TaskStatus::Failed {
|
||||
error: format!("Exceeded max restarts: {}", e),
|
||||
at: Instant::now(),
|
||||
};
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
let backoff_idx =
|
||||
(restart_count as usize - 1).min(backoff_durations.len() - 1);
|
||||
let backoff = backoff_durations[backoff_idx];
|
||||
|
||||
warn!(
|
||||
task = %name_owned,
|
||||
error = %e,
|
||||
attempt = restart_count,
|
||||
backoff_ms = backoff.as_millis() as u64,
|
||||
"Critical task failed, restarting with backoff"
|
||||
);
|
||||
|
||||
{
|
||||
let mut t = tasks.write();
|
||||
if let Some(entry) = t.get_mut(&name_owned) {
|
||||
entry.status = TaskStatus::Restarting {
|
||||
attempt: restart_count,
|
||||
};
|
||||
entry.restart_count = restart_count;
|
||||
entry.last_restart = Some(Instant::now());
|
||||
}
|
||||
}
|
||||
|
||||
tokio::time::sleep(backoff).await;
|
||||
}
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
self.tasks.write().insert(
|
||||
name.to_string(),
|
||||
TaskEntry {
|
||||
handle: monitor_handle,
|
||||
status: TaskStatus::Running,
|
||||
restart_count: 0,
|
||||
last_restart: None,
|
||||
},
|
||||
);
|
||||
}
|
||||
|
||||
pub fn task_status(&self, name: &str) -> TaskStatus {
|
||||
let mut tasks = self.tasks.write();
|
||||
if let Some(entry) = tasks.get_mut(name) {
|
||||
if entry.handle.is_finished() && matches!(entry.status, TaskStatus::Running) {
|
||||
entry.status = TaskStatus::Failed {
|
||||
error: "Task exited".into(),
|
||||
at: Instant::now(),
|
||||
};
|
||||
}
|
||||
entry.status.clone()
|
||||
} else {
|
||||
TaskStatus::Stopped
|
||||
}
|
||||
}
|
||||
|
||||
pub fn check_all(&self) -> Vec<(String, TaskStatus)> {
|
||||
let mut tasks = self.tasks.write();
|
||||
tasks
|
||||
.iter_mut()
|
||||
.map(|(name, entry)| {
|
||||
if entry.handle.is_finished() && matches!(entry.status, TaskStatus::Running) {
|
||||
entry.status = TaskStatus::Failed {
|
||||
error: "Task exited".into(),
|
||||
at: Instant::now(),
|
||||
};
|
||||
}
|
||||
(name.clone(), entry.status.clone())
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
}
|
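Note on the restart schedule in `spawn_critical` above: the backoff index is the 1-based attempt number minus one, clamped to the last entry of the table, so every attempt past the third reuses the 30 s delay. The sketch below restates that arithmetic in isolation. The names `backoff_for_attempt` and `backoff_plan` are hypothetical and do not appear in the change; the zero-duration fallback for an empty table is likewise an assumption added for safety.

```rust
use std::time::Duration;

/// Hypothetical helper: picks the delay for a 1-based restart attempt,
/// clamping to the last entry once the schedule is exhausted.
fn backoff_for_attempt(attempt: u32, schedule: &[Duration]) -> Duration {
    // saturating_sub guards against attempt == 0.
    let idx = (attempt as usize).saturating_sub(1);
    schedule
        .get(idx)
        .or(schedule.last())
        .copied()
        // Assumption: an empty schedule means "no delay".
        .unwrap_or(Duration::ZERO)
}

/// Hypothetical helper: the delays slept for attempts 1..=max_restarts.
fn backoff_plan(max_restarts: u32, schedule: &[Duration]) -> Vec<Duration> {
    (1..=max_restarts)
        .map(|attempt| backoff_for_attempt(attempt, schedule))
        .collect()
}
```

With the table from the diff (1 s, 5 s, 30 s) and `max_restarts = 5`, the plan is 1 s, 5 s, 30 s, 30 s, 30 s, so a task that keeps panicking is abandoned after roughly 96 seconds of accumulated backoff.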
||||
@@ -6,10 +6,11 @@ use fuser::{
|
||||
use musicfs_cache::{VirtualNode, VirtualTree, ROOT_INODE};
|
||||
use musicfs_cas::FileReader;
|
||||
use musicfs_core::Result;
|
||||
+ use parking_lot::RwLock;
|
||||
use std::collections::HashMap;
|
||||
use std::ffi::OsStr;
|
||||
use std::path::Path;
|
||||
- use std::sync::{Arc, RwLock};
+ use std::sync::Arc;
|
||||
use std::time::{Duration, SystemTime};
|
||||
use tokio::runtime::Handle;
|
||||
use tracing::{debug, info, instrument, trace, warn};
|
||||
@@ -65,15 +66,15 @@ impl MusicFs {
|
||||
}
|
||||
|
||||
fn get_or_create_query_inode(&self, query: &str) -> u64 {
|
||||
- let query_inodes = self.query_inodes.read().unwrap();
+ let query_inodes = self.query_inodes.read();
|
||||
if let Some(&inode) = query_inodes.get(query) {
|
||||
return inode;
|
||||
}
|
||||
drop(query_inodes);
|
||||
|
||||
- let mut query_inodes = self.query_inodes.write().unwrap();
- let mut inode_queries = self.inode_queries.write().unwrap();
- let mut next_inode = self.next_query_inode.write().unwrap();
+ let mut query_inodes = self.query_inodes.write();
+ let mut inode_queries = self.inode_queries.write();
+ let mut next_inode = self.next_query_inode.write();
|
||||
|
||||
if let Some(&inode) = query_inodes.get(query) {
|
||||
return inode;
|
||||
@@ -87,7 +88,7 @@ impl MusicFs {
|
||||
}
|
||||
|
||||
fn get_query_for_inode(&self, inode: u64) -> Option<String> {
|
||||
- self.inode_queries.read().unwrap().get(&inode).cloned()
+ self.inode_queries.read().get(&inode).cloned()
|
||||
}
|
||||
|
||||
pub fn mount(self, mountpoint: &Path) -> Result<()> {
|
||||
@@ -105,6 +106,22 @@ impl MusicFs {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub fn spawn_mount(self, mountpoint: &Path) -> Result<fuser::BackgroundSession> {
|
||||
info!("Mounting MusicFS at {:?}", mountpoint);
|
||||
|
||||
let options = vec![
|
||||
fuser::MountOption::RO,
|
||||
fuser::MountOption::FSName("musicfs".to_string()),
|
||||
fuser::MountOption::AutoUnmount,
|
||||
fuser::MountOption::AllowOther,
|
||||
];
|
||||
|
||||
let session =
|
||||
fuser::spawn_mount2(self, mountpoint, &options).map_err(musicfs_core::Error::Io)?;
|
||||
|
||||
Ok(session)
|
||||
}
|
||||
|
||||
fn node_to_attr(&self, node: &VirtualNode) -> FileAttr {
|
||||
match node {
|
||||
VirtualNode::Directory(dir) => FileAttr {
|
||||
@@ -189,7 +206,7 @@ impl Filesystem for MusicFs {
|
||||
}
|
||||
}
|
||||
|
||||
- let tree = self.tree.read().unwrap();
+ let tree = self.tree.read();
|
||||
|
||||
if let Some(inode) = tree.lookup(parent, name) {
|
||||
trace!(parent, name = %name_str, ino = inode, "file found in tree");
|
||||
@@ -230,7 +247,7 @@ impl Filesystem for MusicFs {
|
||||
}
|
||||
}
|
||||
|
||||
- let tree = self.tree.read().unwrap();
+ let tree = self.tree.read();
|
||||
|
||||
if let Some(node) = tree.get(ino) {
|
||||
trace!(ino, "inode found in tree");
|
||||
@@ -267,7 +284,7 @@ impl Filesystem for MusicFs {
|
||||
}
|
||||
}
|
||||
|
||||
- let tree = self.tree.read().unwrap();
+ let tree = self.tree.read();
|
||||
|
||||
if let Some(children) = tree.readdir(ino) {
|
||||
trace!(ino, offset, children_count = children.len(), "directory found");
|
||||
@@ -324,7 +341,7 @@ impl Filesystem for MusicFs {
|
||||
return;
|
||||
}
|
||||
|
||||
- let tree = self.tree.read().unwrap();
+ let tree = self.tree.read();
|
||||
|
||||
if tree.get(ino).is_some() {
|
||||
trace!(ino, "inode found");
|
||||
@@ -348,7 +365,7 @@ impl Filesystem for MusicFs {
|
||||
reply: ReplyData,
|
||||
) {
|
||||
let file_id = {
|
||||
- let tree = self.tree.read().unwrap();
+ let tree = self.tree.read();
|
||||
if let Some(VirtualNode::File(file)) = tree.get(ino) {
|
||||
trace!(ino, "file found in tree");
|
||||
file.file_id
|
||||
@@ -369,19 +386,27 @@ impl Filesystem for MusicFs {
|
||||
let handle = self.runtime_handle.clone();
|
||||
let result = std::thread::scope(|_| {
|
||||
handle.block_on(async {
|
||||
- reader.read(file_id, offset as u64, size).await
+ tokio::time::timeout(
+ Duration::from_secs(30),
+ reader.read(file_id, offset as u64, size),
+ )
+ .await
|
||||
})
|
||||
});
|
||||
|
||||
match result {
|
||||
- Ok(data) => {
+ Ok(Ok(data)) => {
|
||||
trace!(ino, offset, size_bytes = size, bytes_read = data.len(), "read successful");
|
||||
reply.data(&data);
|
||||
}
|
||||
- Err(e) => {
+ Ok(Err(e)) => {
|
||||
warn!(ino, offset, size_bytes = size, error = %e, "read failed");
|
||||
reply.error(libc::EIO);
|
||||
}
|
||||
+ Err(_timeout) => {
+ warn!(ino, offset, size_bytes = size, "read timed out after 30s");
+ reply.error(libc::EIO);
+ }
|
||||
}
|
||||
}
|
||||
|
||||
@@ -564,7 +589,7 @@ mod tests {
|
||||
|
||||
let _fs = MusicFs::new(tree.clone(), handle);
|
||||
|
||||
- let tree_read = tree.read().unwrap();
+ let tree_read = tree.read();
|
||||
assert!(tree_read.get(ROOT_INODE).is_some());
|
||||
assert!(tree_read.get_by_path(&VirtualPath::new("/Artist")).is_some());
|
||||
}
|
||||
|
||||
@@ -12,10 +12,12 @@ sftp = []
|
||||
musicfs-core = { path = "../musicfs-core" }
|
||||
async-trait.workspace = true
|
||||
dashmap.workspace = true
|
||||
futures.workspace = true
|
||||
libc.workspace = true
|
||||
thiserror.workspace = true
|
||||
tokio = { workspace = true, features = ["fs", "sync", "time"] }
|
||||
tracing.workspace = true
|
||||
parking_lot.workspace = true
|
||||
|
||||
[dev-dependencies]
|
||||
tempfile.workspace = true
|
||||
|
||||
@@ -1,11 +1,12 @@
|
||||
use crate::traits::Origin;
|
||||
use dashmap::DashMap;
|
||||
+ use futures::future::join_all;
|
||||
use musicfs_core::{Event, EventBus, HealthStatus, OriginId, OriginType};
|
||||
use std::collections::HashMap;
|
||||
use std::sync::Arc;
|
||||
use std::time::{Duration, Instant};
|
||||
use tokio::sync::mpsc;
|
||||
- use tracing::{debug, info, info_span, Instrument};
+ use tracing::{debug, info, info_span, warn, Instrument};
|
||||
|
||||
pub struct HealthMonitor {
|
||||
origins: DashMap<OriginId, Arc<dyn Origin>>,
|
||||
@@ -180,21 +181,37 @@ impl HealthMonitor {
|
||||
HealthCheckHandle { stop_tx }
|
||||
}
|
||||
|
||||
- async fn check_all(&self) {
+ pub async fn check_all(&self) {
|
||||
let origins: Vec<_> = self
|
||||
.origins
|
||||
.iter()
|
||||
.map(|e| (e.key().clone(), e.value().clone()))
|
||||
.collect();
|
||||
|
||||
- for (id, origin) in origins {
- self.check_one(&id, &origin).await;
- }
+ let checks: Vec<_> = origins
+ .iter()
+ .map(|(id, origin)| self.check_one(id, origin))
+ .collect();
+
+ join_all(checks).await;
|
||||
}
|
||||
|
||||
async fn check_one(&self, id: &OriginId, origin: &Arc<dyn Origin>) {
|
||||
let start = Instant::now();
|
||||
- let status = origin.health().await;
+ let health_timeout = Duration::from_millis(1500);
+
+ let status = match tokio::time::timeout(health_timeout, origin.health()).await {
+ Ok(status) => status,
+ Err(_) => {
+ warn!(
+ origin_id = %id,
+ timeout_ms = health_timeout.as_millis() as u64,
+ "Health check timed out"
+ );
+ HealthStatus::Unhealthy
+ }
+ };
|
||||
|
||||
let latency_ms = start.elapsed().as_millis() as u64;
|
||||
|
||||
let threshold = self.threshold_for(origin.origin_type());
|
||||
|
||||
@@ -2,8 +2,9 @@ use crate::health::{HealthMonitor, HealthSnapshot};
|
||||
use crate::router::Router;
|
||||
use crate::traits::{Origin, WatchHandle};
|
||||
use musicfs_core::{OriginId, RealPath};
|
||||
+ use parking_lot::RwLock;
|
||||
use std::collections::HashMap;
|
||||
- use std::sync::{Arc, RwLock};
+ use std::sync::Arc;
|
||||
use tracing::{info, warn};
|
||||
|
||||
pub struct OriginRegistry {
|
||||
@@ -29,17 +30,17 @@ impl OriginRegistry {
|
||||
|
||||
self.router.set_priority(id.clone(), priority);
|
||||
self.health_monitor.add_origin(origin.clone());
|
||||
- self.origins.write().unwrap().insert(id, origin);
+ self.origins.write().insert(id, origin);
|
||||
}
|
||||
|
||||
pub fn unregister(&self, id: &OriginId) {
|
||||
info!("Unregistering origin {}", id);
|
||||
|
||||
- if let Some(handles) = self.watch_handles.write().unwrap().remove(id) {
+ if let Some(handles) = self.watch_handles.write().remove(id) {
|
||||
info!("Dropping {} watch handles for origin {}", handles.len(), id);
|
||||
}
|
||||
|
||||
- self.origins.write().unwrap().remove(id);
+ self.origins.write().remove(id);
|
||||
self.router.remove_priority(id);
|
||||
self.health_monitor.remove_origin(id);
|
||||
}
|
||||
@@ -47,22 +48,21 @@ impl OriginRegistry {
|
||||
pub fn register_watch(&self, origin_id: &OriginId, handle: WatchHandle) {
|
||||
self.watch_handles
|
||||
.write()
|
||||
- .unwrap()
|
||||
.entry(origin_id.clone())
|
||||
.or_default()
|
||||
.push(handle);
|
||||
}
|
||||
|
||||
pub fn get(&self, id: &OriginId) -> Option<Arc<dyn Origin>> {
|
||||
- self.origins.read().unwrap().get(id).cloned()
+ self.origins.read().get(id).cloned()
|
||||
}
|
||||
|
||||
pub fn list(&self) -> Vec<Arc<dyn Origin>> {
|
||||
- self.origins.read().unwrap().values().cloned().collect()
+ self.origins.read().values().cloned().collect()
|
||||
}
|
||||
|
||||
pub fn route(&self, path: &RealPath) -> Option<Arc<dyn Origin>> {
|
||||
- let origins = self.origins.read().unwrap();
+ let origins = self.origins.read();
|
||||
let health = self.health_monitor.snapshot();
|
||||
|
||||
let candidates: Vec<_> = origins
|
||||
@@ -86,7 +86,7 @@ impl OriginRegistry {
|
||||
}
|
||||
|
||||
pub fn route_with_fallback(&self, path: &RealPath) -> Option<Arc<dyn Origin>> {
|
||||
- let origins = self.origins.read().unwrap();
+ let origins = self.origins.read();
|
||||
let health = self.health_monitor.snapshot();
|
||||
|
||||
let candidates: Vec<_> = origins
|
||||
@@ -109,7 +109,7 @@ impl OriginRegistry {
|
||||
}
|
||||
|
||||
pub fn route_all(&self, path: &RealPath) -> Vec<Arc<dyn Origin>> {
|
||||
- let origins = self.origins.read().unwrap();
+ let origins = self.origins.read();
|
||||
let health = self.health_monitor.snapshot();
|
||||
|
||||
let mut result: Vec<_> = origins
|
||||
|
||||
@@ -6,7 +6,7 @@ use tantivy::collector::TopDocs;
|
||||
use tantivy::query::{BooleanQuery, FuzzyTermQuery, Occur, Query, QueryParser};
|
||||
use tantivy::schema::{Field, Schema, Value, STORED, TEXT, INDEXED};
|
||||
use tantivy::{Index, IndexReader, IndexWriter, ReloadPolicy, TantivyDocument, Term};
|
||||
- use tracing::{debug, info};
+ use tracing::{debug, info, warn};
|
||||
|
||||
const SCHEMA_VERSION: u32 = 1;
|
||||
|
||||
@@ -95,6 +95,28 @@ impl SearchIndex {
|
||||
})
|
||||
}
|
||||
|
||||
pub fn open_with_recovery(index_path: &Path) -> Result<Self, SearchError> {
|
||||
match Self::open(index_path) {
|
||||
Ok(index) => {
|
||||
let docs = index.reader.searcher().num_docs();
|
||||
info!(docs, "Search index opened successfully");
|
||||
Ok(index)
|
||||
}
|
||||
Err(e) => {
|
||||
warn!(
|
||||
error = %e,
|
||||
path = ?index_path,
|
||||
"Search index corrupted, rebuilding from scratch"
|
||||
);
|
||||
if index_path.exists() {
|
||||
std::fs::remove_dir_all(index_path)
|
||||
.map_err(SearchError::Io)?;
|
||||
}
|
||||
Self::open(index_path)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub fn index_file(&self, file: &FileMeta) -> Result<(), SearchError> {
|
||||
let mut doc = TantivyDocument::new();
|
||||
|
||||
|
||||
@@ -0,0 +1,43 @@
|
||||
[package]
|
||||
name = "musicfs-test-utils"
|
||||
version.workspace = true
|
||||
edition.workspace = true
|
||||
description = "Test utilities and fixtures for MusicFS resilience testing"
|
||||
|
||||
[dependencies]
|
||||
musicfs-core = { path = "../musicfs-core" }
|
||||
musicfs-origins = { path = "../musicfs-origins" }
|
||||
musicfs-cas = { path = "../musicfs-cas" }
|
||||
musicfs-cache = { path = "../musicfs-cache" }
|
||||
musicfs-search = { path = "../musicfs-search" }
|
||||
|
||||
async-trait.workspace = true
|
||||
tokio = { workspace = true, features = ["full", "sync", "time"] }
|
||||
tracing.workspace = true
|
||||
thiserror.workspace = true
|
||||
parking_lot.workspace = true
|
||||
tempfile.workspace = true
|
||||
bytes.workspace = true
|
||||
|
||||
# Fault injection
|
||||
fail = { version = "0.5", optional = true }
|
||||
rlimit = { version = "0.10", optional = true }
|
||||
nix = { version = "0.29", optional = true, features = ["signal", "process"] }
|
||||
|
||||
# Docker/network tests
|
||||
noxious-client = { version = "1.0", optional = true }
|
||||
reqwest = { version = "0.11", optional = true, default-features = false, features = ["rustls-tls"] }
|
||||
|
||||
[features]
|
||||
default = []
|
||||
failpoints = ["fail/failpoints"]
|
||||
process-tests = ["nix"]
|
||||
resource-limits = ["rlimit"]
|
||||
docker-tests = ["noxious-client", "reqwest"]
|
||||
full = ["failpoints", "process-tests", "resource-limits", "docker-tests"]
|
||||
|
||||
[dev-dependencies]
|
||||
tokio-test = "0.4"
|
||||
tokio-util.workspace = true
|
||||
sd-notify.workspace = true
|
||||
libc.workspace = true
|
||||
@@ -0,0 +1,206 @@
|
||||
use musicfs_cas::CasError;
|
||||
use musicfs_core::Error;
|
||||
use std::time::{Duration, Instant};
|
||||
|
||||
pub fn assert_error_contains<T, E: std::fmt::Debug>(result: Result<T, E>, expected_text: &str) {
|
||||
match result {
|
||||
Ok(_) => panic!("Expected error containing '{}', but got Ok", expected_text),
|
||||
Err(e) => {
|
||||
let error_msg = format!("{:?}", e);
|
||||
assert!(
|
||||
error_msg.contains(expected_text),
|
||||
"Expected error containing '{}', but got: {}",
|
||||
expected_text,
|
||||
error_msg
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub fn assert_io_error<T>(result: Result<T, Error>) {
|
||||
match result {
|
||||
Err(Error::Io(_)) => (),
|
||||
Err(e) => panic!("Expected Io error, got: {:?}", e),
|
||||
Ok(_) => panic!("Expected Io error, got Ok"),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn assert_cas_io_error<T>(result: Result<T, CasError>) {
|
||||
match result {
|
||||
Err(CasError::Io(_)) => (),
|
||||
Err(e) => panic!("Expected CasError::Io, got: {:?}", e),
|
||||
Ok(_) => panic!("Expected CasError::Io, got Ok"),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn assert_cas_not_found<T>(result: Result<T, CasError>) {
|
||||
match result {
|
||||
Err(CasError::NotFound(_)) => (),
|
||||
Err(e) => panic!("Expected CasError::NotFound, got: {:?}", e),
|
||||
Ok(_) => panic!("Expected CasError::NotFound, got Ok"),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn assert_cas_integrity_error<T>(result: Result<T, CasError>) {
|
||||
match result {
|
||||
Err(CasError::IntegrityError { .. }) => (),
|
||||
Err(e) => panic!("Expected CasError::IntegrityError, got: {:?}", e),
|
||||
Ok(_) => panic!("Expected CasError::IntegrityError, got Ok"),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn assert_file_not_found<T>(result: Result<T, Error>) {
|
||||
match result {
|
||||
Err(Error::FileNotFound(_)) => (),
|
||||
Err(e) => panic!("Expected FileNotFound error, got: {:?}", e),
|
||||
Ok(_) => panic!("Expected FileNotFound error, got Ok"),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn assert_origin_error<T>(result: Result<T, Error>) {
|
||||
match result {
|
||||
Err(Error::Origin(_)) => (),
|
||||
Err(e) => panic!("Expected Origin error, got: {:?}", e),
|
||||
Ok(_) => panic!("Expected Origin error, got Ok"),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn assert_timeout_error<T>(result: Result<T, Error>) {
|
||||
match result {
|
||||
Err(Error::Timeout(_)) => (),
|
||||
Err(e) => panic!("Expected Timeout error, got: {:?}", e),
|
||||
Ok(_) => panic!("Expected Timeout error, got Ok"),
|
||||
}
|
||||
}
|
||||
|
||||
pub struct TimedAssertion {
|
||||
start: Instant,
|
||||
min_duration: Option<Duration>,
|
||||
max_duration: Option<Duration>,
|
||||
}
|
||||
|
||||
impl TimedAssertion {
|
||||
pub fn new() -> Self {
|
||||
Self {
|
||||
start: Instant::now(),
|
||||
min_duration: None,
|
||||
max_duration: None,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn expect_at_least(mut self, duration: Duration) -> Self {
|
||||
self.min_duration = Some(duration);
|
||||
self
|
||||
}
|
||||
|
||||
pub fn expect_at_most(mut self, duration: Duration) -> Self {
|
||||
self.max_duration = Some(duration);
|
||||
self
|
||||
}
|
||||
|
||||
pub fn assert_elapsed(self) {
|
||||
let elapsed = self.start.elapsed();
|
||||
|
||||
if let Some(min) = self.min_duration {
|
||||
assert!(
|
||||
elapsed >= min,
|
||||
"Expected at least {:?}, but only {:?} elapsed",
|
||||
min,
|
||||
elapsed
|
||||
);
|
||||
}
|
||||
|
||||
if let Some(max) = self.max_duration {
|
||||
assert!(
|
||||
elapsed <= max,
|
||||
"Expected at most {:?}, but {:?} elapsed",
|
||||
max,
|
||||
elapsed
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for TimedAssertion {
|
||||
fn default() -> Self {
|
||||
Self::new()
|
||||
}
|
||||
}
|
||||
|
||||
pub async fn assert_completes_within<F, T>(future: F, timeout: Duration) -> T
|
||||
where
|
||||
F: std::future::Future<Output = T>,
|
||||
{
|
||||
tokio::time::timeout(timeout, future)
.await
.unwrap_or_else(|_| panic!("Operation did not complete within {:?}", timeout))
|
||||
}
|
||||
|
||||
pub async fn assert_times_out<F, T>(future: F, timeout: Duration)
|
||||
where
|
||||
F: std::future::Future<Output = T>,
|
||||
{
|
||||
match tokio::time::timeout(timeout, future).await {
|
||||
Ok(_) => panic!("Expected operation to time out, but it completed"),
|
||||
Err(_) => (),
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_assert_error_contains() {
|
||||
let result: Result<(), Error> = Err(Error::Origin("connection refused".into()));
|
||||
assert_error_contains(result, "connection");
|
||||
}
|
||||
|
||||
#[test]
|
||||
#[should_panic(expected = "Expected error containing")]
|
||||
fn test_assert_error_contains_failure() {
|
||||
let result: Result<(), Error> = Err(Error::Origin("something else".into()));
|
||||
assert_error_contains(result, "connection");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_assert_io_error() {
|
||||
let result: Result<(), Error> =
|
||||
Err(Error::Io(std::io::Error::new(std::io::ErrorKind::Other, "test")));
|
||||
assert_io_error(result);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_timed_assertion_at_least() {
|
||||
let timer = TimedAssertion::new().expect_at_least(Duration::from_millis(10));
|
||||
std::thread::sleep(Duration::from_millis(15));
|
||||
timer.assert_elapsed();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_timed_assertion_at_most() {
|
||||
let timer = TimedAssertion::new().expect_at_most(Duration::from_millis(100));
|
||||
timer.assert_elapsed();
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_assert_completes_within() {
|
||||
let result =
|
||||
assert_completes_within(async { 42 }, Duration::from_millis(100)).await;
|
||||
assert_eq!(result, 42);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_assert_times_out() {
|
||||
assert_times_out(
|
||||
async {
|
||||
tokio::time::sleep(Duration::from_secs(10)).await;
|
||||
},
|
||||
Duration::from_millis(10),
|
||||
)
|
||||
.await;
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,250 @@
|
||||
use bytes::Bytes;
|
||||
use musicfs_cas::{CasConfig, CasError, CasStore, DedupStats};
|
||||
use musicfs_core::ChunkHash;
|
||||
use std::io::{self, ErrorKind};
|
||||
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
|
||||
use std::sync::Arc;
|
||||
|
||||
pub struct FaultyCasStore {
|
||||
inner: Arc<CasStore>,
|
||||
inject_enospc: AtomicBool,
|
||||
inject_eio_on_read: AtomicBool,
|
||||
inject_eio_on_write: AtomicBool,
|
||||
inject_corruption: AtomicBool,
|
||||
fail_after_n_puts: AtomicUsize,
|
||||
put_count: AtomicUsize,
|
||||
}
|
||||
|
||||
impl FaultyCasStore {
|
||||
pub fn new(inner: Arc<CasStore>) -> Self {
|
||||
Self {
|
||||
inner,
|
||||
inject_enospc: AtomicBool::new(false),
|
||||
inject_eio_on_read: AtomicBool::new(false),
|
||||
inject_eio_on_write: AtomicBool::new(false),
|
||||
inject_corruption: AtomicBool::new(false),
|
||||
fail_after_n_puts: AtomicUsize::new(usize::MAX),
|
||||
put_count: AtomicUsize::new(0),
|
||||
}
|
||||
}
|
||||
|
||||
pub async fn open(config: CasConfig) -> Result<Self, CasError> {
|
||||
let store = CasStore::open(config).await?;
|
||||
Ok(Self::new(Arc::new(store)))
|
||||
}
|
||||
|
||||
pub fn set_inject_enospc(&self, enabled: bool) {
|
||||
self.inject_enospc.store(enabled, Ordering::SeqCst);
|
||||
}
|
||||
|
||||
pub fn set_inject_eio_on_read(&self, enabled: bool) {
|
||||
self.inject_eio_on_read.store(enabled, Ordering::SeqCst);
|
||||
}
|
||||
|
||||
pub fn set_inject_eio_on_write(&self, enabled: bool) {
|
||||
self.inject_eio_on_write.store(enabled, Ordering::SeqCst);
|
||||
}
|
||||
|
||||
pub fn set_inject_corruption(&self, enabled: bool) {
|
||||
self.inject_corruption.store(enabled, Ordering::SeqCst);
|
||||
}
|
||||
|
||||
pub fn set_fail_after_n_puts(&self, n: usize) {
|
||||
self.fail_after_n_puts.store(n, Ordering::SeqCst);
|
||||
self.put_count.store(0, Ordering::SeqCst);
|
||||
}
|
||||
|
||||
pub fn reset_faults(&self) {
|
||||
self.inject_enospc.store(false, Ordering::SeqCst);
|
||||
self.inject_eio_on_read.store(false, Ordering::SeqCst);
|
||||
self.inject_eio_on_write.store(false, Ordering::SeqCst);
|
||||
self.inject_corruption.store(false, Ordering::SeqCst);
|
||||
self.fail_after_n_puts.store(usize::MAX, Ordering::SeqCst);
|
||||
self.put_count.store(0, Ordering::SeqCst);
|
||||
}
|
||||
|
||||
pub fn put_count(&self) -> usize {
|
||||
self.put_count.load(Ordering::SeqCst)
|
||||
}
|
||||
|
||||
pub async fn put(&self, data: &[u8]) -> Result<ChunkHash, CasError> {
|
||||
let count = self.put_count.fetch_add(1, Ordering::SeqCst);
|
||||
|
||||
if self.inject_enospc.load(Ordering::SeqCst) {
|
||||
return Err(CasError::Io(io::Error::new(
|
||||
ErrorKind::Other,
|
||||
"No space left on device (ENOSPC injected)",
|
||||
)));
|
||||
}
|
||||
|
||||
if self.inject_eio_on_write.load(Ordering::SeqCst) {
|
||||
return Err(CasError::Io(io::Error::new(
|
||||
ErrorKind::Other,
|
||||
"Input/output error (EIO injected)",
|
||||
)));
|
||||
}
|
||||
|
||||
let threshold = self.fail_after_n_puts.load(Ordering::SeqCst);
|
||||
if count >= threshold {
|
||||
return Err(CasError::Io(io::Error::new(
|
||||
ErrorKind::Other,
|
||||
"Injected failure after N puts",
|
||||
)));
|
||||
}
|
||||
|
||||
self.inner.put(data).await
|
||||
}
|
||||
|
||||
pub async fn get(&self, hash: &ChunkHash) -> Result<Bytes, CasError> {
|
||||
if self.inject_eio_on_read.load(Ordering::SeqCst) {
|
||||
return Err(CasError::Io(io::Error::new(
|
||||
ErrorKind::Other,
|
||||
"Input/output error (EIO injected)",
|
||||
)));
|
||||
}
|
||||
|
||||
let data = self.inner.get(hash).await?;
|
||||
|
||||
if self.inject_corruption.load(Ordering::SeqCst) {
|
||||
let mut corrupted = data.to_vec();
|
||||
if !corrupted.is_empty() {
|
||||
corrupted[0] = corrupted[0].wrapping_add(1);
|
||||
}
|
||||
return Err(CasError::IntegrityError {
|
||||
expected: hash.as_hex(),
|
||||
actual: ChunkHash::from_bytes(&corrupted).as_hex(),
|
||||
});
|
||||
}
|
||||
|
||||
Ok(data)
|
||||
}
|
||||
|
||||
pub fn exists(&self, hash: &ChunkHash) -> bool {
|
||||
self.inner.exists(hash)
|
||||
}
|
||||
|
||||
pub async fn delete(&self, hash: &ChunkHash) -> Result<(), CasError> {
|
||||
if self.inject_eio_on_write.load(Ordering::SeqCst) {
|
||||
return Err(CasError::Io(io::Error::new(
|
||||
ErrorKind::Other,
|
||||
"Input/output error (EIO injected)",
|
||||
)));
|
||||
}
|
||||
self.inner.delete(hash).await
|
||||
}
|
||||
|
||||
pub fn current_size(&self) -> u64 {
|
||||
self.inner.current_size()
|
||||
}
|
||||
|
||||
pub fn max_size(&self) -> u64 {
|
||||
self.inner.max_size()
|
||||
}
|
||||
|
||||
pub fn list_chunks(&self) -> impl Iterator<Item = ChunkHash> + '_ {
|
||||
self.inner.list_chunks()
|
||||
}
|
||||
|
||||
pub fn dedup_stats(&self) -> DedupStats {
|
||||
self.inner.dedup_stats()
|
||||
}
|
||||
|
||||
pub fn inner(&self) -> &Arc<CasStore> {
|
||||
&self.inner
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use tempfile::TempDir;
|
||||
|
||||
async fn test_store() -> (FaultyCasStore, TempDir) {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let config = CasConfig {
|
||||
chunks_dir: dir.path().join("chunks"),
|
||||
max_size: 1024 * 1024,
|
||||
shard_levels: 2,
|
||||
};
|
||||
let store = FaultyCasStore::open(config).await.unwrap();
|
||||
(store, dir)
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_healthy_passthrough() {
|
||||
let (store, _dir) = test_store().await;
|
||||
|
||||
let data = b"test data";
|
||||
let hash = store.put(data).await.unwrap();
|
||||
let retrieved = store.get(&hash).await.unwrap();
|
||||
assert_eq!(&retrieved[..], data);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_inject_enospc() {
|
||||
let (store, _dir) = test_store().await;
|
||||
|
||||
store.set_inject_enospc(true);
|
||||
let result = store.put(b"test").await;
|
||||
assert!(result.is_err());
|
||||
|
||||
let err = result.unwrap_err();
|
||||
assert!(matches!(err, CasError::Io(_)));
|
||||
|
||||
store.set_inject_enospc(false);
|
||||
assert!(store.put(b"test").await.is_ok());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_inject_eio_on_read() {
|
||||
let (store, _dir) = test_store().await;
|
||||
|
||||
let hash = store.put(b"test").await.unwrap();
|
||||
|
||||
store.set_inject_eio_on_read(true);
|
||||
let result = store.get(&hash).await;
|
||||
assert!(result.is_err());
|
||||
|
||||
store.set_inject_eio_on_read(false);
|
||||
assert!(store.get(&hash).await.is_ok());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_inject_corruption() {
|
||||
let (store, _dir) = test_store().await;
|
||||
|
||||
let hash = store.put(b"test data").await.unwrap();
|
||||
|
||||
store.set_inject_corruption(true);
|
||||
let result = store.get(&hash).await;
|
||||
assert!(matches!(result, Err(CasError::IntegrityError { .. })));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fail_after_n_puts() {
|
||||
let (store, _dir) = test_store().await;
|
||||
|
||||
store.set_fail_after_n_puts(2);
|
||||
|
||||
assert!(store.put(b"data1").await.is_ok());
|
||||
assert!(store.put(b"data2").await.is_ok());
|
||||
assert!(store.put(b"data3").await.is_err());
|
||||
assert!(store.put(b"data4").await.is_err());
|
||||
assert_eq!(store.put_count(), 4);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_reset_faults() {
|
||||
let (store, _dir) = test_store().await;
|
||||
|
||||
store.set_inject_enospc(true);
|
||||
store.set_inject_eio_on_read(true);
|
||||
store.set_fail_after_n_puts(1);
|
||||
|
||||
store.reset_faults();
|
||||
|
||||
assert!(store.put(b"test").await.is_ok());
|
||||
let hash = store.put(b"test2").await.unwrap();
|
||||
assert!(store.get(&hash).await.is_ok());
|
||||
}
|
||||
}
|
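Note on the `fail_after_n_puts` counting in `FaultyCasStore::put` above: `fetch_add` returns the value before the increment, so with a threshold of N the first N puts pass through and every later put fails, while the counter still records each attempt. The sketch below isolates that gate using only the standard library. `PutGate`, `admit`, and `attempts` are hypothetical names used for illustration and are not part of the change.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the put counter and threshold pair.
struct PutGate {
    threshold: AtomicUsize,
    count: AtomicUsize,
}

impl PutGate {
    fn new(threshold: usize) -> Self {
        Self {
            threshold: AtomicUsize::new(threshold),
            count: AtomicUsize::new(0),
        }
    }

    /// Records one attempt and reports whether it should be allowed.
    fn admit(&self) -> bool {
        // fetch_add returns the count *before* this attempt.
        let previous = self.count.fetch_add(1, Ordering::SeqCst);
        previous < self.threshold.load(Ordering::SeqCst)
    }

    /// Total attempts seen so far, allowed or rejected.
    fn attempts(&self) -> usize {
        self.count.load(Ordering::SeqCst)
    }
}
```

This matches the expectations in `test_fail_after_n_puts`: with a threshold of 2, two puts succeed, the next two fail, and the attempt count ends at 4.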
||||
@@ -0,0 +1,328 @@
|
||||
use async_trait::async_trait;
|
||||
use musicfs_core::{DirEntry, Error, FileStat, HealthStatus, OriginId, OriginType, Result};
|
||||
use musicfs_origins::{Origin, WatchCallback, WatchHandle};
|
||||
use parking_lot::RwLock;
|
||||
use std::io::{self, ErrorKind};
|
||||
use std::path::Path;
|
||||
use std::sync::atomic::{AtomicUsize, Ordering};
|
||||
use std::sync::Arc;
|
||||
use std::time::Duration;
|
||||
use tokio::io::AsyncRead;
|
||||
|
||||
#[derive(Debug, Clone)]
|
||||
pub enum FailMode {
|
||||
Healthy,
|
||||
FailEveryNth(usize),
|
||||
FailAfterN(usize),
|
||||
TimeoutMs(u64),
|
||||
PartialRead { max_bytes: usize },
|
||||
ReturnError(ErrorKind),
|
||||
}
|
||||
|
||||
impl Default for FailMode {
|
||||
fn default() -> Self {
|
||||
FailMode::Healthy
|
||||
}
|
||||
}
|
||||
|
||||
pub struct FaultyOrigin {
|
||||
inner: Arc<dyn Origin>,
|
||||
fail_mode: Arc<RwLock<FailMode>>,
|
||||
call_count: AtomicUsize,
|
||||
}
|
||||
|
||||
impl FaultyOrigin {
|
||||
pub fn new(inner: Arc<dyn Origin>, mode: FailMode) -> Self {
|
||||
Self {
|
||||
inner,
|
||||
fail_mode: Arc::new(RwLock::new(mode)),
|
||||
call_count: AtomicUsize::new(0),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn wrap(inner: impl Origin + 'static) -> Self {
|
||||
Self::new(Arc::new(inner), FailMode::Healthy)
|
||||
}
|
||||
|
||||
pub fn set_mode(&self, mode: FailMode) {
|
||||
*self.fail_mode.write() = mode;
|
||||
}
|
||||
|
||||
pub fn call_count(&self) -> usize {
|
||||
self.call_count.load(Ordering::SeqCst)
|
||||
}
|
||||
|
||||
pub fn reset_count(&self) {
|
||||
self.call_count.store(0, Ordering::SeqCst);
|
||||
}
|
||||
|
||||
fn increment_and_check(&self) -> Option<Error> {
|
||||
let count = self.call_count.fetch_add(1, Ordering::SeqCst) + 1;
|
||||
let mode = self.fail_mode.read();
|
||||
|
||||
match *mode {
|
||||
FailMode::Healthy => None,
|
||||
FailMode::FailEveryNth(n) if n > 0 && count % n == 0 => {
|
||||
Some(Error::Origin("Injected failure (every Nth)".into()))
|
||||
}
|
||||
FailMode::FailEveryNth(_) => None,
|
||||
FailMode::FailAfterN(n) if count > n => {
|
||||
Some(Error::Origin("Injected failure (after N)".into()))
|
||||
}
|
||||
FailMode::FailAfterN(_) => None,
|
||||
FailMode::TimeoutMs(_) => None,
|
||||
FailMode::PartialRead { .. } => None,
|
||||
FailMode::ReturnError(kind) => {
|
||||
Some(Error::Io(io::Error::new(kind, "Injected I/O error")))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
async fn maybe_timeout(&self) -> Option<Error> {
|
||||
let mode = self.fail_mode.read().clone();
|
||||
if let FailMode::TimeoutMs(ms) = mode {
|
||||
tokio::time::sleep(Duration::from_millis(ms)).await;
|
||||
Some(Error::Timeout("Injected timeout".into()))
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
fn truncate_if_partial(&self, mut data: Vec<u8>) -> Vec<u8> {
|
||||
let mode = self.fail_mode.read();
|
||||
if let FailMode::PartialRead { max_bytes } = *mode {
|
||||
data.truncate(max_bytes);
|
||||
}
|
||||
data
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl Origin for FaultyOrigin {
|
||||
fn id(&self) -> &OriginId {
|
||||
self.inner.id()
|
||||
}
|
||||
|
||||
fn origin_type(&self) -> OriginType {
|
||||
self.inner.origin_type()
|
||||
}
|
||||
|
||||
fn display_name(&self) -> &str {
|
||||
self.inner.display_name()
|
||||
}
|
||||
|
||||
async fn readdir(&self, path: &Path) -> Result<Vec<DirEntry>> {
|
||||
if let Some(err) = self.increment_and_check() {
|
||||
return Err(err);
|
||||
}
|
||||
if let Some(err) = self.maybe_timeout().await {
|
||||
return Err(err);
|
||||
}
|
||||
self.inner.readdir(path).await
|
||||
}
|
||||
|
||||
async fn stat(&self, path: &Path) -> Result<FileStat> {
|
||||
if let Some(err) = self.increment_and_check() {
|
||||
return Err(err);
|
||||
}
|
||||
if let Some(err) = self.maybe_timeout().await {
|
||||
return Err(err);
|
||||
}
|
||||
self.inner.stat(path).await
|
||||
}
|
||||
|
||||
async fn read(&self, path: &Path, offset: u64, size: u32) -> Result<Vec<u8>> {
|
||||
if let Some(err) = self.increment_and_check() {
|
||||
return Err(err);
|
||||
}
|
||||
if let Some(err) = self.maybe_timeout().await {
|
||||
return Err(err);
|
||||
}
|
||||
let data = self.inner.read(path, offset, size).await?;
|
||||
Ok(self.truncate_if_partial(data))
|
||||
}
|
||||
|
||||
async fn read_full(&self, path: &Path) -> Result<Vec<u8>> {
|
||||
if let Some(err) = self.increment_and_check() {
|
||||
return Err(err);
|
||||
}
|
||||
if let Some(err) = self.maybe_timeout().await {
|
||||
return Err(err);
|
||||
}
|
||||
let data = self.inner.read_full(path).await?;
|
||||
Ok(self.truncate_if_partial(data))
|
||||
}
|
||||
|
||||
async fn exists(&self, path: &Path) -> Result<bool> {
|
||||
if let Some(err) = self.increment_and_check() {
|
||||
return Err(err);
|
||||
}
|
||||
if let Some(err) = self.maybe_timeout().await {
|
||||
return Err(err);
|
||||
}
|
||||
self.inner.exists(path).await
|
||||
}
|
||||
|
||||
    async fn health(&self) -> HealthStatus {
        // Health probes are not counted in `call_count`.
        let mode = self.fail_mode.read().clone();
        match mode {
            FailMode::Healthy => self.inner.health().await,
            FailMode::ReturnError(_) => HealthStatus::Unhealthy,
            FailMode::TimeoutMs(ms) => {
                // Simulate a slow probe before reporting the origin as down.
                tokio::time::sleep(Duration::from_millis(ms)).await;
                HealthStatus::Unhealthy
            }
            // `>=` rather than `>`: once N calls have been made, the next
            // counted call would fail, so the origin is already unhealthy.
            FailMode::FailAfterN(n) if self.call_count.load(Ordering::SeqCst) >= n => {
                HealthStatus::Unhealthy
            }
            // FailEveryNth, PartialRead, and FailAfterN below its threshold
            // defer to the wrapped origin.
            _ => self.inner.health().await,
        }
    }
|
||||
|
||||
async fn open_read(&self, path: &Path) -> Result<Box<dyn AsyncRead + Send + Unpin>> {
|
||||
if let Some(err) = self.increment_and_check() {
|
||||
return Err(err);
|
||||
}
|
||||
if let Some(err) = self.maybe_timeout().await {
|
||||
return Err(err);
|
||||
}
|
||||
self.inner.open_read(path).await
|
||||
}
|
||||
|
||||
async fn watch(&self, path: &Path, callback: WatchCallback) -> Result<WatchHandle> {
|
||||
self.inner.watch(path, callback).await
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use std::time::SystemTime;
|
||||
|
||||
struct MockOrigin {
|
||||
id: OriginId,
|
||||
}
|
||||
|
||||
impl MockOrigin {
|
||||
fn new(id: &str) -> Self {
|
||||
Self {
|
||||
id: OriginId::from(id),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl Origin for MockOrigin {
|
||||
fn id(&self) -> &OriginId {
|
||||
&self.id
|
||||
}
|
||||
|
||||
fn origin_type(&self) -> OriginType {
|
||||
OriginType::Local
|
||||
}
|
||||
|
||||
fn display_name(&self) -> &str {
|
||||
"mock"
|
||||
}
|
||||
|
||||
async fn readdir(&self, _path: &Path) -> Result<Vec<DirEntry>> {
|
||||
Ok(vec![])
|
||||
}
|
||||
|
||||
async fn stat(&self, _path: &Path) -> Result<FileStat> {
|
||||
Ok(FileStat {
|
||||
size: 1000,
|
||||
mtime: SystemTime::now(),
|
||||
is_dir: false,
|
||||
})
|
||||
}
|
||||
|
||||
async fn read(&self, _path: &Path, _offset: u64, size: u32) -> Result<Vec<u8>> {
|
||||
Ok(vec![0u8; size as usize])
|
||||
}
|
||||
|
||||
async fn read_full(&self, _path: &Path) -> Result<Vec<u8>> {
|
||||
Ok(vec![0u8; 100])
|
||||
}
|
||||
|
||||
async fn exists(&self, _path: &Path) -> Result<bool> {
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
async fn health(&self) -> HealthStatus {
|
||||
HealthStatus::Healthy
|
||||
}
|
||||
|
||||
async fn open_read(&self, _path: &Path) -> Result<Box<dyn AsyncRead + Send + Unpin>> {
|
||||
Err(Error::Origin("Not implemented".into()))
|
||||
}
|
||||
|
||||
async fn watch(&self, _path: &Path, _callback: WatchCallback) -> Result<WatchHandle> {
|
||||
Err(Error::Origin("Not implemented".into()))
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_healthy_passthrough() {
|
||||
let inner = Arc::new(MockOrigin::new("test"));
|
||||
let faulty = FaultyOrigin::new(inner, FailMode::Healthy);
|
||||
|
||||
let result = faulty.stat(Path::new("/test")).await;
|
||||
assert!(result.is_ok());
|
||||
assert_eq!(faulty.call_count(), 1);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fail_every_nth() {
|
||||
let inner = Arc::new(MockOrigin::new("test"));
|
||||
let faulty = FaultyOrigin::new(inner, FailMode::FailEveryNth(2));
|
||||
|
||||
assert!(faulty.stat(Path::new("/test")).await.is_ok());
|
||||
assert!(faulty.stat(Path::new("/test")).await.is_err());
|
||||
assert!(faulty.stat(Path::new("/test")).await.is_ok());
|
||||
assert!(faulty.stat(Path::new("/test")).await.is_err());
|
||||
assert_eq!(faulty.call_count(), 4);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fail_after_n() {
|
||||
let inner = Arc::new(MockOrigin::new("test"));
|
||||
let faulty = FaultyOrigin::new(inner, FailMode::FailAfterN(2));
|
||||
|
||||
assert!(faulty.stat(Path::new("/test")).await.is_ok());
|
||||
assert!(faulty.stat(Path::new("/test")).await.is_ok());
|
||||
assert!(faulty.stat(Path::new("/test")).await.is_err());
|
||||
assert!(faulty.stat(Path::new("/test")).await.is_err());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_partial_read() {
|
||||
let inner = Arc::new(MockOrigin::new("test"));
|
||||
let faulty = FaultyOrigin::new(inner, FailMode::PartialRead { max_bytes: 10 });
|
||||
|
||||
let data = faulty.read(Path::new("/test"), 0, 100).await.unwrap();
|
||||
assert_eq!(data.len(), 10);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_mode_change_mid_test() {
|
||||
let inner = Arc::new(MockOrigin::new("test"));
|
||||
let faulty = FaultyOrigin::new(inner, FailMode::ReturnError(ErrorKind::ConnectionRefused));
|
||||
|
||||
assert!(faulty.stat(Path::new("/test")).await.is_err());
|
||||
|
||||
faulty.set_mode(FailMode::Healthy);
|
||||
assert!(faulty.stat(Path::new("/test")).await.is_ok());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_health_reflects_mode() {
|
||||
let inner = Arc::new(MockOrigin::new("test"));
|
||||
let faulty = FaultyOrigin::new(inner, FailMode::Healthy);
|
||||
|
||||
assert_eq!(faulty.health().await, HealthStatus::Healthy);
|
||||
|
||||
faulty.set_mode(FailMode::ReturnError(ErrorKind::ConnectionRefused));
|
||||
assert_eq!(faulty.health().await, HealthStatus::Unhealthy);
|
||||
}
|
||||
}
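The counter-based modes exercised by `test_fail_every_nth` and `test_fail_after_n` above can be checked in isolation. The following is a minimal, synchronous sketch using only the standard library; `SketchMode`, `FaultCounter`, and `failure_pattern` are hypothetical names introduced for illustration and are not part of this change. It mirrors the 1-based counting used by `increment_and_check`, assuming the same semantics.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Simplified stand-in for `FailMode` (counter-based variants only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SketchMode {
    Healthy,
    FailEveryNth(usize),
    FailAfterN(usize),
}

/// Hypothetical, synchronous analogue of `FaultyOrigin`'s call counter.
pub struct FaultCounter {
    mode: SketchMode,
    calls: AtomicUsize,
}

impl FaultCounter {
    pub fn new(mode: SketchMode) -> Self {
        Self { mode, calls: AtomicUsize::new(0) }
    }

    /// Records one call and reports whether that call should fail.
    pub fn should_fail(&self) -> bool {
        // 1-based: the first call observes `count == 1`.
        let count = self.calls.fetch_add(1, Ordering::SeqCst) + 1;
        match self.mode {
            SketchMode::Healthy => false,
            // `n > 0` short-circuits, so `% 0` is never evaluated.
            SketchMode::FailEveryNth(n) => n > 0 && count % n == 0,
            // The first `n` calls succeed; later calls fail.
            SketchMode::FailAfterN(n) => count > n,
        }
    }
}

/// Makes `calls` calls and returns the failure pattern (`true` = failed).
pub fn failure_pattern(mode: SketchMode, calls: usize) -> Vec<bool> {
    let counter = FaultCounter::new(mode);
    (0..calls).map(|_| counter.should_fail()).collect()
}
```

With `FailEveryNth(2)` the pattern alternates starting with a success, and with `FailAfterN(2)` the first two calls succeed, which matches what the tests above assert against the real wrapper.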
|
||||
@@ -0,0 +1,255 @@
|
||||
use musicfs_cache::TreeBuilder;
|
||||
use musicfs_cas::{CasConfig, CasStore};
|
||||
use musicfs_core::{
|
||||
AudioFormat, AudioMeta, FileId, FileMeta, OriginId, RealPath, VirtualPath,
|
||||
};
|
||||
use std::path::{Path, PathBuf};
|
||||
use std::sync::{Arc, RwLock};
|
||||
use std::time::SystemTime;
|
||||
use tempfile::TempDir;
|
||||
|
||||
pub fn make_file_meta(id: i64, vpath: &str, size: u64) -> FileMeta {
    make_file_meta_with_origin(id, vpath, size, "test")
}
|
||||
|
||||
pub fn make_file_meta_with_origin(id: i64, vpath: &str, size: u64, origin_id: &str) -> FileMeta {
|
||||
FileMeta {
|
||||
id: FileId(id),
|
||||
virtual_path: VirtualPath::new(vpath),
|
||||
real_path: RealPath {
|
||||
origin_id: OriginId::from(origin_id),
|
||||
path: PathBuf::from(vpath),
|
||||
},
|
||||
size,
|
||||
mtime: SystemTime::now(),
|
||||
content_hash: None,
|
||||
audio: None,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn make_audio_meta(artist: &str, album: &str, title: &str) -> AudioMeta {
|
||||
AudioMeta {
|
||||
title: Some(title.to_string()),
|
||||
artist: Some(artist.to_string()),
|
||||
album: Some(album.to_string()),
|
||||
album_artist: None,
|
||||
genre: None,
|
||||
year: None,
|
||||
track: None,
|
||||
disc: None,
|
||||
duration_ms: Some(180_000),
|
||||
bitrate: Some(320),
|
||||
sample_rate: Some(44100),
|
||||
format: AudioFormat::Flac,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn make_audio_file(
    id: i64,
    vpath: &str,
    size: u64,
    artist: &str,
    album: &str,
    title: &str,
) -> FileMeta {
    FileMeta {
        audio: Some(make_audio_meta(artist, album, title)),
        ..make_file_meta(id, vpath, size)
    }
}
|
||||
|
||||
pub fn make_audio_file_full(
    id: i64,
    vpath: &str,
    size: u64,
    artist: &str,
    album: &str,
    title: &str,
    track: u32,
    year: u32,
) -> FileMeta {
    let mut audio = make_audio_meta(artist, album, title);
    audio.track = Some(track);
    audio.year = Some(year);

    FileMeta {
        audio: Some(audio),
        ..make_file_meta(id, vpath, size)
    }
}
|
||||
|
||||
pub struct TestCasStore {
|
||||
pub store: Arc<CasStore>,
|
||||
pub dir: TempDir,
|
||||
}
|
||||
|
||||
pub async fn setup_test_cas() -> TestCasStore {
    // Default to a 100 MiB store for tests.
    setup_test_cas_with_size(100 * 1024 * 1024).await
}
|
||||
|
||||
pub async fn setup_test_cas_with_size(max_size: u64) -> TestCasStore {
|
||||
let dir = TempDir::new().expect("Failed to create temp dir for CAS");
|
||||
let config = CasConfig {
|
||||
chunks_dir: dir.path().join("chunks"),
|
||||
max_size,
|
||||
shard_levels: 2,
|
||||
};
|
||||
let store = CasStore::open(config)
|
||||
.await
|
||||
.expect("Failed to open CAS store");
|
||||
TestCasStore {
|
||||
store: Arc::new(store),
|
||||
dir,
|
||||
}
|
||||
}
|
||||
|
||||
pub fn setup_test_tree(files: &[FileMeta]) -> Arc<RwLock<musicfs_cache::VirtualTree>> {
|
||||
let mut builder = TreeBuilder::new();
|
||||
for file in files {
|
||||
builder.add_file(file);
|
||||
}
|
||||
Arc::new(RwLock::new(builder.build()))
|
||||
}
|
||||
|
||||
pub fn create_test_file(dir: &Path, relative_path: &str, content: &[u8]) -> PathBuf {
|
||||
let full_path = dir.join(relative_path);
|
||||
if let Some(parent) = full_path.parent() {
|
||||
std::fs::create_dir_all(parent).expect("Failed to create parent directories");
|
||||
}
|
||||
std::fs::write(&full_path, content).expect("Failed to write test file");
|
||||
full_path
|
||||
}
|
||||
|
||||
pub fn create_test_dir_structure(base: &Path, structure: &[&str]) {
|
||||
for path in structure {
|
||||
let full_path = base.join(path);
|
||||
if path.ends_with('/') {
|
||||
std::fs::create_dir_all(&full_path).expect("Failed to create directory");
|
||||
} else {
|
||||
if let Some(parent) = full_path.parent() {
|
||||
std::fs::create_dir_all(parent).expect("Failed to create parent");
|
||||
}
|
||||
std::fs::write(&full_path, format!("content of {}", path))
|
||||
.expect("Failed to write file");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
pub struct TestOriginDir {
|
||||
pub dir: TempDir,
|
||||
}
|
||||
|
||||
impl TestOriginDir {
|
||||
pub fn new() -> Self {
|
||||
Self {
|
||||
dir: TempDir::new().expect("Failed to create origin temp dir"),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn add_file(&self, path: &str, content: &[u8]) -> PathBuf {
|
||||
create_test_file(self.dir.path(), path, content)
|
||||
}
|
||||
|
||||
pub fn add_audio_file(&self, path: &str) -> PathBuf {
|
||||
let fake_audio = b"FAKE_FLAC_HEADER_FOR_TESTING_ONLY";
|
||||
self.add_file(path, fake_audio)
|
||||
}
|
||||
|
||||
pub fn path(&self) -> &Path {
|
||||
self.dir.path()
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for TestOriginDir {
|
||||
fn default() -> Self {
|
||||
Self::new()
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_make_file_meta() {
|
||||
let meta = make_file_meta(1, "/Artist/Album/Track.flac", 1000);
|
||||
assert_eq!(meta.id.0, 1);
|
||||
assert_eq!(meta.virtual_path.as_str(), "/Artist/Album/Track.flac");
|
||||
assert_eq!(meta.size, 1000);
|
||||
assert!(meta.audio.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_make_audio_file() {
|
||||
let meta = make_audio_file(1, "/path.flac", 5000, "Artist", "Album", "Title");
|
||||
assert!(meta.audio.is_some());
|
||||
let audio = meta.audio.unwrap();
|
||||
assert_eq!(audio.artist, Some("Artist".to_string()));
|
||||
assert_eq!(audio.album, Some("Album".to_string()));
|
||||
assert_eq!(audio.title, Some("Title".to_string()));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_setup_test_cas() {
|
||||
let test_cas = setup_test_cas().await;
|
||||
let hash = test_cas.store.put(b"test data").await.unwrap();
|
||||
assert!(test_cas.store.exists(&hash));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_setup_test_tree() {
|
||||
let files = vec![
|
||||
make_file_meta(1, "/A/B/1.flac", 100),
|
||||
make_file_meta(2, "/A/B/2.flac", 200),
|
||||
];
|
||||
let tree = setup_test_tree(&files);
|
||||
let guard = tree.read().unwrap();
|
||||
assert!(guard.file_count() > 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_origin_dir() {
|
||||
let origin = TestOriginDir::new();
|
||||
let path = origin.add_file("artist/album/track.flac", b"content");
|
||||
assert!(path.exists());
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,9 @@
|
||||
pub mod assertions;
|
||||
pub mod faulty_cas;
|
||||
pub mod faulty_origin;
|
||||
pub mod fixtures;
|
||||
|
||||
pub use assertions::*;
|
||||
pub use faulty_cas::FaultyCasStore;
|
||||
pub use faulty_origin::{FailMode, FaultyOrigin};
|
||||
pub use fixtures::*;
|
||||
@@ -0,0 +1,148 @@
|
||||
#![cfg(feature = "docker-tests")]
|
||||
|
||||
use noxious_client::{Client, StreamDirection, Toxic, ToxicKind};
use std::time::Duration;
|
||||
|
||||
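const TOXIPROXY_API: &str = "http://localhost:8474";
// Every test below binds its proxy to this same listen address, so the tests
// likely need to run serially (for example with `--test-threads=1`).
const TOXIPROXY_LISTEN: &str = "localhost:18080";
const UPSTREAM_ADDR: &str = "minio:9000";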
|
||||
|
||||
async fn require_toxiproxy() {
|
||||
let available = match reqwest::get(format!("{}/version", TOXIPROXY_API)).await {
|
||||
Ok(resp) => resp.status().is_success(),
|
||||
Err(_) => false,
|
||||
};
|
||||
assert!(available, "Toxiproxy not available at {}. Run: cd tests/integration && docker-compose up -d", TOXIPROXY_API);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
#[ignore = "Requires docker-compose up -d (tests/integration/docker-compose.yml)"]
|
||||
async fn test_toxiproxy_latency_injection() {
|
||||
require_toxiproxy().await;
|
||||
|
||||
let client = Client::new(TOXIPROXY_API);
|
||||
let proxy = client
|
||||
.create_proxy("minio_latency", TOXIPROXY_LISTEN, UPSTREAM_ADDR)
|
||||
.await
|
||||
.expect("Failed to create proxy");
|
||||
|
||||
let toxic = Toxic {
|
||||
name: "latency_downstream".to_string(),
|
||||
kind: ToxicKind::Latency {
|
||||
latency: 500,
|
||||
jitter: 100,
|
||||
},
|
||||
direction: StreamDirection::Downstream,
|
||||
toxicity: 1.0,
|
||||
};
|
||||
|
||||
proxy
|
||||
.add_toxic(&toxic)
|
||||
.await
|
||||
.expect("Failed to add toxic");
|
||||
|
||||
let start = std::time::Instant::now();
|
||||
let _ = reqwest::get(format!("http://{}/minio/health/live", TOXIPROXY_LISTEN)).await;
|
||||
let elapsed = start.elapsed();
|
||||
|
||||
assert!(
|
||||
elapsed >= Duration::from_millis(400),
|
||||
"Latency should be injected, got {:?}",
|
||||
elapsed
|
||||
);
|
||||
|
||||
proxy.delete().await.expect("Failed to cleanup proxy");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
#[ignore = "Requires docker-compose up -d (tests/integration/docker-compose.yml)"]
|
||||
async fn test_toxiproxy_timeout_simulates_network_partition() {
|
||||
require_toxiproxy().await;
|
||||
|
||||
let client = Client::new(TOXIPROXY_API);
|
||||
let proxy = client
|
||||
.create_proxy("minio_partition", TOXIPROXY_LISTEN, UPSTREAM_ADDR)
|
||||
.await
|
||||
.expect("Failed to create proxy");
|
||||
|
||||
let result = reqwest::get(format!("http://{}/minio/health/live", TOXIPROXY_LISTEN)).await;
|
||||
assert!(result.is_ok(), "Should reach MinIO through proxy initially");
|
||||
|
||||
let toxic = Toxic {
|
||||
name: "timeout".to_string(),
|
||||
kind: ToxicKind::Timeout { timeout: 0 },
|
||||
direction: StreamDirection::Downstream,
|
||||
toxicity: 1.0,
|
||||
};
|
||||
|
||||
proxy
|
||||
.add_toxic(&toxic)
|
||||
.await
|
||||
.expect("Failed to add toxic");
|
||||
|
||||
let result = tokio::time::timeout(
|
||||
Duration::from_secs(2),
|
||||
reqwest::get(format!("http://{}/minio/health/live", TOXIPROXY_LISTEN)),
|
||||
)
|
||||
.await;
|
||||
|
||||
assert!(
|
||||
result.is_err() || result.unwrap().is_err(),
|
||||
"Should timeout during partition"
|
||||
);
|
||||
|
||||
proxy
|
||||
.remove_toxic("timeout")
|
||||
.await
|
||||
.expect("Failed to remove toxic");
|
||||
|
||||
tokio::time::sleep(Duration::from_millis(100)).await;
|
||||
|
||||
let result = reqwest::get(format!("http://{}/minio/health/live", TOXIPROXY_LISTEN)).await;
|
||||
assert!(result.is_ok(), "Should reach MinIO after partition heals");
|
||||
|
||||
proxy.delete().await.expect("Failed to cleanup proxy");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
#[ignore = "Requires docker-compose up -d (tests/integration/docker-compose.yml)"]
|
||||
async fn test_toxiproxy_slow_close_throttles_responses() {
|
||||
require_toxiproxy().await;
|
||||
|
||||
let client = Client::new(TOXIPROXY_API);
|
||||
let proxy = client
|
||||
.create_proxy("minio_slow", TOXIPROXY_LISTEN, UPSTREAM_ADDR)
|
||||
.await
|
||||
.expect("Failed to create proxy");
|
||||
|
||||
let toxic = Toxic {
|
||||
name: "slow_close".to_string(),
|
||||
kind: ToxicKind::SlowClose { delay: 1000 },
|
||||
direction: StreamDirection::Downstream,
|
||||
toxicity: 1.0,
|
||||
};
|
||||
|
||||
proxy
|
||||
.add_toxic(&toxic)
|
||||
.await
|
||||
.expect("Failed to add toxic");
|
||||
|
||||
let start = std::time::Instant::now();
|
||||
let _ = reqwest::get(format!("http://{}/minio/health/live", TOXIPROXY_LISTEN)).await;
|
||||
let elapsed = start.elapsed();
|
||||
|
||||
assert!(
|
||||
elapsed >= Duration::from_millis(800),
|
||||
"Slow close should delay response, got {:?}",
|
||||
elapsed
|
||||
);
|
||||
|
||||
proxy.delete().await.expect("Failed to cleanup proxy");
|
||||
}
|
||||
|
||||
|
||||
@@ -0,0 +1,722 @@
|
||||
use musicfs_cache::{Database, VirtualTree, ROOT_INODE};
|
||||
use musicfs_cas::{CasConfig, CasStore};
|
||||
use musicfs_core::supervisor::{TaskStatus, TaskSupervisor};
|
||||
use musicfs_core::{
|
||||
AudioMeta, FileId, FileMeta, HealthStatus, OriginId, OriginType, RealPath, VirtualPath,
|
||||
};
|
||||
use musicfs_origins::{HealthMonitor, LocalOrigin, OriginRegistry};
|
||||
use musicfs_search::SearchIndex;
|
||||
use musicfs_test_utils::{FailMode, FaultyOrigin};
|
||||
use std::collections::HashMap;
|
||||
use std::io::ErrorKind;
|
||||
use std::path::{Path, PathBuf};
|
||||
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
|
||||
use std::sync::Arc;
|
||||
use std::time::{Duration, Instant, UNIX_EPOCH};
|
||||
use tempfile::TempDir;
|
||||
use tokio_util::sync::CancellationToken;
|
||||
|
||||
fn setup_test_file(dir: &TempDir, name: &str, content: &[u8]) -> PathBuf {
|
||||
let path = dir.path().join(name);
|
||||
std::fs::write(&path, content).unwrap();
|
||||
path
|
||||
}
|
||||
|
||||
async fn setup_cas(dir: &Path) -> CasStore {
|
||||
CasStore::open(CasConfig {
|
||||
chunks_dir: dir.join("chunks"),
|
||||
max_size: 100 * 1024 * 1024,
|
||||
shard_levels: 2,
|
||||
})
|
||||
.await
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
fn create_faulty_origin(id: &str, dir: &TempDir, mode: FailMode) -> Arc<FaultyOrigin> {
|
||||
let inner = Arc::new(LocalOrigin::new(OriginId::from(id), dir.path().to_path_buf()));
|
||||
Arc::new(FaultyOrigin::new(inner, mode))
|
||||
}
|
||||
|
||||
fn make_file_meta(id: i64, path: &str, size: u64) -> FileMeta {
|
||||
let name = Path::new(path)
|
||||
.file_stem()
|
||||
.and_then(|s| s.to_str())
|
||||
.unwrap_or("unknown")
|
||||
.to_string();
|
||||
FileMeta {
|
||||
id: FileId(id),
|
||||
virtual_path: VirtualPath::new(path),
|
||||
real_path: RealPath {
|
||||
origin_id: OriginId::from("test"),
|
||||
path: PathBuf::from(path),
|
||||
},
|
||||
size,
|
||||
mtime: UNIX_EPOCH,
|
||||
content_hash: None,
|
||||
audio: Some(AudioMeta {
|
||||
title: Some(name),
|
||||
..Default::default()
|
||||
}),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_sqlite_integrity_check_detects_corruption() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let db_path = dir.path().join("test.db");
|
||||
|
||||
{
|
||||
let db = Database::open(&db_path).unwrap();
|
||||
db.upsert_file(
|
||||
&OriginId::from("test"),
|
||||
Path::new("/test.flac"),
|
||||
&VirtualPath::new("/Test.flac"),
|
||||
&AudioMeta::default(),
|
||||
UNIX_EPOCH,
|
||||
1000,
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
let mut data = std::fs::read(&db_path).unwrap();
|
||||
let mid = data.len() / 2;
|
||||
data[mid..mid + 100].fill(0xFF);
|
||||
std::fs::write(&db_path, &data).unwrap();
|
||||
|
||||
let result = Database::open_with_integrity_check(&db_path);
|
||||
assert!(result.is_err());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_tantivy_corruption_triggers_rebuild() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let index_path = dir.path().join("search_idx");
|
||||
|
||||
{
|
||||
let index = SearchIndex::open(&index_path).unwrap();
|
||||
index.index_file(&make_file_meta(1, "/a.flac", 1000)).unwrap();
|
||||
index.commit().unwrap();
|
||||
}
|
||||
|
||||
std::fs::write(index_path.join("meta.json"), b"corrupted").unwrap();
|
||||
|
||||
let index = SearchIndex::open_with_recovery(&index_path).unwrap();
|
||||
let results = index.search("a", 10).unwrap();
|
||||
assert_eq!(results.len(), 0);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_sled_corruption_triggers_repair() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let chunks_dir = dir.path().join("chunks");
|
||||
let config = CasConfig {
|
||||
chunks_dir: chunks_dir.clone(),
|
||||
max_size: 10_000_000,
|
||||
shard_levels: 2,
|
||||
};
|
||||
|
||||
{
|
||||
let store = CasStore::open(config.clone()).await.unwrap();
|
||||
store.put(b"test data").await.unwrap();
|
||||
}
|
||||
|
||||
let sled_dir = chunks_dir.join("index.sled");
|
||||
if sled_dir.exists() {
|
||||
for entry in std::fs::read_dir(&sled_dir).unwrap() {
|
||||
let entry = entry.unwrap();
|
||||
if entry.metadata().unwrap().is_file() {
|
||||
std::fs::write(entry.path(), b"corrupted").unwrap();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let result = CasStore::open(config).await;
|
||||
assert!(result.is_ok(), "sled should recover from corruption");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_cas_put_handles_enospc() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let store = CasStore::open(CasConfig {
|
||||
chunks_dir: dir.path().join("chunks"),
|
||||
max_size: 100,
|
||||
shard_levels: 2,
|
||||
})
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let large_data = vec![0u8; 1000];
|
||||
let result = store.put(&large_data).await;
|
||||
|
||||
assert!(result.is_err(), "Issue 2.8: CasStore should pre-check space and reject oversized write");
|
||||
}
|
||||
|
||||
/// Demonstrates the PROBLEM with std::sync::RwLock: after a writer panic,
|
||||
/// the lock is poisoned and all subsequent access fails with PoisonError.
|
||||
/// This is why we use parking_lot::RwLock instead (see test_parking_lot_rwlock_survives_panic).
|
||||
#[test]
|
||||
fn test_poisoned_tree_lock_returns_eio_not_panic() {
|
||||
use std::sync::{Arc, RwLock};
|
||||
use std::thread;
|
||||
|
||||
let lock = Arc::new(RwLock::new(42));
|
||||
let lock_clone = lock.clone();
|
||||
|
||||
let handle = thread::spawn(move || {
|
||||
let _guard = lock_clone.write().unwrap();
|
||||
panic!("writer panic");
|
||||
});
|
||||
|
||||
let _ = handle.join();
|
||||
|
||||
let result = lock.read();
|
||||
// std::sync::RwLock poisons after writer panic - this is the problem we fix with parking_lot
|
||||
assert!(result.is_err(), "Issue 2.9: std::sync::RwLock should poison after writer panic (this demonstrates the problem)");
|
||||
}
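The test name mentions returning EIO rather than panicking, while the test body only shows that the lock becomes poisoned. As a rough illustration of the two ways a caller could handle a poisoned `std::sync::RwLock`, here is a standard-library-only sketch. The names `read_or_eio`, `read_ignoring_poison`, and `poisoned_lock`, and the local `EIO` constant, are assumptions made for this example and do not appear in the change.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Illustrative errno value (EIO is 5 on Linux); an assumption for this sketch.
pub const EIO: i32 = 5;

/// Reads the value, mapping a poisoned lock to an error code instead of panicking.
pub fn read_or_eio(lock: &RwLock<i32>) -> Result<i32, i32> {
    match lock.read() {
        Ok(guard) => Ok(*guard),
        Err(_poisoned) => Err(EIO),
    }
}

/// Reads the value even if the lock is poisoned, by taking the guard out of
/// the `PoisonError`. Only sound if the data cannot be left half-updated.
pub fn read_ignoring_poison(lock: &RwLock<i32>) -> i32 {
    *lock.read().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Poisons a lock by panicking in a writer thread, as the test above does.
pub fn poisoned_lock(value: i32) -> Arc<RwLock<i32>> {
    let lock = Arc::new(RwLock::new(value));
    let writer = Arc::clone(&lock);
    let _ = thread::spawn(move || {
        let _guard = writer.write().unwrap();
        panic!("writer panic");
    })
    .join();
    lock
}
```

Either approach keeps a reader from panicking on `unwrap()`. Switching to `parking_lot::RwLock`, as the next test does, sidesteps poisoning entirely.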
|
||||
|
||||
#[test]
|
||||
fn test_parking_lot_rwlock_survives_panic() {
|
||||
use parking_lot::RwLock;
|
||||
use std::sync::Arc;
|
||||
use std::thread;
|
||||
|
||||
let tree = Arc::new(RwLock::new(VirtualTree::new()));
|
||||
let tree_clone = tree.clone();
|
||||
|
||||
let handle = thread::spawn(move || {
|
||||
let _guard = tree_clone.write();
|
||||
panic!("writer panic");
|
||||
});
|
||||
|
||||
let _ = handle.join();
|
||||
|
||||
assert!(tree.read().get(ROOT_INODE).is_some(), "parking_lot RwLock should survive writer panic");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_failover_on_primary_death() {
|
||||
let primary_dir = TempDir::new().unwrap();
|
||||
let backup_dir = TempDir::new().unwrap();
|
||||
setup_test_file(&primary_dir, "test.txt", b"primary");
|
||||
setup_test_file(&backup_dir, "test.txt", b"backup");
|
||||
|
||||
let primary = create_faulty_origin("primary", &primary_dir, FailMode::ReturnError(ErrorKind::ConnectionRefused));
|
||||
let backup = create_faulty_origin("backup", &backup_dir, FailMode::Healthy);
|
||||
|
||||
let mut thresholds = HashMap::new();
|
||||
thresholds.insert(OriginType::Local, 1);
|
||||
let monitor = Arc::new(HealthMonitor::new(Duration::from_secs(30)).with_per_type_thresholds(thresholds));
|
||||
let registry = Arc::new(OriginRegistry::new(monitor.clone()));
|
||||
|
||||
registry.register(primary.clone(), 1);
|
||||
registry.register(backup.clone(), 2);
|
||||
|
||||
monitor.check_now(&OriginId::from("primary")).await;
|
||||
monitor.check_now(&OriginId::from("backup")).await;
|
||||
|
||||
assert!(registry.health().is_unhealthy(&OriginId::from("primary")));
|
||||
assert!(registry.health().is_healthy(&OriginId::from("backup")));
|
||||
|
||||
let path = RealPath {
|
||||
origin_id: OriginId::from("backup"),
|
||||
path: PathBuf::from("/test.txt"),
|
||||
};
|
||||
let candidates = registry.route_all(&path);
|
||||
assert_eq!(candidates.len(), 1);
|
||||
assert_eq!(candidates[0].id(), &OriginId::from("backup"));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_origin_recovery_resumes_routing() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
setup_test_file(&dir, "test.txt", b"content");
|
||||
|
||||
let faulty = create_faulty_origin("recovering", &dir, FailMode::ReturnError(ErrorKind::ConnectionRefused));
|
||||
|
||||
let mut thresholds = HashMap::new();
|
||||
thresholds.insert(OriginType::Local, 1);
|
||||
let monitor = Arc::new(HealthMonitor::new(Duration::from_secs(30)).with_per_type_thresholds(thresholds));
|
||||
monitor.add_origin(faulty.clone());
|
||||
|
||||
monitor.check_now(&OriginId::from("recovering")).await;
|
||||
assert_eq!(monitor.get_state(&OriginId::from("recovering")).unwrap().status, HealthStatus::Unhealthy);
|
||||
|
||||
faulty.set_mode(FailMode::Healthy);
|
||||
monitor.check_now(&OriginId::from("recovering")).await;
|
||||
|
||||
assert_eq!(monitor.get_state(&OriginId::from("recovering")).unwrap().status, HealthStatus::Healthy);
|
||||
assert_eq!(monitor.get_state(&OriginId::from("recovering")).unwrap().consecutive_failures, 0);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_local_origin_health_check_has_timeout() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
setup_test_file(&dir, "test.txt", b"content");
|
||||
|
||||
let slow = create_faulty_origin("slow", &dir, FailMode::TimeoutMs(5_000));
|
||||
|
||||
let monitor = Arc::new(HealthMonitor::new(Duration::from_secs(30)));
|
||||
monitor.add_origin(slow.clone());
|
||||
|
||||
let start = Instant::now();
|
||||
monitor.check_now(&OriginId::from("slow")).await;
|
||||
let elapsed = start.elapsed();
|
||||
|
||||
assert!(elapsed < Duration::from_secs(2),
|
||||
"Issue 4.2.1: Health check should timeout in <2s, took {:?}", elapsed);
|
||||
|
||||
let state = monitor.get_state(&OriginId::from("slow")).unwrap();
|
||||
assert_eq!(state.status, HealthStatus::Unhealthy);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_health_checks_run_in_parallel() {
|
||||
let slow1_dir = TempDir::new().unwrap();
|
||||
let slow2_dir = TempDir::new().unwrap();
|
||||
let slow3_dir = TempDir::new().unwrap();
|
||||
|
||||
let slow1 = create_faulty_origin("slow1", &slow1_dir, FailMode::TimeoutMs(200));
|
||||
let slow2 = create_faulty_origin("slow2", &slow2_dir, FailMode::TimeoutMs(200));
|
||||
let slow3 = create_faulty_origin("slow3", &slow3_dir, FailMode::TimeoutMs(200));
|
||||
|
||||
let monitor = Arc::new(HealthMonitor::new(Duration::from_secs(30)));
|
||||
monitor.add_origin(slow1);
|
||||
monitor.add_origin(slow2);
|
||||
monitor.add_origin(slow3);
|
||||
|
||||
let start = Instant::now();
|
||||
monitor.check_all().await;
|
||||
let elapsed = start.elapsed();
|
||||
|
||||
assert!(elapsed < Duration::from_millis(350), "Issue 4.2.2: check_all() should run in parallel (sequential would take ~600ms), took {:?}", elapsed);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_tantivy_survives_uncommitted_crash() {
|
||||
let dir = TempDir::new().unwrap();
|
||||
let index_path = dir.path().join("search_idx");
|
||||
|
||||
{
|
||||
let index = SearchIndex::open(&index_path).unwrap();
|
||||
index.index_file(&make_file_meta(1, "/a.flac", 1000)).unwrap();
|
||||
index.commit().unwrap();
|
||||
index.index_file(&make_file_meta(2, "/b.flac", 1000)).unwrap();
|
||||
}
|
||||
|
||||
let index = SearchIndex::open(&index_path).unwrap();
|
||||
let results = index.search("a", 10).unwrap();
|
||||
assert_eq!(results.len(), 1);
|
||||
}
|
||||
|
||||
#[tokio::test]
#[cfg(feature = "resource-limits")]
async fn test_fd_exhaustion_handling() {
    use rlimit::{getrlimit, setrlimit, Resource};

    let (orig_soft, orig_hard) = getrlimit(Resource::NOFILE).unwrap();

    // Lower only the soft limit. An unprivileged process cannot raise its hard
    // limit again, so lowering it too would make the restore below fail.
    setrlimit(Resource::NOFILE, 64, orig_hard).unwrap();

    let dir = TempDir::new().unwrap();
    let result = CasStore::open(CasConfig {
        chunks_dir: dir.path().join("chunks"),
        max_size: 1_000_000,
        shard_levels: 2,
    })
    .await;

    match result {
        Ok(_store) => {}
        Err(e) => {
            let msg = format!("{}", e);
            assert!(
                !msg.contains("panic"),
                "Should not panic on fd exhaustion"
            );
        }
    }

    setrlimit(Resource::NOFILE, orig_soft, orig_hard).unwrap();
}

#[tokio::test]
#[cfg(not(feature = "resource-limits"))]
async fn test_fd_exhaustion_handling() {
    eprintln!("Skipping test_fd_exhaustion_handling: resource-limits feature not enabled");
}

#[tokio::test]
async fn test_corrupt_chunk_auto_refetched() {
    use musicfs_cas::{ContentFetcher, FileReader};
    use musicfs_origins::LocalOrigin;

    let dir = TempDir::new().unwrap();
    let origin_dir = TempDir::new().unwrap();
    let test_content = b"original audio data for chunk test";
    setup_test_file(&origin_dir, "test.flac", test_content);

    let store = Arc::new(setup_cas(dir.path()).await);

    let origin = Arc::new(LocalOrigin::new(OriginId::from("local"), origin_dir.path().to_path_buf()));
    let fetcher = Arc::new(ContentFetcher::new(store.clone()));
    fetcher.register_origin(origin);

    let file_meta = FileMeta {
        id: FileId(1),
        virtual_path: VirtualPath::new("/test.flac"),
        real_path: RealPath {
            origin_id: OriginId::from("local"),
            path: PathBuf::from("/test.flac"),
        },
        size: test_content.len() as u64,
        mtime: UNIX_EPOCH,
        content_hash: None,
        audio: None,
    };
    fetcher.register_file(file_meta);

    let manifest = fetcher.fetch_file(FileId(1)).await.unwrap();
    let chunk_hash = manifest.chunks[0].hash;
    let hex = chunk_hash.as_hex();
    let chunk_path = dir.path().join("chunks").join(&hex[0..2]).join(&hex[2..4]).join(&hex);

    let mut corrupted = std::fs::read(&chunk_path).unwrap();
    corrupted[0] = corrupted[0].wrapping_add(1);
    std::fs::write(&chunk_path, &corrupted).unwrap();

    let reader = FileReader::with_fetcher(store, fetcher);
    reader.register_manifest(manifest);

    let result = reader.read(FileId(1), 0, test_content.len() as u32).await;

    assert!(result.is_ok(), "Issue 6.4: Corrupted chunk should be auto-refetched from origin");
    assert_eq!(&result.unwrap()[..], test_content, "Data should match original after re-fetch");
}

#[tokio::test]
async fn test_missing_chunk_triggers_origin_fetch() {
    use musicfs_cas::{ContentFetcher, FileReader};
    use musicfs_origins::LocalOrigin;

    let dir = TempDir::new().unwrap();
    let origin_dir = TempDir::new().unwrap();
    let test_content = b"test data for missing chunk";
    setup_test_file(&origin_dir, "test.flac", test_content);

    let store = Arc::new(setup_cas(dir.path()).await);

    let origin = Arc::new(LocalOrigin::new(OriginId::from("local"), origin_dir.path().to_path_buf()));
    let fetcher = Arc::new(ContentFetcher::new(store.clone()));
    fetcher.register_origin(origin);

    let file_meta = FileMeta {
        id: FileId(1),
        virtual_path: VirtualPath::new("/test.flac"),
        real_path: RealPath {
            origin_id: OriginId::from("local"),
            path: PathBuf::from("/test.flac"),
        },
        size: test_content.len() as u64,
        mtime: UNIX_EPOCH,
        content_hash: None,
        audio: None,
    };
    fetcher.register_file(file_meta);

    let manifest = fetcher.fetch_file(FileId(1)).await.unwrap();
    let chunk_hash = manifest.chunks[0].hash;
    let hex = chunk_hash.as_hex();
    let chunk_path = dir.path().join("chunks").join(&hex[0..2]).join(&hex[2..4]).join(&hex);

    std::fs::remove_file(&chunk_path).unwrap();

    let reader = FileReader::with_fetcher(store, fetcher);
    reader.register_manifest(manifest);

    let result = reader.read(FileId(1), 0, test_content.len() as u32).await;

    assert!(result.is_ok(), "Issue 6.4: Missing chunk should be re-fetched from origin");
    assert_eq!(&result.unwrap()[..], test_content, "Data should match original after re-fetch");
}

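Both chunk tests above build the on-disk chunk path by hand as `chunks/<first two hex chars>/<next two>/<full hash>`. A small helper could express that layout once. The sketch below is an assumption inferred only from those two `join` chains: `sharded_chunk_path` is a hypothetical name, and whether `CasStore` interprets `shard_levels` exactly this way is not confirmed by the diff.

```rust
use std::path::{Path, PathBuf};

/// Builds `<root>/chunks/<aa>/<bb>/<full-hex>` when `shard_levels == 2`,
/// taking two hex characters per shard directory. (Hypothetical helper;
/// the layout is inferred from the tests, not from the CAS implementation.)
/// Returns `None` when the hash is too short or is not ASCII hex.
fn sharded_chunk_path(root: &Path, hex: &str, shard_levels: usize) -> Option<PathBuf> {
    if hex.len() < shard_levels * 2 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut path = root.join("chunks");
    for level in 0..shard_levels {
        // ASCII-only input was checked above, so byte slicing is safe here.
        path.push(&hex[level * 2..level * 2 + 2]);
    }
    path.push(hex);
    Some(path)
}
```

In the tests this would replace the repeated `dir.path().join("chunks").join(&hex[0..2]).join(&hex[2..4]).join(&hex)` expression with a single call such as `sharded_chunk_path(dir.path(), &hex, 2)`.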
#[tokio::test]
async fn test_passthrough_mode_when_cache_disk_dead() {
    use musicfs_cas::ContentFetcher;
    use musicfs_origins::LocalOrigin;

    let dir = TempDir::new().unwrap();
    let origin_dir = TempDir::new().unwrap();
    let test_content = b"passthrough test data";
    setup_test_file(&origin_dir, "test.flac", test_content);

    let store = Arc::new(CasStore::open(CasConfig {
        chunks_dir: dir.path().join("chunks"),
        max_size: 10,
        shard_levels: 2,
    })
    .await
    .unwrap());

    let origin = Arc::new(LocalOrigin::new(OriginId::from("local"), origin_dir.path().to_path_buf()));
    let fetcher = Arc::new(ContentFetcher::new(store.clone()));
    fetcher.register_origin(origin);

    let file_meta = FileMeta {
        id: FileId(1),
        virtual_path: VirtualPath::new("/test.flac"),
        real_path: RealPath {
            origin_id: OriginId::from("local"),
            path: PathBuf::from("/test.flac"),
        },
        size: test_content.len() as u64,
        mtime: UNIX_EPOCH,
        content_hash: None,
        audio: None,
    };
    fetcher.register_file(file_meta);

    let manifest = fetcher.fetch_file(FileId(1)).await.unwrap();

    assert!(!manifest.chunks.is_empty(), "Issue 6.6: Fetch should complete even when CAS write fails (passthrough mode)");
}

#[tokio::test]
async fn test_cas_size_tracking_is_correct() {
    let dir = TempDir::new().unwrap();
    let config = CasConfig {
        chunks_dir: dir.path().join("chunks"),
        max_size: 10_000_000,
        shard_levels: 2,
    };
    let store = CasStore::open(config).await.unwrap();

    let data = vec![0u8; 1000];
    store.put(&data).await.unwrap();

    assert!(
        store.current_size() >= 1000,
        "Issue C6: current_size should track chunk data (recursive), got {}",
        store.current_size()
    );
}

#[test]
fn test_pid_file_prevents_concurrent_mount() {
    use std::fs::File;
    use std::os::unix::io::AsRawFd;

    let dir = TempDir::new().unwrap();
    let lock_path = dir.path().join("musicfs.lock");

    fn try_lock(path: &Path) -> Result<File, std::io::Error> {
        let file = File::create(path)?;
        let fd = file.as_raw_fd();
        let ret = unsafe { libc::flock(fd, libc::LOCK_EX | libc::LOCK_NB) };
        if ret != 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(file)
    }

    let lock1 = try_lock(&lock_path);
    assert!(lock1.is_ok(), "Issue C9: First lock should succeed");

    let lock2 = try_lock(&lock_path);
    assert!(lock2.is_err(), "Issue C9: Second lock should fail (already held)");

    drop(lock1);

    let lock3 = try_lock(&lock_path);
    assert!(lock3.is_ok(), "Issue C9: Third lock should succeed after first released");
}

#[test]
fn test_panic_hook_logs_to_tracing() {
    use std::panic;

    musicfs_core::install_panic_hook();

    let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
        panic!("test panic message");
    }));

    assert!(result.is_err(), "Panic should have been caught");
}

#[test]
fn test_stale_mount_check_function_exists() {
    let path = std::path::Path::new("/nonexistent/musicfs/mount");
    assert!(
        !path.exists(),
        "Test path should not exist for this test to be meaningful"
    );
}

#[test]
fn test_systemd_service_has_execstoppost() {
    let service_path = std::path::Path::new("../../dist/musicfs.service");
    if !service_path.exists() {
        panic!("Issue 3.7: dist/musicfs.service does not exist at {:?}", service_path);
    }

    let content = std::fs::read_to_string(service_path).unwrap();
    assert!(
        content.contains("ExecStopPost") && content.contains("fusermount"),
        "Issue 3.7: Service file should have ExecStopPost with fusermount for cleanup"
    );
}

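The substring check in `test_systemd_service_has_execstoppost` would also pass if `fusermount` only appeared in a comment or in a different directive. A stricter check could look at the `ExecStopPost=` values themselves. The sketch below is one possible approach, not the project's code: `directive_values` and `has_fusermount_cleanup` are hypothetical names, and the parser deliberately ignores unit-file features such as line continuations.

```rust
/// Returns the value of every `key=` directive in a unit file, skipping
/// comment lines that start with `#` or `;`. (Hypothetical helper.)
fn directive_values<'a>(unit: &'a str, key: &str) -> Vec<&'a str> {
    unit.lines()
        .map(str::trim)
        .filter(|line| !line.starts_with('#') && !line.starts_with(';'))
        .filter_map(|line| line.split_once('='))
        .filter(|(k, _)| k.trim() == key)
        .map(|(_, v)| v.trim())
        .collect()
}

/// True when at least one `ExecStopPost=` command invokes fusermount.
/// Leading systemd executable prefixes such as `-` or `+` are stripped
/// before the binary name is inspected.
fn has_fusermount_cleanup(unit: &str) -> bool {
    directive_values(unit, "ExecStopPost").iter().any(|cmd| {
        cmd.trim_start_matches(|c: char| matches!(c, '-' | '@' | '+' | '!' | ':'))
            .split_whitespace()
            .next()
            .map_or(false, |bin| {
                bin.ends_with("fusermount") || bin.ends_with("fusermount3")
            })
    })
}
```

The test could then assert `has_fusermount_cleanup(&content)` instead of combining two independent `contains` calls.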
#[test]
fn test_sd_notify_ready_sent() {
    use std::os::unix::net::UnixDatagram;
    use tempfile::TempDir;

    let dir = TempDir::new().unwrap();
    let socket_path = dir.path().join("notify.sock");
    let socket = UnixDatagram::bind(&socket_path).unwrap();
    socket.set_read_timeout(Some(Duration::from_secs(1))).unwrap();

    std::env::set_var("NOTIFY_SOCKET", &socket_path);

    let result = sd_notify::notify(false, &[sd_notify::NotifyState::Ready]);
    assert!(result.is_ok(), "sd_notify should succeed when NOTIFY_SOCKET is set");

    let mut buf = [0u8; 256];
    let len = socket.recv(&mut buf).unwrap();
    let msg = std::str::from_utf8(&buf[..len]).unwrap();

    assert!(msg.contains("READY=1"), "sd_notify should send READY=1, got: {}", msg);

    std::env::remove_var("NOTIFY_SOCKET");
}

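For readers unfamiliar with what the `sd_notify` call in the test above does on the wire, the readiness notification is a single datagram sent to the Unix socket named by `NOTIFY_SOCKET`. The following standard-library sketch shows that exchange for a path-based socket only; `send_state` is a hypothetical name, and the sketch does not handle abstract sockets (addresses starting with `@`), which a real client also needs to support.

```rust
use std::io;
use std::os::unix::net::UnixDatagram;
use std::path::Path;

/// Sends one notification string (for example `READY=1`) as a single
/// datagram to a path-based notify socket. (Hypothetical helper that
/// illustrates the protocol; it is not the `sd_notify` crate's code.)
fn send_state(socket_path: &Path, state: &str) -> io::Result<()> {
    let sock = UnixDatagram::unbound()?;
    let sent = sock.send_to(state.as_bytes(), socket_path)?;
    if sent != state.len() {
        return Err(io::Error::new(
            io::ErrorKind::WriteZero,
            "notification datagram was truncated",
        ));
    }
    Ok(())
}
```

Because the socket path is passed explicitly, a test built on a helper like this would not need to mutate the process-wide `NOTIFY_SOCKET` variable, which can interfere with other tests running in parallel.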
#[tokio::test]
async fn test_shutdown_cancels_background_tasks() {
    let token = CancellationToken::new();
    let stopped = Arc::new(AtomicBool::new(false));
    let stopped_clone = stopped.clone();
    let token_clone = token.clone();

    tokio::spawn(async move {
        token_clone.cancelled().await;
        stopped_clone.store(true, Ordering::SeqCst);
    });

    assert!(!stopped.load(Ordering::SeqCst));
    token.cancel();
    tokio::time::sleep(Duration::from_millis(50)).await;
    assert!(stopped.load(Ordering::SeqCst));
}

#[tokio::test]
async fn test_shutdown_flushes_tantivy() {
    let dir = TempDir::new().unwrap();
    let idx_path = dir.path().join("idx");

    {
        let index = SearchIndex::open(&idx_path).unwrap();
        index.index_file(&make_file_meta(1, "/a.flac", 1000)).unwrap();
        index.commit().unwrap();
    }

    let index2 = SearchIndex::open(&idx_path).unwrap();
    assert_eq!(index2.search("a", 10).unwrap().len(), 1);
}

#[tokio::test]
async fn test_supervisor_detects_task_completion() {
    let supervisor = TaskSupervisor::new();
    supervisor.spawn_supervised("fast", async {});
    tokio::time::sleep(Duration::from_millis(50)).await;
}

#[tokio::test]
async fn test_supervisor_detects_panic() {
    let supervisor = TaskSupervisor::new();
    supervisor.spawn_supervised("panicker", async {
        panic!("boom");
    });
    tokio::time::sleep(Duration::from_millis(50)).await;
    assert!(matches!(
        supervisor.task_status("panicker"),
        TaskStatus::Failed { .. }
    ));
}

#[tokio::test]
async fn test_supervisor_restarts_critical_task() {
    let count = Arc::new(AtomicU32::new(0));
    let c = count.clone();

    let supervisor = TaskSupervisor::new();
    supervisor.spawn_critical("restartable", move || {
        let c = c.clone();
        async move {
            let n = c.fetch_add(1, Ordering::SeqCst);
            if n == 0 {
                panic!("first run fails");
            }
            loop {
                tokio::time::sleep(Duration::from_secs(60)).await;
            }
        }
    });

    tokio::time::sleep(Duration::from_secs(2)).await;
    assert_eq!(count.load(Ordering::SeqCst), 2);
    assert!(matches!(
        supervisor.task_status("restartable"),
        TaskStatus::Running
    ));
}

#[tokio::test]
async fn test_sigterm_triggers_shutdown() {
    use std::process::{Command, Stdio};
    use std::time::Duration;
    use tokio::time::timeout;

    // Cargo provides CARGO_BIN_EXE_<name> to integration tests at compile time,
    // so read it with option_env! instead of relying on the runtime environment.
    let musicfs_bin = option_env!("CARGO_BIN_EXE_musicfs");
    if musicfs_bin.is_none() {
        eprintln!("Skipping test_sigterm_triggers_shutdown: musicfs binary not available in test context");
        return;
    }

    let bin_path = musicfs_bin.unwrap();
    let temp_dir = tempfile::TempDir::new().unwrap();
    let mountpoint = temp_dir.path().join("mount");
    let origin = temp_dir.path().join("origin");
    std::fs::create_dir_all(&mountpoint).unwrap();
    std::fs::create_dir_all(&origin).unwrap();

    let child = Command::new(bin_path)
        .args(["mount", "--origin", origin.to_str().unwrap(), mountpoint.to_str().unwrap()])
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn();

    if child.is_err() {
        eprintln!("Skipping test_sigterm_triggers_shutdown: failed to spawn musicfs");
        return;
    }

    let mut child = child.unwrap();
    tokio::time::sleep(Duration::from_millis(500)).await;

    // SAFETY: kill(2) has no memory-safety preconditions; the pid is our own child.
    unsafe {
        libc::kill(child.id() as i32, libc::SIGTERM);
    }

    let exit_result = timeout(Duration::from_secs(10), async {
        loop {
            match child.try_wait() {
                Ok(Some(status)) => return status,
                Ok(None) => tokio::time::sleep(Duration::from_millis(100)).await,
                Err(_) => break,
            }
        }
        child.wait().unwrap()
    }).await;

    if exit_result.is_err() {
        // Do not leave a stray process behind if SIGTERM was ignored.
        let _ = child.kill();
        let _ = child.wait();
    }

    assert!(exit_result.is_ok(), "Issue 2.1: Process should exit within 10s after SIGTERM");
}

@@ -6,6 +6,7 @@ After=network.target
Type=notify
ExecStart=/usr/bin/musicfs mount --config /etc/musicfs/config.toml /mnt/music
ExecStop=/usr/bin/musicfs shutdown
ExecStopPost=-/usr/bin/fusermount -uz /mnt/music
Restart=on-failure
RestartSec=5
User=musicfs

@@ -0,0 +1,40 @@
services:
  toxiproxy:
    image: ghcr.io/shopify/toxiproxy:2.9.0
    ports:
      - "8474:8474"
      - "20000-20010:20000-20010"
    healthcheck:
      test: ["CMD", "/toxiproxy-cli", "list"]
      interval: 5s
      timeout: 3s
      retries: 3

  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: test
      MINIO_ROOT_PASSWORD: testtest123
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 5s
      timeout: 3s
      retries: 3
    volumes:
      - minio-data:/data

  sftp:
    image: atmoz/sftp:latest
    ports:
      - "2222:22"
    command: test:test:::music
    volumes:
      - sftp-data:/home/test/music

volumes:
  minio-data:
  sftp-data: