Files
MusicFS/docs/v2/plans/logging-comprehensive.md
Alexander 5ac33987c0 Add comprehensive logging with tracing, file rotation, and systemd integration
- Add tracing-appender and tracing-journald for production logging
- Add LoggingConfig with trace_sample_rate, json_output, journald options
- Expand init_logging() with file rotation, journald, and stderr layers
- Add sanitize_path() helper for PII protection in logs
- Instrument FUSE operations with #[instrument] and trace decision points
- Instrument gRPC handlers (10 methods) with span correlation
- Add spawn instrumentation for health monitor, indexer, watcher tasks
- Add broadcast lag handling (RecvError::Lagged) in event subscribers
- Fix webhook.rs expect() calls with proper error handling
- Add logging to patterns.rs, collections.rs, artwork.rs database ops
- Add Drop impl logging for PluginManager and WatchHandle
- Update systemd service with rate limiting and journal output
- Add logrotate config and example config.toml with logging section
2026-05-13 11:21:51 +02:00

35 KiB

Comprehensive Logging Plan

Goal: Add production-grade logging with trace-level observability, file rotation, and systemd integration
Effort: ~10-12 hours
Dependencies: Existing libraries only (no custom code)

Review Status: Reviewed by Oracle - all gaps addressed


Libraries Used

Need Library Status
Instrumentation tracing Already in workspace
Subscriber/filtering tracing-subscriber Already in workspace
File rotation tracing-appender Add to workspace
systemd journal tracing-journald Add to workspace
Compression logrotate (Linux tool) Config file only

Phase 1: Config & Dependencies (2 hours)

1.1 Add dependencies to workspace

# Cargo.toml [workspace.dependencies]
tracing-appender = "0.2"
tracing-journald = "0.3"
# crates/musicfs-cli/Cargo.toml
tracing-appender.workspace = true
tracing-journald.workspace = true

1.2 Add LoggingConfig to config.rs

// crates/musicfs-core/src/config.rs

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Config {
    pub mount_point: PathBuf,
    pub cache_dir: PathBuf,
    pub origins: Vec<OriginConfig>,
    #[serde(default)]
    pub cache: CacheConfig,
    #[serde(default)]
    pub health: HealthConfig,
    #[serde(default)]
    pub logging: LoggingConfig,  // NEW
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LoggingConfig {
    #[serde(default = "default_log_dir")]
    pub log_dir: PathBuf,
    
    #[serde(default)]
    pub json_output: bool,
    
    #[serde(default = "default_true")]
    pub journald: bool,
    
    #[serde(default = "default_log_level")]
    pub level: String,
}

impl Default for LoggingConfig {
    fn default() -> Self {
        Self {
            log_dir: default_log_dir(),
            json_output: false,
            journald: true,
            level: default_log_level(),
        }
    }
}

fn default_log_dir() -> PathBuf {
    PathBuf::from("/var/log/musicfs")
}
fn default_log_level() -> String {
    "musicfs=info,warn".to_string()
}
fn default_true() -> bool {
    true
}

1.3 Expand init_logging() in main.rs

// crates/musicfs-cli/src/main.rs

use tracing_appender::non_blocking::WorkerGuard;
use tracing_subscriber::{fmt, prelude::*, EnvFilter};

fn init_logging(config: &LoggingConfig) -> Result<WorkerGuard> {
    std::fs::create_dir_all(&config.log_dir)?;
    
    // File layer with daily rotation
    let file_appender = tracing_appender::rolling::daily(&config.log_dir, "musicfs.log");
    let (non_blocking, guard) = tracing_appender::non_blocking(file_appender);
    
    let file_layer = if config.json_output {
        fmt::layer()
            .json()
            .with_writer(non_blocking)
            .with_ansi(false)
            .boxed()
    } else {
        fmt::layer()
            .with_writer(non_blocking)
            .with_ansi(false)
            .boxed()
    };
    
    // Journald layer (Linux only)
    #[cfg(target_os = "linux")]
    let journald_layer = if config.journald {
        tracing_journald::layer()
            .ok()
            .map(|l| l.with_syslog_identifier("musicfs".to_string()))
    } else {
        None
    };
    
    // Stderr layer for interactive use
    let stderr_layer = fmt::layer()
        .with_writer(std::io::stderr)
        .compact();
    
    // Filter from config or env
    let filter = EnvFilter::try_from_default_env()
        .unwrap_or_else(|_| EnvFilter::new(&config.level));
    
    // Compose
    let subscriber = tracing_subscriber::registry()
        .with(filter)
        .with(file_layer)
        .with(stderr_layer);
    
    #[cfg(target_os = "linux")]
    let subscriber = subscriber.with(journald_layer);
    
    subscriber.init();
    
    tracing::info!(version = env!("CARGO_PKG_VERSION"), "MusicFS starting");
    Ok(guard)
}

1.4 Add logrotate config

# dist/logrotate.d/musicfs
/var/log/musicfs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0640 musicfs musicfs
}

Phase 2: Add tracing to musicfs-core (1 hour)

2.1 Add dependency

# crates/musicfs-core/Cargo.toml
[dependencies]
tracing.workspace = true  # ADD THIS

2.2 Instrument core modules

File What to Add
config.rs Log config file loading, parse errors
credentials.rs Log credential loading (redacted values)
events.rs Log event publishing with counts

Phase 3: Instrument Hot Paths (4 hours)

Priority order by impact

Crate Files What to Add
musicfs-fuse filesystem.rs #[instrument] on all FUSE ops, trace at decision points
musicfs-origins failover.rs, health.rs, router.rs Retry loops, state transitions, selection logic
musicfs-cache tree.rs, metadata.rs Tree mutations, cache hit/miss
musicfs-cas reader.rs, store.rs Chunk operations, dedup decisions
musicfs-sync delta.rs, watcher.rs Change detection, file events

Instrumentation patterns

// Function level - add to all public async functions
#[tracing::instrument(level = "debug", skip(self), fields(path = %path))]
pub async fn read(&self, path: &str) -> Result<Bytes> {
    // ...
}

// Decision points - add trace! at match/if branches
match result {
    Ok(data) => {
        tracing::trace!(bytes = data.len(), "read success");
        data
    }
    Err(e) => {
        tracing::trace!(error = %e, "read failed");
        return Err(e);
    }
}

// State changes - use info! for important transitions
tracing::info!(old = ?old_status, new = ?new_status, origin = %id, "health changed");

// Cache operations
tracing::trace!(hit = true, fresh = true, "cache hit");
tracing::trace!(hit = false, "cache miss");

FUSE operations (filesystem.rs) - highest priority

Operation Level Fields
lookup() debug parent, name, result_ino
getattr() debug ino, file_type
readdir() debug ino, entry_count
read() debug ino, offset, size, bytes_read
open() debug ino, flags
release() trace ino

Origin operations - critical for debugging

Function Level Fields
read_with_failover() debug path, origins_tried, success
read_with_retry() trace origin, attempt, success
check_health() debug origin, old_status, new_status
select_origin() trace candidates, selected, reason

Phase 4: Update Production Files (1 hour)

4.1 Update systemd service

# dist/musicfs.service (add these lines)
Environment="RUST_LOG=musicfs=info,warn"
StandardOutput=journal
StandardError=journal
SyslogIdentifier=musicfs
RateLimitIntervalSec=30s
RateLimitBurst=1000

4.2 Example config.toml

# dist/config.example.toml
mount_point = "/mnt/music"
cache_dir = "/var/cache/musicfs"

[logging]
log_dir = "/var/log/musicfs"
json_output = true
journald = true
level = "musicfs=info,warn"

[cache]
metadata_cache_mb = 100
content_cache_gb = 10

[health]
check_interval_secs = 30
timeout_ms = 5000

[[origins]]
id = "local"
origin_type = "local"
priority = 1
path = "/srv/music"

Detailed Log Locations by Level

ERROR Level (25+ locations) - Unrecoverable Failures

File Line Log Message
musicfs-grpc/src/webhook.rs 43 error!("Failed to initialize webhook HTTP client: {error}")
musicfs-grpc/src/webhook.rs 133 error!("Invalid HMAC secret key for webhook signature: {error}")
musicfs-plugins/src/manager.rs 272 error!("Plugin manager initialization failed: {error}")
musicfs-plugins/src/wasm.rs 142,183 error!("WASM plugin host initialization failed: {error}")
musicfs-search/src/index.rs 211,217 error!("Search index corrupted: failed to deserialize at position {pos}")
musicfs-cas/src/store.rs 105 error!("CAS chunk not found: {hash} - possible data loss")
musicfs-cas/src/store.rs 124-131 error!("CAS integrity check failed: expected {expected}, got {actual}")
musicfs-fuse/src/filesystem.rs 103 error!("Failed to mount filesystem at {mountpoint}: {error}")
musicfs-origins/src/failover.rs 76 error!("No origins available for path {path}")
musicfs-origins/src/failover.rs 125,186 error!("Max retries ({max_attempts}) exceeded for origin {origin_id}")
musicfs-origins/src/nfs.rs 63 error!("NFS stale file handle after {max_retries} retries for {path}")
musicfs-cas/src/reader.rs 75 error!("File manifest not found for file_id {file_id}")
musicfs-cas/src/fetcher.rs 60,68 error!("File/Origin not found for file_id {file_id}")
musicfs-search/src/indexer.rs 44,56 error!("Search indexer/commit failed: {error}")
musicfs-sync/src/watcher.rs 36,59,63 error!("Watcher failed for origin {origin_id}: {error}")

WARN Level (50+ locations) - Recoverable Issues

Category File Line Log Message
Retry Logic failover.rs 90 warn!("Origin {origin_id} failed: {error}, trying next (attempt {n}/{total})")
Retry Logic failover.rs 111-118 warn!("Retrying origin {origin_id} after {delay:?} (attempt {n}/{max})")
Retry Logic nfs.rs 47-52 warn!("NFS stale handle for {path} (attempt {n}/{max}), retrying")
Retry Logic smb.rs 45 warn!("SMB connection lost (ENOTCONN), retrying (attempt {n}/{max})")
Retry Logic webhook.rs 94-108 warn!("Webhook delivery failed to {url} (attempt {n}/{max}): {error}")
Fallback failover.rs 70-73 warn!("No healthy origins for {path}, using fallback {origin_id}")
Timeout smb.rs 107-109 warn!("SMB health check timed out after 5s for {origin_id}")
Timeout nfs.rs 104-106 warn!("NFS health check timed out after 5s for {origin_id}")
Timeout prefetch.rs 91 warn!("Prefetch event receive timed out after 1s")
Health health.rs 209 warn!("Origin {origin_id} is degraded (failures: {count})")
Health health.rs 217-220 warn!("Origin {origin_id} is now unhealthy after {n} consecutive failures")
Remote FS smb.rs 118 warn!("SMB watch using inotify on {share_path} - may be unreliable")
Remote FS nfs.rs 115 warn!("NFS watch using inotify on {mount_point} - may be unreliable")
Plugin manager.rs 152 warn!("Failed to load plugin from {path}: {error}")
Plugin manager.rs 193-194 warn!("Failed to unload plugin {plugin_id}: {error}")
Prefetch prefetch.rs 97 warn!("Failed to record access pattern for {file_id}: {error}")
Prefetch prefetch.rs 159-161 warn!("Prefetch skipped: concurrency limit reached ({max})")
Search indexer.rs 49 warn!("Search indexer event receive error: {error}")
Search indexer.rs 82 warn!("No metadata found for file {path}, skipping indexing")
Collections collections.rs 146,180 warn!("Failed to save/delete collection {name}: {error}")

INFO Level (35+ locations) - Lifecycle & Major Operations

Category File Line Log Message
Lifecycle main.rs 118 info!(version = env!("CARGO_PKG_VERSION"), "MusicFS starting")
Lifecycle filesystem.rs 94 info!("Mounting MusicFS at {:?}", mountpoint)
Lifecycle filesystem.rs 154 info!("MusicFS initialized")
Lifecycle filesystem.rs 159 info!("MusicFS destroyed")
Origin registry.rs 28 info!("Registering origin {} with priority {}", id, priority)
Origin registry.rs 36 info!("Unregistering origin {}", id)
Origin watcher.rs 65 info!("Watching origin {} at {:?}", origin_id, path)
Config main.rs 127 info!("Cache directory: {:?}", cache_dir)
Config main.rs 141 info!("CAS store initialized")
Config store.rs 51 info!("CAS store opened: {} chunks, {} bytes", count, size) (ADD)
Sync main.rs 150,152 info!("Scanning music files...") / info!("Found {} music files", count)
Sync delta.rs 104 info!("Delta complete: {} added, {} removed, {} modified", a, r, m)
Sync delta.rs 63 info!("Sync started for origin {}", origin_id) (ADD)
Index main.rs 160 info!("Virtual tree built")
Index indexer.rs 62 info!("Indexer stopping")
Index indexer.rs 114 info!("Indexed {} files", count)
Index index.rs 170 info!("Search index committed")
Health health.rs 202 info!("Origin {} is now healthy", id)
Health health.rs 150 info!("Health monitor started with interval {:?}", interval) (ADD)
Plugin manager.rs 127 info!("Initializing plugin system")
Plugin manager.rs 150 info!("Loaded plugin '{}' with id {:?}", name, id)
Plugin manager.rs 256 info!("Shutting down plugin system")
Cache prefetch.rs 123 info!("Prefetch engine stopped")
Cache prefetch.rs 174 info!("Prefetched {:?}: {} chunks, {} bytes", file_id, chunks, bytes)
Cache eviction.rs 51 info!("Evicted {} bytes from cache", bytes)
Cache prefetch.rs 73 info!("Prefetch engine started (lookahead: {}, max_concurrent: {})") (ADD)

DEBUG Level (60+ locations) - Operation Details

Category File Line Log Message
FUSE lookup filesystem.rs 162,195,200 Entry + result/miss
FUSE getattr filesystem.rs 203,230,233 Entry + result/miss
FUSE readdir filesystem.rs 237,263,303 Entry + result/miss
FUSE read filesystem.rs 325,338,362,364 Entry + file_id + result/error
Local origin local.rs 51,68 readdir entry + result
Local origin local.rs 88-91,112 read entry + result
SMB origin smb.rs 86,93 readdir/read entry + result
NFS origin nfs.rs 81,89 readdir/read entry + result
Failover failover.rs 66,82,87 Entry + trying origin + success
Tree lookup tree.rs 124,132 Entry + result
Metadata cache metadata.rs 36,40 lookup + is_fresh entry/result
CAS store store.rs 70,101 put/get entry
File reader reader.rs 66,86 manifest cache + read entry
Search ops/search.rs 107,141,182 readdir_query + readlink + execute_query
Search index index.rs 98,174 index_file + search entry
Fetcher fetcher.rs 54,61,121 fetch_file entry + meta + ensure_cached

Key DEBUG fields: ino, parent, name, offset, size, bytes_read, origin_id, path, file_id, query, results_count, latency_ms

TRACE Level (100+ locations) - Fine-Grained Flow

Category File Lines What to Log
Manifest cache reader.rs 67-74 Cache hit/miss decision
Chunk iteration reader.rs 107-127 Each chunk: skip/read boundaries
CAS dedup store.rs 74-77 Dedup hit decision
CAS integrity store.rs 121-134 Verification result
Tree lookup tree.rs 118-129 Path→inode + child lookup
Tree parent tree.rs 148-153 Parent resolution path
Prefetch event prefetch.rs 91-120 Event type match arms
Prefetch semaphore prefetch.rs 150-164 In-flight check + acquire
Delta scan delta.rs 79-102 Each file: cached/modified/unchanged/removed
Delta entries delta.rs 128-146 Each entry: dir/audio/skip
CDC chunking cdc.rs 84-93 Each chunk: offset/length/hash
Failover origin failover.rs 68-93 Each origin attempt result
Failover retry failover.rs 107-122 Each retry: attempt/success/delay
Router select router.rs 79-108 Each candidate + selection reason
FUSE node→attr filesystem.rs 109-145 Directory vs file conversion
FUSE lookup filesystem.rs 192-200 Found/not found
FUSE readdir filesystem.rs 274-291 Each child entry
FUSE read filesystem.rs 340-367 file_id resolution + result
Metadata tag parser.rs 86-100 Each tag extraction
Health transition health.rs 199-237 State transition details
Latency recording router.rs 23-42 Stats update per sample

Key TRACE patterns:

  • Every match arm: trace!("match arm: {variant}")
  • Every if/else: trace!("branch: {condition}={value}")
  • Every loop iteration: trace!("iteration {i}/{total}: ...")
  • Every cache lookup: trace!("cache lookup key={key}, hit={hit}")

gRPC Handler Instrumentation (ADDED - Oracle Review)

Gap identified: 8/10 gRPC handlers had no logging.

server.rs - All Handlers

Handler Line Level Log Message
get_status() 209 DEBUG debug!("gRPC get_status called")
get_cache_stats() 241 DEBUG debug!("gRPC get_cache_stats called")
clear_cache() 278 INFO info!("gRPC clear_cache: clearing {tier}")
prefetch() 296 DEBUG debug!(file_count = paths.len(), "gRPC prefetch started")
list_origins() 322 DEBUG debug!("gRPC list_origins called")
get_origin_health() 329 DEBUG debug!(origin_id = %id, "gRPC get_origin_health")
rescan_origin() 337 INFO info!(origin_id = %id, "gRPC rescan_origin started")
subscribe_events() 376 INFO info!("gRPC subscribe_events: client connected")
shutdown() 402 INFO info!(graceful = graceful, "gRPC shutdown requested")

search_service.rs

Handler Line Level Log Message
search() entry DEBUG debug!(query = %q, limit = limit, "gRPC search")
search() result DEBUG debug!(results = results.len(), "gRPC search completed")

Pattern: Use #[instrument] on all handlers

#[tracing::instrument(level = "debug", skip(self, request), fields(method = "get_status"))]
async fn get_status(&self, request: Request<()>) -> Result<Response<StatusResponse>, Status> {
    // ...
}

Async Task Spawn Instrumentation (ADDED - Oracle Review)

Gap identified: 14 tokio::spawn sites need correlation IDs and span propagation.

Spawn Sites Requiring Instrumentation

File Line Task Instrumentation
server.rs 305 prefetch stream spawn(async { ... }.instrument(info_span!("prefetch_stream")))
server.rs 354 rescan stream spawn(async { ... }.instrument(info_span!("rescan_stream", origin_id = %id)))
server.rs 384 subscribe events spawn(async { ... }.instrument(info_span!("event_subscriber")))
search_service.rs spawn search task spawn(async { ... }.instrument(debug_span!("search_task", query = %q)))
indexer.rs spawn indexer loop spawn(async { ... }.instrument(info_span!("indexer")))
prefetch.rs 87 prefetch engine spawn(async { ... }.instrument(info_span!("prefetch_engine")))
prefetch.rs 169 prefetch file spawn(async { ... }.instrument(debug_span!("prefetch_file", file_id = ?id)))
health.rs 154 health monitor spawn(async { ... }.instrument(info_span!("health_monitor")))
watcher.rs 34 file watcher spawn(async { ... }.instrument(info_span!("file_watcher", origin_id = %id)))
artwork.rs spawn image decode `spawn_blocking(

Pattern: Span Propagation

use tracing::Instrument;

// BEFORE (loses context)
tokio::spawn(async move {
    do_work().await;
});

// AFTER (preserves correlation)
let span = tracing::info_span!("task_name", task_id = %id);
tokio::spawn(async move {
    do_work().await;
}.instrument(span));

Add to init_logging() for request IDs

// Generate request ID for correlation
use tracing::Span;
use uuid::Uuid;

fn with_request_id<F, R>(f: F) -> R
where F: FnOnce() -> R {
    let request_id = Uuid::new_v4();
    let span = tracing::info_span!("request", request_id = %request_id);
    span.in_scope(f)
}

Database Operation Logging (ADDED - Oracle Review)

Gap identified: Zero logging for rusqlite operations in db.rs, collections.rs, patterns.rs, artwork.rs.

db.rs - Core Database

Function Line Level Log Message
open() entry INFO info!(path = ?path, "Opening metadata database")
open() success INFO info!(file_count = count, "Database opened")
upsert_file() entry DEBUG debug!(file_id = ?id, path = %path, "Upserting file")
upsert_file() error ERROR error!(file_id = ?id, error = %e, "Failed to upsert file")
get_file_by_id() miss TRACE trace!(file_id = ?id, "File not found in db")
delete_file() entry DEBUG debug!(file_id = ?id, "Deleting file from db")
list_files_by_origin() result DEBUG debug!(origin_id = %id, count = files.len(), "Listed files")

collections.rs

Function Line Level Log Message
create() entry INFO info!(name = %name, "Creating collection")
save() error WARN warn!(name = %name, error = %e, "Failed to save collection")
delete() entry INFO info!(name = %name, "Deleting collection")
list() result DEBUG debug!(count = collections.len(), "Listed collections")

patterns.rs - Access Patterns

Function Line Level Log Message
record_access() entry TRACE trace!(file_id = ?id, "Recording access pattern")
predict_next() result DEBUG debug!(predictions = preds.len(), "Predicted next files")

artwork.rs

Function Line Level Log Message
store() entry DEBUG debug!(file_id = ?id, size_bytes = data.len(), "Storing artwork")
get() hit/miss TRACE trace!(file_id = ?id, found = found, "Artwork lookup")

Pattern: Database Error Wrapper

// Add to musicfs-cache/src/db.rs
fn log_db_result<T>(op: &str, result: Result<T, rusqlite::Error>) -> Result<T, Error> {
    match result {
        Ok(v) => {
            tracing::trace!(op = op, "db operation succeeded");
            Ok(v)
        }
        Err(e) => {
            tracing::error!(op = op, error = %e, "db operation failed");
            Err(Error::Database(e.to_string()))
        }
    }
}

Channel Operation Logging (ADDED - Oracle Review)

Gap identified: No logging for channel capacity, close, or broadcast lag.

Channel Locations

File Type Log Points
events.rs broadcast Lag warning when receiver falls behind
watcher.rs mpsc Channel close on watcher shutdown
server.rs mpsc gRPC stream channel capacity
indexer.rs mpsc Event queue depth
health.rs mpsc Health check channel

Patterns

// Broadcast lag detection (events.rs)
match rx.recv().await {
    Ok(event) => { /* handle */ }
    Err(broadcast::error::RecvError::Lagged(n)) => {
        tracing::warn!(skipped = n, "Event subscriber lagged, skipped events");
    }
    Err(broadcast::error::RecvError::Closed) => {
        tracing::debug!("Event channel closed");
        break;
    }
}

// Channel capacity warning (before send)
if tx.capacity() < 10 {
    tracing::warn!(remaining = tx.capacity(), "Channel near capacity");
}

// Channel close
impl Drop for EventBus {
    fn drop(&mut self) {
        tracing::debug!("Event bus shutting down");
    }
}

Drop Implementation Logging (ADDED - Oracle Review)

Gap identified: No logging in Drop impls for cleanup verification.

File Type Log Message
manager.rs:276 PluginManager debug!("PluginManager dropping, unloading {} plugins", self.plugins.len())
watcher.rs:157 WatchHandle trace!(origin_id = %self.origin_id, "WatchHandle dropped")
prefetch.rs PrefetchEngine debug!("PrefetchEngine dropping, {} in-flight", self.in_flight.len())
server.rs gRPC server info!("gRPC server shutting down")

Pattern

impl Drop for PluginManager {
    fn drop(&mut self) {
        tracing::debug!(
            plugin_count = self.plugins.len(),
            "PluginManager dropping"
        );
        // existing cleanup...
    }
}

Credential Loading (ADDED - Oracle Review)

Gap identified: No logging in credentials.rs::load().

Function Level Log Message
load() entry DEBUG debug!(origin_id = %origin_id, "Loading credentials")
load() cache hit TRACE trace!(origin_id = %origin_id, "Credential cache hit")
load() success INFO info!(origin_id = %origin_id, cred_type = %cred.type_name(), "Credential loaded")
load() not found DEBUG debug!(origin_id = %origin_id, "No credential found")
load() error WARN warn!(origin_id = %origin_id, error = %e, "Credential load failed")

SECURITY: Never log credential values. The existing Debug impl with redaction is correct.


Security Considerations (ADDED - Oracle Review)

Never Log These

Data Location Mitigation
WebhookConfig.secret webhook.rs Add #[serde(skip_serializing)], use custom Debug
Credential values credentials.rs Already redacted in Debug impl ✓
Full file paths with usernames everywhere Sanitize /home/{user}/~/
API keys/tokens config.rs Mark sensitive fields

Sanitization Helper

// Add to musicfs-core/src/lib.rs
pub fn sanitize_path(path: &Path) -> String {
    if let Ok(home) = std::env::var("HOME") {
        path.to_string_lossy()
            .replace(&home, "~")
            .to_string()
    } else {
        path.to_string_lossy().to_string()
    }
}

// Usage
debug!(path = %sanitize_path(&path), "Reading file");

WebhookConfig Fix

// webhook.rs - add custom Debug
#[derive(Clone, Serialize, Deserialize)]
pub struct WebhookConfig {
    pub url: String,
    #[serde(skip_serializing)]
    pub secret: Option<String>,  // Never serialize
    // ...
}

impl std::fmt::Debug for WebhookConfig {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("WebhookConfig")
            .field("url", &self.url)
            .field("secret", &self.secret.as_ref().map(|_| "[REDACTED]"))
            .finish()
    }
}

Performance Considerations (ADDED - Oracle Review)

Hot Path Warnings

Path Risk Mitigation
reader.rs chunk loop 100s of TRACE logs per seek Log summary only: trace!(chunks_read = n, "Read complete")
store.rs put/get 1000s during sync Keep at DEBUG, not TRACE
delta.rs file scan Log per file during full scan Use TRACE, batch summaries at DEBUG
parser.rs tag extraction Many TRACE per file Sample: log every 100th file

Trace Sampling Config

// Add to LoggingConfig
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LoggingConfig {
    // ... existing fields ...
    
    /// Sample rate for TRACE logs in hot paths (0.0-1.0, default 1.0)
    #[serde(default = "default_sample_rate")]
    pub trace_sample_rate: f32,
}

fn default_sample_rate() -> f32 { 1.0 }

// Usage in hot paths
if rand::random::<f32>() < config.trace_sample_rate {
    trace!(...);
}

Rate-Limited Warnings

// For repeating warnings during outages (failover.rs)
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

static LAST_FAILOVER_WARN: AtomicU64 = AtomicU64::new(0);

fn warn_rate_limited(origin_id: &str, error: &str) {
    let now = Instant::now().elapsed().as_secs();
    let last = LAST_FAILOVER_WARN.load(Ordering::Relaxed);
    if now - last >= 60 { // Max once per minute
        LAST_FAILOVER_WARN.store(now, Ordering::Relaxed);
        warn!(origin_id = %origin_id, error = %error, "Origin failover");
    }
}

Standardized Field Names (ADDED - Oracle Review)

Use these consistently across all log statements:

Field Type Usage
origin_id String Origin identifier (not origin)
file_id FileId File identifier
path String Virtual or real path (sanitized)
size_bytes u64 Size in bytes (not size, bytes, len)
offset u64 Read offset
duration_ms u64 Operation duration in milliseconds
count usize Generic count
attempt u32 Retry attempt number
max_attempts u32 Maximum retry attempts
error impl Display Error message (not err, e)
request_id Uuid Correlation ID for requests

Instrumentation Patterns (ADDED - Oracle Review)

Use #[instrument(err)] for Automatic Error Logging

// BEFORE: Manual error logging
pub async fn read(&self, path: &Path) -> Result<Bytes> {
    match self.inner_read(path).await {
        Ok(data) => Ok(data),
        Err(e) => {
            error!(path = ?path, error = %e, "Read failed");
            Err(e)
        }
    }
}

// AFTER: Automatic with #[instrument]
#[tracing::instrument(level = "debug", skip(self), err)]
pub async fn read(&self, path: &Path) -> Result<Bytes> {
    self.inner_read(path).await
}

Span Events vs Regular Logs

// Regular log - standalone event
info!("Operation completed");

// Span event - attached to current span context
tracing::Span::current().record("result", "success");

// Prefer span events for operation outcomes
#[instrument(fields(result))]
async fn operation() -> Result<()> {
    // ... work ...
    Span::current().record("result", "success");
    Ok(())
}

Fixes: Incorrect Line References (ADDED - Oracle Review)

File Issue Fix
webhook.rs:43 Uses expect() (panics) Replace with ? + error log
webhook.rs:133 Uses expect() (panics) Replace with ? + error log
// webhook.rs - BEFORE
let client = reqwest::Client::builder()
    .timeout(Duration::from_secs(30))
    .build()
    .expect("Failed to create HTTP client");

// webhook.rs - AFTER  
let client = reqwest::Client::builder()
    .timeout(Duration::from_secs(30))
    .build()
    .map_err(|e| {
        error!(error = %e, "Failed to create webhook HTTP client");
        WebhookError::ClientInit(e.to_string())
    })?;

Log Levels Guide

Level Use Case Example
ERROR Unrecoverable failures Mount failed, DB corruption
WARN Recoverable issues Origin timeout, retry needed
INFO Lifecycle events Service start/stop, health change
DEBUG Operation details Function entry, request params
TRACE Fine-grained flow Match arms, cache hit/miss

Testing Checklist

Basic Functionality

  • Log files created in configured directory
  • Daily rotation creates new files at midnight
  • JSON output parseable by jq
  • journalctl -t musicfs shows logs
  • RUST_LOG=musicfs=trace enables trace output
  • WorkerGuard kept alive (logs flush on shutdown)
  • Logrotate compresses old files

Correlation & Context (NEW)

  • Request IDs propagate through async tasks
  • Spawned task logs include parent span context
  • gRPC handler logs show method name in span

Security (NEW)

  • WebhookConfig.secret never appears in logs
  • Credential values never appear in logs
  • File paths with /home/{user} show as ~/

Performance (NEW)

  • TRACE sampling respects trace_sample_rate config
  • Hot path chunk loops log summary, not per-chunk
  • Origin failover warnings are rate-limited (1/minute)
  • Database operations log without blocking

Database & Channels (NEW)

  • Database open logs file count
  • Channel capacity warnings appear when queue fills
  • Broadcast lag warnings appear when subscriber falls behind
  • Drop implementations log cleanup

Summary

Phase Effort Deliverables
1. Config & Dependencies 2h LoggingConfig, init_logging(), logrotate, trace sampling
2. Core instrumentation 1h tracing in musicfs-core, credentials, sanitization
3. Hot path instrumentation 4h #[instrument] + trace! across 5 crates
4. gRPC & async tasks 2h Handler instrumentation, spawn correlation
5. Database & channels 2h rusqlite logging, channel capacity/close
6. Production files 1h Updated systemd, example config
Total 12h Full observability

Files to Modify

Phase 1: Config & Dependencies

File Changes
Cargo.toml (workspace) Add tracing-appender, tracing-journald
crates/musicfs-cli/Cargo.toml Add dependencies
crates/musicfs-core/Cargo.toml Add tracing
crates/musicfs-core/src/config.rs Add LoggingConfig with trace_sample_rate
crates/musicfs-cli/src/main.rs Expand init_logging(), request ID helper
crates/musicfs-core/src/lib.rs Add sanitize_path() helper

Phase 2: Core Instrumentation

File Changes
crates/musicfs-core/src/credentials.rs Add load() logging (redacted)
crates/musicfs-core/src/events.rs Add broadcast lag detection

Phase 3: Hot Path Instrumentation

File Changes
crates/musicfs-fuse/src/filesystem.rs Add #[instrument], trace!
crates/musicfs-origins/src/failover.rs Add #[instrument], trace!, rate-limited warn
crates/musicfs-origins/src/health.rs Add state transition logging
crates/musicfs-origins/src/router.rs Add selection logging
crates/musicfs-cache/src/tree.rs Add mutation logging
crates/musicfs-cache/src/metadata.rs Add hit/miss logging
crates/musicfs-cas/src/reader.rs Add chunk assembly logging (summary, not per-chunk)
crates/musicfs-cas/src/store.rs Add dedup logging
crates/musicfs-sync/src/delta.rs Add change detection logging

Phase 4: gRPC & Async Tasks (NEW)

File Changes
crates/musicfs-grpc/src/server.rs Add #[instrument] to all 10 handlers, spawn correlation
crates/musicfs-grpc/src/search_service.rs Add #[instrument], spawn instrumentation
crates/musicfs-grpc/src/webhook.rs Fix expect() → error!, custom Debug for secret
crates/musicfs-cache/src/prefetch.rs Add spawn instrumentation, Drop logging
crates/musicfs-search/src/indexer.rs Add spawn instrumentation
crates/musicfs-sync/src/watcher.rs Add spawn instrumentation, Drop logging
crates/musicfs-plugins/src/manager.rs Add Drop logging

Phase 5: Database & Channels (NEW)

File Changes
crates/musicfs-cache/src/db.rs Add log_db_result() helper, open/upsert/query logging
crates/musicfs-search/src/collections.rs Add CRUD operation logging
crates/musicfs-cache/src/patterns.rs Add access pattern logging
crates/musicfs-cache/src/artwork.rs Add store/get logging

Phase 6: Production Files

File Changes
dist/musicfs.service Add logging directives
dist/logrotate.d/musicfs New file
dist/config.example.toml Add logging section with trace_sample_rate