Add benchmark plan for beetfs performance measurement

Covers:
- Mount time scaling (100 to 100K items)
- Metadata operations (stat, readdir throughput)
- File I/O (open latency, read throughput)
- Memory usage (idle, per-file, leak detection)
- Concurrent access (GIL impact)
- Realistic workloads (library scan, album playback)

Tools: fio, mdtest, hyperfine
Baselines: ext4, fuse-passthrough, sshfs

Key bottlenecks identified:
- FileHandler loads entire file into RAM on open
- Mount-time bulk load of all library items
- Python GIL limits parallelism
2026-05-12 14:36:24 +02:00
parent f8666ae8c6
commit dacd3a7c1f
1 changed files with 403 additions and 0 deletions
@@ -0,0 +1,403 @@
+# beetfs Benchmark Plan
+
+## Executive Summary
+
+This plan defines a benchmark suite that measures beetfs FUSE filesystem performance across mount time, metadata operations, file I/O, memory usage, and concurrent access, with a focus on realistic music library workloads.
+
+## Critical Performance Findings (Pre-Benchmark)
+
+### Architecture Bottlenecks Identified
+
+| Bottleneck | Location | Impact |
+|------------|----------|--------|
+| **Full file load into RAM** | `FileHandler.__init__` line 481 | 50-100MB per open FLAC |
+| **Mount-time bulk load** | `mount()` line 143 | O(N) for N library items |
+| **GIL serialization** | Python 2.7 | Single-core limit for metadata ops |
+| **Per-file DB lookup** | `getattr()`, `access()` | SQLite query per stat call |
+
+### Expected Performance Characteristics
+
+| Operation | Expected Performance | Bottleneck |
+|-----------|---------------------|------------|
+| Mount (10K items) | 5-30 seconds | `lib.items()` + FSNode construction |
+| readdir | Fast (in-memory dict) | None |
+| getattr (file) | Slow (~1ms) | DB lookup + real file stat |
+| open (first) | Very slow | Full file read into RAM |
+| read | Fast | Memory-to-memory copy |
+| Memory (10 open files) | 500MB-1GB | FileHandler caches entire files |
+
+---
+
+## Benchmark Tools
+
+### Primary Tools
+
+| Tool | Purpose | Install |
+|------|---------|---------|
+| **fio** | I/O throughput, IOPS, latency | `nix-shell -p fio` |
+| **mdtest** | Metadata operations (stat, readdir) | `nix-shell -p ior` |
+| **hyperfine** | Mount time, command timing | `nix-shell -p hyperfine` |
+| **time** | Basic timing | shell builtin |
+| **/usr/bin/time -v** | Memory usage (maximum resident set size) | GNU time, not a builtin: `nix-shell -p time` |
+
+### Measurement Scripts
+
+Benchmarks use synthetic FLAC files (5-10MB by default, with larger files up to 100MB for the file-open tests in section 3a) to avoid I/O variance from real storage.
+
+---
+
+## Benchmark Categories
+
+### 1. Mount Time Scaling
+
+**Goal**: Measure how mount time scales with library size.
+
+**Method**:
+```bash
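+# Create libraries with N items: 100, 1K, 10K, 50K, 100K
+# Assumes `beet mount` returns once the filesystem is ready.
+# Each timed run also includes the fixed 1 s sleep and the unmount;
+# subtract that overhead when reporting mount time.
+hyperfine --warmup 1 --runs 5 \
+  'beet mount /mnt/beetfs && sleep 1 && fusermount -u /mnt/beetfs'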
+```
+
+**Metrics**:
+- Time to mount (seconds)
+- Memory usage at mount completion (RSS)
+
+**Expected scaling**: O(N) - linear with library size
+
+**Test matrix**:
+| Library Size | Expected Mount Time | Expected Memory |
+|--------------|--------------------:|----------------:|
+| 100 items | <1s | ~50MB |
+| 1,000 items | 1-3s | ~60MB |
+| 10,000 items | 5-15s | ~100MB |
+| 50,000 items | 30-60s | ~300MB |
+| 100,000 items | 60-120s | ~500MB |
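
Once measurements exist, the O(N) expectation can be checked with a least-squares fit of mount time against library size. The following is a minimal sketch in Python 3 (standard library only), not part of beetfs; `fit_linear` is a hypothetical helper name, and the sample points in the usage note are the illustrative rows from this plan's example CSV, not real results.

```python
def fit_linear(sizes, times_ms):
    """Least-squares fit of times_ms = intercept + slope * size.

    Returns (slope, intercept, r_squared). The slope is the per-item
    mount cost in ms; the intercept is the fixed mount overhead.
    """
    n = len(sizes)
    if n < 2 or n != len(times_ms):
        raise ValueError("need at least two (size, time) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("library sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 close to 1 means a straight line explains the measurements well.
    ss_tot = sum((y - mean_y) ** 2 for y in times_ms)
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(sizes, times_ms))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared
```

For example, `fit_linear([100, 1000, 10000], [450, 2100, 12500])` returns the per-item cost, fixed overhead, and fit quality for those three illustrative points. An R² well below 1 across the full 100 to 100K range would suggest the scaling is not linear.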
+
+---
+
+### 2. Metadata Operations (stat/readdir)
+
+**Goal**: Measure getattr and readdir performance - critical for music players that scan libraries.
+
+#### 2a. Single stat latency
+
+```bash
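+# Measure single stat call latency.
+# -N (--shell=none) avoids an intermediate shell; the result still
+# includes the cost of spawning the `stat` process itself.
+hyperfine -N --warmup 10 --runs 100 \
+  'stat /mnt/beetfs/Artist/Album/01-Track.flac'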
+```
+
+**Target**: <5ms average, <20ms p99
+
+#### 2b. Bulk stat (library scan simulation)
+
+```bash
+# Stat all files in library
+hyperfine --warmup 1 --runs 5 \
+  'find /mnt/beetfs -type f -exec stat {} + > /dev/null'
+```
+
+**Metrics**:
+- Total time for N files
+- stat operations per second
+- p50, p95, p99 latency
+
+**Target**: >500 stat/s (Python FUSE baseline)
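
hyperfine prints mean, min, and max, but not percentiles. One way to get p50/p95/p99 is to export the raw samples with `--export-json` and compute them afterwards. The sketch below assumes Python 3 and hyperfine's JSON layout (`results[0].times`, in seconds); `percentile` and `summarize_hyperfine` are hypothetical helper names. Note that each hyperfine sample is one timed command run, not one individual stat call.

```python
import json
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sequence, pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to avoid float error such as 0.57 * 100.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_hyperfine(json_text):
    """Return {50: ..., 95: ..., 99: ...} in milliseconds for the first result."""
    result = json.loads(json_text)["results"][0]
    times_ms = [t * 1000.0 for t in result["times"]]
    return {p: percentile(times_ms, p) for p in (50, 95, 99)}
```

With only 5 or 10 runs the p95 and p99 values collapse to the slowest sample, so the tail targets are only meaningful with a larger `--runs` value.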
+
+#### 2c. Directory listing
+
+```bash
+# List directory with N entries
+hyperfine --warmup 3 --runs 10 \
+  'ls /mnt/beetfs/Artist/Album/'
+```
+
+**Test matrix**:
+| Directory entries | Target time |
+|------------------:|------------:|
+| 10 | <50ms |
+| 100 | <100ms |
+| 1,000 | <500ms |
+
+---
+
+### 3. File Open Performance
+
+**Goal**: Measure file open latency - the critical bottleneck due to full file load.
+
+#### 3a. First open (cold)
+
+```bash
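+# Drop the kernel page cache before every run so each open is cold (requires root).
+# This does not clear data cached inside the beetfs process itself.
+hyperfine --warmup 0 --runs 10 \
+  --prepare 'sync; echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null' \
+  'head -c 1 /mnt/beetfs/Artist/Album/01-Track.flac > /dev/null'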
+```
+
+**Test matrix**:
+| File size | Expected open time |
+|----------:|-------------------:|
+| 5MB | 50-200ms |
+| 20MB | 200-500ms |
+| 50MB | 500ms-1s |
+| 100MB | 1-2s |
+
+#### 3b. Cached open (warm)
+
+```bash
+# File already opened once
+hyperfine --warmup 5 --runs 50 \
+  'head -c 1 /mnt/beetfs/Artist/Album/01-Track.flac > /dev/null'
+```
+
+**Target**: <10ms (should hit FileHandler cache)
+
+---
+
+### 4. Read Throughput
+
+**Goal**: Measure sequential and random read performance.
+
+#### 4a. Sequential read
+
+```bash
+fio --name=seq_read \
+    --filename=/mnt/beetfs/Artist/Album/01-Track.flac \
+    --rw=read --bs=1M --direct=0 \
+    --ioengine=sync --numjobs=1 \
+    --runtime=30 --time_based
+```
+
+**Metrics**: MB/s throughput
+
+**Target**: >100 MB/s (memory-backed after first read)
+
+#### 4b. Random read (simulates seeking in audio player)
+
+```bash
+fio --name=rand_read \
+    --filename=/mnt/beetfs/Artist/Album/01-Track.flac \
+    --rw=randread --bs=64k --direct=0 \
+    --ioengine=sync --numjobs=1 \
+    --runtime=30 --time_based
+```
+
+**Metrics**: IOPS, latency histogram
+
+---
+
+### 5. Memory Usage
+
+**Goal**: Measure memory consumption under load.
+
+#### 5a. Idle memory (mounted, no activity)
+
+```bash
+# Mount and measure RSS
+beet mount /mnt/beetfs &
+sleep 5
+ps -o rss= -p $(pgrep -f beetfs)
+```
+
+#### 5b. Memory per open file
+
+```bash
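+# Hold N files open at once, then measure memory growth
+files=(/mnt/beetfs/Artist/Album/*.flac)
+for n in 1 5 10 20; do
+  for f in "${files[@]:0:n}"; do
+    # Open the file on fd 3 and keep it open while RSS is sampled
+    ( exec 3< "$f"; sleep 10 ) &
+  done
+  sleep 5  # give every open() time to complete
+  ps -o rss= -p "$(pgrep -f beetfs)"
+  wait     # let the holders exit and close their descriptors
+done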
+```
+
+**Expected**: ~file_size × open_files (FileHandler caches entire file)
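
To compare measured growth against the `file_size × open_files` expectation, RSS can be read from `/proc/<pid>/status` and divided by the total size of the open files. This is a minimal Python 3 sketch for Linux; all three function names are hypothetical, and the ratio it returns corresponds to the "Memory per open file" multiple used in the success criteria.

```python
def parse_vm_rss_kb(status_text):
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     52000 kB"
            return int(line.split()[1])
    raise ValueError("VmRSS not found")


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vm_rss_kb(fh.read())


def per_file_ratio(rss_before_kb, rss_after_kb, open_files, file_size_kb):
    """RSS growth divided by the total size of the open files.

    A value near 1.0 means each open file costs about its own size in RAM.
    """
    if open_files <= 0 or file_size_kb <= 0:
        raise ValueError("open_files and file_size_kb must be positive")
    return (rss_after_kb - rss_before_kb) / (open_files * file_size_kb)
```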
+
+#### 5c. Memory leak detection
+
+```bash
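+# Repeatedly open/close files, check for memory growth
+pid=$(pgrep -f beetfs)
+before=$(ps -o rss= -p "$pid")
+for i in {1..100}; do
+  cat /mnt/beetfs/Artist/Album/01-Track.flac > /dev/null
+done
+after=$(ps -o rss= -p "$pid")
+# Compare RSS before and after
+echo "RSS before: ${before} kB, after: ${after} kB"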
+```
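
A single before/after comparison is sensitive to allocator noise. If RSS is sampled after every iteration instead, comparing the median of the first and last few samples gives a steadier signal. The sketch below is one possible check in Python 3; the function names, the window of 10 samples, and the 1024 kB tolerance are all assumptions chosen for illustration.

```python
from statistics import median


def rss_growth_kb(samples_kb, window=10):
    """Median of the last `window` samples minus median of the first `window`."""
    if len(samples_kb) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    return median(samples_kb[-window:]) - median(samples_kb[:window])


def looks_leaky(samples_kb, window=10, tolerance_kb=1024):
    """True if RSS grew by more than tolerance_kb across the run."""
    return rss_growth_kb(samples_kb, window) > tolerance_kb
```

A positive result is only a hint: a cache that fills up and then plateaus also shows growth, so a suspected leak should be confirmed with a longer run.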
+
+---
+
+### 6. Concurrent Access
+
+**Goal**: Measure performance under parallel access (multiple processes).
+
+```bash
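+# Parallel stat operations, repeated for 1, 2, 4, and 8 workers.
+# Paths come from find, so no particular track naming scheme is assumed.
+hyperfine --warmup 1 --runs 5 --parameter-list workers 1,2,4,8 \
+  'find /mnt/beetfs -type f -print0 | xargs -0 -P {workers} -n 1 stat > /dev/null'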
+```
+
+**Metrics**:
+- Throughput scaling with parallelism (1, 2, 4, 8 workers)
+- Latency degradation
+
+**Expected**: Limited scaling due to Python GIL
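
Spawning one `stat` process per file adds process-creation cost to every call. An alternative client that issues `os.stat` from a thread pool removes that cost while still sending parallel requests, because CPython releases the GIL during the system call in the client. The sketch below is a Python 3 illustration with hypothetical function names; it measures the benchmark client side only and makes no assumption about beetfs internals.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def stat_throughput(paths, workers):
    """Stat every path using `workers` threads; return calls per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for completion and re-raises any OSError from a worker.
        list(pool.map(os.stat, paths))
    elapsed = time.perf_counter() - start
    return len(paths) / elapsed if elapsed > 0 else float("inf")


def scaling_table(paths, worker_counts=(1, 2, 4, 8)):
    """Map each worker count to its measured stat throughput."""
    return {w: stat_throughput(paths, w) for w in worker_counts}
```

If throughput at 4 or 8 workers stays close to the single-worker figure, that is consistent with the expected serialization in the filesystem process.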
+
+---
+
+### 7. Realistic Workloads
+
+#### 7a. Music player library scan
+
+Simulates: Rhythmbox/Clementine scanning library at startup
+
+```bash
+# Recursive stat + readdir; `-c %n` prints one line per file so wc -l counts files
+time find /mnt/beetfs -type f -name "*.flac" -exec stat -c %n {} + | wc -l
+```
+
+#### 7b. Album playback
+
+Simulates: Playing 12-track album sequentially
+
+```bash
+# Open each file, read 1MB (simulate buffering), close
+for f in /mnt/beetfs/Artist/Album/*.flac; do
+  dd if="$f" of=/dev/null bs=1M count=1 2>/dev/null
+done
+```
+
+#### 7c. Metadata edit
+
+Simulates: Editing tags in Picard/Kid3
+
+```bash
+# Open file, write to header region, close
+# (Requires write support to be functional)
+```
+
+---
+
+## Baseline Comparisons
+
+### Reference Filesystems
+
+| Filesystem | Purpose |
+|------------|---------|
+| **ext4 (local)** | Best-case baseline |
+| **fuse-passthrough** | FUSE overhead baseline |
+| **sshfs** | Network FUSE comparison |
+
+### Comparison Method
+
+Run identical benchmarks on:
+1. Real music files on ext4
+2. Same files via FUSE passthrough
+3. Same files via beetfs
+
+Calculate overhead: `(beetfs_time - ext4_time) / ext4_time × 100%`
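
The overhead formula can be applied to all three runs at once, which separates the cost of FUSE itself from the cost added by beetfs. This is a minimal Python 3 sketch; the function names and the dictionary keys are hypothetical.

```python
def overhead_percent(candidate_time, baseline_time):
    """(candidate - baseline) / baseline * 100, as defined in this plan."""
    if baseline_time <= 0:
        raise ValueError("baseline_time must be positive")
    return (candidate_time - baseline_time) / baseline_time * 100.0


def overhead_breakdown(ext4_time, passthrough_time, beetfs_time):
    """Split total beetfs overhead into FUSE cost and beetfs-specific cost."""
    return {
        "fuse_vs_ext4": overhead_percent(passthrough_time, ext4_time),
        "beetfs_vs_ext4": overhead_percent(beetfs_time, ext4_time),
        "beetfs_vs_fuse": overhead_percent(beetfs_time, passthrough_time),
    }
```

The `beetfs_vs_fuse` figure is the one most directly attributable to beetfs, since both sides of that comparison pay the kernel-to-userspace round trip.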
+
+---
+
+## Test Environment
+
+### Hardware Requirements
+
+- CPU: 4+ cores (to test GIL impact)
+- RAM: 8+ GB (for large library tests)
+- Storage: SSD recommended (reduces I/O variance)
+
+### Software Requirements
+
+```nix
+# Add to flake.nix devShell
+buildInputs = [
+  fio
+  hyperfine
+  # ior  # includes mdtest
+];
+```
+
+### Cache Control
+
+```bash
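+# Clear all caches before cold benchmarks (requires root)
+sync
+echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
+
+# Disable kernel FUSE caching for accurate measurements.
+# These are libfuse mount options; pass them when mounting beetfs:
+#   -o entry_timeout=0,attr_timeout=0,negative_timeout=0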
+```
+
+---
+
+## Success Criteria
+
+### Minimum Viable Performance
+
+| Metric | Minimum | Target | Excellent |
+|--------|--------:|-------:|----------:|
+| Mount time (10K items) | <60s | <15s | <5s |
+| stat latency (avg) | <20ms | <5ms | <1ms |
+| stat throughput | >100/s | >500/s | >2000/s |
+| File open (50MB, cold) | <5s | <1s | <200ms |
+| Read throughput | >50 MB/s | >200 MB/s | >500 MB/s |
+| Memory (idle, 10K items) | <500MB | <100MB | <50MB |
+| Memory per open file | <2× file size | <1.5× | <1.1× |
+
+### Regression Detection
+
+Any benchmark result >20% worse than baseline triggers investigation.
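
The 20% rule needs a direction per metric: "worse" means higher for latency and memory, but lower for throughput. A possible check, sketched in Python 3 with a hypothetical function name and parameters:

```python
def is_regression(baseline, current, higher_is_better=False, threshold=0.20):
    """True if `current` is more than `threshold` worse than `baseline`.

    For time and memory metrics, worse means larger. For throughput
    metrics (stat/s, MB/s), pass higher_is_better=True.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if higher_is_better:
        change = (baseline - current) / baseline
    else:
        change = (current - baseline) / baseline
    return change > threshold
```

For instance, a mount time going from 100 to 121 is flagged, and so is stat throughput dropping from 500/s to 390/s with `higher_is_better=True`.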
+
+---
+
+## Implementation Notes
+
+### Test Data Generation
+
+Use existing test infrastructure from `tests/conftest.py`:
+- `create_synthetic_flac()` - generates valid FLAC files
+- `BeetFSTestCase` - creates isolated beets library
+
+### Benchmark Script Structure
+
+```
+beetfs/
+├── benchmarks/
+│   ├── run_all.sh          # Master script
+│   ├── bench_mount.sh      # Mount time tests
+│   ├── bench_metadata.sh   # stat/readdir tests
+│   ├── bench_io.sh         # Read/write throughput
+│   ├── bench_memory.sh     # Memory profiling
+│   └── results/            # Output directory
+│       ├── mount_scaling.csv
+│       ├── stat_latency.csv
+│       └── ...
+```
+
+### Output Format
+
+```csv
+# Example: mount_scaling.csv
+library_size,mount_time_ms,memory_rss_kb,timestamp
+100,450,52000,2024-01-15T10:30:00
+1000,2100,61000,2024-01-15T10:31:00
+10000,12500,98000,2024-01-15T10:33:00
+```
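
Rows in this format can be produced with Python's `csv` module. The sketch below assumes Python 3 and uses the column names from the example above; `append_result` and its parameters are hypothetical. It writes no `# Example:` comment line, since most CSV readers would treat such a line as data.

```python
import csv
from datetime import datetime

FIELDS = ["library_size", "mount_time_ms", "memory_rss_kb", "timestamp"]


def append_result(stream, library_size, mount_time_ms, memory_rss_kb,
                  when=None, write_header=False):
    """Write one mount_scaling row to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, lineterminator="\n")
    if write_header:
        writer.writeheader()
    when = when or datetime.now()
    writer.writerow({
        "library_size": library_size,
        "mount_time_ms": mount_time_ms,
        "memory_rss_kb": memory_rss_kb,
        # Second resolution, ISO 8601, matching the example rows.
        "timestamp": when.replace(microsecond=0).isoformat(),
    })
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.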
+
+---
+
+## Known Limitations
+
+1. **Python 2.7 GIL**: Cannot achieve true parallelism - expect flat scaling beyond 1 core
+2. **FileHandler memory**: Each open file = full file in RAM - will OOM with many large files
+3. **No lazy loading**: All library items loaded at mount - slow for large libraries
+4. **SQLite single-writer**: Concurrent writes will serialize
+
+## Optimization Opportunities (Post-Benchmark)
+
+Based on benchmark results, consider:
+
+1. **Lazy FSNode construction** - Build tree on first access, not mount
+2. **Memory-mapped file access** - mmap instead of full read
+3. **LRU cache for FileHandler** - Evict old files instead of holding all
+4. **Metadata caching** - Cache getattr results, invalidate on DB change
+5. **Batch DB queries** - Prefetch metadata for directory listings
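
As an illustration of item 3, an LRU cache with a byte budget bounds total memory no matter how many files are opened. This is a sketch only, written for Python 3 and not taken from beetfs: the class name `ByteBudgetLRU` and its methods are hypothetical, and `OrderedDict.move_to_end` does not exist on Python 2.7, where a pop and reinsert would be needed instead.

```python
from collections import OrderedDict


class ByteBudgetLRU:
    """Cache of path -> bytes that evicts least recently used entries
    once the total cached size exceeds max_bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()

    def get(self, path):
        data = self._entries.get(path)
        if data is not None:
            # Mark as most recently used.
            self._entries.move_to_end(path)
        return data

    def put(self, path, data):
        if path in self._entries:
            self.used_bytes -= len(self._entries.pop(path))
        self._entries[path] = data
        self.used_bytes += len(data)
        # Evict oldest entries, but always keep the newest one,
        # even if it alone exceeds the budget.
        while self.used_bytes > self.max_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
```

A real integration would also have to avoid evicting files that still have open handles, which this sketch does not track.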