docs/v1/ - Original beetfs documentation:
- analysis.md, components.md, data-flow.md, drawbacks.md
- features.md, modernization.md, rust-migration.md
- benchmark-plan.md, benchmark-results.md, e2e-test-plan.md
- README.md
docs/v2/ - New MusicFS architecture:
- requirements.md: Full requirements spec (FR-1 to FR-25, NFR-1 to NFR-14)
- P0: Multi-origin, plugins, CAS, control API
- P1: Search, album art, prefetch, metadata sources
- P3: HA, 10M+ files scalability
- architecture.md: Google BlueDoc style design document
- PlantUML diagrams for all components
- Design requirements with quantitative targets
- Alternatives considered, implementation plan
beetfs Performance Analysis
Executive Summary
beetfs has significant performance limitations that stem from its 2010-era design assumptions. The two primary issues are that every opened file is loaded entirely into RAM and that this load happens as blocking I/O inside the open call.
1. Latency Analysis
Operation Latencies
| Operation | Time Complexity | Typical Latency | Notes |
|---|---|---|---|
| File Open | O(file_size) | 50ms - 1s+ | Reads entire file into memory |
| File Read | O(1) | <1ms | Pure memory slice |
| File Write | O(file_size) | 100ms - 2s+ | Reconstructs + DB write |
| Directory List | O(n) | <10ms | In-memory tree traversal |
| getattr | O(depth) | <1ms | Tree navigation + stat |
File Open Breakdown
The file open operation is the critical bottleneck:
Time breakdown for opening 50MB FLAC file:
┌─────────────────────────────────────────────────────────┐
│ 1. open() syscall                          │ ~1ms       │
│ 2. file_object.read() - load entire file   │ ~100-200ms │
│ 3. InterpolatedFLAC() - parse FLAC         │ ~20-50ms   │
│ 4. Inject DB metadata                      │ ~1ms       │
│ 5. get_header() - generate new header      │ ~10-20ms   │
│ 6. Seek to audio offset                    │ ~1ms       │
│ 7. Read audio into music_data              │ ~100-200ms │
├─────────────────────────────────────────────────────────┤
│ TOTAL                                      │ ~230-470ms │
└─────────────────────────────────────────────────────────┘
Code Evidence (lines 461-483):
# Step 2-5: Load and parse entire file
self.inf = InterpolatedFLAC(self.file_object.read()) # FULL FILE READ
self.inf["title"] = self.item.title
# ...
self.header = self.inf.get_header(self.real_path)
# Step 6-7: Cache all audio data
self.file_object.seek(self.music_offset)
self.music_data = self.file_object.read() # ANOTHER FULL READ
Read Operation (Post-Open)
After a file is opened, reads are fast because they are served entirely from memory:
def read(self, size, offset):
if offset < self.bound:
return self.header[offset:offset+size] # Memory slice: O(1)
else:
return self.music_data[offset - len(self.header):...] # Memory slice: O(1)
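The excerpt above serves a request from either the header or the audio buffer, but does not show what happens when a request straddles the boundary between them. A minimal sketch of a read that handles that case, assuming `bound` equals `len(header)`; `read_virtual` is a hypothetical standalone function, not the actual beetfs method:

```python
def read_virtual(header: bytes, music_data: bytes, size: int, offset: int) -> bytes:
    """Return up to `size` bytes of the virtual file starting at `offset`.

    The virtual file is the generated header followed by the cached audio.
    Assumption: the boundary (`self.bound` in the excerpt) equals len(header).
    """
    bound = len(header)
    end = offset + size
    parts = []
    if offset < bound:
        # Part (or all) of the request falls inside the header.
        parts.append(header[offset:min(end, bound)])
    if end > bound:
        # Part (or all) of the request falls inside the audio data.
        start = max(offset, bound) - bound
        parts.append(music_data[start:end - bound])
    return b"".join(parts)
```

Both branches are still plain memory slices, so the cost stays independent of the total file size.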
Write Operation
Writes to the header area trigger an expensive reconstruction:
Time breakdown for tag write:
┌─────────────────────────────────────────────────────────┐
│ 1. Reconstruct filedata in memory          │ ~10-50ms   │
│ 2. Parse as InterpolatedFLAC               │ ~20-50ms   │
│ 3. Extract tag values                      │ ~1ms       │
│ 4. lib.store() + lib.save() (SQLite)       │ ~10-50ms   │
│ 5. Regenerate header                       │ ~10-20ms   │
├─────────────────────────────────────────────────────────┤
│ TOTAL                                      │ ~50-170ms  │
└─────────────────────────────────────────────────────────┘
2. Memory Footprint
Per-File Memory Usage
┌─────────────────────────────────────────────────────────────────────┐
│ FileHandler Memory Layout │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ self.music_data (bytes) │ │
│ │ Size: file_size - original_header_size │ │
│ │ Typical: 95-99% of file size │ │
│ │ Example: 48.5 MB for 50 MB file │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ self.header (bytes) │ │
│ │ Size: Generated FLAC header with DB metadata │ │
│ │ Typical: 4 KB - 64 KB (depends on metadata + padding) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ self.inf (InterpolatedFLAC) │ │
│ │ Size: Parsed metadata blocks + internal state │ │
│ │ Typical: 10 KB - 100 KB │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Other attributes │ │
│ │ path, real_path, item reference, format, etc. │ │
│ │ Typical: ~1 KB │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
├─────────────────────────────────────────────────────────────────────┤
│ TOTAL per file: ~1.0x - 1.1x original file size │
└─────────────────────────────────────────────────────────────────────┘
Memory Scaling
| Scenario | Files Open | Avg File Size | RAM Usage |
|---|---|---|---|
| Single track playback | 1 | 30 MB | ~32 MB |
| Album playback (gapless) | 2-3 | 30 MB | ~65-100 MB |
| Album fully opened | 10 | 30 MB | ~320 MB |
| Jellyfin library scan | 50-100 | 30 MB | 1.6 - 3.2 GB |
| Full library scan | 1000 | 30 MB | 32 GB (OOM) |
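The rows above follow from multiplying the number of open handlers by the average file size and a per-file overhead factor. A small sketch of that arithmetic, assuming an overhead factor of about 1.07 (32 MB for a 30 MB file, within the 1.0x - 1.1x range from the memory layout); `estimate_ram_mb` is a hypothetical helper, not part of beetfs:

```python
def estimate_ram_mb(open_files: int, avg_file_mb: float, overhead: float = 32 / 30) -> float:
    """Estimate resident memory when every open file is fully cached in RAM."""
    return open_files * avg_file_mb * overhead


for n in (1, 10, 100, 1000):
    print(f"{n:>5} open files -> ~{estimate_ram_mb(n, 30):,.0f} MB")
```

Because the estimate is linear in the number of open files, nothing caps memory growth except the number of concurrent opens.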
Global Memory
# Directory tree structure
directory_structure = FSNode({}, {})
# Memory: O(number_of_items)
# Typical: 1-10 MB for libraries with 10,000-100,000 tracks
# Open file handles
self.files = {} # Dict[str, FileHandler]
# Memory: Sum of all FileHandler instances
# Unbounded - grows with concurrent opens
3. I/O Patterns
Current (Inefficient)
File Open:
Disk → [Read ALL] → RAM (music_data)
→ RAM (inf object)
→ RAM (header)
File Read:
RAM (header or music_data) → Application
Total I/O: 1x-2x file size on open, 0 on read
Optimal (Not Implemented)
File Open:
Disk → [Read header only] → RAM (small)
File Read:
If header region:
RAM (header) → Application
If audio region:
Disk → [Seek + Read chunk] → Application
Total I/O: ~64KB on open, on-demand reads
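The optimal pattern could be sketched as follows. This is not beetfs code: `LazyHandle`, its constructor arguments, and the use of `os.pread` (available on Unix) are assumptions chosen for illustration. The generated header stays in RAM, and audio bytes are read from disk only when requested, by translating the virtual offset into an offset in the real file.

```python
import os


class LazyHandle:
    """Serve a virtual file (generated header + on-disk audio) without caching audio.

    Hypothetical sketch: `header` is the regenerated header and `music_offset`
    is the position where audio data starts in the real file.
    """

    def __init__(self, real_path: str, header: bytes, music_offset: int):
        self.header = header
        self.music_offset = music_offset
        self.fd = os.open(real_path, os.O_RDONLY)

    def read(self, size: int, offset: int) -> bytes:
        bound = len(self.header)
        end = offset + size
        parts = []
        if offset < bound:
            # Header region: served from memory.
            parts.append(self.header[offset:min(end, bound)])
        if end > bound:
            # Audio region: map the virtual offset to the real file
            # and read only the requested chunk from disk.
            virtual_start = max(offset, bound)
            real_start = self.music_offset + (virtual_start - bound)
            parts.append(os.pread(self.fd, end - virtual_start, real_start))
        return b"".join(parts)

    def close(self) -> None:
        os.close(self.fd)
```

With this layout the per-file memory cost is the header alone, at the price of one disk read per audio request (which the OS page cache will often absorb).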
4. Concurrency
Current Model
server.multithreaded = 0 # Single-threaded
Implications:
- All FUSE operations serialized
- One slow file open blocks everything
- No benefit from multi-core CPUs
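To make the queueing effect concrete, here is a small deterministic model (a simulation, not a measurement). With a single thread, each request completes only after every request ahead of it; with several workers, a request starts as soon as any worker is free. The function name and the example durations are illustrative assumptions.

```python
import heapq


def completion_times(durations_ms, workers=1):
    """Return the completion time of each request, served in arrival order.

    All requests are assumed to arrive at t=0. `workers=1` models the
    single-threaded server; larger values model a thread pool.
    """
    free_at = [0.0] * workers  # min-heap of times at which each worker is free
    heapq.heapify(free_at)
    done = []
    for duration in durations_ms:
        start = heapq.heappop(free_at)  # earliest available worker
        finish = start + duration
        heapq.heappush(free_at, finish)
        done.append(finish)
    return done


# One slow file open (400 ms) followed by three fast getattr calls (1 ms each).
requests = [400, 1, 1, 1]
print(completion_times(requests, workers=1))
print(completion_times(requests, workers=4))
```

In the single-worker case the three cheap calls all wait behind the slow open, which is the behavior described in the list above.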
Impact on Use Cases
| Use Case | Impact |
|---|---|
| Single player (VLC) | Acceptable - one file at a time |
| Media server scan | Severe - sequential processing |
| Multiple clients | Severe - requests queue up |
| Concurrent reads | Moderate - reads are fast once open |
5. Benchmarks (Theoretical)
Based on code analysis, not actual measurements:
File Open Time vs Size
| File Size | Open Time (HDD) | Open Time (SSD) |
|---|---|---|
| 10 MB | 50-100 ms | 20-50 ms |
| 30 MB | 150-300 ms | 50-100 ms |
| 50 MB | 250-500 ms | 100-200 ms |
| 100 MB | 500-1000 ms | 200-400 ms |
| 200 MB | 1000-2000 ms | 400-800 ms |
Memory vs Concurrent Opens
| Open Files | RAM Usage (30MB avg) |
|---|---|
| 1 | ~32 MB |
| 5 | ~160 MB |
| 10 | ~320 MB |
| 25 | ~800 MB |
| 50 | ~1.6 GB |
| 100 | ~3.2 GB |
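Since the figures in this section are derived from code analysis, they could be checked with a small timing harness. The following sketch times a full-file read, which is the dominant cost in the open path; `time_full_read` is a hypothetical helper, and results will vary with the page cache state and the storage device.

```python
import time


def time_full_read(path: str) -> tuple[int, float]:
    """Read the whole file, as the open path does, and return (bytes_read, elapsed_ms)."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return len(data), elapsed_ms
```

Running it against a file under the beetfs mount point and against the same file on the underlying disk would give a rough end-to-end comparison.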
6. Comparison with Alternatives
| Metric | beetfs | Direct File | NFS | FUSE passthrough |
|---|---|---|---|---|
| Open latency | 200-500ms | <10ms | 10-50ms | <10ms |
| Read latency | <1ms | <1ms | 1-10ms | <1ms |
| Memory/file | ~1x size | ~0 | ~0 | ~0 |
| Metadata source | Database | File | File | File |
| Modify original | No | Yes | Yes | Yes |
7. Recommendations
For Current Usage
- Limit concurrent opens - Don't scan the full library in one pass
- Use SSDs - Reduces open latency by roughly 2-3x
- Increase RAM - Expect about 1x file size per open file
- Avoid large files - 24-bit/192kHz FLACs are problematic
For Modernization
- Implement lazy loading - Read audio on demand
- Add file handle caching - Keep headers, release audio
- Enable multi-threading - Parallelize opens
- Add memory limits - Evict old FileHandlers
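As an illustration of the memory-limit recommendation, the sketch below keeps cached file data in least-recently-used order and evicts the oldest entries once a byte budget is exceeded. `BoundedHandlerCache` and its size accounting are assumptions for illustration; beetfs currently stores handlers in a plain, unbounded dict.

```python
from collections import OrderedDict


class BoundedHandlerCache:
    """LRU cache of per-file buffers with a total byte budget (hypothetical sketch)."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self._items = OrderedDict()  # path -> cached bytes, oldest first

    def get(self, path: str):
        data = self._items.get(path)
        if data is not None:
            self._items.move_to_end(path)  # mark as most recently used
        return data

    def put(self, path: str, data: bytes) -> None:
        if path in self._items:
            self.used -= len(self._items.pop(path))
        self._items[path] = data
        self.used += len(data)
        # Evict least-recently-used entries until the budget is met,
        # but always keep the entry that was just inserted.
        while self.used > self.max_bytes and len(self._items) > 1:
            _, evicted = self._items.popitem(last=False)
            self.used -= len(evicted)
```

A real implementation would also need to reload evicted data when a file that is still open is read again, which pairs naturally with lazy loading.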