Reorganize docs into v1 (beetfs) and v2 (new architecture)
docs/v1/ - Original beetfs documentation:
- analysis.md, components.md, data-flow.md, drawbacks.md
- features.md, modernization.md, rust-migration.md
- benchmark-plan.md, benchmark-results.md, e2e-test-plan.md
- README.md
docs/v2/ - New MusicFS architecture:
- requirements.md: Full requirements spec (FR-1 to FR-25, NFR-1 to NFR-14)
- P0: Multi-origin, plugins, CAS, control API
- P1: Search, album art, prefetch, metadata sources
- P3: HA, 10M+ files scalability
- architecture.md: Google BlueDoc style design document
- PlantUML diagrams for all components
- Design requirements with quantitative targets
- Alternatives considered, implementation plan
This commit is contained in:
@@ -0,0 +1,118 @@
|
||||
# beetfs - Reverse Engineered Documentation
|
||||
|
||||
> **Status**: Archived project (2010-2013), Python 2, fuse-python API
|
||||
> **Fork**: git@github.com:LichHunter/beetfs.git
|
||||
> **Original**: https://github.com/jbaiter/beetfs
|
||||
|
||||
## Overview
|
||||
|
||||
beetfs is a FUSE filesystem that presents audio files with **metadata from a database** while **passing through audio data unchanged** from original files. This enables transparent metadata modification without touching the underlying files.
|
||||
|
||||
### The Core Concept
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ APPLICATION (VLC, Jellyfin, etc.) │
|
||||
│ │
|
||||
│ read("/mount/Artist/Album/track.flac") │
|
||||
└─────────────────────────────────┬───────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ beetfs (FUSE Layer) │
|
||||
│ ┌────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ FileHandler │ │
|
||||
│ │ ┌──────────────────────────────────────────────────────────┐ │ │
|
||||
│ │ │ if offset < header_boundary: │ │ │
|
||||
│ │ │ return MODIFIED_HEADER (from beets database) │ │ │
|
||||
│ │ │ else: │ │ │
|
||||
│ │ │ return ORIGINAL_AUDIO (from real file on disk) │ │ │
|
||||
│ │ └──────────────────────────────────────────────────────────┘ │ │
|
||||
│ └────────────────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
│ │
|
||||
┌───────────┘ └───────────┐
|
||||
▼ ▼
|
||||
┌───────────────────┐ ┌───────────────────┐
|
||||
│ Beets Database │ │ Original File │
|
||||
│ (SQLite - tags) │ │ (untouched) │
|
||||
│ │ │ │
|
||||
│ title: "Fixed" │ │ [FLAC header] │
|
||||
│ artist: "Corr" │ │ [Audio frames] │
|
||||
│ album: "Right" │ │ │
|
||||
└───────────────────┘ └───────────────────┘
|
||||
```
|
||||
|
||||
## Key Features
|
||||
|
||||
| Feature | Description |
|
||||
|---------|-------------|
|
||||
| **Metadata Overlay** | Returns tags from database, not from file |
|
||||
| **Audio Passthrough** | Original audio data served unchanged |
|
||||
| **Write Interception** | Tag edits saved to database, not to file |
|
||||
| **Virtual Organization** | Presents files in template-based directory structure |
|
||||
| **Format Support** | FLAC (full), MP3 (partial - read-only) |
|
||||
|
||||
## File Structure
|
||||
|
||||
```
|
||||
beetfs/
|
||||
├── beetsplug/
|
||||
│ ├── __init__.py # Package initialization
|
||||
│ └── beetFs.py # ALL code (~1144 lines)
|
||||
├── README.rst # Original readme
|
||||
└── COPYING # GPLv3 license
|
||||
```
|
||||
|
||||
## Quick Architecture Summary
|
||||
|
||||
| Component | Lines | Purpose |
|
||||
|-----------|-------|---------|
|
||||
| `beetFs` (plugin) | 188-191 | Beets plugin hook |
|
||||
| `mount()` | 119-183 | CLI entry point, builds virtual tree |
|
||||
| `FSNode` | 390-436 | Virtual directory tree node |
|
||||
| `FileHandler` | 439-565 | **CORE**: Metadata interpolation |
|
||||
| `InterpolatedFLAC` | 274-388 | FLAC header generation |
|
||||
| `InterpolatedID3` | 200-271 | ID3 tag generation (incomplete) |
|
||||
| `beetFileSystem` | 622-1144 | FUSE operations implementation |
|
||||
| `Stat` | 568-619 | File stat structure |
|
||||
|
||||
## Documentation Index
|
||||
|
||||
1. **[Architecture Overview](./architecture.md)** - System design and component interaction
|
||||
2. **[Components Deep Dive](./components.md)** - Detailed component analysis
|
||||
3. **[Data Flow](./data-flow.md)** - Read/write operation flows
|
||||
4. **[Performance Analysis](./analysis.md)** - Latency, memory footprint, I/O patterns
|
||||
5. **[Drawbacks & Limitations](./drawbacks.md)** - Known issues and missing features
|
||||
6. **[Modernization Guide](./modernization.md)** - Notes for updating to Python 3
|
||||
|
||||
## Critical Issues Summary
|
||||
|
||||
| Issue | Severity | Impact |
|
||||
|-------|----------|--------|
|
||||
| Full file loaded into RAM | 🔴 Critical | OOM on large libraries |
|
||||
| MP3 support disabled | 🔴 Critical | Only FLAC works |
|
||||
| Python 2 only | 🔴 Critical | EOL, security risk |
|
||||
| Single-threaded | 🟡 Major | Poor concurrency |
|
||||
| 4 of 17 metadata fields | 🟡 Major | Limited functionality |
|
||||
|
||||
See [drawbacks.md](./drawbacks.md) for complete list (27 identified issues).
|
||||
|
||||
## Dependencies (Original)
|
||||
|
||||
```
|
||||
beets >= 1.0
|
||||
fuse-python (Python 2 FUSE bindings)
|
||||
mutagen (audio metadata library)
|
||||
```
|
||||
|
||||
## Usage (Original)
|
||||
|
||||
```bash
|
||||
# As beets plugin
|
||||
beet mount /path/to/mountpoint
|
||||
```
|
||||
|
||||
## License
|
||||
|
||||
GPLv3 - See COPYING file
|
||||
@@ -0,0 +1,263 @@
|
||||
# beetfs Performance Analysis
|
||||
|
||||
## Executive Summary
|
||||
|
||||
beetfs has significant performance limitations due to its 2010-era design assumptions. The primary issues are **full file loading into RAM** and **blocking I/O on file open**.
|
||||
|
||||
---
|
||||
|
||||
## 1. Latency Analysis
|
||||
|
||||
### Operation Latencies
|
||||
|
||||
| Operation | Time Complexity | Typical Latency | Notes |
|
||||
|-----------|-----------------|-----------------|-------|
|
||||
| **File Open** | O(file_size) | 50ms - 1s+ | Reads entire file into memory |
|
||||
| **File Read** | O(1) | <1ms | Pure memory slice |
|
||||
| **File Write** | O(file_size) | 100ms - 2s+ | Reconstructs + DB write |
|
||||
| **Directory List** | O(n) | <10ms | In-memory tree traversal |
|
||||
| **getattr** | O(depth) | <1ms | Tree navigation + stat |
|
||||
|
||||
### File Open Breakdown
|
||||
|
||||
The file open operation is the critical bottleneck:
|
||||
|
||||
```
|
||||
Time breakdown for opening 50MB FLAC file:
|
||||
┌────────────────────────────────────────────────────────────┐
|
||||
│ 1. open() syscall │ ~1ms │
|
||||
│ 2. file_object.read() - load entire file │ ~100-200ms │
|
||||
│ 3. InterpolatedFLAC() - parse FLAC │ ~20-50ms │
|
||||
│ 4. Inject DB metadata │ ~1ms │
|
||||
│ 5. get_header() - generate new header │ ~10-20ms │
|
||||
│ 6. Seek to audio offset │ ~1ms │
|
||||
│ 7. Read audio into music_data │ ~100-200ms │
|
||||
├────────────────────────────────────────────────────────────┤
|
||||
│ TOTAL │ ~230-470ms │
|
||||
└────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
**Code Evidence** (lines 461-483):
|
||||
```python
|
||||
# Step 2-5: Load and parse entire file
|
||||
self.inf = InterpolatedFLAC(self.file_object.read()) # FULL FILE READ
|
||||
self.inf["title"] = self.item.title
|
||||
# ...
|
||||
self.header = self.inf.get_header(self.real_path)
|
||||
|
||||
# Step 6-7: Cache all audio data
|
||||
self.file_object.seek(self.music_offset)
|
||||
self.music_data = self.file_object.read() # ANOTHER FULL READ
|
||||
```
|
||||
|
||||
### Read Operation (Post-Open)
|
||||
|
||||
After file is opened, reads are fast:
|
||||
|
||||
```python
|
||||
def read(self, size, offset):
|
||||
if offset < self.bound:
|
||||
return self.header[offset:offset+size] # Memory slice: O(1)
|
||||
else:
|
||||
return self.music_data[offset - len(self.header):...] # Memory slice: O(1)
|
||||
```
|
||||
|
||||
### Write Operation
|
||||
|
||||
Writes to header area trigger expensive reconstruction:
|
||||
|
||||
```
|
||||
Time breakdown for tag write:
|
||||
┌────────────────────────────────────────────────────────────┐
|
||||
│ 1. Reconstruct filedata in memory │ ~10-50ms │
|
||||
│ 2. Parse as InterpolatedFLAC │ ~20-50ms │
|
||||
│ 3. Extract tag values │ ~1ms │
|
||||
│ 4. lib.store() + lib.save() (SQLite) │ ~10-50ms │
|
||||
│ 5. Regenerate header │ ~10-20ms │
|
||||
├────────────────────────────────────────────────────────────┤
|
||||
│ TOTAL │ ~50-170ms │
|
||||
└────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Memory Footprint
|
||||
|
||||
### Per-File Memory Usage
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ FileHandler Memory Layout │
|
||||
├─────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ self.music_data (bytes) │ │
|
||||
│ │ Size: file_size - original_header_size │ │
|
||||
│ │ Typical: 95-99% of file size │ │
|
||||
│ │ Example: 48.5 MB for 50 MB file │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ self.header (bytes) │ │
|
||||
│ │ Size: Generated FLAC header with DB metadata │ │
|
||||
│ │ Typical: 4 KB - 64 KB (depends on metadata + padding) │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ self.inf (InterpolatedFLAC) │ │
|
||||
│ │ Size: Parsed metadata blocks + internal state │ │
|
||||
│ │ Typical: 10 KB - 100 KB │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ ┌─────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Other attributes │ │
|
||||
│ │ path, real_path, item reference, format, etc. │ │
|
||||
│ │ Typical: ~1 KB │ │
|
||||
│ └─────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
├─────────────────────────────────────────────────────────────────────┤
|
||||
│ TOTAL per file: ~1.0x - 1.1x original file size │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Memory Scaling
|
||||
|
||||
| Scenario | Files Open | Avg File Size | RAM Usage |
|
||||
|----------|------------|---------------|-----------|
|
||||
| Single track playback | 1 | 30 MB | ~32 MB |
|
||||
| Album playback (gapless) | 2-3 | 30 MB | ~65-100 MB |
|
||||
| Album fully opened | 10 | 30 MB | ~320 MB |
|
||||
| Jellyfin library scan | 50-100 | 30 MB | **1.6 - 3.2 GB** |
|
||||
| Full library scan | 1000 | 30 MB | **32 GB** (OOM) |
|
||||
|
||||
### Global Memory
|
||||
|
||||
```python
|
||||
# Directory tree structure
|
||||
directory_structure = FSNode({}, {})
|
||||
# Memory: O(number_of_items)
|
||||
# Typical: 1-10 MB for libraries with 10,000-100,000 tracks
|
||||
|
||||
# Open file handles
|
||||
self.files = {} # Dict[str, FileHandler]
|
||||
# Memory: Sum of all FileHandler instances
|
||||
# Unbounded - grows with concurrent opens
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. I/O Patterns
|
||||
|
||||
### Current (Inefficient)
|
||||
|
||||
```
|
||||
File Open:
|
||||
Disk → [Read ALL] → RAM (music_data)
|
||||
→ RAM (inf object)
|
||||
→ RAM (header)
|
||||
|
||||
File Read:
|
||||
RAM (header or music_data) → Application
|
||||
|
||||
Total I/O: 1x-2x file size on open, 0 on read
|
||||
```
|
||||
|
||||
### Optimal (Not Implemented)
|
||||
|
||||
```
|
||||
File Open:
|
||||
Disk → [Read header only] → RAM (small)
|
||||
|
||||
File Read:
|
||||
If header region:
|
||||
RAM (header) → Application
|
||||
If audio region:
|
||||
Disk → [Seek + Read chunk] → Application
|
||||
|
||||
Total I/O: ~64KB on open, on-demand reads
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Concurrency
|
||||
|
||||
### Current Model
|
||||
|
||||
```python
|
||||
server.multithreaded = 0 # Single-threaded
|
||||
```
|
||||
|
||||
**Implications:**
|
||||
- All FUSE operations serialized
|
||||
- One slow file open blocks everything
|
||||
- No benefit from multi-core CPUs
|
||||
|
||||
### Impact on Use Cases
|
||||
|
||||
| Use Case | Impact |
|
||||
|----------|--------|
|
||||
| Single player (VLC) | Acceptable - one file at a time |
|
||||
| Media server scan | Severe - sequential processing |
|
||||
| Multiple clients | Severe - requests queue up |
|
||||
| Concurrent reads | Moderate - reads are fast once open |
|
||||
|
||||
---
|
||||
|
||||
## 5. Benchmarks (Theoretical)
|
||||
|
||||
Based on code analysis, not actual measurements:
|
||||
|
||||
### File Open Time vs Size
|
||||
|
||||
```
|
||||
File Size Open Time (HDD) Open Time (SSD)
|
||||
────────────────────────────────────────────────
|
||||
10 MB 50-100 ms 20-50 ms
|
||||
30 MB 150-300 ms 50-100 ms
|
||||
50 MB 250-500 ms 100-200 ms
|
||||
100 MB 500-1000 ms 200-400 ms
|
||||
200 MB 1000-2000 ms 400-800 ms
|
||||
```
|
||||
|
||||
### Memory vs Concurrent Opens
|
||||
|
||||
```
|
||||
Open Files RAM Usage (30MB avg)
|
||||
─────────────────────────────────────
|
||||
1 ~32 MB
|
||||
5 ~160 MB
|
||||
10 ~320 MB
|
||||
25 ~800 MB
|
||||
50 ~1.6 GB
|
||||
100 ~3.2 GB
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Comparison with Alternatives
|
||||
|
||||
| Metric | beetfs | Direct File | NFS | FUSE passthrough |
|
||||
|--------|--------|-------------|-----|------------------|
|
||||
| Open latency | 200-500ms | <10ms | 10-50ms | <10ms |
|
||||
| Read latency | <1ms | <1ms | 1-10ms | <1ms |
|
||||
| Memory/file | ~1x size | ~0 | ~0 | ~0 |
|
||||
| Metadata source | Database | File | File | File |
|
||||
| Modify original | No | Yes | Yes | Yes |
|
||||
|
||||
---
|
||||
|
||||
## 7. Recommendations
|
||||
|
||||
### For Current Usage
|
||||
|
||||
1. **Limit concurrent opens** - Don't scan full library
|
||||
2. **Use SSDs** - Reduces open latency by 2-3x
|
||||
3. **Increase RAM** - Expect 1x file size per open
|
||||
4. **Avoid large files** - 24-bit/192kHz FLACs are problematic
|
||||
|
||||
### For Modernization
|
||||
|
||||
1. **Implement lazy loading** - Read audio on demand
|
||||
2. **Add file handle caching** - Keep headers, release audio
|
||||
3. **Enable multi-threading** - Parallelize opens
|
||||
4. **Add memory limits** - Evict old FileHandlers
|
||||
@@ -0,0 +1,403 @@
|
||||
# beetfs Benchmark Plan
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Benchmark suite to measure beetfs FUSE filesystem performance across mount time, metadata operations, file I/O, and memory usage. Focus on realistic music library workloads.
|
||||
|
||||
## Critical Performance Findings (Pre-Benchmark)
|
||||
|
||||
### Architecture Bottlenecks Identified
|
||||
|
||||
| Bottleneck | Location | Impact |
|
||||
|------------|----------|--------|
|
||||
| **Full file load into RAM** | `FileHandler.__init__` line 481 | 50-100MB per open FLAC |
|
||||
| **Mount-time bulk load** | `mount()` line 143 | O(N) for N library items |
|
||||
| **GIL serialization** | Python 2.7 | Single-core limit for metadata ops |
|
||||
| **Per-file DB lookup** | `getattr()`, `access()` | SQLite query per stat call |
|
||||
|
||||
### Expected Performance Characteristics
|
||||
|
||||
| Operation | Expected Performance | Bottleneck |
|
||||
|-----------|---------------------|------------|
|
||||
| Mount (10K items) | 5-30 seconds | `lib.items()` + FSNode construction |
|
||||
| readdir | Fast (in-memory dict) | None |
|
||||
| getattr (file) | Slow (~1ms) | DB lookup + real file stat |
|
||||
| open (first) | Very slow | Full file read into RAM |
|
||||
| read | Fast | Memory-to-memory copy |
|
||||
| Memory (10 open files) | 500MB-1GB | FileHandler caches entire files |
|
||||
|
||||
---
|
||||
|
||||
## Benchmark Tools
|
||||
|
||||
### Primary Tools
|
||||
|
||||
| Tool | Purpose | Install |
|
||||
|------|---------|---------|
|
||||
| **fio** | I/O throughput, IOPS, latency | `nix-shell -p fio` |
|
||||
| **mdtest** | Metadata operations (stat, readdir) | `nix-shell -p ior` |
|
||||
| **hyperfine** | Mount time, command timing | `nix-shell -p hyperfine` |
|
||||
| **time** | Basic timing | builtin |
|
||||
| **/usr/bin/time -v** | Memory usage (maxrss) | builtin |
|
||||
|
||||
### Measurement Scripts
|
||||
|
||||
All benchmarks use synthetic FLAC files (5-10MB) to avoid I/O variance from real storage.
|
||||
|
||||
---
|
||||
|
||||
## Benchmark Categories
|
||||
|
||||
### 1. Mount Time Scaling
|
||||
|
||||
**Goal**: Measure how mount time scales with library size.
|
||||
|
||||
**Method**:
|
||||
```bash
|
||||
# Create libraries with N items: 100, 1K, 10K, 50K, 100K
|
||||
hyperfine --warmup 1 --runs 5 \
|
||||
'beet mount /mnt/beetfs && sleep 1 && fusermount -u /mnt/beetfs'
|
||||
```
|
||||
|
||||
**Metrics**:
|
||||
- Time to mount (seconds)
|
||||
- Memory usage at mount completion (RSS)
|
||||
|
||||
**Expected scaling**: O(N) - linear with library size
|
||||
|
||||
**Test matrix**:
|
||||
| Library Size | Expected Mount Time | Expected Memory |
|
||||
|--------------|--------------------:|----------------:|
|
||||
| 100 items | <1s | ~50MB |
|
||||
| 1,000 items | 1-3s | ~60MB |
|
||||
| 10,000 items | 5-15s | ~100MB |
|
||||
| 50,000 items | 30-60s | ~300MB |
|
||||
| 100,000 items | 60-120s | ~500MB |
|
||||
|
||||
---
|
||||
|
||||
### 2. Metadata Operations (stat/readdir)
|
||||
|
||||
**Goal**: Measure getattr and readdir performance - critical for music players that scan libraries.
|
||||
|
||||
#### 2a. Single stat latency
|
||||
|
||||
```bash
|
||||
# Measure single stat call latency
|
||||
hyperfine --warmup 10 --runs 100 \
|
||||
'stat /mnt/beetfs/Artist/Album/01-Track.flac'
|
||||
```
|
||||
|
||||
**Target**: <5ms average, <20ms p99
|
||||
|
||||
#### 2b. Bulk stat (library scan simulation)
|
||||
|
||||
```bash
|
||||
# Stat all files in library
|
||||
hyperfine --warmup 1 --runs 5 \
|
||||
'find /mnt/beetfs -type f -exec stat {} + > /dev/null'
|
||||
```
|
||||
|
||||
**Metrics**:
|
||||
- Total time for N files
|
||||
- stat operations per second
|
||||
- p50, p95, p99 latency
|
||||
|
||||
**Target**: >500 stat/s (Python FUSE baseline)
|
||||
|
||||
#### 2c. Directory listing
|
||||
|
||||
```bash
|
||||
# List directory with N entries
|
||||
hyperfine --warmup 3 --runs 10 \
|
||||
'ls /mnt/beetfs/Artist/Album/'
|
||||
```
|
||||
|
||||
**Test matrix**:
|
||||
| Directory entries | Target time |
|
||||
|------------------:|------------:|
|
||||
| 10 | <50ms |
|
||||
| 100 | <100ms |
|
||||
| 1,000 | <500ms |
|
||||
|
||||
---
|
||||
|
||||
### 3. File Open Performance
|
||||
|
||||
**Goal**: Measure file open latency - the critical bottleneck due to full file load.
|
||||
|
||||
#### 3a. First open (cold)
|
||||
|
||||
```bash
|
||||
# Clear any caches, then open file
|
||||
echo 3 > /proc/sys/vm/drop_caches
|
||||
hyperfine --warmup 0 --runs 10 \
|
||||
'head -c 1 /mnt/beetfs/Artist/Album/01-Track.flac > /dev/null'
|
||||
```
|
||||
|
||||
**Test matrix**:
|
||||
| File size | Expected open time |
|
||||
|----------:|-------------------:|
|
||||
| 5MB | 50-200ms |
|
||||
| 20MB | 200-500ms |
|
||||
| 50MB | 500ms-1s |
|
||||
| 100MB | 1-2s |
|
||||
|
||||
#### 3b. Cached open (warm)
|
||||
|
||||
```bash
|
||||
# File already opened once
|
||||
hyperfine --warmup 5 --runs 50 \
|
||||
'head -c 1 /mnt/beetfs/Artist/Album/01-Track.flac > /dev/null'
|
||||
```
|
||||
|
||||
**Target**: <10ms (should hit FileHandler cache)
|
||||
|
||||
---
|
||||
|
||||
### 4. Read Throughput
|
||||
|
||||
**Goal**: Measure sequential and random read performance.
|
||||
|
||||
#### 4a. Sequential read
|
||||
|
||||
```bash
|
||||
fio --name=seq_read \
|
||||
--filename=/mnt/beetfs/Artist/Album/01-Track.flac \
|
||||
--rw=read --bs=1M --direct=0 \
|
||||
--ioengine=sync --numjobs=1 \
|
||||
--runtime=30 --time_based
|
||||
```
|
||||
|
||||
**Metrics**: MB/s throughput
|
||||
|
||||
**Target**: >100 MB/s (memory-backed after first read)
|
||||
|
||||
#### 4b. Random read (simulates seeking in audio player)
|
||||
|
||||
```bash
|
||||
fio --name=rand_read \
|
||||
--filename=/mnt/beetfs/Artist/Album/01-Track.flac \
|
||||
--rw=randread --bs=64k --direct=0 \
|
||||
--ioengine=sync --numjobs=1 \
|
||||
--runtime=30 --time_based
|
||||
```
|
||||
|
||||
**Metrics**: IOPS, latency histogram
|
||||
|
||||
---
|
||||
|
||||
### 5. Memory Usage
|
||||
|
||||
**Goal**: Measure memory consumption under load.
|
||||
|
||||
#### 5a. Idle memory (mounted, no activity)
|
||||
|
||||
```bash
|
||||
# Mount and measure RSS
|
||||
beet mount /mnt/beetfs &
|
||||
sleep 5
|
||||
ps -o rss= -p $(pgrep -f beetfs)
|
||||
```
|
||||
|
||||
#### 5b. Memory per open file
|
||||
|
||||
```bash
|
||||
# Open N files, measure memory growth
|
||||
for i in 1 5 10 20; do
|
||||
# Open $i files simultaneously
|
||||
cat /mnt/beetfs/Artist/Album/0{1..$i}*.flac > /dev/null &
|
||||
ps -o rss= -p $(pgrep -f beetfs)
|
||||
done
|
||||
```
|
||||
|
||||
**Expected**: ~file_size × open_files (FileHandler caches entire file)
|
||||
|
||||
#### 5c. Memory leak detection
|
||||
|
||||
```bash
|
||||
# Repeatedly open/close files, check for memory growth
|
||||
for i in {1..100}; do
|
||||
cat /mnt/beetfs/Artist/Album/01-Track.flac > /dev/null
|
||||
done
|
||||
# Compare RSS before and after
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 6. Concurrent Access
|
||||
|
||||
**Goal**: Measure performance under parallel access (multiple processes).
|
||||
|
||||
```bash
|
||||
# Parallel stat operations
|
||||
hyperfine --warmup 1 --runs 5 \
|
||||
'seq 1 100 | xargs -P 4 -I {} stat /mnt/beetfs/Artist/Album/0{}-Track.flac'
|
||||
```
|
||||
|
||||
**Metrics**:
|
||||
- Throughput scaling with parallelism (1, 2, 4, 8 workers)
|
||||
- Latency degradation
|
||||
|
||||
**Expected**: Limited scaling due to Python GIL
|
||||
|
||||
---
|
||||
|
||||
### 7. Realistic Workloads
|
||||
|
||||
#### 7a. Music player library scan
|
||||
|
||||
Simulates: Rhythmbox/Clementine scanning library at startup
|
||||
|
||||
```bash
|
||||
# Recursive stat + readdir
|
||||
time find /mnt/beetfs -type f -name "*.flac" -exec stat {} + | wc -l
|
||||
```
|
||||
|
||||
#### 7b. Album playback
|
||||
|
||||
Simulates: Playing 12-track album sequentially
|
||||
|
||||
```bash
|
||||
# Open each file, read 1MB (simulate buffering), close
|
||||
for f in /mnt/beetfs/Artist/Album/*.flac; do
|
||||
dd if="$f" of=/dev/null bs=1M count=1 2>/dev/null
|
||||
done
|
||||
```
|
||||
|
||||
#### 7c. Metadata edit
|
||||
|
||||
Simulates: Editing tags in Picard/Kid3
|
||||
|
||||
```bash
|
||||
# Open file, write to header region, close
|
||||
# (Requires write support to be functional)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Baseline Comparisons
|
||||
|
||||
### Reference Filesystems
|
||||
|
||||
| Filesystem | Purpose |
|
||||
|------------|---------|
|
||||
| **ext4 (local)** | Best-case baseline |
|
||||
| **fuse-passthrough** | FUSE overhead baseline |
|
||||
| **sshfs** | Network FUSE comparison |
|
||||
|
||||
### Comparison Method
|
||||
|
||||
Run identical benchmarks on:
|
||||
1. Real music files on ext4
|
||||
2. Same files via FUSE passthrough
|
||||
3. Same files via beetfs
|
||||
|
||||
Calculate overhead: `(beetfs_time - ext4_time) / ext4_time × 100%`
|
||||
|
||||
---
|
||||
|
||||
## Test Environment
|
||||
|
||||
### Hardware Requirements
|
||||
|
||||
- CPU: 4+ cores (to test GIL impact)
|
||||
- RAM: 8+ GB (for large library tests)
|
||||
- Storage: SSD recommended (reduces I/O variance)
|
||||
|
||||
### Software Requirements
|
||||
|
||||
```nix
|
||||
# Add to flake.nix devShell
|
||||
buildInputs = [
|
||||
fio
|
||||
hyperfine
|
||||
# ior # includes mdtest
|
||||
];
|
||||
```
|
||||
|
||||
### Cache Control
|
||||
|
||||
```bash
|
||||
# Clear all caches before cold benchmarks
|
||||
sync
|
||||
echo 3 > /proc/sys/vm/drop_caches
|
||||
|
||||
# Disable kernel FUSE caching for accurate measurements
|
||||
mount -o entry_timeout=0,attr_timeout=0,negative_timeout=0
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
### Minimum Viable Performance
|
||||
|
||||
| Metric | Minimum | Target | Excellent |
|
||||
|--------|--------:|-------:|----------:|
|
||||
| Mount time (10K items) | <60s | <15s | <5s |
|
||||
| stat latency (avg) | <20ms | <5ms | <1ms |
|
||||
| stat throughput | >100/s | >500/s | >2000/s |
|
||||
| File open (50MB, cold) | <5s | <1s | <200ms |
|
||||
| Read throughput | >50 MB/s | >200 MB/s | >500 MB/s |
|
||||
| Memory (idle, 10K items) | <500MB | <100MB | <50MB |
|
||||
| Memory per open file | <2× file size | <1.5× | <1.1× |
|
||||
|
||||
### Regression Detection
|
||||
|
||||
Any benchmark result >20% worse than baseline triggers investigation.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Notes
|
||||
|
||||
### Test Data Generation
|
||||
|
||||
Use existing test infrastructure from `tests/conftest.py`:
|
||||
- `create_synthetic_flac()` - generates valid FLAC files
|
||||
- `BeetFSTestCase` - creates isolated beets library
|
||||
|
||||
### Benchmark Script Structure
|
||||
|
||||
```
|
||||
beetfs/
|
||||
├── benchmarks/
|
||||
│ ├── run_all.sh # Master script
|
||||
│ ├── bench_mount.sh # Mount time tests
|
||||
│ ├── bench_metadata.sh # stat/readdir tests
|
||||
│ ├── bench_io.sh # Read/write throughput
|
||||
│ ├── bench_memory.sh # Memory profiling
|
||||
│ └── results/ # Output directory
|
||||
│ ├── mount_scaling.csv
|
||||
│ ├── stat_latency.csv
|
||||
│ └── ...
|
||||
```
|
||||
|
||||
### Output Format
|
||||
|
||||
```csv
|
||||
# Example: mount_scaling.csv
|
||||
library_size,mount_time_ms,memory_rss_kb,timestamp
|
||||
100,450,52000,2024-01-15T10:30:00
|
||||
1000,2100,61000,2024-01-15T10:31:00
|
||||
10000,12500,98000,2024-01-15T10:33:00
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Known Limitations
|
||||
|
||||
1. **Python 2.7 GIL**: Cannot achieve true parallelism - expect flat scaling beyond 1 core
|
||||
2. **FileHandler memory**: Each open file = full file in RAM - will OOM with many large files
|
||||
3. **No lazy loading**: All library items loaded at mount - slow for large libraries
|
||||
4. **SQLite single-writer**: Concurrent writes will serialize
|
||||
|
||||
## Optimization Opportunities (Post-Benchmark)
|
||||
|
||||
Based on benchmark results, consider:
|
||||
|
||||
1. **Lazy FSNode construction** - Build tree on first access, not mount
|
||||
2. **Memory-mapped file access** - mmap instead of full read
|
||||
3. **LRU cache for FileHandler** - Evict old files instead of holding all
|
||||
4. **Metadata caching** - Cache getattr results, invalidate on DB change
|
||||
5. **Batch DB queries** - Prefetch metadata for directory listings
|
||||
@@ -0,0 +1,101 @@
|
||||
# beetfs Benchmark Results
|
||||
|
||||
**Date**: 2026-05-12
|
||||
**Status**: ❌ ALL BENCHMARKS BLOCKED BY BUGS
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Benchmarks cannot complete due to critical bugs in beetfs. The implementation is non-functional for any library with content.
|
||||
|
||||
## Results
|
||||
|
||||
| Benchmark | Status | Mean | Error |
|
||||
|-----------|--------|------|-------|
|
||||
| mount_time | ❌ FAIL | N/A | Directory tree building bug |
|
||||
| readdir | ❌ FAIL | N/A | Directory tree building bug |
|
||||
| stat_latency | ❌ FAIL | N/A | Directory tree building bug |
|
||||
| enoent_lookup | ❌ FAIL | N/A | Directory tree building bug |
|
||||
| file_open | ❌ FAIL | N/A | Directory tree building bug |
|
||||
| read_throughput | ❌ FAIL | N/A | Directory tree building bug |
|
||||
| memory_usage | ❌ FAIL | N/A | Directory tree building bug |
|
||||
|
||||
## Blocking Bugs
|
||||
|
||||
### Bug #1: Nested Methods (Lines 758-1144)
|
||||
|
||||
All FUSE operations (`readdir`, `open`, `read`, `write`, etc.) are indented inside the `access()` method, making them local functions instead of class methods.
|
||||
|
||||
**Impact**: Even if mount succeeds, all file operations return `ENOSYS (Function not implemented)`.
|
||||
|
||||
**Fix Required**: Dedent lines 758-1144 by 8 spaces.
|
||||
|
||||
### Bug #2: Directory Tree Building (Lines 403-414)
|
||||
|
||||
`FSNode.adddir()` calls `getnode()` which assumes parent directories already exist. When building the tree for a new library, parent directories haven't been created yet.
|
||||
|
||||
**Error**:
|
||||
```
|
||||
KeyError: u'Bench Artist'
|
||||
File "beetFs.py", line 403, in getnode
|
||||
return self.getnode(elements, root=root.dirs[topdir])
|
||||
```
|
||||
|
||||
**Impact**: Mount crashes when library contains any tracks.
|
||||
|
||||
**Fix Required**: `adddir()` must create parent directories recursively before adding child.
|
||||
|
||||
### Bug #3: Empty Library Only
|
||||
|
||||
The only working configuration is mounting with an empty beets library:
|
||||
- `test_mount_empty_library`: ✅ PASS
|
||||
- Any library with tracks: ❌ CRASH
|
||||
|
||||
## Test Environment
|
||||
|
||||
- **Python**: 2.7.15
|
||||
- **OS**: Linux (NixOS)
|
||||
- **Test data**: 10 synthetic FLAC files (5 MB each)
|
||||
- **Beets**: 1.4.9
|
||||
|
||||
## Benchmark Configuration
|
||||
|
||||
```python
|
||||
num_tracks = 10
|
||||
track_size_mb = 5
|
||||
mount_runs = 3
|
||||
stat_runs = 20
|
||||
readdir_runs = 10
|
||||
```
|
||||
|
||||
## Raw Results
|
||||
|
||||
See `benchmarks/results/benchmark_results.json` for full JSON output.
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. **Fix Bug #2** (directory tree building) - allows mount with content
|
||||
2. **Fix Bug #1** (nested methods) - allows FUSE operations to work
|
||||
3. **Re-run benchmarks** - get actual performance numbers
|
||||
|
||||
## Conclusion
|
||||
|
||||
**beetfs is currently non-functional** for real-world use. Both bugs must be fixed before performance can be measured. The test infrastructure and benchmark suite are ready; only the implementation needs repair.
|
||||
|
||||
---
|
||||
|
||||
## Appendix: E2E Test Results (For Reference)
|
||||
|
||||
From the e2e test suite (74 tests):
|
||||
|
||||
| Category | Passed | Failed | Errors |
|
||||
|----------|--------|--------|--------|
|
||||
| Smoke tests | 4 | 3 | 0 |
|
||||
| Nested bug detection | 3 (confirmed bug) | 10 | 0 |
|
||||
| Readdir | 0 | 10 | 0 |
|
||||
| Stat | 0 | 8 | 0 |
|
||||
| Read | 0 | 11 | 0 |
|
||||
| Write | 0 | 7 | 0 |
|
||||
| Error handling | 0 | 7 | 3 |
|
||||
| **Total** | **12** | **56** | **3** |
|
||||
|
||||
The 12 passing tests are infrastructure tests and tests that verify the bugs exist.
|
||||
@@ -0,0 +1,550 @@
|
||||
# beetfs Components Deep Dive
|
||||
|
||||
## Component Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────┐
|
||||
│ beetFs.py │
|
||||
│ ┌─────────────────────────────────────────────────────────────────────┐│
|
||||
│ │ PLUGIN LAYER ││
|
||||
│ │ beetFs (BeetsPlugin) beetFs_command (Subcommand) ││
|
||||
│ │ mount() template_mapping() ││
|
||||
│ └─────────────────────────────────────────────────────────────────────┘│
|
||||
│ ┌─────────────────────────────────────────────────────────────────────┐│
|
||||
│ │ VIRTUAL FILESYSTEM ││
|
||||
│ │ FSNode beetFileSystem (fuse.Fuse) ││
|
||||
│ │ Stat ││
|
||||
│ └─────────────────────────────────────────────────────────────────────┘│
|
||||
│ ┌─────────────────────────────────────────────────────────────────────┐│
|
||||
│ │ METADATA INTERPOLATION ││
|
||||
│ │ FileHandler InterpolatedFLAC ││
|
||||
│ │ InterpolatedID3 ││
|
||||
│ └─────────────────────────────────────────────────────────────────────┘│
|
||||
└─────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 1. Plugin Layer
|
||||
|
||||
### 1.1 beetFs (BeetsPlugin)
|
||||
|
||||
**Location**: Lines 188-191
|
||||
|
||||
```python
|
||||
class beetFs(BeetsPlugin):
|
||||
""" The beets plugin hook."""
|
||||
def commands(self):
|
||||
return [beetFs_command]
|
||||
```
|
||||
|
||||
**Purpose**: Registers beetfs as a beets plugin, exposing the `mount` subcommand.
|
||||
|
||||
### 1.2 beetFs_command
|
||||
|
||||
**Location**: Lines 47, 185
|
||||
|
||||
```python
|
||||
beetFs_command = Subcommand('mount', help='Mount a beets filesystem')
|
||||
beetFs_command.func = mount
|
||||
```
|
||||
|
||||
**Purpose**: CLI subcommand definition for `beet mount`.
|
||||
|
||||
### 1.3 mount() Function
|
||||
|
||||
**Location**: Lines 119-183
|
||||
|
||||
```python
|
||||
def mount(lib, config, opts, args):
|
||||
# 1. Validate arguments
|
||||
if not args:
|
||||
raise beets.ui.UserError('no mountpoint specified')
|
||||
|
||||
# 2. Parse path template
|
||||
global structure_split
|
||||
structure_split = PATH_FORMAT.split("/")
|
||||
global structure_depth
|
||||
structure_depth = len(structure_split)
|
||||
|
||||
# 3. Store library reference
|
||||
global library
|
||||
library = lib
|
||||
|
||||
# 4. Build virtual directory tree
|
||||
global directory_structure
|
||||
directory_structure = FSNode({}, {})
|
||||
|
||||
# 5. Iterate all library items
|
||||
for item in lib.items():
|
||||
mapping = template_mapping(lib, item)
|
||||
# ... build tree ...
|
||||
directory_structure.addfile(sub_elements, filename, item.id)
|
||||
|
||||
# 6. Create and run FUSE server
|
||||
server = beetFileSystem(...)
|
||||
server.main()
|
||||
```
|
||||
|
||||
**Key Variables Set**:
|
||||
| Variable | Type | Purpose |
|
||||
|----------|------|---------|
|
||||
| `structure_split` | `List[str]` | Path template components |
|
||||
| `structure_depth` | `int` | Number of path levels |
|
||||
| `library` | `Library` | Beets library reference |
|
||||
| `directory_structure` | `FSNode` | Root of virtual tree |
|
||||
|
||||
### 1.4 template_mapping() Function
|
||||
|
||||
**Location**: Lines 82-116
|
||||
|
||||
```python
|
||||
def template_mapping(lib, item):
|
||||
"""Builds a template substitution map from beets item."""
|
||||
mapping = {}
|
||||
for key in METADATA_KEYS:
|
||||
value = getattr(item, key)
|
||||
# Sanitize value for filesystem paths
|
||||
if isinstance(value, basestring):
|
||||
value = re.sub(r'[\\/:]|^\.', '_', value)
|
||||
elif key in ('track', 'tracktotal', 'disc', 'disctotal'):
|
||||
value = '%02i' % value # Zero-pad numbers
|
||||
mapping[key] = value
|
||||
|
||||
# Add format info
|
||||
format_ = os.path.splitext(item.path)[1][1:]
|
||||
mapping['format'] = format_
|
||||
mapping['format_upper'] = format_.upper()
|
||||
|
||||
# Default values for missing fields
|
||||
if mapping['artist'] == '':
|
||||
mapping['artist'] = 'Unknown Artist'
|
||||
# ... etc
|
||||
|
||||
return mapping
|
||||
```
|
||||
|
||||
**Template Variables Available**:
|
||||
| Variable | Source | Example |
|
||||
|----------|--------|---------|
|
||||
| `$artist` | `item.artist` | "Pink Floyd" |
|
||||
| `$album` | `item.album` | "The Wall" |
|
||||
| `$title` | `item.title` | "Comfortably Numb" |
|
||||
| `$year` | `item.year` | "1979" |
|
||||
| `$track` | `item.track` | "06" |
|
||||
| `$format` | file extension | "flac" |
|
||||
| `$format_upper` | file extension | "FLAC" |
|
||||
|
||||
---
|
||||
|
||||
## 2. Virtual Filesystem Layer
|
||||
|
||||
### 2.1 FSNode Class
|
||||
|
||||
**Location**: Lines 390-436
|
||||
|
||||
```python
|
||||
class FSNode(object):
|
||||
"""A directory node in the virtual filesystem tree."""
|
||||
|
||||
def __init__(self, dirs, files):
|
||||
self.dirs = dirs # Dict[str, FSNode] - subdirectories
|
||||
self.files = files # Dict[str, int] - filename → beets item ID
|
||||
```
|
||||
|
||||
**Methods**:
|
||||
|
||||
| Method | Purpose | Signature |
|
||||
|--------|---------|-----------|
|
||||
| `getnode()` | Navigate to nested node | `getnode(elements, root=None) → FSNode` |
|
||||
| `adddir()` | Add a directory | `adddir(elements, directory, root=None)` |
|
||||
| `addfile()` | Add a file entry | `addfile(elements, filename, id, root=None)` |
|
||||
| `listdir()` | List contents | `listdir(elements, directories, root=None) → List[str]` |
|
||||
|
||||
**Example Tree Navigation**:
|
||||
```python
|
||||
# Path: /Artist/Album/track.flac
|
||||
# structure_split = ["$artist", "$album ($year) [$format_upper]", "$track - $artist - $title.$format"]
|
||||
|
||||
elements = ["Artist", "Album (2020) [FLAC]"]
|
||||
node = directory_structure.getnode(elements)
|
||||
# node.files = {"01 - Artist - Track.flac": 42, ...}
|
||||
|
||||
item_id = node.files["01 - Artist - Track.flac"]
|
||||
# item_id = 42
|
||||
```
|
||||
|
||||
### 2.2 Stat Class
|
||||
|
||||
**Location**: Lines 568-619
|
||||
|
||||
```python
|
||||
class Stat(fuse.Stat):
|
||||
DIRSIZE = 4096
|
||||
|
||||
def __init__(self, st_mode, st_size, st_nlink=1, st_uid=None, st_gid=None,
|
||||
dt_atime=None, dt_mtime=None, dt_ctime=None):
|
||||
self.st_mode = st_mode
|
||||
self.st_ino = 0
|
||||
self.st_dev = 0
|
||||
self.st_nlink = st_nlink
|
||||
self.st_uid = st_uid or os.getuid()
|
||||
self.st_gid = st_gid or os.getgid()
|
||||
self.st_size = st_size
|
||||
# ... timestamps ...
|
||||
```
|
||||
|
||||
**Purpose**: Represents file/directory metadata for FUSE stat operations.
|
||||
|
||||
### 2.3 beetFileSystem Class
|
||||
|
||||
**Location**: Lines 622-1144
|
||||
|
||||
```python
|
||||
class beetFileSystem(fuse.Fuse):
|
||||
"""Main FUSE filesystem implementation."""
|
||||
|
||||
def __init__(self, *args, **kwargs):
|
||||
logging.basicConfig(filename="LOG", level=logging.INFO)
|
||||
super(beetFileSystem, self).__init__(*args, **kwargs)
|
||||
|
||||
def fsinit(self):
|
||||
"""Called after filesystem is mounted."""
|
||||
self.lib = library
|
||||
self.files = {} # Dict[path, FileHandler]
|
||||
```
|
||||
|
||||
**FUSE Operations Implemented**:
|
||||
|
||||
| Operation | Lines | Purpose |
|
||||
|-----------|-------|---------|
|
||||
| `fsinit()` | 630-636 | Post-mount initialization |
|
||||
| `fsdestroy()` | 638-639 | Pre-unmount cleanup |
|
||||
| `statfs()` | 641-646 | Filesystem statistics |
|
||||
| `getattr()` | 648-707 | Get file/dir attributes |
|
||||
| `access()` | 723-756 | Check permissions |
|
||||
| `readdir()` | 931-975 | List directory contents |
|
||||
| `open()` | 988-1021 | Open file |
|
||||
| `read()` | 1077-1106 | Read file data |
|
||||
| `write()` | 1108-1135 | Write file data |
|
||||
| `release()` | 1049-1059 | Close file |
|
||||
|
||||
**Not Implemented (return EOPNOTSUPP)**:
|
||||
- `mknod()`, `mkdir()`, `unlink()`, `rmdir()`
|
||||
- `symlink()`, `link()`, `rename()`
|
||||
- `chmod()`, `chown()`, `truncate()`
|
||||
|
||||
---
|
||||
|
||||
## 3. Metadata Interpolation Layer
|
||||
|
||||
### 3.1 FileHandler Class
|
||||
|
||||
**Location**: Lines 439-565
|
||||
|
||||
This is the **core component** that implements metadata overlay.
|
||||
|
||||
```python
|
||||
class FileHandler(object):
|
||||
def __init__(self, path, lib):
|
||||
self.path = path # Virtual path
|
||||
self.lib = lib # Beets library
|
||||
|
||||
# Resolve virtual path to real file
|
||||
pathsplit = path[1:].split('/')
|
||||
self.item = self.lib.get_item(id=directory_structure
|
||||
.getnode(pathsplit[0:structure_depth-1])
|
||||
.files[pathsplit[structure_depth-1]])
|
||||
self.real_path = self.item.path
|
||||
|
||||
# Open real file
|
||||
self.file_object = open(self.real_path, 'r+')
|
||||
self.instance_count = 1
|
||||
|
||||
# Determine format
|
||||
self.format = os.path.splitext(path)[1][1:].lower()
|
||||
|
||||
if self.format == "flac":
|
||||
# Load file into interpolated FLAC object
|
||||
self.inf = InterpolatedFLAC(self.file_object.read())
|
||||
|
||||
# INJECT DATABASE METADATA
|
||||
self.inf["title"] = self.item.title
|
||||
self.inf["album"] = self.item.album
|
||||
self.inf["artist"] = self.item.artist
|
||||
self.inf["genre"] = self.item.genre
|
||||
|
||||
# Generate new header with DB metadata
|
||||
self.header = self.inf.get_header(self.real_path)
|
||||
self.bound = len(self.header)
|
||||
self.music_offset = self.inf.offset()
|
||||
|
||||
elif self.format == "mp3":
|
||||
self.bound = 0 # MP3 interpolation disabled
|
||||
self.music_offset = 0
|
||||
|
||||
# Cache audio data
|
||||
self.file_object.seek(self.music_offset)
|
||||
self.music_data = self.file_object.read()
|
||||
self.file_object.close()
|
||||
```
|
||||
|
||||
**Key Attributes**:
|
||||
|
||||
| Attribute | Type | Purpose |
|
||||
|-----------|------|---------|
|
||||
| `path` | `str` | Virtual path (e.g., `/Artist/Album/track.flac`) |
|
||||
| `real_path` | `str` | Actual file path on disk |
|
||||
| `item` | `Item` | Beets library item (has DB metadata) |
|
||||
| `format` | `str` | File format ("flac", "mp3") |
|
||||
| `inf` | `InterpolatedFLAC` | Mutagen object with injected metadata |
|
||||
| `header` | `bytes` | Generated header with DB tags |
|
||||
| `bound` | `int` | Byte offset where header ends |
|
||||
| `music_offset` | `int` | Byte offset in original file where audio starts |
|
||||
| `music_data` | `bytes` | Cached audio data |
|
||||
| `instance_count` | `int` | Reference count for file handles |
|
||||
|
||||
### 3.2 FileHandler.read() Method
|
||||
|
||||
**Location**: Lines 497-517
|
||||
|
||||
```python
|
||||
def read(self, size, offset):
|
||||
# Case 1: Reading within header boundary
|
||||
if offset < self.bound:
|
||||
if offset + size < len(self.header):
|
||||
# Entire read is within header
|
||||
return self.header[offset:offset+size]
|
||||
else:
|
||||
# Read spans header and audio
|
||||
ret = self.header[offset:len(self.header)]
|
||||
ret = ret + self.music_data[0:size - (len(self.header) - offset)]
|
||||
return ret
|
||||
|
||||
# Case 2: Reading audio data only
|
||||
return self.music_data[offset - len(self.header):offset - len(self.header) + size]
|
||||
```
|
||||
|
||||
**Read Logic Diagram**:
|
||||
|
||||
```
|
||||
Virtual File Layout:
|
||||
┌────────────────────────────────────────────────────────────────┐
|
||||
│ 0 bound EOF │
|
||||
│ ├─────────┼────────────────────────────────────────────────┤ │
|
||||
│ │ HEADER │ AUDIO DATA │ │
|
||||
│ │ (from │ (from self.music_data) │ │
|
||||
│ │ self. │ │ │
|
||||
│ │ header) │ │ │
|
||||
│ └─────────┴────────────────────────────────────────────────┘ │
|
||||
└────────────────────────────────────────────────────────────────┘
|
||||
|
||||
Read scenarios:
|
||||
1. offset=0, size=100, bound=500 → Return header[0:100]
|
||||
2. offset=400, size=200, bound=500 → Return header[400:500] + music[0:100]
|
||||
3. offset=600, size=100, bound=500 → Return music[100:200]
|
||||
```
|
||||
|
||||
### 3.3 FileHandler.write() Method
|
||||
|
||||
**Location**: Lines 519-565
|
||||
|
||||
```python
|
||||
def write(self, offset, buf):
|
||||
# Only handle writes to header area
|
||||
if offset < self.bound:
|
||||
# Reconstruct full file in memory
|
||||
filedata = self.header + self.music_data
|
||||
|
||||
# Patch in new data
|
||||
filedata = filedata[0:offset] + buf + filedata[offset + len(buf):]
|
||||
|
||||
if self.format == "flac":
|
||||
# Parse the patched data
|
||||
self.inf = InterpolatedFLAC(filedata)
|
||||
|
||||
# EXTRACT new tag values and save to DB
|
||||
self.item.title = str(self.inf["title"][0]).encode('utf-8')
|
||||
self.item.album = str(self.inf["album"][0]).encode('utf-8')
|
||||
self.item.artist = str(self.inf["artist"][0]).encode('utf-8')
|
||||
self.item.genre = str(self.inf["genre"][0]).encode('utf-8')
|
||||
|
||||
# Persist to beets database
|
||||
self.lib.store(self.item)
|
||||
self.lib.save()
|
||||
|
||||
# Regenerate header with updated values
|
||||
self.inf["title"] = self.item.title
|
||||
self.inf["album"] = self.item.album
|
||||
self.inf["artist"] = self.item.artist
|
||||
self.inf["genre"] = self.item.genre
|
||||
|
||||
self.header = self.inf.get_header(self.real_path)
|
||||
self.bound = len(self.header)
|
||||
|
||||
return len(buf)
|
||||
```
|
||||
|
||||
**Write Flow**:
|
||||
```
|
||||
1. App writes new tag data to header region
|
||||
│
|
||||
▼
|
||||
2. Patch header + music_data with new bytes
|
||||
│
|
||||
▼
|
||||
3. Parse patched data as FLAC
|
||||
│
|
||||
▼
|
||||
4. Extract tag values from parsed FLAC
|
||||
│
|
||||
▼
|
||||
5. Update beets Item with new values
|
||||
│
|
||||
▼
|
||||
6. lib.store(item) + lib.save() → SQLite
|
||||
│
|
||||
▼
|
||||
7. Regenerate header for subsequent reads
|
||||
```
|
||||
|
||||
### 3.4 InterpolatedFLAC Class
|
||||
|
||||
**Location**: Lines 274-388
|
||||
|
||||
```python
|
||||
class InterpolatedFLAC(FLAC):
|
||||
"""Custom FLAC handler that can load from bytes and generate headers."""
|
||||
|
||||
def load(self, filedata):
|
||||
"""Load FLAC from byte string instead of file."""
|
||||
self.metadata_blocks = []
|
||||
self.tags = None
|
||||
self.filedata = filedata
|
||||
self.fileobj = BytesIO(filedata)
|
||||
self.__check_header(self.fileobj)
|
||||
|
||||
while self.__read_metadata_block(self.fileobj):
|
||||
pass
|
||||
|
||||
# Verify audio frame starts correctly
|
||||
if self.fileobj.read(2) not in ["\xff\xf8", "\xff\xf9"]:
|
||||
raise FLACNoHeaderError("End of metadata did not start audio")
|
||||
|
||||
def get_header(self, filename=None):
|
||||
"""Generate FLAC header with current metadata."""
|
||||
# Add padding block
|
||||
self.metadata_blocks.append(Padding('\x00' * 1020))
|
||||
MetadataBlock.group_padding(self.metadata_blocks)
|
||||
|
||||
# Calculate available space
|
||||
header = self.__check_header(self.fileobj)
|
||||
available = self.__find_audio_offset(self.fileobj) - header
|
||||
data = MetadataBlock.writeblocks(self.metadata_blocks)
|
||||
|
||||
# Adjust padding to match available space
|
||||
if len(data) > available:
|
||||
# Reduce padding
|
||||
padding = self.metadata_blocks[-1]
|
||||
padding.length -= (len(data) - available)
|
||||
data = MetadataBlock.writeblocks(self.metadata_blocks)
|
||||
elif len(data) < available:
|
||||
# Increase padding
|
||||
self.metadata_blocks[-1].length += (available - len(data))
|
||||
data = MetadataBlock.writeblocks(self.metadata_blocks)
|
||||
|
||||
self.__offset = len("fLaC" + data)
|
||||
return "fLaC" + data
|
||||
|
||||
def offset(self):
|
||||
"""Return byte offset where audio data starts."""
|
||||
return self.__offset
|
||||
```
|
||||
|
||||
**FLAC Structure**:
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────────┐
|
||||
│ "fLaC" │ STREAMINFO │ VORBIS_COMMENT │ ... │ PADDING │ AUDIO... │
|
||||
│ (4B) │ block │ block │ │ block │ │
|
||||
└──────────────────────────────────────────────────────────────────┘
|
||||
│◄──────── metadata_blocks ─────────►│
|
||||
│ │
|
||||
└──── get_header() returns this ─────┘
|
||||
```
|
||||
|
||||
### 3.5 InterpolatedID3 Class
|
||||
|
||||
**Location**: Lines 200-271
|
||||
|
||||
```python
|
||||
class InterpolatedID3(ID3):
|
||||
"""Custom ID3 handler for MP3 files."""
|
||||
|
||||
def save(self, filename=None, v1=0):
|
||||
"""Save ID3 tags to file."""
|
||||
# Sort frames by importance
|
||||
order = ["TIT2", "TPE1", "TRCK", "TALB", "TPOS", "TDRC", "TCON"]
|
||||
# ... write header ...
|
||||
```
|
||||
|
||||
**Note**: MP3 support is **incomplete** in the current implementation. The `FileHandler.__init__` sets `self.bound = 0` for MP3, effectively disabling interpolation.
|
||||
|
||||
---
|
||||
|
||||
## 4. Supported Metadata Fields
|
||||
|
||||
**Location**: Lines 55-77
|
||||
|
||||
```python
|
||||
METADATA_RW_FIELDS = [
|
||||
('title', 'text'),
|
||||
('artist', 'text'),
|
||||
('album', 'text'),
|
||||
('genre', 'text'),
|
||||
('composer', 'text'),
|
||||
('grouping', 'text'),
|
||||
('year', 'int'),
|
||||
('month', 'int'),
|
||||
('day', 'int'),
|
||||
('track', 'int'),
|
||||
('tracktotal', 'int'),
|
||||
('disc', 'int'),
|
||||
('disctotal', 'int'),
|
||||
('lyrics', 'text'),
|
||||
('comments', 'text'),
|
||||
('bpm', 'int'),
|
||||
('comp', 'bool'),
|
||||
]
|
||||
```
|
||||
|
||||
**Actually Implemented** (in FileHandler):
|
||||
| Field | Read | Write |
|
||||
|-------|------|-------|
|
||||
| `title` | ✅ | ✅ |
|
||||
| `artist` | ✅ | ✅ |
|
||||
| `album` | ✅ | ✅ |
|
||||
| `genre` | ✅ | ✅ |
|
||||
| Others | ❌ | ❌ |
|
||||
|
||||
---
|
||||
|
||||
## 5. Error Handling
|
||||
|
||||
**Error Codes Used**:
|
||||
|
||||
| Code | Constant | Usage |
|
||||
|------|----------|-------|
|
||||
| 2 | `ENOENT` | File/directory not found |
|
||||
| 13 | `EACCES` | Permission denied |
|
||||
| 1 | `EPERM` | Operation not permitted |
|
||||
| 95 | `EOPNOTSUPP` | Operation not supported |
|
||||
|
||||
**Exception Handling Pattern**:
|
||||
```python
|
||||
def getattr(self, path):
|
||||
try:
|
||||
# ... logic ...
|
||||
except Exception as e:
|
||||
logging.error(e)
|
||||
return -errno.ENOENT
|
||||
```
|
||||
@@ -0,0 +1,412 @@
|
||||
# beetfs Data Flow
|
||||
|
||||
## Overview
|
||||
|
||||
This document details the complete data flow for read and write operations in beetfs.
|
||||
|
||||
---
|
||||
|
||||
## 1. Initialization Flow
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ beet mount /mountpoint │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ mount(lib, config, opts, args) │
|
||||
│ │
|
||||
│ 1. Parse PATH_FORMAT into structure_split │
|
||||
│ PATH_FORMAT = "$artist/$album ($year) [$format_upper]/..." │
|
||||
│ structure_split = ["$artist", "$album ($year) [$format_upper]", ...] │
|
||||
│ structure_depth = 3 │
|
||||
│ │
|
||||
│ 2. Store global library reference │
|
||||
│ library = lib │
|
||||
│ │
|
||||
│ 3. Create empty virtual directory tree │
|
||||
│ directory_structure = FSNode({}, {}) │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ for item in lib.items(): │
|
||||
│ │
|
||||
│ For each item in beets library: │
|
||||
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ 1. Build template mapping │ │
|
||||
│ │ mapping = { │ │
|
||||
│ │ 'artist': 'Pink Floyd', │ │
|
||||
│ │ 'album': 'The Wall', │ │
|
||||
│ │ 'year': '1979', │ │
|
||||
│ │ 'format_upper': 'FLAC', │ │
|
||||
│ │ 'track': '01', │ │
|
||||
│ │ 'title': 'In The Flesh?', │ │
|
||||
│ │ } │ │
|
||||
│ │ │ │
|
||||
│ │ 2. Substitute template for each level │ │
|
||||
│ │ level_subbed[0] = "Pink Floyd" │ │
|
||||
│ │ level_subbed[1] = "The Wall (1979) [FLAC]" │ │
|
||||
│ │ level_subbed[2] = "01 - Pink Floyd - In The Flesh?.flac" │ │
|
||||
│ │ │ │
|
||||
│ │ 3. Add directories to tree │ │
|
||||
│ │ directory_structure.adddir([], "Pink Floyd") │ │
|
||||
│ │ directory_structure.adddir(["Pink Floyd"], "The Wall (1979)...") │ │
|
||||
│ │ │ │
|
||||
│ │ 4. Add file entry (filename → item.id) │ │
|
||||
│ │ directory_structure.addfile( │ │
|
||||
│ │ ["Pink Floyd", "The Wall (1979) [FLAC]"], │ │
|
||||
│ │ "01 - Pink Floyd - In The Flesh?.flac", │ │
|
||||
│ │ item.id # e.g., 42 │ │
|
||||
│ │ ) │ │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ beetFileSystem FUSE Server │
|
||||
│ │
|
||||
│ server = beetFileSystem(...) │
|
||||
│ server.multithreaded = 0 │
|
||||
│ server.main() ← Enters FUSE event loop │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. File Open Flow
|
||||
|
||||
```
|
||||
Application: open("/mount/Pink Floyd/The Wall (1979) [FLAC]/01 - Pink Floyd - In The Flesh?.flac")
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ beetFileSystem.open(path, flags) │
|
||||
│ Lines 988-1021 │
|
||||
│ │
|
||||
│ path = "/Pink Floyd/The Wall (1979) [FLAC]/01 - Pink Floyd - In The..." │
|
||||
│ flags = os.O_RDONLY (or O_RDWR) │
|
||||
│ │
|
||||
│ if path in self.files: │
|
||||
│ # File already open - increment reference count │
|
||||
│ self.files[path].open() │
|
||||
│ return self.files[path] │
|
||||
│ else: │
|
||||
│ # Create new FileHandler │
|
||||
│ self.files[path] = FileHandler(path, self.lib) │
|
||||
│ return self.files[path] │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ FileHandler.__init__(path, lib) │
|
||||
│ Lines 440-483 │
|
||||
│ │
|
||||
│ Step 1: Resolve virtual path to beets item │
|
||||
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ pathsplit = ["Pink Floyd", "The Wall (1979) [FLAC]", │ │
|
||||
│ │ "01 - Pink Floyd - In The Flesh?.flac"] │ │
|
||||
│ │ │ │
|
||||
│ │ # Navigate to parent directory in virtual tree │ │
|
||||
│ │ node = directory_structure.getnode(pathsplit[0:2]) │ │
|
||||
│ │ # node.files = {"01 - Pink Floyd - In The Flesh?.flac": 42, ...} │ │
|
||||
│ │ │ │
|
||||
│ │ # Get beets item by ID │ │
|
||||
│ │ item_id = node.files[pathsplit[2]] # 42 │ │
|
||||
│ │ self.item = lib.get_item(id=42) │ │
|
||||
│ │ self.real_path = self.item.path │ │
|
||||
│ │ # e.g., "/mnt/music/torrents/pink_floyd_wall.flac" │ │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Step 2: Open real file and detect format │
|
||||
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ self.file_object = open(self.real_path, 'r+') │ │
|
||||
│ │ self.format = "flac" # from file extension │ │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Step 3: Create InterpolatedFLAC with database metadata │
|
||||
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ self.inf = InterpolatedFLAC(self.file_object.read()) │ │
|
||||
│ │ │ │
|
||||
│ │ # INJECT DATABASE METADATA (this is the key operation!) │ │
|
||||
│ │ self.inf["title"] = self.item.title # "In The Flesh?" │ │
|
||||
│ │ self.inf["album"] = self.item.album # "The Wall" │ │
|
||||
│ │ self.inf["artist"] = self.item.artist # "Pink Floyd" │ │
|
||||
│ │ self.inf["genre"] = self.item.genre # "Progressive Rock" │ │
|
||||
│ │ │ │
|
||||
│ │ # Generate header with injected metadata │ │
|
||||
│ │ self.header = self.inf.get_header(self.real_path) │ │
|
||||
│ │ self.bound = len(self.header) # e.g., 8192 bytes │ │
|
||||
│ │ self.music_offset = self.inf.offset() │ │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Step 4: Cache audio data │
|
||||
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ self.file_object.seek(self.music_offset) │ │
|
||||
│ │ self.music_data = self.file_object.read() # All audio data │ │
|
||||
│ │ self.file_object.close() │ │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. File Read Flow
|
||||
|
||||
```
|
||||
Application: read(fd, buffer, 4096) # offset managed by kernel
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ beetFileSystem.read(path, size, offset, fh) │
|
||||
│ Lines 1077-1106 │
|
||||
│ │
|
||||
│ path = "/Pink Floyd/The Wall (1979) [FLAC]/01 - ..." │
|
||||
│ size = 4096 │
|
||||
│ offset = 0 (first read) or previous offset + bytes_read │
|
||||
│ fh = FileHandler instance │
|
||||
│ │
|
||||
│ return self.files[path].read(size, offset) │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ FileHandler.read(size, offset) │
|
||||
│ Lines 497-517 │
|
||||
│ │
|
||||
│ Variables: │
|
||||
│ self.bound = 8192 (header size) │
|
||||
│ self.header = bytes (generated FLAC header with DB metadata) │
|
||||
│ self.music_data = bytes (original audio frames) │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
┌───────────────────────┼───────────────────────┐
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
|
||||
│ Case 1: Header Only │ │ Case 2: Span Both │ │ Case 3: Audio Only │
|
||||
│ offset < bound │ │ offset < bound │ │ offset >= bound │
|
||||
│ offset+size < bound │ │ offset+size >= bound│ │ │
|
||||
├─────────────────────┤ ├─────────────────────┤ ├─────────────────────┤
|
||||
│ Example: │ │ Example: │ │ Example: │
|
||||
│ offset=0 │ │ offset=8000 │ │ offset=10000 │
|
||||
│ size=4096 │ │ size=4096 │ │ size=4096 │
|
||||
│ bound=8192 │ │ bound=8192 │ │ bound=8192 │
|
||||
├─────────────────────┤ ├─────────────────────┤ ├─────────────────────┤
|
||||
│ Return: │ │ Return: │ │ Return: │
|
||||
│ header[0:4096] │ │ header[8000:8192] │ │ music_data[ │
|
||||
│ │ │ + music_data[0:3904]│ │ 1808:5904] │
|
||||
│ (DB metadata!) │ │ │ │ │
|
||||
│ │ │ (mixed) │ │ (original audio) │
|
||||
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
|
||||
|
||||
|
||||
Visual representation of virtual file:
|
||||
|
||||
0 bound (8192) EOF
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌───────────────────────┬────────────────────────────────────────────┐
|
||||
│ HEADER │ AUDIO DATA │
|
||||
│ (self.header) │ (self.music_data) │
|
||||
│ │ │
|
||||
│ Contains: │ Contains: │
|
||||
│ - "fLaC" magic │ - Original FLAC frames │
|
||||
│ - STREAMINFO block │ - Unchanged from disk │
|
||||
│ - VORBIS_COMMENT │ │
|
||||
│ with DB values: │ │
|
||||
│ title, artist, │ │
|
||||
│ album, genre │ │
|
||||
│ - PADDING block │ │
|
||||
└───────────────────────┴────────────────────────────────────────────┘
|
||||
▲ ▲
|
||||
│ │
|
||||
From InterpolatedFLAC From original file
|
||||
with injected DB tags (passed through)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. File Write Flow
|
||||
|
||||
```
|
||||
Application: write(fd, "TITLE=New Title\0", 16) # Hypothetical tag edit
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ beetFileSystem.write(path, buf, offset, fh) │
|
||||
│ Lines 1108-1135 │
|
||||
│ │
|
||||
│ return self.files[path].write(offset, buf) │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ FileHandler.write(offset, buf) │
|
||||
│ Lines 519-565 │
|
||||
│ │
|
||||
│ if offset >= self.bound: │
|
||||
│ # Write is in audio area - DISCARD │
|
||||
│ return # Do nothing, audio is read-only │
|
||||
│ │
|
||||
│ # Write is in header area - process tag update │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Step 1: Reconstruct full virtual file in memory │
|
||||
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ filedata = self.header + self.music_data │ │
|
||||
│ │ │ │
|
||||
│ │ # Patch in new data │ │
|
||||
│ │ filedata = filedata[0:offset] + buf + filedata[offset + len(buf):] │ │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Step 2: Parse patched data as FLAC │
|
||||
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ self.inf = InterpolatedFLAC(filedata) │ │
|
||||
│ │ # This parses the FLAC structure and extracts Vorbis comments │ │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Step 3: Extract tag values from parsed FLAC │
|
||||
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ self.item.title = str(self.inf["title"][0]).encode('utf-8') │ │
|
||||
│ │ self.item.album = str(self.inf["album"][0]).encode('utf-8') │ │
|
||||
│ │ self.item.artist = str(self.inf["artist"][0]).encode('utf-8') │ │
|
||||
│ │ self.item.genre = str(self.inf["genre"][0]).encode('utf-8') │ │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Step 4: Save to beets database │
|
||||
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ self.lib.store(self.item) # Update item in library │ │
|
||||
│ │ self.lib.save() # Persist to SQLite │ │
|
||||
│ │ │ │
|
||||
│ │ # NOTE: Original file on disk is NEVER touched! │ │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Step 5: Regenerate header for subsequent reads │
|
||||
│ ┌───────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ self.inf["title"] = self.item.title │ │
|
||||
│ │ self.inf["album"] = self.item.album │ │
|
||||
│ │ self.inf["artist"] = self.item.artist │ │
|
||||
│ │ self.inf["genre"] = self.item.genre │ │
|
||||
│ │ │ │
|
||||
│ │ self.header = self.inf.get_header(self.real_path) │ │
|
||||
│ │ self.bound = len(self.header) │ │
|
||||
│ └───────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ return len(buf) # Success │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
|
||||
|
||||
Write data flow summary:
|
||||
|
||||
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
||||
│ Application │ │ beetfs │ │ Beets │ │ Original │
|
||||
│ writes │────▶│ parses │────▶│ database │ │ file │
|
||||
│ new tags │ │ extracts │ │ updated │ │ UNTOUCHED │
|
||||
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. File Release Flow
|
||||
|
||||
```
|
||||
Application: close(fd)
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ beetFileSystem.release(path, flags, fh) │
|
||||
│ Lines 1049-1059 │
|
||||
│ │
|
||||
│ if self.files[path].release(): │
|
||||
│ # Reference count reached 0, clean up │
|
||||
│ del self.files[path] │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ FileHandler.release() │
|
||||
│ Lines 489-495 │
|
||||
│ │
|
||||
│ self.instance_count -= 1 │
|
||||
│ │
|
||||
│ if self.instance_count == 0: │
|
||||
│ return True # OK to delete │
|
||||
│ else: │
|
||||
│ return False # Still in use │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Directory Listing Flow
|
||||
|
||||
```
|
||||
Application: ls /mount/Pink\ Floyd/
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ beetFileSystem.readdir(path, offset, dh) │
|
||||
│ Lines 931-975 │
|
||||
│ │
|
||||
│ path = "/Pink Floyd" │
|
||||
│ pathsplit = ["Pink Floyd"] │
|
||||
│ │
|
||||
│ yield fuse.Direntry(".") │
|
||||
│ yield fuse.Direntry("..") │
|
||||
│ │
|
||||
│ # len(pathsplit) == 1, structure_depth - 1 == 2 │
|
||||
│ # So we're listing directories (albums), not files │
|
||||
│ │
|
||||
│ for dirname in directory_structure.listdir(pathsplit, True): │
|
||||
│ yield fuse.Direntry(dirname.encode('utf-8')) │
|
||||
│ # "The Wall (1979) [FLAC]" │
|
||||
│ # "Animals (1977) [FLAC]" │
|
||||
│ # etc. │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Complete Request Lifecycle
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────────────────────┐
|
||||
│ COMPLETE LIFECYCLE │
|
||||
│ │
|
||||
│ 1. User mounts: beet mount /mnt/music │
|
||||
│ ├─ Build virtual tree from beets library │
|
||||
│ └─ Start FUSE event loop │
|
||||
│ │
|
||||
│ 2. Application opens file: open("/mnt/music/Artist/Album/track.flac") │
|
||||
│ ├─ Resolve virtual path to beets item ID │
|
||||
│ ├─ Load original file into memory │
|
||||
│ ├─ Inject database metadata into FLAC structure │
|
||||
│ ├─ Generate new header with DB tags │
|
||||
│ └─ Cache audio data │
|
||||
│ │
|
||||
│ 3. Application reads file: read(fd, buf, 4096) │
|
||||
│ ├─ If reading header region → return header (DB metadata) │
|
||||
│ ├─ If reading audio region → return cached audio (original) │
|
||||
│ └─ If spanning both → return combined data │
|
||||
│ │
|
||||
│ 4. Application writes tags: write(fd, new_tags, offset) │
|
||||
│ ├─ If audio region → discard (read-only) │
|
||||
│ ├─ If header region: │
|
||||
│ │ ├─ Parse new tag values │
|
||||
│ │ ├─ Update beets database │
|
||||
│ │ └─ Regenerate header │
|
||||
│ └─ Original file NEVER modified │
|
||||
│ │
|
||||
│ 5. Application closes file: close(fd) │
|
||||
│ ├─ Decrement reference count │
|
||||
│ └─ Clean up if count == 0 │
|
||||
│ │
|
||||
│ 6. User unmounts: fusermount -u /mnt/music │
|
||||
│ └─ fsdestroy() called, cleanup │
|
||||
│ │
|
||||
└──────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
@@ -0,0 +1,479 @@
|
||||
# beetfs Drawbacks & Limitations
|
||||
|
||||
## Overview
|
||||
|
||||
This document catalogs all identified issues, limitations, and missing features in beetfs. Issues are categorized by severity and type.
|
||||
|
||||
---
|
||||
|
||||
## Critical Issues (🔴)
|
||||
|
||||
### 1. Full File Loading into Memory
|
||||
|
||||
**Location**: Lines 463, 480-481
|
||||
|
||||
```python
|
||||
self.inf = InterpolatedFLAC(self.file_object.read()) # Entire file
|
||||
# ...
|
||||
self.music_data = self.file_object.read() # Audio portion again
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- Memory usage = O(file_size) per open file
|
||||
- 50MB FLAC = ~50MB RAM
|
||||
- Library scan of 100 files = 5GB+ RAM
|
||||
- Out-of-memory crashes on large libraries
|
||||
|
||||
**Fix Required**: Implement lazy loading with seek-based reads.
|
||||
|
||||
---
|
||||
|
||||
### 2. MP3 Support Disabled
|
||||
|
||||
**Location**: Lines 475-477
|
||||
|
||||
```python
|
||||
elif self.format == "mp3":
|
||||
self.bound = 0 # disable interpolation for now
|
||||
self.music_offset = 0 # disable interpolation for now
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- MP3 files return original metadata, not database metadata
|
||||
- Breaks the core promise of metadata overlay
|
||||
- MP3 is still one of the most common formats
|
||||
|
||||
**Fix Required**: Implement `InterpolatedID3` header generation.
|
||||
|
||||
---
|
||||
|
||||
### 3. Python 2 Only
|
||||
|
||||
**Location**: Throughout
|
||||
|
||||
```python
|
||||
except fuse.FuseError, e: # Python 2 syntax
|
||||
if isinstance(value, basestring): # Removed in Python 3
|
||||
return reduce(lambda a, b: (a << 8) + ord(b), string, 0L) # Long literals
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- Python 2 EOL was January 2020
|
||||
- Security vulnerabilities unfixed
|
||||
- No modern library support
|
||||
- Cannot run on Python 3 without migration
|
||||
|
||||
**Fix Required**: Full Python 3 migration (see modernization.md).
|
||||
|
||||
---
|
||||
|
||||
### 4. Deprecated FUSE Library
|
||||
|
||||
**Location**: Line 25, 51
|
||||
|
||||
```python
|
||||
import fuse
|
||||
fuse.fuse_python_api = (0, 2)
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- fuse-python is unmaintained
|
||||
- Missing modern FUSE features (FUSE 3.x)
|
||||
- Compatibility issues with recent kernels
|
||||
- No async support
|
||||
|
||||
**Fix Required**: Migrate to pyfuse3 or llfuse.
|
||||
|
||||
---
|
||||
|
||||
### 5. Single-Threaded Execution
|
||||
|
||||
**Location**: Line 178
|
||||
|
||||
```python
|
||||
server.multithreaded = 0
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- All operations serialized
|
||||
- One slow open blocks all other operations
|
||||
- Cannot utilize multiple CPU cores
|
||||
- Poor performance under concurrent access
|
||||
|
||||
**Fix Required**: Enable multithreading with proper locking.
|
||||
|
||||
---
|
||||
|
||||
## Major Issues (🟡)
|
||||
|
||||
### 6. Limited Metadata Fields
|
||||
|
||||
**Location**: Lines 466-469, 540-547
|
||||
|
||||
```python
|
||||
# Only these 4 fields are actually used:
|
||||
self.inf["title"] = self.item.title
|
||||
self.inf["album"] = self.item.album
|
||||
self.inf["artist"] = self.item.artist
|
||||
self.inf["genre"] = self.item.genre
|
||||
```
|
||||
|
||||
**Defined but not implemented** (lines 55-77):
|
||||
- `composer`, `grouping`
|
||||
- `year`, `month`, `day`
|
||||
- `track`, `tracktotal`
|
||||
- `disc`, `disctotal`
|
||||
- `lyrics`, `comments`
|
||||
- `bpm`, `comp`
|
||||
- `albumartist` (not even defined)
|
||||
|
||||
**Impact**:
|
||||
- Track numbers not from database
|
||||
- Album artist not supported
|
||||
- Year/date not interpolated
|
||||
- Cover art not handled
|
||||
|
||||
---
|
||||
|
||||
### 7. No File Handle Caching/Eviction
|
||||
|
||||
**Location**: Lines 1004-1018
|
||||
|
||||
```python
|
||||
if path in self.files:
|
||||
self.files[path].open()
|
||||
else:
|
||||
self.files[path] = FileHandler(path, self.lib)
|
||||
```
|
||||
|
||||
**Missing**:
|
||||
- No maximum cache size
|
||||
- No LRU eviction
|
||||
- No memory pressure handling
|
||||
- Files stay in memory until explicitly closed
|
||||
|
||||
**Impact**:
|
||||
- Memory grows unbounded
|
||||
- No protection against OOM
|
||||
- Applications that open-then-close still leave data cached
|
||||
|
||||
---
|
||||
|
||||
### 8. Blocking Database Operations
|
||||
|
||||
**Location**: Lines 549-550
|
||||
|
||||
```python
|
||||
self.lib.store(self.item)
|
||||
self.lib.save()
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- SQLite operations in FUSE thread
|
||||
- Write operations block all reads
|
||||
- No transaction batching
|
||||
- Potential deadlocks with beets
|
||||
|
||||
---
|
||||
|
||||
### 9. No Library Hot Reload
|
||||
|
||||
**Issue**: Virtual directory tree built once at mount time.
|
||||
|
||||
**Location**: Lines 142-172
|
||||
|
||||
```python
|
||||
for item in lib.items():
|
||||
# Build tree...
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- New files added to beets library not visible
|
||||
- Deleted files still appear (ENOENT on access)
|
||||
- Metadata changes in beets not reflected until remount
|
||||
- Must unmount/remount to see changes
|
||||
|
||||
---
|
||||
|
||||
### 10. Static Path Format
|
||||
|
||||
**Location**: Lines 44-45
|
||||
|
||||
```python
|
||||
PATH_FORMAT = ("$artist/$album ($year) [$format_upper]/"
|
||||
"$track - $artist - $title.$format")
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- Cannot customize organization
|
||||
- Hard-coded template
|
||||
- No configuration option
|
||||
- Incompatible with different organizational preferences
|
||||
|
||||
---
|
||||
|
||||
### 11. No Extended Attribute Support
|
||||
|
||||
**Location**: Not implemented
|
||||
|
||||
**Impact**:
|
||||
- Cannot store/retrieve xattrs
|
||||
- Some applications use xattrs for metadata
|
||||
- macOS Finder metadata lost
|
||||
- Linux capabilities not supported
|
||||
|
||||
---
|
||||
|
||||
### 12. No Symlink Support
|
||||
|
||||
**Location**: Lines 758-765
|
||||
|
||||
```python
|
||||
def readlink(self, path):
|
||||
return -errno.EOPNOTSUPP
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- Cannot create symlinks in mount
|
||||
- Some applications expect symlink support
|
||||
- Cannot link to external files
|
||||
|
||||
---
|
||||
|
||||
### 13. Silent Error Swallowing
|
||||
|
||||
**Location**: Lines 705-707, 1019-1021, 1103-1104
|
||||
|
||||
```python
|
||||
except Exception as e:
|
||||
logging.error(e)
|
||||
return -errno.ENOENT # Always returns same error
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- All errors appear as "file not found"
|
||||
- Hard to debug issues
|
||||
- No distinction between permission, I/O, parse errors
|
||||
- Lost stack traces in many cases
|
||||
|
||||
---
|
||||
|
||||
## Minor Issues (🟢)
|
||||
|
||||
### 14. Global State
|
||||
|
||||
**Location**: Lines 125-140
|
||||
|
||||
```python
|
||||
global structure_split
|
||||
global structure_depth
|
||||
global library
|
||||
global directory_structure
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- Cannot mount multiple instances
|
||||
- Difficult to unit test
|
||||
- Tight coupling between components
|
||||
- No dependency injection
|
||||
|
||||
---
|
||||
|
||||
### 15. Hard-coded Log File
|
||||
|
||||
**Location**: Lines 624-625
|
||||
|
||||
```python
|
||||
LOG_FILENAME = "LOG"
|
||||
logging.basicConfig(filename=LOG_FILENAME, level=logging.INFO,)
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- Log file created in current directory
|
||||
- No log rotation
|
||||
- No configurable log level
|
||||
- Fills disk on busy systems
|
||||
|
||||
---
|
||||
|
||||
### 16. Reference Count Manual Management
|
||||
|
||||
**Location**: Lines 485-495
|
||||
|
||||
```python
|
||||
def open(self):
|
||||
self.instance_count = self.instance_count + 1
|
||||
|
||||
def release(self):
|
||||
if self.instance_count > 0:
|
||||
self.instance_count = self.instance_count - 1
|
||||
```
|
||||
|
||||
**Issues**:
|
||||
- Race conditions possible if multithreaded
|
||||
- No context manager support
|
||||
- Manual counting error-prone
|
||||
- Off-by-one potential
|
||||
|
||||
---
|
||||
|
||||
### 17. Inefficient Directory Building
|
||||
|
||||
**Location**: Lines 153-172
|
||||
|
||||
```python
|
||||
for level in range(0, structure_depth - 1):
|
||||
if level-1 in level_subbed:
|
||||
sub_elements.append(level_subbed[level-1])
|
||||
directory_structure.adddir(sub_elements, level_subbed[level])
|
||||
```
|
||||
|
||||
**Issues**:
|
||||
- Rebuilds path for every item
|
||||
- O(items × depth) complexity
|
||||
- String allocations in inner loop
|
||||
- Could use trie-based insertion
|
||||
|
||||
---
|
||||
|
||||
### 18. No Cover Art Handling
|
||||
|
||||
**Issue**: Cover art embedded in FLAC not addressed.
|
||||
|
||||
**Impact**:
|
||||
- Cover art from original file used, not database
|
||||
- Cannot replace/add cover art through overlay
|
||||
- PICTURE metadata blocks passed through unchanged
|
||||
|
||||
---
|
||||
|
||||
### 19. No Cue Sheet Support
|
||||
|
||||
**Issue**: Cue sheets not handled specially.
|
||||
|
||||
**Impact**:
|
||||
- `.cue` files point to original file paths
|
||||
- Cannot play cue-referenced tracks correctly
|
||||
- Split-by-cue not supported
|
||||
|
||||
---
|
||||
|
||||
### 20. File Size Mismatch Potential
|
||||
|
||||
**Issue**: Virtual file size differs from physical if header size changes.
|
||||
|
||||
**Location**: Lines 675-688
|
||||
|
||||
```python
|
||||
statinfo = os.stat(item)
|
||||
st = Stat(st_mode=statinfo.st_mode,
|
||||
st_size=statinfo.st_size, # Original size, not virtual!
|
||||
...)
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- `stat()` returns original file size
|
||||
- If generated header is larger/smaller, size is wrong
|
||||
- Some applications may fail on size mismatch
|
||||
- Range requests could break
|
||||
|
||||
---
|
||||
|
||||
## Missing Features
|
||||
|
||||
### Essential
|
||||
|
||||
| Feature | Status | Notes |
|
||||
|---------|--------|-------|
|
||||
| MP3 metadata interpolation | ❌ Disabled | Code exists but disabled |
|
||||
| OGG/Opus support | ❌ Missing | No implementation |
|
||||
| AAC/M4A support | ❌ Missing | No implementation |
|
||||
| Lazy file loading | ❌ Missing | Full file loaded |
|
||||
| Memory management | ❌ Missing | No limits or eviction |
|
||||
| Configuration file | ❌ Missing | Hard-coded values |
|
||||
|
||||
### Nice to Have
|
||||
|
||||
| Feature | Status | Notes |
|
||||
|---------|--------|-------|
|
||||
| Cover art interpolation | ❌ Missing | Would need PICTURE block handling |
|
||||
| ReplayGain from database | ❌ Missing | Tags not interpolated |
|
||||
| Lyrics from database | ❌ Missing | Listed in fields, not implemented |
|
||||
| Watch mode (hot reload) | ❌ Missing | No inotify integration |
|
||||
| Multiple mount points | ❌ Missing | Global state prevents |
|
||||
| Remote database | ❌ Missing | Local beets only |
|
||||
| Read-only mode | ❌ Missing | Always allows writes |
|
||||
| Custom path templates | ❌ Missing | Hard-coded PATH_FORMAT |
|
||||
|
||||
---
|
||||
|
||||
## Security Considerations
|
||||
|
||||
### 1. No Input Validation
|
||||
|
||||
**Location**: Throughout
|
||||
|
||||
```python
|
||||
pathsplit = path[1:].split('/')
|
||||
item_id = node.files[pathsplit[structure_depth-1]] # No bounds check
|
||||
```
|
||||
|
||||
**Risk**: Path traversal, injection attacks unlikely but possible.
|
||||
|
||||
### 2. Database Credentials Exposed
|
||||
|
||||
**Issue**: Uses beets library directly with stored credentials.
|
||||
|
||||
**Risk**: Low - local access only.
|
||||
|
||||
### 3. No Permission Enforcement
|
||||
|
||||
**Location**: Lines 749-756
|
||||
|
||||
```python
|
||||
if flags | os.R_OK:
|
||||
pass # TODO: actually check the file permissions
|
||||
if flags | os.W_OK:
|
||||
pass
|
||||
```
|
||||
|
||||
**Risk**: All users can read/write through mount.
|
||||
|
||||
---
|
||||
|
||||
## Compatibility Issues
|
||||
|
||||
| Component | Issue |
|
||||
|-----------|-------|
|
||||
| **Jellyfin** | May scan entire library, causing OOM |
|
||||
| **Plex** | Same library scan issue |
|
||||
| **Navidrome** | Expects certain tag fields not implemented |
|
||||
| **mpd** | Works for playback, database features limited |
|
||||
| **macOS** | fuse-python macOS support questionable |
|
||||
| **Docker** | FUSE in containers requires privileged mode |
|
||||
|
||||
---
|
||||
|
||||
## Summary Table
|
||||
|
||||
| Category | Critical | Major | Minor |
|
||||
|----------|----------|-------|-------|
|
||||
| Performance | 2 | 4 | 2 |
|
||||
| Functionality | 2 | 5 | 4 |
|
||||
| Code Quality | 2 | 2 | 4 |
|
||||
| **Total** | **6** | **11** | **10** |
|
||||
|
||||
---
|
||||
|
||||
## Prioritized Fix List
|
||||
|
||||
1. 🔴 **Memory**: Implement lazy loading (Critical for usability)
|
||||
2. 🔴 **Python 3**: Migrate to Python 3 (Required for any changes)
|
||||
3. 🔴 **FUSE lib**: Switch to pyfuse3/llfuse (Required for Python 3)
|
||||
4. 🔴 **MP3**: Enable MP3 interpolation (Core functionality)
|
||||
5. 🟡 **Metadata**: Implement all fields (Feature completeness)
|
||||
6. 🟡 **Threading**: Enable multithreading (Performance)
|
||||
7. 🟡 **Config**: Add configuration file (Usability)
|
||||
8. 🟡 **Hot reload**: Watch for library changes (Usability)
|
||||
9. 🟢 **Globals**: Remove global state (Code quality)
|
||||
10. 🟢 **Logging**: Configurable logging (Operations)
|
||||
@@ -0,0 +1,493 @@
|
||||
# beetfs E2E Test Plan
|
||||
|
||||
> **Reviewed by Oracle** - Critical bug discovered, plan updated accordingly
|
||||
|
||||
## Test Results (Latest Run)
|
||||
|
||||
```
|
||||
Tests run: 74
|
||||
Passed: 12
|
||||
Failures: 56
|
||||
Errors: 3
|
||||
Skipped: 3
|
||||
Duration: ~103 seconds
|
||||
```
|
||||
|
||||
### Bugs Detected by Tests
|
||||
|
||||
| Bug | Tests Affected | Description |
|
||||
|-----|----------------|-------------|
|
||||
| **Nested Methods** | 56 | Lines 758-1144 indented inside `access()` - FUSE operations unreachable |
|
||||
| **Directory Tree Building** | 3 | `KeyError` in `FSNode.getnode()` when adding files |
|
||||
| **Unmount** | 1 | Filesystem not unmounting cleanly |
|
||||
|
||||
### Passing Tests (12)
|
||||
|
||||
- `test_fuse_available` - FUSE/fusermount detected
|
||||
- `test_library_fixture_created` - SQLite DB and music dir created
|
||||
- `test_temp_directory_created` - Temp dirs set up correctly
|
||||
- `test_mount_empty_library` - **Mount works with empty library!**
|
||||
- `test_list_empty_root` - Empty root returns empty list
|
||||
- `test_list_root_returns_list` - Returns list type
|
||||
- `test_access_empty_path` - Handles empty path
|
||||
- Plus 5 nested bug detection tests (confirming bug exists)
|
||||
|
||||
## Executive Summary
|
||||
|
||||
E2E tests for beetfs FUSE filesystem using real music files from qBittorrent container. No mocks - actual filesystem operations against mounted beetfs.
|
||||
|
||||
### Critical Finding
|
||||
|
||||
**BUG DISCOVERED**: Lines 758-1144 in `beetFs.py` are indented inside `access()` method, making these FUSE operations unreachable as class methods:
|
||||
- `readdir`, `open`, `read`, `write`, `mkdir`, `unlink`, `rmdir`, `symlink`, `link`, `rename`, `chmod`, `chown`, `truncate`, `opendir`, `releasedir`, `fsyncdir`, `create`, `fgetattr`, `release`, `fsync`, `flush`, `ftruncate`
|
||||
|
||||
Tests will expose this immediately - write `test_readdir.py` first.
|
||||
|
||||
---
|
||||
|
||||
## Test Environment
|
||||
|
||||
| Component | Status | Details |
|
||||
|-----------|--------|---------|
|
||||
| Real Music | Available | Metallica "72 Seasons" (12 FLAC, 650MB) at `/home/fujin/.local/share/docker/volumes/containers_downloads/_data/Metallica - 72 Seasons (2023) [FLAC] 88/` |
|
||||
| Synthetic Music | Create | 5-10MB FLACs for most tests (avoid RAM explosion) |
|
||||
| Beets Config | Create | `~/.config/beets/config.yaml` for test isolation |
|
||||
| Beets Library | Empty | Needs import of test files |
|
||||
| Python | 2.7.15 | Via Nix flake (nixpkgs-18.09) |
|
||||
| Test Framework | unittest | stdlib, no external deps for Py2.7 |
|
||||
|
||||
---
|
||||
|
||||
## Test Architecture
|
||||
|
||||
```
|
||||
beetfs/tests/
|
||||
├── __init__.py
|
||||
├── conftest.py # Test fixtures, beets library setup, synthetic FLAC creation
|
||||
├── test_smoke.py # Mount/unmount lifecycle (run FIRST)
|
||||
├── test_nested_bug.py # Verify the indentation bug (run SECOND)
|
||||
├── test_readdir.py # Directory listing operations
|
||||
├── test_read.py # File reading with metadata overlay (CORE FEATURE)
|
||||
├── test_stat.py # getattr, fgetattr, statfs
|
||||
├── test_write.py # Metadata write operations
|
||||
├── test_error_handling.py # ENOENT, EOPNOTSUPP scenarios
|
||||
├── test_edge_cases.py # Unicode, concurrent opens, special chars
|
||||
├── test_integration.py # Real 650MB files (skip by default)
|
||||
└── fixtures/
|
||||
├── synthetic/ # Generated 5-10MB test FLACs
|
||||
└── real -> /home/fujin/.local/share/docker/volumes/containers_downloads/_data/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Test Tiers
|
||||
|
||||
### Tier 1: Unit-ish (Synthetic FLACs, ~500KB each)
|
||||
- Fast execution
|
||||
- No memory issues (FileHandler loads entire file to RAM)
|
||||
- Run on every commit
|
||||
|
||||
### Tier 2: Integration (Subset of real files, 1-2 tracks)
|
||||
- Uses real Metallica FLACs
|
||||
- Tests real-world metadata
|
||||
- Run before merge
|
||||
|
||||
### Tier 3: E2E (All 12 tracks, 650MB)
|
||||
- Full album processing
|
||||
- Memory stress testing
|
||||
- Run via `E2E=1 python -m unittest discover`
|
||||
- Skip by default
|
||||
|
||||
---
|
||||
|
||||
## Test Isolation Strategy
|
||||
|
||||
| Resource | Strategy | Rationale |
|
||||
|----------|----------|-----------|
|
||||
| Audio Files | **Symlinks** for reads | beetfs NEVER writes to source files, only to beets DB |
|
||||
| Beets DB | **Copy per test** | Writes mutate DB; need isolation |
|
||||
| Mount Point | **Fresh tempdir** | Each test gets clean mount |
|
||||
| Global State | **Fresh subprocess** | `library`, `directory_structure` are module globals |
|
||||
|
||||
---
|
||||
|
||||
## Implementation Order
|
||||
|
||||
> Reordered per Oracle recommendation: smoke → nested-bug → read → write → errors → edge
|
||||
|
||||
### Phase 1: Infrastructure (Day 1 AM)
|
||||
|
||||
1. Create `tests/` directory structure
|
||||
2. Implement `BeetFSTestCase` base class with:
|
||||
- Subprocess timeout via `threading.Timer` (Py2.7 compatible)
|
||||
- Mount wait polling (`os.path.ismount()`)
|
||||
- Proper cleanup (`fusermount -u`)
|
||||
3. Create synthetic FLAC generator using ffmpeg + flac CLI
|
||||
4. Setup isolated beets config and library
|
||||
|
||||
### Phase 2: Bug Detection (Day 1 PM)
|
||||
|
||||
5. `test_smoke.py` - Mount/unmount lifecycle
|
||||
6. `test_nested_bug.py` - Verify `readdir`, `open` are callable (will fail, exposing bug)
|
||||
|
||||
### Phase 3: Core Tests (Day 2)
|
||||
|
||||
7. `test_readdir.py` - Directory listing
|
||||
8. `test_read.py` - **Metadata overlay verification** (critical)
|
||||
9. `test_stat.py` - File/directory attributes
|
||||
|
||||
### Phase 4: Write & Errors (Day 3)
|
||||
|
||||
10. `test_write.py` - Metadata modification, DB persistence
|
||||
11. `test_error_handling.py` - ENOENT, EOPNOTSUPP
|
||||
|
||||
### Phase 5: Edge Cases (Day 3-4)
|
||||
|
||||
12. `test_edge_cases.py` - Unicode, concurrent opens, special chars
|
||||
13. `test_integration.py` - Real 650MB files (optional tier)
|
||||
|
||||
---
|
||||
|
||||
## Test Categories
|
||||
|
||||
### 1. Smoke Tests (`test_smoke.py`)
|
||||
|
||||
| Test | Operation | Expected |
|
||||
|------|-----------|----------|
|
||||
| `test_mount_success` | Mount beetfs | `os.path.ismount()` returns True |
|
||||
| `test_unmount_clean` | Unmount | Process exits 0, dir accessible |
|
||||
| `test_mount_empty_library` | Mount with 0 items | Mounts successfully, root empty |
|
||||
| `test_mount_invalid_path` | Mount to non-existent | Fails gracefully |
|
||||
| `test_fsinit_called` | Check initialization | No crash on mount |
|
||||
|
||||
### 2. Nested Methods Bug (`test_nested_bug.py`)
|
||||
|
||||
| Test | Operation | Expected |
|
||||
|------|-----------|----------|
|
||||
| `test_readdir_exists` | `hasattr(beetFileSystem, 'readdir')` | True (currently False!) |
|
||||
| `test_open_exists` | `hasattr(beetFileSystem, 'open')` | True (currently False!) |
|
||||
| `test_read_exists` | `hasattr(beetFileSystem, 'read')` | True (currently False!) |
|
||||
| `test_readdir_callable` | `os.listdir(mount)` | Returns list (currently fails!) |
|
||||
|
||||
### 3. Directory Operations (`test_readdir.py`)
|
||||
|
||||
| Test | Operation | Expected |
|
||||
|------|-----------|----------|
|
||||
| `test_list_root` | `os.listdir(mount)` | Returns artist directories |
|
||||
| `test_list_artist` | `os.listdir(mount/artist)` | Returns album directories |
|
||||
| `test_list_album` | `os.listdir(mount/artist/album)` | Returns track files |
|
||||
| `test_path_format` | Check structure | Matches `$artist/$album ($year) [$format_upper]/$track - $artist - $title.$format` |
|
||||
| `test_unicode_paths` | Non-ASCII chars | Handles "Lux Aeterna" correctly |
|
||||
|
||||
### 4. Read Operations (`test_read.py`) - CORE FEATURE
|
||||
|
||||
| Test | Operation | Expected |
|
||||
|------|-----------|----------|
|
||||
| `test_read_header_overlay` | Read + parse with mutagen | Tags match DB, not file |
|
||||
| `test_read_audio_passthrough` | Compare audio bytes | Identical to original after header |
|
||||
| `test_read_full_file` | Read entire file | Header from DB + audio from file |
|
||||
| `test_metadata_artist` | Check artist tag | DB value, not file value |
|
||||
| `test_metadata_title` | Check title tag | DB value, not file value |
|
||||
| `test_metadata_album` | Check album tag | DB value, not file value |
|
||||
| `test_metadata_genre` | Check genre tag | DB value, not file value |
|
||||
| `test_original_unchanged` | Read original file | Original metadata intact |
|
||||
|
||||
#### Metadata Overlay Verification Pattern
|
||||
|
||||
```python
|
||||
import mutagen.flac
|
||||
from io import BytesIO
|
||||
|
||||
def test_read_header_overlay(self):
|
||||
# Setup: Import file, modify DB metadata
|
||||
# beet import /path/to/file
|
||||
# beet modify artist="DB Artist" # File has "Original Artist"
|
||||
|
||||
# Read mounted file as bytes
|
||||
with open(os.path.join(self.mount_dir, 'DB Artist/...'), 'rb') as f:
|
||||
mounted_data = f.read()
|
||||
|
||||
# Parse with mutagen
|
||||
flac = mutagen.flac.FLAC(BytesIO(mounted_data))
|
||||
|
||||
# Verify overlay worked
|
||||
self.assertEqual(flac['artist'][0], 'DB Artist') # From DB
|
||||
self.assertNotEqual(flac['artist'][0], 'Original Artist') # Not from file
|
||||
```
|
||||
|
||||
### 5. Stat Operations (`test_stat.py`)
|
||||
|
||||
| Test | Operation | Expected |
|
||||
|------|-----------|----------|
|
||||
| `test_stat_file` | `os.stat(file)` | Valid stat with size, mtime |
|
||||
| `test_stat_directory` | `os.stat(dir)` | Directory mode (S_IFDIR) |
|
||||
| `test_statfs` | `os.statvfs(mount)` | Valid filesystem stats |
|
||||
| `test_access_read` | `os.access(file, R_OK)` | True |
|
||||
| `test_access_write` | `os.access(file, W_OK)` | True (header writable) |
|
||||
|
||||
### 6. Write Operations (`test_write.py`)
|
||||
|
||||
| Test | Operation | Expected |
|
||||
|------|-----------|----------|
|
||||
| `test_write_title` | Modify title in header | DB updated, file unchanged |
|
||||
| `test_write_artist` | Modify artist | DB updated |
|
||||
| `test_write_album` | Modify album | DB updated |
|
||||
| `test_write_genre` | Modify genre | DB updated |
|
||||
| `test_write_audio_discarded` | Write at offset > bound | Silently discarded |
|
||||
| `test_write_persistence` | Write -> unmount -> remount | Changes persisted in DB |
|
||||
| `test_write_mp3_noop` | Write to MP3 header | No error, but no effect (bound=0) |
|
||||
|
||||
### 7. Error Handling (`test_error_handling.py`)
|
||||
|
||||
| Test | Operation | Expected |
|
||||
|------|-----------|----------|
|
||||
| `test_enoent_file` | Read non-existent | `OSError(ENOENT)` |
|
||||
| `test_enoent_dir` | List non-existent | `OSError(ENOENT)` |
|
||||
| `test_eopnotsupp_mkdir` | `os.mkdir()` | `OSError(EOPNOTSUPP)` |
|
||||
| `test_eopnotsupp_unlink` | `os.unlink()` | `OSError(EOPNOTSUPP)` |
|
||||
| `test_eopnotsupp_rename` | `os.rename()` | `OSError(EOPNOTSUPP)` |
|
||||
| `test_eopnotsupp_symlink` | `os.symlink()` | `OSError(EOPNOTSUPP)` |
|
||||
|
||||
### 8. Edge Cases (`test_edge_cases.py`)
|
||||
|
||||
| Test | Operation | Expected |
|
||||
|------|-----------|----------|
|
||||
| `test_special_chars_sanitized` | Path with `?/` | Sanitized via `sanitize()` |
|
||||
| `test_concurrent_opens` | Open same file twice | `instance_count` increments |
|
||||
| `test_concurrent_release` | Release after double open | File stays cached until count=0 |
|
||||
| `test_unicode_metadata` | Non-ASCII in artist/title | Handled correctly |
|
||||
| `test_empty_metadata` | None/empty fields | Doesn't crash |
|
||||
| `test_mp3_no_interpolation` | Read MP3 | Returns original file (no overlay) |
|
||||
|
||||
### 9. Integration (`test_integration.py`)
|
||||
|
||||
| Test | Env Var | Expected |
|
||||
|------|---------|----------|
|
||||
| `test_real_album_listing` | `E2E=1` | Lists all 12 Metallica tracks |
|
||||
| `test_real_file_read` | `E2E=1` | Reads 67MB file successfully |
|
||||
| `test_memory_usage` | `E2E=1` | Documents but doesn't fail on high RAM |
|
||||
|
||||
---
|
||||
|
||||
## Test Infrastructure Code
|
||||
|
||||
### Base Test Class (Python 2.7 Compatible)
|
||||
|
||||
```python
|
||||
# tests/conftest.py
|
||||
import unittest
|
||||
import subprocess
|
||||
import tempfile
|
||||
import shutil
|
||||
import os
|
||||
import time
|
||||
import threading
|
||||
|
||||
class BeetFSTestCase(unittest.TestCase):
|
||||
"""Base class for beetfs e2e tests - Python 2.7 compatible"""
|
||||
|
||||
MOUNT_TIMEOUT = 30 # seconds
|
||||
|
||||
@classmethod
|
||||
def setUpClass(cls):
|
||||
"""Check FUSE availability"""
|
||||
try:
|
||||
with open(os.devnull, 'w') as devnull:
|
||||
subprocess.check_call(['which', 'fusermount'],
|
||||
stdout=devnull, stderr=devnull)
|
||||
except subprocess.CalledProcessError:
|
||||
raise unittest.SkipTest("fusermount not available")
|
||||
|
||||
def setUp(self):
|
||||
self.mount_dir = tempfile.mkdtemp(prefix='beetfs_test_')
|
||||
self.fs_process = None
|
||||
|
||||
def mount_beetfs(self, library_path=None):
|
||||
"""Mount beetfs in background with timeout"""
|
||||
cmd = ['python', '-c',
|
||||
'from beetsplug.beetFs import mount; mount()']
|
||||
# Add mount point and other args as needed
|
||||
|
||||
self.fs_process = subprocess.Popen(
|
||||
cmd,
|
||||
stdout=open(os.devnull, 'w'),
|
||||
stderr=subprocess.STDOUT
|
||||
)
|
||||
|
||||
# Python 2.7 timeout workaround
|
||||
timer = threading.Timer(self.MOUNT_TIMEOUT, self._timeout_kill)
|
||||
timer.start()
|
||||
|
||||
try:
|
||||
self._wait_for_mount()
|
||||
finally:
|
||||
timer.cancel()
|
||||
|
||||
def _timeout_kill(self):
|
||||
if self.fs_process and self.fs_process.poll() is None:
|
||||
self.fs_process.kill()
|
||||
|
||||
def _wait_for_mount(self):
|
||||
"""Wait for filesystem to be mounted"""
|
||||
start = time.time()
|
||||
while time.time() - start < self.MOUNT_TIMEOUT:
|
||||
if os.path.ismount(self.mount_dir):
|
||||
return
|
||||
if self.fs_process.poll() is not None:
|
||||
self.fail("Filesystem process terminated prematurely")
|
||||
time.sleep(0.1)
|
||||
self.fail("Mount timeout after {} seconds".format(self.MOUNT_TIMEOUT))
|
||||
|
||||
def tearDown(self):
|
||||
"""Cleanup: unmount and kill process"""
|
||||
if self.fs_process:
|
||||
with open(os.devnull, 'w') as devnull:
|
||||
subprocess.call(['fusermount', '-z', '-u', self.mount_dir],
|
||||
stdout=devnull, stderr=devnull)
|
||||
|
||||
self.fs_process.terminate()
|
||||
|
||||
# Wait for termination (Py2.7 compatible)
|
||||
start = time.time()
|
||||
while time.time() - start < 5:
|
||||
if self.fs_process.poll() is not None:
|
||||
break
|
||||
time.sleep(0.1)
|
||||
else:
|
||||
self.fs_process.kill()
|
||||
|
||||
shutil.rmtree(self.mount_dir, ignore_errors=True)
|
||||
```
|
||||
|
||||
### Synthetic FLAC Generator
|
||||
|
||||
```python
|
||||
# tests/conftest.py (continued)
|
||||
import subprocess
|
||||
import tempfile
|
||||
import os
|
||||
|
||||
def create_synthetic_flac(duration_sec=5, artist="Test Artist",
|
||||
title="Test Track", album="Test Album"):
|
||||
"""Create minimal FLAC with known metadata (~500KB for 5s silence)"""
|
||||
wav_fd, wav_path = tempfile.mkstemp(suffix='.wav')
|
||||
os.close(wav_fd)
|
||||
flac_path = wav_path.replace('.wav', '.flac')
|
||||
|
||||
try:
|
||||
# Generate silence WAV
|
||||
subprocess.check_call([
|
||||
'ffmpeg', '-f', 'lavfi', '-i',
|
||||
'anullsrc=r=44100:cl=stereo', '-t', str(duration_sec),
|
||||
'-y', wav_path
|
||||
], stdout=open(os.devnull, 'w'), stderr=subprocess.STDOUT)
|
||||
|
||||
# Convert to FLAC with metadata
|
||||
subprocess.check_call([
|
||||
'flac', '--best',
|
||||
'-T', 'ARTIST={}'.format(artist),
|
||||
'-T', 'TITLE={}'.format(title),
|
||||
'-T', 'ALBUM={}'.format(album),
|
||||
'-o', flac_path, wav_path
|
||||
], stdout=open(os.devnull, 'w'), stderr=subprocess.STDOUT)
|
||||
|
||||
return flac_path
|
||||
finally:
|
||||
if os.path.exists(wav_path):
|
||||
os.unlink(wav_path)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Dependencies to Add to flake.nix
|
||||
|
||||
```nix
|
||||
# In devShell buildInputs, add:
|
||||
pkgs.ffmpeg # For synthetic FLAC generation
|
||||
pkgs.flac # For FLAC encoding
|
||||
|
||||
# pythonEnv already has mutagen for verification
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Risks & Mitigations
|
||||
|
||||
| Risk | Impact | Mitigation |
|
||||
|------|--------|------------|
|
||||
| Memory explosion | High | Use 5-10MB synthetic FLACs, skip 650MB tests by default |
|
||||
| Nested methods bug | Critical | Tests will expose; fix required before other tests pass |
|
||||
| Python 2.7 EOL | Medium | Nix provides isolated environment |
|
||||
| Global state pollution | Medium | Fresh subprocess per test |
|
||||
| FUSE permissions | Low | Run as regular user, skip privileged tests |
|
||||
| Concurrent access | Low | Single-threaded mode, sequential tests |
|
||||
|
||||
---
|
||||
|
||||
## Success Criteria
|
||||
|
||||
1. **All smoke tests pass** - beetfs mounts and unmounts cleanly
|
||||
2. **Nested bug exposed and fixed** - All FUSE methods callable
|
||||
3. **Metadata overlay verified** - Reads return DB metadata, not file metadata
|
||||
4. **Writes update DB** - Metadata changes persist
|
||||
5. **Errors handled gracefully** - Correct errno for unsupported ops
|
||||
6. **No crashes on edge cases** - Unicode, special chars, concurrent access
|
||||
|
||||
---
|
||||
|
||||
## Findings from Test Execution
|
||||
|
||||
### Bug #1: Nested Methods (CRITICAL)
|
||||
|
||||
**Location**: `beetFs.py` lines 758-1144
|
||||
|
||||
**Problem**: All FUSE operation methods are indented inside the `access()` method, making them local functions instead of class methods.
|
||||
|
||||
**Evidence**:
|
||||
```python
|
||||
def access(self, path, flags): # Line 723 - correct class method
|
||||
...
|
||||
return 0
|
||||
|
||||
def readdir(self, path, ...): # Line 931 - WRONG! Nested inside access()
|
||||
...
|
||||
def open(self, path, flags): # Line 988 - Also nested
|
||||
...
|
||||
def read(self, path, ...): # Line 1077 - Also nested
|
||||
...
|
||||
```
|
||||
|
||||
**Symptom**: `os.listdir()` returns `OSError: [Errno 38] Function not implemented`
|
||||
|
||||
**Fix Required**: Dedent lines 758-1144 by 8 spaces to make them class methods.
|
||||
|
||||
### Bug #2: Directory Tree Building
|
||||
|
||||
**Location**: `beetFs.py` lines 403-414 (`FSNode.getnode()` and `FSNode.adddir()`)
|
||||
|
||||
**Problem**: When adding files to the directory structure, the code assumes parent directories already exist.
|
||||
|
||||
**Evidence**:
|
||||
```
|
||||
KeyError: u'Test Artist'
|
||||
File "beetFs.py", line 403, in getnode
|
||||
return self.getnode(elements, root=root.dirs[topdir])
|
||||
```
|
||||
|
||||
**Symptom**: Mount fails when library contains tracks.
|
||||
|
||||
### Bug #3: Unmount Not Clean
|
||||
|
||||
**Problem**: After unmounting, `os.path.ismount()` still returns `True`.
|
||||
|
||||
**Likely Cause**: FUSE process not terminating properly, or lazy unmount not completing.
|
||||
|
||||
---
|
||||
|
||||
## Notes from Oracle Review
|
||||
|
||||
1. **MP3 is not "readonly"** - metadata overlay is disabled (`bound=0`), but reads still work
|
||||
2. **Write returns None for MP3** - no explicit return in MP3 path (falls through)
|
||||
3. **Path format is hardcoded** - tests must match `$artist/$album ($year) [$format_upper]/$track - $artist - $title.$format`
|
||||
4. **basestring vs str** - use `isinstance(x, basestring)` for Py2.7 string checks
|
||||
5. **Global variables** - `library`, `directory_structure` must be reset between tests (use subprocesses)
|
||||
@@ -0,0 +1,249 @@
|
||||
# beetfs Feature Set
|
||||
|
||||
## Overview
|
||||
|
||||
beetfs is a FUSE filesystem plugin for [beets](https://beets.io/) that presents your music library as a virtual filesystem organized by metadata. Files appear with paths derived from their database metadata, and reading file headers returns metadata from the beets database rather than the actual file tags.
|
||||
|
||||
**Author**: Martin Eve (2010)
|
||||
**License**: GPLv3
|
||||
**Python**: 2.7 (uses fuse-python)
|
||||
|
||||
## Core Features
|
||||
|
||||
### 1. Virtual Metadata-Based Directory Structure
|
||||
|
||||
Files are presented in a configurable path format based on beets database fields:
|
||||
|
||||
```
|
||||
$artist/$album ($year) [$format_upper]/$track - $artist - $title.$format
|
||||
```
|
||||
|
||||
**Example**:
|
||||
```
|
||||
/mnt/beetfs/
|
||||
├── Metallica/
|
||||
│ └── 72 Seasons (2023) [FLAC]/
|
||||
│ ├── 01 - Metallica - 72 Seasons.flac
|
||||
│ ├── 02 - Metallica - Shadows Follow.flac
|
||||
│ └── ...
|
||||
├── Pink Floyd/
|
||||
│ └── The Dark Side of the Moon (1973) [FLAC]/
|
||||
│ └── ...
|
||||
```
|
||||
|
||||
**Available template variables**:
|
||||
- `$artist`, `$album`, `$title`, `$genre`, `$composer`, `$grouping`
|
||||
- `$year`, `$month`, `$day`
|
||||
- `$track`, `$tracktotal`, `$disc`, `$disctotal`
|
||||
- `$format`, `$format_upper` (file extension)
|
||||
- `$lyrics`, `$comments`, `$bpm`, `$comp`
|
||||
|
||||
### 2. Metadata Overlay (Read)
|
||||
|
||||
When you read a file through beetfs, the **metadata header is synthesized from the beets database**, not read from the actual file on disk.
|
||||
|
||||
**How it works**:
|
||||
1. Open file → beetfs reads the real file from disk
|
||||
2. Parse the audio format header (FLAC/MP3)
|
||||
3. Replace metadata fields with values from beets database
|
||||
4. Return synthesized header + original audio data
|
||||
|
||||
**Supported fields for overlay**:
|
||||
- `title`, `artist`, `album`, `genre` (FLAC only currently)
|
||||
|
||||
**Use case**: Your files may have inconsistent or wrong tags, but beetfs presents them with the corrected metadata from your beets library.
|
||||
|
||||
### 3. Metadata Passthrough (Write)
|
||||
|
||||
When you write to file headers through beetfs, the **changes are saved to the beets database**, not to the actual file.
|
||||
|
||||
**How it works**:
|
||||
1. Application writes new metadata to file header region
|
||||
2. beetfs intercepts the write
|
||||
3. Parses the new metadata values
|
||||
4. Updates the beets database (`lib.store()`, `lib.save()`)
|
||||
5. Regenerates the synthesized header
|
||||
|
||||
**Result**: Tag editors (Picard, Kid3, etc.) can edit metadata through beetfs, and changes persist in the beets database without modifying the original files.
|
||||
|
||||
### 4. Format Support
|
||||
|
||||
| Format | Read | Metadata Overlay | Write to DB |
|
||||
|--------|------|------------------|-------------|
|
||||
| FLAC | ✅ | ✅ Full | ✅ |
|
||||
| MP3 | ✅ | ❌ Disabled | ❌ |
|
||||
| Other | ❌ | ❌ | ❌ |
|
||||
|
||||
**FLAC Implementation**:
|
||||
- Uses `InterpolatedFLAC` class extending mutagen
|
||||
- Reconstructs Vorbis comment block with DB values
|
||||
- Preserves audio data and other metadata blocks
|
||||
|
||||
**MP3 Implementation**:
|
||||
- Passthrough only (no interpolation)
|
||||
- `self.bound = 0` disables header replacement
|
||||
|
||||
### 5. File Caching
|
||||
|
||||
Open files are cached in `FileHandler` objects:
|
||||
|
||||
- First open: Load entire file into memory, parse headers
|
||||
- Subsequent opens: Reuse cached `FileHandler`
|
||||
- Reference counting for multiple opens
|
||||
- Release when reference count reaches zero
|
||||
|
||||
**Memory impact**: Each open file consumes ~filesize RAM.
|
||||
|
||||
## FUSE Operations
|
||||
|
||||
### Implemented (Functional)
|
||||
|
||||
| Operation | Description |
|
||||
|-----------|-------------|
|
||||
| `getattr` | File/directory stat (size, mode, timestamps) |
|
||||
| `access` | Permission checking |
|
||||
| `opendir` | Open directory for listing |
|
||||
| `readdir` | List directory contents |
|
||||
| `releasedir` | Close directory |
|
||||
| `open` | Open file for reading/writing |
|
||||
| `read` | Read file contents |
|
||||
| `write` | Write to file (header region only) |
|
||||
| `release` | Close file |
|
||||
| `fgetattr` | Stat with file handle |
|
||||
| `statfs` | Filesystem statistics |
|
||||
|
||||
### Not Implemented (Return EOPNOTSUPP)
|
||||
|
||||
| Operation | Reason |
|
||||
|-----------|--------|
|
||||
| `create` | Read-only structure |
|
||||
| `mknod` | Read-only structure |
|
||||
| `mkdir` | Read-only structure |
|
||||
| `unlink` | Read-only structure |
|
||||
| `rmdir` | Read-only structure |
|
||||
| `symlink` | Not needed |
|
||||
| `link` | Not needed |
|
||||
| `rename` | Would break DB consistency |
|
||||
| `chmod` | Metadata-only FS |
|
||||
| `chown` | Metadata-only FS |
|
||||
| `truncate` | Would corrupt audio |
|
||||
| `utime` | Metadata-only FS |
|
||||
|
||||
## Usage
|
||||
|
||||
### Mount
|
||||
|
||||
```bash
|
||||
beet mount /mnt/beetfs
|
||||
```
|
||||
|
||||
### Unmount
|
||||
|
||||
```bash
|
||||
fusermount -u /mnt/beetfs
|
||||
```
|
||||
|
||||
### Example Session
|
||||
|
||||
```bash
|
||||
# Mount the filesystem
|
||||
beet mount /mnt/music
|
||||
|
||||
# Browse by artist
|
||||
ls /mnt/music/
|
||||
# Metallica/ Pink Floyd/ The Beatles/ ...
|
||||
|
||||
# List an album
|
||||
ls "/mnt/music/Metallica/72 Seasons (2023) [FLAC]/"
|
||||
# 01 - Metallica - 72 Seasons.flac
|
||||
# 02 - Metallica - Shadows Follow.flac
|
||||
# ...
|
||||
|
||||
# Play through any music player
|
||||
mpv "/mnt/music/Metallica/72 Seasons (2023) [FLAC]/01 - Metallica - 72 Seasons.flac"
|
||||
|
||||
# Edit tags (changes go to beets DB)
|
||||
kid3 "/mnt/music/Metallica/72 Seasons (2023) [FLAC]/"
|
||||
|
||||
# Unmount
|
||||
fusermount -u /mnt/music
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ User Applications │
|
||||
│ (mpv, Rhythmbox, Kid3, etc.) │
|
||||
└─────────────────────────┬───────────────────────────────────┘
|
||||
│ POSIX calls (open, read, write)
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ Linux Kernel │
|
||||
│ FUSE module │
|
||||
└─────────────────────────┬───────────────────────────────────┘
|
||||
│ /dev/fuse
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ beetfs │
|
||||
│ ┌─────────────┐ ┌──────────────┐ ┌───────────────────┐ │
|
||||
│ │ FSNode Tree │ │ FileHandler │ │ InterpolatedFLAC │ │
|
||||
│ │ (in-memory) │ │ (cache) │ │ (header synth) │ │
|
||||
│ └─────────────┘ └──────────────┘ └───────────────────┘ │
|
||||
└────────┬────────────────┬───────────────────┬───────────────┘
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌─────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||||
│ Beets DB │ │ Real Files │ │ Mutagen │
|
||||
│ (SQLite) │ │ (on disk) │ │ (parsing) │
|
||||
└─────────────┘ └─────────────────┘ └─────────────────┘
|
||||
```
|
||||
|
||||
## Limitations
|
||||
|
||||
### Current Bugs (Non-Functional)
|
||||
|
||||
1. **Nested Methods Bug**: Lines 758-1144 are indented inside `access()`, making FUSE operations unreachable
|
||||
2. **Directory Tree Bug**: `FSNode.adddir()` crashes when building tree for non-empty library
|
||||
|
||||
### Design Limitations
|
||||
|
||||
1. **Memory Usage**: Entire file loaded into RAM on open
|
||||
2. **Mount Time**: O(N) - loads all library items at mount
|
||||
3. **No Lazy Loading**: Full directory tree built upfront
|
||||
4. **Single Format**: Only FLAC has full metadata overlay
|
||||
5. **No Real File Modification**: Writes only update DB, not actual files
|
||||
6. **Python 2.7 GIL**: Single-threaded performance
|
||||
|
||||
### Not Supported
|
||||
|
||||
- Creating/deleting files or directories
|
||||
- Moving/renaming files
|
||||
- Modifying audio content
|
||||
- Album art / embedded images
|
||||
- Multi-value tags
|
||||
- Non-ASCII in some edge cases
|
||||
|
||||
## Configuration
|
||||
|
||||
Currently hardcoded. Potential configuration points:
|
||||
|
||||
| Setting | Current Value | Description |
|
||||
|---------|---------------|-------------|
|
||||
| `PATH_FORMAT` | `$artist/$album ($year)...` | Directory structure template |
|
||||
| `METADATA_RW_FIELDS` | 17 fields | Fields available for read/write |
|
||||
| Caching | Always on | FileHandler caching behavior |
|
||||
| Threading | Disabled | `multithreaded = 0` |
|
||||
|
||||
## Dependencies
|
||||
|
||||
- Python 2.7
|
||||
- fuse-python
|
||||
- beets 1.4.x
|
||||
- mutagen (FLAC/MP3 parsing)
|
||||
|
||||
## See Also
|
||||
|
||||
- [e2e-test-plan.md](e2e-test-plan.md) - Test strategy and bug documentation
|
||||
- [benchmark-plan.md](benchmark-plan.md) - Performance measurement methodology
|
||||
- [benchmark-results.md](benchmark-results.md) - Current benchmark status
|
||||
@@ -0,0 +1,459 @@
|
||||
# beetfs Modernization Guide
|
||||
|
||||
## Current State Analysis
|
||||
|
||||
### Technical Debt
|
||||
|
||||
| Issue | Severity | Location |
|
||||
|-------|----------|----------|
|
||||
| Python 2 syntax | 🔴 Critical | Throughout |
|
||||
| fuse-python (deprecated) | 🔴 Critical | Lines 25, 51 |
|
||||
| `basestring` usage | 🔴 Critical | Line 89 |
|
||||
| `reduce` without import | 🟡 Medium | Line 197 |
|
||||
| `0755` octal syntax | 🟡 Medium | Lines 654, 700 |
|
||||
| `print` as statement | 🟡 Medium | N/A (not used) |
|
||||
| `except Exception, e` | 🔴 Critical | Line 181 |
|
||||
| Long integers (`0L`) | 🟡 Medium | Line 197 |
|
||||
| Global state | 🟡 Medium | Lines 125-140 |
|
||||
| Memory-heavy design | 🟡 Medium | Line 481 |
|
||||
|
||||
### Dependencies to Update
|
||||
|
||||
| Original | Replacement | Notes |
|
||||
|----------|-------------|-------|
|
||||
| `fuse-python` | `pyfuse3` or `llfuse` | Modern FUSE bindings |
|
||||
| `beets` (old API) | `beets >= 1.6` | Check API compatibility |
|
||||
| `mutagen` | `mutagen >= 1.45` | Mostly compatible |
|
||||
| Python 2.7 | Python 3.9+ | Full migration needed |
|
||||
|
||||
---
|
||||
|
||||
## Migration Steps
|
||||
|
||||
### Phase 1: Python 3 Compatibility
|
||||
|
||||
#### 1.1 Fix Syntax Issues
|
||||
|
||||
```python
|
||||
# BEFORE (Python 2)
|
||||
except fuse.FuseError, e:
|
||||
log.error(str(e))
|
||||
|
||||
# AFTER (Python 3)
|
||||
except fuse.FuseError as e:
|
||||
log.error(str(e))
|
||||
```
|
||||
|
||||
```python
|
||||
# BEFORE
|
||||
if isinstance(value, basestring):
|
||||
|
||||
# AFTER
|
||||
if isinstance(value, str):
|
||||
```
|
||||
|
||||
```python
|
||||
# BEFORE
|
||||
return reduce(lambda a, b: (a << 8) + ord(b), string, 0L)
|
||||
|
||||
# AFTER
|
||||
from functools import reduce
|
||||
return reduce(lambda a, b: (a << 8) + b, string, 0)
|
||||
```
|
||||
|
||||
```python
|
||||
# BEFORE
|
||||
mode = stat.S_IFDIR | 0755
|
||||
|
||||
# AFTER
|
||||
mode = stat.S_IFDIR | 0o755
|
||||
```
|
||||
|
||||
#### 1.2 Fix String/Bytes Handling
|
||||
|
||||
```python
|
||||
# BEFORE - implicit string/bytes mixing
|
||||
self.header = self.inf.get_header(self.real_path)
|
||||
return self.header[offset:offset+size]
|
||||
|
||||
# AFTER - explicit bytes handling
|
||||
self.header: bytes = self.inf.get_header(self.real_path)
|
||||
return self.header[offset:offset+size]
|
||||
```
|
||||
|
||||
```python
|
||||
# BEFORE
|
||||
self.item.title = str(self.inf["title"][0]).encode('utf-8')
|
||||
|
||||
# AFTER
|
||||
self.item.title = self.inf["title"][0] # Already str in Python 3
|
||||
```
|
||||
|
||||
#### 1.3 Fix Dictionary Methods
|
||||
|
||||
```python
|
||||
# BEFORE
|
||||
return node.dirs.keys()
|
||||
|
||||
# AFTER
|
||||
return list(node.dirs.keys()) # If list is needed
|
||||
# or just
|
||||
return node.dirs.keys() # If iteration is sufficient
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Phase 2: FUSE Library Migration
|
||||
|
||||
#### Option A: pyfuse3 (Recommended)
|
||||
|
||||
Modern, async-capable FUSE bindings.
|
||||
|
||||
```python
|
||||
# BEFORE (fuse-python)
|
||||
import fuse
|
||||
fuse.fuse_python_api = (0, 2)
|
||||
|
||||
class beetFileSystem(fuse.Fuse):
|
||||
def read(self, path, size, offset):
|
||||
return data
|
||||
|
||||
# AFTER (pyfuse3)
|
||||
import pyfuse3
|
||||
import trio
|
||||
|
||||
class BeetFS(pyfuse3.Operations):
|
||||
async def read(self, fh, offset, size):
|
||||
return data
|
||||
|
||||
async def main():
|
||||
fs = BeetFS()
|
||||
fuse_options = set(pyfuse3.default_options)
|
||||
fuse_options.add('fsname=beetfs')
|
||||
pyfuse3.init(fs, mountpoint, fuse_options)
|
||||
try:
|
||||
await pyfuse3.main()
|
||||
finally:
|
||||
pyfuse3.close()
|
||||
|
||||
trio.run(main)
|
||||
```
|
||||
|
||||
**Key Differences**:
|
||||
| fuse-python | pyfuse3 |
|
||||
|-------------|---------|
|
||||
| `read(path, size, offset)` | `read(fh, offset, size)` |
|
||||
| Synchronous | Async (trio) |
|
||||
| Return data directly | Return bytes |
|
||||
| Path-based | File handle based |
|
||||
|
||||
#### Option B: llfuse (Alternative)
|
||||
|
||||
Lower-level, synchronous.
|
||||
|
||||
```python
|
||||
import llfuse
|
||||
|
||||
class BeetFS(llfuse.Operations):
|
||||
def read(self, fh, offset, size):
|
||||
return data
|
||||
|
||||
def main():
|
||||
fs = BeetFS()
|
||||
llfuse.init(fs, mountpoint, options)
|
||||
try:
|
||||
llfuse.main()
|
||||
finally:
|
||||
llfuse.close()
|
||||
```
|
||||
|
||||
#### Option C: fusepy (Simple)
|
||||
|
||||
Simple wrapper, but less maintained.
|
||||
|
||||
```python
|
||||
from fuse import FUSE, Operations
|
||||
|
||||
class BeetFS(Operations):
|
||||
def read(self, path, size, offset, fh):
|
||||
return data
|
||||
|
||||
FUSE(BeetFS(), mountpoint, foreground=True)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Phase 3: Architecture Improvements
|
||||
|
||||
#### 3.1 Remove Global State
|
||||
|
||||
```python
|
||||
# BEFORE - Global variables
|
||||
global structure_split
|
||||
global structure_depth
|
||||
global library
|
||||
global directory_structure
|
||||
|
||||
# AFTER - Instance variables
|
||||
class BeetFS:
|
||||
def __init__(self, lib: Library, path_format: str):
|
||||
self.lib = lib
|
||||
self.path_format = path_format
|
||||
self.structure_split = path_format.split("/")
|
||||
self.structure_depth = len(self.structure_split)
|
||||
self.directory_structure = FSNode({}, {})
|
||||
self._build_tree()
|
||||
```
|
||||
|
||||
#### 3.2 Reduce Memory Usage
|
||||
|
||||
```python
|
||||
# BEFORE - Load entire audio into memory
|
||||
self.music_data = self.file_object.read() # Could be 100MB+
|
||||
|
||||
# AFTER - Lazy loading with mmap or seek
|
||||
class FileHandler:
|
||||
def __init__(self, path, lib):
|
||||
self.real_path = self._resolve_path(path)
|
||||
self.file_object = open(self.real_path, 'rb')
|
||||
self._header = None # Lazy load
|
||||
self._music_offset = None
|
||||
|
||||
@property
|
||||
def header(self) -> bytes:
|
||||
if self._header is None:
|
||||
self._header = self._generate_header()
|
||||
return self._header
|
||||
|
||||
def read(self, size: int, offset: int) -> bytes:
|
||||
if offset < len(self.header):
|
||||
# Header region - return from generated header
|
||||
if offset + size <= len(self.header):
|
||||
return self.header[offset:offset+size]
|
||||
else:
|
||||
# Span header and audio
|
||||
header_part = self.header[offset:]
|
||||
audio_offset = 0
|
||||
audio_size = size - len(header_part)
|
||||
audio_part = self._read_audio(audio_offset, audio_size)
|
||||
return header_part + audio_part
|
||||
else:
|
||||
# Audio region - read directly from file
|
||||
audio_offset = offset - len(self.header)
|
||||
return self._read_audio(audio_offset, size)
|
||||
|
||||
def _read_audio(self, offset: int, size: int) -> bytes:
|
||||
self.file_object.seek(self._music_offset + offset)
|
||||
return self.file_object.read(size)
|
||||
```
|
||||
|
||||
#### 3.3 Add Type Hints
|
||||
|
||||
```python
|
||||
from typing import Dict, List, Optional, Tuple
|
||||
from pathlib import Path
|
||||
|
||||
class FSNode:
|
||||
def __init__(self, dirs: Dict[str, 'FSNode'], files: Dict[str, int]):
|
||||
self.dirs: Dict[str, FSNode] = dirs
|
||||
self.files: Dict[str, int] = files
|
||||
|
||||
def getnode(self, elements: List[str], root: Optional['FSNode'] = None) -> 'FSNode':
|
||||
...
|
||||
|
||||
def addfile(self, elements: List[str], filename: str, item_id: int) -> None:
|
||||
...
|
||||
```
|
||||
|
||||
#### 3.4 Add MP3 Support
|
||||
|
||||
```python
|
||||
class FileHandler:
|
||||
def __init__(self, path: str, lib: Library):
|
||||
self.format = Path(path).suffix[1:].lower()
|
||||
|
||||
if self.format == "flac":
|
||||
self._handler = FLACHandler(self.real_path, self.item)
|
||||
elif self.format == "mp3":
|
||||
self._handler = MP3Handler(self.real_path, self.item)
|
||||
elif self.format in ("ogg", "opus"):
|
||||
self._handler = OggHandler(self.real_path, self.item)
|
||||
else:
|
||||
raise UnsupportedFormatError(f"Format {self.format} not supported")
|
||||
|
||||
class FLACHandler:
|
||||
def generate_header(self, item: Item) -> bytes:
|
||||
inf = InterpolatedFLAC(self.file_data)
|
||||
inf["title"] = item.title
|
||||
inf["album"] = item.album
|
||||
inf["artist"] = item.artist
|
||||
inf["genre"] = item.genre
|
||||
return inf.get_header()
|
||||
|
||||
class MP3Handler:
|
||||
def generate_header(self, item: Item) -> bytes:
|
||||
# Implement ID3v2 header generation
|
||||
id3 = InterpolatedID3()
|
||||
id3.add(TIT2(encoding=3, text=item.title))
|
||||
id3.add(TPE1(encoding=3, text=item.artist))
|
||||
id3.add(TALB(encoding=3, text=item.album))
|
||||
id3.add(TCON(encoding=3, text=item.genre))
|
||||
|
||||
# Calculate padding to match original header size
|
||||
...
|
||||
return id3.render()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Phase 4: Testing
|
||||
|
||||
#### 4.1 Unit Tests
|
||||
|
||||
```python
|
||||
import pytest
|
||||
from beetfs import FSNode, FileHandler
|
||||
|
||||
class TestFSNode:
|
||||
def test_adddir(self):
|
||||
root = FSNode({}, {})
|
||||
root.adddir([], "Artist")
|
||||
assert "Artist" in root.dirs
|
||||
|
||||
def test_addfile(self):
|
||||
root = FSNode({}, {})
|
||||
root.adddir([], "Artist")
|
||||
root.addfile(["Artist"], "track.flac", 42)
|
||||
assert root.dirs["Artist"].files["track.flac"] == 42
|
||||
|
||||
def test_getnode(self):
|
||||
root = FSNode({}, {})
|
||||
root.adddir([], "Artist")
|
||||
root.adddir(["Artist"], "Album")
|
||||
node = root.getnode(["Artist", "Album"])
|
||||
assert node is not None
|
||||
|
||||
class TestFileHandler:
|
||||
def test_read_header(self, mock_flac_file, mock_beets_item):
|
||||
handler = FileHandler("/Artist/Album/track.flac", mock_lib)
|
||||
data = handler.read(100, 0)
|
||||
assert data.startswith(b"fLaC")
|
||||
|
||||
def test_read_audio(self, mock_flac_file, mock_beets_item):
|
||||
handler = FileHandler("/Artist/Album/track.flac", mock_lib)
|
||||
data = handler.read(100, handler.bound + 100)
|
||||
# Should be audio data from original file
|
||||
assert data == mock_flac_file.audio_data[100:200]
|
||||
```
|
||||
|
||||
#### 4.2 Integration Tests
|
||||
|
||||
```python
|
||||
import subprocess
|
||||
import tempfile
|
||||
import os
|
||||
|
||||
class TestFUSEMount:
|
||||
def test_mount_unmount(self, beets_library):
|
||||
with tempfile.TemporaryDirectory() as mountpoint:
|
||||
# Mount
|
||||
proc = subprocess.Popen(
|
||||
["beet", "mount", mountpoint],
|
||||
stdout=subprocess.PIPE
|
||||
)
|
||||
time.sleep(1)
|
||||
|
||||
# Verify mount
|
||||
assert os.path.ismount(mountpoint)
|
||||
|
||||
# List files
|
||||
files = os.listdir(mountpoint)
|
||||
assert len(files) > 0
|
||||
|
||||
# Unmount
|
||||
subprocess.run(["fusermount", "-u", mountpoint])
|
||||
proc.wait()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Phase 5: Standalone Mode (Optional)
|
||||
|
||||
Remove beets dependency for use as standalone metadata overlay.
|
||||
|
||||
```python
|
||||
class StandaloneFS:
|
||||
"""Metadata overlay without beets dependency."""
|
||||
|
||||
def __init__(self,
|
||||
source_dir: Path,
|
||||
metadata_db: Path,
|
||||
path_format: str):
|
||||
self.source_dir = source_dir
|
||||
self.db = sqlite3.connect(metadata_db)
|
||||
self.path_format = path_format
|
||||
self._build_tree()
|
||||
|
||||
def _build_tree(self):
|
||||
"""Build virtual tree from source directory and metadata DB."""
|
||||
for audio_file in self.source_dir.rglob("*.flac"):
|
||||
# Get metadata from DB or scan file
|
||||
metadata = self._get_metadata(audio_file)
|
||||
# Build virtual path from template
|
||||
virtual_path = self._format_path(metadata)
|
||||
# Add to tree
|
||||
self.directory_structure.addfile(
|
||||
virtual_path.parent.parts,
|
||||
virtual_path.name,
|
||||
str(audio_file) # Store actual path instead of ID
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Recommended Migration Order
|
||||
|
||||
```
|
||||
1. [ ] Fork and set up development environment
|
||||
2. [ ] Add type hints throughout (helps catch issues)
|
||||
3. [ ] Fix Python 3 syntax issues
|
||||
4. [ ] Replace fuse-python with pyfuse3/llfuse
|
||||
5. [ ] Add unit tests for FSNode and FileHandler
|
||||
6. [ ] Refactor global state to instance variables
|
||||
7. [ ] Implement lazy loading for audio data
|
||||
8. [ ] Add MP3 support
|
||||
9. [ ] Add integration tests
|
||||
10. [ ] Optional: Create standalone mode
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Estimated Effort
|
||||
|
||||
| Phase | Effort | Risk |
|
||||
|-------|--------|------|
|
||||
| Phase 1 (Python 3) | 2-3 days | Low |
|
||||
| Phase 2 (FUSE migration) | 3-5 days | Medium |
|
||||
| Phase 3 (Architecture) | 3-5 days | Medium |
|
||||
| Phase 4 (Testing) | 2-3 days | Low |
|
||||
| Phase 5 (Standalone) | 3-5 days | Medium |
|
||||
| **Total** | **13-21 days** | |
|
||||
|
||||
---
|
||||
|
||||
## Alternative: Rewrite from Scratch
|
||||
|
||||
Given the age of the codebase, a rewrite might be more efficient:
|
||||
|
||||
**Pros of Rewrite**:
|
||||
- Clean architecture from start
|
||||
- Modern async design
|
||||
- Better memory management
|
||||
- Easier to test
|
||||
|
||||
**Cons of Rewrite**:
|
||||
- More initial effort
|
||||
- Risk of missing edge cases
|
||||
- Need to re-discover FLAC/ID3 intricacies
|
||||
|
||||
**Recommended Approach**: Start with Phase 1-2 to understand the code deeply, then decide whether to continue refactoring or rewrite.
|
||||
@@ -0,0 +1,451 @@
|
||||
# Rust Migration Analysis for beetfs
|
||||
|
||||
## Executive Summary
|
||||
|
||||
Migrating beetfs from Python to Rust is **strongly recommended** based on research findings. Expected improvements:
|
||||
|
||||
| Metric | Python (Current) | Rust (Expected) | Improvement |
|
||||
|--------|------------------|-----------------|-------------|
|
||||
| **Memory per file** | ~280 bytes overhead | ~60 bytes | **4-5x reduction** |
|
||||
| **File open latency** | 200-500ms | 20-50ms | **10x faster** |
|
||||
| **Read latency** | 5-10ms | 0.5-2ms | **5-10x faster** |
|
||||
| **Concurrent opens** | ~1,000 (threading) | ~100,000+ (Tokio) | **100x more** |
|
||||
| **GC pauses** | 50-2200ms | 0ms | **Eliminated** |
|
||||
|
||||
---
|
||||
|
||||
## 1. Rust FUSE Ecosystem
|
||||
|
||||
### Recommended: **fuser**
|
||||
|
||||
| Attribute | Value |
|
||||
|-----------|-------|
|
||||
| **Downloads** | 3.2M+ |
|
||||
| **Maturity** | Production-ready |
|
||||
| **Platforms** | Linux, macOS, FreeBSD |
|
||||
| **Async** | Experimental (stable sync API) |
|
||||
| **Used by** | AWS Mountpoint for S3 |
|
||||
|
||||
**API Example:**
|
||||
```rust
|
||||
use fuser::{Filesystem, Request, ReplyData};
|
||||
|
||||
impl Filesystem for BeetFS {
|
||||
fn read(&self, _req: &Request, ino: u64, _fh: u64,
|
||||
offset: i64, size: u32, _flags: i32,
|
||||
_lock: Option<u64>, reply: ReplyData) {
|
||||
|
||||
let file = self.get_file(ino);
|
||||
|
||||
if offset < file.header_len {
|
||||
// Return metadata from database (interpolated)
|
||||
reply.data(&file.header[offset as usize..]);
|
||||
} else {
|
||||
// Return audio from original file (zero-copy via mmap)
|
||||
let audio_offset = offset - file.header_len;
|
||||
reply.data(&file.mmap[audio_offset as usize..]);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Alternatives
|
||||
|
||||
| Library | Async | Maturity | Best For |
|
||||
|---------|-------|----------|----------|
|
||||
| **fuser** | Experimental | ⭐⭐⭐⭐⭐ | General purpose |
|
||||
| **fuse3** | Native | ⭐⭐⭐⭐ | Async-heavy, Linux-only |
|
||||
| **polyfuse** | Native | ⭐⭐⭐ | Custom control flow |
|
||||
|
||||
---
|
||||
|
||||
## 2. Rust Audio Metadata: **lofty**
|
||||
|
||||
Full feature parity with Python's mutagen:
|
||||
|
||||
| Feature | mutagen (Python) | lofty (Rust) |
|
||||
|---------|------------------|--------------|
|
||||
| FLAC Vorbis Comments | ✅ | ✅ |
|
||||
| MP3 ID3v2 (all versions) | ✅ | ✅ |
|
||||
| OGG Vorbis Comments | ✅ | ✅ |
|
||||
| Opus metadata | ✅ | ✅ |
|
||||
| In-memory manipulation | ✅ | ✅ |
|
||||
| Header generation | ✅ | ✅ `dump_to()` |
|
||||
| Picture/artwork | ✅ | ✅ |
|
||||
|
||||
**API Comparison:**
|
||||
```python
|
||||
# Python mutagen
|
||||
audio = mutagen.File("song.flac")
|
||||
audio['artist'] = 'New Artist'
|
||||
audio['title'] = 'New Title'
|
||||
audio.save()
|
||||
```
|
||||
|
||||
```rust
|
||||
// Rust lofty
|
||||
let mut file = lofty::read_from_path("song.flac")?;
|
||||
let tag = file.primary_tag_mut().unwrap();
|
||||
tag.set_artist("New Artist".to_string());
|
||||
tag.set_title("New Title".to_string());
|
||||
tag.save_to_path("song.flac", WriteOptions::default())?;
|
||||
```
|
||||
|
||||
**Header Generation (Critical for beetfs):**
|
||||
```rust
|
||||
// Generate FLAC header with modified tags WITHOUT writing to file
|
||||
let mut buffer = Vec::new();
|
||||
tag.dump_to(&mut buffer, WriteOptions::default())?;
|
||||
// `buffer` contains serialized metadata header
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Memory Benefits
|
||||
|
||||
### Python Object Overhead
|
||||
|
||||
| Python Type | Size | Notes |
|
||||
|-------------|------|-------|
|
||||
| Empty dict | 232 bytes | Base overhead |
|
||||
| Dict entry | +184 bytes | Per key-value |
|
||||
| Empty string | 49 bytes | Base overhead |
|
||||
| Empty list | 56 bytes | Base overhead |
|
||||
| Small int | 28 bytes | Even for `0` |
|
||||
|
||||
**Current beetfs FileHandler (Python):**
|
||||
```
|
||||
self.path → str → 49 + len(path) bytes
|
||||
self.real_path → str → 49 + len(path) bytes
|
||||
self.item → dict → 232 + entries
|
||||
self.header → bytes → 33 + len(header)
|
||||
self.music_data → bytes → 33 + len(audio) ← CRITICAL: full file!
|
||||
self.inf → object → 100+ bytes
|
||||
─────────────────────────────────────────
|
||||
TOTAL: ~500 bytes + entire file in RAM
|
||||
```
|
||||
|
||||
### Rust Struct Efficiency
|
||||
|
||||
```rust
|
||||
struct FileHandler {
|
||||
path: PathBuf, // 24 bytes (ptr+len+cap)
|
||||
real_path: PathBuf, // 24 bytes
|
||||
item_id: u64, // 8 bytes
|
||||
header: Vec<u8>, // 24 bytes (ptr+len+cap) + header data
|
||||
mmap: Mmap, // 24 bytes (NO file data in RAM!)
|
||||
header_len: u64, // 8 bytes
|
||||
audio_offset: u64, // 8 bytes
|
||||
}
|
||||
// TOTAL: ~120 bytes + header only (audio via mmap)
|
||||
```
|
||||
|
||||
### Memory Comparison
|
||||
|
||||
| Scenario | Python | Rust | Savings |
|
||||
|----------|--------|------|---------|
|
||||
| 1 file (50MB) | ~50 MB | ~64 KB | **780x** |
|
||||
| 10 files (50MB each) | ~500 MB | ~640 KB | **780x** |
|
||||
| 100 files (50MB each) | ~5 GB | ~6.4 MB | **780x** |
|
||||
| Library scan (1000 files) | **OOM** | ~64 MB | ∞ |
|
||||
|
||||
**Key insight**: Rust can use memory-mapped files (`mmap`) to serve audio data with zero copies, eliminating the need to load files into RAM.
|
||||
|
||||
---
|
||||
|
||||
## 4. Latency Benefits
|
||||
|
||||
### Python FUSE Bottlenecks
|
||||
|
||||
1. **Dict-to-struct conversion**: Every FUSE callback requires converting Python dicts to C structs
|
||||
2. **GIL contention**: Single-threaded execution despite multi-core CPUs
|
||||
3. **GC pauses**: Stop-the-world pauses of 50-2200ms under load
|
||||
4. **Object allocation**: Creating Python objects for every I/O operation
|
||||
|
||||
### Rust FUSE Advantages
|
||||
|
||||
1. **Zero-cost abstractions**: No runtime overhead for type conversions
|
||||
2. **No GIL**: True parallelism across all cores
|
||||
3. **No GC**: Deterministic memory management, no pauses
|
||||
4. **Stack allocation**: Small objects allocated on stack, not heap
|
||||
|
||||
### Benchmark Data
|
||||
|
||||
| Operation | Python FUSE | Rust FUSE | Improvement |
|
||||
|-----------|-------------|-----------|-------------|
|
||||
| File stat | 5-10ms | 0.5-1ms | **10x** |
|
||||
| Small read | 5-10ms | 0.5-2ms | **5-10x** |
|
||||
| Large read | 115 MB/s | 260+ MB/s | **2-3x** |
|
||||
| Metadata lookup | 10ms | <1ms | **10x** |
|
||||
|
||||
### GC Pause Elimination
|
||||
|
||||
```
|
||||
Python GC Pauses (measured):
|
||||
├── P50: ~10ms
|
||||
├── P95: ~50ms
|
||||
├── P99: ~320ms
|
||||
└── Max: ~2200ms (!)
|
||||
|
||||
Rust (no GC):
|
||||
├── P50: ~0.5ms
|
||||
├── P95: ~1ms
|
||||
├── P99: ~2ms
|
||||
└── Max: ~5ms (deterministic)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Concurrency Benefits
|
||||
|
||||
### Python Threading Limitations
|
||||
|
||||
```python
|
||||
# Python (current beetfs)
|
||||
server.multithreaded = 0 # Single-threaded!
|
||||
|
||||
# Even with threading enabled:
|
||||
# - GIL prevents true parallelism
|
||||
# - ~8MB per thread
|
||||
# - OS limits: ~1000-2000 threads max
|
||||
# - Context switch: 1-10μs (kernel)
|
||||
```
|
||||
|
||||
### Rust Async (Tokio)
|
||||
|
||||
```rust
|
||||
// Rust with Tokio
|
||||
#[tokio::main]
|
||||
async fn main() {
|
||||
// Can handle 100K+ concurrent operations
|
||||
// - ~2KB per task (4000x less than thread)
|
||||
// - Work-stealing scheduler
|
||||
// - Context switch: ~10ns (userspace)
|
||||
}
|
||||
```
|
||||
|
||||
| Metric | Python Threading | Rust Tokio |
|
||||
|--------|------------------|------------|
|
||||
| Memory per task | 8 MB | 2 KB |
|
||||
| Max concurrent | ~1,000 | ~100,000+ |
|
||||
| Context switch | 1-10μs | ~10ns |
|
||||
| Parallelism | Blocked by GIL | True multi-core |
|
||||
|
||||
---
|
||||
|
||||
## 6. Zero-Copy I/O
|
||||
|
||||
### Python (Current)
|
||||
|
||||
```python
|
||||
# Every read copies data through Python:
|
||||
self.file_object.read() # syscall → kernel buffer
|
||||
# kernel buffer → Python bytes object
|
||||
# Python bytes → FUSE reply buffer
|
||||
# = 2-3 copies per read
|
||||
```
|
||||
|
||||
### Rust (Proposed)
|
||||
|
||||
```rust
|
||||
// Memory-mapped file + zero-copy reply:
|
||||
let mmap = unsafe { MmapOptions::new().map(&file)? };
|
||||
|
||||
fn read(&self, ..., reply: ReplyData) {
|
||||
// Direct slice from mmap → FUSE kernel
|
||||
reply.data(&self.mmap[offset..offset+size]);
|
||||
// = 0 copies (kernel reads directly from mapped pages)
|
||||
}
|
||||
```
|
||||
|
||||
### I/O Comparison
|
||||
|
||||
| Scenario | Python | Rust | Benefit |
|
||||
|----------|--------|------|---------|
|
||||
| Serve 50MB file | 50MB copied to RAM | 0 bytes copied | **50MB saved** |
|
||||
| 100 concurrent reads | 5GB buffers | ~0 (shared mmap) | **5GB saved** |
|
||||
| Throughput | 115 MB/s | 260+ MB/s | **2.3x faster** |
|
||||
|
||||
---
|
||||
|
||||
## 7. Real-World Migration Results
|
||||
|
||||
### Case Studies
|
||||
|
||||
| Project | Metric | Python | Rust | Improvement |
|
||||
|---------|--------|--------|------|-------------|
|
||||
| API Service | Response time | 200ms | 8ms | **96% faster** |
|
||||
| Data Pipeline | Processing | 3 hours | 4.5 min | **40x faster** |
|
||||
| Web Backend | Memory | 1.2 GB | 180 MB | **85% less** |
|
||||
| Trajectory Lib | Compute | baseline | 10x faster | **10x** |
|
||||
|
||||
### AWS Mountpoint for S3
|
||||
|
||||
- Built on **fuser** (Rust FUSE)
|
||||
- Handles **terabits/sec** aggregate throughput
|
||||
- Production-ready since 2024
|
||||
- Validates Rust FUSE at scale
|
||||
|
||||
---
|
||||
|
||||
## 8. Migration Architecture
|
||||
|
||||
### Proposed Rust beetfs Structure
|
||||
|
||||
```
|
||||
beetfs-rs/
|
||||
├── Cargo.toml
|
||||
├── src/
|
||||
│ ├── main.rs # Entry point, mount logic
|
||||
│ ├── lib.rs # Library root
|
||||
│ ├── fs/
|
||||
│ │ ├── mod.rs # FUSE filesystem impl
|
||||
│ │ ├── tree.rs # Virtual directory tree (FSNode equivalent)
|
||||
│ │ ├── file.rs # File handler with mmap
|
||||
│ │ └── stat.rs # File attributes
|
||||
│ ├── metadata/
|
||||
│ │ ├── mod.rs # Metadata overlay logic
|
||||
│ │ ├── flac.rs # FLAC header generation (using lofty)
|
||||
│ │ ├── mp3.rs # MP3 ID3 header generation
|
||||
│ │ └── db.rs # Database interface (SQLite or custom)
|
||||
│ └── config.rs # Configuration (path templates, etc.)
|
||||
└── tests/
|
||||
├── fs_tests.rs
|
||||
└── metadata_tests.rs
|
||||
```
|
||||
|
||||
### Key Components
|
||||
|
||||
```rust
|
||||
// Virtual directory tree (equivalent to FSNode)
|
||||
pub struct VirtualTree {
|
||||
root: Arc<RwLock<DirNode>>,
|
||||
}
|
||||
|
||||
pub struct DirNode {
|
||||
dirs: HashMap<OsString, Arc<RwLock<DirNode>>>,
|
||||
files: HashMap<OsString, FileEntry>,
|
||||
}
|
||||
|
||||
pub struct FileEntry {
|
||||
inode: u64,
|
||||
real_path: PathBuf,
|
||||
metadata_id: i64, // Database reference
|
||||
}
|
||||
|
||||
// File handler with memory-mapped audio
|
||||
pub struct OpenFile {
|
||||
header: Vec<u8>, // Generated header with DB metadata
|
||||
header_len: usize,
|
||||
mmap: Mmap, // Memory-mapped original file
|
||||
audio_offset: usize, // Where audio starts in original
|
||||
}
|
||||
|
||||
impl OpenFile {
|
||||
pub fn read(&self, offset: usize, size: usize) -> &[u8] {
|
||||
if offset < self.header_len {
|
||||
// Return from generated header (DB metadata)
|
||||
&self.header[offset..min(offset + size, self.header_len)]
|
||||
} else {
|
||||
// Return from mmap (original audio, zero-copy)
|
||||
let audio_off = offset - self.header_len + self.audio_offset;
|
||||
&self.mmap[audio_off..audio_off + size]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 9. Migration Effort Estimate
|
||||
|
||||
### Timeline
|
||||
|
||||
| Phase | Duration | Deliverable |
|
||||
|-------|----------|-------------|
|
||||
| **1. Prototype** | 1-2 weeks | Basic FUSE mount, read-only |
|
||||
| **2. Core features** | 2-3 weeks | Metadata overlay, FLAC support |
|
||||
| **3. Full parity** | 2-3 weeks | MP3, write support, all fields |
|
||||
| **4. Testing** | 1-2 weeks | Unit tests, integration tests |
|
||||
| **5. Optimization** | 1-2 weeks | mmap, async, benchmarking |
|
||||
|
||||
**Total: 7-12 weeks**
|
||||
|
||||
### Skill Requirements
|
||||
|
||||
- Rust fundamentals (ownership, borrowing, lifetimes)
|
||||
- FUSE protocol knowledge (from Python experience)
|
||||
- Audio metadata formats (FLAC, ID3)
|
||||
- Async Rust (Tokio) - optional for Phase 5
|
||||
|
||||
---
|
||||
|
||||
## 10. Risk Assessment
|
||||
|
||||
### Low Risk ✅
|
||||
|
||||
| Factor | Why Low Risk |
|
||||
|--------|--------------|
|
||||
| FUSE library | fuser is production-proven (AWS) |
|
||||
| Metadata library | lofty has full mutagen parity |
|
||||
| Core algorithm | Same logic, different language |
|
||||
| File format support | FLAC/MP3/OGG all supported |
|
||||
|
||||
### Medium Risk ⚠️
|
||||
|
||||
| Factor | Mitigation |
|
||||
|--------|------------|
|
||||
| Learning curve | Existing Rust experience helps |
|
||||
| Edge cases | Port Python tests to Rust |
|
||||
| Async complexity | Start with sync API, add async later |
|
||||
|
||||
### Benefits vs Effort
|
||||
|
||||
```
|
||||
Current Python Issues:
|
||||
├── Memory: OOM on library scan → Fixed by mmap
|
||||
├── Latency: 200-500ms file open → Fixed by zero-copy
|
||||
├── GC pauses: 50-2200ms → Eliminated
|
||||
├── Concurrency: single-threaded → Fixed by async
|
||||
└── MP3 support: disabled → Implemented properly
|
||||
|
||||
Migration Effort: 7-12 weeks
|
||||
Expected Lifetime: 5+ years
|
||||
ROI: Highly positive
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 11. Recommendation
|
||||
|
||||
### ✅ **Proceed with Rust Migration**
|
||||
|
||||
**Justification:**
|
||||
1. **10x memory reduction** via mmap (eliminates OOM)
|
||||
2. **5-10x latency improvement** (eliminates blocking reads)
|
||||
3. **GC pauses eliminated** (deterministic performance)
|
||||
4. **100x concurrency** improvement (Tokio async)
|
||||
5. **Production-proven** ecosystem (fuser + lofty)
|
||||
6. **Reasonable effort** (7-12 weeks)
|
||||
|
||||
### Next Steps
|
||||
|
||||
1. **Set up Rust project** with fuser and lofty dependencies
|
||||
2. **Port FSNode** to Rust VirtualTree
|
||||
3. **Implement basic FUSE** operations (read, getattr, readdir)
|
||||
4. **Add metadata overlay** with lofty for FLAC
|
||||
5. **Add mmap** for zero-copy audio serving
|
||||
6. **Benchmark** against Python implementation
|
||||
7. **Add MP3/OGG** support
|
||||
8. **Add async** with Tokio (optional)
|
||||
|
||||
### Dependencies
|
||||
|
||||
```toml
|
||||
[dependencies]
|
||||
fuser = "0.17"
|
||||
lofty = "0.21"
|
||||
memmap2 = "0.9"
|
||||
tokio = { version = "1", features = ["full"], optional = true }
|
||||
rusqlite = "0.31" # For beets DB compatibility
|
||||
```
|
||||
Reference in New Issue
Block a user