Add reverse-engineered documentation

- README.md: Overview, core concept diagram, component summary
- architecture.md: System design, initialization flow, memory model
- components.md: Deep dive on all classes and functions
- data-flow.md: Complete read/write operation flows with diagrams
- analysis.md: Performance analysis (latency, memory footprint, I/O)
- drawbacks.md: 27 identified issues and limitations catalog
- modernization.md: Python 3 migration guide with effort estimates
Alexander
2026-05-12 11:52:48 +02:00
parent 39a9821a07
commit f0a83df190
7 changed files with 2557 additions and 0 deletions
+118
@@ -0,0 +1,118 @@
# beetfs - Reverse Engineered Documentation
> **Status**: Archived project (2010-2013), Python 2, fuse-python API
> **Fork**: git@github.com:LichHunter/beetfs.git
> **Original**: https://github.com/jbaiter/beetfs
## Overview
beetfs is a FUSE filesystem that presents audio files with **metadata from a database** while **passing through audio data unchanged** from original files. This enables transparent metadata modification without touching the underlying files.
### The Core Concept
```
┌─────────────────────────────────────────────────────────────────────┐
│ APPLICATION (VLC, Jellyfin, etc.) │
│ │
│ read("/mount/Artist/Album/track.flac") │
└─────────────────────────────────┬───────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ beetfs (FUSE Layer) │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ FileHandler │ │
│ │ ┌──────────────────────────────────────────────────────────┐ │ │
│ │ │ if offset < header_boundary: │ │ │
│ │ │ return MODIFIED_HEADER (from beets database) │ │ │
│ │ │ else: │ │ │
│ │ │ return ORIGINAL_AUDIO (from real file on disk) │ │ │
│ │ └──────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│ │
┌───────────┘ └───────────┐
▼ ▼
┌───────────────────┐ ┌───────────────────┐
│ Beets Database │ │ Original File │
│ (SQLite - tags) │ │ (untouched) │
│ │ │ │
│ title: "Fixed" │ │ [FLAC header] │
│ artist: "Corr" │ │ [Audio frames] │
│ album: "Right" │ │ │
└───────────────────┘ └───────────────────┘
```
## Key Features
| Feature | Description |
|---------|-------------|
| **Metadata Overlay** | Returns tags from database, not from file |
| **Audio Passthrough** | Original audio data served unchanged |
| **Write Interception** | Tag edits saved to database, not to file |
| **Virtual Organization** | Presents files in template-based directory structure |
| **Format Support** | FLAC (full), MP3 (passthrough only - metadata interpolation disabled) |
## File Structure
```
beetfs/
├── beetsplug/
│   ├── __init__.py         # Package initialization
│   └── beetFs.py           # ALL code (~1144 lines)
├── README.rst              # Original readme
└── COPYING                 # GPLv3 license
```
## Quick Architecture Summary
| Component | Lines | Purpose |
|-----------|-------|---------|
| `beetFs` (plugin) | 188-191 | Beets plugin hook |
| `mount()` | 119-183 | CLI entry point, builds virtual tree |
| `FSNode` | 390-436 | Virtual directory tree node |
| `FileHandler` | 439-565 | **CORE**: Metadata interpolation |
| `InterpolatedFLAC` | 274-388 | FLAC header generation |
| `InterpolatedID3` | 200-271 | ID3 tag generation (incomplete) |
| `beetFileSystem` | 622-1144 | FUSE operations implementation |
| `Stat` | 568-619 | File stat structure |
## Documentation Index
1. **[Architecture Overview](./architecture.md)** - System design and component interaction
2. **[Components Deep Dive](./components.md)** - Detailed component analysis
3. **[Data Flow](./data-flow.md)** - Read/write operation flows
4. **[Performance Analysis](./analysis.md)** - Latency, memory footprint, I/O patterns
5. **[Drawbacks & Limitations](./drawbacks.md)** - Known issues and missing features
6. **[Modernization Guide](./modernization.md)** - Notes for updating to Python 3
## Critical Issues Summary
| Issue | Severity | Impact |
|-------|----------|--------|
| Full file loaded into RAM | 🔴 Critical | OOM on large libraries |
| MP3 support disabled | 🔴 Critical | Only FLAC works |
| Python 2 only | 🔴 Critical | EOL, security risk |
| Single-threaded | 🟡 Major | Poor concurrency |
| 4 of 17 metadata fields | 🟡 Major | Limited functionality |
See [drawbacks.md](./drawbacks.md) for the complete list (27 identified issues).
## Dependencies (Original)
```
beets >= 1.0
fuse-python (Python 2 FUSE bindings)
mutagen (audio metadata library)
```
## Usage (Original)
```bash
# As beets plugin
beet mount /path/to/mountpoint
```
## License
GPLv3 - See COPYING file
+263
@@ -0,0 +1,263 @@
# beetfs Performance Analysis
## Executive Summary
beetfs has significant performance limitations due to its 2010-era design assumptions. The primary issues are **full file loading into RAM** and **blocking I/O on file open**.
---
## 1. Latency Analysis
### Operation Latencies
| Operation | Time Complexity | Typical Latency | Notes |
|-----------|-----------------|-----------------|-------|
| **File Open** | O(file_size) | 50ms - 1s+ | Reads entire file into memory |
| **File Read** | O(1) | <1ms | Pure memory slice |
| **File Write** | O(file_size) | 100ms - 2s+ | Reconstructs + DB write |
| **Directory List** | O(n) | <10ms | In-memory tree traversal |
| **getattr** | O(depth) | <1ms | Tree navigation + stat |
### File Open Breakdown
The file open operation is the critical bottleneck:
```
Time breakdown for opening 50MB FLAC file:
┌────────────────────────────────────────────────────────────┐
│ 1. open() syscall │ ~1ms │
│ 2. file_object.read() - load entire file │ ~100-200ms │
│ 3. InterpolatedFLAC() - parse FLAC │ ~20-50ms │
│ 4. Inject DB metadata │ ~1ms │
│ 5. get_header() - generate new header │ ~10-20ms │
│ 6. Seek to audio offset │ ~1ms │
│ 7. Read audio into music_data │ ~100-200ms │
├────────────────────────────────────────────────────────────┤
│ TOTAL │ ~230-470ms │
└────────────────────────────────────────────────────────────┘
```
**Code Evidence** (lines 461-483):
```python
# Step 2-5: Load and parse entire file
self.inf = InterpolatedFLAC(self.file_object.read()) # FULL FILE READ
self.inf["title"] = self.item.title
# ...
self.header = self.inf.get_header(self.real_path)
# Step 6-7: Cache all audio data
self.file_object.seek(self.music_offset)
self.music_data = self.file_object.read() # ANOTHER FULL READ
```
### Read Operation (Post-Open)
After the file is opened, reads are fast because both regions are served from memory:
```python
def read(self, size, offset):
    if offset < self.bound:
        return self.header[offset:offset+size]  # Memory slice: O(1)
    else:
        return self.music_data[offset - len(self.header):...]  # Memory slice: O(1)
```
### Write Operation
Writes to the header area trigger an expensive reconstruction:
```
Time breakdown for tag write:
┌────────────────────────────────────────────────────────────┐
│ 1. Reconstruct filedata in memory │ ~10-50ms │
│ 2. Parse as InterpolatedFLAC │ ~20-50ms │
│ 3. Extract tag values │ ~1ms │
│ 4. lib.store() + lib.save() (SQLite) │ ~10-50ms │
│ 5. Regenerate header │ ~10-20ms │
├────────────────────────────────────────────────────────────┤
│ TOTAL │ ~50-170ms │
└────────────────────────────────────────────────────────────┘
```
---
## 2. Memory Footprint
### Per-File Memory Usage
```
┌─────────────────────────────────────────────────────────────────────┐
│ FileHandler Memory Layout │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ self.music_data (bytes) │ │
│ │ Size: file_size - original_header_size │ │
│ │ Typical: 95-99% of file size │ │
│ │ Example: 48.5 MB for 50 MB file │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ self.header (bytes) │ │
│ │ Size: Generated FLAC header with DB metadata │ │
│ │ Typical: 4 KB - 64 KB (depends on metadata + padding) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ self.inf (InterpolatedFLAC) │ │
│ │ Size: Parsed metadata blocks + internal state │ │
│ │ Typical: 10 KB - 100 KB │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Other attributes │ │
│ │ path, real_path, item reference, format, etc. │ │
│ │ Typical: ~1 KB │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
├─────────────────────────────────────────────────────────────────────┤
│ TOTAL per file: ~1.0x - 1.1x original file size │
└─────────────────────────────────────────────────────────────────────┘
```
### Memory Scaling
| Scenario | Files Open | Avg File Size | RAM Usage |
|----------|------------|---------------|-----------|
| Single track playback | 1 | 30 MB | ~32 MB |
| Album playback (gapless) | 2-3 | 30 MB | ~65-100 MB |
| Album fully opened | 10 | 30 MB | ~320 MB |
| Jellyfin library scan | 50-100 | 30 MB | **1.6 - 3.2 GB** |
| Full library scan | 1000 | 30 MB | **32 GB** (OOM) |
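The figures in this table follow from simple arithmetic. A minimal sketch, assuming an overhead factor of 32/30 (taken from the "30 MB file, ~32 MB RAM" row) and a hypothetical helper name `estimate_ram_mb`:

```python
def estimate_ram_mb(open_files: int, avg_file_mb: float, overhead: float = 32 / 30) -> float:
    """Rough RAM estimate: every open file keeps about one full copy in memory."""
    return open_files * avg_file_mb * overhead


# Reproduce a few rows of the table (30 MB average file size).
for label, count in [("single track", 1), ("album", 10), ("scan", 100), ("full library", 1000)]:
    print(f"{label:>12}: ~{estimate_ram_mb(count, 30):,.0f} MB")
```

Because usage scales linearly with the number of open handlers, nothing bounds it except how many files a client opens at once.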
### Global Memory
```python
# Directory tree structure
directory_structure = FSNode({}, {})
# Memory: O(number_of_items)
# Typical: 1-10 MB for libraries with 10,000-100,000 tracks
# Open file handles
self.files = {} # Dict[str, FileHandler]
# Memory: Sum of all FileHandler instances
# Unbounded - grows with concurrent opens
```
---
## 3. I/O Patterns
### Current (Inefficient)
```
File Open:
Disk → [Read ALL] → RAM (music_data)
→ RAM (inf object)
→ RAM (header)
File Read:
RAM (header or music_data) → Application
Total I/O: 1x-2x file size on open, 0 on read
```
### Optimal (Not Implemented)
```
File Open:
Disk → [Read header only] → RAM (small)
File Read:
If header region:
RAM (header) → Application
If audio region:
Disk → [Seek + Read chunk] → Application
Total I/O: ~64KB on open, on-demand reads
```
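The on-demand pattern above could look roughly like the following Python 3 sketch. `LazyOverlayFile` is a hypothetical class, not part of beetfs, and it assumes the generated header and the audio offset of the real file are already known:

```python
import os
import tempfile


class LazyOverlayFile:
    """Keeps only the generated header in RAM; audio is read from disk on demand."""

    def __init__(self, real_path: str, header: bytes, music_offset: int):
        self.real_path = real_path
        self.header = header              # generated header (small)
        self.music_offset = music_offset  # where audio starts in the real file

    def read(self, size: int, offset: int) -> bytes:
        out = b""
        if offset < len(self.header):
            out = self.header[offset:offset + size]
            size -= len(out)
            offset += len(out)
        if size > 0:
            # Map the virtual offset onto the real file and read just this chunk.
            with open(self.real_path, "rb") as f:
                f.seek(self.music_offset + offset - len(self.header))
                out += f.read(size)
        return out


# Demo with a fake file: 8 bytes of "old header" followed by audio bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"OLDHDR__" + b"AUDIOAUDIO")
lazy = LazyOverlayFile(tmp.name, header=b"NEWHEADER!!!", music_offset=8)
assert lazy.read(4, 0) == b"NEWH"
assert lazy.read(6, 10) == b"!!AUDI"     # read spanning header and audio
assert lazy.read(5, 17) == b"AUDIO"
os.unlink(tmp.name)
```

Reopening the file on every read keeps the sketch short; a real implementation would more likely hold one file descriptor per handler.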
---
## 4. Concurrency
### Current Model
```python
server.multithreaded = 0 # Single-threaded
```
**Implications:**
- All FUSE operations serialized
- One slow file open blocks everything
- No benefit from multi-core CPUs
### Impact on Use Cases
| Use Case | Impact |
|----------|--------|
| Single player (VLC) | Acceptable - one file at a time |
| Media server scan | Severe - sequential processing |
| Multiple clients | Severe - requests queue up |
| Concurrent reads | Moderate - reads are fast once open |
---
## 5. Benchmarks (Theoretical)
Based on code analysis, not actual measurements:
### File Open Time vs Size
```
File Size Open Time (HDD) Open Time (SSD)
────────────────────────────────────────────────
10 MB 50-100 ms 20-50 ms
30 MB 150-300 ms 50-100 ms
50 MB 250-500 ms 100-200 ms
100 MB 500-1000 ms 200-400 ms
200 MB 1000-2000 ms 400-800 ms
```
### Memory vs Concurrent Opens
```
Open Files RAM Usage (30MB avg)
─────────────────────────────────────
1 ~32 MB
5 ~160 MB
10 ~320 MB
25 ~800 MB
50 ~1.6 GB
100 ~3.2 GB
```
---
## 6. Comparison with Alternatives
| Metric | beetfs | Direct File | NFS | FUSE passthrough |
|--------|--------|-------------|-----|------------------|
| Open latency | 200-500ms | <10ms | 10-50ms | <10ms |
| Read latency | <1ms | <1ms | 1-10ms | <1ms |
| Memory/file | ~1x size | ~0 | ~0 | ~0 |
| Metadata source | Database | File | File | File |
| Modify original | No | Yes | Yes | Yes |
---
## 7. Recommendations
### For Current Usage
1. **Limit concurrent opens** - Don't scan full library
2. **Use SSDs** - Reduces open latency by 2-3x
3. **Increase RAM** - Expect 1x file size per open
4. **Avoid large files** - 24-bit/192kHz FLACs are problematic
### For Modernization
1. **Implement lazy loading** - Read audio on demand
2. **Add file handle caching** - Keep headers, release audio
3. **Enable multi-threading** - Parallelize opens
4. **Add memory limits** - Evict old FileHandlers
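As an illustration of the memory-limit recommendation, here is a minimal sketch of a byte-bounded LRU cache. `HandlerCache` and its methods are hypothetical names; the handlers are plain strings in the demo:

```python
from collections import OrderedDict


class HandlerCache:
    """LRU cache of per-file handlers bounded by total cached bytes."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self._entries = OrderedDict()  # path -> (handler, size_in_bytes)

    def get(self, path):
        entry = self._entries.get(path)
        if entry is None:
            return None
        self._entries.move_to_end(path)  # mark as most recently used
        return entry[0]

    def put(self, path, handler, size: int):
        if path in self._entries:
            self.used -= self._entries.pop(path)[1]
        self._entries[path] = (handler, size)
        self.used += size
        # Evict least recently used entries until we fit, keeping the newest.
        while self.used > self.max_bytes and len(self._entries) > 1:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used -= evicted_size


cache = HandlerCache(max_bytes=64)
cache.put("/a.flac", "handler-a", 30)
cache.put("/b.flac", "handler-b", 30)
cache.get("/a.flac")                    # touch a, so b becomes the oldest
cache.put("/c.flac", "handler-c", 30)   # 90 > 64, evicts b
assert cache.get("/b.flac") is None
assert cache.get("/a.flac") == "handler-a"
assert cache.used == 60
```

A production version would also need to skip handlers that still have open file handles (a non-zero `instance_count`) when choosing what to evict.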
+276
@@ -0,0 +1,276 @@
# beetfs Architecture
## System Overview
beetfs implements a **metadata overlay filesystem** using FUSE. The key innovation is separating metadata storage (in beets SQLite database) from audio data storage (original files on disk).
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ USER SPACE │
│ ┌─────────────┐ ┌─────────────────────────────────────────────────────┐ │
│ │ Application │ │ beetfs │ │
│ │ (VLC, etc) │ │ ┌─────────────┐ ┌──────────────┐ ┌────────────┐ │ │
│ │ │◄───┼──┤beetFileSystem│──│ FileHandler │──│ Interpol. │ │ │
│ │ │ │ │ (FUSE) │ │ │ │ FLAC/ID3 │ │ │
│ └─────────────┘ │ └─────────────┘ └──────────────┘ └────────────┘ │ │
│ │ │ │ │ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌─────────────┐ ┌──────────────┐ ┌────────────┐ │ │
│ │ │ FSNode │ │ Beets │ │ Original │ │ │
│ │ │ (dir tree) │ │ Database │ │ Files │ │ │
│ │ └─────────────┘ └──────────────┘ └────────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────────────┤
│ KERNEL SPACE │
│ ┌───────────────┐ │
│ │ FUSE VFS │ │
│ └───────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Component Architecture
### 1. Plugin Layer
```python
class beetFs(BeetsPlugin):
    """Beets plugin hook - registers the 'mount' subcommand"""
    def commands(self):
        return [beetFs_command]

beetFs_command = Subcommand('mount', help='Mount a beets filesystem')
beetFs_command.func = mount
```
### 2. Initialization Flow
```
beet mount /mountpoint
┌───────────────────────────────────────────────────────────────┐
│ mount() function │
│ 1. Parse PATH_FORMAT template │
│ 2. Create FSNode root (directory_structure) │
│ 3. Iterate all items in beets library │
│ 4. For each item: │
│ - Build template substitution map │
│ - Add directories to FSNode tree │
│ - Add file entry (filename → item.id mapping) │
│ 5. Create beetFileSystem FUSE server │
│ 6. server.main() - enter FUSE event loop │
└───────────────────────────────────────────────────────────────┘
```
### 3. Virtual Directory Structure
The default path template:
```python
PATH_FORMAT = "$artist/$album ($year) [$format_upper]/$track - $artist - $title.$format"
```
Results in structure like:
```
/mountpoint/
├── Pink Floyd/
│   └── The Wall (1979) [FLAC]/
│       ├── 01 - Pink Floyd - In The Flesh?.flac
│       └── 02 - Pink Floyd - The Thin Ice.flac
└── Led Zeppelin/
    └── IV (1971) [FLAC]/
        └── 01 - Led Zeppelin - Black Dog.flac
```
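How a template turns into the three path components can be sketched with the standard library's `string.Template`. Whether beetfs itself expands `PATH_FORMAT` this way is an assumption; the mapping values below are illustrative:

```python
from string import Template

PATH_FORMAT = "$artist/$album ($year) [$format_upper]/$track - $artist - $title.$format"

mapping = {
    "artist": "Pink Floyd",
    "album": "The Wall",
    "year": 1979,
    "format": "flac",
    "format_upper": "FLAC",
    "track": "01",
    "title": "In The Flesh?",
}

# Expand each path component separately, mirroring the split on "/".
components = [Template(part).substitute(mapping) for part in PATH_FORMAT.split("/")]
assert components == [
    "Pink Floyd",
    "The Wall (1979) [FLAC]",
    "01 - Pink Floyd - In The Flesh?.flac",
]
```

The first two components become directories in the virtual tree and the last becomes the filename.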
### 4. FSNode Tree Structure
```python
class FSNode:
    dirs: Dict[str, FSNode]    # subdirectories
    files: Dict[str, int]      # filename → beets item ID

# Example tree:
FSNode(
    dirs={
        "Pink Floyd": FSNode(
            dirs={
                "The Wall (1979) [FLAC]": FSNode(
                    dirs={},
                    files={
                        "01 - Pink Floyd - In The Flesh?.flac": 42,
                        "02 - Pink Floyd - The Thin Ice.flac": 43
                    }
                )
            },
            files={}
        )
    },
    files={}
)
```
## Core Data Flow
### Read Operation
```
Application: read("/mount/Artist/Album/track.flac", offset=0, size=4096)
┌───────────────────────┐
│ beetFileSystem.read() │
│ Lines 1077-1106 │
└───────────┬───────────┘
┌───────────────┴───────────────┐
│ Get/Create FileHandler │
│ for this path │
└───────────────┬───────────────┘
┌───────────┴───────────┐
│ FileHandler.read() │
│ Lines 497-517 │
└───────────┬───────────┘
┌───────────────┴───────────────┐
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ offset < bound │ │ offset >= bound │
│ (in header area) │ │ (in audio area) │
└──────────┬──────────┘ └──────────┬──────────┘
│ │
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ Return modified │ │ Return original │
│ header from DB │ │ audio from file │
│ │ │ │
│ self.header[...] │ │ self.music_data[...]│
└─────────────────────┘ └─────────────────────┘
```
### Write Operation
```
Application: write("/mount/Artist/Album/track.flac", data, offset=100)
┌───────────────────────┐
│ beetFileSystem.write()│
│ Lines 1108-1135 │
└───────────┬───────────┘
┌───────────┴───────────┐
│ FileHandler.write() │
│ Lines 519-565 │
└───────────┬───────────┘
┌───────────────┴───────────────┐
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ offset < bound │ │ offset >= bound │
│ (in header area) │ │ (in audio area) │
└──────────┬──────────┘ └──────────┬──────────┘
│ │
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ 1. Patch header │ │ DISCARD │
│ 2. Parse new tags │ │ (audio writes │
│ 3. Extract values │ │ not allowed) │
│ 4. Update beets DB │ │ │
│ 5. Regenerate header│ │ │
└─────────────────────┘ └─────────────────────┘
```
## Memory Model
### FileHandler State
```python
class FileHandler:
    # Paths
    path: str               # Virtual path in FUSE mount
    real_path: str          # Actual file on disk
    # Beets integration
    item: Item              # Beets library item
    lib: Library            # Beets library reference
    # File data
    file_object: File       # File handle (closed after init)
    music_data: bytes       # Audio data cached in memory
    # Metadata
    format: str             # "flac" or "mp3"
    inf: FLAC/ID3           # Interpolated metadata object
    header: bytes           # Generated header with DB metadata
    bound: int              # Byte offset where header ends
    music_offset: int       # Byte offset where audio starts in original
    # Reference counting
    instance_count: int     # Number of open handles
```
### Memory Layout
```
Virtual File (as seen by application):
┌────────────────────────────────────────────────────────────────┐
│ HEADER (from DB) │ AUDIO (from file) │
│ [0 ... bound) │ [bound ... EOF) │
│ │ │
│ Generated by InterpolatedFLAC │ Cached in music_data │
│ Contains: title, artist, album, │ Original audio frames │
│ genre from beets DB │ Unchanged │
└────────────────────────────────────────────────────────────────┘
▲ ▲
│ │
self.header self.music_data
Original File (on disk):
┌────────────────────────────────────────────────────────────────┐
│ ORIGINAL HEADER │ AUDIO DATA │
│ [0 ... music_offset) │ [music_offset ... EOF) │
│ │ │
│ May have different │ Same as virtual file │
│ tag values │ │
└────────────────────────────────────────────────────────────────┘
```
## Threading Model
```python
server.multithreaded = 0 # Single-threaded mode
```
beetfs runs in **single-threaded mode** to avoid concurrency issues with:
- Shared `files` dictionary
- Beets library access
- File handle reference counting
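If multi-threading were enabled, the shared dictionary and the reference counts would need a lock. A minimal sketch of that idea, with hypothetical names (`HandleRegistry`, `factory`) and a plain dict standing in for a real handler:

```python
import threading


class HandleRegistry:
    """Reference-counted map of open paths, guarded by a single lock."""

    def __init__(self, factory):
        self._factory = factory      # callable: path -> handler object
        self._lock = threading.Lock()
        self._files = {}             # path -> [handler, refcount]

    def open(self, path):
        with self._lock:
            entry = self._files.get(path)
            if entry is None:
                entry = self._files[path] = [self._factory(path), 0]
            entry[1] += 1
            return entry[0]

    def release(self, path):
        with self._lock:
            entry = self._files[path]
            entry[1] -= 1
            if entry[1] == 0:
                del self._files[path]   # last handle closed: drop cached data

    def open_count(self):
        with self._lock:
            return len(self._files)


registry = HandleRegistry(factory=lambda path: {"path": path})
threads = [threading.Thread(target=registry.open, args=("/a.flac",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
for _ in range(8):
    registry.release("/a.flac")
assert registry.open_count() == 0
```

Calling the factory while holding the lock still serializes slow opens; a finer-grained design would use a per-path lock, which this sketch leaves out.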
## Global State
```python
# Module-level globals (set during mount)
structure_split: List[str] # PATH_FORMAT split by "/"
structure_depth: int # Number of path components
library: Library # Beets library instance
directory_structure: FSNode # Root of virtual directory tree
```
## Error Handling
| Situation | Response |
|-----------|----------|
| File not found | Return `-errno.ENOENT` |
| Permission denied | Return `-errno.EACCES` |
| Operation not supported | Return `-errno.EOPNOTSUPP` |
| Parse error | Log and return `-errno.ENOENT` |
## Limitations
1. **Format Support**: Only FLAC fully implemented; MP3 support is incomplete
2. **Memory Usage**: Entire audio portion cached in memory per open file
3. **Single-threaded**: No concurrent access optimization
4. **No Streaming**: Full file must be read into memory
5. **Python 2**: Uses deprecated language features
6. **fuse-python**: Old FUSE bindings, not maintained
+550
@@ -0,0 +1,550 @@
# beetfs Components Deep Dive
## Component Overview
```
┌─────────────────────────────────────────────────────────────────────────┐
│ beetFs.py │
│ ┌─────────────────────────────────────────────────────────────────────┐│
│ │ PLUGIN LAYER ││
│ │ beetFs (BeetsPlugin) beetFs_command (Subcommand) ││
│ │ mount() template_mapping() ││
│ └─────────────────────────────────────────────────────────────────────┘│
│ ┌─────────────────────────────────────────────────────────────────────┐│
│ │ VIRTUAL FILESYSTEM ││
│ │ FSNode beetFileSystem (fuse.Fuse) ││
│ │ Stat ││
│ └─────────────────────────────────────────────────────────────────────┘│
│ ┌─────────────────────────────────────────────────────────────────────┐│
│ │ METADATA INTERPOLATION ││
│ │ FileHandler InterpolatedFLAC ││
│ │ InterpolatedID3 ││
│ └─────────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────┘
```
---
## 1. Plugin Layer
### 1.1 beetFs (BeetsPlugin)
**Location**: Lines 188-191
```python
class beetFs(BeetsPlugin):
    """ The beets plugin hook."""
    def commands(self):
        return [beetFs_command]
```
**Purpose**: Registers beetfs as a beets plugin, exposing the `mount` subcommand.
### 1.2 beetFs_command
**Location**: Lines 47, 185
```python
beetFs_command = Subcommand('mount', help='Mount a beets filesystem')
beetFs_command.func = mount
```
**Purpose**: CLI subcommand definition for `beet mount`.
### 1.3 mount() Function
**Location**: Lines 119-183
```python
def mount(lib, config, opts, args):
    # 1. Validate arguments
    if not args:
        raise beets.ui.UserError('no mountpoint specified')
    # 2. Parse path template
    global structure_split
    structure_split = PATH_FORMAT.split("/")
    global structure_depth
    structure_depth = len(structure_split)
    # 3. Store library reference
    global library
    library = lib
    # 4. Build virtual directory tree
    global directory_structure
    directory_structure = FSNode({}, {})
    # 5. Iterate all library items
    for item in lib.items():
        mapping = template_mapping(lib, item)
        # ... build tree ...
        directory_structure.addfile(sub_elements, filename, item.id)
    # 6. Create and run FUSE server
    server = beetFileSystem(...)
    server.main()
```
**Key Variables Set**:
| Variable | Type | Purpose |
|----------|------|---------|
| `structure_split` | `List[str]` | Path template components |
| `structure_depth` | `int` | Number of path levels |
| `library` | `Library` | Beets library reference |
| `directory_structure` | `FSNode` | Root of virtual tree |
### 1.4 template_mapping() Function
**Location**: Lines 82-116
```python
def template_mapping(lib, item):
    """Builds a template substitution map from beets item."""
    mapping = {}
    for key in METADATA_KEYS:
        value = getattr(item, key)
        # Sanitize value for filesystem paths
        if isinstance(value, basestring):
            value = re.sub(r'[\\/:]|^\.', '_', value)
        elif key in ('track', 'tracktotal', 'disc', 'disctotal'):
            value = '%02i' % value  # Zero-pad numbers
        mapping[key] = value
    # Add format info
    format_ = os.path.splitext(item.path)[1][1:]
    mapping['format'] = format_
    mapping['format_upper'] = format_.upper()
    # Default values for missing fields
    if mapping['artist'] == '':
        mapping['artist'] = 'Unknown Artist'
    # ... etc
    return mapping
```
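The effect of the sanitizing regular expression and the zero-padding format is easy to check in isolation. A small Python 3 sketch; `sanitize` is a hypothetical wrapper around the same `re.sub` call:

```python
import re


def sanitize(value: str) -> str:
    """Replace path separators, colons and a leading dot with underscores."""
    return re.sub(r'[\\/:]|^\.', '_', value)


assert sanitize("AC/DC") == "AC_DC"
assert sanitize(".hidden") == "_hidden"
assert sanitize("Live: 1979") == "Live_ 1979"
assert sanitize("a.b") == "a.b"   # only a leading dot is replaced
assert "%02i" % 6 == "06"         # zero-padding used for track and disc numbers
```

Replacing `/` matters most here, since an unsanitized slash in a tag would otherwise create an extra directory level in the virtual tree.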
**Template Variables Available**:
| Variable | Source | Example |
|----------|--------|---------|
| `$artist` | `item.artist` | "Pink Floyd" |
| `$album` | `item.album` | "The Wall" |
| `$title` | `item.title` | "Comfortably Numb" |
| `$year` | `item.year` | "1979" |
| `$track` | `item.track` | "06" |
| `$format` | file extension | "flac" |
| `$format_upper` | file extension | "FLAC" |
---
## 2. Virtual Filesystem Layer
### 2.1 FSNode Class
**Location**: Lines 390-436
```python
class FSNode(object):
    """A directory node in the virtual filesystem tree."""
    def __init__(self, dirs, files):
        self.dirs = dirs    # Dict[str, FSNode] - subdirectories
        self.files = files  # Dict[str, int] - filename → beets item ID
```
**Methods**:
| Method | Purpose | Signature |
|--------|---------|-----------|
| `getnode()` | Navigate to nested node | `getnode(elements, root=None) → FSNode` |
| `adddir()` | Add a directory | `adddir(elements, directory, root=None)` |
| `addfile()` | Add a file entry | `addfile(elements, filename, id, root=None)` |
| `listdir()` | List contents | `listdir(elements, directories, root=None) → List[str]` |
**Example Tree Navigation**:
```python
# Path: /Artist/Album/track.flac
# structure_split = ["$artist", "$album ($year) [$format_upper]", "$track - $artist - $title.$format"]
elements = ["Artist", "Album (2020) [FLAC]"]
node = directory_structure.getnode(elements)
# node.files = {"01 - Artist - Track.flac": 42, ...}
item_id = node.files["01 - Artist - Track.flac"]
# item_id = 42
```
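For readers who want to experiment with the lookup, the following is a simplified stand-in for the tree. `Node` is hypothetical: it drops the `root` parameter, and its `addfile` creates missing directories itself, which the real `FSNode` may instead leave to `adddir()`:

```python
class Node:
    """Simplified stand-in for FSNode: nested dicts of directories and files."""

    def __init__(self):
        self.dirs = {}    # name -> Node
        self.files = {}   # filename -> beets item ID

    def getnode(self, elements):
        node = self
        for name in elements:
            node = node.dirs[name]   # KeyError if the directory is missing
        return node

    def addfile(self, elements, filename, item_id):
        node = self
        for name in elements:
            node = node.dirs.setdefault(name, Node())  # create dirs as needed
        node.files[filename] = item_id


root = Node()
root.addfile(["Artist", "Album (2020) [FLAC]"], "01 - Artist - Track.flac", 42)

# Resolve a virtual path: leading directories first, then the filename.
parts = "/Artist/Album (2020) [FLAC]/01 - Artist - Track.flac"[1:].split("/")
assert root.getnode(parts[:-1]).files[parts[-1]] == 42
```

Lookup cost is proportional to path depth, which matches the O(depth) figure quoted for `getattr`.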
### 2.2 Stat Class
**Location**: Lines 568-619
```python
class Stat(fuse.Stat):
    DIRSIZE = 4096
    def __init__(self, st_mode, st_size, st_nlink=1, st_uid=None, st_gid=None,
                 dt_atime=None, dt_mtime=None, dt_ctime=None):
        self.st_mode = st_mode
        self.st_ino = 0
        self.st_dev = 0
        self.st_nlink = st_nlink
        self.st_uid = st_uid or os.getuid()
        self.st_gid = st_gid or os.getgid()
        self.st_size = st_size
        # ... timestamps ...
```
**Purpose**: Represents file/directory metadata for FUSE stat operations.
### 2.3 beetFileSystem Class
**Location**: Lines 622-1144
```python
class beetFileSystem(fuse.Fuse):
    """Main FUSE filesystem implementation."""
    def __init__(self, *args, **kwargs):
        logging.basicConfig(filename="LOG", level=logging.INFO)
        super(beetFileSystem, self).__init__(*args, **kwargs)

    def fsinit(self):
        """Called after filesystem is mounted."""
        self.lib = library
        self.files = {}  # Dict[path, FileHandler]
```
**FUSE Operations Implemented**:
| Operation | Lines | Purpose |
|-----------|-------|---------|
| `fsinit()` | 630-636 | Post-mount initialization |
| `fsdestroy()` | 638-639 | Pre-unmount cleanup |
| `statfs()` | 641-646 | Filesystem statistics |
| `getattr()` | 648-707 | Get file/dir attributes |
| `access()` | 723-756 | Check permissions |
| `readdir()` | 931-975 | List directory contents |
| `open()` | 988-1021 | Open file |
| `read()` | 1077-1106 | Read file data |
| `write()` | 1108-1135 | Write file data |
| `release()` | 1049-1059 | Close file |
**Not Implemented (return EOPNOTSUPP)**:
- `mknod()`, `mkdir()`, `unlink()`, `rmdir()`
- `symlink()`, `link()`, `rename()`
- `chmod()`, `chown()`, `truncate()`
---
## 3. Metadata Interpolation Layer
### 3.1 FileHandler Class
**Location**: Lines 439-565
This is the **core component** that implements metadata overlay.
```python
class FileHandler(object):
    def __init__(self, path, lib):
        self.path = path  # Virtual path
        self.lib = lib    # Beets library
        # Resolve virtual path to real file
        pathsplit = path[1:].split('/')
        self.item = self.lib.get_item(id=directory_structure
            .getnode(pathsplit[0:structure_depth-1])
            .files[pathsplit[structure_depth-1]])
        self.real_path = self.item.path
        # Open real file
        self.file_object = open(self.real_path, 'r+')
        self.instance_count = 1
        # Determine format
        self.format = os.path.splitext(path)[1][1:].lower()
        if self.format == "flac":
            # Load file into interpolated FLAC object
            self.inf = InterpolatedFLAC(self.file_object.read())
            # INJECT DATABASE METADATA
            self.inf["title"] = self.item.title
            self.inf["album"] = self.item.album
            self.inf["artist"] = self.item.artist
            self.inf["genre"] = self.item.genre
            # Generate new header with DB metadata
            self.header = self.inf.get_header(self.real_path)
            self.bound = len(self.header)
            self.music_offset = self.inf.offset()
        elif self.format == "mp3":
            self.bound = 0  # MP3 interpolation disabled
            self.music_offset = 0
        # Cache audio data
        self.file_object.seek(self.music_offset)
        self.music_data = self.file_object.read()
        self.file_object.close()
```
**Key Attributes**:
| Attribute | Type | Purpose |
|-----------|------|---------|
| `path` | `str` | Virtual path (e.g., `/Artist/Album/track.flac`) |
| `real_path` | `str` | Actual file path on disk |
| `item` | `Item` | Beets library item (has DB metadata) |
| `format` | `str` | File format ("flac", "mp3") |
| `inf` | `InterpolatedFLAC` | Mutagen object with injected metadata |
| `header` | `bytes` | Generated header with DB tags |
| `bound` | `int` | Byte offset where header ends |
| `music_offset` | `int` | Byte offset in original file where audio starts |
| `music_data` | `bytes` | Cached audio data |
| `instance_count` | `int` | Reference count for file handles |
### 3.2 FileHandler.read() Method
**Location**: Lines 497-517
```python
def read(self, size, offset):
    # Case 1: Reading within header boundary
    if offset < self.bound:
        if offset + size < len(self.header):
            # Entire read is within header
            return self.header[offset:offset+size]
        else:
            # Read spans header and audio
            ret = self.header[offset:len(self.header)]
            ret = ret + self.music_data[0:size - (len(self.header) - offset)]
            return ret
    # Case 2: Reading audio data only
    return self.music_data[offset - len(self.header):offset - len(self.header) + size]
```
**Read Logic Diagram**:
```
Virtual File Layout:
┌────────────────────────────────────────────────────────────────┐
│ 0 bound EOF │
│ ├─────────┼────────────────────────────────────────────────┤ │
│ │ HEADER │ AUDIO DATA │ │
│ │ (from │ (from self.music_data) │ │
│ │ self. │ │ │
│ │ header) │ │ │
│ └─────────┴────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
Read scenarios:
1. offset=0, size=100, bound=500 → Return header[0:100]
2. offset=400, size=200, bound=500 → Return header[400:500] + music[0:100]
3. offset=600, size=100, bound=500 → Return music[100:200]
```
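The three read scenarios can be verified with a standalone sketch. It is written for Python 3 `bytes` rather than the original Python 2 strings, and `overlay_read` is a hypothetical free function, not the method from beetfs:

```python
def overlay_read(header: bytes, music_data: bytes, offset: int, size: int) -> bytes:
    """Serve a read from the virtual file formed by header + music_data."""
    bound = len(header)
    if offset < bound:
        # Take what the header can supply, then top up from the audio cache.
        chunk = header[offset:offset + size]
        remaining = size - len(chunk)
        return chunk + music_data[:remaining]
    # Pure audio read: translate the virtual offset into the audio cache.
    start = offset - bound
    return music_data[start:start + size]


header = bytes(500)                     # stand-in for the generated header
music = bytes(range(256)) * 4           # stand-in for cached audio (1024 bytes)

assert overlay_read(header, music, 0, 100) == header[0:100]
assert overlay_read(header, music, 400, 200) == header[400:500] + music[0:100]
assert overlay_read(header, music, 600, 100) == music[100:200]
```

The only state a read needs is the header length, which is why reads stay cheap once the handler exists.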
### 3.3 FileHandler.write() Method
**Location**: Lines 519-565
```python
def write(self, offset, buf):
    # Only handle writes to header area
    if offset < self.bound:
        # Reconstruct full file in memory
        filedata = self.header + self.music_data
        # Patch in new data
        filedata = filedata[0:offset] + buf + filedata[offset + len(buf):]
        if self.format == "flac":
            # Parse the patched data
            self.inf = InterpolatedFLAC(filedata)
            # EXTRACT new tag values and save to DB
            self.item.title = str(self.inf["title"][0]).encode('utf-8')
            self.item.album = str(self.inf["album"][0]).encode('utf-8')
            self.item.artist = str(self.inf["artist"][0]).encode('utf-8')
            self.item.genre = str(self.inf["genre"][0]).encode('utf-8')
            # Persist to beets database
            self.lib.store(self.item)
            self.lib.save()
            # Regenerate header with updated values
            self.inf["title"] = self.item.title
            self.inf["album"] = self.item.album
            self.inf["artist"] = self.item.artist
            self.inf["genre"] = self.item.genre
            self.header = self.inf.get_header(self.real_path)
            self.bound = len(self.header)
    return len(buf)
```
**Write Flow**:
```
1. App writes new tag data to header region
2. Patch header + music_data with new bytes
3. Parse patched data as FLAC
4. Extract tag values from parsed FLAC
5. Update beets Item with new values
6. lib.store(item) + lib.save() → SQLite
7. Regenerate header for subsequent reads
```
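Step 2 of the flow is a plain byte splice. A minimal Python 3 sketch of it, where `patch` is a hypothetical helper and the byte strings are made-up placeholders rather than real FLAC data:

```python
def patch(filedata: bytes, offset: int, buf: bytes) -> bytes:
    """Overwrite len(buf) bytes of filedata starting at offset."""
    return filedata[:offset] + buf + filedata[offset + len(buf):]


original = b"fLaC....title=Old Name....AUDIO"
patched = patch(original, 14, b"New")
assert patched == b"fLaC....title=New Name....AUDIO"
assert len(patched) == len(original)   # in-range overwrite keeps the length
```

Because `bytes` are immutable, each splice copies the whole buffer, so the cost grows with file size even when only a few header bytes change.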
### 3.4 InterpolatedFLAC Class
**Location**: Lines 274-388
```python
class InterpolatedFLAC(FLAC):
    """Custom FLAC handler that can load from bytes and generate headers."""
    def load(self, filedata):
        """Load FLAC from byte string instead of file."""
        self.metadata_blocks = []
        self.tags = None
        self.filedata = filedata
        self.fileobj = BytesIO(filedata)
        self.__check_header(self.fileobj)
        while self.__read_metadata_block(self.fileobj):
            pass
        # Verify audio frame starts correctly
        if self.fileobj.read(2) not in ["\xff\xf8", "\xff\xf9"]:
            raise FLACNoHeaderError("End of metadata did not start audio")

    def get_header(self, filename=None):
        """Generate FLAC header with current metadata."""
        # Add padding block
        self.metadata_blocks.append(Padding('\x00' * 1020))
        MetadataBlock.group_padding(self.metadata_blocks)
        # Calculate available space
        header = self.__check_header(self.fileobj)
        available = self.__find_audio_offset(self.fileobj) - header
        data = MetadataBlock.writeblocks(self.metadata_blocks)
        # Adjust padding to match available space
        if len(data) > available:
            # Reduce padding
            padding = self.metadata_blocks[-1]
            padding.length -= (len(data) - available)
            data = MetadataBlock.writeblocks(self.metadata_blocks)
        elif len(data) < available:
            # Increase padding
            self.metadata_blocks[-1].length += (available - len(data))
            data = MetadataBlock.writeblocks(self.metadata_blocks)
        self.__offset = len("fLaC" + data)
        return "fLaC" + data

    def offset(self):
        """Return byte offset where audio data starts."""
        return self.__offset
```
**FLAC Structure**:
```
┌──────────────────────────────────────────────────────────────────┐
│ "fLaC" │ STREAMINFO │ VORBIS_COMMENT │ ... │ PADDING │ AUDIO... │
│ (4B) │ block │ block │ │ block │ │
└──────────────────────────────────────────────────────────────────┘
│◄──────── metadata_blocks ─────────►│
│ │
└──── get_header() returns this ─────┘
```
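The layout above is enough to locate the start of the audio without a tag library. Each metadata block begins with a 4-byte header: one "last block" flag bit, a 7-bit block type, and a 24-bit big-endian length. The sketch below walks those headers over a synthetic stream; `find_audio_offset` is a hypothetical standalone function, not the private mutagen method the original code calls:

```python
def find_audio_offset(data: bytes) -> int:
    """Return the byte offset of the first audio frame in a FLAC stream."""
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream")
    pos = 4
    while True:
        block_header = data[pos:pos + 4]
        if len(block_header) < 4:
            raise ValueError("truncated metadata block header")
        is_last = bool(block_header[0] & 0x80)             # top bit: last-block flag
        length = int.from_bytes(block_header[1:4], "big")  # 24-bit block length
        pos += 4 + length
        if is_last:
            return pos


# Synthetic stream: STREAMINFO (type 0, 34 bytes) + final PADDING (type 1, 4 bytes).
streaminfo = bytes([0x00]) + (34).to_bytes(3, "big") + bytes(34)
padding = bytes([0x80 | 0x01]) + (4).to_bytes(3, "big") + bytes(4)
stream = b"fLaC" + streaminfo + padding + b"\xff\xf8audio"
offset = find_audio_offset(stream)
assert offset == 50
assert stream[offset:offset + 2] == b"\xff\xf8"
```

Only the block headers are inspected, so a lazy implementation could find the audio offset by reading a few kilobytes instead of the whole file.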
### 3.5 InterpolatedID3 Class
**Location**: Lines 200-271
```python
class InterpolatedID3(ID3):
"""Custom ID3 handler for MP3 files."""
    def save(self, filename=None, v1=0):
        """Save ID3 tags to file."""
        # Sort frames by importance
        order = ["TIT2", "TPE1", "TRCK", "TALB", "TPOS", "TDRC", "TCON"]
        # ... write header ...
```
**Note**: MP3 support is **incomplete** in the current implementation. The `FileHandler.__init__` sets `self.bound = 0` for MP3, effectively disabling interpolation.
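For MP3, the equivalent of the FLAC header boundary would be the end of the leading ID3v2 tag. The sketch below (Python 3) shows one way to compute that boundary from the 10-byte ID3v2 header. It is an assumption about how interpolation could be enabled, not code from beetfs; `syncsafe_to_int` and `id3v2_tag_size` are hypothetical names.

```python
def syncsafe_to_int(raw: bytes) -> int:
    """Decode a 4-byte syncsafe integer (7 significant bits per byte)."""
    if len(raw) != 4 or any(byte & 0x80 for byte in raw):
        raise ValueError("not a syncsafe integer")
    return (raw[0] << 21) | (raw[1] << 14) | (raw[2] << 7) | raw[3]


def id3v2_tag_size(data: bytes) -> int:
    """Return the total size of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    flags = data[5]
    body = syncsafe_to_int(data[6:10])
    footer = 10 if flags & 0x10 else 0   # ID3v2.4 footer-present flag
    return 10 + body + footer            # 10-byte header + body (+ footer)


# Hand-built header: version 2.4, no flags, body size 257 bytes.
sample = b"ID3\x04\x00\x00" + b"\x00\x00\x02\x01" + b"\x00" * 257
print(id3v2_tag_size(sample))
```

With such a boundary, `self.bound` and `self.music_offset` could be set to the tag size instead of 0, provided a regenerated tag is padded to the same length.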
---
## 4. Supported Metadata Fields
**Location**: Lines 55-77
```python
METADATA_RW_FIELDS = [
('title', 'text'),
('artist', 'text'),
('album', 'text'),
('genre', 'text'),
('composer', 'text'),
('grouping', 'text'),
('year', 'int'),
('month', 'int'),
('day', 'int'),
('track', 'int'),
('tracktotal', 'int'),
('disc', 'int'),
('disctotal', 'int'),
('lyrics', 'text'),
('comments', 'text'),
('bpm', 'int'),
('comp', 'bool'),
]
```
**Actually Implemented** (in FileHandler):
| Field | Read | Write |
|-------|------|-------|
| `title` | ✅ | ✅ |
| `artist` | ✅ | ✅ |
| `album` | ✅ | ✅ |
| `genre` | ✅ | ✅ |
| Others | ❌ | ❌ |
---
## 5. Error Handling
**Error Codes Used**:
| Code | Constant | Usage |
|------|----------|-------|
| 2 | `ENOENT` | File/directory not found |
| 13 | `EACCES` | Permission denied |
| 1 | `EPERM` | Operation not permitted |
| 95 | `EOPNOTSUPP` | Operation not supported |
**Exception Handling Pattern**:
```python
def getattr(self, path):
    try:
        # ... logic ...
    except Exception as e:
        logging.error(e)
        return -errno.ENOENT
```
+412
@@ -0,0 +1,412 @@
# beetfs Data Flow
## Overview
This document details the complete data flow for read and write operations in beetfs.
---
## 1. Initialization Flow
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ beet mount /mountpoint │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ mount(lib, config, opts, args) │
│ │
│ 1. Parse PATH_FORMAT into structure_split │
│ PATH_FORMAT = "$artist/$album ($year) [$format_upper]/..." │
│ structure_split = ["$artist", "$album ($year) [$format_upper]", ...] │
│ structure_depth = 3 │
│ │
│ 2. Store global library reference │
│ library = lib │
│ │
│ 3. Create empty virtual directory tree │
│ directory_structure = FSNode({}, {}) │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ for item in lib.items(): │
│ │
│ For each item in beets library: │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ 1. Build template mapping │ │
│ │ mapping = { │ │
│ │ 'artist': 'Pink Floyd', │ │
│ │ 'album': 'The Wall', │ │
│ │ 'year': '1979', │ │
│ │ 'format_upper': 'FLAC', │ │
│ │ 'track': '01', │ │
│ │ 'title': 'In The Flesh?', │ │
│ │ } │ │
│ │ │ │
│ │ 2. Substitute template for each level │ │
│ │ level_subbed[0] = "Pink Floyd" │ │
│ │ level_subbed[1] = "The Wall (1979) [FLAC]" │ │
│ │ level_subbed[2] = "01 - Pink Floyd - In The Flesh?.flac" │ │
│ │ │ │
│ │ 3. Add directories to tree │ │
│ │ directory_structure.adddir([], "Pink Floyd") │ │
│ │ directory_structure.adddir(["Pink Floyd"], "The Wall (1979)...") │ │
│ │ │ │
│ │ 4. Add file entry (filename → item.id) │ │
│ │ directory_structure.addfile( │ │
│ │ ["Pink Floyd", "The Wall (1979) [FLAC]"], │ │
│ │ "01 - Pink Floyd - In The Flesh?.flac", │ │
│ │ item.id # e.g., 42 │ │
│ │ ) │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ beetFileSystem FUSE Server │
│ │
│ server = beetFileSystem(...) │
│ server.multithreaded = 0 │
│ server.main() ← Enters FUSE event loop │
└─────────────────────────────────────────────────────────────────────────────┘
```
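The tree-building step in the diagram can be sketched in a few lines of Python 3. This is an illustration only: it uses `string.Template` for the `$field` substitution and plain nested dicts in place of `FSNode`, both of which are assumptions rather than what beetfs does, and `build_tree` is a hypothetical name.

```python
from string import Template

PATH_FORMAT = ("$artist/$album ($year) [$format_upper]/"
               "$track - $artist - $title.$format")


def build_tree(items):
    """Build a nested dict tree: directories under "dirs", file name -> id under "files"."""
    levels = [Template(part) for part in PATH_FORMAT.split("/")]
    root = {"dirs": {}, "files": {}}
    for item in items:
        # Derived field used by the template, mirroring the mapping in the diagram
        mapping = dict(item, format_upper=item["format"].upper())
        names = [level.substitute(mapping) for level in levels]
        node = root
        for dirname in names[:-1]:
            # Create each directory level on first use, then descend into it
            node = node["dirs"].setdefault(dirname, {"dirs": {}, "files": {}})
        node["files"][names[-1]] = item["id"]
    return root


tree = build_tree([{
    "id": 42, "artist": "Pink Floyd", "album": "The Wall", "year": 1979,
    "format": "flac", "track": "01", "title": "In The Flesh?",
}])
album_node = tree["dirs"]["Pink Floyd"]["dirs"]["The Wall (1979) [FLAC]"]
print(album_node["files"])
```

Because only the item ID is stored at the leaf, the tree stays small; the item itself is fetched from the library when the file is opened.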
---
## 2. File Open Flow
```
Application: open("/mount/Pink Floyd/The Wall (1979) [FLAC]/01 - Pink Floyd - In The Flesh?.flac")
┌─────────────────────────────────────────────────────────────────────────────┐
│ beetFileSystem.open(path, flags) │
│ Lines 988-1021 │
│ │
│ path = "/Pink Floyd/The Wall (1979) [FLAC]/01 - Pink Floyd - In The..." │
│ flags = os.O_RDONLY (or O_RDWR) │
│ │
│ if path in self.files: │
│ # File already open - increment reference count │
│ self.files[path].open() │
│ return self.files[path] │
│ else: │
│ # Create new FileHandler │
│ self.files[path] = FileHandler(path, self.lib) │
│ return self.files[path] │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ FileHandler.__init__(path, lib) │
│ Lines 440-483 │
│ │
│ Step 1: Resolve virtual path to beets item │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ pathsplit = ["Pink Floyd", "The Wall (1979) [FLAC]", │ │
│ │ "01 - Pink Floyd - In The Flesh?.flac"] │ │
│ │ │ │
│ │ # Navigate to parent directory in virtual tree │ │
│ │ node = directory_structure.getnode(pathsplit[0:2]) │ │
│ │ # node.files = {"01 - Pink Floyd - In The Flesh?.flac": 42, ...} │ │
│ │ │ │
│ │ # Get beets item by ID │ │
│ │ item_id = node.files[pathsplit[2]] # 42 │ │
│ │ self.item = lib.get_item(id=42) │ │
│ │ self.real_path = self.item.path │ │
│ │ # e.g., "/mnt/music/torrents/pink_floyd_wall.flac" │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ Step 2: Open real file and detect format │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ self.file_object = open(self.real_path, 'r+') │ │
│ │ self.format = "flac" # from file extension │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ Step 3: Create InterpolatedFLAC with database metadata │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ self.inf = InterpolatedFLAC(self.file_object.read()) │ │
│ │ │ │
│ │ # INJECT DATABASE METADATA (this is the key operation!) │ │
│ │ self.inf["title"] = self.item.title # "In The Flesh?" │ │
│ │ self.inf["album"] = self.item.album # "The Wall" │ │
│ │ self.inf["artist"] = self.item.artist # "Pink Floyd" │ │
│ │ self.inf["genre"] = self.item.genre # "Progressive Rock" │ │
│ │ │ │
│ │ # Generate header with injected metadata │ │
│ │ self.header = self.inf.get_header(self.real_path) │ │
│ │ self.bound = len(self.header) # e.g., 8192 bytes │ │
│ │ self.music_offset = self.inf.offset() │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ Step 4: Cache audio data │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ self.file_object.seek(self.music_offset) │ │
│ │ self.music_data = self.file_object.read() # All audio data │ │
│ │ self.file_object.close() │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 3. File Read Flow
```
Application: read(fd, buffer, 4096) # offset managed by kernel
┌─────────────────────────────────────────────────────────────────────────────┐
│ beetFileSystem.read(path, size, offset, fh) │
│ Lines 1077-1106 │
│ │
│ path = "/Pink Floyd/The Wall (1979) [FLAC]/01 - ..." │
│ size = 4096 │
│ offset = 0 (first read) or previous offset + bytes_read │
│ fh = FileHandler instance │
│ │
│ return self.files[path].read(size, offset) │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ FileHandler.read(size, offset) │
│ Lines 497-517 │
│ │
│ Variables: │
│ self.bound = 8192 (header size) │
│ self.header = bytes (generated FLAC header with DB metadata) │
│ self.music_data = bytes (original audio frames) │
└─────────────────────────────────────────────────────────────────────────────┘
┌───────────────────────┼───────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│ Case 1: Header Only │ │ Case 2: Span Both │ │ Case 3: Audio Only │
│ offset < bound │ │ offset < bound │ │ offset >= bound │
│ offset+size < bound │ │ offset+size >= bound│ │ │
├─────────────────────┤ ├─────────────────────┤ ├─────────────────────┤
│ Example: │ │ Example: │ │ Example: │
│ offset=0 │ │ offset=8000 │ │ offset=10000 │
│ size=4096 │ │ size=4096 │ │ size=4096 │
│ bound=8192 │ │ bound=8192 │ │ bound=8192 │
├─────────────────────┤ ├─────────────────────┤ ├─────────────────────┤
│ Return: │ │ Return: │ │ Return: │
│ header[0:4096] │ │ header[8000:8192] │ │ music_data[ │
│ │ │ + music_data[0:3904]│ │ 1808:5904] │
│ (DB metadata!) │ │ │ │ │
│ │ │ (mixed) │ │ (original audio) │
└─────────────────────┘ └─────────────────────┘ └─────────────────────┘
Visual representation of virtual file:
0 bound (8192) EOF
│ │ │
▼ ▼ ▼
┌───────────────────────┬────────────────────────────────────────────┐
│ HEADER │ AUDIO DATA │
│ (self.header) │ (self.music_data) │
│ │ │
│ Contains: │ Contains: │
│ - "fLaC" magic │ - Original FLAC frames │
│ - STREAMINFO block │ - Unchanged from disk │
│ - VORBIS_COMMENT │ │
│ with DB values: │ │
│ title, artist, │ │
│ album, genre │ │
│ - PADDING block │ │
└───────────────────────┴────────────────────────────────────────────┘
▲ ▲
│ │
From InterpolatedFLAC From original file
with injected DB tags (passed through)
```
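The three read cases reduce to a short function. The following Python 3 sketch reproduces the slicing with the example numbers from the diagram; `virtual_read` is a hypothetical stand-in for `FileHandler.read`, and the header and audio bytes are placeholders.

```python
def virtual_read(header: bytes, music_data: bytes, size: int, offset: int) -> bytes:
    """Serve `size` bytes at `offset` from the virtual file header + music_data."""
    bound = len(header)
    if offset >= bound:
        # Case 3: audio only; translate the offset into an index into music_data
        start = offset - bound
        return music_data[start:start + size]
    # Cases 1 and 2: slicing clamps at the end of the header
    head_part = header[offset:offset + size]
    remaining = size - len(head_part)
    # remaining is 0 in case 1, so nothing is appended
    return head_part + music_data[:remaining]


header = b"H" * 8192                 # stand-in for the generated header
music_data = bytes(range(256)) * 40  # stand-in for 10240 bytes of audio

print(len(virtual_read(header, music_data, 4096, 8000)))
```

Folding cases 1 and 2 into one branch works because Python slices never run past the end of a sequence, so no explicit comparison against `offset + size` is needed.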
---
## 4. File Write Flow
```
Application: write(fd, "TITLE=New Title\0", 16) # Hypothetical tag edit
┌─────────────────────────────────────────────────────────────────────────────┐
│ beetFileSystem.write(path, buf, offset, fh) │
│ Lines 1108-1135 │
│ │
│ return self.files[path].write(offset, buf) │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ FileHandler.write(offset, buf) │
│ Lines 519-565 │
│ │
│ if offset >= self.bound: │
│ # Write is in audio area - DISCARD │
│ return # Do nothing, audio is read-only │
│ │
│ # Write is in header area - process tag update │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ Step 1: Reconstruct full virtual file in memory │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ filedata = self.header + self.music_data │ │
│ │ │ │
│ │ # Patch in new data │ │
│ │ filedata = filedata[0:offset] + buf + filedata[offset + len(buf):] │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ Step 2: Parse patched data as FLAC │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ self.inf = InterpolatedFLAC(filedata) │ │
│ │ # This parses the FLAC structure and extracts Vorbis comments │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ Step 3: Extract tag values from parsed FLAC │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ self.item.title = str(self.inf["title"][0]).encode('utf-8') │ │
│ │ self.item.album = str(self.inf["album"][0]).encode('utf-8') │ │
│ │ self.item.artist = str(self.inf["artist"][0]).encode('utf-8') │ │
│ │ self.item.genre = str(self.inf["genre"][0]).encode('utf-8') │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ Step 4: Save to beets database │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ self.lib.store(self.item) # Update item in library │ │
│ │ self.lib.save() # Persist to SQLite │ │
│ │ │ │
│ │ # NOTE: Original file on disk is NEVER touched! │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ Step 5: Regenerate header for subsequent reads │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ self.inf["title"] = self.item.title │ │
│ │ self.inf["album"] = self.item.album │ │
│ │ self.inf["artist"] = self.item.artist │ │
│ │ self.inf["genre"] = self.item.genre │ │
│ │ │ │
│ │ self.header = self.inf.get_header(self.real_path) │ │
│ │ self.bound = len(self.header) │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ return len(buf) # Success │
└─────────────────────────────────────────────────────────────────────────────┘
Write data flow summary:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Application │ │ beetfs │ │ Beets │ │ Original │
│ writes │────▶│ parses │────▶│ database │ │ file │
│ new tags │ │ extracts │ │ updated │ │ UNTOUCHED │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
```
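Step 1 of the write flow, the region check followed by the in-memory patch, can be sketched as follows (Python 3). `patch_virtual_file` is a hypothetical helper; parsing the patched image and storing the tags in the database are left out.

```python
from typing import Optional


def patch_virtual_file(header: bytes, music_data: bytes,
                       offset: int, buf: bytes) -> Optional[bytes]:
    """Return the patched virtual file image, or None for writes in the audio region."""
    bound = len(header)
    if offset >= bound:
        return None                      # audio is read-only: discard the write
    filedata = header + music_data
    # Overwrite len(buf) bytes in place, keeping the total length unchanged
    return filedata[:offset] + buf + filedata[offset + len(buf):]


header = b"fLaC" + b"TITLE=Old Title\x00"   # placeholder, not a valid FLAC header
music_data = b"\xff\xf8audio"
patched = patch_virtual_file(header, music_data, 4, b"TITLE=New Title\x00")
print(patched)
```

Note that this sketch does not clip a write that starts in the header and runs past `bound`; such a write would also alter the in-memory audio bytes of the patched image.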
---
## 5. File Release Flow
```
Application: close(fd)
┌─────────────────────────────────────────────────────────────────────────────┐
│ beetFileSystem.release(path, flags, fh) │
│ Lines 1049-1059 │
│ │
│ if self.files[path].release(): │
│ # Reference count reached 0, clean up │
│ del self.files[path] │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ FileHandler.release() │
│ Lines 489-495 │
│ │
│ self.instance_count -= 1 │
│ │
│ if self.instance_count == 0: │
│ return True # OK to delete │
│ else: │
│ return False # Still in use │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 6. Directory Listing Flow
```
Application: ls /mount/Pink\ Floyd/
┌─────────────────────────────────────────────────────────────────────────────┐
│ beetFileSystem.readdir(path, offset, dh) │
│ Lines 931-975 │
│ │
│ path = "/Pink Floyd" │
│ pathsplit = ["Pink Floyd"] │
│ │
│ yield fuse.Direntry(".") │
│ yield fuse.Direntry("..") │
│ │
│ # len(pathsplit) == 1, structure_depth - 1 == 2 │
│ # So we're listing directories (albums), not files │
│ │
│ for dirname in directory_structure.listdir(pathsplit, True): │
│ yield fuse.Direntry(dirname.encode('utf-8')) │
│ # "The Wall (1979) [FLAC]" │
│ # "Animals (1977) [FLAC]" │
│ # etc. │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 7. Complete Request Lifecycle
```
┌──────────────────────────────────────────────────────────────────────────────┐
│ COMPLETE LIFECYCLE │
│ │
│ 1. User mounts: beet mount /mnt/music │
│ ├─ Build virtual tree from beets library │
│ └─ Start FUSE event loop │
│ │
│ 2. Application opens file: open("/mnt/music/Artist/Album/track.flac") │
│ ├─ Resolve virtual path to beets item ID │
│ ├─ Load original file into memory │
│ ├─ Inject database metadata into FLAC structure │
│ ├─ Generate new header with DB tags │
│ └─ Cache audio data │
│ │
│ 3. Application reads file: read(fd, buf, 4096) │
│ ├─ If reading header region → return header (DB metadata) │
│ ├─ If reading audio region → return cached audio (original) │
│ └─ If spanning both → return combined data │
│ │
│ 4. Application writes tags: write(fd, new_tags, offset) │
│ ├─ If audio region → discard (read-only) │
│ ├─ If header region: │
│ │ ├─ Parse new tag values │
│ │ ├─ Update beets database │
│ │ └─ Regenerate header │
│ └─ Original file NEVER modified │
│ │
│ 5. Application closes file: close(fd) │
│ ├─ Decrement reference count │
│ └─ Clean up if count == 0 │
│ │
│ 6. User unmounts: fusermount -u /mnt/music │
│ └─ fsdestroy() called, cleanup │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
```
+479
@@ -0,0 +1,479 @@
# beetfs Drawbacks & Limitations
## Overview
This document catalogs all identified issues, limitations, and missing features in beetfs. Issues are categorized by severity and type.
---
## Critical Issues (🔴)
### 1. Full File Loading into Memory
**Location**: Lines 463, 480-481
```python
self.inf = InterpolatedFLAC(self.file_object.read()) # Entire file
# ...
self.music_data = self.file_object.read() # Audio portion again
```
**Impact**:
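- Memory usage = O(file_size) per open file
- 50MB FLAC = at least ~50MB RAM, and up to roughly twice that while both the full `filedata` copy and `music_data` are held
- 100 such files open at once = 5GB+ RAM
- Out-of-memory crashes when a scanner opens many files from a large library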
**Fix Required**: Implement lazy loading with seek-based reads.
---
### 2. MP3 Support Disabled
**Location**: Lines 475-477
```python
elif self.format == "mp3":
    self.bound = 0  # disable interpolation for now
    self.music_offset = 0  # disable interpolation for now
```
**Impact**:
- MP3 files return original metadata, not database metadata
- Breaks the core promise of metadata overlay
- MP3 is still one of the most common formats
**Fix Required**: Implement `InterpolatedID3` header generation.
---
### 3. Python 2 Only
**Location**: Throughout
```python
except fuse.FuseError, e: # Python 2 syntax
if isinstance(value, basestring): # Removed in Python 3
return reduce(lambda a, b: (a << 8) + ord(b), string, 0L) # Long literals
```
**Impact**:
- Python 2 EOL was January 2020
- Security vulnerabilities unfixed
- No modern library support
- Cannot run on Python 3 without migration
**Fix Required**: Full Python 3 migration (see modernization.md).
---
### 4. Deprecated FUSE Library
**Location**: Lines 25, 51
```python
import fuse
fuse.fuse_python_api = (0, 2)
```
**Impact**:
- fuse-python is unmaintained
- Missing modern FUSE features (FUSE 3.x)
- Compatibility issues with recent kernels
- No async support
**Fix Required**: Migrate to pyfuse3 or llfuse.
---
### 5. Single-Threaded Execution
**Location**: Line 178
```python
server.multithreaded = 0
```
**Impact**:
- All operations serialized
- One slow open blocks all other operations
- Cannot utilize multiple CPU cores
- Poor performance under concurrent access
**Fix Required**: Enable multithreading with proper locking.
---
## Major Issues (🟡)
### 6. Limited Metadata Fields
**Location**: Lines 466-469, 540-547
```python
# Only these 4 fields are actually used:
self.inf["title"] = self.item.title
self.inf["album"] = self.item.album
self.inf["artist"] = self.item.artist
self.inf["genre"] = self.item.genre
```
**Defined but not implemented** (lines 55-77):
- `composer`, `grouping`
- `year`, `month`, `day`
- `track`, `tracktotal`
- `disc`, `disctotal`
- `lyrics`, `comments`
- `bpm`, `comp`
- `albumartist` (not even defined)
**Impact**:
- Track numbers not from database
- Album artist not supported
- Year/date not interpolated
- Cover art not handled
---
### 7. No File Handle Caching/Eviction
**Location**: Lines 1004-1018
```python
if path in self.files:
    self.files[path].open()
else:
    self.files[path] = FileHandler(path, self.lib)
```
**Missing**:
- No maximum cache size
- No LRU eviction
- No memory pressure handling
- Files stay in memory until explicitly closed
**Impact**:
- Memory grows unbounded
- No protection against OOM
- A handler is freed only when its reference count reaches zero, so a leaked or miscounted reference keeps the whole file cached
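One possible shape for the missing eviction logic is a least-recently-used cache with a byte budget. The Python 3 sketch below is an assumption about how this could be added, not existing beetfs code; `HandlerCache` and its methods are hypothetical names.

```python
from collections import OrderedDict


class HandlerCache:
    """LRU cache of handlers bounded by total cached bytes (illustrative only)."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used = 0
        self._entries = OrderedDict()   # path -> (handler, size_in_bytes)

    def get(self, path):
        entry = self._entries.get(path)
        if entry is None:
            return None
        self._entries.move_to_end(path)  # mark as most recently used
        return entry[0]

    def put(self, path, handler, size: int) -> None:
        if path in self._entries:
            self.used -= self._entries.pop(path)[1]
        self._entries[path] = (handler, size)
        self.used += size
        # Evict least recently used entries, always keeping the newest one
        while self.used > self.max_bytes and len(self._entries) > 1:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self.used -= evicted_size


cache = HandlerCache(max_bytes=100)
cache.put("a.flac", object(), 60)
cache.put("b.flac", object(), 60)   # pushes usage to 120, so a.flac is evicted
print(cache.used, cache.get("a.flac"))
```

A real integration would also have to skip handlers that are still open (non-zero reference count) when choosing what to evict.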
---
### 8. Blocking Database Operations
**Location**: Lines 549-550
```python
self.lib.store(self.item)
self.lib.save()
```
**Impact**:
- SQLite operations in FUSE thread
- Write operations block all reads
- No transaction batching
- Potential lock contention (SQLite database locks) if beets modifies the library at the same time
---
### 9. No Library Hot Reload
**Issue**: Virtual directory tree built once at mount time.
**Location**: Lines 142-172
```python
for item in lib.items():
# Build tree...
```
**Impact**:
- New files added to beets library not visible
- Deleted files still appear (ENOENT on access)
- Metadata changes in beets not reflected until remount
- Must unmount/remount to see changes
---
### 10. Static Path Format
**Location**: Lines 44-45
```python
PATH_FORMAT = ("$artist/$album ($year) [$format_upper]/"
"$track - $artist - $title.$format")
```
**Impact**:
- Cannot customize organization
- Hard-coded template
- No configuration option
- Incompatible with different organizational preferences
---
### 11. No Extended Attribute Support
**Location**: Not implemented
**Impact**:
- Cannot store/retrieve xattrs
- Some applications use xattrs for metadata
- macOS Finder metadata lost
- Linux capabilities not supported
---
### 12. No Symlink Support
**Location**: Lines 758-765
```python
def readlink(self, path):
    return -errno.EOPNOTSUPP
```
**Impact**:
- Cannot create symlinks in mount
- Some applications expect symlink support
- Cannot link to external files
---
### 13. Silent Error Swallowing
**Location**: Lines 705-707, 1019-1021, 1103-1104
```python
except Exception as e:
    logging.error(e)
    return -errno.ENOENT  # Always returns same error
```
**Impact**:
- All errors appear as "file not found"
- Hard to debug issues
- No distinction between permission, I/O, parse errors
- Lost stack traces in many cases
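A more discriminating pattern maps exception types to distinct error codes and logs the full traceback. The following Python 3 sketch shows one way to do that; the mapping choices and the names `errno_for` and `guarded` are assumptions for illustration, not part of beetfs.

```python
import errno
import functools
import logging


def errno_for(exc: BaseException) -> int:
    """Pick an errno value for an exception (the mapping is an illustrative choice)."""
    if isinstance(exc, OSError) and exc.errno:
        return exc.errno                 # preserve the code reported by the OS
    if isinstance(exc, PermissionError):
        return errno.EACCES
    if isinstance(exc, (FileNotFoundError, KeyError, IndexError)):
        return errno.ENOENT              # unknown path component or missing file
    return errno.EIO                     # parse errors and anything unexpected


def guarded(func):
    """Decorator for FUSE-style handlers that return a negative errno on failure."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            logging.exception("%s failed", func.__name__)  # keeps the stack trace
            return -errno_for(exc)
    return wrapper


@guarded
def lookup(files: dict, name: str):
    return files[name]


print(lookup({}, "missing.flac"))
```

With this in place a parse failure surfaces as an I/O error instead of "file not found", and the log retains the traceback.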
---
## Minor Issues (🟢)
### 14. Global State
**Location**: Lines 125-140
```python
global structure_split
global structure_depth
global library
global directory_structure
```
**Impact**:
- Cannot mount multiple instances
- Difficult to unit test
- Tight coupling between components
- No dependency injection
---
### 15. Hard-coded Log File
**Location**: Lines 624-625
```python
LOG_FILENAME = "LOG"
logging.basicConfig(filename=LOG_FILENAME, level=logging.INFO,)
```
**Impact**:
- Log file created in current directory
- No log rotation
- No configurable log level
- Fills disk on busy systems
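A configurable, rotating log could look like the Python 3 sketch below. The environment variable names `BEETFS_LOG` and `BEETFS_LOG_LEVEL`, the defaults, and the function name `configure_logging` are all hypothetical; only the standard-library `logging.handlers.RotatingFileHandler` is assumed to exist.

```python
import logging
import logging.handlers
import os


def configure_logging(path=None, level=None, max_bytes=1_000_000, backups=3):
    """Attach a size-limited rotating file handler to the 'beetfs' logger."""
    # Hypothetical environment variables; explicit arguments take precedence
    path = path or os.environ.get("BEETFS_LOG", "beetfs.log")
    level_name = (level or os.environ.get("BEETFS_LOG_LEVEL", "INFO")).upper()
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("beetfs")
    # Fall back to INFO when the level name is not recognized
    logger.setLevel(getattr(logging, level_name, logging.INFO))
    logger.handlers[:] = [handler]   # replace handlers from earlier calls
    return logger
```

Calling `configure_logging(path="/var/log/beetfs.log", level="debug")` once at mount time would replace the hard-coded `logging.basicConfig` call; the path shown here is only an example.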
---
### 16. Reference Count Manual Management
**Location**: Lines 485-495
```python
def open(self):
    self.instance_count = self.instance_count + 1

def release(self):
    if self.instance_count > 0:
        self.instance_count = self.instance_count - 1
```
**Issues**:
- Race conditions possible if multithreaded
- No context manager support
- Manual counting error-prone
- Off-by-one potential
---
### 17. Inefficient Directory Building
**Location**: Lines 153-172
```python
for level in range(0, structure_depth - 1):
    if level-1 in level_subbed:
        sub_elements.append(level_subbed[level-1])
    directory_structure.adddir(sub_elements, level_subbed[level])
```
**Issues**:
- Rebuilds path for every item
- O(items × depth) complexity
- String allocations in inner loop
- Could use trie-based insertion
---
### 18. No Cover Art Handling
**Issue**: Cover art embedded in FLAC not addressed.
**Impact**:
- Cover art from original file used, not database
- Cannot replace/add cover art through overlay
- PICTURE metadata blocks passed through unchanged
---
### 19. No Cue Sheet Support
**Issue**: Cue sheets not handled specially.
**Impact**:
- `.cue` files point to original file paths
- Cannot play cue-referenced tracks correctly
- Split-by-cue not supported
---
### 20. File Size Mismatch Potential
**Issue**: Virtual file size differs from physical if header size changes.
**Location**: Lines 675-688
```python
statinfo = os.stat(item)
st = Stat(st_mode=statinfo.st_mode,
st_size=statinfo.st_size, # Original size, not virtual!
...)
```
**Impact**:
- `stat()` returns original file size
- If generated header is larger/smaller, size is wrong
- Some applications may fail on size mismatch
- Range requests could break
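Since the virtual file is the generated header followed by the original audio, its size can be derived from three numbers. The Python 3 sketch below states that arithmetic; `virtual_size` is a hypothetical helper and the figures are made up for illustration.

```python
def virtual_size(physical_size: int, music_offset: int, header_len: int) -> int:
    """Size of the virtual file: generated header plus the untouched audio bytes."""
    audio_len = physical_size - music_offset   # bytes after the original header
    return header_len + audio_len


# Same header length as on disk: the sizes agree
print(virtual_size(50_000_000, 8192, 8192))
# Generated header is 808 bytes longer: stat() should report 808 bytes more
print(virtual_size(50_000_000, 8192, 9000))
```

Reporting this value as `st_size` would keep `stat()` consistent with what `read()` actually serves, whenever the generated header is not padded to the original length.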
---
## Missing Features
### Essential
| Feature | Status | Notes |
|---------|--------|-------|
| MP3 metadata interpolation | ❌ Disabled | Code exists but disabled |
| OGG/Opus support | ❌ Missing | No implementation |
| AAC/M4A support | ❌ Missing | No implementation |
| Lazy file loading | ❌ Missing | Full file loaded |
| Memory management | ❌ Missing | No limits or eviction |
| Configuration file | ❌ Missing | Hard-coded values |
### Nice to Have
| Feature | Status | Notes |
|---------|--------|-------|
| Cover art interpolation | ❌ Missing | Would need PICTURE block handling |
| ReplayGain from database | ❌ Missing | Tags not interpolated |
| Lyrics from database | ❌ Missing | Listed in fields, not implemented |
| Watch mode (hot reload) | ❌ Missing | No inotify integration |
| Multiple mount points | ❌ Missing | Global state prevents |
| Remote database | ❌ Missing | Local beets only |
| Read-only mode | ❌ Missing | Always allows writes |
| Custom path templates | ❌ Missing | Hard-coded PATH_FORMAT |
---
## Security Considerations
### 1. No Input Validation
**Location**: Throughout
```python
pathsplit = path[1:].split('/')
item_id = node.files[pathsplit[structure_depth-1]] # No bounds check
```
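**Risk**: A path with fewer than `structure_depth` components raises `IndexError`, and an unknown name raises `KeyError`. Path traversal or injection is unlikely, since paths are resolved against the in-memory tree, but not ruled out.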
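### 2. Direct Database Access
**Issue**: beetfs opens the beets library directly. The library is a local SQLite file with no separate credentials, so anyone who can write through the mount can modify the database.
**Risk**: Low - local access only.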
### 3. No Permission Enforcement
**Location**: Lines 749-756
```python
if flags | os.R_OK:
    pass  # TODO: actually check the file permissions
if flags | os.W_OK:
    pass
```
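**Risk**: No permission checks are performed, so any user who can reach the mount can read and write through it. Both conditions also use `|` where `&` was presumably intended, so they are always true regardless of `flags`.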
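The difference between `|` and `&` in that check, and a minimal way to delegate the decision to the backing file, can be sketched as follows (Python 3). `requested` and `check_access` are hypothetical helpers, and delegating to `os.access` on the real path is an assumption about the desired policy.

```python
import errno
import os


def requested(mode: int, flag: int) -> bool:
    """True only when `flag` is actually set in `mode`."""
    return bool(mode & flag)


def check_access(real_path: str, mode: int) -> int:
    """Return 0 if the backing file grants `mode`, else a negative errno."""
    if not os.path.exists(real_path):
        return -errno.ENOENT
    return 0 if os.access(real_path, mode) else -errno.EACCES


write_only = os.W_OK
always_true = bool(write_only | os.R_OK)        # OR sets the bit, so never False
actually_set = requested(write_only, os.R_OK)   # AND tests the bit
print(always_true, actually_set)
```

An `access()` handler could then return `check_access(self.real_path, flags)` instead of falling through both `pass` branches.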
---
## Compatibility Issues
| Component | Issue |
|-----------|-------|
| **Jellyfin** | May scan entire library, causing OOM |
| **Plex** | Same library scan issue |
| **Navidrome** | Expects certain tag fields not implemented |
| **mpd** | Works for playback, database features limited |
| **macOS** | fuse-python macOS support questionable |
| **Docker** | FUSE in containers requires privileged mode |
---
## Summary Table
| Category | Critical | Major | Minor |
|----------|----------|-------|-------|
| Performance | 2 | 4 | 2 |
| Functionality | 2 | 5 | 4 |
| Code Quality | 2 | 2 | 4 |
| **Total** | **6** | **11** | **10** |
---
## Prioritized Fix List
1. 🔴 **Memory**: Implement lazy loading (Critical for usability)
2. 🔴 **Python 3**: Migrate to Python 3 (Required for any changes)
3. 🔴 **FUSE lib**: Switch to pyfuse3/llfuse (Best done together with the Python 3 migration)
4. 🔴 **MP3**: Enable MP3 interpolation (Core functionality)
5. 🟡 **Metadata**: Implement all fields (Feature completeness)
6. 🔴 **Threading**: Enable multithreading (Performance)
7. 🟡 **Config**: Add configuration file (Usability)
8. 🟡 **Hot reload**: Watch for library changes (Usability)
9. 🟢 **Globals**: Remove global state (Code quality)
10. 🟢 **Logging**: Configurable logging (Operations)
+459
@@ -0,0 +1,459 @@
# beetfs Modernization Guide
## Current State Analysis
### Technical Debt
| Issue | Severity | Location |
|-------|----------|----------|
| Python 2 syntax | 🔴 Critical | Throughout |
| fuse-python (deprecated) | 🔴 Critical | Lines 25, 51 |
| `basestring` usage | 🔴 Critical | Line 89 |
| `reduce` without import | 🟡 Medium | Line 197 |
| `0755` octal syntax | 🟡 Medium | Lines 654, 700 |
| `print` as statement | 🟡 Medium | N/A (not used) |
| `except Exception, e` | 🔴 Critical | Line 181 |
| Long integers (`0L`) | 🟡 Medium | Line 197 |
| Global state | 🟡 Medium | Lines 125-140 |
| Memory-heavy design | 🔴 Critical | Lines 463, 480-481 |
### Dependencies to Update
| Original | Replacement | Notes |
|----------|-------------|-------|
| `fuse-python` | `pyfuse3` or `llfuse` | Modern FUSE bindings |
| `beets` (old API) | `beets >= 1.6` | Check API compatibility |
| `mutagen` | `mutagen >= 1.45` | Mostly compatible |
| Python 2.7 | Python 3.9+ | Full migration needed |
---
## Migration Steps
### Phase 1: Python 3 Compatibility
#### 1.1 Fix Syntax Issues
```python
# BEFORE (Python 2)
except fuse.FuseError, e:
    log.error(str(e))
# AFTER (Python 3)
except fuse.FuseError as e:
    log.error(str(e))
```
```python
# BEFORE
if isinstance(value, basestring):
# AFTER
if isinstance(value, str):
```
```python
# BEFORE
return reduce(lambda a, b: (a << 8) + ord(b), string, 0L)
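# AFTER (assumes `string` is bytes; iterating bytes yields ints, so ord() is dropped)
from functools import reduce
return reduce(lambda a, b: (a << 8) + b, string, 0)
# Equivalent and simpler: int.from_bytes(string, "big")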
```
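The `reduce` expression above is a big-endian bytes-to-integer conversion, which Python 3 provides directly. This short sketch checks that the two forms agree; `be_int_reduce` and `be_int` are names chosen here for illustration.

```python
from functools import reduce


def be_int_reduce(data: bytes) -> int:
    """Ported form: fold the bytes into an integer, most significant byte first."""
    return reduce(lambda acc, byte: (acc << 8) + byte, data, 0)


def be_int(data: bytes) -> int:
    """Built-in equivalent for big-endian unsigned integers."""
    return int.from_bytes(data, "big")


samples = [b"", b"\x00", b"\x01\x02\x03", b"\xff\xff\xff\xff"]
print([be_int(sample) == be_int_reduce(sample) for sample in samples])
```

Using `int.from_bytes` removes the need for the `functools` import and makes the byte order explicit.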
```python
# BEFORE
mode = stat.S_IFDIR | 0755
# AFTER
mode = stat.S_IFDIR | 0o755
```
#### 1.2 Fix String/Bytes Handling
```python
# BEFORE - implicit string/bytes mixing
self.header = self.inf.get_header(self.real_path)
return self.header[offset:offset+size]
# AFTER - explicit bytes handling
self.header: bytes = self.inf.get_header(self.real_path)
return self.header[offset:offset+size]
```
```python
# BEFORE
self.item.title = str(self.inf["title"][0]).encode('utf-8')
# AFTER
self.item.title = self.inf["title"][0] # Already str in Python 3
```
#### 1.3 Fix Dictionary Methods
```python
# BEFORE
return node.dirs.keys()
# AFTER
return list(node.dirs.keys()) # If list is needed
# or just
return node.dirs.keys() # If iteration is sufficient
```
---
### Phase 2: FUSE Library Migration
#### Option A: pyfuse3 (Recommended)
Modern, async-capable FUSE bindings.
```python
# BEFORE (fuse-python)
import fuse
fuse.fuse_python_api = (0, 2)

class beetFileSystem(fuse.Fuse):
    def read(self, path, size, offset):
        return data

# AFTER (pyfuse3)
import pyfuse3
import trio

class BeetFS(pyfuse3.Operations):
    async def read(self, fh, offset, size):
        return data

async def main():
    fs = BeetFS()
    fuse_options = set(pyfuse3.default_options)
    fuse_options.add('fsname=beetfs')
    pyfuse3.init(fs, mountpoint, fuse_options)
    try:
        await pyfuse3.main()
    finally:
        pyfuse3.close()

trio.run(main)
```
**Key Differences**:
| fuse-python | pyfuse3 |
|-------------|---------|
| `read(path, size, offset)` | `read(fh, offset, size)` |
| Synchronous | Async (trio) |
| Returns a Python 2 `str` (byte string) | Returns `bytes` |
| Path-based operations | Inode- and file-handle-based operations |
#### Option B: llfuse (Alternative)
Lower-level, synchronous.
```python
import llfuse

class BeetFS(llfuse.Operations):
    def read(self, fh, offset, size):
        return data

def main():
    fs = BeetFS()
    llfuse.init(fs, mountpoint, options)
    try:
        llfuse.main()
    finally:
        llfuse.close()
```
#### Option C: fusepy (Simple)
Simple wrapper, but less maintained.
```python
from fuse import FUSE, Operations

class BeetFS(Operations):
    def read(self, path, size, offset, fh):
        return data

FUSE(BeetFS(), mountpoint, foreground=True)
```
---
### Phase 3: Architecture Improvements
#### 3.1 Remove Global State
```python
# BEFORE - Global variables
global structure_split
global structure_depth
global library
global directory_structure
# AFTER - Instance variables
class BeetFS:
    def __init__(self, lib: Library, path_format: str):
        self.lib = lib
        self.path_format = path_format
        self.structure_split = path_format.split("/")
        self.structure_depth = len(self.structure_split)
        self.directory_structure = FSNode({}, {})
        self._build_tree()
```
#### 3.2 Reduce Memory Usage
```python
# BEFORE - Load entire audio into memory
self.music_data = self.file_object.read()  # Could be 100MB+

# AFTER - Lazy loading with mmap or seek
class FileHandler:
    def __init__(self, path, lib):
        self.real_path = self._resolve_path(path)
        self.file_object = open(self.real_path, 'rb')
        self._header = None  # Lazy load
        self._music_offset = None  # Set by _generate_header()

    @property
    def header(self) -> bytes:
        if self._header is None:
            # _generate_header() is expected to parse only the metadata
            # blocks and to set self._music_offset as a side effect
            self._header = self._generate_header()
        return self._header

    def read(self, size: int, offset: int) -> bytes:
        bound = len(self.header)  # Also ensures _music_offset is set
        if offset < bound:
            # Header region - return from generated header
            if offset + size <= bound:
                return self.header[offset:offset+size]
            else:
                # Span header and audio
                header_part = self.header[offset:]
                audio_offset = 0
                audio_size = size - len(header_part)
                audio_part = self._read_audio(audio_offset, audio_size)
                return header_part + audio_part
        else:
            # Audio region - read directly from file
            audio_offset = offset - bound
            return self._read_audio(audio_offset, size)

    def _read_audio(self, offset: int, size: int) -> bytes:
        self.file_object.seek(self._music_offset + offset)
        return self.file_object.read(size)
```
#### 3.3 Add Type Hints
```python
from typing import Dict, List, Optional, Tuple
from pathlib import Path

class FSNode:
    def __init__(self, dirs: Dict[str, 'FSNode'], files: Dict[str, int]):
        self.dirs: Dict[str, FSNode] = dirs
        self.files: Dict[str, int] = files

    def getnode(self, elements: List[str], root: Optional['FSNode'] = None) -> 'FSNode':
        ...

    def addfile(self, elements: List[str], filename: str, item_id: int) -> None:
        ...
```
#### 3.4 Add MP3 Support
```python
class FileHandler:
    def __init__(self, path: str, lib: Library):
        # self.real_path and self.item are assumed to be resolved from the
        # virtual path first, as in the original __init__
        self.format = Path(path).suffix[1:].lower()
        if self.format == "flac":
            self._handler = FLACHandler(self.real_path, self.item)
        elif self.format == "mp3":
            self._handler = MP3Handler(self.real_path, self.item)
        elif self.format in ("ogg", "opus"):
            self._handler = OggHandler(self.real_path, self.item)
        else:
            raise UnsupportedFormatError(f"Format {self.format} not supported")

class FLACHandler:
    def generate_header(self, item: Item) -> bytes:
        # self.file_data: metadata bytes read from the real file (not shown)
        inf = InterpolatedFLAC(self.file_data)
        inf["title"] = item.title
        inf["album"] = item.album
        inf["artist"] = item.artist
        inf["genre"] = item.genre
        return inf.get_header()

class MP3Handler:
    def generate_header(self, item: Item) -> bytes:
        # Implement ID3v2 header generation
        # (TIT2, TPE1, TALB, TCON are frame classes from mutagen.id3)
        id3 = InterpolatedID3()
        id3.add(TIT2(encoding=3, text=item.title))
        id3.add(TPE1(encoding=3, text=item.artist))
        id3.add(TALB(encoding=3, text=item.album))
        id3.add(TCON(encoding=3, text=item.genre))
        # Calculate padding to match original header size
        ...
        # render() is a method InterpolatedID3 would need to provide
        return id3.render()
```
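The padding step elided above depends on how ID3v2 stores the tag size: a 10-byte header whose last four bytes hold the body size as a "syncsafe" integer (7 usable bits per byte), excluding the header itself. The following is a rough sketch of that arithmetic, assuming an ID3v2.4 tag with no flags set, no footer, and frames that have already been rendered to bytes. The function names are hypothetical, not mutagen or beetfs APIs.

```python
ID3_HEADER_SIZE = 10  # "ID3" + 2 version bytes + 1 flags byte + 4 size bytes


def encode_syncsafe(value: int) -> bytes:
    """Encode an integer as 4 bytes using only the low 7 bits of each byte."""
    if not 0 <= value < (1 << 28):
        raise ValueError("syncsafe integers hold at most 28 bits")
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


def decode_syncsafe(data: bytes) -> int:
    """Decode a 4-byte syncsafe integer."""
    if len(data) != 4 or any(b & 0x80 for b in data):
        raise ValueError("not a valid 4-byte syncsafe integer")
    result = 0
    for b in data:
        result = (result << 7) | b
    return result


def build_padded_tag(frames: bytes, target_total_size: int) -> bytes:
    """Wrap rendered frames in an ID3v2.4 header, zero-padded to an exact size."""
    body_size = target_total_size - ID3_HEADER_SIZE
    if len(frames) > body_size:
        raise ValueError("frames do not fit into the requested tag size")
    body = frames + b"\x00" * (body_size - len(frames))
    # Version 2.4.0, no flags; the size field counts the body including padding.
    header = b"ID3" + bytes([4, 0]) + b"\x00" + encode_syncsafe(body_size)
    return header + body


tag = build_padded_tag(b"FRAME", target_total_size=32)
```

If the new frames are larger than the original tag, padding alone cannot preserve the size, and the handler would need to track a shifted audio offset instead.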
---
### Phase 4: Testing
#### 4.1 Unit Tests
```python
import pytest
from beetfs import FSNode, FileHandler


class TestFSNode:
    def test_adddir(self):
        root = FSNode({}, {})
        root.adddir([], "Artist")
        assert "Artist" in root.dirs

    def test_addfile(self):
        root = FSNode({}, {})
        root.adddir([], "Artist")
        root.addfile(["Artist"], "track.flac", 42)
        assert root.dirs["Artist"].files["track.flac"] == 42

    def test_getnode(self):
        root = FSNode({}, {})
        root.adddir([], "Artist")
        root.adddir(["Artist"], "Album")
        node = root.getnode(["Artist", "Album"])
        assert node is not None


class TestFileHandler:
    # mock_flac_file, mock_beets_item and mock_lib are fixtures that
    # still need to be defined (for example in conftest.py).
    def test_read_header(self, mock_flac_file, mock_beets_item, mock_lib):
        handler = FileHandler("/Artist/Album/track.flac", mock_lib)
        data = handler.read(100, 0)
        assert data.startswith(b"fLaC")

    def test_read_audio(self, mock_flac_file, mock_beets_item, mock_lib):
        handler = FileHandler("/Artist/Album/track.flac", mock_lib)
        # handler.bound: offset where the header ends and audio begins
        data = handler.read(100, handler.bound + 100)
        # Should be audio data from original file
        assert data == mock_flac_file.audio_data[100:200]
```
#### 4.2 Integration Tests
```python
import os
import subprocess
import tempfile
import time


class TestFUSEMount:
    def test_mount_unmount(self, beets_library):
        with tempfile.TemporaryDirectory() as mountpoint:
            # Mount
            proc = subprocess.Popen(
                ["beet", "mount", mountpoint],
                stdout=subprocess.PIPE
            )
            time.sleep(1)
            try:
                # Verify mount
                assert os.path.ismount(mountpoint)
                # List files
                files = os.listdir(mountpoint)
                assert len(files) > 0
            finally:
                # Unmount even if an assertion fails, so the temporary
                # directory can be cleaned up
                subprocess.run(["fusermount", "-u", mountpoint])
                proc.wait()
```
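The fixed `time.sleep(1)` in the test above can be flaky on a slow machine and wasteful on a fast one. One possible alternative is to poll until the mount appears or a timeout expires. The helper names below are hypothetical; this is a sketch and not part of beetfs.

```python
import os
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def wait_for_mount(mountpoint: str, timeout: float = 5.0) -> bool:
    """Return True once `mountpoint` is a mount point, False on timeout."""
    return wait_until(lambda: os.path.ismount(mountpoint), timeout)
```

In the integration test, `assert wait_for_mount(mountpoint)` would replace both the sleep and the separate `ismount` assertion.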
---
### Phase 5: Standalone Mode (Optional)
Remove the beets dependency so that beetfs can run as a standalone metadata overlay.
```python
import sqlite3
from pathlib import Path

from beetfs import FSNode


class StandaloneFS:
    """Metadata overlay without beets dependency."""

    def __init__(self,
                 source_dir: Path,
                 metadata_db: Path,
                 path_format: str):
        self.source_dir = source_dir
        self.db = sqlite3.connect(metadata_db)
        self.path_format = path_format
        # The tree must exist before _build_tree() fills it
        self.directory_structure = FSNode({}, {})
        self._build_tree()

    def _build_tree(self):
        """Build virtual tree from source directory and metadata DB."""
        for audio_file in self.source_dir.rglob("*.flac"):
            # Get metadata from DB or scan file
            metadata = self._get_metadata(audio_file)
            # Build virtual path from template
            virtual_path = self._format_path(metadata)
            # Add to tree
            self.directory_structure.addfile(
                list(virtual_path.parent.parts),
                virtual_path.name,
                str(audio_file)  # Store actual path instead of ID
            )
```
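The `_format_path` step is left undefined above. As a rough illustration, the sketch below expands a template into a virtual path, assuming `str.format`-style placeholders such as `{artist}/{album}/{title}` and string-valued metadata. Both the template syntax and the function names are assumptions made for this example; beets itself uses a different template language.

```python
from pathlib import PurePosixPath
from typing import Mapping


class _DefaultingDict(dict):
    """Mapping that substitutes a placeholder for missing metadata fields."""

    def __missing__(self, key: str) -> str:
        return "Unknown"


def sanitize(component: str) -> str:
    """Make one path component safe to use as a file or directory name."""
    cleaned = component.replace("/", "_").replace("\x00", "").strip()
    return "_" if cleaned in ("", ".", "..") else cleaned


def format_virtual_path(path_format: str,
                        metadata: Mapping[str, str],
                        suffix: str) -> PurePosixPath:
    """Expand a '/'-separated template into a virtual path ending in `suffix`."""
    values = _DefaultingDict(metadata)
    # Split the template first, so a "/" inside a value cannot add a level.
    parts = [sanitize(segment.format_map(values))
             for segment in path_format.split("/")]
    parts[-1] += suffix  # append rather than replace, titles may contain dots
    return PurePosixPath(*parts)


path = format_virtual_path(
    "{artist}/{album}/{title}",
    {"artist": "AC/DC", "album": "Back in Black", "title": "Hells Bells"},
    ".flac",
)
```

The result's `parent.parts` and `name` map directly onto the `addfile` arguments used in `_build_tree`.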
---
## Recommended Migration Order
```
1. [ ] Fork and set up development environment
2. [ ] Add type hints throughout (helps catch issues)
3. [ ] Fix Python 3 syntax issues
4. [ ] Replace fuse-python with pyfuse3/llfuse
5. [ ] Add unit tests for FSNode and FileHandler
6. [ ] Refactor global state to instance variables
7. [ ] Implement lazy loading for audio data
8. [ ] Add MP3 support
9. [ ] Add integration tests
10. [ ] Optional: Create standalone mode
```
---
## Estimated Effort
| Phase | Effort | Risk |
|-------|--------|------|
| Phase 1 (Python 3) | 2-3 days | Low |
| Phase 2 (FUSE migration) | 3-5 days | Medium |
| Phase 3 (Architecture) | 3-5 days | Medium |
| Phase 4 (Testing) | 2-3 days | Low |
| Phase 5 (Standalone) | 3-5 days | Medium |
| **Total** | **13-21 days** | |
---
## Alternative: Rewrite from Scratch
Given the age of the codebase, a rewrite might be more efficient:
**Pros of Rewrite**:
- Clean architecture from start
- Modern async design
- Better memory management
- Easier to test
**Cons of Rewrite**:
- More initial effort
- Risk of missing edge cases
- Need to re-discover FLAC/ID3 intricacies
**Recommended Approach**: Start with Phases 1-2 to understand the code deeply, then decide whether to continue refactoring or to rewrite.