
beetfs Modernization Guide

Current State Analysis

Technical Debt

| Issue | Severity | Location |
| --- | --- | --- |
| Python 2 syntax | 🔴 Critical | Throughout |
| fuse-python (deprecated) | 🔴 Critical | Lines 25, 51 |
| basestring usage | 🔴 Critical | Line 89 |
| reduce without import | 🟡 Medium | Line 197 |
| 0755 octal syntax | 🟡 Medium | Lines 654, 700 |
| print as statement | 🟡 Medium | N/A (not used) |
| except Exception, e | 🔴 Critical | Line 181 |
| Long integers (0L) | 🟡 Medium | Line 197 |
| Global state | 🟡 Medium | Lines 125-140 |
| Memory-heavy design | 🟡 Medium | Line 481 |

Dependencies to Update

| Original | Replacement | Notes |
| --- | --- | --- |
| fuse-python | pyfuse3 or llfuse | Modern FUSE bindings |
| beets (old API) | beets >= 1.6 | Check API compatibility |
| mutagen | mutagen >= 1.45 | Mostly compatible |
| Python 2.7 | Python 3.9+ | Full migration needed |

Migration Steps

Phase 1: Python 3 Compatibility

1.1 Fix Syntax Issues

# BEFORE (Python 2)
except fuse.FuseError, e:
    log.error(str(e))

# AFTER (Python 3)
except fuse.FuseError as e:
    log.error(str(e))

# BEFORE
if isinstance(value, basestring):

# AFTER - str only; bytes values need a separate isinstance(value, bytes) check
if isinstance(value, str):

# BEFORE
return reduce(lambda a, b: (a << 8) + ord(b), string, 0L)

# AFTER - `string` must be a bytes object; iterating bytes yields ints, so ord() is dropped
from functools import reduce
return reduce(lambda a, b: (a << 8) + b, string, 0)

# BEFORE
mode = stat.S_IFDIR | 0755

# AFTER
mode = stat.S_IFDIR | 0o755
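
The reduce expression above folds a byte string into a big-endian integer. As a hedged aside, Python 3's built-in `int.from_bytes` appears to cover the same case and may be a simpler replacement. The sketch below compares the two; the function names `be_int_reduce` and `be_int_builtin` are made up for illustration and do not exist in beetfs.

```python
from functools import reduce


def be_int_reduce(data: bytes) -> int:
    """Fold bytes into an integer, most significant byte first (the migrated form)."""
    return reduce(lambda acc, byte: (acc << 8) + byte, data, 0)


def be_int_builtin(data: bytes) -> int:
    """Same conversion using the standard library."""
    return int.from_bytes(data, "big")


sample = b"\x00\x01\x02"
# Both forms agree on the sample input.
assert be_int_reduce(sample) == be_int_builtin(sample) == 258
```

Either form requires a bytes argument; passing a str to the reduce version fails because str elements cannot be added to an int.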

1.2 Fix String/Bytes Handling

# BEFORE - implicit string/bytes mixing
self.header = self.inf.get_header(self.real_path)
return self.header[offset:offset+size]

# AFTER - explicit bytes handling
self.header: bytes = self.inf.get_header(self.real_path)
return self.header[offset:offset+size]

# BEFORE
self.item.title = str(self.inf["title"][0]).encode('utf-8')

# AFTER
self.item.title = self.inf["title"][0]  # Already str in Python 3
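
During the migration, tag values may still arrive as either bytes or str depending on which library produced them. One way to keep that decision in a single place is a small normalizing helper. This is a minimal sketch, assuming UTF-8 is the right default; `to_text` is a hypothetical name, not an existing beetfs or mutagen function.

```python
def to_text(value, encoding: str = "utf-8") -> str:
    """Return `value` as str, decoding bytes and stringifying anything else."""
    if isinstance(value, bytes):
        # errors="replace" keeps a badly encoded tag from raising mid-read.
        return value.decode(encoding, errors="replace")
    return str(value)


print(to_text(b"caf\xc3\xa9"))
print(to_text("already text"))
```

Header generation can then encode to bytes exactly once, at the point where the header is serialized.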

1.3 Fix Dictionary Methods

# BEFORE (Python 2: keys() returns a list)
return node.dirs.keys()

# AFTER (Python 3: keys() returns a view object)
return list(node.dirs.keys())  # If a list is needed (indexing, snapshot)
# or just
return node.dirs.keys()  # If iteration is sufficient
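
The choice between the two forms matters because a Python 3 keys view stays linked to the dictionary, while `list(...)` takes a snapshot. The short sketch below shows the difference; the dictionary contents are invented for illustration.

```python
dirs = {"Artist": {}, "Other": {}}

view = dirs.keys()            # live view of the dictionary's keys
snapshot = list(dirs.keys())  # independent copy taken now

dirs["New"] = {}              # mutate after both were created

assert "New" in view          # the view reflects the change
assert "New" not in snapshot  # the list does not
```

A snapshot is the safer return value when the tree can change while a caller is still iterating over a directory listing.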

Phase 2: FUSE Library Migration

Option A: pyfuse3

Modern, async-capable FUSE bindings.

# BEFORE (fuse-python)
import fuse
fuse.fuse_python_api = (0, 2)

class beetFileSystem(fuse.Fuse):
    def read(self, path, size, offset):
        return data  # placeholder: bytes for the requested range

# AFTER (pyfuse3)
import sys

import pyfuse3
import trio

class BeetFS(pyfuse3.Operations):
    async def read(self, fh, offset, size):
        return data  # placeholder: bytes for the requested range

async def main(mountpoint):
    fs = BeetFS()
    fuse_options = set(pyfuse3.default_options)
    fuse_options.add('fsname=beetfs')
    pyfuse3.init(fs, mountpoint, fuse_options)
    try:
        await pyfuse3.main()
    finally:
        pyfuse3.close()

trio.run(main, sys.argv[1])  # mountpoint taken from the command line

Key Differences:

| fuse-python | pyfuse3 |
| --- | --- |
| read(path, size, offset) | read(fh, offset, size) |
| Synchronous | Async (trio) |
| Return data directly | Return bytes |
| Path-based | File handle based |
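
Moving from path-based to handle-based calls means the filesystem has to remember which object belongs to each integer handle it gives out. A minimal sketch of such a table follows. `HandleTable` and its method names are hypothetical and library-independent; how the handle is returned to the FUSE binding is not shown here.

```python
import itertools


class HandleTable:
    """Map integer file handles to per-open state (hypothetical helper)."""

    def __init__(self):
        self._next = itertools.count(1)  # handles start at 1
        self._open = {}

    def open(self, obj) -> int:
        """Register `obj` and return a fresh handle for it."""
        fh = next(self._next)
        self._open[fh] = obj
        return fh

    def get(self, fh: int):
        """Look up the object for `fh`; raises KeyError for unknown handles."""
        return self._open[fh]

    def release(self, fh: int) -> None:
        """Forget `fh`; releasing twice is harmless."""
        self._open.pop(fh, None)


table = HandleTable()
fh = table.open("state for /Artist/Album/track.flac")
print(fh, table.get(fh))
table.release(fh)
```

With a table like this, read(fh, offset, size) can fetch the per-file state directly instead of resolving the path on every call.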

Option B: llfuse (Alternative)

Lower-level, synchronous.

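import llfuse

class BeetFS(llfuse.Operations):
    def read(self, fh, offset, size):
        return data  # placeholder: bytes for the requested range

def main(mountpoint):
    fs = BeetFS()
    options = set(llfuse.default_options)
    options.add('fsname=beetfs')
    llfuse.init(fs, mountpoint, options)
    try:
        llfuse.main()
    finally:
        llfuse.close()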

Option C: fusepy (Simple)

Simple wrapper, but less maintained.

from fuse import FUSE, Operations

class BeetFS(Operations):
    def read(self, path, size, offset, fh):
        return data

FUSE(BeetFS(), mountpoint, foreground=True)

Phase 3: Architecture Improvements

3.1 Remove Global State

# BEFORE - Global variables
global structure_split
global structure_depth
global library
global directory_structure

# AFTER - Instance variables
class BeetFS:
    def __init__(self, lib: Library, path_format: str):
        self.lib = lib
        self.path_format = path_format
        self.structure_split = path_format.split("/")
        self.structure_depth = len(self.structure_split)
        self.directory_structure = FSNode({}, {})
        self._build_tree()

3.2 Reduce Memory Usage

# BEFORE - Load entire audio into memory
self.music_data = self.file_object.read()  # Could be 100MB+

# AFTER - Lazy loading: build the header once, seek into the file for audio
class FileHandler:
    def __init__(self, path, lib):
        self.real_path = self._resolve_path(path)
        self.file_object = open(self.real_path, 'rb')
        self._header = None  # Lazy load
        # Offset of the first audio byte in the real file.
        # _generate_header() must set this before any audio read.
        self._music_offset = None

    @property
    def header(self) -> bytes:
        if self._header is None:
            self._header = self._generate_header()
        return self._header

    def read(self, size: int, offset: int) -> bytes:
        if offset < len(self.header):
            # Header region - return from generated header
            if offset + size <= len(self.header):
                return self.header[offset:offset+size]
            else:
                # Read spans the end of the header and the start of the audio
                header_part = self.header[offset:]
                audio_offset = 0
                audio_size = size - len(header_part)
                audio_part = self._read_audio(audio_offset, audio_size)
                return header_part + audio_part
        else:
            # Audio region - read directly from file
            audio_offset = offset - len(self.header)
            return self._read_audio(audio_offset, size)

    def _read_audio(self, offset: int, size: int) -> bytes:
        # Only called from read(), which has already generated the header
        self.file_object.seek(self._music_offset + offset)
        return self.file_object.read(size)
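
The read logic above can be exercised without FUSE or a real audio file. The following is a minimal sketch under stated assumptions: the class name `SplicedFile` is hypothetical, the header is passed in ready-made rather than generated, and an `io.BytesIO` stands in for the file on disk.

```python
import io


class SplicedFile:
    """Serve `header` followed by the audio bytes of `source`, read lazily."""

    def __init__(self, header: bytes, source, music_offset: int):
        self.header = header
        self.source = source              # any seekable binary file object
        self.music_offset = music_offset  # where audio starts in `source`

    def read(self, size: int, offset: int) -> bytes:
        out = b""
        if offset < len(self.header):
            # Take whatever part of the request falls inside the header.
            out = self.header[offset:offset + size]
        remaining = size - len(out)
        if remaining > 0:
            # The rest comes from the audio region of the original file.
            audio_offset = max(0, offset - len(self.header))
            self.source.seek(self.music_offset + audio_offset)
            out += self.source.read(remaining)
        return out


# Original file: 6-byte old header followed by 10 bytes of "audio".
original = io.BytesIO(b"OLDHDR" + b"0123456789")
spliced = SplicedFile(header=b"NEW", source=original, music_offset=6)

assert spliced.read(5, 0) == b"NEW01"    # spans header and audio
assert spliced.read(4, 3) == b"0123"     # starts exactly at the audio
assert spliced.read(100, 11) == b"89"    # short read at end of file
```

Only the requested range is ever read from the source, so memory use no longer depends on the size of the audio file.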

3.3 Add Type Hints

from typing import Dict, List, Optional

class FSNode:
    def __init__(self, dirs: Dict[str, 'FSNode'], files: Dict[str, int]):
        self.dirs: Dict[str, FSNode] = dirs
        self.files: Dict[str, int] = files
    
    def getnode(self, elements: List[str], root: Optional['FSNode'] = None) -> 'FSNode':
        ...
    
    def addfile(self, elements: List[str], filename: str, item_id: int) -> None:
        ...
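
The method bodies are elided above. For reference while writing tests, here is one possible way to fill them in. It is a sketch, not the original beetfs implementation: the behavior on missing directories (a KeyError) and the `adddir` signature are assumptions inferred from how the methods are called elsewhere in this guide.

```python
from typing import Dict, List, Optional


class FSNode:
    def __init__(self, dirs: Dict[str, "FSNode"], files: Dict[str, int]):
        self.dirs: Dict[str, "FSNode"] = dirs
        self.files: Dict[str, int] = files

    def getnode(self, elements: List[str], root: Optional["FSNode"] = None) -> "FSNode":
        """Walk `elements` down from `root` (default: this node)."""
        node = self if root is None else root
        for name in elements:
            node = node.dirs[name]  # assumption: KeyError if a directory is missing
        return node

    def adddir(self, elements: List[str], name: str) -> None:
        """Create directory `name` under the path `elements` if absent."""
        self.getnode(elements).dirs.setdefault(name, FSNode({}, {}))

    def addfile(self, elements: List[str], filename: str, item_id: int) -> None:
        """Register `filename` under the path `elements`."""
        self.getnode(elements).files[filename] = item_id


root = FSNode({}, {})
root.adddir([], "Artist")
root.adddir(["Artist"], "Album")
root.addfile(["Artist", "Album"], "track.flac", 42)
print(root.getnode(["Artist", "Album"]).files)
```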

3.4 Add MP3 Support

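from pathlib import Path

class FileHandler:
    def __init__(self, path: str, lib: Library):
        self.format = Path(path).suffix[1:].lower()
        # real_path and item must be set before a handler is constructed.
        # _resolve_path and _resolve_item are placeholder helpers (not shown).
        self.real_path = self._resolve_path(path)
        self.item = self._resolve_item(path, lib)

        if self.format == "flac":
            self._handler = FLACHandler(self.real_path, self.item)
        elif self.format == "mp3":
            self._handler = MP3Handler(self.real_path, self.item)
        elif self.format in ("ogg", "opus"):
            self._handler = OggHandler(self.real_path, self.item)
        else:
            raise UnsupportedFormatError(f"Format {self.format} not supported")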

class FLACHandler:
    def generate_header(self, item: Item) -> bytes:
        inf = InterpolatedFLAC(self.file_data)
        inf["title"] = item.title
        inf["album"] = item.album
        inf["artist"] = item.artist
        inf["genre"] = item.genre
        return inf.get_header()

from mutagen.id3 import TALB, TCON, TIT2, TPE1

class MP3Handler:
    def generate_header(self, item: Item) -> bytes:
        # Implement ID3v2 header generation.
        # InterpolatedID3 is still to be written, analogous to InterpolatedFLAC.
        id3 = InterpolatedID3()
        id3.add(TIT2(encoding=3, text=item.title))   # encoding=3 is UTF-8
        id3.add(TPE1(encoding=3, text=item.artist))
        id3.add(TALB(encoding=3, text=item.album))
        id3.add(TCON(encoding=3, text=item.genre))

        # Calculate padding to match original header size
        ...
        return id3.render()
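
The padding step left open above depends on how ID3v2 stores the tag size: a 10-byte header whose last four bytes hold the size of everything after the header as a "syncsafe" integer (7 usable bits per byte). The sketch below shows that arithmetic only. The function names are hypothetical, the frame bytes are taken as already rendered, and flags, extended headers, and footers are ignored.

```python
def syncsafe_encode(n: int) -> bytes:
    """Encode n as a 4-byte syncsafe integer (high bit of each byte is 0)."""
    if not 0 <= n < (1 << 28):
        raise ValueError("value does not fit in 28 bits")
    return bytes((n >> shift) & 0x7F for shift in (21, 14, 7, 0))


def syncsafe_decode(b: bytes) -> int:
    """Decode a 4-byte syncsafe integer."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def build_tag(frames: bytes, total_size: int) -> bytes:
    """Return an ID3v2.4 tag of exactly `total_size` bytes, header included."""
    body_size = total_size - 10  # the size field excludes the 10-byte header
    if len(frames) > body_size:
        raise ValueError("frames do not fit in the original tag size")
    padding = b"\x00" * (body_size - len(frames))
    header = b"ID3" + bytes([4, 0]) + b"\x00" + syncsafe_encode(body_size)
    return header + frames + padding


tag = build_tag(b"frame-bytes", total_size=64)
assert len(tag) == 64
assert syncsafe_decode(tag[6:10]) == 54
```

If the new frames are larger than the original tag, padding cannot absorb the difference, and every audio offset in the virtual file has to shift instead.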

Phase 4: Testing

4.1 Unit Tests

import pytest
from beetfs import FSNode, FileHandler

class TestFSNode:
    def test_adddir(self):
        root = FSNode({}, {})
        root.adddir([], "Artist")
        assert "Artist" in root.dirs
    
    def test_addfile(self):
        root = FSNode({}, {})
        root.adddir([], "Artist")
        root.addfile(["Artist"], "track.flac", 42)
        assert root.dirs["Artist"].files["track.flac"] == 42
    
    def test_getnode(self):
        root = FSNode({}, {})
        root.adddir([], "Artist")
        root.adddir(["Artist"], "Album")
        node = root.getnode(["Artist", "Album"])
        assert node is not None

class TestFileHandler:
    # mock_lib: fixture assumed to provide a beets Library stub
    def test_read_header(self, mock_flac_file, mock_beets_item, mock_lib):
        handler = FileHandler("/Artist/Album/track.flac", mock_lib)
        data = handler.read(100, 0)
        assert data.startswith(b"fLaC")

    def test_read_audio(self, mock_flac_file, mock_beets_item, mock_lib):
        handler = FileHandler("/Artist/Album/track.flac", mock_lib)
        # Skip the generated header, then start 100 bytes into the audio region
        data = handler.read(100, len(handler.header) + 100)
        # Should be audio data from original file
        assert data == mock_flac_file.audio_data[100:200]

4.2 Integration Tests

import os
import subprocess
import tempfile
import time

class TestFUSEMount:
    def test_mount_unmount(self, beets_library):
        with tempfile.TemporaryDirectory() as mountpoint:
            # Mount
            proc = subprocess.Popen(
                ["beet", "mount", mountpoint],
                stdout=subprocess.PIPE
            )
            try:
                time.sleep(1)  # Give the mount time to come up

                # Verify mount
                assert os.path.ismount(mountpoint)

                # List files
                files = os.listdir(mountpoint)
                assert len(files) > 0
            finally:
                # Unmount even if an assertion fails, so the temp dir can be removed
                subprocess.run(["fusermount", "-u", mountpoint])
                proc.wait()
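
A fixed one-second sleep can be too short on a loaded machine and wastes time on a fast one. Polling for the condition with a timeout is a common alternative. This is a small sketch; `wait_until` is a hypothetical helper, and the default timeout and interval are arbitrary.

```python
import time


def wait_until(predicate, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Call `predicate` repeatedly until it returns true or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Trivial demonstration with a condition that becomes true on the third call.
calls = []
assert wait_until(lambda: calls.append(1) or len(calls) >= 3)
```

In the mount test this would read `assert wait_until(lambda: os.path.ismount(mountpoint))` in place of the sleep and the bare assertion.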

Phase 5: Standalone Mode (Optional)

Remove beets dependency for use as standalone metadata overlay.

import sqlite3
from pathlib import Path

class StandaloneFS:
    """Metadata overlay without beets dependency."""

    def __init__(self,
                 source_dir: Path,
                 metadata_db: Path,
                 path_format: str):
        self.source_dir = source_dir
        self.db = sqlite3.connect(metadata_db)
        self.path_format = path_format
        self.directory_structure = FSNode({}, {})  # Must exist before _build_tree()
        self._build_tree()

    def _build_tree(self):
        """Build virtual tree from source directory and metadata DB."""
        for audio_file in self.source_dir.rglob("*.flac"):
            # Get metadata from DB or scan file
            metadata = self._get_metadata(audio_file)
            # Build virtual path from template
            virtual_path = self._format_path(metadata)
            # Add to tree
            self.directory_structure.addfile(
                list(virtual_path.parent.parts),
                virtual_path.name,
                str(audio_file)  # Store actual path instead of ID
            )
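
The `_format_path` step is not shown above. One possible shape for it, assuming a `$field` template syntax and a plain dict of metadata, uses `string.Template` from the standard library. The function name, the template syntax, and the slash replacement rule are all assumptions made for this sketch.

```python
from pathlib import PurePosixPath
from string import Template


def format_path(path_format: str, metadata: dict, suffix: str = ".flac") -> PurePosixPath:
    """Render a virtual path such as '$artist/$album/$title' from metadata."""
    # A slash inside a tag value would create an extra directory level.
    safe = {key: str(value).replace("/", "_") for key, value in metadata.items()}
    rendered = Template(path_format).substitute(safe)  # KeyError on a missing field
    return PurePosixPath(rendered + suffix)


path = format_path(
    "$artist/$album/$title",
    {"artist": "AC/DC", "album": "Back in Black", "title": "Hells Bells"},
)
assert path.parent.parts == ("AC_DC", "Back in Black")
assert path.name == "Hells Bells.flac"
```

The `parent.parts` and `name` of the result map directly onto the directory list and filename that `addfile` expects.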

Migration Checklist

1. [ ] Fork and set up development environment
2. [ ] Add type hints throughout (helps catch issues)
3. [ ] Fix Python 3 syntax issues
4. [ ] Replace fuse-python with pyfuse3/llfuse
5. [ ] Add unit tests for FSNode and FileHandler
6. [ ] Refactor global state to instance variables
7. [ ] Implement lazy loading for audio data
8. [ ] Add MP3 support
9. [ ] Add integration tests
10. [ ] Optional: Create standalone mode

Estimated Effort

| Phase | Effort | Risk |
| --- | --- | --- |
| Phase 1 (Python 3) | 2-3 days | Low |
| Phase 2 (FUSE migration) | 3-5 days | Medium |
| Phase 3 (Architecture) | 3-5 days | Medium |
| Phase 4 (Testing) | 2-3 days | Low |
| Phase 5 (Standalone) | 3-5 days | Medium |
| Total | 13-21 days | |

Alternative: Rewrite from Scratch

Given the age of the codebase, a rewrite might be more efficient:

Pros of Rewrite:

  • Clean architecture from start
  • Modern async design
  • Better memory management
  • Easier to test

Cons of Rewrite:

  • More initial effort
  • Risk of missing edge cases
  • Need to re-discover FLAC/ID3 intricacies

Recommended Approach: Start with Phases 1-2 to understand the code in depth, then decide whether to continue refactoring or to rewrite.