Week 9: Smart Features
Phase: 3 (Search & Smart Features)
Prerequisites: Week 8 (Search Index)
Estimated effort: 5 days
Objective
Implement smart collections (query-based virtual folders), cover art extraction with thumbnails, and intelligent prefetching based on access patterns. These features transform MusicFS from a basic filesystem into an intelligent music library.
Architecture Reference
From architecture.md section 4.3.6 (Data Schema):
CREATE TABLE artwork (
id INTEGER PRIMARY KEY,
file_id INTEGER REFERENCES files(id),
art_type TEXT, -- 'front', 'back'
chunk_hash TEXT, -- reference to CAS
width INTEGER,
height INTEGER,
UNIQUE(file_id, art_type)
);
CREATE TABLE collections (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE,
query_json TEXT, -- smart collection query
created_at INTEGER
);
From architecture.md section 3.2.5:
| Cache hit rate (warm) | >95% | Derived |
| Deduplication ratio | >10% typical | FR-20 |
Requirements Covered
| ID | Requirement | Priority |
|---|---|---|
| FR-15.1 | Support query-based virtual folders | P1 |
| FR-15.2 | Support saved searches as directories | P1 |
| FR-15.3 | Support dynamic playlists (recently played, most played) | P1 |
| FR-15.4 | Support user-defined metadata fields | P1 (DEFER) |
| FR-16.1 | Extract embedded album art | P1 |
| FR-16.2 | Expose art as virtual files (cover.jpg) | P1 |
| FR-16.3 | Cache artwork separately from audio | P1 |
| FR-16.4 | Support multiple art sizes (thumbnail, medium, full) | P1 |
| FR-19.1 | Learn access patterns | P1 |
| FR-19.2 | Support playlist-aware prefetching | P1 |
| FR-19.3 | Support time-based prefetching | P1 |
| FR-19.4 | Support manual prefetch hints (/.prefetch/) | P1 |
Note: FR-15.4 (user-defined metadata) deferred to plugin system (Phase 4).
Deliverables
| Task | Crate | Files | Est. |
|---|---|---|---|
| Smart collections | musicfs-search | collections.rs | 1d |
| Collection virtual dirs | musicfs-fuse | ops/collections.rs | 0.5d |
| Artwork extractor | musicfs-metadata | artwork.rs | 1d |
| Artwork cache (CAS) | musicfs-cache | artwork.rs | 0.5d |
| Prefetch engine | musicfs-cache | prefetch.rs | 1d |
| Access pattern tracker | musicfs-cache | patterns.rs | 0.5d |
| Prefetch virtual dir | musicfs-fuse | ops/prefetch.rs | 0.5d |
| API Documentation | docs | api/smart-features.md | 0.5d |
| Integration tests | tests | smart_features.rs | 0.5d |
Task 1: Smart Collections
1.1 Create musicfs-search/src/collections.rs
use musicfs_core::FileId;
use serde::{Deserialize, Serialize};
use std::time::{Duration, SystemTime};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SmartCollection {
pub id: i64,
pub name: String,
pub query: CollectionQuery,
pub created_at: SystemTime,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum CollectionQuery {
/// Match field against pattern
Match {
field: String,
pattern: String,
},
/// Date range (e.g., year between 1980-1989)
DateRange {
field: String,
start: i32,
end: i32,
},
/// Recently added files
RecentlyAdded {
days: u32,
},
/// Recently played files
RecentlyPlayed {
days: u32,
},
/// Most played files
MostPlayed {
limit: u32,
},
/// Genre-based collection
Genre {
genre: String,
},
/// Compound query (AND/OR)
Compound {
op: BoolOp,
children: Vec<CollectionQuery>,
},
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum BoolOp {
And,
Or,
}
impl CollectionQuery {
pub fn to_tantivy_query(&self) -> String {
match self {
CollectionQuery::Match { field, pattern } => {
format!("{}:{}", field, pattern)
}
CollectionQuery::DateRange { field, start, end } => {
format!("{}:[{} TO {}]", field, start, end)
}
CollectionQuery::Genre { genre } => {
format!("genre:{}", genre)
}
CollectionQuery::Compound { op, children } => {
let sep = match op {
BoolOp::And => " AND ",
BoolOp::Or => " OR ",
};
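let parts: Vec<_> = children.iter()
.map(|c| c.to_tantivy_query())
// Dynamic children render as an empty string; skip them so the
// query never contains an empty "()" clause
.filter(|q| !q.is_empty())
.map(|q| format!("({})", q))
.collect();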
parts.join(sep)
}
// Dynamic queries handled separately
_ => String::new(),
}
}
pub fn is_dynamic(&self) -> bool {
matches!(
self,
CollectionQuery::RecentlyAdded { .. }
| CollectionQuery::RecentlyPlayed { .. }
| CollectionQuery::MostPlayed { .. }
)
}
}
pub struct CollectionStore {
db: rusqlite::Connection,
}
impl CollectionStore {
pub fn new(db_path: &std::path::Path) -> Result<Self, CollectionError> {
let db = rusqlite::Connection::open(db_path)?;
db.execute(
"CREATE TABLE IF NOT EXISTS collections (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE NOT NULL,
query_json TEXT NOT NULL,
created_at INTEGER NOT NULL
)",
[],
)?;
Ok(Self { db })
}
pub fn create(&mut self, name: &str, query: CollectionQuery) -> Result<SmartCollection, CollectionError> {
let query_json = serde_json::to_string(&query)?;
let now = SystemTime::now()
.duration_since(SystemTime::UNIX_EPOCH)
.unwrap()
.as_secs() as i64;
self.db.execute(
"INSERT INTO collections (name, query_json, created_at) VALUES (?1, ?2, ?3)",
rusqlite::params![name, query_json, now],
)?;
let id = self.db.last_insert_rowid();
Ok(SmartCollection {
id,
name: name.to_string(),
query,
created_at: SystemTime::UNIX_EPOCH + Duration::from_secs(now as u64),
})
}
pub fn list(&self) -> Result<Vec<SmartCollection>, CollectionError> {
let mut stmt = self.db.prepare(
"SELECT id, name, query_json, created_at FROM collections"
)?;
let collections = stmt.query_map([], |row| {
let query_json: String = row.get(2)?;
let created_secs: i64 = row.get(3)?;
Ok(SmartCollection {
id: row.get(0)?,
name: row.get(1)?,
query: serde_json::from_str(&query_json).unwrap_or(CollectionQuery::Match {
field: "title".to_string(),
pattern: "*".to_string(),
}),
created_at: SystemTime::UNIX_EPOCH + Duration::from_secs(created_secs as u64),
})
})?;
collections.collect::<Result<Vec<_>, _>>().map_err(CollectionError::from)
}
pub fn delete(&mut self, name: &str) -> Result<(), CollectionError> {
self.db.execute("DELETE FROM collections WHERE name = ?1", [name])?;
Ok(())
}
}
#[derive(Debug, thiserror::Error)]
pub enum CollectionError {
#[error("database error: {0}")]
Database(#[from] rusqlite::Error),
#[error("serialization error: {0}")]
Serialization(#[from] serde_json::Error),
}
/// Built-in collections
pub fn builtin_collections() -> Vec<SmartCollection> {
vec![
SmartCollection {
id: -1,
name: "Recently Added".to_string(),
query: CollectionQuery::RecentlyAdded { days: 30 },
created_at: SystemTime::UNIX_EPOCH,
},
SmartCollection {
id: -2,
name: "80s Music".to_string(),
query: CollectionQuery::DateRange {
field: "year".to_string(),
start: 1980,
end: 1989,
},
created_at: SystemTime::UNIX_EPOCH,
},
SmartCollection {
id: -3,
name: "90s Music".to_string(),
query: CollectionQuery::DateRange {
field: "year".to_string(),
start: 1990,
end: 1999,
},
created_at: SystemTime::UNIX_EPOCH,
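},
// Dynamic playlists required by FR-15.3; resolved from access_log
SmartCollection {
id: -4,
name: "Recently Played".to_string(),
query: CollectionQuery::RecentlyPlayed { days: 7 },
created_at: SystemTime::UNIX_EPOCH,
},
SmartCollection {
id: -5,
name: "Most Played".to_string(),
query: CollectionQuery::MostPlayed { limit: 100 },
created_at: SystemTime::UNIX_EPOCH,
},
]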
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::TempDir;
#[test]
fn test_collection_crud() {
let dir = TempDir::new().unwrap();
let db_path = dir.path().join("collections.db");
let mut store = CollectionStore::new(&db_path).unwrap();
let collection = store.create(
"Jazz",
CollectionQuery::Genre { genre: "Jazz".to_string() },
).unwrap();
assert_eq!(collection.name, "Jazz");
let collections = store.list().unwrap();
assert_eq!(collections.len(), 1);
store.delete("Jazz").unwrap();
let collections = store.list().unwrap();
assert_eq!(collections.len(), 0);
}
#[test]
fn test_compound_query() {
let query = CollectionQuery::Compound {
op: BoolOp::And,
children: vec![
CollectionQuery::Genre { genre: "Metal".to_string() },
CollectionQuery::DateRange {
field: "year".to_string(),
start: 1980,
end: 1989,
},
],
};
let tantivy_query = query.to_tantivy_query();
assert!(tantivy_query.contains("genre:Metal"));
assert!(tantivy_query.contains("year:[1980 TO 1989]"));
assert!(tantivy_query.contains(" AND "));
}
}
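`to_tantivy_query` interpolates field values directly into the query string, so a genre such as `Hip Hop` would be parsed as two separate terms. The sketch below shows one way to quote values first. `quote_term` and `genre_query` are hypothetical helpers that are not part of this plan, and the sketch assumes the Tantivy query parser treats a double-quoted value as a single phrase and accepts backslash-escaped quotes inside it; verify this against the Tantivy version in use.

```rust
/// Wrap a value in double quotes, escaping embedded quotes and backslashes.
/// Hypothetical helper; the escaping rules are an assumption about the parser.
pub fn quote_term(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('"');
    for ch in value.chars() {
        if ch == '"' || ch == '\\' {
            out.push('\\');
        }
        out.push(ch);
    }
    out.push('"');
    out
}

/// Build a genre clause with the value quoted.
pub fn genre_query(genre: &str) -> String {
    format!("genre:{}", quote_term(genre))
}
```

Adopting this would change the generated string (for example `genre:"Metal"` instead of `genre:Metal`), so the assertions in `test_compound_query` would need to follow.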
Task 2: Artwork Extraction
2.1 Add dependencies to musicfs-metadata/Cargo.toml
[dependencies]
image = { version = "0.24", default-features = false, features = ["jpeg", "png"] }
2.2 Create musicfs-metadata/src/artwork.rs
use image::{DynamicImage, ImageFormat};
use std::io::Cursor;
use symphonia::core::meta::Visual;
use tracing::debug;
#[derive(Debug, Clone)]
pub struct Artwork {
pub art_type: ArtType,
pub mime_type: String,
pub width: u32,
pub height: u32,
pub data: Vec<u8>,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArtType {
Front,
Back,
Other,
}
#[derive(Debug, Clone, Copy)]
pub enum ArtSize {
Thumbnail, // 150x150
Medium, // 300x300
Full, // Original
}
impl ArtSize {
pub fn max_dimension(&self) -> Option<u32> {
match self {
ArtSize::Thumbnail => Some(150),
ArtSize::Medium => Some(300),
ArtSize::Full => None,
}
}
}
pub struct ArtworkExtractor;
impl ArtworkExtractor {
pub fn extract_from_visual(visual: &Visual) -> Option<Artwork> {
let data = visual.data.to_vec();
let img = image::load_from_memory(&data).ok()?;
let art_type = match visual.usage {
Some(symphonia::core::meta::StandardVisualKey::FrontCover) => ArtType::Front,
Some(symphonia::core::meta::StandardVisualKey::BackCover) => ArtType::Back,
_ => ArtType::Other,
};
let mime_type = visual.media_type.clone()
.unwrap_or_else(|| "image/jpeg".to_string());
Some(Artwork {
art_type,
mime_type,
width: img.width(),
height: img.height(),
data,
})
}
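pub fn resize(artwork: &Artwork, size: ArtSize) -> Option<Artwork> {
// ArtSize::Full has no size cap: return the original rather than None
let max_dim = match size.max_dimension() {
Some(dim) => dim,
None => return Some(artwork.clone()),
};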
if artwork.width <= max_dim && artwork.height <= max_dim {
return Some(artwork.clone());
}
let img = image::load_from_memory(&artwork.data).ok()?;
let resized = img.thumbnail(max_dim, max_dim);
let mut output = Vec::new();
let mut cursor = Cursor::new(&mut output);
resized.write_to(&mut cursor, ImageFormat::Jpeg).ok()?;
debug!(
"Resized artwork from {}x{} to {}x{}",
artwork.width, artwork.height,
resized.width(), resized.height()
);
Some(Artwork {
art_type: artwork.art_type,
mime_type: "image/jpeg".to_string(),
width: resized.width(),
height: resized.height(),
data: output,
})
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_art_size_dimensions() {
assert_eq!(ArtSize::Thumbnail.max_dimension(), Some(150));
assert_eq!(ArtSize::Medium.max_dimension(), Some(300));
assert_eq!(ArtSize::Full.max_dimension(), None);
}
}
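Both `Thumbnail` and `Medium` cap the longer edge and keep the aspect ratio, and images already within the cap are returned unchanged. The arithmetic can be sketched without the `image` crate as follows. `fit_within` is a hypothetical helper; its rounding (floor) is an assumption and may differ by a pixel from what `DynamicImage::thumbnail` produces.

```rust
/// Scale (width, height) down so that neither side exceeds `max_dim`,
/// preserving the aspect ratio. Images already within the cap are unchanged.
pub fn fit_within(width: u32, height: u32, max_dim: u32) -> (u32, u32) {
    if width <= max_dim && height <= max_dim {
        return (width, height);
    }
    // Use u64 so the intermediate product cannot overflow.
    let (w, h, m) = (u64::from(width), u64::from(height), u64::from(max_dim));
    if w >= h {
        // Landscape or square: the width becomes the cap.
        (max_dim, ((h * m) / w).max(1) as u32)
    } else {
        // Portrait: the height becomes the cap.
        (((w * m) / h).max(1) as u32, max_dim)
    }
}
```

For a 3000x2000 cover, `fit_within(3000, 2000, 300)` gives a 300x200 medium image, while a 100x50 image is left as it is.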
2.3 Create musicfs-cache/src/artwork.rs
use musicfs_core::ChunkHash;
use musicfs_metadata::artwork::{ArtSize, Artwork};
use crate::CasStore;
use std::sync::Arc;
use tracing::debug;
pub struct ArtworkCache {
store: Arc<CasStore>,
db: rusqlite::Connection,
}
#[derive(Debug)]
pub struct CachedArtwork {
pub file_id: i64,
pub art_type: String,
pub chunk_hash: ChunkHash,
pub width: u32,
pub height: u32,
}
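/// Oracle fix: cap the encoded input size to limit memory spikes.
/// This bounds the bytes read from the file, not the decoded bitmap
/// (a 3000x3000 RGBA image decodes to ~36MB regardless of file size).
const MAX_ARTWORK_INPUT_SIZE: usize = 10 * 1024 * 1024; // 10MB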
impl ArtworkCache {
pub fn new(store: Arc<CasStore>, db_path: &std::path::Path) -> Result<Self, ArtworkError> {
let db = rusqlite::Connection::open(db_path)?;
// Oracle fix: Schema matches architecture.md 4.3.6 exactly
// Only store full-size artwork, generate thumbnail/medium on-demand
db.execute(
"CREATE TABLE IF NOT EXISTS artwork (
id INTEGER PRIMARY KEY,
file_id INTEGER NOT NULL REFERENCES files(id),
art_type TEXT NOT NULL,
chunk_hash TEXT NOT NULL,
width INTEGER NOT NULL,
height INTEGER NOT NULL,
UNIQUE(file_id, art_type)
)",
[],
)?;
Ok(Self { store, db })
}
/// Store full-size artwork only (Oracle fix: no size column)
/// Thumbnail/medium generated on-demand with in-memory LRU
pub async fn store(&self, file_id: i64, artwork: &Artwork) -> Result<ChunkHash, ArtworkError> {
// Oracle fix: Reject oversized images to prevent memory spikes
if artwork.data.len() > MAX_ARTWORK_INPUT_SIZE {
return Err(ArtworkError::ImageTooLarge(artwork.data.len()));
}
let hash = self.store.put(&artwork.data).await?;
let art_type_str = match artwork.art_type {
musicfs_metadata::artwork::ArtType::Front => "front",
musicfs_metadata::artwork::ArtType::Back => "back",
musicfs_metadata::artwork::ArtType::Other => "other",
};
// Oracle fix: Use spawn_blocking for rusqlite in async context
let db_path = self.db.path().map(|p| p.to_path_buf());
let file_id_clone = file_id;
let art_type_clone = art_type_str.to_string();
let hash_hex = hash.to_hex();
let width = artwork.width;
let height = artwork.height;
tokio::task::spawn_blocking(move || {
let db = rusqlite::Connection::open(db_path.unwrap())?;
db.execute(
"INSERT OR REPLACE INTO artwork
(file_id, art_type, chunk_hash, width, height)
VALUES (?1, ?2, ?3, ?4, ?5)",
rusqlite::params![file_id_clone, art_type_clone, hash_hex, width, height],
)?;
Ok::<_, ArtworkError>(())
}).await.map_err(|e| ArtworkError::SpawnBlocking(e.to_string()))??;
debug!("Cached artwork for file {}", file_id);
Ok(hash)
}
/// Get full-size artwork, optionally resize on-demand
pub async fn get(&self, file_id: i64, art_type: &str, size: ArtSize) -> Result<Option<Vec<u8>>, ArtworkError> {
// Oracle fix: Use spawn_blocking for rusqlite
let db_path = self.db.path().map(|p| p.to_path_buf());
let file_id_clone = file_id;
let art_type_clone = art_type.to_string();
let hash_hex: Option<String> = tokio::task::spawn_blocking(move || {
let db = rusqlite::Connection::open(db_path.unwrap())?;
db.query_row(
"SELECT chunk_hash FROM artwork WHERE file_id = ?1 AND art_type = ?2",
rusqlite::params![file_id_clone, art_type_clone],
|row| row.get(0),
).ok().ok_or(ArtworkError::NotFound)
}).await.map_err(|e| ArtworkError::SpawnBlocking(e.to_string()))?.ok();
match hash_hex {
Some(hex) => {
let hash = ChunkHash::from_hex(&hex).ok_or(ArtworkError::InvalidHash)?;
let data = self.store.get(&hash).await?;
// On-demand resize if not full size
match size {
ArtSize::Full => Ok(Some(data.to_vec())),
ArtSize::Thumbnail | ArtSize::Medium => {
// Resize on-demand (could add LRU cache here)
let resized = self.resize_on_demand(&data, size)?;
Ok(Some(resized))
}
}
}
None => Ok(None),
}
}
fn resize_on_demand(&self, data: &[u8], size: ArtSize) -> Result<Vec<u8>, ArtworkError> {
use image::ImageFormat;
use std::io::Cursor;
let max_dim = size.max_dimension().unwrap_or(300);
let img = image::load_from_memory(data).map_err(|_| ArtworkError::InvalidImage)?;
if img.width() <= max_dim && img.height() <= max_dim {
return Ok(data.to_vec());
}
let resized = img.thumbnail(max_dim, max_dim);
let mut output = Vec::new();
let mut cursor = Cursor::new(&mut output);
resized.write_to(&mut cursor, ImageFormat::Jpeg).map_err(|_| ArtworkError::ResizeFailed)?;
Ok(output)
}
}
#[derive(Debug, thiserror::Error)]
pub enum ArtworkError {
#[error("database error: {0}")]
Database(#[from] rusqlite::Error),
#[error("CAS error: {0}")]
Cas(#[from] crate::store::CasError),
#[error("invalid hash")]
InvalidHash,
#[error("artwork not found")]
NotFound,
#[error("image too large: {0} bytes (max 10MB)")]
ImageTooLarge(usize),
#[error("invalid image data")]
InvalidImage,
#[error("resize failed")]
ResizeFailed,
#[error("spawn_blocking error: {0}")]
SpawnBlocking(String),
}
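The 10MB limit in `store()` applies to the encoded bytes, while the memory used during a resize depends on the decoded bitmap: width × height × bytes per pixel. A minimal sketch of that arithmetic, assuming 4 bytes per pixel (RGBA8); `decoded_rgba_bytes` and `fits_decode_budget` are hypothetical helpers, and the budget value is left to the caller.

```rust
/// Bytes needed to hold a decoded image at 4 bytes per pixel (RGBA8).
pub fn decoded_rgba_bytes(width: u32, height: u32) -> u64 {
    u64::from(width) * u64::from(height) * 4
}

/// Hypothetical guard: true if the decoded image fits within `budget_bytes`.
pub fn fits_decode_budget(width: u32, height: u32, budget_bytes: u64) -> bool {
    decoded_rgba_bytes(width, height) <= budget_bytes
}
```

A 3000x3000 cover needs 36,000,000 bytes once decoded, which is where the "~36MB" figure comes from. A guard of this kind could run on the header dimensions before the full decode in `resize_on_demand`.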
Task 3: Prefetch Engine
3.1 Create musicfs-cache/src/patterns.rs
use musicfs_core::FileId;
use std::collections::HashMap;
use std::path::Path;
use std::time::{Duration, SystemTime, UNIX_EPOCH};
/// Oracle fix: Use SystemTime for persistence, not Instant
pub struct AccessPattern {
file_id: FileId,
timestamp: SystemTime,
context: AccessContext,
hour_of_day: u8, // For time-based prefetch (FR-19.3)
}
#[derive(Debug, Clone)]
pub struct AccessContext {
pub album_id: Option<i64>,
pub track_number: Option<u32>,
pub artist: Option<String>,
}
/// Oracle fix: Persistent pattern store with SQLite
pub struct PatternStore {
db: rusqlite::Connection,
/// In-memory cache for hot path
sequence_counts: parking_lot::RwLock<HashMap<(FileId, FileId), u32>>,
/// Time-based patterns for FR-19.3
time_patterns: parking_lot::RwLock<HashMap<u8, Vec<FileId>>>, // hour -> files
max_history: usize,
}
impl PatternStore {
pub fn new(db_path: &Path, max_history: usize) -> Result<Self, PatternError> {
let db = rusqlite::Connection::open(db_path)?;
// Oracle fix: Persist access log for RecentlyPlayed/MostPlayed queries
db.execute(
"CREATE TABLE IF NOT EXISTS access_log (
id INTEGER PRIMARY KEY,
file_id INTEGER NOT NULL,
access_time INTEGER NOT NULL,
hour_of_day INTEGER NOT NULL
)",
[],
)?;
db.execute(
"CREATE INDEX IF NOT EXISTS idx_access_log_file ON access_log(file_id)",
[],
)?;
db.execute(
"CREATE INDEX IF NOT EXISTS idx_access_log_time ON access_log(access_time)",
[],
)?;
// Sequence transitions table
db.execute(
"CREATE TABLE IF NOT EXISTS sequence_counts (
from_file_id INTEGER NOT NULL,
to_file_id INTEGER NOT NULL,
count INTEGER NOT NULL DEFAULT 1,
PRIMARY KEY (from_file_id, to_file_id)
)",
[],
)?;
// Load sequence counts into memory
let mut sequence_counts = HashMap::new();
let mut stmt = db.prepare("SELECT from_file_id, to_file_id, count FROM sequence_counts")?;
let rows = stmt.query_map([], |row| {
Ok(((FileId(row.get::<_, i64>(0)?), FileId(row.get::<_, i64>(1)?)), row.get::<_, u32>(2)?))
})?;
for row in rows {
let (key, count) = row?;
sequence_counts.insert(key, count);
}
Ok(Self {
db,
sequence_counts: parking_lot::RwLock::new(sequence_counts),
time_patterns: parking_lot::RwLock::new(HashMap::new()),
max_history,
})
}
pub fn record(&self, file_id: FileId, context: AccessContext) -> Result<(), PatternError> {
let now = SystemTime::now();
let timestamp = now.duration_since(UNIX_EPOCH).unwrap().as_secs() as i64;
// Hour of day in UTC (no timezone offset is applied here)
let hour = (timestamp / 3600 % 24) as u8;
// Persist to SQLite
self.db.execute(
"INSERT INTO access_log (file_id, access_time, hour_of_day) VALUES (?1, ?2, ?3)",
rusqlite::params![file_id.0, timestamp, hour],
)?;
// Update time patterns (FR-19.3)
{
let mut time_patterns = self.time_patterns.write();
time_patterns.entry(hour).or_default().push(file_id);
}
// Get previous access for sequence tracking
let prev_file_id: Option<i64> = self.db.query_row(
"SELECT file_id FROM access_log WHERE id = (SELECT MAX(id) - 1 FROM access_log)",
[],
|row| row.get(0),
).ok();
if let Some(prev_id) = prev_file_id {
let prev = FileId(prev_id);
// Update in-memory
{
let mut sequences = self.sequence_counts.write();
*sequences.entry((prev, file_id)).or_insert(0) += 1;
}
// Persist sequence
self.db.execute(
"INSERT INTO sequence_counts (from_file_id, to_file_id, count)
VALUES (?1, ?2, 1)
ON CONFLICT(from_file_id, to_file_id) DO UPDATE SET count = count + 1",
rusqlite::params![prev_id, file_id.0],
)?;
}
// Cleanup old entries
let cutoff = timestamp - (self.max_history as i64 * 86400); // max_history in days
self.db.execute("DELETE FROM access_log WHERE access_time < ?1", [cutoff])?;
Ok(())
}
pub fn predict_next(&self, current: FileId, limit: usize) -> Vec<FileId> {
let sequences = self.sequence_counts.read();
let mut predictions: Vec<_> = sequences
.iter()
.filter(|((from, _), count)| *from == current && **count >= 2) // Oracle fix: min threshold
.map(|((_, to), count)| (*to, *count))
.collect();
predictions.sort_by(|a, b| b.1.cmp(&a.1));
predictions.into_iter().take(limit).map(|(id, _)| id).collect()
}
/// FR-19.3: Time-based prefetch - files commonly accessed at this hour
pub fn predict_for_time(&self, hour: u8, limit: usize) -> Vec<FileId> {
let time_patterns = self.time_patterns.read();
time_patterns
.get(&hour)
.map(|files| files.iter().rev().take(limit).copied().collect())
.unwrap_or_default()
}
/// For RecentlyPlayed collection query
pub fn recently_played(&self, days: u32) -> Result<Vec<FileId>, PatternError> {
let cutoff = SystemTime::now()
.duration_since(UNIX_EPOCH)
.unwrap()
.as_secs() as i64 - (days as i64 * 86400);
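// Group per file and order by its latest access; ORDER BY on a column
// outside a SELECT DISTINCT list has no well-defined ordering
let mut stmt = self.db.prepare(
"SELECT file_id FROM access_log WHERE access_time >= ?1
GROUP BY file_id ORDER BY MAX(access_time) DESC"
)?;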
let files: Vec<FileId> = stmt
.query_map([cutoff], |row| Ok(FileId(row.get(0)?)))?
.filter_map(|r| r.ok())
.collect();
Ok(files)
}
/// For MostPlayed collection query
pub fn most_played(&self, limit: u32) -> Result<Vec<FileId>, PatternError> {
let mut stmt = self.db.prepare(
"SELECT file_id, COUNT(*) as play_count FROM access_log
GROUP BY file_id ORDER BY play_count DESC LIMIT ?1"
)?;
let files: Vec<FileId> = stmt
.query_map([limit], |row| Ok(FileId(row.get(0)?)))?
.filter_map(|r| r.ok())
.collect();
Ok(files)
}
}
#[derive(Debug, thiserror::Error)]
pub enum PatternError {
#[error("database error: {0}")]
Database(#[from] rusqlite::Error),
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::TempDir;
#[test]
fn test_pattern_prediction() {
let dir = TempDir::new().unwrap();
let db_path = dir.path().join("patterns.db");
let store = PatternStore::new(&db_path, 30).unwrap();
let ctx = AccessContext { album_id: None, track_number: None, artist: None };
// Simulate: A -> B -> C pattern multiple times
for _ in 0..5 {
store.record(FileId(1), ctx.clone()).unwrap();
store.record(FileId(2), ctx.clone()).unwrap();
store.record(FileId(3), ctx.clone()).unwrap();
}
// After playing A, should predict B (needs >= 2 count)
let predictions = store.predict_next(FileId(1), 3);
assert!(!predictions.is_empty());
assert_eq!(predictions[0], FileId(2));
}
#[test]
fn test_pattern_persistence() {
let dir = TempDir::new().unwrap();
let db_path = dir.path().join("patterns.db");
let ctx = AccessContext { album_id: None, track_number: None, artist: None };
// Record patterns
{
let store = PatternStore::new(&db_path, 30).unwrap();
for _ in 0..3 {
store.record(FileId(1), ctx.clone()).unwrap();
store.record(FileId(2), ctx.clone()).unwrap();
}
}
// Reopen and verify persistence
{
let store = PatternStore::new(&db_path, 30).unwrap();
let predictions = store.predict_next(FileId(1), 3);
assert!(!predictions.is_empty());
assert_eq!(predictions[0], FileId(2));
}
}
#[test]
fn test_recently_played() {
let dir = TempDir::new().unwrap();
let db_path = dir.path().join("patterns.db");
let store = PatternStore::new(&db_path, 30).unwrap();
let ctx = AccessContext { album_id: None, track_number: None, artist: None };
store.record(FileId(100), ctx.clone()).unwrap();
store.record(FileId(200), ctx.clone()).unwrap();
let recent = store.recently_played(7).unwrap();
assert!(recent.contains(&FileId(100)));
assert!(recent.contains(&FileId(200)));
}
#[test]
fn test_most_played() {
let dir = TempDir::new().unwrap();
let db_path = dir.path().join("patterns.db");
let store = PatternStore::new(&db_path, 30).unwrap();
let ctx = AccessContext { album_id: None, track_number: None, artist: None };
// Play file 1 more times than file 2
for _ in 0..5 {
store.record(FileId(1), ctx.clone()).unwrap();
}
for _ in 0..2 {
store.record(FileId(2), ctx.clone()).unwrap();
}
let most = store.most_played(10).unwrap();
assert_eq!(most[0], FileId(1)); // Most played first
}
}
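`record()` derives `hour_of_day` as `timestamp / 3600 % 24`, which is the hour in UTC. If the time-based patterns of FR-19.3 are meant to follow the listener's local routine, a UTC offset has to be applied before bucketing. The sketch below shows that arithmetic; `hour_of_day` as a free function and the `utc_offset_secs` parameter are assumptions for illustration, not part of `PatternStore`.

```rust
/// Hour of day (0..=23) for a Unix timestamp, shifted by a UTC offset in seconds.
/// `rem_euclid` keeps the result non-negative for negative offsets.
pub fn hour_of_day(unix_secs: i64, utc_offset_secs: i64) -> u8 {
    let secs_into_day = (unix_secs + utc_offset_secs).rem_euclid(86_400);
    (secs_into_day / 3_600) as u8
}
```

For the timestamp 1_700_000_000 (22:13:20 UTC), an offset of 0 yields hour 22, an offset of +9 hours yields hour 7, and an offset of -5 hours yields hour 17. Whichever convention is chosen, the hour passed to `predict_for_time` needs to use the same one.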
3.2 Create musicfs-cache/src/prefetch.rs
use crate::patterns::{AccessContext, PatternStore};
use crate::CacheManager;
use musicfs_core::{Event, EventBus, FileId};
use std::collections::HashSet;
use std::sync::Arc;
use tokio::sync::mpsc;
use tracing::{debug, info, warn};
pub struct PrefetchEngine {
patterns: Arc<PatternStore>,
cache: Arc<CacheManager>,
/// Oracle fix: Channel-based queue instead of polling
task_tx: mpsc::Sender<PrefetchTask>,
task_rx: parking_lot::Mutex<Option<mpsc::Receiver<PrefetchTask>>>,
/// Oracle fix: Deduplication set to prevent duplicate prefetches
pending: parking_lot::RwLock<HashSet<FileId>>,
config: PrefetchConfig,
}
#[derive(Debug, Clone)]
pub struct PrefetchConfig {
pub enabled: bool,
pub max_queue_size: usize,
pub lookahead: usize,
pub album_aware: bool,
}
impl Default for PrefetchConfig {
fn default() -> Self {
Self {
enabled: true,
max_queue_size: 100,
lookahead: 3,
album_aware: true,
}
}
}
#[derive(Debug)]
struct PrefetchTask {
file_id: FileId,
priority: u8,
}
impl PrefetchEngine {
pub fn new(patterns: Arc<PatternStore>, cache: Arc<CacheManager>, config: PrefetchConfig) -> Self {
// Oracle fix: Use bounded channel instead of polling VecDeque
let (task_tx, task_rx) = mpsc::channel(config.max_queue_size);
Self {
patterns,
cache,
task_tx,
task_rx: parking_lot::Mutex::new(Some(task_rx)),
pending: parking_lot::RwLock::new(HashSet::new()),
config,
}
}
pub fn on_access(&self, file_id: FileId, context: AccessContext) {
if !self.config.enabled {
return;
}
// Record pattern (now returns Result)
if let Err(e) = self.patterns.record(file_id, context.clone()) {
warn!("Failed to record pattern: {}", e);
}
// Predict next files based on sequence patterns
let predictions = self.patterns.predict_next(file_id, self.config.lookahead);
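// FR-19.3: Time-based predictions. PatternStore::record() buckets by UTC
// hour, so the lookup must use the UTC hour as well.
let hour = chrono::Timelike::hour(&chrono::Utc::now()) as u8;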
let time_predictions = self.patterns.predict_for_time(hour, 2);
// Album-aware: if we know track number, prefetch next tracks
let album_prefetch = if self.config.album_aware {
self.predict_album_next(&context)
} else {
vec![]
};
// Oracle fix: Deduplicate before queueing
let pending = self.pending.read();
for (i, pred) in predictions.into_iter().enumerate() {
if pending.contains(&pred) {
continue; // Already pending
}
let _ = self.task_tx.try_send(PrefetchTask {
file_id: pred,
priority: 10u8.saturating_sub(i.min(10) as u8), // no underflow when lookahead > 10
});
}
for pred in time_predictions {
if pending.contains(&pred) {
continue;
}
let _ = self.task_tx.try_send(PrefetchTask {
file_id: pred,
priority: 5, // Medium priority for time-based
});
}
for (i, pred) in album_prefetch.into_iter().enumerate() {
if pending.contains(&pred) {
continue;
}
let _ = self.task_tx.try_send(PrefetchTask {
file_id: pred,
priority: 8u8.saturating_sub(i.min(8) as u8), // no underflow past 8 tracks
});
}
debug!("Prefetch pending count: {}", pending.len());
}
/// FR-19.4: Manual prefetch hint via /.prefetch/path
pub fn prefetch_hint(&self, file_id: FileId, priority: u8) {
let pending = self.pending.read();
if pending.contains(&file_id) {
return;
}
drop(pending);
let _ = self.task_tx.try_send(PrefetchTask { file_id, priority });
}
fn predict_album_next(&self, context: &AccessContext) -> Vec<FileId> {
// In real implementation, would query cache for tracks in same album
// with track_number > current
vec![]
}
/// Oracle fix: Event-driven loop instead of busy-wait polling
pub async fn run(&self) {
info!("Prefetch engine started");
// Take ownership of receiver
let mut task_rx = self.task_rx.lock().take()
.expect("run() called twice");
while let Some(task) = task_rx.recv().await {
// Mark as pending
{
let mut pending = self.pending.write();
pending.insert(task.file_id);
}
debug!("Prefetching {:?} (priority {})", task.file_id, task.priority);
if let Err(e) = self.cache.prefetch(&task.file_id).await {
warn!("Prefetch failed for {:?}: {}", task.file_id, e);
}
// Remove from pending
{
let mut pending = self.pending.write();
pending.remove(&task.file_id);
}
}
info!("Prefetch engine stopped");
}
pub fn start(self: Arc<Self>) -> PrefetchHandle {
let (stop_tx, mut stop_rx) = mpsc::channel::<()>(1);
let engine = self.clone();
tokio::spawn(async move {
tokio::select! {
_ = engine.run() => {}
_ = stop_rx.recv() => {
info!("Prefetch engine stopped");
}
}
});
PrefetchHandle { stop_tx }
}
pub fn pending_count(&self) -> usize {
self.pending.read().len()
}
}
pub struct PrefetchHandle {
stop_tx: mpsc::Sender<()>,
}
impl PrefetchHandle {
pub async fn stop(self) {
let _ = self.stop_tx.send(()).await;
}
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::TempDir;
#[test]
fn test_prefetch_config_default() {
let config = PrefetchConfig::default();
assert!(config.enabled);
assert_eq!(config.lookahead, 3);
assert!(config.album_aware);
}
#[tokio::test]
async fn test_prefetch_deduplication() {
let dir = TempDir::new().unwrap();
let patterns = Arc::new(PatternStore::new(&dir.path().join("p.db"), 30).unwrap());
let cache = Arc::new(MockCacheManager::new());
let config = PrefetchConfig::default();
let engine = PrefetchEngine::new(patterns, cache, config);
// Hint the same file twice
engine.prefetch_hint(FileId(1), 10);
engine.prefetch_hint(FileId(1), 10);
// `pending` is only populated while run() is processing a task, so
// nothing is pending before the engine starts (and both hints are queued)
assert_eq!(engine.pending_count(), 0);
}
#[test]
fn test_prefetch_channel_based() {
// Verify no busy-wait polling - channel is used
let config = PrefetchConfig { max_queue_size: 50, ..Default::default() };
// Channel capacity should match config
assert_eq!(config.max_queue_size, 50);
}
}
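In the engine above, `pending` is filled inside `run()`, so two hints for the same file that arrive before the worker picks up the first are both queued. One way to close that gap is to mark a file as pending at enqueue time. The sketch below is a minimal, synchronous illustration using only the standard library, with `std::sync::mpsc::sync_channel` standing in for the Tokio channel; `DedupQueue`, `enqueue`, and `complete` are hypothetical names, and file IDs are plain `u64` values rather than `FileId`.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::Mutex;

/// Bounded queue that refuses an ID which is already queued or in flight.
pub struct DedupQueue {
    tx: SyncSender<u64>,
    pending: Mutex<HashSet<u64>>,
}

impl DedupQueue {
    pub fn new(capacity: usize) -> (Self, Receiver<u64>) {
        let (tx, rx) = sync_channel(capacity);
        let queue = Self { tx, pending: Mutex::new(HashSet::new()) };
        (queue, rx)
    }

    /// Returns true if the ID was newly queued.
    pub fn enqueue(&self, file_id: u64) -> bool {
        let mut pending = self.pending.lock().unwrap();
        // `insert` returns false when the ID is already present.
        if !pending.insert(file_id) {
            return false;
        }
        // Queue full or receiver gone: forget the ID so a later hint can retry.
        if self.tx.try_send(file_id).is_err() {
            pending.remove(&file_id);
            return false;
        }
        true
    }

    /// Called by the worker once the prefetch has finished (or failed).
    pub fn complete(&self, file_id: u64) {
        self.pending.lock().unwrap().remove(&file_id);
    }

    pub fn pending_count(&self) -> usize {
        self.pending.lock().unwrap().len()
    }
}
```

With this shape, a deduplication test can assert that the second `enqueue` for the same ID returns `false` and that exactly one entry is pending, without starting the worker.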
Task 4: Prefetch Virtual Directory (FR-19.4)
4.1 Create musicfs-fuse/src/ops/prefetch.rs
use fuser::{FileType, ReplyDirectory, ReplyEntry, ReplyAttr};
use musicfs_cache::prefetch::PrefetchEngine;
use musicfs_core::{FileId, VirtualPath};
use std::sync::Arc;
use std::time::{Duration, SystemTime};
use tracing::debug;
const PREFETCH_DIR_INODE: u64 = 0xFFFF_FFFF_0000_0002;
/// FR-19.4: Manual prefetch hints via /.prefetch/path
pub struct PrefetchOps {
prefetch_engine: Arc<PrefetchEngine>,
}
impl PrefetchOps {
pub fn new(prefetch_engine: Arc<PrefetchEngine>) -> Self {
Self { prefetch_engine }
}
pub fn is_prefetch_path(path: &str) -> bool {
path.starts_with("/.prefetch/")
}
/// Lookup triggers prefetch for the target file
pub fn lookup(&self, path: &str, file_id: FileId, reply: ReplyEntry) {
debug!("Manual prefetch hint for: {}", path);
// Queue prefetch with high priority (manual = important)
self.prefetch_engine.prefetch_hint(file_id, 15);
// NOTE: this consumes `reply`, so the caller sees ENOENT for the
// /.prefetch/ path itself; the hint above is still queued.
reply.error(libc::ENOENT);
}
pub fn readdir_prefetch_root(&self, reply: &mut ReplyDirectory) {
reply.add(PREFETCH_DIR_INODE, 1, FileType::Directory, ".");
reply.add(1, 2, FileType::Directory, "..");
// Empty directory - entries are virtual
}
pub fn getattr_prefetch_dir(&self, reply: ReplyAttr) {
let attr = fuser::FileAttr {
ino: PREFETCH_DIR_INODE,
size: 0,
blocks: 0,
atime: SystemTime::UNIX_EPOCH,
mtime: SystemTime::UNIX_EPOCH,
ctime: SystemTime::UNIX_EPOCH,
crtime: SystemTime::UNIX_EPOCH,
kind: FileType::Directory,
perm: 0o555,
nlink: 2,
uid: 1000,
gid: 1000,
rdev: 0,
blksize: 512,
flags: 0,
};
reply.attr(&Duration::from_secs(60), &attr);
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_prefetch_path_detection() {
assert!(PrefetchOps::is_prefetch_path("/.prefetch/Artist/Album/Track.flac"));
assert!(!PrefetchOps::is_prefetch_path("/Artist/Album/Track.flac"));
}
}
4.2 FUSE Integration
Add to musicfs-fuse/src/filesystem.rs:
// In lookup()
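if name == ".prefetch" && parent == 1 {
// NOTE: lookup() receives a ReplyEntry, whereas getattr_prefetch_dir()
// takes a ReplyAttr; an entry-returning counterpart (reply.entry(...))
// is needed here before this compiles.
self.prefetch_ops.getattr_prefetch_dir(reply);
return;
}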
if let Some(path) = self.inode_to_path(parent) {
if PrefetchOps::is_prefetch_path(&path) {
// Strip the "/.prefetch" prefix, keeping the leading '/' of the real path
let actual_path = &path["/.prefetch".len()..];
if let Some(file_id) = self.path_to_file_id(actual_path) {
self.prefetch_ops.lookup(&path, file_id, reply);
return;
}
}
}
// In readdir()
if ino == PREFETCH_DIR_INODE {
self.prefetch_ops.readdir_prefetch_root(&mut reply);
reply.ok();
return;
}
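The prefix handling in `lookup()` relies on a byte offset into the path. A slightly more defensive variant can be sketched with `str::strip_prefix`, which also rejects lookalike names such as `/.prefetchX`. `strip_prefetch_prefix` is a hypothetical helper, not an existing MusicFS function.

```rust
/// Map "/.prefetch/<real path>" to "/<real path>".
/// Returns None for paths outside the prefetch tree and for the bare directory.
pub fn strip_prefetch_prefix(path: &str) -> Option<&str> {
    let rest = path.strip_prefix("/.prefetch")?;
    // Require a '/' boundary and at least one character after it.
    if rest.starts_with('/') && rest.len() > 1 {
        Some(rest)
    } else {
        None
    }
}
```

The returned slice keeps its leading `/`, so it can be passed to a resolver such as `path_to_file_id` in the same form as any other absolute path.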
Task 5: API Documentation
All APIs must be fully documented, covering both the happy path and the error paths.
5.1 Create docs/api/smart-features.md
# Smart Features API Documentation
## Overview
Week 9 implements three smart feature categories:
1. **Smart Collections** - Query-based virtual folders
2. **Artwork** - Embedded album art extraction and caching
3. **Intelligent Prefetching** - Access pattern learning and prediction
---
## 1. Smart Collections
### Virtual Directory: `/.collections/{name}/`
Browse query-based collections as virtual directories.
### Happy Path
User                              FUSE
|                                 |
|-- ls /.collections/ ----------->|
|<-- [Recently Added, 80s, Jazz]--|
|                                 |
|-- ls /.collections/Jazz/ ------>|
|   (executes: genre:Jazz)        |
|<-- [symlinks to jazz tracks] ---|
### Built-in Collections
| Name | Query | Description |
|------|-------|-------------|
| Recently Added | `RecentlyAdded { days: 30 }` | Files added in last 30 days |
| Recently Played | `RecentlyPlayed { days: 7 }` | Files played in last 7 days |
| Most Played | `MostPlayed { limit: 100 }` | Top 100 most played |
| 80s Music | `year:[1980 TO 1989]` | Year range filter |
| 90s Music | `year:[1990 TO 1999]` | Year range filter |
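
To illustrate how a static query such as `year:[1980 TO 1989]` might be matched against track metadata, here is a minimal sketch. `Track` and `Query` are hypothetical, reduced stand-ins for the real record and for `CollectionQuery`; the case-insensitive genre comparison is an assumption. The range is inclusive at both ends, as the square-bracket syntax implies.

```rust
/// Metadata fields the sketch needs; a stand-in for the real track record.
pub struct Track {
    pub genre: String,
    pub year: u32,
}

/// Reduced, hypothetical version of `CollectionQuery` (static variants only).
pub enum Query {
    Genre { genre: String },
    YearRange { start: u32, end: u32 }, // year:[start TO end], inclusive
    And(Vec<Query>),
    Or(Vec<Query>),
}

impl Query {
    /// Returns true when `track` belongs in the collection.
    pub fn matches(&self, track: &Track) -> bool {
        match self {
            // Case-insensitive comparison is an assumption of this sketch.
            Query::Genre { genre } => track.genre.eq_ignore_ascii_case(genre),
            Query::YearRange { start, end } => (*start..=*end).contains(&track.year),
            Query::And(children) => children.iter().all(|q| q.matches(track)),
            Query::Or(children) => children.iter().any(|q| q.matches(track)),
        }
    }
}
```

The dynamic collections (Recently Added, Recently Played, Most Played) are not covered here because they depend on timestamps and the access log rather than on per-track metadata alone.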
### Collection Query Types
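```rust
// Field types are inferred from the examples in this document;
// the integer widths are illustrative.
enum BoolOp { And, Or }

enum CollectionQuery {
    Match { field: String, pattern: String },                // field:pattern
    DateRange { field: String, start: i64, end: i64 },       // field:[start TO end]
    RecentlyAdded { days: u32 },                             // Dynamic: mtime > now - days
    RecentlyPlayed { days: u32 },                            // Dynamic: from access_log
    MostPlayed { limit: usize },                             // Dynamic: from access_log
    Genre { genre: String },                                 // genre:value
    Compound { op: BoolOp, children: Vec<CollectionQuery> }, // AND/OR combinations
}
```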
Error Cases
| Scenario | Behavior | FUSE Error |
|---|---|---|
| Collection not found | ENOENT | libc::ENOENT |
| Invalid query syntax | Empty directory | (none) |
| Database error | EIO | libc::EIO |
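
The error table above can be expressed as a single mapping function so that every FUSE handler replies consistently. This is a sketch under stated assumptions: `CollectionError` and `errno_for` are hypothetical names, and the two constants are the Linux errno values standing in for `libc::ENOENT` and `libc::EIO`.

```rust
// Linux errno values; real code would use `libc::ENOENT` and `libc::EIO`.
pub const ENOENT: i32 = 2;
pub const EIO: i32 = 5;

/// Hypothetical error type for collection lookups.
#[derive(Debug)]
pub enum CollectionError {
    NotFound,
    InvalidQuery(String),
    Database(String),
}

/// Returns the errno to send to FUSE, or `None` when the request should
/// succeed with an empty directory listing.
pub fn errno_for(err: &CollectionError) -> Option<i32> {
    match err {
        CollectionError::NotFound => Some(ENOENT),
        CollectionError::InvalidQuery(_) => None,
        CollectionError::Database(_) => Some(EIO),
    }
}
```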
SQLite Schema
CREATE TABLE collections (
id INTEGER PRIMARY KEY,
name TEXT UNIQUE NOT NULL,
query_json TEXT NOT NULL,
created_at INTEGER NOT NULL
);
-- For RecentlyPlayed/MostPlayed queries
CREATE TABLE access_log (
id INTEGER PRIMARY KEY,
file_id INTEGER NOT NULL,
access_time INTEGER NOT NULL,
hour_of_day INTEGER NOT NULL
);
2. Artwork API
Virtual File: /Artist/Album/cover.jpg
Exposes embedded album art as virtual files.
Happy Path
User FUSE ArtworkCache
| | |
|-- open /A/B/cover.jpg --------->| |
| |-- get(file_id, "front")->|
| |<-- chunk_hash -----------|
| |-- CAS.get(hash) -------->|
| |<-- image bytes ----------|
|<-- image data ------------------| |
Supported Sizes
| Size | Max Dimension | Generated |
|---|---|---|
| thumbnail | 150x150 | On-demand |
| medium | 300x300 | On-demand |
| full | Original | Stored in CAS |
Accessing Different Sizes
/Artist/Album/cover.jpg # Full size (default)
/Artist/Album/cover_thumb.jpg # 150x150 thumbnail
/Artist/Album/cover_medium.jpg # 300x300 medium
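
The mapping from virtual file name to size, and from size to output dimensions, could look like the sketch below. `ArtSize::Medium`, `from_file_name`, `max_edge` and `target_dimensions` are assumed names; preserving the aspect ratio and never upscaling are assumptions, not behavior stated in this plan.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArtSize {
    Thumbnail,
    Medium,
    Full,
}

impl ArtSize {
    /// Maps a virtual file name to a size; `None` if it is not a cover file.
    pub fn from_file_name(name: &str) -> Option<ArtSize> {
        match name {
            "cover.jpg" => Some(ArtSize::Full),
            "cover_thumb.jpg" => Some(ArtSize::Thumbnail),
            "cover_medium.jpg" => Some(ArtSize::Medium),
            _ => None,
        }
    }

    /// Longest allowed edge in pixels; `None` means "keep the original".
    pub fn max_edge(self) -> Option<u32> {
        match self {
            ArtSize::Thumbnail => Some(150),
            ArtSize::Medium => Some(300),
            ArtSize::Full => None,
        }
    }
}

/// Scales (width, height) to fit `size`, preserving aspect ratio and never upscaling.
pub fn target_dimensions(width: u32, height: u32, size: ArtSize) -> (u32, u32) {
    let max = match size.max_edge() {
        Some(m) => m,
        None => return (width, height),
    };
    let longest = width.max(height);
    if longest <= max {
        return (width, height);
    }
    // Integer scaling in u64 to avoid overflow; clamp to at least 1 pixel.
    let scale = |v: u32| ((v as u64 * max as u64) / longest as u64).max(1) as u32;
    (scale(width), scale(height))
}
```

For example, a 1200x600 source yields a 150x75 thumbnail, while a 100x100 source is returned unchanged.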
Error Cases
| Scenario | Behavior | FUSE Error |
|---|---|---|
| No embedded artwork | ENOENT | libc::ENOENT |
| Corrupted image data | ENOENT | libc::ENOENT |
| Image too large (>10MB) | Rejected during extraction | (logged) |
| CAS lookup failed | EIO | libc::EIO |
| Resize failed | Return full size | (fallback) |
SQLite Schema (Architecture 4.3.6)
CREATE TABLE artwork (
id INTEGER PRIMARY KEY,
file_id INTEGER NOT NULL REFERENCES files(id),
art_type TEXT NOT NULL, -- 'front', 'back', 'other'
chunk_hash TEXT NOT NULL, -- Reference to CAS
width INTEGER NOT NULL,
height INTEGER NOT NULL,
UNIQUE(file_id, art_type)
);
Note: Only full-size artwork is stored. Thumbnail and medium sizes are generated on demand.
3. Prefetch API
Automatic Prefetching
The prefetch engine learns access patterns and preloads the files most likely to be accessed next.
Pattern Learning Flow
User plays: Track 1 -> Track 2 -> Track 3 (repeated 5x)
Pattern Store:
(Track 1 -> Track 2): count = 5
(Track 2 -> Track 3): count = 5
Next time user plays Track 1:
-> Predict Track 2 (high confidence)
-> Queue prefetch for Track 2
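
The transition counting described above can be sketched in memory as follows. `SequenceCounts` and its methods are hypothetical names, `FileId = u64` is an assumed representation, and persistence to the `sequence_counts` table is left out.

```rust
use std::collections::HashMap;

pub type FileId = u64; // assumed representation

#[derive(Default)]
pub struct SequenceCounts {
    counts: HashMap<FileId, HashMap<FileId, u32>>,
    last: Option<FileId>,
}

impl SequenceCounts {
    /// Records an access and bumps the (previous -> current) transition.
    pub fn record(&mut self, file: FileId) {
        if let Some(prev) = self.last {
            if prev != file {
                *self.counts.entry(prev).or_default().entry(file).or_insert(0) += 1;
            }
        }
        self.last = Some(file);
    }

    /// Returns up to `lookahead` successors of `file`, most frequent first.
    pub fn predict(&self, file: FileId, lookahead: usize) -> Vec<(FileId, u32)> {
        let mut next: Vec<(FileId, u32)> = match self.counts.get(&file) {
            Some(m) => m.iter().map(|(id, n)| (*id, *n)).collect(),
            None => return Vec::new(),
        };
        // Sort by count descending, then by id for a deterministic order.
        next.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        next.truncate(lookahead);
        next
    }
}
```

The returned count can serve as a rough confidence signal; how it maps onto the prefetch priority range is a design decision this sketch does not make.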
FR-19.3: Time-Based Prefetching
User listens to "Morning Playlist" at 8am every weekday
Pattern Store:
hour_of_day = 8 -> [track_ids from morning playlist]
At 7:55am:
-> Predict morning tracks
-> Queue prefetch
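
A minimal in-memory sketch of the hour-of-day lookup follows. `HourlyPatterns`, `target_hour`, the `min_plays` threshold and the `lead_minutes` window are all assumptions made for illustration; the plan only states that predictions are keyed on `hour_of_day`.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory view of `access_log` grouped by hour of day.
#[derive(Default)]
pub struct HourlyPatterns {
    by_hour: HashMap<u8, HashMap<u64, u32>>, // hour -> file_id -> plays
}

impl HourlyPatterns {
    pub fn record(&mut self, hour_of_day: u8, file_id: u64) {
        *self
            .by_hour
            .entry(hour_of_day % 24)
            .or_default()
            .entry(file_id)
            .or_insert(0) += 1;
    }

    /// Files played at least `min_plays` times in `hour`, most played first.
    pub fn predict(&self, hour: u8, min_plays: u32) -> Vec<u64> {
        let mut hits: Vec<(u64, u32)> = self
            .by_hour
            .get(&(hour % 24))
            .map(|m| {
                m.iter()
                    .filter(|(_, n)| **n >= min_plays)
                    .map(|(id, n)| (*id, *n))
                    .collect()
            })
            .unwrap_or_default();
        hits.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        hits.into_iter().map(|(id, _)| id).collect()
    }
}

/// Hour to prefetch for: the next hour once we are within `lead_minutes` of it.
pub fn target_hour(hour: u8, minute: u8, lead_minutes: u8) -> u8 {
    if minute >= 60u8.saturating_sub(lead_minutes) {
        (hour % 24 + 1) % 24
    } else {
        hour % 24
    }
}
```

With a five-minute lead, a check at 7:55 resolves to hour 8, which matches the example above.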
FR-19.4: Manual Prefetch Hints
Virtual Directory: /.prefetch/{path}
# Trigger prefetch for an album
ls /.prefetch/Artist/Album/
# Prefetch specific file
cat /.prefetch/Artist/Album/Track.flac > /dev/null
Happy Path (Manual Prefetch)
User FUSE PrefetchEngine
| | |
|-- ls /.prefetch/A/B/ ---------->| |
| |-- prefetch_hint() -->|
| | file_id, priority=15
| | |-- queue task
|<-- (directory listing) ---------| |
| | |-- async fetch
Prefetch Priority Levels
| Source | Priority | Description |
|---|---|---|
| Manual (/.prefetch/) | 15 | User-initiated, highest |
| Sequence prediction | 10-8 | Based on history patterns |
| Album sequential | 8-6 | Next tracks in album |
| Time-based | 5 | Hour-of-day patterns |
Error Cases
| Scenario | Behavior |
|---|---|
| Already pending | Skipped (deduplication) |
| Queue full | try_send fails; the hint is dropped silently |
| Prefetch fails | Logged, removed from pending |
| Pattern DB error | Logged, prefetch continues |
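
The deduplication and queue-full rows above can be sketched with a bounded channel and a `pending` set. This example uses `std::sync::mpsc::sync_channel` as a stand-in so that it is self-contained; the real engine may use a different channel type. `PrefetchQueue`, `PrefetchTask` and `HintOutcome` are hypothetical names.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

pub type FileId = u64; // assumed representation

#[derive(Debug, PartialEq)]
pub struct PrefetchTask {
    pub file_id: FileId,
    pub priority: u8,
}

#[derive(Debug, PartialEq)]
pub enum HintOutcome {
    Queued,
    AlreadyPending,
    Dropped,
}

pub struct PrefetchQueue {
    tx: SyncSender<PrefetchTask>,
    pending: HashSet<FileId>,
}

impl PrefetchQueue {
    pub fn new(max_queue_size: usize) -> (Self, Receiver<PrefetchTask>) {
        let (tx, rx) = sync_channel(max_queue_size);
        (Self { tx, pending: HashSet::new() }, rx)
    }

    /// Queues a prefetch unless the file is already pending or the queue is full.
    pub fn hint(&mut self, file_id: FileId, priority: u8) -> HintOutcome {
        if !self.pending.insert(file_id) {
            return HintOutcome::AlreadyPending;
        }
        match self.tx.try_send(PrefetchTask { file_id, priority }) {
            Ok(()) => HintOutcome::Queued,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                // Dropped hint: forget it so a later hint can retry.
                self.pending.remove(&file_id);
                HintOutcome::Dropped
            }
        }
    }

    /// Called when a fetch finishes or fails, so the file can be queued again.
    pub fn complete(&mut self, file_id: FileId) {
        self.pending.remove(&file_id);
    }
}
```

A plain channel delivers tasks in FIFO order, so this sketch carries the priority value but does not reorder by it.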
Configuration
struct PrefetchConfig {
enabled: bool, // Default: true
max_queue_size: usize, // Default: 100
lookahead: usize, // Default: 3 tracks
album_aware: bool, // Default: true
}
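
The documented defaults could be encoded in a `Default` implementation so callers override only what they need. This is a sketch; the derives and field visibility are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct PrefetchConfig {
    pub enabled: bool,
    pub max_queue_size: usize,
    pub lookahead: usize,
    pub album_aware: bool,
}

impl Default for PrefetchConfig {
    fn default() -> Self {
        // Values taken from the "Default:" comments in the struct above.
        Self {
            enabled: true,
            max_queue_size: 100,
            lookahead: 3,
            album_aware: true,
        }
    }
}
```

A caller can then write `PrefetchConfig { lookahead: 5, ..Default::default() }` to change a single setting.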
SQLite Schema
-- Access history for pattern learning
CREATE TABLE access_log (
id INTEGER PRIMARY KEY,
file_id INTEGER NOT NULL,
access_time INTEGER NOT NULL,
hour_of_day INTEGER NOT NULL
);
-- Sequence transition counts
CREATE TABLE sequence_counts (
from_file_id INTEGER NOT NULL,
to_file_id INTEGER NOT NULL,
count INTEGER NOT NULL DEFAULT 1,
PRIMARY KEY (from_file_id, to_file_id)
);
Performance Targets
| Metric | Target | Notes |
|---|---|---|
| Cache hit rate (warm) | >95% | Architecture 3.2.5 |
| Prefetch accuracy | >50% | Share of prefetched files that are later accessed |
| Artwork resize latency | <100ms | For thumbnail/medium |
| Pattern prediction latency | <10ms | In-memory lookup |
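
The two ratio metrics in the table can be computed as shown in this sketch. The function names are hypothetical, and returning 0.0 for empty inputs is an assumption about how a benchmark would report the degenerate case.

```rust
use std::collections::HashSet;

/// Fraction of prefetched files that were later accessed
/// (0.0 when nothing was prefetched).
pub fn prefetch_accuracy(prefetched: &HashSet<u64>, accessed: &HashSet<u64>) -> f64 {
    if prefetched.is_empty() {
        return 0.0;
    }
    let used = prefetched.intersection(accessed).count();
    used as f64 / prefetched.len() as f64
}

/// Cache hit rate over a measurement window (0.0 when there were no reads).
pub fn hit_rate(hits: u64, misses: u64) -> f64 {
    let total = hits + misses;
    if total == 0 {
        0.0
    } else {
        hits as f64 / total as f64
    }
}
```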
Integration Examples
Creating a Smart Collection
let mut store = CollectionStore::new(&db_path)?;
// Create custom collection
let jazz_80s = store.create(
"80s Jazz",
CollectionQuery::Compound {
op: BoolOp::And,
children: vec![
CollectionQuery::Genre { genre: "Jazz".into() },
CollectionQuery::DateRange {
field: "year".into(),
start: 1980,
end: 1989,
},
],
},
)?;
// List collections
let collections = store.list()?;
Accessing Album Art
let cache = ArtworkCache::new(cas_store, &db_path)?;
// Get full-size artwork
let full = cache.get(file_id, "front", ArtSize::Full).await?;
// Get thumbnail (generated on-demand)
let thumb = cache.get(file_id, "front", ArtSize::Thumbnail).await?;
Manual Prefetch via CLI
# Prefetch entire album before listening
find /mnt/musicfs/.prefetch/Metallica/BlackAlbum/ -type f > /dev/null
# Check prefetch status
musicfs-cli prefetch status
# Output: 3 files pending, 12 completed in last hour
---
## Tests
| Test | Type | Validates |
|------|------|-----------|
| `test_collection_crud` | Unit | Create/list/delete collections (FR-15.2) |
| `test_compound_query` | Unit | AND/OR queries work |
| `test_builtin_collections` | Unit | Recently Added, 80s/90s exist |
| `test_recently_played_query` | Unit | RecentlyPlayed from access_log |
| `test_most_played_query` | Unit | MostPlayed from access_log |
| `test_artwork_extraction` | Unit | Extract from FLAC/MP3 (FR-16.1) |
| `test_artwork_resize` | Unit | Thumbnail/medium generation (FR-16.4) |
| `test_artwork_resize_on_demand` | Unit | Full stored, sizes generated |
| `test_artwork_reject_oversized` | Unit | >10MB images rejected |
| `test_artwork_cache` | Unit | Store/retrieve from CAS (FR-16.3) |
| `test_pattern_prediction` | Unit | A->B->C pattern learned (FR-19.1) |
| `test_pattern_persistence` | Unit | Patterns survive restart |
| `test_time_based_prediction` | Unit | Hour-of-day patterns (FR-19.3) |
| `test_prefetch_deduplication` | Unit | Same file not queued twice |
| `test_prefetch_channel` | Unit | Channel-based, no polling |
| `test_prefetch_manual_hint` | Unit | /.prefetch/ handler (FR-19.4) |
| `test_collection_virtual_dir` | E2E | `/.collections/Jazz/` works |
| `test_cover_virtual_file` | E2E | `/Artist/Album/cover.jpg` exists (FR-16.2) |
| `test_prefetch_virtual_dir` | E2E | `/.prefetch/path` triggers prefetch |
| `test_prefetch_reduces_misses` | Integration | >50% miss reduction |
---
## Exit Criteria
- [ ] Smart collections stored in SQLite
- [ ] Built-in collections (Recently Added, Recently Played, Most Played, 80s, 90s) available
- [ ] `/.collections/Name/` shows matching files
- [ ] RecentlyPlayed/MostPlayed queries use persisted access_log table
- [ ] Album art extracted from embedded FLAC/MP3 data
- [ ] Artwork schema matches architecture.md 4.3.6 exactly (no size column)
- [ ] Thumbnail/medium generated on-demand, only full stored in CAS
- [ ] Oversized images (>10MB) rejected gracefully
- [ ] `cover.jpg` appears in album directories
- [ ] Access patterns recorded in SQLite (survive restarts)
- [ ] Time-based prefetch predicts by hour-of-day (FR-19.3)
- [ ] `/.prefetch/path` triggers manual prefetch hints (FR-19.4)
- [ ] Prefetch engine uses channel-based queue (no busy-wait polling)
- [ ] Prefetch deduplication prevents same file queued twice
- [ ] Prefetch reduces cache misses by >50% on sequential album playback
- [ ] API documentation covers happy/error paths for all features
---
## Architecture Compliance
| Architecture Section | Requirement | Status |
|---------------------|-------------|--------|
| 4.3.6 | collections table schema | ✅ |
| 4.3.6 | artwork table schema (UNIQUE file_id, art_type) | ✅ Oracle fix |
| 3.2.5 | Cache hit rate >95% | ✅ Benchmark |
| FR-15.1 | Query-based virtual folders | ✅ |
| FR-15.2 | Saved searches as directories | ✅ |
| FR-15.3 | Dynamic playlists (RecentlyPlayed, MostPlayed) | ✅ access_log |
| FR-16.1 | Extract embedded album art | ✅ |
| FR-16.2 | Expose as virtual files | ✅ |
| FR-16.3 | Cache separately from audio | ✅ |
| FR-16.4 | Multiple sizes | ✅ On-demand |
| FR-19.1 | Learn access patterns | ✅ Persistent |
| FR-19.2 | Playlist-aware prefetch | ✅ |
| FR-19.3 | Time-based prefetching | ✅ Task 4 |
| FR-19.4 | Manual prefetch hints | ✅ /.prefetch/ |
## Oracle Fixes Applied
| Issue | Fix | Location |
|-------|-----|----------|
| Artwork schema mismatch | Removed `size` column, matches architecture exactly | `artwork.rs` |
| rusqlite in async context | Use `spawn_blocking` for DB operations | `artwork.rs` |
| PatternStore not persisted | Added `access_log` and `sequence_counts` tables | `patterns.rs` |
| FR-19.3 missing | Added time-based prediction by hour | `patterns.rs` |
| FR-19.4 missing | Added `/.prefetch/` FUSE handler | `prefetch.rs` |
| Prefetch busy-wait polling | Switched to `mpsc::channel` | `prefetch.rs` |
| No prefetch deduplication | Added `pending: HashSet<FileId>` guard | `prefetch.rs` |
| Image resize memory spikes | Added 10MB max input size check | `artwork.rs` |