580 lines
21 KiB
Markdown
580 lines
21 KiB
Markdown
# Metadata Enrichment (Standalone Mode): Design Doc
|
||
|
||
**Authors:** Sisyphus
|
||
**Status:** Draft
|
||
**Last Updated:** 2026-05-18
|
||
**Reviewers:** —
|
||
**Approvers:** —
|
||
**Document Link:** `docs/v2/plans/metadata-enrichment-standalone.md`
|
||
**Prerequisites:** [architecture.md](../architecture.md), [week-12-external-metadata.md](week-12-external-metadata.md)
|
||
|
||
---
|
||
|
||
## 1. Abstract
|
||
|
||
When musicfs operates without the music-agregator orchestrator, it should
|
||
still be able to enrich file metadata (genres, label, artwork URL, album
|
||
type) by querying the metadata-agregator service directly. This document
|
||
describes a **built-in metadata provider** compiled into musicfs that
|
||
queries metadata-agregator's gRPC `SearchAlbums` endpoint using
|
||
artist + album names extracted from file tags. Enrichment is lazy and
|
||
non-blocking — file access always returns immediately using embedded
|
||
tags, while a background worker enriches metadata asynchronously.
|
||
|
||
This plan **supersedes** the week-12 plan's approach of embedding
|
||
MusicBrainz/Discogs/Last.fm HTTP clients directly into musicfs. Instead,
|
||
musicfs delegates all external metadata resolution to metadata-agregator,
|
||
which already handles provider APIs, rate limiting, and caching.
|
||
|
||
## 2. Background
|
||
|
||
### 2.1. Current State
|
||
|
||
musicfs extracts audio metadata via symphonia (FLAC, MP3, AAC, OGG,
|
||
Opus) and stores it in `AudioMeta`. This metadata is whatever the file
|
||
tags contain — typically title, artist, album, year, track number.
|
||
|
||
The existing plugin system (`musicfs-plugins`) defines a `MetadataPlugin`
|
||
trait for external metadata lookup, but:
|
||
|
||
- No plugins have been implemented yet.
|
||
- The plugin system only supports native `.so` and WASM plugins.
|
||
- A gRPC client to metadata-agregator would require bundling an async
|
||
runtime and tonic inside a `.so` — an awkward fit.
|
||
|
||
Meanwhile, metadata-agregator is a Go gRPC service that:
|
||
|
||
- Searches MusicBrainz by artist + album name (`SearchAlbums` RPC).
|
||
- Caches results in PostgreSQL.
|
||
- Returns rich metadata: genres, cover URL, label, release date, album
|
||
type, artist credits.
|
||
|
||
### 2.2. Pain Points
|
||
|
||
- musicfs files lack genres, artwork URLs, and label info unless the
|
||
original files were meticulously tagged.
|
||
- The week-12 plan proposed embedding 4 separate HTTP API clients
|
||
(MusicBrainz, Discogs, Last.fm, AcoustID) directly into musicfs,
|
||
duplicating what metadata-agregator already does.
|
||
- The `MetadataPlugin` trait is designed for `.so`/WASM plugins, which
|
||
is wrong for a core infrastructure gRPC client.
|
||
|
||
## 3. Goals & Non-Goals
|
||
|
||
### 3.1. Goals
|
||
|
||
- **G1:** Enrich file metadata with genres, label, album type, and cover
|
||
URL by querying metadata-agregator via gRPC.
|
||
- **G2:** Never block file access — enrichment happens in background.
|
||
- **G3:** Make the provider entirely optional — disabled by default,
|
||
musicfs works identically without it.
|
||
- **G4:** Respect enrichment source priority so orchestrator pushes
|
||
(from the full-system mode) are not overwritten.
|
||
|
||
### 3.2. Non-Goals
|
||
|
||
- **NG1:** Embedding MusicBrainz/Discogs/Last.fm HTTP clients directly
|
||
into musicfs (metadata-agregator handles this).
|
||
- **NG2:** Audio fingerprinting (AcoustID) — deferred to future work.
|
||
- **NG3:** Modifying the existing `MetadataPlugin` trait — the built-in
|
||
provider is separate from the plugin system.
|
||
- **NG4:** Bidirectional communication — musicfs only queries
|
||
metadata-agregator, never the reverse.
|
||
|
||
## 4. Proposed Design
|
||
|
||
### 4.1. High-Level Architecture
|
||
|
||
```plantuml
|
||
@startuml
|
||
!theme plain
|
||
skinparam componentStyle rectangle
|
||
|
||
package "musicfs" as mfs {
|
||
component "FUSE Layer\n(readdir/open/read)" as fuse
|
||
component "MetadataCache / DB" as db
|
||
component "OverlayReader\n(synthesize headers)" as overlay
|
||
component "EnrichmentQueue\n(bounded, async)" as queue
|
||
component "EnrichmentWorker\n(background)" as worker
|
||
}
|
||
|
||
component "metadata-agregator\nSearchAlbums(query, artist)" as meta
|
||
|
||
fuse -right-> db : lookup metadata
|
||
db -right-> overlay : serve with overlay
|
||
|
||
fuse -down-> queue : enriched_at NULL?\npush request
|
||
queue -down-> worker : dequeue
|
||
worker -down-> meta : gRPC:\nSearchAlbums(\n query=album,\n artist=artist)
|
||
meta -up-> worker : Album (genres,\nlabel, cover_url)
|
||
worker -up-> db : write enriched\nmetadata to overlay
|
||
|
||
note bottom of meta
|
||
metadata-agregator handles:
|
||
• MusicBrainz API
|
||
• rate limiting
|
||
• PostgreSQL cache
|
||
end note
|
||
|
||
note right of fuse
|
||
File access is never blocked.
|
||
Returns embedded tags immediately.
|
||
Enrichment happens async.
|
||
end note
|
||
@enduml
|
||
```
|
||
|
||
### 4.2. Enrichment Flow
|
||
|
||
```plantuml
|
||
@startuml
|
||
!theme plain
|
||
skinparam sequenceMessageAlign center
|
||
|
||
participant "Media Player" as mp
|
||
participant "FUSE Layer" as fuse
|
||
participant "MetadataCache\n(SQLite)" as db
|
||
participant "EnrichmentQueue" as queue
|
||
participant "EnrichmentWorker" as worker
|
||
participant "metadata-agregator" as meta
|
||
|
||
== File Access (non-blocking) ==
|
||
|
||
mp -> fuse : open("/Pink Floyd/The Wall/01 - In the Flesh.flac")
|
||
fuse -> db : lookup(virtual_path)
|
||
db --> fuse : AudioMeta(artist, album, title, ...)\nenriched_at = NULL
|
||
|
||
fuse -> queue : try_push(file_id, artist="Pink Floyd", album="The Wall")
|
||
note right of queue : non-blocking,\nbounded queue
|
||
|
||
fuse --> mp : return file handle\n(with embedded tags only)
|
||
|
||
== Background Enrichment (async) ==
|
||
|
||
queue -> worker : dequeue(file_id, artist, album)
|
||
|
||
worker -> worker : check enrichment_source\n(skip if 'orchestrator' or 'provider')
|
||
|
||
worker -> worker : dedup check:\nalready enriched same album?\n(reuse cached result)
|
||
|
||
worker -> meta : SearchAlbums(\n query="The Wall",\n artist="Pink Floyd",\n limit=1)
|
||
meta --> worker : Album(\n genres=["Progressive Rock", "Art Rock"],\n label="Harvest",\n cover_url="https://...",\n album_type="album")
|
||
|
||
worker -> db : update_metadata(\n file_id,\n genres, label, cover_url,\n enrichment_source='provider',\n enriched_at=now())
|
||
|
||
worker -> worker : publish EventBus::FileModified
|
||
|
||
note over mp : next access sees\nenriched metadata
|
||
@enduml
|
||
```
|
||
|
||
### 4.3. Detailed Design
|
||
|
||
#### 4.3.1. Configuration
|
||
|
||
Add `[metadata_provider]` section to `config.toml`:
|
||
|
||
```toml
|
||
[metadata_provider]
|
||
enabled = false # disabled by default
|
||
endpoint = "http://localhost:50051" # metadata-agregator gRPC
|
||
timeout_ms = 5000 # per-request timeout
|
||
retry_max = 3 # max retries on failure
|
||
retry_backoff_ms = 1000 # initial backoff between retries
|
||
queue_size = 256 # enrichment queue capacity
|
||
```
|
||
|
||
Config struct addition in `musicfs-core/src/config.rs`:
|
||
|
||
```rust
|
||
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
|
||
pub struct MetadataProviderConfig {
|
||
#[serde(default)]
|
||
pub enabled: bool,
|
||
#[serde(default = "default_provider_endpoint")]
|
||
pub endpoint: String,
|
||
#[serde(default = "default_provider_timeout_ms")]
|
||
pub timeout_ms: u64,
|
||
#[serde(default = "default_retry_max")]
|
||
pub retry_max: u32,
|
||
#[serde(default = "default_retry_backoff_ms")]
|
||
pub retry_backoff_ms: u64,
|
||
#[serde(default = "default_queue_size")]
|
||
pub queue_size: usize,
|
||
}
|
||
```
|
||
|
||
#### 4.3.2. Built-in Metadata Provider
|
||
|
||
New module in `musicfs-metadata` (not a plugin, compiled in):
|
||
|
||
```rust
|
||
// musicfs-metadata/src/provider.rs
|
||
|
||
pub struct MetadataAgregatorProvider {
|
||
client: MetadataServiceClient<Channel>,
|
||
config: MetadataProviderConfig,
|
||
}
|
||
|
||
impl MetadataAgregatorProvider {
|
||
pub async fn connect(config: &MetadataProviderConfig)
|
||
-> Result<Self>;
|
||
|
||
/// Query metadata-agregator by artist + album names.
|
||
/// Returns enriched metadata if a match is found.
|
||
pub async fn lookup(
|
||
&self,
|
||
artist: &str,
|
||
album: &str,
|
||
) -> Result<Option<EnrichedMetadata>>;
|
||
}
|
||
```
|
||
|
||
The `lookup` method calls `SearchAlbums(query=album, artist=artist,
|
||
limit=1)` on metadata-agregator. If a result is returned, it maps
|
||
the response to `EnrichedMetadata`:
|
||
|
||
```rust
|
||
pub struct EnrichedMetadata {
|
||
pub genres: Vec<String>,
|
||
pub label: Option<String>,
|
||
pub album_type: Option<String>,
|
||
pub cover_url: Option<String>,
|
||
pub release_date: Option<String>,
|
||
pub total_tracks: Option<u32>,
|
||
pub total_discs: Option<u32>,
|
||
}
|
||
```
|
||
|
||
#### 4.3.3. ExternalMetadata Extension
|
||
|
||
Extend the existing `ExternalMetadata` in `musicfs-plugins/src/traits.rs`
|
||
to carry richer data:
|
||
|
||
```rust
|
||
pub struct ExternalMetadata {
|
||
// existing fields...
|
||
pub title: Option<String>,
|
||
pub artist: Option<String>,
|
||
pub album: Option<String>,
|
||
pub album_artist: Option<String>,
|
||
pub genre: Option<String>, // kept for backward compat
|
||
pub year: Option<u32>,
|
||
pub track: Option<u32>,
|
||
pub disc: Option<u32>,
|
||
pub musicbrainz_id: Option<String>,
|
||
pub artwork_url: Option<String>,
|
||
|
||
// new fields
|
||
pub genres: Vec<String>,
|
||
pub label: Option<String>,
|
||
pub album_type: Option<String>,
|
||
pub cover_url: Option<String>,
|
||
}
|
||
```
|
||
|
||
#### 4.3.4. Database Schema Changes
|
||
|
||
Add columns to `file_metadata` table in
|
||
`musicfs-cache/src/schema.sql`:
|
||
|
||
```sql
|
||
ALTER TABLE file_metadata ADD COLUMN enrichment_source TEXT;
|
||
-- 'embedded' | 'provider' | 'orchestrator'
|
||
ALTER TABLE file_metadata ADD COLUMN enriched_at INTEGER;
|
||
-- unix timestamp, NULL = not enriched
|
||
ALTER TABLE file_metadata ADD COLUMN enrichment_attempts INTEGER DEFAULT 0;
|
||
-- number of failed enrichment attempts
|
||
ALTER TABLE file_metadata ADD COLUMN last_enrichment_error TEXT;
|
||
-- last error message, NULL if no error
|
||
ALTER TABLE file_metadata ADD COLUMN genres_json TEXT;
|
||
-- JSON array: '["Progressive Rock","Art Rock"]'
|
||
-- separate from existing `genre` (singular) for backward compat
|
||
ALTER TABLE file_metadata ADD COLUMN label TEXT;
|
||
ALTER TABLE file_metadata ADD COLUMN album_type TEXT;
|
||
ALTER TABLE file_metadata ADD COLUMN cover_url TEXT;
|
||
```
|
||
|
||
> **Note:** The existing `genre TEXT` column (singular) is preserved
|
||
> for backward compatibility. `genres_json` stores the full list.
|
||
> The singular `genre` field is set to the first genre in the array
|
||
> when enriched.
|
||
|
||
#### 4.3.5. Background Enrichment Queue + Worker
|
||
|
||
```rust
|
||
// musicfs-metadata/src/enrichment.rs
|
||
|
||
pub struct EnrichmentQueue {
|
||
tx: mpsc::Sender<EnrichmentRequest>,
|
||
/// Tracks in-flight (artist, album) pairs to prevent duplicate
|
||
/// API calls when multiple tracks from the same album are
|
||
/// accessed simultaneously.
|
||
in_flight: Arc<DashSet<(String, String)>>,
|
||
}
|
||
|
||
struct EnrichmentRequest {
|
||
file_id: FileId,
|
||
artist: String,
|
||
album: String,
|
||
}
|
||
|
||
pub struct EnrichmentWorker {
|
||
rx: mpsc::Receiver<EnrichmentRequest>,
|
||
provider: Arc<MetadataAgregatorProvider>,
|
||
db: Arc<Database>,
|
||
event_bus: Arc<EventBus>,
|
||
in_flight: Arc<DashSet<(String, String)>>,
|
||
config: MetadataProviderConfig,
|
||
}
|
||
```
|
||
|
||
##### Enqueue-time dedup
|
||
|
||
When `EnrichmentQueue::try_push()` is called, it checks the
|
||
`in_flight` `DashSet` before pushing. If `(artist, album)` is
|
||
already in the set, the request is dropped (the worker will enrich
|
||
all files with the same album in one pass). This prevents 12
|
||
simultaneous track opens from making 12 identical API calls.
|
||
|
||
If `try_push` fails because the queue is full, log at WARN level
|
||
and increment `enrichment_queue_drops_total` metric.
|
||
|
||
##### Worker loop (single-threaded, processes one at a time):
|
||
|
||
1. Dequeue `EnrichmentRequest`.
|
||
2. Check `enrichment_attempts` — skip if `>= retry_max`.
|
||
3. **Atomic conflict check**: write uses conditional SQL:
|
||
```sql
|
||
UPDATE file_metadata SET
|
||
genres_json = ?, label = ?, album_type = ?, cover_url = ?,
|
||
genre = ?, -- first genre for backward compat
|
||
enrichment_source = 'provider',
|
||
enriched_at = strftime('%s', 'now'),
|
||
enrichment_attempts = 0,
|
||
last_enrichment_error = NULL
|
||
WHERE file_id = ?
|
||
AND (enrichment_source IS NULL OR enrichment_source = 'embedded')
|
||
```
|
||
This prevents the TOCTOU race — if the orchestrator wrote between
|
||
dequeue and now, the `WHERE` clause prevents overwrite. The UPDATE
|
||
returns rows_affected=0, which the worker treats as "skip, already
|
||
enriched by higher-priority source".
|
||
4. Deduplicate by (artist, album) — if another file in the same album
|
||
was already enriched, reuse the cached `EnrichedMetadata` result
|
||
for all files with the same (artist, album) pair.
|
||
5. Call `provider.lookup(artist, album)`.
|
||
6. On success: execute atomic update (step 3) for all files with this
|
||
(artist, album). Publish `EventBus::FileModified` for each updated
|
||
file. Remove `(artist, album)` from `in_flight` set.
|
||
7. On failure: increment `enrichment_attempts`, set
|
||
`last_enrichment_error`. If `attempts < retry_max`, re-enqueue
|
||
with exponential backoff (`retry_backoff_ms * 2^attempts`).
|
||
If `attempts >= retry_max`, log at WARN and stop retrying.
|
||
Remove from `in_flight` set.
|
||
|
||
##### Shutdown behavior
|
||
|
||
Queue contents are lost on shutdown. This is acceptable — files will
|
||
be re-queued on next access since `enriched_at` is still NULL.
|
||
Enrichment is idempotent.
|
||
|
||
#### 4.3.6. FUSE Integration Point
|
||
|
||
In the FUSE `readdir` / `getattr` / `open` path
|
||
(`musicfs-fuse/src/ops.rs`), after loading `AudioMeta` from DB:
|
||
|
||
```rust
|
||
if metadata_provider.is_enabled()
|
||
&& file_meta.enriched_at.is_none()
|
||
&& file_meta.enrichment_attempts < config.retry_max
|
||
&& file_meta.audio.artist.is_some()
|
||
&& file_meta.audio.album.is_some()
|
||
{
|
||
if let Err(_) = enrichment_queue.try_push(EnrichmentRequest {
|
||
file_id: file_meta.id,
|
||
artist: file_meta.audio.artist.unwrap(),
|
||
album: file_meta.audio.album.unwrap(),
|
||
}) {
|
||
// Queue full — file will be retried on next access
|
||
tracing::warn!(
|
||
file_id = ?file_meta.id,
|
||
"enrichment queue full, dropping request"
|
||
);
|
||
metrics::ENRICHMENT_QUEUE_DROPS.inc();
|
||
}
|
||
// Non-blocking: returns immediately with embedded tags
|
||
}
|
||
```
|
||
|
||
The `enrichment_attempts < retry_max` check prevents files that have
|
||
permanently failed enrichment (e.g., metadata-agregator has no match)
|
||
from being re-queued on every access.
|
||
|
||
#### 4.3.7. Conflict Resolution
|
||
|
||
| Source | Priority | Writes When |
|
||
|--------|----------|-------------|
|
||
| `orchestrator` | Highest | Always overwrites (full-system mode push) |
|
||
| `provider` | Medium | Only if current source is NULL or `'embedded'` |
|
||
| `embedded` | Lowest | Implicit default from file tag parsing |
|
||
|
||
Conflict resolution is enforced **atomically at write time** using
|
||
conditional SQL (`WHERE enrichment_source IS NULL OR
|
||
enrichment_source = 'embedded'`), not at dequeue time. This prevents
|
||
the TOCTOU race where the orchestrator writes between the worker's
|
||
check and the worker's write.
|
||
|
||
#### 4.3.8. Proto Changes Required
|
||
|
||
The existing `UpdateMetadataRequest` in `musicfs.proto` must be
|
||
extended to carry the new enrichment fields:
|
||
|
||
```protobuf
|
||
// Add to UpdateMetadataRequest:
|
||
optional string label = 40;
|
||
optional string album_type = 41;
|
||
optional string cover_url = 42;
|
||
```
|
||
|
||
> **Note on genres:** metadata-agregator returns `repeated Genre`
|
||
> (objects with `id` + `name`). The provider extracts genre names
|
||
> and stores them as a JSON array in `genres_json`. The singular
|
||
> `genre` field in `UpdateMetadataRequest` (already exists at
|
||
> field 9) is set to the first/primary genre for backward compat.
|
||
|
||
#### 4.3.9. `cover_url` Usage
|
||
|
||
`cover_url` is stored in the metadata overlay but is **not used by
|
||
musicfs for artwork embedding or display** in this plan. It is
|
||
stored for consumption by external tools (e.g., media players that
|
||
query musicfs's gRPC `GetMetadata` and fetch artwork themselves).
|
||
Artwork download and caching is deferred to future work.
|
||
|
||
## 5. Cross-Cutting Concerns
|
||
|
||
### 5.1. Security & Privacy
|
||
|
||
- gRPC connection to metadata-agregator is plaintext (internal network).
|
||
TLS can be added via config if needed.
|
||
- No PII involved — only music metadata.
|
||
- No API keys stored in musicfs — metadata-agregator handles provider
|
||
auth.
|
||
|
||
### 5.2. Observability
|
||
|
||
New tracing spans and metrics:
|
||
|
||
| Metric | Type | Description |
|
||
|--------|------|-------------|
|
||
| `enrichment_queue_depth` | Gauge | Current queue size |
|
||
| `enrichment_queue_drops_total` | Counter | Requests dropped (queue full) |
|
||
| `enrichment_inflight_albums` | Gauge | In-flight (artist, album) dedup set size |
|
||
| `enrichment_lookups_total` | Counter | Total provider lookups |
|
||
| `enrichment_hits_total` | Counter | Successful matches |
|
||
| `enrichment_misses_total` | Counter | No match found |
|
||
| `enrichment_errors_total` | Counter | Provider errors |
|
||
| `enrichment_skipped_total` | Counter | Skipped (higher-priority source already wrote) |
|
||
| `enrichment_latency_ms` | Histogram | Lookup latency |
|
||
|
||
### 5.3. Scalability & Performance
|
||
|
||
- Queue is bounded (default 256) — backpressure via `try_push`.
|
||
- Album-level deduplication: 12 tracks in same album = 1 lookup.
|
||
- No impact on file read latency — enrichment is fully async.
|
||
- metadata-agregator caches in PostgreSQL, so repeated lookups are
|
||
cheap.
|
||
|
||
### 5.4. Testing Plan
|
||
|
||
| Test | Type | Validates |
|
||
|------|------|-----------|
|
||
| `test_provider_connect` | Unit | gRPC connection setup |
|
||
| `test_lookup_match` | Unit (mock) | SearchAlbums → EnrichedMetadata mapping |
|
||
| `test_lookup_no_match` | Unit (mock) | Graceful handling of empty results, increments attempts |
|
||
| `test_enrichment_queue_push` | Unit | Queue push + in_flight dedup |
|
||
| `test_enrichment_queue_full_drops` | Unit | try_push fails gracefully, logs, increments metric |
|
||
| `test_enrichment_worker_writes_db` | Integration | DB write after lookup |
|
||
| `test_enrichment_atomic_conflict` | Integration | Orchestrator writes between dequeue and worker write → worker does NOT overwrite |
|
||
| `test_enrichment_retry_backoff` | Unit | Failed attempts increment counter, exponential backoff |
|
||
| `test_enrichment_max_attempts_stop` | Unit | After retry_max failures, file not re-queued |
|
||
| `test_config_disabled` | Unit | No queue/worker when disabled |
|
||
| `test_album_dedup_simultaneous` | Integration | 12 tracks opened at once → 1 API call |
|
||
| `test_genre_backward_compat` | Unit | genres_json stored as array, genre set to first entry |
|
||
|
||
## 6. Alternatives Considered
|
||
|
||
### 6.1. Native .so Plugin
|
||
|
||
Rejected. Requires bundling a separate async runtime + tonic gRPC
|
||
stack inside a dynamically loaded library. ABI instability, duplicate
|
||
runtimes, and deployment complexity outweigh the "purity" of using the
|
||
plugin system.
|
||
|
||
### 6.2. Direct MusicBrainz/Discogs/Last.fm HTTP Clients (week-12 plan)
|
||
|
||
Rejected. metadata-agregator already handles these providers with rate
|
||
limiting, caching, and deduplication. Embedding HTTP clients in musicfs
|
||
would duplicate this work and couple musicfs to specific provider APIs.
|
||
|
||
### 6.3. WASM Plugin
|
||
|
||
Rejected. WASI networking is immature. gRPC over WASM adds unnecessary
|
||
latency and complexity.
|
||
|
||
### 6.4. On-Demand Blocking Lookup
|
||
|
||
Rejected. Blocking file access while waiting for a gRPC response would
|
||
cause latency spikes and kill media player UX. Background async is the
|
||
only acceptable approach.
|
||
|
||
## 7. Implementation Plan
|
||
|
||
### Phase 1: Foundation (Day 1)
|
||
|
||
- [ ] Add `MetadataProviderConfig` to config.rs
|
||
- [ ] Add DB schema columns: `enrichment_source`, `enriched_at`,
|
||
`enrichment_attempts`, `last_enrichment_error`, `genres_json`,
|
||
`label`, `album_type`, `cover_url`
|
||
- [ ] Add `label`, `album_type`, `cover_url` fields to
|
||
`UpdateMetadataRequest` in `musicfs.proto`
|
||
- [ ] Extend `ExternalMetadata` struct
|
||
- [ ] Update `config.example.toml`
|
||
|
||
### Phase 2: Provider + Worker (Day 1–2)
|
||
|
||
- [ ] Implement `MetadataAgregatorProvider` (gRPC client wrapper)
|
||
- [ ] Implement `EnrichmentQueue` with `DashSet` in-flight dedup
|
||
- [ ] Implement `EnrichmentWorker` with:
|
||
- Atomic conditional write (`WHERE enrichment_source IS NULL OR ...`)
|
||
- Retry tracking (`enrichment_attempts`, exponential backoff)
|
||
- Album-level result caching
|
||
- [ ] Add queue drop logging + metrics
|
||
- [ ] Wire into startup (musicfs-cli) — conditional on config
|
||
|
||
### Phase 3: Integration + Tests (Day 2)
|
||
|
||
- [ ] Wire enrichment trigger in FUSE getattr/readdir path
|
||
(with `enrichment_attempts < retry_max` guard)
|
||
- [ ] Write unit tests: atomic conflict, queue drops, retry backoff,
|
||
max attempts, genre backward compat
|
||
- [ ] Write integration test: 12-track simultaneous dedup
|
||
- [ ] Write integration test with in-memory DB + mock gRPC server
|
||
- [ ] Update architecture.md with metadata provider component
|
||
|
||
## 8. Glossary / References
|
||
|
||
| Term | Definition |
|
||
|------|------------|
|
||
| metadata-agregator | Go gRPC service that searches MusicBrainz and caches results in PostgreSQL |
|
||
| Enrichment | Adding genres, label, artwork URL to file metadata beyond what's in file tags |
|
||
| Overlay | musicfs mechanism for serving modified metadata without changing origin files |
|
||
| `AudioMeta` | Core metadata struct extracted from file tags by symphonia |
|
||
| `ExternalMetadata` | Metadata returned by external providers (plugin trait) |
|
||
| `enrichment_source` | Tracks who last wrote metadata: `embedded`, `provider`, or `orchestrator` |
|
||
|
||
- [metadata-agregator proto](../../../../metadata-agregator/proto/metadata/v1/metadata.proto)
|
||
- [musicfs-plugins traits](../../crates/musicfs-plugins/src/traits.rs)
|
||
- [musicfs-cache overlay](../../crates/musicfs-cache/src/overlay.rs)
|
||
- [architecture.md](../architecture.md)
|