- gRPC service with MusicBrainz provider - PostgreSQL schema with migrations - Service layer with database-first caching - Repository pattern for data access - YAML configuration support - Research documentation for 17 music metadata projects
22 KiB
AcoustID Architecture
System Architecture Overview
AcoustID employs a monolithic multi-process architecture with microservice-like separation of concerns. The system is split into two major repositories with distinct responsibilities:
- acoustid-server: Monolithic Python application with multiple process types
- acoustid-index: Standalone Zig service for fingerprint indexing
Server Architecture
Process Types
The server runs as multiple independent processes, each with a specific role:
| Process | Entry Point | Purpose | Scaling |
|---|---|---|---|
| API | acoustid.server:make_application() |
Handle API requests | Horizontal |
| Web | acoustid.server:make_application() |
Serve web UI | Horizontal |
| Worker | acoustid.worker:run() |
Process background jobs | Horizontal |
| Cron | acoustid.cron:run() |
Execute scheduled tasks | Single instance |
| Import | acoustid.scripts.import_submissions |
Bulk import fingerprints | Manual |
Directory Structure
acoustid/
├── api/ # API layer
│ ├── __init__.py # API application factory
│ ├── errors.py # Error handling
│ ├── ratelimit.py # Rate limiting logic
│ └── v2/ # API v2 endpoints
│ ├── __init__.py
│ ├── lookup.py # Fingerprint lookup
│ ├── submit.py # Fingerprint submission
│ ├── misc.py # Utility endpoints
│ └── internal.py # Internal admin endpoints
├── data/ # Business logic layer
│ ├── account.py # User account operations
│ ├── application.py # API application management
│ ├── fingerprint.py # Fingerprint operations
│ ├── foreignid.py # Foreign ID management
│ ├── meta.py # Metadata operations
│ ├── musicbrainz.py # MusicBrainz queries
│ ├── stats.py # Statistics tracking
│ ├── submission.py # Submission processing
│ └── track.py # Track operations
├── future/ # Starlette migration
│ ├── app.py # ASGI application
│ ├── lookup.py # Async lookup handler
│ └── submit.py # Async submit handler
├── web/ # Web UI layer
│ ├── __init__.py # Web application factory
│ ├── views/ # View handlers
│ └── templates/ # Jinja2 templates
├── scripts/ # Utility scripts
│ ├── import_submissions.py
│ ├── backfill_fingerprint_index.py
│ └── update_lookup_stats.py
├── cli.py # CLI command definitions
├── server.py # WSGI/ASGI application
├── worker.py # Background worker
├── cron.py # Cron job scheduler
├── fingerprint.py # Fingerprint utilities
├── indexclient.py # Legacy TCP index client
├── fpstore.py # Modern HTTP index client
├── db.py # Database connection management
├── config.py # Configuration loading
└── tables.py # SQLAlchemy ORM models
Layered Architecture
The server follows a traditional layered architecture:
┌─────────────────────────────────────────┐
│ Presentation Layer │
│ (api/, web/, future/) │
│ - HTTP request/response handling │
│ - Input validation │
│ - Response formatting │
└─────────────────────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ Business Logic Layer │
│ (data/) │
│ - Domain operations │
│ - Business rules │
│ - Orchestration │
└─────────────────────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ Data Access Layer │
│ (db.py, tables.py) │
│ - Database queries │
│ - ORM models │
│ - Transaction management │
└─────────────────────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ External Services Layer │
│ (indexclient.py, fpstore.py) │
│ - Index communication │
│ - MusicBrainz queries │
│ - Redis operations │
└─────────────────────────────────────────┘
Framework Transition
The server is actively transitioning from Flask to Starlette:
Current (Flask/Werkzeug):
- Location:
acoustid/api/,acoustid/web/ - WSGI-based synchronous request handling
- Gunicorn as application server
- Blocking database operations with psycopg2
Future (Starlette):
- Location:
acoustid/future/ - ASGI-based asynchronous request handling
- Uvicorn as application server
- Async database operations with asyncpg
Migration Status:
- Core lookup and submit endpoints have async implementations
- Legacy endpoints still use Flask
- Both frameworks run simultaneously during transition
- Configuration flag controls which implementation is used
Index Architecture
LSM-Tree Design
The index uses a Log-Structured Merge-tree (LSM-tree) for efficient fingerprint storage and retrieval.
Core Concept:
- Writes go to in-memory segment (fast)
- Memory segment periodically flushed to disk
- Background process merges disk segments
- Reads check memory segment first, then disk segments
Components:
┌─────────────────────────────────────────┐
│ MultiIndex │
│ - Manages multiple named indexes │
│ - Routes requests to correct index │
└─────────────────────────────────────────┘
↓
┌─────────────────────────────────────────┐
│ Index │
│ - Single fingerprint index │
│ - Coordinates segments and merging │
└─────────────────────────────────────────┘
↓
┌──────────────────┬──────────────────────┐
│ MemorySegment │ FileSegment(s) │
│ - In-memory │ - On-disk │
│ - Fast writes │ - Immutable │
│ - Volatile │ - Persistent │
└──────────────────┴──────────────────────┘
↓
┌─────────────────────────────────────────┐
│ Oplog (Write-Ahead Log) │
│ - Durability for memory segment │
│ - Replay on crash recovery │
└─────────────────────────────────────────┘
Segment Management
MemorySegment (src/MemorySegment.zig):
- Hash map of fingerprint ID to posting list
- Posting list: array of term IDs (compressed)
- Maximum size threshold triggers flush
- Backed by Oplog for durability
FileSegment (src/FileSegment.zig):
- Immutable on-disk segment
- Binary file format with index and data sections
- StreamVByte compression for posting lists
- Memory-mapped for fast reads
Segment Lifecycle:
- Writes accumulate in MemorySegment
- MemorySegment reaches size threshold
- Flush to new FileSegment
- Clear MemorySegment and Oplog
- Background merger selects segments to merge
- Merge creates new larger FileSegment
- Delete old segments
Merge Policy
Tiered Merge Strategy:
- Segments grouped into tiers by size
- Tier 0: Smallest segments (recently flushed)
- Tier N: Largest segments (heavily merged)
- Merge triggered when tier has too many segments
- Merges segments within same tier
Benefits:
- Write amplification bounded
- Read performance improves over time
- Disk space reclaimed from deleted entries
File Format
Segment File Structure (src/filefmt.zig):
┌─────────────────────────────────────────┐
│ Header │
│ - Magic number │
│ - Version │
│ - Metadata │
├─────────────────────────────────────────┤
│ Index Section │
│ - Fingerprint ID → Offset mapping │
│ - Binary search tree or hash table │
├─────────────────────────────────────────┤
│ Data Section │
│ - Compressed posting lists │
│ - StreamVByte encoded │
└─────────────────────────────────────────┘
Block Compression (src/block.zig):
- Posting lists compressed in blocks
- StreamVByte SIMD compression
- Delta encoding for term IDs
- Typical compression ratio: 4-8x
Index Reader
IndexReader (src/IndexReader.zig):
- Read-only view of index
- Merges results from all segments
- Implements search algorithm
- Returns top-K candidates by score
Search Algorithm:
- Extract query terms from fingerprint
- For each term, fetch posting lists from all segments
- Merge posting lists (union)
- Score each candidate by term overlap
- Return top-K candidates sorted by score
Data Flow
Submission Flow (Detailed)
┌─────────┐
│ Client │
└────┬────┘
│ POST /v2/submit
↓
┌─────────────────────────────────────────┐
│ SubmitHandler (api/v2/submit.py) │
│ 1. Validate API keys (client + user) │
│ 2. Check rate limits (Redis) │
│ 3. Decode fingerprints │
│ 4. Insert into submission table │
│ 5. Publish to NATS queue │
└─────────────────────────────────────────┘
│
↓ NATS message
┌─────────────────────────────────────────┐
│ Worker (worker.py) │
│ 1. Consume message from NATS │
│ 2. Load submission from database │
└─────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────┐
│ FingerprintSearcher (data/fingerprint) │
│ 1. Extract query from fingerprint │
│ 2. Search index for matches │
└─────────────────────────────────────────┘
│
↓ HTTP POST /:index/_search
┌─────────────────────────────────────────┐
│ Index (fpindex) │
│ 1. Decode MessagePack request │
│ 2. Search segments │
│ 3. Score candidates │
│ 4. Return top matches │
└─────────────────────────────────────────┘
│
↓ Candidate fingerprint IDs
┌─────────────────────────────────────────┐
│ Worker (continued) │
│ 1. Fetch candidate metadata from DB │
│ 2. Decide: create new track or link │
│ 3. Insert/update track tables │
│ 4. Update index with new fingerprint │
│ 5. Store result in submission_result │
└─────────────────────────────────────────┘
│
↓ HTTP PUT /:index/:fpid
┌─────────────────────────────────────────┐
│ Index (fpindex) │
│ 1. Add fingerprint to MemorySegment │
│ 2. Append to Oplog │
│ 3. Trigger flush if needed │
└─────────────────────────────────────────┘
Lookup Flow (Detailed)
┌─────────┐
│ Client │
└────┬────┘
│ GET/POST /v2/lookup
↓
┌─────────────────────────────────────────┐
│ LookupHandler (api/v2/lookup.py) │
│ 1. Validate API key (client) │
│ 2. Check rate limits (Redis) │
│ 3. Parse parameters │
└─────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────┐
│ decode_fingerprint (fingerprint.py) │
│ 1. Decode base64 or compressed format │
│ 2. Decompress if needed │
│ 3. Parse Chromaprint data │
└─────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────┐
│ extract_query (fingerprint.py) │
│ 1. Extract hash terms from fingerprint│
│ 2. Build query structure │
└─────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────┐
│ fpstore.search (fpstore.py) │
│ 1. Encode query as MessagePack │
│ 2. HTTP POST to index │
└─────────────────────────────────────────┘
│
↓ HTTP POST /:index/_search
┌─────────────────────────────────────────┐
│ Index (fpindex) │
│ 1. Parse MessagePack query │
│ 2. Search all segments │
│ 3. Merge and score results │
│ 4. Return top-K candidates │
└─────────────────────────────────────────┘
│
↓ Candidate fingerprint IDs + scores
┌─────────────────────────────────────────┐
│ LookupHandler (continued) │
│ 1. Fetch fingerprint metadata from DB │
│ 2. Fetch track metadata from DB │
│ 3. Fetch MusicBrainz data if requested│
│ 4. Build result structure │
│ 5. Format as JSON/XML │
└─────────────────────────────────────────┘
│
↓ JSON response
┌─────────┐
│ Client │
└─────────┘
Background Processing
Cron Jobs (acoustid/cron.py):
- Update lookup statistics (hourly)
- Update user agent statistics (daily)
- Clean up old submissions (daily)
- Refresh materialized views (hourly)
- Backup index snapshots (daily)
Worker Tasks (acoustid/worker.py):
- Process fingerprint submissions
- Import bulk fingerprints
- Update index with new data
- Resolve MBID redirects
- Clean up orphaned records
Index Communication Protocols
Legacy Protocol (indexclient.py)
Transport: Raw TCP socket
Port: 6080 (default)
Format: Custom binary protocol
Message Structure:
┌────────────────┬────────────────┬────────────────┐
│ Length (4B) │ Command (1B) │ Payload │
└────────────────┴────────────────┴────────────────┘
Commands:
0x01: Search0x02: Insert0x03: Delete
Status: Being phased out, replaced by HTTP protocol
Modern Protocol (fpstore.py)
Transport: HTTP/1.1
Port: 6081 (default)
Format: MessagePack
Endpoints:
| Method | Path | Purpose |
|---|---|---|
| POST | /:index/_search |
Search for fingerprints |
| PUT | /:index/:fpid |
Insert/update fingerprint |
| DELETE | /:index/:fpid |
Delete fingerprint |
| GET | /:index |
Get index info |
| GET | /:index/_segments |
List segments |
| GET | /:index/_snapshot |
Create snapshot |
Search Request:
{
"query": [term_id1, term_id2, ...], # Query terms
"limit": 10, # Max results
"min_score": 0.5 # Score threshold
}
Search Response:
{
"results": [
{"id": fpid1, "score": 0.95},
{"id": fpid2, "score": 0.87},
...
]
}
Concurrency and Parallelism
Server Concurrency
API/Web Processes:
- Multiple worker processes (Gunicorn/Uvicorn)
- Each process handles requests independently
- Shared-nothing architecture
- Database connection pooling per process
Worker Processes:
- Multiple worker instances
- NATS queue provides work distribution
- Each worker processes one submission at a time
- No shared state between workers
Cron Process:
- Single instance (leader election via database)
- Scheduled tasks run sequentially
- Long-running tasks delegated to workers
Index Concurrency
Thread Model:
- Main thread: HTTP server
- Worker threads: Search and merge operations
- Configurable thread pool size
Locking Strategy:
- Read-write lock on Index
- Multiple concurrent readers
- Exclusive writer (for flush/merge)
- Lock-free MemorySegment (atomic operations)
Background Tasks:
- Segment merger runs in background thread
- Oplog flusher runs periodically
- Metrics collector runs independently
Scalability Considerations
Horizontal Scaling
API/Web:
- Stateless processes
- Scale by adding more instances
- Load balancer distributes requests
- Session state in Redis (if needed)
Workers:
- Scale by adding more instances
- NATS queue distributes work
- No coordination required
Index:
- Multiple index instances (sharding)
- Consistent hashing for fingerprint distribution
- NATS for cluster coordination
- Each instance handles subset of fingerprints
Vertical Scaling
Database:
- Connection pooling
- Read replicas for queries
- Partitioning for large tables
- Materialized views for aggregations
Index:
- More threads for search
- Larger memory segment
- Faster disk for segments
- More RAM for file caching
Fault Tolerance
Server Resilience
Database Failures:
- Connection retry with exponential backoff
- Health checks detect failures
- Read-only mode if write DB unavailable
Index Failures:
- Graceful degradation (return partial results)
- Retry with exponential backoff
- Circuit breaker pattern
NATS Failures:
- Persistent queue (JetStream)
- Automatic reconnection
- Message replay on recovery
Index Resilience
Crash Recovery:
- Oplog replay restores MemorySegment
- FileSegments are immutable (no corruption)
- Incomplete merges discarded
Data Integrity:
- Checksums in file format
- Atomic file operations
- Write-ahead logging
Replication:
- NATS-based replication (optional)
- Snapshot-based backup
- Point-in-time recovery
Performance Characteristics
Server Performance
Lookup Latency:
- P50: ~50ms (including index search)
- P95: ~200ms
- P99: ~500ms
Bottlenecks:
- Index search time (dominant)
- Database query time (metadata fetch)
- Network latency (MusicBrainz queries)
Index Performance
Search Latency:
- P50: ~5ms
- P95: ~20ms
- P99: ~50ms
Throughput:
- ~1000 searches/second (single instance)
- ~500 inserts/second (single instance)
Bottlenecks:
- Disk I/O (segment reads)
- CPU (decompression and scoring)
- Memory (segment caching)
Future Architecture Plans
Server Modernization
- Complete migration to Starlette/ASGI
- Remove Flask dependencies
- Async database operations everywhere
- GraphQL API alongside REST
Index Enhancements
- Distributed index with automatic sharding
- Replication for high availability
- Incremental snapshots
- Query result caching
Infrastructure
- Kubernetes deployment
- Service mesh (Istio/Linkerd)
- Distributed tracing (OpenTelemetry)
- Advanced monitoring (Prometheus + Grafana)