# AcoustID System Overview

## Introduction

AcoustID is an open-source audio fingerprinting service that identifies music recordings by analyzing their acoustic characteristics. The system consists of two primary components working in tandem: a Python-based web service (acoustid-server) and a high-performance Zig-based fingerprint index (acoustid-index). Together, they provide a production-grade solution for matching audio fingerprints to MusicBrainz metadata.

## System Components

### acoustid-server (Python)

The server component handles all user-facing operations, database management, and business logic.

**Repository**: acoustid/acoustid-server
**License**: MIT
**Language**: Python 3.12+
**Current Version**: 26.3.1

**Core Technologies**:
- **Web Framework**: Werkzeug/Flask (current), migrating to Starlette (future, async)
- **ORM**: SQLAlchemy 2.x with multi-database support
- **Database**: PostgreSQL 17.4 (4 separate databases)
- **Cache/Queue**: Redis for rate limiting and task queues
- **Message Queue**: NATS with JetStream for async submission processing
- **Application Servers**: Uvicorn (ASGI) for async endpoints, Gunicorn (WSGI) for legacy endpoints

**Key Dependencies**:
```
acoustid-ext (C extension for Chromaprint)
Flask (current web framework)
Starlette (future async framework)
aiohttp (async HTTP client)
SQLAlchemy 2.x (ORM)
alembic (database migrations)
asyncpg (async PostgreSQL driver)
psycopg2 (sync PostgreSQL driver)
nats-py (NATS client)
mbdata (MusicBrainz data models)
msgspec (fast JSON/MessagePack)
zstd (compression)
gunicorn (WSGI server)
uvicorn (ASGI server)
```

**Entry Point**:
```bash
# Main CLI entry: manage.py dispatches to acoustid.cli:main()

# Available commands
python manage.py run web      # Web UI server
python manage.py run api      # API server
python manage.py run cron     # Scheduled tasks
python manage.py run worker   # Background worker
python manage.py run import   # Import fingerprints
```

**File Locations**:
- Entry script: `manage.py`
- CLI implementation: `acoustid/cli.py`
- Server logic: `acoustid/server.py`
- Worker logic: `acoustid/worker.py`
- Cron jobs: `acoustid/cron.py`
- Configuration: `acoustid/config.py`

### acoustid-index (Zig)

The index component provides fast fingerprint search using an LSM-tree inverted index and SIMD-accelerated posting-list compression.

**Repository**: acoustid/acoustid-index
**License**: GPL-3.0
**Language**: Zig
**Build System**: Zig build system

**Core Technologies**:
- **HTTP Server**: httpz (Zig HTTP library)
- **Data Structure**: LSM-tree (Log-Structured Merge-tree) inverted index
- **Compression**: StreamVByte SIMD compression for posting lists
- **Serialization**: MessagePack for wire protocol
- **Metrics**: Prometheus-compatible metrics endpoint
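
The posting-list compression named in the list above can be illustrated in scalar form. The sketch below is a plain-Python rendering of the StreamVByte layout (one control byte per group of four integers, holding a 2-bit length code for each, with the data bytes kept in a separate stream), applied to delta-encoded sorted IDs. It uses no SIMD and is not taken from `src/streamvbyte.zig`; the function names and the delta-encoding step are assumptions made for illustration.

```python
def encode_streamvbyte(values):
    """Encode unsigned 32-bit integers into (control_bytes, data_bytes)."""
    control = bytearray()
    data = bytearray()
    for start in range(0, len(values), 4):
        ctrl = 0
        for slot, value in enumerate(values[start:start + 4]):
            if not 0 <= value < 2**32:
                raise ValueError("value out of uint32 range")
            # 1 to 4 bytes per value; the 2-bit code stores (byte count - 1).
            nbytes = max(1, (value.bit_length() + 7) // 8)
            ctrl |= (nbytes - 1) << (2 * slot)
            data += value.to_bytes(nbytes, "little")
        control.append(ctrl)
    return bytes(control), bytes(data)


def decode_streamvbyte(control, data, count):
    """Decode `count` integers from the two streams."""
    values = []
    pos = 0
    for index in range(count):
        ctrl = control[index // 4]
        nbytes = ((ctrl >> (2 * (index % 4))) & 0b11) + 1
        values.append(int.from_bytes(data[pos:pos + nbytes], "little"))
        pos += nbytes
    return values


def to_deltas(sorted_ids):
    """Replace each ID with its gap from the previous one (assumed step)."""
    previous = 0
    deltas = []
    for doc_id in sorted_ids:
        deltas.append(doc_id - previous)
        previous = doc_id
    return deltas


def from_deltas(deltas):
    """Inverse of to_deltas: running sum of the gaps."""
    total = 0
    ids = []
    for delta in deltas:
        total += delta
        ids.append(total)
    return ids


posting_list = [3, 10, 300, 70000, 70001]
control, data = encode_streamvbyte(to_deltas(posting_list))
restored = from_deltas(decode_streamvbyte(control, data, len(posting_list)))
```

Keeping the length codes apart from the payload is what lets a SIMD decoder look up a shuffle mask per control byte; the scalar loop here only demonstrates the byte layout.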

**Key Dependencies**:
```
httpz (HTTP server framework)
metrics (Prometheus metrics)
zul (Zig utility library)
msgpack (MessagePack serialization)
nats (NATS client)
```

**Entry Point**:
```bash
# Build and run
zig build run -- --dir /tmp --port 8080

# Binary name
fpindex

# CLI flags
--dir <path>          # Data directory for index storage
--port <number>       # HTTP server port (default: 6081)
--threads <number>    # Worker thread count
--log-level <level>   # Logging verbosity
--cluster <name>      # Cluster name for distributed setup
--nats-url <url>      # NATS server URL for clustering
```

**File Locations**:
- Main entry: `src/main.zig`
- HTTP server: `src/server.zig`
- API handlers: `src/api.zig`
- Multi-index manager: `src/MultiIndex.zig`
- Core index: `src/Index.zig`
- Index reader: `src/IndexReader.zig`
- Segment management: `src/segment.zig`
- Memory segment: `src/MemorySegment.zig`
- File segment: `src/FileSegment.zig`
- Write-ahead log: `src/Oplog.zig`
- File format: `src/filefmt.zig`
- Block compression: `src/block.zig`
- SIMD compression: `src/streamvbyte.zig`
- Metrics: `src/metrics.zig`

## Build and Run

### Server Build

```bash
# Install dependencies with uv
uv sync

# Build Chromaprint extension
# (handled automatically in Docker build)

# Run with Docker Compose
docker compose up
```

**Docker Compose Services**:
- `nats`: Message queue
- `redis`: Cache and rate limiting
- `postgres`: Database (custom pg17.4 image)
- `index`: Fingerprint index service
- `api`: API server
- `web`: Web UI server
- `cron`: Scheduled tasks
- `worker`: Background job processor

### Index Build

```bash
# Build binary
zig build

# Run with options
zig build run -- --dir /var/lib/acoustid-index --port 6081 --threads 4
```

## Architecture Relationship

The two components work together in a client-server model, with the server acting as the client of the index:

1. **Server** receives fingerprint submissions and lookup requests via HTTP API
2. **Server** stores metadata in PostgreSQL
3. **Server** sends fingerprint data to **Index** via HTTP/MessagePack protocol
4. **Index** runs the similarity search against its LSM-tree inverted index
5. **Index** returns candidate fingerprint IDs to **Server**
6. **Server** enriches results with metadata from PostgreSQL and MusicBrainz
7. **Server** returns final results to client

## Communication Protocols

### Server to Index

**Modern Protocol** (fpstore.py):
- HTTP POST to `http://index:6081/:index/_search`
- Request body: MessagePack-encoded fingerprint query
- Response: MessagePack-encoded list of candidate IDs with scores
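
To make the request shape concrete, here is a minimal sketch that builds (but does not send) such a search request using only the standard library. The body field names `query` and `limit`, the index name `main`, and the `Content-Type` value are assumptions, not details confirmed by this overview. The hand-written encoder covers just the MessagePack subset the sketch needs; real code would use a MessagePack library such as the `msgspec` dependency listed earlier.

```python
import struct
import urllib.request


def pack(obj):
    """Encode a small MessagePack subset: non-negative ints, str, list, dict."""
    if isinstance(obj, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(obj, int):
        if obj < 0:
            raise ValueError("negative integers are not handled in this sketch")
        if obj < 0x80:
            return bytes([obj])                       # positive fixint
        if obj <= 0xFF:
            return b"\xcc" + struct.pack(">B", obj)   # uint 8
        if obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)   # uint 16
        if obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)   # uint 32
        return b"\xcf" + struct.pack(">Q", obj)       # uint 64
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) < 32:
            return bytes([0xA0 | len(raw)]) + raw     # fixstr
        if len(raw) <= 0xFF:
            return b"\xd9" + bytes([len(raw)]) + raw  # str 8
        raise ValueError("string too long for this sketch")
    if isinstance(obj, list):
        if len(obj) < 16:
            head = bytes([0x90 | len(obj)])           # fixarray
        elif len(obj) <= 0xFFFF:
            head = b"\xdc" + struct.pack(">H", len(obj))  # array 16
        else:
            head = b"\xdd" + struct.pack(">I", len(obj))  # array 32
        return head + b"".join(pack(item) for item in obj)
    if isinstance(obj, dict):
        if len(obj) >= 16:
            raise ValueError("map too large for this sketch")
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body        # fixmap
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def build_search_request(base_url, index_name, hashes, limit=10):
    """Build a POST request for <base_url>/<index_name>/_search."""
    # "query" and "limit" are assumed field names, not a documented schema.
    body = pack({"query": list(hashes), "limit": limit})
    return urllib.request.Request(
        f"{base_url}/{index_name}/_search",
        data=body,
        headers={"Content-Type": "application/vnd.msgpack"},  # assumed value
        method="POST",
    )


request = build_search_request("http://index:6081", "main", [1, 300])
```

Passing `request` to `urllib.request.urlopen` would send it; the response body would then need a matching MessagePack decoder.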

**Legacy Protocol** (indexclient.py):
- Raw TCP socket connection
- Binary protocol with custom framing
- Being phased out in favor of HTTP

### Client to Server

**Public API**:
- HTTP GET/POST to `https://api.acoustid.org/v2/*`
- JSON/XML/JSONP responses
- Rate-limited by API key and IP
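
As an illustration of the client side, the sketch below assembles a lookup URL without performing the request. The parameter names (`client`, `duration`, `fingerprint`, `meta`, `format`) are assumptions not spelled out in this overview and should be checked against API.md; the key and fingerprint values are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_BASE = "https://api.acoustid.org/v2"


def build_lookup_url(client_key, duration, fingerprint,
                     meta=("recordings",), fmt="json"):
    """Return a GET URL for the lookup endpoint (parameter names assumed)."""
    params = {
        "format": fmt,
        "client": client_key,
        "duration": int(duration),
        "fingerprint": fingerprint,
    }
    if meta:
        # Several metadata sections are joined into one space-separated value.
        params["meta"] = " ".join(meta)
    return f"{API_BASE}/lookup?{urlencode(params)}"


url = build_lookup_url("PLACEHOLDER_KEY", 241.6, "AQAAplaceholder",
                       meta=("recordings", "releasegroups"))
query = parse_qs(urlsplit(url).query)
```

Long fingerprints can exceed practical URL lengths, which is one reason the API also accepts POST.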

## Version Information

**Server Version**: 26.3.1
- Semantic versioning
- Tagged releases in Git
- Version defined in `acoustid/__init__.py`

**Index Version**: No formal versioning yet
- Tracked by Git commit hash
- Breaking changes communicated via commit messages

## Deployment Models

### Production (acoustid.org)

- Multi-server deployment
- Separate API, web, worker, and cron processes
- Dedicated PostgreSQL cluster (4 databases)
- Redis cluster for caching
- NATS cluster for message queue
- Multiple index instances for load balancing

### Self-Hosted (Docker Compose)

- Single-host deployment
- All services in containers
- Shared PostgreSQL instance
- Single Redis instance
- Single NATS instance
- Single index instance

### Development (Local)

- Python virtual environment with uv
- Local PostgreSQL (or Docker)
- Local Redis (or Docker)
- Local NATS (or Docker)
- Index built and run locally with Zig

## Key Features

### Server Features

- **Fingerprint Submission**: Accept audio fingerprints with optional metadata
- **Fingerprint Lookup**: Match fingerprints to known recordings
- **MusicBrainz Integration**: Link fingerprints to MBIDs
- **User Management**: API key generation and management
- **Rate Limiting**: Multi-tier rate limiting (global, app, IP)
- **Batch Operations**: Submit/lookup up to 20 fingerprints per request
- **Async Processing**: Background workers for heavy operations
- **Health Checks**: Multiple health endpoints for monitoring
- **Metrics**: StatsD metrics for observability
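
The multi-tier rate limiting listed above could work along the following lines. This is a minimal in-memory sketch using fixed-window counters; a dictionary stands in for Redis, and the class name, limits, and window length are all hypothetical rather than taken from the server code.

```python
import time


class FixedWindowLimiter:
    """Allow a request only if the global, app, and IP tiers all have quota."""

    def __init__(self, limits, window_seconds=1.0, clock=time.monotonic):
        self.limits = limits          # e.g. {"global": 100, "app": 3, "ip": 2}
        self.window = window_seconds
        self.clock = clock
        self.counters = {}            # stand-in for Redis counters with a TTL

    def allow(self, app_key, ip):
        """Return (allowed, rejecting_tier)."""
        window_id = int(self.clock() // self.window)
        keys = {
            "global": ("global", window_id),
            "app": ("app", app_key, window_id),
            "ip": ("ip", ip, window_id),
        }
        # Check every tier first so a rejected request consumes no quota.
        for tier, key in keys.items():
            if self.counters.get(key, 0) >= self.limits[tier]:
                return False, tier
        for key in keys.values():
            self.counters[key] = self.counters.get(key, 0) + 1
        return True, None


now = [0.0]  # controllable clock for the demonstration
limiter = FixedWindowLimiter({"global": 100, "app": 3, "ip": 2},
                             clock=lambda: now[0])
first = limiter.allow("app-key", "198.51.100.1")
second = limiter.allow("app-key", "198.51.100.1")
third = limiter.allow("app-key", "198.51.100.1")   # rejected by the IP tier
```

Old windows are never evicted in this sketch; with Redis, an expiry on each counter key would handle that.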

### Index Features

- **Fast Search**: Sub-millisecond fingerprint matching
- **SIMD Optimization**: StreamVByte compression for posting lists
- **LSM-Tree Storage**: Efficient write and read performance
- **Background Merging**: Automatic segment compaction
- **Snapshot Support**: Point-in-time index snapshots
- **Cluster Support**: Distributed index via NATS
- **Prometheus Metrics**: Built-in metrics endpoint
- **HTTP API**: RESTful API for all operations
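
The LSM-tree storage and segment merging listed above can be pictured with a toy inverted index: writes go to a mutable in-memory segment, full segments are frozen, and frozen segments are compacted once too many accumulate. The class below is an illustrative sketch only; its names, thresholds, and overlap-count scoring are assumptions, it merges synchronously rather than in the background, and it omits the write-ahead log, deletes, and on-disk format.

```python
from collections import defaultdict


class TinyLsmIndex:
    """Toy LSM-style inverted index mapping hash -> fingerprint IDs."""

    def __init__(self, memory_limit=2, max_segments=2):
        self.memory_limit = memory_limit   # fingerprints held before a flush
        self.max_segments = max_segments   # frozen segments before a merge
        self.memory = defaultdict(set)     # mutable in-memory segment
        self.memory_docs = 0
        self.segments = []                 # immutable segments, oldest first

    def insert(self, fingerprint_id, hashes):
        for value in hashes:
            self.memory[value].add(fingerprint_id)
        self.memory_docs += 1
        if self.memory_docs >= self.memory_limit:
            self._flush()

    def _flush(self):
        # Freeze the memory segment into sorted, read-only posting lists.
        frozen = {h: tuple(sorted(ids)) for h, ids in self.memory.items()}
        self.segments.append(frozen)
        self.memory = defaultdict(set)
        self.memory_docs = 0
        if len(self.segments) > self.max_segments:
            self._merge()

    def _merge(self):
        # Compact all frozen segments into one.
        merged = defaultdict(set)
        for segment in self.segments:
            for value, ids in segment.items():
                merged[value].update(ids)
        self.segments = [{h: tuple(sorted(ids)) for h, ids in merged.items()}]

    def search(self, hashes):
        """Return (fingerprint_id, matching_hash_count), best match first."""
        scores = defaultdict(int)
        for value in set(hashes):
            found = set(self.memory.get(value, ()))
            for segment in self.segments:
                found.update(segment.get(value, ()))
            for fingerprint_id in found:
                scores[fingerprint_id] += 1
        return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = TinyLsmIndex()
index.insert(1, [10, 20, 30])
index.insert(2, [20, 30, 40])   # second insert triggers a flush
index.insert(3, [30, 40, 50])
index.insert(4, [99])           # another flush: two frozen segments
index.insert(5, [10, 20, 99])   # still in the memory segment
hits = index.search([10, 20, 30])
```

A search has to consult the memory segment and every frozen segment, which is why compaction matters for read performance.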

## Configuration

### Server Configuration

**Config File**: `acoustid.conf` (INI format)
**Environment Variables**: `ACOUSTID_*` prefix
**Secret Files**: `*_file` suffix for file-based secrets

Example:
```ini
[database]
name = acoustid_app
user = acoustid
password_file = /run/secrets/db_password

[redis]
host = redis
port = 6379

[fingerprint_index]
host = index
port = 6081
```
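
One way the three configuration sources could combine is sketched below with the standard `configparser` module: options ending in `_file` are replaced by the contents of the named file, and environment variables override values from the file. The variable naming scheme `ACOUSTID_<SECTION>_<KEY>` is an assumption (this overview only states the prefix), and `load_config` is a hypothetical helper, not the logic in `acoustid/config.py`.

```python
import configparser
import os


def _read_text(path):
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def load_config(text, environ=None, read_file=_read_text):
    """Parse INI text, resolve *_file secrets, then apply env overrides."""
    environ = os.environ if environ is None else environ
    parser = configparser.ConfigParser()
    parser.read_string(text)

    config = {}
    for section in parser.sections():
        values = {}
        for key, value in parser.items(section, raw=True):
            if key.endswith("_file"):
                # password_file = /path  ->  password = <file contents>
                values[key[:-len("_file")]] = read_file(value).strip()
            else:
                values[key] = value
        config[section] = values

    # Assumed scheme: ACOUSTID_REDIS_PORT overrides [redis] port.
    prefix = "ACOUSTID_"
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):].lower()
        for section in config:
            if rest.startswith(section + "_"):
                config[section][rest[len(section) + 1:]] = value
                break
    return config


EXAMPLE = """
[database]
name = acoustid_app
password_file = /run/secrets/db_password

[redis]
host = redis
port = 6379
"""

config = load_config(
    EXAMPLE,
    environ={"ACOUSTID_REDIS_PORT": "6380"},
    read_file=lambda path: "s3cret\n",   # fake secret file for the demo
)
```

Injecting `environ` and `read_file` keeps the sketch testable without touching the real environment or filesystem.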

### Index Configuration

**CLI Flags Only**: No config file support
**Environment Variables**: Limited support

Example:
```bash
fpindex \
  --dir /var/lib/acoustid-index \
  --port 6081 \
  --threads 4 \
  --log-level info \
  --nats-url nats://nats:4222
```

## Data Flow Summary

### Submission Flow

1. Client submits fingerprint via `/v2/submit`
2. Server validates API keys and rate limits
3. Server stores submission in `submission` table
4. Server publishes message to NATS queue
5. Worker picks up message from NATS
6. Worker searches index for matches
7. Worker creates or links track in PostgreSQL
8. Worker updates index with new fingerprint
9. Client polls `/v2/submission_status` for result
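
Steps 3 through 9 above can be traced with a small in-memory simulation. In this sketch `queue.Queue` stands in for NATS and plain dictionaries stand in for PostgreSQL and the index; the class name, the status strings, and the 0.8 overlap threshold are hypothetical, and validation and rate limiting (step 2) are left out.

```python
import queue


class SubmissionPipeline:
    """In-memory stand-in for the asynchronous submission flow."""

    def __init__(self):
        self.submissions = {}           # stands in for the `submission` table
        self.tracks = {}                # track id -> fingerprint ids
        self.index = {}                 # fingerprint id -> set of hashes
        self.messages = queue.Queue()   # stands in for the NATS queue
        self._next_id = 1

    def submit(self, hashes):
        """Steps 3-4: store the submission, then publish a message."""
        submission_id = self._next_id
        self._next_id += 1
        self.submissions[submission_id] = {
            "hashes": frozenset(hashes), "status": "pending", "track": None,
        }
        self.messages.put(submission_id)
        return submission_id

    def work_once(self, min_overlap=0.8):
        """Steps 5-8: take one message, search, link a track, update index."""
        submission_id = self.messages.get_nowait()
        submission = self.submissions[submission_id]
        hashes = submission["hashes"]

        match = None
        for fingerprint_id, known in self.index.items():
            if len(hashes & known) / max(len(hashes), 1) >= min_overlap:
                match = fingerprint_id
                break

        if match is None:
            track_id = len(self.tracks) + 1      # create a new track
            self.tracks[track_id] = []
        else:                                    # link to the matched track
            track_id = next(t for t, ids in self.tracks.items() if match in ids)

        fingerprint_id = submission_id           # simplification for the sketch
        self.tracks[track_id].append(fingerprint_id)
        self.index[fingerprint_id] = set(hashes)
        submission["status"] = "imported"
        submission["track"] = track_id

    def status(self, submission_id):
        """Step 9: what a status poll would report."""
        submission = self.submissions[submission_id]
        return {"status": submission["status"], "track": submission["track"]}


pipeline = SubmissionPipeline()
first = pipeline.submit([1, 2, 3, 4, 5])
before = pipeline.status(first)
pipeline.work_once()
after = pipeline.status(first)
```

The gap between `before` and `after` is the reason clients poll: the submit call returns as soon as the message is queued, not when the worker has finished.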

### Lookup Flow

1. Client requests lookup via `/v2/lookup`
2. Server validates API key and rate limits
3. Server decodes fingerprint from request
4. Server extracts query features from fingerprint
5. Server sends search request to index
6. Index returns candidate fingerprint IDs
7. Server fetches metadata from PostgreSQL
8. Server fetches MusicBrainz data if requested
9. Server returns enriched results as JSON
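
Steps 5 through 9 above amount to a search followed by a metadata join. The sketch below shows that shape with dictionaries standing in for the index and PostgreSQL; the scoring formula and the response fields (`status`, `results`, `id`, `score`, `recordings`) are assumptions for illustration, and fingerprint decoding (steps 3 and 4) is out of scope.

```python
import json


def lookup(query_hashes, index, metadata, limit=5):
    """Search a stand-in index, enrich the candidates, and return JSON."""
    query = set(query_hashes)
    if not query:
        return json.dumps({"status": "ok", "results": []})

    # Steps 5-6: score each fingerprint by the fraction of query hashes it shares.
    candidates = []
    for fingerprint_id, hashes in index.items():
        shared = len(query & hashes)
        if shared:
            candidates.append((fingerprint_id, shared / len(query)))
    candidates.sort(key=lambda item: (-item[1], item[0]))

    # Steps 7-8: attach metadata; candidates without a record are dropped.
    results = []
    for fingerprint_id, score in candidates[:limit]:
        record = metadata.get(fingerprint_id)
        if record is None:
            continue
        results.append({
            "id": record["track"],
            "score": round(score, 2),
            "recordings": record["recordings"],
        })

    # Step 9: serialize the enriched results.
    return json.dumps({"status": "ok", "results": results})


fake_index = {1: {1, 2, 3, 4}, 2: {3, 4, 5, 6}, 3: {9}}
fake_metadata = {
    1: {"track": "track-1", "recordings": ["recording-a"]},
    2: {"track": "track-2", "recordings": []},
}
response = json.loads(lookup([1, 2, 3, 4], fake_index, fake_metadata))
```

Keeping the index response down to IDs and scores means the index never needs to know about tracks or recordings; the join happens entirely on the server side.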

## Technology Stack Summary

| Component | Server | Index |
|-----------|--------|-------|
| Language | Python 3.12+ | Zig |
| Web Framework | Flask/Starlette | httpz |
| Database | PostgreSQL 17.4 | N/A (file-based) |
| ORM | SQLAlchemy 2.x | N/A |
| Cache | Redis | N/A |
| Queue | NATS+JetStream | NATS (optional) |
| Serialization | JSON/MessagePack | MessagePack |
| Compression | zstd | StreamVByte |
| Metrics | StatsD | Prometheus |
| Testing | pytest | Zig test |
| Build | uv | zig build |
| Container | Docker | Docker |

## Repository Structure

### acoustid-server

```
acoustid/
├── api/              # API handlers
│   └── v2/           # API v2 endpoints
├── data/             # Business logic layer
├── future/           # Starlette migration code
├── web/              # Web UI handlers
├── scripts/          # Utility scripts
├── cli.py            # CLI commands
├── server.py         # Server entry point
├── worker.py         # Background worker
├── cron.py           # Scheduled tasks
├── fingerprint.py    # Fingerprint utilities
├── indexclient.py    # Legacy index client
├── fpstore.py        # Modern index client
├── db.py             # Database connection
├── config.py         # Configuration
└── tables.py         # SQLAlchemy models
```

### acoustid-index

```
src/
├── main.zig            # Entry point
├── server.zig          # HTTP server
├── api.zig             # API handlers
├── MultiIndex.zig      # Multi-index manager
├── Index.zig           # Core index
├── IndexReader.zig     # Read-only index view
├── segment.zig         # Segment interface
├── MemorySegment.zig   # In-memory segment
├── FileSegment.zig     # On-disk segment
├── Oplog.zig           # Write-ahead log
├── filefmt.zig         # File format
├── block.zig           # Block compression
├── streamvbyte.zig     # SIMD compression
└── metrics.zig         # Prometheus metrics
```

## Next Steps

For detailed information on specific aspects of the AcoustID system, refer to:

- **ARCHITECTURE.md**: Detailed architecture and data flow
- **API.md**: Complete API reference
- **DATA.md**: Database schema and data models
- **INTEGRATIONS.md**: External service integrations
- **DEPLOYMENT.md**: Deployment and infrastructure
- **CODEBASE.md**: Code organization and patterns
- **EVALUATION.md**: System evaluation and recommendations