feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
@@ -0,0 +1,55 @@
+# AcoustID
+
+## Overview
+
+AcoustID is an open-source audio fingerprinting service. It identifies music tracks by their acoustic fingerprint and links them to MusicBrainz recordings.
+
+## Key Features
+
+- **Purpose**: Audio identification via acoustic fingerprinting
+- **Technology**: Chromaprint fingerprint generation
+- **Database**: Crowdsourced fingerprints linked to MusicBrainz
+- **License**: MIT (code), CC BY-SA 3.0 (data)
+
+## Source
+
+| Resource | URL |
+|----------|-----|
+| **Server Repository** | https://github.com/acoustid/acoustid-server |
+| **Index Repository** | https://github.com/acoustid/acoustid-index |
+| **Chromaprint Library** | https://github.com/acoustid/chromaprint |
+| **API Documentation** | https://acoustid.org/webservice |
+| **Website** | https://acoustid.org |
+
+## API Examples
+
+```bash
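+# Look up a fingerprint ($FP and $DUR as printed by fpcalc)
+curl "https://api.acoustid.org/v2/lookup?client=YOUR_API_KEY&meta=recordings&duration=$DUR&fingerprint=$FP"
+
+# Submit a new fingerprint (needs an application key and a user key)
+curl https://api.acoustid.org/v2/submit \
+  -d client=YOUR_API_KEY -d user=YOUR_USER_KEY \
+  -d "duration.0=$DUR" -d "fingerprint.0=$FP"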
+```
+
+## Chromaprint CLI
+
+```bash
+# Generate fingerprint from audio file
+fpcalc song.mp3
+# Prints KEY=VALUE lines such as DURATION=... and FINGERPRINT=...
+```
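
A client has to split those lines before it can call the lookup API. Here is a minimal Python sketch. It assumes the default `KEY=VALUE` line format shown above. `parse_fpcalc_output` and `fingerprint_file` are hypothetical helper names, and `fingerprint_file` assumes `fpcalc` is installed and on `PATH`.

```python
import subprocess


def parse_fpcalc_output(text: str) -> dict:
    """Parse fpcalc's KEY=VALUE output lines into a dict (hypothetical helper)."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            result[key.strip().upper()] = value.strip()
    if "DURATION" in result:
        # The lookup API takes whole seconds; float() also tolerates "240.5".
        result["DURATION"] = int(float(result["DURATION"]))
    return result


def fingerprint_file(path: str) -> dict:
    """Run fpcalc on one file (assumes fpcalc is on PATH)."""
    proc = subprocess.run(["fpcalc", path], capture_output=True, text=True, check=True)
    return parse_fpcalc_output(proc.stdout)
```

For example, `info = fingerprint_file("song.mp3")` yields `info["DURATION"]` and `info["FINGERPRINT"]`, which map directly onto the `duration` and `fingerprint` lookup parameters.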
+
+## Self-Hosting
+
+The acoustid-index v2 is written in Zig for performance:
+
+```bash
+git clone https://github.com/acoustid/acoustid-index.git
+# Follow build instructions in README
+```
+
+## Notes
+
+- Used by: Beets, Picard, Kid3, MusicBrainz ecosystem
+- Free API for audio fingerprint matching (non-commercial use)
+- Identify unknown files → get MusicBrainz metadata
@@ -0,0 +1,807 @@
+# AcoustID API Reference
+
+## API Overview
+
+The AcoustID API provides fingerprint-based music identification services. The API is RESTful, supports multiple response formats (JSON, XML, JSONP), and requires API key authentication for most operations.
+
+**Base URL**: `https://api.acoustid.org`  
+**Protocol**: HTTPS only  
+**Authentication**: API key (application key + user key for submissions)  
+**Rate Limiting**: Multi-tier (global, application, IP-based)
+
+## Public API Endpoints
+
+### Fingerprint Lookup
+
+Identify recordings by audio fingerprint.
+
+#### `/v2/lookup`
+
+**Methods**: GET, POST  
+**Authentication**: Required (client key)  
+**Rate Limit**: 3 requests/second (IP), 10 requests/second (application)
+
+**Required Parameters**:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `client` | string | Application API key |
+| `fingerprint` | string | Chromaprint fingerprint (base64 or compressed); required unless `trackid` is given |
+| `duration` | integer | Track duration in seconds; required with `fingerprint` |
+| `trackid` | string | AcoustID track ID; alternative to `fingerprint` and `duration` |
+
+**Optional Parameters**:
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `format` | string | Response format: `json`, `xml`, `jsonp` | `json` |
+| `jsoncallback` | string | JSONP callback function name | - |
+| `meta` | string | Metadata to include (see below) | - |
+
+**Metadata Options** (space-separated; spaces are encoded as `+` in URLs):
+
+- `recordings`: Include MusicBrainz recording metadata
+- `recordingids`: Include only recording MBIDs (faster)
+- `releases`: Include release metadata
+- `releaseids`: Include only release MBIDs
+- `releasegroups`: Include release group metadata
+- `releasegroupids`: Include only release group MBIDs
+- `tracks`: Include track metadata
+- `compress`: Compress response with gzip
+- `usermeta`: Include user-submitted metadata
+- `sources`: Include submission source information
+
+**Batch Lookup**:
+
+Submit multiple fingerprints in a single request using indexed parameters:
+
+```
+duration.0=240&fingerprint.0=AQADtN...
+duration.1=180&fingerprint.1=AQABtK...
+```
+
+**Limits**:
+- Maximum 20 fingerprints per batch request
+- Maximum 100 track IDs per request
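
The indexed-parameter scheme can be built mechanically with Python's standard `urllib`. In this sketch, `build_batch_lookup_params` and `build_batch_lookup_request` are hypothetical helpers. The 20-fingerprint cap mirrors the limit stated above. Building the request does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

LOOKUP_URL = "https://api.acoustid.org/v2/lookup"
MAX_BATCH_FINGERPRINTS = 20  # limit quoted in this document


def build_batch_lookup_params(client, items, meta="recordings"):
    """items: sequence of (duration_seconds, fingerprint) pairs."""
    if not items:
        raise ValueError("at least one fingerprint is required")
    if len(items) > MAX_BATCH_FINGERPRINTS:
        raise ValueError(f"at most {MAX_BATCH_FINGERPRINTS} fingerprints per request")
    params = {"client": client, "meta": meta}
    for i, (duration, fingerprint) in enumerate(items):
        params[f"duration.{i}"] = str(int(duration))  # whole seconds
        params[f"fingerprint.{i}"] = fingerprint
    return params


def build_batch_lookup_request(client, items, meta="recordings"):
    """Return an unsent POST request; pass it to urllib.request.urlopen to send."""
    body = urlencode(build_batch_lookup_params(client, items, meta)).encode("ascii")
    return Request(
        LOOKUP_URL,
        data=body,  # a non-None body makes urllib use POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending it is then `urllib.request.urlopen(req, timeout=30)`. A form-encoded POST body avoids URL length limits when several long fingerprints are included.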
+
+**Example Request** (GET):
+```
+GET /v2/lookup?client=8XaBELgH&duration=240&fingerprint=AQADtNGiJE...&meta=recordings
+```
+
+**Example Request** (POST):
+```
+POST /v2/lookup
+Content-Type: application/x-www-form-urlencoded
+
+client=8XaBELgH&duration=240&fingerprint=AQADtNGiJE...&meta=recordings
+```
+
+**Example Response** (JSON):
+```json
+{
+  "status": "ok",
+  "results": [
+    {
+      "id": "7e8b1234-5678-90ab-cdef-1234567890ab",
+      "score": 0.95,
+      "recordings": [
+        {
+          "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
+          "title": "Example Song",
+          "duration": 240,
+          "artists": [
+            {
+              "id": "12345678-90ab-cdef-1234-567890abcdef",
+              "name": "Example Artist"
+            }
+          ],
+          "releases": [
+            {
+              "id": "abcdef12-3456-7890-abcd-ef1234567890",
+              "title": "Example Album",
+              "country": "US",
+              "date": {
+                "year": 2020,
+                "month": 5,
+                "day": 15
+              },
+              "track_count": 12,
+              "medium_count": 1
+            }
+          ]
+        }
+      ]
+    }
+  ]
+}
+```
+
+**Response Fields**:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `status` | string | `ok` or `error` |
+| `results` | array | Array of match results |
+| `results[].id` | string | AcoustID track ID |
+| `results[].score` | float | Match confidence (0.0-1.0) |
+| `results[].recordings` | array | MusicBrainz recordings (if requested) |
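
Given those fields, picking the best match is a short walk over `results`. This sketch assumes the JSON shape shown in the example response above. The `min_score` cutoff of 0.5 is an arbitrary illustration, not an AcoustID recommendation. Recordings may lack `title` when only `recordingids` was requested.

```python
import json


def best_recording(response_text: str, min_score: float = 0.5):
    """Return (score, recording_id, title) for the best match, or None.

    min_score is an illustrative cutoff, not an official AcoustID value.
    """
    data = json.loads(response_text)
    if data.get("status") != "ok":
        error = data.get("error", {})
        raise RuntimeError(f"AcoustID error {error.get('code')}: {error.get('message')}")
    best = None
    for result in data.get("results", []):
        score = result.get("score", 0.0)
        if score < min_score:
            continue  # too weak to trust
        # "recordings" is present only when meta asked for recording data.
        for recording in result.get("recordings", []):
            if best is None or score > best[0]:
                best = (score, recording["id"], recording.get("title"))
    return best
```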
+
+### Fingerprint Submission
+
+Submit audio fingerprints with optional metadata.
+
+#### `/v2/submit`
+
+**Method**: POST  
+**Authentication**: Required (client key + user key)  
+**Rate Limit**: 3 requests/second (IP), 10 requests/second (application)
+
+**Required Parameters**:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `client` | string | Application API key |
+| `user` | string | User API key |
+| `duration.#` | integer | Track duration in seconds |
+| `fingerprint.#` | string | Chromaprint fingerprint |
+
+**Optional Parameters**:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `clientversion` | string | Client application version |
+| `bitrate.#` | integer | Audio bitrate in kbps |
+| `fileformat.#` | string | Audio file format (mp3, flac, etc.) |
+| `mbid.#` | string | MusicBrainz recording MBID |
+| `track.#` | string | Track title |
+| `artist.#` | string | Artist name |
+| `album.#` | string | Album title |
+| `albumartist.#` | string | Album artist name |
+| `year.#` | integer | Release year |
+| `trackno.#` | integer | Track number |
+| `discno.#` | integer | Disc number |
+
+**Batch Submission**:
+
+Use indexed parameters (`.0`, `.1`, `.2`, etc.) to submit multiple fingerprints:
+
+```
+duration.0=240&fingerprint.0=AQADtN...&mbid.0=a1b2c3d4...
+duration.1=180&fingerprint.1=AQABtK...&mbid.1=e5f67890...
+```
+
+**Example Request**:
+```
+POST /v2/submit
+Content-Type: application/x-www-form-urlencoded
+
+client=8XaBELgH&user=AbCdEfGh&duration.0=240&fingerprint.0=AQADtNGiJE...&mbid.0=a1b2c3d4-e5f6-7890-abcd-ef1234567890
+```
+
+**Example Response**:
+```json
+{
+  "status": "ok",
+  "submissions": [
+    {
+      "id": 12345678,
+      "status": "pending"
+    }
+  ]
+}
+```
+
+**Response Fields**:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `status` | string | `ok` or `error` |
+| `submissions` | array | Array of submission results |
+| `submissions[].id` | integer | Submission ID |
+| `submissions[].status` | string | `pending`, `imported`, or `error` |
+
+### Submission Status
+
+Check the processing status of submitted fingerprints.
+
+#### `/v2/submission_status`
+
+**Method**: GET  
+**Authentication**: Required (client key)
+
+**Parameters**:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `client` | string | Application API key |
+| `id` | integer | Submission ID (from submit response) |
+| `format` | string | Response format: `json`, `xml`, `jsonp` |
+
+**Example Request**:
+```
+GET /v2/submission_status?client=8XaBELgH&id=12345678
+```
+
+**Example Response**:
+```json
+{
+  "status": "ok",
+  "submission": {
+    "id": 12345678,
+    "status": "imported",
+    "result": {
+      "id": "7e8b1234-5678-90ab-cdef-1234567890ab"
+    }
+  }
+}
+```
+
+**Status Values**:
+- `pending`: Queued for processing
+- `imported`: Successfully processed
+- `error`: Processing failed
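
Because submissions are processed asynchronously, a client typically polls until the status leaves `pending`. This sketch keeps the HTTP call abstract. `fetch_status` is a hypothetical callable that performs `GET /v2/submission_status` and returns the decoded JSON. The interval and attempt count are placeholder values.

```python
import time
from typing import Callable


def wait_for_submission(
    fetch_status: Callable[[int], dict],
    submission_id: int,
    interval: float = 2.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until a submission is no longer 'pending'; return its status object."""
    for attempt in range(max_attempts):
        submission = fetch_status(submission_id)["submission"]
        if submission["status"] != "pending":
            return submission  # 'imported' or 'error'
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next check
    raise TimeoutError(f"submission {submission_id} still pending after {max_attempts} checks")
```

Injecting `sleep` keeps the loop testable. In production the default `time.sleep` applies.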
+
+### Fingerprint Retrieval
+
+Retrieve stored fingerprint data.
+
+#### `/v2/fingerprint`
+
+**Method**: GET  
+**Authentication**: Required (client key)
+
+**Parameters**:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `client` | string | Application API key |
+| `id` | string | AcoustID track ID |
+| `format` | string | Response format: `json`, `xml`, `jsonp` |
+
+**Example Request**:
+```
+GET /v2/fingerprint?client=8XaBELgH&id=7e8b1234-5678-90ab-cdef-1234567890ab
+```
+
+**Example Response**:
+```json
+{
+  "status": "ok",
+  "fingerprints": [
+    {
+      "id": 987654321,
+      "fingerprint": "AQADtNGiJE...",
+      "duration": 240,
+      "submission_count": 5
+    }
+  ]
+}
+```
+
+### Track Listing by MBID
+
+List AcoustID tracks linked to a MusicBrainz recording.
+
+#### `/v2/track/list_by_mbid`
+
+**Method**: GET  
+**Authentication**: Required (client key)
+
+**Parameters**:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `client` | string | Application API key |
+| `mbid` | string | MusicBrainz recording MBID |
+| `format` | string | Response format: `json`, `xml`, `jsonp` |
+
+**Example Request**:
+```
+GET /v2/track/list_by_mbid?client=8XaBELgH&mbid=a1b2c3d4-e5f6-7890-abcd-ef1234567890
+```
+
+**Example Response**:
+```json
+{
+  "status": "ok",
+  "tracks": [
+    {
+      "id": "7e8b1234-5678-90ab-cdef-1234567890ab",
+      "disabled": false
+    }
+  ]
+}
+```
+
+### Track Listing by PUID
+
+List AcoustID tracks linked to a MusicIP PUID (legacy).
+
+#### `/v2/track/list_by_puid`
+
+**Method**: GET  
+**Authentication**: Required (client key)
+
+**Parameters**:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `client` | string | Application API key |
+| `puid` | string | MusicIP PUID |
+| `format` | string | Response format: `json`, `xml`, `jsonp` |
+
+### User Management
+
+#### `/v2/user/lookup`
+
+Look up a user API key by MusicBrainz account.
+
+**Method**: POST  
+**Authentication**: Required (client key)
+
+**Parameters**:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `client` | string | Application API key |
+| `musicbrainz_id` | string | MusicBrainz username |
+
+#### `/v2/user/create_anonymous`
+
+Create anonymous user API key.
+
+**Method**: POST  
+**Authentication**: Required (client key)
+
+**Parameters**:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `client` | string | Application API key |
+
+**Example Response**:
+```json
+{
+  "status": "ok",
+  "user": {
+    "apikey": "AbCdEfGh"
+  }
+}
+```
+
+#### `/v2/user/create_musicbrainz`
+
+Create user API key linked to MusicBrainz account.
+
+**Method**: POST  
+**Authentication**: Required (client key)
+
+**Parameters**:
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `client` | string | Application API key |
+| `access_token` | string | MusicBrainz OAuth access token |
+
+## Legacy API Endpoints
+
+### `/lookup`
+
+Legacy lookup endpoint (API v1).
+
+**Status**: Deprecated, use `/v2/lookup` instead  
+**Differences**: Limited metadata options, different response format
+
+### `/submit`
+
+Legacy submit endpoint (API v1).
+
+**Status**: Deprecated, use `/v2/submit` instead  
+**Differences**: Synchronous processing, no batch support
+
+## Health Check Endpoints
+
+### `/_health`
+
+Full health check with database write test.
+
+**Method**: GET  
+**Authentication**: None
+
+**Response**:
+```json
+{
+  "status": "ok"
+}
+```
+
+**Status Codes**:
+- `200`: All systems operational
+- `503`: Service unavailable
+
+### `/_health_ro`
+
+Read-only health check (database read test only).
+
+**Method**: GET  
+**Authentication**: None
+
+### `/_health_docker`
+
+Docker-specific health check (minimal checks).
+
+**Method**: GET  
+**Authentication**: None
+
+## Internal API Endpoints
+
+These endpoints are for administrative use only and require special authentication.
+
+### `/v2/internal/update_lookup_stats`
+
+Trigger lookup statistics update.
+
+**Method**: POST  
+**Authentication**: Internal only
+
+### `/v2/internal/update_user_agent_stats`
+
+Trigger user agent statistics update.
+
+**Method**: POST  
+**Authentication**: Internal only
+
+### `/v2/internal/lookup_stats`
+
+Retrieve lookup statistics.
+
+**Method**: GET  
+**Authentication**: Internal only
+
+### `/v2/internal/create_account`
+
+Create new user account.
+
+**Method**: POST  
+**Authentication**: Internal only
+
+### `/v2/internal/create_application`
+
+Create new API application.
+
+**Method**: POST  
+**Authentication**: Internal only
+
+### `/v2/internal/update_application_status`
+
+Update application status (active/inactive).
+
+**Method**: POST  
+**Authentication**: Internal only
+
+### `/v2/internal/check_application`
+
+Check application validity.
+
+**Method**: GET  
+**Authentication**: Internal only
+
+## Index API Endpoints
+
+The fingerprint index service exposes its own HTTP API (separate from the main API).
+
+**Base URL**: `http://index:6081` (internal)  
+**Protocol**: HTTP  
+**Format**: MessagePack (payloads below are written as JSON or Python literals for readability)
+
+### `PUT /:index`
+
+Create new index.
+
+**Parameters**:
+- `:index`: Index name
+
+### `GET /:index`
+
+Get index information.
+
+**Response**:
+```json
+{
+  "name": "fingerprints",
+  "doc_count": 1234567,
+  "segment_count": 42,
+  "memory_segment_size": 1048576
+}
+```
+
+### `DELETE /:index`
+
+Delete index.
+
+### `POST /:index/_search`
+
+Search for fingerprints.
+
+**Request Body** (MessagePack):
+```python
+{
+    "query": [term1, term2, term3, ...],
+    "limit": 10,
+    "min_score": 0.5
+}
+```
+
+**Response** (MessagePack):
+```python
+{
+    "results": [
+        {"id": fpid1, "score": 0.95},
+        {"id": fpid2, "score": 0.87}
+    ]
+}
+```
+
+### `POST /:index/_update`
+
+Batch update fingerprints.
+
+**Request Body** (MessagePack):
+```python
+{
+    "updates": [
+        {"id": fpid1, "terms": [term1, term2, ...]},
+        {"id": fpid2, "terms": [term3, term4, ...]}
+    ]
+}
+```
+
+### `GET /:index/_segments`
+
+List index segments.
+
+**Response**:
+```json
+{
+  "segments": [
+    {
+      "id": 0,
+      "type": "memory",
+      "doc_count": 1024,
+      "size_bytes": 1048576
+    },
+    {
+      "id": 1,
+      "type": "file",
+      "doc_count": 100000,
+      "size_bytes": 52428800
+    }
+  ]
+}
+```
+
+### `GET /:index/_snapshot`
+
+Create index snapshot.
+
+**Response**:
+```json
+{
+  "snapshot_id": "snapshot_20250428_120000",
+  "path": "/var/lib/acoustid-index/snapshots/snapshot_20250428_120000"
+}
+```
+
+### `PUT /:index/:fpid`
+
+Insert or update fingerprint.
+
+**Parameters**:
+- `:index`: Index name
+- `:fpid`: Fingerprint ID
+
+**Request Body** (MessagePack):
+```python
+{
+    "terms": [term1, term2, term3, ...]
+}
+```
+
+### `GET /:index/:fpid`
+
+Retrieve fingerprint.
+
+**Response** (MessagePack):
+```python
+{
+    "id": fpid,
+    "terms": [term1, term2, term3, ...]
+}
+```
+
+### `DELETE /:index/:fpid`
+
+Delete fingerprint.
+
+### `GET /_health`
+
+Index health check.
+
+**Response**:
+```json
+{
+  "status": "ok"
+}
+```
+
+### `GET /_metrics`
+
+Prometheus metrics.
+
+**Response** (Prometheus text format):
+```
+# HELP fpindex_search_duration_seconds Search duration
+# TYPE fpindex_search_duration_seconds histogram
+fpindex_search_duration_seconds_bucket{le="0.005"} 1234
+fpindex_search_duration_seconds_bucket{le="0.01"} 5678
+...
+```
+
+## Rate Limiting
+
+### Rate Limit Tiers
+
+AcoustID implements a three-tier rate limiting system:
+
+| Tier | Scope | Default Limit | Override |
+|------|-------|---------------|----------|
+| Global | All requests | 3 req/s | Config: `cluster.rate_limiter.global_limit` |
+| Application | Per API key | 10 req/s | Database: `application.rate_limit` |
+| IP Address | Per client IP | 3 req/s | Config: `cluster.rate_limiter.ip_limit` |
+
+### Rate Limit Algorithm
+
+**Implementation**: Redis-based sliding window
+
+**Window Configuration**:
+- Window duration: 20 seconds
+- Window steps: 4 (5-second buckets)
+- Cleanup: Automatic expiration (25-second TTL)
+
+**Redis Keys**:
+```
+rl:bucket:global:{timestamp}
+rl:bucket:app:{api_key}:{timestamp}
+rl:bucket:ip:{ip_address}:{timestamp}
+```
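
The bucket scheme can be emulated in memory to see how the window behaves. This Python sketch illustrates the idea under stated assumptions and is not the server's code:

- A dict stands in for Redis.
- Expiry is checked on each call instead of through Redis TTLs.
- A request is allowed while the last four 5-second buckets hold fewer than `limit_per_second * 20` requests.

Because the newest bucket is still filling, the effective window is between 15 and 20 seconds.

```python
class SlidingWindowRateLimiter:
    """In-memory emulation of the bucketed sliding window (dict instead of Redis)."""

    def __init__(self, window: int = 20, steps: int = 4, ttl: int = 25):
        self.step = window // steps  # 5-second buckets by default
        self.steps = steps
        self.window = window
        self.ttl = ttl
        self.buckets: dict[str, tuple[int, int]] = {}  # key -> (count, expires_at)

    @staticmethod
    def key(scope: str, bucket_start: int) -> str:
        # Mirrors the key layout above, e.g. rl:bucket:ip:1.2.3.4:1714305600
        return f"rl:bucket:{scope}:{bucket_start}"

    def allow(self, scope: str, limit_per_second: float, now: float) -> bool:
        # Drop expired buckets, standing in for Redis key expiry.
        self.buckets = {k: v for k, v in self.buckets.items() if v[1] > now}
        current = int(now // self.step) * self.step
        total = 0
        for i in range(self.steps):  # current bucket plus the previous steps-1
            entry = self.buckets.get(self.key(scope, current - i * self.step))
            if entry is not None:
                total += entry[0]
        if total >= limit_per_second * self.window:
            return False  # would exceed the average rate over the window
        count, _ = self.buckets.get(self.key(scope, current), (0, 0))
        self.buckets[self.key(scope, current)] = (count + 1, current + self.ttl)
        return True
```

Averaging over a 20-second window tolerates short bursts above the per-second limit, as long as the window total stays within budget.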
+
+### Rate Limit Headers
+
+Responses include rate limit information:
+
+```
+X-RateLimit-Limit: 10
+X-RateLimit-Remaining: 7
+X-RateLimit-Reset: 1714305600
+```
+
+### Rate Limit Exceeded Response
+
+**Status Code**: 429 Too Many Requests
+
+**Response**:
+```json
+{
+  "status": "error",
+  "error": {
+    "code": 5,
+    "message": "Rate limit exceeded"
+  }
+}
+```
+
+## Error Handling
+
+### Error Response Format
+
+All errors return a consistent structure:
+
+```json
+{
+  "status": "error",
+  "error": {
+    "code": 1,
+    "message": "Invalid API key"
+  }
+}
+```
+
+### Error Codes
+
+| Code | Message | Description |
+|------|---------|-------------|
+| 1 | Invalid API key | Client or user key is invalid |
+| 2 | Missing required parameter | Required parameter not provided |
+| 3 | Invalid fingerprint | Fingerprint format is invalid |
+| 4 | Internal error | Server-side error occurred |
+| 5 | Rate limit exceeded | Too many requests |
+| 6 | Invalid format | Unsupported response format |
+| 7 | Fingerprint not found | Requested fingerprint doesn't exist |
+| 8 | Too many requests | Batch size exceeds limit |
+
+### HTTP Status Codes
+
+| Code | Meaning | Usage |
+|------|---------|-------|
+| 200 | OK | Successful request |
+| 400 | Bad Request | Invalid parameters |
+| 401 | Unauthorized | Missing or invalid API key |
+| 403 | Forbidden | API key lacks permission |
+| 404 | Not Found | Resource not found |
+| 429 | Too Many Requests | Rate limit exceeded |
+| 500 | Internal Server Error | Server error |
+| 503 | Service Unavailable | Service down or degraded |
+
+## Authentication
+
+### API Key Types
+
+1. **Application Key** (`client` parameter):
+   - Identifies the client application
+   - Required for all API calls
+   - Obtain from https://acoustid.org/new-application
+
+2. **User Key** (`user` parameter):
+   - Identifies the end user
+   - Required for submissions
+   - Created via `/v2/user/create_*` endpoints
+
+3. **Demo Key**:
+   - Limited functionality
+   - For testing only
+   - Key: `8XaBELgH`
+
+### Key Management
+
+**Application Keys**:
+- Created via web UI or internal API
+- Can be active or inactive
+- Rate limits configurable per key
+- Usage statistics tracked
+
+**User Keys**:
+- Anonymous or MusicBrainz-linked
+- Created programmatically
+- Tied to application key
+- Submission history tracked
+
+## Best Practices
+
+### Lookup Optimization
+
+1. **Use batch lookups** for multiple files (up to 20 per request)
+2. **Request only needed metadata** (use specific `meta` flags)
+3. **Cache results** to avoid redundant lookups
+4. **Handle rate limits** with exponential backoff
+
+### Submission Guidelines
+
+1. **Include MBIDs** when known (improves accuracy)
+2. **Provide metadata** (artist, album, track) for better matching
+3. **Use batch submissions** for efficiency
+4. **Poll submission status** asynchronously
+
+### Error Handling
+
+1. **Retry on 5xx errors** with exponential backoff
+2. **Respect rate limits** (check headers)
+3. **Validate fingerprints** before submission
+4. **Log errors** for debugging
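
Retrying 5xx (and 429) responses with exponential backoff can be wrapped around any request function. This sketch uses capped exponential delays with "full jitter". `RetryableError` is a hypothetical exception that your HTTP layer would raise for retryable status codes. The delay constants are placeholders.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical: raised by your HTTP layer for 429 and 5xx responses."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Run call(), retrying RetryableError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * jitter())  # full jitter: uniform in [0, delay)
    raise AssertionError("unreachable")
```

Jitter spreads retries from many clients over time, so they do not all hit the service again at the same moment after an outage.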
+
+### Performance
+
+1. **Use POST** for large requests (avoid URL length limits)
+2. **Enable compression** (`meta=compress`)
+3. **Reuse connections** (HTTP keep-alive)
+4. **Implement timeouts** (30-60 seconds recommended)
@@ -0,0 +1,611 @@
+# AcoustID Architecture
+
+## System Architecture Overview
+
+AcoustID employs a **monolithic multi-process architecture** with microservice-like separation of concerns. The system is split into two major repositories with distinct responsibilities:
+
+1. **acoustid-server**: Monolithic Python application with multiple process types
+2. **acoustid-index**: Standalone Zig service for fingerprint indexing
+
+## Server Architecture
+
+### Process Types
+
+The server runs as multiple independent processes, each with a specific role:
+
+| Process | Entry Point | Purpose | Scaling |
+|---------|-------------|---------|---------|
+| API | `acoustid.server:make_application()` | Handle API requests | Horizontal |
+| Web | `acoustid.server:make_application()` | Serve web UI | Horizontal |
+| Worker | `acoustid.worker:run()` | Process background jobs | Horizontal |
+| Cron | `acoustid.cron:run()` | Execute scheduled tasks | Single instance |
+| Import | `acoustid.scripts.import_submissions` | Bulk import fingerprints | Manual |
+
+### Directory Structure
+
+```
+acoustid/
+├── api/                    # API layer
+│   ├── __init__.py        # API application factory
+│   ├── errors.py          # Error handling
+│   ├── ratelimit.py       # Rate limiting logic
+│   └── v2/                # API v2 endpoints
+│       ├── __init__.py
+│       ├── lookup.py      # Fingerprint lookup
+│       ├── submit.py      # Fingerprint submission
+│       ├── misc.py        # Utility endpoints
+│       └── internal.py    # Internal admin endpoints
+├── data/                   # Business logic layer
+│   ├── account.py         # User account operations
+│   ├── application.py     # API application management
+│   ├── fingerprint.py     # Fingerprint operations
+│   ├── foreignid.py       # Foreign ID management
+│   ├── meta.py            # Metadata operations
+│   ├── musicbrainz.py     # MusicBrainz queries
+│   ├── stats.py           # Statistics tracking
+│   ├── submission.py      # Submission processing
+│   └── track.py           # Track operations
+├── future/                 # Starlette migration
+│   ├── app.py             # ASGI application
+│   ├── lookup.py          # Async lookup handler
+│   └── submit.py          # Async submit handler
+├── web/                    # Web UI layer
+│   ├── __init__.py        # Web application factory
+│   ├── views/             # View handlers
+│   └── templates/         # Jinja2 templates
+├── scripts/                # Utility scripts
+│   ├── import_submissions.py
+│   ├── backfill_fingerprint_index.py
+│   └── update_lookup_stats.py
+├── cli.py                  # CLI command definitions
+├── server.py               # WSGI/ASGI application
+├── worker.py               # Background worker
+├── cron.py                 # Cron job scheduler
+├── fingerprint.py          # Fingerprint utilities
+├── indexclient.py          # Legacy TCP index client
+├── fpstore.py              # Modern HTTP index client
+├── db.py                   # Database connection management
+├── config.py               # Configuration loading
+└── tables.py               # SQLAlchemy ORM models
+```
+
+### Layered Architecture
+
+The server follows a traditional layered architecture:
+
+```
+┌─────────────────────────────────────────┐
+│     Presentation Layer                  │
+│  (api/, web/, future/)                  │
+│  - HTTP request/response handling       │
+│  - Input validation                     │
+│  - Response formatting                  │
+└─────────────────────────────────────────┘
+              ↓
+┌─────────────────────────────────────────┐
+│     Business Logic Layer                │
+│  (data/)                                │
+│  - Domain operations                    │
+│  - Business rules                       │
+│  - Orchestration                        │
+└─────────────────────────────────────────┘
+              ↓
+┌─────────────────────────────────────────┐
+│     Data Access Layer                   │
+│  (db.py, tables.py)                     │
+│  - Database queries                     │
+│  - ORM models                           │
+│  - Transaction management               │
+└─────────────────────────────────────────┘
+              ↓
+┌─────────────────────────────────────────┐
+│     External Services Layer             │
+│  (indexclient.py, fpstore.py)           │
+│  - Index communication                  │
+│  - MusicBrainz queries                  │
+│  - Redis operations                     │
+└─────────────────────────────────────────┘
+```
+
+### Framework Transition
+
+The server is actively transitioning from Flask to Starlette:
+
+**Current (Flask/Werkzeug)**:
+- Location: `acoustid/api/`, `acoustid/web/`
+- WSGI-based synchronous request handling
+- Gunicorn as application server
+- Blocking database operations with psycopg2
+
+**Future (Starlette)**:
+- Location: `acoustid/future/`
+- ASGI-based asynchronous request handling
+- Uvicorn as application server
+- Async database operations with asyncpg
+
+**Migration Status**:
+- Core lookup and submit endpoints have async implementations
+- Legacy endpoints still use Flask
+- Both frameworks run simultaneously during transition
+- Configuration flag controls which implementation is used
+
+## Index Architecture
+
+### LSM-Tree Design
+
+The index uses a **Log-Structured Merge-tree (LSM-tree)** for efficient fingerprint storage and retrieval.
+
+**Core Concept**:
+- Writes go to in-memory segment (fast)
+- Memory segment periodically flushed to disk
+- Background process merges disk segments
+- Reads check memory segment first, then disk segments
+
+**Components**:
+
+```
+┌─────────────────────────────────────────┐
+│         MultiIndex                      │
+│  - Manages multiple named indexes       │
+│  - Routes requests to correct index     │
+└─────────────────────────────────────────┘
+              ↓
+┌─────────────────────────────────────────┐
+│         Index                           │
+│  - Single fingerprint index             │
+│  - Coordinates segments and merging     │
+└─────────────────────────────────────────┘
+              ↓
+┌──────────────────┬──────────────────────┐
+│  MemorySegment   │   FileSegment(s)     │
+│  - In-memory     │   - On-disk          │
+│  - Fast writes   │   - Immutable        │
+│  - Volatile      │   - Persistent       │
+└──────────────────┴──────────────────────┘
+              ↓
+┌─────────────────────────────────────────┐
+│         Oplog (Write-Ahead Log)         │
+│  - Durability for memory segment        │
+│  - Replay on crash recovery             │
+└─────────────────────────────────────────┘
+```
+
+### Segment Management
+
+**MemorySegment** (`src/MemorySegment.zig`):
+- In-memory inverted index: each term (hash) maps to a posting list
+- Posting list: the fingerprint IDs that contain that term
+- Maximum size threshold triggers flush
+- Backed by Oplog for durability
+
+**FileSegment** (`src/FileSegment.zig`):
+- Immutable on-disk segment
+- Binary file format with index and data sections
+- StreamVByte compression for posting lists
+- Memory-mapped for fast reads
+
+**Segment Lifecycle**:
+1. Writes accumulate in MemorySegment
+2. MemorySegment reaches size threshold
+3. Flush to new FileSegment
+4. Clear MemorySegment and Oplog
+5. Background merger selects segments to merge
+6. Merge creates new larger FileSegment
+7. Delete old segments
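
The lifecycle can be modeled in a few dozen lines of Python. This toy sketch is not acoustid-index's Zig implementation. It works as follows:

- It stores `(term, fingerprint_id)` pairs.
- It flushes the memory segment into an immutable sorted segment once that segment holds `max_memory_items` pairs (a hypothetical threshold).
- It answers term lookups across all segments.

The oplog, deletions, and background merge scheduling are omitted.

```python
from bisect import bisect_left, bisect_right


class MemorySegment:
    """Mutable, in-memory set of (term, fingerprint_id) pairs."""

    def __init__(self):
        self.items: set[tuple[int, int]] = set()

    def add(self, fpid: int, terms: list[int]) -> None:
        self.items.update((term, fpid) for term in terms)

    def ids_for(self, term: int) -> set[int]:
        return {fpid for t, fpid in self.items if t == term}


class FileSegment:
    """Immutable segment: pairs sorted by term, so lookups use binary search."""

    def __init__(self, items):
        self.items = tuple(sorted(items))

    def ids_for(self, term: int) -> set[int]:
        lo = bisect_left(self.items, (term, -1))  # IDs assumed non-negative
        hi = bisect_right(self.items, (term, float("inf")))
        return {fpid for _, fpid in self.items[lo:hi]}


class Index:
    def __init__(self, max_memory_items: int = 4):
        self.memory = MemorySegment()
        self.files: list[FileSegment] = []
        self.max_memory_items = max_memory_items  # hypothetical flush threshold

    def insert(self, fpid: int, terms: list[int]) -> None:
        # The real index also records this write in its oplog (omitted here).
        self.memory.add(fpid, terms)
        if len(self.memory.items) >= self.max_memory_items:
            self.flush()

    def flush(self) -> None:
        if self.memory.items:
            self.files.append(FileSegment(self.memory.items))
            self.memory = MemorySegment()  # start a fresh memory segment

    def merge_all(self) -> None:
        # Merge into one larger immutable segment; the old ones are dropped.
        merged = [item for seg in self.files for item in seg.items]
        self.files = [FileSegment(merged)] if merged else []

    def ids_for(self, term: int) -> set[int]:
        ids = self.memory.ids_for(term)
        for seg in self.files:
            ids |= seg.ids_for(term)
        return ids
```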
+
+### Merge Policy
+
+**Tiered Merge Strategy**:
+- Segments grouped into tiers by size
+- Tier 0: Smallest segments (recently flushed)
+- Tier N: Largest segments (heavily merged)
+- Merge triggered when tier has too many segments
+- Merges segments within same tier
+
+**Benefits**:
+- Write amplification bounded
+- Read performance improves over time
+- Disk space reclaimed from deleted entries
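
A tier-selection rule of this kind can be expressed compactly. The sketch makes these assumptions:

- Segment sizes are document counts.
- Tier 0 holds segments of up to `base_size` documents, and each higher tier is `factor` times larger.
- A tier with more than `max_per_tier` segments is merged.

All three parameters are hypothetical, and the actual policy in acoustid-index may differ.

```python
def tier_of(size: int, base_size: int = 1024, factor: int = 10) -> int:
    """Tier 0 holds segments up to base_size docs; each tier is factor times larger."""
    tier, limit = 0, base_size
    while size > limit:  # integer loop avoids floating-point log rounding
        tier += 1
        limit *= factor
    return tier


def pick_merge(segment_sizes: list[int], max_per_tier: int = 4,
               base_size: int = 1024, factor: int = 10) -> list[int]:
    """Return indexes of the segments to merge next, or [] if no tier is over budget."""
    tiers: dict[int, list[int]] = {}
    for i, size in enumerate(segment_sizes):
        tiers.setdefault(tier_of(size, base_size, factor), []).append(i)
    for tier in sorted(tiers):  # lowest tiers first: the cheapest merges
        if len(tiers[tier]) > max_per_tier:
            return tiers[tier]
    return []
```

The merged output usually lands in a higher tier. As a result, each document is rewritten roughly once per tier, which is what keeps write amplification bounded.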
+
+### File Format
+
+**Segment File Structure** (`src/filefmt.zig`):
+
+```
+┌─────────────────────────────────────────┐
+│  Header                                 │
+│  - Magic number                         │
+│  - Version                              │
+│  - Metadata                             │
+├─────────────────────────────────────────┤
+│  Index Section                          │
+│  - Term (hash) → data offset mapping    │
+│  - Sorted by term for binary search     │
+├─────────────────────────────────────────┤
+│  Data Section                           │
+│  - Compressed posting lists             │
+│  - StreamVByte encoded                  │
+└─────────────────────────────────────────┘
+```
+
+**Block Compression** (`src/block.zig`):
+- Posting lists compressed in blocks
+- StreamVByte SIMD compression
+- Delta encoding for term IDs
+- Typical compression ratio: 4-8x
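
Delta encoding pays off because sorted IDs become small gaps, and a variable-length integer code stores small numbers in fewer bytes. This sketch swaps StreamVByte for plain LEB128-style varints (7 data bits per byte plus a continuation bit). Varints are simpler but not SIMD-friendly. The sketch only illustrates the delta-plus-varint idea, not the on-disk format in `src/block.zig`.

```python
def encode_sorted(values: list[int]) -> bytes:
    """Delta-encode sorted non-negative ints, then varint-encode each gap."""
    out = bytearray()
    prev = 0
    for v in values:
        if v < prev:
            raise ValueError("values must be sorted ascending and non-negative")
        gap = v - prev
        prev = v
        while gap >= 0x80:  # emit 7 bits at a time, low bits first
            out.append((gap & 0x7F) | 0x80)  # high bit set: more bytes follow
            gap >>= 7
        out.append(gap)  # final byte has the high bit clear
    return bytes(out)


def decode_sorted(data: bytes) -> list[int]:
    """Inverse of encode_sorted: rebuild gaps, then running sums."""
    values, current, shift, gap = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            values.append(current)
            gap, shift = 0, 0
    return values
```

For example, the IDs `[3, 7, 7, 300, 70000]` become the gaps `[3, 4, 0, 293, 69700]`. These encode in 8 bytes instead of the 20 needed for fixed 32-bit integers.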
+
+### Index Reader
+
+**IndexReader** (`src/IndexReader.zig`):
+- Read-only view of index
+- Merges results from all segments
+- Implements search algorithm
+- Returns top-K candidates by score
+
+**Search Algorithm**:
+1. Extract query terms from fingerprint
+2. For each term, fetch posting lists from all segments
+3. Merge posting lists (union)
+4. Score each candidate by term overlap
+5. Return top-K candidates sorted by score
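
Steps 2 to 5 map onto a small scoring function; step 1 (term extraction) is assumed to have already happened. This sketch rests on simplifying assumptions:

- Each segment is a dict from term to fingerprint IDs.
- A candidate's score is the number of distinct query terms it shares with the query.

The real index's scoring and tie-breaking may differ.

```python
import heapq
from collections import Counter


def search(query_terms, segments, k=10):
    """Return up to k (fingerprint_id, score) pairs, best first.

    segments: dicts mapping term -> iterable of fingerprint IDs, a stand-in
    for the memory and file segments.
    """
    scores = Counter()
    for term in set(query_terms):  # count each distinct query term once
        matched = set()
        for segment in segments:  # union of posting lists across segments
            matched.update(segment.get(term, ()))
        for fpid in matched:
            scores[fpid] += 1  # score = number of shared terms
    # Highest score first; ties broken by lower ID so results are deterministic.
    return heapq.nsmallest(k, scores.items(), key=lambda item: (-item[1], item[0]))
```

Taking the union per term means an ID present in several segments is counted only once for that term. `heapq.nsmallest` avoids sorting every candidate when only the top K are needed.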
+
+## Data Flow
+
+### Submission Flow (Detailed)
+
+```
+┌─────────┐
+│ Client  │
+└────┬────┘
+     │ POST /v2/submit
+     ↓
+┌─────────────────────────────────────────┐
+│  SubmitHandler (api/v2/submit.py)      │
+│  1. Validate API keys (client + user)  │
+│  2. Check rate limits (Redis)          │
+│  3. Decode fingerprints                │
+│  4. Insert into submission table       │
+│  5. Publish to NATS queue              │
+└─────────────────────────────────────────┘
+     │
+     ↓ NATS message
+┌─────────────────────────────────────────┐
+│  Worker (worker.py)                     │
+│  1. Consume message from NATS          │
+│  2. Load submission from database      │
+└─────────────────────────────────────────┘
+     │
+     ↓
+┌─────────────────────────────────────────┐
+│  FingerprintSearcher (data/fingerprint) │
+│  1. Extract query from fingerprint     │
+│  2. Search index for matches           │
+└─────────────────────────────────────────┘
+     │
+     ↓ HTTP POST /:index/_search
+┌─────────────────────────────────────────┐
+│  Index (fpindex)                        │
+│  1. Decode MessagePack request         │
+│  2. Search segments                    │
+│  3. Score candidates                   │
+│  4. Return top matches                 │
+└─────────────────────────────────────────┘
+     │
+     ↓ Candidate fingerprint IDs
+┌─────────────────────────────────────────┐
+│  Worker (continued)                     │
+│  1. Fetch candidate metadata from DB   │
+│  2. Decide: create new track or link   │
+│  3. Insert/update track tables         │
+│  4. Update index with new fingerprint  │
+│  5. Store result in submission_result  │
+└─────────────────────────────────────────┘
+     │
+     ↓ HTTP PUT /:index/:fpid
+┌─────────────────────────────────────────┐
+│  Index (fpindex)                        │
+│  1. Add fingerprint to MemorySegment   │
+│  2. Append to Oplog                    │
+│  3. Trigger flush if needed            │
+└─────────────────────────────────────────┘
+```
+
+### Lookup Flow (Detailed)
+
+```
+┌─────────┐
+│ Client  │
+└────┬────┘
+     │ GET/POST /v2/lookup
+     ↓
+┌─────────────────────────────────────────┐
+│  LookupHandler (api/v2/lookup.py)      │
+│  1. Validate API key (client)          │
+│  2. Check rate limits (Redis)          │
+│  3. Parse parameters                   │
+└─────────────────────────────────────────┘
+     │
+     ↓
+┌─────────────────────────────────────────┐
+│  decode_fingerprint (fingerprint.py)    │
+│  1. Decode base64 or compressed format │
+│  2. Decompress if needed               │
+│  3. Parse Chromaprint data             │
+└─────────────────────────────────────────┘
+     │
+     ↓
+┌─────────────────────────────────────────┐
+│  extract_query (fingerprint.py)         │
+│  1. Extract hash terms from fingerprint│
+│  2. Build query structure              │
+└─────────────────────────────────────────┘
+     │
+     ↓
+┌─────────────────────────────────────────┐
+│  fpstore.search (fpstore.py)            │
+│  1. Encode query as MessagePack        │
+│  2. HTTP POST to index                 │
+└─────────────────────────────────────────┘
+     │
+     ↓ HTTP POST /:index/_search
+┌─────────────────────────────────────────┐
+│  Index (fpindex)                        │
+│  1. Parse MessagePack query            │
+│  2. Search all segments                │
+│  3. Merge and score results            │
+│  4. Return top-K candidates            │
+└─────────────────────────────────────────┘
+     │
+     ↓ Candidate fingerprint IDs + scores
+┌─────────────────────────────────────────┐
+│  LookupHandler (continued)              │
+│  1. Fetch fingerprint metadata from DB │
+│  2. Fetch track metadata from DB       │
+│  3. Fetch MusicBrainz data if requested│
+│  4. Build result structure             │
+│  5. Format as JSON/XML                 │
+└─────────────────────────────────────────┘
+     │
+     ↓ JSON response
+┌─────────┐
+│ Client  │
+└─────────┘
+```
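
The `extract_query` step can be sketched as follows. This is an illustrative Python sketch, not the code in `fingerprint.py`. The window position (`QUERY_START`, `QUERY_SIZE`) and the number of high bits kept per hash (`QUERY_BITS`) are hypothetical parameters chosen for the example. The sketch shows three ideas: only a slice of the fingerprint is used, low bits are masked off so near-identical hashes collapse into the same term, and duplicate terms are dropped.

```python
from typing import List

# Hypothetical parameters for illustration only.
QUERY_START = 80   # skip the first hashes of the fingerprint
QUERY_SIZE = 120   # maximum number of hashes used as query terms
QUERY_BITS = 28    # keep the top 28 bits of each 32-bit hash

QUERY_MASK = ((1 << QUERY_BITS) - 1) << (32 - QUERY_BITS)


def extract_query(fingerprint: List[int]) -> List[int]:
    """Return unique, masked hash terms from a window of the fingerprint."""
    # Shift the window left if the fingerprint is too short to start at QUERY_START.
    start = max(0, min(QUERY_START, len(fingerprint) - QUERY_SIZE))
    terms: List[int] = []
    seen = set()
    for h in fingerprint[start:start + QUERY_SIZE]:
        # Values stored as PostgreSQL int4 may be negative; view them as unsigned.
        term = (h & 0xFFFFFFFF) & QUERY_MASK
        if term not in seen:
            seen.add(term)
            terms.append(term)
    return terms
```

With these parameters a fingerprint contributes at most 120 terms, and often fewer after masking and de-duplication.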
+
+### Background Processing
+
+**Cron Jobs** (`acoustid/cron.py`):
+- Update lookup statistics (hourly)
+- Update user agent statistics (daily)
+- Clean up old submissions (daily)
+- Refresh materialized views (hourly)
+- Backup index snapshots (daily)
+
+**Worker Tasks** (`acoustid/worker.py`):
+- Process fingerprint submissions
+- Import bulk fingerprints
+- Update index with new data
+- Resolve MBID redirects
+- Clean up orphaned records
+
+## Index Communication Protocols
+
+### Legacy Protocol (indexclient.py)
+
+**Transport**: Raw TCP socket  
+**Port**: 6080 (default)  
+**Format**: Custom binary protocol
+
+**Message Structure**:
+```
+┌────────────────┬────────────────┬────────────────┐
+│  Length (4B)   │  Command (1B)  │  Payload       │
+└────────────────┴────────────────┴────────────────┘
+```
+
+**Commands**:
+- `0x01`: Search
+- `0x02`: Insert
+- `0x03`: Delete
+
+**Status**: Being phased out, replaced by HTTP protocol
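
The framing above can be illustrated with Python's `struct` module. The layout diagram does not specify byte order or exactly what the length field counts, so big-endian order and "length = command byte + payload" are assumptions. `encode_message`, `decode_message` and `encode_search_payload` are hypothetical helpers, not functions from `indexclient.py`.

```python
import struct
from typing import List, Tuple

# Command codes from the list above.
CMD_SEARCH = 0x01
CMD_INSERT = 0x02
CMD_DELETE = 0x03

# Assumed header: 4-byte big-endian length, then a 1-byte command.
HEADER = struct.Struct(">IB")


def encode_message(command: int, payload: bytes) -> bytes:
    # The length field is assumed to cover the command byte plus the payload.
    return HEADER.pack(1 + len(payload), command) + payload


def decode_message(data: bytes) -> Tuple[int, bytes]:
    if len(data) < HEADER.size:
        raise ValueError("incomplete header")
    length, command = HEADER.unpack_from(data)
    end = 4 + length  # 4-byte length field + `length` bytes that follow it
    if length < 1 or len(data) < end:
        raise ValueError("incomplete or invalid message")
    return command, data[HEADER.size:end]


def encode_search_payload(terms: List[int]) -> bytes:
    # Hypothetical payload format: query terms as unsigned 32-bit integers.
    return struct.pack(f">{len(terms)}I", *terms)
```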
+
+### Modern Protocol (fpstore.py)
+
+**Transport**: HTTP/1.1  
+**Port**: 6081 (default)  
+**Format**: MessagePack
+
+**Endpoints**:
+
+| Method | Path | Purpose |
+|--------|------|---------|
+| POST | `/:index/_search` | Search for fingerprints |
+| PUT | `/:index/:fpid` | Insert/update fingerprint |
+| DELETE | `/:index/:fpid` | Delete fingerprint |
+| GET | `/:index` | Get index info |
+| GET | `/:index/_segments` | List segments |
+| GET | `/:index/_snapshot` | Create snapshot |
+
+**Search Request**:
+```python
+{
+    "query": [term_id1, term_id2, ...],  # Query terms
+    "limit": 10,                          # Max results
+    "min_score": 0.5                      # Score threshold
+}
+```
+
+**Search Response**:
+```python
+{
+    "results": [
+        {"id": fpid1, "score": 0.95},
+        {"id": fpid2, "score": 0.87},
+        ...
+    ]
+}
+```
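
A client for the search endpoint might look like the sketch below. It assumes the third-party `msgpack` package and the default port listed above. The `Content-Type` value and the helper names are assumptions, not details confirmed from `fpstore.py`. The request and response shapes follow the examples above. The client re-applies `min_score` and sorting defensively in case the server returns unfiltered or unsorted results.

```python
import urllib.request
from typing import Any, Dict, List


def build_search_request(terms: List[int], limit: int = 10,
                         min_score: float = 0.5) -> Dict[str, Any]:
    # Same shape as the "Search Request" example above.
    return {"query": list(terms), "limit": limit, "min_score": min_score}


def top_matches(response: Dict[str, Any], min_score: float) -> List[Dict[str, Any]]:
    """Keep results at or above the threshold, best score first."""
    hits = [r for r in response.get("results", []) if r["score"] >= min_score]
    return sorted(hits, key=lambda r: r["score"], reverse=True)


def search(base_url: str, index: str, terms: List[int],
           limit: int = 10, min_score: float = 0.5) -> List[Dict[str, Any]]:
    import msgpack  # third-party: pip install msgpack

    body = msgpack.packb(build_search_request(terms, limit, min_score))
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.msgpack"},  # assumed value
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        payload = msgpack.unpackb(resp.read())
    return top_matches(payload, min_score)[:limit]
```

For example, `search("http://localhost:6081", "main", [123, 456])` would POST to `/main/_search`. The index name `main` is hypothetical.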
+
+## Concurrency and Parallelism
+
+### Server Concurrency
+
+**API/Web Processes**:
+- Multiple worker processes (Gunicorn/Uvicorn)
+- Each process handles requests independently
+- Shared-nothing architecture
+- Database connection pooling per process
+
+**Worker Processes**:
+- Multiple worker instances
+- NATS queue provides work distribution
+- Each worker processes one submission at a time
+- No shared state between workers
+
+**Cron Process**:
+- Single instance (leader election via database)
+- Scheduled tasks run sequentially
+- Long-running tasks delegated to workers
+
+### Index Concurrency
+
+**Thread Model**:
+- Main thread: HTTP server
+- Worker threads: Search and merge operations
+- Configurable thread pool size
+
+**Locking Strategy**:
+- Read-write lock on Index
+- Multiple concurrent readers
+- Exclusive writer (for flush/merge)
+- Lock-free MemorySegment (atomic operations)
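
The index itself is written in Zig, so the following is only a Python illustration of the read-write locking semantics listed above: any number of concurrent readers (searches) or a single exclusive writer (flush/merge). This sketch gives waiting writers priority over new readers so that a steady stream of searches cannot starve a flush. That policy is a design choice made here; the document does not say which policy the index uses.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Many concurrent readers or one exclusive writer (writer-preferring)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._waiting_writers = 0

    @contextmanager
    def read(self):
        with self._cond:
            # New readers wait while a writer holds or is waiting for the lock.
            while self._writer or self._waiting_writers:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            self._waiting_writers += 1
            # Wait until no writer is active and all readers have left.
            while self._writer or self._readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()
```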
+
+**Background Tasks**:
+- Segment merger runs in background thread
+- Oplog flusher runs periodically
+- Metrics collector runs independently
+
+## Scalability Considerations
+
+### Horizontal Scaling
+
+**API/Web**:
+- Stateless processes
+- Scale by adding more instances
+- Load balancer distributes requests
+- Session state in Redis (if needed)
+
+**Workers**:
+- Scale by adding more instances
+- NATS queue distributes work
+- No coordination required
+
+**Index**:
+- Multiple index instances (sharding)
+- Consistent hashing for fingerprint distribution
+- NATS for cluster coordination
+- Each instance handles subset of fingerprints
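
A minimal consistent-hashing ring for assigning fingerprint IDs to index instances is sketched below. The hash function (BLAKE2b truncated to 64 bits), the number of virtual nodes per instance, and the instance names are assumptions for illustration. The key property is that adding an instance only moves the keys that now fall on its virtual nodes; every other fingerprint keeps its current instance.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_position(key: str) -> int:
    # 64-bit position on the ring derived from a BLAKE2b digest.
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node appears `vnodes` times to spread load evenly.
        self._ring: List[Tuple[int, str]] = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, fpid: int) -> str:
        # First virtual node clockwise from the key's position (wrapping around).
        i = bisect.bisect(self._positions, ring_position(str(fpid)))
        return self._ring[i % len(self._ring)][1]
```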
+
+### Vertical Scaling
+
+**Database**:
+- Connection pooling
+- Read replicas for queries
+- Partitioning for large tables
+- Materialized views for aggregations
+
+**Index**:
+- More threads for search
+- Larger memory segment
+- Faster disk for segments
+- More RAM for file caching
+
+## Fault Tolerance
+
+### Server Resilience
+
+**Database Failures**:
+- Connection retry with exponential backoff
+- Health checks detect failures
+- Read-only mode if write DB unavailable
+
+**Index Failures**:
+- Graceful degradation (return partial results)
+- Retry with exponential backoff
+- Circuit breaker pattern
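
Retry with exponential backoff and a circuit breaker can be combined as in this sketch. The thresholds, the delays, and the use of "full jitter" (a random delay up to the exponential cap) are assumptions, not values taken from the server configuration. The breaker fails fast while the index is considered down, then lets a trial call through after `reset_timeout`. A single failure during that trial re-opens it.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the index while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("index unavailable, failing fast")
            self._opened_at = None  # half-open: allow a trial call
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # success closes the circuit
        return result


def retry_with_backoff(fn: Callable[[], T], attempts: int = 4,
                       base_delay: float = 0.1, max_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying cannot help while the circuit is open
        except Exception:
            if attempt == attempts - 1:
                raise
            # Exponential cap (0.1, 0.2, 0.4, ... seconds) with full jitter.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

A caller would wrap an index request as `retry_with_backoff(lambda: breaker.call(do_search))`, where `do_search` is a hypothetical function that performs the HTTP call.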
+
+**NATS Failures**:
+- Persistent queue (JetStream)
+- Automatic reconnection
+- Message replay on recovery
+
+### Index Resilience
+
+**Crash Recovery**:
+- Oplog replay restores MemorySegment
+- FileSegments are immutable once written, so a crash cannot leave a partially modified segment
+- Incomplete merges discarded
+
+**Data Integrity**:
+- Checksums in file format
+- Atomic file operations
+- Write-ahead logging
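
Crash recovery by oplog replay, combined with per-record checksums, can be illustrated as below. The record format here (a hex CRC-32 followed by a JSON body, one record per line) is hypothetical and much simpler than the index's real on-disk format. The recovery rule is what the sketch demonstrates: replay records in order and stop at the first one whose checksum does not match. In an append-only log, a crash during a write typically damages only the tail.

```python
import json
import zlib
from typing import Dict, Iterable, List, Sequence


def encode_entry(op: str, fpid: int, hashes: Sequence[int] = ()) -> bytes:
    body = json.dumps({"op": op, "id": fpid, "hashes": list(hashes)}).encode()
    return b"%08x %s\n" % (zlib.crc32(body), body)


def replay(lines: Iterable[bytes]) -> Dict[int, List[int]]:
    """Rebuild the in-memory segment (fingerprint ID -> hashes) from the oplog."""
    segment: Dict[int, List[int]] = {}
    for line in lines:
        try:
            crc_hex, body = line.rstrip(b"\n").split(b" ", 1)
            if int(crc_hex, 16) != zlib.crc32(body):
                break  # torn or corrupt record: ignore it and everything after
            entry = json.loads(body)
        except ValueError:  # also covers json.JSONDecodeError
            break
        if entry["op"] == "insert":
            segment[entry["id"]] = entry["hashes"]
        elif entry["op"] == "delete":
            segment.pop(entry["id"], None)
    return segment
```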
+
+**Replication**:
+- NATS-based replication (optional)
+- Snapshot-based backup
+- Point-in-time recovery
+
+## Performance Characteristics
+
+### Server Performance
+
+**Lookup Latency**:
+- P50: ~50ms (including index search)
+- P95: ~200ms
+- P99: ~500ms
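
A P95 of ~200ms means that 95% of lookups finished in about 200ms or less. The sketch below computes such figures with the nearest-rank method. The document does not say which percentile definition or measurement window produced the numbers above, so this is illustrative only.

```python
from typing import Dict, Iterable, Sequence


def percentiles(samples_ms: Iterable[float],
                ps: Sequence[int] = (50, 95, 99)) -> Dict[int, float]:
    """Nearest-rank: smallest sample such that at least p% of samples are <= it."""
    ordered = sorted(samples_ms)
    if not ordered:
        raise ValueError("no samples")
    n = len(ordered)
    # 1-based rank ceil(p * n / 100), computed with integer arithmetic.
    return {p: ordered[max(1, -(-p * n // 100)) - 1] for p in ps}
```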
+
+**Bottlenecks**:
+- Index search time (dominant)
+- Database query time (metadata fetch)
+- Network latency (MusicBrainz queries)
+
+### Index Performance
+
+**Search Latency**:
+- P50: ~5ms
+- P95: ~20ms
+- P99: ~50ms
+
+**Throughput**:
+- ~1000 searches/second (single instance)
+- ~500 inserts/second (single instance)
+
+**Bottlenecks**:
+- Disk I/O (segment reads)
+- CPU (decompression and scoring)
+- Memory (segment caching)
+
+## Future Architecture Plans
+
+### Server Modernization
+
+1. Complete migration to Starlette/ASGI
+2. Remove Flask dependencies
+3. Async database operations everywhere
+4. GraphQL API alongside REST
+
+### Index Enhancements
+
+1. Distributed index with automatic sharding
+2. Replication for high availability
+3. Incremental snapshots
+4. Query result caching
+
+### Infrastructure
+
+1. Kubernetes deployment
+2. Service mesh (Istio/Linkerd)
+3. Distributed tracing (OpenTelemetry)
+4. Advanced monitoring (Prometheus + Grafana)
@@ -0,0 +1,871 @@
+# AcoustID Data Model
+
+## Database Architecture
+
+AcoustID uses a multi-database PostgreSQL architecture with separate databases for different concerns.
+
+### Database Instances
+
+| Database | Purpose | Tables | Extensions |
+|----------|---------|--------|------------|
+| `acoustid_app` | Application data (accounts, apps, stats) | 8 | pgcrypto |
+| `acoustid_fingerprint` | Fingerprint and track data | 19 | intarray, acoustid, cube |
+| `acoustid_ingest` | Submission processing | 3 | - |
+| `musicbrainz` | MusicBrainz mirror (read-only) | Many | - |
+
+### PostgreSQL Extensions
+
+**intarray**: Integer array operations
+- Used for fingerprint array queries
+- Provides `&&` (overlap) and `@>` (contains) operators
+
+**pgcrypto**: Cryptographic functions
+- UUID generation (`gen_random_uuid()`)
+- API key hashing
+
+**acoustid** (custom): Fingerprint similarity functions
+- `acoustid_compare(int[], int[])`: Compare two fingerprints
+- `acoustid_extract_query(int[])`: Extract query terms
+- Source: `acoustid-ext` C extension
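
The kind of comparison `acoustid_compare` performs can be approximated in Python as below. This is a simplified stand-in written for this document, not a port of the C extension. It counts differing bits (Hamming distance) between aligned 32-bit hashes, tries a small range of alignment offsets, and returns the best fraction of matching bits. `max_offset` is a hypothetical parameter.

```python
from typing import List


def bit_similarity(a: List[int], b: List[int], max_offset: int = 0) -> float:
    """Best fraction of equal bits between aligned hashes over a range of offsets."""
    best = 0.0
    for offset in range(-max_offset, max_offset + 1):
        # Pair a[i] with b[i + offset] wherever both exist.
        pairs = [(a[i], b[i + offset]) for i in range(len(a))
                 if 0 <= i + offset < len(b)]
        if not pairs:
            continue
        # Hamming distance per pair; mask so negative int4 values work.
        errors = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in pairs)
        best = max(best, 1.0 - errors / (32 * len(pairs)))
    return best
```

Allowing an offset matters because two encodings of the same recording may start a fraction of a second apart, which shifts one hash array relative to the other.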
+
+**cube**: Multi-dimensional cube data type
+- Used for simhash-based fingerprint indexing
+- Enables fast approximate nearest neighbor search
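
One common way to build a 32-bit simhash from a fingerprint is bitwise majority voting, sketched below. How AcoustID actually computes and encodes its simhash is not described here. Both the voting rule and the mapping of bits to `cube` coordinates (one 0/1 coordinate per bit) are assumptions. With that encoding, the `cube` extension's Euclidean distance (`<->`) between two points is the square root of the Hamming distance between the simhashes, and the taxicab distance (`<#>`) equals it exactly. Nearest-neighbour ordering on the cube therefore matches Hamming ordering.

```python
from typing import Iterable


def simhash32(hashes: Iterable[int]) -> int:
    """Set bit i if bit i is set in more than half of the input hashes."""
    counts = [0] * 32
    total = 0
    for h in hashes:
        h &= 0xFFFFFFFF  # treat signed int4 values as unsigned
        total += 1
        for bit in range(32):
            counts[bit] += h >> bit & 1
    return sum(1 << bit for bit in range(32) if counts[bit] * 2 > total)


def to_cube_literal(simhash: int) -> str:
    # One 0/1 coordinate per bit, lowest bit first, e.g. '(1, 1, 0, ...)'.
    return "(" + ", ".join(str(simhash >> i & 1) for i in range(32)) + ")"


def hamming(a: int, b: int) -> int:
    return bin((a ^ b) & 0xFFFFFFFF).count("1")
```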
+
+## Core Tables
+
+### Account Management (acoustid_app)
+
+#### `account`
+
+User accounts for API access.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Account ID |
+| `name` | VARCHAR(255) | NOT NULL | Display name |
+| `apikey` | VARCHAR(40) | UNIQUE, NOT NULL | API key (user key) |
+| `mbuser` | VARCHAR(64) | UNIQUE | MusicBrainz username |
+| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
+| `lastlogin` | TIMESTAMP | | Last login timestamp |
+| `submission_count` | INTEGER | DEFAULT 0 | Total submissions |
+| `application_id` | INTEGER | FOREIGN KEY | Default application |
+| `application_version` | VARCHAR(255) | | Application version |
+| `created_from` | INET | | Registration IP |
+| `is_admin` | BOOLEAN | DEFAULT FALSE | Admin flag |
+
+**Indexes**:
+- `account_pkey` (PRIMARY KEY on `id`)
+- `account_apikey_key` (UNIQUE on `apikey`)
+- `account_mbuser_key` (UNIQUE on `mbuser`)
+
+#### `application`
+
+API client applications.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Application ID |
+| `name` | VARCHAR(255) | NOT NULL | Application name |
+| `version` | VARCHAR(255) | | Version string |
+| `apikey` | VARCHAR(40) | UNIQUE, NOT NULL | API key (client key) |
+| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
+| `active` | BOOLEAN | DEFAULT TRUE | Active status |
+| `account_id` | INTEGER | FOREIGN KEY | Owner account |
+| `email` | VARCHAR(255) | | Contact email |
+| `website` | VARCHAR(1000) | | Website URL |
+| `rate_limit` | INTEGER | | Custom rate limit (req/s) |
+
+**Indexes**:
+- `application_pkey` (PRIMARY KEY on `id`)
+- `application_apikey_key` (UNIQUE on `apikey`)
+
+#### `account_openid`
+
+OpenID authentication links.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `openid` | VARCHAR(255) | PRIMARY KEY | OpenID identifier |
+| `account_id` | INTEGER | FOREIGN KEY | Linked account |
+
+#### `account_google`
+
+Google OAuth authentication links.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `google_user_id` | VARCHAR(255) | PRIMARY KEY | Google user ID |
+| `account_id` | INTEGER | FOREIGN KEY | Linked account |
+
+### Fingerprint Data (acoustid_fingerprint)
+
+#### `track`
+
+Unique audio tracks identified by fingerprints.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Track ID |
+| `gid` | UUID | UNIQUE, NOT NULL | Public track UUID |
+| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
+| `new_id` | INTEGER | FOREIGN KEY | Merge target (if merged) |
+| `disabled` | BOOLEAN | DEFAULT FALSE | Disabled flag |
+
+**Indexes**:
+- `track_pkey` (PRIMARY KEY on `id`)
+- `track_gid_key` (UNIQUE on `gid`)
+- `track_new_id_idx` (on `new_id`)
+
+**Notes**:
+- `gid` is the public-facing AcoustID track ID
+- `new_id` points to merged track (for deduplication)
+- Disabled tracks excluded from search results
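
A merged track may itself be merged again later, so resolving a track ID means following `new_id` until it is empty. The sketch below does this over a plain dictionary standing in for the `track` table (a hypothetical simplification) and guards against accidental cycles.

```python
from typing import Dict, Optional


def resolve_track(track_id: int, new_id: Dict[int, Optional[int]]) -> int:
    """Follow track.new_id links to the track that survived all merges."""
    seen = set()
    while new_id.get(track_id) is not None:
        if track_id in seen:
            raise ValueError(f"merge cycle involving track {track_id}")
        seen.add(track_id)
        track_id = new_id[track_id]
    return track_id
```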
+
+#### `fingerprint`
+
+Audio fingerprints linked to tracks.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Fingerprint ID |
+| `track_id` | INTEGER | FOREIGN KEY | Linked track |
+| `fingerprint` | INTEGER[] | NOT NULL | Chromaprint hash array |
+| `length` | SMALLINT | NOT NULL | Duration in seconds |
+| `bitrate` | SMALLINT | | Audio bitrate (kbps) |
+| `format_id` | INTEGER | FOREIGN KEY | Audio format |
+| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
+| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |
+
+**Indexes**:
+- `fingerprint_pkey` (PRIMARY KEY on `id`)
+- `fingerprint_track_id_idx` (on `track_id`)
+- `fingerprint_length_idx` (on `length`)
+- `fingerprint_fingerprint_idx` (GIN on `fingerprint` using `intarray`)
+
+**Notes**:
+- `fingerprint` is an array of 32-bit integers (Chromaprint hashes)
+- GIN index makes overlap (`&&`) candidate searches fast; ranking by similarity is done separately (e.g. with `acoustid_compare`)
+- `submission_count` tracks popularity
+
+#### `fingerprint_data`
+
+Extended fingerprint data with simhash.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `fingerprint_id` | INTEGER | PRIMARY KEY, FOREIGN KEY | Fingerprint ID |
+| `fingerprint` | BYTEA | NOT NULL | Raw fingerprint data |
+| `simhash` | CUBE | | Locality-sensitive hash |
+
+**Indexes**:
+- `fingerprint_data_pkey` (PRIMARY KEY on `fingerprint_id`)
+- `fingerprint_data_simhash_idx` (GIST on `simhash`)
+
+**Notes**:
+- `fingerprint` stores compressed Chromaprint data
+- `simhash` enables approximate nearest neighbor search
+- GIST index for fast similarity queries
+
+#### `track_mbid`
+
+Links tracks to MusicBrainz recordings.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Link ID |
+| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
+| `mbid` | UUID | NOT NULL | MusicBrainz recording MBID |
+| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
+| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |
+| `disabled` | BOOLEAN | DEFAULT FALSE | Disabled flag |
+
+**Indexes**:
+- `track_mbid_pkey` (PRIMARY KEY on `id`)
+- `track_mbid_track_id_mbid_key` (UNIQUE on `track_id, mbid`)
+- `track_mbid_mbid_idx` (on `mbid`)
+
+**Notes**:
+- Multiple MBIDs per track possible (different recordings)
+- `submission_count` indicates confidence
+- Disabled links excluded from results
+
+#### `meta`
+
+User-submitted metadata.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Metadata ID |
+| `track` | VARCHAR(255) | | Track title |
+| `artist` | VARCHAR(255) | | Artist name |
+| `album` | VARCHAR(255) | | Album title |
+| `album_artist` | VARCHAR(255) | | Album artist |
+| `track_no` | INTEGER | | Track number |
+| `disc_no` | INTEGER | | Disc number |
+| `year` | INTEGER | | Release year |
+
+**Indexes**:
+- `meta_pkey` (PRIMARY KEY on `id`)
+
+#### `track_meta`
+
+Links tracks to user metadata.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Link ID |
+| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
+| `meta_id` | INTEGER | FOREIGN KEY | Metadata record |
+| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
+| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |
+
+**Indexes**:
+- `track_meta_pkey` (PRIMARY KEY on `id`)
+- `track_meta_track_id_meta_id_key` (UNIQUE on `track_id, meta_id`)
+
+#### `format`
+
+Audio file formats.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Format ID |
+| `name` | VARCHAR(20) | UNIQUE, NOT NULL | Format name (mp3, flac, etc.) |
+
+**Indexes**:
+- `format_pkey` (PRIMARY KEY on `id`)
+- `format_name_key` (UNIQUE on `name`)
+
+**Common Values**:
+- `mp3`, `flac`, `ogg`, `m4a`, `wma`, `ape`, `wav`
+
+#### `source`
+
+Submission sources (applications).
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Source ID |
+| `application_id` | INTEGER | FOREIGN KEY | Application |
+| `account_id` | INTEGER | FOREIGN KEY | User account |
+| `version` | VARCHAR(255) | | Application version |
+
+**Indexes**:
+- `source_pkey` (PRIMARY KEY on `id`)
+- `source_application_id_account_id_version_key` (UNIQUE on `application_id, account_id, version`)
+
+### Foreign IDs (acoustid_fingerprint)
+
+#### `foreignid_vendor`
+
+External ID providers.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Vendor ID |
+| `name` | VARCHAR(255) | UNIQUE, NOT NULL | Vendor name |
+
+**Indexes**:
+- `foreignid_vendor_pkey` (PRIMARY KEY on `id`)
+- `foreignid_vendor_name_key` (UNIQUE on `name`)
+
+**Common Values**:
+- `musicbrainz`, `musicip`, `discogs`, `spotify`
+
+#### `foreignid`
+
+External identifiers.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Foreign ID |
+| `vendor_id` | INTEGER | FOREIGN KEY | Vendor |
+| `name` | VARCHAR(255) | NOT NULL | External ID value |
+
+**Indexes**:
+- `foreignid_pkey` (PRIMARY KEY on `id`)
+- `foreignid_vendor_id_name_key` (UNIQUE on `vendor_id, name`)
+
+#### `track_foreignid`
+
+Links tracks to external IDs.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Link ID |
+| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
+| `foreignid_id` | INTEGER | FOREIGN KEY | External ID |
+| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
+| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |
+
+**Indexes**:
+- `track_foreignid_pkey` (PRIMARY KEY on `id`)
+- `track_foreignid_track_id_foreignid_id_key` (UNIQUE on `track_id, foreignid_id`)
+
+#### `track_puid`
+
+Legacy MusicIP PUID links.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Link ID |
+| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
+| `puid` | UUID | NOT NULL | MusicIP PUID |
+| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
+| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |
+
+**Indexes**:
+- `track_puid_pkey` (PRIMARY KEY on `id`)
+- `track_puid_track_id_puid_key` (UNIQUE on `track_id, puid`)
+- `track_puid_puid_idx` (on `puid`)
+
+### Statistics (acoustid_app)
+
+#### `stats`
+
+General statistics.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Stat ID |
+| `name` | VARCHAR(255) | UNIQUE, NOT NULL | Stat name |
+| `value` | INTEGER | NOT NULL | Stat value |
+| `date` | DATE | NOT NULL | Stat date |
+
+**Indexes**:
+- `stats_pkey` (PRIMARY KEY on `id`)
+- `stats_name_date_key` (UNIQUE on `name, date`)
+
+**Common Stats**:
+- `lookup.count`, `submission.count`, `track.count`, `fingerprint.count`
+
+#### `stats_lookups`
+
+Lookup statistics by hour.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Stat ID |
+| `hour` | TIMESTAMP | NOT NULL | Hour timestamp |
+| `application_id` | INTEGER | FOREIGN KEY | Application |
+| `count_hits` | INTEGER | DEFAULT 0 | Successful lookups |
+| `count_misses` | INTEGER | DEFAULT 0 | Failed lookups |
+
+**Indexes**:
+- `stats_lookups_pkey` (PRIMARY KEY on `id`)
+- `stats_lookups_hour_application_id_key` (UNIQUE on `hour, application_id`)
+
+#### `stats_user_agents`
+
+User agent statistics.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Stat ID |
+| `date` | DATE | NOT NULL | Date |
+| `application_id` | INTEGER | FOREIGN KEY | Application |
+| `user_agent` | VARCHAR(1000) | NOT NULL | User agent string |
+| `ip` | INET | NOT NULL | IP address |
+| `count` | INTEGER | DEFAULT 0 | Request count |
+
+**Indexes**:
+- `stats_user_agents_pkey` (PRIMARY KEY on `id`)
+- `stats_user_agents_date_application_id_user_agent_ip_key` (UNIQUE on `date, application_id, user_agent, ip`)
+
+#### `stats_top_accounts`
+
+Top submitter accounts.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Stat ID |
+| `account_id` | INTEGER | FOREIGN KEY | Account |
+| `count` | INTEGER | NOT NULL | Submission count |
+
+**Indexes**:
+- `stats_top_accounts_pkey` (PRIMARY KEY on `id`)
+- `stats_top_accounts_account_id_key` (UNIQUE on `account_id`)
+
+### Submission Processing (acoustid_ingest)
+
+#### `submission`
+
+Pending fingerprint submissions.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Submission ID |
+| `fingerprint` | INTEGER[] | NOT NULL | Chromaprint hash array |
+| `length` | SMALLINT | NOT NULL | Duration in seconds |
+| `bitrate` | SMALLINT | | Audio bitrate |
+| `format_id` | INTEGER | | Audio format |
+| `created` | TIMESTAMP | NOT NULL | Submission timestamp |
+| `source_id` | INTEGER | FOREIGN KEY | Submission source |
+| `mbid` | UUID | | MusicBrainz MBID (if provided) |
+| `handled` | BOOLEAN | DEFAULT FALSE | Processing status |
+| `meta_id` | INTEGER | FOREIGN KEY | User metadata |
+
+**Indexes**:
+- `submission_pkey` (PRIMARY KEY on `id`)
+- `submission_handled_idx` (on `handled` WHERE `handled = FALSE`)
+
+**Notes**:
+- Worker processes unhandled submissions
+- `handled = TRUE` after processing
+
+#### `submission_result`
+
+Processing results for submissions.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Result ID |
+| `submission_id` | INTEGER | FOREIGN KEY | Submission |
+| `track_id` | INTEGER | FOREIGN KEY | Matched/created track |
+| `created` | TIMESTAMP | NOT NULL | Processing timestamp |
+
+**Indexes**:
+- `submission_result_pkey` (PRIMARY KEY on `id`)
+- `submission_result_submission_id_key` (UNIQUE on `submission_id`)
+
+#### `pending_submission`
+
+Queue for async submission processing.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Queue ID |
+| `submission_id` | INTEGER | FOREIGN KEY | Submission |
+| `created` | TIMESTAMP | NOT NULL | Queue timestamp |
+
+**Indexes**:
+- `pending_submission_pkey` (PRIMARY KEY on `id`)
+- `pending_submission_submission_id_key` (UNIQUE on `submission_id`)
+
+**Notes**:
+- Replaced by NATS queue in newer deployments
+- Legacy table, may be deprecated
+
+### Provenance Tables (acoustid_fingerprint)
+
+Track data lineage and changes.
+
+#### `fingerprint_source`
+
+Links fingerprints to submission sources.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Link ID |
+| `fingerprint_id` | INTEGER | FOREIGN KEY | Fingerprint |
+| `source_id` | INTEGER | FOREIGN KEY | Source |
+| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
+
+#### `track_mbid_source`
+
+Links track-MBID associations to sources.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Link ID |
+| `track_mbid_id` | INTEGER | FOREIGN KEY | Track-MBID link |
+| `source_id` | INTEGER | FOREIGN KEY | Source |
+| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
+
+#### `track_mbid_change`
+
+Audit log for track-MBID changes.
+
+| Column | Type | Constraints | Description |
+|--------|------|-------------|-------------|
+| `id` | SERIAL | PRIMARY KEY | Change ID |
+| `track_mbid_id` | INTEGER | FOREIGN KEY | Track-MBID link |
+| `account_id` | INTEGER | FOREIGN KEY | Account that made change |
+| `disabled` | BOOLEAN | NOT NULL | New disabled status |
+| `created` | TIMESTAMP | NOT NULL | Change timestamp |
+| `note` | TEXT | | Change reason |
+
+## ORM Layer (SQLAlchemy)
+
+### Multi-Database Configuration
+
+**File**: `acoustid/db.py`
+
+```python
+# Database bind keys
+BIND_KEYS = {
+    'app': 'acoustid_app',
+    'fingerprint': 'acoustid_fingerprint',
+    'ingest': 'acoustid_ingest',
+    'musicbrainz': 'musicbrainz'
+}
+```
+
+**Model Binding**:
+
+```python
+class Account(Base):
+    __bind_key__ = 'app'
+    __tablename__ = 'account'
+    # ...
+
+class Track(Base):
+    __bind_key__ = 'fingerprint'
+    __tablename__ = 'track'
+    # ...
+```
+
+### Connection Pooling
+
+**Configuration** (`acoustid.conf`):
+
+```ini
+[database]
+name = acoustid_app
+user = acoustid
+password_file = /run/secrets/db_password
+host = postgres
+port = 5432
+pool_size = 20
+pool_recycle = 3600
+```
+
+**Pool Settings**:
+- `pool_size`: Connections kept open per process (SQLAlchemy's `QueuePool` can open up to `pool_size + max_overflow`)
+- `pool_recycle`: Recycle connections after N seconds
+- `pool_pre_ping`: Test connections before use
+
+### Query Patterns
+
+**Fingerprint Search** (legacy, pre-index):
+
+```python
+from sqlalchemy import func
+
+# Find similar fingerprints: intarray overlap (&&) selects candidates within
+# ±5 s of the query duration, acoustid_compare() ranks them by similarity
+query = db.session.query(Fingerprint).filter(
+    Fingerprint.fingerprint.op('&&')(query_fingerprint),
+    Fingerprint.length.between(duration - 5, duration + 5)
+).order_by(
+    func.acoustid_compare(Fingerprint.fingerprint, query_fingerprint).desc()
+).limit(10)
+```
+
+**Track Lookup with MBIDs**:
+
+```python
+from sqlalchemy.orm import joinedload
+
+# Fetch track with all linked MBIDs (eager-loaded in the same query)
+track = db.session.query(Track).options(
+    joinedload(Track.mbids)
+).filter(Track.gid == track_gid).first()
+```
+
+**Submission Processing**:
+
+```python
+# Find unhandled submissions
+submissions = db.session.query(Submission).filter(
+    Submission.handled == False
+).order_by(Submission.created).limit(100).all()
+```
+
+## Database Migrations
+
+### Alembic Configuration
+
+**File**: `alembic.ini`
+
+**Migration Directories**:
+- `alembic/versions/app/`: acoustid_app migrations
+- `alembic/versions/fingerprint/`: acoustid_fingerprint migrations
+- `alembic/versions/ingest/`: acoustid_ingest migrations
+
+**Multi-Database Support**:
+
+```python
+# alembic/env.py
+def run_migrations_online():
+    for bind_key in ['app', 'fingerprint', 'ingest']:
+        engine = get_engine(bind_key)
+        with engine.connect() as connection:
+            context.configure(
+                connection=connection,
+                target_metadata=get_metadata(bind_key)
+            )
+            with context.begin_transaction():
+                context.run_migrations()
+```
+
+### Migration Commands
+
+```bash
+# Create new migration
+alembic revision --autogenerate -m "Add new column"
+
+# Apply migrations
+alembic upgrade head
+
+# Rollback migration
+alembic downgrade -1
+
+# Show current version
+alembic current
+
+# Show migration history
+alembic history
+```
+
+## Redis Data Structures
+
+### Rate Limiting
+
+**Key Pattern**: `rl:bucket:{scope}:{identifier}:{timestamp}`
+
+**Example Keys**:
+```
+rl:bucket:global:1714305600
+rl:bucket:app:8XaBELgH:1714305600
+rl:bucket:ip:192.168.1.1:1714305600
+```
+
+**Value**: Integer (request count)  
+**TTL**: 25 seconds (window duration + buffer)
+
+**Algorithm**:
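+```python
+# Increment the bucket for the current window and refresh its TTL
+# in a single MULTI/EXEC round trip (`redis` is a redis-py client)
+bucket_key = f"rl:bucket:{scope}:{identifier}:{current_window}"
+pipe = redis.pipeline()
+pipe.incr(bucket_key)
+pipe.expire(bucket_key, 25)
+count, _ = pipe.execute()
+
+# Sum counts across all windows in the sliding window; MGET returns
+# bytes for existing buckets and None for expired or never-hit ones
+values = redis.mget([f"rl:bucket:{scope}:{identifier}:{w}" for w in windows])
+total = sum(int(v) for v in values if v is not None)
+```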
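
To make the bucket arithmetic concrete without a Redis server, the sketch below keeps the buckets in a dictionary. The 1-second bucket width and 20-second window are assumptions consistent with the 25-second TTL above, not confirmed settings, and the class name is hypothetical. As in an increment-first scheme, rejected requests are still counted. Expired buckets are simply never summed here, whereas Redis would also delete them through the TTL.

```python
import time
from collections import defaultdict
from typing import Callable, DefaultDict


class SlidingWindowLimiter:
    """In-memory stand-in for the Redis bucket scheme (hypothetical parameters)."""

    def __init__(self, limit: int, window_s: int = 20, bucket_s: int = 1,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_s = window_s
        self.bucket_s = bucket_s
        self._clock = clock
        self._buckets: DefaultDict[str, int] = defaultdict(int)

    def allow(self, scope: str, identifier: str) -> bool:
        now = int(self._clock())
        current = now - now % self.bucket_s  # start of the current bucket
        prefix = f"rl:bucket:{scope}:{identifier}"
        self._buckets[f"{prefix}:{current}"] += 1  # counted even if rejected
        oldest = current - self.window_s + self.bucket_s
        total = sum(
            self._buckets.get(f"{prefix}:{w}", 0)
            for w in range(oldest, current + 1, self.bucket_s)
        )
        return total <= self.limit
```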
+
+### Task Queue (Legacy)
+
+**Key Pattern**: `queue:{queue_name}`
+
+**Operations**:
+```python
+# Push task
+redis.rpush('queue:submissions', json.dumps(task_data))
+
+# Pop task
+task_data = redis.lpop('queue:submissions')
+```
+
+**Note**: Being replaced by NATS in newer deployments
+
+### API Key Cache
+
+**Implementation**: In-memory TTLCache (not Redis)
+
+```python
+from cachetools import TTLCache
+
+api_key_cache = TTLCache(maxsize=1000, ttl=60)
+```
+
+**Purpose**: Reduce database queries for API key validation
+
+### Backfill State
+
+**Key Pattern**: `backfill:{index_name}:{state_key}`
+
+**Example Keys**:
+```
+backfill:fingerprints:last_id
+backfill:fingerprints:batch_size
+backfill:fingerprints:completed
+```
+
+**Purpose**: Track progress of index backfill operations
+
+### Unknown MBID Cache
+
+**Key Pattern**: `unknown_mbid:{mbid}`
+
+**Value**: Boolean (1 if MBID not found in MusicBrainz)  
+**TTL**: 3600 seconds (1 hour)
+
+**Purpose**: Avoid repeated MusicBrainz queries for non-existent MBIDs
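
The unknown-MBID cache is a negative cache: only misses are remembered, for one hour. A sketch with an injected store follows. The `store` is assumed to expose `get` and `setex` as the redis-py client does. `lookup` is a hypothetical callable that queries the MusicBrainz mirror.

```python
from typing import Callable, Optional, Protocol

UNKNOWN_MBID_TTL = 3600  # one hour, as described above


class KeyValueStore(Protocol):
    # The subset of the redis-py client interface used here.
    def get(self, name: str) -> Optional[bytes]: ...
    def setex(self, name: str, time: int, value: str) -> object: ...


def mbid_exists(mbid: str, store: KeyValueStore,
                lookup: Callable[[str], bool]) -> bool:
    key = f"unknown_mbid:{mbid}"
    if store.get(key) is not None:
        return False  # cached miss: skip the MusicBrainz query
    found = lookup(mbid)
    if not found:
        store.setex(key, UNKNOWN_MBID_TTL, "1")  # remember the miss for an hour
    return found
```

Found MBIDs are deliberately not cached, so a recording added to MusicBrainz becomes visible once the one-hour entry for its miss expires.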
+
+## Data Integrity
+
+### Constraints
+
+**Foreign Keys**:
+- All foreign keys have `ON DELETE CASCADE` or `ON DELETE SET NULL`
+- Orphaned records cleaned up automatically
+
+**Unique Constraints**:
+- Prevent duplicate fingerprints per track
+- Prevent duplicate MBID links per track
+- Ensure API key uniqueness
+
+**Check Constraints**:
+- Duration must be positive
+- Bitrate must be positive
+- Submission count must be non-negative
+
+### Triggers
+
+**Update Submission Count**:
+```sql
+CREATE TRIGGER update_fingerprint_submission_count
+AFTER INSERT ON fingerprint_source
+FOR EACH ROW
+EXECUTE FUNCTION increment_submission_count();
+```
+
+**Track Merge Propagation**:
+```sql
+CREATE TRIGGER propagate_track_merge
+AFTER UPDATE OF new_id ON track
+FOR EACH ROW
+EXECUTE FUNCTION update_merged_track_references();
+```
+
+### Indexes for Performance
+
+**Covering Indexes**:
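+```sql
+-- Index-only scans by duration: track_id and fingerprint are read from the
+-- index without visiting the table (INCLUDE requires PostgreSQL 11+).
+-- Note: B-tree entries are limited to about 2.7 kB, so long fingerprint
+-- arrays may not fit in the INCLUDE column.
+CREATE INDEX fingerprint_lookup_idx 
+ON fingerprint (length, track_id) 
+INCLUDE (fingerprint);
+```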
+
+**Partial Indexes**:
+```sql
+-- Only index unhandled submissions
+CREATE INDEX submission_unhandled_idx 
+ON submission (created) 
+WHERE handled = FALSE;
+```
+
+**GIN Indexes**:
+```sql
+-- Fast fingerprint array queries
+CREATE INDEX fingerprint_fingerprint_idx 
+ON fingerprint USING GIN (fingerprint gin__int_ops);
+```
+
+## Data Lifecycle
+
+### Fingerprint Submission
+
+1. Insert into `submission` table (acoustid_ingest)
+2. Publish to NATS queue
+3. Worker processes submission
+4. Insert into `fingerprint` table (acoustid_fingerprint)
+5. Link to `track` (create or match)
+6. Insert into `fingerprint_source` (provenance)
+7. Update index via HTTP API
+8. Insert into `submission_result`
+9. Mark `submission.handled = TRUE`
+
+### Track Merging
+
+1. Identify duplicate tracks (manual or automated)
+2. Set `track.new_id` to target track
+3. Trigger updates all references
+4. Merge fingerprints, MBIDs, metadata
+5. Disable old track (`track.disabled = TRUE`)
+
+### Data Cleanup
+
+**Cron Jobs**:
+- Delete old handled submissions (>30 days)
+- Clean up orphaned metadata records
+- Remove disabled tracks with no references
+- Archive old statistics
+
+## Performance Optimization
+
+### Query Optimization
+
+**Materialized Views**:
+```sql
+-- MBIDs are counted in a subquery so the join does not multiply fingerprint rows
+CREATE MATERIALIZED VIEW track_stats AS
+SELECT
+    f.track_id,
+    COUNT(*) AS fingerprint_count,
+    (SELECT COUNT(DISTINCT tm.mbid)
+       FROM track_mbid tm
+      WHERE tm.track_id = f.track_id) AS mbid_count,
+    SUM(f.submission_count) AS total_submissions
+FROM fingerprint f
+GROUP BY f.track_id;
+```
+
+**Partitioning** (future):
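+```sql
+-- Partition submissions by month; requires the parent table to be created
+-- with PARTITION BY RANGE (created), which the submission table above is not
+CREATE TABLE submission_2025_04 PARTITION OF submission
+FOR VALUES FROM ('2025-04-01') TO ('2025-05-01');
+```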
+
+### Caching Strategy
+
+**Application-Level**:
+- API key validation (TTLCache, 60s)
+- Format ID lookup (permanent cache)
+- MusicBrainz MBID existence (Redis, 1h)
+
+**Database-Level**:
+- Shared buffers (PostgreSQL config)
+- Connection pooling (SQLAlchemy)
+- Query statistics via `pg_stat_statements` (for finding slow queries; PostgreSQL has no built-in query result cache)
+
+### Bulk Operations
+
+**Batch Inserts**:
+```python
+# Insert multiple fingerprints efficiently
+db.session.bulk_insert_mappings(Fingerprint, fingerprint_dicts)
+db.session.commit()
+```
+
+**Bulk Updates**:
+```python
+# Update submission counts in batch
+db.session.execute(
+    update(Fingerprint).where(
+        Fingerprint.id.in_(fingerprint_ids)
+    ).values(
+        submission_count=Fingerprint.submission_count + 1
+    )
+)
+```
+
+## Backup and Recovery
+
+### Backup Strategy
+
+**PostgreSQL**:
+- Daily full backups (pg_dump)
+- Continuous WAL archiving
+- Point-in-time recovery enabled
+
+**Index**:
+- Daily snapshots via `/:index/_snapshot`
+- Incremental backups of Oplog
+- Segment files backed up separately
+
+### Disaster Recovery
+
+**Database Restore**:
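+```bash
+# Restore from a custom-format dump (created with pg_dump -Fc)
+pg_restore -d acoustid_app acoustid_app_backup.dump
+
+# Point-in-time recovery is not a pg_restore option: restore a base backup,
+# then replay archived WAL up to the target. In postgresql.conf (PostgreSQL 12+):
+#   restore_command = 'cp /path/to/wal_archive/%f %p'
+#   recovery_target_time = '2025-04-28 12:00:00'
+touch "$PGDATA/recovery.signal"   # then start the server to begin recovery
+```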
+
+**Index Rebuild**:
+```bash
+# Rebuild from database
+python manage.py run import --rebuild-index
+```
@@ -0,0 +1,946 @@
+# AcoustID Deployment
+
+## Deployment Overview
+
+AcoustID supports multiple deployment models: production multi-server, Docker Compose for self-hosting, and local development. The system requires coordination between multiple services: PostgreSQL, Redis, NATS, the Python server, and the Zig index.
+
+## Docker Deployment
+
+### Server Docker Image
+
+**Dockerfile**: `docker/Dockerfile`
+
+#### Multi-Stage Build
+
+**Stage 1: Chromaprint Build**
+
+```dockerfile
+FROM ubuntu:24.04 AS chromaprint-build
+
+RUN apt-get update && apt-get install -y \
+    git \
+    cmake \
+    build-essential \
+    libfftw3-dev
+
+WORKDIR /build
+RUN git clone https://github.com/acoustid/chromaprint.git && \
+    cd chromaprint && \
+    git checkout 41a3e8fb && \
+    cmake -DCMAKE_BUILD_TYPE=Release \
+          -DBUILD_TOOLS=OFF \
+          -DBUILD_TESTS=OFF . && \
+    make -j$(nproc) && \
+    make install
+```
+
+**Stage 2: Base Image**
+
+```dockerfile
+FROM ubuntu:24.04 AS base
+
+RUN apt-get update && apt-get install -y \
+    python3.12 \
+    python3-pip \
+    libfftw3-3 \
+    libpq5 \
+    && rm -rf /var/lib/apt/lists/*
+
+COPY --from=chromaprint-build /usr/local/lib/libchromaprint.so* /usr/local/lib/
+COPY --from=chromaprint-build /usr/local/include/chromaprint.h /usr/local/include/
+
+RUN ldconfig
+```
+
+**Stage 3: Builder**
+
+```dockerfile
+FROM base AS builder
+
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    python3-dev \
+    libpq-dev \
+    curl \
+    && rm -rf /var/lib/apt/lists/*
+
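+# Install uv (the installer places the binary in ~/.local/bin)
+RUN curl -LsSf https://astral.sh/uv/install.sh | sh
+ENV PATH="/root/.local/bin:$PATH"
+
+WORKDIR /app
+COPY pyproject.toml uv.lock ./
+# Install dependencies only; the project is built once its source is copied
+RUN uv sync --frozen --no-dev --no-install-project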
+
+COPY . .
+RUN uv build
+```
+
+**Stage 4: Final Image**
+
+```dockerfile
+FROM base AS final
+
+# Create non-root user
+RUN useradd -m -u 1000 acoustid
+
+WORKDIR /app
+
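+# Copy the dependency virtualenv, the built wheel and the uv binary
+COPY --from=builder /app/.venv /app/.venv
+COPY --from=builder /app/dist/*.whl /tmp/
+COPY --from=builder /root/.local/bin/uv /usr/local/bin/uv
+
+# Install application (uv-created virtualenvs do not include pip)
+RUN uv pip install --python /app/.venv/bin/python /tmp/*.whl && rm /tmp/*.whl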
+
+# Copy configuration template
+COPY acoustid.conf.dist /etc/acoustid/acoustid.conf.dist
+
+USER acoustid
+
+ENV PATH="/app/.venv/bin:$PATH"
+ENV PYTHONUNBUFFERED=1
+
+ENTRYPOINT ["python", "manage.py"]
+CMD ["run", "api"]
+```
+
+**Image Size**: ~400MB (compressed)  
+**Base OS**: Ubuntu 24.04  
+**Python Version**: 3.12
+
+### Index Docker Image
+
+**Dockerfile**: `docker/Dockerfile.index`
+
+```dockerfile
+FROM ubuntu:24.04 AS builder
+
+RUN apt-get update && apt-get install -y \
+    curl \
+    xz-utils \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Zig
+RUN curl -L https://ziglang.org/download/0.11.0/zig-linux-x86_64-0.11.0.tar.xz | \
+    tar -xJ -C /usr/local && \
+    ln -s /usr/local/zig-linux-x86_64-0.11.0/zig /usr/local/bin/zig
+
+WORKDIR /build
+COPY . .
+
+RUN zig build -Doptimize=ReleaseFast
+
+FROM ubuntu:24.04
+
+RUN useradd -m -u 1000 acoustid
+
+WORKDIR /app
+
+COPY --from=builder /build/zig-out/bin/fpindex /app/fpindex
+
+RUN mkdir -p /var/lib/acoustid-index && \
+    chown acoustid:acoustid /var/lib/acoustid-index
+
+USER acoustid
+
+EXPOSE 6081
+
+ENTRYPOINT ["/app/fpindex"]
+CMD ["--dir", "/var/lib/acoustid-index", "--port", "6081"]
+```
+
+**Image Size**: ~50MB (compressed)  
+**Base OS**: Ubuntu 24.04  
+**Binary**: Single statically-linked executable
+
+### Docker Compose Configuration
+
+**File**: `docker-compose.yml`
+
+```yaml
+services:
+  postgres:
+    image: ghcr.io/acoustid/postgresql:17.4
+    environment:
+      POSTGRES_USER: acoustid
+      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
+      POSTGRES_MULTIPLE_DATABASES: acoustid_app,acoustid_fingerprint,acoustid_ingest
+    volumes:
+      - postgres_data:/var/lib/postgresql/data
+      - ./docker/init-db.sh:/docker-entrypoint-initdb.d/init-db.sh
+    secrets:
+      - db_password
+    ports:
+      - "5432:5432"
+    healthcheck:
+      test: ["CMD-SHELL", "pg_isready -U acoustid"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  redis:
+    image: redis:7-alpine
+    # redis-server has no --requirepass-file option, so read the secret in a shell
+    command: sh -c 'exec redis-server --requirepass "$$(cat /run/secrets/redis_password)"'
+    volumes:
+      - redis_data:/data
+    secrets:
+      - redis_password
+    ports:
+      - "6379:6379"
+    healthcheck:
+      test: ["CMD", "redis-cli", "ping"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  nats:
+    image: nats:2-alpine
+    # -m enables the HTTP monitoring port that the healthcheck queries
+    command: -js -sd /data -m 8222
+    volumes:
+      - nats_data:/data
+    ports:
+      - "4222:4222"
+      - "8222:8222"
+    healthcheck:
+      test: ["CMD", "wget", "-q", "-O-", "http://localhost:8222/healthz"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  index:
+    image: ghcr.io/acoustid/acoustid-index:latest
+    command: >
+      --dir /var/lib/acoustid-index
+      --port 6081
+      --threads 4
+      --log-level info
+    volumes:
+      - index_data:/var/lib/acoustid-index
+    ports:
+      - "6081:6081"
+    healthcheck:
+      test: ["CMD", "wget", "-q", "-O-", "http://localhost:6081/_health"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+    profiles:
+      - backend
+
+  api:
+    image: ghcr.io/acoustid/acoustid-server:latest
+    command: run api
+    environment:
+      ACOUSTID_CONFIG: /etc/acoustid/acoustid.conf
+    volumes:
+      - ./acoustid.conf:/etc/acoustid/acoustid.conf:ro
+    secrets:
+      - db_password
+      - redis_password
+    ports:
+      - "5000:5000"
+    depends_on:
+      postgres:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+      nats:
+        condition: service_healthy
+      index:
+        condition: service_healthy
+    healthcheck:
+      test: ["CMD", "wget", "-q", "-O-", "http://localhost:5000/_health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    profiles:
+      - frontend
+
+  web:
+    image: ghcr.io/acoustid/acoustid-server:latest
+    command: run web
+    environment:
+      ACOUSTID_CONFIG: /etc/acoustid/acoustid.conf
+    volumes:
+      - ./acoustid.conf:/etc/acoustid/acoustid.conf:ro
+    secrets:
+      - db_password
+      - redis_password
+    ports:
+      - "5001:5001"
+    depends_on:
+      postgres:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+    healthcheck:
+      test: ["CMD", "wget", "-q", "-O-", "http://localhost:5001/_health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    profiles:
+      - frontend
+
+  worker:
+    image: ghcr.io/acoustid/acoustid-server:latest
+    command: run worker
+    environment:
+      ACOUSTID_CONFIG: /etc/acoustid/acoustid.conf
+    volumes:
+      - ./acoustid.conf:/etc/acoustid/acoustid.conf:ro
+    secrets:
+      - db_password
+      - redis_password
+    depends_on:
+      postgres:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+      nats:
+        condition: service_healthy
+      index:
+        condition: service_healthy
+    deploy:
+      replicas: 2
+    profiles:
+      - backend
+
+  cron:
+    image: ghcr.io/acoustid/acoustid-server:latest
+    command: run cron
+    environment:
+      ACOUSTID_CONFIG: /etc/acoustid/acoustid.conf
+    volumes:
+      - ./acoustid.conf:/etc/acoustid/acoustid.conf:ro
+    secrets:
+      - db_password
+      - redis_password
+    depends_on:
+      postgres:
+        condition: service_healthy
+      redis:
+        condition: service_healthy
+    profiles:
+      - backend
+
+volumes:
+  postgres_data:
+  redis_data:
+  nats_data:
+  index_data:
+
+secrets:
+  db_password:
+    file: ./secrets/db_password.txt
+  redis_password:
+    file: ./secrets/redis_password.txt
+```
+
+### Docker Compose Profiles
+
+**Frontend Profile** (public-facing services):
+```bash
+docker compose --profile frontend up
+```
+Services: api, web
+
+**Backend Profile** (background services):
+```bash
+docker compose --profile backend up
+```
+Services: index, worker, cron
+
+**Full Stack**:
+```bash
+docker compose --profile frontend --profile backend up
+```
+
+**Tools Profile** (one-off commands; assumes a `tools` service, which the compose file above does not define):
+```bash
+docker compose run --rm tools python manage.py <command>
+```
+
+## PostgreSQL Setup
+
+### Custom PostgreSQL Image
+
+**Image**: `ghcr.io/acoustid/postgresql:17.4`  
+**Base**: `postgres:17.4`
+
+**Dockerfile**: `docker/Dockerfile.postgres`
+
+```dockerfile
+FROM postgres:17.4
+
+# intarray, pgcrypto and cube are contrib modules that already ship with the
+# official postgres image; only the toolchain for the acoustid extension is needed
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    postgresql-server-dev-17 \
+    && rm -rf /var/lib/apt/lists/*
+
+# Build acoustid extension
+COPY extensions/acoustid /build/acoustid
+WORKDIR /build/acoustid
+RUN make && make install
+
+# Copy initialization scripts
+COPY docker/init-db.sh /docker-entrypoint-initdb.d/
+```
+
+### Database Initialization
+
+**Script**: `docker/init-db.sh`
+
+```bash
+#!/bin/bash
+set -e
+
+# Create multiple databases
+for db in acoustid_app acoustid_fingerprint acoustid_ingest; do
+    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" <<-EOSQL
+        CREATE DATABASE $db;
+        \c $db
+        CREATE EXTENSION IF NOT EXISTS pgcrypto;
+EOSQL
+done
+
+# Install extensions for fingerprint database
+psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" -d acoustid_fingerprint <<-EOSQL
+    CREATE EXTENSION IF NOT EXISTS intarray;
+    CREATE EXTENSION IF NOT EXISTS cube;
+    CREATE EXTENSION IF NOT EXISTS acoustid;
+EOSQL
+
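+# Schema migrations are not run here: the postgres image has no Python or
+# application code. Run `python manage.py db upgrade` from the server image
+# once the database is up.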
+```
+
+### Database Configuration
+
+**postgresql.conf** (custom settings):
+
+```ini
+# Connection settings
+max_connections = 200
+
+# Memory
+shared_buffers = 4GB
+effective_cache_size = 12GB
+
+# Write-ahead log
+wal_level = replica
+max_wal_size = 2GB
+min_wal_size = 1GB
+
+# Query planner
+random_page_cost = 1.1  # SSD
+effective_io_concurrency = 200
+
+# Parallel query
+max_parallel_workers_per_gather = 4
+max_parallel_workers = 8
+
+# Logging
+log_min_duration_statement = 1000  # Log slow queries (>1s)
+log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
+
+# Autovacuum
+autovacuum_max_workers = 4
+autovacuum_naptime = 10s
+```
+
+## CI/CD Pipeline
+
+### GitHub Actions Workflows
+
+**File**: `.github/workflows/ci.yml`
+
+```yaml
+name: CI
+
+on:
+  push:
+    branches: [main, develop]
+  pull_request:
+    branches: [main]
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+      
+      - name: Install uv
+        run: curl -LsSf https://astral.sh/uv/install.sh | sh
+      
+      - name: Install dependencies
+        run: uv sync
+      
+      - name: Run isort
+        run: uv run isort --check-only acoustid/
+      
+      - name: Run black
+        run: uv run black --check acoustid/
+      
+      - name: Run flake8
+        run: uv run flake8 acoustid/
+      
+      - name: Run mypy
+        run: uv run mypy acoustid/
+
+  test:
+    runs-on: ubuntu-latest
+    services:
+      postgres:
+        image: ghcr.io/acoustid/postgresql:17.4
+        env:
+          POSTGRES_USER: acoustid
+          POSTGRES_PASSWORD: acoustid
+          POSTGRES_DB: acoustid_test
+        options: >-
+          --health-cmd pg_isready
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+        ports:
+          - 5432:5432
+      
+      redis:
+        image: redis:7-alpine
+        options: >-
+          --health-cmd "redis-cli ping"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+        ports:
+          - 6379:6379
+      
+      nats:
+        image: nats:2-alpine
+        options: >-
+          --health-cmd "wget -q -O- http://localhost:8222/healthz"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+        ports:
+          - 4222:4222
+    
+    steps:
+      - uses: actions/checkout@v4
+      
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+      
+      - name: Install uv
+        run: curl -LsSf https://astral.sh/uv/install.sh | sh
+      
+      - name: Install dependencies
+        run: uv sync
+      
+      - name: Run migrations
+        run: uv run python manage.py db upgrade
+        env:
+          ACOUSTID_DATABASE_NAME: acoustid_test
+          ACOUSTID_DATABASE_USER: acoustid
+          ACOUSTID_DATABASE_PASSWORD: acoustid
+          ACOUSTID_DATABASE_HOST: localhost
+      
+      - name: Run tests
+        run: uv run pytest -v --cov=acoustid --cov-report=xml
+        env:
+          ACOUSTID_DATABASE_NAME: acoustid_test
+          ACOUSTID_DATABASE_USER: acoustid
+          ACOUSTID_DATABASE_PASSWORD: acoustid
+          ACOUSTID_DATABASE_HOST: localhost
+          ACOUSTID_REDIS_HOST: localhost
+          ACOUSTID_NATS_SERVERS: nats://localhost:4222
+      
+      - name: Upload coverage
+        uses: codecov/codecov-action@v4
+        with:
+          files: ./coverage.xml
+
+  build:
+    runs-on: ubuntu-latest
+    needs: [lint, test]
+    if: github.event_name == 'push'
+    steps:
+      - uses: actions/checkout@v4
+      
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+      
+      - name: Login to GitHub Container Registry
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      
+      - name: Build and push server image
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          file: docker/Dockerfile
+          push: true
+          tags: |
+            ghcr.io/acoustid/acoustid-server:latest
+            ghcr.io/acoustid/acoustid-server:${{ github.sha }}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+      
+      - name: Build and push index image
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          file: docker/Dockerfile.index
+          push: true
+          tags: |
+            ghcr.io/acoustid/acoustid-index:latest
+            ghcr.io/acoustid/acoustid-index:${{ github.sha }}
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+```
+
+### Linting Tools
+
+**isort** (import sorting):
+```toml
+# pyproject.toml
+[tool.isort]
+profile = "black"
+line_length = 100
+```
+
+**black** (code formatting):
+```toml
+# pyproject.toml
+[tool.black]
+line-length = 100
+target-version = ['py312']
+```
+
+**flake8** (style checking):
+```ini
+# .flake8
+[flake8]
+max-line-length = 100
+extend-ignore = E203, W503
+exclude = .git,__pycache__,build,dist,.venv
+```
+
+**mypy** (type checking):
+```toml
+# pyproject.toml
+[tool.mypy]
+python_version = "3.12"
+warn_return_any = true
+warn_unused_configs = true
+disallow_untyped_defs = true
+```
+
+### Testing
+
+**pytest** configuration:
+
+```toml
+# pyproject.toml
+[tool.pytest.ini_options]
+testpaths = ["tests"]
+python_files = ["test_*.py"]
+python_classes = ["Test*"]
+python_functions = ["test_*"]
+addopts = "-v --strict-markers --tb=short"
+markers = [
+    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
+    "integration: marks tests as integration tests",
+]
+```
+
+**Test Files** (24 total):
+```
+tests/
+├── test_api_lookup.py
+├── test_api_submit.py
+├── test_fingerprint.py
+├── test_indexclient.py
+├── test_fpstore.py
+├── test_data_account.py
+├── test_data_fingerprint.py
+├── test_data_track.py
+├── test_data_musicbrainz.py
+├── test_worker.py
+├── test_cron.py
+├── test_ratelimit.py
+├── test_db.py
+├── test_config.py
+└── ...
+```
+
+**Test Fixtures**:
+
+```python
+# tests/conftest.py
+import pytest
+from acoustid.db import create_engine, create_session
+
+@pytest.fixture
+def with_database():
+    """Provide test database session."""
+    engine = create_engine('acoustid_test')
+    session = create_session(engine)
+    yield session
+    session.rollback()
+    session.close()
+
+@pytest.fixture
+def with_script():
+    """Provide script context with database."""
+    from acoustid.script import Script
+    script = Script('test')
+    script.setup()
+    yield script
+    script.teardown()
+
+@pytest.fixture
+def fingerprint_fixture():
+    """Predefined test fingerprint."""
+    return [123456789, 987654321, 456789123, ...]
+```
+
+## Infrastructure Requirements
+
+### Minimum Requirements (Self-Hosted)
+
+| Component | CPU | RAM | Disk | Notes |
+|-----------|-----|-----|------|-------|
+| PostgreSQL | 2 cores | 4 GB | 100 GB SSD | For small dataset |
+| Redis | 1 core | 1 GB | 10 GB | Mostly in-memory |
+| NATS | 1 core | 512 MB | 10 GB | JetStream storage |
+| Index | 2 cores | 2 GB | 50 GB SSD | Depends on dataset size |
+| API | 2 cores | 2 GB | 10 GB | Per instance |
+| Worker | 2 cores | 2 GB | 10 GB | Per instance |
+| **Total** | **10 cores** | **11.5 GB** | **190 GB** | Single-host deployment |
+
+### Production Requirements (acoustid.org scale)
+
+| Component | CPU | RAM | Disk | Instances | Notes |
+|-----------|-----|-----|------|-----------|-------|
+| PostgreSQL | 16 cores | 64 GB | 2 TB NVMe | 1 primary + 2 replicas | High IOPS required |
+| Redis | 4 cores | 16 GB | 100 GB SSD | 3 (cluster) | Persistence enabled |
+| NATS | 4 cores | 8 GB | 500 GB SSD | 3 (cluster) | JetStream storage |
+| Index | 8 cores | 16 GB | 1 TB NVMe | 4+ | Sharded by fingerprint ID |
+| API | 4 cores | 8 GB | 50 GB | 4+ | Behind load balancer |
+| Web | 2 cores | 4 GB | 50 GB | 2+ | Behind load balancer |
+| Worker | 4 cores | 8 GB | 50 GB | 8+ | Auto-scaling |
+| Cron | 2 cores | 4 GB | 50 GB | 1 | Leader election |
+
+### Network Requirements
+
+**Bandwidth**:
+- API: 100 Mbps per instance (burst to 1 Gbps)
+- Index: 1 Gbps (internal network)
+- Database: 1 Gbps (internal network)
+
+**Latency**:
+- API to Index: <5ms
+- API to Database: <5ms
+- API to Redis: <1ms
+
+## Monitoring and Observability
+
+### Health Checks
+
+**Endpoints**:
+- `/_health`: Full health check (database write test)
+- `/_health_ro`: Read-only health check
+- `/_health_docker`: Minimal health check for Docker
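+
+Container health checks only need an exit status. The sketch below is a hypothetical stdlib-only probe that is not part of the server. It maps an HTTP 200 from one of these endpoints to exit code 0. Anything else, including timeouts and connection errors, maps to 1. That keeps the check independent of whether tools like `wget` are installed in the image.
+
+```python
+import urllib.request
+
+
+def probe(url: str, timeout: float = 5.0) -> int:
+    """Return 0 if `url` answers HTTP 200 within `timeout` seconds, else 1."""
+    try:
+        with urllib.request.urlopen(url, timeout=timeout) as resp:
+            return 0 if resp.status == 200 else 1
+    except OSError:
+        # URLError/HTTPError (4xx, 5xx), refused connections and timeouts are all OSError subclasses
+        return 1
+
+
+def main(argv: list[str]) -> int:
+    # Default to the minimal Docker endpoint on the API port used in this document
+    url = argv[1] if len(argv) > 1 else "http://localhost:5000/_health_docker"
+    return probe(url)
+```
+
+A script can call `sys.exit(main(sys.argv))` and serve as the command of a Docker `HEALTHCHECK` or Compose `healthcheck`.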
+
+**Kubernetes Probes**:
+
+```yaml
+livenessProbe:
+  httpGet:
+    path: /_health_docker
+    port: 5000
+  initialDelaySeconds: 30
+  periodSeconds: 10
+  timeoutSeconds: 5
+  failureThreshold: 3
+
+readinessProbe:
+  httpGet:
+    path: /_health_ro
+    port: 5000
+  initialDelaySeconds: 10
+  periodSeconds: 5
+  timeoutSeconds: 3
+  failureThreshold: 2
+```
+
+### Metrics
+
+**StatsD Metrics** (server):
+- `api.requests_total{endpoint,method,status}`
+- `api.request_duration_seconds{endpoint,method}`
+- `api.handled_errors_total{error_code}`
+- `api.unhandled_errors_total`
+- `api.lookup.searches.total`
+- `api.lookup.matches.total`
+- `new_submissions`
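+
+For reference, the plain StatsD line protocol is `name:value|type`, sent over UDP (port 8125 by convention). Plain StatsD has no label support. The `{endpoint,method,status}` dimensions listed above are therefore presumably encoded in the metric name or sent as tags by an extended client. The sketch below is a hypothetical minimal emitter that covers names and values only.
+
+```python
+import socket
+
+
+def statsd_line(name: str, value: float, metric_type: str) -> str:
+    """Format one StatsD line: 'c' = counter, 'ms' = timer, 'g' = gauge."""
+    if metric_type not in ("c", "ms", "g"):
+        raise ValueError(f"unsupported metric type: {metric_type}")
+    return f"{name}:{value:g}|{metric_type}"
+
+
+def send_metrics(lines: list[str], host: str = "127.0.0.1", port: int = 8125) -> None:
+    """Fire-and-forget UDP send; several lines share one newline-separated packet."""
+    payload = "\n".join(lines).encode("ascii")
+    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
+        sock.sendto(payload, (host, port))
+```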
+
+**Prometheus Metrics** (index):
+- `fpindex_search_duration_seconds`
+- `fpindex_insert_duration_seconds`
+- `fpindex_segment_count`
+- `fpindex_memory_segment_size_bytes`
+- `fpindex_file_segment_size_bytes`
+- `fpindex_merge_duration_seconds`
+
+### Logging
+
+**Log Levels**:
+- `DEBUG`: Detailed diagnostic information
+- `INFO`: General informational messages
+- `WARNING`: Warning messages
+- `ERROR`: Error messages
+- `CRITICAL`: Critical errors
+
+**Log Format**:
+```
+%(asctime)s [%(process)d] [%(levelname)s] %(name)s: %(message)s
+```
+
+**Environment Variables**:
+```bash
+ACOUSTID_LOGGING_LEVEL=INFO
+ACOUSTID_LOGGING_LEVEL_ACOUSTID=DEBUG
+ACOUSTID_LOGGING_LEVEL_SQLALCHEMY=WARNING
+```
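+
+One plausible reading of these variables is that `ACOUSTID_LOGGING_LEVEL` sets the root level and each `ACOUSTID_LOGGING_LEVEL_<NAME>` sets the level of the logger `<name>`. The sketch below applies that rule with the log format above. The suffix-to-logger mapping is an assumption, not confirmed server behavior.
+
+```python
+import logging
+import os
+from collections.abc import Mapping
+
+LOG_FORMAT = "%(asctime)s [%(process)d] [%(levelname)s] %(name)s: %(message)s"
+PREFIX = "ACOUSTID_LOGGING_LEVEL"
+
+
+def configure_logging(environ: Mapping[str, str] = os.environ) -> None:
+    """Apply the root level and per-logger levels from environment variables."""
+    logging.basicConfig(format=LOG_FORMAT)
+    for key, value in environ.items():
+        if key == PREFIX:
+            logging.getLogger().setLevel(value.upper())
+        elif key.startswith(PREFIX + "_"):
+            # Assumed mapping: the lowercased suffix is the logger name, e.g. _SQLALCHEMY -> "sqlalchemy"
+            logger_name = key[len(PREFIX) + 1:].lower()
+            logging.getLogger(logger_name).setLevel(value.upper())
+```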
+
+### Error Tracking
+
+**Sentry Integration**:
+
+```ini
+# acoustid.conf
+[sentry]
+dsn = https://...@sentry.io/...
+environment = production
+traces_sample_rate = 0.1
+```
+
+**Configuration**:
+```python
+import sentry_sdk
+from sentry_sdk.integrations.flask import FlaskIntegration
+
+# `config` is the parsed acoustid.conf, including the [sentry] section above
+sentry_sdk.init(
+    dsn=config.sentry.dsn,
+    environment=config.sentry.environment,
+    traces_sample_rate=config.sentry.traces_sample_rate,
+    integrations=[FlaskIntegration()]
+)
+```
+
+## Scaling Strategies
+
+### Horizontal Scaling
+
+**API/Web**:
+- Add more instances behind load balancer
+- No shared state (stateless)
+- Session data in Redis if needed
+
+**Workers**:
+- Add more instances
+- NATS distributes work automatically
+- No coordination required
+
+**Index**:
+- Shard by fingerprint ID
+- Consistent hashing for distribution
+- NATS for cluster coordination
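+
+The text does not specify the hash function or whether virtual nodes are used. The class below is therefore a generic consistent-hashing sketch that maps fingerprint IDs to index shards, using blake2b and 64 virtual nodes per shard. The shard names are hypothetical. When a shard is added, only the keys that land on the new shard's ring points move, and every other key keeps its shard.
+
+```python
+import bisect
+import hashlib
+
+
+class HashRing:
+    """Consistent-hash ring mapping fingerprint IDs to index shards."""
+
+    def __init__(self, shards: list[str], vnodes: int = 64):
+        self._ring: list[tuple[int, str]] = []
+        for shard in shards:
+            for i in range(vnodes):
+                # Several virtual nodes per shard smooth out the key distribution
+                self._ring.append((self._hash(f"{shard}#{i}"), shard))
+        self._ring.sort()
+        self._points = [point for point, _ in self._ring]
+
+    @staticmethod
+    def _hash(key: str) -> int:
+        return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")
+
+    def shard_for(self, fingerprint_id: int) -> str:
+        # First ring point clockwise from the key's hash, wrapping around at the end
+        idx = bisect.bisect(self._points, self._hash(str(fingerprint_id)))
+        return self._ring[idx % len(self._ring)][1]
+```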
+
+### Vertical Scaling
+
+**Database**:
+- Increase shared_buffers (25% of RAM)
+- Increase effective_cache_size (50-75% of RAM)
+- Add more CPU for parallel queries
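+
+Applied to the `postgresql.conf` shown earlier, these rules of thumb fit a 16 GB host: 25% of 16 GB gives `shared_buffers = 4GB`, and 75% gives `effective_cache_size = 12GB`. Note that `effective_cache_size` is only the planner's estimate of available OS and PostgreSQL cache, not an allocation. The hypothetical helper below makes the arithmetic explicit.
+
+```python
+def postgres_memory_settings(ram_gb: float, cache_fraction: float = 0.75) -> dict[str, str]:
+    """shared_buffers ~25% of RAM; effective_cache_size 50-75% of RAM."""
+    if not 0.5 <= cache_fraction <= 0.75:
+        raise ValueError("cache_fraction should be between 0.5 and 0.75")
+    return {
+        "shared_buffers": f"{ram_gb * 0.25:g}GB",
+        "effective_cache_size": f"{ram_gb * cache_fraction:g}GB",
+    }
+```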
+
+**Index**:
+- Increase thread count
+- Larger memory segment
+- Faster disk (NVMe)
+
+### Caching
+
+**Application-Level**:
+- API key cache (in-memory, 60s TTL)
+- Format lookup cache (permanent)
+- MBID existence cache (Redis, 1h TTL)
+
+**Database-Level**:
+- Connection pooling
+- Shared buffer cache for hot pages (PostgreSQL has no query result cache)
+- Materialized views
+
+## Backup and Disaster Recovery
+
+### Backup Strategy
+
+**PostgreSQL**:
+```bash
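+# Daily logical backup (custom format, restorable with pg_restore)
+pg_dump -Fc acoustid_app > acoustid_app_$(date +%Y%m%d).dump
+
+# Physical base backup: the starting point for WAL replay and PITR
+pg_basebackup -D /backup/base/$(date +%Y%m%d) -Ft -z
+
+# Continuous WAL archiving is set in postgresql.conf, not run from the shell:
+#   archive_mode = on
+#   archive_command = 'cp %p /backup/wal/%f'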
+```
+
+**Index**:
+```bash
+# Daily snapshot
+curl -X GET http://index:6081/fingerprints/_snapshot
+
+# Backup segment files
+rsync -av /var/lib/acoustid-index/ /backup/index/
+```
+
+**Redis**:
+```ini
+# redis.conf: RDB snapshots (automatic)
+save 900 1
+save 300 10
+save 60 10000
+
+# AOF (append-only file)
+appendonly yes
+appendfsync everysec
+```
+
+### Disaster Recovery
+
+**Recovery Time Objective (RTO)**: 1 hour  
+**Recovery Point Objective (RPO)**: 5 minutes
+
+**Recovery Steps**:
+1. Restore PostgreSQL from the latest base backup
+2. Replay WAL to point-in-time
+3. Restore Redis from RDB/AOF
+4. Restore index from snapshot
+5. Rebuild index from database if needed
+6. Restart all services
+7. Verify health checks
@@ -0,0 +1,617 @@
+# AcoustID System Evaluation
+
+## Executive Summary
+
+AcoustID is a mature, production-proven audio fingerprinting system that combines a Python-based web service with a purpose-built, Zig-based search index. The system has been running in production for over a decade, processing millions of fingerprint submissions and lookups. This evaluation assesses its strengths, weaknesses, integration potential, and relevance for metadata aggregation projects.
+
+## Strengths
+
+### 1. Open Source and Well-Licensed
+
+**Advantage**: Complete transparency and flexibility
+
+- **Server License**: MIT (permissive, commercial-friendly)
+- **Index License**: GPL-3.0 (copyleft, but separate service)
+- **Chromaprint**: LGPL-2.1 (can be used independently and linked dynamically from other applications)
+- **No Vendor Lock-in**: Full control over deployment and modifications
+
+**Impact**: Can be self-hosted, modified, or used as a reference implementation with few licensing constraints. The GPL license on the index is manageable because the index runs as a separate network service rather than being linked into client code.
+
+### 2. Production-Proven at Scale
+
+**Advantage**: Battle-tested reliability
+
+- **Years in Production**: 10+ years serving acoustid.org
+- **Database Size**: Millions of fingerprints and tracks
+- **Request Volume**: Handles high traffic with proven architecture
+- **Real-World Data**: Extensive test coverage from actual usage
+
+**Impact**: Low risk of fundamental design flaws. Known performance characteristics and scaling patterns.
+
+### 3. Advanced Index Technology
+
+**Advantage**: State-of-the-art search performance
+
+- **LSM-Tree Architecture**: Efficient for write-heavy workloads
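+- **SIMD Compression**: StreamVByte stores each 32-bit value in 1-4 bytes plus 2 control bits (at most about 3.2x smaller than raw integers) with minimal CPU overhead
+- **Low-Latency Search**: P50 latency around 5 ms
+- **Modern Language**: Zig offers manual memory management with optional runtime safety checks (bounds, integer overflow) and no garbage collector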
+
+**Impact**: The index is one of the most sophisticated open-source fingerprint search implementations available. Significantly faster than naive database-based approaches.
+
+### 4. MusicBrainz Integration
+
+**Advantage**: Direct access to comprehensive music metadata
+
+- **Direct Database Access**: No API rate limits or HTTP round-trips
+- **Rich Metadata**: Artist credits, releases, release groups, tracks
+- **MBID Mapping**: Links audio fingerprints to canonical music identifiers
+- **Redirect Resolution**: Handles merged entities automatically
+
+**Impact**: Provides a complete solution for audio identification with metadata enrichment. Eliminates need for separate metadata lookup infrastructure.
+
+### 5. Comprehensive API
+
+**Advantage**: Well-designed public API
+
+- **Multiple Endpoints**: Lookup, submit, status, user management
+- **Batch Operations**: Up to 20 fingerprints per request
+- **Flexible Metadata**: Configurable response detail levels
+- **Multiple Formats**: JSON, XML, JSONP support
+- **Rate Limiting**: Built-in protection against abuse
+
+**Impact**: Easy to integrate as a client. Can also serve as a reference for building similar APIs.
+
+### 6. Well-Structured Codebase
+
+**Advantage**: Maintainable and extensible
+
+- **Layered Architecture**: Clear separation of concerns
+- **Service Pattern**: Business logic isolated from presentation
+- **Type Hints**: Modern Python with type annotations
+- **Comprehensive Tests**: 24 test files with good coverage
+- **Documentation**: Inline comments and docstrings
+
+**Impact**: Easy to understand, modify, and extend. Low barrier to contribution or customization.
+
+### 7. Modern Infrastructure
+
+**Advantage**: Uses current best practices
+
+- **Docker Support**: Full containerization with multi-stage builds
+- **Docker Compose**: Complete local development environment
+- **CI/CD**: GitHub Actions for automated testing and deployment
+- **Async Support**: Migration to Starlette for async operations
+- **Message Queue**: NATS with JetStream for reliable async processing
+
+**Impact**: Easy to deploy and operate. Follows industry standards for cloud-native applications.
+
+## Weaknesses
+
+### 1. Complex Deployment Requirements
+
+**Disadvantage**: High operational overhead
+
+**Required Services**:
+- PostgreSQL 17.4 (4 separate databases)
+- Custom PostgreSQL extension (acoustid)
+- Redis (caching and rate limiting)
+- NATS with JetStream (message queue)
+- Zig-based index service
+- Multiple Python processes (API, web, worker, cron)
+
+**Minimum Resources**:
+- 10+ CPU cores
+- 11.5 GB RAM
+- 190 GB disk space
+
+**Impact**: Self-hosting requires significant infrastructure investment. Not suitable for small-scale deployments or embedded use cases. The custom PostgreSQL extension adds deployment complexity.
+
+### 2. Custom PostgreSQL Extension Required
+
+**Disadvantage**: Non-standard database setup
+
+- **C Extension**: acoustid extension must be compiled and installed
+- **Platform-Specific**: Requires PostgreSQL development headers
+- **Maintenance Burden**: Must be updated for new PostgreSQL versions
+- **Deployment Complexity**: Cannot use standard PostgreSQL images without modification
+
+**Impact**: Increases deployment complexity and maintenance burden. Limits hosting options, because most managed PostgreSQL services do not allow installing custom C extensions.
+
+### 3. Transitioning Codebase
+
+**Disadvantage**: Mixed old and new code
+
+**Transition Areas**:
+- Flask to Starlette (both frameworks present)
+- Legacy TCP index protocol to HTTP (both protocols supported)
+- Synchronous to asynchronous operations (mixed patterns)
+
+**Impact**: Code complexity from supporting both old and new approaches. Potential for bugs at transition boundaries. Documentation may be inconsistent.
+
+### 4. Legacy Code Paths
+
+**Disadvantage**: Technical debt
+
+**Legacy Components**:
+- Old API v1 endpoints (deprecated but still present)
+- TCP-based index client (being phased out)
+- Synchronous database operations (alongside async)
+- PUID support (MusicIP legacy)
+
+**Impact**: Increased codebase size and complexity. Potential security or performance issues in unmaintained code paths.
+
+### 5. Zig Index Maturity
+
+**Disadvantage**: Relatively new implementation
+
+- **Language Maturity**: Zig is pre-1.0 (the index Dockerfile pins 0.11.0)
+- **Ecosystem**: Limited third-party libraries
+- **Community**: Smaller than established languages
+- **Breaking Changes**: Zig language still evolving
+- **Debugging Tools**: Less mature than C/C++/Rust
+
+**Impact**: Potential for language-level breaking changes. Smaller pool of developers familiar with Zig. May require more effort to debug or extend.
+
+### 6. Limited Documentation
+
+**Disadvantage**: Steep learning curve
+
+**Documentation Gaps**:
+- No comprehensive architecture documentation (until this analysis)
+- Limited API examples beyond basic usage
+- Index protocol not formally documented
+- Deployment guide assumes Docker knowledge
+- No performance tuning guide
+
+**Impact**: Difficult for newcomers to understand system internals. Trial and error required for optimization and troubleshooting.
+
+### 7. Tight MusicBrainz Coupling
+
+**Disadvantage**: Assumes MusicBrainz availability
+
+- **Direct Database Dependency**: Requires MusicBrainz database replica
+- **Schema Coupling**: Queries specific MusicBrainz table structures
+- **No Abstraction**: MusicBrainz logic embedded throughout codebase
+- **Alternative Sources**: Difficult to use other metadata providers
+
+**Impact**: Cannot easily substitute alternative metadata sources. Requires maintaining MusicBrainz database replica for full functionality.
+
+## Integration Considerations
+
+### As a Public API Client
+
+**Recommendation**: Best approach for most use cases
+
+**Advantages**:
+- No infrastructure to maintain
+- Proven reliability (acoustid.org uptime)
+- Free for reasonable usage
+- Immediate availability
+
+**Disadvantages**:
+- Rate limits (3 req/s default, 10 req/s with API key)
+- Network latency
+- Dependency on external service
+- No control over data or features
+
+**Best For**:
+- Small to medium scale applications
+- Prototyping and development
+- Applications with intermittent fingerprinting needs
+- Projects without infrastructure budget
+
+**Implementation**:
+```python
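+import requests
+
+
+def lookup_fingerprint(fingerprint: str, duration: int, api_key: str) -> dict:
+    """Look up a Chromaprint fingerprint with the public AcoustID API."""
+    response = requests.post('https://api.acoustid.org/v2/lookup', data={
+        'client': api_key,
+        'duration': duration,
+        'fingerprint': fingerprint,
+        # Space-separated meta values (form-encoded as '+')
+        'meta': 'recordings releases',
+    }, timeout=10)
+    # AcoustID reports errors in the JSON body ("status": "error")
+    return response.json()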
+```
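+
+The public service is rate-limited, at 3 requests per second by default according to the list above. Batch jobs should therefore pace themselves on the client side. Below is a minimal sketch of a thread-safe throttle that spaces calls evenly; the class name and defaults are illustrative.
+
+```python
+import threading
+import time
+from typing import Callable
+
+
+class Throttle:
+    """Allow at most `rate` calls per second by spacing them evenly."""
+
+    def __init__(self, rate: float = 3.0,
+                 clock: Callable[[], float] = time.monotonic,
+                 sleep: Callable[[float], None] = time.sleep):
+        self.interval = 1.0 / rate
+        self.clock = clock
+        self.sleep = sleep
+        self._next_slot = 0.0
+        self._lock = threading.Lock()
+
+    def wait(self) -> None:
+        # Reserve the next free slot under the lock, then sleep outside it
+        with self._lock:
+            now = self.clock()
+            slot = max(now, self._next_slot)
+            self._next_slot = slot + self.interval
+        delay = slot - now
+        if delay > 0:
+            self.sleep(delay)
+```
+
+Call `throttle.wait()` immediately before each request, for example before each `lookup_fingerprint(...)` call. Spacing requests evenly instead of allowing bursts keeps a client under a per-second limit however the server counts its window.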
+
+### Self-Hosted Deployment
+
+**Recommendation**: Only for large-scale or specialized needs
+
+**Advantages**:
+- Full control over data and features
+- No rate limits
+- Low latency (local network)
+- Customization possible
+- Data privacy
+
+**Disadvantages**:
+- High infrastructure cost
+- Operational complexity
+- Maintenance burden
+- Requires expertise
+
+**Best For**:
+- Large-scale commercial applications
+- Privacy-sensitive use cases
+- Custom fingerprinting algorithms
+- Research and development
+
+**Minimum Viable Deployment**:
+```yaml
+# docker-compose.yml (simplified)
+services:
+  postgres:
+    image: ghcr.io/acoustid/postgresql:17.4
+    environment:
+      # The postgres image refuses to initialize without a superuser password
+      POSTGRES_PASSWORD: change-me
+    volumes:
+      - postgres_data:/var/lib/postgresql/data
+  
+  redis:
+    image: redis:7-alpine
+  
+  nats:
+    image: nats:2-alpine
+    command: -js
+  
+  index:
+    image: ghcr.io/acoustid/acoustid-index:latest
+    volumes:
+      - index_data:/var/lib/acoustid-index
+  
+  api:
+    image: ghcr.io/acoustid/acoustid-server:latest
+    command: run api
+    depends_on: [postgres, redis, nats, index]
+
+# Named volumes must be declared at the top level
+volumes:
+  postgres_data:
+  index_data:
+```
+
+### Chromaprint Library Only
+
+**Recommendation**: For custom fingerprinting without AcoustID infrastructure
+
+**Advantages**:
+- Minimal dependencies (just Chromaprint library)
+- Full control over fingerprint storage and matching
+- No network dependency
+- Lightweight
+
+**Disadvantages**:
+- Must implement own matching algorithm
+- No MusicBrainz integration
+- No existing fingerprint database
+- Higher development effort
+
+**Best For**:
+- Custom audio analysis applications
+- Offline fingerprinting
+- Embedded systems
+- Research projects
+
+**Implementation**:
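+```python
+import chromaprint  # ctypes binding shipped with pyacoustid
+
+# Generate fingerprint from raw PCM: sample_rate, num_channels and
+# pcm_bytes (16-bit signed samples) come from your audio decoder
+fingerprinter = chromaprint.Fingerprinter()
+fingerprinter.start(sample_rate, num_channels)
+fingerprinter.feed(pcm_bytes)
+fingerprint = fingerprinter.finish()  # compressed, base64-encoded fingerprint
+
+# Raw 32-bit hashes for custom matching
+raw_hashes, algorithm = chromaprint.decode_fingerprint(fingerprint)
+
+# Store and match fingerprints yourself
+# (requires custom implementation)
+```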
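
Without the AcoustID index, matching is up to you. A common starting point is to compare two raw hash arrays (for example from `fpcalc -raw`) by the fraction of differing bits after aligning them at the best offset. The sketch below is a brute-force illustration under that assumption, not the algorithm AcoustID's index uses; `max_offset` and `min_overlap` are hypothetical tuning parameters.

```python
def bit_error_rate(fp1: list[int], fp2: list[int], offset: int = 0) -> tuple[float, int]:
    """Compare fp1[i] with fp2[i + offset]; return (fraction of differing bits, overlap)."""
    pairs = [(a, fp2[i + offset]) for i, a in enumerate(fp1) if 0 <= i + offset < len(fp2)]
    if not pairs:
        return 1.0, 0
    # Mask to 32 bits so signed values from fpcalc -raw are handled too
    differing = sum(bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in pairs)
    return differing / (32 * len(pairs)), len(pairs)


def best_alignment(fp1, fp2, max_offset=80, min_overlap=20):
    """Try every offset in [-max_offset, max_offset]; return (lowest error rate, offset)."""
    best = (1.0, 0)
    for offset in range(-max_offset, max_offset + 1):
        ber, overlap = bit_error_rate(fp1, fp2, offset)
        # Ignore tiny overlaps, which can look similar by chance
        if overlap >= min_overlap and ber < best[0]:
            best = (ber, offset)
    return best
```

Unrelated audio lands near an error rate of 0.5, the same recording well below it. The cost is O(length × offsets), which is fine for comparing a handful of files but not for searching a library; that is the job an inverted index solves.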
+
+### Hybrid Approach
+
+**Recommendation**: Best of both worlds for growing applications
+
+**Strategy**:
+1. Start with public API for lookups
+2. Use Chromaprint library for fingerprint generation
+3. Store fingerprints locally for future use
+4. Migrate to self-hosted when scale justifies cost
+
+**Advantages**:
+- Low initial cost
+- Gradual migration path
+- Flexibility to optimize later
+- Reduced vendor lock-in
+
+**Implementation**:
+```python
+import acoustid  # pyacoustid
+
+class HybridFingerprintService:
+    def __init__(self, local_db, api_key):
+        # local_db: any object with search(fp) and store(fp, result) methods
+        self.local_db = local_db
+        self.api_key = api_key
+    
+    def identify(self, audio_file):
+        # Generate fingerprint locally (Chromaprint via pyacoustid)
+        duration, fingerprint = acoustid.fingerprint_file(audio_file)
+        
+        # Check local database first (exact fingerprint match only)
+        match = self.local_db.search(fingerprint)
+        if match:
+            return match
+        
+        # Fall back to the AcoustID API
+        result = acoustid.lookup(self.api_key, fingerprint, duration)
+        
+        # Cache successful, non-empty results locally
+        if result.get('status') == 'ok' and result.get('results'):
+            self.local_db.store(fingerprint, result)
+        
+        return result
+```
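
The local store used above needs only `search()` and `store()`. One minimal, hypothetical backing is a SQLite table keyed by the fingerprint string that keeps the JSON lookup result. Because the key is the exact fingerprint, this works as a cache for files already looked up, not as a similarity search.

```python
import json
import sqlite3


class LocalFingerprintDB:
    """Hypothetical exact-match cache of AcoustID lookup results."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS lookups ("
            " fingerprint TEXT PRIMARY KEY,"
            " result TEXT NOT NULL)"
        )

    def search(self, fingerprint):
        """Return the cached lookup result for this exact fingerprint, or None."""
        row = self.conn.execute(
            "SELECT result FROM lookups WHERE fingerprint = ?", (fingerprint,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def store(self, fingerprint, result) -> None:
        """Insert or overwrite the cached result; the with-block commits it."""
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO lookups (fingerprint, result) VALUES (?, ?)",
                (fingerprint, json.dumps(result)),
            )
```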
+
+## Relevance for Metadata Aggregation
+
+### High Relevance Scenarios
+
+**1. Audio File Identification**
+
+AcoustID excels at identifying audio files without metadata:
+
+- **Use Case**: User uploads audio file with missing tags
+- **Solution**: Generate fingerprint, lookup via AcoustID, retrieve MBIDs
+- **Benefit**: Accurate identification even with transcoding or quality differences
+
+**2. Duplicate Detection**
+
+Fingerprints enable perceptual duplicate detection:
+
+- **Use Case**: Detect duplicate tracks in large music library
+- **Solution**: Fingerprint all tracks, compare for similarity
+- **Benefit**: Finds duplicates even with different encodings or slight edits
+
+**3. MBID Enrichment**
+
+Links audio files to canonical MusicBrainz identifiers:
+
+- **Use Case**: Enrich audio metadata with MusicBrainz data
+- **Solution**: Fingerprint -> AcoustID -> MBID -> MusicBrainz metadata
+- **Benefit**: Access to comprehensive, community-maintained metadata
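
The last hop, MBID to MusicBrainz metadata, can use the public MusicBrainz web service (`/ws/2/recording/<mbid>` with `inc=` subqueries and `fmt=json`), which asks clients to send a descriptive User-Agent and to stay at about one request per second. The sketch below builds that request with the standard library; the User-Agent string and the helper names are placeholders.

```python
import json
import urllib.parse
import urllib.request

MB_WS = "https://musicbrainz.org/ws/2"
USER_AGENT = "ExampleTagger/0.1 (contact@example.com)"  # placeholder: identify your app


def recording_url(mbid: str, includes=("artists", "releases")) -> str:
    """Build a MusicBrainz WS/2 recording lookup URL."""
    # urlencode turns the space-separated list into 'artists+releases'
    query = urllib.parse.urlencode({"inc": " ".join(includes), "fmt": "json"})
    return f"{MB_WS}/recording/{mbid}?{query}"


def fetch_recording(mbid: str) -> dict:
    """Fetch recording metadata (network call; respect MusicBrainz rate limits)."""
    request = urllib.request.Request(recording_url(mbid), headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```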
+
+**4. Quality Verification**
+
+Verify metadata accuracy:
+
+- **Use Case**: Check if file metadata matches actual audio content
+- **Solution**: Compare fingerprint-based identification with existing tags
+- **Benefit**: Detect mislabeled or corrupted files
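
Comparing the fingerprint-based match with existing tags needs some tolerance for case, spacing and small spelling differences. Below is a sketch of a hypothetical helper built on Python's `difflib.SequenceMatcher` after Unicode normalization; the 0.85 threshold is an arbitrary assumption to tune on real data.

```python
import unicodedata
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Apply NFKC normalization, casefold and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def tags_match(tag_title, tag_artist, matched_title, matched_artist, threshold=0.85) -> bool:
    """True if both the title and artist tags are close to the fingerprint match."""
    return min(similarity(tag_title, matched_title),
               similarity(tag_artist, matched_artist)) >= threshold
```

Files where `tags_match` returns `False` are candidates for review rather than automatic retagging, since the fingerprint match itself can be wrong for low scores.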
+
+### Medium Relevance Scenarios
+
+**5. Playlist Generation**
+
+Acoustic similarity for recommendations:
+
+- **Use Case**: Generate playlists of similar-sounding tracks
+- **Solution**: Compare fingerprints for acoustic similarity
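+- **Benefit**: Recommendations based on actual audio, not just metadata
+- **Caveat**: Chromaprint fingerprints are designed to recognize the same recording, not to measure how alike two different songs sound, so this is limited to near-identical audio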
+
+**6. Copyright Detection**
+
+Identify copyrighted content:
+
+- **Use Case**: Detect copyrighted music in user uploads
+- **Solution**: Fingerprint uploads, match against known copyrighted works
+- **Benefit**: Automated content moderation
+
+### Low Relevance Scenarios
+
+**7. Real-Time Audio Recognition**
+
+AcoustID is not optimized for real-time use:
+
+- **Limitation**: Requires full audio file or significant portion
+- **Alternative**: Shazam-style services designed for short audio snippets
+- **Workaround**: Use Chromaprint with custom matching for real-time needs
+
+**8. Music Recommendation**
+
+Limited to acoustic similarity:
+
+- **Limitation**: No semantic understanding of music (genre, mood, etc.)
+- **Alternative**: Dedicated recommendation engines (Spotify API, Last.fm)
+- **Workaround**: Combine with metadata-based recommendation
+
+## Comparison with Alternatives
+
+### vs. Shazam/ACRCloud (Commercial)
+
+| Feature | AcoustID | Shazam/ACRCloud |
+|---------|----------|-----------------|
+| License | Open source (MIT/GPL) | Proprietary |
+| Cost | Free self-hosting; API free for non-commercial use | Paid API |
+| Database Size | Community-driven | Commercial catalog |
+| Real-Time | No | Yes |
+| Accuracy | High | Very high |
+| Customization | Full | Limited |
+
+**Verdict**: AcoustID is better suited to self-hosted, customizable solutions; Shazam is better for real-time recognition and commercial catalog coverage.
+
+### vs. Echoprint (Open Source)
+
+| Feature | AcoustID | Echoprint |
+|---------|----------|-----------|
+| Maintenance | Active | Abandoned (2014) |
+| Index Technology | Modern (LSM-tree, SIMD) | Legacy |
+| Language | Python + Zig | Python + C++ |
+| MusicBrainz | Integrated | No |
+| Community | Active | Dead |
+
+**Verdict**: AcoustID is the clear winner. Echoprint is no longer maintained.
+
+### vs. Chromaprint Alone
+
+| Feature | AcoustID | Chromaprint Only |
+|---------|----------|------------------|
+| Fingerprint Generation | Yes | Yes |
+| Fingerprint Matching | Yes | No (DIY) |
+| Metadata | MusicBrainz | No |
+| Infrastructure | Required | Minimal |
+| Development Effort | Low | High |
+
+**Verdict**: AcoustID provides a complete solution. Chromaprint alone requires significant custom development.
+
+## Recommendations
+
+### For Small Projects (< 10k lookups/month)
+
+**Recommendation**: Use public AcoustID API
+
+**Rationale**:
+- Free tier sufficient
+- No infrastructure cost
+- Immediate availability
+- Proven reliability
+
+**Implementation**:
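+```python
+# Simple integration with pyacoustid (needs the Chromaprint library or fpcalc)
+import acoustid
+
+api_key = 'YOUR_API_KEY'
+audio_file = 'song.mp3'
+
+results = acoustid.match(api_key, audio_file)
+for score, recording_id, title, artist in results:
+    print(f"{title} by {artist} (score: {score:.2f})")
+```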
+
+### For Medium Projects (10k-1M lookups/month)
+
+**Recommendation**: Hybrid approach
+
+**Rationale**:
+- Public API for initial lookups
+- Local caching for repeated queries
+- Gradual migration path to self-hosted
+- Cost-effective scaling
+
+**Implementation**:
+- Use public API with caching layer
+- Store fingerprints locally
+- Monitor usage and costs
+- Migrate to self-hosted when justified
+
+### For Large Projects (> 1M lookups/month)
+
+**Recommendation**: Self-hosted deployment
+
+**Rationale**:
+- Cost savings at scale
+- Full control and customization
+- Low latency
+- No rate limits
+
+**Implementation**:
+- Deploy full stack (PostgreSQL, Redis, NATS, Index, API)
+- Import existing fingerprint database
+- Implement monitoring and alerting
+- Plan for high availability
+
+### For Research Projects
+
+**Recommendation**: Chromaprint library + custom matching
+
+**Rationale**:
+- Full control over algorithms
+- No external dependencies
+- Flexibility for experimentation
+- Academic freedom
+
+**Implementation**:
+- Use Chromaprint for fingerprint generation
+- Implement custom similarity metrics
+- Experiment with index structures
+- Publish findings
+
+### For Privacy-Sensitive Applications
+
+**Recommendation**: Self-hosted deployment
+
+**Rationale**:
+- No data sent to third parties
+- Full control over data retention
+- Compliance with privacy regulations
+- Audit trail
+
+**Implementation**:
+- Deploy on-premises or private cloud
+- Implement access controls
+- Enable audit logging
+- Regular security updates
+
+## Future Considerations
+
+### Potential Improvements
+
+**1. Simplified Deployment**
+
+- Single-binary deployment option
+- Embedded database (SQLite) for small-scale use
+- Optional components (make MusicBrainz integration optional)
+
+**2. Better Documentation**
+
+- Architecture guide (this document is a start)
+- Performance tuning guide
+- Troubleshooting guide
+- Video tutorials
+
+**3. Alternative Metadata Sources**
+
+- Plugin system for metadata providers
+- Support for Discogs, Spotify, etc.
+- Configurable metadata priority
+
+**4. Enhanced API**
+
+- GraphQL endpoint
+- WebSocket for real-time updates
+- Bulk operations API
+- Admin API for self-hosted instances
+
+**5. Index Improvements**
+
+- Distributed index with automatic sharding
+- Replication for high availability
+- Incremental backups
+- Query result caching
+
+### Technology Evolution
+
+**Zig Maturity**:
+- Monitor Zig 1.0 release
+- Evaluate stability and ecosystem growth
+- Consider Rust alternative if Zig adoption stalls
+
+**Async Migration**:
+- Complete Flask to Starlette transition
+- Remove legacy synchronous code paths
+- Optimize for async/await patterns
+
+**Cloud-Native**:
+- Kubernetes deployment manifests
+- Helm charts
+- Operator for automated management
+- Service mesh integration
+
+## Conclusion
+
+AcoustID is a **highly capable, production-ready audio fingerprinting system** with significant strengths in accuracy, performance, and MusicBrainz integration. The open-source license and mature codebase make it an excellent choice for projects requiring audio identification.
+
+**Key Takeaways**:
+
+1. **Use the public API** for most small to medium projects
+2. **Self-host only when scale justifies** the operational complexity
+3. **Chromaprint library alone** is viable for custom implementations
+4. **MusicBrainz integration** is a major value-add for metadata enrichment
+5. **Deployment complexity** is the main barrier to adoption
+
+**Overall Assessment**: **Highly Recommended** for metadata aggregation projects that need audio fingerprinting, with the caveat that self-hosting requires significant infrastructure investment.
+
+**Rating**: 8.5/10
+
+**Strengths**: Production-proven, open source, excellent MusicBrainz integration, modern index technology  
+**Weaknesses**: Complex deployment, custom PostgreSQL extension, transitioning codebase  
+**Best Use Case**: Audio file identification and MBID enrichment via public API or self-hosted deployment at scale
@@ -0,0 +1,768 @@
+# AcoustID Integrations
+
+## Overview
+
+AcoustID integrates with multiple external services and libraries to provide comprehensive audio fingerprinting and metadata enrichment. The system's architecture separates concerns between fingerprint generation (Chromaprint), fingerprint indexing (acoustid-index), metadata enrichment (MusicBrainz), and supporting infrastructure (Redis, NATS).
+
+## MusicBrainz Integration
+
+### Connection Method
+
+**Type**: Direct PostgreSQL database connection (NOT REST API)  
+**Database**: `musicbrainz` (read-only replica)  
+**Access**: Separate database connection pool
+
+**Configuration** (`acoustid.conf`):
+```ini
+[musicbrainz]
+host = musicbrainz-db.example.com
+port = 5432
+name = musicbrainz_db
+user = acoustid_readonly
+password_file = /run/secrets/mb_password
+```
+
+**File**: `acoustid/data/musicbrainz.py`
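
The `password_file` key keeps the secret out of the config file itself. Below is a sketch of a hypothetical loader for the `[musicbrainz]` section using `configparser`, building a PostgreSQL URL; the URL shape and quoting are assumptions, not AcoustID's own code.

```python
import configparser
from pathlib import Path
from urllib.parse import quote


def musicbrainz_db_url(config_path: str) -> str:
    """Build a PostgreSQL URL from the [musicbrainz] section of an INI file."""
    # interpolation=None so '%' characters in values are taken literally
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_path):
        raise FileNotFoundError(config_path)
    section = parser["musicbrainz"]
    # Read the secret from its own file, dropping the trailing newline
    password = Path(section["password_file"]).read_text().strip()
    return "postgresql://{user}:{password}@{host}:{port}/{name}".format(
        user=quote(section["user"], safe=""),
        password=quote(password, safe=""),  # escape '/', '@', ':' etc.
        host=section["host"],
        port=section.getint("port", fallback=5432),
        name=section["name"],
    )
```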
+
+### Queried Tables
+
+The integration queries the following MusicBrainz tables directly:
+
+| Table | Purpose | Columns Used |
+|-------|---------|--------------|
+| `artist_credit` | Artist information | `id`, `name`, `artist_count` |
+| `artist_credit_name` | Artist credit details | `artist_credit`, `position`, `artist`, `name`, `join_phrase` |
+| `artist` | Artist entities | `id`, `gid`, `name`, `sort_name` |
+| `recording` | Recording metadata | `id`, `gid`, `name`, `length`, `artist_credit`, `comment` |
+| `release` | Release information | `id`, `gid`, `name`, `artist_credit`, `release_group`, `status`, `packaging`, `barcode` |
+| `release_group` | Release group data | `id`, `gid`, `name`, `artist_credit`, `type`, `comment` |
+| `track` | Track listings | `id`, `gid`, `recording`, `medium`, `position`, `number`, `name`, `length`, `artist_credit` |
+| `medium` | Medium information | `id`, `release`, `position`, `format`, `track_count` |
+| `release_country` | Release countries | `release`, `country` (area ID), `date_year`, `date_month`, `date_day` |
+
+### Query Patterns
+
+**Fetch Recording by MBID**:
+
+```python
+from sqlalchemy import text
+
+def get_recording_by_mbid(db, mbid):
+    """Fetch recording with artist credits and releases."""
+    query = text("""
+        SELECT 
+            r.gid AS recording_mbid,
+            r.name AS recording_title,
+            r.length AS duration,  -- milliseconds
+            ac.name AS artist_credit_name,
+            -- FILTER drops the NULL produced when the recording has no tracks
+            array_agg(DISTINCT rel.gid) FILTER (WHERE rel.gid IS NOT NULL) AS release_mbids
+        FROM recording r
+        JOIN artist_credit ac ON r.artist_credit = ac.id
+        LEFT JOIN track t ON t.recording = r.id
+        LEFT JOIN medium m ON t.medium = m.id
+        LEFT JOIN release rel ON m.release = rel.id
+        WHERE r.gid = :mbid
+        GROUP BY r.gid, r.name, r.length, ac.name
+    """)
+    return db.execute(query, {'mbid': mbid}).fetchone()
+```
+
+**Fetch Release with Tracks**:
+
+```python
+from sqlalchemy import text
+
+def get_release_with_tracks(db, release_mbid):
+    """Fetch complete release with all tracks."""
+    query = text("""
+        SELECT 
+            rel.gid AS release_mbid,
+            rel.name AS release_title,
+            rel.barcode,
+            iso.code AS country,
+            rc.date_year,
+            rc.date_month,
+            rc.date_day,
+            m.position AS medium_position,
+            m.format AS medium_format,
+            t.position AS track_position,
+            t.number AS track_number,
+            t.name AS track_title,
+            rec.gid AS recording_mbid,
+            ac.name AS artist_credit
+        FROM release rel
+        LEFT JOIN release_country rc ON rel.id = rc.release
+        -- release_country.country is an area ID; map it to an ISO 3166-1 code
+        LEFT JOIN iso_3166_1 iso ON iso.area = rc.country
+        LEFT JOIN medium m ON rel.id = m.release
+        LEFT JOIN track t ON m.id = t.medium
+        LEFT JOIN recording rec ON t.recording = rec.id
+        LEFT JOIN artist_credit ac ON rec.artist_credit = ac.id
+        WHERE rel.gid = :mbid
+        ORDER BY m.position, t.position
+    """)
+    return db.execute(query, {'mbid': release_mbid}).fetchall()
+```
+
+**Fetch Artist Credits**:
+
+```python
+from sqlalchemy import text
+
+def get_artist_credit(db, artist_credit_id):
+    """Fetch artist credit with all artists."""
+    query = text("""
+        SELECT 
+            acn.position,
+            a.gid AS artist_mbid,
+            a.name AS artist_name,
+            a.sort_name AS artist_sort_name,
+            acn.name AS credited_name,
+            acn.join_phrase
+        FROM artist_credit_name acn
+        JOIN artist a ON acn.artist = a.id
+        WHERE acn.artist_credit = :ac_id
+        ORDER BY acn.position
+    """)
+    return db.execute(query, {'ac_id': artist_credit_id}).fetchall()
+```
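
Those rows are enough to rebuild the display string: MusicBrainz renders an artist credit by concatenating each credited name followed by its join phrase, in `position` order. A small sketch, assuming each row is a mapping with the column aliases used in the query above:

```python
def format_artist_credit(rows) -> str:
    """Join credited names and join phrases in position order."""
    ordered = sorted(rows, key=lambda row: row["position"])
    # The last artist normally has an empty join phrase; treat NULL the same way
    return "".join(row["credited_name"] + (row["join_phrase"] or "") for row in ordered)
```

With SQLAlchemy `Row` objects, pass `[row._mapping for row in rows]`.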
+
+### MBID Redirect Resolution
+
+MusicBrainz uses MBID redirects when entities are merged. AcoustID resolves these automatically.
+
+**File**: `acoustid/data/musicbrainz.py`
+
+```python
+from sqlalchemy import text
+
+def resolve_recording_mbid(db, mbid):
+    """Resolve a recording MBID that was merged away to the current MBID."""
+    # recording_gid_redirect.new_id references recording.id (an integer),
+    # so join back to recording to obtain the current MBID
+    query = text("""
+        SELECT r.gid
+        FROM recording_gid_redirect rd
+        JOIN recording r ON r.id = rd.new_id
+        WHERE rd.gid = :mbid
+    """)
+    row = db.execute(query, {'mbid': mbid}).fetchone()
+    return str(row.gid) if row else mbid
+```
+
+**Redirect Tables Used**:
+- `recording_gid_redirect`
+- `release_gid_redirect`
+- `release_group_gid_redirect`
+- `artist_gid_redirect`
+
+### Metadata Enrichment
+
+When a lookup request includes metadata flags, AcoustID fetches additional data from MusicBrainz:
+
+**Metadata Levels**:
+
+| Flag | Data Fetched | Query Complexity |
+|------|--------------|------------------|
+| `recordingids` | Recording MBIDs only | Low (join only) |
+| `recordings` | Full recording metadata | Medium (artist credits) |
+| `releaseids` | Release MBIDs only | Low (join only) |
+| `releases` | Full release metadata | High (tracks, mediums, countries) |
+| `releasegroupids` | Release group MBIDs only | Low (join only) |
+| `releasegroups` | Full release group metadata | Medium (artist credits) |
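
Since query cost grows with each flag, it pays to request only what is needed. Below is a sketch of a hypothetical helper that validates flags against the table above (the API accepts further flags not listed here) and joins them with spaces, which a URL query string would show as `+`.

```python
ALLOWED_META = {
    "recordingids", "recordings",
    "releaseids", "releases",
    "releasegroupids", "releasegroups",
}


def build_meta(flags) -> str:
    """Validate metadata flags and return them as one space-separated value."""
    unknown = set(flags) - ALLOWED_META
    if unknown:
        raise ValueError(f"unknown meta flags: {sorted(unknown)}")
    # dict.fromkeys drops duplicates while keeping the caller's order
    return " ".join(dict.fromkeys(flags))
```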
+
+**Example Enriched Response** (one entry of the `results` array; `duration` is in seconds):
+
+```json
+{
+  "recordings": [
+    {
+      "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
+      "title": "Example Song",
+      "duration": 240,
+      "artists": [
+        {
+          "id": "12345678-90ab-cdef-1234-567890abcdef",
+          "name": "Example Artist",
+          "joinphrase": " & "
+        }
+      ],
+      "releases": [
+        {
+          "id": "abcdef12-3456-7890-abcd-ef1234567890",
+          "title": "Example Album",
+          "country": "US",
+          "date": {
+            "year": 2020,
+            "month": 5,
+            "day": 15
+          },
+          "track_count": 12,
+          "medium_count": 1,
+          "releasegroup": {
+            "id": "fedcba98-7654-3210-fedc-ba9876543210",
+            "type": "Album"
+          }
+        }
+      ]
+    }
+  ]
+}
+```
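
Consumers usually flatten this nested structure. The sketch below walks the `recordings` of one result object like the one above, then each recording's `releases`, and yields one row per release. It formats the partial `date` object (any of `year`, `month`, `day` may be missing) as an ISO-style string. Field names follow the example; treat them as assumptions for other `meta` combinations.

```python
def release_rows(result: dict):
    """Yield (recording_id, recording_title, release_id, release_title, date, country)."""
    for recording in result.get("recordings", []):
        for release in recording.get("releases", []):
            date = release.get("date") or {}
            # Build YYYY, YYYY-MM or YYYY-MM-DD depending on what is present
            parts = [f"{date['year']:04d}"] if "year" in date else []
            if parts and "month" in date:
                parts.append(f"{date['month']:02d}")
                if "day" in date:
                    parts.append(f"{date['day']:02d}")
            yield (
                recording["id"],
                recording.get("title"),
                release["id"],
                release.get("title"),
                "-".join(parts) or None,
                release.get("country"),
            )
```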
+
+### Performance Considerations
+
+**Connection Pooling**:
+- Separate pool for MusicBrainz database
+- Pool size: 10 connections (configurable)
+- Pool recycle: 3600 seconds
+
+**Query Optimization**:
+- Indexes on `gid` columns (MusicBrainz maintains these)
+- Batch queries when possible
+- Limit joins to requested metadata only
+
+**Caching**:
+- Unknown MBID cache (Redis, 1 hour TTL)
+- Avoids repeated queries for non-existent MBIDs
+
+**Fallback**:
+- If MusicBrainz database unavailable, return AcoustID data only
+- Graceful degradation (no metadata enrichment)
+
+## Chromaprint Integration
+
+### Library Information
+
+**Name**: Chromaprint  
+**Version**: Built from source (commit `41a3e8fb`)  
+**License**: MIT  
+**Language**: C++  
+**Wrapper**: acoustid-ext (C extension for Python)
+
+**Repository**: https://github.com/acoustid/chromaprint
+
+### Build Process
+
+**Dockerfile** (`docker/Dockerfile`):
+
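+```dockerfile
+# Stage 1: Build Chromaprint
+FROM ubuntu:24.04 AS chromaprint-build
+
+RUN apt-get update && apt-get install -y \
+    git cmake build-essential libfftw3-dev
+
+WORKDIR /build
+RUN git clone https://github.com/acoustid/chromaprint.git && \
+    cd chromaprint && \
+    git checkout 41a3e8fb && \
+    cmake -DCMAKE_BUILD_TYPE=Release . && \
+    make && \
+    make install
+
+# Stage 2: Build acoustid-ext
+FROM ubuntu:24.04 AS builder
+
+# Python toolchain, plus the FFTW runtime (assuming Chromaprint was built against FFTW)
+RUN apt-get update && apt-get install -y \
+    python3 python3-dev python3-venv build-essential libfftw3-double3
+
+COPY --from=chromaprint-build /usr/local/lib/libchromaprint.so* /usr/local/lib/
+COPY --from=chromaprint-build /usr/local/include/chromaprint.h /usr/local/include/
+RUN ldconfig
+
+# Ubuntu 24.04 marks the system Python as externally managed (PEP 668),
+# so install into a virtual environment instead of calling pip directly
+RUN python3 -m venv /opt/venv && \
+    /opt/venv/bin/pip install acoustid-ext
+```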
+
+### Python Extension (acoustid-ext)
+
+**Package**: `acoustid-ext`  
+**File**: `acoustid/fingerprint.py`
+
+**Functions Exposed**:
+
+```python
+from acoustid_ext import (
+    decode_fingerprint,
+    encode_fingerprint,
+    compress_fingerprint,
+    decompress_fingerprint,
+    fingerprint_compare
+)
+```
+
+**Function Signatures**:
+
+| Function | Input | Output | Purpose |
+|----------|-------|--------|---------|
+| `decode_fingerprint(data)` | bytes/str | list[int] | Decode base64/compressed fingerprint |
+| `encode_fingerprint(hashes)` | list[int] | str | Encode fingerprint to base64 |
+| `compress_fingerprint(hashes)` | list[int] | bytes | Compress fingerprint (zstd) |
+| `decompress_fingerprint(data)` | bytes | list[int] | Decompress fingerprint |
+| `fingerprint_compare(fp1, fp2)` | list[int], list[int] | float | Compare similarity (0.0-1.0) |
+
+### Fingerprint Format
+
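+**Raw Format** (Chromaprint output):
+- Array of 32-bit unsigned integers
+- Each integer represents a hash of audio features
+- Typical length: roughly 8 hashes per second of audio; `fpcalc` processes the first 120 seconds by default, which yields close to 1,000 hashes
+
+**Compressed Format** (for transmission):
+- Base64-encoded compressed data (URL-safe alphabet, no padding)
+- Compression: Chromaprint's own delta and bit-packing scheme in API requests (the table above lists zstd for `compress_fingerprint`)
+- Typical size: a few kilobytes of base64 text for a 120-second fingerprint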
+
+**Example**:
+```python
+# Raw fingerprint
+fingerprint = [123456789, 987654321, 456789123, ...]
+
+# Encoded (base64)
+encoded = "AQADtNGiJEqUHUemR..."
+
+# Compressed (bytes)
+compressed = b'\x28\xb5\x2f\xfd...'
+```
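
The first bytes of a compressed fingerprint form a small header. Assuming Chromaprint's layout (one byte for the algorithm version, then a 24-bit big-endian count of hashes, all in URL-safe base64 without padding), the hash count can be read without decompressing the rest. The helper name is hypothetical.

```python
import base64


def fingerprint_header(encoded: str) -> tuple[int, int]:
    """Return (algorithm, number_of_hashes) from a compressed fingerprint string."""
    # Restore the '=' padding that Chromaprint omits, then decode URL-safe base64
    padded = encoded + "=" * (-len(encoded) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) < 4:
        raise ValueError("fingerprint too short")
    algorithm = raw[0]
    num_hashes = int.from_bytes(raw[1:4], "big")
    return algorithm, num_hashes
```

Applied to the first eight characters of the encoded example above (`AQADtNGi`), it reports algorithm 1 and 948 hashes, which matches the roughly 8 hashes per second for a 120-second fingerprint.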
+
+### Query Extraction
+
+**File**: `acoustid/fingerprint.py`
+
+```python
+def extract_query(fingerprint, max_terms=100):
+    """Extract query terms from fingerprint for index search.
+    
+    Args:
+        fingerprint: List of 32-bit hash integers
+        max_terms: Maximum number of terms to extract
+        
+    Returns:
+        List of term IDs (subset of fingerprint hashes)
+    """
+    # Placeholder selection: the first max_terms distinct hashes, in order.
+    # The production term-selection logic is not reproduced here.
+    terms = list(dict.fromkeys(fingerprint))[:max_terms]
+    return terms
+```
+
+**Query Strategy**:
+- Extract subset of hashes (typically 50-100 terms)
+- Prioritize discriminative hashes (high entropy)
+- Balance between precision and recall
+
+### Fingerprint Comparison
+
+**PostgreSQL Function** (custom extension):
+
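+```sql
+-- Simplified illustration: set-based Jaccard similarity of two hash arrays.
+-- (The real AcoustID extension implements its comparison in C.)
+CREATE FUNCTION acoustid_compare(fp1 INTEGER[], fp2 INTEGER[]) 
+RETURNS FLOAT AS $$
+    SELECT COALESCE(
+        (SELECT count(*) FROM (SELECT unnest(fp1) INTERSECT SELECT unnest(fp2)) AS i)::FLOAT
+        / NULLIF((SELECT count(*) FROM (SELECT unnest(fp1) UNION SELECT unnest(fp2)) AS u), 0),
+        0.0)
+$$ LANGUAGE SQL IMMUTABLE;
+```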
+
+**Python Implementation**:
+
+```python
+def compare_fingerprints(fp1, fp2):
+    """Calculate similarity between two fingerprints.
+    
+    Returns:
+        Float between 0.0 (no match) and 1.0 (identical)
+    """
+    set1 = set(fp1)
+    set2 = set(fp2)
+    intersection = len(set1 & set2)
+    union = len(set1 | set2)
+    return intersection / union if union > 0 else 0.0
+```
+
+## AcoustID Index Integration
+
+### Client Implementations
+
+AcoustID server has two index client implementations:
+
+#### Legacy TCP Client (indexclient.py)
+
+**Status**: Deprecated, being phased out  
+**Protocol**: Custom binary over TCP  
+**Port**: 6080 (default)
+
+**File**: `acoustid/indexclient.py`
+
+```python
+from queue import Queue
+
+class IndexClientPool:
+    """Connection pool for legacy TCP index."""
+    
+    def __init__(self, host, port, pool_size=10):
+        self.host = host
+        self.port = port
+        # Holds connected client objects; connection setup and the
+        # client's send/receive helpers are omitted from this excerpt
+        self.pool = Queue(maxsize=pool_size)
+        
+    def search(self, fingerprint, limit=10):
+        """Search index for similar fingerprints."""
+        client = self.pool.get()  # blocks until a connection is free
+        try:
+            # Send search command
+            client.send_command(CMD_SEARCH, {
+                'fingerprint': fingerprint,
+                'limit': limit
+            })
+            # Receive results
+            results = client.receive_response()
+            return results
+        finally:
+            # Always return the connection to the pool
+            self.pool.put(client)
+```
+
+**Message Format**:
+```
+┌────────────┬─────────┬──────────────────┐
+│ Length (4B)│ Cmd (1B)│ Payload (msgpack)│
+└────────────┴─────────┴──────────────────┘
+```
+
+#### Modern HTTP Client (fpstore.py)
+
+**Status**: Current, recommended  
+**Protocol**: HTTP/1.1 with MessagePack  
+**Port**: 6081 (default)
+
+**File**: `acoustid/fpstore.py`
+
+```python
+import aiohttp
+import msgspec
+
+class FingerprintIndexClient:
+    """Async HTTP client for fingerprint index."""
+    
+    def __init__(self, base_url, index_name='fingerprints', session=None):
+        self.base_url = base_url
+        self.index_name = index_name
+        # aiohttp sessions should be created inside a running event loop,
+        # so accept one from the caller or create it lazily on first use
+        self._session = session
+
+    @property
+    def session(self):
+        if self._session is None:
+            self._session = aiohttp.ClientSession()
+        return self._session
+
+    async def close(self):
+        """Close the underlying HTTP session."""
+        if self._session is not None:
+            await self._session.close()
+        
+    async def search(self, query_terms, limit=10, min_score=0.5):
+        """Search index for matching fingerprints.
+        
+        Args:
+            query_terms: List of hash integers
+            limit: Maximum results to return
+            min_score: Minimum similarity score
+            
+        Returns:
+            List of (fingerprint_id, score) tuples
+        """
+        url = f"{self.base_url}/{self.index_name}/_search"
+        payload = msgspec.msgpack.encode({
+            'query': query_terms,
+            'limit': limit,
+            'min_score': min_score
+        })
+        
+        async with self.session.post(url, data=payload) as resp:
+            resp.raise_for_status()
+            data = await resp.read()
+            result = msgspec.msgpack.decode(data)
+            return [(r['id'], r['score']) for r in result['results']]
+    
+    async def insert(self, fingerprint_id, terms):
+        """Insert or update fingerprint in index."""
+        url = f"{self.base_url}/{self.index_name}/{fingerprint_id}"
+        payload = msgspec.msgpack.encode({'terms': terms})
+        
+        async with self.session.put(url, data=payload) as resp:
+            return resp.status == 200
+    
+    async def delete(self, fingerprint_id):
+        """Delete fingerprint from index."""
+        url = f"{self.base_url}/{self.index_name}/{fingerprint_id}"
+        async with self.session.delete(url) as resp:
+            return resp.status == 200
+```
+
+### Index Operations
+
+**Search Flow**:
+1. Extract query terms from fingerprint (50-100 hashes)
+2. Encode query as MessagePack
+3. POST to `/:index/_search`
+4. Decode MessagePack response
+5. Return list of (fingerprint_id, score) tuples
+
+**Insert Flow**:
+1. Extract all terms from fingerprint
+2. Encode as MessagePack
+3. PUT to `/:index/:fingerprint_id`
+4. Index adds to MemorySegment
+5. Appends to Oplog for durability
+
+**Batch Update Flow**:
+1. Collect multiple fingerprint updates
+2. Encode batch as MessagePack
+3. POST to `/:index/_update`
+4. Index processes all updates atomically
+
+### Error Handling
+
+**Retry Strategy**:
+
+```python
+import asyncio
+
+import aiohttp
+
+async def search_with_retry(client, query, max_retries=3):
+    """Search with exponential backoff retry (waits 1 s, 2 s, ...)."""
+    for attempt in range(max_retries):
+        try:
+            return await client.search(query)
+        except aiohttp.ClientError:
+            if attempt == max_retries - 1:
+                raise
+            await asyncio.sleep(2 ** attempt)
+```
+
+**Circuit Breaker**:
+
+```python
+import time
+
+class CircuitBreakerOpen(Exception):
+    """Raised while the breaker is open and calls are rejected."""
+
+class CircuitBreaker:
+    """Prevent cascading failures to index."""
+    
+    def __init__(self, failure_threshold=5, timeout=60):
+        self.failure_count = 0
+        self.failure_threshold = failure_threshold
+        self.timeout = timeout  # seconds to stay open before a trial call
+        self.last_failure_time = None
+        self.state = 'closed'  # closed, open, half-open
+        
+    async def call(self, func, *args, **kwargs):
+        if self.state == 'open':
+            if time.monotonic() - self.last_failure_time > self.timeout:
+                self.state = 'half-open'
+            else:
+                raise CircuitBreakerOpen()
+        
+        try:
+            result = await func(*args, **kwargs)
+        except Exception:
+            self.failure_count += 1
+            self.last_failure_time = time.monotonic()
+            # A failed trial call in half-open state reopens the breaker at once
+            if self.state == 'half-open' or self.failure_count >= self.failure_threshold:
+                self.state = 'open'
+            raise
+        # Success: close the breaker so only consecutive failures count
+        self.state = 'closed'
+        self.failure_count = 0
+        return result
+```
+
+## Fingerprint Store (fpstore)
+
+### Optional Service
+
+**Purpose**: Separate storage for raw fingerprint data  
+**Status**: Optional (can use PostgreSQL instead)  
+**Protocol**: HTTP with MessagePack
+
+**Configuration**:
+```ini
+[fingerprint_store]
+enabled = true
+base_url = http://fpstore:8080
+```
+
+**Operations**:
+
+```python
+import aiohttp
+import msgspec
+
+class FingerprintStore:
+    """Client for fingerprint storage service."""
+
+    def __init__(self, base_url, session: aiohttp.ClientSession):
+        self.base_url = base_url  # e.g. http://fpstore:8080
+        self.session = session    # created by the caller inside the event loop
+    
+    async def store(self, fingerprint_id, fingerprint_data):
+        """Store raw fingerprint data."""
+        url = f"{self.base_url}/fingerprints/{fingerprint_id}"
+        payload = msgspec.msgpack.encode({
+            'data': fingerprint_data
+        })
+        async with self.session.put(url, data=payload) as resp:
+            return resp.status == 200
+    
+    async def retrieve(self, fingerprint_id):
+        """Retrieve raw fingerprint data."""
+        url = f"{self.base_url}/fingerprints/{fingerprint_id}"
+        async with self.session.get(url) as resp:
+            resp.raise_for_status()
+            data = await resp.read()
+            result = msgspec.msgpack.decode(data)
+            return result['data']
+```
+
+## NATS Integration
+
+### Message Queue
+
+**Purpose**: Async submission processing  
+**Technology**: NATS with JetStream (persistent queue)  
+**Library**: `nats-py`
+
+**Configuration**:
+```ini
+[nats]
+servers = nats://nats:4222
+stream = acoustid_submissions
+consumer = acoustid_worker
+```
+
+**File**: `acoustid/worker.py`
+
+### Publisher (API Server)
+
+```python
+import time
+
+import msgspec
+import nats
+from nats.js import JetStreamContext
+from nats.js.api import RetentionPolicy
+
+async def publish_submission(submission_id):
+    """Publish submission to NATS queue."""
+    # Production code would reuse one connection; one per call keeps this short
+    nc = await nats.connect(servers=["nats://nats:4222"])
+    try:
+        js: JetStreamContext = nc.jetstream()
+        
+        # Ensure stream exists
+        await js.add_stream(
+            name="acoustid_submissions",
+            subjects=["submissions.*"],
+            retention=RetentionPolicy.WORK_QUEUE,
+        )
+        
+        # Publish message
+        await js.publish(
+            subject="submissions.new",
+            payload=msgspec.json.encode({
+                'submission_id': submission_id,
+                'timestamp': time.time()
+            })
+        )
+    finally:
+        await nc.close()
+```
+
+### Consumer (Worker)
+
+```python
+import logging
+
+import msgspec
+import nats
+from nats.errors import TimeoutError as NatsTimeoutError
+from nats.js import JetStreamContext
+from nats.js.api import AckPolicy, ConsumerConfig
+
+logger = logging.getLogger(__name__)
+
+async def consume_submissions():
+    """Consume submissions from NATS queue."""
+    nc = await nats.connect(servers=["nats://nats:4222"])
+    js: JetStreamContext = nc.jetstream()
+    
+    # Create durable pull consumer
+    consumer = await js.pull_subscribe(
+        subject="submissions.*",
+        durable="acoustid_worker",
+        config=ConsumerConfig(
+            ack_policy=AckPolicy.EXPLICIT,
+            max_deliver=3,
+            ack_wait=300  # seconds (5 minutes)
+        )
+    )
+    
+    while True:
+        # Fetch batch of messages; fetch() times out when the queue is empty
+        try:
+            messages = await consumer.fetch(batch=10, timeout=5)
+        except NatsTimeoutError:
+            continue
+        
+        for msg in messages:
+            try:
+                data = msgspec.json.decode(msg.data)
+                await process_submission(data['submission_id'])
+                await msg.ack()
+            except Exception as e:
+                logger.error(f"Failed to process submission: {e}")
+                await msg.nak(delay=60)  # Retry after 1 minute
+```
+
+### JetStream Configuration
+
+**Stream Settings**:
+- Retention: WorkQueue (messages deleted after ack)
+- Max age: 7 days (unprocessed messages)
+- Max messages: 1,000,000
+- Storage: File (persistent)
+
+**Consumer Settings**:
+- Ack policy: Explicit (manual acknowledgment)
+- Max deliver: 3 (retry up to 3 times)
+- Ack wait: 300 seconds (5 minutes timeout)
+- Max ack pending: 100 (max unacked messages)
+
+## Redis Integration
+
+### Use Cases
+
+1. **Rate Limiting**: Sliding window counters
+2. **Task Queue** (legacy): RPUSH/LPOP queue
+3. **Caching**: API key validation, MBID existence
+4. **State Management**: Backfill progress, worker state
+
+**Configuration**:
+```ini
+[redis]
+host = redis
+port = 6379
+db = 0
+password_file = /run/secrets/redis_password
+```
+
+**File**: `acoustid/redis.py`
+
+### Connection Pool
+
+```python
+import redis
+
+redis_pool = redis.ConnectionPool(
+    host='redis',
+    port=6379,
+    db=0,
+    max_connections=50,
+    socket_timeout=5,
+    socket_connect_timeout=5
+)
+
+redis_client = redis.Redis(connection_pool=redis_pool)
+```
+
+### Rate Limiting Implementation
+
+See DATA.md for detailed rate limiting data structures.
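
As an illustration of the sliding-window idea only (in-process, not shared across workers, and not AcoustID's Redis implementation), a limiter can keep the timestamps of recent requests per key and reject a request once the window already holds the limit. The class and its injectable `clock` parameter are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-process sliding-window rate limiter (illustration only)."""

    def __init__(self, limit: int, window: float, clock=time.monotonic) -> None:
        self.limit = limit
        self.window = window  # seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A common Redis version of the same idea keeps one sorted set per key, trimming old entries with `ZREMRANGEBYSCORE` and counting with `ZCARD`.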
+
+### Caching Patterns
+
+**API Key Cache**:
+```python
+from cachetools import TTLCache
+
+api_key_cache = TTLCache(maxsize=1000, ttl=60)
+
+def get_application_by_key(api_key):
+    if api_key in api_key_cache:
+        return api_key_cache[api_key]
+    
+    app = db.query(Application).filter_by(apikey=api_key).first()
+    if app:
+        api_key_cache[api_key] = app
+    return app
+```
+
+**Unknown MBID Cache**:
+```python
+def is_mbid_known(mbid):
+    """Check if MBID exists in MusicBrainz."""
+    cache_key = f"unknown_mbid:{mbid}"
+    
+    # Check cache
+    if redis_client.exists(cache_key):
+        return False
+    
+    # Query MusicBrainz
+    exists = mb_db.query(Recording).filter_by(gid=mbid).count() > 0
+    
+    # Cache negative result
+    if not exists:
+        redis_client.setex(cache_key, 3600, '1')
+    
+    return exists
+```
+
+## Integration Summary
+
+| Service | Protocol | Purpose | Criticality |
+|---------|----------|---------|-------------|
+| MusicBrainz | PostgreSQL | Metadata enrichment | High |
+| Chromaprint | C library | Fingerprint generation | Critical |
+| Index (HTTP) | HTTP/MessagePack | Fingerprint search | Critical |
+| Index (TCP) | TCP binary | Legacy fingerprint search | Low (deprecated) |
+| Fingerprint Store | HTTP/MessagePack | Raw fingerprint storage | Low (optional) |
+| NATS | NATS protocol | Async job queue | High |
+| Redis | Redis protocol | Caching, rate limiting | High |
@@ -0,0 +1,391 @@
+# AcoustID System Overview
+
+## Introduction
+
+AcoustID is an open-source audio fingerprinting service that identifies music recordings by analyzing their acoustic characteristics. The system consists of two primary components working in tandem: a Python-based web service (acoustid-server) and a high-performance Zig-based fingerprint index (acoustid-index). Together, they provide a production-grade solution for matching audio fingerprints to MusicBrainz metadata.
+
+## System Components
+
+### acoustid-server (Python)
+
+The server component handles all user-facing operations, database management, and business logic.
+
+**Repository**: acoustid/acoustid-server  
+**License**: MIT  
+**Language**: Python 3.12+  
+**Current Version**: 26.3.1
+
+**Core Technologies**:
+- **Web Framework**: Werkzeug/Flask today, with a migration to async Starlette under way
+- **ORM**: SQLAlchemy 2.x with multi-database support
+- **Database**: PostgreSQL 17.4 (4 separate databases)
+- **Cache/Queue**: Redis for rate limiting and task queues
+- **Message Queue**: NATS with JetStream for async submission processing
+- **Application Servers**: Uvicorn (ASGI) for async endpoints, Gunicorn (WSGI) for the legacy app
+
+**Key Dependencies**:
+```
+acoustid-ext (C extension for Chromaprint)
+Flask (current web framework)
+Starlette (future async framework)
+aiohttp (async HTTP client)
+SQLAlchemy 2.x (ORM)
+alembic (database migrations)
+asyncpg (async PostgreSQL driver)
+psycopg2 (sync PostgreSQL driver)
+nats-py (NATS client)
+mbdata (MusicBrainz data models)
+msgspec (fast JSON/MessagePack)
+zstd (compression)
+gunicorn (WSGI server)
+uvicorn (ASGI server)
+```
+
+**Entry Point**:
+```bash
+# Main CLI entry: manage.py calls acoustid.cli:main()
+
+# Available commands
+python manage.py run web      # Web UI server
+python manage.py run api      # API server
+python manage.py run cron     # Scheduled tasks
+python manage.py run worker   # Background worker
+python manage.py run import   # Import fingerprints
+```
+
+**File Locations**:
+- Entry script: `manage.py`
+- CLI implementation: `acoustid/cli.py`
+- Server logic: `acoustid/server.py`
+- Worker logic: `acoustid/worker.py`
+- Cron jobs: `acoustid/cron.py`
+- Configuration: `acoustid/config.py`
+
+### acoustid-index (Zig)
+
+The index component provides fast fingerprint search, built on an LSM-tree inverted index with SIMD-accelerated (StreamVByte) compression of posting lists.
+
+**Repository**: acoustid/acoustid-index  
+**License**: GPL-3.0  
+**Language**: Zig  
+**Build System**: Zig build system
+
+**Core Technologies**:
+- **HTTP Server**: httpz (Zig HTTP library)
+- **Data Structure**: LSM-tree (Log-Structured Merge-tree) inverted index
+- **Compression**: StreamVByte SIMD compression for posting lists
+- **Serialization**: MessagePack for wire protocol
+- **Metrics**: Prometheus-compatible metrics endpoint
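+
+StreamVByte stores, for each group of four 32-bit integers, one control byte holding four 2-bit length codes, and writes the integers' bytes to a separate data stream; SIMD decoders use the control byte to drive a byte shuffle. The scalar Python sketch below shows only the format. It does not attempt to reproduce the `src/streamvbyte.zig` implementation or any delta coding that may precede it.
+
+```python
+def streamvbyte_encode(values):
+    """Scalar StreamVByte encoder for unsigned 32-bit integers."""
+    control = bytearray((len(values) + 3) // 4)  # 2 bits per value, 4 per byte
+    data = bytearray()
+    for i, v in enumerate(values):
+        if not 0 <= v < 2**32:
+            raise ValueError("StreamVByte encodes unsigned 32-bit integers")
+        n = max(1, (v.bit_length() + 7) // 8)  # bytes needed: 1..4
+        control[i // 4] |= (n - 1) << (2 * (i % 4))  # first value uses the low bits
+        data += v.to_bytes(n, "little")
+    return bytes(control), bytes(data)
+
+
+def streamvbyte_decode(control, data, count):
+    """Inverse of streamvbyte_encode; `count` is the number of encoded values."""
+    values, pos = [], 0
+    for i in range(count):
+        n = ((control[i // 4] >> (2 * (i % 4))) & 0b11) + 1
+        values.append(int.from_bytes(data[pos:pos + n], "little"))
+        pos += n
+    return values
+```
+
+For example, `[1, 256, 65536, 2**24]` needs 1, 2, 3, and 4 bytes, so it encodes to the single control byte `0xe4` followed by 10 data bytes.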
+
+**Key Dependencies**:
+```
+httpz (HTTP server framework)
+metrics (Prometheus metrics)
+zul (Zig utility library)
+msgpack (MessagePack serialization)
+nats (NATS client)
+```
+
+**Entry Point**:
+```bash
+# Build and run
+zig build run -- --dir /tmp --port 8080
+
+# Binary name
+fpindex
+
+# CLI flags
+--dir <path>          # Data directory for index storage
+--port <number>       # HTTP server port (default: 6081)
+--threads <number>    # Worker thread count
+--log-level <level>   # Logging verbosity
+--cluster <name>      # Cluster name for distributed setup
+--nats-url <url>      # NATS server URL for clustering
+```
+
+**File Locations**:
+- Main entry: `src/main.zig`
+- HTTP server: `src/server.zig`
+- API handlers: `src/api.zig`
+- Multi-index manager: `src/MultiIndex.zig`
+- Core index: `src/Index.zig`
+- Index reader: `src/IndexReader.zig`
+- Segment management: `src/segment.zig`
+- Memory segment: `src/MemorySegment.zig`
+- File segment: `src/FileSegment.zig`
+- Write-ahead log: `src/Oplog.zig`
+- File format: `src/filefmt.zig`
+- Block compression: `src/block.zig`
+- SIMD compression: `src/streamvbyte.zig`
+- Metrics: `src/metrics.zig`
+
+## Build and Run
+
+### Server Build
+
+```bash
+# Install dependencies with uv
+uv sync
+
+# Build Chromaprint extension
+# (handled automatically in Docker build)
+
+# Run with docker-compose
+docker compose up
+```
+
+**Docker Compose Services**:
+- `nats`: Message queue
+- `redis`: Cache and rate limiting
+- `postgres`: Database (custom pg17.4 image)
+- `index`: Fingerprint index service
+- `api`: API server
+- `web`: Web UI server
+- `cron`: Scheduled tasks
+- `worker`: Background job processor
+
+### Index Build
+
+```bash
+# Build binary
+zig build
+
+# Run with options
+zig build run -- --dir /var/lib/acoustid-index --port 6081 --threads 4
+```
+
+## Architecture Relationship
+
+The two components work together in a client-server model:
+
+1. **Server** receives fingerprint submissions and lookup requests via HTTP API
+2. **Server** stores metadata in PostgreSQL
+3. **Server** sends fingerprint data to **Index** via HTTP/MessagePack protocol
+4. **Index** performs fast similarity search over its LSM-tree inverted index
+5. **Index** returns candidate fingerprint IDs to **Server**
+6. **Server** enriches results with metadata from PostgreSQL and MusicBrainz
+7. **Server** returns final results to client
+
+## Communication Protocols
+
+### Server to Index
+
+**Modern Protocol** (fpstore.py):
+- HTTP POST to `http://index:6081/:index/_search`, where `:index` is the name of the target index
+- Request body: MessagePack-encoded fingerprint query
+- Response: MessagePack-encoded list of candidate IDs with scores
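+
+To make the wire format concrete, the sketch below hand-rolls the subset of MessagePack needed for such a request (nil, booleans, integers, strings, arrays, maps) with only the standard library; production code would use a library such as msgspec, which the server already depends on. The body fields (`query`, `limit`) and the index name are hypothetical, so the actual schema should be checked against the handlers in `src/api.zig`.
+
+```python
+import struct
+
+
+def _container_header(n: int, fix: int, tag16: int, tag32: int) -> bytes:
+    if n < 16:
+        return bytes([fix | n])  # fixarray / fixmap
+    if n < 2**16:
+        return bytes([tag16]) + struct.pack(">H", n)
+    return bytes([tag32]) + struct.pack(">I", n)
+
+
+def msgpack_encode(obj) -> bytes:
+    """Encode a subset of MessagePack: nil, bool, int, str, list/tuple, dict."""
+    if obj is None:
+        return b"\xc0"
+    if isinstance(obj, bool):  # test before int, since bool subclasses int
+        return b"\xc3" if obj else b"\xc2"
+    if isinstance(obj, int):
+        if -32 <= obj <= 127:
+            return struct.pack(">b", obj) if obj < 0 else bytes([obj])  # fixint
+        forms = ((0xCC, ">B", 8), (0xCD, ">H", 16), (0xCE, ">I", 32), (0xCF, ">Q", 64))
+        if obj < 0:
+            forms = ((0xD0, ">b", 7), (0xD1, ">h", 15), (0xD2, ">i", 31), (0xD3, ">q", 63))
+        for tag, fmt, bits in forms:  # smallest uint/int form that fits
+            if -(2**bits) <= obj < 2**bits:
+                return bytes([tag]) + struct.pack(fmt, obj)
+        raise OverflowError("integer does not fit in 64 bits")
+    if isinstance(obj, str):
+        raw = obj.encode("utf-8")
+        n = len(raw)
+        if n < 32:
+            head = bytes([0xA0 | n])  # fixstr
+        elif n < 2**8:
+            head = b"\xd9" + struct.pack(">B", n)  # str8
+        elif n < 2**16:
+            head = b"\xda" + struct.pack(">H", n)  # str16
+        else:
+            head = b"\xdb" + struct.pack(">I", n)  # str32
+        return head + raw
+    if isinstance(obj, (list, tuple)):
+        items = b"".join(msgpack_encode(item) for item in obj)
+        return _container_header(len(obj), 0x90, 0xDC, 0xDD) + items
+    if isinstance(obj, dict):
+        pairs = b"".join(msgpack_encode(k) + msgpack_encode(v) for k, v in obj.items())
+        return _container_header(len(obj), 0x80, 0xDE, 0xDF) + pairs
+    raise TypeError(f"cannot encode {type(obj).__name__}")
+
+
+def build_search_request(index_name: str, hashes, limit: int = 10):
+    """Hypothetical request: body field names are illustrative, not the real schema."""
+    url = f"http://index:6081/{index_name}/_search"
+    body = msgpack_encode({"query": list(hashes), "limit": limit})
+    return url, body
+```
+
+Fingerprint hashes are 32-bit values, so most of them take the 5-byte `uint32` form (`0xce` plus four big-endian bytes).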
+
+**Legacy Protocol** (indexclient.py):
+- Raw TCP socket connection
+- Binary protocol with custom framing
+- Being phased out in favor of HTTP
+
+### Client to Server
+
+**Public API**:
+- HTTP GET/POST to `https://api.acoustid.org/v2/*`
+- JSON/XML/JSONP responses
+- Rate-limited by API key and IP
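+
+A client-side sketch of building a `/v2/lookup` request with the standard library, using the `client`, `meta`, `duration`, and `fingerprint` parameters plus `format` to select the response format. The helper name and its defaults are illustrative only.
+
+```python
+from urllib.parse import urlencode
+
+API_BASE = "https://api.acoustid.org/v2"
+
+
+def lookup_url(api_key: str, fingerprint: str, duration: float, meta=("recordings",)) -> str:
+    """Build a GET URL for /v2/lookup (hypothetical helper)."""
+    params = {
+        "client": api_key,  # application API key
+        "meta": " ".join(meta),  # metadata sections, space-separated
+        "duration": int(duration),  # track length in whole seconds
+        "fingerprint": fingerprint,  # compressed fingerprint string, e.g. from fpcalc
+        "format": "json",  # json, xml or jsonp
+    }
+    return f"{API_BASE}/lookup?{urlencode(params)}"
+```
+
+`urlencode` turns the space between `meta` values into `+`. Because fingerprints are long, the same parameters can also be sent as a form-encoded POST body instead of a query string.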
+
+## Version Information
+
+**Server Version**: 26.3.1
+- Semantic versioning
+- Tagged releases in Git
+- Version defined in `acoustid/__init__.py`
+
+**Index Version**: No formal versioning yet
+- Tracked by Git commit hash
+- Breaking changes communicated via commit messages
+
+## Deployment Models
+
+### Production (acoustid.org)
+
+- Multi-server deployment
+- Separate API, web, worker, and cron processes
+- Dedicated PostgreSQL cluster (4 databases)
+- Redis cluster for caching
+- NATS cluster for message queue
+- Multiple index instances for load balancing
+
+### Self-Hosted (Docker Compose)
+
+- Single-host deployment
+- All services in containers
+- Shared PostgreSQL instance
+- Single Redis instance
+- Single NATS instance
+- Single index instance
+
+### Development (Local)
+
+- Python virtual environment with uv
+- Local PostgreSQL (or Docker)
+- Local Redis (or Docker)
+- Local NATS (or Docker)
+- Index built and run locally with Zig
+
+## Key Features
+
+### Server Features
+
+- **Fingerprint Submission**: Accept audio fingerprints with optional metadata
+- **Fingerprint Lookup**: Match fingerprints to known recordings
+- **MusicBrainz Integration**: Link fingerprints to MBIDs
+- **User Management**: API key generation and management
+- **Rate Limiting**: Multi-tier rate limiting (global, app, IP)
+- **Batch Operations**: Submit/lookup up to 20 fingerprints per request
+- **Async Processing**: Background workers for heavy operations
+- **Health Checks**: Multiple health endpoints for monitoring
+- **Metrics**: StatsD metrics for observability
+
+### Index Features
+
+- **Fast Search**: Sub-millisecond fingerprint matching
+- **SIMD Optimization**: StreamVByte compression for posting lists
+- **LSM-Tree Storage**: Writes go to an in-memory segment backed by a write-ahead log (oplog) and are later persisted as immutable on-disk segments
+- **Background Merging**: Automatic segment compaction
+- **Snapshot Support**: Point-in-time index snapshots
+- **Cluster Support**: Distributed index via NATS
+- **Prometheus Metrics**: Built-in metrics endpoint
+- **HTTP API**: RESTful API for all operations
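+
+The LSM-tree design behind these features can be modelled in a few dozen lines of Python. The toy class below is a hypothetical illustration, not a port of `src/Index.zig`: inserts are logged to an oplog list and applied to a mutable memory segment, a full memory segment is frozen into an immutable segment, `merge()` compacts segments (the job of background merging), and `search()` ranks fingerprint IDs by how many query hashes they share. Deletes, persistence, and snapshots are left out.
+
+```python
+from collections import Counter, defaultdict
+
+
+class TinyLsmIndex:
+    """Toy LSM-style inverted index mapping hash -> fingerprint IDs."""
+
+    def __init__(self, memory_limit: int = 4):
+        self.memory_limit = memory_limit  # distinct hashes before a flush
+        self.oplog = []  # stands in for the write-ahead log
+        self.memory = defaultdict(set)  # mutable in-memory segment
+        self.segments = []  # immutable segments: {hash: sorted tuple of ids}
+
+    def insert(self, fp_id, hashes):
+        self.oplog.append(("insert", fp_id, tuple(hashes)))  # log before applying
+        for h in hashes:
+            self.memory[h].add(fp_id)
+        if len(self.memory) >= self.memory_limit:
+            self._flush()
+
+    def _flush(self):
+        # Freeze the memory segment into an immutable, sorted segment
+        self.segments.append({h: tuple(sorted(ids)) for h, ids in self.memory.items()})
+        self.memory = defaultdict(set)
+
+    def merge(self):
+        # Compact all immutable segments into one
+        if len(self.segments) < 2:
+            return
+        merged = defaultdict(set)
+        for seg in self.segments:
+            for h, ids in seg.items():
+                merged[h].update(ids)
+        self.segments = [{h: tuple(sorted(ids)) for h, ids in merged.items()}]
+
+    def search(self, hashes, limit=5):
+        hits = Counter()
+        for h in set(hashes):
+            ids = set(self.memory.get(h, ()))  # .get avoids creating empty entries
+            for seg in self.segments:
+                ids.update(seg.get(h, ()))
+            hits.update(ids)  # one hit per fingerprint per matching hash
+        return hits.most_common(limit)
+```
+
+Reads must consult the memory segment and every immutable segment, which is why periodic merging keeps search cost down as segments accumulate.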
+
+## Configuration
+
+### Server Configuration
+
+**Config File**: `acoustid.conf` (INI format)  
+**Environment Variables**: `ACOUSTID_*` prefix  
+**Secret Files**: `*_file` suffix for file-based secrets
+
+Example:
+```ini
+[database]
+name = acoustid_app
+user = acoustid
+password_file = /run/secrets/db_password
+
+[redis]
+host = redis
+port = 6379
+
+[fingerprint_index]
+host = index
+port = 6081
+```
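+
+A minimal loader for this layout might look like the sketch below: parse the INI text with `configparser`, let environment variables override individual keys, and replace each `*_file` key with the contents of the file it points to. The `ACOUSTID_<SECTION>_<KEY>` naming used for overrides is an assumption for illustration; `acoustid/config.py` may map environment variables differently.
+
+```python
+import configparser
+import os
+
+
+def load_config(text: str, environ=None) -> dict:
+    """Parse INI text, apply env overrides, then resolve *_file secrets."""
+    environ = os.environ if environ is None else environ
+    parser = configparser.ConfigParser(interpolation=None)  # keep '%' literal
+    parser.read_string(text)
+    config = {}
+    for section in parser.sections():
+        values = dict(parser[section])
+        # Hypothetical naming: ACOUSTID_REDIS_HOST overrides [redis] host
+        prefix = f"ACOUSTID_{section.upper()}_"
+        for name, value in environ.items():
+            if name.startswith(prefix):
+                values[name[len(prefix):].lower()] = value
+        # password_file = /run/secrets/x  ->  password = <contents of that file>
+        for key in [k for k in values if k.endswith("_file")]:
+            with open(values.pop(key)) as f:
+                values[key[: -len("_file")]] = f.read().strip()
+        config[section] = values
+    return config
+```
+
+All values come back as strings, so numeric settings such as `port` still need an `int()` conversion at the point of use.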
+
+### Index Configuration
+
+**CLI Flags Only**: No config file support  
+**Environment Variables**: Limited support
+
+Example:
+```bash
+fpindex \
+  --dir /var/lib/acoustid-index \
+  --port 6081 \
+  --threads 4 \
+  --log-level info \
+  --nats-url nats://nats:4222
+```
+
+## Data Flow Summary
+
+### Submission Flow
+
+1. Client submits fingerprint via `/v2/submit`
+2. Server validates API keys and rate limits
+3. Server stores submission in `submission` table
+4. Server publishes message to NATS queue
+5. Worker picks up message from NATS
+6. Worker searches index for matches
+7. Worker creates or links track in PostgreSQL
+8. Worker updates index with new fingerprint
+9. Client polls `/v2/submission_status` for result
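+
+The asynchronous hand-off in this flow can be simulated end to end with `asyncio`. In the hypothetical sketch below, an `asyncio.Queue` stands in for the NATS subject, dictionaries stand in for the `submission` table and the index, an exact hash match stands in for similarity search, and the status values are made up. It shows why the client has to poll: `submit()` returns as soon as the message is queued, before any track exists.
+
+```python
+import asyncio
+import contextlib
+import itertools
+
+# Hypothetical in-memory stand-ins for the `submission` table and the index
+submissions = {}
+index = {}  # tuple of fingerprint hashes -> track id
+_submission_ids = itertools.count(1)
+_track_ids = itertools.count(100)
+
+
+async def submit(queue: asyncio.Queue, hashes) -> int:
+    """Steps 1-4: store the submission, publish its id, return at once."""
+    sub_id = next(_submission_ids)
+    submissions[sub_id] = {"hashes": tuple(hashes), "status": "pending", "track_id": None}
+    await queue.put(sub_id)
+    return sub_id
+
+
+async def worker(queue: asyncio.Queue) -> None:
+    """Steps 5-8: search, create or link a track, update the index."""
+    while True:
+        sub_id = await queue.get()
+        sub = submissions[sub_id]
+        track_id = index.get(sub["hashes"])  # exact match instead of similarity search
+        if track_id is None:
+            track_id = next(_track_ids)  # no match: create a new track
+            index[sub["hashes"]] = track_id  # make it findable for later submissions
+        sub.update(status="imported", track_id=track_id)
+        queue.task_done()  # plays the role of acking the NATS message
+
+
+def submission_status(sub_id: int) -> dict:
+    """Step 9: what a polling client would see."""
+    sub = submissions[sub_id]
+    return {"id": sub_id, "status": sub["status"], "track_id": sub["track_id"]}
+
+
+async def demo():
+    queue = asyncio.Queue()
+    task = asyncio.create_task(worker(queue))
+    first = await submit(queue, [1, 2, 3])
+    second = await submit(queue, [1, 2, 3])  # same audio submitted twice
+    await queue.join()  # wait until the worker has processed both messages
+    task.cancel()
+    with contextlib.suppress(asyncio.CancelledError):
+        await task
+    return submission_status(first), submission_status(second)
+```
+
+Running `asyncio.run(demo())` leaves both submissions imported and linked to the same track, since the second one finds the fingerprint the first one added to the index.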
+
+### Lookup Flow
+
+1. Client requests lookup via `/v2/lookup`
+2. Server validates API key and rate limits
+3. Server decodes fingerprint from request
+4. Server extracts query features from fingerprint
+5. Server sends search request to index
+6. Index returns candidate fingerprint IDs
+7. Server fetches metadata from PostgreSQL
+8. Server fetches MusicBrainz data if requested
+9. Server returns enriched results as JSON
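+
+Steps 4 to 9 can be sketched as a single function over in-memory stand-ins. Everything here is hypothetical and simplified: the query features are just the distinct hashes, a candidate's score is the fraction of those hashes it shares, and the index, fingerprint table, and MusicBrainz links are small dictionaries. The return value follows the general shape of a `/v2/lookup` JSON reply (`status`, then `results` with `id`, `score`, and optional `recordings`).
+
+```python
+from collections import Counter
+
+# Hypothetical stand-ins: the index, the fingerprint table, and MusicBrainz links
+INDEX = {0xA1: [1, 2], 0xB2: [1], 0xC3: [1, 3]}  # hash -> fingerprint ids
+FINGERPRINTS = {1: "track-1", 2: "track-2", 3: "track-3"}  # fingerprint id -> track id
+RECORDINGS = {"track-1": ["mbid-1"], "track-2": [], "track-3": ["mbid-3"]}  # track -> MBIDs
+
+
+def lookup(query_hashes, min_score=0.5, with_recordings=True):
+    """Steps 4-9: query the index, score candidates, attach metadata."""
+    terms = set(query_hashes)  # step 4: query features (here: the distinct hashes)
+    if not terms:
+        return {"status": "ok", "results": []}
+    hits = Counter()
+    for h in terms:  # steps 5-6: collect candidate fingerprint ids
+        hits.update(INDEX.get(h, []))
+    results = []
+    for fp_id, count in hits.most_common():  # best candidates first
+        score = count / len(terms)
+        if score < min_score:
+            break  # every later candidate scores lower or equal
+        track_id = FINGERPRINTS[fp_id]  # step 7: metadata from PostgreSQL
+        result = {"id": track_id, "score": round(score, 2)}
+        if with_recordings:  # step 8: MusicBrainz data only when requested
+            result["recordings"] = [{"id": mbid} for mbid in RECORDINGS[track_id]]
+        results.append(result)
+    return {"status": "ok", "results": results}  # step 9: serialized as JSON
+```
+
+A fuller implementation would typically also merge candidates that map to the same track and compare the full fingerprints before reporting a score.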
+
+## Technology Stack Summary
+
+| Component | Server | Index |
+|-----------|--------|-------|
+| Language | Python 3.12+ | Zig |
+| Web Framework | Flask/Starlette | httpz |
+| Database | PostgreSQL 17.4 | N/A (file-based) |
+| ORM | SQLAlchemy 2.x | N/A |
+| Cache | Redis | N/A |
+| Queue | NATS+JetStream | NATS (optional) |
+| Serialization | JSON/MessagePack | MessagePack |
+| Compression | zstd | StreamVByte |
+| Metrics | StatsD | Prometheus |
+| Testing | pytest | Zig test |
+| Build | uv | zig build |
+| Container | Docker | Docker |
+
+## Repository Structure
+
+### acoustid-server
+
+```
+acoustid/
+├── api/              # API handlers
+│   └── v2/          # API v2 endpoints
+├── data/            # Business logic layer
+├── future/          # Starlette migration code
+├── web/             # Web UI handlers
+├── scripts/         # Utility scripts
+├── cli.py           # CLI commands
+├── server.py        # Server entry point
+├── worker.py        # Background worker
+├── cron.py          # Scheduled tasks
+├── fingerprint.py   # Fingerprint utilities
+├── indexclient.py   # Legacy index client
+├── fpstore.py       # Modern index client
+├── db.py            # Database connection
+├── config.py        # Configuration
+└── tables.py        # SQLAlchemy models
+```
+
+### acoustid-index
+
+```
+src/
+├── main.zig              # Entry point
+├── server.zig            # HTTP server
+├── api.zig               # API handlers
+├── MultiIndex.zig        # Multi-index manager
+├── Index.zig             # Core index
+├── IndexReader.zig       # Read-only index view
+├── segment.zig           # Segment interface
+├── MemorySegment.zig     # In-memory segment
+├── FileSegment.zig       # On-disk segment
+├── Oplog.zig             # Write-ahead log
+├── filefmt.zig           # File format
+├── block.zig             # Block compression
+├── streamvbyte.zig       # SIMD compression
+└── metrics.zig           # Prometheus metrics
+```
+
+## Next Steps
+
+For detailed information on specific aspects of the AcoustID system, refer to:
+
+- **ARCHITECTURE.md**: Detailed architecture and data flow
+- **API.md**: Complete API reference
+- **DATA.md**: Database schema and data models
+- **INTEGRATIONS.md**: External service integrations
+- **DEPLOYMENT.md**: Deployment and infrastructure
+- **CODEBASE.md**: Code organization and patterns
+- **EVALUATION.md**: System evaluation and recommendations