# AcoustID Data Model

## Database Architecture

AcoustID uses a multi-database PostgreSQL architecture with separate databases for different concerns.

### Database Instances

| Database | Purpose | Tables | Extensions |
|----------|---------|--------|------------|
| `acoustid_app` | Application data (accounts, apps, stats) | 8 | pgcrypto |
| `acoustid_fingerprint` | Fingerprint and track data | 19 | intarray, acoustid, cube |
| `acoustid_ingest` | Submission processing | 3 | - |
| `musicbrainz` | MusicBrainz mirror (read-only) | Many | - |

### PostgreSQL Extensions

**intarray**: Integer array operations
- Used for fingerprint array queries
- Provides `&&` (overlap) and `@>` (contains) operators

**pgcrypto**: Cryptographic functions
- UUID generation (`gen_random_uuid()`)
- API key hashing

**acoustid** (custom): Fingerprint similarity functions
- `acoustid_compare(int[], int[])`: Compare two fingerprints
- `acoustid_extract_query(int[])`: Extract query terms
- Source: `acoustid-ext` C extension

**cube**: Multi-dimensional cube data type
- Used for simhash-based fingerprint indexing
- Enables fast approximate nearest neighbor search

## Core Tables

### Account Management (acoustid_app)

#### `account`

User accounts for API access.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Account ID |
| `name` | VARCHAR(255) | NOT NULL | Display name |
| `apikey` | VARCHAR(40) | UNIQUE, NOT NULL | API key (user key) |
| `mbuser` | VARCHAR(64) | UNIQUE | MusicBrainz username |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `lastlogin` | TIMESTAMP | | Last login timestamp |
| `submission_count` | INTEGER | DEFAULT 0 | Total submissions |
| `application_id` | INTEGER | FOREIGN KEY | Default application |
| `application_version` | VARCHAR(255) | | Application version |
| `created_from` | INET | | Registration IP |
| `is_admin` | BOOLEAN | DEFAULT FALSE | Admin flag |

**Indexes**:
- `account_pkey` (PRIMARY KEY on `id`)
- `account_apikey_key` (UNIQUE on `apikey`)
- `account_mbuser_key` (UNIQUE on `mbuser`)

#### `application`

API client applications.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Application ID |
| `name` | VARCHAR(255) | NOT NULL | Application name |
| `version` | VARCHAR(255) | | Version string |
| `apikey` | VARCHAR(40) | UNIQUE, NOT NULL | API key (client key) |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `active` | BOOLEAN | DEFAULT TRUE | Active status |
| `account_id` | INTEGER | FOREIGN KEY | Owner account |
| `email` | VARCHAR(255) | | Contact email |
| `website` | VARCHAR(1000) | | Website URL |
| `rate_limit` | INTEGER | | Custom rate limit (req/s) |

**Indexes**:
- `application_pkey` (PRIMARY KEY on `id`)
- `application_apikey_key` (UNIQUE on `apikey`)

#### `account_openid`

OpenID authentication links.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `openid` | VARCHAR(255) | PRIMARY KEY | OpenID identifier |
| `account_id` | INTEGER | FOREIGN KEY | Linked account |

#### `account_google`

Google OAuth authentication links.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `google_user_id` | VARCHAR(255) | PRIMARY KEY | Google user ID |
| `account_id` | INTEGER | FOREIGN KEY | Linked account |

### Fingerprint Data (acoustid_fingerprint)

#### `track`

Unique audio tracks identified by fingerprints.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Track ID |
| `gid` | UUID | UNIQUE, NOT NULL | Public track UUID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `new_id` | INTEGER | FOREIGN KEY | Merge target (if merged) |
| `disabled` | BOOLEAN | DEFAULT FALSE | Disabled flag |

**Indexes**:
- `track_pkey` (PRIMARY KEY on `id`)
- `track_gid_key` (UNIQUE on `gid`)
- `track_new_id_idx` (on `new_id`)

**Notes**:
- `gid` is the public-facing AcoustID track ID
- `new_id` points to the track this one was merged into (for deduplication); it is NULL for unmerged tracks
- Disabled tracks are excluded from search results

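
Because `new_id` points at another `track` row, a merged track may itself point to a track that was merged later. The sketch below shows one way to follow such a chain to the surviving track. It works on a plain dictionary rather than the database, and the `resolve_track` helper and the `merges` mapping are hypothetical names used only for illustration.

```python
from typing import Dict, Optional


def resolve_track(track_id: int, new_ids: Dict[int, Optional[int]]) -> int:
    """Follow new_id pointers until reaching a track that was not merged.

    `new_ids` maps track ID -> merge target (None when the track is live).
    Both the helper and the mapping are hypothetical, for illustration only.
    """
    seen = set()
    current = track_id
    while new_ids.get(current) is not None:
        if current in seen:
            # Guard against corrupt data where merges form a loop
            raise ValueError(f"merge cycle detected at track {current}")
        seen.add(current)
        current = new_ids[current]
    return current


# Track 1 was merged into 2, which was later merged into 5
merges = {1: 2, 2: 5, 5: None, 7: None}
final_id = resolve_track(1, merges)
```

The cycle guard is a defensive choice: nothing in the schema as documented prevents two tracks from pointing at each other.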
#### `fingerprint`

Audio fingerprints linked to tracks.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Fingerprint ID |
| `track_id` | INTEGER | FOREIGN KEY | Linked track |
| `fingerprint` | INTEGER[] | NOT NULL | Chromaprint hash array |
| `length` | SMALLINT | NOT NULL | Duration in seconds |
| `bitrate` | SMALLINT | | Audio bitrate (kbps) |
| `format_id` | INTEGER | FOREIGN KEY | Audio format |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `fingerprint_pkey` (PRIMARY KEY on `id`)
- `fingerprint_track_id_idx` (on `track_id`)
- `fingerprint_length_idx` (on `length`)
- `fingerprint_fingerprint_idx` (GIN on `fingerprint` using `intarray`)

**Notes**:
- `fingerprint` is an array of 32-bit integers (Chromaprint hashes)
- GIN index enables fast similarity search
- `submission_count` tracks popularity

#### `fingerprint_data`

Extended fingerprint data with simhash.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `fingerprint_id` | INTEGER | PRIMARY KEY, FOREIGN KEY | Fingerprint ID |
| `fingerprint` | BYTEA | NOT NULL | Raw fingerprint data |
| `simhash` | CUBE | | Locality-sensitive hash |

**Indexes**:
- `fingerprint_data_pkey` (PRIMARY KEY on `fingerprint_id`)
- `fingerprint_data_simhash_idx` (GIST on `simhash`)

**Notes**:
- `fingerprint` stores compressed Chromaprint data
- `simhash` enables approximate nearest neighbor search
- GIST index for fast similarity queries

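
To make the idea of a locality-sensitive hash concrete, here is a minimal sketch of the textbook simhash construction over 32-bit hashes. How AcoustID actually derives the `simhash` column is not stated here, so treat this as an assumed construction: `simhash32` and `hamming_distance` are illustrative names, not functions from the codebase.

```python
from typing import Iterable


def simhash32(hashes: Iterable[int]) -> int:
    """Textbook simhash over 32-bit integers (an assumed construction)."""
    votes = [0] * 32
    for h in hashes:
        h &= 0xFFFFFFFF  # treat signed PostgreSQL integers as unsigned bits
        for bit in range(32):
            # Each input hash votes +1 or -1 for every bit position
            votes[bit] += 1 if (h >> bit) & 1 else -1
    result = 0
    for bit, vote in enumerate(votes):
        if vote > 0:
            result |= 1 << bit
    return result


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest similar inputs."""
    return bin(a ^ b).count("1")


summary = simhash32([0b1010, 0b1010, 0b0110])
```

Similar fingerprints share most of their hashes, so their per-bit majority votes mostly agree and the resulting summaries differ in only a few bits. That is the property a nearest-neighbor index can exploit.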
#### `track_mbid`

Links tracks to MusicBrainz recordings.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `mbid` | UUID | NOT NULL | MusicBrainz recording MBID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |
| `disabled` | BOOLEAN | DEFAULT FALSE | Disabled flag |

**Indexes**:
- `track_mbid_pkey` (PRIMARY KEY on `id`)
- `track_mbid_track_id_mbid_key` (UNIQUE on `track_id, mbid`)
- `track_mbid_mbid_idx` (on `mbid`)

**Notes**:
- Multiple MBIDs per track possible (different recordings)
- `submission_count` indicates confidence
- Disabled links excluded from results

#### `meta`

User-submitted metadata.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Metadata ID |
| `track` | VARCHAR(255) | | Track title |
| `artist` | VARCHAR(255) | | Artist name |
| `album` | VARCHAR(255) | | Album title |
| `album_artist` | VARCHAR(255) | | Album artist |
| `track_no` | INTEGER | | Track number |
| `disc_no` | INTEGER | | Disc number |
| `year` | INTEGER | | Release year |

**Indexes**:
- `meta_pkey` (PRIMARY KEY on `id`)

#### `track_meta`

Links tracks to user metadata.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `meta_id` | INTEGER | FOREIGN KEY | Metadata record |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `track_meta_pkey` (PRIMARY KEY on `id`)
- `track_meta_track_id_meta_id_key` (UNIQUE on `track_id, meta_id`)

#### `format`

Audio file formats.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Format ID |
| `name` | VARCHAR(20) | UNIQUE, NOT NULL | Format name (mp3, flac, etc.) |

**Indexes**:
- `format_pkey` (PRIMARY KEY on `id`)
- `format_name_key` (UNIQUE on `name`)

**Common Values**:
- `mp3`, `flac`, `ogg`, `m4a`, `wma`, `ape`, `wav`

#### `source`

Submission sources (applications).

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Source ID |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `account_id` | INTEGER | FOREIGN KEY | User account |
| `version` | VARCHAR(255) | | Application version |

**Indexes**:
- `source_pkey` (PRIMARY KEY on `id`)
- `source_application_id_account_id_version_key` (UNIQUE on `application_id, account_id, version`)

### Foreign IDs (acoustid_fingerprint)

#### `foreignid_vendor`

External ID providers.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Vendor ID |
| `name` | VARCHAR(255) | UNIQUE, NOT NULL | Vendor name |

**Indexes**:
- `foreignid_vendor_pkey` (PRIMARY KEY on `id`)
- `foreignid_vendor_name_key` (UNIQUE on `name`)

**Common Values**:
- `musicbrainz`, `musicip`, `discogs`, `spotify`

#### `foreignid`

External identifiers.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Foreign ID |
| `vendor_id` | INTEGER | FOREIGN KEY | Vendor |
| `name` | VARCHAR(255) | NOT NULL | External ID value |

**Indexes**:
- `foreignid_pkey` (PRIMARY KEY on `id`)
- `foreignid_vendor_id_name_key` (UNIQUE on `vendor_id, name`)

#### `track_foreignid`

Links tracks to external IDs.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `foreignid_id` | INTEGER | FOREIGN KEY | External ID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `track_foreignid_pkey` (PRIMARY KEY on `id`)
- `track_foreignid_track_id_foreignid_id_key` (UNIQUE on `track_id, foreignid_id`)

#### `track_puid`

Legacy MusicIP PUID links.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `puid` | UUID | NOT NULL | MusicIP PUID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `track_puid_pkey` (PRIMARY KEY on `id`)
- `track_puid_track_id_puid_key` (UNIQUE on `track_id, puid`)
- `track_puid_puid_idx` (on `puid`)

### Statistics (acoustid_app)

#### `stats`

General statistics.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `name` | VARCHAR(255) | UNIQUE, NOT NULL | Stat name |
| `value` | INTEGER | NOT NULL | Stat value |
| `date` | DATE | NOT NULL | Stat date |

**Indexes**:
- `stats_pkey` (PRIMARY KEY on `id`)
- `stats_name_date_key` (UNIQUE on `name, date`)

**Common Stats**:
- `lookup.count`, `submission.count`, `track.count`, `fingerprint.count`

#### `stats_lookups`

Lookup statistics by hour.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `hour` | TIMESTAMP | NOT NULL | Hour timestamp |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `count_hits` | INTEGER | DEFAULT 0 | Successful lookups |
| `count_misses` | INTEGER | DEFAULT 0 | Failed lookups |

**Indexes**:
- `stats_lookups_pkey` (PRIMARY KEY on `id`)
- `stats_lookups_hour_application_id_key` (UNIQUE on `hour, application_id`)

#### `stats_user_agents`

User agent statistics.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `date` | DATE | NOT NULL | Date |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `user_agent` | VARCHAR(1000) | NOT NULL | User agent string |
| `ip` | INET | NOT NULL | IP address |
| `count` | INTEGER | DEFAULT 0 | Request count |

**Indexes**:
- `stats_user_agents_pkey` (PRIMARY KEY on `id`)
- `stats_user_agents_date_application_id_user_agent_ip_key` (UNIQUE on `date, application_id, user_agent, ip`)

#### `stats_top_accounts`

Top submitter accounts.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `account_id` | INTEGER | FOREIGN KEY | Account |
| `count` | INTEGER | NOT NULL | Submission count |

**Indexes**:
- `stats_top_accounts_pkey` (PRIMARY KEY on `id`)
- `stats_top_accounts_account_id_key` (UNIQUE on `account_id`)

### Submission Processing (acoustid_ingest)

#### `submission`

Pending fingerprint submissions.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Submission ID |
| `fingerprint` | INTEGER[] | NOT NULL | Chromaprint hash array |
| `length` | SMALLINT | NOT NULL | Duration in seconds |
| `bitrate` | SMALLINT | | Audio bitrate |
| `format_id` | INTEGER | | Audio format |
| `created` | TIMESTAMP | NOT NULL | Submission timestamp |
| `source_id` | INTEGER | FOREIGN KEY | Submission source |
| `mbid` | UUID | | MusicBrainz MBID (if provided) |
| `handled` | BOOLEAN | DEFAULT FALSE | Processing status |
| `meta_id` | INTEGER | FOREIGN KEY | User metadata |

**Indexes**:
- `submission_pkey` (PRIMARY KEY on `id`)
- `submission_handled_idx` (on `handled` WHERE `handled = FALSE`)

**Notes**:
- A worker picks up submissions where `handled = FALSE`
- The worker sets `handled = TRUE` once processing completes

#### `submission_result`

Processing results for submissions.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Result ID |
| `submission_id` | INTEGER | FOREIGN KEY | Submission |
| `track_id` | INTEGER | FOREIGN KEY | Matched/created track |
| `created` | TIMESTAMP | NOT NULL | Processing timestamp |

**Indexes**:
- `submission_result_pkey` (PRIMARY KEY on `id`)
- `submission_result_submission_id_key` (UNIQUE on `submission_id`)

#### `pending_submission`

Queue for async submission processing.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Queue ID |
| `submission_id` | INTEGER | FOREIGN KEY | Submission |
| `created` | TIMESTAMP | NOT NULL | Queue timestamp |

**Indexes**:
- `pending_submission_pkey` (PRIMARY KEY on `id`)
- `pending_submission_submission_id_key` (UNIQUE on `submission_id`)

**Notes**:
- Replaced by NATS queue in newer deployments
- Legacy table, may be deprecated

### Provenance Tables (acoustid_fingerprint)

Track data lineage and changes.

#### `fingerprint_source`

Links fingerprints to submission sources.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `fingerprint_id` | INTEGER | FOREIGN KEY | Fingerprint |
| `source_id` | INTEGER | FOREIGN KEY | Source |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |

#### `track_mbid_source`

Links track-MBID associations to sources.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_mbid_id` | INTEGER | FOREIGN KEY | Track-MBID link |
| `source_id` | INTEGER | FOREIGN KEY | Source |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |

#### `track_mbid_change`

Audit log for track-MBID changes.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Change ID |
| `track_mbid_id` | INTEGER | FOREIGN KEY | Track-MBID link |
| `account_id` | INTEGER | FOREIGN KEY | Account that made change |
| `disabled` | BOOLEAN | NOT NULL | New disabled status |
| `created` | TIMESTAMP | NOT NULL | Change timestamp |
| `note` | TEXT | | Change reason |

## ORM Layer (SQLAlchemy)

### Multi-Database Configuration

**File**: `acoustid/db.py`

```python
# Database bind keys
BIND_KEYS = {
    'app': 'acoustid_app',
    'fingerprint': 'acoustid_fingerprint',
    'ingest': 'acoustid_ingest',
    'musicbrainz': 'musicbrainz'
}
```

**Model Binding**:

```python
class Account(Base):
    __bind_key__ = 'app'
    __tablename__ = 'account'
    # ...

class Track(Base):
    __bind_key__ = 'fingerprint'
    __tablename__ = 'track'
    # ...
```

### Connection Pooling

**Configuration** (`acoustid.conf`):

```ini
[database]
name = acoustid_app
user = acoustid
password_file = /run/secrets/db_password
host = postgres
port = 5432
pool_size = 20
pool_recycle = 3600
```

**Pool Settings**:
- `pool_size`: Number of connections kept open in the pool per process
- `pool_recycle`: Recycle connections after N seconds
- `pool_pre_ping`: Test connections before use

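
As a rough illustration of how these settings could travel from the config file to SQLAlchemy, the sketch below parses an INI string and builds a dictionary whose keys match keyword arguments of `sqlalchemy.create_engine`. The `pool_options` helper and the fallback values are assumptions for this example (the fallbacks mirror SQLAlchemy's own defaults); the actual loading code in AcoustID may differ.

```python
import configparser

# Trimmed copy of the [database] section shown above
SAMPLE_CONF = """
[database]
name = acoustid_app
user = acoustid
host = postgres
port = 5432
pool_size = 20
pool_recycle = 3600
"""


def pool_options(conf_text: str) -> dict:
    """Extract pool settings from an INI string (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["database"]
    return {
        # Fallbacks are assumed; they match SQLAlchemy's documented defaults
        "pool_size": section.getint("pool_size", fallback=5),
        "pool_recycle": section.getint("pool_recycle", fallback=-1),
        "pool_pre_ping": section.getboolean("pool_pre_ping", fallback=False),
    }


options = pool_options(SAMPLE_CONF)
# The result could then be passed on as: create_engine(url, **options)
```

Since `pool_pre_ping` is absent from the sample file, it falls back to `False` here; a deployment that wants stale connections detected before use would set it explicitly.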
### Query Patterns

**Fingerprint Search** (legacy, pre-index):

```python
from sqlalchemy import func

# Find similar fingerprints using intarray overlap,
# restricted to tracks within 5 seconds of the query duration
query = db.session.query(Fingerprint).filter(
    Fingerprint.fingerprint.op('&&')(query_fingerprint),
    Fingerprint.length.between(duration - 5, duration + 5)
).order_by(
    func.acoustid_compare(Fingerprint.fingerprint, query_fingerprint).desc()
).limit(10)
```

**Track Lookup with MBIDs**:

```python
from sqlalchemy.orm import joinedload

# Fetch track with all linked MBIDs in a single joined query
track = db.session.query(Track).options(
    joinedload(Track.mbids)
).filter(Track.gid == track_gid).first()
```

**Submission Processing**:

```python
# Find unhandled submissions, oldest first
submissions = db.session.query(Submission).filter(
    Submission.handled.is_(False)
).order_by(Submission.created).limit(100).all()
```

## Database Migrations

### Alembic Configuration

**File**: `alembic.ini`

**Migration Directories**:
- `alembic/versions/app/`: acoustid_app migrations
- `alembic/versions/fingerprint/`: acoustid_fingerprint migrations
- `alembic/versions/ingest/`: acoustid_ingest migrations

**Multi-Database Support**:

```python
# alembic/env.py
def run_migrations_online():
    for bind_key in ['app', 'fingerprint', 'ingest']:
        engine = get_engine(bind_key)
        with engine.connect() as connection:
            context.configure(
                connection=connection,
                target_metadata=get_metadata(bind_key)
            )
            with context.begin_transaction():
                context.run_migrations()
```

### Migration Commands

```bash
# Create new migration
alembic revision --autogenerate -m "Add new column"

# Apply migrations
alembic upgrade head

# Rollback migration
alembic downgrade -1

# Show current version
alembic current

# Show migration history
alembic history
```

## Redis Data Structures

### Rate Limiting

**Key Pattern**: `rl:bucket:{scope}:{identifier}:{timestamp}` (in the examples below, the `global` scope omits the identifier segment)

**Example Keys**:
```
rl:bucket:global:1714305600
rl:bucket:app:8XaBELgH:1714305600
rl:bucket:ip:192.168.1.1:1714305600
```

**Value**: Integer (request count)

**TTL**: 25 seconds (window duration + buffer)

**Algorithm**:
```python
# Increment the bucket for the current window
bucket_key = f"rl:bucket:{scope}:{identifier}:{current_window}"
count = redis.incr(bucket_key)
redis.expire(bucket_key, 25)

# Sum counts across all buckets in the sliding window.
# GET returns None for a missing or expired bucket, so default to 0.
total = sum(
    int(redis.get(f"rl:bucket:{scope}:{identifier}:{w}") or 0)
    for w in windows
)
```

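
The bucket scheme can be exercised without a Redis server. The following sketch replaces Redis with a small in-memory stand-in and makes the window arithmetic explicit. The bucket width (5 seconds), the number of buckets summed (4), the `FakeRedis` class, and the `allow_request` function are all assumptions chosen for illustration; the source only states the 25-second TTL.

```python
import time
from typing import Dict, Optional

WINDOW_SECONDS = 5    # assumed width of one bucket
WINDOWS_IN_SLIDE = 4  # assumed number of buckets summed per check


class FakeRedis:
    """Tiny in-memory stand-in for the INCR/GET subset used below."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def get(self, key: str) -> Optional[int]:
        return self.data.get(key)


def allow_request(store, scope: str, identifier: str, limit: int,
                  now: Optional[float] = None) -> bool:
    """Count this request, then check the sliding-window total."""
    now = time.time() if now is None else now
    # Round down to the start of the current bucket
    current = int(now) // WINDOW_SECONDS * WINDOW_SECONDS
    store.incr(f"rl:bucket:{scope}:{identifier}:{current}")
    windows = [current - i * WINDOW_SECONDS for i in range(WINDOWS_IN_SLIDE)]
    total = sum(
        int(store.get(f"rl:bucket:{scope}:{identifier}:{w}") or 0)
        for w in windows
    )
    return total <= limit


store = FakeRedis()
decisions = [allow_request(store, "app", "demo", 3, now=100) for _ in range(4)]
```

With these assumed values the sliding window spans 20 seconds, so a 25-second TTL on each bucket leaves a 5-second buffer before Redis discards counts that are no longer summed.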
### Task Queue (Legacy)

**Key Pattern**: `queue:{queue_name}`

**Operations**:
```python
import json

# Push task
redis.rpush('queue:submissions', json.dumps(task_data))

# Pop task (LPOP returns None when the queue is empty)
raw = redis.lpop('queue:submissions')
task_data = json.loads(raw) if raw is not None else None
```

**Note**: Being replaced by NATS in newer deployments

### API Key Cache

**Implementation**: In-memory TTLCache (not Redis)

```python
from cachetools import TTLCache

# Up to 1000 keys, each cached for 60 seconds
api_key_cache = TTLCache(maxsize=1000, ttl=60)
```

**Purpose**: Reduce database queries for API key validation

### Backfill State

**Key Pattern**: `backfill:{index_name}:{state_key}`

**Example Keys**:
```
backfill:fingerprints:last_id
backfill:fingerprints:batch_size
backfill:fingerprints:completed
```

**Purpose**: Track progress of index backfill operations

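
One plausible use of these keys is a resumable batch loop that checkpoints `last_id` after every batch, so an interrupted backfill can continue where it stopped. The sketch below models the key-value store as a dictionary of strings. The `run_backfill` function, the default batch size of 2, and the assumption that IDs arrive sorted in ascending order are illustrative, not taken from the AcoustID code.

```python
from typing import Dict, List


def run_backfill(state: Dict[str, str], all_ids: List[int],
                 index_name: str = "fingerprints") -> List[List[int]]:
    """Process IDs above the stored checkpoint in batches (hypothetical)."""
    prefix = f"backfill:{index_name}:"
    last_id = int(state.get(prefix + "last_id", 0))
    batch_size = int(state.get(prefix + "batch_size", 2))  # assumed default
    batches = []
    while True:
        # all_ids is assumed to be sorted in ascending order
        batch = [i for i in all_ids if i > last_id][:batch_size]
        if not batch:
            state[prefix + "completed"] = "1"
            break
        batches.append(batch)  # a real job would index the batch here
        last_id = batch[-1]
        state[prefix + "last_id"] = str(last_id)  # checkpoint after each batch
    return batches


# Resume a run that had already processed IDs 1 and 2
state = {"backfill:fingerprints:last_id": "2"}
batches = run_backfill(state, [1, 2, 3, 4, 5])
```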
### Unknown MBID Cache

**Key Pattern**: `unknown_mbid:{mbid}`

**Value**: `1` (flag set when the MBID was not found in MusicBrainz)

**TTL**: 3600 seconds (1 hour)

**Purpose**: Avoid repeated MusicBrainz queries for non-existent MBIDs

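
This is a negative cache: only failed lookups are remembered, and only for a limited time. A minimal in-memory sketch of the pattern follows. The `UnknownMbidCache` class, the `mbid_exists` wrapper, and the injectable clock are hypothetical names introduced here; the production version stores the flag in Redis with a TTL instead.

```python
import time
from typing import Callable, Dict

UNKNOWN_MBID_TTL = 3600  # seconds, as listed above


class UnknownMbidCache:
    """In-memory stand-in for the Redis keys (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._expires: Dict[str, float] = {}
        self._clock = clock

    def mark_unknown(self, mbid: str) -> None:
        self._expires[f"unknown_mbid:{mbid}"] = self._clock() + UNKNOWN_MBID_TTL

    def is_unknown(self, mbid: str) -> bool:
        key = f"unknown_mbid:{mbid}"
        expires = self._expires.get(key)
        if expires is None:
            return False
        if expires <= self._clock():
            del self._expires[key]  # entry expired, forget it
            return False
        return True


def mbid_exists(mbid: str, cache: UnknownMbidCache,
                query_musicbrainz: Callable[[str], bool]) -> bool:
    """Check the negative cache before querying MusicBrainz."""
    if cache.is_unknown(mbid):
        return False  # skip the MusicBrainz query entirely
    found = query_musicbrainz(mbid)
    if not found:
        cache.mark_unknown(mbid)
    return found
```

The one-hour expiry matters because an MBID that is missing now may be added to MusicBrainz later; without it, a recording created after the first failed lookup would never be found.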
## Data Integrity

### Constraints

**Foreign Keys**:
- All foreign keys have `ON DELETE CASCADE` or `ON DELETE SET NULL`
- Orphaned records cleaned up automatically
- PostgreSQL cannot enforce foreign keys across databases, so references that span databases (for example `source.application_id`, which points into `acoustid_app`) have to be maintained by the application

**Unique Constraints**:
- Prevent duplicate fingerprints per track
- Prevent duplicate MBID links per track
- Ensure API key uniqueness

**Check Constraints**:
- Duration must be positive
- Bitrate must be positive
- Submission count must be non-negative

### Triggers

**Update Submission Count**:
```sql
CREATE TRIGGER update_fingerprint_submission_count
AFTER INSERT ON fingerprint_source
FOR EACH ROW
EXECUTE FUNCTION increment_submission_count();
```

**Track Merge Propagation**:
```sql
CREATE TRIGGER propagate_track_merge
AFTER UPDATE OF new_id ON track
FOR EACH ROW
EXECUTE FUNCTION update_merged_track_references();
```

### Indexes for Performance

**Covering Indexes**:
```sql
-- Lookup by duration and track, returning the fingerprint from the index
-- Note: B-tree index rows are size-limited (roughly 2.7 kB), so long
-- fingerprint arrays may not fit in INCLUDE
CREATE INDEX fingerprint_lookup_idx
ON fingerprint (length, track_id)
INCLUDE (fingerprint);
```

**Partial Indexes**:
```sql
-- Only index unhandled submissions
CREATE INDEX submission_unhandled_idx
ON submission (created)
WHERE handled = FALSE;
```

**GIN Indexes**:
```sql
-- Fast fingerprint array queries
CREATE INDEX fingerprint_fingerprint_idx
ON fingerprint USING GIN (fingerprint gin__int_ops);
```

## Data Lifecycle

### Fingerprint Submission

1. Insert into `submission` table (acoustid_ingest)
2. Publish to NATS queue
3. Worker processes submission
4. Insert into `fingerprint` table (acoustid_fingerprint)
5. Link to `track` (create or match)
6. Insert into `fingerprint_source` (provenance)
7. Update index via HTTP API
8. Insert into `submission_result`
9. Mark `submission.handled = TRUE`

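
The database-facing steps of this flow (4, 5, 8 and 9) can be sketched with in-memory dictionaries standing in for the tables. This is a simplified model under stated assumptions: the `Store` class and `process_submission` function are hypothetical, track matching uses exact fingerprint equality as a stand-in for similarity search, and the NATS, provenance and index-update steps are left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Store:
    """In-memory stand-ins for the tables involved (illustrative only)."""
    submissions: Dict[int, dict] = field(default_factory=dict)
    fingerprints: Dict[int, dict] = field(default_factory=dict)
    tracks: Dict[int, dict] = field(default_factory=dict)
    results: Dict[int, int] = field(default_factory=dict)  # submission -> track


def process_submission(store: Store, submission_id: int) -> int:
    sub = store.submissions[submission_id]
    # Step 5: match an existing track, else create one.
    # Exact equality is a placeholder for real similarity matching.
    track_id = next(
        (fp["track_id"] for fp in store.fingerprints.values()
         if fp["fingerprint"] == sub["fingerprint"]),
        None,
    )
    if track_id is None:
        track_id = len(store.tracks) + 1
        store.tracks[track_id] = {"disabled": False}
    # Step 4: store the fingerprint, linked to the track
    fp_id = len(store.fingerprints) + 1
    store.fingerprints[fp_id] = {
        "track_id": track_id,
        "fingerprint": sub["fingerprint"],
    }
    # Steps 8-9: record the result and mark the submission handled
    store.results[submission_id] = track_id
    sub["handled"] = True
    return track_id
```

Marking the submission as handled last means a worker that crashes midway leaves the row unhandled, so it is picked up again on the next pass; whether the real worker orders its writes this way is not confirmed by this document.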
### Track Merging

1. Identify duplicate tracks (manual or automated)
2. Set `track.new_id` to target track
3. Trigger updates all references
4. Merge fingerprints, MBIDs, metadata
5. Disable old track (`track.disabled = TRUE`)

### Data Cleanup

**Cron Jobs**:
- Delete old handled submissions (>30 days)
- Clean up orphaned metadata records
- Remove disabled tracks with no references
- Archive old statistics

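
The first cleanup rule combines two conditions: the submission must be handled, and it must be older than the retention period. A small sketch of that selection logic, using plain dictionaries in place of database rows (the `submissions_to_delete` helper is a hypothetical name, and a real job would express the same filter as a SQL `DELETE`):

```python
from datetime import datetime, timedelta
from typing import List

RETENTION = timedelta(days=30)  # retention period from the list above


def submissions_to_delete(submissions: List[dict], now: datetime) -> List[int]:
    """Return IDs of handled submissions older than the retention period."""
    cutoff = now - RETENTION
    return [
        s["id"]
        for s in submissions
        if s["handled"] and s["created"] < cutoff
    ]


now = datetime(2025, 4, 28, 12, 0, 0)
rows = [
    {"id": 1, "handled": True, "created": now - timedelta(days=45)},
    {"id": 2, "handled": False, "created": now - timedelta(days=45)},
    {"id": 3, "handled": True, "created": now - timedelta(days=5)},
]
stale_ids = submissions_to_delete(rows, now)
```

Submission 2 is old but still unhandled, so it is kept: deleting it would silently drop a fingerprint that was never processed.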
## Performance Optimization

### Query Optimization

**Materialized Views**:
```sql
-- Aggregate each table separately so the join cannot multiply
-- fingerprint rows by MBID rows and inflate the totals
CREATE MATERIALIZED VIEW track_stats AS
SELECT
    f.track_id,
    f.fingerprint_count,
    COALESCE(m.mbid_count, 0) AS mbid_count,
    f.total_submissions
FROM (
    SELECT track_id, COUNT(*) AS fingerprint_count,
           SUM(submission_count) AS total_submissions
    FROM fingerprint GROUP BY track_id
) AS f
LEFT JOIN (
    SELECT track_id, COUNT(DISTINCT mbid) AS mbid_count
    FROM track_mbid GROUP BY track_id
) AS m USING (track_id);
```

**Partitioning** (future):
```sql
-- Partition submissions by month
-- (requires submission to be declared with PARTITION BY RANGE (created))
CREATE TABLE submission_2025_04 PARTITION OF submission
FOR VALUES FROM ('2025-04-01') TO ('2025-05-01');
```

### Caching Strategy

**Application-Level**:
- API key validation (TTLCache, 60s)
- Format ID lookup (permanent cache)
- MusicBrainz MBID existence (Redis, 1h)

**Database-Level**:
- Shared buffers (PostgreSQL config)
- Connection pooling (SQLAlchemy)
- Query statistics (pg_stat_statements) to find expensive queries; the extension records execution statistics and does not cache results

### Bulk Operations

**Batch Inserts**:
```python
# Insert multiple fingerprints efficiently
db.session.bulk_insert_mappings(Fingerprint, fingerprint_dicts)
db.session.commit()
```

**Bulk Updates**:
```python
from sqlalchemy import update

# Update submission counts in batch with a single UPDATE statement
db.session.execute(
    update(Fingerprint).where(
        Fingerprint.id.in_(fingerprint_ids)
    ).values(
        submission_count=Fingerprint.submission_count + 1
    )
)
db.session.commit()
```

## Backup and Recovery

### Backup Strategy

**PostgreSQL**:
- Daily full backups (pg_dump)
- Continuous WAL archiving
- Point-in-time recovery enabled (this relies on a physical base backup plus the archived WAL; a `pg_dump` file cannot be rolled forward)

**Index**:
- Daily snapshots via `/:index/_snapshot`
- Incremental backups of Oplog
- Segment files backed up separately

### Disaster Recovery

**Database Restore**:
```bash
# Restore from dump
pg_restore -d acoustid_app acoustid_app_backup.dump

# Point-in-time recovery is not a pg_restore option: restore a physical
# base backup, then set the target in postgresql.conf and replay archived WAL
#   recovery_target_time = '2025-04-28 12:00:00'
```

**Index Rebuild**:
```bash
# Rebuild from database
python manage.py run import --rebuild-index
```