feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
@@ -0,0 +1,871 @@

# AcoustID Data Model

## Database Architecture

AcoustID uses a multi-database PostgreSQL architecture with separate databases for different concerns.

### Database Instances

| Database | Purpose | Tables | Extensions |
|----------|---------|--------|------------|
| `acoustid_app` | Application data (accounts, apps, stats) | 8 | pgcrypto |
| `acoustid_fingerprint` | Fingerprint and track data | 19 | intarray, acoustid, cube |
| `acoustid_ingest` | Submission processing | 3 | - |
| `musicbrainz` | MusicBrainz mirror (read-only) | Many | - |

### PostgreSQL Extensions

**intarray**: Integer array operations
- Used for fingerprint array queries
- Provides `&&` (overlap) and `@>` (contains) operators

**pgcrypto**: Cryptographic functions
- UUID generation (`gen_random_uuid()`)
- API key hashing

**acoustid** (custom): Fingerprint similarity functions
- `acoustid_compare(int[], int[])`: Compare two fingerprints
- `acoustid_extract_query(int[])`: Extract query terms
- Source: `acoustid-ext` C extension

**cube**: Multi-dimensional cube data type
- Used for simhash-based fingerprint indexing
- Enables fast approximate nearest neighbor search

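The two intarray operators listed above have simple set semantics. The following is a minimal Python sketch of what `&&` and `@>` compute, for illustration only; the function names `overlaps` and `contains` are hypothetical and are not part of AcoustID or PostgreSQL.

```python
def overlaps(a: list[int], b: list[int]) -> bool:
    """Mirror of intarray's `a && b`: true if the arrays share any element."""
    return not set(a).isdisjoint(b)


def contains(a: list[int], b: list[int]) -> bool:
    """Mirror of intarray's `a @> b`: true if every element of b is in a."""
    return set(b).issubset(a)


# Hypothetical stored fingerprint terms
stored = [101, 202, 303, 404]
print(overlaps(stored, [303, 999]))  # the arrays share 303
print(contains(stored, [101, 404]))  # both values are present
print(contains(stored, [101, 999]))  # 999 is missing
```

Overlap is the weaker test (one shared value is enough), which is why it suits candidate selection before a more precise comparison.
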
## Core Tables

### Account Management (acoustid_app)

#### `account`

User accounts for API access.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Account ID |
| `name` | VARCHAR(255) | NOT NULL | Display name |
| `apikey` | VARCHAR(40) | UNIQUE, NOT NULL | API key (user key) |
| `mbuser` | VARCHAR(64) | UNIQUE | MusicBrainz username |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `lastlogin` | TIMESTAMP | | Last login timestamp |
| `submission_count` | INTEGER | DEFAULT 0 | Total submissions |
| `application_id` | INTEGER | FOREIGN KEY | Default application |
| `application_version` | VARCHAR(255) | | Application version |
| `created_from` | INET | | Registration IP |
| `is_admin` | BOOLEAN | DEFAULT FALSE | Admin flag |

**Indexes**:
- `account_pkey` (PRIMARY KEY on `id`)
- `account_apikey_key` (UNIQUE on `apikey`)
- `account_mbuser_key` (UNIQUE on `mbuser`)

#### `application`

API client applications.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Application ID |
| `name` | VARCHAR(255) | NOT NULL | Application name |
| `version` | VARCHAR(255) | | Version string |
| `apikey` | VARCHAR(40) | UNIQUE, NOT NULL | API key (client key) |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `active` | BOOLEAN | DEFAULT TRUE | Active status |
| `account_id` | INTEGER | FOREIGN KEY | Owner account |
| `email` | VARCHAR(255) | | Contact email |
| `website` | VARCHAR(1000) | | Website URL |
| `rate_limit` | INTEGER | | Custom rate limit (req/s) |

**Indexes**:
- `application_pkey` (PRIMARY KEY on `id`)
- `application_apikey_key` (UNIQUE on `apikey`)

#### `account_openid`

OpenID authentication links.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `openid` | VARCHAR(255) | PRIMARY KEY | OpenID identifier |
| `account_id` | INTEGER | FOREIGN KEY | Linked account |

#### `account_google`

Google OAuth authentication links.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `google_user_id` | VARCHAR(255) | PRIMARY KEY | Google user ID |
| `account_id` | INTEGER | FOREIGN KEY | Linked account |

### Fingerprint Data (acoustid_fingerprint)

#### `track`

Unique audio tracks identified by fingerprints.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Track ID |
| `gid` | UUID | UNIQUE, NOT NULL | Public track UUID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `new_id` | INTEGER | FOREIGN KEY | Merge target (if merged) |
| `disabled` | BOOLEAN | DEFAULT FALSE | Disabled flag |

**Indexes**:
- `track_pkey` (PRIMARY KEY on `id`)
- `track_gid_key` (UNIQUE on `gid`)
- `track_new_id_idx` (on `new_id`)

**Notes**:
- `gid` is the public-facing AcoustID track ID
- `new_id` points to the track this one was merged into (for deduplication)
- Disabled tracks are excluded from search results

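Because `new_id` points at a merge target, a lookup that lands on a merged track has to follow the pointer, possibly more than once. The sketch below shows one way to do that over an in-memory mapping. The `resolve_track` helper and the sample data are hypothetical, and the cycle guard is a defensive assumption rather than something the schema guarantees.

```python
from typing import Optional


def resolve_track(new_ids: dict[int, Optional[int]], track_id: int) -> int:
    """Follow new_id pointers until reaching a track that was not merged."""
    seen = set()
    current = track_id
    while new_ids.get(current) is not None:
        if current in seen:
            raise ValueError(f"merge cycle detected at track {current}")
        seen.add(current)
        current = new_ids[current]
    return current


# Hypothetical data: track 1 was merged into 2, which was merged into 3.
merges = {1: 2, 2: 3, 3: None}
print(resolve_track(merges, 1))
```
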
#### `fingerprint`

Audio fingerprints linked to tracks.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Fingerprint ID |
| `track_id` | INTEGER | FOREIGN KEY | Linked track |
| `fingerprint` | INTEGER[] | NOT NULL | Chromaprint hash array |
| `length` | SMALLINT | NOT NULL | Duration in seconds |
| `bitrate` | SMALLINT | | Audio bitrate (kbps) |
| `format_id` | INTEGER | FOREIGN KEY | Audio format |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `fingerprint_pkey` (PRIMARY KEY on `id`)
- `fingerprint_track_id_idx` (on `track_id`)
- `fingerprint_length_idx` (on `length`)
- `fingerprint_fingerprint_idx` (GIN on `fingerprint` using `intarray`)

**Notes**:
- `fingerprint` is an array of 32-bit integers (Chromaprint hashes)
- GIN index accelerates array overlap queries, which narrow the candidates for similarity search
- `submission_count` tracks how often the fingerprint was submitted

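One practical detail when writing to an `INTEGER[]` column: PostgreSQL's `INTEGER` is a signed 32-bit type, while Chromaprint emits unsigned 32-bit hashes. The sketch below assumes the values are reinterpreted bit for bit (two's complement); the helper names are hypothetical, and whether AcoustID converts in exactly this way is not stated in this document.

```python
import struct


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit hash as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def to_unsigned32(value: int) -> int:
    """Inverse of to_signed32, for values read back from the database."""
    return struct.unpack("<I", struct.pack("<i", value))[0]


raw_hash = 0xFFFFFFFF  # largest unsigned 32-bit value
stored = to_signed32(raw_hash)
print(stored, to_unsigned32(stored) == raw_hash)
```
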
#### `fingerprint_data`

Extended fingerprint data with simhash.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `fingerprint_id` | INTEGER | PRIMARY KEY, FOREIGN KEY | Fingerprint ID |
| `fingerprint` | BYTEA | NOT NULL | Raw fingerprint data |
| `simhash` | CUBE | | Locality-sensitive hash |

**Indexes**:
- `fingerprint_data_pkey` (PRIMARY KEY on `fingerprint_id`)
- `fingerprint_data_simhash_idx` (GIST on `simhash`)

**Notes**:
- `fingerprint` stores compressed Chromaprint data
- `simhash` enables approximate nearest neighbor search
- GIST index for fast similarity queries

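To make the idea of a locality-sensitive hash concrete, here is a sketch of the generic simhash technique applied to 32-bit values: each output bit is a majority vote over the same bit of every input. This is the textbook construction, not AcoustID's implementation; how AcoustID derives its `simhash` or maps it into a `cube` value is not described here, and both function names are hypothetical.

```python
def simhash32(hashes: list[int]) -> int:
    """Bitwise majority vote over non-negative 32-bit values."""
    votes = [0] * 32
    for h in hashes:
        for bit in range(32):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    result = 0
    for bit, vote in enumerate(votes):
        if vote > 0:
            result |= 1 << bit
    return result


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest similar inputs."""
    return bin(a ^ b).count("1")


a = simhash32([0b1100, 0b1010, 0b1000])
b = simhash32([0b1100, 0b1010, 0b1001])
print(a, b, hamming(a, b))
```

Similar inputs tend to produce hashes that differ in few bits, which is what makes a distance-ordered index lookup useful as a first filter.
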
#### `track_mbid`

Links tracks to MusicBrainz recordings.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `mbid` | UUID | NOT NULL | MusicBrainz recording MBID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |
| `disabled` | BOOLEAN | DEFAULT FALSE | Disabled flag |

**Indexes**:
- `track_mbid_pkey` (PRIMARY KEY on `id`)
- `track_mbid_track_id_mbid_key` (UNIQUE on `track_id, mbid`)
- `track_mbid_mbid_idx` (on `mbid`)

**Notes**:
- Multiple MBIDs per track possible (different recordings)
- `submission_count` indicates confidence
- Disabled links excluded from results

#### `meta`

User-submitted metadata.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Metadata ID |
| `track` | VARCHAR(255) | | Track title |
| `artist` | VARCHAR(255) | | Artist name |
| `album` | VARCHAR(255) | | Album title |
| `album_artist` | VARCHAR(255) | | Album artist |
| `track_no` | INTEGER | | Track number |
| `disc_no` | INTEGER | | Disc number |
| `year` | INTEGER | | Release year |

**Indexes**:
- `meta_pkey` (PRIMARY KEY on `id`)

#### `track_meta`

Links tracks to user metadata.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `meta_id` | INTEGER | FOREIGN KEY | Metadata record |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `track_meta_pkey` (PRIMARY KEY on `id`)
- `track_meta_track_id_meta_id_key` (UNIQUE on `track_id, meta_id`)

#### `format`

Audio file formats.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Format ID |
| `name` | VARCHAR(20) | UNIQUE, NOT NULL | Format name (mp3, flac, etc.) |

**Indexes**:
- `format_pkey` (PRIMARY KEY on `id`)
- `format_name_key` (UNIQUE on `name`)

**Common Values**:
- `mp3`, `flac`, `ogg`, `m4a`, `wma`, `ape`, `wav`

#### `source`

Submission sources (applications).

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Source ID |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `account_id` | INTEGER | FOREIGN KEY | User account |
| `version` | VARCHAR(255) | | Application version |

**Indexes**:
- `source_pkey` (PRIMARY KEY on `id`)
- `source_application_id_account_id_version_key` (UNIQUE on `application_id, account_id, version`)

### Foreign IDs (acoustid_fingerprint)

#### `foreignid_vendor`

External ID providers.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Vendor ID |
| `name` | VARCHAR(255) | UNIQUE, NOT NULL | Vendor name |

**Indexes**:
- `foreignid_vendor_pkey` (PRIMARY KEY on `id`)
- `foreignid_vendor_name_key` (UNIQUE on `name`)

**Common Values**:
- `musicbrainz`, `musicip`, `discogs`, `spotify`

#### `foreignid`

External identifiers.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Foreign ID |
| `vendor_id` | INTEGER | FOREIGN KEY | Vendor |
| `name` | VARCHAR(255) | NOT NULL | External ID value |

**Indexes**:
- `foreignid_pkey` (PRIMARY KEY on `id`)
- `foreignid_vendor_id_name_key` (UNIQUE on `vendor_id, name`)

#### `track_foreignid`

Links tracks to external IDs.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `foreignid_id` | INTEGER | FOREIGN KEY | External ID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `track_foreignid_pkey` (PRIMARY KEY on `id`)
- `track_foreignid_track_id_foreignid_id_key` (UNIQUE on `track_id, foreignid_id`)

#### `track_puid`

Legacy MusicIP PUID links.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `puid` | UUID | NOT NULL | MusicIP PUID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `track_puid_pkey` (PRIMARY KEY on `id`)
- `track_puid_track_id_puid_key` (UNIQUE on `track_id, puid`)
- `track_puid_puid_idx` (on `puid`)

### Statistics (acoustid_app)

#### `stats`

General statistics.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `name` | VARCHAR(255) | NOT NULL | Stat name (unique per `date`) |
| `value` | INTEGER | NOT NULL | Stat value |
| `date` | DATE | NOT NULL | Stat date |

**Indexes**:
- `stats_pkey` (PRIMARY KEY on `id`)
- `stats_name_date_key` (UNIQUE on `name, date`)

**Common Stats**:
- `lookup.count`, `submission.count`, `track.count`, `fingerprint.count`

#### `stats_lookups`

Lookup statistics by hour.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `hour` | TIMESTAMP | NOT NULL | Hour timestamp |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `count_hits` | INTEGER | DEFAULT 0 | Successful lookups |
| `count_misses` | INTEGER | DEFAULT 0 | Failed lookups |

**Indexes**:
- `stats_lookups_pkey` (PRIMARY KEY on `id`)
- `stats_lookups_hour_application_id_key` (UNIQUE on `hour, application_id`)

#### `stats_user_agents`

User agent statistics.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `date` | DATE | NOT NULL | Date |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `user_agent` | VARCHAR(1000) | NOT NULL | User agent string |
| `ip` | INET | NOT NULL | IP address |
| `count` | INTEGER | DEFAULT 0 | Request count |

**Indexes**:
- `stats_user_agents_pkey` (PRIMARY KEY on `id`)
- `stats_user_agents_date_application_id_user_agent_ip_key` (UNIQUE on `date, application_id, user_agent, ip`)

#### `stats_top_accounts`

Top submitter accounts.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `account_id` | INTEGER | FOREIGN KEY | Account |
| `count` | INTEGER | NOT NULL | Submission count |

**Indexes**:
- `stats_top_accounts_pkey` (PRIMARY KEY on `id`)
- `stats_top_accounts_account_id_key` (UNIQUE on `account_id`)

### Submission Processing (acoustid_ingest)

#### `submission`

Pending fingerprint submissions.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Submission ID |
| `fingerprint` | INTEGER[] | NOT NULL | Chromaprint hash array |
| `length` | SMALLINT | NOT NULL | Duration in seconds |
| `bitrate` | SMALLINT | | Audio bitrate |
| `format_id` | INTEGER | | Audio format |
| `created` | TIMESTAMP | NOT NULL | Submission timestamp |
| `source_id` | INTEGER | FOREIGN KEY | Submission source |
| `mbid` | UUID | | MusicBrainz MBID (if provided) |
| `handled` | BOOLEAN | DEFAULT FALSE | Processing status |
| `meta_id` | INTEGER | FOREIGN KEY | User metadata |

**Indexes**:
- `submission_pkey` (PRIMARY KEY on `id`)
- `submission_handled_idx` (on `handled` WHERE `handled = FALSE`)

**Notes**:
- Worker processes unhandled submissions
- `handled = TRUE` after processing

#### `submission_result`

Processing results for submissions.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Result ID |
| `submission_id` | INTEGER | FOREIGN KEY | Submission |
| `track_id` | INTEGER | FOREIGN KEY | Matched/created track |
| `created` | TIMESTAMP | NOT NULL | Processing timestamp |

**Indexes**:
- `submission_result_pkey` (PRIMARY KEY on `id`)
- `submission_result_submission_id_key` (UNIQUE on `submission_id`)

#### `pending_submission`

Queue for async submission processing.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Queue ID |
| `submission_id` | INTEGER | FOREIGN KEY | Submission |
| `created` | TIMESTAMP | NOT NULL | Queue timestamp |

**Indexes**:
- `pending_submission_pkey` (PRIMARY KEY on `id`)
- `pending_submission_submission_id_key` (UNIQUE on `submission_id`)

**Notes**:
- Replaced by NATS queue in newer deployments
- Legacy table, may be deprecated

### Provenance Tables (acoustid_fingerprint)

Track data lineage and changes.

#### `fingerprint_source`

Links fingerprints to submission sources.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `fingerprint_id` | INTEGER | FOREIGN KEY | Fingerprint |
| `source_id` | INTEGER | FOREIGN KEY | Source |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |

#### `track_mbid_source`

Links track-MBID associations to sources.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_mbid_id` | INTEGER | FOREIGN KEY | Track-MBID link |
| `source_id` | INTEGER | FOREIGN KEY | Source |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |

#### `track_mbid_change`

Audit log for track-MBID changes.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Change ID |
| `track_mbid_id` | INTEGER | FOREIGN KEY | Track-MBID link |
| `account_id` | INTEGER | FOREIGN KEY | Account that made change |
| `disabled` | BOOLEAN | NOT NULL | New disabled status |
| `created` | TIMESTAMP | NOT NULL | Change timestamp |
| `note` | TEXT | | Change reason |

## ORM Layer (SQLAlchemy)

### Multi-Database Configuration

**File**: `acoustid/db.py`

```python
# Database bind keys
BIND_KEYS = {
    'app': 'acoustid_app',
    'fingerprint': 'acoustid_fingerprint',
    'ingest': 'acoustid_ingest',
    'musicbrainz': 'musicbrainz'
}
```

**Model Binding**:

```python
class Account(Base):
    __bind_key__ = 'app'
    __tablename__ = 'account'
    # ...

class Track(Base):
    __bind_key__ = 'fingerprint'
    __tablename__ = 'track'
    # ...
```

### Connection Pooling

**Configuration** (`acoustid.conf`):

```ini
[database]
name = acoustid_app
user = acoustid
password_file = /run/secrets/db_password
host = postgres
port = 5432
pool_size = 20
pool_recycle = 3600
```

**Pool Settings**:
- `pool_size`: Number of connections kept open in the pool per process (SQLAlchemy can open additional temporary connections up to `max_overflow`)
- `pool_recycle`: Recycle connections after N seconds
- `pool_pre_ping`: Test connections before use (not set in the example above)

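As an illustration of how the `[database]` section above could be turned into engine settings, the sketch below parses the same keys with the standard library and builds a connection URL. The `load_db_settings` helper and the sample password are hypothetical; in a real deployment the password would be read from the file named by `password_file`, and the resulting `pool_size` and `pool_recycle` values would be passed as keyword arguments to SQLAlchemy's `create_engine`.

```python
import configparser
from urllib.parse import quote_plus

SAMPLE = """
[database]
name = acoustid_app
user = acoustid
host = postgres
port = 5432
pool_size = 20
pool_recycle = 3600
"""


def load_db_settings(text: str, password: str) -> tuple[str, dict[str, int]]:
    """Return (connection URL, pool keyword arguments) from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    db = parser["database"]
    # Percent-encode credentials so characters such as '/' or ':' stay valid
    url = (
        f"postgresql://{quote_plus(db['user'])}:{quote_plus(password)}"
        f"@{db['host']}:{db.getint('port')}/{db['name']}"
    )
    pool = {
        "pool_size": db.getint("pool_size"),
        "pool_recycle": db.getint("pool_recycle"),
    }
    return url, pool


url, pool = load_db_settings(SAMPLE, password="s3cr3t/with:chars")
print(url)
print(pool)
```
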
### Query Patterns

**Fingerprint Search** (legacy, pre-index):

```python
from sqlalchemy import func

# Find similar fingerprints using intarray overlap
query = db.session.query(Fingerprint).filter(
    Fingerprint.fingerprint.op('&&')(query_fingerprint),
    Fingerprint.length.between(duration - 5, duration + 5)
).order_by(
    func.acoustid_compare(Fingerprint.fingerprint, query_fingerprint).desc()
).limit(10)
```

**Track Lookup with MBIDs**:

```python
from sqlalchemy.orm import joinedload

# Fetch track with all linked MBIDs
track = db.session.query(Track).options(
    joinedload(Track.mbids)
).filter(Track.gid == track_gid).first()
```

**Submission Processing**:

```python
# Find unhandled submissions, oldest first
submissions = db.session.query(Submission).filter(
    Submission.handled.is_(False)
).order_by(Submission.created).limit(100).all()
```

## Database Migrations

### Alembic Configuration

**File**: `alembic.ini`

**Migration Directories**:
- `alembic/versions/app/`: acoustid_app migrations
- `alembic/versions/fingerprint/`: acoustid_fingerprint migrations
- `alembic/versions/ingest/`: acoustid_ingest migrations

**Multi-Database Support**:

```python
# alembic/env.py
from alembic import context

def run_migrations_online():
    # get_engine and get_metadata are project helpers not shown here
    for bind_key in ['app', 'fingerprint', 'ingest']:
        engine = get_engine(bind_key)
        with engine.connect() as connection:
            context.configure(
                connection=connection,
                target_metadata=get_metadata(bind_key)
            )
            with context.begin_transaction():
                context.run_migrations()
```

### Migration Commands

```bash
# Create new migration
alembic revision --autogenerate -m "Add new column"

# Apply migrations
alembic upgrade head

# Rollback migration
alembic downgrade -1

# Show current version
alembic current

# Show migration history
alembic history
```

## Redis Data Structures

### Rate Limiting

**Key Pattern**: `rl:bucket:{scope}:{identifier}:{timestamp}` (the `global` scope has no identifier)

**Example Keys**:
```
rl:bucket:global:1714305600
rl:bucket:app:8XaBELgH:1714305600
rl:bucket:ip:192.168.1.1:1714305600
```

**Value**: Integer (request count)
**TTL**: 25 seconds (window duration + buffer)

**Algorithm**:
```python
# Increment bucket for current window
bucket_key = f"rl:bucket:{scope}:{identifier}:{current_window}"
count = redis.incr(bucket_key)
redis.expire(bucket_key, 25)

# Sum counts across all windows in the sliding window.
# GET returns None for a missing bucket, so default to 0 before converting.
total = sum(
    int(redis.get(f"rl:bucket:{scope}:{identifier}:{w}") or 0)
    for w in windows
)
```

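The bucket scheme above can be exercised without a Redis server. The sketch below replaces Redis with a plain dictionary to show the counting logic end to end. The class name, the one-second bucket size, the ten-bucket window, and the limit are all assumptions for illustration, and expiry of old buckets (handled by the TTL in Redis) is left out.

```python
import time
from typing import Optional


class SlidingWindowLimiter:
    """In-memory stand-in for the Redis bucket counters (no key expiry)."""

    def __init__(self, limit: int, bucket_seconds: int = 1, window_buckets: int = 10):
        self.limit = limit
        self.bucket_seconds = bucket_seconds
        self.window_buckets = window_buckets
        self.buckets: dict[str, int] = {}

    def _key(self, scope: str, identifier: str, bucket: int) -> str:
        return f"rl:bucket:{scope}:{identifier}:{bucket}"

    def allow(self, scope: str, identifier: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        current = int(now // self.bucket_seconds)
        key = self._key(scope, identifier, current)
        self.buckets[key] = self.buckets.get(key, 0) + 1  # equivalent of INCR
        # Sum the current bucket and the preceding ones inside the window
        total = sum(
            self.buckets.get(self._key(scope, identifier, b), 0)
            for b in range(current - self.window_buckets + 1, current + 1)
        )
        return total <= self.limit


limiter = SlidingWindowLimiter(limit=3)
results = [limiter.allow("app", "8XaBELgH", now=1000.0) for _ in range(4)]
print(results)
```

Counting the request before checking the limit matches the `INCR`-then-sum order above, so rejected requests still consume quota inside the window.
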
### Task Queue (Legacy)

**Key Pattern**: `queue:{queue_name}`

**Operations**:
```python
# Push task
redis.rpush('queue:submissions', json.dumps(task_data))

# Pop task
task_data = redis.lpop('queue:submissions')
```

**Note**: Being replaced by NATS in newer deployments

### API Key Cache

**Implementation**: In-memory TTLCache (not Redis)

```python
from cachetools import TTLCache

api_key_cache = TTLCache(maxsize=1000, ttl=60)
```

**Purpose**: Reduce database queries for API key validation

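To show what a 60-second TTL buys, here is a standard-library sketch of the same caching pattern: a lookup that calls the database loader only when the cached entry is missing or expired. The `TTLLookup` class, the injected clock, and the loader are hypothetical stand-ins; unlike `cachetools.TTLCache`, this sketch does not enforce a maximum size.

```python
import time
from typing import Callable, Optional


class TTLLookup:
    """Cache loader results for `ttl` seconds, keyed by API key."""

    def __init__(self, loader: Callable[[str], Optional[int]], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.loader = loader
        self.ttl = ttl
        self.clock = clock
        self.entries: dict[str, tuple[float, Optional[int]]] = {}

    def get(self, api_key: str) -> Optional[int]:
        now = self.clock()
        hit = self.entries.get(api_key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no database query
        value = self.loader(api_key)
        self.entries[api_key] = (now + self.ttl, value)
        return value


db_calls = []


def fake_db_lookup(api_key: str) -> Optional[int]:
    # Hypothetical loader standing in for a query on the application table
    db_calls.append(api_key)
    return 42


fake_time = [0.0]
cache = TTLLookup(fake_db_lookup, ttl=60.0, clock=lambda: fake_time[0])
cache.get("8XaBELgH")
cache.get("8XaBELgH")
print(len(db_calls))
```
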
### Backfill State

**Key Pattern**: `backfill:{index_name}:{state_key}`

**Example Keys**:
```
backfill:fingerprints:last_id
backfill:fingerprints:batch_size
backfill:fingerprints:completed
```

**Purpose**: Track progress of index backfill operations

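The three example keys suggest a resumable loop: read the last processed ID, handle one batch, write the checkpoint, and repeat. A minimal sketch of that pattern follows, with a dictionary standing in for Redis. The `run_backfill` function, the default batch size, and the value encodings are assumptions, not AcoustID's actual backfill code.

```python
def run_backfill(state: dict[str, str], rows: list[int], index: list[int]) -> None:
    """Index rows in id order, checkpointing after every batch."""
    prefix = "backfill:fingerprints"
    batch_size = int(state.get(f"{prefix}:batch_size", "2"))
    last_id = int(state.get(f"{prefix}:last_id", "0"))
    while True:
        # Equivalent of: WHERE id > last_id ORDER BY id LIMIT batch_size
        batch = [r for r in sorted(rows) if r > last_id][:batch_size]
        if not batch:
            state[f"{prefix}:completed"] = "1"
            return
        index.extend(batch)
        last_id = batch[-1]
        state[f"{prefix}:last_id"] = str(last_id)


# Resume a run that had already processed ids 1 and 2
state = {"backfill:fingerprints:last_id": "2"}
indexed: list[int] = []
run_backfill(state, rows=[1, 2, 3, 4, 5], index=indexed)
print(indexed, state)
```

Writing `last_id` after each batch means a crash repeats at most one batch on restart.
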
### Unknown MBID Cache

**Key Pattern**: `unknown_mbid:{mbid}`

**Value**: Boolean (1 if MBID not found in MusicBrainz)
**TTL**: 3600 seconds (1 hour)

**Purpose**: Avoid repeated MusicBrainz queries for non-existent MBIDs

## Data Integrity

### Constraints

**Foreign Keys**:
- Foreign keys within a database use `ON DELETE CASCADE` or `ON DELETE SET NULL`, so dependent rows are removed or detached automatically
- PostgreSQL cannot enforce foreign keys across databases, so references that cross a database boundary (for example `source.account_id` in `acoustid_fingerprint` pointing at `account` in `acoustid_app`) are logical only and must be kept consistent by the application

**Unique Constraints**:
- Prevent duplicate fingerprints per track
- Prevent duplicate MBID links per track
- Ensure API key uniqueness

**Check Constraints**:
- Duration must be positive
- Bitrate must be positive
- Submission count must be non-negative

### Triggers

**Update Submission Count**:
```sql
CREATE TRIGGER update_fingerprint_submission_count
    AFTER INSERT ON fingerprint_source
    FOR EACH ROW
    EXECUTE FUNCTION increment_submission_count();
```

**Track Merge Propagation**:
```sql
CREATE TRIGGER propagate_track_merge
    AFTER UPDATE OF new_id ON track
    FOR EACH ROW
    EXECUTE FUNCTION update_merged_track_references();
```

### Indexes for Performance

**Covering Indexes**:
```sql
-- Lookup by duration, returning the fingerprint from the index
-- Note: INCLUDE columns count toward the B-tree index row size limit,
-- so long fingerprint arrays may not fit
CREATE INDEX fingerprint_lookup_idx
    ON fingerprint (length, track_id)
    INCLUDE (fingerprint);
```

**Partial Indexes**:
```sql
-- Only index unhandled submissions
CREATE INDEX submission_unhandled_idx
    ON submission (created)
    WHERE handled = FALSE;
```

**GIN Indexes**:
```sql
-- Fast fingerprint array queries
CREATE INDEX fingerprint_fingerprint_idx
    ON fingerprint USING GIN (fingerprint gin__int_ops);
```

## Data Lifecycle

### Fingerprint Submission

1. Insert into `submission` table (acoustid_ingest)
2. Publish to NATS queue
3. Worker processes submission
4. Insert into `fingerprint` table (acoustid_fingerprint)
5. Link to `track` (create or match)
6. Insert into `fingerprint_source` (provenance)
7. Update index via HTTP API
8. Insert into `submission_result`
9. Mark `submission.handled = TRUE`

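The worker-side steps above can be sketched with in-memory dictionaries in place of the three databases. Everything here is a simplified stand-in: `Stores` and `process_submission` are hypothetical names, matching uses exact equality instead of fingerprint similarity, and the NATS publish (step 2) and index update (step 7) are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Stores:
    submissions: dict[int, dict] = field(default_factory=dict)   # acoustid_ingest
    fingerprints: dict[int, dict] = field(default_factory=dict)  # acoustid_fingerprint
    tracks: dict[int, dict] = field(default_factory=dict)
    fingerprint_sources: list[tuple[int, int]] = field(default_factory=list)
    results: dict[int, int] = field(default_factory=dict)        # submission_id -> track_id


def process_submission(stores: Stores, submission_id: int) -> int:
    sub = stores.submissions[submission_id]
    if sub["handled"]:
        return stores.results[submission_id]  # already processed
    # Steps 4-5: match an existing fingerprint or create fingerprint and track
    fp_id = next((fid for fid, fp in stores.fingerprints.items()
                  if fp["fingerprint"] == sub["fingerprint"]), None)
    if fp_id is None:
        track_id = len(stores.tracks) + 1
        stores.tracks[track_id] = {"disabled": False}
        fp_id = len(stores.fingerprints) + 1
        stores.fingerprints[fp_id] = {"track_id": track_id,
                                      "fingerprint": sub["fingerprint"],
                                      "submission_count": 0}
    fp = stores.fingerprints[fp_id]
    fp["submission_count"] += 1
    # Step 6: provenance link
    stores.fingerprint_sources.append((fp_id, sub["source_id"]))
    # Steps 8-9: record the result and mark the submission handled
    stores.results[submission_id] = fp["track_id"]
    sub["handled"] = True
    return fp["track_id"]


stores = Stores()
stores.submissions[1] = {"fingerprint": [1, 2, 3], "source_id": 7, "handled": False}
stores.submissions[2] = {"fingerprint": [1, 2, 3], "source_id": 8, "handled": False}
print(process_submission(stores, 1), process_submission(stores, 2))
```

Checking `handled` first makes the function safe to call again if a message is redelivered.
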
### Track Merging

1. Identify duplicate tracks (manual or automated)
2. Set `track.new_id` to target track
3. Trigger updates all references
4. Merge fingerprints, MBIDs, metadata
5. Disable old track (`track.disabled = TRUE`)

### Data Cleanup

**Cron Jobs**:
- Delete old handled submissions (>30 days)
- Clean up orphaned metadata records
- Remove disabled tracks with no references
- Archive old statistics

## Performance Optimization

### Query Optimization

**Materialized Views**:
```sql
-- Counts are computed separately so the join cannot inflate the sum
CREATE MATERIALIZED VIEW track_stats AS
SELECT
    f.track_id,
    COUNT(*) AS fingerprint_count,
    (SELECT COUNT(*) FROM track_mbid tm
      WHERE tm.track_id = f.track_id) AS mbid_count,
    SUM(f.submission_count) AS total_submissions
FROM fingerprint f
GROUP BY f.track_id;
```

**Partitioning** (future):
```sql
-- Partition submissions by month
-- Requires `submission` to be declared with PARTITION BY RANGE (presumably on `created`)
CREATE TABLE submission_2025_04 PARTITION OF submission
    FOR VALUES FROM ('2025-04-01') TO ('2025-05-01');
```

### Caching Strategy

**Application-Level**:
- API key validation (TTLCache, 60s)
- Format ID lookup (permanent cache)
- MusicBrainz MBID existence (Redis, 1h)

**Database-Level**:
- Shared buffers (PostgreSQL config)
- Connection pooling (SQLAlchemy)
- Query statistics for tuning (pg_stat_statements); PostgreSQL itself does not cache query results

### Bulk Operations

**Batch Inserts**:
```python
# Insert multiple fingerprints efficiently
db.session.bulk_insert_mappings(Fingerprint, fingerprint_dicts)
db.session.commit()
```

**Bulk Updates**:
```python
from sqlalchemy import update

# Update submission counts in batch
db.session.execute(
    update(Fingerprint).where(
        Fingerprint.id.in_(fingerprint_ids)
    ).values(
        submission_count=Fingerprint.submission_count + 1
    )
)
db.session.commit()
```

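When the list of IDs is large, it can help to split it into fixed-size chunks so that each `IN (...)` clause and each transaction stays bounded. The `batched` helper below is a hypothetical sketch of that idea, and the chunk size is an assumption to tune per workload.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable[int], size: int) -> Iterator[list[int]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


fingerprint_ids = range(5)
for chunk in batched(fingerprint_ids, 2):
    # Each chunk would be passed to one bulk UPDATE like the one above
    print(chunk)
```
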
## Backup and Recovery

### Backup Strategy

**PostgreSQL**:
- Daily full backups (pg_dump)
- Continuous WAL archiving
- Point-in-time recovery enabled (this requires a physical base backup, e.g. from `pg_basebackup`, in addition to the archived WAL; a `pg_dump` file cannot be rolled forward)

**Index**:
- Daily snapshots via `/:index/_snapshot`
- Incremental backups of Oplog
- Segment files backed up separately

### Disaster Recovery

**Database Restore**:
```bash
# Restore from dump
pg_restore -d acoustid_app acoustid_app_backup.dump

# Point-in-time recovery (pg_restore has no target-time option):
# restore a physical base backup into $PGDATA, then set in postgresql.conf
#   restore_command = 'cp /path/to/wal_archive/%f %p'   # placeholder path
#   recovery_target_time = '2025-04-28 12:00:00'
# and create recovery.signal before starting PostgreSQL (version 12+)
touch "$PGDATA/recovery.signal"
```

**Index Rebuild**:
```bash
# Rebuild from database
python manage.py run import --rebuild-index
```