# AcoustID Data Model

## Database Architecture

AcoustID uses a multi-database PostgreSQL architecture with separate databases for different concerns.

### Database Instances

| Database | Purpose | Tables | Extensions |
|----------|---------|--------|------------|
| `acoustid_app` | Application data (accounts, apps, stats) | 8 | pgcrypto |
| `acoustid_fingerprint` | Fingerprint and track data | 19 | intarray, acoustid, cube |
| `acoustid_ingest` | Submission processing | 3 | - |
| `musicbrainz` | MusicBrainz mirror (read-only) | Many | - |

### PostgreSQL Extensions

**intarray**: Integer array operations
- Used for fingerprint array queries
- Provides `&&` (overlap) and `@>` (contains) operators

**pgcrypto**: Cryptographic functions
- UUID generation (`gen_random_uuid()`)
- API key hashing

**acoustid** (custom): Fingerprint similarity functions
- `acoustid_compare(int[], int[])`: Compare two fingerprints
- `acoustid_extract_query(int[])`: Extract query terms
- Source: `acoustid-ext` C extension

**cube**: Multi-dimensional cube data type
- Used for simhash-based fingerprint indexing
- Enables fast approximate nearest neighbor search

## Core Tables

### Account Management (acoustid_app)

#### `account`

User accounts for API access.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Account ID |
| `name` | VARCHAR(255) | NOT NULL | Display name |
| `apikey` | VARCHAR(40) | UNIQUE, NOT NULL | API key (user key) |
| `mbuser` | VARCHAR(64) | UNIQUE | MusicBrainz username |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `lastlogin` | TIMESTAMP | | Last login timestamp |
| `submission_count` | INTEGER | DEFAULT 0 | Total submissions |
| `application_id` | INTEGER | FOREIGN KEY | Default application |
| `application_version` | VARCHAR(255) | | Application version |
| `created_from` | INET | | Registration IP |
| `is_admin` | BOOLEAN | DEFAULT FALSE | Admin flag |

**Indexes**:
- `account_pkey` (PRIMARY KEY on `id`)
- `account_apikey_key` (UNIQUE on `apikey`)
- `account_mbuser_key` (UNIQUE on `mbuser`)

#### `application`

API client applications.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Application ID |
| `name` | VARCHAR(255) | NOT NULL | Application name |
| `version` | VARCHAR(255) | | Version string |
| `apikey` | VARCHAR(40) | UNIQUE, NOT NULL | API key (client key) |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `active` | BOOLEAN | DEFAULT TRUE | Active status |
| `account_id` | INTEGER | FOREIGN KEY | Owner account |
| `email` | VARCHAR(255) | | Contact email |
| `website` | VARCHAR(1000) | | Website URL |
| `rate_limit` | INTEGER | | Custom rate limit (req/s) |

**Indexes**:
- `application_pkey` (PRIMARY KEY on `id`)
- `application_apikey_key` (UNIQUE on `apikey`)

#### `account_openid`

OpenID authentication links.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `openid` | VARCHAR(255) | PRIMARY KEY | OpenID identifier |
| `account_id` | INTEGER | FOREIGN KEY | Linked account |

#### `account_google`

Google OAuth authentication links.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `google_user_id` | VARCHAR(255) | PRIMARY KEY | Google user ID |
| `account_id` | INTEGER | FOREIGN KEY | Linked account |

### Fingerprint Data (acoustid_fingerprint)

#### `track`

Unique audio tracks identified by fingerprints.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Track ID |
| `gid` | UUID | UNIQUE, NOT NULL | Public track UUID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `new_id` | INTEGER | FOREIGN KEY | Merge target (if merged) |
| `disabled` | BOOLEAN | DEFAULT FALSE | Disabled flag |

**Indexes**:
- `track_pkey` (PRIMARY KEY on `id`)
- `track_gid_key` (UNIQUE on `gid`)
- `track_new_id_idx` (on `new_id`)

**Notes**:
- `gid` is the public-facing AcoustID track ID
- `new_id` points to merged track (for deduplication)
- Disabled tracks excluded from search results

#### `fingerprint`

Audio fingerprints linked to tracks.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Fingerprint ID |
| `track_id` | INTEGER | FOREIGN KEY | Linked track |
| `fingerprint` | INTEGER[] | NOT NULL | Chromaprint hash array |
| `length` | SMALLINT | NOT NULL | Duration in seconds |
| `bitrate` | SMALLINT | | Audio bitrate (kbps) |
| `format_id` | INTEGER | FOREIGN KEY | Audio format |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `fingerprint_pkey` (PRIMARY KEY on `id`)
- `fingerprint_track_id_idx` (on `track_id`)
- `fingerprint_length_idx` (on `length`)
- `fingerprint_fingerprint_idx` (GIN on `fingerprint` using `intarray`)

**Notes**:
- `fingerprint` is an array of 32-bit integers (Chromaprint hashes)
- GIN index enables fast similarity search
- `submission_count` tracks popularity

#### `fingerprint_data`

Extended fingerprint data with simhash.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `fingerprint_id` | INTEGER | PRIMARY KEY, FOREIGN KEY | Fingerprint ID |
| `fingerprint` | BYTEA | NOT NULL | Raw fingerprint data |
| `simhash` | CUBE | | Locality-sensitive hash |

**Indexes**:
- `fingerprint_data_pkey` (PRIMARY KEY on `fingerprint_id`)
- `fingerprint_data_simhash_idx` (GIST on `simhash`)

**Notes**:
- `fingerprint` stores compressed Chromaprint data
- `simhash` enables approximate nearest neighbor search
- GIST index for fast similarity queries

#### `track_mbid`

Links tracks to MusicBrainz recordings.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `mbid` | UUID | NOT NULL | MusicBrainz recording MBID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |
| `disabled` | BOOLEAN | DEFAULT FALSE | Disabled flag |

**Indexes**:
- `track_mbid_pkey` (PRIMARY KEY on `id`)
- `track_mbid_track_id_mbid_key` (UNIQUE on `track_id, mbid`)
- `track_mbid_mbid_idx` (on `mbid`)

**Notes**:
- Multiple MBIDs per track possible (different recordings)
- `submission_count` indicates confidence
- Disabled links excluded from results

#### `meta`

User-submitted metadata.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Metadata ID |
| `track` | VARCHAR(255) | | Track title |
| `artist` | VARCHAR(255) | | Artist name |
| `album` | VARCHAR(255) | | Album title |
| `album_artist` | VARCHAR(255) | | Album artist |
| `track_no` | INTEGER | | Track number |
| `disc_no` | INTEGER | | Disc number |
| `year` | INTEGER | | Release year |

**Indexes**:
- `meta_pkey` (PRIMARY KEY on `id`)

#### `track_meta`

Links tracks to user metadata.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `meta_id` | INTEGER | FOREIGN KEY | Metadata record |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `track_meta_pkey` (PRIMARY KEY on `id`)
- `track_meta_track_id_meta_id_key` (UNIQUE on `track_id, meta_id`)

#### `format`

Audio file formats.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Format ID |
| `name` | VARCHAR(20) | UNIQUE, NOT NULL | Format name (mp3, flac, etc.) |

**Indexes**:
- `format_pkey` (PRIMARY KEY on `id`)
- `format_name_key` (UNIQUE on `name`)

**Common Values**:
- `mp3`, `flac`, `ogg`, `m4a`, `wma`, `ape`, `wav`

#### `source`

Submission sources (applications).

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Source ID |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `account_id` | INTEGER | FOREIGN KEY | User account |
| `version` | VARCHAR(255) | | Application version |

**Indexes**:
- `source_pkey` (PRIMARY KEY on `id`)
- `source_application_id_account_id_version_key` (UNIQUE on `application_id, account_id, version`)

### Foreign IDs (acoustid_fingerprint)

#### `foreignid_vendor`

External ID providers.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Vendor ID |
| `name` | VARCHAR(255) | UNIQUE, NOT NULL | Vendor name |

**Indexes**:
- `foreignid_vendor_pkey` (PRIMARY KEY on `id`)
- `foreignid_vendor_name_key` (UNIQUE on `name`)

**Common Values**:
- `musicbrainz`, `musicip`, `discogs`, `spotify`

#### `foreignid`

External identifiers.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Foreign ID |
| `vendor_id` | INTEGER | FOREIGN KEY | Vendor |
| `name` | VARCHAR(255) | NOT NULL | External ID value |

**Indexes**:
- `foreignid_pkey` (PRIMARY KEY on `id`)
- `foreignid_vendor_id_name_key` (UNIQUE on `vendor_id, name`)

#### `track_foreignid`

Links tracks to external IDs.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `foreignid_id` | INTEGER | FOREIGN KEY | External ID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `track_foreignid_pkey` (PRIMARY KEY on `id`)
- `track_foreignid_track_id_foreignid_id_key` (UNIQUE on `track_id, foreignid_id`)

#### `track_puid`

Legacy MusicIP PUID links.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `puid` | UUID | NOT NULL | MusicIP PUID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

**Indexes**:
- `track_puid_pkey` (PRIMARY KEY on `id`)
- `track_puid_track_id_puid_key` (UNIQUE on `track_id, puid`)
- `track_puid_puid_idx` (on `puid`)

### Statistics (acoustid_app)

#### `stats`

General statistics.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `name` | VARCHAR(255) | UNIQUE, NOT NULL | Stat name |
| `value` | INTEGER | NOT NULL | Stat value |
| `date` | DATE | NOT NULL | Stat date |

**Indexes**:
- `stats_pkey` (PRIMARY KEY on `id`)
- `stats_name_date_key` (UNIQUE on `name, date`)

**Common Stats**:
- `lookup.count`, `submission.count`, `track.count`, `fingerprint.count`

#### `stats_lookups`

Lookup statistics by hour.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `hour` | TIMESTAMP | NOT NULL | Hour timestamp |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `count_hits` | INTEGER | DEFAULT 0 | Successful lookups |
| `count_misses` | INTEGER | DEFAULT 0 | Failed lookups |

**Indexes**:
- `stats_lookups_pkey` (PRIMARY KEY on `id`)
- `stats_lookups_hour_application_id_key` (UNIQUE on `hour, application_id`)

#### `stats_user_agents`

User agent statistics.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `date` | DATE | NOT NULL | Date |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `user_agent` | VARCHAR(1000) | NOT NULL | User agent string |
| `ip` | INET | NOT NULL | IP address |
| `count` | INTEGER | DEFAULT 0 | Request count |

**Indexes**:
- `stats_user_agents_pkey` (PRIMARY KEY on `id`)
- `stats_user_agents_date_application_id_user_agent_ip_key` (UNIQUE on `date, application_id, user_agent, ip`)

#### `stats_top_accounts`

Top submitter accounts.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `account_id` | INTEGER | FOREIGN KEY | Account |
| `count` | INTEGER | NOT NULL | Submission count |

**Indexes**:
- `stats_top_accounts_pkey` (PRIMARY KEY on `id`)
- `stats_top_accounts_account_id_key` (UNIQUE on `account_id`)

### Submission Processing (acoustid_ingest)

#### `submission`

Pending fingerprint submissions.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Submission ID |
| `fingerprint` | INTEGER[] | NOT NULL | Chromaprint hash array |
| `length` | SMALLINT | NOT NULL | Duration in seconds |
| `bitrate` | SMALLINT | | Audio bitrate |
| `format_id` | INTEGER | | Audio format |
| `created` | TIMESTAMP | NOT NULL | Submission timestamp |
| `source_id` | INTEGER | FOREIGN KEY | Submission source |
| `mbid` | UUID | | MusicBrainz MBID (if provided) |
| `handled` | BOOLEAN | DEFAULT FALSE | Processing status |
| `meta_id` | INTEGER | FOREIGN KEY | User metadata |

**Indexes**:
- `submission_pkey` (PRIMARY KEY on `id`)
- `submission_handled_idx` (on `handled` WHERE `handled = FALSE`)

**Notes**:
- Worker processes unhandled submissions
- `handled = TRUE` after processing

#### `submission_result`

Processing results for submissions.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Result ID |
| `submission_id` | INTEGER | FOREIGN KEY | Submission |
| `track_id` | INTEGER | FOREIGN KEY | Matched/created track |
| `created` | TIMESTAMP | NOT NULL | Processing timestamp |

**Indexes**:
- `submission_result_pkey` (PRIMARY KEY on `id`)
- `submission_result_submission_id_key` (UNIQUE on `submission_id`)

#### `pending_submission`

Queue for async submission processing.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Queue ID |
| `submission_id` | INTEGER | FOREIGN KEY | Submission |
| `created` | TIMESTAMP | NOT NULL | Queue timestamp |

**Indexes**:
- `pending_submission_pkey` (PRIMARY KEY on `id`)
- `pending_submission_submission_id_key` (UNIQUE on `submission_id`)

**Notes**:
- Replaced by NATS queue in newer deployments
- Legacy table, may be deprecated

### Provenance Tables (acoustid_fingerprint)

Track data lineage and changes.

#### `fingerprint_source`

Links fingerprints to submission sources.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `fingerprint_id` | INTEGER | FOREIGN KEY | Fingerprint |
| `source_id` | INTEGER | FOREIGN KEY | Source |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |

#### `track_mbid_source`

Links track-MBID associations to sources.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_mbid_id` | INTEGER | FOREIGN KEY | Track-MBID link |
| `source_id` | INTEGER | FOREIGN KEY | Source |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |

#### `track_mbid_change`

Audit log for track-MBID changes.

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `id` | SERIAL | PRIMARY KEY | Change ID |
| `track_mbid_id` | INTEGER | FOREIGN KEY | Track-MBID link |
| `account_id` | INTEGER | FOREIGN KEY | Account that made change |
| `disabled` | BOOLEAN | NOT NULL | New disabled status |
| `created` | TIMESTAMP | NOT NULL | Change timestamp |
| `note` | TEXT | | Change reason |

## ORM Layer (SQLAlchemy)

### Multi-Database Configuration

**File**: `acoustid/db.py`

```python
# Database bind keys
BIND_KEYS = {
    'app': 'acoustid_app',
    'fingerprint': 'acoustid_fingerprint',
    'ingest': 'acoustid_ingest',
    'musicbrainz': 'musicbrainz'
}
```

**Model Binding**:

```python
class Account(Base):
    __bind_key__ = 'app'
    __tablename__ = 'account'
    # ...

class Track(Base):
    __bind_key__ = 'fingerprint'
    __tablename__ = 'track'
    # ...
```

### Connection Pooling

**Configuration** (`acoustid.conf`):

```ini
[database]
name = acoustid_app
user = acoustid
password_file = /run/secrets/db_password
host = postgres
port = 5432
pool_size = 20
pool_recycle = 3600
```

**Pool Settings**:
- `pool_size`: Maximum connections per process
- `pool_recycle`: Recycle connections after N seconds
- `pool_pre_ping`: Test connections before use

### Query Patterns

**Fingerprint Search** (legacy, pre-index):

```python
from sqlalchemy import func

# Find similar fingerprints using intarray overlap,
# restricted to fingerprints within 5 seconds of the query duration
query = db.session.query(Fingerprint).filter(
    Fingerprint.fingerprint.op('&&')(query_fingerprint),
    Fingerprint.length.between(duration - 5, duration + 5)
).order_by(
    func.acoustid_compare(Fingerprint.fingerprint, query_fingerprint).desc()
).limit(10)
```

**Track Lookup with MBIDs**:

```python
from sqlalchemy.orm import joinedload

# Fetch track with all linked MBIDs in a single query
track = db.session.query(Track).options(
    joinedload(Track.mbids)
).filter(Track.gid == track_gid).first()
```

**Submission Processing**:

```python
# Find unhandled submissions, oldest first
submissions = db.session.query(Submission).filter(
    Submission.handled.is_(False)
).order_by(Submission.created).limit(100).all()
```

## Database Migrations

### Alembic Configuration

**File**: `alembic.ini`

**Migration Directories**:
- `alembic/versions/app/`: acoustid_app migrations
- `alembic/versions/fingerprint/`: acoustid_fingerprint migrations
- `alembic/versions/ingest/`: acoustid_ingest migrations

**Multi-Database Support**:

```python
# alembic/env.py
from alembic import context

def run_migrations_online():
    # Run the migrations for each bound database in turn
    for bind_key in ['app', 'fingerprint', 'ingest']:
        engine = get_engine(bind_key)
        with engine.connect() as connection:
            context.configure(
                connection=connection,
                target_metadata=get_metadata(bind_key)
            )
            with context.begin_transaction():
                context.run_migrations()
```

### Migration Commands

```bash
# Create new migration
alembic revision --autogenerate -m "Add new column"

# Apply migrations
alembic upgrade head

# Rollback migration
alembic downgrade -1

# Show current version
alembic current

# Show migration history
alembic history
```

## Redis Data Structures

### Rate Limiting

**Key Pattern**: `rl:bucket:{scope}:{identifier}:{timestamp}`

**Example Keys**:

```
rl:bucket:global:1714305600
rl:bucket:app:8XaBELgH:1714305600
rl:bucket:ip:192.168.1.1:1714305600
```

**Value**: Integer (request count)

**TTL**: 25 seconds (window duration + buffer)

**Algorithm**:

```python
# Increment bucket for current window
bucket_key = f"rl:bucket:{scope}:{identifier}:{current_window}"
count = redis.incr(bucket_key)
redis.expire(bucket_key, 25)

# Sum counts across all windows in sliding window.
# redis.get() returns None for a missing bucket and a string/bytes value
# otherwise, so each result is converted before summing.
total = sum(
    int(redis.get(f"rl:bucket:{scope}:{identifier}:{w}") or 0)
    for w in windows
)
```

### Task Queue (Legacy)

**Key Pattern**: `queue:{queue_name}`

**Operations**:

```python
import json

# Push task
redis.rpush('queue:submissions', json.dumps(task_data))

# Pop task (lpop returns None when the queue is empty)
raw_task = redis.lpop('queue:submissions')
task_data = json.loads(raw_task) if raw_task is not None else None
```

**Note**: Being replaced by NATS in newer deployments

### API Key Cache

**Implementation**: In-memory TTLCache (not Redis)

```python
from cachetools import TTLCache

api_key_cache = TTLCache(maxsize=1000, ttl=60)
```

**Purpose**: Reduce database queries for API key validation

### Backfill State

**Key Pattern**: `backfill:{index_name}:{state_key}`

**Example Keys**:

```
backfill:fingerprints:last_id
backfill:fingerprints:batch_size
backfill:fingerprints:completed
```

**Purpose**: Track progress of index backfill operations

### Unknown MBID Cache

**Key Pattern**: `unknown_mbid:{mbid}`

**Value**: Boolean (1 if MBID not found in MusicBrainz)

**TTL**: 3600 seconds (1 hour)

**Purpose**: Avoid repeated MusicBrainz queries for non-existent MBIDs

## Data Integrity

### Constraints

**Foreign Keys**:
- All foreign keys have `ON DELETE CASCADE` or `ON DELETE SET NULL`
- Dependent rows are deleted (`CASCADE`) or detached (`SET NULL`) automatically when the parent row is deleted

**Unique Constraints**:
- Prevent duplicate fingerprints per track
- Prevent duplicate MBID links per track
- Ensure API key uniqueness

**Check Constraints**:
- Duration must be positive
- Bitrate must be positive
- Submission count must be non-negative

### Triggers

**Update Submission Count**:

```sql
CREATE TRIGGER update_fingerprint_submission_count
AFTER INSERT ON fingerprint_source
FOR EACH ROW
EXECUTE FUNCTION increment_submission_count();
```

**Track Merge Propagation**:

```sql
CREATE TRIGGER propagate_track_merge
AFTER UPDATE OF new_id ON track
FOR EACH ROW
EXECUTE FUNCTION update_merged_track_references();
```

### Indexes for Performance

**Covering Indexes**:

```sql
-- Lookup by fingerprint and duration
CREATE INDEX fingerprint_lookup_idx
ON fingerprint (length, track_id)
INCLUDE (fingerprint);
```

**Partial Indexes**:

```sql
-- Only index unhandled submissions
CREATE INDEX submission_unhandled_idx
ON submission (created)
WHERE handled = FALSE;
```

**GIN Indexes**:

```sql
-- Fast fingerprint array queries
CREATE INDEX fingerprint_fingerprint_idx
ON fingerprint USING GIN (fingerprint gin__int_ops);
```

## Data Lifecycle

### Fingerprint Submission

1. Insert into `submission` table (acoustid_ingest)
2. Publish to NATS queue
3. Worker processes submission
4. Insert into `fingerprint` table (acoustid_fingerprint)
5. Link to `track` (create or match)
6. Insert into `fingerprint_source` (provenance)
7. Update index via HTTP API
8. Insert into `submission_result`
9. Mark `submission.handled = TRUE`

### Track Merging

1. Identify duplicate tracks (manual or automated)
2. Set `track.new_id` to target track
3. Trigger updates all references
4. Merge fingerprints, MBIDs, metadata
5. Disable old track (`track.disabled = TRUE`)

### Data Cleanup

**Cron Jobs**:
- Delete old handled submissions (>30 days)
- Clean up orphaned metadata records
- Remove disabled tracks with no references
- Archive old statistics

## Performance Optimization

### Query Optimization

**Materialized Views**:

```sql
CREATE MATERIALIZED VIEW track_stats AS
SELECT
    track_id,
    -- Columns are qualified because both tables have `id` and `submission_count`
    COUNT(DISTINCT fingerprint.id) AS fingerprint_count,
    COUNT(DISTINCT track_mbid.mbid) AS mbid_count,
    -- Note: the join repeats each fingerprint row once per linked MBID,
    -- so this sum is inflated for tracks with several MBIDs
    SUM(fingerprint.submission_count) AS total_submissions
FROM fingerprint
LEFT JOIN track_mbid USING (track_id)
GROUP BY track_id;
```

**Partitioning** (future):

```sql
-- Partition submissions by month
CREATE TABLE submission_2025_04 PARTITION OF submission
FOR VALUES FROM ('2025-04-01') TO ('2025-05-01');
```

### Caching Strategy

**Application-Level**:
- API key validation (TTLCache, 60s)
- Format ID lookup (permanent cache)
- MusicBrainz MBID existence (Redis, 1h)

**Database-Level**:
- Shared buffers (PostgreSQL config)
- Connection pooling (SQLAlchemy)
- Query statistics for tuning (pg_stat_statements; PostgreSQL itself does not cache query results)

### Bulk Operations

**Batch Inserts**:

```python
# Insert multiple fingerprints efficiently
db.session.bulk_insert_mappings(Fingerprint, fingerprint_dicts)
db.session.commit()
```

**Bulk Updates**:

```python
from sqlalchemy import update

# Update submission counts in batch
db.session.execute(
    update(Fingerprint).where(
        Fingerprint.id.in_(fingerprint_ids)
    ).values(
        submission_count=Fingerprint.submission_count + 1
    )
)
```

## Backup and Recovery

### Backup Strategy

**PostgreSQL**:
- Daily full backups (pg_dump)
- Continuous WAL archiving
- Point-in-time recovery enabled

**Index**:
- Daily snapshots via `/:index/_snapshot`
- Incremental backups of Oplog
- Segment files backed up separately

### Disaster Recovery

**Database Restore**:

```bash
# Restore from dump
pg_restore -d acoustid_app acoustid_app_backup.dump

# Point-in-time recovery: pg_restore has no --target-time option.
# PITR replays archived WAL on top of a physical base backup, so restore
# the base backup, set restore_command and the target below in
# postgresql.conf, create recovery.signal in the data directory
# (PostgreSQL 12+), and start the server:
#   recovery_target_time = '2025-04-28 12:00:00'
```

**Index Rebuild**:

```bash
# Rebuild from database
python manage.py run import --rebuild-index
```
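
Returning to the rate-limiting buckets described under Redis Data Structures, the sliding-window count can be illustrated with a minimal, self-contained sketch. This is not the project's implementation: `InMemoryCounters` is a hypothetical stand-in for the three Redis calls used (`incr`, `expire`, `get`), and the 5-second bucket width and 4-bucket window are assumptions chosen only to be consistent with the 25-second TTL stated earlier.

```python
import time
from typing import Optional


class InMemoryCounters:
    """Hypothetical stand-in for the Redis calls used here (incr/expire/get)."""

    def __init__(self) -> None:
        self._values: dict[str, int] = {}

    def incr(self, key: str) -> int:
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def expire(self, key: str, seconds: int) -> None:
        # A real Redis would drop the key after `seconds`; omitted in this sketch.
        pass

    def get(self, key: str) -> Optional[bytes]:
        # Mirror redis-py, which returns bytes (or None for a missing key).
        value = self._values.get(key)
        return None if value is None else str(value).encode()


BUCKET_SECONDS = 5   # assumed bucket width
WINDOW_BUCKETS = 4   # assumed: 4 buckets = 20 s sliding window
BUCKET_TTL = 25      # TTL from the text: window duration + buffer


def record_and_count(store, scope: str, identifier: str,
                     now: Optional[float] = None) -> int:
    """Count this request and return the total across the sliding window."""
    now = time.time() if now is None else now
    # Align the timestamp to the start of its bucket.
    current = int(now // BUCKET_SECONDS) * BUCKET_SECONDS
    key = f"rl:bucket:{scope}:{identifier}:{current}"
    store.incr(key)
    store.expire(key, BUCKET_TTL)

    total = 0
    for i in range(WINDOW_BUCKETS):
        start = current - i * BUCKET_SECONDS
        raw = store.get(f"rl:bucket:{scope}:{identifier}:{start}")
        total += int(raw) if raw is not None else 0  # missing bucket counts as 0
    return total


def is_allowed(store, scope: str, identifier: str, limit: int,
               now: Optional[float] = None) -> bool:
    """Return True while the windowed total stays within `limit`."""
    return record_and_count(store, scope, identifier, now) <= limit
```

Passing `now` explicitly keeps the function deterministic for testing; with a real Redis client in place of `InMemoryCounters`, the `expire` call is what eventually discards buckets that have slid out of the window.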