AcoustID Data Model
Database Architecture
AcoustID uses a multi-database PostgreSQL architecture with separate databases for different concerns.
Database Instances

| Database | Purpose | Tables | Extensions |
|---|---|---|---|
| `acoustid_app` | Application data (accounts, apps, stats) | 8 | pgcrypto |
| `acoustid_fingerprint` | Fingerprint and track data | 19 | intarray, acoustid, cube |
| `acoustid_ingest` | Submission processing | 3 | - |
| `musicbrainz` | MusicBrainz mirror (read-only) | Many | - |

PostgreSQL Extensions
- `intarray`: Integer array operations
  - Used for fingerprint array queries
  - Provides the `&&` (overlap) and `@>` (contains) operators
- `pgcrypto`: Cryptographic functions
  - UUID generation (`gen_random_uuid()`)
  - API key hashing
- `acoustid` (custom): Fingerprint similarity functions
  - `acoustid_compare(int[], int[])`: Compare two fingerprints
  - `acoustid_extract_query(int[])`: Extract query terms
  - Source: `acoustid-ext` C extension
- `cube`: Multi-dimensional cube data type
  - Used for simhash-based fingerprint indexing
  - Enables fast approximate nearest neighbor search

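
To make the two `intarray` operators concrete, here is a plain-Python mirror of the predicates they compute on two `int[]` values. Both treat the arrays as sets, so element order and duplicates do not matter. This sketch only illustrates the semantics; it says nothing about how PostgreSQL evaluates them, and the function names are hypothetical.

```python
from __future__ import annotations


def overlaps(a: list[int], b: list[int]) -> bool:
    """Like intarray's `a && b`: true if the arrays share at least one element."""
    return not set(a).isdisjoint(b)


def contains(a: list[int], b: list[int]) -> bool:
    """Like intarray's `a @> b`: true if every element of b also occurs in a."""
    return set(b).issubset(a)


stored = [5, 202, 999, 101]  # made-up hash values
query = [101, 202, 303]

print(overlaps(stored, query))  # True: 101 and 202 are shared
print(contains(stored, query))  # False: 303 does not occur in stored
```

Overlap is the cheap, index-friendly filter: any shared hash value makes a row a candidate, and a more expensive comparison such as `acoustid_compare` can then rank the survivors.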
Core Tables
Account Management (acoustid_app)
account
User accounts for API access.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Account ID |
| `name` | VARCHAR(255) | NOT NULL | Display name |
| `apikey` | VARCHAR(40) | UNIQUE, NOT NULL | API key (user key) |
| `mbuser` | VARCHAR(64) | UNIQUE | MusicBrainz username |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `lastlogin` | TIMESTAMP | | Last login timestamp |
| `submission_count` | INTEGER | DEFAULT 0 | Total submissions |
| `application_id` | INTEGER | FOREIGN KEY | Default application |
| `application_version` | VARCHAR(255) | | Application version |
| `created_from` | INET | | Registration IP |
| `is_admin` | BOOLEAN | DEFAULT FALSE | Admin flag |

Indexes:
- `account_pkey` (PRIMARY KEY on `id`)
- `account_apikey_key` (UNIQUE on `apikey`)
- `account_mbuser_key` (UNIQUE on `mbuser`)

application
API client applications.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Application ID |
| `name` | VARCHAR(255) | NOT NULL | Application name |
| `version` | VARCHAR(255) | | Version string |
| `apikey` | VARCHAR(40) | UNIQUE, NOT NULL | API key (client key) |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `active` | BOOLEAN | DEFAULT TRUE | Active status |
| `account_id` | INTEGER | FOREIGN KEY | Owner account |
| `email` | VARCHAR(255) | | Contact email |
| `website` | VARCHAR(1000) | | Website URL |
| `rate_limit` | INTEGER | | Custom rate limit (req/s) |

Indexes:
- `application_pkey` (PRIMARY KEY on `id`)
- `application_apikey_key` (UNIQUE on `apikey`)

account_openid
OpenID authentication links.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `openid` | VARCHAR(255) | PRIMARY KEY | OpenID identifier |
| `account_id` | INTEGER | FOREIGN KEY | Linked account |

account_google
Google OAuth authentication links.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `google_user_id` | VARCHAR(255) | PRIMARY KEY | Google user ID |
| `account_id` | INTEGER | FOREIGN KEY | Linked account |

Fingerprint Data (acoustid_fingerprint)
track
Unique audio tracks identified by fingerprints.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Track ID |
| `gid` | UUID | UNIQUE, NOT NULL | Public track UUID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `new_id` | INTEGER | FOREIGN KEY | Merge target (if merged) |
| `disabled` | BOOLEAN | DEFAULT FALSE | Disabled flag |

Indexes:
- `track_pkey` (PRIMARY KEY on `id`)
- `track_gid_key` (UNIQUE on `gid`)
- `track_new_id_idx` (on `new_id`)

Notes:
- `gid` is the public-facing AcoustID track ID
- `new_id` points to the track this one was merged into (for deduplication)
- Disabled tracks are excluded from search results

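
Because a merged track keeps its row and only gains a `new_id`, code that still holds an old track ID has to follow that pointer to reach the surviving track. A minimal sketch, assuming merges can chain (A merged into B, B later into C) and that the `new_id` values have already been loaded into a dict; the function name is hypothetical.

```python
from __future__ import annotations


def resolve_track_id(track_id: int, new_id_of: dict[int, int | None]) -> int:
    """Follow track.new_id links until reaching a track that was not merged.

    `new_id_of` maps track id -> new_id (None or absent when not merged).
    Raises ValueError if the chain loops back on itself.
    """
    seen = set()
    current = track_id
    while new_id_of.get(current) is not None:
        if current in seen:
            raise ValueError(f"merge cycle detected at track {current}")
        seen.add(current)
        current = new_id_of[current]
    return current


# Track 1 was merged into 2, and 2 later into 3.
links = {1: 2, 2: 3, 3: None}
print(resolve_track_id(1, links))  # 3
print(resolve_track_id(4, links))  # 4 (absent from the map, so never merged)
```

The cycle guard is defensive: a correct merge process should never produce a loop, but a lookup path should fail loudly rather than spin if one ever appears.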
fingerprint
Audio fingerprints linked to tracks.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Fingerprint ID |
| `track_id` | INTEGER | FOREIGN KEY | Linked track |
| `fingerprint` | INTEGER[] | NOT NULL | Chromaprint hash array |
| `length` | SMALLINT | NOT NULL | Duration in seconds |
| `bitrate` | SMALLINT | | Audio bitrate (kbps) |
| `format_id` | INTEGER | FOREIGN KEY | Audio format |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

Indexes:
- `fingerprint_pkey` (PRIMARY KEY on `id`)
- `fingerprint_track_id_idx` (on `track_id`)
- `fingerprint_length_idx` (on `length`)
- `fingerprint_fingerprint_idx` (GIN on `fingerprint` using intarray)

Notes:
- `fingerprint` is an array of 32-bit integers (Chromaprint hashes)
- The GIN index makes the `&&` overlap filter fast, which is how candidate fingerprints are found before similarity scoring
- `submission_count` tracks popularity

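
PostgreSQL's `INTEGER` is a signed 32-bit type, while Chromaprint hash values are unsigned 32-bit words, so a hash with its top bit set has to be reinterpreted (two's complement) before it fits in an `INTEGER[]` column. A minimal sketch of that round trip; where exactly AcoustID performs the conversion is an assumption, and the helper names are hypothetical.

```python
def to_signed32(value: int) -> int:
    """Map an unsigned 32-bit hash (0..2**32-1) onto PostgreSQL's INTEGER range."""
    if not 0 <= value < 2**32:
        raise ValueError("not an unsigned 32-bit value")
    return value - 2**32 if value >= 2**31 else value


def to_unsigned32(value: int) -> int:
    """Inverse of to_signed32: recover the original unsigned hash."""
    return value & 0xFFFFFFFF


hashes = [0x00000001, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
stored = [to_signed32(h) for h in hashes]
print(stored)  # [1, 2147483647, -2147483648, -1]
assert [to_unsigned32(s) for s in stored] == hashes
```

The bit pattern is unchanged by this mapping, so bitwise comparisons (XOR and bit counting) give the same answer whether they run on the signed or the unsigned form.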
fingerprint_data
Extended fingerprint data with simhash.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `fingerprint_id` | INTEGER | PRIMARY KEY, FOREIGN KEY | Fingerprint ID |
| `fingerprint` | BYTEA | NOT NULL | Raw fingerprint data |
| `simhash` | CUBE | | Locality-sensitive hash |

Indexes:
- `fingerprint_data_pkey` (PRIMARY KEY on `fingerprint_id`)
- `fingerprint_data_simhash_idx` (GiST on `simhash`)

Notes:
- `fingerprint` stores compressed Chromaprint data
- `simhash` enables approximate nearest neighbor search
- The GiST index supports fast similarity queries

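
The source does not say how `simhash` is derived. As an illustration only, here is the classic simhash construction applied to a fingerprint's 32-bit hash values (an assumption about the input), plus one hypothetical way to lay the result out as a 32-dimensional 0/1 `cube` point. With that layout, the Euclidean distance between two points (the `cube` `<->` operator) is the square root of their Hamming distance, which would let a GiST nearest-neighbor search stand in for a Hamming search. All function names are hypothetical.

```python
from __future__ import annotations


def simhash32(hashes: list[int]) -> int:
    """Classic simhash: bit i of the result is set when more than half of
    the input values have bit i set (ties give 0)."""
    counts = [0] * 32
    for h in hashes:
        for bit in range(32):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(32) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 32-bit simhashes."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def to_cube_coords(h: int) -> list[float]:
    """One 0/1 coordinate per bit, e.g. for a 32-dimensional cube point."""
    return [float((h >> bit) & 1) for bit in range(32)]


a = simhash32([0b1011, 0b1001, 0b0011])
b = simhash32([0b1011, 0b1001, 0b0111])  # one input value differs
print(a, b, hamming(a, b))  # 11 11 0
```

Small changes to the input move the simhash only a little (here, not at all), which is the property that makes it usable as a locality-sensitive pre-filter.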
track_mbid
Links tracks to MusicBrainz recordings.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `mbid` | UUID | NOT NULL | MusicBrainz recording MBID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |
| `disabled` | BOOLEAN | DEFAULT FALSE | Disabled flag |

Indexes:
- `track_mbid_pkey` (PRIMARY KEY on `id`)
- `track_mbid_track_id_mbid_key` (UNIQUE on `track_id, mbid`)
- `track_mbid_mbid_idx` (on `mbid`)

Notes:
- A track can link to multiple MBIDs (different recordings)
- `submission_count` indicates confidence
- Disabled links are excluded from results

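
One way the notes above could translate into a lookup response: drop disabled links, then order the remaining MBIDs by `submission_count` as a confidence signal. This is a minimal sketch; the dataclass and function are hypothetical, and the real ranking may weigh other factors.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrackMbid:
    mbid: str
    submission_count: int
    disabled: bool = False


def ranked_mbids(links: list[TrackMbid]) -> list[str]:
    """Enabled MBIDs for one track, most-submitted first."""
    enabled = [link for link in links if not link.disabled]
    enabled.sort(key=lambda link: link.submission_count, reverse=True)
    return [link.mbid for link in enabled]


links = [
    TrackMbid("mbid-a", 3),
    TrackMbid("mbid-b", 10),
    TrackMbid("mbid-c", 50, disabled=True),  # highest count, but disabled
]
print(ranked_mbids(links))  # ['mbid-b', 'mbid-a']
```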
meta
User-submitted metadata.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Metadata ID |
| `track` | VARCHAR(255) | | Track title |
| `artist` | VARCHAR(255) | | Artist name |
| `album` | VARCHAR(255) | | Album title |
| `album_artist` | VARCHAR(255) | | Album artist |
| `track_no` | INTEGER | | Track number |
| `disc_no` | INTEGER | | Disc number |
| `year` | INTEGER | | Release year |

Indexes:
- `meta_pkey` (PRIMARY KEY on `id`)

track_meta
Links tracks to user metadata.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `meta_id` | INTEGER | FOREIGN KEY | Metadata record |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

Indexes:
- `track_meta_pkey` (PRIMARY KEY on `id`)
- `track_meta_track_id_meta_id_key` (UNIQUE on `track_id, meta_id`)

format
Audio file formats.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Format ID |
| `name` | VARCHAR(20) | UNIQUE, NOT NULL | Format name (mp3, flac, etc.) |

Indexes:
- `format_pkey` (PRIMARY KEY on `id`)
- `format_name_key` (UNIQUE on `name`)

Common Values:
`mp3`, `flac`, `ogg`, `m4a`, `wma`, `ape`, `wav`

source
Submission sources (applications).

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Source ID |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `account_id` | INTEGER | FOREIGN KEY | User account |
| `version` | VARCHAR(255) | | Application version |

Indexes:
- `source_pkey` (PRIMARY KEY on `id`)
- `source_application_id_account_id_version_key` (UNIQUE on `application_id, account_id, version`)

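
The UNIQUE key on `(application_id, account_id, version)` suggests a get-or-create pattern: reuse the source row for a known triple and insert one only the first time. A minimal sketch in which `sqlite3` stands in for PostgreSQL so it runs anywhere; in PostgreSQL the insert would typically be `INSERT ... ON CONFLICT (application_id, account_id, version) DO NOTHING`. It assumes all three values are non-NULL (NULLs never collide under a UNIQUE constraint), and the function name is hypothetical.

```python
import sqlite3


def get_or_create_source(conn, application_id, account_id, version):
    """Return source.id for the triple, inserting a row only if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO source (application_id, account_id, version) "
        "VALUES (?, ?, ?)",
        (application_id, account_id, version),
    )
    row = conn.execute(
        "SELECT id FROM source "
        "WHERE application_id = ? AND account_id = ? AND version = ?",
        (application_id, account_id, version),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source (id INTEGER PRIMARY KEY, "
    "application_id INTEGER NOT NULL, account_id INTEGER NOT NULL, "
    "version TEXT NOT NULL, UNIQUE (application_id, account_id, version))"
)
first = get_or_create_source(conn, 1, 42, "1.0")
again = get_or_create_source(conn, 1, 42, "1.0")  # same triple: reused
other = get_or_create_source(conn, 1, 42, "2.0")  # new version: new row
print(first, again, other)  # 1 1 2
```

Letting the unique constraint arbitrate (instead of SELECT-then-INSERT) avoids a race where two concurrent workers both see no row and both insert.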
Foreign IDs (acoustid_fingerprint)
foreignid_vendor
External ID providers.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Vendor ID |
| `name` | VARCHAR(255) | UNIQUE, NOT NULL | Vendor name |

Indexes:
- `foreignid_vendor_pkey` (PRIMARY KEY on `id`)
- `foreignid_vendor_name_key` (UNIQUE on `name`)

Common Values:
`musicbrainz`, `musicip`, `discogs`, `spotify`

foreignid
External identifiers.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Foreign ID |
| `vendor_id` | INTEGER | FOREIGN KEY | Vendor |
| `name` | VARCHAR(255) | NOT NULL | External ID value |

Indexes:
- `foreignid_pkey` (PRIMARY KEY on `id`)
- `foreignid_vendor_id_name_key` (UNIQUE on `vendor_id, name`)

track_foreignid
Links tracks to external IDs.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `foreignid_id` | INTEGER | FOREIGN KEY | External ID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

Indexes:
- `track_foreignid_pkey` (PRIMARY KEY on `id`)
- `track_foreignid_track_id_foreignid_id_key` (UNIQUE on `track_id, foreignid_id`)

track_puid
Legacy MusicIP PUID links.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_id` | INTEGER | FOREIGN KEY | AcoustID track |
| `puid` | UUID | NOT NULL | MusicIP PUID |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |
| `submission_count` | INTEGER | DEFAULT 1 | Number of submissions |

Indexes:
- `track_puid_pkey` (PRIMARY KEY on `id`)
- `track_puid_track_id_puid_key` (UNIQUE on `track_id, puid`)
- `track_puid_puid_idx` (on `puid`)

Statistics (acoustid_app)
stats
General statistics.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `name` | VARCHAR(255) | NOT NULL | Stat name (unique per date, see below) |
| `value` | INTEGER | NOT NULL | Stat value |
| `date` | DATE | NOT NULL | Stat date |

Indexes:
- `stats_pkey` (PRIMARY KEY on `id`)
- `stats_name_date_key` (UNIQUE on `name, date`)

Common Stats:
`lookup.count`, `submission.count`, `track.count`, `fingerprint.count`

stats_lookups
Lookup statistics by hour.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `hour` | TIMESTAMP | NOT NULL | Hour timestamp |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `count_hits` | INTEGER | DEFAULT 0 | Successful lookups |
| `count_misses` | INTEGER | DEFAULT 0 | Failed lookups |

Indexes:
- `stats_lookups_pkey` (PRIMARY KEY on `id`)
- `stats_lookups_hour_application_id_key` (UNIQUE on `hour, application_id`)

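
The UNIQUE key on `(hour, application_id)` implies one row per application per hour whose counters grow as lookups happen. A sketch of the bucketing, assuming lookups arrive as `(timestamp, application_id, matched)` tuples (a hypothetical shape). In PostgreSQL the per-bucket totals would typically be written with `INSERT ... ON CONFLICT (hour, application_id) DO UPDATE SET count_hits = stats_lookups.count_hits + EXCLUDED.count_hits`, and likewise for misses.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate to the start of the hour, like date_trunc('hour', ts)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_lookups(events):
    """Fold (timestamp, application_id, matched) tuples into hit/miss
    counters keyed by (hour, application_id), the table's unique key."""
    hits: Counter = Counter()
    misses: Counter = Counter()
    for ts, application_id, matched in events:
        key = (hour_bucket(ts), application_id)
        if matched:
            hits[key] += 1
        else:
            misses[key] += 1
    return hits, misses


events = [
    (datetime(2025, 4, 28, 10, 5), 1, True),
    (datetime(2025, 4, 28, 10, 59), 1, True),
    (datetime(2025, 4, 28, 11, 0), 1, False),
]
hits, misses = aggregate_lookups(events)
print(hits[(datetime(2025, 4, 28, 10), 1)])    # 2
print(misses[(datetime(2025, 4, 28, 11), 1)])  # 1
```

Aggregating in memory first and flushing once per bucket keeps the hot lookup path from issuing one UPDATE per request.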
stats_user_agents
User agent statistics.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `date` | DATE | NOT NULL | Date |
| `application_id` | INTEGER | FOREIGN KEY | Application |
| `user_agent` | VARCHAR(1000) | NOT NULL | User agent string |
| `ip` | INET | NOT NULL | IP address |
| `count` | INTEGER | DEFAULT 0 | Request count |

Indexes:
- `stats_user_agents_pkey` (PRIMARY KEY on `id`)
- `stats_user_agents_date_application_id_user_agent_ip_key` (UNIQUE on `date, application_id, user_agent, ip`)

stats_top_accounts
Top submitter accounts.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Stat ID |
| `account_id` | INTEGER | FOREIGN KEY | Account |
| `count` | INTEGER | NOT NULL | Submission count |

Indexes:
- `stats_top_accounts_pkey` (PRIMARY KEY on `id`)
- `stats_top_accounts_account_id_key` (UNIQUE on `account_id`)

Submission Processing (acoustid_ingest)
submission
Pending fingerprint submissions.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Submission ID |
| `fingerprint` | INTEGER[] | NOT NULL | Chromaprint hash array |
| `length` | SMALLINT | NOT NULL | Duration in seconds |
| `bitrate` | SMALLINT | | Audio bitrate |
| `format_id` | INTEGER | | Audio format |
| `created` | TIMESTAMP | NOT NULL | Submission timestamp |
| `source_id` | INTEGER | FOREIGN KEY | Submission source |
| `mbid` | UUID | | MusicBrainz MBID (if provided) |
| `handled` | BOOLEAN | DEFAULT FALSE | Processing status |
| `meta_id` | INTEGER | FOREIGN KEY | User metadata |

Indexes:
- `submission_pkey` (PRIMARY KEY on `id`)
- `submission_handled_idx` (on `handled` WHERE `handled = FALSE`)

Notes:
- A worker processes unhandled submissions
- `handled` is set to `TRUE` after processing

submission_result
Processing results for submissions.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Result ID |
| `submission_id` | INTEGER | FOREIGN KEY | Submission |
| `track_id` | INTEGER | FOREIGN KEY | Matched/created track |
| `created` | TIMESTAMP | NOT NULL | Processing timestamp |

Indexes:
- `submission_result_pkey` (PRIMARY KEY on `id`)
- `submission_result_submission_id_key` (UNIQUE on `submission_id`)

pending_submission
Queue for async submission processing.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Queue ID |
| `submission_id` | INTEGER | FOREIGN KEY | Submission |
| `created` | TIMESTAMP | NOT NULL | Queue timestamp |

Indexes:
- `pending_submission_pkey` (PRIMARY KEY on `id`)
- `pending_submission_submission_id_key` (UNIQUE on `submission_id`)

Notes:
- Replaced by NATS queue in newer deployments
- Legacy table, may be deprecated
Provenance Tables (acoustid_fingerprint)
These tables record data lineage and an audit trail of changes.
fingerprint_source
Links fingerprints to submission sources.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `fingerprint_id` | INTEGER | FOREIGN KEY | Fingerprint |
| `source_id` | INTEGER | FOREIGN KEY | Source |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |

track_mbid_source
Links track-MBID associations to sources.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Link ID |
| `track_mbid_id` | INTEGER | FOREIGN KEY | Track-MBID link |
| `source_id` | INTEGER | FOREIGN KEY | Source |
| `created` | TIMESTAMP | NOT NULL | Creation timestamp |

track_mbid_change
Audit log for track-MBID changes.

| Column | Type | Constraints | Description |
|---|---|---|---|
| `id` | SERIAL | PRIMARY KEY | Change ID |
| `track_mbid_id` | INTEGER | FOREIGN KEY | Track-MBID link |
| `account_id` | INTEGER | FOREIGN KEY | Account that made the change |
| `disabled` | BOOLEAN | NOT NULL | New disabled status |
| `created` | TIMESTAMP | NOT NULL | Change timestamp |
| `note` | TEXT | | Change reason |

ORM Layer (SQLAlchemy)
Multi-Database Configuration
File: acoustid/db.py

```python
# Database bind keys
BIND_KEYS = {
    'app': 'acoustid_app',
    'fingerprint': 'acoustid_fingerprint',
    'ingest': 'acoustid_ingest',
    'musicbrainz': 'musicbrainz',
}
```

Model Binding:

```python
class Account(Base):
    __bind_key__ = 'app'
    __tablename__ = 'account'
    # ...


class Track(Base):
    __bind_key__ = 'fingerprint'
    __tablename__ = 'track'
    # ...
```

Connection Pooling
Configuration (acoustid.conf):

```ini
[database]
name = acoustid_app
user = acoustid
password_file = /run/secrets/db_password
host = postgres
port = 5432
pool_size = 20
pool_recycle = 3600
```

Pool Settings:
- `pool_size`: Number of connections each process keeps open in its pool (SQLAlchemy can open up to `max_overflow` additional connections under load)
- `pool_recycle`: Recycle connections after N seconds
- `pool_pre_ping`: Test connections before use

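
One way the `[database]` section could be turned into SQLAlchemy engine arguments, as a sketch: the function names, the `postgresql://` URL scheme, and enabling `pool_pre_ping` by default are assumptions, not taken from AcoustID's code. The password reader is injectable so the secret file can be stubbed out in tests.

```python
import configparser
from pathlib import Path
from urllib.parse import quote_plus


def read_password_file(path: str) -> str:
    return Path(path).read_text().strip()


def engine_settings(config_text: str, read_secret=read_password_file):
    """Build a SQLAlchemy URL and create_engine() keyword arguments
    from an INI [database] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    db = parser["database"]
    # Escape characters such as '@' or ':' that would break the URL
    password = quote_plus(read_secret(db["password_file"]))
    url = (
        f"postgresql://{db['user']}:{password}@{db['host']}:"
        f"{db.getint('port', fallback=5432)}/{db['name']}"
    )
    engine_kwargs = {
        # Fallbacks match SQLAlchemy's own defaults (5 connections, never recycle)
        "pool_size": db.getint("pool_size", fallback=5),
        "pool_recycle": db.getint("pool_recycle", fallback=-1),
        # Absent from the sample config; defaulting to True is an assumption
        "pool_pre_ping": db.getboolean("pool_pre_ping", fallback=True),
    }
    return url, engine_kwargs
```

The result would then be passed on as `sqlalchemy.create_engine(url, **engine_kwargs)`, once per bind key.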
Query Patterns
Fingerprint Search (legacy, pre-index):

```python
from sqlalchemy import func

# Find similar fingerprints: intarray overlap (&&) narrows the candidates,
# the duration window drops obvious mismatches, acoustid_compare ranks the rest
candidates = db.session.query(Fingerprint).filter(
    Fingerprint.fingerprint.op('&&')(query_fingerprint),
    Fingerprint.length.between(duration - 5, duration + 5),
).order_by(
    func.acoustid_compare(Fingerprint.fingerprint, query_fingerprint).desc()
).limit(10).all()
```

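
To see what this query does end to end, here is a pure-Python equivalent over in-memory rows. `bit_similarity` is a deliberately simplified stand-in that compares hashes at the same positions only and ignores time offsets; it is not the `acoustid_compare` algorithm. The function names and the `(fingerprint_id, hashes, length)` row shape are hypothetical.

```python
from __future__ import annotations


def bit_similarity(a: list[int], b: list[int]) -> float:
    """Fraction of equal bits over the aligned prefix of two 32-bit hash
    arrays (simplified: no offset search)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    errors = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in zip(a, b))
    return 1.0 - errors / (32 * n)


def find_candidates(query, duration, rows, tolerance=5, limit=10):
    """Overlap filter, duration window (inclusive, like BETWEEN), then rank."""
    query_set = set(query)
    scored = [
        (bit_similarity(hashes, query), fp_id)
        for fp_id, hashes, length in rows
        if abs(length - duration) <= tolerance and not query_set.isdisjoint(hashes)
    ]
    scored.sort(reverse=True)
    return [fp_id for _, fp_id in scored[:limit]]


rows = [
    (1, [1, 2, 3], 200),  # exact match
    (2, [1, 2, 7], 203),  # one bit differs in the last hash
    (3, [1, 2, 3], 260),  # outside the duration window
    (4, [9, 9, 9], 200),  # no shared hash values
]
print(find_candidates([1, 2, 3], 201, rows))  # [1, 2]
```

The ordering mirrors the SQL: cheap set and range filters first, the expensive bitwise comparison only on what survives.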
Track Lookup with MBIDs:

```python
from sqlalchemy.orm import joinedload

# Fetch a track together with all linked MBIDs in one query
track = db.session.query(Track).options(
    joinedload(Track.mbids)
).filter(Track.gid == track_gid).first()
```

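Submission Processing:

```python
from sqlalchemy import false

# Oldest unhandled submissions first; `handled = false` matches the partial
# index predicate. With several workers, .with_for_update(skip_locked=True)
# would keep them from claiming the same rows.
submissions = db.session.query(Submission).filter(
    Submission.handled == false()
).order_by(Submission.created).limit(100).all()
```
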
Database Migrations
Alembic Configuration
File: alembic.ini
Migration Directories:
- `alembic/versions/app/`: acoustid_app migrations
- `alembic/versions/fingerprint/`: acoustid_fingerprint migrations
- `alembic/versions/ingest/`: acoustid_ingest migrations

Multi-Database Support:

```python
# alembic/env.py
from alembic import context


def run_migrations_online():
    # get_engine() and get_metadata() are project helpers (not shown) that
    # return the engine and MetaData for one bind key
    for bind_key in ['app', 'fingerprint', 'ingest']:
        engine = get_engine(bind_key)
        with engine.connect() as connection:
            context.configure(
                connection=connection,
                target_metadata=get_metadata(bind_key),
            )
            with context.begin_transaction():
                context.run_migrations()
```

Migration Commands

```shell
# Create new migration
alembic revision --autogenerate -m "Add new column"
# Apply migrations
alembic upgrade head
# Rollback migration
alembic downgrade -1
# Show current version
alembic current
# Show migration history
alembic history
```

Redis Data Structures
Rate Limiting
Key Pattern: `rl:bucket:{scope}:{identifier}:{timestamp}` (the `global` scope has no identifier)

Example Keys:

```text
rl:bucket:global:1714305600
rl:bucket:app:8XaBELgH:1714305600
rl:bucket:ip:192.168.1.1:1714305600
```

Value: Integer (request count)
TTL: 25 seconds (window duration + buffer)
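Algorithm:

```python
# Increment the bucket for the current window; the TTL lets old buckets expire
bucket_key = f"rl:bucket:{scope}:{identifier}:{current_window}"
count = redis.incr(bucket_key)
redis.expire(bucket_key, 25)

# Sum counts across all buckets in the sliding window; MGET returns None for
# buckets that never existed or have expired, so count those as 0
keys = [f"rl:bucket:{scope}:{identifier}:{w}" for w in windows]
total = sum(int(v or 0) for v in redis.mget(keys))
```
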
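
The bucket scheme can be exercised without Redis. A minimal in-memory sketch, assuming 1-second buckets and a 20-second window (the 25-second TTL is consistent with a 20-second window plus a 5-second buffer, but the source does not state either value); a dict replaces Redis, and key expiry is not simulated. The class name is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class SlidingWindowLimiter:
    """One counter per (scope, identifier, bucket start), summed over the
    last `window` seconds, mirroring the rl:bucket:* keys."""

    def __init__(self, limit: int, window: int = 20, bucket: int = 1):
        self.limit = limit
        self.window = window
        self.bucket = bucket
        self.counts: dict[str, int] = defaultdict(int)

    def _key(self, scope: str, identifier: str, start: int) -> str:
        return f"rl:bucket:{scope}:{identifier}:{start}"

    def allow(self, scope: str, identifier: str, now: float) -> bool:
        current = int(now) // self.bucket * self.bucket
        self.counts[self._key(scope, identifier, current)] += 1  # like INCR
        # Bucket starts covering the last `window` seconds, current included
        starts = range(current - self.window + self.bucket, current + 1, self.bucket)
        total = sum(self.counts.get(self._key(scope, identifier, s), 0) for s in starts)
        return total <= self.limit


limiter = SlidingWindowLimiter(limit=100)
print(limiter.allow("app", "8XaBELgH", now=1714305600))  # True
```

Like the Redis version, the counter is incremented before the check, so rejected requests still count against the window.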
Task Queue (Legacy)
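Key Pattern: `queue:{queue_name}`
Operations:

```python
import json

# Push task onto the tail of the list
redis.rpush('queue:submissions', json.dumps(task_data))
# Pop task from the head (FIFO); returns None when the queue is empty
task_data = redis.lpop('queue:submissions')
```
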
Note: Being replaced by NATS in newer deployments
API Key Cache
Implementation: In-memory `TTLCache` (not Redis)

```python
from cachetools import TTLCache

api_key_cache = TTLCache(maxsize=1000, ttl=60)
```

Purpose: Reduce database queries for API key validation
Backfill State
Key Pattern: `backfill:{index_name}:{state_key}`
Example Keys:

```text
backfill:fingerprints:last_id
backfill:fingerprints:batch_size
backfill:fingerprints:completed
```

Purpose: Track progress of index backfill operations
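
Saving `last_id` after every batch lets an interrupted backfill resume where it stopped instead of starting over. A minimal sketch, using a dict in place of Redis (values kept as strings, as they would come back from Redis after decoding) and hypothetical `fetch_batch` / `index_batch` callables; the default batch size of 1000 is an assumption.

```python
from __future__ import annotations


def run_backfill(state: dict, fetch_batch, index_batch, index_name: str = "fingerprints") -> int:
    """Index rows in id order, checkpointing last_id after every batch so a
    restarted run resumes where the previous one stopped."""
    prefix = f"backfill:{index_name}"
    last_id = int(state.get(f"{prefix}:last_id", 0))
    batch_size = int(state.get(f"{prefix}:batch_size", 1000))  # assumed default
    processed = 0
    while True:
        rows = fetch_batch(after_id=last_id, limit=batch_size)  # sorted by id
        if not rows:
            state[f"{prefix}:completed"] = "1"
            return processed
        index_batch(rows)
        last_id = rows[-1][0]
        state[f"{prefix}:last_id"] = str(last_id)  # checkpoint
        processed += len(rows)


sample_rows = [(i, f"fp{i}") for i in range(1, 8)]


def fetch_sample(after_id, limit):
    return [r for r in sample_rows if r[0] > after_id][:limit]


state = {"backfill:fingerprints:batch_size": "3"}
run_backfill(state, fetch_sample, lambda batch: None)
print(state)
```

Because the checkpoint is written only after `index_batch` returns, a crash can at worst re-index the last batch, so indexing should tolerate seeing a row twice.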
Unknown MBID Cache
Key Pattern: `unknown_mbid:{mbid}`
Value: `1` (the key exists only when the MBID was not found in MusicBrainz; Redis has no boolean type)
TTL: 3600 seconds (1 hour)
Purpose: Avoid repeated MusicBrainz queries for non-existent MBIDs
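
This is negative caching: remember a "not found" answer for a while so the same missing MBID does not trigger a MusicBrainz query on every request. A minimal in-memory sketch that mimics `SET unknown_mbid:{mbid} 1 EX 3600`; the class, the `mbid_exists` helper, and the injectable clock are hypothetical.

```python
from __future__ import annotations

import time


class UnknownMbidCache:
    """Remembers MBIDs that MusicBrainz did not know, for `ttl` seconds."""

    def __init__(self, ttl: float = 3600, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._expires: dict[str, float] = {}

    def is_known_missing(self, mbid: str) -> bool:
        expiry = self._expires.get(f"unknown_mbid:{mbid}")
        return expiry is not None and expiry > self.clock()

    def mark_missing(self, mbid: str) -> None:
        self._expires[f"unknown_mbid:{mbid}"] = self.clock() + self.ttl


def mbid_exists(mbid, cache, lookup_in_musicbrainz) -> bool:
    if cache.is_known_missing(mbid):
        return False  # cached negative result: skip the MusicBrainz query
    found = lookup_in_musicbrainz(mbid)
    if not found:
        cache.mark_missing(mbid)
    return found
```

Only misses are cached; a positive answer always comes from a fresh lookup, so an MBID that is later created in MusicBrainz is recognized within one TTL at most.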
Data Integrity
Constraints
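Foreign Keys:
- All foreign keys have `ON DELETE CASCADE` or `ON DELETE SET NULL`
- Orphaned records are cleaned up automatically
- PostgreSQL cannot enforce a foreign key across databases, so columns that point into another database (for example `submission.source_id` in `acoustid_ingest`, which refers to `source` in `acoustid_fingerprint`) are references the application must keep consistent
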
Unique Constraints:
- Prevent duplicate fingerprints per track
- Prevent duplicate MBID links per track
- Ensure API key uniqueness
Check Constraints:
- Duration must be positive
- Bitrate must be positive
- Submission count must be non-negative
Triggers
Update Submission Count:

```sql
CREATE TRIGGER update_fingerprint_submission_count
AFTER INSERT ON fingerprint_source
FOR EACH ROW
EXECUTE FUNCTION increment_submission_count();
```

Track Merge Propagation:

```sql
CREATE TRIGGER propagate_track_merge
AFTER UPDATE OF new_id ON track
FOR EACH ROW
EXECUTE FUNCTION update_merged_track_references();
```

Indexes for Performance
Covering Indexes:

```sql
-- Lookup by fingerprint and duration
CREATE INDEX fingerprint_lookup_idx
ON fingerprint (length, track_id)
INCLUDE (fingerprint);
```

Partial Indexes:

```sql
-- Only index unhandled submissions
CREATE INDEX submission_unhandled_idx
ON submission (created)
WHERE handled = FALSE;
```

GIN Indexes:

```sql
-- Fast fingerprint array queries
CREATE INDEX fingerprint_fingerprint_idx
ON fingerprint USING GIN (fingerprint gin__int_ops);
```

Data Lifecycle
Fingerprint Submission
- Insert into `submission` table (acoustid_ingest)
- Publish to NATS queue
- Worker processes submission
- Insert into `fingerprint` table (acoustid_fingerprint)
- Link to `track` (create or match)
- Insert into `fingerprint_source` (provenance)
- Update index via HTTP API
- Insert into `submission_result`
- Mark `submission.handled = TRUE`

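
The worker-side steps (everything after the NATS publish) can be sketched against in-memory stand-ins for the tables. Everything here is hypothetical: the `Store` dataclass replaces the database, `match_track` stands in for fingerprint matching, and `update_index` stands in for the index HTTP API call.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-ins for the tables a submission touches."""
    submissions: dict = field(default_factory=dict)          # submission
    fingerprints: dict = field(default_factory=dict)         # fingerprint
    fingerprint_sources: list = field(default_factory=list)  # fingerprint_source
    submission_results: dict = field(default_factory=dict)   # submission_result
    next_track_id: int = 1


def process_submission(store: Store, submission_id: int, match_track, update_index) -> int:
    """Run the worker-side steps for one submission; return its track id."""
    sub = store.submissions[submission_id]
    if sub["handled"]:
        return store.submission_results[submission_id]
    track_id = match_track(sub["fingerprint"], sub["length"])
    if track_id is None:  # no match found: create a new track
        track_id = store.next_track_id
        store.next_track_id += 1
    fp_id = len(store.fingerprints) + 1
    store.fingerprints[fp_id] = {
        "track_id": track_id,
        "fingerprint": sub["fingerprint"],
        "length": sub["length"],
    }
    store.fingerprint_sources.append((fp_id, sub["source_id"]))  # provenance
    update_index(fp_id, sub["fingerprint"])
    store.submission_results[submission_id] = track_id
    # Marked last: if the worker dies earlier, the submission stays unhandled
    # and will be picked up again
    sub["handled"] = True
    return track_id
```

Setting `handled` as the final step is what makes the unhandled-submission query a safe retry mechanism, at the cost of possibly repeating earlier steps after a crash.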
Track Merging
- Identify duplicate tracks (manual or automated)
- Set `track.new_id` to the target track
- Trigger updates all references
- Merge fingerprints, MBIDs, and metadata
- Disable old track (`track.disabled = TRUE`)

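
Moving MBID links during a merge has to respect the UNIQUE `(track_id, mbid)` constraint on `track_mbid`: if both tracks already link the same MBID, the two rows must become one. A sketch of one plausible rule, adding the `submission_count` values together; the source does not say how counts are combined, so that rule and the function name are assumptions.

```python
from __future__ import annotations


def merge_mbid_links(
    links: dict[tuple[int, str], int], old_id: int, new_id: int
) -> dict[tuple[int, str], int]:
    """Re-point track_mbid rows from old_id to new_id.

    `links` maps (track_id, mbid) -> submission_count. A link that already
    exists on the target is combined by summing counts (assumed rule), so
    no duplicate (track_id, mbid) key is produced.
    """
    merged: dict[tuple[int, str], int] = {}
    for (track_id, mbid), count in links.items():
        target = new_id if track_id == old_id else track_id
        merged[(target, mbid)] = merged.get((target, mbid), 0) + count
    return merged


links = {(1, "mbid-a"): 3, (1, "mbid-b"): 1, (2, "mbid-a"): 5}
print(merge_mbid_links(links, old_id=1, new_id=2))
# {(2, 'mbid-a'): 8, (2, 'mbid-b'): 1}
```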
Data Cleanup
Cron Jobs:
- Delete old handled submissions (>30 days)
- Clean up orphaned metadata records
- Remove disabled tracks with no references
- Archive old statistics
Performance Optimization
Query Optimization
Materialized Views:
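
```sql
-- Count MBIDs in a subquery: joining track_mbid first would repeat each
-- fingerprint row once per MBID link and inflate SUM(submission_count)
CREATE MATERIALIZED VIEW track_stats AS
SELECT
    f.track_id,
    COUNT(*) AS fingerprint_count,
    (SELECT COUNT(DISTINCT tm.mbid)
       FROM track_mbid tm
      WHERE tm.track_id = f.track_id) AS mbid_count,
    SUM(f.submission_count) AS total_submissions
FROM fingerprint f
GROUP BY f.track_id;
```
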
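
Joining `fingerprint` to `track_mbid` before aggregating repeats each fingerprint row once per MBID link of its track, so a `SUM` over the joined rows overcounts (and plain `COUNT` of fingerprints does too). A toy illustration with made-up rows for a single track:

```python
fingerprints = [  # (fingerprint id, track_id, submission_count)
    (10, 1, 4),
    (11, 1, 6),
]
track_mbids = [  # (track_id, mbid)
    (1, "mbid-a"),
    (1, "mbid-b"),
    (1, "mbid-c"),
]

# Join first, then aggregate: every fingerprint appears once per MBID link
joined = [
    (fp, cnt, mbid)
    for fp, t, cnt in fingerprints
    for t2, mbid in track_mbids
    if t == t2
]
naive_rows = len(joined)
naive_total = sum(cnt for _, cnt, _ in joined)

# Aggregate each table on its own, then combine per track
fingerprint_total = sum(cnt for _, t, cnt in fingerprints if t == 1)
mbid_count = len({mbid for t, mbid in track_mbids if t == 1})

print(naive_rows, naive_total, fingerprint_total, mbid_count)  # 6 30 10 3
```

The inflation factor is exactly the number of MBID links, which is why aggregating the two tables separately is the safer shape for this view.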
Partitioning (future):
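
```sql
-- Partition submissions by month (the parent table must be declared
-- with PARTITION BY RANGE (created) for this to work)
CREATE TABLE submission_2025_04 PARTITION OF submission
    FOR VALUES FROM ('2025-04-01') TO ('2025-05-01');
```
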
Caching Strategy
Application-Level:
- API key validation (TTLCache, 60s)
- Format ID lookup (permanent cache)
- MusicBrainz MBID existence (Redis, 1h)
Database-Level:
- Shared buffers (PostgreSQL config)
- Connection pooling (SQLAlchemy)
- Query statistics via `pg_stat_statements` (tracks query performance; it does not cache results)
Bulk Operations
Batch Inserts:

```python
# Insert multiple fingerprints efficiently; fingerprint_dicts is a list of
# {column_name: value} dicts
db.session.bulk_insert_mappings(Fingerprint, fingerprint_dicts)
db.session.commit()
```

Bulk Updates:

```python
from sqlalchemy import update

# Update submission counts in one statement, computed server-side
db.session.execute(
    update(Fingerprint)
    .where(Fingerprint.id.in_(fingerprint_ids))
    .values(submission_count=Fingerprint.submission_count + 1)
)
db.session.commit()
```

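
A very large `fingerprint_ids` list becomes one huge `IN (...)` parameter list; splitting it keeps each statement and transaction bounded. A small helper for that, hypothetical and not taken from AcoustID (Python 3.12+ also ships `itertools.batched`, which yields tuples instead of lists):

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator


def chunked(ids: Iterable[int], size: int) -> Iterator[list[int]]:
    """Yield successive lists of at most `size` ids."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch


print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Each batch would then go into its own `update(Fingerprint).where(Fingerprint.id.in_(batch))` statement; the batch size is a tuning choice, not something the source specifies.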
Backup and Recovery
Backup Strategy
PostgreSQL:
- Daily full backups (pg_dump)
- Continuous WAL archiving
- Point-in-time recovery enabled
Index:
- Daily snapshots via `/:index/_snapshot`
- Incremental backups of Oplog
- Segment files backed up separately

Disaster Recovery
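Database Restore:

```shell
# Restore from dump
pg_restore -d acoustid_app acoustid_app_backup.dump
# Point-in-time recovery is not a pg_restore feature: restore a base backup,
# set restore_command and recovery_target_time = '2025-04-28 12:00:00' in
# postgresql.conf (PostgreSQL 12+), then start the server in recovery mode
touch "$PGDATA/recovery.signal"
```

Index Rebuild:

```shell
# Rebuild from database
python manage.py run import --rebuild-index
```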