- gRPC service with MusicBrainz provider - PostgreSQL schema with migrations - Service layer with database-first caching - Repository pattern for data access - YAML configuration support - Research documentation for 17 music metadata projects
14 KiB
MusicBrainz Server Codebase
Configuration System
Two-Tier Architecture
File: lib/DBDefs.pm
Structure:
lib/DBDefs/Default.pm- Base defaults (in git)lib/DBDefs.pm- Instance-specific overrides (not in git)
Pattern:
package DBDefs;
use parent 'DBDefs::Default';
# Override defaults for this instance
sub DB_SCHEMA_SEQUENCE { 28 }
sub DB_STAGING_SERVER { 0 }
sub REPLICATION_TYPE { RT_MASTER }
Configuration Categories
Database Configuration:
# Primary database
sub READWRITE_DATABASE {
return {
database => 'musicbrainz_db',
host => 'localhost',
port => 5432,
username => 'musicbrainz',
password => 'musicbrainz',
};
}
# Read-only replica (optional)
sub READONLY_DATABASE { READWRITE_DATABASE }
# System user for maintenance
sub SYSTEM_USER { 'musicbrainz' }
# Schema version
sub DB_SCHEMA_SEQUENCE { 28 }
# Staging server flag
sub DB_STAGING_SERVER { 0 }
Redis Configuration:
# Redis server
sub REDIS_SERVER { 'localhost:6379' }
# Redis namespace (prefix for all keys)
sub REDIS_NAMESPACE { 'MB' }
# Redis databases (0-15)
sub REDIS_DATABASE_CACHE { 0 }
sub REDIS_DATABASE_SESSION { 1 }
sub REDIS_DATABASE_SEARCH { 2 }
sub REDIS_DATABASE_STATS { 3 }
Solr Configuration:
# Solr server
sub SOLR_SERVER { 'http://localhost:8983/solr' }
# Solr cores
sub SOLR_CORE_ARTIST { 'artist' }
sub SOLR_CORE_RELEASE { 'release' }
sub SOLR_CORE_RECORDING { 'recording' }
# ... (13 cores total)
Web Server Configuration:
# Server processes
sub WEB_SERVER_PROCESSES { 10 }
# Server host
sub WEB_SERVER_HOST { 'localhost' }
# Server port
sub WEB_SERVER_PORT { 5000 }
# Use reverse proxy
sub WEB_SERVER_USED_IN_REVERSE_PROXY { 1 }
Mail Configuration:
# SMTP server
sub SMTP_SERVER { 'localhost' }
# From address
sub EMAIL_SUPPORT_ADDRESS { 'support@musicbrainz.org' }
# Noreply address
sub EMAIL_NOREPLY_ADDRESS { 'noreply@musicbrainz.org' }
# Bugs address
sub EMAIL_BUGS_ADDRESS { 'bugs@musicbrainz.org' }
External Service Configuration:
# Cover Art Archive
sub COVER_ART_ARCHIVE_ACCESS_KEY { '' }
sub COVER_ART_ARCHIVE_SECRET_KEY { '' }
sub COVER_ART_ARCHIVE_UPLOAD_PREFIXER { 'MB' }
sub COVER_ART_ARCHIVE_DOWNLOAD_PREFIX { 'https://coverartarchive.org' }
# Wikipedia
sub WIKIPEDIA_CACHE_TIMEOUT { 259200 } # 3 days
# Discourse SSO
sub DISCOURSE_SSO_SECRET { '' }
sub DISCOURSE_SERVER { 'https://community.metabrainz.org' }
# MetaBrainz OAuth
sub OAUTH2_ENFORCE_TLS { 1 }
Replication Configuration:
# Replication type
sub REPLICATION_TYPE { RT_STANDALONE } # RT_MASTER, RT_MIRROR, RT_STANDALONE
# Replication access token
sub REPLICATION_ACCESS_TOKEN { '' }
# Replication URL
sub REPLICATION_URL { 'https://data.musicbrainz.org/replication' }
Session Configuration:
# Session expiry (10 hours)
sub SESSION_EXPIRE { 36000 }
# Session idle timeout (3 hours)
sub SESSION_IDLE_TIMEOUT { 10800 }
# Session cookie name
sub SESSION_COOKIE { 'AF_SID' }
# Session cookie domain
sub SESSION_DOMAIN { '.musicbrainz.org' }
Feature Flags:
# Enable beta features
sub BETA_FEATURES { 0 }
# Enable development mode
sub DEVELOPMENT_SERVER { 0 }
# Enable debug mode
sub DEBUG { 0 }
# Enable SQL logging
sub DB_READ_ONLY { 0 }
Rate Limiting:
# API rate limit (requests per second)
sub API_RATE_LIMIT { 1 }
# Web rate limit (requests per second)
sub WEB_RATE_LIMIT { 10 }
Caching:
# Cache TTL for entities (seconds)
sub CACHE_TTL_ENTITY { 3600 } # 1 hour
# Cache TTL for search results (seconds)
sub CACHE_TTL_SEARCH { 900 } # 15 minutes
# Cache TTL for statistics (seconds)
sub CACHE_TTL_STATS { 3600 } # 1 hour
Logging System
Log::Dispatch Framework
Configuration:
use Log::Dispatch;
my $log = Log::Dispatch->new(
outputs => [
[
'Screen',
min_level => 'debug',
stderr => 1,
newline => 1,
],
[
'File',
min_level => 'info',
filename => '/var/log/musicbrainz/server.log',
mode => 'append',
newline => 1,
],
],
);
Log Levels
DEBUG: Verbose debugging information
$log->debug("Loading artist with GID: $gid");
INFO: Informational messages
$log->info("User $username logged in");
WARN: Warning messages
$log->warn("Cache miss for entity $gid");
ERROR: Error messages
$log->error("Failed to connect to database: $error");
FATAL: Fatal errors
$log->fatal("Database connection lost, shutting down");
Message Limit
Maximum Size: 16KB per log message
Truncation: Messages exceeding 16KB are truncated with "..." suffix
Rationale: Prevent log flooding from large data dumps
Lazy Evaluation
Pattern:
# Expensive operation only executed if debug level enabled
$log->debug(sub {
my $data = expensive_serialization($object);
return "Object data: $data";
});
Benefits:
- Avoid expensive operations when logging disabled
- Reduce CPU usage in production
Stack Traces
Automatic: Stack traces included for ERROR and FATAL levels
Format:
ERROR: Failed to load artist
Stack trace:
at MusicBrainz::Server::Data::Artist::get_by_gid line 123
at MusicBrainz::Server::Controller::Artist::show line 45
at Catalyst::Action::execute line 67
Log Rotation
Tool: logrotate
Configuration:
/var/log/musicbrainz/*.log {
daily
rotate 30
compress
delaycompress
notifempty
create 0640 musicbrainz musicbrainz
sharedscripts
postrotate
/usr/bin/killall -HUP starman
endscript
}
Error Tracking (Sentry)
Server-Side Integration
Library: Sentry::Raven (Perl SDK)
Configuration:
use Sentry::Raven;
my $raven = Sentry::Raven->new(
sentry_dsn => 'https://public_key@sentry.io/project_id',
environment => 'production',
release => '2024.01.15',
);
Capture Exception:
eval {
# Code that might fail
$c->model('Artist')->get_by_gid($gid);
};
if ($@) {
$raven->capture_exception($@, {
request => {
url => $c->req->uri,
method => $c->req->method,
headers => $c->req->headers,
},
user => {
id => $c->user->id,
username => $c->user->name,
},
extra => {
gid => $gid,
},
});
}
Client-Side Integration
Library: @sentry/browser (JavaScript SDK)
Configuration:
import * as Sentry from '@sentry/browser';
Sentry.init({
dsn: 'https://public_key@sentry.io/project_id',
environment: 'production',
release: '2024.01.15',
integrations: [
new Sentry.BrowserTracing(),
],
tracesSampleRate: 0.1,
});
Capture Exception:
try {
// Code that might fail
loadArtist(gid);
} catch (error) {
Sentry.captureException(error, {
tags: {
component: 'ArtistPage',
},
extra: {
gid: gid,
},
});
}
Context Enrichment
Request Context:
- URL
- HTTP method
- Headers
- Query parameters
- POST data (sanitized)
User Context:
- User ID
- Username
- Email (hashed)
- IP address (anonymized)
Custom Context:
- Entity GID
- Edit ID
- Search query
- API endpoint
Monitoring
Current State
Metrics Endpoint: None (no Prometheus exporter)
Health Check Endpoint: None (no /health endpoint)
Workarounds:
- Monitor HTTP 200 responses on
/ - Parse logs for error rates
- Monitor database connection count
- Monitor Redis memory usage
Planned Improvements
Prometheus Exporter:
- Request count by endpoint
- Request duration histogram
- Database query count
- Database query duration
- Cache hit/miss ratio
- Edit submission rate
- Vote count
Health Check Endpoint:
- Database connectivity
- Redis connectivity
- Solr connectivity
- Disk space
- Memory usage
Session Management
Redis-Backed Sessions
Storage: Redis database 1
Session Key: session:{session_id}
Session Data:
{
"user_id": 12345,
"username": "user",
"csrf_token": "abc123...",
"last_activity": 1609459200,
"preferences": {
"language": "en",
"timezone": "UTC"
}
}
Session Lifecycle
Creation:
my $session_id = generate_session_id(); # Random 32-byte hex
my $session_data = {
user_id => $user->id,
csrf_token => generate_csrf_token(),
last_activity => time(),
};
$redis->setex(
"session:$session_id",
36000, # 10 hours
encode_json($session_data)
);
$c->res->cookies->{AF_SID} = {
value => $session_id,
path => '/',
domain => '.musicbrainz.org',
secure => 1,
httponly => 1,
samesite => 'Lax',
};
Validation:
my $session_id = $c->req->cookies->{AF_SID};
my $session_json = $redis->get("session:$session_id");
if (!$session_json) {
# Session expired or invalid
return undef;
}
my $session_data = decode_json($session_json);
# Check idle timeout
my $idle_time = time() - $session_data->{last_activity};
if ($idle_time > 10800) { # 3 hours
$redis->del("session:$session_id");
return undef;
}
# Update last activity
$session_data->{last_activity} = time();
$redis->setex("session:$session_id", 36000, encode_json($session_data));
return $session_data;
Destruction:
$redis->del("session:$session_id");
$c->res->cookies->{AF_SID} = {
value => '',
expires => '-1d',
};
Session Expiry
Absolute Expiry: 10 hours (36,000 seconds)
Idle Timeout: 3 hours (10,800 seconds)
Sliding Window: Last activity updated on each request
Cookie Configuration
Name: AF_SID
Attributes:
Secure- HTTPS onlyHttpOnly- Not accessible via JavaScriptSameSite=Lax- CSRF protectionDomain=.musicbrainz.org- Shared across subdomainsPath=/- Available site-wide
Security
CSRF Protection
Token Generation:
use Digest::SHA qw(sha256_hex);
my $csrf_token = sha256_hex(
$session_id .
$user_id .
time() .
random_bytes(32)
);
Token Storage: Stored in session data
Token Validation:
sub validate_csrf_token {
my ($c, $submitted_token) = @_;
my $session_token = $c->session->{csrf_token};
if (!$session_token || $submitted_token ne $session_token) {
$c->detach('/error_403');
}
}
Form Inclusion:
<form method="POST" action="/edit/artist/create">
<input type="hidden" name="csrf_token" value="[% csrf_token %]">
<!-- form fields -->
</form>
AJAX Requests:
fetch('/api/endpoint', {
method: 'POST',
headers: {
'X-CSRF-Token': csrfToken,
'Content-Type': 'application/json',
},
body: JSON.stringify(data),
});
Content Security Policy (CSP)
Header:
Content-Security-Policy:
default-src 'self';
script-src 'self' 'unsafe-inline' https://www.google-analytics.com;
style-src 'self' 'unsafe-inline';
img-src 'self' data: https:;
font-src 'self' data:;
connect-src 'self' https://sentry.io;
frame-ancestors 'none';
Directives:
default-src 'self'- Only load resources from same originscript-src- Allow scripts from self and Google Analyticsstyle-src- Allow styles from self (inline allowed for legacy)img-src- Allow images from anywhere (cover art, etc.)connect-src- Allow AJAX to self and Sentryframe-ancestors 'none'- Prevent clickjacking
Authentication
Realms:
- Session-based (cookie)
- HTTP Digest (legacy)
- OAuth2 Bearer token
Session Authentication:
sub authenticate_session {
my ($c) = @_;
my $session_id = $c->req->cookies->{AF_SID};
my $session = $c->model('Session')->load($session_id);
if ($session) {
my $user = $c->model('Editor')->get_by_id($session->{user_id});
$c->set_authenticated_user($user);
}
}
OAuth2 Authentication:
sub authenticate_oauth2 {
my ($c) = @_;
my $auth_header = $c->req->header('Authorization');
if ($auth_header =~ /^Bearer (.+)$/) {
my $token = $1;
my $token_info = $c->model('OAuth2')->introspect($token);
if ($token_info->{active}) {
my $user = $c->model('Editor')->get_by_id($token_info->{sub});
$c->set_authenticated_user($user);
}
}
}
Password Hashing
Algorithm: Bcrypt
Cost Factor: 12 (2^12 = 4096 iterations)
Hashing:
use Crypt::Eksblowfish::Bcrypt qw(bcrypt en_base64);
sub hash_password {
my ($password) = @_;
my $salt = generate_salt(); # 16 random bytes
my $settings = '$2a$12$' . en_base64($salt);
return bcrypt($password, $settings);
}
Verification:
sub verify_password {
my ($password, $hash) = @_;
my $computed_hash = bcrypt($password, $hash);
return $computed_hash eq $hash;
}
Password Requirements:
- Minimum 8 characters
- No maximum length
- No complexity requirements (user choice)
Editor Privileges
Privilege Flags (Bitmask):
| Flag | Value | Description |
|---|---|---|
UNTRUSTED |
1 | New user, limited privileges |
AUTOEDITOR |
2 | Auto-editor, edits auto-approved |
BOT |
4 | Bot account |
UNTRUSTED_BOT |
5 | Untrusted bot (1 + 4) |
RELATIONSHIP_EDITOR |
8 | Can edit relationships |
WIKI_TRANSCLUSION |
16 | Can transclude wiki content |
MBID_SUBMITTER |
32 | Can submit MBIDs |
ACCOUNT_ADMIN |
64 | Can manage user accounts |
LOCATION_EDITOR |
128 | Can edit locations |
BANNER_EDITOR |
256 | Can edit site banners |
EDITING_DISABLED |
512 | Editing disabled (banned) |
ADDING_NOTES_DISABLED |
1024 | Cannot add edit notes |
SPAMMER |
2048 | Marked as spammer |
AUTO_EDITOR_ELECTIONS |
4096 | Can vote in auto-editor elections |
DONT_NAG |
8192 | Don't show donation nag |
Privilege Check:
sub is_auto_editor {
my ($user) = @_;
return ($user->privs & 2) != 0;
}
sub can_edit_relationships {
my ($user) = @_;
return ($user->privs & 8) != 0;
}
Auto-Editor Election System
Eligibility:
- 100+ accepted edits
- Member for 2+ weeks
- No recent failed votes
Election Process:
- User nominates self or is nominated
- 1-week voting period
- Existing auto-editors vote
- 75% approval required
- Minimum 5 votes required
Auto-Editor Benefits:
- Edits auto-approved (no voting period)
- Can vote in elections
- Can approve/reject edits
- Higher trust level