Bedrock-API Data Layer
Database Technology
RDBMS: PostgreSQL 15
Driver: github.com/jackc/pgx/v5 (native PostgreSQL driver)
Connection Pooling: pgxpool (pgx connection pool)
Migration Tool: None (manual SQL execution)
Database Schema
Users Table
File: db/migrations/001_create_users_table.up.sql
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
role VARCHAR(50) DEFAULT 'user',
is_verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_users_email ON users(email);
Columns:
| Column | Type | Constraints | Purpose |
|---|---|---|---|
| id | UUID | PRIMARY KEY, DEFAULT gen_random_uuid() | Unique user identifier |
| email | VARCHAR(255) | UNIQUE, NOT NULL | User email (login identifier) |
| password_hash | VARCHAR(255) | NOT NULL | bcrypt hashed password |
| role | VARCHAR(50) | DEFAULT 'user' | User role (user/admin) |
| is_verified | BOOLEAN | DEFAULT false | Email verification status |
| created_at | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | Account creation timestamp |
Indexes:
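- Primary key index on id (automatic)
- B-tree index on email (for login lookups). The UNIQUE constraint on email already creates an implicit unique index, so idx_users_email duplicates it.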
No Foreign Keys: Single table schema, no relationships
Schema Limitations
Missing Tables:
- No metadata cache (tracks, albums, artists, playlists)
- No user listening history
- No user playlists
- No user favorites/likes
- No play counts
- No search history
- No provider credentials (Spotify tokens, etc.)
Minimal User Data:
- No user profile (name, avatar, bio)
- No user preferences (language, region)
- No user settings (privacy, notifications)
- No user sessions (active logins)
Connection Management
Connection Pool Configuration
File: bedrock_server/main.go
func initDB() (*pgxpool.Pool, error) {
dbURL := os.Getenv("DATABASE_URL")
if dbURL == "" {
return nil, errors.New("DATABASE_URL not set")
}
config, err := pgxpool.ParseConfig(dbURL)
if err != nil {
return nil, fmt.Errorf("parse config: %w", err)
}
// Pool configuration
config.MaxConns = 10
config.MinConns = 2
config.MaxConnLifetime = time.Hour
config.MaxConnIdleTime = 30 * time.Minute
config.HealthCheckPeriod = 1 * time.Minute
pool, err := pgxpool.NewWithConfig(context.Background(), config)
if err != nil {
return nil, fmt.Errorf("create pool: %w", err)
}
// Test connection
if err := pool.Ping(context.Background()); err != nil {
return nil, fmt.Errorf("ping: %w", err)
}
log.Println("Database connection pool initialized")
return pool, nil
}
Pool Parameters:
| Parameter | Value | Rationale |
|---|---|---|
| MaxConns | 10 | Limit concurrent DB connections |
| MinConns | 2 | Keep warm connections ready |
| MaxConnLifetime | 1 hour | Prevent stale connections |
| MaxConnIdleTime | 30 minutes | Close idle connections |
| HealthCheckPeriod | 1 minute | Detect dead connections |
Connection String Format:
postgresql://username:password@host:port/database?sslmode=disable
Example:
DATABASE_URL=postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable
Connection Lifecycle
Application Start:
1. Parse DATABASE_URL from environment
2. Create pgxpool.Config with custom parameters
3. Initialize connection pool
4. Ping database to verify connectivity
5. Pass pool to service layer
Request Handling:
1. Service method receives context and pool
2. Acquire connection from pool (automatic)
3. Execute query
4. Release connection back to pool (automatic: the row returned by pool.QueryRow releases the connection when Scan is called)
Application Shutdown:
1. Call pool.Close(), which rejects new Acquire calls
2. Close blocks until all acquired connections are returned to the pool and closed
3. Release all resources
Data Access Layer
User Store
File: store/user.go
type UserStore struct {
db *pgxpool.Pool
}
func NewUserStore(db *pgxpool.Pool) *UserStore {
return &UserStore{db: db}
}
User Operations
Save User
func (s *UserStore) Save(ctx context.Context, email, passwordHash string) (string, error) {
var userID string
query := `
INSERT INTO users (email, password_hash)
VALUES ($1, $2)
RETURNING id
`
err := s.db.QueryRow(ctx, query, email, passwordHash).Scan(&userID)
if err != nil {
if strings.Contains(err.Error(), "duplicate key") {
return "", errors.New("email already exists")
}
return "", fmt.Errorf("insert user: %w", err)
}
return userID, nil
}
Behavior:
- Inserts new user with email and password hash
- Returns generated UUID
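- Handles the duplicate email error by matching "duplicate key" in the error text (fragile; checking for SQLSTATE 23505 on pgconn.PgError would be more robust)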
- Uses parameterized query (SQL injection safe)
Example:
userID, err := userStore.Save(ctx, "user@example.com", "$2a$10$...")
// userID = "550e8400-e29b-41d4-a716-446655440000"
Find User by Email
func (s *UserStore) Find(ctx context.Context, email string) (*User, error) {
var user User
query := `
SELECT id, email, password_hash, role, is_verified, created_at
FROM users
WHERE email = $1
`
err := s.db.QueryRow(ctx, query, email).Scan(
&user.ID,
&user.Email,
&user.PasswordHash,
&user.Role,
&user.IsVerified,
&user.CreatedAt,
)
if err != nil {
if err == pgx.ErrNoRows {
return nil, errors.New("user not found")
}
return nil, fmt.Errorf("query user: %w", err)
}
return &user, nil
}
Behavior:
- Queries user by email (uses index)
- Returns full user record
- Handles not found case
- Uses parameterized query
Example:
user, err := userStore.Find(ctx, "user@example.com")
// user.ID = "550e8400-e29b-41d4-a716-446655440000"
// user.Email = "user@example.com"
// user.PasswordHash = "$2a$10$..."
Find User by ID
func (s *UserStore) FindByID(ctx context.Context, id string) (*User, error) {
var user User
query := `
SELECT id, email, password_hash, role, is_verified, created_at
FROM users
WHERE id = $1
`
err := s.db.QueryRow(ctx, query, id).Scan(
&user.ID,
&user.Email,
&user.PasswordHash,
&user.Role,
&user.IsVerified,
&user.CreatedAt,
)
if err != nil {
if err == pgx.ErrNoRows {
return nil, errors.New("user not found")
}
return nil, fmt.Errorf("query user: %w", err)
}
return &user, nil
}
Behavior: Similar to Find, but queries by UUID primary key
User Model
type User struct {
ID string
Email string
PasswordHash string
Role string
IsVerified bool
CreatedAt time.Time
}
No ORM: Plain structs, manual scanning
Database Migrations
Migration Files
Directory: db/migrations/
Naming Convention: {number}_{description}.{up|down}.sql
Example Structure:
db/migrations/
├── 001_create_users_table.up.sql
├── 001_create_users_table.down.sql
├── 002_add_user_roles.up.sql
├── 002_add_user_roles.down.sql
├── 003_add_email_verification.up.sql
└── 003_add_email_verification.down.sql
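The naming convention above is regular enough to be parsed mechanically. The following is a minimal sketch, assuming descriptions contain only lowercase letters, digits, and underscores (as in the listed files); `parseMigration` and `upMigrations` are hypothetical helpers, not part of the Bedrock-API codebase.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// migration describes one file named {number}_{description}.{up|down}.sql.
type migration struct {
	Version     int
	Description string
	Direction   string // "up" or "down"
}

// Assumption: descriptions use lowercase letters, digits, and underscores only.
var migrationName = regexp.MustCompile(`^(\d+)_([a-z0-9_]+)\.(up|down)\.sql$`)

// parseMigration splits a file name into version, description, and direction.
func parseMigration(name string) (migration, error) {
	m := migrationName.FindStringSubmatch(name)
	if m == nil {
		return migration{}, fmt.Errorf("invalid migration file name %q", name)
	}
	version, err := strconv.Atoi(m[1])
	if err != nil {
		return migration{}, fmt.Errorf("invalid version in %q: %w", name, err)
	}
	return migration{Version: version, Description: m[2], Direction: m[3]}, nil
}

// upMigrations keeps only the "up" files and sorts them by version.
func upMigrations(names []string) ([]migration, error) {
	var ups []migration
	for _, name := range names {
		m, err := parseMigration(name)
		if err != nil {
			return nil, err
		}
		if m.Direction == "up" {
			ups = append(ups, m)
		}
	}
	sort.Slice(ups, func(i, j int) bool { return ups[i].Version < ups[j].Version })
	return ups, nil
}

func main() {
	ups, err := upMigrations([]string{
		"002_add_user_roles.up.sql",
		"001_create_users_table.down.sql",
		"001_create_users_table.up.sql",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range ups {
		fmt.Printf("%03d %s\n", m.Version, m.Description)
	}
}
```

Sorting on the parsed integer rather than the raw file name keeps the order correct even if the zero-padding width changes later.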
Migration 001: Create Users Table
Up Migration (001_create_users_table.up.sql):
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
role VARCHAR(50) DEFAULT 'user',
is_verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_users_email ON users(email);
Down Migration (001_create_users_table.down.sql):
DROP INDEX IF EXISTS idx_users_email;
DROP TABLE IF EXISTS users;
Migration Execution
No Automated Tool: Migrations must be run manually
Manual Execution:
# Apply migration
psql $DATABASE_URL -f db/migrations/001_create_users_table.up.sql
# Rollback migration
psql $DATABASE_URL -f db/migrations/001_create_users_table.down.sql
Recommended Tools (not integrated):
- golang-migrate/migrate
- pressly/goose
- rubenv/sql-migrate
Migration Tracking
No Tracking Table: No record of applied migrations
Risks:
- No way to know which migrations have been applied
- Manual tracking required
- Risk of applying migrations out of order
- Risk of applying same migration twice
Recommendation: Integrate migration tool with tracking table
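To illustrate what a tracking table buys, the sketch below shows only the decision logic: given the versions found on disk and the set of versions already recorded as applied, it returns what is still pending and rejects out-of-order gaps. The `applied` map stands in for rows read from a hypothetical tracking table; `pending` is an illustrative name, not an API of any migration tool.

```go
package main

import (
	"fmt"
	"sort"
)

// pending returns the unapplied versions in ascending order.
// It fails when an unapplied version is lower than the highest applied one,
// which means a migration was skipped and would now run out of order.
func pending(available []int, applied map[int]bool) ([]int, error) {
	sorted := append([]int(nil), available...) // copy so the caller's slice is untouched
	sort.Ints(sorted)

	highest := 0
	for v := range applied {
		if v > highest {
			highest = v
		}
	}

	var out []int
	for _, v := range sorted {
		if applied[v] {
			continue // already recorded, never run twice
		}
		if v < highest {
			return nil, fmt.Errorf("migration %03d was skipped: %03d is already applied", v, highest)
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	todo, err := pending([]int{1, 2, 3}, map[int]bool{1: true})
	fmt.Println(todo, err)

	_, err = pending([]int{1, 2, 3}, map[int]bool{1: true, 3: true})
	fmt.Println(err)
}
```

This covers two of the listed risks directly: a recorded version is never applied twice, and a gap below the highest applied version is reported instead of being applied silently.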
Caching Strategy
Current Implementation
No Caching: All data fetched from providers on every request
Impact:
- High latency (200-500ms per search)
- Provider API rate limits
- Unnecessary API quota consumption
- No offline capability
Planned Caching (Redis)
Not Implemented: Redis integration planned but not built
Proposed Cache Keys:
| Key Pattern | TTL | Purpose |
|---|---|---|
| track:{platform}:{id} | 1 hour | Track metadata |
| album:{platform}:{id} | 1 hour | Album metadata |
| artist:{platform}:{id} | 1 hour | Artist metadata |
| playlist:{platform}:{id} | 5 minutes | Playlist metadata (changes frequently) |
| stream:{platform}:{id} | 1 hour | Stream URLs (expire after 1-6 hours) |
| search:{query}:{platform} | 5 minutes | Search results |
| lyrics:{artist}:{title} | 24 hours | Lyrics (rarely change) |
| play:{user_id}:{track_id} | 30 seconds | Play deduplication |
| status:{platform} | 5 minutes | Provider health status |
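One way to keep key construction and TTLs in a single place is a small lookup keyed by the key prefix. This is a sketch only: `cacheTTL`, `entityKey`, and `ttlFor` are hypothetical names, the TTL values are copied from the proposed table, and `abc123` is a placeholder ID.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// cacheTTL mirrors the proposed TTL table, keyed by key prefix.
var cacheTTL = map[string]time.Duration{
	"track":    time.Hour,
	"album":    time.Hour,
	"artist":   time.Hour,
	"playlist": 5 * time.Minute,
	"stream":   time.Hour,
	"search":   5 * time.Minute,
	"lyrics":   24 * time.Hour,
	"play":     30 * time.Second,
	"status":   5 * time.Minute,
}

// entityKey builds keys of the form kind:platform:id.
func entityKey(kind, platform, id string) string {
	return fmt.Sprintf("%s:%s:%s", kind, platform, id)
}

// ttlFor looks up the TTL by the text before the first colon.
// The second result is false for prefixes that are not in the table.
func ttlFor(key string) (time.Duration, bool) {
	prefix := strings.SplitN(key, ":", 2)[0]
	ttl, ok := cacheTTL[prefix]
	return ttl, ok
}

func main() {
	key := entityKey("track", "spotify", "abc123")
	ttl, ok := ttlFor(key)
	fmt.Println(key, ttl, ok)
}
```

Free-text components such as the search query can themselves contain colons, so a real implementation would likely need to escape or hash them before embedding them in a key.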
Proposed Cache Invalidation:
- TTL-based expiration (no manual invalidation)
- No cache warming (lazy loading)
- No cache preloading
Proposed Redis Configuration:
redisClient := redis.NewClient(&redis.Options{
Addr: os.Getenv("REDIS_URL"),
Password: os.Getenv("REDIS_PASSWORD"),
DB: 0,
MaxRetries: 3,
PoolSize: 10,
MinIdleConns: 2,
})
Cache-Aside Pattern (Proposed)
func (s *server) GetTrack(ctx context.Context, req *pb.GetRequest) (*pb.Track, error) {
    // Try cache first
    cacheKey := fmt.Sprintf("track:%s", req.Id)
    cached, err := s.redis.Get(ctx, cacheKey).Result()
    if err == nil {
        var track pb.Track
        // Treat an undecodable entry as a miss instead of returning a zero-value track
        if err := json.Unmarshal([]byte(cached), &track); err == nil {
            return &track, nil
        }
    }
    // Cache miss (or cache error), fetch from provider
    platform, nativeID := parseNamespacedID(req.Id)
    provider := s.getProvider(platform)
    track, err := provider.GetTrack(ctx, nativeID)
    if err != nil {
        return nil, err
    }
    // Store in cache (best effort: a failed cache write must not fail the request)
    if trackJSON, err := json.Marshal(track); err == nil {
        s.redis.Set(ctx, cacheKey, trackJSON, 1*time.Hour)
    }
    return track, nil
}
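The cache-aside read path can be exercised without Redis. The sketch below swaps in an in-memory TTL cache and a manually advanced clock so that hit, miss, and expiry are all observable; `ttlCache`, `manualClock`, and `getOrLoad` are hypothetical names, and the in-memory map is a stand-in for Redis, not the proposed implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const trackTTL = time.Hour // TTL for track metadata from the proposed table

// manualClock is a controllable time source so expiry can be shown without sleeping.
type manualClock struct{ t time.Time }

func (c *manualClock) Now() time.Time          { return c.t }
func (c *manualClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

type entry struct {
	value   string
	expires time.Time
}

// ttlCache is an in-memory stand-in for Redis GET and SET with expiry.
type ttlCache struct {
	mu    sync.Mutex
	items map[string]entry
	now   func() time.Time
}

func newTTLCache(now func() time.Time) *ttlCache {
	return &ttlCache{items: make(map[string]entry), now: now}
}

// Get returns the cached value unless it is absent or expired.
func (c *ttlCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok || !c.now().Before(e.expires) {
		delete(c.items, key)
		return "", false
	}
	return e.value, true
}

// Set stores the value with an absolute expiry of now + ttl.
func (c *ttlCache) Set(key, value string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value, expires: c.now().Add(ttl)}
}

// getOrLoad is the cache-aside read path: serve from cache on a hit,
// otherwise call load and store its result. Loader errors are not cached.
func getOrLoad(c *ttlCache, key string, ttl time.Duration, load func() (string, error)) (string, error) {
	if v, ok := c.Get(key); ok {
		return v, nil
	}
	v, err := load()
	if err != nil {
		return "", err
	}
	c.Set(key, v, ttl)
	return v, nil
}

func main() {
	clock := &manualClock{t: time.Unix(0, 0)}
	cache := newTTLCache(clock.Now)
	calls := 0
	load := func() (string, error) { calls++; return "track metadata", nil }

	getOrLoad(cache, "track:spotify:abc123", trackTTL, load) // miss: calls the loader
	getOrLoad(cache, "track:spotify:abc123", trackTTL, load) // hit: served from cache
	clock.Advance(2 * trackTTL)
	getOrLoad(cache, "track:spotify:abc123", trackTTL, load) // expired: calls the loader again
	fmt.Println("loader calls:", calls)                      // prints "loader calls: 2"
}
```

The sketch does not coalesce concurrent misses: two requests that miss at the same time both call the loader, which matters when the loader is a rate-limited provider API.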
Data Persistence Patterns
No Metadata Persistence
Current: All metadata is ephemeral (fetched from providers, not stored)
Implications:
- No historical data
- No offline access
- No analytics on metadata changes
- No data ownership
Alternative Approach (not implemented):
- Store all fetched metadata in PostgreSQL
- Update on cache miss
- Enable historical queries
- Reduce provider API dependency
No User Data Persistence
Current: Only authentication data is stored
Missing User Data:
- Listening history
- Favorite tracks/albums/artists
- Created playlists
- Search history
- Playback state (current track, position)
- User preferences
Implications:
- No personalization
- No recommendations based on history
- No cross-device sync
- No user analytics
Transaction Handling
No Transactions
Current: All database operations are single-statement
Example (no transaction):
func (s *UserStore) Save(ctx context.Context, email, passwordHash string) (string, error) {
var userID string
err := s.db.QueryRow(ctx,
"INSERT INTO users (email, password_hash) VALUES ($1, $2) RETURNING id",
email, passwordHash,
).Scan(&userID)
return userID, err
}
No Multi-Statement Operations: Each operation is a single statement, which PostgreSQL already executes atomically, so explicit transactions are not needed
Future Considerations: If schema expands (user profiles, playlists, etc.), transactions will be needed
Transaction Example (not used):
func (s *UserStore) SaveWithProfile(ctx context.Context, email, passwordHash, name string) error {
tx, err := s.db.Begin(ctx)
if err != nil {
return err
}
defer tx.Rollback(ctx)
var userID string
err = tx.QueryRow(ctx,
"INSERT INTO users (email, password_hash) VALUES ($1, $2) RETURNING id",
email, passwordHash,
).Scan(&userID)
if err != nil {
return err
}
_, err = tx.Exec(ctx,
"INSERT INTO profiles (user_id, name) VALUES ($1, $2)",
userID, name,
)
if err != nil {
return err
}
return tx.Commit(ctx)
}
Query Performance
Index Usage
Indexed Queries:
-- Uses idx_users_email (B-tree index)
SELECT * FROM users WHERE email = 'user@example.com';
-- Uses primary key index (automatic)
SELECT * FROM users WHERE id = '550e8400-e29b-41d4-a716-446655440000';
No Full Table Scans Required: Every query filters on an indexed column (on a very small table the planner may still choose a sequential scan)
Query Patterns
Point Lookups Only: No range queries, no aggregations, no joins
Example Queries:
-- Login (index scan on email)
SELECT id, email, password_hash, role, is_verified, created_at
FROM users
WHERE email = $1;
-- Token refresh (index scan on id)
SELECT id, email, role
FROM users
WHERE id = $1;
-- Registration (insert with RETURNING)
INSERT INTO users (email, password_hash)
VALUES ($1, $2)
RETURNING id;
No Complex Queries: Simple CRUD operations only
Data Consistency
Email Uniqueness
Constraint: UNIQUE constraint on email column
Enforcement: Database-level (PostgreSQL)
Race Condition Handling:
err := s.db.QueryRow(ctx, query, email, passwordHash).Scan(&userID)
if err != nil {
if strings.Contains(err.Error(), "duplicate key") {
return "", errors.New("email already exists")
}
return "", fmt.Errorf("insert user: %w", err)
}
Concurrent Registration: Database prevents duplicate emails even with concurrent requests
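Matching on the error text works but depends on the exact server message. A more robust variant classifies the error by its SQLSTATE code. The sketch below uses a local stand-in type so it runs without a database: `sqlStateError` and `mapInsertError` are hypothetical, while in pgx v5 the driver error to target with errors.As is *pgconn.PgError, which exposes the code in its Code field.

```go
package main

import (
	"errors"
	"fmt"
)

// sqlStateError is a stand-in for a driver error that carries a SQLSTATE code.
// With pgx v5 the real type to target is *pgconn.PgError.
type sqlStateError struct {
	Code    string
	Message string
}

func (e *sqlStateError) Error() string {
	return e.Message + " (SQLSTATE " + e.Code + ")"
}

const uniqueViolation = "23505" // PostgreSQL SQLSTATE for unique_violation

var errEmailExists = errors.New("email already exists")

// mapInsertError translates a unique-constraint violation into errEmailExists
// and wraps every other error unchanged.
func mapInsertError(err error) error {
	if err == nil {
		return nil
	}
	var se *sqlStateError
	if errors.As(err, &se) && se.Code == uniqueViolation {
		return errEmailExists
	}
	return fmt.Errorf("insert user: %w", err)
}

func main() {
	dup := fmt.Errorf("exec: %w", &sqlStateError{
		Code:    "23505",
		Message: "duplicate key value violates unique constraint",
	})
	fmt.Println(errors.Is(mapInsertError(dup), errEmailExists)) // prints "true"
}
```

Returning a sentinel error also lets callers test for the duplicate case with errors.Is instead of comparing message strings.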
UUID Generation
Method: PostgreSQL gen_random_uuid() function
Collision Probability: Negligible (UUID v4 has 122 random bits)
No Application-Level ID Generation: Database handles ID creation
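The "negligible" collision claim can be made concrete with the standard birthday approximation, assuming IDs are drawn uniformly from the 2^122 possible random values. For n generated IDs:

```latex
p(n) \;\approx\; 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{122}}\right)
\;\approx\; \frac{n^{2}}{2^{123}},
\qquad
p\!\left(10^{9}\right) \;\approx\; \frac{10^{18}}{1.06 \times 10^{37}}
\;\approx\; 9.4 \times 10^{-20}
```

Even at a billion users, the chance of any collision is roughly one in 10^19 under that assumption.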
Backup and Recovery
No Automated Backups
Current: No backup strategy implemented
Risks:
- Data loss on database failure
- No point-in-time recovery
- No disaster recovery plan
Recommendations:
- Enable PostgreSQL continuous archiving (WAL archiving)
- Schedule daily full backups
- Test restore procedures
- Store backups off-site (S3, etc.)
Manual Backup
pg_dump:
pg_dump $DATABASE_URL > backup.sql
Restore:
psql $DATABASE_URL < backup.sql
Data Security
Password Storage
Hashing Algorithm: bcrypt
Cost Factor: 10 (2^10 = 1024 iterations)
Implementation:
func hashPassword(password string) (string, error) {
bytes, err := bcrypt.GenerateFromPassword([]byte(password), 10)
return string(bytes), err
}
func checkPasswordHash(password, hash string) bool {
err := bcrypt.CompareHashAndPassword([]byte(hash), []byte(password))
return err == nil
}
Security Properties:
- Salted (bcrypt includes random salt)
- Slow (cost factor 10 takes roughly 50-100 ms per hash, depending on hardware)
- Resistant to rainbow tables
- Resistant to brute force (with rate limiting, not implemented)
SQL Injection Prevention
Parameterized Queries: All queries use $1, $2 placeholders
Safe Example:
// Safe: parameterized query
var userID string
err := s.db.QueryRow(ctx,
    "SELECT id FROM users WHERE email = $1",
    email,
).Scan(&userID)
Unsafe Example (not used):
// Unsafe: string concatenation (NOT USED IN CODEBASE)
query := fmt.Sprintf("SELECT id FROM users WHERE email = '%s'", email)
err := s.db.QueryRow(ctx, query).Scan(&userID)
All Queries Are Safe: No string concatenation in SQL queries
Connection Security
SSL Mode: Configurable via connection string
Example (SSL disabled):
DATABASE_URL=postgresql://user:pass@localhost:5432/db?sslmode=disable
Example (SSL required):
DATABASE_URL=postgresql://user:pass@localhost:5432/db?sslmode=require
Production Recommendation: Use sslmode=require or sslmode=verify-full
Database Monitoring
No Monitoring
Current: No database monitoring implemented
Missing Metrics:
- Connection pool utilization
- Query latency
- Slow query log
- Deadlock detection
- Table bloat
- Index usage statistics
Recommendations:
- Enable the PostgreSQL pg_stat_statements extension
- Monitor connection pool metrics (pgxpool provides stats)
- Set up alerts for connection pool exhaustion
- Log slow queries (> 1 second)
Connection Pool Stats (Available but Not Used)
stats := pool.Stat()
log.Printf("Total connections: %d", stats.TotalConns())
log.Printf("Idle connections: %d", stats.IdleConns())
log.Printf("Acquired connections: %d", stats.AcquiredConns())
log.Printf("Max connections: %d", stats.MaxConns())
Not Implemented: Stats are available but not logged or exposed
Data Retention
No Retention Policy
Current: Data is never deleted
User Data:
- Users are never deleted (no account deletion endpoint)
- No GDPR compliance (no data export, no right to be forgotten)
Recommendations:
- Implement account deletion endpoint
- Add soft delete (deleted_at timestamp)
- Implement data export (GDPR compliance)
- Add retention policy for inactive accounts
Scalability Considerations
Vertical Scaling
Current Limits:
- Connection pool: 10 max connections
- Single PostgreSQL instance
- No read replicas
Scaling Up:
- Increase connection pool size
- Increase PostgreSQL resources (CPU, RAM)
- Tune PostgreSQL configuration (shared_buffers, work_mem)
Horizontal Scaling
Not Supported: Single database instance
Challenges:
- No sharding strategy
- No read/write splitting
- No multi-region support
Future Considerations:
- Add read replicas for search queries
- Shard by user ID for user data
- Use connection pooler (PgBouncer) for connection management
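As an illustration of the "shard by user ID" idea, the sketch below maps a user UUID to a shard index with a stable hash. It is only one possible scheme: `shardFor` is a hypothetical helper, and FNV-1a with modulo placement is an assumption, not something the project specifies.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a user ID to a shard index in [0, shards).
// It returns 0 when shards is 0 to avoid a division by zero.
func shardFor(userID string, shards uint32) uint32 {
	if shards == 0 {
		return 0
	}
	h := fnv.New32a()
	h.Write([]byte(userID)) // Write on a hash.Hash never returns an error
	return h.Sum32() % shards
}

func main() {
	id := "550e8400-e29b-41d4-a716-446655440000"
	fmt.Println("shard:", shardFor(id, 4))
}
```

Modulo placement is simple but reassigns most users when the shard count changes; consistent hashing or a lookup table avoids that at the cost of more machinery.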
Data Model Limitations
Single Table Schema
Pros:
- Simple to understand
- No joins required
- Fast queries (index lookups only)
Cons:
- No relational data (playlists, favorites, etc.)
- No metadata persistence
- No user activity tracking
- Limited functionality
No Audit Trail
Missing:
- No login history
- No password change history
- No account modification log
- No admin action log
Implications:
- No security forensics
- No compliance audit trail
- No user activity analytics
No Soft Deletes
Hard Delete Only: If delete functionality is added, records are permanently removed
Recommendation: Add deleted_at timestamp for soft deletes
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;
CREATE INDEX idx_users_deleted_at ON users(deleted_at);
-- Query active users
SELECT * FROM users WHERE deleted_at IS NULL;
Testing Strategy
No Database Tests
Current: No unit tests for database operations
Missing Tests:
- User creation with duplicate email
- User lookup by email
- User lookup by ID
- Connection pool exhaustion
- Database connection failure
- Transaction rollback (if added)
Recommendation: Add integration tests with test database
Example Test (not implemented):
func TestUserStore_Save_DuplicateEmail(t *testing.T) {
db := setupTestDB(t)
defer db.Close()
store := NewUserStore(db)
// First save should succeed
_, err := store.Save(context.Background(), "test@example.com", "hash1")
if err != nil {
t.Fatalf("first save failed: %v", err)
}
// Second save with same email should fail
_, err = store.Save(context.Background(), "test@example.com", "hash2")
if err == nil {
t.Fatal("expected duplicate email error")
}
}
Environment Configuration
Database URL
Environment Variable: DATABASE_URL
Format: PostgreSQL connection string
Example:
DATABASE_URL=postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable
Components:
- Protocol: postgresql://
- Username: bedrock
- Password: bedrock
- Host: localhost
- Port: 5432
- Database: bedrock
- SSL Mode: sslmode=disable
No Validation: Beyond the parse step in initDB, the connection string is not validated; the application fails at startup if DATABASE_URL is missing or invalid
Recommendation: Validate connection string format on startup
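A startup check could look like the sketch below. It uses only the standard library and checks structure, not connectivity. `validateDatabaseURL` is a hypothetical helper, and the rules that a host and a database name must be present are assumptions about this deployment (PostgreSQL itself accepts URLs that omit them).

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// validateDatabaseURL runs basic structural checks on a PostgreSQL
// connection URL. Hypothetical helper, not part of the Bedrock-API codebase.
func validateDatabaseURL(raw string) error {
	if raw == "" {
		return errors.New("DATABASE_URL not set")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse DATABASE_URL: %w", err)
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	// Assumption: this deployment always connects over TCP to a named database.
	if u.Hostname() == "" {
		return errors.New("missing host")
	}
	if strings.Trim(u.Path, "/") == "" {
		return errors.New("missing database name")
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable",
		"mysql://localhost/bedrock",
		"postgresql://localhost:5432/",
	} {
		if err := validateDatabaseURL(raw); err != nil {
			fmt.Printf("invalid: %v\n", err)
			continue
		}
		fmt.Println("ok")
	}
}
```

Running such a check before the pool is created produces an error message that names the faulty component instead of a generic parse or connection failure.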
Docker Deployment
Docker Compose PostgreSQL
File: docker-compose.yml
version: '3.8'
services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: bedrock
      POSTGRES_PASSWORD: bedrock
      POSTGRES_DB: bedrock
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U bedrock"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  postgres_data:
Features:
- PostgreSQL 15 Alpine (minimal image)
- Named volume for data persistence
- Health check for container orchestration
- Exposed port for local development
Missing:
- No initialization scripts (migrations must be run manually)
- No backup configuration
- No replication
- No connection pooler (PgBouncer)
Database Initialization
Manual Process:
# Start PostgreSQL
docker-compose up -d postgres
# Wait for PostgreSQL to be ready
docker-compose exec postgres pg_isready -U bedrock
# Run migrations (piped from the host, because the container does not mount db/migrations)
docker-compose exec -T postgres psql -U bedrock -d bedrock < db/migrations/001_create_users_table.up.sql
No Automated Initialization: Migrations must be run manually after container start
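Recommendation: Add init scripts to docker-compose. The postgres image runs every *.sql file in /docker-entrypoint-initdb.d in sorted name order, and only when the data directory is empty, so the mounted directory should contain the up migrations only (mounting db/migrations as-is would also run the .down.sql files).
  postgres:
    image: postgres:15-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
      # Hypothetical directory containing only the *.up.sql files
      - ./db/initdb:/docker-entrypoint-initdb.d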
Data Layer Summary
Strengths
- Simple, focused schema (users only)
- Proper indexing (email lookup is fast)
- Connection pooling (pgx/v5)
- Parameterized queries (SQL injection safe)
- bcrypt password hashing (secure)
Weaknesses
- No metadata persistence (all data is ephemeral)
- No caching (high latency, provider API dependency)
- No migration tool (manual SQL execution)
- No monitoring (connection pool, query performance)
- No backup strategy (data loss risk)
- No audit trail (security, compliance)
- Minimal schema (no user data beyond auth)
Recommendations for Metadata Aggregator
Adopt:
- pgx/v5 driver (excellent performance, native PostgreSQL features)
- Connection pooling configuration (sensible defaults)
- Parameterized queries (security best practice)
Avoid:
- Manual migrations (use golang-migrate or goose instead)
- Running without a cache (implement Redis for metadata)
- A minimal schema (a metadata aggregator needs a rich schema)
Enhance:
- Add metadata tables (tracks, albums, artists, labels, etc.)
- Add user data tables (favorites, playlists, history)
- Add caching layer (Redis for hot data)
- Add migration tool (automated schema management)
- Add monitoring (connection pool, query latency)
- Add backup strategy (automated backups, point-in-time recovery)