
Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00

23 KiB


Bedrock-API Data Layer

Database Technology

RDBMS: PostgreSQL 15
Driver: github.com/jackc/pgx/v5 (native PostgreSQL driver)
Connection Pooling: pgxpool (pgx connection pool)
Migration Tool: None (manual SQL execution)

Database Schema

Users Table

File: db/migrations/001_create_users_table.up.sql

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    role VARCHAR(50) DEFAULT 'user',
    is_verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_users_email ON users(email);

Columns:

Column	Type	Constraints	Purpose
id	UUID	PRIMARY KEY, DEFAULT gen_random_uuid()	Unique user identifier
email	VARCHAR(255)	UNIQUE, NOT NULL	User email (login identifier)
password_hash	VARCHAR(255)	NOT NULL	bcrypt hashed password
role	VARCHAR(50)	DEFAULT 'user'	User role (user/admin)
is_verified	BOOLEAN	DEFAULT false	Email verification status
created_at	TIMESTAMP	DEFAULT CURRENT_TIMESTAMP	Account creation timestamp

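Indexes:

Primary key index on id (automatic)
Unique B-tree index on email (automatic, created by the UNIQUE constraint; named users_email_key by default)
B-tree index idx_users_email on email (redundant: the unique index already serves login lookups, so this second index only adds write and storage overhead)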

No Foreign Keys: Single table schema, no relationships

Schema Limitations

Missing Tables:

No metadata cache (tracks, albums, artists, playlists)
No user listening history
No user playlists
No user favorites/likes
No play counts
No search history
No provider credentials (Spotify tokens, etc.)

Minimal User Data:

No user profile (name, avatar, bio)
No user preferences (language, region)
No user settings (privacy, notifications)
No user sessions (active logins)

Connection Management

Connection Pool Configuration

File: bedrock_server/main.go

import (
    "context"
    "errors"
    "fmt"
    "log"
    "os"
    "time"

    "github.com/jackc/pgx/v5/pgxpool"
)

func initDB() (*pgxpool.Pool, error) {
    dbURL := os.Getenv("DATABASE_URL")
    if dbURL == "" {
        return nil, errors.New("DATABASE_URL not set")
    }
    
    config, err := pgxpool.ParseConfig(dbURL)
    if err != nil {
        return nil, fmt.Errorf("parse config: %w", err)
    }
    
    // Pool configuration
    config.MaxConns = 10
    config.MinConns = 2
    config.MaxConnLifetime = time.Hour
    config.MaxConnIdleTime = 30 * time.Minute
    config.HealthCheckPeriod = 1 * time.Minute
    
    pool, err := pgxpool.NewWithConfig(context.Background(), config)
    if err != nil {
        return nil, fmt.Errorf("create pool: %w", err)
    }
    
    // Test connection
    if err := pool.Ping(context.Background()); err != nil {
        pool.Close() // release the pool's background connections before giving up
        return nil, fmt.Errorf("ping: %w", err)
    }
    
    log.Println("Database connection pool initialized")
    return pool, nil
}

Pool Parameters:

Parameter	Value	Rationale
MaxConns	10	Limit concurrent DB connections
MinConns	2	Keep warm connections ready
MaxConnLifetime	1 hour	Prevent stale connections
MaxConnIdleTime	30 minutes	Close idle connections
HealthCheckPeriod	1 minute	Detect dead connections

Connection String Format:

postgresql://username:password@host:port/database?sslmode=disable

Example:

DATABASE_URL=postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable

Connection Lifecycle

Application Start:
1. Parse DATABASE_URL from environment
2. Create pgxpool.Config with custom parameters
3. Initialize connection pool
4. Ping database to verify connectivity
5. Pass pool to service layer

Request Handling:
1. Service method receives context and pool
2. Acquire connection from pool (automatic)
3. Execute query
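4. Release connection back to pool (automatic: after Row.Scan for QueryRow, when the command completes for Exec, and on rows.Close() for Query, which is why Query results are usually closed with defer)

Application Shutdown:
1. Call pool.Close(), which rejects new acquires
2. pool.Close() blocks until acquired connections are returned, then closes them
3. Release all resources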

Data Access Layer

User Store

File: store/user.go

type UserStore struct {
    db *pgxpool.Pool
}

func NewUserStore(db *pgxpool.Pool) *UserStore {
    return &UserStore{db: db}
}

User Operations

Save User

func (s *UserStore) Save(ctx context.Context, email, passwordHash string) (string, error) {
    var userID string
    
    query := `
        INSERT INTO users (email, password_hash)
        VALUES ($1, $2)
        RETURNING id
    `
    
    err := s.db.QueryRow(ctx, query, email, passwordHash).Scan(&userID)
    if err != nil {
        if strings.Contains(err.Error(), "duplicate key") {
            return "", errors.New("email already exists")
        }
        return "", fmt.Errorf("insert user: %w", err)
    }
    
    return userID, nil
}

Behavior:

Inserts new user with email and password hash
Returns generated UUID
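Maps a duplicate email to "email already exists" by matching the text "duplicate key" in the error message (brittle; SQLSTATE 23505 is the stable signal)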
Uses parameterized query (SQL injection safe)

Example:

userID, err := userStore.Save(ctx, "user@example.com", "$2a$10$...")
// userID = "550e8400-e29b-41d4-a716-446655440000"

Find User by Email

func (s *UserStore) Find(ctx context.Context, email string) (*User, error) {
    var user User
    
    query := `
        SELECT id, email, password_hash, role, is_verified, created_at
        FROM users
        WHERE email = $1
    `
    
    err := s.db.QueryRow(ctx, query, email).Scan(
        &user.ID,
        &user.Email,
        &user.PasswordHash,
        &user.Role,
        &user.IsVerified,
        &user.CreatedAt,
    )
    
    if err != nil {
        if errors.Is(err, pgx.ErrNoRows) {
            return nil, errors.New("user not found")
        }
        return nil, fmt.Errorf("query user: %w", err)
    }
    
    return &user, nil
}

Behavior:

Queries user by email (uses index)
Returns full user record
Handles not found case
Uses parameterized query

Example:

user, err := userStore.Find(ctx, "user@example.com")
// user.ID = "550e8400-e29b-41d4-a716-446655440000"
// user.Email = "user@example.com"
// user.PasswordHash = "$2a$10$..."

Find User by ID

func (s *UserStore) FindByID(ctx context.Context, id string) (*User, error) {
    var user User
    
    query := `
        SELECT id, email, password_hash, role, is_verified, created_at
        FROM users
        WHERE id = $1
    `
    
    err := s.db.QueryRow(ctx, query, id).Scan(
        &user.ID,
        &user.Email,
        &user.PasswordHash,
        &user.Role,
        &user.IsVerified,
        &user.CreatedAt,
    )
    
    if err != nil {
        if errors.Is(err, pgx.ErrNoRows) {
            return nil, errors.New("user not found")
        }
        return nil, fmt.Errorf("query user: %w", err)
    }
    
    return &user, nil
}

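Behavior: Same as Find, but queries by the UUID primary key. A string that is not valid UUID syntax is rejected by PostgreSQL (invalid input syntax for type uuid), so it surfaces as a "query user: ..." error rather than "user not found".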
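Since id is a uuid column, a malformed string makes PostgreSQL reject the query with SQLSTATE 22P02 (invalid_text_representation), so FindByID reports a generic query error instead of "user not found". One way to keep the two failure modes consistent, assumed here rather than taken from the codebase, is to validate the canonical 8-4-4-4-12 hex layout before querying. The check is deliberately stricter than PostgreSQL, which also accepts forms such as braces or missing hyphens. isCanonicalUUID is a hypothetical helper name.

```go
package main

import "fmt"

// isCanonicalUUID reports whether s has the 8-4-4-4-12 hexadecimal layout,
// e.g. "550e8400-e29b-41d4-a716-446655440000". Upper- and lower-case hex are accepted.
func isCanonicalUUID(s string) bool {
	if len(s) != 36 {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch i {
		case 8, 13, 18, 23: // hyphen positions
			if c != '-' {
				return false
			}
		default:
			isHex := (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')
			if !isHex {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println(isCanonicalUUID("550e8400-e29b-41d4-a716-446655440000")) // true
	fmt.Println(isCanonicalUUID("not-a-uuid"))                           // false
}
```

FindByID could then begin with a guard such as `if !isCanonicalUUID(id) { return nil, errors.New("user not found") }` (a hypothetical change).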

User Model

type User struct {
    ID           string
    Email        string
    PasswordHash string
    Role         string
    IsVerified   bool
    CreatedAt    time.Time
}

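No ORM: Plain structs, manual scanning. Note that role and is_verified are nullable in the schema (they have defaults but no NOT NULL), and pgx cannot scan a NULL into a plain string or bool; this only works because no row sets them to NULL explicitly.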

Database Migrations

Migration Files

Directory: db/migrations/

Naming Convention: {number}_{description}.{up|down}.sql

Example Structure (illustrative; the 001 migration shown below already creates the role and is_verified columns):

db/migrations/
├── 001_create_users_table.up.sql
├── 001_create_users_table.down.sql
├── 002_add_user_roles.up.sql
├── 002_add_user_roles.down.sql
├── 003_add_email_verification.up.sql
└── 003_add_email_verification.down.sql

Migration 001: Create Users Table

Up Migration (001_create_users_table.up.sql):

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    role VARCHAR(50) DEFAULT 'user',
    is_verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_users_email ON users(email);

Down Migration (001_create_users_table.down.sql):

DROP INDEX IF EXISTS idx_users_email;
DROP TABLE IF EXISTS users;

Migration Execution

No Automated Tool: Migrations must be run manually

Manual Execution:

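# Apply migration (quote the URL so the shell does not glob "?"; ON_ERROR_STOP makes psql exit on the first failing statement)
psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f db/migrations/001_create_users_table.up.sql

# Rollback migration
psql "$DATABASE_URL" -v ON_ERROR_STOP=1 -f db/migrations/001_create_users_table.down.sql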

Recommended Tools (not integrated):

golang-migrate/migrate
pressly/goose
rubenv/sql-migrate

Migration Tracking

No Tracking Table: No record of applied migrations

Risks:

No way to know which migrations have been applied
Manual tracking required
Risk of applying migrations out of order
Risk of applying same migration twice

Recommendation: Integrate migration tool with tracking table
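Much of what golang-migrate or goose provide comes down to parsing the {number}_{description}.{up|down}.sql names, ordering them by number, and comparing them against a table of applied versions. The sketch below shows only that pure planning step. The tracking table (for example, a hypothetical schema_migrations(version) table) and the SQL execution are left out, and pendingUp and migration are illustrative names.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// migrationName matches {number}_{description}.{up|down}.sql.
var migrationName = regexp.MustCompile(`^(\d+)_([a-z0-9_]+)\.(up|down)\.sql$`)

type migration struct {
	Version int
	File    string
}

// pendingUp returns the .up.sql files whose version is not yet applied,
// sorted by version so they run in order. Unknown names and duplicate
// versions are reported as errors instead of being silently skipped.
func pendingUp(files []string, applied map[int]bool) ([]migration, error) {
	var out []migration
	seen := map[int]string{}
	for _, f := range files {
		m := migrationName.FindStringSubmatch(f)
		if m == nil {
			return nil, fmt.Errorf("unexpected migration file name %q", f)
		}
		if m[3] != "up" {
			continue // down files are only used for rollback
		}
		v, err := strconv.Atoi(m[1])
		if err != nil {
			return nil, err
		}
		if prev, dup := seen[v]; dup {
			return nil, fmt.Errorf("duplicate version %d: %s and %s", v, prev, f)
		}
		seen[v] = f
		if !applied[v] {
			out = append(out, migration{Version: v, File: f})
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Version < out[j].Version })
	return out, nil
}

func main() {
	files := []string{
		"002_add_user_roles.up.sql",
		"001_create_users_table.up.sql",
		"001_create_users_table.down.sql",
		"003_add_email_verification.up.sql",
	}
	plan, err := pendingUp(files, map[int]bool{1: true}) // version 1 already applied
	if err != nil {
		panic(err)
	}
	for _, m := range plan {
		fmt.Println(m.Version, m.File)
	}
	// 2 002_add_user_roles.up.sql
	// 3 003_add_email_verification.up.sql
}
```

Recording each version in the tracking table in the same transaction as the migration itself is what prevents the "applied twice" risk listed above.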

Caching Strategy

Current Implementation

No Caching: All data fetched from providers on every request

Impact:

High latency (200-500ms per search)
Provider API rate limits
Unnecessary API quota consumption
No offline capability

Planned Caching (Redis)

Not Implemented: Redis integration planned but not built

Proposed Cache Keys:

Key Pattern	TTL	Purpose
`track:{platform}:{id}`	1 hour	Track metadata
`album:{platform}:{id}`	1 hour	Album metadata
`artist:{platform}:{id}`	1 hour	Artist metadata
`playlist:{platform}:{id}`	5 minutes	Playlist metadata (changes frequently)
`stream:{platform}:{id}`	1 hour	Stream URLs (expire after 1-6 hours; a TTL at or above the shortest URL lifetime can serve expired URLs, so keep it somewhat below)
`search:{query}:{platform}`	5 minutes	Search results
`lyrics:{artist}:{title}`	24 hours	Lyrics (rarely change)
`play:{user_id}:{track_id}`	30 seconds	Play deduplication
`status:{platform}`	5 minutes	Provider health status
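A small helper keeps the key patterns and TTLs in one place. This is a hypothetical sketch: the function and constant names are illustrative, and normalizing the search query (trimming, lower-casing, then escaping with url.QueryEscape so a ":" inside the query cannot be confused with the key separator) is an assumption aimed at raising the hit rate, not part of the proposal.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"time"
)

// TTLs taken from the proposed table.
const (
	metadataTTL = time.Hour
	searchTTL   = 5 * time.Minute
)

// entityKey builds track:/album:/artist:/playlist: keys, e.g. "track:spotify:123".
func entityKey(kind, platform, id string) string {
	return fmt.Sprintf("%s:%s:%s", kind, platform, id)
}

// searchKey normalizes the query and escapes it so ':' in the query
// cannot collide with the key separator.
func searchKey(query, platform string) string {
	q := strings.ToLower(strings.TrimSpace(query))
	return fmt.Sprintf("search:%s:%s", url.QueryEscape(q), platform)
}

func main() {
	fmt.Println(entityKey("track", "spotify", "123"), metadataTTL) // track:spotify:123 1h0m0s
	fmt.Println(searchKey("  Daft Punk ", "spotify"), searchTTL)   // search:daft+punk:spotify 5m0s
}
```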

Proposed Cache Invalidation:

TTL-based expiration (no manual invalidation)
No cache warming (lazy loading)
No cache preloading

Proposed Redis Configuration:

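// Options.Addr expects a bare host:port, so a URL-style REDIS_URL
// (e.g. redis://localhost:6379/0) is parsed with redis.ParseURL instead.
opts, err := redis.ParseURL(os.Getenv("REDIS_URL"))
if err != nil {
    log.Fatalf("parse REDIS_URL: %v", err)
}
if pw := os.Getenv("REDIS_PASSWORD"); pw != "" {
    opts.Password = pw
}
opts.MaxRetries = 3
opts.PoolSize = 10
opts.MinIdleConns = 2
redisClient := redis.NewClient(opts)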

Cache-Aside Pattern (Proposed)

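func (s *server) GetTrack(ctx context.Context, req *pb.GetRequest) (*pb.Track, error) {
    // Try cache first
    cacheKey := fmt.Sprintf("track:%s", req.Id)
    cached, err := s.redis.Get(ctx, cacheKey).Bytes()
    switch {
    case err == nil:
        var track pb.Track
        // protojson follows protobuf's JSON mapping, which encoding/json does not
        if uerr := protojson.Unmarshal(cached, &track); uerr == nil {
            return &track, nil
        }
        // Corrupt entry: fall through and refetch from the provider
    case !errors.Is(err, redis.Nil):
        // Redis unavailable: log and serve from the provider instead of failing
        log.Printf("cache get %s: %v", cacheKey, err)
    }
    
    // Cache miss, fetch from provider
    platform, nativeID := parseNamespacedID(req.Id)
    provider := s.getProvider(platform)
    track, err := provider.GetTrack(ctx, nativeID)
    if err != nil {
        return nil, err
    }
    
    // Store in cache; a failed write should not fail the request
    if trackJSON, merr := protojson.Marshal(track); merr == nil {
        if serr := s.redis.Set(ctx, cacheKey, trackJSON, 1*time.Hour).Err(); serr != nil {
            log.Printf("cache set %s: %v", cacheKey, serr)
        }
    }
    
    return track, nil
}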

Data Persistence Patterns

No Metadata Persistence

Current: All metadata is ephemeral (fetched from providers, not stored)

Implications:

No historical data
No offline access
No analytics on metadata changes
No data ownership

Alternative Approach (not implemented):

Store all fetched metadata in PostgreSQL
Update on cache miss
Enable historical queries
Reduce provider API dependency

No User Data Persistence

Current: Only authentication data is stored

Missing User Data:

Listening history
Favorite tracks/albums/artists
Created playlists
Search history
Playback state (current track, position)
User preferences

Implications:

No personalization
No recommendations based on history
No cross-device sync
No user analytics

Transaction Handling

No Transactions

Current: All database operations are single-statement

Example (no transaction):

func (s *UserStore) Save(ctx context.Context, email, passwordHash string) (string, error) {
    var userID string
    err := s.db.QueryRow(ctx,
        "INSERT INTO users (email, password_hash) VALUES ($1, $2) RETURNING id",
        email, passwordHash,
    ).Scan(&userID)
    return userID, err
}

No Multi-Statement Operations: No need for transactions with single table

Future Considerations: If schema expands (user profiles, playlists, etc.), transactions will be needed

Transaction Example (not used):

func (s *UserStore) SaveWithProfile(ctx context.Context, email, passwordHash, name string) error {
    tx, err := s.db.Begin(ctx)
    if err != nil {
        return err
    }
    defer tx.Rollback(ctx)
    
    var userID string
    err = tx.QueryRow(ctx,
        "INSERT INTO users (email, password_hash) VALUES ($1, $2) RETURNING id",
        email, passwordHash,
    ).Scan(&userID)
    if err != nil {
        return err
    }
    
    _, err = tx.Exec(ctx,
        "INSERT INTO profiles (user_id, name) VALUES ($1, $2)",
        userID, name,
    )
    if err != nil {
        return err
    }
    
    return tx.Commit(ctx)
}

Query Performance

Index Usage

Indexed Queries:

-- Can use a B-tree index on email (users_email_key or idx_users_email)
SELECT * FROM users WHERE email = 'user@example.com';

-- Uses primary key index (automatic)
SELECT * FROM users WHERE id = '550e8400-e29b-41d4-a716-446655440000';

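No Full Table Scans Expected: every query filters on an indexed column (id or email); on a very small table the planner may still pick a sequential scan, which is harmless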

Query Patterns

Point Lookups Only: No range queries, no aggregations, no joins

Example Queries:

-- Login (index scan on email)
SELECT id, email, password_hash, role, is_verified, created_at
FROM users
WHERE email = $1;

-- Token refresh (index scan on id)
SELECT id, email, role
FROM users
WHERE id = $1;

-- Registration (insert with RETURNING)
INSERT INTO users (email, password_hash)
VALUES ($1, $2)
RETURNING id;

No Complex Queries: Simple CRUD operations only

Data Consistency

Email Uniqueness

Constraint: UNIQUE constraint on email column

Enforcement: Database-level (PostgreSQL)

Race Condition Handling:

err := s.db.QueryRow(ctx, query, email, passwordHash).Scan(&userID)
if err != nil {
    if strings.Contains(err.Error(), "duplicate key") {
        return "", errors.New("email already exists")
    }
    return "", fmt.Errorf("insert user: %w", err)
}

Concurrent Registration: Database prevents duplicate emails even with concurrent requests

UUID Generation

Method: PostgreSQL gen_random_uuid() function

Collision Probability: Negligible (UUID v4 has 122 random bits)

No Application-Level ID Generation: Database handles ID creation
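The "negligible" claim can be made concrete with the birthday approximation: for n random v4 UUIDs drawn from 2^122 values, the probability of at least one collision is about 1 - exp(-n(n-1)/2^123). A small Go sketch of that arithmetic; collisionProbability is an illustrative name and the user counts are examples.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates P(at least one collision) among n
// random UUIDv4 values (122 random bits) using the birthday bound.
func collisionProbability(n float64) float64 {
	space := math.Ldexp(1, 122) // 2^122 possible values
	// -Expm1(-x) computes 1 - exp(-x) accurately for tiny x.
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	for _, n := range []float64{1e6, 1e9} {
		fmt.Printf("n=%.0e p=%.2e\n", n, collisionProbability(n))
	}
}
```

This gives about 9.4e-26 for a million users and about 9.4e-20 for a billion.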

Backup and Recovery

No Automated Backups

Current: No backup strategy implemented

Risks:

Data loss on database failure
No point-in-time recovery
No disaster recovery plan

Recommendations:

Enable PostgreSQL continuous archiving (WAL archiving)
Schedule daily full backups
Test restore procedures
Store backups off-site (S3, etc.)

Manual Backup

pg_dump:

pg_dump "$DATABASE_URL" > backup.sql

Restore:

psql "$DATABASE_URL" < backup.sql

Data Security

Password Storage

Hashing Algorithm: bcrypt
Cost Factor: 10, i.e. bcrypt.DefaultCost (the cost is a base-2 exponent: 2^10 = 1024 rounds of bcrypt's expensive key setup)

Implementation:

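// bcrypt.DefaultCost is 10. bcrypt only uses the first 72 bytes of a password;
// recent golang.org/x/crypto versions reject longer input with ErrPasswordTooLong.
func hashPassword(password string) (string, error) {
    bytes, err := bcrypt.GenerateFromPassword([]byte(password), bcrypt.DefaultCost)
    return string(bytes), err
}

// checkPasswordHash reports whether password matches the stored bcrypt hash.
func checkPasswordHash(password, hash string) bool {
    err := bcrypt.CompareHashAndPassword([]byte(hash), []byte(password))
    return err == nil
}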

Security Properties:

Salted (bcrypt includes random salt)
Slow (cost factor 10 takes on the order of 100 ms per hash; the exact figure depends on hardware)
Resistant to rainbow tables
Slows brute-force guessing (online attacks also need login rate limiting, which is not implemented)

SQL Injection Prevention

Parameterized Queries: All queries use $1, $2 placeholders

Safe Example:

// Safe: parameterized query (one Scan destination per selected column)
err := s.db.QueryRow(ctx,
    "SELECT id, email FROM users WHERE email = $1",
    email,
).Scan(&user.ID, &user.Email)

Unsafe Example (not used):

// Unsafe: string concatenation (NOT USED IN CODEBASE)
query := fmt.Sprintf("SELECT id, email FROM users WHERE email = '%s'", email)
err := s.db.QueryRow(ctx, query).Scan(&user.ID, &user.Email)

All Queries Are Safe: No string concatenation in SQL queries
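To see why the concatenated form is dangerous, the sketch below builds a query the same way with a crafted input. unsafeQuery is an illustrative helper that mirrors the anti-pattern; it is not code from the repository.

```go
package main

import "fmt"

// unsafeQuery mirrors the string-concatenation anti-pattern.
func unsafeQuery(email string) string {
	return fmt.Sprintf("SELECT id, email FROM users WHERE email = '%s'", email)
}

func main() {
	// A crafted "email" closes the string literal and appends its own condition.
	fmt.Println(unsafeQuery("x' OR '1'='1"))
	// SELECT id, email FROM users WHERE email = 'x' OR '1'='1'
}
```

With a $1 placeholder the same input is sent as a single value, so it matches no row instead of matching every row.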

Connection Security

SSL Mode: Configurable via connection string

Example (SSL disabled):

DATABASE_URL=postgresql://user:pass@localhost:5432/db?sslmode=disable

Example (SSL required):

DATABASE_URL=postgresql://user:pass@localhost:5432/db?sslmode=require

Production Recommendation: Use sslmode=require or sslmode=verify-full

Database Monitoring

No Monitoring

Current: No database monitoring implemented

Missing Metrics:

Connection pool utilization
Query latency
Slow query log
Deadlock detection
Table bloat
Index usage statistics

Recommendations:

Enable PostgreSQL pg_stat_statements extension
Monitor connection pool metrics (pgxpool provides stats)
Set up alerts for connection pool exhaustion
Log slow queries (> 1 second, e.g. log_min_duration_statement = 1000 in postgresql.conf)

Connection Pool Stats (Available but Not Used)

stats := pool.Stat()
log.Printf("Total connections: %d", stats.TotalConns())
log.Printf("Idle connections: %d", stats.IdleConns())
log.Printf("Acquired connections: %d", stats.AcquiredConns())
log.Printf("Max connections: %d", stats.MaxConns())

Not Implemented: Stats are available but not logged or exposed
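The counts above (plus cumulative counters such as EmptyAcquireCount) are enough for a simple exhaustion signal. In this hedged sketch, poolSnapshot is a hypothetical struct holding values copied from pgxpool.Stat so the logic runs without a database, and the 80% threshold is an arbitrary example.

```go
package main

import "fmt"

// poolSnapshot mirrors values read from pgxpool.Stat (hypothetical struct).
type poolSnapshot struct {
	Acquired int32 // stats.AcquiredConns()
	Idle     int32 // stats.IdleConns()
	Total    int32 // stats.TotalConns()
	Max      int32 // stats.MaxConns()
}

// utilization is the fraction of the pool's maximum that is checked out.
func (s poolSnapshot) utilization() float64 {
	if s.Max == 0 {
		return 0
	}
	return float64(s.Acquired) / float64(s.Max)
}

// nearExhaustion reports whether utilization has reached the threshold.
func nearExhaustion(s poolSnapshot, threshold float64) bool {
	return s.utilization() >= threshold
}

func main() {
	busy := poolSnapshot{Acquired: 9, Idle: 1, Total: 10, Max: 10}
	fmt.Printf("utilization=%.0f%% alert=%v\n", busy.utilization()*100, nearExhaustion(busy, 0.8))
	// utilization=90% alert=true
}
```

Sampling this periodically (for example on a ticker) and exporting it as a metric covers the "connection pool utilization" gap listed above.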

Data Retention

No Retention Policy

Current: Data is never deleted

User Data:

Users are never deleted (no account deletion endpoint)
No GDPR compliance (no data export, no right to be forgotten)

Recommendations:

Implement account deletion endpoint
Add soft delete (deleted_at timestamp)
Implement data export (GDPR compliance)
Add retention policy for inactive accounts

Scalability Considerations

Vertical Scaling

Current Limits:

Connection pool: 10 max connections
Single PostgreSQL instance
No read replicas

Scaling Up:

Increase connection pool size
Increase PostgreSQL resources (CPU, RAM)
Tune PostgreSQL configuration (shared_buffers, work_mem)
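Increasing the pool size has to respect the server's max_connections (100 by default in PostgreSQL, with superuser_reserved_connections defaulting to 3), multiplied across every application instance. The rough budgeting sketch below uses illustrative instance counts and assumes 7 extra slots kept free for psql, migrations, and monitoring. maxPoolSize is a hypothetical helper.

```go
package main

import "fmt"

// maxPoolSize returns the largest per-instance MaxConns that keeps
// instances*MaxConns + headroom within the server's usable connections.
func maxPoolSize(maxConnections, reserved, headroom, instances int) int {
	if instances <= 0 {
		return 0
	}
	usable := maxConnections - reserved - headroom
	if usable <= 0 {
		return 0
	}
	return usable / instances
}

func main() {
	// Defaults: max_connections=100, superuser_reserved_connections=3; 7 slots of assumed headroom.
	for _, n := range []int{1, 4, 10} {
		fmt.Printf("%d instances -> MaxConns <= %d\n", n, maxPoolSize(100, 3, 7, n))
	}
}
```

At 10 instances the current MaxConns of 10 would already exceed the budget (9). At that point raising max_connections or placing PgBouncer in front becomes necessary.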

Horizontal Scaling

Not Supported: Single database instance

Challenges:

No sharding strategy
No read/write splitting
No multi-region support

Future Considerations:

Add read replicas for search queries
Shard by user ID for user data
Use connection pooler (PgBouncer) for connection management

Data Model Limitations

Single Table Schema

Pros:

Simple to understand
No joins required
Fast queries (index lookups only)

Cons:

No relational data (playlists, favorites, etc.)
No metadata persistence
No user activity tracking
Limited functionality

No Audit Trail

Missing:

No login history
No password change history
No account modification log
No admin action log

Implications:

No security forensics
No compliance audit trail
No user activity analytics

No Soft Deletes

Hard Delete Only: If delete functionality is added, records are permanently removed

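Recommendation: Add a deleted_at timestamp for soft deletes. The existing UNIQUE constraint on email also covers soft-deleted rows, so a deleted account's email could not be registered again. A partial unique index on email WHERE deleted_at IS NULL would lift that restriction.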

ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;
CREATE INDEX idx_users_deleted_at ON users(deleted_at);

-- Query active users
SELECT * FROM users WHERE deleted_at IS NULL;

Testing Strategy

No Database Tests

Current: No unit tests for database operations

Missing Tests:

User creation with duplicate email
User lookup by email
User lookup by ID
Connection pool exhaustion
Database connection failure
Transaction rollback (if added)

Recommendation: Add integration tests with test database

Example Test (not implemented):

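// setupTestDB is a helper still to be written: it would connect to a disposable
// test database (e.g. via a hypothetical TEST_DATABASE_URL variable), apply the
// migrations, and empty the users table before each test.
func TestUserStore_Save_DuplicateEmail(t *testing.T) {
    db := setupTestDB(t)
    defer db.Close()
    
    store := NewUserStore(db)
    ctx := context.Background()
    
    // First save should succeed
    if _, err := store.Save(ctx, "test@example.com", "hash1"); err != nil {
        t.Fatalf("first save failed: %v", err)
    }
    
    // Second save with the same email must be rejected by the UNIQUE constraint
    _, err := store.Save(ctx, "test@example.com", "hash2")
    if err == nil {
        t.Fatal("expected duplicate email error")
    }
    if err.Error() != "email already exists" {
        t.Fatalf("unexpected error: %v", err)
    }
}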

Environment Configuration

Database URL

Environment Variable: DATABASE_URL

Format: PostgreSQL connection string

Example:

DATABASE_URL=postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable

Components:

Protocol: postgresql://
Username: bedrock
Password: bedrock
Host: localhost
Port: 5432
Database: bedrock
SSL Mode: sslmode=disable

No Validation: An invalid DATABASE_URL is only detected when pgxpool.ParseConfig or the initial Ping fails, so startup aborts with a low-level driver error

Recommendation: Validate connection string format on startup
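Such a startup check could look like the sketch below, which uses net/url only. It verifies the scheme, host, database name, and sslmode before the pool is created. The accepted sslmode values are libpq's. Rejecting non-TLS modes in production is an assumed policy, expressed through the hypothetical requireTLS flag. The check is also stricter than libpq, which accepts socket-only URLs without a host.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// libpq sslmode values.
var validSSLModes = map[string]bool{
	"disable": true, "allow": true, "prefer": true,
	"require": true, "verify-ca": true, "verify-full": true,
}

// validateDatabaseURL checks the shape of a postgres:// or postgresql:// URL.
// requireTLS is a hypothetical policy flag for production deployments.
func validateDatabaseURL(raw string, requireTLS bool) error {
	if raw == "" {
		return errors.New("DATABASE_URL not set")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse DATABASE_URL: %w", err)
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.Hostname() == "" {
		return errors.New("missing host")
	}
	if strings.TrimPrefix(u.Path, "/") == "" {
		return errors.New("missing database name")
	}
	mode := u.Query().Get("sslmode")
	if mode != "" && !validSSLModes[mode] {
		return fmt.Errorf("invalid sslmode %q", mode)
	}
	// Only require, verify-ca and verify-full guarantee an encrypted connection.
	if requireTLS && (mode == "" || mode == "disable" || mode == "allow" || mode == "prefer") {
		return fmt.Errorf("sslmode %q does not require TLS", mode)
	}
	return nil
}

func main() {
	dev := "postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable"
	fmt.Println(validateDatabaseURL(dev, false)) // <nil>
	fmt.Println(validateDatabaseURL(dev, true))  // sslmode "disable" does not require TLS
}
```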

Docker Deployment

Docker Compose PostgreSQL

File: docker-compose.yml

version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: bedrock
      POSTGRES_PASSWORD: bedrock
      POSTGRES_DB: bedrock
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U bedrock"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:

Features:

PostgreSQL 15 Alpine (minimal image)
Named volume for data persistence
Health check for container orchestration
Exposed port for local development

Missing:

No initialization scripts (migrations must be run manually)
No backup configuration
No replication
No connection pooler (PgBouncer)

Database Initialization

Manual Process:

# Start PostgreSQL
docker-compose up -d postgres

# Wait for PostgreSQL to be ready
docker-compose exec postgres pg_isready -U bedrock

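# Run migrations (the compose file mounts no /migrations directory, so pipe the file from the host; -T disables the pseudo-TTY so stdin redirection works)
docker-compose exec -T postgres psql -U bedrock -d bedrock -v ON_ERROR_STOP=1 < db/migrations/001_create_users_table.up.sql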

No Automated Initialization: Migrations must be run manually after container start

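Recommendation: Add init scripts to docker-compose. The postgres image runs the *.sql files in /docker-entrypoint-initdb.d in sorted order, but only when the data volume is empty, and mounting db/migrations directly would also execute the .down.sql files. Mount only the up migrations instead (./db/init below is a hypothetical directory holding copies of the .up.sql files):

postgres:
  image: postgres:15-alpine
  volumes:
    - postgres_data:/var/lib/postgresql/data
    - ./db/init:/docker-entrypoint-initdb.d:ro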

Data Layer Summary

Strengths

Simple, focused schema (users only)
Proper indexing (email lookup is fast)
Connection pooling (pgx/v5)
Parameterized queries (SQL injection safe)
bcrypt password hashing (secure)

Weaknesses

No metadata persistence (all data is ephemeral)
No caching (high latency, provider API dependency)
No migration tool (manual SQL execution)
No monitoring (connection pool, query performance)
No backup strategy (data loss risk)
No audit trail (security, compliance)
Minimal schema (no user data beyond auth)

Recommendations for Metadata Aggregator

Adopt:

pgx/v5 driver (excellent performance, native PostgreSQL features)
Connection pooling configuration (sensible defaults)
Parameterized queries (security best practice)

Avoid:

Manual migrations (use golang-migrate or goose)
No caching (implement Redis for metadata)
Minimal schema (metadata aggregator needs rich schema)

Enhance:

Add metadata tables (tracks, albums, artists, labels, etc.)
Add user data tables (favorites, playlists, history)
Add caching layer (Redis for hot data)
Add migration tool (automated schema management)
Add monitoring (connection pool, query latency)
Add backup strategy (automated backups, point-in-time recovery)
