# Bedrock-API Data Layer

## Database Technology

**RDBMS**: PostgreSQL 15

**Driver**: `github.com/jackc/pgx/v5` (native PostgreSQL driver)

**Connection Pooling**: `pgxpool` (pgx connection pool)

**Migration Tool**: None (manual SQL execution)

## Database Schema

### Users Table

**File**: `db/migrations/001_create_users_table.up.sql`

```sql
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    role VARCHAR(50) DEFAULT 'user',
    is_verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_users_email ON users(email);
```

**Columns**:

| Column | Type | Constraints | Purpose |
|--------|------|-------------|---------|
| id | UUID | PRIMARY KEY, DEFAULT gen_random_uuid() | Unique user identifier |
| email | VARCHAR(255) | UNIQUE, NOT NULL | User email (login identifier) |
| password_hash | VARCHAR(255) | NOT NULL | bcrypt hashed password |
| role | VARCHAR(50) | DEFAULT 'user' | User role (user/admin) |
| is_verified | BOOLEAN | DEFAULT false | Email verification status |
| created_at | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | Account creation timestamp |

**Indexes**:

- Primary key index on `id` (automatic)
- B-tree index `idx_users_email` on `email` (for login lookups)
- The `UNIQUE` constraint on `email` already creates its own unique B-tree index, so `idx_users_email` is redundant

**No Foreign Keys**: Single table schema, no relationships

### Schema Limitations

**Missing Tables**:

- No metadata cache (tracks, albums, artists, playlists)
- No user listening history
- No user playlists
- No user favorites/likes
- No play counts
- No search history
- No provider credentials (Spotify tokens, etc.)

**Minimal User Data**:

- No user profile (name, avatar, bio)
- No user preferences (language, region)
- No user settings (privacy, notifications)
- No user sessions (active logins)

## Connection Management

### Connection Pool Configuration

**File**: `bedrock_server/main.go`

```go
func initDB() (*pgxpool.Pool, error) {
    dbURL := os.Getenv("DATABASE_URL")
    if dbURL == "" {
        return nil, errors.New("DATABASE_URL not set")
    }

    config, err := pgxpool.ParseConfig(dbURL)
    if err != nil {
        return nil, fmt.Errorf("parse config: %w", err)
    }

    // Pool configuration
    config.MaxConns = 10
    config.MinConns = 2
    config.MaxConnLifetime = time.Hour
    config.MaxConnIdleTime = 30 * time.Minute
    config.HealthCheckPeriod = 1 * time.Minute

    pool, err := pgxpool.NewWithConfig(context.Background(), config)
    if err != nil {
        return nil, fmt.Errorf("create pool: %w", err)
    }

    // Test connection
    if err := pool.Ping(context.Background()); err != nil {
        return nil, fmt.Errorf("ping: %w", err)
    }

    log.Println("Database connection pool initialized")
    return pool, nil
}
```

**Pool Parameters**:

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| MaxConns | 10 | Limit concurrent DB connections |
| MinConns | 2 | Keep warm connections ready |
| MaxConnLifetime | 1 hour | Prevent stale connections |
| MaxConnIdleTime | 30 minutes | Close idle connections |
| HealthCheckPeriod | 1 minute | Detect dead connections |

**Connection String Format**:

```
postgresql://username:password@host:port/database?sslmode=disable
```

**Example**:

```
DATABASE_URL=postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable
```

### Connection Lifecycle

```
Application Start:
  1. Parse DATABASE_URL from environment
  2. Create pgxpool.Config with custom parameters
  3. Initialize connection pool
  4. Ping database to verify connectivity
  5. Pass pool to service layer

Request Handling:
  1. Service method receives context and pool
  2. Acquire connection from pool (automatic)
  3. Execute query
  4. Release connection back to pool (automatic, once the row has been scanned)

Application Shutdown:
  1. Close connection pool
  2. Wait for active connections to finish
  3. Release all resources
```

## Data Access Layer

### User Store

**File**: `store/user.go`

```go
type UserStore struct {
    db *pgxpool.Pool
}

func NewUserStore(db *pgxpool.Pool) *UserStore {
    return &UserStore{db: db}
}
```

### User Operations

#### Save User

```go
func (s *UserStore) Save(ctx context.Context, email, passwordHash string) (string, error) {
    var userID string
    query := `
        INSERT INTO users (email, password_hash)
        VALUES ($1, $2)
        RETURNING id
    `
    err := s.db.QueryRow(ctx, query, email, passwordHash).Scan(&userID)
    if err != nil {
        // Duplicate emails are detected by matching the error text
        if strings.Contains(err.Error(), "duplicate key") {
            return "", errors.New("email already exists")
        }
        return "", fmt.Errorf("insert user: %w", err)
    }
    return userID, nil
}
```

**Behavior**:

- Inserts new user with email and password hash
- Returns generated UUID
- Handles duplicate email error by matching the text `duplicate key`, which is fragile; checking for SQLSTATE `23505` on a `*pgconn.PgError` (via `errors.As`) would be more robust
- Uses parameterized query (SQL injection safe)

**Example**:

```go
userID, err := userStore.Save(ctx, "user@example.com", "$2a$10$...")
// userID = "550e8400-e29b-41d4-a716-446655440000"
```

#### Find User by Email

```go
func (s *UserStore) Find(ctx context.Context, email string) (*User, error) {
    var user User
    query := `
        SELECT id, email, password_hash, role, is_verified, created_at
        FROM users
        WHERE email = $1
    `
    err := s.db.QueryRow(ctx, query, email).Scan(
        &user.ID,
        &user.Email,
        &user.PasswordHash,
        &user.Role,
        &user.IsVerified,
        &user.CreatedAt,
    )
    if err != nil {
        if err == pgx.ErrNoRows {
            return nil, errors.New("user not found")
        }
        return nil, fmt.Errorf("query user: %w", err)
    }
    return &user, nil
}
```

**Behavior**:

- Queries user by email (uses index)
- Returns full user record
- Handles not found case (the `==` comparison misses wrapped errors; `errors.Is(err, pgx.ErrNoRows)` is the safer form)
- Uses parameterized query

**Example**:

```go
user, err := userStore.Find(ctx, "user@example.com")
// user.ID = "550e8400-e29b-41d4-a716-446655440000"
// user.Email = "user@example.com"
// user.PasswordHash = "$2a$10$..."
```

#### Find User by ID

```go
func (s *UserStore) FindByID(ctx context.Context, id string) (*User, error) {
    var user User
    query := `
        SELECT id, email, password_hash, role, is_verified, created_at
        FROM users
        WHERE id = $1
    `
    err := s.db.QueryRow(ctx, query, id).Scan(
        &user.ID,
        &user.Email,
        &user.PasswordHash,
        &user.Role,
        &user.IsVerified,
        &user.CreatedAt,
    )
    if err != nil {
        if err == pgx.ErrNoRows {
            return nil, errors.New("user not found")
        }
        return nil, fmt.Errorf("query user: %w", err)
    }
    return &user, nil
}
```

**Behavior**: Same as `Find`, but queries by the UUID primary key

### User Model

```go
type User struct {
    ID           string
    Email        string
    PasswordHash string
    Role         string
    IsVerified   bool
    CreatedAt    time.Time
}
```

**No ORM**: Plain structs, manual scanning

## Database Migrations

### Migration Files

**Directory**: `db/migrations/`

**Naming Convention**: `{number}_{description}.{up|down}.sql`

**Example Structure**:

```
db/migrations/
├── 001_create_users_table.up.sql
├── 001_create_users_table.down.sql
├── 002_add_user_roles.up.sql
├── 002_add_user_roles.down.sql
├── 003_add_email_verification.up.sql
└── 003_add_email_verification.down.sql
```

### Migration 001: Create Users Table

**Up Migration** (`001_create_users_table.up.sql`):

```sql
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    role VARCHAR(50) DEFAULT 'user',
    is_verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_users_email ON users(email);
```

**Down Migration** (`001_create_users_table.down.sql`):

```sql
DROP INDEX IF EXISTS idx_users_email;
DROP TABLE IF EXISTS users;
```

### Migration Execution

**No Automated Tool**: Migrations must be run manually

**Manual Execution**:

```bash
# Apply migration (the URL is quoted so special characters survive the shell)
psql "$DATABASE_URL" -f db/migrations/001_create_users_table.up.sql

# Rollback migration
psql "$DATABASE_URL" -f db/migrations/001_create_users_table.down.sql
```
**Recommended Tools** (not integrated):

- `golang-migrate/migrate`
- `pressly/goose`
- `rubenv/sql-migrate`

### Migration Tracking

**No Tracking Table**: No record of applied migrations

**Risks**:

- No way to know which migrations have been applied
- Manual tracking required
- Risk of applying migrations out of order
- Risk of applying same migration twice

**Recommendation**: Integrate migration tool with tracking table

## Caching Strategy

### Current Implementation

**No Caching**: All data fetched from providers on every request

**Impact**:

- High latency (200-500ms per search)
- Provider API rate limits
- Unnecessary API quota consumption
- No offline capability

### Planned Caching (Redis)

**Not Implemented**: Redis integration planned but not built

**Proposed Cache Keys**:

| Key Pattern | TTL | Purpose |
|-------------|-----|---------|
| `track:{platform}:{id}` | 1 hour | Track metadata |
| `album:{platform}:{id}` | 1 hour | Album metadata |
| `artist:{platform}:{id}` | 1 hour | Artist metadata |
| `playlist:{platform}:{id}` | 5 minutes | Playlist metadata (changes frequently) |
| `stream:{platform}:{id}` | 1 hour | Stream URLs (expire after 1-6 hours) |
| `search:{query}:{platform}` | 5 minutes | Search results |
| `lyrics:{artist}:{title}` | 24 hours | Lyrics (rarely change) |
| `play:{user_id}:{track_id}` | 30 seconds | Play deduplication |
| `status:{platform}` | 5 minutes | Provider health status |

**Proposed Cache Invalidation**:

- TTL-based expiration (no manual invalidation)
- No cache warming (lazy loading)
- No cache preloading

**Proposed Redis Configuration**:

```go
redisClient := redis.NewClient(&redis.Options{
    // Addr expects "host:port"; if REDIS_URL holds a full redis:// URL,
    // redis.ParseURL would be needed instead.
    Addr:         os.Getenv("REDIS_URL"),
    Password:     os.Getenv("REDIS_PASSWORD"),
    DB:           0,
    MaxRetries:   3,
    PoolSize:     10,
    MinIdleConns: 2,
})
```

### Cache-Aside Pattern (Proposed)

```go
func (s *server) GetTrack(ctx context.Context, req *pb.GetRequest) (*pb.Track, error) {
    // Try cache first. req.Id is assumed to be the namespaced "{platform}:{id}"
    // form, so the key matches the track:{platform}:{id} pattern.
    cacheKey := fmt.Sprintf("track:%s", req.Id)
    if cached, err := s.redis.Get(ctx, cacheKey).Result(); err == nil {
        var track pb.Track
        // An undecodable entry is treated as a miss rather than returned half-filled
        if err := json.Unmarshal([]byte(cached), &track); err == nil {
            return &track, nil
        }
    }

    // Cache miss (or Redis error), fetch from provider
    platform, nativeID := parseNamespacedID(req.Id)
    provider := s.getProvider(platform)
    track, err := provider.GetTrack(ctx, nativeID)
    if err != nil {
        return nil, err
    }

    // Store in cache; a failed write is not fatal for the request
    if trackJSON, err := json.Marshal(track); err == nil {
        s.redis.Set(ctx, cacheKey, trackJSON, 1*time.Hour)
    }
    return track, nil
}
```

## Data Persistence Patterns

### No Metadata Persistence

**Current**: All metadata is ephemeral (fetched from providers, not stored)

**Implications**:

- No historical data
- No offline access
- No analytics on metadata changes
- No data ownership

**Alternative Approach** (not implemented):

- Store all fetched metadata in PostgreSQL
- Update on cache miss
- Enable historical queries
- Reduce provider API dependency

### No User Data Persistence

**Current**: Only authentication data is stored

**Missing User Data**:

- Listening history
- Favorite tracks/albums/artists
- Created playlists
- Search history
- Playback state (current track, position)
- User preferences

**Implications**:

- No personalization
- No recommendations based on history
- No cross-device sync
- No user analytics

## Transaction Handling

### No Transactions

**Current**: All database operations are single-statement

**Example** (no transaction):

```go
func (s *UserStore) Save(ctx context.Context, email, passwordHash string) (string, error) {
    var userID string
    err := s.db.QueryRow(ctx,
        "INSERT INTO users (email, password_hash) VALUES ($1, $2) RETURNING id",
        email, passwordHash,
    ).Scan(&userID)
    return userID, err
}
```

**No Multi-Statement Operations**: With a single table, no operation spans more than one statement, so explicit transactions are not needed

**Future Considerations**: If schema expands (user profiles, playlists, etc.), transactions will be needed

**Transaction Example** (not used):

```go
func (s *UserStore) SaveWithProfile(ctx context.Context, email, passwordHash, name string) error {
    tx, err := s.db.Begin(ctx)
    if err != nil {
        return err
    }
    // Rolls back on any early return; after a successful Commit this is a no-op
    defer tx.Rollback(ctx)

    var userID string
    err = tx.QueryRow(ctx,
        "INSERT INTO users (email, password_hash) VALUES ($1, $2) RETURNING id",
        email, passwordHash,
    ).Scan(&userID)
    if err != nil {
        return err
    }

    _, err = tx.Exec(ctx,
        "INSERT INTO profiles (user_id, name) VALUES ($1, $2)",
        userID, name,
    )
    if err != nil {
        return err
    }

    return tx.Commit(ctx)
}
```

## Query Performance

### Index Usage

**Indexed Queries**:

```sql
-- Can use an index on email (idx_users_email or the UNIQUE constraint's index)
SELECT * FROM users WHERE email = 'user@example.com';

-- Can use the primary key index (automatic)
SELECT * FROM users WHERE id = '550e8400-e29b-41d4-a716-446655440000';
```

**No Forced Full Table Scans**: Every query filters on an indexed column, although the planner may still choose a sequential scan while the table is very small

### Query Patterns

**Point Lookups Only**: No range queries, no aggregations, no joins

**Example Queries**:

```sql
-- Login (index scan on email)
SELECT id, email, password_hash, role, is_verified, created_at
FROM users
WHERE email = $1;

-- Token refresh (index scan on id)
SELECT id, email, role
FROM users
WHERE id = $1;

-- Registration (insert with RETURNING)
INSERT INTO users (email, password_hash)
VALUES ($1, $2)
RETURNING id;
```

**No Complex Queries**: Only inserts and point reads; there are no update or delete statements

## Data Consistency

### Email Uniqueness

**Constraint**: `UNIQUE` constraint on `email` column

**Enforcement**: Database-level (PostgreSQL)

**Race Condition Handling**:

```go
err := s.db.QueryRow(ctx, query, email, passwordHash).Scan(&userID)
if err != nil {
    if strings.Contains(err.Error(), "duplicate key") {
        return "", errors.New("email already exists")
    }
    return "", fmt.Errorf("insert user: %w", err)
}
```

**Concurrent Registration**: Database prevents duplicate emails even with concurrent requests

### UUID Generation

**Method**: PostgreSQL `gen_random_uuid()` function

**Collision Probability**: Negligible (UUID v4 has 122 random bits)

**No Application-Level ID Generation**: Database handles ID creation

## Backup and Recovery

### No Automated Backups

**Current**: No backup strategy implemented

**Risks**:

- Data loss on database failure
- No point-in-time recovery
- No disaster recovery plan

**Recommendations**:

- Enable PostgreSQL continuous archiving (WAL archiving)
- Schedule daily full backups
- Test restore procedures
- Store backups off-site (S3, etc.)

### Manual Backup

**pg_dump**:

```bash
pg_dump "$DATABASE_URL" > backup.sql
```

**Restore**:

```bash
psql "$DATABASE_URL" < backup.sql
```

## Data Security

### Password Storage

**Hashing Algorithm**: bcrypt

**Cost Factor**: 10 (2^10 = 1024 iterations; the same value as `bcrypt.DefaultCost`)

**Implementation**:

```go
func hashPassword(password string) (string, error) {
    bytes, err := bcrypt.GenerateFromPassword([]byte(password), 10)
    return string(bytes), err
}

func checkPasswordHash(password, hash string) bool {
    err := bcrypt.CompareHashAndPassword([]byte(hash), []byte(password))
    return err == nil
}
```

**Security Properties**:

- Salted (bcrypt includes random salt)
- Slow (cost factor 10 = ~100ms per hash, depending on hardware)
- Resistant to rainbow tables
- Resistant to offline brute force; online guessing still needs rate limiting, which is not implemented

### SQL Injection Prevention

**Parameterized Queries**: All queries use `$1`, `$2` placeholders

**Safe Example**:

```go
// Safe: parameterized query, email is sent as a bound value
err := s.db.QueryRow(ctx,
    "SELECT id, email FROM users WHERE email = $1",
    email,
).Scan(&user.ID, &user.Email)
```

**Unsafe Example** (not used):

```go
// Unsafe: string concatenation (NOT USED IN CODEBASE)
query := fmt.Sprintf("SELECT id, email FROM users WHERE email = '%s'", email)
err := s.db.QueryRow(ctx, query).Scan(&user.ID, &user.Email)
```

**All Queries Are Safe**: No string concatenation in SQL queries

### Connection Security

**SSL Mode**: Configurable via connection string

**Example** (SSL disabled):

```
DATABASE_URL=postgresql://user:pass@localhost:5432/db?sslmode=disable
```

**Example** (SSL required):

```
DATABASE_URL=postgresql://user:pass@localhost:5432/db?sslmode=require
```

**Production Recommendation**: Use `sslmode=require` or, preferably, `sslmode=verify-full`; `require` encrypts the connection but does not verify the server certificate

## Database Monitoring

### No Monitoring

**Current**: No database monitoring implemented

**Missing Metrics**:

- Connection pool utilization
- Query latency
- Slow query log
- Deadlock detection
- Table bloat
- Index usage statistics

**Recommendations**:

- Enable PostgreSQL `pg_stat_statements` extension
- Monitor connection pool metrics (pgxpool provides stats)
- Set up alerts for connection pool exhaustion
- Log slow queries (> 1 second)

### Connection Pool Stats (Available but Not Used)

```go
stats := pool.Stat()
log.Printf("Total connections: %d", stats.TotalConns())
log.Printf("Idle connections: %d", stats.IdleConns())
log.Printf("Acquired connections: %d", stats.AcquiredConns())
log.Printf("Max connections: %d", stats.MaxConns())
```

**Not Implemented**: Stats are available but not logged or exposed

## Data Retention

### No Retention Policy

**Current**: Data is never deleted

**User Data**:

- Users are never deleted (no account deletion endpoint)
- No GDPR compliance (no data export, no right to be forgotten)

**Recommendations**:

- Implement account deletion endpoint
- Add soft delete (deleted_at timestamp)
- Implement data export (GDPR compliance)
- Add retention policy for inactive accounts

## Scalability Considerations

### Vertical Scaling

**Current Limits**:

- Connection pool: 10 max connections
- Single PostgreSQL instance
- No read replicas

**Scaling Up**:

- Increase connection pool size
- Increase PostgreSQL resources (CPU, RAM)
- Tune PostgreSQL configuration (shared_buffers, work_mem)

### Horizontal Scaling

**Not Supported**: Single database instance

**Challenges**:

- No sharding strategy
- No read/write splitting
- No multi-region support

**Future Considerations**:

- Add read replicas for search queries
- Shard by user ID for user data
- Use connection pooler (PgBouncer) for connection management

## Data Model Limitations

### Single Table Schema

**Pros**:

- Simple to understand
- No joins required
- Fast queries (index lookups only)

**Cons**:

- No relational data (playlists, favorites, etc.)
- No metadata persistence
- No user activity tracking
- Limited functionality

### No Audit Trail

**Missing**:

- No login history
- No password change history
- No account modification log
- No admin action log

**Implications**:

- No security forensics
- No compliance audit trail
- No user activity analytics

### No Soft Deletes

**Hard Delete Only**: The schema has no soft-delete column, so any delete functionality added later would remove records permanently

**Recommendation**: Add `deleted_at` timestamp for soft deletes

```sql
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;
CREATE INDEX idx_users_deleted_at ON users(deleted_at);

-- Query active users
SELECT * FROM users WHERE deleted_at IS NULL;
```

## Testing Strategy

### No Database Tests

**Current**: No unit tests for database operations

**Missing Tests**:

- User creation with duplicate email
- User lookup by email
- User lookup by ID
- Connection pool exhaustion
- Database connection failure
- Transaction rollback (if added)

**Recommendation**: Add integration tests with test database

**Example Test** (not implemented):

```go
func TestUserStore_Save_DuplicateEmail(t *testing.T) {
    db := setupTestDB(t)
    defer db.Close()

    store := NewUserStore(db)

    // First save should succeed
    _, err := store.Save(context.Background(), "test@example.com", "hash1")
    if err != nil {
        t.Fatalf("first save failed: %v", err)
    }

    // Second save with same email should fail
    _, err = store.Save(context.Background(), "test@example.com", "hash2")
    if err == nil {
        t.Fatal("expected duplicate email error")
    }
}
```

## Environment Configuration

### Database URL

**Environment Variable**: `DATABASE_URL`

**Format**: PostgreSQL connection string

**Example**:

```
DATABASE_URL=postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable
```

**Components**:

- Protocol: `postgresql://`
- Username: `bedrock`
- Password: `bedrock`
- Host: `localhost`
- Port: `5432`
- Database: `bedrock`
- SSL Mode: `sslmode=disable`

**No Validation**: Application crashes if DATABASE_URL is invalid
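The components listed above can be checked before the pool is created, so that a malformed value yields a specific message. This is a minimal sketch using only the standard library; `ValidateDatabaseURL` is a hypothetical helper, not a function in the codebase. It is deliberately stricter than PostgreSQL itself, which also accepts URLs without a host or database name, and it assumes the URL form documented above.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validSSLModes lists the sslmode values PostgreSQL clients accept.
var validSSLModes = map[string]bool{
	"disable": true, "allow": true, "prefer": true,
	"require": true, "verify-ca": true, "verify-full": true,
}

// ValidateDatabaseURL checks that raw is a PostgreSQL URL with a host,
// a database name, and (if present) a known sslmode. Error messages never
// include the raw URL, because it contains the password.
func ValidateDatabaseURL(raw string) error {
	if raw == "" {
		return errors.New("DATABASE_URL not set")
	}
	u, err := url.Parse(raw)
	if err != nil {
		// The underlying error embeds the URL, so it is not wrapped here.
		return errors.New("DATABASE_URL is not a valid URL")
	}
	if u.Scheme != "postgresql" && u.Scheme != "postgres" {
		return fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	if u.Hostname() == "" {
		return errors.New("missing host")
	}
	if len(u.Path) <= 1 { // Path is "/bedrock" for database "bedrock"
		return errors.New("missing database name")
	}
	if mode := u.Query().Get("sslmode"); mode != "" && !validSSLModes[mode] {
		return fmt.Errorf("unknown sslmode %q", mode)
	}
	return nil
}
```

Such a check would run at the top of `initDB`, before `pgxpool.ParseConfig`, which remains the authoritative parser.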
**Recommendation**: Validate connection string format on startup

## Docker Deployment

### Docker Compose PostgreSQL

**File**: `docker-compose.yml`

```yaml
version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: bedrock
      POSTGRES_PASSWORD: bedrock
      POSTGRES_DB: bedrock
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U bedrock"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  postgres_data:
```

**Features**:

- PostgreSQL 15 Alpine (minimal image)
- Named volume for data persistence
- Health check for container orchestration
- Exposed port for local development

**Missing**:

- No initialization scripts (migrations must be run manually)
- No backup configuration
- No replication
- No connection pooler (PgBouncer)

### Database Initialization

**Manual Process**:

```bash
# Start PostgreSQL
docker-compose up -d postgres

# Wait for PostgreSQL to be ready
docker-compose exec postgres pg_isready -U bedrock

# Run migrations (piped from the host, because the compose file above
# does not mount db/migrations into the container)
docker-compose exec -T postgres psql -U bedrock -d bedrock < db/migrations/001_create_users_table.up.sql
```

**No Automated Initialization**: Migrations must be run manually after container start

**Recommendation**: Add init script to docker-compose

```yaml
postgres:
  image: postgres:15-alpine
  volumes:
    - postgres_data:/var/lib/postgresql/data
    - ./db/migrations:/docker-entrypoint-initdb.d
```

**Caveats**: The image runs every `*.sql` file in `/docker-entrypoint-initdb.d` in sorted name order, and only when the data directory is empty on first start. Mounting `db/migrations` as-is would therefore also execute the `.down.sql` files, so the mounted directory should contain only the up migrations.

## Data Layer Summary

### Strengths

- Simple, focused schema (users only)
- Proper indexing (email lookup is fast)
- Connection pooling (pgx/v5)
- Parameterized queries (SQL injection safe)
- bcrypt password hashing (secure)

### Weaknesses

- No metadata persistence (all data is ephemeral)
- No caching (high latency, provider API dependency)
- No migration tool (manual SQL execution)
- No monitoring (connection pool, query performance)
- No backup strategy (data loss risk)
- No audit trail (security, compliance)
- Minimal schema (no user data beyond auth)

### Recommendations for Metadata Aggregator

**Adopt**:

- pgx/v5 driver (excellent performance, native PostgreSQL features)
- Connection pooling configuration (sensible defaults)
- Parameterized queries (security best practice)

**Avoid**:

- Manual migrations (use golang-migrate or goose)
- No caching (implement Redis for metadata)
- Minimal schema (metadata aggregator needs rich schema)

**Enhance**:

- Add metadata tables (tracks, albums, artists, labels, etc.)
- Add user data tables (favorites, playlists, history)
- Add caching layer (Redis for hot data)
- Add migration tool (automated schema management)
- Add monitoring (connection pool, query latency)
- Add backup strategy (automated backups, point-in-time recovery)