feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
Alexander
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
@@ -0,0 +1,57 @@
# Bedrock-API
## Overview
Multi-source music streaming aggregator written in Go. Provides a unified gRPC API across multiple streaming platforms, with cross-platform track bridging.
## Key Features
- **API**: gRPC + HTTP streaming proxy
- **Performance**: High-performance Go implementation
- **Bridging**: Resolves non-streamable tracks to playable alternatives
- **Auth**: JWT with PostgreSQL backend
- **License**: MIT
## Source
| Resource | URL |
|----------|-----|
| **Repository** | https://github.com/feralbureau/bedrock-api |
## Supported Providers
| Provider | Metadata | Search | Streaming | Playlist | Bridge |
|----------|----------|--------|-----------|----------|--------|
| Spotify | Yes | Yes | Bridged | Yes | SoundCloud |
| SoundCloud | Yes | Yes | Yes | Yes | - |
| Deezer | Yes | Yes | Bridged | Yes | SoundCloud |
| YouTube Music | Yes | Yes | Limited | Yes | SoundCloud |
| Yandex | Partial | Partial | - | - | - |
| VK | Partial | Partial | - | - | - |
## Architecture
- **Unified gRPC/Protobuf models** for all music entities
- **Cross-platform bridging** - resolves non-streamable tracks
- **Parallel provider searches** with Go concurrency
- **HTTP streaming proxy** with range request support
- **Lyrics integration** (LrcLib, Genius in progress)
## Self-Hosting
```bash
git clone https://github.com/feralbureau/bedrock-api.git
cd bedrock-api
# Configure providers and database
cp config.example.yaml config.yaml
# Run
go run .
```
## Notes
- Best for streaming aggregation use cases
- gRPC for high performance
- Automatic track resolution across platforms
@@ -0,0 +1,978 @@
# Bedrock-API Data Layer
## Database Technology
**RDBMS**: PostgreSQL 15
**Driver**: `github.com/jackc/pgx/v5` (native PostgreSQL driver)
**Connection Pooling**: `pgxpool` (pgx connection pool)
**Migration Tool**: None (manual SQL execution)
## Database Schema
### Users Table
**File**: `db/migrations/001_create_users_table.up.sql`
```sql
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
role VARCHAR(50) DEFAULT 'user',
is_verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_users_email ON users(email);
```
**Columns**:
| Column | Type | Constraints | Purpose |
|--------|------|-------------|---------|
| id | UUID | PRIMARY KEY, DEFAULT gen_random_uuid() | Unique user identifier |
| email | VARCHAR(255) | UNIQUE, NOT NULL | User email (login identifier) |
| password_hash | VARCHAR(255) | NOT NULL | bcrypt hashed password |
| role | VARCHAR(50) | DEFAULT 'user' | User role (user/admin) |
| is_verified | BOOLEAN | DEFAULT false | Email verification status |
| created_at | TIMESTAMP | DEFAULT CURRENT_TIMESTAMP | Account creation timestamp |
**Indexes**:
- Primary key index on `id` (automatic)
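- B-tree index `idx_users_email` on `email` (for login lookups); redundant in practice, because the `UNIQUE` constraint on `email` already creates a unique B-tree index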
**No Foreign Keys**: Single table schema, no relationships
### Schema Limitations
**Missing Tables**:
- No metadata cache (tracks, albums, artists, playlists)
- No user listening history
- No user playlists
- No user favorites/likes
- No play counts
- No search history
- No provider credentials (Spotify tokens, etc.)
**Minimal User Data**:
- No user profile (name, avatar, bio)
- No user preferences (language, region)
- No user settings (privacy, notifications)
- No user sessions (active logins)
## Connection Management
### Connection Pool Configuration
**File**: `bedrock_server/main.go`
```go
func initDB() (*pgxpool.Pool, error) {
dbURL := os.Getenv("DATABASE_URL")
if dbURL == "" {
return nil, errors.New("DATABASE_URL not set")
}
config, err := pgxpool.ParseConfig(dbURL)
if err != nil {
return nil, fmt.Errorf("parse config: %w", err)
}
// Pool configuration
config.MaxConns = 10
config.MinConns = 2
config.MaxConnLifetime = time.Hour
config.MaxConnIdleTime = 30 * time.Minute
config.HealthCheckPeriod = 1 * time.Minute
pool, err := pgxpool.NewWithConfig(context.Background(), config)
if err != nil {
return nil, fmt.Errorf("create pool: %w", err)
}
// Test connection
if err := pool.Ping(context.Background()); err != nil {
return nil, fmt.Errorf("ping: %w", err)
}
log.Println("Database connection pool initialized")
return pool, nil
}
```
**Pool Parameters**:
| Parameter | Value | Rationale |
|-----------|-------|-----------|
| MaxConns | 10 | Limit concurrent DB connections |
| MinConns | 2 | Keep warm connections ready |
| MaxConnLifetime | 1 hour | Prevent stale connections |
| MaxConnIdleTime | 30 minutes | Close idle connections |
| HealthCheckPeriod | 1 minute | Detect dead connections |
**Connection String Format**:
```
postgresql://username:password@host:port/database?sslmode=disable
```
**Example**:
```
DATABASE_URL=postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable
```
### Connection Lifecycle
```
Application Start:
1. Parse DATABASE_URL from environment
2. Create pgxpool.Config with custom parameters
3. Initialize connection pool
4. Ping database to verify connectivity
5. Pass pool to service layer
Request Handling:
1. Service method receives context and pool
2. Acquire connection from pool (automatic)
3. Execute query
4. Release connection back to pool (automatic: QueryRow returns it when Scan completes)
Application Shutdown:
1. Call pool.Close() (rejects new acquires)
2. Close blocks until acquired connections are returned to the pool
3. All connections are closed and resources released
```
## Data Access Layer
### User Store
**File**: `store/user.go`
```go
type UserStore struct {
db *pgxpool.Pool
}
func NewUserStore(db *pgxpool.Pool) *UserStore {
return &UserStore{db: db}
}
```
### User Operations
#### Save User
```go
func (s *UserStore) Save(ctx context.Context, email, passwordHash string) (string, error) {
var userID string
query := `
INSERT INTO users (email, password_hash)
VALUES ($1, $2)
RETURNING id
`
err := s.db.QueryRow(ctx, query, email, passwordHash).Scan(&userID)
if err != nil {
if strings.Contains(err.Error(), "duplicate key") {
return "", errors.New("email already exists")
}
return "", fmt.Errorf("insert user: %w", err)
}
return userID, nil
}
```
**Behavior**:
- Inserts new user with email and password hash
- Returns generated UUID
- Handles duplicate email error
- Uses parameterized query (SQL injection safe)
**Example**:
```go
userID, err := userStore.Save(ctx, "user@example.com", "$2a$10$...")
// userID = "550e8400-e29b-41d4-a716-446655440000"
```
#### Find User by Email
```go
func (s *UserStore) Find(ctx context.Context, email string) (*User, error) {
var user User
query := `
SELECT id, email, password_hash, role, is_verified, created_at
FROM users
WHERE email = $1
`
err := s.db.QueryRow(ctx, query, email).Scan(
&user.ID,
&user.Email,
&user.PasswordHash,
&user.Role,
&user.IsVerified,
&user.CreatedAt,
)
if err != nil {
if err == pgx.ErrNoRows {
return nil, errors.New("user not found")
}
return nil, fmt.Errorf("query user: %w", err)
}
return &user, nil
}
```
**Behavior**:
- Queries user by email (uses index)
- Returns full user record
- Handles not found case
- Uses parameterized query
**Example**:
```go
user, err := userStore.Find(ctx, "user@example.com")
// user.ID = "550e8400-e29b-41d4-a716-446655440000"
// user.Email = "user@example.com"
// user.PasswordHash = "$2a$10$..."
```
#### Find User by ID
```go
func (s *UserStore) FindByID(ctx context.Context, id string) (*User, error) {
var user User
query := `
SELECT id, email, password_hash, role, is_verified, created_at
FROM users
WHERE id = $1
`
err := s.db.QueryRow(ctx, query, id).Scan(
&user.ID,
&user.Email,
&user.PasswordHash,
&user.Role,
&user.IsVerified,
&user.CreatedAt,
)
if err != nil {
if err == pgx.ErrNoRows {
return nil, errors.New("user not found")
}
return nil, fmt.Errorf("query user: %w", err)
}
return &user, nil
}
```
**Behavior**: Similar to Find, but queries by UUID primary key
### User Model
```go
type User struct {
ID string
Email string
PasswordHash string
Role string
IsVerified bool
CreatedAt time.Time
}
```
**No ORM**: Plain structs, manual scanning
## Database Migrations
### Migration Files
**Directory**: `db/migrations/`
**Naming Convention**: `{number}_{description}.{up|down}.sql`
**Example Structure**:
```
db/migrations/
├── 001_create_users_table.up.sql
├── 001_create_users_table.down.sql
├── 002_add_user_roles.up.sql
├── 002_add_user_roles.down.sql
├── 003_add_email_verification.up.sql
└── 003_add_email_verification.down.sql
```
### Migration 001: Create Users Table
**Up Migration** (`001_create_users_table.up.sql`):
```sql
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
role VARCHAR(50) DEFAULT 'user',
is_verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_users_email ON users(email);
```
**Down Migration** (`001_create_users_table.down.sql`):
```sql
DROP INDEX IF EXISTS idx_users_email;
DROP TABLE IF EXISTS users;
```
### Migration Execution
**No Automated Tool**: Migrations must be run manually
**Manual Execution**:
```bash
# Apply migration
psql $DATABASE_URL -f db/migrations/001_create_users_table.up.sql
# Rollback migration
psql $DATABASE_URL -f db/migrations/001_create_users_table.down.sql
```
**Recommended Tools** (not integrated):
- `golang-migrate/migrate`
- `pressly/goose`
- `rubenv/sql-migrate`
### Migration Tracking
**No Tracking Table**: No record of applied migrations
**Risks**:
- No way to know which migrations have been applied
- Manual tracking required
- Risk of applying migrations out of order
- Risk of applying same migration twice
**Recommendation**: Integrate migration tool with tracking table
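The core of what a tracking table buys can be sketched in a few lines: given the files in `db/migrations/` and the set of versions already applied, work out which `up` scripts are still pending and in what order. This is a minimal illustration under assumptions, not how `golang-migrate` or `goose` work internally; `pendingMigrations` and the `applied` map (standing in for rows of a tracking table) are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// pendingMigrations returns the *.up.sql files whose numeric prefix is not in
// applied, sorted by version. applied stands in for the rows of a tracking
// table (hypothetical; the project has none).
func pendingMigrations(files []string, applied map[int]bool) ([]string, error) {
	type migration struct {
		version int
		name    string
	}
	var pending []migration
	for _, name := range files {
		if !strings.HasSuffix(name, ".up.sql") {
			continue // skip down migrations and unrelated files
		}
		prefix, _, ok := strings.Cut(name, "_")
		if !ok {
			return nil, fmt.Errorf("migration %q: missing version prefix", name)
		}
		version, err := strconv.Atoi(prefix)
		if err != nil {
			return nil, fmt.Errorf("migration %q: %w", name, err)
		}
		if !applied[version] {
			pending = append(pending, migration{version, name})
		}
	}
	// Apply strictly in version order, regardless of directory listing order.
	sort.Slice(pending, func(i, j int) bool { return pending[i].version < pending[j].version })
	names := make([]string, len(pending))
	for i, m := range pending {
		names[i] = m.name
	}
	return names, nil
}

func main() {
	files := []string{
		"002_add_user_roles.up.sql",
		"001_create_users_table.down.sql",
		"001_create_users_table.up.sql",
		"003_add_email_verification.up.sql",
	}
	pending, err := pendingMigrations(files, map[int]bool{1: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(pending)
}
```

With version 1 recorded as applied, only migrations 002 and 003 remain, which addresses both the "applied twice" and "out of order" risks listed above.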
## Caching Strategy
### Current Implementation
**No Caching**: All data fetched from providers on every request
**Impact**:
- High latency (200-500ms per search)
- Provider API rate limits
- Unnecessary API quota consumption
- No offline capability
### Planned Caching (Redis)
**Not Implemented**: Redis integration planned but not built
**Proposed Cache Keys**:
| Key Pattern | TTL | Purpose |
|-------------|-----|---------|
| `track:{platform}:{id}` | 1 hour | Track metadata |
| `album:{platform}:{id}` | 1 hour | Album metadata |
| `artist:{platform}:{id}` | 1 hour | Artist metadata |
| `playlist:{platform}:{id}` | 5 minutes | Playlist metadata (changes frequently) |
| `stream:{platform}:{id}` | 1 hour | Stream URLs (expire after 1-6 hours) |
| `search:{query}:{platform}` | 5 minutes | Search results |
| `lyrics:{artist}:{title}` | 24 hours | Lyrics (rarely change) |
| `play:{user_id}:{track_id}` | 30 seconds | Play deduplication |
| `status:{platform}` | 5 minutes | Provider health status |
**Proposed Cache Invalidation**:
- TTL-based expiration (no manual invalidation)
- No cache warming (lazy loading)
- No cache preloading
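The entity rows of the key table above (`{kind}:{platform}:{id}`) could be kept in one place so that key format and TTL cannot drift apart. The sketch below is an assumption about how that might look; `ttlByKind` and `entityKey` are hypothetical names, and only the five entity-style keys are covered.

```go
package main

import (
	"fmt"
	"time"
)

// ttlByKind mirrors the proposed TTLs for keys of the form {kind}:{platform}:{id}.
var ttlByKind = map[string]time.Duration{
	"track":    time.Hour,
	"album":    time.Hour,
	"artist":   time.Hour,
	"playlist": 5 * time.Minute,
	"stream":   time.Hour,
}

// entityKey builds the cache key for an entity and returns the TTL for its kind.
func entityKey(kind, platform, id string) (string, time.Duration, error) {
	ttl, ok := ttlByKind[kind]
	if !ok {
		return "", 0, fmt.Errorf("unknown cache kind %q", kind)
	}
	return fmt.Sprintf("%s:%s:%s", kind, platform, id), ttl, nil
}

func main() {
	key, ttl, err := entityKey("playlist", "spotify", "37i9dQZF1DX")
	if err != nil {
		panic(err)
	}
	fmt.Println(key, ttl)
}
```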
**Proposed Redis Configuration**:
```go
redisClient := redis.NewClient(&redis.Options{
Addr: os.Getenv("REDIS_URL"), // must be host:port; a redis:// URL needs redis.ParseURL instead
Password: os.Getenv("REDIS_PASSWORD"),
DB: 0,
MaxRetries: 3,
PoolSize: 10,
MinIdleConns: 2,
})
```
### Cache-Aside Pattern (Proposed)
```go
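func (s *server) GetTrack(ctx context.Context, req *pb.GetRequest) (*pb.Track, error) {
	// Try cache first; req.Id is the namespaced track ID
	cacheKey := fmt.Sprintf("track:%s", req.Id)
	if cached, err := s.redis.Get(ctx, cacheKey).Result(); err == nil {
		var track pb.Track
		// A corrupt entry is treated as a miss instead of being returned half-decoded
		if err := json.Unmarshal([]byte(cached), &track); err == nil {
			return &track, nil
		}
	}
	// Cache miss (or cache error), fetch from provider
	platform, nativeID := parseNamespacedID(req.Id)
	provider := s.getProvider(platform)
	track, err := provider.GetTrack(ctx, nativeID)
	if err != nil {
		return nil, err
	}
	// Store in cache; a failed write is logged but does not fail the request
	if trackJSON, err := json.Marshal(track); err == nil {
		if err := s.redis.Set(ctx, cacheKey, trackJSON, 1*time.Hour).Err(); err != nil {
			log.Printf("cache set %s: %v", cacheKey, err)
		}
	}
	return track, nil
}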
```
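The cache-aside flow can be tried without Redis. The following is a rough, self-contained sketch that swaps Redis for an in-memory map with per-entry expiry; `ttlCache` and `getOrLoad` are hypothetical names, and the `load` callback stands in for the provider call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	value     string
	expiresAt time.Time
}

// ttlCache is an in-memory stand-in for Redis (illustration only).
type ttlCache struct {
	mu      sync.Mutex
	entries map[string]entry
}

func newTTLCache() *ttlCache {
	return &ttlCache{entries: make(map[string]entry)}
}

// getOrLoad returns the cached value for key, or calls load on a miss or an
// expired entry and stores the result for ttl. Load errors are not cached.
func (c *ttlCache) getOrLoad(key string, ttl time.Duration, load func() (string, error)) (string, error) {
	now := time.Now()
	c.mu.Lock()
	e, ok := c.entries[key]
	c.mu.Unlock()
	if ok && now.Before(e.expiresAt) {
		return e.value, nil // cache hit
	}
	value, err := load() // cache miss: go to the source
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.entries[key] = entry{value: value, expiresAt: time.Now().Add(ttl)}
	c.mu.Unlock()
	return value, nil
}

func main() {
	cache := newTTLCache()
	calls := 0
	load := func() (string, error) { calls++; return "track metadata", nil }
	for i := 0; i < 3; i++ {
		if _, err := cache.getOrLoad("track:spotify:abc", time.Hour, load); err != nil {
			panic(err)
		}
	}
	fmt.Println("provider calls:", calls)
}
```

The lock is deliberately released while `load` runs, so two concurrent misses for the same key may both hit the provider; deduplicating those calls would be a separate design decision.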
## Data Persistence Patterns
### No Metadata Persistence
**Current**: All metadata is ephemeral (fetched from providers, not stored)
**Implications**:
- No historical data
- No offline access
- No analytics on metadata changes
- No data ownership
**Alternative Approach** (not implemented):
- Store all fetched metadata in PostgreSQL
- Update on cache miss
- Enable historical queries
- Reduce provider API dependency
### No User Data Persistence
**Current**: Only authentication data is stored
**Missing User Data**:
- Listening history
- Favorite tracks/albums/artists
- Created playlists
- Search history
- Playback state (current track, position)
- User preferences
**Implications**:
- No personalization
- No recommendations based on history
- No cross-device sync
- No user analytics
## Transaction Handling
### No Transactions
**Current**: All database operations are single-statement
**Example** (no transaction):
```go
func (s *UserStore) Save(ctx context.Context, email, passwordHash string) (string, error) {
var userID string
err := s.db.QueryRow(ctx,
"INSERT INTO users (email, password_hash) VALUES ($1, $2) RETURNING id",
email, passwordHash,
).Scan(&userID)
return userID, err
}
```
**No Multi-Statement Operations**: No need for transactions with single table
**Future Considerations**: If schema expands (user profiles, playlists, etc.), transactions will be needed
**Transaction Example** (not used):
```go
func (s *UserStore) SaveWithProfile(ctx context.Context, email, passwordHash, name string) error {
tx, err := s.db.Begin(ctx)
if err != nil {
return err
}
defer tx.Rollback(ctx)
var userID string
err = tx.QueryRow(ctx,
"INSERT INTO users (email, password_hash) VALUES ($1, $2) RETURNING id",
email, passwordHash,
).Scan(&userID)
if err != nil {
return err
}
_, err = tx.Exec(ctx,
"INSERT INTO profiles (user_id, name) VALUES ($1, $2)",
userID, name,
)
if err != nil {
return err
}
return tx.Commit(ctx)
}
```
## Query Performance
### Index Usage
**Indexed Queries**:
```sql
-- Uses idx_users_email (B-tree index)
SELECT * FROM users WHERE email = 'user@example.com';
-- Uses primary key index (automatic)
SELECT * FROM users WHERE id = '550e8400-e29b-41d4-a716-446655440000';
```
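**No Full Table Scans Required**: Every query filters on an indexed column, so the planner can use an index scan (on a very small table it may still choose a sequential scan)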
### Query Patterns
**Point Lookups Only**: No range queries, no aggregations, no joins
**Example Queries**:
```sql
-- Login (index scan on email)
SELECT id, email, password_hash, role, is_verified, created_at
FROM users
WHERE email = $1;
-- Token refresh (index scan on id)
SELECT id, email, role
FROM users
WHERE id = $1;
-- Registration (insert with RETURNING)
INSERT INTO users (email, password_hash)
VALUES ($1, $2)
RETURNING id;
```
**No Complex Queries**: Simple CRUD operations only
## Data Consistency
### Email Uniqueness
**Constraint**: `UNIQUE` constraint on `email` column
**Enforcement**: Database-level (PostgreSQL)
**Race Condition Handling**:
```go
err := s.db.QueryRow(ctx, query, email, passwordHash).Scan(&userID)
if err != nil {
if strings.Contains(err.Error(), "duplicate key") {
return "", errors.New("email already exists")
}
return "", fmt.Errorf("insert user: %w", err)
}
```
**Concurrent Registration**: Database prevents duplicate emails even with concurrent requests
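Matching on the text `"duplicate key"` depends on the server's message wording. A sturdier variant is to check the SQLSTATE code, where `23505` is PostgreSQL's `unique_violation`. The sketch below uses only the standard library and assumes the driver error exposes a `SQLState() string` method (as pgx's `*pgconn.PgError` is understood to do); `classifyInsertError`, `ErrEmailExists`, and `fakePgError` are hypothetical names for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEmailExists is a sentinel that callers can test with errors.Is.
var ErrEmailExists = errors.New("email already exists")

// uniqueViolation is the SQLSTATE PostgreSQL reports for a violated UNIQUE constraint.
const uniqueViolation = "23505"

// sqlStater is satisfied by driver errors that expose a SQLSTATE code
// (assumed here to include pgx's *pgconn.PgError).
type sqlStater interface {
	SQLState() string
}

// classifyInsertError maps a unique violation to ErrEmailExists and wraps everything else.
func classifyInsertError(err error) error {
	if err == nil {
		return nil
	}
	var se sqlStater
	if errors.As(err, &se) && se.SQLState() == uniqueViolation {
		return ErrEmailExists
	}
	return fmt.Errorf("insert user: %w", err)
}

// fakePgError stands in for a driver error in this example.
type fakePgError struct{ code string }

func (e *fakePgError) Error() string    { return "pg error " + e.code }
func (e *fakePgError) SQLState() string { return e.code }

func main() {
	dup := fmt.Errorf("exec: %w", &fakePgError{code: "23505"})
	fmt.Println(errors.Is(classifyInsertError(dup), ErrEmailExists))

	canceled := &fakePgError{code: "57014"} // query_canceled
	fmt.Println(errors.Is(classifyInsertError(canceled), ErrEmailExists))
}
```

Returning a sentinel also lets the service layer distinguish "email taken" from other failures without comparing error strings.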
### UUID Generation
**Method**: PostgreSQL `gen_random_uuid()` function
**Collision Probability**: Negligible (UUID v4 has 122 random bits)
**No Application-Level ID Generation**: Database handles ID creation
## Backup and Recovery
### No Automated Backups
**Current**: No backup strategy implemented
**Risks**:
- Data loss on database failure
- No point-in-time recovery
- No disaster recovery plan
**Recommendations**:
- Enable PostgreSQL continuous archiving (WAL archiving)
- Schedule daily full backups
- Test restore procedures
- Store backups off-site (S3, etc.)
### Manual Backup
**pg_dump**:
```bash
pg_dump $DATABASE_URL > backup.sql
```
**Restore**:
```bash
psql $DATABASE_URL < backup.sql
```
## Data Security
### Password Storage
**Hashing Algorithm**: bcrypt
**Cost Factor**: 10 (2^10 = 1024 iterations)
**Implementation**:
```go
func hashPassword(password string) (string, error) {
bytes, err := bcrypt.GenerateFromPassword([]byte(password), 10)
return string(bytes), err
}
func checkPasswordHash(password, hash string) bool {
err := bcrypt.CompareHashAndPassword([]byte(hash), []byte(password))
return err == nil
}
```
**Security Properties**:
- Salted (bcrypt includes random salt)
- Slow (cost factor 10 takes on the order of 50-100ms per hash, depending on hardware)
- Resistant to rainbow tables
- Resistant to brute force (with rate limiting, not implemented)
### SQL Injection Prevention
**Parameterized Queries**: All queries use `$1`, `$2` placeholders
**Safe Example**:
```go
// Safe: parameterized query (Scan takes one destination per selected column)
err := s.db.QueryRow(ctx,
	"SELECT id, email FROM users WHERE email = $1",
	email,
).Scan(&user.ID, &user.Email)
```
**Unsafe Example** (not used):
```go
// Unsafe: string concatenation (NOT USED IN CODEBASE)
query := fmt.Sprintf("SELECT id, email FROM users WHERE email = '%s'", email)
err := s.db.QueryRow(ctx, query).Scan(&user.ID, &user.Email)
```
**All Queries Are Safe**: No string concatenation in SQL queries
### Connection Security
**SSL Mode**: Configurable via connection string
**Example** (SSL disabled):
```
DATABASE_URL=postgresql://user:pass@localhost:5432/db?sslmode=disable
```
**Example** (SSL required):
```
DATABASE_URL=postgresql://user:pass@localhost:5432/db?sslmode=require
```
**Production Recommendation**: Use `sslmode=require` or `sslmode=verify-full`
## Database Monitoring
### No Monitoring
**Current**: No database monitoring implemented
**Missing Metrics**:
- Connection pool utilization
- Query latency
- Slow query log
- Deadlock detection
- Table bloat
- Index usage statistics
**Recommendations**:
- Enable PostgreSQL `pg_stat_statements` extension
- Monitor connection pool metrics (pgxpool provides stats)
- Set up alerts for connection pool exhaustion
- Log slow queries (> 1 second)
### Connection Pool Stats (Available but Not Used)
```go
stats := pool.Stat()
log.Printf("Total connections: %d", stats.TotalConns())
log.Printf("Idle connections: %d", stats.IdleConns())
log.Printf("Acquired connections: %d", stats.AcquiredConns())
log.Printf("Max connections: %d", stats.MaxConns())
```
**Not Implemented**: Stats are available but not logged or exposed
## Data Retention
### No Retention Policy
**Current**: Data is never deleted
**User Data**:
- Users are never deleted (no account deletion endpoint)
- No GDPR compliance (no data export, no right to be forgotten)
**Recommendations**:
- Implement account deletion endpoint
- Add soft delete (deleted_at timestamp)
- Implement data export (GDPR compliance)
- Add retention policy for inactive accounts
## Scalability Considerations
### Vertical Scaling
**Current Limits**:
- Connection pool: 10 max connections
- Single PostgreSQL instance
- No read replicas
**Scaling Up**:
- Increase connection pool size
- Increase PostgreSQL resources (CPU, RAM)
- Tune PostgreSQL configuration (shared_buffers, work_mem)
### Horizontal Scaling
**Not Supported**: Single database instance
**Challenges**:
- No sharding strategy
- No read/write splitting
- No multi-region support
**Future Considerations**:
- Add read replicas for search queries
- Shard by user ID for user data
- Use connection pooler (PgBouncer) for connection management
## Data Model Limitations
### Single Table Schema
**Pros**:
- Simple to understand
- No joins required
- Fast queries (index lookups only)
**Cons**:
- No relational data (playlists, favorites, etc.)
- No metadata persistence
- No user activity tracking
- Limited functionality
### No Audit Trail
**Missing**:
- No login history
- No password change history
- No account modification log
- No admin action log
**Implications**:
- No security forensics
- No compliance audit trail
- No user activity analytics
### No Soft Deletes
**Hard Delete Only**: If delete functionality is added, records are permanently removed
**Recommendation**: Add `deleted_at` timestamp for soft deletes
```sql
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;
CREATE INDEX idx_users_deleted_at ON users(deleted_at);
-- Query active users
SELECT * FROM users WHERE deleted_at IS NULL;
```
## Testing Strategy
### No Database Tests
**Current**: No unit tests for database operations
**Missing Tests**:
- User creation with duplicate email
- User lookup by email
- User lookup by ID
- Connection pool exhaustion
- Database connection failure
- Transaction rollback (if added)
**Recommendation**: Add integration tests with test database
**Example Test** (not implemented):
```go
func TestUserStore_Save_DuplicateEmail(t *testing.T) {
db := setupTestDB(t)
defer db.Close()
store := NewUserStore(db)
// First save should succeed
_, err := store.Save(context.Background(), "test@example.com", "hash1")
if err != nil {
t.Fatalf("first save failed: %v", err)
}
// Second save with same email should fail
_, err = store.Save(context.Background(), "test@example.com", "hash2")
if err == nil {
t.Fatal("expected duplicate email error")
}
}
```
## Environment Configuration
### Database URL
**Environment Variable**: `DATABASE_URL`
**Format**: PostgreSQL connection string
**Example**:
```
DATABASE_URL=postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable
```
**Components**:
- Protocol: `postgresql://`
- Username: `bedrock`
- Password: `bedrock`
- Host: `localhost`
- Port: `5432`
- Database: `bedrock`
- SSL Mode: `sslmode=disable`
**No Upfront Validation**: An invalid `DATABASE_URL` only surfaces as a `parse config` or `ping` error from `initDB` at startup
**Recommendation**: Validate connection string format on startup
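A startup check along these lines could reject obviously malformed values with a specific message before the pool is created. It is a sketch using only `net/url`; `validateDatabaseURL` is a hypothetical helper, and it assumes the URL form shown above (a keyword/value DSN such as `host=... dbname=...` would be rejected by this check).

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// validateDatabaseURL checks that raw is a postgres:// or postgresql:// URL
// with a host and a database name. It does not contact the server.
func validateDatabaseURL(raw string) error {
	if raw == "" {
		return errors.New("DATABASE_URL not set")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("DATABASE_URL: %w", err)
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return fmt.Errorf("DATABASE_URL: unsupported scheme %q", u.Scheme)
	}
	if u.Hostname() == "" {
		return errors.New("DATABASE_URL: missing host")
	}
	if strings.Trim(u.Path, "/") == "" {
		return errors.New("DATABASE_URL: missing database name")
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"postgresql://bedrock:bedrock@localhost:5432/bedrock?sslmode=disable",
		"mysql://localhost/bedrock",
		"postgresql://localhost:5432/",
	} {
		fmt.Printf("%q -> %v\n", raw, validateDatabaseURL(raw))
	}
}
```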
## Docker Deployment
### Docker Compose PostgreSQL
**File**: `docker-compose.yml`
```yaml
version: '3.8'
services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: bedrock
      POSTGRES_PASSWORD: bedrock
      POSTGRES_DB: bedrock
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U bedrock"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  postgres_data:
```
**Features**:
- PostgreSQL 15 Alpine (minimal image)
- Named volume for data persistence
- Health check for container orchestration
- Exposed port for local development
**Missing**:
- No initialization scripts (migrations must be run manually)
- No backup configuration
- No replication
- No connection pooler (PgBouncer)
### Database Initialization
**Manual Process**:
```bash
# Start PostgreSQL
docker-compose up -d postgres
# Wait for PostgreSQL to be ready
docker-compose exec postgres pg_isready -U bedrock
# Run migrations (the SQL file is piped from the host; the compose file mounts no /migrations directory)
docker-compose exec -T postgres psql -U bedrock -d bedrock < db/migrations/001_create_users_table.up.sql
```
**No Automated Initialization**: Migrations must be run manually after container start
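**Recommendation**: Mount the migrations as init scripts in docker-compose. The image's entrypoint runs every `*.sql` file in that directory in sorted name order (so `.down.sql` files run too) and only when the data directory is empty, so a directory holding only the `up` scripts is safer.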
```yaml
postgres:
  image: postgres:15-alpine
  volumes:
    - postgres_data:/var/lib/postgresql/data
    - ./db/migrations:/docker-entrypoint-initdb.d
```
## Data Layer Summary
### Strengths
- Simple, focused schema (users only)
- Proper indexing (email lookup is fast)
- Connection pooling (pgx/v5)
- Parameterized queries (SQL injection safe)
- bcrypt password hashing (secure)
### Weaknesses
- No metadata persistence (all data is ephemeral)
- No caching (high latency, provider API dependency)
- No migration tool (manual SQL execution)
- No monitoring (connection pool, query performance)
- No backup strategy (data loss risk)
- No audit trail (security, compliance)
- Minimal schema (no user data beyond auth)
### Recommendations for Metadata Aggregator
**Adopt**:
- pgx/v5 driver (excellent performance, native PostgreSQL features)
- Connection pooling configuration (sensible defaults)
- Parameterized queries (security best practice)
**Avoid**:
- Manual migrations (use golang-migrate or goose)
- No caching (implement Redis for metadata)
- Minimal schema (metadata aggregator needs rich schema)
**Enhance**:
- Add metadata tables (tracks, albums, artists, labels, etc.)
- Add user data tables (favorites, playlists, history)
- Add caching layer (Redis for hot data)
- Add migration tool (automated schema management)
- Add monitoring (connection pool, query latency)
- Add backup strategy (automated backups, point-in-time recovery)
@@ -0,0 +1,760 @@
# Bedrock-API Evaluation
## Executive Summary
Bedrock-API is a music metadata and streaming aggregation service built in Go 1.25 with gRPC and HTTP interfaces. The project demonstrates strong architectural patterns (provider abstraction, fan-out concurrency, partial response handling) but lacks production-readiness features (caching, monitoring, comprehensive testing, security hardening).
**Primary Value**: Cross-platform stream resolution (bridges non-streaming APIs like Spotify to streaming platforms like SoundCloud/YouTube Music).
**Target Use Case**: Unified music search and streaming across multiple platforms.
**Maturity Level**: Early production (functional but missing observability, caching, and security features).
## Strengths
### 1. Clean Provider Abstraction
**Pattern**: Implicit `trackProvider` interface isolates platform-specific logic
**Benefits**:
- Easy to add new providers (implement interface)
- Platform failures don't affect other providers
- Testable in isolation (mock providers)
**Example**:
```go
type trackProvider interface {
Name() string
SearchTracks(ctx context.Context, query string, limit int32) ([]*pb.Track, error)
GetStreamURL(ctx context.Context, id string) (string, error)
// ... other methods
}
```
**Applicability to Metadata Aggregator**: Directly applicable. Same pattern can be used for metadata providers (Discogs, MusicBrainz, Last.fm, etc.).
### 2. Fan-Out Concurrency
**Pattern**: Parallel goroutines per provider with WaitGroup coordination
**Benefits**:
- Response time = slowest provider (not sum of all)
- Typical search: 200-500ms (4 providers in parallel)
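- Latency stays roughly flat as providers are added (bounded by the slowest one), while outbound request volume grows linearly with provider count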
**Example**:
```go
var wg sync.WaitGroup
for _, provider := range providers {
wg.Add(1)
go func(p trackProvider) {
defer wg.Done()
results, err := p.SearchTracks(ctx, query, limit)
// Aggregate results
}(provider)
}
wg.Wait()
```
**Applicability to Metadata Aggregator**: Directly applicable. Metadata queries can be parallelized across providers.
### 3. Partial Response Handling
**Pattern**: Return successful results even if some providers fail
**Benefits**:
- Resilient to individual provider failures
- Degraded service instead of complete failure
- Client can decide how to handle partial results
**Example**:
```go
if len(errors) > 0 {
if len(allTracks) == 0 {
status = pb.ResponseStatus_ERROR
} else {
status = pb.ResponseStatus_PARTIAL
}
}
return &pb.SearchTracksResponse{
Tracks: allTracks,
Status: status,
Errors: errors, // Per-provider error details
}
```
**Applicability to Metadata Aggregator**: Directly applicable. Metadata aggregation should be resilient to individual provider failures.
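Fan-out and partial-response handling fit together in one small function. The sketch below is a standard-library illustration of the two patterns, not the project's code: `searchFunc` stands in for a provider's `SearchTracks`, tracks are plain strings, and the status strings mirror the `ERROR`/`PARTIAL` distinction shown above. A mutex guards the shared slices because each provider runs in its own goroutine.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sort"
	"sync"
)

// searchFunc is a hypothetical stand-in for a provider's SearchTracks method.
type searchFunc func(ctx context.Context, query string) ([]string, error)

const (
	statusOK      = "OK"
	statusPartial = "PARTIAL"
	statusError   = "ERROR"
)

// searchAll queries every provider in parallel and reports per-provider failures.
func searchAll(ctx context.Context, query string, providers map[string]searchFunc) ([]string, map[string]error, string) {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex // guards tracks and failed
		tracks []string
		failed = make(map[string]error)
	)
	for name, search := range providers {
		wg.Add(1)
		go func(name string, search searchFunc) {
			defer wg.Done()
			results, err := search(ctx, query)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed[name] = err
				return
			}
			tracks = append(tracks, results...)
		}(name, search)
	}
	wg.Wait()
	sort.Strings(tracks) // goroutine completion order is not deterministic

	switch {
	case len(failed) == 0:
		return tracks, failed, statusOK
	case len(tracks) == 0:
		return tracks, failed, statusError
	default:
		return tracks, failed, statusPartial
	}
}

// runDemo wires up one healthy and one failing provider.
func runDemo() ([]string, map[string]error, string) {
	providers := map[string]searchFunc{
		"soundcloud": func(ctx context.Context, q string) ([]string, error) {
			return []string{"soundcloud:track:1"}, nil
		},
		"spotify": func(ctx context.Context, q string) ([]string, error) {
			return nil, errors.New("rate limited")
		},
	}
	return searchAll(context.Background(), "daft punk", providers)
}

func main() {
	tracks, failed, status := runDemo()
	fmt.Println(status, tracks, len(failed))
}
```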
### 4. Cross-Platform Stream Resolution
**Pattern**: Bridge non-streaming platforms to streaming platforms
**Algorithm**:
1. Check if platform supports streaming (SoundCloud, YouTube Music)
2. If not, search SoundCloud for matching track
3. If SoundCloud fails, search YouTube Music
4. Return first successful stream URL
**Benefits**:
- Unified streaming interface (even for non-streaming APIs)
- Automatic fallback chain
- Transparent to client
**Applicability to Metadata Aggregator**: Not directly applicable (metadata aggregator doesn't need streaming). However, the fallback pattern is useful for metadata resolution (try provider A, fallback to provider B).
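The fallback chain itself is independent of streaming and can be written generically: try each source in order, return the first success, and keep the failures for diagnostics. This is a sketch under assumptions; `resolver` and `resolveStream` are hypothetical names and the URLs are placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// resolver looks up a stream URL for a track title on one platform
// (hypothetical shape, for illustration).
type resolver struct {
	name    string
	resolve func(title string) (string, error)
}

// resolveStream tries each resolver in order and returns the first URL found.
// If every resolver fails, the individual errors are joined into one.
func resolveStream(title string, chain []resolver) (string, error) {
	if len(chain) == 0 {
		return "", errors.New("no resolvers configured")
	}
	var errs []error
	for _, r := range chain {
		streamURL, err := r.resolve(title)
		if err == nil {
			return streamURL, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", r.name, err))
	}
	return "", errors.Join(errs...)
}

func main() {
	chain := []resolver{
		{"soundcloud", func(string) (string, error) { return "", errors.New("no match") }},
		{"youtube_music", func(string) (string, error) { return "https://example.invalid/stream", nil }},
	}
	streamURL, err := resolveStream("Artist - Title", chain)
	fmt.Println(streamURL, err)
}
```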
### 5. YouTube 7-Client Fallback
**Pattern**: Rotate through 7 different YouTube client types to maximize stream availability
**Clients**:
- TVHTML5_SIMPLY_EMBEDDED (primary)
- TVHTML5
- ANDROID_VR (2 variants)
- ANDROID
- IOS
- WEB
**Benefits**:
- Maximizes success rate (different clients have different capabilities)
- Avoids ciphered streams (encrypted, require decryption)
- Handles geo-restrictions
**Applicability to Metadata Aggregator**: Pattern is applicable for providers with multiple API endpoints or client types.
### 6. ID Namespacing
**Pattern**: Platform-prefixed IDs (`{platform}:{type}:{native_id}`)
**Examples**:
- `spotify:track:3n3Ppam7vgaVa1iaRUc9Lp`
- `soundcloud:track:1234567890`
- `deezer:album:302127`
**Benefits**:
- Prevents ID collisions across platforms
- Explicit routing (no lookup required)
- Self-documenting (ID reveals source platform)
**Applicability to Metadata Aggregator**: Directly applicable. Metadata IDs should be namespaced to prevent collisions.
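Parsing such an ID takes one split. The sketch below assumes the three-part `{platform}:{type}:{native_id}` form shown above; `splitNamespacedID` is a hypothetical helper name. Splitting into at most three parts keeps any colons that appear inside the native ID.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNamespacedID splits "{platform}:{type}:{native_id}" into its parts.
// Only the first two colons are treated as separators.
func splitNamespacedID(id string) (platform, kind, nativeID string, err error) {
	parts := strings.SplitN(id, ":", 3)
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", "", "", fmt.Errorf("malformed namespaced ID %q", id)
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	platform, kind, nativeID, err := splitNamespacedID("spotify:track:3n3Ppam7vgaVa1iaRUc9Lp")
	fmt.Println(platform, kind, nativeID, err)

	_, _, _, err = splitNamespacedID("deezer:album")
	fmt.Println(err)
}
```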
### 7. gRPC for Performance
**Benefits**:
- HTTP/2 multiplexing (multiple requests over single connection)
- Binary protocol (smaller payloads than JSON)
- Streaming support (future use)
- Strong typing (protobuf)
**Tradeoffs**:
- Requires client code generation
- Less human-readable than REST/JSON
- Tooling less mature than REST
**Applicability to Metadata Aggregator**: Consider gRPC for internal services, REST for public API.
### 8. JWT Authentication
**Implementation**: HS256 tokens with bcrypt password hashing
**Benefits**:
- Stateless authentication (no session storage)
- Token expiration (15min access, 7 day refresh)
- Secure password storage (bcrypt cost 10)
**Limitations**:
- No token revocation
- No refresh token rotation
- Single shared secret (HS256)
**Applicability to Metadata Aggregator**: JWT is suitable, but consider RS256 (asymmetric) for better security.
### 9. SoundCloud Client ID Rotation
**Pattern**: Rotate through multiple client IDs to avoid rate limits
**Implementation**:
```go
func (p *SoundCloudProvider) getClientID() string {
p.mu.Lock()
defer p.mu.Unlock()
id := p.clientIDs[p.currentID]
p.currentID = (p.currentID + 1) % len(p.clientIDs)
return id
}
```
**Benefits**:
- Increases effective rate limit (4 IDs give up to 4x the per-ID limit)
- Automatic rotation (no manual intervention)
**Applicability to Metadata Aggregator**: Applicable for providers with rate limits (rotate API keys).
### 10. Batch Hydration (SoundCloud)
**Pattern**: Fetch details for multiple IDs in single request
**Implementation**: SoundCloud allows up to 30 IDs per request
**Benefits**:
- Reduces API calls (up to 30x fewer for playlists)
- Faster response times
- Lower rate limit consumption
**Applicability to Metadata Aggregator**: Applicable for providers that support batch requests (MusicBrainz, Discogs).
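The client-side half of batch hydration is splitting an ID list into request-sized groups. A minimal sketch, assuming the 30-ID limit quoted above; `chunk` is a hypothetical helper and the request itself is left out.

```go
package main

import "fmt"

// chunk splits ids into consecutive batches of at most size elements.
// The batches share the backing array of ids; they are not copies.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	ids := make([]string, 65)
	for i := range ids {
		ids[i] = fmt.Sprintf("track-%d", i)
	}
	for i, batch := range chunk(ids, 30) {
		fmt.Printf("request %d: %d ids\n", i+1, len(batch))
	}
}
```

A 65-track playlist therefore needs three requests (30, 30, and 5 IDs) instead of 65.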
## Weaknesses
### 1. No Caching
**Impact**:
- High latency (200-500ms per search)
- Provider API rate limits
- Unnecessary API quota consumption
- No offline capability
**Recommendation**: Implement Redis caching
**Cache Strategy**:
- Track metadata: 1 hour TTL
- Search results: 5 minutes TTL
- Stream URLs: 1 hour TTL (expire after 1-6 hours anyway)
- Lyrics: 24 hours TTL (rarely change)
**Applicability to Metadata Aggregator**: Critical. Metadata aggregator must cache to avoid repeated API calls.
### 2. Minimal Database Schema
**Current**: Single `users` table (authentication only)
**Missing**:
- No metadata persistence (tracks, albums, artists)
- No user data (favorites, playlists, history)
- No analytics (play counts, search trends)
**Impact**:
- All data is ephemeral (fetched from providers every time)
- No historical data
- No offline access
- No data ownership
**Applicability to Metadata Aggregator**: Metadata aggregator needs rich schema for metadata persistence.
### 3. No Monitoring
**Missing**:
- Prometheus metrics (request rate, error rate, latency)
- Grafana dashboards
- Distributed tracing (Jaeger)
- Log aggregation (Loki)
**Impact**:
- No visibility into performance
- No alerting on failures
- Difficult to debug production issues
**Recommendation**: Implement full observability stack
**Applicability to Metadata Aggregator**: Critical for production. Monitoring is essential.
### 4. No Rate Limiting
**Missing**:
- Per-user rate limiting
- Per-IP rate limiting
- Provider-level rate limiting
**Impact**:
- Abuse possible (unlimited requests)
- Provider API rate limits can be exceeded
- No protection against DDoS
**Recommendation**: Implement rate limiting
**Example**:
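```go
import (
	"sync"

	"golang.org/x/time/rate"
)

var (
	mu       sync.Mutex // guards limiters; handlers run concurrently
	limiters = make(map[string]*rate.Limiter)
)

// getLimiter returns the limiter for userID, creating it on first use.
func getLimiter(userID string) *rate.Limiter {
	mu.Lock()
	defer mu.Unlock()
	limiter, exists := limiters[userID]
	if !exists {
		limiter = rate.NewLimiter(rate.Limit(10), 10) // 10 req/sec, burst of 10
		limiters[userID] = limiter
	}
	return limiter
}
```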
**Applicability to Metadata Aggregator**: Critical. Rate limiting prevents abuse and protects provider APIs.
### 5. Stub Providers (Yandex, VK)
**Status**: Placeholder only, no implementation
**Impact**:
- Incomplete platform coverage
- Misleading (listed as supported but not functional)
**Recommendation**: Remove stubs or implement fully
**Applicability to Metadata Aggregator**: Don't list providers as supported unless fully implemented.
### 6. No TLS
**Current**: gRPC and HTTP without TLS
**Impact**:
- Credentials transmitted in plaintext
- JWT tokens exposed
- Man-in-the-middle attacks possible
**Recommendation**: Deploy behind reverse proxy with TLS termination
**Applicability to Metadata Aggregator**: TLS is mandatory for production.
### 7. Go Version Mismatch
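**Issue**: `go.mod` specifies 1.25, the Dockerfile uses 1.23, and the CI workflow uses 1.24
**Impact**:
- Since Go 1.21 the `go` directive is a minimum version requirement, so an older toolchain either refuses to build or tries to download a newer one, depending on `GOTOOLCHAIN`
- Inconsistent builds across local, CI, and Docker environments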
**Fix**:
```dockerfile
FROM golang:1.25-alpine AS builder
```
**Applicability to Metadata Aggregator**: Keep build environment in sync with go.mod.
### 8. Custom Submodule Dependency
**Issue**: `spotapi-go` is custom fork, not official library
**Impact**:
- Maintenance burden
- Submodule initialization required
- Potential security issues (unmaintained fork)
**Recommendation**: Depend on a maintained upstream library directly instead of a custom fork
**Applicability to Metadata Aggregator**: Avoid custom forks. Use official libraries or vendor dependencies.
### 9. No Unit Tests
**Current**: Integration tests only (require running server and providers)
**Missing**:
- Provider adapter unit tests (mocked HTTP responses)
- Database store unit tests (mocked database)
- Authentication unit tests (mocked JWT)
**Impact**:
- Slow test execution
- Difficult to test edge cases
- Requires provider credentials for testing
**Recommendation**: Add unit tests with mocks
**Applicability to Metadata Aggregator**: Unit tests are essential for fast feedback and edge case coverage.
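A sketch of how a provider adapter can be exercised against a mocked HTTP response using the standard library's `net/http/httptest`. The `/tracks/{id}` path, the JSON shape, and the function names are assumptions made for this example; in a real project the check in `main` would live in a `_test.go` file using the `testing` package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetchTitle asks a provider for a track title.
// The endpoint path and response format are hypothetical.
func fetchTitle(baseURL, id string) (string, error) {
	resp, err := http.Get(baseURL + "/tracks/" + id)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("provider returned %d", resp.StatusCode)
	}
	var body struct {
		Title string `json:"title"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	return body.Title, nil
}

// newFakeProvider starts a local server that mimics the provider API,
// so no network access or provider credentials are required.
func newFakeProvider() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/tracks/42" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, `{"title":"Example Track"}`)
	}))
}

func main() {
	srv := newFakeProvider()
	defer srv.Close()
	title, err := fetchTitle(srv.URL, "42")
	fmt.Println(title, err)
}
```

Because the adapter takes its base URL as a parameter, edge cases such as 404s, malformed JSON, or slow responses can be simulated by changing only the fake handler.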
### 10. Health Check Stub
**Current**: `GetServiceStatus` always returns healthy
**Impact**:
- No actual health monitoring
- Kubernetes probes don't detect failures
- No dependency health visibility
**Recommendation**: Implement real health checks
**Applicability to Metadata Aggregator**: Health checks are critical for orchestration (Kubernetes, Docker Swarm).
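One possible shape for a real health check is to probe each dependency concurrently under a shared timeout and report the failures. The sketch below is an assumption about how this could look, not Bedrock-API's planned design; `checkFunc`, `runChecks`, and `staticCheck` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// checkFunc probes one dependency (database, provider API, ...).
type checkFunc func(ctx context.Context) error

var errDown = errors.New("dependency down")

// staticCheck returns a check with a fixed result; a real check would
// ping the database or call a cheap provider endpoint.
func staticCheck(err error) checkFunc {
	return func(context.Context) error { return err }
}

// runChecks runs every check concurrently with a shared timeout and
// returns the failures keyed by dependency name. An empty map means healthy.
func runChecks(checks map[string]checkFunc, timeout time.Duration) map[string]error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		failures = make(map[string]error)
	)
	for name, check := range checks {
		wg.Add(1)
		go func(name string, check checkFunc) {
			defer wg.Done()
			if err := check(ctx); err != nil {
				mu.Lock()
				failures[name] = err
				mu.Unlock()
			}
		}(name, check)
	}
	wg.Wait()
	return failures
}

func main() {
	failures := runChecks(map[string]checkFunc{
		"postgres": staticCheck(nil),
		"deezer":   staticCheck(errDown),
	}, 2*time.Second)
	fmt.Println("healthy:", len(failures) == 0, "failures:", failures)
}
```

The timeout only takes effect if each check honors its context, for example by passing `ctx` to the database ping or HTTP request it performs.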
### 11. No Pagination
**Current**: Search results limited by `limit` parameter (max 50)
**Impact**:
- Large result sets cannot be retrieved incrementally
- No cursor-based pagination
- No total count
**Recommendation**: Add pagination
**Example**:
```protobuf
message SearchRequest {
string query = 1;
int32 limit = 2;
string cursor = 3; // Pagination cursor
}
message SearchTracksResponse {
repeated Track tracks = 1;
string next_cursor = 2; // Next page cursor
int32 total = 3; // Total result count
}
```
**Applicability to Metadata Aggregator**: Pagination is essential for large result sets.
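On the server side, the `cursor` and `next_cursor` fields need an encoding. A minimal sketch that wraps an offset in an opaque base64 token; this is a simplification (a keyset cursor would encode the last-seen sort key instead), and all function names are hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strconv"
)

// encodeCursor wraps an offset in an opaque, URL-safe token.
func encodeCursor(offset int) string {
	return base64.RawURLEncoding.EncodeToString([]byte(strconv.Itoa(offset)))
}

// decodeCursor reverses encodeCursor. An empty cursor means "first page".
func decodeCursor(cursor string) (int, error) {
	if cursor == "" {
		return 0, nil
	}
	raw, err := base64.RawURLEncoding.DecodeString(cursor)
	if err != nil {
		return 0, fmt.Errorf("invalid cursor: %w", err)
	}
	offset, err := strconv.Atoi(string(raw))
	if err != nil || offset < 0 {
		return 0, fmt.Errorf("invalid cursor offset %q", raw)
	}
	return offset, nil
}

// page returns one page of items plus the cursor for the next page
// ("" when there are no more results).
func page(items []string, cursor string, limit int) ([]string, string, error) {
	offset, err := decodeCursor(cursor)
	if err != nil {
		return nil, "", err
	}
	if limit <= 0 || offset >= len(items) {
		return nil, "", nil
	}
	end := offset + limit
	if end >= len(items) {
		return items[offset:], "", nil
	}
	return items[offset:end], encodeCursor(end), nil
}

func main() {
	items := []string{"a", "b", "c", "d", "e"}
	cursor := ""
	for {
		got, next, err := page(items, cursor, 2)
		if err != nil {
			panic(err)
		}
		fmt.Println(got)
		if next == "" {
			break
		}
		cursor = next
	}
}
```

Keeping the cursor opaque lets the server change its internal representation later without breaking clients.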
### 12. No API Versioning
**Current**: No version in package name or endpoint
**Impact**:
- Breaking changes affect all clients
- No backward compatibility
- No deprecation path
**Recommendation**: Add versioning
**Example**:
```protobuf
package bedrock.v1;
service BedrockService {
// ...
}
```
**Applicability to Metadata Aggregator**: API versioning is critical for backward compatibility.
## Integration Complexity
### Provider Integration Effort
| Provider | Complexity | Reason |
|----------|------------|--------|
| Spotify | Medium | OAuth 2.0, submodule dependency |
| SoundCloud | Low | Simple HTTP API, client ID rotation |
| Deezer | Low | Public API, no auth |
| YouTube Music | High | Undocumented Innertube API, 7-client fallback, cipher handling |
| Yandex | Unknown | Not implemented |
| VK | Unknown | Not implemented |
**Easiest**: Deezer (public API, no auth)
**Hardest**: YouTube Music (undocumented API, complex fallback logic)
### Client Integration Effort
**gRPC Clients**: Requires protobuf compilation
**Steps**:
1. Install protoc compiler
2. Install language-specific protobuf plugin
3. Generate client code from `.proto` file
4. Implement authentication (JWT in metadata)
**Example** (Go):
```bash
protoc --go_out=. --go-grpc_out=. bedrock_service.proto
```
**Example** (Python):
```bash
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. bedrock_service.proto
```
**Complexity**: Medium (requires tooling setup)
**Alternative**: Provide pre-generated clients for popular languages
## Performance Analysis
### Latency Breakdown
**Typical Search Request** (4 providers):
| Component | Latency | Notes |
|-----------|---------|-------|
| gRPC overhead | 1-5ms | Minimal |
| Authentication | 1-2ms | JWT validation |
| Provider queries (parallel) | 200-500ms | Bounded by the slowest provider |
| Response aggregation | 1-5ms | Mutex-protected append |
| **Total** | **203-512ms** | Dominated by provider latency |
**Optimization Opportunities**:
- Cache metadata (reduce provider calls)
- Implement timeouts (don't wait for slow providers)
- Add circuit breakers (skip failing providers)
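The timeout idea can be sketched with `context.WithTimeout`: the fan-out stops waiting once the deadline passes and reports slow providers as failed. The `searchFunc` signature and helper names below are assumptions for illustration, not Bedrock-API's actual interfaces.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// searchFunc stands in for one provider's search call (hypothetical signature).
type searchFunc func(ctx context.Context, query string) ([]string, error)

type outcome struct {
	name    string
	results []string
	err     error
}

// fanOut queries every provider concurrently and stops waiting once the
// timeout elapses. It returns the merged results plus the names of the
// providers that failed or did not answer in time.
func fanOut(providers map[string]searchFunc, query string, timeout time.Duration) (results, failed []string) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	ch := make(chan outcome, len(providers)) // buffered: late senders never block
	pending := make(map[string]bool, len(providers))
	for name, search := range providers {
		pending[name] = true
		go func(name string, search searchFunc) {
			res, err := search(ctx, query)
			ch <- outcome{name: name, results: res, err: err}
		}(name, search)
	}

	for len(pending) > 0 {
		select {
		case o := <-ch:
			delete(pending, o.name)
			if o.err != nil {
				failed = append(failed, o.name)
				continue
			}
			results = append(results, o.results...)
		case <-ctx.Done():
			for name := range pending {
				failed = append(failed, name)
			}
			return results, failed
		}
	}
	return results, failed
}

// instant answers immediately; slow answers after d unless ctx ends first.
// Both are test doubles for this example.
func instant(results ...string) searchFunc {
	return func(ctx context.Context, query string) ([]string, error) {
		return results, nil
	}
}

func slow(d time.Duration) searchFunc {
	return func(ctx context.Context, query string) ([]string, error) {
		select {
		case <-time.After(d):
			return []string{"late result"}, nil
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}

func main() {
	providers := map[string]searchFunc{
		"fast": instant("track A", "track B"),
		"slow": slow(5 * time.Second),
	}
	results, failed := fanOut(providers, "daft punk", 100*time.Millisecond)
	fmt.Println(results, failed)
}
```

A channel replaces the mutex-protected append here so that the collector can select between incoming results and the deadline.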
### Throughput
**Single Instance** (no caching):
- Requests per second: ~10-20 (limited by provider APIs)
- Concurrent requests: No enforced limit (goroutines are unbounded, which risks resource exhaustion)
**With Caching** (Redis):
- Requests per second: ~1000+ (cache hits)
- Concurrent requests: Limited by database connections (10 max)
**Scaling**:
- Horizontal: Run multiple instances behind load balancer
- Vertical: Increase CPU/RAM for single instance
### Resource Usage
**Memory**: ~50-100 MB (idle), ~200-500 MB (under load)
**CPU**: Low (I/O bound, waiting on provider APIs)
**Network**: High (streaming proxy, provider API calls)
## Security Assessment
### Authentication
**Strengths**:
- JWT tokens (stateless)
- bcrypt password hashing (secure)
- gRPC interceptors (centralized auth)
**Weaknesses**:
- No token revocation
- No refresh token rotation
- Single shared secret (HS256)
- No rate limiting (brute force possible)
- No account lockout
**Risk Level**: Medium
**Recommendations**:
- Implement token revocation list (Redis)
- Use RS256 (asymmetric keys)
- Add rate limiting on auth endpoints
- Add account lockout after failed attempts
### Transport Security
**Strengths**: None (no TLS)
**Weaknesses**:
- Credentials transmitted in plaintext
- JWT tokens exposed
- Man-in-the-middle attacks possible
**Risk Level**: High
**Recommendations**:
- Deploy behind reverse proxy with TLS
- Use Let's Encrypt for free certificates
- Enforce HTTPS redirects
### Input Validation
**Strengths**:
- Parameterized queries (SQL injection safe)
- Email format validation
**Weaknesses**:
- No query length limits
- No ID format validation
- No limit parameter bounds
**Risk Level**: Low (no SQL injection, but potential DoS)
**Recommendations**:
- Validate all inputs (length, format, bounds)
- Sanitize user-provided data
- Add request size limits
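A sketch of the length, bounds, and format checks recommended above. The specific limits and the ID pattern are assumptions chosen for illustration; the function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

const (
	maxQueryLen = 200 // assumed bound; tune per deployment
	maxLimit    = 50  // assumed cap on results per request
)

// idPattern accepts IDs shaped like "platform:entity_type:native_id".
// The allowed character sets are an assumption for this sketch.
var idPattern = regexp.MustCompile(`^[a-z]+:[a-z_]+:[A-Za-z0-9_-]{1,64}$`)

// validateQuery rejects empty and overly long search queries.
func validateQuery(q string) error {
	q = strings.TrimSpace(q)
	if q == "" {
		return errors.New("query must not be empty")
	}
	if utf8.RuneCountInString(q) > maxQueryLen {
		return fmt.Errorf("query exceeds %d characters", maxQueryLen)
	}
	return nil
}

// validateLimit enforces bounds on the limit parameter.
func validateLimit(limit int) error {
	if limit < 1 || limit > maxLimit {
		return fmt.Errorf("limit must be between 1 and %d", maxLimit)
	}
	return nil
}

// validateID checks the ID format before it reaches any provider.
func validateID(id string) error {
	if !idPattern.MatchString(id) {
		return fmt.Errorf("malformed ID %q", id)
	}
	return nil
}

func main() {
	fmt.Println(validateQuery("daft punk"))
	fmt.Println(validateLimit(500))
	fmt.Println(validateID("not an id"))
}
```

Running these checks in a gRPC interceptor or at the top of each handler keeps malformed requests from consuming provider quota.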
### Secrets Management
**Strengths**: None (plaintext `.env` files)
**Weaknesses**:
- Secrets in plaintext
- No rotation
- No encryption at rest
**Risk Level**: Medium
**Recommendations**:
- Use secrets manager (AWS Secrets Manager, Vault)
- Rotate secrets periodically
- Encrypt secrets at rest
## Scalability
### Vertical Scaling
**Current Limits**:
- Database connections: 10 max
- Goroutines: Unbounded (risky)
- Memory: ~500 MB under load
**Scaling Up**:
- Increase database connection pool
- Add worker pool (bounded goroutines)
- Increase instance size (CPU, RAM)
**Max Capacity** (single instance): ~100 req/sec (with caching)
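The "bounded goroutines" suggestion can be sketched with a buffered channel used as a counting semaphore. `runBounded` and `measurePeak` are hypothetical helpers; the second exists only to observe the concurrency limit.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runBounded executes every task but never runs more than maxParallel at once.
// A buffered channel acts as a counting semaphore.
func runBounded(tasks []func(), maxParallel int) {
	if maxParallel < 1 {
		maxParallel = 1
	}
	sem := make(chan struct{}, maxParallel)
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxParallel tasks are in flight
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the task ends
			task()
		}(task)
	}
	wg.Wait()
}

// measurePeak runs n short tasks through runBounded and reports how many
// ran in total and the highest number observed running simultaneously.
func measurePeak(n, maxParallel int) (ran, peak int) {
	var mu sync.Mutex
	active := 0
	tasks := make([]func(), n)
	for i := range tasks {
		tasks[i] = func() {
			mu.Lock()
			active++
			ran++
			if active > peak {
				peak = active
			}
			mu.Unlock()
			time.Sleep(time.Millisecond) // simulate a provider call
			mu.Lock()
			active--
			mu.Unlock()
		}
	}
	runBounded(tasks, maxParallel)
	return ran, peak
}

func main() {
	ran, peak := measurePeak(20, 3)
	fmt.Println("ran:", ran, "peak under limit:", peak <= 3)
}
```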
### Horizontal Scaling
**Stateless Design**: Yes (JWT tokens, no sessions)
**Scaling Out**:
- Run multiple instances behind load balancer
- Share PostgreSQL database (read replicas for reads)
- Share Redis cache (cluster mode)
**Max Capacity** (10 instances): ~1000 req/sec (with caching)
### Database Scaling
**Current**: Single PostgreSQL instance
**Scaling Options**:
- Read replicas (for read-heavy workloads)
- Connection pooler (PgBouncer)
- Sharding (by user ID)
**Bottleneck**: The database is not currently a bottleneck (minimal schema, simple queries)
## Maintainability
### Code Organization
**Strengths**:
- Clean provider abstraction
- Separation of concerns (providers, store, auth)
**Weaknesses**:
- Single 1300+ line file (`main.go`)
- No package documentation
- No API documentation
**Recommendation**: Split `main.go` by domain (search, retrieval, streaming, etc.)
### Testing
**Strengths**:
- Integration tests for all providers
- GitHub Actions CI/CD
**Weaknesses**:
- No unit tests
- No test coverage reporting
- No mocks
**Recommendation**: Add unit tests with mocks, measure coverage
### Documentation
**Strengths**:
- README with setup instructions
- `.env.example` template
**Weaknesses**:
- No API documentation (OpenAPI/Swagger)
- No architecture documentation
- No deployment guide
**Recommendation**: Add comprehensive documentation
### Dependency Management
**Strengths**:
- Go modules (versioned dependencies)
- Minimal dependencies (8 direct)
**Weaknesses**:
- Custom submodule (spotapi-go)
- No automated updates (Dependabot)
**Recommendation**: Remove submodule, add Dependabot
## Comparison to Metadata Aggregator Requirements
### Alignment
| Requirement | Bedrock-API | Metadata Aggregator | Alignment |
|-------------|-------------|---------------------|-----------|
| Multi-provider aggregation | Yes (4 active) | Yes (10+ planned) | High |
| Parallel queries | Yes (goroutines) | Yes | High |
| Partial response handling | Yes | Yes | High |
| Metadata persistence | No | Yes | Low |
| Caching | No | Yes (critical) | Low |
| Rich metadata | Medium | High | Medium |
| Streaming | Yes | No | N/A |
| Authentication | JWT | TBD | Medium |
| Monitoring | No | Yes | Low |
| Testing | Integration only | Unit + Integration | Medium |
### Reusable Patterns
**Directly Applicable**:
- Provider interface pattern
- Fan-out concurrency
- Partial response handling
- ID namespacing
- gRPC interceptors
**Needs Adaptation**:
- Authentication (add RBAC, token revocation)
- Database schema (expand for metadata)
- Caching (add Redis)
- Monitoring (add Prometheus)
**Not Applicable**:
- Stream resolution (metadata aggregator doesn't need streaming)
- YouTube 7-client fallback (specific to YouTube)
## Recommendations for Metadata Aggregator
### Adopt
1. **Provider Interface Pattern**: Clean abstraction for platform-specific logic
2. **Fan-Out Concurrency**: Parallel queries for fast responses
3. **Partial Response Handling**: Resilient to individual provider failures
4. **ID Namespacing**: Prevent collisions, enable explicit routing
5. **gRPC for Internal Services**: Performance benefits for service-to-service communication
6. **JWT Authentication**: Stateless, scalable authentication
7. **bcrypt Password Hashing**: Secure password storage
### Avoid
1. **No Caching**: Implement Redis from day one
2. **Minimal Database Schema**: Design rich schema for metadata persistence
3. **No Monitoring**: Implement Prometheus + Grafana from start
4. **No Rate Limiting**: Add rate limiting to prevent abuse
5. **Stub Providers**: Only list fully implemented providers
6. **No TLS**: Deploy with TLS from start
7. **Custom Submodules**: Use official libraries or vendor dependencies
8. **No Unit Tests**: Write unit tests with mocks
9. **Single Large File**: Split code by domain
10. **No API Versioning**: Version API from start
### Enhance
1. **Add Caching Layer**: Redis for metadata, search results, provider responses
2. **Expand Database Schema**: Tables for tracks, albums, artists, labels, genres, etc.
3. **Implement Monitoring**: Prometheus metrics, Grafana dashboards, distributed tracing
4. **Add Rate Limiting**: Per-user, per-IP, per-provider limits
5. **Implement Health Checks**: Real health checks for dependencies
6. **Add Pagination**: Cursor-based pagination for large result sets
7. **Add API Versioning**: Version API for backward compatibility
8. **Add Comprehensive Testing**: Unit tests with mocks, integration tests, E2E tests
9. **Add Documentation**: API docs (OpenAPI), architecture docs, deployment guide
10. **Add Security Features**: Token revocation, refresh token rotation, RS256, TLS
## Final Verdict
**Overall Assessment**: Good architectural foundation, but lacks production-readiness features.
**Strengths**: Clean provider abstraction, fan-out concurrency, partial response handling, cross-platform stream resolution.
**Weaknesses**: No caching, minimal database schema, no monitoring, no rate limiting, no TLS, stub providers.
**Maturity Level**: Early stage (functional, but missing features required for production).
**Recommendation for Metadata Aggregator**: Adopt core patterns (provider interface, fan-out concurrency, partial responses, ID namespacing), but enhance with caching, monitoring, comprehensive testing, and security features.
**Effort to Adapt**: Medium (core patterns are reusable, but significant enhancements needed for production).
**Value Proposition**: Bedrock-API demonstrates proven patterns for multi-provider aggregation. The metadata aggregator can learn from its strengths (clean abstraction, concurrency, resilience) while avoiding its weaknesses (no caching, minimal schema, no monitoring).
@@ -0,0 +1,460 @@
# Bedrock-API Overview
## Project Identity
**Repository**: https://github.com/feralbureau/bedrock-api
**Language**: Go 1.25
**License**: MIT
**Primary Protocols**: gRPC, HTTP
**Database**: PostgreSQL 15
**Entry Point**: `bedrock_server/main.go`
Bedrock-API is a unified music metadata and streaming aggregation service that exposes four active music platforms, plus two stub providers, through a single gRPC interface. Its core value proposition is cross-platform stream resolution: when a platform does not provide streaming (Spotify partner API, Deezer public API), Bedrock bridges to SoundCloud or YouTube Music to deliver playable URLs.
## Platform Coverage
| Platform | Status | API Type | Streaming | Authentication | Special Features |
|----------|--------|----------|-----------|----------------|------------------|
| Spotify | Full | Partner API | No (bridged) | OAuth via submodule | Full discography, namespaced IDs |
| SoundCloud | Full | api-v2 | Yes (progressive MP3) | Client ID rotation | Batch hydration (30 IDs), /resolve endpoint |
| Deezer | Full | Public API | No (bridged) | None | Concurrent artist data fetching |
| YouTube Music | Full | Innertube | Yes (7-client fallback) | Cookies for age-restricted | WEB_REMIX metadata, itag priority |
| Yandex Music | Stub | N/A | No | N/A | Placeholder only |
| VK Music | Stub | N/A | No | N/A | Placeholder only |
**Active Platforms**: 4 (Spotify, SoundCloud, Deezer, YouTube Music)
**Stub Platforms**: 2 (Yandex, VK)
## Core Capabilities
### gRPC Service Interface
**Total Methods**: 23 RPC endpoints
**Protocol Buffer**: `bedrock_service.proto` (622 lines)
Method categories (the breakdown below accounts for 20 of the 23 methods):
- **Search**: 4 methods (tracks, albums, artists, playlists)
- **Retrieval**: 4 methods (get track, album, artist, playlist by ID)
- **Streaming**: 1 method (GetStreamURL)
- **Discovery**: 1 method (GetSimilarTracks)
- **Lyrics**: 2 methods (GetLyrics, GetSyncedLyrics)
- **Statistics**: 3 methods (GetTopTracks, GetTopAlbums, GetTopArtists)
- **Import**: 1 method (ImportPlaylist)
- **Health**: 1 method (GetServiceStatus)
- **Authentication**: 3 methods (Register, Login, RefreshToken)
### HTTP Streaming Proxy
**Endpoints**:
- `/stream/{service}/{id}` - Audio stream proxy with range request support
- `/cover/{service}/{id}` - Album art proxy
**Ports**:
- gRPC: `:50052`
- HTTP: `:8080`
Both endpoints support HTTP range requests for seeking and partial content delivery.
## Technology Stack
### Core Dependencies
```
google.golang.org/grpc v1.79.1
google.golang.org/protobuf v1.36.4
github.com/jackc/pgx/v5 v5.7.2
github.com/golang-jwt/jwt/v5 v5.2.1
golang.org/x/crypto (bcrypt)
github.com/joho/godotenv v1.5.1
```
### Provider Libraries
```
github.com/zmb3/spotify/v2 (via spotapi-go submodule)
github.com/kkdai/youtube/v2 v2.10.3
github.com/rhnvrm/lyric-api-go v0.1.4 (Genius)
```
**Submodule**: `spotapi-go` (custom Spotify client wrapper)
### Build Requirements
- Go 1.25 (go.mod specification)
- Git submodules (spotapi-go)
- PostgreSQL 15+ (runtime)
- Protocol buffer compiler (development)
## Architecture Highlights
### Fan-Out Concurrency Pattern
All search and retrieval methods execute parallel goroutines across enabled providers:
```go
var wg sync.WaitGroup
for _, provider := range providers {
wg.Add(1)
go func(p trackProvider) {
defer wg.Done()
results, err := p.SearchTracks(query, limit)
// aggregate results
}(provider)
}
wg.Wait()
```
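Because providers are queried concurrently, total latency tracks the slowest provider rather than the sum of all provider latencies, which keeps responses sub-second even when querying 4+ platforms.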
### Stream Resolution Bridge
**Problem**: Spotify partner API and Deezer public API don't provide streaming URLs.
**Solution**: Three-tier fallback cascade that returns the first successful stream URL:
1. If the requested platform supports streaming (SoundCloud, YouTube Music), resolve the stream natively
2. Otherwise, search SoundCloud for "{artist} - {title}"
3. If SoundCloud fails, search YouTube Music with the same query
**Implementation**: `bedrock_server/resolver.go`
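The fallback cascade can be sketched as an ordered list of resolvers tried one after another. The `resolver` signature and all names below are assumptions for illustration and do not reflect the actual resolver code.

```go
package main

import (
	"errors"
	"fmt"
)

// resolver returns a playable URL for a query, or an error.
// The signature is an assumption for this sketch.
type resolver func(query string) (string, error)

var errNoStream = errors.New("no playable stream found")

// resolveStream tries each tier in order and returns the first URL found.
func resolveStream(query string, tiers ...resolver) (string, error) {
	for _, tier := range tiers {
		if url, err := tier(query); err == nil && url != "" {
			return url, nil
		}
		// A failed tier is not fatal; fall through to the next one.
	}
	return "", errNoStream
}

// fixed and failing are test doubles standing in for real providers.
func fixed(url string) resolver {
	return func(string) (string, error) { return url, nil }
}

func failing() resolver {
	return func(string) (string, error) { return "", errors.New("provider unavailable") }
}

func main() {
	url, err := resolveStream("Artist - Title",
		failing(), // e.g. the native platform cannot stream
		fixed("https://example.test/soundcloud-stream"),
		fixed("https://example.test/youtube-stream"),
	)
	fmt.Println(url, err)
}
```

The same ordered-fallback loop would also fit a sequential client pool, since each client can be modeled as one more tier.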
### YouTube Music 7-Client Fallback Pool
YouTube Music streams use a client rotation strategy to maximize success rate:
```
TVHTML5_SIMPLY_EMBEDDED (primary)
TVHTML5
ANDROID_VR (variant 1)
ANDROID_VR (variant 2)
ANDROID
IOS
WEB
```
Each client has different capabilities and restrictions. The service tries clients sequentially until a valid stream URL is obtained. Ciphered streams fall back to SoundCloud.
### ID Namespacing
All entity IDs use platform prefixes to avoid collisions:
```
spotify:track:3n3Ppam7vgaVa1iaRUc9Lp
soundcloud:track:1234567890
deezer:album:302127
youtube:video:dQw4w9WgXcQ
```
Format: `{platform}:{entity_type}:{native_id}`
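A minimal sketch of parsing and formatting IDs in this format. The `entityID` type and `parseEntityID` function are hypothetical names, not taken from the Bedrock-API source.

```go
package main

import (
	"fmt"
	"strings"
)

// entityID is the parsed form of "{platform}:{entity_type}:{native_id}".
type entityID struct {
	Platform string
	Kind     string
	NativeID string
}

// parseEntityID splits a namespaced ID into its three parts.
// SplitN with a limit of 3 keeps any colons inside the native ID intact.
func parseEntityID(s string) (entityID, error) {
	parts := strings.SplitN(s, ":", 3)
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return entityID{}, fmt.Errorf("malformed entity ID %q", s)
	}
	return entityID{Platform: parts[0], Kind: parts[1], NativeID: parts[2]}, nil
}

// String formats the ID back into its namespaced form.
func (id entityID) String() string {
	return id.Platform + ":" + id.Kind + ":" + id.NativeID
}

func main() {
	id, err := parseEntityID("spotify:track:3n3Ppam7vgaVa1iaRUc9Lp")
	fmt.Println(id.Platform, id.Kind, id.NativeID, err)
}
```

Parsing the prefix first lets a request be routed to exactly one provider instead of fanning out to all of them.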
## Data Layer
### PostgreSQL Schema
**Single Table**: `users`
```sql
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
role VARCHAR(50) DEFAULT 'user',
is_verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
**Connection**: pgx/v5 with connection pooling
**Migrations**: `db/migrations/` (up/down SQL pairs)
### Caching Strategy
**Current**: No caching implemented
**Planned**: Redis for:
- Play deduplication (30s window)
- Service status cache (5min TTL)
- Stream URL cache (1hr TTL)
## Authentication System
**Token Type**: JWT (HS256)
**Access Token**: 15 minutes
**Refresh Token**: 7 days
**Password Hashing**: bcrypt (cost 10)
**gRPC Interceptor**: Validates JWT on all methods except:
- Register
- Login
- RefreshToken
- GetServiceStatus
**Storage**: User credentials are stored in PostgreSQL; issued tokens are not persisted server-side, so there is no revocation list.
## Lyrics Integration
### LrcLib (Synced Lyrics)
**Endpoint**: `https://lrclib.net/api/get`
**Format**: LRC (timestamped)
**Timeout**: 5 seconds
**Matching**: Artist + title + album + duration
### Genius (Plain Lyrics)
**Authentication**: `GENIUS_ACCESS_TOKEN` environment variable
**Features**: Plain text lyrics + annotations
**Library**: `github.com/rhnvrm/lyric-api-go`
Both services are queried in parallel when lyrics are requested. Synced lyrics take priority if available.
## Configuration Management
### Environment Variables
**Required**:
```
DATABASE_URL=postgresql://user:pass@localhost:5432/bedrock
JWT_SECRET=your-secret-key
```
**Optional Platform Credentials**:
```
SPOTIFY_CLIENT_ID
SPOTIFY_CLIENT_SECRET
SOUNDCLOUD_CLIENT_IDS=id1,id2,id3
DEEZER_APP_ID
YOUTUBE_COOKIES=cookie-string
GENIUS_ACCESS_TOKEN
```
**Search Locations**:
1. Current working directory
2. `bedrock_server/` directory
3. Parent directory
**Loader**: `github.com/joho/godotenv`
### CLI Flags
```
-port int gRPC server port (default 50052)
-proxy-addr string HTTP proxy address (default :8080)
-proxy-host string HTTP proxy host for URL generation
```
## File Structure
```
bedrock-api/
├── bedrock_server/
│ ├── main.go (1329 lines - service implementation)
│ ├── resolver.go (stream resolution logic)
│ ├── proxy.go (HTTP streaming proxy)
│ ├── auth.go (JWT + bcrypt)
│ ├── lrclib.go (synced lyrics)
│ └── genius.go (plain lyrics)
├── providers/
│ ├── spotify.go (partner API adapter)
│ ├── soundcloud.go (api-v2 adapter)
│ ├── deezer.go (public API adapter)
│ ├── youtube.go (Innertube adapter)
│ ├── yandex.go (stub)
│ └── vk.go (stub)
├── store/
│ └── user.go (PostgreSQL user operations)
├── db/
│ └── migrations/ (SQL migration files)
├── tests/
│ ├── auth_test.go
│ ├── spotify_test.go
│ ├── soundcloud_test.go
│ ├── youtube_test.go
│ ├── deezer_test.go
│ └── lyrics_test.go
├── proto/
│ └── bedrock_service.proto
├── Dockerfile
├── docker-compose.yml
└── go.mod
```
**Total Service Code**: 3000+ lines (main.go + providers + auth + lyrics)
**Protocol Definition**: 622 lines
**Test Coverage**: 6 integration test files
## Deployment Options
### Docker
**Multi-stage Build**:
- Builder: `golang:1.23-alpine`
- Runtime: `alpine:latest`
- Exposed Ports: `50052`, `8080`
**Note**: Dockerfile uses Go 1.23, but go.mod specifies 1.25 (version mismatch).
### Docker Compose
**Services**:
- PostgreSQL 15-alpine only
- No Redis (planned)
- No reverse proxy (TLS must be added externally)
### Local Development
```bash
git clone https://github.com/feralbureau/bedrock-api
cd bedrock-api
git submodule update --init --recursive
cp .env.example .env
# Configure .env with credentials
go run ./bedrock_server
```
**Submodule Requirement**: `spotapi-go` must be initialized before build.
## CI/CD Pipeline
### GitHub Actions Workflows
**test.yml**:
- Runs on: push, pull_request
- Go version: 1.24
- Services: PostgreSQL 15
- Steps: Submodule init, integration tests with provider secrets
- Timeout: 120 seconds per test
**lint.yml**:
- golangci-lint (standard Go linting)
- Custom comment linter (enforces no decorative comments, no uppercase-leading comments)
**Secrets Required**:
- `SPOTIFY_CLIENT_ID`
- `SPOTIFY_CLIENT_SECRET`
- `SOUNDCLOUD_CLIENT_IDS`
- `GENIUS_ACCESS_TOKEN`
- `YOUTUBE_COOKIES`
## Observability
### Logging
**Implementation**: Go stdlib `log.Printf`
**Format**: `[provider] message` prefix pattern
**Levels**: No structured levels (info/warn/error mixed)
### Monitoring
**Current**: None
**Missing**:
- Prometheus metrics
- APM/tracing
- Structured logging (JSON)
- Error tracking (Sentry, etc.)
### Health Checks
**Endpoint**: `GetServiceStatus` RPC
**Implementation**: Stub (always returns OK)
**Planned**: Per-provider health checks with latency measurement
## Performance Characteristics
### Concurrency Model
- Goroutine per provider for all search/retrieval operations
- `sync.WaitGroup` for coordination
- No rate limiting (relies on provider-level throttling)
- No circuit breakers (failures are logged, partial responses returned)
### Response Patterns
**Partial Response Strategy**: If 2 of 4 providers fail, the service returns results from the 2 successful providers with `ResponseStatus: PARTIAL` and a `ProviderError[]` array listing the failures.
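A sketch of how the overall status could be derived from per-provider outcomes. Only `PARTIAL` is named in the text above; the `OK` and `FAILED` values, the struct fields, and the function name are assumptions for illustration.

```go
package main

import "fmt"

type status string

const (
	statusOK      status = "OK"      // assumed name
	statusPartial status = "PARTIAL" // named in the description above
	statusFailed  status = "FAILED"  // assumed name
)

// providerError records which provider failed and why.
type providerError struct {
	Provider string
	Message  string
}

// summarize derives the overall status from the number of providers
// queried and the failures collected during the fan-out.
func summarize(total int, failures []providerError) status {
	switch {
	case len(failures) == 0:
		return statusOK
	case len(failures) < total:
		return statusPartial
	default:
		return statusFailed
	}
}

func main() {
	failures := []providerError{
		{Provider: "deezer", Message: "timeout"},
		{Provider: "youtube", Message: "HTTP 429"},
	}
	fmt.Println(summarize(4, failures))
}
```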
**Timeout Handling**: No global timeout (relies on HTTP client defaults and provider-specific timeouts like LrcLib 5s).
## Security Posture
### Authentication
- JWT tokens (HS256, not RS256 public/private key)
- bcrypt password hashing (cost 10)
- No rate limiting on auth endpoints
- No account lockout after failed attempts
- No email verification enforcement (is_verified field exists but unused)
### Transport Security
- No built-in TLS (requires reverse proxy like nginx/Caddy)
- gRPC without TLS (insecure credentials)
- HTTP proxy without HTTPS
### Secrets Management
- Environment variables only
- No secrets rotation
- Client IDs/tokens in plaintext .env files
- No vault integration
## Unique Features
1. **Cross-Platform Stream Resolution**: Automatically bridges non-streaming platforms (Spotify, Deezer) to streaming platforms (SoundCloud, YouTube Music)
2. **YouTube 7-Client Fallback**: Maximizes stream availability by rotating through 7 different YouTube client types
3. **SoundCloud Client ID Rotation**: Handles rate limiting by cycling through multiple client IDs
4. **Dual Lyrics Sources**: Combines synced (LrcLib) and annotated (Genius) lyrics
5. **Namespaced ID System**: Platform-prefixed IDs prevent collisions and enable explicit routing
6. **Partial Response Model**: Returns successful provider results even when some providers fail
## Limitations
1. **Incomplete Platform Coverage**: Yandex and VK are stubs only
2. **No Caching**: Every request hits provider APIs (high latency, rate limit risk)
3. **Minimal Database Schema**: Only user authentication, no metadata persistence
4. **No Observability**: Missing metrics, tracing, structured logging
5. **Security Gaps**: No TLS, no rate limiting, no account security features
6. **Version Mismatch**: go.mod (1.25) vs Dockerfile (1.23) vs CI workflow (1.24)
7. **Submodule Dependency**: Custom spotapi-go fork creates maintenance burden
## Use Cases
### Primary
- Multi-platform music search aggregation
- Stream URL resolution for non-streaming APIs
- Unified metadata retrieval across platforms
- Lyrics lookup with sync support
### Secondary
- Playlist import/export across platforms
- Artist/album discovery with similar tracks
- Top charts aggregation
- Music recommendation engine backend
## Integration Considerations
**For Metadata Aggregator Project**:
- Provider adapter pattern is directly applicable
- Fan-out concurrency model can be adopted
- Partial response handling is valuable for resilience
- ID namespacing prevents collision issues
- Stream resolution bridge concept is novel but out of scope for pure metadata
- gRPC interface requires client generation (protobuf compilation)
**Reusable Patterns**:
- `trackProvider` interface design
- Parallel goroutine search with WaitGroup
- Error aggregation in partial responses
- Platform-specific adapter isolation
**Not Applicable**:
- Streaming focus (metadata aggregator doesn't need stream URLs)
- JWT auth (different auth requirements)
- Minimal database schema (metadata needs richer storage)