metadata-agregator/docs/research/bedrock-api/analysis/EVALUATION.md
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

Bedrock-API Evaluation

Executive Summary

Bedrock-API is a music metadata and streaming aggregation service built in Go 1.25 with gRPC and HTTP interfaces. The project demonstrates strong architectural patterns (provider abstraction, fan-out concurrency, partial response handling) but lacks production-readiness features (caching, monitoring, comprehensive testing, security hardening).

Primary Value: Cross-platform stream resolution (bridges non-streaming APIs like Spotify to streaming platforms like SoundCloud/YouTube Music).

Target Use Case: Unified music search and streaming across multiple platforms.

Maturity Level: Early production (functional but missing observability, caching, and security features).

Strengths

1. Clean Provider Abstraction

Pattern: Implicit trackProvider interface isolates platform-specific logic

Benefits:

  • Easy to add new providers (implement interface)
  • Platform failures don't affect other providers
  • Testable in isolation (mock providers)

Example:

type trackProvider interface {
    Name() string
    SearchTracks(ctx context.Context, query string, limit int32) ([]*pb.Track, error)
    GetStreamURL(ctx context.Context, id string) (string, error)
    // ... other methods
}

Applicability to Metadata Aggregator: Directly applicable. Same pattern can be used for metadata providers (Discogs, MusicBrainz, Last.fm, etc.).
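The "testable in isolation" benefit can be illustrated with a mock provider. This is a minimal sketch, not Bedrock-API code: Track is a hypothetical stand-in for the generated pb.Track message, and metadataProvider, mockProvider, and countResults are names invented for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Track is a hypothetical stand-in for the generated pb.Track message.
type Track struct {
	ID    string
	Title string
}

// metadataProvider mirrors the shape of trackProvider, reduced to two methods.
type metadataProvider interface {
	Name() string
	SearchTracks(ctx context.Context, query string, limit int32) ([]*Track, error)
}

// mockProvider returns canned results or a canned error, so callers can be
// tested without network access or provider credentials.
type mockProvider struct {
	name   string
	tracks []*Track
	err    error
}

func (m *mockProvider) Name() string { return m.name }

func (m *mockProvider) SearchTracks(ctx context.Context, query string, limit int32) ([]*Track, error) {
	if m.err != nil {
		return nil, m.err
	}
	if limit >= 0 && int(limit) < len(m.tracks) {
		return m.tracks[:limit], nil
	}
	return m.tracks, nil
}

// countResults is a tiny caller that depends only on the interface.
func countResults(p metadataProvider, query string, limit int32) (int, error) {
	tracks, err := p.SearchTracks(context.Background(), query, limit)
	return len(tracks), err
}

func main() {
	ok := &mockProvider{name: "mock", tracks: []*Track{{ID: "mock:track:1"}, {ID: "mock:track:2"}}}
	n, err := countResults(ok, "anything", 1)
	fmt.Println(n, err)

	failing := &mockProvider{name: "down", err: errors.New("provider unavailable")}
	n, err = countResults(failing, "anything", 10)
	fmt.Println(n, err)
}
```

Because mockProvider satisfies the interface implicitly, the same caller code runs unchanged against real providers.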

2. Fan-Out Concurrency

Pattern: Parallel goroutines per provider with WaitGroup coordination

Benefits:

  • Response time = slowest provider (not sum of all)
  • Typical search: 200-500ms (4 providers in parallel)
  • Latency stays roughly flat as providers are added (bounded by the slowest provider, not the provider count)

Example:

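var wg sync.WaitGroup
var mu sync.Mutex // guards allTracks
for _, provider := range providers {
    wg.Add(1)
    go func(p trackProvider) {
        defer wg.Done()
        results, err := p.SearchTracks(ctx, query, limit)
        if err != nil {
            return // error recording elided; see Partial Response Handling
        }
        // Aggregate results under the mutex; goroutines append concurrently
        mu.Lock()
        allTracks = append(allTracks, results...)
        mu.Unlock()
    }(provider)
}
wg.Wait()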

Applicability to Metadata Aggregator: Directly applicable. Metadata queries can be parallelized across providers.

3. Partial Response Handling

Pattern: Return successful results even if some providers fail

Benefits:

  • Resilient to individual provider failures
  • Degraded service instead of complete failure
  • Client can decide how to handle partial results

Example:

if len(errors) > 0 {
    if len(allTracks) == 0 {
        status = pb.ResponseStatus_ERROR
    } else {
        status = pb.ResponseStatus_PARTIAL
    }
}

return &pb.SearchTracksResponse{
    Tracks: allTracks,
    Status: status,
    Errors: errors, // Per-provider error details
}

Applicability to Metadata Aggregator: Directly applicable. Metadata aggregation should be resilient to individual provider failures.

4. Cross-Platform Stream Resolution

Pattern: Bridge non-streaming platforms to streaming platforms

Algorithm:

  1. Check if platform supports streaming (SoundCloud, YouTube Music)
  2. If not, search SoundCloud for matching track
  3. If SoundCloud fails, search YouTube Music
  4. Return first successful stream URL

Benefits:

  • Unified streaming interface (even for non-streaming APIs)
  • Automatic fallback chain
  • Transparent to client

Applicability to Metadata Aggregator: Not directly applicable (metadata aggregator doesn't need streaming). However, the fallback pattern is useful for metadata resolution (try provider A, fallback to provider B).
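The fallback idea carries over to metadata lookups roughly as follows. This is a sketch under assumptions: lookupFunc, firstSuccessful, failingLookup, and staticLookup are hypothetical names, and the lookups return plain strings rather than real metadata types.

```go
package main

import (
	"errors"
	"fmt"
)

// lookupFunc is a hypothetical per-provider lookup: it returns a value or an error.
type lookupFunc func(id string) (string, error)

// firstSuccessful tries each lookup in order and returns the first success.
// If every lookup fails, it returns all errors joined together.
func firstSuccessful(id string, lookups ...lookupFunc) (string, error) {
	var errs []error
	for _, lookup := range lookups {
		v, err := lookup(id)
		if err == nil {
			return v, nil
		}
		errs = append(errs, err)
	}
	if len(errs) == 0 {
		return "", errors.New("no providers configured")
	}
	return "", errors.Join(errs...)
}

// failingLookup and staticLookup are stand-ins for real provider calls.
func failingLookup(id string) (string, error) {
	return "", fmt.Errorf("provider A: no match for %s", id)
}

func staticLookup(id string) (string, error) {
	return "title for " + id, nil
}

func main() {
	v, err := firstSuccessful("example-id", failingLookup, staticLookup)
	fmt.Println(v, err)
}
```

Keeping every error (instead of only the last one) makes it easier to report which providers were tried when the whole chain fails.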

5. YouTube 7-Client Fallback

Pattern: Rotate through 7 different YouTube client types to maximize stream availability

Clients:

  • TVHTML5_SIMPLY_EMBEDDED (primary)
  • TVHTML5
  • ANDROID_VR (2 variants)
  • ANDROID
  • IOS
  • WEB

Benefits:

  • Maximizes success rate (different clients have different capabilities)
  • Avoids ciphered streams (encrypted, require decryption)
  • Handles geo-restrictions

Applicability to Metadata Aggregator: Pattern is applicable for providers with multiple API endpoints or client types.

6. ID Namespacing

Pattern: Platform-prefixed IDs ({platform}:{type}:{native_id})

Examples:

  • spotify:track:3n3Ppam7vgaVa1iaRUc9Lp
  • soundcloud:track:1234567890
  • deezer:album:302127

Benefits:

  • Prevents ID collisions across platforms
  • Explicit routing (no lookup required)
  • Self-documenting (ID reveals source platform)

Applicability to Metadata Aggregator: Directly applicable. Metadata IDs should be namespaced to prevent collisions.
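A namespaced ID can be routed with a small parser. The sketch below assumes the {platform}:{type}:{native_id} layout shown above; parseID is a hypothetical helper, not a function from Bedrock-API.

```go
package main

import (
	"fmt"
	"strings"
)

// parseID splits a namespaced ID into its three parts.
// SplitN with n=3 keeps any further colons inside the native ID.
func parseID(id string) (platform, kind, native string, err error) {
	parts := strings.SplitN(id, ":", 3)
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", "", "", fmt.Errorf("malformed ID %q: want platform:type:native_id", id)
	}
	return parts[0], parts[1], parts[2], nil
}

func main() {
	platform, kind, native, err := parseID("deezer:album:302127")
	fmt.Println(platform, kind, native, err)

	_, _, _, err = parseID("302127")
	fmt.Println(err)
}
```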

7. gRPC for Performance

Benefits:

  • HTTP/2 multiplexing (multiple requests over single connection)
  • Binary protocol (smaller payloads than JSON)
  • Streaming support (future use)
  • Strong typing (protobuf)

Tradeoffs:

  • Requires client code generation
  • Less human-readable than REST/JSON
  • Tooling less mature than REST

Applicability to Metadata Aggregator: Consider gRPC for internal services, REST for public API.

8. JWT Authentication

Implementation: HS256 tokens with bcrypt password hashing

Benefits:

  • Stateless authentication (no session storage)
  • Token expiration (15min access, 7 day refresh)
  • Secure password storage (bcrypt cost 10)

Limitations:

  • No token revocation
  • No refresh token rotation
  • Single shared secret (HS256)

Applicability to Metadata Aggregator: JWT is suitable, but consider RS256 (asymmetric) for better security.

9. SoundCloud Client ID Rotation

Pattern: Rotate through multiple client IDs to avoid rate limits

Implementation:

func (p *SoundCloudProvider) getClientID() string {
    p.mu.Lock()
    defer p.mu.Unlock()
    
    id := p.clientIDs[p.currentID]
    p.currentID = (p.currentID + 1) % len(p.clientIDs)
    
    return id
}

Benefits:

  • Increases effective rate limit (4 IDs = 4x limit)
  • Automatic rotation (no manual intervention)

Applicability to Metadata Aggregator: Applicable for providers with rate limits (rotate API keys).

10. Batch Hydration (SoundCloud)

Pattern: Fetch details for multiple IDs in single request

Implementation: SoundCloud allows up to 30 IDs per request

Benefits:

  • Reduces API calls (up to 30x fewer calls when hydrating playlists)
  • Faster response times
  • Lower rate limit consumption

Applicability to Metadata Aggregator: Applicable for providers that support batch requests (MusicBrainz, Discogs).
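Batching reduces to splitting the ID list into fixed-size chunks and issuing one request per chunk. A minimal sketch, with chunk as a hypothetical helper and the batch size of 30 taken from the SoundCloud limit above:

```go
package main

import "fmt"

// chunk splits ids into consecutive batches of at most size elements.
// The batches share the backing array of ids; they are not copies.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	ids := make([]string, 65)
	for i := range ids {
		ids[i] = fmt.Sprintf("soundcloud:track:%d", i)
	}
	for i, batch := range chunk(ids, 30) {
		// One provider request per batch instead of one per ID.
		fmt.Printf("batch %d: %d IDs\n", i, len(batch))
	}
}
```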

Weaknesses

1. No Caching

Impact:

  • High latency (200-500ms per search)
  • Provider API rate limits are reached sooner (every request hits the providers)
  • Unnecessary API quota consumption
  • No offline capability

Recommendation: Implement Redis caching

Cache Strategy:

  • Track metadata: 1 hour TTL
  • Search results: 5 minutes TTL
  • Stream URLs: 1 hour TTL at most (the URLs themselves expire after 1-6 hours, so the TTL must stay below the shortest lifetime)
  • Lyrics: 24 hours TTL (rarely change)

Applicability to Metadata Aggregator: Critical. Metadata aggregator must cache to avoid repeated API calls.
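The TTL strategy can be sketched with an in-process cache. This is an illustration only: the recommendation above is Redis, and ttlCache is a hypothetical in-memory stand-in that shows the expiry logic. The TTL constants mirror the cache strategy listed above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TTLs taken from the cache strategy above.
const (
	trackTTL  = time.Hour
	searchTTL = 5 * time.Minute
	lyricsTTL = 24 * time.Hour
)

type entry struct {
	value   string
	expires time.Time
}

// ttlCache is a hypothetical in-memory stand-in for a Redis cache.
type ttlCache struct {
	mu    sync.Mutex
	items map[string]entry
}

func newTTLCache() *ttlCache {
	return &ttlCache{items: make(map[string]entry)}
}

// Set stores value under key until ttl has elapsed.
func (c *ttlCache) Set(key, value string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = entry{value: value, expires: time.Now().Add(ttl)}
}

// Get returns the cached value, or false if it is missing or expired.
func (c *ttlCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.items[key]
	if !ok {
		return "", false
	}
	if !time.Now().Before(e.expires) {
		delete(c.items, key) // lazily evict expired entries
		return "", false
	}
	return e.value, true
}

func main() {
	cache := newTTLCache()
	cache.Set("search:daft punk", "cached search result", searchTTL)
	cache.Set("track:deezer:track:1", "cached track", trackTTL)
	cache.Set("lyrics:deezer:track:1", "cached lyrics", lyricsTTL)

	v, ok := cache.Get("search:daft punk")
	fmt.Println(v, ok)
	_, ok = cache.Get("search:unknown")
	fmt.Println(ok)
}
```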

2. Minimal Database Schema

Current: Single users table (authentication only)

Missing:

  • No metadata persistence (tracks, albums, artists)
  • No user data (favorites, playlists, history)
  • No analytics (play counts, search trends)

Impact:

  • All data is ephemeral (fetched from providers every time)
  • No historical data
  • No offline access
  • No data ownership

Applicability to Metadata Aggregator: Metadata aggregator needs rich schema for metadata persistence.

3. No Monitoring

Missing:

  • Prometheus metrics (request rate, error rate, latency)
  • Grafana dashboards
  • Distributed tracing (Jaeger)
  • Log aggregation (Loki)

Impact:

  • No visibility into performance
  • No alerting on failures
  • Difficult to debug production issues

Recommendation: Implement full observability stack

Applicability to Metadata Aggregator: Critical for production. Monitoring is essential.

4. No Rate Limiting

Missing:

  • Per-user rate limiting
  • Per-IP rate limiting
  • Provider-level rate limiting

Impact:

  • Abuse possible (unlimited requests)
  • Provider API rate limits can be exceeded
  • No protection against DDoS

Recommendation: Implement rate limiting

Example:

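import (
    "sync"

    "golang.org/x/time/rate"
)

var (
    mu       sync.Mutex // guards limiters
    limiters = make(map[string]*rate.Limiter)
)

// getLimiter returns the limiter for userID, creating it on first use.
func getLimiter(userID string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()

    limiter, exists := limiters[userID]
    if !exists {
        limiter = rate.NewLimiter(rate.Limit(10), 10) // 10 req/sec, burst of 10
        limiters[userID] = limiter
    }
    return limiter
}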

Applicability to Metadata Aggregator: Critical. Rate limiting prevents abuse and protects provider APIs.

5. Stub Providers (Yandex, VK)

Status: Placeholder only, no implementation

Impact:

  • Incomplete platform coverage
  • Misleading (listed as supported but not functional)

Recommendation: Remove stubs or implement fully

Applicability to Metadata Aggregator: Don't list providers as supported unless fully implemented.

6. No TLS

Current: gRPC and HTTP without TLS

Impact:

  • Credentials transmitted in plaintext
  • JWT tokens exposed
  • Man-in-the-middle attacks possible

Recommendation: Deploy behind reverse proxy with TLS termination

Applicability to Metadata Aggregator: TLS is mandatory for production.

7. Go Version Mismatch

Issue: go.mod specifies 1.25, Dockerfile uses 1.23

Impact:

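  • Build failures or an unplanned toolchain download (since Go 1.21 the go directive in go.mod is a minimum version requirement, whether or not 1.25 features are used)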
  • Inconsistent builds

Fix:

FROM golang:1.25-alpine AS builder

Applicability to Metadata Aggregator: Keep build environment in sync with go.mod.

8. Custom Submodule Dependency

Issue: spotapi-go is a custom fork, not an official library

Impact:

  • Maintenance burden
  • Submodule initialization required
  • Potential security issues (unmaintained fork)

Recommendation: Use the official library directly

Applicability to Metadata Aggregator: Avoid custom forks. Use official libraries or vendor dependencies.

9. No Unit Tests

Current: Integration tests only (require running server and providers)

Missing:

  • Provider adapter unit tests (mocked HTTP responses)
  • Database store unit tests (mocked database)
  • Authentication unit tests (mocked JWT)

Impact:

  • Slow test execution
  • Difficult to test edge cases
  • Requires provider credentials for testing

Recommendation: Add unit tests with mocks

Applicability to Metadata Aggregator: Unit tests are essential for fast feedback and edge case coverage.

10. Health Check Stub

Current: GetServiceStatus always returns healthy

Impact:

  • No actual health monitoring
  • Kubernetes probes don't detect failures
  • No dependency health visibility

Recommendation: Implement real health checks

Applicability to Metadata Aggregator: Health checks are critical for orchestration (Kubernetes, Docker Swarm).
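A real health check could probe each dependency with a timeout and report per-dependency status. The sketch below is an assumption about how that might look, not Bedrock-API's GetServiceStatus: probe, report, checkHealth, alwaysOK, and alwaysDown are hypothetical names, and the 2 second timeout is arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// probeTimeout bounds each dependency check (arbitrary value for the sketch).
const probeTimeout = 2 * time.Second

// probe is a hypothetical dependency check, e.g. a database ping.
type probe func(ctx context.Context) error

// report is the aggregated result returned to the orchestrator.
type report struct {
	Healthy bool
	Details map[string]string // dependency name -> "ok" or error text
}

// checkHealth runs every probe and marks the service unhealthy if any fails.
// A real handler would derive the context from the incoming request.
func checkHealth(probes map[string]probe) report {
	r := report{Healthy: true, Details: make(map[string]string, len(probes))}
	for name, p := range probes {
		ctx, cancel := context.WithTimeout(context.Background(), probeTimeout)
		err := p(ctx)
		cancel()
		if err != nil {
			r.Healthy = false
			r.Details[name] = err.Error()
			continue
		}
		r.Details[name] = "ok"
	}
	return r
}

// alwaysOK and alwaysDown stand in for real checks such as a database ping.
func alwaysOK(ctx context.Context) error   { return nil }
func alwaysDown(ctx context.Context) error { return errors.New("connection refused") }

func main() {
	r := checkHealth(map[string]probe{
		"postgres":    alwaysOK,
		"musicbrainz": alwaysDown,
	})
	fmt.Println(r.Healthy, r.Details)
}
```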

11. No Pagination

Current: Search results limited by limit parameter (max 50)

Impact:

  • Large result sets cannot be retrieved incrementally
  • No cursor-based pagination
  • No total count

Recommendation: Add pagination

Example:

message SearchRequest {
    string query = 1;
    int32 limit = 2;
    string cursor = 3; // Pagination cursor
}

message SearchTracksResponse {
    repeated Track tracks = 1;
    string next_cursor = 2; // Next page cursor
    int32 total = 3; // Total result count
}

Applicability to Metadata Aggregator: Pagination is essential for large result sets.
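On the server side, the cursor can be an opaque token that the client echoes back. The sketch below encodes a plain offset, which is the simplest possible scheme and an assumption of this example; a production cursor would more likely encode a stable sort key. encodeCursor, decodeCursor, and page are hypothetical helpers.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
)

// encodeCursor hides the offset behind an opaque, URL-safe token.
func encodeCursor(offset int) string {
	return base64.RawURLEncoding.EncodeToString([]byte(strconv.Itoa(offset)))
}

// decodeCursor reverses encodeCursor; an empty cursor means the first page.
func decodeCursor(cursor string) (int, error) {
	if cursor == "" {
		return 0, nil
	}
	raw, err := base64.RawURLEncoding.DecodeString(cursor)
	if err != nil {
		return 0, fmt.Errorf("invalid cursor: %w", err)
	}
	offset, err := strconv.Atoi(string(raw))
	if err != nil || offset < 0 {
		return 0, errors.New("invalid cursor")
	}
	return offset, nil
}

// page returns up to limit items starting at the cursor, plus the cursor for
// the next page. An empty next cursor means there are no more results.
func page(items []string, cursor string, limit int) ([]string, string, error) {
	if limit <= 0 {
		return nil, "", errors.New("limit must be positive")
	}
	offset, err := decodeCursor(cursor)
	if err != nil {
		return nil, "", err
	}
	if offset > len(items) {
		offset = len(items)
	}
	end := offset + limit
	if end > len(items) {
		end = len(items)
	}
	next := ""
	if end < len(items) {
		next = encodeCursor(end)
	}
	return items[offset:end], next, nil
}

func main() {
	items := []string{"a", "b", "c", "d", "e"}
	cursor := ""
	for {
		batch, next, err := page(items, cursor, 2)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(batch)
		if next == "" {
			return
		}
		cursor = next
	}
}
```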

12. No API Versioning

Current: No version in package name or endpoint

Impact:

  • Breaking changes affect all clients
  • No backward compatibility
  • No deprecation path

Recommendation: Add versioning

Example:

package bedrock.v1;

service BedrockService {
    // ...
}

Applicability to Metadata Aggregator: API versioning is critical for backward compatibility.

Integration Complexity

Provider Integration Effort

| Provider | Complexity | Reason |
|---|---|---|
| Spotify | Medium | OAuth 2.0, submodule dependency |
| SoundCloud | Low | Simple HTTP API, client ID rotation |
| Deezer | Low | Public API, no auth |
| YouTube Music | High | Undocumented Innertube API, 7-client fallback, cipher handling |
| Yandex | Unknown | Not implemented |
| VK | Unknown | Not implemented |

Easiest: Deezer (public API, no auth)
Hardest: YouTube Music (undocumented API, complex fallback logic)

Client Integration Effort

gRPC Clients: Requires protobuf compilation

Steps:

  1. Install protoc compiler
  2. Install language-specific protobuf plugin
  3. Generate client code from .proto file
  4. Implement authentication (JWT in metadata)

Example (Go):

protoc --go_out=. --go-grpc_out=. bedrock_service.proto

Example (Python):

python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. bedrock_service.proto

Complexity: Medium (requires tooling setup)

Alternative: Provide pre-generated clients for popular languages

Performance Analysis

Latency Breakdown

Typical Search Request (4 providers):

| Component | Latency | Notes |
|---|---|---|
| gRPC overhead | 1-5ms | Minimal |
| Authentication | 1-2ms | JWT validation |
| Provider queries (parallel) | 200-500ms | Slowest provider wins |
| Response aggregation | 1-5ms | Mutex-protected append |
| Total | 200-510ms | Dominated by provider latency |

Optimization Opportunities:

  • Cache metadata (reduce provider calls)
  • Implement timeouts (don't wait for slow providers)
  • Add circuit breakers (skip failing providers)
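The timeout item can be sketched by bounding each provider call with its own deadline, so one slow provider cannot hold up the whole response. Everything here is hypothetical: providerCall, callWithTimeout, searchBounded, the two stand-in providers, and the 50 ms timeout are chosen only for the demonstration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// providerTimeout is deliberately short so the demo finishes quickly.
const providerTimeout = 50 * time.Millisecond

// providerCall is a hypothetical provider query returning track IDs.
type providerCall func(ctx context.Context) ([]string, error)

type result struct {
	tracks []string
	err    error
}

// callWithTimeout runs call and gives up once timeout has elapsed, even if
// the provider ignores context cancellation.
func callWithTimeout(parent context.Context, timeout time.Duration, call providerCall) ([]string, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	ch := make(chan result, 1) // buffered so the goroutine never blocks on send
	go func() {
		tracks, err := call(ctx)
		ch <- result{tracks: tracks, err: err}
	}()

	select {
	case r := <-ch:
		return r.tracks, r.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// fastProvider and slowProvider stand in for real provider calls.
func fastProvider(ctx context.Context) ([]string, error) {
	return []string{"deezer:track:1"}, nil
}

func slowProvider(ctx context.Context) ([]string, error) {
	select {
	case <-time.After(500 * time.Millisecond):
		return []string{"slow:track:1"}, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// searchBounded applies the default timeout. A real handler would pass the
// request context instead of context.Background().
func searchBounded(call providerCall) ([]string, error) {
	return callWithTimeout(context.Background(), providerTimeout, call)
}

func main() {
	tracks, err := searchBounded(fastProvider)
	fmt.Println(tracks, err)

	tracks, err = searchBounded(slowProvider)
	fmt.Println(tracks, err)
}
```

Combined with partial response handling, a timed-out provider simply becomes one more entry in the per-provider error list.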

Throughput

Single Instance (no caching):

  • Requests per second: ~10-20 (limited by provider APIs)
  • Concurrent requests: Not capped (goroutine count is unbounded, which is risky under load)

With Caching (Redis):

  • Requests per second: ~1000+ (cache hits)
  • Concurrent requests: Limited by database connections (10 max)

Scaling:

  • Horizontal: Run multiple instances behind load balancer
  • Vertical: Increase CPU/RAM for single instance

Resource Usage

Memory: ~50-100 MB (idle), ~200-500 MB (under load)
CPU: Low (I/O bound, waiting on provider APIs)
Network: High (streaming proxy, provider API calls)

Security Assessment

Authentication

Strengths:

  • JWT tokens (stateless)
  • bcrypt password hashing (secure)
  • gRPC interceptors (centralized auth)

Weaknesses:

  • No token revocation
  • No refresh token rotation
  • Single shared secret (HS256)
  • No rate limiting (brute force possible)
  • No account lockout

Risk Level: Medium

Recommendations:

  • Implement token revocation list (Redis)
  • Use RS256 (asymmetric keys)
  • Add rate limiting on auth endpoints
  • Add account lockout after failed attempts

Transport Security

Strengths: None (no TLS)

Weaknesses:

  • Credentials transmitted in plaintext
  • JWT tokens exposed
  • Man-in-the-middle attacks possible

Risk Level: High

Recommendations:

  • Deploy behind reverse proxy with TLS
  • Use Let's Encrypt for free certificates
  • Enforce HTTPS redirects

Input Validation

Strengths:

  • Parameterized queries (SQL injection safe)
  • Email format validation

Weaknesses:

  • No query length limits
  • No ID format validation
  • No limit parameter bounds

Risk Level: Low (no SQL injection, but potential DoS)

Recommendations:

  • Validate all inputs (length, format, bounds)
  • Sanitize user-provided data
  • Add request size limits
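The length, format, and bounds checks might look like the sketch below. The specific limits and the ID pattern are assumptions for illustration: maxQueryLen is arbitrary, maxLimit follows the max of 50 mentioned under pagination, and idPattern assumes lowercase platform and type segments.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

const (
	maxQueryLen = 256 // hypothetical upper bound on query length, in characters
	maxLimit    = 50  // matches the existing search limit
)

// idPattern assumes the {platform}:{type}:{native_id} layout with a
// conservative character set; real native IDs may need a wider set.
var idPattern = regexp.MustCompile(`^[a-z]+:[a-z]+:[A-Za-z0-9_-]+$`)

// validateSearch checks query length and limit bounds before any provider is called.
func validateSearch(query string, limit int32) error {
	q := strings.TrimSpace(query)
	if q == "" {
		return errors.New("query must not be empty")
	}
	if n := utf8.RuneCountInString(q); n > maxQueryLen {
		return fmt.Errorf("query too long: %d characters, max %d", n, maxQueryLen)
	}
	if limit < 1 || limit > maxLimit {
		return fmt.Errorf("limit %d out of range [1, %d]", limit, maxLimit)
	}
	return nil
}

// validateID rejects IDs that do not match the namespaced format.
func validateID(id string) error {
	if !idPattern.MatchString(id) {
		return fmt.Errorf("invalid ID format: %q", id)
	}
	return nil
}

func main() {
	fmt.Println(validateSearch("daft punk", 20))
	fmt.Println(validateSearch("daft punk", 500))
	fmt.Println(validateID("deezer:album:302127"))
	fmt.Println(validateID("302127"))
}
```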

Secrets Management

Strengths: None (plaintext .env files)

Weaknesses:

  • Secrets in plaintext
  • No rotation
  • No encryption at rest

Risk Level: Medium

Recommendations:

  • Use secrets manager (AWS Secrets Manager, Vault)
  • Rotate secrets periodically
  • Encrypt secrets at rest

Scalability

Vertical Scaling

Current Limits:

  • Database connections: 10 max
  • Goroutines: Unbounded (risky)
  • Memory: ~500 MB under load

Scaling Up:

  • Increase database connection pool
  • Add worker pool (bounded goroutines)
  • Increase instance size (CPU, RAM)
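Bounding goroutines can be done with a buffered channel used as a semaphore. This is a sketch of one common approach, not the project's code: runBounded and peakConcurrency are hypothetical helpers, and the sleep stands in for a provider call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runBounded runs all tasks but never more than maxConcurrent at once.
func runBounded(tasks []func(), maxConcurrent int) {
	if maxConcurrent < 1 {
		maxConcurrent = 1
	}
	sem := make(chan struct{}, maxConcurrent)
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent tasks are running
		go func(t func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the task ends
			t()
		}(task)
	}
	wg.Wait()
}

// peakConcurrency runs n simulated tasks and reports the highest number that
// were observed running at the same time.
func peakConcurrency(n, maxConcurrent int) int {
	var (
		mu      sync.Mutex
		running int
		peak    int
	)
	tasks := make([]func(), n)
	for i := range tasks {
		tasks[i] = func() {
			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			time.Sleep(5 * time.Millisecond) // simulated provider call

			mu.Lock()
			running--
			mu.Unlock()
		}
	}
	runBounded(tasks, maxConcurrent)
	return peak
}

func main() {
	fmt.Println("peak concurrency:", peakConcurrency(20, 4))
}
```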

Max Capacity (single instance): ~100 req/sec (with caching)

Horizontal Scaling

Stateless Design: Yes (JWT tokens, no sessions)

Scaling Out:

  • Run multiple instances behind load balancer
  • Share PostgreSQL database (read replicas for reads)
  • Share Redis cache (cluster mode)

Max Capacity (10 instances): ~1000 req/sec (with caching)

Database Scaling

Current: Single PostgreSQL instance

Scaling Options:

  • Read replicas (for read-heavy workloads)
  • Connection pooler (PgBouncer)
  • Sharding (by user ID)

Bottleneck: The database is not a bottleneck (minimal schema, simple queries)

Maintainability

Code Organization

Strengths:

  • Clean provider abstraction
  • Separation of concerns (providers, store, auth)

Weaknesses:

  • Single 1300+ line file (main.go)
  • No package documentation
  • No API documentation

Recommendation: Split main.go by domain (search, retrieval, streaming, etc.)

Testing

Strengths:

  • Integration tests for all providers
  • GitHub Actions CI/CD

Weaknesses:

  • No unit tests
  • No test coverage reporting
  • No mocks

Recommendation: Add unit tests with mocks, measure coverage

Documentation

Strengths:

  • README with setup instructions
  • .env.example template

Weaknesses:

  • No API documentation (OpenAPI/Swagger)
  • No architecture documentation
  • No deployment guide

Recommendation: Add comprehensive documentation

Dependency Management

Strengths:

  • Go modules (versioned dependencies)
  • Minimal dependencies (8 direct)

Weaknesses:

  • Custom submodule (spotapi-go)
  • No automated updates (Dependabot)

Recommendation: Remove submodule, add Dependabot

Comparison to Metadata Aggregator Requirements

Alignment

| Requirement | Bedrock-API | Metadata Aggregator | Alignment |
|---|---|---|---|
| Multi-provider aggregation | Yes (4 active) | Yes (10+ planned) | High |
| Parallel queries | Yes (goroutines) | Yes | High |
| Partial response handling | Yes | Yes | High |
| Metadata persistence | No | Yes | Low |
| Caching | No | Yes (critical) | Low |
| Rich metadata | Medium | High | Medium |
| Streaming | Yes | No | N/A |
| Authentication | JWT | TBD | Medium |
| Monitoring | No | Yes | Low |
| Testing | Integration only | Unit + Integration | Medium |

Reusable Patterns

Directly Applicable:

  • Provider interface pattern
  • Fan-out concurrency
  • Partial response handling
  • ID namespacing
  • gRPC interceptors

Needs Adaptation:

  • Authentication (add RBAC, token revocation)
  • Database schema (expand for metadata)
  • Caching (add Redis)
  • Monitoring (add Prometheus)

Not Applicable:

  • Stream resolution (metadata aggregator doesn't need streaming)
  • YouTube 7-client fallback (specific to YouTube)

Recommendations for Metadata Aggregator

Adopt

  1. Provider Interface Pattern: Clean abstraction for platform-specific logic
  2. Fan-Out Concurrency: Parallel queries for fast responses
  3. Partial Response Handling: Resilient to individual provider failures
  4. ID Namespacing: Prevent collisions, enable explicit routing
  5. gRPC for Internal Services: Performance benefits for service-to-service communication
  6. JWT Authentication: Stateless, scalable authentication
  7. bcrypt Password Hashing: Secure password storage

Avoid

  1. No Caching: Implement Redis from day one
  2. Minimal Database Schema: Design rich schema for metadata persistence
  3. No Monitoring: Implement Prometheus + Grafana from start
  4. No Rate Limiting: Add rate limiting to prevent abuse
  5. Stub Providers: Only list fully implemented providers
  6. No TLS: Deploy with TLS from start
  7. Custom Submodules: Use official libraries or vendor dependencies
  8. No Unit Tests: Write unit tests with mocks
  9. Single Large File: Split code by domain
  10. No API Versioning: Version API from start

Enhance

  1. Add Caching Layer: Redis for metadata, search results, provider responses
  2. Expand Database Schema: Tables for tracks, albums, artists, labels, genres, etc.
  3. Implement Monitoring: Prometheus metrics, Grafana dashboards, distributed tracing
  4. Add Rate Limiting: Per-user, per-IP, per-provider limits
  5. Implement Health Checks: Real health checks for dependencies
  6. Add Pagination: Cursor-based pagination for large result sets
  7. Add API Versioning: Version API for backward compatibility
  8. Add Comprehensive Testing: Unit tests with mocks, integration tests, E2E tests
  9. Add Documentation: API docs (OpenAPI), architecture docs, deployment guide
  10. Add Security Features: Token revocation, refresh token rotation, RS256, TLS

Final Verdict

Overall Assessment: Good architectural foundation, but lacks production-readiness features.

Strengths: Clean provider abstraction, fan-out concurrency, partial response handling, cross-platform stream resolution.

Weaknesses: No caching, minimal database schema, no monitoring, no rate limiting, no TLS, stub providers.

Maturity Level: Early production (functional but missing critical features).

Recommendation for Metadata Aggregator: Adopt core patterns (provider interface, fan-out concurrency, partial responses, ID namespacing), but enhance with caching, monitoring, comprehensive testing, and security features.

Effort to Adapt: Medium (core patterns are reusable, but significant enhancements needed for production).

Value Proposition: Bedrock-API demonstrates proven patterns for multi-provider aggregation. The metadata aggregator can learn from its strengths (clean abstraction, concurrency, resilience) while avoiding its weaknesses (no caching, minimal schema, no monitoring).