feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
# Bedrock-API Overview

## Project Identity

**Repository**: https://github.com/feralbureau/bedrock-api
**Language**: Go 1.25
**License**: MIT
**Primary Protocols**: gRPC, HTTP
**Database**: PostgreSQL 15
**Entry Point**: `bedrock_server/main.go`

Bedrock-API is a unified music metadata and streaming aggregation service that exposes six music platforms (four implemented, two stubs) through a single gRPC interface. The project's core value proposition is cross-platform stream resolution: when a platform doesn't provide streaming (Spotify partner API, Deezer public API), Bedrock bridges to SoundCloud or YouTube Music to deliver playable URLs.

## Platform Coverage

| Platform | Status | API Type | Streaming | Authentication | Special Features |
|----------|--------|----------|-----------|----------------|------------------|
| Spotify | Full | Partner API | No (bridged) | OAuth via submodule | Full discography, namespaced IDs |
| SoundCloud | Full | api-v2 | Yes (progressive MP3) | Client ID rotation | Batch hydration (30 IDs), /resolve endpoint |
| Deezer | Full | Public API | No (bridged) | None | Concurrent artist data fetching |
| YouTube Music | Full | Innertube | Yes (7-client fallback) | Cookies for age-restricted | WEB_REMIX metadata, itag priority |
| Yandex Music | Stub | N/A | No | N/A | Placeholder only |
| VK Music | Stub | N/A | No | N/A | Placeholder only |

**Active Platforms**: 4 (Spotify, SoundCloud, Deezer, YouTube Music)
**Stub Platforms**: 2 (Yandex, VK)

## Core Capabilities

### gRPC Service Interface

**Total Methods**: 23 RPC endpoints
**Protocol Buffer**: `bedrock_service.proto` (622 lines)

Method categories (the counts below account for 20 of the 23 methods):
- **Search**: 4 methods (tracks, albums, artists, playlists)
- **Retrieval**: 4 methods (get track, album, artist, playlist by ID)
- **Streaming**: 1 method (GetStreamURL)
- **Discovery**: 1 method (GetSimilarTracks)
- **Lyrics**: 2 methods (GetLyrics, GetSyncedLyrics)
- **Statistics**: 3 methods (GetTopTracks, GetTopAlbums, GetTopArtists)
- **Import**: 1 method (ImportPlaylist)
- **Health**: 1 method (GetServiceStatus)
- **Authentication**: 3 methods (Register, Login, RefreshToken)

### HTTP Streaming Proxy

**Endpoints**:
- `/stream/{service}/{id}` - Audio stream proxy with range request support
- `/cover/{service}/{id}` - Album art proxy

**Ports**:
- gRPC: `:50052`
- HTTP: `:8080`

Both endpoints support HTTP range requests for seeking and partial content delivery.

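To illustrate what range support means for a client, here is a minimal sketch using only the Go standard library. It is not the project's proxy: `serveAudio` and `fetchRange` are hypothetical names, and the handler serves an in-memory payload instead of forwarding to an upstream provider. `http.ServeContent` interprets the `Range` header and answers with `206 Partial Content`.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// serveAudio is a hypothetical handler. It serves an in-memory payload and
// lets http.ServeContent handle the Range header.
func serveAudio(payload []byte) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "audio/mpeg")
		http.ServeContent(w, r, "", time.Time{}, bytes.NewReader(payload))
	})
}

// fetchRange issues a GET with the given Range header against the handler
// and returns the response status code and body.
func fetchRange(h http.Handler, rng string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/stream/soundcloud/123", nil)
	req.Header.Set("Range", rng)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	res := rec.Result()
	defer res.Body.Close()
	body, _ := io.ReadAll(res.Body)
	return res.StatusCode, string(body)
}

func main() {
	// Request bytes 2 through 5 (inclusive) of a 10-byte payload.
	status, body := fetchRange(serveAudio([]byte("0123456789")), "bytes=2-5")
	fmt.Println(status, body)
}
```

A real proxy would more likely pass the client's `Range` header through to the upstream URL and relay the upstream `206` response, rather than buffering the audio in memory.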
## Technology Stack

### Core Dependencies

```
google.golang.org/grpc v1.79.1
google.golang.org/protobuf v1.36.4
github.com/jackc/pgx/v5 v5.7.2
github.com/golang-jwt/jwt/v5 v5.2.1
golang.org/x/crypto (bcrypt)
github.com/joho/godotenv v1.5.1
```

### Provider Libraries

```
github.com/zmb3/spotify/v2 (via spotapi-go submodule)
github.com/kkdai/youtube/v2 v2.10.3
github.com/rhnvrm/lyric-api-go v0.1.4 (Genius)
```

**Submodule**: `spotapi-go` (custom Spotify client wrapper)

### Build Requirements

- Go 1.25 (go.mod specification)
- Git submodules (spotapi-go)
- PostgreSQL 15+ (runtime)
- Protocol buffer compiler (development)

## Architecture Highlights

### Fan-Out Concurrency Pattern

All search and retrieval methods run one goroutine per enabled provider in parallel:

```go
var wg sync.WaitGroup
for _, provider := range providers {
	wg.Add(1)
	// Pass the provider as an argument so each goroutine gets its own copy.
	go func(p trackProvider) {
		defer wg.Done()
		results, err := p.SearchTracks(query, limit)
		// aggregate results (shared state must be guarded, e.g. with a mutex)
	}(provider)
}
// Block until every provider goroutine has finished.
wg.Wait()
```

Because providers are queried in parallel, total latency tracks the slowest provider rather than the sum of all of them, which keeps response times sub-second even when all four active platforms are queried at once.

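The fragment above leaves aggregation as a comment. The following is a self-contained sketch of one way it could be completed, not the project's actual code: the `Name` method, the `[]string` result type, `fakeProvider`, and `searchAll` are all assumptions made for illustration. A mutex guards the shared slice and error map, since several goroutines write to them.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// trackProvider reuses the interface name from the document; the method
// set and the string result type are assumptions.
type trackProvider interface {
	Name() string
	SearchTracks(query string, limit int) ([]string, error)
}

// fakeProvider is a hypothetical stand-in for a real platform adapter.
type fakeProvider struct {
	name string
	fail bool
}

func (f fakeProvider) Name() string { return f.name }

func (f fakeProvider) SearchTracks(query string, limit int) ([]string, error) {
	if f.fail {
		return nil, errors.New("upstream unavailable")
	}
	return []string{f.name + ":track:" + query}, nil
}

// searchAll queries every provider in its own goroutine, merges the
// successful results, and records failures per provider.
func searchAll(providers []trackProvider, query string, limit int) ([]string, map[string]error) {
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex // guards tracks and failed
		tracks []string
		failed = map[string]error{}
	)
	for _, provider := range providers {
		wg.Add(1)
		go func(p trackProvider) {
			defer wg.Done()
			results, err := p.SearchTracks(query, limit)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed[p.Name()] = err
				return
			}
			tracks = append(tracks, results...)
		}(provider)
	}
	wg.Wait()
	sort.Strings(tracks) // goroutine completion order is not deterministic
	return tracks, failed
}

func main() {
	providers := []trackProvider{
		fakeProvider{name: "soundcloud"},
		fakeProvider{name: "deezer", fail: true},
		fakeProvider{name: "youtube"},
	}
	tracks, failed := searchAll(providers, "hello", 10)
	fmt.Println(tracks, len(failed))
}
```

Sorting the merged slice is only there to make the output stable; a real service would more likely rank or interleave results.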
### Stream Resolution Bridge

**Problem**: Spotify partner API and Deezer public API don't provide streaming URLs.

**Solution**: Three-tier fallback cascade:

1. If the requested platform supports streaming (SoundCloud, YouTube Music), resolve the stream there
2. If not, search SoundCloud for "{artist} - {title}"
3. If SoundCloud fails, search YouTube Music with the same query

The first successful stream URL is returned.

**Implementation**: `bedrock_server/resolver.go`

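The cascade can be sketched as an ordered list of fallbacks. This is an illustration under stated assumptions rather than the contents of the resolver file: `resolver`, `bridge`, `streamFunc`, `newDemoResolver`, and the example URLs are hypothetical, and falling through to the bridges when a native lookup fails is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// streamFunc returns a playable URL for a native ID or a search query.
type streamFunc func(arg string) (string, error)

// bridge is a streaming-capable platform that can be searched by text.
type bridge struct {
	name   string
	search streamFunc
}

// resolver holds direct lookups for streaming-capable platforms and the
// ordered list of bridges to try otherwise.
type resolver struct {
	native  map[string]streamFunc
	bridges []bridge
}

// resolve returns the stream URL and the platform that served it.
func (r resolver) resolve(platform, id, artist, title string) (string, string, error) {
	// Tier 1: the requested platform streams natively.
	if direct, ok := r.native[platform]; ok {
		if url, err := direct(id); err == nil {
			return url, platform, nil
		}
	}
	// Tiers 2 and 3: search each bridge in order with "{artist} - {title}".
	query := artist + " - " + title
	for _, b := range r.bridges {
		if url, err := b.search(query); err == nil {
			return url, b.name, nil
		}
	}
	return "", "", errors.New("no stream URL found for " + platform + ":" + id)
}

// newDemoResolver wires fake lookups: SoundCloud search fails so the
// YouTube Music tier is exercised.
func newDemoResolver() resolver {
	return resolver{
		native: map[string]streamFunc{
			"soundcloud": func(id string) (string, error) {
				return "https://sc.example/stream/" + id, nil
			},
		},
		bridges: []bridge{
			{name: "soundcloud", search: func(q string) (string, error) {
				return "", errors.New("no match")
			}},
			{name: "youtube", search: func(q string) (string, error) {
				return "https://yt.example/stream?q=" + q, nil
			}},
		},
	}
}

func main() {
	url, via, err := newDemoResolver().resolve("spotify", "3n3Ppam7vgaVa1iaRUc9Lp", "Artist", "Title")
	fmt.Println(url, via, err)
}
```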
### YouTube Music 7-Client Fallback Pool

YouTube Music streams use a client rotation strategy to maximize success rate:

```
TVHTML5_SIMPLY_EMBEDDED (primary)
TVHTML5
ANDROID_VR (variant 1)
ANDROID_VR (variant 2)
ANDROID
IOS
WEB
```

Each client has different capabilities and restrictions. The service tries clients sequentially until a valid stream URL is obtained. Ciphered streams fall back to SoundCloud.

### ID Namespacing

All entity IDs use platform prefixes to avoid collisions:

```
spotify:track:3n3Ppam7vgaVa1iaRUc9Lp
soundcloud:track:1234567890
deezer:album:302127
youtube:video:dQw4w9WgXcQ
```

Format: `{platform}:{entity_type}:{native_id}`

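A parser for this format is short enough to sketch. `EntityID` and `ParseID` are hypothetical names, not identifiers taken from the repository. Splitting into at most three parts is a deliberate choice here so that a native ID containing a colon would survive intact.

```go
package main

import (
	"fmt"
	"strings"
)

// EntityID is a hypothetical parsed form of {platform}:{entity_type}:{native_id}.
type EntityID struct {
	Platform   string
	EntityType string
	NativeID   string
}

// ParseID splits a namespaced ID into its three components and rejects
// IDs with missing or empty parts.
func ParseID(s string) (EntityID, error) {
	parts := strings.SplitN(s, ":", 3)
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return EntityID{}, fmt.Errorf("malformed namespaced ID %q", s)
	}
	return EntityID{Platform: parts[0], EntityType: parts[1], NativeID: parts[2]}, nil
}

// String renders the ID back into its namespaced form.
func (e EntityID) String() string {
	return e.Platform + ":" + e.EntityType + ":" + e.NativeID
}

func main() {
	id, err := ParseID("spotify:track:3n3Ppam7vgaVa1iaRUc9Lp")
	fmt.Println(id.Platform, id.EntityType, id.NativeID, err)
}
```

The `Platform` field is what a router would switch on to pick the provider adapter.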
## Data Layer

### PostgreSQL Schema

**Single Table**: `users`

```sql
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    role VARCHAR(50) DEFAULT 'user',
    is_verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

**Connection**: pgx/v5 with connection pooling
**Migrations**: `db/migrations/` (up/down SQL pairs)

### Caching Strategy

**Current**: No caching implemented
**Planned**: Redis for:
- Play deduplication (30s window)
- Service status cache (5min TTL)
- Stream URL cache (1hr TTL)

## Authentication System

**Token Type**: JWT (HS256)
**Access Token**: 15 minutes
**Refresh Token**: 7 days
**Password Hashing**: bcrypt (cost 10)

**gRPC Interceptor**: Validates JWT on all methods except:
- Register
- Login
- RefreshToken
- GetServiceStatus

**Storage**: User credentials are stored in PostgreSQL. Tokens are issued statelessly and never persisted, so there is no revocation list.

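The interceptor's exemption list boils down to a lookup on the method name. The sketch below shows only that decision, using the standard library. gRPC passes interceptors a full method string of the form `/package.Service/Method`; the `bedrock.BedrockService` prefix used here is an assumption, as are the names `publicMethods` and `requiresAuth`.

```go
package main

import (
	"fmt"
	"strings"
)

// publicMethods lists the RPCs the document names as exempt from JWT checks.
var publicMethods = map[string]bool{
	"Register":         true,
	"Login":            true,
	"RefreshToken":     true,
	"GetServiceStatus": true,
}

// requiresAuth reports whether a gRPC full method name needs a valid JWT.
// It looks only at the segment after the last slash.
func requiresAuth(fullMethod string) bool {
	name := fullMethod[strings.LastIndex(fullMethod, "/")+1:]
	return !publicMethods[name]
}

func main() {
	// The service path below is a hypothetical example.
	fmt.Println(requiresAuth("/bedrock.BedrockService/Login"))
	fmt.Println(requiresAuth("/bedrock.BedrockService/GetStreamURL"))
}
```

In a unary server interceptor this check would run against `info.FullMethod` before the token in the request metadata is validated.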
## Lyrics Integration

### LrcLib (Synced Lyrics)

**Endpoint**: `https://lrclib.net/api/get`
**Format**: LRC (timestamped)
**Timeout**: 5 seconds
**Matching**: Artist + title + album + duration

### Genius (Plain Lyrics)

**Authentication**: `GENIUS_ACCESS_TOKEN` environment variable
**Features**: Plain text lyrics + annotations
**Library**: `github.com/rhnvrm/lyric-api-go`

Both services are queried in parallel when lyrics are requested. Synced lyrics take priority if available.

## Configuration Management

### Environment Variables

**Required**:
```
DATABASE_URL=postgresql://user:pass@localhost:5432/bedrock
JWT_SECRET=your-secret-key
```

**Optional Platform Credentials**:
```
SPOTIFY_CLIENT_ID
SPOTIFY_CLIENT_SECRET
SOUNDCLOUD_CLIENT_IDS=id1,id2,id3
DEEZER_APP_ID
YOUTUBE_COOKIES=cookie-string
GENIUS_ACCESS_TOKEN
```

**`.env` Search Locations** (checked in order):
1. Current working directory
2. `bedrock_server/` directory
3. Parent directory

**Loader**: `github.com/joho/godotenv`

### CLI Flags

```
-port int              gRPC server port (default 50052)
-proxy-addr string     HTTP proxy address (default :8080)
-proxy-host string     HTTP proxy host for URL generation
```

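The three flags map directly onto Go's standard `flag` package. The sketch below is one way they might be declared, not a copy of the project's `main.go`: the `options` struct and `parseFlags` are hypothetical, and the empty default for `-proxy-host` is an assumption, since the document does not state one.

```go
package main

import (
	"flag"
	"fmt"
)

// options is a hypothetical container for the parsed CLI flags.
type options struct {
	Port      int
	ProxyAddr string
	ProxyHost string
}

// parseFlags parses args (without the program name) into options.
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("bedrock_server", flag.ContinueOnError)
	fs.IntVar(&o.Port, "port", 50052, "gRPC server port")
	fs.StringVar(&o.ProxyAddr, "proxy-addr", ":8080", "HTTP proxy address")
	// The empty default here is an assumption.
	fs.StringVar(&o.ProxyHost, "proxy-host", "", "HTTP proxy host for URL generation")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseFlags([]string{"-port", "6000", "-proxy-host", "media.example.com"})
	fmt.Println(o.Port, o.ProxyAddr, o.ProxyHost, err)
}
```

Using a dedicated `FlagSet` instead of the global one keeps the parsing testable, because arguments can be passed in explicitly.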
## File Structure

```
bedrock-api/
├── bedrock_server/
│   ├── main.go (1329 lines - service implementation)
│   ├── resolver.go (stream resolution logic)
│   ├── proxy.go (HTTP streaming proxy)
│   ├── auth.go (JWT + bcrypt)
│   ├── lrclib.go (synced lyrics)
│   └── genius.go (plain lyrics)
├── providers/
│   ├── spotify.go (partner API adapter)
│   ├── soundcloud.go (api-v2 adapter)
│   ├── deezer.go (public API adapter)
│   ├── youtube.go (Innertube adapter)
│   ├── yandex.go (stub)
│   └── vk.go (stub)
├── store/
│   └── user.go (PostgreSQL user operations)
├── db/
│   └── migrations/ (SQL migration files)
├── tests/
│   ├── auth_test.go
│   ├── spotify_test.go
│   ├── soundcloud_test.go
│   ├── youtube_test.go
│   ├── deezer_test.go
│   └── lyrics_test.go
├── proto/
│   └── bedrock_service.proto
├── Dockerfile
├── docker-compose.yml
└── go.mod
```

**Total Service Code**: over 3000 lines (main.go + providers + auth + lyrics)
**Protocol Definition**: 622 lines
**Test Coverage**: 6 integration test files

## Deployment Options

### Docker

**Multi-stage Build**:
- Builder: `golang:1.23-alpine`
- Runtime: `alpine:latest`
- Exposed Ports: `50052`, `8080`

**Note**: Dockerfile uses Go 1.23, but go.mod specifies 1.25 (version mismatch).

### Docker Compose

**Services**:
- PostgreSQL 15-alpine only
- No Redis (planned)
- No reverse proxy (TLS must be added externally)

### Local Development

```bash
git clone https://github.com/feralbureau/bedrock-api
cd bedrock-api
git submodule update --init --recursive
cp .env.example .env
# Configure .env with credentials
go run ./bedrock_server
```

**Submodule Requirement**: `spotapi-go` must be initialized before build.

## CI/CD Pipeline

### GitHub Actions Workflows

**test.yml**:
- Runs on: push, pull_request
- Go version: 1.24
- Services: PostgreSQL 15
- Steps: Submodule init, integration tests with provider secrets
- Timeout: 120 seconds per test

**lint.yml**:
- golangci-lint (standard Go linting)
- Custom comment linter (enforces no decorative comments, no uppercase-leading comments)

**Secrets Required**:
- `SPOTIFY_CLIENT_ID`
- `SPOTIFY_CLIENT_SECRET`
- `SOUNDCLOUD_CLIENT_IDS`
- `GENIUS_ACCESS_TOKEN`
- `YOUTUBE_COOKIES`

## Observability

### Logging

**Implementation**: Go stdlib `log.Printf`
**Format**: `[provider] message` prefix pattern
**Levels**: No structured levels (info/warn/error mixed)

### Monitoring

**Current**: None
**Missing**:
- Prometheus metrics
- APM/tracing
- Structured logging (JSON)
- Error tracking (Sentry, etc.)

### Health Checks

**Endpoint**: `GetServiceStatus` RPC
**Implementation**: Stub (always returns OK)
**Planned**: Per-provider health checks with latency measurement

## Performance Characteristics

### Concurrency Model

- Goroutine per provider for all search/retrieval operations
- `sync.WaitGroup` for coordination
- No rate limiting (relies on provider-level throttling)
- No circuit breakers (failures are logged, partial responses returned)

### Response Patterns

**Partial Response Strategy**: If 2 of 4 providers fail, the response carries the results from the 2 successful providers, with `ResponseStatus: PARTIAL` and a `ProviderError[]` array listing the failures.

**Timeout Handling**: No global timeout (relies on HTTP client defaults and provider-specific timeouts such as the 5s LrcLib timeout).

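The partial response rule reduces to comparing the number of failures with the number of providers queried. In the sketch below only `PARTIAL` and the `ProviderError` name come from the document. The `OK` and `FAILED` values, the struct fields, and `statusFor` are assumptions.

```go
package main

import "fmt"

// ResponseStatus mirrors the status field named in the document.
type ResponseStatus string

const (
	StatusOK      ResponseStatus = "OK"      // assumed value
	StatusPartial ResponseStatus = "PARTIAL" // named in the document
	StatusFailed  ResponseStatus = "FAILED"  // assumed value
)

// ProviderError describes one provider failure; the fields are assumptions.
type ProviderError struct {
	Provider string
	Message  string
}

// statusFor derives the overall status from how many of the queried
// providers failed.
func statusFor(queried int, failures []ProviderError) ResponseStatus {
	switch {
	case len(failures) == 0:
		return StatusOK
	case len(failures) < queried:
		return StatusPartial
	default:
		return StatusFailed
	}
}

func main() {
	failures := []ProviderError{
		{Provider: "spotify", Message: "token expired"},
		{Provider: "deezer", Message: "timeout"},
	}
	fmt.Println(statusFor(4, failures))
}
```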
## Security Posture

### Authentication

- JWT tokens (HS256, not RS256 public/private key)
- bcrypt password hashing (cost 10)
- No rate limiting on auth endpoints
- No account lockout after failed attempts
- No email verification enforcement (is_verified field exists but unused)

### Transport Security

- No built-in TLS (requires reverse proxy like nginx/Caddy)
- gRPC without TLS (insecure credentials)
- HTTP proxy without HTTPS

### Secrets Management

- Environment variables only
- No secrets rotation
- Client IDs/tokens in plaintext .env files
- No vault integration

## Unique Features

1. **Cross-Platform Stream Resolution**: Automatically bridges non-streaming platforms (Spotify, Deezer) to streaming platforms (SoundCloud, YouTube Music)

2. **YouTube 7-Client Fallback**: Maximizes stream availability by trying 7 YouTube client configurations in sequence

3. **SoundCloud Client ID Rotation**: Handles rate limiting by cycling through multiple client IDs

4. **Dual Lyrics Sources**: Combines synced (LrcLib) and annotated (Genius) lyrics

5. **Namespaced ID System**: Platform-prefixed IDs prevent collisions and enable explicit routing

6. **Partial Response Model**: Returns successful provider results even when some providers fail

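Client ID rotation (item 3) can be sketched as a small round-robin pool. The document states only that multiple IDs are cycled to handle rate limiting, so the `clientIDPool` type, its methods, and the rotate-on-demand policy are assumptions. The comma-separated input follows the `SOUNDCLOUD_CLIENT_IDS=id1,id2,id3` format.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// clientIDPool is a hypothetical round-robin pool of SoundCloud client IDs.
type clientIDPool struct {
	mu   sync.Mutex
	ids  []string
	next int
}

// newClientIDPool parses a comma-separated list, skipping empty entries.
func newClientIDPool(csv string) *clientIDPool {
	p := &clientIDPool{}
	for _, id := range strings.Split(csv, ",") {
		if id = strings.TrimSpace(id); id != "" {
			p.ids = append(p.ids, id)
		}
	}
	return p
}

// Current returns the active client ID, or "" if the pool is empty.
func (p *clientIDPool) Current() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.ids) == 0 {
		return ""
	}
	return p.ids[p.next]
}

// Rotate advances to the next ID (wrapping around) and returns it.
// A caller would invoke it after a rate-limit response.
func (p *clientIDPool) Rotate() string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.ids) == 0 {
		return ""
	}
	p.next = (p.next + 1) % len(p.ids)
	return p.ids[p.next]
}

func main() {
	pool := newClientIDPool("id1,id2,id3")
	fmt.Println(pool.Current(), pool.Rotate(), pool.Rotate(), pool.Rotate())
}
```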
## Limitations

1. **Incomplete Platform Coverage**: Yandex and VK are stubs only
2. **No Caching**: Every request hits provider APIs (high latency, rate limit risk)
3. **Minimal Database Schema**: Only user authentication, no metadata persistence
4. **No Observability**: Missing metrics, tracing, structured logging
5. **Security Gaps**: No TLS, no rate limiting, no account security features
6. **Version Mismatch**: go.mod (1.25) vs Dockerfile (1.23) vs CI (1.24)
7. **Submodule Dependency**: Custom spotapi-go fork creates maintenance burden

## Use Cases

### Primary

- Multi-platform music search aggregation
- Stream URL resolution for non-streaming APIs
- Unified metadata retrieval across platforms
- Lyrics lookup with sync support

### Secondary

- Playlist import/export across platforms
- Artist/album discovery with similar tracks
- Top charts aggregation
- Music recommendation engine backend

## Integration Considerations

**For Metadata Aggregator Project**:

- Provider adapter pattern is directly applicable
- Fan-out concurrency model can be adopted
- Partial response handling is valuable for resilience
- ID namespacing prevents collision issues
- Stream resolution bridge concept is novel but out of scope for pure metadata
- gRPC interface requires client generation (protobuf compilation)

**Reusable Patterns**:
- `trackProvider` interface design
- Parallel goroutine search with WaitGroup
- Error aggregation in partial responses
- Platform-specific adapter isolation

**Not Applicable**:
- Streaming focus (metadata aggregator doesn't need stream URLs)
- JWT auth (different auth requirements)
- Minimal database schema (metadata needs richer storage)