# Harmony - Evaluation and Recommendations

## Executive Summary

Harmony is the **most relevant and architecturally sound** reference project for building a music metadata aggregation system. Its 4-stage pipeline (LOOKUP → HARMONIZE → MERGE → SEED), provider abstraction system, and intelligent merge algorithm represent best-in-class design patterns for multi-source data integration.

**Key Strengths**:
- Best-in-class multi-source aggregation architecture
- Intelligent 3-phase merge algorithm with provider preferences
- Comprehensive 273-line HarmonyRelease schema
- MusicBrainz integration with MBID resolution and seeding
- Type-safe TypeScript implementation with tests covering all modules
- Graceful degradation via Promise.allSettled
- Permalink system for reproducible results

**Key Limitations**:
- Web UI only (no REST/JSON API)
- Single developer project (bus factor = 1)
- No containerization (Docker)
- HTML scraping providers are fragile
- No monitoring/metrics infrastructure

**Recommendation**: **Adopt Harmony's architecture patterns** while addressing its limitations:
1. Add REST API layer for programmatic access
2. Containerize for easier deployment
3. Add monitoring and metrics
4. Expand provider ecosystem
5. Build community around project

## Detailed Evaluation

### Architecture (Score: 9.5/10)

#### Strengths

**1. 4-Stage Pipeline Design**

The LOOKUP → HARMONIZE → MERGE → SEED pipeline is exceptionally well-designed:

- **Clear separation of concerns**: Each stage has distinct responsibilities
- **Composable**: Stages can be used independently or combined
- **Testable**: Each stage can be tested in isolation
- **Extensible**: New providers or merge strategies can be added without affecting other stages

**Example Use Cases**:
- LOOKUP only: Fetch data from providers without harmonization
- LOOKUP + HARMONIZE: Get standardized data without merging
- Full pipeline: Complete aggregation and MusicBrainz seeding

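The LOOKUP stage can tolerate individual provider failures by running all lookups through `Promise.allSettled`. The following is a minimal sketch of that idea, not Harmony's actual code: `ProviderResult`, `Lookup`, and `lookupAll` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical shapes for illustration; not Harmony's actual types.
interface ProviderResult {
  provider: string;
  title: string;
}

type Lookup = (gtin: string) => Promise<ProviderResult>;

interface LookupOutcome {
  results: ProviderResult[];
  errors: { provider: string; reason: string }[];
}

// Query every provider concurrently and keep whatever succeeds.
// A rejected lookup is recorded as an error instead of failing the whole call.
async function lookupAll(
  providers: Record<string, Lookup>,
  gtin: string,
): Promise<LookupOutcome> {
  const names = Object.keys(providers);
  const settled = await Promise.allSettled(
    names.map((name) => providers[name](gtin)),
  );

  const outcome: LookupOutcome = { results: [], errors: [] };
  settled.forEach((entry, index) => {
    if (entry.status === "fulfilled") {
      outcome.results.push(entry.value);
    } else {
      outcome.errors.push({
        provider: names[index],
        reason: String(entry.reason),
      });
    }
  });
  return outcome;
}
```

Unlike `Promise.all`, a single rejected lookup does not discard the results that did arrive, so the later stages can proceed with partial data.
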
**2. Provider Abstraction System**

The base class hierarchy is exemplary:

```
MetadataProvider (abstract)
├── MetadataApiProvider (OAuth2)
├── ReleaseLookup (GTIN/URL/ID)
└── ReleaseApiLookup (multi-region)
```

**Benefits**:
- **Consistent interface**: All providers implement same methods
- **Code reuse**: Common functionality (caching, rate limiting, OAuth2) in base classes
- **Easy provider addition**: New providers require minimal boilerplate
- **Feature quality ratings**: Transparent quality assessment

**3. Intelligent Merge Algorithm**

The 3-phase merge (collect → check compatibility → select best) is sophisticated:

- **Compatibility checking**: Detects conflicts before merging
- **Provider preferences**: Configurable priority order
- **Source tracking**: SourceMap records which provider contributed each field
- **Conflict reporting**: IncompatibilityInfo provides detailed conflict information

**Real-world value**: Solves the "which source wins" problem elegantly.

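The three phases can be sketched on a deliberately tiny release type. This is an illustrative approximation under stated assumptions, not Harmony's implementation: `SimpleRelease`, `MergeResult`, and `mergeSimpleReleases` are hypothetical, and "compatible" is simplified here to case-insensitive string equality.

```typescript
// Hypothetical simplified release; the real HarmonyRelease has many more fields.
type Field = "title" | "gtin" | "label";
type SimpleRelease = Partial<Record<Field, string>>;

interface MergeResult {
  merged: SimpleRelease;
  sourceMap: Partial<Record<Field, string>>; // field -> contributing provider
  conflicts: { field: Field; providers: string[] }[];
}

function mergeSimpleReleases(
  releases: Record<string, SimpleRelease>, // keyed by provider name
  preferences: string[], // highest priority first
): MergeResult {
  const fields: Field[] = ["title", "gtin", "label"];
  const result: MergeResult = { merged: {}, sourceMap: {}, conflicts: [] };

  for (const field of fields) {
    // Phase 1: collect every provider's value for this field.
    const values = new Map<string, string>();
    for (const [provider, release] of Object.entries(releases)) {
      const value = release[field];
      if (value !== undefined) values.set(provider, value);
    }

    // Phase 2: check compatibility (assumed: case-insensitive equality).
    const distinct = new Set(
      Array.from(values.values()).map((v) => v.trim().toLowerCase()),
    );
    if (distinct.size > 1) {
      result.conflicts.push({ field, providers: Array.from(values.keys()) });
    }

    // Phase 3: select the value from the most preferred provider that has one,
    // falling back to the first provider seen if none is listed.
    const winner =
      preferences.find((p) => values.has(p)) ?? Array.from(values.keys())[0];
    const chosen = winner === undefined ? undefined : values.get(winner);
    if (winner !== undefined && chosen !== undefined) {
      result.merged[field] = chosen;
      result.sourceMap[field] = winner;
    }
  }
  return result;
}
```

Conflicts are reported rather than thrown, so a caller can still use the preferred value while surfacing the disagreement to the user.
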
**4. Type Safety**

Full TypeScript coverage with 273-line HarmonyRelease schema ensures:

- **Compile-time error detection**: Catch bugs before runtime
- **IDE autocomplete**: Better developer experience
- **Self-documenting**: Types serve as documentation
- **Refactoring safety**: Changes propagate through type system

#### Weaknesses

**1. No REST API**

Web UI only limits programmatic access:

- **Integration difficulty**: Other applications can't easily consume data
- **Automation challenges**: No API for batch processing
- **Mobile apps**: Can't build native mobile clients

**Mitigation**: Add REST API layer (see recommendations)

**2. Tight Coupling to Fresh Framework**

Fresh is Deno-only, limiting deployment options:

- **No Node.js support**: Can't run on Node.js infrastructure
- **Framework lock-in**: Migrating to another framework would be difficult
- **Smaller ecosystem**: Fresh has fewer resources than Next.js/Remix

**Mitigation**: Extract core logic into framework-agnostic library

### Data Model (Score: 9/10)

#### Strengths

**1. Comprehensive HarmonyRelease Schema**

273 lines covering all music metadata needs:

- **Basic metadata**: Title, artists, GTIN
- **Media structure**: Multi-disc support with tracks
- **Commercial info**: Labels, catalog numbers, copyright
- **Distribution**: Available/excluded countries
- **Visual assets**: Images with dimensions and types
- **External links**: Provider URLs with link types
- **Metadata about metadata**: Providers, messages, source map

**Coverage**: Matches or exceeds MusicBrainz schema.

**2. Partial Date Support**

`PartialDate` interface handles incomplete dates:

```typescript
{ year: 2014 }                      // Year only
{ year: 2014, month: 11 }           // Year and month
{ year: 2014, month: 11, day: 24 }  // Full date
```

**Real-world value**: Many releases have incomplete release dates.

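One way to consume such values is to format only the components that are present. The sketch below assumes all three fields are optional, which the examples above do not confirm; `formatPartialDate` is a hypothetical helper, not part of Harmony.

```typescript
// Assumed shape: every component is optional (the source only shows examples).
interface PartialDate {
  year?: number;
  month?: number;
  day?: number;
}

// Format as YYYY, YYYY-MM, or YYYY-MM-DD depending on what is known.
// A day without a month is ignored because it cannot be placed.
function formatPartialDate(date: PartialDate): string {
  if (date.year === undefined) return "";
  const parts = [String(date.year).padStart(4, "0")];
  if (date.month !== undefined) {
    parts.push(String(date.month).padStart(2, "0"));
    if (date.day !== undefined) {
      parts.push(String(date.day).padStart(2, "0"));
    }
  }
  return parts.join("-");
}
```
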
**3. Artist Credit System**

`ArtistCreditName[]` with join phrases:

```typescript
[
  { name: "Artist A", joinPhrase: " & " },
  { name: "Artist B", joinPhrase: " feat. " },
  { name: "Artist C" }
]
// Renders: "Artist A & Artist B feat. Artist C"
```

**Real-world value**: Handles complex artist credits (collaborations, features, etc.)

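Rendering a credit is a matter of concatenating each name with its join phrase. A minimal sketch, using only the `name` and `joinPhrase` fields shown in the example (the real `ArtistCreditName` type may carry more); `renderArtistCredit` is a hypothetical helper name.

```typescript
// Only the two fields visible in the example above are modelled here.
interface ArtistCreditName {
  name: string;
  joinPhrase?: string;
}

// Concatenate every name followed by its join phrase (empty if absent).
function renderArtistCredit(credits: ArtistCreditName[]): string {
  return credits.map((c) => c.name + (c.joinPhrase ?? "")).join("");
}
```
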
**4. Source Tracking**

`SourceMap` records which provider contributed each field:

```typescript
{
  "title": "spotify",
  "releaseDate": "spotify",
  "gtin": "deezer",
  "media[0].tracks[0].isrc": "spotify"
}
```

**Real-world value**: Enables data provenance and debugging.

#### Weaknesses

**1. No Versioning**

Schema has no version field:

- **Breaking changes**: No way to detect schema version
- **Migration challenges**: Can't handle multiple schema versions simultaneously

**Mitigation**: Add `schemaVersion` field to HarmonyRelease

**2. Limited Extensibility**

No extension mechanism for provider-specific data:

- **Custom fields**: No way to store provider-specific metadata
- **Experimental features**: Can't add new fields without schema change

**Mitigation**: Add `extensions` object for provider-specific data

### Provider Integration (Score: 8.5/10)

#### Strengths

**1. Diverse Provider Ecosystem**

9 providers covering major platforms:

- **Streaming**: Spotify, Deezer, Tidal
- **Purchase**: iTunes, Bandcamp, Beatport
- **Regional**: Mora, Ototoy (Japan)
- **Reference**: MusicBrainz

**Coverage**: Excellent global coverage with regional specialists.

**2. Multi-Access Methods**

Both API-based (5) and HTML scraping (4):

- **API-based**: Reliable, structured data
- **HTML scraping**: Access to platforms without APIs

**Flexibility**: Can integrate any platform regardless of API availability.

**3. OAuth2 Support**

Spotify and Tidal use OAuth2 with token caching:

- **Secure**: Industry-standard authentication
- **Efficient**: Token caching reduces auth requests
- **Automatic renewal**: Handles token expiration

**4. Rate Limiting**

Per-provider rate limiters with exponential backoff:

- **API compliance**: Respects provider rate limits
- **Retry-After support**: Parses and respects Retry-After headers
- **Configurable**: Different limits per provider

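How a retry delay might combine the two mechanisms can be sketched as follows. The base delay, cap, and function name `retryDelayMs` are assumptions chosen for illustration, not values taken from Harmony; the `Retry-After` header is handled in both of its standard forms (delay in seconds, or an HTTP date).

```typescript
// Assumed tuning values for illustration only.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30000;

// Decide how long to wait before retry number `attempt` (0-based).
// A server-provided Retry-After value takes precedence over the backoff.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  nowMs: number = Date.now(),
): number {
  if (retryAfter !== null && retryAfter.trim() !== "") {
    // Form 1: a non-negative number of seconds.
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);

    // Form 2: an HTTP date; wait until that moment.
    const dateMs = Date.parse(retryAfter);
    if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  }
  // Fallback: exponential backoff, doubling per attempt up to a cap.
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
}
```

A production limiter would usually add random jitter to the backoff so that many clients do not retry in lockstep; it is omitted here to keep the sketch deterministic.
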
**5. Multi-Region Support**

iTunes queries multiple regions in parallel:

- **Global coverage**: Access region-specific releases
- **Parallel execution**: Faster than sequential queries

#### Weaknesses

**1. HTML Scraping Fragility**

4 providers rely on HTML scraping:

- **Breaks on redesigns**: Site changes break scrapers
- **Maintenance burden**: Requires constant updates
- **No guarantees**: Sites can block scrapers

**Mitigation**: Add monitoring for scraper failures, fallback to other providers

**2. KKBOX Not Implemented**

Mentioned but not implemented:

- **Missing coverage**: No Taiwan/Hong Kong/Southeast Asia specialist
- **Incomplete**: Documentation mentions it but code doesn't include it

**Mitigation**: Implement KKBOX provider or remove from documentation

**3. No Provider Health Monitoring**

No system to track provider availability:

- **Silent failures**: Providers can fail without notification
- **No metrics**: Can't track provider reliability over time

**Mitigation**: Add provider health checks and metrics

### MusicBrainz Integration (Score: 9/10)

#### Strengths

**1. Batch MBID Resolution**

100 URLs per request:

- **Efficient**: Reduces API calls by up to 100x
- **Fast**: Single request instead of up to 100
- **Caching**: Results cached for future lookups

**Real-world value**: Essential for duplicate detection.

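Batching comes down to splitting the URL list into groups of at most 100 before issuing requests. A minimal sketch; `chunk` is a generic hypothetical helper and makes no assumption about how the requests themselves are sent.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With 250 URLs and a batch size of 100, this yields three batches, so three requests replace 250 individual lookups.
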
**2. Duplicate Detection**

Checks if external URLs already linked to MusicBrainz:

- **Prevents duplicates**: Warns before creating duplicate releases
- **Links to existing**: Provides link to existing release
- **User-friendly**: Clear warning messages

**3. Seeding Integration**

Pre-filled form for MusicBrainz import:

- **Edit notes**: Include provider URLs and permalink
- **Annotation**: Extra metadata not in main form
- **Copy-to-clipboard**: Easy data transfer

**4. Template Provider Mode**

MusicBrainz as reference data:

- **Verification**: Compare external sources against MusicBrainz
- **Quality control**: Identify discrepancies
- **Improvement**: Find missing data in MusicBrainz

#### Weaknesses

**1. No Automatic Submission**

Manual copy-paste required:

- **Friction**: User must manually transfer data
- **Error-prone**: Copy-paste can introduce errors

**Mitigation**: Add MusicBrainz API submission (requires user authentication)

**2. No Edit Tracking**

No way to track submitted edits:

- **No feedback**: User doesn't know if edit was accepted
- **No metrics**: Can't measure Harmony's impact on MusicBrainz

**Mitigation**: Add edit tracking via MusicBrainz API

### Testing and Quality (Score: 9/10)

#### Strengths

**1. Comprehensive Test Coverage**

38 test files covering all modules:

- **Providers**: All 9 providers tested
- **Harmonizer**: Merge, compatibility, deduplication tested
- **MusicBrainz**: Seeding, MBID resolution tested

**2. Declarative Provider Tests**

`describeProvider` helper reduces boilerplate:

- **Consistent**: All providers tested the same way
- **Maintainable**: Changes to test structure affect all providers
- **Readable**: Tests are self-documenting

**3. Offline Testing**

43 cached responses in `testdata/`:

- **Fast**: No network requests during tests
- **Reproducible**: Same results every time
- **Offline-friendly**: Can test without internet

**4. Snapshot Testing**

Verify output stability:

- **Regression detection**: Catch unintended changes
- **Easy updates**: Update snapshots when changes are intentional

#### Weaknesses

**1. No Integration Tests**

Only unit tests, no end-to-end tests:

- **Missing coverage**: Full pipeline not tested together
- **Real-world scenarios**: Can't test actual provider interactions

**Mitigation**: Add integration tests with real provider calls (optional, gated by flag)

**2. No Performance Tests**

No benchmarks or performance tests:

- **No baselines**: Can't detect performance regressions
- **No optimization targets**: Don't know what to optimize

**Mitigation**: Add benchmark tests for critical paths (merge algorithm, provider lookups)

### Deployment and Operations (Score: 6/10)

#### Strengths

**1. Simple Deployment**

No Docker, no Kubernetes:

- **Low complexity**: Easy to understand and debug
- **Fast startup**: No container overhead
- **Direct access**: Can inspect process directly

**2. systemd Integration**

Standard Linux service management:

- **Familiar**: Most Linux admins know systemd
- **Reliable**: systemd handles restarts, logging
- **Secure**: systemd security hardening options

**3. CI/CD Automation**

GitHub Actions with SSH deployment:

- **Automated**: Deploy on git tag
- **Simple**: No complex orchestration
- **Reliable**: SSH is battle-tested

#### Weaknesses

**1. No Containerization**

No Docker support:

- **Deployment friction**: Requires Deno installation on server
- **Inconsistent environments**: Dev/prod differences possible
- **No orchestration**: Can't use Kubernetes, Docker Swarm

**Mitigation**: Add Dockerfile and docker-compose.yml

**2. No Monitoring**

No metrics, no health checks:

- **Blind operations**: Can't see system health
- **No alerting**: Can't detect issues proactively
- **No performance tracking**: Can't optimize without data

**Mitigation**: Add Prometheus metrics, health endpoint, logging aggregation

**3. No Horizontal Scaling**

Single-instance deployment:

- **Limited capacity**: Can't handle high traffic
- **No redundancy**: Single point of failure
- **No load balancing**: Can't distribute load

**Mitigation**: Add load balancer support, stateless design (already stateless)

**4. Manual Cache Management**

No automatic cache cleanup:

- **Disk growth**: Cache grows indefinitely
- **Manual intervention**: Requires manual cleanup scripts
- **No monitoring**: Don't know cache size without checking

**Mitigation**: Add automatic cache eviction, cache size monitoring

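One possible eviction policy is least-recently-used with a size budget. The sketch below only decides which entries to delete and is independent of how the cache is stored; `CacheEntry` and `selectEvictions` are hypothetical names, and the policy itself is a suggestion rather than anything Harmony implements.

```typescript
// Hypothetical metadata kept per cached response.
interface CacheEntry {
  key: string;
  sizeBytes: number;
  lastAccessMs: number;
}

// Return the keys to delete, least recently used first,
// until the remaining entries fit within `maxBytes`.
function selectEvictions(entries: CacheEntry[], maxBytes: number): string[] {
  let total = entries.reduce((sum, e) => sum + e.sizeBytes, 0);
  const evicted: string[] = [];
  const oldestFirst = [...entries].sort(
    (a, b) => a.lastAccessMs - b.lastAccessMs,
  );
  for (const entry of oldestFirst) {
    if (total <= maxBytes) break;
    evicted.push(entry.key);
    total -= entry.sizeBytes;
  }
  return evicted;
}
```

Because the running total is already computed here, the same pass could also feed a cache-size metric.
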
### Documentation (Score: 7/10)

#### Strengths

**1. Inline Comments**

Code is well-commented:

- **Type definitions**: Comprehensive JSDoc comments
- **Complex logic**: Explanations for non-obvious code
- **Examples**: Usage examples in comments

**2. Type Definitions as Documentation**

273-line HarmonyRelease schema is self-documenting:

- **Clear structure**: Types show data model
- **IDE support**: Autocomplete and type hints
- **Always up-to-date**: Types can't be out of sync with code

**3. Test Specs as Documentation**

Declarative provider tests show usage:

- **Examples**: Tests demonstrate how to use providers
- **Expected behavior**: Tests document expected outputs

#### Weaknesses

**1. No Architecture Documentation**

No high-level architecture docs:

- **Onboarding difficulty**: New contributors must read code
- **No diagrams**: Visual learners have no reference
- **No decision records**: Don't know why choices were made

**Mitigation**: Add architecture documentation (this analysis addresses this)

**2. No API Documentation**

No OpenAPI/Swagger spec:

- **Integration difficulty**: Developers must read code to understand API
- **No interactive docs**: Can't try API in browser

**Mitigation**: Add OpenAPI spec (once REST API is added)

**3. No User Guide**

No end-user documentation:

- **Learning curve**: Users must figure out UI themselves
- **No tutorials**: No step-by-step guides
- **No FAQ**: Common questions not answered

**Mitigation**: Add user guide with screenshots and examples

## Comparison with Alternatives

### vs. Beets

**Beets**: Music library management tool with metadata fetching

| Aspect | Harmony | Beets |
|--------|---------|-------|
| **Purpose** | MusicBrainz seeding | Library management |
| **Architecture** | Web UI + CLI | CLI only |
| **Providers** | 9 providers | MusicBrainz + plugins |
| **Merge algorithm** | 3-phase intelligent merge | Plugin-based |
| **MusicBrainz integration** | Seeding focus | Lookup focus |
| **Language** | TypeScript/Deno | Python |
| **Deployment** | Self-hosted web app | Local CLI tool |

**Verdict**: Harmony is better for MusicBrainz seeding, Beets is better for library management.

### vs. Picard

**Picard**: MusicBrainz official tagger

| Aspect | Harmony | Picard |
|--------|---------|--------|
| **Purpose** | Multi-source aggregation | MusicBrainz tagging |
| **Architecture** | Web UI | Desktop GUI |
| **Providers** | 9 providers | MusicBrainz + AcoustID |
| **Merge algorithm** | Intelligent merge | MusicBrainz priority |
| **Use case** | Release research | File tagging |
| **Language** | TypeScript/Deno | Python/Qt |

**Verdict**: Harmony is better for release research, Picard is better for file tagging.

### vs. Custom Scraper

**Custom Scraper**: Ad-hoc provider integration

| Aspect | Harmony | Custom Scraper |
|--------|---------|----------------|
| **Architecture** | 4-stage pipeline | Ad-hoc |
| **Provider abstraction** | Base classes | None |
| **Merge algorithm** | 3-phase intelligent | Manual |
| **Type safety** | Full TypeScript | Varies |
| **Testing** | 38 test files | Varies |
| **Maintenance** | Single codebase | Per-scraper |

**Verdict**: Harmony is vastly superior to custom scrapers.

## Adoption Recommendations

### What to Adopt

#### 1. Architecture Patterns (Priority: CRITICAL)

**Adopt**:
- 4-stage pipeline (LOOKUP → HARMONIZE → MERGE → SEED)
- Provider base class hierarchy
- Feature quality rating system
- Graceful degradation via Promise.allSettled

**Rationale**: These patterns are proven, well-designed, and solve real problems.

**Implementation**:
```typescript
// Outline only: Release, HarmonyRelease, FeatureQualityMap, LookupInput,
// MergedHarmonyRelease and the helper functions are assumed to be defined elsewhere.

// Adopt provider base class
abstract class MetadataProvider {
  abstract name: string;
  abstract urlPattern: URLPattern;
  abstract lookupByUrl(url: string): Promise<Release>;
  abstract harmonize(release: Release): HarmonyRelease;
  abstract featureQuality: FeatureQualityMap;
}

// Adopt 4-stage pipeline
async function aggregateMetadata(input: LookupInput): Promise<MergedHarmonyRelease> {
  // Stage 1: LOOKUP
  const releases = await combinedLookup(input);

  // Stage 2: HARMONIZE (already done in provider.lookup)

  // Stage 3: MERGE
  const merged = await mergeReleases(releases);

  // Stage 4: SEED (optional). The seed payload is not part of the return value;
  // a caller that needs it would render or submit mbFormat separately.
  const mbFormat = await convertToMusicBrainz(merged);

  return merged;
}
```

#### 2. Data Model (Priority: HIGH)

**Adopt**:
- HarmonyRelease schema (273 lines)
- PartialDate interface
- ArtistCreditName with join phrases
- SourceMap for data provenance
- IncompatibilityInfo for conflict reporting

**Rationale**: Comprehensive, well-designed, covers all metadata needs.

**Modifications**:
- Add `schemaVersion` field
- Add `extensions` object for provider-specific data

#### 3. Merge Algorithm (Priority: HIGH)

**Adopt**:
- 3-phase merge (collect → check compatibility → select best)
- Provider preference system
- Compatibility checking
- Conflict reporting

**Rationale**: Solves the "which source wins" problem elegantly.

**Enhancements**:
- Add user override mechanism
- Add machine learning for automatic preference learning

#### 4. Testing Patterns (Priority: MEDIUM)

**Adopt**:
- Declarative provider tests (`describeProvider`)
- Offline testing with cached responses
- Snapshot testing

**Rationale**: Reduces boilerplate, improves maintainability.

### What to Modify

#### 1. Add REST API (Priority: CRITICAL)

**Current**: Web UI only

**Proposed**: Add REST API layer

**Endpoints**:
```
GET /api/v1/release?gtin={gtin}&region={region}
GET /api/v1/release?url={url}
POST /api/v1/release/batch
GET /api/v1/providers
GET /api/v1/providers/{name}
```

**Response format**: JSON (HarmonyRelease or MergedHarmonyRelease)

**Benefits**:
- Programmatic access
- Integration with other applications
- Mobile app support
- Batch processing

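Validating the query string for the proposed `GET /api/v1/release` endpoint could look like the sketch below. Everything here is an assumption layered on the proposal: `ReleaseQuery` and `parseReleaseQuery` are hypothetical names, the "8 to 14 digits" GTIN check is a rough approximation, and rejecting requests that supply both `gtin` and `url` is one possible design choice.

```typescript
// Hypothetical parsed form of the two lookup styles proposed above.
type ReleaseQuery =
  | { kind: "gtin"; gtin: string; region?: string }
  | { kind: "url"; url: string };

type ParseResult = ReleaseQuery | { error: string };

// Parse and validate the query parameters of a release lookup request.
function parseReleaseQuery(requestUrl: string): ParseResult {
  const params = new URL(requestUrl).searchParams;
  const gtin = params.get("gtin");
  const url = params.get("url");

  if (gtin !== null && url !== null) {
    return { error: "specify either gtin or url, not both" };
  }
  if (gtin !== null) {
    // Rough check only; real GTINs also carry a check digit.
    if (!/^\d{8,14}$/.test(gtin)) {
      return { error: "gtin must be 8 to 14 digits" };
    }
    return { kind: "gtin", gtin, region: params.get("region") ?? undefined };
  }
  if (url !== null) {
    try {
      new URL(url); // throws on malformed input
    } catch {
      return { error: "url is not a valid absolute URL" };
    }
    return { kind: "url", url };
  }
  return { error: "either gtin or url is required" };
}
```

Returning a tagged result instead of throwing keeps the handler's mapping to HTTP status codes (for example 400 for every `error` case) in one place.
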
#### 2. Add Containerization (Priority: HIGH)

**Current**: No Docker

**Proposed**: Add Dockerfile and docker-compose.yml

**Dockerfile**:
```dockerfile
FROM denoland/deno:1.37.0

WORKDIR /app
COPY . .

RUN deno cache server/main.ts

EXPOSE 8000
CMD ["deno", "run", "-A", "server/main.ts"]
```

**docker-compose.yml**:
```yaml
version: '3.8'
services:
  harmony:
    build: .
    ports:
      - "8000:8000"
    environment:
      - HARMONY_SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - HARMONY_SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}
    volumes:
      - ./data:/var/lib/harmony
```

**Benefits**:
- Consistent environments
- Easy deployment
- Orchestration support (Kubernetes)

#### 3. Add Monitoring (Priority: HIGH)

**Current**: No metrics, no health checks

**Proposed**: Add Prometheus metrics and health endpoint

**Metrics**:
- Request count by route
- Request duration by route
- Provider success/failure rate
- Cache hit/miss rate
- Merge conflict rate

**Health endpoint**:
```typescript
// GET /health
{
  "status": "ok",
  "version": "v1.2.3",
  "uptime": 3600,
  "providers": {
    "spotify": "ok",
    "deezer": "ok",
    "itunes": "degraded"
  }
}
```

**Benefits**:
- Proactive issue detection
- Performance optimization
- Capacity planning

#### 4. Add Provider Health Monitoring (Priority: MEDIUM)

**Current**: Silent provider failures

**Proposed**: Track provider availability and performance

**Implementation**:
```typescript
interface ProviderHealth {
  name: string;
  status: 'ok' | 'degraded' | 'down';
  successRate: number;      // Last 100 requests
  avgResponseTime: number;  // Milliseconds
  lastSuccess: number;      // Timestamp
  lastFailure: number;      // Timestamp
  lastError?: string;
}
```

**Benefits**:
- Identify unreliable providers
- Adjust provider preferences dynamically
- Alert on provider failures

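Deriving `status`, `successRate`, and `avgResponseTime` from a rolling window of request outcomes might look like this. The thresholds (95% for `ok`, 50% for `degraded`), the `RequestOutcome` shape, and the function name `summarizeHealth` are assumptions for illustration; timestamps and `lastError` are left out to keep the sketch short.

```typescript
type ProviderStatus = "ok" | "degraded" | "down";

// Hypothetical record of one provider request.
interface RequestOutcome {
  ok: boolean;
  durationMs: number;
}

interface HealthSummary {
  name: string;
  status: ProviderStatus;
  successRate: number; // 0..1 over the window
  avgResponseTime: number; // milliseconds
}

// Summarize the most recent `windowSize` outcomes for one provider.
function summarizeHealth(
  name: string,
  outcomes: RequestOutcome[],
  windowSize: number = 100,
): HealthSummary {
  const recent = outcomes.slice(-windowSize);
  if (recent.length === 0) {
    // No traffic yet: report healthy rather than guessing.
    return { name, status: "ok", successRate: 1, avgResponseTime: 0 };
  }
  const successes = recent.filter((o) => o.ok).length;
  const successRate = successes / recent.length;
  const avgResponseTime =
    recent.reduce((sum, o) => sum + o.durationMs, 0) / recent.length;

  // Assumed thresholds; tune per provider in practice.
  const status: ProviderStatus =
    successRate >= 0.95 ? "ok" : successRate >= 0.5 ? "degraded" : "down";
  return { name, status, successRate, avgResponseTime };
}
```
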
### What to Avoid

#### 1. Don't Add Database (Priority: HIGH)

**Current**: Cache-first, no database

**Recommendation**: Keep cache-first approach

**Rationale**:
- Simplicity is a strength
- No migrations to manage
- Stateless design enables horizontal scaling
- Permalink system works well with cache

**Exception**: If adding user accounts, use separate auth database (don't mix with metadata)

#### 2. Don't Add Complex Build System (Priority: MEDIUM)

**Current**: Deno handles everything

**Recommendation**: Keep Deno's built-in tooling

**Rationale**:
- Deno fmt, lint, test are sufficient
- No need for Webpack, Vite, etc.
- Fresh handles asset bundling

**Exception**: If migrating to Node.js, use Vite or similar

#### 3. Don't Rewrite in Another Language (Priority: HIGH)

**Current**: TypeScript/Deno

**Recommendation**: Keep TypeScript/Deno

**Rationale**:
- Type safety is critical for data aggregation
- Deno tooling is excellent
- Migration cost is high
- No significant benefits from other languages

**Exception**: If Deno becomes unmaintained (unlikely)

## Integration Strategy

### Phase 1: Study and Prototype (2-4 weeks)

**Goals**:
- Deep understanding of Harmony architecture
- Prototype key components in target stack
- Validate design decisions

**Tasks**:
1. Read all source code
2. Run Harmony locally
3. Test all providers
4. Prototype provider base class
5. Prototype merge algorithm
6. Prototype HarmonyRelease schema

**Deliverables**:
- Architecture documentation (this document)
- Prototype codebase
- Design decisions document

### Phase 2: Core Implementation (6-8 weeks)

**Goals**:
- Implement 4-stage pipeline
- Implement provider abstraction
- Implement merge algorithm
- Implement 3-5 providers

**Tasks**:
1. Implement MetadataProvider base class
2. Implement HarmonyRelease schema
3. Implement CombinedReleaseLookup
4. Implement merge algorithm
5. Implement Spotify provider
6. Implement Deezer provider
7. Implement MusicBrainz provider
8. Add comprehensive tests

**Deliverables**:
- Working 4-stage pipeline
- 3-5 providers implemented
- Test coverage >80%

### Phase 3: API and Deployment (4-6 weeks)

**Goals**:
- Add REST API
- Add containerization
- Add monitoring
- Deploy to production

**Tasks**:
1. Design REST API
2. Implement API endpoints
3. Add OpenAPI documentation
4. Create Dockerfile
5. Add Prometheus metrics
6. Add health endpoint
7. Deploy to staging
8. Load testing
9. Deploy to production

**Deliverables**:
- REST API with OpenAPI spec
- Docker images
- Monitoring dashboard
- Production deployment

### Phase 4: Expansion (Ongoing)

**Goals**:
- Add more providers
- Improve merge algorithm
- Add features

**Tasks**:
1. Add iTunes provider
2. Add Tidal provider
3. Add Bandcamp provider
4. Improve compatibility checking
5. Add machine learning for provider preferences
6. Add user feedback mechanism

**Deliverables**:
- 9+ providers
- Improved merge accuracy
- User feedback system

## Risk Assessment

### Technical Risks

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **Provider API changes** | High | High | Monitor provider APIs, add health checks, graceful degradation |
| **HTML scraping breaks** | High | Medium | Monitor scraper failures, fallback to other providers |
| **Rate limiting** | Medium | Medium | Respect rate limits, implement backoff, cache aggressively |
| **OAuth2 token expiration** | Low | Low | Automatic token renewal, error handling |
| **Merge conflicts** | Medium | Medium | Comprehensive compatibility checking, user override |
| **Performance degradation** | Low | Medium | Monitoring, caching, optimization |

### Operational Risks

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **Single developer dependency** | High | High | Build community, document architecture, onboard contributors |
| **Deno ecosystem changes** | Low | Medium | Monitor Deno releases, test before upgrading |
| **Fresh framework changes** | Medium | Medium | Pin Fresh version, test before upgrading |
| **Provider terms of service** | Low | High | Review ToS, add rate limiting, respect robots.txt |
| **Cache growth** | Medium | Low | Automatic cache eviction, monitoring |

### Business Risks

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| **Low adoption** | Medium | Medium | Marketing, documentation, community building |
| **Competition** | Low | Low | Focus on MusicBrainz integration, unique features |
| **Maintenance burden** | Medium | Medium | Automate testing, monitoring, deployment |

## Conclusion

Harmony is an **exceptional reference project** for music metadata aggregation. Its architecture, data model, and merge algorithm are best-in-class and should be adopted with minimal modifications.

**Key Takeaways**:

1. **Architecture**: 4-stage pipeline is proven and extensible
2. **Data Model**: HarmonyRelease schema is comprehensive and well-designed
3. **Merge Algorithm**: 3-phase merge with provider preferences solves real problems
4. **Provider Abstraction**: Base class hierarchy enables easy provider addition
5. **Type Safety**: Full TypeScript coverage catches a class of bugs at compile time
6. **Testing**: Declarative provider tests and offline testing are excellent patterns

**Critical Additions**:

1. **REST API**: Essential for programmatic access
2. **Containerization**: Simplifies deployment
3. **Monitoring**: Required for production operations
4. **Documentation**: Improves onboarding and adoption

**Adoption Path**:

1. Study Harmony architecture (2-4 weeks)
2. Implement core components (6-8 weeks)
3. Add API and deployment (4-6 weeks)
4. Expand providers and features (ongoing)

**Expected Outcome**: Production-ready metadata aggregation system with 9+ providers, intelligent merging, and MusicBrainz integration within 3-4 months (12-18 weeks across phases 1-3).

## Relevance Score: 10/10

Harmony is the **most relevant project** for metadata aggregation:

- **Architecture**: Best-in-class multi-source aggregation
- **Data Model**: Comprehensive and well-designed
- **MusicBrainz Integration**: Seamless seeding workflow
- **Code Quality**: Type-safe, well-tested, maintainable
- **Production-Ready**: Used by MusicBrainz community

**Recommendation**: **Adopt Harmony's architecture as the foundation** for the metadata aggregation system. The investment in studying and adapting Harmony will pay dividends in reduced development time, fewer bugs, and better design decisions.