Harmony - Evaluation and Recommendations
Executive Summary
Harmony is the most relevant and architecturally sound reference project for building a music metadata aggregation system. Its 4-stage pipeline (LOOKUP → HARMONIZE → MERGE → SEED), provider abstraction system, and intelligent merge algorithm represent best-in-class design patterns for multi-source data integration.
Key Strengths:
- Best-in-class multi-source aggregation architecture
- Intelligent 3-phase merge algorithm with provider preferences
- Comprehensive 273-line HarmonyRelease schema
- MusicBrainz integration with MBID resolution and seeding
- Type-safe TypeScript implementation with broad unit test coverage (38 test files)
- Graceful degradation via Promise.allSettled
- Permalink system for reproducible results
Key Limitations:
- Web UI only (no REST/JSON API)
- Single developer project (bus factor = 1)
- No containerization (Docker)
- HTML scraping providers are fragile
- No monitoring/metrics infrastructure
Recommendation: Adopt Harmony's architecture patterns while addressing limitations through:
- Add REST API layer for programmatic access
- Containerize for easier deployment
- Add monitoring and metrics
- Expand provider ecosystem
- Build community around project
Detailed Evaluation
Architecture (Score: 9.5/10)
Strengths
1. 4-Stage Pipeline Design
The LOOKUP → HARMONIZE → MERGE → SEED pipeline is exceptionally well-designed:
- Clear separation of concerns: Each stage has distinct responsibilities
- Composable: Stages can be used independently or combined
- Testable: Each stage can be tested in isolation
- Extensible: New providers or merge strategies can be added without affecting other stages
Example Use Cases:
- LOOKUP only: Fetch data from providers without harmonization
- LOOKUP + HARMONIZE: Get standardized data without merging
- Full pipeline: Complete aggregation and MusicBrainz seeding
2. Provider Abstraction System
The base class hierarchy is exemplary:
MetadataProvider (abstract)
├── MetadataApiProvider (OAuth2)
├── ReleaseLookup (GTIN/URL/ID)
└── ReleaseApiLookup (multi-region)
Benefits:
- Consistent interface: All providers implement same methods
- Code reuse: Common functionality (caching, rate limiting, OAuth2) in base classes
- Easy provider addition: New providers require minimal boilerplate
- Feature quality ratings: Transparent quality assessment
3. Intelligent Merge Algorithm
The 3-phase merge (collect → check compatibility → select best) is sophisticated:
- Compatibility checking: Detects conflicts before merging
- Provider preferences: Configurable priority order
- Source tracking: SourceMap records which provider contributed each field
- Conflict reporting: IncompatibilityInfo provides detailed conflict information
Real-world value: Solves the "which source wins" problem elegantly.
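The three phases can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Harmony's actual code: the types `ProviderRelease`, `Candidate`, and `MergeResult`, the function name `mergeByPreference`, and the strict-equality compatibility check are all hypothetical simplifications.

```typescript
// Hypothetical, simplified shapes; the real HarmonyRelease schema is far richer.
type FieldValue = string | number;

interface ProviderRelease {
  provider: string;
  fields: Record<string, FieldValue | undefined>;
}

interface Candidate {
  provider: string;
  value: FieldValue;
}

interface Incompatibility {
  field: string;
  candidates: Candidate[];
}

interface MergeResult {
  merged: Record<string, FieldValue>;
  sourceMap: Record<string, string>; // field -> provider that supplied the value
  incompatibilities: Incompatibility[];
}

function mergeByPreference(
  releases: ProviderRelease[],
  preferences: string[], // provider names, most preferred first
): MergeResult {
  // Phase 1: collect every provider's value for each field.
  const collected: Record<string, Candidate[]> = {};
  for (const release of releases) {
    for (const field of Object.keys(release.fields)) {
      const value = release.fields[field];
      if (value === undefined) continue;
      const candidates = collected[field] ?? [];
      candidates.push({ provider: release.provider, value });
      collected[field] = candidates;
    }
  }

  // Providers missing from the preference list rank after all listed ones.
  const rank = (provider: string): number => {
    const index = preferences.indexOf(provider);
    return index === -1 ? preferences.length : index;
  };

  const result: MergeResult = { merged: {}, sourceMap: {}, incompatibilities: [] };
  for (const field of Object.keys(collected)) {
    const candidates = collected[field] ?? [];
    // Phase 2: check compatibility (assumption: values must be strictly equal).
    if (new Set(candidates.map((c) => c.value)).size > 1) {
      result.incompatibilities.push({ field, candidates });
    }
    // Phase 3: select the value from the most preferred provider.
    const best = candidates.reduce((a, b) => (rank(b.provider) < rank(a.provider) ? b : a));
    result.merged[field] = best.value;
    result.sourceMap[field] = best.provider;
  }
  return result;
}
```

In this sketch a conflict is reported and the preferred provider's value is still selected; a stricter variant could refuse to merge a field once it is flagged as incompatible.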
4. Type Safety
Full TypeScript coverage with 273-line HarmonyRelease schema ensures:
- Compile-time error detection: Catch bugs before runtime
- IDE autocomplete: Better developer experience
- Self-documenting: Types serve as documentation
- Refactoring safety: Changes propagate through type system
Weaknesses
1. No REST API
Web UI only limits programmatic access:
- Integration difficulty: Other applications can't easily consume data
- Automation challenges: No API for batch processing
- Mobile apps: Can't build native mobile clients
Mitigation: Add REST API layer (see recommendations)
2. Tight Coupling to Fresh Framework
Fresh is Deno-only, limiting deployment options:
- No Node.js support: Can't run on Node.js infrastructure
- Framework lock-in: Migrating to another framework would be difficult
- Smaller ecosystem: Fresh has fewer resources than Next.js/Remix
Mitigation: Extract core logic into framework-agnostic library
Data Model (Score: 9/10)
Strengths
1. Comprehensive HarmonyRelease Schema
273 lines covering all music metadata needs:
- Basic metadata: Title, artists, GTIN
- Media structure: Multi-disc support with tracks
- Commercial info: Labels, catalog numbers, copyright
- Distribution: Available/excluded countries
- Visual assets: Images with dimensions and types
- External links: Provider URLs with link types
- Metadata about metadata: Providers, messages, source map
Coverage: Matches or exceeds MusicBrainz schema.
2. Partial Date Support
PartialDate interface handles incomplete dates:
{ year: 2014 } // Year only
{ year: 2014, month: 11 } // Year and month
{ year: 2014, month: 11, day: 24 } // Full date
Real-world value: Many releases have incomplete release dates.
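A small formatter shows how such partial dates might be consumed. The field names follow the examples above; the `formatPartialDate` helper and the choice to ignore a day without a month are assumptions made for illustration.

```typescript
// Field names taken from the examples above; `year` is assumed to be required.
interface PartialDate {
  year: number;
  month?: number;
  day?: number;
}

// Format as YYYY, YYYY-MM, or YYYY-MM-DD, stopping at the first missing part.
function formatPartialDate(date: PartialDate): string {
  const parts = [String(date.year).padStart(4, "0")];
  if (date.month !== undefined) {
    parts.push(String(date.month).padStart(2, "0"));
    // A day is only meaningful when the month is known.
    if (date.day !== undefined) {
      parts.push(String(date.day).padStart(2, "0"));
    }
  }
  return parts.join("-");
}
```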
3. Artist Credit System
ArtistCreditName[] with join phrases:
[
  { name: "Artist A", joinPhrase: " & " },
  { name: "Artist B", joinPhrase: " feat. " },
  { name: "Artist C" }
]
// Renders: "Artist A & Artist B feat. Artist C"
Real-world value: Handles complex artist credits (collaborations, features, etc.)
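Rendering such a credit amounts to concatenating each name with its join phrase. The sketch below assumes only the two fields shown in the example; `renderArtistCredit` is a hypothetical helper name.

```typescript
// Only the two fields from the example above; the real type may carry more.
interface ArtistCreditName {
  name: string;
  joinPhrase?: string;
}

// Concatenate each name with its join phrase; a missing phrase adds nothing.
function renderArtistCredit(credits: ArtistCreditName[]): string {
  return credits.map((credit) => credit.name + (credit.joinPhrase ?? "")).join("");
}
```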
4. Source Tracking
SourceMap records which provider contributed each field:
{
  "title": "spotify",
  "releaseDate": "spotify",
  "gtin": "deezer",
  "media[0].tracks[0].isrc": "spotify"
}
Real-world value: Enables data provenance and debugging.
Weaknesses
1. No Versioning
Schema has no version field:
- Breaking changes: No way to detect schema version
- Migration challenges: Can't handle multiple schema versions simultaneously
Mitigation: Add schemaVersion field to HarmonyRelease
2. Limited Extensibility
No extension mechanism for provider-specific data:
- Custom fields: No way to store provider-specific metadata
- Experimental features: Can't add new fields without schema change
Mitigation: Add extensions object for provider-specific data
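Both mitigations could look roughly like the sketch below. Neither `schemaVersion` nor `extensions` exists in the schema as described; the type names, the version number, and the `upgradeRelease` helper are hypothetical, and only `title` stands in for the full set of release fields.

```typescript
// Hypothetical: the current version number is an arbitrary example value.
const CURRENT_SCHEMA_VERSION = 2;

type Extensions = Record<string, Record<string, unknown>>; // keyed by provider name

// Shape of a payload as it might be found in a cache or permalink.
interface StoredRelease {
  schemaVersion?: number; // absent on payloads written before versioning
  title: string;
  extensions?: Extensions;
}

interface VersionedRelease {
  schemaVersion: number;
  title: string;
  extensions: Extensions;
}

// Upgrade a stored payload; payloads without a version are treated as version 1.
function upgradeRelease(stored: StoredRelease): VersionedRelease {
  const version = stored.schemaVersion ?? 1;
  if (version > CURRENT_SCHEMA_VERSION) {
    throw new Error(`Unsupported schema version ${version}`);
  }
  // A real migration would transform fields here, one version step at a time.
  return {
    schemaVersion: CURRENT_SCHEMA_VERSION,
    title: stored.title,
    extensions: stored.extensions ?? {},
  };
}
```

Keeping provider-specific data under a provider-keyed `extensions` object means experimental fields never collide with core schema fields.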
Provider Integration (Score: 8.5/10)
Strengths
1. Diverse Provider Ecosystem
9 providers covering major platforms:
- Streaming: Spotify, Deezer, Tidal
- Purchase: iTunes, Bandcamp, Beatport
- Regional: Mora, Ototoy (Japan)
- Reference: MusicBrainz
Coverage: Excellent global coverage with regional specialists.
2. Multi-Access Methods
Both API-based (5) and HTML scraping (4):
- API-based: Reliable, structured data
- HTML scraping: Access to platforms without APIs
Flexibility: Can integrate any platform regardless of API availability.
3. OAuth2 Support
Spotify and Tidal use OAuth2 with token caching:
- Secure: Industry-standard authentication
- Efficient: Token caching reduces auth requests
- Automatic renewal: Handles token expiration
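Token caching with automatic renewal can be sketched as a small cache that reports when a fresh token is needed. The `TokenCache` class and its 30-second safety margin are assumptions for illustration; the only external fact relied on is that OAuth2 token responses carry a lifetime in seconds (`expires_in`).

```typescript
interface CachedToken {
  accessToken: string;
  expiresAt: number; // epoch milliseconds
}

class TokenCache {
  private token: CachedToken | undefined;
  private readonly safetyMarginMs: number;

  // Treat tokens as expired slightly early so none expires mid-request.
  constructor(safetyMarginMs = 30000) {
    this.safetyMarginMs = safetyMarginMs;
  }

  // Store a token together with the lifetime reported by the token endpoint.
  store(accessToken: string, expiresInSeconds: number, now: number = Date.now()): void {
    this.token = { accessToken, expiresAt: now + expiresInSeconds * 1000 };
  }

  // Return the cached token, or undefined when a new one must be requested.
  get(now: number = Date.now()): string | undefined {
    if (this.token === undefined) return undefined;
    return now < this.token.expiresAt - this.safetyMarginMs ? this.token.accessToken : undefined;
  }
}
```

A provider would call `get()` before each request and only contact the token endpoint when it returns `undefined`.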
4. Rate Limiting
Per-provider rate limiters with exponential backoff:
- API compliance: Respects provider rate limits
- Retry-After support: Parses and respects Retry-After headers
- Configurable: Different limits per provider
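The interplay between exponential backoff and the Retry-After header can be sketched as a pure delay calculation. The function names, the 500 ms base, and the 30 s cap are illustrative assumptions rather than Harmony's settings; jitter is omitted for brevity.

```typescript
// Parse a Retry-After value (delay in seconds or an HTTP date) into milliseconds.
function parseRetryAfter(header: string | null, now: number = Date.now()): number | undefined {
  if (header === null || header.trim() === "") return undefined;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? undefined : Math.max(0, date - now);
}

// Delay before retry number `attempt` (0-based).
// A server-provided Retry-After wins; otherwise the delay doubles per attempt up to maxMs.
function retryDelayMs(
  attempt: number,
  retryAfter: string | null,
  baseMs = 500,
  maxMs = 30000,
  now: number = Date.now(),
): number {
  const serverDelay = parseRetryAfter(retryAfter, now);
  if (serverDelay !== undefined) return serverDelay;
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Per-provider configuration then reduces to passing different `baseMs` and `maxMs` values for each provider.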
5. Multi-Region Support
iTunes queries multiple regions in parallel:
- Global coverage: Access region-specific releases
- Parallel execution: Faster than sequential queries
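Parallel region queries combine naturally with `Promise.allSettled`, so one failing region does not discard the others. The sketch below is an assumption about how this could be structured; `RegionLookup`, `RegionResults`, and `lookupAllRegions` are hypothetical names, and a real lookup would issue one HTTP request per region.

```typescript
// Hypothetical lookup signature for a single region.
type RegionLookup<T> = (region: string) => Promise<T>;

interface RegionResults<T> {
  found: Array<{ region: string; value: T }>;
  failed: Array<{ region: string; reason: string }>;
}

async function lookupAllRegions<T>(
  regions: string[],
  lookup: RegionLookup<T>,
): Promise<RegionResults<T>> {
  // Start every lookup at once. allSettled never rejects, so a failing
  // region cannot abort the others.
  const settled = await Promise.allSettled(regions.map((region) => lookup(region)));
  const results: RegionResults<T> = { found: [], failed: [] };
  settled.forEach((outcome, index) => {
    const region = regions[index] ?? "unknown";
    if (outcome.status === "fulfilled") {
      results.found.push({ region, value: outcome.value });
    } else {
      results.failed.push({ region, reason: String(outcome.reason) });
    }
  });
  return results;
}
```

The same pattern applies across providers: failed lookups are reported alongside the successful ones instead of failing the whole request.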
Weaknesses
1. HTML Scraping Fragility
4 providers rely on HTML scraping:
- Breaks on redesigns: Site changes break scrapers
- Maintenance burden: Requires constant updates
- No guarantees: Sites can block scrapers
Mitigation: Add monitoring for scraper failures, fallback to other providers
2. KKBOX Not Implemented
Mentioned but not implemented:
- Missing coverage: No Taiwan/Hong Kong/Southeast Asia specialist
- Incomplete: Documentation mentions it but code doesn't include it
Mitigation: Implement KKBOX provider or remove from documentation
3. No Provider Health Monitoring
No system to track provider availability:
- Silent failures: Providers can fail without notification
- No metrics: Can't track provider reliability over time
Mitigation: Add provider health checks and metrics
MusicBrainz Integration (Score: 9/10)
Strengths
1. Batch MBID Resolution
100 URLs per request:
- Efficient: Reduces API calls by up to 100x
- Fast: Single request instead of 100
- Caching: Results cached for future lookups
Real-world value: Essential for duplicate detection.
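Batching comes down to splitting the URL list into groups of at most 100 before issuing requests. The `chunk` helper below is a generic sketch, not code from Harmony.

```typescript
// Split a list into batches of at most `size` items (100 URLs per request above).
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}
```

With 250 URLs this yields three requests (100, 100, and 50 URLs) instead of 250 individual lookups.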
2. Duplicate Detection
Checks if external URLs already linked to MusicBrainz:
- Prevents duplicates: Warns before creating duplicate releases
- Links to existing: Provides link to existing release
- User-friendly: Clear warning messages
3. Seeding Integration
Pre-filled form for MusicBrainz import:
- Edit notes: Include provider URLs and permalink
- Annotation: Extra metadata not in main form
- Copy-to-clipboard: Easy data transfer
4. Template Provider Mode
MusicBrainz as reference data:
- Verification: Compare external sources against MusicBrainz
- Quality control: Identify discrepancies
- Improvement: Find missing data in MusicBrainz
Weaknesses
1. No Automatic Submission
Manual copy-paste required:
- Friction: User must manually transfer data
- Error-prone: Copy-paste can introduce errors
Mitigation: Add MusicBrainz API submission (requires user authentication)
2. No Edit Tracking
No way to track submitted edits:
- No feedback: User doesn't know if edit was accepted
- No metrics: Can't measure Harmony's impact on MusicBrainz
Mitigation: Add edit tracking via MusicBrainz API
Testing and Quality (Score: 9/10)
Strengths
1. Comprehensive Test Coverage
38 test files covering all modules:
- Providers: All 9 providers tested
- Harmonizer: Merge, compatibility, deduplication tested
- MusicBrainz: Seeding, MBID resolution tested
2. Declarative Provider Tests
describeProvider helper reduces boilerplate:
- Consistent: All providers tested the same way
- Maintainable: Changes to test structure affect all providers
- Readable: Tests are self-documenting
3. Offline Testing
43 cached responses in testdata/:
- Fast: No network requests during tests
- Reproducible: Same results every time
- Offline-friendly: Can test without internet
4. Snapshot Testing
Verify output stability:
- Regression detection: Catch unintended changes
- Easy updates: Update snapshots when changes are intentional
Weaknesses
1. No Integration Tests
Only unit tests, no end-to-end tests:
- Missing coverage: Full pipeline not tested together
- Real-world scenarios: Can't test actual provider interactions
Mitigation: Add integration tests with real provider calls (optional, gated by flag)
2. No Performance Tests
No benchmarks or performance tests:
- No baselines: Can't detect performance regressions
- No optimization targets: Don't know what to optimize
Mitigation: Add benchmark tests for critical paths (merge algorithm, provider lookups)
Deployment and Operations (Score: 6/10)
Strengths
1. Simple Deployment
No Docker, no Kubernetes:
- Low complexity: Easy to understand and debug
- Fast startup: No container overhead
- Direct access: Can inspect process directly
2. systemd Integration
Standard Linux service management:
- Familiar: Most Linux admins know systemd
- Reliable: systemd handles restarts, logging
- Secure: systemd security hardening options
3. CI/CD Automation
GitHub Actions with SSH deployment:
- Automated: Deploy on git tag
- Simple: No complex orchestration
- Reliable: SSH is battle-tested
Weaknesses
1. No Containerization
No Docker support:
- Deployment friction: Requires Deno installation on server
- Inconsistent environments: Dev/prod differences possible
- No orchestration: Can't use Kubernetes, Docker Swarm
Mitigation: Add Dockerfile and docker-compose.yml
2. No Monitoring
No metrics, no health checks:
- Blind operations: Can't see system health
- No alerting: Can't detect issues proactively
- No performance tracking: Can't optimize without data
Mitigation: Add Prometheus metrics, health endpoint, logging aggregation
3. No Horizontal Scaling
Single-instance deployment:
- Limited capacity: Can't handle high traffic
- No redundancy: Single point of failure
- No load balancing: Can't distribute load
Mitigation: Run multiple instances behind a load balancer; the design is already stateless, so no application changes should be needed
4. Manual Cache Management
No automatic cache cleanup:
- Disk growth: Cache grows indefinitely
- Manual intervention: Requires manual cleanup scripts
- No monitoring: Don't know cache size without checking
Mitigation: Add automatic cache eviction, cache size monitoring
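One way to implement automatic eviction is a size budget with least-recently-used removal. The sketch below only decides which keys to delete; `CacheEntry` and `selectEvictions` are hypothetical, and how Harmony stores its cache entries on disk is not assumed here.

```typescript
// Hypothetical metadata kept per cached response.
interface CacheEntry {
  key: string;
  sizeBytes: number;
  lastAccess: number; // epoch milliseconds
}

// Return the keys to delete so the cache fits within maxBytes,
// evicting the least recently used entries first.
function selectEvictions(entries: CacheEntry[], maxBytes: number): string[] {
  let total = entries.reduce((sum, entry) => sum + entry.sizeBytes, 0);
  const evicted: string[] = [];
  const oldestFirst = [...entries].sort((a, b) => a.lastAccess - b.lastAccess);
  for (const entry of oldestFirst) {
    if (total <= maxBytes) break;
    evicted.push(entry.key);
    total -= entry.sizeBytes;
  }
  return evicted;
}
```

The running `total` also gives a value to export for cache size monitoring.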
Documentation (Score: 7/10)
Strengths
1. Inline Comments
Code is well-commented:
- Type definitions: Comprehensive JSDoc comments
- Complex logic: Explanations for non-obvious code
- Examples: Usage examples in comments
2. Type Definitions as Documentation
273-line HarmonyRelease schema is self-documenting:
- Clear structure: Types show data model
- IDE support: Autocomplete and type hints
- Always up-to-date: Types can't be out of sync with code
3. Test Specs as Documentation
Declarative provider tests show usage:
- Examples: Tests demonstrate how to use providers
- Expected behavior: Tests document expected outputs
Weaknesses
1. No Architecture Documentation
No high-level architecture docs:
- Onboarding difficulty: New contributors must read code
- No diagrams: Visual learners have no reference
- No decision records: Don't know why choices were made
Mitigation: Add architecture documentation (this analysis addresses this)
2. No API Documentation
No OpenAPI/Swagger spec:
- Integration difficulty: Developers must read code to understand API
- No interactive docs: Can't try API in browser
Mitigation: Add OpenAPI spec (once REST API is added)
3. No User Guide
No end-user documentation:
- Learning curve: Users must figure out UI themselves
- No tutorials: No step-by-step guides
- No FAQ: Common questions not answered
Mitigation: Add user guide with screenshots and examples
Comparison with Alternatives
vs. Beets
Beets: Music library management tool with metadata fetching
| Aspect | Harmony | Beets |
|---|---|---|
| Purpose | MusicBrainz seeding | Library management |
| Architecture | Web UI + CLI | CLI only |
| Providers | 9 providers | MusicBrainz + plugins |
| Merge algorithm | 3-phase intelligent merge | Plugin-based |
| MusicBrainz integration | Seeding focus | Lookup focus |
| Language | TypeScript/Deno | Python |
| Deployment | Self-hosted web app | Local CLI tool |
Verdict: Harmony is better for MusicBrainz seeding, Beets is better for library management.
vs. Picard
Picard: MusicBrainz official tagger
| Aspect | Harmony | Picard |
|---|---|---|
| Purpose | Multi-source aggregation | MusicBrainz tagging |
| Architecture | Web UI | Desktop GUI |
| Providers | 9 providers | MusicBrainz + AcoustID |
| Merge algorithm | Intelligent merge | MusicBrainz priority |
| Use case | Release research | File tagging |
| Language | TypeScript/Deno | Python/Qt |
Verdict: Harmony is better for release research, Picard is better for file tagging.
vs. Custom Scraper
Custom Scraper: Ad-hoc provider integration
| Aspect | Harmony | Custom Scraper |
|---|---|---|
| Architecture | 4-stage pipeline | Ad-hoc |
| Provider abstraction | Base classes | None |
| Merge algorithm | 3-phase intelligent | Manual |
| Type safety | Full TypeScript | Varies |
| Testing | 38 test files | Varies |
| Maintenance | Single codebase | Per-scraper |
Verdict: Harmony's shared pipeline, provider abstraction, and test suite make it a far stronger foundation than ad-hoc custom scrapers.
Adoption Recommendations
What to Adopt
1. Architecture Patterns (Priority: CRITICAL)
Adopt:
- 4-stage pipeline (LOOKUP → HARMONIZE → MERGE → SEED)
- Provider base class hierarchy
- Feature quality rating system
- Graceful degradation via Promise.allSettled
Rationale: These patterns are proven, well-designed, and solve real problems.
Implementation:
// Adopt provider base class
abstract class MetadataProvider {
  abstract name: string;
  abstract urlPattern: URLPattern;
  abstract lookupByUrl(url: string): Promise<Release>;
  abstract harmonize(release: Release): HarmonyRelease;
  abstract featureQuality: FeatureQualityMap;
}
// Adopt 4-stage pipeline
async function aggregateMetadata(input: LookupInput): Promise<MergedHarmonyRelease> {
  // Stage 1: LOOKUP
  const releases = await combinedLookup(input);
  // Stage 2: HARMONIZE (already done in provider.lookup)
  // Stage 3: MERGE
  const merged = await mergeReleases(releases);
  // Stage 4: SEED (optional) runs separately on the merged result, e.g.
  // const mbFormat = await convertToMusicBrainz(merged);
  return merged;
}
2. Data Model (Priority: HIGH)
Adopt:
- HarmonyRelease schema (273 lines)
- PartialDate interface
- ArtistCreditName with join phrases
- SourceMap for data provenance
- IncompatibilityInfo for conflict reporting
Rationale: Comprehensive, well-designed, covers all metadata needs.
Modifications:
- Add schemaVersion field
- Add extensions object for provider-specific data
3. Merge Algorithm (Priority: HIGH)
Adopt:
- 3-phase merge (collect → check compatibility → select best)
- Provider preference system
- Compatibility checking
- Conflict reporting
Rationale: Solves the "which source wins" problem elegantly.
Enhancements:
- Add user override mechanism
- Add machine learning for automatic preference learning
4. Testing Patterns (Priority: MEDIUM)
Adopt:
- Declarative provider tests (describeProvider)
- Offline testing with cached responses
- Snapshot testing
Rationale: Reduces boilerplate, improves maintainability.
What to Modify
1. Add REST API (Priority: CRITICAL)
Current: Web UI only
Proposed: Add REST API layer
Endpoints:
GET /api/v1/release?gtin={gtin}&region={region}
GET /api/v1/release?url={url}
POST /api/v1/release/batch
GET /api/v1/providers
GET /api/v1/providers/{name}
Response format: JSON (HarmonyRelease or MergedHarmonyRelease)
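Request validation for the proposed release endpoint could be sketched as follows. The endpoint itself is a proposal, so everything here is hypothetical: the `ReleaseQuery` type, the `parseReleaseQuery` helper, and the rules that `gtin` and `url` are mutually exclusive and that region codes are upper-cased.

```typescript
type ReleaseQuery =
  | { kind: "gtin"; gtin: string; region?: string }
  | { kind: "url"; url: string };

// Validate the query string of GET /api/v1/release and turn it into a typed lookup.
function parseReleaseQuery(requestUrl: string): ReleaseQuery {
  const params = new URL(requestUrl).searchParams;
  const gtin = params.get("gtin");
  const url = params.get("url");
  if (gtin !== null && url !== null) {
    throw new Error("Specify either gtin or url, not both");
  }
  if (gtin !== null) {
    // GTINs are 8, 12, 13, or 14 digits long.
    if (!/^(\d{8}|\d{12,14})$/.test(gtin)) throw new Error("Invalid GTIN");
    const region = params.get("region");
    return region === null
      ? { kind: "gtin", gtin }
      : { kind: "gtin", gtin, region: region.toUpperCase() };
  }
  if (url !== null) {
    new URL(url); // throws a TypeError on malformed URLs
    return { kind: "url", url };
  }
  throw new Error("Missing gtin or url parameter");
}
```

A route handler would map the thrown errors to HTTP 400 responses and pass the typed query on to the lookup stage.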
Benefits:
- Programmatic access
- Integration with other applications
- Mobile app support
- Batch processing
2. Add Containerization (Priority: HIGH)
Current: No Docker
Proposed: Add Dockerfile and docker-compose.yml
Dockerfile:
FROM denoland/deno:1.37.0
WORKDIR /app
COPY . .
RUN deno cache server/main.ts
EXPOSE 8000
CMD ["deno", "run", "-A", "server/main.ts"]
docker-compose.yml:
version: '3.8'
services:
  harmony:
    build: .
    ports:
      - "8000:8000"
    environment:
      - HARMONY_SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - HARMONY_SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}
    volumes:
      - ./data:/var/lib/harmony
Benefits:
- Consistent environments
- Easy deployment
- Orchestration support (Kubernetes)
3. Add Monitoring (Priority: HIGH)
Current: No metrics, no health checks
Proposed: Add Prometheus metrics and health endpoint
Metrics:
- Request count by route
- Request duration by route
- Provider success/failure rate
- Cache hit/miss rate
- Merge conflict rate
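A counter such as the provider success/failure rate can be exposed in the Prometheus text exposition format without any dependency. The sketch below is deliberately minimal and rests on assumptions: the metric name and labels are made up, and label values are not escaped, which a production implementation (or an existing client library) would need to handle.

```typescript
// Minimal labeled counter; metric and label names used with it are hypothetical.
class LabeledCounter {
  private readonly counts = new Map<string, number>();
  private readonly metricName: string;

  constructor(metricName: string) {
    this.metricName = metricName;
  }

  // Increment the counter for one label combination.
  inc(labels: Record<string, string>): void {
    // Sort label names so the same combination always maps to the same key.
    const key = Object.keys(labels)
      .sort()
      .map((label) => `${label}="${labels[label]}"`)
      .join(",");
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // Render all samples in the Prometheus text exposition format.
  render(): string {
    const lines = [`# TYPE ${this.metricName} counter`];
    this.counts.forEach((value, key) => {
      lines.push(`${this.metricName}{${key}} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}
```

Success and failure rates per provider are then derived at query time from the two label combinations rather than computed in the application.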
Health endpoint:
// GET /health
{
  "status": "ok",
  "version": "v1.2.3",
  "uptime": 3600,
  "providers": {
    "spotify": "ok",
    "deezer": "ok",
    "itunes": "degraded"
  }
}
Benefits:
- Proactive issue detection
- Performance optimization
- Capacity planning
4. Add Provider Health Monitoring (Priority: MEDIUM)
Current: Silent provider failures
Proposed: Track provider availability and performance
Implementation:
interface ProviderHealth {
  name: string;
  status: 'ok' | 'degraded' | 'down';
  successRate: number; // Last 100 requests
  avgResponseTime: number; // Milliseconds
  lastSuccess: number; // Timestamp
  lastFailure: number; // Timestamp
  lastError?: string;
}
Benefits:
- Identify unreliable providers
- Adjust provider preferences dynamically
- Alert on provider failures
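The `successRate` and `status` fields could be derived from a sliding window over recent request outcomes. The `HealthWindow` class and its thresholds (95% for "ok", 50% for "degraded") are illustrative assumptions, not values taken from Harmony.

```typescript
type HealthStatus = "ok" | "degraded" | "down";

// Keep the outcome of the last `windowSize` requests and derive a status from them.
class HealthWindow {
  private outcomes: boolean[] = [];
  private readonly windowSize: number;

  constructor(windowSize = 100) {
    this.windowSize = windowSize;
  }

  // Record one request outcome, dropping the oldest once the window is full.
  record(success: boolean): void {
    this.outcomes.push(success);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }

  successRate(): number {
    if (this.outcomes.length === 0) return 1; // no data yet: assume healthy
    return this.outcomes.filter(Boolean).length / this.outcomes.length;
  }

  // Thresholds are illustrative assumptions.
  status(): HealthStatus {
    const rate = this.successRate();
    if (rate >= 0.95) return "ok";
    return rate >= 0.5 ? "degraded" : "down";
  }
}
```

The per-provider status values in a health endpoint response can be filled directly from `status()`.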
What to Avoid
1. Don't Add Database (Priority: HIGH)
Current: Cache-first, no database
Recommendation: Keep cache-first approach
Rationale:
- Simplicity is a strength
- No migrations to manage
- Stateless design enables horizontal scaling
- Permalink system works well with cache
Exception: If adding user accounts, use separate auth database (don't mix with metadata)
2. Don't Add Complex Build System (Priority: MEDIUM)
Current: Deno handles everything
Recommendation: Keep Deno's built-in tooling
Rationale:
- Deno fmt, lint, test are sufficient
- No need for Webpack, Vite, etc.
- Fresh handles asset bundling
Exception: If migrating to Node.js, use Vite or similar
3. Don't Rewrite in Another Language (Priority: HIGH)
Current: TypeScript/Deno
Recommendation: Keep TypeScript/Deno
Rationale:
- Type safety is critical for data aggregation
- Deno tooling is excellent
- Migration cost is high
- No significant benefits from other languages
Exception: If Deno becomes unmaintained (unlikely)
Integration Strategy
Phase 1: Study and Prototype (2-4 weeks)
Goals:
- Deep understanding of Harmony architecture
- Prototype key components in target stack
- Validate design decisions
Tasks:
- Read all source code
- Run Harmony locally
- Test all providers
- Prototype provider base class
- Prototype merge algorithm
- Prototype HarmonyRelease schema
Deliverables:
- Architecture documentation (this document)
- Prototype codebase
- Design decisions document
Phase 2: Core Implementation (6-8 weeks)
Goals:
- Implement 4-stage pipeline
- Implement provider abstraction
- Implement merge algorithm
- Implement 3-5 providers
Tasks:
- Implement MetadataProvider base class
- Implement HarmonyRelease schema
- Implement CombinedReleaseLookup
- Implement merge algorithm
- Implement Spotify provider
- Implement Deezer provider
- Implement MusicBrainz provider
- Add comprehensive tests
Deliverables:
- Working 4-stage pipeline
- 3-5 providers implemented
- Test coverage >80%
Phase 3: API and Deployment (4-6 weeks)
Goals:
- Add REST API
- Add containerization
- Add monitoring
- Deploy to production
Tasks:
- Design REST API
- Implement API endpoints
- Add OpenAPI documentation
- Create Dockerfile
- Add Prometheus metrics
- Add health endpoint
- Deploy to staging
- Load testing
- Deploy to production
Deliverables:
- REST API with OpenAPI spec
- Docker images
- Monitoring dashboard
- Production deployment
Phase 4: Expansion (Ongoing)
Goals:
- Add more providers
- Improve merge algorithm
- Add features
Tasks:
- Add iTunes provider
- Add Tidal provider
- Add Bandcamp provider
- Improve compatibility checking
- Add machine learning for provider preferences
- Add user feedback mechanism
Deliverables:
- 9+ providers
- Improved merge accuracy
- User feedback system
Risk Assessment
Technical Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Provider API changes | High | High | Monitor provider APIs, add health checks, graceful degradation |
| HTML scraping breaks | High | Medium | Monitor scraper failures, fallback to other providers |
| Rate limiting | Medium | Medium | Respect rate limits, implement backoff, cache aggressively |
| OAuth2 token expiration | Low | Low | Automatic token renewal, error handling |
| Merge conflicts | Medium | Medium | Comprehensive compatibility checking, user override |
| Performance degradation | Low | Medium | Monitoring, caching, optimization |
Operational Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Single developer dependency | High | High | Build community, document architecture, onboard contributors |
| Deno ecosystem changes | Low | Medium | Monitor Deno releases, test before upgrading |
| Fresh framework changes | Medium | Medium | Pin Fresh version, test before upgrading |
| Provider terms of service | Low | High | Review ToS, add rate limiting, respect robots.txt |
| Cache growth | Medium | Low | Automatic cache eviction, monitoring |
Business Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Low adoption | Medium | Medium | Marketing, documentation, community building |
| Competition | Low | Low | Focus on MusicBrainz integration, unique features |
| Maintenance burden | Medium | Medium | Automate testing, monitoring, deployment |
Conclusion
Harmony is an exceptional reference project for music metadata aggregation. Its architecture, data model, and merge algorithm are best-in-class and should be adopted with minimal modifications.
Key Takeaways:
- Architecture: 4-stage pipeline is proven and extensible
- Data Model: HarmonyRelease schema is comprehensive and well-designed
- Merge Algorithm: 3-phase merge with provider preferences solves real problems
- Provider Abstraction: Base class hierarchy enables easy provider addition
- Type Safety: Full TypeScript coverage prevents bugs
- Testing: Declarative provider tests and offline testing are excellent patterns
Critical Additions:
- REST API: Essential for programmatic access
- Containerization: Simplifies deployment
- Monitoring: Required for production operations
- Documentation: Improves onboarding and adoption
Adoption Path:
- Study Harmony architecture (2-4 weeks)
- Implement core components (6-8 weeks)
- Add API and deployment (4-6 weeks)
- Expand providers and features (ongoing)
Expected Outcome: Production-ready metadata aggregation system with 3-5 providers, intelligent merging, and MusicBrainz integration after Phases 1-3 (12-18 weeks, roughly 3-4 months), growing to 9+ providers during Phase 4.
Relevance Score: 10/10
Harmony is the most relevant project for metadata aggregation:
- Architecture: Best-in-class multi-source aggregation
- Data Model: Comprehensive and well-designed
- MusicBrainz Integration: Seamless seeding workflow
- Code Quality: Type-safe, well-tested, maintainable
- Production-Ready: Used by MusicBrainz community
Recommendation: Adopt Harmony's architecture as the foundation for the metadata aggregation system. The investment in studying and adapting Harmony will pay dividends in reduced development time, fewer bugs, and better design decisions.