Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00

Reverse Engineering Plan

Systematic analysis of all 17 projects in the research folder. Each project follows the 10-phase methodology from REVERSE_ENGINEERING_PROMPT.md.

Output: For each project, create docs/research/{project-slug}/analysis/ with deliverable files.

1. MusicBrainz Server

Repo: https://github.com/metabrainz/musicbrainz-server Language: Perl | Framework: Catalyst

Todos

Phase 1 - Identity & Entry Points: Locate Perl entry point, Catalyst app bootstrap, package manifests (cpanfile), Makefile, Docker setup. Identify version and release cycle.
Phase 2 - Architecture & Structure: Map the source tree (lib/MusicBrainz/), identify MVC layers and module boundaries. Document Catalyst controllers, models, views.
Phase 3 - API Surface: Document REST API at /ws/2/ (XML/JSON). Extract all entity endpoints (artist, release, recording, work, label, area, event, instrument, place, series, url). Map query parameters, includes, subqueries.
Phase 4 - Data Layer: Analyze PostgreSQL schema, find migration scripts, map all entity tables and relationships. Document Solr search integration.
Phase 5 - External Integrations: Cover Art Archive integration, relationship to other MetaBrainz services (ListenBrainz, AcoustID, BookBrainz). Replication system.
Phase 6 - Auth & Security: Document editor authentication, OAuth for API, permission model (auto-editors, voting system).
Phase 7 - Configuration: Extract all environment variables, database config, Solr config, Redis config.
Phase 8 - Testing: Identify test framework (Test::More/Test2), test coverage, CI setup.
Phase 9 - Observability: Logging, metrics, health endpoints.
Phase 10 - Deployment: Docker-compose setup, replication tokens, database initialization, Solr setup. Document resource requirements (~350GB DB).
Synthesize: Write OVERVIEW.md, ARCHITECTURE.md, API.md, DATA.md, INTEGRATIONS.md, DEPLOYMENT.md, CODEBASE.md, EVALUATION.md
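
As a starting point for the Phase 3 endpoint inventory, the sketch below builds lookup URLs for the entity types listed above. It assumes the public host `musicbrainz.org` and the `inc` and `fmt` query parameters; the helper name `lookup_url` and the all-zero MBID are placeholders for illustration, not part of the MusicBrainz codebase.

```python
from urllib.parse import urlencode

# Assumed public root of the /ws/2/ web service.
WS2_ROOT = "https://musicbrainz.org/ws/2"

# Entity types named in Phase 3 of this plan.
ENTITIES = (
    "artist", "release", "recording", "work", "label", "area",
    "event", "instrument", "place", "series", "url",
)


def lookup_url(entity, mbid, inc=(), fmt="json"):
    """Build a lookup URL for one entity (hypothetical helper)."""
    if entity not in ENTITIES:
        raise ValueError(f"unknown entity type: {entity}")
    params = {"fmt": fmt}
    if inc:
        # Multiple includes are joined with '+', which must stay unescaped.
        params["inc"] = "+".join(inc)
    return f"{WS2_ROOT}/{entity}/{mbid}?{urlencode(params, safe='+')}"


# Placeholder MBID, not a real entity.
PLACEHOLDER_MBID = "00000000-0000-0000-0000-000000000000"
print(lookup_url("artist", PLACEHOLDER_MBID, inc=("aliases", "tags")))
```

Iterating over `ENTITIES` gives one URL per endpoint to probe while filling in API.md.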

2. AcoustID

Repo: https://github.com/acoustid/acoustid-server Language: Python | Index: https://github.com/acoustid/acoustid-index (Zig)

Todos

Phase 1 - Identity & Entry Points: Locate Python entry point, identify web framework, find acoustid-index Zig entry. Map both repos (server + index).
Phase 2 - Architecture & Structure: Map server architecture (fingerprint submission, lookup, matching). Understand index architecture (StreamVByte compression, HTTP API).
Phase 3 - API Surface: Document /v2/lookup and /v2/submit endpoints. Extract all query parameters (meta, fingerprint, duration, client). Document response formats.
Phase 4 - Data Layer: Identify database (PostgreSQL), fingerprint storage format, index data structure. Map relationship to MusicBrainz recording IDs.
Phase 5 - External Integrations: MusicBrainz API integration for recording metadata. Chromaprint fingerprint format compatibility.
Phase 6 - Auth & Security: API key system, rate limiting per client.
Phase 7 - Configuration: Environment variables, database config, index config.
Phase 8 - Testing: Test framework, test data.
Phase 9 - Observability: Logging, health checks.
Phase 10 - Deployment: Docker setup for both server and index. Resource requirements.
Synthesize: Write analysis deliverables.
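
For the Phase 3 parameter inventory, a minimal sketch of a /v2/lookup query string using the four parameters named above (meta, fingerprint, duration, client). The function name, the sample fingerprint, and the client key are hypothetical, and the meta values are assumed to be space-separated; verify against the server code during analysis.

```python
from urllib.parse import urlencode


def acoustid_lookup_query(client, fingerprint, duration, meta=("recordings",)):
    """Encode the /v2/lookup parameters listed in the plan (hypothetical helper)."""
    if duration <= 0:
        raise ValueError("duration must be a positive number of seconds")
    return urlencode({
        "client": client,
        "duration": int(duration),
        "fingerprint": fingerprint,
        # Assumption: several meta values are sent as one space-separated field.
        "meta": " ".join(meta),
    })


# Sample values are placeholders, not a real API key or fingerprint.
print(acoustid_lookup_query("demo-key", "AQAA", 215, meta=("recordings", "releasegroups")))
```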

3. ListenBrainz

Repo: https://github.com/metabrainz/listenbrainz-server Language: Python

Todos

Phase 1 - Identity & Entry Points: Locate Flask/web framework entry, CLI scripts, worker processes.
Phase 2 - Architecture & Structure: Map web server, Spark cluster, data pipeline. Identify recommendation engine components.
Phase 3 - API Surface: Document all /1/ API endpoints: listens, stats, recommendations, playlists, social, explore (fresh-releases, lb-radio). Extract auth requirements per endpoint.
Phase 4 - Data Layer: Identify data stores (PostgreSQL, TimescaleDB) and the Spark processing layer. Map listen data schema, user data, recommendation models.
Phase 5 - External Integrations: MusicBrainz mapping, Spotify import, Last.fm import, MBID mapping service.
Phase 6 - Auth & Security: Token-based auth, MusicBrainz OAuth integration.
Phase 7 - Configuration: Environment variables, Spark config, database config.
Phase 8 - Testing: Test framework, test data, CI pipeline.
Phase 9 - Observability: Logging, metrics, Sentry integration.
Phase 10 - Deployment: Docker-compose, Spark cluster setup, resource requirements.
Synthesize: Write analysis deliverables.

4. music-metadata-api

Repo: https://github.com/Aunali321/music-metadata-api Language: Go

Todos

Phase 1 - Identity & Entry Points: Locate main.go, identify HTTP framework, find CLI flags (-db path).
Phase 2 - Architecture & Structure: Map Go package structure. Identify handler/service/repository layers.
Phase 3 - API Surface: Document all endpoints: /lookup/* (isrc, track, artist, album), /search/* (track, artist), /batch/lookup. Extract OpenAPI 3.1 spec. Document rate limiting (100 req/s, burst 200).
Phase 4 - Data Layer: Analyze SQLite schema for both databases. Map tables: tracks, artists, albums. Document indexes, query patterns, batch lookup implementation.
Phase 5 - External Integrations: None expected (self-contained with pre-built DBs). Verify.
Phase 6 - Auth & Security: Identify if any auth exists. Rate limiting implementation.
Phase 7 - Configuration: CLI flags, environment variables, database paths.
Phase 8 - Testing: Test coverage, test data.
Phase 9 - Observability: /health endpoint, logging.
Phase 10 - Deployment: Docker image (ghcr.io), binary build process. Database acquisition process.
Synthesize: Write analysis deliverables.
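
The rate limit noted in Phase 3 (100 req/s, burst 200) reads like a token bucket. The sketch below illustrates that technique so the analysis has something to compare the Go implementation against; whether music-metadata-api actually uses a token bucket is an assumption to verify in Phase 6, and the class is illustrative only.

```python
class TokenBucket:
    """Token-bucket limiter with an injected clock (illustrative, not the project's code)."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate            # tokens refilled per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = now             # timestamp of the previous call

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(rate=100, burst=200)
accepted_at_start = sum(bucket.allow(0.0) for _ in range(250))
accepted_after_half_second = sum(bucket.allow(0.5) for _ in range(100))
print(accepted_at_start, accepted_after_half_second)
```

Passing the clock in explicitly keeps the behavior deterministic, which makes it easy to reproduce edge cases found while reading the real limiter.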

5. MiniMediaMetadataAPI

Repo: https://github.com/MusicMoveArr/MiniMediaMetadataAPI Language: C#

Todos

Phase 1 - Identity & Entry Points: Locate Program.cs / Startup.cs, identify .NET version, find *.csproj files.
Phase 2 - Architecture & Structure: Map C# project structure (Controllers, Services, Models). Identify DI configuration.
Phase 3 - API Surface: Document /api/artists, /api/albums, /api/tracks endpoints. Extract provider query parameter (Any, Tidal, MusicBrainz, Spotify, Deezer, Discogs).
Phase 4 - Data Layer: Analyze PostgreSQL schema (shared with MiniMediaScanner). Map entity models, EF Core migrations.
Phase 5 - External Integrations: Document provider implementations for: MusicBrainz API, Spotify API, Tidal API, Deezer API, Discogs API. Extract auth methods per provider.
Phase 6 - Auth & Security: API authentication, provider credential management.
Phase 7 - Configuration: appsettings.json structure, environment variables, connection strings.
Phase 8 - Testing: Test projects, coverage.
Phase 9 - Observability: Logging (Serilog?), health checks.
Phase 10 - Deployment: Docker image, docker-compose, memory limits (<256M).
Synthesize: Write analysis deliverables.

6. Lidarr Metadata API

Repo: https://github.com/Lidarr/LidarrAPI.Metadata Language: Python

Todos

Phase 1 - Identity & Entry Points: Locate server.py, identify web framework, find lidarr-metadata-server CLI entry.
Phase 2 - Architecture & Structure: Map Python package structure. Identify caching layer (lm_cache_db).
Phase 3 - API Surface: Document metadata endpoints used by Lidarr. Artist lookup, album lookup, search. Response format.
Phase 4 - Data Layer: MusicBrainz PostgreSQL dependency. Cache database schema. Solr search integration.
Phase 5 - External Integrations: MusicBrainz database (direct PostgreSQL access, not API). Solr search server. Cover Art Archive.
Phase 6 - Auth & Security: Database credentials (hardcoded abc/abc?). API access control.
Phase 7 - Configuration: Docker environment, database connection, Solr config.
Phase 8 - Testing: Test framework, test data.
Phase 9 - Observability: Logging, crash recovery behavior.
Phase 10 - Deployment: docker-compose.yml (base, dev, prod variants). SQL index creation scripts. Resource requirements.
Synthesize: Write analysis deliverables.

7. Harmony

Repo: https://github.com/kellnerd/harmony Language: TypeScript | Runtime: Deno | Framework: Fresh

Todos

Phase 1 - Identity & Entry Points: Locate deno.json, Fresh app entry, identify import map.
Phase 2 - Architecture & Structure: Map providers/ directory (each provider is a module). Understand lookup → harmonize → merge → seed pipeline. Document provider interface contract.
Phase 3 - API Surface: Document /release route, lookup API (GTIN, URL, provider ID parameters). Response format (harmonized release).
Phase 4 - Data Layer: Identify if any persistence exists (permalink snapshots). Cache strategy.
Phase 5 - External Integrations: Document each provider adapter: MusicBrainz, Spotify, Deezer, Bandcamp, Beatport, iTunes, Tidal, KKBOX, Mora, Ototoy. Extract API auth per provider.
Phase 6 - Auth & Security: Provider credential management. User-facing auth (if any).
Phase 7 - Configuration: Environment variables for API keys, provider config.
Phase 8 - Testing: Deno test framework, test data/fixtures.
Phase 9 - Observability: Logging (getLogger), error handling.
Phase 10 - Deployment: Deno Deploy compatibility, self-hosting. Resource requirements.
Synthesize: Write analysis deliverables.

8. GraphBrainz

Repo: https://github.com/exogen/graphbrainz Language: JavaScript | Framework: Express + GraphQL

Todos

Phase 1 - Identity & Entry Points: Locate package.json main, CLI entry (graphbrainz command), Express middleware export.
Phase 2 - Architecture & Structure: Map schema definition, resolver structure, extension system. Document how type extensions work (schema stitching).
Phase 3 - API Surface: Document full GraphQL schema: lookup queries (artist, release, recording, etc.), browse queries, search queries. Extract all type definitions and fields. Document extension-added fields.
Phase 4 - Data Layer: Caching layer (configurable TTL). Identify cache implementation.
Phase 5 - External Integrations: Core: MusicBrainz API. Extensions: Cover Art Archive, fanart.tv, MediaWiki, TheAudioDB, Last.fm, Discogs, Spotify. Document rate limiting per service.
Phase 6 - Auth & Security: MusicBrainz API rate limiting compliance. Extension API key management.
Phase 7 - Configuration: Environment variables, extension configuration, cache TTL.
Phase 8 - Testing: Test framework (Jest?), GraphQL query testing.
Phase 9 - Observability: Logging, error handling in resolvers.
Phase 10 - Deployment: npm install, Docker, Express middleware integration.
Synthesize: Write analysis deliverables.

9. Bedrock-API

Repo: https://github.com/feralbureau/bedrock-api Language: Go | API: gRPC + HTTP

Todos

Phase 1 - Identity & Entry Points: Locate main.go, find .proto files, identify gRPC server setup.
Phase 2 - Architecture & Structure: Map provider adapters (Spotify, SoundCloud, Deezer, YouTube Music, Yandex, VK). Document Resolver pattern for cross-platform bridging.
Phase 3 - API Surface: Extract complete .proto definitions. Document gRPC services and methods. Map HTTP streaming proxy endpoints.
Phase 4 - Data Layer: PostgreSQL backend for user/auth data. Identify caching.
Phase 5 - External Integrations: Document each provider adapter: auth methods, API versions, rate limits, supported operations (metadata, search, streaming, playlist). Lyrics: LrcLib, Genius.
Phase 6 - Auth & Security: JWT authentication implementation. Provider credential management.
Phase 7 - Configuration: config.yaml structure, environment variables, provider credentials.
Phase 8 - Testing: Test framework, mocking of external providers.
Phase 9 - Observability: Logging, gRPC interceptors, health checks.
Phase 10 - Deployment: Docker, database setup, provider configuration.
Synthesize: Write analysis deliverables.

10. minim

Repo: https://github.com/bbye98/minim Language: Python | Type: Library (not server)

Todos

Phase 1 - Identity & Entry Points: Locate pyproject.toml/setup.py, identify package structure (minim.*).
Phase 2 - Architecture & Structure: Map module structure: minim.audio, minim.discogs, minim.itunes, minim.qobuz, minim.spotify, minim.tidal. Document common interface patterns.
Phase 3 - API Surface: Document public Python API for each module. Extract search(), lookup(), get_artist(), get_album(), get_track() equivalents per service.
Phase 4 - Data Layer: No persistence (library). Document audio file metadata handling (minim.audio).
Phase 5 - External Integrations: Document each API client: Deezer, Discogs (OAuth), iTunes, Musixmatch, Qobuz, Spotify (multiple grant types), TIDAL (old + new API). Extract auth flows and token caching.
Phase 6 - Auth & Security: OAuth implementations per service. Token caching mechanism. Credential storage.
Phase 7 - Configuration: API key / credential configuration per service.
Phase 8 - Testing: Test framework (pytest?), test coverage, mocking external APIs.
Phase 9 - Observability: Logging.
Phase 10 - Deployment: pip install, PyPI publishing. Dependencies.
Synthesize: Write analysis deliverables.

11. MusicMetaLinker

Repo: https://github.com/andreamust/MusicMetaLinker Language: Python | Type: Library

Todos

Phase 1 - Identity & Entry Points: Locate pyproject.toml/setup.py, identify package entry.
Phase 2 - Architecture & Structure: Map three-step workflow: service selection → information retrieval → filtering. Document linker class hierarchy.
Phase 3 - API Surface: Document public Python API: MusicMetaLinker constructor params, get_track(), get_artist(), get_album(), get_mbid(), get_isrc(), get_deezer_id().
Phase 4 - Data Layer: No persistence. Document input/output data formats.
Phase 5 - External Integrations: MusicBrainz API, AcousticBrainz API, YouTube Music API, Deezer API. Document service selection logic (which service for which input).
Phase 6 - Auth & Security: API key handling per service.
Phase 7 - Configuration: API credentials, service priority configuration.
Phase 8 - Testing: Test framework, test data, mocking.
Phase 9 - Observability: Logging, error handling.
Phase 10 - Deployment: pip install, PyPI. Dependencies.
Synthesize: Write analysis deliverables.

12. Meelo

Repo: https://github.com/Arthi-chaud/Meelo Language: TypeScript (87%), Python, Go

Todos

Phase 1 - Identity & Entry Points: Locate package.json(s) (likely monorepo), identify NestJS/Express entry, find Docker entry points.
Phase 2 - Architecture & Structure: Map monorepo structure: server, scanner, web frontend, matcher. Identify service boundaries. Document plugin/provider system for metadata sources.
Phase 3 - API Surface: Document REST API: artists, albums, tracks, songs, releases endpoints. Extract query/filter parameters. Document auth requirements.
Phase 4 - Data Layer: PostgreSQL schema. Map entities: Artist, Album, Song, Track, Release, Genre, Illustration. Document relationships. Find Prisma/TypeORM models.
Phase 5 - External Integrations: MusicBrainz, Genius, Wikipedia providers. ListenBrainz and Last.fm scrobbling. LRC lyrics sources.
Phase 6 - Auth & Security: User management, API authentication.
Phase 7 - Configuration: docker-compose environment, database config, provider API keys.
Phase 8 - Testing: Test framework (Jest?), test organization.
Phase 9 - Observability: Logging, error handling.
Phase 10 - Deployment: Docker-compose, volume mounts, database initialization.
Synthesize: Write analysis deliverables.

13. Melodee

Repo: https://github.com/melodee-project/melodee Language: C# (.NET 10) | UI: Blazor

Todos

Phase 1 - Identity & Entry Points: Locate Program.cs, .csproj/.sln, identify Blazor app entry. Map project structure.
Phase 2 - Architecture & Structure: Map multi-stage pipeline: Inbound → Staging → Storage. Identify service layer, job scheduler (Quartz.NET), media processing pipeline.
Phase 3 - API Surface: Document three APIs: OpenSubsonic, Jellyfin, Native REST (/scalar/v1). Extract OpenAPI spec at /openapi/v1.json. Map endpoint coverage per API.
Phase 4 - Data Layer: PostgreSQL schema. Map entities: Artist, Album, Track, Library, User. Find EF Core migrations. Document MusicBrainz local cache DB.
Phase 5 - External Integrations: Metadata providers: MusicBrainz (local cache), Last.fm, Spotify, iTunes, Deezer, Brave Search. Scrobbling: Last.fm. Transcoding: ffmpeg.
Phase 6 - Auth & Security: User authentication, API auth per protocol (Subsonic token, Jellyfin, JWT).
Phase 7 - Configuration: appsettings.json, environment variables, library paths, provider API keys.
Phase 8 - Testing: Test projects, xUnit/NUnit.
Phase 9 - Observability: Logging, job scheduler status, health checks.
Phase 10 - Deployment: Docker, Podman, resource requirements (Raspberry Pi compatible). Multi-library federation.
Synthesize: Write analysis deliverables.

14. Navidrome

Repo: https://github.com/navidrome/navidrome Language: Go | UI: React

Todos

Phase 1 - Identity & Entry Points: Locate main.go, identify Gin/Echo/Chi router, find React app entry.
Phase 2 - Architecture & Structure: Map Go package structure: server, model, scanner, subsonic. Identify clean architecture layers.
Phase 3 - API Surface: Document the Subsonic API v1.16.1 implementation and any OpenSubsonic extensions. Map all /rest/* endpoints: getArtists, getArtist, getAlbum, getSong, search3, stream, getCoverArt, etc.
Phase 4 - Data Layer: Database (SQLite by default). Map entities: Artist, Album, MediaFile, Playlist, User. Find migration scripts.
Phase 5 - External Integrations: Last.fm (scrobbling, artist info, similar artists). ListenBrainz scrobbling. Spotify artwork (if configured).
Phase 6 - Auth & Security: Multi-user auth, JWT tokens, Subsonic token auth.
Phase 7 - Configuration: navidrome.toml / environment variables. All configuration options.
Phase 8 - Testing: Go test framework, test coverage.
Phase 9 - Observability: Logging, /api/health, Prometheus metrics.
Phase 10 - Deployment: Single binary, Docker, resource requirements. 900K+ song library support.
Synthesize: Write analysis deliverables.
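
When documenting Subsonic token auth in Phase 6, it helps to have the client side of the scheme at hand: the token is the MD5 hex digest of the password concatenated with a per-request salt. The sketch below builds those query parameters; the function name, client name, and credentials are placeholders, and it says nothing about how Navidrome validates them server-side.

```python
import hashlib
import secrets


def subsonic_auth_params(user, password, salt=None, client="plan-demo", version="1.16.1"):
    """Build Subsonic token-auth query parameters (hypothetical helper)."""
    # A fresh random salt per request unless one is supplied for testing.
    salt = salt or secrets.token_hex(8)
    token = hashlib.md5((password + salt).encode("utf-8")).hexdigest()
    return {"u": user, "t": token, "s": salt, "v": version, "c": client}


# Placeholder credentials for illustration only.
params = subsonic_auth_params("alice", "sesame", salt="c19b2d")
print(sorted(params))
```

The same parameters apply to the gonic and LMS analyses wherever they implement this auth mode.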

15. gonic

Repo: https://github.com/sentriz/gonic Language: Go

Todos

Phase 1 - Identity & Entry Points: Locate main.go (cmd/gonic/), identify web framework.
Phase 2 - Architecture & Structure: Map Go package structure. Identify Subsonic handler layer, scanner, jukebox.
Phase 3 - API Surface: Document Subsonic API implementation. Map supported endpoints. Document multi-value tag handling modes (multi, delim).
Phase 4 - Data Layer: Database (SQLite/GORM?). Map entities. Scanner implementation.
Phase 5 - External Integrations: Last.fm (scrobbling, artist info). ListenBrainz scrobbling. Podcast support.
Phase 6 - Auth & Security: Multi-user, Subsonic auth.
Phase 7 - Configuration: Environment variables (GONIC_*), config file.
Phase 8 - Testing: Go tests.
Phase 9 - Observability: Logging, web interface status.
Phase 10 - Deployment: Docker (ARM images available), binary. Raspberry Pi suitability.
Synthesize: Write analysis deliverables.

16. LMS (Lightweight Music Server)

Repo: https://github.com/epoupon/lms Language: C++

Todos

Phase 1 - Identity & Entry Points: Locate main.cpp, CMakeLists.txt, identify web framework (Wt?).
Phase 2 - Architecture & Structure: Map C++ source structure. Identify modules: core, database, scanner, subsonic, ui.
Phase 3 - API Surface: Document OpenSubsonic API implementation. Map supported endpoints and extensions.
Phase 4 - Data Layer: Database (SQLite). Map entities: Artist, Release, Track, Cluster (for tags). Document multi-valued tag support. MusicBrainz ID storage.
Phase 5 - External Integrations: MusicBrainz IDs from tags. ListenBrainz scrobbling. Artist NFO files (Kodi format).
Phase 6 - Auth & Security: User authentication, API auth.
Phase 7 - Configuration: Configuration file, environment variables.
Phase 8 - Testing: C++ test framework (Catch2?), test coverage.
Phase 9 - Observability: Logging, health.
Phase 10 - Deployment: CMake build, Docker, AUR package. Dependencies (Wt, Boost, TagLib).
Synthesize: Write analysis deliverables.

17. Accentor

Repo: https://github.com/accentor/api Language: Ruby | Framework: Rails

Todos

Phase 1 - Identity & Entry Points: Locate Gemfile, config.ru, identify Rails entry. Map related repos (web, android).
Phase 2 - Architecture & Structure: Map Rails structure: app/controllers, app/models, app/services. Identify deviations from standard Rails.
Phase 3 - API Surface: Document REST API endpoints: /api/artists, /api/albums, /api/tracks. Extract serializers (response format). Document filtering/pagination.
Phase 4 - Data Layer: PostgreSQL. Map ActiveRecord models: Artist, Album, Track, Label, Genre, User. Find db/migrate/ history. Document multi-artist and multi-label relationships.
Phase 5 - External Integrations: Minimal (user-controlled metadata). Verify no external API calls.
Phase 6 - Auth & Security: User authentication (Devise?). API token auth.
Phase 7 - Configuration: database.yml, environment variables, secrets.
Phase 8 - Testing: RSpec/Minitest, test coverage, factory bot fixtures.
Phase 9 - Observability: Rails logging, error handling.
Phase 10 - Deployment: Puma server, nginx reverse proxy, database setup. No Docker (manual deployment).
Synthesize: Write analysis deliverables.

Execution Order (Recommended)

Priority based on relevance as metadata providers/aggregators:

Wave 1: Core Metadata Services

MusicBrainz Server - Foundation everything builds on
AcoustID - Fingerprinting complement to MusicBrainz
ListenBrainz - Recommendations complement

Wave 2: Aggregators (highest value for our project)

Harmony - Best multi-source aggregator
GraphBrainz - GraphQL aggregation layer
MiniMediaMetadataAPI - Multi-provider self-hosted
music-metadata-api - High-volume lookup service
Bedrock-API - gRPC aggregator

Wave 3: Libraries

minim - Python multi-API client
MusicMetaLinker - Entity linking library

Wave 4: Self-Hosted Servers (metadata as secondary feature)

Meelo - Collector-focused with rich metadata
Melodee - All-in-one with multiple API protocols
Navidrome - Popular streaming server
Lidarr Metadata API - *arr ecosystem
LMS - C++ with strong MusicBrainz support
gonic - Minimal Go implementation
Accentor - Metadata-focused Rails server
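
The wave ordering above can be kept as data so a script can confirm that every project is scheduled exactly once. This is a minimal sketch; the names `WAVES` and `execution_order` are illustrative and do not exist in the repository.

```python
# Waves and project names copied from the execution order above.
WAVES = {
    "Wave 1: Core Metadata Services": [
        "MusicBrainz Server", "AcoustID", "ListenBrainz",
    ],
    "Wave 2: Aggregators": [
        "Harmony", "GraphBrainz", "MiniMediaMetadataAPI",
        "music-metadata-api", "Bedrock-API",
    ],
    "Wave 3: Libraries": ["minim", "MusicMetaLinker"],
    "Wave 4: Self-Hosted Servers": [
        "Meelo", "Melodee", "Navidrome", "Lidarr Metadata API",
        "LMS", "gonic", "Accentor",
    ],
}


def execution_order():
    """Flatten the waves into a single ordered list of projects."""
    return [project for projects in WAVES.values() for project in projects]


order = execution_order()
# Sanity check: all 17 projects appear, none twice.
assert len(order) == 17 and len(set(order)) == 17
print(order[0], "...", order[-1])
```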

Per-Project Deliverables

Each project analysis produces:

docs/research/{project-slug}/analysis/
├── OVERVIEW.md        # Purpose, tech stack, license, status
├── ARCHITECTURE.md    # Design patterns, layers, modules
├── API.md             # Endpoints, schemas, authentication
├── DATA.md            # Database, models, migrations
├── INTEGRATIONS.md    # External services, queues, webhooks
├── DEPLOYMENT.md      # Build, CI/CD, infrastructure
├── CODEBASE.md        # Structure, patterns, conventions
└── EVALUATION.md      # Pros, cons, adoption considerations
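
The layout above is easy to scaffold before analysis starts. The sketch below creates the analysis directory and empty deliverable files for one project; the function name `scaffold` and the example slug are assumptions for illustration.

```python
from pathlib import Path

# Deliverable file names taken from the tree above.
DELIVERABLES = (
    "OVERVIEW.md", "ARCHITECTURE.md", "API.md", "DATA.md",
    "INTEGRATIONS.md", "DEPLOYMENT.md", "CODEBASE.md", "EVALUATION.md",
)


def scaffold(root, slug):
    """Create docs/research/{slug}/analysis/ with empty deliverables (hypothetical helper)."""
    analysis_dir = Path(root) / "docs" / "research" / slug / "analysis"
    analysis_dir.mkdir(parents=True, exist_ok=True)
    for name in DELIVERABLES:
        # touch() leaves existing files untouched, so reruns are safe.
        (analysis_dir / name).touch(exist_ok=True)
    return analysis_dir
```

Calling `scaffold(".", "harmony")` from the repository root would prepare the Harmony analysis folder without overwriting files that already contain notes.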

Agent Dispatch Pattern

For each project, launch in parallel:

1. explore agent  → Code Structure (Phase 1, 2)
2. explore agent  → API Surface (Phase 3)
3. explore agent  → Data Layer (Phase 4)
4. librarian agent → Dependencies (Phase 5, 7)
5. librarian agent → External Integrations (Phase 5, 6)

Then synthesize results into deliverable files.

See REVERSE_ENGINEERING_PROMPT.md for full agent prompt templates.
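
The dispatch pattern above amounts to a fan-out followed by a join. The sketch below shows that shape with a thread pool; `run_agent` is a stand-in stub, since the real agent invocation is defined by the prompt templates and is not described here.

```python
from concurrent.futures import ThreadPoolExecutor

# (agent kind, topic, phases) taken from the dispatch list above.
TASKS = [
    ("explore", "Code Structure", (1, 2)),
    ("explore", "API Surface", (3,)),
    ("explore", "Data Layer", (4,)),
    ("librarian", "Dependencies", (5, 7)),
    ("librarian", "External Integrations", (5, 6)),
]


def run_agent(kind, topic, phases, project):
    """Hypothetical stub; a real implementation would launch an agent."""
    return f"{kind}:{topic}:{project}"


def dispatch(project):
    """Run all five agent tasks in parallel and return results in task order."""
    with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
        futures = [
            pool.submit(run_agent, kind, topic, phases, project)
            for kind, topic, phases in TASKS
        ]
        # Collecting in submission order keeps the synthesis step deterministic.
        return [future.result() for future in futures]


results = dispatch("harmony")
print(len(results), results[0])
```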
