Reverse Engineering Plan
Systematic analysis of all 17 projects in the research folder. Each project follows the 10-phase methodology from REVERSE_ENGINEERING_PROMPT.md.
Output: For each project, create docs/research/{project-slug}/analysis/ with deliverable files.
1. MusicBrainz Server
Repo: https://github.com/metabrainz/musicbrainz-server Language: Perl | Framework: Catalyst
Todos
- Phase 1 - Identity & Entry Points: Locate Perl entry point, Catalyst app bootstrap, package manifests (cpanfile), Makefile, Docker setup. Identify version and release cycle.
- Phase 2 - Architecture & Structure: Map src/ structure (lib/MusicBrainz/), identify MVC layers, module boundaries. Document Catalyst controllers, models, views.
- Phase 3 - API Surface: Document REST API at /ws/2/ (XML/JSON). Extract all entity endpoints (artist, release, recording, work, label, area, event, instrument, place, series, url). Map query parameters, includes, subqueries.
- Phase 4 - Data Layer: Analyze PostgreSQL schema, find migration scripts, map all entity tables and relationships. Document Solr search integration.
- Phase 5 - External Integrations: Cover Art Archive integration, relationship to other MetaBrainz services (ListenBrainz, BookBrainz) and to the independently run AcoustID service. Replication system.
- Phase 6 - Auth & Security: Document editor authentication, OAuth for API, permission model (auto-editors, voting system).
- Phase 7 - Configuration: Extract all environment variables, database config, Solr config, Redis config.
- Phase 8 - Testing: Identify test framework (Test::More/Test2), test coverage, CI setup.
- Phase 9 - Observability: Logging, metrics, health endpoints.
- Phase 10 - Deployment: Docker-compose setup, replication tokens, database initialization, Solr setup. Document resource requirements (~350GB DB).
- Synthesize: Write OVERVIEW.md, ARCHITECTURE.md, API.md, DATA.md, INTEGRATIONS.md, DEPLOYMENT.md, CODEBASE.md, EVALUATION.md
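As a starting point for the Phase 3 work, the sketch below builds lookup URLs for the /ws/2/ web service with JSON output and an `inc` list. The helper name `build_lookup_url` and the placeholder MBID are illustrative assumptions, not part of MusicBrainz; the exact set of valid includes per entity still needs to be confirmed during analysis.

```python
from typing import Sequence
from urllib.parse import urlencode

# Base URL of the public MusicBrainz web service, version 2.
WS2_BASE = "https://musicbrainz.org/ws/2"


def build_lookup_url(entity: str, mbid: str, includes: Sequence[str] = ()) -> str:
    """Build a /ws/2/ lookup URL that asks for a JSON response.

    Hypothetical helper for these research notes.
    """
    params = {"fmt": "json"}
    if includes:
        # Several includes are joined with "+" inside the single inc parameter.
        params["inc"] = "+".join(includes)
    # safe="+" keeps the "+" separator literal instead of encoding it as %2B.
    return f"{WS2_BASE}/{entity}/{mbid}?{urlencode(params, safe='+')}"


# Placeholder MBID, used only to show the URL shape.
print(build_lookup_url("artist", "00000000-0000-0000-0000-000000000000",
                       ["releases", "aliases"]))
```

Recording one such URL per entity type in API.md gives later phases a reproducible way to re-check response shapes.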
2. AcoustID
Repo: https://github.com/acoustid/acoustid-server Language: Python | Index repo: https://github.com/acoustid/acoustid-index (Zig)
Todos
- Phase 1 - Identity & Entry Points: Locate Python entry point, identify web framework, find acoustid-index Zig entry. Map both repos (server + index).
- Phase 2 - Architecture & Structure: Map server architecture (fingerprint submission, lookup, matching). Understand index architecture (StreamVByte compression, HTTP API).
- Phase 3 - API Surface: Document /v2/lookup and /v2/submit endpoints. Extract all query parameters (meta, fingerprint, duration, client). Document response formats.
- Phase 4 - Data Layer: Identify database (PostgreSQL), fingerprint storage format, index data structure. Map relationship to MusicBrainz recording IDs.
- Phase 5 - External Integrations: MusicBrainz API integration for recording metadata. Chromaprint fingerprint format compatibility.
- Phase 6 - Auth & Security: API key system, rate limiting per client.
- Phase 7 - Configuration: Environment variables, database config, index config.
- Phase 8 - Testing: Test framework, test data.
- Phase 9 - Observability: Logging, health checks.
- Phase 10 - Deployment: Docker setup for both server and index. Resource requirements.
- Synthesize: Write analysis deliverables.
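The /v2/lookup parameters named in Phase 3 (client, meta, duration, fingerprint) can be exercised with a small URL builder like the one below. This is a minimal sketch: `build_acoustid_lookup`, the API key, and the fingerprint string are placeholders, and the accepted `meta` values should be verified against the server code.

```python
from typing import Sequence
from urllib.parse import urlencode

# Public AcoustID lookup endpoint.
ACOUSTID_LOOKUP = "https://api.acoustid.org/v2/lookup"


def build_acoustid_lookup(client: str, fingerprint: str, duration: float,
                          meta: Sequence[str] = ("recordings",)) -> str:
    """Build a /v2/lookup URL (hypothetical helper for these notes)."""
    params = {
        "client": client,            # application API key
        "duration": int(duration),   # truncated to whole seconds
        "fingerprint": fingerprint,  # Chromaprint fingerprint string
        "meta": " ".join(meta),      # spaces are encoded as "+" by urlencode
    }
    return f"{ACOUSTID_LOOKUP}?{urlencode(params)}"


# Placeholder key and fingerprint, used only to show the URL shape.
print(build_acoustid_lookup("demo-key", "AQAAfingerprint", 212.7,
                            ("recordings", "releasegroups")))
```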
3. ListenBrainz
Repo: https://github.com/metabrainz/listenbrainz-server Language: Python
Todos
- Phase 1 - Identity & Entry Points: Locate Flask/web framework entry, CLI scripts, worker processes.
- Phase 2 - Architecture & Structure: Map web server, Spark cluster, data pipeline. Identify recommendation engine components.
- Phase 3 - API Surface: Document all /1/ API endpoints: listens, stats, recommendations, playlists, social, explore (fresh-releases, lb-radio). Extract auth requirements per endpoint.
- Phase 4 - Data Layer: Identify databases (PostgreSQL, TimescaleDB, Spark). Map listen data schema, user data, recommendation models.
- Phase 5 - External Integrations: MusicBrainz mapping, Spotify import, Last.fm import, MBID mapping service.
- Phase 6 - Auth & Security: Token-based auth, MusicBrainz OAuth integration.
- Phase 7 - Configuration: Environment variables, Spark config, database config.
- Phase 8 - Testing: Test framework, test data, CI pipeline.
- Phase 9 - Observability: Logging, metrics, Sentry integration.
- Phase 10 - Deployment: Docker-compose, Spark cluster setup, resource requirements.
- Synthesize: Write analysis deliverables.
4. music-metadata-api
Repo: https://github.com/Aunali321/music-metadata-api Language: Go
Todos
- Phase 1 - Identity & Entry Points: Locate main.go, identify HTTP framework, find CLI flags (-db path).
- Phase 2 - Architecture & Structure: Map Go package structure. Identify handler/service/repository layers.
- Phase 3 - API Surface: Document all endpoints: /lookup/* (isrc, track, artist, album), /search/* (track, artist), /batch/lookup. Extract OpenAPI 3.1 spec. Document rate limiting (100 req/s, burst 200).
- Phase 4 - Data Layer: Analyze SQLite schema for both databases. Map tables: tracks, artists, albums. Document indexes, query patterns, batch lookup implementation.
- Phase 5 - External Integrations: None expected (self-contained with pre-built DBs). Verify.
- Phase 6 - Auth & Security: Identify if any auth exists. Rate limiting implementation.
- Phase 7 - Configuration: CLI flags, environment variables, database paths.
- Phase 8 - Testing: Test coverage, test data.
- Phase 9 - Observability: /health endpoint, logging.
- Phase 10 - Deployment: Docker image (ghcr.io), binary build process. Database acquisition process.
- Synthesize: Write analysis deliverables.
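The documented limit of 100 req/s with a burst of 200 is the shape a token bucket produces. The sketch below shows that technique for reference while reading the Go code; it is an assumption that the project uses a token bucket at all, and the class and parameter names here are illustrative only.

```python
class TokenBucket:
    """Token bucket that refills at `rate` tokens/second, capped at `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = now             # timestamp of the previous refill

    def allow(self, now: float) -> bool:
        """Consume one token and return True if a request at `now` may proceed."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# The clock is passed in explicitly so the behaviour is easy to reason about.
bucket = TokenBucket(rate=100, burst=200)
accepted = sum(bucket.allow(now=0.0) for _ in range(250))
print(accepted)               # number of the 250 simultaneous requests let through
print(bucket.allow(now=0.5))  # half a second later, tokens have been refilled
```

Comparing this behaviour with the real limiter (for example, whether limits are global or per client) is a concrete Phase 6 question.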
5. MiniMediaMetadataAPI
Repo: https://github.com/MusicMoveArr/MiniMediaMetadataAPI Language: C#
Todos
- Phase 1 - Identity & Entry Points: Locate Program.cs / Startup.cs, identify .NET version, find *.csproj files.
- Phase 2 - Architecture & Structure: Map C# project structure (Controllers, Services, Models). Identify DI configuration.
- Phase 3 - API Surface: Document /api/artists, /api/albums, /api/tracks endpoints. Extract provider query parameter (Any, Tidal, MusicBrainz, Spotify, Deezer, Discogs).
- Phase 4 - Data Layer: Analyze PostgreSQL schema (shared with MiniMediaScanner). Map entity models, EF Core migrations.
- Phase 5 - External Integrations: Document provider implementations for: MusicBrainz API, Spotify API, Tidal API, Deezer API, Discogs API. Extract auth methods per provider.
- Phase 6 - Auth & Security: API authentication, provider credential management.
- Phase 7 - Configuration: appsettings.json structure, environment variables, connection strings.
- Phase 8 - Testing: Test projects, coverage.
- Phase 9 - Observability: Logging (Serilog?), health checks.
- Phase 10 - Deployment: Docker image, docker-compose, memory limits (<256M).
- Synthesize: Write analysis deliverables.
6. Lidarr Metadata API
Repo: https://github.com/Lidarr/LidarrAPI.Metadata Language: Python
Todos
- Phase 1 - Identity & Entry Points: Locate server.py, identify web framework, find lidarr-metadata-server CLI entry.
- Phase 2 - Architecture & Structure: Map Python package structure. Identify caching layer (lm_cache_db).
- Phase 3 - API Surface: Document metadata endpoints used by Lidarr. Artist lookup, album lookup, search. Response format.
- Phase 4 - Data Layer: MusicBrainz PostgreSQL dependency. Cache database schema. Solr search integration.
- Phase 5 - External Integrations: MusicBrainz database (direct PostgreSQL access, not API). Solr search server. Cover Art Archive.
- Phase 6 - Auth & Security: Database credentials (hardcoded abc/abc?). API access control.
- Phase 7 - Configuration: Docker environment, database connection, Solr config.
- Phase 8 - Testing: Test framework, test data.
- Phase 9 - Observability: Logging, crash recovery behavior.
- Phase 10 - Deployment: docker-compose.yml (base, dev, prod variants). SQL index creation scripts. Resource requirements.
- Synthesize: Write analysis deliverables.
7. Harmony
Repo: https://github.com/kellnerd/harmony Language: TypeScript | Runtime: Deno | Framework: Fresh
Todos
- Phase 1 - Identity & Entry Points: Locate deno.json, Fresh app entry, identify import map.
- Phase 2 - Architecture & Structure: Map providers/ directory (each provider is a module). Understand lookup → harmonize → merge → seed pipeline. Document provider interface contract.
- Phase 3 - API Surface: Document /release route, lookup API (GTIN, URL, provider ID parameters). Response format (harmonized release).
- Phase 4 - Data Layer: Identify if any persistence exists (permalink snapshots). Cache strategy.
- Phase 5 - External Integrations: Document each provider adapter: MusicBrainz, Spotify, Deezer, Bandcamp, Beatport, iTunes, Tidal, KKBOX, Mora, Ototoy. Extract API auth per provider.
- Phase 6 - Auth & Security: Provider credential management. User-facing auth (if any).
- Phase 7 - Configuration: Environment variables for API keys, provider config.
- Phase 8 - Testing: Deno test framework, test data/fixtures.
- Phase 9 - Observability: Logging (getLogger), error handling.
- Phase 10 - Deployment: Deno Deploy compatibility, self-hosting. Resource requirements.
- Synthesize: Write analysis deliverables.
8. GraphBrainz
Repo: https://github.com/exogen/graphbrainz Language: JavaScript | Framework: Express + GraphQL
Todos
- Phase 1 - Identity & Entry Points: Locate package.json main, CLI entry (graphbrainz command), Express middleware export.
- Phase 2 - Architecture & Structure: Map schema definition, resolver structure, extension system. Document how type extensions work (schema stitching).
- Phase 3 - API Surface: Document full GraphQL schema: lookup queries (artist, release, recording, etc.), browse queries, search queries. Extract all type definitions and fields. Document extension-added fields.
- Phase 4 - Data Layer: Caching layer (configurable TTL). Identify cache implementation.
- Phase 5 - External Integrations: Core: MusicBrainz API. Extensions: Cover Art Archive, fanart.tv, MediaWiki, TheAudioDB, Last.fm, Discogs, Spotify. Document rate limiting per service.
- Phase 6 - Auth & Security: MusicBrainz API rate limiting compliance. Extension API key management.
- Phase 7 - Configuration: Environment variables, extension configuration, cache TTL.
- Phase 8 - Testing: Test framework (Jest?), GraphQL query testing.
- Phase 9 - Observability: Logging, error handling in resolvers.
- Phase 10 - Deployment: npm install, Docker, Express middleware integration.
- Synthesize: Write analysis deliverables.
9. Bedrock-API
Repo: https://github.com/feralbureau/bedrock-api Language: Go | API: gRPC + HTTP
Todos
- Phase 1 - Identity & Entry Points: Locate main.go, find .proto files, identify gRPC server setup.
- Phase 2 - Architecture & Structure: Map provider adapters (Spotify, SoundCloud, Deezer, YouTube Music, Yandex, VK). Document Resolver pattern for cross-platform bridging.
- Phase 3 - API Surface: Extract complete .proto definitions. Document gRPC services and methods. Map HTTP streaming proxy endpoints.
- Phase 4 - Data Layer: PostgreSQL backend for user/auth data. Identify caching.
- Phase 5 - External Integrations: Document each provider adapter: auth methods, API versions, rate limits, supported operations (metadata, search, streaming, playlist). Lyrics: LrcLib, Genius.
- Phase 6 - Auth & Security: JWT authentication implementation. Provider credential management.
- Phase 7 - Configuration: config.yaml structure, environment variables, provider credentials.
- Phase 8 - Testing: Test framework, mocking of external providers.
- Phase 9 - Observability: Logging, gRPC interceptors, health checks.
- Phase 10 - Deployment: Docker, database setup, provider configuration.
- Synthesize: Write analysis deliverables.
10. minim
Repo: https://github.com/bbye98/minim Language: Python | Type: Library (not server)
Todos
- Phase 1 - Identity & Entry Points: Locate pyproject.toml/setup.py, identify package structure (minim.*).
- Phase 2 - Architecture & Structure: Map module structure: minim.audio, minim.discogs, minim.itunes, minim.qobuz, minim.spotify, minim.tidal. Document common interface patterns.
- Phase 3 - API Surface: Document public Python API for each module. Extract search(), lookup(), get_artist(), get_album(), get_track() equivalents per service.
- Phase 4 - Data Layer: No persistence (library). Document audio file metadata handling (minim.audio).
- Phase 5 - External Integrations: Document each API client: Deezer, Discogs (OAuth), iTunes, Musixmatch, Qobuz, Spotify (multiple grant types), TIDAL (old + new API). Extract auth flows and token caching.
- Phase 6 - Auth & Security: OAuth implementations per service. Token caching mechanism. Credential storage.
- Phase 7 - Configuration: API key / credential configuration per service.
- Phase 8 - Testing: Test framework (pytest?), test coverage, mocking external APIs.
- Phase 9 - Observability: Logging.
- Phase 10 - Deployment: pip install, PyPI publishing. Dependencies.
- Synthesize: Write analysis deliverables.
11. MusicMetaLinker
Repo: https://github.com/andreamust/MusicMetaLinker Language: Python | Type: Library
Todos
- Phase 1 - Identity & Entry Points: Locate pyproject.toml/setup.py, identify package entry.
- Phase 2 - Architecture & Structure: Map three-step workflow: service selection → information retrieval → filtering. Document linker class hierarchy.
- Phase 3 - API Surface: Document public Python API: MusicMetaLinker constructor params, get_track(), get_artist(), get_album(), get_mbid(), get_isrc(), get_deezer_id().
- Phase 4 - Data Layer: No persistence. Document input/output data formats.
- Phase 5 - External Integrations: MusicBrainz API, AcousticBrainz API, YouTube Music API, Deezer API. Document service selection logic (which service for which input).
- Phase 6 - Auth & Security: API key handling per service.
- Phase 7 - Configuration: API credentials, service priority configuration.
- Phase 8 - Testing: Test framework, test data, mocking.
- Phase 9 - Observability: Logging, error handling.
- Phase 10 - Deployment: pip install, PyPI. Dependencies.
- Synthesize: Write analysis deliverables.
12. Meelo
Repo: https://github.com/Arthi-chaud/Meelo Language: TypeScript (87%), Python, Go
Todos
- Phase 1 - Identity & Entry Points: Locate package.json(s) (likely monorepo), identify NestJS/Express entry, find Docker entry points.
- Phase 2 - Architecture & Structure: Map monorepo structure: server, scanner, web frontend, matcher. Identify service boundaries. Document plugin/provider system for metadata sources.
- Phase 3 - API Surface: Document REST API: artists, albums, tracks, songs, releases endpoints. Extract query/filter parameters. Document auth requirements.
- Phase 4 - Data Layer: PostgreSQL schema. Map entities: Artist, Album, Song, Track, Release, Genre, Illustration. Document relationships. Find Prisma/TypeORM models.
- Phase 5 - External Integrations: MusicBrainz, Genius, Wikipedia providers. ListenBrainz and Last.fm scrobbling. LRC lyrics sources.
- Phase 6 - Auth & Security: User management, API authentication.
- Phase 7 - Configuration: docker-compose environment, database config, provider API keys.
- Phase 8 - Testing: Test framework (Jest?), test organization.
- Phase 9 - Observability: Logging, error handling.
- Phase 10 - Deployment: Docker-compose, volume mounts, database initialization.
- Synthesize: Write analysis deliverables.
13. Melodee
Repo: https://github.com/melodee-project/melodee Language: C# (.NET 10) | UI: Blazor
Todos
- Phase 1 - Identity & Entry Points: Locate Program.cs, .csproj/.sln, identify Blazor app entry. Map project structure.
- Phase 2 - Architecture & Structure: Map multi-stage pipeline: Inbound → Staging → Storage. Identify service layer, job scheduler (Quartz.NET), media processing pipeline.
- Phase 3 - API Surface: Document three APIs: OpenSubsonic, Jellyfin, Native REST (/scalar/v1). Extract OpenAPI spec at /openapi/v1.json. Map endpoint coverage per API.
- Phase 4 - Data Layer: PostgreSQL schema. Map entities: Artist, Album, Track, Library, User. Find EF Core migrations. Document MusicBrainz local cache DB.
- Phase 5 - External Integrations: Metadata providers: MusicBrainz (local cache), Last.fm, Spotify, iTunes, Deezer, Brave Search. Scrobbling: Last.fm. Transcoding: ffmpeg.
- Phase 6 - Auth & Security: User authentication, API auth per protocol (Subsonic token, Jellyfin, JWT).
- Phase 7 - Configuration: appsettings.json, environment variables, library paths, provider API keys.
- Phase 8 - Testing: Test projects, xUnit/NUnit.
- Phase 9 - Observability: Logging, job scheduler status, health checks.
- Phase 10 - Deployment: Docker, Podman, resource requirements (Raspberry Pi compatible). Multi-library federation.
- Synthesize: Write analysis deliverables.
14. Navidrome
Repo: https://github.com/navidrome/navidrome Language: Go | UI: React
Todos
- Phase 1 - Identity & Entry Points: Locate main.go, identify Gin/Echo/Chi router, find React app entry.
- Phase 2 - Architecture & Structure: Map Go package structure: server, model, scanner, subsonic. Identify clean architecture layers.
- Phase 3 - API Surface: Document the Subsonic API v1.16.1 implementation and its OpenSubsonic extensions. Map all /rest/* endpoints: getArtists, getArtist, getAlbum, getSong, search3, stream, getCoverArt, etc.
- Phase 4 - Data Layer: Database (SQLite by default). Map entities: Artist, Album, MediaFile, Playlist, User. Find migration scripts.
- Phase 5 - External Integrations: Last.fm (scrobbling, artist info, similar artists). ListenBrainz scrobbling. Spotify artwork (if configured).
- Phase 6 - Auth & Security: Multi-user auth, JWT tokens, Subsonic token auth.
- Phase 7 - Configuration: navidrome.toml / environment variables. All configuration options.
- Phase 8 - Testing: Go test framework, test coverage.
- Phase 9 - Observability: Logging, /api/health, Prometheus metrics.
- Phase 10 - Deployment: Single binary, Docker, resource requirements. 900K+ song library support.
- Synthesize: Write analysis deliverables.
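For the Subsonic token auth mentioned in Phase 6, the protocol sends `t = md5(password + salt)` together with the salt `s` instead of the password. The sketch below builds those query parameters; the host name, client name, and helper names are placeholders, and how each server validates them is still to be confirmed from its source.

```python
import hashlib
import secrets
from typing import Dict, Optional
from urllib.parse import urlencode


def subsonic_auth_params(username: str, password: str, salt: Optional[str] = None,
                         client: str = "research-notes",
                         version: str = "1.16.1") -> Dict[str, str]:
    """Return Subsonic token-auth query parameters (hypothetical helper)."""
    salt = salt or secrets.token_hex(8)  # fresh random salt per request
    token = hashlib.md5((password + salt).encode("utf-8")).hexdigest()
    return {"u": username, "t": token, "s": salt,
            "v": version, "c": client, "f": "json"}


def ping_url(base_url: str, params: Dict[str, str]) -> str:
    """Build a ping request URL, a cheap way to test credentials."""
    return f"{base_url}/rest/ping.view?{urlencode(params)}"


# Placeholder host and credentials.
print(ping_url("http://music.example", subsonic_auth_params("alice", "sesame")))
```

The same parameters should apply to the other Subsonic-compatible servers in this plan, which makes this a convenient cross-project auth check.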
15. gonic
Repo: https://github.com/sentriz/gonic Language: Go
Todos
- Phase 1 - Identity & Entry Points: Locate main.go (cmd/gonic/), identify web framework.
- Phase 2 - Architecture & Structure: Map Go package structure. Identify Subsonic handler layer, scanner, jukebox.
- Phase 3 - API Surface: Document Subsonic API implementation. Map supported endpoints. Document multi-value tag handling modes (multi, delim).
- Phase 4 - Data Layer: Database (SQLite/GORM?). Map entities. Scanner implementation.
- Phase 5 - External Integrations: Last.fm (scrobbling, artist info). ListenBrainz scrobbling. Podcast support.
- Phase 6 - Auth & Security: Multi-user, Subsonic auth.
- Phase 7 - Configuration: Environment variables (GONIC_*), config file.
- Phase 8 - Testing: Go tests.
- Phase 9 - Observability: Logging, web interface status.
- Phase 10 - Deployment: Docker (ARM images available), binary. Raspberry Pi suitability.
- Synthesize: Write analysis deliverables.
16. LMS (Lightweight Music Server)
Repo: https://github.com/epoupon/lms Language: C++
Todos
- Phase 1 - Identity & Entry Points: Locate main.cpp, CMakeLists.txt, identify web framework (Wt?).
- Phase 2 - Architecture & Structure: Map C++ source structure. Identify modules: core, database, scanner, subsonic, ui.
- Phase 3 - API Surface: Document OpenSubsonic API implementation. Map supported endpoints and extensions.
- Phase 4 - Data Layer: Database (SQLite). Map entities: Artist, Release, Track, Cluster (for tags). Document multi-valued tag support. MusicBrainz ID storage.
- Phase 5 - External Integrations: MusicBrainz IDs from tags. ListenBrainz scrobbling. Artist NFO files (Kodi format).
- Phase 6 - Auth & Security: User authentication, API auth.
- Phase 7 - Configuration: Configuration file, environment variables.
- Phase 8 - Testing: C++ test framework (Catch2?), test coverage.
- Phase 9 - Observability: Logging, health.
- Phase 10 - Deployment: CMake build, Docker, AUR package. Dependencies (Wt, Boost, TagLib).
- Synthesize: Write analysis deliverables.
17. Accentor
Repo: https://github.com/accentor/api Language: Ruby | Framework: Rails
Todos
- Phase 1 - Identity & Entry Points: Locate Gemfile, config.ru, identify Rails entry. Map related repos (web, android).
- Phase 2 - Architecture & Structure: Map Rails structure: app/controllers, app/models, app/services. Identify deviations from standard Rails.
- Phase 3 - API Surface: Document REST API endpoints: /api/artists, /api/albums, /api/tracks. Extract serializers (response format). Document filtering/pagination.
- Phase 4 - Data Layer: PostgreSQL. Map ActiveRecord models: Artist, Album, Track, Label, Genre, User. Find db/migrate/ history. Document multi-artist and multi-label relationships.
- Phase 5 - External Integrations: Minimal (user-controlled metadata). Verify no external API calls.
- Phase 6 - Auth & Security: User authentication (Devise?). API token auth.
- Phase 7 - Configuration: database.yml, environment variables, secrets.
- Phase 8 - Testing: RSpec/Minitest, test coverage, FactoryBot factories or fixtures.
- Phase 9 - Observability: Rails logging, error handling.
- Phase 10 - Deployment: Puma server, nginx reverse proxy, database setup. No Docker (manual deployment).
- Synthesize: Write analysis deliverables.
Execution Order (Recommended)
Priority is based on each project's relevance as a metadata provider or aggregator:
Wave 1: Core Metadata Services
- MusicBrainz Server - Foundation everything builds on
- AcoustID - Fingerprinting complement to MusicBrainz
- ListenBrainz - Recommendations complement
Wave 2: Aggregators (highest value for our project)
- Harmony - Best multi-source aggregator
- GraphBrainz - GraphQL aggregation layer
- MiniMediaMetadataAPI - Multi-provider self-hosted
- music-metadata-api - High-volume lookup service
- Bedrock-API - gRPC aggregator
Wave 3: Libraries
- minim - Python multi-API client
- MusicMetaLinker - Entity linking library
Wave 4: Self-Hosted Servers (metadata as secondary feature)
- Meelo - Collector-focused with rich metadata
- Melodee - All-in-one with multiple API protocols
- Navidrome - Popular streaming server
- Lidarr Metadata API - *arr ecosystem
- LMS - C++ with strong MusicBrainz support
- gonic - Minimal Go implementation
- Accentor - Metadata-focused Rails server
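The wave ordering above can be kept as data so that tooling processes projects in the recommended order. This is a minimal sketch; the `WAVES` mapping and `execution_order` helper are illustrative names, with the contents copied from the lists above.

```python
from typing import Dict, List

# Waves and their projects, in the recommended execution order.
WAVES: Dict[str, List[str]] = {
    "Wave 1: Core Metadata Services": [
        "MusicBrainz Server", "AcoustID", "ListenBrainz",
    ],
    "Wave 2: Aggregators": [
        "Harmony", "GraphBrainz", "MiniMediaMetadataAPI",
        "music-metadata-api", "Bedrock-API",
    ],
    "Wave 3: Libraries": ["minim", "MusicMetaLinker"],
    "Wave 4: Self-Hosted Servers": [
        "Meelo", "Melodee", "Navidrome", "Lidarr Metadata API",
        "LMS", "gonic", "Accentor",
    ],
}


def execution_order(waves: Dict[str, List[str]]) -> List[str]:
    """Flatten the waves into one ordered project list."""
    return [project for projects in waves.values() for project in projects]


order = execution_order(WAVES)
# Sanity check: every one of the 17 projects appears exactly once.
assert len(order) == len(set(order)) == 17
```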
Per-Project Deliverables
Each project analysis produces:
docs/research/{project-slug}/analysis/
├── OVERVIEW.md # Purpose, tech stack, license, status
├── ARCHITECTURE.md # Design patterns, layers, modules
├── API.md # Endpoints, schemas, authentication
├── DATA.md # Database, models, migrations
├── INTEGRATIONS.md # External services, queues, webhooks
├── DEPLOYMENT.md # Build, CI/CD, infrastructure
├── CODEBASE.md # Structure, patterns, conventions
└── EVALUATION.md # Pros, cons, adoption considerations
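The layout above can be scaffolded ahead of the analysis so every project starts with the same eight files. A minimal sketch, assuming a hypothetical `scaffold_analysis` helper and a one-line heading stub as initial file content:

```python
from pathlib import Path
from typing import List

# Deliverable file names, in the order listed above.
DELIVERABLES = [
    "OVERVIEW.md", "ARCHITECTURE.md", "API.md", "DATA.md",
    "INTEGRATIONS.md", "DEPLOYMENT.md", "CODEBASE.md", "EVALUATION.md",
]


def scaffold_analysis(root: Path, project_slug: str) -> List[Path]:
    """Create docs/research/{project-slug}/analysis/ with stub deliverables.

    Existing files are left untouched; only newly created paths are returned.
    """
    analysis_dir = root / "docs" / "research" / project_slug / "analysis"
    analysis_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in DELIVERABLES:
        path = analysis_dir / name
        if not path.exists():
            # Stub content is just a heading derived from the file name.
            path.write_text(f"# {Path(name).stem}\n", encoding="utf-8")
            created.append(path)
    return created
```

Calling `scaffold_analysis(Path("."), "harmony")` from the repository root would create the stubs for one project; running it again creates nothing, so it is safe to repeat.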
Agent Dispatch Pattern
For each project, launch in parallel:
1. explore agent → Code Structure (Phase 1, 2)
2. explore agent → API Surface (Phase 3)
3. explore agent → Data Layer (Phase 4)
4. librarian agent → Dependencies (Phase 5, 7)
5. librarian agent → External Integrations (Phase 5, 6)
Then synthesize results into deliverable files.
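The fan-out and synthesize steps above can be sketched with `asyncio.gather`. Here `run_agent` is a hypothetical stand-in that returns a placeholder report; the real agent invocation is defined by the prompt templates, not by this code.

```python
import asyncio
from typing import Dict, List, Tuple

# (agent type, focus area, phases), copied from the dispatch list above.
DISPATCH: List[Tuple[str, str, Tuple[int, ...]]] = [
    ("explore", "Code Structure", (1, 2)),
    ("explore", "API Surface", (3,)),
    ("explore", "Data Layer", (4,)),
    ("librarian", "Dependencies", (5, 7)),
    ("librarian", "External Integrations", (5, 6)),
]


async def run_agent(agent: str, focus: str, phases: Tuple[int, ...],
                    project: str) -> Dict[str, object]:
    """Hypothetical agent call; a real one would do network or tool work."""
    await asyncio.sleep(0)  # yield control so the tasks interleave
    return {"agent": agent, "focus": focus, "phases": phases, "project": project}


async def analyze_project(project: str) -> List[Dict[str, object]]:
    """Launch all five agents concurrently and collect their reports."""
    tasks = [run_agent(agent, focus, phases, project)
             for agent, focus, phases in DISPATCH]
    # gather returns results in the same order as DISPATCH.
    return list(await asyncio.gather(*tasks))


reports = asyncio.run(analyze_project("harmony"))
print([report["focus"] for report in reports])
```

Because `gather` preserves order, the synthesis step can map each report to its deliverable file by position.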
See REVERSE_ENGINEERING_PROMPT.md for full agent prompt templates.