Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

Reverse Engineering Plan

This plan covers a systematic analysis of all 17 projects in the research folder. Each project analysis follows the 10-phase methodology from REVERSE_ENGINEERING_PROMPT.md.

Output: For each project, create docs/research/{project-slug}/analysis/ with deliverable files.


1. MusicBrainz Server

Repo: https://github.com/metabrainz/musicbrainz-server | Language: Perl | Framework: Catalyst

Todos

  • Phase 1 - Identity & Entry Points: Locate Perl entry point, Catalyst app bootstrap, package manifests (cpanfile), Makefile, Docker setup. Identify version and release cycle.
  • Phase 2 - Architecture & Structure: Map src/ structure (lib/MusicBrainz/), identify MVC layers, module boundaries. Document Catalyst controllers, models, views.
  • Phase 3 - API Surface: Document REST API at /ws/2/ (XML/JSON). Extract all entity endpoints (artist, release, recording, work, label, area, event, instrument, place, series, url). Map query parameters, includes, subqueries.
  • Phase 4 - Data Layer: Analyze PostgreSQL schema, find migration scripts, map all entity tables and relationships. Document Solr search integration.
  • Phase 5 - External Integrations: Cover Art Archive integration, relationship to other MetaBrainz services (ListenBrainz, BookBrainz) and to the closely related AcoustID service. Replication system.
  • Phase 6 - Auth & Security: Document editor authentication, OAuth for API, permission model (auto-editors, voting system).
  • Phase 7 - Configuration: Extract all environment variables, database config, Solr config, Redis config.
  • Phase 8 - Testing: Identify test framework (Test::More/Test2), test coverage, CI setup.
  • Phase 9 - Observability: Logging, metrics, health endpoints.
  • Phase 10 - Deployment: Docker Compose setup, replication tokens, database initialization, Solr setup. Document resource requirements (~350 GB database).
  • Synthesize: Write OVERVIEW.md, ARCHITECTURE.md, API.md, DATA.md, INTEGRATIONS.md, DEPLOYMENT.md, CODEBASE.md, EVALUATION.md
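
As a starting point for the Phase 3 work, the sketch below builds lookup URLs against the /ws/2/ web service with JSON output and `inc` sub-resources. It makes no network request. The function name `build_lookup_url` and the placeholder MBID are illustrative assumptions; the entity names come from the Phase 3 list above.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Public MusicBrainz web service root; /ws/2/ is the path named in Phase 3.
BASE_URL = "https://musicbrainz.org/ws/2"


def build_lookup_url(entity: str, mbid: str, includes: list[str] | None = None) -> str:
    """Build a JSON lookup URL for a single entity (hypothetical helper)."""
    params = {"fmt": "json"}
    if includes:
        # urlencode turns the spaces into '+', which separates inc values.
        params["inc"] = " ".join(includes)
    return f"{BASE_URL}/{entity}/{mbid}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder MBID for illustration only; it does not refer to a real artist.
    placeholder = "00000000-0000-0000-0000-000000000000"
    print(build_lookup_url("artist", placeholder, ["aliases", "releases"]))
```

Keeping URL construction separate from the HTTP call makes it easy to tabulate every entity and `inc` combination for API.md before any requests are sent.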

2. AcoustID

Repo: https://github.com/acoustid/acoustid-server | Language: Python | Index: https://github.com/acoustid/acoustid-index (Zig)

Todos


3. ListenBrainz

Repo: https://github.com/metabrainz/listenbrainz-server | Language: Python

Todos


4. music-metadata-api

Repo: https://github.com/Aunali321/music-metadata-api | Language: Go

Todos


5. MiniMediaMetadataAPI

Repo: https://github.com/MusicMoveArr/MiniMediaMetadataAPI | Language: C#

Todos


6. Lidarr Metadata API

Repo: https://github.com/Lidarr/LidarrAPI.Metadata | Language: Python

Todos


7. Harmony

Repo: https://github.com/kellnerd/harmony | Language: TypeScript | Runtime: Deno | Framework: Fresh

Todos


8. GraphBrainz

Repo: https://github.com/exogen/graphbrainz | Language: JavaScript | Framework: Express + GraphQL

Todos


9. Bedrock-API

Repo: https://github.com/feralbureau/bedrock-api | Language: Go | API: gRPC + HTTP

Todos


10. minim

Repo: https://github.com/bbye98/minim | Language: Python | Type: Library (not server)

Todos


11. MusicMetaLinker

Repo: https://github.com/andreamust/MusicMetaLinker | Language: Python | Type: Library

Todos


12. Meelo

Repo: https://github.com/Arthi-chaud/Meelo | Language: TypeScript (87%), Python, Go

Todos


13. Melodee

Repo: https://github.com/melodee-project/melodee | Language: C# (.NET 10) | UI: Blazor

Todos

  • Phase 1 - Identity & Entry Points: Locate Program.cs, .csproj/.sln, identify Blazor app entry. Map project structure.
  • Phase 2 - Architecture & Structure: Map multi-stage pipeline: Inbound → Staging → Storage. Identify service layer, job scheduler (Quartz.NET), media processing pipeline.
  • Phase 3 - API Surface: Document three APIs: OpenSubsonic, Jellyfin, Native REST (/scalar/v1). Extract OpenAPI spec at /openapi/v1.json. Map endpoint coverage per API.
  • Phase 4 - Data Layer: PostgreSQL schema. Map entities: Artist, Album, Track, Library, User. Find EF Core migrations. Document MusicBrainz local cache DB.
  • Phase 5 - External Integrations: Metadata providers: MusicBrainz (local cache), Last.fm, Spotify, iTunes, Deezer, Brave Search. Scrobbling: Last.fm. Transcoding: ffmpeg.
  • Phase 6 - Auth & Security: User authentication, API auth per protocol (Subsonic token, Jellyfin, JWT).
  • Phase 7 - Configuration: appsettings.json, environment variables, library paths, provider API keys.
  • Phase 8 - Testing: Test projects, xUnit/NUnit.
  • Phase 9 - Observability: Logging, job scheduler status, health checks.
  • Phase 10 - Deployment: Docker, Podman, resource requirements (Raspberry Pi compatible). Multi-library federation.
  • Synthesize: Write analysis deliverables.
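
For the Phase 3 endpoint inventory, one option is to enumerate operations straight from the OpenAPI document once it has been saved from /openapi/v1.json. The sketch below assumes the spec is already parsed into a dictionary. The `SAMPLE_SPEC` paths are invented for illustration and are not Melodee's actual routes.

```python
# Operation keys allowed on an OpenAPI Path Item Object.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from a parsed OpenAPI document, sorted by path."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


# Hypothetical sample document; replace with json.load() on the saved spec.
SAMPLE_SPEC = {
    "openapi": "3.0.1",
    "paths": {
        "/api/v1/albums": {"get": {}, "post": {}},
        "/api/v1/artists/{id}": {"get": {}, "parameters": []},
    },
}

if __name__ == "__main__":
    for method, path in list_operations(SAMPLE_SPEC):
        print(f"{method:7} {path}")
```

The resulting list can be grouped by path prefix to compare endpoint coverage across the three APIs.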

14. Navidrome

Repo: https://github.com/navidrome/navidrome | Language: Go | UI: React

Todos


15. gonic

Repo: https://github.com/sentriz/gonic | Language: Go

Todos


16. LMS (Lightweight Music Server)

Repo: https://github.com/epoupon/lms | Language: C++

Todos


17. Accentor

Repo: https://github.com/accentor/api | Language: Ruby | Framework: Rails

Todos


Projects are analyzed in waves, prioritized by their relevance as metadata providers/aggregators:

Wave 1: Core Metadata Services

  1. MusicBrainz Server - Foundation everything builds on
  2. AcoustID - Fingerprinting complement to MusicBrainz
  3. ListenBrainz - Recommendations complement

Wave 2: Aggregators (highest value for our project)

  1. Harmony - Best multi-source aggregator
  2. GraphBrainz - GraphQL aggregation layer
  3. MiniMediaMetadataAPI - Multi-provider self-hosted
  4. music-metadata-api - High-volume lookup service
  5. Bedrock-API - gRPC aggregator

Wave 3: Libraries

  1. minim - Python multi-API client
  2. MusicMetaLinker - Entity linking library

Wave 4: Self-Hosted Servers (metadata as secondary feature)

  1. Meelo - Collector-focused with rich metadata
  2. Melodee - All-in-one with multiple API protocols
  3. Navidrome - Popular streaming server
  4. Lidarr Metadata API - *arr ecosystem
  5. LMS - C++ with strong MusicBrainz support
  6. gonic - Minimal Go implementation
  7. Accentor - Metadata-focused Rails server
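
The wave ordering above can be captured as data so that tooling processes projects in priority order and can check that no project is missing. This is a minimal sketch; the names `WAVES` and `analysis_queue` are assumptions, and the project names are copied from the lists above.

```python
# Waves in priority order; dicts preserve insertion order in Python 3.7+.
WAVES = {
    "Wave 1: Core Metadata Services": ["MusicBrainz Server", "AcoustID", "ListenBrainz"],
    "Wave 2: Aggregators": [
        "Harmony", "GraphBrainz", "MiniMediaMetadataAPI",
        "music-metadata-api", "Bedrock-API",
    ],
    "Wave 3: Libraries": ["minim", "MusicMetaLinker"],
    "Wave 4: Self-Hosted Servers": [
        "Meelo", "Melodee", "Navidrome", "Lidarr Metadata API",
        "LMS", "gonic", "Accentor",
    ],
}


def analysis_queue(waves: dict) -> list:
    """Flatten the waves into (wave, position, project) tuples in priority order."""
    queue = []
    for wave, projects in waves.items():
        for position, project in enumerate(projects, start=1):
            queue.append((wave, position, project))
    return queue


if __name__ == "__main__":
    queue = analysis_queue(WAVES)
    # Sanity check against the plan: 17 projects, each listed exactly once.
    names = [project for _, _, project in queue]
    assert len(names) == len(set(names)) == 17
    for wave, position, project in queue:
        print(f"{wave} / {position}. {project}")
```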

Per-Project Deliverables

Each project analysis produces:

docs/research/{project-slug}/analysis/
├── OVERVIEW.md        # Purpose, tech stack, license, status
├── ARCHITECTURE.md    # Design patterns, layers, modules
├── API.md             # Endpoints, schemas, authentication
├── DATA.md            # Database, models, migrations
├── INTEGRATIONS.md    # External services, queues, webhooks
├── DEPLOYMENT.md      # Build, CI/CD, infrastructure
├── CODEBASE.md        # Structure, patterns, conventions
└── EVALUATION.md      # Pros, cons, adoption considerations
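
The layout above could be scaffolded with a short script so that every project starts from the same set of empty deliverables. This is a sketch under stated assumptions: the `slugify` rule (lowercase, non-alphanumeric runs collapsed to hyphens) is one possible reading of {project-slug}, and `scaffold` is a hypothetical helper name.

```python
import re
from pathlib import Path

# The eight deliverables listed in the tree above.
DELIVERABLES = [
    "OVERVIEW.md", "ARCHITECTURE.md", "API.md", "DATA.md",
    "INTEGRATIONS.md", "DEPLOYMENT.md", "CODEBASE.md", "EVALUATION.md",
]


def slugify(name: str) -> str:
    """Assumed slug rule: lowercase, runs of non-alphanumerics become one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def scaffold(project_name: str, root: Path = Path("docs/research")) -> Path:
    """Create {root}/{project-slug}/analysis/ with a stub for each deliverable."""
    analysis_dir = root / slugify(project_name) / "analysis"
    analysis_dir.mkdir(parents=True, exist_ok=True)
    for filename in DELIVERABLES:
        target = analysis_dir / filename
        # Never overwrite a deliverable that has already been written.
        if not target.exists():
            title = Path(filename).stem
            target.write_text(f"# {title}: {project_name}\n", encoding="utf-8")
    return analysis_dir


if __name__ == "__main__":
    print(scaffold("MusicBrainz Server"))
```

Because existing files are left untouched, the script can be re-run safely as new projects are added.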

Agent Dispatch Pattern

For each project, launch in parallel:

1. explore agent  → Code Structure (Phase 1, 2)
2. explore agent  → API Surface (Phase 3)
3. explore agent  → Data Layer (Phase 4)
4. librarian agent → Dependencies (Phase 5, 7)
5. librarian agent → External Integrations (Phase 5, 6)

Then synthesize results into deliverable files.
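
The fan-out and synthesis steps above can be sketched with asyncio. The agent launcher is not specified in this plan, so `run_agent` below is a hypothetical stand-in that returns a placeholder report. The task table mirrors the five entries listed above.

```python
import asyncio

# (agent kind, task, phases) copied from the dispatch list above.
DISPATCH = [
    ("explore", "Code Structure", (1, 2)),
    ("explore", "API Surface", (3,)),
    ("explore", "Data Layer", (4,)),
    ("librarian", "Dependencies", (5, 7)),
    ("librarian", "External Integrations", (5, 6)),
]


async def run_agent(kind: str, task: str, phases: tuple, project: str) -> dict:
    """Hypothetical stand-in for launching one agent; returns a placeholder report."""
    await asyncio.sleep(0)  # yield control, as a real agent call would
    return {"agent": kind, "task": task, "phases": phases, "project": project}


async def analyze(project: str) -> list:
    """Launch all five agents concurrently; gather keeps results in dispatch order."""
    return await asyncio.gather(
        *(run_agent(kind, task, phases, project) for kind, task, phases in DISPATCH)
    )


def covered_phases(reports: list) -> set:
    """Collect the phase numbers touched by the returned reports."""
    return {phase for report in reports for phase in report["phases"]}


if __name__ == "__main__":
    reports = asyncio.run(analyze("Harmony"))
    print(sorted(covered_phases(reports)))
```

As listed, the five tasks cover Phases 1 through 7. Phases 8 to 10 (Testing, Observability, Deployment) are not assigned to any agent in this pattern, so they would need to be handled during synthesis or by an additional task.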

See REVERSE_ENGINEERING_PROMPT.md for full agent prompt templates.