metadata-agregator/docs/research/REVERSE_ENGINEERING_PROMPT.md
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


Project Reverse Engineering - Agent Prompt Templates

Reusable prompts for comprehensive architectural analysis of any codebase.


Master Orchestration Prompt

# PROJECT REVERSE ENGINEERING: {PROJECT_NAME}

## OBJECTIVE
Perform comprehensive architectural analysis of {PROJECT_NAME} ({REPO_URL}).
Extract all information needed for an architect to understand, evaluate, and potentially integrate or fork this project.

## OUTPUT FORMAT
Create a structured report in `docs/research/{project-slug}/analysis/` with:
- `OVERVIEW.md` - Executive summary
- `ARCHITECTURE.md` - System design
- `API.md` - API surface documentation
- `DATA.md` - Data models and persistence
- `INTEGRATIONS.md` - External dependencies and services
- `DEPLOYMENT.md` - Build, deploy, operate
- `CODEBASE.md` - Code organization and patterns
- `EVALUATION.md` - Pros, cons, adoption considerations

---

## PHASE 1: IDENTITY & ENTRY POINTS

### Search for:
1. **Project metadata files**:
   - README.md, CONTRIBUTING.md, CHANGELOG.md
   - LICENSE, SECURITY.md, CODE_OF_CONDUCT.md

2. **Package manifests** (identify language/framework):
   - package.json, package-lock.json, yarn.lock
   - go.mod, go.sum
   - Cargo.toml, Cargo.lock
   - pyproject.toml, setup.py, requirements.txt, Pipfile
   - *.csproj, *.sln, packages.config
   - pom.xml, build.gradle
   - Gemfile, *.gemspec
   - composer.json

3. **Entry points** (grep patterns):
   - `func main(` (Go)
   - `if __name__ == "__main__"` (Python)
   - `"main":` in package.json (Node.js)
   - `createApp`, `express()`, `fastify()` (JS frameworks)
   - `@SpringBootApplication`, `public static void main` (Java)
   - `Program.cs`, `Startup.cs` (.NET)

4. **Build/task files**:
   - Makefile, Taskfile.yml, justfile
   - package.json scripts section
   - Dockerfile, docker-compose*.yml

### Extract:
- [ ] Project name and description
- [ ] Primary language and framework
- [ ] Version and release status
- [ ] License type
- [ ] Main entry point file(s)
- [ ] Build commands
- [ ] Run commands

---

## PHASE 2: ARCHITECTURE & STRUCTURE

### Search for:
1. **Architecture documentation**:
   - ARCHITECTURE.md, docs/architecture/*, docs/design/*
   - ADR (Architecture Decision Records) in docs/adr/
   - Diagrams: *.mmd, *.puml, *.drawio, docs/diagrams/*

2. **Directory structure patterns**:

   - src/, lib/, pkg/, internal/, cmd/, app/
   - core/, domain/, entities/, models/
   - services/, handlers/, controllers/, api/
   - repositories/, dal/, db/, persistence/
   - adapters/, ports/, interfaces/, infrastructure/
   - utils/, helpers/, common/, shared/


3. **Module boundaries**:
- Separate go.mod files (Go workspaces)
- Multiple package.json (monorepo)
- __init__.py locations (Python packages)
- *.csproj files (.NET projects)

### Extract:
- [ ] Architecture style (monolith, microservices, modular monolith)
- [ ] Layer organization (clean, hexagonal, MVC, etc.)
- [ ] Module/package list with responsibilities
- [ ] Dependency direction (which modules import which)
- [ ] Public vs internal API boundaries

---

## PHASE 3: API SURFACE

### Search for:
1. **API specifications**:
- openapi.yaml, openapi.json, swagger.*
- *.proto (gRPC/protobuf)
- schema.graphql, *.gql
- RAML, API Blueprint files

2. **Route definitions** (grep patterns):
- `router.`, `app.get(`, `app.post(`, `app.use(`
- `@Get(`, `@Post(`, `@Controller(`
- `@app.route(`, `@router.`
- `http.HandleFunc(`, `mux.Handle(`
- `[HttpGet]`, `[HttpPost]`, `[Route(`

3. **API versioning**:
- `/api/v1/`, `/api/v2/` in routes
- Version headers handling
- Version in path vs query vs header

4. **Request/Response types**:
- DTOs, ViewModels, Schemas
- Validation decorators/annotations
- Serialization configuration

### Extract:
- [ ] API style (REST, GraphQL, gRPC, mixed)
- [ ] Complete endpoint list with methods
- [ ] Authentication requirements per endpoint
- [ ] Request/response schemas
- [ ] Rate limiting configuration
- [ ] CORS settings

---

## PHASE 4: DATA LAYER

### Search for:
1. **Database configuration**:
- database.yml, ormconfig.*, knexfile.*
- prisma/schema.prisma
- alembic.ini, alembic/
- Connection strings in config files

2. **Migrations**:
- migrations/, db/migrate/
- *_migration.*, *.up.sql, *.down.sql
- Migration tool config (Flyway, Liquibase, etc.)

3. **Models/Entities**:
- models/, entities/, domain/
- @Entity, @Table decorators
- SQLAlchemy models, Django models
- Prisma models, TypeORM entities

4. **Caching layer**:
- Redis configuration
- Cache decorators/annotations
- TTL settings

5. **Search/indexing**:
- Elasticsearch, Solr, MeiliSearch config
- Index definitions

### Extract:
- [ ] Database type (PostgreSQL, MySQL, SQLite, MongoDB, etc.)
- [ ] ORM/query builder used
- [ ] Complete entity list with relationships
- [ ] Migration history (schema evolution)
- [ ] Indexes defined
- [ ] Caching strategy
- [ ] Search implementation

---

## PHASE 5: EXTERNAL INTEGRATIONS

### Search for:
1. **API clients**:
- clients/, adapters/, providers/
- *Client.*, *Service.*, *API.*
- HTTP client initialization (axios, fetch, http.Client)

2. **Third-party SDKs**:
- aws-sdk, google-cloud, azure
- stripe, twilio, sendgrid
- oauth providers

3. **Message queues**:
- queues/, workers/, jobs/, consumers/
- RabbitMQ, Kafka, Redis Pub/Sub, SQS config
- Bull, Celery, Sidekiq configuration

4. **Webhooks**:
- webhooks/, callbacks/
- Webhook handlers and validators

5. **External service configuration**:
- Service URLs in config
- API keys in .env.example

### Extract:
- [ ] List of external services integrated
- [ ] API clients and their configuration
- [ ] Message queue architecture
- [ ] Webhook endpoints (incoming)
- [ ] Outgoing webhook calls
- [ ] Service dependencies (required vs optional)

---

## PHASE 6: AUTHENTICATION & SECURITY

### Search for:
1. **Auth implementation**:
- auth/, authentication/, identity/
- middleware/auth*, guards/, policies/
- JWT handling, session management
- OAuth/OIDC configuration

2. **Authorization**:
- RBAC/ABAC implementation
- Permission checks, policy enforcement
- Role definitions

3. **Security middleware**:
- CORS configuration
- Rate limiting
- Input validation
- CSRF protection

4. **Secrets management**:
- Vault integration
- Secret rotation
- Encryption at rest

### Extract:
- [ ] Authentication method(s) (JWT, session, OAuth, API key)
- [ ] Token storage and lifecycle
- [ ] Authorization model (RBAC, ABAC, custom)
- [ ] Role/permission definitions
- [ ] Security headers configured
- [ ] Rate limiting rules
- [ ] Input validation approach

---

## PHASE 7: CONFIGURATION & ENVIRONMENT

### Search for:
1. **Environment configuration**:
- .env.example, .env.sample, .env.template
- config/, settings/, conf/
- Environment-specific files (*.development.*, *.production.*)

2. **Configuration loaders**:
- Config parsing code
- Environment variable mapping
- Default values

3. **Feature flags**:
- Feature flag service integration
- Local feature flag config

### Extract:
- [ ] All environment variables (from .env.example)
- [ ] Required vs optional configuration
- [ ] Configuration hierarchy (defaults → env → file)
- [ ] Feature flag system
- [ ] Environment-specific overrides

---

## PHASE 8: TESTING

### Search for:
1. **Test files**:
- *_test.*, *.spec.*, *.test.*
- tests/, __tests__/, spec/
- Test configuration (jest.config.*, pytest.ini, etc.)

2. **Test types**:
- Unit tests
- Integration tests (tests/integration/)
- E2E tests (e2e/, cypress/, playwright/)
- Contract tests (pact/)

3. **Test utilities**:
- fixtures/, __mocks__/, testdata/
- factories/, builders/
- Test helpers

### Extract:
- [ ] Test framework(s) used
- [ ] Test coverage configuration
- [ ] Test categories and organization
- [ ] Mocking strategy
- [ ] Test data management
- [ ] CI test commands

---

## PHASE 9: OBSERVABILITY

### Search for:
1. **Logging**:
- logging/, logger.*
- Log configuration
- Log levels and formats

2. **Metrics**:
- metrics/, prometheus.*
- Custom metrics definitions
- Metrics endpoints

3. **Tracing**:
- tracing/, *span*, *trace*
- OpenTelemetry, Jaeger, Zipkin config

4. **Health checks**:
- health.*, /health, /ready, /live endpoints
- Dependency health checks

5. **Error tracking**:
- Sentry, Bugsnag, Rollbar integration

### Extract:
- [ ] Logging framework and configuration
- [ ] Log aggregation destination
- [ ] Metrics exposed
- [ ] Tracing implementation
- [ ] Health check endpoints
- [ ] Error tracking service

---

## PHASE 10: DEPLOYMENT & OPERATIONS

### Search for:
1. **CI/CD**:
- .github/workflows/
- .gitlab-ci.yml
- Jenkinsfile, azure-pipelines.yml
- .circleci/

2. **Containerization**:
- Dockerfile, docker-compose*.yml
- .dockerignore

3. **Orchestration**:
- kubernetes/, k8s/, helm/
- docker-swarm.yml
- nomad/

4. **Infrastructure as Code**:
- terraform/, pulumi/, cdk/
- cloudformation/

5. **Release management**:
- CHANGELOG.md
- Release scripts
- Version bumping config

### Extract:
- [ ] CI/CD pipeline stages
- [ ] Build process
- [ ] Test automation in CI
- [ ] Deployment targets (cloud, k8s, etc.)
- [ ] Infrastructure dependencies
- [ ] Release process
- [ ] Rollback procedures

---

## DELIVERABLES CHECKLIST

For each project, produce:

- [ ] `OVERVIEW.md` - Purpose, tech stack, license, status
- [ ] `ARCHITECTURE.md` - Design patterns, layers, modules
- [ ] `API.md` - Endpoints, schemas, authentication
- [ ] `DATA.md` - Database, models, migrations
- [ ] `INTEGRATIONS.md` - External services, queues, webhooks
- [ ] `DEPLOYMENT.md` - Build, CI/CD, infrastructure
- [ ] `CODEBASE.md` - Structure, patterns, conventions
- [ ] `EVALUATION.md` - Pros, cons, adoption considerations
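
Because the deliverables form a fixed set of files, the report directory can be scaffolded before any agent runs. The following is a minimal sketch, assuming the `docs/research/{project-slug}/analysis/` layout from the output format section. The function name `scaffold_report` and its slug argument are hypothetical, not part of the original prompt.

```shell
# Hypothetical helper: create empty report files for one project.
# Usage: scaffold_report <project-slug>
scaffold_report() {
  slug="$1"
  dir="docs/research/${slug}/analysis"
  mkdir -p "$dir" || return 1
  for name in OVERVIEW ARCHITECTURE API DATA INTEGRATIONS DEPLOYMENT CODEBASE EVALUATION; do
    # Leave any existing file untouched so reruns do not clobber findings.
    [ -e "$dir/$name.md" ] || : > "$dir/$name.md"
  done
  # Print the directory so callers can pass it on to later steps.
  printf '%s\n' "$dir"
}
```

Running `scaffold_report harmony` from the repository root would create the eight empty files under `docs/research/harmony/analysis/`.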

Specialized Agent Prompts

Explore Agent - Code Structure

[CONTEXT]: Reverse engineering {PROJECT_NAME} at {REPO_URL}
[GOAL]: Map the codebase structure and identify architectural patterns
[DOWNSTREAM]: Feed into comprehensive architecture documentation
[REQUEST]:
1. Clone/examine the repository structure (top 3 levels)
2. Identify the primary language and framework from package manifests
3. Find all entry points (main functions, app bootstrap)
4. Map the directory structure to architectural layers
5. Identify module boundaries and dependencies
6. Find any existing architecture documentation

SKIP: node_modules, vendor, dist, build, .git, __pycache__
RETURN: Structured findings with file paths as evidence
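
Step 2 of this prompt (identifying the language from package manifests) can be approximated with a file-existence check. This is a rough sketch that only inspects the top level of the repository and uses the manifest names listed in Phase 1. The function name `detect_stack` and the ecosystem labels are illustrative assumptions.

```shell
# Hypothetical helper: print the ecosystems whose manifests exist in a directory.
# Usage: detect_stack [repo-dir]
detect_stack() {
  root="${1:-.}"
  # Each entry is "manifest:ecosystem"; only top-level files are checked.
  for entry in package.json:node go.mod:go Cargo.toml:rust \
               pyproject.toml:python setup.py:python requirements.txt:python Pipfile:python \
               pom.xml:java build.gradle:java Gemfile:ruby composer.json:php; do
    manifest="${entry%%:*}"
    ecosystem="${entry##*:}"
    if [ -f "$root/$manifest" ]; then
      printf '%s\n' "$ecosystem"
    fi
  done | LC_ALL=C sort -u
}
```

A repository with manifests in several ecosystems prints one line per ecosystem, which is a hint that it may be a monorepo and that module boundaries deserve a closer look.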

Explore Agent - API Surface

[CONTEXT]: Reverse engineering {PROJECT_NAME} at {REPO_URL}
[GOAL]: Document complete API surface (REST/GraphQL/gRPC)
[DOWNSTREAM]: Create API.md with all endpoints and schemas
[REQUEST]:
1. Find API specification files (openapi.yaml, *.proto, schema.graphql)
2. Grep for route definitions in all supported patterns
3. Extract request/response types and validation
4. Identify authentication requirements per endpoint
5. Find rate limiting and CORS configuration
6. Document any API versioning strategy

RETURN: Complete endpoint list with method, path, auth requirement, and schema reference
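
Step 1 (finding API specification files) maps directly onto a `find` invocation. The sketch below assumes the file names listed in Phase 3 and skips the usual dependency directories. `find_api_specs` is a hypothetical name.

```shell
# Hypothetical helper: list API specification files under a directory.
# Usage: find_api_specs [repo-dir]
find_api_specs() {
  root="${1:-.}"
  find "$root" -type f \
    \( -name 'openapi.yaml' -o -name 'openapi.json' -o -name 'swagger.*' \
       -o -name '*.proto' -o -name 'schema.graphql' -o -name '*.gql' \) \
    ! -path '*/node_modules/*' ! -path '*/vendor/*' ! -path '*/.git/*' \
    | LC_ALL=C sort
}
```

An empty result suggests the API is defined only in code, in which case the route-definition grep in step 2 becomes the primary source.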

Explore Agent - Data Layer

[CONTEXT]: Reverse engineering {PROJECT_NAME} at {REPO_URL}
[GOAL]: Document data persistence layer completely
[DOWNSTREAM]: Create DATA.md with models, relationships, migrations
[REQUEST]:
1. Identify database type from configuration
2. Find all entity/model definitions
3. Extract relationships between entities
4. List all migrations in chronological order
5. Identify caching layer configuration
6. Find any search/indexing implementation

RETURN: Entity list with fields, relationships, and migration history
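
Step 4 (listing migrations in chronological order) can be sketched as follows, under the assumption that migration file names start with a sortable sequence number or timestamp, as most migration tools generate. The helper name `list_migrations` is hypothetical, and file paths containing tab characters are not handled.

```shell
# Hypothetical helper: list migration files ordered by file name.
# Usage: list_migrations [repo-dir]
list_migrations() {
  root="${1:-.}"
  find "$root" -type f \
    \( -name '*.up.sql' -o -path '*/migrations/*' -o -path '*/db/migrate/*' \) \
    ! -path '*/node_modules/*' ! -path '*/.git/*' \
    | awk -F/ '{ print $NF "\t" $0 }' \
    | LC_ALL=C sort \
    | cut -f2-
}
```

The `awk` step prefixes each path with its base name so that files in different directories still sort by their sequence prefix, and `cut` strips the prefix again.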

Librarian Agent - Dependencies

[CONTEXT]: Analyzing dependencies of {PROJECT_NAME}
[GOAL]: Understand external library usage and their purposes
[DOWNSTREAM]: Assess technical debt, security, maintainability
[REQUEST]:
1. Parse package manifest for all dependencies
2. Categorize: runtime vs dev, core vs optional
3. For key dependencies, look up:
   - Purpose and functionality
   - Current version vs latest
   - Known vulnerabilities (npm audit, safety, etc.)
   - Maintenance status (last release, open issues)
4. Identify any deprecated or unmaintained dependencies

RETURN: Dependency inventory with risk assessment

Librarian Agent - External Integrations

[CONTEXT]: Analyzing external integrations of {PROJECT_NAME}
[GOAL]: Document all third-party service integrations
[DOWNSTREAM]: Understand operational dependencies
[REQUEST]:
1. Find API client implementations in the codebase
2. For each external service:
   - Official documentation links
   - API version being used
   - Authentication method
   - Rate limits and quotas
3. Find message queue integrations
4. Document webhook handlers (incoming/outgoing)

RETURN: Integration inventory with documentation links and configuration requirements

Dispatch Template

// Template for dispatching agents - substitute {PROJECT_NAME} and {REPO_URL}

// Phase 1: Structure Analysis (parallel)
task(subagent_type="explore", load_skills=[], run_in_background=true,
  description="Analyze {PROJECT_NAME} structure",
  prompt=`[CONTEXT]: Reverse engineering {PROJECT_NAME} at {REPO_URL}
[GOAL]: Map the codebase structure and identify architectural patterns
[DOWNSTREAM]: Feed into comprehensive architecture documentation
[REQUEST]:
1. Clone/examine the repository structure (top 3 levels)
2. Identify the primary language and framework from package manifests
3. Find all entry points (main functions, app bootstrap)
4. Map the directory structure to architectural layers
5. Identify module boundaries and dependencies
6. Find any existing architecture documentation

SKIP: node_modules, vendor, dist, build, .git, __pycache__
RETURN: Structured findings with file paths as evidence`
)

task(subagent_type="explore", load_skills=[], run_in_background=true,
  description="Document {PROJECT_NAME} API",
  prompt=`[CONTEXT]: Reverse engineering {PROJECT_NAME} at {REPO_URL}
[GOAL]: Document complete API surface (REST/GraphQL/gRPC)
[DOWNSTREAM]: Create API.md with all endpoints and schemas
[REQUEST]:
1. Find API specification files (openapi.yaml, *.proto, schema.graphql)
2. Grep for route definitions in all supported patterns
3. Extract request/response types and validation
4. Identify authentication requirements per endpoint
5. Find rate limiting and CORS configuration
6. Document any API versioning strategy

RETURN: Complete endpoint list with method, path, auth requirement, and schema reference`
)

task(subagent_type="explore", load_skills=[], run_in_background=true,
  description="Analyze {PROJECT_NAME} data layer",
  prompt=`[CONTEXT]: Reverse engineering {PROJECT_NAME} at {REPO_URL}
[GOAL]: Document data persistence layer completely
[DOWNSTREAM]: Create DATA.md with models, relationships, migrations
[REQUEST]:
1. Identify database type from configuration
2. Find all entity/model definitions
3. Extract relationships between entities
4. List all migrations in chronological order
5. Identify caching layer configuration
6. Find any search/indexing implementation

RETURN: Entity list with fields, relationships, and migration history`
)

// Phase 2: External Research (parallel)
task(subagent_type="librarian", load_skills=[], run_in_background=true,
  description="Research {PROJECT_NAME} dependencies",
  prompt=`[CONTEXT]: Analyzing dependencies of {PROJECT_NAME}
[GOAL]: Understand external library usage and their purposes
[DOWNSTREAM]: Assess technical debt, security, maintainability
[REQUEST]:
1. Parse package manifest for all dependencies
2. Categorize: runtime vs dev, core vs optional
3. For key dependencies, look up:
   - Purpose and functionality
   - Current version vs latest
   - Known vulnerabilities (npm audit, safety, etc.)
   - Maintenance status (last release, open issues)
4. Identify any deprecated or unmaintained dependencies

RETURN: Dependency inventory with risk assessment`
)

task(subagent_type="librarian", load_skills=[], run_in_background=true,
  description="Document {PROJECT_NAME} integrations",
  prompt=`[CONTEXT]: Analyzing external integrations of {PROJECT_NAME}
[GOAL]: Document all third-party service integrations
[DOWNSTREAM]: Understand operational dependencies
[REQUEST]:
1. Find API client implementations in the codebase
2. For each external service:
   - Official documentation links
   - API version being used
   - Authentication method
   - Rate limits and quotas
3. Find message queue integrations
4. Document webhook handlers (incoming/outgoing)

RETURN: Integration inventory with documentation links and configuration requirements`
)

// Phase 3: Wait for completion, then synthesize into documentation files

Quick Search Commands

# Project structure overview
tree -L 3 -I 'node_modules|vendor|.git|__pycache__|dist|build'

# Find largest directories (complexity indicators)
du -sh */ | sort -hr | head -10

# Count total lines across source files (handles spaces in paths and large file sets)
find . -type f \( -name "*.ts" -o -name "*.py" -o -name "*.go" \) -print0 | xargs -0 cat | wc -l

# Recent activity (what's being worked on)
git log --oneline -20

# Find TODO/FIXME comments
grep -rn "TODO\|FIXME\|HACK\|XXX" --include="*.ts" --include="*.py" --include="*.go"

# Find all entry points
grep -r "func main\|def main\|if __name__\|createApp\|express()" --include="*.go" --include="*.py" --include="*.ts" --include="*.js"

# Find route definitions
grep -rn "router\.\|app\.get\|app\.post\|@Get\|@Post\|\.route(\|HandleFunc(\|path(" --include="*.ts" --include="*.py" --include="*.go"

# Find database models/entities
grep -rn "class.*Model\|@Entity\|@Table\|type.*struct" --include="*.py" --include="*.ts" --include="*.go" --include="*.java"

# Find external API calls
grep -rn "fetch(\|axios\|http\.Get\|requests\.\|HttpClient" --include="*.ts" --include="*.py" --include="*.go" --include="*.cs"

# Find environment variable usage
grep -rn "process\.env\|os\.getenv\|os\.Getenv\|env::" --include="*.ts" --include="*.py" --include="*.go" --include="*.rs"
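
The search above finds where environment variables are read. The declared variables can also be taken from `.env.example` itself, which Phase 7 treats as the source of truth. A minimal sketch, assuming a conventional `KEY=value` file with optional `export` prefixes; `list_env_vars` is a hypothetical helper name.

```shell
# Hypothetical helper: print the variable names declared in an env template.
# Usage: list_env_vars [path-to-env-file]
list_env_vars() {
  file="${1:-.env.example}"
  # Match "NAME=..." with an optional "export " prefix; comment lines
  # and blank lines do not match and are skipped.
  sed -n -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/p' "$file" \
    | LC_ALL=C sort -u
}
```

Comparing this list with the names found by the grep above is a quick way to spot variables that are used in code but missing from the template.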

Usage

  1. Replace {PROJECT_NAME} with the project name (e.g., "Harmony")
  2. Replace {REPO_URL} with the repository URL (e.g., "https://github.com/kellnerd/harmony")
  3. Replace {project-slug} in the output path with a lowercase, hyphenated form of the project name (e.g., "harmony")
  4. Dispatch the agents using the template
  5. Collect results and synthesize into documentation files
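
The placeholder substitution can be scripted rather than done by hand. The sketch below assumes the prompt has been saved to a file and fills in only `{PROJECT_NAME}` and `{REPO_URL}`. The function name `render_prompt` and the template file argument are hypothetical.

```shell
# Hypothetical helper: fill in the placeholders of a saved prompt template.
# Usage: render_prompt <project-name> <repo-url> <template-file>
render_prompt() {
  name="$1"
  url="$2"
  template="$3"
  # Escape backslash, ampersand, and the "|" delimiter, which are
  # special in a sed replacement string.
  esc_name=$(printf '%s' "$name" | sed -e 's/[\\&|]/\\&/g')
  esc_url=$(printf '%s' "$url" | sed -e 's/[\\&|]/\\&/g')
  # "|" is used as the delimiter because URLs contain "/".
  sed -e "s|{PROJECT_NAME}|${esc_name}|g" \
      -e "s|{REPO_URL}|${esc_url}|g" "$template"
}
```

For example, `render_prompt "Harmony" "https://github.com/kellnerd/harmony" master-prompt.md` would write the filled-in prompt to standard output, where `master-prompt.md` is an assumed file name.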