feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
Alexander
2026-04-28 16:27:14 +02:00
commit a1f6701bac
163 changed files with 95884 additions and 0 deletions
@@ -0,0 +1,275 @@
# MiniMediaMetadataAPI - Project Overview
## Project Identity
**Name:** MiniMediaMetadataAPI
**Repository:** https://github.com/MusicMoveArr/MiniMediaMetadataAPI
**License:** GPL-3.0 (copyleft)
**Maintainer:** Single maintainer (MusicMoveArr organization)
**Status:** Active development
## Technology Stack
### Runtime & Language
- **.NET 8.0** (SDK 8.0.0)
- **C#** (modern language features)
- **ASP.NET Core** web framework
### Database Layer
- **PostgreSQL** as primary data store
- **Dapper 2.1.72** micro-ORM (NOT Entity Framework)
- **Npgsql 10.0.2** PostgreSQL driver for .NET
- **pg_trgm extension** for fuzzy text search
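
To make the fuzzy-search idea concrete, here is a minimal sketch of how trigram similarity works. It is a simplified, ASCII-only approximation of what `pg_trgm` computes and is not the extension's code or the project's query logic. The function names `trigrams` and `similarity` are chosen for illustration.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm trigram extraction for ASCII text."""
    grams: set[str] = set()
    # Lowercase the input and treat any non-alphanumeric run as a word separator.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Each word is padded with two leading spaces and one trailing space.
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

With this definition, `similarity("word", "two words")` is 4/11 (about 0.36), which would pass PostgreSQL's default `pg_trgm.similarity_threshold` of 0.3. In the database itself the comparison runs through the `similarity()` function or the `%` operator, ideally backed by a trigram index.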
### Core Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| Dapper | 2.1.72 | Lightweight ORM, SQL mapping |
| Npgsql | 10.0.2 | PostgreSQL connectivity |
| FuzzySharp | 2.0.2 | String similarity matching |
| Polly | 8.6.6 | Resilience and transient fault handling |
| Quartz | 3.17.0 | Job scheduling framework |
| SpotifyAPI.Web.Auth | 7.4.2 | Spotify authentication (unused in API) |
| prometheus-net | 8.2.1 | Metrics collection and export |
| Swashbuckle | 10.1.7 | OpenAPI/Swagger documentation |
## Provider Coverage
The API aggregates metadata from **6 music providers**:
1. **Spotify** - Streaming service with rich metadata
2. **Tidal** - High-fidelity streaming platform
3. **MusicBrainz** - Open music encyclopedia
4. **Deezer** - European streaming service
5. **Discogs** - Music database and marketplace
6. **SoundCloud** - User-generated content platform
Each provider has dedicated database models and repository implementations.
## Solution Structure
The codebase is organized into **3 projects**:
### 1. MiniMediaMetadataAPI (Main API)
- ASP.NET Core web application
- Controllers for HTTP endpoints
- Middleware for request processing
- Configuration and dependency injection
- Entry point: `Program.cs`
### 2. MiniMediaMetadataAPI.Application (Business Logic)
- Repository pattern implementations
- Service layer (SearchArtist, SearchAlbum, SearchTrack)
- Database models for all 6 providers
- Entity models for API responses
- Helper utilities
### 3. MiniMediaMetadataAPI.Tests (Testing)
- xUnit test framework
- **Current state: Empty stub only (0% coverage)**
## Dependency Injection Configuration
`Program.cs` registers the following components:
### Repositories (7 total)
- `ISpotifyRepository` → `SpotifyRepository`
- `ITidalRepository` → `TidalRepository`
- `IMusicBrainzRepository` → `MusicBrainzRepository`
- `IDeezerRepository` → `DeezerRepository`
- `IDiscogsRepository` → `DiscogsRepository`
- `ISoundCloudRepository` → `SoundCloudRepository`
- `IJobRepository` → `JobRepository`
### Services (3 total)
- `ISearchArtistService` → `SearchArtistService`
- `ISearchAlbumService` → `SearchAlbumService`
- `ISearchTrackService` → `SearchTrackService`
## Resource Footprint
**Memory Usage:** <250MB
**Connection Pooling:** MinPoolSize=5, MaxPoolSize=100
This lightweight footprint makes the API suitable for containerized deployments and resource-constrained environments.
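
As an illustration of where the pool limits live, the sketch below assembles an Npgsql-style connection string with the pool sizes noted above. The helper name `build_connection_string` and all argument values are hypothetical. The keys follow Npgsql's documented keyword names, and the sketch does not escape values that contain `;`.

```python
def build_connection_string(
    host: str,
    database: str,
    username: str,
    password: str,
    min_pool: int = 5,
    max_pool: int = 100,
) -> str:
    """Return a semicolon-separated connection string with pool limits."""
    if not 0 <= min_pool <= max_pool:
        raise ValueError("pool sizes must satisfy 0 <= min_pool <= max_pool")
    parts = {
        "Host": host,
        "Database": database,
        "Username": username,
        "Password": password,
        "Minimum Pool Size": min_pool,
        "Maximum Pool Size": max_pool,
    }
    # Dicts preserve insertion order, so the output order is stable.
    return ";".join(f"{key}={value}" for key, value in parts.items())
```

A maximum of 100 connections per API instance matters when the database is shared with MiniMediaScanner, since PostgreSQL's own `max_connections` setting has to cover both consumers.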
## Database Relationship
**Critical architectural note:** This API does NOT own the database schema.
- **Schema Owner:** MiniMediaScanner (separate project)
- **API Role:** Read-only consumer
- **Data Sync:** Handled entirely by MiniMediaScanner
- **No Migrations:** This project contains no database migration code
The API queries pre-populated tables. Data freshness depends on MiniMediaScanner's sync schedule.
## Codebase Metrics
- **Total C# files:** 99
- **Database models:** 60+
- **Controllers:** 4
- **Repositories:** 7
- **Services:** 3
- **Middleware:** 1 (Prometheus request tracking)
## Key Architectural Decisions
### Why Dapper over Entity Framework?
- Lightweight, minimal overhead
- Direct SQL control for complex queries
- Better performance for read-heavy workloads
- No change tracking overhead (read-only API)
### Why Repository Pattern?
- Clean separation between data access and business logic
- Provider-specific implementations isolated
- Easy to mock for testing (though tests are missing)
- Consistent interface across all providers
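
The testability argument can be sketched in a few lines. The upstream project is written in C#; this Python version only illustrates the shape of the pattern, and every name in it (`Artist`, `ArtistRepository`, `InMemoryArtistRepository`, `SearchArtistService`) is hypothetical rather than taken from the codebase.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Artist:
    provider: str
    name: str


class ArtistRepository(Protocol):
    """Interface the service depends on; one implementation per provider."""

    def search_artists(self, name: str) -> list[Artist]: ...


class InMemoryArtistRepository:
    """Test double that stands in for a database-backed repository."""

    def __init__(self, provider: str, names: list[str]) -> None:
        self._provider = provider
        self._names = names

    def search_artists(self, name: str) -> list[Artist]:
        needle = name.casefold()
        return [Artist(self._provider, n) for n in self._names if needle in n.casefold()]


class SearchArtistService:
    """Business logic that knows only the interface, not the data source."""

    def __init__(self, repository: ArtistRepository) -> None:
        self._repository = repository

    def search(self, name: str) -> list[Artist]:
        cleaned = name.strip()
        if not cleaned:
            return []  # skip the repository call for blank queries
        return self._repository.search_artists(cleaned)
```

Because the service accepts anything that satisfies the interface, a unit test can pass in `InMemoryArtistRepository("MusicBrainz", ["Daft Punk"])` and exercise the service without a PostgreSQL instance.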
### Why No Schema Ownership?
- Separation of concerns: MiniMediaScanner handles sync complexity
- API focuses on query optimization and response formatting
- Avoids dual-write problems
- Simpler deployment (no migration coordination)
## Integration Points
### External Dependencies
- PostgreSQL database (shared with MiniMediaScanner)
- Prometheus metrics collector (optional)
### Internal Dependencies
- No inter-service communication
- No message queues
- No caching layer
- No external API calls (data pre-populated)
## Configuration Surface
Primary configuration via `appsettings.json`:
```json
{
  "DatabaseConfiguration": {
    "ConnectionString": "Host=...;Database=...;Username=...;Password=..."
  },
  "Prometheus": {
    "MetricsUrl": "/metrics"
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information"
    }
  }
}
```
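
For a reimplementation that reuses this configuration shape, a loader might look like the following sketch. The function name `load_settings`, the flattened return keys, and the fallback defaults are assumptions made for illustration. The upstream project binds this file through ASP.NET Core's configuration system instead.

```python
import json


def load_settings(raw: str) -> dict[str, str]:
    """Parse the JSON document and return the three values the API needs."""
    settings = json.loads(raw)

    # The connection string has no sensible default, so fail fast when it is absent.
    connection = settings.get("DatabaseConfiguration", {}).get("ConnectionString")
    if not connection:
        raise ValueError("DatabaseConfiguration.ConnectionString is required")

    # Assumed fallbacks that mirror the values shown in the sample file.
    metrics_url = settings.get("Prometheus", {}).get("MetricsUrl", "/metrics")
    log_level = settings.get("Logging", {}).get("LogLevel", {}).get("Default", "Information")

    return {
        "connection_string": connection,
        "metrics_url": metrics_url,
        "log_level": log_level,
    }
```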
## Deployment Artifacts
- **Dockerfile:** Multi-stage build, non-root user, ports 8080/8081
- **compose.yaml:** Minimal build configuration
- **Production compose:** Port mapping (56232:8080), memory limit (256M), volume mount for config
## CI/CD Pipeline
**GitHub Actions:** `docker-image.yml`
- **Trigger:** Push to main branch
- **Steps:** Build Docker image → Push to Docker Hub
- **Missing:** Test execution, deployment automation, health checks
## API Surface
**Base Path:** `/api`
**Documentation:** `/swagger` (Swagger UI)
**Metrics:** `/metrics` (Prometheus format)
### Endpoints
- `GET /api/SearchArtist` - Search artists across providers
- `GET /api/SearchAlbum` - Search albums across providers
- `GET /api/SearchTrack` - Search tracks across providers
- `GET /api/Search` - Stub endpoint (not implemented)
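
A client call against these endpoints could be assembled as in the sketch below. The endpoint paths come from the list above, but the query parameter names `name` and `provider`, the helper `build_search_url`, and the base URL are assumptions. The real parameter names should be checked in the Swagger UI.

```python
from urllib.parse import urlencode

# Implemented endpoints from the list above; the stub /api/Search is excluded.
SEARCH_ENDPOINTS = {"SearchArtist", "SearchAlbum", "SearchTrack"}


def build_search_url(base_url: str, endpoint: str, name: str, provider: str = "Any") -> str:
    """Return a GET URL for one of the search endpoints."""
    if endpoint not in SEARCH_ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    # Hypothetical parameter names; urlencode handles escaping of the values.
    query = urlencode({"name": name, "provider": provider})
    return f"{base_url.rstrip('/')}/api/{endpoint}?{query}"
```

For example, `build_search_url("http://localhost:56232", "SearchArtist", "Daft Punk")` targets the host port from the production compose mapping.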
## Security Posture
**Authentication:** None (fully open API)
**Authorization:** None
**Rate Limiting:** None
**CORS:** Not configured
**HTTPS:** Commented out in production
This is a **trust-based deployment**, suitable only for internal networks or behind an authentication gateway.
## Observability
**Metrics:** Prometheus request counters (path, method, status labels)
**Logging:** ASP.NET Core default (console output)
**Tracing:** None
**Health Checks:** None
**Error Tracking:** None (no Sentry, no structured logging)
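
The request-counter behaviour can be sketched with the standard library alone. This is an illustration of the labelling scheme (path, method, status), not the project's middleware and not the prometheus-net API. The names `Handler` and `with_request_counter` are hypothetical.

```python
from collections import Counter
from typing import Callable

Handler = Callable[[str, str], int]  # (method, path) -> HTTP status code


def with_request_counter(handler: Handler, counts: Counter) -> Handler:
    """Wrap a handler so every call increments a (path, method, status) counter."""

    def wrapped(method: str, path: str) -> int:
        status = 500  # recorded if the handler raises before returning
        try:
            status = handler(method, path)
            return status
        finally:
            # Runs on success and on failure, so errors are counted too.
            counts[(path, method, str(status))] += 1

    return wrapped
```

One caveat with path labels in any real metrics system: if paths ever contain identifiers, every distinct value creates a new time series, so route templates are usually a safer label than raw paths.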
## Testing Strategy
**Current State:** No meaningful tests
**Test Framework:** xUnit configured but unused
**Coverage:** 0%
**CI Integration:** Tests not run in pipeline
This is a significant gap for production readiness.
## License Implications
**GPL-3.0** is a copyleft license requiring:
- Source code disclosure for derivative works
- Same license for modifications
- Patent grant to users
**Impact on integration:**
- Cannot incorporate code into proprietary systems without GPL compliance
- Can be run unmodified as a separate service; consuming it over its HTTP API generally keeps client code outside the GPL's scope
- Database schema and API patterns can inspire clean-room implementations
## Relevance to metadata-aggregator Project
**High relevance** - this is the closest existing implementation to our goals:
1. **Multi-provider aggregation** - exactly our use case
2. **Unified search API** - provider-agnostic queries
3. **Database schema design** - proven model for multi-provider storage
4. **Provider isolation** - clean separation via repository pattern
5. **Fuzzy search** - pg_trgm implementation reference
**Key learnings:**
- Repository-per-provider scales well
- Dapper performs well for read-heavy metadata queries
- Separate sync process (MiniMediaScanner) simplifies API
- Provider=Any pattern enables cross-provider search
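
The Provider=Any idea reduces to a small dispatch step, sketched below under simplifying assumptions: each provider is modelled as a plain callable that returns names, and the function `search` plus the `Repo` alias are hypothetical. How the upstream project merges or ranks results across providers is not covered here.

```python
from typing import Callable

Repo = Callable[[str], list[str]]  # query -> matching names for one provider


def search(repositories: dict[str, Repo], provider: str, query: str) -> dict[str, list[str]]:
    """Query one named provider, or every registered provider for "Any"."""
    if provider == "Any":
        selected = repositories
    elif provider in repositories:
        selected = {provider: repositories[provider]}
    else:
        raise ValueError(f"unknown provider: {provider}")
    # Results stay grouped by provider so the caller can tell sources apart.
    return {name: repo(query) for name, repo in selected.items()}
```

Keeping results keyed by provider leaves deduplication and ranking as a separate, explicit step instead of hiding it inside the fan-out.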
**Gaps to address:**
- Add comprehensive testing
- Implement authentication/authorization
- Add caching layer for performance
- Add health checks for production readiness
- Introduce API versioning so the interface can evolve
- Add rate limiting to prevent abuse
## Project Maturity Assessment
**Strengths:**
- Clean architecture
- Multiple providers working
- Lightweight and performant
- Good separation of concerns
**Weaknesses:**
- Single maintainer risk
- No test coverage
- Missing production hardening (auth, rate limiting, health checks)
- Schema coupling with external project
- Limited observability
**Maturity Level:** Early production / Advanced prototype
Suitable for internal use or as reference implementation. Needs hardening for public deployment.