Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


# MiniMediaMetadataAPI - Architecture Analysis
## Architectural Pattern
**Primary Pattern:** Repository Pattern with Service Layer
**NOT Clean Architecture** - simpler layered approach without strict dependency inversion
## Project Structure
```
MiniMediaMetadataAPI.sln
├── MiniMediaMetadataAPI/ (Web API Layer)
│ ├── Controllers/ (HTTP endpoints)
│ ├── Middlewares/ (Request pipeline)
│ ├── Options/ (Configuration models)
│ └── Program.cs (Entry point, DI setup)
├── MiniMediaMetadataAPI.Application/ (Business Logic Layer)
│ ├── Configurations/ (Database config models)
│ ├── Enums/ (Provider types, result types)
│ ├── Helpers/ (Utility functions)
│ ├── Models/
│ │ ├── Database/ (Provider-specific DB models)
│ │ │ ├── Deezer/
│ │ │ ├── Discogs/
│ │ │ ├── MusicBrainz/
│ │ │ ├── SoundCloud/
│ │ │ ├── Spotify/
│ │ │ └── Tidal/
│ │ └── Entities/ (API response models)
│ ├── Repositories/ (Data access layer)
│ └── Services/ (Business logic)
└── MiniMediaMetadataAPI.Tests/ (Test project - empty)
```
## Layer Responsibilities
### Web API Layer (MiniMediaMetadataAPI)
**Purpose:** HTTP interface and request handling
**Components:**
- **Controllers (4):** SearchArtist, SearchAlbum, SearchTrack, Search
- **Middleware (1):** RequestMiddleware (Prometheus metrics)
- **Program.cs:** DI container configuration, middleware pipeline setup
**Dependencies:**
- ASP.NET Core framework
- Swashbuckle (Swagger/OpenAPI)
- prometheus-net
- References Application layer
**Responsibilities:**
- HTTP request/response handling
- Input validation and sanitization
- Swagger documentation generation
- Metrics collection
- Dependency injection configuration
### Application Layer (MiniMediaMetadataAPI.Application)
**Purpose:** Business logic and data access
**Components:**
#### Repositories (7 implementations)
1. `SpotifyRepository` - Spotify data access
2. `TidalRepository` - Tidal data access
3. `MusicBrainzRepository` - MusicBrainz data access
4. `DeezerRepository` - Deezer data access
5. `DiscogsRepository` - Discogs data access
6. `SoundCloudRepository` - SoundCloud data access
7. `JobRepository` - Job tracking (unused)
Each of the six provider repositories (all except `JobRepository`) implements the methods below; the ID parameter type varies by provider (string, int, Guid, or long):
- `SearchArtist(string name, int offset)`
- `GetArtistById(id)`
- `SearchAlbum(string name, string artistId, int offset)`
- `GetAlbumById(id)`
- `SearchTrack(string name, string artistId, int offset)`
- `GetTrackById(id)`
#### Services (3 implementations)
1. `SearchArtistService` - Orchestrates artist search across providers
2. `SearchAlbumService` - Orchestrates album search across providers
3. `SearchTrackService` - Orchestrates track search across providers
**Dependencies:**
- Dapper (SQL mapping)
- Npgsql (PostgreSQL driver)
- FuzzySharp (string similarity)
- Polly (resilience)
**Responsibilities:**
- SQL query execution via Dapper
- Provider-specific data mapping
- Fuzzy search logic
- Error handling and logging
- Cross-provider aggregation (in services)
### Test Layer (MiniMediaMetadataAPI.Tests)
**Purpose:** Automated testing (currently unused)
**Current State:**
- xUnit framework configured
- Single empty test stub: `Test1()`
- 0% code coverage
- Not executed in CI/CD pipeline
## Data Flow
### Request Flow (Artist Search Example)
```
HTTP GET /api/SearchArtist?Name=Beatles&Provider=Any
  ↓
SearchArtistController.Get()
  ↓
Input sanitization (StringHelper.RemoveControlChars)
  ↓
ISearchArtistService.SearchArtist()
  ├─ [Provider=Any]     → Query all 6 repositories in parallel
  └─ [Provider=Spotify] → Query SpotifyRepository only
  ↓
Repository.SearchArtist()
  ↓
Dapper SQL execution with pg_trgm fuzzy match
  ↓
Map database models → SearchArtistEntity
  ↓
Return SearchArtistResponse (SearchResultType + entities)
  ↓
JSON serialization → HTTP 200 OK
```
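The flow names `StringHelper.RemoveControlChars` as the sanitization step, but its body is not reproduced in this analysis. A minimal sketch, assuming the helper simply drops Unicode control characters before the name is passed on (the class name `StringHelperSketch` is hypothetical and the real helper may differ):

```csharp
using System.Linq;

// Hypothetical stand-in for StringHelper.RemoveControlChars; the real helper may differ.
public static class StringHelperSketch
{
    // Keeps every character except Unicode control characters (category Cc),
    // such as NUL, tab, carriage return, and line feed.
    public static string RemoveControlChars(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        return new string(input.Where(c => !char.IsControl(c)).ToArray());
    }
}
```

Because the repository SQL binds the name as `@searchTerm`, a step like this tidies odd input rather than defending against SQL injection; parameter binding already covers that.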
### Database Query Flow
```
Service Layer
  ↓
Repository Interface (ISpotifyRepository)
  ↓
Repository Implementation (SpotifyRepository)
  ↓
Dapper QueryAsync<T>()
  ↓
Npgsql Connection (from pool)
  ↓
PostgreSQL Database
  ↓
pg_trgm similarity search
  ↓
Result set → Dapper mapping → Database models
  ↓
Transform to Entity models
  ↓
Return to Service
```
## Database Access Strategy
### ORM Choice: Dapper (NOT Entity Framework)
**Rationale:**
- Lightweight, minimal overhead
- Direct SQL control for complex queries
- No change tracking (read-only workload)
- Better performance for high-throughput reads
- Simpler for multi-provider schema
**Trade-offs:**
- No automatic migrations (schema owned externally anyway)
- Manual SQL writing (more verbose)
- No LINQ query translation
- SQL strings are not checked at compile time; column/property mismatches surface only at runtime
### Connection Management
**Pooling Configuration:**
```
MinPoolSize=5
MaxPoolSize=100
```
**Connection Lifecycle:**
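- Connections are opened per repository method call (not once per request) and drawn from the Npgsql pool, so a `Provider=Any` search can hold up to six pooled connections at once
- Returned to the pool when disposed after the query
- No long-lived connections
- No connection state management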
**No DbContext:** Each repository method opens/closes connections independently.
### Query Patterns
**Fuzzy Search (pg_trgm):**
```sql
SET LOCAL pg_trgm.similarity_threshold = 0.5;
SELECT * FROM spotify_artist
WHERE lower(name) % lower(@searchTerm)
ORDER BY similarity(lower(name), lower(@searchTerm)) DESC
LIMIT 20 OFFSET @offset;
```
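The `%` operator and `similarity()` come from PostgreSQL's pg_trgm extension. The sketch below re-implements pg_trgm's documented trigram rule in C# only to show why `WHERE lower(name) % lower(@searchTerm)` matches near-misses: each alphanumeric word is padded with two leading spaces and one trailing space and cut into 3-character windows, and similarity is the number of shared trigrams divided by the size of their union. It is an approximation for illustration (pg_trgm's locale and multibyte handling is ignored), and `TrigramSketch` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical illustration of pg_trgm-style trigram similarity (not the extension's code).
public static class TrigramSketch
{
    // Collects the distinct trigrams of every alphanumeric word in the input.
    public static HashSet<string> Trigrams(string input)
    {
        var trigrams = new HashSet<string>();
        var word = new StringBuilder();

        // The appended space flushes the final word
        foreach (char c in input.ToLowerInvariant() + " ")
        {
            if (char.IsLetterOrDigit(c))
            {
                word.Append(c);
                continue;
            }
            if (word.Length == 0)
            {
                continue;
            }

            // Pad the word with two leading spaces and one trailing space
            string padded = "  " + word + " ";
            for (int i = 0; i + 3 <= padded.Length; i++)
            {
                trigrams.Add(padded.Substring(i, 3));
            }
            word.Clear();
        }
        return trigrams;
    }

    // Shared trigrams divided by the number of distinct trigrams in either string.
    public static double Similarity(string a, string b)
    {
        var ta = Trigrams(a);
        var tb = Trigrams(b);
        int shared = ta.Count(t => tb.Contains(t));
        int union = ta.Count + tb.Count - shared;
        return union == 0 ? 0.0 : (double)shared / union;
    }

    // Mirrors `a % b` after SET LOCAL pg_trgm.similarity_threshold = 0.5.
    public static bool IsMatch(string a, string b, double threshold = 0.5) =>
        Similarity(a, b) >= threshold;
}
```

For example, `"word"` and `"two words"` share 4 of 11 distinct trigrams (about 0.36). That pair would pass pg_trgm's default threshold of 0.3 but is rejected at the 0.5 the query sets, which is how the `SET LOCAL` line trades recall for precision.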
**Exact ID Lookup:**
```sql
SELECT * FROM spotify_artist WHERE id = @id;
```
**Join Queries (Album with Artists):**
```sql
SELECT a.*, ar.*
FROM spotify_album a
LEFT JOIN spotify_album_artist aa ON a.id = aa.album_id
LEFT JOIN spotify_artist ar ON aa.artist_id = ar.id
WHERE a.id = @albumId;
```
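Because this is a `LEFT JOIN` through `spotify_album_artist`, an album with three artists comes back as three rows, and an album with no artists comes back as one row with NULL artist columns. If the repository uses Dapper's multi-mapping, folding those rows back into one album per ID is left to the caller. The sketch shows that folding step in plain C#; `AlbumRow`, `ArtistRef`, and `AlbumWithArtists` are hypothetical shapes, not the project's models.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened row: one per (album, artist) pair produced by the join.
public record AlbumRow(string AlbumId, string AlbumName, string? ArtistId, string? ArtistName);
public record ArtistRef(string Id, string Name);
public record AlbumWithArtists(string Id, string Name, List<ArtistRef> Artists);

public static class JoinFolding
{
    // Groups rows by album ID; rows whose artist columns are NULL add no artist.
    public static List<AlbumWithArtists> Fold(IEnumerable<AlbumRow> rows)
    {
        var albums = new Dictionary<string, AlbumWithArtists>();
        foreach (var row in rows)
        {
            if (!albums.TryGetValue(row.AlbumId, out var album))
            {
                album = new AlbumWithArtists(row.AlbumId, row.AlbumName, new List<ArtistRef>());
                albums[row.AlbumId] = album;
            }
            if (row.ArtistId is not null && row.ArtistName is not null)
            {
                album.Artists.Add(new ArtistRef(row.ArtistId, row.ArtistName));
            }
        }
        return albums.Values.ToList();
    }
}
```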
## Schema Ownership Model
**Critical Design Decision:** This API does NOT own the database schema.
### Responsibilities Split
| Concern | Owner | Location |
|---------|-------|----------|
| Schema definition | MiniMediaScanner | External project |
| Migrations | MiniMediaScanner | External project |
| Data ingestion | MiniMediaScanner | External project |
| Provider API calls | MiniMediaScanner | External project |
| Data sync scheduling | MiniMediaScanner | External project |
| Query optimization | MiniMediaMetadataAPI | This project |
| Read-only queries | MiniMediaMetadataAPI | This project |
| Response formatting | MiniMediaMetadataAPI | This project |
### Implications
**Pros:**
- Clear separation of concerns
- API doesn't need provider API credentials
- Simpler deployment (no migration coordination)
- Avoids dual-write complexity
- Sync logic isolated from query logic
**Cons:**
- Schema changes require coordination
- No control over data freshness
- Dependency on external project
- Can't optimize schema for query patterns
- Breaking schema changes break API
### Coupling Points
1. **Table names** - Hardcoded in repository SQL
2. **Column names** - Hardcoded in Dapper mappings
3. **Data types** - Must match C# model properties
4. **Relationships** - Foreign keys assumed in joins
**No schema validation** - API assumes schema exists and matches expectations.
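One way to turn these silent coupling points into a startup failure would be to compare each model's properties with the columns PostgreSQL reports (for example from `information_schema.columns`). The sketch covers only the comparison step, using reflection and a case-insensitive match similar to Dapper's default name matching. `SpotifyArtistSketch` and the column list are hypothetical, and fetching the real columns is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical subset of a provider model; the real SpotifyArtist has more properties.
public class SpotifyArtistSketch
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
    public int Popularity { get; set; }
}

public static class SchemaCheck
{
    // Returns model properties that have no column with the same name (ignoring case).
    public static List<string> MissingColumns(Type model, IEnumerable<string> columns)
    {
        var available = new HashSet<string>(columns, StringComparer.OrdinalIgnoreCase);
        return model.GetProperties()
            .Select(p => p.Name)
            .Where(name => !available.Contains(name))
            .ToList();
    }
}
```

Run once at startup, a non-empty result could fail fast instead of surfacing later as empty or default-valued search results.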
## Provider Isolation Strategy
### Repository Per Provider
Each provider has a dedicated repository implementation:
```
ISpotifyRepository → SpotifyRepository
ITidalRepository → TidalRepository
IMusicBrainzRepository → MusicBrainzRepository
IDeezerRepository → DeezerRepository
IDiscogsRepository → DiscogsRepository
ISoundCloudRepository → SoundCloudRepository
```
**Benefits:**
- Provider-specific logic isolated
- Schema differences handled independently
- Easy to add/remove providers
- Clear testing boundaries
- No cross-provider contamination
**Common Interface Shape (conceptual):**
```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public interface IProviderRepository
{
    Task<List<ArtistModel>> SearchArtist(string name, int offset);
    Task<ArtistModel> GetArtistById(string id);
    Task<List<AlbumModel>> SearchAlbum(string name, string artistId, int offset);
    Task<AlbumModel> GetAlbumById(string id);
    Task<List<TrackModel>> SearchTrack(string name, string artistId, int offset);
    Task<TrackModel> GetTrackById(string id);
}
```
**Note:** This shape is illustrative. ID types vary by provider (string, int, Guid, long), so the actual interfaces (`ISpotifyRepository`, `ITidalRepository`, etc.) are separate and use provider-specific ID types.
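If a single shared abstraction were wanted despite the differing ID types, making the ID a type parameter is one option. This is a hypothetical design sketch, not code from the project: `IProviderRepository<TId>`, `ArtistSketch`, and the in-memory `FakeMusicBrainzArtists` exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical minimal entity for the sketch.
public record ArtistSketch(string Provider, string Id, string Name);

// The ID type becomes a type parameter instead of being fixed to string.
public interface IProviderRepository<TId>
{
    Task<List<ArtistSketch>> SearchArtist(string name, int offset);
    Task<ArtistSketch?> GetArtistById(TId id);
}

// In-memory stand-in for a provider keyed by Guid (MusicBrainz IDs are UUIDs).
public class FakeMusicBrainzArtists : IProviderRepository<Guid>
{
    private readonly Dictionary<Guid, string> _artists;

    public FakeMusicBrainzArtists(Dictionary<Guid, string> artists) => _artists = artists;

    public Task<List<ArtistSketch>> SearchArtist(string name, int offset) =>
        Task.FromResult(_artists
            .Where(a => a.Value.Contains(name, StringComparison.OrdinalIgnoreCase))
            .Skip(offset)
            .Select(a => new ArtistSketch("MusicBrainz", a.Key.ToString(), a.Value))
            .ToList());

    public Task<ArtistSketch?> GetArtistById(Guid id) =>
        Task.FromResult<ArtistSketch?>(_artists.TryGetValue(id, out var name)
            ? new ArtistSketch("MusicBrainz", id.ToString(), name)
            : null);
}
```

The service would still need one injected repository per ID type, so this mainly documents the common shape; it does not remove the per-provider dispatch.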
### Database Models Per Provider
**60+ database models** organized by provider (representative files shown):
```
Models/Database/
├── Spotify/
│ ├── SpotifyArtist.cs
│ ├── SpotifyArtistImage.cs
│ ├── SpotifyAlbum.cs
│ ├── SpotifyAlbumArtist.cs
│ ├── SpotifyAlbumImage.cs
│ ├── SpotifyAlbumExternalId.cs
│ ├── SpotifyTrack.cs
│ ├── SpotifyTrackArtist.cs
│ └── SpotifyTrackExternalId.cs
├── Tidal/
│ ├── TidalArtist.cs
│ ├── TidalArtistImageLink.cs
│ ├── TidalAlbum.cs
│ ├── TidalAlbumExternalLink.cs
│ ├── TidalAlbumImage.cs
│ ├── TidalTrack.cs
│ ├── TidalTrackArtist.cs
│ └── TidalTrackExternalLink.cs
├── MusicBrainz/
│ ├── MusicBrainzArtist.cs
│ ├── MusicBrainzRelease.cs
│ ├── MusicBrainzReleaseLabel.cs
│ ├── MusicBrainzLabel.cs
│ ├── MusicBrainzReleaseTrack.cs
│ └── MusicBrainzReleaseTrackArtist.cs
├── Deezer/
│ ├── DeezerArtist.cs
│ ├── DeezerArtistImageLink.cs
│ ├── DeezerAlbum.cs
│ ├── DeezerAlbumImageLink.cs
│ ├── DeezerAlbumArtist.cs
│ ├── DeezerTrack.cs
│ └── DeezerTrackArtist.cs
├── Discogs/
│ ├── DiscogsArtist.cs
│ ├── DiscogsArtistAlias.cs
│ ├── DiscogsArtistUrl.cs
│ ├── DiscogsRelease.cs
│ ├── DiscogsReleaseArtist.cs
│ ├── DiscogsReleaseIdentifier.cs
│ ├── DiscogsReleaseTrack.cs
│ ├── DiscogsLabel.cs
│ ├── DiscogsLabelSublabel.cs
│ └── DiscogsLabelUrl.cs
└── SoundCloud/
├── SoundCloudUser.cs
├── SoundCloudPlaylist.cs
├── SoundCloudTrack.cs
└── SoundCloudTrackArtist.cs
```
**Mapping Strategy:**
- Database models map 1:1 to database tables
- Dapper auto-maps columns to properties (case-insensitive)
- Complex types (arrays, nested objects) handled manually
- No navigation properties (manual joins)
### Unified Entity Models
**API response models** are provider-agnostic:
```
Models/Entities/
├── SearchArtistEntity.cs
├── SearchAlbumEntity.cs
├── SearchTrackEntity.cs
├── ArtistImageEntity.cs
├── AlbumImageEntity.cs
└── TrackImageEntity.cs
```
**Transformation happens in repositories:**
```csharp
// SpotifyRepository
private SearchArtistEntity MapToEntity(SpotifyArtist dbModel)
{
    return new SearchArtistEntity
    {
        ProviderType = ProviderType.Spotify,
        Id = dbModel.Id,
        Name = dbModel.Name,
        Popularity = dbModel.Popularity,
        Url = dbModel.ExternalUrl,
        TotalFollowers = dbModel.Followers,
        Genres = dbModel.Genres,
        Images = MapImages(dbModel.Images),
        LastSyncTime = dbModel.LastSyncTime
    };
}
```
## Service Layer Orchestration
### Cross-Provider Search
Services aggregate results from multiple repositories:
```csharp
using System.Linq;
using System.Threading.Tasks;

public class SearchArtistService : ISearchArtistService
{
    private readonly ISpotifyRepository _spotify;
    private readonly ITidalRepository _tidal;
    private readonly IMusicBrainzRepository _musicBrainz;
    private readonly IDeezerRepository _deezer;
    private readonly IDiscogsRepository _discogs;
    private readonly ISoundCloudRepository _soundCloud;

    // Repositories are injected by the DI container (all registered as Scoped)
    public SearchArtistService(
        ISpotifyRepository spotify,
        ITidalRepository tidal,
        IMusicBrainzRepository musicBrainz,
        IDeezerRepository deezer,
        IDiscogsRepository discogs,
        ISoundCloudRepository soundCloud)
    {
        _spotify = spotify;
        _tidal = tidal;
        _musicBrainz = musicBrainz;
        _deezer = deezer;
        _discogs = discogs;
        _soundCloud = soundCloud;
    }

    public async Task<SearchArtistResponse> SearchArtist(
        string name,
        ProviderType provider,
        int offset)
    {
        if (provider == ProviderType.Any)
        {
            // Query all providers in parallel; each repository opens its own pooled connection
            var tasks = new[]
            {
                _spotify.SearchArtist(name, offset),
                _tidal.SearchArtist(name, offset),
                _musicBrainz.SearchArtist(name, offset),
                _deezer.SearchArtist(name, offset),
                _discogs.SearchArtist(name, offset),
                _soundCloud.SearchArtist(name, offset)
            };
            var results = await Task.WhenAll(tasks);
            var combined = results.SelectMany(r => r).ToList();
            return new SearchArtistResponse
            {
                SearchResultType = combined.Any()
                    ? SearchResultType.Ok
                    : SearchResultType.NotFound,
                Artists = combined
            };
        }
        else
        {
            // Query a single provider; GetRepository (not shown) maps ProviderType to a repository
            var repository = GetRepository(provider);
            var results = await repository.SearchArtist(name, offset);
            return new SearchArtistResponse
            {
                SearchResultType = results.Any()
                    ? SearchResultType.Ok
                    : SearchResultType.NotFound,
                Artists = results
            };
        }
    }
}
```
**Parallel Execution:** When `Provider=Any`, all 6 repositories are queried simultaneously via `Task.WhenAll()`.
**No Result Deduplication:** If the same artist exists in multiple providers, it appears multiple times in the response, each entry with a different `ProviderType` value.
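To make the duplication concrete: a combined `Provider=Any` result is a flat list in which the same artist name can appear once per provider. A client, or a future service-layer step, could group entries by a normalized name as sketched below. Grouping on the trimmed, lowercased name is a naive, hypothetical heuristic (distinct artists can share a name), and `ArtistHit` is a stand-in for `SearchArtistEntity`.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SearchArtistEntity with only the fields used here.
public record ArtistHit(string ProviderType, string Id, string Name);

public static class ArtistGrouping
{
    // Groups hits whose names match after trimming and lowercasing; every
    // provider's entry is kept, so no provider ID is lost.
    public static Dictionary<string, List<ArtistHit>> GroupByName(IEnumerable<ArtistHit> hits) =>
        hits
            .GroupBy(h => h.Name.Trim().ToLowerInvariant())
            .ToDictionary(g => g.Key, g => g.ToList());
}
```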
## Middleware Pipeline
**Single middleware:** RequestMiddleware
**Purpose:** Prometheus metrics collection
**Implementation:**
```csharp
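using Microsoft.AspNetCore.Http;
using Prometheus;

public class RequestMiddleware
{
    private static readonly Counter RequestCounter = Metrics
        .CreateCounter(
            "minimediametadataapi_request_total",
            "Total HTTP requests",
            new CounterConfiguration
            {
                LabelNames = new[] { "path", "method", "status" }
            });

    private readonly RequestDelegate _next;

    // Convention-based middleware: UseMiddleware<T> passes the next delegate to the constructor
    public RequestMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        await _next(context);

        // Counted after the rest of the pipeline has run, so the status code is final
        RequestCounter
            .WithLabels(
                context.Request.Path,
                context.Request.Method,
                context.Response.StatusCode.ToString())
            .Inc();
    }
}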
```
**Registered in Program.cs:**
```csharp
app.UseMiddleware<RequestMiddleware>();
```
**No other middleware:**
- No authentication middleware
- No rate limiting middleware
- No CORS middleware
- No exception handling middleware (uses ASP.NET Core default)
## Dependency Injection Setup
**Program.cs registration:**
```csharp
// Database configuration
builder.Services.Configure<DatabaseConfiguration>(
builder.Configuration.GetSection("DatabaseConfiguration"));
// Repositories
builder.Services.AddScoped<ISpotifyRepository, SpotifyRepository>();
builder.Services.AddScoped<ITidalRepository, TidalRepository>();
builder.Services.AddScoped<IMusicBrainzRepository, MusicBrainzRepository>();
builder.Services.AddScoped<IDeezerRepository, DeezerRepository>();
builder.Services.AddScoped<IDiscogsRepository, DiscogsRepository>();
builder.Services.AddScoped<ISoundCloudRepository, SoundCloudRepository>();
builder.Services.AddScoped<IJobRepository, JobRepository>();
// Services
builder.Services.AddScoped<ISearchArtistService, SearchArtistService>();
builder.Services.AddScoped<ISearchAlbumService, SearchAlbumService>();
builder.Services.AddScoped<ISearchTrackService, SearchTrackService>();
// Swagger
builder.Services.AddSwaggerGen();
// Controllers
builder.Services.AddControllers();
```
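**Lifetime:** All application repositories and services use `Scoped` lifetime (one instance per request).
**No Singleton application services** - each request gets fresh repository and service instances (framework registrations such as `Configure<T>()` and `AddControllers()` keep their own lifetimes).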
## Error Handling Strategy
**Repository Level:**
```csharp
public async Task<List<SearchArtistEntity>> SearchArtist(string name, int offset)
{
    try
    {
        // sql and parameters hold the fuzzy-search query and its @searchTerm/@offset values (definitions omitted)
        using var connection = new NpgsqlConnection(_connectionString);
        var results = await connection.QueryAsync<SpotifyArtist>(sql, parameters);
        return results.Select(MapToEntity).ToList();
    }
    catch (Exception ex)
    {
        // Every failure (connection, SQL, mapping) is logged and turned into "no results"
        _logger.LogError(ex, "Error searching Spotify artists");
        return new List<SearchArtistEntity>();
    }
}
```
**Strategy:** Catch all exceptions, log, return empty list.
**No custom exceptions** - generic Exception catch.
**No error propagation** - failures silently return empty results.
**Implications:**
- Partial failures in multi-provider search go unnoticed
- Client can't distinguish between "no results" and "provider error"
- No retry logic (despite Polly dependency)
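A minimal sketch of one alternative, assuming repositories surfaced failures instead of swallowing them: each provider call yields a small outcome record, so the service can tell "no results" apart from "provider error" and report partial failures. `ProviderOutcome`, `SearchOutcome`, and `SearchAggregator` are hypothetical; the project's `SearchResultType` enum is not extended here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical per-provider result: either items or the error that prevented them.
public record ProviderOutcome(string Provider, List<string> Items, Exception? Error)
{
    public bool Failed => Error is not null;
}

public record SearchOutcome(List<string> Items, List<string> FailedProviders)
{
    // "Not found" only when every provider answered and none had results.
    public bool IsNotFound => Items.Count == 0 && FailedProviders.Count == 0;
}

public static class SearchAggregator
{
    // Runs one provider call and records an exception as a failure instead of hiding it.
    public static async Task<ProviderOutcome> Run(string provider, Func<Task<List<string>>> call)
    {
        try
        {
            return new ProviderOutcome(provider, await call(), null);
        }
        catch (Exception ex)
        {
            return new ProviderOutcome(provider, new List<string>(), ex);
        }
    }

    // Queries providers in parallel and keeps both the results and the failures.
    public static async Task<SearchOutcome> SearchAll(
        IEnumerable<(string Provider, Func<Task<List<string>>> Call)> providers)
    {
        var outcomes = await Task.WhenAll(providers.Select(p => Run(p.Provider, p.Call)));
        return new SearchOutcome(
            outcomes.SelectMany(o => o.Items).ToList(),
            outcomes.Where(o => o.Failed).Select(o => o.Provider).ToList());
    }
}
```

The logging the current repositories do would still belong inside `Run`; the difference is that the failure also reaches the response.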
## Configuration Management
**appsettings.json structure:**
```json
{
"DatabaseConfiguration": {
"ConnectionString": "Host=localhost;Database=minimediametadata;Username=user;Password=pass;MinPoolSize=5;MaxPoolSize=100"
},
"Prometheus": {
"MetricsUrl": "/metrics"
},
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft.AspNetCore": "Warning"
}
}
}
```
**Environment-specific overrides:**
- `appsettings.Development.json` - logging only
- No production-specific config file
- Environment variables supported (ASP.NET Core default)
**No secrets management:**
- Database password in plain text
- No Azure Key Vault integration
- No requirement to supply secrets through environment variables
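ASP.NET Core already maps environment variables onto configuration keys (a double underscore stands for the `:` separator, so `DatabaseConfiguration__ConnectionString` overrides the value above). A narrower option is to keep only the password out of the JSON and merge it in at startup. The sketch does that with the BCL's `DbConnectionStringBuilder`; the `DB_PASSWORD` variable name and the `ConnectionStringSecrets` helper are hypothetical.

```csharp
using System;
using System.Data.Common;

public static class ConnectionStringSecrets
{
    // Returns the connection string with its Password replaced by the supplied secret.
    public static string WithPassword(string connectionString, string password)
    {
        var builder = new DbConnectionStringBuilder { ConnectionString = connectionString };
        builder["Password"] = password;
        return builder.ConnectionString;
    }

    // Reads the secret from an environment variable and fails fast when it is missing.
    public static string FromEnvironment(string connectionString, string variable = "DB_PASSWORD")
    {
        string? password = Environment.GetEnvironmentVariable(variable);
        if (string.IsNullOrEmpty(password))
        {
            throw new InvalidOperationException($"Environment variable {variable} is not set.");
        }
        return WithPassword(connectionString, password);
    }
}
```

The generic builder may normalize key casing in its output, which Npgsql's case-insensitive keyword parsing tolerates.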
## Unused Dependencies
**Quartz (3.17.0):** Job scheduling framework registered but no jobs defined.
**SpotifyAPI.Web.Auth (7.4.2):** Spotify authentication library present but unused (MiniMediaScanner handles auth).
**Polly (8.6.6):** Resilience library registered but no retry policies applied.
**Implications:** Dependency bloat, potential security vulnerabilities in unused packages.
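Since no Polly policy is configured, the sketch below shows the retry behaviour the dependency was presumably meant for, written in plain C# rather than guessing how the project would wire Polly: retry a transient failure a few times with exponentially growing delays, then let the last exception surface. `RetrySketch.ExecuteAsync` and its parameters are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RetrySketch
{
    // Runs the operation, retrying up to maxRetries times after a transient failure.
    // Delays double each time: baseDelay, 2 × baseDelay, 4 × baseDelay, ...
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxRetries,
        TimeSpan baseDelay,
        Func<Exception, bool> isTransient,
        CancellationToken cancellationToken = default)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken);
            }
            catch (Exception ex) when (attempt < maxRetries && isTransient(ex))
            {
                // Non-transient errors and the final failure skip this filter and propagate
                var delay = TimeSpan.FromTicks(baseDelay.Ticks * (1L << attempt));
                await Task.Delay(delay, cancellationToken);
            }
        }
    }
}
```

With Polly 8 the equivalent would be a `ResiliencePipeline` built with `AddRetry`; either way, retries only help if repositories stop converting every exception into an empty list first.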
## Scalability Considerations
**Horizontal Scaling:**
- Stateless design (no in-memory state)
- Each instance keeps its own connection pool, so total PostgreSQL connections grow with instance count (up to 100 per instance)
- No distributed locking needed
- No session affinity required
**Bottlenecks:**
- Database connection pool (max 100 per instance)
- PostgreSQL query performance
- No caching layer (every request hits database)
**Missing Optimizations:**
- No Redis/Memcached for result caching
- No CDN for static responses
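- No upper bound on `offset`: the fuzzy-search SQL caps each provider at `LIMIT 20`, but deep `OFFSET` pages still make PostgreSQL scan and discard all preceding rows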
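Before reaching for Redis, a per-instance cache in front of the service would already absorb repeated identical searches. A minimal sketch, assuming results may be a few minutes stale (data freshness is controlled by MiniMediaScanner anyway): entries are keyed by provider, normalized name, and offset, and expire after a fixed TTL. `SearchCache` is hypothetical, and the injectable clock exists only so expiry can be tested.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class SearchCache<T>
{
    private readonly ConcurrentDictionary<string, (DateTimeOffset Expires, T Value)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;

    public SearchCache(TimeSpan ttl, Func<DateTimeOffset>? now = null)
    {
        _ttl = ttl;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    // Same provider, normalized name, and offset share one entry.
    public static string Key(string provider, string name, int offset) =>
        $"{provider}|{name.Trim().ToLowerInvariant()}|{offset}";

    // Returns an unexpired cached value, or runs the query and stores its result.
    // Concurrent misses for the same key may each run the query; the last write wins.
    public async Task<T> GetOrAddAsync(string key, Func<Task<T>> query)
    {
        if (_entries.TryGetValue(key, out var entry) && entry.Expires > _now())
        {
            return entry.Value;
        }

        T value = await query();
        _entries[key] = (_now() + _ttl, value);
        return value;
    }
}
```

Inside ASP.NET Core, `IMemoryCache` from `Microsoft.Extensions.Caching.Memory` covers the same role with size limits and eviction built in.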
## Testing Architecture
**Current State:** Non-existent
**Configured Framework:** xUnit
**Missing Test Types:**
- Unit tests (repository logic, service orchestration)
- Integration tests (database queries)
- API tests (controller endpoints)
- Performance tests (load testing)
**Testability Issues:**
- Repositories tightly coupled to Npgsql (hard to mock)
- No repository interfaces in some cases
- No test database setup scripts
- No test data fixtures
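Even without a test database, the service layer is unit-testable because it depends on repository interfaces. The sketch shows the shape of such a test with a hand-written fake; `IArtistSource`, `FakeArtistSource`, and `ArtistSearch` are simplified, hypothetical stand-ins for a repository interface and `SearchArtistService`, and plain exceptions replace xUnit assertions so the example runs on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical, simplified repository contract.
public interface IArtistSource
{
    Task<List<string>> SearchArtist(string name, int offset);
}

// Test double: returns canned names and records the calls it receives.
public class FakeArtistSource : IArtistSource
{
    private readonly List<string> _names;
    public List<(string Name, int Offset)> Calls { get; } = new();

    public FakeArtistSource(params string[] names) => _names = names.ToList();

    public Task<List<string>> SearchArtist(string name, int offset)
    {
        Calls.Add((name, offset));
        return Task.FromResult(_names.ToList());
    }
}

// Simplified service: queries every source in parallel and flattens the results.
public class ArtistSearch
{
    private readonly IReadOnlyList<IArtistSource> _sources;

    public ArtistSearch(params IArtistSource[] sources) => _sources = sources;

    public async Task<(bool Found, List<string> Names)> SearchAll(string name, int offset)
    {
        var results = await Task.WhenAll(_sources.Select(s => s.SearchArtist(name, offset)));
        var names = results.SelectMany(r => r).ToList();
        return (names.Count > 0, names);
    }
}
```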
## File Organization
**99 C# files** organized as:
```
Controllers/ 4 files
Middlewares/ 1 file
Options/ 1 file
Configurations/ 1 file
Enums/ 2 files
Helpers/ 2 files
Models/Database/ 60+ files (10 per provider average)
Models/Entities/ 6 files
Repositories/ 7 files
Services/ 3 files
Tests/ 1 file (stub)
```
**Naming Conventions:**
- PascalCase for all files
- Suffix pattern: `*Repository.cs`, `*Service.cs`, `*Controller.cs`, `*Entity.cs`
- Provider prefix for database models: `Spotify*.cs`, `Tidal*.cs`, etc.
## Architecture Evaluation
**Strengths:**
- Clear layer separation
- Provider isolation via repositories
- Parallel query execution for multi-provider search
- Lightweight (Dapper over EF)
- Simple dependency graph
**Weaknesses:**
- No caching layer
- Error handling swallows failures
- Unused dependencies
- No testing
- Tight coupling to external schema
- No API versioning strategy
- No health checks
**Suitability for Reference:**
- Repository pattern implementation: **Excellent**
- Multi-provider aggregation: **Good**
- Service orchestration: **Good**
- Error handling: **Poor**
- Testing approach: **Non-existent**
- Production readiness: **Needs work**