Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00

MiniMediaMetadataAPI - Architecture Analysis

Architectural Pattern

Primary Pattern: Repository Pattern with Service Layer
NOT Clean Architecture - simpler layered approach without strict dependency inversion

Project Structure

MiniMediaMetadataAPI.sln
├── MiniMediaMetadataAPI/              (Web API Layer)
│   ├── Controllers/                   (HTTP endpoints)
│   ├── Middlewares/                   (Request pipeline)
│   ├── Options/                       (Configuration models)
│   └── Program.cs                     (Entry point, DI setup)
├── MiniMediaMetadataAPI.Application/  (Business Logic Layer)
│   ├── Configurations/                (Database config models)
│   ├── Enums/                         (Provider types, result types)
│   ├── Helpers/                       (Utility functions)
│   ├── Models/
│   │   ├── Database/                  (Provider-specific DB models)
│   │   │   ├── Deezer/
│   │   │   ├── Discogs/
│   │   │   ├── MusicBrainz/
│   │   │   ├── SoundCloud/
│   │   │   ├── Spotify/
│   │   │   └── Tidal/
│   │   └── Entities/                  (API response models)
│   ├── Repositories/                  (Data access layer)
│   └── Services/                      (Business logic)
└── MiniMediaMetadataAPI.Tests/        (Test project - empty)

Layer Responsibilities

Web API Layer (MiniMediaMetadataAPI)

Purpose: HTTP interface and request handling

Components:

Controllers (4): SearchArtist, SearchAlbum, SearchTrack, Search
Middleware (1): RequestMiddleware (Prometheus metrics)
Program.cs: DI container configuration, middleware pipeline setup

Dependencies:

ASP.NET Core framework
Swashbuckle (Swagger/OpenAPI)
prometheus-net
References Application layer

Responsibilities:

HTTP request/response handling
Input validation and sanitization
Swagger documentation generation
Metrics collection
Dependency injection configuration

Application Layer (MiniMediaMetadataAPI.Application)

Purpose: Business logic and data access

Components:

Repositories (7 implementations)

SpotifyRepository - Spotify data access
TidalRepository - Tidal data access
MusicBrainzRepository - MusicBrainz data access
DeezerRepository - Deezer data access
DiscogsRepository - Discogs data access
SoundCloudRepository - SoundCloud data access
JobRepository - Job tracking (unused)

Each provider repository (all except JobRepository) implements:

SearchArtist(string name, int offset)
GetArtistById(string/int/Guid id)
SearchAlbum(string name, string artistId, int offset)
GetAlbumById(string/int/Guid id)
SearchTrack(string name, string artistId, int offset)
GetTrackById(string/int/Guid id)

Services (3 implementations)

SearchArtistService - Orchestrates artist search across providers
SearchAlbumService - Orchestrates album search across providers
SearchTrackService - Orchestrates track search across providers

Dependencies:

Dapper (SQL mapping)
Npgsql (PostgreSQL driver)
FuzzySharp (string similarity)
Polly (resilience)

Responsibilities:

SQL query execution via Dapper
Provider-specific data mapping
Fuzzy search logic
Error handling and logging
Cross-provider aggregation (in services)

Test Layer (MiniMediaMetadataAPI.Tests)

Purpose: Automated testing (currently unused)

Current State:

xUnit framework configured
Single empty test stub: Test1()
0% code coverage
Not executed in CI/CD pipeline

Data Flow

Request Flow (Artist Search Example)

HTTP GET /api/SearchArtist?Name=Beatles&Provider=Any
    ↓
SearchArtistController.Get()
    ↓
Input sanitization (StringHelper.RemoveControlChars)
    ↓
ISearchArtistService.SearchArtist()
    ↓
[Provider=Any] → Query all 6 repositories in parallel
[Provider=Spotify] → Query SpotifyRepository only
    ↓
Repository.SearchArtist()
    ↓
Dapper SQL execution with pg_trgm fuzzy match
    ↓
Map database models → SearchArtistEntity
    ↓
Return SearchArtistResponse (SearchResultType + entities)
    ↓
JSON serialization → HTTP 200 OK
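The sanitization step in the flow above can be pictured with a minimal sketch. The analysis names StringHelper.RemoveControlChars but does not show its body, so the class name StringHelperSketch and the exact behavior (dropping control characters, then trimming) are assumptions made for illustration only.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for StringHelper; the real helper's body is not shown in this analysis.
public static class StringHelperSketch
{
    // Drops Unicode control characters (tab, CR, LF, NUL, ...) and trims surrounding whitespace.
    public static string RemoveControlChars(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        var cleaned = input.Where(c => !char.IsControl(c)).ToArray();
        return new string(cleaned).Trim();
    }
}
```

A controller would call something like this on the Name query parameter before handing it to the service, so that stray control characters never reach the SQL parameter or the logs.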

Database Query Flow

Service Layer
    ↓
Repository Interface (ISpotifyRepository)
    ↓
Repository Implementation (SpotifyRepository)
    ↓
Dapper QueryAsync<T>()
    ↓
Npgsql Connection (from pool)
    ↓
PostgreSQL Database
    ↓
pg_trgm similarity search
    ↓
Result set → Dapper mapping → Database models
    ↓
Transform to Entity models
    ↓
Return to Service

Database Access Strategy

ORM Choice: Dapper (NOT Entity Framework)

Rationale:

Lightweight, minimal overhead
Direct SQL control for complex queries
No change tracking (read-only workload)
Better performance for high-throughput reads
Simpler for multi-provider schema

Trade-offs:

No automatic migrations (schema owned externally anyway)
Manual SQL writing (more verbose)
No LINQ query translation
Compile-time type safety covers only the C# models; SQL text and column mappings are checked at runtime

Connection Management

Pooling Configuration:

MinPoolSize=5
MaxPoolSize=100

Connection Lifecycle:

Connections opened per repository call (a Provider=Any request uses up to 6 at once)
Returned to pool after query
No long-lived connections
No connection state management

No DbContext: Each repository method opens/closes connections independently.

Query Patterns

Fuzzy Search (pg_trgm):

SET LOCAL pg_trgm.similarity_threshold = 0.5;
SELECT * FROM spotify_artist
WHERE lower(name) % lower(@searchTerm)
ORDER BY similarity(lower(name), lower(@searchTerm)) DESC
LIMIT 20 OFFSET @offset;
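To make the 0.5 threshold more concrete, here is a rough C# approximation of how pg_trgm scores two strings: lowercase, split into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide the number of shared trigrams by the number of distinct trigrams in either string. The class TrigramSketch and its method names are hypothetical, and locale handling and the exact boundary behavior at the threshold are simplified assumptions; PostgreSQL remains the source of truth.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that approximates pg_trgm scoring; not part of the analyzed project.
public static class TrigramSketch
{
    // Collects the distinct 3-character windows of every padded, lowercased word.
    public static HashSet<string> Trigrams(string text)
    {
        var result = new HashSet<string>();
        var normalized = new string(text.ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray());

        foreach (var word in normalized.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            var padded = "  " + word + " ";
            for (var i = 0; i + 3 <= padded.Length; i++)
                result.Add(padded.Substring(i, 3));
        }
        return result;
    }

    // Shared trigrams divided by all distinct trigrams across both strings.
    public static double Similarity(string a, string b)
    {
        var ta = Trigrams(a);
        var tb = Trigrams(b);
        if (ta.Count == 0 || tb.Count == 0)
            return 0.0;

        var shared = ta.Count(t => tb.Contains(t));
        return (double)shared / (ta.Count + tb.Count - shared);
    }

    // Assumption: a score equal to the threshold counts as a match.
    public static bool Matches(string a, string b, double threshold = 0.5)
        => Similarity(a, b) >= threshold;
}
```

Under this approximation, "Beatles" against "The Beatles" shares 8 of 12 distinct trigrams (about 0.67) and passes the threshold, while "Beetles" against "Beatles" shares 5 of 11 (about 0.45) and does not.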

Exact ID Lookup:

SELECT * FROM spotify_artist WHERE id = @id;

Join Queries (Album with Artists):

SELECT a.*, ar.* 
FROM spotify_album a
LEFT JOIN spotify_album_artist aa ON a.id = aa.album_id
LEFT JOIN spotify_artist ar ON aa.artist_id = ar.id
WHERE a.id = @albumId;

Schema Ownership Model

Critical Design Decision: This API does NOT own the database schema.

Responsibilities Split

Concern                 Owner                   Location
Schema definition       MiniMediaScanner        External project
Migrations              MiniMediaScanner        External project
Data ingestion          MiniMediaScanner        External project
Provider API calls      MiniMediaScanner        External project
Data sync scheduling    MiniMediaScanner        External project
Query optimization      MiniMediaMetadataAPI    This project
Read-only queries       MiniMediaMetadataAPI    This project
Response formatting     MiniMediaMetadataAPI    This project

Implications

Pros:

Clear separation of concerns
API doesn't need provider API credentials
Simpler deployment (no migration coordination)
Avoids dual-write complexity
Sync logic isolated from query logic

Cons:

Schema changes require coordination
No control over data freshness
Dependency on external project
Can't optimize schema for query patterns
Breaking schema changes break API

Coupling Points

Table names - Hardcoded in repository SQL
Column names - Hardcoded in Dapper mappings
Data types - Must match C# model properties
Relationships - Foreign keys assumed in joins

No schema validation - API assumes schema exists and matches expectations.

Provider Isolation Strategy

Repository Per Provider

Each provider has dedicated repository implementation:

ISpotifyRepository → SpotifyRepository
ITidalRepository → TidalRepository
IMusicBrainzRepository → MusicBrainzRepository
IDeezerRepository → DeezerRepository
IDiscogsRepository → DiscogsRepository
ISoundCloudRepository → SoundCloudRepository

Benefits:

Provider-specific logic isolated
Schema differences handled independently
Easy to add/remove providers
Clear testing boundaries
No cross-provider contamination

Shared Interface:

public interface IProviderRepository
{
    Task<List<ArtistModel>> SearchArtist(string name, int offset);
    Task<ArtistModel> GetArtistById(string id);
    Task<List<AlbumModel>> SearchAlbum(string name, string artistId, int offset);
    Task<AlbumModel> GetAlbumById(string id);
    Task<List<TrackModel>> SearchTrack(string name, string artistId, int offset);
    Task<TrackModel> GetTrackById(string id);
}

Note: ID types vary by provider (string, int, Guid, long), so actual interfaces use provider-specific types.
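One way to picture the note above is a contract that is generic over the ID type, so each provider can plug in string, int, long, or Guid. This is only a sketch under stated assumptions: IProviderRepository<TId>, the trimmed-down ArtistModel<TId> record, and InMemoryRepository<TId> are hypothetical names that do not appear in the analyzed project, which may declare each provider interface separately instead.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical, trimmed-down model; real provider models carry many more fields.
public record ArtistModel<TId>(TId Id, string Name);

// Hypothetical contract: TId is string, int, long, or Guid depending on the provider.
public interface IProviderRepository<TId>
{
    Task<List<ArtistModel<TId>>> SearchArtist(string name, int offset);
    Task<ArtistModel<TId>?> GetArtistById(TId id);
}

// In-memory stand-in for a database-backed repository, for illustration only.
public sealed class InMemoryRepository<TId> : IProviderRepository<TId>
{
    private readonly List<ArtistModel<TId>> _artists;

    public InMemoryRepository(IEnumerable<ArtistModel<TId>> artists)
        => _artists = artists.ToList();

    // Case-insensitive substring match with a page size of 20, mirroring LIMIT 20 OFFSET @offset.
    public Task<List<ArtistModel<TId>>> SearchArtist(string name, int offset)
        => Task.FromResult(_artists
            .Where(a => a.Name.Contains(name, StringComparison.OrdinalIgnoreCase))
            .Skip(offset)
            .Take(20)
            .ToList());

    // Returns null when no artist has the requested ID.
    public Task<ArtistModel<TId>?> GetArtistById(TId id)
        => Task.FromResult<ArtistModel<TId>?>(
            _artists.FirstOrDefault(a => EqualityComparer<TId>.Default.Equals(a.Id, id)));
}
```

With this shape, a Guid-keyed provider and an int-keyed provider would be IProviderRepository<Guid> and IProviderRepository<int> respectively, at the cost of the service layer no longer being able to hold them in one non-generic collection.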

Database Models Per Provider

60+ database models organized by provider (44 are listed here):

Models/Database/
├── Spotify/
│   ├── SpotifyArtist.cs
│   ├── SpotifyArtistImage.cs
│   ├── SpotifyAlbum.cs
│   ├── SpotifyAlbumArtist.cs
│   ├── SpotifyAlbumImage.cs
│   ├── SpotifyAlbumExternalId.cs
│   ├── SpotifyTrack.cs
│   ├── SpotifyTrackArtist.cs
│   └── SpotifyTrackExternalId.cs
├── Tidal/
│   ├── TidalArtist.cs
│   ├── TidalArtistImageLink.cs
│   ├── TidalAlbum.cs
│   ├── TidalAlbumExternalLink.cs
│   ├── TidalAlbumImage.cs
│   ├── TidalTrack.cs
│   ├── TidalTrackArtist.cs
│   └── TidalTrackExternalLink.cs
├── MusicBrainz/
│   ├── MusicBrainzArtist.cs
│   ├── MusicBrainzRelease.cs
│   ├── MusicBrainzReleaseLabel.cs
│   ├── MusicBrainzLabel.cs
│   ├── MusicBrainzReleaseTrack.cs
│   └── MusicBrainzReleaseTrackArtist.cs
├── Deezer/
│   ├── DeezerArtist.cs
│   ├── DeezerArtistImageLink.cs
│   ├── DeezerAlbum.cs
│   ├── DeezerAlbumImageLink.cs
│   ├── DeezerAlbumArtist.cs
│   ├── DeezerTrack.cs
│   └── DeezerTrackArtist.cs
├── Discogs/
│   ├── DiscogsArtist.cs
│   ├── DiscogsArtistAlias.cs
│   ├── DiscogsArtistUrl.cs
│   ├── DiscogsRelease.cs
│   ├── DiscogsReleaseArtist.cs
│   ├── DiscogsReleaseIdentifier.cs
│   ├── DiscogsReleaseTrack.cs
│   ├── DiscogsLabel.cs
│   ├── DiscogsLabelSublabel.cs
│   └── DiscogsLabelUrl.cs
└── SoundCloud/
    ├── SoundCloudUser.cs
    ├── SoundCloudPlaylist.cs
    ├── SoundCloudTrack.cs
    └── SoundCloudTrackArtist.cs

Mapping Strategy:

Database models map 1:1 to database tables
Dapper auto-maps columns to properties (case-insensitive)
Complex types (arrays, nested objects) handled manually
No navigation properties (manual joins)

Unified Entity Models

API response models are provider-agnostic:

Models/Entities/
├── SearchArtistEntity.cs
├── SearchAlbumEntity.cs
├── SearchTrackEntity.cs
├── ArtistImageEntity.cs
├── AlbumImageEntity.cs
└── TrackImageEntity.cs

Transformation happens in repositories:

// SpotifyRepository
private SearchArtistEntity MapToEntity(SpotifyArtist dbModel)
{
    return new SearchArtistEntity
    {
        ProviderType = ProviderType.Spotify,
        Id = dbModel.Id,
        Name = dbModel.Name,
        Popularity = dbModel.Popularity,
        Url = dbModel.ExternalUrl,
        TotalFollowers = dbModel.Followers,
        Genres = dbModel.Genres,
        Images = MapImages(dbModel.Images),
        LastSyncTime = dbModel.LastSyncTime
    };
}

Service Layer Orchestration

Cross-Provider Search

Services aggregate results from multiple repositories:

public class SearchArtistService : ISearchArtistService
{
    private readonly ISpotifyRepository _spotify;
    private readonly ITidalRepository _tidal;
    private readonly IMusicBrainzRepository _musicBrainz;
    private readonly IDeezerRepository _deezer;
    private readonly IDiscogsRepository _discogs;
    private readonly ISoundCloudRepository _soundCloud;

    public async Task<SearchArtistResponse> SearchArtist(
        string name, 
        ProviderType provider, 
        int offset)
    {
        if (provider == ProviderType.Any)
        {
            // Query all providers in parallel
            var tasks = new[]
            {
                _spotify.SearchArtist(name, offset),
                _tidal.SearchArtist(name, offset),
                _musicBrainz.SearchArtist(name, offset),
                _deezer.SearchArtist(name, offset),
                _discogs.SearchArtist(name, offset),
                _soundCloud.SearchArtist(name, offset)
            };
            
            var results = await Task.WhenAll(tasks);
            var combined = results.SelectMany(r => r).ToList();
            
            return new SearchArtistResponse
            {
                SearchResultType = combined.Any() 
                    ? SearchResultType.Ok 
                    : SearchResultType.NotFound,
                Artists = combined
            };
        }
        else
        {
            // Query single provider
            var repository = GetRepository(provider);
            var results = await repository.SearchArtist(name, offset);
            
            return new SearchArtistResponse
            {
                SearchResultType = results.Any() 
                    ? SearchResultType.Ok 
                    : SearchResultType.NotFound,
                Artists = results
            };
        }
    }
}

Parallel Execution: When Provider=Any, all 6 repositories are queried concurrently via Task.WhenAll().

No Result Deduplication: If the same artist exists in multiple providers, it is returned once per provider, each time with a different ProviderType value.

Middleware Pipeline

Single middleware: RequestMiddleware

Purpose: Prometheus metrics collection

Implementation:

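using Microsoft.AspNetCore.Http;
using Prometheus;

public class RequestMiddleware
{
    private static readonly Counter RequestCounter = Metrics
        .CreateCounter(
            "minimediametadataapi_request_total",
            "Total HTTP requests",
            new CounterConfiguration
            {
                LabelNames = new[] { "path", "method", "status" }
            });

    private readonly RequestDelegate _next;

    // Convention-based middleware receives the next delegate through its constructor.
    public RequestMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        await _next(context);

        // Count the request once the response status code is known.
        RequestCounter
            .WithLabels(
                context.Request.Path,
                context.Request.Method,
                context.Response.StatusCode.ToString())
            .Inc();
    }
}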

Registered in Program.cs:

app.UseMiddleware<RequestMiddleware>();

No other middleware:

No authentication middleware
No rate limiting middleware
No CORS middleware
No exception handling middleware (uses ASP.NET Core default)

Dependency Injection Setup

Program.cs registration:

// Database configuration
builder.Services.Configure<DatabaseConfiguration>(
    builder.Configuration.GetSection("DatabaseConfiguration"));

// Repositories
builder.Services.AddScoped<ISpotifyRepository, SpotifyRepository>();
builder.Services.AddScoped<ITidalRepository, TidalRepository>();
builder.Services.AddScoped<IMusicBrainzRepository, MusicBrainzRepository>();
builder.Services.AddScoped<IDeezerRepository, DeezerRepository>();
builder.Services.AddScoped<IDiscogsRepository, DiscogsRepository>();
builder.Services.AddScoped<ISoundCloudRepository, SoundCloudRepository>();
builder.Services.AddScoped<IJobRepository, JobRepository>();

// Services
builder.Services.AddScoped<ISearchArtistService, SearchArtistService>();
builder.Services.AddScoped<ISearchAlbumService, SearchAlbumService>();
builder.Services.AddScoped<ISearchTrackService, SearchTrackService>();

// Swagger
builder.Services.AddSwaggerGen();

// Controllers
builder.Services.AddControllers();

Lifetime: All repositories and services are registered as Scoped (one instance per request).

No application-level Singleton services: each request gets fresh repository and service instances.

Error Handling Strategy

Repository Level:

public async Task<List<SearchArtistEntity>> SearchArtist(string name, int offset)
{
    try
    {
        using var connection = new NpgsqlConnection(_connectionString);
        var results = await connection.QueryAsync<SpotifyArtist>(sql, parameters);
        return results.Select(MapToEntity).ToList();
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error searching Spotify artists");
        return new List<SearchArtistEntity>();
    }
}

Strategy: Catch all exceptions, log, return empty list.

No custom exceptions - generic Exception catch.

No error propagation - failures silently return empty results.

Implications:

Partial failures in multi-provider search go unnoticed
Client can't distinguish between "no results" and "provider error"
No retry logic (despite Polly dependency)
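The second implication can be illustrated with a small sketch in which each provider query returns an outcome that keeps the exception instead of collapsing it into an empty list. ProviderOutcome<T>, ProviderFanOut, CaptureAsync, and QueryAllAsync are hypothetical names introduced here; nothing like them exists in the analyzed code, and this is one possible shape rather than a recommended fix.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical envelope: the items a provider returned, or the error that prevented the query.
public sealed record ProviderOutcome<T>(string Provider, IReadOnlyList<T> Items, Exception? Error)
{
    public bool Succeeded => Error is null;
}

public static class ProviderFanOut
{
    // Runs one query and records any exception instead of returning a bare empty list.
    public static async Task<ProviderOutcome<T>> CaptureAsync<T>(
        string provider, Func<Task<List<T>>> query)
    {
        try
        {
            return new ProviderOutcome<T>(provider, await query(), null);
        }
        catch (Exception ex)
        {
            return new ProviderOutcome<T>(provider, Array.Empty<T>(), ex);
        }
    }

    // Starts every provider query concurrently; one failure does not hide the others' results.
    public static Task<ProviderOutcome<T>[]> QueryAllAsync<T>(
        IReadOnlyDictionary<string, Func<Task<List<T>>>> queries)
        => Task.WhenAll(queries.Select(q => CaptureAsync(q.Key, q.Value)));
}
```

A service built on this could still return the combined items, but it could also report which providers failed, which lets a client tell "no results" apart from "provider error".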

Configuration Management

appsettings.json structure:

{
  "DatabaseConfiguration": {
    "ConnectionString": "Host=localhost;Database=minimediametadata;Username=user;Password=pass;MinPoolSize=5;MaxPoolSize=100"
  },
  "Prometheus": {
    "MetricsUrl": "/metrics"
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  }
}

Environment-specific overrides:

appsettings.Development.json - logging only
No production-specific config file
Environment variables supported (ASP.NET Core default)

No secrets management:

Database password in plain text
No Azure Key Vault integration
No environment variable requirements

Unused Dependencies

Quartz (3.17.0): Job scheduling framework registered but no jobs defined.

SpotifyAPI.Web.Auth (7.4.2): Spotify authentication library present but unused (MiniMediaScanner handles auth).

Polly (8.6.6): Resilience library registered but no retry policies applied.

Implications: Dependency bloat, potential security vulnerabilities in unused packages.

Scalability Considerations

Horizontal Scaling:

Stateless design (no in-memory state)
Connection pooling supports multiple instances
No distributed locking needed
No session affinity required

Bottlenecks:

Database connection pool (max 100 per instance)
PostgreSQL query performance
No caching layer (every request hits database)

Missing Optimizations:

No Redis/Memcached for result caching
No CDN for static responses
No query result pagination limits (unbounded result sets)

Testing Architecture

Current State: Non-existent

Configured Framework: xUnit

Missing Test Types:

Unit tests (repository logic, service orchestration)
Integration tests (database queries)
API tests (controller endpoints)
Performance tests (load testing)

Testability Issues:

Repositories tightly coupled to Npgsql (hard to mock)
No repository interfaces in some cases
No test database setup scripts
No test data fixtures

File Organization

99 C# files organized as:

Controllers/          4 files
Middlewares/          1 file
Options/              1 file
Configurations/       1 file
Enums/                2 files
Helpers/              2 files
Models/Database/      60+ files (10 per provider average)
Models/Entities/      6 files
Repositories/         7 files
Services/             3 files
Tests/                1 file (stub)

Naming Conventions:

PascalCase for all files
Suffix pattern: *Repository.cs, *Service.cs, *Controller.cs, *Entity.cs
Provider prefix for database models: Spotify*.cs, Tidal*.cs, etc.

Architecture Evaluation

Strengths:

Clear layer separation
Provider isolation via repositories
Parallel query execution for multi-provider search
Lightweight (Dapper over EF)
Simple dependency graph

Weaknesses:

No caching layer
Error handling swallows failures
Unused dependencies
No testing
Tight coupling to external schema
No API versioning strategy
No health checks

Suitability for Reference:

Repository pattern implementation: Excellent
Multi-provider aggregation: Good
Service orchestration: Good
Error handling: Poor
Testing approach: Non-existent
Production readiness: Needs work
