Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

MiniMediaMetadataAPI - Architecture Analysis

Architectural Pattern

Primary Pattern: Repository Pattern with Service Layer
NOT Clean Architecture - simpler layered approach without strict dependency inversion

Project Structure

MiniMediaMetadataAPI.sln
├── MiniMediaMetadataAPI/              (Web API Layer)
│   ├── Controllers/                   (HTTP endpoints)
│   ├── Middlewares/                   (Request pipeline)
│   ├── Options/                       (Configuration models)
│   └── Program.cs                     (Entry point, DI setup)
├── MiniMediaMetadataAPI.Application/  (Business Logic Layer)
│   ├── Configurations/                (Database config models)
│   ├── Enums/                         (Provider types, result types)
│   ├── Helpers/                       (Utility functions)
│   ├── Models/
│   │   ├── Database/                  (Provider-specific DB models)
│   │   │   ├── Deezer/
│   │   │   ├── Discogs/
│   │   │   ├── MusicBrainz/
│   │   │   ├── SoundCloud/
│   │   │   ├── Spotify/
│   │   │   └── Tidal/
│   │   └── Entities/                  (API response models)
│   ├── Repositories/                  (Data access layer)
│   └── Services/                      (Business logic)
└── MiniMediaMetadataAPI.Tests/        (Test project - empty)

Layer Responsibilities

Web API Layer (MiniMediaMetadataAPI)

Purpose: HTTP interface and request handling

Components:

  • Controllers (4): SearchArtist, SearchAlbum, SearchTrack, Search
  • Middleware (1): RequestMiddleware (Prometheus metrics)
  • Program.cs: DI container configuration, middleware pipeline setup

Dependencies:

  • ASP.NET Core framework
  • Swashbuckle (Swagger/OpenAPI)
  • prometheus-net
  • References Application layer

Responsibilities:

  • HTTP request/response handling
  • Input validation and sanitization
  • Swagger documentation generation
  • Metrics collection
  • Dependency injection configuration

Application Layer (MiniMediaMetadataAPI.Application)

Purpose: Business logic and data access

Components:

Repositories (7 implementations)

  1. SpotifyRepository - Spotify data access
  2. TidalRepository - Tidal data access
  3. MusicBrainzRepository - MusicBrainz data access
  4. DeezerRepository - Deezer data access
  5. DiscogsRepository - Discogs data access
  6. SoundCloudRepository - SoundCloud data access
  7. JobRepository - Job tracking (unused)

Each repository implements:

  • SearchArtist(string name, int offset)
  • GetArtistById(string/int/Guid id)
  • SearchAlbum(string name, string artistId, int offset)
  • GetAlbumById(string/int/Guid id)
  • SearchTrack(string name, string artistId, int offset)
  • GetTrackById(string/int/Guid id)

Services (3 implementations)

  1. SearchArtistService - Orchestrates artist search across providers
  2. SearchAlbumService - Orchestrates album search across providers
  3. SearchTrackService - Orchestrates track search across providers

Dependencies:

  • Dapper (SQL mapping)
  • Npgsql (PostgreSQL driver)
  • FuzzySharp (string similarity)
  • Polly (resilience)

Responsibilities:

  • SQL query execution via Dapper
  • Provider-specific data mapping
  • Fuzzy search logic
  • Error handling and logging
  • Cross-provider aggregation (in services)

Test Layer (MiniMediaMetadataAPI.Tests)

Purpose: Automated testing (currently unused)

Current State:

  • xUnit framework configured
  • Single empty test stub: Test1()
  • 0% code coverage
  • Not executed in CI/CD pipeline

Data Flow

Request Flow (Artist Search Example)

HTTP GET /api/SearchArtist?Name=Beatles&Provider=Any
    ↓
SearchArtistController.Get()
    ↓
Input sanitization (StringHelper.RemoveControlChars)
    ↓
ISearchArtistService.SearchArtist()
    ↓
[Provider=Any] → Query all 6 repositories in parallel
[Provider=Spotify] → Query SpotifyRepository only
    ↓
Repository.SearchArtist()
    ↓
Dapper SQL execution with pg_trgm fuzzy match
    ↓
Map database models → SearchArtistEntity
    ↓
Return SearchArtistResponse (SearchResultType + entities)
    ↓
JSON serialization → HTTP 200 OK
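The sanitization step in this flow can be pictured with a minimal sketch. The source names StringHelper.RemoveControlChars but does not show its body, so the class below (StringHelperSketch, a hypothetical name) is an assumption about its behavior: it drops every character for which char.IsControl returns true.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for StringHelper.RemoveControlChars; the real
// helper's implementation is not shown in this analysis.
public static class StringHelperSketch
{
    // Returns the input without control characters (tab, CR, LF, NUL, ...).
    // Null or empty input yields an empty string.
    public static string RemoveControlChars(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        return new string(input.Where(c => !char.IsControl(c)).ToArray());
    }
}
```

With this sketch, a query value such as "Bea\ttles\r\n" reaches the service layer as "Beatles".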

Database Query Flow

Service Layer
    ↓
Repository Interface (ISpotifyRepository)
    ↓
Repository Implementation (SpotifyRepository)
    ↓
Dapper QueryAsync<T>()
    ↓
Npgsql Connection (from pool)
    ↓
PostgreSQL Database
    ↓
pg_trgm similarity search
    ↓
Result set → Dapper mapping → Database models
    ↓
Transform to Entity models
    ↓
Return to Service

Database Access Strategy

ORM Choice: Dapper (NOT Entity Framework)

Rationale:

  • Lightweight, minimal overhead
  • Direct SQL control for complex queries
  • No change tracking (read-only workload)
  • Better performance for high-throughput reads
  • Simpler for multi-provider schema

Trade-offs:

  • No automatic migrations (schema owned externally anyway)
  • Manual SQL writing (more verbose)
  • No LINQ query translation
  • Compile-time type safety covers only the C# models; SQL text and column mappings are validated at runtime

Connection Management

Pooling Configuration:

MinPoolSize=5
MaxPoolSize=100

Connection Lifecycle:

  • Connections opened per repository call, not per HTTP request
  • Returned to pool after query
  • No long-lived connections
  • No connection state management

No DbContext: Each repository method opens/closes connections independently.

Query Patterns

Fuzzy Search (pg_trgm):

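-- SET LOCAL lasts only until the end of the current transaction, so this
-- statement and the SELECT must run inside the same transaction block.
SET LOCAL pg_trgm.similarity_threshold = 0.5;
SELECT * FROM spotify_artist
WHERE lower(name) % lower(@searchTerm)
ORDER BY similarity(lower(name), lower(@searchTerm)) DESC
LIMIT 20 OFFSET @offset;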
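To make the 0.5 threshold concrete, here is a small sketch of how a trigram similarity score can be computed. It approximates pg_trgm's documented behavior (lowercase the text, split it into alphanumeric words, pad each word with two leading spaces and one trailing space, then divide shared trigrams by all distinct trigrams). The class and method names are hypothetical, and the extension's own handling of non-ASCII text and edge cases may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Approximation of pg_trgm similarity for illustration only.
public static class TrigramSketch
{
    // Collects the distinct 3-character windows of every padded word.
    public static HashSet<string> Trigrams(string text)
    {
        var result = new HashSet<string>();
        var normalized = new string(text.ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray());

        foreach (var word in normalized.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            var padded = "  " + word + " ";
            for (var i = 0; i + 3 <= padded.Length; i++)
            {
                result.Add(padded.Substring(i, 3));
            }
        }

        return result;
    }

    // Shared trigrams divided by all distinct trigrams of both inputs.
    public static double Similarity(string a, string b)
    {
        var ta = Trigrams(a);
        var tb = Trigrams(b);
        if (ta.Count == 0 && tb.Count == 0)
        {
            return 0.0;
        }

        var shared = ta.Count(t => tb.Contains(t));
        return (double)shared / (ta.Count + tb.Count - shared);
    }
}
```

Under this approximation, Similarity("beatles", "beatle") is 6/9 (about 0.67) and clears the 0.5 threshold, while Similarity("beatles", "beetles") is 5/11 (about 0.45) and would be filtered out by the % operator.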

Exact ID Lookup:

SELECT * FROM spotify_artist WHERE id = @id;

Join Queries (Album with Artists):

SELECT a.*, ar.* 
FROM spotify_album a
LEFT JOIN spotify_album_artist aa ON a.id = aa.album_id
LEFT JOIN spotify_artist ar ON aa.artist_id = ar.id
WHERE a.id = @albumId;
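Because the join returns one row per album/artist pair and the models have no navigation properties, the flat rows have to be folded back into one object per album by hand. The sketch below shows one way to do that with LINQ. The record shapes (AlbumArtistRow, AlbumWithArtists) are hypothetical and not taken from the project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened row of the album/artist LEFT JOIN. The artist
// columns are null when an album has no linked artist.
public sealed record AlbumArtistRow(string AlbumId, string AlbumName, string? ArtistId, string? ArtistName);

// Hypothetical aggregate: one album with the names of its artists.
public sealed record AlbumWithArtists(string Id, string Name, List<string> ArtistNames);

public static class JoinFoldSketch
{
    // Groups rows by album and keeps only rows that carry an artist.
    public static List<AlbumWithArtists> Fold(IEnumerable<AlbumArtistRow> rows) =>
        rows.GroupBy(r => (r.AlbumId, r.AlbumName))
            .Select(g => new AlbumWithArtists(
                g.Key.AlbumId,
                g.Key.AlbumName,
                g.Where(r => r.ArtistId != null)
                 .Select(r => r.ArtistName ?? string.Empty)
                 .ToList()))
            .ToList();
}
```

An album with two linked artists arrives as two rows and leaves as one AlbumWithArtists with two names. An album with no artist arrives as one row with null artist columns and leaves with an empty list.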

Schema Ownership Model

Critical Design Decision: This API does NOT own the database schema.

Responsibilities Split

| Concern              | Owner                | Location         |
|----------------------|----------------------|------------------|
| Schema definition    | MiniMediaScanner     | External project |
| Migrations           | MiniMediaScanner     | External project |
| Data ingestion       | MiniMediaScanner     | External project |
| Provider API calls   | MiniMediaScanner     | External project |
| Data sync scheduling | MiniMediaScanner     | External project |
| Query optimization   | MiniMediaMetadataAPI | This project     |
| Read-only queries    | MiniMediaMetadataAPI | This project     |
| Response formatting  | MiniMediaMetadataAPI | This project     |

Implications

Pros:

  • Clear separation of concerns
  • API doesn't need provider API credentials
  • Simpler deployment (no migration coordination)
  • Avoids dual-write complexity
  • Sync logic isolated from query logic

Cons:

  • Schema changes require coordination
  • No control over data freshness
  • Dependency on external project
  • Can't optimize schema for query patterns
  • Breaking schema changes break API

Coupling Points

  1. Table names - Hardcoded in repository SQL
  2. Column names - Hardcoded in Dapper mappings
  3. Data types - Must match C# model properties
  4. Relationships - Foreign keys assumed in joins

No schema validation - API assumes schema exists and matches expectations.

Provider Isolation Strategy

Repository Per Provider

Each provider has dedicated repository implementation:

ISpotifyRepository → SpotifyRepository
ITidalRepository → TidalRepository
IMusicBrainzRepository → MusicBrainzRepository
IDeezerRepository → DeezerRepository
IDiscogsRepository → DiscogsRepository
ISoundCloudRepository → SoundCloudRepository

Benefits:

  • Provider-specific logic isolated
  • Schema differences handled independently
  • Easy to add/remove providers
  • Clear testing boundaries
  • No cross-provider contamination

Shared Interface:

public interface IProviderRepository
{
    Task<List<ArtistModel>> SearchArtist(string name, int offset);
    Task<ArtistModel> GetArtistById(string id);
    Task<List<AlbumModel>> SearchAlbum(string name, string artistId, int offset);
    Task<AlbumModel> GetAlbumById(string id);
    Task<List<TrackModel>> SearchTrack(string name, string artistId, int offset);
    Task<TrackModel> GetTrackById(string id);
}

Note: The interface above is a simplified illustration. ID types vary by provider (string, int, Guid, long), so the actual interfaces declare provider-specific ID types.

Database Models Per Provider

60+ database models organized by provider:

Models/Database/
├── Spotify/
│   ├── SpotifyArtist.cs
│   ├── SpotifyArtistImage.cs
│   ├── SpotifyAlbum.cs
│   ├── SpotifyAlbumArtist.cs
│   ├── SpotifyAlbumImage.cs
│   ├── SpotifyAlbumExternalId.cs
│   ├── SpotifyTrack.cs
│   ├── SpotifyTrackArtist.cs
│   └── SpotifyTrackExternalId.cs
├── Tidal/
│   ├── TidalArtist.cs
│   ├── TidalArtistImageLink.cs
│   ├── TidalAlbum.cs
│   ├── TidalAlbumExternalLink.cs
│   ├── TidalAlbumImage.cs
│   ├── TidalTrack.cs
│   ├── TidalTrackArtist.cs
│   └── TidalTrackExternalLink.cs
├── MusicBrainz/
│   ├── MusicBrainzArtist.cs
│   ├── MusicBrainzRelease.cs
│   ├── MusicBrainzReleaseLabel.cs
│   ├── MusicBrainzLabel.cs
│   ├── MusicBrainzReleaseTrack.cs
│   └── MusicBrainzReleaseTrackArtist.cs
├── Deezer/
│   ├── DeezerArtist.cs
│   ├── DeezerArtistImageLink.cs
│   ├── DeezerAlbum.cs
│   ├── DeezerAlbumImageLink.cs
│   ├── DeezerAlbumArtist.cs
│   ├── DeezerTrack.cs
│   └── DeezerTrackArtist.cs
├── Discogs/
│   ├── DiscogsArtist.cs
│   ├── DiscogsArtistAlias.cs
│   ├── DiscogsArtistUrl.cs
│   ├── DiscogsRelease.cs
│   ├── DiscogsReleaseArtist.cs
│   ├── DiscogsReleaseIdentifier.cs
│   ├── DiscogsReleaseTrack.cs
│   ├── DiscogsLabel.cs
│   ├── DiscogsLabelSublabel.cs
│   └── DiscogsLabelUrl.cs
└── SoundCloud/
    ├── SoundCloudUser.cs
    ├── SoundCloudPlaylist.cs
    ├── SoundCloudTrack.cs
    └── SoundCloudTrackArtist.cs

Mapping Strategy:

  • Database models map 1:1 to database tables
  • Dapper auto-maps columns to properties (case-insensitive)
  • Complex types (arrays, nested objects) handled manually
  • No navigation properties (manual joins)

Unified Entity Models

API response models are provider-agnostic:

Models/Entities/
├── SearchArtistEntity.cs
├── SearchAlbumEntity.cs
├── SearchTrackEntity.cs
├── ArtistImageEntity.cs
├── AlbumImageEntity.cs
└── TrackImageEntity.cs

Transformation happens in repositories:

// SpotifyRepository
private SearchArtistEntity MapToEntity(SpotifyArtist dbModel)
{
    return new SearchArtistEntity
    {
        ProviderType = ProviderType.Spotify,
        Id = dbModel.Id,
        Name = dbModel.Name,
        Popularity = dbModel.Popularity,
        Url = dbModel.ExternalUrl,
        TotalFollowers = dbModel.Followers,
        Genres = dbModel.Genres,
        Images = MapImages(dbModel.Images),
        LastSyncTime = dbModel.LastSyncTime
    };
}

Service Layer Orchestration

Services aggregate results from multiple repositories:

public class SearchArtistService : ISearchArtistService
{
    private readonly ISpotifyRepository _spotify;
    private readonly ITidalRepository _tidal;
    private readonly IMusicBrainzRepository _musicBrainz;
    private readonly IDeezerRepository _deezer;
    private readonly IDiscogsRepository _discogs;
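    private readonly ISoundCloudRepository _soundCloud;

    // Constructor omitted for brevity: the six repositories are supplied through
    // constructor injection (see Dependency Injection Setup).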

    public async Task<SearchArtistResponse> SearchArtist(
        string name, 
        ProviderType provider, 
        int offset)
    {
        if (provider == ProviderType.Any)
        {
            // Query all providers in parallel
            var tasks = new[]
            {
                _spotify.SearchArtist(name, offset),
                _tidal.SearchArtist(name, offset),
                _musicBrainz.SearchArtist(name, offset),
                _deezer.SearchArtist(name, offset),
                _discogs.SearchArtist(name, offset),
                _soundCloud.SearchArtist(name, offset)
            };
            
            var results = await Task.WhenAll(tasks);
            var combined = results.SelectMany(r => r).ToList();
            
            return new SearchArtistResponse
            {
                SearchResultType = combined.Any() 
                    ? SearchResultType.Ok 
                    : SearchResultType.NotFound,
                Artists = combined
            };
        }
        else
        {
            // Query single provider (GetRepository, not shown here, resolves
            // the repository that matches the requested provider)
            var repository = GetRepository(provider);
            var results = await repository.SearchArtist(name, offset);
            
            return new SearchArtistResponse
            {
                SearchResultType = results.Any() 
                    ? SearchResultType.Ok 
                    : SearchResultType.NotFound,
                Artists = results
            };
        }
    }
}

Parallel Execution: When Provider=Any, all 6 repositories are queried concurrently via Task.WhenAll().

No Result Deduplication: If the same artist exists in multiple providers, it is returned once per provider, each time with a different ProviderType value.
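The two points above can be reproduced without a database. The sketch below uses fake in-memory lookups (FakeSearch, ProviderKind, and ArtistHit are hypothetical names, reduced to two providers) to show that Task.WhenAll starts every lookup before awaiting any of them, and that flattening the results keeps one entry per provider. GroupByName is an assumed post-processing step that the analysed service does not perform.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public enum ProviderKind { Spotify, Tidal }

public sealed record ArtistHit(ProviderKind Provider, string Name);

public static class FanOutSketch
{
    // Simulates one provider lookup; the delay stands in for a database round trip.
    public static async Task<List<ArtistHit>> FakeSearch(ProviderKind provider, string name)
    {
        await Task.Delay(10);
        return new List<ArtistHit> { new ArtistHit(provider, name) };
    }

    // Starts all lookups, awaits them together, then flattens the results.
    public static async Task<List<ArtistHit>> SearchAll(string name)
    {
        var tasks = new[]
        {
            FakeSearch(ProviderKind.Spotify, name),
            FakeSearch(ProviderKind.Tidal, name)
        };

        var results = await Task.WhenAll(tasks);
        return results.SelectMany(r => r).ToList();
    }

    // Hypothetical step, not in the analysed code: one group per artist name,
    // compared case-insensitively.
    public static List<IGrouping<string, ArtistHit>> GroupByName(IEnumerable<ArtistHit> hits) =>
        hits.GroupBy(h => h.Name, StringComparer.OrdinalIgnoreCase).ToList();
}
```

SearchAll("Beatles") yields two hits, one per provider, which matches the duplicate behavior described above. Grouping those hits by name collapses them into a single group that still records which providers matched.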

Middleware Pipeline

Single middleware: RequestMiddleware

Purpose: Prometheus metrics collection

Implementation:

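using Microsoft.AspNetCore.Http;
using Prometheus;

public class RequestMiddleware
{
    private static readonly Counter RequestCounter = Metrics
        .CreateCounter(
            "minimediametadataapi_request_total",
            "Total HTTP requests",
            new CounterConfiguration
            {
                LabelNames = new[] { "path", "method", "status" }
            });

    private readonly RequestDelegate _next;

    // Convention-based middleware receives the next delegate via its constructor.
    public RequestMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        await _next(context);

        // Counted after the pipeline runs so the final status code is known.
        RequestCounter
            .WithLabels(
                context.Request.Path.ToString(),
                context.Request.Method,
                context.Response.StatusCode.ToString())
            .Inc();
    }
}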

Registered in Program.cs:

app.UseMiddleware<RequestMiddleware>();

No other middleware:

  • No authentication middleware
  • No rate limiting middleware
  • No CORS middleware
  • No exception handling middleware (uses ASP.NET Core default)

Dependency Injection Setup

Program.cs registration:

// Database configuration
builder.Services.Configure<DatabaseConfiguration>(
    builder.Configuration.GetSection("DatabaseConfiguration"));

// Repositories
builder.Services.AddScoped<ISpotifyRepository, SpotifyRepository>();
builder.Services.AddScoped<ITidalRepository, TidalRepository>();
builder.Services.AddScoped<IMusicBrainzRepository, MusicBrainzRepository>();
builder.Services.AddScoped<IDeezerRepository, DeezerRepository>();
builder.Services.AddScoped<IDiscogsRepository, DiscogsRepository>();
builder.Services.AddScoped<ISoundCloudRepository, SoundCloudRepository>();
builder.Services.AddScoped<IJobRepository, JobRepository>();

// Services
builder.Services.AddScoped<ISearchArtistService, SearchArtistService>();
builder.Services.AddScoped<ISearchAlbumService, SearchAlbumService>();
builder.Services.AddScoped<ISearchTrackService, SearchTrackService>();

// Swagger
builder.Services.AddSwaggerGen();

// Controllers
builder.Services.AddControllers();

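Lifetime: All repositories and services are registered with Scoped lifetime (one instance per request).

No application services are registered as Singleton, so each request gets fresh repository and service instances. Framework registrations such as the options bound by Configure<DatabaseConfiguration> keep their default lifetimes.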

Error Handling Strategy

Repository Level:

public async Task<List<SearchArtistEntity>> SearchArtist(string name, int offset)
{
    try
    {
        using var connection = new NpgsqlConnection(_connectionString);
        var results = await connection.QueryAsync<SpotifyArtist>(sql, parameters);
        return results.Select(MapToEntity).ToList();
    }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error searching Spotify artists");
        return new List<SearchArtistEntity>();
    }
}

Strategy: Catch all exceptions, log, return empty list.

No custom exceptions - generic Exception catch.

No error propagation - failures silently return empty results.

Implications:

  • Partial failures in multi-provider search go unnoticed
  • Client can't distinguish between "no results" and "provider error"
  • No retry logic (despite Polly dependency)
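The second implication is easy to demonstrate in isolation. In the sketch below, SearchSwallowing mirrors the catch-all pattern shown above, and SearchReporting is a hypothetical alternative (not part of the analysed code) that returns the failure alongside the items. All names are illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class SwallowSketch
{
    // Mirrors the repository pattern above: any failure becomes an empty list.
    public static List<string> SearchSwallowing(Func<List<string>> query)
    {
        try
        {
            return query();
        }
        catch (Exception)
        {
            return new List<string>();
        }
    }

    // Hypothetical alternative: the caller can tell an error from an empty result.
    public static (List<string> Items, string? Error) SearchReporting(Func<List<string>> query)
    {
        try
        {
            return (query(), null);
        }
        catch (Exception ex)
        {
            return (new List<string>(), ex.Message);
        }
    }
}
```

With SearchSwallowing, a query that finds nothing and a query that throws both produce an empty list, so the caller sees the same value in both cases. SearchReporting returns a null Error for the first case and the exception message for the second.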

Configuration Management

appsettings.json structure:

{
  "DatabaseConfiguration": {
    "ConnectionString": "Host=localhost;Database=minimediametadata;Username=user;Password=pass;MinPoolSize=5;MaxPoolSize=100"
  },
  "Prometheus": {
    "MetricsUrl": "/metrics"
  },
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.AspNetCore": "Warning"
    }
  }
}

Environment-specific overrides:

  • appsettings.Development.json - logging only
  • No production-specific config file
  • Environment variables supported (ASP.NET Core default)

No secrets management:

  • Database password in plain text
  • No Azure Key Vault integration
  • No requirement to supply secrets through environment variables

Unused Dependencies

Quartz (3.17.0): Job scheduling framework registered but no jobs defined.

SpotifyAPI.Web.Auth (7.4.2): Spotify authentication library present but unused (MiniMediaScanner handles auth).

Polly (8.6.6): Resilience library registered but no retry policies applied.

Implications: Dependency bloat, potential security vulnerabilities in unused packages.

Scalability Considerations

Horizontal Scaling:

  • Stateless design (no in-memory state)
  • Connection pooling supports multiple instances
  • No distributed locking needed
  • No session affinity required

Bottlenecks:

  • Database connection pool (max 100 per instance)
  • PostgreSQL query performance
  • No caching layer (every request hits database)
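The pool limit interacts with the Provider=Any fan-out. As a rough worked example, assume each of the six parallel repository calls holds one pooled connection for the duration of its query (consistent with the per-method open/close described under Connection Management) and that no other traffic shares the pool. The helper name below is hypothetical.

```csharp
using System;

public static class PoolMathSketch
{
    // Integer division: how many fan-out requests fit into the pool at once.
    public static int MaxConcurrentFanOuts(int maxPoolSize, int connectionsPerRequest)
    {
        if (connectionsPerRequest <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(connectionsPerRequest));
        }

        return maxPoolSize / connectionsPerRequest;
    }
}
```

With MaxPoolSize=100 and six connections per request, MaxConcurrentFanOuts(100, 6) gives 16. Under these assumptions, a seventeenth simultaneous Provider=Any request would have to wait for connections to be returned to the pool.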

Missing Optimizations:

  • No Redis/Memcached for result caching
  • No CDN for static responses
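  • No configurable page size or cap on combined results (the fuzzy-search pattern hardcodes LIMIT 20 per provider, so a Provider=Any search can return up to 120 rows)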

Testing Architecture

Current State: Non-existent

Configured Framework: xUnit

Missing Test Types:

  • Unit tests (repository logic, service orchestration)
  • Integration tests (database queries)
  • API tests (controller endpoints)
  • Performance tests (load testing)

Testability Issues:

  • Repositories tightly coupled to Npgsql (hard to mock)
  • No repository interfaces in some cases
  • No test database setup scripts
  • No test data fixtures

File Organization

99 C# files organized as:

Controllers/          4 files
Middlewares/          1 file
Options/              1 file
Configurations/       1 file
Enums/                2 files
Helpers/              2 files
Models/Database/      60+ files (10 per provider average)
Models/Entities/      6 files
Repositories/         7 files
Services/             3 files
Tests/                1 file (stub)

Naming Conventions:

  • PascalCase for all files
  • Suffix pattern: *Repository.cs, *Service.cs, *Controller.cs, *Entity.cs
  • Provider prefix for database models: Spotify*.cs, Tidal*.cs, etc.

Architecture Evaluation

Strengths:

  • Clear layer separation
  • Provider isolation via repositories
  • Parallel query execution for multi-provider search
  • Lightweight (Dapper over EF)
  • Simple dependency graph

Weaknesses:

  • No caching layer
  • Error handling swallows failures
  • Unused dependencies
  • No testing
  • Tight coupling to external schema
  • No API versioning strategy
  • No health checks

Suitability for Reference:

  • Repository pattern implementation: Excellent
  • Multi-provider aggregation: Good
  • Service orchestration: Good
  • Error handling: Poor
  • Testing approach: Non-existent
  • Production readiness: Needs work