Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


# MiniMediaMetadataAPI - Integration Analysis
## Integration Philosophy
**Critical Distinction:** This API does NOT integrate with external provider APIs.
**Data Source:** Pre-populated PostgreSQL database
**Sync Responsibility:** MiniMediaScanner (separate project)
**API Role:** Query interface only
## Architecture Overview
```
External Providers (Spotify, Tidal, etc.)
    ↓
MiniMediaScanner (separate project)
    ↓ (writes)
PostgreSQL Database
    ↓ (reads)
MiniMediaMetadataAPI (this project)
    ↓
API Clients
```
**Separation of Concerns:**
- **MiniMediaScanner:** Provider API integration, authentication, rate limiting, data sync
- **MiniMediaMetadataAPI:** Database queries, response formatting, API serving
## Provider Integration Status
### Spotify
**Integration Type:** None (data pre-populated)
**Dependency:** `SpotifyAPI.Web.Auth 7.4.2` (UNUSED)
**Why Dependency Exists:**
- Likely copied from MiniMediaScanner
- Not removed during project split
- Dead code / dependency bloat
**Data Available:**
- Artists (with images, genres, popularity, followers)
- Albums (with images, UPC, label, copyright)
- Tracks (with ISRC, explicit flag, duration)
**Data Sync:** Handled by MiniMediaScanner via Spotify Web API
**Authentication:** Not needed in this API (MiniMediaScanner handles OAuth)
### Tidal
**Integration Type:** None (data pre-populated)
**Dependency:** None
**Data Available:**
- Artists (with image links)
- Albums (with UPC, copyright, explicit flag)
- Tracks (with ISRC, duration)
**Data Sync:** Handled by MiniMediaScanner via Tidal API
**Authentication:** Not needed in this API
### MusicBrainz
**Integration Type:** None (data pre-populated)
**Dependency:** None
**Data Available:**
- Artists (with sort name, type, country)
- Releases (with barcode, status, packaging)
- Labels (with hierarchy)
- Tracks (with ISRC)
**Data Sync:** Handled by MiniMediaScanner via MusicBrainz API
**Authentication:** Not needed in this API (MusicBrainz read access requires no credentials)
### Deezer
**Integration Type:** None (data pre-populated)
**Dependency:** None
**Data Available:**
- Artists (with image links, fans)
- Albums (with genres, fans)
- Tracks (with duration, explicit flag)
**Data Sync:** Handled by MiniMediaScanner via Deezer API
**Authentication:** Not needed in this API
### Discogs
**Integration Type:** None (data pre-populated)
**Dependency:** None
**Data Available:**
- Artists (with aliases, real names, profiles)
- Releases (with identifiers, genres, styles)
- Labels (with hierarchy, contact info)
- Tracks (with disc/track numbers)
**Data Sync:** Handled by MiniMediaScanner via Discogs API
**Authentication:** Not needed in this API
### SoundCloud
**Integration Type:** None (data pre-populated)
**Dependency:** None
**Data Available:**
- Users (with avatars, follower counts)
- Playlists (with artwork, track counts)
- Tracks (with artwork, playback counts, genre)
**Data Sync:** Handled by MiniMediaScanner via SoundCloud API
**Authentication:** Not needed in this API
## Repository Pattern Implementation
### Interface Design
Each provider has a dedicated repository interface and implementation.
**Example: ISpotifyRepository**
```csharp
public interface ISpotifyRepository
{
    Task<List<SearchArtistEntity>> SearchArtist(string name, int offset);
    Task<SearchArtistEntity> GetArtistById(string id);
    Task<List<SearchAlbumEntity>> SearchAlbum(string name, string artistId, int offset);
    Task<SearchAlbumEntity> GetAlbumById(string id);
    Task<List<SearchTrackEntity>> SearchTrack(string name, string artistId, int offset);
    Task<SearchTrackEntity> GetTrackById(string id);
}
```
**Implementation: SpotifyRepository**
```csharp
public class SpotifyRepository : ISpotifyRepository
{
    private readonly string _connectionString;
    private readonly ILogger<SpotifyRepository> _logger;

    public SpotifyRepository(
        IOptions<DatabaseConfiguration> config,
        ILogger<SpotifyRepository> logger)
    {
        _connectionString = config.Value.ConnectionString;
        _logger = logger;
    }

    public async Task<List<SearchArtistEntity>> SearchArtist(string name, int offset)
    {
        try
        {
            using var connection = new NpgsqlConnection(_connectionString);
            var sql = @"
                SET LOCAL pg_trgm.similarity_threshold = 0.5;
                SELECT
                    a.id,
                    a.name,
                    a.popularity,
                    a.external_url,
                    a.followers,
                    a.genres,
                    a.last_sync_time,
                    i.url AS image_url,
                    i.height AS image_height,
                    i.width AS image_width
                FROM spotify_artist a
                LEFT JOIN spotify_artist_image i ON a.id = i.artist_id
                WHERE lower(a.name) % lower(@searchTerm)
                ORDER BY similarity(lower(a.name), lower(@searchTerm)) DESC
                LIMIT 20 OFFSET @offset;
            ";

            var artistDict = new Dictionary<string, SearchArtistEntity>();
            await connection.QueryAsync<SpotifyArtist, SpotifyArtistImage, SearchArtistEntity>(
                sql,
                (artist, image) =>
                {
                    if (!artistDict.TryGetValue(artist.Id, out var entity))
                    {
                        entity = MapToEntity(artist);
                        artistDict.Add(artist.Id, entity);
                    }
                    if (image != null)
                    {
                        entity.Images.Add(MapImageToEntity(image));
                    }
                    return entity;
                },
                new { searchTerm = name, offset },
                splitOn: "image_url"
            );
            return artistDict.Values.ToList();
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error searching Spotify artists for term: {SearchTerm}", name);
            return new List<SearchArtistEntity>();
        }
    }

    private SearchArtistEntity MapToEntity(SpotifyArtist artist)
    {
        return new SearchArtistEntity
        {
            ProviderType = ProviderType.Spotify,
            Id = artist.Id,
            Name = artist.Name,
            Popularity = artist.Popularity,
            Url = artist.ExternalUrl,
            TotalFollowers = artist.Followers,
            Genres = artist.Genres,
            Images = new List<ArtistImageEntity>(),
            LastSyncTime = artist.LastSyncTime
        };
    }
}
```
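One detail of the query above is worth a sketch: `LIMIT 20 OFFSET @offset` is applied to the joined rows, so if an artist has several rows in `spotify_artist_image` (which the `Images` list suggests), a page can contain fewer than 20 distinct artists and one artist's images can be split across pages. A possible variant, assuming that one-to-many relationship, pages over artists in a subquery first and joins images afterwards. The class and constant names are hypothetical; table and column names are taken from the query above.

```csharp
public static class SpotifyArtistQueries
{
    // Hypothetical variant of the search query: page over artists first,
    // then attach images, so LIMIT/OFFSET count artists rather than joined rows.
    public const string SearchArtistPaged = @"
        SELECT
            a.id, a.name, a.popularity, a.external_url,
            a.followers, a.genres, a.last_sync_time,
            i.url    AS image_url,
            i.height AS image_height,
            i.width  AS image_width
        FROM (
            SELECT *, similarity(lower(name), lower(@searchTerm)) AS score
            FROM spotify_artist
            WHERE lower(name) % lower(@searchTerm)
            ORDER BY score DESC
            LIMIT 20 OFFSET @offset
        ) a
        LEFT JOIN spotify_artist_image i ON a.id = i.artist_id
        ORDER BY a.score DESC, a.id;";
}
```

The outer `SELECT` keeps the same column order as the original, so the `splitOn: "image_url"` mapping would not need to change.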
### Repository Variations
**ID Type Differences:**
| Repository | ID Type | C# Type |
|------------|---------|---------|
| SpotifyRepository | VARCHAR | string |
| TidalRepository | INTEGER | int |
| MusicBrainzRepository | UUID | Guid |
| DeezerRepository | BIGINT | long |
| DiscogsRepository | INTEGER | int |
| SoundCloudRepository | BIGINT | long |
**Interface Adaptation:**
```csharp
// Spotify
Task<SearchArtistEntity> GetArtistById(string id);
// Tidal
Task<SearchArtistEntity> GetArtistById(int id);
// MusicBrainz
Task<SearchArtistEntity> GetArtistById(Guid id);
// Deezer
Task<SearchArtistEntity> GetArtistById(long id);
```
**No Common Interface:** Each repository has provider-specific method signatures.
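If a shared contract were wanted, one option is a generic interface parameterised by the provider's key type. The sketch below is not part of the analysed project: `IProviderRepository<TId>` and `InMemoryGuidRepository` are hypothetical names, and `SearchArtistEntity` is reduced to a two-property stand-in so the example compiles on its own.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Trimmed stand-in for the entity type used throughout this document.
public class SearchArtistEntity
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
}

// Hypothetical shared contract: TId carries the provider's native key type
// (string for Spotify, int for Tidal, Guid for MusicBrainz, long for Deezer).
public interface IProviderRepository<TId>
{
    Task<List<SearchArtistEntity>> SearchArtist(string name, int offset);
    Task<SearchArtistEntity?> GetArtistById(TId id);
}

// In-memory example for a provider with UUID keys (MusicBrainz-style).
public class InMemoryGuidRepository : IProviderRepository<Guid>
{
    private readonly Dictionary<Guid, SearchArtistEntity> _artists = new();

    public void Add(Guid id, string name) =>
        _artists[id] = new SearchArtistEntity { Id = id.ToString(), Name = name };

    public Task<List<SearchArtistEntity>> SearchArtist(string name, int offset)
    {
        // Case-insensitive substring match, then skip `offset` rows like SQL OFFSET.
        var matches = _artists.Values
            .Where(a => a.Name.Contains(name, StringComparison.OrdinalIgnoreCase))
            .Skip(Math.Max(0, offset))
            .ToList();
        return Task.FromResult(matches);
    }

    public Task<SearchArtistEntity?> GetArtistById(Guid id) =>
        Task.FromResult(_artists.TryGetValue(id, out var artist) ? artist : null);
}
```

The service layer would still need one field per provider, since `IProviderRepository<string>` and `IProviderRepository<Guid>` remain distinct types; the gain is only a uniform method shape.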
### Provider-Specific Logic
**Discogs Helper:**
```csharp
public static class DiscogsHelper
{
    public static int GetDiscNumber(string position)
    {
        // Discogs stores position as "1-1", "2-3", etc.
        // Format: "disc-track"
        if (string.IsNullOrEmpty(position))
            return 1;
        var parts = position.Split('-');
        return parts.Length > 0 && int.TryParse(parts[0], out var disc)
            ? disc
            : 1;
    }

    public static int GetTrackNumber(string position)
    {
        if (string.IsNullOrEmpty(position))
            return 0;
        var parts = position.Split('-');
        return parts.Length > 1 && int.TryParse(parts[1], out var track)
            ? track
            : 0;
    }
}
```
**Usage in DiscogsRepository:**
```csharp
var track = new SearchTrackEntity
{
    DiscNumber = DiscogsHelper.GetDiscNumber(dbTrack.Position),
    TrackNumber = DiscogsHelper.GetTrackNumber(dbTrack.Position)
};
```
**MusicBrainz Sort Name:**
```csharp
// MusicBrainz stores "Beatles, The" for alphabetical sorting
var artist = new SearchArtistEntity
{
    Name = dbArtist.Name,        // "The Beatles"
    SortName = dbArtist.SortName // "Beatles, The"
};
```
**SoundCloud User vs Artist:**
```csharp
// SoundCloud has "users" not "artists"
var artist = new SearchArtistEntity
{
    Name = dbUser.FullName ?? dbUser.Username,
    Url = dbUser.Url,
    TotalFollowers = dbUser.FollowersCount
};
```
## Service Layer Orchestration
### Cross-Provider Aggregation
**SearchArtistService:**
```csharp
public class SearchArtistService : ISearchArtistService
{
    private readonly ISpotifyRepository _spotify;
    private readonly ITidalRepository _tidal;
    private readonly IMusicBrainzRepository _musicBrainz;
    private readonly IDeezerRepository _deezer;
    private readonly IDiscogsRepository _discogs;
    private readonly ISoundCloudRepository _soundCloud;
    private readonly ILogger<SearchArtistService> _logger;

    public async Task<SearchArtistResponse> SearchArtist(
        string name,
        ProviderType provider,
        int offset)
    {
        if (provider == ProviderType.Any)
        {
            return await SearchAllProviders(name, offset);
        }
        else
        {
            return await SearchSingleProvider(name, provider, offset);
        }
    }

    private async Task<SearchArtistResponse> SearchAllProviders(string name, int offset)
    {
        try
        {
            var tasks = new[]
            {
                _spotify.SearchArtist(name, offset),
                _tidal.SearchArtist(name, offset),
                _musicBrainz.SearchArtist(name, offset),
                _deezer.SearchArtist(name, offset),
                _discogs.SearchArtist(name, offset),
                _soundCloud.SearchArtist(name, offset)
            };
            var results = await Task.WhenAll(tasks);
            var combined = results.SelectMany(r => r).ToList();
            return new SearchArtistResponse
            {
                SearchResultType = combined.Any()
                    ? SearchResultType.Ok
                    : SearchResultType.NotFound,
                Artists = combined
            };
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error searching all providers for artist: {Name}", name);
            return new SearchArtistResponse
            {
                SearchResultType = SearchResultType.NotFound,
                Artists = new List<SearchArtistEntity>()
            };
        }
    }

    private async Task<SearchArtistResponse> SearchSingleProvider(
        string name,
        ProviderType provider,
        int offset)
    {
        try
        {
            var results = provider switch
            {
                ProviderType.Spotify => await _spotify.SearchArtist(name, offset),
                ProviderType.Tidal => await _tidal.SearchArtist(name, offset),
                ProviderType.MusicBrainz => await _musicBrainz.SearchArtist(name, offset),
                ProviderType.Deezer => await _deezer.SearchArtist(name, offset),
                ProviderType.Discogs => await _discogs.SearchArtist(name, offset),
                ProviderType.SoundCloud => await _soundCloud.SearchArtist(name, offset),
                _ => new List<SearchArtistEntity>()
            };
            return new SearchArtistResponse
            {
                SearchResultType = results.Any()
                    ? SearchResultType.Ok
                    : SearchResultType.NotFound,
                Artists = results
            };
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Error searching {Provider} for artist: {Name}", provider, name);
            return new SearchArtistResponse
            {
                SearchResultType = SearchResultType.NotFound,
                Artists = new List<SearchArtistEntity>()
            };
        }
    }
}
```
**Parallel Execution:**
- All 6 provider queries are started before `Task.WhenAll()` awaits them, so they run concurrently
- Total query time ≈ slowest provider (not the sum of all)
- Typical: 20-50ms for all providers (with indexes)
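The timing claim can be illustrated with a small stand-alone sketch. It assumes nothing about the real repositories: `FakeProviderSearch` is a hypothetical stand-in that just waits for a fixed delay, and the delays are made-up values.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelSearchDemo
{
    // Stand-in for a repository call: waits `delayMs`, then returns one result.
    private static async Task<List<string>> FakeProviderSearch(string provider, int delayMs)
    {
        await Task.Delay(delayMs);
        return new List<string> { $"{provider}: result" };
    }

    // All three calls are started while building the array, so they overlap;
    // Task.WhenAll only waits for the last one to finish.
    public static async Task<(List<string> Results, TimeSpan Elapsed)> SearchAll()
    {
        var stopwatch = Stopwatch.StartNew();
        var tasks = new[]
        {
            FakeProviderSearch("Spotify", 50),
            FakeProviderSearch("Tidal", 100),
            FakeProviderSearch("MusicBrainz", 150)
        };
        var results = await Task.WhenAll(tasks);
        stopwatch.Stop();
        // Results keep the order of the task array, not completion order.
        return (results.SelectMany(r => r).ToList(), stopwatch.Elapsed);
    }
}
```

With these delays the elapsed time should land near 150 ms rather than the 300 ms a sequential loop would take. Against a real database the overlap also depends on enough pooled connections being available.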
**No Result Deduplication:**
- Same artist from multiple providers returned multiple times
- Each result has `ProviderType` field to distinguish
- Client responsible for deduplication if needed
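A client that wants one entry per artist could group results by a normalised name. The following is only a sketch under simple assumptions: `ArtistHit` and `ArtistDeduplicator` are hypothetical types, and the normalisation (trim, lowercase, drop a leading "the ") is deliberately crude.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical client-side view of one search result.
public record ArtistHit(string Provider, string Name);

public static class ArtistDeduplicator
{
    // Crude normalisation: trim, lowercase, drop a leading "the ".
    private static string Normalize(string name)
    {
        var key = name.Trim().ToLowerInvariant();
        return key.StartsWith("the ", StringComparison.Ordinal) ? key.Substring(4) : key;
    }

    // Groups hits that share a normalised name; each group lists its providers.
    // The display name is taken from the first hit in the group.
    public static List<(string Name, List<string> Providers)> Merge(IEnumerable<ArtistHit> hits) =>
        hits.GroupBy(h => Normalize(h.Name))
            .Select(g => (
                Name: g.First().Name,
                Providers: g.Select(h => h.Provider).Distinct().ToList()))
            .ToList();
}
```

Matching on names alone will merge distinct artists who share a name, so a real client would likely also compare stable identifiers where the providers expose them.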
**Error Handling:**
- Repositories catch their own exceptions, so an individual provider failure doesn't fail the entire request
- Empty list returned for failed providers
- Logged but not exposed to client
## Helper Utilities
### StringHelper
**File:** `Helpers/StringHelper.cs`
**Methods:**
#### RemoveControlChars
```csharp
public static string RemoveControlChars(string input)
{
    if (string.IsNullOrEmpty(input))
        return input;
    // Remove control characters (0x00-0x1F, 0x7F-0x9F)
    return Regex.Replace(input, @"[\x00-\x1F\x7F-\x9F]", string.Empty);
}
```
**Usage:** Sanitize user input before database queries
**Protects Against:**
- Null byte injection
- Terminal escape sequences
- Control character exploits
#### RemoveEmojis
```csharp
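public static string RemoveEmojis(string input)
{
    if (string.IsNullOrEmpty(input))
        return input;
    // Remove UTF-16 surrogate code units. This strips every character outside
    // the Basic Multilingual Plane (most emojis, but also rare CJK ideographs)
    // and leaves emoji-like symbols inside the BMP untouched.
    return Regex.Replace(input, @"\p{Cs}", string.Empty);
}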
```
**Usage:** Clean provider data before storage (in MiniMediaScanner)
**Not Used in API:** Data already cleaned during sync
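The scope of the `\p{Cs}` pattern is easy to misjudge, so here is a small illustration using the same regular expression. `EmojiRegexDemo` is a hypothetical wrapper; the inputs are arbitrary examples.

```csharp
using System.Text.RegularExpressions;

public static class EmojiRegexDemo
{
    // Same pattern as RemoveEmojis: removes UTF-16 surrogate code units.
    public static string StripSurrogates(string input) =>
        Regex.Replace(input, @"\p{Cs}", string.Empty);
}

// StripSurrogates("Hi \U0001F600!")  returns "Hi !"       (U+1F600 is a surrogate pair)
// StripSurrogates("I \u2764 music")  returns it unchanged (U+2764 lies inside the BMP)
// StripSurrogates("\U00020000")      returns ""           (a CJK Extension B ideograph)
```

In other words, the method removes "everything outside the BMP" rather than "emojis", which matters if artist or track names may contain supplementary-plane characters.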
### DiscogsHelper
**File:** `Helpers/DiscogsHelper.cs`
**Purpose:** Parse Discogs-specific position format
**Methods:**
#### GetDiscNumber
```csharp
public static int GetDiscNumber(string position)
{
    // Input: "2-5" (disc 2, track 5)
    // Output: 2
    if (string.IsNullOrEmpty(position))
        return 1;
    var parts = position.Split('-');
    return parts.Length > 0 && int.TryParse(parts[0], out var disc)
        ? disc
        : 1;
}
```
#### GetTrackNumber
```csharp
public static int GetTrackNumber(string position)
{
    // Input: "2-5" (disc 2, track 5)
    // Output: 5
    if (string.IsNullOrEmpty(position))
        return 0;
    var parts = position.Split('-');
    return parts.Length > 1 && int.TryParse(parts[1], out var track)
        ? track
        : 0;
}
```
**Discogs Position Formats:**
- `"1-1"` - Disc 1, Track 1
- `"2-5"` - Disc 2, Track 5
- `"A1"` - Vinyl side A, track 1 (not handled)
- `"DVD1"` - DVD disc (not handled)
**Limitations:** Only handles numeric disc-track format.
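As an illustration of how the vinyl-style positions might be covered, the sketch below extends the parsing to single side letters. It is not the project's code: `DiscogsPositionParser` is a hypothetical name, and the mapping of sides to discs (two sides per disc, so A/B are disc 1 and C/D are disc 2) is an assumption that will not hold for every release.

```csharp
#nullable enable
using System.Text.RegularExpressions;

public static class DiscogsPositionParser
{
    private static readonly Regex Numeric = new(@"^([0-9]+)-([0-9]+)$");
    private static readonly Regex Sided = new(@"^([A-Z])([0-9]*)$");

    // Returns (disc, track). Unparseable input yields (1, 0), the same
    // default values the helpers above use.
    public static (int Disc, int Track) Parse(string? position)
    {
        if (string.IsNullOrWhiteSpace(position))
            return (1, 0);

        var text = position.Trim().ToUpperInvariant();

        var numeric = Numeric.Match(text);
        if (numeric.Success
            && int.TryParse(numeric.Groups[1].Value, out var disc)
            && int.TryParse(numeric.Groups[2].Value, out var track))
        {
            return (disc, track);
        }

        var sided = Sided.Match(text);
        if (sided.Success)
        {
            // Assumption: two sides per disc, so A/B -> disc 1, C/D -> disc 2.
            var sideDisc = (sided.Groups[1].Value[0] - 'A') / 2 + 1;
            // The number counts tracks within the side; a bare letter ("A")
            // is treated as track 1 of that side.
            var sideTrack = int.TryParse(sided.Groups[2].Value, out var n) ? n : 1;
            return (sideDisc, sideTrack);
        }

        // Anything else ("DVD1", "CD2-3", ...) is still not handled.
        return (1, 0);
    }
}
```

Note that track numbers for sided positions restart on each side, so `"B1"` and `"A1"` both map to track 1 of disc 1; a caller needing unique track numbers would have to keep the side letter as well.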
## Job Repository
**File:** `Repositories/JobRepository.cs`
**Purpose:** Track background sync jobs (unused in current implementation)
**Interface:**
```csharp
public interface IJobRepository
{
    Task<Job> GetJobById(int id);
    Task<List<Job>> GetPendingJobs();
    Task CreateJob(Job job);
    Task UpdateJobStatus(int id, JobStatus status);
}
```
**Job Model:**
```csharp
public class Job
{
    public int Id { get; set; }
    public ProviderType Provider { get; set; }
    public JobType Type { get; set; }     // ArtistSync, AlbumSync, TrackSync
    public JobStatus Status { get; set; } // Pending, InProgress, Completed, Failed
    public string EntityId { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime? CompletedAt { get; set; }
    public string ErrorMessage { get; set; }
}
```
**Current Status:** Registered in DI but never used.
**Intended Use:** Track sync requests from API to MiniMediaScanner (not implemented).
**SearchResultType.InQueueSync:** Enum value exists but never returned.
## Quartz Scheduler Integration
**Dependency:** Quartz 3.17.0
**Configuration:** Registered in DI
**Jobs Defined:** None
**Current Status:** Dead code
**Intended Use:** Scheduled background tasks (speculation):
- Periodic sync triggers
- Stale data cleanup
- Metrics aggregation
**Recommendation:** Remove dependency if not used.
## Polly Resilience Integration
**Dependency:** Polly 8.6.6
**Configuration:** Registered in DI
**Policies Defined:** None
**Current Status:** Dead code
**Intended Use:** Retry policies for database queries (speculation):
```csharp
// NOT IMPLEMENTED
var retryPolicy = Policy
    .Handle<NpgsqlException>()
    .WaitAndRetryAsync(3, retryAttempt =>
        TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));

await retryPolicy.ExecuteAsync(async () =>
{
    return await connection.QueryAsync<SpotifyArtist>(sql, parameters);
});
```
**Recommendation:** Implement retry policies or remove dependency.
## FuzzySharp Integration
**Dependency:** FuzzySharp 2.0.2
**Purpose:** String similarity matching (alternative to pg_trgm)
**Current Status:** Referenced but not used
**Intended Use:** Application-side fuzzy matching after the database query (speculation):
```csharp
// NOT IMPLEMENTED
var results = await _spotify.SearchArtist(name, offset);
var scored = results.Select(r => new
{
    Artist = r,
    Score = Fuzz.Ratio(name.ToLower(), r.Name.ToLower())
});
var filtered = scored.Where(s => s.Score >= 70).OrderByDescending(s => s.Score);
```
**Why Not Used:** pg_trgm handles fuzzy search in database (more efficient).
**Recommendation:** Remove dependency if not needed.
## Prometheus Integration
**Dependency:** prometheus-net 8.2.1
**Metrics Exposed:**
### minimediametadataapi_request_total
**Type:** Counter
**Labels:** path, method, status
**Implementation:**
```csharp
public class RequestMiddleware
{
    private static readonly Counter RequestCounter = Metrics
        .CreateCounter(
            "minimediametadataapi_request_total",
            "Total HTTP requests",
            new CounterConfiguration
            {
                LabelNames = new[] { "path", "method", "status" }
            });

    public async Task InvokeAsync(HttpContext context, RequestDelegate next)
    {
        await next(context);
        RequestCounter
            .WithLabels(
                context.Request.Path,
                context.Request.Method,
                context.Response.StatusCode.ToString())
            .Inc();
    }
}
```
**Endpoint:** `/metrics`
**Format:** Prometheus text exposition
**Missing Metrics:**
- Request duration histogram
- Database query duration
- Error rate by provider
- Active requests gauge
- Connection pool usage
## Swagger Integration
**Dependency:** Swashbuckle.AspNetCore 10.1.7
**Configuration:**
```csharp
builder.Services.AddSwaggerGen();
app.UseSwagger();
app.UseSwaggerUI();
```
**Endpoint:** `/swagger`
**Features:**
- Auto-generated from controller attributes
- Interactive API testing
- Request/response schema documentation
- Enum value descriptions
**Customization:** None (default configuration)
**Production Access:** Enabled (no environment check)
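If Swagger should not be reachable in production, the usual approach is to gate it on the hosting environment. The fragment below is a sketch of a `Program.cs`, assuming an ASP.NET Core web project with implicit usings enabled and the Swashbuckle package already referenced; the controller registration lines are assumptions about how the rest of the startup looks.

```csharp
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddControllers();
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

var app = builder.Build();

// Only expose the Swagger JSON and UI when ASPNETCORE_ENVIRONMENT is Development.
if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
}

app.MapControllers();
app.Run();
```

Whether this is desirable depends on the deployment: for an internal, read-only API, leaving `/swagger` enabled may be a deliberate choice.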
## Database Connection Management
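**Pattern:** Connection-per-query (each repository call opens its own pooled connection, so a multi-provider search uses several at once)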
**Implementation:**
```csharp
using var connection = new NpgsqlConnection(_connectionString);
await connection.QueryAsync<T>(sql, parameters);
// Connection automatically disposed and returned to pool
```
**No DbContext:** Each repository method creates own connection.
**No Transactions:** Read-only queries don't need transactions.
**Connection Pooling:** Handled by Npgsql driver (configured in connection string).
## Error Handling Strategy
**Repository Level:**
```csharp
try
{
    // Database query
}
catch (Exception ex)
{
    _logger.LogError(ex, "Error message with context");
    return new List<T>(); // Empty result
}
```
**Service Level:**
```csharp
try
{
    // Orchestrate repositories
}
catch (Exception ex)
{
    _logger.LogError(ex, "Error message with context");
    return new Response
    {
        SearchResultType = SearchResultType.NotFound,
        Results = new List<T>()
    };
}
```
**Controller Level:**
```csharp
// No try-catch - relies on ASP.NET Core default error handling
var response = await _service.SearchArtist(name, provider, offset);
return Ok(response);
```
**Implications:**
- Errors logged but not exposed to client
- Client can't distinguish between "no results" and "error"
- No retry logic
- No circuit breaker pattern
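One way to let clients tell "no results" from "error" is a dedicated status value. The sketch below is hypothetical: `Ok`, `NotFound` and `InQueueSync` are the enum members mentioned in this document (their real order and values are not known here), while `Error`, `SearchResponse<T>` and `SearchRunner` are illustrative additions.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Ok, NotFound and InQueueSync appear in this document; Error is a hypothetical addition.
public enum SearchResultType { Ok, NotFound, InQueueSync, Error }

public class SearchResponse<T>
{
    public SearchResultType SearchResultType { get; init; }
    public List<T> Results { get; init; } = new();
}

public static class SearchRunner
{
    // Wraps a query so that failures surface as Error instead of NotFound.
    public static async Task<SearchResponse<T>> Run<T>(Func<Task<List<T>>> query)
    {
        try
        {
            var results = await query();
            return new SearchResponse<T>
            {
                SearchResultType = results.Count > 0
                    ? SearchResultType.Ok
                    : SearchResultType.NotFound,
                Results = results
            };
        }
        catch (Exception)
        {
            // Logging omitted; no exception details are returned to the caller.
            return new SearchResponse<T> { SearchResultType = SearchResultType.Error };
        }
    }
}
```

This only helps if the repositories let exceptions propagate (or report failure some other way); as long as they return an empty list on error, the service cannot see the difference either.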
## Integration Recommendations
### For Production Use
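1. **Implement Retry Policies (Polly):** The repositories query PostgreSQL rather than HTTP endpoints, so the policy should handle `NpgsqlException` around database calls, ideally only transient ones:
```csharp
var retryPolicy = Policy
    .Handle<NpgsqlException>(ex => ex.IsTransient)
    .WaitAndRetryAsync(3, retryAttempt =>
        TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));
```
2. **Add Circuit Breaker:**
```csharp
// Open the circuit after 5 consecutive failures and keep it open for 30 seconds.
var circuitBreaker = Policy
    .Handle<NpgsqlException>()
    .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30));

// Retry on the outside, circuit breaker on the inside.
var resilience = Policy.WrapAsync(retryPolicy, circuitBreaker);
```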
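3. **Implement Health Checks:**
```csharp
// connectionString: read from configuration at startup.
// SpotifyRepositoryHealthCheck: a custom IHealthCheck that would need to be written.
builder.Services.AddHealthChecks()
    .AddNpgSql(connectionString)
    .AddCheck<SpotifyRepositoryHealthCheck>("spotify_repository");
```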
4. **Add Result Caching:**
```csharp
builder.Services.AddMemoryCache();
// Requires the Microsoft.Extensions.Caching.StackExchangeRedis package.
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379";
});
```
5. **Implement Result Deduplication:**
```csharp
// Combine results from multiple providers, remove duplicates by name similarity.
// DeduplicateArtists is a helper that would need to be written.
var deduplicated = DeduplicateArtists(combined, similarityThreshold: 0.9);
```
### For Integration with MiniMediaScanner
**Potential Enhancements:**
1. **Sync Triggering:** API could request sync for missing data
2. **Job Status Tracking:** Use JobRepository to track sync progress
3. **Webhook Notifications:** MiniMediaScanner notifies API of sync completion
4. **Shared Message Queue:** RabbitMQ/Kafka for async communication
**Current Limitation:** No communication channel between projects.
## Integration Evaluation
**Strengths:**
- Clean separation from provider APIs
- Repository pattern isolates provider logic
- Parallel query execution for multi-provider search
- Helper utilities for provider-specific quirks
**Weaknesses:**
- Unused dependencies (Polly, Quartz, FuzzySharp, SpotifyAPI.Web.Auth)
- No retry logic despite Polly dependency
- No caching layer
- Error handling swallows failures
- No communication with MiniMediaScanner
- Job tracking infrastructure unused
**Recommendations:**
- Remove unused dependencies
- Implement retry policies
- Add caching layer (Redis)
- Return a distinct error status so clients can tell failures from empty results
- Consider message queue for MiniMediaScanner integration