metadata-agregator/docs/research/melodee/analysis/DATA.md
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00

# Melodee: Data Architecture Analysis
## Data Strategy Overview
Melodee employs a dual-database architecture: PostgreSQL 17 for transactional data and SQLite for the MusicBrainz metadata cache. This separation optimizes for different access patterns: PostgreSQL handles concurrent writes and complex queries for user data, while SQLite provides fast read-only access to reference metadata.
The data model spans 40+ entities across six primary domains:
1. **Library**: Albums, Artists, Tracks, Genres
2. **Users**: Accounts, Settings, Sessions
3. **Playlists**: Manual and smart playlists
4. **Scrobbles**: Listening history
5. **Metadata**: Provider mappings, external IDs
6. **System**: Jobs, Logs, Health checks
With 100+ migrations, the schema has evolved significantly, suggesting iterative development and feature additions over time.
## Database Architecture
### PostgreSQL 17
**Selection Rationale**:
- **JSONB support**: Flexible storage for metadata from multiple providers
- **Full-text search**: Efficient library searching without external search engines
- **Concurrent writes**: Multiple users can scrobble, create playlists, and update settings simultaneously
- **Mature ecosystem**: EF Core 10 provides robust ORM support
- **Advanced features**: CTEs, window functions, and materialized views for analytics
**Configuration**:
```
Host: postgres (Docker Compose service)
Port: 5432
Database: melodee
User: melodee
Connection Pool: up to 20 connections (Npgsql Maximum Pool Size, set in the connection string)
```
**Performance Tuning**:
```ini
# postgresql.conf
# Increase shared buffers for caching
shared_buffers = 256MB
# Lower the random-access cost estimate for SSD storage
random_page_cost = 1.1
# Keep up to 2048 bytes of query text per session in pg_stat_activity
track_activity_query_size = 2048
```
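The same settings can also be applied from a SQL session rather than by editing the configuration file. This is a minimal sketch, assuming a superuser connection; it is not taken from Melodee's deployment scripts:

```sql
-- ALTER SYSTEM writes to postgresql.auto.conf and requires superuser privileges.
ALTER SYSTEM SET shared_buffers = '256MB';          -- takes effect after a server restart
ALTER SYSTEM SET track_activity_query_size = 2048;  -- takes effect after a server restart
ALTER SYSTEM SET random_page_cost = 1.1;            -- picked up on configuration reload

-- Reload the configuration, then confirm the active value.
SELECT pg_reload_conf();
SHOW random_page_cost;
```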
### SQLite MusicBrainz Cache
**Selection Rationale**:
- **Embedded**: No separate database server required
- **Fast reads**: In-process, read-only lookups avoid network round trips
- **Portable**: Database file can be copied between systems
- **Offline**: No network dependency for metadata lookups
**File Location**: `/data/mb-cache.db`
**Size**: Approximately 2-5 GB depending on MusicBrainz dump version
**Update Frequency**: Monthly (first day of month via Quartz.NET job)
## Entity Relationship Model
### Library Domain
#### Artist Entity
```csharp
public class Artist
{
public int Id { get; set; }
public string Name { get; set; }
public string SortName { get; set; }
public string Bio { get; set; }
public string Country { get; set; }
public DateTime? FormedDate { get; set; }
public DateTime? DisbandedDate { get; set; }
public string ImagePath { get; set; }
public Dictionary<string, string> ExternalIds { get; set; } // JSONB
public List<string> Genres { get; set; } // JSONB array
public DateTime CreatedAt { get; set; }
public DateTime UpdatedAt { get; set; }
// Navigation properties
public List<Album> Albums { get; set; }
public List<Track> Tracks { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE artists (
id SERIAL PRIMARY KEY,
name VARCHAR(500) NOT NULL,
sort_name VARCHAR(500),
bio TEXT,
country VARCHAR(2),
formed_date DATE,
disbanded_date DATE,
image_path VARCHAR(1000),
external_ids JSONB,
genres JSONB,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_artists_name ON artists(name);
CREATE INDEX idx_artists_sort_name ON artists(sort_name);
CREATE INDEX idx_artists_external_ids ON artists USING GIN(external_ids);
CREATE INDEX idx_artists_genres ON artists USING GIN(genres);
```
**JSONB Example**:
```json
{
"external_ids": {
"musicbrainz": "a74b1b7f-71a5-4011-9441-d0b5e4122711",
"spotify": "4Z8W4fKeB5YxbusRsdQVPb",
"lastfm": "Radiohead",
"discogs": "3840"
},
"genres": ["Alternative Rock", "Art Rock", "Electronic"]
}
```
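**GIN Indexes**: Generalized Inverted Indexes accelerate JSONB containment (`@>`) and key-existence (`?`) queries. Equality on an extracted value (`external_ids->>'spotify' = ...`) cannot use them, so lookups are written as containment:
```sql
-- Find artists with Spotify ID
SELECT * FROM artists WHERE external_ids @> '{"spotify": "4Z8W4fKeB5YxbusRsdQVPb"}';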
-- Find artists in genre
SELECT * FROM artists WHERE genres @> '["Alternative Rock"]';
```
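Whether a JSONB predicate can use a GIN index depends on the operator. The sketch below assumes the `artists` table above; `idx_artists_spotify_id` is a hypothetical index name that is not part of the documented schema:

```sql
-- Containment (@>) and key existence (?) are supported by the default
-- jsonb GIN operator class; EXPLAIN shows whether the planner picks the index.
EXPLAIN
SELECT id, name FROM artists
WHERE external_ids @> '{"spotify": "4Z8W4fKeB5YxbusRsdQVPb"}';

-- Artists that have any Discogs ID at all.
SELECT id, name FROM artists WHERE external_ids ? 'discogs';

-- Equality on an extracted value (->>) is not served by the GIN index.
-- A B-tree expression index (hypothetical name) covers that query shape.
CREATE INDEX idx_artists_spotify_id ON artists ((external_ids->>'spotify'));
SELECT id, name FROM artists
WHERE external_ids->>'spotify' = '4Z8W4fKeB5YxbusRsdQVPb';
```

On small tables the planner may still choose a sequential scan, so the `EXPLAIN` output is most informative against a realistically sized library.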
#### Album Entity
```csharp
public class Album
{
public int Id { get; set; }
public string Title { get; set; }
public string SortTitle { get; set; }
public int ArtistId { get; set; }
public Artist Artist { get; set; }
public DateTime? ReleaseDate { get; set; }
public string ReleaseType { get; set; } // Album, EP, Single, Compilation
public string Country { get; set; }
public string Label { get; set; }
public string Barcode { get; set; }
public string CoverArtPath { get; set; }
public Dictionary<string, string> ExternalIds { get; set; }
public List<string> Genres { get; set; }
public int TrackCount { get; set; }
public int Duration { get; set; } // Total duration in seconds
public DateTime CreatedAt { get; set; }
public DateTime UpdatedAt { get; set; }
// Navigation properties
public List<Track> Tracks { get; set; }
public List<PlaylistAlbum> PlaylistAlbums { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE albums (
id SERIAL PRIMARY KEY,
title VARCHAR(500) NOT NULL,
sort_title VARCHAR(500),
artist_id INTEGER NOT NULL REFERENCES artists(id) ON DELETE CASCADE,
release_date DATE,
release_type VARCHAR(50),
country VARCHAR(2),
label VARCHAR(200),
barcode VARCHAR(50),
cover_art_path VARCHAR(1000),
external_ids JSONB,
genres JSONB,
track_count INTEGER DEFAULT 0,
duration INTEGER DEFAULT 0,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_albums_title ON albums(title);
CREATE INDEX idx_albums_artist_id ON albums(artist_id);
CREATE INDEX idx_albums_release_date ON albums(release_date);
CREATE INDEX idx_albums_external_ids ON albums USING GIN(external_ids);
CREATE INDEX idx_albums_genres ON albums USING GIN(genres);
```
**Computed Columns**:
Track count and duration are denormalized for performance. They're updated via triggers:
```sql
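CREATE OR REPLACE FUNCTION update_album_stats()
RETURNS TRIGGER AS $$
DECLARE
    affected_album_ids INTEGER[];
BEGIN
    -- NEW is unset on DELETE and OLD is unset on INSERT, so pick by operation.
    -- An UPDATE may move a track between albums, so refresh both.
    IF TG_OP = 'INSERT' THEN
        affected_album_ids := ARRAY[NEW.album_id];
    ELSIF TG_OP = 'DELETE' THEN
        affected_album_ids := ARRAY[OLD.album_id];
    ELSE
        affected_album_ids := ARRAY[OLD.album_id, NEW.album_id];
    END IF;
    UPDATE albums a
    SET track_count = (SELECT COUNT(*) FROM tracks t WHERE t.album_id = a.id),
        duration = (SELECT COALESCE(SUM(t.duration), 0) FROM tracks t WHERE t.album_id = a.id)
    WHERE a.id = ANY(affected_album_ids);
    RETURN NULL; -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER track_stats_trigger
AFTER INSERT OR UPDATE OR DELETE ON tracks
FOR EACH ROW EXECUTE FUNCTION update_album_stats();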
```
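Because the stored values depend on the trigger firing for every change, a periodic consistency check can catch drift (for example after a bulk load with triggers disabled). This is a sketch of such a check, not something the Melodee source is known to include:

```sql
-- List albums whose stored counters disagree with the actual track rows.
SELECT a.id,
       a.track_count,
       COUNT(t.id)                  AS actual_track_count,
       a.duration,
       COALESCE(SUM(t.duration), 0) AS actual_duration
FROM albums a
LEFT JOIN tracks t ON t.album_id = a.id
GROUP BY a.id
HAVING a.track_count IS DISTINCT FROM COUNT(t.id)
    OR a.duration IS DISTINCT FROM COALESCE(SUM(t.duration), 0);
```

An empty result means the denormalized columns match the `tracks` table.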
#### Track Entity
```csharp
public class Track
{
public int Id { get; set; }
public string Title { get; set; }
public int AlbumId { get; set; }
public Album Album { get; set; }
public int ArtistId { get; set; }
public Artist Artist { get; set; }
public int Position { get; set; }
public int DiscNumber { get; set; }
public int Duration { get; set; } // Duration in seconds
public string FilePath { get; set; }
public long FileSize { get; set; }
public string FileFormat { get; set; } // FLAC, MP3, OGG, etc.
public int Bitrate { get; set; }
public int SampleRate { get; set; }
public int Channels { get; set; }
public string Codec { get; set; }
public Dictionary<string, string> ExternalIds { get; set; }
public List<string> Genres { get; set; }
public DateTime CreatedAt { get; set; }
public DateTime UpdatedAt { get; set; }
// Navigation properties
public List<PlaylistTrack> PlaylistTracks { get; set; }
public List<Scrobble> Scrobbles { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE tracks (
id SERIAL PRIMARY KEY,
title VARCHAR(500) NOT NULL,
album_id INTEGER NOT NULL REFERENCES albums(id) ON DELETE CASCADE,
artist_id INTEGER NOT NULL REFERENCES artists(id) ON DELETE CASCADE,
position INTEGER NOT NULL,
disc_number INTEGER DEFAULT 1,
duration INTEGER NOT NULL,
file_path VARCHAR(2000) NOT NULL UNIQUE,
file_size BIGINT NOT NULL,
file_format VARCHAR(20) NOT NULL,
bitrate INTEGER,
sample_rate INTEGER,
channels INTEGER,
codec VARCHAR(50),
external_ids JSONB,
genres JSONB,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_tracks_title ON tracks(title);
CREATE INDEX idx_tracks_album_id ON tracks(album_id);
CREATE INDEX idx_tracks_artist_id ON tracks(artist_id);
CREATE INDEX idx_tracks_file_path ON tracks(file_path);
CREATE INDEX idx_tracks_external_ids ON tracks USING GIN(external_ids);
```
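**File Path Uniqueness**: The `UNIQUE` constraint on `file_path` ensures the same file isn't imported twice, and the library scanner checks it before creating new track records. Because PostgreSQL backs every `UNIQUE` constraint with its own index, the separate `idx_tracks_file_path` index is redundant.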
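If more than one scanner worker can run at a time, a check followed by an insert may race. One way to make the import idempotent is to let the constraint arbitrate. The values below are made up for illustration, and the sketch assumes album 1 and artist 1 already exist:

```sql
-- Insert the track unless a row with the same file_path already exists.
INSERT INTO tracks (title, album_id, artist_id, position, duration,
                    file_path, file_size, file_format)
VALUES ('Airbag', 1, 1, 1, 284,
        '/music/Radiohead/OK Computer/01 - Airbag.flac', 31457280, 'FLAC')
ON CONFLICT (file_path) DO NOTHING
RETURNING id;
```

When the path is already present, the statement returns no row, which the caller can treat as "already imported".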
#### Genre Entity
```csharp
public class Genre
{
public int Id { get; set; }
public string Name { get; set; }
public string ParentGenre { get; set; }
public int AlbumCount { get; set; }
public int TrackCount { get; set; }
public DateTime CreatedAt { get; set; }
public DateTime UpdatedAt { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE genres (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE,
parent_genre VARCHAR(100),
album_count INTEGER DEFAULT 0,
track_count INTEGER DEFAULT 0,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_genres_name ON genres(name);
```
**Genre Hierarchy**:
```
Rock
├── Alternative Rock
│ ├── Indie Rock
│ └── Post-Rock
├── Progressive Rock
└── Hard Rock
```
The `parent_genre` field enables hierarchical genre browsing. Queries can find all "Rock" subgenres recursively.
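A recursive CTE is one way to express that traversal. This is a minimal sketch, assuming `parent_genre` stores the parent's `name` (as the column type suggests) and using a depth cap as a guard against accidental cycles:

```sql
WITH RECURSIVE subgenres AS (
    -- Anchor: the root genre itself
    SELECT id, name, parent_genre, 0 AS depth
    FROM genres
    WHERE name = 'Rock'
    UNION ALL
    -- Step: genres whose parent is already in the result set
    SELECT g.id, g.name, g.parent_genre, s.depth + 1
    FROM genres g
    JOIN subgenres s ON g.parent_genre = s.name
    WHERE s.depth < 10
)
SELECT name, depth
FROM subgenres
ORDER BY depth, name;
```

Since `parent_genre` is a plain `VARCHAR` rather than a foreign key, nothing in the schema prevents a parent name that matches no row; such genres simply never appear under any root.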
### User Domain
#### User Entity
```csharp
public class User
{
public int Id { get; set; }
public string Email { get; set; }
public string PasswordHash { get; set; }
public string Role { get; set; } // "admin" or "user"
public bool IsActive { get; set; }
public string GoogleId { get; set; } // For OAuth users
public string ProfilePictureUrl { get; set; }
public DateTime CreatedAt { get; set; }
public DateTime? LastLoginAt { get; set; } // NULL until the first login
// Navigation properties
public UserSettings Settings { get; set; }
public List<Playlist> Playlists { get; set; }
public List<Scrobble> Scrobbles { get; set; }
public List<UserSession> Sessions { get; set; }
public List<Favorite> Favorites { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
password_hash VARCHAR(255),
role VARCHAR(20) NOT NULL DEFAULT 'user',
is_active BOOLEAN NOT NULL DEFAULT TRUE,
google_id VARCHAR(255) UNIQUE,
profile_picture_url VARCHAR(1000),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
last_login_at TIMESTAMP
);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_google_id ON users(google_id);
```
**Password Hashing**: Uses BCrypt with work factor 12:
```csharp
public string HashPassword(string password)
{
return BCrypt.Net.BCrypt.HashPassword(password, workFactor: 12);
}
public bool VerifyPassword(string password, string hash)
{
return BCrypt.Net.BCrypt.Verify(password, hash);
}
```
#### UserSettings Entity
```csharp
public class UserSettings
{
public int Id { get; set; }
public int UserId { get; set; }
public User User { get; set; }
public string Language { get; set; } // en, es, fr, de, it, pt, ru, ja, zh, ko
public string Theme { get; set; } // light, dark, auto
public int TranscodeBitrate { get; set; } // 128, 192, 256, 320
public string TranscodeFormat { get; set; } // mp3, ogg, opus, aac
public bool ScrobbleEnabled { get; set; }
public string LastFmUsername { get; set; }
public string LastFmSessionKey { get; set; } // Encrypted
public bool PartyModeEnabled { get; set; }
public int VolumeLevel { get; set; } // 0-100
public bool RepeatEnabled { get; set; }
public bool ShuffleEnabled { get; set; }
public DateTime UpdatedAt { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE user_settings (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL UNIQUE REFERENCES users(id) ON DELETE CASCADE,
language VARCHAR(10) DEFAULT 'en',
theme VARCHAR(20) DEFAULT 'auto',
transcode_bitrate INTEGER DEFAULT 320,
transcode_format VARCHAR(20) DEFAULT 'mp3',
scrobble_enabled BOOLEAN DEFAULT FALSE,
lastfm_username VARCHAR(100),
lastfm_session_key VARCHAR(500), -- Encrypted
party_mode_enabled BOOLEAN DEFAULT FALSE,
volume_level INTEGER DEFAULT 80,
repeat_enabled BOOLEAN DEFAULT FALSE,
shuffle_enabled BOOLEAN DEFAULT FALSE,
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
```
**Encryption**: Last.fm session keys are encrypted using ASP.NET Core Data Protection:
```csharp
var protector = _dataProtectionProvider.CreateProtector("UserSecrets");
var encrypted = protector.Protect(sessionKey);
```
#### UserSession Entity
```csharp
public class UserSession
{
public int Id { get; set; }
public int UserId { get; set; }
public User User { get; set; }
public string Token { get; set; }
public string IpAddress { get; set; }
public string UserAgent { get; set; }
public DateTime CreatedAt { get; set; }
public DateTime ExpiresAt { get; set; }
public DateTime? LastActivityAt { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE user_sessions (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
token VARCHAR(255) NOT NULL UNIQUE,
ip_address VARCHAR(45),
user_agent VARCHAR(500),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
expires_at TIMESTAMP NOT NULL,
last_activity_at TIMESTAMP
);
CREATE INDEX idx_user_sessions_token ON user_sessions(token);
CREATE INDEX idx_user_sessions_user_id ON user_sessions(user_id);
CREATE INDEX idx_user_sessions_expires_at ON user_sessions(expires_at);
```
**Session Cleanup**: Expired sessions are deleted by a Quartz.NET job:
```sql
DELETE FROM user_sessions WHERE expires_at < NOW();
```
### Playlist Domain
#### Playlist Entity
```csharp
public class Playlist
{
public int Id { get; set; }
public string Name { get; set; }
public int UserId { get; set; }
public User User { get; set; }
public bool IsPublic { get; set; }
public bool IsSmart { get; set; }
public string SmartQuery { get; set; } // MQL query
public string Description { get; set; }
public int TrackCount { get; set; }
public int Duration { get; set; }
public DateTime CreatedAt { get; set; }
public DateTime UpdatedAt { get; set; }
// Navigation properties
public List<PlaylistTrack> PlaylistTracks { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE playlists (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
is_public BOOLEAN DEFAULT FALSE,
is_smart BOOLEAN DEFAULT FALSE,
smart_query TEXT,
description TEXT,
track_count INTEGER DEFAULT 0,
duration INTEGER DEFAULT 0,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_playlists_user_id ON playlists(user_id);
CREATE INDEX idx_playlists_is_public ON playlists(is_public);
```
#### PlaylistTrack Entity
```csharp
public class PlaylistTrack
{
public int PlaylistId { get; set; }
public Playlist Playlist { get; set; }
public int TrackId { get; set; }
public Track Track { get; set; }
public int Position { get; set; }
public DateTime AddedAt { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE playlist_tracks (
playlist_id INTEGER NOT NULL REFERENCES playlists(id) ON DELETE CASCADE,
track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,
position INTEGER NOT NULL,
added_at TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (playlist_id, track_id)
);
CREATE INDEX idx_playlist_tracks_playlist_id ON playlist_tracks(playlist_id);
CREATE INDEX idx_playlist_tracks_position ON playlist_tracks(playlist_id, position);
```
**Position Management**: When tracks are added or removed, positions are recalculated:
```sql
-- Add track at position 5
UPDATE playlist_tracks
SET position = position + 1
WHERE playlist_id = 1 AND position >= 5;
INSERT INTO playlist_tracks (playlist_id, track_id, position, added_at)
VALUES (1, 420, 5, NOW());
```
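The reverse case, removing a track and closing the gap, can be sketched as a single statement. The IDs are the same illustrative values used above:

```sql
-- Remove track 420 from playlist 1 and shift later tracks up by one.
WITH removed AS (
    DELETE FROM playlist_tracks
    WHERE playlist_id = 1 AND track_id = 420
    RETURNING playlist_id, position
)
UPDATE playlist_tracks pt
SET position = pt.position - 1
FROM removed r
WHERE pt.playlist_id = r.playlist_id
  AND pt.position > r.position;
```

The two-statement insert shown earlier should run inside one transaction (`BEGIN` ... `COMMIT`) so that readers never observe two tracks at the same position.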
### Scrobble Domain
#### Scrobble Entity
```csharp
public class Scrobble
{
public int Id { get; set; }
public int UserId { get; set; }
public User User { get; set; }
public int TrackId { get; set; }
public Track Track { get; set; }
public DateTime PlayedAt { get; set; }
public bool SubmittedToLastFm { get; set; }
public DateTime? LastFmSubmittedAt { get; set; }
public DateTime CreatedAt { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE scrobbles (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,
played_at TIMESTAMP NOT NULL,
submitted_to_lastfm BOOLEAN DEFAULT FALSE,
lastfm_submitted_at TIMESTAMP,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_scrobbles_user_id ON scrobbles(user_id);
CREATE INDEX idx_scrobbles_track_id ON scrobbles(track_id);
CREATE INDEX idx_scrobbles_played_at ON scrobbles(played_at);
CREATE INDEX idx_scrobbles_submitted_to_lastfm ON scrobbles(submitted_to_lastfm);
```
**Batch Submission**: Unsubmitted scrobbles are batched and sent to Last.fm:
```sql
SELECT * FROM scrobbles
WHERE submitted_to_lastfm = FALSE
ORDER BY played_at
LIMIT 50;
```
After successful submission:
```sql
UPDATE scrobbles
SET submitted_to_lastfm = TRUE,
lastfm_submitted_at = NOW()
WHERE id IN (...);
```
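An index on a boolean column alone is rarely selective. A partial index that covers only pending rows is a common alternative, and row locking keeps two workers from claiming the same batch. Both the index name and the worker model here are assumptions for illustration:

```sql
-- Hypothetical partial index: only unsubmitted rows are indexed, so it stays small.
CREATE INDEX idx_scrobbles_pending
    ON scrobbles (played_at)
    WHERE submitted_to_lastfm = FALSE;

-- Claim a batch inside a transaction; SKIP LOCKED lets a second worker
-- pick different rows instead of waiting on these.
BEGIN;
SELECT id, user_id, track_id, played_at
FROM scrobbles
WHERE submitted_to_lastfm = FALSE
ORDER BY played_at
LIMIT 50
FOR UPDATE SKIP LOCKED;
-- Submit the claimed rows to Last.fm here, then mark them as submitted.
COMMIT;
```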
### Metadata Domain
#### MetadataProvider Entity
```csharp
public class MetadataProvider
{
public int Id { get; set; }
public string Name { get; set; } // MusicBrainz, Spotify, Last.fm, etc.
public int Priority { get; set; }
public bool IsEnabled { get; set; }
public string ApiKey { get; set; } // Encrypted
public string ApiSecret { get; set; } // Encrypted
public DateTime? LastSyncAt { get; set; }
public string LastSyncStatus { get; set; }
public DateTime CreatedAt { get; set; }
public DateTime UpdatedAt { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE metadata_providers (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE,
priority INTEGER NOT NULL,
is_enabled BOOLEAN DEFAULT TRUE,
api_key VARCHAR(500),
api_secret VARCHAR(500),
last_sync_at TIMESTAMP,
last_sync_status VARCHAR(50),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
```
#### MetadataCache Entity
```csharp
public class MetadataCache
{
public int Id { get; set; }
public string Provider { get; set; }
public string EntityType { get; set; } // Artist, Album, Track
public string EntityId { get; set; }
public string CacheKey { get; set; }
public string CacheValue { get; set; } // JSON
public DateTime ExpiresAt { get; set; }
public DateTime CreatedAt { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE metadata_cache (
id SERIAL PRIMARY KEY,
provider VARCHAR(100) NOT NULL,
entity_type VARCHAR(50) NOT NULL,
entity_id VARCHAR(255) NOT NULL,
cache_key VARCHAR(500) NOT NULL,
cache_value JSONB NOT NULL,
expires_at TIMESTAMP NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
UNIQUE (provider, cache_key)
);
CREATE INDEX idx_metadata_cache_expires_at ON metadata_cache(expires_at);
CREATE INDEX idx_metadata_cache_provider_key ON metadata_cache(provider, cache_key);
```
**Cache Invalidation**:
```sql
DELETE FROM metadata_cache WHERE expires_at < NOW();
```
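Writes can lean on the `UNIQUE (provider, cache_key)` constraint to refresh an entry in place. This sketch uses illustrative values; the key format and the seven-day lifetime are assumptions, not Melodee's documented settings:

```sql
-- Insert a cache entry, or refresh it if the key already exists.
INSERT INTO metadata_cache
    (provider, entity_type, entity_id, cache_key, cache_value, expires_at)
VALUES
    ('musicbrainz', 'Artist', 'a74b1b7f-71a5-4011-9441-d0b5e4122711',
     'artist:a74b1b7f-71a5-4011-9441-d0b5e4122711',
     '{"name": "Radiohead"}'::jsonb,
     NOW() + INTERVAL '7 days')
ON CONFLICT (provider, cache_key) DO UPDATE
SET cache_value = EXCLUDED.cache_value,
    expires_at  = EXCLUDED.expires_at;

-- Read path: ignore entries that have expired but are not yet deleted.
SELECT cache_value
FROM metadata_cache
WHERE provider = 'musicbrainz'
  AND cache_key = 'artist:a74b1b7f-71a5-4011-9441-d0b5e4122711'
  AND expires_at > NOW();
```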
### System Domain
#### Job Entity
```csharp
public class Job
{
public int Id { get; set; }
public string Name { get; set; }
public string Status { get; set; } // Queued, Running, Completed, Failed
public int Progress { get; set; } // 0-100
public string Message { get; set; }
public DateTime? StartedAt { get; set; }
public DateTime? CompletedAt { get; set; }
public string ErrorMessage { get; set; }
public DateTime CreatedAt { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE jobs (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
status VARCHAR(50) NOT NULL,
progress INTEGER DEFAULT 0,
message TEXT,
started_at TIMESTAMP,
completed_at TIMESTAMP,
error_message TEXT,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_jobs_status ON jobs(status);
CREATE INDEX idx_jobs_created_at ON jobs(created_at);
```
#### HealthCheck Entity
```csharp
public class HealthCheck
{
public int Id { get; set; }
public string Component { get; set; } // Database, MusicBrainz, Spotify, etc.
public string Status { get; set; } // Healthy, Degraded, Unhealthy
public string Message { get; set; }
public Dictionary<string, object> Data { get; set; }
public DateTime CheckedAt { get; set; }
}
```
**Database Schema**:
```sql
CREATE TABLE health_checks (
id SERIAL PRIMARY KEY,
component VARCHAR(100) NOT NULL,
status VARCHAR(50) NOT NULL,
message TEXT,
data JSONB,
checked_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_health_checks_component ON health_checks(component);
CREATE INDEX idx_health_checks_checked_at ON health_checks(checked_at);
```
## MusicBrainz Cache Schema
### Release Table
```sql
CREATE TABLE mb_release (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
artist_credit TEXT NOT NULL,
release_date TEXT,
country TEXT,
barcode TEXT,
release_group_id TEXT,
updated_at INTEGER NOT NULL
);
CREATE INDEX idx_mb_release_title ON mb_release(title);
CREATE INDEX idx_mb_release_artist_credit ON mb_release(artist_credit);
CREATE INDEX idx_mb_release_barcode ON mb_release(barcode);
```
### Recording Table
```sql
CREATE TABLE mb_recording (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
artist_credit TEXT NOT NULL,
length INTEGER,
updated_at INTEGER NOT NULL
);
CREATE INDEX idx_mb_recording_title ON mb_recording(title);
CREATE INDEX idx_mb_recording_artist_credit ON mb_recording(artist_credit);
```
### Artist Table
```sql
CREATE TABLE mb_artist (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
sort_name TEXT,
type TEXT,
country TEXT,
begin_date TEXT,
end_date TEXT,
updated_at INTEGER NOT NULL
);
CREATE INDEX idx_mb_artist_name ON mb_artist(name);
CREATE INDEX idx_mb_artist_sort_name ON mb_artist(sort_name);
```
### Release Group Table
```sql
CREATE TABLE mb_release_group (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
artist_credit TEXT NOT NULL,
type TEXT,
first_release_date TEXT,
updated_at INTEGER NOT NULL
);
CREATE INDEX idx_mb_release_group_title ON mb_release_group(title);
CREATE INDEX idx_mb_release_group_artist_credit ON mb_release_group(artist_credit);
```
## Data Migration Strategy
### Migration Management
EF Core migrations track schema changes:
```csharp
public partial class InitialCreate : Migration
{
protected override void Up(MigrationBuilder migrationBuilder)
{
migrationBuilder.CreateTable(
name: "artists",
columns: table => new
{
id = table.Column<int>(nullable: false)
.Annotation("Npgsql:ValueGenerationStrategy", NpgsqlValueGenerationStrategy.IdentityByDefaultColumn),
name = table.Column<string>(maxLength: 500, nullable: false),
// ...
});
}
protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.DropTable(name: "artists");
}
}
```
**Migration Naming Convention**:
```
20250101000000_InitialCreate.cs
20250115120000_AddUserSettings.cs
20250201093000_AddPlaylistSupport.cs
20250315164500_AddScrobbling.cs
```
Timestamps ensure migrations apply in chronological order.
### Automatic Migration Application
Docker entrypoint applies migrations on startup:
```bash
#!/bin/bash
set -e
echo "Applying database migrations..."
dotnet ef database update --project Melodee.Data --no-build
echo "Starting Melodee..."
exec dotnet Melodee.Web.dll
```
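This ensures the database schema matches the application version without manual intervention. Note that `dotnet ef` is an SDK tool that needs the project sources, so this entrypoint only works in an image that ships the .NET SDK; runtime-only images typically apply migrations at startup through `Database.Migrate()` or a migration bundle instead.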
### Migration Rollback
Rollback to a specific migration:
```bash
dotnet ef database update 20250201093000_AddPlaylistSupport
```
This reverts all migrations applied after the specified migration.
**Caution**: Rollback can cause data loss if the reverted migrations drop columns or tables. Always back up the database before rolling back.
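EF Core records applied migrations in a history table, which can be inspected before and after a rollback. This assumes the default table and column names, which a project can override:

```sql
-- Applied migrations in order; the last row is the current schema version.
SELECT "MigrationId", "ProductVersion"
FROM "__EFMigrationsHistory"
ORDER BY "MigrationId";
```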
## Data Access Patterns
### Repository Pattern
```csharp
public interface IAlbumRepository
{
Task<Album> GetByIdAsync(int id);
Task<List<Album>> GetAllAsync();
Task<List<Album>> GetByArtistAsync(int artistId);
Task<List<Album>> SearchAsync(string query);
Task<Album> CreateAsync(Album album);
Task UpdateAsync(Album album);
Task DeleteAsync(int id);
}
public class AlbumRepository : IAlbumRepository
{
private readonly MelodeeDbContext _context;
public AlbumRepository(MelodeeDbContext context)
{
_context = context;
}
public async Task<Album> GetByIdAsync(int id)
{
return await _context.Albums
.Include(a => a.Artist)
.Include(a => a.Tracks)
.FirstOrDefaultAsync(a => a.Id == id);
}
public async Task<List<Album>> SearchAsync(string query)
{
return await _context.Albums
.Include(a => a.Artist)
.Where(a => EF.Functions.ILike(a.Title, $"%{query}%") ||
EF.Functions.ILike(a.Artist.Name, $"%{query}%"))
.OrderBy(a => a.Title)
.ToListAsync();
}
}
```
### Unit of Work Pattern
```csharp
public interface IUnitOfWork : IDisposable
{
IAlbumRepository Albums { get; }
IArtistRepository Artists { get; }
ITrackRepository Tracks { get; }
IPlaylistRepository Playlists { get; }
IScrobbleRepository Scrobbles { get; }
Task<int> SaveChangesAsync();
}
public class UnitOfWork : IUnitOfWork
{
private readonly MelodeeDbContext _context;
public UnitOfWork(MelodeeDbContext context)
{
_context = context;
Albums = new AlbumRepository(context);
Artists = new ArtistRepository(context);
// ...
}
public IAlbumRepository Albums { get; }
public IArtistRepository Artists { get; }
// ...
public async Task<int> SaveChangesAsync()
{
return await _context.SaveChangesAsync();
}
public void Dispose()
{
_context.Dispose();
}
}
```
**Usage**:
```csharp
public class LibraryService
{
private readonly IUnitOfWork _unitOfWork;
public LibraryService(IUnitOfWork unitOfWork)
{
    _unitOfWork = unitOfWork;
}
public async Task<Album> CreateAlbumAsync(CreateAlbumRequest request)
{
var artist = await _unitOfWork.Artists.GetByIdAsync(request.ArtistId);
if (artist == null)
throw new NotFoundException("Artist not found");
var album = new Album
{
Title = request.Title,
ArtistId = request.ArtistId,
ReleaseDate = request.ReleaseDate
};
await _unitOfWork.Albums.CreateAsync(album);
await _unitOfWork.SaveChangesAsync();
return album;
}
}
```
### Query Optimization
**N+1 Query Problem**:
```csharp
// BAD: N+1 queries
var albums = await _context.Albums.ToListAsync();
foreach (var album in albums)
{
// Each iteration triggers a separate query
var artist = await _context.Artists.FindAsync(album.ArtistId);
}
// GOOD: Single query with join
var albums = await _context.Albums
.Include(a => a.Artist)
.ToListAsync();
```
**Projection for Performance**:
```csharp
// BAD: Loads entire entity
var albums = await _context.Albums
.Include(a => a.Artist)
.Include(a => a.Tracks)
.ToListAsync();
// GOOD: Projects only needed fields
var albums = await _context.Albums
.Select(a => new AlbumDto
{
Id = a.Id,
Title = a.Title,
ArtistName = a.Artist.Name,
TrackCount = a.Tracks.Count
})
.ToListAsync();
```
**Pagination**:
```csharp
public async Task<PagedResult<Album>> GetAlbumsAsync(int page, int pageSize)
{
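    // Order deterministically so Skip/Take returns stable pages
    var query = _context.Albums
        .Include(a => a.Artist)
        .OrderBy(a => a.Title)
        .ThenBy(a => a.Id);
    var totalCount = await query.CountAsync();
    var items = await query
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .ToListAsync();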
return new PagedResult<Album>
{
Items = items,
TotalCount = totalCount,
Page = page,
PageSize = pageSize
};
}
```
### Full-Text Search
PostgreSQL full-text search for library queries:
```sql
-- Add tsvector column
ALTER TABLE albums ADD COLUMN search_vector tsvector;
-- Populate search vector
UPDATE albums
SET search_vector = to_tsvector('english', title || ' ' || COALESCE(label, ''));
-- Create GIN index
CREATE INDEX idx_albums_search_vector ON albums USING GIN(search_vector);
-- Search query (the vector covers title and label only, not the artist name)
SELECT * FROM albums
WHERE search_vector @@ to_tsquery('english', 'ok & computer')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'ok & computer')) DESC;
```
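The one-off `UPDATE` above leaves `search_vector` stale once titles or labels change. On PostgreSQL 12 and later, a stored generated column is one way to keep it current. This sketch replaces the `ADD COLUMN` and `UPDATE` steps rather than running after them:

```sql
-- The vector is recomputed automatically whenever title or label changes.
ALTER TABLE albums
    ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        to_tsvector('english', title || ' ' || COALESCE(label, ''))
    ) STORED;

CREATE INDEX idx_albums_search_vector ON albums USING GIN (search_vector);

-- websearch_to_tsquery accepts free-form input, so no operators are needed.
SELECT id, title
FROM albums
WHERE search_vector @@ websearch_to_tsquery('english', 'ok computer')
ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', 'ok computer')) DESC;
```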
**EF Core Integration**:
```csharp
public async Task<List<Album>> FullTextSearchAsync(string query)
{
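    // websearch_to_tsquery parses free-form input; to_tsquery would raise a
    // syntax error on a plain phrase such as "ok computer". {0} is sent as a parameter.
    return await _context.Albums
        .FromSqlRaw(@"
SELECT * FROM albums
WHERE search_vector @@ websearch_to_tsquery('english', {0})
ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', {0})) DESC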
", query)
.ToListAsync();
}
```
## Data Integrity
### Referential Integrity
Foreign key constraints ensure data consistency:
```sql
-- Cascade delete: deleting an artist deletes all albums
ALTER TABLE albums
ADD CONSTRAINT fk_albums_artist
FOREIGN KEY (artist_id) REFERENCES artists(id) ON DELETE CASCADE;
-- Alternative: RESTRICT would block deleting a track that is still in a playlist
-- (the playlist_tracks schema shown earlier uses ON DELETE CASCADE instead)
ALTER TABLE playlist_tracks
ADD CONSTRAINT fk_playlist_tracks_track
FOREIGN KEY (track_id) REFERENCES tracks(id) ON DELETE RESTRICT;
```
### Check Constraints
```sql
-- Ensure valid rating range (assumes a rating column, which the albums schema above does not list)
ALTER TABLE albums
ADD CONSTRAINT chk_albums_rating CHECK (rating >= 0 AND rating <= 5);
-- Ensure positive duration
ALTER TABLE tracks
ADD CONSTRAINT chk_tracks_duration CHECK (duration > 0);
-- Ensure valid role
ALTER TABLE users
ADD CONSTRAINT chk_users_role CHECK (role IN ('admin', 'user'));
```
### Unique Constraints
```sql
-- Prevent duplicate artists
ALTER TABLE artists ADD CONSTRAINT uq_artists_name UNIQUE (name);
-- Prevent duplicate track positions on the same disc of an album
ALTER TABLE tracks ADD CONSTRAINT uq_tracks_album_position UNIQUE (album_id, disc_number, position);
-- Prevent duplicate playlists for same user
ALTER TABLE playlists ADD CONSTRAINT uq_playlists_user_name UNIQUE (user_id, name);
```
## Data Backup and Recovery
### PostgreSQL Backup
**Automated Daily Backups**:
```bash
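#!/bin/bash
# Fail the script if pg_dump fails, not only if gzip fails
set -euo pipefail
BACKUP_DIR="/backups/postgres"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/melodee_$TIMESTAMP.sql.gz"
mkdir -p "$BACKUP_DIR"
# Credentials are expected from PGPASSWORD or ~/.pgpass
pg_dump -h postgres -U melodee melodee | gzip > "$BACKUP_FILE"
# Retain last 30 days
find "$BACKUP_DIR" -name "melodee_*.sql.gz" -mtime +30 -delete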
```
**Restore from Backup**:
```bash
gunzip -c /backups/postgres/melodee_20250428_103000.sql.gz | psql -h postgres -U melodee melodee
```
### SQLite Cache Backup
```bash
# Copy SQLite database file
cp /data/mb-cache.db /backups/mb-cache_$(date +%Y%m%d).db
```
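SQLite backups are less critical because the cache can be rebuilt from MusicBrainz. A plain `cp` is only safe while nothing is writing to the file; during the monthly refresh, the `sqlite3` CLI's `.backup` command produces a consistent copy.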
## Performance Considerations
### Indexing Strategy
**Index Coverage**:
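- Primary keys: Unique B-tree indexes created automatically (PostgreSQL has no clustered indexes)
- Foreign keys: Explicit B-tree indexes for join performance (PostgreSQL does not create these automatically)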
- Search fields: Indexes on `name`, `title`, `email`
- JSONB fields: GIN indexes for JSON queries
- Full-text search: GIN indexes on tsvector columns
**Index Maintenance**:
```sql
-- Analyze tables for query planner
ANALYZE albums;
-- Reindex to rebuild fragmented indexes
REINDEX TABLE albums;
-- Vacuum to mark dead rows as reusable and refresh planner statistics
VACUUM ANALYZE albums;
```
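With this many indexes, it can be worth checking which ones the planner actually uses. The statistics view below is built into PostgreSQL; the `public` schema filter is an assumption about where the Melodee tables live:

```sql
-- Indexes ordered by how rarely they are scanned, largest first among ties.
SELECT relname      AS table_name,
       indexrelname AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
```

Indexes that stay at `idx_scan = 0` under a representative workload are candidates for removal, although indexes that back primary key or unique constraints must be kept regardless.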
### Connection Pooling
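Npgsql pools connections by default. The EF Core registration below configures command batching and timeouts; pool sizing is set in the connection string: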
```csharp
services.AddDbContext<MelodeeDbContext>(options =>
{
options.UseNpgsql(connectionString, npgsqlOptions =>
{
npgsqlOptions.MinBatchSize(1);
npgsqlOptions.MaxBatchSize(100);
npgsqlOptions.CommandTimeout(30);
});
});
```
**Connection String**:
```
Host=postgres;Database=melodee;Username=melodee;Password=melodee;Pooling=true;Minimum Pool Size=5;Maximum Pool Size=20;
```
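Pool pressure can be observed from the server side. This is a small sketch, assuming the database name `melodee` from the configuration above:

```sql
-- Count server connections to the melodee database by state.
SELECT state, COUNT(*) AS connections
FROM pg_stat_activity
WHERE datname = 'melodee'
GROUP BY state
ORDER BY connections DESC;
```

If the total regularly approaches the `Maximum Pool Size` of 20, requests will start waiting for a free connection.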
### Query Caching
**In-Memory Cache**:
```csharp
public class CachedAlbumRepository : IAlbumRepository
{
private readonly IAlbumRepository _inner;
private readonly IMemoryCache _cache;
public async Task<Album> GetByIdAsync(int id)
{
return await _cache.GetOrCreateAsync($"album:{id}", async entry =>
{
entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15);
return await _inner.GetByIdAsync(id);
});
}
}
```
**Redis Cache** (for distributed deployments):
```csharp
public class RedisAlbumRepository : IAlbumRepository
{
private readonly IAlbumRepository _inner;
private readonly IConnectionMultiplexer _redis;
public async Task<Album> GetByIdAsync(int id)
{
var db = _redis.GetDatabase();
var cached = await db.StringGetAsync($"album:{id}");
if (cached.HasValue)
return JsonSerializer.Deserialize<Album>(cached);
var album = await _inner.GetByIdAsync(id);
await db.StringSetAsync($"album:{id}", JsonSerializer.Serialize(album), TimeSpan.FromHours(1));
return album;
}
}
```
## Conclusion
Melodee's data architecture demonstrates thoughtful design for a music server application. The dual-database approach (PostgreSQL for transactional data, SQLite for reference data) optimizes for different access patterns. The 40+ entity model covers all aspects of music library management, user accounts, playlists, and scrobbling.
Key strengths:
- **JSONB for flexibility**: External IDs and metadata from multiple providers
- **Full-text search**: Fast library searching without external dependencies
- **Automatic migrations**: Docker entrypoint ensures schema consistency
- **Repository pattern**: Clean separation between data access and business logic
- **Comprehensive indexing**: Optimized for common query patterns
Key challenges:
- **100+ migrations**: Complex upgrade paths, potential for migration conflicts
- **Denormalized data**: Track counts and durations require trigger maintenance
- **Cache invalidation**: Multiple caching layers increase complexity
The architecture positions Melodee for scalability and maintainability while supporting rich metadata aggregation and user features.