Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


Melodee: Data Architecture Analysis

Data Strategy Overview

Melodee employs a dual-database architecture: PostgreSQL 17 for transactional data and SQLite for the MusicBrainz metadata cache. This separation optimizes for different access patterns: PostgreSQL handles concurrent writes and complex queries for user data, while SQLite provides fast read-only access to reference metadata.

The data model spans 40+ entities across six primary domains:

  1. Library: Albums, Artists, Tracks, Genres
  2. Users: Accounts, Settings, Sessions
  3. Playlists: Manual and smart playlists
  4. Scrobbles: Listening history
  5. Metadata: Provider mappings, external IDs
  6. System: Jobs, Logs, Health checks

With 100+ migrations, the schema has evolved significantly, suggesting iterative development and feature additions over time.

Database Architecture

PostgreSQL 17

Selection Rationale:

  • JSONB support: Flexible storage for metadata from multiple providers
  • Full-text search: Efficient library searching without external search engines
  • Concurrent writes: Multiple users can scrobble, create playlists, and update settings simultaneously
  • Mature ecosystem: EF Core 10 provides robust ORM support
  • Advanced features: CTEs, window functions, and materialized views for analytics

Configuration:

Host: postgres (Docker Compose service)
Port: 5432
Database: melodee
User: melodee
Connection Pool: 20 connections maximum (Maximum Pool Size in the Npgsql connection string; the Npgsql default is 100)

Performance Tuning:

# Increase shared buffers for caching
shared_buffers = 256MB

# Optimize for SSD storage
random_page_cost = 1.1

# Reserve more space for the query text shown in pg_stat_activity
track_activity_query_size = 2048

SQLite MusicBrainz Cache

Selection Rationale:

  • Embedded: No separate database server required
  • Fast reads: Single-user read-only access is extremely fast
  • Portable: Database file can be copied between systems
  • Offline: No network dependency for metadata lookups

File Location: /data/mb-cache.db

Size: Approximately 2-5 GB depending on MusicBrainz dump version

Update Frequency: Monthly (first day of month via Quartz.NET job)

Entity Relationship Model

Library Domain

Artist Entity

public class Artist
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string SortName { get; set; }
    public string Bio { get; set; }
    public string Country { get; set; }
    public DateTime? FormedDate { get; set; }
    public DateTime? DisbandedDate { get; set; }
    public string ImagePath { get; set; }
    public Dictionary<string, string> ExternalIds { get; set; } // JSONB
    public List<string> Genres { get; set; } // JSONB array
    public DateTime CreatedAt { get; set; }
    public DateTime UpdatedAt { get; set; }
    
    // Navigation properties
    public List<Album> Albums { get; set; }
    public List<Track> Tracks { get; set; }
}

Database Schema:

CREATE TABLE artists (
    id SERIAL PRIMARY KEY,
    name VARCHAR(500) NOT NULL,
    sort_name VARCHAR(500),
    bio TEXT,
    country VARCHAR(2),
    formed_date DATE,
    disbanded_date DATE,
    image_path VARCHAR(1000),
    external_ids JSONB,
    genres JSONB,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_artists_name ON artists(name);
CREATE INDEX idx_artists_sort_name ON artists(sort_name);
CREATE INDEX idx_artists_external_ids ON artists USING GIN(external_ids);
CREATE INDEX idx_artists_genres ON artists USING GIN(genres);

JSONB Example:

{
  "external_ids": {
    "musicbrainz": "a74b1b7f-71a5-4011-9441-d0b5e4122711",
    "spotify": "4Z8W4fKeB5YxbusRsdQVPb",
    "lastfm": "Radiohead",
    "discogs": "3840"
  },
  "genres": ["Alternative Rock", "Art Rock", "Electronic"]
}

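GIN Indexes: Generalized Inverted Indexes speed up JSONB containment (@>) and key-existence (?) queries. Equality on an extracted value (external_ids->>'spotify' = '...') cannot use these indexes, so lookups are written as containment tests:

-- Find artists with Spotify ID
SELECT * FROM artists WHERE external_ids @> '{"spotify": "4Z8W4fKeB5YxbusRsdQVPb"}';

-- Find artists in genre
SELECT * FROM artists WHERE genres @> '["Alternative Rock"]';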

Album Entity

public class Album
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string SortTitle { get; set; }
    public int ArtistId { get; set; }
    public Artist Artist { get; set; }
    public DateTime? ReleaseDate { get; set; }
    public string ReleaseType { get; set; } // Album, EP, Single, Compilation
    public string Country { get; set; }
    public string Label { get; set; }
    public string Barcode { get; set; }
    public string CoverArtPath { get; set; }
    public Dictionary<string, string> ExternalIds { get; set; }
    public List<string> Genres { get; set; }
    public int TrackCount { get; set; }
    public int Duration { get; set; } // Total duration in seconds
    public DateTime CreatedAt { get; set; }
    public DateTime UpdatedAt { get; set; }
    
    // Navigation properties
    public List<Track> Tracks { get; set; }
    public List<PlaylistAlbum> PlaylistAlbums { get; set; }
}

Database Schema:

CREATE TABLE albums (
    id SERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    sort_title VARCHAR(500),
    artist_id INTEGER NOT NULL REFERENCES artists(id) ON DELETE CASCADE,
    release_date DATE,
    release_type VARCHAR(50),
    country VARCHAR(2),
    label VARCHAR(200),
    barcode VARCHAR(50),
    cover_art_path VARCHAR(1000),
    external_ids JSONB,
    genres JSONB,
    track_count INTEGER DEFAULT 0,
    duration INTEGER DEFAULT 0,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_albums_title ON albums(title);
CREATE INDEX idx_albums_artist_id ON albums(artist_id);
CREATE INDEX idx_albums_release_date ON albums(release_date);
CREATE INDEX idx_albums_external_ids ON albums USING GIN(external_ids);
CREATE INDEX idx_albums_genres ON albums USING GIN(genres);

Denormalized Columns: Track count and duration are denormalized for performance. They're updated via triggers:

CREATE OR REPLACE FUNCTION update_album_stats()
RETURNS TRIGGER AS $$
BEGIN
    -- Refresh both the old and the new album: NEW is NULL on DELETE, OLD is NULL
    -- on INSERT (PostgreSQL 11+), and an UPDATE can move a track between albums.
    UPDATE albums a
    SET track_count = (SELECT COUNT(*) FROM tracks t WHERE t.album_id = a.id),
        duration = (SELECT COALESCE(SUM(t.duration), 0) FROM tracks t WHERE t.album_id = a.id)
    WHERE a.id IN (NEW.album_id, OLD.album_id);
    RETURN NULL; -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER track_stats_trigger
AFTER INSERT OR UPDATE OR DELETE ON tracks
FOR EACH ROW EXECUTE FUNCTION update_album_stats();

Track Entity

public class Track
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int AlbumId { get; set; }
    public Album Album { get; set; }
    public int ArtistId { get; set; }
    public Artist Artist { get; set; }
    public int Position { get; set; }
    public int DiscNumber { get; set; }
    public int Duration { get; set; } // Duration in seconds
    public string FilePath { get; set; }
    public long FileSize { get; set; }
    public string FileFormat { get; set; } // FLAC, MP3, OGG, etc.
    public int Bitrate { get; set; }
    public int SampleRate { get; set; }
    public int Channels { get; set; }
    public string Codec { get; set; }
    public Dictionary<string, string> ExternalIds { get; set; }
    public List<string> Genres { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime UpdatedAt { get; set; }
    
    // Navigation properties
    public List<PlaylistTrack> PlaylistTracks { get; set; }
    public List<Scrobble> Scrobbles { get; set; }
}

Database Schema:

CREATE TABLE tracks (
    id SERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    album_id INTEGER NOT NULL REFERENCES albums(id) ON DELETE CASCADE,
    artist_id INTEGER NOT NULL REFERENCES artists(id) ON DELETE CASCADE,
    position INTEGER NOT NULL,
    disc_number INTEGER DEFAULT 1,
    duration INTEGER NOT NULL,
    file_path VARCHAR(2000) NOT NULL UNIQUE,
    file_size BIGINT NOT NULL,
    file_format VARCHAR(20) NOT NULL,
    bitrate INTEGER,
    sample_rate INTEGER,
    channels INTEGER,
    codec VARCHAR(50),
    external_ids JSONB,
    genres JSONB,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_tracks_title ON tracks(title);
CREATE INDEX idx_tracks_album_id ON tracks(album_id);
CREATE INDEX idx_tracks_artist_id ON tracks(artist_id);
CREATE INDEX idx_tracks_file_path ON tracks(file_path);
CREATE INDEX idx_tracks_external_ids ON tracks USING GIN(external_ids);

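File Path Uniqueness: The UNIQUE constraint on file_path ensures the same file isn't imported twice. The library scanner checks for an existing path before creating a new track record. Because the constraint already creates a unique index, the separate idx_tracks_file_path index is redundant.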

Genre Entity

public class Genre
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string ParentGenre { get; set; }
    public int AlbumCount { get; set; }
    public int TrackCount { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime UpdatedAt { get; set; }
}

Database Schema:

CREATE TABLE genres (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL UNIQUE,
    parent_genre VARCHAR(100),
    album_count INTEGER DEFAULT 0,
    track_count INTEGER DEFAULT 0,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_genres_name ON genres(name);

Genre Hierarchy:

Rock
├── Alternative Rock
│   ├── Indie Rock
│   └── Post-Rock
├── Progressive Rock
└── Hard Rock

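The parent_genre field stores the name of the parent genre, which enables hierarchical genre browsing. A recursive query (WITH RECURSIVE in PostgreSQL) can follow these links to collect every subgenre beneath "Rock".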
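To make the recursive lookup concrete, here is a minimal in-memory sketch of the traversal such a query would perform. The rows mirror the hierarchy shown above plus one unrelated genre; the `Descendants` helper and the tuple layout are illustrative assumptions, not Melodee code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory copy of (name, parent_genre) rows from the genres table.
var genres = new (string Name, string? Parent)[]
{
    ("Rock", null),
    ("Alternative Rock", "Rock"),
    ("Indie Rock", "Alternative Rock"),
    ("Post-Rock", "Alternative Rock"),
    ("Progressive Rock", "Rock"),
    ("Hard Rock", "Rock"),
    ("Jazz", null),
};

// Breadth-first walk from the root genre down through the parent_genre links.
List<string> Descendants(string root)
{
    var childrenByParent = genres
        .Where(g => g.Parent is not null)
        .ToLookup(g => g.Parent!, g => g.Name);

    var result = new List<string>();
    var visited = new HashSet<string> { root }; // guards against cyclic parent links
    var queue = new Queue<string>();
    queue.Enqueue(root);

    while (queue.Count > 0)
    {
        foreach (var child in childrenByParent[queue.Dequeue()])
        {
            if (visited.Add(child))
            {
                result.Add(child);
                queue.Enqueue(child);
            }
        }
    }
    return result;
}

var rockSubgenres = Descendants("Rock");
Console.WriteLine(string.Join(", ", rockSubgenres));
```

Direct children are listed before grandchildren, so the result doubles as a level-by-level browse order. The visited set matters because nothing in the schema prevents a genre from naming one of its own descendants as its parent.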

User Domain

User Entity

public class User
{
    public int Id { get; set; }
    public string Email { get; set; }
    public string PasswordHash { get; set; }
    public string Role { get; set; } // "admin" or "user"
    public bool IsActive { get; set; }
    public string GoogleId { get; set; } // For OAuth users
    public string ProfilePictureUrl { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime? LastLoginAt { get; set; } // NULL until the first login
    
    // Navigation properties
    public UserSettings Settings { get; set; }
    public List<Playlist> Playlists { get; set; }
    public List<Scrobble> Scrobbles { get; set; }
    public List<UserSession> Sessions { get; set; }
    public List<Favorite> Favorites { get; set; }
}

Database Schema:

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    password_hash VARCHAR(255),
    role VARCHAR(20) NOT NULL DEFAULT 'user',
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    google_id VARCHAR(255) UNIQUE,
    profile_picture_url VARCHAR(1000),
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    last_login_at TIMESTAMP
);

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_google_id ON users(google_id);

Password Hashing: Uses BCrypt with work factor 12:

public string HashPassword(string password)
{
    return BCrypt.Net.BCrypt.HashPassword(password, workFactor: 12);
}

public bool VerifyPassword(string password, string hash)
{
    return BCrypt.Net.BCrypt.Verify(password, hash);
}

UserSettings Entity

public class UserSettings
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public User User { get; set; }
    public string Language { get; set; } // en, es, fr, de, it, pt, ru, ja, zh, ko
    public string Theme { get; set; } // light, dark, auto
    public int TranscodeBitrate { get; set; } // 128, 192, 256, 320
    public string TranscodeFormat { get; set; } // mp3, ogg, opus, aac
    public bool ScrobbleEnabled { get; set; }
    public string LastFmUsername { get; set; }
    public string LastFmSessionKey { get; set; } // Encrypted
    public bool PartyModeEnabled { get; set; }
    public int VolumeLevel { get; set; } // 0-100
    public bool RepeatEnabled { get; set; }
    public bool ShuffleEnabled { get; set; }
    public DateTime UpdatedAt { get; set; }
}

Database Schema:

CREATE TABLE user_settings (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL UNIQUE REFERENCES users(id) ON DELETE CASCADE,
    language VARCHAR(10) DEFAULT 'en',
    theme VARCHAR(20) DEFAULT 'auto',
    transcode_bitrate INTEGER DEFAULT 320,
    transcode_format VARCHAR(20) DEFAULT 'mp3',
    scrobble_enabled BOOLEAN DEFAULT FALSE,
    lastfm_username VARCHAR(100),
    lastfm_session_key VARCHAR(500), -- Encrypted
    party_mode_enabled BOOLEAN DEFAULT FALSE,
    volume_level INTEGER DEFAULT 80,
    repeat_enabled BOOLEAN DEFAULT FALSE,
    shuffle_enabled BOOLEAN DEFAULT FALSE,
    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

Encryption: Last.fm session keys are encrypted using ASP.NET Core Data Protection:

var protector = _dataProtectionProvider.CreateProtector("UserSecrets");
var encrypted = protector.Protect(sessionKey);

UserSession Entity

public class UserSession
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public User User { get; set; }
    public string Token { get; set; }
    public string IpAddress { get; set; }
    public string UserAgent { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime ExpiresAt { get; set; }
    public DateTime? LastActivityAt { get; set; }
}

Database Schema:

CREATE TABLE user_sessions (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    token VARCHAR(255) NOT NULL UNIQUE,
    ip_address VARCHAR(45),
    user_agent VARCHAR(500),
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    expires_at TIMESTAMP NOT NULL,
    last_activity_at TIMESTAMP
);

CREATE INDEX idx_user_sessions_token ON user_sessions(token);
CREATE INDEX idx_user_sessions_user_id ON user_sessions(user_id);
CREATE INDEX idx_user_sessions_expires_at ON user_sessions(expires_at);

Session Cleanup: Expired sessions are deleted by a Quartz.NET job:

DELETE FROM user_sessions WHERE expires_at < NOW();

Playlist Domain

Playlist Entity

public class Playlist
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int UserId { get; set; }
    public User User { get; set; }
    public bool IsPublic { get; set; }
    public bool IsSmart { get; set; }
    public string SmartQuery { get; set; } // MQL query
    public string Description { get; set; }
    public int TrackCount { get; set; }
    public int Duration { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime UpdatedAt { get; set; }
    
    // Navigation properties
    public List<PlaylistTrack> PlaylistTracks { get; set; }
}

Database Schema:

CREATE TABLE playlists (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    is_public BOOLEAN DEFAULT FALSE,
    is_smart BOOLEAN DEFAULT FALSE,
    smart_query TEXT,
    description TEXT,
    track_count INTEGER DEFAULT 0,
    duration INTEGER DEFAULT 0,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_playlists_user_id ON playlists(user_id);
CREATE INDEX idx_playlists_is_public ON playlists(is_public);

PlaylistTrack Entity

public class PlaylistTrack
{
    public int PlaylistId { get; set; }
    public Playlist Playlist { get; set; }
    public int TrackId { get; set; }
    public Track Track { get; set; }
    public int Position { get; set; }
    public DateTime AddedAt { get; set; }
}

Database Schema:

CREATE TABLE playlist_tracks (
    playlist_id INTEGER NOT NULL REFERENCES playlists(id) ON DELETE CASCADE,
    track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,
    position INTEGER NOT NULL,
    added_at TIMESTAMP NOT NULL DEFAULT NOW(),
    PRIMARY KEY (playlist_id, track_id)
);

CREATE INDEX idx_playlist_tracks_playlist_id ON playlist_tracks(playlist_id);
CREATE INDEX idx_playlist_tracks_position ON playlist_tracks(playlist_id, position);

Position Management: When tracks are added or removed, positions are recalculated:

-- Add track at position 5
UPDATE playlist_tracks
SET position = position + 1
WHERE playlist_id = 1 AND position >= 5;

INSERT INTO playlist_tracks (playlist_id, track_id, position, added_at)
VALUES (1, 420, 5, NOW());
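The SQL above covers insertion only. As a sketch of both directions, the following in-memory example shifts positions up on insert and closes the gap on removal. The list stands in for the playlist_tracks rows of a single playlist, and the `InsertAt` and `Remove` helpers are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the playlist_tracks rows of one playlist.
var rows = new List<(int TrackId, int Position)>
{
    (101, 1), (102, 2), (103, 3), (104, 4),
};

// Insert: shift every row at or after the target position, then add the new row.
void InsertAt(int trackId, int position)
{
    for (var i = 0; i < rows.Count; i++)
        if (rows[i].Position >= position)
            rows[i] = (rows[i].TrackId, rows[i].Position + 1);
    rows.Add((trackId, position));
}

// Remove: delete the row, then close the gap it leaves behind.
void Remove(int trackId)
{
    var index = rows.FindIndex(r => r.TrackId == trackId);
    if (index < 0) return;
    var removed = rows[index].Position;
    rows.RemoveAt(index);
    for (var i = 0; i < rows.Count; i++)
        if (rows[i].Position > removed)
            rows[i] = (rows[i].TrackId, rows[i].Position - 1);
}

InsertAt(420, 2);
Remove(103);

var ordered = rows.OrderBy(r => r.Position).Select(r => r.TrackId).ToList();
Console.WriteLine(string.Join(", ", ordered)); // prints "101, 420, 102, 104"
```

In the database, the shift and the insert (or the delete and the shift) would presumably run inside one transaction so that concurrent readers never observe duplicate or missing positions.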

Scrobble Domain

Scrobble Entity

public class Scrobble
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public User User { get; set; }
    public int TrackId { get; set; }
    public Track Track { get; set; }
    public DateTime PlayedAt { get; set; }
    public bool SubmittedToLastFm { get; set; }
    public DateTime? LastFmSubmittedAt { get; set; }
    public DateTime CreatedAt { get; set; }
}

Database Schema:

CREATE TABLE scrobbles (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    track_id INTEGER NOT NULL REFERENCES tracks(id) ON DELETE CASCADE,
    played_at TIMESTAMP NOT NULL,
    submitted_to_lastfm BOOLEAN DEFAULT FALSE,
    lastfm_submitted_at TIMESTAMP,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_scrobbles_user_id ON scrobbles(user_id);
CREATE INDEX idx_scrobbles_track_id ON scrobbles(track_id);
CREATE INDEX idx_scrobbles_played_at ON scrobbles(played_at);
CREATE INDEX idx_scrobbles_submitted_to_lastfm ON scrobbles(submitted_to_lastfm);

Batch Submission: Unsubmitted scrobbles are batched and sent to Last.fm:

SELECT * FROM scrobbles
WHERE submitted_to_lastfm = FALSE
ORDER BY played_at
LIMIT 50;

After successful submission:

UPDATE scrobbles
SET submitted_to_lastfm = TRUE,
    lastfm_submitted_at = NOW()
WHERE id IN (...);
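The batching loop around those two statements might look roughly like the sketch below. The pending rows are generated test data, the Last.fm call is replaced by a placeholder flag, and the variable names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pending rows: 120 unsubmitted scrobbles, one per minute.
var start = new DateTime(2025, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var pending = Enumerable.Range(1, 120)
    .Select(i => (Id: i, PlayedAt: start.AddMinutes(i)))
    .ToList();

const int BatchSize = 50; // mirrors the LIMIT 50 in the query above

var submittedIds = new List<int>();
var batchSizes = new List<int>();

// Oldest first, in fixed-size batches (Enumerable.Chunk requires .NET 6 or later).
foreach (var batch in pending.OrderBy(s => s.PlayedAt).Chunk(BatchSize))
{
    // Placeholder for the real Last.fm call; this sketch assumes it succeeds.
    var succeeded = true;
    if (!succeeded) break; // leave the remaining rows unsubmitted for the next run

    batchSizes.Add(batch.Length);
    submittedIds.AddRange(batch.Select(s => s.Id)); // ids for UPDATE ... WHERE id IN (...)
}

Console.WriteLine(string.Join(", ", batchSizes)); // prints "50, 50, 20"
```

Stopping at the first failed batch keeps submissions in played_at order, since the rows that were not sent remain flagged as unsubmitted and are picked up on the next run.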

Metadata Domain

MetadataProvider Entity

public class MetadataProvider
{
    public int Id { get; set; }
    public string Name { get; set; } // MusicBrainz, Spotify, Last.fm, etc.
    public int Priority { get; set; }
    public bool IsEnabled { get; set; }
    public string ApiKey { get; set; } // Encrypted
    public string ApiSecret { get; set; } // Encrypted
    public DateTime? LastSyncAt { get; set; }
    public string LastSyncStatus { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime UpdatedAt { get; set; }
}

Database Schema:

CREATE TABLE metadata_providers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL UNIQUE,
    priority INTEGER NOT NULL,
    is_enabled BOOLEAN DEFAULT TRUE,
    api_key VARCHAR(500),
    api_secret VARCHAR(500),
    last_sync_at TIMESTAMP,
    last_sync_status VARCHAR(50),
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);

MetadataCache Entity

public class MetadataCache
{
    public int Id { get; set; }
    public string Provider { get; set; }
    public string EntityType { get; set; } // Artist, Album, Track
    public string EntityId { get; set; }
    public string CacheKey { get; set; }
    public string CacheValue { get; set; } // JSON
    public DateTime ExpiresAt { get; set; }
    public DateTime CreatedAt { get; set; }
}

Database Schema:

CREATE TABLE metadata_cache (
    id SERIAL PRIMARY KEY,
    provider VARCHAR(100) NOT NULL,
    entity_type VARCHAR(50) NOT NULL,
    entity_id VARCHAR(255) NOT NULL,
    cache_key VARCHAR(500) NOT NULL,
    cache_value JSONB NOT NULL,
    expires_at TIMESTAMP NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    UNIQUE (provider, cache_key)
);

CREATE INDEX idx_metadata_cache_expires_at ON metadata_cache(expires_at);
CREATE INDEX idx_metadata_cache_provider_key ON metadata_cache(provider, cache_key);

Cache Invalidation: Entries past their expires_at are purged:

DELETE FROM metadata_cache WHERE expires_at < NOW();
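Because a purge like this only runs periodically, reads presumably also need to compare expires_at against the current time. The following in-memory sketch illustrates that interplay; the dictionary stands in for the metadata_cache table, and the `Put`, `Get`, and `Sweep` helpers and the sample TTLs are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for metadata_cache, keyed like its UNIQUE (provider, cache_key).
var cache = new Dictionary<(string Provider, string CacheKey), (string Value, DateTime ExpiresAt)>();

// Upsert: a second write for the same key replaces the first.
void Put(string provider, string key, string value, DateTime expiresAt) =>
    cache[(provider, key)] = (value, expiresAt);

// Read: treat an expired row as a miss even if the sweep has not removed it yet.
string? Get(string provider, string key, DateTime asOf) =>
    cache.TryGetValue((provider, key), out var row) && row.ExpiresAt >= asOf ? row.Value : null;

// Sweep: the in-memory equivalent of DELETE ... WHERE expires_at < NOW().
int Sweep(DateTime asOf)
{
    var expired = cache.Where(kv => kv.Value.ExpiresAt < asOf).Select(kv => kv.Key).ToList();
    foreach (var key in expired) cache.Remove(key);
    return expired.Count;
}

var now = new DateTime(2025, 4, 28, 12, 0, 0, DateTimeKind.Utc);
Put("musicbrainz", "artist:radiohead", "{\"name\":\"Radiohead\"}", now.AddDays(7));
Put("spotify", "artist:radiohead", "{\"name\":\"Radiohead\"}", now.AddHours(-1));

var freshHit = Get("musicbrainz", "artist:radiohead", now) is not null; // fresh entry is served
var staleMiss = Get("spotify", "artist:radiohead", now) is null;        // expired entry is a miss
var removed = Sweep(now);                                               // one expired row removed
Console.WriteLine($"{freshHit} {staleMiss} {removed}");
```

The same cache key can exist once per provider, which matches the composite uniqueness rule in the schema.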

System Domain

Job Entity

public class Job
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Status { get; set; } // Queued, Running, Completed, Failed
    public int Progress { get; set; } // 0-100
    public string Message { get; set; }
    public DateTime? StartedAt { get; set; }
    public DateTime? CompletedAt { get; set; }
    public string ErrorMessage { get; set; }
    public DateTime CreatedAt { get; set; }
}

Database Schema:

CREATE TABLE jobs (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    status VARCHAR(50) NOT NULL,
    progress INTEGER DEFAULT 0,
    message TEXT,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_jobs_status ON jobs(status);
CREATE INDEX idx_jobs_created_at ON jobs(created_at);

HealthCheck Entity

public class HealthCheck
{
    public int Id { get; set; }
    public string Component { get; set; } // Database, MusicBrainz, Spotify, etc.
    public string Status { get; set; } // Healthy, Degraded, Unhealthy
    public string Message { get; set; }
    public Dictionary<string, object> Data { get; set; }
    public DateTime CheckedAt { get; set; }
}

Database Schema:

CREATE TABLE health_checks (
    id SERIAL PRIMARY KEY,
    component VARCHAR(100) NOT NULL,
    status VARCHAR(50) NOT NULL,
    message TEXT,
    data JSONB,
    checked_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_health_checks_component ON health_checks(component);
CREATE INDEX idx_health_checks_checked_at ON health_checks(checked_at);

MusicBrainz Cache Schema

Release Table

CREATE TABLE mb_release (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    artist_credit TEXT NOT NULL,
    release_date TEXT,
    country TEXT,
    barcode TEXT,
    release_group_id TEXT,
    updated_at INTEGER NOT NULL
);

CREATE INDEX idx_mb_release_title ON mb_release(title);
CREATE INDEX idx_mb_release_artist_credit ON mb_release(artist_credit);
CREATE INDEX idx_mb_release_barcode ON mb_release(barcode);

Recording Table

CREATE TABLE mb_recording (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    artist_credit TEXT NOT NULL,
    length INTEGER,
    updated_at INTEGER NOT NULL
);

CREATE INDEX idx_mb_recording_title ON mb_recording(title);
CREATE INDEX idx_mb_recording_artist_credit ON mb_recording(artist_credit);

Artist Table

CREATE TABLE mb_artist (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    sort_name TEXT,
    type TEXT,
    country TEXT,
    begin_date TEXT,
    end_date TEXT,
    updated_at INTEGER NOT NULL
);

CREATE INDEX idx_mb_artist_name ON mb_artist(name);
CREATE INDEX idx_mb_artist_sort_name ON mb_artist(sort_name);

Release Group Table

CREATE TABLE mb_release_group (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    artist_credit TEXT NOT NULL,
    type TEXT,
    first_release_date TEXT,
    updated_at INTEGER NOT NULL
);

CREATE INDEX idx_mb_release_group_title ON mb_release_group(title);
CREATE INDEX idx_mb_release_group_artist_credit ON mb_release_group(artist_credit);

Data Migration Strategy

Migration Management

EF Core migrations track schema changes:

public partial class InitialCreate : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.CreateTable(
            name: "artists",
            columns: table => new
            {
                id = table.Column<int>(nullable: false)
                    .Annotation("Npgsql:ValueGenerationStrategy", NpgsqlValueGenerationStrategy.IdentityByDefaultColumn),
                name = table.Column<string>(maxLength: 500, nullable: false),
                // ...
            });
    }
    
    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.DropTable(name: "artists");
    }
}

Migration Naming Convention:

20250101000000_InitialCreate.cs
20250115120000_AddUserSettings.cs
20250201093000_AddPlaylistSupport.cs
20250315164500_AddScrobbling.cs

The fixed-width timestamp prefix (yyyyMMddHHmmss) ensures migrations apply in chronological order.
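A quick way to see why the prefix works: when every id starts with a fixed-width timestamp, plain string ordering and chronological ordering coincide. The sketch below sorts the ids from the listing above after shuffling them; it assumes ids are compared as ordinary strings and does not call any EF Core API.

```csharp
using System;
using System.Linq;

// Migration ids from the listing above, deliberately shuffled.
var migrationIds = new[]
{
    "20250315164500_AddScrobbling",
    "20250101000000_InitialCreate",
    "20250201093000_AddPlaylistSupport",
    "20250115120000_AddUserSettings",
};

// The fixed-width timestamp prefix makes ordinal string order match chronological order.
var orderedIds = migrationIds.OrderBy(m => m, StringComparer.Ordinal).ToArray();

foreach (var migrationId in orderedIds)
    Console.WriteLine(migrationId);
```

A prefix without zero padding (for example 2025115 instead of 20250115) would break this property, which is one reason to let the tooling generate the names.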

Automatic Migration Application

Docker entrypoint applies migrations on startup:

#!/bin/bash
set -e

echo "Applying database migrations..."
dotnet ef database update --project Melodee.Data --no-build

echo "Starting Melodee..."
exec dotnet Melodee.Web.dll

This ensures the database schema matches the application version without manual intervention.

Migration Rollback

Rollback to a specific migration:

dotnet ef database update 20250201093000_AddPlaylistSupport

This reverts all migrations applied after the specified migration.

Caution: Rolling back can cause data loss, because the Down methods of the reverted migrations drop the columns and tables those migrations added. Always back up the database before rolling back.

Data Access Patterns

Repository Pattern

public interface IAlbumRepository
{
    Task<Album> GetByIdAsync(int id);
    Task<List<Album>> GetAllAsync();
    Task<List<Album>> GetByArtistAsync(int artistId);
    Task<List<Album>> SearchAsync(string query);
    Task<Album> CreateAsync(Album album);
    Task UpdateAsync(Album album);
    Task DeleteAsync(int id);
}

public class AlbumRepository : IAlbumRepository
{
    private readonly MelodeeDbContext _context;
    
    public AlbumRepository(MelodeeDbContext context)
    {
        _context = context;
    }
    
    public async Task<Album> GetByIdAsync(int id)
    {
        return await _context.Albums
            .Include(a => a.Artist)
            .Include(a => a.Tracks)
            .FirstOrDefaultAsync(a => a.Id == id);
    }
    
    public async Task<List<Album>> SearchAsync(string query)
    {
        return await _context.Albums
            .Include(a => a.Artist)
            .Where(a => EF.Functions.ILike(a.Title, $"%{query}%") ||
                        EF.Functions.ILike(a.Artist.Name, $"%{query}%"))
            .OrderBy(a => a.Title)
            .ToListAsync();
    }
}

Unit of Work Pattern

public interface IUnitOfWork : IDisposable
{
    IAlbumRepository Albums { get; }
    IArtistRepository Artists { get; }
    ITrackRepository Tracks { get; }
    IPlaylistRepository Playlists { get; }
    IScrobbleRepository Scrobbles { get; }
    
    Task<int> SaveChangesAsync();
}

public class UnitOfWork : IUnitOfWork
{
    private readonly MelodeeDbContext _context;
    
    public UnitOfWork(MelodeeDbContext context)
    {
        _context = context;
        Albums = new AlbumRepository(context);
        Artists = new ArtistRepository(context);
        // ...
    }
    
    public IAlbumRepository Albums { get; }
    public IArtistRepository Artists { get; }
    // ...
    
    public async Task<int> SaveChangesAsync()
    {
        return await _context.SaveChangesAsync();
    }
    
    public void Dispose()
    {
        _context.Dispose();
    }
}

Usage:

public class LibraryService
{
    private readonly IUnitOfWork _unitOfWork;
    
    public async Task<Album> CreateAlbumAsync(CreateAlbumRequest request)
    {
        var artist = await _unitOfWork.Artists.GetByIdAsync(request.ArtistId);
        if (artist == null)
            throw new NotFoundException("Artist not found");
        
        var album = new Album
        {
            Title = request.Title,
            ArtistId = request.ArtistId,
            ReleaseDate = request.ReleaseDate
        };
        
        await _unitOfWork.Albums.CreateAsync(album);
        await _unitOfWork.SaveChangesAsync();
        
        return album;
    }
}

Query Optimization

N+1 Query Problem:

// BAD: N+1 queries
var albums = await _context.Albums.ToListAsync();
foreach (var album in albums)
{
    // Each iteration triggers a separate query
    var artist = await _context.Artists.FindAsync(album.ArtistId);
}

// GOOD: Single query with join
var albums = await _context.Albums
    .Include(a => a.Artist)
    .ToListAsync();

Projection for Performance:

// BAD: Loads entire entity
var albums = await _context.Albums
    .Include(a => a.Artist)
    .Include(a => a.Tracks)
    .ToListAsync();

// GOOD: Projects only needed fields
var albums = await _context.Albums
    .Select(a => new AlbumDto
    {
        Id = a.Id,
        Title = a.Title,
        ArtistName = a.Artist.Name,
        TrackCount = a.Tracks.Count
    })
    .ToListAsync();

Pagination:

public async Task<PagedResult<Album>> GetAlbumsAsync(int page, int pageSize)
{
    // Skip/Take need a stable ordering; without one, pages can overlap or miss rows
    var query = _context.Albums
        .Include(a => a.Artist)
        .OrderBy(a => a.Title)
        .ThenBy(a => a.Id);
    
    var totalCount = await query.CountAsync();
    var items = await query
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .ToListAsync();
    
    return new PagedResult<Album>
    {
        Items = items,
        TotalCount = totalCount,
        Page = page,
        PageSize = pageSize
    };
}
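The page arithmetic behind Skip((page - 1) * pageSize) can be checked in isolation. The sketch below is a hypothetical helper, not part of Melodee, that clamps caller input and derives the offset and page count; the upper bound of 100 items per page is an assumed value.

```csharp
using System;

// Hypothetical helper: clamp the caller's paging input and derive the Skip offset.
static (int Page, int PageSize, int Skip, int TotalPages) Paging(int page, int pageSize, int totalCount)
{
    pageSize = Math.Clamp(pageSize, 1, 100); // 100 is an assumed upper bound
    var totalPages = Math.Max(1, (totalCount + pageSize - 1) / pageSize); // ceiling division
    page = Math.Clamp(page, 1, totalPages);
    return (page, pageSize, (page - 1) * pageSize, totalPages);
}

Console.WriteLine(Paging(page: 3, pageSize: 25, totalCount: 60));  // prints "(3, 25, 50, 3)"
Console.WriteLine(Paging(page: 0, pageSize: 500, totalCount: 60)); // prints "(1, 100, 0, 1)"
```

Clamping avoids negative offsets when a client sends page 0 and keeps a single request from loading the whole table.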

Full-Text Search

PostgreSQL full-text search for library queries:

-- Add tsvector column
ALTER TABLE albums ADD COLUMN search_vector tsvector;

-- Populate search vector (one-off backfill; later inserts and updates need a
-- trigger or a generated column to keep it current)
UPDATE albums
SET search_vector = to_tsvector('english', title || ' ' || COALESCE(label, ''));

-- Create GIN index
CREATE INDEX idx_albums_search_vector ON albums USING GIN(search_vector);

-- Search query (the vector covers title and label only, not the artist name)
SELECT * FROM albums
WHERE search_vector @@ to_tsquery('english', 'ok & computer')
ORDER BY ts_rank(search_vector, to_tsquery('english', 'ok & computer')) DESC;

EF Core Integration:

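public async Task<List<Album>> FullTextSearchAsync(string query)
{
    // websearch_to_tsquery (PostgreSQL 11+) accepts free-form user input;
    // to_tsquery raises a syntax error on text such as "ok computer".
    return await _context.Albums
        .FromSqlRaw(@"
            SELECT * FROM albums
            WHERE search_vector @@ websearch_to_tsquery('english', {0})
            ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', {0})) DESC
        ", query)
        .ToListAsync();
}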

Data Integrity

Referential Integrity

Foreign key constraints ensure data consistency:

-- Cascade delete: deleting an artist deletes all albums
ALTER TABLE albums
ADD CONSTRAINT fk_albums_artist
FOREIGN KEY (artist_id) REFERENCES artists(id) ON DELETE CASCADE;

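-- Restrict delete (alternative policy): a track cannot be deleted while it is in a playlist.
-- The playlist_tracks schema above declares this key with ON DELETE CASCADE instead;
-- a single constraint can use only one of the two behaviors.
ALTER TABLE playlist_tracks
ADD CONSTRAINT fk_playlist_tracks_track
FOREIGN KEY (track_id) REFERENCES tracks(id) ON DELETE RESTRICT;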

Check Constraints

-- Ensure valid rating range (assumes a rating column, which the albums schema above does not list)
ALTER TABLE albums
ADD CONSTRAINT chk_albums_rating CHECK (rating >= 0 AND rating <= 5);

-- Ensure positive duration
ALTER TABLE tracks
ADD CONSTRAINT chk_tracks_duration CHECK (duration > 0);

-- Ensure valid role
ALTER TABLE users
ADD CONSTRAINT chk_users_role CHECK (role IN ('admin', 'user'));

Unique Constraints

-- Prevent duplicate artists
ALTER TABLE artists ADD CONSTRAINT uq_artists_name UNIQUE (name);

-- Prevent duplicate track positions on the same disc of an album
ALTER TABLE tracks ADD CONSTRAINT uq_tracks_album_position UNIQUE (album_id, disc_number, position);

-- Prevent duplicate playlists for same user
ALTER TABLE playlists ADD CONSTRAINT uq_playlists_user_name UNIQUE (user_id, name);

Data Backup and Recovery

PostgreSQL Backup

Automated Daily Backups:

#!/bin/bash
set -euo pipefail  # make a pg_dump failure fail the script instead of passing silently

BACKUP_DIR="/backups/postgres"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/melodee_$TIMESTAMP.sql.gz"

pg_dump -h postgres -U melodee melodee | gzip > "$BACKUP_FILE"

# Delete backups older than 30 days
find "$BACKUP_DIR" -name "melodee_*.sql.gz" -mtime +30 -delete

Restore from Backup:

gunzip -c /backups/postgres/melodee_20250428_103000.sql.gz | psql -h postgres -U melodee melodee

SQLite Cache Backup

# Copy SQLite database file
cp /data/mb-cache.db /backups/mb-cache_$(date +%Y%m%d).db

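Backing up the SQLite cache is less critical, because it can be rebuilt from a MusicBrainz dump. Copy the file only while the monthly update job is not running, since a plain cp of a database that is being written can produce an inconsistent copy.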

Performance Considerations

Indexing Strategy

Index Coverage:

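  • Primary keys: Unique B-tree indexes created automatically (PostgreSQL has no clustered indexes)
  • Foreign keys: Explicit B-tree indexes for join performance (PostgreSQL does not index foreign key columns automatically)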
  • Search fields: Indexes on name, title, email
  • JSONB fields: GIN indexes for JSON queries
  • Full-text search: GIN indexes on tsvector columns

Index Maintenance:

-- Analyze tables for query planner
ANALYZE albums;

-- Reindex to rebuild bloated indexes (blocks writes to the table while it runs)
REINDEX TABLE albums;

-- Vacuum to make dead-row space reusable and refresh planner statistics
VACUUM ANALYZE albums;

Connection Pooling

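Connection pooling is handled by the Npgsql driver and is enabled by default. The EF Core registration below only sets batching and command-timeout options; pool sizes come from the connection string: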

services.AddDbContext<MelodeeDbContext>(options =>
{
    options.UseNpgsql(connectionString, npgsqlOptions =>
    {
        npgsqlOptions.MinBatchSize(1);
        npgsqlOptions.MaxBatchSize(100);
        npgsqlOptions.CommandTimeout(30);
    });
});

Connection String:

Host=postgres;Database=melodee;Username=melodee;Password=melodee;Pooling=true;Minimum Pool Size=5;Maximum Pool Size=20;

Query Caching

In-Memory Cache:

public class CachedAlbumRepository : IAlbumRepository
{
    private readonly IAlbumRepository _inner;
    private readonly IMemoryCache _cache;
    
    public async Task<Album> GetByIdAsync(int id)
    {
        return await _cache.GetOrCreateAsync($"album:{id}", async entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15);
            return await _inner.GetByIdAsync(id);
        });
    }
}

Redis Cache (for distributed deployments):

public class RedisAlbumRepository : IAlbumRepository
{
    private readonly IAlbumRepository _inner;
    private readonly IConnectionMultiplexer _redis;
    
    public async Task<Album> GetByIdAsync(int id)
    {
        var db = _redis.GetDatabase();
        var cached = await db.StringGetAsync($"album:{id}");
        
        if (cached.HasValue)
            return JsonSerializer.Deserialize<Album>(cached);
        
        var album = await _inner.GetByIdAsync(id);
        await db.StringSetAsync($"album:{id}", JsonSerializer.Serialize(album), TimeSpan.FromHours(1));
        
        return album;
    }
}

Conclusion

Melodee's data architecture demonstrates thoughtful design for a music server application. The dual-database approach (PostgreSQL for transactional data, SQLite for reference data) optimizes for different access patterns. The 40+ entity model covers all aspects of music library management, user accounts, playlists, and scrobbling.

Key strengths:

  • JSONB for flexibility: External IDs and metadata from multiple providers
  • Full-text search: Fast library searching without external dependencies
  • Automatic migrations: Docker entrypoint ensures schema consistency
  • Repository pattern: Clean separation between data access and business logic
  • Comprehensive indexing: Optimized for common query patterns

Key challenges:

  • 100+ migrations: Complex upgrade paths, potential for migration conflicts
  • Denormalized data: Track counts and durations require trigger maintenance
  • Cache invalidation: Multiple caching layers increase complexity

The architecture positions Melodee for scalability and maintainability while supporting rich metadata aggregation and user features.