Alexander a1f6701bac feat: initial implementation of metadata aggregator

- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects

2026-04-28 16:28:53 +02:00

Meelo Architecture

System Overview

Meelo implements a microservices architecture with four application services (Front, Server, Scanner, Matcher) and five infrastructure services (PostgreSQL, MeiliSearch, RabbitMQ, the Kyoo transcoder, Nginx), orchestrated via Docker Compose. Each service has a single responsibility and communicates through well-defined interfaces (REST APIs, a message queue, and WebSocket events).

┌─────────────────────────────────────────────────────────────┐
│                          Nginx                              │
│  Reverse Proxy (Port 80)                                    │
│  Routes: / → Front, /api/ → Server, /scanner/ → Scanner    │
└─────────────────────────────────────────────────────────────┘
         │              │                │              │
    ┌────┘         ┌────┘           ┌────┘         ┌────┘
    │              │                │              │
┌───▼────┐   ┌────▼─────┐   ┌──────▼───┐   ┌──────▼────┐
│ Front  │   │  Server  │   │ Scanner  │   │  Matcher  │
│ Next.js│   │  NestJS  │   │   Go     │   │  FastAPI  │
│ :3000  │   │  :4000   │   │  :8133   │   │   :6789   │
└────────┘   └────┬─────┘   └────┬─────┘   └─────┬─────┘
                  │              │               │
         ┌────────┼──────────────┼───────────────┘
         │        │              │
    ┌────▼───┐ ┌─▼──────────┐ ┌─▼──────────┐
    │ Postgres│ │ MeiliSearch│ │  RabbitMQ  │
    │  :5432  │ │   :7700    │ │   :5672    │
    └─────────┘ └────────────┘ └────────────┘

Service Responsibilities

Server (NestJS 11, TypeScript)

Port: 4000
Database: PostgreSQL via Prisma ORM
Search: MeiliSearch client
Messaging: RabbitMQ publisher

Module Structure

NestJS organizes code into modules. Each module encapsulates related functionality:

Core Domain Modules

ArtistModule: CRUD operations, relationships to albums/songs/videos
AlbumModule: Album management, release associations
SongModule: Song entities, track relationships, lyrics
TrackModule: Individual track instances (audio/video)
ReleaseModule: Physical/digital release variants
GenreModule: Genre taxonomy and associations
VideoModule: Music video management

Supporting Modules

AuthModule: JWT authentication, user registration, login
UserModule: User management, preferences, scrobbler connections
LibraryModule: Library configuration, scan triggers
FileModule: File metadata, checksums, fingerprints
PlaylistModule: Playlist CRUD, entry management
LyricsModule: Plain and synced lyrics storage

Integration Modules

ExternalMetadataModule: Provider data aggregation
SearchModule: MeiliSearch indexing and queries
ScrobblerModule: Last.fm and ListenBrainz integration
StreamModule: Audio/video streaming endpoints
EventsModule: WebSocket notifications for UI updates

Infrastructure Modules

PrismaModule: Database connection and ORM
MeiliSearchModule: Search client configuration
RabbitMQModule: Message queue publisher

Data Flow

Incoming Request: Nginx forwards to Server at /api/*
Controller: Route handler validates request, extracts JWT
Service: Business logic executes, calls Prisma for data
Repository: Prisma queries PostgreSQL
Response: JSON returned to client

For write operations:

Service updates database via Prisma
Service publishes event to RabbitMQ (if needed)
Service updates MeiliSearch index
Service emits WebSocket event for live UI updates

Authentication Flow

User submits credentials to /api/auth/login
AuthService validates against bcrypt hash in database
JWT signed with JWT_SIGNATURE from .env
Token returned to client
Client includes token in Authorization: Bearer <token> header
JwtStrategy validates token on protected routes
User object attached to request context

Anonymous mode (ALLOW_ANONYMOUS=1) bypasses this flow.
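
The signing and validation steps above can be sketched with the Python standard library. This illustrates the mechanics of a signed, expiring token and is not Meelo's NestJS code: the HS256 algorithm, the helper names (`sign_jwt`, `verify_jwt`), and the payload fields are assumptions, and the `secret` argument stands in for JWT_SIGNATURE.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWT segments are URL-safe base64 without "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def sign_jwt(payload: dict, secret: str) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256 (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_signature(signing_input, secret))}"


def verify_jwt(token: str, secret: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature matches and "exp" has not passed."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = _signature(f"{header_b64}.{payload_b64}", secret)
        # compare_digest avoids leaking how many bytes matched via timing.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        # Malformed token: wrong segment count, bad base64, or bad JSON.
        return None
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        return None
    return payload
```

A protected route would call `verify_jwt` on the value from the Authorization header and reject the request when it returns None. A production service would normally rely on a maintained JWT library rather than hand-rolled code like this.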

Scrobbling Flow

User authorizes Last.fm via OAuth (callback to /api/scrobblers/lastfm/callback)
Server exchanges code for access token
Token stored in UserScrobbler table
On track play, ScrobblerService posts to Last.fm API
ListenBrainz uses simpler token-based auth (user provides token directly)

Search Integration

On entity creation/update, service calls MeiliSearchService.index()
Service transforms entity to search document
Document pushed to MeiliSearch via HTTP API
Client queries /api/search?q=<term>
Server forwards to MeiliSearch
Results enriched with database data (illustrations, counts)
JSON returned to client

Scanner (Go 1.25, Echo v5)

Port: 8133
Framework: Echo HTTP server
Dependencies: FFmpeg, FFprobe, AcoustID

Responsibilities

Filesystem Watching: Monitor library directories for changes
Metadata Extraction: Parse audio/video files using FFprobe
Fingerprinting: Generate AcoustID fingerprints for matching
Filename Parsing: Apply regex from settings.json to extract metadata
File Registration: POST file metadata to Server API
Match Triggering: Publish events to RabbitMQ for Matcher consumption

Scan Process

Trigger: POST to /scanner/scan/:libraryId or filesystem event
Discovery: Walk directory tree, filter by extension (.mp3, .flac, .m4a, .mkv, etc.)
Extraction: For each file:
- Run FFprobe to get duration, bitrate, codec, embedded tags
- Generate AcoustID fingerprint using chromaprint
- Parse filename using regex from settings.json
- Calculate file checksum (SHA256)
Registration: POST to Server /api/files with:
- File path
- Checksum
- Fingerprint
- Extracted metadata (title, artist, album, track number)
- Technical details (duration, bitrate, codec)
Event Publishing: Publish to RabbitMQ queue file.added with file ID
Repeat: Process next file
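
The discovery and checksum steps can be sketched as follows. The Scanner itself is written in Go; this Python version only illustrates the logic. The extension set covers just the four extensions named above, and the function names `discover_media` and `sha256_of` are hypothetical.

```python
import hashlib
import os
from typing import Iterator

# Only the extensions named in the text; the real list is longer ("etc.").
MEDIA_EXTENSIONS = {".mp3", ".flac", ".m4a", ".mkv"}


def discover_media(root: str) -> Iterator[str]:
    """Walk the library tree and yield paths that have a known media extension."""
    for directory, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in MEDIA_EXTENSIONS:
                yield os.path.join(directory, name)


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Hashing in chunks matters for video files, which can be several gigabytes; the extension comparison is case-insensitive so that `.FLAC` and `.flac` are treated alike.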

Filename Regex

settings.json contains a trackRegex pattern. Example:

(?P<artist>[^/]+)/(?P<album>[^/]+)/(?P<disc>\d+)-(?P<track>\d+) (?P<title>.+)\.(?P<ext>\w+)

Named capture groups extract metadata when embedded tags are missing or untrusted.
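
To see what the example pattern captures, here is a small Python sketch (Python's `re` module accepts the same `(?P<name>...)` syntax). The sample path is made up, and matching the whole library-relative path with `fullmatch` is an assumption about how the pattern is applied.

```python
import re
from typing import Optional

# The example pattern from above, unchanged.
TRACK_REGEX = re.compile(
    r"(?P<artist>[^/]+)/(?P<album>[^/]+)/(?P<disc>\d+)-(?P<track>\d+) (?P<title>.+)\.(?P<ext>\w+)"
)


def parse_path(relative_path: str) -> Optional[dict]:
    """Return the named groups as a dict, or None if the path does not fit the pattern."""
    match = TRACK_REGEX.fullmatch(relative_path)
    return match.groupdict() if match else None


# Hypothetical library-relative path.
fields = parse_path("Daft Punk/Discovery/1-03 Digital Love.flac")
# fields == {"artist": "Daft Punk", "album": "Discovery", "disc": "1",
#            "track": "03", "title": "Digital Love", "ext": "flac"}
```

All captured values are strings, so disc and track numbers still need converting to integers before registration.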

Health Monitoring

Scanner exposes GET / endpoint. Returns JSON with:

Service status
Active scan tasks
Last scan timestamp
Library statistics

Docker health check hits this endpoint every 30 seconds.

Error Handling

File Read Errors: Log and skip file, continue scan
FFprobe Failures: Retry once, then skip
Server API Errors: Retry with exponential backoff (max 3 attempts)
RabbitMQ Unavailable: Queue events in memory, flush when connection restored

Matcher (Python 3.14, FastAPI)

Port: 6789
Framework: FastAPI with async HTTP
Messaging: RabbitMQ consumer

Responsibilities

Event Consumption: Listen to RabbitMQ file.added queue
Provider Queries: Fetch metadata from 8 external sources
Data Aggregation: Merge results based on priority in settings.json
Metadata Push: POST enriched data to Server API

Provider Architecture

Each provider is a separate module implementing a common interface:

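from abc import ABC, abstractmethod
from typing import Optional

# TrackMetadata, ArtistMetadata, and AlbumMetadata are the Matcher's result models (not shown here).
class Provider(ABC):
    """Interface that every provider module implements."""

    @abstractmethod
    async def search_track(self, fingerprint: str, title: str, artist: str) -> Optional[TrackMetadata]:
        pass

    @abstractmethod
    async def fetch_artist(self, artist_id: str) -> Optional[ArtistMetadata]:
        pass

    @abstractmethod
    async def fetch_album(self, album_id: str) -> Optional[AlbumMetadata]:
        pass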

Provider Modules

musicbrainz.py: Primary database, uses musicbrainzngs library
genius.py: Lyrics and song descriptions, requires API token
wikipedia.py: Artist/album context, uses Wikipedia API
wikidata.py: Structured data (areas, relationships), SPARQL queries
discogs.py: Release details, requires API token
allmusic.py: Editorial reviews, web scraping (no official API)
metacritic.py: Critic scores, web scraping
lrclib.py: Synced lyrics, public API

Matching Flow

Event Received: RabbitMQ delivers file.added message with file ID
File Fetch: GET /api/files/:id from Server to retrieve metadata
Provider Selection: Read settings.json for enabled providers and priority
Parallel Queries: Launch async tasks for each provider:
- MusicBrainz: Query by AcoustID fingerprint
- Genius: Search by title + artist
- Wikipedia: Search by artist name
- Wikidata: Query by MusicBrainz ID (if found)
- Discogs: Search by release title
- AllMusic: Scrape by artist + album
- Metacritic: Scrape by album title
- LrcLib: Search by title + artist + duration
Result Aggregation: Merge results based on priority:
- MusicBrainz IDs take precedence
- Lyrics: prefer synced (LrcLib) over plain (Genius)
- Descriptions: concatenate from multiple sources
- Ratings: average across providers
Metadata Push: POST to Server /api/external-metadata with:
- Track/album/artist IDs
- Descriptions
- Ratings
- Source URLs
- Provider names
Acknowledgment: ACK message to RabbitMQ
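
The aggregation rules above can be sketched as a single merge function. This is a minimal illustration, not the Matcher's actual code: the field names (`musicbrainz_id`, `synced_lyrics`, `plain_lyrics`, `description`, `rating`) and the dict-based result shape are assumptions.

```python
from statistics import mean
from typing import Optional


def merge_results(results: dict[str, dict], order: list[str]) -> dict:
    """Merge per-provider results; `order` lists providers from highest to lowest priority."""
    ranked = [results[name] for name in order if name in results]

    def first(field: str) -> Optional[object]:
        # The highest-priority provider that supplies the field wins.
        return next((r[field] for r in ranked if r.get(field) is not None), None)

    ratings = [r["rating"] for r in ranked if r.get("rating") is not None]
    descriptions = [r["description"] for r in ranked if r.get("description")]
    return {
        "musicbrainz_id": first("musicbrainz_id"),
        # Synced lyrics are preferred; plain lyrics are the fallback.
        "lyrics": first("synced_lyrics") or first("plain_lyrics"),
        # Descriptions from several sources are concatenated.
        "description": "\n\n".join(descriptions) or None,
        # Ratings are averaged across the providers that returned one.
        "rating": mean(ratings) if ratings else None,
        "sources": [name for name in order if name in results],
    }
```

Providers that timed out or were skipped are simply absent from `results`, so the merge degrades gracefully when only some sources respond.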

Rate Limiting

Providers have different rate limits:

MusicBrainz: 1 request/second (enforced by library)
Genius: 10 requests/second (API limit)
Wikipedia: No official limit, use 5 requests/second
Wikidata: No client-side limit configured; the public SPARQL endpoint enforces its own usage limits
Discogs: 60 requests/minute (API limit)
AllMusic: No API, scraping limited to 1 request/second
Metacritic: No API, scraping limited to 1 request/second
LrcLib: No official limit, use 10 requests/second

Matcher implements per-provider rate limiters using aiolimiter.
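
The idea behind a per-provider limiter can be shown with a dependency-free sketch. It is a stand-in written with asyncio only, not aiolimiter and not the Matcher's code; the class name `MinIntervalLimiter` and the `INTERVALS` table (derived from the limits listed above) are hypothetical.

```python
import asyncio
import time

# Minimum seconds between requests, derived from the limits listed above.
INTERVALS = {
    "musicbrainz": 1.0,   # 1 request/second
    "genius": 0.1,        # 10 requests/second
    "wikipedia": 0.2,     # 5 requests/second
    "discogs": 1.0,       # 60 requests/minute
    "allmusic": 1.0,
    "metacritic": 1.0,
    "lrclib": 0.1,
}


class MinIntervalLimiter:
    """Async context manager that spaces entries at least `interval` seconds apart."""

    def __init__(self, interval: float, clock=time.monotonic, sleep=asyncio.sleep):
        self._interval = interval
        self._clock = clock
        self._sleep = sleep
        self._next_slot = 0.0

    async def __aenter__(self):
        now = self._clock()
        start = max(now, self._next_slot)
        # Reserve the slot before sleeping. There is no await between reading
        # and writing _next_slot, so concurrent tasks cannot claim the same slot.
        self._next_slot = start + self._interval
        if start > now:
            await self._sleep(start - now)
        return self

    async def __aexit__(self, *exc_info):
        return False


# One limiter per provider, looked up by name before each request.
LIMITERS = {name: MinIntervalLimiter(interval) for name, interval in INTERVALS.items()}
```

A provider call would then be wrapped as `async with LIMITERS["musicbrainz"]: ...`. Keeping one limiter per provider means a slow source such as MusicBrainz never throttles requests to the others.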

Error Handling

Provider Timeout: Skip provider, continue with others
HTTP Errors: Retry with exponential backoff (max 3 attempts)
Parsing Errors: Log and skip provider result
Server API Errors: NACK message to RabbitMQ for redelivery
No Results: Push empty metadata (Server marks as "not found")
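
The "retry with exponential backoff (max 3 attempts)" rule can be sketched as a small helper. The name `retry_with_backoff` and the 0.5-second base delay are assumptions; the text above specifies only the attempt limit.

```python
import asyncio


async def retry_with_backoff(call, max_attempts=3, base_delay=0.5, sleep=asyncio.sleep):
    """Await `call()` up to `max_attempts` times, doubling the delay after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: let the caller decide (e.g. skip the provider or NACK).
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            await sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the helper can be tested without real waiting. Real code would usually catch only transient errors (timeouts, 5xx responses) rather than every exception.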

Configuration

The settings.json file controls provider behavior:

{
  "providers": {
    "musicbrainz": { "enabled": true },
    "genius": { "enabled": true, "token": "..." },
    "wikipedia": { "enabled": true },
    "wikidata": { "enabled": true },
    "discogs": { "enabled": false },
    "allmusic": { "enabled": false },
    "metacritic": { "enabled": false },
    "lrclib": { "enabled": true }
  },
  "metadata": {
    "order": ["musicbrainz", "genius", "wikipedia", "lrclib"]
  }
}

Disabled providers are skipped. Order determines priority for conflicting data.
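
A sketch of how the enabled flags and the order list might combine into a priority list. The function name `active_providers` is hypothetical, and ranking enabled providers that are missing from "order" last (alphabetically) is an assumption; the text does not say how such providers are treated.

```python
import json

# Same shape as the settings.json example above (token omitted).
SETTINGS = json.loads("""
{
  "providers": {
    "musicbrainz": { "enabled": true },
    "genius": { "enabled": true },
    "wikipedia": { "enabled": true },
    "wikidata": { "enabled": true },
    "discogs": { "enabled": false },
    "allmusic": { "enabled": false },
    "metacritic": { "enabled": false },
    "lrclib": { "enabled": true }
  },
  "metadata": {
    "order": ["musicbrainz", "genius", "wikipedia", "lrclib"]
  }
}
""")


def active_providers(settings: dict) -> list[str]:
    """Return enabled providers, highest priority first."""
    enabled = {name for name, cfg in settings["providers"].items() if cfg.get("enabled")}
    order = settings.get("metadata", {}).get("order", [])
    ranked = [name for name in order if name in enabled]
    # Assumption: enabled providers absent from "order" come last, alphabetically.
    unranked = sorted(enabled - set(ranked))
    return ranked + unranked
```

With the example settings, `active_providers(SETTINGS)` yields musicbrainz, genius, wikipedia, lrclib, and then wikidata, which is enabled but not listed in the order.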

Front (Next.js 16, React)

Port: 3000
Framework: Next.js with SSR
UI: Material-UI components
State: Jotai atoms
Data Fetching: TanStack Query
i18n: i18next

Responsibilities

User Interface: Render pages for browsing, playback, settings
API Communication: Fetch data from Server via REST
State Management: Manage playback queue, user preferences, auth tokens
Internationalization: Support multiple languages

Page Structure

/: Home page with recent albums, top artists
/artists: Artist grid with search
/artists/:id: Artist detail with albums, songs, videos
/albums: Album grid with filters
/albums/:id: Album detail with tracks, releases
/songs: Song list with search
/songs/:id: Song detail with tracks, lyrics
/playlists: User playlists
/playlists/:id: Playlist detail with tracks
/videos: Music video grid
/videos/:id: Video player
/search: Global search results
/settings: User preferences, library management, scrobbler setup

State Management

Jotai atoms store global state:

authAtom: JWT token, user info
playbackAtom: Current track, queue, position, volume
settingsAtom: Theme, language, playback preferences

TanStack Query caches API responses:

useArtists(): Fetch artist list
useArtist(id): Fetch artist detail
useAlbums(): Fetch album list
useAlbum(id): Fetch album detail
useTracks(): Fetch track list
useSearch(query): Fetch search results

Queries invalidate on mutations (create playlist, update settings).

Playback Flow

User clicks track
playbackAtom updated with track ID
Component fetches stream URL: /api/tracks/:id/stream
HTML5 <audio> element loads stream
Playback starts
On play event, POST to /api/scrobblers/scrobble (if enabled)
On track end, advance queue, repeat flow

Video playback uses <video> element with transcoder stream.

Mobile App

Expo/React Native app shares components and state logic with web. Differences:

Navigation: React Navigation instead of Next.js router
Storage: AsyncStorage instead of localStorage
Media: expo-av instead of HTML5 audio/video
Notifications: expo-notifications for background playback

Monorepo structure:

front/
  web/          # Next.js app
  mobile/       # Expo app
  shared/       # Common components, hooks, state

Internationalization

i18next with JSON translation files:

locales/
  en/
    common.json
    artist.json
    album.json
  fr/
    common.json
    artist.json
    album.json

The language switcher lives in settings. On first visit, the app detects the browser locale.

Infrastructure Services

PostgreSQL

Port: 5432
Image: postgres:alpine3.14
Volume: meelo_db

Stores all persistent data. Prisma manages schema migrations. Health check via pg_isready.

MeiliSearch

Port: 7700
Image: meilisearch:v1.5
Volume: meelo_search

Indexes artists, albums, songs, videos. Configured with:

Searchable attributes: name, title, artist names
Filterable attributes: genre, year, type
Sortable attributes: releaseDate, name
Ranking rules: typo, words, proximity, attribute, sort, exactness

Health check via GET /health.

RabbitMQ

Port: 5672 (AMQP), 15672 (management UI)
Image: rabbitmq:4.2-alpine
Volume: meelo_rabbitmq_data

Message queue for event-driven architecture. Queues:

file.added: Scanner publishes, Matcher consumes
metadata.updated: Matcher publishes, Server consumes (future use)

Health check via rabbitmq-diagnostics ping.

Kyoo Transcoder

Port: 7666
Volume: meelo_transcoder_cache

Transcodes video files for web playback. Supports:

Adaptive bitrate streaming (HLS)
Multiple resolutions (480p, 720p, 1080p)
Codec conversion (H.264, VP9)
Subtitle burning

Server proxies requests to transcoder. Client receives HLS manifest.

Nginx

Port: 80
Image: nginx:1.29.7-alpine
Config: Mounted from nginx.conf

Routes requests to services:

location / {
    proxy_pass http://front:3000;
}

location /api/ {
    proxy_pass http://server:4000;
}

location /scanner/ {
    proxy_pass http://scanner:8133;
}

location /matcher/ {
    proxy_pass http://matcher:6789;
}

Handles WebSocket upgrades for Server events.

Inter-Service Communication

REST APIs

Front → Server: All data fetching (artists, albums, tracks, playlists)
Scanner → Server: File registration, library queries
Matcher → Server: Metadata push, file queries
Server → MeiliSearch: Index updates, search queries
Server → Transcoder: Video stream requests

Message Queue

Scanner → RabbitMQ: Publish file.added events
RabbitMQ → Matcher: Deliver file.added events

Database

Server → PostgreSQL: All CRUD operations via Prisma

Startup Orchestration

Docker Compose defines service dependencies and health checks:

PostgreSQL starts first, health check via pg_isready
MeiliSearch starts, health check via GET /health
RabbitMQ starts, health check via rabbitmq-diagnostics ping
Server starts after database/search/queue are healthy
- Runs Prisma migrations
- Seeds initial data (admin user if none exists)
- Connects to MeiliSearch and RabbitMQ
Scanner starts after Server is healthy
- Registers with Server API
- Begins filesystem watching
Matcher starts after Server and RabbitMQ are healthy
- Connects to RabbitMQ
- Begins consuming events
Front starts after Server is healthy
- SSR requires Server API for initial data
Transcoder starts independently (no dependencies)
Nginx starts last, after all application services are healthy

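Health checks run every 30 seconds. Containers that exit are restarted by their restart policy; a container that is only marked unhealthy is not restarted by Docker Compose itself and blocks dependents that wait for it to become healthy.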
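
The dependency list above forms a graph, and the startup order is a topological sort of it. The sketch below uses Python's standard `graphlib` to group services into waves that can start in parallel. The service keys and the `DEPENDS_ON` mapping are transcribed from the list above and are not taken from the actual compose file.

```python
from graphlib import TopologicalSorter

# service -> services that must be healthy before it starts
DEPENDS_ON = {
    "postgres": set(),
    "meilisearch": set(),
    "rabbitmq": set(),
    "transcoder": set(),
    "server": {"postgres", "meilisearch", "rabbitmq"},
    "scanner": {"server"},
    "matcher": {"server", "rabbitmq"},
    "front": {"server"},
    "nginx": {"front", "server", "scanner", "matcher"},
}


def startup_waves(depends_on: dict) -> list[list[str]]:
    """Group services into waves; each wave only needs earlier waves to be healthy."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(startup_waves(DEPENDS_ON), start=1):
    print(number, ", ".join(wave))
```

This prints four waves: the infrastructure services and the transcoder, then server, then front, matcher, and scanner, and finally nginx.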

Data Consistency

Transactions

Prisma transactions ensure atomicity:

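// Interactive transaction: later steps can use IDs generated by earlier ones.
await prisma.$transaction(async (tx) => {
  await tx.song.create({ data: songData });
  const track = await tx.track.create({ data: trackData });
  await tx.file.update({ where: { id: fileId }, data: { trackId: track.id } });
});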

If any operation fails, all of them are rolled back.

Event Ordering

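RabbitMQ preserves publish order within a queue for a single consumer, although redelivered (NACKed) messages can arrive out of order. Each Matcher instance processes events sequentially to avoid race conditions.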

Search Consistency

MeiliSearch applies index updates asynchronously, so there is a brief window in which the database and the search index diverge. This eventual consistency is acceptable for this use case.

Cache Invalidation

TanStack Query invalidates caches on mutations:

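import { useMutation, useQueryClient } from '@tanstack/react-query';

// Inside a React component or custom hook; createPlaylist is the API call.
const queryClient = useQueryClient();
const mutation = useMutation({
  mutationFn: createPlaylist,
  onSuccess: () => {
    // The object form works in TanStack Query v4 and is required in v5.
    queryClient.invalidateQueries({ queryKey: ['playlists'] });
  }
});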

Scalability Considerations

Horizontal Scaling

Scanner: Run multiple instances for different libraries
Matcher: Run multiple consumers for faster enrichment
Front: Stateless, can run multiple instances behind load balancer

Vertical Scaling

Server: CPU-bound for complex queries, benefits from more cores
MeiliSearch: Memory-bound, benefits from more RAM
PostgreSQL: I/O-bound, benefits from SSD and connection pooling

Bottlenecks

Matcher: Limited by external provider rate limits
Transcoder: CPU-intensive, limits concurrent video streams
Database: Complex queries (artist with all albums/songs/videos) can be slow

Monitoring and Observability

Logging

Server: NestJS Logger with configurable levels (error, warn, log, debug, verbose)
Scanner: zerolog with structured JSON output
Matcher: Python logging with JSON formatter
Front: Console logs in development, silent in production

All logs written to stdout, captured by Docker.

Health Checks

Every application service exposes a health endpoint:

Server: GET /api/health
Scanner: GET /
Matcher: GET /health
Front: GET /api/health (Next.js API route)

Docker Compose monitors these endpoints.

Metrics

No built-in Prometheus metrics. Future enhancement.

Security Architecture

Authentication

JWT: Signed tokens with expiration
API Keys: x-api-key header for Scanner/Matcher
Bcrypt: Password hashing with salt rounds = 10

Authorization

Admin Flag: Users have isAdmin boolean
Ownership: Users can only modify their own playlists
Public Playlists: Readable by all, writable by owner or if allowChanges=true
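
The playlist rules above reduce to two predicates. This is a minimal sketch: the `Playlist` dataclass and its field names are hypothetical, private playlists are assumed to be readable only by their owner, and any extra rights of admin users are left out because the text does not describe them.

```python
from dataclasses import dataclass


@dataclass
class Playlist:
    owner_id: int
    is_public: bool = False
    allow_changes: bool = False


def can_read(user_id: int, playlist: Playlist) -> bool:
    # Public playlists are readable by everyone; private ones by the owner only.
    return playlist.is_public or playlist.owner_id == user_id


def can_write(user_id: int, playlist: Playlist) -> bool:
    # The owner can always write; others only on public playlists that allow changes.
    return playlist.owner_id == user_id or (playlist.is_public and playlist.allow_changes)
```

Keeping the checks as pure functions makes them easy to unit-test independently of the HTTP layer.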

Network Isolation

Docker Compose creates a private network. In production, only Nginx publishes a port (80) to the host; the other services are reachable only from inside that network.

Input Validation

Server: NestJS validation pipes with class-validator
Scanner: Go struct validation
Matcher: Pydantic models

Invalid input returns 400 Bad Request.

SQL Injection

Prisma uses parameterized queries. No raw SQL in codebase.

XSS Protection

React escapes output by default. No dangerouslySetInnerHTML except for sanitized lyrics.

Deployment Variants

Production (docker-compose.yml)

Pre-built images from Docker Hub. Environment variables from .env. Volumes for persistence. Restart policy: always.

Development (docker-compose.dev.yml)

Mounted source directories. Hot reload enabled. Exposed ports for debugging (PostgreSQL 5432, MeiliSearch 7700, RabbitMQ 15672). Restart policy: unless-stopped.

Local Build (docker-compose.local.yml)

Builds images from source using Dockerfiles. Tests changes before pushing to Docker Hub. Same volumes and network as production.

Configuration Management

Environment Variables (.env)

Deployment-specific settings:

PORT: Server port (default 4000)
PUBLIC_URL: External URL for OAuth callbacks
CONFIG_DIR: Path to settings.json
DATA_DIR: Path to music files
JWT_SIGNATURE: Secret for signing tokens
GENIUS_ACCESS_TOKEN: Genius API key
DISCOGS_ACCESS_TOKEN: Discogs API key
LASTFM_API_KEY, LASTFM_API_SECRET: Last.fm OAuth

Settings File (settings.json)

User preferences:

trackRegex: Filename parsing pattern
metadata.source: Prefer embedded tags or external providers
metadata.order: Provider priority list
providers: Enable/disable specific providers
compilations: Rules for detecting compilation albums

Server reads settings.json on startup. Changes require restart.

Error Recovery

Service Failures

Docker restart policy handles crashes. Health checks detect hung processes.

Database Corruption

PostgreSQL volume backups recommended. Restore from backup if corruption detected.

Message Queue Failures

RabbitMQ persists messages to disk when queues are durable and messages are published as persistent. Unacknowledged messages are redelivered after a restart.

Search Index Corruption

Rebuild MeiliSearch index from database:

curl -X POST http://localhost:4000/api/search/reindex

Server iterates all entities, pushes to MeiliSearch.

Performance Optimization

Database Indexes

Prisma schema defines indexes on:

Foreign keys (artistId, albumId, songId)
Unique constraints (slug, checksum)
Frequently queried fields (releaseDate, type)

Query Optimization

Eager Loading: Prisma include to avoid N+1 queries
Pagination: Limit/offset for large result sets
Caching: TanStack Query caches API responses client-side

Asset Optimization

Images: Illustrations stored as blurhash + URL
Lazy Loading: Front loads images on scroll
Code Splitting: Next.js splits bundles per page

Testing Strategy

Unit Tests

Server: Jest tests for services, controllers, utilities
Matcher: pytest tests for provider modules
Scanner: Go tests for file parsing, fingerprinting

Integration Tests

Server: Test API endpoints with in-memory database
Matcher: Mock external provider responses

End-to-End Tests

Not implemented. Future enhancement with Playwright.

Coverage

SonarCloud tracks coverage per service. Minimum threshold: 80%.

Summary

Meelo's architecture separates concerns across four microservices, each optimized for its task. The event-driven design decouples scanning from enrichment, enabling parallel processing and fault tolerance. Infrastructure services (PostgreSQL, MeiliSearch, RabbitMQ) provide persistence, search, and messaging. Docker Compose orchestrates startup order and health monitoring. The result is a scalable, maintainable system that handles complex metadata workflows without blocking user interactions.
