Meelo Architecture
System Overview
Meelo implements a microservices architecture with four application services and five infrastructure services (PostgreSQL, MeiliSearch, RabbitMQ, the Kyoo transcoder, and Nginx), orchestrated via Docker Compose. Each service has a single responsibility and communicates through well-defined interfaces (REST APIs, message queues).
┌─────────────────────────────────────────────────────────────┐
│                            Nginx                            │
│                   Reverse Proxy (Port 80)                   │
│   Routes: / → Front, /api/ → Server, /scanner/ → Scanner    │
└─────────────────────────────────────────────────────────────┘
         │              │                 │                │
    ┌────┘         ┌────┘            ┌────┘           ┌────┘
    │              │                 │                │
┌───▼────┐    ┌────▼─────┐    ┌──────▼───┐     ┌──────▼────┐
│ Front  │    │  Server  │    │ Scanner  │     │  Matcher  │
│ Next.js│    │  NestJS  │    │    Go    │     │  FastAPI  │
│ :3000  │    │  :4000   │    │  :8133   │     │   :6789   │
└────────┘    └────┬─────┘    └────┬─────┘     └─────┬─────┘
                   │               │                 │
          ┌────────┼───────────────┼─────────────────┘
          │        │               │
     ┌────▼────┐ ┌─▼──────────┐  ┌─▼──────────┐
     │ Postgres│ │ MeiliSearch│  │  RabbitMQ  │
     │  :5432  │ │   :7700    │  │   :5672    │
     └─────────┘ └────────────┘  └────────────┘
Service Responsibilities
Server (NestJS 11, TypeScript)
Port: 4000
Database: PostgreSQL via Prisma ORM
Search: MeiliSearch client
Messaging: RabbitMQ publisher
Module Structure
NestJS organizes code into modules. Each module encapsulates related functionality:
Core Domain Modules
- ArtistModule: CRUD operations, relationships to albums/songs/videos
- AlbumModule: Album management, release associations
- SongModule: Song entities, track relationships, lyrics
- TrackModule: Individual track instances (audio/video)
- ReleaseModule: Physical/digital release variants
- GenreModule: Genre taxonomy and associations
- VideoModule: Music video management
Supporting Modules
- AuthModule: JWT authentication, user registration, login
- UserModule: User management, preferences, scrobbler connections
- LibraryModule: Library configuration, scan triggers
- FileModule: File metadata, checksums, fingerprints
- PlaylistModule: Playlist CRUD, entry management
- LyricsModule: Plain and synced lyrics storage
Integration Modules
- ExternalMetadataModule: Provider data aggregation
- SearchModule: MeiliSearch indexing and queries
- ScrobblerModule: Last.fm and ListenBrainz integration
- StreamModule: Audio/video streaming endpoints
- EventsModule: WebSocket notifications for UI updates
Infrastructure Modules
- PrismaModule: Database connection and ORM
- MeiliSearchModule: Search client configuration
- RabbitMQModule: Message queue publisher
Data Flow
- Incoming Request: Nginx forwards to Server at /api/*
- Controller: Route handler validates request, extracts JWT
- Service: Business logic executes, calls Prisma for data
- Repository: Prisma queries PostgreSQL
- Response: JSON returned to client
For write operations:
- Service updates database via Prisma
- Service publishes event to RabbitMQ (if needed)
- Service updates MeiliSearch index
- Service emits WebSocket event for live UI updates
Authentication Flow
- User submits credentials to /api/auth/login
- AuthService validates against bcrypt hash in database
- JWT signed with JWT_SIGNATURE from .env
- Token returned to client
- Client includes token in Authorization: Bearer <token> header
- JwtStrategy validates token on protected routes
- User object attached to request context
Anonymous mode (ALLOW_ANONYMOUS=1) bypasses this flow.
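The signing and validation steps can be illustrated with a minimal sketch. It assumes the HS256 algorithm (the document does not name one) and uses only the Python standard library rather than the Server's NestJS stack; sign_token and verify_token are hypothetical names, and the secret stands in for JWT_SIGNATURE.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWT strips."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: str) -> bytes:
    """HMAC-SHA256 over 'header.payload' (HS256 is an assumption here)."""
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(payload: dict, secret: str) -> str:
    """Build a token of the form header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_signature(signing_input, secret))}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if signature and expiry are valid, otherwise None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
        expected = _signature(f"{header_b64}.{payload_b64}", secret)
    except ValueError:  # malformed token, bad base64, or bad JSON
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not isinstance(payload, dict):
        return None
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        return None
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        return None
    return payload
```

A guard on a protected route would call verify_token with the value from the Authorization header and reject the request when it returns None.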
Scrobbling Flow
- User authorizes Last.fm via OAuth (callback to /api/scrobblers/lastfm/callback)
- Server exchanges code for access token
- Token stored in UserScrobbler table
- On track play, ScrobblerService posts to Last.fm API
- ListenBrainz uses simpler token-based auth (user provides token directly)
Search Integration
- On entity creation/update, service calls MeiliSearchService.index()
- Service transforms entity to search document
- Document pushed to MeiliSearch via HTTP API
- Client queries /api/search?q=<term>
- Server forwards to MeiliSearch
- Results enriched with database data (illustrations, counts)
- JSON returned to client
Scanner (Go 1.25, Echo v5)
Port: 8133
Framework: Echo HTTP server
Dependencies: FFmpeg, FFprobe, AcoustID
Responsibilities
- Filesystem Watching: Monitor library directories for changes
- Metadata Extraction: Parse audio/video files using FFprobe
- Fingerprinting: Generate AcoustID fingerprints for matching
- Filename Parsing: Apply regex from settings.json to extract metadata
- File Registration: POST file metadata to Server API
- Match Triggering: Publish events to RabbitMQ for Matcher consumption
Scan Process
- Trigger: POST to /scanner/scan/:libraryId or filesystem event
- Discovery: Walk directory tree, filter by extension (.mp3, .flac, .m4a, .mkv, etc.)
- Extraction: For each file:
  - Run FFprobe to get duration, bitrate, codec, embedded tags
  - Generate AcoustID fingerprint using chromaprint
  - Parse filename using regex from settings.json
  - Calculate file checksum (SHA256)
- Registration: POST to Server /api/files with:
  - File path
  - Checksum
  - Fingerprint
  - Extracted metadata (title, artist, album, track number)
  - Technical details (duration, bitrate, codec)
- Event Publishing: Publish to RabbitMQ queue file.added with file ID
- Repeat: Process next file
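The discovery and checksum steps can be sketched as follows. This is an illustration in Python rather than the Scanner's Go, and it leaves out FFprobe, fingerprinting, and the HTTP call. The extension set covers only the four extensions named above, and the payload keys path and checksum are hypothetical, not the Server's actual field names.

```python
import hashlib
import os
from pathlib import Path
from typing import Iterator

# Only the extensions the document names; the real list is longer ("etc.").
MEDIA_EXTENSIONS = {".mp3", ".flac", ".m4a", ".mkv"}


def discover_media(root: Path) -> Iterator[Path]:
    """Walk the library tree and yield files with a recognised extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() in MEDIA_EXTENSIONS:
                yield path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large media files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_registration(root: Path, path: Path) -> dict:
    """Assemble the part of the registration payload that needs no FFprobe.

    The key names are illustrative assumptions.
    """
    return {
        "path": path.relative_to(root).as_posix(),
        "checksum": sha256_of(path),
    }
```

Hashing in fixed-size chunks keeps memory use flat regardless of file size, which matters for video files.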
Filename Regex
settings.json contains a trackRegex pattern. Example:
(?P<artist>[^/]+)/(?P<album>[^/]+)/(?P<disc>\d+)-(?P<track>\d+) (?P<title>.+)\.(?P<ext>\w+)
Named capture groups extract metadata when embedded tags are missing or untrusted.
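To see what the example pattern extracts, here is a small sketch using Python's re module, which accepts the same (?P<name>...) syntax. The sample path and the parse_path helper are made up for illustration, and converting disc and track to integers is an added convenience, not something the document specifies.

```python
import re
from typing import Optional

# The example pattern from the document, unchanged.
TRACK_REGEX = re.compile(
    r"(?P<artist>[^/]+)/(?P<album>[^/]+)/(?P<disc>\d+)-(?P<track>\d+) "
    r"(?P<title>.+)\.(?P<ext>\w+)"
)


def parse_path(relative_path: str) -> Optional[dict]:
    """Return the named groups as a dict, or None when the path does not match."""
    match = TRACK_REGEX.search(relative_path)
    if match is None:
        return None
    fields = match.groupdict()
    # Numeric groups arrive as strings; convert them for convenience.
    fields["disc"] = int(fields["disc"])
    fields["track"] = int(fields["track"])
    return fields


if __name__ == "__main__":
    print(parse_path("Daft Punk/Discovery/1-03 Digital Love.flac"))
```

A path that does not follow the artist/album/disc-track layout returns None, at which point the embedded tags are the only metadata source.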
Health Monitoring
Scanner exposes GET / endpoint. Returns JSON with:
- Service status
- Active scan tasks
- Last scan timestamp
- Library statistics
Docker health check hits this endpoint every 30 seconds.
Error Handling
- File Read Errors: Log and skip file, continue scan
- FFprobe Failures: Retry once, then skip
- Server API Errors: Retry with exponential backoff (max 3 attempts)
- RabbitMQ Unavailable: Queue events in memory, flush when connection restored
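The "retry with exponential backoff (max 3 attempts)" rule can be sketched like this. The helper is written in Python for illustration, not in the Scanner's Go; the function name, the 0.5 second base delay, and the doubling factor are assumptions, since the document only fixes the attempt limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,  # assumed; the document gives no delay values
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached.

    The wait doubles after each failure: base_delay, 2*base_delay, ...
    The last failure is re-raised so the caller can log and skip.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Passing sleep as a parameter lets tests record the delays instead of actually waiting.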
Matcher (Python 3.14, FastAPI)
Port: 6789
Framework: FastAPI with async HTTP
Messaging: RabbitMQ consumer
Responsibilities
- Event Consumption: Listen to RabbitMQ file.added queue
- Provider Queries: Fetch metadata from 8 external sources
- Data Aggregation: Merge results based on priority in settings.json
- Metadata Push: POST enriched data to Server API
Provider Architecture
Each provider is a separate module implementing a common interface:
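from __future__ import annotations  # metadata types are defined elsewhere

from abc import ABC, abstractmethod
from typing import Optional


class Provider(ABC):
    @abstractmethod
    async def search_track(self, fingerprint: str, title: str, artist: str) -> Optional[TrackMetadata]:
        """Look up a track by fingerprint, title, and artist."""

    @abstractmethod
    async def fetch_artist(self, artist_id: str) -> Optional[ArtistMetadata]:
        """Fetch artist details by provider-specific ID."""

    @abstractmethod
    async def fetch_album(self, album_id: str) -> Optional[AlbumMetadata]:
        """Fetch album details by provider-specific ID."""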
Provider Modules
- musicbrainz.py: Primary database, uses musicbrainzngs library
- genius.py: Lyrics and song descriptions, requires API token
- wikipedia.py: Artist/album context, uses Wikipedia API
- wikidata.py: Structured data (areas, relationships), SPARQL queries
- discogs.py: Release details, requires API token
- allmusic.py: Editorial reviews, web scraping (no official API)
- metacritic.py: Critic scores, web scraping
- lrclib.py: Synced lyrics, public API
Matching Flow
- Event Received: RabbitMQ delivers file.added message with file ID
- File Fetch: GET /api/files/:id from Server to retrieve metadata
- Provider Selection: Read settings.json for enabled providers and priority
- Parallel Queries: Launch async tasks for each provider:
  - MusicBrainz: Query by AcoustID fingerprint
  - Genius: Search by title + artist
  - Wikipedia: Search by artist name
  - Wikidata: Query by MusicBrainz ID (if found)
  - Discogs: Search by release title
  - AllMusic: Scrape by artist + album
  - Metacritic: Scrape by album title
  - LrcLib: Search by title + artist + duration
- Result Aggregation: Merge results based on priority:
  - MusicBrainz IDs take precedence
  - Lyrics: prefer synced (LrcLib) over plain (Genius)
  - Descriptions: concatenate from multiple sources
  - Ratings: average across providers
- Metadata Push: POST to Server /api/external-metadata with:
  - Track/album/artist IDs
  - Descriptions
  - Ratings
  - Source URLs
  - Provider names
- Acknowledgment: ACK message to RabbitMQ
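The parallel query and aggregation steps can be sketched with asyncio. This is a simplified illustration, not the Matcher's code. Providers are modelled as plain async callables returning dicts. The field names (musicbrainz_id, description, rating, synced_lyrics, plain_lyrics) are hypothetical, and placing providers missing from the priority list after the ranked ones is an assumption. The Wikidata step, which depends on a MusicBrainz ID being found first, is not modelled.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

ProviderQuery = Callable[[], Awaitable[Optional[dict]]]

# Fields with their own merge rule; everything else is "first provider wins".
SPECIAL_FIELDS = {"description", "rating", "synced_lyrics", "plain_lyrics"}


async def query_all(providers: Dict[str, ProviderQuery], timeout: float = 5.0) -> Dict[str, dict]:
    """Run all providers concurrently; a timeout or error skips that provider."""

    async def guarded(name: str, query: ProviderQuery):
        try:
            return name, await asyncio.wait_for(query(), timeout)
        except Exception:
            return name, None

    pairs = await asyncio.gather(*(guarded(name, query) for name, query in providers.items()))
    return {name: result for name, result in pairs if result}


def merge(results: Dict[str, dict], order: List[str]) -> dict:
    """Merge provider results following the priority list."""
    ranked = [name for name in order if name in results]
    ranked += sorted(name for name in results if name not in order)  # assumption
    merged: dict = {}
    for name in ranked:
        for key, value in results[name].items():
            if key not in SPECIAL_FIELDS:
                merged.setdefault(key, value)  # highest-priority value is kept
    descriptions = [results[n]["description"] for n in ranked if results[n].get("description")]
    if descriptions:
        merged["description"] = "\n\n".join(descriptions)  # concatenate sources
    ratings = [results[n]["rating"] for n in ranked if results[n].get("rating") is not None]
    if ratings:
        merged["rating"] = sum(ratings) / len(ratings)  # average across providers
    for field in ("synced_lyrics", "plain_lyrics"):  # synced preferred over plain
        lyrics = next((results[n][field] for n in ranked if results[n].get(field)), None)
        if lyrics:
            merged["lyrics"] = lyrics
            break
    return merged
```

Wrapping each query in its own guard means one slow or failing provider cannot cancel the others, which matches the "skip provider, continue with others" behavior.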
Rate Limiting
Providers have different rate limits:
- MusicBrainz: 1 request/second (enforced by library)
- Genius: 10 requests/second (API limit)
- Wikipedia: No official limit, use 5 requests/second
- Wikidata: No limit, SPARQL endpoint is fast
- Discogs: 60 requests/minute (API limit)
- AllMusic: No API, scraping limited to 1 request/second
- Metacritic: No API, scraping limited to 1 request/second
- LrcLib: No official limit, use 10 requests/second
Matcher implements per-provider rate limiters using aiolimiter.
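As an illustration of per-provider limiting, the sketch below spaces requests evenly using only the standard library. It is a stand-in for the idea, not aiolimiter's API or algorithm; the class name and the injected clock are assumptions made so the behavior can be checked without real waiting.

```python
import asyncio
import time
from typing import Callable


class IntervalLimiter:
    """Allow at most `rate` acquisitions per `period` seconds, evenly spaced."""

    def __init__(self, rate: float, period: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = period / rate
        self._clock = clock
        self._next_slot = clock()

    def reserve(self) -> float:
        """Claim the next free slot and return how long to wait for it."""
        now = self._clock()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        return slot - now

    async def acquire(self) -> None:
        """Sleep until the reserved slot; call before each provider request."""
        await asyncio.sleep(self.reserve())


# One limiter per provider, using the limits listed above.
LIMITERS = {
    "musicbrainz": IntervalLimiter(rate=1, period=1.0),
    "discogs": IntervalLimiter(rate=60, period=60.0),
}
```

Each provider coroutine would await its own limiter's acquire() before issuing a request, so a slow limit on one provider does not throttle the others.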
Error Handling
- Provider Timeout: Skip provider, continue with others
- HTTP Errors: Retry with exponential backoff (max 3 attempts)
- Parsing Errors: Log and skip provider result
- Server API Errors: NACK message to RabbitMQ for redelivery
- No Results: Push empty metadata (Server marks as "not found")
Configuration
Settings.json controls provider behavior:
{
  "providers": {
    "musicbrainz": { "enabled": true },
    "genius": { "enabled": true, "token": "..." },
    "wikipedia": { "enabled": true },
    "wikidata": { "enabled": true },
    "discogs": { "enabled": false },
    "allmusic": { "enabled": false },
    "metacritic": { "enabled": false },
    "lrclib": { "enabled": true }
  },
  "metadata": {
    "order": ["musicbrainz", "genius", "wikipedia", "lrclib"]
  }
}
Disabled providers are skipped. Order determines priority for conflicting data.
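A short sketch of how such a file could be turned into a prioritized provider list. The function name is hypothetical, and the treatment of a provider that is enabled but missing from metadata.order (wikidata in the example) is an assumption: here it is kept at the lowest priority.

```python
import json
from typing import List

SETTINGS_JSON = """
{
  "providers": {
    "musicbrainz": { "enabled": true },
    "genius": { "enabled": true, "token": "..." },
    "wikipedia": { "enabled": true },
    "wikidata": { "enabled": true },
    "discogs": { "enabled": false },
    "allmusic": { "enabled": false },
    "metacritic": { "enabled": false },
    "lrclib": { "enabled": true }
  },
  "metadata": {
    "order": ["musicbrainz", "genius", "wikipedia", "lrclib"]
  }
}
"""


def active_providers(settings: dict) -> List[str]:
    """Return enabled providers, highest priority first."""
    providers = settings.get("providers", {})
    enabled = [name for name, options in providers.items() if options.get("enabled")]
    order = settings.get("metadata", {}).get("order", [])
    ranked = [name for name in order if name in enabled]
    # Assumption: enabled providers absent from "order" rank last.
    unranked = [name for name in enabled if name not in order]
    return ranked + unranked


if __name__ == "__main__":
    print(active_providers(json.loads(SETTINGS_JSON)))
```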
Front (Next.js 16, React)
Port: 3000
Framework: Next.js with SSR
UI: Material-UI components
State: Jotai atoms
Data Fetching: TanStack Query
i18n: i18next
Responsibilities
- User Interface: Render pages for browsing, playback, settings
- API Communication: Fetch data from Server via REST
- State Management: Manage playback queue, user preferences, auth tokens
- Internationalization: Support multiple languages
Page Structure
- /: Home page with recent albums, top artists
- /artists: Artist grid with search
- /artists/:id: Artist detail with albums, songs, videos
- /albums: Album grid with filters
- /albums/:id: Album detail with tracks, releases
- /songs: Song list with search
- /songs/:id: Song detail with tracks, lyrics
- /playlists: User playlists
- /playlists/:id: Playlist detail with tracks
- /videos: Music video grid
- /videos/:id: Video player
- /search: Global search results
- /settings: User preferences, library management, scrobbler setup
State Management
Jotai atoms store global state:
- authAtom: JWT token, user info
- playbackAtom: Current track, queue, position, volume
- settingsAtom: Theme, language, playback preferences
TanStack Query caches API responses:
- useArtists(): Fetch artist list
- useArtist(id): Fetch artist detail
- useAlbums(): Fetch album list
- useAlbum(id): Fetch album detail
- useTracks(): Fetch track list
- useSearch(query): Fetch search results
Queries invalidate on mutations (create playlist, update settings).
Playback Flow
- User clicks track
- playbackAtom updated with track ID
- Component fetches stream URL: /api/tracks/:id/stream
- HTML5 <audio> element loads stream
- Playback starts
- On play event, POST to /api/scrobblers/scrobble (if enabled)
- On track end, advance queue, repeat flow
Video playback uses <video> element with transcoder stream.
Mobile App
Expo/React Native app shares components and state logic with web. Differences:
- Navigation: React Navigation instead of Next.js router
- Storage: AsyncStorage instead of localStorage
- Media: expo-av instead of HTML5 audio/video
- Notifications: expo-notifications for background playback
Monorepo structure:
front/
  web/      # Next.js app
  mobile/   # Expo app
  shared/   # Common components, hooks, state
Internationalization
i18next with JSON translation files:
locales/
  en/
    common.json
    artist.json
    album.json
  fr/
    common.json
    artist.json
    album.json
Language switcher in settings. Detects browser locale on first visit.
Infrastructure Services
PostgreSQL
Port: 5432
Image: postgres:alpine3.14
Volume: meelo_db
Stores all persistent data. Prisma manages schema migrations. Health check via pg_isready.
MeiliSearch
Port: 7700
Image: meilisearch:v1.5
Volume: meelo_search
Indexes artists, albums, songs, videos. Configured with:
- Searchable attributes: name, title, artist names
- Filterable attributes: genre, year, type
- Sortable attributes: releaseDate, name
- Ranking rules: typo, words, proximity, attribute, sort, exactness
Health check via GET /health.
RabbitMQ
Port: 5672 (AMQP), 15672 (management UI)
Image: rabbitmq:4.2-alpine
Volume: meelo_rabbitmq_data
Message queue for event-driven architecture. Queues:
- file.added: Scanner publishes, Matcher consumes
- metadata.updated: Matcher publishes, Server consumes (future use)
Health check via rabbitmq-diagnostics ping.
Kyoo Transcoder
Port: 7666
Volume: meelo_transcoder_cache
Transcodes video files for web playback. Supports:
- Adaptive bitrate streaming (HLS)
- Multiple resolutions (480p, 720p, 1080p)
- Codec conversion (H.264, VP9)
- Subtitle burning
Server proxies requests to transcoder. Client receives HLS manifest.
Nginx
Port: 80
Image: nginx:1.29.7-alpine
Config: Mounted from nginx.conf
Routes requests to services:
location / {
    proxy_pass http://front:3000;
}
location /api/ {
    proxy_pass http://server:4000;
}
location /scanner/ {
    proxy_pass http://scanner:8133;
}
location /matcher/ {
    proxy_pass http://matcher:6789;
}
Handles WebSocket upgrades for Server events.
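For prefix locations like these, Nginx selects the longest prefix that matches the request path. The sketch below mimics that selection rule in Python to show which upstream a given path reaches. It models only plain prefix locations, not regex locations or modifiers, and resolve_upstream is a hypothetical helper.

```python
from typing import Dict

# Location prefix -> upstream, copied from the configuration above.
ROUTES: Dict[str, str] = {
    "/": "http://front:3000",
    "/api/": "http://server:4000",
    "/scanner/": "http://scanner:8133",
    "/matcher/": "http://matcher:6789",
}


def resolve_upstream(path: str) -> str:
    """Return the upstream whose location prefix is the longest match."""
    best = max(
        (prefix for prefix in ROUTES if path.startswith(prefix)),
        key=len,
        default="/",
    )
    return ROUTES[best]


if __name__ == "__main__":
    for sample in ("/albums/3", "/api/artists", "/scanner/scan/1", "/apiary"):
        print(sample, "->", resolve_upstream(sample))
```

Note that /apiary falls through to Front because the /api/ prefix includes the trailing slash.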
Inter-Service Communication
REST APIs
- Front → Server: All data fetching (artists, albums, tracks, playlists)
- Scanner → Server: File registration, library queries
- Matcher → Server: Metadata push, file queries
- Server → MeiliSearch: Index updates, search queries
- Server → Transcoder: Video stream requests
Message Queue
- Scanner → RabbitMQ: Publish file.added events
- RabbitMQ → Matcher: Deliver file.added events
Database
- Server → PostgreSQL: All CRUD operations via Prisma
Startup Orchestration
Docker Compose defines service dependencies and health checks:
- PostgreSQL starts first, health check via pg_isready
- MeiliSearch starts, health check via GET /health
- RabbitMQ starts, health check via rabbitmq-diagnostics ping
- Server starts after database/search/queue are healthy
  - Runs Prisma migrations
  - Seeds initial data (admin user if none exists)
  - Connects to MeiliSearch and RabbitMQ
- Scanner starts after Server is healthy
  - Registers with Server API
  - Begins filesystem watching
- Matcher starts after Server and RabbitMQ are healthy
  - Connects to RabbitMQ
  - Begins consuming events
- Front starts after Server is healthy
  - SSR requires Server API for initial data
- Transcoder starts independently (no dependencies)
- Nginx starts last, after all application services are healthy
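Health checks run every 30 seconds. Crashed containers restart automatically through the restart policy; Docker Compose only marks a container as unhealthy and does not restart it on that status alone.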
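The startup order follows from the dependency graph. A small sketch with Python's graphlib shows how the described dependencies resolve into waves of services that can start together. The service keys are assumed names rather than the actual Compose service names, and the graph is transcribed from the list above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# service -> services that must be healthy first (transcribed from the text)
DEPENDS_ON: Dict[str, Set[str]] = {
    "postgres": set(),
    "meilisearch": set(),
    "rabbitmq": set(),
    "transcoder": set(),
    "server": {"postgres", "meilisearch", "rabbitmq"},
    "scanner": {"server"},
    "matcher": {"server", "rabbitmq"},
    "front": {"server"},
    "nginx": {"front", "server", "scanner", "matcher"},
}


def startup_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into waves; each wave only needs the earlier ones."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(startup_waves(DEPENDS_ON), start=1):
        print(number, wave)
```

A circular dependency between services would surface as a CycleError here, which is a cheap check to run when the dependency list changes.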
Data Consistency
Transactions
Prisma transactions ensure atomicity:
await prisma.$transaction([
  prisma.song.create({ data: songData }),
  prisma.track.create({ data: trackData }),
  prisma.file.update({ where: { id: fileId }, data: { trackId } })
]);
If any operation fails, all of them are rolled back.
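The all-or-nothing behavior can be demonstrated with an analogous sketch using Python's built-in sqlite3 module instead of Prisma and PostgreSQL. The three-table schema and the register_track helper are hypothetical and exist only to show the rollback.

```python
import sqlite3

# Hypothetical schema, reduced to what the demonstration needs.
SCHEMA = """
CREATE TABLE song  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE track (id INTEGER PRIMARY KEY, name TEXT NOT NULL, song_id INTEGER NOT NULL);
CREATE TABLE file  (id INTEGER PRIMARY KEY, path TEXT NOT NULL, track_id INTEGER);
"""


def register_track(conn: sqlite3.Connection, song_name: str, track_name: str, file_id: int) -> None:
    """Create a song and a track, then link the file, as one atomic unit."""
    with conn:  # commits on success, rolls back if the block raises
        song_id = conn.execute(
            "INSERT INTO song (name) VALUES (?)", (song_name,)
        ).lastrowid
        track_id = conn.execute(
            "INSERT INTO track (name, song_id) VALUES (?, ?)", (track_name, song_id)
        ).lastrowid
        updated = conn.execute(
            "UPDATE file SET track_id = ? WHERE id = ?", (track_id, file_id)
        ).rowcount
        if updated != 1:
            # Raising here undoes both inserts above.
            raise LookupError(f"file {file_id} does not exist")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO file (id, path) VALUES (1, 'a.flac')")
    conn.commit()
    register_track(conn, "Song", "Song (Album Version)", 1)
    try:
        register_track(conn, "Other", "Other", 999)  # unknown file: rolled back
    except LookupError as error:
        print("rolled back:", error)
    print(conn.execute("SELECT COUNT(*) FROM song").fetchone()[0])
```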
Event Ordering
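RabbitMQ delivers messages in FIFO order within a queue, and Matcher processes events sequentially to avoid race conditions. This ordering holds for a single consumer; redelivered messages and multiple competing consumers can be processed out of order.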
Search Consistency
MeiliSearch updates are asynchronous, so there is a brief window in which the database and the search index diverge. This eventual consistency is acceptable for this use case.
Cache Invalidation
TanStack Query invalidates caches on mutations:
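const queryClient = useQueryClient();

const mutation = useMutation({
  mutationFn: createPlaylist,
  onSuccess: () => {
    // Mark cached playlist queries as stale so they refetch
    queryClient.invalidateQueries({ queryKey: ['playlists'] });
  }
});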
Scalability Considerations
Horizontal Scaling
- Scanner: Run multiple instances for different libraries
- Matcher: Run multiple consumers for faster enrichment
- Front: Stateless, can run multiple instances behind load balancer
Vertical Scaling
- Server: CPU-bound for complex queries, benefits from more cores
- MeiliSearch: Memory-bound, benefits from more RAM
- PostgreSQL: I/O-bound, benefits from SSD and connection pooling
Bottlenecks
- Matcher: Limited by external provider rate limits
- Transcoder: CPU-intensive, limits concurrent video streams
- Database: Complex queries (artist with all albums/songs/videos) can be slow
Monitoring and Observability
Logging
- Server: NestJS Logger with configurable levels (error, warn, info, debug)
- Scanner: zerolog with structured JSON output
- Matcher: Python logging with JSON formatter
- Front: Console logs in development, silent in production
All logs written to stdout, captured by Docker.
Health Checks
Every service exposes health endpoint:
- Server: GET /api/health
- Scanner: GET /
- Matcher: GET /health
- Front: GET /api/health (Next.js API route)
Docker Compose monitors these endpoints.
Metrics
No built-in Prometheus metrics. Future enhancement.
Security Architecture
Authentication
- JWT: Signed tokens with expiration
- API Keys: x-api-key header for Scanner/Matcher
- Bcrypt: Password hashing with salt rounds = 10
Authorization
- Admin Flag: Users have isAdmin boolean
- Ownership: Users can only modify their own playlists
- Public Playlists: Readable by all, writable by owner or if allowChanges=true
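The playlist rules can be written as two small predicates. This is a sketch in Python rather than the Server's TypeScript; the dataclasses and field names are hypothetical, and no admin override is modelled because the document does not describe what the isAdmin flag permits for playlists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int


@dataclass(frozen=True)
class Playlist:
    owner_id: int
    is_public: bool = False
    allow_changes: bool = False  # mirrors allowChanges


def can_read(user: User, playlist: Playlist) -> bool:
    """Public playlists are readable by everyone, private ones by the owner."""
    return playlist.is_public or playlist.owner_id == user.id


def can_write(user: User, playlist: Playlist) -> bool:
    """The owner can always write; others only on public, open playlists."""
    if playlist.owner_id == user.id:
        return True
    return playlist.is_public and playlist.allow_changes
```

Keeping the rules in pure functions like these makes them easy to unit-test separately from the HTTP layer.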
Network Isolation
Docker Compose creates a private network. Only Nginx publishes a port (80) to the host; internal services are not accessible from the host.
Input Validation
- Server: NestJS validation pipes with class-validator
- Scanner: Go struct validation
- Matcher: Pydantic models
Invalid input returns 400 Bad Request.
SQL Injection
Prisma uses parameterized queries. No raw SQL in codebase.
XSS Protection
React escapes output by default. No dangerouslySetInnerHTML except for sanitized lyrics.
Deployment Variants
Production (docker-compose.yml)
Pre-built images from Docker Hub. Environment variables from .env. Volumes for persistence. Restart policy: always.
Development (docker-compose.dev.yml)
Mounted source directories. Hot reload enabled. Exposed ports for debugging (PostgreSQL 5432, MeiliSearch 7700, RabbitMQ 15672). Restart policy: unless-stopped.
Local Build (docker-compose.local.yml)
Builds images from source using Dockerfiles. Tests changes before pushing to Docker Hub. Same volumes and network as production.
Configuration Management
Environment Variables (.env)
Deployment-specific settings:
- PORT: Server port (default 4000)
- PUBLIC_URL: External URL for OAuth callbacks
- CONFIG_DIR: Path to settings.json
- DATA_DIR: Path to music files
- JWT_SIGNATURE: Secret for signing tokens
- GENIUS_ACCESS_TOKEN: Genius API key
- DISCOGS_ACCESS_TOKEN: Discogs API key
- LASTFM_API_KEY, LASTFM_API_SECRET: Last.fm OAuth
Settings File (settings.json)
User preferences:
- trackRegex: Filename parsing pattern
- metadata.source: Prefer embedded tags or external providers
- metadata.order: Provider priority list
- providers: Enable/disable specific providers
- compilations: Rules for detecting compilation albums
Server reads settings.json on startup. Changes require restart.
Error Recovery
Service Failures
Docker restart policy handles crashes. Health checks detect hung processes.
Database Corruption
PostgreSQL volume backups recommended. Restore from backup if corruption detected.
Message Queue Failures
RabbitMQ persists messages to disk. Unacknowledged messages redelivered on restart.
Search Index Corruption
Rebuild MeiliSearch index from database:
curl -X POST http://localhost:4000/api/search/reindex
Server iterates all entities, pushes to MeiliSearch.
Performance Optimization
Database Indexes
Prisma schema defines indexes on:
- Foreign keys (artistId, albumId, songId)
- Unique constraints (slug, checksum)
- Frequently queried fields (releaseDate, type)
Query Optimization
- Eager Loading: Prisma include to avoid N+1 queries
- Pagination: Limit/offset for large result sets
- Caching: TanStack Query caches API responses client-side
Asset Optimization
- Images: Illustrations stored as blurhash + URL
- Lazy Loading: Front loads images on scroll
- Code Splitting: Next.js splits bundles per page
Testing Strategy
Unit Tests
- Server: Jest tests for services, controllers, utilities
- Matcher: pytest tests for provider modules
- Scanner: Go tests for file parsing, fingerprinting
Integration Tests
- Server: Test API endpoints with in-memory database
- Matcher: Mock external provider responses
End-to-End Tests
Not implemented. Future enhancement with Playwright.
Coverage
SonarCloud tracks coverage per service. Minimum threshold: 80%.
Summary
Meelo's architecture separates concerns across four microservices, each optimized for its task. The event-driven design decouples scanning from enrichment, enabling parallel processing and fault tolerance. Infrastructure services (PostgreSQL, MeiliSearch, RabbitMQ) provide persistence, search, and messaging. Docker Compose orchestrates startup order and health monitoring. The result is a scalable, maintainable system that handles complex metadata workflows without blocking user interactions.