MusicMetaLinker Deployment

Distribution Model

MusicMetaLinker is distributed as source code only. No binary distributions, no PyPI package, no conda package.

Installation method: Direct from GitHub via pip.

pip install git+https://github.com/andreamust/MusicMetaLinker.git

Implications:

  • Requires git installed
  • Requires network access to GitHub
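  • No version pinning in the command above: it installs the latest commit of the default branch (pip accepts a trailing @<tag-or-commit> on the URL to request a specific revision)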
  • No offline installation
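The install command above tracks whatever the default branch contains at install time. pip's VCS URL syntax accepts a ref after `@`, so a reproducible install can name a commit or tag. The helper below is a minimal sketch that builds such a requirement string; the function name is hypothetical, and the ref must be supplied by the user since no release tags are named in this document.

```python
REPO_URL = "https://github.com/andreamust/MusicMetaLinker.git"


def pinned_requirement(ref, name="musicmetalinker", repo_url=REPO_URL):
    """Build a PEP 508 direct reference pinned to a git commit, tag, or branch.

    `ref` is whatever revision the caller wants to freeze on; nothing here
    checks that the ref exists in the repository.
    """
    if not ref or any(ch.isspace() for ch in ref):
        raise ValueError("ref must be a non-empty commit hash, tag, or branch name")
    return f"{name} @ git+{repo_url}@{ref}"
```

The resulting string can be placed in a requirements file or passed to pip in quotes, for example `pip install "musicmetalinker @ git+https://github.com/andreamust/MusicMetaLinker.git@<commit>"`.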

Build System

Build Backend

PEP 517 compliant: Uses pyproject.toml for build configuration.

Build backend: hatchling (modern Python build tool).

pyproject.toml structure:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "musicmetalinker"
version = "0.0.1"
dependencies = [
    "musicbrainzngs",
    "deezer-python",
    "ytmusicapi",
    "spotipy",
    "requests",
    "tqdm",
    "jams",
    "pandas",
    "cryptography"
]

No setup.py: Modern packaging only.

No setup.cfg: All configuration in pyproject.toml.

Build Process

Local build:

git clone https://github.com/andreamust/MusicMetaLinker.git
cd MusicMetaLinker
pip install -e .

-e flag: Editable install. The installed package points at the working tree, so source changes take effect without reinstalling.

Build artifacts: None. Pure Python package, no compilation.

Dependencies

Runtime dependencies:

  • musicbrainzngs: MusicBrainz API client
  • deezer-python: Deezer API wrapper
  • ytmusicapi: YouTube Music API client
  • spotipy: Spotify API client
  • requests: HTTP library
  • tqdm: Progress bars
  • jams: JAMS format support
  • pandas: CSV output
  • cryptography: Required by spotipy

No optional dependencies: All dependencies required.

No development dependencies: No test framework, no linting tools, no type checkers.

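Dependency versions: No version constraints. pip resolves each dependency to the newest release compatible with the environment at install time, so two installs made on different days can differ.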

Risk: Breaking changes in dependencies may break MusicMetaLinker.
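One way to reduce this risk is to record the versions that are known to work in a given environment and reinstall exactly those later. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+) to produce `name==version` lines for the runtime dependencies listed above; the function name `freeze` is an illustrative choice, not part of MusicMetaLinker.

```python
from importlib import metadata

# Runtime dependencies as listed in pyproject.toml.
DEPENDENCIES = [
    "musicbrainzngs", "deezer-python", "ytmusicapi", "spotipy",
    "requests", "tqdm", "jams", "pandas", "cryptography",
]


def freeze(names=DEPENDENCIES):
    """Return (pinned, missing) for the given distribution names.

    pinned  -- 'name==version' strings for distributions installed here
    missing -- names that are not installed in this environment
    """
    pinned, missing = [], []
    for name in names:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return pinned, missing
```

Writing the `pinned` list to a requirements file after a successful run gives a snapshot that can be reinstalled with `pip install -r`.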

Deployment Environments

Library Deployment

Target environment: Python 3.8+ on any platform (Linux, macOS, Windows).

Installation:

pip install git+https://github.com/andreamust/MusicMetaLinker.git

Usage:

from musicmetalinker.linking import Align

linker = Align(artist="...", track="...")
mbid = linker.get_mbid()

No configuration required (except Spotify credentials for dataset preparation).

Batch Processing Deployment

Target environment: Python 3.8+ with file system access.

Installation: Same as library deployment.

Usage:

cd /path/to/MusicMetaLinker
python link_partitions.py /path/to/jams/files --save --limit audio --overwrite

Requirements:

  • JAMS files in target directory
  • Write permissions for output CSV and enriched JAMS files
  • Network access for API queries

Optional: ffmpeg for audio conversion (if processing audio files directly).

Research Environment Deployment

Typical setup: Jupyter notebook or Python script in research project.

Installation:

pip install git+https://github.com/andreamust/MusicMetaLinker.git

Interactive testing:

Notebooks included in repository:

  • deezer_test.ipynb: Test Deezer integration
  • queries.ipynb: Test various query patterns

Usage:

# In Jupyter notebook
from musicmetalinker.linking import Align

linker = Align(...)
# Interactive exploration of results

Configuration Management

No Configuration Files

All configuration hardcoded in source files.

Hardcoded values:

  • User-Agent: "elka/0.1" (in linking.py)
  • Duration thresholds: 3s (Deezer), 5s (MusicBrainz)
  • Similarity threshold: 0.8
  • API endpoints: In library code

No config.ini, no config.yaml, no .env files.
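If these values were to be made configurable, one low-effort option is a small settings object whose defaults match the hardcoded values listed above and which can be overridden from the environment. This is a sketch only: the class name, field names, and the `MML_` prefix are assumptions and do not exist in the MusicMetaLinker source.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LinkerConfig:
    # Defaults mirror the hardcoded values documented above.
    user_agent: str = "elka/0.1"
    deezer_duration_threshold_s: float = 3.0
    musicbrainz_duration_threshold_s: float = 5.0
    similarity_threshold: float = 0.8

    @classmethod
    def from_env(cls, env=None, prefix="MML_"):
        """Build a config, overriding defaults with PREFIX + FIELD_NAME variables."""
        env = os.environ if env is None else env
        overrides = {}
        for field in fields(cls):
            raw = env.get(prefix + field.name.upper())
            if raw is not None:
                # Parse with the type of the default value (str or float).
                overrides[field.name] = type(field.default)(raw)
        return cls(**overrides)
```

For example, setting `MML_SIMILARITY_THRESHOLD=0.9` before calling `LinkerConfig.from_env()` would change only that field and leave the others at their defaults.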

Spotify Credentials

Only external configuration: mml_secrets.py for Spotify credentials.

Location: Must be importable, i.e. on the Python import path (typically the same directory as the scripts).

Structure:

# mml_secrets.py
SPOTIFY_CLIENT_ID = "your-client-id-here"
SPOTIFY_CLIENT_SECRET = "your-client-secret-here"

Not in repository: Users must create this file manually.

No documentation: The repository gives no instructions for obtaining Spotify credentials.

Obtaining credentials:

  1. Register app at https://developer.spotify.com/dashboard
  2. Copy client ID and secret
  3. Create mml_secrets.py with credentials

Environment Variables

Not used: No environment variable configuration.

Recommendation: Use environment variables for credentials instead of mml_secrets.py.

import os

SPOTIFY_CLIENT_ID = os.getenv("SPOTIFY_CLIENT_ID")
SPOTIFY_CLIENT_SECRET = os.getenv("SPOTIFY_CLIENT_SECRET")
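Because `os.getenv` returns None for unset variables, a missing credential would only surface later as an authentication error. A slightly more defensive sketch is shown here: it prefers environment variables, falls back to an importable `mml_secrets.py`, and fails with a clear message otherwise. The function name `load_spotify_credentials` is hypothetical.

```python
import importlib
import os


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret), preferring environment variables."""
    env = os.environ if env is None else env
    client_id = env.get("SPOTIFY_CLIENT_ID")
    client_secret = env.get("SPOTIFY_CLIENT_SECRET")
    if client_id and client_secret:
        return client_id, client_secret

    # Fallback: the existing mml_secrets.py convention.
    try:
        secrets = importlib.import_module("mml_secrets")
    except ImportError as exc:
        raise RuntimeError(
            "Spotify credentials not found: set SPOTIFY_CLIENT_ID and "
            "SPOTIFY_CLIENT_SECRET, or provide mml_secrets.py"
        ) from exc
    return secrets.SPOTIFY_CLIENT_ID, secrets.SPOTIFY_CLIENT_SECRET
```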

Runtime Requirements

Python Version

Minimum: Python 3.8

Tested on: Unknown (no CI/CD, no test matrix).

Likely compatible: Python 3.8, 3.9, 3.10, 3.11, 3.12

Type hints: Not used extensively. No runtime type checking.

System Dependencies

Required:

  • Python 3.8+
  • pip
  • git (for installation)
  • Network access (for API queries)

Optional:

  • ffmpeg (for audio conversion in batch processing)

No database: No PostgreSQL, MySQL, MongoDB, etc.

No message queue: No RabbitMQ, Redis, Kafka, etc.

No web server: No nginx, Apache, etc.

Platform Support

Linux: Fully supported. Primary development platform (likely).

macOS: Fully supported. All dependencies available.

Windows: Likely supported. All dependencies have Windows wheels. Potential issues:

  • Path separators (/ vs \)
  • Line endings (LF vs CRLF)
  • Case-insensitive file systems (file names differing only in case collide)
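These portability issues can usually be avoided by using `pathlib` for path handling and text-mode reads for line endings. The helpers below are an illustrative sketch (both function names are hypothetical): they collect JAMS files without assuming a separator or a particular suffix case, and read text with universal newline translation.

```python
from pathlib import Path


def find_jams_files(root):
    """Return all .jams files under root, matching the suffix case-insensitively."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".jams"
    )


def read_text_portable(path):
    """Read a UTF-8 text file; text mode translates CRLF line endings to '\\n'."""
    return Path(path).read_text(encoding="utf-8")
```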

No platform-specific code: Pure Python, no C extensions (except in dependencies).

Containerization

Docker

No Dockerfile provided.

Sample Dockerfile:

FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*

RUN pip install git+https://github.com/andreamust/MusicMetaLinker.git

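# Caution: this copies plaintext credentials into an image layer.
# Mounting the file at run time avoids storing secrets in the image.
COPY mml_secrets.py /app/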

CMD ["python"]

For batch processing:

FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y git ffmpeg && rm -rf /var/lib/apt/lists/*

RUN pip install git+https://github.com/andreamust/MusicMetaLinker.git

# link_partitions.py is run from a repository checkout, so the source is cloned as well.
RUN git clone https://github.com/andreamust/MusicMetaLinker.git /app/MusicMetaLinker

WORKDIR /app/MusicMetaLinker

ENTRYPOINT ["python", "link_partitions.py"]

Usage:

docker build -t musicmetalinker .
docker run -v /path/to/jams:/data musicmetalinker /data --save

Docker Compose

Not provided.

Sample docker-compose.yml:

version: '3.8'

services:
  musicmetalinker:
    build: .
    volumes:
      - ./data:/data
      - ./output:/output
    environment:
      # Only effective if the code reads credentials from the environment
      # instead of mml_secrets.py.
      - SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}

Kubernetes

Not applicable: MusicMetaLinker is a library/batch tool, not a long-running service.

Possible use case: Kubernetes Job for batch processing.

apiVersion: batch/v1
kind: Job
metadata:
  name: musicmetalinker-batch
spec:
  template:
    spec:
      containers:
      - name: musicmetalinker
        image: musicmetalinker:latest
        args: ["/data", "--save"]
        volumeMounts:
        - name: data
          mountPath: /data
      restartPolicy: Never
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: jams-data

Continuous Integration/Continuous Deployment

CI/CD Status

No CI/CD pipeline.

No GitHub Actions, no Travis CI, no CircleCI, no Jenkins.

Implications:

  • No automated testing on commits
  • No automated builds
  • No automated releases
  • No quality gates

Testing

No test suite.

No pytest, no unittest, no nose.

Testing approach:

  • Manual testing via Jupyter notebooks
  • if __name__ == "__main__" blocks in some modules

No test coverage metrics.
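A first automated test would not need network access if the API call is replaced by a mock. The sketch below shows the pattern with the standard-library `unittest.mock`. The function under test, `first_recording_id`, is a hypothetical stand-in and not part of MusicMetaLinker; the response shape is loosely modelled on the `recording-list` key returned by musicbrainzngs searches, which is an assumption about what the real code consumes.

```python
import unittest
from unittest import mock


def first_recording_id(search, artist, track):
    """Hypothetical lookup: return the id of the first search hit, or None."""
    result = search(artist=artist, recording=track, limit=1)
    hits = result.get("recording-list", [])
    return hits[0]["id"] if hits else None


class FirstRecordingIdTest(unittest.TestCase):
    def test_returns_first_hit(self):
        # The mock replaces the network call and records how it was invoked.
        search = mock.Mock(return_value={"recording-list": [{"id": "mbid-1"}]})
        self.assertEqual(first_recording_id(search, "Artist", "Track"), "mbid-1")
        search.assert_called_once_with(artist="Artist", recording="Track", limit=1)

    def test_returns_none_without_hits(self):
        search = mock.Mock(return_value={"recording-list": []})
        self.assertIsNone(first_recording_id(search, "Artist", "Track"))
```

The same test class runs unchanged under pytest, which collects `unittest.TestCase` subclasses.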

Linting and Formatting

No linting configuration.

No pylint, no flake8, no black, no isort.

Code quality: Inconsistent. Debug prints, commented-out code, inconsistent naming.

Type Checking

No type checking.

No mypy, no pyright, no pyre.

Type hints: Minimal. Not enforced.

Monitoring and Logging

Logging

Library usage: Minimal console logging.

Batch processing: File-based logging to link_partitions.log.

Log format:

2024-01-15 10:30:45 - INFO - Processing file: track001.jams
2024-01-15 10:30:46 - INFO - Found MBID: 6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e
2024-01-15 10:30:47 - ERROR - Failed to query Deezer

Log levels: INFO, ERROR. No DEBUG, WARNING.

Debug output: Multiple print() statements in code (not controlled by logging).
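The log lines shown above are consistent with a `logging` format of timestamp, level, and message separated by " - ". The following sketch reproduces that layout with a configurable level, which would also let the stray print() calls be replaced by DEBUG messages. How link_partitions.py actually configures logging is not documented here, so the format strings and the `build_logger` name are assumptions.

```python
import logging

LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_logger(name="link_partitions", log_file=None, level=logging.INFO):
    """Return a logger writing to log_file (or stderr) in the format above."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
    logger.handlers.clear()   # avoid duplicate lines when called twice
    logger.addHandler(handler)
    logger.propagate = False  # keep output out of the root logger
    return logger
```

Calling `build_logger(log_file="link_partitions.log", level=logging.DEBUG)` would enable debug output for a single run without code changes elsewhere.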

Monitoring

No monitoring.

No metrics collection, no Prometheus, no Grafana, no Datadog.

No health checks, no status endpoints.

Error Tracking

No error tracking.

No Sentry, no Rollbar, no Bugsnag.

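Errors are suppressed in library code: a failed lookup returns None instead of raising, so callers must check for None and get no indication of the cause.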
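Without adopting an external service, the None-on-failure behaviour can still be made visible by logging the exception before returning. The decorator below is a minimal sketch of that idea; the name `log_failures` and the logger name are hypothetical, and it would have to be applied to the relevant lookup functions by hand.

```python
import functools
import logging

logger = logging.getLogger("musicmetalinker")


def log_failures(func):
    """Keep the return-None-on-error contract, but log the traceback first."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception records the message at ERROR level with traceback.
            logger.exception("%s failed", func.__name__)
            return None
    return wrapper
```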

Scaling Considerations

Horizontal Scaling

Not applicable: Library runs in single process.

Batch processing: Can be parallelized manually.

Manual parallelization:

# Split JAMS files into partitions
# Run multiple instances in parallel
python link_partitions.py /data/partition1 --save &
python link_partitions.py /data/partition2 --save &
python link_partitions.py /data/partition3 --save &
wait

No built-in parallelization.
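The shell commands above assume the JAMS files have already been split into partition directories. A small helper can compute that split deterministically; the sketch below distributes file names round-robin and is purely illustrative (`partition_files` is not part of the repository). Note that running several instances multiplies the request rate against the remote APIs, so any rate limits those services apply still bound the useful degree of parallelism.

```python
def partition_files(files, n_partitions):
    """Split files into n_partitions lists of near-equal size (round-robin)."""
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    ordered = sorted(files)  # deterministic assignment across runs
    return [ordered[i::n_partitions] for i in range(n_partitions)]
```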

Vertical Scaling

CPU: Single-threaded. Additional cores do not speed up a single process.

Memory: Minimal usage. Each Align instance uses ~1 KB. Batch processing uses more for the pandas DataFrame.

Network: Bottleneck. Sequential API calls. More bandwidth doesn't help (latency-bound).

Performance Optimization

No performance optimization.

Bottlenecks:

  • Network latency (sequential API calls)
  • No caching across instances
  • No connection pooling
  • No request batching

Potential optimizations:

  • Async/await for concurrent API calls
  • Persistent cache (Redis)
  • Connection pooling
  • Batch API requests (if services support)
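Since the workload is latency-bound, overlapping requests is the most direct gain. The sketch below uses a thread pool rather than async/await, on the assumption that the client libraries involved are called synchronously; `lookup_all` and its `lookup` argument are hypothetical names, with `lookup` standing for any function that resolves one item.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_all(lookup, items, max_workers=4):
    """Call lookup(item) for every item concurrently.

    Results are returned in input order; an item whose lookup raises
    yields None, matching the library's return-None-on-failure behaviour.
    """
    def safe(item):
        try:
            return lookup(item)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, items))
```

`max_workers` should stay small enough to respect the rate limits of the services being queried.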

Security Considerations

Secrets Management

Current approach: Credentials stored as plaintext constants in mml_secrets.py.

Issues:

  • Plaintext credentials
  • No encryption
  • Risk of committing to version control

Recommendations:

  • Environment variables
  • Secrets vault (HashiCorp Vault, AWS Secrets Manager)
  • Encrypted configuration files

Network Security

HTTPS: All API calls use HTTPS.

Certificate validation: Handled by requests library (validates by default).

No proxy support: No configuration for HTTP proxies.

Input Validation

No input validation.

Risks:

  • Invalid MBIDs accepted
  • Negative durations accepted
  • Malformed ISRCs accepted

Actual risk: Low. Invalid input causes query failures (returns None).
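Rejecting malformed input before a request is sent saves a network round trip and gives a clearer error than a None result. The validators below are a sketch under stated assumptions: an MBID is treated as a canonical hyphenated UUID, an ISRC as two letters, three alphanumerics, and seven digits (hyphens optional), and a duration as a positive number of seconds. The function names are hypothetical.

```python
import re
import uuid

ISRC_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")


def is_valid_mbid(value):
    """True if value is a UUID string in canonical 8-4-4-4-12 form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def is_valid_isrc(value):
    """True if value looks like an ISRC, with or without hyphens."""
    if not isinstance(value, str):
        return False
    return ISRC_RE.fullmatch(value.replace("-", "").upper()) is not None


def is_valid_duration(seconds):
    """True if seconds is a positive int or float (bool excluded)."""
    if isinstance(seconds, bool) or not isinstance(seconds, (int, float)):
        return False
    return seconds > 0
```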

Dependency Security

No dependency scanning.

No Dependabot, no Snyk, no safety.

Vulnerable dependencies: Unknown. No automated checks.

Recommendation: Run pip-audit or safety check regularly.

Backup and Recovery

Data Backup

No persistent data: Nothing to back up (library is stateless).

Batch output: CSV and JAMS files. User responsible for backup.

Disaster Recovery

Not applicable: Library has no state to recover.

Batch processing: Rerun if output lost. No checkpointing, no resume capability.
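Resume capability could be added with a small checkpoint file that records which inputs have been processed. The class below is a minimal sketch, not something the batch script provides: the `Checkpoint` name, the JSON layout, and the idea of keying on file names are all assumptions.

```python
import json
from pathlib import Path


class Checkpoint:
    """Persist the set of finished file names in a JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.done = set(json.loads(self.path.read_text(encoding="utf-8")))
        else:
            self.done = set()

    def mark_done(self, name):
        """Record name as finished and rewrite the checkpoint file."""
        self.done.add(name)
        # Write to a temporary file first, then rename over the old file,
        # so an interrupted write does not leave a truncated checkpoint.
        tmp = self.path.with_suffix(self.path.suffix + ".tmp")
        tmp.write_text(json.dumps(sorted(self.done)), encoding="utf-8")
        tmp.replace(self.path)

    def pending(self, names):
        """Return the names that have not been marked done, in input order."""
        return [n for n in names if n not in self.done]
```

A batch loop would iterate over `checkpoint.pending(all_files)` and call `mark_done` after each file's outputs are written, so a rerun skips completed work.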

Deployment Checklist

Library Deployment

  • Python 3.8+ installed
  • pip installed
  • git installed
  • Network access to GitHub
  • Network access to MusicBrainz, Deezer, YouTube Music
  • (Optional) Spotify credentials in mml_secrets.py

Batch Processing Deployment

  • All library deployment requirements
  • JAMS files prepared
  • Write permissions for output directory
  • (Optional) ffmpeg installed for audio conversion
  • Sufficient disk space for output CSV and enriched JAMS files

Production Deployment (Recommendations)

  • Pin dependency versions in pyproject.toml
  • Add automated tests
  • Add CI/CD pipeline
  • Add error tracking (Sentry)
  • Add logging (structured JSON logs)
  • Add monitoring (Prometheus metrics)
  • Add rate limiting
  • Add retry logic with exponential backoff
  • Add health checks
  • Use environment variables for configuration
  • Add input validation
  • Add dependency scanning
  • Remove AcousticBrainz integration
  • Fix User-Agent header
  • Add documentation for Spotify setup
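Of the items above, retry logic with exponential backoff is small enough to sketch directly. The helper below retries a zero-argument callable, doubling the wait after each failure and re-raising once the attempts are exhausted. The name `retry` and its parameters are illustrative; the `sleep` argument exists only so the delay can be replaced in tests.

```python
import time


def retry(func, attempts=4, base_delay=1.0, factor=2.0,
          exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on failure wait base_delay, then base_delay*factor, and so on.

    The last exception is re-raised if all attempts fail.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= factor
```

In practice `exceptions` would be narrowed to transient errors such as timeouts, since retrying on every exception also retries programming errors.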

Deployment Recommendations

Immediate Actions

  1. Publish to PyPI: Enable pip install musicmetalinker without git.
  2. Pin dependencies: Add version constraints to prevent breaking changes.
  3. Document Spotify setup: Instructions for obtaining credentials.
  4. Remove AcousticBrainz: Delete defunct integration.

Short-Term Improvements

  1. Add CI/CD: GitHub Actions for automated testing and releases.
  2. Add tests: pytest suite with mocked API calls.
  3. Add Docker support: Official Dockerfile and Docker Compose.
  4. Add configuration: Support environment variables and config files.
  5. Add logging: Structured logging with configurable levels.

Long-Term Enhancements

  1. Add monitoring: Prometheus metrics for API latency, success rates.
  2. Add caching: Redis for cross-instance caching.
  3. Add async support: Concurrent API calls for better performance.
  4. Add health checks: Service availability monitoring.
  5. Add error tracking: Sentry integration for production debugging.
  6. Add documentation: Comprehensive deployment guide.
  7. Add versioning: Semantic versioning with changelog.
  8. Add security scanning: Automated dependency vulnerability checks.

Deployment Maturity Assessment

Current state: Research prototype. Suitable for academic exploration, not production.

Maturity level: 1/5

Production readiness: Low

Gaps:

  • No PyPI distribution
  • No CI/CD
  • No tests
  • No monitoring
  • No error tracking
  • Hardcoded configuration
  • Dead code (AcousticBrainz)
  • No documentation for deployment

Recommendation: Use for research and prototyping only. Significant work required for production deployment.