MusicMetaLinker Deployment
Distribution Model
MusicMetaLinker is distributed as source code only. No binary distributions, no PyPI package, no conda package.
Installation method: Direct from GitHub via pip.
pip install git+https://github.com/andreamust/MusicMetaLinker.git
Implications:
- Requires git installed
- Requires network access to GitHub
- No version pinning by default (installs the latest commit on the default branch unless a ref is appended to the URL)
- No offline installation
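pip's VCS support accepts a ref after an `@` in the repository URL, so an install can be tied to a specific commit, tag, or branch even without a PyPI release. A minimal sketch that builds such a URL; the helper name `pinned_requirement` and the placeholder ref are illustrative and not part of the project:

```python
REPO_URL = "git+https://github.com/andreamust/MusicMetaLinker.git"


def pinned_requirement(ref: str) -> str:
    """Return a pip-installable VCS URL pinned to a commit hash, tag, or branch."""
    if not ref or any(ch.isspace() for ch in ref):
        raise ValueError("ref must be a non-empty string without whitespace")
    return f"{REPO_URL}@{ref}"


if __name__ == "__main__":
    # "COMMIT_SHA" is a placeholder; substitute a real commit hash from the repository.
    print("pip install", pinned_requirement("COMMIT_SHA"))
```

Pinning to a commit hash rather than a branch name keeps installs reproducible, since a branch keeps moving.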
Build System
Build Backend
PEP 517 compliant: Uses pyproject.toml for build configuration.
Build backend: hatchling (modern Python build tool).
pyproject.toml structure:
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "musicmetalinker"
version = "0.0.1"
dependencies = [
"musicbrainzngs",
"deezer-python",
"ytmusicapi",
"spotipy",
"requests",
"tqdm",
"jams",
"pandas",
"cryptography"
]
No setup.py: Modern packaging only.
No setup.cfg: All configuration in pyproject.toml.
Build Process
Local build:
git clone https://github.com/andreamust/MusicMetaLinker.git
cd MusicMetaLinker
pip install -e .
-e flag: Editable install. Changes to source code immediately reflected.
Build artifacts: None. Pure Python package, no compilation.
Dependencies
Runtime dependencies:
- musicbrainzngs: MusicBrainz API client
- deezer-python: Deezer API wrapper
- ytmusicapi: YouTube Music API client
- spotipy: Spotify API client
- requests: HTTP library
- tqdm: Progress bars
- jams: JAMS format support
- pandas: CSV output
- cryptography: Required by spotipy
No optional dependencies: All dependencies required.
No development dependencies: No test framework, no linting tools, no type checkers.
Dependency versions: No version constraints. Always installs latest compatible versions.
Risk: Breaking changes in dependencies may break MusicMetaLinker.
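Until version constraints are added upstream, one way to reduce this risk is to record the versions from a known-good environment and reuse them as a pip constraints file. A minimal sketch using the standard library's `importlib.metadata`; the function name `constraint_lines` is an assumption made for illustration:

```python
from importlib.metadata import PackageNotFoundError, version

# Runtime dependencies as listed in pyproject.toml.
DEPENDENCIES = [
    "musicbrainzngs", "deezer-python", "ytmusicapi", "spotipy",
    "requests", "tqdm", "jams", "pandas", "cryptography",
]


def constraint_lines(names, lookup=version):
    """Return 'name==version' lines for installed packages, skipping missing ones."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except PackageNotFoundError:
            continue  # not installed in this environment
    return lines


if __name__ == "__main__":
    # Redirect the output to constraints.txt in a working environment.
    print("\n".join(constraint_lines(DEPENDENCIES)))
```

The resulting file can then be passed to pip with `-c constraints.txt` when installing from GitHub, so later installs resolve to the same dependency versions.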
Deployment Environments
Library Deployment
Target environment: Python 3.8+ on any platform (Linux, macOS, Windows).
Installation:
pip install git+https://github.com/andreamust/MusicMetaLinker.git
Usage:
from musicmetalinker.linking import Align
linker = Align(artist="...", track="...")
mbid = linker.get_mbid()
No configuration required (except Spotify credentials for dataset preparation).
Batch Processing Deployment
Target environment: Python 3.8+ with file system access.
Installation: Same as library deployment.
Usage:
cd /path/to/MusicMetaLinker
python link_partitions.py /path/to/jams/files --save --limit audio --overwrite
Requirements:
- JAMS files in target directory
- Write permissions for output CSV and enriched JAMS files
- Network access for API queries
Optional: ffmpeg for audio conversion (if processing audio files directly).
Research Environment Deployment
Typical setup: Jupyter notebook or Python script in research project.
Installation:
pip install git+https://github.com/andreamust/MusicMetaLinker.git
Interactive testing:
Notebooks included in repository:
- deezer_test.ipynb: Test Deezer integration
- queries.ipynb: Test various query patterns
Usage:
# In Jupyter notebook
from musicmetalinker.linking import Align
linker = Align(...)
# Interactive exploration of results
Configuration Management
No Configuration Files
All configuration hardcoded in source files.
Hardcoded values:
- User-Agent: "elka/0.1" (in linking.py)
- Duration thresholds: 3s (Deezer), 5s (MusicBrainz)
- Similarity threshold: 0.8
- API endpoints: In library code
No config.ini, no config.yaml, no .env files.
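If these values were made configurable, a small settings object with environment overrides would be one low-effort option. The sketch below is not how MusicMetaLinker works today: the class `LinkerSettings`, the function `load_settings`, and the `MML_*` variable names are all hypothetical, and only the default values come from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkerSettings:
    # Defaults mirror the hardcoded values listed above.
    user_agent: str = "elka/0.1"
    deezer_duration_threshold_s: float = 3.0
    musicbrainz_duration_threshold_s: float = 5.0
    similarity_threshold: float = 0.8


def load_settings(env=None) -> LinkerSettings:
    """Build settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    d = LinkerSettings()
    # The MML_* names are hypothetical; the project defines no such variables.
    return LinkerSettings(
        user_agent=env.get("MML_USER_AGENT", d.user_agent),
        deezer_duration_threshold_s=float(
            env.get("MML_DEEZER_DURATION_S", d.deezer_duration_threshold_s)),
        musicbrainz_duration_threshold_s=float(
            env.get("MML_MUSICBRAINZ_DURATION_S", d.musicbrainz_duration_threshold_s)),
        similarity_threshold=float(
            env.get("MML_SIMILARITY_THRESHOLD", d.similarity_threshold)),
    )
```

Passing a plain dictionary as `env` keeps the loader easy to exercise without touching the real process environment.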
Spotify Credentials
Only external configuration: mml_secrets.py for Spotify credentials.
Location: Must be in Python path (typically same directory as scripts).
Structure:
# mml_secrets.py
SPOTIFY_CLIENT_ID = "your-client-id-here"
SPOTIFY_CLIENT_SECRET = "your-client-secret-here"
Not in repository: Users must create this file manually.
No documentation: The repository gives no instructions for obtaining Spotify credentials.
Obtaining credentials:
- Register app at https://developer.spotify.com/dashboard
- Copy client ID and secret
- Create mml_secrets.py with credentials
Environment Variables
Not used: No environment variable configuration.
Recommendation: Use environment variables for credentials instead of mml_secrets.py.
import os
SPOTIFY_CLIENT_ID = os.getenv("SPOTIFY_CLIENT_ID")
SPOTIFY_CLIENT_SECRET = os.getenv("SPOTIFY_CLIENT_SECRET")
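Because `os.getenv` returns `None` for an unset variable, a missing credential would only surface later as an authentication failure. A minimal sketch of a fail-fast loader; the function name `load_spotify_credentials` is an assumption, not something the project provides:

```python
import os

REQUIRED = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")


def load_spotify_credentials(env=None):
    """Return (client_id, client_secret); raise if either is unset or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["SPOTIFY_CLIENT_ID"], env["SPOTIFY_CLIENT_SECRET"]
```

Raising at startup names the missing variable directly instead of leaving the user to diagnose a rejected API request.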
Runtime Requirements
Python Version
Minimum: Python 3.8
Tested on: Unknown (no CI/CD, no test matrix).
Likely compatible: Python 3.8, 3.9, 3.10, 3.11, 3.12
Type hints: Not used extensively. No runtime type checking.
System Dependencies
Required:
- Python 3.8+
- pip
- git (for installation)
- Network access (for API queries)
Optional:
- ffmpeg (for audio conversion in batch processing)
No database: No PostgreSQL, MySQL, MongoDB, etc.
No message queue: No RabbitMQ, Redis, Kafka, etc.
No web server: No nginx, Apache, etc.
Platform Support
Linux: Fully supported. Primary development platform (likely).
macOS: Fully supported. All dependencies available.
Windows: Likely supported. All dependencies have Windows wheels. Potential issues:
- Path separators (/ vs \)
- Line endings (LF vs CRLF)
- Case-insensitive file systems
No platform-specific code: Pure Python, no C extensions (except in dependencies).
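The path separator concern is usually avoided by building paths with `pathlib` instead of string concatenation. A small sketch, assuming a hypothetical helper `output_name` that derives a CSV name from a JAMS file name; the pure path classes are used here only so that both flavours can be shown on any platform:

```python
from pathlib import PurePosixPath, PureWindowsPath


def output_name(jams_path, path_cls):
    """Derive a CSV path next to a JAMS file using the given path flavour."""
    return path_cls(jams_path).with_suffix(".csv")


if __name__ == "__main__":
    # The same input is rendered with each platform's separator.
    print(output_name("data/track001.jams", PurePosixPath))
    print(output_name("data/track001.jams", PureWindowsPath))
```

In application code, `pathlib.Path` selects the right flavour for the running platform automatically.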
Containerization
Docker
No Dockerfile provided.
Sample Dockerfile:
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
RUN pip install git+https://github.com/andreamust/MusicMetaLinker.git
COPY mml_secrets.py /app/
CMD ["python"]
For batch processing:
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y git ffmpeg && rm -rf /var/lib/apt/lists/*
RUN pip install git+https://github.com/andreamust/MusicMetaLinker.git
RUN git clone https://github.com/andreamust/MusicMetaLinker.git /app/MusicMetaLinker
WORKDIR /app/MusicMetaLinker
ENTRYPOINT ["python", "link_partitions.py"]
Usage:
docker build -t musicmetalinker .
docker run -v /path/to/jams:/data musicmetalinker /data --save
Docker Compose
Not provided.
Sample docker-compose.yml:
version: '3.8'
services:
  musicmetalinker:
    build: .
    volumes:
      - ./data:/data
      - ./output:/output
    environment:
      - SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}
Kubernetes
Not applicable: MusicMetaLinker is a library/batch tool, not a long-running service.
Possible use case: Kubernetes Job for batch processing.
apiVersion: batch/v1
kind: Job
metadata:
  name: musicmetalinker-batch
spec:
  template:
    spec:
      containers:
        - name: musicmetalinker
          image: musicmetalinker:latest
          args: ["/data", "--save"]
          volumeMounts:
            - name: data
              mountPath: /data
      restartPolicy: Never
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: jams-data
Continuous Integration/Continuous Deployment
CI/CD Status
No CI/CD pipeline.
No GitHub Actions, no Travis CI, no CircleCI, no Jenkins.
Implications:
- No automated testing on commits
- No automated builds
- No automated releases
- No quality gates
Testing
No test suite.
No pytest, no unittest, no nose.
Testing approach:
- Manual testing via Jupyter notebooks
- if __name__ == "__main__" blocks in some modules
No test coverage metrics.
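A first automated test would not need network access if the API call is replaced by a mock. The sketch below uses `unittest.mock` and plain assert-style tests that pytest would collect. The helper `first_recording_id` is hypothetical and stands in for the project's own lookup code, and the `"recording-list"` response shape is an assumption about what the search function returns.

```python
from unittest.mock import Mock


def first_recording_id(search, artist, track):
    """Hypothetical helper: return the id of the first search hit, or None."""
    try:
        hits = search(artist=artist, recording=track)
    except Exception:
        return None  # mirrors the library's return-None-on-failure behaviour
    recordings = hits.get("recording-list", [])
    return recordings[0]["id"] if recordings else None


def test_returns_first_id():
    search = Mock(return_value={"recording-list": [{"id": "abc"}, {"id": "def"}]})
    assert first_recording_id(search, "Artist", "Track") == "abc"
    search.assert_called_once_with(artist="Artist", recording="Track")


def test_returns_none_when_no_hits():
    search = Mock(return_value={"recording-list": []})
    assert first_recording_id(search, "Artist", "Track") is None


def test_returns_none_on_network_error():
    search = Mock(side_effect=ConnectionError("offline"))
    assert first_recording_id(search, "Artist", "Track") is None
```

Injecting the search function as a parameter is what makes the mock possible; code that calls the API client directly would need `unittest.mock.patch` instead.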
Linting and Formatting
No linting configuration.
No pylint, no flake8, no black, no isort.
Code quality: Inconsistent. Debug prints, commented-out code, inconsistent naming.
Type Checking
No type checking.
No mypy, no pyright, no pyre.
Type hints: Minimal. Not enforced.
Monitoring and Logging
Logging
Library usage: Minimal console logging.
Batch processing: File-based logging to link_partitions.log.
Log format:
2024-01-15 10:30:45 - INFO - Processing file: track001.jams
2024-01-15 10:30:46 - INFO - Found MBID: 6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e
2024-01-15 10:30:47 - ERROR - Failed to query Deezer
Log levels: INFO, ERROR. No DEBUG, WARNING.
Debug output: Multiple print() statements in code (not controlled by logging).
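The log format shown above corresponds to a standard `logging` format string, and routing the stray print() calls through the same logger would make debug output controllable by level. A minimal sketch; the function name `configure_logging` is an assumption, and the format string is inferred from the sample lines rather than taken from the source code:

```python
import logging

# Inferred from the sample log lines: timestamp - level - message.
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def configure_logging(level=logging.INFO, log_file=None):
    """Log to the console and, optionally, to a file, using one shared format."""
    handlers = [logging.StreamHandler()]
    if log_file:
        handlers.append(logging.FileHandler(log_file))
    logging.basicConfig(
        level=level,
        format=LOG_FORMAT,
        datefmt=DATE_FORMAT,
        handlers=handlers,
        force=True,  # replace handlers installed by earlier basicConfig calls
    )


if __name__ == "__main__":
    configure_logging(level=logging.DEBUG)
    logging.getLogger(__name__).debug("replaces an ad hoc print() call")
```

With this in place, passing `logging.INFO` hides the debug messages without editing the modules that emit them.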
Monitoring
No monitoring.
No metrics collection, no Prometheus, no Grafana, no Datadog.
No health checks, no status endpoints.
Error Tracking
No error tracking.
No Sentry, no Rollbar, no Bugsnag.
Errors silently suppressed. Returns None on failure.
Scaling Considerations
Horizontal Scaling
Not applicable: Library runs in single process.
Batch processing: Can be parallelized manually.
Manual parallelization:
# Split JAMS files into partitions
# Run multiple instances in parallel
python link_partitions.py /data/partition1 --save &
python link_partitions.py /data/partition2 --save &
python link_partitions.py /data/partition3 --save &
wait
No built-in parallelization.
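The partitioning step that precedes the parallel runs can itself be scripted. A minimal sketch that splits a list of file names into round-robin groups; the function name `partition` is illustrative, and moving or linking the files into per-partition directories is left out:

```python
from pathlib import Path


def partition(items, n):
    """Split items into n round-robin groups of near-equal size."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(items)  # sort first so the split is deterministic
    return [ordered[i::n] for i in range(n)]


if __name__ == "__main__":
    names = [p.name for p in Path(".").glob("*.jams")]
    for index, group in enumerate(partition(names, 3), start=1):
        print(f"partition{index}: {len(group)} files")
```

Sorting before splitting means a rerun over the same directory produces the same partitions, which matters when only one partition has to be repeated.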
Vertical Scaling
CPU: Single-threaded. More CPU cores don't help.
Memory: Minimal usage. Each Align instance uses ~1KB. Batch processing uses more for pandas DataFrame.
Network: Bottleneck. Sequential API calls. More bandwidth doesn't help (latency-bound).
Performance Optimization
No performance optimization.
Bottlenecks:
- Network latency (sequential API calls)
- No caching across instances
- No connection pooling
- No request batching
Potential optimizations:
- Async/await for concurrent API calls
- Persistent cache (Redis)
- Connection pooling
- Batch API requests (if services support)
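Two of these optimizations, concurrent calls and caching, can be approximated without changing the library by wrapping the lookup in a thread pool with an in-process cache. The sketch below substitutes threads for async/await and `functools.lru_cache` for Redis. The function `link_all` and its `lookup(artist, track)` callable are assumptions, not project APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


def link_all(queries, lookup, max_workers=4):
    """Run lookup(artist, track) for each query concurrently, keeping input order.

    `queries` is a sequence of (artist, track) tuples. Repeated queries are
    served from an in-process cache once the first result is available.
    """
    cached = lru_cache(maxsize=None)(lookup)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda query: cached(*query), queries))
```

Threads suit this workload because the time is spent waiting on the network. Public metadata APIs generally enforce rate limits, so `max_workers` should stay small, and two identical queries that start at the same moment may both reach the API before the cache is filled.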
Security Considerations
Secrets Management
Current approach: Hardcoded in mml_secrets.py.
Issues:
- Plaintext credentials
- No encryption
- Risk of committing to version control
Recommendations:
- Environment variables
- Secrets vault (HashiCorp Vault, AWS Secrets Manager)
- Encrypted configuration files
Network Security
HTTPS: All API calls use HTTPS.
Certificate validation: Handled by requests library (validates by default).
No proxy support: No configuration for HTTP proxies.
Input Validation
No input validation.
Risks:
- Invalid MBIDs accepted
- Negative durations accepted
- Malformed ISRCs accepted
Actual risk: Low. Invalid input causes query failures (returns None).
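Rejecting malformed identifiers before querying would save a network round trip and separate "bad input" from "no match". A minimal sketch of the three checks named above; the function names are illustrative. It relies on MBIDs being UUIDs and on the 12-character ISRC layout of two letters, three alphanumerics, and seven digits:

```python
import re
import uuid

# Country code (2 letters), registrant (3 alphanumerics), year + designation (7 digits).
ISRC_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{3}\d{7}")


def is_valid_mbid(value):
    """MBIDs are UUIDs; accept only the canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def is_valid_isrc(value):
    """Accept ISRCs with or without hyphens, in any letter case."""
    if not isinstance(value, str):
        return False
    return ISRC_PATTERN.fullmatch(value.replace("-", "").upper()) is not None


def is_valid_duration(value):
    """Durations must be non-negative numbers (booleans are rejected)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool) and value >= 0
```

These checks confirm the format only; a well-formed MBID or ISRC may still not exist in the remote database.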
Dependency Security
No dependency scanning.
No Dependabot, no Snyk, no safety.
Vulnerable dependencies: Unknown. No automated checks.
Recommendation: Run pip-audit or safety check regularly.
Backup and Recovery
Data Backup
No persistent data: Nothing to back up (library is stateless).
Batch output: CSV and JAMS files. User responsible for backup.
Disaster Recovery
Not applicable: Library has no state to recover.
Batch processing: Rerun if output lost. No checkpointing, no resume capability.
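Resume capability could be added around the batch loop by recording each finished file in a small checkpoint file. This is a sketch of that idea under stated assumptions, not existing behaviour: the functions `load_done` and `process_with_checkpoint`, the JSON checkpoint format, and the `process` callable are all hypothetical.

```python
import json
from pathlib import Path


def load_done(checkpoint):
    """Return the set of file names already recorded as processed."""
    path = Path(checkpoint)
    return set(json.loads(path.read_text())) if path.exists() else set()


def process_with_checkpoint(files, process, checkpoint):
    """Process each file once, persisting progress after every success."""
    done = load_done(checkpoint)
    for name in files:
        if name in done:
            continue  # finished in a previous run
        process(name)  # an exception here leaves the checkpoint untouched
        done.add(name)
        Path(checkpoint).write_text(json.dumps(sorted(done)))
    return done
```

Writing the checkpoint after every file costs a little I/O but means an interrupted run loses at most the file that was in progress.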
Deployment Checklist
Library Deployment
- Python 3.8+ installed
- pip installed
- git installed
- Network access to GitHub
- Network access to MusicBrainz, Deezer, YouTube Music
- (Optional) Spotify credentials in mml_secrets.py
Batch Processing Deployment
- All library deployment requirements
- JAMS files prepared
- Write permissions for output directory
- (Optional) ffmpeg installed for audio conversion
- Sufficient disk space for output CSV and enriched JAMS files
Production Deployment (Recommendations)
- Pin dependency versions in pyproject.toml
- Add automated tests
- Add CI/CD pipeline
- Add error tracking (Sentry)
- Add logging (structured JSON logs)
- Add monitoring (Prometheus metrics)
- Add rate limiting
- Add retry logic with exponential backoff
- Add health checks
- Use environment variables for configuration
- Add input validation
- Add dependency scanning
- Remove AcousticBrainz integration
- Fix User-Agent header
- Add documentation for Spotify setup
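Of the checklist items above, retry logic with exponential backoff is compact enough to sketch. The helper below is an illustration, not project code: the name `retry` and its parameters are assumptions, and `sleep` is injectable so that the delays can be tested without waiting.

```python
import time


def retry(call, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `call()` until it succeeds, doubling the wait after each failure.

    Re-raises the last exception once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
```

Since the library currently suppresses errors and returns None, a wrapper like this only helps at call sites where the underlying exception is still visible; retrying on None would also retry genuine "no match" results.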
Deployment Recommendations
Immediate Actions
- Publish to PyPI: Enable pip install musicmetalinker without git.
- Pin dependencies: Add version constraints to prevent breaking changes.
- Document Spotify setup: Instructions for obtaining credentials.
- Remove AcousticBrainz: Delete defunct integration.
Short-Term Improvements
- Add CI/CD: GitHub Actions for automated testing and releases.
- Add tests: pytest suite with mocked API calls.
- Add Docker support: Official Dockerfile and Docker Compose.
- Add configuration: Support environment variables and config files.
- Add logging: Structured logging with configurable levels.
Long-Term Enhancements
- Add monitoring: Prometheus metrics for API latency, success rates.
- Add caching: Redis for cross-instance caching.
- Add async support: Concurrent API calls for better performance.
- Add health checks: Service availability monitoring.
- Add error tracking: Sentry integration for production debugging.
- Add documentation: Comprehensive deployment guide.
- Add versioning: Semantic versioning with changelog.
- Add security scanning: Automated dependency vulnerability checks.
Deployment Maturity Assessment
Current state: Research prototype. Suitable for academic exploration, not production.
Maturity level: 1/5
Production readiness: Low
Gaps:
- No PyPI distribution
- No CI/CD
- No tests
- No monitoring
- No error tracking
- Hardcoded configuration
- Dead code (AcousticBrainz)
- No documentation for deployment
Recommendation: Use for research and prototyping only. Significant work required for production deployment.