metadata-agregator/docs/research/musicmetalinker/analysis/DEPLOYMENT.md
Alexander a1f6701bac feat: initial implementation of metadata aggregator
- gRPC service with MusicBrainz provider
- PostgreSQL schema with migrations
- Service layer with database-first caching
- Repository pattern for data access
- YAML configuration support
- Research documentation for 17 music metadata projects
2026-04-28 16:28:53 +02:00


# MusicMetaLinker Deployment
## Distribution Model
MusicMetaLinker is distributed as source code only. No binary distributions, no PyPI package, no conda package.
**Installation method:** Direct from GitHub via pip.
```bash
pip install git+https://github.com/andreamust/MusicMetaLinker.git
```
**Implications:**
- Requires git installed
- Requires network access to GitHub
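- No version pinning in this command (it installs the latest commit on the default branch); pinning is possible only by appending `@<commit-or-tag>` to the URL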
- No offline installation
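The pinning gap can be narrowed without a PyPI release, because pip accepts a Git ref after `@` in a VCS URL. The following is a minimal sketch that builds such a requirement line; the helper name `pinned_requirement` and the placeholder ref are illustrative and not part of MusicMetaLinker.

```python
REPO_URL = "https://github.com/andreamust/MusicMetaLinker.git"


def pinned_requirement(ref: str, name: str = "musicmetalinker",
                       repo_url: str = REPO_URL) -> str:
    """Return a PEP 508 direct reference that pins a Git install to one ref."""
    ref = ref.strip()
    if not ref:
        raise ValueError("ref must be a commit hash, tag, or branch name")
    return f"{name} @ git+{repo_url}@{ref}"


# "<commit-sha>" is a placeholder; substitute a real commit hash or tag.
print(pinned_requirement("<commit-sha>"))
```

The resulting line can be placed in a `requirements.txt` so that every environment installs the same revision.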
## Build System
### Build Backend
**PEP 517 compliant:** Uses pyproject.toml for build configuration.
**Build backend:** hatchling (modern Python build tool).
**pyproject.toml structure:**
```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "musicmetalinker"
version = "0.0.1"
dependencies = [
    "musicbrainzngs",
    "deezer-python",
    "ytmusicapi",
    "spotipy",
    "requests",
    "tqdm",
    "jams",
    "pandas",
    "cryptography"
]
```
**No setup.py:** Modern packaging only.
**No setup.cfg:** All configuration in pyproject.toml.
### Build Process
**Local build:**
```bash
git clone https://github.com/andreamust/MusicMetaLinker.git
cd MusicMetaLinker
pip install -e .
```
**-e flag:** Editable install. Changes to the source code take effect without reinstalling.
**Build artifacts:** None. Pure Python package, no compilation.
### Dependencies
**Runtime dependencies:**
- musicbrainzngs: MusicBrainz API client
- deezer-python: Deezer API wrapper
- ytmusicapi: YouTube Music API client
- spotipy: Spotify API client
- requests: HTTP library
- tqdm: Progress bars
- jams: JAMS format support
- pandas: CSV output
- cryptography: Required by spotipy
**No optional dependencies:** All dependencies required.
**No development dependencies:** No test framework, no linting tools, no type checkers.
**Dependency versions:** No version constraints. pip resolves each dependency to the latest release available at install time.
**Risk:** Breaking changes in dependencies may break MusicMetaLinker.
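One way to reduce this risk without touching the upstream project is to record the versions that are known to work in a given environment. The sketch below uses the standard-library `importlib.metadata` to emit `name==version` lines for the dependencies listed above; the function name `snapshot_constraints` is an assumption for illustration.

```python
from importlib import metadata
from typing import Callable, Iterable, List

# Runtime dependencies as listed in pyproject.toml above.
RUNTIME_DEPENDENCIES = [
    "musicbrainzngs", "deezer-python", "ytmusicapi", "spotipy",
    "requests", "tqdm", "jams", "pandas", "cryptography",
]


def snapshot_constraints(
    names: Iterable[str],
    version_of: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return pip constraint lines for the currently installed versions."""
    lines = []
    for name in names:
        try:
            lines.append(f"{name}=={version_of(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible marker rather than silently dropping the package.
            lines.append(f"# {name}: not installed")
    return lines


print("\n".join(snapshot_constraints(RUNTIME_DEPENDENCIES)))
```

Saving the output to a file and passing it to `pip install -c <file>` reproduces the same dependency versions later.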
## Deployment Environments
### Library Deployment
**Target environment:** Python 3.8+ on any platform (Linux, macOS, Windows).
**Installation:**
```bash
pip install git+https://github.com/andreamust/MusicMetaLinker.git
```
**Usage:**
```python
from musicmetalinker.linking import Align
linker = Align(artist="...", track="...")
mbid = linker.get_mbid()
```
**No configuration required** (except Spotify credentials for dataset preparation).
### Batch Processing Deployment
**Target environment:** Python 3.8+ with file system access.
**Installation:** Same as library deployment.
**Usage:**
```bash
cd /path/to/MusicMetaLinker
python link_partitions.py /path/to/jams/files --save --limit audio --overwrite
```
**Requirements:**
- JAMS files in target directory
- Write permissions for output CSV and enriched JAMS files
- Network access for API queries
**Optional:** ffmpeg for audio conversion (if processing audio files directly).
### Research Environment Deployment
**Typical setup:** Jupyter notebook or Python script in research project.
**Installation:**
```bash
pip install git+https://github.com/andreamust/MusicMetaLinker.git
```
**Interactive testing:**
Notebooks included in repository:
- deezer_test.ipynb: Test Deezer integration
- queries.ipynb: Test various query patterns
**Usage:**
```python
# In Jupyter notebook
from musicmetalinker.linking import Align
linker = Align(...)
# Interactive exploration of results
```
## Configuration Management
### No Configuration Files
All configuration hardcoded in source files.
**Hardcoded values:**
- User-Agent: "elka/0.1" (in linking.py)
- Duration thresholds: 3s (Deezer), 5s (MusicBrainz)
- Similarity threshold: 0.8
- API endpoints: In library code
**No config.ini, no config.yaml, no .env files.**
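As an illustration of how these hardcoded values could be made configurable, here is a minimal sketch that collects them in one dataclass and allows overrides from environment variables. The class, the field names, and the `MML_*` variable names are all hypothetical; only the default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LinkerSettings:
    """Defaults mirror the values currently hardcoded in the source."""
    user_agent: str = "elka/0.1"
    deezer_duration_threshold_s: float = 3.0
    musicbrainz_duration_threshold_s: float = 5.0
    similarity_threshold: float = 0.8


def settings_from_env(env: Mapping[str, str] = os.environ) -> LinkerSettings:
    """Build settings, letting hypothetical MML_* variables override defaults."""
    d = LinkerSettings()
    return LinkerSettings(
        user_agent=env.get("MML_USER_AGENT", d.user_agent),
        deezer_duration_threshold_s=float(
            env.get("MML_DEEZER_THRESHOLD_S", d.deezer_duration_threshold_s)),
        musicbrainz_duration_threshold_s=float(
            env.get("MML_MUSICBRAINZ_THRESHOLD_S", d.musicbrainz_duration_threshold_s)),
        similarity_threshold=float(
            env.get("MML_SIMILARITY_THRESHOLD", d.similarity_threshold)),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test.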
### Spotify Credentials
**Only external configuration:** mml_secrets.py for Spotify credentials.
**Location:** Must be in Python path (typically same directory as scripts).
**Structure:**
```python
# mml_secrets.py
SPOTIFY_CLIENT_ID = "your-client-id-here"
SPOTIFY_CLIENT_SECRET = "your-client-secret-here"
```
**Not in repository:** Users must create this file manually.
**No documentation:** The repository gives no instructions for obtaining Spotify credentials.
**Obtaining credentials:**
1. Register app at https://developer.spotify.com/dashboard
2. Copy client ID and secret
3. Create mml_secrets.py with credentials
### Environment Variables
**Not used:** No environment variable configuration.
**Recommendation:** Use environment variables for credentials instead of mml_secrets.py.
```python
import os
SPOTIFY_CLIENT_ID = os.getenv("SPOTIFY_CLIENT_ID")
SPOTIFY_CLIENT_SECRET = os.getenv("SPOTIFY_CLIENT_SECRET")
```
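A slightly more defensive variant of the snippet above checks that both values are present and falls back to the existing `mml_secrets.py` module, so current setups keep working. This is a sketch under the assumption that `mml_secrets.py` has the structure shown earlier; the function name `load_spotify_credentials` is illustrative.

```python
import os
from typing import Mapping, Tuple


def load_spotify_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Prefer environment variables, then fall back to mml_secrets.py."""
    client_id = env.get("SPOTIFY_CLIENT_ID", "").strip()
    client_secret = env.get("SPOTIFY_CLIENT_SECRET", "").strip()
    if client_id and client_secret:
        return client_id, client_secret
    try:
        import mml_secrets  # legacy location, must be on the Python path
    except ImportError:
        raise RuntimeError(
            "Spotify credentials not found: set SPOTIFY_CLIENT_ID and "
            "SPOTIFY_CLIENT_SECRET, or provide mml_secrets.py"
        ) from None
    return mml_secrets.SPOTIFY_CLIENT_ID, mml_secrets.SPOTIFY_CLIENT_SECRET
```

Failing with an explicit error at startup avoids passing `None` credentials to the Spotify client and discovering the problem only at the first API call.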
## Runtime Requirements
### Python Version
**Minimum:** Python 3.8
**Tested on:** Unknown (no CI/CD, no test matrix).
**Likely compatible:** Python 3.8, 3.9, 3.10, 3.11, 3.12
**Type hints:** Not used extensively. No runtime type checking.
### System Dependencies
**Required:**
- Python 3.8+
- pip
- git (for installation)
- Network access (for API queries)
**Optional:**
- ffmpeg (for audio conversion in batch processing)
**No database:** No PostgreSQL, MySQL, MongoDB, etc.
**No message queue:** No RabbitMQ, Redis, Kafka, etc.
**No web server:** No nginx, Apache, etc.
### Platform Support
**Linux:** Fully supported. Primary development platform (likely).
**macOS:** Fully supported. All dependencies available.
**Windows:** Likely supported. All dependencies have Windows wheels. Potential issues:
- Path separators (/ vs \)
- Line endings (LF vs CRLF)
- Case-insensitive file systems (file names that differ only by case collide)
**No platform-specific code:** Pure Python, no C extensions (except in dependencies).
## Containerization
### Docker
**No Dockerfile provided.**
**Sample Dockerfile:**
```dockerfile
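# Slim base image; git is needed because the package installs from GitHub
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
RUN pip install git+https://github.com/andreamust/MusicMetaLinker.git
# Note: COPY bakes the Spotify credentials into the image; mount the file
# at runtime instead if the image will be shared
COPY mml_secrets.py /app/
# Start an interactive interpreter with the library available
CMD ["python"]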
```
**For batch processing:**
```dockerfile
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y git ffmpeg && rm -rf /var/lib/apt/lists/*
RUN pip install git+https://github.com/andreamust/MusicMetaLinker.git
RUN git clone https://github.com/andreamust/MusicMetaLinker.git /app/MusicMetaLinker
WORKDIR /app/MusicMetaLinker
ENTRYPOINT ["python", "link_partitions.py"]
```
**Usage:**
```bash
docker build -t musicmetalinker .
docker run -v /path/to/jams:/data musicmetalinker /data --save
```
### Docker Compose
**Not provided.**
**Sample docker-compose.yml:**
```yaml
version: '3.8'
services:
  musicmetalinker:
    build: .
    volumes:
      - ./data:/data
      - ./output:/output
    environment:
      - SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}
```
### Kubernetes
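**Not applicable as a long-running workload:** MusicMetaLinker is a library/batch tool, not a service, so Deployments and Services do not fit.
**Possible use case:** A Kubernetes Job for batch processing, assuming an image built from the batch-processing Dockerfile.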
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: musicmetalinker-batch
spec:
  template:
    spec:
      containers:
        - name: musicmetalinker
          image: musicmetalinker:latest
          args: ["/data", "--save"]
          volumeMounts:
            - name: data
              mountPath: /data
      restartPolicy: Never
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: jams-data
```
## Continuous Integration/Continuous Deployment
### CI/CD Status
**No CI/CD pipeline.**
**No GitHub Actions, no Travis CI, no CircleCI, no Jenkins.**
**Implications:**
- No automated testing on commits
- No automated builds
- No automated releases
- No quality gates
### Testing
**No test suite.**
**No pytest, no unittest, no nose.**
**Testing approach:**
- Manual testing via Jupyter notebooks
- if __name__ == "__main__" blocks in some modules
**No test coverage metrics.**
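To illustrate what a first automated test could look like without hitting any real API, the sketch below mocks the `Align(...).get_mbid()` interface shown in the usage examples. The helper `link_tracks` is hypothetical and exists only to give the test something to exercise; a real suite would target the project's own functions.

```python
from typing import Callable, Dict, Iterable, Tuple
from unittest import mock


def link_tracks(
    tracks: Iterable[Tuple[str, str]],
    make_linker: Callable[..., object],
) -> Dict[Tuple[str, str], str]:
    """Map (artist, track) pairs to MBIDs, skipping lookups that return None."""
    results = {}
    for artist, track in tracks:
        mbid = make_linker(artist=artist, track=track).get_mbid()
        if mbid is not None:
            results[(artist, track)] = mbid
    return results


def test_link_tracks_skips_failed_lookups():
    # The mock stands in for the Align class; no network access is needed.
    fake_align = mock.Mock()
    fake_align.return_value.get_mbid.side_effect = ["mbid-1", None]

    result = link_tracks([("A", "x"), ("B", "y")], make_linker=fake_align)

    assert result == {("A", "x"): "mbid-1"}
    fake_align.assert_any_call(artist="B", track="y")
```

Running the file with pytest collects `test_link_tracks_skips_failed_lookups` automatically; passing the real `Align` class as `make_linker` would exercise the same code path against the live services.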
### Linting and Formatting
**No linting configuration.**
**No pylint, no flake8, no black, no isort.**
**Code quality:** Inconsistent. Debug prints, commented-out code, inconsistent naming.
### Type Checking
**No type checking.**
**No mypy, no pyright, no pyre.**
**Type hints:** Minimal. Not enforced.
## Monitoring and Logging
### Logging
**Library usage:** Minimal console logging.
**Batch processing:** File-based logging to link_partitions.log.
**Log format:**
```
2024-01-15 10:30:45 - INFO - Processing file: track001.jams
2024-01-15 10:30:46 - INFO - Found MBID: 6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e
2024-01-15 10:30:47 - ERROR - Failed to query Deezer
```
**Log levels:** INFO, ERROR. No DEBUG, WARNING.
**Debug output:** Multiple print() statements in code (not controlled by logging).
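The stray `print()` calls could be replaced by a logger whose level is configurable. The sketch below reproduces the line layout of the sample log above; the format string is inferred from that sample rather than taken from the source, and the function name `build_logger` is an assumption.

```python
import logging
from typing import Optional

# Inferred from the sample log lines: "<date> <time> - <LEVEL> - <message>"
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_logger(name: str = "musicmetalinker", level: str = "INFO",
                 log_file: Optional[str] = None) -> logging.Logger:
    """Return a logger that writes to log_file, or to stderr if none is given."""
    logger = logging.getLogger(name)
    logger.setLevel(level.upper())   # accepts "DEBUG", "INFO", "WARNING", ...
    logger.handlers.clear()          # avoid duplicate handlers on repeated calls
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


log = build_logger(level="DEBUG")
log.debug("Querying Deezer for %s", "track001.jams")
```

With this in place, debug output can be silenced in batch runs by passing `level="INFO"` instead of editing the code.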
### Monitoring
**No monitoring.**
**No metrics collection, no Prometheus, no Grafana, no Datadog.**
**No health checks, no status endpoints.**
### Error Tracking
**No error tracking.**
**No Sentry, no Rollbar, no Bugsnag.**
**Errors silently suppressed.** Failed lookups return `None` instead of raising.
## Scaling Considerations
### Horizontal Scaling
**Not applicable:** Library runs in single process.
**Batch processing:** Can be parallelized manually.
**Manual parallelization:**
```bash
# Split JAMS files into partitions
# Run multiple instances in parallel
python link_partitions.py /data/partition1 --save &
python link_partitions.py /data/partition2 --save &
python link_partitions.py /data/partition3 --save &
wait
```
**No built-in parallelization.**
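The shell pattern above can also be driven from Python, which makes it easier to cap the number of concurrent runs and collect exit codes. This is a sketch assuming it is started from the repository root so that `link_partitions.py` resolves; the helper names are illustrative.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def build_command(partition_dir: str) -> List[str]:
    """Command line for one partition, mirroring the shell example."""
    return [sys.executable, "link_partitions.py", partition_dir, "--save"]


def run_partitions(
    partition_dirs: Sequence[str],
    max_workers: int = 3,
    run: Callable[[List[str]], int] = subprocess.call,
) -> Dict[str, int]:
    """Run one process per partition and return each partition's exit code."""
    dirs = list(partition_dirs)
    # Threads only wait on child processes, so the GIL is not a bottleneck.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        exit_codes = list(pool.map(lambda d: run(build_command(d)), dirs))
    return dict(zip(dirs, exit_codes))
```

A call such as `run_partitions(["/data/partition1", "/data/partition2"])` returns a mapping from directory to exit code. Because all processes share one IP address, the combined request rate may run into the rate limits of the upstream APIs, so `max_workers` is best kept small.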
### Vertical Scaling
**CPU:** Single-threaded. Additional CPU cores don't speed up a single run.
**Memory:** Minimal usage. Each Align instance is estimated at ~1 KB. Batch processing uses more for the pandas DataFrame.
**Network:** The bottleneck. API calls are sequential and latency-bound, so more bandwidth doesn't help.
### Performance Optimization
**No performance optimization.**
**Bottlenecks:**
- Network latency (sequential API calls)
- No caching across instances
- No connection pooling
- No request batching
**Potential optimizations:**
- Async/await for concurrent API calls
- Persistent cache (Redis)
- Connection pooling
- Batch API requests (if services support)
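As one concrete instance of the persistent-cache idea, the sketch below stores lookup results in SQLite from the standard library rather than Redis, which avoids running an extra service. The table layout, key scheme, and function names are assumptions for illustration.

```python
import json
import sqlite3
from typing import Any, Callable


def open_cache(path: str = "mml_cache.sqlite3") -> sqlite3.Connection:
    """Open (or create) a small key/value cache stored in SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def cached_lookup(conn: sqlite3.Connection, key: str,
                  fetch: Callable[[], Any]) -> Any:
    """Return the cached value for key, calling fetch() only on a miss."""
    row = conn.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return json.loads(row[0])
    value = fetch()
    if value is not None:  # do not cache failures, so they are retried later
        conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        conn.commit()
    return value
```

A caller would wrap each API query, for example `cached_lookup(conn, "mbid:artist:track", lambda: linker.get_mbid())`, so that reruns of a batch skip tracks that were already resolved. Values must be JSON-serializable.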
## Security Considerations
### Secrets Management
**Current approach:** Hardcoded in mml_secrets.py.
**Issues:**
- Plaintext credentials
- No encryption
- Risk of committing to version control
**Recommendations:**
- Environment variables
- Secrets vault (HashiCorp Vault, AWS Secrets Manager)
- Encrypted configuration files
### Network Security
**HTTPS:** All API calls use HTTPS.
**Certificate validation:** Handled by requests library (validates by default).
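**No explicit proxy configuration:** There is no option for setting HTTP proxies. Calls made through `requests` still honor the standard `HTTP_PROXY`/`HTTPS_PROXY` environment variables by default.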
### Input Validation
**No input validation.**
**Risks:**
- Invalid MBIDs accepted
- Negative durations accepted
- Malformed ISRCs accepted
**Actual risk:** Low. Invalid input causes query failures (returns None).
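Rejecting malformed input before querying would save a network round trip for each bad value. The following sketch covers the three cases listed above; the function names are hypothetical, and the checks are purely syntactic (an MBID is a UUID, an ISRC is a 12-character code).

```python
import re
import uuid

# ISRC: 2-letter country code, 3 alphanumeric registrant characters,
# 2-digit year, 5-digit designation code (12 characters without hyphens).
ISRC_PATTERN = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")


def is_valid_mbid(value: str) -> bool:
    """True if value is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def is_valid_isrc(value: str) -> bool:
    """True if value matches the ISRC layout, with or without hyphens."""
    return bool(ISRC_PATTERN.fullmatch(value.replace("-", "").upper()))


def is_valid_duration(seconds: float) -> bool:
    """True if seconds is a positive number (booleans are rejected)."""
    return (
        isinstance(seconds, (int, float))
        and not isinstance(seconds, bool)
        and seconds > 0
    )
```

These checks confirm only that a value is well-formed, not that the identifier exists in MusicBrainz or any other service.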
### Dependency Security
**No dependency scanning.**
**No Dependabot, no Snyk, no safety.**
**Vulnerable dependencies:** Unknown. No automated checks.
**Recommendation:** Run `pip-audit` or `safety check` regularly.
## Backup and Recovery
### Data Backup
**No persistent data:** Nothing to back up (library is stateless).
**Batch output:** CSV and JAMS files. User responsible for backup.
### Disaster Recovery
**Not applicable:** Library has no state to recover.
**Batch processing:** Rerun if output lost. No checkpointing, no resume capability.
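Resume capability could be added around the existing batch loop with a plain-text checkpoint file. The sketch below is not how `link_partitions.py` works today; it assumes a per-file `process` callable and a hypothetical checkpoint path chosen by the caller.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def process_with_checkpoint(
    files: Iterable[str],
    process: Callable[[str], None],
    checkpoint_path: str,
) -> List[str]:
    """Process files not yet listed in the checkpoint; return the new ones."""
    checkpoint = Path(checkpoint_path)
    done = set()
    if checkpoint.exists():
        done = set(checkpoint.read_text(encoding="utf-8").splitlines())

    newly_processed = []
    with checkpoint.open("a", encoding="utf-8") as log:
        for name in files:
            if name in done:
                continue  # finished in an earlier run
            process(name)  # an exception here leaves the checkpoint intact
            log.write(name + "\n")
            log.flush()    # record progress immediately in case of a crash
            newly_processed.append(name)
    return newly_processed
```

After an interruption, calling the function again with the same checkpoint path continues with the first unprocessed file.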
## Deployment Checklist
### Library Deployment
- [ ] Python 3.8+ installed
- [ ] pip installed
- [ ] git installed
- [ ] Network access to GitHub
- [ ] Network access to MusicBrainz, Deezer, YouTube Music
- [ ] (Optional) Spotify credentials in mml_secrets.py
### Batch Processing Deployment
- [ ] All library deployment requirements
- [ ] JAMS files prepared
- [ ] Write permissions for output directory
- [ ] (Optional) ffmpeg installed for audio conversion
- [ ] Sufficient disk space for output CSV and enriched JAMS files
### Production Deployment (Recommendations)
- [ ] Pin dependency versions in pyproject.toml
- [ ] Add automated tests
- [ ] Add CI/CD pipeline
- [ ] Add error tracking (Sentry)
- [ ] Add logging (structured JSON logs)
- [ ] Add monitoring (Prometheus metrics)
- [ ] Add rate limiting
- [ ] Add retry logic with exponential backoff
- [ ] Add health checks
- [ ] Use environment variables for configuration
- [ ] Add input validation
- [ ] Add dependency scanning
- [ ] Remove AcousticBrainz integration
- [ ] Fix User-Agent header
- [ ] Add documentation for Spotify setup
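Of the items above, retry logic with exponential backoff is small enough to sketch in a few lines. The helper below is a generic wrapper rather than code from MusicMetaLinker; its name and parameters are assumptions, and the `sleep` argument exists so the delays can be observed in tests.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call call() up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

In practice `retry_on` would be narrowed to the network errors raised by the API clients, so that programming errors are not retried.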
## Deployment Recommendations
### Immediate Actions
1. **Publish to PyPI:** Enable `pip install musicmetalinker` without git.
2. **Pin dependencies:** Add version constraints to prevent breaking changes.
3. **Document Spotify setup:** Instructions for obtaining credentials.
4. **Remove AcousticBrainz:** Delete defunct integration.
### Short-Term Improvements
1. **Add CI/CD:** GitHub Actions for automated testing and releases.
2. **Add tests:** pytest suite with mocked API calls.
3. **Add Docker support:** Official Dockerfile and Docker Compose.
4. **Add configuration:** Support environment variables and config files.
5. **Add logging:** Structured logging with configurable levels.
### Long-Term Enhancements
1. **Add monitoring:** Prometheus metrics for API latency, success rates.
2. **Add caching:** Redis for cross-instance caching.
3. **Add async support:** Concurrent API calls for better performance.
4. **Add health checks:** Service availability monitoring.
5. **Add error tracking:** Sentry integration for production debugging.
6. **Add documentation:** Comprehensive deployment guide.
7. **Add versioning:** Semantic versioning with changelog.
8. **Add security scanning:** Automated dependency vulnerability checks.
## Deployment Maturity Assessment
**Current state:** Research prototype. Suitable for academic exploration, not production.
**Maturity level:** 1/5
**Production readiness:** Low
**Gaps:**
- No PyPI distribution
- No CI/CD
- No tests
- No monitoring
- No error tracking
- Hardcoded configuration
- Dead code (AcousticBrainz)
- No documentation for deployment
**Recommendation:** Use for research and prototyping only. Significant work required for production deployment.