# MusicMetaLinker Deployment

## Distribution Model

MusicMetaLinker is distributed as source code only: there are no binary distributions, no PyPI package, and no conda package.

**Installation method:** Directly from GitHub via pip.

```bash
pip install git+https://github.com/andreamust/MusicMetaLinker.git
```

**Implications:**

- Requires git to be installed
- Requires network access to GitHub
- No version pinning by default: pip installs the latest commit of the default branch unless a specific commit is appended to the URL (`git+https://github.com/andreamust/MusicMetaLinker.git@<commit-sha>`)
- No offline installation

## Build System

### Build Backend

**PEP 517 compliant:** Build configuration is declared in pyproject.toml.

**Build backend:** hatchling (the build backend of the Hatch project).

**pyproject.toml structure:**

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "musicmetalinker"
version = "0.0.1"
dependencies = [
    "musicbrainzngs",
    "deezer-python",
    "ytmusicapi",
    "spotipy",
    "requests",
    "tqdm",
    "jams",
    "pandas",
    "cryptography"
]
```

**No setup.py:** Modern packaging only.

**No setup.cfg:** All configuration lives in pyproject.toml.

### Build Process

**Local (editable) install:**

```bash
git clone https://github.com/andreamust/MusicMetaLinker.git
cd MusicMetaLinker
pip install -e .
```

**-e flag:** Editable install. Changes to the source code take effect immediately, without reinstalling.

**Build artifacts:** None. Pure Python package; nothing is compiled.

### Dependencies

**Runtime dependencies:**

- musicbrainzngs: MusicBrainz API client
- deezer-python: Deezer API wrapper
- ytmusicapi: YouTube Music API client
- spotipy: Spotify API client
- requests: HTTP library
- tqdm: Progress bars
- jams: JAMS format support
- pandas: CSV output
- cryptography: Required by spotipy

**No optional dependencies:** All dependencies are required.

**No development dependencies:** No test framework, no linting tools, no type checkers.

**Dependency versions:** No version constraints. pip installs the latest versions available at install time.

**Risk:** Breaking changes in any dependency may break MusicMetaLinker.
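
One way to prepare version pins is to record the versions that currently work. The sketch below is a hypothetical helper, not part of MusicMetaLinker. It uses the standard library's `importlib.metadata` (available from Python 3.8) to list `"name==version",` entries for the installed runtime dependencies. Those entries could then be copied into the `dependencies` array of pyproject.toml.

```python
# Hypothetical helper (not part of MusicMetaLinker): print pin candidates for
# the runtime dependencies listed in pyproject.toml.
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, List, Optional

RUNTIME_DEPENDENCIES = [
    "musicbrainzngs", "deezer-python", "ytmusicapi", "spotipy", "requests",
    "tqdm", "jams", "pandas", "cryptography",
]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def pin_lines(versions: Dict[str, Optional[str]]) -> List[str]:
    """Format installed packages as pyproject.toml array entries: "name==x.y.z","""
    return [f'"{name}=={ver}",' for name, ver in versions.items() if ver is not None]


if __name__ == "__main__":
    for line in pin_lines(installed_versions(RUNTIME_DEPENDENCIES)):
        print(line)
```

Exact `==` pins favour reproducibility over automatic bug fixes. Compatible-release specifiers such as `~=` (PEP 440) are a common middle ground.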

## Deployment Environments

### Library Deployment

**Target environment:** Python 3.8+ on any platform (Linux, macOS, Windows).

**Installation:**

```bash
pip install git+https://github.com/andreamust/MusicMetaLinker.git
```

**Usage:**

```python
from musicmetalinker.linking import Align

linker = Align(artist="...", track="...")
mbid = linker.get_mbid()
```

**No configuration required** (except Spotify credentials for dataset preparation).

### Batch Processing Deployment

**Target environment:** Python 3.8+ with file system access.

**Installation:** Same as library deployment, plus a clone of the repository, which contains link_partitions.py.

**Usage:**

```bash
cd /path/to/MusicMetaLinker
python link_partitions.py /path/to/jams/files --save --limit audio --overwrite
```

**Requirements:**

- JAMS files in the target directory
- Write permissions for the output CSV and enriched JAMS files
- Network access for API queries

**Optional:** ffmpeg for audio conversion (if processing audio files directly).

### Research Environment Deployment

**Typical setup:** Jupyter notebook or Python script in a research project.

**Installation:**

```bash
pip install git+https://github.com/andreamust/MusicMetaLinker.git
```

**Interactive testing:**

Notebooks included in the repository:

- deezer_test.ipynb: Tests the Deezer integration
- queries.ipynb: Tests various query patterns

**Usage:**

```python
# In a Jupyter notebook
from musicmetalinker.linking import Align

linker = Align(...)
# Interactive exploration of results
```
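
## Configuration Management

### No Configuration Files

All configuration is hardcoded in the source files.

**Hardcoded values:**

- User-Agent: "elka/0.1" (in linking.py). This string does not identify MusicMetaLinker. MusicBrainz asks clients for a meaningful User-Agent that gives the application name, version, and contact information. With musicbrainzngs, this is set via `musicbrainzngs.set_useragent(app, version, contact)`.
- Duration thresholds: 3 s (Deezer), 5 s (MusicBrainz)
- Similarity threshold: 0.8
- API endpoints: in library code

**No config.ini, no config.yaml, no .env files.**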
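
A minimal sketch of how the hardcoded values could be gathered into one object with optional environment-variable overrides. The `LinkerConfig` class and the `MML_*` variable names are hypothetical; MusicMetaLinker does not read them today. The defaults mirror the hardcoded values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LinkerConfig:
    """Hypothetical central configuration; defaults mirror today's hardcoded values."""

    user_agent: str = "elka/0.1"
    deezer_duration_threshold_s: float = 3.0
    musicbrainz_duration_threshold_s: float = 5.0
    similarity_threshold: float = 0.8

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "LinkerConfig":
        """Override defaults with any of the (invented) MML_* variables that are set."""
        d = cls()
        return cls(
            user_agent=env.get("MML_USER_AGENT", d.user_agent),
            deezer_duration_threshold_s=float(
                env.get("MML_DEEZER_DURATION_THRESHOLD", d.deezer_duration_threshold_s)
            ),
            musicbrainz_duration_threshold_s=float(
                env.get("MML_MB_DURATION_THRESHOLD", d.musicbrainz_duration_threshold_s)
            ),
            similarity_threshold=float(
                env.get("MML_SIMILARITY_THRESHOLD", d.similarity_threshold)
            ),
        )
```

Because the defaults equal the current hardcoded values, behaviour only changes when a variable is explicitly set.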

### Spotify Credentials

**Only external configuration:** mml_secrets.py, which holds the Spotify credentials.

**Location:** Must be importable, i.e. on the Python path (typically the same directory as the scripts).

**Structure:**

```python
# mml_secrets.py
SPOTIFY_CLIENT_ID = "your-client-id-here"
SPOTIFY_CLIENT_SECRET = "your-client-secret-here"
```

**Not in repository:** Users must create this file manually.

**No documentation:** The repository gives no instructions for obtaining Spotify credentials.

**Obtaining credentials:**

1. Create an app in the Spotify Developer Dashboard at https://developer.spotify.com/dashboard
2. Copy the app's client ID and client secret
3. Create mml_secrets.py containing these credentials

### Environment Variables

**Not used:** No configuration is read from environment variables.

**Recommendation:** Read credentials from environment variables instead of storing them in mml_secrets.py, for example by having mml_secrets.py itself read them:

```python
import os

SPOTIFY_CLIENT_ID = os.getenv("SPOTIFY_CLIENT_ID")
SPOTIFY_CLIENT_SECRET = os.getenv("SPOTIFY_CLIENT_SECRET")
```
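
`os.getenv` returns `None` for an unset variable, so a missing credential would only surface later as a Spotify authentication failure. The hypothetical fail-fast variant below checks both variables up front. The `load_spotify_credentials` function and `MissingCredentialError` class are illustrative names.

```python
import os
from typing import Mapping, Tuple


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is absent from the environment."""


def load_spotify_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (client_id, client_secret), failing early with a clear message."""
    names = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise MissingCredentialError(
            "Missing environment variable(s): " + ", ".join(missing)
        )
    return env["SPOTIFY_CLIENT_ID"], env["SPOTIFY_CLIENT_SECRET"]
```

Inside mml_secrets.py, this could be used as `SPOTIFY_CLIENT_ID, SPOTIFY_CLIENT_SECRET = load_spotify_credentials()`, which keeps the module's existing variable names.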

## Runtime Requirements

### Python Version

**Minimum:** Python 3.8

**Tested on:** Unknown (no CI/CD, no test matrix).

**Likely compatible:** Python 3.8, 3.9, 3.10, 3.11, 3.12 (untested). Python 3.8 reached end of life in October 2024.

**Type hints:** Used sparingly; no runtime type checking.

### System Dependencies

**Required:**

- Python 3.8+
- pip
- git (for installation)
- Network access (for API queries)

**Optional:**

- ffmpeg (for audio conversion in batch processing)

**No database:** No PostgreSQL, MySQL, MongoDB, etc.

**No message queue:** No RabbitMQ, Redis, Kafka, etc.

**No web server:** No nginx, Apache, etc.

### Platform Support

**Linux:** Expected to work (untested); likely the primary development platform.

**macOS:** Expected to work (untested); all dependencies are available.

**Windows:** Likely supported. All dependencies have Windows wheels. Potential issues:

- Path separators (/ vs \)
- Line endings (LF vs CRLF)
- Case-insensitive file system (names that differ only in case refer to the same file)

**No platform-specific code:** Pure Python; no C extensions (except in dependencies).
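
For the path-separator issue, one portable pattern is to build and search paths with `pathlib` rather than with string concatenation. The helper below is a hypothetical sketch; MusicMetaLinker's own file discovery may differ. It matches the `.jams` extension case-insensitively, so a file such as `TRACK001.JAMS` is found on every platform. It does not address line endings.

```python
from pathlib import Path
from typing import List, Union


def find_jams_files(root: Union[str, Path]) -> List[Path]:
    """Return every file under `root` (recursively) whose extension is .jams, any case."""
    base = Path(root)  # pathlib joins components with the platform's separator
    return sorted(
        p for p in base.rglob("*") if p.is_file() and p.suffix.lower() == ".jams"
    )
```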

## Containerization

### Docker

**No Dockerfile provided.**

**Sample Dockerfile:**

```dockerfile
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends git && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir git+https://github.com/andreamust/MusicMetaLinker.git

# Do not COPY mml_secrets.py into the image: it would be stored in an image layer.
# Mount it at run time instead, for example:
#   docker run -it -v "$PWD/mml_secrets.py:/app/mml_secrets.py:ro" musicmetalinker
CMD ["python"]
```

**For batch processing:**

```dockerfile
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends git ffmpeg && rm -rf /var/lib/apt/lists/*

# Clone once and install from that checkout, so the installed package and
# link_partitions.py come from the same commit.
RUN git clone https://github.com/andreamust/MusicMetaLinker.git /app/MusicMetaLinker

WORKDIR /app/MusicMetaLinker

RUN pip install --no-cache-dir .

ENTRYPOINT ["python", "link_partitions.py"]
```

**Usage:**

```bash
docker build -t musicmetalinker .
docker run -v /path/to/jams:/data musicmetalinker /data --save
```

### Docker Compose

**Not provided.**

**Sample docker-compose.yml** (building the batch-processing image above):

```yaml
version: '3.8'

services:
  musicmetalinker:
    build: .
    command: ["/data", "--save"]
    volumes:
      - ./data:/data
      - ./output:/output
    environment:
      - SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}
```

**Note:** The current code reads Spotify credentials only from mml_secrets.py. These environment variables therefore take effect only if mml_secrets.py reads them with `os.getenv` (see Environment Variables).

### Kubernetes

**Not applicable:** MusicMetaLinker is a library/batch tool, not a long-running service.

**Possible use case:** A Kubernetes Job for batch processing.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: musicmetalinker-batch
spec:
  template:
    spec:
      containers:
        - name: musicmetalinker
          # Push the image to a registry the cluster can pull from: with the
          # :latest tag, the default imagePullPolicy is Always.
          image: musicmetalinker:latest
          args: ["/data", "--save"]
          volumeMounts:
            - name: data
              mountPath: /data
      restartPolicy: Never
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: jams-data
```

## Continuous Integration/Continuous Deployment

### CI/CD Status

**No CI/CD pipeline.**

**No GitHub Actions, no Travis CI, no CircleCI, no Jenkins.**

**Implications:**

- No automated testing on commits
- No automated builds
- No automated releases
- No quality gates

### Testing

**No test suite.**

**No pytest, no unittest, no nose.**

**Testing approach:**

- Manual testing via Jupyter notebooks
- `if __name__ == "__main__"` blocks in some modules

**No test coverage metrics.**
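
Below is a sketch of the kind of offline test that the missing suite could contain. It assumes the matching logic can receive its API client as a parameter. `pick_best_match` is a hypothetical stand-in, not a MusicMetaLinker function. `unittest.mock` replaces the network call, and pytest also collects `unittest.TestCase` classes like this one.

```python
import unittest
from unittest import mock


def pick_best_match(search, artist, track):
    """Hypothetical matcher: return the id of the first result whose title equals `track`."""
    for result in search(artist=artist, track=track):
        if result["title"].lower() == track.lower():
            return result["id"]
    return None


class PickBestMatchTest(unittest.TestCase):
    def test_returns_matching_id_without_network(self):
        # The API client is a mock, so the test never touches the network.
        fake_search = mock.Mock(return_value=[
            {"id": "other", "title": "Something Else"},
            {"id": "mbid-123", "title": "Yesterday"},
        ])
        self.assertEqual(pick_best_match(fake_search, "The Beatles", "Yesterday"), "mbid-123")
        fake_search.assert_called_once_with(artist="The Beatles", track="Yesterday")

    def test_returns_none_when_nothing_matches(self):
        fake_search = mock.Mock(return_value=[])
        self.assertIsNone(pick_best_match(fake_search, "A", "B"))
```

If this is saved as, for example, `test_matching.py`, it runs with `python -m unittest` or `pytest`.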

### Linting and Formatting

**No linting configuration.**

**No pylint, no flake8, no black, no isort.**

**Code quality:** Inconsistent: debug prints, commented-out code, and inconsistent naming.

### Type Checking

**No type checking.**

**No mypy, no pyright, no pyre.**

**Type hints:** Minimal and not enforced.

## Monitoring and Logging

### Logging

**Library usage:** Minimal console logging.

**Batch processing:** File-based logging to link_partitions.log.

**Log format:**

```
2024-01-15 10:30:45 - INFO - Processing file: track001.jams
2024-01-15 10:30:46 - INFO - Found MBID: 6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e
2024-01-15 10:30:47 - ERROR - Failed to query Deezer
```

**Log levels:** Only INFO and ERROR are used; no DEBUG or WARNING.

**Debug output:** Multiple print() statements in the code, not controlled by logging.
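
A minimal sketch of routing diagnostics through the standard `logging` module instead of `print()`. It reproduces the log format shown above and makes the level configurable. `configure_logging` and the `MML_LOG_LEVEL` variable are hypothetical names, not existing MusicMetaLinker settings.

```python
# Hypothetical logging setup for MusicMetaLinker scripts (not existing code).
import logging
import os
from typing import Optional

LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"  # matches the sample log lines
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def configure_logging(logfile: Optional[str] = None, level: Optional[str] = None) -> logging.Logger:
    """Configure the 'musicmetalinker' logger to write to `logfile` (or stderr).

    The level comes from `level`, else the invented MML_LOG_LEVEL variable, else INFO.
    """
    level_name = (level or os.environ.get("MML_LOG_LEVEL", "INFO")).upper()
    logger = logging.getLogger("musicmetalinker")
    for old in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(logfile, encoding="utf-8") if logfile else logging.StreamHandler()
    handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
    logger.addHandler(handler)
    logger.setLevel(level_name)
    logger.propagate = False  # avoid duplicate lines if the root logger is configured too
    return logger
```

With `log = configure_logging("link_partitions.log")`, existing `print()` calls could become `log.debug(...)`. These calls stay silent unless the level is set to DEBUG.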

### Monitoring

**No monitoring.**

**No metrics collection, no Prometheus, no Grafana, no Datadog.**

**No health checks, no status endpoints.**

### Error Tracking

**No error tracking.**

**No Sentry, no Rollbar, no Bugsnag.**

**Errors are silently suppressed:** failing calls return None.

## Scaling Considerations

### Horizontal Scaling

**Not applicable:** The library runs in a single process.

**Batch processing:** Can be parallelized manually.

**Manual parallelization:**

```bash
# Split the JAMS files into partition directories, then
# run one instance per partition in parallel
python link_partitions.py /data/partition1 --save &
python link_partitions.py /data/partition2 --save &
python link_partitions.py /data/partition3 --save &
wait
```

**Caveat:** MusicBrainz limits clients to an average of one request per second per IP address. musicbrainzngs applies its rate limit per process, so parallel instances on one machine multiply the combined request rate and risk being throttled or blocked.

**No built-in parallelization.**
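
Within a single process, network waits could be overlapped with a thread pool while a shared limiter caps the combined request rate. This is a hypothetical sketch: `RateLimiter`, `run_concurrently`, and the `lookup` callback are illustrative, not MusicMetaLinker APIs. It uses threads from `concurrent.futures` rather than the async/await approach suggested under Performance Optimization.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RateLimiter:
    """Allow at most one call per `min_interval` seconds across all threads."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> None:
        # Reserve the next free slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next_allowed - now)
            self._next_allowed = max(now, self._next_allowed) + self.min_interval
        if delay:
            time.sleep(delay)


def run_concurrently(
    items: Iterable[T], lookup: Callable[[T], R], limiter: RateLimiter, workers: int = 4
) -> List[R]:
    """Apply `lookup` to every item on a thread pool; results keep input order."""

    def throttled(item: T) -> R:
        limiter.wait()
        return lookup(item)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(throttled, items))
```

For MusicBrainz, the budget would be `RateLimiter(1.0)`, which caps throughput at about one request per second however many threads run. The gain comes from overlapping calls to services with looser limits.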

### Vertical Scaling

**CPU:** Single-threaded. Additional CPU cores do not help.

**Memory:** Minimal. Each Align instance uses ~1 KB; batch processing needs more for the pandas DataFrame.

**Network:** The bottleneck. API calls are sequential and latency-bound, so more bandwidth does not help.

### Performance Optimization

**No performance optimization.**

**Bottlenecks:**

- Network latency (sequential API calls)
- No caching across instances
- No connection pooling
- No request batching

**Potential optimizations:**

- Async/await for concurrent API calls
- Persistent cache (Redis)
- Connection pooling
- Batch API requests (if the services support them)
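
As a lighter-weight alternative to Redis on a single machine, lookups could be cached persistently with the standard library's `sqlite3`. This is a hypothetical sketch; `LookupCache` and the cache file name are illustrative. Results survive across runs on one host, but sharing a cache between machines would still need a networked store such as Redis.

```python
import json
import sqlite3
from typing import Any, Callable


class LookupCache:
    """Hypothetical file-backed cache mapping a lookup key to a JSON-serialisable result."""

    def __init__(self, path: str = "mml_cache.sqlite3") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value for `key`, computing and storing it on a miss."""
        row = self._conn.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
        if row is not None:
            return json.loads(row[0])
        value = compute()
        if value is not None:  # failures come back as None; don't remember them
            with self._conn:  # commits the insert on success
                self._conn.execute(
                    "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
                    (key, json.dumps(value)),
                )
        return value

    def close(self) -> None:
        self._conn.close()
```

Failures (`None`) are deliberately not cached, so a transient outage is retried on the next run.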

## Security Considerations

### Secrets Management

**Current approach:** Credentials are hardcoded in mml_secrets.py.

**Issues:**

- Plaintext credentials
- No encryption
- Risk of committing the file to version control

**Recommendations:**

- Environment variables
- A secrets vault (HashiCorp Vault, AWS Secrets Manager)
- Encrypted configuration files

### Network Security

**HTTPS:** All API calls use HTTPS.

**Certificate validation:** Handled by the requests library, which validates certificates by default.

**No proxy support:** MusicMetaLinker has no proxy settings of its own. Calls made through the requests library still honour the standard `HTTP_PROXY`/`HTTPS_PROXY` environment variables.

### Input Validation

**No input validation.**

**Risks:**

- Invalid MBIDs accepted
- Negative durations accepted
- Malformed ISRCs accepted

**Actual risk:** Low. Invalid input causes query failures (which return None).
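
Validation could make these failures explicit instead of silent. Below is a minimal sketch with hypothetical helper names. It relies on two formats that are standards rather than MusicMetaLinker specifics. MBIDs are UUIDs. An ISRC is 12 characters: a 2-letter country code, 3 alphanumeric registrant characters, a 2-digit year, and a 5-digit designation code.

```python
import re
import uuid
from typing import Optional

_ISRC_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")


def is_valid_mbid(value: str) -> bool:
    """True if `value` is a UUID in canonical 8-4-4-4-12 form (any letter case)."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def normalize_isrc(value: str) -> Optional[str]:
    """Return the ISRC uppercased with hyphens/spaces removed, or None if malformed."""
    cleaned = value.replace("-", "").replace(" ", "").upper()
    return cleaned if _ISRC_RE.fullmatch(cleaned) else None


def is_valid_duration(seconds: float) -> bool:
    """Durations must be positive, finite numbers of seconds (NaN is rejected)."""
    if isinstance(seconds, bool) or not isinstance(seconds, (int, float)):
        return False
    return 0 < seconds < float("inf")
```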

### Dependency Security

**No dependency scanning.**

**No Dependabot, no Snyk, no safety.**

**Vulnerable dependencies:** Unknown; there are no automated checks.

**Recommendation:** Run `pip-audit` or `safety check` regularly.

## Backup and Recovery

### Data Backup

**No persistent data:** Nothing to back up (the library is stateless).

**Batch output:** CSV and JAMS files. Users are responsible for backing them up.

### Disaster Recovery

**Not applicable:** The library has no state to recover.

**Batch processing:** Rerun if the output is lost. There is no checkpointing and no resume capability.
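
Resume support could be added with a simple checkpoint file. The sketch below is hypothetical; `run_with_checkpoint` and the `process` callback are not part of link_partitions.py. Each finished JAMS path is appended to the checkpoint and flushed immediately. Paths already listed there are skipped on the next run.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Set


def run_with_checkpoint(
    files: Iterable[Path], process: Callable[[Path], None], checkpoint: Path
) -> List[Path]:
    """Process files not yet recorded in `checkpoint`; return those processed now."""
    done: Set[str] = set()
    if checkpoint.exists():
        done = set(checkpoint.read_text(encoding="utf-8").splitlines())
    processed: List[Path] = []
    with checkpoint.open("a", encoding="utf-8") as log:
        for path in files:
            key = str(path)
            if key in done:
                continue
            process(path)  # if this raises, the file stays unrecorded
            log.write(key + "\n")
            log.flush()  # record progress immediately
            processed.append(path)
    return processed
```

Deleting the checkpoint file forces a full rerun.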

## Deployment Checklist

### Library Deployment

- [ ] Python 3.8+ installed
- [ ] pip installed
- [ ] git installed
- [ ] Network access to GitHub
- [ ] Network access to MusicBrainz, Deezer, YouTube Music
- [ ] (Optional) Spotify credentials in mml_secrets.py

### Batch Processing Deployment

- [ ] All library deployment requirements
- [ ] JAMS files prepared
- [ ] Write permissions for output directory
- [ ] (Optional) ffmpeg installed for audio conversion
- [ ] Sufficient disk space for output CSV and enriched JAMS files

### Production Deployment (Recommendations)

- [ ] Pin dependency versions in pyproject.toml
- [ ] Add automated tests
- [ ] Add CI/CD pipeline
- [ ] Add error tracking (Sentry)
- [ ] Add logging (structured JSON logs)
- [ ] Add monitoring (Prometheus metrics)
- [ ] Add rate limiting
- [ ] Add retry logic with exponential backoff
- [ ] Add health checks
- [ ] Use environment variables for configuration
- [ ] Add input validation
- [ ] Add dependency scanning
- [ ] Remove AcousticBrainz integration
- [ ] Fix User-Agent header
- [ ] Add documentation for Spotify setup
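
For the retry item, here is a minimal sketch of exponential backoff with jitter using only the standard library. `retry_with_backoff` is a hypothetical helper. Treating `ConnectionError` and `TimeoutError` as transient is an assumption, because each client library raises its own exception types.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call `call()`, sleeping base_delay * 2**n (capped, plus jitter) between failures."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error instead of returning None
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("attempts must be at least 1")
```

MusicMetaLinker currently returns `None` on failure instead of raising. A wrapper like this therefore only takes effect once errors are propagated, or once `None` is treated as retryable.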

## Deployment Recommendations

### Immediate Actions

1. **Publish to PyPI:** Enable `pip install musicmetalinker` without git.
2. **Pin dependencies:** Add version constraints to prevent breaking changes.
3. **Document Spotify setup:** Add instructions for obtaining credentials.
4. **Remove AcousticBrainz:** Delete the defunct integration.

### Short-Term Improvements

1. **Add CI/CD:** GitHub Actions for automated testing and releases.
2. **Add tests:** A pytest suite with mocked API calls.
3. **Add Docker support:** An official Dockerfile and Docker Compose file.
4. **Add configuration:** Support environment variables and config files.
5. **Add logging:** Structured logging with configurable levels.

### Long-Term Enhancements

1. **Add monitoring:** Prometheus metrics for API latency and success rates.
2. **Add caching:** Redis for cross-instance caching.
3. **Add async support:** Concurrent API calls for better performance.
4. **Add health checks:** Service availability monitoring.
5. **Add error tracking:** Sentry integration for production debugging.
6. **Add documentation:** A comprehensive deployment guide.
7. **Add versioning:** Semantic versioning with a changelog.
8. **Add security scanning:** Automated dependency vulnerability checks.

## Deployment Maturity Assessment

**Current state:** Research prototype. Suitable for academic exploration, not production.

**Maturity level:** 1/5

**Production readiness:** Low

**Gaps:**

- No PyPI distribution
- No CI/CD
- No tests
- No monitoring
- No error tracking
- Hardcoded configuration
- Dead code (AcousticBrainz)
- No documentation for deployment

**Recommendation:** Use for research and prototyping only. Significant work is required before production deployment.