# MusicMetaLinker Deployment

## Distribution Model

MusicMetaLinker is distributed as source code only. No binary distributions, no PyPI package, no conda package.

**Installation method:** Direct from GitHub via pip.

```bash
pip install git+https://github.com/andreamust/MusicMetaLinker.git
```

**Implications:**
- Requires git installed
- Requires network access to GitHub
- No version pinning by default: always installs the latest commit unless a commit or tag is appended to the URL (`git+https://github.com/andreamust/MusicMetaLinker.git@<commit-or-tag>`)
- No offline installation

## Build System

### Build Backend

**PEP 517 compliant:** Uses pyproject.toml for build configuration.

**Build backend:** hatchling (modern Python build tool).

**pyproject.toml structure:**

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "musicmetalinker"
version = "0.0.1"
dependencies = [
    "musicbrainzngs",
    "deezer-python",
    "ytmusicapi",
    "spotipy",
    "requests",
    "tqdm",
    "jams",
    "pandas",
    "cryptography"
]
```

**No setup.py:** Modern packaging only.

**No setup.cfg:** All configuration in pyproject.toml.

### Build Process

**Local build:**

```bash
git clone https://github.com/andreamust/MusicMetaLinker.git
cd MusicMetaLinker
pip install -e .
```

**-e flag:** Editable install. Changes to source code immediately reflected.

**Build artifacts:** None. Pure Python package, no compilation.

### Dependencies

**Runtime dependencies:**

- musicbrainzngs: MusicBrainz API client
- deezer-python: Deezer API wrapper
- ytmusicapi: YouTube Music API client
- spotipy: Spotify API client
- requests: HTTP library
- tqdm: Progress bars
- jams: JAMS format support
- pandas: CSV output
- cryptography: Required by spotipy

**No optional dependencies:** All dependencies required.

**No development dependencies:** No test framework, no linting tools, no type checkers.

**Dependency versions:** No version constraints. Always installs latest compatible versions.

**Risk:** Breaking changes in dependencies may break MusicMetaLinker.
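
Until version constraints are added, a known-good set can be captured from an environment where MusicMetaLinker works and pasted into `dependencies` as `==` pins (`pip freeze` reports the same information for the whole environment). A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper `pin_installed` is hypothetical, not part of MusicMetaLinker:

```python
from importlib import metadata
from typing import Callable, Iterable, List

# Runtime dependencies as listed in pyproject.toml.
RUNTIME_DEPS = [
    "musicbrainzngs", "deezer-python", "ytmusicapi", "spotipy",
    "requests", "tqdm", "jams", "pandas", "cryptography",
]


def pin_installed(
    names: Iterable[str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return 'name==version' for every installed distribution in names.

    Distributions that are not installed are skipped rather than guessed.
    """
    pins = []
    for name in names:
        try:
            pins.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            continue
    return pins


if __name__ == "__main__":
    # Prints lines ready to paste into the dependencies array.
    for pin in pin_installed(RUNTIME_DEPS):
        print(f'    "{pin}",')
```

Injecting `get_version` keeps the function testable without depending on what happens to be installed.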

## Deployment Environments

### Library Deployment

**Target environment:** Python 3.8+ on any platform (Linux, macOS, Windows).

**Installation:**

```bash
pip install git+https://github.com/andreamust/MusicMetaLinker.git
```

**Usage:**

```python
from musicmetalinker.linking import Align

linker = Align(artist="...", track="...")
mbid = linker.get_mbid()
```

**No configuration required** (except Spotify credentials for dataset preparation).

### Batch Processing Deployment

**Target environment:** Python 3.8+ with file system access.

**Installation:** Same as library deployment, plus a local clone of the repository: `link_partitions.py` is run from the checkout, as in the usage below.

**Usage:**

```bash
cd /path/to/MusicMetaLinker
python link_partitions.py /path/to/jams/files --save --limit audio --overwrite
```

**Requirements:**
- JAMS files in target directory
- Write permissions for output CSV and enriched JAMS files
- Network access for API queries

**Optional:** ffmpeg for audio conversion (if processing audio files directly).

### Research Environment Deployment

**Typical setup:** Jupyter notebook or Python script in research project.

**Installation:**

```bash
pip install git+https://github.com/andreamust/MusicMetaLinker.git
```

**Interactive testing:**

Notebooks included in repository:
- deezer_test.ipynb: Test Deezer integration
- queries.ipynb: Test various query patterns

**Usage:**

```python
# In Jupyter notebook
from musicmetalinker.linking import Align

linker = Align(...)
# Interactive exploration of results
```

## Configuration Management

### No Configuration Files

All configuration hardcoded in source files.

**Hardcoded values:**
- User-Agent: "elka/0.1" (in linking.py)
- Duration thresholds: 3s (Deezer), 5s (MusicBrainz)
- Similarity threshold: 0.8
- API endpoints: In library code

**No config.ini, no config.yaml, no .env files.**
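
One lightweight way to make the hardcoded values overridable is to read them from environment variables, keeping the current values as defaults. A minimal sketch, assuming a hypothetical `LinkerSettings` dataclass and hypothetical `MML_<FIELD>` variable names (neither exists in MusicMetaLinker); the defaults copy the values listed above:

```python
import os
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass(frozen=True)
class LinkerSettings:
    # Defaults mirror the values currently hardcoded in the source.
    user_agent: str = "elka/0.1"
    deezer_duration_tolerance_s: float = 3.0
    musicbrainz_duration_tolerance_s: float = 5.0
    similarity_threshold: float = 0.8

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "LinkerSettings":
        """Build settings, letting MML_<FIELD_NAME> variables override defaults."""
        env = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            raw = env.get(f"MML_{f.name.upper()}")
            if raw is not None:
                # Convert using the type of the field's default (str or float).
                overrides[f.name] = type(f.default)(raw)
        return cls(**overrides)


if __name__ == "__main__":
    print(LinkerSettings.from_env({"MML_SIMILARITY_THRESHOLD": "0.9"}))
```

For example, `MML_SIMILARITY_THRESHOLD=0.9` would raise the threshold while the duration tolerances keep their defaults. Passing a plain dict instead of `os.environ` keeps the override logic easy to test.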

### Spotify Credentials

**Only external configuration:** mml_secrets.py for Spotify credentials.

**Location:** Must be on the Python path (typically the same directory as the scripts).

**Structure:**

```python
# mml_secrets.py
SPOTIFY_CLIENT_ID = "your-client-id-here"
SPOTIFY_CLIENT_SECRET = "your-client-secret-here"
```

**Not in repository:** Users must create this file manually.

**No documentation:** The repository gives no instructions for obtaining Spotify credentials.

**Obtaining credentials:**
1. Register app at https://developer.spotify.com/dashboard
2. Copy client ID and secret
3. Create mml_secrets.py with credentials

### Environment Variables

**Not used:** No environment variable configuration.

**Recommendation:** Use environment variables for credentials instead of mml_secrets.py.

```python
import os

# os.getenv returns None when the variable is not set
SPOTIFY_CLIENT_ID = os.getenv("SPOTIFY_CLIENT_ID")
SPOTIFY_CLIENT_SECRET = os.getenv("SPOTIFY_CLIENT_SECRET")
```
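
Environment variables can be adopted without breaking existing setups by checking them first and falling back to `mml_secrets.py`. A minimal sketch; `load_spotify_credentials` is a hypothetical helper, and the variable names follow the snippet above:

```python
import importlib
import os
from typing import Mapping, Optional, Tuple


def load_spotify_credentials(
    env: Optional[Mapping[str, str]] = None,
    secrets_module: str = "mml_secrets",
) -> Tuple[str, str]:
    """Return (client_id, client_secret), preferring environment variables."""
    env = os.environ if env is None else env
    client_id = env.get("SPOTIFY_CLIENT_ID")
    client_secret = env.get("SPOTIFY_CLIENT_SECRET")
    if client_id and client_secret:
        return client_id, client_secret

    # Fall back to the legacy mml_secrets.py file, if it is importable.
    try:
        secrets = importlib.import_module(secrets_module)
    except ImportError:
        secrets = None
    if secrets is not None:
        client_id = getattr(secrets, "SPOTIFY_CLIENT_ID", None)
        client_secret = getattr(secrets, "SPOTIFY_CLIENT_SECRET", None)
        if client_id and client_secret:
            return client_id, client_secret

    raise RuntimeError(
        "Spotify credentials not found: set SPOTIFY_CLIENT_ID and "
        "SPOTIFY_CLIENT_SECRET, or create mml_secrets.py"
    )
```

Raising a clear error, rather than returning `None`, surfaces a missing credential at startup instead of as a failed Spotify query later.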

## Runtime Requirements

### Python Version

**Minimum:** Python 3.8

**Tested on:** Unknown (no CI/CD, no test matrix).

**Likely compatible:** Python 3.8, 3.9, 3.10, 3.11, 3.12

**Type hints:** Not used extensively. No runtime type checking.

### System Dependencies

**Required:**
- Python 3.8+
- pip
- git (for installation)
- Network access (for API queries)

**Optional:**
- ffmpeg (for audio conversion in batch processing)

**No database:** No PostgreSQL, MySQL, MongoDB, etc.

**No message queue:** No RabbitMQ, Redis, Kafka, etc.

**No web server:** No nginx, Apache, etc.

### Platform Support

**Linux:** Fully supported. Likely the primary development platform.

**macOS:** Fully supported. All dependencies available.

**Windows:** Likely supported. All dependencies have Windows wheels. Potential issues:
- Path separators (/ vs \)
- Line endings (LF vs CRLF)
- Case sensitivity (Windows file systems are case-insensitive by default, so file names differing only in case collide)

**No platform-specific code:** Pure Python, no C extensions (except in dependencies).
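
Before copying a dataset prepared on Linux to Windows (or to macOS, whose default file system is also case-insensitive), names that collide when case is ignored can be detected up front. A minimal sketch; `find_case_collisions` is a hypothetical helper and `data/jams` is a placeholder directory. Building paths with `pathlib` also avoids hand-written `/` or `\` separators:

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def find_case_collisions(paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group paths that are identical when compared case-insensitively."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        # casefold() is a stricter lower() intended for caseless matching.
        groups[str(path).casefold()].append(path)
    return {key: group for key, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    # The / operator joins Path parts using the platform's native separator.
    jams_dir = Path("data") / "jams"
    for group in find_case_collisions(jams_dir.glob("*.jams")).values():
        print("Collide on case-insensitive file systems:", [str(p) for p in group])
```

The check is only meaningful on a case-sensitive file system (typically Linux), since a case-insensitive one cannot hold both files in the first place.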

## Containerization

### Docker

**No Dockerfile provided.**

**Sample Dockerfile:**

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# git is needed for pip to install from a GitHub URL
RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*

RUN pip install git+https://github.com/andreamust/MusicMetaLinker.git

# Copies the Spotify credentials into the image: do not push it to a shared registry
COPY mml_secrets.py /app/

CMD ["python"]
```

**For batch processing:**

```dockerfile
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y git ffmpeg && rm -rf /var/lib/apt/lists/*

# link_partitions.py lives in the repository, so clone it once and install
# from the same checkout (avoids fetching the code twice, possibly at different commits)
RUN git clone https://github.com/andreamust/MusicMetaLinker.git /app/MusicMetaLinker

WORKDIR /app/MusicMetaLinker

RUN pip install .

ENTRYPOINT ["python", "link_partitions.py"]
```

**Usage:**

```bash
docker build -t musicmetalinker .
docker run -v /path/to/jams:/data musicmetalinker /data --save
```

### Docker Compose

**Not provided.**

**Sample docker-compose.yml:**

```yaml
version: '3.8'

services:
  musicmetalinker:
    build: .
    volumes:
      - ./data:/data
      - ./output:/output
    # Only effective if the code reads credentials from the environment;
    # by default they come from mml_secrets.py
    environment:
      - SPOTIFY_CLIENT_ID=${SPOTIFY_CLIENT_ID}
      - SPOTIFY_CLIENT_SECRET=${SPOTIFY_CLIENT_SECRET}
```

### Kubernetes

**Not applicable:** MusicMetaLinker is a library/batch tool, not a long-running service.

**Possible use case:** Kubernetes Job for batch processing.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: musicmetalinker-batch
spec:
  template:
    spec:
      containers:
        - name: musicmetalinker
          # The image must be pushed to a registry the cluster can pull from
          image: musicmetalinker:latest
          args: ["/data", "--save"]
          volumeMounts:
            - name: data
              mountPath: /data
      restartPolicy: Never
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: jams-data
```

## Continuous Integration/Continuous Deployment

### CI/CD Status

**No CI/CD pipeline.**

**No GitHub Actions, no Travis CI, no CircleCI, no Jenkins.**

**Implications:**
- No automated testing on commits
- No automated builds
- No automated releases
- No quality gates

### Testing

**No test suite.**

**No pytest, no unittest, no nose.**

**Testing approach:**
- Manual testing via Jupyter notebooks
- `if __name__ == "__main__"` blocks in some modules

**No test coverage metrics.**
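
API-dependent code can be tested without network access by injecting the search function and replacing it with a mock. A minimal sketch using the standard-library `unittest` and `unittest.mock` (pytest also collects `unittest.TestCase` classes); `best_recording_id` is a hypothetical helper, not a MusicMetaLinker function, and the response dictionary is a simplified stand-in loosely modeled on the `recording-list` results of `musicbrainzngs.search_recordings`:

```python
import unittest
from typing import Any, Callable, Dict, Optional
from unittest import mock


def best_recording_id(
    search: Callable[..., Dict[str, Any]], artist: str, track: str
) -> Optional[str]:
    """Return the id of the first recording in a search result, or None."""
    result = search(artist=artist, recording=track, limit=1)
    recordings = result.get("recording-list", [])
    return recordings[0]["id"] if recordings else None


class BestRecordingIdTest(unittest.TestCase):
    def test_returns_first_match(self) -> None:
        search = mock.Mock(return_value={"recording-list": [{"id": "mbid-1"}]})
        self.assertEqual(best_recording_id(search, "Artist", "Track"), "mbid-1")
        # The mock also records how the API would have been called.
        search.assert_called_once_with(artist="Artist", recording="Track", limit=1)

    def test_returns_none_when_nothing_found(self) -> None:
        search = mock.Mock(return_value={"recording-list": []})
        self.assertIsNone(best_recording_id(search, "Artist", "Track"))
```

Saved as, for example, `tests/test_lookup.py` (a hypothetical path), the file runs with `python -m unittest discover tests`.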

### Linting and Formatting

**No linting configuration.**

**No pylint, no flake8, no black, no isort.**

**Code quality:** Inconsistent. Debug prints, commented-out code, inconsistent naming.

### Type Checking

**No type checking.**

**No mypy, no pyright, no pyre.**

**Type hints:** Minimal. Not enforced.

## Monitoring and Logging

### Logging

**Library usage:** Minimal console logging.

**Batch processing:** File-based logging to link_partitions.log.

**Log format:**

```
2024-01-15 10:30:45 - INFO - Processing file: track001.jams
2024-01-15 10:30:46 - INFO - Found MBID: 6b9e7b9e-8f9e-4f9e-9f9e-9f9e9f9e9f9e
2024-01-15 10:30:47 - ERROR - Failed to query Deezer
```

**Log levels:** INFO, ERROR. No DEBUG or WARNING.

**Debug output:** Multiple `print()` statements in code (not controlled by logging).
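
The format shown above corresponds to standard `logging` format strings, and routing the current `print()` calls through `logger.debug` would put debug output under the same level control. A minimal sketch; the logger name `musicmetalinker`, the `MML_LOG_LEVEL` variable, and the `configure_logging` helper are assumptions:

```python
import logging
import os
from typing import Optional, TextIO

# Same layout as the sample lines: "2024-01-15 10:30:45 - INFO - ..."
LOG_FORMAT = "%(asctime)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

logger = logging.getLogger("musicmetalinker")


def configure_logging(
    level: Optional[str] = None,
    filename: Optional[str] = None,
    stream: Optional[TextIO] = None,
) -> None:
    """Send package logs to a file or stream at a configurable level."""
    level = level or os.environ.get("MML_LOG_LEVEL", "INFO")
    if filename:
        handler: logging.Handler = logging.FileHandler(filename)
    else:
        handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter(LOG_FORMAT, DATE_FORMAT))
    logger.handlers[:] = [handler]  # replace, so repeated calls don't duplicate lines
    logger.setLevel(level.upper())


# Instead of print(candidates), debug details become level-controlled records:
#     logger.debug("Deezer candidates: %s", candidates)
```

With `MML_LOG_LEVEL=DEBUG`, the former `print()` output would reappear in the log; at the default INFO level it is suppressed.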

### Monitoring

**No monitoring.**

**No metrics collection, no Prometheus, no Grafana, no Datadog.**

**No health checks, no status endpoints.**

### Error Tracking

**No error tracking.**

**No Sentry, no Rollbar, no Bugsnag.**

**Errors silently suppressed.** Returns None on failure.
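
Returning `None` can stay the public contract while the cause of a failure is still recorded. A minimal sketch of a decorator that logs the exception and its traceback before returning `None`; `log_failures`, `query_deezer`, and the logger name are hypothetical. Error-tracking tools with a `logging` integration (Sentry has one) could then pick up these ERROR records:

```python
import functools
import logging
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("musicmetalinker")


def log_failures(func: Callable[..., T]) -> Callable[..., Optional[T]]:
    """Keep the 'return None on failure' behavior, but log the exception."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Optional[T]:
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and appends the traceback.
            logger.exception("%s failed (args=%r, kwargs=%r)", func.__name__, args, kwargs)
            return None

    return wrapper


@log_failures
def query_deezer(track: str) -> dict:
    # Hypothetical stand-in for an API call that fails.
    raise ConnectionError(f"could not reach Deezer for {track!r}")
```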

## Scaling Considerations

### Horizontal Scaling

**Not applicable:** Library runs in single process.

**Batch processing:** Can be parallelized manually.

**Manual parallelization:**

```bash
# Split JAMS files into partitions
# Run multiple instances in parallel
# (instances started from the same directory all append to link_partitions.log)
python link_partitions.py /data/partition1 --save &
python link_partitions.py /data/partition2 --save &
python link_partitions.py /data/partition3 --save &
wait
```

**No built-in parallelization.**
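
The shell example assumes the JAMS files are already split into partition directories. A minimal sketch of that split using only the standard library; `split_into_partitions` is hypothetical, files are assigned round-robin and copied rather than moved, and if `link_partitions.py` writes enriched files next to its inputs, they would land in the partition copies:

```python
import shutil
from pathlib import Path
from typing import List


def split_into_partitions(src_dir: Path, dest_root: Path, n: int) -> List[Path]:
    """Copy *.jams files from src_dir into n sub-directories, round-robin."""
    if n < 1:
        raise ValueError("n must be at least 1")
    partitions = [dest_root / f"partition{i + 1}" for i in range(n)]
    for part in partitions:
        part.mkdir(parents=True, exist_ok=True)
    # Sorting makes the assignment deterministic across runs.
    for index, jams_file in enumerate(sorted(src_dir.glob("*.jams"))):
        shutil.copy2(jams_file, partitions[index % n] / jams_file.name)
    return partitions
```

With a placeholder source directory `/data/all`, `split_into_partitions(Path("/data/all"), Path("/data"), 3)` would create `/data/partition1` to `/data/partition3`, matching the commands above. Round-robin balances file counts, not processing time.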

### Vertical Scaling

**CPU:** Single-threaded. More CPU cores don't help a single run.

**Memory:** Minimal usage. Each Align instance uses ~1KB. Batch processing uses more for the pandas DataFrame.

**Network:** Bottleneck. Sequential API calls. More bandwidth doesn't help (latency-bound).
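
Because the work is latency-bound, overlapping requests helps even under CPython's GIL, which is released while threads wait on the network. Async/await (listed under potential optimizations) is one route; a thread pool is a standard-library alternative that works with the existing synchronous clients. A minimal sketch, where `lookup` stands for a hypothetical per-track function:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Tuple

Track = Tuple[str, str]  # (artist, title)


def lookup_all(
    tracks: Iterable[Track],
    lookup: Callable[[str, str], Optional[str]],
    max_workers: int = 4,
) -> Dict[Track, Optional[str]]:
    """Run lookup(artist, title) for every track, overlapping network waits."""
    unique = list(dict.fromkeys(tracks))  # drop duplicates, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs them correctly.
        results = pool.map(lambda t: lookup(*t), unique)
        return dict(zip(unique, results))
```

`max_workers` should stay small: MusicBrainz asks clients to average no more than one request per second, so concurrency mainly pays off across different services or behind a shared rate limiter.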

### Performance Optimization

**No performance optimization.**

**Bottlenecks:**
- Network latency (sequential API calls)
- No caching across instances
- No connection pooling
- No request batching

**Potential optimizations:**
- Async/await for concurrent API calls
- Persistent cache (Redis)
- Connection pooling
- Batch API requests (if services support them)
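
Redis is one option for a shared cache; on a single machine, the standard-library `sqlite3` module provides a persistent cache without running an extra service. A minimal sketch that memoizes results by `(artist, title)`; the `LookupCache` class and its table layout are hypothetical, and `None` results are cached too so that repeated misses do not query the APIs again:

```python
import sqlite3
from typing import Callable, Optional


class LookupCache:
    """Persist lookup results across runs in a SQLite file."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS lookups ("
            " artist TEXT NOT NULL, title TEXT NOT NULL, result TEXT,"
            " PRIMARY KEY (artist, title))"
        )
        self.conn.commit()

    def get_or_compute(
        self, artist: str, title: str, compute: Callable[[str, str], Optional[str]]
    ) -> Optional[str]:
        row = self.conn.execute(
            "SELECT result FROM lookups WHERE artist = ? AND title = ?", (artist, title)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit (possibly a cached None)
        result = compute(artist, title)
        with self.conn:  # commits the insert on success
            self.conn.execute(
                "INSERT OR REPLACE INTO lookups VALUES (?, ?, ?)", (artist, title, result)
            )
        return result

    def close(self) -> None:
        self.conn.close()
```

Caching misses is a trade-off: a recording added to MusicBrainz later will not be found until its cache row is deleted.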

## Security Considerations

### Secrets Management

**Current approach:** Hardcoded in mml_secrets.py.

**Issues:**
- Plaintext credentials
- No encryption
- Risk of committing to version control

**Recommendations:**
- Environment variables
- Secrets vault (HashiCorp Vault, AWS Secrets Manager)
- Encrypted configuration files
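
### Network Security

**HTTPS:** All API calls use HTTPS.

**Certificate validation:** Handled by the underlying HTTP client libraries (requests, for example, validates certificates by default).

**Proxy support:** No proxy configuration of its own. requests honors the standard `HTTP_PROXY`/`HTTPS_PROXY` environment variables by default, so clients built on it can usually be proxied that way.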

### Input Validation

**No input validation.**

**Risks:**
- Invalid MBIDs accepted
- Negative durations accepted
- Malformed ISRCs accepted

**Actual risk:** Low. Invalid input causes query failures (returns None).
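
Checking identifiers before querying turns a silent `None` into an explicit error. A minimal sketch, assuming the standard formats: an MBID is a UUID in the hyphenated 8-4-4-4-12 form, and an ISRC has 12 characters (2-letter country code, 3 alphanumeric registrant characters, 2-digit year, 5-digit designation), often written with hyphens. The helper names are hypothetical:

```python
import re
import uuid

# Country (2 letters) + registrant (3 alphanumerics) + year (2) + designation (5).
ISRC_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{3}[0-9]{7}")


def is_valid_mbid(value: str) -> bool:
    """True if value is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def normalize_isrc(value: str) -> str:
    """Drop hyphens and spaces, upper-case, and check the ISRC layout."""
    compact = re.sub(r"[-\s]", "", value).upper()
    if not ISRC_RE.fullmatch(compact):
        raise ValueError(f"malformed ISRC: {value!r}")
    return compact


def validate_duration(seconds: float) -> float:
    """Reject negative durations instead of passing them to the APIs."""
    if seconds < 0:
        raise ValueError(f"duration must be non-negative, got {seconds}")
    return float(seconds)
```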

### Dependency Security

**No dependency scanning.**

**No Dependabot, no Snyk, no safety.**

**Vulnerable dependencies:** Unknown. No automated checks.

**Recommendation:** Run `pip-audit` or `safety check` regularly.

## Backup and Recovery

### Data Backup

**No persistent data:** Nothing to back up (library is stateless).

**Batch output:** CSV and JAMS files. User responsible for backup.

### Disaster Recovery

**Not applicable:** Library has no state to recover.

**Batch processing:** Rerun if output lost. No checkpointing, no resume capability.
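
Resuming only requires a record of which files completed. A minimal sketch, assuming each JAMS file is processed independently; `process_with_checkpoint`, the `process` callback, and the checkpoint file are hypothetical and not part of `link_partitions.py`:

```python
from pathlib import Path
from typing import Callable, Iterable, List


def process_with_checkpoint(
    files: Iterable[Path],
    process: Callable[[Path], None],
    checkpoint: Path,
) -> List[Path]:
    """Process files not yet listed in checkpoint; return the ones done now."""
    done = set(checkpoint.read_text().splitlines()) if checkpoint.exists() else set()
    processed = []
    with checkpoint.open("a") as log:
        for path in files:
            if str(path) in done:
                continue  # finished in an earlier run
            process(path)
            # Record only after success, and flush so a crash keeps the entry.
            log.write(f"{path}\n")
            log.flush()
            processed.append(path)
    return processed
```

A file is recorded only after `process` returns, so a crash mid-file means that file is redone on the next run rather than skipped.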

## Deployment Checklist

### Library Deployment

- [ ] Python 3.8+ installed
- [ ] pip installed
- [ ] git installed
- [ ] Network access to GitHub
- [ ] Network access to MusicBrainz, Deezer, YouTube Music
- [ ] (Optional) Spotify credentials in mml_secrets.py

### Batch Processing Deployment

- [ ] All library deployment requirements
- [ ] JAMS files prepared
- [ ] Write permissions for output directory
- [ ] (Optional) ffmpeg installed for audio conversion
- [ ] Sufficient disk space for output CSV and enriched JAMS files

### Production Deployment (Recommendations)

- [ ] Pin dependency versions in pyproject.toml
- [ ] Add automated tests
- [ ] Add CI/CD pipeline
- [ ] Add error tracking (Sentry)
- [ ] Add logging (structured JSON logs)
- [ ] Add monitoring (Prometheus metrics)
- [ ] Add rate limiting
- [ ] Add retry logic with exponential backoff
- [ ] Add health checks
- [ ] Use environment variables for configuration
- [ ] Add input validation
- [ ] Add dependency scanning
- [ ] Remove AcousticBrainz integration
- [ ] Fix User-Agent header
- [ ] Add documentation for Spotify setup
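
The rate-limiting and retry items can be combined in a small wrapper around each API call. A minimal sketch, assuming a fixed minimum interval between requests (MusicBrainz asks clients to average no more than one request per second) and exponential backoff with jitter; `RateLimiter`, `call_with_retry`, and treating the built-in `ConnectionError` and `TimeoutError` as retryable are assumptions, and the `clock`/`sleep` parameters exist only so the logic can be tested without waiting:

```python
import random
import threading
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimiter:
    """Space calls at least min_interval seconds apart (thread-safe)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")
        self._lock = threading.Lock()

    def wait(self) -> None:
        # Holding the lock while sleeping makes concurrent callers take turns.
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.min_interval


def call_with_retry(
    func: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func; on a retry_on error, back off exponentially and try again."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Delays of base_delay * 1, 2, 4, ... plus up to 10% random jitter.
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

In use, the function passed to `call_with_retry` would call `limiter.wait()` first, so retried requests are rate-limited as well. requests raises its own `requests.exceptions.ConnectionError`, which does not derive from the built-in `ConnectionError`, so the real client exceptions would need to be passed in `retry_on`.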

## Deployment Recommendations

### Immediate Actions

1. **Publish to PyPI:** Enable `pip install musicmetalinker` without git.
2. **Pin dependencies:** Add version constraints to prevent breaking changes.
3. **Document Spotify setup:** Instructions for obtaining credentials.
4. **Remove AcousticBrainz:** Delete defunct integration.

### Short-Term Improvements

1. **Add CI/CD:** GitHub Actions for automated testing and releases.
2. **Add tests:** pytest suite with mocked API calls.
3. **Add Docker support:** Official Dockerfile and Docker Compose.
4. **Add configuration:** Support environment variables and config files.
5. **Add logging:** Structured logging with configurable levels.

### Long-Term Enhancements

1. **Add monitoring:** Prometheus metrics for API latency, success rates.
2. **Add caching:** Redis for cross-instance caching.
3. **Add async support:** Concurrent API calls for better performance.
4. **Add health checks:** Service availability monitoring.
5. **Add error tracking:** Sentry integration for production debugging.
6. **Add documentation:** Comprehensive deployment guide.
7. **Add versioning:** Semantic versioning with changelog.
8. **Add security scanning:** Automated dependency vulnerability checks.

## Deployment Maturity Assessment

**Current state:** Research prototype. Suitable for academic exploration, not production.

**Maturity level:** 1/5

**Production readiness:** Low

**Gaps:**
- No PyPI distribution
- No CI/CD
- No tests
- No monitoring
- No error tracking
- Hardcoded configuration
- Dead code (AcousticBrainz)
- No documentation for deployment

**Recommendation:** Use for research and prototyping only. Significant work required for production deployment.