> [!NOTE]
> Originally developed as a final project for the BSc Computer Science degree at Goldsmiths, University of London (available here). This repository continues that work, aiming to transform it into a deployable music discovery web app.
- Install required software: `Python@3.12.4`, `PostgreSQL@17.6`, `Node.js@v20.17.0`
- Create a config file in `backend/.env` with DB login information, see `.env.example`
- Create the DB and user:
```sql
-- Optional commands if DB/USER were created previously
-- REVOKE ALL ON SCHEMA public FROM django;
-- DROP DATABASE IF EXISTS taste_mender_db;
-- DROP USER IF EXISTS django;
CREATE USER django WITH PASSWORD 'password';
CREATE DATABASE taste_mender_db WITH ENCODING 'UTF8' OWNER django;
GRANT ALL PRIVILEGES ON DATABASE taste_mender_db TO django;
GRANT ALL PRIVILEGES ON SCHEMA public TO django;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO django;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO django;
-- Needed for creating a DB when running tests
ALTER USER django CREATEDB;
```
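The credentials created above are what `backend/.env` feeds into Django's `DATABASES` setting. As a rough sketch of that mapping (the `DB_*` variable names and the parser here are hypothetical illustrations, not the repo's actual code — check `.env.example` for the real keys):

```python
# Hypothetical sketch: parse simple KEY=VALUE lines from a .env file and
# build a Django-style DATABASES dict. Variable names are examples only.

def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("'\"")
    return env

def databases_from_env(env: dict) -> dict:
    """Map parsed env vars onto a Django DATABASES setting."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("DB_NAME", "taste_mender_db"),
            "USER": env.get("DB_USER", "django"),
            "PASSWORD": env.get("DB_PASSWORD", ""),
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    }

example = """
# backend/.env
DB_NAME=taste_mender_db
DB_USER=django
DB_PASSWORD=password
"""
print(databases_from_env(parse_env(example))["default"]["NAME"])  # taste_mender_db
```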
- Load data into the DB (ingest), see below
- Install Django dependencies and check that everything is running:

```shell
cd backend/
pip install -r requirements.txt
python manage.py migrate
python manage.py test
python manage.py runserver
```

- Install React dependencies:
```shell
cd frontend/
npm install
npm run dev
```

Ideally you should have access to the already-built database in SQLite format and the features NPZ file. If that isn't the case, you can replicate the DB from scratch using the instructions below. For development, the sample data should be enough.
The first step is to download the dataset dumps from AcousticBrainz; these contain the track metadata and the audio features used to determine song similarity. Link: https://acousticbrainz.org/download. I recommend using a structure like this:

```
AcousticBrainz
├── Sample
│   └── acousticbrainz-highlevel-sample-json-20220623-0.tar.zst
└── High-Level
    ├── acousticbrainz-highlevel-json-20220623-0.tar.zst
    ├── acousticbrainz-highlevel-json-20220623-1.tar.zst
    └── ...
```
You can download the datasets from a browser or by using these commands:
```shell
# Sample DB dump with 100k entries, good for development
mkdir Sample
cd Sample
wget -P . https://data.metabrainz.org/pub/musicbrainz/acousticbrainz/dumps/acousticbrainz-sample-json-20220623/acousticbrainz-highlevel-sample-json-20220623-0.tar.zst

# Full DB dump with 30M entries, good for production
mkdir High-level
cd High-level
wget -r -np -nH --cut-dirs=5 -P . https://data.metabrainz.org/pub/musicbrainz/acousticbrainz/dumps/acousticbrainz-highlevel-json-20220623/

# Check that the downloaded files aren't corrupted
sha256sum -c sha256sums
```

Then update the project's `.env` file with the paths to the dumps, e.g.:
```shell
AB_HIGHLEVEL_ROOT=D:/Datasets/AcousticBrainz/High-level
AB_SAMPLE_ROOT=D:/Datasets/AcousticBrainz/Sample
```

Finally, you can build the SQLite database and the features file (`features_and_index.npz`):
```shell
# Build the Django DB and the in-memory vector store for audio features
python manage.py build_db            # Use all available parts of the dataset, OR
python manage.py build_db --parts 2  # Use 2 parts of the dataset, OR
python manage.py build_db --sample   # Use the sample dataset with 100k entries
```

- `backend/`
  - `music_recommendation/` - the main Django project
  - `recommend_api/` - recommendation API
    - `services/recommender.py` - recommendation logic
    - `services/youtube_sources.py` - gets playable sources for tracks
    - `tests/` - unit tests
    - `api.py` - endpoint views
  - `ingest/` - scripts for building the DB
    - `management/commands/build_db.py` - dataset ingest and DB build command
    - `management/commands/recommend.py` - command for showing recommendations
- `frontend/` - standalone app that consumes the API
Track data is loaded from the DB dumps of the AcousticBrainz dataset. The build pipeline does the following:
- Stream JSON data from `.tar.zst` archives, processing the archives in parallel
- Extract relevant information from each file (title, audio features, metadata), discarding entries with missing or invalid data
- Build a hashmap (`track_index`) of duplicate tracks indexed by their MusicBrainz ID (`musicbrainz_recordingid`)
- Merge duplicates into a single entry by selecting the most common value for each field (title, audio features, metadata)
- Build the DB models: `Track` from `track_index`; `Artist`, `Album`, and M2M pairings (`AlbumArtist`, `TrackArtist`) from the track metadata
- Extract audio features to a separate file (`features_and_index.npz`), which will be loaded into memory by the Django app to allow fast searching
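The indexing and merge steps can be sketched as follows — a toy version over scalar fields, assuming each raw entry is a dict (the real logic lives in `ingest/management/commands/build_db.py`):

```python
from collections import Counter, defaultdict

def build_track_index(entries):
    """Group raw entries by their MusicBrainz recording ID."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["musicbrainz_recordingid"]].append(entry)
    return index

def merge_duplicates(duplicates):
    """Merge duplicate entries by taking the most common value per field."""
    merged = {}
    for field in duplicates[0].keys():
        values = [d[field] for d in duplicates if d.get(field) is not None]
        merged[field] = Counter(values).most_common(1)[0][0] if values else None
    return merged

entries = [
    {"musicbrainz_recordingid": "abc", "title": "Song A"},
    {"musicbrainz_recordingid": "abc", "title": "Song A"},
    {"musicbrainz_recordingid": "abc", "title": "Song A (remaster)"},
]
index = build_track_index(entries)
print(merge_duplicates(index["abc"])["title"])  # Song A
```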
Because many popular tracks are duplicated in the dataset, the final number of tracks the app works with is considerably lower than the number ingested. For the sample dataset (100k tracks), 85,732 unique entries are loaded.
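The point of extracting features to `features_and_index.npz` is that similarity queries can run against an in-memory matrix instead of the DB. In spirit, the lookup works like this pure-Python sketch (the real store uses NumPy arrays loaded from the NPZ file; names here are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query, features, top_n=3):
    """Rank track IDs by cosine similarity to the query vector."""
    scored = [(tid, cosine_similarity(query, vec)) for tid, vec in features.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_n]

# Toy in-memory "vector store" of audio features
features = {
    "track-1": [0.9, 0.1, 0.0],
    "track-2": [0.1, 0.9, 0.0],
    "track-3": [0.8, 0.2, 0.1],
}
print(most_similar([1.0, 0.0, 0.0], features, top_n=1)[0][0])  # track-1
```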
This project uses Docker to build and manage a reproducible environment that runs the same locally and in production. This removes the need for special setup that exists only on the server and isn't captured in the repo.
```shell
# run in repo root
docker build . -f ./backend/Dockerfile -t taste-mender-image
docker run --name taste-mender-web -p 8000:8000 taste-mender-image
docker stop taste-mender-web
```

> [!TIP]
> On Windows you may need to stop WSL from running distros in the background. To do this, run:
>
> ```shell
> # list running distros
> wsl -l -v
> # terminate one to free up RAM
> wsl -t {NAME}
> ```

```shell
ssh root@your-server-ip
```
```shell
cd ~
# install Nginx and Node
sudo apt update
sudo apt install -y nginx
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
nvm install v20.17.0
nvm use v20.17.0

# enable and start Nginx
sudo systemctl enable nginx
sudo systemctl start nginx

# open ports so Nginx can serve the front-end
sudo ufw allow 80/tcp   # HTTP
sudo ufw allow 443/tcp  # HTTPS (for certs later)
sudo systemctl reload ufw
sudo ufw status

# create the Nginx config
sudo touch /etc/nginx/sites-available/tastemender
code /etc/nginx/sites-available/tastemender
```

> [!NOTE]
> TODO: Enable SSL
```nginx
server {
    listen 80;
    server_name taste-mender.com www.taste-mender.com your-server-ip;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /static/ {
        proxy_pass http://localhost:8000/static/;
        # Cache static files
        expires 30d;
    }
}
```
```shell
# Enable the config
sudo ln -s /etc/nginx/sites-available/tastemender /etc/nginx/sites-enabled/
sudo nginx -t  # check config syntax
sudo systemctl reload nginx

# Setup project
git clone https://github.com/RadValentin/CM3070-FP-Music-Recommendation.git tastemender
cd tastemender
# copy `features_and_index.npz` that was built locally during ingest to backend/
# create .env file in backend/
touch backend/.env
# add production values (see .env.example)
code backend/.env

# install Python 3.13 and venv
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install -y python3.13 python3.13-venv python3-pip build-essential

# build frontend
cd ~/tastemender/frontend
npm install && npm run build

# create virtual environment
cd ~/tastemender/backend
python3.13 -m venv venv
source venv/bin/activate

# install requirements
pip install -r requirements.txt
pip install gunicorn
python manage.py collectstatic --noinput
gunicorn music_recommendation.wsgi:application \
    --bind 0.0.0.0:8000 \
    --workers 1 \
    --timeout 300 \
    --preload \
    --access-logfile - \
    --error-logfile - \
    --daemon
```

```shell
cd ~/tastemender
```
```shell
git pull
# build the frontend
cd frontend && npm install && npm run build && cd ..

# build and start container
# from repo root
docker compose -f backend/docker-compose.yml up --build -d
# run migrations
docker exec taste-mender-web python manage.py migrate
# check logs
docker logs -f taste-mender-web
docker stop taste-mender-web
```
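After the container is up and Nginx is reloaded, you can smoke-test the deployment with a plain HTTP check against the proxied app. A minimal sketch (the URL is a placeholder; the exact endpoint to probe depends on the API's URL config):

```python
from urllib.request import urlopen
from urllib.error import URLError

def check_deployment(url: str, timeout: float = 5.0) -> bool:
    """Return True if the app answers with an HTTP 2xx/3xx status at `url`."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except URLError:
        # Covers connection refused, DNS failures, and 4xx/5xx responses
        return False

# Example (host is a placeholder):
# check_deployment("http://your-server-ip/")
```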