RadValentin/taste-mender

TasteMender: A stateless music recommendation API

Note

Originally developed as a final project for the BSc Computer Science degree at Goldsmiths, University of London (available here). This repository continues that work, aiming to transform it into a deployable music discovery web app.

Installation

  1. Install required software: Python@3.12.4, PostgreSQL@17.6, Node.js@v20.17.0

  2. Create a config file in backend/.env with DB login information, see .env.example

  3. Create the DB and user

--Optional commands if DB/USER were created previously
--REVOKE ALL ON SCHEMA public FROM django;
--DROP DATABASE IF EXISTS taste_mender_db;
--DROP USER IF EXISTS django;
CREATE USER django WITH PASSWORD 'password';
CREATE DATABASE taste_mender_db WITH ENCODING 'UTF8' OWNER django;
GRANT ALL PRIVILEGES ON DATABASE taste_mender_db TO django;
GRANT ALL PRIVILEGES ON SCHEMA public TO django;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO django;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO django;

-- Needed for creating a DB when running tests
ALTER USER django CREATEDB;

  4. Load data into the DB (ingest), see below

  5. Install Django dependencies, check that everything is running:

cd backend/
pip install -r requirements.txt
python manage.py migrate
python manage.py test
python manage.py runserver

  6. Install React dependencies:

cd frontend/
npm install
npm run dev
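
For step 2, here is a minimal sketch of how the values in backend/.env might map onto Django's DATABASES setting. The variable names (DB_NAME, DB_USER, DB_PASSWORD, DB_HOST, DB_PORT) and the helper build_database_settings are assumptions made for illustration; .env.example lists the names the project actually expects.

```python
import os


def build_database_settings(env=os.environ):
    """Return a Django-style DATABASES dict built from environment variables.

    The variable names below are hypothetical; .env.example is authoritative.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("DB_NAME", "taste_mender_db"),
            "USER": env.get("DB_USER", "django"),
            "PASSWORD": env.get("DB_PASSWORD", ""),
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    }
```

The defaults mirror the database and user created in step 3, so only the password has to be supplied for a local setup.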

Building the database from scratch

Ideally, you should have access to the already-built database in SQLite format and the features NPZ file. If you don't, you can rebuild the DB from scratch using the instructions below. For development, the sample data should be enough.

The first step is to download the dataset dumps from AcousticBrainz (https://acousticbrainz.org/download). They contain the track metadata and the audio features used to determine song similarity. I recommend a directory structure like this:

  • AcousticBrainz
    • Sample
      • acousticbrainz-highlevel-sample-json-20220623-0.tar.zst
    • High-level
      • acousticbrainz-highlevel-json-20220623-0.tar.zst
      • acousticbrainz-highlevel-json-20220623-1.tar.zst
      • ...

You can download the datasets from a browser or with these commands:

# Sample DB dump with 100k entries, good for development
mkdir Sample
cd Sample
wget -P . https://data.metabrainz.org/pub/musicbrainz/acousticbrainz/dumps/acousticbrainz-sample-json-20220623/acousticbrainz-highlevel-sample-json-20220623-0.tar.zst

# Full DB dump with 30M entries, good for production
# Run from the AcousticBrainz/ directory (cd .. first if you are still in Sample/)
mkdir High-level
cd High-level
wget -r -np -nH --cut-dirs=5 -P . https://data.metabrainz.org/pub/musicbrainz/acousticbrainz/dumps/acousticbrainz-highlevel-json-20220623/

# Check that the downloaded files aren't corrupted
sha256sum -c sha256sums
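
If sha256sum isn't available (for example on Windows without WSL), the same check can be approximated in Python. This is a sketch that assumes the usual sha256sum file layout of one `<hex digest>  <file name>` pair per line; the function names are illustrative and not part of the project.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives are never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(sums_file):
    """Return the names of files whose hash differs from the one listed.

    File names are resolved relative to the directory containing sums_file.
    """
    sums_file = Path(sums_file)
    failed = []
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # a leading '*' marks binary mode in sha256sum output
        if sha256_of(sums_file.parent / name) != expected.lower():
            failed.append(name)
    return failed
```

Calling `verify_checksums("sha256sums")` from inside the download directory returns an empty list when every archive matches its listed digest.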

Then update the project's .env file with the paths to the dumps, for example:

AB_HIGHLEVEL_ROOT=D:/Datasets/AcousticBrainz/High-level
AB_SAMPLE_ROOT=D:/Datasets/AcousticBrainz/Sample
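
To confirm that these paths point at the right place before starting a long build, the archives can be listed from Python. The helper below is a hypothetical sketch, not the project's own code: it assumes the dump files keep their original names, which end in `-<part number>.tar.zst`, and orders them by that number.

```python
import re
from pathlib import Path

# Matches the trailing part number in names like "...-json-20220623-12.tar.zst"
_PART_RE = re.compile(r"-(\d+)\.tar\.zst$")


def part_number(path):
    """Return the numeric part suffix of an archive, or -1 if it has none."""
    match = _PART_RE.search(Path(path).name)
    return int(match.group(1)) if match else -1


def find_archive_parts(root, limit=None):
    """Return the .tar.zst archives directly under root, ordered by part number.

    limit keeps only the first N parts, similar in spirit to `build_db --parts N`.
    """
    parts = sorted(
        Path(root).glob("*.tar.zst"),
        key=lambda p: (part_number(p), p.name),
    )
    return parts if limit is None else parts[:limit]
```

Sorting on the numeric suffix rather than the raw file name keeps part 10 after part 2. For example, `find_archive_parts(os.environ["AB_HIGHLEVEL_ROOT"], limit=2)` (after `import os`) would return the first two parts.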

Finally, build the SQLite database and the features file (features_and_index.npz):

# Build the Django DB and the in-memory vector store for audio features
python manage.py build_db # Use all available parts of dataset OR
python manage.py build_db --parts 2 # Use 2 parts of dataset OR
python manage.py build_db --sample # Use the sample dataset with 100k entries

Repo Structure

  • backend/
    • music_recommendation/ - the main Django project
    • recommend_api/ - recommendation API
      • services/
        • recommender.py - recommendation logic
        • youtube_sources.py - gets playable sources for tracks
      • tests/ - unit tests
      • api.py - endpoint views
    • ingest/ - scripts for building the DB
      • management/commands/
        • build_db.py - dataset ingest and DB build command
        • recommend.py - command for showing recommendations
  • frontend/ - standalone app that consumes the API

How It Works

Dataset Ingest

Track data is loaded from the DB dumps of the AcousticBrainz dataset. The build pipeline does the following:

  1. Stream JSON data from .tar.zst archives, processing the archives in parallel
  2. Extract relevant information from each file (title, audio features, metadata), discarding those that have missing or invalid data
  3. Build a hashmap (track_index) of duplicate tracks indexed by their MusicBrainz ID (musicbrainz_recordingid)
  4. Merge duplicates into a single entry by selecting the most common value for each field (title, audio features, metadata)
  5. Build the DB models:
    • Track from track_index
    • Artist, Album and M2M pairings (AlbumArtist, TrackArtist) from the track metadata
  6. Extract audio features to a separate file (features_and_index.npz), this will be loaded into memory by the Django app to allow for fast searching
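
Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the code in build_db.py: records are plain dicts with hashable values, merge_duplicates is a hypothetical name, and ties between equally common values go to the value seen first.

```python
from collections import Counter, defaultdict


def merge_duplicates(records, key="musicbrainz_recordingid"):
    """Group records by recording ID, then keep the most common value per field."""
    # Step 3: index every submission under its MusicBrainz recording ID
    track_index = defaultdict(list)
    for record in records:
        track_index[record[key]].append(record)

    # Step 4: collapse each group into one entry, field by field
    merged = {}
    for recording_id, duplicates in track_index.items():
        fields = {name for record in duplicates for name in record}
        merged[recording_id] = {
            # most_common(1) returns [(value, count)]; ties keep first-seen order
            name: Counter(
                record[name] for record in duplicates if name in record
            ).most_common(1)[0][0]
            for name in fields
        }
    return merged
```

Voting per field rather than picking one whole record means a single submission with a misspelled title cannot override the title that most submissions agree on.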

Because many popular tracks are duplicated in the dataset, the final number of tracks that the app will be working with is considerably lower than what was ingested.

$$finalSize = datasetSize - duplicateCount - tracksMissingData - tracksMissingArtist$$

For the sample dataset (100k tracks), 85732 unique entries will be loaded: $$85732 = 100000 - 11182 - 4 - 3082$$
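
The same bookkeeping as a small Python check, using only the sample figures quoted above. The function name is illustrative.

```python
def final_size(dataset_size, duplicate_count, tracks_missing_data, tracks_missing_artist):
    """Number of unique tracks left after the ingest filters are applied."""
    return dataset_size - duplicate_count - tracks_missing_data - tracks_missing_artist


# Sample dataset: 100k ingested, 11182 duplicates, 4 missing data, 3082 missing artist
print(final_size(100_000, 11_182, 4, 3_082))  # 85732
```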

Deploy and Docker

This project uses Docker to build and manage a reproducible environment that runs the same way locally and in production. This removes the need for any special setup that exists only on the server and isn't included in the repo.

# run in repo root
# build the image
docker build . -f ./backend/Dockerfile -t taste-mender-image
# start the container, serving on port 8000
docker run --name taste-mender-web -p 8000:8000 taste-mender-image
# stop the container when done
docker stop taste-mender-web

Tip

On Windows, you may need to stop WSL from running distros in the background. To do this, run:

# list running distros
wsl -l -v
# terminate one to free up RAM
wsl -t {NAME}

Setting up the droplet

ssh root@your-server-ip
cd ~

# install Nginx and Node
sudo apt update
sudo apt install -y nginx
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
# load nvm into the current shell (or open a new shell instead)
source ~/.nvm/nvm.sh
nvm install v20.17.0
nvm use v20.17.0

# enable and start
sudo systemctl enable nginx
sudo systemctl start nginx

# open ports so nginx can serve front-end
sudo ufw allow 80/tcp    # HTTP
sudo ufw allow 443/tcp   # HTTPS (for certs later)
sudo ufw reload
sudo ufw status

# create Nginx config
sudo touch /etc/nginx/sites-available/tastemender
code /etc/nginx/sites-available/tastemender

Note

TODO: Enable SSL

server {
    listen 80;
    server_name taste-mender.com www.taste-mender.com your-server-ip;

    location / {
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /static/ {
        proxy_pass http://localhost:8000/static/;
        # Cache static files
        expires 30d;
    }
}

# Enable the config
sudo ln -s /etc/nginx/sites-available/tastemender /etc/nginx/sites-enabled/
sudo nginx -t  # check config syntax
sudo systemctl reload nginx

# Setup project
git clone https://github.com/RadValentin/CM3070-FP-Music-Recommendation.git tastemender
cd tastemender

# copy `features_and_index.npz` that was built locally during ingest to backend/

# create .env file in backend/
touch backend/.env
# add production values (see .env.example)
code backend/.env

Run the project directly on droplet

# install Python 3.13 and venv
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install -y python3.13 python3.13-venv python3-pip build-essential

# build frontend
cd ~/tastemender/frontend
npm install && npm run build

# create virtual environment
cd ~/tastemender/backend
python3.13 -m venv venv
source venv/bin/activate

# install requirements
pip install -r requirements.txt
pip install gunicorn

python manage.py collectstatic --noinput

gunicorn music_recommendation.wsgi:application \
  --bind 0.0.0.0:8000 \
  --workers 1 \
  --timeout 300 \
  --preload \
  --access-logfile - \
  --error-logfile - \
  --daemon

Deploy and Run

cd ~/tastemender
git pull

# build the frontend
cd frontend && npm install && npm run build && cd ..

# build and start container
# from repo root
docker compose -f backend/docker-compose.yml up --build -d

# run migrations
docker exec taste-mender-web python manage.py migrate

# check logs
docker logs -f taste-mender-web

docker stop taste-mender-web
