Recommendation Model for the Spotify Million Playlist Dataset Challenge

This is the codebase for Group 10's Winter 2024 ECE 143 final project.

Dataset

Spotify Million Playlist Dataset Challenge

File Structure

.
├── data/
├── src/
│   ├── EDA.ipynb [Main Visualizations]
│   ├── analysis.py [Basic descriptive statistics]
│   ├── collaborative_stats.py
│   ├── plots.py
│   ├── pre_processing.py 
│   ├── recommend_tracks.py
│   └── utils.py [Spotify API]
├── requirements.txt 
└── README.md

Pre-processing

Create and activate a virtual environment:

`python3 -m venv venv && source venv/bin/activate`

Install the packages listed in `requirements.txt`:
```
pip install -r requirements.txt
```
Run `python3 pre_processing.py [dataset directory] [output directory for the generated DataFrame]` (the script is in `src/`).

Note: pre-processing takes around 10 minutes and peaks at roughly 6 GB of memory.
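The public MPD release ships as JSON slice files (e.g. `mpd.slice.0-999.json`), each holding a `playlists` array whose entries contain a `tracks` array. The following is a minimal sketch, assuming that layout, of flattening such slices into one row per (playlist, track) pair. It is not the project's `pre_processing.py`; the function name `load_mpd_slices` and the chosen output columns are hypothetical.

```python
import json
from pathlib import Path

import pandas as pd


def load_mpd_slices(dataset_dir: str) -> pd.DataFrame:
    """Flatten every mpd.slice.*.json file in dataset_dir into a long DataFrame.

    Produces one row per (playlist, track) pair; playlist-level fields are
    repeated on each of that playlist's rows. (Hypothetical helper.)
    """
    rows = []
    for slice_path in sorted(Path(dataset_dir).glob("mpd.slice.*.json")):
        with slice_path.open(encoding="utf-8") as f:
            data = json.load(f)
        for playlist in data["playlists"]:
            for track in playlist["tracks"]:
                rows.append(
                    {
                        "pid": playlist["pid"],
                        "playlist_name": playlist["name"],
                        "pos": track["pos"],
                        "track_uri": track["track_uri"],
                        "track_name": track["track_name"],
                        "artist_name": track["artist_name"],
                        "album_name": track["album_name"],
                        "duration_ms": track["duration_ms"],
                    }
                )
    return pd.DataFrame(rows)
```

The resulting frame could then be written to the output directory, for example with `DataFrame.to_pickle`. Building one Python dict per row is simple but memory-hungry on the full dataset, which is one reason a step like this has a high memory peak.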

Generate Visualizations

Run `jupyter notebook`
Open `src/EDA.ipynb`
Run all cells

`utils.py` Usage

Create a Spotify developer account.
Create a new app and copy its Client ID and Client Secret.
Store these credentials in a `.env` file.

Analysis

Prior to building a recommendation model, we analyzed parts of the dataset to get a better understanding of the underlying distributions.

We used two main forms of analysis to investigate the dataset:

Basic Descriptive Statistics (Tracks, Playlists, Artists, Albums, etc.)
Clustering Analysis with Audio Features

The basic descriptive statistics include the most popular tracks, artists, and albums, as well as more detailed statistics such as the distribution of playlist lengths. This analysis gives a high-level overview of the dataset and a starting point for asking more pointed questions about specific features.
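As an illustration of the statistics described above, here is a pandas sketch that counts the most frequent tracks and computes playlist lengths. It assumes a long-format DataFrame with one row per (playlist, track) pair and columns `pid`, `track_name`, and `artist_name`; these column names and the function names are assumptions, not necessarily what `pre_processing.py` produces.

```python
import pandas as pd


def top_tracks(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n (track, artist) pairs that appear in the most playlists."""
    # Drop repeats of a track inside one playlist so each playlist counts once.
    unique = df.drop_duplicates(["pid", "track_name", "artist_name"])
    return unique.groupby(["track_name", "artist_name"]).size().nlargest(n)


def playlist_lengths(df: pd.DataFrame) -> pd.Series:
    """Return the number of tracks in each playlist, indexed by pid."""
    return df.groupby("pid").size()
```

`playlist_lengths(df).describe()` summarizes the length distribution, and `playlist_lengths(df).plot.hist(bins=50)` draws it as a histogram.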

The clustering analysis, on the other hand, groups playlists and tracks by the audio features provided by the Spotify API, on the hypothesis that listeners prefer tracks with similar audio features.
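To cluster playlists rather than individual tracks, each playlist needs a single feature vector. One simple option, sketched here under assumptions, is to average each audio feature over the playlist's tracks. The `pid` column, the feature column names, and the function name `playlist_profiles` are assumptions about the pre-processed data.

```python
import pandas as pd

# Audio features named in the Recommendation section; column names are assumed.
AUDIO_FEATURES = [
    "danceability", "energy", "loudness", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]


def playlist_profiles(df: pd.DataFrame, features=AUDIO_FEATURES) -> pd.DataFrame:
    """Average each audio feature over the tracks of every playlist (one row per pid)."""
    return df.groupby("pid")[features].mean()
```

A mean discards how varied a playlist is; adding per-feature standard deviations is a straightforward extension if that spread matters.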

Viewing the Analysis (Basic Descriptive Statistics)

Note: the pre-processing step must be completed before running the analysis.

Run `python3 analysis.py [directory of pre-processed data] -N 10`.

This produces bar plots and histograms that answer the following questions:

What are the most popular tracks across all playlists?
What are the most popular artists across all playlists?
What are the most popular albums across all playlists?
What artists are the most prolific in terms of number of tracks?
What artists are the most prolific in terms of number of albums?
What albums contain the most tracks?
How do the audio characteristics of popular tracks differ from those of average (randomly sampled) tracks?
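The last question can be approached by comparing the mean audio features of the most frequent tracks with those of a random sample of the same size. A minimal sketch, assuming a track-level DataFrame with a hypothetical `count` column (number of playlists containing the track) plus the audio-feature columns; the function name is also hypothetical.

```python
import pandas as pd


def popular_vs_random(tracks: pd.DataFrame, features: list[str], n: int = 10, seed: int = 0) -> pd.DataFrame:
    """Compare mean audio features of the n most popular tracks with n random tracks."""
    popular = tracks.nlargest(n, "count")
    # Fixed seed so the random baseline is reproducible between runs.
    baseline = tracks.sample(n=n, random_state=seed)
    return pd.DataFrame({
        "popular": popular[features].mean(),
        "random": baseline[features].mean(),
    })
```

With a small `n` the random baseline is noisy; comparing against the mean over all tracks is a steadier alternative.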

Recommendation

Our recommendation model uses K-means clustering to group tracks by audio features such as danceability, energy, loudness, speechiness, acousticness, instrumentalness, liveness, valence, and tempo. Songs in the same cluster as the current track are suggested to play next.
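A minimal sketch of cluster-based recommendation with scikit-learn's `KMeans`, assuming a DataFrame with a `track_name` column and the audio-feature columns listed above. Features are standardized first because `loudness` (decibels) and `tempo` (BPM) span much larger ranges than the 0-to-1 features and would otherwise dominate the Euclidean distances K-means uses. The cluster count, seed, and function name are illustrative choices, not the project's settings.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "danceability", "energy", "loudness", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]


def recommend_from_cluster(tracks: pd.DataFrame, current_song: str, n: int = 10,
                           n_clusters: int = 20, seed: int = 0, features=FEATURES) -> pd.Series:
    """Return up to n track names from the same K-means cluster as current_song."""
    # Standardize so every feature contributes on a comparable scale.
    X = StandardScaler().fit_transform(tracks[features])
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    clustered = tracks.assign(cluster=labels)

    matches = clustered.loc[clustered["track_name"] == current_song, "cluster"]
    if matches.empty:
        raise ValueError(f"unknown song: {current_song!r}")
    # Use the cluster of the first row whose name matches the current song.
    current_cluster = matches.iloc[0]

    same = clustered[(clustered["cluster"] == current_cluster)
                     & (clustered["track_name"] != current_song)]
    return same["track_name"].head(n)
```

Refitting K-means on every call keeps the sketch short; in practice the cluster labels would be computed once and cached alongside the pre-processed data.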

For recommendations from a given playlist, we use cosine similarity to compare the current song with the other tracks in the playlist and recommend the most similar ones as the next songs to play.
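A NumPy sketch of the playlist-based variant: each track's audio-feature vector is compared with the current song's vector using cosine similarity, cos(a, b) = a·b / (‖a‖ ‖b‖), and the highest-scoring tracks are returned. The input layout (a list of names plus a feature matrix with matching rows) and the function name are assumptions. Features should be on comparable scales (e.g. standardized) beforehand, otherwise `tempo` dominates the angle.

```python
import numpy as np


def recommend_from_playlist(names: list[str], features, current_song: str, n: int = 10):
    """Rank the other tracks of a playlist by cosine similarity to current_song.

    names: track names, one per row of features.
    features: 2-D array-like (tracks x audio features), ideally standardized.
    Returns a list of (name, similarity) pairs, most similar first.
    """
    X = np.asarray(features, dtype=float)
    idx = names.index(current_song)
    # Normalize rows to unit length (guarding zero vectors); cosine is then a dot product.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # never recommend the current song itself
    order = np.argsort(-sims)[:n]
    return [(names[i], float(sims[i])) for i in order if np.isfinite(sims[i])]
```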

Recommendation of tracks

Note: the pre-processing step must be completed before generating recommendations.

Run `python3 recommend_tracks.py [current_song] --dir [directory of pre-processed data] -N 10`.

This recommends N songs from the same K-means cluster as the current song.

Run `python3 recommend_tracks.py [current_song] --dir [directory of pre-processed data] -N 10 --playlist_id <value>`.

This recommends N songs from the given playlist, ranked by cosine similarity to the current song.
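For reference, a command-line interface accepting the arguments in the usage lines could be declared with `argparse` roughly as follows. This is a sketch, not the project's parser: only `current_song`, `--dir`, `-N`, and `--playlist_id` come from the usage lines, while whether `--dir` is required, the `-N` default of 10 (mirroring the examples), the string type of `--playlist_id`, and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the recommend_tracks.py usage lines."""
    parser = argparse.ArgumentParser(description="Recommend tracks similar to the current song.")
    parser.add_argument("current_song", help="name of the song currently playing")
    parser.add_argument("--dir", required=True, help="directory of the pre-processed data")
    parser.add_argument("-N", type=int, default=10, help="number of songs to recommend")
    parser.add_argument(
        "--playlist_id",
        default=None,
        help="if given, recommend from this playlist by cosine similarity; "
             "otherwise recommend from the current song's K-means cluster",
    )
    return parser
```

A script would then branch on `args.playlist_id is None` to choose between the cluster-based and playlist-based recommenders.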

Third Party Packages

jupyter
numpy
pandas
python-dotenv
seaborn
scikit-learn
spotipy
tqdm
wordcloud
pillow
