It uses Dask as its distributed computing framework and Streamlit for its web application. Inspired by the work at https://github.com/entbappy/ML-Based-Book-Recommender-System
- Collaborative filtering systems rely on user-item interactions.
- Users with similar ratings form clusters, which drive the recommendation process.
- When recommending books, the system uses this cluster-based mechanism.
- The system takes a single kind of input, either ratings or comments.
- In essence, collaborative filtering assumes that if one user likes item A and another user likes both item A and another item, B, then the first user may also be interested in item B.
- Challenges include:
  - The computational expense of managing a large n × m user-item matrix (n users by m items).
  - A bias toward recommending only popular items.
  - Possible neglect of new items, which have few or no ratings yet.
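
The assumption described above (a user who shares a liked item with another user may also like that user's other items) can be sketched in a few lines of plain Python. This is only an illustration: the user names, the `likes` data, and the `suggest` function are hypothetical and are not taken from the project.

```python
# Hypothetical toy data: which items each user liked.
likes = {
    "alice": {"A"},
    "bob": {"A", "B"},
    "carol": {"C"},
}


def suggest(user, likes):
    """Suggest items liked by users who share at least one liked item with `user`."""
    own = likes[user]
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        overlap = len(own & items)  # number of liked items the two users share
        if overlap == 0:
            continue  # no shared taste, so this user contributes nothing
        for item in items - own:  # only items the target user has not liked yet
            scores[item] = scores.get(item, 0) + overlap
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(suggest("alice", likes))  # ['B']
```

Here `alice` and `bob` both like item A, so item B is suggested to `alice`. `carol` shares nothing with anyone and gets no suggestions, which is a small-scale view of the new-item and new-user problem listed above.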

We used a Kaggle dataset that contains book names, user IDs, and the ratings those users gave.
Link to data: https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset?resource=download&select=Ratings.csv

To train the model, we used the k-nearest neighbors (KNN) algorithm to group books by their ratings and to find suitable books to recommend.
- Load the dataset.
- Set the value of k.
- To obtain a prediction for a test point, iterate over all training data points and compute the Euclidean distance between the test point and each training row. Euclidean distance is used because it is a widely used distance metric.
- Sort the computed distances in ascending order.
- Take the top k rows from the sorted array.
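
The steps above can be sketched with NumPy as follows. This is a minimal illustration and not the project's actual training code: the `ratings` matrix is made-up toy data (one row per book, one column per user), and `k_nearest` is a hypothetical helper name.

```python
import numpy as np


def k_nearest(train, query, k):
    """Return the indices of the k training rows closest to `query`."""
    # Euclidean distance between the query and every training row.
    distances = np.sqrt(((train - query) ** 2).sum(axis=1))
    # Sort the distances in ascending order (stable, so ties keep row order).
    order = np.argsort(distances, kind="stable")
    # Keep the top k rows of the sorted result.
    return order[:k]


# Hypothetical toy data: 5 books rated by 3 users (0 means "not rated").
ratings = np.array(
    [
        [5, 3, 0],
        [4, 3, 0],
        [0, 0, 5],
        [5, 4, 1],
        [1, 0, 4],
    ],
    dtype=float,
)

k = 3
neighbors = k_nearest(ratings, ratings[0], k)
print(neighbors)  # [0 1 3]
```

The query row is itself part of the training data, so it comes back first with distance 0. A recommender would typically skip that first entry and present the remaining neighbors.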

Dask is a parallel computing library that scales computations across multiple cores or machines and can handle datasets larger than memory. The recommendation step uses it as follows:
- Convert the Pandas DataFrame to a Dask DataFrame.
- Find the index of the target book in a distributed manner.
- Compute the distances and suggestions in a distributed manner.
- Schedule the computation and gather the results.
- Append the recommended books to the result list.

git clone https://github.com/D3struf/Distributed-Collaborative-Filtering-Book-Recommendation-System.git
cd Distributed-Collaborative-Filtering-Book-Recommendation-System
conda create -n books python=3.7.10 -y
conda activate books
pip install -r requirements.txt
streamlit run app.py