rounakdatta/fastreco

fastreco

Fastreco is a simple command-line item recommender that uses item-to-item collaborative filtering at its core. Given a collection of interactions between users and items, it finds highly associated pairs of items ("If you liked A, you might like B").
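
As an illustration of the idea, item-to-item approaches usually start by counting how many users interacted with both items of a pair. The following is a minimal sketch of that counting step, not fastreco's actual code; the `Interaction`, `Pair` and `CoOccurrences` names are hypothetical.

```go
package main

import "fmt"

// Interaction is one (user, item) row from the training data.
// Hypothetical type, used only for this sketch.
type Interaction struct {
	UserID int
	ItemID int
}

// Pair identifies an unordered pair of items, stored with A < B.
type Pair struct {
	A, B int
}

// CoOccurrences counts, for every pair of distinct items, how many
// users interacted with both of them.
func CoOccurrences(rows []Interaction) map[Pair]int {
	// Collect the distinct items seen by each user.
	byUser := make(map[int]map[int]bool)
	for _, r := range rows {
		if byUser[r.UserID] == nil {
			byUser[r.UserID] = make(map[int]bool)
		}
		byUser[r.UserID][r.ItemID] = true
	}

	// Every pair of items shared by one user adds one to that pair's count.
	counts := make(map[Pair]int)
	for _, items := range byUser {
		for a := range items {
			for b := range items {
				if a < b {
					counts[Pair{a, b}]++
				}
			}
		}
	}
	return counts
}

func main() {
	rows := []Interaction{
		{1, 10}, {1, 20}, {1, 30},
		{2, 10}, {2, 20},
		{3, 20}, {3, 30},
	}
	counts := CoOccurrences(rows)
	fmt.Println(counts[Pair{10, 20}], counts[Pair{20, 30}], counts[Pair{10, 30}])
}
```

Pairs with high counts are candidates for "If you liked A, you might like B", although raw counts favour popular items, which is why a statistical score is normally applied on top.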

Fastreco expects a CSV file as the training interaction data (containing at least user_id and item_id columns) and produces a processed JSON file as the recommendation data. It also maintains a status file for minimal caching. Currently, the recommendation computation expects the required columns to contain integers. Abstractions that map the ids to actual values are in progress.

How much accuracy / complexity is supported?

Currently this is a very simple implementation that takes into account only interactions, not contextual similarity. It uses simple statistical measures such as log likelihood.
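
The README does not spell out the exact formula. One common choice for item-to-item scoring is Dunning's log-likelihood ratio over a 2x2 table of co-occurrence counts, so the sketch below assumes that variant; the function names are hypothetical and not taken from fastreco.

```go
package main

import (
	"fmt"
	"math"
)

// xLogX returns x*ln(x), treating 0*ln(0) as 0.
func xLogX(x float64) float64 {
	if x == 0 {
		return 0
	}
	return x * math.Log(x)
}

// entropy returns the unnormalized Shannon entropy of the given counts.
func entropy(counts ...float64) float64 {
	var sum, parts float64
	for _, c := range counts {
		sum += c
		parts += xLogX(c)
	}
	return xLogX(sum) - parts
}

// LLR computes the log-likelihood ratio for a 2x2 contingency table:
//
//	k11: users who interacted with both A and B
//	k12: users who interacted with A but not B
//	k21: users who interacted with B but not A
//	k22: users who interacted with neither
func LLR(k11, k12, k21, k22 float64) float64 {
	row := entropy(k11+k12, k21+k22)
	col := entropy(k11+k21, k12+k22)
	mat := entropy(k11, k12, k21, k22)
	if row+col < mat {
		return 0 // guard against tiny negative values from rounding
	}
	return 2 * (row + col - mat)
}

func main() {
	// Items that always appear together score high.
	fmt.Printf("%.2f\n", LLR(10, 0, 0, 10))
	// Items that co-occur only as often as chance predicts score near zero.
	fmt.Printf("%.2f\n", LLR(10, 10, 10, 10))
}
```

A higher score means the co-occurrence is less likely to be explained by chance, so ranking the candidates for an item by this score gives its top associated items.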

Why call it fast?

It is fast because it performs significantly faster than a pandas-based approach in Python, thanks to qframe's DataFrame processing and the concurrency and parallelism introduced in this implementation. Equivalent implementations exist in Python, Rust and several other languages, and we intend to publish a detailed performance benchmark.
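
To give a feel for the kind of concurrency Go makes cheap, here is a generic worker-pool sketch that scores many items in parallel. It is an assumption about the general pattern, not a description of how fastreco is structured; `ScoreAll` and the stand-in scoring function are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// ScoreAll applies score to every item id using a fixed pool of workers
// and returns the results keyed by item id.
func ScoreAll(itemIDs []int, score func(int) float64) map[int]float64 {
	jobs := make(chan int)
	results := make(map[int]float64, len(itemIDs))
	var mu sync.Mutex // protects results
	var wg sync.WaitGroup

	// Start one worker per available CPU.
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				s := score(id)
				mu.Lock()
				results[id] = s
				mu.Unlock()
			}
		}()
	}

	// Feed the work, then wait for every worker to drain the channel.
	for _, id := range itemIDs {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Stand-in scoring function; a real one would compute item similarity.
	square := func(id int) float64 { return float64(id * id) }
	fmt.Println(ScoreAll([]int{1, 2, 3, 4}, square))
}
```

Because each item's score is independent of the others, this kind of work splits cleanly across cores, whereas a single-threaded pandas pipeline processes it sequentially.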

Example Usage

First, build the fastreco binary:

go build

Next, let's say we want to experiment with the Goodreads books dataset:

# grabbing the training data
curl -O https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/ratings.csv

# user_id (int) gives the user identifications
# book_id (int) gives the item identifications
# rating (int) gives a metric of whether the item is liked (>= 5) or not
head -n 3 ratings.csv
# user_id,book_id,rating
# 1,258,5
# 2,4081,4

# computing the top recommendations for item id 1212
./fastreco --input-file "ratings.csv" \
	--user-column "user_id" \
	--item-column "book_id" \
	--liked-column "rating" \
	--liked-threshold 5 \
	--item-id 1212
# [2 24 23 19 37 6 1 5 7 20]

Force re-computation

By default, fastreco caches the recommendation results on a per-user-id basis. Passing the --force flag triggers a fresh computation for that particular user.

Computing recommendations for all users

Although it is computationally costly, we can omit the --item-id flag to compute recommendations for every unique user.
