This was the final term project for my Machine Learning - Recommender Systems course, done with a fellow student.
The problem statement is at http://heather.cs.ucdavis.edu/~matloff/189G/Hwk/TermProject.html
In this project, we had to analyse two different datasets:
- Congress voting dataset -> Contains each Congress member's party and a Yes or No vote on each bill; some votes are unknown.
- Drug review dataset -> Contains two data files, test and train, each with the drug name, an id, the condition the drug is used for, the user's rating of the drug, the written review, and a useful count giving the number of users who found that review useful.
Our team's job was to apply recommender-system methods for predicting ratings to these datasets:
- Problem A: on the voting dataset, compare the prediction methods we learnt in class, pick the best one, and use it to predict the unknown votes in the dataset.
- Problem B: on the drug dataset, predict ratings from the review text and report the accuracy of those predictions.
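The method comparison in Problem A can be sketched roughly as follows. This is not the project's code and does not use rectools: it is a Python illustration that compares two simple stand-in predictors (a per-bill majority and a per-party majority) by hiding each known vote in turn and checking whether it is predicted correctly. The rows are made up, laid out as party followed by one vote per bill, with '?' for an unknown vote, and every function name here is hypothetical.

```python
from collections import Counter

# Toy rows: party, then one vote per bill ('y', 'n', or '?' for unknown).
# These rows are made up for illustration; they are not from the real file.
ROWS = [
    "republican,n,y,n,y",
    "republican,n,y,n,?",
    "republican,n,y,y,y",
    "democrat,y,n,y,n",
    "democrat,y,?,y,n",
    "democrat,y,n,y,n",
]

def parse(rows):
    """Split each line into (party, list of votes)."""
    return [(r.split(",")[0], r.split(",")[1:]) for r in rows]

def majority(votes):
    """Most common known vote, or None if there are no known votes."""
    known = [v for v in votes if v != "?"]
    return Counter(known).most_common(1)[0][0] if known else None

def predict_bill_majority(data, member, bill):
    """Baseline: majority vote on this bill among all other members."""
    return majority([v[bill] for i, (_, v) in enumerate(data) if i != member])

def predict_party_majority(data, member, bill):
    """Majority vote on this bill among other members of the same party."""
    party = data[member][0]
    same = [v[bill] for i, (p, v) in enumerate(data) if i != member and p == party]
    return majority(same) or predict_bill_majority(data, member, bill)

def accuracy(data, predictor):
    """Leave-one-out accuracy over every known vote."""
    hits = total = 0
    for m, (_, votes) in enumerate(data):
        for b, actual in enumerate(votes):
            if actual == "?":
                continue  # nothing to check against
            total += 1
            hits += predictor(data, m, b) == actual
    return hits / total

data = parse(ROWS)
for name, fn in [("bill majority", predict_bill_majority),
                 ("party majority", predict_party_majority)]:
    print(name, round(accuracy(data, fn), 3))
```

Whichever predictor scores best on the known votes would then be used to fill in the '?' entries.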
We first wrote code to find the most effective prediction methods, using the rectools package. For Problem B, we used sentiment analysis to assign numeric scores to the reviews, then compared a couple of methods to find the most effective predictor.
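As a rough illustration of the Problem B pipeline, the Python sketch below scores each review with a tiny word lexicon, fits a straight line from sentiment score to rating on training reviews, and reports the mean absolute error on held-out reviews. Everything in it is an assumption made for illustration: the lexicon, the reviews, the ratings, the 1 to 10 rating scale, and the choice of a linear fit. The project itself used a sentiment analysis tool and the prediction methods from the course.

```python
import re

# Tiny hand-made lexicon (hypothetical); a real run would use a full one.
LEXICON = {"great": 1, "helped": 1, "better": 1, "relief": 1,
           "worse": -1, "nausea": -1, "terrible": -1, "pain": -1}

def sentiment(review):
    """Sum of lexicon weights over the words in a review."""
    words = re.findall(r"[a-z']+", review.lower())
    return sum(LEXICON.get(w, 0) for w in words)

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def predict(review, a, b, lo=1.0, hi=10.0):
    """Predicted rating, clipped to the assumed rating scale."""
    return min(hi, max(lo, a + b * sentiment(review)))

def mean_abs_error(pairs, a, b):
    """Average absolute gap between predicted and actual ratings."""
    return sum(abs(predict(r, a, b) - y) for r, y in pairs) / len(pairs)

# Made-up (review, rating) pairs standing in for the train and test files.
TRAIN = [
    ("Great relief, I feel better", 9),
    ("It helped a little", 7),
    ("No change at all", 5),
    ("Terrible nausea and the pain got worse", 1),
]
TEST = [
    ("This helped and I feel better", 8),
    ("Made the nausea worse", 2),
]

a, b = fit_line([sentiment(r) for r, _ in TRAIN], [y for _, y in TRAIN])
print("intercept", round(a, 2), "slope", round(b, 2))
print("test MAE", round(mean_abs_error(TEST, a, b), 2))
```

Fitting on the train file and scoring on the test file keeps the reported accuracy from being inflated by reviews the model has already seen.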
The dataset for Problem A is in the same directory (house-votes-84.data).
The datasets for Problem B are at: https://archive.ics.uci.edu/ml/machine-learning-databases/00462/
The report is in the 189GTermProject PDF file.