CS5691 - Pattern Recognition and Machine Learning is a course offered by the Indian Institute of Technology Madras, taught by Prof. Arun Rajkumar from September to December 2020. This repository contains the projects and assignments completed as part of the course.
In the Spam Email Classifier project, a spam classifier was built from scratch. The code provides a function that, when invoked, automatically reads a set of emails from a folder named test in the current directory. Each file in this folder is a test email named 'email#.txt' ('email1.txt', 'email2.txt', etc.). For each of these emails, the classifier predicts +1 (spam) or 0 (non-spam). Two sample test emails can be found in the test folder. The report describes in detail the dataset chosen, the features extracted, and the exact algorithm and procedure used for training, including any hyperparameter tuning or kernel selection. The performance of the algorithm is measured by its accuracy on the test set.
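The folder-reading step described above can be sketched roughly as follows. This is only an illustration, not the repository's actual code: the names `predict_folder`, `email_index`, and `keyword_classifier` are hypothetical, and the keyword rule is a stand-in for the trained model described in the report.

```python
import re
from pathlib import Path
from typing import Callable, Dict


def email_index(path: Path) -> int:
    """Return the number in a name like 'email12.txt', or -1 if it does not match."""
    match = re.fullmatch(r"email(\d+)\.txt", path.name)
    return int(match.group(1)) if match else -1


def predict_folder(classify: Callable[[str], int], folder: str = "test") -> Dict[str, int]:
    """Apply `classify` to every email#.txt in `folder` and return {filename: label}."""
    files = [p for p in Path(folder).glob("email*.txt") if email_index(p) >= 0]
    predictions = {}
    # Sort numerically so that email2.txt comes before email10.txt.
    for path in sorted(files, key=email_index):
        text = path.read_text(encoding="utf-8", errors="ignore")
        predictions[path.name] = classify(text)
    return predictions


def keyword_classifier(text: str) -> int:
    """Placeholder rule (an assumption, NOT the trained model): 1 = spam, 0 = non-spam."""
    spam_words = {"lottery", "winner", "free"}
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return 1 if tokens & spam_words else 0
```

Calling `predict_folder(keyword_classifier)` from the directory that contains the test folder would return one label per email file. Passing the classifier in as an argument keeps the file handling separate from the model, so the placeholder can be swapped for the real predictor.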
P.S. If for any reason you are unable to download the dataset from the URL provided in the report, you can get it from the zip folder named Spam_Email_Train_Test_Dataset, which contains both the training and the test datasets.