Winter of Code Final Work Product

Selecting a Proper Dataset

Getting the right data to train a model that fits our requirements is important.
I looked at many datasets, including Twitter analysis data, but in the end I settled on the Amazon Fine Food Reviews dataset.
I chose this dataset because every individual review comes with a rating score from 1 to 5.
The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012.
Its large size is another reason I chose this dataset for my project.

Before training the model, the data has to be preprocessed so that unwanted data is removed.
The dataset has many columns with different values, but for my project I selected only the score, id and review text columns.
Each review has a score from 1 to 5, so I divided the reviews into 3 categories: negative (score<3), positive (score>3) and neutral (score==3).
Next I checked for duplicate values, found that there were some, and removed all of them.
Then I removed HTML tags and special characters and tokenized the reviews into word tokens.
After that I checked each word against the stop words: words that are stop words are removed, and the remaining words are joined back together.
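
The preprocessing steps above can be sketched roughly as follows. This is only a minimal illustration using the Python standard library, not the project's actual code: the function names, the tiny STOP_WORDS set and the sample rows are all hypothetical, and dropping duplicates by exact review text is an assumption about what "duplicate" means here.

```python
import html
import re

# Hypothetical, deliberately tiny stop-word list; a real project would
# likely use a fuller list (for example the one shipped with NLTK).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "i", "to", "of", "was"}


def label_sentiment(score):
    """Map a 1-5 rating to one of three sentiment classes."""
    if score < 3:
        return "negative"
    if score > 3:
        return "positive"
    return "neutral"


def clean_review(text):
    """Strip HTML tags and special characters, tokenize, drop stop words."""
    text = html.unescape(text)                # turn entities like &amp; into characters
    text = re.sub(r"<[^>]+>", " ", text)      # remove HTML tags
    text = re.sub(r"[^a-zA-Z\s]", " ", text)  # remove digits and special characters
    tokens = text.lower().split()             # simple whitespace tokenization
    kept = [tok for tok in tokens if tok not in STOP_WORDS]
    return " ".join(kept)                     # join the remaining words back together


def preprocess(rows):
    """Take (id, score, text) rows and return (id, sentiment, cleaned_text).

    Rows whose review text was already seen are skipped (assumed
    definition of a duplicate).
    """
    seen = set()
    result = []
    for review_id, score, text in rows:
        if text in seen:
            continue
        seen.add(text)
        result.append((review_id, label_sentiment(score), clean_review(text)))
    return result


# Made-up sample rows, for illustration only.
sample = [
    (1, 5, "This is <b>great</b> coffee!"),
    (2, 1, "Stale &amp; bitter, I was disappointed."),
    (3, 3, "It is okay."),
    (1, 5, "This is <b>great</b> coffee!"),  # exact duplicate, dropped
]
cleaned = preprocess(sample)
```

Here cleaned holds three rows, because the repeated review is dropped before labelling and cleaning.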

Next I split the cleaned dataset into train and test sets to work on and build a generalized model.
Then I featurized the dataset with a TF-IDF vectorizer, fitting it as tfidf_model.fit(reviews_train,sentiment_train).
Then I transformed the train reviews as reviews_train_tfidf=tfidf_model.transform(reviews_train).
Then I imported WordCloud and used it to see the top 10 words.
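
A rough sketch of the split and featurization steps is given below. It assumes that tfidf_model is scikit-learn's TfidfVectorizer and that the split is done with train_test_split; the toy reviews, the 25% test size and the random seed are made up for illustration. In scikit-learn the vectorizer ignores any labels passed to fit, so only the reviews are passed here. The last part ranks words by their summed TF-IDF weight as a plain-text stand-in for the WordCloud picture; it is not the WordCloud package itself.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy, already-cleaned reviews and labels (illustrative only).
reviews = [
    "great coffee rich flavor",
    "stale bitter disappointed taste",
    "okay snack nothing special",
    "love tea wonderful aroma",
    "terrible packaging broken jar",
    "good price fast delivery",
    "awful smell returned product",
    "average cookies fine value",
]
sentiments = [
    "positive", "negative", "neutral", "positive",
    "negative", "positive", "negative", "neutral",
]

# Hold out 25% of the reviews for testing (assumed ratio and seed).
reviews_train, reviews_test, sentiment_train, sentiment_test = train_test_split(
    reviews, sentiments, test_size=0.25, random_state=42
)

# Learn the vocabulary and IDF weights from the training reviews only.
tfidf_model = TfidfVectorizer()
tfidf_model.fit(reviews_train)

# Apply the same fitted vectorizer to both splits.
reviews_train_tfidf = tfidf_model.transform(reviews_train)
reviews_test_tfidf = tfidf_model.transform(reviews_test)

# Rank words by their total TF-IDF weight over the training reviews.
weights = np.asarray(reviews_train_tfidf.sum(axis=0)).ravel()
index_to_word = {idx: word for word, idx in tfidf_model.vocabulary_.items()}
top_words = [index_to_word[i] for i in weights.argsort()[::-1][:10]]
print(top_words)
```

Fitting on the training reviews only keeps words from the test set from leaking into the vocabulary.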

Model selection is a key step in making the project as good as possible in terms of accuracy and precision.
After applying EDA to the dataset, I tried three algorithms to train my model for better prediction.
The 3 algorithms are: Logistic Regression, Naive Bayes and Decision Tree.
From these three I had to figure out which algorithm fits my model best.
On the basis of parameters and accuracy, I chose Naive Bayes to train my model.
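
The comparison of the three algorithms could look roughly like the sketch below. It assumes scikit-learn classifiers on TF-IDF features; MultinomialNB is an assumed choice, since the text does not say which Naive Bayes variant was used, and the toy data and hyperparameters are made up. On such a tiny sample the scores say nothing about the real dataset, so the sketch only shows the mechanics of the comparison.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Toy, already-cleaned reviews (illustrative only).
train_reviews = [
    "great coffee rich flavor",
    "love tea wonderful aroma",
    "good price fast delivery",
    "stale bitter disappointed taste",
    "terrible packaging broken jar",
    "awful smell returned product",
]
train_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]
test_reviews = ["wonderful flavor great price", "terrible taste awful smell"]
test_labels = ["positive", "negative"]

# Featurize with TF-IDF, fitting on the training reviews only.
tfidf_model = TfidfVectorizer()
X_train = tfidf_model.fit_transform(train_reviews)
X_test = tfidf_model.transform(test_reviews)

# The three candidate algorithms named in the text.
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),  # assumed variant
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Train each candidate and record its accuracy on the held-out reviews.
accuracy = {}
for name, model in candidates.items():
    model.fit(X_train, train_labels)
    accuracy[name] = accuracy_score(test_labels, model.predict(X_test))

best_model = max(accuracy, key=accuracy.get)
print(accuracy, best_model)
```

Precision could be compared in the same loop with precision_score from sklearn.metrics.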

To pre-check the frontend and backend of my model, I deployed it on my local server, and it works efficiently and precisely.
I have added features for sentiment prediction, keyword extraction and display, polarity and subjectivity, and summary. These features come up when we put some text into the frontend, which shows the output after the backend has processed it.
images of frontend