This project is part of an internship (TCS iON RIO-210).
RIO-210: Automate detection of different emotions from paragraphs and predict overall emotion - (Batch 01)
A complete analysis is available here: emotion-prediction-notebook
Libraries used:
Core libraries: NumPy, pandas, Matplotlib
Machine Learning / NLP libraries: scikit-learn, NLTK
Deep Learning libraries: TensorFlow, Keras
fastText pre-trained (English) word vectors were also used as embeddings: https://fasttext.cc/docs/en/english-vectors.html
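As a rough illustration of how pre-trained word vectors can feed an embedding layer, here is a minimal sketch. It assumes the text `.vec` format (a header line with the word count and dimension, then one word and its values per line). The helper names `load_vectors` and `build_embedding_matrix`, the `word_index` mapping, and the tiny in-memory data are hypothetical and are not taken from the project notebook.

```python
import io

import numpy as np


def load_vectors(handle, vocab):
    """Read vectors in .vec text format, keeping only words found in vocab."""
    # Header line: "<number of words> <vector dimension>"
    _, dim = map(int, handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if parts[0] in vocab:
            vectors[parts[0]] = np.array([float(x) for x in parts[1:]], dtype="float32")
    return vectors, dim


def build_embedding_matrix(word_index, vectors, dim):
    """Build a (vocab_size + 1, dim) matrix; row 0 is reserved for padding."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]  # words without a vector stay all-zero
    return matrix


# Tiny in-memory stand-in for a downloaded .vec file (invented values).
fake_vec = io.StringIO("2 3\nhappy 0.1 0.2 0.3\nsad -0.1 -0.2 -0.3\n")
word_index = {"happy": 1, "sad": 2, "unknownword": 3}  # hypothetical tokenizer output

vectors, dim = load_vectors(fake_vec, word_index)
embedding_matrix = build_embedding_matrix(word_index, vectors, dim)
print(embedding_matrix.shape)
```

With a real download, the same functions would be given a file opened in text mode instead of the `StringIO` object, and the resulting matrix could be used as the initial weights of an embedding layer.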
- Data cleaning
- Data pre-processing
- Feature Selection
- Stopwords removal
- Feature Encoding
- Creating Bag-of-words(BoW) model
- Final Model creation
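The cleaning, stopword removal, and Bag-of-Words steps above can be sketched roughly as follows. This is only an illustration under assumptions: the `clean_text` helper and the sample sentences are invented, and scikit-learn's built-in English stopword list is swapped in for NLTK's to keep the example free of downloads.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer


def clean_text(text: str) -> str:
    """Lowercase, replace non-letter characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Toy sentences, invented for illustration (not from the project data).
texts = [
    "I am SO happy today!!!",
    "This is terrible... I feel sad.",
    "What a happy, happy surprise :)",
]
cleaned = [clean_text(t) for t in texts]

# stop_words="english" uses scikit-learn's list (an assumption; the project used NLTK).
vectorizer = CountVectorizer(stop_words="english")
bow = vectorizer.fit_transform(cleaned)  # sparse matrix: documents x vocabulary

# Features are ordered alphabetically, so sorted keys match the column order.
vocab = sorted(vectorizer.vocabulary_)
print(cleaned)
print(vocab)
print(bow.toarray())
```

Each row of `bow` holds the word counts for one document, which is the representation the classical classifiers consume.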
Since the target variable (the emotion label) is categorical, classification algorithms were used to build the model.
Four machine learning models were built:
- Multinomial Naive Bayes
- Random Forest Classifier
- SGD classifier
- Logistic Regression
To determine the best model to use for this classification problem, a comparison was done between all of the models.
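A comparison of this kind could look roughly like the sketch below, which trains the four classifiers on the same Bag-of-Words features and reports test accuracy for each. The toy sentences, labels, split ratio, and hyperparameters are assumptions made for illustration; they are not the project's data or settings, and accuracies on such a tiny sample say nothing about the real results.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration only.
texts = [
    "i feel so happy today", "what a wonderful joyful day",
    "this makes me smile", "i love this great news",
    "happy and excited about the trip", "such a delightful surprise",
    "i feel sad and alone", "this is a terrible loss",
    "i am crying all night", "so unhappy and miserable",
    "what a gloomy depressing day", "i miss her and feel awful",
]
labels = ["joy"] * 6 + ["sadness"] * 6

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

models = {
    "Multinomial Naive Bayes": MultinomialNB(),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "SGD classifier": SGDClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    # Each pipeline fits its own vectorizer on the training split only.
    pipeline = make_pipeline(CountVectorizer(), model)
    pipeline.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, pipeline.predict(X_test))

for name, acc in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {acc:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary from leaking test-set words into training, so the accuracies are compared on equal footing.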
The final result: the LSTM model wins in the deep learning category for this task!
- Machine Learning: The SGD classifier performed slightly better than all the other models in terms of accuracy.
- Deep Learning: The LSTM architecture provided the best results for predicting overall emotion from text, even better than the SGD classifier.
Please refer to the Data folder: Data
- Vishak Gopkumar - (https://github.com/RepoMan20)