A simple machine learning model that detects emotions from text. Built primarily as a learning project to compare and explore NLP techniques and sentiment classification.
The data is the emotion dataset, loaded from the datasets package.
- Detects six emotions based on the pre-defined classes of the dataset: sadness, joy, love, anger, fear, and surprise
- Built using Python, scikit-learn, and transformers
- Jupyter Notebook for exploration and visualization
- Model Training: Logistic Regression and a Transformer (DistilBERT) are trained on labeled emotion data.
- Fine-tuning: The transformer model was fine-tuned on the emotion dataset for a single training epoch because of GPU constraints.
- Prediction: The main evaluation metric is the F1 score. The Transformer model produced higher F1 scores across all emotion labels and a higher accuracy (0.93 versus 0.89 for Logistic Regression). Both models confused the love and joy emotions most frequently.
- Visualization: Seaborn and Matplotlib are used to display the classification reports of the models. A confusion matrix maps the correct and incorrect predictions for each class.
- Python 3.x
- scikit-learn
- pandas
- numpy
- transformers
- jupyter
- torch
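
The evaluation described above (per-label F1 scores and a confusion matrix) can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the helper names `confusion_matrix` and `per_label_f1` are hypothetical, and the toy predictions are made up to show a love/joy mix-up rather than taken from this project's results.

```python
from collections import Counter

# The six classes named in the dataset description above.
LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested dict: matrix[true_label][predicted_label] -> count."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}


def per_label_f1(matrix, labels):
    """Compute the F1 score of each label from the confusion matrix."""
    scores = {}
    for label in labels:
        tp = matrix[label][label]
        # Other classes predicted as this label (false positives).
        fp = sum(matrix[other][label] for other in labels if other != label)
        # This label predicted as some other class (false negatives).
        fn = sum(matrix[label][other] for other in labels if other != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy predictions, NOT results from this project.
y_true = ["joy", "joy", "love", "love", "sadness", "anger", "fear", "surprise"]
y_pred = ["joy", "love", "joy", "love", "sadness", "anger", "fear", "surprise"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
f1 = per_label_f1(matrix, LABELS)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label in LABELS:
    print(f"{label:>8}: F1 = {f1[label]:.2f}")
print(f"accuracy: {accuracy:.2f}")
```

In this toy data, one joy example is predicted as love and one love example as joy, so only those two labels lose F1 while the rest stay at 1.0. In practice, `classification_report` and `confusion_matrix` from `sklearn.metrics` compute the same quantities directly.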