This project demonstrates the development of a multilingual sentiment classifier for customer reviews using a fine-tuned BERT model. The classifier predicts one of four sentiment categories: very negative, negative, positive, and very positive.
The goal was to create a robust and scalable NLP pipeline for sentiment analysis, with the following components:
- Data Preprocessing: Cleaned and filtered reviews, removed neutral samples, and standardized label encoding.
- Model Training: Fine-tuned a pre-trained `bert-base-uncased` model with Hugging Face Transformers.
- Evaluation: Measured performance using accuracy and macro F1-score over multiple epochs.
- Deployment: Exposed the trained model as an API using FastAPI.
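The preprocessing step can be sketched as follows. The 5-star rating scale, the decision to treat 3-star reviews as neutral, and the helper names are all assumptions for illustration, since the original labeling scheme is not shown:

```python
# Hypothetical preprocessing: map 5-star ratings to four sentiment labels,
# drop neutral (3-star) reviews, and integer-encode the labels.
LABEL_MAP = {1: "very negative", 2: "negative", 4: "positive", 5: "very positive"}
LABEL_IDS = {"very negative": 0, "negative": 1, "positive": 2, "very positive": 3}

def encode_reviews(reviews):
    """Keep non-neutral (text, stars) pairs and attach an encoded label."""
    out = []
    for text, stars in reviews:
        if stars not in LABEL_MAP:  # drops neutral and out-of-range ratings
            continue
        out.append({"text": text.strip(), "label": LABEL_IDS[LABEL_MAP[stars]]})
    return out
```

The mapping to contiguous integer ids (0-3) is what the model's classification head is trained against.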
| Epoch | Train Loss | Val Loss | Accuracy | Macro F1 |
|---|---|---|---|---|
| 1 | 0.5536 | 0.9056 | 0.6466 | 0.6486 |
| 4 | 0.0423 | 1.6631 | 0.6241 | 0.6245 |
| 8 | 0.0015 | 2.1568 | 0.6391 | 0.6409 |
Note that the training loss approaches zero while the validation loss more than doubles between epochs 1 and 8, a classic sign of overfitting; by validation loss, the epoch 1 checkpoint is the strongest.
- Accuracy: 65%
- Macro F1-score: 65%
- Performance by class:
  - Very Negative: F1 = 0.70
  - Negative: F1 = 0.61
  - Positive: F1 = 0.62
  - Very Positive: F1 = 0.67
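Macro F1, the headline metric above, is the unweighted mean of the per-class F1 scores, so each of the four classes counts equally regardless of how many reviews it has. A minimal reference implementation:

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro average)."""
    f1s = []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

In practice this matches `sklearn.metrics.f1_score(..., average="macro")`, which is the usual way to compute it in an evaluation loop.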
```bash
curl -X POST http://127.0.0.1:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "The product was terrible and arrived broken."}'
```

Example response:

```json
{ "sentiment": "very negative", "confidence": 0.92 }
```
The trained model is served via FastAPI. To run the API:
```bash
uvicorn main:app --reload
```
# Notes

- Random seed (42) was set to ensure reproducibility.
- Trained and evaluated using Apple Silicon (MPS acceleration).
- Tokenizer and model saved for reuse and inference.