Sentiment Analysis on the IMDB dataset using BERT, Hugging Face and PyTorch

dchandak99/BERT-Sentiment

Sentiment Analysis by BERT:

BERT is a state-of-the-art natural language processing model from Google. Its latent representations can be repurposed for various NLP tasks, such as sentiment analysis.

I have used Hugging Face Transformers and PyTorch; the task is to predict whether an IMDB review is positive or negative.

Data:

First, prepare the IMDB data, which is publicly available. The format used here is one review per line: the first 12500 lines are positive reviews, followed by 12500 negative reviews. Positive is encoded as 0 and negative as 1.
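As a minimal sketch of reading that format (not the code from the notebook), the labels can be derived purely from line position. The function name `load_reviews` and its parameters are hypothetical, chosen here for illustration; only the layout (positives first, 0 = positive, 1 = negative) comes from the description above.

```python
# Label encoding described in the README: positive -> 0, negative -> 1.
POSITIVE_LABEL = 0
NEGATIVE_LABEL = 1


def load_reviews(path, num_positive=12500):
    """Read a one-review-per-line file and label lines by position.

    Assumes the first `num_positive` lines are positive reviews and
    every line after that is negative. `load_reviews` is a hypothetical
    helper, not part of this repository.
    """
    with open(path, encoding="utf-8") as f:
        texts = [line.rstrip("\n") for line in f]
    labels = [
        POSITIVE_LABEL if index < num_positive else NEGATIVE_LABEL
        for index in range(len(texts))
    ]
    return texts, labels
```

Calling `load_reviews` on the prepared file should return 25000 texts and 25000 labels if the file matches the format above. Because labels depend only on line order, the file must not be shuffled before loading.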

You can download data and weights (in the correct format) directly from my drive link here.

Models:

I have used 3 models:

  • BertForSequenceClassification (Hugging Face)
  • BertModel (Hugging Face)
  • Pytorch pretrained BERT (not from Hugging Face)

Results:

  • BertForSequenceClassification:
                  precision    recall  f1-score   support

             0.0       0.90      0.93      0.91     12500
             1.0       0.93      0.90      0.91     12500

        accuracy                           0.91     25000
       macro avg       0.91      0.91      0.91     25000
    weighted avg       0.91      0.91      0.91     25000

Accuracy achieved: 91%

  • After optimization experiments, BertModel does better, with an accuracy of 93%
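For readers checking the BertForSequenceClassification report, the f1-score column is the harmonic mean of the precision and recall columns. The sketch below recomputes it from the rounded values in the report, so the result is only approximate; the helper name `f1_score` is an illustrative choice and is not taken from the repository.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, copied from the report above.
# Class 0 is positive and class 1 is negative, per the data format.
report = {0: (0.90, 0.93), 1: (0.93, 0.90)}

f1_per_class = {label: f1_score(p, r) for label, (p, r) in report.items()}

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)

print({label: round(score, 2) for label, score in f1_per_class.items()})
print(round(macro_f1, 2))
```

Both classes come out at about 0.91, matching the report. Because each class has the same support (12500), the macro and weighted averages coincide.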

Optimization:

I will optimize the hyperparameters later to get as close to the state of the art (SOTA) as possible.
You can view the optimization experiments here.

Code:

The code has been uploaded as a notebook and a .py file.

Note: For the .py file, make sure transformers is installed (command: pip install transformers) and set the correct paths on lines 76 and 227.

Code with the base BertModel can be found here.

Links:

Useful comments and links to tutorials are included in the notebook to guide you through.
