Using deep learning model to train over 49,000 IMDB rating dataset to categorize either the review is positive and negative.
- The project's objective is to categorize the IMDB movies rating.
- The IMDB movie reviews contain enormous amount of data, which can be used to predict whether the movie review is a negative or positive review.
- The dataset contains anomalies such as HTML tags (removed using RegEx), lowercase/uppercase, and duplicates data.
- The method used for the deep learning model are word embedding, LSTM and Bidirectional.
- Several method can be used to improve the model such as lemmatization, stemming, CNN, n-grams, etc.
- The model achieved 84% accuracy during training.
- Both recall and f1 score report 85%.
- However, the model starts to overfit after 2nd epochs. Early stopping can be used to prevent overfitting. The dropout data can be increased to control overfitting.
Shout out to @Ankit152 for the IMDB Dataset. Check out the dataset by clicking the link below. 😄