This project analyzes the sentiment of IMDB movie reviews using machine learning models. The goal is to classify reviews as positive or negative and extract meaningful insights using Logistic Regression and other models.
- Dataset: IMDB Movie Reviews (50,000 reviews)
- Goal: Sentiment classification and feature analysis.
- Models: Logistic Regression, Random Forest, Naive Bayes.
Model | Accuracy |
---|---|
Logistic Regression | 89.7% |
Random Forest | 85.3% |
Naive Bayes | 85.8% |
- Provides interpretable feature importance.
- Performs well with sparse, high-dimensional text data.
- Top positive words:
great
,excellent
,amazing
,perfect
- Top negative words:
worst
,awful
,boring
,terrible
I conducted CLT experiments on review length, negation count, and exclamation marks to validate statistical properties.
Just download the ipynb file, and run it!
For questions or suggestions, please reach out to diluny2@naver.com!