Welcome to the Movie Review Sentiment Analysis Dashboard, a tool that predicts the sentiment of movie reviews using natural language processing (NLP) preprocessing and an XGBoost model. The app shows each prediction alongside SHAP explanations of the words that influenced it.
HanChen Wang
November 2024
- Interactive Sentiment Analysis: Input your own movie review text or analyze a random review.
- SHAP Explanation: Gain insights into the most influential words driving sentiment predictions.
- Preprocessing Transparency: Track changes to words during preprocessing, including slang translation and stemming.
- Random Review Generator: Explore the app with preloaded demo reviews.
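The preprocessing transparency feature can be pictured with a small sketch. This is not the app's code: the slang entries, the `naive_stem` helper, and the `preprocess_with_trace` function are hypothetical, and the suffix stripper only stands in for a real stemmer such as the ones NLTK provides.

```python
import re

# Hypothetical slang entries; the app reads its own from data/slang-dict.csv.
SLANG = {"gr8": "great", "luv": "love", "u": "you"}

# Naive suffix stripping, standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "ly", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least four characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def preprocess_with_trace(text):
    """Return (processed_tokens, changes); changes lists (before, after) pairs."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    processed, changes = [], []
    for token in tokens:
        expanded = SLANG.get(token, token)  # slang translation
        stemmed = naive_stem(expanded)      # stemming
        processed.append(stemmed)
        if stemmed != token:
            changes.append((token, stemmed))
    return processed, changes


tokens, changes = preprocess_with_trace("U will luv this gr8 movie, it was amazing")
print(tokens)
print(changes)
```

Keeping the before/after pairs next to the processed tokens is what lets a dashboard show which words were rewritten before the model saw them.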
- Frontend: Built using Dash and styled with Dash Bootstrap Components.
- Backend:
  - Machine Learning: XGBoost for predictions and SHAP for model interpretability.
  - NLP Preprocessing: NLTK and SpaCy for text cleaning and tokenization.
- Data:
  - Slang dictionary for translation.
  - Demo dataset of movie reviews from IMDb.
You can visit the webpage deployed on render.com with this link.
Ensure you have the following installed:
- Python 3.8+
- Required Python libraries: dash, dash-bootstrap-components, nltk, spacy, shap, pandas, joblib, plotly, matplotlib.
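A quick way to confirm the prerequisites is to check that each library can be located for import. The sketch below uses only the standard library; the `missing_modules` helper is a hypothetical name, and the list uses import names, which differ from pip names in one case (`dash-bootstrap-components` is imported as `dash_bootstrap_components`).

```python
from importlib.util import find_spec

# Import names for the libraries listed above.
REQUIRED_MODULES = [
    "dash", "dash_bootstrap_components", "nltk", "spacy", "shap",
    "pandas", "joblib", "plotly", "matplotlib",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed libraries are importable.")
```

This only checks that the modules can be found; it does not verify versions or downloaded NLTK and SpaCy data.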
1. Clone this repository:

       git clone https://github.com/hcwang24/sentiment_analysis.git
       cd sentiment_analysis

2. Install dependencies:

       pip install -r requirements.txt

3. Download NLTK and SpaCy data:

       import nltk
       nltk.download('punkt')
       nltk.download('stopwords')

4. Run the app:

       python app.py

5. Open the app in your browser at http://127.0.0.1:8050/.
movie-sentiment-dashboard/
├── app.py                  # Main application script
├── models/
│   ├── Vectorizer.pkl      # Pretrained vectorizer
│   └── XGBoost_model.pkl   # Sentiment prediction model
├── data/
│   └── slang-dict.csv      # Dictionary of slang terms
├── demo/
│   └── imdb_1000.csv       # Demo movie review dataset
├── assets/                 # Additional resources (CSS, images, etc.)
├── requirements.txt        # List of dependencies
└── README.md               # Project documentation
- Input a Movie Review: Paste your text into the input box or click Generate Random Review.
- Analyze Sentiment: Click the Analyze Sentiment button to get predictions.
- Explore Results:
  - View the sentiment prediction (positive/negative).
  - Examine the top contributing words using SHAP visualizations.
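Behind the Analyze Sentiment button, the flow is roughly: vectorize the review, ask the model for class probabilities, and report the larger one as the confidence. The sketch below is an assumption about that flow, not the app's actual code. `predict_sentiment` and `load_artifacts` are hypothetical helpers, the column order `[negative, positive]` is assumed, and stub objects replace the real pickled files so the example runs anywhere. The real app also preprocesses the text first, which is omitted here.

```python
def predict_sentiment(text, vectorizer, model):
    """Return (label, confidence) for one review.

    Assumes a scikit-learn-style vectorizer with transform() and a classifier
    with predict_proba() whose columns are [negative, positive].
    """
    features = vectorizer.transform([text])
    negative, positive = model.predict_proba(features)[0]
    if positive >= negative:
        return "positive", float(positive)
    return "negative", float(negative)


def load_artifacts(model_dir="models"):
    """Load the pickled vectorizer and model using the file names in the project tree."""
    import joblib  # imported lazily so the sketch runs without it installed
    from pathlib import Path

    root = Path(model_dir)
    return joblib.load(root / "Vectorizer.pkl"), joblib.load(root / "XGBoost_model.pkl")


# Stand-in objects so the sketch runs without the real model files.
class StubVectorizer:
    def transform(self, texts):
        return [[len(t.split())] for t in texts]


class StubModel:
    def predict_proba(self, features):
        return [[0.19, 0.81] for _ in features]


label, confidence = predict_sentiment(
    "This movie was absolutely amazing!", StubVectorizer(), StubModel()
)
print(f"{label} ({confidence:.0%} confidence)")
```

With the real artifacts available, the stubs would be replaced by `vectorizer, model = load_artifacts()`.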
- Review: "This movie was absolutely amazing! I loved every second of it."
- Prediction: Positive (81% Confidence).
The visualization highlights the most influential words, such as:
- Positive Contributors: "amazing", "loved".
- Negative Contributors: (None in this case).
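One way to produce such a positive/negative breakdown is to rank per-word SHAP values by magnitude and split them by sign. The sketch below assumes the values are already available as a word-to-value mapping; `split_contributors` is a hypothetical helper and the numbers are made up for illustration.

```python
def split_contributors(contributions, top_n=5):
    """Split word -> SHAP value pairs into ranked positive and negative lists."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    positive = [(word, value) for word, value in ranked if value > 0][:top_n]
    negative = [(word, value) for word, value in ranked if value < 0][:top_n]
    return positive, negative


# Made-up values for illustration only; real values come from the SHAP explainer.
example = {"amazing": 0.42, "loved": 0.31, "movie": 0.0, "second": 0.02}
positive, negative = split_contributors(example)
print("Positive contributors:", [word for word, _ in positive])
print("Negative contributors:", [word for word, _ in negative] or "(none)")
```

Words with a value of exactly zero appear in neither list, since they did not move the prediction in either direction.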
Contributions are welcome! Feel free to:
- Submit issues for bugs or feature requests.
- Fork the repository and create a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
Enjoy exploring the sentiment of movie reviews! 🚀