- Project Overview
- Installation
- Usage
- Data Preprocessing
- Exploratory Data Analysis (EDA)
- Model Architecture
- Training and Evaluation
- Results Visualization
- Prediction Function
- Future Improvements
- Contributing
- License
This project aims to predict mental health status based on textual statements using Natural Language Processing (NLP) techniques and a Convolutional Neural Network (CNN) model. The project includes data preprocessing, exploratory data analysis, model training, evaluation, and a prediction function for new statements.
To run this project, you need to have Python installed on your system. Clone the repository and install the required packages:
git clone https://github.com/kknani24/Sentiment-Analysis-for-Mental-Health-Using-NLP-and-Deep-Learning.git
cd Sentiment-Analysis-for-Mental-Health-Using-NLP-and-Deep-Learning
pip install -r requirements.txt
The requirements.txt file should include:
pandas
plotly
nltk
scikit-learn
textblob
numpy
wordcloud
matplotlib
tensorflow
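The main code is a Jupyter notebook, so open it in Jupyter rather than running it with the Python interpreter (Jupyter is not in the package list above and may need to be installed separately):
jupyter notebook sentiment-analysis.ipynb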
The data preprocessing steps include:
- Loading the data from 'Combined Data.csv'
- Handling missing values
- Text cleaning:
  - Lowercasing
  - Removing text in square brackets
  - Removing links and HTML tags
  - Removing punctuation and newlines
  - Removing words containing numbers
- Tokenization and stopword removal
- Data augmentation using translation
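The cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the names `clean_text` and `tokenize`, the regular expressions, and the tiny `STOPWORDS` set are all assumptions (the project lists nltk, whose English stopword corpus would be the likelier choice).

```python
import re
import string

# Tiny illustrative stopword set (an assumption); nltk's English list is far larger.
STOPWORDS = {"i", "a", "the", "and", "to", "so", "am", "is", "was", "of"}


def clean_text(text):
    """Apply the listed cleaning steps in order and return the cleaned string."""
    text = text.lower()                                    # lowercasing
    text = re.sub(r"\[.*?\]", "", text)                    # text in square brackets
    text = re.sub(r"https?://\S+|www\.\S+", "", text)      # links
    text = re.sub(r"<.*?>", "", text)                      # HTML tags
    text = re.sub(f"[{re.escape(string.punctuation)}]", "", text)  # punctuation
    text = text.replace("\n", " ")                         # newlines
    text = re.sub(r"\w*\d\w*", "", text)                   # words containing numbers
    return text


def tokenize(text):
    """Split the cleaned text on whitespace and drop stopwords."""
    return [tok for tok in clean_text(text).split() if tok not in STOPWORDS]


sample = "I feel SO tired [sigh] <br> see https://example.com\nday3 was bad!"
print(tokenize(sample))
```

Links are removed before punctuation on purpose: stripping punctuation first would break the URL pattern and leave fragments such as `httpsexamplecom` in the text.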
The EDA phase includes:
- Displaying basic dataset information
- Visualizing the distribution of mental health status
- Analyzing text length distribution
- Creating a word cloud of cleaned statements
- Visualizing the proportion of each status category
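The numbers behind the distribution, proportion, and text-length plots are simple aggregates. The sketch below computes them with the standard library on a few made-up rows; the example statements, the status labels, and the variable names are hypothetical, and the notebook presumably does the same with pandas before plotting.

```python
from collections import Counter

# Hypothetical (statement, status) rows standing in for the real dataset.
rows = [
    ("i cannot sleep at night", "Anxiety"),
    ("today was a good day", "Normal"),
    ("nothing feels worth doing", "Depression"),
    ("went for a walk", "Normal"),
]

# Count of statements per status (feeds a histogram or bar chart).
status_counts = Counter(status for _, status in rows)

# Share of each status (feeds a pie chart).
total = sum(status_counts.values())
proportions = {status: count / total for status, count in status_counts.items()}

# Character length of each statement (feeds a text-length distribution).
text_lengths = [len(statement) for statement, _ in rows]

print(status_counts, proportions, text_lengths)
```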
The CNN model architecture:
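from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout

model = Sequential([
    Embedding(input_dim=10000, output_dim=128),             # 10,000-token vocabulary, 128-dim vectors
    Conv1D(filters=128, kernel_size=5, activation='relu'),  # 128 filters over 5-token windows
    GlobalMaxPooling1D(),                                   # keep the strongest response per filter
    Dense(128, activation='relu'),
    Dropout(0.5),                                           # drop half the units during training
    Dense(len(label_map), activation='softmax')             # one probability per status label
])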
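To get a feel for the size of this network, the trainable parameter count can be worked out by hand from the layer sizes above. The sketch below does the arithmetic in plain Python; the helper names are illustrative, and `num_classes = 7` is a hypothetical value, since the real number is `len(label_map)` and depends on the dataset.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry, no bias.
    return vocab_size * embed_dim


def conv1d_params(kernel_size, in_channels, filters):
    # Each filter spans kernel_size positions across all input channels, plus one bias.
    return kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    # Full weight matrix plus one bias per unit.
    return inputs * units + units


num_classes = 7  # hypothetical; the model uses len(label_map)

layer_params = {
    "embedding": embedding_params(10000, 128),
    "conv1d": conv1d_params(5, 128, 128),
    "dense": dense_params(128, 128),  # GlobalMaxPooling1D outputs 128 values
    "output": dense_params(128, num_classes),
}
total_params = sum(layer_params.values())

print(layer_params, total_params)
```

Pooling and dropout add no parameters, so almost all of the weights sit in the embedding table.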
The model is trained using:
- Optimizer: Adam
- Loss function: Sparse Categorical Crossentropy
- Metrics: Accuracy
- Epochs: 10
- Validation split: 0.2
- Batch size: 32
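These settings determine how many gradient updates the model receives. As a rough sketch, assuming a hypothetical training set of 10,000 statements (the real size is not stated here), the split and the number of batches per epoch work out as follows. The function name is illustrative, and Keras may round the split boundary slightly differently.

```python
import math


def training_schedule(num_samples, validation_split=0.2, batch_size=32, epochs=10):
    """Return (train size, validation size, batches per epoch, total updates)."""
    num_val = int(num_samples * validation_split)   # samples held out for validation
    num_train = num_samples - num_val               # samples used for weight updates
    steps = math.ceil(num_train / batch_size)       # last batch may be smaller
    return num_train, num_val, steps, steps * epochs


# 10,000 is a made-up dataset size for illustration only.
num_train, num_val, steps, total_updates = training_schedule(10000)
print(num_train, num_val, steps, total_updates)
```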
Evaluation metrics include:
- Test Accuracy
- Classification Report
- Confusion Matrix
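For readers unfamiliar with these metrics, the sketch below computes them from scratch on a toy set of integer labels. It is only an illustration with made-up labels and illustrative function names; the project lists scikit-learn, which most likely provides the actual report and matrix.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision_recall(matrix):
    """Per-class precision (column-wise) and recall (row-wise)."""
    n = len(matrix)
    precision, recall = [], []
    for c in range(n):
        predicted_as_c = sum(matrix[r][c] for r in range(n))
        actually_c = sum(matrix[c])
        precision.append(matrix[c][c] / predicted_as_c if predicted_as_c else 0.0)
        recall.append(matrix[c][c] / actually_c if actually_c else 0.0)
    return precision, recall


# Toy labels for three classes (made up for illustration).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = accuracy(cm)
precision, recall = precision_recall(cm)
print(cm, acc, precision, recall)
```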
Results are visualized using Plotly:
- Histogram of mental health status distribution
- Text length distribution
- Confusion matrix heatmap
- Word cloud of cleaned statements
- Pie chart of status category proportions
A predict_status function is provided to make predictions on new statements:
predicted_status = predict_status(statement_to_predict, tokenizer, model, label_map, reverse_label_map)
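The internals of predict_status are not shown here, but the call signature suggests that the last step maps the softmax output back to a status name through reverse_label_map. The sketch below illustrates only that final step, under that assumption; `decode_prediction`, the label names, and the probabilities are hypothetical.

```python
def decode_prediction(probabilities, reverse_label_map):
    """Pick the most probable class index and return (status name, probability)."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return reverse_label_map[best_index], probabilities[best_index]


# Hypothetical label maps: status name -> index, and the inverse.
label_map = {"Normal": 0, "Anxiety": 1, "Depression": 2}
reverse_label_map = {index: status for status, index in label_map.items()}

# Stand-in for one row of softmax output from the model.
probabilities = [0.1, 0.7, 0.2]

status, confidence = decode_prediction(probabilities, reverse_label_map)
print(status, confidence)
```

Returning the probability alongside the label makes it possible to flag low-confidence predictions instead of reporting every output as equally certain.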
Potential areas for improvement:
- Fine-tuning hyperparameters
- Experimenting with different model architectures (e.g., LSTM, Transformer)
- Incorporating more features (e.g., sentiment analysis scores)
- Collecting more diverse data to improve model generalization
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.