This repository provides two main scripts for detecting emotions in text using a BERT model: train.py for training the model, and detect.py for predicting emotions from new text inputs. The model classifies text into seven emotions: fear, joy, sadness, anger, surprise, neutral, and disgust.
- Emotion Classification: Classifies text into seven emotions using a pre-trained BERT model.
- Custom Dataset Handling: Loads a dataset for training and processes it into a format suitable for BERT.
- Database Integration: Saves detected emotions into a PostgreSQL database for record-keeping.
This project uses the following libraries:
- PyTorch for model implementation.
- Transformers (Hugging Face) for pre-trained BERT model and tokenizer.
- Pandas for data handling and preprocessing.
- Psycopg2 for PostgreSQL database interaction.
The train.py script trains a BERT model on an emotion classification dataset. Its main classes and methods are:
- Class CustomDataset: A PyTorch Dataset for managing text data and the corresponding emotion labels.
  - __init__(self, texts, labels, tokenizer, max_length): Initializes the dataset with text data, labels, a tokenizer, and the maximum sequence length.
  - __len__(self): Returns the number of samples in the dataset.
  - __getitem__(self, idx): Retrieves the text and label at a specific index and tokenizes the text.
- Class TextEmotionDetector: Manages the model setup, dataset loading, and training loop for BERT.
  - __init__(self, dataset_path, model_name, num_labels, max_length, batch_size, lr, num_epochs, device): Loads the dataset, sets up the tokenizer and model, and defines the training parameters.
  - train(self): Trains the model for the specified number of epochs, logging the average loss per epoch.
  - save_model(self, save_directory): Saves the trained model and tokenizer to the specified directory.
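The dataset interface described above can be sketched without any heavy dependencies. This is a minimal illustration of the map-style protocol (__len__ and __getitem__), not the code in train.py: the class here does not subclass the PyTorch Dataset, it returns plain lists rather than tensors, and toy_tokenizer is a hypothetical stand-in for the BERT tokenizer so that the snippet runs with the standard library alone.

```python
from typing import Callable, Dict, List, Sequence


def toy_tokenizer(text: str, max_length: int) -> Dict[str, List[int]]:
    """Hypothetical stand-in for a BERT tokenizer (NOT the real one).

    Maps each whitespace-separated word to a fake id >= 1, truncates to
    max_length, and pads with 0 so every sample has the same length.
    """
    ids = [1 + (sum(map(ord, word)) % 1000) for word in text.lower().split()]
    ids = ids[:max_length]                      # truncation
    mask = [1] * len(ids)                       # 1 marks a real token
    pad = max_length - len(ids)
    return {"input_ids": ids + [0] * pad, "attention_mask": mask + [0] * pad}


class CustomDataset:
    """Map-style dataset: one tokenized text plus its integer label per index."""

    def __init__(self, texts: Sequence[str], labels: Sequence[int],
                 tokenizer: Callable[[str, int], Dict[str, List[int]]],
                 max_length: int):
        if len(texts) != len(labels):
            raise ValueError("texts and labels must have the same length")
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self) -> int:
        return len(self.texts)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        # Tokenize lazily, one sample at a time, when the sample is requested.
        encoding = self.tokenizer(str(self.texts[idx]), self.max_length)
        return {
            "input_ids": encoding["input_ids"],
            "attention_mask": encoding["attention_mask"],
            "labels": self.labels[idx],
        }


# Label ids 1 and 0 are arbitrary example values.
dataset = CustomDataset(["i am so happy today", "this is scary"], [1, 0],
                        toy_tokenizer, max_length=4)
sample = dataset[0]
```

With the real Hugging Face tokenizer, the call inside __getitem__ would more likely look like tokenizer(text, truncation=True, padding="max_length", max_length=max_length, return_tensors="pt"), with the label wrapped in a tensor so that a DataLoader can batch the samples.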
The detect.py script detects emotions in new text inputs and saves the results in a PostgreSQL database. Its main classes and methods are:
- Class DBParams: A data structure for managing PostgreSQL database connection parameters.
- Class TextEmotionPostgreSQL: Loads a pre-trained model, establishes a database connection, and provides emotion detection functionality.
  - __init__(self, model_path, db_params): Initializes the model for text classification, sets up emotion label mappings, and connects to the PostgreSQL database.
  - predict_emotion(self, text): Predicts the emotion of the input text, saves the result in the database, and returns the predicted emotion label.
  - setup_emotion_table(self): Creates a table in the database to store text and detected emotions if the table does not already exist.
  - close_connection(self): Closes the database connection.
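The flow described above (connection parameters, label mapping, table setup, saving a prediction) can be sketched roughly as follows. Several things here are assumptions rather than the contents of detect.py: the DBParams field names, the order of ID2LABEL, and the table and column names are all hypothetical. To keep the snippet runnable without a PostgreSQL server, it swaps in the standard-library sqlite3 module for Psycopg2. Both follow the Python DB-API, but the SQL differs slightly: Psycopg2 uses %s placeholders instead of ?, and a PostgreSQL table would typically use SERIAL PRIMARY KEY.

```python
import sqlite3
from dataclasses import asdict, dataclass
from typing import Sequence


@dataclass
class DBParams:
    """Connection parameters (field names are assumed, not taken from detect.py)."""
    dbname: str
    user: str
    password: str
    host: str = "localhost"
    port: int = 5432


# Assumed id -> label order; the real order depends on how the labels
# were encoded during training.
ID2LABEL = ["fear", "joy", "sadness", "anger", "surprise", "neutral", "disgust"]


def logits_to_emotion(logits: Sequence[float]) -> str:
    """Pick the label whose logit is largest (argmax over the class scores)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best]


def setup_emotion_table(conn) -> None:
    """Create the results table only if it does not exist yet."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS emotions ("
        "id INTEGER PRIMARY KEY, input_text TEXT NOT NULL, emotion TEXT NOT NULL)"
    )
    conn.commit()


def save_prediction(conn, text: str, emotion: str) -> None:
    """Store one (text, emotion) pair using a parameterized query."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO emotions (input_text, emotion) VALUES (?, ?)",
        (text, emotion),
    )
    conn.commit()


params = DBParams(dbname="emotions_db", user="postgres", password="secret")
# With Psycopg2 the connection would be opened as:
#     psycopg2.connect(**asdict(params))
conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL connection

setup_emotion_table(conn)
setup_emotion_table(conn)  # safe to call twice thanks to IF NOT EXISTS

# Example logits for one input; in detect.py these would come from the model.
emotion = logits_to_emotion([0.1, 2.3, -0.5, 0.0, 0.4, 1.1, -1.2])
save_prediction(conn, "what a lovely day", emotion)

rows = conn.cursor().execute("SELECT input_text, emotion FROM emotions").fetchall()
conn.close()
```

Passing the text and label as query parameters, rather than formatting them into the SQL string, keeps user input from being interpreted as SQL.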
- Prepare a dataset in CSV format with two columns, Clean_Text and Emotion, and place it in the same directory as train.py.
- Adjust the dataset_path in train.py to point to your CSV file.
- Run the training script:

      python3 train.py

  This will train the model and save it to the specified directory (default is ./).
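Before starting a training run, it can help to confirm that the CSV has the expected shape. The check below is a minimal sketch using only the standard library (train.py itself loads the data with Pandas). The check_dataset helper is hypothetical, and the assumption that the Emotion column holds the seven lowercase label names is not confirmed by this document, so adjust EMOTIONS to match your data.

```python
import csv
import io
from collections import Counter

# Assumed spelling of the labels in the Emotion column.
EMOTIONS = {"fear", "joy", "sadness", "anger", "surprise", "neutral", "disgust"}
REQUIRED_COLUMNS = ("Clean_Text", "Emotion")


def check_dataset(csv_file) -> Counter:
    """Validate column names and labels; return the number of rows per emotion."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    counts = Counter()
    for row in reader:
        text = (row["Clean_Text"] or "").strip()
        label = (row["Emotion"] or "").strip()
        if not text:
            raise ValueError(f"empty Clean_Text near line {reader.line_num}")
        if label not in EMOTIONS:
            raise ValueError(f"unknown emotion {label!r} near line {reader.line_num}")
        counts[label] += 1
    return counts


# A tiny in-memory example with the two required columns.
example = io.StringIO(
    "Clean_Text,Emotion\n"
    "i love this,joy\n"
    "that was terrifying,fear\n"
    "fine i guess,neutral\n"
)
counts = check_dataset(example)
```

To check a file on disk, open it with open(path, newline="", encoding="utf-8") and pass the file object to check_dataset. The returned counts also give a quick view of class balance.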
- Ensure PostgreSQL is set up and the connection parameters are configured correctly in detect.py.
- Run the detection script:

      python3 detect.py

- Enter the text to analyze when prompted. The detected emotion will be printed in the console and saved in the database.
- Clone this repository:

      git clone https://github.com/amiriiw/text_emotion_detection
      cd text-emotion-detection
      cd Text-emotion-detection

- Install the required packages:

      pip3 install -r requirements.txt

- Make sure PostgreSQL is installed and running, and create a database to use with this project.
- Download the dataset via this link: Drive
This project is licensed under the MIT License - see the LICENSE file for details.