This project classifies toxic comments using machine learning techniques. It provides a solution for identifying and filtering out toxic comments, built on data from the Kaggle Toxic Comment Classification Challenge.
- Preprocessing of text data to remove noise and irrelevant information (a rough sketch of such a step follows this list).
- Training and evaluation of machine learning models for toxic comment classification.
- Integration with a user interface for easy interaction and input of comments.
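The repository's actual preprocessing code is not reproduced here. As a minimal sketch of the kind of cleaning such a pipeline typically performs (the function name `clean_text` and the specific rules below are assumptions, not the project's implementation):

```python
# Hypothetical example: the project's real preprocessing may differ.
import re

def clean_text(comment: str) -> str:
    """Normalize a raw comment: lowercase, strip URLs and markup, collapse whitespace."""
    text = comment.lower()
    text = re.sub(r"https?://\S+", " ", text)    # drop URLs
    text = re.sub(r"<[^>]+>", " ", text)         # drop HTML tags
    text = re.sub(r"[^a-z0-9\s'!?]", " ", text)  # keep letters, digits, and a few marks
    return re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace

print(clean_text("Check this out: <b>http://example.com</b> ... SO annoying!!!"))
```

Note that punctuation such as `!` and `?` is kept in this sketch, since it can carry toxicity signal; the exact rules depend on the models being trained.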
- Clone the repository:
  `git clone https://github.com/your-username/Toxic-comment-classification.git`
  `cd Toxic-comment-classification`
- Install the required dependencies:
  `pip install -r requirements.txt`
- If you have a capable GPU, using a `conda` environment is advised:
  `conda env create -f environment.yaml`
  or, if you prefer `pip`:
  `pip install -r requirements2.txt`
- Navigate to the project directory:
  `cd Toxic-comment-classification`
- Inference:
  `python demo/Demo_GUI.py`
- To try the demo, download the models folder from this link and put it in the root folder. The `model_checkpoint` folder should be directly under the `Toxic-comment-classification` folder.
- To train the models, download the `.vec` and `.txt` embedding files from the internet and put them in the folder like this:
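Folder layout aside, these `.vec`/`.txt` files (e.g. fastText or GloVe exports) are plain-text word vectors: one token per line followed by its components. A minimal, hypothetical reader is sketched below; the function name, the header-skipping heuristic, and the example path are assumptions, not the repository's training code.

```python
# Hypothetical sketch of reading a .vec/.txt embedding file; not the project's loader.
import numpy as np

def load_embeddings(path: str) -> dict:
    """Read word vectors from a file whose lines look like: token v1 v2 ... vN."""
    embeddings = {}
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 10:  # skip the optional "<vocab_size> <dim>" header line in .vec files
                continue
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# Example with a placeholder path:
# vectors = load_embeddings("embeddings/crawl-300d-2M.vec")
# print(len(vectors))
```

The loaded vectors are then typically assembled into an embedding matrix aligned with the tokenizer's vocabulary before training.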
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.