This repository is part of my master's thesis project, "Modeling Offensive Language as a Distinct Class for Hate Speech Detection" (Kim, 2025), supervised by Dr. Antske Fokkens and Dr. Hennie van der Vliet. The project explored how modeling offensive (but not hateful) language as a distinct class affects hate speech detection. Using a ternary classification scheme (Hateful, Offensive, Clean), I fine-tuned and evaluated a RoBERTa-base model in the full three-class setup and in binary variants where two classes are merged or the offensive class is removed (Hate vs. Non-hate, Non-clean vs. Clean, and Hate vs. Clean). The code used in this study includes my modifications and extensions of the code of Khurana et al. (2025).
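The binary setups can be derived from the ternary labels by a simple remapping. Below is a minimal sketch of that idea; the label strings, mode names, and merge conventions are illustrative assumptions, not the exact code from this repository:

```python
# Sketch: deriving the binary label setups from the ternary scheme.
# Label names and mode names are illustrative assumptions.
from typing import Optional

def remap(label: str, mode: str) -> Optional[str]:
    """Map a ternary label (hateful/offensive/clean) to the label
    used in a given setup. Returns None when the example is dropped
    (offensive examples in the Hate vs. Clean setup)."""
    if mode == "3class":
        return label
    if mode == "hate_vs_nonhate":        # merge offensive + clean
        return "hate" if label == "hateful" else "non-hate"
    if mode == "nonclean_vs_clean":      # merge hateful + offensive
        return "clean" if label == "clean" else "non-clean"
    if mode == "hate_vs_clean":          # drop offensive entirely
        if label == "offensive":
            return None
        return "hate" if label == "hateful" else "clean"
    raise ValueError(f"unknown mode: {mode}")

print(remap("offensive", "hate_vs_nonhate"))  # non-hate
print(remap("offensive", "hate_vs_clean"))    # None
```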
In the project, to probe model behavior beyond in-dataset performance, I revised both HateCheck (Röttger et al., 2021) and an existing extension by Khurana et al. (2025), aligning them with the ternary scheme by re-annotating them and correcting errors present in the extension. The resulting dataset, HateCheck-XR, is available in this repository under `dataset/` in CSV format.
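The CSV can be loaded with standard tooling. A minimal sketch, assuming the file has a header row; the column names (`text`, `label`) are assumptions about the schema, not guaranteed by the repository:

```python
# Sketch: loading HateCheck-XR with the stdlib csv module.
# Column names ("text", "label") are assumed for illustration.
import csv
from collections import Counter

def load_hatecheck_xr(path="dataset/hatecheck/hatecheck-xr.csv"):
    """Read the CSV into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Example: inspect the label distribution.
# rows = load_hatecheck_xr()
# print(Counter(row["label"] for row in rows))
```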
Project
├─ hs_generalization/
│ ├─ __init__.py
│ ├─ modes.py
│ ├─ train.py
│ ├─ test.py
│ └─ utils.py
├─ tools/
│ └─ run_many.py
├─ configs/
│ └─ example.json
├─ dataset/
│ ├─ davidson/
│ └─ hatecheck/hatecheck-xr.csv
├─ requirements.txt
└─ README.md
Set up the environment as follows:
# Create environment.
conda create -n hs-generalization python=3.9
conda activate hs-generalization
# Install packages.
python setup.py develop
pip install -r requirements.txt
To train, create a config file and run the following:
python -m hs_generalization.train -c configs/train/example.json
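A config might look like the sketch below; the field names and values are illustrative assumptions about the config schema, not the repository's exact keys:

```json
{
  "model_name": "roberta-base",
  "dataset": "davidson",
  "train_mode": "3class",
  "num_epochs": 5,
  "batch_size": 16,
  "learning_rate": 2e-5,
  "seed": 5,
  "output_dir": "outputs/davidson/RoBERTa-base/3class"
}
```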
To test, create a config file and run as in the example:
# Run with a single seed/checkpoint.
python -m hs_generalization.test -c configs/test/example.json --dataset davidson --eval-mode 3class --train-mode 3class --seed 5 --checkpoint "outputs/davidson/RoBERTa-base/3class/RoBERTa-base_0.pt"
# To run multiple checkpoints at once, use run_many.py, as in the following example:
python tools/run_many.py ^
-c configs/test/test.json ^
--dataset hatecheck_xr ^
--eval-mode 3class ^
--train-mode 3class ^
--seeds 7 222 550 999 3111 ^
--ckpt-pattern "outputs/davidson/RoBERTa-base/3class/*.pt" ^
--hatecheck-csv dataset/hatecheck/hatecheck-xr.csv
I am currently transforming this master's thesis project on hate speech detection into a production-ready content moderation platform. It extends the fine-tuned RoBERTa classifier with a multi-mode REST API (FastAPI), RAG-powered policy explanations using ChromaDB, containerization with Docker, experiment tracking via MLflow, and CI/CD automation with GitHub Actions. The system supports three classification modes: ternary (hateful/offensive/clean), binary hate detection, and toxicity filtering, making it adaptable to different moderation use cases. The repository will be shared here as soon as the work is complete.