Causality Guided Representation Learning for Cross-Style Hate Speech Detection

CADET: Causal-Aware Disentanglement for Explicit/Implicit Transfer

CADET is a framework for hate speech detection research, with a focus on cross-style generalization between explicit and implicit hate speech.

Warning

This project and associated datasets involve hate speech and other offensive content. Handle with care and follow your organization's ethics and safety policies.

🏆 Accepted at The Web Conference 2026 (WWW'26)!

Quick Start

# Clone and setup
git clone https://github.com/Shu-Wan/cadet.git
cd cadet
uv sync --all-groups

# Run CADET experiment
uv run python scripts/run_experiments.py

Installation

Requirements: Python 3.12+, uv, CUDA-compatible GPU (recommended)

# Clone and setup
git clone https://github.com/Shu-Wan/cadet.git
cd cadet
uv sync --all-groups

# Set up Hugging Face authentication (for LLM Guard models)
export HF_TOKEN="your_hf_token_here"
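
Before launching the LLM Guard experiments, it can help to confirm that the token is actually visible to Python. The following is a minimal sketch, not part of the repository: the helper name `require_hf_token` and the `PLACEHOLDER` constant are hypothetical, and the only thing taken from this README is the `HF_TOKEN` variable name and its placeholder value.

```python
import os
from collections.abc import Mapping

# Placeholder string from the export line above; treated as "not configured".
PLACEHOLDER = "your_hf_token_here"


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the Hugging Face token from `env`, or raise if it is missing.

    Hypothetical helper for illustration; CADET itself may read the token
    differently (for example via a .env file).
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError(
            "HF_TOKEN is not set; export it before running LLM Guard models."
        )
    return token


# Usage (raises RuntimeError when the variable is absent):
#   token = require_hf_token()
```

Passing the environment as an argument keeps the check easy to exercise without touching the real shell environment.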

Data

IE-HSC (Implicit-Explicit Hate Speech Corpus) is hosted on Hugging Face: Shuwan/cadet-datasets
Data cards, schema, warnings, and access notes: docs/DATASETS.md
Small samples for testing/debugging: data/samples/

Usage

Running Experiments

All experiments run through a single entry point, scripts/run_experiments.py:

# Single experiments (select config)
uv run python scripts/run_experiments.py -cn=cadet
uv run python scripts/run_experiments.py -cn=simple_baselines
uv run python scripts/run_experiments.py -cn=llm_guard

# Override parameters
uv run python scripts/run_experiments.py -cn=cadet \
    data.dataset_name=DynaHate \
    data.source_style=implicit

# Multirun mode (predefined configs)
uv run python scripts/run_experiments.py -cn=cadet_multirun
uv run python scripts/run_experiments.py -cn=simple_baselines_multirun
uv run python scripts/run_experiments.py -cn=llm_guard_multirun

# Multirun from command line
uv run python scripts/run_experiments.py -cn=cadet -m \
    data.dataset_name=AbuseEval,DynaHate \
    data.source_style=explicit,implicit

# Ad-hoc testing with limited samples
uv run python scripts/run_experiments.py -cn=cadet +adhoc.n_samples=100

For detailed configuration options, see configs/README.md.
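
The command-line multirun shown earlier passes comma-separated values for two keys. The `-cn`, `-m`, and `+key=value` flags match Hydra's command-line syntax, although this README does not name the configuration library, so treat that as an assumption. Under that assumption, a sweep expands to the Cartesian product of the listed values, which gives four runs for the example command. The sketch below reproduces that expansion in plain Python; `expand_sweep` is a hypothetical helper, not a function from this repository, and the real runner may schedule the jobs in a different order.

```python
from itertools import product


def expand_sweep(overrides: dict[str, str]) -> list[dict[str, str]]:
    """Expand comma-separated override values into one override dict per run."""
    keys = list(overrides)
    # Each key contributes a list of candidate values.
    choices = [overrides[key].split(",") for key in keys]
    # One run per combination of candidate values.
    return [dict(zip(keys, combo)) for combo in product(*choices)]


# Overrides copied from the "Multirun from command line" example.
sweep = {
    "data.dataset_name": "AbuseEval,DynaHate",
    "data.source_style": "explicit,implicit",
}

runs = expand_sweep(sweep)
for index, run in enumerate(runs):
    print(index, " ".join(f"{key}={value}" for key, value in run.items()))
```

Counting the combinations this way before launching a sweep gives a quick estimate of how many training jobs a multirun command will start.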

Documentation

Document	Description
configs/README.md	Configuration files and options
scripts/README.md	Running experiments and utility scripts
CADET.md	CADET model architecture and causal analysis
DATASETS.md	Dataset structure, samples, and safety notes
BASELINES.md	Baseline model implementations
EVALUATIONS.md	Evaluation metrics and analysis
DESIGN.md	System architecture and design principles
TRAINING.md	Training recipes and methodologies

Development

# Run linting
ruff check src/ scripts/

# Run pre-commit hooks
pre-commit run --all-files

Usage of GenAI

This project was developed with the help of AI coding assistants such as GitHub Copilot and ChatGPT.

Citation

If you use CADET in your research, please cite:

@article{zhao2025causality,
  title={Causality Guided Representation Learning for Cross-Style Hate Speech Detection},
  author={Zhao, Chengshuai and Wan, Shu and Sheth, Paras and Patwa, Karan and Candan, K Sel{\c{c}}uk and Liu, Huan},
  journal={arXiv preprint arXiv:2510.07707},
  year={2025}
}

@dataset{cadet_dataset_2026,
  title={cadet-datasets},
  url={https://huggingface.co/datasets/Shuwan/cadet-datasets},
  doi={10.57967/HF/7615},
  publisher={Hugging Face},
  author={Zhao, Chengshuai and Wan, Shu and Sheth, Paras and Patwa, Karan and Candan, K. Sel{\c{c}}uk and Liu, Huan},
  year={2026}
}

@software{shu_2026_18331976,
  author       = {Shu},
  title        = {Shu-Wan/cadet: Initial release (v1.0.0)},
  month        = jan,
  year         = 2026,
  publisher    = {Zenodo},
  version      = {v1.0.0},
  doi          = {10.5281/zenodo.18331976},
  url          = {https://doi.org/10.5281/zenodo.18331976},
  swhid        = {swh:1:dir:ceb3fc11806a24207cb0e43fd92b9b6f106f4cc2;origin=https://doi.org/10.5281/zenodo.18331975;visit=swh:1:snp:402265cabd7f46dfbeeadf2a29ed9ed9a17348a8;anchor=swh:1:rel:014aced8c618a22ad8ffd659e14f8c8e29e11440;path=Shu-Wan-cadet-678b13c},
}

License

This code is licensed under the MIT License.

The dataset is derived from multiple upstream sources and may be governed by different licenses and terms than the code in this repository. Refer to the dataset card on Hugging Face and docs/DATASETS.md for details.
