HEARTS-Text-Stereotype-Detection

Overview

HEARTS introduces explainable, low-carbon models for stereotype detection, fine-tuned on the Expanded Multi-Grain Stereotype Dataset (EMGSD). This repository contains scripts for training, evaluating, and explaining sentence-level stereotype classifiers. For details, see the HEARTS research paper.

Features

Exploratory Data Analysis (EDA): Analyze group distributions, text length, and sentiment/regard trends in EMGSD.
Model Training & Evaluation: Train and test models (e.g., BERT, ALBERT-V2, logistic regression) on EMGSD with ablation studies.
Explainability: Generate SHAP and LIME explanations to interpret predictions.
LLM Bias Evaluation: Classify and evaluate bias in LLM outputs using neutral prompts derived from EMGSD.

Quickstart

Clone this repository:

git clone https://github.com/username/HEARTS-Text-Stereotype-Detection.git
cd HEARTS-Text-Stereotype-Detection

Install dependencies:
```
pip install -r requirements.txt
```
Explore the modules (see details below).

Modules

1. Exploratory Data Analysis

Scripts for basic analysis of EMGSD. The dataset is available on Hugging Face.

Initial_EDA: Analyze target group distribution, stereotype group distribution, text length, and frequency.
Sentiment_Regard_Analysis: Classify sentiment (RoBERTa Sentiment Model) and regard (Regard v3) for dataset entries.
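
As a rough illustration of the kind of summary Initial_EDA produces, the sketch below counts groups and labels and computes text-length statistics using only the standard library. The rows, the field names (`text`, `group`, `label`), and the label values are hypothetical placeholders; the actual EMGSD schema and the repository's scripts may differ.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical rows: field names and label values are assumptions,
# not the actual EMGSD schema.
rows = [
    {"text": "Example sentence one.", "group": "gender", "label": "stereotype"},
    {"text": "A second, slightly longer example sentence.", "group": "profession", "label": "neutral"},
    {"text": "Third example.", "group": "gender", "label": "unrelated"},
]


def summarize(rows):
    """Return group counts, label counts, and word-count statistics."""
    group_counts = Counter(r["group"] for r in rows)
    label_counts = Counter(r["label"] for r in rows)
    # Text length is measured in whitespace-separated tokens.
    lengths = [len(r["text"].split()) for r in rows]
    return {
        "group_counts": dict(group_counts),
        "label_counts": dict(label_counts),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
    }


summary = summarize(rows)
print(summary)
```

In practice the same counts would typically be computed over the full dataset and plotted; the function above only shows the shape of the computation.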

2. Model Training and Evaluation

Train and evaluate various models on EMGSD, with ablation studies on its three core datasets (MGSD, Augmented WinoQueer, and Augmented SeeGULL).

BERT_Models_Fine_Tuning: Fine-tune and evaluate ALBERT-V2, DistilBERT, and BERT models.
Logistic_Regression: Train logistic regression models using:
- TF-IDF vectorization
- Pre-trained embeddings (spaCy embeddings)
DistilRoBERTaBias: Evaluate an open-source bias detection model (DistilRoBERTa Bias).
GPT4_Models: Evaluate GPT-4o and GPT-4o-mini using API prompting (API credentials required).
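
To make the TF-IDF plus logistic regression baseline concrete, here is a minimal pure-Python sketch of the idea. It is not the repository's implementation, which may well rely on a library such as scikit-learn. The toy documents, labels, smoothed IDF formula, learning rate, and epoch count are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.lower().split()))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf, vocab):
    """L2-normalised TF-IDF vector over a fixed vocabulary order."""
    tf = Counter(doc.lower().split())
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss, without regularisation."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(x, w, b):
    """Return 1 if the predicted probability exceeds 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5)


# Toy, hypothetical data standing in for labelled EMGSD sentences.
docs = ["good fine day", "fine good work", "bad poor day", "poor bad work"]
labels = [1, 1, 0, 0]

idf = fit_idf(docs)
vocab = sorted(idf)
X = [tfidf(d, idf, vocab) for d in docs]
w, b = train_logreg(X, labels)
preds = [predict(x, w, b) for x in X]
print(preds)
```

The example evaluates on its own training data only to show the pipeline end to end; a real experiment would use held-out test splits.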

3. Model Explainability

Interpret model predictions using SHAP and LIME. Weights for the fine-tuned ALBERT-V2 model are available at Hugging Face.

SHAP_LIME_Analysis: Generate SHAP and LIME explanations for selected model predictions and compare their similarity using metrics such as:
- Cosine similarity
- Pearson correlation
- Jensen-Shannon divergence
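
The three similarity metrics listed above can be sketched as follows for two token-level attribution vectors. The attribution values are made up, and converting scores to probability distributions by normalising absolute values is an assumption made here so that Jensen-Shannon divergence is defined; the repository may normalise differently. The functions also assume non-constant, non-zero vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two attribution vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(a, b):
    """Pearson correlation coefficient between two attribution vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)


def to_distribution(scores):
    """Assumption: normalise absolute attributions so they sum to 1."""
    mags = [abs(s) for s in scores]
    total = sum(mags)
    return [m / total for m in mags]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (base-2 logs, so it lies in [0, 1])."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing to the KL sum.
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical per-token attributions for the same sentence.
shap_scores = [0.40, -0.10, 0.25, 0.05]
lime_scores = [0.35, -0.05, 0.30, 0.10]

print("cosine :", cosine_similarity(shap_scores, lime_scores))
print("pearson:", pearson(shap_scores, lime_scores))
print("jsd    :", js_divergence(to_distribution(shap_scores),
                                to_distribution(lime_scores)))
```

Higher cosine similarity and Pearson correlation, and lower Jensen-Shannon divergence, indicate closer agreement between the two explanation methods.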

4. LLM Bias Evaluation

Classify and evaluate bias in LLM responses using neutral prompts derived from EMGSD.

LLM_Prompt_Verification: Verify neutrality of prompts using the fine-tuned ALBERT-V2 model.
LLM_Bias_Evaluation: Classify LLM outputs to compute aggregate bias scores, representing stereotype prevalence.
SHAP_LIME_Analysis_LLM_Outputs: Apply SHAP and LIME to interpret predictions on LLM outputs.
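
One plausible reading of an aggregate bias score as "stereotype prevalence" is the share of a model's outputs that the classifier labels as stereotypes. The sketch below computes that share per model. The model names, the label strings, and this exact definition are assumptions for illustration; the paper and the LLM_Bias_Evaluation script are the authoritative sources for how the score is actually computed.

```python
from collections import defaultdict

# Hypothetical classifier outputs: one record per LLM response.
predictions = [
    {"model": "model_a", "label": "stereotype"},
    {"model": "model_a", "label": "non-stereotype"},
    {"model": "model_b", "label": "non-stereotype"},
    {"model": "model_b", "label": "stereotype"},
    {"model": "model_b", "label": "non-stereotype"},
    {"model": "model_b", "label": "non-stereotype"},
]


def aggregate_bias_scores(predictions, positive_label="stereotype"):
    """Fraction of each model's outputs classified as `positive_label`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for p in predictions:
        totals[p["model"]] += 1
        if p["label"] == positive_label:
            positives[p["model"]] += 1
    return {model: positives[model] / totals[model] for model in totals}


scores = aggregate_bias_scores(predictions)
print(scores)
```

A score of 0 would mean no responses were flagged, and a score of 1 would mean every response was flagged.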

Results

Key findings and performance benchmarks are reported in the HEARTS research paper.

Citation

If you use this work, please cite the following paper:

@article{hearts2024,
  title={HEARTS: Enhancing Stereotype Detection with Explainable, Low-Carbon Models},
  author={Author Names},
  journal={arXiv preprint arXiv:2409.11579},
  year={2024}
}

License

This repository is licensed under the MIT License.

Contact

For questions or collaborations, contact Holistic AI.
