SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration
Authors: Xin Guan, Nathaniel Demchak, Saloni Gupta, Ze Wang, Ediz Ertekin Jr., Adriano Koshiyama, Emre Kazim, Zekun Wu
Conference: COLING 2025 Main Conference
DOI: https://doi.org/10.48550/arXiv.2409.11149
SAGED(-Bias) is the first holistic benchmarking pipeline for detecting bias in large language models. It addresses limitations of existing benchmarks, such as narrow scope, contamination, and the lack of fairness calibration. The SAGED pipeline consists of the following five core stages:
- Scraping Materials: Collects and processes benchmark data from various sources.
- Assembling Benchmarks: Creates structured benchmarks with contextual and demographic considerations.
- Generating Responses: Produces language model outputs for evaluation.
- Extracting Features: Extracts numerical and textual features for analysis.
- Diagnosing Bias: Applies advanced disparity metrics and fairness calibration techniques.
SAGED evaluates max disparity (e.g., impact ratio) and bias concentration (e.g., Max Z-scores) while mitigating assessment tool bias and contextual bias through counterfactual branching and baseline calibration.
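To make these two summaries concrete, the toy sketch below computes an impact ratio (smallest group mean divided by largest group mean) and per-group Z-scores over sentiment scores using pandas and NumPy. The column names and data are illustrative only, and SAGED additionally calibrates scores against a baseline before diagnosis, which this sketch omits.

```python
import numpy as np
import pandas as pd

# Toy feature table: one sentiment score per generated response,
# tagged with the demographic group of the prompt (illustrative data only).
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "C", "C"],
    "sentiment": [0.62, 0.58, 0.41, 0.45, 0.55, 0.57],
})

group_means = df.groupby("group")["sentiment"].mean()

# Max disparity as an impact ratio: smallest group mean over largest group mean.
impact_ratio = group_means.min() / group_means.max()

# Bias concentration: Z-score of each group mean against the spread of group means.
z_scores = (group_means - group_means.mean()) / group_means.std(ddof=0)
max_abs_z = z_scores.abs().max()

print(f"group means:\n{group_means}")
print(f"impact ratio: {impact_ratio:.3f}")
print(f"max |Z|: {max_abs_z:.3f}")
```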
```bash
# Clone the repository
git clone https://github.com/holistic-ai/SAGED-Bias.git
cd SAGED-Bias

# Install Hatch (if not already installed)
pip install hatch

# Create a virtual environment and install the project
hatch env create
hatch run install

# Run the test suite with coverage
hatch run pytest tests --cache-clear --cov=saged --cov-report=term
```
SAGED allows users to define custom prompts and tailor bias-benchmarking metrics, making it adaptable to different contexts and evaluation requirements.
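As a concrete illustration of that customisation, the sketch below builds counterfactual prompts from a user-defined template (the same context with only the group term swapped) and defines a custom disparity metric as a plain callable. The `template`, `build_prompts`, and `max_gap` names are illustrative and not part of SAGED's API; how such pieces are wired into the pipeline is configured through the SAGED modules described below.

```python
from typing import Dict, List

# Illustrative custom prompt template (not SAGED's API): the same context is reused
# for every group, with only the group term swapped (counterfactual branching in spirit).
template = "Write a short reference letter for a {group} software engineer."

def build_prompts(groups: List[str]) -> Dict[str, str]:
    """Return one prompt per group, identical except for the group term."""
    return {group: template.format(group=group) for group in groups}

def max_gap(group_scores: Dict[str, float]) -> float:
    """Illustrative custom metric: largest gap between any two group mean scores."""
    values = list(group_scores.values())
    return max(values) - min(values)

print(build_prompts(["British", "French", "German"]))
print(max_gap({"British": 0.61, "French": 0.55, "German": 0.58}))
```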
- Scraping (`_scrape.py`): Collect data using tools such as the Wikipedia API, BeautifulSoup, and custom scraping methods.
- Assembling (`_assembler.py`): Combine scraped data into structured benchmarks with configurable branching logic.
- Generating (`_generator.py`): Use pre-defined templates to generate responses from language models.
- Extracting (`_extractor.py`): Extract key features such as sentiment, toxicity, and stereotypes using classifiers and embeddings.
- Diagnosing (`_diagnoser.py`): Apply statistical techniques to detect disparities and summarize results.
- Metrics: Includes max disparity, Z-scores, precision, and correlation metrics.
- Pipeline (`_pipeline.py`): Automate the entire benchmarking process by integrating scraping, assembling, generation, feature extraction, and diagnosis (a hedged end-to-end sketch follows this list).
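Putting the modules together, a run could look roughly like the sketch below. The `Pipeline` class and module names come from the list above; the import path, constructor arguments, and `run` method are assumptions made for illustration, so check `_pipeline.py` for the actual interface.

```python
# Hypothetical end-to-end run; argument and method names are assumed, not SAGED's exact API.
from saged import Pipeline  # package name inferred from the coverage target (--cov=saged)

config = {
    "domain": "nationality",                    # topic the benchmark is built around (assumed key)
    "groups": ["British", "French", "German"],  # demographic groups to compare (assumed key)
    "feature": "sentiment",                     # feature extracted from responses (assumed key)
    "generation_model": "gpt-4o-mini",          # placeholder model identifier (assumed key)
}

pipeline = Pipeline(config)  # assumed constructor
report = pipeline.run()      # assumed to chain scraping, assembling, generation, extraction, diagnosis
print(report)
```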
- Scraping Materials: Use the `KeywordFinder`, `SourceFinder`, or `Scraper` classes from `_scrape.py` to collect benchmark data.
- Assembling Prompts: Use the `PromptAssembler` class in `_assembler.py` to split sentences and create custom prompts.
- Generating Responses: Use the `ResponseGenerator` class in `_generator.py` to generate outputs from language models.
- Extracting Features: Apply the `FeatureExtractor` class in `_extractor.py` for sentiment, toxicity, and stereotype analysis.
- Grouping and Analysis: Use the `DisparityDiagnoser` class in `_diagnoser.py` to calculate group statistics and compare disparities.
- Visualization: Leverage the Plotly integration for interactive visualizations (see the sketch after this list).
- Orchestration: The `Pipeline` class in `_pipeline.py` integrates all stages into a seamless workflow.
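Because the guide ends with grouping, analysis, and Plotly visualization, here is a small grounded sketch of that last stretch done directly with pandas and Plotly Express on toy data. It mirrors the kind of group-level summary `DisparityDiagnoser` produces, but it does not call SAGED's own classes, whose exact method names may differ.

```python
import pandas as pd
import plotly.express as px

# Toy extracted-feature table (illustrative only): one sentiment score per response, per group.
df = pd.DataFrame({
    "group": ["A"] * 3 + ["B"] * 3 + ["baseline"] * 3,
    "sentiment": [0.62, 0.58, 0.60, 0.41, 0.45, 0.43, 0.52, 0.55, 0.50],
})

# Group-level statistics, analogous to what a disparity-diagnosis step summarises.
summary = df.groupby("group")["sentiment"].agg(["mean", "std"]).reset_index()

# Interactive bar chart of group means with error bars.
fig = px.bar(summary, x="group", y="mean", error_y="std",
             title="Mean sentiment per group (toy data)")
fig.show()
```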
If you use SAGED in your work, please cite the following paper:
```bibtex
@inproceedings{guan2025saged,
  title={SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration},
  author={Xin Guan and Nathaniel Demchak and Saloni Gupta and Ze Wang and Ediz Ertekin Jr. and Adriano Koshiyama and Emre Kazim and Zekun Wu},
  booktitle={COLING 2025 Main Conference},
  year={2025},
  doi={10.48550/arXiv.2409.11149}
}
```
SAGED-Bias is released under the MIT License.