Commit

rename
dylanbouchard committed Oct 21, 2024
1 parent 092ce7c commit cc04376
Showing 86 changed files with 404 additions and 438 deletions.
28 changes: 14 additions & 14 deletions README.md
@@ -1,12 +1,12 @@
<p align="center">
<img src="./assets/images/llambda-logo.png" />
<img src="./assets/images/langfair-logo.png" />
</p>

# Library for Assessing Bias and Fairness in LLMs

LLaMBDA (Large Language Model Bias Detection and Auditing) is a Python library for conducting bias and fairness assessments of LLM use cases. This repository includes a framework for [choosing bias and fairness metrics](#choosing-bias-and-fairness-metrics-for-an-llm-use-case), [demo notebooks](./examples), and a LLM bias and fairness [technical playbook](https://arxiv.org/pdf/2407.10853) containing a thorough discussion of LLM bias and fairness risks, evaluation metrics, and best practices. Please refer to our [documentation site](https://cvs-health.github.io/llambda/) for more details on how to use LLaMBDA.
LangFair is a Python library for conducting bias and fairness assessments of LLM use cases. This repository includes a framework for [choosing bias and fairness metrics](#choosing-bias-and-fairness-metrics-for-an-llm-use-case), [demo notebooks](./examples), and a LLM bias and fairness [technical playbook](https://arxiv.org/pdf/2407.10853) containing a thorough discussion of LLM bias and fairness risks, evaluation metrics, and best practices. Please refer to our [documentation site](https://cvs-health.github.io/langfair/) for more details on how to use LangFair.

Bias and fairness metrics offered by LLaMBDA fall into one of several categories. The full suite of metrics is displayed below.
Bias and fairness metrics offered by LangFair fall into one of several categories. The full suite of metrics is displayed below.

##### Counterfactual Fairness Metrics
* Strict Counterfactual Sentiment Parity ([Huang et al., 2020](https://arxiv.org/pdf/1911.03064))
@@ -38,18 +38,18 @@ Bias and fairness metrics offered by LLaMBDA fall into one of several categories
* False Discovery Rate Disparity ([Bellamy et al., 2018](https://arxiv.org/abs/1810.01943); [Saleiro et al., 2019](https://arxiv.org/abs/1811.05577))

## Quickstart
### (Optional) Create a virtual environment for using LLaMBDA
We recommend creating a new virtual environment using venv before installing LLaMBDA. To do so, please follow instructions [here](https://docs.python.org/3/library/venv.html).
### (Optional) Create a virtual environment for using LangFair
We recommend creating a new virtual environment using venv before installing LangFair. To do so, please follow instructions [here](https://docs.python.org/3/library/venv.html).

### Installing LLaMBDA
### Installing LangFair
The latest version can be installed from the github URL:

```bash
pip install git+https://github.com/cvs-health/llambda.git
pip install git+https://github.com/cvs-health/langfair.git
```

### Usage
Below is a sample of code illustrating how to use LLaMBDA's `AutoEval` class for text generation and summarization use cases. The below example assumes the user has already defined parameters `DEPLOYMENT_NAME`, `API_KEY`, `API_BASE`, `API_TYPE`, `API_VERSION`, and a list of prompts from their use case `prompts`.
Below is a sample of code illustrating how to use LangFair's `AutoEval` class for text generation and summarization use cases. The below example assumes the user has already defined parameters `DEPLOYMENT_NAME`, `API_KEY`, `API_BASE`, `API_TYPE`, `API_VERSION`, and a list of prompts from their use case `prompts`.

Create `langchain` LLM object.
```python
@@ -66,7 +66,7 @@ llm = AzureChatOpenAI(

Run the `AutoEval` method for automated bias / fairness evaluation
```python
from llambda.auto import AutoEval
from langfair.auto import AutoEval
auto_object = AutoEval(
prompts=prompts,
langchain_llm=llm
@@ -92,7 +92,7 @@ auto_object.print_results()
</p>
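
The two code snippets above are partially collapsed in this diff view. For orientation, here is a consolidated sketch of the workflow the README describes, assuming the parameters named above (`DEPLOYMENT_NAME`, `API_KEY`, `API_BASE`, `API_TYPE`, `API_VERSION`, and `prompts`) are already defined; the exact `AzureChatOpenAI` keyword arguments and the awaited `evaluate()` call are assumptions inferred from the surrounding text and notebook output, not lines captured in this commit.

```python
from langchain_openai import AzureChatOpenAI

from langfair.auto import AutoEval

# Build the langchain LLM object (keyword names assumed; adjust to your deployment).
llm = AzureChatOpenAI(
    deployment_name=DEPLOYMENT_NAME,
    openai_api_key=API_KEY,
    azure_endpoint=API_BASE,
    openai_api_type=API_TYPE,
    openai_api_version=API_VERSION,
    temperature=0.4,  # example value; set for your use case
)

# Hand the prompts and LLM to AutoEval, run the evaluation, and print the metrics.
auto_object = AutoEval(
    prompts=prompts,      # list of prompt strings from the use case
    langchain_llm=llm,
)

# evaluate() is assumed to be a coroutine; `await` works directly in a notebook,
# otherwise wrap the call in asyncio.run(auto_object.evaluate()).
results = await auto_object.evaluate()
auto_object.print_results()
```

Because `AutoEval` generates multiple responses per prompt, a full run can take a while; the `auto_eval_demo.ipynb` output later in this commit shows the six evaluation steps it reports along the way.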

## Example Notebooks
See **[Demo Notebooks](./examples)** for notebooks illustrating how to use LLaMBDA for various bias and fairness evaluation metrics.
See **[Demo Notebooks](./examples)** for notebooks illustrating how to use LangFair for various bias and fairness evaluation metrics.

## Choosing Bias and Fairness Metrics for an LLM Use Case
In general, bias and fairness assessments of LLM use cases do not require satisfying all possible evaluation metrics. Instead, practitioners should prioritize and concentrate on a relevant subset of metrics. To demystify metric choice for bias and fairness assessments of LLM use cases, we introduce a decision framework for selecting the appropriate evaluation metrics, as depicted in the diagram below. Leveraging the use case taxonomy outlined in the [technical playbook](https://arxiv.org/abs/2407.10853), we determine suitable choices of bias and fairness metrics for a given use case based on its relevant characteristics.
@@ -114,7 +114,7 @@ Lastly, we classify the remaining subset of focused use cases as having minimal


## Associated Research
A technical description of LLaMBDA's evaluation metrics and a practitioner's guide for selecting evaluation metrics is contained in **[this paper](https://arxiv.org/pdf/2407.10853)**. Below is the bibtex entry for this paper:
A technical description of LangFair's evaluation metrics and a practitioner's guide for selecting evaluation metrics is contained in **[this paper](https://arxiv.org/pdf/2407.10853)**. Below is the bibtex entry for this paper:

@misc{bouchard2024actionableframeworkassessingbias,
title={An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases},
@@ -127,10 +127,10 @@ A technical description of LLaMBDA's evaluation metrics and a practitioner's gui
}

## Code Documentation
Please refer to our [documentation site](https://cvs-health.github.io/llambda/) for more details on how to use LLaMBDA.
Please refer to our [documentation site](https://cvs-health.github.io/langfair/) for more details on how to use LangFair.

## Development Team
The open-source version of LLaMBDA is the culmination of extensive work carried out by a dedicated team of developers. While the internal commit history will not be made public, we believe it's essential to acknowledge the significant contributions of our development team who were instrumental in bringing this project to fruition:
The open-source version of LangFair is the culmination of extensive work carried out by a dedicated team of developers. While the internal commit history will not be made public, we believe it's essential to acknowledge the significant contributions of our development team who were instrumental in bringing this project to fruition:

- [Dylan Bouchard](https://github.com/dylanbouchard)
- [Mohit Singh Chauhan](https://github.com/mohitcek)
@@ -139,4 +139,4 @@ The open-source version of LLaMBDA is the culmination of extensive work carried
- [Zeya Ahmad](https://github.com/zeya30)

## Contributing
Contributions are welcome. Please refer [here](./CONTRIBUTING.md) for instructions on how to contribute to LLaMBDA.
Contributions are welcome. Please refer [here](./CONTRIBUTING.md) for instructions on how to contribute to LangFair.
Binary file removed assets/images/archive/LLM_Framework.png
Binary file removed assets/images/archive/LLaMBDA.png
Binary file removed assets/images/archive/llambda2_logo_old.PNG
Binary file removed assets/images/archive/llambda_logo_cvsred.PNG
Binary file removed assets/images/archive/llambda_logo_old.PNG
Binary file added assets/images/langfair-logo.png
Binary file added assets/images/langfair-logo2.png
Binary file added assets/images/langfair-logo3.png
Binary file removed assets/images/llambda-logo-alt-dark.png
Binary file removed assets/images/llambda-logo-only-dark.png
Binary file removed assets/images/llambda-logo-only.png
Binary file removed assets/images/llambda-logo.png
Binary file removed assets/images/llambda-logo2.png
2 changes: 1 addition & 1 deletion data/DATA_COPYRIGHT.md
@@ -1,4 +1,4 @@
Please refer to below for copyright information for the two files contained in `llambda/data`
Please refer to below for copyright information for the two files contained in `langfair/data`

#### Copyright information for [RealToxicityPrompts.jsonl](https://huggingface.co/datasets/allenai/real-toxicity-prompts)
***
@@ -13,15 +13,15 @@
"\n",
"import numpy as np\n",
"from IPython.display import Image\n",
"from llambda.metrics.classification import ClassificationMetrics"
"from langfair.metrics.classification import ClassificationMetrics"
]
},
{
"cell_type": "markdown",
"id": "b9290443-ce88-4d54-beea-1e1888500b36",
"metadata": {},
"source": [
"Bias and fairness metrics offered by `llambda` fall into various categories: counterfactual discrimination metrics, stereotype metrics, toxicity mtrics, recommendation fairness metrics, and classification fairness metrics. The full suite of metrics is displayed below.\n",
"Bias and fairness metrics offered by `langfair` fall into various categories: counterfactual discrimination metrics, stereotype metrics, toxicity mtrics, recommendation fairness metrics, and classification fairness metrics. The full suite of metrics is displayed below.\n",
"\n",
"##### Counterfactual Discrimination Metrics\n",
"* Strict Counterfactual Sentiment Parity ([Huang et al., 2020](https://arxiv.org/pdf/1911.03064))\n",
@@ -210,15 +210,15 @@
],
"metadata": {
"environment": {
"kernel": "llambda-env",
"kernel": "langfair",
"name": "workbench-notebooks.m121",
"type": "gcloud",
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m121"
},
"kernelspec": {
"display_name": "llambda-env",
"display_name": "langfair",
"language": "python",
"name": "llambda-env"
"name": "langfair"
},
"language_info": {
"codemirror_mode": {
@@ -230,7 +230,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
"version": "3.9.20"
}
},
"nbformat": 4,
Expand Up @@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 1,
"id": "f694ef3c-96cb-472c-80c4-0409222fc4ac",
"metadata": {
"tags": []
@@ -13,15 +13,15 @@
"\n",
"from IPython.display import Image\n",
"\n",
"from llambda.metrics.recommendation import RecommendationMetrics\n"
"from langfair.metrics.recommendation import RecommendationMetrics"
]
},
{
"cell_type": "markdown",
"id": "b9290443-ce88-4d54-beea-1e1888500b36",
"metadata": {},
"source": [
"Bias and fairness metrics offered by `llambda` fall into various categories: counterfactual discrimination metrics, stereotype metrics, toxicity mtrics, recommendation fairness metrics, and classification fairness metrics. The full suite of metrics is displayed below.\n",
"Bias and fairness metrics offered by `langfair` fall into various categories: counterfactual discrimination metrics, stereotype metrics, toxicity mtrics, recommendation fairness metrics, and classification fairness metrics. The full suite of metrics is displayed below.\n",
"\n",
"##### Counterfactual Discrimination Metrics\n",
"* Strict Counterfactual Sentiment Parity ([Huang et al., 2020](https://arxiv.org/pdf/1911.03064))\n",
@@ -394,15 +394,15 @@
],
"metadata": {
"environment": {
"kernel": "llambda-env",
"kernel": "langfair",
"name": "workbench-notebooks.m121",
"type": "gcloud",
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m121"
},
"kernelspec": {
"display_name": "llambda-env",
"display_name": "langfair",
"language": "python",
"name": "llambda-env"
"name": "langfair"
},
"language_info": {
"codemirror_mode": {
@@ -414,7 +414,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
"version": "3.9.20"
}
},
"nbformat": 4,
54 changes: 27 additions & 27 deletions examples/evaluations/text_generation/auto_eval_demo.ipynb
@@ -30,7 +30,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 2,
"metadata": {
"tags": []
},
@@ -43,14 +43,14 @@
"from dotenv import find_dotenv, load_dotenv\n",
"from langchain_openai import AzureChatOpenAI\n",
"\n",
"from llambda.auto import AutoEval\n",
"from langfair.auto import AutoEval\n",
"\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"metadata": {
"tags": []
},
@@ -77,7 +77,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 4,
"metadata": {
"tags": []
},
@@ -99,7 +99,7 @@
" \"#Person1#: Watsup, ladies! Y'll looking'fine tonight. May I have this dance?\\\\n#Person2#: He's cute! He looks like Tiger Woods! But, I can't dance. . .\\\\n#Person1#: It's all good. I'll show you all the right moves. My name's Malik.\\\\n#Person2#: Nice to meet you. I'm Wen, and this is Nikki.\\\\n#Person1#: How you feeling', vista? Mind if I take your friend'round the dance floor?\\\\n#Person2#: She doesn't mind if you don't mind getting your feet stepped on.\\\\n#Person1#: Right. Cool! Let's go!\\n\"]"
]
},
"execution_count": 3,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -118,7 +118,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 5,
"metadata": {
"tags": []
},
@@ -132,7 +132,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `AutoEval()` - For calculating all toxicity, stereotype, and counterfactual metrics supported by LLaMBDA\n",
"#### `AutoEval()` - For calculating all toxicity, stereotype, and counterfactual metrics supported by LangFair\n",
"\n",
"**Class Attributes:**\n",
"- `prompts` - (**list of strings**)\n",
@@ -173,7 +173,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"metadata": {
"tags": []
},
@@ -191,7 +191,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"metadata": {
"tags": []
},
@@ -216,7 +216,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"metadata": {
"tags": []
},
@@ -227,35 +227,35 @@
"text": [
"\u001b[1mStep 1: Fairness Through Unawareness\u001b[0m\n",
"------------------------------------\n",
"LLaMBDA: Number of prompts containing race words: 0\n",
"- LLaMBDA: The prompts satisfy fairness through unawareness for race words, the recommended risk assessment only include Toxicity\n",
"LLaMBDA: Number of prompts containing gender words: 31\n",
"- LLaMBDA: The prompts do not satisfy fairness through unawareness for gender words, the recommended risk assessments include Toxicity, Stereotype, and Counterfactual Discrimination.\n",
"langfair: Number of prompts containing race words: 0\n",
"- langfair: The prompts satisfy fairness through unawareness for race words, the recommended risk assessment only include Toxicity\n",
"langfair: Number of prompts containing gender words: 31\n",
"- langfair: The prompts do not satisfy fairness through unawareness for gender words, the recommended risk assessments include Toxicity, Stereotype, and Counterfactual Discrimination.\n",
"\n",
"\u001b[1mStep 2: Generate Counterfactual Dataset\u001b[0m\n",
"---------------------------------------\n",
"LLaMBDA: gender words found in 31 prompts.\n",
"langfair: gender words found in 31 prompts.\n",
"Generating 25 responses for each gender prompt...\n",
"LLaMBDA: Responses successfully generated!\n",
"langfair: Responses successfully generated!\n",
"\n",
"\u001b[1mStep 3: Generating Model Responses\u001b[0m\n",
"----------------------------------\n",
"LLaMBDA: Generating 25 responses per prompt...\n",
"LLaMBDA: Responses successfully generated!\n",
"langfair: Generating 25 responses per prompt...\n",
"langfair: Responses successfully generated!\n",
"\n",
"\u001b[1mStep 4: Evaluate Toxicity Metrics\u001b[0m\n",
"---------------------------------\n",
"LLaMBDA: Computing toxicity scores...\n",
"LLaMBDA: Evaluating metrics...\n",
"langfair: Computing toxicity scores...\n",
"langfair: Evaluating metrics...\n",
"\n",
"\u001b[1mStep 5: Evaluate Stereotype Metrics\u001b[0m\n",
"-----------------------------------\n",
"LLaMBDA: Computing stereotype scores...\n",
"LLaMBDA: Evaluating metrics...\n",
"langfair: Computing stereotype scores...\n",
"langfair: Evaluating metrics...\n",
"\n",
"\u001b[1mStep 6: Evaluate Counterfactual Metrics\u001b[0m\n",
"---------------------------------------\n",
"LLaMBDA: Evaluating metrics...\n"
"langfair: Evaluating metrics...\n"
]
}
],
@@ -620,15 +620,15 @@
],
"metadata": {
"environment": {
"kernel": "llambda-env",
"kernel": "langfair",
"name": "workbench-notebooks.m121",
"type": "gcloud",
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m121"
},
"kernelspec": {
"display_name": "llambda-env",
"display_name": "langfair",
"language": "python",
"name": "llambda-env"
"name": "langfair"
},
"language_info": {
"codemirror_mode": {
@@ -640,7 +640,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
"version": "3.9.20"
}
},
"nbformat": 4,
