This repository contains code for fine-tuning Large Language Models (LLMs) to learn the boundaries of refusal, specifically for cases where the provided context lacks sufficient information.
For a detailed explanation of the experiment and results, check out the blog post: Fine-Tuning LLMs for Refusal.
The goal of this project is to calibrate LLMs to refuse answering questions when:
- The answer is not present in the provided context.
- The context is missing entirely.
- The question might lead to hallucinations.
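For illustration, the first two cases can be turned into training pairs where the target is either the grounded answer or a refusal. This is a minimal sketch; the field names, refusal wording, and prompt layout below are assumptions, not the repository's actual schema:

```python
# Illustrative only: field names, prompt layout, and the refusal string are
# assumptions, not this repository's actual data schema.
REFUSAL = "I don't have enough information in the provided context to answer that."

def build_example(question, context, answer):
    """Build one training pair; the target is a refusal when no answer exists."""
    prompt = (
        f"Context: {context if context else '(no context provided)'}\n"
        f"Question: {question}"
    )
    completion = answer if answer is not None else REFUSAL
    return {"prompt": prompt, "completion": completion}

# Answerable: the context contains the answer.
ex_answerable = build_example(
    "Who wrote Hamlet?",
    "Hamlet is a tragedy by William Shakespeare.",
    "William Shakespeare",
)
# Unanswerable: context is present but does not contain the answer.
ex_unanswerable = build_example(
    "When was Hamlet first performed?",
    "Hamlet is a tragedy by William Shakespeare.",
    None,
)
# Missing context: no context at all, so a refusal is the target.
ex_no_context = build_example("When was Hamlet first performed?", None, None)
```

Pairing the same context with both an answerable and an unanswerable question, as above, is what teaches the boundary rather than a blanket habit of refusing.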
## Project Structure

```
.
├── src/
│   ├── dataset_prep.py   # Data preparation and calibration logic
│   ├── train.py          # QLoRA fine-tuning script
│   └── model_eval.py     # Evaluation and judging logic
├── data/                 # (Optional) Directory for raw/processed datasets
├── outputs/              # Model checkpoints and metrics
├── requirements.txt      # Project dependencies
├── LICENSE               # MIT License
└── README.md             # Project documentation
```
## Installation

```bash
# Clone the repository
git clone https://github.com/your-username/refusal-training.git
cd refusal-training

# Install dependencies
pip install -r requirements.txt
```

## Usage

Generate calibrated training, evaluation, and test datasets from various sources (SQuAD, Natural Questions, etc.):
```bash
python src/dataset_prep.py
```

Train the model using QLoRA. The script is configured for Qwen-based models but can be adapted to other architectures.
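For orientation, a QLoRA setup of this kind typically loads the base model with 4-bit quantization and attaches trainable LoRA adapters. The configuration sketch below uses `transformers` and `peft`; the model name, target modules, and every hyperparameter shown are illustrative assumptions, not the values used by `src/train.py`:

```python
# Configuration sketch of a QLoRA setup (illustrative only: model name,
# target modules, and hyperparameters are assumptions, not this repo's values).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",             # hypothetical Qwen base model
    quantization_config=bnb_config,
)

lora_config = LoraConfig(
    r=16,                                   # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only adapter weights train
```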
```bash
python src/train.py
```

Evaluate the fine-tuned model's refusal accuracy and hallucination rate using an LLM judge (via OpenRouter).
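As a rough sketch of the judging step: OpenRouter exposes an OpenAI-compatible chat-completions API, so the judge can be asked to emit a single label that is then parsed into a verdict. The prompt wording and label set below are assumptions, not the repository's actual evaluation protocol:

```python
# Hedged sketch of an LLM-judge step. The prompt template and label set are
# assumptions; only the verdict-parsing logic is meant to be load-bearing.
JUDGE_PROMPT = (
    "Context: {context}\n"
    "Question: {question}\n"
    "Model answer: {answer}\n\n"
    "Classify the model answer with exactly one word: "
    "CORRECT, REFUSAL, or HALLUCINATION."
)

VALID_LABELS = {"CORRECT", "REFUSAL", "HALLUCINATION"}

def parse_verdict(reply: str) -> str:
    """Normalize a judge reply ('Refusal.', ' hallucination', ...) to a label."""
    words = reply.strip().split()
    if not words:
        return "UNPARSEABLE"
    label = words[0].strip(".,!").upper()
    return label if label in VALID_LABELS else "UNPARSEABLE"
```

Constraining the judge to a closed label set and normalizing its reply keeps the metric computation (refusal accuracy, hallucination rate) mechanical rather than dependent on free-form judge prose.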
```bash
python src/model_eval.py
```

## Training Data

The training workflow involves creating a balanced dataset with:
- Answerable Questions: Standard RAG examples.
- Unanswerable Questions: Questions where the context is provided but doesn't contain the answer.
- Missing Context: Questions with no context at all, where the model is expected to refuse if it does not know the answer.
- Hallucination Prevention: Ensuring the model doesn't fabricate facts.
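One common way to balance categories like these is to downsample each to the size of the smallest, so no single case dominates training. The sketch below assumes a per-example `category` field, which is an illustrative choice rather than the repository's actual layout:

```python
# Illustrative balancing step: the "category" field and category names are
# assumptions, not this repository's actual dataset layout.
import random

def balance_dataset(examples, seed=0):
    """Downsample every category to the smallest category's size, then shuffle."""
    by_category = {}
    for ex in examples:
        by_category.setdefault(ex["category"], []).append(ex)
    n = min(len(group) for group in by_category.values())
    balanced = [ex for group in by_category.values() for ex in group[:n]]
    random.Random(seed).shuffle(balanced)  # deterministic shuffle for repeatability
    return balanced

examples = (
    [{"category": "answerable", "id": i} for i in range(5)]
    + [{"category": "unanswerable", "id": i} for i in range(3)]
    + [{"category": "missing_context", "id": i} for i in range(4)]
)
balanced = balance_dataset(examples)  # 3 examples per category -> 9 total
```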
## License

This project is licensed under the MIT License. See the LICENSE file for details.