
LLM Refusal Training


This repository contains code for fine-tuning Large Language Models (LLMs) to learn the boundaries of refusal, specifically for cases where the provided context lacks sufficient information.

For a detailed explanation of the experiment and results, check out the blog post: Fine-Tuning LLMs for Refusal.

🚀 Overview

The goal of this project is to calibrate LLMs to refuse to answer when:

  1. The answer is not present in the provided context.
  2. The context is missing entirely.
  3. The question might lead to hallucinations.
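The three cases above can be illustrated with toy training pairs. Note this is a hypothetical sketch of what such examples might look like; the repo's actual prompt template and refusal phrasing may differ (see `src/dataset_prep.py`).

```python
# Illustrative (hypothetical) examples of the refusal cases; the
# repo's real prompt format and refusal text may differ.
REFUSAL_TEXT = "I don't have enough information to answer that."

examples = [
    {   # 1. Answer absent from the provided context
        "context": "Paris is the capital of France.",
        "question": "What is the population of Paris?",
        "target": REFUSAL_TEXT,
    },
    {   # 2. Context missing entirely
        "context": "",
        "question": "Who won the 1987 regional chess final?",
        "target": REFUSAL_TEXT,
    },
    {   # Counter-example: answerable, so the model should NOT refuse
        "context": "The Eiffel Tower is 330 metres tall.",
        "question": "How tall is the Eiffel Tower?",
        "target": "330 metres",
    },
]

refusals = sum(ex["target"] == REFUSAL_TEXT for ex in examples)
print(refusals)  # 2 of the 3 examples expect a refusal
```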

🛠️ Project Structure

.
├── src/
│   ├── dataset_prep.py    # Data preparation and calibration logic
│   ├── train.py           # QLoRA fine-tuning script
│   └── model_eval.py      # Evaluation and judging logic
├── data/                  # (Optional) Directory for raw/processed datasets
├── outputs/               # Model checkpoints and metrics
├── requirements.txt       # Project dependencies
├── LICENSE                # MIT License
└── README.md              # Project documentation

📦 Installation

# Clone the repository
git clone https://github.com/your-username/refusal-training.git
cd refusal-training

# Install dependencies
pip install -r requirements.txt

📖 Usage

1. Data Preparation

Generate calibrated training, evaluation, and test datasets from various sources (SQuAD, Natural Questions, etc.).

python src/dataset_prep.py
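One common way to build calibrated refusal data from QA corpora like SQuAD is context swapping: pair a question with a mismatched context so the answer is guaranteed to be absent. The sketch below is illustrative only (toy records, assumed refusal string); the actual logic lives in `src/dataset_prep.py`.

```python
import json

# Hypothetical sketch of the calibration idea: create unanswerable
# examples by pairing each question with a context from a different
# record, mapped to a refusal target.
REFUSAL = "The context does not contain this information."

records = [
    {"question": "When was the bridge built?", "context": "The bridge opened in 1932."},
    {"question": "Who designed the library?", "context": "The library was designed by a local firm."},
]

def make_unanswerable(recs):
    out = []
    for i, rec in enumerate(recs):
        # Use a context from a *different* record so the answer is absent.
        wrong_ctx = recs[(i + 1) % len(recs)]["context"]
        out.append({"question": rec["question"], "context": wrong_ctx, "answer": REFUSAL})
    return out

unanswerable = make_unanswerable(records)
print(json.dumps(unanswerable[0], indent=2))
```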

2. Fine-Tuning

Train the model using QLoRA. The script is configured for Qwen-based models but can be adapted.

python src/train.py
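For orientation, a QLoRA setup typically combines 4-bit quantization of the frozen base model with low-rank adapters on the attention projections. The values below are common defaults, not necessarily the repo's settings; check `src/train.py` for the real configuration.

```python
# Typical QLoRA hyperparameters (illustrative values only, not the
# repo's exact settings).
qlora_config = {
    # 4-bit quantization of the frozen base model (the "Q" in QLoRA)
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",         # NormalFloat4 quantization
    "bnb_4bit_compute_dtype": "bfloat16",
    # Low-rank adapters trained on top of the quantized weights
    "lora_r": 16,                          # adapter rank
    "lora_alpha": 32,                      # scaling factor
    "lora_dropout": 0.05,
    # Attention projections commonly targeted in Qwen-style models
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

# Effective scaling applied to the adapter output: alpha / r
scale = qlora_config["lora_alpha"] / qlora_config["lora_r"]
print(scale)  # 2.0
```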

3. Evaluation

Evaluate the fine-tuned model's refusal accuracy and hallucination rate using an LLM judge (via OpenRouter).

python src/model_eval.py
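The metric being computed is refusal accuracy: did the model refuse exactly when it should have? The repo delegates that judgment to an LLM judge via OpenRouter; the keyword heuristic below is only a stand-in to make the scoring logic concrete.

```python
# Minimal sketch of refusal scoring. The repo uses an LLM judge via
# OpenRouter; this keyword heuristic is a simplified stand-in.
REFUSAL_MARKERS = ("don't have enough information", "cannot answer", "not in the context")

def is_refusal(answer: str) -> bool:
    low = answer.lower()
    return any(m in low for m in REFUSAL_MARKERS)

def refusal_accuracy(predictions, should_refuse):
    """Fraction of examples where refusal behaviour matches the label."""
    correct = sum(is_refusal(p) == r for p, r in zip(predictions, should_refuse))
    return correct / len(predictions)

preds = [
    "I don't have enough information to answer that.",  # expected refusal
    "The Eiffel Tower is 330 metres tall.",             # expected answer
    "The capital is Berlin.",                           # hallucination: should have refused
]
labels = [True, False, True]
print(refusal_accuracy(preds, labels))  # 2/3 ~= 0.667
```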

📊 Methodology

The training workflow involves creating a balanced dataset with:

  • Answerable Questions: Standard RAG examples.
  • Unanswerable Questions: Questions where the context is provided but doesn't contain the answer.
  • Missing Context: Questions with no context at all, where a refusal is expected if the information is unknown.
  • Hallucination Prevention: Ensuring the model doesn't fabricate facts.
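Balancing matters because overrepresenting refusal examples makes the model refuse answerable questions. A minimal sketch of equal sampling across categories, with made-up bucket names and counts (the repo's actual ratios may differ):

```python
import random

random.seed(42)  # reproducible sampling for this illustration

# Hypothetical sketch of building a balanced training mix across the
# example categories; the repo's real ratios may differ.
def balance(buckets, per_bucket):
    """Sample an equal number of examples from each category bucket."""
    mixed = []
    for name, pool in buckets.items():
        k = min(per_bucket, len(pool))
        mixed.extend({"category": name, "text": t} for t in random.sample(pool, k))
    random.shuffle(mixed)  # interleave categories for training
    return mixed

buckets = {
    "answerable":   [f"ans-{i}" for i in range(10)],
    "unanswerable": [f"unans-{i}" for i in range(10)],
    "no_context":   [f"noctx-{i}" for i in range(10)],
}
mix = balance(buckets, per_bucket=5)
print(len(mix))  # 15 examples, 5 per category
```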

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

✍️ Author

Sahil Chachra

About

This repo shows how to fine-tune an LLM to shift its decision boundary so that it correctly refuses to answer when the context lacks the necessary information, preventing hallucinations.
