This repository contains scripts and notebooks for efficiently fine-tuning and interpreting large language models (LLMs) using state-of-the-art quantization, fine-tuning techniques, and interpretability tools. Specifically, the project leverages 4-bit quantization for memory efficiency, LoRA (Low-Rank Adaptation) for fine-tuning efficiency, and LLM Explainer for model interpretability.
This notebook demonstrates how to load and quantize a full precision FP16 Hugging Face transformer model into 4-bit precision.
Features:
- Efficient loading of large FP16 models.
- Implementation of 4-bit quantization.
- Uploading quantized models to Hugging Face model repository.
Steps included:
- Loading a full FP16 model from Hugging Face.
- Applying quantization techniques (BitsAndBytes).
- Validation and memory footprint analysis.
- Saving and pushing quantized model to Hugging Face Hub.
Requirements:
- Hugging Face transformers
- BitsAndBytes
- PyTorch
This notebook provides the procedure to fine-tune the previously quantized model using Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique.
Features:
- Efficient fine-tuning with reduced computational requirements.
- Integration with quantized models.
Steps included:
- Loading a quantized model.
- Configuring and applying LoRA.
- Fine-tuning on custom datasets.
- Evaluating model performance during training.
Requirements:
- PEFT (Parameter-Efficient Fine-Tuning) from Hugging Face
- PyTorch
- Transformers
This notebook illustrates the use of LLM Explainer for interpreting and analyzing predictions made by fine-tuned LLMs, enhancing transparency and trustworthiness.
Features:
- Explanation of individual model predictions.
- Visualization of attention and important tokens.
- Comparison between predictions.
Steps included:
- Loading fine-tuned LoRA model.
- Using LLM Explainer to generate interpretations.
- Visualization of interpretations.
Requirements:
- LLM Explainer
- Transformers
- PyTorch
pip install transformers bitsandbytes peft llm_explainer- Ensure each notebook is executed sequentially: first quantize, then fine-tune, and finally interpret the model.
- Adjust configuration parameters in each notebook according to your dataset and model specifics.
Contributions and improvements are welcome! Please submit pull requests or open issues for discussion.
This project is licensed under the MIT License. See the LICENSE file for details.