This project demonstrates how to fine-tune the LLaMA-3.1-8B model with LoRA adapters, apply chat templates, and save the resulting model for inference. The model is trained on a local dataset using parameter-efficient techniques and deployed to the Hugging Face Hub.
- Model: LLaMA-3.1-8B with 4-bit quantization for efficient memory usage (see the loading sketch after this list).
- Techniques: Fine-tuning using LoRA (Low-Rank Adaptation) adapters, gradient checkpointing, and chat templates.
- Data: Custom local CSV file used for training.
- Goal: Train and deploy a chatbot model capable of handling user input in conversation-based formats.
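As a rough illustration of the setup described above, the snippet below loads the base model in 4-bit precision with Unsloth's `FastLanguageModel`. The checkpoint name and sequence length are placeholder choices, not values taken from this project:

```python
from unsloth import FastLanguageModel

# Load the base model in 4-bit precision (via bitsandbytes) so it fits in
# consumer-GPU memory. model_name and max_seq_length are illustrative choices.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # assumed checkpoint; swap in your own
    max_seq_length=2048,
    load_in_4bit=True,
)
```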
- LoRA Adapters: Parameter-efficient fine-tuning (see the sketch after this list).
- Quantization: Efficient memory usage with 4-bit precision.
- Chat Templates: Format multi-turn conversations into the prompt structure the tokenizer and model expect.
- Model Deployment: Saving models and pushing them to the Hugging Face Hub in different quantization formats (see the end-to-end sketch at the end of this section).
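A minimal sketch of how the LoRA adapters and chat template might be attached, assuming Unsloth's `get_peft_model` and `get_chat_template` helpers. The rank, alpha, and target modules below are common defaults rather than this project's exact configuration:

```python
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Attach LoRA adapters: only these small low-rank matrices are trained,
# while the 4-bit base weights stay frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # LoRA rank (illustrative)
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # trade compute for memory
)

# Wrap the tokenizer with the Llama-3.1 chat template so conversations are
# rendered into the prompt format the model expects.
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

messages = [{"role": "user", "content": "Hello, who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```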
To run this project, install the required packages, either in Google Colab or in your local environment:
```bash
pip install torch transformers datasets pandas unsloth trl
```
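To tie the pieces together, here is one hedged end-to-end sketch: load the local CSV with `datasets`, train with TRL's `SFTTrainer`, then save the adapters and push them to the Hub. File names, hyperparameters, and the repo ID are placeholders, and the exact `SFTTrainer` signature varies across `trl` versions:

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the local CSV; assumes a "text" column of pre-formatted chat examples.
dataset = load_dataset("csv", data_files="train.csv", split="train")

trainer = SFTTrainer(
    model=model,                   # the LoRA-wrapped model from the steps above
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,              # short run for illustration
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()

# Save the adapters locally, then push them to the Hub (repo ID is a placeholder).
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")
model.push_to_hub("your-username/llama-3.1-8b-lora")

# Unsloth can also export merged GGUF weights in other quantization formats,
# e.g. model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m").
```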