Fine-tuning Llama-2-7B for Text Classification
Dataset: IMDB; framework: DeepSpeed.
This follows DFSS (PPoPP '23), which dynamically prunes the attention matrix to 2:4 structured sparsity during fine-tuning and inference.
Accordingly, this repo modifies the Transformers package's src/transformers/models/llama/modeling_llama.py so that the sparse attention mechanism is applied during both training and inference.
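The sketch below is a minimal, illustrative version of this idea, not the repo's exact modification to modeling_llama.py: within every contiguous group of 4 attention scores along the key dimension, the 2 largest are kept and the rest are masked out before the softmax, giving a 2:4 structured pattern. The function name and insertion point are assumptions.

```python
import torch

def prune_attn_2to4(attn_scores: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest of every 4 consecutive scores along the last dim.

    attn_scores: (..., seq_len_k); seq_len_k is assumed divisible by 4
    (otherwise the row would need padding).
    """
    *lead, k = attn_scores.shape
    groups = attn_scores.reshape(*lead, k // 4, 4)
    top2 = groups.topk(2, dim=-1).indices              # indices of the kept entries
    pruned = torch.full_like(groups, float("-inf"))    # masked entries vanish after softmax
    pruned.scatter_(-1, top2, groups.gather(-1, top2)) # copy the 2 kept scores back
    return pruned.reshape(*lead, k)

# Hypothetical use inside a modified LlamaAttention.forward, before the softmax:
#   attn_weights = prune_attn_2to4(attn_weights)
#   attn_weights = torch.nn.functional.softmax(attn_weights, dim=-1)
```

Note that DFSS proper exploits the 2:4 pattern with sparse tensor-core kernels rather than emulating it with a dense mask, so this snippet only illustrates the pruning rule.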
Model: Llama-2-7B.
Dataset: IMDB; features: ['text', 'label'].
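For reference, the dataset can be loaded with the Hugging Face datasets library (the repo's own loading code may differ):

```python
from datasets import load_dataset

imdb = load_dataset("imdb")      # splits: train (25,000), test (25,000), unsupervised (50,000)
print(imdb["train"].features)    # {'text': string, 'label': ClassLabel(names=['neg', 'pos'])}
```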
The native (not fine-tuned) Llama model is not well suited to this task.
On the 25,000-example test set: 'eval_accuracy': 0.49368 (roughly chance level for binary classification).
Full-parameter fine-tuning with DeepSpeed ZeRO acceleration (ZeRO-2 used here; ZeRO-3 is also an option), fp16, 4 GPUs.
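A minimal ZeRO-2 + fp16 DeepSpeed configuration consistent with this setup might look like the following; the repo's actual config may differ, and the "auto" values are filled in by the Hugging Face Trainer integration.

```python
ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,                   # ZeRO-2: shard optimizer states and gradients
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
    "gradient_accumulation_steps": "auto",
}
# Passed to the HF Trainer via TrainingArguments(deepspeed=ds_config, fp16=True, ...)
# or saved as ds_config.json and referenced from the training script.
```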
- conda create -n Ft24 python=3.10
- conda activate Ft24
- pip install -r requirements.txt
- CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.run --nproc_per_node=4 llama2-full.py
- Data: 500 training examples, 2 epochs; 1,000 evaluation examples
- Time: 4 GPUs (NVIDIA A6000), batch size = 4, about 35 minutes (a training sketch follows this list)
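A minimal sketch of a training script matching the numbers above (500 training examples, 1,000 evaluation examples, 2 epochs, per-device batch size 4). The base checkpoint name, output paths, and the use of LlamaForSequenceClassification are assumptions; llama2-full.py may be organized differently.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, LlamaForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"          # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token        # Llama has no pad token by default

model = LlamaForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

imdb = load_dataset("imdb")
def tok(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
train_ds = imdb["train"].shuffle(seed=42).select(range(500)).map(tok, batched=True)
eval_ds  = imdb["test"].shuffle(seed=42).select(range(1000)).map(tok, batched=True)

args = TrainingArguments(
    output_dir="llama2-imdb-ft",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    fp16=True,
    deepspeed="ds_config.json",   # e.g. the ZeRO-2 config sketched earlier, saved to a file
)
trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  eval_dataset=eval_ds, tokenizer=tokenizer)
trainer.train()
trainer.save_model("llama2-imdb-ft/final")
```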
Use the fine-tuned checkpoint to evaluate both the native (dense-attention) model and the 2:4-pruned model (an evaluation sketch appears after the results below).
- CUDA_VISIBLE_DEVICES=0 python llama2-full_eval.py
- Time: single GPU, batch size = 1, about 50 minutes.
- Native (dense-attention) llama: 'eval_accuracy': 0.92032 on the 25,000-example test set
- Fine-tuned llama with 2:4-pruned attention: 'eval_accuracy': 0.9014 on the 25,000-example test set
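A minimal sketch of the evaluation step; the checkpoint path and the compute_metrics helper are illustrative, and llama2-full_eval.py may differ. Whether attention is dense or 2:4-pruned is determined by the modified modeling_llama.py, not by this script.

```python
import numpy as np
from datasets import load_dataset
from transformers import (AutoTokenizer, LlamaForSequenceClassification,
                          Trainer, TrainingArguments)

ckpt = "llama2-imdb-ft/final"                    # placeholder: the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = LlamaForSequenceClassification.from_pretrained(ckpt, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

test_ds = load_dataset("imdb")["test"].map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=512), batched=True)

def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=-1)
    return {"accuracy": float((preds == p.label_ids).mean())}   # logged as 'eval_accuracy'

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="eval_out", per_device_eval_batch_size=1),
    eval_dataset=test_ds,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
print(trainer.evaluate())    # runs over all 25,000 test examples
```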