
Conversation

@ajaymin28 (Contributor) commented Jun 17, 2025

Pull Request Summary #12

Issues addressed

Major updates

  • The files config.py, create_dataset.py, and utils.py (renamed to utilities.py) have been moved into the utils directory to keep the project root organized and clean (see the import sketch below).
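
For orientation, a minimal sketch of what imports look like after this move, assuming utils is a regular package on the import path:

# the relocated modules are now imported from the utils package
from utils import config, create_dataset, utilities   # utilities.py was previously utils.py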

Summary


What Works

  • Training and validation with Unsloth using the vanilla QLoRA method.
  • Local saving of model checkpoints whenever the validation loss improves.
  • Loading models from local directories.
  • Automatically pushing the best model to the Hugging Face Hub (see the sketch after this list).
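
As a rough illustration of the checkpoint/Hub flow above (not the exact train.py code; the helper name, save path, and repo id are assumptions):

best_val_loss = float("inf")

def maybe_save_best(model, tokenizer, val_loss, best_so_far, save_dir="checkpoints/best"):
    # save locally whenever the validation loss improves; returns the updated best loss
    if val_loss < best_so_far:
        model.save_pretrained(save_dir)
        tokenizer.save_pretrained(save_dir)
        return val_loss
    return best_so_far

# inside the validation loop:
# best_val_loss = maybe_save_best(model, tokenizer, val_loss, best_val_loss)

# after training, the best checkpoint can be pushed to the Hub:
# model.push_to_hub("your-username/your-model-repo")   # repo id is a placeholder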

Common errors and resolutions

If you see the error "Invalid pattern: '**' can only be an entire path component" while loading the dataset, refer to this answer:
https://stackoverflow.com/questions/77671277/valueerror-invalid-pattern-can-only-be-an-entire-path-component
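
Per that answer, the error usually comes from an older datasets release paired with a newer fsspec; upgrading them (with uv, as used for the other packages below) typically resolves it:

uv pip install -U datasets fsspec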

If you hit AttributeError: 'Gemma3ModelOutputWithPast' object has no attribute 'loss', see unslothai/unsloth#2656.


Not Yet Tested

  • The saved model checkpoint has not been thoroughly tested for inference yet. This is a high-priority task.

Packages

If you are not on Colab:

uv pip install -r requirements.txt
uv pip install unsloth==2025.5.7 unsloth-zoo==2025.5.8 omegaconf

If you are on Colab, refer to this notebook:

https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3_(4B).ipynb


Command used for training with Unsloth on an L4 GPU

In train.py, uncomment the Unsloth imports, then run the command below:

python train.py --dtype=bfloat16 --finetune_method=qlora --batch_size=16 --use_unsloth=True --epochs=1 --max_step_to_train=1000 --validate_steps_freq=100
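
For context, a minimal sketch of the Unsloth QLoRA setup that these flags drive; the base model id and LoRA hyperparameters here are assumptions, not necessarily what train.py uses:

from unsloth import FastModel

# load the base model in 4-bit for QLoRA (model id is an assumption)
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

# attach LoRA adapters; r/alpha/dropout are illustrative values only
model = FastModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
)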

Command used for training without Unsloth on an L4 GPU

In train.py, comment out the Unsloth imports and make sure FastModel = None, then run the command below:

python train.py --dtype=bfloat16 --finetune_method=qlora --batch_size=6 --use_unsloth=False --epochs=1 --max_step_to_train=1000 --validate_steps_freq=100
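
And a rough sketch of the non-Unsloth QLoRA path (FastModel = None), assuming the standard transformers + peft + bitsandbytes stack; the model class/id and LoRA settings are assumptions:

import torch
from transformers import AutoModelForImageTextToText, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForImageTextToText.from_pretrained(
    "google/gemma-3-4b-it",          # model id is an assumption
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# make the quantized model trainable and add LoRA adapters (hyperparameters are illustrative)
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)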

Loss graphs

[Image: loss graphs]

TODO

  1. Test the trained model and publish the results.
  2. Train with the other methods (vanilla LoRA and vanilla QLoRA) and test those models as well.
  3. Clean up the QLoRA and LoRA configs and test all configurations (rsLoRA, DoRA, etc.).

This is my first contribution to GitHub, and I welcome any feedback or suggestions. I’ve aimed to keep the code simple, well-structured, and easily extendable for new modules, while following best practices throughout.

@ajaymin28 (Contributor, Author) commented:

Prediction Code and Outputs for Unsloth QLoRA

You can find the working inference code for Unsloth QLoRA in this branch:
https://github.com/ajaymin28/gemma3-object-detection/tree/predeval

The associated model is available here:
https://huggingface.co/ajaymin28/Gemma3_ObjeDet

Note:
The current predictions are not optimal, as the model was trained for only 100 steps.

For evaluation, I’ve provided a sample output pickle file:
outputs/infer_data_unsloth_qlora.pkl

The pickle file contains a list of dictionaries for the test set, with the following fields:

{
  "image_id": image_id,
  "width": width,
  "height": height,
  "output_text": output_text  # predicted text from the model
}
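
A minimal sketch of how the file can be consumed for evaluation (only the fields listed above are assumed to be present):

import pickle

with open("outputs/infer_data_unsloth_qlora.pkl", "rb") as f:
    records = pickle.load(f)

for rec in records:
    # each entry holds the image id, the original image size, and the raw predicted text
    print(rec["image_id"], rec["width"], rec["height"])
    print(rec["output_text"])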

@ajaymin28 (Contributor, Author) commented:

Closing this for now; I don't think anyone needs this. Let me know otherwise.

@ajaymin28 closed this on Jun 19, 2025