chrisalexiuk-nvidia/ODSC-Hackathon-Repository

ODSC West 2024 Hackathon with NVIDIA

Welcome to the central repository for the ODSC West 2024 Hackathon with NVIDIA!

❔ For more information on the hackathon itself, check out this webpage or this FAQ. ❔

Overview of the Hackathon:

Your goal in this Hackathon is to fine-tune google/gemma-2-2b using PEFT LoRA on a legal tag-classification task. You'll be using the Law-StackExchange dataset as the base dataset for this task.

You will use NeMo Curator to curate the data, then NeMo Framework (NeMo FW) to customize the model and evaluate it!

You are free to:

  • Modify training hyperparameters
  • Modify or augment the training dataset (for example, with synthetic data generation, SDG)
  • Modify the NeMo Curator curation pipeline

Scoring:

Your (or your team's) scores will be based on multi-label F1 scores, determined by comparing your generated predictions on the submission dataset against the held-out labels.
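To get a feel for the metric before submitting, here is a minimal sketch of a multi-label F1 computation in plain Python. It assumes micro-averaging over per-example tag sets; the README does not state which averaging the organizers use, so treat this as an illustration rather than the official scorer. The function name `micro_f1` and the example tags are hypothetical.

```python
def micro_f1(predicted, gold):
    """Micro-averaged F1 over per-example tag sets (assumed averaging mode)."""
    tp = fp = fn = 0
    for pred_tags, gold_tags in zip(predicted, gold):
        pred_set, gold_set = set(pred_tags), set(gold_tags)
        tp += len(pred_set & gold_set)   # tags predicted and correct
        fp += len(pred_set - gold_set)   # tags predicted but not in the labels
        fn += len(gold_set - pred_set)   # labelled tags that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tags, for illustration only.
predicted = [["contract-law", "copyright"], ["criminal-law"]]
gold = [["contract-law"], ["criminal-law", "police"]]
score = micro_f1(predicted, gold)
print(f"micro F1: {score:.3f}")
```

Because both extra tags and missed tags lower the score, it can help to track precision and recall separately on a validation split while tuning.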

In case of ties, we will use the videos you submit to gauge your understanding of the data, NeMo Curator, and NeMo Framework:

  • Understanding the data and usage of NeMo Curator: a deep understanding of the data processing pipeline and use of the most relevant data processing steps.
  • Understanding the fine-tuning and usage of NeMo Framework: an excellent grasp of fine-tuning techniques and of using various hyperparameters for optimal model accuracy and customization.

Overview of this Repository

The repository will guide you through a boilerplate example of NeMo Curator curation pipelines and NeMo FW customization, model loading, and inference.

There are a total of three Jupyter Notebooks to work through:

  1. Data Curation
     • This notebook will take you through the downloading, processing, and then curating the target dataset
  2. Downloading the Model
     • This notebook will download the model and convert it to a NeMo FW compatible format
  3. Training a LoRA Adapter
     • This notebook will go through how to fine-tune the model using PEFT LoRA, and then how to generate submission responses
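As background for the LoRA rank hyperparameter you may tune in the third notebook, the sketch below compares the trainable parameter count of one fully fine-tuned weight matrix against a LoRA adapter for the same matrix. LoRA freezes the original weight and trains two low-rank factors, so the trainable count is `rank * (d_in + d_out)` instead of `d_in * d_out`. The dimensions and rank used here are illustrative assumptions, not the actual shapes of google/gemma-2-2b or the notebook defaults, and the helper names are hypothetical.

```python
def full_finetune_params(d_in, d_out):
    """Trainable parameters when the whole d_out x d_in weight is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters for LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Illustrative sizes only; not taken from the Gemma configuration.
d_in = d_out = 2048
rank = 8

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
fraction = lora / full
print(f"full: {full:,}  lora: {lora:,}  fraction: {fraction:.4%}")
```

Raising the rank gives the adapter more capacity at the cost of more trainable parameters, which is one of the trade-offs to explore when modifying training hyperparameters.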

Deliverables

You must submit (according to this form) in a SINGLE Google Drive:

  1. Your predicted tag submission .JSONL file.
  2. Your LoRA adapters.
  3. Your notebooks (with outputs).
  4. A 3-minute video explaining your process (a code walkthrough is not required).
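For reference, a .JSONL file holds one JSON object per line. The sketch below writes and reads such a file using only the standard library. The field names `id` and `tags` are hypothetical placeholders; the actual submission schema is whatever the training notebook and submission form specify, so check those before uploading.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records; the real field names come from the hackathon notebooks.
records = [
    {"id": 0, "tags": ["contract-law"]},
    {"id": 1, "tags": ["criminal-law", "police"]},
]

path = os.path.join(tempfile.mkdtemp(), "submission.jsonl")
write_jsonl(records, path)
loaded = read_jsonl(path)
```

Reading the file back before uploading is a cheap way to confirm that every line parses and that the number of predictions matches the submission dataset.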

Conclusion

Have fun! 🎉

