Welcome to the central repository for the ODSC West 2024 Hackathon with NVIDIA!
❔ For more information on the hackathon itself, check out this webpage or this FAQ. ❔
Your goal in this Hackathon is to fine-tune google/gemma-2-2b
using PEFT LoRA on a legal tag-classification task. You'll be using the Law-StackExchange dataset as the base dataset for this task.
You will use NeMo Curator to curate the data and NeMo FW to customize the model, and then evaluate it!
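To make the task concrete, here is a minimal sketch of how a single training example might be framed as a prompt/completion pair for a causal LM. The field names (`title`, `body`, `tags`) and the prompt template are illustrative assumptions, not the required format used in the notebooks.

```python
# Illustrative only: field names and prompt template are assumptions,
# not a required format for the hackathon.
example = {
    "title": "Is a verbal agreement legally binding?",
    "body": "My landlord promised to repaint the apartment but never put it in writing...",
    "tags": ["contract-law", "landlord", "verbal-agreement"],  # multi-label target
}

# One common way to frame tag classification for a causal LM: the question text
# becomes the prompt ("input"), and the comma-separated tags become the
# completion ("output") that the model learns to generate.
record = {
    "input": f"TITLE: {example['title']}\nQUESTION: {example['body']}\nTAGS:",
    "output": ", ".join(example["tags"]),
}
print(record["input"])
print(record["output"])
```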
You are free to:
- Modify training hyperparameters
- Modify or augment the training dataset (e.g., with synthetic data generation (SDG))
- Modify the NeMo Curator curation pipeline (see the sketch after this list for one example)
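As one example of a curation-pipeline modification, the sketch below adds a word-count filter with NeMo Curator. It assumes NeMo Curator's `DocumentDataset`, `Sequential`, `ScoreFilter`, and `WordCountFilter` utilities, plus a hypothetical input file and `text` field; check the curation notebook for the exact modules, paths, and field names used in this repo.

```python
# A minimal sketch (assumed NeMo Curator APIs: DocumentDataset, Sequential,
# ScoreFilter, WordCountFilter). The input path and the "text" field name are
# hypothetical -- adapt them to the fields produced by the curation notebook.
from nemo_curator import ScoreFilter, Sequential
from nemo_curator.datasets import DocumentDataset
from nemo_curator.filters import WordCountFilter

dataset = DocumentDataset.read_json("data/law_stackexchange.jsonl")

# Example modification: drop very short or very long question bodies before training.
pipeline = Sequential([
    ScoreFilter(WordCountFilter(min_words=10, max_words=2000), text_field="text"),
])

curated = pipeline(dataset)
print(f"Documents kept after filtering: {len(curated.df)}")
```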
Your (or your team's) scores will be based on multi-label F1 scores, determined by comparing your generated predictions on the submission dataset against the held-out labels.
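For a local sanity check before submitting, you can compute a multi-label F1 score yourself. The sketch below uses scikit-learn and assumes micro-averaging; the exact averaging used for official scoring is not specified here.

```python
# Local sanity check for multi-label F1 (assumes micro-averaging; the official
# scoring may use a different average).
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import f1_score

true_tags = [["contract-law", "landlord"], ["copyright"], ["criminal-law", "evidence"]]
pred_tags = [["contract-law"], ["copyright", "fair-use"], ["criminal-law", "evidence"]]

# Binarize both label sets over the set of tags seen in the ground truth.
mlb = MultiLabelBinarizer()
y_true = mlb.fit_transform(true_tags)
y_pred = mlb.transform(pred_tags)  # predicted tags not in the ground truth are ignored

print(f1_score(y_true, y_pred, average="micro"))
```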
In case of ties, we will use the videos you submitted to gauge your understanding of the data, NeMo Curator, and the NeMo Framework:
- Understanding of the data and usage of NeMo Curator
  - Deep understanding of the data processing pipeline and use of the most relevant data processing steps.
- Understanding of fine-tuning and usage of the NeMo Framework
  - Excellent grasp of fine-tuning techniques and use of various hyperparameters for optimal model accuracy and customization.
The repository will guide you through a boilerplate example of NeMo Curator curation pipelines and NeMo FW customization, model loading, and inference.
There are a total of three Jupyter Notebooks to work through:
- The first notebook will take you through downloading, processing, and then curating the target dataset
- The second notebook will download the model and convert it to a NeMo FW compatible format
- The third notebook will go through how to fine-tune the model using PEFT LoRA, and then how to generate submission responses (a rough post-processing sketch follows below)
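As a rough illustration of that last step, here is a sketch of turning raw model generations into a predictions file. The output field names (`question_id`, `predicted_tags`) and the comma-separated tag format are assumptions, so follow the exact submission format described in the notebook and the form.

```python
import json

# Hypothetical raw generations: one comma-separated tag string per test question.
# In practice these would come from the fine-tuned model's inference step.
generations = {
    "q1": "contract-law, landlord, verbal-agreement",
    "q2": "copyright, fair-use",
}

with open("predictions.jsonl", "w") as f:
    for question_id, text in generations.items():
        # De-duplicate while preserving order, and drop empty fragments.
        tags = list(dict.fromkeys(t.strip() for t in text.split(",") if t.strip()))
        # Field names here are assumptions -- match the format required by the submission form.
        f.write(json.dumps({"question_id": question_id, "predicted_tags": tags}) + "\n")
```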
You must submit the following (according to this form) in a SINGLE Google Drive:
- Your predicted tag submission .JSONL file
- Your LoRA adapters
- Your notebooks (with outputs)
- A 3-minute video explaining your process (a code walkthrough is not required).
Have fun! 🎉