pugazhendhit/hugging-face-tutorials

 
 

Hugging Face Tutorials

Push model

Follow the steps in the guide: https://huggingface.co/docs/transformers/training

  1. Login:
  • huggingface-cli login

If the output says you are "Authenticated through git-credential store but this isn't the helper defined on your machine", follow the instructions it prints to fix the credential helper.

Tip: Get your token from https://huggingface.co/settings/tokens. It must be a WRITE token.

  2. Run python hugging-face/hf_fine_tune_hello_world.py
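
In non-interactive environments (a CI job or a Codespaces prebuild, for example) the login step can also be scripted. The following is a minimal sketch, assuming the huggingface_hub package is installed and that the token is supplied in an environment variable named HF_TOKEN. The helpers resolve_token and login_from_env are hypothetical names used for illustration; they are not part of this repository.

```python
import os


def resolve_token(env=None):
    """Return the token stored in HF_TOKEN, or raise if it is missing.

    Reading the token from HF_TOKEN is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set HF_TOKEN to a token with WRITE access.")
    return token


def login_from_env():
    """Log in to the Hugging Face Hub without the interactive prompt."""
    # Imported lazily so resolve_token works without the package installed.
    from huggingface_hub import login

    login(token=resolve_token())
```

Calling login_from_env() should leave the machine in the same logged-in state as the interactive huggingface-cli login, provided the token is valid.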

Create data

Upload the data manually through the web UI, or programmatically through the API.

To load the dataset, do the following:

from datasets import load_dataset
remote_dataset = load_dataset("noahgift/social-power-nba")
remote_dataset
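
To explore what was loaded, it can help to list each split with its row count and column names. This is a minimal sketch, assuming load_dataset returns a DatasetDict (a mapping from split names to datasets that expose num_rows and column_names). The helpers summarize_splits and load_and_summarize are hypothetical names introduced here for illustration.

```python
def summarize_splits(dataset_dict):
    """Map each split name to (row count, column names)."""
    return {
        split: (ds.num_rows, list(ds.column_names))
        for split, ds in dataset_dict.items()
    }


def load_and_summarize(repo_id="noahgift/social-power-nba"):
    """Download the dataset from the Hub and summarize its splits."""
    # Lazy import: requires the datasets package and network access.
    from datasets import load_dataset

    return summarize_splits(load_dataset(repo_id))
```

Calling load_and_summarize() downloads the dataset on first use and reads it from the local cache afterwards.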

Recommended Tutorial Followup

  1. Find a small, simple dataset: from Kaggle, your own, or a sample dataset.
  2. Go to the Hugging Face website and upload it.
  3. Download and explore the dataset.
  4. Enhance the dataset by filling out its metadata.
  5. Build a demo for it.

Generally useful skills

Use the huggingface-cli

(venv) @noahgift ➜ /workspaces/hugging-face-tutorials (GPU) $ huggingface-cli scan-cache
REPO ID                      REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS LOCAL PATH                                                                   
---------------------------- --------- ------------ -------- ------------- ------------- ---- ---------------------------------------------------------------------------- 
bert-base-cased              model           436.4M        5 2 days ago    2 days ago    main /home/codespace/.cache/huggingface/hub/models--bert-base-cased               
bert-base-uncased            model           441.2M        5 2 hours ago   2 hours ago   main /home/codespace/.cache/huggingface/hub/models--bert-base-uncased             
google/pegasus-cnn_dailymail model             1.9M        4 1 hour ago    1 hour ago    main /home/codespace/.cache/huggingface/hub/models--google--pegasus-cnn_dailymail 
gpt2                         model           551.0M        5 2 days ago    2 days ago    main /home/codespace/.cache/huggingface/hub/models--gpt2                          
gpt2-xl                      model             6.4G        5 1 hour ago    1 hour ago    main /home/codespace/.cache/huggingface/hub/models--gpt2-xl  
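
The same cache information is available from Python through huggingface_hub.scan_cache_dir. The sketch below assumes the huggingface_hub package is installed and a cache already exists on disk. The helpers format_size, cache_report, and scan_local_cache are hypothetical, and the decimal (1000-based) units in format_size are an assumption chosen to resemble the CLI output above.

```python
def format_size(num_bytes):
    """Render a byte count with decimal units, e.g. 436400000 -> '436.4M'."""
    size = float(num_bytes)
    for unit in ("", "K", "M", "G", "T"):
        if size < 1000:
            return f"{size:.1f}{unit}"
        size /= 1000
    return f"{size:.1f}P"


def cache_report(repos):
    """Return (repo_id, repo_type, size) rows, largest repo first."""
    ordered = sorted(repos, key=lambda r: r.size_on_disk, reverse=True)
    return [(r.repo_id, r.repo_type, format_size(r.size_on_disk)) for r in ordered]


def scan_local_cache():
    """Scan the local Hugging Face cache and report its repos by size."""
    # Lazy import: requires the huggingface_hub package.
    from huggingface_hub import scan_cache_dir

    return cache_report(scan_cache_dir().repos)
```

Sorting by size puts the repos that are worth deleting first at the top of the report.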

Create model

Recommended Tutorial Followup

  1. Upload the model to the Hugging Face website.
  2. Fill out the model card.
  3. Use the model.

Fine-Tuning Hugging Face Models Tutorial

Why transfer learning?

  • One batch in PyTorch
  • Using sacrebleu: BLEU is a precision-based metric. Per Wikipedia, "Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances", while "recall (also known as sensitivity) is the fraction of relevant instances that were retrieved".
  • The ROUGE score was developed specifically for applications like summarization, where high recall matters more than precision alone.
# Import load_metric before using it.
# Note: newer releases of datasets deprecate load_metric in favor of the evaluate library.
from datasets import load_metric
rouge_metric = load_metric("rouge")
bleu_metric = load_metric("sacrebleu")
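
The precision and recall distinction above can be made concrete with a toy unigram overlap. This is a simplified sketch for intuition only: it is not how sacrebleu or ROUGE compute their scores (both involve more, such as higher-order n-grams), and unigram_overlap is a hypothetical helper.

```python
from collections import Counter


def unigram_overlap(candidate, reference):
    """Return (precision, recall) over clipped unigram matches."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    matches = sum((cand & ref).values())
    precision = matches / sum(cand.values()) if cand else 0.0
    recall = matches / sum(ref.values()) if ref else 0.0
    return precision, recall


precision, recall = unigram_overlap("the cat sat", "the cat sat on the mat")
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=1.00 recall=0.50
```

Every word of the short candidate appears in the reference, so precision is perfect, but the candidate covers only half of the reference, so recall is low. A summary that drops content is penalized on recall, which is why a recall-oriented metric suits summarization.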

Push to Hub

Use huggingface-cli login and pass in your token.

Create spaces

Verify GPU works

The following examples test the GPU:

  • Run the PyTorch training test: python utils/quickstart_pytorch.py
  • Run the PyTorch CUDA test: python utils/verify_cuda_pytorch.py
  • Run the TensorFlow training test: python utils/quickstart_tf2.py
  • Run the NVIDIA monitoring test: nvidia-smi -l 1 (it should show a GPU)
  • Run the Whisper transcription test: ./utils/transcribe-whisper.sh, and verify that the GPU is working with nvidia-smi -l 1
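
For a quick check from Python, a CUDA probe along the following lines can be used. This is a minimal sketch and not the contents of utils/verify_cuda_pytorch.py. The helper cuda_summary is hypothetical, and it reports a missing PyTorch install instead of failing.

```python
def cuda_summary():
    """Report whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    return {
        "torch_installed": True,
        "cuda_available": available,
        # One entry per visible GPU, e.g. the card model name.
        "devices": [torch.cuda.get_device_name(i) for i in range(count)],
    }


print(cuda_summary())
```

If cuda_available is False on a GPU machine, comparing against the nvidia-smi output helps tell a driver problem apart from a CPU-only PyTorch build.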

Additionally, this workspace is set up to fine-tune Hugging Face models:

python hf_fine_tune_hello_world.py

Used in Following Projects

Used as the base and customized in the following Duke MLOps and Applied Data Engineering Coursera Labs:

References
