
Add from_huggingface method to KerasNLP models #1294

Open
shivance opened this issue Nov 6, 2023 · 5 comments
Labels
infra type:feature New feature or request

Comments

@shivance
Collaborator

shivance commented Nov 6, 2023

Add support for loading huggingface model checkpoints in KerasNLP backbones

Is your feature request related to a problem? Please describe.
As of now, KerasNLP backbones only load pretrained weights from a standard set of checkpoints. However, there are many fine-tuned checkpoints on the Hugging Face Hub that often solve problems out of the box. If we add support for HF checkpoints, we can deliver on Keras's multi-backend promise, together with KerasNLP's modular design, for most of the NLP community.

Describe the solution you'd like
Implement a from_huggingface method that takes a checkpoint name from the Hugging Face Hub.
All it will require is mapping layer names and implementing the checkpoint-conversion scripts as methods.
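The layer-name mapping step above could be sketched as a rename pass over a checkpoint's state dict. The mapping entries and variable paths below are illustrative assumptions, not KerasNLP's actual internal names:

```python
import re

# Hypothetical mapping from Hugging Face BERT weight names to KerasNLP
# backbone variable paths. A real converter would define one such table
# per supported architecture.
HF_TO_KERAS_NLP = {
    r"^bert\.embeddings\.word_embeddings\.weight$":
        r"token_embedding/embeddings",
    r"^bert\.encoder\.layer\.(\d+)\.attention\.self\.query\.weight$":
        r"transformer_layer_\1/self_attention/query/kernel",
}

def convert_weight_names(hf_state_dict):
    """Rename HF checkpoint keys to KerasNLP variable paths (sketch)."""
    converted = {}
    for hf_name, tensor in hf_state_dict.items():
        for pattern, replacement in HF_TO_KERAS_NLP.items():
            if re.match(pattern, hf_name):
                converted[re.sub(pattern, replacement, hf_name)] = tensor
                break
    return converted

weights = {"bert.encoder.layer.3.attention.self.query.weight": "tensor"}
print(convert_weight_names(weights))
# {'transformer_layer_3/self_attention/query/kernel': 'tensor'}
```

In practice the tensors themselves may also need reshaping or transposing, so name mapping is only half of the conversion script.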

Alternative solution
Instead of implementing a separate method, we could modify the from_preset method to accept Hugging Face checkpoints.

I'm up for contributing this feature.

cc: @abheesht17

@shivance shivance added type:feature New feature or request infra labels Nov 6, 2023
@shivance shivance changed the title Add support for loading huggingface model checkpoints in KerasNLP backbones Add from_huggingface method to KerasNLP models Nov 6, 2023
@mattdangerw
Member

I will check with other folks here, but I think this is something we probably will not want to pursue. We could mirror all our own presets on huggingface, or make it easier to bulk convert hf checkpoints offline, but I do not think converting huggingface checkpoints "live" will be a good solution.

  • Performance will be bad, as you need to load the model twice and assign weights, often from torch -> jax or tensorflow. So you are basically 2x-ing memory usage for a bit. For larger models this is important enough to be a deal breaker, and would make an offline solution more appealing.
  • Keeping this working reliably will be tricky. Huggingface has its own release schedule and versioning, and will occasionally update their model configs, etc. It would be very tricky to guarantee we could convert checkpoints reliably across a large swath of huggingface versions indefinitely. And a feature that works on, idk, <10% of actual huggingface model IDs seems like potentially bad UX to put in the library.

I think right now it makes sense to continue to integrate with Kaggle #1292, which will help us define an external-friendly format for our presets. Once we have that, we could consider exposing a set of tools to automatically convert huggingface models to our format on a best-effort basis. This would never include all models or all huggingface config options (I just don't see that being feasible), but it could be easy to use and sidestep the performance issues mentioned above.

@Wauplin
Contributor

Wauplin commented Feb 27, 2024

Hey there 🤗

I think there is a confusion in this issue between 2 different topics:

  1. Allowing keras_nlp to load transformers-based checkpoints: this seems to be what's described by @shivance, right?
  2. Loading models from the Hugging Face Hub.

The HF Hub is a platform to host and share all kinds of models, and not only transformers ones. While topic 1. might be tricky for reasons explained by @mattdangerw in #1294 (comment), I do think hosting KerasNLP models on the HF Hub would make sense. Now that both KaggleHub and GS presets are supported, adding a new preset provider doesn't seem too complex.
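Adding a new preset provider could amount to one more scheme prefix in the preset-resolution logic. This is a minimal sketch, assuming a dispatch-on-prefix design; the provider names and function are illustrative, not KerasNLP's internal API:

```python
# Hypothetical registry of preset providers, keyed by URI scheme.
# "kaggle://", "gs://", and "hf://" mirror the schemes discussed in
# this thread; the provider names are placeholders.
PROVIDERS = {
    "kaggle://": "kagglehub",
    "gs://": "gcs",
    "hf://": "huggingface_hub",
}

def resolve_preset(preset):
    """Split a preset string into a (provider, handle) pair."""
    for prefix, provider in PROVIDERS.items():
        if preset.startswith(prefix):
            return provider, preset[len(prefix):]
    # No scheme: treat as a built-in preset name.
    return "builtin", preset

print(resolve_preset("hf://Wauplin/bert_base_en_uncased_retrained"))
# ('huggingface_hub', 'Wauplin/bert_base_en_uncased_retrained')
```

Under this shape, supporting a new hub only requires registering a prefix and a download/upload backend for it.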

I have actually worked on a fork to showcase what the implementation would look like: master...Wauplin:keras-nlp:huggingface-hub-integration. The integration requires the huggingface_hub library. Authentication can be configured with the HF_TOKEN environment variable (only for private models or for uploads, similarly to KaggleHub).

Here is a Colab notebook showcasing it.

import keras_nlp
from keras_nlp.models import BertClassifier
from keras_nlp.utils.preset_utils import save_to_preset

classifier = BertClassifier.from_preset("bert_base_en_uncased")
# ... train / retrain / fine-tune the classifier here

# Save to Hugging Face Hub
save_to_preset(classifier, "hf://Wauplin/bert_base_en_uncased_retrained")

# Reload from Hugging Face Hub
classifier_reloaded = BertClassifier.from_preset("hf://Wauplin/bert_base_en_uncased_retrained")

Here is how it looks once uploaded to the Hub: https://huggingface.co/Wauplin/bert_base_en_uncased_retrained/tree/main.
If we go this way, I think we should also upload a default model card with a keras-nlp tag to make all KerasNLP models discoverable on the Hub.
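A default model card is just a README.md with YAML front matter that the Hub indexes for discovery. A minimal sketch, assuming the standard Hub metadata keys (library_name, tags); the helper and its wording are illustrative:

```python
def default_model_card(repo_id):
    """Build a minimal README.md body with Hub YAML front matter so the
    model is tagged and filterable as a keras-nlp model (sketch)."""
    return (
        "---\n"
        "library_name: keras-nlp\n"
        "tags:\n"
        "- keras-nlp\n"
        "---\n"
        "\n"
        f"# {repo_id}\n"
        "\n"
        "This model was saved as a KerasNLP preset.\n"
    )

card = default_model_card("Wauplin/bert_base_en_uncased_retrained")
print(card.splitlines()[1])
# library_name: keras-nlp
```

An upload flow could write this README.md alongside the preset files whenever no card exists in the target repo.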

WDYT? I am willing to help create a PR if this is of interest to the Keras team. It is essentially what's already in the fork, plus some documentation and testing. On the Hugging Face side, we could make KerasNLP an official library (e.g. searchable, with code snippets, download counts, etc.).

Disclaimer: I work at Hugging Face as a maintainer of the huggingface_hub library.

@Wauplin
Contributor

Wauplin commented Mar 13, 2024

Following my comment above, I've opened #1510 to continue the discussion :)

@mattdangerw
Member

Thanks! Overall totally agree with your comment.

Let's add hf:// flows for saving and uploading, so people can easily host/share weights on the Hugging Face model hub. We are hoping to expose a public form of save_to_preset this week; @SamanehSaadat is working on this. So we might wait to merge that PR until we have the whole picture of download/upload sorted. But let's get it in! (And thanks very much!)

Re: conversion, I still think a solid set of tooling for converting bi-directionally between the transformers format and the KerasNLP format is important, at least for popular architectures (gemma, llama, mistral, falcon, bloom...). But I see more design questions to nail down there. Let's start with the hub integration, and keep figuring out what we want for conversion tooling.
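One concrete detail any such bi-directional tooling has to handle is layout differences between the two formats: torch.nn.Linear stores its weight as (out_features, in_features), while a Keras Dense kernel is (in_features, out_features). A sketch of that transpose, shown on plain nested lists to stay self-contained; the helper name is illustrative:

```python
def torch_linear_to_keras_dense(weight, bias):
    """Convert a torch.nn.Linear weight/bias pair to Keras Dense layout.

    torch stores the weight as (out_features, in_features); the Keras
    Dense kernel is (in_features, out_features), so transpose it. The
    bias shape (out_features,) is the same in both frameworks.
    """
    kernel = [list(col) for col in zip(*weight)]
    return kernel, bias

w = [[1, 2, 3], [4, 5, 6]]  # (out=2, in=3), torch layout
kernel, bias = torch_linear_to_keras_dense(w, [0, 0])
print(kernel)
# [[1, 4], [2, 5], [3, 6]]  -> (in=3, out=2), keras layout
```

Attention layers add further wrinkles (fused vs. split QKV projections, head reshaping), which is where most per-architecture work in conversion scripts tends to live.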

Note that for Gemma, @nkovela1 did give us a tool for HF export -> https://github.com/keras-team/keras-nlp/blob/master/tools/gemma/export_gemma_to_hf.py, so a flow of fine-tuning with Keras and exporting to vLLM, TGI, etc. is possible. But I suspect we might want to move tooling like that into the library proper at some point.

@mattdangerw
Member

mattdangerw commented Mar 14, 2024

Draft for public saving API -> #1512, though expect some changes. Comments welcome!
