
Add from_huggingface method to KerasNLP models #1294

Open
shivance opened this issue Nov 6, 2023 · 5 comments
Labels
infra type:feature New feature or request

Comments

@shivance
Collaborator

shivance commented Nov 6, 2023

Add support for loading huggingface model checkpoints in KerasNLP backbones

Is your feature request related to a problem? Please describe.
As of now, KerasNLP backbones only load pretrained weights from a standard set of checkpoints. However, there are many fine-tuned checkpoints on the Hugging Face Hub that often solve problems out of the box. If we add support for HF checkpoints, we can deliver on Keras's multi-backend promise, together with KerasNLP's modular design, for most of the NLP community.

Describe the solution you'd like
Implement a from_huggingface method that takes a checkpoint name from the Hugging Face Hub.
All it will require is mapping layer names and implementing the checkpoint-conversion scripts as methods.
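The layer-name mapping step above could be sketched as a rename pass over a checkpoint's state dict. The mapping entries and variable paths below are illustrative assumptions, not KerasNLP's actual internal names:

```python
import re

# Hypothetical mapping from Hugging Face BERT weight names to KerasNLP
# backbone variable paths. A real converter would define one such table
# per supported architecture.
HF_TO_KERAS_NLP = {
    r"^bert\.embeddings\.word_embeddings\.weight$":
        r"token_embedding/embeddings",
    r"^bert\.encoder\.layer\.(\d+)\.attention\.self\.query\.weight$":
        r"transformer_layer_\1/self_attention/query/kernel",
}

def convert_weight_names(hf_state_dict):
    """Rename HF checkpoint keys to KerasNLP variable paths (sketch)."""
    converted = {}
    for hf_name, tensor in hf_state_dict.items():
        for pattern, replacement in HF_TO_KERAS_NLP.items():
            if re.match(pattern, hf_name):
                converted[re.sub(pattern, replacement, hf_name)] = tensor
                break
    return converted

weights = {"bert.encoder.layer.3.attention.self.query.weight": "tensor"}
print(convert_weight_names(weights))
# {'transformer_layer_3/self_attention/query/kernel': 'tensor'}
```

In practice the tensors themselves may also need reshaping or transposing, so name mapping is only half of the conversion script.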

Alternative solution
Instead of implementing a separate method, we could modify the from_preset method to accept Hugging Face checkpoints.

I'm up for contributing this feature.

cc: @abheesht17

@shivance shivance added type:feature New feature or request infra labels Nov 6, 2023
@shivance shivance changed the title Add support for loading huggingface model checkpoints in KerasNLP backbones Add from_huggingface method to KerasNLP models Nov 6, 2023
@mattdangerw
Member

I will check with other folks here, but I think this is something we probably will not want to pursue. We could mirror all our own presets on huggingface, or make it easier to bulk convert hf checkpoints offline, but I do not think converting huggingface checkpoints "live" will be a good solution.

  • Performance will be bad, as you need to load the model twice and assign weights, often from torch -> jax or tensorflow. So you are basically 2x-ing memory usage for a bit. For larger models this is important enough to be a deal breaker, and would make an offline solution more appealing.
  • Keeping this working reliably will be tricky. Huggingface has its own release schedule and versioning, and will occasionally update their model configs, etc. It would be very tricky to guarantee we could convert checkpoints reliably across a large swath of huggingface versions indefinitely. And a feature that works on, idk, <10% of actual huggingface model IDs seems like potentially bad UX to put in the library.

I think right now it makes sense to continue to integrate with Kaggle #1292, which will help us define an external-friendly format for our presets. Once we have that, we could consider exposing a set of tools to automatically convert huggingface models to our format on a best-effort basis. This would never include all models or all huggingface config options (I just don't see that being feasible), but it could be easy to use and sidestep the performance issues mentioned above.

@Wauplin
Contributor

Wauplin commented Feb 27, 2024

Hey there 🤗

I think there is a confusion in this issue between 2 different topics:

  1. Allowing keras_nlp to load transformers-based checkpoints: this seems to be what's described by @shivance, right?
  2. Loading models from the Hugging Face Hub.

The HF Hub is a platform to host and share all kinds of models, and not only transformers ones. While topic 1. might be tricky for reasons explained by @mattdangerw in #1294 (comment), I do think hosting KerasNLP models on the HF Hub would make sense. Now that both KaggleHub and GS presets are supported, adding a new preset provider doesn't seem too complex.
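Adding a new preset provider could amount to one more scheme prefix in the preset-resolution logic. This is a minimal sketch, assuming a dispatch-on-prefix design; the provider names and function are illustrative, not KerasNLP's internal API:

```python
# Hypothetical registry of preset providers, keyed by URI scheme.
# "kaggle://", "gs://", and "hf://" mirror the schemes discussed in
# this thread; the provider names are placeholders.
PROVIDERS = {
    "kaggle://": "kagglehub",
    "gs://": "gcs",
    "hf://": "huggingface_hub",
}

def resolve_preset(preset):
    """Split a preset string into a (provider, handle) pair."""
    for prefix, provider in PROVIDERS.items():
        if preset.startswith(prefix):
            return provider, preset[len(prefix):]
    # No scheme: treat as a built-in preset name.
    return "builtin", preset

print(resolve_preset("hf://Wauplin/bert_base_en_uncased_retrained"))
# ('huggingface_hub', 'Wauplin/bert_base_en_uncased_retrained')
```

Under this shape, supporting a new hub only requires registering a prefix and a download/upload backend for it.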

I have actually worked on a fork to showcase what the implementation would look like: master...Wauplin:keras-nlp:huggingface-hub-integration. The integration requires the huggingface_hub library. Authentication can be configured with the HF_TOKEN environment variable (only for private models or for uploads, similarly to KaggleHub).

Here is a Colab notebook showcasing it.

import keras_nlp
from keras_nlp.models import BertClassifier
from keras_nlp.utils.preset_utils import save_to_preset

classifier = BertClassifier.from_preset("bert_base_en_uncased")
# ... train / retrain / fine-tune the classifier here

# Save to Hugging Face Hub
save_to_preset(classifier, "hf://Wauplin/bert_base_en_uncased_retrained")

# Reload from Hugging Face Hub
classifier_reloaded = BertClassifier.from_preset("hf://Wauplin/bert_base_en_uncased_retrained")

Here is how it looks once uploaded to the Hub: https://huggingface.co/Wauplin/bert_base_en_uncased_retrained/tree/main.
If we go this way, I think we should also upload a default model card with a keras-nlp tag to make all KerasNLP models discoverable on the Hub.
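A default model card is just a README.md with YAML front matter that the Hub indexes for discovery. A minimal sketch, assuming the standard Hub metadata keys (library_name, tags); the helper and its wording are illustrative:

```python
def default_model_card(repo_id):
    """Build a minimal README.md body with Hub YAML front matter so the
    model is tagged and filterable as a keras-nlp model (sketch)."""
    return (
        "---\n"
        "library_name: keras-nlp\n"
        "tags:\n"
        "- keras-nlp\n"
        "---\n"
        "\n"
        f"# {repo_id}\n"
        "\n"
        "This model was saved as a KerasNLP preset.\n"
    )

card = default_model_card("Wauplin/bert_base_en_uncased_retrained")
print(card.splitlines()[1])
# library_name: keras-nlp
```

An upload flow could write this README.md alongside the preset files whenever no card exists in the target repo.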

WDYT? I am willing to help create a PR if this is of interest to the Keras team. It is essentially what's already in the fork, plus some documentation and testing. On the Hugging Face side, we could make KerasNLP an official library (e.g. searchable, with code snippets, download counts, etc.).

Disclaimer: I work at Hugging Face as a maintainer of the huggingface_hub library.

@Wauplin
Contributor

Wauplin commented Mar 13, 2024

Following my comment above, I've opened #1510 to continue the discussion :)

@mattdangerw
Member

Thanks! Overall totally agree with your comment.

Let's add hf:// flows for saving and uploading, so people can easily host/share weights on the Hugging Face model hub. We are hoping to expose a public form of save_to_preset this week; @SamanehSaadat is working on this. So we might wait to merge that PR until we have the whole picture of download/upload sorted. But let's get it in! (And thanks very much!)

Re: conversion, I still think a solid set of tooling for converting bi-directionally between the transformers format and the KerasNLP format is important, at least for popular architectures (gemma, llama, mistral, falcon, bloom...). But I see more design questions to nail down there. Let's start with the hub integration, and keep figuring out what we want for conversion tooling.
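One concrete detail any such bi-directional tooling has to handle is layout differences between the two formats: torch.nn.Linear stores its weight as (out_features, in_features), while a Keras Dense kernel is (in_features, out_features). A sketch of that transpose, shown on plain nested lists to stay self-contained; the helper name is illustrative:

```python
def torch_linear_to_keras_dense(weight, bias):
    """Convert a torch.nn.Linear weight/bias pair to Keras Dense layout.

    torch stores the weight as (out_features, in_features); the Keras
    Dense kernel is (in_features, out_features), so transpose it. The
    bias shape (out_features,) is the same in both frameworks.
    """
    kernel = [list(col) for col in zip(*weight)]
    return kernel, bias

w = [[1, 2, 3], [4, 5, 6]]  # (out=2, in=3), torch layout
kernel, bias = torch_linear_to_keras_dense(w, [0, 0])
print(kernel)
# [[1, 4], [2, 5], [3, 6]]  -> (in=3, out=2), keras layout
```

Attention layers add further wrinkles (fused vs. split QKV projections, head reshaping), which is where most per-architecture work in conversion scripts tends to live.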

Note that for Gemma, @nkovela1 did give us a tool for HF export -> https://github.com/keras-team/keras-nlp/blob/master/tools/gemma/export_gemma_to_hf.py, so a flow of fine-tuning with Keras and exporting to vLLM, TGI, etc. is possible. But I suspect we might want to move tooling like that into the library proper at some point.

@mattdangerw
Member

mattdangerw commented Mar 14, 2024

Draft for public saving API -> #1512, though expect some changes. Comments welcome!
