-
Notifications
You must be signed in to change notification settings - Fork 266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Inference] compatibility with third-party Inference providers #1077
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
- [@huggingface/tasks](packages/tasks/README.md): The definition files and source-of-truth for the Hub's main primitives like pipeline tasks, model libraries, etc. | ||
- [@huggingface/jinja](packages/jinja/README.md): A minimalistic JS implementation of the Jinja templating engine, to be used for ML chat templates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @xenova fyi
LGTM. When I have time I'll look at the tests. |
@coyotte508 i think i can use VCR tapes, looks like a cool feature |
cc @Aschen ⬆️ :) |
/// TODO we wil proxy the request server-side (using our own keys) and handle billing for it on the user's HF account. | ||
throw new Error("Inference proxying is not implemented yet"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bf47b3b
to
238567a
Compare
daec1eb
to
3939438
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
switch (provider) { | ||
case "replicate": | ||
return REPLICATE_MODEL_IDS[model]; | ||
case "sambanova": | ||
return SAMBANOVA_MODEL_IDS[model]; | ||
case "together": | ||
return TOGETHER_MODEL_IDS[model]?.id; | ||
case "fal-ai": | ||
return FAL_AI_MODEL_IDS[model]; | ||
default: | ||
return model; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should maybe at least return REPLICATE_MODEL_IDS[model] ?? model;
to fallback to the provided id in case the user directly provided a replicate model id and not a HF model ID. Same with the others.
Note that we could maybe maintain a mapping in the backend and in case of errors try to load it -only once (like we do for default models associated to tasks). Just a thought for the future, but it would enable new mappings without updating the lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should maybe at least return REPLICATE_MODEL_IDS[model] ?? model; to fallback to the provided id in case the user directly provided a replicate model id and not a HF model ID. Same with the others.
cc @julien-c - we discussed it and decided to stick to HF model IDs for now
Note that we could maybe maintain a mapping in the backend and in case of errors try to load it -only once (like we do for default models associated to tasks). Just a thought for the future, but it would enable new mappings without updating the lib.
Yes, we definitely want a way for 3rd party providers to expose the mapping HF model ID
-> Provider ID
that does not require hardcoding / updating the huggingface.js
lib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we discussed it and decided to stick to HF model IDs for now
Yes. simpler to always be "Hub-centric"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes and i now remember, the ?? model
in my mind was to work out of the box for models that have the same id on the inference provider as the HF id, NOT to work if you pass the provider's (different) id
i think this is ready to merge 🎉 🌮 |
Hey @julien-c @coyotte508 glad to see that it's still here and it's useful ;-) Continue your great job here, I like it 😄 |
export const FAL_AI_MODEL_IDS: Record<ModelId, FalAiId> = { | ||
/** text-to-image */ | ||
"black-forest-labs/FLUX.1-schnell": "fal-ai/flux/schnell", | ||
"black-forest-labs/FLUX.1-dev": "fal-ai/flux/dev", | ||
|
||
/** automatic-speech-recognition */ | ||
"openai/whisper-large-v3": "fal-ai/whisper", | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idk if it has been discussed somewhere but I feel that at some point these mappings should be maintained as an API lazy-loaded by the inference client (same as what we are doing for the tasks).
This way, we would be able to retrieve it in huggingface_hub
+ could be quickly updated without client releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scroll up: #1077 (comment) :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great minds think alike
- Update the `makeRequestOptions` function to make the logic (hopefully) more readable; especially since #1077 - Stop support of model URLs - Update tapes
TL;DR
Allow users to request 3rd party inference providers (Sambanova, Replicate, Together, Fal) with
@huggingface/inference
for a curated set of models on the HF HubFor now, Requesting a 3rd party inference provider requires users to pass an api key from this provider as a parameter to the inference function.