
[InferenceClient] Add third-party providers support #2757


Merged: 30 commits merged into main from inference-providers-compatibility on Jan 23, 2025

Conversation

hanouticelina
Contributor

@hanouticelina hanouticelina commented Jan 17, 2025

Following huggingface.js#1077 and moon-landing#12072, this PR adds third-party inference provider support to huggingface_hub.InferenceClient.

This v0 adds third-party inference provider support in a modular way: each provider's code lives in its own self-contained file under src/huggingface/inference/_providers/, making it easier for us to add or update a provider. In a future PR, we should probably also isolate the Inference API-specific code and keep InferenceClient as generic as possible.
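A rough usage sketch of what this enables (the `provider` / `api_key` argument names and the exact call signature are assumptions based on this description, not a definitive reference):

from huggingface_hub import InferenceClient

# Route the call through a third-party provider instead of the HF Inference API.
client = InferenceClient(provider="replicate", api_key="<provider-or-hf-token>")

# The Hub model id is mapped internally to the provider's own model id.
image = client.text_to_image(
    "An astronaut riding a horse on the moon",
    model="black-forest-labs/FLUX.1-schnell",
)
image.save("astronaut.png")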

Note: For fal.ai, we currently call the blocking API endpoint, which has a 60s timeout limit; the same applies to Replicate. This limits the models we can use with these providers. In a future PR, we could add polling support for the non-blocking API endpoints, enabling longer-running models.

TODO:

  • Add proxy calls to 3rd party providers. (in a following PR)
  • Add (VCR) tests.
  • Update Inference documentation. (in a following PR)

Contributor

@SBrandeis SBrandeis left a comment


Let's goooo

payload["json"].update(
{
"model": model,
"response_format": "base64",
Contributor


👀 👀

Member

@julien-c julien-c left a comment


neat 🔥

Contributor

@Wauplin Wauplin left a comment


Super nice seeing this taking shape! 🔥 I've started to review the PR and have 2 main comments:

  1. I think InferenceAPI should be considered as a provider to factorize things as much as possible (and avoid the `if provider is not None: ...` special-casing).
  2. Using classes and inheritance could be avoided (since we don't use any inheritance benefits). BaseProvider should be more of an interface than a class.

You'll find my comments below. Hope I did not go too far into overthinking 🙈 Better to think this through thoroughly before publishing :)

Comment on lines 5 to 10
@dataclass
class BaseProvider:
"""Base class defining the interface for inference providers."""

BASE_URL: str = field(init=False)
MODEL_IDS_MAPPING: Dict[str, str] = field(default_factory=dict, init=False)
Contributor


I feel that in the current structure, BaseProvider being a class/dataclass is clunky and doesn't add much value compared to a base module. The fact that it's a class is not really used (it could be a singleton since it's always instantiated with PROVIDERS[name]()), nor is the fact that it's a dataclass, since __repr__ will likely be unusable (MODEL_IDS_MAPPING is too large) and the other dataclass benefits are not used (no comparisons, etc.). On the contrary, it brings extra complexity into the code, like MODEL_IDS_MAPPING: Dict[str, Dict[str, str]] = field(default_factory=lambda: {...}).

2 solutions I see here:

Contributor


  1. Either use a Protocol:
# __init__.py
from typing import Any, Dict, Optional, Protocol, Union
from . import replicate, together, sambanova, fal_ai

class Provider(Protocol):
    """Protocol defining the interface for inference providers."""

    BASE_URL: str
    MODEL_IDS_MAPPING: Dict[str, Dict[str, str]]

    def build_url(self, task: Optional[str] = None, model: Optional[str] = None) -> str: ...
    def map_model(self, task: Optional[str] = None, model: Optional[str] = None) -> str: ...
    def prepare_headers(self, headers: Dict, task: Optional[str] = None, model: Optional[str] = None) -> Dict: ...
    def prepare_payload(self, input: str, parameters: Dict[str, Any], task: Optional[str] = None, model: Optional[str] = None) -> Dict[str, Any]: ...
    def get_response(self, response: Union[bytes, Dict], task: Optional[str] = None) -> Any: ...

PROVIDERS: Dict[str, Provider] = {
    "fal-ai": fal_ai,
    "together": togerther,
    "sambanova": sambanova,
    "replicate": replicate,
}

...
# replicate.py
BASE_URL = "https://api.replicate.com"
MODEL_IDS_MAPPING: Dict[str, Dict[str, str]] = {
    "text-to-image": {
        "black-forest-labs/FLUX.1-schnell": "black-forest-labs/flux-schnell",
        "ByteDance/SDXL-Lightning": "bytedance/sdxl-lightning-4step:5599ed30703defd1d160a25a63321b4dec97101d98b4674bcc56e41f62f35637",
    },
}

# no need for "self"
def build_url(task: Optional[str] = None, model: Optional[str] = None) -> str:
    if model is not None and ":" in model:
        return f"{self.BASE_URL}/v1/predictions"
    return f"{self.BASE_URL}/v1/models/{model}/predictions"

...

Type annotations are still happy, and on the maintenance side that's less indentation, no unused self argument, no need for __base__.py, no field(default_factory=dict, ...), etc.

Contributor


  2. Or use classes, but instead of having them at the Provider level without parameters/attributes, we could have them at a task/model level. So it would be more of a ProviderTaskHelper (or something like this). My reasoning is that all the methods build_url, map_model, prepare_payload, etc. heavily depend on task/model, so we're always passing them to each method.
def get_provider_helper(provider: str, task: str, model: Optional[str] = None) -> ProviderHelper:
    """Get provider helper instance by name."""
    if provider not in PROVIDERS:
        raise ValueError(...)  # provider not supported
    if task not in PROVIDERS[provider]:
        raise ValueError(...)  # task not supported by provider
    if model is not None and model not in PROVIDERS[provider][task].MODEL_IDS_MAPPING:
        raise ValueError(...)  # model not supported by provider
    return ...

I feel that with complexity growing (e.g. more providers, more tasks), relying on if task == "...": checks in the code will become more and more complex to maintain. Having one class per provider per task will make it more readable and self-contained (IMO).
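For illustration, a minimal sketch of what one such per-provider, per-task helper could look like (the class name, payload shape and endpoint details are hypothetical, not the PR's implementation):

from typing import Any, Dict, Optional

class ReplicateTextToImageHelper:
    """Hypothetical helper handling only text-to-image calls to Replicate."""

    BASE_URL = "https://api.replicate.com"

    def build_url(self, model: Optional[str] = None) -> str:
        # Versioned model ids ("owner/name:version") go to the generic predictions endpoint.
        if model is not None and ":" in model:
            return f"{self.BASE_URL}/v1/predictions"
        return f"{self.BASE_URL}/v1/models/{model}/predictions"

    def prepare_payload(self, input: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
        # Assumed request body shape for a text-to-image prediction.
        return {"input": {"prompt": input, **parameters}}

With this layout, task-specific branching lives in the helper rather than in InferenceClient itself.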

Contributor


We could also have a mix of 1. and 2. (not 100% sure though) like this:

class TaskProviderHelper(Protocol):
    def build_url(self, model: Optional[str] = None) -> str: ...
    def map_model(self, model: Optional[str] = None) -> str: ...
    def prepare_headers(self, headers: Dict) -> Dict: ...
    def prepare_payload(self, input: str, parameters: Dict[str, Any]) -> Dict[str, Any]: ...
    def get_response(self, response: Union[bytes, Dict]) -> Any: ...

and a folder structure like this:

from .replicate import text_to_image as replicate_text_to_image
from .together import conversational as together_conversational
from .together import text_to_image as together_text_to_image
(...)

PROVIDERS = {
    "replicate": {
        "text_to_image": replicate_text_to_image,
    },
    "together": {
        "conversational": together_conversational,
        "text_to_image": together_text_to_image,
    },
}

def get_provider_helper(provider: str, task: str) -> TaskProviderHelper:
    return PROVIDERS[provider][task] # with more checks ofc

Side effect: get_response could be correctly type annotated with the expected output for the given task.
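To illustrate how the client side could then stay provider-agnostic, a rough sketch of the glue code (the function name, the auth handling and the use of requests are assumptions for illustration only; the helper methods are the ones from the protocol above):

import requests
from typing import Any, Dict

def run_task(provider: str, task: str, model: str, inputs: str, parameters: Dict[str, Any], api_key: str) -> Any:
    # Hypothetical glue: resolve the right helper, then let it shape the request and response.
    helper = get_provider_helper(provider, task)
    url = helper.build_url(helper.map_model(model))
    headers = helper.prepare_headers({"Authorization": f"Bearer {api_key}"})
    payload = helper.prepare_payload(inputs, parameters)
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    return helper.get_response(response.content)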

Contributor Author


Yes, agreed. To be honest, I was overthinking this part; I will revert to using a Protocol instead.

Contributor

@Wauplin Wauplin left a comment


Looking great! I've left a few comments, mostly related to how headers are handled (important to keep them thread-safe) and the file structure.

Comment on lines +281 to +283
The task to perform on the inference. if you are passing a provider, `task` is required.
Verify which tasks are supported by the provider.For `hf-inference`, all available tasks
can be found [here](https://huggingface.co/tasks).
Contributor


Suggested change
The task to perform on the inference. if you are passing a provider, `task` is required.
Verify which tasks are supported by the provider.For `hf-inference`, all available tasks
can be found [here](https://huggingface.co/tasks).
The task to perform on the inference. If you are passing a provider, `task` is required.
Verify which tasks are supported by the provider.
Available tasks can be found [here](https://huggingface.co/docs/huggingface_hub/guides/inference#supported-tasks).

(TODO in a subsequent PR: extend https://huggingface.co/docs/huggingface_hub/guides/inference#supported-tasks to document tasks per provider)

Wauplin and others added 5 commits January 22, 2025 16:42
Contributor

@Wauplin Wauplin left a comment


Except for #2757 (comment), about which I feel a bit strongly, and a few tests, I think we are close to being able to merge this PR.

As discussed offline, we'll have to take care about a few things:

  • replace prepare_headers / prepare_payload / build_url with a unique prepare_request (see the sketch after this list)
  • revert providers from module-based to class-based (same as "hf-inference")
  • add documentation (examples with providers + maintain a provider <> tasks table)
  • ASR parameters (+ likely T2I / TTS as well?)
  • implement proxied calls (+ make sure we never leak the HF token to another provider)
  • revamp VCR tests => server-side caching instead
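For the first bullet, a rough sketch of what a unified prepare_request could return (the RequestParameters container and its fields are hypothetical):

from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class RequestParameters:
    # Hypothetical container bundling everything needed to actually send the call.
    url: str
    headers: Dict[str, str]
    json: Optional[Dict[str, Any]] = None
    data: Optional[bytes] = None

def prepare_request(
    inputs: Any,
    parameters: Dict[str, Any],
    headers: Dict[str, str],
    model: Optional[str] = None,
    api_key: Optional[str] = None,
) -> RequestParameters: ...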

All of this can be done in subsequent PRs. This PR is already big enough as it is 😄

Thanks again for coordinating all this @hanouticelina ! It takes InferenceClient to a whole new dimension 🚀

@hanouticelina hanouticelina marked this pull request as ready for review January 22, 2025 18:00
@hanouticelina hanouticelina requested a review from Wauplin January 22, 2025 19:54
Contributor

@Wauplin Wauplin left a comment


Awesome!

@hanouticelina hanouticelina merged commit 826f654 into main Jan 23, 2025
16 of 17 checks passed
@hanouticelina hanouticelina deleted the inference-providers-compatibility branch January 23, 2025 10:34
@Wauplin
Contributor

Wauplin commented Jan 23, 2025

🎉

@julien-c
Member

🤯 🤯
