[InferenceClient] flag chat_completion()'s logit_bias as UNUSED (#2724)

* update logit bias doc

* improve unused parameters documentation
hanouticelina authored Jan 6, 2025
1 parent 438f2fb commit 6f5d870
Showing 2 changed files with 4 additions and 14 deletions.
9 changes: 2 additions & 7 deletions src/huggingface_hub/inference/_client.py
@@ -576,25 +576,20 @@ def chat_completion(
The model to use for chat-completion. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
Inference Endpoint. If not provided, the default recommended model for chat-based text-generation will be used.
See https://huggingface.co/tasks/text-generation for more details.
If `model` is a model ID, it is passed to the server as the `model` parameter. If you want to define a
custom URL while setting `model` in the request payload, you must set `base_url` when initializing [`InferenceClient`].
frequency_penalty (`float`, *optional*):
Penalizes new tokens based on their existing frequency
in the text so far. Range: [-2.0, 2.0]. Defaults to 0.0.
logit_bias (`List[float]`, *optional*):
-    Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens
-    (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically,
-    the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model,
-    but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should
-    result in a ban or exclusive selection of the relevant token. Defaults to None.
+    UNUSED. Currently not implemented in text-generation-inference (TGI). Kept as a parameter for OpenAI compatibility.
logprobs (`bool`, *optional*):
Whether to return log probabilities of the output tokens or not. If true, returns the log
probabilities of each output token returned in the content of message.
max_tokens (`int`, *optional*):
Maximum number of tokens allowed in the response. Defaults to 100.
n (`int`, *optional*):
-    UNUSED.
+    UNUSED. Currently not implemented in text-generation-inference (TGI). Kept as a parameter for OpenAI compatibility.
presence_penalty (`float`, *optional*):
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the
text so far, increasing the model's likelihood to talk about new topics.
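For context, a minimal sketch of how these parameters surface in a synchronous `chat_completion()` call. The model ID and prompt are illustrative (not part of this commit); as the docstring above now states, `logit_bias` and `n` are accepted only for OpenAI compatibility and are currently ignored by TGI-backed endpoints:

    from huggingface_hub import InferenceClient

    client = InferenceClient()  # optionally pass model=... or base_url=... here instead

    response = client.chat_completion(
        messages=[{"role": "user", "content": "Say hello in one word."}],
        model="HuggingFaceH4/zephyr-7b-beta",  # illustrative model ID
        max_tokens=20,
        frequency_penalty=0.0,
        logit_bias=None,  # UNUSED: kept for OpenAI compatibility, not implemented in TGI
        n=1,              # UNUSED: kept for OpenAI compatibility, not implemented in TGI
    )
    print(response.choices[0].message.content)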
9 changes: 2 additions & 7 deletions src/huggingface_hub/inference/_generated/_async_client.py
@@ -612,25 +612,20 @@ async def chat_completion(
The model to use for chat-completion. Can be a model ID hosted on the Hugging Face Hub or a URL to a deployed
Inference Endpoint. If not provided, the default recommended model for chat-based text-generation will be used.
See https://huggingface.co/tasks/text-generation for more details.
If `model` is a model ID, it is passed to the server as the `model` parameter. If you want to define a
custom URL while setting `model` in the request payload, you must set `base_url` when initializing [`InferenceClient`].
frequency_penalty (`float`, *optional*):
Penalizes new tokens based on their existing frequency
in the text so far. Range: [-2.0, 2.0]. Defaults to 0.0.
logit_bias (`List[float]`, *optional*):
-    Modify the likelihood of specified tokens appearing in the completion. Accepts a JSON object that maps tokens
-    (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically,
-    the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model,
-    but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should
-    result in a ban or exclusive selection of the relevant token. Defaults to None.
+    UNUSED. Currently not implemented in text-generation-inference (TGI). Kept as a parameter for OpenAI compatibility.
logprobs (`bool`, *optional*):
Whether to return log probabilities of the output tokens or not. If true, returns the log
probabilities of each output token returned in the content of message.
max_tokens (`int`, *optional*):
Maximum number of tokens allowed in the response. Defaults to 100.
n (`int`, *optional*):
-    UNUSED.
+    UNUSED. Currently not implemented in text-generation-inference (TGI). Kept as a parameter for OpenAI compatibility.
presence_penalty (`float`, *optional*):
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the
text so far, increasing the model's likelihood to talk about new topics.
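The async client mirrors the same signature, so the unused parameters behave identically there. A minimal sketch using `AsyncInferenceClient`, with the same illustrative model ID and prompt as above:

    import asyncio

    from huggingface_hub import AsyncInferenceClient


    async def main() -> None:
        client = AsyncInferenceClient()
        response = await client.chat_completion(
            messages=[{"role": "user", "content": "Say hello in one word."}],
            model="HuggingFaceH4/zephyr-7b-beta",  # illustrative model ID
            max_tokens=20,
            logit_bias=None,  # UNUSED: kept for OpenAI compatibility, not implemented in TGI
            n=1,              # UNUSED: kept for OpenAI compatibility, not implemented in TGI
        )
        print(response.choices[0].message.content)


    asyncio.run(main())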
