Improve Message API code snippets #700
Conversation
Align with the new text-generation snippet generator and make the codebase more future-proof (for when we improve inputs across other tasks).
For me we would just change the default input for conversational to the following:

```python
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
```

and let the API and/or libraries handle the rest, no? (That's what @Wauplin, @SBrandeis and @mishig25 have been doing on the huggingface.js/inference, widgets and huggingface_hub side, I think)
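For context, "handling the rest" typically means rendering such a messages list into a model-specific prompt via the tokenizer's chat template. A minimal sketch with transformers (the model name is just an example):

```python
from transformers import AutoTokenizer

# Example model; any chat model that ships a chat_template works the same way.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the messages into the prompt string the model expects.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```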
Finally getting back to this! Just a few notes: …
Thanks for looking into it @xenova! In Python, it's best to promote `chat_completion`. For JS, I know that @radames also added a `chatCompletion` helper.
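For reference, a minimal non-streaming `chat_completion` call might look like this (model and token are placeholders, mirroring the snippets further down):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    token="hf_xxx",  # placeholder
)

# Non-streaming: the full completion is returned in one response.
response = client.chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
)
print(response.choices[0].message.content)
```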
Not that I'm aware of, no, except by looking at the config. I think it's also fine to not showcase system prompts at all in code snippets. Describing the message API + user/assistant is already a big win!
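One way such a config check could look (a heuristic sketch, not what the thread settled on): inspect the model's tokenizer_config.json and see whether its chat template mentions a "system" role.

```python
import json

from huggingface_hub import hf_hub_download

# Heuristic: a chat_template that references "system" suggests the model
# accepts system prompts. This is an approximation, not a guarantee.
path = hf_hub_download("meta-llama/Meta-Llama-3-8B-Instruct", "tokenizer_config.json")
with open(path) as f:
    tokenizer_config = json.load(f)

chat_template = tokenizer_config.get("chat_template") or ""
print("system" in chat_template)
```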
Thanks for the resources @Wauplin! Exactly what I was looking for! I've updated the code to use this now :) Here's some sample snippet code generated for JavaScript:

```js
import { HfInference } from "@huggingface/inference";

const inference = new HfInference("hf_xxx");

for await (const chunk of inference.chatCompletionStream({
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  max_tokens: 500,
})) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```

Python:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    token="hf_xxx",
)

for message in client.chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
    stream=True,
):
    print(message.choices[0].delta.content, end="")
```

cURL:
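(The cURL snippet itself wasn't preserved in this thread. A plausible equivalent, assuming the OpenAI-compatible /v1/chat/completions route exposed by TGI; the exact URL and payload are assumptions:)

```sh
# Reconstructed example; URL and payload shape are assumptions.
curl 'https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct/v1/chat/completions' \
    -H 'Authorization: Bearer hf_xxx' \
    -H 'Content-Type: application/json' \
    -d '{
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "max_tokens": 500,
        "stream": true
    }'
```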
Other notes: …
Nice wrap-up! The snippets look good to me (at least the Python and cURL ones, but I trust you on the JS one as well).
What you did there is nice, I think. Showcasing only two parameters is OK; it lets the user know they have more options. It's only a small snippet anyway, so we won't showcase everything (the docs are better for that).
In Python, one can do:

```python
client = InferenceClient(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    token="hf_xxx",
    timeout=30,
)
```

But I don't think it's that bad to have an infinite timeout. FYI, there is an …
👍
Which model are you talking about? facebook/blenderbot-3B, facebook/blenderbot-400M-distill and facebook/blenderbot_small-90M are all …

EDIT: I just realized blenderbot models have the …
The code, snippets and overall logic look sane to me - especially since all text-generation models are served with TGI now (IIUC).
I think this will be good: in cases where it hangs forever, it would be a good idea to inform the user why 😇
Good point! This won't be an issue now then, since we only consider this path for text-generation + conversational models. In that case, I think this PR is pretty much ready - just final reviews left! 😎
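To illustrate that gating condition in plain Python (the PR's actual logic lives in TypeScript; this is just a sketch using Hub metadata):

```python
from huggingface_hub import model_info

# Only text-generation models that also carry the "conversational" tag
# should take the chat-completion snippet path.
info = model_info("meta-llama/Meta-Llama-3-8B-Instruct")
takes_chat_path = (
    info.pipeline_tag == "text-generation" and "conversational" in (info.tags or [])
)
print(takes_chat_path)
```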
Made a final review and it looks good to me 👍 Thanks for all the iterations! Please wait for another approval before merging, though.
Thank you!
Merged! 🚀 Shall we put out a new release of …?
You can, yes :) (by triggering it here)
This PR is the first step towards improving auto-generated code snippets, mainly focusing on chat model inputs.

Highlights of the PR:

- Adds the `"Content-Type": "application/json"` header (…)
- Moves per-task snippet code into `tasks/[task]/snippet.ts`. The reason against keeping it all in a single file (which was `snippets/inputs.ts`) is that this will grow in complexity as we improve code snippets across all other tasks.
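As a rough sketch of the layout this implies (the task names shown are just the two discussed above; the real set is larger):

```
tasks/
├── text-generation/
│   └── snippet.ts
├── conversational/
│   └── snippet.ts
└── ...
```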