Conversation
Align with the new text-generation generator and make the codebase more future-proof (for when we improve inputs across other tasks)
For me we would just change the default input for conversational to the following:

```python
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
```

and let the API and/or libraries handle the rest, no? (That's what @Wauplin, @SBrandeis and @mishig25 have been doing on the huggingface.js/inference, widgets and huggingface_hub side, I think.)
Finally getting back to this! Just a few notes:
Thanks for looking into it @xenova! In Python, it's best to promote `InferenceClient.chat_completion` from `huggingface_hub`. For JS, I know that @radames also added a `chatCompletion` method to `@huggingface/inference`.
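(For reference, a non-streaming sketch of that JS method; the model and token are placeholders, and this is illustrative rather than the exact snippet generated by this PR:)

```ts
import { HfInference } from "@huggingface/inference";

const inference = new HfInference("hf_xxx");

// Non-streaming chat completion: the full response is returned at once.
const out = await inference.chatCompletion({
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  max_tokens: 500,
});

console.log(out.choices[0].message.content);
```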
Not that I'm aware of, no, except looking at the config. I think it's also fine to not showcase system prompts at all in code snippets. Describing the messages API + user/assistant roles is already a big win!
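(To illustrate "looking at the config": a crude, hypothetical heuristic. The Hub URL pattern is real, but the helper and its detection logic are a sketch, not code from the PR:)

```ts
// Hypothetical helper: guess whether a model's chat template handles a
// "system" role by inspecting its tokenizer_config.json on the Hub.
async function supportsSystemPrompt(modelId: string): Promise<boolean> {
  const res = await fetch(
    `https://huggingface.co/${modelId}/resolve/main/tokenizer_config.json`
  );
  if (!res.ok) return false;
  const config = await res.json();
  // Chat templates are Jinja strings; look for an explicit "system" branch.
  return (
    typeof config.chat_template === "string" &&
    config.chat_template.includes("system")
  );
}
```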
Thanks for the resources @Wauplin! Exactly what I was looking for! I've updated the code to use this now :) Here's some sample snippet code generated for JavaScript:

```js
import { HfInference } from "@huggingface/inference";

const inference = new HfInference("hf_xxx");

for await (const chunk of inference.chatCompletionStream({
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  max_tokens: 500,
})) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```

Python:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    token="hf_xxx",
)

for message in client.chat_completion(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
    stream=True,
):
    print(message.choices[0].delta.content, end="")
```

cURL:

Other notes:
Nice wrap-up! The snippets look good to me (at least the Python and cURL ones, but I trust you on the JS one as well).
What you did there is nice, I think. Showcasing only two parameters is OK; it lets the user know they have more options. It's only a small snippet anyway, so we won't showcase everything (docs are better for that).
In Python, one can do:

```python
client = InferenceClient(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    token="hf_xxx",
    timeout=30,
)
```

But I don't think it's that bad to have an infinite timeout. FYI, there is an
👍
Which model are you talking about? facebook/blenderbot-3B, facebook/blenderbot-400M-distill and facebook/blenderbot_small-90M are all

EDIT: I just realized blenderbot models have the
SBrandeis left a comment:
The code, snippets, and overall logic look sane to me, especially since all text-generation models are served with TGI now (IIUC).
I think this will be good, since, in cases where it hangs forever, it would be a good idea to inform the user why 😇
Good point! This won't be an issue then, since we only consider this path for text-generation + conversational models.

In that case, I think this PR is pretty much ready: just final reviews left! 😎
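(For context, a hypothetical sketch of that guard; the interface and field names are illustrative, not the exact code from the PR:)

```ts
// Illustrative sketch: the chat-completion snippet path is only taken for
// text-generation models that are also tagged as conversational.
interface ModelData {
  pipeline_tag?: string;
  tags: string[];
}

function usesChatCompletionSnippet(model: ModelData): boolean {
  return (
    model.pipeline_tag === "text-generation" &&
    model.tags.includes("conversational")
  );
}
```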
Merged! 🚀 Shall we put out a new release of `@huggingface/tasks`?
You can, yes :) (by triggering it here)

This PR is the first step towards improving auto-generated code snippets, mainly focusing on improving chat model inputs.
Highlights of the PR:
"Content-Type": "application/json")tasks/[task]/snippet.ts. The reason against keeping it all in a single file (which wassnippets/inputs.ts) is that this will grow in complexity as we improve code snippets across all other tasks.