[JS][Proposal] Streamlined Generation APIs #939
I generally like it - but a few questions:

1. Streaming with …
2. Streaming for multi-turn generation: How do you get a streamed response from …? If we're being consistent with …
3. Arguments for …: Does …? If so, what's the difference between the two?
Hmm, mostly accidental but maybe intentional after some thought. The problem is that
Yeah, forgot to write that up,
I'm imagining them as being two things, but they're really similar, so it's maybe a judgment call whether they deserve to be different things. I'm imagining …

But maybe...maybe they are just the same thing, and the extra "stuff you want to do with the response" of … I like the idea of calling this a …
Generate API

Let's go over a few potential options for the … On a related note, it's not clear to me why …

Option 1

```js
const output = ai.generate({ ... }); // where output is just the text or data output
const response = ai.generateResponse({ ... }); // where response has { text/data, messages, usage, stopReason }

const { stream, output } = ai.generateStream({ ... });
const { stream, response } = ai.generateStream({ ... });
```

```js
const text = ai.generate({ ... });
console.log(text);

const response = ai.generateResponse({ ... });
console.log(response.text());
```

Option 2

```js
const { text, messages, usage, stopReason } = ai.generateText({ ... });
const { data, messages, usage, stopReason } = ai.generateData({ ... });

const { stream, text, messages, usage, stopReason } = ai.streamText({ ... });
const { stream, data, messages, usage, stopReason } = ai.streamData({ ... });
```

```js
const { text } = ai.generateText({ ... });
console.log(text);
```

Option 3

Change … if it returns …

```js
const { text, messages, usage, stopReason } = ai.generate({ ... });
const { data, messages, usage, stopReason } = ai.generate({ ... });

const { stream, text, messages, usage, stopReason } = ai.generateStream({ ... });
const { stream, data, messages, usage, stopReason } = ai.generateStream({ ... });
```

```js
const { text } = ai.generate({ ... });
console.log(text);
```

Chat API

I think it would be valuable to not have too many separate but highly overlapping APIs like … How about something like this?

```js
const agent = ai.agent({
  model: googleAI.model('gemini-1.5-flash'),
  system: "You are a pirate.",
  // messages: ...
  // tools: [ ... ]
  // stateStore: ...
});

const reply = await agent.generate("How are you today?");
console.log(reply);
// "Yarr, not too bad, matey. How be ye?"

const {stream, text} = await agent.generateStream("Tell me a long story, ye scurvy sea dog!");

agent.messages(); // equivalent to `toHistory()` in current Genkit
```
I think destructuring is probably what I'm leaning toward at the moment, since it provides the best balance between "one-liner friendly" and "can still get at metadata". I didn't realize that destructuring class instance properties works just fine, so this doesn't really even need to be a big refactor...I think we just make some of the stuff that is a method today into a getter property instead.

```js
// for single-turn, use generate
const {text} = await ai.generate("Tell me a story.");

const {data} = await ai.generate({
  prompt: "Generate a fake person.",
  output: {schema: z.object({name: z.string(), age: z.number()})}
});

// for single-turn streaming, use generateStream
const {stream} = await ai.generateStream("Tell me a long story");
for await (const {text} of stream) {
  console.log(text);
}
```

For multi-turn...still thinking, but maybe we can collapse everything into Session...

```js
const session = ai.session();

let {text} = await session.generate("What's your name?");
// "My name is Bobot."

({text} = await session.generate("That's a funny name."));
// "It's the only one I have."
```
I like this. Does a …
This is a proposed breaking API change for Genkit to streamline the most common scenarios while keeping the flexibility and capability level constant. The changes can be broken down into three components:
Default Model Configurations

While one of the strengths of Genkit is the ability to easily swap between multiple models, we find in practice that most people use a single model as their "go-to", with other models swapped in as needed. The same goes for model configuration -- most of the time you're going to want the same settings.

Proposed is to encourage setting a default model (now just called `model`) when initializing Genkit, as well as the ability to define model settings when instantiating a reference to a model. Both model and configuration can still be overridden at call time, but this makes it easier to set a common reusable baseline.
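As a rough sketch of the ergonomics described above (everything here is an assumption: `genkit`, `googleAI.model`, and all signatures are stubs standing in for the framework, not the actual Genkit API):

```typescript
// Hypothetical sketch: a default model set at init, plus per-reference
// config, both overridable at call time. Stubs stand in for Genkit.
type ModelRef = { name: string; config?: Record<string, unknown> };

const googleAI = {
  // A model reference bundles a model name with default settings.
  model: (name: string, config?: Record<string, unknown>): ModelRef => ({
    name,
    config,
  }),
};

function genkit(opts: { model: ModelRef }) {
  return {
    // A per-call model overrides the default chosen at init time.
    async generate(req: string | { model?: ModelRef; prompt: string }) {
      const model =
        typeof req === "string" ? opts.model : req.model ?? opts.model;
      return { text: "ok", model: model.name };
    },
  };
}

const ai = genkit({
  // Default model + config, reused by every generate() call.
  model: googleAI.model("gemini-1.5-flash", { temperature: 0.7 }),
});

ai.generate("Hello!").then((r) => console.log(r.model)); // uses the default
ai.generate({ model: googleAI.model("gemini-1.5-pro"), prompt: "Hi" })
  .then((r) => console.log(r.model)); // per-call override
```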
Streamlining Generation

Most of the time, what you want from a `generate()` call is the data that is being generated. Today this requires a two-line "get response, get output from response" pattern, which gets tedious when working with e.g. multi-step processes.

Proposed is to simplify to a `generate` API that will return text or structured data depending on call configuration. This can get more complex if you want it to:
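To make the shape concrete, here's a minimal sketch of that call pattern; the stub `ai` object and its return values are assumptions standing in for the framework, not Genkit's actual implementation:

```typescript
// Hypothetical sketch: generate() resolves straight to output plus
// metadata, replacing the "get response, then read output" two-step.
type Output = {
  text?: string;
  data?: unknown;
  usage: { inputTokens: number; outputTokens: number };
  stopReason: string;
};

const ai = {
  async generate(
    req: string | { prompt: string; output?: { schema: unknown } }
  ): Promise<Output> {
    const usage = { inputTokens: 4, outputTokens: 6 };
    if (typeof req === "string" || !req.output) {
      // No output schema: plain text generation.
      return { text: "Once upon a time...", usage, stopReason: "stop" };
    }
    // With an output schema: structured data instead of text.
    return { data: { name: "Ada", age: 36 }, usage, stopReason: "stop" };
  },
};

async function demo() {
  // One line from prompt to text...
  const { text } = await ai.generate("Tell me a story.");
  // ...and still one line when you also want structured data or metadata.
  const { data, usage } = await ai.generate({
    prompt: "Generate a fake person.",
    output: { schema: {} }, // e.g. a zod schema in the real API
  });
  console.log(text, JSON.stringify(data), usage.outputTokens);
}
demo();
```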
When developers do want to dig into the metadata of the response, they can use a new `generateResponse` method, which will be equivalent to `generate` today.

Streaming will be supported through `streamGenerate` and `streamGenerateResponse`. When doing `streamGenerate`, the chunks emitted will be in output form (either a partial data response or a string chunk).

Multi-Turn Generation
All of the above is great if you only have a single-turn generation, but it doesn't really help for a chatbot scenario. Fundamentally, multi-turn use cases are pretty different and deserve better attention in the API surface.

Proposed is a new `Chat` class and a new `send()` method that lets you explicitly opt in to multi-turn conversational use cases.
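A minimal sketch of what an explicit opt-in `Chat` class with `send()` could look like; the class body here is a stand-in assumption (no real model call) meant only to show how conversation state stays out of `generate()`:

```typescript
// Hypothetical sketch; the real Chat API may differ. send() appends the
// user turn, produces a reply, and records both, so multi-turn state is
// an explicit opt-in rather than something generate() has to manage.
type Message = { role: "user" | "model"; text: string };

class Chat {
  private history: Message[] = [];
  constructor(private system?: string) {}

  async send(prompt: string): Promise<{ text: string; messages: Message[] }> {
    this.history.push({ role: "user", text: prompt });
    // Stand-in for a model call that would see `system` plus the history.
    const text = `reply #${this.history.length}`;
    this.history.push({ role: "model", text });
    return { text, messages: [...this.history] };
  }
}

async function demo() {
  const chat = new Chat("You are a pirate.");
  await chat.send("Hi!");
  const { messages } = await chat.send("Tell me more.");
  console.log(messages.length); // two user turns + two model turns: 4
}
demo();
```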