
[JS][Proposal] Streamlined Generation APIs #939

Closed
mbleigh opened this issue Sep 20, 2024 · 5 comments

@mbleigh
Collaborator

mbleigh commented Sep 20, 2024

This is a proposed breaking change API for Genkit to streamline the most common scenarios while keeping the flexibility and capability level constant. The changes can be broken down into three components:

  1. Encouraging default model configurations
  2. Streamlining generation to return data directly instead of returning a wrapping response
  3. Separating out multi-turn and single-turn use cases

Default Model Configurations

While one of the strengths of Genkit is the ability to easily swap between multiple models, we find in practice that most people use a single model as their "go-to" with other models swapped in as needed. The same goes for model configuration -- most of the time you're going to want the same settings.

Proposed is to encourage setting a default model (now just called model) when initializing Genkit as well as the ability to define model settings when instantiating a reference to a model.

import { genkit } from "genkit";
import { vertexAI } from "@genkit-ai/vertexAI";

const ai = genkit({
  plugins: [vertexAI()],
  // sets a default model with configuration
  model: vertexAI.geminiModel('gemini-1.5-flash', {safetySettings: [...]}),
});

const claude = vertexAI.anthropicModel('claude-3.5-sonnet', {...claudeSettings});

Both model and configuration can still be overridden at call time, but this makes it easier to set a common reusable baseline.
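
For example, a minimal sketch under the proposed API: overriding both model and config for a single call, reusing the claude reference defined above (the temperature value is purely illustrative).

const poem = await ai.generate({
  model: claude,                // per-call override of the default model
  config: { temperature: 0.9 }, // per-call override of the default configuration
  prompt: "Write a short poem about the sea.",
});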

Streamlining Generation

Most of the time, what you want from a generate() call is the data that is being generated. Today this requires a two-line "get response, get output from response" pattern which gets tedious when working with e.g. multi-step processes.
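
For reference, a rough sketch of today's pattern (identifiers are approximate; the point is the two-step shape):

// Current pattern: get the response, then pull the output out of it.
const response = await generate({ model: gemini15Flash, prompt: "Tell a funny joke." });
const jokeText = response.text();   // a second step just to read the generated text
// ...or response.output() when an output schema was requested.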

Proposed is to simplify to a generate API that will return text or structured data depending on call configuration:

const jokeText = await ai.generate("Tell a funny joke.");

const fakePerson = await ai.generate({
  prompt: "Generate the information for an imaginary person named Annaka",
  schema: z.object({name: z.string(), job: z.string(), hobbies: z.array(z.string())}),
});

This can get more complex if you want it to:

const jokeAdvanced = await ai.generate({
  model: gpt,
  config: {...},
  prompt: {role: "user", content: [{text: "Tell a funny joke."}]},
});

When developers do want to dig into the metadata of the response, they can use a new generateResponse method which will be equivalent to generate today.

const jokeResponse = await ai.generateResponse("Tell a funny joke.");
console.log(jokeResponse.text());
console.log(jokeResponse.stopReason);

Streaming will be supported through streamGenerate and streamGenerateResponse. When doing streamGenerate, the chunks emitted will be in output form (either a partial data response or a string chunk):

const {stream, data} = ai.streamGenerate("Tell a really long joke with at least 5 paragraphs.");

for await (const chunk of stream) {
  console.log(chunk); // chunk is just a string
}

console.log(await data); // this is the full result, equivalent to `generate()`

const {stream, response} = ai.streamGenerateResponse(...);
for await (const chunk of stream) {
  console.log(chunk.text()); // chunk is a Chunk instance
}

console.log((await response).usage);

Multi-Turn Generation

All of the above is great if you only have single-turn generation, but it doesn't really help for a chatbot scenario. Fundamentally, multi-turn use cases are quite different and deserve better attention in the API surface.

Proposed is a new Chat class and a new send() method that lets you explicitly opt in to multi-turn conversational use cases.

const chat = ai.chat({
  system: "You are a pirate.",
});

const reply = await chat.send("How are you today?");
console.log(reply);
// "Yarr, not too bad, matey. How be ye?"
const {stream, data} = await chat.streamSend("Tell me a long story, ye scurvy sea dog!");
chat.messages(); // equivalent to `toHistory()` in current Genkit
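
Consuming the streamed reply would presumably mirror streamGenerate; a sketch under the proposed API, using the stream and data returned by streamSend above:

for await (const chunk of stream) {
  console.log(chunk); // each chunk is a string (or partial data) piece of the reply
}
console.log(await data); // the full reply once streaming completes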
@chrisraygill
Contributor

chrisraygill commented Sep 20, 2024

I generally like it - but a few questions:

1. Streaming with generateX

streamGenerate() today is generateStream(). Did you mean to change that, or is it just an oversight? Otherwise:

ai.streamGenerate() --> ai.generateStream()
ai.streamGenerateResponse() --> ai.generateResponseStream()

2. Streaming for multi-turn generation

How do you get a streamed response from ai.send()?

If we're being consistent with generate(), then it would be ai.sendStream().

3. Arguments for ai.send

Does ai.send accept the same arguments as ai.generateResponse()? Does it return the same response object?

If so, what's the difference between the two?

@mbleigh
Collaborator Author

mbleigh commented Sep 20, 2024

> I generally like it - but a few questions:
>
> 1. Streaming with generateX
>
> streamGenerate() today is generateStream(). Did you mean to change that, or is it just an oversight? Otherwise:

Hmm, mostly accidental but maybe intentional after some thought. The problem is that generateStream makes sense but sendStream sounds like you're sending the stream, not receiving one back.

> ai.streamGenerate() --> ai.generateStream()
> ai.streamGenerateResponse() --> ai.generateResponseStream()

> 2. Streaming for multi-turn generation
>
> How do you get a streamed response from ai.send()?
>
> If we're being consistent with generate(), then it would be ai.sendStream().

Yeah, forgot to write that up, ai.streamSend would be the proposal.

> 3. Arguments for ai.send
>
> Does ai.send accept the same arguments as ai.generateResponse()? Does it return the same response object?
>
> If so, what's the difference between the two?

I'm imagining them as two separate things, but they're really similar, so it's maybe a judgment call whether they deserve to be different. I'm imagining generateResponse returns a GenerateResponse, which does not necessarily have send() on it.

But maybe...maybe they are just the same thing, and the extra "stuff you want to do with the response" of send() means that it's also sufficient for "single-turn but want more metadata".

I like the idea of calling this a Conversation, but in theory it could maybe replace GenerateResponse? Hmm...

@chrisraygill
Contributor

chrisraygill commented Sep 23, 2024

Generate API

Let's go over a few potential options for the generate APIs. Do you have a favorite?

On a related note, it's not clear to me why result.text() and result.toHistory() are functions rather than attributes like result.text and result.history.

Option 1

const output = ai.generate({ ... }); // where output is just the text or data output
const response = ai.generateResponse({ ... }); // where response has { text/data, messages, usage, stopReason }

const { stream, output } = ai.generateStream({ ... });
const { stream, response } = ai.generateResponseStream({ ... });
const text = ai.generate({ ... });
console.log(text);

const response = ai.generateResponse({ ... });
console.log(response.text());

Option 2

const { text, messages, usage, stopReason } = ai.generateText({ ... });
const { data, messages, usage, stopReason } = ai.generateData({ ... });

const { stream, text, messages, usage, stopReason } = ai.streamText({ ... });
const { stream, data, messages, usage, stopReason } = ai.streamData({ ... });
const { text } = ai.generateText({ ... });
console.log(text);

Option 3

Returns text or structured data depending on whether a schema is provided.

const { text, messages, usage, stopReason } = ai.generate({ ... });
const { data, messages, usage, stopReason } = ai.generate({ ... });

const { stream, text, messages, usage, stopReason } = ai.generateStream({ ... });
const { stream, data, messages, usage, stopReason } = ai.generateStream({ ... });
const { text } = ai.generate({ ... });
console.log(text);

Chat API

I think it would be valuable not to have too many separate but highly overlapping APIs like generate and send. That way we can have much more succinct documentation and a smaller surface to maintain.

How about something like this?

const agent = ai.agent({
  model: googleAI.model('gemini-1.5-flash'),
  system: "You are a pirate.",
  // messages: ...
  // tools: [ ... ]
  // stateStore: ...
});

const reply = await agent.generate("How are you today?");
console.log(reply);
// "Yarr, not too bad, matey. How be ye?"

const {stream, text} = await agent.generateStream("Tell me a long story, ye scurvy sea dog!");
agent.messages(); // equivalent to `toHistory()` in current Genkit

@mbleigh
Collaborator Author

mbleigh commented Sep 25, 2024

I think destructuring is probably what I'm leaning toward at the moment since it provides the best balance between "one-liner friendly" and "can still get at metadata". I didn't realize that destructuring class instance properties works just fine, so this doesn't really even need to be a big refactor...I think we just make some of the stuff that is a method today into a getter property instead.
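
For illustration, a hypothetical sketch (not the actual GenerateResponse implementation) showing that destructuring picks up getter properties just like plain ones:

// Hypothetical sketch: a getter destructures the same way a regular property does.
class FakeResponse {
  constructor(private parts: { text: string }[]) {}
  get text(): string {
    return this.parts.map((p) => p.text).join("");
  }
}

const { text } = new FakeResponse([{ text: "Yarr." }]);
console.log(text); // "Yarr."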

// for single-turn, use generate
const {text} = await ai.generate("Tell me a story.");
const {data} = await ai.generate({
  prompt: "Generate a fake person.",
  output: {schema: z.object({name: z.string(), age: z.number()})}
});

// for single-turn streaming, use generateStream
const {stream} = await ai.generateStream("Tell me a long story");
for await (const {text} of stream) {
  console.log(text);
}

For multi-turn...still thinking but maybe we can collapse everything into Session...

const session = ai.session();
let {text} = await session.generate("What's your name?");
// "My name is Bobot."
({text} = await session.generate("That's a funny name."));
// "It's the only one I have."

@chrisraygill
Contributor

chrisraygill commented Sep 25, 2024

// for single-turn, use generate
const {text} = await ai.generate("Tell me a story.");
const {data} = await ai.generate({
  prompt: "Generate a fake person.",
  output: {schema: z.object({name: z.string(), age: z.number()})}
});

// for single-turn streaming, use generateStream
const {stream} = await ai.generateStream("Tell me a long story");
for await (const {text} of stream) {
  console.log(text);
}

I like this. Does a generate call return both data and text regardless of whether output conformance is used, or do the return types change?

@chrisraygill chrisraygill moved this to Discuss in Genkit Backlog Oct 3, 2024
@chrisraygill chrisraygill added this to the js-0.6.0 milestone Oct 3, 2024
@chrisraygill chrisraygill added the js label Oct 3, 2024
@pavelgj pavelgj moved this from Discuss to In Progress in Genkit Backlog Oct 7, 2024
@pavelgj pavelgj closed this as completed Oct 21, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Genkit Backlog Oct 21, 2024