
[JS][Proposal] Streamlined Generation APIs #939

Closed
mbleigh opened this issue Sep 20, 2024 · 5 comments

@mbleigh
Collaborator

mbleigh commented Sep 20, 2024

This is a proposed breaking change API for Genkit to streamline the most common scenarios while keeping the flexibility and capability level constant. The changes can be broken down into three components:

  1. Encouraging default model configurations
  2. Streamlining generation to return data directly instead of returning a wrapping response
  3. Separating out multi-turn and single-turn use cases

Default Model Configurations

While one of the strengths of Genkit is the ability to easily swap between multiple models, we find in practice that most people use a single model as their "go-to" with other models swapped in as needed. The same goes for model configuration -- most of the time you're going to want the same settings.

Proposed is to encourage setting a default model (now just called model) when initializing Genkit as well as the ability to define model settings when instantiating a reference to a model.

import { genkit } from "genkit";
import { vertexAI } from "@genkit-ai/vertexAI";

const ai = genkit({
  plugins: [vertexAI()],
  // sets a default model with configuration
  model: vertexAI.geminiModel('gemini-1.5-flash', {safetySettings: [...]}),
});

const claude = vertexAI.anthropicModel('claude-3.5-sonnet', {...claudeSettings});

Both model and configuration can still be overridden at call time, but this makes it easier to set a common reusable baseline.
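
For example, a minimal sketch under the proposed API: overriding both model and config for a single call, reusing the claude reference defined above (the temperature value is purely illustrative).

const poem = await ai.generate({
  model: claude,                // per-call override of the default model
  config: { temperature: 0.9 }, // per-call override of the default configuration
  prompt: "Write a short poem about the sea.",
});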

Streamlining Generation

Most of the time, what you want from a generate() call is the data that is being generated. Today this requires a two-line "get response, get output from response" pattern which gets tedious when working with e.g. multi-step processes.
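
For reference, a rough sketch of today's pattern (identifiers are approximate; the point is the two-step shape):

// Current pattern: get the response, then pull the output out of it.
const response = await generate({ model: gemini15Flash, prompt: "Tell a funny joke." });
const jokeText = response.text();   // a second step just to read the generated text
// ...or response.output() when an output schema was requested.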

Proposed is to simplify to a generate API that will return text or structured data depending on call configuration:

const jokeText = await ai.generate("Tell a funny joke.");

const fakePerson = await ai.generate({
  prompt: "Generate the information for an imaginary person named Annaka",
  schema: z.object({name: z.string(), job: z.string(), hobbies: z.array(z.string())}),
});

This can get more complex if you want it to:

const jokeAdvanced = await ai.generate({
  model: gpt,
  config: {...},
  prompt: {role: "user", content: [{text: "Tell a funny joke."}]},
});

When developers do want to dig into the metadata of the response, they can use a new generateResponse method which will be equivalent to generate today.

const jokeResponse = await ai.generateResponse("Tell a funny joke.");
console.log(jokeResponse.text());
console.log(jokeResponse.stopReason);

Streaming will be supported through streamGenerate and streamGenerateResponse. When doing streamGenerate, the chunks emitted will be in output form (either a partial data response or a string chunk):

const {stream, data} = ai.streamGenerate("Tell a really long joke with at least 5 paragraphs.");

for await (const chunk of stream) {
  console.log(chunk); // chunk is just a string
}

console.log(await data); // this is the full result, equivalent to `generate()`

const {stream, response} = ai.streamGenerateResponse(...);
for await (const chunk of stream) {
  console.log(chunk.text()); // chunk is a Chunk instance
}

console.log((await response).usage);

Multi-Turn Generation

All of the above is great if you only have single-turn generation, but it doesn't really help for a chatbot scenario. Fundamentally, multi-turn use cases are quite different and deserve better attention in the API surface.

Proposed is a new Chat class and a new send() method that lets you explicitly opt in to multi-turn conversational use cases.

const chat = ai.chat({
  system: "You are a pirate.",
});

const reply = await chat.send("How are you today?");
console.log(reply);
// "Yarr, not too bad, matey. How be ye?"
const {stream, data} = await chat.streamSend("Tell me a long story, ye scurvy sea dog!");
chat.messages(); // equivalent to `toHistory()` in current Genkit
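
Consuming the streamed reply would presumably mirror streamGenerate; a sketch under the proposed API, using the stream and data returned by streamSend above:

for await (const chunk of stream) {
  console.log(chunk); // each chunk is a string (or partial data) piece of the reply
}
console.log(await data); // the full reply once streaming completes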
@chrisraygill
Contributor

chrisraygill commented Sep 20, 2024

I generally like it - but a few questions:

1. Streaming with generateX

streamGenerate() today is generateStream(). Did you mean to change that, or is it just an oversight? Otherwise:

ai.streamGenerate() --> ai.generateStream()
ai.streamGenerateResponse() --> ai.generateResponseStream()

2. Streaming for multi-turn generation

How do you get a streamed response from ai.send()?

If we're being consistent with generate(), then it would be ai.sendStream().

3. Arguments for ai.send

Does ai.send accept the same arguments as ai.generateResponse()? Does it return the same response object?

If so, what's the difference between the two?

@mbleigh
Collaborator Author

mbleigh commented Sep 20, 2024

> I generally like it - but a few questions:
>
> 1. Streaming with generateX
>
> streamGenerate() today is generateStream(). Did you mean to change that, or is it just an oversight? Otherwise:

Hmm, mostly accidental but maybe intentional after some thought. The problem is that generateStream makes sense but sendStream sounds like you're sending the stream, not receiving one back.

> ai.streamGenerate() --> ai.generateStream()
> ai.streamGenerateResponse() --> ai.generateResponseStream()

> 2. Streaming for multi-turn generation
>
> How do you get a streamed response from ai.send()?
>
> If we're being consistent with generate(), then it would be ai.sendStream().

Yeah, forgot to write that up, ai.streamSend would be the proposal.

> 3. Arguments for ai.send
>
> Does ai.send accept the same arguments as ai.generateResponse()? Does it return the same response object?
>
> If so, what's the difference between the two?

I'm imagining them as two separate things, but they're really similar, so it's maybe a judgment call whether they deserve to be different. I'm imagining generateResponse returns a GenerateResponse, which does not necessarily have send() on it.

But maybe...maybe they are just the same thing, and the extra "stuff you want to do with the response" of send() means that it's also sufficient for "single-turn but want more metadata".

I like the idea of calling this a Conversation, but in theory it could maybe replace GenerateResponse? Hmm...

@chrisraygill
Contributor

chrisraygill commented Sep 23, 2024

Generate API

Let's go over a few potential options for the generate APIs. Do you have a favorite?

On a related note, it's not clear to me why result.text() and result.toHistory() are functions rather than attributes like result.text and result.history.

Option 1

const output = ai.generate({ ... }); // where output is just the text or data output
const response = ai.generateResponse({ ... }); // where response has { text/data, messages, usage, stopReason }

const { stream, output } = ai.generateStream({ ... });
const { stream, response } = ai.generateResponseStream({ ... });
const text = ai.generate({ ... });
console.log(text);

const response = ai.generateResponse({ ... });
console.log(response.text());

Option 2

const { text, messages, usage, stopReason } = ai.generateText({ ... });
const { data, messages, usage, stopReason } = ai.generateData({ ... });

const { stream, text, messages, usage, stopReason } = ai.streamText({ ... });
const { stream, data, messages, usage, stopReason } = ai.streamData({ ... });
const { text } = ai.generateText({ ... });
console.log(text);

Option 3

Returns text or structured data depending on whether a schema is provided.

const { text, messages, usage, stopReason } = ai.generate({ ... });
const { data, messages, usage, stopReason } = ai.generate({ ... });

const { stream, text, messages, usage, stopReason } = ai.generateStream({ ... });
const { stream, data, messages, usage, stopReason } = ai.generateStream({ ... });
const { text } = ai.generate({ ... });
console.log(text);

Chat API

I think it would be valuable not to have too many separate but highly overlapping APIs like generate and send. That way we can have much more succinct documentation and a smaller surface to maintain.

How about something like this?

const agent = ai.agent({
  model: googleAI.model('gemini-1.5-flash'),
  system: "You are a pirate.",
  // messages: ...
  // tools: [ ... ]
  // stateStore: ...
});

const reply = await agent.generate("How are you today?");
console.log(reply);
// "Yarr, not too bad, matey. How be ye?"

const {stream, text} = await agent.generateStream("Tell me a long story, ye scurvy sea dog!");
agent.messages(); // equivalent to `toHistory()` in current Genkit

@mbleigh
Collaborator Author

mbleigh commented Sep 25, 2024

I think destructuring is probably what I'm leaning toward at the moment since it provides the best balance between "one-liner friendly" and "can still get at metadata". I didn't realize that destructuring class instance properties works just fine, so this doesn't really even need to be a big refactor...I think we just make some of the stuff that is a method today into a getter property instead.
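
For illustration, a hypothetical sketch (not the actual GenerateResponse implementation) showing that destructuring picks up getter properties just like plain ones:

// Hypothetical sketch: a getter destructures the same way a regular property does.
class FakeResponse {
  constructor(private parts: { text: string }[]) {}
  get text(): string {
    return this.parts.map((p) => p.text).join("");
  }
}

const { text } = new FakeResponse([{ text: "Yarr." }]);
console.log(text); // "Yarr."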

// for single-turn, use generate
const {text} = await ai.generate("Tell me a story.");
const {data} = await ai.generate({
  prompt: "Generate a fake person.",
  output: {schema: z.object({name: z.string(), age: z.number()})}
});

// for single-turn streaming, use generateStream
const {stream} = await ai.generateStream("Tell me a long story");
for await (const {text} of stream) {
  console.log(text);
}

For multi-turn...still thinking but maybe we can collapse everything into Session...

const session = ai.session();
let {text} = await session.generate("What's your name?");
// "My name is Bobot."
({text} = await session.generate("That's a funny name."));
// "It's the only one I have."

@chrisraygill
Contributor

chrisraygill commented Sep 25, 2024

// for single-turn, use generate
const {text} = await ai.generate("Tell me a story.");
const {data} = await ai.generate({
  prompt: "Generate a fake person.",
  output: {schema: z.object({name: z.string(), age: z.number()})}
});

// for single-turn streaming, use generateStream
const {stream} = await ai.generateStream("Tell me a long story");
for await (const {text} of stream) {
  console.log(text);
}

I like this. Does a generate call return both data and text regardless of whether output conformance is used, or do the return types change?

@chrisraygill chrisraygill moved this to Discuss in Genkit Backlog Oct 3, 2024
@chrisraygill chrisraygill added this to the js-0.6.0 milestone Oct 3, 2024
@chrisraygill chrisraygill added the js label Oct 3, 2024
@pavelgj pavelgj moved this from Discuss to In Progress in Genkit Backlog Oct 7, 2024
@pavelgj pavelgj closed this as completed Oct 21, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Genkit Backlog Oct 21, 2024