From 14d59fc925e13b794204cbcccc1f27eb7b704448 Mon Sep 17 00:00:00 2001 From: jingyun19 Date: Tue, 18 Nov 2025 14:16:46 -0800 Subject: [PATCH 1/7] Enhance README with tool use modes and examples Added detailed explanations for tool use modes, including examples for open loop and closed loop execution. --- README.md | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 71 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index aace534..98b0317 100644 --- a/README.md +++ b/README.md @@ -141,7 +141,74 @@ Because of their special behavior of being preserved on context window overflow, The Prompt API supports **tool use** via the `tools` option, allowing you to define external capabilities that a language model can invoke in a model-agnostic way. Each tool is represented by an object that includes an `execute` member that specifies the JavaScript function to be called. When the language model initiates a tool use request, the user agent calls the corresponding `execute` function and sends the result back to the model. -Here’s an example of how to use the `tools` option: +There are 2 tool use modes: with automatic execution (closed loop) and without automatic execution (open loop) + +Regardless of with or without automatic execution, the session creation and appending signature are the same. Here’s an example: + +```js +const session = await LanguageModel.create({ + initialPrompts: [ + { + role: "system", + content: `You are a helpful assistant. You can use tools to help the user.` + } + ], + tools: [ + { + name: "getWeather", + description: "Get the weather in a location.", + inputSchema: { + type: "object", + properties: { + location: { + type: "string", + description: "The city to check for the weather condition.", + }, + }, + required: ["location"], + }, + } + ] +}); +``` + +In this example, the `tools` array defines a `getWeather` tool, specifying its name, description and input schema. 
+
+Few shot examples of tool use can be appended like so:
+
+```js
+await session.append([
+  {role: "user", content: "What is the weather in Seattle?"},
+  {role: "tool-call", content: {type: "tool-call", value: {callID: "get_weather_1", name: "get_weather", arguments: {location: "Seattle"}}}},
+  {role: "tool-result", content: {type: "tool-response", value: {callID: "get_weather_1", name: "get_weather", result: [{type: "object", value: {temperature: "55F", humidity: "67%"}}]}}},
+  {role: "assistant", content: "The temperature in Seattle is 55F and humidity is 67%"},
+]);
+```
+
+Note that "role" and "type" now supports "tool-call" and "tool-result". `content.result` is a list of a dictionary of `type` and `value`, where `type` can be `{"text", "image", "audio", "object" }` and `value` is `any`.
+
+#### Open Loop:
+
+Without automatic execution, the API will return a `ToolCall` object with `callId` (a unique identifier of this tool call), `name` (name of the tool), and `arguments` (a dictionary fitting the JSON input schema of the tool's declaration), and client is expected to handle the tool execution and append the tool result back to the session.
+
+Example:
+
+```js
+const result = await session.prompt("What is the weather in Seattle?");
+if (result.type=="tool-call") {
+  if (result.name == "get_weather") {
+    const tool_result = getWeather(result.arguments.location);
+    session.prompt([{role:"tool-result", content: {type: "tool-result", value: {callId: result.callID, name: result.name, result: [{type:"object", value: tool_result}]}}}])
+  }
+}
+```
+
+Note that we always require the tool-response to immediately follow the tool-call generated by the model.
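For multi-step tasks, the open loop above generalizes to a small driver loop. The following is a minimal sketch under stated assumptions: `runOpenLoop`, `toolRegistry`, and the stubbed tool implementation are illustrative names, not part of the proposal, and a real `session` would come from `LanguageModel.create()` as shown above.

```javascript
// Minimal open-loop driver (sketch): keep prompting until the model
// stops requesting tool calls, dispatching each call to a local registry.
const toolRegistry = {
  get_weather: (args) => ({ temperature: "55F", humidity: "67%" }),
};

async function runOpenLoop(session, userText, maxSteps = 5) {
  let result = await session.prompt(userText);
  for (let step = 0; step < maxSteps && result?.type === "tool-call"; step++) {
    const tool = toolRegistry[result.name];
    const value = tool(result.arguments);
    // The tool-result must immediately follow the model's tool-call.
    result = await session.prompt([{
      role: "tool-result",
      content: {
        type: "tool-result",
        value: { callID: result.callID, name: result.name, result: [{ type: "object", value }] },
      },
    }]);
  }
  return result;
}
```

Here `maxSteps` bounds the loop in client code, much as a closed-loop cap on tool calls would.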
+
+
+#### Closed Loop:
+
+To enable automatic execution, add an `execute` function for each tool's implementation, and add a `toolUseConfig` to indicate that execution is enabled and to cap the number of tool calls invoked in a single session generation:
 
 ```js
 const session = await LanguageModel.create({
@@ -171,13 +238,14 @@ const session = await LanguageModel.create({
       return JSON.stringify(await res.json());
     },
   }
-  ]
+  ],
+toolUseConfig: {enabled: true, max_tool_calls: 5},
 });
 
 const result = await session.prompt("What is the weather in Seattle?");
 ```
 
-In this example, the `tools` array defines a `getWeather` tool, specifying its name, description, input schema, and `execute` implementation. When the language model determines that a tool call is needed, the user agent invokes the `getWeather` tool's `execute()` function with the provided arguments and returns the result to the model, which can then incorporate it into its response.
+When the language model determines that a tool call is needed, the user agent invokes the `getWeather` tool's `execute()` function with the provided arguments and returns the result to the model, which can then incorporate it into its response.
 
 #### Concurrent tool use

From 5379da1fc97469d89867530ef2fb144a31fbdda4 Mon Sep 17 00:00:00 2001
From: jingyun19
Date: Wed, 19 Nov 2025 09:53:15 -0800
Subject: [PATCH 2/7] Clarify tool-call and tool-result in README

Updated README to clarify tool-call and tool-result usage.

---
 README.md | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 98b0317..6bcf806 100644
--- a/README.md
+++ b/README.md
@@ -185,21 +185,30 @@ await session.append([
 ]);
 ```
 
-Note that "role" and "type" now supports "tool-call" and "tool-result". `content.result` is a list of a dictionary of `type` and `value`, where `type` can be `{"text", "image", "audio", "object" }` and `value` is `any`.
+Note that "role" and "type" now support "tool-call" and "tool-result".
+`content.result` is a list of dictionaries, each with a `type` and a `value`, where `type` is one of `"text"`, `"image"`, `"audio"`, or `"object"`, and `value` is `any`.
 
 #### Open Loop:
 
-Without automatic execution, the API will return a `ToolCall` object with `callId` (a unique identifier of this tool call), `name` (name of the tool), and `arguments` (a dictionary fitting the JSON input schema of the tool's declaration), and client is expected to handle the tool execution and append the tool result back to the session.
+Open loop is enabled by specifying `tool-call` in `expectedOutputs` when the session is created.
+
+When a tool needs to be called, the API will return an object with `callID` (a unique identifier of this tool call), `name` (the name of the tool), and `arguments` (the inputs to the tool), and the client is expected to handle the tool execution and append the tool result back to the session. `arguments` is a dictionary fitting the JSON input schema of the tool's declaration; if the input schema's type is not `"object"`, the value will be wrapped in a key.
 Example:
 
 ```js
-const result = await session.prompt("What is the weather in Seattle?");
+const sessionOptions = structuredClone(options);
+sessionOptions.expectedOutputs.push({ type: "tool-call" });
+const session = await LanguageModel.create(sessionOptions);
+
+let result = await session.prompt("What is the weather in Seattle?");
 if (result.type=="tool-call") {
   if (result.name == "get_weather") {
     const tool_result = getWeather(result.arguments.location);
-    session.prompt([{role:"tool-result", content: {type: "tool-result", value: {callId: result.callID, name: result.name, result: [{type:"object", value: tool_result}]}}}])
+    result = await session.prompt([{role: "tool-result", content: {type: "tool-result", value: {callID: result.callID, name: result.name, result: [{type: "object", value: tool_result}]}}}]);
   }
+} else {
+  console.log(result);
 }
 ```
 
@@ -239,7 +248,7 @@ const session = await LanguageModel.create({
     },
   }
 ],
-toolUseConfig: {enabled: true, max_tool_calls: 5},
+toolUseConfig: {enabled: true},
 });
 
 const result = await session.prompt("What is the weather in Seattle?");

From 8e076f7f3bcad312527d7bbbf458a0f4aa5f700d Mon Sep 17 00:00:00 2001
From: jingyun19
Date: Wed, 19 Nov 2025 10:00:49 -0800
Subject: [PATCH 3/7] Enhance LanguageModel with tool call definitions

Added new types and enums for tool calls and responses.

---
 index.bs | 51 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/index.bs b/index.bs
index 7f7bdb1..37c8ebc 100644
--- a/index.bs
+++ b/index.bs
@@ -39,8 +39,11 @@ interface LanguageModel : EventTarget {
   static Promise availability(optional LanguageModelCreateCoreOptions options = {});
   static Promise params();
 
+  // The return type of prompt() and similar methods.
+ typedef (DOMString or sequence) LanguageModelPromptResult; + // These will throw "NotSupportedError" DOMExceptions if role = "system" - Promise prompt( + Promise prompt( LanguageModelPrompt input, optional LanguageModelPromptOptions options = {} ); @@ -80,13 +83,11 @@ interface LanguageModelParams { callback LanguageModelToolFunction = Promise (any... arguments); // A description of a tool call that a language model can invoke. -dictionary LanguageModelTool { +dictionary LanguageModelToolDeclaration { required DOMString name; required DOMString description; // JSON schema for the input parameters. required object inputSchema; - // The function to be invoked by user agent on behalf of language model. - required LanguageModelToolFunction execute; }; dictionary LanguageModelCreateCoreOptions { @@ -97,7 +98,7 @@ dictionary LanguageModelCreateCoreOptions { sequence expectedInputs; sequence expectedOutputs; - sequence tools; + sequence tools; }; dictionary LanguageModelCreateOptions : LanguageModelCreateCoreOptions { @@ -148,16 +149,52 @@ dictionary LanguageModelMessageContent { required LanguageModelMessageValue value; }; -enum LanguageModelMessageRole { "system", "user", "assistant" }; +enum LanguageModelMessageRole { "system", "user", "assistant", "tool-call", "tool-response" }; -enum LanguageModelMessageType { "text", "image", "audio" }; +enum LanguageModelMessageType { "text", "image", "audio","tool-call", "tool-response" }; typedef ( ImageBitmapSource or AudioBuffer or BufferSource or DOMString + or LanguageModelToolCall + or LanguageModelToolResponse ) LanguageModelMessageValue; + +// The definitions of `LanguageModelToolCall` and `LanguageModelToolResponse` values +enum LanguageModelToolResultType { "text", "image", "audio", "object" }; + +dictionary LanguageModelToolResultContent { + required LanguageModelToolResultType type; + required any value; +}; + +// Represents a tool call requested by the language model. 
+dictionary LanguageModelToolCall { + required DOMString callID; + required DOMString name; + object arguments; +}; + +// Successful tool execution result. +dictionary LanguageModelToolSuccess { + required DOMString callID; + required DOMString name; + required sequence result; +}; + +// Failed tool execution result. +dictionary LanguageModelToolError { + required DOMString callID; + required DOMString name; + required DOMString errorMessage; +}; + +// The response from executing a tool call - either success or error. +typedef (LanguageModelToolSuccess or LanguageModelToolError) LanguageModelToolResponse; + +

Prompt processing
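The tool-call and tool-response dictionaries introduced in this patch are plain JavaScript objects at the API boundary. The following sketch (hand-built literals for illustration; no implementation of the proposal is invoked) shows how a caller might branch on the two `LanguageModelToolResponse` shapes:

```javascript
// A LanguageModelToolResponse is either a success (carries `result`)
// or an error (carries `errorMessage`); both carry `callID` and `name`.
function describeToolResponse(response) {
  if ("errorMessage" in response) {
    return `tool ${response.name} (${response.callID}) failed: ${response.errorMessage}`;
  }
  return `tool ${response.name} (${response.callID}) returned ${response.result.length} item(s)`;
}

const success = {
  callID: "get_weather_1",
  name: "get_weather",
  result: [{ type: "object", value: { temperature: "55F", humidity: "67%" } }],
};

const failure = {
  callID: "get_weather_2",
  name: "get_weather",
  errorMessage: "network timeout",
};
```

The `errorMessage` member is the discriminant here because it is required on `LanguageModelToolError` and absent from `LanguageModelToolSuccess`.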

From ed78c121f50a45af2f8ca9cd25613cb68677032b Mon Sep 17 00:00:00 2001 From: jingyun19 Date: Wed, 26 Nov 2025 09:08:32 -0800 Subject: [PATCH 4/7] Update README.md Co-authored-by: Thomas Steiner --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 6bcf806..d4aaea4 100644 --- a/README.md +++ b/README.md @@ -141,7 +141,7 @@ Because of their special behavior of being preserved on context window overflow, The Prompt API supports **tool use** via the `tools` option, allowing you to define external capabilities that a language model can invoke in a model-agnostic way. Each tool is represented by an object that includes an `execute` member that specifies the JavaScript function to be called. When the language model initiates a tool use request, the user agent calls the corresponding `execute` function and sends the result back to the model. -There are 2 tool use modes: with automatic execution (closed loop) and without automatic execution (open loop) +There are two tool use modes: with automatic execution (closed loop) and without automatic execution (open loop). Regardless of with or without automatic execution, the session creation and appending signature are the same. Here’s an example: From 5b11a1beba282caac1cad95973a676e15c8762e2 Mon Sep 17 00:00:00 2001 From: jingyun19 Date: Wed, 26 Nov 2025 09:08:43 -0800 Subject: [PATCH 5/7] Update README.md Co-authored-by: Thomas Steiner --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index d4aaea4..804f4bd 100644 --- a/README.md +++ b/README.md @@ -150,8 +150,8 @@ const session = await LanguageModel.create({ initialPrompts: [ { role: "system", - content: `You are a helpful assistant. You can use tools to help the user.` - } + content: `You are a helpful assistant. 
You can use tools to help the user.`,
+    },
   ],
   tools: [
     {
@@ -167,8 +167,8 @@ const session = await LanguageModel.create({
         },
         required: ["location"],
       },
-    }
-  ]
+    },
+  ],
 });
 ```

From 5a0cbefcc650dd756f38ae26b546f9deaf16c0a4 Mon Sep 17 00:00:00 2001
From: jingyun19
Date: Wed, 26 Nov 2025 11:49:37 -0800
Subject: [PATCH 6/7] add open vs. closed loop use case diff

---
 README.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/README.md b/README.md
index 804f4bd..18f8788 100644
--- a/README.md
+++ b/README.md
@@ -256,6 +256,30 @@ const result = await session.prompt("What is the weather in Seattle?");
 When the language model determines that a tool call is needed, the user agent invokes the `getWeather` tool's `execute()` function with the provided arguments and returns the result to the model, which can then incorporate it into its response.
 
+#### Do I need auto execution?
+
+In general, automatic execution is suitable for use cases where the model quality is good enough via prompt tuning. That can either mean you can tolerate certain mistakes the model makes when calling tools, or the task is simple enough for the model to handle (e.g., just a few distinct tools, short and clean tool output, a short context window).
+
+On the other hand, open loop allows more flexibility for intercepting at various points in the planner loop (the reason -> action -> observation loop), where you can inject your business logic programmatically.
+
+Here are a few patterns where open loop would be useful:
+
+1) Context management
+
+If your session might go through a long chain of content, and previous tool results are no longer important or relevant for your use case, open loop gives you the flexibility to edit and recreate the session in the middle of a tool call chain. You can manually compress and modify the history, and recreate a new session with less content.
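A compression pass of this kind can be sketched as a pure function over the message history. The pruning policy below is illustrative, not part of the proposal: keep only the most recent tool-call/tool-result pair for a given tool, then seed a fresh session with the pruned history.

```javascript
// Keep only the latest tool-call/tool-result pair for `toolName`,
// relying on the rule that a tool-result immediately follows its tool-call.
// The pruned history can then seed a fresh session, e.g. via
// LanguageModel.create({ initialPrompts: pruned, ... }).
function pruneToolHistory(history, toolName) {
  let lastResultIndex = -1;
  for (let i = history.length - 1; i >= 0; i--) {
    const m = history[i];
    if (m.role === "tool-result" && m.content.value.name === toolName) {
      lastResultIndex = i;
      break;
    }
  }
  return history.filter((m, i) => {
    const isToolMessage = m.role === "tool-call" || m.role === "tool-result";
    const stale = isToolMessage && m.content.value.name === toolName && i < lastResultIndex - 1;
    return !stale;
  });
}
```
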
+For example, for a shopping agent, your tool keeps track of a live shopping cart, but only the latest cart status is important. When there have been multiple rounds of cart updates, you might need to compress the tool call history to avoid exceeding the context window and to improve latency and quality.
+
+2) Conditional loop breaking
+
+If your business logic requires some determinism in some critical states, open loop allows the flexibility to exit the planner loop early and output a pre-determined action.
+
+For example, a shopping agent might be required to get an explicit confirmation before placing an order. The first time the tool `"place_order"` is called, you want to exit the planner loop immediately and display a verbatim message to the user.
+
+3) Conditional constraints
+
+
 
 #### Concurrent tool use
 
 Developers should be aware that the model might call their tool multiple times, concurrently. For example, code such as

From 81e75afb93de8a640b5791f8027b5941991f4d7a Mon Sep 17 00:00:00 2001
From: jingyun19
Date: Mon, 1 Dec 2025 09:25:32 -0800
Subject: [PATCH 7/7] Enhance README with planner loop constraints details

Added explanation about automatic execution and constraints in planner loop.

---
 README.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 18f8788..814f333 100644
--- a/README.md
+++ b/README.md
@@ -277,7 +277,10 @@ If your business logic requires some determinism in some critical states, open l
 For example, a shopping agent might be required to get an explicit confirmation before placing an order. The first time the tool `"place_order"` is called, you want to exit the planner loop immediately and display a verbatim message to the user.
 
 3) Conditional constraints
-
+
+In automatic execution, the planner loop performs multiple decoding steps. If you need to supply constraints dynamically, use the open loop API and control the planner loop yourself; because the closed loop runs the entire loop behind the scenes, the closed loop API has no natural way to supply a different constraint for each LLM step.
+
+For example, you might want the model to always call tool `FOO` after tool `BAR` is called, or to generate text only, with some prefix, after tool `FOO` is called.
 
 #### Concurrent tool use