STREAMING.md

Streaming for Bots

NOTE: This feature is in the rollout phase and is available only to specific tenants. Our team is actively working to enable it fully on Teams and across all languages in the SDK; updates will be posted on the Discussions page.

AI-powered bots tend to have slower response times, which can disengage users. Two factors contribute to a slow response: preprocessing steps such as RAG or function calls, which take time and are often required before the LLM can begin producing a response, and the time the LLM takes to generate the full response.

A common solution is to stream the bot’s response to users while the LLM generates its full response. Through streaming, your bot can offer an experience that feels engaging, responsive, and on-par with leading AI products.

There are two parts to streaming:

  • Informative Updates: Provide users with insights into what your bot is doing before it has started generating its response.

  • Response Streaming: Provide users with chunks of the response as they are generated by the LLM. This feels like the bot is actively typing out its message.
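
As an illustration of response streaming, the loop below consumes tokens from an async generator and grows the message incrementally. This is a self-contained sketch, not SDK code; `fakeLlm` is a hypothetical stand-in for a token-streaming LLM call:

```typescript
// fakeLlm is a stand-in for a token-streaming LLM call: a real model
// would yield each token (or delta) as soon as it is generated.
async function* fakeLlm(): AsyncGenerator<string> {
    for (const token of ['Once', ' upon', ' a', ' time']) {
        yield token;
    }
}

// Accumulate tokens as they arrive; a bot would flush each partial
// `message` to the client here instead of waiting for the full reply.
async function renderStreamed(): Promise<string> {
    let message = '';
    for await (const token of fakeLlm()) {
        message += token;
    }
    return message;
}
```

Because each partial message can be shown as soon as it exists, the user sees the bot "typing" within moments instead of staring at a blank reply.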

Streaming Response Class

The StreamingResponse class is a helper for streaming responses to the client; it sends a series of updates to the client in a single response. If you are using your own custom model, you can instantiate and manage this class directly to stream responses.

The expected sequence of calls is:

  1. queueInformativeUpdate()
  2. queueTextChunk() (repeated for each chunk)
  3. endStream()

Once endStream() is called, the stream is considered ended and no further updates can be sent.
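
The sequence above can be sketched with a toy stand-in class. The method names mirror StreamingResponse, but this is not the SDK implementation; it only demonstrates the expected lifecycle, including that the stream rejects updates once endStream() has been called:

```typescript
// Toy stand-in for StreamingResponse: buffers text chunks and enforces
// that no further updates are queued after the stream has ended.
class ToyStreamingResponse {
    private chunks: string[] = [];
    private ended = false;
    informativeMessage?: string;

    queueInformativeUpdate(message: string): void {
        this.assertOpen();
        this.informativeMessage = message; // shown while the LLM works
    }

    queueTextChunk(chunk: string): void {
        this.assertOpen();
        this.chunks.push(chunk); // partial text from the LLM
    }

    endStream(): string {
        this.ended = true;
        return this.chunks.join(''); // final, complete message
    }

    private assertOpen(): void {
        if (this.ended) {
            throw new Error('Stream has ended; no further updates allowed');
        }
    }
}

// Expected call order: informative update, text chunks, then endStream().
const stream = new ToyStreamingResponse();
stream.queueInformativeUpdate('Scanning through documents');
stream.queueTextChunk('Hello, ');
stream.queueTextChunk('world!');
const finalMessage = stream.endStream(); // 'Hello, world!'
```
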

Configuration with Azure OpenAI / OpenAI

Current Limitations:

  • Streaming is only available in 1:1 chats.
  • Only rich text can be streamed.
  • Only one informative message can be set. This is reused for each message.
    • Examples include:
      • “Scanning through documents”
      • “Summarizing content”
      • “Finding relevant work items”
  • The informative message is rendered only at the beginning of each message returned from the LLM.
  • Attachments can only be sent in the final streamed chunk.
  • Streaming is not yet available in conjunction with the AI SDK's function calls.

Setup Instructions:

You can configure streaming with your bot by following these steps:

  • Use the DefaultAugmentation class
  • Set stream: true in the OpenAIModel declaration

Optional additions:

  • Set the informative message in the ActionPlanner declaration via the StartStreamingMessage config.
  • Set attachments in the final chunk via the EndStreamHandler in the ActionPlanner declaration.

C#

    // Create OpenAI Model
    builder.Services.AddSingleton<OpenAIModel>(sp => new(
        new OpenAIModelOptions(config.OpenAI.ApiKey, "gpt-4o")
        {
            LogRequests = true,
            Stream = true,              // Set stream toggle
        },
        sp.GetService<ILoggerFactory>()
    ));

    ResponseReceivedHandler endStreamHandler = new((object sender, ResponseReceivedEventArgs args) =>
    {
        StreamingResponse? streamer = args.Streamer;

        if (streamer == null)
        {
            return;
        }

        AdaptiveCard adaptiveCard = new("1.6")
        {
            Body = [new AdaptiveTextBlock(streamer.Message) { Wrap = true }]
        };

        var adaptiveCardAttachment = new Attachment()
        {
            ContentType = "application/vnd.microsoft.card.adaptive",
            Content = adaptiveCard,
        };

        streamer.Attachments = [adaptiveCardAttachment];    // Set attachments
    });


    // Create ActionPlanner
    ActionPlanner<TurnState> planner = new(
        options: new(
            model: sp.GetService<OpenAIModel>()!,
            prompts: prompts,
            defaultPrompt: async (context, state, planner) =>
            {
                PromptTemplate template = prompts.GetPrompt("Chat");
                return await Task.FromResult(template);
            }
        )
        {
            LogRepairs = true,
            StartStreamingMessage = "Loading stream results...", // Set informative message
            EndStreamHandler = endStreamHandler // Set final chunk handler
        },
        loggerFactory: loggerFactory
    );

JS/TS

    const model = new OpenAIModel({
        // ...Setup OpenAI or AzureOpenAI
        stream: true,                                       // Set stream toggle
    });

    const endStreamHandler: PromptCompletionModelResponseReceivedEvent = (ctx, memory, response, streamer) => {
        // ... Setup attachments
        streamer.setAttachments([...cards]);                // Set attachments
    };

    const planner = new ActionPlanner({
        model,
        prompts,
        defaultPrompt: 'default',
        startStreamingMessage: 'Loading stream results...', // Set informative message
        endStreamHandler: endStreamHandler                  // Set final chunk handler
    });
