
feat(openai): support TTS and Audio models #189

Draft
CorieW wants to merge 6 commits into genkit-ai:main from invertase:openai-add-tts-and-audio-models

Conversation

CorieW (Collaborator) commented Mar 2, 2026

Resolves #168

This PR adds support for TTS models (e.g. gpt-4o-mini-tts) and audio models (e.g. gpt-4o-audio-preview) in the OpenAI plugin.
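Based on the PR description and the README examples it adds, usage is expected to look roughly like the sketch below. The exact Genkit Dart API surface shown here (`Genkit(...)`, `ai.generate(...)`, `response.media`) and the `alloy` voice name are assumptions, not code from this PR:

```dart
import 'package:genkit/genkit.dart';
import 'package:genkit_openai/genkit_openai.dart';

Future<void> main() async {
  // Hypothetical setup; plugin registration details may differ.
  final ai = Genkit(plugins: [openAI()]);

  // TTS model: text in, audio out (gpt-4o-mini-tts per the PR description).
  final response = await ai.generate(
    model: 'openai/gpt-4o-mini-tts',
    prompt: 'Hello from Genkit!',
    config: OpenAIOptions(
      audioVoice: 'alloy', // voice name is an assumption
      audioFormat: 'mp3',
    ),
  );

  // The synthesized speech is expected to come back as a media part.
  print(response.media?.contentType);
}
```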

Testing:

  • TTS model (screenshot)
  • Audio model (screenshot)

gemini-code-assist (Contributor) commented:

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the OpenAI plugin's capabilities by integrating support for Text-to-Speech (TTS) and chat models that can generate audio responses. Users can now leverage OpenAI's speech synthesis features to convert text into spoken audio and engage with chat models that provide audio as part of their output. This enhancement provides a richer, more interactive experience by enabling new forms of multimodal communication with OpenAI models.

Highlights

  • OpenAI Text-to-Speech (TTS) Support: Implemented full support for OpenAI's Text-to-Speech models, allowing users to generate audio from text input with configurable voices and formats. This includes a new dedicated handler for TTS requests.
  • OpenAI Chat Audio Model Support: Added support for OpenAI chat models that can return audio responses, enabling multimodal interactions where models can speak back to the user. This involves new responseModalities options and audio aggregation logic for streaming responses.
  • New Configuration Options: Introduced new OpenAIOptions fields: responseModalities to specify desired output types (text, audio), audioVoice for selecting the speech voice, and audioFormat for choosing the audio output format.
  • Enhanced Model Information and Type Detection: Updated model information helpers to correctly identify and categorize 'audio' and 'tts' model types, ensuring appropriate handling and capabilities are exposed for these new model categories.
  • Streaming Audio Aggregation: Developed new internal logic to correctly aggregate fragmented audio data received from streaming chat completion responses, ensuring complete audio output is reconstructed.
  • Documentation and Examples: Updated the README.md with usage examples for both TTS and chat audio models, and added new example flows in example.dart and dedicated sample applications to demonstrate these features.
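The highlights above name three new `OpenAIOptions` fields. A minimal illustration of how they might be combined for a chat audio model follows; field types and accepted values are assumptions inferred from the highlights, not the PR's actual signatures:

```dart
// Hypothetical illustration of the new audio-related OpenAIOptions fields.
// String-typed fields and the 'alloy'/'wav' values are assumptions.
final options = OpenAIOptions(
  responseModalities: ['text', 'audio'], // request both text and spoken audio
  audioVoice: 'alloy',                   // which OpenAI voice to synthesize
  audioFormat: 'wav',                    // encoding of the returned audio part
);
```

For a dedicated TTS model, `responseModalities` would presumably be unnecessary, since audio is the only output such a model produces.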
Changelog
  • packages/genkit_openai/README.md
    • Updated documentation with new sections and code examples for Text-to-Speech and Chat Audio Models.
    • Expanded the OpenAIOptions description to include new audio-related parameters.
  • packages/genkit_openai/example/example.dart
    • Added new example flows for textToSpeech and chatAudioModels to demonstrate new functionalities.
  • packages/genkit_openai/lib/genkit_openai.dart
    • Exported new model info functions (audioModelInfo, ttsModelInfo).
    • Added responseModalities, audioVoice, and audioFormat properties to OpenAIOptions.
  • packages/genkit_openai/lib/genkit_openai.g.dart
    • Updated the generated OpenAIOptions class to include new audio-related fields and their serialization/deserialization logic.
  • packages/genkit_openai/lib/src/aggregation.dart
    • Introduced _AudioAccumulator to aggregate audio chunks from streaming responses.
    • Modified aggregateStreamResponses to process and accumulate audio deltas.
  • packages/genkit_openai/lib/src/converters.dart
    • Modified fromOpenAIAssistantMessage to convert audio payloads into MediaPart objects.
    • Added a helper function _audioMimeType to determine the correct MIME type for audio formats.
  • packages/genkit_openai/lib/src/models.dart
    • Added audioModelInfo and ttsModelInfo functions to define capabilities for audio generation models.
    • Updated model support checks to include new audio model types.
  • packages/genkit_openai/lib/src/openai_plugin.dart
    • Imported dart:convert and openai_dart/src/generated/client.dart for audio handling.
    • Added utility functions resolveOpenAIModalities, isSpeechSynthesisModel, _parseOpenAIModality, _extractSpeechInputText, _speechFormatToApiValue, _speechFormatToMimeType, and _resolveSpeechContentType for audio processing.
    • Updated getModelType to differentiate between 'audio', 'tts', and 'stt' models.
    • Modified _getModelInfo to return appropriate ModelInfo for audio and TTS models.
    • Adjusted model registration to include 'audio' and 'tts' types.
    • Implemented _handleSpeechSynthesis for dedicated TTS model requests.
    • Modified _handleStreaming and _handleNonStreaming to support audio responses from chat models, including passing audioFormat for conversion.
  • packages/genkit_openai/test/aggregation_test.dart
    • Added _audioChunk helper for creating audio stream chunks in tests.
    • Added a test case to verify aggregation of streamed audio fragments.
  • packages/genkit_openai/test/integration_test.dart
    • Added an integration test for text-to-speech generation.
  • packages/genkit_openai/test/openai_plugin_test.dart
    • Imported isSpeechSynthesisModel and resolveOpenAIModalities for testing.
    • Added tests for OpenAIOptions parsing of audio-related fields.
    • Added tests for resolveOpenAIModalities covering default behavior, normalization, and error handling.
    • Added tests for isSpeechSynthesisModel to correctly identify TTS models.
    • Added a test for GenkitConverter.fromOpenAIAssistantMessage to verify audio payload conversion.
    • Added tests for audioModelInfo and ttsModelInfo to confirm correct model support settings.
  • testapps/openai_sample/lib/chat_audio.dart
    • Added new file defining a sample flow for generating speech audio using OpenAI chat audio models.
  • testapps/openai_sample/lib/text_to_speech.dart
    • Added new file defining a sample flow for generating speech audio using OpenAI text-to-speech models.
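The changelog entry for `aggregation.dart` describes accumulating fragmented audio from streaming chat responses. A self-contained sketch of that idea is below; the class name, method names, and field shapes are illustrative and do not mirror the PR's private `_AudioAccumulator`:

```dart
import 'dart:convert';
import 'dart:typed_data';

// Sketch of streaming audio aggregation: each stream delta carries a
// base64-encoded audio fragment (and possibly a transcript fragment).
// This assumes each fragment is individually valid base64, so we decode
// per-delta and concatenate the raw bytes.
class AudioAccumulator {
  final BytesBuilder _bytes = BytesBuilder();
  String transcript = '';

  void addDelta({String? data, String? transcriptDelta}) {
    if (data != null) {
      _bytes.add(base64Decode(data)); // decode this fragment's bytes
    }
    if (transcriptDelta != null) {
      transcript += transcriptDelta; // transcripts concatenate as text
    }
  }

  /// Complete audio payload once the stream has finished.
  Uint8List takeBytes() => _bytes.takeBytes();
}
```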

gemini-code-assist bot left a comment
Code Review

This pull request integrates support for OpenAI's Text-to-Speech (TTS) and chat audio models. A critical security vulnerability has been identified where the TTS handler hardcodes the OpenAI API base URL, bypassing custom baseUrl configurations and potentially leading to unintended data leakage. Additionally, there are opportunities for general code improvement and test corrections to enhance robustness and maintainability.
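The fix the reviewer is asking for would presumably route the TTS request through the client's configured base URL rather than a hardcoded one. A minimal sketch, with illustrative identifiers only:

```dart
// Illustrative only: honor a custom baseUrl (proxy, regional endpoint,
// gateway) instead of hardcoding api.openai.com. The function name and
// default URL here are assumptions, not the plugin's actual code.
Uri speechEndpoint(String? configuredBaseUrl) {
  final base = configuredBaseUrl ?? 'https://api.openai.com/v1';
  return Uri.parse('$base/audio/speech');
}
```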

final responseFormat = _speechFormatToApiValue(format);
final requestedMimeType = _speechFormatToMimeType(format);

// ignore: invalid_use_of_protected_member
CorieW (Collaborator, Author) replied:

Should we do this?



Development

Successfully merging this pull request may close these issues.

Support TTS on openai

1 participant