A few new things:
lukasdotcom left a comment:
Splitting it into two functions is a good idea, and I have some feedback on the audio id portion.
        throw new RuntimeException($serviceName . ' text to speech generation failed with: ' . $e->getMessage());
    }
} else {
    $output = base64_decode($message['audio']['data']);
This audio data expires at a certain time (expires_at). Should we check that and fall back to the TTS approach above to get new audio and a new audio id?
Commits (each Signed-off-by: Julien Veyssier <julien-nc@posteo.net>):
…input if using chat endpoint
…he server task type is available, handle the case when the chat endpoint does not return audio
… multimodal model
lukasdotcom left a comment:
Other than what @kyteinsky said, everything else LGTM.
@kyteinsky @lukasdotcom The expiration is checked where the history array is built. Currently the only place where we do that is when the assistant schedules an audio chat task, in https://github.com/nextcloud/assistant/pull/292/files#diff-bacc1ea613cd8effdacf1a14dd4d819266ed4315d83dc81e94cad93b8d071ff4R591-R592. In integration_openai, we don't receive the expiration date in the history input, so we can't check it here.
Requires nextcloud/assistant#291 (for the mp3 recording).
Related to nextcloud/server#53759
If connected to OpenAI, we can use the chat completion endpoint with the gpt-4o-audio-preview model, which accepts audio messages as input. If not connected to OpenAI, it's safer to split the process into: STT -> LLM -> TTS.
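The selection between the two paths described above can be sketched as follows; the provider names and step labels are illustrative, not taken from the integration_openai code:

```python
def plan_audio_chat(provider: str) -> list[str]:
    """Pick the processing steps for an audio chat request.
    Provider names and step labels are illustrative only."""
    if provider == "openai":
        # OpenAI's chat completion endpoint accepts audio input
        # directly with an audio-capable model such as
        # gpt-4o-audio-preview, so one call suffices.
        return ["chat(gpt-4o-audio-preview)"]
    # Other OpenAI-compatible servers: transcribe the input,
    # run the text LLM, then synthesize the reply.
    return ["stt", "llm", "tts"]

print(plan_audio_chat("openai"))   # ['chat(gpt-4o-audio-preview)']
print(plan_audio_chat("localai"))  # ['stt', 'llm', 'tts']
```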