A few new things:
lukasdotcom left a comment:
Splitting it into two functions is a good idea, and I have some feedback on the audio id portion.
        throw new RuntimeException($serviceName . ' text to speech generation failed with: ' . $e->getMessage());
    }
} else {
    $output = base64_decode($message['audio']['data']);
This audio data expires at a certain time (expires_at). Should we check that and fall back to the TTS approach above to get new audio and a new audio id?
Commits (each Signed-off-by: Julien Veyssier <julien-nc@posteo.net>):
…input if using chat endpoint
…he server task type is available, handle the case when the chat endpoint does not return audio
… multimodal model
lukasdotcom left a comment:
Other than what @kyteinsky said, everything else LGTM.
@kyteinsky @lukasdotcom The expiration is checked where the history array is built. Currently the only place where we do that is when the assistant schedules an audio chat task, in https://github.com/nextcloud/assistant/pull/292/files#diff-bacc1ea613cd8effdacf1a14dd4d819266ed4315d83dc81e94cad93b8d071ff4R591-R592. In integration_openai, we don't receive the expiration date in the history input, so we can't check it here.
Requires nextcloud/assistant#291 (for the mp3 recording).
Related to nextcloud/server#53759
If connected to OpenAI, we can use the chat completion endpoint with the gpt-4o-audio-preview model, which accepts audio messages as input. If not connected to OpenAI, it's safer to split the process into: STT -> LLM -> TTS.
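The selection between the two paths described above can be sketched as follows; the provider names and step labels are illustrative, not taken from the integration_openai code:

```python
def plan_audio_chat(provider: str) -> list[str]:
    """Pick the processing steps for an audio chat request.
    Provider names and step labels are illustrative only."""
    if provider == "openai":
        # OpenAI's chat completion endpoint accepts audio input
        # directly with an audio-capable model such as
        # gpt-4o-audio-preview, so one call suffices.
        return ["chat(gpt-4o-audio-preview)"]
    # Other OpenAI-compatible servers: transcribe the input,
    # run the text LLM, then synthesize the reply.
    return ["stt", "llm", "tts"]

print(plan_audio_chat("openai"))   # ['chat(gpt-4o-audio-preview)']
print(plan_audio_chat("localai"))  # ['stt', 'llm', 'tts']
```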