All the major LLM platforms stream text piece by piece, so you can see the response as it's being generated. Right now, ours waits until the chatbot has fully generated an answer before displaying it to the user.
This might be harder than it looks, and might not be worth it.
- The frontend will need to be updated to show a streamed response (many libraries are built for this; I think even antd or shadcn has some components for it)
- Backend: needs to stream from the Ollama server (or OpenAI, technically) to our Chatbot server to our HelpMe server to our frontend. This might be somewhat easy to do or a huge pain in the ass.
- We need to save the chatbot response to the database. We could maybe stream it into the database (maybe???), or just save it once the stream has concluded.
- Users are typically able to stop an LLM part-way through its response. Good luck catching all the edge cases with this one.
- Some LLMs (such as DeepSeek R1) emit `<think>` blocks. Right now, our frontend just parses out the `<think>...</think>` thinking text (usually many words), but it will need to be modified to show "thinking..." while there's no closing `</think>` yet
- Probably more issues with this that I haven't yet thought about
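The `<think>` handling could be sketched as a small pure function the frontend re-runs as tokens accumulate. Everything here is hypothetical (the function name, and the assumption that the model emits literal `<think>`/`</think>` tags); the real frontend might instead want to keep the thinking text in a collapsible panel:

```typescript
// Turn the raw text accumulated so far from the stream into display text.
// Completed <think>...</think> blocks are stripped entirely; if a <think>
// tag is still unclosed, everything from it onward is replaced with a
// "thinking..." placeholder. (Hypothetical sketch -- doesn't handle a
// partially-streamed tag like "<thi" at the very end of the buffer.)
function renderStreamed(accumulated: string): string {
  // Drop any completed <think>...</think> blocks first.
  const stripped = accumulated.replace(/<think>[\s\S]*?<\/think>/g, "");
  // If a <think> tag remains, the block is still open: show a placeholder.
  const open = stripped.indexOf("<think>");
  if (open !== -1) {
    return stripped.slice(0, open) + "thinking...";
  }
  return stripped;
}
```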
Overall, I don't think it would be too hard to pull off, but it's probably just a lot more work than it's worth. Faster-feeling chatbot responses would be nice, though.
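For a sense of how the frontend pieces above might fit together, here's a sketch of a reader that displays chunks as they arrive, supports a user-initiated stop via `AbortSignal`, and resolves with the full text so it can be saved to the database once at the end. The function name and the assumption of a plain chunked-text endpoint are mine, not how our servers actually work:

```typescript
// Read a streamed response chunk by chunk, calling onChunk for live
// display. Resolves with the full accumulated text so the caller can
// persist it to the database once the stream concludes. If the signal
// aborts (user pressed "stop"), whatever arrived so far is still returned.
async function consumeStream(
  stream: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
  signal?: AbortSignal,
): Promise<string> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let full = "";
  try {
    while (true) {
      if (signal?.aborted) break; // user stopped generation part-way
      const { done, value } = await reader.read();
      if (done) break;
      const text = decoder.decode(value, { stream: true });
      full += text;
      onChunk(text); // append to the on-screen message
    }
  } finally {
    reader.releaseLock();
  }
  return full; // save this to the DB once, after the stream ends
}
```

Usage would look something like `consumeStream((await fetch("/chat", { signal })).body!, appendToMessage, signal)`, with the same `AbortSignal` wired to both the fetch and the reader so a stop button cancels everything.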