
Real-time Stream Based AI Assistant #20

Open
lucasjinreal opened this issue Jan 22, 2025 · 5 comments

Comments

@lucasjinreal (Owner) commented Jan 22, 2025

Hello, this is one of my initial proposals for implementing a real-time, stream-based AI assistant powered by pure Rust. Given Kokoro's significant role in text-to-speech and the rapid evolution of Large Language Models (LLMs), here are my thoughts on how to achieve this. I will present the model selection and the overall architecture. If you are interested, please comment below and share how you can contribute; together, we can build it. The ultimate goal could be a terminal voice AI assistant as a prototype.

Goal

A voice-based AI assistant (agent). It will possess voice-understanding (ASR+) and Text-to-Speech (TTS) capabilities (currently mainly in Chinese, with stream mode). In addition to perception (hearing and speaking), it can have the following abilities:

  • Calling tools such as your file explorer, calendar, computer browser, etc.
  • Having memories, not in the form of Retrieval-Augmented Generation (RAG), but through memory extraction of some of your main ideas (similar to short, shared memories).
  • Having an interface to control more things, such as your home intelligent devices.

With these three main goals, I believe this will be an assistant that lives with you, understands you, and helps you with many daily tasks.
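To make the first ability (tool calling) concrete, here is a minimal Rust sketch of how the LLM layer might dispatch to registered tools. All of the names here (`Tool`, `ToolRegistry`, the `Echo` stub) are hypothetical placeholders for illustration, not an existing API.

```rust
use std::collections::HashMap;

/// A capability the assistant can invoke (file explorer, calendar, browser, ...).
trait Tool {
    fn name(&self) -> &'static str;
    fn call(&self, args: &str) -> Result<String, String>;
}

/// Trivial echo tool, used only to demonstrate dispatch.
struct Echo;
impl Tool for Echo {
    fn name(&self) -> &'static str { "echo" }
    fn call(&self, args: &str) -> Result<String, String> { Ok(args.to_string()) }
}

/// Registry the LLM layer consults when it decides to call a tool.
struct ToolRegistry {
    tools: HashMap<&'static str, Box<dyn Tool>>,
}

impl ToolRegistry {
    fn new() -> Self { Self { tools: HashMap::new() } }
    fn register(&mut self, tool: Box<dyn Tool>) {
        self.tools.insert(tool.name(), tool);
    }
    fn dispatch(&self, name: &str, args: &str) -> Result<String, String> {
        self.tools
            .get(name)
            .ok_or_else(|| format!("unknown tool: {name}"))
            .and_then(|t| t.call(args))
    }
}

fn main() {
    let mut reg = ToolRegistry::new();
    reg.register(Box::new(Echo));
    assert_eq!(reg.dispatch("echo", "hi"), Ok("hi".to_string()));
    assert!(reg.dispatch("calendar", "today").is_err());
    println!("tool dispatch ok");
}
```

Using trait objects keeps the registry open: a calendar or home-device tool would just be another `impl Tool` registered at startup.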

On the engineering side, two rules should be followed:

  • Models need to be a combination of cloud and local. Tiny models, such as Voice Activity Detection (VAD), Automatic Speech Recognition (ASR), and TTS, should run fast locally.
  • Agents should be reusable.
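The cloud/local rule could be encoded as a simple routing function. This is only an illustrative sketch; the types and model paths (e.g. `models/kokoro.onnx`, the endpoint URL) are made-up placeholders.

```rust
/// Where a model runs: tiny latency-critical models (VAD, ASR, TTS) stay
/// local; the large LLM may live behind a cloud endpoint.
#[derive(Debug, PartialEq)]
enum Backend {
    Local { model_path: &'static str },
    Cloud { endpoint: &'static str },
}

/// The pipeline stages that need to be placed on a backend.
#[derive(Clone, Copy)]
enum Stage { Vad, Asr, Tts, Llm }

/// Routing rule: everything except the big LLM runs locally.
/// (Paths and endpoint below are hypothetical examples.)
fn route(stage: Stage) -> Backend {
    match stage {
        Stage::Vad => Backend::Local { model_path: "models/vad.onnx" },
        Stage::Asr => Backend::Local { model_path: "models/asr.onnx" },
        Stage::Tts => Backend::Local { model_path: "models/kokoro.onnx" },
        Stage::Llm => Backend::Cloud { endpoint: "https://api.example.com/v1" },
    }
}

fn main() {
    assert_eq!(route(Stage::Vad), Backend::Local { model_path: "models/vad.onnx" });
    assert_eq!(route(Stage::Llm), Backend::Cloud { endpoint: "https://api.example.com/v1" });
    println!("routing ok");
}
```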

Checkpoints

  • Stage 1: A workable version that stitches components such as ASR, LLM, and TTS.
  • Stage 2: An audio model that combines LLM and audio encoder to understand audio input and perform TTS.
  • Stage 3: An end-to-end multimodal model, similar to GPT-4o, that can understand voice and speak with a clear, natural, and expressive voice.
  • Stage 4: Become a Human Experience Replicator (HER).
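Stage 1 (stitching ASR, LLM, and TTS) can be sketched as a generic pipeline over three traits. Everything here is a placeholder under assumed interfaces; the stubs only exist so the shape of one conversational turn is clear.

```rust
// Hypothetical stage interfaces; real implementations would wrap actual models.
trait Asr { fn transcribe(&self, pcm: &[f32]) -> String; }
trait Llm { fn reply(&self, prompt: &str) -> String; }
trait Tts { fn synthesize(&self, text: &str) -> Vec<f32>; }

/// Stage-1 pipeline: audio in -> text -> reply text -> audio out.
struct Pipeline<A: Asr, L: Llm, T: Tts> { asr: A, llm: L, tts: T }

impl<A: Asr, L: Llm, T: Tts> Pipeline<A, L, T> {
    /// One conversational turn.
    fn turn(&self, pcm: &[f32]) -> Vec<f32> {
        let text = self.asr.transcribe(pcm);
        let answer = self.llm.reply(&text);
        self.tts.synthesize(&answer)
    }
}

// Stub implementations so the sketch compiles and can be exercised.
struct StubAsr;
struct StubLlm;
struct StubTts;
impl Asr for StubAsr { fn transcribe(&self, _pcm: &[f32]) -> String { "hello".to_string() } }
impl Llm for StubLlm { fn reply(&self, prompt: &str) -> String { format!("you said: {prompt}") } }
impl Tts for StubTts { fn synthesize(&self, text: &str) -> Vec<f32> { vec![0.0; text.len()] } }

fn main() {
    let p = Pipeline { asr: StubAsr, llm: StubLlm, tts: StubTts };
    let out = p.turn(&[0.0; 160]);
    assert_eq!(out.len(), "you said: hello".len());
    println!("stage-1 pipeline ok");
}
```

A real stream-mode version would swap the `&[f32]` buffers for channels of audio chunks, but the stage boundaries stay the same.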

Leave comments below and let me know your ideas.


Useful links

  1. SLAM-Omni: https://github.com/X-LANCE/SLAM-LLM — pure end-to-end, though I'm not sure what's under the hood.
lucasjinreal pinned this issue Jan 22, 2025
@lucasjinreal (Owner, Author)

output2_added_subtitle.mp4

This is how it looks for now.

@devilankur18

@lucasjinreal Sounds really interesting. Why are you limiting this to Rust? Do you have any thoughts on running it in the browser using WASM / ONNX?

@lucasjinreal (Owner, Author)

@devilankur18 Pleased to learn that you are interested in this topic.

Why limit it to Rust? There are several reasons:

  • I aim to deploy this "model" or an intelligent "hub" more easily. Python may save time in development, but it becomes truly annoying when dealing with large projects. Rust deployment can be a single binary file.
  • I believe that in some scenarios, Rust can be much faster.
  • Rust compiles readily to WebAssembly (wasm), which makes it extremely easy to build for and run on any platform.

Regarding the model part, it may not have just one model, so it could run through ONNX or Candle.

@devilankur18

@lucasjinreal I was trying to find some Rust benchmarks using Candle / Burn and tried the browser examples, but I'm not sure the gains are significant as of today. I'm pretty new to Rust ML. Do you have any benchmarks for LLM models?

Also let me know where I can be of help.

@lucasjinreal (Owner, Author)

@devilankur18 I think the way to compare is to run the same model, such as Qwen 2B, with both llama.cpp and Candle and measure the time consumed. Burn is mainly used for training. I would also like to see your results; hoping to see your updates on this!
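A minimal timing harness in plain Rust could drive such a comparison. This is only a sketch: `bench` and the dummy workload are made up here, and a real run would put the llama.cpp and Candle token-generation calls behind the closure instead.

```rust
use std::time::Instant;

/// Time a closure over `iters` runs and report the mean in milliseconds.
fn bench<F: FnMut()>(label: &str, iters: u32, mut f: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    let mean_ms = start.elapsed().as_secs_f64() * 1000.0 / iters as f64;
    println!("{label}: {mean_ms:.3} ms/iter");
    mean_ms
}

fn main() {
    // Placeholder workload standing in for one token-generation step;
    // black_box keeps the optimizer from deleting it.
    let mean = bench("dummy-workload", 100, || {
        let v: Vec<u64> = (0..10_000).collect();
        std::hint::black_box(v.iter().sum::<u64>());
    });
    assert!(mean >= 0.0);
}
```

Running the same closure body against each backend (same model, same prompt, same token count) would give a like-for-like ms/token comparison.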
