An AI assistant that supports answering questions and performing tasks from users, with speech-to-text and text-to-speech capabilities.
-
Speech-to-Text - Implement repo Fast-whisper-server - use PhoWhisper large model
Converts the vinai/PhoWhisper-large model (a pretrained ASR model by VINAI) from the Hugging Face transformers format into the CTranslate2 format for optimized inference.
-
Text-to-Speech - Implement repo vietTTS
-
LLM-based Agent - Use Qwen2.5:3b self-host by vLLM serving
-
RESTful API interface - FastAPI
-
Burr - Monitor, trace, persist, and execute on my own agent infrastructure.
- Python 3.10 or higher
- poetry
- Docker
- Burr Framwork
- instructor
- vLLM
sudo make run # Run docker-compose -> start all container
sudo make stop # Stop all containerThe project can be configured through:
- llm.yaml - LLM settings
- models.yaml - Model configurations
- params.yaml - General paramters
- text2speech - TTS settings
- Endpoint:
/botvov/create_new - Method:
GET - Description: Registers a new user and creates a new application instance.
- Response:
status_code: HTTP status codeuid: Unique identifier for the new applicationmessage: Response messagestatus: Operation status
- Endpoint:
/botvov/send_audio_query - Method:
POST - Description: Processes an audio query and returns the corresponding command.
- Parameters:
uid: Unique identifier for the applicationlat: Latitude of the user's locationlong: Longitude of the user's locationaudio: Audio file to be processed
- Response:
audio/wav: Processed audio response
- Endpoint:
/botvov/get_audio_response - Method:
GET - Description: Retrieves the command response for a given application.
- Parameters:
uid: Unique identifier for the application
- Response:
status_code: HTTP status codetext_response: Textual response from the applicationdata: Command response datamessage: Response messagestatus: Operation status
BOTVOV/
├── botvov/ # Main service: Qwen2.5:3b LLM
│ ├── main.py # Define API endpoint
│ ├── llm_service.py # Use Burr framework to build multi-agent workflow
│ ├── stt_service.py # Call speech2tech service
│ ├── tts_service.py # Call text2speech service
│ ├── run_burr_UI.sh # Bash script to run: Burr tracking, run main
│ ├── models.py # Define some models (Pydantic)
│ ├── utils.py
│ └── tool_calling/
│ ├── VOV_channel.py # Define class to provide vov channel information from API
│ └── Weather.py # Define class to provide weather from OpenWeather API
├── speech2text
├── text2speech
├── docs/ # API docs
├── .gitignore
├──
├── requirements.txt # Danh sách các thư viện cần thiết
├── Dockerfile # Dockerfile để đóng gói ứng dụng
└── README.md # Tệp hướng dẫn sử dụng dự án
