voice_assistant_integration
Version: 1.0
Date: December 2025
Status: Implementation Guide
This document provides step-by-step instructions for integrating real Whisper.cpp and Piper TTS models with the ThemisDB Voice Assistant.
- CMake 3.20 or higher
- C++20 compatible compiler (GCC 10+, Clang 12+, MSVC 2019+)
- Git for cloning repositories
- ONNX Runtime for Piper TTS (optional, can be bundled)
- CUDA Toolkit 11.x or 12.x (NVIDIA GPUs)
- cuBLAS (comes with CUDA)
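The prerequisites above can be checked before starting a build. The following is a minimal sketch, not part of ThemisDB: the helper names `parse_cmake_version` and `check_prerequisites` are hypothetical, and it only covers Git and the CMake 3.20 minimum from the list.

```python
import re
import shutil
import subprocess

MIN_CMAKE = (3, 20)  # minimum CMake version from the prerequisites list


def parse_cmake_version(output: str) -> tuple[int, int]:
    """Extract (major, minor) from the text printed by `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized cmake output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def check_prerequisites() -> list[str]:
    """Return a list of problems found; an empty list means the basics are present."""
    problems = []
    for tool in ("git", "cmake"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if shutil.which("cmake") is not None:
        result = subprocess.run(
            ["cmake", "--version"], capture_output=True, text=True, check=True
        )
        version = parse_cmake_version(result.stdout)
        if version < MIN_CMAKE:
            problems.append(f"cmake {version[0]}.{version[1]} is older than 3.20")
    return problems
```

Calling `check_prerequisites()` and printing the result gives a quick go/no-go before cloning anything. Compiler and CUDA checks are left out because they vary by platform.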
cd /path/to/ThemisDB
mkdir -p external
cd external
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
For CPU-only:
mkdir build && cd build
cmake ..
cmake --build . --config Release
For GPU (CUDA):
mkdir build && cd build
cmake .. -DWHISPER_CUBLAS=ON # Recent whisper.cpp releases use -DGGML_CUDA=ON instead
cmake --build . --config Release
For GPU (HIP - AMD):
mkdir build && cd build
cmake .. -DWHISPER_HIPBLAS=ON # Option renamed in recent whisper.cpp releases; check its README
cmake --build . --config Release
cd /path/to/ThemisDB/external/whisper.cpp
# Download base model (recommended for starting)
bash ./models/download-ggml-model.sh base
# Or download other models:
# bash ./models/download-ggml-model.sh tiny # Fastest, least accurate
# bash ./models/download-ggml-model.sh small # Good balance
# bash ./models/download-ggml-model.sh medium # Better accuracy
# bash ./models/download-ggml-model.sh large-v3 # Best accuracy
Models are downloaded to the ./models/ directory of the whisper.cpp checkout (for example, ./models/ggml-base.bin).
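The download script names each file `ggml-<size>.bin`, which matches the `ggml-base.bin` path used in the configuration later in this guide. A small sketch for locating a downloaded model follows; the function names are hypothetical, and which directory the ThemisDB config paths are resolved against is an assumption you should verify for your setup.

```python
from pathlib import Path


def whisper_model_path(models_dir, size: str) -> Path:
    """Build the expected file path for a Whisper model of the given size."""
    return Path(models_dir) / f"ggml-{size}.bin"


def find_whisper_model(models_dir, size: str) -> Path:
    """Return the model path, or raise if the download has not produced it."""
    path = whisper_model_path(models_dir, size)
    if not path.is_file():
        raise FileNotFoundError(f"Whisper model not found: {path}")
    return path
```

For example, `find_whisper_model("external/whisper.cpp/models", "base")` should succeed after the `base` download above. If the ThemisDB configuration expects the model under a different `models/` directory, copy or symlink the file there.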
cd /path/to/ThemisDB/external
git clone https://github.com/rhasspy/piper.git
cd piper
Install dependencies first:
On Ubuntu/Debian:
sudo apt-get install libespeak-ng-dev libonnxruntime-dev
On macOS:
brew install espeak-ng onnxruntime
Build Piper:
cd src/cpp
mkdir build && cd build
cmake ..
cmake --build . --config Release
cd /path/to/ThemisDB
mkdir -p models/voices
# Download English voice (Amy - US Female)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx \
-O models/voices/en_US-amy-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/medium/en_US-amy-medium.onnx.json \
-O models/voices/en_US-amy-medium.onnx.json
# Download German voice (Thorsten - Male)
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/de/de_DE/thorsten/medium/de_DE-thorsten-medium.onnx \
-O models/voices/de_DE-thorsten-medium.onnx
wget https://huggingface.co/rhasspy/piper-voices/resolve/main/de/de_DE/thorsten/medium/de_DE-thorsten-medium.onnx.json \
-O models/voices/de_DE-thorsten-medium.onnx.json
More voices are available at: https://huggingface.co/rhasspy/piper-voices/tree/main
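The two voices above share one URL layout: language family, locale, voice name, quality, then the file name. The sketch below derives both download URLs from a voice identifier. It is inferred only from the two examples in this guide, so treat the pattern as an assumption for other voices; `piper_voice_urls` is a hypothetical helper name.

```python
BASE_URL = "https://huggingface.co/rhasspy/piper-voices/resolve/main"


def piper_voice_urls(voice_id: str) -> tuple[str, str]:
    """Return (model_url, config_url) for an id such as 'en_US-amy-medium'."""
    parts = voice_id.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected <locale>-<name>-<quality>, got {voice_id!r}")
    locale, name, quality = parts
    family = locale.split("_")[0]  # 'en_US' -> 'en'
    stem = f"{BASE_URL}/{family}/{locale}/{name}/{quality}/{voice_id}"
    # Each voice needs both the ONNX model and its JSON config next to it.
    return f"{stem}.onnx", f"{stem}.onnx.json"
```

Both files must be saved side by side under `models/voices/`, as in the `wget` commands above.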
cd /path/to/ThemisDB
mkdir -p build && cd build
cmake .. \
-DTHEMIS_ENABLE_VOICE_ASSISTANT=ON \
-DTHEMIS_ENABLE_WHISPER=ON \
-DTHEMIS_ENABLE_PIPER_TTS=ON \
-DTHEMIS_ENABLE_LLM=ON \
-DWHISPER_ROOT=/path/to/ThemisDB/external/whisper.cpp \
-DPIPER_ROOT=/path/to/ThemisDB/external/piper/src/cpp \
-DCMAKE_BUILD_TYPE=Release
For GPU acceleration (CUDA):
cmake .. \
-DTHEMIS_ENABLE_VOICE_ASSISTANT=ON \
-DTHEMIS_ENABLE_WHISPER=ON \
-DTHEMIS_ENABLE_PIPER_TTS=ON \
-DTHEMIS_ENABLE_LLM=ON \
-DTHEMIS_ENABLE_CUDA=ON \
-DWHISPER_ROOT=/path/to/ThemisDB/external/whisper.cpp \
-DPIPER_ROOT=/path/to/ThemisDB/external/piper/src/cpp \
-DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j$(nproc)
Edit config/processors/stt.yaml:
processor:
  model:
    # Point to your downloaded Whisper model
    path: "./models/ggml-base.bin"
    size: "base"
    auto_download: false
Edit config/processors/tts.yaml:
processor:
  model:
    # Point to your downloaded Piper voice
    path: "./models/voices/en_US-amy-medium.onnx"
    engine: "piper"
    auto_download: false
Edit config/voice_assistant.yaml:
voice_assistant:
  enabled: true
  stt:
    model_path: "./models/ggml-base.bin"
    model_size: "base"
    language: "auto"
  tts:
    model_path: "./models/voices/en_US-amy-medium.onnx"
    voice: "en_US-amy-medium"
  llm:
    model_path: "./models/llama-2-7b-chat.gguf"
    n_ctx: 4096
cd /path/to/ThemisDB/build
./themis_server --config ../config/themis.yaml --enable-voice-assistant
# Using Python example
cd /path/to/ThemisDB
python examples/voice_assistant_example.py
Or use curl:
# Prepare test audio (example with a WAV file); strip line breaks so the JSON string stays valid
base64 < test_audio.wav | tr -d '\n' > audio_base64.txt
# Call transcribe API
curl -X POST http://localhost:8080/api/v1/voice/transcribe \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"audio_base64\": \"$(cat audio_base64.txt)\", \"language\": \"en\"}"
curl -X POST http://localhost:8080/api/v1/voice/synthesize \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"text": "Hello, this is ThemisDB Voice Assistant", "voice": "default", "return_base64": true}' \
| jq -r '.audio_base64' | base64 -d > output.wav
# Play the generated audio
aplay output.wav # Linux
afplay output.wav # macOS
import base64
import requests
# Read the recorded call and base64-encode it for the JSON payload
with open("call_recording.wav", "rb") as f:
    audio_base64 = base64.b64encode(f.read()).decode()
# Record and transcribe phone call
response = requests.post(
    "http://localhost:8080/api/v1/voice/call/record",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    json={
        "audio_base64": audio_base64,
        "caller": "+1234567890",
        "callee": "+0987654321",
        "call_type": "inbound",
    },
)
response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error body
result = response.json()
print(f"Transcript: {result['transcript']}")
print(f"Summary: {result['summary']}")
print(f"Document ID: {result['document_id']}")
Solution:
- Verify model file exists at the configured path
- Check file permissions
- Ensure CMake found Whisper.cpp library during build
- Check server logs for detailed error messages
Solution:
- Verify ONNX model and .json config files exist
- Ensure ONNX Runtime is installed
- Check ONNX model compatibility (should be Piper format)
- Verify sufficient memory available
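The first two checks in the lists above (files exist, files are readable) can be automated. This is a minimal sketch with a hypothetical helper name, `check_piper_voice`; it assumes the JSON config sits next to the model as `<model>.onnx.json`, as in the download step of this guide. The same existence and permission test applies to the Whisper `.bin` file.

```python
import os
from pathlib import Path


def check_piper_voice(onnx_path) -> list[str]:
    """Return a list of problems with a Piper voice; empty means both files look usable."""
    model = Path(onnx_path)
    config = Path(str(model) + ".json")  # e.g. en_US-amy-medium.onnx.json
    problems = []
    for path in (model, config):
        if not path.is_file():
            problems.append(f"missing file: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"file not readable: {path}")
    return problems
```

Running `check_piper_voice("models/voices/en_US-amy-medium.onnx")` before starting the server narrows a load failure down to either the files or the runtime.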
Solution:
# Ensure you have the latest version
cd external/whisper.cpp
git pull
cd build
rm -rf *
cmake .. -DWHISPER_BUILD_TESTS=OFF -DWHISPER_BUILD_EXAMPLES=OFF
cmake --build . --config Release
Solution:
Ubuntu/Debian:
wget https://github.com/microsoft/onnxruntime/releases/download/v1.16.3/onnxruntime-linux-x64-1.16.3.tgz
tar xzf onnxruntime-linux-x64-1.16.3.tgz
sudo cp -r onnxruntime-linux-x64-1.16.3/include/* /usr/local/include/
sudo cp -r onnxruntime-linux-x64-1.16.3/lib/* /usr/local/lib/
sudo ldconfig
macOS:
brew install onnxruntime
For NVIDIA GPUs, build with CUDA support:
cmake .. \
-DTHEMIS_ENABLE_VOICE_ASSISTANT=ON \
-DTHEMIS_ENABLE_WHISPER=ON \
-DTHEMIS_ENABLE_CUDA=ON \
-DWHISPER_CUBLAS=ON
Update config:
stt:
  performance:
    use_gpu: true
    gpu_device_id: 0

| Model | Speed | Accuracy | RAM | Use Case |
|---|---|---|---|---|
| tiny | 4x RT | Good | 1GB | Real-time, low-resource |
| base | 1x RT | Better | 1GB | Balanced (recommended) |
| small | 0.5x RT | High | 2GB | High accuracy needed |
| medium | 0.3x RT | Very High | 5GB | Maximum accuracy |
| large | 0.2x RT | Best | 10GB | Research/archival |
RT = real time. 1x RT means one minute of audio takes one minute to process; 4x RT is four times faster, and 0.5x RT takes twice as long as the audio itself.
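As a worked example of the speed column, processing time is the audio duration divided by the speed factor. The sketch below copies the factors from the table; they are the guide's rough figures, not measurements, and the function name is hypothetical.

```python
# Speed factors from the model table (multiples of real time).
SPEED_FACTOR = {"tiny": 4.0, "base": 1.0, "small": 0.5, "medium": 0.3, "large": 0.2}


def estimated_processing_minutes(audio_minutes: float, model: str) -> float:
    """Estimate transcription time: audio length divided by the model's speed factor."""
    return audio_minutes / SPEED_FACTOR[model]
```

With these factors, a 10-minute recording takes about 2.5 minutes with `tiny`, 10 minutes with `base`, and 20 minutes with `small`.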
- Models downloaded and configured
- Build completed successfully with voice assistant enabled
- Configuration files updated with correct paths
- API authentication configured
- Storage paths configured for recordings
- Revision control enabled in ThemisDB
- Tested transcription with sample audio
- Tested synthesis with sample text
- Tested complete call recording pipeline
- Load testing completed
- Monitoring configured
- Backup strategy in place
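The transcription and synthesis checks in the list above can be scripted against the endpoints shown in the curl examples. This sketch uses only the standard library and builds the requests without sending them until `send` is called. The helper names are hypothetical, and the payload fields are taken from the curl examples in this guide rather than from an API reference.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1/voice"  # host and port from the curl examples


def build_request(endpoint: str, payload: dict, token: str) -> urllib.request.Request:
    """Create an authenticated JSON POST request for a voice endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def transcribe_request(audio: bytes, token: str, language: str = "en") -> urllib.request.Request:
    """Request for /transcribe with the audio embedded as base64 text."""
    payload = {"audio_base64": base64.b64encode(audio).decode("ascii"), "language": language}
    return build_request("transcribe", payload, token)


def synthesize_request(text: str, token: str, voice: str = "default") -> urllib.request.Request:
    """Request for /synthesize asking for base64-encoded audio in the response."""
    payload = {"text": text, "voice": voice, "return_base64": True}
    return build_request("synthesize", payload, token)


def send(request: urllib.request.Request) -> dict:
    """Send the request to a running server and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

A smoke test would call `send(transcribe_request(open("test_audio.wav", "rb").read(), "YOUR_TOKEN"))` against a running server and inspect the returned JSON.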
Development:
- Whisper: base model
- Piper: single voice (English)
- CPU processing
Production:
- Whisper: small or medium model
- Piper: multiple voices (multi-language)
- GPU acceleration (if available)
- Load balancer for multiple instances
- Redis cache for frequent queries
For issues or questions:
- Documentation: Voice Assistant Guide
- Whisper.cpp: https://github.com/ggerganov/whisper.cpp
- Piper TTS: https://github.com/rhasspy/piper
- ThemisDB: GitHub Issues
Integration uses MIT-licensed libraries:
- Whisper.cpp: MIT License
- Piper TTS: MIT License
- ONNX Runtime: MIT License
See License Documentation for details.
Last synced: January 02, 2026 | Commit: 6add659
Full documentation: https://makr-code.github.io/ThemisDB/