An integrated voice assistant system that combines Asterisk PBX with AI to provide intelligent voice interactions. The system uses OpenAI for natural language processing and text-to-speech functionality.
- AI-Powered Voice Recognition: Uses OpenAI for processing and understanding voice commands
- Text-to-Speech: Text-to-speech conversion with multi-language support (including plain-text TTS via /speak)
- Asterisk Integration: Full integration with Asterisk PBX for telephone calls
- RESTful API: FastAPI backend for easy extension and integration
- Docker Support: Fully containerized for easy installation and deployment
- Health Monitoring: Built-in health checks for monitoring
The system consists of the following components:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Asterisk PBX │───▶│ Voice Agent │───▶│ OpenAI API │
│ │ │ (Python) │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ AGI Scripts │ │ FastAPI │
│ │ │ Server │
└─────────────────┘ └─────────────────┘
- voice_agent.py: The main FastAPI application
- voice_bridge_fixed.agi: AGI script for communication with the voice agent (a minimal sketch follows this list)
- voice_route.agi: AGI script for call routing
- Asterisk Dialplan: Call flow orchestration
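To make the interaction between these pieces concrete, here is a minimal, hypothetical sketch of what the bridge script does: it takes the call's UNIQUEID, posts the recorded audio to the voice agent, and relies on the agent to write the response file that the dialplan plays back. It assumes the requests package and the /tmp/ file-naming convention used in the dialplan later in this README; the actual voice_bridge_fixed.agi may differ.

#!/usr/bin/env python3
"""Hypothetical sketch of an AGI bridge: send the recorded call audio to the voice agent."""
import sys
import requests

AGENT_URL = "http://localhost:5000/process-voice"  # voice agent endpoint

def main() -> None:
    # Asterisk passes the call's UNIQUEID as the first AGI argument (see the dialplan below)
    unique_id = sys.argv[1]

    # Drain the AGI environment variables Asterisk sends on stdin before doing any work
    while sys.stdin.readline().strip():
        pass

    audio_path = f"/tmp/voice_input_{unique_id}.wav"
    with open(audio_path, "rb") as audio:
        resp = requests.post(
            AGENT_URL,
            files={"audio_file": audio},
            data={"unique_id": unique_id},
            timeout=30,
        )
    resp.raise_for_status()
    # The agent is expected to write /tmp/voice_response_<unique_id>.wav,
    # which the dialplan then plays back to the caller.
    sys.stderr.write(f"voice agent replied: {resp.json().get('status')}\n")

if __name__ == "__main__":
    main()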
- Docker & Docker Compose
- Asterisk PBX (installed and configured)
- OpenAI API Key
- Python 3.11+ (for local development)
git clone https://github.com/msolomos/voice-agent-asterisk
cd voice-agent-asterisk
Create a .env file in the root directory:
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
# Application Configuration
APP_HOST=0.0.0.0
APP_PORT=5000
DEBUG=false
# Voice Configuration
DEFAULT_LANGUAGE=en
TTS_ENGINE=google
VOICE_TIMEOUT=10
MAX_RECORDING_TIME=30
# Asterisk Configuration
ASTERISK_AGI_PATH=/var/lib/asterisk/agi-bin/
TEMP_AUDIO_PATH=/tmp/
# Build and start the container
docker-compose up -d
# Check logs
docker-compose logs -f voice-agent
# Check health status
curl http://localhost:5000/health
Add the following dialplan to /etc/asterisk/extensions.conf:
[voice-agent-test]
exten => 997,1,Answer()
exten => 997,2,AGI(googletts.agi,"Hello, how can I help you?",en)
exten => 997,3,Wait(1)
exten => 997,4,Record(/tmp/voice_input_${UNIQUEID}.wav,3,10,q)
exten => 997,5,AGI(voice_bridge_fixed.agi,${UNIQUEID})
exten => 997,6,System(cp /tmp/voice_response_${UNIQUEID}.wav /tmp/final_response.wav)
exten => 997,7,Wait(1)
exten => 997,8,Playback(/tmp/final_response)
exten => 997,9,GotoIf($["${STAT(e,/tmp/final_response.wav)}" != "1"]?998,1)
exten => 997,10,AGI(voice_route.agi,${UNIQUEID})
exten => 997,11,Goto(ext-local,${ROUTE_EXT},1)
exten => 997,12,Hangup()
Copy AGI scripts:
# Copy AGI scripts to Asterisk directory
sudo cp voice_bridge_fixed.agi /var/lib/asterisk/agi-bin/
sudo cp voice_route.agi /var/lib/asterisk/agi-bin/
sudo chmod +x /var/lib/asterisk/agi-bin/*.agi
# Restart Asterisk
sudo systemctl restart asterisk
To test the system:
- Call extension 997
- Listen to the welcome message
- Speak when recording starts
- The system will process your voice and respond
# Health Check
GET http://localhost:5000/health
# Process Voice (for manual testing)
POST http://localhost:5000/process-voice
Content-Type: multipart/form-data
{
"audio_file": "voice_recording.wav",
"unique_id": "test123"
}
# Text-to-Speech from Text
POST http://localhost:5000/speak
Content-Type: application/json
{
"text": "Γειά σας, πώς μπορώ να σας βοηθήσω;",
"voice": "alloy", # optional
"model": "tts-1", # optional
"format": "mp3" # optional
}
Response:
Returns a binary MP3 audio stream.
This endpoint is ideal for pre-call prompts such as welcome messages or static announcements that do not require real-time voice analysis.
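For a quick end-to-end check, the endpoint can also be called from a short script. The sketch below uses the requests package (an assumption, not a listed project dependency) with the same optional values shown in the request above, and saves the returned MP3 to disk.

#!/usr/bin/env python3
"""Example client: request TTS audio from /speak and save the returned MP3."""
import requests

payload = {
    "text": "Γειά σας, πώς μπορώ να σας βοηθήσω;",  # "Hello, how can I help you?"
    "voice": "alloy",   # optional
    "model": "tts-1",   # optional
    "format": "mp3",    # optional
}

resp = requests.post("http://localhost:5000/speak", json=payload, timeout=30)
resp.raise_for_status()

# The endpoint streams binary MP3 audio
with open("welcome_prompt.mp3", "wb") as f:
    f.write(resp.content)
print(f"Saved {len(resp.content)} bytes to welcome_prompt.mp3")

A file generated this way can be pre-converted and reused as a static announcement instead of synthesizing it on every call.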
# Check container status
docker-compose ps
# Live logs
docker-compose logs -f
# Check API health
curl -f http://localhost:5000/health || echo "Service not healthy"
Below is an example of the expected logs when a call is processed using /speak and /process_audio:
voice-agent-1 | INFO:main:Initializing OpenAI client with key: sk-proj-...
voice-agent-1 | INFO:main:OpenAI API connection verified successfully
voice-agent-1 | INFO: Started server process [1]
voice-agent-1 | INFO: Waiting for application startup.
voice-agent-1 | INFO: Application startup complete.
voice-agent-1 | INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
voice-agent-1 | INFO: 192.168.2.2:56721 - "POST /speak HTTP/1.1" 200 OK
voice-agent-1 | INFO: 127.0.0.1:44230 - "GET /health HTTP/1.1" 200 OK
voice-agent-1 | INFO:main:Processing with OpenAI...
voice-agent-1 | INFO:main:OpenAI Transcribed: Γεια σου, θα ήθελα να μιλήσω με το λογιστήριο, παρακαλώ.
voice-agent-1 | INFO:main:GPT Response: json
voice-agent-1 | {
voice-agent-1 |   "intent": "accounting",
voice-agent-1 |   "confidence": 0.9,
voice-agent-1 |   "response": "Σας συνδέω με το λογιστήριο μας. Περιμένετε λίγο.",
voice-agent-1 |   "name": null
voice-agent-1 | }
voice-agent-1 | INFO:main:TTS audio saved as MP3: /tmp/tmpxmga3m21_response.mp3
voice-agent-1 | INFO:main:Returning audio file: /tmp/tmpxmga3m21_response.mp3 (exists: True)
voice-agent-1 | INFO: 192.168.2.2:56722 - "POST /process_audio HTTP/1.1" 200 OK
These logs confirm that:
- /speak endpoint returns an MP3 audio stream successfully
- /process_audio processes voice input and generates a TTS response
- OpenAI API is working correctly
- Intent is detected and response is synthesized dynamically
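The intent field in the GPT response is what drives call routing: voice_route.agi turns the detected intent into an extension and sets ROUTE_EXT, which the dialplan then uses in Goto(ext-local,${ROUTE_EXT},1). The sketch below is hypothetical: the intent-to-extension table and the assumption that the intent can be read back via GET /call-status/{unique_id} are illustrative, not the actual implementation.

#!/usr/bin/env python3
"""Hypothetical sketch of voice_route.agi: map the detected intent to a destination extension."""
import sys
import requests

# Assumed mapping from detected intent to internal extension; adjust for your PBX
INTENT_TO_EXTENSION = {
    "accounting": "201",
    "support": "202",
    "sales": "203",
}
DEFAULT_EXTENSION = "200"  # operator / fallback

def agi_command(cmd: str) -> None:
    # AGI protocol: write a command to stdout, then read the "200 result=..." reply from stdin
    sys.stdout.write(cmd + "\n")
    sys.stdout.flush()
    sys.stdin.readline()

def main() -> None:
    unique_id = sys.argv[1]

    # Drain the AGI environment block Asterisk sends first
    while sys.stdin.readline().strip():
        pass

    # Assumption: the voice agent exposes the detected intent via /call-status/<unique_id>
    status = requests.get(f"http://localhost:5000/call-status/{unique_id}", timeout=10).json()
    extension = INTENT_TO_EXTENSION.get(status.get("intent", ""), DEFAULT_EXTENSION)

    # The dialplan reads ROUTE_EXT in: Goto(ext-local,${ROUTE_EXT},1)
    agi_command(f"SET VARIABLE ROUTE_EXT {extension}")

if __name__ == "__main__":
    main()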
voice-agent/
├── voice_agent.py # Main FastAPI application
├── voice_bridge_fixed.agi # AGI bridge script
├── voice_route.agi # AGI routing script
├── requirements.txt # Python dependencies
├── Dockerfile # Docker build configuration
├── docker-compose.yml # Docker orchestration
├── .env.example # Environment variables template
├── README.md # This file
├── logs/ # Application logs
├── temp/ # Temporary audio files
└── scripts/ # Additional utility scripts
- Container won't start:
# Check logs for errors
docker-compose logs voice-agent
# Rebuild container
docker-compose down
docker-compose build --no-cache
docker-compose up -d
- Audio processing errors:
  - Ensure the OpenAI API key is correct
  - Check permissions on the /tmp/ directory
  - Verify audio files are in the correct format
- AGI scripts not executing:
# Check permissions
sudo chmod +x /var/lib/asterisk/agi-bin/*.agi
# Check Asterisk logs
sudo tail -f /var/log/asterisk/full
- API connectivity issues:
# Test network connectivity
docker exec voice-agent curl -I http://localhost:5000/health
# Check port binding
netstat -tulpn | grep :5000
For more debugging information:
# Enable debug mode
echo "DEBUG=true" >> .env
docker-compose restart voice-agent
# Monitor detailed logs
docker-compose logs -f voice-agent
- Environment Variables: Never commit the .env file with real API keys
- Network Security: Use a reverse proxy (nginx) for HTTPS
- Access Control: Restrict API access (a sketch of an API-key check follows this list)
- Monitoring: Set up monitoring for production use
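For access control, one lightweight option is to require an API key on every request before the service is exposed through the reverse proxy. The sketch below shows how such a check could be added to the FastAPI app; the X-API-Key header and the VOICE_AGENT_API_KEY variable are illustrative assumptions, not part of the current codebase.

# Illustrative sketch: simple API-key check for the FastAPI endpoints (not part of voice_agent.py)
import os
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
API_KEY = os.environ.get("VOICE_AGENT_API_KEY", "")  # hypothetical environment variable

def require_api_key(x_api_key: str = Header(default="")) -> None:
    # Reject requests that do not present the expected key in the X-API-Key header
    if not API_KEY or x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

@app.post("/speak", dependencies=[Depends(require_api_key)])
def speak(payload: dict) -> dict:
    # ... existing TTS handling would go here ...
    return {"status": "accepted"}

Combined with the nginx reverse proxy below, this keeps unauthenticated clients out even if the port is reachable.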
A production docker-compose configuration might look like this:
version: '3.8'
services:
  voice-agent:
    build: .
    restart: always
    environment:
      - PYTHONUNBUFFERED=1
    env_file:
      - .env.production
    volumes:
      - ./logs:/app/logs
      - ./temp:/app/temp
    networks:
      - voice-network

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/ssl
    depends_on:
      - voice-agent
    networks:
      - voice-network

networks:
  voice-network:
    driver: bridge

Create a .env.production file for production:
# Production OpenAI Configuration
OPENAI_API_KEY=your_production_openai_api_key
# Production Application Configuration
APP_HOST=0.0.0.0
APP_PORT=5000
DEBUG=false
# Production Voice Configuration
DEFAULT_LANGUAGE=en
TTS_ENGINE=google
VOICE_TIMEOUT=10
MAX_RECORDING_TIME=30
# Production Asterisk Configuration
ASTERISK_AGI_PATH=/var/lib/asterisk/agi-bin/
TEMP_AUDIO_PATH=/tmp/
# Additional production settings
LOG_LEVEL=INFO
MAX_CONCURRENT_CALLS=10
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_PERIOD=60
The system supports multiple languages for both speech recognition and text-to-speech:
- English (en): Default language
- Greek (el): Ελληνική υποστήριξη
- Spanish (es): Soporte en español
- French (fr): Support français
- German (de): Deutsche Unterstützung
# Set default language in .env
DEFAULT_LANGUAGE=en
# Or configure per-call in dialplan
exten => 997,2,AGI(googletts.agi,"Hello, how can I help you?",en)
exten => 998,2,AGI(googletts.agi,"Γειά σας, πώς μπορώ να βοηθήσω;",el)
The application includes comprehensive health monitoring:
# Basic health check
curl http://localhost:5000/health
# Detailed health information
curl http://localhost:5000/health/detailed
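For automated checks (cron jobs, container health checks, or external monitors), a small probe script can wrap the same endpoint. The sketch below assumes the requests package and only inspects the HTTP status code, so it makes no assumption about the response body; it exits non-zero when the service is unreachable or unhealthy.

#!/usr/bin/env python3
"""Simple liveness probe for the voice agent; exits non-zero if the service is unhealthy."""
import sys
import requests

HEALTH_URL = "http://localhost:5000/health"

try:
    resp = requests.get(HEALTH_URL, timeout=5)
except requests.RequestException as exc:
    print(f"voice-agent unreachable: {exc}", file=sys.stderr)
    sys.exit(2)

if resp.status_code != 200:
    print(f"voice-agent unhealthy: HTTP {resp.status_code}", file=sys.stderr)
    sys.exit(1)

print("voice-agent healthy")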
Logs are stored in the ./logs/ directory:
- voice_agent.log: Main application logs
- error.log: Error-specific logs
- access.log: API access logs
For production monitoring, consider integrating:
- Prometheus: Metrics collection (see the example sketch after this list)
- Grafana: Visualization
- ELK Stack: Log aggregation
- Sentry: Error tracking
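As an example of the Prometheus option, metrics can be exposed directly from the FastAPI process. The sketch below assumes the prometheus-client package and uses illustrative metric names; it is not part of the current codebase.

# Illustrative sketch: expose Prometheus metrics from the FastAPI app (assumes prometheus-client)
from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app

app = FastAPI()

# Hypothetical metric names for the voice agent
CALLS_PROCESSED = Counter("voice_agent_calls_total", "Calls processed by the voice agent")
PROCESSING_TIME = Histogram("voice_agent_processing_seconds", "Time spent processing a call")

# Serve the metrics endpoint that Prometheus will scrape
app.mount("/metrics", make_asgi_app())

@app.post("/process-voice")
def process_voice() -> dict:
    with PROCESSING_TIME.time():
        CALLS_PROCESSED.inc()
        # ... existing audio handling would go here ...
        return {"status": "success"}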
GET /health
Returns service health status.
POST /process-voice
Content-Type: multipart/form-data
Parameters:
- audio_file (file): Audio file in WAV format
- unique_id (string): Unique identifier for the call
- language (string, optional): Language code (default: en)
Response:
{
"status": "success",
"unique_id": "test123",
"response_file": "/tmp/voice_response_test123.wav",
"transcript": "Hello, how are you?",
"response_text": "I'm doing well, thank you for asking!"
}
GET /call-status/{unique_id}
Returns the status of a specific call.
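A minimal client exercising these endpoints might look like the following. It assumes the requests package and a local test WAV file, and it parses the documented response fields; the /call-status response shape is not documented above, so only its HTTP status is checked.

#!/usr/bin/env python3
"""Minimal API client: submit a recording to /process-voice, then check /call-status."""
import requests

BASE_URL = "http://localhost:5000"
UNIQUE_ID = "test123"

# Submit a recording for processing (audio_file + unique_id, per the parameters above)
with open("voice_recording.wav", "rb") as audio:
    result = requests.post(
        f"{BASE_URL}/process-voice",
        files={"audio_file": audio},
        data={"unique_id": UNIQUE_ID, "language": "en"},
        timeout=60,
    ).json()

print("Transcript:", result["transcript"])
print("Response text:", result["response_text"])
print("Response file:", result["response_file"])

# Poll the call status; the response format is implementation-specific
status = requests.get(f"{BASE_URL}/call-status/{UNIQUE_ID}", timeout=10)
print("Call status HTTP code:", status.status_code)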
We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
# Clone the repository
git clone https://github.com/msolomos/voice-agent-asterisk
cd voice-agent-asterisk
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run locally
python voice_agent.py
- Follow PEP 8 for Python code
- Use meaningful variable names
- Add docstrings for functions and classes
- Include type hints where appropriate
This project is licensed under the MIT License - see the LICENSE file for details.
For support and questions:
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: msolomos2@gmail.com
- Be respectful and inclusive
- Provide detailed information when reporting issues
- Search existing issues before creating new ones
- Use clear and descriptive titles
- Multi-tenant Support: Support for multiple organizations
- Advanced Analytics: Call analytics and reporting
- Voice Biometrics: Speaker identification and verification
- Webhook Support: Integration with external systems
- GUI Dashboard: Web-based management interface
- Load Balancing: Support for multiple voice agent instances
- Custom Voice Models: Integration with custom TTS models
- v1.0.0: Initial release with basic voice processing
- v1.1.0: Added multi-language support
- v1.2.0: Docker containerization
- v1.3.0: Health monitoring and logging improvements
- v1.4.0: Added /speak endpoint for OpenAI TTS from plain text
- OpenAI for the API
- Asterisk for PBX functionality
- FastAPI for the web framework
- The open source community for tools and libraries
Minimum:
- 2 CPU cores
- 4GB RAM
- 10GB storage
- Docker support
Recommended:
- 4+ CPU cores
- 8GB+ RAM
- 50GB+ storage
- SSD storage
- Response Time: < 2 seconds average
- Concurrent Calls: Up to 10 simultaneous calls
- Uptime: 99.9% availability target
- Audio Quality: 16kHz, 16-bit WAV processing
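Asterisk recordings are often captured at 8 kHz, so audio sometimes needs to be normalized to the 16 kHz, 16-bit format listed above before processing or testing. The snippet below is a sketch assuming the pydub package (which uses ffmpeg for non-WAV inputs); the mono setting is an assumption typical for telephony speech, not a documented requirement.

# Sketch: normalize an audio file to 16 kHz, 16-bit mono WAV (assumes the pydub package)
from pydub import AudioSegment

audio = AudioSegment.from_file("voice_recording.wav")
audio = (
    audio.set_frame_rate(16000)   # 16 kHz sample rate
         .set_sample_width(2)     # 16-bit samples (2 bytes)
         .set_channels(1)         # mono (assumed)
)
audio.export("voice_recording_16k.wav", format="wav")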
⭐ If you find this project useful, please give it a star!
🔗 Connect with us: