A comprehensive, production-ready platform for creating, managing, and deploying AI-powered voice agents with support for multiple LLM providers, real-time voice synthesis, knowledge base integration, and intelligent call routing.
- Python 3.8+
- MongoDB (local or cloud)
- Redis (for call queuing and caching)
- API Keys for:
- OpenAI
- Google Gemini
- Anthropic Claude
- ElevenLabs
- Deepgram
- Twilio
- Serper API
git clone https://github.com/ajitashwath/callmind.git
cd callmind
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
Create a .env file in the project root with the following variables:
# Application
APP_NAME=AI Agent Platform
DEBUG=false
SECRET_KEY=your-secret-key-here
FASTAPI_PORT=3000
# Database
MONGODB_URL=mongodb://localhost:27017/ai_agents
REDIS_URL=redis://localhost:6379
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=
# LLM Providers
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIzaSy...
CLAUDE_API_KEY=sk-ant-...
# Voice Providers
ELEVENLABS_API_KEY=...
# OpenAI TTS reuses OPENAI_API_KEY from the LLM Providers block above
CARTESIA_API_KEY=...
# Deepgram (Speech-to-Text & Pricing)
DEEPGRAM_API_KEY=...
# Telephony
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=...
WEBHOOK_BASE_URL=https://your-domain.com/agent
# ChromaDB (Vector Storage)
# Optional: leave the Chroma Cloud values empty to use local storage (the default)
CHROMA_CLOUD_API_KEY=
CHROMA_CLOUD_TENANT=
CHROMA_CLOUD_DATABASE=
CHROMA_PERSIST_DIR=./chroma_data
# JWT Authentication
JWT_ACCESS_SECRET=your-access-secret
JWT_REFRESH_SECRET=your-refresh-secret
JWT_ALGORITHM=HS256
JWT_ISSUER=jesty-crm
JWT_AUDIENCE=jesty-crm-users
BACKEND_API_URL=http://localhost:3000
# Search API
SERPER_API_KEY=...

# Development
python app/main.py
# Production
uvicorn app.main:app --host 0.0.0.0 --port 3000 --workers 4

The API will be available at http://localhost:3000 with interactive documentation at /agent/docs.
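A missing variable is easier to diagnose before the server starts than in the middle of a call. The sketch below is a minimal pre-flight check, not the platform's own configuration loader. The variable names come from the .env listing above, but treating exactly this subset as mandatory is an assumption, and `missing_settings` is a hypothetical helper name.

```python
from typing import List, Mapping

# Names taken from the .env listing; which of them are truly mandatory
# is an assumption made for this sketch.
REQUIRED_SETTINGS = (
    "SECRET_KEY",
    "MONGODB_URL",
    "REDIS_URL",
    "JWT_ACCESS_SECRET",
    "JWT_REFRESH_SECRET",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling `missing_settings(os.environ)` after the variables have been exported returns an empty list when everything is set. Note that a .env file is not read into `os.environ` automatically; this sketch assumes something else has already loaded it.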
- Architecture Overview
- Core Modules
- API Endpoints
- Features
- Database Schema
- Deployment
- Development
- Troubleshooting
AI Agent Platform
├── Authentication & Authorization
├── Agent Management
│   ├── Multi-LLM Support (OpenAI, Gemini, Claude)
│   ├── Voice Integration
│   └── Template Management
├── Conversation Management
│   ├── Message Tracking
│   ├── Cost Calculation
│   └── Summarization
├── Knowledge Base (RAG)
│   ├── Multi-format File Support
│   ├── Semantic Search
│   └── ChromaDB Vector Storage
├── Voice Services
│   ├── Text-to-Speech (ElevenLabs, OpenAI, Cartesia)
│   ├── Voice Cloning
│   └── Voice Management
├── Telephony System
│   ├── Twilio Integration
│   ├── Deepgram Real-time Transcription
│   ├── Call Routing & Queuing
│   └── WebSocket Streaming
└── Dashboard & Analytics
    ├── Call Metrics
    ├── Cost Tracking
    ├── Performance Analysis
    └── Agent Performance
Create and manage AI agents with customizable configurations.
Key Features:
- Multi-LLM provider support
- Industry-specific templates
- Voice configuration
- Knowledge base integration
- Auto-shift scheduling
Main Endpoints:
- POST /api/agents - Create agent
- GET /api/agents - List agents
- PUT /api/agents/{agent_id} - Update agent
- POST /api/agents/{agent_id}/test - Test agent
- GET /api/agents/templates/industries - Browse templates
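For Python clients, the create-agent call can be assembled with the standard library alone. This sketch mirrors the curl example in this section: it builds the request object but does not send it, so it can be inspected or unit-tested offline. `BASE_URL` and `build_create_agent_request` are illustrative names, and the response format is not documented here, so none is assumed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumed local instance, port from FASTAPI_PORT


def build_create_agent_request(
    token: str, name: str, description: str, config: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/agents request with a JSON body."""
    body = json.dumps(
        {"name": name, "description": description, "config": config}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/agents",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it to a running instance.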
Example - Create an Agent:
curl -X POST http://localhost:3000/api/agents \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"name": "Sales Assistant",
"description": "AI-powered sales representative",
"config": {
"llm_provider": "openai",
"model": "gpt-4o-mini",
"temperature": 0.7,
"max_tokens": 1000,
"system_prompt": "You are a friendly sales representative...",
"first_message": "Hello! How can I help you today?",
"voice_provider": "elevenlabs",
"voice_id": "rachel",
"max_conversation_turns": 10
}
}'

Manage conversation lifecycle, messages, and analytics.
Key Features:
- Session-based conversation management
- Message and event tracking
- Multi-provider cost calculation
- AI-powered summarization
- Advanced filtering
Main Endpoints:
- POST /api/conversations - Create conversation
- GET /api/conversations - List conversations
- GET /api/conversations/{id} - Get conversation details
- GET /api/conversations/{id}/summary - Get AI summary
- PATCH /api/conversations/{id}/metrics - Update metrics
Example - Create Conversation:
curl -X POST http://localhost:3000/api/conversations \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"agent_id": "agent_abc123",
"session_id": "session_xyz789",
"from_number": "+1234567890",
"to_number": "+0987654321"
}'

RAG-powered knowledge management with semantic search.
Supported Formats:
- PDF documents
- Word documents (.docx, .doc)
- Excel spreadsheets (.xlsx, .xls)
- CSV files
- Plain text files
- Website content (URLs)
Main Endpoints:
- POST /api/knowledge/create - Create knowledge base
- GET /api/knowledge/ - List knowledge bases
- POST /api/knowledge/{kb_id}/add-text - Add text
- POST /api/knowledge/{kb_id}/add-file - Upload file
- POST /api/knowledge/{kb_id}/add-website - Add website
- GET /api/knowledge/{kb_id}/search - Search knowledge base
Example - Create & Search Knowledge Base:
# Create knowledge base
KB_ID=$(curl -X POST http://localhost:3000/api/knowledge/create \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"name":"Company Docs","description":"Internal knowledge"}' \
| jq -r '.kb_id')
# Add document
curl -X POST http://localhost:3000/api/knowledge/$KB_ID/add-text \
-H "Authorization: Bearer <token>" \
-F "content=Our business hours are 9 AM to 5 PM EST"
# Search
curl -X GET "http://localhost:3000/api/knowledge/$KB_ID/search?query=business%20hours&top_k=5" \
-H "Authorization: Bearer <token>"

Multi-provider text-to-speech and voice cloning.
Supported Providers:
- ElevenLabs (advanced TTS with cloning)
- OpenAI TTS (standard and HD)
- Cartesia (Sonic models)
Main Endpoints:
- POST /api/voices/synthesize - Convert text to speech
- POST /api/voices/clone - Create cloned voice
- GET /api/voices - List voices
- GET /api/voices/search - Search voices
- PUT /api/voices/{voice_id}/settings - Update settings
- GET /api/voices/test - Test provider connections
Example - Synthesize Speech:
curl -X POST http://localhost:3000/api/voices/synthesize \
-H "Authorization: Bearer <token>" \
-F "text=Hello, this is a test" \
-F "voice_id=rachel" \
-F "provider=elevenlabs" \
-F "stability=0.7" \
-F "similarity_boost=0.8" \
--output speech.mp3

Complete voice call management with real-time transcription.
Key Features:
- Twilio integration for call routing
- Deepgram real-time transcription
- WebSocket streaming
- Call queuing with scheduling
- Metrics tracking and cost calculation
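Call queuing with scheduling implies a decision about whether a call may be placed now or should wait in the queue. How the platform makes that decision is not described here, so the following is only a sketch under assumptions: a Monday-to-Friday week, a 9:00 to 17:00 window, and naive datetimes already expressed in the agent's local time zone. `within_working_hours` is a hypothetical helper.

```python
from datetime import datetime, time
from typing import FrozenSet

WEEKDAYS: FrozenSet[int] = frozenset(range(5))  # Monday=0 .. Friday=4 (assumed)


def within_working_hours(
    now: datetime,
    start: time = time(9, 0),
    end: time = time(17, 0),
    workdays: FrozenSet[int] = WEEKDAYS,
) -> bool:
    """True when now is on a workday, at or after start and before end."""
    return now.weekday() in workdays and start <= now.time() < end
```

A queue worker could call this before dialing and re-queue the call when it returns False.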
Main Endpoints:
- POST /api/telephony/calls/outbound - Make outbound call
- POST /api/telephony/calls/outbound/streaming - Stream call
- GET /api/telephony/calls/{call_sid} - Get call status
- POST /api/telephony/calls/{call_sid}/hangup - End call
- WS /api/telephony/twilio/stream/{agent_id} - WebSocket stream
Example - Make Outbound Call:
curl -X POST http://localhost:3000/api/telephony/calls/outbound \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{
"agent_id": "agent_abc123",
"to_number": "+1234567890",
"from_number": "+0987654321"
}'

Comprehensive analytics and performance monitoring.
Main Endpoints:
- GET /api/dashboard/stats - Aggregate statistics
- GET /api/dashboard/calls/analytics - Call details
- GET /api/dashboard/agents/{agent_id}/metrics - Agent metrics
- GET /api/dashboard/costs/breakdown - Cost analysis
- GET /api/dashboard/performance/trends - Performance trends
- GET /api/dashboard/calls/{call_id}/summary - Call summary
Query Parameters (most endpoints):
- time_range: 24h, 7d, 30d (default), 90d, custom
- from_date, to_date: ISO format dates for custom range
- agent_ids: Comma-separated agent IDs
- call_status: Filter by status
- min_duration, max_duration: Duration filters
- min_rating: Minimum satisfaction rating
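Building these query strings by hand is error-prone once comma-separated IDs and dates are involved. The sketch below assembles a stats URL from the parameters listed above using `urllib.parse.urlencode`. It assumes that a `custom` range needs both `from_date` and `to_date`; the server's actual validation rules are not documented here. `dashboard_stats_url` is an illustrative name.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

VALID_TIME_RANGES = ("24h", "7d", "30d", "90d", "custom")


def dashboard_stats_url(
    base_url: str,
    time_range: str = "30d",
    agent_ids: Sequence[str] = (),
    from_date: Optional[str] = None,
    to_date: Optional[str] = None,
) -> str:
    """Return the /api/dashboard/stats URL for the given filters."""
    if time_range not in VALID_TIME_RANGES:
        raise ValueError(f"time_range must be one of {VALID_TIME_RANGES}")
    params = {"time_range": time_range}
    if time_range == "custom":
        # Assumption: a custom range is only meaningful with both bounds.
        if not (from_date and to_date):
            raise ValueError("custom time_range needs from_date and to_date")
        params["from_date"] = from_date
        params["to_date"] = to_date
    if agent_ids:
        params["agent_ids"] = ",".join(agent_ids)
    return f"{base_url}/api/dashboard/stats?{urlencode(params)}"
```

`urlencode` percent-encodes the comma in `agent_ids`, which a standards-compliant server decodes back to the comma-separated form.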
Example - Get Dashboard Stats:
curl "http://localhost:3000/api/dashboard/stats?time_range=7d" \
-H "Authorization: Bearer <token>"

POST /api/auth/login - Login and get tokens
POST /api/auth/refresh - Refresh access token
POST /api/auth/logout - Logout
GET /api/auth/me - Get current user info
GET /api/agents - List agents
POST /api/agents - Create agent
GET /api/agents/{agent_id} - Get agent details
PUT /api/agents/{agent_id} - Update agent
DELETE /api/agents/{agent_id} - Delete agent
POST /api/agents/{agent_id}/test - Test agent
GET /api/agents/{agent_id}/start - Start conversation
GET /api/agents/templates/* - Template management
GET /api/agents/models - Available models
GET /api/agents/models/pricing - Model pricing
GET /api/conversations - List conversations
POST /api/conversations - Create conversation
GET /api/conversations/{id} - Get conversation
GET /api/conversations/{id}/summary
GET /api/conversations/{id}/metadata
GET /api/conversations/{id}/stats
PATCH /api/conversations/{id}/metrics
POST /api/conversations/{id}/calculate-costs
POST /api/conversations/{id}/events
POST /api/knowledge/create - Create KB
GET /api/knowledge/ - List KBs
DELETE /api/knowledge/{kb_id} - Delete KB
POST /api/knowledge/{kb_id}/add-text
POST /api/knowledge/{kb_id}/add-file
POST /api/knowledge/{kb_id}/add-website
GET /api/knowledge/{kb_id}/search
POST /api/knowledge/{kb_id}/associate-agents
GET /api/knowledge/{kb_id}/agents
GET /api/voices - List voices
GET /api/voices/search - Search voices
GET /api/voices/trending - Trending voices
GET /api/voices/{voice_id} - Get voice details
POST /api/voices/synthesize - Text to speech
POST /api/voices/clone - Clone voice
DELETE /api/voices/{voice_id} - Delete voice
GET /api/voices/test - Test providers
POST /api/telephony/calls/outbound
POST /api/telephony/calls/outbound/streaming
GET /api/telephony/calls/{call_sid}
POST /api/telephony/calls/{call_sid}/hangup
GET /api/telephony/calls/{call_sid}/status
GET /api/telephony/phone-numbers/available
POST /api/telephony/phone-numbers/buy
WS /api/telephony/twilio/stream/{agent_id}
GET /api/dashboard/stats
GET /api/dashboard/calls/analytics
GET /api/dashboard/agents/{agent_id}/metrics
GET /api/dashboard/costs/breakdown
GET /api/dashboard/performance/trends
GET /api/dashboard/calls/{call_id}/summary
GET /api/dashboard/calls/summary
GET / - API info
GET /health - Health check
GET /config - Configuration info
Seamlessly switch between multiple AI providers:
- OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
- Google Gemini: Gemini 1.5 Pro, Gemini 1.5 Flash
- Anthropic Claude: Claude 3 Sonnet, Claude 3 Opus
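Switching providers mostly comes down to picking the right credentials and client for the `llm_provider` value in an agent's config. As a rough illustration, the lookup below maps a provider name to the matching environment variable from the .env listing. Only `"openai"` appears as a provider identifier in this README; `"gemini"` and `"claude"` are assumed identifiers, and `api_key_for` is a hypothetical helper rather than the platform's own code.

```python
import os
from typing import Mapping, Optional

# Environment variable names come from the .env listing; the provider
# identifiers "gemini" and "claude" are assumptions for this sketch.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "claude": "CLAUDE_API_KEY",
}


def api_key_for(provider: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key configured for provider, or raise if unavailable."""
    env = os.environ if env is None else env
    var_name = PROVIDER_KEY_VARS.get(provider.lower())
    if var_name is None:
        raise ValueError(f"Unknown LLM provider: {provider!r}")
    key = env.get(var_name, "")
    if not key:
        raise RuntimeError(f"{var_name} is not configured")
    return key
```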
Support for leading voice synthesis providers:
- ElevenLabs: Professional TTS with voice cloning
- OpenAI: Fast, reliable TTS
- Cartesia: Advanced voice synthesis with Sonic models
- Deepgram integration for speech-to-text
- Real-time streaming via WebSocket
- Multiple language support
- Confidence scoring and interim results
- Upload multiple file formats (PDF, Word, Excel, CSV, Text)
- Fetch content from websites
- Semantic search with ChromaDB
- Automatic text chunking and embedding
- Agent-specific knowledge base associations
- Outbound call initiation via Twilio
- Real-time audio streaming
- Automatic call queuing
- Working hours scheduling
- Call recording and metadata tracking
- Comprehensive call metrics
- Cost tracking per call, agent, and time period
- Performance trends and patterns
- Customer satisfaction ratings
- AI-powered conversation summaries
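Per-agent cost tracking can be pictured as a fold over conversation documents. The sketch below assumes each document carries `agent_id` and `call_metadata.costs` with the `llm_cost`, `voice_cost`, and `telephony_cost` fields shown in the conversation schema in this README, and that a call's total is simply their sum. The platform's real pricing logic is not documented here, and both function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def total_cost(costs: Mapping[str, float]) -> float:
    """Sum the three cost components of one call (assumed to be additive)."""
    return costs["llm_cost"] + costs["voice_cost"] + costs["telephony_cost"]


def cost_per_agent(conversations: Iterable[Mapping]) -> Dict[str, float]:
    """Aggregate total call cost by agent_id across conversation documents."""
    totals: Dict[str, float] = defaultdict(float)
    for conversation in conversations:
        costs = conversation["call_metadata"]["costs"]
        totals[str(conversation["agent_id"])] += total_cost(costs)
    return dict(totals)
```

In production the same grouping would more likely run as a database aggregation; the in-memory version is only meant to make the arithmetic explicit.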
Pre-built templates for:
- Sales and Customer Service
- Healthcare and Medical
- Real Estate
- Education
- Financial Services
- Technical Support
- And more...
Each template includes customizable system prompts, personality traits, and operational guardrails.
{
_id: ObjectId,
name: String,
description: String,
userId: String,
organizationId: String,
phone_number: String,
status: String,
config: {
llm_provider: String,
model: String,
temperature: Number,
max_tokens: Number,
system_prompt: String,
voice_provider: String,
voice_id: String,
// ... more config fields
},
analytics: {
total_calls: Number,
successful_calls: Number,
total_duration: Number,
total_cost: Number
},
created_at: Date,
updated_at: Date
}

{
_id: ObjectId,
agent_id: ObjectId|String,
session_id: String,
userId: String,
messages: [{
role: String,
content: String,
timestamp: Date,
metadata: Object
}],
call_metadata: {
call_sid: String,
from_number: String,
to_number: String,
duration: Number,
status: String,
recording_url: String,
costs: {
llm_cost: Number,
voice_cost: Number,
telephony_cost: Number,
total_cost: Number
}
},
summary: String,
evaluation_score: Number,
created_at: Date
}

{
_id: ObjectId,
name: String,
description: String,
owner_id: String,
associated_agents: [String],
document_count: Number,
collection_name: String,
created_at: Date
}

{
_id: ObjectId,
name: String,
voice_id: String,
description: String,
category: String,
gender: String,
language: String,
is_custom: Boolean,
userId: String,
provider: String,
settings: {
stability: Number,
similarity_boost: Number,
style: Number
},
usage_statistics: {
usage_count: Number,
total_characters: Number
},
created_at: Date
}

Knowledge bases are stored as ChromaDB collections with:
- Document embeddings (vector format)
- Text content chunks
- Metadata (source, file type, chunk index)
- Similarity scores for retrieval
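Before anything is embedded, documents have to be split into chunks that carry the metadata listed above. The chunking strategy the platform uses is not specified in this README, so the following is a sketch of one common approach: fixed-size character windows with a small overlap. `chunk_text`, the 500-character window, and the 50-character overlap are all assumptions.

```python
from typing import Dict, List


def chunk_text(
    text: str, source: str, chunk_size: int = 500, overlap: int = 50
) -> List[Dict]:
    """Split text into overlapping chunks tagged with source and chunk index."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[Dict] = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(
            {
                "content": text[start : start + chunk_size],
                "metadata": {"source": source, "chunk_index": index},
            }
        )
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of some duplicated storage.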
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "3000"]

Build and run:
docker build -t ai-agent-platform .
docker run -p 3000:3000 --env-file .env ai-agent-platform

version: '3.8'
services:
  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      - MONGODB_URL=mongodb://mongo:27017/ai_agents
      - REDIS_URL=redis://redis:6379
    depends_on:
      - mongo
      - redis
  mongo:
    image: mongo:5.0
    ports:
      - "27017:27017"
    volumes:
      - mongo_data:/data/db
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
volumes:
  mongo_data:
  redis_data:

Run with:
docker-compose up -d

Development:
DEBUG=true
FASTAPI_PORT=3000
JWT_ACCESS_SECRET=dev-secret

Production:
DEBUG=false
FASTAPI_PORT=3000
# Use strong secrets and secure URLs
JWT_ACCESS_SECRET=<generate-strong-secret>
JWT_REFRESH_SECRET=<generate-strong-secret>
WEBHOOK_BASE_URL=https://your-production-domain.com

- API Keys: Store in environment variables, never commit to version control
- CORS: Configure appropriately for your domain
- JWT Secrets: Use strong, randomly generated secrets
- HTTPS: Always use HTTPS in production
- Rate Limiting: Implement at load balancer level
- Database: Use authentication and run in private network
- Logging: Avoid logging sensitive information
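For the JWT secrets in particular, Python's standard `secrets` module is one straightforward way to produce strong random values. The sketch below generates a separate value for each of the two JWT variables named in the configuration section. The 32-byte default (256 bits, matching the HS256 hash size) and the helper name `generate_jwt_secrets` are choices made for this example.

```python
import secrets
from typing import Dict


def generate_jwt_secrets(num_bytes: int = 32) -> Dict[str, str]:
    """Return independent URL-safe random secrets for access and refresh tokens."""
    return {
        "JWT_ACCESS_SECRET": secrets.token_urlsafe(num_bytes),
        "JWT_REFRESH_SECRET": secrets.token_urlsafe(num_bytes),
    }
```

The resulting values can be pasted into a secrets manager or an untracked .env file. Using different secrets for access and refresh tokens means a leak of one does not allow forging the other.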
app/
├── __init__.py
├── config.py          # Configuration management
├── database.py        # Database initialization
├── main.py            # FastAPI app setup
├── agents/            # Agent management module
├── conversations/     # Conversation management
├── knowledge/         # Knowledge base (RAG)
├── voice/             # Voice services
├── telephony/         # Telephony system
├── dashboard/         # Analytics dashboard
├── auth/              # Authentication
└── data/              # Data files (pricing, etc.)
pip install pytest pytest-asyncio pytest-cov httpx
pytest tests/ -v
pytest tests/ --cov=app --cov-report=html

- Create a feature branch: git checkout -b feature/my-feature
- Make changes and commit: git commit -am 'Add feature'
- Push to branch: git push origin feature/my-feature
- Submit a pull request
- Use Black for formatting: black app/
- Use isort for imports: isort app/
- Lint with Flake8: flake8 app/
Enable verbose logging:
import logging

logging.basicConfig(level=logging.DEBUG)

Access debug endpoints:
- /api/telephony/debug/agent/{agent_id}
- /api/telephony/debug/validate-numbers
- /api/telephony/debug/call-state/{call_sid}
mongosh --eval "db.adminCommand('ping')"
# Verify connection string in .env
# Default: mongodb://localhost:27017/ai_agents

redis-cli ping
# Verify connection string
# Default: redis://localhost:6379

ls -la ./chroma_data
# For Chroma Cloud, verify credentials
# Ensure CHROMA_CLOUD_API_KEY, CHROMA_CLOUD_TENANT, CHROMA_CLOUD_DATABASE are set

- Verify TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN
- Ensure phone numbers are verified in Twilio Console
- Check webhook URL is accessible: WEBHOOK_BASE_URL
- Review Twilio logs for error details
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY"

| Error | Solution |
|---|---|
| "API key not configured" | Check environment variables |
| "Connection refused" | Verify MongoDB/Redis running |
| "Phone number not verified" | Add to Twilio verified caller IDs |
| "Knowledge base not found" | Verify KB ID and user ownership |
| "Voice not found" | Check voice provider configuration |
- Agents Module Documentation
- Conversations Module Documentation
- Knowledge Base Documentation
- Voice Services Documentation
- Telephony System Documentation
- Dashboard Documentation
For issues, questions, or feature requests:
- Check existing GitHub issues
- Review documentation and troubleshooting guide
- Enable debug logging and check application logs
- Contact the development team
Built with:
- FastAPI
- MongoDB & Motor
- ChromaDB
- Twilio
- Deepgram
- OpenAI, Google Gemini, Anthropic Claude
- ElevenLabs, Cartesia
- And many more open-source libraries