FastAPI-based AI voice agent platform for intelligent phone automation with multi-LLM support, real-time transcription, and comprehensive analytics.

ajitashwath/callmind

Call Mind: AI Telephony Based Agent Platform

A comprehensive, production-ready platform for creating, managing, and deploying AI-powered voice agents with support for multiple LLM providers, real-time voice synthesis, knowledge base integration, and intelligent call routing.

Quick Start

Prerequisites

  • Python 3.8+
  • MongoDB (local or cloud)
  • Redis (for call queuing and caching)
  • API Keys for:
    • OpenAI
    • Google Gemini
    • Anthropic Claude
    • ElevenLabs
    • Deepgram
    • Twilio
    • Serper API

Installation

git clone https://github.com/ajitashwath/callmind.git
cd callmind

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

cp .env.example .env

Environment Configuration

Edit the .env file in the project root (created by the cp command above) and set the following variables:

# Application
APP_NAME=AI Agent Platform
DEBUG=false
SECRET_KEY=your-secret-key-here
FASTAPI_PORT=3000

# Database
MONGODB_URL=mongodb://localhost:27017/ai_agents
REDIS_URL=redis://localhost:6379
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=

# LLM Providers
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=AIzaSy...
CLAUDE_API_KEY=sk-ant-...

# Voice Providers
ELEVENLABS_API_KEY=...
# OpenAI TTS reuses OPENAI_API_KEY from the LLM Providers section
CARTESIA_API_KEY=...

# Deepgram (Speech-to-Text & Pricing)
DEEPGRAM_API_KEY=...

# Telephony
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=...
WEBHOOK_BASE_URL=https://your-domain.com/agent

# ChromaDB (Vector Storage)
# Chroma Cloud settings are optional; local storage is used by default
CHROMA_CLOUD_API_KEY=
CHROMA_CLOUD_TENANT=
CHROMA_CLOUD_DATABASE=
CHROMA_PERSIST_DIR=./chroma_data

# JWT Authentication
JWT_ACCESS_SECRET=your-access-secret
JWT_REFRESH_SECRET=your-refresh-secret
JWT_ALGORITHM=HS256
JWT_ISSUER=jesty-crm
JWT_AUDIENCE=jesty-crm-users
BACKEND_API_URL=http://localhost:3000

# Search API
SERPER_API_KEY=...
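
A missing variable is easier to diagnose at startup than in the middle of a call. The sketch below checks a mapping of settings for blank or absent values. The variable names come from the sample above, but which of them are truly mandatory is an assumption made for illustration, and `missing_settings` is a hypothetical helper, not part of the project.

```python
import os

# Names taken from the sample .env above. Treating exactly these as
# mandatory is an assumption made for this sketch.
REQUIRED_SETTINGS = (
    "SECRET_KEY",
    "MONGODB_URL",
    "REDIS_URL",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "DEEPGRAM_API_KEY",
)


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required names that are unset or blank in `env`."""
    return [name for name in required if not str(env.get(name, "")).strip()]


# Check the current process environment and report what is missing.
missing = missing_settings(os.environ)
print("Missing settings:", ", ".join(missing) if missing else "none")
```

Calling `missing_settings(os.environ)` before the application starts gives a single list of everything that still needs a value.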

Running the Application

# Development
python app/main.py

# Production
uvicorn app.main:app --host 0.0.0.0 --port 3000 --workers 4

The API will be available at http://localhost:3000 with interactive documentation at /agent/docs.

Architecture Overview

AI Agent Platform
├── Authentication & Authorization
├── Agent Management
│   ├── Multi-LLM Support (OpenAI, Gemini, Claude)
│   ├── Voice Integration
│   └── Template Management
├── Conversation Management
│   ├── Message Tracking
│   ├── Cost Calculation
│   └── Summarization
├── Knowledge Base (RAG)
│   ├── Multi-format File Support
│   ├── Semantic Search
│   └── ChromaDB Vector Storage
├── Voice Services
│   ├── Text-to-Speech (ElevenLabs, OpenAI, Cartesia)
│   ├── Voice Cloning
│   └── Voice Management
├── Telephony System
│   ├── Twilio Integration
│   ├── Deepgram Real-time Transcription
│   ├── Call Routing & Queuing
│   └── WebSocket Streaming
└── Dashboard & Analytics
    ├── Call Metrics
    ├── Cost Tracking
    ├── Performance Analysis
    └── Agent Performance

Core Modules

1. Agents (app/agents/)

Create and manage AI agents with customizable configurations.

Key Features:

  • Multi-LLM provider support
  • Industry-specific templates
  • Voice configuration
  • Knowledge base integration
  • Auto-shift scheduling

Main Endpoints:

  • POST /api/agents - Create agent
  • GET /api/agents - List agents
  • PUT /api/agents/{agent_id} - Update agent
  • POST /api/agents/{agent_id}/test - Test agent
  • GET /api/agents/templates/industries - Browse templates

Example - Create an Agent:

curl -X POST http://localhost:3000/api/agents \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Sales Assistant",
    "description": "AI-powered sales representative",
    "config": {
      "llm_provider": "openai",
      "model": "gpt-4o-mini",
      "temperature": 0.7,
      "max_tokens": 1000,
      "system_prompt": "You are a friendly sales representative...",
      "first_message": "Hello! How can I help you today?",
      "voice_provider": "elevenlabs",
      "voice_id": "rachel",
      "max_conversation_turns": 10
    }
  }'
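
The same request can be prepared from Python with only the standard library. This is a minimal sketch: `build_create_agent_request` is a hypothetical helper, the payload is a shortened copy of the curl example, and the request is built without being sent so it can be inspected first.

```python
import json
import urllib.request


def build_create_agent_request(base_url, token, agent):
    """Build (but do not send) a POST /api/agents request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/agents",
        data=json.dumps(agent).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Shortened version of the payload shown in the curl example.
agent = {
    "name": "Sales Assistant",
    "config": {"llm_provider": "openai", "model": "gpt-4o-mini", "temperature": 0.7},
}
request = build_create_agent_request("http://localhost:3000", "<token>", agent)

# To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```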

2. Conversations (app/conversations/)

Manage conversation lifecycle, messages, and analytics.

Key Features:

  • Session-based conversation management
  • Message and event tracking
  • Multi-provider cost calculation
  • AI-powered summarization
  • Advanced filtering

Main Endpoints:

  • POST /api/conversations - Create conversation
  • GET /api/conversations - List conversations
  • GET /api/conversations/{id} - Get conversation details
  • GET /api/conversations/{id}/summary - Get AI summary
  • PATCH /api/conversations/{id}/metrics - Update metrics

Example - Create Conversation:

curl -X POST http://localhost:3000/api/conversations \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agent_abc123",
    "session_id": "session_xyz789",
    "from_number": "+1234567890",
    "to_number": "+0987654321"
  }'

3. Knowledge Base (app/knowledge/)

RAG-powered knowledge management with semantic search.

Supported Formats:

  • PDF documents
  • Word documents (.docx, .doc)
  • Excel spreadsheets (.xlsx, .xls)
  • CSV files
  • Plain text files
  • Website content (URLs)

Main Endpoints:

  • POST /api/knowledge/create - Create knowledge base
  • GET /api/knowledge/ - List knowledge bases
  • POST /api/knowledge/{kb_id}/add-text - Add text
  • POST /api/knowledge/{kb_id}/add-file - Upload file
  • POST /api/knowledge/{kb_id}/add-website - Add website
  • GET /api/knowledge/{kb_id}/search - Search knowledge base

Example - Create & Search Knowledge Base:

# Create knowledge base
KB_ID=$(curl -s -X POST http://localhost:3000/api/knowledge/create \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"name":"Company Docs","description":"Internal knowledge"}' \
  | jq -r '.kb_id')

# Add document
curl -X POST http://localhost:3000/api/knowledge/$KB_ID/add-text \
  -H "Authorization: Bearer <token>" \
  -F "content=Our business hours are 9 AM to 5 PM EST"

# Search
curl -X GET "http://localhost:3000/api/knowledge/$KB_ID/search?query=business%20hours&top_k=5" \
  -H "Authorization: Bearer <token>"
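
The search query has to be percent-encoded, which is easy to get wrong by hand. A small sketch of building the search URL in Python follows; `build_search_url` is a hypothetical helper, and the parameter names `query` and `top_k` are taken from the curl example above.

```python
from urllib.parse import quote, urlencode


def build_search_url(base_url, kb_id, query, top_k=5):
    """Return the search URL for a knowledge base with an encoded query string."""
    # quote_via=quote encodes spaces as %20, matching the curl example.
    params = urlencode({"query": query, "top_k": top_k}, quote_via=quote)
    kb_path = quote(kb_id, safe="")
    return f"{base_url.rstrip('/')}/api/knowledge/{kb_path}/search?{params}"


print(build_search_url("http://localhost:3000", "kb_123", "business hours"))
```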

4. Voice Services (app/voice/)

Multi-provider text-to-speech and voice cloning.

Supported Providers:

  • ElevenLabs (advanced TTS with cloning)
  • OpenAI TTS (standard and HD)
  • Cartesia (Sonic models)

Main Endpoints:

  • POST /api/voices/synthesize - Convert text to speech
  • POST /api/voices/clone - Create cloned voice
  • GET /api/voices - List voices
  • GET /api/voices/search - Search voices
  • PUT /api/voices/{voice_id}/settings - Update settings
  • GET /api/voices/test - Test provider connections

Example - Synthesize Speech:

curl -X POST http://localhost:3000/api/voices/synthesize \
  -H "Authorization: Bearer <token>" \
  -F "text=Hello, this is a test" \
  -F "voice_id=rachel" \
  -F "provider=elevenlabs" \
  -F "stability=0.7" \
  -F "similarity_boost=0.8" \
  --output speech.mp3

5. Telephony (app/telephony/)

Complete voice call management with real-time transcription.

Key Features:

  • Twilio integration for call routing
  • Deepgram real-time transcription
  • WebSocket streaming
  • Call queuing with scheduling
  • Metrics tracking and cost calculation

Main Endpoints:

  • POST /api/telephony/calls/outbound - Make outbound call
  • POST /api/telephony/calls/outbound/streaming - Stream call
  • GET /api/telephony/calls/{call_sid} - Get call status
  • POST /api/telephony/calls/{call_sid}/hangup - End call
  • WS /api/telephony/twilio/stream/{agent_id} - WebSocket stream

Example - Make Outbound Call:

curl -X POST http://localhost:3000/api/telephony/calls/outbound \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agent_abc123",
    "to_number": "+1234567890",
    "from_number": "+0987654321"
  }'
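
Twilio expects phone numbers in E.164 format (a leading +, a country code that does not start with 0, and at most 15 digits in total). The sketch below shows one way to check that shape before requesting a call. `is_e164` is a hypothetical helper and only validates the format, not whether the number exists or is verified.

```python
import re

# "+" followed by 2 to 15 digits, the first of which is not 0.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number):
    """Return True if `number` has the shape of an E.164 phone number."""
    return bool(E164_PATTERN.fullmatch(number))


for candidate in ("+1234567890", "1234567890", "+12 345 678"):
    print(candidate, is_e164(candidate))
```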

6. Dashboard (app/dashboard/)

Comprehensive analytics and performance monitoring.

Main Endpoints:

  • GET /api/dashboard/stats - Aggregate statistics
  • GET /api/dashboard/calls/analytics - Call details
  • GET /api/dashboard/agents/{agent_id}/metrics - Agent metrics
  • GET /api/dashboard/costs/breakdown - Cost analysis
  • GET /api/dashboard/performance/trends - Performance trends
  • GET /api/dashboard/calls/{call_id}/summary - Call summary

Query Parameters (most endpoints):

  • time_range: 24h, 7d, 30d (default), 90d, custom
  • from_date, to_date: ISO format dates for custom range
  • agent_ids: Comma-separated agent IDs
  • call_status: Filter by status
  • min_duration, max_duration: Duration filters
  • min_rating: Minimum satisfaction rating

Example - Get Dashboard Stats:

curl "http://localhost:3000/api/dashboard/stats?time_range=7d" \
  -H "Authorization: Bearer <token>"

🌐 API Endpoints

Authentication

POST   /api/auth/login              - Login and get tokens
POST   /api/auth/refresh            - Refresh access token
POST   /api/auth/logout             - Logout
GET    /api/auth/me                 - Get current user info

Agents

GET    /api/agents                  - List agents
POST   /api/agents                  - Create agent
GET    /api/agents/{agent_id}       - Get agent details
PUT    /api/agents/{agent_id}       - Update agent
DELETE /api/agents/{agent_id}       - Delete agent
POST   /api/agents/{agent_id}/test  - Test agent
GET    /api/agents/{agent_id}/start - Start conversation
GET    /api/agents/templates/*      - Template management
GET    /api/agents/models           - Available models
GET    /api/agents/models/pricing   - Model pricing

Conversations

GET    /api/conversations           - List conversations
POST   /api/conversations           - Create conversation
GET    /api/conversations/{id}      - Get conversation
GET    /api/conversations/{id}/summary
GET    /api/conversations/{id}/metadata
GET    /api/conversations/{id}/stats
PATCH  /api/conversations/{id}/metrics
POST   /api/conversations/{id}/calculate-costs
POST   /api/conversations/{id}/events

Knowledge Base

POST   /api/knowledge/create        - Create KB
GET    /api/knowledge/              - List KBs
DELETE /api/knowledge/{kb_id}       - Delete KB
POST   /api/knowledge/{kb_id}/add-text
POST   /api/knowledge/{kb_id}/add-file
POST   /api/knowledge/{kb_id}/add-website
GET    /api/knowledge/{kb_id}/search
POST   /api/knowledge/{kb_id}/associate-agents
GET    /api/knowledge/{kb_id}/agents

Voice Services

GET    /api/voices                  - List voices
GET    /api/voices/search           - Search voices
GET    /api/voices/trending         - Trending voices
GET    /api/voices/{voice_id}       - Get voice details
POST   /api/voices/synthesize       - Text to speech
POST   /api/voices/clone            - Clone voice
DELETE /api/voices/{voice_id}       - Delete voice
GET    /api/voices/test             - Test providers

Telephony

POST   /api/telephony/calls/outbound
POST   /api/telephony/calls/outbound/streaming
GET    /api/telephony/calls/{call_sid}
POST   /api/telephony/calls/{call_sid}/hangup
GET    /api/telephony/calls/{call_sid}/status
GET    /api/telephony/phone-numbers/available
POST   /api/telephony/phone-numbers/buy
WS     /api/telephony/twilio/stream/{agent_id}

Dashboard

GET    /api/dashboard/stats
GET    /api/dashboard/calls/analytics
GET    /api/dashboard/agents/{agent_id}/metrics
GET    /api/dashboard/costs/breakdown
GET    /api/dashboard/performance/trends
GET    /api/dashboard/calls/{call_id}/summary
GET    /api/dashboard/calls/summary

System

GET    /                            - API info
GET    /health                      - Health check
GET    /config                      - Configuration info

Features

Multi-LLM Support

Seamlessly switch between multiple AI providers:

  • OpenAI: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo
  • Google Gemini: Gemini 1.5 Pro, Gemini 1.5 Flash
  • Anthropic Claude: Claude 3 Sonnet, Claude 3 Opus

Voice Integration

Support for leading voice synthesis providers:

  • ElevenLabs: Professional TTS with voice cloning
  • OpenAI: Fast, reliable TTS
  • Cartesia: Advanced voice synthesis with Sonic models

Real-Time Transcription

  • Deepgram integration for speech-to-text
  • Real-time streaming via WebSocket
  • Multiple language support
  • Confidence scoring and interim results

Knowledge Base Management

  • Upload multiple file formats (PDF, Word, Excel, CSV, Text)
  • Fetch content from websites
  • Semantic search with ChromaDB
  • Automatic text chunking and embedding
  • Agent-specific knowledge base associations
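
To illustrate what text chunking means here, the sketch below splits text into fixed-size chunks that overlap slightly, so that a sentence cut at a boundary still appears whole in one chunk. The platform's actual chunking strategy and sizes are not documented in this README; `chunk_text` and its defaults are assumptions for illustration only.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split `text` into chunks of `chunk_size` characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Each chunk would then be embedded and stored with its metadata, such as the source and chunk index.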

Call Management

  • Outbound call initiation via Twilio
  • Real-time audio streaming
  • Automatic call queuing
  • Working hours scheduling
  • Call recording and metadata tracking
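
Working-hours scheduling comes down to deciding whether a call may be placed now or should wait in the queue. A minimal sketch of that decision follows. The 9:00 to 17:00 window, the `within_working_hours` name, and the omission of time zones and weekdays are all simplifying assumptions, not a description of the platform's scheduler.

```python
from datetime import datetime, time


def within_working_hours(moment, start=time(9, 0), end=time(17, 0)):
    """Return True if `moment` falls inside the [start, end) window."""
    current = moment.time()
    if start <= end:
        return start <= current < end
    # Window that crosses midnight, for example 22:00 to 06:00.
    return current >= start or current < end


now = datetime(2024, 1, 1, 10, 30)
print("place call" if within_working_hours(now) else "queue call")
```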

Analytics & Monitoring

  • Comprehensive call metrics
  • Cost tracking per call, agent, and time period
  • Performance trends and patterns
  • Customer satisfaction ratings
  • AI-powered conversation summaries

Industry Templates

Pre-built templates for:

  • Sales and Customer Service
  • Healthcare and Medical
  • Real Estate
  • Education
  • Financial Services
  • Technical Support
  • And more...

Each template includes customizable system prompts, personality traits, and operational guardrails.

Database Schema

MongoDB Collections

agents

{
  _id: ObjectId,
  name: String,
  description: String,
  userId: String,
  organizationId: String,
  phone_number: String,
  status: String,
  config: {
    llm_provider: String,
    model: String,
    temperature: Number,
    max_tokens: Number,
    system_prompt: String,
    voice_provider: String,
    voice_id: String,
    // ... more config fields
  },
  analytics: {
    total_calls: Number,
    successful_calls: Number,
    total_duration: Number,
    total_cost: Number
  },
  created_at: Date,
  updated_at: Date
}

conversations

{
  _id: ObjectId,
  agent_id: ObjectId|String,
  session_id: String,
  userId: String,
  messages: [{
    role: String,
    content: String,
    timestamp: Date,
    metadata: Object
  }],
  call_metadata: {
    call_sid: String,
    from_number: String,
    to_number: String,
    duration: Number,
    status: String,
    recording_url: String,
    costs: {
      llm_cost: Number,
      voice_cost: Number,
      telephony_cost: Number,
      total_cost: Number
    }
  },
  summary: String,
  evaluation_score: Number,
  created_at: Date
}
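
The `costs` sub-document lends itself to a simple roll-up. The sketch below assumes that `total_cost` is the sum of the three component costs, which the field names suggest but the schema does not state; `with_total` is a hypothetical helper.

```python
COST_COMPONENTS = ("llm_cost", "voice_cost", "telephony_cost")


def with_total(costs):
    """Return a copy of `costs` with total_cost set to the sum of its components."""
    result = dict(costs)
    # Missing components count as zero.
    result["total_cost"] = sum(result.get(name, 0.0) for name in COST_COMPONENTS)
    return result


print(with_total({"llm_cost": 0.25, "voice_cost": 0.5, "telephony_cost": 0.125}))
```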

knowledge_bases

{
  _id: ObjectId,
  name: String,
  description: String,
  owner_id: String,
  associated_agents: [String],
  document_count: Number,
  collection_name: String,
  created_at: Date
}

voices

{
  _id: ObjectId,
  name: String,
  voice_id: String,
  description: String,
  category: String,
  gender: String,
  language: String,
  is_custom: Boolean,
  userId: String,
  provider: String,
  settings: {
    stability: Number,
    similarity_boost: Number,
    style: Number
  },
  usage_statistics: {
    usage_count: Number,
    total_characters: Number
  },
  created_at: Date
}

ChromaDB Collections

Knowledge bases are stored as ChromaDB collections with:

  • Document embeddings (vector format)
  • Text content chunks
  • Metadata (source, file type, chunk index)
  • Similarity scores for retrieval
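
To make the retrieval step concrete, the sketch below ranks stored chunks against a query embedding using cosine similarity in plain Python. ChromaDB performs this kind of search internally, and the distance metric it uses depends on how the collection is configured, so treat this as an illustration of the idea. The two-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=5):
    """Rank (text, embedding) pairs by similarity to `query_vec`, best first."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy embeddings, invented for illustration.
chunks = [("business hours", [1.0, 0.0]), ("pricing", [0.0, 1.0])]
print(top_k([1.0, 0.0], chunks, k=1))
```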

Deployment

Docker Deployment

FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "3000"]

Build and run:

docker build -t ai-agent-platform .
docker run -p 3000:3000 --env-file .env ai-agent-platform

Docker Compose

version: '3.8'

services:
  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      - MONGODB_URL=mongodb://mongo:27017/ai_agents
      - REDIS_URL=redis://redis:6379
    depends_on:
      - mongo
      - redis
    
  mongo:
    image: mongo:5.0
    ports:
      - "27017:27017"
    volumes:
      - mongo_data:/data/db
    
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

volumes:
  mongo_data:
  redis_data:

Run with:

docker-compose up -d

Environment-Specific Configurations

Development:

DEBUG=true
FASTAPI_PORT=3000
JWT_ACCESS_SECRET=dev-secret

Production:

DEBUG=false
FASTAPI_PORT=3000
# Use strong secrets and secure URLs
JWT_ACCESS_SECRET=<generate-strong-secret>
JWT_REFRESH_SECRET=<generate-strong-secret>
WEBHOOK_BASE_URL=https://your-production-domain.com

Security Best Practices

  1. API Keys: Store in environment variables, never commit to version control
  2. CORS: Configure appropriately for your domain
  3. JWT Secrets: Use strong, randomly generated secrets
  4. HTTPS: Always use HTTPS in production
  5. Rate Limiting: Implement at load balancer level
  6. Database: Use authentication and run in private network
  7. Logging: Avoid logging sensitive information
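
For item 3, Python's standard `secrets` module is one way to generate suitable values. This is a small sketch: `generate_secret` is a hypothetical helper, and the 64-byte length is a conservative choice rather than a project requirement.

```python
import secrets


def generate_secret(num_bytes=64):
    """Return a URL-safe random string derived from `num_bytes` random bytes."""
    return secrets.token_urlsafe(num_bytes)


# Print lines that can be pasted into .env (a fresh value on every run).
for name in ("SECRET_KEY", "JWT_ACCESS_SECRET", "JWT_REFRESH_SECRET"):
    print(f"{name}={generate_secret()}")
```

Using a different value for each secret means that a leaked access secret does not also compromise refresh tokens.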

Development

Project Structure

app/
├── __init__.py
├── config.py              # Configuration management
├── database.py            # Database initialization
├── main.py                # FastAPI app setup
├── agents/                # Agent management module
├── conversations/         # Conversation management
├── knowledge/             # Knowledge base (RAG)
├── voice/                 # Voice services
├── telephony/             # Telephony system
├── dashboard/             # Analytics dashboard
├── auth/                  # Authentication
└── data/                  # Data files (pricing, etc.)

Running Tests

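# Install test dependencies (pytest-cov provides the --cov options)
pip install pytest pytest-asyncio pytest-cov httpx

# Run the test suite
pytest tests/ -v

# Run with an HTML coverage report
pytest tests/ --cov=app --cov-report=html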

Contributing

  1. Create a feature branch: git checkout -b feature/my-feature
  2. Make changes and commit: git commit -am 'Add feature'
  3. Push to branch: git push origin feature/my-feature
  4. Submit a pull request

Code Style

  • Use Black for formatting: black app/
  • Use isort for imports: isort app/
  • Lint with Flake8: flake8 app/

Debugging

Enable verbose logging:

import logging

logging.basicConfig(level=logging.DEBUG)

Access debug endpoints:

  • /api/telephony/debug/agent/{agent_id}
  • /api/telephony/debug/validate-numbers
  • /api/telephony/debug/call-state/{call_sid}

Troubleshooting

MongoDB Connection Issues

mongosh --eval "db.adminCommand('ping')"

# Verify connection string in .env
# Default: mongodb://localhost:27017/ai_agents

Redis Connection Issues

redis-cli ping

# Verify connection string
# Default: redis://localhost:6379

ChromaDB Issues

ls -la ./chroma_data

# For Chroma Cloud, verify credentials
# Ensure CHROMA_CLOUD_API_KEY, CHROMA_CLOUD_TENANT, CHROMA_CLOUD_DATABASE are set

Twilio Integration Issues

  1. Verify that TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN are set correctly
  2. Ensure phone numbers are verified in the Twilio Console
  3. Check that the webhook URL (WEBHOOK_BASE_URL) is reachable from the public internet
  4. Review the Twilio logs for error details

API Key Issues

# Check that the OpenAI key is accepted (an invalid key returns HTTP 401)
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"

Common Errors

| Error | Solution |
|-------|----------|
| "API key not configured" | Check that the relevant API key is set in the environment variables |
| "Connection refused" | Verify that MongoDB and Redis are running |
| "Phone number not verified" | Add the number to Twilio verified caller IDs |
| "Knowledge base not found" | Verify the KB ID and that the current user owns it |
| "Voice not found" | Check the voice provider configuration |

Support

For issues, questions, or feature requests:

  1. Check existing GitHub issues
  2. Review documentation and troubleshooting guide
  3. Enable debug logging and check application logs
  4. Contact the development team

Acknowledgments

Built with:

  • FastAPI
  • MongoDB & Motor
  • ChromaDB
  • Twilio
  • Deepgram
  • OpenAI, Google Gemini, Anthropic Claude
  • ElevenLabs, Cartesia
  • And many more open-source libraries
