An AI-powered voice conversation platform that transforms PDF and PowerPoint documents into interactive learning experiences through intelligent voice-based tutoring.
Soundora enables users to upload educational documents (PDF/PPTX) and engage in natural voice conversations with an AI tutor that understands the content and provides personalized explanations, answers questions, and guides learning through the material.
Want to try Soundora?
Since we're currently in development and haven't implemented a pricing model yet, the platform isn't publicly accessible to prevent unexpected AI usage costs. However, if you're interested in testing the platform and experiencing the voice-powered document tutoring capabilities, please reach out to me directly.
Contact for Demo Access: 📧 mohamednouichi06@gmail.com
I'll be happy to provide you with temporary access to explore the features, upload your documents, and experience the AI voice tutoring firsthand. When reaching out, please mention:
- Your intended use case
- Type of documents you'd like to test with
- Any specific features you're most interested in
- 📄 Document Processing: Upload and process PDF and PowerPoint files with advanced text extraction
- 🎤 Voice Conversations: Natural voice interactions powered by VAPI integration
- 🤖 AI Tutoring: Context-aware AI responses using Claude 3.5 Sonnet and GPT-4o-mini
- 📊 Real-time Analytics: Track learning progress and conversation insights
- 🔒 Secure Authentication: User management with Clerk integration
- 💾 Session Management: Persistent chat sessions with conversation history
- 📱 Responsive Design: Mobile-optimized interface with PWA capabilities
- Framework: Next.js 14 with TypeScript
- Styling: TailwindCSS + shadcn/ui components
- Authentication: Clerk
- Deployment: Vercel
- Database: Supabase (PostgreSQL)
- Voice AI: VAPI integration
- Document Processing: Python-based microservice on AWS App Runner
- LLM Integration: Anthropic Claude 3.5 Sonnet + OpenAI GPT-4o-mini
- Next.js 14 - React framework with App Router
- TypeScript - Type-safe development
- TailwindCSS - Utility-first CSS framework
- shadcn/ui - Modern component library
- Supabase - PostgreSQL database with real-time capabilities
- Clerk - Authentication and user management
- VAPI - Voice AI conversation platform
- AWS App Runner - Containerized Python service deployment
- Anthropic Claude 3.5 Sonnet - Primary LLM for tutoring
- OpenAI GPT-4o-mini - Fallback LLM for cost optimization
- VAPI Voice Pipeline - Speech-to-text and text-to-speech
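
The stack lists Claude 3.5 Sonnet as the primary tutoring model and GPT-4o-mini as a fallback. This README does not say when the fallback is used, so the following is only a minimal sketch under one assumption: the primary model is tried first, and GPT-4o-mini answers if that call fails. `LlmProvider` and `completeWithFallback` are hypothetical names; real providers would wrap the vendor SDKs instead of the stubs used in a test.

```typescript
// Hypothetical primary/fallback chain; not Soundora's code or a vendor SDK API.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface LlmProvider {
  name: string;
  // Resolves with the model's reply; rejects on API errors or timeouts.
  complete(messages: ChatMessage[]): Promise<string>;
}

interface CompletionResult {
  provider: string;
  text: string;
}

// Try providers in order (e.g. Claude first, then GPT-4o-mini) and return the first success.
async function completeWithFallback(
  providers: LlmProvider[],
  messages: ChatMessage[],
): Promise<CompletionResult> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.complete(messages);
      return { provider: provider.name, text };
    } catch (err) {
      // Remember why this provider failed, then try the next one.
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

A cost-based router (for example, sending short follow-up questions straight to GPT-4o-mini) could reuse the same provider interface with a different selection rule.
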
- Python - Backend processing service
- PDF Libraries - Advanced PDF text extraction
- PPTX Processing - PowerPoint content extraction
- Intelligent Chunking - Context-aware document segmentation
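
The document processing service is written in Python, and its chunking rules are not described here. As an illustration only, the TypeScript sketch below shows one simple form of structure-aware chunking: paragraphs on a page are packed into chunks of at most `maxChars` characters, and a new chunk starts at every block that looks like a heading. The heading heuristic, the `maxChars` default, and the `"heading"`/`"paragraph"` values for `chunk_type` are all assumptions.

```typescript
// Hypothetical chunk shape mirroring document_chunks (content, page_number, chunk_type).
type ChunkType = "heading" | "paragraph";

interface DocumentChunk {
  content: string;
  pageNumber: number;
  chunkType: ChunkType;
}

interface Page {
  pageNumber: number;
  text: string;
}

// Assumed heuristic: a short single-line block without trailing punctuation is a heading.
function isHeading(block: string): boolean {
  return block.length <= 80 && !block.includes("\n") && !/[.!?:;,]$/.test(block);
}

// Pack blank-line-separated paragraphs into chunks of at most maxChars,
// never merging across a heading or a page boundary.
// A single paragraph longer than maxChars is kept whole as its own chunk.
function chunkPages(pages: Page[], maxChars = 1200): DocumentChunk[] {
  const chunks: DocumentChunk[] = [];
  for (const page of pages) {
    const blocks = page.text
      .split(/\n\s*\n/)
      .map((b) => b.trim())
      .filter((b) => b.length > 0);
    let buffer = "";
    const flush = (): void => {
      if (buffer) {
        chunks.push({ content: buffer, pageNumber: page.pageNumber, chunkType: "paragraph" });
        buffer = "";
      }
    };
    for (const block of blocks) {
      if (isHeading(block)) {
        flush();
        chunks.push({ content: block, pageNumber: page.pageNumber, chunkType: "heading" });
      } else if (buffer && buffer.length + 2 + block.length > maxChars) {
        flush();
        buffer = block;
      } else {
        buffer = buffer ? `${buffer}\n\n${block}` : block;
      }
    }
    flush();
  }
  return chunks;
}
```
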
```sql
-- User management
users (id, clerk_id, email, tier, subscription_status, preferences)

-- Document storage and processing
documents (id, user_id, filename, processed_content, chunk_count, metadata)
document_chunks (id, document_id, content, page_number, chunk_type)

-- Conversation management
chat_sessions (id, user_id, document_id, session_type, voice_minutes)
messages (id, session_id, role, content, audio_url, context_used)

-- Analytics and feedback
usage (id, user_id, tokens_used, audio_minutes, cost_estimate)
feedback (id, user_id, session_id, rating, comment)
```

- Node.js 18+ and npm/yarn
- Supabase account and project
- Clerk account for authentication
- VAPI account for voice capabilities
- AWS account for document processing service
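
Before filling in the variables below, it can help to fail fast at startup when one of them is missing. The check sketched here is hypothetical: the variable names come from the list below, but treating all of them as required (and skipping the two Clerk URL settings, which already have values) is an assumption.

```typescript
// Hypothetical startup check; which variables are required is an assumption.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
  "NEXT_PUBLIC_VAPI_PUBLIC_KEY",
  "VAPI_PRIVATE_KEY",
  "ANTHROPIC_API_KEY",
  "OPENAI_API_KEY",
  "DOCUMENT_PROCESSING_API_URL",
  "DOCUMENT_PROCESSING_API_KEY",
  "NEXT_PUBLIC_STORAGE_BUCKET",
] as const;

type RequiredEnvVar = (typeof REQUIRED_ENV_VARS)[number];
type AppEnv = Record<RequiredEnvVar, string>;

// Returns the validated variables, or throws one error listing every missing or blank one.
function loadEnv(source: Record<string, string | undefined>): AppEnv {
  const missing = REQUIRED_ENV_VARS.filter((name) => !source[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const env = {} as AppEnv;
  for (const name of REQUIRED_ENV_VARS) {
    env[name] = source[name] as string;
  }
  return env;
}
```

In a Next.js server module this would be called as `loadEnv(process.env)`. Next.js inlines `NEXT_PUBLIC_*` variables into the client bundle at build time, so those must be set when the app is built, not only at runtime.
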
```bash
# Next.js App
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up

# Supabase
NEXT_PUBLIC_SUPABASE_URL=
NEXT_PUBLIC_SUPABASE_ANON_KEY=
SUPABASE_SERVICE_ROLE_KEY=

# VAPI
NEXT_PUBLIC_VAPI_PUBLIC_KEY=
VAPI_PRIVATE_KEY=

# AI Services
ANTHROPIC_API_KEY=
OPENAI_API_KEY=

# Document Processing Service
DOCUMENT_PROCESSING_API_URL=
DOCUMENT_PROCESSING_API_KEY=

# Storage
NEXT_PUBLIC_STORAGE_BUCKET=
```

- Upload: User uploads PDF/PPTX through Next.js interface
- Storage: File stored in Supabase storage with metadata
- Processing: Document sent to Python service on AWS App Runner
- Extraction: Advanced text extraction with structure preservation
- Chunking: Intelligent content segmentation for AI context
- Persistence: Processed chunks stored in the database
- Ready: Document available for voice conversations
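
The upload-to-ready flow above can be sketched as one orchestration function. This is a hypothetical outline, not Soundora's code: storage, the App Runner service call, and database writes are passed in as `PipelineDeps` functions (in the real app they would presumably wrap Supabase Storage, an HTTP request to `DOCUMENT_PROCESSING_API_URL`, and inserts into `document_chunks`). The `DocumentStatus` values are an assumed field, since the schema does not list a status column.

```typescript
// Hypothetical status values; the README schema has no status column,
// so this could live in e.g. documents.metadata.
type DocumentStatus = "uploaded" | "processing" | "ready" | "failed";

interface ProcessedChunk {
  content: string;
  pageNumber: number;
  chunkType: string;
}

// Injected side effects; real versions would wrap Supabase Storage,
// the App Runner processing service, and database inserts.
interface PipelineDeps {
  storeFile(userId: string, filename: string, bytes: Uint8Array): Promise<string>;
  processDocument(storagePath: string): Promise<ProcessedChunk[]>;
  saveChunks(documentId: string, chunks: ProcessedChunk[]): Promise<void>;
  setStatus(documentId: string, status: DocumentStatus): Promise<void>;
}

const ALLOWED_EXTENSIONS = [".pdf", ".pptx"];

// Upload -> storage -> processing -> chunk persistence -> ready.
// Returns the number of chunks stored; marks the document "failed" if processing throws.
async function ingestDocument(
  deps: PipelineDeps,
  documentId: string,
  userId: string,
  filename: string,
  bytes: Uint8Array,
): Promise<number> {
  const lower = filename.toLowerCase();
  if (!ALLOWED_EXTENSIONS.some((ext) => lower.endsWith(ext))) {
    throw new Error(`Unsupported file type: ${filename}`);
  }
  const storagePath = await deps.storeFile(userId, filename, bytes);
  await deps.setStatus(documentId, "uploaded");
  try {
    await deps.setStatus(documentId, "processing");
    const chunks = await deps.processDocument(storagePath);
    await deps.saveChunks(documentId, chunks);
    await deps.setStatus(documentId, "ready");
    return chunks.length;
  } catch (err) {
    await deps.setStatus(documentId, "failed");
    throw err;
  }
}
```

Injecting the side effects keeps the ordering logic testable without Supabase or AWS credentials.
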
- Session Init: Create chat session linked to processed document
- VAPI Connection: Establish voice connection with document context
- Voice Input: User speaks through microphone
- STT: Speech converted to text via VAPI
- AI Processing: Text + document context sent to Claude/GPT
- Response Generation: AI generates contextual response
- TTS: Response converted to speech
- Audio Output: AI response played to user
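
The AI Processing step above sends the transcribed text together with document context to the model. How Soundora selects that context is not documented, so the sketch below swaps in the simplest possible technique, keyword overlap between the question and each stored chunk, and builds the message list for one turn. `selectContext` and `buildTutorMessages` are hypothetical, and how the messages are handed to VAPI is not covered here.

```typescript
interface Chunk {
  content: string;
  pageNumber: number;
}

type TutorMessage = { role: "system" | "user"; content: string };

// Lowercased alphanumeric words of 3+ characters (a crude stand-in for real retrieval).
function tokenize(text: string): Set<string> {
  const words: string[] = text.toLowerCase().match(/[a-z0-9]{3,}/g) ?? [];
  return new Set(words);
}

// Score each chunk by the number of distinct question words it contains; keep the top k matches.
function selectContext(question: string, chunks: Chunk[], k = 3): Chunk[] {
  const queryWords = tokenize(question);
  return chunks
    .map((chunk) => {
      let score = 0;
      tokenize(chunk.content).forEach((word) => {
        if (queryWords.has(word)) score++;
      });
      return { chunk, score };
    })
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Messages for one turn: the speech-to-text transcript plus the selected context.
function buildTutorMessages(transcript: string, chunks: Chunk[]): TutorMessage[] {
  const context = selectContext(transcript, chunks)
    .map((c) => `[page ${c.pageNumber}] ${c.content}`)
    .join("\n");
  return [
    {
      role: "system",
      content:
        "You are a voice tutor. Answer briefly, in speakable sentences, using this document context:\n" +
        (context || "(no matching context found)"),
    },
    { role: "user", content: transcript },
  ];
}
```

Keyword overlap misses paraphrases; embedding-based search over `document_chunks` is the more common choice, though the README does not say which approach Soundora uses.
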
- Response Time: <2s average for document processing
- Voice Latency: <500ms for real-time conversations
- Scalability: Horizontal scaling with Vercel and AWS
- Caching: Redis caching for frequently accessed content
- Optimization: Lazy loading and code splitting for optimal performance
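
The caching bullet mentions Redis. The cache-aside pattern behind it can be illustrated with a small in-memory stand-in: look up a key, and on a miss (or after the entry expires) load the value and store it with a time-to-live. `TtlCache` is a hypothetical class, not the Redis client API; with Redis the same idea maps to a `GET` followed by a `SET` with the `EX` option. The clock is injectable so expiry can be tested without waiting.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number;
}

// In-memory cache-aside helper with a per-entry time-to-live; a stand-in for Redis, not its API.
class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return the cached value, or load it, cache it, and return it.
  async getOrLoad(key: string, load: () => Promise<V>): Promise<V> {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await load();
    this.set(key, value);
    return value;
  }
}
```
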
- VAPI for voice AI infrastructure
- Anthropic for Claude AI models
- Supabase for backend services
- Vercel for deployment platform
- Clerk for authentication services
For any questions, email mohamednouichi06@gmail.com
Soundora - Transforming documents into conversations, one voice at a time. 🎙️✨