SimoHypers/soundora

Soundora 🎙️📚

An AI-powered voice conversation platform that transforms PDF and PowerPoint documents into interactive learning experiences through intelligent voice-based tutoring.

🚀 Overview

Soundora enables users to upload educational documents (PDF/PPTX) and engage in natural voice conversations with an AI tutor that understands the content and provides personalized explanations, answers questions, and guides learning through the material.

🖥️ Demo

Want to try Soundora?

Soundora is still in development and has no pricing model yet, so the platform is not publicly accessible; this avoids unexpected AI usage costs. If you would like to test the voice-powered document tutoring, please reach out to me directly.

Contact for Demo Access: 📧 mohamednouichi06@gmail.com

I'll be happy to provide you with temporary access to explore the features, upload your documents, and experience the AI voice tutoring firsthand. When reaching out, please mention:

  • Your intended use case
  • Type of documents you'd like to test with
  • Any specific features you're most interested in

✨ Key Features

  • 📄 Document Processing: Upload and process PDF and PowerPoint files with advanced text extraction
  • 🎤 Voice Conversations: Natural voice interactions powered by VAPI integration
  • 🤖 AI Tutoring: Context-aware AI responses using Claude 3.5 Sonnet and GPT-4o-mini
  • 📊 Real-time Analytics: Track learning progress and conversation insights
  • 🔒 Secure Authentication: User management with Clerk integration
  • 💾 Session Management: Persistent chat sessions with conversation history
  • 📱 Responsive Design: Mobile-optimized interface with PWA capabilities

🏗️ Architecture

Frontend Stack

  • Framework: Next.js 14 with TypeScript
  • Styling: TailwindCSS + shadcn/ui components
  • Authentication: Clerk
  • Deployment: Vercel

Backend Services

  • Database: Supabase (PostgreSQL)
  • Voice AI: VAPI integration
  • Document Processing: Python-based microservice on AWS App Runner
  • LLM Integration: Anthropic Claude 3.5 Sonnet + OpenAI GPT-4o-mini

🛠️ Technology Stack

Core Technologies

  • Next.js 14 - React framework with App Router
  • TypeScript - Type-safe development
  • TailwindCSS - Utility-first CSS framework
  • shadcn/ui - Modern component library

Backend & Services

  • Supabase - PostgreSQL database with real-time capabilities
  • Clerk - Authentication and user management
  • VAPI - Voice AI conversation platform
  • AWS App Runner - Containerized Python service deployment

AI & ML

  • Anthropic Claude 3.5 Sonnet - Primary LLM for tutoring
  • OpenAI GPT-4o-mini - Fallback LLM for cost optimization
  • VAPI Voice Pipeline - Speech-to-text and text-to-speech
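
The README does not say when the fallback model is chosen. As a minimal sketch, assuming the fallback is used when the primary call fails, the primary/fallback pairing could look like the following. `Completion` and `completeWithFallback` are hypothetical names for illustration, not part of the Soundora codebase or of either vendor's SDK.

```typescript
// Hypothetical signature: any function that turns a prompt into a reply.
// In practice these would wrap the Claude and GPT-4o-mini clients.
type Completion = (prompt: string) => Promise<string>;

// Try the primary model first; if it throws, call the fallback once.
// Assumption: fallback-on-error. Cost-based routing would need a different rule.
async function completeWithFallback(
  prompt: string,
  primary: Completion,
  fallback: Completion,
): Promise<{ text: string; usedFallback: boolean }> {
  try {
    return { text: await primary(prompt), usedFallback: false };
  } catch {
    return { text: await fallback(prompt), usedFallback: true };
  }
}
```

Passing the two models in as plain functions keeps the routing logic testable without network access.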

Document Processing

  • Python - Backend processing service
  • PDF Libraries - Advanced PDF text extraction
  • PPTX Processing - PowerPoint content extraction
  • Intelligent Chunking - Context-aware document segmentation

📊 Database Schema

Core Tables

-- User management
users (id, clerk_id, email, tier, subscription_status, preferences)

-- Document storage and processing
documents (id, user_id, filename, processed_content, chunk_count, metadata)
document_chunks (id, document_id, content, page_number, chunk_type)

-- Conversation management
chat_sessions (id, user_id, document_id, session_type, voice_minutes)
messages (id, session_id, role, content, audio_url, context_used)

-- Analytics and feedback
usage (id, user_id, tokens_used, audio_minutes, cost_estimate)
feedback (id, user_id, session_id, rating, comment)
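
For application code, two of the tables above might be mirrored as TypeScript row types. This is a sketch only: the column names come from the schema summary, but every column type (string ids, numeric minutes) is an assumption, and the helpers `totalVoiceMinutes` and `chunksInReadingOrder` are hypothetical.

```typescript
// Column names are taken from the schema summary; the types are guesses.
interface DocumentChunk {
  id: string;
  document_id: string;
  content: string;
  page_number: number;
  chunk_type: string;
}

interface ChatSession {
  id: string;
  user_id: string;
  document_id: string;
  session_type: string;
  voice_minutes: number;
}

// Sum voice minutes across one user's sessions.
function totalVoiceMinutes(sessions: ChatSession[], userId: string): number {
  return sessions
    .filter((s) => s.user_id === userId)
    .reduce((sum, s) => sum + s.voice_minutes, 0);
}

// Return a copy of the chunks sorted by page number (input is not mutated).
function chunksInReadingOrder(chunks: DocumentChunk[]): DocumentChunk[] {
  return [...chunks].sort((a, b) => a.page_number - b.page_number);
}
```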

🚀 Getting Started

Prerequisites

  • Node.js 18+ and npm/yarn
  • Supabase account and project
  • Clerk account for authentication
  • VAPI account for voice capabilities
  • AWS account for document processing service

Environment Variables

# Next.js App
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up

# Supabase
NEXT_PUBLIC_SUPABASE_URL=
NEXT_PUBLIC_SUPABASE_ANON_KEY=
SUPABASE_SERVICE_ROLE_KEY=

# VAPI
NEXT_PUBLIC_VAPI_PUBLIC_KEY=
VAPI_PRIVATE_KEY=

# AI Services
ANTHROPIC_API_KEY=
OPENAI_API_KEY=

# Document Processing Service
DOCUMENT_PROCESSING_API_URL=
DOCUMENT_PROCESSING_API_KEY=

# Storage
NEXT_PUBLIC_STORAGE_BUCKET=
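
Note that Next.js exposes variables prefixed with `NEXT_PUBLIC_` to the browser, so the unprefixed keys above should only be read on the server. A small startup check can catch missing secrets early. The sketch below is an assumption about how one might do this; `REQUIRED_SERVER_VARS` and `missingEnvVars` are hypothetical names, and the list simply copies the server-side variable names from above.

```typescript
// Server-side names copied from the variable list above.
const REQUIRED_SERVER_VARS = [
  "CLERK_SECRET_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
  "VAPI_PRIVATE_KEY",
  "ANTHROPIC_API_KEY",
  "OPENAI_API_KEY",
  "DOCUMENT_PROCESSING_API_URL",
  "DOCUMENT_PROCESSING_API_KEY",
] as const;

// Return the names that are unset or blank in the given environment.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_SERVER_VARS.filter((name) => !env[name]?.trim());
}
```

On the server this could be called as `missingEnvVars(process.env)`, throwing if the returned list is non-empty.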

🔄 Document Processing Pipeline

  1. Upload: User uploads PDF/PPTX through Next.js interface
  2. File Storage: File stored in Supabase storage with metadata
  3. Processing: Document sent to Python service on AWS App Runner
  4. Extraction: Advanced text extraction with structure preservation
  5. Chunking: Intelligent content segmentation for AI context
  6. Chunk Storage: Processed chunks stored in the database
  7. Ready: Document available for voice conversations
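
The chunking step (step 5) is not described in detail here, and the actual service is written in Python. Purely to illustrate the idea, the sketch below packs paragraphs into chunks under a character budget. It is a simple stand-in, not the project's "intelligent chunking"; `chunkByParagraph` and `maxChars` are hypothetical names.

```typescript
// Split text on blank lines, then pack paragraphs into chunks of at most maxChars.
// A single paragraph longer than maxChars becomes its own oversized chunk.
function chunkByParagraph(text: string, maxChars: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    const candidate = current ? `${current}\n\n${p}` : p;
    if (candidate.length <= maxChars || current === "") {
      // Still within budget, or this is the first paragraph of a new chunk.
      current = candidate;
    } else {
      // Budget exceeded: close the current chunk and start a new one.
      chunks.push(current);
      current = p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Keeping paragraphs whole means a chunk never cuts a sentence in half, at the cost of uneven chunk sizes.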

🎤 Voice Conversation Flow

  1. Session Init: Create chat session linked to processed document
  2. VAPI Connection: Establish voice connection with document context
  3. Voice Input: User speaks through microphone
  4. STT: Speech converted to text via VAPI
  5. AI Processing: Text + document context sent to Claude/GPT
  6. Response Generation: AI generates contextual response
  7. TTS: Response converted to speech
  8. Audio Output: AI response played to user
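
Steps 3 to 8 can be read as one function per conversation turn. The sketch below is a hedged illustration with every external service injected as a plain function; it does not use the real VAPI, Anthropic, or OpenAI APIs, and in practice VAPI may orchestrate the speech steps itself. `TurnDeps`, `handleTurn`, and the prompt layout are all hypothetical.

```typescript
// Hypothetical dependencies: real STT/TTS would go through VAPI,
// and generateReply would call Claude or GPT.
interface TurnDeps {
  speechToText: (audio: Uint8Array) => Promise<string>;
  retrieveContext: (question: string) => Promise<string[]>;
  generateReply: (prompt: string) => Promise<string>;
  textToSpeech: (text: string) => Promise<Uint8Array>;
}

// Run one conversation turn: user audio in, tutor audio out.
async function handleTurn(audio: Uint8Array, deps: TurnDeps) {
  const question = await deps.speechToText(audio);           // step 4: STT
  const context = await deps.retrieveContext(question);      // document chunks
  const prompt =
    `Document context:\n${context.join("\n---\n")}\n\nQuestion: ${question}`;
  const reply = await deps.generateReply(prompt);            // steps 5 and 6
  const speech = await deps.textToSpeech(reply);             // step 7: TTS
  return { question, reply, audio: speech };
}
```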

📊 Performance & Scalability

  • Response Time: <2s average for document processing
  • Voice Latency: <500ms for real-time conversations
  • Scalability: Horizontal scaling with Vercel and AWS
  • Caching: Redis caching for frequently accessed content
  • Optimization: Lazy loading and code splitting for optimal performance

🙏 Acknowledgments

  • VAPI for voice AI infrastructure
  • Anthropic for Claude AI models
  • Supabase for backend services
  • Vercel for deployment platform
  • Clerk for authentication services

📞 Questions

For any questions, email mohamednouichi06@gmail.com


Soundora - Transforming documents into conversations, one voice at a time. 🎙️✨
