Soundora 🎙️📚

An AI-powered voice conversation platform that transforms PDF and PowerPoint documents into interactive learning experiences through intelligent voice-based tutoring.

🚀 Overview

Soundora enables users to upload educational documents (PDF/PPTX) and engage in natural voice conversations with an AI tutor that understands the content and provides personalized explanations, answers questions, and guides learning through the material.

🖥️ Demo

Want to try Soundora?

Soundora is still in development and has no pricing model yet, so the platform is not publicly accessible; this avoids unexpected AI usage costs. If you would like to test the platform and try the voice-powered document tutoring, please reach out to me directly.

Contact for Demo Access: 📧 mohamednouichi06@gmail.com

I'll be happy to provide you with temporary access to explore the features, upload your documents, and experience the AI voice tutoring firsthand. When reaching out, please mention:

Your intended use case
Type of documents you'd like to test with
Any specific features you're most interested in

✨ Key Features

📄 Document Processing: Upload and process PDF and PowerPoint files with advanced text extraction
🎤 Voice Conversations: Natural voice interactions powered by VAPI integration
🤖 AI Tutoring: Context-aware AI responses using Claude 3.5 Sonnet and GPT-4o-mini
📊 Real-time Analytics: Track learning progress and conversation insights
🔒 Secure Authentication: User management with Clerk integration
💾 Session Management: Persistent chat sessions with conversation history
📱 Responsive Design: Mobile-optimized interface with PWA capabilities

🏗️ Architecture

Frontend Stack

Framework: Next.js 14 with TypeScript
Styling: TailwindCSS + shadcn/ui components
Authentication: Clerk
Deployment: Vercel

Backend Services

Database: Supabase (PostgreSQL)
Voice AI: VAPI integration
Document Processing: Python-based microservice on AWS App Runner
LLM Integration: Anthropic Claude 3.5 Sonnet + OpenAI GPT-4o-mini

🛠️ Technology Stack

Core Technologies

Next.js 14 - React framework with App Router
TypeScript - Type-safe development
TailwindCSS - Utility-first CSS framework
shadcn/ui - Modern component library

Backend & Services

Supabase - PostgreSQL database with real-time capabilities
Clerk - Authentication and user management
VAPI - Voice AI conversation platform
AWS App Runner - Containerized Python service deployment

AI & ML

Anthropic Claude 3.5 Sonnet - Primary LLM for tutoring
OpenAI GPT-4o-mini - Fallback LLM for cost optimization
VAPI Voice Pipeline - Speech-to-text and text-to-speech
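
The primary/fallback pairing listed above could be wired up roughly as follows. This is a minimal sketch, not the project's actual code: the README does not say when the fallback is chosen, so the sketch assumes it is used only when the primary call fails. `Complete` and `completeWithFallback` are hypothetical names, and the real Anthropic and OpenAI SDK calls would need to be wrapped to fit the `Complete` shape.

```typescript
// Hypothetical provider-agnostic signature: takes a prompt, resolves to the reply text.
type Complete = (prompt: string) => Promise<string>;

interface CompletionResult {
  text: string;
  usedFallback: boolean;
}

// Try the primary model first; if it throws, retry once with the fallback.
// Assumption: "fallback" means fallback on error, not cost-based routing.
async function completeWithFallback(
  prompt: string,
  primary: Complete,
  fallback: Complete
): Promise<CompletionResult> {
  try {
    return { text: await primary(prompt), usedFallback: false };
  } catch {
    // An error from the fallback is not caught here and propagates to the caller.
    return { text: await fallback(prompt), usedFallback: true };
  }
}
```

Passing the two providers in as functions keeps the routing logic independent of any SDK, which also makes it easy to exercise with stubs.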

Document Processing

Python - Backend processing service
PDF Libraries - Advanced PDF text extraction
PPTX Processing - PowerPoint content extraction
Intelligent Chunking - Context-aware document segmentation

📊 Database Schema

Core Tables

-- User management
users (id, clerk_id, email, tier, subscription_status, preferences)

-- Document storage and processing
documents (id, user_id, filename, processed_content, chunk_count, metadata)
document_chunks (id, document_id, content, page_number, chunk_type)

-- Conversation management
chat_sessions (id, user_id, document_id, session_type, voice_minutes)
messages (id, session_id, role, content, audio_url, context_used)

-- Analytics and feedback
usage (id, user_id, tokens_used, audio_minutes, cost_estimate)
feedback (id, user_id, session_id, rating, comment)
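
On the Next.js side, rows from these tables might be typed along the following lines. The column names come from the schema above; every column type is an assumption, since the README lists names only. `voiceMinutesByUser` is a hypothetical helper included to show the types in use.

```typescript
// Column names mirror the chat_sessions table; the types are assumptions.
interface ChatSession {
  id: string;
  user_id: string;
  document_id: string;
  session_type: string;
  voice_minutes: number;
}

// Column names mirror the messages table; the role values are assumptions.
interface Message {
  id: string;
  session_id: string;
  role: "user" | "assistant";
  content: string;
  audio_url: string | null;
  context_used: string | null;
}

// Hypothetical helper: total voice minutes per user across all sessions.
function voiceMinutesByUser(sessions: ChatSession[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const session of sessions) {
    totals[session.user_id] = (totals[session.user_id] ?? 0) + session.voice_minutes;
  }
  return totals;
}
```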

🚀 Getting Started

Prerequisites

Node.js 18+ and npm/yarn
Supabase account and project
Clerk account for authentication
VAPI account for voice capabilities
AWS account for document processing service

Environment Variables

# Next.js App
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up

# Supabase
NEXT_PUBLIC_SUPABASE_URL=
NEXT_PUBLIC_SUPABASE_ANON_KEY=
SUPABASE_SERVICE_ROLE_KEY=

# VAPI
NEXT_PUBLIC_VAPI_PUBLIC_KEY=
VAPI_PRIVATE_KEY=

# AI Services
ANTHROPIC_API_KEY=
OPENAI_API_KEY=

# Document Processing Service
DOCUMENT_PROCESSING_API_URL=
DOCUMENT_PROCESSING_API_KEY=

# Storage
NEXT_PUBLIC_STORAGE_BUCKET=
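
Because most of these values are blank by default, a small startup check can catch a missing key early. The sketch below is illustrative and not part of the repository: `findMissingEnvVars` is a hypothetical helper, and it assumes every variable listed above is required except the two sign-in/sign-up URLs, which already have defaults.

```typescript
// Names copied from the list above; treating all of them as required is an assumption.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
  "NEXT_PUBLIC_VAPI_PUBLIC_KEY",
  "VAPI_PRIVATE_KEY",
  "ANTHROPIC_API_KEY",
  "OPENAI_API_KEY",
  "DOCUMENT_PROCESSING_API_URL",
  "DOCUMENT_PROCESSING_API_KEY",
  "NEXT_PUBLIC_STORAGE_BUCKET",
] as const;

type EnvSource = Record<string, string | undefined>;

// Return the names of required variables that are unset or blank.
function findMissingEnvVars(env: EnvSource): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js context the function would typically be called with `process.env`, failing fast if the returned list is non-empty.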

🔄 Document Processing Pipeline

Upload: User uploads PDF/PPTX through Next.js interface
Storage: File stored in Supabase storage with metadata
Processing: Document sent to Python service on AWS App Runner
Extraction: Advanced text extraction with structure preservation
Chunking: Intelligent content segmentation for AI context
Persistence: Processed chunks stored in the database
Ready: Document available for voice conversations
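
To make the chunking step more concrete, here is one simple way such segmentation could work. It is only an illustration: the actual service is written in Python and its algorithm is not described in this README. The sketch packs whole paragraphs into chunks of at most `maxChars` characters and never lets a chunk span two pages. `chunkPages` and `maxChars` are hypothetical names.

```typescript
interface Chunk {
  content: string;
  page_number: number; // 1-based, mirroring document_chunks.page_number
}

// Pack the paragraphs of each page into chunks of at most maxChars characters.
// A single paragraph longer than maxChars is kept whole rather than split.
function chunkPages(pages: string[], maxChars: number): Chunk[] {
  const chunks: Chunk[] = [];
  pages.forEach((pageText, index) => {
    const paragraphs = pageText
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter((p) => p.length > 0);
    let current = "";
    for (const paragraph of paragraphs) {
      const candidate = current === "" ? paragraph : current + "\n\n" + paragraph;
      if (candidate.length > maxChars && current !== "") {
        // Adding this paragraph would overflow: close the chunk and start a new one.
        chunks.push({ content: current, page_number: index + 1 });
        current = paragraph;
      } else {
        current = candidate;
      }
    }
    if (current !== "") {
      chunks.push({ content: current, page_number: index + 1 });
    }
  });
  return chunks;
}
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which tends to matter when chunks are later handed to an LLM as context.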

🎤 Voice Conversation Flow

Session Init: Create chat session linked to processed document
VAPI Connection: Establish voice connection with document context
Voice Input: User speaks through microphone
STT: Speech converted to text via VAPI
AI Processing: Text + document context sent to Claude/GPT
Response Generation: AI generates contextual response
TTS: Response converted to speech
Audio Output: AI response played to user
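
The "AI Processing" step combines the transcribed question with document context. The README does not describe how that context is chosen, so the following is a rough sketch under a stated assumption: chunks are ranked by naive keyword overlap with the question, which is a stand-in technique and not necessarily what Soundora does. `selectContext`, `buildTutorPrompt`, and the prompt wording are all hypothetical.

```typescript
interface DocumentChunk {
  content: string;
  page_number: number;
}

// Lowercase the text and keep alphanumeric words longer than two characters.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 2);
}

// Rank chunks by how many of their words also appear in the question (assumed heuristic).
function selectContext(
  question: string,
  chunks: DocumentChunk[],
  limit: number
): DocumentChunk[] {
  const queryWords = tokenize(question);
  return chunks
    .map((chunk) => ({
      chunk,
      score: tokenize(chunk.content).filter((w) => queryWords.indexOf(w) !== -1).length,
    }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.chunk);
}

// Assemble the text sent to the LLM: page-tagged excerpts followed by the question.
function buildTutorPrompt(question: string, context: DocumentChunk[]): string {
  const excerpts = context
    .map((c) => `[page ${c.page_number}] ${c.content}`)
    .join("\n");
  return `Document excerpts:\n${excerpts}\n\nStudent question: ${question}`;
}
```

Tagging each excerpt with its page number lets the tutor point the learner back to the relevant page of the source document.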

📊 Performance & Scalability

Response Time: <2s average for document processing
Voice Latency: <500ms for real-time conversations
Scalability: Horizontal scaling with Vercel and AWS
Caching: Redis caching for frequently accessed content
Optimization: Lazy loading and code splitting for optimal performance

🙏 Acknowledgments

VAPI for voice AI infrastructure
Anthropic for Claude AI models
Supabase for backend services
Vercel for deployment platform
Clerk for authentication services

📞 Questions

For any questions, email mohamednouichi06@gmail.com

Soundora - Transforming documents into conversations, one voice at a time. 🎙️✨
