Edu Chatbot is a customer service chatbot application built for education-enrichment businesses to auto-reply to customer inquiries. It manages conversations across multiple channels, including websites, WhatsApp, WeChat, and Telegram.
Edu Chatbot combines AI technologies with human oversight to ensure customer satisfaction and improve sales conversion:
- 🤖 Intelligent Interaction: Leverages Retrieval-Augmented Generation (RAG) to respond to complex customer inquiries, with customization according to business needs.
- 📚 Knowledge Base: Stores and indexes frequently asked questions (FAQs), course details, pricing information, and other business-critical data in a vector database for rapid, accurate retrieval.
- 🎯 Personalized Recommendations: Gathers relevant student information, such as age and interests, to recommend suitable courses.
- 🧠 Intent Classification: Identifies customer needs to provide targeted responses.
- 📊 Sentiment Analysis: Detects customer satisfaction levels and escalates to human staff when a pre-configured threshold is reached.
- 👨‍💼 Human-in-the-Loop Design: Ensures quality customer service through a handoff system that activates when:
  1. A customer explicitly requests to speak with a human representative
  2. The sentiment analysis module detects customer frustration or dissatisfaction
  3. Staff members proactively choose to intervene via the support dashboard
- 🔄 Seamless Handoff: Enables staff to take over conversations when needed and return control to the chatbot once complex issues are resolved.
- 📱 Dual Interface: Features a comprehensive demonstration UI with customer-facing chat (left panel) and staff support dashboard (right panel) views.
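The three handoff triggers described above can be sketched as a small decision function. This is an illustrative sketch, not the project's actual code: the names `should_escalate`, `HandoffDecision`, and `SENTIMENT_THRESHOLD`, the keyword list, and the threshold value are all assumptions.

```python
from dataclasses import dataclass

# Assumed cutoff for a VADER-style compound score; the real project reads
# a pre-configured threshold from its config.
SENTIMENT_THRESHOLD = -0.5


@dataclass
class HandoffDecision:
    escalate: bool
    reason: str


def should_escalate(message: str, sentiment_score: float,
                    staff_override: bool = False) -> HandoffDecision:
    """Decide whether to hand the conversation to a human, checking the
    three triggers: staff takeover, explicit request, negative sentiment."""
    if staff_override:
        return HandoffDecision(True, "staff_takeover")
    if any(kw in message.lower() for kw in ("human", "agent", "real person")):
        return HandoffDecision(True, "explicit_request")
    if sentiment_score <= SENTIMENT_THRESHOLD:
        return HandoffDecision(True, "negative_sentiment")
    return HandoffDecision(False, "none")
```

In practice the sentiment score would come from the sentiment analysis module and the override flag from the staff dashboard's Take Over button.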
Check out Edu Chatbot in action: YouTube
The diagram below illustrates the complete interaction flow demonstrated in the video:
```mermaid
flowchart TD
    Start([Demo Start]) --> A
    A["Customer: Inquires about courses"] --> B
    B["Chatbot: Intent Classification & Information Gathering"]
    B --> C1["Chatbot: Asks customer about age of student"]
    C1 --> C2["Customer: Provides age"]
    B --> C3["Chatbot: Asks customer about interests of student"]
    C3 --> C4["Customer: Shares interests"]
    C2 --> D["Chatbot: Course Recommendation with details - Description, Teacher info, Pricing, Schedule"]
    C4 --> D
    D --> E["Customer: Expresses concern about price and requests discount"]
    E --> E2["Chatbot: Not authorized to offer discounts"]
    E2 --> F["Support Staff: Notices situation and clicks the Take Over button"]
    F --> F2["Staff: Offers special discount"]
    F2 --> G["Customer: Accepts discounted offer"]
    G --> G1["Staff: Toggles back to chatbot"]
    G1 --> G2["Chatbot: Proceeds with enrollment"]
    G2 --> End([Enrollment Complete])

    classDef customer fill:#f9d5e5,stroke:#333,color:#000
    classDef chatbot fill:#e0f0ff,stroke:#333,color:#000
    classDef staff fill:#d5f9e5,stroke:#333,color:#000
    classDef endpoint fill:#f5f5f5,stroke:#333,color:#000

    class A,C2,C4,E,G customer
    class B,C1,C3,D,E2,G2 chatbot
    class F,F2,G1 staff
    class Start,End endpoint
```
- Python version 3.12+
- Docker Desktop
- Clone the repository

  ```bash
  git clone https://github.com/Jeanetted3v/edu_chatbot.git
  cd edu_chatbot
  ```

- Configure environment variables

  ```bash
  cp .env.example .env
  # Edit the .env file with your API keys and configuration
  ```

- Start the application using Docker Compose

  ```bash
  docker compose up --build
  ```
- Access the application
  - Open the dual-interface demo UI in a web browser (port 8000)
  - Or interact directly with the backend API (port 3000)
- Place your unstructured FAQ documents (PDF) and structured Excel files in the /data/data_to_ingest folder
- In config/data_ingest.yaml, configure the paths under `local_doc` to match your file names and Excel sheet names
```yaml
local_doc:
  paths:
    - path: ./data/data_to_ingest/excel.xlsx
      sheet: syn_data
    - path: ./data/data_to_ingest/rag_qna.pdf
```
- In config/data_ingest.yaml, configure the ChromaDB collection name accordingly; the default is "syn_data"
```yaml
embedder:
  similarity_metric: cosine
  persist_dir: ./data/embeddings
  collection: syn_data
  vector_store: chromadb
```
- Or configure Google Drive access
- Generate and download Google Drive API credentials JSON file
- Place your credentials file in a secure location
- In config/data_ingest.yaml, configure the Google Drive settings:
```yaml
gdrive:
  credentials_path: /path/to/your/credentials.json  # Path to your Google API credentials JSON file
  gdrive_doc:
    - file_id: abcd123efg456  # ID from the Google Sheets URL
      file_type: sheets       # For Google Sheets documents
    - file_id: abcd123efg456  # ID from the Google Docs URL
      file_type: docs         # For Google Docs documents
    # - file_id: your_drive_pdf_file_id_here
    #   file_type: pdf        # Support for PDF files (coming soon)
```
- File IDs can be found in Google Drive URLs:
- For Google Sheets: https://docs.google.com/spreadsheets/d/FILE_ID_HERE/edit
- For Google Docs: https://docs.google.com/document/d/FILE_ID_HERE/edit
- For Drive files: https://drive.google.com/file/d/FILE_ID_HERE/view
🔍 RAG or Long Context?
- Given recent advancements in LLM context windows, this chatbot uses the LLM directly to retrieve information when the data fits within a certain token count. Above that limit, it falls back to RAG.
- The token count is a configurable parameter in config/data_ingest.yaml
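The switch above amounts to a simple threshold check. The sketch below is illustrative only: the function names, the default limit, and the crude whitespace token estimate are assumptions; the real project reads the threshold from config/data_ingest.yaml and would use a proper tokenizer (e.g. tiktoken).

```python
# Assumed threshold; in the project this comes from config/data_ingest.yaml.
MAX_LONG_CONTEXT_TOKENS = 30_000


def estimate_tokens(text: str) -> int:
    """Rough whitespace-based token estimate; a real tokenizer would replace this."""
    return len(text.split())


def choose_strategy(document_text: str,
                    limit: int = MAX_LONG_CONTEXT_TOKENS) -> str:
    """Return 'long_context' when the data fits the LLM's window, else 'rag'."""
    return "long_context" if estimate_tokens(document_text) <= limit else "rag"
```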
📂 Loading documents from Local or Google Drive
- Education companies can load both structured and unstructured data for ingestion either from Google Drive or locally.
- This can be configured in config/data_ingest.yaml
✂️ Chunking
- LangChain is used for chunking in the RAG pipeline.
- Currently supports the RecursiveCharacter and SemanticChunker splitters, configurable in config/data_ingest.yaml
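The recursive-character strategy splits on progressively finer separators (paragraphs, then lines, then words) and greedily re-merges pieces up to the chunk size. The project uses LangChain's implementation; the sketch below is a dependency-free approximation of the same idea, with all names being illustrative.

```python
def _merge(parts: list[str], sep: str, chunk_size: int) -> list[str]:
    """Greedily re-join split parts so chunks approach chunk_size."""
    chunks: list[str] = []
    current = ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current.strip():
                chunks.append(current)
            current = part
    if current.strip():
        chunks.append(current)
    return chunks


def recursive_split(text: str, chunk_size: int = 200,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text on the coarsest separator that keeps chunks under chunk_size."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    sep = separators[0]
    if sep == "":
        # Last resort: hard character cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    pieces: list[str] = []
    for part in text.split(sep):
        if len(part) <= chunk_size:
            pieces.append(part)
        else:
            pieces.extend(recursive_split(part, chunk_size, separators[1:]))
    return _merge(pieces, sep, chunk_size)
```

LangChain's real splitter adds chunk overlap and length functions on top of this basic recursion.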
🔢 Embedding & Vector Database
- Implements ChromaDB for lightweight, high-performance vector storage.
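ChromaDB performs the actual storage and retrieval in the project. As a dependency-free illustration of what the cosine-metric lookup (the `similarity_metric: cosine` setting above) does, here is a toy in-memory version; the names and toy embeddings are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], store: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return ids of the k stored embeddings most similar to the query."""
    ranked = sorted(store,
                    key=lambda doc_id: cosine_similarity(query, store[doc_id]),
                    reverse=True)
    return ranked[:k]
```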
🤖 Agentic RAG
- PydanticAI is used here for its simplicity and data validation features.
- It provides more direct, structured output, for example during the intent classification process.
- For other LLM functions, the plain vanilla OpenAI API is used for simplicity and flexibility.
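To show the structured-output idea PydanticAI provides (without depending on it), here is a stdlib-only sketch: the LLM's reply is parsed and validated against a fixed schema so the intent label is always one of the known values. The intent names and function are assumptions, not the project's actual taxonomy.

```python
import json
from enum import Enum


class Intent(str, Enum):
    """Hypothetical intent labels for an education chatbot."""
    COURSE_INQUIRY = "course_inquiry"
    PRICING = "pricing"
    ENROLLMENT = "enrollment"
    OTHER = "other"


def parse_intent(llm_reply: str) -> Intent:
    """Validate a JSON LLM reply like {"intent": "pricing"}; fall back to OTHER."""
    try:
        payload = json.loads(llm_reply)
        return Intent(payload["intent"])
    except (json.JSONDecodeError, KeyError, ValueError, TypeError):
        return Intent.OTHER
```

PydanticAI does this validation (and retries) automatically from a Pydantic model, which is why the project uses it for the intent classification step.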
💾 Saved Chat History
- All chat histories are saved in MongoDB, which allows for tracing, further analysis, and prompt enhancement.
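A chat-history record might look like the document below before being inserted into MongoDB (e.g. via pymongo's `insert_one`). The field names and role values are illustrative assumptions, not the project's actual schema.

```python
from datetime import datetime, timezone


def build_chat_record(session_id: str, role: str, content: str) -> dict:
    """Assemble one message document for the chat-history collection.

    Roles are assumed to be customer / chatbot / staff, matching the
    three parties in the handoff flow.
    """
    if role not in {"customer", "chatbot", "staff"}:
        raise ValueError(f"unknown role: {role}")
    return {
        "session_id": session_id,
        "role": role,
        "content": content,
        "timestamp": datetime.now(timezone.utc),  # BSON stores this natively
    }
```

Keeping a timestamp and role on every message is what makes later tracing and prompt analysis straightforward.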
📊 Evaluation
- Metrics include answer relevancy, faithfulness, context precision, and answer correctness.
- Evaluation results are logged for continuous improvement of the system.
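RAGAS computes these metrics with LLM and embedding judges. As a toy stand-in to make the idea concrete, the sketch below scores "faithfulness" as the fraction of answer sentences whose words all appear in the retrieved context; this is a deliberate simplification, not the RAGAS algorithm.

```python
def toy_faithfulness(answer: str, context: str) -> float:
    """Fraction of answer sentences fully supported by the context's vocabulary."""
    context_words = set(context.lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    supported = sum(
        all(w in context_words for w in s.lower().split()) for s in sentences
    )
    return supported / len(sentences)
```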
Multi-Channel Integration
- Implement direct integration with WhatsApp, WeChat, Telegram, and other messaging platforms
- Develop a unified API layer for consistent experience across all communication channels
- Enable channel-specific customizations while maintaining core functionality
Vector Database
- To support more types of vector database
LLM Models
- To support more LLM models
```
Edu_chatbot/
├── assets/
├── config/
├── data/
│   ├── raw/
│   └── embeddings/
├── dockerfiles/
├── src/
│   ├── backend/
│   │   ├── api/
│   │   ├── chat/
│   │   ├── database/
│   │   ├── dataloaders/
│   │   ├── dataprocessor/
│   │   ├── evaluation/
│   │   ├── main/
│   │   ├── models/
│   │   ├── utils/
│   │   └── websocket/
│   └── frontend/
│       └── src/
│           └── app/
│               ├── components/
│               ├── services/
│               └── page.tsx
├── .dockerignore
├── .env
├── .gitignore
├── docker-compose.yml
├── README.md
├── requirements.in
└── requirements.txt
```
```mermaid
graph TD
    A[Edu_chatbot] --> C[config]
    A --> D[data]
    A --> F[src]
    D --> D1[data_to_ingest]
    D --> D2[embeddings]
    F --> F1[backend]
    F --> F2[frontend]
    F1 --> F1A[api]
    F1 --> F1B[chat]
    F1 --> F1C[database]
    F1 --> F1D[dataloaders]
    F1 --> F1E[dataprocessor]
    F1 --> F1F[main]
    F1 --> F1G[models]
    F1 --> F1H[utils]
    F1 --> F1I[websocket]
    F1 --> F1J[evaluation]
    F2 --> F2A[src]
    F2A --> F2A1[app]
    F2A1 --> F2A1A[components]
    F2A1 --> F2A1B[services]
    F2A1 --> F2A1C[page.tsx]
```
Edu_chatbot (Root)
- .dockerignore - Docker build exclusion patterns
- .env - Environment variables
- .gitignore - Git exclusion patterns
- docker-compose.yml - Docker Compose configuration
- README.md - Project documentation
- requirements.in - Primary Python dependencies
- requirements.txt - Pinned Python dependencies
assets/
- Project assets (images, static files, etc.)
config/
- Configuration files
data/
data_to_ingest/
- Raw data for ingestion
embeddings/
- Vector embeddings storage
dockerfiles/
- Docker configuration files
src/
backend/
- api/ - API endpoints
- chat/ - Chat functionality
- database/ - Database connections and models
- dataloaders/ - Data loading utilities
- dataprocessor/ - Data processing pipelines
- evaluation/ - Evaluation pipeline
- main/ - Application entry points
- models/ - ML/AI models
- utils/ - Utility functions
- websocket/ - WebSocket handlers
frontend/
src/
app/
- components/ - UI components
- services/ - Frontend API services
- page.tsx - Main entry point to frontend components
🧠 OpenAI: LLM provider for natural language understanding and generation
🔧 PydanticAI: Agentic framework for data validation and structured outputs
✂️ Langchain: Document processing and chunking
📊 VADER: Sentiment analysis
🔍 ChromaDB: Vector database for semantic search
💾 MongoDB: Chat history storage and data persistence
🔗 Google Drive API: Remote data access and integration
⚡ FastAPI: Backend API framework
⚛️ NodeJS/React: Frontend interface
🐳 Docker: Containerization and deployment
📈 RAGAS: RAG evaluation framework for measuring relevancy, faithfulness, and correctness
Contributions are welcome! Please feel free to submit a Pull Request.