Edu Chatbot

RAG-based chatbot with a human-transfer mechanism

Edu Chatbot is a customer service chatbot application created for education enrichment businesses to automatically reply to customer inquiries. It manages inquiries across multiple channels, including websites, WhatsApp, WeChat, Telegram, and more.

Overview & Key Features

Edu Chatbot combines AI technologies with human oversight to ensure customer satisfaction and improve sales conversion:

  • 🤖 Intelligent Interaction: Leverages Retrieval-Augmented Generation (RAG) to respond to complex customer inquiries, with customization according to business needs.

  • 📚 Knowledge Base: Stores and indexes frequently asked questions (FAQs), course details, pricing information, and other business-critical data in a vector database for rapid, accurate retrieval.

  • 🎯 Personalized Recommendations: Gathers relevant student information such as age and interests to recommend suitable courses.

  • 🧠 Intent Classification: Identifies customer needs to provide targeted responses.

  • 😊 Sentiment Analysis: Detects customer satisfaction levels and escalates to human staff when sentiment crosses a pre-configured threshold (a minimal sketch follows this list).

  • 👨‍💼 Human-in-the-Loop Design:

Ensures quality customer service through a sophisticated handoff system that activates when:
    1. A customer explicitly requests to speak with a human representative
    2. The sentiment analysis module detects customer frustration or dissatisfaction
    3. Staff members proactively choose to intervene via the support dashboard

  • 🔄 Seamless Handoff: Enables staff to take over conversations when needed and return control to the chatbot once complex issues are resolved.

  • 📱 Dual Interface: Features a comprehensive demonstration UI with customer-facing chat (left panel) and staff support dashboard (right panel) views.
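
As a rough illustration of the sentiment-based escalation above, here is a minimal sketch using VADER via the vaderSentiment package; the threshold value and the escalation decision are illustrative assumptions, not the repo's actual configuration:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

ESCALATION_THRESHOLD = -0.3  # illustrative value; the real threshold is pre-configured in the app's config

analyzer = SentimentIntensityAnalyzer()

def should_escalate(message: str, threshold: float = ESCALATION_THRESHOLD) -> bool:
    """Flag a message for human handoff when the VADER compound score falls below the threshold."""
    compound = analyzer.polarity_scores(message)["compound"]  # compound score ranges from -1 (negative) to +1 (positive)
    return compound < threshold

# Example: a frustrated message is likely to trigger the handoff path
print(should_escalate("This is really frustrating, nothing you suggest works."))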

Demo

Check out Edu Chatbot in action: YouTube

The diagram below illustrates the complete interaction flow demonstrated in the video:

flowchart TD
  Start([Demo Start]) --> A
  A["Customer: Inquiries about courses"] --> B
  B["Chatbot: Intent Classification & Information Gathering"]
  
  B --> C1["Chatbot: Asks customer about age of student"]
  C1 --> C2["Customer: Provides age"]
  B --> C3["Chatbot: Asks customer about interest of student"]
  C3 --> C4["Customer: Shares interests"]
  
  C2 --> D["Chatbot: Course Recommendation with details - Description, Teacher info, Pricing, Schedule"]
  C4 --> D
  
  D --> E["Customer: Expresses concern about price and requests discount"]
  E --> E2["Chatbot: Not authorized to offer discounts"]
  
  E2 --> F["Support Staff: Notices situation and clicks the Take Over button"]
  F --> F2["Staff: Offers special discount"]
  
  F2 --> G["Customer: Accepts discounted offer"]
  G --> G1["Staff: Toggles back to chatbot"]
  G1 --> G2["Chatbot: Proceeds with enrollment"]
  G2 --> End([Enrollment Complete])
  
  classDef customer fill:#f9d5e5,stroke:#333,color:#000
  classDef chatbot fill:#e0f0ff,stroke:#333,color:#000
  classDef staff fill:#d5f9e5,stroke:#333,color:#000
  classDef endpoint fill:#f5f5f5,stroke:#333,color:#000
  
  class A,C2,C4,E,G customer
  class B,C1,C3,D,E2,G2 chatbot
  class F,F2,G1 staff
  class Start,End endpoint

Setup

Prerequisites

  • Python version 3.12+
  • Docker Desktop

Installation

  1. Clone the repository
git clone https://github.com/Jeanetted3v/edu_chatbot.git
cd edu_chatbot
  2. Configure environment variables
cp .env.example .env
# Edit .env file with your API keys and configurations
  3. Start the application using Docker Compose
docker compose up --build
  4. Access the application

Data Configuration - Local

  • Place your unstructured FAQ documents (PDF) and structured data Excel files in the ./data/data_to_ingest folder
  • In config/data_ingest.yaml, configure the paths under "local_doc" according to the file names and Excel sheet names
local_doc:
  paths:
    - path: ./data/data_to_ingest/excel.xlsx
      sheet: syn_data
    - path: ./data/data_to_ingest/rag_qna.pdf
  • In config/data_ingest.yaml, configure the ChromaDB collection name accordingly; the default is "syn_data"
embedder:
  similarity_metric: cosine
  persist_dir: ./data/embeddings
  collection: syn_data
  vector_store: chromadb

Data Configuration - Gdrive (Temporarily disabled)

  • Alternatively, configure Google Drive access
  • Generate and download Google Drive API credentials JSON file
  • Place your credentials file in a secure location
  • In config/data_ingest.yaml, configure the Google Drive settings:
gdrive:
  credentials_path: /path/to/your/credentials.json   # Path to your Google API credentials JSON file
gdrive_doc:
  - file_id: abcd123efg456                           # ID from Google Sheets URL
    file_type: sheets                                # For Google Sheets documents
  - file_id: abcd123efg456                           # ID from Google Docs URL
    file_type: docs                                  # For Google Docs documents
  # - file_id: your_drive_pdf_file_id_here
  #   file_type: pdf   # Support for PDF files (coming soon)

Technical Implementation Details

πŸ“ RAG or Long Context?

  • In view of recent advances in LLM context windows, this chatbot passes the ingested data directly to the LLM when the data is within a certain token count. If the token count exceeds that limit, it falls back to RAG instead (see the sketch below).
  • Token count is set as a configurable parameter in config/data_ingest.yaml
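
As a minimal sketch of this switch, assuming tiktoken for token counting (the threshold value and the two answer paths are illustrative, not the repo's actual API):

import tiktoken

TOKEN_THRESHOLD = 100_000  # illustrative; the real value is configured in config/data_ingest.yaml

def within_long_context_budget(text: str, threshold: int = TOKEN_THRESHOLD) -> bool:
    """Return True if the ingested data fits the long-context path, otherwise fall back to RAG."""
    encoding = tiktoken.get_encoding("cl100k_base")  # assumed tokenizer
    return len(encoding.encode(text)) <= threshold

# Hypothetical usage:
# reply = answer_with_full_context(query, docs) if within_long_context_budget(docs) else answer_with_rag(query)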

📂 Loading documents from Local or Google Drive

  • The education company can load data either from Google Drive or from local storage, for both structured and unstructured data ingestion.
  • This can be configured in config/data_ingest.yaml
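
A rough sketch of how this config-driven choice could be read, assuming OmegaConf for loading the YAML (the dispatch logic and any attribute names beyond those shown in the config snippets above are illustrative):

from omegaconf import OmegaConf

cfg = OmegaConf.load("config/data_ingest.yaml")

# Hypothetical dispatch: use Google Drive sources when configured, otherwise local files
if cfg.get("gdrive_doc"):
    sources = [(doc.file_id, doc.file_type) for doc in cfg.gdrive_doc]          # remote Sheets/Docs
else:
    sources = [(item.path, item.get("sheet")) for item in cfg.local_doc.paths]  # local Excel/PDF files
print(sources)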

✂️ Chunking

  • LangChain is used for chunking in the RAG pipeline.
  • Currently supports RecursiveCharacterTextSplitter and SemanticChunker, configurable in config/data_ingest.yaml
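
For reference, a minimal sketch of the two supported splitters using LangChain (chunk sizes and the embedding model are illustrative; the actual values live in config/data_ingest.yaml):

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

text = "..."  # raw document text loaded from a PDF or Excel sheet

# Option 1: recursive character splitting with a fixed size and overlap
recursive_chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_text(text)

# Option 2: semantic chunking, which splits at embedding-similarity boundaries
semantic_chunks = SemanticChunker(OpenAIEmbeddings()).split_text(text)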

πŸ” Embedding & Vector Database

  • Implements ChromaDB for lightweight, high-performance vector storage.
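
A minimal sketch of the persistent ChromaDB setup implied by the embedder config shown earlier (the sample document and query are illustrative):

import chromadb

# Persistent client pointing at the persist_dir from config/data_ingest.yaml
client = chromadb.PersistentClient(path="./data/embeddings")
collection = client.get_or_create_collection(
    name="syn_data",
    metadata={"hnsw:space": "cosine"},  # cosine distance, matching similarity_metric
)

collection.add(
    ids=["faq-001"],
    documents=["Sample FAQ entry: course schedules and pricing details."],
)
results = collection.query(query_texts=["When do courses run?"], n_results=1)
print(results["documents"])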

🤖 Agentic RAG

  • PydanticAI is used here for its simplicity and data validation features.
  • It provides direct, structured output, for example during the intent classification process.
  • For other LLM functions, the plain vanilla OpenAI API is used for simplicity and flexibility.
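
A minimal sketch of structured intent classification with PydanticAI (the intent labels, model name, and prompt are illustrative, and parameter names may differ across pydantic-ai versions):

from pydantic import BaseModel
from pydantic_ai import Agent

class IntentResult(BaseModel):
    intent: str        # e.g. "course_inquiry", "pricing", "human_agent_request"
    confidence: float

intent_agent = Agent(
    "openai:gpt-4o",                # illustrative model name
    result_type=IntentResult,       # validated, structured output
    system_prompt="Classify the customer's intent for an education enrichment chatbot.",
)

result = intent_agent.run_sync("How much does the robotics course for a 10-year-old cost?")
print(result.data.intent, result.data.confidence)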

💾 Saved Chat History

  • All chat histories are saved in MongoDB, which allows for tracing, further analysis and prompt enhancements.
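
A minimal sketch of saving and retrieving chat turns with PyMongo (database, collection, and field names are illustrative, not the repo's actual schema):

from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # connection string would normally come from .env
history = client["edu_chatbot"]["chat_history"]     # illustrative database/collection names

def save_turn(session_id: str, role: str, content: str) -> None:
    """Persist one chat turn for later tracing and analysis."""
    history.insert_one({
        "session_id": session_id,
        "role": role,                                # e.g. "customer", "chatbot", "staff"
        "content": content,
        "timestamp": datetime.now(timezone.utc),
    })

def load_history(session_id: str) -> list[dict]:
    """Retrieve a session's turns in chronological order."""
    return list(history.find({"session_id": session_id}).sort("timestamp", 1))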

📊 Evaluation

  • Metrics include answer relevancy, faithfulness, context precision and answer correctness.
  • Evaluation results are logged for continuous improvement of the system.
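
A minimal sketch of scoring these metrics with RAGAS (the sample row is purely illustrative, and metric imports may vary across ragas versions):

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness, context_precision, answer_correctness

eval_data = Dataset.from_dict({
    "question": ["How much is the robotics course?"],
    "answer": ["The robotics course costs $200 per term."],
    "contexts": [["Robotics course: $200 per term, Saturdays 10am."]],
    "ground_truth": ["$200 per term"],
})

scores = evaluate(
    eval_data,
    metrics=[answer_relevancy, faithfulness, context_precision, answer_correctness],
)
print(scores)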

Future Enhancements

Multi-Channel Integration

  • Implement direct integration with WhatsApp, WeChat, Telegram, and other messaging platforms
  • Develop a unified API layer for consistent experience across all communication channels
  • Enable channel-specific customizations while maintaining core functionality

Vector Database

  • To support more types of vector databases

LLM Models

  • To support more LLM models

Project Structure

ASCII Directory Tree (Complete Structure)

Edu_chatbot/
├── assets/
├── config/
├── data/
│   ├── data_to_ingest/
│   └── embeddings/
├── dockerfiles/
├── src/
│   ├── backend/
│   │   ├── api/
│   │   ├── chat/
│   │   ├── database/
│   │   ├── dataloaders/
│   │   ├── dataprocessor/
│   │   ├── evaluation/
│   │   ├── main/
│   │   ├── models/
│   │   ├── utils/
│   │   └── websocket/
│   └── frontend/
│       └── src/
│           └── app/
│               ├── components/
│               ├── services/
│               └── page.tsx
├── .dockerignore
├── .env
├── .gitignore
├── docker-compose.yml
├── README.md
├── requirements.in
└── requirements.txt

Mermaid Diagram (Visual Overview)

graph TD
    A[Edu_chatbot] --> C[config]
    A --> D[data]
    A --> F[src]
    
    D --> D1[data_to_ingest]
    D --> D2[embeddings]
    
    F --> F1[backend]
    F --> F2[frontend]
    
    F1 --> F1A[api]
    F1 --> F1B[chat]
    F1 --> F1C[database]
    F1 --> F1D[dataloaders]
    F1 --> F1E[dataprocessor]
    F1 --> F1F[main]
    F1 --> F1G[models]
    F1 --> F1H[utils]
    F1 --> F1I[websocket]
    F1 --> F1J[evaluation]
    
    F2 --> F2A[src]
    F2A --> F2A1[app]
    F2A1 --> F2A1A[components]
    F2A1 --> F2A1B[services]
    F2A1 --> F2A1C[page.tsx]

Collapsible sections (With Explanation)

Edu_chatbot (Root)
  • .dockerignore - Docker build exclusion patterns
  • .env - Environment variables
  • .gitignore - Git exclusion patterns
  • docker-compose.yml - Docker Compose configuration
  • README.md - Project documentation
  • requirements.in - Primary Python dependencies
  • requirements.txt - Pinned Python dependencies
assets/
  • Project assets (images, static files, etc.)
config/
  • Configuration files
data/
data_to_ingest/
  • Raw data for ingestion
embeddings/
  • Vector embeddings storage
dockerfiles/
  • Docker configuration files
src/
backend/
  • api/ - API endpoints
  • chat/ - Chat functionality
  • database/ - Database connections and models
  • dataloaders/ - Data loading utilities
  • dataprocessor/ - Data processing pipelines
  • evaluation/ - Evaluation pipeline
  • main/ - Application entry points
  • models/ - ML/AI models
  • utils/ - Utility functions
  • websocket/ - WebSocket handlers
frontend/
src/
app/
  • components/ - UI components
  • services/ - Frontend API service layer
  • page.tsx - Main entry point to frontend components

Tech Stack

🧠 OpenAI: LLM provider for natural language understanding and generation
🔐 PydanticAI: Agentic framework for data validation and structured outputs
⛓️ LangChain: Document processing and chunking
😊 VADER: Sentiment analysis
🔍 ChromaDB: Vector database for semantic search
💾 MongoDB: Chat history storage and data persistence
📁 Google Drive API: Remote data access and integration
⚡ FastAPI: Backend API framework
⚛️ Node.js/React: Frontend interface
🐳 Docker: Containerization and deployment
📊 RAGAS: RAG evaluation framework for measuring relevancy, faithfulness and correctness

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License
