PDF Chatbot – Built a Retrieval-Augmented Generation (RAG) system for answering queries from PDF content with high accuracy. Integrated LangChain with OpenAI and Google Vertex AI for natural language understanding, used Qdrant DB for vector search, and implemented Redis + BullMQ for scalable background processing. Added secure user authentication via Clerk and optimized performance with caching and task decoupling.
- 🔗 Click here to try the PDF Chatbot
⚠️ Note: The server is hosted on Render.com's free tier, so the first request after uploading a file may take 40–50 seconds while the server spins up.
- RAG Pipeline – Enables accurate responses using context from uploaded PDFs.
- Multi-LLM Integration – Utilizes LangChain, OpenAI, and Google Vertex AI for enhanced natural language understanding.
- Scalable Backend – Built with Redis and BullMQ for efficient job queue management.
- Vector Search – Implements Qdrant DB for semantic search and quick retrieval of relevant content.
- Secure Authentication – Uses Clerk for user authentication and session management.
- Performance Optimized – Background job processing & caching with Redis for faster responses.
Frontend:
- Next.js – React framework for building the frontend interface.
Backend & Infrastructure:
- Node.js – Backend server.
- Redis – In-memory data store for caching.
- BullMQ – Job queue management.
- Qdrant DB – Vector database for semantic search.
- LangChain – LLM orchestration.
- OpenAI API – Language model integration.
- Google Vertex AI – Alternative LLM processing.
Authentication:
- Clerk – Authentication and session management.
📦client
┣ 📂app
┃ ┣ 📂components
┃ ┃ ┣ 📂ui
┃ ┃ ┣ 📜chat.jsx
┃ ┃ ┣ 📜dropdown-menu.jsx
┃ ┃ ┣ 📜theme-provider.jsx
┃ ┃ ┣ 📜toggle-theme.jsx
┃ ┃ ┗ 📜upload-file.jsx
┃ ┣ 📂utils
┃ ┃ ┗ 📜freeup-resource.jsx
┃ ┣ 📜favicon.ico
┃ ┣ 📜globals.css
┃ ┣ 📜layout.js
┃ ┗ 📜page.js
┣ 📂lib
┣ 📂public
┣ 📜.env
┣ 📜package-lock.json
┣ 📜package.json
📦server
┣ 📂config
┃ ┣ 📜ai-agent-465405-ccb3e8b3e185.json
┃ ┣ 📜openai.config.js
┃ ┗ 📜qdrantdb.config.js
┣ 📂controllers
┃ ┣ 📜chat.controller.js
┃ ┣ 📜file.controller.js
┃ ┗ 📜vectordb.controller.js
┣ 📂routes
┃ ┣ 📜chat.routes.js
┃ ┣ 📜file.routes.js
┃ ┗ 📜vectordb.routes.js
┣ 📂uploads
┣ 📜.env
┣ 📜.gitignore
┣ 📜app.js
┣ 📜package-lock.json
┣ 📜package.json
┣ 📜redis-connection.js
┗ 📜worker.js
- Clone the repository
git clone https://github.com/your-username/pdf-chatbot.git
cd pdf-chatbot
- Install dependencies
Client: npm install
Server: npm install
- Set environment variables
Create a .env file in each of the client and server directories and add:
Server (server/.env):
# Qdrant
QDRANT_URL=<url>
QDRANT_API_KEY=<key>
# Gemini (OpenAI-compatible endpoint)
GEMINI_API_KEY=<api-key>
BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
# Redis
HOST=<redis-host-url>
REDIS_PORT=<port>
REDIS_PASSWORD=<password>
Client (client/.env):
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=<publishable-key>
CLERK_SECRET_KEY=<secret-key>
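A small startup check can catch missing variables before the server tries to connect to Qdrant or Redis. This helper is hypothetical (not part of the repo's config files) and just illustrates the idea:

```javascript
// Hypothetical guard: fail fast at startup if a required variable is absent.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with an in-memory stand-in for process.env:
const fakeEnv = { QDRANT_URL: 'http://localhost:6333', QDRANT_API_KEY: 'secret' };
const qdrantUrl = requireEnv('QDRANT_URL', fakeEnv);
```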
- Run the development servers
Server (two processes):
1. npm run dev
2. npm run dev:worker
Client: npm run dev
- Upload a PDF – The system ingests and processes it in the background.
- Ask questions – The chatbot retrieves relevant content from the vector database.
- Receive context-aware answers – Powered by LLMs and RAG.
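Under the hood, "retrieves relevant content" means ranking stored chunks by vector similarity to the question's embedding. Qdrant performs this server-side; the sketch below (hypothetical `cosine` and `topK` helpers, illustrative chunk shape) shows the equivalent cosine-similarity ranking:

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks against a query embedding, highest similarity first.
function topK(queryVector, chunks, k = 3) {
  return chunks
    .map((c) => ({ ...c, score: cosine(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```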
- PDF Upload → Stored temporarily for processing.
- Text Extraction → Extracts text from PDF.
- Vector Embedding → Converts text into embeddings and stores in Qdrant.
- Question Processing → Retrieves relevant chunks using semantic search.
- LLM Query → Passes chunks to LLM for answer generation.
- Response Delivery → Sends answer back to the user.
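The text-extraction step feeds the embedder with overlapping chunks rather than the whole document. The project likely uses a LangChain text splitter for this; the hypothetical `chunkText` helper below just illustrates the chunk-plus-overlap idea:

```javascript
// Split text into fixed-size chunks with overlap, so context that falls
// on a chunk boundary still appears intact in at least one chunk.
function chunkText(text, size = 1000, overlap = 200) {
  if (overlap >= size) throw new Error('overlap must be smaller than size');
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}
```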
- Decoupled PDF ingestion using BullMQ background jobs.
- Cached frequent queries using Redis.
- Batched vector inserts to reduce Qdrant API calls.
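Batching the vector inserts means grouping embedding "points" so they can be upserted in a few large calls instead of one call per point. A minimal sketch (hypothetical `toBatches` helper; the collection name and point shape in the comment are illustrative):

```javascript
// Group points into fixed-size batches to reduce round trips to Qdrant.
function toBatches(points, batchSize = 100) {
  const batches = [];
  for (let i = 0; i < points.length; i += batchSize) {
    batches.push(points.slice(i, i + batchSize));
  }
  return batches;
}

// Usage sketch against a Qdrant client (names illustrative):
// for (const batch of toBatches(points, 100)) {
//   await client.upsert('pdf_chunks', { points: batch });
// }
```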
- All user sessions managed securely with Clerk.
- API routes protected via authentication middleware.
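Clerk's SDK ships ready-made Express middleware, so the repo does not need to hand-roll this; the simplified stand-in below only illustrates the gate each protected route applies (the `req.auth` shape is an assumption for the example):

```javascript
// Simplified auth gate: allow the request through only when a verified
// session (here, a populated req.auth.userId) is present.
function requireAuth(req, res, next) {
  if (req.auth && req.auth.userId) {
    return next();
  }
  res.statusCode = 401;
  res.end('Unauthorized');
}
```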