Multi-PDF Chatbot

This project is a Python application that combines PDF text extraction, document question answering, language translation, and email automation. Users upload PDF files; the application extracts their text and stores vector embeddings of it in Chroma DB. Users can then ask targeted questions about the documents, translate the answers into another language, and send the results by email.

Features

Multi-PDF option for users to handle and analyze multiple PDF documents
PDF text extraction with PyPDF2 and vector storage in Chroma DB
Targeted question-based chatbot for document information
Language translation capabilities for multilingual document understanding
Automated email sending through the Gmail API

Technologies Used

Frontend: Streamlit
Backend: Python
Vector Database: Chroma DB
PDF Handling: PyPDF2
AI Libraries:
- OpenAI: langchain.llms
- LangChain:
  - langchain.prompts
  - langchain.chains
  - CharacterTextSplitter
  - OpenAIEmbeddings
Authentication and Authorization:
- Google Auth: google-auth
- Google Auth OAuthLib: google-auth-oauthlib
- Google Auth HTTPLib2: google-auth-httplib2
Google API Integration:
- Google API Python Client: google-api-python-client
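The Google libraries listed above are used for sending mail. The Gmail API's send method expects an RFC 2822 message encoded as base64url in a `raw` field. The following is a minimal sketch of building that payload with the standard library only; the function name `build_gmail_payload` and its parameters are hypothetical and are not taken from this project's mail.py.

```python
import base64
from email.message import EmailMessage


def build_gmail_payload(sender: str, to: str, subject: str, body: str) -> dict:
    """Build the request body for the Gmail API users.messages.send method.

    Hypothetical helper for illustration; mail.py may be organised differently.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    # Gmail expects the full message as a base64url-encoded string.
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    return {"raw": raw}


# With an authorized client from google-api-python-client (not created here):
#   service = build("gmail", "v1", credentials=creds)
#   service.users().messages().send(userId="me", body=payload).execute()
```

Keeping payload construction separate from the network call makes it possible to check the message contents without Google credentials.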

Usage

Prerequisites

Before running the application, make sure you have an OpenAI API key and Google Cloud credentials for a project with the Gmail API enabled.

Getting Started

Clone this repository to your local machine.
Create a secret_key.py file in the project root directory.

Add your OpenAI API key to the secret_key.py file:

# secret_key.py
openapi_key="YOUR_API_KEY_HERE"

Download the credentials JSON file for your Gmail API project from Google Cloud Platform (GCP), rename it to credentials.json, and save it in the project root directory.
Install the required dependencies using:
```
pip install -r requirements.txt
```
Run the application using streamlit run app.py.
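As an illustration of the secret_key.py step, the sketch below shows one way the application might make the key available to the OpenAI client libraries, which read the `OPENAI_API_KEY` environment variable. The helper name `configure_openai_key` is hypothetical, and app.py may instead pass the key directly.

```python
import os


def configure_openai_key(key: str) -> None:
    """Expose the key through the OPENAI_API_KEY environment variable.

    Hypothetical helper; rejects an empty key or the unedited placeholder.
    """
    if not key or key == "YOUR_API_KEY_HERE":
        raise ValueError("Replace the placeholder in secret_key.py with a real OpenAI API key")
    os.environ["OPENAI_API_KEY"] = key


# Typical use at the top of app.py (assumes secret_key.py exists as described):
#   from secret_key import openapi_key
#   configure_openai_key(openapi_key)
```

Failing early on the placeholder value gives a clearer error than an authentication failure on the first question.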

How it Works

When you upload one or more PDFs, the application extracts their text and splits it into manageable chunks. Each chunk is embedded and stored in the Chroma vector database. When you ask a question, the system runs a semantic search over the stored chunks and uses OpenAI's large language model (LLM) to answer from the most relevant ones. You can optionally translate the answer into another language, also using OpenAI. Finally, the Gmail API lets you email the analyzed and translated content so that it can be shared or archived. The workflow is aimed at professionals, researchers, and students who need document handling, information retrieval, and multilingual communication in one place.
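The chunk, embed, and search steps can be sketched without any external services. The example below is a simplified stand-in, not the project's implementation: it uses a word-count vector in place of OpenAIEmbeddings and an in-memory list in place of Chroma. All function names and the chunk sizes are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Example: two short "documents", one question.
pages = [
    "Invoices are due within 30 days of receipt.",
    "The warranty covers parts and labour for two years.",
]
chunks = [c for page in pages for c in split_text(page, chunk_size=60, overlap=10)]
context = top_chunks("How long is the warranty?", chunks, k=1)
print(context)
```

In the real application the retrieved chunks would be passed to the LLM together with the question. The overlap between chunks reduces the chance that a sentence relevant to the question is cut in half at a chunk boundary.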

