SmartDoc: Intelligent Document Processing with LLM Integration

SmartDoc is a document processing system that combines OCR with LLM integration to convert, analyze, and extract information from PDFs. It provides a user-friendly interface for document understanding and translation, and includes a chatbot for interactive queries.

Features

PDF to Text Conversion: Extract text from PDF files using PyMuPDF.
OCR Integration: Extract text from images within PDFs using Tesseract OCR.
Text Preprocessing: Clean, segment, and tokenize text for optimal LLM performance.
Information Extraction: Identify and extract entities and relationships, and summarize key information using Gemini LLM.
Document Classification: Classify documents into predefined categories.
Translation: Translate extracted text into different languages.
Interactive Interface: User-friendly Streamlit interface for easy interaction.
Chatbot Integration: Ask questions and interact with the extracted data using a conversational chatbot powered by Gemini LLM.

Installation

Clone the repository:

git clone https://github.com/rsharvesh16/SmartDoc-Document-Processing-With-LLM.git
cd SmartDoc-Document-Processing-With-LLM

Set up a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install the required packages:
```
pip install -r requirements.txt
```
Set up environment variables:
- Create a .env file in the project root and add your Gemini LLM API key:
```
GOOGLE_API_KEY=your_google_api_key
```
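As a rough illustration of how the key might be read at startup, the sketch below looks up `GOOGLE_API_KEY` and fails with a clear message when it is missing. It assumes the app loads `.env` through the `python-dotenv` package, which this README does not confirm; the helper name `load_api_key` is hypothetical and not taken from app.py.

```python
import os


def load_api_key(env=None, var_name="GOOGLE_API_KEY"):
    """Return the API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; not part of app.py.
    """
    if env is None:
        try:
            # Assumption: python-dotenv is installed. load_dotenv() copies
            # entries from a .env file into os.environ.
            from dotenv import load_dotenv
            load_dotenv()
        except ImportError:
            pass  # Fall back to variables already exported in the shell.
        env = os.environ
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to .env or export it")
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.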
Configure Tesseract and Poppler paths:
- Ensure Tesseract OCR is installed and its path is correctly set in the code:
```
pytesseract.pytesseract.tesseract_cmd = r'Your Tesseract Path'
```
- Ensure Poppler is installed and its path is correctly set in the code:
```
pages = convert_from_path("temp.pdf", 500, poppler_path=r'Your Poppler Path')
```
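Hard-coded paths tend to break when the project moves between machines. One possible alternative, sketched below, reads the paths from environment variables and falls back to searching `PATH`. The variable names `TESSERACT_CMD` and `POPPLER_PATH` and both helper functions are assumptions made for this example, not names used in app.py.

```python
import os
import shutil


def resolve_tesseract(env=None):
    """Return the Tesseract executable path, or None if it cannot be found."""
    env = os.environ if env is None else env
    # TESSERACT_CMD is a hypothetical variable name chosen for this sketch.
    return env.get("TESSERACT_CMD") or shutil.which("tesseract")


def resolve_poppler_dir(env=None):
    """Return the directory holding Poppler's binaries, or None if not found."""
    env = os.environ if env is None else env
    # POPPLER_PATH is a hypothetical variable name chosen for this sketch.
    configured = env.get("POPPLER_PATH")
    if configured:
        return configured
    # pdftoppm ships with Poppler; its parent directory is the bin folder.
    pdftoppm = shutil.which("pdftoppm")
    return os.path.dirname(pdftoppm) if pdftoppm else None
```

The results could then be assigned to `pytesseract.pytesseract.tesseract_cmd` and passed as `poppler_path=` to `convert_from_path`, in place of the literal strings shown above.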

Usage

Run the Streamlit application:
```
streamlit run app.py
```
Open the application in your browser:
- By default, the application is available at http://localhost:8501.
Upload a PDF file for processing and interact with the results through the provided interface.

Code Structure

app.py: Main Streamlit application file.
requirements.txt: List of required packages.
.env: Environment variables file (not included, create your own).
temp.pdf: Temporary file for uploaded PDF (generated during runtime).

Functions

convert_pdf_to_txt: Converts PDF to text using PyMuPDF.
extract_text_from_images: Extracts text from images using Tesseract OCR.
preprocess_text: Cleans, segments, and tokenizes text for LLM processing.
generate_response: Calls Gemini LLM to generate responses for prompts.
extract_entities: Extracts entities from text using LLM.
extract_relationships: Extracts relationships between entities using LLM.
summarize_text: Summarizes text using LLM.
classify_document: Classifies document text into predefined categories.
translate_text: Translates text into different languages using LLM.
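
To make the preprocessing step more concrete, here is a minimal sketch of what a function like `preprocess_text` could do: clean the text, segment it into sentences, and pack the sentences into chunks sized for an LLM prompt. This is not the implementation in app.py. It uses plain regular expressions as a stand-in for NLTK, omits word-level tokenization, and the `max_chars` parameter is an assumption made for this example.

```python
import re


def preprocess_text(raw, max_chars=2000):
    """Clean text, split it into sentences, and pack sentences into chunks."""
    # Clean: drop control characters, then collapse all whitespace runs.
    cleaned = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", raw)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()

    # Segment: naive split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", cleaned) if s]

    # Chunk: greedily add sentences while the chunk fits within max_chars.
    # A single sentence longer than max_chars becomes its own oversized chunk.
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be passed to the LLM-backed functions in turn, which keeps individual prompts bounded in size.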

Chatbot Integration

SmartDoc includes a chatbot that lets you ask questions about the document and explore the extracted data conversationally. The chatbot uses the Gemini LLM to generate context-aware responses.
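
One way such a chatbot could be wired up is sketched below: the document text and earlier turns are folded into a single prompt, which is then sent to the model. The function names, the prompt wording, and the `max_context_chars` limit are all assumptions made for this example. `generate` stands for any callable that takes a prompt string and returns a reply string, such as a wrapper around `generate_response`, whose actual signature is not documented here.

```python
def build_chat_prompt(document_text, history, question, max_context_chars=6000):
    """Assemble one prompt from the document, prior turns, and the new question."""
    # Crude truncation to bound the prompt size (the limit is an assumption).
    context = document_text[:max_context_chars]
    lines = [
        "Answer the question using only the document below.",
        "--- DOCUMENT ---",
        context,
        "--- CONVERSATION ---",
    ]
    for user_msg, bot_msg in history:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {bot_msg}")
    lines.append(f"User: {question}")
    lines.append("Assistant:")
    return "\n".join(lines)


def ask(document_text, history, question, generate):
    """Send the prompt through `generate` and record the turn in `history`."""
    answer = generate(build_chat_prompt(document_text, history, question))
    history.append((question, answer))
    return answer
```

Keeping the model call behind a plain callable means the prompt logic can be tried out with a stub function before any API key is configured.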

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

PyMuPDF
Tesseract OCR
PDF2Image
NLTK
Google Generative AI

