Agentic Document System

A web-based application that allows users to upload PDF or image documents and interactively ask questions about their content. The system extracts structured content from the uploaded documents, displays it in a markdown viewer, and provides an AI-powered chat interface for Q&A. Example prompts are provided based on the uploaded document to guide user queries.

When a user asks a question, the chat not only provides an answer but also highlights the exact page, paragraph, and bounding box coordinates from the original document where the information was found. Users can click to scroll directly to that location, making verification and traceability simple.
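
To make the grounding idea concrete, here is a minimal sketch of what a grounded answer payload could look like. The class name, field names, coordinate convention, and sample answer are all assumptions for illustration; the actual response format used by the project is not documented here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class GroundedReference:
    """Hypothetical shape of the reference attached to one chat answer."""
    page: int                                # 1-based page number (assumed)
    paragraph: int                           # paragraph index on that page (assumed)
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1); units are assumed

    def label(self) -> str:
        """Short text a UI could show on the clickable reference."""
        return f"Page {self.page}, paragraph {self.paragraph}"


# Made-up sample values, only to show the structure.
ref = GroundedReference(page=3, paragraph=2, bbox=(72.0, 140.5, 540.0, 188.0))
payload = {"answer": "The invoice total is $1,250.", "reference": asdict(ref)}
```

A frontend receiving a payload like this would have enough information to scroll to the page and draw the highlight box.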

Features

- Upload Documents: Supports PDF and image files (.png, .jpg, .jpeg).
- Original Document Viewer: View the uploaded document in its original format.
- Markdown Document Viewer: View the parsed document content in markdown format with bounding boxes and structured layout.
- Grounded Answers: Each response includes a reference to the exact page, paragraph, and bounding box where the information was extracted.
- Scroll-to-Location: Clicking the reference automatically scrolls the document viewer to the relevant section.
- Interactive Chat: Ask questions about the document content using an AI assistant.
- Example Prompts: Automatically provides clickable example questions based on the uploaded document.
- Automatic Prompt Sending: Clicking an example prompt sends it directly to the chat.
- Hover Effects: Example prompts have hover effects with border and background changes.
- Drag and Drop Upload: Easily drag-and-drop files for upload.
- Sample Documents: Users can try sample PDF and PNG files without uploading their own.
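
The supported file types listed above could be checked before a file is processed. The following is a small sketch of such a check; the helper name `is_supported` is hypothetical and is not taken from `app.py`.

```python
from pathlib import Path

# Extensions taken from the feature list; the helper name is hypothetical.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def is_supported(filename: str) -> bool:
    """Return True if the file name ends in a supported extension (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Only the final extension is considered, so a name such as `archive.pdf.zip` would be rejected.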

Installation

Clone the repository:

git clone https://github.com/your-username/document-chat-assistant.git
cd document-chat-assistant

Create a Python virtual environment:

python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

Install dependencies:

pip install -r requirements.txt

Usage

Start the Flask server:

python app.py

Open your browser and navigate to http://127.0.0.1:5000
Upload a document (PDF or image) or select a sample file.
Interact with the AI assistant by typing your question or clicking an example prompt.

Project Structure

document-chat-assistant/
├─ app.py                  # Flask backend server
├─ document_agent.py
├─ static/                 # Frontend JavaScript
│  └─ samples/             # Sample PDF and image files
├─ index.html              # Main HTML page
├─ requirements.txt        # Python dependencies
└─ README.md

Frontend Overview

Document Viewer Panel

Displays the uploaded document in two views:
- Original Document View: Shows the PDF or image as it was uploaded.
- Markdown Document View: Shows the extracted, structured content and highlights the relevant bounding boxes when the user asks a question.

Chat Panel

- Displays user and AI assistant messages.
- Each AI response includes a grounded reference: page number, paragraph, and bounding box of the source content.

Example Prompts

- Clickable bubbles with rounded corners for suggested questions.
- Hover effects highlight the prompt with border and background changes.
- Clicking a prompt sends it directly to the chat for an instant AI response.

Backend Overview

PDF Extraction: Uses the Gemini API to parse PDFs into JSON with structured layout and bounding boxes.
Image Extraction (planned): Will use the Gemini API to parse images into a similar structured JSON format.
Chat Handling: AI-powered responses based on the extracted content.
File Handling: Uploads are processed via a single /upload endpoint for both user uploads and sample files.
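
As a rough illustration of the extraction-to-viewer step, the sketch below turns a list of extracted blocks into markdown. The JSON schema (`page`, `type`, `text`, `bbox`), the sample content, and the function name are assumptions made for this example; they are not the format the project or the Gemini API actually produces.

```python
import json

# Hypothetical extraction output: this schema and content are assumptions
# for illustration only.
extracted = json.loads("""
[
  {"page": 1, "type": "heading",   "text": "Quarterly Report",    "bbox": [72, 60, 540, 90]},
  {"page": 1, "type": "paragraph", "text": "Revenue grew in Q2.", "bbox": [72, 110, 540, 150]},
  {"page": 2, "type": "paragraph", "text": "Costs were flat.",    "bbox": [72, 60, 540, 100]}
]
""")


def blocks_to_markdown(blocks):
    """Render blocks as markdown, emitting a page marker whenever the page changes."""
    lines = []
    current_page = None
    for block in blocks:
        if block["page"] != current_page:
            current_page = block["page"]
            lines.append(f"<!-- page {current_page} -->")
        # Headings get a markdown prefix; every other block type is plain text.
        prefix = "## " if block["type"] == "heading" else ""
        lines.append(prefix + block["text"])
    return "\n\n".join(lines)


markdown = blocks_to_markdown(extracted)
```

Keeping the `bbox` values alongside each block, rather than discarding them during rendering, is what would let a chat answer point back to a specific region of the page.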

License

MIT License

NDarayut/Agentic-Document-Intelligent-System
