A web-based application that allows users to upload PDF or image documents and interactively ask questions about their content. The system extracts structured content from the uploaded documents, displays it in a markdown viewer, and provides an AI-powered chat interface for Q&A. Example prompts are provided based on the uploaded document to guide user queries.
When a user asks a question, the chat not only provides an answer but also highlights the exact page, paragraph, and bounding box coordinates from the original document where the information was found. Users can click to scroll directly to that location, making verification and traceability simple.
- Upload Documents: Supports PDF and image files (.png, .jpg, .jpeg).
- Original Document Viewer: View the uploaded document in its original format.
- Markdown Document Viewer: View the parsed document content in markdown format with bounding boxes and structured layout.
- Grounded Answers: Each response includes a reference to the exact page, paragraph, and bounding box where the information was extracted.
- Scroll-to-Location: Clicking the reference automatically scrolls the document viewer to the relevant section.
- Interactive Chat: Ask questions about the document content using an AI assistant.
- Example Prompts: Automatically provides clickable example questions based on the uploaded document.
- Automatic Prompt Sending: Clicking an example prompt sends it directly to the chat.
- Hover Effects: Example prompts have hover effects with border and background changes.
- Drag and Drop Upload: Easily drag-and-drop files for upload.
- Sample Documents: Users can try sample PDF and PNG files without uploading their own.
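
The grounded reference attached to each answer (page, paragraph, and bounding box) could be modeled roughly as follows. This is a minimal sketch, not the project's actual schema: the class names, field names, 1-based numbering, and the `(x0, y0, x1, y1)` coordinate order are all assumptions made for illustration, and the sample values are made up.

```python
from dataclasses import dataclass, asdict
from typing import Tuple


@dataclass(frozen=True)
class GroundedReference:
    """Hypothetical shape of a source reference; field names are assumptions."""
    page: int        # assumed 1-based page number
    paragraph: int   # assumed 1-based paragraph index within the page
    bbox: Tuple[float, float, float, float]  # assumed order: (x0, y0, x1, y1)


@dataclass(frozen=True)
class GroundedAnswer:
    """An answer paired with the location it was extracted from."""
    text: str
    reference: GroundedReference

    def citation_label(self) -> str:
        # Short label the chat panel could render as a clickable reference.
        ref = self.reference
        return f"Page {ref.page}, paragraph {ref.paragraph}"

    def to_payload(self) -> dict:
        # Nested dataclasses are converted to plain dicts, ready for JSON encoding.
        return asdict(self)


# Illustrative values only.
answer = GroundedAnswer(
    text="The invoice total is 120 USD.",
    reference=GroundedReference(page=2, paragraph=3, bbox=(0.1, 0.25, 0.9, 0.31)),
)
print(answer.citation_label())
print(answer.to_payload())
```

Keeping the reference as its own object means the same structure can drive both the citation shown in the chat and the scroll-to-location behavior in the viewer.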
- Clone the repository:
    git clone https://github.com/your-username/document-chat-assistant.git
    cd document-chat-assistant
- Create a Python virtual environment:
    python -m venv venv
    source venv/bin/activate # Linux/Mac
    venv\Scripts\activate # Windows
- Install dependencies:
    pip install -r requirements.txt
- Start the Flask server:
    python app.py
- Open your browser and navigate to http://127.0.0.1:5000
- Upload a document (PDF or image) or select a sample file.
- Interact with the AI assistant by typing your question or clicking an example prompt.
document-chat-assistant/
├─ app.py # Flask backend server
├─ document_agent.py
├─ static/ # Frontend JavaScript
│ ├─ samples/ # Sample PDF and image files
├─ index.html # Main HTML page
├─ requirements.txt # Python dependencies
└─ README.md
- Document Viewer Panel
  Displays the uploaded document in two views:
  - Original Document View: Shows the PDF or image as it was uploaded.
  - Markdown Document View: Shows the extracted, structured content, with bounding boxes displayed when the user asks a question.
- Chat Panel
  - Displays user and AI assistant messages.
  - Each AI response includes a grounded reference: page number, paragraph, and bounding box of the source content.
- Example Prompts
  - Clickable bubbles with rounded corners for suggested questions.
  - Hover effects highlight the prompt with border and background changes.
  - Clicking a prompt sends it directly to the chat for an instant AI response.
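
To draw a bounding box over a page in the Document Viewer Panel and scroll to it, the viewer needs pixel coordinates. The sketch below shows one way to do that arithmetic. It assumes bounding boxes are normalized to the 0..1 range in `(x0, y0, x1, y1)` order and that pages are stacked vertically at a uniform height; the project's real coordinate format may differ, and the function names and `margin` parameter are hypothetical. The same calculation would apply if it were done in the frontend instead.

```python
from typing import Tuple


def bbox_to_pixels(
    bbox: Tuple[float, float, float, float],
    page_width: int,
    page_height: int,
) -> Tuple[int, int, int, int]:
    """Convert an assumed normalized (x0, y0, x1, y1) box to (left, top, width, height) pixels."""
    x0, y0, x1, y1 = bbox
    left = round(x0 * page_width)
    top = round(y0 * page_height)
    width = round((x1 - x0) * page_width)
    height = round((y1 - y0) * page_height)
    return left, top, width, height


def scroll_offset(page_index: int, page_height: int, top: int, margin: int = 20) -> int:
    """Vertical scroll position that brings the box into view.

    page_index is 0-based; pages are assumed to be stacked with no gaps.
    The margin leaves a little space above the highlighted box.
    """
    return max(0, page_index * page_height + top - margin)


# Example: a box on the second page (index 1) of a 1000x2000 px rendering.
left, top, width, height = bbox_to_pixels((0.1, 0.25, 0.9, 0.5), 1000, 2000)
print(left, top, width, height)
print(scroll_offset(page_index=1, page_height=2000, top=top))
```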
- PDF Extraction: Uses the Gemini API to parse PDFs into JSON with structured layout and bounding boxes.
- Image Extraction (planned): Will use the Gemini API to parse images into a similar structured JSON format.
- Chat Handling: AI-powered responses based on the extracted content.
- File Handling: Uploads are processed via a single /upload endpoint for both user uploads and sample files.
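
Because one /upload endpoint serves both user uploads and sample files, the handler has to decide which source a request refers to and check the file type. The helper below is a sketch of that decision logic using only the standard library; it is not taken from app.py. The function names, the `uploads` directory, and the idea of passing a sample's name in the request are assumptions, while the allowed extensions and the `static/samples` location follow the feature list and project structure above.

```python
from pathlib import Path
from typing import Optional, Tuple

# Supported types per the feature list: PDF and common image formats.
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}
SAMPLES_DIR = Path("static/samples")  # matches the project structure
UPLOAD_DIR = Path("uploads")          # hypothetical location for user uploads


def is_allowed(filename: str) -> bool:
    """Check the file extension case-insensitively."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def resolve_upload(
    uploaded_name: Optional[str] = None,
    sample_name: Optional[str] = None,
) -> Tuple[str, Path]:
    """Return ("sample" | "upload", path) for a request to the upload handler.

    A sample name takes precedence over an uploaded file name (an assumption).
    Raises ValueError when no file is given or the type is unsupported.
    """
    if sample_name:
        source, base, name = "sample", SAMPLES_DIR, sample_name
    elif uploaded_name:
        source, base, name = "upload", UPLOAD_DIR, uploaded_name
    else:
        raise ValueError("no file provided")

    # Keep only the final path component so "../" segments cannot escape base.
    safe_name = Path(name).name
    if not is_allowed(safe_name):
        raise ValueError(f"unsupported file type: {safe_name}")
    return source, base / safe_name


print(resolve_upload(sample_name="demo.png"))
print(resolve_upload(uploaded_name="Report.PDF"))
```

A Flask route could call `resolve_upload` with the values taken from the incoming request and return an error response when it raises `ValueError`.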
MIT License