Your own LLM agent for generating datasets from documents.
DocMate is an autonomous document intelligence tool that lets you chat with both structured and unstructured documents in your dataset. You can upload files, ask questions, and get structured responses instantly. It supports schema generation, data extraction, refinement, and dynamic follow-ups. The front end features voice input, multiple file upload, and chat history.


- Multi Modal Support
  Works with structured and unstructured files, which are parsed using the unstructured library.
- RAG based workflow
  Uses Retrieval Augmented Generation to enhance accuracy, retrieving relevant document chunks before extraction.
- LangGraph
  Conditional graph-based flow for managing complex state transitions.
- FAISS Vector Store
  Similarity check for document and chunk retrieval.
- Dataset Wide Statistics
  Generates statistics for the whole dataset, which are displayed and saved locally.
- Speech-to-Text Input
- Multiple File upload
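
The retrieval step behind the RAG based workflow and FAISS Vector Store features can be illustrated with a minimal sketch. This is not DocMate's code: it swaps in a toy bag-of-words embedding and a brute-force cosine ranking where the project uses FAISS and a real embedding model, and the names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical document chunks, as they might look after parsing.
chunks = [
    "Invoice 1042 was issued on 3 March for 250 USD.",
    "The meeting notes cover hiring plans for Q2.",
    "Invoice 1043 total amount is 980 USD, due in April.",
]
top = retrieve("invoice total amount", chunks, k=2)
```

Only the retrieved chunks would then be passed to the LLM for extraction, which is what keeps the prompt focused on relevant material. A vector store such as FAISS does the same ranking with an index instead of a full scan.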
Follow these steps to set up DocMate on your local machine. You will need:
- Tauri and npm
- API keys for Azure STT and Gemini
- Clone the repository:
  git clone https://github.com/ItsAbhinavM/DocMate.git
  cd DocMate
- Install dependencies:
  npm install
  pip install -r requirements.txt
- Set up environment variables:
  Create a .env file in the root directory with the necessary API keys:
  AZURE_API_KEY=<YOUR_AZURE_API_KEY>
  GOOGLE_API_KEY=<YOUR_GOOGLE_GEMINI_API_KEY>
- Start the development server:
  npm run tauri build
  fastapi run main.py
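
Before starting the server, it can help to confirm that both API keys from the .env step are visible to the backend. The sketch below is one way to check this, not part of DocMate itself: `missing_keys` is a hypothetical helper, and it assumes the .env values have already been loaded into the process environment (for example by a loader such as python-dotenv, which this project may or may not use).

```python
import os
from typing import Mapping

# Variable names taken from the .env step above.
REQUIRED_KEYS = ("AZURE_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


# Example with an explicit mapping: the Gemini key was left empty.
example_env = {"AZURE_API_KEY": "abc123", "GOOGLE_API_KEY": ""}
missing = missing_keys(example_env)
```

Calling `missing_keys()` with no argument checks the real environment; an empty list means both keys are set.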