dissorial / doc-chatbot

Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.

chat typescript reactjs mongoose nextjs chatbot openai vectorization pinecone document-embedding tailwindcss pdf-processing gpt-3 openai-api gpt-4 langchain

Updated Jul 21, 2023
TypeScript

allenai / papermage

Library supporting NLP and CV research on scientific papers

python machine-learning natural-language-processing computer-vision scientific-papers multimodal pdf-processing

Updated Nov 8, 2024
Python

Tele-AI / doc-ops-mcp

MCP server for seamless document format conversion and processing

document-conversion file-converter pdf-conversion markdown-converter watermark document-processing document-converter docx-to-pdf pdf-processing docx2pdf document-rewriting

Updated Jan 5, 2026
TypeScript

ahmedkhemiri95 / PDFs-TextExtract

Text extraction from multiple and large PDF documents.

python pdf parser data-science pdf-document text-analytics pdfs pypdf2 extract-text pdfminer pdf-processing pdfs-textextract

Updated Feb 10, 2025
Python

postralai / masquerade

The Privacy Firewall for LLMs

privacy mcp claude anonymization pdf-processing pseudonymization pdf-redaction private-llm model-context-protocol mcp-server pdf-pseudonymization

Updated Aug 11, 2025
Python

aws-samples / document-processing-pipeline-for-regulated-industries

A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

Updated Oct 25, 2021
Python

PSPDFKit / nutrient-dws-client-python

Official Python client library for Nutrient Document Web Services API - PDF processing, OCR, watermarking, and document manipulation with automatic Office format conversion

python pdf-converter pdf-generation pdf-document-processor ocr-python pdf-processing

Updated Dec 15, 2025
Python

PSPDFKit-labs / nutrient-dws-client-typescript

This library provides a type-safe and ergonomic interface for document processing operations including conversion, merging, compression, watermarking, and text extraction using the Nutrient DWS Processor API.

typescript pdf-converter pdf-generation ocr-library pdf-document-processor pdf-processing

Updated Aug 21, 2025
TypeScript

autollama / autollama

Anthropic's Contextual Retrieval implementation with visual chunk comparison. Preview context enrichment before/after embedding.

react nodejs docker automation ai chatbot embeddings openai knowledge-base semantic-search document-processing rag pdf-processing vector-database llm

Updated Sep 25, 2025
HTML

Govind-S-B / pdf-to-text-chroma-search

Python scripts that convert PDF files to text, split them into chunks, and store their vector representations in a Chroma DB using GPT4All embeddings. Also includes a script to query the Chroma DB for similarity search based on user input.

text-extraction similarity-search pdf-processing vector-embeddings chromadb

Updated Oct 23, 2023
Python

tetratensor / ML-powered_resume_analyser

Local, privacy-friendly resume analysis: convert, classify, and get advice using TF‑IDF, Logistic Regression, and sentence-transformer embeddings.

python nlp data-science machine-learning text-classification sklearn kaggle-dataset resume-analysis pdf-processing resume-screening sentence-transformers

Updated Sep 24, 2025
Python

enesmanan / paper-bold

AI-powered RAG-based tool for summarizing, extracting insights, and answering questions about research papers with high accuracy

academic-paper gemini-api rag pdf-processing academic-research langchain

Updated Mar 20, 2025
HTML

ranguy9304 / LangGraphRAG

LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.

python natural-language-processing information-retrieval chatbot web-scraping nlp-machine-learning rag terminal-application pdf-processing vector-database openai-api langgraph

Updated Jul 13, 2024
Python

ManasMadan / pdf-actions

An npm package built on top of pdf-lib that provides functionality such as merging, rotating, splitting, and downloading PDFs to disk, and more.

react javascript pdf npm reactjs react-component pdf-merge pdf-split pdf-rotate pdf-merger pdf-downloader pdf-lib pdf-splitter pdf-processing pdf-download pdf-free pdf-online

Updated Oct 31, 2023
JavaScript

Remy2404 / Polymind

A powerful, multi-modal Telegram bot leveraging cutting-edge AI technologies including Gemini, DeepSeek, OpenRouter, and 50+ AI models for comprehensive conversational assistance, media processing, and collaborative features with MCP (Model Context Protocol) integration.

telegram-bot voice image-processing voice-recognition gemini multi-model pdf-processing ai-assistant openrouter mermiad deepseek-r1

Updated Dec 15, 2025
Python

DioCrafts / ai-book-summarizer

📚 AI-Powered Book EPUB Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.

python markdown pdf machine-learning natural-language-processing automation ai text-analysis openai text-summarization document-analysis study-materials pymupdf knowledge-extraction pdf-processing book-summary educational-tools pdf-summarization ai-powered-tools

Updated Sep 28, 2025
Python

hannahjan06 / LegalEase-AI

LegalEase AI is a document simplification tool built using Gemini API, Streamlit, and Hugging Face models. It allows users to upload legal PDFs and automatically receive simplified summaries, clause-level insights, and structured information designed for clarity and accessibility.

python nlp ocr document-analysis gemini-api huggingface legal-ai pdf-processing legal-tech streamlit ai-assistant

Updated Nov 22, 2025
Python

noorjotk / local-rag-engine

Local RAG app with zero-config Docker setup. FastAPI + Streamlit + Qdrant + Ollama. Just run `docker-compose up --build`! 🚀

python docker semantic-search rag fastapi pdf-processing privacy-focused streamlit vector-database qdrant llm qdrant-vector-database local-llm local-ai ollama local-ollama

Updated Jul 26, 2025
Python

AkshayG999 / MistralOCR---AI-Powered-Document-Extraction

MistralOCR is an open-source application that transforms documents into structured data using Mistral AI's OCR capabilities. Built with FastAPI and Streamlit, it provides an intuitive interface for extracting and processing text from PDFs and images, making document digitization effortless and accurate.

Updated Jan 9, 2026
Python

Inc44 / MaTools

An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.

python rust productivity application gui qt ocr image-processing video-processing speech-recognition youtube-downloader file-management audio-processing pdf-processing code-formatting

Updated Dec 7, 2025
Python

pdf-processing

Here are 281 public repositories matching this topic...
