A one-stop, open-source, high-quality data extraction tool that supports extraction from PDFs, webpages, and multi-format e-books.
Updated Nov 10, 2024 - Python
A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
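A beautiful and powerful online design tool with poster design and image editing features; an open-source version of 稿定设计 based on fabric.js. Suitable for many scenarios, such as poster generation, e-commerce product images, long-form article images, and video or WeChat official account cover editing.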
Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.
Python PDF parser for scientific publications: content and figures
Analyze PDFs. With colors. And Yara.
A package for parsing PDFs and analyzing their content using LLMs.
Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser)
A Python client for the Sypht API
Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
A Java client for the Sypht API
Sample code for the Datalogics C++, Java, and .NET interfaces of the Adobe PDF Library
C# and VB.NET samples for Docotic.Pdf library
Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.
PDF parsing toolkit for preparing academic text corpora
Fast and memory-efficient Python PDF Parser based on xpdf sources
Swift PDFParser for PDF parsing and text mining. Includes a TrueType font parser
A Golang client for the Sypht API