🖼️📄E2E Multi-modal Document Preprocessing for Search Indexing with Azure Document Intelligence
Updated Oct 22, 2025 - Python
QATorch is a Python tool for in-depth analysis of Q&A datasets, designed to prepare data for Retrieval Augmented Generation (RAG) systems. It performs data quality checks, deduplication, metadata analysis, and generates detailed, customizable reports to streamline AI/NLP dataset preparation.