Semantic Document Intelligence & Forensic Analysis
Bridging the gap between physical documents and structured digital twins.
Warning
PRE-RELEASE SOFTWARE: KPaperFlux is currently in a state of rapid development. Breaking changes are common. Use with real, sensitive data at your own risk! Always keep backups of your documents.
KPaperFlux is a Document Refiner designed for precision and technical sovereignty. While traditional systems focus on archival and search, KPaperFlux aims for deep understanding: converting visual and textual information into structured, mathematically validated data.
- Forensic Separation: Specialized engines extract "noise" (stamps, signatures, handwriting) for separate analysis, ensuring the integrity of the core transactional data.
- Multi-Type-Tag Architecture: Documents are managed using a polymorphic tagging system. For documents identified with a `Finance` tag, KPaperFlux enforces strict EN 16931 (ZUGFeRD) data management for high-fidelity accounting and analytics.
- Data Control & AI Choice: Physical document storage and vault management are strictly local. Users can opt for high-performance semantic analysis via Google Gemini, accepting the trade-offs of cloud-based processing, or rely on local-only extraction for maximum privacy (forgoing advanced AI insights).
- Automated Validation: Integrated logic engines perform cross-checks on financial totals and tax rates, prioritizing consistency and factual accuracy.
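The tag-driven behaviour above can be pictured as a registry that maps each type tag to its own validator, so that only `Finance`-tagged documents are held to EN 16931 rules. This is a minimal sketch under stated assumptions: `Document`, `validates`, `validate` and the chosen set of required business terms (BT-1, BT-2, BT-5, BT-109, BT-110, BT-112) are hypothetical illustrations, not KPaperFlux's actual classes or rule set.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# EN 16931 business terms that this sketch treats as mandatory for Finance-tagged
# documents. The selection is illustrative, not KPaperFlux's actual rule set.
FINANCE_REQUIRED_TERMS = {
    "BT-1": "invoice number",
    "BT-2": "invoice issue date",
    "BT-5": "invoice currency code",
    "BT-109": "total amount without VAT",
    "BT-110": "total VAT amount",
    "BT-112": "total amount with VAT",
}


@dataclass
class Document:
    """Hypothetical document model: one document can carry several type tags."""
    doc_id: str
    tags: Set[str]
    data: Dict[str, object] = field(default_factory=dict)


Validator = Callable[[Document], List[str]]
VALIDATORS: Dict[str, Validator] = {}


def validates(tag: str) -> Callable[[Validator], Validator]:
    """Register a validator that runs only for documents carrying `tag`."""
    def register(fn: Validator) -> Validator:
        VALIDATORS[tag] = fn
        return fn
    return register


@validates("Finance")
def check_finance(doc: Document) -> List[str]:
    # Report every required business term the extraction did not fill in.
    return [
        f"missing {term} ({label})"
        for term, label in FINANCE_REQUIRED_TERMS.items()
        if term not in doc.data
    ]


def validate(doc: Document) -> List[str]:
    """Run the validator of every tag on the document; tags without one add nothing."""
    problems: List[str] = []
    for tag in sorted(doc.tags):
        validator = VALIDATORS.get(tag)
        if validator is not None:
            problems.extend(validator(doc))
    return problems
```

A document tagged `{"Finance", "Insurance"}` is checked against the Finance rules, while the `Insurance` tag contributes nothing because no validator is registered for it in this sketch.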
Screenshots: Forensic Rules Engine, Smart Import Splitter, Hybrid PDF Matching, Vendor Distribution, Forensic Visual Comparator.
- Multi-Stage Extraction: Uses an adaptive pipeline (Stages 1 and 2) to classify documents and extract structured JSON compliant with EN 16931 (ZUGFeRD 2.2).
- Visual Auditor (X-Ray): Separates visual artifacts (accounting stamps, handwritten "Paid" notes, signatures) from the background text for independent analysis.
- Mathematical Integrity: Automated cross-checking of net/tax/gross totals to ensure 100% calculation consistency.
- Confidence Scoring: AI self-assessment scores are captured and persisted for every extraction pass to ensure data reliability.
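The net/tax/gross cross-check can be sketched with Python's `decimal` module, which avoids binary floating-point drift in monetary sums. This is an illustrative sketch, not KPaperFlux's engine: `check_totals` and `round_cents` are hypothetical names, and the one-cent tolerance on the recomputed tax amount is an assumption to absorb per-line or per-category rounding.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Optional

CENT = Decimal("0.01")


def round_cents(value: Decimal) -> Decimal:
    """Round a monetary amount to cents using commercial rounding (half up)."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def check_totals(
    net: Decimal,
    tax: Decimal,
    gross: Decimal,
    rate_percent: Optional[Decimal] = None,
) -> List[str]:
    """Cross-check extracted totals; an empty list means the figures are consistent."""
    issues: List[str] = []
    # The sum must match exactly; Decimal keeps cent amounts exact.
    if net + tax != gross:
        issues.append(f"net {net} + tax {tax} != gross {gross}")
    # Assumed rule: allow one cent of deviation from the recomputed tax amount,
    # since invoices may round per line or per VAT category.
    if rate_percent is not None:
        expected_tax = round_cents(net * rate_percent / Decimal(100))
        if abs(expected_tax - tax) > CENT:
            issues.append(
                f"tax {tax} does not match {rate_percent}% of {net} (expected {expected_tax})"
            )
    return issues
```

For example, `check_totals(Decimal("100.00"), Decimal("19.00"), Decimal("119.00"), Decimal("19"))` reports no issues, while a gross of `118.00` is flagged.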
- Physical Tracking: Integrated `Storage Location` field that bridges the gap between the digital vault and the physical shelf (e.g., "Box A / Folder 5").
- Long-Term Archiving: Dedicated `Archive` status with specialized sidebar filters for audited documents.
- Live Aggregation: Real-time financial summation in the status bar for multi-selected documents (Σ ... EUR).
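A status-bar total like the one described for multi-selected documents could be computed as below. This is a sketch under assumptions: `status_bar_sum` is a hypothetical helper, each selected document is modelled as a plain mapping with `gross` and `currency` keys, documents without an extracted total are skipped, and mixed currencies are summed separately rather than converted.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Mapping, Optional


def status_bar_sum(selected: Iterable[Mapping[str, Optional[str]]]) -> str:
    """Build a status-bar label such as 'Σ 161.45 EUR' for the selected documents."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for doc in selected:
        amount = doc.get("gross")
        if amount is None:
            # Documents without an extracted total are skipped, not counted as zero.
            continue
        # Assumption: EUR is the default currency, and different currencies
        # are summed separately instead of being converted.
        totals[doc.get("currency") or "EUR"] += Decimal(amount)
    # One "Σ amount CUR" segment per currency; an empty selection yields "".
    return " | ".join(f"Σ {totals[cur]:.2f} {cur}" for cur in sorted(totals))
```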
- Organic Order Collections: Automatically assembles logical collections (Quote ➔ Order ➔ Invoice) via a background service. Uses extracted technical references (PROJECT_ID, ORDER_ID) to group related documents without manual tagging.
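Grouping documents that share any technical reference is a connected-components problem: a quote and an order share a `PROJECT_ID`, the order and the invoice share an `ORDER_ID`, so all three belong together. The sketch below uses a union-find structure to express that idea; it is an assumption about how such grouping could work, not KPaperFlux's background service, and `group_by_references` is a hypothetical name. It only forms the groups and does not order documents within a collection.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

REFERENCE_KEYS = ("PROJECT_ID", "ORDER_ID")


def group_by_references(docs: Dict[str, Dict[str, str]]) -> List[Set[str]]:
    """Group document ids that are linked, directly or transitively, by a shared reference."""
    parent = {doc_id: doc_id for doc_id in docs}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # The first document seen with a given (key, value) pair anchors that reference.
    first_seen: Dict[Tuple[str, str], str] = {}
    for doc_id, refs in docs.items():
        for key in REFERENCE_KEYS:
            value = refs.get(key)
            if not value:
                continue
            ref = (key, value)
            if ref in first_seen:
                union(first_seen[ref], doc_id)
            else:
                first_seen[ref] = doc_id

    groups: Dict[str, Set[str]] = defaultdict(set)
    for doc_id in docs:
        groups[find(doc_id)].add(doc_id)
    # Single documents do not form a collection.
    return [group for group in groups.values() if len(group) > 1]
```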
- Vector Protection: Instead of "baking" scans into PDFs, KPaperFlux overlays transparent signatures on original vector documents, preserving 1:1 text quality and minimal file size.
- Chain of Trust: Hybrid PDFs automatically embed the original digitally signed source document as an attachment for legal validity.
- Rules: Process documents through custom-defined state machines (e.g., `VERIFIED` → `TO_PAY` → `ARCHIVED`).
- Automation: Intelligent routing based on semantic metadata.
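A custom rule such as `VERIFIED` → `TO_PAY` → `ARCHIVED` boils down to a transition table that rejects any move it does not list. The sketch below assumes that shape; `Workflow` and `PAYMENT_RULE` are hypothetical names, not KPaperFlux's rule format.

```python
from typing import Dict, Set

# Hypothetical rule definition: each state maps to the states it may move to.
PAYMENT_RULE: Dict[str, Set[str]] = {
    "VERIFIED": {"TO_PAY"},
    "TO_PAY": {"ARCHIVED"},
    "ARCHIVED": set(),  # terminal state
}


class Workflow:
    """Minimal state machine that only allows transitions listed in its rule."""

    def __init__(self, transitions: Dict[str, Set[str]], initial: str) -> None:
        if initial not in transitions:
            raise ValueError(f"unknown initial state {initial!r}")
        self.transitions = transitions
        self.state = initial

    def advance(self, target: str) -> str:
        allowed = self.transitions.get(self.state, set())
        if target not in allowed:
            raise ValueError(f"transition {self.state} -> {target} not allowed")
        self.state = target
        return self.state
```

Keeping the rule as data rather than code means new workflows can be added without touching the state-machine logic.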
- Dynamic Analytics: Real-time generation of charts (Bar, Pie, Line) based on your entire document corpus.
- Intelligent Visualization: Pie charts with automatic "Others" grouping, side legends with elision, and pattern-based color distinction for maximum readability.
- Relative Date Filtering: Predefined smart filters (Today, YTD, Last 90 Days) that stay dynamic as time passes.
- Global Zooming: Visual consistency from 50% to 300% zoom across all report elements.
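Two of the reporting behaviours above can be sketched in a few lines: relative date filters that are resolved against the current day each time they run, and pie-chart data where the smallest slices are folded into "Others". Both functions are hypothetical illustrations; in particular, treating "Last 90 Days" as an inclusive window ending today and the default slice limit are assumptions.

```python
from datetime import date, timedelta
from typing import Dict, Optional, Tuple


def relative_range(name: str, today: Optional[date] = None) -> Tuple[date, date]:
    """Resolve a named relative filter to an inclusive (start, end) date range."""
    today = today or date.today()  # resolved at call time, so filters stay dynamic
    if name == "Today":
        return today, today
    if name == "YTD":
        return date(today.year, 1, 1), today
    if name == "Last 90 Days":
        # Assumption: 90 calendar days including today.
        return today - timedelta(days=89), today
    raise ValueError(f"unknown relative filter {name!r}")


def group_others(values: Dict[str, float], max_slices: int = 6) -> Dict[str, float]:
    """Keep the largest slices and merge the remainder into a single 'Others' slice."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= max_slices:
        return dict(ranked)
    head = ranked[: max_slices - 1]  # leave room for the 'Others' slice
    others = sum(value for _, value in ranked[max_slices - 1:])
    return {**dict(head), "Others": others}
```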
- Structured Data: Export report results to CSV for external spreadsheet analysis.
- Document Archives: One-click ZIP export of all original PDF source documents associated with a specific report.
- Batch PDF Export: Merge multiple virtual documents into a single, paginated PDF file directly from the export dialog (Stitching).
- Hybrid PDF Reports: Export visual reports as PDFs with embedded semantic metadata.
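The CSV and ZIP exports can be sketched with only the standard library (`csv` and `zipfile`). This is a minimal illustration, not KPaperFlux's exporter: `export_report`, the output file names `report.csv` and `sources.zip`, and the row format are all assumptions.

```python
import csv
import zipfile
from pathlib import Path
from typing import Dict, List, Tuple


def export_report(
    rows: List[Dict[str, str]], source_pdfs: List[Path], out_dir: Path
) -> Tuple[Path, Path]:
    """Write report rows to CSV and bundle the original source PDFs into one ZIP."""
    out_dir.mkdir(parents=True, exist_ok=True)

    csv_path = out_dir / "report.csv"
    fieldnames = list(rows[0].keys()) if rows else []
    # newline="" lets the csv module control line endings, as its docs recommend.
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    zip_path = out_dir / "sources.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for pdf in source_pdfs:
            # Store files flat by name; duplicate names would need renaming first.
            zf.write(pdf, arcname=pdf.name)
    return csv_path, zip_path
```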
- Current State: Active development. Stable core with rapidly evolving reporting and export capabilities.
- Reporting: Dynamic reporting engine with multi-format export.
- GUI: Solid desktop experience
- Standardization: Strict adherence to European invoicing standards (EN 16931).
- Clone: `git clone https://github.com/schnebeck/KPaperFlux.git`
- Env: `python3 -m venv venv && source venv/bin/activate`
- Install: `pip install -r requirements.txt`
- Hardware: `sudo apt install sane-airscan` (recommended for network scanners).
- API Key: Enter your Google Gemini API Key in the Settings Dialog within the app.
- AI Selection: Dynamically choose between Gemini Flash (fast/cheap) and Gemini Pro (high-reasoning) directly in the configuration menu.
KPaperFlux follows strict Clean Code and TDD principles. Details can be found in the devel/ folder.
License: GNU General Public License v3.0
(c) 2025-2026 Thorsten Schnebeck & The Antigravity Team




