KPaperFlux

Semantic Document Intelligence & Forensic Analysis
Bridging the gap between physical documents and structured digital twins.

Warning

PRE-RELEASE SOFTWARE: KPaperFlux is currently in a state of rapid development. Breaking changes are common. Use with real, sensitive data at your own risk! Always keep backups of your documents.


Project Philosophy

KPaperFlux is a Document Refiner designed for precision and technical sovereignty. While traditional systems focus on archival and search, KPaperFlux aims at deep understanding: converting visual and textual information into structured, mathematically validated data.

  • Forensic Separation: Specialized engines extract "noise" (stamps, signatures, handwriting) for separate analysis, ensuring the integrity of the core transactional data.
  • Multi-Type-Tag Architecture: Documents are managed using a polymorphic tagging system. For documents identified with a Finance tag, KPaperFlux enforces strict EN 16931 (ZUGFeRD) data management for high-fidelity accounting and analytics.
  • Data Control & AI Choice: Physical document storage and vault management are strictly local. Users can opt for high-performance semantic analysis via Google Gemini, accepting the trade-offs of cloud-based processing, or rely on local-only extraction for maximum privacy (forgoing advanced AI insights).
  • Automated Validation: Integrated logic engines perform cross-checks on financial totals and tax rates, prioritizing consistency and factual accuracy.

Showcase: Intelligence in Action

Forensic Rules Engine
Smart Import Splitter
Hybrid PDF Matching
Vendor Distribution
Forensic Visual Comparator

Technical Highlights

Semantic Analysis Pipeline

  • Multi-Stage Extraction: Uses an adaptive pipeline (Stage 1 to 2) to classify documents and extract structured JSON compliant with EN 16931 (ZUGFeRD 2.2).
  • Visual Auditor (X-Ray): Separates visual artifacts (accounting stamps, handwritten "Paid" notes, signatures) from the background text for independent analysis.
  • Mathematical Integrity: Automated cross-checking of net/tax/gross totals to ensure 100% calculation consistency.
  • Confidence Scoring: AI self-assessment scores are captured and persisted for every extraction pass to ensure data reliability.
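
The net/tax/gross cross-check described above can be illustrated with a minimal sketch. It is not KPaperFlux's actual implementation: the function name `totals_consistent`, its parameters, and the one-cent tolerance are assumptions made for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")  # assumed rounding unit and tolerance


def totals_consistent(net, tax_rate_percent, tax, gross, tolerance=CENT):
    """Check that net + tax equals gross and that tax matches net * rate.

    All amounts are passed as strings so Decimal parses them exactly.
    """
    net, tax, gross = Decimal(net), Decimal(tax), Decimal(gross)
    expected_tax = (net * Decimal(tax_rate_percent) / Decimal(100)).quantize(
        CENT, rounding=ROUND_HALF_UP
    )
    sum_ok = abs(net + tax - gross) <= tolerance
    tax_ok = abs(expected_tax - tax) <= tolerance
    return sum_ok and tax_ok


if __name__ == "__main__":
    print(totals_consistent("100.00", "19", "19.00", "119.00"))
    print(totals_consistent("100.00", "19", "19.00", "120.00"))
```

Using `Decimal` instead of `float` avoids binary rounding noise in currency arithmetic, which matters when a check is supposed to flag differences of a single cent.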

Document Management (DMS) Evolution

  • Physical Tracking: Integrated Storage Location field for bridging the gap between digital vault and physical shelf (e.g., "Box A / Folder 5").
  • Long-Term Archiving: Dedicated Archive status with specialized sidebar filters for audited documents.
  • Live Aggregation: Real-time financial summation in the status bar for multi-selected documents (Σ ... EUR).
  • Organic Order Collections: Automatically assembles logical collections (Quote ➔ Order ➔ Invoice) via background service. Uses extracted technical references (PROJECT_ID, ORDER_ID) to group related documents without manual tagging.
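
Grouping by extracted references could look roughly like the sketch below. The document structure (a dict with `id` and `references`) and the function name `group_by_reference` are hypothetical; only the reference keys `ORDER_ID` and `PROJECT_ID` come from the description above.

```python
from collections import defaultdict


def group_by_reference(documents, keys=("ORDER_ID", "PROJECT_ID")):
    """Group document ids by the first technical reference they carry.

    `documents` is assumed to be a list of dicts shaped like
    {"id": "...", "references": {"ORDER_ID": "..."}}.
    """
    collections = defaultdict(list)
    for doc in documents:
        refs = doc.get("references", {})
        for key in keys:
            value = refs.get(key)
            if value:
                collections[(key, value)].append(doc["id"])
                break  # the first matching key decides the collection
    # A collection only makes sense with at least two related documents.
    return {ref: ids for ref, ids in collections.items() if len(ids) > 1}


if __name__ == "__main__":
    docs = [
        {"id": "quote-1", "references": {"ORDER_ID": "A-42"}},
        {"id": "invoice-1", "references": {"ORDER_ID": "A-42"}},
        {"id": "letter-1", "references": {}},
    ]
    print(group_by_reference(docs))
```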

Plugin System & Localization

  • Vector Protection: Instead of "baking" scans into PDFs, KPaperFlux overlays transparent signatures on original vector documents, preserving 1:1 text quality and minimal file size.
  • Chain of Trust: Hybrid PDFs automatically embed the original digitally signed source document as an attachment for legal validity.

Workflows

  • Rules: Process documents through custom-defined state machines (e.g., VERIFIED ➔ TO_PAY ➔ ARCHIVED).
  • Automation: Intelligent routing based on semantic metadata.
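
A rule of this kind can be modeled as a small transition table. The sketch below uses the state names VERIFIED, TO_PAY, and ARCHIVED from the example; the table layout and the `advance` function are assumptions for illustration, not the project's rule format.

```python
# Hypothetical transition table: each state maps to the states it may move to.
ALLOWED = {
    "VERIFIED": {"TO_PAY"},
    "TO_PAY": {"ARCHIVED"},
    "ARCHIVED": set(),
}


def advance(current, target, transitions=ALLOWED):
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in transitions.get(current, set()):
        raise ValueError(f"transition {current} -> {target} is not allowed")
    return target


if __name__ == "__main__":
    state = "VERIFIED"
    state = advance(state, "TO_PAY")
    state = advance(state, "ARCHIVED")
    print(state)
```

Keeping the transitions in plain data rather than in code makes it straightforward to load user-defined workflows from a configuration file.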

Advanced Reporting & Dynamic Cockpits

  • Dynamic Analytics: Real-time generation of charts (Bar, Pie, Line) based on your entire document corpus.
  • Intelligent Visualization: Pie charts with automatic "Others" grouping, side legends with elision, and pattern-based color distinction for maximum readability.
  • Relative Date Filtering: Predefined smart filters (Today, YTD, Last 90 Days) that stay dynamic as time passes.
  • Global Zooming: Visual consistency from 50% to 300% zoom across all report elements.
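
Relative date filters stay dynamic because they are resolved to concrete dates only at query time. A minimal sketch, assuming hypothetical filter names (`today`, `ytd`, `last_90_days`) and a hypothetical `resolve_range` helper:

```python
from datetime import date, timedelta


def resolve_range(name, today=None):
    """Translate a relative filter name into an inclusive (start, end) date pair."""
    today = today or date.today()
    if name == "today":
        return today, today
    if name == "ytd":
        # Year to date: from January 1 of the current year up to today.
        return date(today.year, 1, 1), today
    if name == "last_90_days":
        return today - timedelta(days=90), today
    raise ValueError(f"unknown relative filter: {name}")


if __name__ == "__main__":
    print(resolve_range("ytd"))
    print(resolve_range("last_90_days"))
```

Storing the filter name rather than the resolved dates means a saved report shows current data each time it is opened.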

Multi-Format Data Export

  • Structured Data: Export report results to CSV for external spreadsheet analysis.
  • Document Archives: One-click ZIP export of all original PDF source documents associated with a specific report.
  • Batch PDF Export: Merge multiple virtual documents into a single, paginated PDF file directly from the export dialog (Stitching).
  • Hybrid PDF Reports: Exporting visual reports as PDFs with embedded semantic metadata.
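
The ZIP export of source documents can be approximated with the standard library alone. This is a sketch under assumptions: `export_sources` is a hypothetical name, and the input is taken to be a plain list of PDF file paths rather than KPaperFlux's internal report objects.

```python
import zipfile
from pathlib import Path


def export_sources(pdf_paths, archive_path):
    """Write the given PDF files into one ZIP archive and return its path.

    Files are stored under their base name, so callers should make sure
    the names are unique.
    """
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, pdf_paths):
            archive.write(path, arcname=path.name)
    return Path(archive_path)
```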

Status & Development

  • Current State: Active development. Stable core with rapidly evolving reporting and export capabilities.
  • Reporting: Dynamic reporting engine with multi-format export.
  • GUI: Solid desktop experience.
  • Standardization: Strict adherence to European invoicing standards (EN 16931).

Installation & Quick Start

  1. Clone: git clone https://github.com/schnebeck/KPaperFlux.git
  2. Env: python3 -m venv venv && source venv/bin/activate
  3. Install: pip install -r requirements.txt
  4. Hardware: sudo apt install sane-airscan (Recommended for network scanners).
  5. API Key: Enter your Google Gemini API Key in the Settings Dialog within the app.
  6. AI Selection: Dynamically choose between Gemini Flash (fast/cheap) and Gemini Pro (high-reasoning) directly in the configuration menu.

Contribution & License

KPaperFlux follows strict Clean Code and TDD principles. Details can be found in the devel/ folder.

License: GNU General Public License v3.0
(c) 2025-2026 Thorsten Schnebeck & The Antigravity Team
