AI-Bank Statement document automatic by LLM and Personal financial analysis

Introdctions

Business/ Use case

Every Month, we obtain a lot bank statement in pdf format document. We intend to calculate and summarize the personal expensive and income from bank statement, statistically analysis monthly and yearly income vs expense for personal finanial planing . It is taking time to handle and store data.

This project intended to be used LLM Model for the purpose of assisting the user obtain fully record from bank statement by RAG technique. Convert the bank statement from unstructured document format into structured format. Store the records into database. Use LLM nature language to query the bank statement and give us the report.

This project mainly can divide into three main parts:

Data Extraction model for PDF file complex format
Embedding model + Vector Database for Store PDF Retrival document
LLM Model + RAG technique from data retrieved from database with natural language queries by user

Technology use in this project

Unstructure Document Preprocssing

Because the input document complexity, include table, image (chart), I will use several AI model like OCR , commputer vision model, Vision transformer , layout transformer, Embedding model to extract and analysis the document content from bank statement.
Complex layout/Context format Analysis by ML model
Level 1 analysis: Document layout Analysis
- Use Computer vision (object detection) AI model to extract component in document content
- then use different AI model analysis and extract different type of components context
Level 2 each component context
- use different AI model for extract and recognize different types of docunment components
Level 3 High level task analysis
- use AI model Entities
- use AI model Sentiment Analysis
- use AI model Summarization
use advance rule base model or Machine learning model :
- group and reorganize the data into a user-friendly format. (no experience to build rule to graoup data)
- Identify common denominators and create headers for each group. (no experence)
- Display only the differences between similar items (e.g., window sizes, owners) as line items below each header.
- Automate the process using AI, enabling the system to self-learn and understand the data structure.
- Extract relevant data from PDFs with different layouts and formats.

retrieval augmented generation (RAG) with langChain

use Embedding model with VectorDB to Retrieve data values by query
using training dataset for improvement the Text summaration task for conference speakers
using Advance RAG technique improve retrieval accuracy (e.g. re-ranking, query extensions, auto-merging)

LLM Model / Multi-model

try to use different open LLM models / multi-model (e.g. LLama3, gemma 2) , prefer use local open LLM models(planning inference LLM model at offline in local machine)
LLM model use for user-friendly documentation queries and retrieval information interface with natural language

LLM Model evaluation

use truLens or W&B framework for evaluation and debug LLM performance
LLM evaluation : Content relevance, Answer Relevance, accuary, recall, precision

AI agent

use AI agent automatically trigger multiple function

VectorDB

use Vector DataBase to store the converted Document context into embedding vector
use Vector Database can find document similarity

SQL Database

use to store the conference record
use for query history conference record

FrontEnd UI

first version will be used Streamlit for Frontend UI
later versions will be Full stack with Backend Restful API

Installation and Setup

use requirements.txt for installation package dependencies
you can setup virual environment by venv
add your google api key to .env file for enviroment variables
install pytesseract library for ubuntu linux , please run install-pytesseract-for-linux.sh script file

Run Application

For Development version: go to dev folder run jupyter notebook code for development new model/techique
For Application GUI version: running steamlit run apps.py for develop the application

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

AI-Bank Statement document automatic by LLM and Personal financial analysis

Introdctions

Business/ Use case

Technology use in this project

Installation and Setup

Run Application

Files

README.md

Latest commit

History

README.md

File metadata and controls

AI-Bank Statement document automatic by LLM and Personal financial analysis

Introdctions

Business/ Use case

Technology use in this project

Installation and Setup

Run Application