Python script to automate copy & pasting expenses from unformatted bank statements for accounting.
This automation project was inspired by the lack of proper xls/csv export options at certain banks (e.g. CITI), where the CITI mobile app only provides billing statements as PDFs, a tedious format to clean and transfer into spreadsheet programs.
With this script, we can automate the copying and pasting, as well as the formatting and categorizing, of our expenses with ease. The script can also be tweaked to serve other copy/paste purposes.
Written with PyAutoGUI, Pynput, Pyperclip and Natural Language Toolkit (NLTK).
Libraries Used:
- PyAutoGUI - Automating mouse and keyboard actions such as clicking and typing
- Pynput - Event listeners to store variables
- Pyperclip - Clipboard for pasting data
- NLTK - Provides tools for working with human language data. Specifically, this script uses the NaiveBayesClassifier class from the nltk.classify module, an implementation of the Naive Bayes algorithm for text classification. This machine learning technique helps predict and classify the category of each expense
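
To give a feel for how Naive Bayes can assign a category to an expense description, here is a minimal standard-library sketch. It is not the project's code and it does not call NLTK: it is a hand-rolled word-count classifier with add-one smoothing, swapped in so the example runs without extra dependencies. The sample descriptions, the category names, and the helper names `tokenize`, `train` and `classify` are all assumptions made for illustration. NLTK's NaiveBayesClassifier is trained on labeled feature dictionaries rather than raw strings, so the real script will look different.

```python
import math
from collections import Counter, defaultdict


def tokenize(description):
    """Lowercase a transaction description and split it into words."""
    return description.lower().split()


def train(samples):
    """Count how often each label and each (label, word) pair occurs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for description, label in samples:
        label_counts[label] += 1
        for word in tokenize(description):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def classify(model, description):
    """Return the label with the highest log-probability for the description."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # prior for this label
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(description):
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data; replace with your own descriptions and labels.
SAMPLES = [
    ("grab ride singapore", "Transport"),
    ("bus mrt top up", "Transport"),
    ("netflix subscription", "Entertainment"),
    ("spotify premium", "Entertainment"),
    ("fairprice supermarket", "Groceries"),
    ("cold storage supermarket", "Groceries"),
]

model = train(SAMPLES)
print(classify(model, "GRAB ride"))
```

A classifier like this is only as good as its training data, which is consistent with the note that food-related expenses are hard to predict when restaurant names vary widely.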
- V1.1 - Work in progress
  - Text recognition, planned to remove some of V1.0's manual tasks
- V1.0
  - Able to copy text from badly formatted PDF files and paste it into Excel or Google Sheets via mouse and keyboard actions.
  - Some actions are still manual (defining the copy and paste coordinates during initial configuration).
  - The food category is still manual (prediction is not accurate for food-related expenses, due to the vast variety of restaurants)
- Clone Repository
- Open the project in your favorite IDE (e.g. VS Code)
- Install dependencies (libraries)
- Calibrate the line_spacing field in the Automation class (this is the vertical distance, in pixels, that the cursor moves between transactions in your PDF)
- Update build_json with your own dataset and expenses, tailoring it to fit your own use case
- Run main.py in the terminal
- dataset.json & keywords_food.json should be generated in the same directory
- 1st prompt - select start point for copy script
- 2nd prompt - select start point for paste script
- Script executes successfully.
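
The line_spacing calibration step above can be reasoned about with a little arithmetic. The sketch below is an assumption about how such a value might be estimated and used, not the Automation class's actual logic: `estimate_line_spacing` and `row_positions` are hypothetical helpers, and the pixel values are made up. Measure the y coordinate of the first and last transaction on screen, divide by the number of gaps between rows, and then step the cursor down by that amount for each row.

```python
def estimate_line_spacing(first_y, last_y, rows):
    """Estimate the vertical distance between rows from two measured points."""
    if rows < 2:
        raise ValueError("need at least two rows to measure spacing")
    return (last_y - first_y) / (rows - 1)


def row_positions(start_x, start_y, line_spacing, rows):
    """Return the (x, y) screen position of each transaction row."""
    return [(start_x, start_y + i * line_spacing) for i in range(rows)]


# Hypothetical measurements: first row at y=300, fourth row at y=354.
spacing = estimate_line_spacing(first_y=300, last_y=354, rows=4)
positions = row_positions(start_x=400, start_y=300, line_spacing=18, rows=4)

for x, y in positions:
    # In an automation script, a call such as pyautogui.moveTo(x, y)
    # could be made here before copying the row.
    print(x, y)
```

If the cursor drifts off the rows after a few transactions, the spacing is slightly off. Measuring across more rows gives a more accurate estimate.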
Project Link: https://github.com/nc1z/expenses-automation-script