fed-pay-stub-extractor

This repo contains a small utility for extracting structured data from pay stub PDFs generated by Employee Express.

Requirements

Node.js (see .nvmrc for recommended version)
Yarn

Getting started

First, go to Employee Express and download PDFs of your pay stubs. If you have a lot of them, this will take a while.

Then, install the dependencies and build the project:

$ yarn && yarn build

Then run the script using yarn start, passing in paths to PDF files:

$ yarn --silent start path/to/your/pay-stub.pdf

You can pass multiple PDF files this way. Just add them to the command line:

$ yarn --silent start path/to/your/pay-stubs/*.pdf

By default, the output is CSV, suitable for copying and pasting into an actual spreadsheet. Pass --json if you want JSON instead:

$ yarn --silent start path/to/your/pay-stub.pdf --json

How it works

First, this tool uses pdf2json to extract text tokens from the PDF file. Then it uses some bespoke and extremely fragile parsing logic to extract structured information.
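As a rough illustration of the first step, here is a minimal sketch of flattening pdf2json's parsed output into positioned text tokens. It is not this repo's actual code. The interfaces below are an assumed subset of the JSON that pdf2json produces (Pages, Texts, R, T), the Token type and the extractTokens and decodeText names are hypothetical, and the sketch assumes the text runs may be URI-encoded, which can vary by pdf2json version.

```typescript
// Assumed subset of pdf2json's parsed output. Check the shape against
// the version of pdf2json you have installed.
interface TextRun { T: string }
interface TextBlock { x: number; y: number; R: TextRun[] }
interface Page { Texts: TextBlock[] }
interface ParsedPdf { Pages: Page[] }

// Hypothetical token type: one piece of text plus where it sits on the page.
interface Token { page: number; x: number; y: number; text: string }

// Text runs may be URI-encoded; fall back to the raw string if decoding fails.
function decodeText(raw: string): string {
  try {
    return decodeURIComponent(raw);
  } catch {
    return raw;
  }
}

function extractTokens(pdf: ParsedPdf): Token[] {
  const tokens: Token[] = [];
  pdf.Pages.forEach((page, pageIndex) => {
    for (const block of page.Texts) {
      // A block can hold several runs; join them into one string.
      const text = block.R.map((run) => decodeText(run.T)).join("");
      tokens.push({ page: pageIndex + 1, x: block.x, y: block.y, text });
    }
  });
  // Sort by page, then top to bottom, then left to right.
  return tokens.sort((a, b) => a.page - b.page || a.y - b.y || a.x - b.x);
}
```

The fragile part comes after this: working out which token is a label and which is the amount next to it depends entirely on how a particular pay stub is laid out.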

FAQ

Why don't you just download CSV files from Employee Express?

I couldn't figure out how.

This is a bad idea. How do I know the numbers this thing generates match reality?

fed-pay-stub-extractor attempts to sum up all your deductions, subtract them from your gross pay, and check that the net pay it calculates matches what it found in the PDF. This hopefully gives you a little confidence? I don't know, man. Don't use this.
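The check described above can be sketched roughly like this. It is an illustration only, not the tool's actual implementation: the PayStub shape and the toCents and checkNetPay names are hypothetical, and comparing in integer cents is my own assumption to sidestep floating-point drift.

```typescript
// Hypothetical shape. The real tool's field names may differ.
interface PayStub {
  grossPay: number;     // dollars
  deductions: number[]; // each deduction, in dollars
  netPay: number;       // net pay as printed on the statement, in dollars
}

// Work in integer cents so that sums of decimals compare exactly.
function toCents(dollars: number): number {
  return Math.round(dollars * 100);
}

// Throws if gross pay minus all deductions does not equal the stated net pay.
function checkNetPay(stub: PayStub): void {
  const totalDeductions = stub.deductions.reduce(
    (sum, deduction) => sum + toCents(deduction),
    0
  );
  const calculatedNet = toCents(stub.grossPay) - totalDeductions;
  const statementNet = toCents(stub.netPay);
  if (calculatedNet !== statementNet) {
    throw new Error(
      `calculated net differs from statement net ` +
        `(${calculatedNet / 100} vs ${statementNet / 100})`
    );
  }
}
```

If the parser misses a deduction, the calculated net comes out higher than the statement net and the check fails, which is the whole point.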

You should simply use a large language model to extract all this information.

That's not a question. Also, I found in my testing that local LLMs weren't quite good enough to get "correct" data out of these PDFs, and I didn't really feel like turning my pay stubs into training data for API-based models.

I got an error. It says "calculated net differs from statement net"

There are a couple of things this could be:

Your pay stub might have fields on it that mine doesn't. This means that fed-pay-stub-extractor is probably not parsing out all of your deductions.
The stated "net" numbers on your pay stub might include factors that are not documented elsewhere on your pay stub. This can happen if an HR snafu leads to your pay stubs being incorrect for some reason. Double-check the numbers and :fingers_crossed: everything just works out.

matthinz/fed-pay-stub-extractor
