Code to classify bank account transactions based on key words

JosephCrispell/transactionCodeR

License: GPL v3

transactionCodeR

Summary

An early-stage R package with functions to explore monthly bank transactions data, breaking them down by type of transaction. Each transaction's type is defined by key words (or patterns) observed in its description. See the example report to get an idea of what you can generate.

Defining transaction types

Key words or patterns are used to define transaction types and can be specified in a simple CSV (Comma Separated Values) file. For example, here is data/dummy_transaction_coding_dictionary.csv, which is used to classify transactions in the dummy data:

Type,Patterns
Salary,Salary
Food,supermarket;market;corner shop
Travel,train;bus;car
Bills,Bills
Subscriptions,films;music
Rent,Rent
Surfboard,Surfboard
Exclude,loan

Note that these patterns are very simplistic. The key things to note:

  • You can have multiple key words (or patterns) matching a single transaction type, separated by semicolons
  • You can have as many transaction types as you want
  • The column names must be Type and Patterns
  • The transaction type "Exclude" can be used to exclude transactions matching its patterns from the summary statistics generated
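To illustrate how a dictionary like this can drive classification, here is a minimal base-R sketch. It is not the package's own code: classify_descriptions() is a hypothetical helper, and it assumes patterns are matched as case-insensitive regular expressions anywhere in the description, with the first matching type (in dictionary order) winning. It also shows "Exclude" transactions being dropped before totals are computed.

```r
# Hypothetical helper, NOT part of the transactionCodeR API: assigns each
# description the first Type (in dictionary order) whose patterns match
classify_descriptions <- function(descriptions, dictionary) {
  types <- rep(NA_character_, length(descriptions))
  for (i in seq_len(nrow(dictionary))) {
    # Multiple patterns for one type are separated by semicolons
    patterns <- strsplit(dictionary$Patterns[i], ";", fixed = TRUE)[[1]]
    # Assumption: patterns are matched as case-insensitive regular expressions
    regex <- paste(patterns, collapse = "|")
    matched <- is.na(types) & grepl(regex, descriptions, ignore.case = TRUE)
    types[matched] <- dictionary$Type[i]
  }
  types
}

# A cut-down version of the dictionary above
dictionary <- read.csv(text = "Type,Patterns
Salary,Salary
Food,supermarket;market;corner shop
Travel,train;bus;car
Exclude,loan", stringsAsFactors = FALSE)

# Made-up transactions for illustration
transactions <- data.frame(
  Description = c("Salary", "corner shop", "bus ticket", "loan repayment", "cinema"),
  Amount = c(1200, 12.5, 3, 250, 9),
  stringsAsFactors = FALSE
)
transactions$Type <- classify_descriptions(transactions$Description, dictionary)

# Assumption: descriptions matching no pattern are labelled "Unclassified"
transactions$Type[is.na(transactions$Type)] <- "Unclassified"

# Drop "Exclude" transactions before summarising amounts by type
kept <- transactions[transactions$Type != "Exclude", ]
totals <- tapply(kept$Amount, kept$Type, sum)
```

Note that plain substring matching is broad: a pattern such as "car" would also match "card payment", so more specific patterns are usually safer.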

Bank transactions data

Bank transactions data should be in the standard format provided by your bank's online system. This package uses the following four columns:

  • Transaction Date - column containing transaction date
  • Transaction Description - payment description column
  • Credit Amount - amount of money paid in
  • Debit Amount - amount of money paid out

Note that the names of these columns can be specified, for example in the report R Markdown script:

# Note key parameters
date_column <- "Date"
date_format <- "%Y-%m-%d"
description_column <- "Description"
in_column <- "In"
out_column <- "Out"
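To make the role of these parameters concrete, here is a minimal base-R sketch (not the package's own processing code) that uses them to parse the dates and total the money paid in and out per month. The transactions data frame is made up for illustration, and treating an empty credit or debit cell as zero is an assumption.

```r
# Column names and date format, as set in the report script
date_column <- "Date"
date_format <- "%Y-%m-%d"
in_column <- "In"
out_column <- "Out"

# Hypothetical transactions; in practice you would read your bank's export,
# e.g. with read.csv()
transactions <- data.frame(
  Date = c("2023-01-05", "2023-01-20", "2023-02-05", "2023-02-11"),
  Description = c("Salary", "supermarket", "Salary", "train"),
  In = c(1200, NA, 1200, NA),
  Out = c(NA, 25.5, NA, 4.2),
  stringsAsFactors = FALSE
)

# Parse the dates using the configured format, then build a year-month key
dates <- as.Date(transactions[[date_column]], format = date_format)
month <- format(dates, "%Y-%m")

# Assumption: an empty credit/debit cell means no money moved in that direction
money_in <- ifelse(is.na(transactions[[in_column]]), 0, transactions[[in_column]])
money_out <- ifelse(is.na(transactions[[out_column]]), 0, transactions[[out_column]])

# Monthly totals of money paid in and paid out
monthly_in <- tapply(money_in, month, sum)
monthly_out <- tapply(money_out, month, sum)
```

If date_format does not match your bank's export, as.Date() returns NA for those rows, which is a quick way to spot a misconfigured format.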

Installation

Requirements

transactionCodeR is an R package with the following requirements:

data

software

  • R
  • RStudio - you can get by without this!

R packages

  • plotly - for interactive visualisations
  • devtools - for installing the R package from GitHub

R package installation

devtools::install_github("JosephCrispell/transactionCodeR")
library(transactionCodeR)

Example scripts

There are three scripts provided with this R package. To locate these scripts once the R package is installed, use the following code:

# Files in a package's inst/ folder are installed at the package's top level,
# so "inst" is not part of the installed path. system.file() returns "" if a
# file cannot be found.
transaction_report_script_path <- system.file(
    "R", "transaction_coding_report.Rmd",
    package = "transactionCodeR"
)
process_transactions_script_path <- system.file(
    "R", "process_transactions.R",
    package = "transactionCodeR"
)
dummy_data_script_path <- system.file(
    "R", "generate_dummy_data.R",
    package = "transactionCodeR"
)

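These three scripts are described in more detail in the sections below: transaction_coding_report.Rmd builds the report, process_transactions.R processes the transactions data, and generate_dummy_data.R generates the dummy data.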

Building your report

The inst/R/transaction_coding_report.Rmd R Markdown script is a template report you can use to analyse your monthly bank transactions by type.

The report is designed to run on the dummy transaction data but can easily be modified to run on your own data:

  • Edit the input parameters when knitting the R Markdown file, providing the file names of your bank transactions data and transaction types files (more info on knitting with parameters)
  • Update the following lines with the correct column names and date format:
    # Note key parameters
    date_column <- "Date"
    date_format <- "%Y-%m-%d"
    description_column <- "Description"
    in_column <- "In"
    out_column <- "Out"
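As an alternative to RStudio's "Knit with Parameters" dialog, a parameterised report can be rendered from the R console with rmarkdown::render(). The sketch below assumes the rmarkdown package is installed; the parameter names transactions_file and types_file are hypothetical placeholders, so check the params: section of the report's YAML header for the names it actually uses.

```r
# Hypothetical parameter names: check the params: section in the YAML header
# of transaction_coding_report.Rmd for the names the report actually uses
report_params <- list(
  transactions_file = file.path(getwd(), "my_transactions.csv"),
  types_file = file.path(getwd(), "my_transaction_types.csv")
)

# Files in inst/ are installed at the package's top level
report_path <- system.file(
  "R", "transaction_coding_report.Rmd",
  package = "transactionCodeR"
)

# system.file() returns "" if the file or the package cannot be found
if (nzchar(report_path) && requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::render(
    report_path,
    params = report_params,
    output_dir = getwd(),         # write the report to the working directory
    intermediates_dir = tempdir() # the installed package folder may be read-only
  )
}
```

Absolute file paths are used for the parameters because, by default, render() evaluates code relative to the directory containing the .Rmd file rather than your working directory.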

The report will automatically call the inst/R/process_transactions.R script to process the transactions data provided based upon the parameters set above.

Here is an example report that is generated based on the dummy data provided with this R package.

Generating dummy data

For ease of use, dummy bank transactions data were generated, along with a transaction types file, to provide examples of the input files. They can be readily recreated using the inst/R/generate_dummy_data.R script.

While you are getting comfortable with this R package, you can use these dummy data files as input; the example R Markdown report points to them by default.

Within inst/R/generate_dummy_data.R, you can change the characteristics of the dummy data by editing the transaction_types list:

transaction_types <- list(
  "Salary" = list(
    "average_value" = 1200, "type" = "in",
    "frequency" = "monthly", "day_of_month" = 5
  ),
  "Food" = list(
    "average_value" = 20, "type" = "out", "frequency" = "random",
    "n_per_month" = 4,
    "patterns" = c("supermarket", "market", "corner shop")
  ),
  "Travel" = list(
    "average_value" = 5, "type" = "out", "frequency" = "weekdays",
    "patterns" = c("train", "bus", "car")
  ),
  "Bills" = list(
    "average_value" = 150, "type" = "out",
    "frequency" = "monthly", "standard_deviation" = 0
  ),
  "Subscriptions" = list(
    "average_value" = 45, "type" = "out", "frequency" = "monthly",
    "standard_deviation" = 0, "patterns" = c("films", "music")
  ),
  "Rent" = list(
    "average_value" = 300, "type" = "out",
    "frequency" = "monthly", "standard_deviation" = 0
  )
)

For each type of dummy transaction you create, you can use the following parameters, specified within a list, to define its values:

  • "average_value": average value (mean)
  • "standard_deviation": standard deviation of the transaction value around the average. Defaults to 10% of the average value.
  • "type": type of transaction ("in" (credit) or "out" (debit))
  • "frequency": how often the transaction type appears in the transactions. Expects one of c("monthly", "weekly", "daily", "weekdays", "random", "once")
  • "day_of_month": if monthly, which day of month. Defaults to 1 (first day).
  • "n_per_month": if random frequency, on average how many transactions per month. Defaults to 4.
  • "day_of_week": if weekly, which day of week. Defaults to 1 (first day).
  • "patterns": patterns to use as transaction descriptions. Defaults to name of transaction type.
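To show how these parameters and their defaults fit together, here is a simplified sketch that handles only the "monthly" frequency. simulate_monthly() is a hypothetical function, not the code in generate_dummy_data.R; drawing values from a normal distribution and rounding them to pennies are assumptions.

```r
# Hypothetical sketch, NOT the code in generate_dummy_data.R: simulate a
# transaction type with "frequency" = "monthly" using the defaults listed above
simulate_monthly <- function(name, params, start = as.Date("2023-01-01"),
                             n_months = 3) {
  stopifnot(identical(params$frequency, "monthly"))
  # Defaults: standard deviation is 10% of the average value, day_of_month is 1
  sd <- if (is.null(params$standard_deviation)) {
    0.1 * params$average_value
  } else {
    params$standard_deviation
  }
  day <- if (is.null(params$day_of_month)) 1 else params$day_of_month
  # Assumes `start` is the first day of a month
  dates <- seq(start, by = "month", length.out = n_months) + (day - 1)
  # Assumption: values are drawn from a normal distribution
  values <- round(rnorm(n_months, mean = params$average_value, sd = sd), 2)
  # Descriptions default to the transaction type's name
  patterns <- if (is.null(params$patterns)) name else params$patterns
  data.frame(
    Date = dates,
    Description = sample(patterns, n_months, replace = TRUE),
    In = if (params$type == "in") values else NA,
    Out = if (params$type == "out") values else NA,
    stringsAsFactors = FALSE
  )
}

set.seed(42)
salary <- simulate_monthly("Salary", list(
  average_value = 1200, type = "in", frequency = "monthly", day_of_month = 5
))
rent <- simulate_monthly("Rent", list(
  average_value = 300, type = "out", frequency = "monthly",
  standard_deviation = 0
))
```

With "standard_deviation" = 0, as for Rent, every simulated value equals the average, which suits fixed payments such as bills and subscriptions.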

Pre-commit installation (for development)

This repository uses a pre-commit workflow. A pre-commit workflow triggers a set of tasks each time you commit changed files. Here, the tasks mainly help maintain a standard coding style and spot minor mistakes in the code.

To install the workflow, do the following (more info here):

  • Install the Python pre-commit library from the command line: pip install pre-commit
  • Install the pre-commit hooks (tasks) by:
    • Cloning the repository
    • Navigating to the repository in the command line and running: pre-commit install

Note that you can also interact with pre-commit directly from R using the precommit R package
