
R-codebase for a scientific research article, titled "The TruEnd-procedure: Treating trailing zero-valued balances in credit data"


arnobotha/TruEnd-Procedure


The TruEnd-procedure: Treating trailing zero-valued balances in credit data


A novel procedure is presented for finding the true but latent endpoints within the repayment histories of individual loans. The monthly observations beyond these true endpoints are false, largely due to operational failures that delay account closure, thereby corrupting some loans in the dataset with 'false' observations. Detecting these false observations is difficult at scale since each affected loan history might have a different sequence of zero (or very small) month-end balances that persist towards the end. Identifying these trails of diminutive balances would require an exact definition of a "small balance", which can be found using our so-called TruEnd-procedure. We demonstrate this procedure and isolate the ideal small-balance definition using residential mortgages from a large South African bank. Evidently, corrupted loans are remarkably prevalent and have excess histories that are surprisingly long, which ruin the timing of certain risk events and compromise any subsequent time-to-event model such as survival analysis. Excess histories can be discarded using the ideal small-balance definition, which demonstrably improves the accuracy of both the predicted timing and severity of risk events, without materially impacting the monetary value of the portfolio. The resulting estimates of credit losses are lower and less biased, which augurs well for raising accurate credit impairments under the IFRS 9 accounting standard. Our work therefore addresses a pernicious data error, which highlights the pivotal role of data preparation in producing credible forecasts of credit risk.

Structure

The scripts in this R-codebase are numbered and can be run sequentially in that order. Delinquency measures are algorithmically defined in DelinqM.R as data-driven functions, which may be valuable to the practitioner outside of the study's current scope. These delinquency measures were formulated and empirically tested in Botha22 as part of a loss optimisation exercise of recovery decision times, as implemented in the corresponding R-codebase. A simulation study from Botha2021 also demonstrated these delinquency measures at length, again with its own corresponding R-codebase. Similarly, the TruEnd-procedure from Botha2024, to which this R-codebase corresponds, is implemented in the TruEnd.R script, which includes a small variety of functions for running the TruEnd-procedure in practice.
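To make the idea of discarding excess history concrete, the following is a minimal sketch in base R and not the implementation in TruEnd.R. It assumes that a small-balance threshold has already been chosen, whereas the TruEnd-procedure itself is concerned with finding that definition. The function name `find_true_end`, the column names (`LoanID`, `Period`, `Balance`), the toy data, and the threshold value are all hypothetical and chosen purely for illustration.

```r
# Hypothetical helper (not from TruEnd.R): returns the position of the last
# observation whose balance exceeds `threshold`, or 0 if none does.
find_true_end <- function(balance, threshold) {
  stopifnot(is.numeric(balance), length(threshold) == 1)
  is_large <- balance > threshold
  if (!any(is_large)) return(0L)  # entire history is a small-balance trail
  max(which(is_large))
}

# Toy data, invented for illustration: one row per loan per month
loans <- data.frame(
  LoanID  = rep(c("A", "B"), each = 6),
  Period  = rep(1:6, times = 2),
  Balance = c(900, 600, 300, 0, 0, 0,       # loan A: trailing zero balances
              500, 400, 300, 200, 100, 50)  # loan B: no small-balance trail
)

threshold <- 10  # assumed small-balance definition, in currency units

# Trim each loan's history at its estimated endpoint
trimmed <- do.call(rbind, lapply(split(loans, loans$LoanID), function(d) {
  d <- d[order(d$Period), ]
  d[seq_len(find_true_end(d$Balance, threshold)), ]
}))
rownames(trimmed) <- NULL
```

In this sketch, only the trailing run of small balances is dropped; a small balance that is later followed by a larger one is kept, since the endpoint is taken as the last observation above the threshold.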

Data

This R-codebase assumes that monthly loan performance data is available. Naturally, the data itself cannot be made publicly available given its sensitive nature, as well as various data privacy laws, particularly the Protection of Personal Information (POPI) Act of 2013 in South Africa. However, the structure and type of data required for reproducing this study are sufficiently described in the commentary within the scripts. This should enable the practitioner to extract and prepare data accordingly. Moreover, this codebase assumes that South African macroeconomic data is available, as sourced and collated by internal staff of the bank in question.

Copyright

All code and scripts are hereby released under an MIT licence. Similarly, all graphs produced by relevant scripts, as well as those published here, are hereby released under a Creative Commons Attribution (CC-BY 4.0) licence.