- /Notebooks
  One notebook with basic postgres usage and another with the bulk of the project in walkthrough format
- streamlit_app.py
  The guts of a streamlit app for fine-tuning the model using prediction threshold
- helper_functions.py
  Functions used in cleaning the data
- card_fraud_predictions.pdf
  The slides for the project presentation
- app_preview.mov
  A video preview of the streamlit app
Extras: Read the blog post.
This repository contains a working model that predicts credit card fraud, trained on a Kaggle dataset provided by Vesta Corporation. The final model is an XGBoost classifier that outputs 1 for a fraudulent transaction and 0 for a valid one.
- Target Variable: Fraud or Valid
- Features: Matched information, timedelta, transaction amount, debit vs. credit, product code, general card information
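
The Streamlit app tunes the model through its prediction threshold. As a rough illustration of what that means, the sketch below turns predicted fraud probabilities into 0/1 labels at several thresholds and reports precision and recall for each. The function names (`classify`, `precision_recall`) and the sample numbers are made up for this example and are not taken from the project code.

```python
def classify(probabilities, threshold=0.5):
    """Label a transaction 1 (fraud) if its probability meets the threshold, else 0 (valid)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the fraud class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels and model probabilities, for illustration only.
y_true = [0, 0, 1, 0, 1, 1]
fraud_probability = [0.05, 0.40, 0.35, 0.10, 0.80, 0.60]

for threshold in (0.3, 0.5, 0.7):
    y_pred = classify(fraud_probability, threshold)
    precision, recall = precision_recall(y_true, y_pred)
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold flags more transactions as fraud, which tends to raise recall at the cost of precision; raising it does the opposite.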
- PostgreSQL
- XGBoost
- Logistic Regression
- Random Oversampler
- SMOTE
- Streamlit
- Seaborn
- Matplotlib
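
Random oversampling and SMOTE both address the class imbalance typical of fraud data, where valid transactions far outnumber fraudulent ones. The sketch below shows the idea behind random oversampling in plain Python: minority-class rows are duplicated at random until both classes have the same count. It is not the library implementation used in the notebooks; `random_oversample` and the toy data are hypothetical names and values chosen for this example.

```python
import random


def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    minority = [row for row, y in zip(rows, labels) if y == minority_label]
    if not minority:
        raise ValueError("no rows with the minority label")
    majority_count = sum(1 for y in labels if y != minority_label)

    # Number of extra minority rows needed to match the majority class.
    extra_needed = max(0, majority_count - len(minority))
    extras = [rng.choice(minority) for _ in range(extra_needed)]

    new_rows = list(rows) + extras
    new_labels = list(labels) + [minority_label] * len(extras)
    return new_rows, new_labels


# Toy data: 8 valid transactions (label 0) and 2 fraudulent ones (label 1).
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

SMOTE differs in that it synthesizes new minority points by interpolating between a minority sample and its nearest minority neighbors, rather than copying existing rows. In either case, resampling should be applied to the training split only, so that evaluation data keeps the original class balance.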
Vesta Corporation released this dataset to encourage data scientists to help fight credit card fraud. In 2018, the worldwide cost of credit card fraud was over $24 billion. I hope my work, or the work of other data scientists exploring this dataset, will aid in the fight against fraudulent transactions.
Below is an image of the ROC curve from my final XGBoost model.