creditcard_fraud

Detecting credit card fraud: selecting an optimal classification threshold by analyzing the confusion matrix and the ROC curve.
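
The threshold selection named above can be sketched in plain Python. This is a minimal illustration, not the repository's code. It assumes the classifier has already produced one fraud score per transaction (label 1 = fraud, 0 = legitimate), and the helpers `confusion_at_threshold`, `roc_points`, and `pick_threshold` are hypothetical names. Each distinct score is tried as a threshold, the confusion matrix at that threshold yields one ROC point (false positive rate, true positive rate), and the highest threshold that reaches a target recall is kept, because among the qualifying thresholds it admits the fewest false positives.

```python
from typing import List, Sequence, Tuple


def confusion_at_threshold(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when predicting fraud (1) for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_points(
    y_true: Sequence[int], scores: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Return (threshold, fpr, tpr) for every distinct score used as a threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    points = []
    # Sweep thresholds from strictest (highest score) to most lenient.
    for t in sorted(set(scores), reverse=True):
        tp, fp, _tn, _fn = confusion_at_threshold(y_true, scores, t)
        points.append((t, fp / negatives, tp / positives))
    return points


def pick_threshold(
    y_true: Sequence[int], scores: Sequence[float], min_recall: float
) -> float:
    """Highest threshold whose recall (TPR) reaches min_recall."""
    # TPR and FPR only grow as the threshold drops, so the first hit
    # is the qualifying threshold with the lowest false positive rate.
    for t, _fpr, tpr in roc_points(y_true, scores):
        if tpr >= min_recall:
            return t
    return min(scores)


y = [0, 0, 0, 0, 1, 0, 1, 1]
s = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]
print(confusion_at_threshold(y, s, 0.45))  # (3, 1, 4, 0)
print(pick_threshold(y, s, 0.9))           # 0.45
```

Recomputing the confusion matrix for every threshold is quadratic in the number of rows; it keeps the sketch readable, but a real run would sort the scores once and accumulate counts.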

This is a dataset from Kaggle. As is typical for fraud detection, the dataset is highly imbalanced, with a ratio of almost 600:1 between negative (legitimate) and positive (fraudulent) cases.
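
With a ratio near 600:1, accuracy is a misleading metric. As a quick illustration (the helper `all_negative_baseline` is hypothetical, and labels are assumed to be 1 for fraud and 0 otherwise), a model that labels every transaction as legitimate reaches about 99.8% accuracy while catching no fraud at all, which is why the steps below focus on recall.

```python
from typing import Sequence, Tuple


def all_negative_baseline(y_true: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and recall of a model that predicts 'legitimate' (0) for every row."""
    predictions = [0] * len(y_true)
    correct = sum(1 for p, t in zip(predictions, y_true) if p == t)
    true_positives = sum(1 for p, t in zip(predictions, y_true) if p == 1 and t == 1)
    accuracy = correct / len(y_true)
    recall = true_positives / sum(y_true)  # assumes at least one fraud case
    return accuracy, recall


# A toy label vector at roughly the dataset's 600:1 ratio (illustrative, not real data).
labels = [0] * 600 + [1]
print(all_negative_baseline(labels))  # (0.9983361064891847, 0.0)
```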

The original features are confidential, so the dataset contains only anonymized numerical features, obtained by applying PCA to the original feature set.

A typical approach to an imbalanced dataset is to undersample the majority class and then adjust the decision threshold based on a ROC analysis:

Undersample the dataset to a 1:1 ratio of positive to negative labels, and train a logistic regression classifier on it.
Apply the classifier to the entire dataset.
Evaluate recall, and draw ROC curves for different algorithms.
Optimize the precision-recall tradeoff to suit your requirements.
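
The undersampling step can be sketched with only the standard library. This is an illustration under stated assumptions, not the repository's code: rows can be any indexable records, labels are 1 for fraud and 0 otherwise, and `undersample` is a hypothetical helper. It keeps every fraud case and draws an equal number of legitimate cases at random, without replacement.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def undersample(
    X: Sequence[Row], y: Sequence[int], seed: int = 0
) -> Tuple[List[Row], List[int]]:
    """Keep all positive (fraud) rows plus an equal-sized random sample of negatives."""
    rng = random.Random(seed)  # fixed seed so the balanced subset is reproducible
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    if len(neg_idx) < len(pos_idx):
        raise ValueError("expected more negatives than positives")
    # Sample negatives without replacement to match the number of positives (1:1).
    chosen = pos_idx + rng.sample(neg_idx, len(pos_idx))
    rng.shuffle(chosen)  # avoid all positives appearing first
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Toy data: 600 rows, every 100th row is a fraud case (illustrative only).
X = list(range(600))
y = [1 if i % 100 == 0 else 0 for i in range(600)]
X_bal, y_bal = undersample(X, y, seed=1)
print(len(y_bal), sum(y_bal))  # 12 6
```

The classifier is then fit on the balanced rows and scored on the full, imbalanced dataset. Training at 1:1 inflates the predicted fraud probabilities relative to the true class prior, so the default 0.5 cutoff is not meaningful on the full data; this is one reason the threshold is re-tuned with the ROC analysis.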

Results: 0.87 recall on the complete dataset.
