creditcard_fraud

Detecting credit card fraud and selecting an optimal decision threshold by analyzing the confusion matrix and ROC curve.

This is a dataset from Kaggle. As is typical for fraud detection, the dataset is highly imbalanced, with a ratio of almost 600:1 of negative to positive cases.
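To see why accuracy is a poor metric at this level of imbalance, here is a small illustrative calculation. It assumes exactly 600 negatives per positive (the approximate ratio quoted above) and a hypothetical classifier that always predicts "not fraud".

```python
# Illustrative arithmetic only: 600:1 is the approximate ratio quoted above,
# and the "always negative" classifier is a hypothetical baseline.
negatives_per_positive = 600

# Every negative is classified correctly, every positive is missed.
baseline_accuracy = negatives_per_positive / (negatives_per_positive + 1)
baseline_recall = 0.0  # no fraud case is ever flagged

print(f"accuracy: {baseline_accuracy:.4f}, recall: {baseline_recall:.1f}")
```

The baseline scores above 99.8% accuracy while catching no fraud at all, which is why recall is the metric reported here.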

The original features are confidential, so the dataset contains only anonymized numerical features. They were calculated by applying PCA to the original feature set.

A typical approach to an imbalanced dataset is to undersample and adjust the threshold based on a ROC analysis:

1. Undersample the dataset and train a logistic regression classifier on the resulting 1:1 mix of positive and negative labels.
2. Use the classifier on the entire dataset.
3. Evaluate recall, and draw ROC curves for different algorithms.
4. Optimize the precision-recall tradeoff as per your requirements.
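The undersampling and threshold-selection steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in script.py: the helper names (`undersample`, `confusion`, `sweep`) and the toy labels and scores are hypothetical. In practice the scores would be the fraud probabilities predicted by the trained classifier on the full dataset.

```python
import random

def undersample(rows, labels, seed=0):
    """Keep all positives plus an equal number of random negatives (1:1).

    Assumes negatives are the majority class, as in this dataset.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = pos + rng.sample(neg, len(pos))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]

def confusion(labels, scores, threshold):
    """Return (tp, fp, fn, tn), flagging scores >= threshold as fraud."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def sweep(labels, scores, thresholds):
    """Return (threshold, recall, precision, false positive rate) tuples."""
    results = []
    for t in thresholds:
        tp, fp, fn, tn = confusion(labels, scores, t)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        results.append((t, recall, precision, fpr))
    return results

# Toy data (made up for illustration): 6 legitimate and 2 fraudulent cases.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.55, 0.9]

balanced_rows, balanced_labels = undersample(list(range(len(labels))), labels)
results = sweep(labels, scores, [0.5, 0.8])
for t, recall, precision, fpr in results:
    print(f"threshold={t}: recall={recall:.2f} precision={precision:.2f} fpr={fpr:.2f}")
```

Each (false positive rate, recall) pair is one point on the ROC curve. On the toy data, raising the threshold from 0.5 to 0.8 trades recall for precision, which is the tradeoff to tune against your own requirements.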

Results: 0.87 recall on the complete dataset.
