Performance Analysis of Credit Card Fraud Detection Using Various ML Techniques with Feature Selection Technique - Sequential Forward Selection (SFS)
Financial fraud is a continually growing threat with far-reaching consequences for the finance industry. Data mining plays an important role in detecting credit card fraud in online transactions. Credit card fraud detection is challenging for several reasons: the profiles of normal and fraudulent behavior change frequently, credit card fraud data is scarce, and the available data sets are highly imbalanced, among other factors. The effectiveness of fraud detection in credit card transactions depends strongly on the data set sampling method, the feature selection, and the detection technique(s) used. This study investigates the performance of Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Extra Trees Classifier (ETC), Support Vector Machine (SVM), Gradient Boosting Classifier (GBC), XGBoost Classifier (XGB), AdaBoost Classifier (ABC), Naive Bayes (NB), and K-Nearest Neighbor (KNN) on credit card fraud data when combined with feature selection. Feature selection is performed with the Sequential Forward Selection (SFS) algorithm. The study also addresses the high class imbalance of credit card fraud data using random under-sampling, and scales features using the Robust Scaler. The performance of each machine learning technique is evaluated in terms of accuracy, precision, recall, F1-measure, and ROC score on a benchmark credit card fraud detection dataset available on Kaggle.
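The three preprocessing steps named above (random under-sampling, robust scaling, and sequential forward selection) can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the function names, the toy data, and the nearest-centroid scorer are all hypothetical stand-ins chosen to keep the example short. In practice the selection criterion would be a cross-validated metric of one of the classifiers listed above, and library implementations such as scikit-learn's `RobustScaler` and `SequentialFeatureSelector` or imbalanced-learn's `RandomUnderSampler` would likely be used instead.

```python
import random
from statistics import median, quantiles


def random_under_sample(X, y, seed=0):
    """Keep an equal-size random subset of every class (size of the rarest class)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def robust_scale(X):
    """Center each column on its median and divide by its interquartile range."""
    scaled_cols = []
    for col in zip(*X):
        q1, _, q3 = quantiles(col, n=4, method="inclusive")
        iqr = (q3 - q1) or 1.0  # leave constant columns unscaled
        med = median(col)
        scaled_cols.append([(v - med) / iqr for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def centroid_accuracy(X, y, features):
    """Stand-in scorer (assumption): training accuracy of a nearest-centroid rule."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum(
                (row[f] - centroids[c][i]) ** 2 for i, f in enumerate(features)
            ),
        )

    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)


def sequential_forward_selection(n_features, k, score):
    """Greedy SFS: start empty, repeatedly add the feature that maximizes score."""
    selected, remaining = [], list(range(n_features))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): column 0 is noise, column 1 separates the classes,
# column 2 is constant. Class 1 plays the role of the rare "fraud" class.
X = [
    [1.0, 0.0, 1.0], [1.2, 0.1, 1.0], [1.4, 0.2, 1.0],
    [3.1, 0.3, 1.0], [3.3, 0.1, 1.0], [4.2, 0.2, 1.0],
    [0.0, 5.0, 1.0], [5.0, 5.2, 1.0],
]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_under_sample(X, y)
X_scaled = robust_scale(X_bal)
chosen = sequential_forward_selection(
    n_features=3, k=2, score=lambda feats: centroid_accuracy(X_scaled, y_bal, feats)
)
print("balanced labels:", y_bal)
print("selected feature indices:", chosen)
```

The scorer is passed in as a plain callable so that the greedy loop stays independent of the classifier; swapping in a cross-validated score from any of the ten models would not change `sequential_forward_selection` itself.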