Samarveersah/Fraud_detection_system

Transaction Fraud Detection System

This project implements an end-to-end machine learning pipeline to detect fraudulent financial transactions using classification models and domain-driven feature engineering. The goal is to catch as many fraudulent transactions as possible while keeping the false positive rate low enough for practical business use.

Project Overview

Financial fraud causes significant operational and monetary losses. This project focuses on building a scalable fraud detection system using historical transaction data and machine learning techniques.

Key objectives:

  • Identify fraudulent transactions accurately
  • Handle extreme class imbalance
  • Optimize decision thresholds for business use
  • Interpret important fraud-driving features

Dataset Information

  • Dataset: Financial Transaction Fraud Dataset
  • Total Records: 1M+ transactions
  • Target Variable: isFraud
  • Class Distribution: Highly imbalanced

Note: The dataset file is not included in this repository due to GitHub file size limitations. Please download the dataset separately and place it inside the data folder.

Technologies Used

  • Python
  • Pandas and NumPy
  • Scikit-learn
  • Matplotlib and Seaborn
  • Jupyter Notebook

Feature Engineering

The following domain-driven features were engineered:

  • Sender balance inconsistency
  • Receiver balance inconsistency

These features help capture abnormal transaction behavior commonly associated with fraud.
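As a rough illustration of what a balance inconsistency feature could look like, the sketch below computes the gap between the balance change a transaction should cause and the balance change actually recorded. The column names (amount, oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest) and the function name are assumptions made for this example; the notebook may use different names or a different formula.

```python
import pandas as pd


def add_balance_inconsistency(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with two balance inconsistency columns.

    Column names are assumed for illustration and may differ from the
    actual dataset used in the notebook.
    """
    out = df.copy()
    # Sender: old balance minus the amount sent should equal the new balance.
    out["sender_balance_error"] = (
        out["oldbalanceOrg"] - out["amount"] - out["newbalanceOrig"]
    )
    # Receiver: old balance plus the amount received should equal the new balance.
    out["receiver_balance_error"] = (
        out["oldbalanceDest"] + out["amount"] - out["newbalanceDest"]
    )
    return out


# Tiny made-up sample: the second row credits nothing to the receiver.
transactions = pd.DataFrame(
    {
        "amount": [100.0, 500.0],
        "oldbalanceOrg": [1000.0, 500.0],
        "newbalanceOrig": [900.0, 0.0],
        "oldbalanceDest": [0.0, 0.0],
        "newbalanceDest": [100.0, 0.0],
    }
)
features = add_balance_inconsistency(transactions)
print(features[["sender_balance_error", "receiver_balance_error"]])
```

A value of zero means the recorded balances agree with the transaction amount; a non-zero value flags a mismatch that a model can use as a signal.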

Models Implemented

Two machine learning models were implemented:

  • Logistic Regression (Baseline)
  • Random Forest Classifier (Final Model)

Class imbalance was handled using cost-sensitive learning and probability threshold tuning.
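One common way to combine cost-sensitive learning with threshold tuning in Scikit-learn is sketched below. It uses synthetic data from make_classification in place of the real dataset, and class_weight="balanced" as the cost-sensitive setting. Both are assumptions for illustration, and the hyperparameters and the 0.2 threshold are arbitrary rather than the values used in the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced stand-in for the transaction data (about 2% positives).
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Cost-sensitive learning: errors on the rare class are weighted more heavily.
model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# Probability of the positive (fraud) class for each test transaction.
fraud_proba = model.predict_proba(X_test)[:, 1]


def predict_with_threshold(proba: np.ndarray, threshold: float) -> np.ndarray:
    """Flag a transaction as fraud when its probability reaches the threshold."""
    return (proba >= threshold).astype(int)


default_preds = predict_with_threshold(fraud_proba, 0.5)
lenient_preds = predict_with_threshold(fraud_proba, 0.2)  # illustrative value only
print("flagged at 0.5:", default_preds.sum())
print("flagged at 0.2:", lenient_preds.sum())
```

Lowering the threshold can only flag the same transactions or more, so it trades additional false positives for higher fraud recall.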

Model Evaluation

The models were evaluated using:

  • Confusion Matrix
  • ROC-AUC Score
  • ROC Curve
  • Precision-Recall Curve
  • Fraud Probability Distribution
  • Threshold Optimization
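The evaluation steps listed above can be sketched with Scikit-learn's metrics module. The labels and scores below are made up for the example, and the helper pick_threshold along with its rule (highest precision subject to a minimum recall) is one possible way to optimize a threshold, not necessarily the criterion used in the notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_curve, roc_auc_score

# Made-up labels and predicted fraud probabilities for ten transactions.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])
y_score = np.array([0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90])

# Threshold-independent ranking quality.
auc = roc_auc_score(y_true, y_score)


def pick_threshold(y_true: np.ndarray, y_score: np.ndarray, min_recall: float) -> float:
    """Return the threshold with the best precision among those meeting min_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision and recall carry one extra trailing point with no threshold; drop it.
    precision, recall = precision[:-1], recall[:-1]
    eligible = recall >= min_recall
    best = np.argmax(np.where(eligible, precision, -1.0))
    return float(thresholds[best])


threshold = pick_threshold(y_true, y_score, min_recall=1.0)
y_pred = (y_score >= threshold).astype(int)

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
print("ROC-AUC:", round(auc, 3))
print("chosen threshold:", threshold)
print(cm)
```

Requiring full recall here is deliberately strict; in practice the minimum recall would be set from the business cost of a missed fraud versus a false alarm.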

How To Run The Project

  1. Clone the repository
  2. Install dependencies using requirements.txt
  3. Launch Jupyter Notebook
  4. Open fraud_detection_analysis.ipynb
  5. Run all cells

Results Summary

  • Achieved high ROC-AUC performance
  • Improved fraud recall using threshold tuning
  • Identified important fraud-driving features
  • Built a production-ready fraud detection workflow

Author

Samarveer Sah
Machine Learning and Data Science Enthusiast
