This project implements an end-to-end machine learning pipeline that detects fraudulent financial transactions using classification models and domain-driven feature engineering. The goal is to catch as many fraudulent transactions as possible while keeping the false positive rate low enough for business use.
Financial fraud causes significant operational and monetary losses. This project focuses on building a scalable fraud detection system using historical transaction data and machine learning techniques.
Key objectives:
- Identify fraudulent transactions accurately
- Handle extreme class imbalance
- Optimize decision thresholds for business use
- Interpret important fraud-driving features
Dataset details:
- Dataset: Financial Transaction Fraud Dataset
- Total Records: 1M+ transactions
- Target Variable: isFraud
- Class Distribution: Highly imbalanced
Note: The dataset file is not included in this repository due to GitHub file size limitations. Please download the dataset separately and place it inside the data folder.
Tech stack:
- Python
- Pandas and NumPy
- Scikit-learn
- Matplotlib and Seaborn
- Jupyter Notebook
The following domain-driven features were engineered:
- Sender balance inconsistency
- Receiver balance inconsistency
These features help capture abnormal transaction behavior commonly associated with fraud.
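As a rough illustration of what a balance inconsistency feature can look like, the sketch below computes the gap between the balance a transfer should produce and the balance actually recorded. The column names (`amount`, `oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, `newbalanceDest`) and the output names (`senderBalanceError`, `receiverBalanceError`) are assumptions made for this example and may not match the dataset or the notebook.

```python
import pandas as pd


def add_balance_inconsistency_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with sender and receiver balance-inconsistency columns.

    All column names here are assumed for illustration; adjust them to the real dataset.
    """
    out = df.copy()
    # Sender side: old balance minus the amount should equal the new balance.
    out["senderBalanceError"] = (
        out["oldbalanceOrg"] - out["amount"] - out["newbalanceOrig"]
    )
    # Receiver side: old balance plus the amount should equal the new balance.
    out["receiverBalanceError"] = (
        out["oldbalanceDest"] + out["amount"] - out["newbalanceDest"]
    )
    return out


# Tiny hand-made example: the second row does not reconcile on either side.
transactions = pd.DataFrame(
    {
        "amount": [100.0, 500.0],
        "oldbalanceOrg": [1000.0, 500.0],
        "newbalanceOrig": [900.0, 500.0],
        "oldbalanceDest": [0.0, 200.0],
        "newbalanceDest": [100.0, 200.0],
    }
)
features = add_balance_inconsistency_features(transactions)
print(features[["senderBalanceError", "receiverBalanceError"]])
```

A value of zero means the balances reconcile with the transferred amount. A nonzero value flags a transaction whose recorded balances do not add up.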
Two machine learning models were implemented:
- Logistic Regression (Baseline)
- Random Forest Classifier (Final Model)
Class imbalance was handled using cost-sensitive learning and probability threshold tuning.
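The sketch below shows one common way to combine cost-sensitive learning with threshold tuning in scikit-learn. It trains on synthetic data rather than the project dataset, and the `class_weight="balanced"` setting, the hyperparameters, and the 0.3 threshold are illustrative assumptions, not the values used in the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with roughly 2% positives (not the project dataset).
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.98, 0.02], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Cost-sensitive learning: class_weight="balanced" weights each class
# inversely to its frequency, so errors on the rare class cost more.
baseline = LogisticRegression(class_weight="balanced", max_iter=1000)
final_model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=42
)
baseline.fit(X_train, y_train)
final_model.fit(X_train, y_train)

# Threshold tuning: classify from the predicted fraud probability
# instead of relying on the default 0.5 cutoff.
fraud_proba = final_model.predict_proba(X_test)[:, 1]
threshold = 0.3  # illustrative value only
y_pred = (fraud_proba >= threshold).astype(int)

print("Flagged at 0.5:", int((fraud_proba >= 0.5).sum()))
print("Flagged at 0.3:", int(y_pred.sum()))
```

Lowering the threshold can only flag the same transactions or more, which trades extra false positives for higher fraud recall.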
The models were evaluated using:
- Confusion Matrix
- ROC-AUC Score
- ROC Curve
- Precision-Recall Curve
- Fraud Probability Distribution
- Threshold Optimization
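The evaluation steps listed above can be sketched with scikit-learn's metrics module. The labels and scores below are synthetic. Picking the threshold that maximizes F1 is an assumption made for this example, since the selection criterion used in the notebook is not stated here.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_curve, roc_auc_score

# Synthetic labels and fraud scores (about 5% positives), for illustration only.
rng = np.random.default_rng(0)
y_true = (rng.random(2000) < 0.05).astype(int)
scores = np.clip(rng.normal(0.2 + 0.5 * y_true, 0.15), 0.0, 1.0)

# ROC-AUC summarizes ranking quality across all thresholds.
auc = roc_auc_score(y_true, scores)

# precision and recall have one more entry than thresholds,
# so drop the last point before computing per-threshold F1.
precision, recall, thresholds = precision_recall_curve(y_true, scores)
p, r = precision[:-1], recall[:-1]
denominator = np.where(p + r == 0, 1.0, p + r)  # avoid division by zero
f1 = 2 * p * r / denominator
best_threshold = float(thresholds[np.argmax(f1)])

# Confusion matrix at the chosen threshold: rows are true classes, columns are predictions.
y_pred = (scores >= best_threshold).astype(int)
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(f"ROC-AUC: {auc:.3f}")
print(f"Best F1 threshold: {best_threshold:.3f}")
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

A business-driven alternative is to fix a minimum recall or a maximum false positive rate and choose the threshold that satisfies it.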
How to run:
- Clone the repository
- Install dependencies from requirements.txt (for example, pip install -r requirements.txt)
- Launch Jupyter Notebook
- Open fraud_detection_analysis.ipynb
- Run all cells
Results:
- Achieved high ROC-AUC performance
- Improved fraud recall using threshold tuning
- Identified important fraud-driving features
- Built a production-ready fraud detection workflow
Author: Samarveer Sah
Machine Learning and Data Science Enthusiast