This project identifies fraudulent transactions using unsupervised clustering with DBSCAN.
Without needing labeled data, it detects frauds based on density anomalies, and evaluates them against real labels.
Visual inspection using PCA and metrics like Silhouette Score and Confusion Matrix validate the model.
- 📈 DBSCAN Clustering (Unsupervised Anomaly Detection)
- 🧮 Z-score Standardization for feature scaling
- 🧠 PCA Visualization (2D) of fraud and normal clusters
- 📊 Silhouette Score + Confusion Matrix + Classification Report
- 💾 Saved Model & Scaler for deployment use
- 📍 Source: Kaggle – Credit Card Fraud Detection
- 📦 Total Records: 284,807
⚠️ Class Distribution: 0 = Normal, 1 = Fraud (~0.17%)- Features: 28 anonymized (V1–V28) , Time and Amount
| Category | Tools & Libraries |
|---|---|
| Programming | Python 3.8+ |
| Data Handling | pandas, numpy |
| ML Algorithms | scikit-learn (DBSCAN, PCA, StandardScaler) |
| Evaluation | sklearn.metrics |
| Visualization | matplotlib, seaborn |
| Model Saving | joblib |
- ✅ Silhouette Score: Measures cluster compactness and separation.
- 📉 Confusion Matrix: Evaluates predictions vs actual frauds.
- 📋 Classification Report: Includes precision, recall, and F1-score.
📍 High precision: Most predicted anomalies were actual frauds.
⚠️ Moderate recall: Common in unsupervised anomaly detection, as some frauds may not appear as clear outliers.
git clone https://github.com/ShubhamS2005/FraudAnomalyDetection.git
cd anomaly-detection-dbscanpython -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activatepip install -r requirements.txtjupyter notebook anomaly_detection.ipynbDeveloped by Shubham Srivastava
If you use or adapt this work, please consider citing or giving credit.
This project is currently under development and does not have a formal license.
A suitable open-source license (e.g., MIT, Apache 2.0) will be added upon final release.