Welcome to the Loan Prediction Machine Learning Project repository! This project focuses on predicting loan eligibility based on various customer attributes. By leveraging classification algorithms, we aim to develop a robust model that accurately assesses whether a customer qualifies for a loan.
Our project encompasses the following key stages:
- Data Exploration: We conducted an in-depth analysis of the dataset to understand the impact of attributes like marital status, education, and employment status on loan eligibility.
- Data Preprocessing: We handled missing values, outliers, and encoded categorical variables to prepare the data for model training.
- Feature Analysis: We explored trends and correlations in the data, uncovering insights into factors affecting loan approvals.
- Model Implementation: We implemented and compared classification models, including Logistic Regression and K-Nearest Neighbors (KNN), to determine the best-performing algorithm for loan prediction.
- Performance Evaluation: The models were evaluated using accuracy, precision, recall, and F1-score metrics. Logistic Regression emerged as the superior model with an accuracy of 79% compared to KNN's 77%.
- Predicting Loan Status: We used the Logistic Regression model to predict loan status for new customers and analyzed the results to gain further insights.
- Logistic Regression: A classification model that achieved an accuracy of 79%, with high recall for class 1 (0.99) but low recall for class 0 (0.40).
- K-Nearest Neighbors (KNN): An alternative classification model, tuned with GridSearchCV to find the optimal number of neighbors. KNN reached 77% accuracy, slightly below Logistic Regression.
- Data Insights: We analyzed patterns in the new customer data to identify trends such as the percentage of married individuals in semiurban areas who obtained loans.
- Visualization: We used visualizations to understand the distribution of loan status across attributes like marital status and employment.
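The preprocessing stage described above (missing values, outliers, categorical encoding) could look roughly like the sketch below. The column names (`Married`, `Education`, `Self_Employed`, `LoanAmount`, `Loan_Status`), the toy rows, the IQR clipping rule, and the mapping of `N`/`Y` to 0/1 are all assumptions made for illustration; the project's actual dataset and choices may differ.

```python
import pandas as pd

# Toy rows with hypothetical column names; not the project's real data.
df = pd.DataFrame({
    "Married": ["Yes", "No", None, "Yes", "No", "Yes"],
    "Education": ["Graduate", "Not Graduate", "Graduate",
                  "Graduate", "Not Graduate", "Graduate"],
    "Self_Employed": ["No", "No", "Yes", None, "No", "No"],
    "LoanAmount": [120.0, None, 95.0, 110.0, 130.0, 900.0],
    "Loan_Status": ["Y", "N", "Y", "Y", "N", "Y"],
})

# Missing values: most frequent value for categoricals, median for numerics.
for col in ["Married", "Self_Employed"]:
    df[col] = df[col].fillna(df[col].mode()[0])
df["LoanAmount"] = df["LoanAmount"].fillna(df["LoanAmount"].median())

# Outliers: clip to the 1.5 * IQR fences (one common rule, assumed here).
q1, q3 = df["LoanAmount"].quantile([0.25, 0.75])
iqr = q3 - q1
df["LoanAmount"] = df["LoanAmount"].clip(lower=q1 - 1.5 * iqr,
                                         upper=q3 + 1.5 * iqr)

# Encoding: binary target (assumed N -> 0, Y -> 1), one-hot for the rest.
df["Loan_Status"] = df["Loan_Status"].map({"N": 0, "Y": 1})
features = pd.get_dummies(df.drop(columns="Loan_Status"), drop_first=True)
target = df["Loan_Status"]

print(features.head())
```

Median and mode imputation keep the row count intact, which matters on a small dataset; dropping incomplete rows would be a reasonable alternative if few values are missing.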
- Clone the Repository: `git clone https://github.com/yourusername/loan-prediction-project.git`
- Navigate to the Project Directory: `cd loan-prediction-project`
- Install Dependencies: `pip install -r requirements.txt`
- Run Data Analysis and Model Training Scripts:
  - To explore and preprocess the data: `jupyter notebook data_analysis.ipynb`
  - To train and evaluate the models: `python train_models.py`
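The contents of `train_models.py` are not reproduced here. As a rough sketch of the comparison described in this README, the example below trains Logistic Regression and a GridSearchCV-tuned KNN with scikit-learn. The synthetic data from `make_classification`, the feature scaling step, the 80/20 split, and the searched range of `n_neighbors` are assumptions for illustration, so the scores it prints will not match the reported 79% and 77%.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed loan data (assumption).
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           weights=[0.3, 0.7], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Logistic Regression on standardized features.
log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
log_reg.fit(X_train, y_train)

# KNN with a cross-validated search over the number of neighbors.
knn_search = GridSearchCV(
    make_pipeline(StandardScaler(), KNeighborsClassifier()),
    param_grid={"kneighborsclassifier__n_neighbors": list(range(1, 21, 2))},
    cv=5,
    scoring="accuracy",
)
knn_search.fit(X_train, y_train)
best_k = knn_search.best_params_["kneighborsclassifier__n_neighbors"]

# Evaluate both models on the held-out split.
results = {}
for name, model in [("Logistic Regression", log_reg), ("KNN", knn_search)]:
    y_pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: accuracy = {results[name]:.2f}")
    print(classification_report(y_test, y_pred))

print("Best n_neighbors found by GridSearchCV:", best_k)
```

Scaling is placed inside the pipeline so that each cross-validation fold fits the scaler on its own training portion, which avoids leaking test-fold statistics into the KNN distance computation.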
- Logistic Regression:
  - Accuracy: 79%
  - Precision:
    - Class 0: 0.95
    - Class 1: 0.76
  - Recall:
    - Class 0: 0.40
    - Class 1: 0.99
  - F1-score:
    - Class 0: 0.56
    - Class 1: 0.86
- K-Nearest Neighbors (KNN):
  - Accuracy: 77%
  - Precision:
    - Class 0: 0.81
    - Class 1: 0.76
  - Recall:
    - Class 0: 0.42
    - Class 1: 0.95
  - F1-score:
    - Class 0: 0.55
    - Class 1: 0.84
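- Logistic Regression outperforms K-Nearest Neighbors (KNN) on accuracy (79% versus 77%) and on F1-score for both classes (0.56 versus 0.55 for class 0, 0.86 versus 0.84 for class 1). Both models recall class 0 poorly (0.40 and 0.42), so most class 0 cases are misclassified as class 1.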
- Key insights from the new customer data reveal significant trends, such as the percentage of married individuals in semiurban areas who secured loans.
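
The F1-scores reported above are the harmonic mean of the corresponding precision and recall values. As a quick consistency check, the sketch below recomputes them from the reported figures; the helper name `f1_from` is introduced here for illustration only.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (model, class) -> (precision, recall, reported F1), copied from the results.
reported = {
    ("Logistic Regression", 0): (0.95, 0.40, 0.56),
    ("Logistic Regression", 1): (0.76, 0.99, 0.86),
    ("KNN", 0): (0.81, 0.42, 0.55),
    ("KNN", 1): (0.76, 0.95, 0.84),
}

for (model, cls), (precision, recall, f1) in reported.items():
    computed = f1_from(precision, recall)
    print(f"{model}, class {cls}: computed F1 = {computed:.2f}, "
          f"reported F1 = {f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the low class 0 recall dominates the class 0 F1-score for both models even though class 0 precision is high.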