The goal is to predict whether a customer will leave the bank or stay with it, using the Kaggle dataset bank_customer_churn_dataset.
After visualizing the dataset and doing some feature engineering, I tried artificial neural networks, logistic regression, k-nearest neighbours, and random forest. Of these models, random forest gave the best result, with 86% accuracy. Since the dataset is imbalanced, accuracy alone can overstate performance, so this figure should be read with that in mind.
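To illustrate why accuracy needs care on an imbalanced target, here is a minimal sketch in plain Python. The 80/20 class split and the always-"stayed" baseline are assumptions made up for illustration, not figures from this dataset, and `confusion_counts` and `summarize` are hypothetical helper names.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = exited."""
    pairs = Counter(zip(y_true, y_pred))
    return pairs[(1, 1)], pairs[(0, 1)], pairs[(1, 0)], pairs[(0, 0)]


def summarize(y_true, y_pred):
    """Compute accuracy, precision, and recall for the 'exited' class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical labels (assumption): 80 customers stayed (0), 20 left (1).
y_true = [0] * 80 + [1] * 20
# A trivial baseline that always predicts "stayed".
baseline_pred = [0] * 100

baseline = summarize(y_true, baseline_pred)
print(baseline)
```

Under these assumed labels the baseline reaches 80% accuracy while finding none of the customers who left, which is why precision and recall on the Exited class are worth reporting next to the accuracy of each model.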
The dataset has the following columns:
- CustomerId
- Surname
- CreditScore
- Geography - country of customer
- Gender
- Age
- Tenure - number of years the customer has been with the bank
- Balance
- NumOfProducts - number of bank products or services the customer uses
- HasCrCard - whether the customer has a credit card (1 = card, 0 = no card)
- IsActiveMember - whether the customer is an active member, based on his/her transactions (1 = active, 0 = inactive)
- EstimatedSalary
Based on the columns above, we predict the Exited column:
- Exited - whether the customer left the bank (1 = left, 0 = retained)