This repository contains models built on the Bank Marketing data set, available from the UCI Machine Learning Repository. The classification goal is to predict whether a customer will accept the 'CD' (Certificate of Deposit) offer, based on customer attributes and data from previous campaigns.
This notebook performs a quick, basic data exploration to check the integrity of the data.
One motivation for studying this particular problem is to show the end-to-end pipeline feature of the sklearn library. sklearn is a remarkably well-designed library that lets one quickly prototype a data flow pipeline and test a variety of machine learning models by chaining a set of Estimators, Transformers, and Predictors. This notebook demonstrates applications of the pipeline feature. Ten different models were tested on this dataset.
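As a rough illustration of the chaining idea, here is a minimal sketch of an sklearn pipeline. It is not the notebook's actual code: the synthetic data, the column names (`age`, `balance`, `job`), and the choice of `LogisticRegression` are assumptions made only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real data; columns and values are hypothetical.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "balance": rng.normal(1500, 800, size=n),
    "job": rng.choice(["admin", "technician", "retired"], size=n),
})
y = (X["balance"] + rng.normal(0, 500, size=n) > 1500).astype(int)

# Transformers: scale numeric columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "balance"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["job"]),
])

# Chain the preprocessing step and the final estimator into one object.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model.fit(X_train, y_train)

# Probability of the positive class for each held-out row.
proba = model.predict_proba(X_test)[:, 1]
accuracy = model.score(X_test, y_test)
```

Because the whole chain is a single estimator, trying a different model only means replacing the `clf` step; the preprocessing is refit on the training split each time, which helps avoid leaking test data into the transformers.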
The business context usually determines how many False Positives vs. False Negatives are acceptable. Below is a plot of precision, recall, and F1 score against various classification thresholds:
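To make the threshold trade-off concrete, the sketch below computes precision, recall, and F1 at a few cut-offs. The labels and scores are made-up toy values, not output from the models in this repository, and `metrics_at_threshold` is a hypothetical helper written for this example.

```python
def metrics_at_threshold(y_true, scores, threshold):
    """Return (precision, recall, f1) when predicting 1 for score >= threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for t, s in pairs if s >= threshold and t == 1)
    fp = sum(1 for t, s in pairs if s >= threshold and t == 0)
    fn = sum(1 for t, s in pairs if s < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels and predicted probabilities (illustrative only).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.55, 0.60, 0.80, 0.20, 0.90]

for threshold in (0.3, 0.5, 0.7):
    p, r, f = metrics_at_threshold(y_true, scores, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 moves precision up and recall down, which is the trade-off a business owner has to weigh when choosing the operating point.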
Among the machine learning models tested on this dataset, the Light Gradient Boosting framework produced the best results, which is not surprising. It obtained Gini = 0.87, or equivalently AUC = 0.93 (Gini = 2 * AUC - 1, up to rounding). The model's metrics are on par with those of an analysis performed using the CRISP-DM methodology.
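The link between the two reported metrics can be checked with a short sketch. It computes AUC as the fraction of (positive, negative) pairs that the scores rank correctly, then converts it with Gini = 2 * AUC - 1. The function names and the toy data are assumptions for illustration; they are not taken from the notebook.

```python
def auc_from_scores(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Convert AUC to the Gini coefficient."""
    return 2 * auc - 1


# Toy labels and scores (illustrative only, not model output).
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.55, 0.60, 0.80, 0.20, 0.90]

auc = auc_from_scores(y_true, scores)
gini = gini_from_auc(auc)
print(f"AUC={auc:.4f} Gini={gini:.4f}")
```

The pairwise loop is quadratic in the number of rows, so for real data a library routine such as `sklearn.metrics.roc_auc_score` is the more practical choice; the explicit version is shown only because it makes the definition visible.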