Udacity Data Scientist Nanodegree Capstone Project
Overview
This is the capstone project for the Udacity Data Scientist Nanodegree. Starbucks provided three simulated datasets that were meant for evaluating the algorithm rather than mimicking the behavior of real customers. The goal of the project is left open for students to define. For this project, I defined three main goals:
- study the impact of demographic factors and offer characteristics on customers’ responses to offers
- build several models for predicting a user’s response to an offer
- evaluate the accuracy, precision, recall, and F1 score of these models
The results showed that offer completion rates differ by gender, income, and age. In the second part of the project, I trained and evaluated three classification models for predicting the response to an offer. Gradient Boosting and SVC both reached an accuracy of 77%, while Random Forest had a lower accuracy of 73%. In terms of precision, Gradient Boosting scored highest (74%), while the other two algorithms reached 72%. Finally, both SVC and Gradient Boosting had an F1 score of 0.79, while Random Forest had an F1 score of 0.74.
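As a reminder of how the four reported metrics relate to each other, here is a minimal sketch that computes them from binary labels in plain Python. The function name `classification_metrics` and the toy labels are assumptions made for illustration; they are not taken from the project notebook, which may well rely on scikit-learn's metric functions instead.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against a zero denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels invented for illustration: 1 = offer completed, 0 = not completed.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because F1 is the harmonic mean of precision and recall, a model can have a higher F1 score than accuracy when it recovers most of the positive class, which is consistent with the pattern in the figures above.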
Please find the main results in this Medium post.
The code and data used in this project can be found on GitHub.
The code runs on Python 3. The libraries needed to run it are:
- scikit-learn
- numpy
- pandas
- matplotlib
- seaborn
- json (Python standard library)
- tqdm
- math (Python standard library)
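Before opening the notebook, it can help to check that these libraries are importable. The following is a minimal sketch, not part of the project itself; the names `REQUIRED_MODULES` and `missing_modules` are hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the libraries listed above (scikit-learn -> "sklearn").
REQUIRED_MODULES = [
    "sklearn",
    "numpy",
    "pandas",
    "matplotlib",
    "seaborn",
    "json",
    "tqdm",
    "math",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the listed modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Any third-party library reported as missing can then be installed with the package manager of your choice.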
├── datasets
| ├── portfolio.json
| ├── profile.json
| └── transcript.json
|
├── pickle files
| ├── customer_offer_df.pkl
| ├── customer_offer_df_ready_for_ml.pkl
| └── transcript.json
|
├── README.md
|
└── Starbucks_Capstone_notebook.ipynb
The datasets folder contains the files provided by Starbucks for this project.
The pickle files folder contains pre-processed data saved as pickled pandas DataFrames.
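The three JSON files in the datasets folder can be read with the standard library alone. The sketch below is an assumption about usage rather than the notebook's own loading code: the helper names `read_json_records` and `load_datasets` are hypothetical, and because the exact file layout is not described here, the reader accepts either a single JSON document or one JSON record per line.

```python
import json
from pathlib import Path


def read_json_records(path):
    """Read a JSON file holding either one document or one record per line."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to line-delimited JSON: one record per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Normalise a single object to a one-element list of records.
    return data if isinstance(data, list) else [data]


def load_datasets(folder="datasets"):
    """Load portfolio.json, profile.json and transcript.json from `folder`."""
    names = ("portfolio", "profile", "transcript")
    return {
        name: read_json_records(Path(folder) / f"{name}.json")
        for name in names
    }
```

Calling `load_datasets()` from the repository root returns a dictionary of record lists keyed by file name. The pickled DataFrames in the pickle files folder can presumably be loaded with `pandas.read_pickle`, given a pandas version compatible with the one that wrote them.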
- Dataset was provided by Starbucks.