Online Food Sales Analysis

Overview

This project aims to analyze customer retention in the online food sales industry. By employing various machine learning models and data preprocessing techniques, this project seeks to understand the factors that contribute to customer retention and churn. The dataset used in this project is sourced from Kaggle and contains various features related to customer transactions and interactions.

Dataset

The dataset used in this project is sourced from Kaggle (Data Source).

It contains the following features:

Age
Gender
Marital Status
Occupation
Monthly Income
Educational Qualifications
Family size
Latitude
Longitude
Pin code
Output
Feedback
Unnamed: 12

The main variable of interest is Output, the target used to predict whether a customer will order again.
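As a minimal sketch of how the data might be loaded and the target inspected, the snippet below reads a few made-up rows with the same column names as listed above. The rows, the `load_orders` helper, and the decision to drop `Unnamed: 12` are assumptions for illustration; in the project itself the source would be `onlinefoods.csv`.

```python
import io

import pandas as pd

# Made-up rows for illustration only; they are NOT taken from the real dataset.
SAMPLE_CSV = """Age,Gender,Marital Status,Occupation,Monthly Income,Educational Qualifications,Family size,Latitude,Longitude,Pin code,Output,Feedback,Unnamed: 12
20,Female,Single,Student,No Income,Post Graduate,4,12.9766,77.5993,560001,Yes,Positive,Yes
24,Male,Single,Student,Below Rs.10000,Graduate,3,12.9770,77.5773,560009,Yes,Positive,Yes
31,Male,Married,Employee,More than 50000,Graduate,5,12.9551,77.6593,560017,No,Negative,No
22,Female,Single,Student,No Income,Post Graduate,6,12.9473,77.5616,560019,Yes,Positive,Yes
"""


def load_orders(source):
    """Read the CSV and drop the unlabeled trailing column (hypothetical helper)."""
    df = pd.read_csv(source)
    # Assumption: "Unnamed: 12" is a leftover column with no documented meaning.
    return df.drop(columns=["Unnamed: 12"], errors="ignore")


df = load_orders(io.StringIO(SAMPLE_CSV))

# Share of each class in the target column.
output_share = df["Output"].value_counts(normalize=True)
print(output_share)
```

Checking the class balance of Output first is useful because an imbalanced target affects how precision and recall for "Yes" and "No" should be read.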

Requirements

The following libraries are required to run the notebook:

pandas
numpy
matplotlib
seaborn
scikit-learn

Key Features

Exploratory Data Analysis (EDA): The project includes an Exploratory Data Analysis (EDA) stage, where we examine the data closely to identify customer reordering trends.
Classification: The project employs a variety of models, including Logistic Regression, Random Forest, and K-Nearest Neighbors, to predict whether the customer is going to order again.
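The classification step can be sketched roughly as follows. This is not the notebook's actual code: it uses synthetic data from `make_classification` in place of the real features, and the hyperparameters, the train/test split, and the scaling pipelines are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded customer features; class 1 plays the role of "Yes".
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.25, 0.75], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling matters for distance-based and linear models, so those two get a pipeline.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}

auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class is what ROC AUC is computed from.
    proba_yes = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, proba_yes)
    print(f"{name}: ROC AUC = {auc_by_model[name]:.3f}")
```

The scores printed here come from synthetic data and will not match the figures reported in the Results section.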

Results

Logistic Regression:

a. Precision for predicting "Yes" (0.90) is higher than for predicting "No" (0.63), indicating that the model's "Yes" predictions are more reliable than its "No" predictions.

b. Recall for predicting "Yes" (0.93) is higher than for predicting "No" (0.55), indicating that the model is better at capturing actual positive cases than actual negative cases.

c. F1-score for predicting "Yes" (0.91) is high, indicating a good balance between precision and recall for positive cases.

d. ROC AUC score (0.857) is also quite good, indicating that the model performs well in distinguishing between positive and negative cases.
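For readers unfamiliar with how these per-class figures are obtained, here is a small sketch using scikit-learn's metric functions. The labels and probabilities are hand-made toy values (an assumption for illustration), not predictions from the project's models.

```python
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

# Toy ground truth and predictions; 6 actual "Yes" and 4 actual "No".
y_true = ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"]
proba_yes = [0.9, 0.8, 0.8, 0.7, 0.6, 0.4, 0.65, 0.3, 0.2, 0.1]

# Predict "Yes" when the estimated probability is at least 0.5.
y_pred = ["Yes" if p >= 0.5 else "No" for p in proba_yes]

# Per-class precision, recall, and F1, in the order given by `labels`.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=["Yes", "No"]
)
for label, p, r, f in zip(["Yes", "No"], precision, recall, f1):
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")

# ROC AUC is computed from the probabilities, not from the thresholded labels.
y_true_binary = [1 if label == "Yes" else 0 for label in y_true]
auc = roc_auc_score(y_true_binary, proba_yes)
print(f"ROC AUC = {auc:.3f}")
```

Precision and recall depend on the 0.5 threshold, while ROC AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric can rank models differently.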

Random Forest:

a. Precision, recall, and F1-score for predicting "Yes" are all high (0.91, 0.96, and 0.93 respectively), indicating that the model performs well in identifying positive cases.

b. Precision, recall, and F1-score for predicting "No" are lower than those of Logistic Regression, indicating that the model is not as good at predicting negative cases.

c. ROC AUC score (0.913) is higher than that of Logistic Regression, indicating better overall performance in distinguishing between positive and negative cases.

K-Nearest Neighbors:

a. Precision, recall, and F1-score for predicting "Yes" are high (0.92, 0.95, and 0.93 respectively), similar to Random Forest.

b. Precision, recall, and F1-score for predicting "No" are lower than those of both Logistic Regression and Random Forest.

c. ROC AUC score (0.839) is lower than that of both Logistic Regression and Random Forest, indicating that the model is not as effective in distinguishing between positive and negative cases.

Overall:

Random Forest has the highest ROC AUC score (0.913) and, together with K-Nearest Neighbors, the highest F1-score for "Yes" (0.93), indicating the best overall performance among the three models.
Logistic Regression performs reasonably well but is outperformed by Random Forest in most metrics.
K-Nearest Neighbors lags behind the other two models, particularly in distinguishing between positive and negative cases (ROC AUC score).
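As a quick sanity check on the comparison, the reported "Yes" F1-scores can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The figures below are copied from the Results section; the dictionary layout and the `f1_from` helper are just illustrative.

```python
# "Yes"-class metrics and ROC AUC as reported in the Results section.
reported = {
    "Logistic Regression": {"precision": 0.90, "recall": 0.93, "f1": 0.91, "roc_auc": 0.857},
    "Random Forest": {"precision": 0.91, "recall": 0.96, "f1": 0.93, "roc_auc": 0.913},
    "K-Nearest Neighbors": {"precision": 0.92, "recall": 0.95, "f1": 0.93, "roc_auc": 0.839},
}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {}
for name, metrics in reported.items():
    recomputed[name] = f1_from(metrics["precision"], metrics["recall"])
    print(f"{name}: reported F1={metrics['f1']:.2f}, recomputed F1={recomputed[name]:.3f}")

# Rank the models by reported ROC AUC, best first.
ranking = sorted(reported, key=lambda name: reported[name]["roc_auc"], reverse=True)
print("Ranking by ROC AUC:", ranking)
```

The recomputed values agree with the reported F1-scores to two decimal places, and the ROC AUC ordering places Random Forest first, Logistic Regression second, and K-Nearest Neighbors third.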

Repository: rohitkulkarni08/Online-Food-Sales-Customer-Retention