This project analyzes customer behavior and segments customers in an e-commerce context using machine learning techniques. The dataset includes information about customers' demographics, product preferences, and purchasing patterns.
Understanding customer behavior is crucial for e-commerce businesses to personalize marketing strategies, improve customer satisfaction, and optimize product offerings. This project leverages machine learning algorithms to predict customer purchases and segment customers based on their characteristics.
The dataset includes the following columns:
- Customer ID
- Gender
- Age
- Salary
- Product ID
- Price
- Purchased (Target variable)
The project has three main objectives:
Customer Purchase Prediction:
- Build and evaluate machine learning models to predict whether a customer will purchase a product.
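
A minimal sketch of this objective, assuming a CSV file named `customers.csv` with the columns listed above (Logistic Regression is used here only because it is one of the models compared in the results tables below):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Load the dataset (file name is an assumption).
df = pd.read_csv("customers.csv")
X = df[["Age", "Salary", "Price"]]   # numeric features only, for brevity
y = df["Purchased"]

# Hold out a test set and fit one of the compared models.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# The four metrics reported in the results tables below.
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 Score :", f1_score(y_test, y_pred))
```
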
Customer Segmentation:
- Utilize clustering algorithms to segment customers based on demographic and behavioral attributes.
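
A minimal segmentation sketch using KMeans on the demographic features; the file name and the number of clusters (4) are illustrative assumptions, not the project's settings:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("customers.csv")  # file name is an assumption

# Scale the features so both contribute equally to cluster distances.
features = StandardScaler().fit_transform(df[["Age", "Salary"]])

# Cluster customers into segments; k=4 is an illustrative choice.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["Segment"] = kmeans.fit_predict(features)

# Inspect the average profile of each segment.
print(df.groupby("Segment")[["Age", "Salary"]].mean())
```
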
Feature Importance Analysis:
- Determine which factors (e.g., Age, Salary, Price) most influence customer purchasing decisions.
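
One way to approach this, sketched below with a Random Forest's built-in `feature_importances_` (the exact method used in the project may differ; file name is an assumption):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("customers.csv")  # file name is an assumption
X = df[["Age", "Salary", "Price"]]
y = df["Purchased"]

# Fit a forest and rank features by their impurity-based importances.
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
ranking = sorted(zip(X.columns, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```
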
Data preprocessing (sketched below):
- Encoding categorical variables (Gender, Product ID)
- Scaling numerical features (Age, Salary, Price)
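
A sketch of these two steps with a scikit-learn `ColumnTransformer`; the file name and the exact column spellings are assumptions based on the dataset description above:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("customers.csv")  # file name is an assumption

# One-hot encode the categorical columns and standardize the numeric ones.
preprocess = ColumnTransformer([
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["Gender", "Product ID"]),
    ("numeric", StandardScaler(), ["Age", "Salary", "Price"]),
])
X_processed = preprocess.fit_transform(df.drop(columns=["Customer ID", "Purchased"]))
```
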
Two different splitting methods were used to evaluate model performance: a stratified split and a default random split (a minimal sketch of both follows the findings below).

Stratified split:

| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Logistic Regression | 0.820 | 0.783 | 0.783 | 0.783 |
| Decision Tree | 0.780 | 0.747 | 0.711 | 0.728 |
| Random Forest | 0.785 | 0.750 | 0.723 | 0.736 |
| SVM | 0.790 | 0.773 | 0.699 | 0.734 |
| Naive Bayes | 0.780 | 0.760 | 0.687 | 0.722 |
| KNN | 0.765 | 0.725 | 0.699 | 0.712 |

Default random split:

| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Logistic Regression | 0.815 | 0.753 | 0.782 | 0.767 |
| Decision Tree | 0.750 | 0.679 | 0.679 | 0.679 |
| Random Forest | 0.770 | 0.716 | 0.679 | 0.697 |
| SVM | 0.795 | 0.740 | 0.731 | 0.735 |
| Naive Bayes | 0.795 | 0.734 | 0.744 | 0.739 |
| KNN | 0.760 | 0.679 | 0.731 | 0.704 |
- Stratified splitting generally improves model performance metrics compared to default random splitting.
- Logistic Regression and SVM show more consistent results across both splitting methods.
- Decision Tree and Random Forest perform better with stratified splitting, indicating the importance of preserving class proportions across the training and testing sets.
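
For reference, a minimal sketch of the two splitting strategies with scikit-learn's `train_test_split`; the file name, test size, and random seed are illustrative assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")  # file name is an assumption
X, y = df[["Age", "Salary", "Price"]], df["Purchased"]

# Default random split: class proportions may differ between train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Stratified split: preserves the Purchased class proportions in both sets.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```
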
Requirements:
- Python 3.x
- Libraries: pandas, numpy, scikit-learn, matplotlib, seaborn