Ecommerce-Shipping-Clasification-Modeling

Data source: https://www.kaggle.com/prachi13/customer-analytics

Table of Contents
  1. Project Overview
  2. Background and Problem
  3. Exploratory Data Analysis
  4. Data Processing
  5. Modelling
  6. Business Recommendations

Project Overview

• Stage 1: Data exploration, exploratory data analysis, business insight, and visualization
• Stage 2: Data cleansing and feature engineering
• Stage 3: Modeling and evaluation

Overall project:
• Sought insights from the dataset with exploratory data analysis
• Performed data cleansing, data processing, and feature engineering to prepare the data for modeling
• Built a model to predict whether a shipment will reach the customer late or on time
• Developed recommendations and a benefit analysis based on the insights and model predictions

Background and Problem

An international e-commerce company that sells electronic products wants to discover key insights from its customer database. Currently, most of its shipping deliveries arrive late.

Exploratory Data Analysis

| Variable | Type | Definition | Example |
|---|---|---|---|
| ID | Nominal | Customer ID number | 10, 15, 10995, 10996 |
| Warehouse_block | Nominal | Warehouse block where the product is stored | A, B, C, D, F |
| Mode_of_Shipment | Nominal | Mode of product shipping | Flight, Road, Ship |
| Customer_care_calls | Discrete | Number of calls made | 1, 2, 5, 6 |
| Customer_rating | Ordinal | Company rating by customers | 5: Best, 4: Better, 3: Neutral, 2: Bad, 1: Worst |
| Cost_of_the_Product | Discrete | Cost of the product in US dollars | 177, 216, 236, 182 |
| Prior_purchases | Discrete | Number of prior purchases | 3, 2, 6 |
| Product_importance | Ordinal | Product importance parameter | Low, Medium, High |
| Gender | Nominal | Customer gender | Male, Female |
| Discount_offered | Discrete | Product discount in US dollars | 65, 10, 16 |
| Weight_in_gms | Continuous | Product weight in grams | 4953, 5676, 2171 |
| Reached.on.Time_Y.N | Nominal | Target variable (1: NOT reached on time, 0: reached on time) | 1, 0 |
  1. 59.7% of shipping deliveries reach customers late (6,563 of 10,999 customers).

  2. Shipments by Ship and from Warehouse block F have the highest delivery counts, but the percentage of late deliveries looks almost the same across shipment modes and warehouse blocks. This suggests that lateness is influenced by other factors.

  3. Every product with a discount above 10 arrived late. One assumption is that this happens in specific months, but it needs further checking.

  4. Deliveries of products weighing between 2 and 4 kg all arrived late.
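The headline figure and the group-wise comparison in the findings above can be reproduced with a few lines of arithmetic. The following is a minimal sketch using only the standard library; the helper `late_share_by` and the sample rows are hypothetical and are not taken from the project's notebook or the dataset.

```python
from collections import defaultdict

# Headline figure from the findings: 6,563 late deliveries out of 10,999.
late, total = 6563, 10999
late_rate = late / total
print(f"Overall late rate: {late_rate:.1%}")


def late_share_by(rows, key, target="Reached.on.Time_Y.N"):
    """Return {group: share of rows with target == 1} for one column."""
    counts = defaultdict(lambda: [0, 0])  # group -> [late count, row count]
    for row in rows:
        counts[row[key]][0] += row[target]
        counts[row[key]][1] += 1
    return {group: n_late / n for group, (n_late, n) in counts.items()}


# Made-up rows for illustration only; they are not from the dataset.
sample_rows = [
    {"Mode_of_Shipment": "Ship", "Reached.on.Time_Y.N": 1},
    {"Mode_of_Shipment": "Ship", "Reached.on.Time_Y.N": 0},
    {"Mode_of_Shipment": "Flight", "Reached.on.Time_Y.N": 1},
    {"Mode_of_Shipment": "Road", "Reached.on.Time_Y.N": 0},
]
shares = late_share_by(sample_rows, "Mode_of_Shipment")
print(shares)
```

Comparing shares rather than raw counts is what separates "Ship has the most deliveries" from "Ship is late more often": a group can dominate the counts and still have the same late percentage as the others.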



Data Processing

• Check for missing and duplicate values
• Remove outliers using the z-score
• Apply ordinal encoding to the Product_importance column and encode the remaining categorical columns
• Select the best features for modeling
• Normalize and standardize all selected features
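Three of the steps above (z-score outlier removal, ordinal encoding, and standardization) can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the cutoff of 3 standard deviations, the Low < Medium < High mapping, the helper names, and the sample rows are all assumptions, since the README does not specify them.

```python
from statistics import mean, pstdev

# Assumed ordering for the ordinal Product_importance column.
IMPORTANCE_ORDER = {"low": 0, "medium": 1, "high": 2}


def zscores(values):
    """Return the z-score of each value (0.0 everywhere if there is no spread)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def drop_outliers(rows, column, threshold=3.0):
    """Keep rows whose |z-score| in `column` is within `threshold` (assumed cutoff)."""
    scores = zscores([row[column] for row in rows])
    return [row for row, z in zip(rows, scores) if abs(z) <= threshold]


def encode_importance(rows):
    """Replace Product_importance labels with their ordinal rank."""
    return [
        {**row, "Product_importance": IMPORTANCE_ORDER[row["Product_importance"].lower()]}
        for row in rows
    ]


def standardize(rows, column):
    """Rescale `column` to zero mean and unit variance."""
    scores = zscores([row[column] for row in rows])
    return [{**row, column: z} for row, z in zip(rows, scores)]


# Made-up rows for illustration: 19 typical discounts and one extreme value.
levels = ["Low", "Medium", "High"]
rows = [
    {"Discount_offered": 10, "Weight_in_gms": 1000 + 100 * i, "Product_importance": levels[i % 3]}
    for i in range(19)
]
rows.append({"Discount_offered": 65, "Weight_in_gms": 2000, "Product_importance": "High"})

cleaned = drop_outliers(rows, "Discount_offered")   # drops the extreme discount
encoded = encode_importance(cleaned)                 # labels become 0, 1, 2
scaled = standardize(encoded, "Weight_in_gms")       # mean 0, standard deviation 1
```

Each helper returns new rows instead of mutating its input, so the steps can be reordered or skipped while experimenting with the pipeline.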


Modelling

• Split the data into features and target
• Split the data into training and test sets
• Train models with six algorithms: Decision Tree, Logistic Regression, Random Forest, XGBoost, KNN, and LightGBM
• Evaluate each model on accuracy, precision, recall, F1-score, and AUC, with AUC as the primary metric
• Tune hyperparameters
• Select the best model
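The splitting and AUC steps above can be sketched without any modeling library. The block below is a minimal standard-library illustration, not the project's code: the function names, the 80/20 split, and the seed are assumptions, and model training itself is left out. AUC is computed here as the probability that a randomly chosen late delivery receives a higher score than a randomly chosen on-time one, with ties counting as half.

```python
import random


def split_features_target(rows, target="Reached.on.Time_Y.N"):
    """Separate each row into a feature dict and its target label."""
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y


def train_test_split(X, y, test_size=0.2, seed=42):
    """Shuffle indices reproducibly and hold out `test_size` of the rows."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up rows for illustration only.
rows = [{"Discount_offered": d, "Reached.on.Time_Y.N": int(d > 10)} for d in range(5, 15)]
X, y = split_features_target(rows)
X_train, X_test, y_train, y_test = train_test_split(X, y)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC is quadratic in the number of rows, which is fine for a sketch; a library implementation would normally be used on the full dataset.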


Model Evaluation


| Model | Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|
| Decision Tree | 0.65 | 0.72 | 0.66 | 0.69 | 0.65 |
| Logistic Regression | 0.58 | 0.58 | 1.00 | 0.73 | 0.50 |
| LightGBM | 0.66 | 0.76 | 0.60 | 0.67 | 0.739 |
| KNN | 0.66 | 0.78 | 0.56 | 0.65 | 0.67 |
| Random Forest | 0.68 | 0.82 | 0.56 | 0.67 | 0.70 |
| XGBoost | 0.65 | 0.71 | 0.67 | 0.69 | 0.65 |
Based on the model evaluation (AUC score and recall), we chose the Decision Tree algorithm.
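As a consistency check on the evaluation table, the F1-score can be recomputed from the reported precision and recall using F1 = 2PR / (P + R). The sketch below uses only the numbers from the table; the helper name `f1_score` is chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the evaluation table.
reported = {
    "Decision Tree": (0.72, 0.66, 0.69),
    "Logistic Regression": (0.58, 1.00, 0.73),
    "LightGBM": (0.76, 0.60, 0.67),
    "KNN": (0.78, 0.56, 0.65),
    "Random Forest": (0.82, 0.56, 0.67),
    "XGBoost": (0.71, 0.67, 0.69),
}

for model, (precision, recall, f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{model}: recomputed F1 = {recomputed:.2f}, reported = {f1:.2f}")
```

All six recomputed values agree with the table to two decimals. The Logistic Regression row (recall 1.00, precision equal to accuracy, AUC 0.50) matches the pattern a classifier would produce if it labeled every delivery as late; this is an inference from the numbers, not something the source states.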

Business Recommendations

Short term

• Add an estimated arrival time to help ensure packages arrive on time
• Give credit points as compensation to retain customer loyalty

Long term

• Add more features to give more specific and accurate insights
• Perform an operational audit based on the insights