This was a group final project for a Data Analytics Boot Camp. It focused on increasing Uber's order growth in 2015 by proposing campaigns derived from cluster analysis. The dataset contains Uber Inc's order data in New York City from January 2009 to June 2015. Orders had been trending downward since 2012, so we aimed to reverse that trend by segmenting customers with K-means clustering (choosing the number of clusters with the Elbow Method and Silhouette Analysis) and by reducing the dimensionality of the dataset with Principal Component Analysis (PCA) for easier interpretation and visualization. We then proposed a campaign for each customer segment and used time-series forecasting with Prophet to estimate the increase in orders if Uber executes the campaigns successfully. Finally, we built a Tableau dashboard that reports the key metrics and variables.
The dataset used was Uber Inc's order data in New York City from January 2009 to June 2015 (200,000 orders/transactions).
The features are order id, fare of each trip, pickup datetime, passenger count, and the coordinates of the pickup and drop-off locations.
How can Uber grow its orders by 3% within a year, relative to 2014?
For the purposes of this project, 2015 is the year in which Uber would implement the suggested campaigns. Since we only had data for January to June 2015, the campaigns focus on increasing orders in July to December, so that total orders in 2015 grow by ~3%.
Data preparation, outlier handling, and null cleaning were all done in Python (Google Colaboratory). To find trends and patterns, we carried out Exploratory Data Analysis (EDA) in Google Colaboratory, analyzing the orders by distance, by time, and by month and day.
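As a minimal sketch of this preparation step, the snippet below drops rows with nulls or implausible values and derives a trip distance from the pickup and drop-off coordinates. The field names, the cleaning thresholds, and the use of the haversine (great-circle) formula are all assumptions made for illustration; the project's actual rules may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_valid(trip):
    """Hypothetical cleaning rule: no nulls, positive fare, 1 to 6 passengers."""
    if any(value is None for value in trip.values()):
        return False
    return trip["fare"] > 0 and 1 <= trip["passengers"] <= 6


# Illustrative rows only, not taken from the project dataset.
trips = [
    {"fare": 7.5, "passengers": 1,
     "pickup": (40.7580, -73.9855), "dropoff": (40.7484, -73.9857)},
    {"fare": -3.0, "passengers": 2,
     "pickup": (40.7580, -73.9855), "dropoff": (40.7484, -73.9857)},
    {"fare": 12.0, "passengers": None,
     "pickup": (40.7580, -73.9855), "dropoff": (40.7484, -73.9857)},
]

clean = [t for t in trips if is_valid(t)]
for t in clean:
    t["distance_km"] = haversine_km(*t["pickup"], *t["dropoff"])
```

The haversine distance is a straight-line approximation, so it understates the distance actually driven; it is usually adequate for bucketing trips into near and far levels.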
To segment the users, we used K-means clustering, choosing the number of clusters with the Elbow Method and Silhouette Analysis. The variables included in the analysis were total fare, number of passengers, distance, and day of the order. We applied one-hot encoding to the categorical variables to convert them to numerical data, so that they could be used in the clustering.
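The sketch below illustrates how the Elbow Method and Silhouette Analysis can be used together to choose the number of clusters. It is a dependency-free toy, not the project's code: in practice a library such as scikit-learn would normally be used. The two-feature toy data, the deterministic farthest-point seeding, and all function names are assumptions made for illustration.

```python
import math


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point seeding (an assumption, for reproducibility)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    centroids = init_centroids(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(dist2(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data: three tight groups of already-scaled, made-up feature pairs.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

inertias, silhouettes = {}, {}
for k in range(2, 6):
    labels, inertias[k] = kmeans(points, k)
    silhouettes[k] = silhouette(points, labels)

# Elbow: look for the k after which inertia stops dropping sharply.
# Silhouette: prefer the k with the highest mean score.
best_k = max(silhouettes, key=silhouettes.get)
```

On this toy data the inertia falls steeply up to k = 3 and only marginally afterwards, and the silhouette score peaks at the same k, which is the kind of agreement between the two methods one hopes to see on real data.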
To reduce the dimensionality of the dataset, we used Principal Component Analysis (PCA) for better interpretation and visualization, as shown in the graphic below.
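A minimal sketch of the PCA step follows, assuming NumPy is available and that the features are standardized first so that no single unit dominates. The feature matrix is randomly generated for illustration, and `pca_project` is a hypothetical helper name, not a function from the project.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Returns (scores, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column (assumes no column is constant).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    return Z @ Vt[:n_components].T, explained[:n_components]


# Made-up feature matrix: fare, distance_km, passengers, is_weekend.
rng = np.random.default_rng(0)
distance = rng.uniform(1, 10, size=200)
fare = 2.5 + 2.0 * distance + rng.normal(0, 1, size=200)
passengers = rng.integers(1, 5, size=200)
is_weekend = rng.integers(0, 2, size=200)
X = np.column_stack([fare, distance, passengers, is_weekend])

scores, ratio = pca_project(X)
print(scores.shape, ratio.round(2))
```

The two columns of `scores` can be plotted against each other and colored by cluster label, and `ratio` indicates how much of the standardized variance the 2-D picture retains.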
From the analysis above, the users can be divided into 8 clusters, which can be summarized by the variable that distinguishes them, i.e. day category (weekday or weekend).
Since 8 clusters were too many to design campaign suggestions for, we analyzed several variables to find similarities among the clusters:
- The day category of the trips: total trips in each cluster show that Uber is dominated by users who travel on weekdays (typically for routine activities).
- The peak hours of each day category
The peak hours on weekdays
The peak hours on weekends
- Average order value (AOV): all clusters have the same average trip value (~USD 10), except for clusters 4 and 5, which have slightly higher values than the others.
- Average distance: all clusters fall in the same distance level (near, <= 5 km) with an average of ~3 km, except for clusters 4 and 5.
- Average passengers: all clusters have the same average passenger count (~2). This correlates with the type of Uber service that users might order.
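The per-cluster comparison in the list above can be sketched as a simple group-and-average step. The rows, field names, and the `profile` helper below are hypothetical and only show the shape of the computation, not the project's actual figures.

```python
from collections import defaultdict
from statistics import mean

# Illustrative rows only; cluster labels would come from the K-means step.
trips = [
    {"cluster": 0, "fare": 9.5, "distance_km": 2.5, "passengers": 1, "day": "weekday"},
    {"cluster": 0, "fare": 10.5, "distance_km": 3.5, "passengers": 3, "day": "weekday"},
    {"cluster": 1, "fare": 14.0, "distance_km": 6.0, "passengers": 2, "day": "weekend"},
    {"cluster": 1, "fare": 16.0, "distance_km": 8.0, "passengers": 2, "day": "weekday"},
]


def profile(rows):
    """Summarize each cluster by trip count, AOV, distance, passengers, weekday share."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row)
    return {
        cluster: {
            "trips": len(members),
            "aov": mean(m["fare"] for m in members),
            "avg_distance_km": mean(m["distance_km"] for m in members),
            "avg_passengers": mean(m["passengers"] for m in members),
            "weekday_share": mean(1 if m["day"] == "weekday" else 0 for m in members),
        }
        for cluster, members in groups.items()
    }


summary = profile(trips)
for cluster, stats in sorted(summary.items()):
    print(cluster, stats)
```

Clusters whose summaries are nearly identical on every metric are natural candidates to merge into a single campaign group.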
Based on the analysis above, we narrowed the clusters into 3 groups for campaign suggestion purposes:
The suggested campaigns, based on the cluster grouping above:
We also created a dashboard of this project with Tableau https://public.tableau.com/app/profile/cessa.mutiara.aziz/viz/shared/BYS6Z2QZQ
To estimate the expected outcome of the suggested campaigns, we used forecasting with Prophet, since it works best with time series that have strong seasonal effects and several seasons of historical data. It is also robust to missing data and shifts in the trend, and typically handles outliers well.
Because the historical trend is downward, the forecast continues that downward trend into the future, as shown in the graph below:
Thus, to model the expected outcome if Uber successfully implements the campaigns, we created a new dataset in which the number of orders in the second half of 2015 is 35% higher than the forecast, giving a total increase of ~3% for the full year (2015).
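The relationship between the second-half uplift and the full-year growth is simple arithmetic, sketched below. The order counts are placeholders chosen only so that the numbers are easy to follow; they are not taken from the project data, and the function names are hypothetical.

```python
def annual_growth(prev_total, h1_actual, h2_forecast, h2_uplift):
    """Year-over-year growth when the H2 forecast is scaled by (1 + h2_uplift)."""
    total = h1_actual + h2_forecast * (1 + h2_uplift)
    return total / prev_total - 1


def required_h2_uplift(prev_total, h1_actual, h2_forecast, target_growth):
    """H2 uplift over the forecast needed to reach target_growth for the full year."""
    needed_h2 = prev_total * (1 + target_growth) - h1_actual
    return needed_h2 / h2_forecast - 1


# Placeholder order counts, NOT taken from the project data.
ORDERS_2014 = 1000
H1_2015_ACTUAL = 460
H2_2015_FORECAST = 422

growth = annual_growth(ORDERS_2014, H1_2015_ACTUAL, H2_2015_FORECAST, 0.35)
uplift = required_h2_uplift(ORDERS_2014, H1_2015_ACTUAL, H2_2015_FORECAST, 0.03)
print(f"growth with 35% uplift: {growth:.2%}, uplift needed for 3%: {uplift:.2%}")
```

Because the first half of the year is already fixed and the forecast for the second half is declining, the uplift has to be applied to a shrinking base, which is why a large second-half increase translates into only a modest full-year gain.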
The analysis could be more comprehensive with revenue and cost data, which would allow a profit analysis, and the customer segmentation could be more detailed and targeted with access to more information about the users.