# -*- coding: utf-8 -*-
"""FoodHub_Project_FullCode_Rithul.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1MmfYnUTZKhSamzjn2EfSYxilVcJFst0F
## Note: This is a sample solution for the project. Projects will NOT be graded on the basis of how well the submission matches this sample solution. Projects will be graded on the basis of the rubric only.
# Project Python Foundations: FoodHub Data Analysis
**Marks: 60**
### Context
The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator company FoodHub offers access to multiple restaurants through a single smartphone app.
The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.
### Objective
The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company to improve the business.
### Data Description
The dataset contains the details of each food order. The detailed data dictionary is given below.
### Data Dictionary
* order_id: Unique ID of the order
* customer_id: ID of the customer who ordered the food
* restaurant_name: Name of the restaurant
* cuisine_type: Cuisine ordered by the customer
* cost_of_the_order: Cost of the order
* day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
* rating: Rating given by the customer out of 5
* food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
* delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off information
### Let us start by importing the required libraries
"""
# Commented out IPython magic to ensure Python compatibility.
# import libraries for data manipulation
import numpy as np
import pandas as pd
# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# %matplotlib inline
"""### Understanding the structure of the data"""
# read the data
# In Google Colab, upload foodhub_order.csv through the file picker;
# anywhere else, read it from the working directory.
try:
    import io
    from google.colab import files
    uploaded = files.upload()
    # the dataset is now stored in a pandas DataFrame
    df = pd.read_csv(io.BytesIO(uploaded['foodhub_order.csv']))
except ImportError:
    df = pd.read_csv('foodhub_order.csv')
# returns the first 5 rows
df.head()
"""#### Observations:
The DataFrame has 9 columns as mentioned in the Data Dictionary. Data in each row corresponds to the order placed by a customer.
### **Question 1:** How many rows and columns are present in the data? [0.5 mark]
"""
# check the shape of the dataset
df.shape
"""#### Observations:
* The DataFrame has 1898 rows and 9 columns.
### **Question 2:** What are the datatypes of the different columns in the dataset? (The info() function can be used) [0.5 mark]
"""
# use info() to print a concise summary of the DataFrame
df.info()
"""#### Observations:
* There are a total of 1898 non-null observations in each of the columns.
* The dataset contains 9 columns: 4 are of integer type ('order_id', 'customer_id', 'food_preparation_time', 'delivery_time'), 1 is of floating point type ('cost_of_the_order') and 4 are of the general object type ('restaurant_name', 'cuisine_type', 'day_of_the_week', 'rating').
* Total memory usage is approximately 133.6 KB.
### **Question 3:** Are there any missing values in the data? If yes, treat them using an appropriate method. [1 mark]
"""
# Checking for missing values
df.isnull().sum()
df.head()
"""#### Observations:
* There are no missing values in the data.
### **Question 4:** Check the statistical summary of the data. What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed? [2 marks]
"""
# get the summary statistics of the numerical data
df.describe()
#--Convert from scientific to non-scientific
pd.set_option('display.float_format', lambda x: '%.2f' % x)
df.describe().T
#--pd.set_option is not mandatory, it just helps with readability--#
"""#### Observations:
* Order ID and Customer ID are just identifiers for each order.
* The cost of an order ranges from 4.47 to 35.41 dollars, with an average order costing around 16 dollars and a standard deviation of 7.5 dollars. The cost of 75% of the orders is below 23 dollars. This indicates that most customers prefer lower-cost food over expensive options.
* Food preparation time ranges from 20 to 35 minutes, with an average of around 27 minutes and a standard deviation of 4.6 minutes. The spread is not very high for the food preparation time.
* Delivery time ranges from 15 to 33 minutes, with an average of around 24 minutes and a standard deviation of 5 minutes. The spread is not too high for delivery time either.
### **Question 5:** How many orders are not rated? [1 mark]
"""
df['rating'].value_counts()
#--Alternate solution
#--Filter the data and get a count
df_not_rated = df[df['rating'] == 'Not given']
df_not_rated.shape
df[(df['rating'] == 'Not given')]['rating'].value_counts()
"""#### Observations:
* There are 736 orders that are not rated.
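
Because 'Not given' is stored as text, the 'rating' column is read as an object column. The following is a minimal sketch of another way to count unrated orders, using a small made-up sample rather than the real dataset; the names `sample`, `rating_num` and `not_rated` are illustrative only.

```python
import pandas as pd

# Hypothetical sample mimicking the 'rating' column (not the real data)
sample = pd.DataFrame({'rating': ['5', 'Not given', '4', 'Not given', '3']})

# Entries that cannot be parsed as numbers, such as 'Not given', become NaN
rating_num = pd.to_numeric(sample['rating'], errors='coerce')

# Count the orders without a numeric rating
not_rated = int(rating_num.isna().sum())
print('Orders not rated:', not_rated)
print('Mean of the given ratings:', rating_num.mean())
```

A numeric column with NaN for missing ratings also lets `mean()` skip the unrated orders automatically.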
### Exploratory Data Analysis (EDA)
### Univariate Analysis
### **Question 6:** Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.) [9 marks]
#### Order ID
"""
# check unique order ID
df['order_id'].nunique()
"""#### Observations:
* There are 1898 unique orders. As mentioned earlier, 'order_id' is just an identifier for the orders.
#### Customer ID
"""
# check unique customer ID
df['customer_id'].nunique()
"""#### Observations:
* There are 1200 unique customers. As 'customer_id' is a variable to identify customers, and the number of unique customer IDs is less than the number of unique order IDs, we can see that there are some customers who have placed more than one order.
#### Restaurant name
"""
# check unique restaurant name
df['restaurant_name'].nunique()
"""#### Observations:
* There are 178 unique restaurants in the dataset.
* Let's check the number of orders that get served by the restaurants.
"""
df['restaurant_name'].value_counts()
df['restaurant_name'].value_counts(normalize = True)
#--Alternate solution---#
#--Use groupby
df.groupby('restaurant_name')['restaurant_name'].count().sort_values(ascending = False)
"""#### Observations:
* The restaurant that has received the maximum number of orders is Shake Shack.
#### Cuisine type
"""
# check unique cuisine type
df['cuisine_type'].nunique()
plt.figure(figsize = (15,5))
sns.countplot(data = df, x = 'cuisine_type');
#--Sort the plot
plt.figure(figsize = (15,5))
sns.countplot(data = df, x = 'cuisine_type', order = df['cuisine_type'].value_counts().index);
"""#### Observations:
* There are 14 unique cuisines in the dataset.
* The distribution of cuisine types shows that orders are not equally spread across cuisines.
* The most frequent cuisine type is American followed by Japanese and Italian.
* Vietnamese appears to be the least popular of all the cuisines.
#### Cost of the order
"""
sns.histplot(data=df,x='cost_of_the_order')
plt.show()
sns.boxplot(data=df,x='cost_of_the_order')
plt.show()
"""#### Observations:
* The average cost of the order is greater than the median cost indicating that the distribution for the cost of the order is right-skewed.
* The mode of the distribution indicates that a large chunk of people prefer to order food that costs around 10-12 dollars.
* There are a few orders that cost more than 30 dollars. These orders might be for some expensive meals.
#### Day of the week
"""
# check the unique values
df['day_of_the_week'].value_counts()
sns.countplot(data = df, x = 'day_of_the_week')
"""#### Observations:
* The 'day_of_the_week' column consists of 2 unique values - Weekday and Weekend
* The distribution shows that the number of orders placed on weekends is approximately double the number of orders placed on weekdays.
#### Rating
"""
# check the unique values
df['rating'].value_counts()
sns.countplot(data = df, x = 'rating')
"""#### Observations:
* The distribution of 'rating' shows that the most frequent rating category is 'Not given', followed by a rating of 5.
* Only around 200 orders have been rated 3.
#### Food Preparation time
"""
sns.histplot(data=df,x='food_preparation_time')
plt.show()
sns.boxplot(data=df,x='food_preparation_time')
plt.show()
"""#### Observations:
* The average food preparation time is almost equal to the median food preparation time indicating that the distribution is nearly symmetrical.
* The food preparation time is pretty evenly distributed between 20 and 35 minutes.
* There are no outliers in this column.
#### Delivery time
"""
sns.histplot(data=df,x='delivery_time')
plt.show()
sns.boxplot(data=df,x='delivery_time')
plt.show()
"""#### Observations:
* The average delivery time is a bit smaller than the median delivery time indicating that the distribution is a bit left-skewed.
* Comparatively more orders have a delivery time between 24 and 30 minutes.
* There are no outliers in this column.
### **Question 7:** Which are the top 5 restaurants in terms of the number of orders received? [1 mark]
"""
# Get top 5 restaurants with highest number of orders
df['restaurant_name'].value_counts()[:5]
#--Using group by
df_top_5 = df.groupby(['restaurant_name'])['restaurant_name'].count().sort_values(ascending = False)
df_top_5.head()
"""#### Observations:
* The top 5 restaurants by number of orders received are **'Shake Shack', 'The Meatball Shop', 'Blue Ribbon Sushi', 'Blue Ribbon Fried Chicken' and 'Parm'**.
* Almost 33% of the orders in the dataset are from these restaurants.
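
A combined share like the one above can be read directly from normalized counts. This is a minimal sketch on made-up restaurant names (the series `names` and the top-2 cutoff are assumptions for illustration, not the real data):

```python
import pandas as pd

# Hypothetical sample of restaurant names, one entry per order (not the real data)
names = pd.Series(['A', 'A', 'A', 'B', 'B', 'C', 'D', 'E', 'F', 'G'])

# normalize=True returns each restaurant's share of all orders,
# sorted from the most to the least frequent restaurant
shares = names.value_counts(normalize=True)

# Combined share of the two busiest restaurants, as a percentage
top2_share = shares.head(2).sum() * 100
print('Share of orders from the top 2 restaurants:', round(top2_share, 2), '%')
```

On the real DataFrame the same idea would be `df['restaurant_name'].value_counts(normalize=True).head(5).sum() * 100`.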
### **Question 8:** Which is the most popular cuisine on weekends? [1 mark]
"""
# Get most popular cuisine on weekends
df_weekend = df[df['day_of_the_week'] == 'Weekend']
df_weekend['cuisine_type'].value_counts()
#--Alternate solution---#
#--Get a countplot
plt.figure(figsize = (15,5))
sns.countplot(data = df_weekend, x = 'cuisine_type', order = df_weekend['cuisine_type'].value_counts().index)
#--Alternate solution 2--#
#--Get a countplot from the original data-frame
plt.figure(figsize = (15,5))
sns.countplot(data = df, x = 'cuisine_type', hue = 'day_of_the_week', order = df_weekend['cuisine_type'].value_counts().index)
"""#### Observations:
* The most popular cuisine type on weekends is American.
### **Question 9:** What percentage of the orders cost more than 20 dollars? [2 Marks]
"""
# Get orders that cost above 20 dollars
df_greater_than_20 = df[df['cost_of_the_order'] > 20]
# Calculate the number of total orders where the cost is above 20 dollars
print('The number of total orders that cost above 20 dollars is:', df_greater_than_20.shape[0])
# Calculate percentage of such orders in the dataset
percentage = (df_greater_than_20.shape[0] / df.shape[0]) * 100
print("Percentage of orders above 20 dollars:", round(percentage, 2), '%')
#--Using a new column with np.where
df['more_than_20'] = np.where(df['cost_of_the_order'] > 20, 1, 0)
percentage_1 = (df['more_than_20'].sum() / df.shape[0])*100
print("Percentage of orders above 20 dollars:", round(percentage_1, 2), '%')
"""#### Observations:
* There are a total of 555 orders that cost above 20 dollars.
* The percentage of such orders in the dataset is around 29.24%.
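
The percentage can also be computed without filtering or adding a column, because the mean of a boolean mask is the fraction of True values. A minimal sketch on made-up costs (the series `costs` is hypothetical):

```python
import pandas as pd

# Hypothetical order costs in dollars (not the real data)
costs = pd.Series([25.0, 12.5, 30.0, 8.0, 20.0])

# (costs > 20) is a boolean Series; its mean is the fraction of True values.
# An order costing exactly 20 dollars is not counted as "more than 20".
pct_above_20 = (costs > 20).mean() * 100
print('Percentage of orders above 20 dollars:', round(pct_above_20, 2), '%')
```

Here 2 of the 5 sample orders exceed 20 dollars, so the result is 40%.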
### **Question 10:** What is the mean order delivery time? [1 mark]
"""
# get the mean delivery time
print('The mean delivery time for this dataset is', round(df['delivery_time'].mean(), 2), 'minutes')
"""#### Observations:
* The mean delivery time is around 24.16 minutes.
### **Question 11:** The company has decided to give 20% discount vouchers to the top 5 most frequent customers. Find the IDs of these customers and the number of orders they placed. [1 mark]
"""
# Get the counts of each customer_id
df['customer_id'].value_counts().head()
"""#### Observations:
- Customer with ID 52832 has ordered 13 times.
### Multivariate Analysis
### **Question 12:** Perform a multivariate analysis to explore relationships between the important variables in the dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical and categorical variables) [10 marks]
#### Cuisine vs Cost of the order
"""
# Relationship between cost of the order and cuisine type
plt.figure(figsize=(15,7))
sns.boxplot(x = "cuisine_type", y = "cost_of_the_order", data = df, palette = 'PuBu')
plt.xticks(rotation = 60)
plt.show()
#--What is the most expensive cuisine
df_grouped = df.groupby('cuisine_type')['cost_of_the_order'].mean().sort_values(ascending = False).reset_index()
df_grouped
"""#### Observations:
* Vietnamese and Korean cuisines cost less compared to other cuisines.
* The boxplots for Italian, American, Chinese, Japanese cuisines are quite similar. This indicates that the quartile costs for these cuisines are quite similar.
* Outliers are present for the cost of Korean, Mediterranean and Vietnamese cuisines.
* French and Spanish cuisines are costlier compared to other cuisines.
#### Cuisine vs Food Preparation time
"""
# Relationship between food preparation time and cuisine type
plt.figure(figsize=(15,7))
sns.boxplot(x = "cuisine_type", y = "food_preparation_time", data = df, palette = 'PuBu')
plt.xticks(rotation = 60)
plt.show()
"""#### Observations:
* Food preparation time is very consistent for most of the cuisines.
* The median food preparation time lies between 24 and 30 minutes for all the cuisines.
* Outliers are present for the food preparation time of Korean cuisine.
* Korean cuisine takes less time compared to the other cuisines.
#### Day of the Week vs Delivery time
"""
# Relationship between day of the week and delivery time
plt.figure(figsize=(15,7))
sns.boxplot(x = "day_of_the_week", y = "delivery_time", data = df, palette = 'PuBu')
plt.xticks(rotation = 60)
plt.show()
"""#### Observations:
* Delivery times on weekends are generally shorter than on weekdays. This could be due to the dip in traffic over the weekends.
#### Revenue generated by the restaurants
"""
# Total revenue (sum of order costs) of the 14 highest-earning restaurants
df.groupby(['restaurant_name'])['cost_of_the_order'].sum().sort_values(ascending = False).head(14)
"""#### Observations:
* The above 14 restaurants each generate more than 500 dollars in revenue.
#### Rating vs Delivery time
"""
# Relationship between rating and delivery time
plt.figure(figsize=(15, 7))
sns.pointplot(x = 'rating', y = 'delivery_time', data = df)
plt.show()
"""#### Observations:
* It is possible that delivery time plays a role in the low-rating of the orders.
#### Rating vs Food preparation time
"""
# Relationship between rating and food preparation time
plt.figure(figsize=(15, 7))
sns.pointplot(x = 'rating', y = 'food_preparation_time', data = df)
plt.show()
"""#### Observations:
* It seems that food preparation time does not play a role in the low-rating of the orders.
#### Rating vs Cost of the order
"""
# Relationship between rating and cost of the order
plt.figure(figsize=(15, 7))
sns.pointplot(x = 'rating', y = 'cost_of_the_order', data = df)
plt.show()
"""#### Observations
* It seems that high-cost orders have been rated well and low-cost orders have not been rated.
#### Correlation among variables
"""
# plot the heatmap
col_list = ['cost_of_the_order', 'food_preparation_time', 'delivery_time']
plt.figure(figsize=(15, 7))
sns.heatmap(df[col_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()
"""#### Observations:
* There is no strong linear correlation among cost of the order, delivery time and food preparation time.
### **Question 13:** The company wants to provide a promotional offer in the advertisement of the restaurants. The condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3 marks]
"""
# filter the rated restaurants
df_rated = df[df['rating'] != 'Not given'].copy()
# convert rating column from object to integer
df_rated['rating'] = df_rated['rating'].astype('int')
# create a dataframe that contains the restaurant names with their rating counts
df_rating_count = df_rated.groupby(['restaurant_name'])['rating'].count().sort_values(ascending = False).reset_index()
df_rating_count.head()
# get the restaurant names that have rating count more than 50
rest_names = df_rating_count[df_rating_count['rating'] > 50]['restaurant_name']
# filter to get the data of restaurants that have rating count more than 50
df_mean_4 = df_rated[df_rated['restaurant_name'].isin(rest_names)].copy()
# find the mean rating of the restaurants
df_mean_4.groupby(df_mean_4['restaurant_name'])['rating'].mean().sort_values(ascending = False).reset_index()
#--Question 13
#--Alternate solution using merge---#
#--Step 1: Get the average rating and filter for > 4
df_rating_filtered = df[df['rating'] != 'Not given'].copy() #--Remove 'not given'
df_rating_filtered['rating'] = df_rating_filtered['rating'].astype('int') #--Convert to integer
df_avg_rating = df_rating_filtered.groupby(['restaurant_name'])['rating'].mean().reset_index() #--Get average rating by restaurant name
df_avg_rating = df_avg_rating[df_avg_rating['rating'] > 4.0] #--Filter for > 4
df_avg_rating.rename(columns = {'rating' : 'avg_rating'}, inplace = True)
#--Step 2: Filter by rating count > 50
df_rating_count = df_rating_filtered.groupby(['restaurant_name'])['rating'].count().sort_values(ascending = False).reset_index()
df_rating_count = df_rating_count[df_rating_count['rating'] > 50]
df_rating_count.rename(columns = {'rating' : 'rating_count'}, inplace = True)
#--Step 3: Inner join to get the final set of names
df_final = pd.merge(left = df_rating_count, right = df_avg_rating, how = 'inner', on = 'restaurant_name')
df_final.sort_values(by = 'avg_rating', ascending = False, inplace = True)
df_final
"""#### Observations:
* The restaurants fulfilling the criteria to get the promotional offer are: **'The Meatball Shop', 'Blue Ribbon Fried Chicken', 'Shake Shack' and 'Blue Ribbon Sushi'**.
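
Both criteria can also be checked in a single pass with named aggregation. The following is a minimal sketch on made-up ratings, with the count threshold scaled down from 50 to 2 so that the toy data can satisfy it; `rated`, `summary` and `eligible` are illustrative names only.

```python
import pandas as pd

# Hypothetical rated orders (not the real data)
rated = pd.DataFrame({
    'restaurant_name': ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'rating': [5, 4, 5, 3, 4, 4, 5],
})

# Named aggregation: compute the rating count and the mean rating together
summary = (
    rated.groupby('restaurant_name')['rating']
    .agg(rating_count='count', avg_rating='mean')
    .reset_index()
)

# Assumed toy thresholds: more than 2 ratings and an average above 4
eligible = summary[(summary['rating_count'] > 2) & (summary['avg_rating'] > 4)]
print(eligible)
```

Only restaurant A passes: B has enough ratings but an average below 4, and C has a high average but a single rating.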
### **Question 14:** The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders. [3 marks]
"""
#function to determine the net revenue
def compute_rev(x):
    if x > 20:
        return x*0.25
    elif x > 5:
        return x*0.15
    else:
        return x*0
df['Revenue'] = df['cost_of_the_order'].apply(compute_rev)
df.head()
# get the total revenue and print it
total_rev = df['Revenue'].sum()
print('The net revenue is around', round(total_rev, 2), 'dollars')
#--Alternate solution using np.select
condition = [(df['cost_of_the_order'] > 20), (df['cost_of_the_order'] > 5)]
values = [df['cost_of_the_order'] * 0.25, df['cost_of_the_order'] * 0.15]
df['charge_to_foodhub'] = np.select(condition, values)
df['charge_to_foodhub'].sum()
#--Alternate solution using np.where
df['revenue'] = np.where(df['cost_of_the_order'] > 20, df['cost_of_the_order'] * 0.25,
                         np.where(df['cost_of_the_order'] > 5, df['cost_of_the_order'] * 0.15, 0))
total_revenue = df['revenue'].sum()
total_revenue
"""#### Observations:
* The net revenue generated on all the orders given in the dataset is around 6166.3 dollars.
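
The tier boundaries are easy to misread, so here is a small worked check on made-up costs (the series `costs` is hypothetical). With the strict comparisons used in the solution, an order of exactly 20 dollars falls in the 15% tier and an order of 5 dollars or less is not charged.

```python
import numpy as np
import pandas as pd

# Hypothetical order costs in dollars, chosen to hit each tier and both boundaries
costs = pd.Series([25.0, 20.0, 10.0, 5.0, 4.0])

# np.select uses the first condition that is True for each element;
# default=0 covers orders that cost 5 dollars or less
commission = np.select(
    [costs > 20, costs > 5],
    [costs * 0.25, costs * 0.15],
    default=0,
)

print(commission)  # per-order charge
print('Total:', round(float(commission.sum()), 2))
```

The per-order charges are 6.25, 3.0, 1.5, 0 and 0 dollars, for a total of 10.75 dollars.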
### **Question 15:** The company wants to analyze the total time required to deliver the food. What percentage of orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and then delivered.) [2 marks]
"""
# add a new column to the dataframe df to store the total delivery time
df['total_time'] = df['food_preparation_time'] + df['delivery_time']
# find the percentage of orders that have more than 60 minutes of total delivery time
print('The percentage of orders that have more than 60 minutes of total delivery time is',
      round(df[df['total_time'] > 60].shape[0] / df.shape[0] * 100, 2),'%')
#--Using an additional column
df['total_time_2'] = df['food_preparation_time'] + df['delivery_time']
df['more_than_60'] = np.where(df['total_time_2'] > 60, 1, 0)
percentage_more_than_60 = round(df['more_than_60'].sum() / df.shape[0]*100,2)
percentage_more_than_60
"""#### Observations:
* Approximately 10.54 % of the total orders have more than 60 minutes of total delivery time.
### **Question 16:** The company wants to analyze the delivery time of the orders on weekdays and weekends. How does the mean delivery time vary during weekdays and weekends? [2 marks]
"""
# get the mean delivery time on weekdays and print it
print('The mean delivery time on weekdays is around',
      round(df[df['day_of_the_week'] == 'Weekday']['delivery_time'].mean()),
      'minutes')
# get the mean delivery time on weekends and print it
print('The mean delivery time on weekends is around',
      round(df[df['day_of_the_week'] == 'Weekend']['delivery_time'].mean()),
      'minutes')
"""#### Observations:
* The mean delivery time on weekdays is around 28 minutes whereas the mean delivery time on weekends is around 22 minutes.
* This could be due to the dip in traffic volume on weekends.
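
Both means can also be obtained in one step with a groupby. A minimal sketch on made-up delivery times (the frame `sample` is hypothetical and only reuses the column names from the data dictionary):

```python
import pandas as pd

# Hypothetical orders (not the real data)
sample = pd.DataFrame({
    'day_of_the_week': ['Weekday', 'Weekday', 'Weekend', 'Weekend', 'Weekend'],
    'delivery_time': [28, 30, 20, 22, 24],
})

# One mean per category of 'day_of_the_week'
mean_by_day = sample.groupby('day_of_the_week')['delivery_time'].mean()
print(mean_by_day)
```

For this sample the weekday mean is 29 minutes and the weekend mean is 22 minutes.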
### Conclusion and Recommendations
### **Question 17:** What are your conclusions from the analysis? What recommendations would you like to share to help improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.) [6 marks]
### Conclusions:
* Around 80% of the orders are for American, Japanese, Italian and Chinese cuisines. Thus, it seems that these cuisines are quite popular among customers of FoodHub.
* Shake Shack is the most popular restaurant that has received the highest number of orders.
* Order volumes increase on the weekends compared to the weekdays.
* Delivery time over the weekends is less compared to the weekdays. This could be due to the dip in traffic volume over the weekends.
* Around 39% of the orders have not been rated.
### Business Recommendations:
* FoodHub should integrate with restaurants serving American, Japanese, Italian and Chinese cuisines as these cuisines are very popular among FoodHub customers.
* FoodHub should provide promotional offers to top-rated popular restaurants like Shake Shack that serve most of the orders.
* As the order volume is high during the weekends, more delivery persons should be employed during the weekends to ensure timely delivery of the order. Weekend promotional offers should be given to the customers to increase the food orders during weekends.
* Customer rating is a very important factor to gauge customer satisfaction. The company should investigate the reason behind the low count of ratings. They can redesign the rating page in the app and make it more interactive to encourage customers to rate their orders.
* Around 11% of the total orders have more than 60 minutes of total delivery time. FoodHub should try to minimize such instances in order to avoid customer dissatisfaction. They can provide some reward to the punctual delivery persons.
"""