-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathannual_business_churn.Rmd
389 lines (301 loc) · 12.9 KB
/
annual_business_churn.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
---
title: "Annual Business Subscription Churn"
output: github_document
---
## Motivation
The purpose of this analysis is to gain a better understanding of _subscription churn_ and the factors that influence it. A subscription is considered churned when it has been cacelled or allowed to expire. More specifically, we will focus on _annual business_ subscriptions in this analysis.
```{r setup, include = F}
library(buffer); library(dplyr); library(ggplot2); library(survival); library(survminer)
```
## Findings and conclusions
Based on a small sample of 10 customers, the data suggests that there may be some confusion with trials, particularly for the `business_v2_small_yearly` plan.
Several of the customers started and canceled trials _and_ asked for refunds. Most of these users have also had previous subscriptions, so we probably had their credit cards on file. These customers may or may have known if, when, and exactly how much they were going to get charged.
In other cases, the `busines_v2_small_yearly` plan might not have been the right fit. Almost all of these subscriptions were canceled very quickly after having been created.
## Data collection and tidying
Let's analyze _business_ subscriptions created after January 1, 2016. The dataset we use comes from [**this Look**](https://looker.buffer.com/looks/3859) and contains the following information for each _business_ subscription created on or after January 1, 2016:
- subscription creation date
- subscription cancellation date
- subscription end date
- current subscription status
- plan id
- billing gateway (Stripe, Apple, etc.)
- plan amount in dollars
- customer id
We'll import it to R using the `buffer` package.
```{r warning = F, message = F}
# Get data from Looker
subs <- get_look(3859)
```
There are over 10k subscriptions in the dataset. Let's do a bit of cleanup and calculate the length of each subscription. The code to do that has been hidden below.
```{r include = F}
# Rename columns
colnames(subs) <- c('created_at', 'canceled_at', 'ended_at', 'status', 'id', 'plan',
'gateway', 'billing_cycle', 'amount', 'customer_id')
# Set dates as date objects
subs$created_at <- as.Date(subs$created_at, "%Y-%m-%d")
subs$canceled_at <- as.Date(subs$canceled_at, "%Y-%m-%d")
subs$ended_at <- as.Date(subs$ended_at, "%Y-%m-%d")
# Calculate subscription length
subs <- subs %>%
mutate(length = as.numeric(canceled_at - created_at),
churned = !(is.na(canceled_at)))
```
After cleaning, this is what the dataframe looks like:
```{r}
head(subs)
```
Now, to only look at _annual_ subscriptions, let's filter on the `billing_cycle`.
```{r}
# Filter out monthly subscriptions
annual <- subs %>%
filter(billing_cycle == 'year')
```
## Survival analysis
To get a better understanding of exactly when these Business customers churn, we'll use a method called survival analysis. Clasically, survival analysis was used to model the time it takes for people to die of a disease. However it can be used to model and analyze the time it takes for a specific event to occur, churn in this case.
It's particularly useful in this case because of missing data -- there must be subscriptions that will churn in our dataset _that haven't done so yet_. This is called _censoring_, and in particular _right censoring_.
Right censoring occurs when the date of the event is unknown, but is after some known date. Survival analysis can account for this kind of censoring.
There is also left censoring, for example when the date the subscription begain is unknown, but that is less applicable to our case.
The survival function, or survival curve, (`S`) models the probability that the time of the event (`T`) is greater than some specified time (`t`).
Let's build the survival curve and plot it out.
```{r message = F, warning = F}
# Kaplan Meier survival curve
annual$survival <- Surv(annual$length, annual$churned)
# Fit the model
fit <- survfit(survival ~ 1, data = annual)
# Create survival plot
ggsurvplot(fit, data = annual, risk.table = "percentage", risk.table.title = "Percent Remaining",
break.x.by = 60, xlim = c(0, 365))
```
The plot shows the percent of annual Business subscriptions still active X days after creating the subscription, and table below the graph shows the percent of subscription still remaining after X days.
By the end of the billing period (one year), only around 22% of subscriptions remain active.
It's also striking to see that only 51% of annual subscriptions are still active 60 days after creation. Looking at the graph, we can see that there is a heavy amount of churn in the first few days. Let's zoom in on this crucial time period.
```{r echo = F}
# Create survival plot
ggsurvplot(fit, data = annual, risk.table = "percentage", risk.table.title = "Percent Remaining",
break.x.by = 7, xlim = c(0, 70))
```
This plot zooms in on the first 70 days of the subscription. After 7 days, only 70% of the subscriptions remain. This is the biggest dropoff by far, and most of those subscriptions churned on the first day!
Let's look at these subscriptions.
```{r}
annual %>% group_by(churned) %>% count
# Get subscriptions that churned on first day
churned <- annual %>%
select(-survival) %>%
filter(length == 0)
```
There are 127 of these subscriptions. **The first day alone accounts for over 21% of _all_ churned subscriptions!**
Let's count these subscriptions by payment gateway.
```{r}
# Group by payment gateway
churned %>%
group_by(gateway) %>%
count
```
...and let's group by the plan as well.
```{r}
# Group by plan
churned %>%
group_by(plan) %>%
count %>%
arrange(desc(n))
```
The majority of these cases are small business yearly plans, which is to be expected. Let's grab a random sample of 10 of these subscriptions.
## Individual churned customers
Let's start by getting a random sample of 10 customers.
```{r}
# Set seed for reproducibility
set.seed(1234)
# Get sample of 10 customers
sample <- churned %>%
sample_n(size = 10) %>%
mutate(customer_id = as.character(customer_id))
```
We'll use SQL to look at the stripe events of each customer.
```{r include = F}
con <- redshift_connect()
```
#### Customer 1
Let's look at the stripe events of the first customer in the dataframe.
```{r}
cus1 <- sample[1, ]$customer_id
```
```{sql connection=con}
select
date
, type
, json_extract_path_text(data, 'object', 'customer') as customer
, json_extract_path_text(data, 'object','plan','id') as plan_id
, json_extract_path_text(data, 'object','subscription') as sub_id
, data
from stripe_api_events events
where json_extract_path_text(data, 'object', 'customer') = ?cus1
order by date desc
limit 100
```
This user was previously on a monthly Awesome plan before switching to a yearly Small Business plan. The user decided to cancel the subscription at the end of the billing period shortly after creating the subscription. Let's look at the next customer.
#### Customer 2
```{r}
cus2 <- sample[2, ]$customer_id
```
```{sql connection=con}
select
date
, type
, json_extract_path_text(data, 'object', 'customer') as customer
, json_extract_path_text(data, 'object','plan','id') as plan_id
, json_extract_path_text(data, 'object','subscription') as sub_id
, data
from stripe_api_events events
where json_extract_path_text(data, 'object', 'customer') = ?cus2
order by date desc
limit 100
```
This customer was also previously on a monthly Awesome plan before switching to an annual Small Business _trial_. The subscription was created, quickly deleted, and a _new_, separate small business plan was created.
#### Customer 3
```{r}
cus3 <- sample[3, ]$customer_id
```
```{sql connection=con}
select
date
, type
, json_extract_path_text(data, 'object', 'customer') as customer
, json_extract_path_text(data, 'object','plan','id') as plan_id
, json_extract_path_text(data, 'object','subscription') as sub_id
, data
from stripe_api_events events
where json_extract_path_text(data, 'object', 'customer') = ?cus3
order by date desc
limit 100
```
This customer was also previously on an Awesome monthly plan. The user switched to an annual business _trial_ and deleted the subscription 1 second later.
#### Customer 4
```{r}
cus4 <- sample[4, ]$customer_id
```
```{sql connection=con}
select
date
, type
, json_extract_path_text(data, 'object', 'customer') as customer
, json_extract_path_text(data, 'object','plan','id') as plan_id
, json_extract_path_text(data, 'object','subscription') as sub_id
, data
from stripe_api_events events
where json_extract_path_text(data, 'object', 'customer') = ?cus4
order by date desc
limit 100
```
This customer was also on an Awesome monthly plan, switched to a Small Business yearly plan, and canceled it 10 minutes later.
#### Customer 5
```{r}
cus5 <- sample[5, ]$customer_id
```
```{sql connection=con}
select
date
, type
, json_extract_path_text(data, 'object', 'customer') as customer
, json_extract_path_text(data, 'object','plan','id') as plan_id
, json_extract_path_text(data, 'object','subscription') as sub_id
, data
from stripe_api_events events
where json_extract_path_text(data, 'object', 'customer') = ?cus5
order by date desc
limit 100
```
This customer was a new customer, started a Small Business yearly plan, paid, and canceled it 30 seconds later.
#### Customer 6
```{r}
cus6 <- sample[6, ]$customer_id
```
```{sql connection=con}
select
date
, type
, json_extract_path_text(data, 'object', 'customer') as customer
, json_extract_path_text(data, 'object','plan','id') as plan_id
, json_extract_path_text(data, 'object','subscription') as sub_id
, data
from stripe_api_events events
where json_extract_path_text(data, 'object', 'customer') = ?cus6
order by date desc
limit 100
```
This customer was also on a monthly Awesome plan. The user started a Small Business yearly trial, canceled it quickly, and created a new active Small Business yearly subscription.
#### Customer 7
```{r}
cus7 <- sample[7, ]$customer_id
```
```{sql connection=con}
select
date
, type
, json_extract_path_text(data, 'object', 'customer') as customer
, json_extract_path_text(data, 'object','plan','id') as plan_id
, json_extract_path_text(data, 'object','subscription') as sub_id
, data
from stripe_api_events events
where json_extract_path_text(data, 'object', 'customer') = ?cus7
order by date desc
limit 100
```
This customer started a Small Business yearly trial, was charged, got a refund, and canceled the subscription.
#### Customer 8
```{r}
cus8 <- sample[8, ]$customer_id
```
```{sql connection=con}
select
date
, type
, json_extract_path_text(data, 'object', 'customer') as customer
, json_extract_path_text(data, 'object','plan','id') as plan_id
, json_extract_path_text(data, 'object','subscription') as sub_id
, data
from stripe_api_events events
where json_extract_path_text(data, 'object', 'customer') = ?cus8
order by date desc
limit 100
```
This seems like a normal customer that started a yearly trial, canceled it, and ended up on an active Small Business yearly subscription.
#### Customer 9
```{r}
cus9 <- sample[9, ]$customer_id
```
```{sql connection=con}
select
date
, type
, json_extract_path_text(data, 'object', 'customer') as customer
, json_extract_path_text(data, 'object','plan','id') as plan_id
, json_extract_path_text(data, 'object','subscription') as sub_id
, data
from stripe_api_events events
where json_extract_path_text(data, 'object', 'customer') = ?cus9
order by date desc
limit 100
```
This customer switched back and forth between an Awesome yearly and Small Business yearly subscription before finally getting a refund and canceling the Small Business subscription.
#### Customer 10
```{r}
cus10 <- sample[10, ]$customer_id
```
```{sql connection=con}
select
date
, type
, json_extract_path_text(data, 'object', 'customer') as customer
, json_extract_path_text(data, 'object','plan','id') as plan_id
, json_extract_path_text(data, 'object','subscription') as sub_id
, data
from stripe_api_events events
where json_extract_path_text(data, 'object', 'customer') = ?cus10
order by date desc
limit 100
```
It looks like this customer has switched between Awesome yearly and Small Business yearly plans, taking advantage of multiple trials along the way. The subscription has several failed payments and is in a "past due" state currently.
## Conclusions
This sample of 10 customers suggests that there may be some confusion with trials, particularly with the `business_v2_small_yearly` plan.
Several of the customers started and canceled trials _and_ asked for refunds. Most of these users have also had previous subscriptions, so we probably had their credit cards on file. These customers may or may have known if, when, and exactly how much they were going to get charged.
In other cases, the `busines_v2_small_yearly` plan might not have been the right fit. Almost all of these subscriptions were canceled very quickly after beginning.