Expected # of transactions & CLV, etc. #1221
Replies: 9 comments
-
Hey! Have you tried the Pareto/NBD model (see https://www.pymc-marketing.io/en/stable/notebooks/clv/pareto_nbd.html)? This model allows for time-invariant covariates, which might help the model performance.
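For anyone landing here later, a minimal sketch of what fitting that model might look like. The column names, the covariate config keys, and the prediction method name below are assumptions based on the linked notebook and may differ between pymc-marketing versions:

```python
import pandas as pd
from pymc_marketing.clv import ParetoNBDModel

# One row per customer; frequency/recency/T must share one time unit (e.g. weeks).
# "channel_web" is a made-up time-invariant covariate column for illustration.
rfm = pd.read_csv("rfm_summary.csv")  # columns: customer_id, frequency, recency, T, channel_web

model = ParetoNBDModel(
    data=rfm,
    model_config={
        # Assumed config keys for time-invariant covariates; check the linked
        # notebook for the exact names in your installed version.
        "purchase_covariate_cols": ["channel_web"],
        "dropout_covariate_cols": ["channel_web"],
    },
)
model.fit()

# Expected purchases per customer over the next 90 time units
# (method name may differ in older releases).
expected = model.expected_purchases(future_t=90)
```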
-
Hey @sevimcengiz, I presume you're working with this dataset: https://www.kaggle.com/datasets/ankitverma2010/ecommerce-customer-churn-analysis-and-prediction/data
Databricks uses this same dataset in a notebook utilizing the […]. The plots in that notebook suggest a wildly disparate population of customers. It's best to filter out extraneous customers when identified to improve performance.
The choice of time period on which to do the train/test split is also important due to the possibility of data drift over time. The same concept also makes […].
It's also best to evaluate these models in aggregate, because historical purchase frequency is given in integers, but predicted purchases are provided as decimal floats over time. Applying a discount rate with […]
Yes, not only does the […]
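To make the filtering, the date-based calibration/holdout split, and the aggregate comparison concrete, here's a rough pandas sketch (the file name, column names, and cutoff date are hypothetical):

```python
import pandas as pd

# Hypothetical transaction log with customer_id, date, and amount columns
tx = pd.read_csv("transactions.csv", parse_dates=["date"])

# Filter out extraneous customers, e.g. extreme outliers in purchase count or spend
per_customer = tx.groupby("customer_id")["amount"].agg(n_orders="count", total_spend="sum")
keep = per_customer[
    (per_customer["n_orders"] <= per_customer["n_orders"].quantile(0.99))
    & (per_customer["total_spend"] <= per_customer["total_spend"].quantile(0.99))
].index
tx = tx[tx["customer_id"].isin(keep)]

# Split on a calendar date rather than random rows, so the holdout period
# reflects any drift in purchasing behaviour over time
split_date = pd.Timestamp("2023-06-30")  # assumed cutoff; pick one suited to your data
calibration = tx[tx["date"] <= split_date]
holdout = tx[tx["date"] > split_date]

# Aggregate comparison target: observed purchases per customer in the holdout window
actual_holdout_purchases = holdout.groupby("customer_id").size()
```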
-
Thanks @juanitorduz, I'll implement it. @ColtAllen, thanks for the long and detailed answer. The Databricks link helped me a lot. The data I'm working with is real data, similar to the online retail dataset, and I've quickly implemented all parts from the Databricks notebook. I've followed each step but get "NaN" as the CLV value for all customers.
Firstly, why is "purchases in the calibration period" used as the x-axis title in the first figure? I think it should be "purchases in the holdout period", since I'm keeping the holdout period for the validation check. The model calculates and validates reasonably on average. It doesn't fit perfectly, but it's acceptable at first glance; getting "NaN" CLV for all customers, though, makes me wonder which points I'm missing.
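One thing worth ruling out when every CLV comes back as NaN (a guess, not a confirmed diagnosis for this case): the Gamma-Gamma spend model is only meaningful for repeat customers with positive mean spend, so zero-frequency rows, non-positive monetary values, or NaNs in the inputs will propagate into the CLV output. A quick check, assuming a hypothetical customer summary file:

```python
import pandas as pd

# Hypothetical customer-level summary used to fit the models
# (columns assumed: customer_id, frequency, monetary_value)
rfm = pd.read_csv("rfm_summary.csv")

print((rfm["frequency"] == 0).sum(), "customers with no repeat purchases")
print((rfm["monetary_value"] <= 0).sum(), "customers with non-positive mean spend")
print(rfm[["frequency", "monetary_value"]].isna().sum())  # NaN inputs propagate to CLV

# The spend model should only see repeat customers with positive mean spend
repeat = rfm[(rfm["frequency"] > 0) & (rfm["monetary_value"] > 0)]
```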
-
This is a common bug in […]
-
Hi @sevimcengiz […]
-
I ran into NaN values when I used PyMC.
-
@sevimcengiz the API for these models has changed since this issue was created. Do you have any code examples you're able to share?
-
Actually, I haven't checked the code in the last 5 months, but I can check.
-
Hi,
I have real e-commerce data. Firstly, I used the "lifetimes" library for expected transactions, expected average revenue, and CLV prediction. However, the model overfits and fails on the test data; there is a serious mismatch between the actual data and the predicted data.
Then I found PyMC.
I have transaction data and the required information, like in your tutorials, and I've used the PyMC BG/NBD and Gamma-Gamma models.
I split the data into training and testing sets, fit the model on the training data, then predicted on the testing data and compared the predictions with the actual data. The results aren't satisfactory. What can I do to increase the accuracy? Are there any tricks to validate the results?
Which points am I missing?
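On the validation question, here is one sketch of the aggregate holdout comparison suggested earlier in the thread. The function and its inputs are illustrative, not part of any library:

```python
import pandas as pd

def aggregate_holdout_report(freq_cal: pd.Series,
                             predicted: pd.Series,
                             actual: pd.Series) -> pd.DataFrame:
    """Compare predicted vs. actual holdout purchases in aggregate.

    freq_cal  -- purchases per customer in the calibration period
    predicted -- model's expected purchases over the holdout horizon (fractional)
    actual    -- observed purchases per customer in the holdout period (integers)
    """
    print(f"predicted total: {predicted.sum():.1f}   actual total: {actual.sum()}")
    df = pd.DataFrame({"freq_cal": freq_cal, "pred": predicted, "actual": actual})
    # Fractional predictions are best judged against group means or totals,
    # not exact per-customer integer counts.
    return df.groupby("freq_cal")[["pred", "actual"]].mean()
```

If the binned means track each other reasonably well, the model may be performing adequately in aggregate even when individual customer predictions look off.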