MMM - Model Curves (Quasi-calibration) #296
cetagostini started this conversation in Ideas
Hi guys,
I wanted to raise a point about possibilities for calibrating the MMM model, along with some ideas I've been developing around this.
Response curve calibration
For context, I've been building models for different markets, each of which suffers from a different situation. For each one, it is always very important to visualize the response curves in order to understand how much I can increase my spending. However, on several occasions these curves have looked quite optimistic. For this reason, whenever possible, I also plot the results of experiments carried out in the channels I am analyzing with the MMM alongside the curves.
Example:
As you can see, the results of an experiment or a set of experiments sometimes do not perfectly match the curve.
This happens because the default method for generating response curves uses the mean of the posterior distribution of contributions.
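To make the point concrete, here is a small sketch (with made-up data, not pymc-marketing's actual internals) of how curve points come from averaging posterior draws, and why that average sits above the bulk of the mass when the posterior is right-skewed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior: for each spend level, the model yields a
# distribution of contributions (here, 4000 draws per spend level,
# simulated as right-skewed lognormal samples).
spend_levels = np.linspace(0, 10_000, 50)
posterior_contributions = rng.lognormal(
    mean=np.log(spend_levels + 1)[None, :], sigma=0.8, size=(4000, 50)
)

# Default behaviour described above: the plotted response curve is the
# mean over posterior draws at each spend level.
curve_mean = posterior_contributions.mean(axis=0)

# For a right-skewed posterior, the mean sits above the median (and the
# mode), which is what makes the default curve look "optimistic".
curve_median = np.median(posterior_contributions, axis=0)
```

Comparing `curve_mean` against `curve_median` at any spend level shows the mean is systematically higher for skewed draws.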
In principle, if all the distributions were roughly "normal", the mean, median, and mode would be equal (or at least close), so this should not matter. Any discrepancy could then be resolved by incorporating prior knowledge and adjusting the distributions.
But if, based on your data and your experiments, you find that your channel does not follow a "normal" distribution, even adding prior knowledge may not move the distribution much. What can be done in those cases?
Example ROAS distribution per channel on MMM model.
Here you can see that some channels have a "skewed" distribution, where the mean is not close to the area of greatest density (the most probable region, according to the model).
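The gap between these statistics is easy to reproduce. Below is a sketch using a simulated right-skewed ROAS posterior (a Gamma sample standing in for a channel's posterior draws; the KDE-argmax mode estimate is one common choice, not a pymc-marketing function):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)

# Hypothetical skewed ROAS posterior for one channel.
roas_draws = rng.gamma(shape=2.0, scale=1.5, size=5000)

mean = roas_draws.mean()
median = np.median(roas_draws)

# Approximate the mode as the argmax of a kernel density estimate.
kde = gaussian_kde(roas_draws)
grid = np.linspace(roas_draws.min(), roas_draws.max(), 1000)
mode = grid[np.argmax(kde(grid))]

# For this right-skewed posterior the ordering is mode < median < mean:
# the mean is pulled up by the tail, away from the densest region.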
This could be addressed by choosing a different default statistic that represents the data better than the mean. In my case, by using the mode, I obtained better results. I believe this is also related to the fact that, in essence, pymc-marketing is performing a probabilistic regression: if we want to pick a single point, we should pick the most probable or recurrent one (opinions and arguments welcome here).
When this is done, the response curve becomes less optimistic and moves closer to the calibration experiments: instead of contributions reaching into the thousands, they now top out in the hundreds.
Idea 💡
Make the statistic used to summarize the posterior configurable, rather than hard-coding the mean.
Why? 🧐
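A configurable summary could look something like the helper below. This is a hypothetical sketch, not pymc-marketing's API: `summarize_draws` and its `stat` parameter are names I'm inventing for illustration, and the mode is estimated via a KDE argmax per point.

```python
import numpy as np
from scipy.stats import gaussian_kde

def summarize_draws(draws: np.ndarray, stat: str = "mean") -> np.ndarray:
    """Collapse posterior draws of shape (n_draws, n_points) to one
    value per point, using a user-selected statistic.

    Hypothetical helper, not part of pymc-marketing: a sketch of how a
    configurable statistic could replace the hard-coded mean.
    """
    if stat == "mean":
        return draws.mean(axis=0)
    if stat == "median":
        return np.median(draws, axis=0)
    if stat == "mode":
        # Estimate the mode per point as the argmax of a KDE.
        modes = []
        for col in draws.T:
            grid = np.linspace(col.min(), col.max(), 512)
            kde = gaussian_kde(col)
            modes.append(grid[np.argmax(kde(grid))])
        return np.asarray(modes)
    raise ValueError(f"unknown stat: {stat!r}")
```

With skewed posteriors, `summarize_draws(draws, "mode")` yields a visibly lower (less optimistic) curve than `summarize_draws(draws, "mean")`, while for symmetric posteriors the choice barely matters.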
I think this type of implementation would allow the model and its results to generalize better across the different distribution shapes that may appear in the many scenarios users encounter.
Additionally, I believe this type of calibration would be fully compatible with causalpy, allowing experiment results to be used, even to automatically determine which area of the distribution should be used to plot the curves.
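One very simple way to "automatically" pick the region of the distribution from an experiment could be to choose whichever summary statistic lands closest to the lift-test estimate. This is a hypothetical sketch only; `pick_stat_from_experiment` and its candidate set are made up, and nothing here is causalpy or pymc-marketing API:

```python
import numpy as np

def pick_stat_from_experiment(draws: np.ndarray, experiment_lift: float) -> str:
    """Return the name of the candidate summary statistic whose value is
    closest to an experiment's point estimate.

    Hypothetical sketch: `draws` is a channel's posterior (e.g. ROAS)
    and `experiment_lift` is the point estimate from a lift test
    (which could itself be estimated with causalpy).
    """
    candidates = {
        "mean": float(draws.mean()),
        "median": float(np.median(draws)),
        "p25": float(np.percentile(draws, 25)),
    }
    return min(candidates, key=lambda name: abs(candidates[name] - experiment_lift))
```

A real implementation would want uncertainty on the experiment estimate too, but even this crude rule would anchor the plotted curve to the part of the posterior the experiments actually support.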
Assumptions: