A metric to indicate model sensitivity towards individual data points #173
Hi Morten,
Variants on leave-one-out (LOO) validation could certainly be a good start. One useful sensitivity metric would be to plot the change in the location of the expected minimum (in normalized axes) when leaving out each individual data point and fitting the model to the rest. If you take a look at lecture 7 of Richard McElreath's Statistical Rethinking 2022 course, he names a few other good metrics for outlier detection, namely Pareto Smoothed Importance Sampling (PSIS) and the Widely Applicable Information Criterion (WAIC).
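
To make that concrete, here is a minimal sketch of that LOO refit loop, assuming a scikit-learn `GaussianProcessRegressor` as the surrogate and a pre-built grid of candidate points; the function name and arguments are illustrative, not part of this package:

```python
# Minimal sketch: leave-one-out shift of the expected-minimum location.
# Assumes a scikit-learn GP surrogate; not this package's actual API.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_minimum_shifts(X, y, grid):
    """For each data point, refit without it and measure how far the
    predicted minimum moves, in normalized axes."""
    X, y, grid = np.asarray(X), np.asarray(y), np.asarray(grid)
    lo, hi = grid.min(axis=0), grid.max(axis=0)
    normalize = lambda Z: (Z - lo) / (hi - lo)

    # Location of the expected minimum with all data included.
    full_fit = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    full_min = grid[full_fit.predict(grid).argmin()]

    shifts = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        loo_fit = GaussianProcessRegressor(normalize_y=True).fit(X[mask], y[mask])
        loo_min = grid[loo_fit.predict(grid).argmin()]
        shifts.append(np.linalg.norm(normalize(loo_min) - normalize(full_min)))
    return np.array(shifts)  # large value => model is sensitive to that point
```

Plotting the returned shifts against the data index would make heavily influential points stand out immediately.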
In the real world, mistakes happen during experiments and measurements, which means that sometimes your data does not actually represent the settings you believe it does. In frequentist statistics (which we have left behind in using this tool) one would use things like normal-probability plots to detect outliers, but such a plot is not meaningful for a Bayesian model (as far as I know).
A different per-data-point metric that may be transferable is DFFITS, which stands for "difference in fits". It is calculated by measuring the change in predicted values that occurs when that data point is deleted: the larger the value of DFFITS, the more that data point influences the fitted model. The metric has a concrete math-y definition that you can look up, but the point is that if the location of your expected minimum hinges on one data point, then you had better be sure that data point is valid (or gather more data before you stop your optimization process).
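
For what it's worth, in the classical linear-model setting DFFITS is available off the shelf; a small sketch with statsmodels on made-up toy data (the outlier at index 5 is planted deliberately):

```python
# Classic DFFITS for an ordinary-least-squares fit, via statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 * x + rng.normal(scale=1.0, size=30)
y[5] += 8.0  # plant one influential outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
dffits, threshold = fit.get_influence().dffits  # values and rule-of-thumb cutoff
print("influential points:", np.flatnonzero(np.abs(dffits) > threshold))
```

For a Bayesian surrogate there is no such closed form, but the refit-based shift sketched above plays the same role.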