A metric to indicate model sensitivity towards individual data points #173
Hi Morten,
Variants on leave-one-out (LOO) validation could certainly be a good start. One useful sensitivity metric would be to plot the change in the location of the expected minimum (in normalized axes) when leaving out each individual data point and fitting the model to the rest. If you take a look at lecture 7 of Richard McElreath's Statistical Rethinking 2022 course, he names a few other good metrics for outlier detection, namely Pareto Smoothed Importance Sampling (PSIS) and the Widely Applicable Information Criterion (WAIC).
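
To make that concrete, here is a minimal sketch of that LOO refit loop, assuming a scikit-learn `GaussianProcessRegressor` as the surrogate and a pre-built grid of candidate points; the function name and arguments are illustrative, not part of this package:

```python
# Minimal sketch: leave-one-out shift of the expected-minimum location.
# Assumes a scikit-learn GP surrogate; not this package's actual API.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_minimum_shifts(X, y, grid):
    """For each data point, refit without it and measure how far the
    predicted minimum moves, in normalized axes."""
    X, y, grid = np.asarray(X), np.asarray(y), np.asarray(grid)
    lo, hi = grid.min(axis=0), grid.max(axis=0)
    normalize = lambda Z: (Z - lo) / (hi - lo)

    # Location of the expected minimum with all data included.
    full_fit = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    full_min = grid[full_fit.predict(grid).argmin()]

    shifts = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        loo_fit = GaussianProcessRegressor(normalize_y=True).fit(X[mask], y[mask])
        loo_min = grid[loo_fit.predict(grid).argmin()]
        shifts.append(np.linalg.norm(normalize(loo_min) - normalize(full_min)))
    return np.array(shifts)  # large value => model is sensitive to that point
```

Plotting the returned shifts against the data index would make heavily influential points stand out immediately.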
In the real world, mistakes happen during experiments and measurements, which means that sometimes your data does not actually represent the settings you believe it does. In frequentist statistics (which we have left behind in using this tool) one would use things like normal-probability plots to detect outliers, but such a plot is not meaningful for a Bayesian model (as far as I know).
A different per-data-point metric that may be transferable is DFFITS, which stands for "difference in fits". It is calculated by measuring the change in predicted values that occurs when that data point is deleted: the larger the value of DFFITS, the more that data point influences the fitted model. The metric has a concrete math-y definition that you can look up, but the point is that if the location of your expected minimum hinges on one data point, then you had better be sure that data point is valid (or gather more data before you stop your optimization process).
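
For what it's worth, in the classical linear-model setting DFFITS is available off the shelf; a small sketch with statsmodels on made-up toy data (the outlier at index 5 is planted deliberately):

```python
# Classic DFFITS for an ordinary-least-squares fit, via statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 * x + rng.normal(scale=1.0, size=30)
y[5] += 8.0  # plant one influential outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
dffits, threshold = fit.get_influence().dffits  # values and rule-of-thumb cutoff
print("influential points:", np.flatnonzero(np.abs(dffits) > threshold))
```

For a Bayesian surrogate there is no such closed form, but the refit-based shift sketched above plays the same role.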