elpd seems to be overfitting under cross-validation #283
-
Hi all, I have run several models (with different SPDE mesh sizes) under cross-validation using sdmTMB_cv(). I have tried several strategies for defining folds (random subsampling, spatial clustering, and temporal subsampling) and checked the test metrics sum_loglik and elpd. Under all three strategies, the elpd metric prefers the smallest mesh size (the most complex model), while sum_loglik often points in the other direction. I am wondering whether it is valid to combine elpd with a cross-validation procedure, and what could have caused this strange result. Thanks!
-
Apologies for the confusion. There was an issue with our implementation of ELPD, and it was not doing what we had described. Current versions of sdmTMB return only `sum_loglik` and have removed ELPD entirely, so for the purpose of your analysis I would use `sum_loglik`. This is essentially the same as the log score, and similar implementations can be found in packages like `scoringRules`.
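For readers unfamiliar with the log score, here is a minimal, language-agnostic sketch in Python of the idea: sum the log predictive density of each held-out observation, with higher totals indicating better out-of-sample prediction. This is not sdmTMB's implementation. The Gaussian observation model, the fixed predictive standard deviation, the helper names, and all numbers are assumptions made up for illustration.

```python
import math


def gaussian_logpdf(y, mean, sd):
    """Log density of a Normal(mean, sd) distribution evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - (y - mean) ** 2 / (2.0 * sd ** 2)


def sum_loglik(y_heldout, pred_mean, pred_sd):
    """Sum of held-out log predictive densities (higher is better).

    Hypothetical helper: it mirrors the name of the metric discussed above,
    but assumes a Gaussian likelihood with one shared standard deviation.
    """
    return sum(gaussian_logpdf(y, m, pred_sd) for y, m in zip(y_heldout, pred_mean))


# Made-up held-out observations and predictions from two hypothetical models.
y_test = [1.2, 0.7, 2.1, 1.5]
pred_a = [1.0, 0.9, 2.0, 1.4]  # predictions close to the held-out data
pred_b = [0.2, 1.9, 0.8, 2.6]  # predictions far from the held-out data

score_a = sum_loglik(y_test, pred_a, pred_sd=0.5)
score_b = sum_loglik(y_test, pred_b, pred_sd=0.5)

# The model whose predictions sit closer to the held-out data scores higher.
print(score_a > score_b)
```

In a k-fold setting, the same sum would be taken over the held-out observations of every fold, with each fold's predictions coming from a model fitted without that fold.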