Interpreting DHARMa diagnostics for Bernoulli GAMM #462

Open
LeneSo opened this issue Jan 28, 2025 · 1 comment


LeneSo commented Jan 28, 2025

Hi Florian. I have read your vignette on DHARMa, and just want to make sure that I am interpreting my diagnostics correctly, as some issues arose during model diagnostics.

I have the following GAMM:
library(mgcv)  # provides gam()

M2a <- gam(Diel ~ fsex + fRiver + fArray + fRiver:fArray + Length.cm +
             s(yday, by = fRiver, k = 10, bs = "cr") +  # one yday smooth per river
             s(fTransmitter, bs = "re"),                # random intercept per transmitter
           method = "REML",
           select = TRUE,
           data = diel_migration,
           family = binomial(link = "logit"))
The aim is to test for differences in diel patterns (1/0 = day/night) in the arrival times of fish (male/female) from two different rivers, at four locations per river (three in the river and one outside the river mouth). yday is the arrival date at a location, as Julian day of year.

The QQ plot and the residuals vs. predicted plot indicated some issues:

[Image: DHARMa QQ plot and residuals vs. predicted plot]

To me they seem quite minor, but I am not sure and would like your opinion. When I plotted the scaled residuals against each covariate in the model, all of them looked fine. On further inspection, by fitting a separate model for each river, I discovered that the yday pattern for one river is a flat, linear line, while for the other it is non-linear. The two separate models seem fine, but I would prefer to keep a single model for higher statistical power, provided that model is trustworthy. Based on the residual patterns pictured above, do you think the model including both rivers is fine? Thanks!

melina-leite (Collaborator) commented Feb 18, 2025

Hi @LeneSo, sorry for the delayed response.
The significance of the tests also depends on the number of observations: with a large sample size, even small deviations from the expected test statistics will appear significant. So I'd suggest basing your conclusion on the test statistics themselves. For example, for the dispersion test, how much larger than 1 (or smaller, for underdispersion) is the dispersion statistic? Note: based on recent tests with the available DHARMa dispersion tests, I recommend using the parametric bootstrap version of the Pearson chi-squared test:

res <- simulateResiduals(model, refit = TRUE)  # perform the parametric bootstrap (refit to simulated data)
testDispersion(res, type = "DHARMa")           # run the dispersion test on the refitted residuals

Dispersion problems may arise from heteroscedasticity, so I'd check that first: from the plot you shared, I suspect you may have some heteroscedasticity in relation to your predictors. In particular, I suspect the smooth for yday may be causing some issues. Have you tried a different k? An interaction of river with other variables might also help the original model fit better.
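
One possible way to run these checks is sketched below. It is only an illustration on simulated stand-in data: the column names mirror the original post, but the values, the simplified formula, and the alternative k = 20 are assumptions, not the poster's data or a recommended setting. It plots the DHARMa residuals against each predictor, runs mgcv's basis-dimension check, and refits with a larger k for comparison.

```r
library(mgcv)
library(DHARMa)

set.seed(1)

# Simulated stand-in data (hypothetical values; only the column names follow the post)
n <- 400
dat <- data.frame(
  yday   = runif(n, 100, 300),
  fRiver = factor(sample(c("A", "B"), n, replace = TRUE))
)
# River A: flat yday effect; river B: non-linear yday effect
eta <- ifelse(dat$fRiver == "A", 0, sin(dat$yday / 30))
dat$Diel <- rbinom(n, 1, plogis(eta))

# Simplified version of the model: one yday smooth per river
m_k10 <- gam(Diel ~ fRiver + s(yday, by = fRiver, k = 10, bs = "cr"),
             family = binomial(link = "logit"), method = "REML", data = dat)

# Scaled (quantile) residuals, plotted against each predictor
res <- simulateResiduals(m_k10)
plotResiduals(res, form = dat$yday)
plotResiduals(res, form = dat$fRiver)

# Basis-dimension check for the smooths (a low k-index with a small p-value hints that k is too low)
kc <- k.check(m_k10)
print(kc)

# Refit with a larger basis dimension and compare the two fits
m_k20 <- gam(Diel ~ fRiver + s(yday, by = fRiver, k = 20, bs = "cr"),
             family = binomial(link = "logit"), method = "REML", data = dat)
print(AIC(m_k10, m_k20))
```

If the residual pattern against yday persists after increasing k, the basis dimension is probably not the cause, and it may be worth looking at other terms instead.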
Moreover, have you compared the predictions of the two approaches (the original model and the models fitted to the split data)? This may help you find where they differ, and whether the differences are big enough to justify the separate models.
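
A minimal sketch of such a comparison follows, again on simulated stand-in data with a simplified formula (the values and the reduced set of predictors are assumptions for illustration). It predicts on the response scale from the pooled model and from one model per river, then summarises the differences.

```r
library(mgcv)

set.seed(2)

# Simulated stand-in data (hypothetical values; only the column names follow the post)
n <- 400
dat <- data.frame(
  yday   = runif(n, 100, 300),
  fRiver = factor(sample(c("A", "B"), n, replace = TRUE))
)
eta <- ifelse(dat$fRiver == "A", 0.3, sin(dat$yday / 30))
dat$Diel <- rbinom(n, 1, plogis(eta))

# Pooled model with one yday smooth per river
m_all <- gam(Diel ~ fRiver + s(yday, by = fRiver, k = 10, bs = "cr"),
             family = binomial(link = "logit"), method = "REML", data = dat)
pred_all <- as.numeric(predict(m_all, newdata = dat, type = "response"))

# One model per river, predicting for that river's observations only
pred_split <- numeric(n)
for (r in levels(dat$fRiver)) {
  idx <- dat$fRiver == r
  m_r <- gam(Diel ~ s(yday, k = 10, bs = "cr"),
             family = binomial(link = "logit"), method = "REML",
             data = dat[idx, ])
  pred_split[idx] <- as.numeric(predict(m_r, newdata = dat[idx, ], type = "response"))
}

# Difference in predicted probability of a daytime arrival, pooled minus split
pred_diff <- pred_all - pred_split
print(summary(pred_diff))
plot(dat$yday, pred_diff, col = dat$fRiver,
     xlab = "yday", ylab = "pooled - split prediction")
```

In this reduced example the pooled and split fits should be close, because every term varies by river. In the full model, terms shared across rivers (such as sex, length, or the transmitter random effect) are where larger differences could appear.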

Best,
Melina
