Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DHARMa residual checks with with GLMMadaptive using ‘hurdle semi-continous’ or ‘hurdle log-normal’ #423

Open
florianhartig opened this issue Jul 12, 2024 · 1 comment

Comments

@florianhartig
Copy link
Owner

For my study I model the response variable using a two-part mixed model approach, using the GLMMadaptive package in R. The family function there is called a ‘hurdle semi-continous’ or ‘hurdle log-normal’ function. On the vignette-page of the GLMMadaptive package, a wrapper function was provided to have the DHARMa functionality to be used with MixMod objects. I modified this function a little to work with NA’s in the dataset. The function is:

resids_plot2 <- function(object, y, nsim = 1000,

                         type = c("subject_specific", "mean_subject"),

                         integerResponse = NULL) {

  if (!inherits(object, "MixMod"))

    stop("this function works for 'MixMod' objects.\n")

  type <- match.arg(type)

  if (is.null(integerResponse)) {

    integer_families <- c("binomial", "poisson", "negative binomial",

                          "zero-inflated poisson", "zero-inflated negative binomial",

                          "hurdle poisson", "hurdle negative binomial")

    numeric_families <- c("hurdle log-normal", "beta", "hurdle beta", "Gamma")

    if (object$family$family %in% integer_families) {

      integerResponse <- TRUE

    } else if (object$family$family %in% numeric_families) {

      integerResponse <- FALSE

    } else {

      stop("non build-in family object; you need to specify the 'integerResponse',\n",

           "\targument indicating whether the outcome variable is integer or not.\n")

    }

  }

 

  # Select only subjects that were not removed from model (i.e., non-NA).

  subjects <- sort(as.numeric(rownames(model.frame(object))))

  y <- y[subjects]

 

  # Simulate

  sims <- simulate(object, nsim = nsim, type = type)

  fits <- fitted(object, type = type)

  dharmaRes <- DHARMa::createDHARMa(simulatedResponse = sims, observedResponse = y,

                                    fittedPredictedResponse = fits,

                                    integerResponse = integerResponse)

  DHARMa:::plot.DHARMa(dharmaRes, quantreg = FALSE)

}

 

In my case, the function returns a plot like below:

image

However, when I try to plot the residuals with the functions like you described in the DHARMa vignette, I can obtain the following plots

plot(simulateResiduals(tp0_hl, n = 1000)) #setting N to 1000 to match the nsim from the wrapper function

This shows a slightly better QQ-plot.

image

However, when testing the model with a zero-part random effects model, the differences increase even further:

Using the wrapper function:

image

Using the plot(simulateResiduals(model, nsim = 1000)) function:

image

The latter shows a much better QQ-plot than the wrapper function does, and one could deem the second plot to show a relatively nice fit (apart from the KS test) compared to the first plot.

I am not able to decipher where this difference comes from and which plot-method would be better to ‘trust’. I believe this has something to do with the createDHARMa function.

Also, I read on your github that you have been working together with the creator of the GLMMadaptive package, so, perhaps, the wrapper function is absolete now? Could you help me to point out which model-plot function would be best to use, working with MixMod objects and what leads to the differences of the plots?

@florianhartig
Copy link
Owner Author

Hi,

thanks for bringing this to my attention! I suspect the differences come from different defaults for type = c("subject_specific", "mean_subject") in DHARMa and the glmmAdaptive function.

I will have to check in more detail which settings exactly make more sense. My initial reaction is that subject-specific is likely fine for simulations, but extractions of predictions as

fits <- fitted(object, type = type)

in the glmmAdaptive function may be a problem for the reasons discussed in #43.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants