
@ANAMASGARD
Fixes #828

This PR updates the multicollinearity section based on @bbolker's feedback and recent research showing that automatically removing high VIF predictors isn't usually the right approach.

What changed:

  • Removed the recommendation to drop predictors with high VIF values
  • Added context about when multicollinearity actually matters (interpretation vs prediction)
  • Explained that removing variables can introduce omitted variable bias
  • Kept the existing interaction terms guidance intact
  • Added 6 references (Graham 2003, O'Brien 2007, Morrissey & Ruxton 2018, Feng et al. 2019, Gregorich et al. 2021, Vanhove 2021)

Why:

The old advice was too prescriptive. Multicollinearity mostly affects coefficient precision, not bias, and whether it's a problem depends on your research goals. The vignette now reflects this nuance instead of suggesting a one-size-fits-all solution.

Checked that the vignette builds without errors and the new text flows well with the existing content.
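The central statistical claim here, that multicollinearity inflates coefficient variance rather than biasing the estimates, can be illustrated with a small simulation sketch (not part of the PR; all names and numbers below are invented for illustration):

```python
# Minimal sketch: collinear predictors inflate the *variance* of OLS
# coefficient estimates without biasing them.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 500
beta = np.array([1.0, 2.0, -1.0])  # true coefficients: intercept, b1, b2

def simulate(rho):
    """Mean and spread of the OLS estimate of b1 when cor(x1, x2) = rho."""
    estimates = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        y = X @ beta + rng.normal(size=n)
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(b[1])
    estimates = np.array(estimates)
    return estimates.mean(), estimates.std()

mean_lo, sd_lo = simulate(rho=0.0)   # uncorrelated predictors
mean_hi, sd_hi = simulate(rho=0.95)  # strongly collinear predictors

# Both averages recover the true b1 = 2.0; only the spread differs.
print(f"rho=0.00: mean={mean_lo:.2f} sd={sd_lo:.3f}")
print(f"rho=0.95: mean={mean_hi:.2f} sd={sd_hi:.3f}")
```

Note that dropping `x2` in the collinear case would shrink the standard error of `b1`, but since `x2` is part of the data-generating process, its estimate would then be biased — exactly the omitted variable bias the revised vignette text warns about.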

- Remove blanket recommendation to drop high VIF predictors
- Add nuanced, research-backed guidance on when multicollinearity matters
- Distinguish between prediction vs. interpretation goals
- Add 6 academic references (2003-2021) supporting new approach
- Improve prose clarity and structure
@gemini-code-assist

Summary of Changes

Hello @ANAMASGARD, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly revises the multicollinearity section within the check_model vignette. The update moves away from a prescriptive approach of automatically removing high VIF predictors, instead offering a more nuanced understanding of multicollinearity's implications based on research goals (prediction vs. interpretation) and highlighting the risks of omitted variable bias. The aim is to provide more robust and context-aware guidance for users.

Highlights

  • Multicollinearity Guidance: Removed the recommendation to automatically drop predictors with high VIF values, advocating for a more nuanced approach.
  • Contextual Understanding: Added detailed context on when multicollinearity truly matters, differentiating its impact on model interpretation versus prediction.
  • Omitted Variable Bias: Explained the potential for introducing omitted variable bias if variables are removed solely based on high VIFs.
  • Interaction Terms: Maintained the existing guidance for handling interaction terms, acknowledging their inherent multicollinearity.
  • New References: Incorporated six new academic references to support the updated statistical guidance on multicollinearity.

@gemini-code-assist (bot) left a comment


Code Review

This pull request provides a much-needed and well-executed update to the guidance on multicollinearity in the check_model vignette. The new text correctly emphasizes that multicollinearity is primarily a problem for interpretation, not prediction, and wisely advises against the automatic removal of predictors with high VIFs. The inclusion of recent literature strengthens the recommendations.

However, I've noticed a significant inconsistency. The documentation for the check_collinearity() function itself (in R/check_collinearity.R) still contains the old, more prescriptive advice that has been removed from the vignette. This will lead to conflicting information for users depending on whether they read the vignette or the function's help page. To ensure consistency across the package, I strongly recommend updating the roxygen documentation in R/check_collinearity.R to align with the excellent new guidance in this PR.

I have one suggestion for the vignette text to further improve its clarity for beginners.

## Multicollinearity

Before:

> This plot checks for potential collinearity among predictors. In a nutshell, multicollinearity means that once you know the effect of one predictor, the value of knowing the other predictor is rather low. Multicollinearity might arise when a third, unobserved variable has a causal effect on each of the two predictors that are associated with the outcome. In such cases, the actual relationship that matters would be the association between the unobserved variable and the outcome.

After:

> This plot checks for potential collinearity among predictors. Multicollinearity occurs when predictor variables are highly correlated with each other, conditional on the other variables in the model. This should not be confused with simple pairwise correlation between predictors; what matters is the association between predictors *after accounting for all other variables in the model*.

Severity: medium

The revised definition of multicollinearity is much more precise, which is great. However, the previous version included an intuitive "in a nutshell" explanation that made the concept very accessible. Consider blending the intuitive and formal definitions to help readers who are new to this topic:

> This plot checks for potential collinearity among predictors. Multicollinearity occurs when predictor variables are highly correlated with each other, conditional on the other variables in the model. In other words, the information one predictor provides about the outcome is redundant in the presence of the other predictors. This should not be confused with simple pairwise correlation between predictors; what matters is the association between predictors *after accounting for all other variables in the model*.
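The distinction the suggested text draws — association conditional on the other predictors, rather than simple pairwise correlation — can be made concrete with a small numeric sketch (hypothetical data, not from the vignette). Here `x3` is nearly the sum of `x1` and `x2`, so no pairwise correlation is extreme, yet every VIF is large:

```python
# Sketch: VIF measures how well each predictor is explained by *all* the
# others, which is not the same as any single pairwise correlation.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.1 * rng.normal(size=n)  # near-linear combination
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R^2) from regressing column j on the remaining columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r_squared = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r_squared)

vifs = [vif(X, j) for j in range(X.shape[1])]
max_pairwise = max(abs(np.corrcoef(X.T)[i, j])
                   for i in range(3) for j in range(3) if i != j)

print(f"largest pairwise |r|: {max_pairwise:.2f}")  # roughly 0.7
print("VIFs:", [round(v, 1) for v in vifs])         # all far above 10
```

The quantity computed by hand here is the same VIF that `check_collinearity()` reports per model term, which is what the vignette's plot visualizes.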


Development

Successfully merging this pull request may close these issues.

advice in multicollinearity section of vignette
