How to handle interactions? #5
-
Hi! Thank you for creating and maintaining the package! I am working with regression models that include pariwise interactions. Suppose this example, in which not all variables are interacting: x1 + x2 + x3 + x4 + x1:x2 + x1:x3 + x2:x3. While experimenting with the I would be interested in knowing the contribution to R2 of X1:X2, when such interaction is added to all models that already include X1 + X2. Does this idea make sense? Is there a way to implement it, in an authomated way, for a large set of regressors? See below a reproducible example. Neither the main effects nor the interactions have gone through model selection, but hopefully it is enough the show the concept. Thank you for your time! library(domir)
library(MuMIn)
#> Warning: package 'MuMIn' was built under R version 4.1.3
data(mtcars)
## Dominance analysis with main effects
summary(lm(mpg ~ cyl + carb + am + gear, data = mtcars))
#>
#> Call:
#> lm(formula = mpg ~ cyl + carb + am + gear, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.4112 -1.4110 0.1477 1.4387 5.6888
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 27.0825 6.3126 4.290 0.000205 ***
#> cyl -1.4575 0.5233 -2.785 0.009662 **
#> carb -1.4244 0.5273 -2.701 0.011790 *
#> am 3.2888 1.7055 1.928 0.064396 .
#> gear 1.2736 1.4500 0.878 0.387511
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.766 on 27 degrees of freedom
#> Multiple R-squared: 0.8165, Adjusted R-squared: 0.7894
#> F-statistic: 30.04 on 4 and 27 DF, p-value: 1.373e-09
domir::domir(mpg ~ cyl + carb + am + gear,
function(x) {summary(lm(x, data = mtcars))$adj.r.squared})
#> Overall Value: 0.7893655
#>
#> General Dominance Values:
#> General Dominance Standardized Ranks
#> cyl 0.35369559 0.4480758 1
#> carb 0.19770138 0.2504561 2
#> am 0.14467549 0.1832807 3
#> gear 0.09329307 0.1181874 4
#>
#> Conditional Dominance Values:
#> Subset Size: 1 Subset Size: 2 Subset Size: 3 Subset Size: 4
#> cyl 0.7170527 0.4512340 0.19566498 0.050830651
#> carb 0.2803024 0.2871405 0.17599916 0.047363447
#> am 0.3384589 0.1796463 0.04014787 0.020448935
#> gear 0.2050292 0.1361367 0.03372550 -0.001719118
#>
#> Complete Dominance Designations:
#> Dmnated?cyl Dmnated?carb Dmnated?am Dmnated?gear
#> Dmnates?cyl NA NA TRUE TRUE
#> Dmnates?carb NA NA NA TRUE
#> Dmnates?am FALSE NA NA NA
#> Dmnates?gear FALSE FALSE NA NA
## Dominance analysis with main effects and interactions
summary(lm(mpg ~ cyl + carb + am + gear + cyl:am + cyl:carb, data = mtcars))
#>
#> Call:
#> lm(formula = mpg ~ cyl + carb + am + gear + cyl:am + cyl:carb,
#> data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.6599 -1.0932 0.2417 1.3790 5.5857
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 23.15499 7.94205 2.915 0.00739 **
#> cyl -1.18493 0.80108 -1.479 0.15159
#> carb -1.21018 1.94809 -0.621 0.54008
#> am 7.17145 4.53403 1.582 0.12629
#> gear 1.79267 1.63692 1.095 0.28389
#> cyl:am -0.79716 0.87592 -0.910 0.37147
#> cyl:carb -0.01106 0.27304 -0.040 0.96802
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.826 on 25 degrees of freedom
#> Multiple R-squared: 0.8227, Adjusted R-squared: 0.7802
#> F-statistic: 19.34 on 6 and 25 DF, p-value: 2.773e-08
## Interactions fitted without main effects
domir::domir(mpg ~ cyl + carb + am + gear + cyl:am + cyl:carb,
function(x) {summary(lm(x, data = mtcars))$adj.r.squared})
#> Overall Value: 0.7801873
#>
#> General Dominance Values:
#> General Dominance Standardized Ranks
#> cyl 0.23490534 0.30108842 1
#> carb 0.12006149 0.15388804 4
#> am 0.12787318 0.16390063 3
#> gear 0.07044886 0.09029737 5
#> cyl:am 0.05860488 0.07511642 6
#> cyl:carb 0.16829350 0.21570911 2
#>
#> Conditional Dominance Values:
#> Subset Size: 1 Subset Size: 2 Subset Size: 3 Subset Size: 4
#> cyl 0.7170527 0.4472323 0.16884036 0.04775141
#> carb 0.2803024 0.2991753 0.12103014 0.02339223
#> am 0.3384589 0.2676498 0.10129479 0.03079048
#> gear 0.2050292 0.1531088 0.04678272 0.01411749
#> cyl:am 0.1105455 0.1688356 0.06188640 0.01143019
#> cyl:carb 0.4527203 0.3738007 0.15265878 0.03597025
#> Subset Size: 5 Subset Size: 6
#> cyl 0.0185124593 0.010042874
#> carb 0.0016606553 -0.005191772
#> am 0.0163487161 0.012696398
#> gear 0.0019695635 0.001685357
#> cyl:am 0.0003836658 -0.001452065
#> cyl:carb 0.0030514990 -0.008440470
#>
#> Complete Dominance Designations:
#> Dmnated?cyl Dmnated?carb Dmnated?am Dmnated?gear
#> Dmnates?cyl NA NA NA NA
#> Dmnates?carb NA NA NA NA
#> Dmnates?am NA NA NA NA
#> Dmnates?gear NA NA NA NA
#> Dmnates?cyl:am NA NA FALSE NA
#> Dmnates?cyl:carb NA NA NA NA
#> Dmnated?cyl:am Dmnated?cyl:carb
#> Dmnates?cyl NA NA
#> Dmnates?carb NA NA
#> Dmnates?am TRUE NA
#> Dmnates?gear NA NA
#> Dmnates?cyl:am NA NA
#> Dmnates?cyl:carb NA NA
## Interactions fitted with main effects
domir::domir(mpg ~ cyl + carb + am + gear + cyl:am + cyl:carb,
.set = list(~ cyl + carb + am + cyl:am + cyl:carb),
function(x) {summary(lm(x, data = mtcars))$adj.r.squared})
#> Overall Value: 0.7801873
#>
#> General Dominance Values:
#> General Dominance Standardized Ranks
#> gear 0.1033573 0.1324775 2
#> set1 0.6768300 0.8675225 1
#>
#> Conditional Dominance Values:
#> Subset Size: 1 Subset Size: 2
#> gear 0.2050292 0.001685357
#> set1 0.7785019 0.575158022
#>
#> Complete Dominance Designations:
#> Dmnated?gear Dmnated?set1
#> Dmnates?gear NA FALSE
#> Dmnates?set1 TRUE NA
## This does not work
domir::domir(mpg ~ cyl + carb + am + gear + cyl:am + cyl:carb,
.set = list(~ cyl + am + cyl:am, ~ cyl + carb + cyl:carb),
function(x) {summary(lm(x, data = mtcars))$adj.r.squared})
#> Error: Names cyl in Position 2 of '.set'. do not match any names in '.obj'.
## Model selection table
options(na.action = "na.fail")
MuMIn::dredge(lm(mpg ~ gear + cyl + carb + am + cyl:am + cyl:carb, data = mtcars),
extra = c("adj.r2" = function(x) summary(x)$adj.r.squared))
#> Fixed term is "(Intercept)"
#> Global model call: lm(formula = mpg ~ gear + cyl + carb + am + cyl:am + cyl:carb,
#> data = mtcars)
#> ---
#> Model selection table
#> (Int) am crb cyl ger am:cyl crb:cyl adj.r2 df logLik
#> 8 32.170 4.243 -1.1300 -1.717 0.7911 5 -75.696
#> 16 27.080 3.289 -1.4240 -1.457 1.27400 0.7894 6 -75.245
#> 24 31.220 6.647 -0.9837 -1.638 -0.4497 0.7859 6 -75.506
#> 40 30.730 4.330 -0.4796 -1.489 -0.09724 0.7845 6 -75.614
#> 15 22.560 -1.5570 -1.512 3.05500 0.7689 5 -77.310
#> 22 30.870 10.180 -1.976 -1.3050 0.7621 5 -77.772
#> 32 23.240 7.184 -1.2860 -1.206 1.80900 -0.8037 0.7886 7 -74.697
#> 6 34.520 2.567 -2.501 0.7424 4 -79.610
#> 48 26.480 3.384 -1.0320 -1.337 1.21400 -0.05654 0.7816 7 -75.218
#> 56 30.040 6.587 -0.4341 -1.447 -0.4244 -0.08336 0.7785 7 -75.446
#> 47 23.480 -2.0430 -1.661 3.06500 0.07070 0.7610 6 -77.269
#> 5 37.880 -2.876 0.7171 3 -81.653
#> 14 38.790 3.890 -2.555 -1.21200 0.7420 5 -79.072
#> 12 12.740 3.545 -2.5640 3.55800 0.7385 5 -79.286
#> 30 30.540 10.240 -1.962 0.07378 -1.3300 0.7534 6 -77.771
#> 7 37.810 -0.5261 -2.625 0.7226 4 -80.791
#> 64 23.150 7.171 -1.2100 -1.185 1.79300 -0.7972 -0.01106 0.7802 8 -74.696
#> 11 7.276 -2.7540 5.57600 0.7161 4 -81.166
#> 13 34.660 -2.743 0.65190 0.7125 4 -81.369
#> 39 38.470 -0.8542 -2.729 0.04810 0.7130 5 -80.776
#> 4 23.150 7.653 -2.1920 0.6832 4 -82.917
#> 2 17.150 7.245 0.3385 3 -95.242
#> 10 16.860 7.142 0.08805 0.3157 4 -95.241
#> 3 25.870 -2.0560 0.2803 3 -96.590
#> 9 5.623 3.92300 0.2050 3 -98.182
#> 1 20.090 0.0000 2 -102.378
#> AICc delta weight
#> 8 163.7 0.00 0.365
#> 16 165.9 2.15 0.125
#> 24 166.4 2.67 0.096
#> 40 166.6 2.89 0.086
#> 15 166.9 3.23 0.073
#> 22 167.9 4.15 0.046
#> 32 168.1 4.36 0.041
#> 6 168.7 5.00 0.030
#> 48 169.1 5.40 0.025
#> 56 169.6 5.86 0.020
#> 47 169.9 6.20 0.016
#> 5 170.2 6.46 0.014
#> 14 170.5 6.75 0.012
#> 12 170.9 7.18 0.010
#> 30 170.9 7.20 0.010
#> 7 171.1 7.36 0.009
#> 64 171.7 7.95 0.007
#> 11 171.8 8.11 0.006
#> 13 172.2 8.52 0.005
#> 39 173.9 10.16 0.002
#> 4 175.3 11.62 0.001
#> 2 197.3 33.64 0.000
#> 10 200.0 36.26 0.000
#> 3 200.0 36.34 0.000
#> 9 203.2 39.52 0.000
#> 1 209.2 45.47 0.000
#> Models ranked by AICc(x)
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MuMIn_1.46.0 domir_1.0.1
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.13 knitr_1.38 magrittr_2.0.3 lattice_0.20-44
#> [5] rlang_1.1.0 fastmap_1.1.0 stringr_1.5.0 highr_0.9
#> [9] tools_4.1.0 grid_4.1.0 nlme_3.1-152 xfun_0.30
#> [13] cli_3.6.0 withr_2.5.0 htmltools_0.5.2 yaml_2.3.5
#> [17] digest_0.6.29 lifecycle_1.0.3 Matrix_1.3-3 vctrs_0.6.1
#> [21] fs_1.5.2 glue_1.6.2 evaluate_0.15 rmarkdown_2.13
#> [25] reprex_2.0.1 stringi_1.7.6 compiler_4.1.0 stats4_4.1.0 Created on 2023-09-26 by the reprex package (v2.0.1) |
Beta Was this translation helpful? Give feedback.
Replies: 6 comments 6 replies
-
Transferring this issue to a discussion in Q&A. |
Beta Was this translation helpful? Give feedback.
-
Understand what you are asking and there is an approach in the methodological literature to do just that. The article below outlines it. LeBreton, J. M., Tonidandel, S., & Krasikova, D. V. (2013). Residualized relative importance analysis: A technique for the comprehensive decomposition of variance in higher order regression models. Organizational Research Methods, 16(3), 449-473. In short, the idea is to residualize the interaction and estimate it as a separate variable. I might recommend using the
Then the data can be used as before referring to the generated, residualzed interaction instead of the non-residualized one from before. Something like:
Which produces:
In this case, the adjusted |
Beta Was this translation helpful? Give feedback.
-
Thank you very much! In order to make this work, can I extend it to factors with 3-4 levels? Perhaps using dummy variables? I have tried to use dummy variables to get the residualized interaction term, does the aproach below make sense? library(domir)
#> Warning: package 'domir' was built under R version 4.2.3
# Declare factor
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)
# Factor as dummy variables
model.matrix(~ mtcars$cyl)
#> (Intercept) mtcars$cyl6 mtcars$cyl8
#> 1 1 1 0
#> 2 1 1 0
#> 3 1 0 0
#> 4 1 1 0
#> 5 1 0 1
#> 6 1 1 0
#> 7 1 0 1
#> 8 1 0 0
#> 9 1 0 0
#> 10 1 1 0
#> 11 1 1 0
#> 12 1 0 1
#> 13 1 0 1
#> 14 1 0 1
#> 15 1 0 1
#> 16 1 0 1
#> 17 1 0 1
#> 18 1 0 0
#> 19 1 0 0
#> 20 1 0 0
#> 21 1 0 0
#> 22 1 0 1
#> 23 1 0 1
#> 24 1 0 1
#> 25 1 0 1
#> 26 1 0 0
#> 27 1 0 0
#> 28 1 0 0
#> 29 1 0 1
#> 30 1 1 0
#> 31 1 0 1
#> 32 1 0 0
#> attr(,"assign")
#> [1] 0 1 1
#> attr(,"contrasts")
#> attr(,"contrasts")$`mtcars$cyl`
#> [1] "contr.treatment"
# Store residualized interaction terms, one per level of the categorical factor
mtcars$mtcars_cyl6_am <- resid(lm(model.matrix(~ mtcars$cyl)[, 2]*am ~ am + model.matrix(~ mtcars$cyl)[, 2], mtcars))
mtcars$mtcars_cyl8_am <- resid(lm(model.matrix(~ mtcars$cyl)[, 3]*am ~ am + model.matrix(~ mtcars$cyl)[, 3], mtcars))
# Dominance analysis with residualized interaction terms
domir::domir(
mpg ~ am + cyl + mtcars_cyl6_am + mtcars_cyl8_am + gear,
function(x) {summary(lm(x, data = mtcars))$adj.r.squared})
#> Overall Value: 0.7368113
#>
#> General Dominance Values:
#> General Dominance Standardized Ranks
#> am 0.13799818 0.18729108 2
#> cyl 0.47169054 0.64017823 1
#> mtcars_cyl6_am 0.02928847 0.03975031 4
#> mtcars_cyl8_am 0.02425967 0.03292521 5
#> gear 0.07357442 0.09985518 3
#>
#> Conditional Dominance Values:
#> Subset Size: 1 Subset Size: 2 Subset Size: 3 Subset Size: 4
#> am 0.338458908 0.20946575 0.10380933 0.026921572
#> cyl 0.714009029 0.57403829 0.44688158 0.339113641
#> mtcars_cyl6_am 0.035094411 0.03668608 0.04193978 0.029121983
#> mtcars_cyl8_am -0.005988183 0.04079001 0.05196323 0.030012699
#> gear 0.205029230 0.11901345 0.05237557 0.001516353
#> Subset Size: 5
#> am 0.011335334
#> cyl 0.284410140
#> mtcars_cyl6_am 0.003600121
#> mtcars_cyl8_am 0.004520578
#> gear -0.010062489
#>
#> Complete Dominance Designations:
#> Dmnated?am Dmnated?cyl Dmnated?mtcars_cyl6_am
#> Dmnates?am NA FALSE NA
#> Dmnates?cyl TRUE NA TRUE
#> Dmnates?mtcars_cyl6_am NA FALSE NA
#> Dmnates?mtcars_cyl8_am NA FALSE NA
#> Dmnates?gear FALSE FALSE NA
#> Dmnated?mtcars_cyl8_am Dmnated?gear
#> Dmnates?am NA TRUE
#> Dmnates?cyl TRUE TRUE
#> Dmnates?mtcars_cyl6_am NA NA
#> Dmnates?mtcars_cyl8_am NA NA
#> Dmnates?gear NA NA Created on 2023-10-02 with reprex v2.0.2 |
Beta Was this translation helpful? Give feedback.
-
Hi! I have an additional question. I want to fit regression models than include an interaction between a factor and a continuous variable with a quadratic term. I would fit raw polynomials on a standardized continous variable. library(domir)
# Dataset
data(mtcars)
# Declare factor
mtcars$cyl <- as.factor(mtcars$cyl)
# Standardize continuous variable
mtcars$disp.std <- (mtcars$disp - mean(mtcars$disp))/sd(mtcars$disp)
# Create quadratic variable for the standardized continuous variable
mtcars$disp.sq <- mtcars$disp.std^2
# Resulting dataset
str(mtcars)
#> 'data.frame': 32 obs. of 13 variables:
#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#> $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
#> $ disp : num 160 160 108 258 360 ...
#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
#> $ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
#> $ qsec : num 16.5 17 18.6 19.4 17 ...
#> $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
#> $ am : num 1 1 1 0 0 0 0 0 0 0 ...
#> $ gear : num 4 4 4 3 3 3 3 4 4 4 ...
#> $ carb : num 4 4 1 1 2 1 4 2 2 4 ...
#> $ disp.std: num -0.571 -0.571 -0.99 0.22 1.043 ...
#> $ disp.sq : num 0.3256 0.3256 0.9805 0.0484 1.088 ...
# Regression model
summary(lm(mpg ~ poly(disp.std, 2, raw = TRUE)*cyl, data = mtcars))
#>
#> Call:
#> lm(formula = mpg ~ poly(disp.std, 2, raw = TRUE) * cyl, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.233 -1.251 -0.455 1.376 4.029
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 50.98 18.20 2.801 0.0101 *
#> poly(disp.std, 2, raw = TRUE)1 69.63 37.61 1.851 0.0770 .
#> poly(disp.std, 2, raw = TRUE)2 43.24 18.76 2.304 0.0306 *
#> cyl6 -32.02 18.27 -1.753 0.0929 .
#> cyl8 -37.52 18.38 -2.042 0.0528 .
#> poly(disp.std, 2, raw = TRUE)1:cyl6 -62.88 38.16 -1.648 0.1130
#> poly(disp.std, 2, raw = TRUE)2:cyl6 -29.36 22.83 -1.286 0.2111
#> poly(disp.std, 2, raw = TRUE)1:cyl8 -62.61 37.99 -1.648 0.1129
#> poly(disp.std, 2, raw = TRUE)2:cyl8 -47.45 18.91 -2.510 0.0196 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.114 on 23 degrees of freedom
#> Multiple R-squared: 0.9087, Adjusted R-squared: 0.877
#> F-statistic: 28.62 on 8 and 23 DF, p-value: 3.472e-10
## Residualization
## Residualization of quadratic term
mtcars$r_disp_sq <- resid(lm(disp.sq ~ disp.std, mtcars))
## Residualization of interaction between factor and linear term
mtcars$r_cyl6_disp <- resid(lm(model.matrix(~ mtcars$cyl)[, 2]*disp.std ~ disp.std + model.matrix(~ mtcars$cyl)[, 2], mtcars))
mtcars$r_cyl8_am_disp <- resid(lm(model.matrix(~ mtcars$cyl)[, 3]*disp.std ~ disp.std + model.matrix(~ mtcars$cyl)[, 3], mtcars))
## Residualization of interaction between factor and quadratic term
mtcars$r_cyl6_disp_sq <- resid(lm(model.matrix(~ mtcars$cyl)[, 2]*disp.sq ~ disp.std + disp.sq + model.matrix(~ mtcars$cyl)[, 2], mtcars))
mtcars$r_cyl8_am_disp_sq <- resid(lm(model.matrix(~ mtcars$cyl)[, 3]*disp.sq ~ disp.std + disp.sq + model.matrix(~ mtcars$cyl)[, 3], mtcars)) Created on 2023-10-05 by the reprex package (v2.0.1) |
Beta Was this translation helpful? Give feedback.
-
Hi @jluchman There are a couple of points I don't follow in the post above. 1- If I understand correctly, you residualized each factor level with all factor levels. Is this the recommended approach?
2- Here, shouldn't the response be
My statistical/mathematical background is not null, but not very strong either, so I might have missed stuff. Apologies! Our approaches also differ because I used standardized variables (standardization before calculating the quadratic term). However, I don't think standardization should have an effect. See a reproducible example with an alternative approach. Cheers! library(domir)
#> Warning: package 'domir' was built under R version 4.2.3
# Dataset
data(mtcars)
# Declare factor
mtcars <- mtcars
mtcars$cyl <- as.factor(mtcars$cyl)
# Standardize continuous variable
mtcars$disp.std <- (mtcars$disp - mean(mtcars$disp))/sd(mtcars$disp)
# Create quadratic variable for the standardized continuous variable
mtcars$disp.sq <- mtcars$disp.std^2
# Regression model
summary(lm(mpg ~ poly(disp.std, 2, raw = TRUE)*cyl, data = mtcars))
#>
#> Call:
#> lm(formula = mpg ~ poly(disp.std, 2, raw = TRUE) * cyl, data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.233 -1.251 -0.455 1.376 4.029
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 50.98 18.20 2.801 0.0101 *
#> poly(disp.std, 2, raw = TRUE)1 69.63 37.61 1.851 0.0770 .
#> poly(disp.std, 2, raw = TRUE)2 43.24 18.76 2.304 0.0306 *
#> cyl6 -32.02 18.27 -1.753 0.0929 .
#> cyl8 -37.52 18.38 -2.042 0.0528 .
#> poly(disp.std, 2, raw = TRUE)1:cyl6 -62.88 38.16 -1.648 0.1130
#> poly(disp.std, 2, raw = TRUE)2:cyl6 -29.36 22.83 -1.286 0.2111
#> poly(disp.std, 2, raw = TRUE)1:cyl8 -62.61 37.99 -1.648 0.1129
#> poly(disp.std, 2, raw = TRUE)2:cyl8 -47.45 18.91 -2.510 0.0196 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.114 on 23 degrees of freedom
#> Multiple R-squared: 0.9087, Adjusted R-squared: 0.877
#> F-statistic: 28.62 on 8 and 23 DF, p-value: 3.472e-10
## Residualization
## Residualization of quadratic term
mtcars$r_disp_sq <- resid(lm(disp.sq ~ disp.std, mtcars))
## Residualization of interaction between factor and linear term
mtcars$r_cyl6_disp <- resid(lm(model.matrix(~ mtcars$cyl)[, 2]*disp.std ~ disp.std + model.matrix(~ mtcars$cyl)[, 2], mtcars))
mtcars$r_cyl8_disp <- resid(lm(model.matrix(~ mtcars$cyl)[, 3]*disp.std ~ disp.std + model.matrix(~ mtcars$cyl)[, 3], mtcars))
## Residualization of interaction between factor and quadratic term
mtcars$r_cyl6_disp_sq <- resid(lm(model.matrix(~ mtcars$cyl)[, 2]*disp.sq ~ disp.std + disp.sq + model.matrix(~ mtcars$cyl)[, 2], mtcars))
mtcars$r_cyl8_disp_sq <- resid(lm(model.matrix(~ mtcars$cyl)[, 3]*disp.sq ~ disp.std + disp.sq + model.matrix(~ mtcars$cyl)[, 3], mtcars))
domir::domir(
mpg ~ cyl + disp.std + r_disp_sq + r_cyl6_disp + r_cyl8_disp + r_cyl6_disp_sq + r_cyl8_disp_sq,
.set = list("int.factor.linear" = ~ r_cyl6_disp + r_cyl8_disp, "int.factor.quadratic" = ~ r_cyl6_disp_sq + r_cyl8_disp_sq),
function(x) {summary(lm(x, data = mtcars))$adj.r.squared})
#> Overall Value: 0.8769711
#>
#> General Dominance Values:
#> General Dominance Standardized Ranks
#> cyl 0.36824869 0.41990974 2
#> disp.std 0.39913100 0.45512448 1
#> r_disp_sq 0.02126462 0.02424780 5
#> int.factor.linear 0.03408503 0.03886676 4
#> int.factor.quadratic 0.05424173 0.06185121 3
#>
#> Conditional Dominance Values:
#> Subset Size: 1 Subset Size: 2 Subset Size: 3
#> cyl 0.71400903 0.53004453 0.37142433
#> disp.std 0.70895479 0.57608541 0.44260226
#> r_disp_sq 0.04353530 0.02039859 0.01063936
#> int.factor.linear 0.07160398 0.03156972 0.02681937
#> int.factor.quadratic 0.03448693 0.05318363 0.07269929
#> Subset Size: 4 Subset Size: 5
#> cyl 0.21704072 0.008724854
#> disp.std 0.27306009 -0.005047519
#> r_disp_sq 0.01891762 0.012832229
#> int.factor.linear 0.03674691 0.003685162
#> int.factor.quadratic 0.08116961 0.029669161
#>
#> Complete Dominance Designations:
#> Dmnated?cyl Dmnated?disp.std Dmnated?r_disp_sq
#> Dmnates?cyl NA NA NA
#> Dmnates?disp.std NA NA NA
#> Dmnates?r_disp_sq NA NA NA
#> Dmnates?int.factor.linear NA NA NA
#> Dmnates?int.factor.quadratic NA NA NA
#> Dmnated?int.factor.linear
#> Dmnates?cyl NA
#> Dmnates?disp.std NA
#> Dmnates?r_disp_sq NA
#> Dmnates?int.factor.linear NA
#> Dmnates?int.factor.quadratic NA
#> Dmnated?int.factor.quadratic
#> Dmnates?cyl NA
#> Dmnates?disp.std NA
#> Dmnates?r_disp_sq NA
#> Dmnates?int.factor.linear NA
#> Dmnates?int.factor.quadratic NA Created on 2023-10-10 with reprex v2.0.2 |
Beta Was this translation helpful? Give feedback.
-
Hi again @MarcRieraDominguez , I've had a chance to go back and revisit the article and have some clarity on exactly what to do here. I've also revised my response in #5 (reply in thread). The key issue in developing the residualization methodology is to be able to take the residualized term, submit it to the To use the example you've posted, the key thing would be to devise residuals that can reproduce the highest order coefficients (i.e., the two dummy code-by-quadratic interactions):
but when submitted alone. To get the approach to work, I'd recommend the following adjustments to the residualization in your example:
Which is mostly just adding in the other dummy code for the interactions and a few extra terms for the residualization of the quadratic-by-dummy code terms including the other dummy code as well as the lower-order interaction term. To your question about the necessity of both dummies in residualization, it's a good one. In this case, the dummy codes are being used as such. That is, they are mutually exclusive categories representing responses to a single variable. In this way, they are inseparable in modeling and should always be used together. Moreover, they will produce the wrong result if not (will show an example below). With the residualization as updated above, notice the result of the
Now the residualized terms, without any other predictors, match those from the full model as discussed above (not true for the intercept, but that is a different question). Now, with this residualization, the below matches with the updated
One last point has to do with what happens when the complementary dummy code is not included in the focal dummy code's residualization. Below is just such an example focused on the magnitude of the highest order coefficients.
As can be seen, removing the other dummy code when residualizing produces a result that does not match the full model's coefficients. This is not the case when including the complementary coefficients as was observed above. |
Beta Was this translation helpful? Give feedback.
Understand what you are asking and there is an approach in the methodological literature to do just that.
The article below outlines it.
LeBreton, J. M., Tonidandel, S., & Krasikova, D. V. (2013). Residualized relative importance analysis: A technique for the comprehensive decomposition of variance in higher order regression models. Organizational Research Methods, 16(3), 449-473.
In short, the idea is to residualize the interaction and estimate it as a separate variable. I might recommend using the
datawizard::adjust()
function to obtain the residual interaction term. For example: