-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcollinearity.Rmd
executable file
·74 lines (50 loc) · 2.15 KB
/
collinearity.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
---
title: "collinearity"
author: "liuc"
date: "1/17/2022"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## collinearity
多重线性回归中的共线性问题,可以通过剔除变量法、增加样本容量、变量变换法进行。但是这些针对变量的操作或者需要具体的背景知识,或者无法保证一定可以得到很好的结果。不过依旧为最为常用的方法。 下面从变量选择和岭回归等角度解决此问题。
除岭回归以外,常用来处理严重共线性的方法还包括:主成分回归、逐步回归、偏最小二乘法、lasso回归等。
#### 岭回归
岭回归是一种专门用于共线性数据分析的有偏估计方法,实质上是经修正过的最小二乘法。
```{r}
library(glmnet)
ridge <- glmnet::glmnet(Fertility ~., data = swiss, alpha = 0)
summary(ridge)
```
#### 多重共线性的变量选择之 逐步回归法
逐步回归的应用。
虽然很多时候并不是推荐的方法。
```{r}
# Note that,
#
# forward selection and stepwise selection can be applied in the high-dimensional configuration, where the number of samples n is inferior to the number of predictors p, such as in genomic fields.
# Backward selection requires that the number of samples n is larger than the number of variables p, so that the full model can be fit.
library(MASS)
# Fit the full model
full.model <- lm(Fertility ~., data = swiss)
# Stepwise regression model
step.model <- stepAIC(full.model, direction = "both",
trace = FALSE)
summary(step.model)
# use stats package
#define intercept-only model
intercept_only <- lm(Fertility ~ 1, data=swiss)
#define model with all predictors
all <- lm(Fertility ~ ., data=swiss)
#perform forward stepwise regression, 注意direction不同时,step第一个参数的改变
forward <- step(intercept_only, direction='forward', scope=formula(all), trace=0)
#view results of forward stepwise regression
forward$anova
#view final model
forward$coefficients
# use leaps package
models <- regsubsets(Fertility~., data = swiss, nvmax = 5,
method = "seqrep")
summary(models)
```