## Exercise 5.4.6
## Summer 2017
# 6. We continue to consider the use of a logistic regression model to
# predict the probability of default using income and balance on the
# Default data set. In particular, we will now compute estimates for
# the standard errors of the income and balance logistic regression coefficients in two different ways: (1) using the bootstrap, and (2) using
# the standard formula for computing the standard errors in the glm()
# function. Do not forget to set a random seed before beginning your
# analysis.
# (a) Using the summary() and glm() functions, determine the estimated standard errors for the coefficients associated with income
# and balance in a multiple logistic regression model that uses
# both predictors.
# (b) Write a function, boot.fn(), that takes as input the Default data
# set as well as an index of the observations, and that outputs
# the coefficient estimates for income and balance in the multiple
# logistic regression model.
# (c) Use the boot() function together with your boot.fn() function to
# estimate the standard errors of the logistic regression coefficients
# for income and balance.
# (d) Comment on the estimated standard errors obtained using the
# glm() function and using your bootstrap function.
## Load data. The ISLR package must be installed first, e.g. install.packages("ISLR").
getwd()
library(ISLR)
attach(Default)
# (a) Using the summary() and glm() functions, determine the estimated standard errors for the coefficients associated with income
# and balance in a multiple logistic regression model that uses both predictors.
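## The seed does not affect glm(); it is set here so that the bootstrap in part (c) is reproducible.
set.seed(88)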
glm.fit = glm(default~income+balance, family=binomial("logit"), data=Default)
summary(glm.fit)
## The standard errors of the coefficients, as reported by summary(glm.fit), are listed below.
## (Intercept)   income      balance
## 4.348e-01     4.985e-06   2.274e-04
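## A minimal sketch of reading the standard errors from the fitted model instead of
## copying them from the printed summary. The names fit.a and se.glm are introduced
## here for illustration only (assumptions, not part of the exercise); the model is
## refit so that this snippet runs on its own.
library(ISLR)
fit.a = glm(default ~ income + balance, family = binomial, data = Default)
## coef(summary(...)) is a matrix; the "Std. Error" column holds the formula-based SEs.
se.glm = coef(summary(fit.a))[, "Std. Error"]
se.glm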
# (b) Write a function, boot.fn(), that takes as input the Default data
# set as well as an index of the observations, and that outputs
# the coefficient estimates for income and balance in the multiple
# logistic regression model.
boot.fn = function(data, index) {
  # Refit the logistic regression on the rows selected by index and return its coefficients.
  coef(glm(default ~ income + balance, family = binomial("logit"), data = data, subset = index))
}
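## A quick sanity check, sketched with hypothetical names (check.fn, full.coef,
## resample.coef) so that it runs on its own: a function of this shape should
## reproduce the full-sample coefficients when given every row once, and give
## slightly different coefficients on a resample drawn with replacement.
library(ISLR)
check.fn = function(data, index) {
  coef(glm(default ~ income + balance, family = binomial, data = data, subset = index))
}
n = nrow(Default)
full.coef = check.fn(Default, 1:n)                                # every row exactly once
set.seed(1)                                                       # arbitrary seed, for illustration
resample.coef = check.fn(Default, sample(n, n, replace = TRUE))   # one bootstrap resample
rbind(full = full.coef, resample = resample.coef)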
# (c) Use the boot() function together with your boot.fn() function to
# estimate the standard errors of the logistic regression coefficients
# for income and balance.
## The boot() call below draws 100 bootstrap resamples (with replacement) and reports, for each
## coefficient, the original estimate, the bootstrap bias, and the bootstrap standard error.
## In the run recorded here, the standard errors for the intercept β0 and the slopes β1 (income)
## and β2 (balance) were 4.071567e-01, 5.141896e-06, and 2.139490e-04 respectively.
library(boot)
boot(Default, boot.fn, R = 100)
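## A sketch of keeping the boot() result and computing the standard errors directly,
## rather than reading them off the printout. The names boot.coef, boot.out and se.boot
## are hypothetical; the statistic function is redefined so the snippet runs on its own.
library(ISLR)
library(boot)
boot.coef = function(data, index) {
  coef(glm(default ~ income + balance, family = binomial, data = data, subset = index))
}
set.seed(88)
boot.out = boot(Default, boot.coef, R = 100)
## boot.out$t has one row per resample and one column per coefficient;
## the column standard deviations are the bootstrap standard errors.
se.boot = apply(boot.out$t, 2, sd)
names(se.boot) = names(boot.out$t0)
se.boot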
# (d) Comment on the estimated standard errors obtained using the
# glm() function and using your bootstrap function.
## The boot() call above refits the model on 100 bootstrap resamples, each drawn with replacement.
## The standard errors from the glm() and boot() approaches are compared below.
## Standard errors from glm():
## (Intercept)    income         balance
## 4.348e-01      4.985e-06      2.274e-04
## Standard errors from boot():
## (Intercept)    income         balance
## 4.277823e-01   4.681102e-06   2.220416e-04
## (These bootstrap values differ from the ones quoted in part (c), presumably because they come from a different run.)
## The two sets of estimates are close: each bootstrap standard error is within about 6% of the
## corresponding summary() value, and slightly smaller in this run.
## The differences arise because the standard errors from summary() rest on model assumptions,
## such as the assumed noise variance and fixed values of the predictors, whereas the bootstrap does not rely on these assumptions.
## Hence the bootstrap is likely to give the more trustworthy estimates when those assumptions are in doubt,
## although with only 100 resamples its estimates also carry some simulation noise.
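## A small arithmetic check of how close the two approaches are, using only the numbers
## quoted in the comments above. The vector names (se.glm.reported, se.boot.reported,
## se.ratio) are introduced here for illustration only.
se.glm.reported  = c(intercept = 4.348e-01, income = 4.985e-06, balance = 2.274e-04)
se.boot.reported = c(intercept = 4.277823e-01, income = 4.681102e-06, balance = 2.220416e-04)
## A ratio below 1 means the bootstrap standard error is the smaller of the two.
se.ratio = se.boot.reported / se.glm.reported
round(se.ratio, 3)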