## iv.r

## Exercise 5.4.6
## Summer 2017

# 6. We continue to consider the use of a logistic regression model to
# predict the probability of default using income and balance on the
# Default data set. In particular, we will now compute estimates for
# the standard errors of the income and balance logistic regression
# coefficients in two different ways: (1) using the bootstrap, and (2) using
# the standard formula for computing the standard errors in the glm()
# function. Do not forget to set a random seed before beginning your
# analysis.
# (a) Using the summary() and glm() functions, determine the estimated
# standard errors for the coefficients associated with income
# and balance in a multiple logistic regression model that uses
# both predictors.
# (b) Write a function, boot.fn(), that takes as input the Default data
# set as well as an index of the observations, and that outputs
# the coefficient estimates for income and balance in the multiple
# logistic regression model.
# (c) Use the boot() function together with your boot.fn() function to
# estimate the standard errors of the logistic regression coefficients
# for income and balance.
# (d) Comment on the estimated standard errors obtained using the
# glm() function and using your bootstrap function.

## Load the data. The ISLR package must be installed before use.

getwd()
library(ISLR)
attach(Default)

# (a) Using the summary() and glm() functions, determine the estimated standard errors for the coefficients associated with income
# and balance in a multiple logistic regression model that uses both predictors.

set.seed(88)
glm.fit = glm(default~income+balance, family=binomial("logit"), data=Default)
summary(glm.fit)

## The standard errors of the coefficients associated with income and balance are listed below.
## (Intercept)        income       balance 
## 4.348e-01          4.985e-06    2.274e-04
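
## The same standard errors can be pulled out of the fitted model directly instead of
## being copied from the printed summary. A minimal sketch that relies on glm.fit from
## above; the name se.glm is introduced here for illustration only.
se.glm = coef(summary(glm.fit))[c("income", "balance"), "Std. Error"]
se.glm
## These values are the square roots of the diagonal of the estimated covariance matrix.
all.equal(se.glm, sqrt(diag(vcov(glm.fit)))[c("income", "balance")])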

# (b) Write a function, boot.fn(), that takes as input the Default data
# set as well as an index of the observations, and that outputs
# the coefficient estimates for income and balance in the multiple
# logistic regression model.

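## Refit the logistic regression on the rows selected by index and return
## the coefficient estimates (intercept, income, balance).
boot.fn = function(data, index) {
  return (coef(glm(default~income+balance, family=binomial("logit"), data=data, subset=index)))
}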
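
## A quick sanity check of boot.fn(), sketched under the assumption that Default and
## glm.fit are available as above; n is a name introduced here for illustration.
## Passing every row index exactly once should reproduce the full-data fit, and passing
## indices drawn with replacement gives one bootstrap replicate of the coefficients.
n = nrow(Default)
all.equal(boot.fn(Default, 1:n), coef(glm.fit))
set.seed(88)
boot.fn(Default, sample(n, n, replace=TRUE))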

# (c) Use the boot() function together with your boot.fn() function to
# estimate the standard errors of the logistic regression coefficients
# for income and balance.

## The boot() call below draws 100 bootstrap samples (with replacement) and reports the
## original estimate, bias, and standard error for the intercept and the slopes of the two predictors.
## In one run, the standard errors for the intercept β0 and the slopes β1 (income) and β2 (balance) were
## 4.071567e-01, 5.141896e-06, and 2.139490e-04, respectively; the exact values depend on the random seed.

library(boot)
boot(Default, boot.fn, 100)
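
## The standard errors can also be extracted from the boot object rather than read off
## the printout. A minimal sketch; boot.out and se.boot are names introduced here for
## illustration. Each row of boot.out$t holds one bootstrap replicate of the coefficients,
## so the column standard deviations are the bootstrap standard errors.
set.seed(88)
boot.out = boot(Default, boot.fn, R=100)
se.boot = apply(boot.out$t, 2, sd)
names(se.boot) = names(boot.out$t0)
se.boot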

# (d) Comment on the estimated standard errors obtained using the
# glm() function and using your bootstrap function.

## The boot() function above produced 100 bootstrap estimates of the regression coefficients by resampling with replacement.
## The standard errors estimated by glm() and by boot() are listed below.

## The standard errors generated by the glm() function are listed below.
## (Intercept)        income        balance 
## 4.348e-01          4.985e-06     2.274e-04

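## The standard errors generated by the boot() function are listed below.
## (These differ from the values quoted in part (c), presumably because they come from a different run.)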
## (Intercept)        income        balance 
## 4.277823e-01       4.681102e-06  2.220416e-04

## The standard error estimates from summary() differ only slightly from the bootstrap estimates
## (the values above agree to within about 6%).
## This close agreement suggests that the assumptions behind the glm() standard errors are reasonable for this data set.
## The small differences arise because the standard errors from summary() rely on model assumptions,
## such as a correctly specified model and predictors treated as fixed, whereas the bootstrap does not rely on them.
## Hence the bootstrap is likely to give more robust estimates when those assumptions are in doubt,
## although with only 100 replicates the bootstrap estimates are themselves noisy.
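
## The size of the gap can be quantified directly from the values reported above.
## A minimal sketch; se.glm.reported, se.boot.reported, and rel.diff are names introduced
## here for illustration, and the numbers are copied from the comments in part (d).
se.glm.reported = c("(Intercept)"=4.348e-01, income=4.985e-06, balance=2.274e-04)
se.boot.reported = c("(Intercept)"=4.277823e-01, income=4.681102e-06, balance=2.220416e-04)
rel.diff = (se.boot.reported - se.glm.reported) / se.glm.reported
round(100 * rel.diff, 1)  # percent difference of the bootstrap SE relative to the glm() SE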