---
title: "AE 7: Exam 2 Review"
author: "Add your name here"
format: pdf
editor: visual
---
## Packages
```{r}
#| label: load-pkgs
#| message: false
library(tidyverse)
library(tidymodels)
library(knitr)
library(openintro)
# fix data!
loans_full_schema <- droplevels(loans_full_schema)
```
## Goal
Create a model for predicting `interest_rate`.
## View data
Note the dimensions of the data and the variable names.
Review the data dictionary.
```{r}
#| label: load-data
# add code here
```
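One possible way to check the dimensions and variable names is sketched below. It assumes `loans_full_schema` comes from the `openintro` package, as in the Packages chunk; the chunk label is made up for illustration.

```{r}
#| label: sketch-view-data
library(tidyverse)
library(openintro)

# Compact overview: number of rows and columns, plus each variable's type
glimpse(loans_full_schema)

# Dimensions and variable names on their own
dim(loans_full_schema)
names(loans_full_schema)
```

Running `?loans_full_schema` in the console opens the data dictionary.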
## Split data into training and testing
Split your data into testing and training sets.
```{r}
#| label: initial-split
# add code here
```
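A minimal sketch of the split, assuming an 80/20 proportion and an arbitrary seed (neither is specified in this exercise; the object names and chunk label are also illustrative):

```{r}
#| label: sketch-initial-split
library(tidymodels)
library(openintro)

# Same data fix as in the Packages chunk
loans_full_schema <- droplevels(loans_full_schema)

set.seed(210)  # hypothetical seed, only for reproducibility

# prop = 0.8 is an assumption; the initial_split() default is 0.75
loans_split <- initial_split(loans_full_schema, prop = 0.8)
loans_train <- training(loans_split)
loans_test  <- testing(loans_split)
```

Setting the seed first means the same rows land in the training set each time the document is rendered.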
## Write the model
Write the model for predicting interest rate (`interest_rate`) from debt to income ratio (`debt_to_income`), the term of loan (`term`), the number of inquiries (credit checks) into the applicant's credit during the last 12 months (`inquiries_last_12m`), whether there are any bankruptcies listed in the public record for this applicant (`bankrupt`), and the type of application (`application_type`).
The model should allow the effect of debt-to-income ratio on interest rate to vary by application type.
*Add model here*
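One way the model could be written is sketched below. It treats each categorical predictor (`term`, `bankrupt`, `application_type`) as a single indicator variable; the coefficient numbering is only a labeling choice.

$$
\begin{aligned}
\text{interest\_rate} = \beta_0 &+ \beta_1\,\text{debt\_to\_income} + \beta_2\,\text{term} + \beta_3\,\text{inquiries\_last\_12m} \\
&+ \beta_4\,\text{bankrupt} + \beta_5\,\text{application\_type} \\
&+ \beta_6\,\text{debt\_to\_income} \times \text{application\_type} + \epsilon
\end{aligned}
$$

The $\beta_6$ term is what lets the slope of `debt_to_income` differ between application types.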
## Exploration
Explore characteristics of the variables you'll use for the model using the training data only.
```{r}
#| label: explore
# add code here
```
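A few exploratory views are sketched below as one possibility. The split is repeated so the chunk runs on its own; the seed, split proportion, object names, and choice of plots are all assumptions.

```{r}
#| label: sketch-explore
library(tidyverse)
library(tidymodels)
library(openintro)

set.seed(210)  # hypothetical seed
loans_train <- training(initial_split(droplevels(loans_full_schema), prop = 0.8))

# Distribution of the response
ggplot(loans_train, aes(x = interest_rate)) +
  geom_histogram(binwidth = 1)

# Counts for a categorical predictor
loans_train |>
  count(application_type)

# Does the debt_to_income slope look different by application type?
ggplot(loans_train, aes(x = debt_to_income, y = interest_rate,
                        color = application_type)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", se = FALSE)
```

Exploring only `loans_train` keeps the testing set untouched until the final evaluation.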
## Specify model
Specify a linear regression model.
Call it `office_spec`.
```{r}
#| label: specify-model
# add code here
```
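A minimal sketch of the specification, assuming the `"lm"` engine (the exercise names only the model type and the object name):

```{r}
#| label: sketch-specify-model
library(tidymodels)

# Linear regression fit by ordinary least squares via lm()
office_spec <- linear_reg() |>
  set_engine("lm")

office_spec
```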
## Create recipe
- Predict `interest_rate` from `debt_to_income`, `term`, `inquiries_last_12m`, `public_record_bankrupt`, and `application_type`.
- Mean center `debt_to_income`.
- Make `term` a factor.
- Create a new variable: `bankrupt` that takes on the value "no" if `public_record_bankrupt` is 0 and the value "yes" if `public_record_bankrupt` is 1 or higher. Then, remove `public_record_bankrupt`.
- Interact `application_type` with `debt_to_income`.
- Create dummy variables where needed and drop any zero variance variables.
```{r}
#| label: create-recipe
# add code here
```
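The steps above can be sketched as the recipe below. This is one possible ordering, not the only valid one; the seed, split proportion, object names, and chunk label are assumptions, and the split is repeated so the chunk runs on its own.

```{r}
#| label: sketch-create-recipe
library(tidymodels)
library(openintro)

set.seed(210)  # hypothetical seed
loans_train <- training(initial_split(droplevels(loans_full_schema), prop = 0.8))

loans_rec <- recipe(
  interest_rate ~ debt_to_income + term + inquiries_last_12m +
    public_record_bankrupt + application_type,
  data = loans_train
) |>
  # Mean center debt_to_income
  step_center(debt_to_income) |>
  # Make term a factor and derive the yes/no bankruptcy indicator
  step_mutate(
    term = factor(term),
    bankrupt = factor(if_else(public_record_bankrupt == 0, "no", "yes"))
  ) |>
  # Drop the original count now that bankrupt exists
  step_rm(public_record_bankrupt) |>
  # Dummy variables first, so the interaction can refer to them
  step_dummy(all_nominal_predictors()) |>
  step_interact(terms = ~ starts_with("application_type"):debt_to_income) |>
  # Remove any zero variance predictors
  step_zv(all_predictors())

loans_rec
```

To check the result, `prep(loans_rec) |> bake(new_data = NULL) |> names()` lists the columns the model would actually see.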
## Create workflow
Create the workflow that brings together the model specification and recipe.
```{r}
#| label: create-wflow
# add code here
```
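A sketch of the workflow is below. To keep it short and runnable on its own, `stand_in_rec` is a pared-down placeholder (formula only, no steps) for the full recipe described in the previous section; that name, `loans_wflow`, and the chunk label are illustrative.

```{r}
#| label: sketch-create-wflow
library(tidymodels)
library(openintro)

office_spec <- linear_reg() |>
  set_engine("lm")

# Placeholder recipe: swap in the full recipe from the previous section
stand_in_rec <- recipe(
  interest_rate ~ debt_to_income + application_type,
  data = droplevels(loans_full_schema)
)

# Bundle the model specification and the recipe
loans_wflow <- workflow() |>
  add_model(office_spec) |>
  add_recipe(stand_in_rec)

loans_wflow
```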
## Cross validation
Conduct 10-fold cross validation.
```{r}
#| label: cv-tenfold
# add code here
```
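The full pipeline through cross validation can be sketched as follows. The seed, split proportion, and object names (`folds`, `loans_fit_rs`, and so on) are assumptions; the recipe follows the steps listed in the Create recipe section.

```{r}
#| label: sketch-cv-tenfold
library(tidymodels)
library(openintro)

set.seed(210)  # hypothetical seed
loans_train <- training(initial_split(droplevels(loans_full_schema), prop = 0.8))

office_spec <- linear_reg() |>
  set_engine("lm")

loans_rec <- recipe(
  interest_rate ~ debt_to_income + term + inquiries_last_12m +
    public_record_bankrupt + application_type,
  data = loans_train
) |>
  step_center(debt_to_income) |>
  step_mutate(
    term = factor(term),
    bankrupt = factor(if_else(public_record_bankrupt == 0, "no", "yes"))
  ) |>
  step_rm(public_record_bankrupt) |>
  step_dummy(all_nominal_predictors()) |>
  step_interact(terms = ~ starts_with("application_type"):debt_to_income) |>
  step_zv(all_predictors())

loans_wflow <- workflow() |>
  add_model(office_spec) |>
  add_recipe(loans_rec)

# Ten folds drawn from the training data only
set.seed(210)
folds <- vfold_cv(loans_train, v = 10)

# Fit the workflow on each analysis set and assess on the held-out fold
loans_fit_rs <- loans_wflow |>
  fit_resamples(resamples = folds)
```

Because the recipe lives inside the workflow, the centering and dummy coding are re-estimated within each fold rather than once on all of the training data.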
## Summarize CV metrics
Summarize metrics from your CV resamples.
```{r}
#| label: cv-summarize
# add code here
```
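A sketch of summarizing the resampling results is below. To stay short and self-contained it refits a simplified stand-in workflow (a formula with two predictors) rather than the full recipe; the seed, split proportion, and object names are assumptions.

```{r}
#| label: sketch-cv-summarize
library(tidymodels)
library(openintro)

set.seed(210)  # hypothetical seed
loans_train <- training(initial_split(droplevels(loans_full_schema), prop = 0.8))
folds <- vfold_cv(loans_train, v = 10)

# Simplified stand-in for the workflow built in the earlier sections
cv_fit <- workflow() |>
  add_model(linear_reg() |> set_engine("lm")) |>
  add_formula(interest_rate ~ debt_to_income + application_type) |>
  fit_resamples(resamples = folds)

# Mean and standard error of each metric across the 10 folds
collect_metrics(cv_fit) |>
  knitr::kable(digits = 3)

# One row per fold and metric, useful for seeing fold-to-fold variability
collect_metrics(cv_fit, summarize = FALSE)
```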
Why are we focusing on R-squared and RMSE instead of adjusted R-squared, AIC, or BIC?
*\[Add response here\]*
## Next steps...
Depending on time, do one of the following:
- Create a workflow for another model with a new recipe (omitting the interaction variable), conduct CV, select between the two models, and then interpret the coefficients for the selected model.
- Interpret the coefficients for the one model you fit.
Make sure to interpret the intercept and the slope coefficients for at least one numerical predictor, one categorical predictor, and one interaction term.