---
title: "AE 3: Duke Forest houses"
subtitle: "Checking model conditions"
author: "Add your name here"
format: pdf
editor: visual
---

## Packages

```{r load-packages}
#| message: false
library(tidyverse)
library(tidymodels)
library(openintro)
library(knitr)
```

## Predict sale price from area

```{r}
# Fit a simple linear regression of sale price on area
df_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(price ~ area, data = duke_forest)

tidy(df_fit) %>%
  kable(digits = 2)
```

\pagebreak

## Model conditions

### Exercise 1

The following code produces the residuals vs. fitted values plot for this model.
Comment out the layer that defines the y-axis limits and re-create the plot.
How does the plot change?
Why might we want to define the limits explicitly?

```{r}
# Augment the model data with fitted values (.fitted) and residuals (.resid)
df_aug <- augment(df_fit$fit)

ggplot(df_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  ylim(-1000000, 1000000) +
  labs(
    x = "Fitted value", y = "Residual",
    title = "Residuals vs. fitted values"
  )
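```

Without `ylim()`, ggplot2 chooses the y-axis limits from the range of the residuals, so the axis is generally no longer symmetric about zero. Defining symmetric limits explicitly makes it easier to evaluate whether the residuals are randomly scattered around the $y = 0$ line.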
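Hardcoding `ylim(-1000000, 1000000)` has a side effect worth knowing: `ylim()` drops any observation that falls outside the limits (ggplot2 warns about removed rows). As a sketch of an alternative, ggplot2's continuous scales also accept a function for `limits`; it receives the automatic limits and returns new ones. The helper `symmetric_limits()` below is hypothetical (it is not part of ggplot2) and simply widens the automatic limits so they are symmetric about zero. The chunk reloads the packages and refits the model so it runs on its own.

```{r}
#| message: false
library(tidyverse)
library(tidymodels)
library(openintro)

# Refit the model from above so this chunk is self-contained
df_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(price ~ area, data = duke_forest)
df_aug <- augment(df_fit$fit)

# Hypothetical helper: given the automatic limits, return limits that are
# symmetric about 0 and still contain every residual
symmetric_limits <- function(lims) {
  m <- max(abs(lims))
  c(-m, m)
}

ggplot(df_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_y_continuous(limits = symmetric_limits) +
  labs(
    x = "Fitted value", y = "Residual",
    title = "Residuals vs. fitted values"
  )
```

Unlike fixed limits, this adapts to whatever residuals the model produces, so no points are dropped; the trade-off is that plots from different models are no longer drawn on a common scale.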
\pagebreak

### Exercise 2

Improve how the values on the axes of the plot are displayed by modifying the code below.

```{r}
ggplot(df_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    x = "Fitted value", y = "Residual",
    title = "Residuals vs. fitted values"
  ) +
  # label_dollar() comes from the scales package
  scale_x_continuous(labels = label_dollar()) +
  scale_y_continuous(labels = label_dollar(), limits = c(-1000000, 1000000))
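
# A sketch of a more compact alternative, using the `scale` and `suffix`
# arguments of scales::label_dollar(): show dollar amounts in thousands,
# so 500000 is labeled "$500K". `label_thousands` is a hypothetical name.
label_thousands <- scales::label_dollar(scale = 1e-3, suffix = "K")
label_thousands(c(-500000, 0, 500000))
# It could then replace label_dollar() in the scales above, e.g.
# scale_y_continuous(labels = label_thousands, limits = c(-1000000, 1000000))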
```