---
title: "ANOVA in R"
author: "Abhishek Kumar"
date: "17 August 2020"
output: github_document
---
In this tutorial, I show how to run a one-way analysis of variance (ANOVA) in R,
using the following case study.
I belong to a golf club in my neighborhood. I divide the year into three golf
seasons: summer (June–September), winter (November–March), and shoulder
(October, April, and May). I believe that I play my best golf during the summer
(because I have more time and the course isn’t crowded) and shoulder (because the
course isn’t crowded) seasons, and my worst golf is during the winter (because
when all of the part-year residents show up, the course is crowded, play is slow,
and I get frustrated). My scores from the last year are shown in the following table;
the columns number the observations within each season, and `NA` pads the seasons
with fewer observations.

Season | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
-------|---|---|---|---|---|---|---|---|---|---
Summer | 83 | 85 | 85 | 87 | 90 | 88 | 88 | 84 | 91 | 90
Shoulder | 91 | 87 | 84 | 87 | 85 | 86 | 83 | NA | NA | NA
Winter | 94 | 91 | 87 | 85 | 87 | 91 | 92 | 86 | NA | NA
## Data Preparation
```{r}
library(car)     # provides leveneTest()
library(ggpubr)  # provides ggqqplot()
```
Let's prepare the data for analysis:
```{r}
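# Scores, listed season by season
obs <- c(83, 85, 85, 87, 90, 88, 88, 84, 91, 90,   # Summer (10 observations)
         91, 87, 84, 87, 85, 86, 83,               # Shoulder (7 observations)
         94, 91, 87, 85, 87, 91, 92, 86)           # Winter (8 observations)
# Matching season labels, stored as a factor so that leveneTest()
# does not have to coerce a character vector
season <- factor(c(rep("Summer", 10),
                   rep("Shoulder", 7),
                   rep("Winter", 8)))
dat <- data.frame(obs, season)
str(dat)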
```
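Before fitting the model, it can help to look at the size, mean, and standard deviation of each season. The chunk below is a minimal sketch using base R's `tapply()`; the names `golf` and `group_summary` are my own illustrative choices, and the data are rebuilt so the chunk runs on its own.

```{r}
# Same scores as above, rebuilt so this chunk runs on its own
golf <- data.frame(
  obs = c(83, 85, 85, 87, 90, 88, 88, 84, 91, 90,
          91, 87, 84, 87, 85, 86, 83,
          94, 91, 87, 85, 87, 91, 92, 86),
  season = factor(rep(c("Summer", "Shoulder", "Winter"), times = c(10, 7, 8)))
)

# Per-season sample size, mean and standard deviation
# (tapply() returns results in the order of the factor levels)
group_summary <- data.frame(
  n    = as.vector(tapply(golf$obs, golf$season, length)),
  mean = as.vector(tapply(golf$obs, golf$season, mean)),
  sd   = as.vector(tapply(golf$obs, golf$season, sd)),
  row.names = levels(golf$season)
)
group_summary
```

Because the seasons have different numbers of observations, the design is unbalanced, which is worth keeping in mind when reading the diagnostics later on.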
## ANOVA in R
```{r}
# One-way ANOVA: does the mean score differ between seasons?
mod <- aov(obs ~ season, data = dat)
summary(mod)
```
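The F statistic reported by `summary()` is the ratio of the between-season mean square to the within-season mean square. To show where it comes from, here is a minimal sketch that recomputes it by hand in base R. The variable names (`golf`, `ss_between`, `ss_within`, and so on) are illustrative and not part of the analysis above.

```{r}
# Same scores as above, rebuilt so this chunk runs on its own
golf <- data.frame(
  obs = c(83, 85, 85, 87, 90, 88, 88, 84, 91, 90,
          91, 87, 84, 87, 85, 86, 83,
          94, 91, 87, 85, 87, 91, 92, 86),
  season = factor(rep(c("Summer", "Shoulder", "Winter"), times = c(10, 7, 8)))
)

k <- nlevels(golf$season)   # number of seasons
N <- nrow(golf)             # total number of observations

grand_mean  <- mean(golf$obs)
season_mean <- ave(golf$obs, golf$season)   # each score's own season mean

# Summing over all N rows weights each season by its sample size
ss_between <- sum((season_mean - grand_mean)^2)
ss_within  <- sum((golf$obs - season_mean)^2)

f_stat  <- (ss_between / (k - 1)) / (ss_within / (N - k))
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```

The two values should agree with the `F value` and `Pr(>F)` columns printed by `summary(mod)`.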
## Assumptions
*1. Homogeneity of variances*
The classical one-way ANOVA test requires an assumption of equal variances for
all groups. The residuals versus fits plot can be used to check the homogeneity
of variances.
```{r}
plot(mod, 1)
```
In the above plot, there is no evident relationship between the residuals and the
fitted values (the mean of each group), which is good. So, we can assume homogeneity
of variances.

It’s also possible to use **Bartlett’s test** or **Levene’s test** to check the
homogeneity of variances. I recommend Levene’s test, which is less sensitive to
departures from normality.
```{r}
bartlett.test(obs ~ season, data = dat)
```
```{r}
leveneTest(obs ~ season, data = dat)
```
From the output above, we can see that the p-value is not less than the significance
level of 0.05, so there is no evidence that the variance differs between the seasons.
In our example, the homogeneity of variance assumption therefore turned out to be
fine: the Levene test is not significant.
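With its default `center = median`, `leveneTest()` amounts to a one-way ANOVA on the absolute deviations of each score from its season's median. The following sketch reproduces that calculation in base R to show what the test measures. The names `golf`, `abs_dev`, and `levene_manual` are illustrative.

```{r}
# Same scores as above, rebuilt so this chunk runs on its own
golf <- data.frame(
  obs = c(83, 85, 85, 87, 90, 88, 88, 84, 91, 90,
          91, 87, 84, 87, 85, 86, 83,
          94, 91, 87, 85, 87, 91, 92, 86),
  season = factor(rep(c("Summer", "Shoulder", "Winter"), times = c(10, 7, 8)))
)

# Absolute deviation of each score from the median of its own season
golf$abs_dev <- abs(golf$obs - ave(golf$obs, golf$season, FUN = median))

# A one-way ANOVA on those deviations gives the Levene F statistic and p-value
levene_manual <- anova(lm(abs_dev ~ season, data = golf))
levene_manual
```

If the spread of scores were very different between seasons, the mean absolute deviations would differ and this F statistic would be large.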
*2. Normality*
The normal probability plot of residuals is used to check the assumption that
the residuals are normally distributed. In this plot, the quantiles of the residuals
are plotted against the quantiles of the normal distribution. The points should
approximately follow a straight line.
```{r}
plot(mod, 2)
```
In the above plot, all the points fall approximately along the reference line,
so we can assume normality.

The same kind of plot can also be drawn with `ggqqplot()` from the ggpubr package:
```{r}
ggqqplot(residuals(mod))
```
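To make explicit what a Q-Q plot displays, here is a minimal sketch that builds one by hand in base R: the sorted residuals are plotted against the matching quantiles of the standard normal distribution. The names `golf`, `fit`, `res_sorted`, and `theo` are illustrative. Note that this sketch uses raw residuals, so its vertical scale differs from `plot(mod, 2)`, which uses standardized residuals.

```{r}
# Same scores as above, rebuilt so this chunk runs on its own
golf <- data.frame(
  obs = c(83, 85, 85, 87, 90, 88, 88, 84, 91, 90,
          91, 87, 84, 87, 85, 86, 83,
          94, 91, 87, 85, 87, 91, 92, 86),
  season = factor(rep(c("Summer", "Shoulder", "Winter"), times = c(10, 7, 8)))
)
fit <- aov(obs ~ season, data = golf)

res_sorted <- sort(residuals(fit))                  # sample quantiles
theo       <- qnorm(ppoints(length(res_sorted)))    # theoretical normal quantiles

plot(theo, res_sorted,
     xlab = "Theoretical quantiles", ylab = "Sorted residuals")
qqline(residuals(fit))   # reference line through the first and third quartiles
```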
Alternatively, the normality assumption can be checked using the Shapiro-Wilk test
on the ANOVA residuals.
```{r}
shapiro.test(mod$residuals)
# or, equivalently, using the extractor function
shapiro.test(residuals(mod))
```
In the above test, the p-value is greater than the significance level (0.05), so
there is no indication that normality is violated.
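The same decision rule can be applied in code rather than by reading the printed output. `shapiro.test()` returns an object of class `htest`, whose `statistic` and `p.value` components can be extracted directly. A minimal sketch, in which `golf`, `fit`, `sw`, and `alpha` are illustrative names:

```{r}
# Same scores as above, rebuilt so this chunk runs on its own
golf <- data.frame(
  obs = c(83, 85, 85, 87, 90, 88, 88, 84, 91, 90,
          91, 87, 84, 87, 85, 86, 83,
          94, 91, 87, 85, 87, 91, 92, 86),
  season = factor(rep(c("Summer", "Shoulder", "Winter"), times = c(10, 7, 8)))
)
fit <- aov(obs ~ season, data = golf)

sw    <- shapiro.test(residuals(fit))
alpha <- 0.05   # significance level used throughout this tutorial

sw$statistic                       # the Shapiro-Wilk W statistic
sw$p.value                         # the p-value of the test
normality_ok <- sw$p.value > alpha # TRUE means no evidence against normality
normality_ok
```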
The remaining `plot()` diagnostics for the model can be drawn together:
```{r}
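par(mfrow = c(2, 2))   # arrange the four plots in a 2 x 2 grid
plot(mod, 3)           # Scale-Location
plot(mod, 4)           # Cook's distance
plot(mod, 5)           # Residuals vs Leverage
plot(mod, 6)           # Cook's distance vs Leverage
par(mfrow = c(1, 1))   # restore the default single-plot layout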
```
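The Cook's distances shown in these plots can also be inspected numerically with `cooks.distance()`. The sketch below flags observations above `4 / n`; that cutoff is only a commonly quoted rule of thumb that I am assuming here for illustration, not a formal test, and the names `golf`, `fit`, `cd`, and `flagged` are illustrative.

```{r}
# Same scores as above, rebuilt so this chunk runs on its own
golf <- data.frame(
  obs = c(83, 85, 85, 87, 90, 88, 88, 84, 91, 90,
          91, 87, 84, 87, 85, 86, 83,
          94, 91, 87, 85, 87, 91, 92, 86),
  season = factor(rep(c("Summer", "Shoulder", "Winter"), times = c(10, 7, 8)))
)
fit <- aov(obs ~ season, data = golf)

cd <- cooks.distance(fit)          # one value per observation
cutoff <- 4 / length(cd)           # assumed rule-of-thumb threshold
flagged <- which(cd > cutoff)      # row numbers of potentially influential scores

# Show the flagged rows together with their Cook's distance
cbind(golf[flagged, ], cooks_d = cd[flagged])
```

Flagged observations are not necessarily errors; they are simply the scores that would be worth a second look.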