-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathA_Deep_Dive_into_ANOVA_part_2.Rmd
160 lines (120 loc) · 5.54 KB
/
A_Deep_Dive_into_ANOVA_part_2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
---
title: "A Deep Dive into ANOVA (part 2)"
author: "Vadim Tyuryaev"
date: "2024-01-14"
output: html_document
---
```{r setup, include=FALSE}
# Set global chunk options
knitr::opts_chunk$set(echo = TRUE)
```
# Custom Two-Way ANOVA Function
The following function, `two_way_anova`, computes the various components of a
two-way ANOVA with an interaction term from first principles Such approach
provides an insight into the mechanics inherent in ANOVA computations.
```{r two_way_anova_function, echo=TRUE}
# Two-way ANOVA function with interaction
two_way_anova <- function(data, response_col, factor1_col, factor2_col) {
# Extract data columns
response <- data[[response_col]]
factor1 <- data[[factor1_col]]
factor2 <- data[[factor2_col]]
# Identify unique levels for each factor
levels_factor1 <- unique(factor1)
levels_factor2 <- unique(factor2)
# Compute overall (grand) mean and group means
grand_mean <- mean(response)
means_factor1 <- tapply(response, factor1, mean)
means_factor2 <- tapply(response, factor2, mean)
# Preallocate a matrix for the interaction means
means_interaction <- matrix(0, nrow = length(levels_factor1),
ncol = length(levels_factor2))
# Calculate cell means for the interaction between factor1 and factor2
for (i in 1:length(levels_factor1)) {
for (j in 1:length(levels_factor2)) {
means_interaction[i, j] <- mean(response[factor1 == levels_factor1[i] & factor2 == levels_factor2[j]])
}
}
# Calculate sums of squares
ss_total <- sum((response - grand_mean)^2)
ss_factor1 <- sum((means_factor1 - grand_mean)^2 * table(factor1))
ss_factor2 <- sum((means_factor2 - grand_mean)^2 * table(factor2))
# Preallocate matrix for the interaction sum of squares
ss_interaction_mat <- matrix(0, nrow = length(levels_factor1),
ncol = length(levels_factor2))
# Compute interaction sum of squares for each cell
for (i in 1:length(levels_factor1)) {
for (j in 1:length(levels_factor2)) {
ss_interaction_mat[i, j] <- table(factor1, factor2)[i, j] *
(means_interaction[i, j] - means_factor1[i] - means_factor2[j] + grand_mean)^2
}
}
ss_interaction <- sum(ss_interaction_mat)
# Compute error sum of squares
ss_error <- ss_total - ss_factor1 - ss_factor2 - ss_interaction
# Calculate degrees of freedom for each component
df_factor1 <- length(levels_factor1) - 1
df_factor2 <- length(levels_factor2) - 1
df_interaction <- df_factor1 * df_factor2
df_error <- length(response) - (length(levels_factor1) * length(levels_factor2))
# Compute mean squares
ms_factor1 <- ss_factor1 / df_factor1
ms_factor2 <- ss_factor2 / df_factor2
ms_interaction <- ss_interaction / df_interaction
ms_error <- ss_error / df_error
# Calculate F-statistics for each source of variation
f_factor1 <- ms_factor1 / ms_error
f_factor2 <- ms_factor2 / ms_error
f_interaction <- ms_interaction / ms_error
# Calculate corresponding p-values
p_factor1 <- 1 - pf(f_factor1, df_factor1, df_error)
p_factor2 <- 1 - pf(f_factor2, df_factor2, df_error)
p_interaction <- 1 - pf(f_interaction, df_interaction, df_error)
# Create and return a summary data frame (ANOVA table)
results <- data.frame(
Factor = c(factor1_col, factor2_col, "Interaction", "Error"),
Df = c(df_factor1, df_factor2, df_interaction, df_error),
SumSq = c(ss_factor1, ss_factor2, ss_interaction, ss_error),
MeanSq = c(ms_factor1, ms_factor2, ms_interaction, ms_error),
Fvalue = c(f_factor1, f_factor2, f_interaction, ""),
Pval = c(p_factor1, p_factor2, p_interaction, "")
)
return(results)
}
```
# Applying the Custom Two-Way ANOVA Function
In this section, we load the [`CO2`](https://www.rdocumentation.org/packages/datasets/versions/3.6.2/topics/CO2) dataset and apply the custom two-way ANOVA function. We fit a two-way ANOVA model
with interaction term to estimate how the mean of a `uptake` changes according to
the levels of the factors `Type` and `Treatment`. If the model is additive (i.e.
the effects of the factors combine independently), interaction term will not
be statistically significant.
```{r CO2_example, echo=TRUE}
# Load the built-in CO2 dataset
data(CO2)
# Apply the custom two-way ANOVA function to the CO2 data
result <- two_way_anova(CO2, "uptake", "Type", "Treatment")
print(result)
```
# Comparison with the Built-in ANOVA Function
We now compare the results from our custom function with those from R's built-in
`aov` function. The summary output is printed with high precision (8 digits) for
a detailed comparison.
```{r comparasion, echo=TRUE}
# Perform two-way ANOVA using R's built-in aov function
anova_model <- aov(uptake ~ Type * Treatment, data = CO2)
print(summary(anova_model), digits = 8)
```
The results are **identical**.
# Visualizing the Data with [`ggpubr`](https://cran.r-project.org/web/packages/ggpubr/index.html)
The final section creates a visual summary of the data using the `ggpubr` package.
The plot displays the mean and standard error of uptake for each level of Treatment,
differentiated by Type. A dot plot overlay shows individual data points.
Custom colors (red and green) are used to clearly distinguish between the groups.
```{r Visualizing, echo=TRUE, warning=FALSE,message=FALSE}
library(ggpubr)
ggline(CO2, x = "Treatment", y = "uptake", color = "Type",
add = c("mean_se", "dotplot"),
palette = c("red", "green"))
```
If the underlying model was strictly additive, what would one expect to see here?
Does the plot above reflect this expectation?