---
title: "Exploration and Analysis of the 'ToothGrowth Dataset'"
author: "Vivek Narayan"
date: "December 14, 2018"
output: pdf_document
---
```{r setup, include=FALSE}
#setup workspace
library(tidyverse)
options(digits = 4)
#load data
data("ToothGrowth"); data <- ToothGrowth
```
## Introduction
This report is a brief exploration and analysis of the 'ToothGrowth' data set in R. The data set contains the results of an experiment conducted on 60 guinea pigs to ascertain the effect of Vitamin C on odontoblast growth (odontoblasts are the cells responsible for tooth growth), as measured by tooth length.
```{r summary, echo=FALSE}
summary(data)
```
The data is a data frame with 60 observations and three variables: 'len' (tooth length), 'supp' (supplement type, OJ or VC, i.e. Orange Juice or Ascorbic Acid, a form of Vitamin C), and 'dose' (0.5, 1, or 2 milligrams/day). Each dose level contains 10 observations per supplement type.
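
The stated layout (60 rows, 10 observations per supplement and dose combination) can be checked directly. The following is a minimal sketch using only base R; the name `cell_counts` is introduced here for illustration.

```r
# Load the built-in data set and inspect its structure
data("ToothGrowth")
str(ToothGrowth)

# Cross-tabulate supplement type against dose to confirm the design is balanced
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```
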
## Exploratory Data Analysis
Figure 1.1 shows the density plots for the outcome variable of interest, tooth length. There seem to be two distinct clusters in the OJ group and three in the VC group. Figure 1.2 shows the boxplots for the same data. Tooth length appears to increase with dose. However, there seems to be little difference between the supplement types at the 2 mg/day dose.
```{r eda1, echo=FALSE, fig.height=3}
#transform data
data$dose <- as.factor(data$dose)
#eda density plots
data %>% ggplot(aes(x = len, fill = dose)) +
geom_density(alpha = 0.2) + facet_wrap(~ supp) + theme_light() +
labs(x = "Length", y = "Density", title = "Tooth Length by Supplement Type",
caption = "Figure 1.1") +
scale_fill_discrete(name="Dose")
```
```{r eda2, echo=FALSE, fig.height=3}
#eda boxplots
data %>% ggplot(aes(x = dose, y = len, fill = supp)) + geom_boxplot() + theme_light() +
labs(x = "Dose (mg/day)", y = "Length", title = "Tooth Length and Dose by Supplement",
caption = "Figure 1.2") +
scale_fill_discrete(name="Supplement")
```
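
The visual impression from the two figures can be backed by a table of group means. This is a sketch of one way to compute it with base R's `aggregate()`; the name `group_means` is introduced here and is not part of the report's code.

```r
data("ToothGrowth")

# Mean tooth length for every supplement/dose combination (6 cells)
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

If the trend seen in the boxplots holds, the `len` column should rise with `dose` within each supplement, and the two supplements should be closest at the 2 mg/day dose.
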
## Analysis
### Null Hypotheses
Given the potential trend identified in the EDA, the following null hypotheses will be tested:
1. There is no difference in average tooth length between the cohort administered the supplement via Orange Juice and the cohort administered Ascorbic Acid, at each dose level.
2. Increasing dosage of supplement via Orange Juice does not lead to a change in tooth growth.
3. Increasing dosage of supplement via Ascorbic Acid does not lead to a change in tooth growth.
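
The three null hypotheses can also be written symbolically. The notation is an assumption introduced here: $\mu_{s,d}$ denotes the mean tooth length for supplement $s$ at dose $d$, and hypotheses 2 and 3 are read as pairwise comparisons between dose levels, which is how the t-tests in this report are set up. Each is tested against a two-sided alternative, the default of `t.test`.

$$
\begin{aligned}
H_0^{(1)} &: \mu_{\mathrm{OJ},d} = \mu_{\mathrm{VC},d} \quad \text{for each } d \in \{0.5, 1, 2\} \\
H_0^{(2)} &: \mu_{\mathrm{OJ},d} = \mu_{\mathrm{OJ},d'} \quad \text{for each pair } d \neq d' \\
H_0^{(3)} &: \mu_{\mathrm{VC},d} = \mu_{\mathrm{VC},d'} \quad \text{for each pair } d \neq d'
\end{aligned}
$$
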
### Assumptions
* There was random assignment within the cohort between those that were given Orange Juice vs those that were given Ascorbic Acid. Furthermore, there was random assignment when deciding which subjects would receive larger doses of the supplement.
* The sample size is less than 10% of the population.
* Since each test arm has fewer than 30 subjects, t-intervals and hence t-tests will be used.
* Test subjects are not genetic clones.
* The response variable, tooth growth, is iid (independent and identically distributed): the tooth length of one subject is not correlated with the tooth length of another, and all subjects are drawn from the same population.
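
To make the t-test concrete, the sketch below reproduces one comparison by hand (OJ vs VC at 0.5 mg/day) and sets it beside `t.test()`. It assumes the default call, which does not pool variances (Welch's test, `var.equal = FALSE`). All variable names are introduced here for illustration.

```r
data("ToothGrowth")

# Two arms of 10 subjects each: OJ vs VC at the 0.5 mg/day dose
x <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
y <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]

# Squared standard errors of each arm's mean (variances are not pooled)
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)
se_diff <- sqrt(se2_x + se2_y)

# Welch t statistic for the difference in means
t_stat <- (mean(x) - mean(y)) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_welch) * se_diff

# Reference result from t.test() for comparison
tt <- t.test(x, y)
print(c(by_hand = p_value, t_test = tt$p.value))
```
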
### Results of the t.tests:
```{r pressure, echo=FALSE}
db <- split(data, data$dose)
#loop over db to extract t.test values of interest.
results <- NULL
results <- data.frame(results)
for (i in 1:3) {
t_test <- t.test(len ~ supp, data = db[[i]])
results[i,1] <- t_test$p.value
results[i,2] <- t_test$conf.int[1]
results[i,3] <- t_test$conf.int[2]
results[i,4] <- t_test$estimate[1]
results[i,5] <- t_test$estimate[2]
}
colnames(results) <- c("p Value", "Lower CI", "Upper CI", "Est. OJ Mean", "Est. VC Mean")
results$Dose <- c("0.5", "1", "2")
# additional secondary data transformations to create subgroups based on supplement type.
db2 <- split(data, data$supp)
#function to extract t.test values of interest.
# Returns a one-row data frame: p-value, CI bounds and the two group means.
# (The unused `name` argument has been dropped; callers pass `t_obj` by name.)
extract_t_info <- function(t_obj) {
  data.frame(
    V1 = t_obj$p.value,
    V2 = t_obj$conf.int[1],
    V3 = t_obj$conf.int[2],
    V4 = unname(t_obj$estimate[1]),
    V5 = unname(t_obj$estimate[2])
  )
}
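# series of t.test calls on the subgroups; within each supplement,
# rows 1:10, 11:20 and 21:30 hold the 0.5, 1 and 2 mg/day doses respectively.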
results_OJ_51 <- t.test(db2$OJ$len[1:10], db2$OJ$len[11:20])
results_OJ_12 <- t.test(db2$OJ$len[11:20], db2$OJ$len[21:30])
results_OJ_52 <- t.test(db2$OJ$len[1:10], db2$OJ$len[21:30])
results_VC_51 <- t.test(db2$VC$len[1:10], db2$VC$len[11:20])
results_VC_12 <- t.test(db2$VC$len[11:20], db2$VC$len[21:30])
results_VC_52 <- t.test(db2$VC$len[1:10], db2$VC$len[21:30])
# extract t.test values of interest
results_OJ_51 <- extract_t_info(t_obj = results_OJ_51)
results_OJ_12 <- extract_t_info(t_obj = results_OJ_12)
results_OJ_52 <- extract_t_info(t_obj = results_OJ_52)
results_VC_51 <- extract_t_info(t_obj = results_VC_51)
results_VC_12 <- extract_t_info(t_obj = results_VC_12)
results_VC_52 <- extract_t_info(t_obj = results_VC_52)
# apply relevant colnames
colnames(results_OJ_51) <- c("p Value", "Lower CI", "Upper CI", "Est. 0.5 Mean", "Est. 1 Mean")
colnames(results_OJ_12) <- c("p Value", "Lower CI", "Upper CI", "Est. 1 Mean", "Est. 2 Mean")
colnames(results_OJ_52) <- c("p Value", "Lower CI", "Upper CI", "Est. 0.5 Mean", "Est. 2 Mean")
colnames(results_VC_51) <- c("p Value", "Lower CI", "Upper CI", "Est. 0.5 Mean", "Est. 1 Mean")
colnames(results_VC_12) <- c("p Value", "Lower CI", "Upper CI", "Est. 1 Mean", "Est. 2 Mean")
colnames(results_VC_52) <- c("p Value", "Lower CI", "Upper CI", "Est. 0.5 Mean", "Est. 2 Mean")
# print results
cat("Difference of means between supplement type by dose\n")
results
cat("Difference of means between 0.5 and 1 mg dose supplement = Orange Juice\n")
results_OJ_51
cat("Difference of means between 1 and 2 mg dose supplement = Orange Juice\n")
results_OJ_12
cat("Difference of means between 0.5 and 2 mg dose supplement = Orange Juice\n")
results_OJ_52
cat("Difference of means between 0.5 and 1 mg dose supplement = Ascorbic Acid\n")
results_VC_51
cat("Difference of means between 1 and 2 mg dose supplement = Ascorbic Acid\n")
results_VC_12
cat("Difference of means between 0.5 and 2 mg dose supplement = Ascorbic Acid\n")
results_VC_52
```
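
The dose comparisons in the chunk above pick each dose level by row position, which depends on the row order of `ToothGrowth`. As a hedged alternative, the sketch below selects rows by dose value and collects all six comparisons in one table. `compare_doses`, `dose_pairs` and `dose_results` are hypothetical names, not part of the original report.

```r
data("ToothGrowth")

# Hypothetical helper: Welch t-test between two dose levels of one supplement,
# selecting rows by value rather than by position.
compare_doses <- function(df, supplement, dose_a, dose_b) {
  x <- df$len[df$supp == supplement & df$dose == dose_a]
  y <- df$len[df$supp == supplement & df$dose == dose_b]
  tt <- t.test(x, y)
  data.frame(
    supp     = supplement,
    dose_a   = dose_a,
    dose_b   = dose_b,
    p_value  = tt$p.value,
    lower_ci = tt$conf.int[1],
    upper_ci = tt$conf.int[2],
    mean_a   = unname(tt$estimate[1]),
    mean_b   = unname(tt$estimate[2])
  )
}

# The three dose pairs compared in this report
dose_pairs <- list(c(0.5, 1), c(1, 2), c(0.5, 2))

# Run every pair for both supplements and stack the rows
dose_results <- do.call(rbind, lapply(c("OJ", "VC"), function(s) {
  do.call(rbind, lapply(dose_pairs, function(p) {
    compare_doses(ToothGrowth, s, p[1], p[2])
  }))
}))
print(dose_results)
```

Selecting by value keeps the comparison correct even if the rows are reordered or filtered beforehand.
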
### Conclusion:
1. Increasing the dose of the supplement, regardless of supplement type, leads to an increase in tooth growth.
2. Orange Juice leads to more tooth growth than Ascorbic Acid at the 0.5 and 1 mg/day doses.
3. There is no statistically significant difference in tooth growth between subjects administered Orange Juice and subjects administered Ascorbic Acid at the 2 mg/day dose.
***
The code for generating this report can be found on [GitHub](https://github.com/maximegalon5/Toothgrowth).