-
Notifications
You must be signed in to change notification settings - Fork 1
/
finalReport.Rmd
463 lines (367 loc) · 20.3 KB
/
finalReport.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
---
title: "Article ID: 8-5-2015 PS"
output:
html_document:
toc: true
toc_float: true
---
# Report Details
```{r}
articleID <- "8-5-2015_PS" # insert the article ID code here e.g., "10-3-2015_PS"
reportType <- "final" # specify whether this is the 'pilot' report or 'final' report
pilotNames <- "Kyle MacDonald" # insert the pilot's name here e.g., "Tom Hardwicke". If there are multiple cpilots enter both names in a character string e.g., "Tom Hardwicke, Bob Dylan"
copilotNames <- "Tom Hardwicke" # # insert the co-pilot's name here e.g., "Michael Frank". If there are multiple co-pilots enter both names in a character string e.g., "Tom Hardwicke, Bob Dylan"
pilotTTC <- 150 # insert the pilot's estimated time to complete (in minutes, fine to approximate) e.g., 120
copilotTTC <- 30 # insert the co-pilot's estimated time to complete (in minutes, fine to approximate) e.g., 120
pilotStartDate <- as.Date("06/13/18", format = "%m/%d/%y") # insert the pilot's start date in US format e.g., as.Date("01/25/18", format = "%m/%d/%y")
copilotStartDate <- as.Date("09/19/18", format = "%m/%d/%y") # insert the co-pilot's start date in US format e.g., as.Date("01/25/18", format = "%m/%d/%y")
completionDate <- as.Date("09/19/18", format = "%m/%d/%y") # copilot insert the date of final report completion (after any necessary rounds of author assistance) in US format e.g., as.Date("01/25/18", format = "%m/%d/%y")
```
------
#### Methods summary:
The goal of E3 was to compare how children's puppet choices were influenced by the type of prompt -- whether they wanted to "play with a puppet" or if they thought a pupper was "nicer." On each trial, children met two puppets -- one who was good at activating a toy (competent) and one who was not so good (incompetent). Then, children were asked to make a choice between the two puppets, and the type of question varied as a function of condition. In the play condition, children were asked, "Which one would you rather play with?" In the nicer condition, children were asked, "Which one is nicer?" In the nicer baseline condition, the introductory script was changed, but children were still asked, "Which one is nicer?" Each child only gave single choice response, which was coded after the task by a coder blind to condition.
------
#### Target outcomes:
For this article you should focus on the findings reported for experiment 3.
Specifically, you should attempt to reproduce all descriptive and inferential analyses reported in the text below and associated tables/figures:
> Results were coded from videotape by a coder blind to
the condition and the child’s final choice. If, in the coder’s
judgment, the puppets were not placed equidistant
from a child, that child would have been excluded from
analysis, but no children were excluded on these grounds.
One subject was dropped because of experimenter error,
2 were dropped because they left before the experiment
concluded, and 5 were dropped because of parental
interference. In addition, 10 children (2 in the play condition,
5 in the nicer condition, and 3 in the nicer baseline
control) were excluded from analysis because they failed
to respond in the first 30 s. Thus, the final sample included
16 subjects per condition.
> Figure 2 shows the results from this experiment. In the
play condition, 81.25% of children chose the competent
agent (13 subjects; CI = [68.75%, 100%]). Thus, this condition
replicated the findings of Experiment 2. By contrast,
children in the nicer condition tended to choose the less
competent agent. Of the 16 two-year-olds who made a
choice, only 31.25% (CI = [6.25%, 50.00%]) chose the competent
agent (5 subjects). That is, the subjects’ choice of
the competent agent dropped from 81.25% to 31.25%
when they were asked to judge which agent was nicer
(CI = [21.05%, 81.67%]; p < .05, Fisher exact test). Finally,
subjects’ performance in the nicer baseline condition suggests
that this difference between the play and nicer conditions
was not due to children’s baseline belief that less
competent agents are nicer. Whereas only 68.75% of subjects
(11 out of 16) chose the incompetent agent in the
nicer condition, 31.25% of subjects (5 out of 16) chose the
incompetent agent in the nicer baseline condition, a
decrease of 37.5% (95% CI = [6.17%, 71.03%]; p < .075,
Fisher exact test). This suggests that toddlers do not simply
assume that incompetent agents are sympathetic; rather,
they take into account the relative cost of agents’ actions.
When both puppets refused to help, the toddlers
might have inferred that the less competent agent was
nicer because they exonerated the less competent agent
from performing a costly action, because they judged the
more competent agent harshly for refusing to perform a
low-cost action, or both. Further research might shed
light on the precise inferences that underlie children’s
social evaluations in this paradigm. However, note that neither
agent was canonically nice: Both agents explicitly
refused to engage in a helpful action. If our subjects
understood “nice” only with respect to nice, helpful
behaviors, rather than with respect to internal motivations,
then they should have chosen a puppet at chance
or refused to answer. Instead, these 2-year-olds were able
to use differences in the agents’ costs to identify the nicer
of two unhelpful agents.
------
```{r global_options, include=FALSE}
# sets up some formatting options for the R Markdown document
knitr::opts_chunk$set(echo=TRUE, warning=FALSE, message=FALSE)
```
# Step 1: Load packages and prepare report object
```{r}
# load packages
library(tidyverse) # for data munging
library(knitr) # for kable table formating
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
library(ReproReports) # custom report functions
```
```{r}
# Prepare report object. This will be updated automatically by the reproCheck function each time values are compared
reportObject <- data.frame(dummyRow = TRUE, reportedValue = NA, obtainedValue = NA, valueType = NA, percentageError = NA, comparisonOutcome = NA, eyeballCheck = NA)
```
# Step 2: Load data
```{r}
d <- read_csv("data/JaraEttinger_Tenenbaum_Schulz_Results.csv")
```
# Step 3: Tidy data
Subset to just Experiment 3, which is the target of the check.
```{r}
d_e3 <- d %>% filter(Experiment == "Experiment 3")
```
How many participants in the dataset? From the Participants section:
> Sixty-six subjects (mean age = 2.48 years, SD = 114 days, range = 2.00–2.98 years) were recruited and tested at an urban children’s museum. Thirteen additional 2-year-olds were recruited but not included in the study because they declined to participate in a warm-up task, in which they were asked to choose between two stuffed animals.
Additional information about the analysis sample from the results section:
> One subject was dropped because of experimenter error,
2 were dropped because they left before the experiment
concluded, and 5 were dropped because of parental
interference. In addition, 10 children (2 in the play condition,
5 in the nicer condition, and 3 in the nicer baseline
control) were excluded from analysis because they failed
to respond in the first 30 s. Thus, the final sample included
16 subjects per condition.
```{r}
d_e3 %>% count(Status) %>% kable()
```
Filter out based on exclusionary criteria and check if we get 16 participants in each condition.
Note that there was some additional filtering information in the AnalyzeData.R script.
> 10 children did not make a choice: 3 in the control condition, five in the nicer condition, and two in the play condition.
```{r}
d_e3 %>%
filter(Status == "Data", Choice != "None") %>%
count(Condition) %>%
kable()
```
We can reproduce the reported number of participants in each condition. So we use this filtered dataset for
subsequent analyses.
```{r}
d_e3_analysis <- d_e3 %>% filter(Status == "Data", Choice != "None")
```
# Step 4: Run analysis
## Pre-processing
No pre-processing needed.
## Descriptive statistics
#### Compute choice proportions
> In the play condition, 81.25% of children chose the competent
agent (13 subjects; CI = [68.75%, 100%]). By contrast,
children in the nicer condition tended to choose the less
competent agent. Of the 16 two-year-olds who made a
choice, only 31.25% (CI = [6.25%, 50.00%]) chose the competent
agent (5 subjects).
```{r}
n_subs <- 16
choice_results_n <- d_e3_analysis %>%
count(Condition, Choice) %>%
mutate(n_total = n_subs,
prop = (n / n_total) * 100)
choice_results_n %>% kable()
```
#### Check the descriptive choice results
Note that we use the `valueType = 'n'` to report the number of children
who made a given choice.
```{r}
# play condition
comp_choice_play <- choice_results_n$n[5]
reportObject <- reproCheck(reportedValue = '13', obtainedValue = comp_choice_play, valueType = 'n')
# nice condition
comp_choice_nice <- choice_results_n$n[3]
reportObject <- reproCheck(reportedValue = '5', obtainedValue = comp_choice_nice, valueType = 'n')
# control condition
comp_choice_control <- choice_results_n$n[1]
reportObject <- reproCheck(reportedValue = '11', obtainedValue = comp_choice_control, valueType = 'n')
```
They also reported the following descriptives in a comparison of choosing the incompetent agent in nicer condition and the nicer baseline (control).
> Whereas only 68.75% of sub- jects (11 out of 16) chose the incompetent agent in the nicer condition, 31.25% of subjects (5 out of 16) chose the incompetent agent in the nicer baseline condition...
```{r}
# nice condition choosing incompetent
incomp_nice <- choice_results_n$n[4]
reportObject <- reproCheck(reportedValue = '11', obtainedValue = incomp_nice, valueType = 'n')
# control condition choosing incompetent
incomp_control <- choice_results_n$n[2]
reportObject <- reproCheck(reportedValue = '5', obtainedValue = incomp_control, valueType = 'n')
```
#### Bootstrap 95% CIs on choice behavior
```{r}
# library(tidyboot)
# nboot <- 1000
#
# choice_results_table <- d_e3_analysis %>%
# mutate(numeric_choice = ifelse(Choice == "Competent", 1, 0),
# logical_choice = as.logical(numeric_choice)) %>%
# group_by(Condition) %>%
# tidyboot_mean(logical_choice, nboot = nboot)
#
# choice_results_table %>%
# mutate(empirical_perc = empirical_stat * 100,
# ci_lower_perc = ci_lower * 100,
# ci_upper_perc = ci_upper * 100) %>%
# print.data.frame() %>%
# kable()
```
We aren't getting enough precision to compare values, so we compute CIs using the `boot` package (how the authors originally computed CIs).
```{r}
library(boot)
ProportionFunction<-function(data,indices){return(sum(data[indices])/length(data[indices]))}
nboot <- 1000
# Experiment 3 Play condition
exp3p<-c(rep(1,13),rep(0,3))
resultsexp3p <- boot(data=exp3p, statistic=ProportionFunction, R=nboot)
ci_play <- boot.ci(resultsexp3p,type="basic") # 62.50%-100%
ci_play_lower <- ci_play$basic[4] * 100
ci_play_upper <- ci_play$basic[5] * 100
```
```{r}
# Experiment 3 Nicer condition
exp3n <- c(rep(1,5),rep(0,11))
resultsexp3n <- boot(data=exp3n, statistic=ProportionFunction, R=nboot)
ci_nice <- boot.ci(resultsexp3n,type="basic") # 6.25%-50.00%
ci_nice_lower <- ci_nice$basic[4] * 100
ci_nice_upper <- ci_nice$basic[5] * 100
```
```{r}
# Experiment 3 control condition
exp3c <- c(rep(1,11),rep(0,5))
resultsexp3c <- boot(data=exp3c, statistic=ProportionFunction, R=nboot)
ci_control <- boot.ci(resultsexp3c,type="basic") # 50.00%-93.75%
ci_control_lower <- ci_control$basic[4] * 100
ci_control_upper <- ci_control$basic[5] * 100
```
#### Check the 95% CIs
```{r}
# play condition
reportObject <- reproCheck(reportedValue = '68.75', obtainedValue = ci_play_lower, valueType = 'ci')
reportObject <- reproCheck(reportedValue = '100', obtainedValue = ci_play_upper, valueType = 'ci')
# nice condition
reportObject <- reproCheck(reportedValue = '6.25', obtainedValue = ci_nice_lower, valueType = 'ci')
reportObject <- reproCheck(reportedValue = '50', obtainedValue = ci_nice_upper, valueType = 'ci')
```
KM note: the authors did not report the CI on the nicer baseline condition.
## Inferential statistics
The authors reported two key inferential analyses, both using Fisher exact tests.
> That is, the subjects’ choice of the competent agent dropped from 81.25% to 31.25% when they were asked to judge which agent was nicer (CI = [21.05%, 81.67%]; p < .05, Fisher exact test)
```{r}
# Compare Play and Nicer conditions in Experiment 3
# Fisher's exact test
Exp3ChoiceSwitch <-
matrix(c(13, 5, 3, 11),
nrow = 2,
dimnames =
list(c("Play", "Nicer"),
c("Competent", "Incompetent")))
Exp3ChoiceSwitch
play_nice_fisher <- fisher.test(Exp3ChoiceSwitch)
# Bootstrap the difference
Exp3NP<-data.frame(
choice=c(rep(1,13),rep(0,3),rep(1,5),rep(0,11)),
condition=c(rep("Play",16),rep("Nicer",16))
)
IncreasePerc<-function(x,id){
dat<-x[id,]
d1<-filter(dat,condition=="Play")
d2<-filter(dat,condition=="Nicer")
res<-(sum(d1$choice)/nrow(d1))-((sum(d2$choice)/nrow(d2)))
return(res)
}
NicerDiff<-boot(Exp3NP,statistic=IncreasePerc, R=500000)
play_nice_diff <- boot.ci(NicerDiff,type="basic") # 21.05%-81.67%
```
#### Check the values for the fisher's exact test for the Play vs. Nice condition difference.
```{r}
play_nice_diff_ci_low <- play_nice_diff$basic[4] %>% round(4) * 100
play_nice_diff_ci_upper <- play_nice_diff$basic[5] %>% round(4) * 100
play_nice_diff_ci_p <- play_nice_fisher$p.value
# ci-upper
reportObject <- reproCheck(reportedValue = '21.05', obtainedValue = play_nice_diff_ci_low, valueType = 'ci')
# ci-lower
reportObject <- reproCheck(reportedValue = '81.67', obtainedValue = play_nice_diff_ci_upper, valueType = 'ci')
# p-value
reportObject <- reproCheck(reportedValue = 'p < .05', obtainedValue = play_nice_diff_ci_p, valueType = 'p',
eyeballCheck = TRUE)
```
> Whereas only 68.75% of sub- jects (11 out of 16) chose the incompetent agent in the nicer condition, 31.25% of subjects (5 out of 16) chose the incompetent agent in the nicer baseline condition, a decrease of 37.5% (95% CI = [6.17%, 71.03%]; p < .075, Fisher exact test).
```{r}
# Compare Nicer and Control conditions in Experiment 3
# Fisher's exact test
Exp3NiceBias <-
matrix(c(5, 11, 11, 5),
nrow = 2,
dimnames =
list(c("Nicer", "Control"),
c("Competent", "Incompetent")))
Exp3NiceBias
nice_control_fisher <- fisher.test(Exp3NiceBias)
# Bootstrap the difference
Exp3NB<-data.frame(
choice=c(rep(1,11),rep(0,5),rep(1,5),rep(0,11)),
condition=c(rep("Baseline",16),rep("Nicer",16))
)
IncreasePerc<-function(x,id){
dat<-x[id,]
d1<-filter(dat,condition=="Baseline")
d2<-filter(dat,condition=="Nicer")
res<-(sum(d1$choice)/nrow(d1))-((sum(d2$choice)/nrow(d2)))
return(res)
}
set.seed(59283) # so we can replicate the exact numbers
NicerDiff <- boot(Exp3NB,statistic=IncreasePerc, R=500000)
nice_control_diff <- boot.ci(NicerDiff,type="basic") # 6.17%-71.03%
```
#### Check the values for the fisher's exact test for the Control vs. Nice condition difference.
```{r}
nice_control_diff_ci_low <- nice_control_diff$basic[4] %>% round(4) * 100
nice_control_diff_ci_upper <- nice_control_diff$basic[5] %>% round(4) * 100
nice_control_diff_ci_p <- nice_control_fisher$p.value
# ci-upper
reportObject <- reproCheck(reportedValue = '6.17', obtainedValue = nice_control_diff_ci_low, valueType = 'ci')
# ci-lower
reportObject <- reproCheck(reportedValue = '71.0', obtainedValue = nice_control_diff_ci_upper, valueType = 'ci')
# p-value
reportObject <- reproCheck(reportedValue = '.075', obtainedValue = nice_control_diff_ci_p, valueType = 'p')
```
KM note: they interpret a p-value of p < .075 as a significant effect, which is a little strange. Morever, the p-value via Fisher exact test is p = .0756, which actualy doesn't match their threshold of reporting. Discussed this issue with Tom, and we decided it was strange but given the small difference this should only be considered a minor numerical error. To report this, I entered the reported value as their threshold of .075.
# Step 5: Conclusion
This reproducibiity check was a success. The data, codebook, and R code provided by the original authors were quite helpful. I was able to reproduce the descriptive results and most of the inferential statistics. I did find four minor errors in the inferential statistics, which I think were likely caused by typographical errors in the paper writing process.
```{r}
Author_Assistance = FALSE # was author assistance provided? (if so, enter TRUE)
Insufficient_Information_Errors <- 0 # how many discrete insufficient information issues did you encounter?
# Assess the causal locus (discrete reproducibility issues) of any reproducibility errors. Note that there doesn't necessarily have to be a one-to-one correspondance between discrete reproducibility issues and reproducibility errors. For example, it could be that the original article neglects to mention that a Greenhouse-Geisser correct was applied to ANOVA outcomes. This might result in multiple reproducibility errors, but there is a single causal locus (discrete reproducibility issue).
locus_typo <- NA # how many discrete issues did you encounter that related to typographical errors?
locus_specification <- NA # how many discrete issues did you encounter that related to incomplete, incorrect, or unclear specification of the original analyses?
locus_analysis <- NA # how many discrete issues did you encounter that related to errors in the authors' original analyses?
locus_data <- NA # how many discrete issues did you encounter that related to errors in the data files shared by the authors?
locus_unidentified <- NA # how many discrete issues were there for which you could not identify the cause
# How many of the above issues were resolved through author assistance?
locus_typo_resolved <- NA # how many discrete issues did you encounter that related to typographical errors?
locus_specification_resolved <- NA # how many discrete issues did you encounter that related to incomplete, incorrect, or unclear specification of the original analyses?
locus_analysis_resolved <- NA # how many discrete issues did you encounter that related to errors in the authors' original analyses?
locus_data_resolved <- NA # how many discrete issues did you encounter that related to errors in the data files shared by the authors?
locus_unidentified_resolved <- NA # how many discrete issues were there for which you could not identify the cause
Affects_Conclusion <- NA # Do any reproducibility issues encounter appear to affect the conclusions made in the original article? TRUE, FALSE, or NA. This is a subjective judgement, but you should taking into account multiple factors, such as the presence/absence of decision errors, the number of target outcomes that could not be reproduced, the type of outcomes that could or could not be reproduced, the difference in magnitude of effect sizes, and the predictions of the specific hypothesis under scrutiny.
```
```{r}
reportObject <- reportObject %>%
filter(dummyRow == FALSE) %>% # remove the dummy row
select(-dummyRow) %>% # remove dummy row designation
mutate(articleID = articleID) %>% # add variables to report
select(articleID, everything()) # make articleID first column
# decide on final outcome
if(any(!(reportObject$comparisonOutcome %in% c("MATCH", "MINOR_ERROR"))) | Insufficient_Information_Errors > 0){
finalOutcome <- "Failure without author assistance"
if(Author_Assistance == T){
finalOutcome <- "Failure despite author assistance"
}
}else{
finalOutcome <- "Success without author assistance"
if(Author_Assistance == T){
finalOutcome <- "Success with author assistance"
}
}
# collate report extra details
reportExtras <- data.frame(articleID, pilotNames, copilotNames, pilotTTC, copilotTTC, pilotStartDate, copilotStartDate, completionDate, Author_Assistance, finalOutcome, Insufficient_Information_Errors, locus_typo, locus_specification, locus_analysis, locus_data, locus_unidentified, locus_typo_resolved, locus_specification_resolved, locus_analysis_resolved, locus_data_resolved, locus_unidentified_resolved)
# save report objects
if(reportType == "pilot"){
write_csv(reportObject, "pilotReportDetailed.csv")
write_csv(reportExtras, "pilotReportExtras.csv")
}
if(reportType == "final"){
write_csv(reportObject, "finalReportDetailed.csv")
write_csv(reportExtras, "finalReportExtras.csv")
}
```
# Session information
```{r session_info, include=TRUE, echo=TRUE, results='markup'}
devtools::session_info()
```