The Hidden Universe of Data Analysis
Current Supplementary Materials
Executive Report - describing the full study
Nate Breznau
Eike Mark Rinke
Alexander Wuttke
Hung H.V. Nguyen
Participant co-researchers:
Muna Adem, Jule Adriaans, Amalia Alvarez-Benjumea, Henrik Andersen, Daniel Auer, Flavio Azevedo, Oke Bahnsen, Dave Balzer, Paul C. Bauer, Gerrit Bauer, Markus Baumann, Sharon Baute, Verena Benoit, Julian Bernauer, Carl Berning, Anna Berthold, Felix S. Bethke, Thomas Biegert, Katharina Blinzler, Johannes N. Blumenberg, Licia Bobzien, Andrea Bohman, Thijs Bol, Amie Bostic, Zuzanna Brzozowska, Katharina Burgdorf, Kaspar Burger, Kathrin Busch, Juan Carlos-Castillo, Nathan Chan, Pablo Christmann, Roxanne Connelly, Christian Czymara, Elena Damian, Alejandro Ecker, Achim Edelmann, Maureen A. Eger, Simon Ellerbrock, Anna Forke, Andrea Forster, Chris Gaasendam, Konstantin Gavras, Vernon Gayle, Theresa Gessler, Timo Gnambs, Amélie Godefroidt, Alexander Greinert, Max Grömping, Martin Groß, Stefan Gruber, Tobias Gummer, Andreas Hadjar, Jan Paul Heisig, Sebastian Hellmeier, Stefanie Heyne, Magdalena Hirsch, Mikael Hjerm, Oshrat Hochman, Jan H. Höffler, Andreas Hövermann, Sophia Hunger, Christian Hunkler, Nora Huth, Zsofia Ignacz, Laura Jacobs, Jannes Jacobsen, Bastian Jaeger, Sebastian Jungkunz, Nils Jungmann, Mathias Kauff, Manuel Kleinert, Julia Klinger, Jan-Philipp Kolb, Marta Kołczyńska, John Kuk, Katharina Kunißen, Dafina Kurti, Philipp Lersch, Lea-Maria Löbel, Philipp Lutscher, Matthias Mader, Joan Madia, Natalia Malancu, Luis Maldonado, Helge Marahrens, Nicole Martin, Paul Martinez, Jochen Mayerl, Oscar J. Mayorga, Patricia McManus, Kyle McWagner, Cecil Meeusen, Daniel Meierrieks, Jonathan Mellon, Friedolin Merhout, Samuel Merk, Daniel Meyer, Jonathan Mijs, Cristobal Moya, Marcel Neunhoeffer, Daniel Nüst, Olav Nygård, Fabian Ochsenfeld, Gunnar Otte, Anna Pechenkina, Christopher Prosser, Louis Raes, Kevin Ralston, Miguel Ramos, Frank Reichert, Leticia Rettore Micheli, Arne Roets, Jonathan Rogers, Guido Ropers, Robin Samuel, Gregor Sand, Constanza Sanhueza Petrarca, Ariela Schachter, Merlin Schaeffer, David Schieferdecker, Elmar Schlueter, Katja Schmidt, Regine Schmidt, Alexander Schmidt-Catran, Claudia Schmiedeberg, Jürgen Schneider, Martijn Schoonvelde, Julia Schulte-Cloos, Sandy Schumann, Reinhard Schunck, Jürgen Schupp, Julian Seuring, Henning Silber, Willem Sleegers, Nico Sonntag, Alexander Staudt, Nadia Steiber, Nils Steiner, Sebastian Sternberg, Dieter Stiers, Dragana Stojmenovska, Nora Storz, Erich Striessnig, Anne-Kathrin Stroppe, Janna Teltemann, Andrey Tibajev, Brian Tung, Giacomo Vagni, Jasper Van Assche, Meta van der Linden, Jolanda van der Noll, Arno Van Hootegem, Stefan Vogtenhuber, Bogdan Voicu, Fieke Wagemans, Nadja Wehl, Hannah Werner, Brenton Wiernik, Fabian Winter, Christof Wolf, Nan Zhang, Conrad Ziller, Björn Zakula, Stefan Zins and Tomasz Żółtak

This is the repository for preparation and analysis of data obtained from the Crowdsourced Replication Initiative (Breznau, Rinke and Wuttke et al. 2018) and used as the basis for the paper *Observing Many Researchers Using the Same Data and Hypothesis Reveals a Hidden Universe of Uncertainty*.
Recently, studies across scientific disciplines in which many researchers independently tested the same hypothesis using the same data have reported tremendous variation in results. This variability must derive from differences in each research process; therefore, observing those differences should reduce the implied uncertainty. We tested this assumption in a controlled study involving 73 researchers/teams. Taking all research steps as predictors explains at most 2.6% of the total variance in effect sizes and 10% of the deviance in subjective conclusions. Researchers' expertise, prior beliefs and attitudes explain even less. Ultimately, each model was unique, and as a whole this study provides evidence of a vast universe of research-design variability that is normally hidden from view in the presentation, consumption, and perhaps even creation of scientific results.
The workflow is provided in a literate programming format, R Markdown notebooks (`.Rmd`), and split across a number of files as described below.

Next to the `.Rmd` files, there are also `.html` files of the same name. The latter contain HTML renderings of the notebooks with the created figures and tables, so that non-R users may view the workflow results more easily in any regular browser. For example, the file `01_CRI_Descriptives.Rmd` has a corresponding `01_CRI_Descriptives.html` file in the same folder for easy viewing without the need to run any R code.
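For R users who want to regenerate an HTML file after editing a notebook, a minimal example (assuming the `rmarkdown` package is installed; the file path is illustrative):

```r
# Re-render a notebook; the .html file is written next to the .Rmd file.
rmarkdown::render("code/01_CRI_Descriptives.Rmd")
```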
Paths in the notebooks are handled with the `here` package, and all paths are relative to the project's root directory (where this README.md file is located).
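For example, a minimal sketch of how `here` resolves such paths (the notebooks set this up for you):

```r
library(here)

# here() builds paths from the project root, so the same call works
# no matter which sub-folder a notebook runs from:
here("data", "cri.csv")
#> e.g. "/home/user/CRI/data/cri.csv"
```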
You can open an interactive environment to explore and execute the analysis yourself, based on Binder (Project Jupyter, 2018).

The runtime environment created for the Binder uses an MRAN snapshot of 2020-03-29 (see file `.binder/runtime.txt`) and installs all required R packages listed in the file `.binder/install.R`.
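For orientation, a snapshot-pinned install script of this kind typically looks like the sketch below; the package list shown is illustrative only, and the authoritative versions are in `.binder/runtime.txt` and `.binder/install.R`:

```r
# Illustrative sketch -- see .binder/install.R for the real package list.
# Pointing the repository at an MRAN snapshot freezes package versions
# at their state on 2020-03-29:
options(repos = c(CRAN = "https://mran.microsoft.com/snapshot/2020-03-29"))
install.packages(c("here", "rmarkdown"))
```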
The workflow includes a Shiny app that allows users to interact with the results using specification curves.

Note that belief in the hypothesis is reverse-coded in our analysis and in the Shiny app. This has no effect on the results and thus requires no special interpretation. Users should simply be aware that the variable is correctly coded in the original survey data but gets reverse-coded in our workflow.
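For illustration, a reverse coding of this kind amounts to a one-line flip; the variable name below is hypothetical, and the actual recoding happens in the data preparation notebooks:

```r
# Flip a 1-5 agreement scale so that 1 becomes 5 and 5 becomes 1
# (hypothetical variable name):
cri$belief_hypothesis <- 6 - cri$belief_hypothesis
```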
We have not added a codebook for subjective conclusions (see below), but users can access the text of each team's subjective conclusion in the folder `expansion_reports` (where available, as not all teams provided one).
We collected the code from 73 teams and cleaned it for public sharing. This involved qualitative identification of model specifications, ensuring replicability, extracting Average Marginal Effects (AMEs) and redacting any identifying features. The resulting code is compiled by software type in the sub-folders of this project, ordered by team ID number (in the folder `team_code`, with sub-folders `team_code_SPSS`, `team_code_Stata`, `team_code_Mplus` and `team_code_R`). The code in the `team_code_R` folder imports the results from all other code to compile a final joined dataset of effect sizes and confidence-interval measures.
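To give a sense of the AME-extraction step, here is a hedged sketch for an R-based team using the `margins` package; the model, data and variable names are invented for illustration:

```r
library(margins)

# Hypothetical team model: support for government action regressed on
# the immigrant stock percentage plus a control variable.
m <- glm(support ~ foreign_pct + age, data = issp, family = binomial)

# Average Marginal Effect of the immigration measure, with its CI:
ame <- summary(margins(m, variables = "foreign_pct"))
ame[, c("AME", "lower", "upper")]
```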
Users should be aware that the main data files include team zero, which contains the results and model specifications from the study by Brady and Finnigan (2014) that provided the launching point for the CRI; team zero is dropped from our main analyses but provides a point of comparison.
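A reproduction meant to match the paper's main analyses therefore needs to exclude team zero, along the lines of the following sketch (the column name is an assumption):

```r
# Drop the Brady and Finnigan (2014) benchmark before re-running
# the main analyses ("team" is an assumed column name):
cri_main <- subset(cri, team != 0)
```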
Prior to our main analyses we import data from the Participant Survey, including subjective voting on model quality and the voting during the post-result deliberation. The code for these steps (files 001-003) is contained in the folder `data_prep`. It is not necessary to run these scripts, as their output is already saved in the `data` folder.
Our primary analyses and results are in the `code` folder. Many of the results in this folder depend on data preparation done in the `data_prep` folder.

All of the following are located in the main folder or sub-folders of the folder `code`.
| Filename | Location | Description | Output |
|---|---|---|---|
| `001_CRI_Prep_Subj_Votes.Rmd` | `data_prep` | Compile peer ranking of models | FigS4 |
| `002_CRI_Data_Prep.Rmd` | `data_prep` | Primary data cleaning and merging; measurement of researcher characteristics | TblS1; TblS3; FigS3; FigS3_fit_stats |
| `003_CRI_Multiverse_Simulation.Rmd` | `data_prep` | Sets up multiverse data | |
| `01_CRI_Descriptives.Rmd` | `code` | Descriptive statistics; codebook of 107 model design steps | FigS5; FigS10 |
| `02_CRI_Common_Specifications.Rmd` | `code` | Identifying (dis)similarities across models | TblS4 |
| `03_CRI_Spec_Analysis.Rmd` | `code` | Plotting specification curves | Fig1; FigS6; FigS7; FigS8; FigS9 |
| `04_CRI_Main_Analyses.Rmd` | `code` | Main regression models explaining outcome variance within and between teams | Fig3; TblS5; TblS6 (see bottom of TblS5); TblS7 |
| `05_CRI_Main_Analyses_Variance_Function.Rmd` | `code` | Variance function regressions to explain variation in variance by team | Fig2; FigS11; FigS12; FigS13; TblS11 |
| `06_CRI_Multiverse.Rmd` | `code` | Function to test all possible combinations of submitted model specifications to explain variance | TblS8; TblS10 |
| `07_CRI_DVspecific_Analyses.Rmd` | `code` | Re-running main models separately by dependent variable (6 ISSP survey questions) | TblS9 |
In the file `cri.csv`, the different variables used to identify whether the teams concluded "support", "reject" or "not testable" for the hypothesis are potentially confusing. Here is the list of variables and their definitions. Please note that the data contain team "0", which is the original Brady and Finnigan study. Therefore, users will find that the number of cases for models and teams is higher than that used in the study. We include these here for comparison.
| Var name | Definition | Unit of analysis |
|---|---|---|
| `Hresult` | The team's conclusion separated by hypothesis test. For teams that insisted they tested two hypotheses, not one, the n is higher to account for the 14 teams that submitted two different results, one per hypothesis | team-hypothesis-level, n = 89 |
| `Hsupport` | If the team concluded overall support of the hypothesis, 1 = yes | team-level, n = 74 |
| `Hreject` | If the team concluded overall rejection of the hypothesis, 1 = yes | team-level, n = 74 |
| `Hnotest` | If the team concluded that the hypothesis was not testable with these data, 1 = yes | team-level, n = 74 |
| `Hmixed` | If the team broke protocol and insisted that there were two different hypotheses being tested, and found different results for each of these hypotheses, 1 = yes | team-level, n = 74 |
| `Hsupport_stock` | If the team concluded that their models testing for an effect of the stock of immigrants in a country led to support of the hypothesis, 1 = yes | team-level, n = 74 |
| `Hsreject_stock` | If the team concluded that their models testing for an effect of the stock of immigrants in a country led to rejection of the hypothesis, 1 = yes | team-level, n = 74 |
| `Hnotest_stock` | If the team concluded that their models testing for an effect of the stock of immigrants in a country were not sufficient to test the hypothesis, 1 = yes | team-level, n = 74 |
| `Hsupport_net` | If the team concluded that their models testing for an effect of the net migration (flow) of immigrants in a country led to support of the hypothesis, 1 = yes | team-level, n = 74 |
| `Hsreject_net` | If the team concluded that their models testing for an effect of the net migration (flow) of immigrants in a country led to rejection of the hypothesis, 1 = yes | team-level, n = 74 |
| `Hnotest_net` | If the team concluded that their models testing for an effect of the net migration (flow) of immigrants in a country were not sufficient to test the hypothesis, 1 = yes | team-level, n = 74 |
| `Hsup`, `Hrej`, `Hno` | The specific hypothesis test conclusion by the team for those models (it is the variable `Hresult` converted into three dummies), 1 = yes for each dummy | team-hypothesis-level, n = 89 |
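The relationship between `Hresult` and the three dummies can be expressed as in the sketch below; the level labels are assumptions, and the actual recoding lives in the data preparation notebooks:

```r
# Hresult converted into three mutually exclusive dummies
# (level labels assumed for illustration):
cri$Hsup <- as.integer(cri$Hresult == "support")
cri$Hrej <- as.integer(cri$Hresult == "reject")
cri$Hno  <- as.integer(cri$Hresult == "not testable")
```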
The following script runs all notebook files in order, to check that there are no code issues:

```r
source("all.R")
```
The data preparation code is in the sub-folder `data_prep`. After the data preparation files have been run, all data files ready for the data analysis are in the `data` folder. There are numerous data files because the different participants' code often requires individual special files to run properly. The data files needed to reproduce all of the data analysis are:
| Filename | Description | Source |
|---|---|---|
| **MAIN FILES** | Used in main analyses 01-07 | |
| `cri.csv` | Main data analysis file, model & team levels. All specifications coded by the PIs, team test results and researcher characteristics in numeric format | Worked up in `code/data_prep` |
| `cri_str.csv` | A string-format-only version of `cri.csv` | Worked up in `code/data_prep` |
| `cri_team.csv` | A version of `cri_str.csv` aggregated to team-level means (N = 89 because 16 teams conducted independent hypothesis tests by 'stock' and 'flow' immigration measures) | Worked up in `code/data_prep` |
| `popdf_out.Rdata` | The peer-review/deliberation scoring of model specifications as ranked by all participants, excepting non-response | Generated in sub-folder `CRI/data_prep` |
| **SUB-FILES** | Used in preparation of data or app | |
| `Research Design Votes.xlsx` | Based on participants' pre-registered designs, plus a cursory review of all research designs. Not a fully accurate portrayal of final research designs because (a) the broad range of specifications is not reported in basic research designs and (b) participants often deviated from their proposed designs, if only slightly | A copy of the actual template (a Google Sheet) used to create the peer review voting system in the Participant Survey |
| `cri_shiny.csv` | The model-level data needed to run the Shiny app | Generated in `code/data_prep` |
| `cri_shiny_team.csv` | The team-level data needed to run the Shiny app | Generated in `code/data_prep` |
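As a quick smoke test that these inputs are in place, the main files can be loaded like this (object names are illustrative):

```r
library(here)

cri      <- read.csv(here("data", "cri.csv"))       # model & team levels
cri_team <- read.csv(here("data", "cri_team.csv"))  # team-level aggregates
load(here("data", "popdf_out.Rdata"))               # deliberation scores
```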
Install `repo2docker` and then run:

```
repo2docker --editable .
```