title | author | output | ||||||
---|---|---|---|---|---|---|---|---|
Adopting open source practices<br/>for better science |
Pierce Edmiston
<pedmiston@wisc.edu>
[sapir.psych.wisc.edu](http://sapir.psych.wisc.edu)
[github.com/pedmiston](https://github.com/pedmiston)
|
|
Open source practices that make for more reproducible science:
- Version control
- Dynamic documents
- Building from source
Conclusion: It's worth it!
- I want my research to be reproducible.
- I want to attract collaborators.
???
I want my research to be reproducible by other people and by myself. This means no undocumented steps! Document things in code for maximum reproducibility.
I want the bar for getting involved in one of my projects to be as low as possible.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science.
???
My interest in reproducibility is really more of an obsession or a paranoia. I worry that my own research is not going to replicate.
- Simmons, Nelson, & Simonsohn. (2011). False-positive psychology. Psychological Science.
- Gelman & Loken. (2013). The garden of forking paths. Unpublished manuscript.
- Palmeri. (2016). Psychology is in crisis over whether it's in crisis. WIRED.
- Ioannidis. (2005). Why most published research findings are false. PLOS Medicine.
- Edmiston. (now). Publications are not answers to research questions. Unpublished thoughts.
???
The first reason is that there is a lot of flexibility in how we analyze behavioral data. The second is that we tend to keep looking until we find something. Of course, plenty of people disagree about whether the state of psychology research is as bad as some say. And these problems with publication bias and the file drawer effect extend well beyond psychology. Personally, I think the culprit is the idea that publications are definitive answers to research questions.
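The cost of analytic flexibility can be illustrated with a small simulation. This is a minimal sketch, not taken from any of the papers cited above: it assumes two unrelated outcome measures, no true effect on either, a known unit variance (so a z-test suffices), and arbitrary sample sizes. The function names `z_test_p` and `simulate` are made up for this example.

```python
import random
from statistics import NormalDist, mean


def z_test_p(sample_a, sample_b):
    """Two-sided p-value for a difference in means, assuming unit variance
    and equal group sizes (an assumption made to keep the sketch short)."""
    n = len(sample_a)
    se = (2 / n) ** 0.5  # standard error of the difference when sigma = 1
    z = (mean(sample_a) - mean(sample_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(n_experiments=2000, n_per_group=20, alpha=0.05, seed=1):
    """Return false-positive rates for a planned and a flexible analysis."""
    rng = random.Random(seed)
    planned = flexible = 0
    for _ in range(n_experiments):
        p_values = []
        for _measure in range(2):  # two outcome measures, both pure noise
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(z_test_p(a, b))
        planned += p_values[0] < alpha      # test only the planned measure
        flexible += min(p_values) < alpha   # report whichever measure "worked"
    return planned / n_experiments, flexible / n_experiments


planned_rate, flexible_rate = simulate()
print(f"planned: {planned_rate:.3f}, flexible: {flexible_rate:.3f}")
```

With two independent tests at alpha = .05, the chance that at least one comes out significant is 1 - .95² ≈ .10, so the flexible rate should land near twice the planned rate.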
Munafò et al. (2017). A manifesto for reproducible science. Nature.
- Blind data analysts to experiment conditions.
- Improve statistics education (adapted for web).
- Hire methodological consultants.
- Seek collaboration for scalability.
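Blinding analysts to experiment conditions can itself be documented in code. The sketch below is one possible approach, not a method described by Munafò et al.: it swaps the real condition labels for neutral codes and returns the key separately, so the key can be withheld until the analysis is final. The function name, column names, and example rows are all hypothetical.

```python
import random


def blind_conditions(rows, column="condition", seed=None):
    """Return (blinded_rows, key), replacing condition labels with codes.

    The key maps each real label to its neutral code and should be stored
    apart from the blinded data.
    """
    rng = random.Random(seed)
    labels = sorted({row[column] for row in rows})
    codes = [f"group_{i + 1}" for i in range(len(labels))]
    rng.shuffle(codes)  # so that code order does not reveal label order
    key = dict(zip(labels, codes))
    blinded = [{**row, column: key[row[column]]} for row in rows]
    return blinded, key


# Placeholder rows for illustration only, not real data.
rows = [
    {"subj": 1, "condition": "control", "score": 10},
    {"subj": 2, "condition": "treatment", "score": 12},
    {"subj": 3, "condition": "control", "score": 9},
]
blinded, key = blind_conditions(rows, seed=42)
print(blinded)
```

The analyst works only with `blinded`; the original `rows` are left unchanged.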
Compare these two goals of reproducibility in science and in open source:
- Fellow researchers should be able to reproduce my work.
- Anyone should be able to use and contribute to this project.
???
Open source is the answer because reproducibility is a loftier goal in open source communities than in the scientific community: anyone, not just fellow researchers, should be able to use and contribute to the project.
class: center
- git
- mercurial
- subversion
- gitless
class: center
???
There are a number of ways to think about version control. One is as a safety net: no matter what you do, you can always roll back to an earlier state. This is the power of the "undo" button. But that doesn't really get at why I think version control is such a powerful tool.

A better analogy is a tool for climbing. The picture shows tools rock climbers call "nuts": you jam one into a crack in the rock and then use it as a hold. This is how I think about version control. It definitely keeps you safer, but it also lets you climb in places you otherwise couldn't.
class: center
class: center
(Diagram: a commit graph with a main line of four commits, one branch that splits off the second commit and merges back at the fourth, and a second branch that splits off the third commit.)
.pull-left[
# key
parent_repo/
├── child_repo_1 # submodule 1
└── child_repo_2 # submodule 2
# example 1
talk or publication/
├── research_project_1
└── research_project_2
]
.pull-right[
# example 2
meta-analysis/
├── research_project_1
├── ...
└── research_project_n
# example 3
big-project/
├── *web-app* -> also installed on web server
├── *lab-exp* -> also installed on lab computers
├── *r-pkg* -> installed by anyone who wants the data
├── conference
└── journal
]
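Git records submodules like those in the trees above in a `.gitmodules` file at the root of the parent repository. The sketch below only renders text in that file's format to show what gets tracked. The helper `gitmodules_text` and the URLs are hypothetical, and in practice `git submodule add <url> <path>` writes this file for you.

```python
def gitmodules_text(submodules):
    """Render a mapping of path -> URL in the .gitmodules file format."""
    sections = []
    for path, url in submodules.items():
        sections.append(
            f'[submodule "{path}"]\n'
            f"\tpath = {path}\n"
            f"\turl = {url}\n"
        )
    return "".join(sections)


# Hypothetical URLs; the directory names follow "example 1" above.
text = gitmodules_text({
    "research_project_1": "https://example.com/lab/research_project_1.git",
    "research_project_2": "https://example.com/lab/research_project_2.git",
})
print(text)
```

Each child repository keeps its own history; the parent only records where the child lives and which commit it is pinned to.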
(Version control only really works on plaintext files.)
Once you're working in plaintext, you can do lots of cool things.
- Full power of VCS (merge, blame, etc).
- Use free and open source tools (Unix).
- Write dynamic documents.
- Philosophy: DRY, Literate Programming
- Tools: Sphinx, Jupyter, Knitr, Pandoc
.pull-left[
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
Hunt & Thomas. (1999). The pragmatic programmer. ]
.pull-right[
- Intermingle prose and code for better understanding of the program.
- The explanation of a program does not need to resemble the program structure.
Knuth, Donald E. (1983). Literate programming. ]
Sphinx: Python documentation generation.
Jupyter: Web-based, language-agnostic lab notebook.
Knitr: Elegant, flexible and fast dynamic report generation with R.
Participants in condition A outperformed participants in
condition B, `report_model_results(mod, param = "condition")`.
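The idea behind the inline call above is that statistics are computed and formatted by code, never typed by hand. The original helper is an R function whose implementation is not shown here, so the following is only a rough Python analogue under assumed inputs: a dictionary of fitted parameters with placeholder numbers that are not real results.

```python
def report_model_results(results, param):
    """Format one parameter's estimate as an inline statistic string."""
    est = results[param]
    return (
        f"b = {est['estimate']:.2f}, "
        f"SE = {est['se']:.2f}, "
        f"p = {est['p']:.3f}"
    )


# Placeholder values standing in for a fitted model; not real results.
mod = {"condition": {"estimate": 1.234, "se": 0.41, "p": 0.004}}

sentence = (
    "Participants in condition A outperformed participants in "
    f"condition B, {report_model_results(mod, param='condition')}."
)
print(sentence)
```

If the data or the model change, rebuilding the document updates the sentence, so the prose and the analysis cannot drift apart.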
- Handouts
- Homework
- Supplemental materials
- Conference proceedings
- Journal papers
Can you build the published paper without the original data?
- Open source tools
- No undocumented steps
- Centralized control
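"No undocumented steps" can be made concrete by declaring every step and its dependencies in one place and letting a program decide the order, which is what tools like `make` do. The sketch below is a toy version of that idea using `graphlib` from the Python standard library (3.9+). The target names and recipes are hypothetical, and a real project would run actual scripts instead of returning strings.

```python
from graphlib import TopologicalSorter

# Each target lists the targets it depends on (all names are hypothetical).
dependencies = {
    "raw_data": [],
    "clean_data": ["raw_data"],
    "model": ["clean_data"],
    "figures": ["model"],
    "paper": ["model", "figures"],
}


def build_order(deps):
    """Return the targets in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def build(deps, recipes):
    """Run each target's recipe in dependency order and collect the log."""
    return [recipes[target]() for target in build_order(deps)]


# Stand-in recipes; a real recipe would call a script for that step.
recipes = {name: (lambda name=name: f"built {name}") for name in dependencies}

for line in build(dependencies, recipes):
    print(line)
```

Because the whole pipeline lives in one declared graph, anyone with the source can rebuild the paper with a single command.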
.pull-left[
- python
- R (S)
- Octave (Matlab)
- Enthought
- Anaconda ]
.pull-right[
- Amazon Web Services
- OpenStack
- ansible
- terraform ]
- Kaggle leaderboards
- Totems game
McKiernan et al. (2016). How open science helps researchers succeed. eLife.
.pull-left[
- chimps (Whiten et al., 1999)
- whales (Garland et al., 2011)
- crows (Hunt & Gray, 2003) ]
.pull-right[
- ratchet effect (Tomasello et al., 1993)
- evolutionary process (Basalla, 1988)
- transmission fidelity (Lewis & Laland, 2012) ]
Pierce Edmiston (pedmiston@wisc.edu)
github.com/pedmiston/reproducible-research
bit.ly/reproducible-research-refs
Open source practices that make for more reproducible science:
- Version control
- Dynamic documents
- Building from source
Conclusion: It's worth it!