---
title: "Publishing computational research -- A review of infrastructures for reproducible and transparent scholarly communication"
preprint: true
author:
- "Markus Konkol (m.konkol [at] uni-muenster [dot] de), Daniel Nüst, Laura Goulier (Institute for Geoinformatics, University\\ of\\ Münster, Münster,\\ Germany)"
abstract: >
Funding agencies increasingly ask applicants to include data and software management plans in their proposals. In addition, the author guidelines of scientific journals and conferences more often include a statement on data availability, and some reviewers reject unreproducible submissions. This trend towards open science increases the pressure on authors to provide access to the source code and data underlying the computational results in their scientific papers. Still, publishing reproducible articles is a demanding task and not achieved simply by providing access to code scripts and data files. Consequently, several projects develop solutions to support the publication of executable analyses alongside articles, considering the needs of the aforementioned stakeholders. The key contribution of this paper is a review of applications addressing the issue of publishing executable computational research results. We compare the approaches across properties relevant for the involved stakeholders, e.g., provided features and deployment options, and also critically discuss trends and limitations. The review can help publishers decide which system to integrate into their submission process, editors recommend tools to researchers, and authors of scientific papers adhere to reproducibility principles.
header-includes: >
\setlength{\columnsep}{18pt}
\usepackage{url}
\usepackage{breakurl}
\PassOptionsToPackage{hyperindex,breaklinks}{hyperref}
\usepackage{caption}
\captionsetup{width=5in}
bibliography: mybibfile.bib
output:
pdf_document:
fig_caption: true
keep_tex: true
pandoc_args: ["-V", "classoption=onecolumn"]
---
# Introduction
Many scientific articles report on results based on computations, e.g., a statistical analysis implemented in R. Publishing the underlying source code and data to adhere to open reproducible research (ORR) principles (i.e., public access to the code and data behind the reported results [@stodden2016enhancing]) seems simple. However, several studies concluded that papers rarely link to these materials [@stagge2019assessing; @nust2018reproducible]. Moreover, due to technical challenges, e.g., capturing the original computational environment of the analyst, even accessible materials do not guarantee reproducibility [@chen2019open; @konkol2019computational]. These issues have several implications [@morin2012shining]: It is difficult (often even impossible) to find errors within the analysis, but publishing erroneous papers can damage an author’s reputation [@herndon2014does] as well as trust in science [@national2019reproducibility]. Also, reviewers cannot verify the results, because they need to understand the analysis just by reading the text [@bailey2016facilitating]. Furthermore, other researchers cannot build upon existing work but have to collect data and implement the analysis from scratch [@powers2019open]. Finally, libraries cannot preserve the materials for future use or education. These issues are also to society’s disadvantage as it cannot benefit fully from publicly funded research [@piwowar2007sharing].
Fortunately, funding bodies, e.g., Horizon 2020 (https://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-dissemination_en.htm, last access for this and the following URLs: 20th Dec 19), increasingly consider data and software management plans as part of grant proposals. Accordingly, more editors add a section on code and data availability to their author guidelines (see, e.g., @nuest2019agile; @hrynaszkiewicz2019publishers), and reviewers consider reproducibility in their decision process [@stark2018before]. Nevertheless, these cultural and systematic developments [@munafo2017manifesto] alone do not solve the plethora of reproducibility issues. Authors often do not know how to fulfill the requirements of funding bodies and journals, such as the TOP guidelines [@Nosek1422]. It is also important to consider that researchers’ programming expertise ranges from trained research software engineers to self-taught beginners.
For these reasons, more and more projects work on solutions to support the publication of executable supplements. The key contribution of this paper is a review of applications that support the publication of executable computational research results for transparent and reproducible research. This review can be used as decision support by publishers who want to comply with reproducibility principles, editors and programme committees planning to adopt reproducibility requirements in their author guidelines and to integrate code evaluation into their review process [@eglen2019], applicants in the process of creating data and software management plans for their funding proposals, and authors searching for tools to disseminate their work in a convincing, sustainable, and effective manner. We also consider aspects related to preservation that are relevant for librarians dealing with long-term accessibility of research materials. Based on the survey, we critically discuss trends and limitations in the area of reproducible research infrastructures.
*Scope:* This work focuses on applications that support the publication of research results based on executable source code scripts (e.g., R or Python) and the underlying data. Hence, we did not consider workflow systems (e.g., Taverna [@wolstencroft2013taverna]) or online repositories (e.g., Open Science Framework, https://osf.io/). Also, this paper does not discuss how to work reproducibly, since this is already covered in the literature (e.g., @rule2019ten, @sandve2013ten, @greenbaum2017structuring, @markowetz2015five). The review is a snapshot of the highly dynamic area of publishing infrastructures. Hence, some of the collected information might become outdated, e.g., an application might extend its set of functionalities or be discontinued. Still, reviewing the current state of the landscape to reflect on available options is helpful for publishers, editors, reviewers, authors, and librarians. All collected data are available in the supplements (see Data and Software Availability).
The paper is structured as follows: First, we survey fundamental concepts and tools underlying the applications. We then introduce each application and the comparison criteria, followed by the actual comparison. The paper concludes with a discussion of our observations, trends, and limitations.
# Background
## Packaging computational research reproducibly
The traditional research article alone is not sufficient to communicate a complex computational analysis [@donoho2010invitation]. To address this issue, computational reproducibility concerns the publication of the code and data underlying a research paper. This form of publishing research allows reviewers to verify the reported results and readers to reuse the materials [@barba2018terminologies]. To achieve that, all materials are needed, including not only the data and code but also the computational environment. A basic concept for such a collection is the research compendium, a “mechanism that combines text, data, and auxiliary software into a distributable and executable unit” [@gentleman2007statistical]. The concept was extended by a description and snapshot of the software environment using containerization, resulting in the executable research compendium [@nust2017opening]. Containerization and virtualization are mechanisms to capture the full software stack of a computational environment, including all software dependencies, in a portable snapshot [@Perkel_2019]. In contrast to containerization, virtualization also includes the operating system kernel. Despite this difference, both approaches have proven to improve transparency and reproducibility [@Boettiger2015; @howe2012]. One containerization technology is Docker, which is based on so-called Dockerfiles, human- and machine-readable recipes to create the image of a virtual environment [@Boettiger2015]. These recipes add an additional layer of documentation, making Docker a popular tool in the area of computational reproducibility [@nust2019containerit].
A research compendium should contain an entry point, i.e., a main file that needs to be executed to run the entire analysis. One option to realize such entry points is the concept of literate programming, an approach for interweaving source code and text in one notebook [@knuth1984literate]. Two popular realizations of such notebooks are Jupyter Notebooks [@kluyver2016jupyter] and R Markdown [@baumer2014r]. Combining source code and text in one document is advantageous over other approaches, such as keeping code scripts and the article separate, which might result in inconsistencies between the two. A further advantage is the possibility to execute the analysis with a single click, so-called one-click reproduce [@edzer2013]. This form of making computational results available lowers the barrier for others to reproduce the results and thus increases trust in and the transparency of computer-based research.
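As an illustration only (not taken from any of the applications reviewed below), a minimal Dockerfile for an R-based research compendium could look like the following sketch; the base image, package list, and file names are hypothetical, and rendering the R Markdown notebook serves as the single entry point:

```dockerfile
# Hypothetical recipe: pin the base image to the R version of the original environment
FROM rocker/verse:3.6.1

# Install the R packages the analysis depends on
RUN install2.r --error lme4 broom

# Copy the research compendium (text, code, data) into the image
COPY . /compendium
WORKDIR /compendium

# Entry point: render the notebook, i.e., "one-click reproduce"
CMD ["Rscript", "-e", "rmarkdown::render('analysis.Rmd')"]
```

Tools such as containerit [@nust2019containerit] aim to generate such recipes automatically from an existing R session or script.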
## Licensing and citation
Appropriate licensing of research components is crucial yet complex, as copyright laws differ between component types, e.g., data, software, and text [@stodden2008legal]. This is particularly important when it comes to reusing research components, which is one of the main goals of research compendia. A further level of complexity emerges if research compendia include, for example, parts of the data and the code of several already published papers. A typical use case is reusing a specific published version of code, while the same code is further developed and stored on a public repository (e.g., GitLab). Besides conscious handling of licenses and copyrights, building on top of the work of others requires adequate citations. This can be supported by connecting the research components with the help of metadata including permanent and global identifiers, e.g., DOIs [@stodden2016enhancing], which can also be used for data [@Park_2019] and software [@Fenner_2016].
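As an illustration of such machine-readable links, a research compendium could ship a small metadata file; the sketch below uses terms from the CodeMeta vocabulary (https://codemeta.github.io/), and all values are hypothetical placeholders:

```json
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "Analysis code for <paper title>",
  "identifier": "https://doi.org/10.5281/zenodo.XXXXXXX",
  "license": "https://spdx.org/licenses/Apache-2.0",
  "codeRepository": "https://gitlab.com/<user>/<repository>"
}
```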
## Ethical and technical issues
Frequently mentioned issues related to computational reproducibility concern sensitive data and large data files. To tackle the issue of sensitive data, a first step would be to anonymize the data. Another option is to involve a trustworthy authority which ensures that the results in the article can be achieved based on the used data [@perignon2019certify]. In this case, public access is not required. To ensure that these solutions are not exploited, authors should argue why hiding or providing synthetic data is required and reviewers can then decide whether the reasons are valid. A further solution is the concept of cloud-based data enclaves, which provide data access only to authorized persons [@foster2018research]. Such approaches for access control could be connected with the applications discussed in this paper.
Large data files, e.g., global remote sensing datasets, quickly reach several petabytes. However, a large number of papers are based on datasets that can be stored on public and free data repositories, such as Open Science Framework (file size limit only for individual files, https://help.osf.io/hc/en-us/articles/360019737894-FAQs#what-is-the-individual-file-size-limit) or Zenodo (max. 50GB by default, extension possible, https://help.zenodo.org/whatsnew/). Further limiting factors are long computation times and the need for specialized hardware, such as high-performance computing clusters [@ahn2013overcoming].
# Methods
To obtain an overview of what the applications supporting the publication of reproducible analyses provide, as well as of current trends and limitations, we compared them across a set of criteria.
## Materials
To ensure that the stakeholders receive current recommendations, we considered an application as part of our analysis if **(i)** it was actively maintained at the time the data for this paper were collected (5th-13th Dec 2019), **(ii)** it supported publishing executable code and data that can be inspected and reused, and **(iii)** the application was explicitly connected to the publication process. Hence, we did not consider technologies that alone cannot support the publication process of code and data because further infrastructure is needed (e.g., Docker), or applications that only provide access to data or code (e.g., Zenodo). We identified the applications through literature research and through discussions at conferences and workshops.
## Applications
Based on these criteria, ten applications were selected for the review. In the following, we briefly introduce them in alphabetical order.
Researchers with a repository (e.g., on GitHub/Lab or Zenodo) that includes, e.g., a Jupyter Notebook can use **Binder** (https://mybinder.org/) to make it available in an executable environment [@jupyter2018binder]. Readers can launch the analysis from a Binder-ready repository and inspect the workflow in a browser. Binder creates a containerized environment from the repository based on configuration files.
In **Code Ocean** [@clyburne2019computational], authors can create so-called “capsules” which contain code, data, and the computational environment, including the version of the operating system and dependencies. Readers can, while studying the article, execute and inspect the analysis in a separate window below the online version of the article or on Code Ocean’s website.
The **eLife Reproducible Document Stack** (RDS, https://elifesciences.org/labs/b521cf4d/reproducible-document-stack-towards-a-scalable-solution-for-reproducible-articles) enables authors to publish executable documents based on Stencila (https://stenci.la/), an open-source editor for articles. The executable document, which contains the whole narrative and executable code snippets, is not only a supplement but the actual scientific article.
**Galaxy** [@goecks2010galaxy] is a web-based application for developing computational analyses without programming expertise. Scientists can upload and analyze data by using Jupyter Notebooks [@gruning2017jupyter].
**Gigantum** (https://gigantum.com/) builds on top of Git and packages code, data, the computational environment, and the work history into a Git repository. Gigantum is composed of a client application for creating and executing analyses locally, and a cloud-based infrastructure for sharing computations and collaborating with peers.
**Manuscripts** (https://www.manuscripts.io/about/) is an online tool for writing executable documents collaboratively based on the concept of literate programming, but featuring a “What you see is what you get” user interface. The runtime environment of the author is, however, not considered.
**o2r** [@nust2017opening] addresses publishers who want to extend their existing infrastructure with a reproducibility service during the process of paper submission [@nust_daniel2018]. Authors can also create interactive figures, allowing reviewers and readers to check the robustness of the results, e.g., by changing model parameters using a slider [@konkol2019creating].
**REANA** [@vsimko2019reana; @chen2019open] provides a formal specification to guide authors through the process of capturing input datasets, code, and the computational environment. Based on this structure, and after manually creating some configuration files, REANA provides a set of command line interface (CLI) commands to run large analyses on a remote REANA cloud.
**ReproZip** [@steeves2018using; @chirigati2016reprozip] provides a set of CLI commands for encapsulating data, code, and the computational environment automatically. Users can execute the resulting bundle on a server provided by ReproZip [@rampin2018reproserver] or locally on different computer systems.
With **Whole Tale** [@brinckman2019computing], authors can create so-called “Tales” that combine narrative, data, code, and the computational environment. Readers can inspect the materials and execute the analysis in the original environment.
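As an illustrative sketch of Binder’s configuration-file approach, a minimal R-based repository might contain the two files shown below (the package list and pinned snapshot date are hypothetical); readers could then launch an executable session via a URL of the form https://mybinder.org/v2/gh/<user>/<repository>/<branch>:

```
# install.R -- R packages to be installed into the Binder image (hypothetical list)
install.packages(c("lme4", "broom"))

# runtime.txt -- pins R and package versions to a dated snapshot
r-2019-12-01
```

Other configuration formats are supported as well, e.g., environment.yml or requirements.txt for Python environments.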
## Rationale for the comparison criteria
We identified the comparison criteria considering the needs of the stakeholders of the scholarly publication process described by @nust2017opening, i.e., those of publishers, editors, authors, reviewers, readers, and librarians. There is some overlap regarding stakeholder needs; for example, publishers as well as authors aim at attracting readers and providing a convenient reading experience for reviewers.
**Publishers** need to know whether they can integrate the application into their existing infrastructure. The applications can be either made available as open source tools for own hosting or as a service hosted by the provider. If the tool is available for free under an open license, publishers only have to consider costs for maintaining the infrastructure. Moreover, publishers gain full control and can customize the interface or processes according to their own specifications. In case of a paid service, publishers can take advantage of not being responsible for the maintenance. A further criterion relevant for publishers is the development stage of the application, i.e., if it was already used in published articles.
**Editors** of journals need to ensure that a service for publishing reproducible research is consistent with the tools the authors typically use and common practices in their scientific field. For example, journals regularly receiving submissions containing Jupyter Notebooks should not choose a service that supports only R Markdown. This aspect might also affect the author and reviewer guidelines, for which the editors are responsible. A further relevant aspect is the addressed research area. Some applications might address specific fields and thus provide features tailored to domain-specific requirements.
**Authors** need to submit research materials efficiently. Hence, we checked how authors can upload their files and which submission formats and programming languages are supported. We also considered which license submitted materials receive, since this is a frequently mentioned aspect of papers discussing reproducibility guidelines [@stodden2016enhancing]. Although licensing is relevant for all stakeholders, authors are particularly responsible for taking care of it. We also checked whether the applications can deal with sensitive data.
For **readers** and **reviewers**, open reproducible research comes with several benefits, such as advanced search capabilities, re-running workflows, inspecting results in detail (i.e., looking at code or data files), modifying parameter settings, and reusing the data or the analysis for their own work [@konkol2019depth]. We thus checked whether the tools provide any specific support for such investigations of the research materials.
**Librarians** are tasked with preserving research materials. We checked how the materials are stored and shared, and if modifying or deleting them after publication is possible.
Based on these comparison criteria, we investigated the project websites, the actual applications, GitHub/Lab repositories, scientific articles (if available), and blog posts. Since most of the sources were not scientific articles, the supplements contain screenshots and URLs to show where we found the corresponding information.
# Results
In the following, we compare the applications considering the needs of the stakeholders. *Table 1* summarizes aspects relevant for publishers, i.e., if self-hosting is possible, which license is assigned to the application, whether it is already in use or in a beta stage, and the funding source. Of the ten applications, eight allow self-hosting. Code Ocean and Gigantum provide the service themselves. eLife RDS, o2r, and REANA (marked by \* in *Table 1*) require self-hosting, since no free online deployments exist. Three applications are released under the *BSD-3-Clause License*, three under *MIT* (for Gigantum, this license applies to the local tool and not to the cloud service), one under *Apache 2.0*, one under the *CPAL 1.0*, and one under the *Academic Free License 3.0*. These licenses allow operators to host their own service as well as to modify the software according to their individual needs and styles. It also means, however, that they have to maintain the infrastructure and provide the required technological resources as well as personnel. In contrast, Code Ocean’s infrastructure and Gigantum’s cloud service are provided in exchange for payment. Of the reviewed applications, four are rather experimental and six are already in use, as shown by the example papers with workflows based on the corresponding application. Seven applications receive funding from public or private science foundations. Code Ocean and Gigantum offer a commercial service.
*Table 2* summarizes aspects relevant for editors and authors, i.e., the scientific domains, supported submission formats, upload mechanisms, and license terms. Although none of the investigated applications is strictly tied to a specific domain, we observed that some of them focus on particular areas. For example, Galaxy provides a rich set of features tailored to use cases in the life sciences. Other applications originate from a particular domain, e.g., eLife’s RDS comes from the life sciences whereas REANA focuses on particle physics. Of the ten applications, nine support literate programming approaches by default, e.g., Jupyter Notebooks or R Markdown. Manuscripts supports Markdown, but also code execution via embedded Jupyter Notebooks.
| |Self-hosting|License|Stage|Funding|
|--|--|---|---|-----|
| Binder | yes | BSD 3-Clause "New" or "Revised" | in use by @nust2018reproducible | Moore Foundation, Google Cloud Platform |
| Code Ocean | no | Commercial application | in use by @chitre2018editorial | commercial |
| eLife RDS | yes* | MIT | in use by @lewis2018replication | Howard Hughes Medic. Inst, Max Planck Society, Wellcome Trust, Knut and Alice Wallenberg Foundation |
| Galaxy | yes | Academic Free 3.0 | in use by @ide2015language | National Institutes of Health, National Science Foundation, Penn State, Johns Hopkins, and the Pennsylvania Department of Public Health |
| Gigantum | no | MIT | beta | commercial |
| Manuscripts | yes | CPAL-1.0 | beta | no information available |
| o2r | yes* | Apache 2.0 | beta | DFG (German Funding Agency) |
| REANA | yes* | MIT | in use by @prelipcean2019physics | CERN, National Science Foundation |
| ReproZip | yes | BSD 3-Clause "New" or "Revised" | in use by @chirigati2016data | Moore and Sloan Foundation |
| Whole Tale | yes | BSD 3-Clause "New" or "Revised" | beta | National Science Foundation |
Table: Overview of properties relevant for publishers, i.e., if self-hosting is possible (* denotes only self-hosting is possible), which license the applications have, the stage of the project (in use or beta), and the funding source.
Seven applications are extensible and provide the possibility to configure the application to support further submission formats or programming languages. Except for Code Ocean, which also supports MATLAB and Stata, all applications support only non-proprietary programming languages. For making code and data available on the platform, five applications provide file upload. Five applications provide the possibility to upload materials via an external cloud or repository, e.g., Zenodo. However, uploading materials might be disadvantageous for papers based on large data files. For these cases, eLife’s RDS (based on Stencila), REANA, and ReproZip allow local usage. Researchers can also work locally with Gigantum, but then need to synchronize with the online service to access all features. Despite the importance of licensing, we could not find information on copyright for research materials in four applications. Whole Tale and Gigantum only allow open licenses, whereas Code Ocean, Galaxy, and o2r encourage them. eLife assigns an open license to the article text only.
| |Research area|Submission formats/ Program. languages|Upload|Copyright|
|--|--|----|----|----|
| Binder | all | R Markdown, Jupyter Notebooks, extensible | via URL/DOI from Git(Hub/Lab), Gist, Zenodo, Figshare, Dataverse | no information found |
| Code Ocean | all | R Markdown, Jupyter Notebooks, C/C++, Fortran, Java, Lua, MATLAB, Stata, extensible | File upload, via URL from Git repository | self-determined, MIT for code/ CC0 for data encouraged |
| eLife RDS | all, focus on life sciences | R Markdown, Jupyter Notebooks, Markdown, Excel, Word, LaTeX, JATS, extensible | created locally using Stencila | CC-BY for text, for code/data not discussed |
| Galaxy | all, focus on life sciences | Jupyter Notebooks, extensible | File upload, FTP, SRA | encourage open license for software |
| Gigantum | all | R Markdown, Jupyter Notebooks | Synchronization | self-determined but has to be open |
| Manuscripts | all | Markdown, Word, LaTeX, JATS, R, Julia, Python | File upload | no information found |
| o2r | all, focus on geosciences | R Markdown | File upload, ownCloud | self-determined but open is encouraged |
| REANA | all, focus on particle physics | Jupyter Notebooks, extensible | created locally | no information found |
| ReproZip | all | Jupyter Notebooks, extensible | created locally | no information found |
| Whole Tale | all | R Markdown, Jupyter Notebooks, extensible | File upload, URL/DOI from DataOne/ Dataverse, Materials Data Facility | self-determined but has to be open |
Table: Overview of aspects relevant for editors and authors, i.e., the addressed research area, which submission formats are supported, how authors can upload materials, and copyright.
*Table 3* summarizes aspects relevant for reviewers and readers. Of the ten applications, five provide a keyword-based search for papers whereas five do not provide any search feature. o2r provides a spatiotemporal search combined with thematic properties, such as the libraries used in the code. Nine applications provide tools for inspecting code and data, six of them through their own user interface (UI) and three by embedding a programming environment (e.g., JupyterLab, RStudio). Though REANA does not provide supportive tools for inspection, the materials can be viewed when stored on public repositories, e.g., GitLab. Nine applications provide tools for downloading materials. Projects created with REANA can be downloaded if stored on public repositories, which already provide a download functionality. Eight applications allow readers to execute the analysis in the browser on a remote server. Gigantum provides a UI running locally; REANA projects are executed via the CLI in a remote REANA cloud. Each application allows manipulating the code and rerunning it with new parameter values. Most commonly, users can directly manipulate the code in the browser (six applications provide this option) or locally (Gigantum). In REANA, users can pass new parameter values via the CLI, in ReproZip via the CLI or via input fields using ReproServer. The o2r platform allows authors to configure UI widgets giving reviewers/readers the chance to interactively manipulate parameter values, e.g., by using a slider to change a model parameter within a certain range.
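As an illustration of such CLI-based manipulation, a ReproZip session could roughly look like the following sketch; the script name, file names, and the input identifier are hypothetical (the actual identifiers are listed by `reprounzip showfiles`):

```bash
# On the author's machine: trace the analysis to capture code, data, and environment
reprozip trace Rscript analysis.R
# Pack everything that was traced into a single, self-contained bundle
reprozip pack analysis.rpz

# On another machine: unpack into a Docker container and re-execute
reprounzip docker setup analysis.rpz ./analysis
reprounzip docker run ./analysis

# Substitute an input file before re-running, e.g., with modified parameters
reprounzip docker upload ./analysis new-parameters.csv:parameters_csv
reprounzip docker run ./analysis
```

The same bundle can alternatively be uploaded to ReproServer [@rampin2018reproserver] and re-executed in the browser.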
| |Searching|Inspection|Download|Execution|Manipulation|
|----|----|------|---|------|-----|
| Binder | no support | within UI of JupyterLab in browser | via UI | within UI in browser | manually within code in browser |
| Code Ocean | keyword-based | below article, or in UI of Code Ocean | via UI | within UI in browser | manually within code in browser |
| eLife RDS | keyword-based | within article in browser | via UI | within UI in browser | manually within code in browser |
| Galaxy | keyword-based | within UI of JupyterLab in browser | via UI | within UI in browser | manually within code in browser |
| Gigantum | no support | within UI of local installation | via UI | within UI of local installation | within UI of local installation |
| Manuscripts | no support | within UI of Manuscripts | via UI | within UI in browser | manually within code in browser |
| o2r | spatiotemporal and keyword-based search | within UI of o2r | via UI | within UI in browser | using UI widgets |
| REANA | no support | no support | no support | via CLI in remote REANA cloud | manually via CLI |
| ReproZip | no support | within UI of ReproServer | via UI | locally via CLI, within UI in browser | manually via CLI/input fields in browser |
| Whole Tale | keyword-based | within UI of JupyterLab/ RStudio in browser | via UI | within UI in browser or locally | manually within code in browser |
Table: Overview of features relevant for reviewers and readers, i.e., searching for papers and materials, inspecting code and data, downloading materials, executing the analysis, and manipulating the code.
*Table 4* addresses libraries and other institutions with a mandate to preserve and provide access to research outputs. It includes information on how the research materials are stored and shared, and whether modifying or deleting content once published is possible. Five applications provide storage, though it remains unclear whether they run the servers themselves or rely on third-party services, and what kind of backup and archiving is implemented. Seven applications give hosts the option to store research materials independently, e.g., on the publisher’s infrastructure. The freely available instance of Binder, MyBinder.org (https://mybinder.org/), stores Docker images temporarily, but beyond that, no storage is provided. Whole Tale and o2r use existing long-term preservation services, e.g., Zenodo and DataOne. Regarding the possibility to modify or delete materials once published, we assigned “possible” if there is any way to do so. In Binder, REANA, and ReproZip, modifying/deleting content is possible if the research materials are stored on GitHub/Lab, but not when stored on Zenodo. The same is true for Galaxy, Gigantum, and Manuscripts, which allow users to edit/delete contents stored in the cloud. Code Ocean and Whole Tale assign DOIs to published contents, making it impossible to edit these after publication. The same applies to o2r, but only if the materials are archived. Finally, in eLife’s RDS, the article is composed of text and code; deleting it is thus equivalent to withdrawing a paper. All in all, seven applications allow modifying published materials. However, this issue is mitigated when researchers “go the extra mile” and also publish their materials in long-term repositories, such as Zenodo. One exception is Code Ocean, which allows modifications (no deletions) but assigns a new DOI to the modified content. Finally, authors need a way to share reproducible results in their paper. This is possible via a URL/DOI to the application (eight applications provide this possibility) or a URL to an online repository (two).
| | Storing | Modify/Delete after publication | Sharing |
|------------- |------------------- |--------------------------------- |-------------------------- |
| Binder | by host | possible | URL to Binder instance |
| Code Ocean | provided | not possible | URL/DOI to Code Ocean |
| eLife RDS | by host | not possible | URL/DOI to eLife |
| Galaxy | provided, by host | possible | URL to Galaxy |
| Gigantum | provided | possible | URL to Gigantum |
| Manuscripts | provided, by host | possible | URL to Manuscripts |
| o2r | by host, Zenodo | possible | URL to o2r |
| REANA | by host | possible | URL to online repository |
| ReproZip | provided, by host | possible | URL to ReproServer |
| Whole Tale | DataOne | not possible | DOI to DataOne |
Table: Overview of properties relevant for long-term preservation, i.e., how the research materials are stored, if they can be modified/deleted after publication, and how they can be shared in articles.
# Discussion
Several projects develop applications for publishing computational research. One might think the applications, since they all strive for the same overall goal, resemble each other. However, the overview in this paper (see *Tables 1-4*) shows that the applications address different issues and needs. This increases the chances for stakeholders to find a suitable application for their individual requirements.
## Needs of stakeholders
**Publishers:** A critical decision is whether publishers want to host an infrastructure themselves or engage a provider. Applications exist for both approaches, though the majority of them allow self-hosting. All self-hosting solutions have an open license, enabling operators to create customized versions of the platform and of the peer review process. A further advantage is the mitigation of risks regarding vendor lock-in or reliance on grant-based projects that expire at some point.
Nevertheless, it remains unclear what costs publishers have to expect when hosting an infrastructure. The final costs strongly depend on the number of views and execution attempts, workflow sizes, and manipulation options. These parameters differ between use cases and could be the basis for future research, e.g., on stress tests. Usage metrics of existing publications might provide first insights for estimating the required resources. The Binder instance MyBinder.org provides an initial estimate regarding costs (https://mybinder.org/v2/gh/jupyterhub/binder-billing/master?urlpath=lab/tree/analyze_data.ipynb). However, further data on infrastructure costs from the other services would help to calculate costs more accurately, albeit this transparency is only realistic for non-profit projects.
Working with applications that are already in use and those in a beta stage can both have advantages and disadvantages. While applications already in use provide initial evidence that they work, it might take more effort to make adjustments in order to fit the publisher’s infrastructure. In contrast, applications in a beta stage can adjust their feature set and consider new contributions without worrying about running instances and backwards compatibility, but the deployment of the applications might reveal new issues.
**Editors and authors:** The research area does not narrow down the number of options. Although some applications come from specific domains, e.g., the life sciences, none of them is restricted to a specific field. Regarding submission formats, there is a trend towards literate programming approaches. Most applications support either Jupyter Notebooks or R Markdown, both of which have proven to support reproducibility [@Gr_ning_2018]. However, some journals and publishers have particular requirements, e.g., they rely on LaTeX. Since transformations to other document types are often cumbersome and shifting author requirements can be a lengthy process, it might be easier to have reproducible documents as a supplement, potentially for a transition period until executable documents are widely accepted. Nevertheless, eLife’s RDS shows that a scientific article combining executable code with narrative is possible today and comes with advantages with respect to communicating scientific results, e.g., studying text and analysis in parallel while also being able to manipulate the analysis.
A critical issue is that not all applications explicitly handle the copyright of the shared materials. Those that do, fortunately, either require or encourage open licenses. Licensing is important to enable reusability and is thus a recommendation frequently mentioned in papers discussing reproducibility guidelines [@Nosek1422; @stodden2008legal]. Therefore, the platforms should inform users about licenses, e.g., by referring to existing advisory resources (e.g., https://choosealicense.com/), and ideally require open licenses.
A further limitation of the applications is that the anonymity of the authors is not guaranteed during the review process. All applications require an account for creating reproducible results, and the name of the creator is usually visible, making double-blind review impossible. However, access to code and data is particularly important for reviewers, since they decide on accepting or rejecting a submission. One solution might be to create an anonymous version of the materials, as is possible with the Open Science Framework (https://help.osf.io/hc/en-us/articles/360019930333-Create-a-View-only-Link-for-a-Project), or to adopt an open peer-review process.
**Reviewers and readers:** Being able to reproduce computational results in a paper is a clear benefit. However, open reproducible research comes with a number of further incentives with respect to finding and inspecting papers [@munafo2017manifesto]. Most search tools provided by the applications do not take full advantage of the information contained in code and data files, e.g., spatiotemporal properties. Instead, they either provide a keyword-based search or no search at all. For inspecting materials, most solutions either provide their own UI or integrate a development environment, e.g., JupyterLab, RStudio. In both cases, users can directly access, manipulate, and reuse the code. However, readers might still need to understand complex code scripts. Moreover, identifying specific parameters buried in the code and finding out how to change these can be a daunting task. The concept of nano-publications [@kuhn2016decentralized] or bindings [@konkol2019creating] might help to solve these issues. A further need in this context is a UI for comparing original and manipulated figures since differences in the figure after changing parameters might be difficult to spot. Most applications do not provide any support for substituting research components, e.g., the input datasets. This might be due to the plethora of complex interoperability issues with respect to data formats or column names in tabular data. Only ReproZip [@chirigati2016reprozip] and o2r [@konkol2019depth] provide basic means to substitute input datasets, yet they require users to ensure compatibility.
**Librarians:** The state of the research materials is an essential issue when it comes to publication. While some applications fix the state of the research materials by assigning a DOI and archiving a snapshot, others allow changing and deleting them. This is a disadvantage with respect to reproducibility since verifiability and accessibility are no longer guaranteed. In addition, if self-hosting is not possible, the computational analysis of an article will be executable only as long as the project and its infrastructure exist. This dependence is a crucial aspect with respect to archiving. A further dependence is the underlying technology. Without Docker, a Dockerfile can no longer be executed, but it is still readable and provides important machine- and human-readable information on how to run the analysis. The same is true for source code scripts, which are plain text files and can thus be opened using any editor. These examples demonstrate the importance of using open and text-based instead of proprietary and binary file formats in science. Due to these issues, researchers should consider archiving the research materials on platforms such as Zenodo, in addition to publishing an executable version (which should be the same version) using one of the applications.
## Limitations
This work is subject to a number of limitations. The scope of the paper is narrow and does not cover all applications that support the publication of computational research (e.g., workflow systems such as Taverna). In addition, the information in this review might become outdated quickly, but having a structured overview can still help the involved stakeholders to decide on a technology. Finally, the properties we investigated in this survey are certainly not complete. Still, stakeholders requiring more information can use the overview as a starting point for further research.
# Conclusions
Open science is on the rise and receives increasing attention and support from all stakeholders, including publishers, editors, reviewers, authors, readers, and institutions responsible for archiving, e.g., libraries. Despite these developments, publishing computational results reproducibly is still a challenge for all parties involved in scholarly communication. Fortunately, several projects aim at tackling these issues by designing applications to support the publication of executable research results. The key contribution of this paper is an overview of these applications, their commonalities, and their differences. This overview can be used as decision support for publishers facing the question of whether to host an application themselves, for editors who want to ensure that an application conforms with the author and reviewer guidelines, for authors who want to create reproducible analyses efficiently, and for reviewers who want to verify the results and to suggest potential solutions during the review process. Moreover, the overview considers the needs of readers who want to better understand and reuse research materials, such as code scripts and data. Finally, the survey covers aspects relevant for archiving.
The applications all provide a rich set of functionalities and address many reproducibility issues. However, issues related to ethics, privacy, big data, and long computation times are not yet fully solved. Beyond these challenges, considering executable submissions already during the review process raises a number of novel research questions: How many reviewers try to reproduce submissions using one of the applications? How does a reproducibility attempt (whether it failed or succeeded) affect a reviewer’s decision? Is the effort required for reviewing reproducible papers the same as for a traditional article (i.e., only reading the text), or will reviewers decline reviews because they fear additional work? And finally, how much time does it take to review a paper supplemented by reproducible documents, and how much additional understanding do reviewers gain? These quantitative (time, user interactions) and qualitative (interviews, questionnaires) measures can help to improve the applications and eventually foster the success of open reproducible research.
# Data and Software Availability
The data and code used to create the tables are openly available on GitHub: https://github.com/o2r-project/reviewpaper. The tables can be regenerated after substituting the input dataset with an updated record. The repository includes a list of all projects we looked at and the reasons why we excluded some of them. A snapshot of the repository at the time of submission is available on Zenodo: https://doi.org/10.5281/zenodo.3562270.
# Author Contributions
Markus Konkol wrote the paper, collected the data, and conceptualized the analysis. Daniel Nüst wrote the paper. Laura Goullier collected data and wrote the paper. All authors discussed the results and approved the final manuscript.
# Competing Interests
The authors of this paper are members of the o2r project that was also discussed in this paper (http://o2r.info/).
# Funding
This work is supported by the project Opening Reproducible Research 2 (https://www.uni-muenster.de/forschungaz/project/12343) funded by the German Research Foundation (DFG) under project numbers KR 3930/8-1; TR 864/12-1; PE 1632/17-1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
# References {#references .unnumbered}