BCEM standards for reproducible research

Welcome to BCEM!

We’ve compiled this tutorial to share BCEM's reproducibility standards so that we can better document what we do, for the sake of our future selves, our collaborators, and ultimately, a world with better science. Reproducibility standards in bioinformatics go well beyond the requirements established here (see a fully reproducible paper as an example), but we have decided to take things slowly and adopt steps that are manageable and realistic for researchers in our group. As we grow and develop our skills, we will move in that direction, keeping those examples as a North Star. For now, let's delve into the areas we're currently covering.

About this tutorial

This tutorial covers the following aspects of research data management:

Project management
Data storage
File structure
Naming conventions
Shared resource usage
Script conventions
Version control

Project folder in the Cloud:

Every member of the lab needs to set up a project folder in a Cloud service of choice (Google Drive, OneDrive, Dropbox) in agreement with all project collaborators. This folder serves an essential purpose: to store the most important documents related to the project, so that all parties can follow the project's developments. A suggested file structure and naming conventions apply to all files in this folder (see sections XXX and YYY, below).

Mind map:

At first glance, you may wonder why this step is necessary, and why it is one of the first in the guide. After all, you may have joined us eager to get your hands dirty with data. But sometimes it is a good idea to slow down and consider where to look and what to look for. At first, it may seem that nothing is happening. In time, you'll realize this is a helpful roadmap, one that is meant to develop with your understanding of your project and the hints you receive from your results.

The most important aspect of the mind map is that you identify the key components that will help you meet your objectives. Here are some guiding questions to aid that process:

Data acquisition: How? Where? Is it an experimental project, or are you downloading the data from open databases? In either case, be very explicit about the sources of data.
Data processing: How? Which tools?
Data analysis: What are the expected results? Which techniques could help you reach your goals?
Visualization: Which results are relevant? What do the resulting analyses show?
Final results: Where are these stored?

We suggest using a service such as diagrams.net to produce the actual map, but that's not binding: you may use any tool that gets you there. Ideally, the mind map in your project folder in the chosen Cloud service should be kept up to date at every step of the way.

Here's an example for a project based on experimental data collection:

Here's an example for a project based on data acquired from public databases:

Metadata

This is a mandatory piece of documentation accompanying all data sets used in the project. It details the source of the data and the process of data acquisition and processing. We abide by the Minimum Information about a Genome Sequence (MIGS) standards, which are already adopted by specific repositories of genome sequence data such as the European Nucleotide Archive (ENA).

Raw Data

Raw data must be stored under our lab's ENA account immediately upon receipt. The guidelines for submission are as follows:

Documenting experiments and data processing

Our lab requires that any process of data acquisition and/or processing be properly documented so that the work is as transparent and reproducible as possible. There are two options for this documentation: using an Electronic Lab Notebook (ELN), specifically RSpace (our lab has a centralized account with this provider to manage multiple projects by all members), or keeping detailed digital logs in a Markdown (.md) document. Suggested tools to this end are Jupyter notebooks, Zettlr, and Typora. The platform does not matter as long as the result is a Markdown document.

These documents must:

Be kept one per flowchart (mind map) component
Contain the following sections for each entry:
- Date
- Aim
- Protocol followed
- Command lines or methodology in the lab
- Third-party software (description of how it was used, under what parameters, include link to the tutorial(s))
- Results
- Must include relevant tables, graphs, etc. (or links to where these are stored, in case of large files)
- Must be commented (interpretations of what has been found)
- Indication of where the (intermediate) data was deposited (path, link).
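To make the entry format above easier to apply, here is a minimal sketch that generates an empty Markdown log entry with one heading per required section. The function name `new_log_entry`, the heading levels, and the `TODO` placeholder text are assumptions for illustration, not a lab standard.

```python
from datetime import date

# Section titles taken from the list of required sections above.
ENTRY_SECTIONS = [
    "Aim",
    "Protocol followed",
    "Command lines or methodology in the lab",
    "Third-party software",
    "Results",
    "Location of (intermediate) data",
]


def new_log_entry(entry_date=None):
    """Return an empty Markdown log entry with one heading per required section."""
    entry_date = entry_date or date.today()
    # The date is used as the entry title, in ISO format (YYYY-MM-DD).
    lines = [f"## {entry_date.isoformat()}", ""]
    for title in ENTRY_SECTIONS:
        # "TODO" marks the text that the researcher still has to fill in.
        lines += [f"### {title}", "", "TODO", ""]
    return "\n".join(lines)
```

Calling `print(new_log_entry())` writes a blank entry for today's date that can be pasted into the log of the relevant mind map component.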

File Structure:

This is the suggested (required?) file structure for the folder ...

01_Quality
02_Trimming
03_Quality_Trimming
04_Assembly
05_Results_Figures
06_Results_Tables
07_Manuscript
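As a convenience, the folder layout above could be created with a short script. This is only a sketch: the function name `create_project_skeleton` is hypothetical, and the folder names are copied from the list above.

```python
from pathlib import Path

# Folder names copied from the suggested structure above.
PROJECT_FOLDERS = [
    "01_Quality",
    "02_Trimming",
    "03_Quality_Trimming",
    "04_Assembly",
    "05_Results_Figures",
    "06_Results_Tables",
    "07_Manuscript",
]


def create_project_skeleton(project_root):
    """Create the numbered folders under project_root and return their paths."""
    root = Path(project_root)
    created = []
    for name in PROJECT_FOLDERS:
        folder = root / name
        # exist_ok=True makes the function safe to re-run on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

For example, `create_project_skeleton("my_project")` creates the seven folders inside a folder called `my_project` (a placeholder name) in the current directory.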

File Naming Conventions:

These are the conventions adopted by our lab so that a file's name conveys, as far as possible, what the file contains: …

Script Requirements

The minimum requirements for a script include:

The name of the file must be consistent with the function implemented.
Follow a standard for mnemonics and notation (camel case):
- For example: NotationCamel
Name
Description
Author
Institution
Contact email
Date: when it was implemented
Help (input, output): how to run the script
Requirements (dependencies and their versions)
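The requirements above might translate into a script like the following. This is an illustrative sketch, not the lab's reference script: the file name `CountSequences.py`, the task (counting FASTA records), and the header layout are all assumptions, and the fields in angle brackets are placeholders to fill in.

```python
"""
Name:         CountSequences.py (hypothetical example)
Description:  Count the records in a FASTA file.
Author:       <your name>
Institution:  <your institution>
Contact:      <your email>
Date:         <implementation date>
Requirements: Python 3.8 or later (standard library only)
"""
import argparse
import sys


def count_fasta_records(lines):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def build_parser():
    """Describe input and output so that --help explains how to run the script."""
    parser = argparse.ArgumentParser(
        description="Count the records in a FASTA file and print the total."
    )
    parser.add_argument("fasta", help="input: path to a FASTA file")
    return parser


def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = build_parser()
    if not argv:
        # Called without arguments: show how to run the script instead of failing.
        parser.print_help()
        return
    args = parser.parse_args(argv)
    with open(args.fasta) as handle:
        # Output: a single integer printed to standard output.
        print(count_fasta_records(handle))


if __name__ == "__main__":
    main()
```

The header docstring carries the descriptive fields, while `argparse` provides the help text, so `python CountSequences.py --help` documents the input and output without extra effort.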

There must be one README per project module, covering:

Version
Parameters
Information needed before running the script
Order in which the scripts should be run
Data structure (input and output)
Dependencies (versions)
TYPORA
Results graphs (if applicable)

Here's a repository containing an example of an ideal script and its associated README file:

Git repository usage

Note that this is not a requirement yet; it is intended for advanced users only. Link to a tutorial?

Each commit must be adequately described: consistently and without omitting information
DO NOT commit incomplete or unstable versions of the script
Teamwork: create as many work branches as needed, work on the corresponding branch, then merge and push
Execute push only on the main work branch.
Only push to the master branch once ...(?)
The main folder will be the project, with a README and a workflow
Within each project there are modules and each module folder must contain a README file
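The README rule for modules can be checked automatically before pushing. The sketch below assumes that every non-hidden folder directly under the project root is a module and that the README file is named `README.md`; both are assumptions, and `modules_missing_readme` is a hypothetical helper name.

```python
from pathlib import Path


def modules_missing_readme(project_root):
    """Return the names of module folders under project_root that lack a README.md."""
    root = Path(project_root)
    missing = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Hidden folders such as .git are not project modules.
        if folder.name.startswith("."):
            continue
        if not (folder / "README.md").is_file():
            missing.append(folder.name)
    return missing
```

An empty list means every module folder is documented; any names returned point to the folders that still need a README.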

