Designing Reports

Irzam Sarfraz edited this page Aug 9, 2022 · 18 revisions

Introduction

One aspect of singleCellTK is the generation of comprehensive reports. A report takes input data with the corresponding parameters and generates a PDF/HTML document that contains a description of the overall process, the methods or algorithms used, the results computed, and the related plots along with a brief summary, all integrated in a single easy-to-read document.

Why make reports?

  • Descriptive form makes it easier for the users to understand the analysis (data, steps & results)
  • You don’t have to run each step separately (as with GUI or Console)
  • Everything including data description, input params, code for the analysis, description of the analysis, results (tables & figures), summary of the results, is integrated in a single document
  • Easier if you want to share your results or the analysis
  • Just needs input data and params

Examples of existing reports: (add gifs or images)

Differential Expression Report:

Load the data into a SCE object and define parameters such as the assay to use and the groups for differential expression. The report compiles the results of the differential expression analysis, including a table of the top differentially expressed features and various plots that visualize these features.

Seurat Report: (add figure)

Typically, you load up data into a SCE object and use the ‘runSeuratReport’ function to run the workflow and generate a report. This runs all steps in the Seurat workflow controlled by the parameters to the ‘runSeuratReport’ function.

The generated output document contains a description of the input data followed by all steps of the Seurat workflow. Each step includes a description of the workflow step, the code for this part of the analysis, the resulting plots or tables, and a short summary of this part of the analysis that can be copied and pasted into papers or presentations.

The markup of the report is modular and flexible: if a user re-runs the analysis (and re-generates the report document) after changing the parameters of a specific part of the workflow, the report re-computes only that part and the parts that depend on it, saving computational resources and time. (give example of control params)

Overall structure of a report: (add a figure with function call explaining interactions)

  • Header Section: Title, authors, report output params, analysis params
  • Content: Analysis description, code, plots, tables
  • Summary: Paragraph that describes the analysis run and the results computed
  • SessionInfo
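The four parts of a report can be illustrated with a minimal skeleton. The sketch below builds a hypothetical rmarkdown file from R and writes it to a temporary directory; the parameter names `useAssay` and `showSession` and the file name are assumptions made for illustration and are not taken from an existing singleCellTK report.

```r
# Sketch only: parameter names (useAssay, showSession) are hypothetical.
fence <- strrep("`", 3)  # chunk delimiter, built here to avoid nested fences

skeleton <- c(
  # Header section: title, authors, output format and parameters
  "---",
  "title: \"Example Report\"",
  "author: \"Author Name\"",
  "output: html_document",
  "params:",
  "  useAssay: \"counts\"",
  "  showSession: true",
  "---",
  "",
  # Content: analysis description, code, plots, tables
  "# Analysis",
  "A short description of the analysis goes here.",
  "",
  paste0(fence, "{r analysis}"),
  "useAssay <- params$useAssay",
  "# analysis code, plots and tables go here",
  fence,
  "",
  # Summary paragraph
  "# Summary",
  "A paragraph describing the analysis run and the results computed.",
  "",
  # Session information, evaluated only if the control parameter is TRUE
  "# Session Information",
  paste0(fence, "{r session, eval = params$showSession}"),
  "sessionInfo()",
  fence
)

# Write the skeleton to a temporary location for inspection.
skeletonPath <- file.path(tempdir(), "exampleReport.Rmd")
writeLines(skeleton, skeletonPath)
```

Values declared under `params:` in the header are available inside the document as `params$name`, so the same mechanism can carry both analysis parameters and control parameters.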

Process Steps:

Picture2

a. If a short individual report (e.g. differentialExpression or a single algorithm from QC):

  1. Create an rmarkdown file in the inst/rmarkdown folder
  2. Insert/update the header section (add required analysis parameters and control parameters)
  3. Add content including library calls, description/introduction, analysis code, summary, sessionInfo
  4. Create an R function “reportAlgorithmName” that calls this rmarkdown and stores the output file.

b. If a large report that calls a number of distinct functions possibly a workflow (e.g. complete Seurat workflow or a collection of algorithms from QC such as cellQC):

  1. Divide the workflow into individual steps where each individual step is a separate report (refer to a)
  2. Create an rmarkdown file for the workflow
  3. Insert/update the header section
  4. Add introduction, summary, sessionInfo.
  5. Compile individual reports using knit_child function and store in variables.
  6. Call variables to display content using cat() where appropriate.
  7. Create R function “reportWorkflowName” that calls this overall workflow rmarkdown and stores the output file.
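Steps 5 and 6 of the workflow case could look roughly like the sketch below. It assumes the code is placed in chunks of the workflow rmarkdown and that the child file names are placeholders; `compileChildReports` and `showSection` are hypothetical helpers, while `knitr::knit_child` is the function named in step 5.

```r
# Sketch for chunks inside the workflow rmarkdown; file names are hypothetical.

# Step 5: knit each individual report as a child and keep the rendered
# markdown in a named list instead of printing it immediately.
# This must be called while the parent document is being knitted.
compileChildReports <- function(childFiles, envir = parent.frame()) {
  lapply(childFiles, function(childFile) {
    knitr::knit_child(childFile, quiet = TRUE, envir = envir)
  })
}

# Step 6: emit a stored section at the place where it should appear.
# The enclosing chunk needs the chunk option results = 'asis'.
showSection <- function(section) {
  cat(section, sep = "\n")
}

# Intended use inside the parent document:
#   sections <- compileChildReports(c(
#     normalization = "normalization.Rmd",
#     clustering = "clustering.Rmd"
#   ))
#   showSection(sections$normalization)
```

Storing the knitted children in variables separates computing a section from displaying it, so the parent document can decide where, and whether, each section appears.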

Reports directory in singleCellTK

Add report markdown files in the following directory:

singleCellTK/inst/rmarkdown/yourAlgorithm/yourAlgorithmReport.rmd

Report calling function: (add figure and example/template)

An R function that can be called with input data and params from the R environment to generate a report. Typically, this function checks for validity of the input object and params (or sets default values), checks for the output path, calls the rmarkdown to render the report and finally saves both the output object (includes all computed data) and the report document to the defined directory.

Picture3

  1. Create an R function "reportAlgoName" and add parameters including the input SingleCellExperiment object (inSCE), technical parameters (such as which assay to use, choice of dimensionality reduction components, clustering resolution, variable features etc.), control parameters (such as choice of description, if a plot should be displayed, what level of heading to use etc.) and the output path of the rendered document.
  2. In the function, first test if the input SCE object is valid and return an error if it is not.
  3. Check if the values of the technical parameters are valid, i.e. if the values specified actually exist in the input SCE object. Return an error, or use a default value with a warning, if a parameter is not valid.
  4. Check if the output path specified by the user is valid and exists, otherwise use a default path (possibly the current working directory) and return a warning.
  5. Render the rmarkdown file for this algorithm/workflow by using the rmarkdown::render function and specify the output path to store the rendered file.
  6. Save the SCE object (which now contains the computations from the report) in the output path for later use.
  7. Return the SCE object from this function, so this object can be used from the same R environment where this function was called from.
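The steps above can be sketched as a single function. This is a minimal illustration under stated assumptions, not the singleCellTK implementation: the function and argument names, the output file names, and the convention that the template stores its result in a variable called `data` are all hypothetical, and the template path reuses the placeholder path from this page.

```r
# Sketch only: names, defaults and the template path are placeholders.

# Step 4: fall back to the working directory if the output path is unusable.
resolveOutputPath <- function(outputPath) {
  if (is.null(outputPath) || !dir.exists(outputPath)) {
    warning("Output path not found; using the current working directory.")
    return(getwd())
  }
  outputPath
}

# Step 1: input object, technical parameters, control parameters, output path.
reportAlgoName <- function(inSCE, useAssay = "counts", showPlots = TRUE,
                           outputPath = NULL,
                           outputFile = "algoNameReport.html") {
  # Step 2: validate the input object.
  if (!inherits(inSCE, "SingleCellExperiment")) {
    stop("'inSCE' must be a SingleCellExperiment object.")
  }
  # Step 3: check that the technical parameters exist in the object.
  if (!useAssay %in% SummarizedExperiment::assayNames(inSCE)) {
    stop("Assay '", useAssay, "' not found in 'inSCE'.")
  }
  outputPath <- resolveOutputPath(outputPath)

  # Step 5: render the rmarkdown, passing data and parameters to it.
  template <- system.file("rmarkdown/yourAlgorithm/yourAlgorithmReport.rmd",
                          package = "singleCellTK")
  reportEnv <- new.env()
  reportEnv$data <- inSCE
  rmarkdown::render(
    template,
    output_file = outputFile,
    output_dir = outputPath,
    params = list(object = inSCE, useAssay = useAssay, showPlots = showPlots),
    envir = reportEnv
  )

  # Step 6: save the object for later use.
  # Assumption: the template assigns its updated object to `data`.
  outSCE <- reportEnv$data
  saveRDS(outSCE, file.path(outputPath, "algoNameReport_SCE.rds"))

  # Step 7: return the object to the calling environment.
  outSCE
}
```

Rendering into a dedicated environment keeps the variables created by the report out of the caller's workspace while still letting the function pick up the result afterwards. Every name passed through `params` must also be declared in the header of the rmarkdown file.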

Report functions directory

Function call for the rmarkdown should be created in the R file below:

singleCellTK/R/htmlReports.R

Modular Structure for Reports

The modular structure, where a larger report is divided into smaller individual reports, helps with re-usability and easy maintenance of reports. See, for example, how the Seurat Workflow report re-uses individual reports for various sections of the report: (add figure from ppt; describe the benefits of the modular structure and explain how it works)

Picture1

Control variables in reports (add example)

Control variables differ from other parameters in that they do not affect the analysis, only the output document. They control the level of headings, whether descriptions and plots are shown or hidden in a particular report, and whether the computation from an analysis is re-run or skipped when the document is re-rendered.

Below we define and describe a few of the use-cases of control variables:

Control if a snippet/section of code should be evaluated in the report.

Control if a particular plot should be visualized in the report.

Control the level of heading to use in a report.

Control the text of titles and descriptions in the report.

Control if a description should be shown or hidden in the report.

Control if a particular report should only run the algorithm (without plotting results) or only plot the results (without re-running the algorithm) from a previous computation.

Control report specific plotting options such as defining number of rows and columns for multiple plots.

Control report specific technical parameters that let you re-run or iterate code sections for a specific number of times.
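A few of these use-cases can be illustrated with a small sketch. The control variable names below (`headingLevel`, `showDescription`, `showPlots`, `runAnalysis`) and the helper functions are hypothetical and chosen only for this example; they show how such variables can shape the output without touching the analysis itself.

```r
# Hypothetical control variables; none of these names come from singleCellTK.
control <- list(
  headingLevel = 2,       # level of heading to use for this section
  showDescription = TRUE, # show or hide the description text
  showPlots = FALSE,      # show or hide plots
  runAnalysis = TRUE      # re-run the computation or reuse earlier results
)

# Build a markdown heading at the requested level.
makeHeading <- function(title, level) {
  paste(strrep("#", level), title)
}

# Emit the heading, and optionally the description, of a report section.
# Intended for a chunk with the option results = 'asis'.
writeSectionHeader <- function(title, description, control) {
  cat(makeHeading(title, control$headingLevel), "\n\n", sep = "")
  if (isTRUE(control$showDescription)) {
    cat(description, "\n\n", sep = "")
  }
}

writeSectionHeader("Clustering", "Cells are grouped into clusters.", control)
```

Inside an rmarkdown file, values such as `runAnalysis` can also be passed to standard knitr chunk options, for example `eval = params$runAnalysis`, so that a code section is skipped when the document is re-rendered.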