GRN Inference

Benchmarking GRN inference methods.

The full documentation is hosted on ReadTheDocs.

Repository: openproblems-bio/task_grn_inference

Description

Gene regulatory networks (GRNs) are essential for understanding cellular identity and behavior. They are simplified models of gene expression, which is regulated by complex processes involving multiple layers of control, from transcription to post-transcriptional modifications, and incorporating various regulatory elements and non-coding RNAs. Gene transcription is controlled by a regulatory complex that includes transcription factors (TFs), cis-regulatory elements (CREs) such as promoters and enhancers, and essential co-factors. High-throughput datasets covering thousands of genes facilitate the use of machine learning approaches to decipher GRNs. The advent of single-cell sequencing technologies, such as scRNA-seq, has made it possible to infer GRNs from a single experiment because of the abundance of samples. This allows researchers to infer condition-specific GRNs, for example for different cell types or diseases, and to study potential regulatory factors associated with these conditions. Combining chromatin accessibility data with gene expression measurements has led to the development of enhancer-driven GRN (eGRN) inference pipelines, which offer significantly improved accuracy over single-modality methods.

Here, we present geneRNIB as a living benchmark platform for GRN inference. This platform provides curated datasets for GRN inference and evaluation, standardized evaluation protocols and metrics, computational infrastructure, and a dynamically updated leaderboard to track state-of-the-art methods. It runs novel GRNs in the cloud, offers competition scores, and stores them for future comparisons, reflecting new developments over time.

The platform supports the integration of new datasets and protocols. When a new feature is added, previously evaluated GRNs are re-assessed, and the leaderboard is updated accordingly. The aim is to evaluate both the accuracy and completeness of inferred GRNs. It is designed for both single-modality and multi-omics GRN inference. Ultimately, it is a community-driven platform.

So far, ten GRN inference methods have been integrated: five single-omics methods (GRNBoost2, GENIE3, Portia, PPCOR, and Scenic) and five eGRN inference methods (Scenic+, CellOracle, FigR, scGLUE, and GRaNIE).

Due to its flexible nature, the platform can incorporate various benchmark datasets and evaluation methods, using either prior knowledge or feature-based approaches. In the current version, because standardized prior knowledge is not available, we use indirect approaches to benchmark GRNs. Using interventional data as evaluation datasets, we have developed 8 metrics based on a feature-based approach and the Wasserstein distance, accounting for both accuracy and comprehensiveness.

Five datasets have been integrated so far: OPSCA, Nakatake, Norman, Adamson, and Replogle. For each dataset, standardized inference datasets are provided for GRN inference, and evaluation datasets are used for benchmarking. See our publication for details of the methods.

Authors & contributors

| name | roles |
|---|---|
| Jalil Nourisa | author |
| Robrecht Cannoodt | author |
| Antoine Passimier | contributor |
| Marco Stock | contributor |
| Christian Arnold | contributor |

API

flowchart TB
  file_atac_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-NA'>NA</a>")
  comp_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-method'>method</a>"/]
  file_prediction("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-NA'>NA</a>")
  comp_metric_regression[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-metric-regression'>metric_regression</a>"/]
  comp_metric_ws[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-ws-distance'>ws_distance</a>"/]
  comp_metric[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-metrics'>metrics</a>"/]
  file_score("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-NA'>NA</a>")
  file_evaluation_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-NA'>NA</a>")
  file_rna_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-multiomics-rna'>multiomics rna</a>")
  comp_method_r[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-method-r'>Method r</a>"/]
  file_atac_h5ad-.-comp_method
  comp_method-.->file_prediction
  file_prediction---comp_metric_regression
  file_prediction---comp_metric_ws
  file_prediction---comp_metric
  comp_metric_regression-->file_score
  comp_metric_ws-->file_score
  comp_metric-->file_score
  file_evaluation_h5ad-.-comp_metric_regression
  file_rna_h5ad---comp_method
  comp_method_r-.->file_prediction

File format: op_atac.h5ad

NA

Example file: resources_test/inference_datasets/op_atac.h5ad

Component type: method

A GRN inference method

Arguments:

| Name | Type | Description |
|---|---|---|
| `--rna` | file | RNA expression for multiomics data. |
| `--atac` | file | (Optional) Peak data for multiomics data. |
| `--prediction` | file | (Optional, Output) GRN prediction. |
| `--tf_all` | file | (Optional) NA. |
| `--max_n_links` | integer | (Optional) NA. Default: 50000. |
| `--num_workers` | integer | (Optional) NA. Default: 4. |
| `--temp_dir` | string | (Optional) NA. Default: output/temdir. |
| `--seed` | integer | (Optional) NA. Default: 32. |
| `--causal` | boolean | (Optional) NA. Default: TRUE. |
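
As a rough illustration of the argument list above, the following sketch mirrors the documented flags and defaults in a standalone `argparse` parser. It is not how the repository wires up its components; the helper names `str_to_bool`, `build_parser`, and `keep_top_links`, the `(source, target, weight)` edge layout, and the reading of `--max_n_links` as "keep the strongest edges by absolute weight" are all assumptions made for this example.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse TRUE/FALSE style values as they appear in the argument table."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"not a boolean: {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser reproducing the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="A GRN inference method (sketch)")
    parser.add_argument("--rna", required=True, help="RNA expression for multiomics data")
    parser.add_argument("--atac", default=None, help="Peak data for multiomics data")
    parser.add_argument("--prediction", default=None, help="Output: GRN prediction")
    parser.add_argument("--tf_all", default=None)
    parser.add_argument("--max_n_links", type=int, default=50000)
    parser.add_argument("--num_workers", type=int, default=4)
    parser.add_argument("--temp_dir", default="output/temdir")
    parser.add_argument("--seed", type=int, default=32)
    parser.add_argument("--causal", type=str_to_bool, default=True)
    return parser


def keep_top_links(links, max_n_links):
    """Keep the max_n_links edges with the largest absolute weight.

    `links` is a list of (source, target, weight) tuples. This layout and
    the truncation rule are assumptions, not the documented behaviour.
    """
    ranked = sorted(links, key=lambda edge: abs(edge[2]), reverse=True)
    return ranked[:max_n_links]
```

A script built on this sketch would call `build_parser().parse_args()` once at start-up and pass `args.max_n_links` to `keep_top_links` before writing the prediction file.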

File format: collectri.csv

NA

Example file: resources_test/grn_models/op/collectri.csv

Component type: metric_regression

Calculates regression scores

Arguments:

| Name | Type | Description |
|---|---|---|
| `--prediction` | file | GRN prediction. |
| `--score` | file | (Output) File indicating the score of a metric. |
| `--method_id` | string | (Optional) NA. |
| `--layer` | string | (Optional) NA. Default: X_norm. |
| `--max_n_links` | integer | (Optional) NA. Default: 50000. |
| `--verbose` | integer | (Optional) NA. Default: 2. |
| `--dataset_id` | string | (Optional) NA. Default: op. |
| `--evaluation_data` | file | (Optional) Perturbation dataset for benchmarking. |
| `--tf_all` | file | NA. |
| `--reg_type` | string | (Optional) NA. Default: ridge. |
| `--subsample` | integer | (Optional) NA. Default: -1. |
| `--num_workers` | integer | (Optional) NA. Default: 4. |
| `--apply_tf` | boolean | (Optional) NA. Default: TRUE. |
| `--apply_skeleton` | boolean | (Optional) NA. Default: FALSE. |
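
The component is described only as calculating regression scores, with `ridge` as the default `--reg_type`. To give a feel for what a feature-based regression score can look like, here is a minimal sketch that assumes each target gene is predicted from the expression of its putative regulators in the GRN and scored by R². The function names, the `(regulator, target)` edge layout, and the use of in-sample R² (rather than any cross-validation scheme the real metric may apply) are assumptions for illustration.

```python
import numpy as np


def ridge_r2(features: np.ndarray, target: np.ndarray, alpha: float = 1.0) -> float:
    """Fit closed-form ridge regression and return the in-sample R^2."""
    # Centre both sides so that no intercept column is needed.
    x = features - features.mean(axis=0)
    y = target - target.mean()
    gram = x.T @ x + alpha * np.eye(x.shape[1])
    coef = np.linalg.solve(gram, x.T @ y)
    residual = y - x @ coef
    total = float(y @ y)
    return 1.0 - float(residual @ residual) / total if total > 0 else 0.0


def regression_score(expr, genes, links, alpha=1.0):
    """Average R^2 over target genes, using each target's regulators as features.

    expr  : samples x genes matrix (for example, values from a normalized layer)
    genes : gene names matching the columns of expr
    links : iterable of (regulator, target) pairs (hypothetical edge layout)
    """
    index = {gene: i for i, gene in enumerate(genes)}
    regulators = {}
    for source, target in links:
        # Ignore edges that mention unknown genes or self-loops.
        if source in index and target in index and source != target:
            regulators.setdefault(target, set()).add(source)
    scores = []
    for target, sources in regulators.items():
        cols = [index[s] for s in sorted(sources)]
        scores.append(ridge_r2(expr[:, cols], expr[:, index[target]], alpha))
    return float(np.mean(scores)) if scores else 0.0
```

Under these assumptions, a GRN that assigns informative regulators to a target yields a higher score than one that assigns unrelated genes, which is the intuition behind using regression as an indirect benchmark.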

Component type: ws_distance

Calculates Wasserstein distance for a given GRN and dataset

Arguments:

| Name | Type | Description |
|---|---|---|
| `--prediction` | file | GRN prediction. |
| `--score` | file | (Output) File indicating the score of a metric. |
| `--method_id` | string | (Optional) NA. |
| `--layer` | string | (Optional) NA. Default: X_norm. |
| `--max_n_links` | integer | (Optional) NA. Default: 50000. |
| `--verbose` | integer | (Optional) NA. Default: 2. |
| `--dataset_id` | string | (Optional) NA. Default: op. |
| `--ws_consensus` | file | NA. |
| `--ws_distance_background` | file | NA. |
| `--evaluation_data_sc` | file | NA. |
Component type: metrics

A metric to evaluate the performance of the inferred GRN

Arguments:

| Name | Type | Description |
|---|---|---|
| `--prediction` | file | GRN prediction. |
| `--score` | file | (Output) File indicating the score of a metric. |
| `--method_id` | string | (Optional) NA. |
| `--layer` | string | (Optional) NA. Default: X_norm. |
| `--max_n_links` | integer | (Optional) NA. Default: 50000. |
| `--verbose` | integer | (Optional) NA. Default: 2. |
| `--dataset_id` | string | (Optional) NA. Default: op. |

File format: score.h5ad

NA

Example file: resources_test/scores/score.h5ad

File format: op_perturbation.h5ad

NA

Example file: resources_test/evaluation_datasets/op_perturbation.h5ad

File format: multiomics rna

RNA expression for multiomics data.

Example file: resources_test/inference_datasets/op_rna.h5ad

Format:

AnnData object
 obs: 'cell_type', 'donor_id'
 layers: 'counts', 'X_norm'

Data structure:

| Slot | Type | Description |
|---|---|---|
| `obs["cell_type"]` | string | (Optional) The annotated cell type of each cell based on RNA expression. |
| `obs["donor_id"]` | string | (Optional) Donor id. |
| `layers["counts"]` | double | (Optional) Counts matrix. |
| `layers["X_norm"]` | double | Normalized values. |
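
The slot list above can be turned into a small structural check. The sketch below is a hypothetical helper (`check_rna_slots` is not part of the repository) that reports which required and optional slots are missing. It relies only on `adata.layers` behaving like a mapping and on `name in adata.obs` testing for an obs column, which holds for AnnData objects.

```python
REQUIRED_LAYERS = ("X_norm",)
OPTIONAL_LAYERS = ("counts",)
OPTIONAL_OBS = ("cell_type", "donor_id")


def check_rna_slots(adata):
    """Return (missing_required, missing_optional) slot names.

    `adata` is expected to behave like an AnnData object: `adata.layers` is a
    mapping and `name in adata.obs` tests for an obs column.
    """
    missing_required = [
        f'layers["{name}"]' for name in REQUIRED_LAYERS if name not in adata.layers
    ]
    missing_optional = [
        f'layers["{name}"]' for name in OPTIONAL_LAYERS if name not in adata.layers
    ]
    missing_optional += [
        f'obs["{name}"]' for name in OPTIONAL_OBS if name not in adata.obs
    ]
    return missing_required, missing_optional
```

With the `anndata` package installed, the example file could be loaded with `anndata.read_h5ad("resources_test/inference_datasets/op_rna.h5ad")` and passed to this function; an empty first list means the required `X_norm` layer is present.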

Component type: Method r

A GRN inference method

Arguments:

| Name | Type | Description |
|---|---|---|
| `--rna_r` | file | (Optional) NA. |
| `--atac_r` | file | (Optional) NA. |
| `--prediction` | file | (Optional, Output) GRN prediction. |
| `--temp_dir` | string | (Optional) NA. Default: output/temdir. |
| `--num_workers` | integer | (Optional) NA. Default: 4. |