pyPAGE is a novel research tool for the analysis of differential gene expression.
Its aim is to provide unbiased estimates of changes in activity of gene regulatory programs. pyPAGE can be used to detect differentially active biological pathways, targets of transcription factors and post-transcriptional regulons controlled either by RNA binding proteins or miRNAs.
Moreover, pyPAGE is applicable both to the analysis of bulk and single-cell RNA-seq data.
Downloading and installing pyPAGE is as easy as typing the following command in your terminal:
pip install bio-pypage
Or alternatively you can install it from the source:
git clone https://github.com/goodarzilab/pypage
cd pypage
pip install -e .
pyPAGE is implemented as a python package.
To run it, you need to, first, download the relevant gene-set annotations and precompute differential gene expression.
You can download preprocessed annotations of TF and RBP regulons from here.
Moreover, MSigDB is a great resource to search for an annotation of interest.
For example, you might want to use "GO: Gene Ontology gene sets" and "MIR: microRNA targets".
Also there is a useful resource of TF targets annotations.
Then you need to compute differential gene expression. For the analysis of bulk data, you would typically use DESeq2, limma or EdgeR. These packages provide user with the log fold changes and their significance. Then differential gene expression can be calculated as 'sign(log_fold_change) * (1-p-value)'.
To compute differential expression in single cell data, a good strategy is to use Mann-Whitney U test:
import numpy as np
from scipy.stats import mannwhitneyu
def count_difexp_with_u_test(cells_A, cells_B):
"""
Computes differential gene expression between two groups of cells
Parameters
----------
cells_A: np.ndarray
expression of genes in the first group of cells (n*m matrix where n is the number of cells and m the number of genes)
cells_B: np.ndarray
expression of genes in the second group of cells
"""
n_genes = cells_A.shape[1]
sign = np.sign(cells_A.mean(axis=0) - cells_B.mean(axis=0))
p_values = np.empty(n_genes)
p_values[:] = np.nan
for i in range(n_genes):
if not np.array_equal(np.unique(cells_A[:, i]), np.unique(cells_B[:, i])):
p_values[i] = mannwhitneyu(cells_A[:, i], cells_B[:, i])[1]
difexp = sign * (1 - p_values)
return difexp
Note, that to perform the analysis post-transcriptional regulation, you need to estimate differential trancript stability. To do this, you need to first compute sample-wise stability estimates of transcripts using REMBRANDTS and then compare them between two conditions with a t-test.
Now that we have completed the preparation, we can proceed directly to using pyPAGE.
First, load the relevant gene-set annotation:
from pypage import GeneSets
def load_annotation(ann_file, first_col_is_genes=False):
gs_ann = GeneSets(ann_file=ann_file, n_bins=3, first_col_is_genes=first_col_is_genes)
return gs_ann
Make sure to check whether the first column in the annotation corresponds to genes or gene-set names.
Next, load the expression data and convert gene names to the same format as the one used in the annotation:
import pandas as pd
from pypage import ExpressionProfile
def load_expression(expression_file):
df = pd.read_csv(expression_file,
sep="\t",
header=0,
names=["gene", "exp"])
exp = ExpressionProfile(df.iloc[:, 0],
df.iloc[:, 1],
n_bins=10)
exp.convert_from_to('refseq', 'ensg', 'human')
return exp
Finally, run pyPAGE algorithm and visualize the results:
from pypage import PAGE
def run_pyPAGE(expression, annotation, show_reg=False):
p = PAGE(
expression,
annotation,
n_shuffle=1000,
k=7,
filter_redundant=True
)
results, hm = p.run()
if show_reg:
hm.convert_from_to('gs', 'ensg', 'human')
hm.show(show_reg=show_reg)
return results
If your annotation gene-sets are named after regulators and you want to display their expression in the output, you should convert their names to match the accession type used in your expression data. Additionally, make sure to set the show_reg parameter to true in the Heatmap.show function to enable this feature.
For an exhaustive description of pyPAGE functions refer to the manual.
MIT license
See the paper
pyPAGE was developed in Goodarzi lab at UCSF by Artemy Bakulin, Noam B Teyssier and Hani Goodarzi.