This repository contains the scripts to run the analyses described in the
PathCORE-T paper. Running ./ANALYSIS.sh is sufficient to reproduce
the results in the paper. To use PathCORE-T in your own analyses, please
review the sections from "The PathCORE-T analysis workflow" onwards in
this README.
We released two Python packages for PathCORE-T:
- PathCORE-T
- crosstalk-correction: listed as a dependency in PathCORE-T
The two packages are used in this analysis repository.
The data directory
A README is provided in the ./data directory with details about the scripts
to download and/or process datasets, data source citations, etc.
The figures directory
All figures in the PathCORE-T paper are also available here.
The jupyter-notebooks directory
Scripts used to generate Figure 3 and Supplemental Figure 2 are provided in notebook format. We have found that we can offer greater detail about each of the figures in this format.
This directory also contains 2 notebooks that users can read through or run when they are getting started with PathCORE-T analysis:
Please review one of the analysis_<dataset>_<model>.sh scripts for an example
of the workflow.
In the figure below, (a) is used to generate the weight matrix and (b) specifies the inputs to the PathCORE-T analysis in (c):
- Iterates through a directory of weight matrices generated by a feature construction algorithm that has been applied to a transcriptomic dataset. Multiple weight matrices can be constructed from the same algorithm initialized with different random seeds. The eADAGE example uses multiple weight matrices, whereas the two NMF examples only use one weight matrix.
- Iterates through a directory of network files and applies a permutation test to the networks to determine edge significance. If there is more than 1 network file in the directory, the networks are combined to make a single aggregate network. Edges that are significant under their corresponding nulls (generated by the permutation test) are kept in the final network.
- constants directory: This module allows for import of two dictionaries: GENE_SIGNATURE_DEFINITIONS and SHORTEN_PATHWAY_NAMES. These are intended to be modified when you need to run PathCORE-T using a feature construction algorithm and/or pathway definitions different from those in our case studies. In most cases, the files in constants should be the only ones you may need to modify to run an analysis of your own.
- Utility functions for file reading & processing.
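To make the permutation step in the list above more concrete, here is a minimal Python sketch of an edge-level permutation test. It is not the PathCORE-T implementation. It assumes that each feature is annotated with a set of pathways and that an edge's weight counts the features in which two pathways appear together. The function names (`cooccurrence_edges`, `permute_features`, `significant_edges`), the null model (shuffling pathway labels across features while keeping each feature's size), and the `alpha` threshold are all hypothetical choices made for illustration.

```python
import itertools
import random
from collections import Counter


def cooccurrence_edges(feature_pathways):
    """Count, for every pathway pair, the features that contain both."""
    edges = Counter()
    for pathways in feature_pathways:
        # set() guards against a pathway appearing twice in one feature.
        for pair in itertools.combinations(sorted(set(pathways)), 2):
            edges[pair] += 1
    return edges


def permute_features(feature_pathways, rng):
    """Shuffle pathway labels across features, keeping each feature's size.

    This null model is an assumption made for the sketch.
    """
    pool = [p for pathways in feature_pathways for p in pathways]
    rng.shuffle(pool)
    permuted, start = [], 0
    for pathways in feature_pathways:
        permuted.append(pool[start:start + len(pathways)])
        start += len(pathways)
    return permuted


def significant_edges(feature_pathways, n_permutations=1000, alpha=0.05,
                      seed=0):
    """Keep edges whose observed weight is rarely matched under the null."""
    rng = random.Random(seed)
    observed = cooccurrence_edges(feature_pathways)
    exceed = Counter()
    for _ in range(n_permutations):
        null = cooccurrence_edges(permute_features(feature_pathways, rng))
        for edge, weight in observed.items():
            if null[edge] >= weight:
                exceed[edge] += 1
    # Empirical p-value with the usual +1 correction.
    return {
        edge: weight
        for edge, weight in observed.items()
        if (exceed[edge] + 1) / (n_permutations + 1) <= alpha
    }


if __name__ == "__main__":
    # Toy input: pathways "A" and "B" co-occur in every feature.
    features = [["A", "B", "x{}".format(i)] for i in range(10)]
    print(significant_edges(features, n_permutations=200))
```

In this toy input only the pair that co-occurs far more often than chance would be expected to survive the filter. Aggregating several network files before testing, as described above, is left out of the sketch.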
Here we describe the steps taken to prepare the database that backs the PathCORE-T demo application. The demo application is built on the Flask microframework and deployed on Heroku. The database is a MongoDB instance hosted on mLab.
Both Heroku and mLab provide free tier options for their services.
Note that the --metadata flag is used in analysis_Paeruginosa_eADAGE.sh
for run_network_creation.py ahead of the web application setup
carried out by running web_db_Paeruginosa_eADAGE.sh.
- Creates the following collections:
  - genes: Stores the gene identifiers. Assumes these can be retrieved from the first column (the row names/index) of the transcriptomic dataset. For the PAO1 example, we provided an additional file (for more information, see data/README.md) that has the common names corresponding to the gene locus tags specified in the compendium.
  - pathways: Stores the pathway & definition information from the pathway definitions file.
  - sample_labels: Stores the sample labels and the corresponding normalized expression values. Assumes the labels can be retrieved from the first row (the header) of the transcriptomic dataset and each column is the vector of expression values corresponding to that sample.
  - network_edges: Stores the network files in the networks directory created by running run_network_creation.py.
  - network_feature_signatures: Stores the feature gene signature information in the metadata directory created by running run_network_creation.py ... --metadata.
  - network_feature_pathways: Stores the feature pathway definitions in the metadata directory created by running run_network_creation.py ... --metadata.
  - sample_annotations: Specific to the PAO1 example, we store additional information about the samples in the compendium that can be displayed on the web application (for more information about the sample annotations file, see data/README.md).
- Creates the collection pathcore_edge_data. All information needed for an edge page is computed and stored here (e.g. gene odds ratios, sample "summary" expression scores, and heatmaps based on these values).
- Utility files in support of the PAO1 example. Gets the gene common names and sample annotations information.
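As an illustration of the dataset layout that the genes and sample_labels collections assume (gene identifiers in the first column, sample labels in the header row, one column of expression values per sample), the following Python sketch splits a tab-delimited table into per-gene and per-sample documents. The function name `parse_compendium` and the document keys (`gene`, `sample`, `expression`) are hypothetical and are not the schema used by the setup scripts. The values are assumed to be already normalized.

```python
import csv
import io


def parse_compendium(handle, delimiter="\t"):
    """Split an expression table into gene and sample documents.

    Assumes the first row is a header of sample labels and the first
    column holds the gene identifiers.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    sample_names = header[1:]  # header[0] labels the gene identifier column
    genes = []
    columns = [[] for _ in sample_names]
    for row in reader:
        if not row:
            continue  # skip blank lines
        genes.append({"gene": row[0]})
        for column, value in zip(columns, row[1:]):
            column.append(float(value))
    samples = [
        {"sample": name, "expression": values}
        for name, values in zip(sample_names, columns)
    ]
    return genes, samples


if __name__ == "__main__":
    # Made-up two-gene, two-sample table for demonstration only.
    table = "gene\tS1\tS2\nPA0001\t0.1\t0.9\nPA0002\t0.5\t0.2\n"
    gene_docs, sample_docs = parse_compendium(io.StringIO(table))
    print(gene_docs)
    print(sample_docs)
```

Lists of dictionaries like these have the shape that pymongo's `Collection.insert_many` accepts, which is one way such documents could be loaded into a MongoDB collection.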
- Register for an mLab account at mLab.com.
- Create new: Create a free sandbox database (0.5 GB).
- Database Users tab: Add a user to the new database that has write-access.
- Create a credentials file (see example-mLab-credentials.yml)
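Once the database user exists, the credentials are typically combined into a standard MongoDB connection string. The sketch below shows one way to do that in Python. The dictionary keys (`user`, `password`, `host`, `port`, `database`) are assumptions made for illustration and may not match the layout of example-mLab-credentials.yml, so adapt them to the fields in that file.

```python
from urllib.parse import quote_plus


def build_mongo_uri(credentials):
    """Build a mongodb:// URI from a credentials mapping.

    The key names are hypothetical. The user name and password are
    percent-encoded so that characters such as '@' or '/' do not break
    the URI.
    """
    return "mongodb://{user}:{password}@{host}:{port}/{database}".format(
        user=quote_plus(credentials["user"]),
        password=quote_plus(credentials["password"]),
        host=credentials["host"],
        port=credentials["port"],
        database=credentials["database"],
    )


if __name__ == "__main__":
    # Placeholder values only; substitute the details of your own database.
    example = {
        "user": "writer",
        "password": "p@ss/word",
        "host": "example.mlab.com",
        "port": 12345,
        "database": "pathcore",
    }
    print(build_mongo_uri(example))
```

A URI in this form can be passed to pymongo's `MongoClient`. Keep the real credentials file out of version control.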
Step 4: The PathCORE-T-demo source code
Fork the PathCORE-T-demo repository. Follow the setup instructions in the repository's README. Update or remove any text or code specific to the eADAGE-based, KEGG PAO1 case study so that the web application accurately describes and supports your analysis.