pathfindR output for RNAseq and DIA_proteomics data #63

masaver · 2020-12-10T11:46:41Z

masaver
Dec 10, 2020

Hi Everyone,
I am studying pathway activity in one specific type of cancer, for which I have DIA proteomics and RNAseq data from normal and tumor tissues.

For the proteomics data, I obtained differential-expression p-values using limma; for the RNAseq data, I used DESeq2. In both data types, several thousand proteins/genes are differentially regulated at an FDR < 0.05.

When I run pathfindR (with default parameters) on the proteomics data and then get the pathway scores, the differences between normal and tumor samples look like this:

[Figure: pathway scores, normal vs. tumor (rows = pathways; columns = samples)]

However, when I run pathfindR on the RNAseq data (with default parameters, or even with sig_gene_thr = 0.01 and score_quan_thr = 0.7), I get the "No Active Subnetworks" message.

So my questions for you guys are:

1. Is there any logical explanation for why this might happen?
2. Are there any use cases of pathfindR with RNAseq data?
3. How much can you lower sig_gene_thr and score_quan_thr and still be "confident" in your results? And how would you test the "confidence" of your results?

Many thanks in advance.

Best,

-Mathias

Answered by ozanozisik


egeulgen · 2020-12-10T12:24:31Z

egeulgen
Dec 10, 2020
Maintainer

Hello Mathias,

I'm not sure about the exact message you're getting, but if there are no active subnetworks identified, the issue is usually a small number of input genes.

1. To better answer this question: what exactly is the message you get? How many differentially expressed genes are there in your RNAseq data? Also, what version of pathfindR are you using? If you wouldn't mind sharing the data and the script you used for the RNAseq data, I could pinpoint any potential issues.
2. We have used pathfindR in many cases using output from RNAseq data. In fact, any gene-associated p-value data can be analyzed with pathfindR.
3. The default options for filtering active subnetworks were determined based on analyses of multiple datasets. We're still working on novel ways to ensure pathfindR provides high-confidence results. There is no straightforward answer to your question. Ideally, any subnetwork that contains at least 2 input genes might be biologically relevant from the active-subnetwork-oriented enrichment analysis perspective. Hence, you may even keep all such active subnetworks.
   For testing the confidence of your results, you can (a) search for literature support or (b) perform experimental validation of the enriched pathways.

Best,
-E


masaver · 2020-12-10T15:19:33Z

masaver
Dec 10, 2020
Author

The pathfindR version I'm using is pathfindR_1.6.0. From the differential expression analysis, I saw that:
Down-regulated genes = 10443
Up-regulated genes = 14386
Genes with no significant change = 8761

At the start of the analysis, I get this message:
"## Testing input The input looks OK ## Processing input. Converting gene symbols, if necessary (and if human gene symbols provided) Number of genes provided in input: 33588 Number of genes in input after p-value filtering: 24827 pathfindR cannot handle p values < 1e-13. These were changed to 1e-13"

But then, at the end of the pathfindR analysis, I get this message (and the results object/data.frame is empty):
"Found 0 active subnetworks Warning message: Did not find any enriched terms!"

I ran pathfindR with the following command:
rna.res.pathfindR = run_pathfindR(input = rna.pathfindR,
  output_dir = "../Results/rna.CRC.pathfindR_results.reactome",
  gene_sets = "Reactome",
  plot_enrichment_chart = FALSE,
  visualize_enriched_terms = FALSE,
  score_quan_thr = 0.7,
  sig_gene_thr = 0.01)

where rna.pathfindR is the pathfindR input data.frame. I'm sending you a copy of it, saved as an .rds file. Let me know if you need more information.

Best,
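To see how the two thresholds in this command might interact with a very long list of significant genes, here is a minimal Python sketch. It assumes, based on the pathfindR documentation of recent versions (not checked against 1.6.0), that `score_quan_thr` keeps subnetworks scoring above that quantile of all subnetwork scores, and that `sig_gene_thr` is a proportion of all significant input genes whose resulting count is floored at 2. The helpers `quantile_type7`, `min_sig_genes` and `filter_subnetworks` are hypothetical, not pathfindR functions, and the strict/non-strict comparisons are also assumptions.

```python
def quantile_type7(values, q):
    """Empirical quantile with linear interpolation (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * q
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def min_sig_genes(n_sig_input, sig_gene_thr):
    """Assumed rule: sig_gene_thr is a proportion of all significant
    input genes, and the resulting count is never allowed below 2."""
    return max(2, sig_gene_thr * n_sig_input)


def filter_subnetworks(subnetworks, sig_genes, score_quan_thr=0.8, sig_gene_thr=0.02):
    """Hypothetical re-implementation of the two subnetwork filters.

    subnetworks: list of (score, set_of_gene_symbols) pairs
    sig_genes:   set of significant input gene symbols
    """
    if not subnetworks:
        return []
    cutoff = quantile_type7([score for score, _ in subnetworks], score_quan_thr)
    needed = min_sig_genes(len(sig_genes), sig_gene_thr)
    kept = []
    for score, genes in subnetworks:
        n_sig = len(genes & sig_genes)
        # Assumed comparisons: strictly above the score cutoff and
        # at least `needed` significant genes in the subnetwork
        if score > cutoff and n_sig >= needed:
            kept.append((score, genes))
    return kept


# With the counts reported above (24827 genes left after p-value filtering),
# the assumed rule would ask each subnetwork for roughly 248 significant genes
# at sig_gene_thr = 0.01, and roughly 497 at the default 0.02.
for thr in (0.01, 0.02):
    print(thr, min_sig_genes(24827, thr))
```

Type 7 interpolation is used here only because it is R's default for `quantile()`; the exact quantile method pathfindR uses is not stated in this thread.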


-Mathias



ozanozisik · 2020-12-10T16:24:00Z

ozanozisik
Dec 10, 2020
Collaborator

Hello Mathias,
In pathfindR, in line with the scoring in jActiveModules by Ideker et al., a background score distribution is calculated and used to adjust the scores of subnetworks. In your case, almost all genes are significant, which prevents any subnetwork from being significant. I suggest applying stricter filtering to your gene set, taking the log fold change into account.
Best,
Ozan
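The effect described here can be illustrated with a minimal sketch of jActiveModules-style scoring (Ideker et al.): each gene's p-value is turned into a z-score, z = Φ⁻¹(1 − p), a gene set's aggregate score is the sum of its z-scores divided by √k, and that score is standardized against random gene sets of the same size. This is a simplified illustration of the idea, not pathfindR's actual implementation; the helper names and the toy p-values are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def gene_z(p):
    """Convert a gene p-value to a z-score: z = Phi^-1(1 - p)."""
    return NormalDist().inv_cdf(1 - p)


def aggregate_z(zs):
    """Aggregate score of a gene set: sum of z-scores divided by sqrt(k)."""
    return sum(zs) / len(zs) ** 0.5


def corrected_score(subnet_z, all_z, n_samples=2000, seed=0):
    """Standardize a subnetwork score against random gene sets of the same size."""
    rng = random.Random(seed)
    k = len(subnet_z)
    background = [aggregate_z(rng.sample(all_z, k)) for _ in range(n_samples)]
    return (aggregate_z(subnet_z) - mean(background)) / stdev(background)


def make_pvalues(n_genes, n_sig, sig_p=1e-6):
    """Toy p-values: n_sig highly significant genes, the rest spread evenly over (0, 1)."""
    n_rest = n_genes - n_sig
    rest = [(i + 0.5) / n_rest for i in range(n_rest)]
    return [sig_p] * n_sig + rest


subnet_z = [gene_z(1e-6)] * 10  # a subnetwork made only of highly significant genes
for n_sig in (20, 900):  # 2% vs. 90% of 1,000 genes significant
    all_z = [gene_z(p) for p in make_pvalues(1000, n_sig)]
    print(n_sig, round(corrected_score(subnet_z, all_z), 2))
```

In this toy setup the same subnetwork lands more than 10 background standard deviations above random gene sets when 20 of 1,000 genes are significant, but only about 1 when 900 are, because random gene sets are then made almost entirely of significant genes too.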


egeulgen · 2020-12-10T16:34:01Z

egeulgen
Dec 10, 2020
Maintainer

Hey @masaver,

As @ozanozisik pointed out, you should filter for significantly differentially expressed genes, taking logFC into account. Below are two volcano plots, showing p < 0.05 only (left) and |LFC| > 1.5 and p < 0.05 (right). As you can see, many of the genes in your RNAseq data have low logFC values, implying little impact even when the change is statistically significant.
[Figure: volcano plots; left: p < 0.05 only; right: |LFC| > 1.5 and p < 0.05]

After applying the latter filter (i.e., |LFC| > 1.5 and p < 0.05), I obtained 93 enriched pathways for your data (with default subnetwork filtering).
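A minimal Python sketch of this filtering step, assuming DESeq2-style result rows (DESeq2 names its columns `log2FoldChange` and `padj`), reading "LFC" as the log2 fold change and "p" as the adjusted p-value. Rows with a missing `padj` are dropped, since DESeq2 reports NA for genes excluded by independent filtering or outlier detection. The output keys `Gene.symbol`, `logFC` and `adj.P.Val` are assumed to mirror pathfindR's example input; `filter_deg` and the gene names are hypothetical.

```python
import math


def filter_deg(rows, lfc_thr=1.5, p_thr=0.05):
    """Keep genes with |log2FoldChange| > lfc_thr and padj < p_thr.

    rows: iterable of dicts with 'gene', 'log2FoldChange' and 'padj' keys.
    Returns pathfindR-style rows with Gene.symbol, logFC and adj.P.Val keys.
    """
    kept = []
    for r in rows:
        padj = r["padj"]
        if padj is None or math.isnan(padj):
            continue  # DESeq2 leaves padj as NA for some genes
        if abs(r["log2FoldChange"]) > lfc_thr and padj < p_thr:
            kept.append({"Gene.symbol": r["gene"],
                         "logFC": r["log2FoldChange"],
                         "adj.P.Val": padj})
    return kept


example = [
    {"gene": "GENE1", "log2FoldChange": 2.3, "padj": 1e-8},   # kept
    {"gene": "GENE2", "log2FoldChange": 0.4, "padj": 1e-12},  # significant, small change: dropped
    {"gene": "GENE3", "log2FoldChange": -1.8, "padj": 0.2},   # large change, not significant: dropped
    {"gene": "GENE4", "log2FoldChange": -3.0, "padj": None},  # missing padj: dropped
]
print([r["Gene.symbol"] for r in filter_deg(example)])  # ['GENE1']
```

The strict `>` on the fold change follows the |LFC| > 1.5 wording above; whether 1.5 is the right cutoff for another dataset remains a judgment call.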

Hope these answers help,
Best,
-E


masaver · 2020-12-14T09:32:39Z

masaver
Dec 14, 2020
Author

That was a great help! Indeed, filtering the genes by |LFC| > 1.5 and p < 0.05 did the trick.
Thanks a lot for the help.

Best,
-Mathias

