Skip to content

DEXSeq_comparison.R -- questionable dpsi calls in SE events without spanPsi filter #21

@SolKatzman

Description

@SolKatzman

junctionCounts version 1.0.0

As was discussed in item #17 when I run DEXSeq_comparison.R, it looks like (from my instrumented version of the R script) that for SE events, the R opject "spanPSI" is empty. For RI events it is not. I assume that this is because the ri_span option is only applied to RI events. Please confirm.

I was examining the outputs of the DEXSeq script and noticed some anomalies. Here is a portion of the dpsi output file, which I have reformatted (as discussed in item #20) into a tsv.

eventId	dpsi	absdpsi	qval	contrl_psi	contrl_ijc	contrl_ejc	u2muca_psi	u2muca_ijc	u2muca_ejc	eventType	chrom	start	end	strand	gene
SE.0089644	0.107	0.107	3.113e-03	0.328	29	0	0.435	20	6	SE	chr7	26174671	26192577	-	HNRNPA2B1
SE.0008842	-0.220	0.220	2.429e-02	0.496	26	0	0.276	10	3	SE	chr20	38431041	38435320	-	SNHG17
SE.0085868	-0.054	0.054	4.317e-02	0.333	14	0	0.279	10	3	SE	chr1	25819978	25823748	+	MTFR1L
SE.0028052	0.513	0.513	6.163e-02	0.000	14	0	0.513	14	3	SE	chr7	117209787	117230096	+	ST7
SE.0076552	0.153	0.153	1.089e-01	0.331	74	0	0.484	88	5	SE	chr17	5387605	5391662	-	NUP88
SE.0049595	0.309	0.309	1.579e-01	0.333	41	0	0.643	41	2	SE	chr12	53315094	53315782	-	AAAS

This represents a portion of the full dpsi tsv file, available here:

SE_contrl_u2muca_dpsi.tsv

Note that the lines in the above subset have
a) very strong qval
b) non-trivial dpsi values

I selected them from the full dpsi.tsv file by specifying in the "contrl" condition:
a) ejc = 0
b) psi < 0.5

I got a similar set by using the same filters for the "u2muca" condition.

Interestingly, there are no lines meeting a similar filter like this:
a) ijc = 0
b) psi > 0.5

Now if ejc is 0 and ijc is not too small, we would expect that psi would be close to 1. So why is it small?

To answer, I went back to the psi.tsv files for the samples comprising the "contrl" condition, available here:

SE_ctl_ms001

SE_ctl_ms002

SE_ctl_ms003

Here are the lines for SE.0028052 from psi_outfile.tsv for the 3 contrl samples:

sample_name	event_id	event_type	min_ijc	min_ejc	avg_psi	max_gene_frac	all_ijc	all_ejc	avg_ijc	avg_ejc	span_psi	min_psi	ijc_min_psi	ejc_min_psi	max_psi	ijc_max_psi	ejc_max_psi	mid_psi	bootstrap_num
hek_ctl783_ms001.SE_count_psi_outfile.tsv:hek_ctl783_ms001	SE.0028052	SE	0	0	NA	NA	0,35	0	17.5	0.0	NA	NA	NA	NA	NA	NA	NA	NA	NA
hek_ctl783_ms002.SE_count_psi_outfile.tsv:hek_ctl783_ms002	SE.0028052	SE	0	0	NA	NA	0,23	0	11.5	0.0	NA	NA	NA	NA	NA	NA	NA	NA	NA
hek_ctl783_ms003.SE_count_psi_outfile.tsv:hek_ctl783_ms003	SE.0028052	SE	0	0	NA	NA	0,29	0	14.5	0.0	NA	NA	NA	NA	NA	NA	NA	NA	NA

And here are the lines for SE.0028052 from psi_outfile.txt for the 3 contrl samples:

hek_ctl783_ms001.SE_count_psi_outfile.txt:SE.0028052:001	18
hek_ctl783_ms001.SE_count_psi_outfile.txt:SE.0028052:002	0
hek_ctl783_ms002.SE_count_psi_outfile.txt:SE.0028052:001	12
hek_ctl783_ms002.SE_count_psi_outfile.txt:SE.0028052:002	0
hek_ctl783_ms003.SE_count_psi_outfile.txt:SE.0028052:001	15
hek_ctl783_ms003.SE_count_psi_outfile.txt:SE.0028052:002	0

Okay, it is clear that this is probably not an SE event. The "all_ijc" fields show that one of the junctions is spliced, but the other is not. The mean ijc does turn out to be about half of the single splice junction mean. But there are NAs for the psi and span_psi. Why? I guess these get turned into 0 for the dexseq run.

If you want to see the ejc=0 filtered sets of events they are in this directory, along with the dpsi.csv and complete dpsi.tsv file. Hopefully the filenames are obvious. I believe you will find many that (apparently erroneously) satisfy your standard dpsi and qval cutoffs. I think you might need a spanPsi filter for the SE (and probably other) events. But it looks like you might not be able to rely on the psi.tsv files, unless you can eliiminate the NA values.

SE_contrl_u2muca_dpsi

And here is the full scratch directory for the DEXseq run that produced those outputs, including a log and subdirectories for the sample psi inputs and the splice-lib inputs.:

scratchDir SE_contrl_u2muca

I look forward to your comments on this.

Sol Katzman
UC Santa Cruz Genomics Institute

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions