-
Notifications
You must be signed in to change notification settings - Fork 1
Description
junctionCounts version 1.0.0
As was discussed in item #17 when I run DEXSeq_comparison.R, it looks like (from my instrumented version of the R script) that for SE events, the R opject "spanPSI" is empty. For RI events it is not. I assume that this is because the ri_span option is only applied to RI events. Please confirm.
I was examining the outputs of the DEXSeq script and noticed some anomalies. Here is a portion of the dpsi output file, which I have reformatted (as discussed in item #20) into a tsv.
eventId dpsi absdpsi qval contrl_psi contrl_ijc contrl_ejc u2muca_psi u2muca_ijc u2muca_ejc eventType chrom start end strand gene
SE.0089644 0.107 0.107 3.113e-03 0.328 29 0 0.435 20 6 SE chr7 26174671 26192577 - HNRNPA2B1
SE.0008842 -0.220 0.220 2.429e-02 0.496 26 0 0.276 10 3 SE chr20 38431041 38435320 - SNHG17
SE.0085868 -0.054 0.054 4.317e-02 0.333 14 0 0.279 10 3 SE chr1 25819978 25823748 + MTFR1L
SE.0028052 0.513 0.513 6.163e-02 0.000 14 0 0.513 14 3 SE chr7 117209787 117230096 + ST7
SE.0076552 0.153 0.153 1.089e-01 0.331 74 0 0.484 88 5 SE chr17 5387605 5391662 - NUP88
SE.0049595 0.309 0.309 1.579e-01 0.333 41 0 0.643 41 2 SE chr12 53315094 53315782 - AAAS
This represents a portion of the full dpsi tsv file, available here:
Note that the lines in the above subset have
a) very strong qval
b) non-trivial dpsi values
I selected them from the full dpsi.tsv file by specifying in the "contrl" condition:
a) ejc = 0
b) psi < 0.5
I got a similar set by using the same filters for the "u2muca" condition.
Interestingly, there are no lines meeting a similar filter like this:
a) ijc = 0
b) psi > 0.5
Now if ejc is 0 and ijc is not too small, we would expect that psi would be close to 1. So why is it small?
To answer, I went back to the psi.tsv files for the samples comprising the "contrl" condition, available here:
Here are the lines for SE.0028052 from psi_outfile.tsv for the 3 contrl samples:
sample_name event_id event_type min_ijc min_ejc avg_psi max_gene_frac all_ijc all_ejc avg_ijc avg_ejc span_psi min_psi ijc_min_psi ejc_min_psi max_psi ijc_max_psi ejc_max_psi mid_psi bootstrap_num
hek_ctl783_ms001.SE_count_psi_outfile.tsv:hek_ctl783_ms001 SE.0028052 SE 0 0 NA NA 0,35 0 17.5 0.0 NA NA NA NA NA NA NA NA NA
hek_ctl783_ms002.SE_count_psi_outfile.tsv:hek_ctl783_ms002 SE.0028052 SE 0 0 NA NA 0,23 0 11.5 0.0 NA NA NA NA NA NA NA NA NA
hek_ctl783_ms003.SE_count_psi_outfile.tsv:hek_ctl783_ms003 SE.0028052 SE 0 0 NA NA 0,29 0 14.5 0.0 NA NA NA NA NA NA NA NA NA
And here are the lines for SE.0028052 from psi_outfile.txt for the 3 contrl samples:
hek_ctl783_ms001.SE_count_psi_outfile.txt:SE.0028052:001 18
hek_ctl783_ms001.SE_count_psi_outfile.txt:SE.0028052:002 0
hek_ctl783_ms002.SE_count_psi_outfile.txt:SE.0028052:001 12
hek_ctl783_ms002.SE_count_psi_outfile.txt:SE.0028052:002 0
hek_ctl783_ms003.SE_count_psi_outfile.txt:SE.0028052:001 15
hek_ctl783_ms003.SE_count_psi_outfile.txt:SE.0028052:002 0
Okay, it is clear that this is probably not an SE event. The "all_ijc" fields show that one of the junctions is spliced, but the other is not. The mean ijc does turn out to be about half of the single splice junction mean. But there are NAs for the psi and span_psi. Why? I guess these get turned into 0 for the dexseq run.
If you want to see the ejc=0 filtered sets of events they are in this directory, along with the dpsi.csv and complete dpsi.tsv file. Hopefully the filenames are obvious. I believe you will find many that (apparently erroneously) satisfy your standard dpsi and qval cutoffs. I think you might need a spanPsi filter for the SE (and probably other) events. But it looks like you might not be able to rely on the psi.tsv files, unless you can eliiminate the NA values.
And here is the full scratch directory for the DEXseq run that produced those outputs, including a log and subdirectories for the sample psi inputs and the splice-lib inputs.:
I look forward to your comments on this.
Sol Katzman
UC Santa Cruz Genomics Institute