Multiple CCS.bam file #1

wyim-pgl · 2018-02-16T01:09:52Z

Dear Kristoffer,

I am trying to use IsoCon for my transcriptome.
We have 30 cells to analyze, and my flnc file was generated from all 30 cells.
Is it okay to use a BAM file merged with samtools, or one produced with bax2bam?

Thank you.

Won

ksahlin · 2018-02-16T01:34:47Z

Hi Won,

If you have the ccs reads in separate “*.ccs.bam” files per cell, it should be okay to simply merge them with samtools. The important thing is that all the reads in the flnc fasta file are also found in the bam file.
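A rough way to verify that condition is sketched below. It is not part of IsoCon. It assumes PacBio-style read names of the form `movie/zmw/...` in both files, so reads are matched on the `movie/zmw` prefix; the function names are made up for illustration. The SAM text can come from `samtools view` on the merged file.

```python
def zmw_key(read_name):
    """Reduce a PacBio-style read name to 'movie/zmw' (assumed naming scheme)."""
    return "/".join(read_name.split("/")[:2])


def fasta_ids(fasta_lines):
    """Collect the first token of every FASTA header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def sam_read_names(sam_lines):
    """Collect QNAMEs from SAM text, skipping '@' header lines."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_bam(fasta_lines, sam_lines):
    """Return the movie/zmw keys present in the FASTA but absent from the BAM."""
    in_fasta = {zmw_key(name) for name in fasta_ids(fasta_lines)}
    in_bam = {zmw_key(name) for name in sam_read_names(sam_lines)}
    return sorted(in_fasta - in_bam)
```

With hypothetical file names, the merge itself would be `samtools merge merged.ccs.bam cell1.ccs.bam cell2.ccs.bam`, and the output of `samtools view merged.ccs.bam` can be fed line by line to `missing_from_bam`. An empty list means every flnc read has a CCS record.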

However, is your Iso-Seq dataset targeted or not? IsoCon is designed for use with targeted Iso-Seq sequencing. If you have a non-targeted dataset, the algorithm will likely not scale (in runtime), since IsoCon uses an alignment strategy that is optimized for highly similar sequences. There is a way to control this (set a low value for --neighbor_search_depth, e.g. --neighbor_search_depth 1000 or lower), but it will likely affect the quality of the output. We are currently working on an approach for non-targeted data that uses many of IsoCon's ideas, and I hope to release this repository soon.

Best,
Kristoffer

wyim-pgl · 2018-02-16T17:46:31Z

Dear Kristoffer,

Thank you for your comment.
This dataset is NOT targeted. Our species is just polyploid.
I will use the --neighbor_search_depth option to reduce the runtime.
Does the BAM file need special pulse features, such as DeletionQV, DeletionTag, InsertionQV, IPD, or MergeQV?

Cheers,

Won

ksahlin · 2018-02-16T18:09:52Z

Hi,

No, IsoCon does not need the pulse features. It only needs the quality values that were generated for the CCS reads, i.e., the CCS BAM file should be the output generated by the ccs tool.

Ok, good to know about the non-targeted data. I will definitely let you know when we have the non-targeted approach ready. Non-targeted data has more variable cut points at the ends of transcripts, and this can cause some redundancy in IsoCon. There is a parameter for that as well, --ignore_ends_len, which we set to a default value of 15 for targeted data. It is possible that ends have higher variability in non-targeted data, so the value should be increased (with the obvious downside if the differing ends actually reflect two different isoforms). I don't have any data on this variability for a good estimate though, maybe 30-50 or so.

wyim-pgl · 2018-02-16T18:14:47Z

Thanks!

Is it okay to convert h5 to BAM with bax2bam?

Does it need a .pbi file as well as a .bai?

Also, does the BAM file need to be sorted?

I will use the --ignore_ends_len option.

It looks like it runs faster with --neighbor_search_depth.

Regards,
Won

ksahlin · 2018-02-16T18:38:45Z

Yes, that is what I've been using, namely: bax2bam {hdf5_path}/*bax.h5 -o {out}. Then, for the ccs tool, we have been using the commands (based on recommended settings):

ccs --numThreads=64 --polish --minLength=10 --minPasses=1 --minZScore=-999 --maxDropFraction=0.8 --minPredictedAccuracy=0.8 --minSnr=4 {input.bam_subreads} {output.ccs_bam}

The commands were taken from the snakemake file in our evaluation repository, lines 180 and 196.
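For 30 cells it may be convenient to generate one such command per cell. The sketch below only assembles the argument list with the flags quoted above; the helper name, the cell names, and the file paths are hypothetical, and the flags reflect the ccs version used in this thread, so other versions may not accept them.

```python
import shlex


def build_ccs_command(subreads_bam, ccs_bam, threads=64):
    """Assemble the ccs invocation quoted in this thread as an argument list."""
    return [
        "ccs",
        f"--numThreads={threads}",
        "--polish",
        "--minLength=10",
        "--minPasses=1",
        "--minZScore=-999",
        "--maxDropFraction=0.8",
        "--minPredictedAccuracy=0.8",
        "--minSnr=4",
        subreads_bam,
        ccs_bam,
    ]


if __name__ == "__main__":
    # Hypothetical per-cell inputs; replace with the real bax2bam outputs.
    cells = {
        "cell01": "cell01.subreads.bam",
        "cell02": "cell02.subreads.bam",
    }
    for name, subreads in cells.items():
        command = build_ccs_command(subreads, f"{name}.ccs.bam")
        # Print for review; pass `command` to subprocess.run to execute it.
        print(shlex.join(command))
```

Printing the commands first makes it easy to inspect them before running anything. The resulting per-cell `*.ccs.bam` files are the ones that would then be merged.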

No, it does not need to be sorted. The default output from ccs works.

wyim-pgl · 2018-02-16T19:03:00Z

Thank you so much.
I am running it and will let you know.
Cheers,

Won

ksahlin · 2018-02-25T22:47:38Z

Hi again Won,

Just wanted to let you know that while working on extending the IsoCon algorithm for non-targeted data (repository not available yet), I’ve discovered additional parts in the original IsoCon code that would not scale to a non-targeted dataset (especially of size 30 cells). So I wouldn’t wait for IsoCon to try to finish. While I’m incorporating some of the changes in the IsoCon code (e.g., this commit), I still believe that IsoCon is not suitable for a non-targeted dataset (runtime-wise), unless the reads are somewhat broken into rough batches first, based on e.g. some sequence similarity and length.
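As a very rough illustration of the batching idea, the sketch below bins reads by length only. It is not what IsoCon does internally, it ignores sequence similarity entirely, and reads of the same transcript that fall on either side of a bin boundary end up in different batches. The function names and the `bin_width` default are made up for the example.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def batch_by_length(records, bin_width=500):
    """Group reads into bins of `bin_width` bp by sequence length."""
    batches = defaultdict(list)
    for name, seq in records:
        batches[len(seq) // bin_width].append((name, seq))
    return dict(batches)
```

Each batch could then be written to its own FASTA file and processed separately. A similarity-based split (e.g. by mapping or clustering first) would be a better fit for the suggestion above, but it needs external tools.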

Best,
K

wyim-pgl · 2018-02-26T17:44:03Z

Kristoffer,

Thank you for letting me know.

I will think about other ways to do this.

Regards,

wyim-pgl · 2018-03-23T03:51:11Z

Hi Kristoffer,

Is it possible to run with a subset? For example, we are targeting some specific genes. I can BLAST or map the CCS reads to them and then run IsoCon.

ksahlin · 2018-03-23T17:38:15Z

That will probably work. Just make sure that all fasta sequences are also in the ccs.bam. Let me know how large your dataset is after blasting as well. It is possible that you want to set --ignore_ends_len to higher than 15 (default) if your reads are not cut at relatively precise breakpoints.
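One way to do the subsetting is sketched below, assuming the BLAST results are in tabular format (`-outfmt 6`), where the query ID is the first column and the subject ID the second. Which column holds the read IDs depends on whether the reads were the query or the database, so it is a parameter here. The function names are made up for illustration.

```python
def hit_ids(blast_lines, column=0):
    """Collect IDs from one column of BLAST tabular (outfmt 6) output."""
    ids = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        ids.add(line.rstrip("\n").split("\t")[column])
    return ids


def filter_fasta(fasta_lines, keep):
    """Return the FASTA lines of records whose ID (first header token) is in `keep`."""
    kept, keeping = [], False
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            keeping = bool(tokens) and tokens[0] in keep
        if keeping:
            kept.append(line)
    return kept
```

The filtered FASTA is a subset of the original flnc reads, so the original ccs.bam still contains all of them and does not need to be subset as well.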

Let me know how it goes, and I'm happy to help you get the most out of this analysis.

ksahlin · 2018-03-29T15:17:42Z

Hi @Ascendo, just wanted to let you know how you can possibly make your analysis faster for a non-targeted dataset. Take-home message: cut transcripts at precise ends after blasting. See issue 2 and issue 3.

wyim-pgl closed this as completed Feb 16, 2018
