Multiple CCS.bam file #1
Comments
Hi Won, If you have the CCS reads in separate “*.ccs.bam” files per cell, it should be okay to simply merge them with samtools. The important thing is that all the reads in the flnc fasta file are also found in the bam file. However, is your Iso-Seq dataset targeted or not? IsoCon is designed for use with targeted Iso-Seq sequencing. If you have a non-targeted dataset, the algorithm will likely not scale (in runtime), since IsoCon uses an alignment strategy that is optimized for highly similar sequences. There is a way to control this (set a low value for --neighbor_search_depth, e.g. --neighbor_search_depth 1000 or lower), but it will likely affect the quality of the output. We are currently working on an approach for non-targeted data that uses many of IsoCon's ideas, and I hope to release this repository soon. Best,
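The requirement that every read in the flnc fasta also appears in the merged bam can be checked before starting a run. The following is only a minimal sketch, not part of IsoCon: the helper names (`zmw_key`, `fasta_read_names`, `sam_read_names`, `missing_from_bam`) are hypothetical, and it assumes that flnc and CCS names for the same molecule share their first two "/"-separated fields (movie/ZMW). If your naming differs, pass a different `key`.

```python
from typing import Callable, Iterable, Set


def zmw_key(read_name: str) -> str:
    """Reduce a read name to its first two '/'-separated fields.

    Assumption: FLNC and CCS names of the same molecule share this prefix.
    """
    return "/".join(read_name.split("/")[:2])


def fasta_read_names(fasta_lines: Iterable[str]) -> Set[str]:
    """Collect read names (first token after '>') from FASTA lines."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def sam_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect read names (first column) from SAM text, skipping '@' headers."""
    names = set()
    for line in sam_lines:
        line = line.rstrip("\n")
        if line and not line.startswith("@"):
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_bam(
    fasta_lines: Iterable[str],
    sam_lines: Iterable[str],
    key: Callable[[str], str] = zmw_key,
) -> Set[str]:
    """Return FASTA read names whose key has no counterpart in the SAM text."""
    bam_keys = {key(name) for name in sam_read_names(sam_lines)}
    return {name for name in fasta_read_names(fasta_lines) if key(name) not in bam_keys}
```

One way to use it is to feed the text output of `samtools view` on the merged bam as `sam_lines` and the open flnc fasta file as `fasta_lines`. An empty result means every fasta read was matched under the chosen key.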
Dear Kristoffer, Thank you for your comment. Cheers, Won
Hi, No, IsoCon does not need the pulse features, it only needs the quality values that were generated for the CCS reads, i.e., the ccs bam file should be the output generated by the tool ccs. Ok, good to know about the nontargeted. I will definitely let you know when we have the nontargeted approach ready. Non-targeted data has more variable cut points at the end of transcripts and this can cause some redundancy in IsoCon. There is a parameter for that as well
Thanks! Is it okay to convert h5 to bam through bax2bam? Does it need a .pbi file as well as a .bai? Also, does the bam file need to be sorted? I will use this option --ignore_ends_len. It looks like it processes faster with --neighbor_search_depth. Regards,
Yes, that is what I've been using, namely:
The commands were taken from the snakemake file in our evaluation repository, lines 180 and 196. No, it does not need to be sorted. The default output from
Thank you so much. Won
Hi again Won, Just wanted to let you know that while working on extending the IsoCon algorithm for nontargeted data (repository not available yet), I’ve discovered additional parts in the original IsoCon code that would not scale to a nontargeted dataset (especially one of 30 cells). So I wouldn’t wait for IsoCon to try to finish. While I’m incorporating some of the changes in the IsoCon code (e.g., this commit), I still believe that IsoCon is not suitable for a nontargeted dataset (runtime-wise), unless the reads are first broken into rough batches, based on e.g. some sequence similarity and length. Best,
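To make the batching idea concrete, here is a rough sketch that groups reads by length only. It is not IsoCon functionality: `batch_by_length` and its `bin_width` parameter are hypothetical, and it does not do the sequence-similarity step mentioned above, which would need a separate clustering or mapping pass.

```python
from typing import Dict, List, Tuple


def batch_by_length(
    reads: List[Tuple[str, str]], bin_width: int = 500
) -> Dict[int, List[Tuple[str, str]]]:
    """Group (name, sequence) pairs into bins of `bin_width` bases by length.

    Bin k holds reads with k * bin_width <= len(sequence) < (k + 1) * bin_width.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    batches: Dict[int, List[Tuple[str, str]]] = {}
    for name, seq in reads:
        batches.setdefault(len(seq) // bin_width, []).append((name, seq))
    return batches
```

Each batch could then be written to its own fasta file and run separately. Note that fixed bins split reads whose lengths fall just on either side of a boundary, so transcripts of nearly equal length can end up in different batches.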
Kristoffer, Thank you for letting me know. I will think about more ways to do this. Regards,
Hi Kristoffer, Is it possible to run with a subset? For example, we are targeting some specific genes. I can blast or map the CCS reads to them and then run IsoCon.
That will probably work. Just make sure that all fasta sequences are also in the ccs.bam. Let me know how large your dataset is after blasting as well. It is possible that you want to set Let me know how it goes and I'm happy to help you get the most out of this analysis.
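The subsetting step could look roughly like the sketch below. It assumes the reads were used as BLAST queries against the target genes and that the hits were written in tabular format (`-outfmt 6`, query ID in the first column). The function names `hit_query_ids` and `subset_fasta` are hypothetical and not part of IsoCon or BLAST.

```python
from typing import Iterable, List, Set


def hit_query_ids(blast_tab_lines: Iterable[str]) -> Set[str]:
    """Collect query IDs (first column) from tabular BLAST output."""
    ids = set()
    for line in blast_tab_lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            ids.add(line.split("\t", 1)[0])
    return ids


def subset_fasta(fasta_lines: Iterable[str], keep_ids: Set[str]) -> List[str]:
    """Return the FASTA lines of records whose name is in `keep_ids`."""
    kept: List[str] = []
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The record name is the first token after '>'.
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in keep_ids
        if keep:
            kept.append(line.rstrip("\n"))
    return kept
```

Because the subset is taken from the flnc fasta itself, its reads stay present in the ccs.bam as long as the full flnc file was consistent with the bam to begin with.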
Dear Kristoffer,
Hello,
I am trying to use IsoCon for my transcriptome.
We have 30 cells to analyze and my flnc file was generated from all 30 cells.
Is it okay to use a bam file merged with samtools, or one made with bax2bam?
Thank you.
Won