Can IsoCon be used on nontargeted Iso-Seq data sets? #2

Open
ksahlin opened this issue Mar 26, 2018 · 4 comments

@ksahlin
Owner

ksahlin commented Mar 26, 2018

In general: No. IsoCon is designed for targeted sequencing, where the CCS flnc reads are cut at relatively precise positions (i.e., at the start and stop primer sites). If this is not the case, both the runtime and the quality of the output may suffer.

However, if a nontargeted Iso-Seq dataset is processed so that the flnc reads from a particular gene are extracted (e.g., by using the pre-cluster module from ToFU, or by aligning CCS reads to a genome/transcriptome and separating them by region) and these reads are cut at the same start and end positions, IsoCon should work well. Keep in mind, though, that if reads are "cut", the quality values associated with the CCS reads must be cut the same way so that each base keeps its own quality value. This can be done relatively easily from the BAM file.
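To illustrate the point about quality values, here is a minimal Python sketch. The function name `trim_read` and its 0-based, end-exclusive coordinates are assumptions made for illustration and are not part of IsoCon; reading the sequence and qualities out of the BAM file is not shown. It simply cuts a read and its quality string with identical bounds, so each quality value stays with its base.

```python
from typing import Tuple


def trim_read(seq: str, qual: str, start: int, end: int) -> Tuple[str, str]:
    """Cut a read and its per-base quality string at the same positions.

    Hypothetical helper: `start` is 0-based and inclusive, `end` is exclusive.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string must have equal length")
    if not 0 <= start <= end <= len(seq):
        raise ValueError("trim coordinates out of range")
    # Using identical slice bounds keeps every quality value next to its base.
    return seq[start:end], qual[start:end]


print(trim_read("ACGTACGT", "!!IIII##", 2, 6))  # ('GTAC', 'IIII')
```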

@wyim-pgl

wyim-pgl commented Oct 4, 2018

Do you have an example of how to do it?
Does "blasting" mean NCBI BLAST or read alignment?
Thanks!

@ksahlin
Owner Author

ksahlin commented Oct 5, 2018

Aligning is the better term; any aligner that aligns CCS reads to a genome or to transcripts should work. Thanks!

As for an implemented example, I don't have one. But this simple procedure should work:

  1. Align the CCS reads to a reference of your choice (genome or transcripts) using minimap2 with -a set, so that it produces a SAM file. Minimap2 should have a parameter combination customized for aligning Iso-Seq reads.
  2. Use samtools to extract the reads aligning to the region of interest.
  3. Either run IsoCon directly on this subset of reads, or first trim the reads to the start and end coordinates of their alignments and run IsoCon on the trimmed reads.
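As a rough illustration of what step 2 selects, the Python sketch below reads SAM text lines and yields the names of primary, mapped reads whose alignment overlaps a region. The helpers `reference_span` and `reads_in_region` are hypothetical and are not a samtools replacement; in practice samtools performs region queries on a sorted, indexed BAM. The reference span is computed from POS plus the CIGAR operations that consume the reference (M, D, N, =, X), as defined in the SAM specification.

```python
import re
from typing import Iterable, Iterator, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR ops that advance along the reference


def reference_span(pos: int, cigar: str) -> Tuple[int, int]:
    """1-based, inclusive (start, end) of the reference region an alignment covers."""
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + ref_len - 1


def reads_in_region(sam_lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[str]:
    """Yield names of primary, mapped reads overlapping chrom:start-end (1-based, inclusive)."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos, cigar = fields[0], int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*" or rname != chrom:
            continue  # unmapped, or aligned to another reference sequence
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) record
        aln_start, aln_end = reference_span(pos, cigar)
        if aln_start <= end and aln_end >= start:
            yield qname
```

For example, `reads_in_region(open("aln.sam"), "chr1", 300, 400)` would yield the names of reads overlapping that interval; the file name and coordinates are placeholders.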

The trimming step is the only one without a standard tool. But IsoCon may well work without it, especially if the resulting dataset is small (say, fewer than 10,000 reads).
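Since there is no standard tool for the trimming step, here is one possible approach as a Python sketch, under stated assumptions: each SAM record is a primary alignment that uses soft clipping (hard-clipped records are rejected, since their SEQ lacks the clipped bases) and carries a QUAL string. The aligned part of a read is what lies between the leading and trailing soft clips in its CIGAR, so cutting SEQ and QUAL there trims the read to its alignment start and end. Reverse-strand records (flag 0x10) store SEQ and QUAL reverse-complemented, so the sketch restores the original read orientation. The function names are hypothetical.

```python
import re
from typing import Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def aligned_query_interval(cigar: str, seq_len: int) -> Tuple[int, int]:
    """0-based, end-exclusive part of SEQ covered by the alignment (soft clips removed)."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not ops:
        raise ValueError("record has no CIGAR (unmapped?)")
    if any(op == "H" for _, op in ops):
        raise ValueError("hard-clipped record: SEQ does not hold the full read")
    lead = ops[0][0] if ops[0][1] == "S" else 0
    trail = ops[-1][0] if ops[-1][1] == "S" else 0
    return lead, seq_len - trail


def trim_sam_record(line: str) -> Tuple[str, str, str]:
    """Return (name, sequence, qualities) trimmed to the alignment, in original read orientation."""
    fields = line.rstrip("\n").split("\t")
    name, flag, cigar, seq, qual = fields[0], int(fields[1]), fields[5], fields[9], fields[10]
    if qual == "*":
        raise ValueError("record has no base qualities")
    start, end = aligned_query_interval(cigar, len(seq))
    seq, qual = seq[start:end], qual[start:end]  # identical cut for bases and qualities
    if flag & 0x10:
        # Reverse-strand records store SEQ/QUAL reverse-complemented; undo that.
        seq, qual = seq.translate(COMPLEMENT)[::-1], qual[::-1]
    return name, seq, qual
```

Only the clipped ends are removed, so introns spanned by N operations in a spliced alignment do not shorten the trimmed read.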

@wyim-pgl

wyim-pgl commented Oct 5, 2018 via email

@ksahlin
Owner Author

ksahlin commented Oct 5, 2018

Running PacBio's CCS caller ccs with the parameter --polish on the subreads.bam files produces a ccs.bam file with base qualities. This ccs.bam file can be supplied to IsoCon, together with a FASTA file that contains only the flnc reads, as

IsoCon pipeline -fl_reads <flnc.fasta> -outfolder </path/to/output> --ccs </path/to/filename.ccs.bam>

The flnc file can be obtained, e.g., from lima and isoseq3 cluster in the new Iso-Seq pipeline.

However, IsoCon can also be run with only a FASTA file (meaning that you would only have to convert the FASTQ to FASTA):

IsoCon pipeline -fl_reads <flnc.fasta> -outfolder </path/to/output>
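The FASTQ-to-FASTA conversion is simple enough to sketch in Python. This assumes standard four-line FASTQ records (sequence and quality each on a single line), which is an assumption about the input file rather than something IsoCon specifies; the file names in the usage comment are placeholders.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert four-line FASTQ records to FASTA lines, discarding the qualities."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if not header.startswith("@") or qual is None or not plus.startswith("+"):
            raise ValueError(f"malformed or truncated FASTQ record: {header!r}")
        yield ">" + header[1:]
        yield seq.rstrip("\n")


# Example usage (file names are placeholders):
# with open("flnc.fastq") as fq, open("flnc.fasta", "w") as fa:
#     for out_line in fastq_to_fasta(fq):
#         fa.write(out_line + "\n")
```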

Still, since individual base qualities play a key role in the algorithm, IsoCon will likely give more accurate results when quality values are supplied.
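For context on what a FASTA-only run gives up: in the standard Phred+33 encoding used in FASTQ and SAM text, each quality character c encodes Q = ord(c) - 33, and the estimated probability that the base is wrong is 10^(-Q/10). The short Python sketch below decodes a quality string this way. It only illustrates the encoding and makes no claim about how IsoCon weighs these values internally; the function name is hypothetical.

```python
from typing import List


def phred33_error_probabilities(qual: str) -> List[float]:
    """Per-base error probabilities from a Phred+33 quality string: P = 10 ** (-Q / 10)."""
    return [10 ** (-(ord(c) - 33) / 10) for c in qual]


# '+', '5' and '?' encode Q10, Q20 and Q30, i.e. about 0.1, 0.01 and 0.001.
print(phred33_error_probabilities("+5?"))
```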
