Can IsoCon be used on nontargeted Iso-Seq data sets? #2

ksahlin · 2018-03-26T15:54:03Z

In general: No. IsoCon is designed for targeted sequencing, where the CCS flnc reads are cut at relatively precise positions (i.e., at the start and stop primer sites). If this is not the case, both the runtime and the quality of the output may suffer.

However, if a nontargeted Iso-Seq dataset is processed so that the flnc reads from a particular gene are extracted (e.g., by using the pre-cluster module from ToFU, or by aligning CCS reads to a genome/transcriptome and separating them by region) and these reads are cut at the same start and end positions, IsoCon should work well. Keep in mind, though, that if reads are "cut", the quality values associated with the CCS reads have to be cut the same way so that each quality value stays with its base. This can be done relatively easily from the bam file.
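A minimal sketch of cutting a read and its quality string together, assuming one quality character per base as in FASTQ. The function name trim_read and the coordinates are hypothetical and are not part of IsoCon:

```python
def trim_read(seq, qual, start, end):
    """Cut a read and its per-base quality string with the same 0-based,
    end-exclusive coordinates, so each quality value stays with its base."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality string differ in length")
    if not 0 <= start <= end <= len(seq):
        raise ValueError("trim coordinates fall outside the read")
    return seq[start:end], qual[start:end]


# Toy record (made-up data): keep read offsets 2 to 5 inclusive.
trimmed_seq, trimmed_qual = trim_read("ACGTACGT", "!!5555!!", 2, 6)
```

Slicing both strings with the same pair of offsets is what keeps bases and qualities in register; the length check guards against records where they were already out of sync.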

wyim-pgl · 2018-10-04T23:28:40Z

Do you have an example of how to do it?
Does "blasting" mean NCBI BLAST or read alignment?
Thanks!

ksahlin · 2018-10-05T00:45:22Z

Aligning is the better expression; any aligner that aligns CCS reads to a genome or to transcripts should work. Thanks!

I don't have an implemented example, but this simple procedure should work:

1. Align the CCS reads to a reference of choice (genomic or transcripts) using minimap2 with -a set to produce a SAM file. The minimap2 documentation lists a preset for aligning Iso-Seq reads to a genome (-ax splice:hq -uf).
2. Use samtools to extract the reads aligning to the region of interest.
3. Either run IsoCon directly on this subset of reads, or trim the reads based on the start and stop coordinates of their alignments and run IsoCon on the trimmed reads.

The "trimming" part is the only step that has no standard tool. It is possible that IsoCon works without this step, especially if the resulting dataset is small (say, fewer than 10,000 reads).
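One way the trimming step could be approached is to walk each alignment's CIGAR string and find the read offsets that correspond to a shared reference window. The sketch below illustrates that idea in plain Python. read_slice_for_ref_window is a hypothetical helper, not an IsoCon, minimap2, or samtools function, and it assumes pos is the 0-based leftmost reference coordinate (SAM POS minus 1) and that the window is 0-based and end-exclusive:

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_slice_for_ref_window(pos, cigar, ref_start, ref_end):
    """Return (q_start, q_end) offsets into the SAM SEQ/QUAL fields that
    cover the reference window [ref_start, ref_end), or None if the
    alignment has no aligned bases inside the window."""
    ref, query = pos, 0
    q_start = q_end = None
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        consumes_ref = op in "MDN=X"
        consumes_query = op in "MIS=X"
        if consumes_ref and consumes_query:
            # Aligned block: reference [ref, ref+length) pairs with
            # query [query, query+length).
            lo = max(ref, ref_start)
            hi = min(ref + length, ref_end)
            if lo < hi:
                if q_start is None:
                    q_start = query + (lo - ref)
                q_end = query + (hi - ref)
        if consumes_ref:
            ref += length
        if consumes_query:
            query += length
    if q_start is None:
        return None
    return q_start, q_end


# Toy alignment (made-up values): 5 soft-clipped bases, then aligned from
# reference position 100.
offsets = read_slice_for_ref_window(100, "5S10M2I10M3S", 105, 115)
```

The returned offsets apply to SEQ and QUAL as stored in the SAM record, so slicing both fields with them keeps each quality value with its base. Reads that only partly overlap the window come back shorter than the window and may need separate handling.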

wyim-pgl · 2018-10-05T15:24:20Z

The problem with my CCS fastq is that the quality score is 5, and the subreads fastq file has !. These look like placeholders from the SMRT analysis. Do you have any opinion on this? Thanks!

…


ksahlin · 2018-10-05T18:23:17Z

Running PacBio's CCS caller ccs with the parameter --polish on the subreads.bam files produces a ccs.bam file with base qualities. This ccs.bam file can be supplied together with a fasta file that contains only the flnc reads to IsoCon as

IsoCon pipeline -fl_reads <flnc.fasta> -outfolder </path/to/output> --ccs </path/to/filename.ccs.bam>

The flnc file can be obtained, e.g., from lima and isoseq3 cluster in the new Iso-Seq pipeline.

IsoCon can, however, also be run with only a fasta file (meaning that you would only have to convert the fastq to a fasta):

IsoCon pipeline -fl_reads <flnc.fasta> -outfolder </path/to/output>

However, since individual base qualities play a key role in the algorithm, IsoCon will likely give more accurate results with quality values.
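The fastq-to-fasta conversion mentioned above can be sketched in a few lines of Python. This is an illustration only: fastq_to_fasta is a hypothetical helper, and it assumes the common four-lines-per-record FASTQ layout (no wrapped sequences):

```python
def fastq_to_fasta(fastq_lines):
    """Yield FASTA lines from an iterable of FASTQ lines.
    Assumes exactly four lines per record: @header, sequence, +, qualities."""
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, _qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError("malformed FASTQ record: " + header)
            yield ">" + header[1:]
            yield seq
            record = []
    if record:
        raise ValueError("truncated FASTQ input")


# Toy input (made-up record); with a real file, pass the open file handle
# instead of a list and write each yielded line followed by a newline.
example = ["@read1\n", "ACGT\n", "+\n", "5555\n"]
fasta_lines = list(fastq_to_fasta(example))
```

The quality line is dropped on purpose, which is why this route gives IsoCon less information than supplying the ccs.bam file.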

ksahlin added the question label Mar 26, 2018

ksahlin mentioned this issue Mar 29, 2018

Multiple CCS.bam file #1

Closed

ksahlin mentioned this issue Apr 9, 2018

Can we used IsoCon for a organism with a large polyploid genome / PBS: job killed: mem 127935680kb exceeded limit 125829120kb #4

Open
