gtf2gtf - not selecting longest transcript? #293

Open
ejduncan opened this issue Nov 2, 2016 · 8 comments

Comments

ejduncan commented Nov 2, 2016

I want to extract the longest transcript for each gene from a GTF file (or GFF3 file). I have installed cgat gtf2gtf and have tried various parameters to do this with Drosophila melanogaster r6.12.gtf. It pulls out a single transcript for each gene, but not necessarily the longest one (e.g. ocm-RB is selected even though it is shorter than ocm-RA, and GlyS-RA is selected even though it is shorter than GlyS-RB).

I was just wondering if anyone else has had problems like this and could give me some advice on how to solve it?

Thanks in advance!
Liz

Member

Acribbs commented Nov 14, 2017

Sorry for not replying, it seems as though your issue was missed over a year ago! Did you manage to solve your issue?

ejduncan reopened this Nov 15, 2017
@ejduncan
Author

I didn't manage to solve this, unfortunately, and I am just about to run another (quite large) set of analyses. Any help or advice would be greatly appreciated! Thanks.

Member

Acribbs commented Nov 15, 2017

Are you able to provide a few lines of example input, the command you used, and the output, so I can recreate and understand your issue? Thanks

AndreasHeger added a commit that referenced this issue Nov 17, 2017
@AndreasHeger
Member

Hi @ejduncan and @Acribbs. I think there were two issues. The length calculation did not count only exons, but also included any other annotations. This was a bug and is now fixed: 'length' is now counted based only on the --exon feature.
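
To illustrate the exon-only length idea, here is a minimal Python sketch. It is not the gtf2gtf implementation; the function names `parse_attributes` and `exon_lengths` and the toy records are made up for illustration, and it assumes simple GTF attributes without semicolons inside quoted values.

```python
from collections import defaultdict


def parse_attributes(text):
    """Parse a GTF attribute column into a dict (hypothetical helper).

    Assumes values contain no ';' inside the quotes.
    """
    attrs = {}
    for part in text.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def exon_lengths(gtf_lines, feature="exon"):
    """Sum the lengths of `feature` records per transcript_id.

    Records of any other feature type (CDS, start_codon, ...) are skipped,
    so they cannot inflate the transcript length.
    """
    lengths = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        start, end = int(fields[3]), int(fields[4])  # GTF is 1-based, inclusive
        attrs = parse_attributes(fields[8])
        lengths[attrs["transcript_id"]] += end - start + 1
    return dict(lengths)


# Toy records (made-up coordinates): two exons plus one CDS for transcript t1.
toy = [
    'chr1\ttoy\texon\t100\t199\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttoy\tCDS\t120\t180\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
    'chr1\ttoy\texon\t300\t349\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]

print(exon_lengths(toy))  # the CDS record is ignored
```

Counting the CDS record as well would add 61 bases to t1, which is the kind of over-counting described above.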

Also, "longest-transcript" is unfortunately a bit ambiguous. Here, the longest transcript is the one with the greatest "transcript-length", which might not be the one with the longest genomic span. I have added more options to make this clearer: --filter-method can now be longest-transcript-genomic-span, longest-transcript-transcript-length, or longest-transcript-exon-count.

I now get:

transcript-length: ocm-RA, GlyS-RB
genomic-span: ocm-RB, GlyS-RC

Hope this is better.
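
For anyone wondering why the criteria can disagree, here is a small Python sketch with made-up exon coordinates (not the real ocm or GlyS annotations). The names `summarise` and `pick` and the criterion labels are hypothetical and are not gtf2gtf's internals; the sketch only shows that transcript length, genomic span, and exon count can each favour a different transcript of the same gene.

```python
# (gene_id, transcript_id, exon_start, exon_end); 1-based inclusive, made-up data
EXONS = [
    ("g1", "tA", 100, 599),
    ("g1", "tA", 700, 1199),    # tA: two long exons, compact locus
    ("g1", "tB", 100, 199),
    ("g1", "tB", 4900, 4999),   # tB: two short exons, one huge intron
    ("g1", "tC", 100, 149),
    ("g1", "tC", 300, 349),
    ("g1", "tC", 500, 549),     # tC: three very short exons
]


def summarise(exons):
    """Collect summed exon length, outer coordinates and exon count per transcript."""
    stats = {}
    for gene, tx, start, end in exons:
        s = stats.setdefault(
            (gene, tx), {"length": 0, "start": start, "end": end, "exons": 0}
        )
        s["length"] += end - start + 1
        s["start"] = min(s["start"], start)
        s["end"] = max(s["end"], end)
        s["exons"] += 1
    return stats


def pick(exons, criterion):
    """Return {gene_id: transcript_id} maximising the chosen criterion.

    On a tie, the transcript seen first is kept.
    """
    keyfuncs = {
        "transcript-length": lambda s: s["length"],
        "genomic-span": lambda s: s["end"] - s["start"] + 1,
        "exon-count": lambda s: s["exons"],
    }
    key = keyfuncs[criterion]
    best = {}
    for (gene, tx), s in summarise(exons).items():
        if gene not in best or key(s) > key(best[gene][1]):
            best[gene] = (tx, s)
    return {gene: tx for gene, (tx, _) in best.items()}


for criterion in ("transcript-length", "genomic-span", "exon-count"):
    print(criterion, pick(EXONS, criterion))
```

With this toy gene, tA has the greatest summed exon length, tB the widest genomic span, and tC the most exons, so each criterion selects a different transcript.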

Member

Acribbs commented Nov 17, 2017

@AndreasHeger Thanks for the explanation.

Member

Acribbs commented Nov 24, 2017

@AndreasHeger @ejduncan Can this issue be closed?

@AndreasHeger
Member

OK for me, but it would be good to know whether it now behaves as expected for @ejduncan.

Author

ejduncan commented Nov 27, 2017 via email
