Error in gtf library when trying to add stringtie output #77

lukaas33 · 2024-08-26T10:13:54Z

I am getting the following error from the gtf library when trying to annotate my vcf with stringtie output.

$ vcf-expression-annotator -o /shared_dir/temp.annotated.vcf -s sample /shared_dir/temp.vep.vcf /shared_dir/temp.abundance.tsv stringtie transcript
Traceback (most recent call last):
  File "/opt/miniconda/envs/gatk/lib/python3.6/site-packages/gtfparse/read_gtf.py", line 121, in parse_with_polars_lazy
    **kwargs).lazy()
AttributeError: 'LazyFrame' object has no attribute 'lazy'

Version 2.0.1 with Python 3.6

I have also reported this issue on the gtfparse issue tracker:
openvax/gtfparse#49

lukaas33 · 2024-08-26T15:17:00Z

I have now run the same command in the latest vatools Docker image and am getting a new error:

> vcf-expression-annotator -o /shared_dir/temp.annotated.vcf -s sample /shared_dir/temp.vep.vcf /shared_dir/temp.abundance.tsv stringtie transcript
Traceback (most recent call last):
  File "parsers.pyx", line 1160, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

lukaas33 · 2024-08-26T15:20:34Z

The start of my stringtie output:

Gene ID	Gene Name	Reference	Strand	Start	End	Coverage	FPKM	TPM
ENSG00000282881	TMEM275	1	-	46532166	46543969	0.0	0.0	0.0
ENSG00000201405	Y_RNA	1	+	23370254	23370346	0.0	0.0	0.0
ENSG00000143774	GUK1	1	+	228139962	228148984	0.0	0.0	0.0
ENSG00000288775	.	1	-	159776325	159779383	0.0	0.0	0.0
ENSG00000239887	C1orf226	1	+	162378841	162386812	0.0	0.0	0.0
ENSG00000200575	RNU6-414P	1	+	61816419	61816522	0.0	0.0	0.0
ENSG00000251785	RNA5SP20	1	+	77614869	77614952	0.0	0.0	0.0
ENSG00000237872	POU5F1P4	1	+	155433178	155434262	0.0	0.0	0.0

susannasiebert · 2024-08-26T15:23:20Z

For the first error, I suspect that one of the dependencies you have installed is incompatible with gtfparse. Here are the versions in our Docker images:

gtfparse==1.3.0
numpy==1.26.1
pandas==2.1.1
pysam==0.22.0
python-dateutil==2.8.2
pytz==2023.3.post1
six==1.16.0
testfixtures==7.2.0
tzdata==2023.3
vatools==5.1.0
vcfpy==0.12.3

Try downgrading your dependencies to match these versions.
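To see which installed packages differ from the pins above, a small check along these lines may help. This is only a sketch: `PINNED` copies a subset of the versions listed above, `find_mismatches` and `installed_version` are hypothetical helper names, and `importlib.metadata` is assumed to be available (Python 3.8 or newer; on older interpreters the `importlib_metadata` backport would be needed instead).

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the pins listed above; extend as needed.
PINNED = {
    "gtfparse": "1.3.0",
    "numpy": "1.26.1",
    "pandas": "2.1.1",
    "pysam": "0.22.0",
    "vatools": "5.1.0",
    "vcfpy": "0.12.3",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (pinned, installed)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

Anything the script prints is a candidate for downgrading; an empty output means the listed packages already match.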

susannasiebert · 2024-08-26T15:29:15Z

I'm unable to reproduce the error in your second comment with the stringtie output provided. Can you please attach all of your input files (VCF and full stringtie TSV)?

lukaas33 · 2024-08-26T15:30:43Z

Hi @susannasiebert, can I send these to you privately to maintain privacy of this data?

susannasiebert · 2024-08-26T17:31:30Z

Yes, absolutely. My email is susanna.kiwala@wustl.edu

susannasiebert · 2024-08-28T14:39:06Z

My apologies for the belated replies. After investigating your files, it looks like you are trying to use a gene abundance file in transcript mode. If you switch your command to

vcf-expression-annotator -o /shared_dir/temp.annotated.vcf -s sample /shared_dir/temp.vep.vcf /shared_dir/temp.abundance.tsv stringtie gene

it works without problems.

The transcript abundance file from stringtie is in GTF format, while the gene abundance file is in TSV format. Mixing them up leads to unexpected errors like the one you are seeing. I've added issue #78 to add better error handling for this case.
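Until that error handling exists, the two file types can be told apart before running the annotator. The sketch below is a heuristic, not part of vatools: `guess_stringtie_format` is a hypothetical helper. It relies on the `Gene ID` / `Gene Name` header shown earlier in this thread and on the general GTF convention of nine tab-separated columns with `key "value";` attributes in the last one.

```python
def guess_stringtie_format(lines):
    """Guess whether an iterable of text lines is a gene TSV or a transcript GTF.

    Heuristic only: returns "gene-tsv", "transcript-gtf", or "unknown".
    """
    for line in lines:
        # Skip blank lines and GTF-style comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # The gene abundance table shown in this thread starts with this header.
        if fields[:2] == ["Gene ID", "Gene Name"]:
            return "gene-tsv"
        # GTF records have nine columns, with attributes in the last one.
        if len(fields) == 9 and 'transcript_id "' in fields[8]:
            return "transcript-gtf"
        # Only the first meaningful line is inspected.
        return "unknown"
    return "unknown"
```

An open file handle can be passed directly, for example `with open(path) as handle: kind = guess_stringtie_format(handle)`. A result of `"gene-tsv"` suggests running in `gene` mode, and `"transcript-gtf"` suggests `transcript` mode.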

lukaas33 · 2024-08-28T16:30:54Z

Ah, and sorry for my ignorance, but wouldn't I need to add both of them, i.e. the TSV and GTF files?

Is there a way to do this with one command or will this always involve two steps?

Or does the transcript level expression always contain more detail?

susannasiebert · 2024-08-28T18:10:55Z

You would need to run this as two steps, unfortunately.
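A rough sketch of chaining the two steps is shown below. It assumes the second run can take the VCF written by the first run as its input. The command shape (`-o`, `-s`, and the positional arguments) is copied from the commands in this thread; `build_two_step_commands` and all file names are hypothetical placeholders.

```python
import shlex


def build_two_step_commands(vep_vcf, gene_tsv, transcript_gtf, sample,
                            gene_vcf, final_vcf):
    """Build the gene-mode and transcript-mode commands as argument lists.

    The gene step writes gene_vcf, which the transcript step then reads.
    """
    base = ["vcf-expression-annotator", "-s", sample]
    gene_step = base + ["-o", gene_vcf, vep_vcf, gene_tsv, "stringtie", "gene"]
    transcript_step = base + ["-o", final_vcf, gene_vcf, transcript_gtf,
                              "stringtie", "transcript"]
    return [gene_step, transcript_step]


if __name__ == "__main__":
    # All paths below are placeholders for illustration.
    commands = build_two_step_commands(
        vep_vcf="temp.vep.vcf",
        gene_tsv="gene_abundance.tsv",
        transcript_gtf="transcripts.gtf",
        sample="sample",
        gene_vcf="temp.gene.vcf",
        final_vcf="temp.annotated.vcf",
    )
    for command in commands:
        # Print shell-safe command lines; run them in order once the paths are real.
        print(" ".join(shlex.quote(part) for part in command))
```

Each list can also be passed to `subprocess.run(command, check=True)` so that the transcript step only runs if the gene step succeeded.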

lukaas33 · 2024-08-28T19:36:38Z

Ah, so it is recommended to add gene-level expression in addition to transcript-level expression?

susannasiebert · 2024-08-28T20:44:46Z

Yes, the gene expression levels are, for example, used during tiering in the aggregated report.

lukaas33 · 2024-08-28T20:54:27Z

Ah, in that case it would be nice to have a feature for adding both files at once,
or support for command-line piping so that it can be done in one line and creates only one file.

griffithlab deleted a comment Aug 26, 2024

susannasiebert mentioned this issue Aug 28, 2024

Error out when trying to use a stringtie gene tsv in transcript more or a transcript gtf in gene mode #78

Open
