You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I got the error below when running picard liftover on the bravo-dbsnp-all.vcf.gz (GRCh38 => GRCh37).
I'm reposting this because, even though the user closed broadinstitute/gatk#4761 after they saw that bgzipping + tabixing the input vcf avoids the error, I think there are 2 issues which may not have been fixed yet(?)
if TribbleIndexedFeatureReader stops in the middle of a line, it causes a cryptic parsing error that's hard to connect to lack of tabix index
in the rare event that it stops at the end of a line, it would presumably silently continue and result in truncated output that users may not notice, and that's even harder to debug.
Exception in thread "main" htsjdk.tribble.TribbleException: Line 998904: there aren't enough columns for line 1 4723990 TOPMed_freeze_5?chr1:4,723,9 (we expected 9 tokens, and saw 3 ), for input source: file:///Users/weisburd/code/bravo/bravo-dbsnp-all.removed_chr_prefix.vcf.gz
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:281)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:262)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:64)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:365)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:346)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:307)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:324)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
Subject of the issue
The full logs are:
++ i=
++ java -Xmx50G -jar picard.jar LiftoverVcf R=hg19.fa C=hg38ToHg19.over.chain.remove_chr_prefix.gz I=bravo-dbsnp-all.removed_chr_prefix.vcf.gz O=bravo-dbsnp-all.removed_chr_prefix.liftunder_grch19.vcf.gz REJECT=bravo-dbsnp-all.removed_chr_prefix.rejected.vcf.gz TMP_DIR=./tmp VALIDATION_STRINGENCY=LENIENT WARN_ON_MISSING_CONTIG=true WRITE_ORIGINAL_POSITION=true
++ tee .log
09:33:54.465 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/weisburd/code/bravo/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
[Thu Jul 19 09:33:54 EDT 2018] LiftoverVcf INPUT=bravo-dbsnp-all.removed_chr_prefix.vcf.gz OUTPUT=bravo-dbsnp-all.removed_chr_prefix.liftunder_grch19.vcf.gz CHAIN=hg38ToHg19.over.chain.remove_chr_prefix.gz REJECT=bravo-dbsnp-all.removed_chr_prefix.rejected.vcf.gz WARN_ON_MISSING_CONTIG=true WRITE_ORIGINAL_POSITION=true TMP_DIR=[./tmp] VALIDATION_STRINGENCY=LENIENT REFERENCE_SEQUENCE=hg19.fa LOG_FAILED_INTERVALS=true LIFTOVER_MIN_MATCH=1.0 ALLOW_MISSING_FIELDS_IN_HEADER=false TAGS_TO_REVERSE=[AF] TAGS_TO_DROP=[MAX_AF] VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Thu Jul 19 09:33:54 EDT 2018] Executing as weisburd@wm4d1-03b on Mac OS X 10.12.6 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12; Deflater: Intel; Inflater: Intel; Picard version: 2.17.4-1-gb63d73e-SNAPSHOT
INFO 2018-07-19 09:33:54 LiftoverVcf Loading up the target reference genome.
INFO 2018-07-19 09:34:04 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180127-180133 failed to match chain 816 because intersection length 6 < minMatchSize 7.0 (0.85714287 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180221-180227 failed to match chain 816 because intersection length 2 < minMatchSize 7.0 (0.2857143 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180221-180227 failed to match chain 977 because intersection length 5 < minMatchSize 7.0 (0.71428573 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180801-180808 failed to match chain 23749811 because intersection length 3 < minMatchSize 8.0 (0.375 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180801-180808 failed to match chain 2410 because intersection length 3 < minMatchSize 8.0 (0.375 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180802-180808 failed to match chain 23749811 because intersection length 3 < minMatchSize 7.0 (0.42857143 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180802-180808 failed to match chain 2410 because intersection length 2 < minMatchSize 7.0 (0.2857143 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180803-180804 failed to match chain 2410 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180803-180808 failed to match chain 23749811 because intersection length 3 < minMatchSize 6.0 (0.5 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180803-180808 failed to match chain 2410 because intersection length 1 < minMatchSize 6.0 (0.16666667 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180803-180809 failed to match chain 23749811 because intersection length 4 < minMatchSize 7.0 (0.5714286 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180803-180809 failed to match chain 2410 because intersection length 1 < minMatchSize 7.0 (0.14285715 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180804-180806 failed to match chain 23749811 because intersection length 1 < minMatchSize 3.0 (0.33333334 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180863-180869 failed to match chain 7666873 because intersection length 1 < minMatchSize 7.0 (0.14285715 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180863-180869 failed to match chain 23749811 because intersection length 6 < minMatchSize 7.0 (0.85714287 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180863-180875 failed to match chain 7666873 because intersection length 7 < minMatchSize 13.0 (0.53846157 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180863-180875 failed to match chain 23749811 because intersection length 6 < minMatchSize 13.0 (0.46153846 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180868-180869 failed to match chain 7666873 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180868-180869 failed to match chain 23749811 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180868-180870 failed to match chain 7666873 because intersection length 2 < minMatchSize 3.0 (0.6666667 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180868-180870 failed to match chain 23749811 because intersection length 1 < minMatchSize 3.0 (0.33333334 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180868-180874 failed to match chain 7666873 because intersection length 6 < minMatchSize 7.0 (0.85714287 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180868-180874 failed to match chain 23749811 because intersection length 1 < minMatchSize 7.0 (0.14285715 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180868-180880 failed to match chain 7666873 because intersection length 12 < minMatchSize 13.0 (0.9230769 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180868-180880 failed to match chain 23749811 because intersection length 1 < minMatchSize 13.0 (0.07692308 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180993-181004 failed to match chain 2576 because intersection length 9 < minMatchSize 12.0 (0.75 < 1.0)
INFO 2018-07-19 09:34:06 LiftOver Interval 1:180993-181004 failed to match chain 2410
...
...
...
INFO 2018-07-19 09:34:13 LiftOver Interval 1:1663380-1663401 failed to match chain 2 because intersection length 1 < minMatchSize 22.0 (0.045454547 < 1.0)
INFO 2018-07-19 09:34:13 LiftOver Interval 1:1663399-1663409 failed to match chain 2 because intersection length 7 < minMatchSize 11.0 (0.6363636 < 1.0)
INFO 2018-07-19 09:34:22 LiftOver Interval 1:2767665-2767666 failed to match chain 4680 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)
INFO 2018-07-19 09:34:22 LiftOver Interval 1:2778770-2778774 failed to match chain 2 because intersection length 3 < minMatchSize 5.0 (0.6 < 1.0)
INFO 2018-07-19 09:34:22 LiftOver Interval 1:2778772-2778776 failed to match chain 2 because intersection length 1 < minMatchSize 5.0 (0.2 < 1.0)
INFO 2018-07-19 09:34:23 LiftOver Interval 1:2970389-2970390 failed to match chain 2 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)
[Thu Jul 19 09:34:30 EDT 2018] picard.vcf.LiftoverVcf done. Elapsed time: 0.60 minutes.
Runtime.totalMemory()=6373244928
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.tribble.TribbleException: Line 998904: there aren't enough columns for line 1 4723990 TOPMed_freeze_5?chr1:4,723,9 (we expected 9 tokens, and saw 3 ), for input source: file:///Users/weisburd/code/bravo/bravo-dbsnp-all.removed_chr_prefix.vcf.gz
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:281)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:262)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:64)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:365)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:346)
at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:307)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:324)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:269)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:108)
Your environment
$ java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
$ java -Xmx50G -jar picard.jar LiftoverVcf --version
2.17.4-1-gb63d73e-SNAPSHOT
happens on latest MacOS and CentOS
Steps to reproduce
Tell us how to reproduce this issue. If possible, include a short code snippet to demonstrate the problem.
I haven't looked through the htsjdk code, but it looks like if there's no .tbi index, the TribbleReader will return lines until it hits some threshold. It then stops & returns incomplete data.
I double checked the input .vcf with a python script and confirmed there's nothing unusual about the line it errored on - the file itself isn't corrupt/truncated.
Generating a tabix index prevents the error.
The text was updated successfully, but these errors were encountered:
I believe that this is a picard issue and should be reported there. The gatk issue that you mention is closed because it was already fixed with the update of htsjdk (version >= 2.14.2 have the fix).
You are using a snapshot of picard (concretely, version 2.17.4-1-gb63d73e-SNAPSHOT, as reported in your logs) and the htsjdk version was updated with 2.17.8. Updating picard should fix this problem. I recommend to use the latest version (2.18.9 at this moment).
If the problem persists after update it, please report it here.
Verify
This is the same Issue as
broadinstitute/gatk#4761
I got the error below when running picard liftover on the bravo-dbsnp-all.vcf.gz (GRCh38 => GRCh37).
I'm reposting this because, even though the user closed broadinstitute/gatk#4761 after they saw that bgzipping + tabixing the input vcf avoids the error, I think there are 2 issues which may not have been fixed yet(?)
Subject of the issue
The full logs are:
Your environment
Steps to reproduce
Tell us how to reproduce this issue. If possible, include a short code snippet to demonstrate the problem.
Run
on the vcf from https://bravo.sph.umich.edu/freeze5/hg38/download
Actual behaviour
I haven't looked through the htsjdk code, but it looks like if there's no .tbi index, the TribbleReader will return lines until it hits some threshold. It then stops & returns incomplete data.
I double checked the input .vcf with a python script and confirmed there's nothing unusual about the line it errored on - the file itself isn't corrupt/truncated.
Generating a tabix index prevents the error.
The text was updated successfully, but these errors were encountered: