
bcftools release 1.17:

daviesrob released this 21 Feb 14:31
1.17

Download the source code here: bcftools-1.17.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete, as they don't bundle HTSlib and are missing some generated files.)

Changes affecting the whole of bcftools, or multiple commands:

  • The -i/-e filtering expressions

    • Error checks were added to prevent incorrect use of vector arithmetic. For example, when evaluating the sum of two vectors A and B, the resulting vector could contain nonsense values if the input vectors were not of the same length. The fix introduces the following logic:

      • evaluate to C_i = A_i + B_i when length(A)==length(B) and set length(C)=length(A)
      • evaluate to C_i = A_i + B_0 when length(B)==1 and set length(C)=length(A)
      • evaluate to C_i = A_0 + B_i when length(A)==1 and set length(C)=length(B)
      • throw an error when length(A)!=length(B) AND length(A)!=1 AND length(B)!=1
    • Arrays in Number=R tags can now be subscripted by alleles found in FORMAT/GT. For example,
      FORMAT/AD[GT] > 10 .. require support of more than 10 reads for each allele
      FORMAT/AD[0:GT] > 10 .. same as above, but in the first sample
      sSUM(FORMAT/AD[GT]) > 20 .. require total sample depth greater than 20
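The two filtering rules above can be sketched in Python. This is a minimal illustration, not bcftools' C implementation. `vec_add` applies the length rules for adding vectors A and B. `ad_by_gt` picks the Number=R values indexed by a sample's GT alleles. Both function names are hypothetical. Counting each distinct allele once (so a 1/1 genotype contributes its AD value only once) is an assumption of this sketch.

```python
import re
from typing import List, Sequence


def vec_add(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """C = A + B using the length rules listed above."""
    if len(a) == len(b):
        return [x + y for x, y in zip(a, b)]  # element-wise, length(C) = length(A)
    if len(b) == 1:
        return [x + b[0] for x in a]  # broadcast the single value B_0
    if len(a) == 1:
        return [a[0] + y for y in b]  # broadcast the single value A_0
    raise ValueError(f"incompatible vector lengths: {len(a)} and {len(b)}")


def ad_by_gt(ad: Sequence[int], gt: str) -> List[int]:
    """Return the AD entries for the alleles present in GT (hypothetical helper)."""
    indices: List[int] = []
    for allele in re.split(r"[/|]", gt):
        if allele == ".":
            continue  # a missing allele selects nothing
        idx = int(allele)
        if idx not in indices:  # assumption: each distinct allele is counted once
            indices.append(idx)
    return [ad[i] for i in indices]


if __name__ == "__main__":
    print(vec_add([1, 2, 3], [10]))  # [11, 12, 13]
    selected = ad_by_gt([12, 15], "0/1")
    print(all(v > 10 for v in selected))  # analogue of FORMAT/AD[GT] > 10 -> True
    print(sum(selected) > 20)  # analogue of sSUM(FORMAT/AD[GT]) > 20 -> True
```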

  • The commands consensus -H and +split-vep -H

    • Drop the unnecessary leading space in the first header column, which is now printed as #[1]columnName instead of the previous # [1]columnName (#1856)

Changes affecting specific commands:

  • bcftools +allele-length

    • Fix overflow for indels longer than 512bp, and aggregate alleles of 512bp or longer into the same bin (#1837)
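The overflow fix amounts to clamping the bin index. The sketch below is minimal and assumes a fixed-size histogram: one bin per length below 512bp, plus a shared final bin for everything 512bp or longer. The names are hypothetical, and the plugin's actual data layout may differ.

```python
from typing import Iterable, List

MAX_LEN = 512  # lengths >= MAX_LEN share the last bin (cap taken from the note above)


def length_histogram(lengths: Iterable[int]) -> List[int]:
    """Count allele lengths, aggregating all lengths >= MAX_LEN into one bin."""
    counts = [0] * (MAX_LEN + 1)  # bins 0..511 are exact, bin 512 means ">= 512"
    for n in lengths:
        counts[min(n, MAX_LEN)] += 1  # clamp instead of indexing past the array
    return counts


if __name__ == "__main__":
    hist = length_histogram([1, 1, 511, 512, 600])
    print(hist[1], hist[511], hist[512])  # 2 1 2
```

Without the `min`, a 600bp allele would index past the end of the array. In C, that would be an out-of-bounds write rather than a Python `IndexError`.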
  • bcftools annotate

    • Support sample reordering of annotation file (#1785)

    • Restore lost functionality of the --pair-logic option (#1808)

  • bcftools call

    • Fix a bug where too many alleles passed to -C alleles via -T caused memory corruption (#1790)

    • Fix a bug where indels constrained with -C alleles -T would sometimes be missed (#1706)

  • bcftools consensus

    • BREAKING CHANGE: the -I, --iupac-codes option now outputs IUPAC codes based on the FORMAT/GT of all samples. The -s, --samples and -S, --samples-file options can be used to subset samples. To ignore samples and consider only the REF and ALT columns (the original behavior prior to 1.17), run with -s - (#1828)
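Here is a sketch of the new -I behavior for a single SNV site. The bases carried by the samples' genotypes are pooled and mapped to the standard IUPAC ambiguity code. Passing no genotypes mimics the old REF+ALT behavior selected by -s -. Three things are assumptions of this sketch, not documented bcftools behavior: the function name, the restriction to single-base A/C/G/T alleles, and falling back to REF when no allele is called.

```python
import re
from typing import Iterable, Optional, Sequence, Set

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def iupac_code(ref: str, alts: Sequence[str], gts: Optional[Iterable[str]]) -> str:
    """IUPAC code for one SNV site; gts=None means 'use REF and ALT only'."""
    alleles = [ref, *alts]
    if any(len(a) != 1 or a not in "ACGT" for a in alleles):
        raise ValueError("this sketch handles single-base A/C/G/T alleles only")
    if gts is None:
        bases: Set[str] = set(alleles)  # pre-1.17 behavior: REF and ALT columns
    else:
        bases = set()
        for gt in gts:
            for allele in re.split(r"[/|]", gt):
                if allele != ".":
                    bases.add(alleles[int(allele)])  # base carried by this genotype
    if not bases:
        return ref  # assumption: no called alleles, so keep the reference base
    return IUPAC[frozenset(bases)]


if __name__ == "__main__":
    print(iupac_code("A", ["G"], ["0/0", "0/1"]))  # R
    print(iupac_code("A", ["G"], ["0/0", "0|0"]))  # A
    print(iupac_code("A", ["G"], None))  # R
```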
  • bcftools convert

    • Make variantkey conversion work for sites without an ALT allele (#1806)
  • bcftools csq

    • Fix a bug where a MNV with multiple consequences (e.g. missense + stop_gained) would report only the less severe one (#1810)

    • GFF file parsing was made slightly more flexible: IDs can now be just XXX rather than, for example, gene:XXX

    • New gff2gff Perl script to fix GFF formatting differences
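One way to picture the relaxed ID handling in csq is to normalize the GFF3 ID attribute, so that both gene:XXX and plain XXX resolve to the same identifier. This is a minimal sketch. The prefix list (the gene: and transcript: prefixes seen in Ensembl GFF3 files) and the helper names are assumptions, not csq's actual parser.

```python
from typing import Dict

PREFIXES = ("gene:", "transcript:")  # assumption: Ensembl-style ID prefixes


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse the GFF3 attribute column (key=value pairs separated by ';')."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if field:  # skip the empty field left by a trailing ';'
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def feature_id(column9: str) -> str:
    """Return the ID with any known prefix removed, so gene:XXX and XXX match."""
    value = parse_attributes(column9).get("ID", "")
    for prefix in PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


if __name__ == "__main__":
    print(feature_id("ID=gene:ENSG0001;Name=ABC"))
    print(feature_id("ID=ENSG0001;Name=ABC"))
```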

  • bcftools +fill-tags

    • More of the available annotations are now added by the -t all option
  • bcftools +fixref

    • New INFO/FIXREF annotation

    • New -m swap mode

  • bcftools +mendelian

    • The +mendelian plugin has been deprecated and replaced with +mendelian2. The function of the plugin is the same, but the command line options and the output format have changed, and for this reason it was introduced as a new plugin.
  • bcftools mpileup

    • Most of the annotations generated by mpileup are now optional via the -a, --annotate option, and several new (mostly experimental) annotations were added.

    • New option --indels-2.0 for an EXPERIMENTAL indel calling model. This model aims to address some known deficiencies of the current indel calling algorithm; specifically, it uses a diploid reference consensus sequence. Note that in the current version it has the potential to increase sensitivity, but at the cost of decreased specificity.

    • Make the FS annotation (Fisher exact test strand bias) functional and remove it from the default annotations
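A strand-bias score of this kind is usually a two-sided Fisher exact test on the 2x2 table of forward/reverse read counts for REF and ALT, reported on the Phred scale. The sketch below follows that common (GATK-style) definition. Whether bcftools scales or orients its FS value exactly this way is an assumption, and the function names are hypothetical.

```python
from math import comb, log10


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more probable than the observed one (small tolerance for ties).
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def phred_strand_bias(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Phred-scaled strand-bias score; values near 0 mean no evidence of bias."""
    p = fisher_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return -10 * log10(max(p, 1e-300))


if __name__ == "__main__":
    # ALT reads seen only on the forward strand give a large score.
    print(round(phred_strand_bias(10, 10, 10, 0), 2))
```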

  • bcftools norm

    • New --multi-overlaps option allows setting overlapping alleles either to the ref allele (the current default) or to a missing allele (#1764 and #1802)

    • Fix a bug in -m - which did not split missing FORMAT values correctly and could lead to empty FORMAT fields such as :: instead of the correct :.: (#1818)

    • The --atomize option previously did not split complex indels such as C>GGG. These are now split into two records, C>G and C>CGG (#1832)
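Two of the norm changes are easy to picture in code. `split_gt` shows what happens to a genotype when a multiallelic record is split. The ALT being kept becomes 1, and any other (overlapping) ALT becomes either the ref allele or a missing allele, which are the two choices described for --multi-overlaps. `atomize_snv_ins` shows the C>GGG case from #1832. Both are hypothetical helpers written for illustration. They do not reproduce bcftools' option values or cover every case norm handles.

```python
import re
from typing import List, Tuple


def split_gt(gt: str, keep_alt: int, overlap: str = "0") -> str:
    """Recode GT for the biallelic record of ALT number keep_alt (1-based).

    overlap is "0" (set other ALTs to REF) or "." (set them to missing).
    """
    out = []
    for tok in re.split(r"([/|])", gt):  # keep the separators to preserve phasing
        if tok in ("/", "|", "."):
            out.append(tok)
        elif int(tok) == 0:
            out.append("0")
        elif int(tok) == keep_alt:
            out.append("1")
        else:
            out.append(overlap)  # an overlapping allele from another ALT
    return "".join(out)


def atomize_snv_ins(pos: int, ref: str, alt: str) -> List[Tuple[int, str, str]]:
    """Split a 1-base REF vs a longer ALT with a different first base (e.g. C>GGG)."""
    if len(ref) == 1 and len(alt) > 1 and alt[0] != ref:
        snv = (pos, ref, alt[0])  # C>G
        insertion = (pos, ref, ref + alt[1:])  # C>CGG
        return [snv, insertion]
    return [(pos, ref, alt)]  # other allele shapes are outside this sketch


if __name__ == "__main__":
    print(split_gt("1/2", 1), split_gt("1/2", 1, "."))  # 1/0 1/.
    print(atomize_snv_ins(100, "C", "GGG"))
```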

  • bcftools query

    • Fix a rare bug where printing of the SAMPLE field was incorrectly suppressed when the -e option contained a sample expression but the formatting query did not. See #1783 for details.
  • bcftools +setGT

    • Add new --new-gt X option (#1800)

    • Add new --target-gt r:FLOAT option to randomly select a proportion of genotypes (#1850)

    • Fix a bug where -t ./x mode was advertised as selecting both phased and unphased half-missing genotypes, but was in fact selecting only unphased genotypes (#1844)
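The two genotype-selection changes can be sketched as follows. `is_half_missing` treats both ./1 and .|1 (in either order) as half-missing, which is the behavior the -t ./x fix restores for phased genotypes. `select_random` is one plausible reading of --target-gt r:FLOAT: it draws each genotype independently with probability FLOAT. The notes do not say whether the plugin draws per genotype or selects an exact proportion, so that part is an assumption, and both function names are hypothetical.

```python
import random
import re
from typing import List, Optional, Sequence


def is_half_missing(gt: str) -> bool:
    """True for diploid GTs with exactly one missing allele, phased or unphased."""
    alleles = re.split(r"[/|]", gt)
    return len(alleles) == 2 and alleles.count(".") == 1


def select_random(gts: Sequence[str], fraction: float, seed: Optional[int] = None) -> List[int]:
    """Return indices of genotypes picked independently with probability `fraction`."""
    rng = random.Random(seed)  # a seeded generator makes the selection reproducible
    return [i for i in range(len(gts)) if rng.random() < fraction]


if __name__ == "__main__":
    gts = ["./1", ".|1", "1|.", "0/1", "./."]
    print([gt for gt in gts if is_half_missing(gt)])
    print(select_random(gts, 0.5, seed=42))
```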

  • bcftools +split-vep

    • New options -g, --gene-list and --gene-list-fields, which allow prioritizing consequences from a list of genes or restricting output to the listed genes

    • New -H, --print-header option to print the header with -f

    • Work around a bug in the LOFTEE VEP plugin used to annotate gnomAD VCFs. In these files the LoF_info subfield contains commas, which in general make it impossible to parse the VEP subfields. The +split-vep plugin can now work with such files, replacing the offending commas with slash (/) characters. See also Ensembl/ensembl-vep#1351

    • The -c, --columns option can now be omitted when a subfield is used in an -i/-e filtering expression. Note that -c may still be required when the type of the subfield cannot be inferred. This is an experimental feature.
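The LOFTEE comma workaround can be illustrated with a heuristic. It relies on every CSQ entry having the same number of |-separated subfields, as declared in the header. A comma is treated as part of a subfield, and replaced with /, in two cases: it arrives before an entry is complete, or the chunk after it contains no | at all. This is a hypothetical reconstruction for illustration, not the +split-vep source. It would misparse an entry whose first subfield contained a comma.

```python
from typing import List


def split_csq(value: str, n_fields: int) -> List[List[str]]:
    """Split a CSQ value into entries, repairing commas that sit inside subfields."""
    entries: List[str] = []
    for chunk in value.split(","):
        if entries and entries[-1].count("|") < n_fields - 1:
            entries[-1] += "/" + chunk  # previous entry incomplete: the comma was inside a subfield
        elif entries and n_fields > 1 and "|" not in chunk:
            entries[-1] += "/" + chunk  # stray text after a complete entry: comma in its last subfield
        else:
            entries.append(chunk)  # start of a new consequence entry
    return [entry.split("|") for entry in entries]


if __name__ == "__main__":
    csq = "A|GENE1|HC|x,y,T|GENE2|LC|z"  # hypothetical 4-subfield CSQ with a comma in LoF_info
    for fields in split_csq(csq, 4):
        print(fields)
```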

  • bcftools stats

    • Fix a bug where the per-sample stats (PSC) would not be computed when the -i/-e filtering options and the -s - option were given but the expression did not include sample columns (#1835)
  • bcftools +tag2tag

    • Revamp of the plugin to allow a wider range of tag conversions, specifically all combinations from FORMAT/GL,PL,GP to FORMAT/GL,PL,GP,GT
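These conversions follow from the VCF definitions of the tags:

- GL holds log10-scaled likelihoods.
- PL holds -10 times that value, rounded to an integer and conventionally normalized so the best genotype is 0.
- GP holds probabilities.
- Diploid genotypes are indexed in the VCF order 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, and so on.

The sketch below converts between the tags. It assumes a flat genotype prior for the GL/GP conversions and picks the most likely genotype for GT. The function names are hypothetical, and +tag2tag's own choices (priors, rounding, thresholds) may differ.

```python
from math import log10
from typing import List, Sequence


def gl_to_pl(gl: Sequence[float]) -> List[int]:
    """Phred-scale log10 likelihoods and normalize so the best genotype has PL 0."""
    pl = [round(-10 * g) for g in gl]
    best = min(pl)
    return [p - best for p in pl]


def pl_to_gl(pl: Sequence[int]) -> List[float]:
    return [-p / 10 for p in pl]


def gl_to_gp(gl: Sequence[float]) -> List[float]:
    """Posterior probabilities under a flat prior (an assumption of this sketch)."""
    top = max(gl)
    weights = [10 ** (g - top) for g in gl]  # subtract the max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


def gp_to_gl(gp: Sequence[float]) -> List[float]:
    """Under a flat prior, log10 GP is proportional to the likelihood."""
    return [log10(max(p, 1e-300)) for p in gp]


def index_to_gt(idx: int) -> str:
    """Map a diploid genotype index to j/k using the VCF order F(j/k) = k(k+1)/2 + j."""
    k = 0
    while (k + 1) * (k + 2) // 2 <= idx:
        k += 1
    j = idx - k * (k + 1) // 2
    return f"{j}/{k}"


def gl_to_gt(gl: Sequence[float]) -> str:
    best = max(range(len(gl)), key=lambda i: gl[i])  # most likely genotype
    return index_to_gt(best)


if __name__ == "__main__":
    print(gl_to_pl([-0.1, -1.0, -3.0]))  # [0, 9, 29]
    print(gl_to_gt([-5.0, -0.1, -3.0]))  # 0/1
```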
  • bcftools +trio-dnm2

    • New -n, --strictly-novel option to downplay alleles which violate Mendelian inheritance but are not novel

    • Allow setting the --pn and --pns options separately for SNVs and indels, and make the indel settings stricter by default

    • Output missing FORMAT/VAF values in non-trio samples, rather than random nonsense values

  • bcftools +variant-distance

    • New option -d, --direction to choose the directionality: forward, reverse, nearest (the default) or both (#1829)
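The four directionality modes can be sketched over a sorted list of positions on one chromosome. In this sketch, forward means the distance to the next downstream variant and reverse means the distance to the previous upstream one. That mapping, the function name, and the use of None where no neighbor exists are all assumptions, not the plugin's documented output.

```python
from typing import List, Optional, Sequence, Tuple, Union

Distance = Optional[int]


def variant_distances(positions: Sequence[int], direction: str = "nearest") -> List[Union[Distance, Tuple[Distance, Distance]]]:
    """Distance from each variant to its neighbor(s); positions must be sorted."""
    out: List[Union[Distance, Tuple[Distance, Distance]]] = []
    for i, pos in enumerate(positions):
        prev_d = pos - positions[i - 1] if i > 0 else None  # upstream neighbor
        next_d = positions[i + 1] - pos if i + 1 < len(positions) else None  # downstream neighbor
        if direction == "forward":
            out.append(next_d)
        elif direction == "reverse":
            out.append(prev_d)
        elif direction == "nearest":
            candidates = [d for d in (prev_d, next_d) if d is not None]
            out.append(min(candidates) if candidates else None)
        elif direction == "both":
            out.append((prev_d, next_d))
        else:
            raise ValueError(f"unknown direction: {direction}")
    return out


if __name__ == "__main__":
    print(variant_distances([100, 110, 150], "nearest"))  # [10, 10, 40]
```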