Releases: vgteam/vg
vg 1.52.0 - Bozen
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.52.0
Buildable Source Tarball: vg-v1.52.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg
build process needs.
This release includes:
- vg construct now has a -A, --alt-paths-plain option for storing IDs from the VCF instead of hash-based IDs for alt allele paths.
- vg call patched so that certain problem cases no longer take forever.
- Mac CI now actually installs Node
- GBZ files can now hold reference paths like GRCh38#chr1, with no haplotype phase number.
- vg is now compatible with jq 1.7.
- Mac build should no longer fail with complaints about a missing atomic library.
- Tests should no longer fail due to odd alignments from diff.
- Better error messages from vg haplotypes.
- vg build process should now always use exactly one libhandlegraph
- Add missing -O help for vg call
- vg Makefile now can take a CXX_STANDARD variable in. You should be able to e.g. make CXX_STANDARD=20 if you have a Protobuf/Abseil for C++20.
- GCSA2 construction in vg autoindex rewinds to pruning if memory is too high
Updated Submodules
- kff-cpp-api
- gcsa2
- libhandlegraph
- libvgio
- vcflib
- gbwtgraph
vg 1.51.0 - Quellenhof
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.51.0
Buildable Source Tarball: vg-v1.51.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg
build process needs.
This release includes:
- Giraffe can do haplotype sampling automatically if sufficient inputs are provided.
- Simplified vg giraffe command line help; full list of options is still available with -h.
- Diploid mode for haplotype sampling: first select N haplotypes, then choose the best pair.
- Add ref-path stubbification option -S to vg clip
- vg validate now complains about duplicate path names
- vg CI expects only the allocated cores on the Gitlab runners
- vg CI Buildkit docker builds use the local Docker Hub mirror
- vg convert option --no-translation for converting GBWTGraph to GFA directly without using the node-to-segment translation.
- vg rna will not crash when adding transcripts with an intron of length 0
- vg paths now supports -H for selecting haplotype paths and -R for selecting reference paths
Updated Submodules
- backward-cpp
- gbwtgraph
- libbdsg
vg 1.50.1 - Monopoli
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.50.1
Buildable Source Tarball: vg-v1.50.1.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg
build process needs.
This release includes:
- vg autoindex --workflow map can index GFAs with many W lines
- vg autoindex -w map and -w mpmap won't enter an infinite loop when they can't write to disk
- GRCh38#chr1 style path names in GFA P lines should now be parseable again
Updated Submodules
- gcsa2
- libhandlegraph
vg 1.50.0 - Monopoli
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.50.0
Buildable Source Tarball: vg-v1.50.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg
build process needs.
This release includes:
- CI test jobs now cache pulled Docker images
- GAF output should now have more correct path end positions and block lengths
- Paths that look like PanSN but aren't, due to having a non-numeric haplotype number field, will no longer be parsed, and should thus no longer produce crashes due to parsing failures.
- Haplotype sampling now copies the vg node to GFA segment translation correctly from the original graph.
- vg minimizer requires a distance index for building a minimizer index.
- -S option added to vg call to select reference paths by sample name. This is more convenient as it allows, e.g., "-S GRCh38" to be used in place of "-p GRCh38#0#chr1 -p GRCh38#0#chr2 ..". Such selection is necessary when the graph has more than one reference sample, and vg call will now refuse to handle graphs with multiple reference samples unless paths are selected with -S or -p.
- vg filter can filter to only mapped or only unmapped reads
- vg deconstruct changed back to writing the full sample/hap/contig name in the VCF contig field. To write just the contig name (as in the past few versions of vg), use the new -C option.
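The PanSN-related items above turn on whether a path name's haplotype field is numeric. As a rough illustration only (this is not vg's parser; `PanSNName` and `parse_pansn` are hypothetical names, and accepting two-field names such as GRCh38#chr1 as "sample#contig" is an assumption drawn from the examples in these notes), the rule could be sketched in Python like this:

```python
import re
from typing import NamedTuple, Optional


class PanSNName(NamedTuple):
    """Hypothetical container for the fields of a PanSN-style path name."""
    sample: str
    haplotype: Optional[int]  # None when the name has no haplotype field
    contig: str


def parse_pansn(name: str, sep: str = "#") -> Optional[PanSNName]:
    """Return the parsed fields, or None if the name is not PanSN-like.

    Assumption: a three-field name only counts as PanSN when its middle
    field is a plain decimal number; anything else is left unparsed.
    """
    parts = name.split(sep)
    if len(parts) == 2 and all(parts):
        # Two fields: sample and contig, with no haplotype number.
        return PanSNName(parts[0], None, parts[1])
    if len(parts) == 3 and all(parts) and re.fullmatch(r"[0-9]+", parts[1]):
        return PanSNName(parts[0], int(parts[1]), parts[2])
    # Wrong field count, empty field, or non-numeric haplotype field.
    return None
```

Under these assumptions, `parse_pansn("GRCh38#0#chr1")` yields sample GRCh38, haplotype 0, and contig chr1, while a name like `sample#hapA#chr1` returns None and would be treated as an ordinary path name rather than causing a parse failure.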
Updated Submodules
- gbwtgraph
- gcsa2
- libhandlegraph
- libvgio
vg 1.49.0 - Peschici
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.49.0
Buildable Source Tarball: vg-v1.49.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg
build process needs.
This release includes:
- Giraffe can now use weighted minimizers, which try to avoid selecting frequent kmers as minimizers.
- vg inject can read BAM files with unmapped reads
- vg giraffe now has --match, --mismatch, --gap-open, --gap-extend, and --full-l-bonus options to control alignment scoring.
- Fix crash during assertion in vg deconstruct on PGGB graph that was introduced in v1.48.0
- ":" character now allowed in path name during contig:start-end range extraction from command line options (i.e. in vg chunk).
- vg now builds with C++17 on Mac, as required by the version of Protobuf packaged in Homebrew
- vg now deduplicates arguments from pkg-config, to limit command line length with Protobuf's 30-odd Abseil dependencies.
- Better default parameters for haplotype sampling.
- vg clip crash on PackedGraphs fixed.
- Mac CI now collects Homebrew debug info
- vg's CI can now run on local Gitlab runners
- CI no longer does extra Docker builds without proper caching
- vg giraffe -b fast preset now works again and is under test
- Serialized mutable graphs keep proper track of the number of edges they contain
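Allowing ":" inside a path name means a contig:start-end region can no longer be split on the first colon. The notes do not say how vg does this; the following is a minimal Python sketch of one way to handle it, anchoring on the trailing start-end range. The function name `parse_region` is hypothetical.

```python
import re
from typing import Tuple

# The greedy (.+) backtracks to the last ":" that is followed by
# "digits-digits" at the very end of the string, so earlier colons
# stay part of the contig name.
_REGION = re.compile(r"(.+):([0-9]+)-([0-9]+)")


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split 'contig:start-end' where contig itself may contain ':'."""
    match = _REGION.fullmatch(region)
    if match is None:
        raise ValueError(f"not a contig:start-end region: {region!r}")
    contig, start, end = match.groups()
    return contig, int(start), int(end)
```

With this sketch, `parse_region("chr1:100-200")` gives the contig chr1, and a name with an embedded colon such as `HLA:A*01:10-20` gives the contig `HLA:A*01` with the range 10 to 20.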
Updated Submodules
- gbwtgraph
- libbdsg
- libvgio
vg 1.48.0 - Gallipoli
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.48.0
Buildable Source Tarball: vg-v1.48.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg
build process needs.
This release includes:
- vg chunk will now report an error if asked to chunk reads that do not go with the graph
- vg autoindex can construct linear reference indexes from a FASTA file
- vg gbwt will now refuse to add reference sample names with # in them, and will try and advise on what the tags are supposed to be like
- vg surject can project to paths that intersect themselves in the reverse orientation
- vg surject will now print warning messages when processing a read or pair takes a suspiciously long amount of time.
- vg giraffe should no longer try to put hypothetical sequencing errors in empty intervals, and should report errors in MAPQ cap computation in a more debuggable way.
- Crashes now include the stack trace by default; set VG_FULL_TRACEBACK=0 to suppress it to a file.
- vg surject and vg giraffe should now include relevant read name hints when crashing in many cases.
- Added crash_unless() as an alternative to assert() that reports these hints. We eventually want to use it everywhere.
- Crash reports now have cool hyperlinks.
- vg surject will limit itself to 200 anchors per target path segment by default; use the new -a/--max-anchors option to control this limit. Surjection against PGGB graphs may require --max-anchors 20 to complete.
- vg surject may be able to limit itself to considering only high-scoring surjections in some cases.
- vg construct now properly handles the case where it is looking for the end of an inversion from 1 base before it
- vg construct will no longer try and coalesce nodes at construction chunk boundaries when those nodes have alt paths that visit them or edges to their outside endpoints. This should fix some crashes and incorrect placement of structural variant breakpoints in the graph.
- Update vcflib to current version plus build and parser fixes
- vg construct should now be faster when variants are extremely long and overlap each other
- vg chunk now outputs PackedGraph instead of Protobuf by default (unless -T is used). Also, output files now get the .vg file extension for any non-GFA format (use vg stats -F to check the underlying format of any graph).
- Snarl clipping bug in vg clip fixed so that when there are multiple different reference traversals in a snarl (common in PGGB output), then none of them are chopped.
- Fixed build against Ubuntu 22.04's pybind11
- Docker containers now have /usr/bin/time for profiling
Updated Submodules
- htslib
- vcflib
- gbwtgraph
- libbdsg
- tabixpp
vg v1.47.0 - Ostuni
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.47.0
Buildable Source Tarball: vg-v1.47.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg
build process needs.
This release includes:
- vg sim and vg stats -a sped up for GBZ input
- Giraffe now uses the watchdog to detect slow reads
- vg construct should no longer fail assertions and will instead report errors.
- vg construct now handles IUPAC codes in the reference as Ns even if they are covered by symbolic structural variants
- Faster haplotype sampling with vg haplotypes.
- vg stats -a also outputs statistics on alignment scores and mapping quality.
- vg giraffe should no longer crash if the distance index is read-only.
- vg rna now supports the GBZ format for the input graph and haplotypes (new option --gbz-format).
- vg convert now defaults to PackedGraph instead of HashGraph if no output format selected.
- New option vg clip -s to remove stubs (dangling nodes not on ref path)
- vg call and vg deconstruct now only apply node ID translation from GBZ inputs if new -O is used.
- vg surject will now enforce that the reads it is surjecting actually were mapped against the graph you are surjecting against. Right now it checks node IDs and lengths. You can turn this off with -V/--no-validate.
- vg gbwt now accepts a -I/--gg-in option, which lets you load a .gg file and a .gbwt file and combine them into a .gbz graph.
- vg validate now accepts a -A/--gam-only option which will validate only the provided alignment's agreement with the graph, and not the graph itself.
- The vg surject/vg giraffe "error: couldn't identify a path corresponding to surjected read" error message has been improved to dump more information about the offending read and path.
- When selecting paths to surject to, a warning will now be printed if the user asks for a path with a []-enclosed subrange at the end. The base path name without the [] subrange coordinates should usually be used instead, because that is the space in which the SAM/BAM output will have its coordinates specified.
- The vg surject "Graph does not have a path named" error message should now no longer print pointer values, and is extended to explain a bit more about subpaths.
Updated Submodules
The kff-cpp-api and libbdsg submodules have been updated.
vg 1.46.0 - Altamura
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.46.0
Buildable Source Tarball: vg-v1.46.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg
build process needs.
This release includes:
- Long read Giraffe codepath now falls back to non-GBWT alignment for very long tails, which is slow but at least tends to finish
- Long read Giraffe codepath refuses to use Dozeu for tails, because the tails are very long and Dozeu will clobber the stack when given a long alignment
- More knobs have been added to long read Giraffe to tweak what inter-pre-cluster connections are sent to reseeding, and what chains are actually made into alignments
- vg stats now reports on time-usage information in GAM files if available
- Wiki tutorial on programming with libbdsg and gbwtgraph is now under CI test.
- Rescue alignment in vg giraffe paired-end mode should no longer decide it rescued off of the wrong alignments
- GAMP files no longer lose the "secondary" annotation when converted to GAM
- New benchmarking and read-simulating scripts for testing long-read Giraffe
- Fixed a crash in Giraffe correctness tracking in the long-read codepath due to out-of-bounds accesses into previous stages in the funnel
- Reading SAM/BAM/CRAM files into a graph (i.e. vg inject) will now bail out and complain if they are against haplotypes and not reference or generic paths (because positional lookup is likely to be too slow to be practical)
- vg inject now defaults to the normal default number of threads
- vg gamcompare now has a -n/--rename option for comparing GAM files annotated with position on the same contigs but with different names.
- vg annotate now uses a ReferencePathOverlayHelper to make sure it has fast access to the positions of graph nodes along paths.
- vg CI now tests against sequenceTubeMap using its recommended Node version
- vg rna will no longer in certain cases skip the first line when the annotation input has a header
- vg rna no longer crashes when adding splice-junction from a BED file with introns
- scripts/make_pbsim_reads.sh now works with local graph files in addition to S3 URLs
- scripts/lr_benchmark.sh now downloads and uses CHM13 graphs
- Chaining lookback now stops at 15 total items max
- Tail alignment with GSSW now refuses to fill more than 16 mibi-cells
- Fix off-by-1 array size bug in vg clip edge clipping
- Faster vg chunk on GBZ input
- Highly experimental vg haplotypes subcommand for sampling haplotypes based on kmer counts.
- vg giraffe now preloads the distance index into memory before mapping any reads
- Makefile should deal better with protoc not being installed
- Surjecting now works for interleaved GAFs, even if one or both reads are unmapped.
Updated Submodules
The libbdsg and gbwtgraph submodules have been updated.
New Submodules
The kff-cpp-api submodule has been added.
vg 1.45.0 - Alpicella
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.45.0
Buildable Source Tarball: vg-v1.45.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg
build process needs.
This release includes:
- vg clip -d now removes edges not meeting the depth threshold, in addition to nodes
- vg paths -C reports both undirected (visit any node twice) and directed (visit any node twice in same orientation) cyclicity
- Macports-based installation instructions have been simplified
- vg Mac build now works even if g++ --version doesn't actually have the version string on the first line
- vg no longer supports the "ODGI" format. The implementation that vg used was never compatible with the implementation in the actual odgi tool. If you have graphs in the old vg-odgi format, use vg 1.44.0 to convert them to another format. To exchange data with the odgi tool, use the GFA format.
- Fix stats -p option
- vg giraffe --align-from-chains invokes a separate (still experimental) long-read-optimized codepath.
- vg giraffe option parsing uses a new setup that makes it easier to add and report on options.
- Giraffe funnel explanations can track correctness and placed-ness along the length of a read.
- vg mpmap has increased sensitivity for detecting splice junctions
- vg autoindex can produce graphs (including spliced graphs) from FASTAs without requiring VCFs
- vg autoindex now splits the indexing for vg mpmap from the indexing for rpvg. The previous behavior can be recapitulated by indicating both workflows: --workflow mpmap --workflow rpvg
- vg chunk and vg trace can now get haplotypes from an input GBZ file
- vg chunk can use --no-embedded-haplotypes to ignore haplotypes from a GBZ.
Updated Submodules
The libbdsg and libhandlegraph submodules have been updated.
vg 1.44.0 - Solara
Don't forget to mark the static binary executable:
chmod +x vg
Docker Image: quay.io/vgteam/vg:v1.44.0
Buildable Source Tarball: vg-v1.44.0.tar.gz
Includes source for vg and all submodules. Use this instead of Github's "Source Code" downloads; those will not build as they do not include code for bundled dependencies that the vg
build process needs.
This release includes:
- GAF output fixed to 1) no longer have consecutive deletions in the CIGAR and 2) be valid for split mappings within the same node (which can happen when mapping long reads with vg map)
- vg autoindex can auto-tune some key indexing parameters for increased robustness.
- vg autoindex no longer crashes on VCFs that contain no samples.
- Add the flag --num-bp-per-min to Giraffe to adjust the number of selected minimizers based on the read length.
- vg main GFA loader can now handle HPRC-style GFAs where the same path exists as rGFA tags and P lines.
- L-lines in GFA output changed back to have 0M cigars (reverting switch to * in v1.31.0)
- vg surject can annotate SAM/BAM records by all of the sequences it attempted to realign to
- vg surject has improved stability on cyclic graphs
- Handle GBWTGraphs and GBZ graphs that do not contain a translation correctly when a translation is needed.
- vg construct will no longer fail with an assertion error about last_edit_end != -1 and will instead report the variants that confused it.
- vg construct will skip over and warn about variants that do not actually change anything.
- vg should now only link one copy of Protobuf into the non-static build
- vg should now build against newer libomp from Homebrew which is keg-only
- vg rna now supports reference transcript paths where exon boundaries are on opposite strands
- vg convert --drop-haplotypes will drop haplotype paths from the output graph.
- Preliminary GBZ input support in vg deconstruct and vg call
- vg surject, vg giraffe, vg mpmap, vg filter, and vg deconstruct now accelerate paths-on-node queries with an overlay, so working from a GBZ will no longer be quite as slow relative to working from an XG
- libbdsg PackedReferencePathOverlay should no longer crash for short paths
- Update distance index to distance index 2 to be more efficient and make clustering faster
- Minimizers now have a payload with two ints
- DI2 files will need to be rebuilt as well as minimizers that use the distance index
- New vg chunk option -S can be used to extract every snarl that is fully contained in the given path region (i.e. specified with -p). This can be used instead of context steps -c. The advantage is that -S will return everything inside the region and nothing outside the region (barring the start and end nodes), which helps with the problem of pulling out massive amounts of neighbouring regions when jacking up -c for complex subgraphs. The disadvantage is that if the region specified contains only parts of snarls, the results will be a misleadingly simple graph.
- vg viz no longer silently fails when asked to draw a PNG that is too big for Cairo
Updated Submodules
The gbwt, gbwtgraph, gcsa2, libbdsg, libhandlegraph, libvgio, and xg submodules have been updated.