Releases: sourmash-bio/sourmash_plugin_directsketch

v0.4.0

04 Oct 18:58
b1afbcd

This release introduces two new parameters:

  • --checksum-failures - an output file that logs any failures in downloading or parsing the checksum file, as well as any md5sum mismatches. Required for gbsketch.
  • --batch-size - enables writing smaller, batched zipfiles. This is recommended for large database generation, as batches allow a restart after an unexpected failure. It should also address some issues arising from extremely large zips.
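
The batching idea can be illustrated with a minimal Python sketch. This is not the plugin's implementation: the function name, the `prefix.N.zip` naming scheme, and the skip-if-exists restart rule are all assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def write_batches(records, batch_size, out_dir, prefix="batch"):
    """Write (name, bytes) records into numbered zip files.

    Each zip holds at most `batch_size` entries. A batch whose zip file
    already exists is skipped, so rerunning after a failure only redoes
    the batches that were never completed. (Hypothetical naming scheme.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(records), batch_size):
        index = start // batch_size + 1
        path = out_dir / f"{prefix}.{index}.zip"
        if path.exists():
            continue  # finished in an earlier run
        # Write to a temporary name first so a crash never leaves a
        # partial file under the final name.
        tmp = path.with_name(path.name + ".tmp")
        with zipfile.ZipFile(tmp, "w") as zf:
            for name, payload in records[start:start + batch_size]:
                zf.writestr(name, payload)
        tmp.replace(path)
        written.append(path)
    return written
```

Calling `write_batches` a second time with the same arguments returns an empty list, because every batch file is already present.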

Under the hood, this release also introduces a standardized sketch-building framework that may be useful outside of this plugin.

What's Changed

Dependabot

Full Changelog: v0.3.2...v0.4.0

v0.3.2

14 Jun 21:09
81242ac

What's Changed

Dependabot

New Contributors

  • @ctb made their first contribution in #52

Full Changelog: v0.3.1...v0.3.2

v0.3.1

21 May 07:10
ef97067
  • fixes URL formatting bug in failure output
  • adds new urlsketch command
  • changes the failure output format for both gbsketch and urlsketch. The new header is: accession,name,moltype,md5sum,download_filename,url, which matches the urlsketch input format.
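
Because the failure output shares its header with the urlsketch input format, a failure file can be inspected with any CSV reader. The sketch below uses Python's standard `csv` module; the helper name `read_failures` and the example row (accession, name, moltype value, filename, URL) are made-up placeholders, not real plugin output.

```python
import csv
import io

# Header taken from the release notes above.
FAILURE_HEADER = ["accession", "name", "moltype", "md5sum", "download_filename", "url"]


def read_failures(handle):
    """Parse a failure CSV and return its rows as dictionaries.

    Raises ValueError if the header differs from the documented one.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != FAILURE_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


# Placeholder data for illustration only.
example = io.StringIO(
    "accession,name,moltype,md5sum,download_filename,url\n"
    "ACC0001,example genome,DNA,,ACC0001_genomic.fna.gz,"
    "https://example.org/ACC0001_genomic.fna.gz\n"
)
rows = read_failures(example)
print(rows[0]["accession"], rows[0]["url"])
```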

What's Changed

Dependabot and version updates

Full Changelog: v0.3.0...v0.3.1

v0.3.0

13 May 23:09

This release fixes a bug where the wrong version could be downloaded (#27).

The input format has changed slightly! Required columns are now: accession,name,ftp_path. The ftp_path column name must be present, but the column can be empty.

  • if ftp_path is provided, it is used as the path for finding files associated with the accession. Otherwise, gbsketch will build the ftp_path from the accession.
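
A small Python sketch of building an input file in this format follows. The three column names come from the notes above; the helper name `write_input_csv`, the accessions, the names, and the URL are hypothetical placeholders. Records without an `ftp_path` are written with an empty value, so the column is present but blank.

```python
import csv
import io

# Required columns as listed in the release notes.
REQUIRED_COLUMNS = ["accession", "name", "ftp_path"]


def write_input_csv(handle, records):
    """Write records with the three required columns.

    A record that lacks "ftp_path" gets an empty field, which keeps the
    column in the header while leaving the value blank.
    """
    writer = csv.DictWriter(handle, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow({
            "accession": rec["accession"],
            "name": rec["name"],
            "ftp_path": rec.get("ftp_path", ""),
        })


# Placeholder records for illustration only.
buf = io.StringIO()
write_input_csv(buf, [
    {"accession": "ACC0001", "name": "example one"},
    {"accession": "ACC0002", "name": "example two",
     "ftp_path": "https://example.org/path/to/ACC0002"},
])
print(buf.getvalue())
```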

What's Changed

  • optionally use ftp_path input for gbsketch by @bluegenes in #29
  • prevent unnecessary downloads by also setting genomes-only/proteomes-only via params if not keeping fastas by @bluegenes in #30
  • do not require signature output file if not sketching by @bluegenes in #31

Full Changelog: v0.2.3...v0.3.0

v0.2.3

10 May 03:40
872133b

What's Changed

Full Changelog: v0.2.2...v0.2.3

v0.2.2

09 May 17:05
e1fa2fa

Bugfix Release

  • fix a bug where an md5sum file error caused directsketch to hang

What's Changed

New Contributors

Full Changelog: v0.2.1...v0.2.2

v0.2.1

08 May 22:01
dc9e256

What's Changed

  • changed the progress reporting interval back from 5% to 1%; adjusted to better reflect start times
  • remove interval delay by @bluegenes in #16

Full Changelog: v0.2.0...v0.2.1

v0.2.0

08 May 18:29

Major changes:

  • #8 - actually use tokio threading, fully asynchronous file downloading + writing
  • #9 - download md5sums and check them prior to sketching
  • #14 - make sure we return an error if the md5sum can't be downloaded (rather than just continuing)
  • #15 - safer tokio thread/runtime setting while still allowing pytest to run multiple iterations at once
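
The checksum step in #9 and #14 amounts to hashing the downloaded file and refusing to continue on a mismatch. The plugin's own implementation is not shown in these notes; the following is only a minimal Python sketch of the general technique using `hashlib`, with hypothetical function names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Raise ValueError on a mismatch instead of silently continuing."""
    actual = md5_of_file(path)
    if actual != expected.strip().lower():
        raise ValueError(f"md5sum mismatch for {path}: expected {expected}, got {actual}")
    return actual
```

Raising an error on mismatch, rather than logging and moving on, mirrors the intent of #14: a file that cannot be verified should not be sketched.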

Benchmarking shows that this structure is much faster:

software/version  command   acc  details  time        max RAM
v0.1.0            gbsketch  9    fungal   6 min       156 MB
main (v0.2.0)     gbsketch  9    fungal   10 s        156 MB
v0.1.0            gbsketch  49   fungal   58 min      1.5 GB
main (v0.2.0)     gbsketch  49   fungal   1 min 26 s  1.6 GB
main (v0.2.0)     gbsketch  243  fungal   4 min       1.16 GB

What's Changed

Full Changelog: v0.1.0...v0.2.0

v0.1.0

01 May 22:20

Initial Release