Unreleased
* logging: use 'SemiBin2' as logger name
* SemiBin: always log to file in DEBUG level and log command-line arguments
* coverage: Better error messages (#168)
Version 2.1.0 Mar 6 2024 by BigDataBiology
* SemiBin: Support running SemiBin with strobealign-aemb
(--abundance/-a)
* citation: Add citation subcommand
* SemiBin1: Introduce separate SemiBin1 command
* internal: Code simplification and refactor
* deprecation: Deprecate --orf-finder=fraggenescan option
* Update abundance normalization
* SemiBin: do not use more processes than can be taken advantage of (#155)
Version 2.0.2 Oct 31 2023 by BigDataBiology
* multi_easy_bin: Fix multi_easy_bin with --write-pre-recluster (#128)
Version 2.0.1 Oct 21 2023 by BigDataBiology
* train_self: Fix bug with --mode
* concatenate_fasta: Fix bug with compression
* bin_short: Make alias work
Version 2.0.0 Oct 20 2023 by BigDataBiology
* SemiBin: Better error checking throughout
* SemiBin: Write a log file
* concatenate_fasta: support compression
* concatenate_fasta: slightly better error message when contig ID already
contains separator
* SemiBin: add `bin_short` as alias for `bin`
Version 1.5.1 (SemiBin2 beta) Mar 7 2023 by BigDataBiology
* Bugfix: using --no-recluster with multi_easy_bin (#128)
Version 1.5.0 (SemiBin2 beta) Jan 17 2023 by BigDataBiology
* Add `SemiBin2` script
* Added naive ORF finder
* Add `--prodigal-output-faa` argument (#113)
* Make command line arguments more flexible for --sequencing-type argument
* Argument checking is more exhaustive instead of exiting at first error
* Add `--quiet` argument
* Add `--compression` option
* Add `--tag-output` option
* Better `--help` (group required arguments separately)
* Make SemiBin.main.main2 callable with a list of arguments
* Add contig -> bin mapping table (#123)
Version 1.4.0 Dec 15 2022 by BigDataBiology
* Provide binning algorithm for assemblies from long reads
* Add `--allow-missing-mmseqs2` flag to `check_install` subcommand
* Run Prodigal in multiple jobs without multiprocessing (#106)
* Better command line arguments
* Better error checking
Version 1.3.1 Dec 9 2022 by BigDataBiology
* Make `--training-type` argument optional
Version 1.3.0 Nov 4 2022 by BigDataBiology
* Add self-supervised learning
* Fix output table to contain correct paths
* Accept `--epochs` as argument in the command line (previously it was
spelled `--epoches`)
Version 1.2.0 Oct 19 2022 by BigDataBiology
* Pretrained model from chicken caecum
* Output table with basic information on bins (including N50 & L50)
* When reclustering is used (default), output the unreclustered bins into a
directory called `output_prerecluster_bins`
* Added --verbose flag and silenced some of the output when it is not used
* Use coloredlogs (if package is available)
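
For readers unfamiliar with the N50 and L50 statistics mentioned in the 1.2.0 entry above, here is a minimal sketch using the standard definitions. The function name `n50_l50` is hypothetical and this is not SemiBin's own code. N50 is the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total bases. L50 is the number of contigs needed to get there.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths."""
    # Walk the contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that pushes the running sum to half of the
        # total defines N50 (its length) and L50 (its rank).
        if running >= half_total:
            return length, count
    raise ValueError("lengths must not be empty")


print(n50_l50([80, 70, 50, 40, 30, 20]))  # (70, 2)
```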
Version 1.1.1 Sep 27 2022 by BigDataBiology
* Completely remove use of atomicwrites package
Version 1.1.0 Sep 21 2022 by BigDataBiology
* Fix bug when using atomicwrites on a network file system (#97)
* support .cram format input (#104)
* Support using depth file from Metabat2 (#103)
* Remove torch version restriction (and test on Python 3.10)
* Better output message when no bins are produced
* More flexible specification of prebuilt models (case insensitive,
normalize - and _)
* Support CRAM input (#104)
* Support filesystems that do not allow you to call fsync on directories (#97)
Version 1.0.3 Wed Aug 3 2022 by BigDataBiology
* Fix coverage parsing when value is not an integer (#103)
* Fix multi_easy_bin with taxonomy file given on the command line
Version 1.0.2 Fri 8 Jul 2022 by BigDataBiology
* Fix issue #93 more thoroughly (see #101)
Version 1.0.1 Mon 9 May 2022 by BigDataBiology
* Fix edge case when calling prodigal with more threads than contigs (#93)
Version 1.0.0 Fri 29 Apr 2022 by BigDataBiology
* More balanced file split when calling prodigal in parallel
* Fix bug when long stretches of Ns are present (#87)
* Better error messages (#90 & #91)
Version 0.7.0 Wed 3 Mar 2022 by BigDataBiology
* Improve `check_install` command by printing out paths and correctly
handling optionality of FragGeneScan/prodigal
* Reuse markers.hmmout to make the training from several samples faster
* Add option `--tmpdir` to set temporary directory
* Substitute FragGeneScan with Prodigal (FragGeneScan can still be used
with `--orf-finder` parameter)
* Add 'concatenate_fasta' command to combine fasta files for multi-sample binning
Version 0.6.0 Mon 7 Feb 2022 by BigDataBiology
* Provide pretrained models from soil, cat gut, human oral,
pig gut, mouse gut, built environment, wastewater and global (trained
on all environments).
* Add `check_install` command and run `check_install` before easy* command
* The user can now specify a pre-computed contig annotation table in
mmseqs format
* Fix bug with non-standard characters in sample names (#68)
* Better subcommand names (`generate_sequence_features_*` and
`generate_cannot_links`)
Version 0.5.0 Fri Jan 7 2022 by BigDataBiology
* Faster `SemiBin --version`
* Lower memory usage and faster speed for `bin` subcommand
* Reclustering is now the default (due to improved speed). Added
`--no-recluster` option to disable it
* Output of bedtools is now processed as a stream instead of using a
(potentially large) intermediate file
* Fix bug with --min-len. Previously, only contigs strictly longer than the
given minimal length were used (instead of longer than or equal to it).
* Fix bugs downloading GTDB
* Respect $XDG_CACHE_DIR if set
* Implement CACHEDIR.TAG protocol for the SemiBin cache directory
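
The `--min-len` fix in the 0.5.0 entry above comes down to an inclusive versus exclusive comparison. The following is only an illustrative sketch. The `filter_contigs` helper and the in-memory dictionary of contigs are assumptions made for the example, not SemiBin's actual implementation.

```python
def filter_contigs(contigs, min_len):
    """Keep contigs whose length is at least min_len (inclusive bound)."""
    # Using `>` here instead of `>=` would reproduce the old behaviour,
    # dropping contigs whose length is exactly min_len.
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_len}


contigs = {"c1": "A" * 999, "c2": "A" * 1000, "c3": "A" * 1001}
kept = filter_contigs(contigs, min_len=1000)
print(sorted(kept))  # ['c2', 'c3']
```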
Version 0.4.0 Tue Oct 10 2021 by BigDataBiology
* Add support for .xz FASTA files as inputs
* Removed BioPython dependency
* Fixed bug in FASTA unzipping
* Fixed bug in multi-sample data splitting
Version 0.3.0 Mon Aug 9 2021 by BigDataBiology
* Support training from several samples
* Remove `output_bin_path` if `output_bin_path` exists
* Make several internal parameters configurable: (1) minimum length of
contigs to bin (`--min-len` parameter); (2) minimum length of contigs to
break up in order to generate _must-link_ constraints (`--ml-threshold`
parameter); (3) a ratio threshold: if the fraction of base pairs in
contigs of 1000-2500 bp is smaller than this value, the minimal length is
set to 1000 bp, otherwise to 2500 bp
* Add `-p` argument for `predict_taxonomy` mode
* Fix `np.concatenate` warning
* Remove redundant matrix when clustering
* Better pretrained models
* Faster depth calculation using NumPy
* Use correct number of threads in `kneighbors_graph()`
* Respect number of threads (`-p` argument) when training (issue 34)
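
The ratio-based choice of minimal contig length described in the 0.3.0 entry above can be sketched roughly as follows. The function name `choose_min_len` is hypothetical. The sketch also assumes that the ratio is taken over the total base pairs of all contigs, which the entry does not state.

```python
def choose_min_len(contig_lengths, ratio):
    """Pick 1000 or 2500 bp as the minimal contig length to bin."""
    total_bp = sum(contig_lengths)
    if total_bp == 0:
        raise ValueError("no contig bases to evaluate")
    # Base pairs contributed by contigs of 1000-2500 bp.
    short_bp = sum(n for n in contig_lengths if 1000 <= n <= 2500)
    # Assumption: the fraction is relative to all contig base pairs.
    return 1000 if short_bp / total_bp < ratio else 2500


print(choose_min_len([1500, 50000, 60000], ratio=0.05))  # 1000
print(choose_min_len([1500, 2000, 3000], ratio=0.05))    # 2500
```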
Version 0.2.0 Thu May 27 2021 by BigDataBiology
* Change name to `SemiBin`
* Add support for training with several samples
* Test with Python 3.9
* Download mmseqs database with `--remove-tmp-file 1`
* Better output names
* Fix bugs when paths have spaces
* Fix installation issues by listing all the dependencies
* Add `download_GTDB` command
* Add `--no-recluster` option
* Add `--environment` option
* Add `--mode` option
* All around more robust code by including more error checking & testing
* Better built-in models
Version 0.1.1 Sun 21 Mar 2021 by BigDataBiology
* Fix bug with --minfasta-kbs
Version 0.1.0 Sun 21 Mar 2021 by BigDataBiology
* First (alpha) release