- Formally adopted the NumPy + Numba solution in the ordinal mapper. This significantly accelerated the algorithm (#209).
- Accelerated command-line interface loading (#215).
- Changed default output subject coverage (
--outcov
) coordinates into BED-like (0-based, exclusive end). The output can be directly parsed by programs like bedtools. Also added support for GFF-like and other custom formats, as controled by paramter--outcov-fmt
(#204 and #205). - Default chunk size is now 1024 for plain and range mapeprs, and 2 ** 20 = 1048576 for ordinal mapper. The latter represents the number of valid query-subject pairs (#209).
- Updated packaging method (#215).
- Improved performance moderately (#192).
- Parameter
--chunk
is now the number of unique query sequences instead of the number of lines (#192). - Updated GitHub Actions workflow.
- Added parameter
-x|--exclude
, which will exclude query sequences that are mapped to given reference sequences (such as host genome, spike-in, vector, etc.) (#192). - Added support for interleaved paired-end SAM files (#191).
- Added native support for PAF file format (#182).
- Updated hyperlinks in documentation.
- Modified the command-line interface. Sub-commands under the
tools
menu were raised to the main menu. For example,woltka tools collapse
now becomeswoltka collapse
(#176). Thetools
menus is still kept for backward compatibility (#177). - Improved efficiency of handling BIOM tables (#171, #175). This significantly reduced the memory consumption and runtime of the
collapse
command. - The
--trim-sub
parameter now takes underscore (_
) as the default separator, if not explicitly specified (#173). - Updated installation protocol. Now Woltka can be Conda-installed without explicit pre-installation of biom-format (#174).
- Upgraded the
collapse
command. Now it can collapse using internal hierarchies of feature IDs, such as genome-gene pairs, taxonomic lineages, and EC numbers. This upgrade enables stratified taxonomic/functional analysis of coord-matching ORF tables, without explicitly stratifying them during the classification step (which otherwise is time and space-consuming). The output should be identical to the that of the old method (#173). - Added a feature to the
normalize
command, which can extract gene lengths from a gene coordinates file. This enables convenient length-based normalization of an existing gene (ORF) profile. For example, it can convert raw frequencies into RPK (#179). - Extended description of stratification and collapsing protocols in the documentation.
- Fixed a bug in parsing sample ID lists (#164).
- Used a single integer to store gene and read information in the ordinal mapper; use bitwise operations to parse the information. This significantly reduced memory consumption (#142).
- Improved alignment file processing efficiency (#153).
- Updated installation protocol. Currently a single
pip install
command should suffice (#145). - Updated GitHub Actions test from Python 3.6 to Python 3.8. The program itself should continue to support Python 3.6+ (#158).
- Added a naive algorithm for read-gene matching when the number of reads is small. This improves speed (#148).
- Added support for stdin as input. This lets the program take a variety of previously unsupported alignment formats (#155).
- Added or updated several pieces of documentation, such as a RefSeq tutorial (#157).
- Experimentally added a much accelerated ordinal mapper, powered by NumPy and Numba (branch
numba
) (#152).
- Migrated from Travis CI to GitHub Actions (#127).
- Made
--map-as-rank
default when only mapping file(s) are provided (#132). - Renamed
--normalize|-z
as--frac|-f
(#128). - Modified core algorithm which slightly improved performance (#124).
- Added
tool normalize
command, with multiple features (#124). - Added the feature to collapse a stratified table (#126).
- Created an WoL FTP server, and added link to it (#118).
- Added an WoL standard operating procedure (
wolsop.sh
) and documentation (#116). - Added first citation of Woltka (#111).
- Added protocols for Bowtie2 / SHOGUN and Fastp (#121).
- Added discussion about mapping uniqueness (#131).
- Fixed free-rank classification subject not found issue (#120).
- Corrected paths to example files and directories (#117).
- Published at PyPI. Can be installed by
pip install woltka
(#108). - Added instructions for using MetaCyc and KEGG (#99, #101).
- Added
tool collapse
command, which supports one-to-many classification (#99).
- Fixed Handling of zero length alignment (#105).
- First official release.