ASGART no longer differentiates between strand A and strand B, but
simply works on an arbitrarily large set of files. Thus, the user
SHOULD PROVIDE EACH FILE ONLY ONCE. Moreover, it is no longer
necessary to concatenate multiple input files into a single one.
This breaking change should give more flexibility to users and
potentially simplify pipelines.
The ASGART automaton has been rewritten from scratch to take
interlaced SDs into account at nearly no cost in computation time.
For this reason, the search for interlaced duplication families is
now the default and only mode.
ASGART will now remove large expanses of nucleotides to ignore (Ns
and/or masked ones) from processed strands, thus slightly improving
performance.
Taking advantage of these new features, the parallelization system
has been rewritten to (i) introduce parallelism at the scale of the
automaton; and (ii) make use of the aforementioned "natural"
breakpoints as delimiters for the chunks to process in parallel. By
doing so, it is guaranteed (i) that no duplication family spanning
two chunks will be missed; and (ii) that ASGART will make use of
available cores even when processing fewer chunks than authorized
threads.
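A hedged sketch of the chunking part only, not ASGART's implementation: split the input at breakpoints falling inside ignorable regions (which cannot contain duplications, so no family can straddle two chunks) and process the chunks in parallel. The breakpoint positions below are made up for the example.

```rust
use std::thread;

/// Splits `seq` into chunks delimited by the given breakpoint positions.
fn split_at_breakpoints<'a>(seq: &'a [u8], breakpoints: &[usize]) -> Vec<&'a [u8]> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for &bp in breakpoints {
        chunks.push(&seq[start..bp]);
        start = bp;
    }
    chunks.push(&seq[start..]);
    chunks
}

/// Placeholder for the per-chunk duplication search; here it just returns the chunk length.
fn process_chunk(chunk: &[u8]) -> usize {
    chunk.len()
}

fn main() {
    let seq: &[u8] = b"ACGTACGTNNNNNNNNNNGGCCGGCCNNNNNNNNNNTTAATTAA";
    let breakpoints = vec![18, 36]; // hypothetical breakpoints inside the two N runs
    let chunks = split_at_breakpoints(seq, &breakpoints);

    // One scoped thread per chunk; the real system also parallelizes inside
    // the automaton when there are fewer chunks than available threads.
    thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|&c| s.spawn(move || process_chunk(c)))
            .collect();
        for h in handles {
            println!("processed {} bp", h.join().unwrap());
        }
    });
}
```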
ASGART will now make use of the trimming feature to reduce memory
consumption. The suffix array will be built only for the trimmed
part, rather than for the whole input. The whole input will then be
compared to the trimmed part, contrary to what happened in version
1.x. This arrangement sacrifices some CPU time in exchange for a
strongly reduced memory consumption when processing trimmed inputs.
It can be used to process large sequences by trimming them into
several consecutive subsequences, then merging the results later on.
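A simplified sketch of that arrangement, assuming a naive suffix array construction for illustration only (ASGART's actual data structures and search differ): the index is built over the trimmed region alone, and the whole input is then streamed against it.

```rust
/// Naive O(n² log n) suffix array, for illustration purposes only.
fn naive_suffix_array(text: &[u8]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    sa.sort_by(|&a, &b| text[a..].cmp(&text[b..]));
    sa
}

/// Returns true if `pattern` occurs in `text`, using its suffix array.
fn contains(text: &[u8], sa: &[usize], pattern: &[u8]) -> bool {
    sa.binary_search_by(|&s| {
        let suffix = &text[s..];
        let prefix = &suffix[..pattern.len().min(suffix.len())];
        prefix.cmp(pattern)
    })
    .is_ok()
}

fn main() {
    // Hypothetical example: only the trimmed slice is indexed,
    // while the whole input is compared against it.
    let whole: &[u8] = b"ACGTACGTTTGGACGTACGT";
    let trimmed = &whole[0..10]; // suffix array built over this part only
    let sa = naive_suffix_array(trimmed);

    let k = 8;
    for (i, window) in whole.windows(k).enumerate() {
        if contains(trimmed, &sa, window) {
            println!("{}-mer at position {} also occurs in the trimmed region", k, i);
        }
    }
}
```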
The JSON and GFF3 output formats have been modified to reflect the
clustering of duplication families. Please note that they are thus
incompatible with JSON files from previous versions.
A new tool, asgart-concat, has been added to safely concatenate JSON
files resulting from partial runs on the same dataset. Its intended
use is to easily merge the results of multiple runs on the same
dataset with different settings, e.g. direct & palindromic
duplications, or when the workload was divided into multiple
sub-jobs using trimming.
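Purely as an illustration of what such a concatenation amounts to (this is not asgart-concat, and the `families` field name is hypothetical, not ASGART's real JSON schema), a sketch using the serde_json crate:

```rust
use std::env;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut merged: Vec<serde_json::Value> = Vec::new();
    for path in env::args().skip(1) {
        let doc: serde_json::Value = serde_json::from_str(&fs::read_to_string(&path)?)?;
        // "families" is a placeholder field name for this sketch only.
        if let Some(families) = doc.get("families").and_then(|f| f.as_array()) {
            merged.extend(families.iter().cloned());
        }
    }
    // Write a single document containing the concatenated entries.
    let out = serde_json::json!({ "families": merged });
    println!("{}", serde_json::to_string_pretty(&out)?);
    Ok(())
}
```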
Plotting utilities have been modified to reflect these changes.
The automaton will progressively grow the maximal gap size when
extending large duplications, thus letting larger duplication arms
be found in a less fragmented way.
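A toy illustration of the principle (the formula and constants are made up and are not ASGART's actual parameters): the longer the arm already extended, the larger the gap tolerated before the extension stops, so large arms are not split by mid-sized gaps.

```rust
/// Hypothetical gap policy: the tolerated gap grows with the current arm length.
fn max_gap(arm_len: usize) -> usize {
    const BASE_GAP: usize = 100;   // hypothetical starting gap size (bp)
    const GROWTH_RATE: f64 = 0.01; // hypothetical growth factor
    BASE_GAP + (arm_len as f64 * GROWTH_RATE) as usize
}

fn main() {
    for &arm_len in &[1_000usize, 10_000, 100_000, 1_000_000] {
        println!("arm of {:>9} bp tolerates gaps up to {:>6} bp", arm_len, max_gap(arm_len));
    }
}
```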
The logging system has been improved to be more detailed and more
consistent in the way it presents information.
Minor technical issues have been resolved: ASGART will now correctly
use only the ID field of FASTA headers and ignore the subsequent
information; the progress bar no longer glitches.
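As a minimal sketch of the header handling described above (assuming the ID is the first whitespace-delimited token after `>`, as in standard FASTA headers; ASGART's own parser may differ in its details):

```rust
/// Extracts the ID from a FASTA header line, dropping the free-text description.
fn fasta_id(header: &str) -> Option<&str> {
    header.strip_prefix('>')?.split_whitespace().next()
}

fn main() {
    let header = ">chr21 Homo sapiens chromosome 21, GRCh38 reference assembly";
    // Only "chr21" is kept; everything after the first whitespace is ignored.
    assert_eq!(fasta_id(header), Some("chr21"));
    println!("{:?}", fasta_id(header));
}
```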