Releases: KleistLab/ADAPTed
v0.2.4
[v0.2.4] - 2024-12-09
Changed
- Major redesign of the file_proc module. Reads are processed in minibatches. The adapted.file_proc.file_proc submodule has been removed and the functionality has been moved to adapted.file_proc.
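
To illustrate what processing reads in minibatches can look like, here is a minimal sketch in plain Python. It is not the code in adapted.file_proc; the helper name iter_minibatches and the example read IDs are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_minibatches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Illustrative read IDs; a real run would iterate over reads from the input files.
read_ids = [f"read_{i}" for i in range(7)]
batches = list(iter_minibatches(read_ids, 3))
```

Because the helper only pulls `size` items at a time from the iterator, at most one minibatch of reads needs to be held in memory at once.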
Fixed
- Non-runnable detection workflow caused by partial code updates.
Added
- New detect_rna_start_peak and combined_detect_start_peak functions that use the start peak detection method to provide a quick alternative for e.g. tRNA and custom adapter workflows.
- A new med_shift validation workflow that uses the median shift method between adapter and post-adapter signal to validate the detected boundaries. Does not require a polyA tail signal.
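
The idea behind a median shift check can be sketched as follows. This is an illustration only, not the med_shift implementation: the function names, the post-adapter window length, the use of an absolute shift, and the min_shift threshold are all assumptions.

```python
from statistics import median


def median_shift(signal, adapter_start, adapter_end, window):
    """Median of the post-adapter window minus median of the adapter segment."""
    adapter = signal[adapter_start:adapter_end]
    post = signal[adapter_end:adapter_end + window]
    if not adapter or not post:
        return None  # nothing to compare
    return median(post) - median(adapter)


def boundary_is_valid(signal, adapter_start, adapter_end, window=4, min_shift=5.0):
    """Accept the boundary when the signal level changes enough across it.

    `window` and `min_shift` are made-up values for this sketch.
    """
    shift = median_shift(signal, adapter_start, adapter_end, window)
    return shift is not None and abs(shift) >= min_shift


# Toy signal: adapter level around 60, post-adapter level around 80.
toy = [60, 61, 59, 60, 62, 80, 81, 79, 80, 82]
shift = median_shift(toy, 0, 5, 4)
valid = boundary_is_valid(toy, 0, 5)
flat_valid = boundary_is_valid([60] * 10, 0, 5)
```

Only the two signal segments on either side of the boundary are compared, which is why a check of this kind does not depend on a polyA tail being present.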
v0.2.3
[v0.2.3] - 2024-11-13
Fixed
- Mysterious runtime warnings occurring in the mvs polya workflow were identified as being due to signals that are too short. These signals are now filtered out before the detection workflow.
- Runtime warnings during peak detection are now filtered.
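
Both fixes follow common patterns that can be sketched with the standard library. The helper names and the minimum length below are hypothetical; the actual filtering criteria used by ADAPTed are not stated in this changelog.

```python
import warnings


def filter_short_signals(signals, min_len):
    """Split a {read_id: signal} mapping into kept signals and skipped read IDs."""
    kept = {rid: sig for rid, sig in signals.items() if len(sig) >= min_len}
    skipped = [rid for rid in signals if rid not in kept]
    return kept, skipped


def run_quietly(func, *args, **kwargs):
    """Call `func` with RuntimeWarning suppressed for the duration of the call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return func(*args, **kwargs)


def noisy_peak_count(signal):
    # Stand-in for a peak detection step that emits a RuntimeWarning.
    warnings.warn("degenerate window", RuntimeWarning)
    return len(signal)


signals = {"a": [1.0] * 10, "b": [1.0] * 3}
kept, skipped = filter_short_signals(signals, min_len=5)
count = run_quietly(noisy_peak_count, kept["a"])
```

The `catch_warnings` context manager restores the previous warning filters on exit, so the suppression stays local to the wrapped call.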
Added
- A new detection workflow is added that uses a CNN to predict the boundaries of the adapter and polyA signals. The CNN method replaces the previous llr workflow as the primary detection method and provides faster detection. The CNN workflow comes with a dependency on torch (combined_detect_cnn).
- SigProcConfig now has a primary_config attribute that is set at runtime to describe the detection workflow used prior to validation.
- Introduction of a new combined_detect_llr2 function that uses a downscaled signal for the llr workflow, with a split peak correction for the polyA trace.
- The combined_detect_cnn function now has a fallback to a quick version of the llr method if validation of the boundary predictions fails.
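
The fallback behaviour described in the last item follows a generic "primary method, validate, fall back" pattern. The sketch below shows that pattern only; it does not reproduce combined_detect_cnn, and every function in it is a hypothetical stand-in.

```python
def detect_with_fallback(signal, primary, fallback, is_valid):
    """Return (boundary, source); use `fallback` when the primary result fails validation."""
    result = primary(signal)
    if is_valid(signal, result):
        return result, "primary"
    return fallback(signal), "fallback"


# Hypothetical stand-ins for a fast predictor, a slower fallback, and a validator.
def primary(signal):
    return len(signal) + 10  # deliberately out of range to trigger the fallback


def fallback(signal):
    return len(signal) // 2


def is_valid(signal, boundary):
    return 0 <= boundary < len(signal)


boundary, source = detect_with_fallback([0.0] * 8, primary, fallback, is_valid)
```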
Changed
- config.batch.bidx_passed/failed is now config.batch.batch_idx_pass/fail.
- Default minibatch size is now 1000.
- combined_detect has been renamed to combined_detect_llr.
- ReadResult, DetectResults and Boundaries have been moved to adapted.container_types.
- Preloaded signals that are shorter than the preload size are now padded with NaNs instead of zeros.
- MAD winsorization by imputing outliers with local medians has been changed to MAD winsorization by clipping.
- The flag for setting the num_proc in the parser has been changed to -j.
- Introduction of the 'core' SigProcConfig section that contains parameters that are used across multiple detection methods in the code.
- sig_norm_outlier_thresh has been moved to the new core section, as have adapter, trace and polya min/max obs parameters.
- The llr_helpers module has been merged into the llr module.
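
Two of the changes above, NaN padding and MAD winsorization by clipping, can be sketched in plain Python. This is an illustration under assumptions, not ADAPTed's code: the function names and the threshold value are hypothetical, and no MAD scaling constant is applied here.

```python
import math
from statistics import median


def pad_with_nan(signal, preload_size):
    """Truncate or right-pad `signal` with NaN so it has exactly `preload_size` values."""
    values = list(signal[:preload_size])
    return values + [math.nan] * (preload_size - len(values))


def mad_clip(signal, thresh=5.0):
    """Clip values to median +/- thresh * MAD, leaving NaN padding untouched."""
    finite = [x for x in signal if not math.isnan(x)]
    if not finite:
        return list(signal)
    med = median(finite)
    mad = median(abs(x - med) for x in finite)
    lo, hi = med - thresh * mad, med + thresh * mad
    return [x if math.isnan(x) else min(max(x, lo), hi) for x in signal]


padded = pad_with_nan([1.0, 2.0], 4)
clipped = mad_clip([10, 11, 9, 10, 100])  # median 10, MAD 1, so bounds are 5 and 15
```

Unlike zero padding, NaN padding cannot be mistaken for real signal, but any statistic computed on the padded array has to skip the NaN values explicitly, as `mad_clip` does here.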
Removed
- The Task attribute of the Config object is removed.
- The save_llr_trace parser argument is removed.
- Functions for mad winsorization by imputing outliers with local medians (impute_window_median, mad_outlier_indices, mad_normalize, mad_winsor) are removed.
- The sig_norm_winsor_window LLRBoundariesConfig attribute is removed.
v0.2.2
[v0.2.2] - 2024-09-10
Fixed
- Faulty pod5 files (e.g. pod5: IOError: Invalid signature in file) are now caught; faulty input files are skipped.
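
Skipping unreadable inputs usually comes down to a try/except around the file open. The sketch below uses a hypothetical open_file callable and a fake opener so that it runs without the pod5 package; which exception type the pod5 library actually raises is not stated here, so the broad except clause is an assumption.

```python
import logging

logger = logging.getLogger("skip_faulty_example")


def readable_files(paths, open_file):
    """Return the paths that `open_file` can open; log and skip the rest."""
    good = []
    for path in paths:
        try:
            handle = open_file(path)
        except Exception as err:  # assumption: the exact exception type is unknown
            logger.warning("Skipping faulty input file %s: %s", path, err)
            continue
        handle.close()
        good.append(path)
    return good


class _FakeHandle:
    def close(self):
        pass


def fake_open(path):
    # Stand-in for a real pod5 reader; fails for one illustrative file name.
    if path.endswith("bad.pod5"):
        raise IOError("Invalid signature in file")
    return _FakeHandle()


usable = readable_files(["a.pod5", "bad.pod5", "c.pod5"], fake_open)
```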
Added
- get_truncated.sh: added a new script to easily obtain the truncated reads from the detected_boundaries*.csv files.
- command.json file is now saved in the output folder and contains the command used to run ADAPTed.
- Logging: process outputs are now logged to the adapted.log file and to stdout.
- ADAPTed now supports continuing from a previous (incomplete) run using the continue <continue_from_path> subcommand.
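
Logging to both a file and stdout, and recording the invoking command, can be sketched with the standard library as below. The file names adapted.log and command.json come from the notes above; the logger name, the log format, the JSON layout, and the example command line are assumptions made for illustration.

```python
import json
import logging
import sys
import tempfile
from pathlib import Path


def setup_logging(output_dir: Path) -> logging.Logger:
    """Attach a file handler (adapted.log) and a stdout handler to one logger."""
    logger = logging.getLogger("adapted_example")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = (
        logging.FileHandler(output_dir / "adapted.log"),
        logging.StreamHandler(sys.stdout),
    )
    for handler in handlers:
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


def save_command(output_dir: Path, argv) -> Path:
    """Write the command line to command.json (the JSON layout is an assumption)."""
    path = output_dir / "command.json"
    path.write_text(json.dumps({"command": list(argv)}, indent=2))
    return path


out_dir = Path(tempfile.mkdtemp())
log = setup_logging(out_dir)
log.info("run started")
# Illustrative command line; in a real run this would be sys.argv.
cmd_path = save_command(out_dir, ["adapted", "--chemistry", "RNA002"])
```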
Changed
- Multiprocessing of reads now relies on a shared memory object to process the results of the reads. This allows for a more efficient use of system resources and prevents issues with broken pipe errors.
- The output directory is now named after the version of ADAPTed and a random UUID rather than the current date and time.
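
A directory name built from a version string and a random UUID might be constructed as in this sketch. The exact naming scheme ADAPTed uses (separator, prefix, UUID form) is not given in these notes, so the format below is an assumption.

```python
import uuid


def make_output_dir_name(version: str) -> str:
    """Combine a version string with a random UUID (format is an assumption)."""
    return f"{version}_{uuid.uuid4()}"


name = make_output_dir_name("0.2.2")
```

Unlike a timestamp, a random UUID makes name collisions very unlikely when several runs start at the same moment.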
Removed
- The --create_subdir argument is removed. The output directory is now always created in the specified output folder.
v0.2.1
[v0.2.1] - 2024-08-13
Added
- Added the --max_obs_trace argument to the parser. This can be used to manually set the maximum number of observations to trace. Useful for rerunning with a larger value on a subset of (previously truncated) reads. This overrides the value in the config file.
- When the min_obs_adapter value is detected as the adapter_end in the llr workflow, a second round of llr gains (starting from min_obs_adapter) is calculated to refine the adapter_end detection.
- Detect results contain information on full read length and preload size.
- Added a debug_logger object in the llr workflow.
- Added a refine_max_adapter_end_adjust parameter to the llr workflow configuration. When the suggested adjustment in the refinement is more than this value, the refinement is skipped. Default is 250.
- Added a --config flag to the parser. This can be used to specify the latest default config TOML file.
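
The rule attached to refine_max_adapter_end_adjust can be sketched as a small guard. The function name is hypothetical, and comparing the absolute difference is an assumption; only the default of 250 and the "skip when larger" behaviour come from the notes above.

```python
def refine_adapter_end(current_end: int, suggested_end: int, max_adjust: int = 250) -> int:
    """Accept the refined boundary unless it moves by more than `max_adjust`."""
    if abs(suggested_end - current_end) > max_adjust:
        return current_end  # refinement skipped
    return suggested_end


accepted = refine_adapter_end(2000, 2100)
skipped = refine_adapter_end(2000, 2400)
```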
Fixed
- Signal preload: signal preload size is now explicitly updated based on the max_obs_trace value after initializing the config object.
- Partition stats: partition (adapter, polya, preloaded_rna) stats are now computed for the adapter based on the llr-detected boundary, and no longer become None when the mvs workflow fails in the combined detection.
- setup.py: fixed the package import to correctly find all packages and sub-packages.
- setup.py: added a config_files folder to the package data.
Changed
- Renamed rna partition stats (start, len, mean, std, med, mad) to preloaded_rna to clarify that they are computed from the preloaded RNA signal, not the full read signal.
- adapter_end defaults to the value detected in the llr workflow, also when the mvs workflow fails in the combined detection (previously defaulted to None).
- preloaded_rna partition stats are None in the case of truncated polya signals.
- Median shift calculation in mvs workflow is now done using the median of the detected adapter signal as a whole, instead of the median of a specified window.
- The ADAPTed version is now part of the created output directory name.
- The parser now requires a valid configuration file (--config) or a chemistry (--chemistry) to be provided.
- Parameters for RNA002:
  - min_obs_adapter = 2000 (previously 2500)
  - max_obs_trace = 12000 (previously 8000)
  - adapter_trace_stride = 50 (previously 40)
  - adapter_trace_early_stop_stride = 500 (previously 1000)
  - polya_trace_early_stop_stride = 50 (previously 40)
  - polyA_local_range = [0.0, 12.0] (previously [0.0, 10.0])
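
The parser requirement listed above (a configuration file or a chemistry must be given) can be expressed with argparse roughly as follows. Only the --config and --chemistry flag names come from these notes; the program name, help texts, and error message are assumptions, and the sketch does not check whether the two flags may be combined.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="adapted-example")  # hypothetical program name
    parser.add_argument("--config", help="path to a configuration TOML file")
    parser.add_argument("--chemistry", help="chemistry name, e.g. RNA002")
    return parser


def parse_args(argv):
    """Parse `argv` and fail when neither --config nor --chemistry is given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.config is None and args.chemistry is None:
        parser.error("one of --config or --chemistry is required")
    return args


args = parse_args(["--chemistry", "RNA002"])
```

Calling `parser.error` prints the usage message and exits with status 2, which is the same behaviour argparse shows for a missing required argument.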