Releases: pdimens/harpy
1.15.0
New
- Quarto has replaced RMarkdown/Flexdashboard
- no changes for the user to worry about, but the reports will look a little different
- NXX plots for phasing report
- Introduced new scripts for development installation using Conda and Pixi
- Harpy's printing to console during runtime is sleeker now
Internal
- Streamlined Snakemake command execution for the different workflows
- Improved logging and error handling in various modules
- `molecule_coverage.py` now uses a sqlite3 backend, which dramatically reduces the amount of RAM required
- Refactored a few Snakemake workflow files
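To illustrate why an sqlite3 backend can lower memory use, here is a minimal sketch. It is not the actual `molecule_coverage.py` code: the table layout, the `BIN_SIZE` window, and the function names are all assumptions made for illustration. The idea is that per-window molecule counts live in a database table (which can be on disk) rather than in one large in-memory Python structure.

```python
import sqlite3

BIN_SIZE = 1000  # hypothetical window size in bp, chosen only for this sketch


def open_coverage_db(path):
    """Open an sqlite3 database holding per-window molecule counts.

    Pass a file path to keep the table on disk, or ":memory:" for quick tests.
    """
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS coverage ("
        "contig TEXT, bin INTEGER, depth INTEGER NOT NULL DEFAULT 0, "
        "PRIMARY KEY (contig, bin))"
    )
    return con


def add_molecule(con, contig, start, end):
    """Increment the depth of every window overlapped by one molecule.

    `start` and `end` are treated as 0-based, inclusive positions (an assumption).
    """
    rows = [(contig, b) for b in range(start // BIN_SIZE, end // BIN_SIZE + 1)]
    # Make sure each window has a row, then bump its depth by one.
    con.executemany(
        "INSERT OR IGNORE INTO coverage (contig, bin, depth) VALUES (?, ?, 0)", rows
    )
    con.executemany(
        "UPDATE coverage SET depth = depth + 1 WHERE contig = ? AND bin = ?", rows
    )
    con.commit()


def window_depths(con, contig):
    """Return (bin, depth) pairs for one contig, ordered by window."""
    cur = con.execute(
        "SELECT bin, depth FROM coverage WHERE contig = ? ORDER BY bin", (contig,)
    )
    return cur.fetchall()
```

With a file path instead of `":memory:"`, only the rows touched by the current molecule need to be in RAM at any time, which is the general trade-off such a backend makes: slower lookups in exchange for a much smaller memory footprint.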
Bug Fixes
- Small bug reporting the wrong value for one of the valueboxes
1.14.3
Bugs fixed
- restored the haplotag barcode script that went missing after squashing commits, which broke demuxing
Changed
- added rule priority for some workflows so they prioritize creating the output files over calculating metrics and writing reports
  - this means that, for example, `align bwa` will prioritize creating all the output bam files, rather than running a single sample through everything
Full Changelog: 1.14.2...1.14.3
1.14.2
1.14.1
Never too proud to admit I was wrong. I didn't want `downsample` to be a Snakemake workflow, but with the increased complexity of what I wanted it to do, I found myself writing an increasingly complex Python script that was essentially doing all the stuff Snakemake was doing. So:
New
- Introduced a command-line utility for extracting barcodes from SAM/BAM files
- Enhanced phasing statistics reporting with new metrics (N50, N75, N90)
- `LRez` is now part of the main Harpy installation and accessible to the user
- adapter removal in the `qc` module accepts an argument now, one of:
  - `auto` for automatic adapter detection
  - a FASTA file of adapters
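For readers unfamiliar with the N50/N75/N90 metrics mentioned above, the following is a minimal sketch of the standard Nx calculation applied to a list of block lengths. It is not Harpy's reporting code; the function name `nx` and its integer-percent argument are assumptions for illustration.

```python
def nx(lengths, percent):
    """Return the Nx value for a collection of block lengths.

    Nx is the length L such that blocks of length >= L together
    contain at least `percent` percent of the total length.
    """
    if not lengths:
        return 0
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length
    return ordered[-1]


# Hypothetical phase-block lengths, used only to show the call pattern.
blocks = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50, n75, n90 = nx(blocks, 50), nx(blocks, 75), nx(blocks, 90)
```

Higher percentages walk further down the sorted list, so N90 is never larger than N75, which is never larger than N50.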
Changed
- Downsampling is now a snakemake workflow
- `downsample` handles invalids in a much more intuitive (and sensible) way
Full Changelog: 1.14...1.14.1
1.14
New
- added a convenience script `separate_singletons` to split a bam file into singletons and nonsingletons
- `harpy downsample` module to downsample FASTQ/BAM by barcodes
Breaking changes
- singletons are now calculated such that both reads of a paired-end read count as only "one read" for a barcode
  - which means unpaired reads now contribute properly to this value
  - overall, this is a more accurate way of calculating this metric
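One way to read the counting rule above is sketched below. This is an interpretation, not Harpy's implementation: it assumes each alignment record has been reduced to a hypothetical `(barcode, read_name)` tuple, and that both mates of a pair share the same read name. A barcode is then a singleton when it has exactly one distinct read name, whether that name belongs to a full pair or to an unpaired read.

```python
from collections import defaultdict


def count_singletons(records):
    """Count singleton and non-singleton barcodes.

    `records` is an iterable of (barcode, read_name) tuples. Mates of a
    paired-end read share a read name, so they collapse to one entry.
    Returns a (singletons, non_singletons) tuple of barcode counts.
    """
    names_per_barcode = defaultdict(set)
    for barcode, read_name in records:
        names_per_barcode[barcode].add(read_name)
    singletons = sum(1 for names in names_per_barcode.values() if len(names) == 1)
    return singletons, len(names_per_barcode) - singletons


# Hypothetical records: one full pair, one unpaired read, and one
# barcode shared by two different reads.
example = [
    ("A01C01B01D01", "read1"),  # mate 1
    ("A01C01B01D01", "read1"),  # mate 2 of the same pair
    ("A02C01B01D01", "read2"),  # unpaired read
    ("A03C01B01D01", "read3"),
    ("A03C01B01D01", "read4"),
]
```

Under this reading, the first two barcodes in `example` are singletons and the third is a non-singleton.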
Fixes
- `separate_validbx` has a usage change, which is breaking; however, this script is not used by any of the workflows, so there should be no appreciable difference
- alignment reports have text that clarifies which math is for non-singletons
- `multiplex` reads (aka reads that aren't linked-read singletons) are now just referred to as `non-singletons`
1.13
New Features
- new `view` command to view workflow log, snakefile, or configuration file.
- conda environment recipes are now stored in `outdir/workflow/envs` for more self-contained workflow directories
  - also improves workflow-specific troubleshooting
Breaking Changes
- `stitchparams` has been renamed `imputeparams`
Internal
- improved handling of conda environments across various commands, allowing for better configuration and dependency management.
- Updated environment directory paths for better organization and clarity across all workflows
- local simuG replaced with conda installation
  - Removed dependency on the `simuG.pl` script for several simulation workflows, streamlining the execution process
  - rename rules and better directory structure for `simulate variants`
Bug Fixes
- Improved regular expression handling in file processing to enhance clarity and prevent issues.
- Corrected typos in `align_stats.Rmd` and routines for handling no valid barcodes
Issues and PRs
- add harpy view by @pdimens in #166
- rebase with harpy view by @pdimens in #167
- swap simuG to conda-based install by @pdimens in #168
Full Changelog: 1.12...1.13
1.12
What's new (and important)
- `simulate linkedreads` now supports and defaults to haplotagging barcodes
  - 84 million barcode options instead of 14 million
  - support for barcodes of any length, not just the 10X 16bp
  - barcode sequencing error has been removed because you're ultimately interested in the linked read data, not the sequencing nuances
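The "84 million" figure is consistent with the usual haplotagging layout of four segments (A, C, B, D) with 96 codes each, since 96^4 = 84,934,656. The sketch below enumerates barcodes under that assumption; it is not the code Harpy uses to generate barcodes, and the `AxxCxxBxxDxx` formatting and function name are assumptions for illustration.

```python
from itertools import islice, product

SEGMENTS = ("A", "C", "B", "D")  # assumed haplotag segment order
CODES_PER_SEGMENT = 96           # assumed number of codes per segment (01-96)

# Total number of distinct barcodes under these assumptions.
TOTAL_BARCODES = CODES_PER_SEGMENT ** len(SEGMENTS)


def haplotag_barcodes():
    """Lazily yield barcodes such as A01C01B01D01, one at a time."""
    codes = [f"{i:02d}" for i in range(1, CODES_PER_SEGMENT + 1)]
    for combo in product(codes, repeat=len(SEGMENTS)):
        # Pair each segment letter with its two-digit code.
        yield "".join(seg + code for seg, code in zip(SEGMENTS, combo))


# Peek at the first few barcodes without materializing all of them.
first_three = list(islice(haplotag_barcodes(), 3))
```

Using a generator keeps memory flat, since the full set never has to exist in a list at once.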
Internal
- `HaploSim.pl` (formerly LRSIM_harpy.pl) focuses solely on creating linked reads from provided haplotypes
- output names for `simulate linkedreads` more flexible now
- leveraged parameters better in `HaploSim.pl`
- Added `haplotag_barcodes.py` to auto-generate haplotag barcodes
- inline to haplotagging conversion uses memory-efficient in-memory sqlite3 database
- barcode validations for `align ema` and `simulate linkedreads`
Bugs fixed
- [simulate linkedreads] barcode key generated as a fixed keymap, ensuring barcodes have same haplotag code between different haplotypes
What's Changed
- haplotagging barcodes as default by @pdimens in #162
- better sim demux support by @pdimens in #163
- fix param call by @pdimens in #164
Full Changelog: 1.11...1.12
1.11
New Features
- [sv leviathan] now also makes BX tags unique when concatenating population groups
  - provided as `--bx` option to `concatenate_bam.py`
- new standalone script `deconvolve_alignments.py` that does the same thing as `assign_mi.py`, but also deconvolves the `BX` tag into hyphenated form
Fixes
- R logic for properly parsing new `--contigs` option #160
Improvements
- LOTS more guardrails with respect to validations and error handling
- Simplified logic for file type validation and tag management in scripts
- Enhanced error handling for missing input files across multiple scripts
- Updated argument parser configurations for improved user guidance and error handling
- Streamlined output methods across multiple scripts for consistency
PRs
Full Changelog: 1.10.1...1.11
1.10.1
This release was a big internal refactor, and it didn't feel like there were enough visible changes to release it as `1.11`, so it's named `1.10.1` instead.
Internal
- some of the simpler file validations moved to the command-line parsing part of harpy #159
- [hpc] has much less redundant code
New Features
- All workflows with an `--extra-params` option now have some program-specific argument validation #158
- `--snakemake` now has validations
- `--hpc` now has validations
- [align ema] made read fragment density optimization off by default and is now exposed as a command-line argument to toggle on
Other changes
- [hpc] now checks if the executor plugin is installed and only prints the notice if it isn't
- [stitchparams] and [popgroup] have slightly nicer printing
Full Changelog: 1.10...1.10.1
1.10
New Features
- `assembly` and `metassembly` workflows
- `--contigs` option for `align ...`, `sv ...`, and `phase` workflows
- calculations for molecular coverage too
- non-singleton metrics added to alignment reports
Internal
- remove `pandas` dependency b/c no longer using `Paramspace()`
- `impute` parameter file now gets transcribed into the `config.yaml`
- new and better validations
- some validations have progressbar and are parallelized
  - progressbar respects `--quiet`
- config.yaml files are written using the `yaml` library for consistency
- workflow summaries have more robust logic
Breaking changes
- parameter file for `impute` has a new `name` column that will name relevant outputs for a given parameter set
  - this affects the output directories now, which are named according to this `name` value
- `config.yaml` file restructured a bit, mainly some params have better snake-cased names
- `skipreports` is now `skip` under a new `reports` hierarchy
Pull Requests
- Add Metassembly by @pdimens in #144
- add assembly by @pdimens in #146
- Molecule coverage by @pdimens in #149
- Hotfixes by @pdimens in #150
- 151 provide harpy a list of primary autosomessex chroms for plottingreports for chromosome level assemblies by @pdimens in #152
- add quast reports to assembly by @pdimens in #155
- add singleton/non stats by @pdimens in #156
Full Changelog: 1.9...1.10