Releases · mcvickerlab/GenVarLoader

v0.8.0 (2025-02-05)

Feat

sequence annotations

v0.7.3 (2025-01-27)

Feat

allow subset_to() to accept boolean masks and polars Series
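
A minimal sketch of how a subsetting method can accept either integer indices or a boolean mask by normalizing both to integer positions. The helper name `to_indices` is hypothetical and not part of GenVarLoader. Series-like inputs (such as a polars Series) are assumed to be handled by duck typing on a `to_numpy()` method, and name-based selection is not covered.

```python
import numpy as np


def to_indices(selector, n: int) -> np.ndarray:
    """Normalize a selector into integer positions in [0, n).

    Hypothetical helper (not GenVarLoader's API). Accepts integer indices,
    a boolean mask of length n, or any Series-like object with to_numpy().
    """
    # Duck-type Series-like objects (e.g. a polars Series) via to_numpy().
    if hasattr(selector, "to_numpy"):
        selector = selector.to_numpy()
    arr = np.asarray(selector)
    if arr.size == 0:
        return np.empty(0, dtype=np.intp)
    if arr.dtype == np.bool_:
        if arr.shape != (n,):
            raise ValueError(f"boolean mask must have shape ({n},), got {arr.shape}")
        # Positions where the mask is True, in ascending order.
        return np.flatnonzero(arr)
    if not np.issubdtype(arr.dtype, np.integer):
        raise TypeError(f"unsupported selector dtype: {arr.dtype}")
    idx = arr.astype(np.intp).ravel()
    # Wrap negative indices the same way NumPy indexing does.
    idx = np.where(idx < 0, idx + n, idx)
    if idx.min() < 0 or idx.max() >= n:
        raise IndexError(f"index out of bounds for length {n}")
    return idx
```

Normalizing every accepted input to one representation up front keeps the rest of the subsetting logic independent of how the caller expressed the selection.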

Fix

add test for subset_to
update tests to match internal API changes
bug in mark_keep_variants with spanning deletions.

v0.7.2 (2025-01-26)

Fix

change loop order to only open files once.
respect memory limits when writing bigwig data.
online docs notebook syntax highlighting
better docs.

v0.7.1 (2025-01-17)

Fix

bump version
scalar dataset indexing, region_indices order, updated docs, hotfixes

v0.7.0 (2025-01-17)

Feat

indexing matches input bed file. make sel() a private method pending better API design. fix: pass tests for indexing and separate indexing and subsetting logic into DatasetIndexer.
write input regions to disk with a column mapping each to a row in the sorted dataset regions
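
Mapping each input region to its row in the sorted dataset amounts to computing an inverse permutation. A minimal NumPy sketch, assuming regions are sorted by (contig, start) and that contigs are already encoded as integer IDs in reference order (sorting contig names as strings would put chr10 before chr2). The function name `sorted_row_map` is hypothetical.

```python
import numpy as np


def sorted_row_map(contig_ids, starts):
    """Return (order, row_of_input) for regions sorted by (contig, start).

    order[k] is the input index of the k-th sorted region;
    row_of_input[i] is the sorted row that input region i landed in.
    Hypothetical helper, not GenVarLoader's implementation.
    """
    contig_ids = np.asarray(contig_ids)
    starts = np.asarray(starts)
    # np.lexsort uses the LAST key as the primary key, so pass (starts, contig_ids).
    order = np.lexsort((starts, contig_ids))
    # Invert the permutation: input region order[k] is stored at sorted row k.
    row_of_input = np.empty_like(order)
    row_of_input[order] = np.arange(order.size)
    return order, row_of_input
```

Storing `row_of_input` alongside the regions lets indexing follow the user's BED order while the data on disk stays sorted.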

Fix

passing tests

v0.6.4 (2024-12-16)

Fix

update rust dependencies.

v0.6.3 (2024-12-16)

Fix

unintended torch requirements

v0.6.2 (2024-12-16)

Fix

update version
StratifiedSampler requires torch. fix: remove deprecated conda env files.

v0.6.1 (2024-11-25)

Fix

handle empty genotypes during gvl write. fix: PgenGenos sample_idx should be sorted when compared to current sample_idx.

v0.6.0 (2024-09-03)

Feat

bump version
geuvadis tutorial.
tutorial notebook, pooch dependency.

Fix

update available tracks after writing transformed ones to disk.

v0.5.6 (2024-08-07)

Fix

bump version
offsets can overflow int32, use int64 instead.
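
Offsets into a ragged array are a running total of element lengths, so they can exceed the int32 maximum (2**31 - 1 = 2,147,483,647) even when every individual length fits. A small NumPy illustration with made-up lengths, not GenVarLoader code:

```python
import numpy as np

# Made-up per-element lengths; each one fits comfortably in int32.
lengths = np.full(3, 1_000_000_000, dtype=np.int32)

# Offsets are [0, cumsum(lengths)]. Accumulating in int32 silently wraps past 2**31 - 1.
offsets32 = np.zeros(lengths.size + 1, dtype=np.int32)
offsets32[1:] = np.cumsum(lengths, dtype=np.int32)

# Accumulating in int64 keeps the true running total.
offsets64 = np.zeros(lengths.size + 1, dtype=np.int64)
offsets64[1:] = np.cumsum(lengths, dtype=np.int64)
```

NumPy does not raise on integer overflow in array arithmetic, so the int32 version produces a negative final offset without any error; switching to int64 raises the ceiling to 2**63 - 1.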

v0.5.5 (2024-08-02)

Fix

make Records.vars_in_range... functions fallible by returning None instead of "empty" RecordInfo instances. This fixes downstream behavior of the Variants.read.. methods when there are no variants in the query. feat: when reading VCFs for the first time and no index is found, try to index them first before raising an error. fix: better docstrings on attributes of private API.
add build number to replace yanked release

v0.5.4 (2024-07-05)

Fix

fix breaking changes from polars 1.0

v0.5.3 (2024-07-05)

Fix

fix breaking changes from polars 1.0

v0.5.2 (2024-07-05)

Fix

typo in pyproject causing dependencies to be ignored.

v0.5.1 (2024-06-29)

Feat

prep for readthedocs
prepare for online documentation.

Fix

add favicon
documentation formatting
rtd config
readthedocs dependencies
readthedocs config

v0.5.0 (2024-06-13)

Feat

bump version
multiprocess reading of genotypes, both VCF and PGEN. fix: bug in reading genotypes from PGEN

v0.4.1 (2024-06-11)

Fix

bump version
got number of regions from wrong array in get_reference

v0.4.0 (2024-06-05)

Feat

deprecate old loader due to worse performance. reorganize code.

Fix

better documentation in README. feat!: rename write_transformed_tracks to write_transformed_track. feat: more ergonomic indexing.

v0.3.3 (2024-06-01)

Fix

bump version
wrong max_ends from SparseGenotypes.from_dense_with_length due to data races/incorrect parallel semantics for numba
diffs need to be clipped and negated when computing shifts

Perf

pad haplotypes on-the-fly to avoid extra copying of reference subsequences

v0.3.2 (2024-04-29)

Feat

can convert Records back to a polars DataFrame with minimal copying via conversion of VLenAlleles to pyarrow buffers
make open_with_settings the standard open function. fix: recognize .bgz extension for fasta files

Fix

remove dynamic versioning table
move cli to main. feat: generalize Variants to automatically identify whether vcf or pgen is passed
move cli to script in python source directory, maturin limitation?
wrong implementation of heuristic for extending genotypes.

Perf

faster sparsifying genotypes. feat: log level for cli. fix: clip missing lengths for appropriate end extension.

v0.3.1 (2024-04-16)

Feat

benchmark interval decompression on cpu with numba vs. cpu with taichi vs. gpu with taichi
optionally decompress intervals to tracks on gpu
initial support for stranded regions
option to cache fasta files as numpy arrays.
implement BigWig intervals as Rust extension.
finishing touches on multi-track implementation. Blocker: a cryptic issue where writing genotypes somehow prevents joblib from launching new processes.
stop overwriting by default, add option.
transforms directly on tracks. feat: intervals as array of structs for better data locality.
let extra tracks get added via paths
initial support for indels in tracks and WIP on also returning auxiliary genome wide tracks.
initial sparse genos -> haplotypes and sparse hap diffs.
wip sparse genotypes.
properties for getting haplotypes, references, or tracks only.
encourage num_workers <= 1 with GVL dataloader.
freeze gvl.Dataset to prevent users from accidentally introducing invalid states. feat: warn if any query contigs have either no variants or intervals associated with them.
warn instead of error when no reference passed and genos present.
disable overwriting by default, have no args be help.
also report number of samples.
add .from_table constructor for BigWigs.
move CLI to script, include in package.
use a table to specify bigwigs instead. fix: jittering.
add script to write datasets to disk.
more quality of life improvements. relax dependency version constraints.
with_seed method
quality of life methods for subsetting and converting to dataloaders.
torch convenience functions. fix: ensure genotypes and intervals are written in sorted order wrt the BED file.
pre-computed implementation.

Fix

dependency typo
remove taichi interval to track implementation since it did not improve performance, even on GPU
need to subset arrays to be reverse complemented
change argument order of subset_to to match the rest of the API. fix: simplify subset implementation.
remove python 3.10 type hints
dimension order on subsets.
make variant indices absolute on write.
sparse genotypes layout
wrong layout of genotypes and wrong max ends computation.
ragged array layouts for correct concatenation when writing datasets one contig at a time.
bug where init_intervals would not initialize all available tracks.
track_to_intervals had wrong n_intervals and thus, wrong offsets.
bug in computing max ends.
match serde for genome tracks.
bug in open state management.
bug when writing genotypes where the chromosome of the requested regions is not present in the VCF.
bug getting intersection of samples available.
sum wrong axis in adjust multi index.
make GVLDataset getitem API match torch Dataset API (i.e. use raveled index)
QOL improvements.
incorrect genotypes returned from VCF when queries have overlapping ranges.
wrong shape.
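
Writing a dataset one contig at a time means concatenating ragged arrays, and with an offsets-based layout the offsets of each later chunk must be shifted by the amount of data already written. A minimal sketch assuming a CSR-style (data, offsets) pair per chunk, where offsets starts at 0 and ends at len(data); whether GenVarLoader stores exactly this layout is an assumption, and `concat_ragged` is a hypothetical name.

```python
import numpy as np


def concat_ragged(chunks):
    """Concatenate ragged arrays given as (data, offsets) pairs.

    Row i of a chunk is data[offsets[i]:offsets[i + 1]]. Requires at least one chunk.
    Hypothetical helper, not GenVarLoader's implementation.
    """
    datas, offset_parts = [], []
    shift = 0
    for k, (data, offsets) in enumerate(chunks):
        offsets = np.asarray(offsets, dtype=np.int64)
        datas.append(np.asarray(data))
        # Later chunks drop their leading 0 (it duplicates the previous chunk's
        # final boundary) and are shifted by the total data length written so far.
        offset_parts.append(offsets if k == 0 else offsets[1:] + shift)
        shift += len(data)
    return np.concatenate(datas), np.concatenate(offset_parts)
```

Simply concatenating the per-chunk offsets without shifting would make every row after the first chunk point back into the first chunk's data.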

Refactor

move construct virtual data to loader so utils import faster.
rename util to utils.
move write under dataset directory. perf?: move indexing operations into numba.
move cli to script outside package, faster help message.
break up dataset implementation into smaller files. refactor!: condense with_ methods into single with_settings() methods. feat: sel() and isel() methods for eager retrieval by sample and region.

Perf

when opening with settings and providing a reference, but return_sequences is false, don't load the reference into memory.

v0.3.0 (2024-03-15)

Feat

write ZarrTracks in smaller chunks.

Fix

remove wip vidx feature.
relax numba version constraint
rounding issues for setting fixed lengths on BED regions.
more informative vcf record progress bar.
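
One way to guarantee that resized BED regions have exactly the requested width is to use integer arithmetic and derive the end from the start, rather than computing and rounding each end independently. A minimal sketch; `fixed_length` is a hypothetical helper, it assumes half-open [start, end) coordinates, and it does not clip regions to contig bounds.

```python
def fixed_length(start: int, end: int, length: int) -> tuple[int, int]:
    """Re-center the half-open interval [start, end) to exactly `length` bp.

    Hypothetical helper; clipping to contig bounds is not handled.
    """
    # Integer midpoint (floors when start + end is odd).
    center = (start + end) // 2
    new_start = center - length // 2
    # The end is derived from the start, so the width equals `length` by construction.
    return new_start, new_start + length
```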

v0.3.0rc6 (2024-03-11)

Feat

improve record query performance by allowing nearest_nonoverlapping index adjustment to be computed on-the-fly in the weighted activity selection algorithm and thus also benefit from early stopping.
more descriptive progress bar for constructing ZarrGenos from another file.
add progress bar for reading VCF records.
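
For reference, the textbook form of weighted activity selection (weighted interval scheduling) sorts intervals by end and, for each interval, looks up its nearest non-overlapping predecessor. The sketch below precomputes that predecessor with a binary search; it does not reproduce GenVarLoader's on-the-fly index adjustment or early stopping, and the function name is hypothetical.

```python
from bisect import bisect_right


def max_weight_nonoverlapping(intervals):
    """Textbook weighted interval scheduling on half-open (start, end, weight) intervals.

    Returns the maximum total weight of a mutually non-overlapping subset.
    """
    ivs = sorted(intervals, key=lambda iv: iv[1])  # sort by end coordinate
    ends = [end for _, end, _ in ivs]
    # best[j] = optimum using only the first j intervals (in end order).
    best = [0] * (len(ivs) + 1)
    for j, (start, _end, weight) in enumerate(ivs, start=1):
        # Nearest non-overlapping predecessor: how many of the first j - 1 intervals
        # end at or before `start`. Ends are sorted, so this is a binary search.
        p = bisect_right(ends, start, 0, j - 1)
        # Either skip interval j, or take it plus the best solution before its predecessor.
        best[j] = max(best[j - 1], best[p] + weight)
    return best[-1]
```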

Fix

pylance update, catch possibly unbound variables.
instead of failing, raise warning when encountering non-SNP, non-INDEL variants and skip them.

v0.3.0rc5 (2024-03-04)

Fix

more descriptive pbar when writing ZarrTracks from another reader.
BigWigs, only keep contigs that are shared across all bigwigs.
better error messages and catching cases for non-SNP, non-INDEL variants.
avoid segfault caused when a TensorStore is forked to new processes.
make ZarrTracks implement Reader protocol. feat: add NumpyGenos for in-memory representation. feat: better ZarrGenos.from_recs_genos progress bar.

v0.3.0rc4 (2024-02-29)

Fix

rename .ends.gvl.arrow to .gvl.ends.arrow so file suffix parsing works correctly.

v0.3.0rc3 (2024-02-29)

v0.3.0rc2 (2024-02-29)

Fix

remove pyd4 dependency, had unspectacular performance.

v0.3.0-rc.1 (2024-02-28)

Feat

add ZarrTracks for much faster performance than D4.
finish deprecating parallel GVL...
