Releases: projectglow/glow
v2.0.3
What's Changed
- fixing scala logistic regression test by @kermany in #705
- Update sbt-scoverage to 2.2.0 by @scala-steward-projectglow in #704
- fixed `opt_einsum` contract incompatibility issue for linear/logistic regression by @kermany in #710
- Update development version to 2.0.4 by @github-actions in #718
Full Changelog: v2.0.2...v2.0.3
v2.0.0
What's Changed
Major changes
- Support Spark 3.4 and 3.5
- Add functions for left and left semi joins with overlap criteria accelerated by Databricks' range join optimization
- Register SQL functions via the SQL extension service provider interface, so `glow.register` is no longer necessary if Glow is on the classpath when Spark is launched
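The new joins pair rows whose genomic intervals intersect. A minimal pure-Python sketch of the overlap predicate and the two join flavors mentioned above; the function names and tuple layout are illustrative, not Glow's API (a range join optimization accelerates this predicate by binning intervals instead of checking every row pair):

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end

def left_overlap_join(left, right):
    """Left join on interval overlap: every left interval appears at least once,
    paired with each overlapping right interval, or with None if none match."""
    out = []
    for a in left:
        matches = [b for b in right if overlaps(a[0], a[1], b[0], b[1])]
        if matches:
            out.extend((a, b) for b in matches)
        else:
            out.append((a, None))
    return out

def left_semi_overlap_join(left, right):
    """Left semi join: left intervals with at least one overlapping right
    interval, without duplicating the left row per match."""
    return [a for a in left
            if any(overlaps(a[0], a[1], b[0], b[1]) for b in right)]
```

With half-open intervals, touching endpoints (`a_end == b_start`) do not count as an overlap, which matches the usual convention for 0-based genomic coordinates.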
Other user facing changes
- Remove Hail integration
- Remove features that frequently cause incompatibilities between versions (`aggregate_by_index`, the CSV pipe transformer). Workarounds are provided in the documentation.
Internal changes
- Future proof for Spark 4.0 / Scala 2.13 / JDK 17
- Migrate CI and release process to GitHub Actions
Overlap join benchmarks
On a dataset with 1B left rows and 1M right rows, with varying percentages of SNPs in the left table (tested with a single 4-core executor due to quota):
- Inner range join + left join, all SNP percentages: 4h
- Glow join, 0% SNPs: 4h
- Glow join, 50% SNPs: 2h9m
- Glow join, 90% SNPs: 0h42m
Other notes
The Python source artifact is built from tag v2.0.0-conda in order to fix Glow's conda recipe.
New Contributors
- @dvcastillo made their first contribution in #505
- @dtzeng made their first contribution in #519
- @srowen made their first contribution in #524
- @a-li made their first contribution in #522
- @scala-steward-projectglow made their first contribution in #555
Full Changelog: v1.2.1...v2.0.0
v1.2.1
v1.2.1 bumps Glow to Spark v3.2.1.
This release includes Java/Scala artifacts in Maven Central and Python artifacts in PyPI. Docker containers projectglow/open-source-glow:1.2.1, projectglow/databricks-glow:1.2.1, projectglow/databricks-glow:10.4, and projectglow/databricks-hail:0.2.93 can be found in projectglow's Docker Hub. The Glow notebook continuous integration test now uses Databricks Runtime 10.4, which is on Spark 3.2.1 (workflow definition json).
Glow leverages private Catalyst APIs that changed from Spark 3.1 to Spark 3.2, so we wrote a shim to maintain backwards compatibility. Spark 2 is now end of life (EoL): Databricks, AWS EMR, and Google Dataproc depend on Hadoop 3.x, which is incompatible with Spark 2. We are therefore removing support for Spark 2, including the Spark 2 continuous integration (CI/CD) tests performed with CircleCI. Glow version 1.1.2 is the last release that supports Spark 2.
The Spark 3 CI/CD tests depend on Hail, and they have been failing because Hail does not yet support Spark 3.2; Hail is waiting on Google Dataproc and AWS EMR to upgrade from Spark 3.1. For now we expect the Spark 3 CircleCI tests to continue failing until the Hail tests can be resolved. However, we moved forward with this release because it is unclear when Dataproc or EMR will support Spark 3.2.
Thanks to Alex Barreto, Jasser Abidi, Cameron Smith, Marcus Henry, Karen Feng, Joseph Bradley, and William Brandler for their contributions to this release.
New Contributors
- @cameronraysmith made their first contribution in #483
- @JassAbidi and @jkbradley made their first contributions in #501
Full Changelog: v1.1.2...v1.2.1
Release v1.1.2
v1.1.2
Glow v1.1.2 incorporates new functionality for quarantining records with the pipe transformer.
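The idea behind quarantining is that records the piped command fails to process are set aside rather than failing the whole batch. A minimal pure-Python sketch of that pattern; the function name is illustrative, and Glow's actual pipe transformer streams records through an external command rather than a Python callable:

```python
def pipe_with_quarantine(records, transform):
    """Apply `transform` to each record. Records that raise an exception are
    collected in a quarantine list instead of aborting the run, so the
    successfully processed records are still usable."""
    processed, quarantined = [], []
    for rec in records:
        try:
            processed.append(transform(rec))
        except Exception:
            quarantined.append(rec)
    return processed, quarantined
```

For example, `pipe_with_quarantine(["1", "2", "x"], int)` returns the parsed integers along with the unparseable record `"x"` quarantined for later inspection.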
This release includes Java/Scala artifacts in Maven Central, and Python artifacts in PyPI and Conda Forge.
New Contributors
- @dmoore247 made their first contribution in #408
- @mah-databricks made their first contribution in #418
Full Changelog: v1.1.1...v1.1.2
Release v1.1.1
v1.1.1
Glow v1.1.1 incorporates new functionality for sample masking in GWAS, which has been documented as a quickstart guide. Nightly notebook tests are now dockerized, making it easier to integrate Glow with other bioinformatics libraries. VEP schema changes fix a bug with indel parsing.
This release includes Java/Scala artifacts in Maven Central, and Python artifacts in PyPI and Conda Forge.
What's Changed
- Dockerize ci tests by @williambrandler in #414
- Releasev110 by @williambrandler in #411
- adding codecov.yml by @williambrandler in #413
- remove init script from nb test by @williambrandler in #415
- Fix VEP parsing failures stemming from indels by @bboutkov in #402
- Extending sample masking functionality in gwas linear regression by @bcajes in #416
- fix bedtools path by @williambrandler in #417
- add vep example by @williambrandler in #382
- Docker containers for Glow runtime environment on Databricks by @a0x8o in #420
- remove extraneous detail from quickstart docs by @williambrandler in #428
- add data simulation doc page by @williambrandler in #427
- fix pandas lmm notebook link by @williambrandler in #430
Credits
Alex Barreto, Boris Boutkov, Brian Cajes, Karen Feng, William Brandler, dim de grave
Full Changelog: v1.1.0...v1.1.1
v1.1.0
v1.1.0 bumps the Spark version of Glow to 3.1.2
Glow also now runs automated nightly testing of notebooks in the docs, making it easier for users to contribute code or algorithms to help others make use of Glow
This release includes Java/Scala artifacts in Maven Central, and Python artifacts in PyPI and Conda Forge.
Notable changes:
- Upgrade Spark dependency from 3.0.0 to 3.1.2 #396
- Create integration test script #373
- Hail related enhancements #377
- Remove typecheck for numpy arrays #366
Credits: Brian Cajes, Karen Feng, William Brandler, dim de grave
v1.0.1
v1.0.0
We are excited to announce the release of Glow 1.0.0. This release includes major scalability and usability improvements, particularly for GloWGR whole-genome regression and genome-wide association study regression tests. These improvements create a more performant GloWGR workflow with simpler APIs.
Major features and changes include:
- #302, #309: Pandas-based linear regression. Introduced the `linear_regression` Python function, which can be used to perform GWAS linear regression tests for multiple phenotypes simultaneously. The function is optimized for performance through one-time calculation of intermediate matrices common across multiple phenotypes and genotypes. The function can also accept WGR terms as an offset parameter. This function is superior in performance to the existing SQL-based `linear_regression_gwas` function, which only works on a single phenotype.
- #316, #318, #319: Pandas-based logistic regression. Introduced the `logistic_regression` Python function with the same properties mentioned above for linear regression. This function implements a fast multi-phenotype, multi-genotype score test with fallback logic for significant variants indicated by the score test. The currently supported fallback test is the approximate Firth method presented in REGENIE.
- #323: Improved the WGR API so that the user can now provide all the input to a single class and run different functions without passing any arguments. An `estimate_loco_offsets` function was added to perform end-to-end generation of LOCO predictors using a single command. In addition, GloWGR was revised to make its behavior regarding standardization of phenotypes and genotypes, and treatment of the intercept, match the REGENIE algorithm.
- #300: Conversion from Hail MatrixTables to Glow-compatible Spark DataFrames.
- #274: Faster default VCF reader.
- #294: Streamlined GloWGR between WGR and GWAS functions.
- #282: Improved scalability of GloWGR.
- #303: Added hard calling by default to the BGEN reader.
Backwards-incompatible changes:
- #326: Changed the Glow `register` function to not modify the Spark session by default.
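The performance gain from multi-phenotype linear regression comes from computing the covariate projection once and sharing it across all phenotypes and variants. A minimal NumPy sketch of that idea; the function name, argument shapes, and return value are illustrative, not Glow's `linear_regression` API, and we assume non-constant genotypes so the denominator is nonzero:

```python
import numpy as np

def multi_phenotype_linreg(genotypes, phenotypes, covariates):
    """Effect-size estimates for many (variant, phenotype) pairs at once.
    genotypes:  (n_samples, n_variants)
    phenotypes: (n_samples, n_phenotypes)
    covariates: (n_samples, n_covariates), e.g. including an intercept column
    Returns a (n_variants, n_phenotypes) matrix of betas."""
    # One-time work shared across every phenotype and variant:
    Q, _ = np.linalg.qr(covariates)          # orthonormal basis for covariates
    G = genotypes - Q @ (Q.T @ genotypes)    # genotypes with covariates projected out
    Y = phenotypes - Q @ (Q.T @ phenotypes)  # phenotypes with covariates projected out
    gtg = np.sum(G * G, axis=0)              # per-variant g'g (assumed nonzero)
    # Single matrix product yields betas for all pairs simultaneously:
    return (G.T @ Y) / gtg[:, None]
```

Because the residualization against covariates is done once up front, adding more phenotypes costs only extra columns in one matrix multiply, rather than one full regression per phenotype per variant.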
v0.6.0
We are excited to announce the release of Glow 0.6.0. This release includes both Java/Scala and Python artifacts, which can be found in Maven Central and PyPI, respectively. Please note that the name of the Maven artifacts has changed from `glow` to `glow-spark3` and `glow-spark2`, as Glow is now released for both versions of Spark.
Notable additions/changes are:
- #245 Added GloWGR for binary traits
- #240 Input validation for GloWGR
- #242 `transform_loco` function for `RidgeRegression`, which applies the fitted model in a leave-one-chromosome-out scheme to get phenotype predictors for each chromosome
- #243 `reshape_for_gwas` convenience function to prepare the output of GloWGR for use in Glow GWAS functions
- #285 Improved performance of the `lift_over_variants` transformer
- #249 Faster conversion from Python double arrays to Java arrays
- #276 Added support for reading uncompressed or zstd-compressed BGEN files
- #254, #291 Feature to cross-release for Spark 3 and Spark 2
- #258 Fixed error in Python literal conversion
- #264 Fixed splittability state of non-compressed VCFs
- #271, #281 Minor fixes to GloWGR
- #247, #250, #252, #273, #275, #279, #287 Documentation, notebook, and blog improvements
- Other minor fixes
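The leave-one-chromosome-out (LOCO) scheme behind `transform_loco` can be sketched in a few lines of pure Python: for each chromosome, the predictor is the sum of the fitted predictions from every other chromosome. The function name and data layout here are illustrative, not Glow's implementation:

```python
def loco_predictions(per_chromosome_preds):
    """per_chromosome_preds: dict mapping chromosome name to a list of
    per-sample predictions from the model fitted on that chromosome.
    Returns, for each chromosome, the per-sample sum of predictions from
    all *other* chromosomes (total minus own contribution)."""
    total = [sum(vals) for vals in zip(*per_chromosome_preds.values())]
    return {
        chrom: [t - p for t, p in zip(total, preds)]
        for chrom, preds in per_chromosome_preds.items()
    }
```

Excluding a chromosome's own contribution from its offset avoids proximal contamination: a variant tested on chromosome 1 is never adjusted by a predictor that was itself fitted on chromosome 1.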
v0.5.0
This release features the initial release of GloWGR, a framework for distributed whole genome regression. For more information, see the blog post and user guide.
Additional features:
- #222: Accept non-string arguments in transformers
- #213: Accept numpy `ndarray`s as literal arguments to GWAS functions
- #228: Add a user guide for merging variant datasets with Glow