Releases: delta-io/delta-rs

python-v0.15.2: predicate overwrite, improved table state replay

05 Feb 12:41

New features

  • feat: allow merge_execute to release the GIL by @emcake in #2091
  • feat: arrow backed log replay and table state by @roeap in #2037
  • feat: update table config to contain new config keys by @roeap in #2127
  • feat: expose stats schema on Snapshot by @roeap in #2128
  • feat: implementation for replaceWhere by @r3stl355 in #1996
  • feat: implement clone for DeltaTable struct by @mightyshazam in #2160
  • feat: introduce schema evolution on RecordBatchWriter by @rtyler in #2024

Bug Fixes

  • fix: properly deserialize percent-encoded file paths of Remove actions, to make sure tombstone and file paths match by @sigorbor in #2035
  • fix: reinstate copy-if-not-exists passthrough by @emcake in #2083
  • refactor: add deltalake-gcp crate by @ion-elgreco in #2061
  • fix: schema issue within writebuilder by @universalmind303 in #2106
  • fix: temporarily skip s3 roundtrip test by @roeap in #2124
  • fix: set partition values for added files when building compaction plan by @alexwilcoxson-rel in #2119
  • fix: clean-up paths created during tests by @roeap in #2126
  • fix: add missing pandas import by @Tim-Haarman in #2116
  • fix: order logical schema to match physical schema by @Blajda in #2129
  • fix: do not write empty parquet file/add on writer close; accurately … by @alexwilcoxson-rel in #2123
  • fix: prevent empty stats struct during parquet write by @alexwilcoxson-rel in #2125
  • fix(#2143): keep specific error type when writing fails by @abaerptc in #2144
  • fix(s3): restore working test for DynamoDb log store repair log on read by @dispanser in #2120
  • fix: made generalize_filter less permissive, also added more cases by @emcake in #2149
  • fix: allow loading of tables with identity columns by @rtyler in #2155
  • fix: replace BTreeMap with IndexMap to preserve insertion order by @roeap in #2150
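The fix in #2035 concerns file paths that arrive percent-encoded in Remove actions. The snippet below is a rough illustration using Python's standard urllib. It is not the delta-rs (Rust) implementation, and the paths are made-up examples. It shows why decoding before comparing lets a tombstone path match the path recorded by the original Add action.

```python
from urllib.parse import quote, unquote


def normalize_path(path: str) -> str:
    """Decode percent-escapes (e.g. "%20" -> " ") so both spellings compare equal."""
    return unquote(path)


def tombstone_matches(add_path: str, remove_path: str) -> bool:
    # Compare decoded forms: a raw string comparison misses the match
    # when only one side was percent-encoded.
    return normalize_path(add_path) == normalize_path(remove_path)


# Hypothetical data file under a partition whose value contains a space.
add_path = "date=2024-01-01 00:00:00/part-00000.parquet"
# The same path as it might appear percent-encoded in a Remove action.
remove_path = quote(add_path)

print(add_path == remove_path)                   # False
print(tombstone_matches(add_path, remove_path))  # True
```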

Other Changes

New Contributors

Full Changelog: python-v0.15.1...python-v0.15.2

python-v0.15.1

07 Jan 21:53
180b35b

New features

  • feat(python, rust): expose custom_metadata for all operations by @ion-elgreco in #2032
  • feat: refactor WriterProperties class by @ion-elgreco in #2030
  • refactor: increase metadata action usage by @roeap in #2027
  • feat(rust): add more commit info to most operations by @ion-elgreco in #2009
  • feat(python): add schema conversion of FixedSizeBinaryArray and FixedSizeList by @balbok0 in #2005
  • feat: retry with exponential backoff for DynamoDb interaction by @dispanser in #1975
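#1975 adds retries with exponential backoff around DynamoDb interaction. The sketch below shows the general pattern in Python and is not the delta-rs code. The function name, the TransientError type, and the delay parameters are illustrative assumptions. "Full jitter" is one common variant of the technique, chosen here for simplicity.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a throttling or timeout error."""


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying on TransientError with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay cap doubles per attempt (0.1, 0.2, 0.4, ... up to max_delay);
            # a random "full jitter" wait below it keeps writers from retrying in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


# Usage: an operation that is throttled twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("throttled")
    return "ok"


result = retry_with_backoff(flaky, sleep=lambda _: None)  # skip real sleeping
print(result, calls["n"])  # ok 3
```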

Bug Fixes

  • fix: ensure metadata cleanup do not corrupt tables without checkpoints by @Blajda in #2044
  • fix: remove casts of structs to record batch by @Blajda in #2033
  • fix: use temporary table names during the constraint checks by @r3stl355 in #2017

Other Changes

Full Changelog: python-v0.15.0...python-v0.15.1

python-v0.15.0: check constraints operation, and faster MERGE

02 Jan 15:15
093a756

New features

Bug Fixes

  • fix: respect case sensitivity on operations by @Blajda in #1954
  • fix: case sensitivity for z-order by @Blajda in #1982
  • fix: implement consistent formatting for constraint expressions by @Blajda in #1985
  • fix: remove the get_data_catalog() function by @rtyler in #1941
  • fix: handle empty table response in unity api by @JonasDev1 in #1963
  • fix: flakey gcs test by @roeap in #1987
  • fix: enable S3 integration tests to be configured via environment vars by @dispanser in #1966
  • fix: properly decode percent-encoded file paths coming from parquet checkpoints by @sigorbor in #1970

Breaking Changes

To control the writer properties in DeltaTable.update(), you now need to pass a deltalake.WriterProperties instance instead of a dictionary.

Other Changes

New Contributors

Full Changelog: python-v0.14.0...python-v0.15.0

python-v0.14.0

05 Dec 17:32
8f5c41d

New features

  • feat: adopt kernel schemas and improve protocol support by @roeap in #1756
  • feat: drop python 3.7 and adopt 3.12 by @roeap in #1859
  • feat: expose cleanup_metadata in Python by @r3stl355 in #1826
  • feat: handle protocol compatibility by @roeap in #1807
  • feat(python): expose convert_to_deltalake by @ion-elgreco in #1842
  • feat(python): add pyarrow to delta compatible schema conversion in writer/merge by @ion-elgreco in #1820
  • feat(python): expose create to DeltaTable class by @ion-elgreco in #1932
  • feat(python): expose rust writer as additional engine v2 by @ion-elgreco in #1891
  • feat: extend write_deltalake to accept Deltalake schema by @r3stl355 in #1922

Bug Fixes

  • fix: prevent writing checkpoints with a version that does not exist in table state by @rtyler in #1863
  • fix: checkpoint error with Azure Synapse by @PierreDubrulle in #1848
  • fix: improve catalog failure error message, add missing Glue native-tls feature dependency by @r3stl355 in #1883
  • fix: use physical name for column name lookup in partitions by @aersam in #1836
  • fix(rust/python): optimize.compact not working with tables with mixed large/normal arrow by @ion-elgreco in #1926
  • fix: support os.PathLike for table references by @bolkedebruin in #1809
  • fix: add buffer flushing to filesystem writes by @r3stl355 in #1911
  • fix: fail fast for opening non-existent path by @dimonchik-suvorov in #1917
  • fix: compare timestamp partition values as timestamps instead of strings by @sigorbor in #1895
  • fix: add high-level checking for append-only tables by @junjunjd in #1887
  • fix: prune each merge bin with only 1 file by @haruband in #1902
  • fix: get rid of panic in during table by @dimonchik-suvorov in #1928
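#1895 compares timestamp partition values as timestamps rather than as strings. The short Python example below shows how string comparison can invert the order. It assumes partition values written without zero-padded hours; both the values and the parse format are hypothetical and chosen only for the demonstration.

```python
from datetime import datetime

earlier = "2021-01-01 9:00:00"
later = "2021-01-01 10:00:00"

# Lexicographic comparison goes character by character: "9" > "1",
# so the earlier timestamp wrongly compares as greater.
print(earlier < later)  # False


def parse_partition_ts(value: str) -> datetime:
    # Assumed format for this example only; %H accepts 1- or 2-digit hours.
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


print(parse_partition_ts(earlier) < parse_partition_ts(later))  # True
```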

Other Changes

New Contributors

Full Changelog: python-v0.13.0...python-v0.14.0

rust-v0.16.5

15 Nov 16:14

⚠️ If you are upgrading from any release other than 0.16.4, please also read the rust-v0.16.4 release notes below ⚠️

This release includes a number of minor bug fixes. One of them affects users of create_checkpoint_for(), which previously allowed the caller to specify a version that did not match the loaded table state. That mismatch led to an incorrect _last_checkpoint file and a broken Delta table.

rust-v0.16.4

12 Nov 21:38

The v0.16.4 version of the deltalake crate contains one notable and important fix: an upgrade to the dynamodb_lock crate to v0.6.1.

That release changes the expected format of leaseDuration in DynamoDb from String to Number. This fixes a long-overlooked bug in the lock code that prevented stale locks from being reaped automatically using DynamoDb's TTL attribute.
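For context, DynamoDb's TTL process only considers an item when the TTL attribute is a top-level Number holding a Unix epoch time in seconds. String values are ignored, which is why String-typed leaseDuration values were never reaped. The Python sketch below only simulates that rule on items written in DynamoDb's attribute-value JSON shape ({"S": ...} / {"N": ...}). Please note its limits:

- It makes no AWS calls and is not the dynamodb_lock implementation.
- It assumes leaseDuration stores an expiry time in epoch seconds.
- The function name and sample items are hypothetical.

```python
import time
from typing import Any, Dict, Optional


def ttl_eligible(item: Dict[str, Dict[str, Any]],
                 attr: str = "leaseDuration",
                 now: Optional[float] = None) -> bool:
    """Simplified simulation of whether DynamoDb TTL would treat `item` as expired."""
    now = time.time() if now is None else now
    value = item.get(attr)
    if value is None or "N" not in value:
        return False  # missing or non-Number attribute: TTL ignores the item
    return float(value["N"]) <= now  # assumed: expiry stored as epoch seconds


now = 1_700_000_000
# Hypothetical lock items whose (assumed) expiry time is already in the past.
stale_as_string = {"key": {"S": "my_table"}, "leaseDuration": {"S": "1699990000"}}
stale_as_number = {"key": {"S": "my_table"}, "leaseDuration": {"N": "1699990000"}}

print(ttl_eligible(stale_as_string, now=now))  # False: String type is ignored
print(ttl_eligible(stale_as_number, now=now))  # True: expiry is in the past
```

Real TTL deletion is also asynchronous, so eligible items are removed some time after they expire, not immediately.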

⚠️ CAUTION: Users of DynamoDb-based locking should use caution when upgrading their applications. ⚠️

Pre-existing locks should be properly respected by this newer version of dynamodb_lock. However, a lock that is not respected can result in data corruption of Delta tables. It is therefore recommended that when upgrading:

  • All writers using a given DynamoDb table for locking are stopped.
  • DynamoDb is inspected and stale locks are cleared.
  • TTL is enabled on the table on the leaseDuration attribute (adjust if the application uses a different attribute name for lease duration).
  • Writers are restarted.

python-v0.13.0: Repair operation and PyArrow 13+ support

06 Nov 00:13
a5e2e3b

New features

Bug fixes

Other changes

New Contributors

Full Changelog: python-v0.12.0...python-v0.13.0

python-v0.12.0: Delete, Update, and Merge

19 Oct 01:00
3bcc428

What's Changed

New features

Bug fixes

  • fix: exception string in writer.py by @sebdiem in #1665
  • fix: change partitioning schema from large to normal string for pyarrow<12 by @ion-elgreco in #1671
  • fix: use epoch instead of ce for date stats by @universalmind303 in #1672
  • fix: unify environment variables referenced by Databricks docs by @rtyler in #1673
  • fix!: ensure predicates are parsable by @Blajda in #1690
  • fix: merge operation with string predicates by @Blajda in #1705
  • fix: reorder encode_partition_value() checks and add tests by @ldacey in #1733

Other contributions

Breaking changes

The DeltaTable.history() method now returns transactions in reverse chronological order. This matches the Spark implementation.

DeltaTable.files_by_partitions() has been removed. It has been deprecated since 0.7.0. Use DeltaTable.file_uris() instead.

DeltaTable.pyarrow_schema() has been removed. It has been deprecated since 0.7.0. Use DeltaTable.schema().to_pyarrow() instead.
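Code written against older releases can be ported with thin wrappers like the ones below. This is a sketch assuming dt is a deltalake.DeltaTable from python-v0.12.0 or later. The wrapper names are hypothetical; only history(), file_uris() and schema().to_pyarrow() come from the notes above. Partition filters are passed positionally in the (column, operator, value) tuple form that files_by_partitions() accepted.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Assumes `dt` is a deltalake.DeltaTable (python-v0.12.0 or later).


def history_oldest_first(dt: Any) -> List[Dict[str, Any]]:
    """Restore oldest-first order for code that relied on the old history() order."""
    # history() now returns transactions newest first, matching Spark.
    return list(reversed(dt.history()))


def file_uris_for_partitions(dt: Any, partition_filters: Sequence[Tuple[str, str, Any]]) -> List[str]:
    """Stand-in for the removed DeltaTable.files_by_partitions()."""
    return dt.file_uris(list(partition_filters))


def pyarrow_schema(dt: Any) -> Any:
    """Stand-in for the removed DeltaTable.pyarrow_schema()."""
    return dt.schema().to_pyarrow()
```

With a real table, usage would look like pyarrow_schema(DeltaTable("path/to/table")), where the path is a placeholder.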

New Contributors

Full Changelog: python-v0.11.0...python-v0.12.0

rust-v0.16.0

27 Sep 19:14
55a309d

Full Changelog

Implemented enhancements:

  • Expose Optimize option min_commit_interval in Python #1640
  • Expose create_checkpoint_for #1513
  • integration tests regularly fail for HDFS #1428
  • Add Support for Microsoft OneLake #1418
  • add support for atomic rename in R2 #1356

Fixed bugs:

  • Writing with large arrow types (e.g. large_utf8), writes wrong partition encoding #1669
  • [python] Different stringification of partition values in reader and writer #1653
  • Unable to interface with data written from Spark Databricks #1651
  • get_last_checkpoint does some unnecessary listing #1643
  • PartitionWriter's buffer_len doesn't include incomplete row groups #1637
  • Slack community invite link has expired #1636
  • delta-rs does not appear to support tables with liquid clustering #1626
  • Internal Parquet panic when using a Map type. #1619
  • partition_by with "$" on local filesystem #1591
  • ProtocolChanged error when performing append write #1585
  • Unable to cargo update using git tag or rev on Rust 1.70 #1580
  • NoMetadata error when reading delta table #1562
  • Cannot read delta table: Delta protocol violation #1557
  • Update the CODEOWNERS to capture the current reviewers and contributors #1553
  • [Python] Incorrect file URIs when partition values contain escape character #1533
  • add documentation how to Query Delta natively from datafusion #1485
  • Python: write_deltalake to ADLS Gen2 issue #1456
  • Partition values that have been url encoded cannot be read when using deltalake #1446
  • Error optimizing large table #1419
  • Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
  • ImportError: deltalake/_internal.abi3.so: cannot allocate memory in static TLS block #1380
  • Invalid JSON in log record missing field schemaString for DLT tables #1302
  • Special characters in partition path not handled locally #1299

Merged pull requests:

  • chore: bump rust crate version #1675 (rtyler)
  • fix: change partitioning schema from large to normal string for pyarrow<12 #1671 (ion-elgreco)
  • feat: allow to set large dtypes for the schema check in write_deltalake #1668 (ion-elgreco)
  • docs: small consistency update in guide and readme #1666 (ion-elgreco)
  • fix: exception string in writer.py #1665 (sebdiem)
  • chore: increment python library version #1664 (wjones127)
  • docs: fix some typos #1662 (ion-elgreco)
  • fix: more consistent handling of partition values and file paths #1661 (roeap)
  • docs: add docstring to protocol method #1660 (MrPowers)
  • docs: make docs.rs build docs with all features enabled #1658 (simonvandel)
  • fix: enable offset listing for s3 #1654 (eeroel)
  • chore: fix the incorrect Slack link in our readme #1649 (rtyler)
  • fix: compensate for invalid log files created by Delta Live Tables #1647 (rtyler)
  • chore: proposed updated CODEOWNERS to allow better review notifications #1646 (rtyler)
  • feat: expose min_commit_interval to optimize.compact and optimize.z_order #1645 (ion-elgreco)
  • fix: avoid excess listing of log files #1644 (eeroel)
  • fix: introduce support for Microsoft OneLake #1642 (rtyler)
  • fix: explicitly require chrono 0.4.31 or greater #1641 (rtyler)
  • fix: include in-progress row group when calculating in-memory buffer length #1638 (BnMcG)
  • chore: relax chrono pin to 0.4 #1635 (houqp)
  • chore: update datafusion to 31, arrow to 46 and object_store to 0.7 #1634 (houqp)
  • docs: update Readme #1633 (dennyglee)
  • chore: pin the chrono dependency #1631 (rtyler)
  • feat: pass known file sizes to filesystem in Python #1630 (eeroel)
  • feat: implement parsing for the new domainMetadata actions in the commit log #1629 (rtyler)
  • ci: fix python release #1624 (wjones127)
  • ci: extend azure timeout #1622 (wjones127)
  • feat: allow multiple incremental commits in optimize #1621 (kvap)
  • fix: change map nullable value to false #1620 (cmackenzie1)
  • Introduce the changelog for the last couple releases #1617 (rtyler)
  • chore: bump python version to 0.10.2 #1616 (wjones127)
  • perf: avoid holding GIL in DeltaFileSystemHandler #1615 (wjones127)
  • fix: don't re-encode paths #1613 (wjones127)
  • feat: use url parsing from object store #1592 (roeap)
  • feat: buffered reading of transaction logs #1549 (eeroel)
  • feat: merge operation #1522 (Blajda)
  • feat: expose create_checkpoint_for to the public #1514 (haruband)
  • docs: update Readme #1440 (roeap)
  • refactor: re-organize top level modules #1434 (roeap)
  • feat: integrate unity catalog with datafusion #1338 (roeap)

python-v0.11.0

26 Sep 16:10
b447934

What's Changed

New Features

  • feat: expose min_commit_interval to optimize.compact and optimize.z_order by @ion-elgreco in #1645
  • feat: allow multiple incremental commits in optimize by @kvap in #1621
  • feat: introduce support for Microsoft OneLake by @rtyler in #1642

Performance Improvements

  • feat: pass known file sizes to filesystem in Python by @eeroel in #1630
  • fix: avoid excess listing of log files by @eeroel in #1644
  • fix: enable offset listing for s3 by @eeroel in #1654

Other

  • chore: update datafusion to 31, arrow to 46 and object_store to 0.7 by @houqp in #1634
  • feat: implement parsing for the new domainMetadata actions in the commit log by @rtyler in #1629
  • feat: integrate unity catalog with datafusion by @roeap in #1338
  • fix: compensate for invalid log files created by Delta Live Tables by @rtyler in #1647
  • docs: add docstring to protocol method by @MrPowers in #1660
  • docs: fix some typos by @ion-elgreco in #1662
  • feat: use url parsing from object store by @roeap in #1592
  • chore: proposed updated CODEOWNERS to allow better review notifications by @rtyler in #1646
  • fix: more consistent handling of partition values and file paths by @roeap in #1661

New Contributors

Full Changelog: python-v0.10.2...python-v0.11.0