Releases: delta-io/delta-rs
python-v0.9.0
What's Changed
New features
- added support for Databricks Unity Catalog by @nohajc in #1331
- add optimize command in python binding by @loleek in #1313
- optimistic transaction protocol by @roeap in #632
- use new conflict checker in Python by @wjones127 in #1275
- Add Max Partitions Arg to Write by @ColeMurray in #1242
- Write support for additional Arrow datatypes by @chitralverma in #1044
- add package version by @wjones127 in #1243
- improve err msg on use of non-partitioned column by @marijncv in #1221
- update incremental after operations by @wjones127 in #1337
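The optimistic transaction protocol (#632) and the new conflict checker (#1275) appear here by title only. The toy model below illustrates the general idea and is not delta-rs code. It assumes a commit succeeds by atomically creating the next numbered log entry, and that a writer who loses that race may retry only if none of the commits it missed touched the same partitions. ToyLog, commit, and the partition-overlap rule are hypothetical simplifications; a real conflict checker considers much more than partition overlap.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


class CommitConflict(Exception):
    """Raised when a commit missed by this writer touched the same partitions."""


@dataclass
class ToyLog:
    # Hypothetical stand-in for a _delta_log directory: version -> partitions touched.
    entries: Dict[int, FrozenSet[str]] = field(default_factory=dict)

    def latest_version(self) -> int:
        return max(self.entries, default=-1)

    def put_if_absent(self, version: int, touched: FrozenSet[str]) -> bool:
        # Models an atomic "create only if it does not exist" write of one log entry.
        if version in self.entries:
            return False
        self.entries[version] = touched
        return True


def commit(log: ToyLog, read_version: int, touched: FrozenSet[str], max_attempts: int = 5) -> int:
    """Commit at the next free version, checking every missed commit for overlap first."""
    attempt = read_version + 1
    for _ in range(max_attempts):
        # Commits in (read_version, attempt) happened after this writer's snapshot.
        for v in range(read_version + 1, attempt):
            overlap = log.entries[v] & touched
            if overlap:
                raise CommitConflict(f"version {v} also touched {sorted(overlap)}")
        if log.put_if_absent(attempt, touched):
            return attempt
        # Lost the race: another writer created this version, so move past the newest entry.
        attempt = log.latest_version() + 1
    raise CommitConflict(f"gave up after {max_attempts} attempts")
```

Two writers that both read version 0 and write disjoint partitions both succeed (at versions 1 and 2); a third writer from version 0 that touches the same partition as version 1 gets a CommitConflict instead of silently overwriting.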
Fixes
- fix: double url encode of partition key by @mrjoe7 in #1324
- fix: documentation typo fix by @benrutter in #1332
- fix: allow special characters in storage prefix by @wjones127 in #1311
- fix: Fixed Documentation for get_add_actions function by @JHibbard in #1253
- fix: use native-tls for python deltalake releases by @wjones127 in #1244
- refactor: Simplify the Store Backend Configuration code by @mrjoe7 in #1265
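Fix #1324 concerns partition keys being URL-encoded twice. The sketch below shows why that matters, assuming generic percent-encoding via Python's urllib.parse; delta-rs's actual escaping rules for partition paths may differ, and partition_path and parse_partition_segment are hypothetical helpers. Encoding a value a second time turns each % into %25, so a single decode no longer recovers the original value.

```python
from typing import Dict, Tuple
from urllib.parse import quote, unquote


def partition_path(values: Dict[str, str]) -> str:
    """Build a Hive-style key=value path, percent-encoding each part exactly once."""
    return "/".join(f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in values.items())


def parse_partition_segment(segment: str) -> Tuple[str, str]:
    """Split one key=value segment and decode each side exactly once."""
    key, _, value = segment.partition("=")
    return unquote(key), unquote(value)


once = quote("New York", safe="")  # 'New%20York'
twice = quote(once, safe="")       # 'New%2520York': the % itself was re-encoded
```

Because "=" and "/" are encoded inside values, splitting a segment on its first "=" stays unambiguous.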
New Contributors
- @ColeMurray made their first contribution in #1242
- @JHibbard made their first contribution in #1253
- @chitralverma made their first contribution in #1044
- @mrjoe7 made their first contribution in #1265
- @loleek made their first contribution in #1313
- @nohajc made their first contribution in #1331
Full Changelog: python-v0.8.1...python-v0.9.0
rust-v0.10.0
Implemented enhancements:
- Support Optimize on non-append-only tables #1125
Fixed bugs:
- DataFusion integration incorrectly handles partition columns defined "first" in schema #1168
- Datafusion: SQL projection returns wrong column for partitioned data #1292
- Unable to query partitioned tables #1291
Merged pull requests:
- chore: add deprecation notices for commit logic on DeltaTable #1323 (roeap)
- fix: handle local paths on windows #1322 (roeap)
- fix: scan partitioned tables with datafusion #1303 (roeap)
- fix: allow special characters in storage prefix #1311 (wjones127)
- feat: upgrade to Arrow 37 and Datafusion 23 #1314 (rtyler)
- Hide the parquet/json feature behind our own JSON feature #1307 (rtyler)
- Enable the json feature for the parquet crate #1300 (rtyler)
rust-v0.9.0
Implemented enhancements:
- hdfs support #300
- Add decimal primitive type to document #1280
- Improve error message when filtering on non-existent partition columns #1218
Fixed bugs:
- Datafusion table provider: issues with timestamp types #441
- Not matching column names when creating a RecordBatch from MapArray #1257
- All stores created using DeltaObjectStore::new have an identical object_store_url #1188
Merged pull requests:
- Upgrade datafusion to 22 which brings arrow upgrades with it #1249 (rtyler)
- chore: df / arrow changes after update #1288 (roeap)
- feat: read schema from parquet files in datafusion scans #1266 (roeap)
- HDFS storage support via datafusion-objectstore-hdfs #1279 (iajoiner)
- Add description of decimal primitive to SchemaDataType #1281 (ognis1205)
- Fix names and nullability when creating RecordBatch from MapArray #1258 (balbok0)
- Simplify the Store Backend Configuration code #1265 (mrjoe7)
- feat: optimistic transaction protocol #632 (roeap)
- Write support for additional Arrow datatypes #1044 (chitralverma)
- Unique delta object store url #1212 (gruuya)
- improve err msg on use of non-partitioned column #1221 (marijncv)
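Items #1218 and #1221 improve the error raised when a filter names a column the table is not partitioned by. A minimal sketch of that kind of check follows, assuming the (column, operator, value) tuple shape used for partition filters in the Python bindings; validate_partition_filters and its message wording are hypothetical, not the library's code.

```python
from typing import Any, Iterable, Set, Tuple


def validate_partition_filters(
    filters: Iterable[Tuple[str, str, Any]], partition_columns: Set[str]
) -> None:
    """Raise a descriptive error if any filter targets a non-partition column."""
    for column, _op, _value in filters:
        if column not in partition_columns:
            # Listing the valid columns tells the user what they can filter on instead.
            raise ValueError(
                f"{column!r} is not a partition column; "
                f"partition columns are {sorted(partition_columns)}"
            )
```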
python-v0.8.1
What's Changed
- Unique delta object store url by @gruuya in #1212
- fix: make sure we handle data checking correctly by @wjones127 in #1222
Full Changelog: python-v0.8.0...python-v0.8.1
rust-v0.8.0
Implemented enhancements:
- feat(rust): support additional types for partition values #1170
Fixed bugs:
- File pruning does not occur on partition columns #1175
- Bug: Error loading Delta table locally #1157
- Deltalake 0.7.0 with s3 feature compilation error due to rusoto_dynamodb version conflict #1191
- Writing from a Delta table scan using WriteBuilder fails due to missing object store #1186
Merged pull requests:
- build(deps): bump datafusion #1217 (roeap)
- Implement pruning on partition columns #1179 (Blajda)
- feat: enable passing storage options to Delta table builder via Datafusion's CREATE EXTERNAL TABLE #1043 (gruuya)
- feat: typed commit info #1207 (roeap)
- add boolean, date, timestamp & binary partition types #1180 (marijncv)
- feat: extend configuration handling #1206 (marijncv)
- fix: load command for local tables #1205 (roeap)
- Enable passing Datafusion session state to WriteBuilder #1187 (gruuya)
- chore: increment dynamodb_lock version #1202 (wjones127)
- fix: update out-of-date doc about datafusion #1183 (xudong963)
- feat: move and update Optimize operation #1154 (roeap)
- add test for extract_partition_values #1159 (marijncv)
- fix typo #1166 (spebern)
- chore: remove star dependencies #1139 (wjones127)
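Items #1170 and #1180 extend the partition value types that can be read. Delta Lake stores partition values in the log as strings, so reading them means converting each string according to the column type. The following is a sketch only, assuming ISO-style date and timestamp strings and Delta schema type names such as "boolean", "long", "date", and "timestamp"; parse_partition_value is a hypothetical helper, and binary values are left out.

```python
from datetime import date, datetime
from typing import Optional, Union

PartitionValue = Union[bool, int, date, datetime, str, None]


def parse_partition_value(raw: Optional[str], data_type: str) -> PartitionValue:
    """Convert a string-serialized partition value into a typed Python value."""
    if raw is None:
        return None  # a null partition value stays null
    if data_type == "boolean":
        if raw not in ("true", "false"):
            raise ValueError(f"invalid boolean partition value: {raw!r}")
        return raw == "true"
    if data_type in ("byte", "short", "integer", "long"):
        return int(raw)
    if data_type == "date":
        return date.fromisoformat(raw)      # e.g. "2023-01-02"
    if data_type == "timestamp":
        return datetime.fromisoformat(raw)  # e.g. "2023-01-02 03:04:05"
    if data_type == "string":
        return raw
    raise NotImplementedError(f"type {data_type!r} is not handled in this sketch")
```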
python-v0.8.0
What's Changed
- Selectively overwrite data with python by @ismoshkov in #1101
- Python write_deltalake fails if pyarrow table contains binary columns by @rbushri in #1167
- minor: optimize partition lookup for vacuum loop by @houqp in #1120
- improve debuggability of json ser/de errors by @houqp in #1119
- docs(python): update docs by @wjones127 in #1155
- Set AddAction timestamps to milliseconds. Fixes #1124 by @guyrt in #1133
- fix: change unexpected field logging level to debug by @houqp in #1112
- chore: update datafusion by @roeap in #1114
- build(deps): bump tokio from 1.23.1 to 1.24.2 in /delta-inspect by @dependabot in #1118
- Make rustls default across all packages by @wjones127 in #1097
- build(deps): bump openssl-src from 111.22.0+1.1.1q to 111.25.0+1.1.1t in /aws/delta-checkpoint by @dependabot in #1134
- chore: remove star dependencies by @wjones127 in #1139
- add function & test for parsing table_or_uri by @marijncv in #1138
- build(deps): update errno requirement from 0.2 to 0.3 by @dependabot in #1142
- use Path object in writer tests by @marijncv in #1147
- fix typo by @spebern in #1166
- add test for extract_partition_values by @marijncv in #1159
- fix: avoid some allocations in DeltaStorageHandler by @roeap in #1115
- first setup of ruff for python linting by @marijncv in #1158
- feat: move and update Optimize operation by @roeap in #1154
- chore: increment dynamodb_lock version by @wjones127 in #1202
- feat: extend configuration handling by @roeap in #1206
- feat: typed commit info by @roeap in #1207
- build(deps): bump datafusion by @roeap in #1217
New Contributors
Full Changelog: python-v0.7.0...python-v0.8.0
rust-v0.7.0
Implemented enhancements:
- Support FSCK REPAIR TABLE Operation #1092
- Expose the Delta Log in a DataFrame that's easy for analysis #1031
- Provide case-insensitive storage options in backend #999
- Support local file path in CreateBuilder::with_location() #998
- Save operational params in the same way with delta io #1054 (ismoshkov)
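Item #999 asks for storage options to be matched case-insensitively in the backend. One simple way to get that behavior is to normalize keys once at construction time, as in this hypothetical StorageOptions class (not the delta-rs type). Rejecting two spellings of the same key with different values is a design choice of this sketch.

```python
from typing import Dict, Mapping, Optional


class StorageOptions:
    """Hypothetical options container whose key lookups ignore case."""

    def __init__(self, options: Mapping[str, str]) -> None:
        self._options: Dict[str, str] = {}
        for key, value in options.items():
            normalized = key.lower()
            existing = self._options.get(normalized)
            # Two spellings of one key are accepted only if they agree.
            if existing is not None and existing != value:
                raise ValueError(f"conflicting values for storage option {key!r}")
            self._options[normalized] = value

    def get(self, key: str) -> Optional[str]:
        return self._options.get(key.lower())
```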
Fixed bugs:
- DeltaTable DataFusion TableProvider does not support filter pushdown #1064
- DeltaTable DataFusion scan does not prune files properly #1063
- deltalake.DeltaTable constructor hangs in Jupyter #1093
- Transaction log JSON formatting issue when writing data via Python bindings #1017
- crates.io entry is missing link to rustdoc documentation #1076
- URL Registered with ObjectStore registry is different from url in DeltaScan #1018
- Not able to connect to Azure Storage with client id/secret #977
- Deltalake 0.5 crate s3 feature dynamodb version mismatch #973
- Overwrite mode does not work with Azure #939
- Use Chrono without default features #914
- cargo test does not run due to tls conflict #985
- Azure SAS authorization fails with <AuthenticationErrorDetail>Signature fields not well formed. #910
Merged pull requests:
- Make rustls default across all packages #1097 (wjones127)
- Implement filesystem check #1103 (Blajda)
- refactor: move vacuum command to operations module #1045 (roeap)
- feat: enable passing storage options to Delta table builder via DataFusion's CREATE EXTERNAL TABLE #1043 (gruuya)
- feat: improve storage location handling #1065 (roeap)
- Fix to support UTC timezone #1022 (andrei-ionescu)
- feat: harmonize and simplify storage configuration #1052 (roeap)
- feat: expose function to get table of add actions #1033 (wjones127)
- fix: change unexpected field logging level to debug #1112 (houqp)
- fix: datafusion predicate pushdown and dependencies #1071 (roeap)
- fix: azure sas key url encoding #1036 (roeap)
- Add provisional workaround to support CDC #1039 #1042 (Fazzani)
- improve debuggability of json ser/de errors #1119 (houqp)
- Add an example of writing to a delta table with a RecordBatch #1085 (rtyler)
- minor: optimize partition lookup for vacuum loop #1120 (houqp)
- Add missing documentation metadata to Cargo.toml #1077 (johnbatty)
- add test for null_count_schema_for_fields #1135 (marijncv)
- add test for min_max_schema_for_fields #1122 (marijncv)
- add test for get_boolean_from_metadata #1121 (marijncv)
- add test for left_larger_than_right #1110 (marijncv)
- Add test for: to_scalar_value #1086 (marijncv)
- Fix typo in delta-inspect #1072 (byteink)
- chore: update datafusion #1114 (roeap)
* This Changelog was automatically generated by github_changelog_generator
python-v0.7.0
What's Changed
- Add the support of the AWS_PROFILE environment variable for S3 by @fvaleye in #986
- Handle pandas timestamps by @hayesgb in #958
- fix: get azure client secret from config by @roeap in #981
- fix truncating signature on SAS by @damiondoesthings in #1007
- fix: azure sas key url encoding by @roeap in #1036
- test: add Data Acceptance Tests by @wjones127 in #909
- test(python): add read / write benchmarks by @wjones127 in #933
- Add a new release github action for Python binding: macos with universal2 wheel by @fvaleye in #976
- test(python): add azure integration tests by @wjones127 in #912
- Loosen version requirement for maturin by @gyscos in #1005
- Support DataFusion 15 by @andrei-ionescu in #1021
- Add provisional workaround to support CDC #1039 by @Fazzani in #1042
- refactor(api!): refactor Python APIs for getting file list by @wjones127 in #1032
- feat: make DeltaStorageHandler pickle serializable by @roeap in #1016
- Expose checkpoint creation for current table state in python by @ismoshkov in #1058
- feat: expose function to get table of add actions by @wjones127 in #1033
- Save operational params in the same way with delta io by @ismoshkov in #1054
New Contributors
- @hayesgb made their first contribution in #958
- @iajoiner made their first contribution in #1001
- @gyscos made their first contribution in #1005
- @damiondoesthings made their first contribution in #1007
- @Fazzani made their first contribution in #1042
- @gruuya made their first contribution in #1043
- @ismoshkov made their first contribution in #1058
- @byteink made their first contribution in #1072
- @johnbatty made their first contribution in #1077
- @marijncv made their first contribution in #1086
Full Changelog: python-v0.6.4...python-v0.7.0
rust-v0.6.0
What's Changed
- Add a new release github action for Python binding: macos with universal2 wheel by @fvaleye in #976
- feat: check invariants in write command by @roeap in #980
- fix: get azure client secret from config by @roeap in #981
- Update .gitignore and add/remove Cargo.lock when appropriate by @iajoiner in #1001
- Add the support of the AWS_PROFILE environment variable for S3 by @fvaleye in #986
- fix truncating signature on SAS by @damiondoesthings in #1007
- Support DataFusion 15 by @andrei-ionescu in #1021
- Update versions to 0.6.0 and update changelog by @iajoiner in #1023
- Add PR autolabeling by @iajoiner in #1030
- build(deps): bump env_logger from 0.9.3 to 0.10.0 by @dependabot in #962
- build(deps): bump serde_json from 1.0.88 to 1.0.89 by @dependabot in #963
- build(deps): bump bytes from 1.2.1 to 1.3.0 by @dependabot in #964
- build(deps): bump tokio from 1.21.2 to 1.22.0 by @dependabot in #965
- build(deps): bump openssl from 0.10.42 to 0.10.43 by @dependabot in #966
- build(deps): bump async-trait from 0.1.58 to 0.1.59 by @dependabot in #990
- build(deps): bump parquet2 from 0.16.3 to 0.17.0 by @dependabot in #991
- build(deps): bump libc from 0.2.137 to 0.2.138 by @dependabot in #992
- build(deps): bump serde from 1.0.147 to 1.0.149 by @dependabot in #995
- build(deps): bump flatbuffers from 2.1.2 to 22.9.29 in /aws/delta-checkpoint by @dependabot in #952
- build(deps): bump flatbuffers from 2.1.2 to 22.9.29 in /delta-inspect by @dependabot in #951
New Contributors
- @iajoiner made their first contribution in #1001
- @damiondoesthings made their first contribution in #1007
Full Changelog: python-v0.6.4...rust-v0.6.0
rust-v0.5.0
What's Changed
- Add max and min values to Statistics by @viirya in #327
- Use WebIdentityProvider for DynamoDb client in k8s by @rusty-jules in #328
- bump rust version in preparation for the next release by @houqp in #329
- fix automated rust release CD job by @houqp in #326
- add pandas keyword to python package metadata by @houqp in #325
- expose update_incremental API to python binding by @houqp in #332
- update python related docs by @houqp in #331
- Upgrade arrow, parquet and datafusion by @Dandandan in #335
- added warning message if the detected glibc version is < 2.28 by @Smurphy000 in #334
- Convert scalar value to correct type based on arrow data type. by @viirya in #336
- Fix consecutive checkpoints by @mosyp in #333
- Fix new clippy warnings coming up in CI by @xianwill in #341
- perform incremental update after transaction commit by @houqp in #343
- Add timestamp handling to checkpoint writer by @xianwill in #340
- Add clear table state in load_version when no checkpoint found. by @zijie0 in #347
- Low level create table by @Smurphy000 in #342
- pub DeltaTable method to retrieve table configurations by @Smurphy000 in #356
- Modify partition_values field type in Add/Remove actions. by @zijie0 in #354
- fix sleep workaround in checkpoint test by @houqp in #360
- Modify get_files_by_partitions to use partition values by @zijie0 in #362
- Fix get_latest_version returning version < 0. by @zijie0 in #364
- fix typo in python release CI config by @houqp in #365
- cache cargo builds by @houqp in #359
- Add '.tmp' suffix to temporary file of prepared commit by @mosyp in #366
- support partition value string deserialization for float/double/date by @houqp in #363
- Implement atomic put_obj. by @zijie0 in #367
- Make Format.options to be required field by @mosyp in #370
- Allow filesystem backend put_obj to overwrite existing by @mosyp in #376
- Wrap DeltaTransactionError with DeltaTableError. by @zijie0 in #374
- Refactoring of black, isort, mypy tools usages into pyproject.toml by @fvaleye in #378
- Implement consistent behavior in Windows with regard to swap parameter. by @zijie0 in #379
- Merge Cargo.toml into pyproject.toml by @fvaleye in #381
- Update datafusion and ballista links in README by @ei-grad in #382
- Add sts assume role credentials provider for S3 by @mosyp in #383
- Reuse table/storage instances in checkpoints by @mosyp in #384
- additional error handling to atomic_rename by @Marnixvdb in #386
- Upgrade to DataFusion 5.0 by @Dandandan in #389
- added initial commit info on create method for a DeltaTable by @Smurphy000 in #387
- Google cloud by @blogle in #355
- Remove version param from create_checkpoint_from_table by @mosyp in #399
- Implement delete_objs in fs and s3 storage backends. by @zijie0 in #395
- Add examples for reading delta table with Rust API. by @zijie0 in #400
- Update pyproject definition in pyproject.toml by @fvaleye in #405
- Use tokio::fs::rename in put_obj. by @zijie0 in #403
- Fix duplicates on update call by @mosyp in #398
- Add a Makefile build task in the Python binding by @fvaleye in #410
- Add implementation for load_with_datetime in Python package. by @zijie0 in #411
- Add filesystem argument for reading DeltaTable in Python binding by @fvaleye in #414
- Fix reading nullable action fields from parquet by @mosyp in #417
- Ensures that all table schemas are of StructType by @blogle in #415
- Gcs writer bugs by @blogle in #412
- Add S3StorageOptions to allow configuring S3 backend explicitly by @xianwill in #418
- Read a DeltaTable using a Data Catalog by @fvaleye in #419
- Change checkpoint creation logs from info to debug by @mosyp in #423
- Add LICENSE file in the Python binding and refer it in the pyproject by @fvaleye in #422
- Audit action field optionality by @fvaleye in #380
- Introduce DeltaConfig and tombstones retention policy by @mosyp in #420
- [README] Replace the inactive rust-dataframe with polars by @sa- in #426
- Bump arrow to 6.0.0-SNAPSHOT and bring map support to schema by @mosyp in #375
- Support partition value string deserialization for timestamp/binary by @zijie0 in #371
- Document the valid primitive types by @Ekleog in #430
- Add is_non_acquirable field to the dynamodb lock by @mosyp in #429
- Clean up DeltaTransactionError by @mosyp in #432
- Optimize remove action apply with early iteration exit #424 by @akshay26031996 in #431
- Decode path in Add and Remove actions. by @zijie0 in #434
- reenable datafusion integration with temporary fork by @houqp in #436
- Add history command in delta-rs by @fvaleye in #428
- Release Python binding version 0.5.3 by @fvaleye in #439
- Add delete_lock and fix release_lock by @mosyp in #440
- Fixing test to compare sorted vec by @akshay26031996 in #443
- Batch-apply remove actions in tombstone handling by @dispanser in #444
- Update datafusion links by @bbigras in #446
- Run all tests under s3 feature flag by @mosyp in #447
- Add maturin develop command with extras in Python binding by @fvaleye in #448
- README: mark Checkpoint creation as done for Rust by @bbigras in #449
- Fix broken tombstones metadata when extended_file_metadata is different between tombstones in state by @mosyp in #450
- No tombstone loading by @dispanser in #445
- return lazy iterator in get tombstone methods by @houqp in #452
- Generate new session name on assume role credentials provider refresh by @mosyp in #451
- Add pool_idle_timeout options for s3 and sts clients by @mosyp in #458
- Do action reconciliation by @viirya in #456
- Use action default stats by @viirya in #459
- Add new module for DeltaTableState by @viirya in #464
- Support hash lookup by path string for Remove action by @viirya in #462
- Fix coverage of the Python tests by @fvaleye in #467
- materialize tables in python via native storage backend by @roeap in #463
- Make file storage backend's atomic rename async by @viirya in #471
- Add GCS feature to the Python Cargo.toml file by @kelvins in #476
- Throw an error when filter key is not in partitioned columns. by @zijie0 in #475
- Fix documentation for the DeltaStorageHandler by @fvaleye in #483
- Update README.adoc by @dennyglee in #482
- Update az...