All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
3.4.0 - 2024-01-30
3.3.0 - 2023-04-05
3.2.4 - 2022-10-18
- s3_date_prefix_scan won't error when a path is not found
3.2.3 - 2022-10-10
- Do not delete files, just overwrite the delta table with mode overwrite in delta_write block
3.2.2 - 2022-10-06
- Bugfix: Add new exception for delta table read errors
3.2.1 - 2022-10-06
- Add new exception for delta table read errors
3.2.0 - 2022-10-06
- Delta core and delta storage 2.1
- Use bool param mergeSchema for all delta writes
3.1.0 - 2022-10-04
- Test with latest 2 pyspark versions only
3.0.0 - 2022-08-22
- Breaking change: HiveTable.DatabaseName and HiveTable.TableName are mandatory
- Add: Support for Databricks Unity Catalog
- Change: file registry date scan now uses SQL LIST for UC
2.8.1 - 2021-12-17
- Add possibility to define JSON schema with JSON or PySpark code when reading XML files
2.8.0 - 2021-12-13
- Add possibility to define JSON schema with JSON or PySpark code when reading JSON files
2.7.3 - 2021-11-18
- Add support to read the Delta change data feed when running on Databricks platform
2.7.2 - 2021-11-11
- Add mergeSchema option to delta write block
2.7.1 - 2021-09-30
- Set default logging formatter to detailed
- Fixed code smells in upsert.py
2.7.0 - 2021-08-30
- Support for pyspark 3.1
2.6.0 - 2021-08-17
- Support for retry in postgresql and mysql upsert
2.5.1 - 2021-06-28
- Add logging of lift parameters and their values before starting lift
2.5.0 - 2021-06-17
- Add secret word filter for logging
- Use f-string for logging statements
- Update how to run test instructions
- Make mysql library non extra
2.4.0 - 2021-05-18
- custom::sql block for executing SQL statements
- Fix some code smells reported by SonarCloud
2.3.0 - 2021-03-29
- get_json_object transform function
2.2.0 - 2021-03-26
- MySQL upsert support
- Test against postgres versions 10, 11, 12 and 13
2.1.0 - 2021-03-23
- split transform function
- get_item transform function
2.0.0 - 2021-02-24
- Support for pyspark 2.4.5
1.11.0 - 2021-02-24
- Add substring transform function
1.10.1 - 2021-01-25
- The fileregistry::delta_diff fileregistry will read all data if the default start date is before the first version of the delta table
1.10.0 - 2021-01-22
- The fileregistry::delta_diff fileregistry for delta files
1.9.2 - 2020-12-16
- Parameter resolution now also happens inside substrings, for example "${myVar}/extra"
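
A rough illustration of what resolving a parameter inside a longer string can look like. This is only a sketch under assumptions: the helper name resolve_params and the error handling are hypothetical and are not taken from this project's code.

```python
import re

# Hypothetical helper for illustration only; not this project's implementation.
# Matches placeholders of the form ${name}.
_PARAM = re.compile(r"\$\{(\w+)\}")


def resolve_params(value: str, params: dict) -> str:
    """Replace every ${name} placeholder, including ones embedded in a longer string."""

    def _lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"Unknown parameter: {name}")
        return str(params[name])

    return _PARAM.sub(_lookup, value)


# The placeholder is resolved even though it is only part of the string.
print(resolve_params("${myVar}/extra", {"myVar": "s3://bucket/data"}))
```

Strings without placeholders pass through unchanged, and an unknown name raises an error in this sketch rather than being left in place.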
1.9.1 - 2020-12-08
- Add support for nested columns in drop_duplicates transform function
1.9.0 - 2020-12-02
- Multiple outputs in custom::python_codeblock
1.8.0 - 2020-11-11
- Add drop_duplicates transform function
1.7.1 - 2020-11-10
- Allow a retention interval shorter than 7 days for delta tables
1.7.0 - 2020-10-30
- Write json files through write::batch_json block
- Update dependency versions
1.6.3 - 2020-10-29
- Bugfix: When creating empty arrays they looked like array. That is not supported by spark 3 so instead we create empty array
1.6.2 - 2020-10-29
- Bugfix for loading empty directories with batch_delta using spark 3.0
1.6.1 - 2020-10-27
- Bugfix: Databricks optimize of file-registry after updating
1.6.0 - 2020-10-27
- Changed python version requirements to include python 3.9
- Add Databricks optimize and vacuum of file-registry after updating
1.5.0 - 2020-10-23
- Options parameter in load::batch_json to be able to submit more settings when loading json files (like multiLine: true)
1.4.3 - 2020-10-21
- Use of psycopg2.extras.execute_values to remove and simplify code
- Utils functions chunked and flatten_rows_dict in getl/common/upsert.py
- When checking if a file registry exists in an empty directory or in a S3 prefix that doesn't exist, a different exception is raised
1.4.2 - 2020-09-30
- Critical bugfix for Hive table creation.
1.4.1 - 2020-09-30
- Support for PartitionBy columns for HiveTable
1.4.0 - 2020-09-29
- Support for PartitionBy columns in write::batch_delta
1.3.0 - 2020-09-28
- Support for loading csv files with batch_csv
1.2.0 - 2020-09-09
- Explode functionality is added to the generic transform block
1.1.0 - 2020-09-03
- Postgres upsert support
- Schema for batch_json and batch_xml is now optional
1.0.1 - 2020-08-24
- Support for s3a:// paths
- python -m bin bumpversion changes the CHANGELOG.md for a release changelog
- Documentation on how to release a new version
- Links to versions in CHANGELOG.md
- Fix the fileregistry type in docs/migrations/s3_date_prefix_scan.md to fileregistry::s3_date_prefix_scan
1.0.0 - 2020-08-19
- s3_date_prefix_scan fileregistry, based upon prefix_based_date, see migration.
- pyspark 3.0 support including backwards compatibility support for pyspark 2.4
- Python 3.8 support for pyspark 3.0
- prefix_based_date fileregistry.