
Releases: GoogleCloudDataproc/spark-bigquery-connector

0.16.1

11 Jun 18:15

New Features

  • Apache Arrow is now the default read format. In our benchmarks, Arrow reads are about 40% faster than Avro. (PR #180)
  • Apache Avro has been added as an intermediate write format. In our testing it improves performance when the DataFrame is larger than 50GB. (PR #163)
  • Usage simplification: instead of the mandatory table option, users can now pass the table through the built-in path parameter of load() and save(), so a read becomes df = spark.read.format("bigquery").load("source_table") and a write becomes df.write.format("bigquery").save("target_table"). (PR #176)
  • An experimental implementation of the DataSource v2 API has been added. It is not ready for production use.
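The simplified path-based usage can be sketched in PySpark. This is a minimal sketch, assuming pyspark and the connector jar are available and a live BigQuery connection exists; the table names are placeholders.

```python
def copy_table(source_table: str, target_table: str) -> None:
    """Sketch of the simplified 0.16.1 API: the table is passed directly
    to load()/save() instead of via the 'table' option. Assumes pyspark
    and the spark-bigquery-connector jar are on the classpath; table
    names are placeholders."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.format("bigquery").load(source_table)
    df.write.format("bigquery").save(target_table)
```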

Dependency Updates

  • BigQuery API has been upgraded to version 1.116.1
  • BigQuery Storage API has been upgraded to version 0.133.2-beta
  • gRPC has been upgraded to version 1.29.0
  • Guava has been upgraded to version 29.0-jre

0.15.1-beta

27 Apr 17:39

A bug fix release:

  • PR #158: Users can now add the spark.datasource.bigquery prefix to configuration options, in order to support Spark's --conf command-line flag
  • PR #160: View materialization is now performed only on action, fixing a bug where it was done too early
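The prefix handling from PR #158 means a key passed via --conf with the spark.datasource.bigquery prefix corresponds to a plain connector option. A rough pure-Python illustration of that mapping (not the connector's actual code; viewsEnabled is used only as a sample option name):

```python
# Illustration only: how a prefixed Spark conf key corresponds to a plain
# connector option name, mirroring the behavior described in PR #158.
PREFIX = "spark.datasource.bigquery."

def to_connector_option(conf_key: str) -> str:
    # Keys without the prefix are passed through unchanged.
    return conf_key[len(PREFIX):] if conf_key.startswith(PREFIX) else conf_key

print(to_connector_option("spark.datasource.bigquery.viewsEnabled"))  # viewsEnabled
```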

0.15.0-beta

21 Apr 01:56
  • PR #150: Reading DataFrames should be quicker, especially in interactive usage such as in notebooks
  • PR #154: Upgraded to the BigQuery Storage v1 API
  • PR #146: Authentication can now be done using an access token, in addition to a credentials file, credentials, and the GOOGLE_APPLICATION_CREDENTIALS environment variable.
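Token-based authentication from PR #146 might look like the sketch below. The gcpAccessToken option name is taken from the connector's later documentation and should be treated as an assumption for this release; the token value is a placeholder, and pyspark plus the connector jar are assumed.

```python
def read_with_token(table: str, access_token: str):
    """Sketch of reading with an OAuth access token (PR #146). The
    'gcpAccessToken' option name is assumed from later connector docs;
    requires pyspark and the spark-bigquery-connector jar."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return (spark.read.format("bigquery")
            .option("gcpAccessToken", access_token)
            .option("table", table)
            .load())
```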

0.14.0-beta

31 Mar 19:34
  • Issue #96: Added Arrow as a supported format for reading from BigQuery
  • Issue #130: Added the field description to the schema metadata
  • Issue #124: Fixed null values in ArrayType
  • Issue #143: Allowed setting SchemaUpdateOptions when writing to BigQuery
  • PR #148: Add support for writing clustered tables
  • Upgraded the google-cloud-bigquery library to version 1.110.0
  • Upgraded the google-cloud-bigquerystorage library to version 0.126.0-beta
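Opting into the Arrow read format added by Issue #96 can be sketched as follows. The readDataFormat option name matches the connector's documentation; pyspark and the connector jar are assumed, and the table name is a placeholder.

```python
def read_with_arrow(table: str):
    """Sketch of selecting Arrow as the read format (added in
    0.14.0-beta; it later became the default in 0.16.1). Requires
    pyspark and the spark-bigquery-connector jar."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return (spark.read.format("bigquery")
            .option("readDataFormat", "ARROW")
            .option("table", table)
            .load())
```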

0.13.1-beta

14 Feb 22:36
  • Changed the parallelism parameter to maxParallelism in order to reflect the change in the underlying API (the old parameter has been deprecated)
  • Upgraded the google-cloud-bigquerystorage library to version 0.122.0-beta.
  • Issue #73: Optimized empty projection used for count() execution.
  • Issue #121: Added the option to configure CreateDisposition when inserting data to BigQuery.
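Using the renamed maxParallelism option can be sketched as below; the option name comes from these notes, while pyspark, the connector jar, and the placeholder table name are assumptions.

```python
def read_with_max_parallelism(table: str, max_streams: int):
    """Sketch of capping read parallelism with the renamed
    'maxParallelism' option (0.13.1-beta). Requires pyspark and the
    spark-bigquery-connector jar."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return (spark.read.format("bigquery")
            .option("maxParallelism", str(max_streams))
            .option("table", table)
            .load())
```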

Notice: Version 0.13.0-beta also included an upgrade to version v1beta2 of the BigQuery Storage API. Due to issues discovered when it was used with custom API roles, that version has been deprecated and its use is not recommended. Version 0.13.1-beta of the connector uses version v1beta1 of the BigQuery Storage API, as did the previous versions.

0.12.0-beta

29 Jan 18:17
  • Issue #72: Moved the shaded jar name from a classifier to a new artifact name, making it
    easier to use the connector within Jupyter notebooks
  • Issues #73, #87: Added better logging to help understand which columns and filters
    are requested by Spark, and which are pushed down to BigQuery
  • Issue #107: The connector now warns when it is used with the wrong Scala version

0.11.0-beta

18 Dec 00:39
  • Upgraded the google-cloud-bigquery library to version 1.102.0
  • Upgraded the google-cloud-bigquerystorage library to version 0.120.0-beta
  • Issue #6: Do not initialize bigquery options by default
  • Added ReadRows retries on gRPC internal errors
  • Issue #97: Added support for GEOGRAPHY type

0.5.0

07 Mar 01:36
Release 0.5.0-beta