22 Jul 00:59

davidrabinowitz

3b61d8d

0.17.0

New Features

Structured streaming write is now supported (PR #201, thanks @varundhussa)
Users now has the option to keep the data on GCS after writing to BigQuery (PR #202, thanks @leoneuwald)
Enabling to overwrite data of a single date partition (PR #211)
Supporting MATERIALIZED_VIEW as table type (PR #192)
Supporting columnar batch reads from Spark in the DataSource V2 implementation. (PR #198) It is not ready for production use.

Bug Fixes

Conditions on StructType fields are now handled by Spark and not the connector, Fixing Issue #197

Dependency Updates

BigQuery API has been upgraded to version 1.116.3
BigQuery Storage API has been upgraded to version 1.0.0
Netty has been upgraded to version 4.1.48.Final (Fixing issue #200)

Assets 4

11 Jun 18:15

davidrabinowitz

0.16.1

3572f91

0.16.1

New Features

Apache Arrow is now the default read format. Based on our benchmarking, Arrow provides read performance faster by 40% then Avro. (PR #180)
Apache Avro has been added as a write intermediate format. Based on our testing it shows performance improvements when the DataFrame is larger than 50GB (PR #163)
Usage simplification: Now instead of using the table mandatory option, user can use the built in path parameter of load() and save(), so that read becomes df = spark.read.format("bigquery").load("source_table") and write becomes df.write.format("bigquery").save("target_table") (PR #176)
An experimental implementation of the DataSource v2 API has been added. It is not ready for production use.

Dependency Updates

BigQuery API has been upgraded to version 1.116.1
BigQuery Storage API has been upgraded to version 0.133.2-beta
gRPC has been upgraded to version 1.29.0
Guava has been upgraded to version 29.0-jre

Assets 4

27 Apr 17:39

davidrabinowitz

0.15.1-beta

b1177aa

0.15.1-beta

A bug fix release:

PR #158: Users can now add the spark.datasource.bigquery prefix to the configuration options in order to support Spark's --conf command line flag
PR #160: View materialization is performed only on action, fixing a bug where view materialization was done too early

Assets 4

21 Apr 01:56

davidrabinowitz

0.15.0-beta

c7cf321

0.15.0-beta

PR #150: Reading DataFrames should be quicker, especially in interactive usage such in notebooks
PR #154: Upgraded to the BigQuery Storage v1 API
PR #146: Authentication can be done using AccessToken on top of Credentials file, Credentials, and the GOOGLE_APPLICATION_CREDENTIALS environment variable.

Assets 4

31 Mar 19:34

davidrabinowitz

0.14.0-beta

375e617

0.14.0-beta

Issue #96: Added Arrow as a supported format for reading from BigQuery
Issue #130 Adding the field description to the schema metadata
Issue #124: Fixing null values in ArrayType
Issue #143: Allowing the setting of SchemaUpdateOptions When writing to BigQuery
PR #148: Add support for writing clustered tables
Upgrade version of google-cloud-bigquery library to 1.110.0
Upgrade version of google-cloud-bigquerystorage library to 0.126.0-beta

Assets 4

14 Feb 22:36

davidrabinowitz

0.13.1-beta

d4dabd5

0.13.1-beta

changed the parallelism parameter to maxParallelism in order to reflect the Change in the underlining API (the old parameter has been deprecated)
Upgrade version of google-cloud-bigquerystorage library to 0.122.0-beta.
Issue #73: Optimized empty projection used for count() execution.
Issue #121: Added the option to configure CreateDisposition when inserting data to BigQuery.

Notice: Version 0.13.0-beta also included an upgrade to version v1beta2 of the BigStorage API. Due to issues discovered when used with custom API roles, it has been deprecated and is not recommended to use. The 0.13.1-beta version of the connector is using version v1beta1 of the BigStorage API, also used in the previous versions.

Assets 4

29 Jan 18:17

davidrabinowitz

0.12.0-beta

4ce9f23

0.12.0-beta

Issue #72: Moved the shaded jar name from classifier to a new artifact name. Now it
is easier to use the connector within Jupyter notebooks
Issues #73, #87: Added better logging to help understand which columns and filters
are asked by spark, and which are passed down to BigQuery
Issue #107: The connector will now alert when is is used with the wrong scala version

Assets 4

18 Dec 00:39

davidrabinowitz

0.11.0-beta

14b8375

0.11.0-beta

Upgrade version of google-cloud-bigquery library to 1.102.0
Upgrade version of google-cloud-bigquerystorage library to 0.120.0-beta
Issue #6: Do not initialize bigquery options by default
Added ReadRows retries on GRPC internal errors
Issue #97: Added support for GEOGRAPHY type

Assets 10

07 Mar 01:36

pmkc

0.5.0

6a11c0e

0.5.0

Release 0.5.0-beta

Assets 3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New Features

Bug Fixes

Dependency Updates

New Features

Dependency Updates

Releases: GoogleCloudDataproc/spark-bigquery-connector

0.17.0

New Features

Bug Fixes

Dependency Updates

0.16.1

New Features

Dependency Updates

0.15.1-beta

0.15.0-beta

0.14.0-beta

0.13.1-beta

0.12.0-beta

0.11.0-beta

0.5.0