From 59f1b91ac6314a7c625e3c584af31738eece75e2 Mon Sep 17 00:00:00 2001
From: kbuilder
Date: Thu, 25 Jan 2024 11:18:58 -0800
Subject: [PATCH] Release 0.36.0.

---
 CHANGES.md |  2 +-
 README.md  | 80 ++++++++++++++++++++++++++++++------------------------
 2 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 69af906b0..ed85bf39d 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -1,6 +1,6 @@
 # Release Notes
 
-## Next
+## 0.36.0 - 2024-01-25
 
 * PR #1155: allow lazy materialization of query on load
 * PR #1163: Added config to set the BigQuery Job timeout
diff --git a/README.md b/README.md
index 2fee17384..f3b1d82d0 100644
--- a/README.md
+++ b/README.md
@@ -57,14 +57,14 @@ The latest version of the connector is publicly available in the following links
 
 | version | Link |
 |------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Spark 3.5 | `gs://spark-lib/bigquery/spark-3.5-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.5-bigquery-0.35.1.jar)) |
-| Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-0.35.1.jar)) |
-| Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-0.35.1.jar)) |
-| Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-0.35.1.jar)) |
-| Spark 3.1 | `gs://spark-lib/bigquery/spark-3.1-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.1-bigquery-0.35.1.jar)) |
-| Spark 2.4 | `gs://spark-lib/bigquery/spark-2.4-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-2.4-bigquery-0.35.1.jar)) |
-| Scala 2.13 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.35.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.35.1.jar)) |
-| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.35.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.35.1.jar)) |
+| Spark 3.5 | `gs://spark-lib/bigquery/spark-3.5-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.5-bigquery-0.36.0.jar)) |
+| Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-0.36.0.jar)) |
+| Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-0.36.0.jar)) |
+| Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-0.36.0.jar)) |
+| Spark 3.1 | `gs://spark-lib/bigquery/spark-3.1-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.1-bigquery-0.36.0.jar)) |
+| Spark 2.4 | `gs://spark-lib/bigquery/spark-2.4-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-2.4-bigquery-0.36.0.jar)) |
+| Scala 2.13 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.36.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.36.0.jar)) |
+| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.36.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.36.0.jar)) |
 | Scala 2.11 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.29.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.29.0.jar)) |
 
 The first four versions are Java based connectors targeting Spark 2.4/3.1/3.2/3.3 of all Scala versions built on the new
@@ -87,16 +87,17 @@ below.
 | spark-bigquery-with-dependencies_2.11 | ✓ | ✓ | | | | | | |
 
 ### Connector to Dataproc Image Compatibility Matrix
-| Connector \ Dataproc Image | 1.3 | 1.4 | 1.5 | 2.0 | 2.1 | Serverless Image 1.0 | Serverless Image 2.0 | Serverless Image 2.1 |
-|---------------------------------------|---------|---------|---------|---------|---------|-------------------------|-------------------------|-------------------------|
-| spark-3.4-bigquery | | | | | | | | ✓ |
-| spark-3.3-bigquery | | | | | ✓ | ✓ | ✓ | ✓ |
-| spark-3.2-bigquery | | | | | ✓ | ✓ | ✓ | ✓ |
-| spark-3.1-bigquery | | | | ✓ | ✓ | ✓ | ✓ | ✓ |
-| spark-2.4-bigquery | | ✓ | ✓ | | | | | |
-| spark-bigquery-with-dependencies_2.13 | | | | | | | ✓ | ✓ |
-| spark-bigquery-with-dependencies_2.12 | | | ✓ | ✓ | ✓ | ✓ | | |
-| spark-bigquery-with-dependencies_2.11 | ✓ | ✓ | | | | | | |
+| Connector \ Dataproc Image | 1.3 | 1.4 | 1.5 | 2.0 | 2.1 | 2.2 | Serverless Image 1.0 | Serverless Image 2.0 | Serverless Image 2.1 | Serverless Image 2.2 |
+|---------------------------------------|---------|---------|---------|---------|---------|---------|-------------------------|-------------------------|-------------------------|-------------------------|
+| spark-3.5-bigquery | | | | | | ✓ | | | | ✓ |
+| spark-3.4-bigquery | | | | | | ✓ | | | ✓ | ✓ |
+| spark-3.3-bigquery | | | | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
+| spark-3.2-bigquery | | | | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
+| spark-3.1-bigquery | | | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
+| spark-2.4-bigquery | | ✓ | ✓ | | | | | | | |
+| spark-bigquery-with-dependencies_2.13 | | | | | | | | ✓ | ✓ | ✓ |
+| spark-bigquery-with-dependencies_2.12 | | | ✓ | ✓ | ✓ | ✓ | ✓ | | | |
+| spark-bigquery-with-dependencies_2.11 | ✓ | ✓ | | | | | | | | |
 
 ### Maven / Ivy Package Usage
 The connector is also available from the
@@ -106,14 +107,14 @@ repository. It can be used using the `--packages` option or the
 
 | version | Connector Artifact |
 |------------|------------------------------------------------------------------------------------|
-| Spark 3.5 | `com.google.cloud.spark:spark-3.5-bigquery:0.35.1` |
-| Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:0.35.1` |
-| Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:0.35.1` |
-| Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:0.35.1` |
-| Spark 3.1 | `com.google.cloud.spark:spark-3.1-bigquery:0.35.1` |
-| Spark 2.4 | `com.google.cloud.spark:spark-2.4-bigquery:0.35.1` |
-| Scala 2.13 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.35.1` |
-| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.35.1` |
+| Spark 3.5 | `com.google.cloud.spark:spark-3.5-bigquery:0.36.0` |
+| Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:0.36.0` |
+| Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:0.36.0` |
+| Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:0.36.0` |
+| Spark 3.1 | `com.google.cloud.spark:spark-3.1-bigquery:0.36.0` |
+| Spark 2.4 | `com.google.cloud.spark:spark-2.4-bigquery:0.36.0` |
+| Scala 2.13 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.36.0` |
+| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.0` |
 | Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.29.0` |
 
 ### Specifying the Spark BigQuery connector version in a Dataproc cluster
@@ -123,8 +124,8 @@ Using the standard `--jars` or `--packages` (or alternatively, the `spark.jars`/
 
 To use another version than the built-in one, please do one of the following:
 
-* For Dataproc clusters, using image 2.1 and above, add the following flag on cluster creation to upgrade the version `--metadata SPARK_BQ_CONNECTOR_VERSION=0.35.1`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.35.1.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
-* For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.35.1`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.35.1.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
+* For Dataproc clusters, using image 2.1 and above, add the following flag on cluster creation to upgrade the version `--metadata SPARK_BQ_CONNECTOR_VERSION=0.36.0`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.36.0.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
+* For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.36.0`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.36.0.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
 
 ## Hello World Example
 
@@ -134,7 +135,7 @@ You can run a simple PySpark wordcount against the API without compilation by ru
 
 ```
 gcloud dataproc jobs submit pyspark --cluster "$MY_CLUSTER" \
-  --jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.35.1.jar \
+  --jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.36.0.jar \
   examples/python/shakespeare.py
 ```
 
@@ -879,6 +880,15 @@ word-break:break-word
 Read
+
+ bigQueryJobTimeoutInMinutes
+
+ Config to set the BigQuery job timeout in minutes.
+ Default value is 360 minutes.
+ (Optional)
+
+ Read/Write
+
@@ -1151,7 +1161,7 @@ using the following code:
 ```python
 from pyspark.sql import SparkSession
 spark = SparkSession.builder \
-  .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.35.1") \
+  .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.0") \
   .getOrCreate()
 df = spark.read.format("bigquery") \
   .load("dataset.table")
@@ -1160,7 +1170,7 @@ df = spark.read.format("bigquery") \
 **Scala:**
 ```scala
 val spark = SparkSession.builder
-.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.35.1")
+.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.0")
 .getOrCreate()
 val df = spark.read.format("bigquery")
 .load("dataset.table")
@@ -1168,7 +1178,7 @@ val df = spark.read.format("bigquery")
 
 In case Spark cluster is using Scala 2.12 (it's optional for Spark 2.4.x, mandatory in 3.0.x), then the relevant package is
-com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.35.1. In
+com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.36.0. In
 order to know which Scala version is used, please run the following code:
 
 **Python:**
@@ -1192,14 +1202,14 @@ To include the connector in your project:
 <dependency>
   <groupId>com.google.cloud.spark</groupId>
   <artifactId>spark-bigquery-with-dependencies_${scala.version}</artifactId>
-  <version>0.35.1</version>
+  <version>0.36.0</version>
 </dependency>
 ```
 
 ### SBT
 
 ```sbt
-libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.35.1"
+libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.36.0"
 ```
 
 ### Connector metrics and how to view them
@@ -1244,7 +1254,7 @@ word-break:break-word
-**Note:** To use the metrics in the Spark UI page, you need to make sure the `spark-bigquery-metrics-0.35.1.jar` is the class path before starting the history-server and the connector version is `spark-3.2` or above.
+**Note:** To use the metrics in the Spark UI page, you need to make sure the `spark-bigquery-metrics-0.36.0.jar` is the class path before starting the history-server and the connector version is `spark-3.2` or above.
 
 ## FAQ
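
Every version bump in this patch touches the same three naming patterns: the `gs://` jar path, its HTTP mirror, and the Maven coordinate. As an illustration only, the sketch below derives all three from a connector version plus either a Spark or a Scala flavour. The helper names (`artifact_id`, `connector_refs`) are hypothetical and not part of the connector; the string patterns are copied from the README table rows in this diff, and the code does not check that a given version was actually published.

```python
from typing import Dict, Optional

# Prefixes copied from the README tables in this patch.
GCS_PREFIX = "gs://spark-lib/bigquery"
HTTP_PREFIX = "https://storage.googleapis.com/spark-lib/bigquery"
MAVEN_GROUP = "com.google.cloud.spark"


def artifact_id(spark: Optional[str] = None, scala: Optional[str] = None) -> str:
    """Hypothetical helper: artifact name for exactly one of spark= or scala=."""
    if (spark is None) == (scala is None):
        raise ValueError("pass exactly one of spark= or scala=")
    if spark is not None:
        # Spark-versioned connectors, e.g. spark-3.5-bigquery
        return f"spark-{spark}-bigquery"
    # Scala-versioned connectors, e.g. spark-bigquery-with-dependencies_2.12
    return f"spark-bigquery-with-dependencies_{scala}"


def connector_refs(version: str, spark: Optional[str] = None,
                   scala: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: GCS path, HTTP link and Maven coordinate for one artifact."""
    name = artifact_id(spark=spark, scala=scala)
    jar = f"{name}-{version}.jar"
    return {
        "gcs": f"{GCS_PREFIX}/{jar}",
        "http": f"{HTTP_PREFIX}/{jar}",
        "maven": f"{MAVEN_GROUP}:{name}:{version}",
    }


if __name__ == "__main__":
    for key, value in connector_refs("0.36.0", spark="3.5").items():
        print(f"{key}: {value}")
```

Keeping the three strings behind one function makes it harder for a release bump to update one table and miss another, which is the kind of drift a purely textual find-and-replace can introduce.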