Commit

Release 0.36.0.
kokoro-team committed Jan 25, 2024
1 parent 95fa9ab commit 59f1b91
Showing 2 changed files with 46 additions and 36 deletions.
2 changes: 1 addition & 1 deletion CHANGES.md
@@ -1,6 +1,6 @@
# Release Notes

## Next
## 0.36.0 - 2024-01-25

* PR #1155: allow lazy materialization of query on load
* PR #1163: Added config to set the BigQuery Job timeout
80 changes: 45 additions & 35 deletions README.md
@@ -57,14 +57,14 @@ The latest version of the connector is publicly available in the following links

| version | Link |
|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Spark 3.5 | `gs://spark-lib/bigquery/spark-3.5-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.5-bigquery-0.35.1.jar)) |
| Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-0.35.1.jar)) |
| Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-0.35.1.jar)) |
| Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-0.35.1.jar)) |
| Spark 3.1 | `gs://spark-lib/bigquery/spark-3.1-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.1-bigquery-0.35.1.jar)) |
| Spark 2.4 | `gs://spark-lib/bigquery/spark-2.4-bigquery-0.35.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-2.4-bigquery-0.35.1.jar)) |
| Scala 2.13 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.35.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.35.1.jar)) |
| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.35.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.35.1.jar)) |
| Spark 3.5 | `gs://spark-lib/bigquery/spark-3.5-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.5-bigquery-0.36.0.jar)) |
| Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-0.36.0.jar)) |
| Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-0.36.0.jar)) |
| Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-0.36.0.jar)) |
| Spark 3.1 | `gs://spark-lib/bigquery/spark-3.1-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.1-bigquery-0.36.0.jar)) |
| Spark 2.4 | `gs://spark-lib/bigquery/spark-2.4-bigquery-0.36.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-2.4-bigquery-0.36.0.jar)) |
| Scala 2.13 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.36.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.36.0.jar)) |
| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.36.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.36.0.jar)) |
| Scala 2.11 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.29.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.29.0.jar)) |

The first four versions are Java based connectors targeting Spark 2.4/3.1/3.2/3.3 of all Scala versions built on the new
@@ -87,16 +87,17 @@ below.
| spark-bigquery-with-dependencies_2.11 | ✓ | ✓ | | | | | | |

### Connector to Dataproc Image Compatibility Matrix
| Connector \ Dataproc Image | 1.3 | 1.4 | 1.5 | 2.0 | 2.1 | Serverless<br>Image 1.0 | Serverless<br>Image 2.0 | Serverless<br>Image 2.1 |
|---------------------------------------|---------|---------|---------|---------|---------|-------------------------|-------------------------|-------------------------|
| spark-3.4-bigquery | | | | | | | | &check; |
| spark-3.3-bigquery | | | | | &check; | &check; | &check; | &check; |
| spark-3.2-bigquery | | | | | &check; | &check; | &check; | &check; |
| spark-3.1-bigquery | | | | &check; | &check; | &check; | &check; | &check; |
| spark-2.4-bigquery | | &check; | &check; | | | | | |
| spark-bigquery-with-dependencies_2.13 | | | | | | | &check; | &check; |
| spark-bigquery-with-dependencies_2.12 | | | &check; | &check; | &check; | &check; | | |
| spark-bigquery-with-dependencies_2.11 | &check; | &check; | | | | | | |
| Connector \ Dataproc Image | 1.3 | 1.4 | 1.5 | 2.0 | 2.1 | 2.2 | Serverless<br>Image 1.0 | Serverless<br>Image 2.0 | Serverless<br>Image 2.1 | Serverless<br>Image 2.2 |
|---------------------------------------|---------|---------|---------|---------|---------|---------|-------------------------|-------------------------|-------------------------|-------------------------|
| spark-3.5-bigquery | | | | | | &check; | | | | &check; |
| spark-3.4-bigquery | | | | | | &check; | | | &check; | &check; |
| spark-3.3-bigquery | | | | | &check; | &check; | &check; | &check; | &check; | &check; |
| spark-3.2-bigquery | | | | | &check; | &check; | &check; | &check; | &check; | &check; |
| spark-3.1-bigquery | | | | &check; | &check; | &check; | &check; | &check; | &check; | &check; |
| spark-2.4-bigquery | | &check; | &check; | | | | | | | |
| spark-bigquery-with-dependencies_2.13 | | | | | | | | &check; | &check; | &check; |
| spark-bigquery-with-dependencies_2.12 | | | &check; | &check; | &check; | &check; | &check; | | | |
| spark-bigquery-with-dependencies_2.11 | &check; | &check; | | | | | | | | |

### Maven / Ivy Package Usage
The connector is also available from the
@@ -106,14 +107,14 @@ repository. It can be used using the `--packages` option or the

| version | Connector Artifact |
|------------|------------------------------------------------------------------------------------|
| Spark 3.5 | `com.google.cloud.spark:spark-3.5-bigquery:0.35.1` |
| Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:0.35.1` |
| Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:0.35.1` |
| Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:0.35.1` |
| Spark 3.1 | `com.google.cloud.spark:spark-3.1-bigquery:0.35.1` |
| Spark 2.4 | `com.google.cloud.spark:spark-2.4-bigquery:0.35.1` |
| Scala 2.13 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.35.1` |
| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.35.1` |
| Spark 3.5 | `com.google.cloud.spark:spark-3.5-bigquery:0.36.0` |
| Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:0.36.0` |
| Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:0.36.0` |
| Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:0.36.0` |
| Spark 3.1 | `com.google.cloud.spark:spark-3.1-bigquery:0.36.0` |
| Spark 2.4 | `com.google.cloud.spark:spark-2.4-bigquery:0.36.0` |
| Scala 2.13 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.36.0` |
| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.0` |
| Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.29.0` |
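
The coordinates in the table above follow a regular naming pattern. The following is a minimal sketch of that pattern, not part of the connector: the helper name `connector_coordinate` and its parameters are hypothetical, and the group ID, artifact names, and version strings are copied from the table.

```python
GROUP_ID = "com.google.cloud.spark"


def connector_coordinate(version, spark=None, scala=None):
    """Build a Maven coordinate for the connector (hypothetical helper).

    Pass `spark` (for example "3.5") for the Spark-specific artifacts, or
    `scala` (for example "2.12") for the with-dependencies artifacts.
    """
    # Exactly one of the two flavors must be selected.
    if (spark is None) == (scala is None):
        raise ValueError("pass exactly one of spark= or scala=")
    if spark is not None:
        artifact = f"spark-{spark}-bigquery"
    else:
        artifact = f"spark-bigquery-with-dependencies_{scala}"
    return f"{GROUP_ID}:{artifact}:{version}"


print(connector_coordinate("0.36.0", spark="3.5"))
print(connector_coordinate("0.36.0", scala="2.12"))
```

The resulting string is the value that would be passed to `--packages` or `spark.jars.packages`. Note that, per the table, the Scala 2.11 artifact stays at `0.29.0`, so the helper should be called with that version for `scala="2.11"`.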

### Specifying the Spark BigQuery connector version in a Dataproc cluster
@@ -123,8 +124,8 @@ Using the standard `--jars` or `--packages` (or alternatively, the `spark.jars`/

To use another version than the built-in one, please do one of the following:

* For Dataproc clusters, using image 2.1 and above, add the following flag on cluster creation to upgrade the version `--metadata SPARK_BQ_CONNECTOR_VERSION=0.35.1`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.35.1.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
* For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.35.1`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.35.1.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
* For Dataproc clusters, using image 2.1 and above, add the following flag on cluster creation to upgrade the version `--metadata SPARK_BQ_CONNECTOR_VERSION=0.36.0`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.36.0.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
* For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.36.0`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.36.0.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
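
The two options above can be assembled programmatically when cluster or batch creation is scripted. This is a hedged sketch only: the helper names `cluster_connector_flags` and `batch_connector_flags` are hypothetical, while the metadata keys and property names are the ones quoted in the list above.

```python
def cluster_connector_flags(version=None, jar_url=None):
    """Return the --metadata flag for a Dataproc cluster (image 2.1 and above)."""
    # Exactly one of version / jar_url selects the connector.
    if (version is None) == (jar_url is None):
        raise ValueError("pass exactly one of version= or jar_url=")
    if version is not None:
        return ["--metadata", f"SPARK_BQ_CONNECTOR_VERSION={version}"]
    return ["--metadata", f"SPARK_BQ_CONNECTOR_URL={jar_url}"]


def batch_connector_flags(version=None, jar_uri=None):
    """Return the --properties flag for a Dataproc serverless batch."""
    if (version is None) == (jar_uri is None):
        raise ValueError("pass exactly one of version= or jar_uri=")
    if version is not None:
        return ["--properties", f"dataproc.sparkBqConnector.version={version}"]
    return ["--properties", f"dataproc.sparkBqConnector.uri={jar_uri}"]


# "my-cluster" and the image version are placeholder values.
command = [
    "gcloud", "dataproc", "clusters", "create", "my-cluster",
    "--image-version", "2.1",
    *cluster_connector_flags(version="0.36.0"),
]
print(" ".join(command))
```

Returning a list of arguments rather than a single string keeps the flags safe to pass to `subprocess.run` without shell quoting.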

## Hello World Example

@@ -134,7 +135,7 @@ You can run a simple PySpark wordcount against the API without compilation by ru

```
gcloud dataproc jobs submit pyspark --cluster "$MY_CLUSTER" \
--jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.35.1.jar \
--jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.36.0.jar \
examples/python/shakespeare.py
```

@@ -879,6 +880,15 @@ word-break:break-word
</td>
<td>Read</td>
</tr>
<tr>
<td><code>bigQueryJobTimeoutInMinutes</code>
</td>
<td>Config to set the BigQuery job timeout in minutes.
Default value is <code>360</code> minutes.
<br/> (Optional)
</td>
<td>Read/Write</td>
</tr>

</table>
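
As an illustration of the new `bigQueryJobTimeoutInMinutes` option, the sketch below builds the option map for a read or write. The helper name `bigquery_timeout_options` is hypothetical, and the positive-value check is an assumption of this sketch rather than documented connector behavior; the option name and the 360-minute default come from the table row above.

```python
DEFAULT_TIMEOUT_MINUTES = 360  # default stated in the option table


def bigquery_timeout_options(timeout_minutes=DEFAULT_TIMEOUT_MINUTES):
    """Return connector options that set the BigQuery job timeout (hypothetical helper)."""
    # Assumption of this sketch: only positive timeouts are meaningful.
    if timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be positive")
    return {"bigQueryJobTimeoutInMinutes": str(timeout_minutes)}


print(bigquery_timeout_options(60))

# Usage with an existing SparkSession named `spark` (not created here):
#   df = (spark.read.format("bigquery")
#         .options(**bigquery_timeout_options(60))
#         .load("dataset.table"))
```

Because the option is listed as Read/Write, the same map could be passed to a `df.write.format("bigquery")` call as well.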

@@ -1151,7 +1161,7 @@ using the following code:
```python
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.35.1") \
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.0") \
.getOrCreate()
df = spark.read.format("bigquery") \
.load("dataset.table")
@@ -1160,15 +1170,15 @@ df = spark.read.format("bigquery") \
**Scala:**
```scala
val spark = SparkSession.builder
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.35.1")
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.36.0")
.getOrCreate()
val df = spark.read.format("bigquery")
.load("dataset.table")
```

If the Spark cluster uses Scala 2.12 (optional for Spark 2.4.x,
mandatory in 3.0.x), then the relevant package is
com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.35.1. In
com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.36.0. In
order to know which Scala version is used, please run the following code:

**Python:**
@@ -1192,14 +1202,14 @@ To include the connector in your project:
<dependency>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-with-dependencies_${scala.version}</artifactId>
<version>0.35.1</version>
<version>0.36.0</version>
</dependency>
```

### SBT

```sbt
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.35.1"
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.36.0"
```

### Connector metrics and how to view them
@@ -1244,7 +1254,7 @@ word-break:break-word
</table>


**Note:** To use the metrics in the Spark UI page, you need to make sure the `spark-bigquery-metrics-0.35.1.jar` is on the class path before starting the history-server and the connector version is `spark-3.2` or above.
**Note:** To use the metrics in the Spark UI page, you need to make sure the `spark-bigquery-metrics-0.36.0.jar` is on the class path before starting the history-server and the connector version is `spark-3.2` or above.

## FAQ
