Release 0.34.0.
kokoro-team committed Oct 31, 2023
1 parent 17dbe36 commit 155470c
Showing 2 changed files with 46 additions and 26 deletions.
2 changes: 1 addition & 1 deletion CHANGES.md
@@ -1,6 +1,6 @@
# Release Notes

## Next
## 0.34.0 - 2023-10-31

* PR #1057: Enable async writes for greater throughput
* PR #1094: CVE-2023-5072: Upgrading the org.json:json dependency
70 changes: 45 additions & 25 deletions README.md
@@ -57,13 +57,13 @@ The latest version of the connector is publicly available in the following links

| version | Link |
|------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-0.33.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-0.33.0.jar)) |
| Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-0.33.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-0.33.0.jar)) |
| Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-0.33.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-0.33.0.jar)) |
| Spark 3.1 | `gs://spark-lib/bigquery/spark-3.1-bigquery-0.33.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.1-bigquery-0.33.0.jar)) |
| Spark 2.4 | `gs://spark-lib/bigquery/spark-2.4-bigquery-0.33.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-2.4-bigquery-0.33.0.jar)) |
| Scala 2.13 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.33.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.33.0.jar)) |
| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.33.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.33.0.jar)) |
| Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-0.34.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-0.34.0.jar)) |
| Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-0.34.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-0.34.0.jar)) |
| Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-0.34.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-0.34.0.jar)) |
| Spark 3.1 | `gs://spark-lib/bigquery/spark-3.1-bigquery-0.34.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.1-bigquery-0.34.0.jar)) |
| Spark 2.4 | `gs://spark-lib/bigquery/spark-2.4-bigquery-0.34.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-2.4-bigquery-0.34.0.jar)) |
| Scala 2.13 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.34.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.34.0.jar)) |
| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.34.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.34.0.jar)) |
| Scala 2.11 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.29.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.29.0.jar)) |

The first five versions are Java-based connectors targeting Spark 2.4/3.1/3.2/3.3/3.4 of all Scala versions, built on the new
@@ -104,13 +104,13 @@ repository. It can be used with the `--packages` option or the

| version | Connector Artifact |
|------------|------------------------------------------------------------------------------------|
| Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:0.33.0` |
| Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:0.33.0` |
| Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:0.33.0` |
| Spark 3.1 | `com.google.cloud.spark:spark-3.1-bigquery:0.33.0` |
| Spark 2.4 | `com.google.cloud.spark:spark-2.4-bigquery:0.33.0` |
| Scala 2.13 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.33.0` |
| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.33.0` |
| Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:0.34.0` |
| Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:0.34.0` |
| Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:0.34.0` |
| Spark 3.1 | `com.google.cloud.spark:spark-3.1-bigquery:0.34.0` |
| Spark 2.4 | `com.google.cloud.spark:spark-2.4-bigquery:0.34.0` |
| Scala 2.13 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.34.0` |
| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.34.0` |
| Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.29.0` |

### Specifying the Spark BigQuery connector version in a Dataproc cluster
@@ -120,8 +120,8 @@ Using the standard `--jars` or `--packages` (or alternatively, the `spark.jars`/

To use another version than the built-in one, please do one of the following:

* For Dataproc clusters, using image 2.1 and above, add the following flag on cluster creation to upgrade the version `--metadata SPARK_BQ_CONNECTOR_VERSION=0.33.0`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.33.0.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
* For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.33.0`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.33.0.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
* For Dataproc clusters using image 2.1 and above, add the following flag on cluster creation to upgrade the version: `--metadata SPARK_BQ_CONNECTOR_VERSION=0.34.0`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.34.0.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
* For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.34.0`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.34.0.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.

## Hello World Example

@@ -131,7 +131,7 @@ You can run a simple PySpark wordcount against the API without compilation by running:

```
gcloud dataproc jobs submit pyspark --cluster "$MY_CLUSTER" \
--jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.33.0.jar \
--jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.34.0.jar \
examples/python/shakespeare.py
```
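
The core of that wordcount can be sketched without a cluster. This is a plain-Python illustration of what the PySpark `shakespeare.py` example computes (a toy sketch, not the example's actual code):

```python
from collections import Counter

def wordcount(lines):
    """Count word occurrences across lines, lowercasing and splitting
    on whitespace, mirroring the shape of the PySpark wordcount."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return dict(counts)

print(wordcount(["To be or not to be"]))  # → {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```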

@@ -281,8 +281,8 @@ df.write \
```

Writing to existing partitioned tables (date partitioned, ingestion time partitioned and range
partitioned) in APPEND save mode is fully supported by the connector and the BigQuery Storage Write
API. Partition overwrite and the use of `datePartition`, `partitionField`, `partitionType`, `partitionRangeStart`, `partitionRangeEnd`, `partitionRangeInterval` as
partitioned) in APPEND save mode, and in OVERWRITE mode for date and range partitioned tables, is fully supported by the connector and the BigQuery Storage Write
API. The use of `datePartition`, `partitionField`, `partitionType`, `partitionRangeStart`, `partitionRangeEnd`, and `partitionRangeInterval` as
described below is not supported at this moment by the direct write method.
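
As a mental model of these save modes (a plain-Python toy sketch, not the connector's API), a partitioned table can be pictured as a map from partition keys to rows: APPEND extends partitions, while OVERWRITE either clears the whole table (`static` partition-overwrite mode) or only the partitions being written (`dynamic`):

```python
def write(table, new_data, mode="append", overwrite_mode="static"):
    """Toy model of writing to a partitioned table: `table` and
    `new_data` map partition keys (e.g. dates) to lists of rows."""
    if mode == "overwrite":
        if overwrite_mode == "static":
            table.clear()                    # the whole table is replaced
        elif overwrite_mode == "dynamic":
            for part in new_data:            # only written partitions are replaced
                table.pop(part, None)
        else:
            raise ValueError(f"unknown overwrite mode: {overwrite_mode}")
    elif mode != "append":
        raise ValueError(f"unsupported save mode: {mode}")
    for part, rows in new_data.items():
        table.setdefault(part, []).extend(rows)
    return table

table = {"2023-10-30": ["a"], "2023-10-31": ["b"]}
write(table, {"2023-10-31": ["c"]}, mode="overwrite", overwrite_mode="dynamic")
# "2023-10-30" is untouched; "2023-10-31" now holds ["c"]
```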

**Important:** Please refer to the [data ingestion pricing](https://cloud.google.com/bigquery/pricing#data_ingestion_pricing)
@@ -860,6 +860,26 @@ word-break:break-word
</td>
<td>Read</td>
</tr>
<tr>
<td><code>spark.sql.sources.partitionOverwriteMode</code>
</td>
   <td>Config to specify the overwrite mode on write when the table is range/time partitioned.
       Two modes are currently supported: <code>static</code> and <code>dynamic</code>. In <code>static</code> mode,
       the entire table is overwritten; in <code>dynamic</code> mode, only the partitions being written are overwritten.
       The default value is <code>static</code>.
<br/> (Optional)
</td>
<td>Write</td>
</tr>
<tr>
<td><code>enableReadSessionCaching</code>
</td>
   <td>Boolean config to enable or disable read session caching. When enabled, the connector caches BigQuery read sessions to allow for faster Spark query planning; set to <code>false</code> to disable.
       Default value is <code>true</code>.
<br/> (Optional)
</td>
<td>Read</td>
</tr>

</table>
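
Conceptually, read session caching keys a cached session on the table and its read options, so repeated queries over the same table can skip session creation during planning. A minimal sketch of that idea (plain Python; all names here are hypothetical, not the connector's internals):

```python
import time

class ReadSessionCache:
    """Toy cache holding one read session per (table, options) key,
    expiring entries after ttl_seconds."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}

    def get_or_create(self, table, options, create_session):
        key = (table, tuple(sorted(options.items())))
        entry = self._entries.get(key)
        if entry is not None and self.clock() - entry[1] < self.ttl:
            return entry[0]               # cache hit: reuse the session
        session = create_session()        # cache miss: e.g. a Storage API call
        self._entries[key] = (session, self.clock())
        return session
```

Identical table/options pairs reuse the cached session until the entry expires; any change in the read options produces a new key and a fresh session.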

@@ -1128,7 +1148,7 @@ using the following code:
```python
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.33.0") \
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.34.0") \
.getOrCreate()
df = spark.read.format("bigquery") \
.load("dataset.table")
@@ -1137,15 +1157,15 @@ df = spark.read.format("bigquery") \
**Scala:**
```scala
val spark = SparkSession.builder
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.33.0")
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.34.0")
.getOrCreate()
val df = spark.read.format("bigquery")
.load("dataset.table")
```

If the Spark cluster uses Scala 2.12 (optional for Spark 2.4.x,
mandatory for 3.0.x), then the relevant package is
com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.33.0. In
com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.34.0. In
order to know which Scala version is used, please run the following code:

**Python:**
@@ -1169,14 +1189,14 @@ To include the connector in your project:
<dependency>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-with-dependencies_${scala.version}</artifactId>
<version>0.33.0</version>
<version>0.34.0</version>
</dependency>
```

### SBT

```sbt
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.33.0"
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.34.0"
```

### Connector metrics and how to view them
@@ -1221,7 +1241,7 @@ word-break:break-word
</table>


**Note:** To use the metrics in the Spark UI page, you need to make sure the `spark-bigquery-metrics-0.33.0.jar` is in the class path before starting the history-server, and the connector version is `spark-3.2` or above.
**Note:** To use the metrics in the Spark UI page, you need to make sure the `spark-bigquery-metrics-0.34.0.jar` is in the class path before starting the history-server, and the connector version is `spark-3.2` or above.

## FAQ

