Commit

preparing release 0.23.0
davidrabinowitz committed Dec 6, 2021
1 parent b17eeb0 commit 0afc1ab
Showing 32 changed files with 166 additions and 71 deletions.
20 changes: 20 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,25 @@
# Release Notes

## 0.23.0 - 2021-12-06
* New connector: A Java-only connector implementing the Spark 2.4 APIs
* PR #469: Added support for the BigQuery Storage Write API, allowing faster
writes (Spark 2.4 connector only)
* Issue #481: Added configuration option to use compression from the READ API
for Arrow
* BigQuery API has been upgraded to version 2.1.8
* BigQuery Storage API has been upgraded to version 2.1.2
* gRPC has been upgraded to version 1.41.0

## 0.22.2 - 2021-09-22
* Issue #446: BigNumeric values are properly written to BigQuery
* Issue #452: Added the option to clean BigQueryClient.destinationTableCache
* BigQuery API has been upgraded to version 2.1.12
* BigQuery Storage API has been upgraded to version 2.3.1
* gRPC has been upgraded to version 1.40.0

## 0.22.1 - 2021-09-08
* Issue #444: Allowing unpartitioned clustered tables

## 0.22.0 - 2021-06-22
* PR #404: Added support for BigNumeric
* PR #430: Added HTTP and gRPC proxy support
48 changes: 24 additions & 24 deletions README.md
@@ -68,11 +68,11 @@ The latest version of the connector is publicly available in the following links

| version | Link |
| --- | --- |
| Scala 2.11 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.22.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.22.0.jar)) |
| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.22.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.22.0.jar)) |
| Spark 2.4 | `gs://spark-lib/bigquery/spark-bigquery-spark24-0.22.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-spark24-0.22.0.jar)) |
| Scala 2.11 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.23.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.23.0.jar)) |
| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.23.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.23.0.jar)) |
| Spark 2.4 | `gs://spark-lib/bigquery/spark-2.4-bigquery-0.23.0-preview.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-2.4-bigquery-0.23.0-preview.jar)) |

**Note:** If you are using scala jars please use the jar as per the scala version. From Spark 2.4 onwards there is an
**Note:** If you are using Scala jars, please use the jar relevant to your Spark installation. From Spark 2.4 onwards there is an
option to use the Java only jar.

The connector is also available from the
@@ -82,9 +82,9 @@ repository. It can be used using the `--packages` option or the

| version | Connector Artifact |
| --- | --- |
| Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.22.0` |
| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.22.0` |
| Spark 2.4 | `com.google.cloud.spark:spark-bigquery:spark24-0.22.0` |
| Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.23.0` |
| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.23.0` |
| Spark 2.4 | `com.google.cloud.spark:spark-2.4-bigquery:0.23.0-preview` |
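For scripted setups, the coordinate table above can be captured in a small helper. This is an illustrative sketch only (the function and its shape are not part of the connector); it simply mirrors the published 0.23.0 coordinates, e.g. when templating a `--packages` argument:

```python
# Illustrative helper (not part of the connector): return the Maven coordinate
# for release 0.23.0, mirroring the table above.
def connector_artifact(scala_version=None, java_only=False):
    if java_only:
        # Java-only connector implementing the Spark 2.4 APIs (preview)
        return "com.google.cloud.spark:spark-2.4-bigquery:0.23.0-preview"
    if scala_version not in ("2.11", "2.12"):
        raise ValueError("expected Scala 2.11 or 2.12, got %r" % scala_version)
    return (
        "com.google.cloud.spark:spark-bigquery-with-dependencies_%s:0.23.0"
        % scala_version
    )

print(connector_artifact("2.12"))
# com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.23.0
```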

If you want to keep up with the latest version of the connector, the following links can be used. Notice that for production
environments where the connector version should be pinned, one of the above links should be used.
@@ -93,7 +93,7 @@ environments where the connector version should be pinned, one of the above link
| --- | --- |
| Scala 2.11 | `gs://spark-lib/bigquery/spark-bigquery-latest_2.11.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-latest_2.11.jar)) |
| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-latest_2.12.jar)) |
| Spark 2.4 | `gs://spark-lib/bigquery/spark-bigquery-latest-spark24.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-latest-spark24.jar)) |
| Spark 2.4 | `gs://spark-lib/bigquery/spark-2.4-bigquery-latest.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-2.4-bigquery-latest.jar)) |
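The pinned and "latest" GCS paths above differ only in their naming scheme; a small sketch (again hypothetical, not shipped with the connector) shows how a deployment script might choose between them, preferring the pinned release in production as the note above advises:

```python
# Illustrative sketch: build the GCS path for either a pinned release
# (recommended for production) or the moving "latest" alias.
BUCKET = "gs://spark-lib/bigquery"

def connector_jar(scala_version="2.12", release="0.23.0"):
    if release is None:
        # Moving alias; avoid where the connector version must stay pinned.
        return "%s/spark-bigquery-latest_%s.jar" % (BUCKET, scala_version)
    return "%s/spark-bigquery-with-dependencies_%s-%s.jar" % (
        BUCKET, scala_version, release)

print(connector_jar())
# gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.23.0.jar
```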

## Hello World Example

@@ -254,7 +254,7 @@ df.write \
page regarding the BigQuery Storage Write API pricing.

#### Indirect write
This method is supported by all the connectors. In this method the data is written first to GCS and then
it is loaded to BigQuery. A GCS bucket must be configured to indicate the temporary data location.

```
Expand All @@ -264,7 +264,7 @@ df.write \
.save("dataset.table")
```

The data is temporarily stored using the [Apache Parquet](https://parquet.apache.org/),
[Apache ORC](https://orc.apache.org/) or [Apache Avro](https://avro.apache.org/) formats.
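A wrong format string only surfaces as a failure at write time, so it can help to validate it up front. The supported set comes from the sentence above; the helper itself is hypothetical:

```python
# Formats supported for the temporary data, per the README sentence above.
SUPPORTED_INTERMEDIATE_FORMATS = {"parquet", "orc", "avro"}

def check_intermediate_format(fmt):
    """Normalize a requested intermediate format, rejecting unsupported ones."""
    normalized = fmt.strip().lower()
    if normalized not in SUPPORTED_INTERMEDIATE_FORMATS:
        raise ValueError(
            "unsupported intermediate format %r; expected one of %s"
            % (fmt, sorted(SUPPORTED_INTERMEDIATE_FORMATS))
        )
    return normalized

print(check_intermediate_format(" Parquet "))
# parquet
```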

The GCS bucket and the format can also be set globally using Spark's RuntimeConfig like this:
@@ -449,7 +449,7 @@ The API Supports a number of options to configure the read
in which the data is written to BigQuery. Available values are <code>direct</code>
to use the BigQuery Storage Write API and <code>indirect</code> which writes the
data first to GCS and then triggers a BigQuery load operation. See more
<a href="#writing-data-to-bigquery">here</a>
<br/>(Optional, defaults to <code>indirect</code>)
</td>
<td>Write (supported only by the Spark 2.4 dedicated connector)</td>
@@ -800,9 +800,9 @@ creating the job or added during runtime. See examples below:
1) Adding python files while launching pyspark
```
# use appropriate version for jar depending on the scala version
pyspark --jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.22.0.jar
--py-files gs://spark-lib/bigquery/spark-bigquery-support-0.22.0.zip
--files gs://spark-lib/bigquery/spark-bigquery-support-0.22.0.zip
pyspark --jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.23.0.jar
--py-files gs://spark-lib/bigquery/spark-bigquery-support-0.23.0.zip
--files gs://spark-lib/bigquery/spark-bigquery-support-0.23.0.zip
```

2) Adding python files in Jupyter Notebook
@@ -811,9 +811,9 @@ from pyspark.sql import SparkSession
# use appropriate version for jar depending on the scala version
spark = SparkSession.builder\
.appName('BigNumeric')\
.config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.22.0.jar')\
.config('spark.submit.pyFiles', 'gs://spark-lib/bigquery/spark-bigquery-support-0.22.0.zip')\
.config('spark.files', 'gs://spark-lib/bigquery/spark-bigquery-support-0.22.0.zip')\
.config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.23.0.jar')\
.config('spark.submit.pyFiles', 'gs://spark-lib/bigquery/spark-bigquery-support-0.23.0.zip')\
.config('spark.files', 'gs://spark-lib/bigquery/spark-bigquery-support-0.23.0.zip')\
.getOrCreate()
```

@@ -822,10 +822,10 @@ spark = SparkSession.builder\
# use appropriate version for jar depending on the scala version
spark = SparkSession.builder\
.appName('BigNumeric')\
.config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.22.0.jar')\
.config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.23.0.jar')\
.getOrCreate()
spark.sparkContext.addPyFile("gs://spark-lib/bigquery/spark-bigquery-support-0.22.0.zip")
spark.sparkContext.addPyFile("gs://spark-lib/bigquery/spark-bigquery-support-0.23.0.zip")
```

Usage Example:
@@ -908,7 +908,7 @@ using the following code:
```python
from pyspark.sql import SparkSession
spark = SparkSession.builder\
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.22.0")\
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.23.0")\
.getOrCreate()
df = spark.read.format("bigquery")\
.load("dataset.table")
@@ -917,15 +917,15 @@ df = spark.read.format("bigquery")\
**Scala:**
```scala
val spark = SparkSession.builder
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.22.0")
.config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.23.0")
.getOrCreate()
val df = spark.read.format("bigquery")
.load("dataset.table")
```

If the Spark cluster is using Scala 2.12 (optional for Spark 2.4.x,
mandatory in 3.0.x), the relevant package is
com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.22.0. In
com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.23.0. In
order to know which Scala version is used, please run the following code:

**Python:**
@@ -949,14 +949,14 @@ To include the connector in your project:
<dependency>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-with-dependencies_${scala.version}</artifactId>
<version>0.22.0</version>
<version>0.23.0</version>
</dependency>
```

### SBT

```sbt
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.22.0"
libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.23.0"
```

## FAQ
2 changes: 1 addition & 1 deletion bigquery-connector-common/pom.xml
@@ -5,7 +5,7 @@
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-parent</artifactId>
<version>0.30.0-SNAPSHOT</version>
<version>0.23.0</version>
<relativePath>../spark-bigquery-parent</relativePath>
</parent>

14 changes: 7 additions & 7 deletions coverage/pom.xml
@@ -5,7 +5,7 @@
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-parent</artifactId>
<version>0.30.0-SNAPSHOT</version>
<version>0.23.0</version>
<relativePath>../spark-bigquery-parent</relativePath>
</parent>

@@ -82,8 +82,8 @@
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>spark-bigquery</artifactId>
<version>spark24-${project.version}</version>
<artifactId>spark-2.4-bigquery</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
@@ -179,8 +179,8 @@
<dependencies>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>spark-bigquery</artifactId>
<version>spark24-${project.version}</version>
<artifactId>spark-2.4-bigquery</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>${project.groupId}</groupId>
@@ -197,8 +197,8 @@
<dependencies>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>spark-bigquery</artifactId>
<version>spark24-${project.version}</version>
<artifactId>spark-2.4-bigquery</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</profile>
12 changes: 10 additions & 2 deletions pom.xml
@@ -5,7 +5,7 @@
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-parent</artifactId>
<version>0.30.0-SNAPSHOT</version>
<version>0.23.0</version>
<relativePath>spark-bigquery-parent</relativePath>
</parent>

@@ -51,7 +51,6 @@
<module>spark-bigquery-tests</module>
<module>spark-bigquery-connector-common</module>
<module>spark-bigquery-python-lib</module>
<module>coverage</module>
</modules>

<profiles>
@@ -133,5 +132,14 @@
<module>spark-bigquery-dsv2/spark-bigquery-spark3</module>
</modules>
</profile>
<profile>
<id>coverage</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<modules>
<module>coverage</module>
</modules>
</profile>
</profiles>
</project>
2 changes: 1 addition & 1 deletion spark-bigquery-connector-common/pom.xml
@@ -4,7 +4,7 @@
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-parent</artifactId>
<version>0.30.0-SNAPSHOT</version>
<version>0.23.0</version>
<relativePath>../spark-bigquery-parent</relativePath>
</parent>

2 changes: 1 addition & 1 deletion spark-bigquery-dsv1/pom.xml
@@ -5,7 +5,7 @@
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-parent</artifactId>
<version>0.30.0-SNAPSHOT</version>
<version>0.23.0</version>
<relativePath>../spark-bigquery-parent</relativePath>
</parent>

21 changes: 20 additions & 1 deletion spark-bigquery-dsv1/spark-bigquery-dsv1-parent/pom.xml
@@ -5,7 +5,7 @@
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-parent</artifactId>
<version>0.30.0-SNAPSHOT</version>
<version>0.23.0</version>
<relativePath>../../spark-bigquery-parent</relativePath>
</parent>

@@ -171,6 +171,25 @@
</execution>
</executions>
</plugin>
<!-- generating empty javadoc jar, for Maven Central publishing -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>3.2.0</version>
<executions>
<execution>
<id>empty-javadoc-jar</id>
<phase>package</phase>
<goals>
<goal>jar</goal>
</goals>
<configuration>
<classifier>javadoc</classifier>
<classesDirectory>${basedir}/src/build/javadoc</classesDirectory>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
@@ -5,7 +5,7 @@
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-parent</artifactId>
<version>0.30.0-SNAPSHOT</version>
<version>0.23.0</version>
<relativePath>../../spark-bigquery-parent</relativePath>
</parent>

@@ -5,7 +5,7 @@
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-parent</artifactId>
<version>0.30.0-SNAPSHOT</version>
<version>0.23.0</version>
<relativePath>../../spark-bigquery-parent</relativePath>
</parent>
<artifactId>spark-bigquery-with-dependencies-parent</artifactId>
@@ -173,6 +173,25 @@
</execution>
</executions>
</plugin>
<!-- generating empty javadoc jar, for Maven Central publishing -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>3.2.0</version>
<executions>
<execution>
<id>empty-javadoc-jar</id>
<phase>package</phase>
<goals>
<goal>jar</goal>
</goals>
<configuration>
<classifier>javadoc</classifier>
<classesDirectory>${basedir}/src/build/javadoc</classesDirectory>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
@@ -5,7 +5,7 @@
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-with-dependencies-parent</artifactId>
<version>0.30.0-SNAPSHOT</version>
<version>0.23.0</version>
<relativePath>../spark-bigquery-with-dependencies-parent</relativePath>
</parent>

@@ -0,0 +1 @@
In order to comply with Maven-Central requirements
@@ -28,7 +28,7 @@ public Scala211DataprocImage13AcceptanceTest() {

@BeforeClass
public static void setup() throws Exception {
context = DataprocAcceptanceTestBase.setup("1.3-debian10");
context = DataprocAcceptanceTestBase.setup("1.3-debian10", "spark-bigquery");
}

@AfterClass
@@ -28,7 +28,7 @@ public Scala211DataprocImage14AcceptanceTest() {

@BeforeClass
public static void setup() throws Exception {
context = DataprocAcceptanceTestBase.setup("1.4-debian10");
context = DataprocAcceptanceTestBase.setup("1.4-debian10", "spark-bigquery");
}

@AfterClass
@@ -5,7 +5,7 @@
<parent>
<groupId>com.google.cloud.spark</groupId>
<artifactId>spark-bigquery-with-dependencies-parent</artifactId>
<version>0.30.0-SNAPSHOT</version>
<version>0.23.0</version>
<relativePath>../spark-bigquery-with-dependencies-parent</relativePath>
</parent>

@@ -0,0 +1 @@
In order to comply with Maven-Central requirements
