diff --git a/README.md b/README.md
index 8b15743..7832edb 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,6 @@
 This project houses the **experimental** client for [Spark Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html)
 for [Apache Spark](https://spark.apache.org/) written in [Golang](https://go.dev/).
 
-
 ## Current State of the Project
 
 Currently, the Spark Connect client for Golang is highly experimental and should
@@ -13,33 +12,42 @@ project reserves the right to withdraw and abandon the development of this proje
 if it is not sustainable.
 
 ## Getting started
+
+This section explains how to run Spark Connect Go locally.
+
+Step 1: Install Golang: https://go.dev/doc/install.
+
+Step 2: Ensure you have the `buf` CLI installed; [more info here](https://buf.build/docs/installation/).
+
+Step 3: Run the following commands to set up the Spark Connect client.
+
 ```
 git clone https://github.com/apache/spark-connect-go.git
 git submodule update --init --recursive
 make gen && make test
 ```
-> Ensure you have installed `buf CLI`; [more info](https://buf.build/docs/installation/)
 
-## How to write Spark Connect Go Application in your own project
+Step 4: Set up the Spark Driver on localhost.
 
-See [Quick Start Guide](quick-start.md)
+1. [Download a Spark distribution](https://spark.apache.org/downloads.html) (3.4.0+) and unzip the package.
 
-## Spark Connect Go Application Example
+2. Start the Spark Connect server with the following command (make sure to use a package version that matches your Spark distribution):
 
-A very simple example in Go looks like following:
+```
+sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
+```
+
+Step 5: Run the example Go application.
 
 ```
-func main() {
-	remote := "localhost:15002"
-	spark, _ := sql.SparkSession.Builder.Remote(remote).Build()
-	defer spark.Stop()
-
-	df, _ := spark.Sql("select 'apple' as word, 123 as count union all select 'orange' as word, 456 as count")
-	df.Show(100, false)
-}
+go run cmd/spark-connect-example-spark-session/main.go
 ```
 
+## How to write Spark Connect Go Application in your own project
+
+See [Quick Start Guide](quick-start.md)
+
 ## High Level Design
 
 Following [diagram](https://textik.com/#ac299c8f32c4c342) shows main code in current prototype:
@@ -66,7 +74,6 @@ Following [diagram](https://textik.com/#ac299c8f32c4c342) shows main code in cur
 | SparkConnectServiceClient |--------------+| Spark Driver   |
 |                           |               |                |
 +---------------------------+               +----------------+
-
 ```
 
 `SparkConnectServiceClient` is GRPC client which talks to Spark Driver. `sparkSessionImpl` generates `dataFrameImpl`
@@ -75,24 +82,6 @@ instances. `dataFrameImpl` uses the GRPC client in `sparkSessionImpl` to communi
 We will mimic the logic in Spark Connect Scala implementation, and adopt Go common practices, e.g. returning `error` object for
 error handling.
 
-## How to Run Spark Connect Go Application
-
-1. Install Golang: https://go.dev/doc/install.
-
-2. Download Spark distribution (3.4.0+), unzip the folder.
-
-3. Start Spark Connect server by running command:
-
-```
-sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
-```
-
-4. In this repo, run Go application:
-
-```
-go run cmd/spark-connect-example-spark-session/main.go
-```
-
 ## Contributing
 
 Please review the [Contribution to Spark guide](https://spark.apache.org/contributing.html)