[MINOR] Make readme easier to follow #18

Closed (2 commits)
55 changes: 22 additions & 33 deletions README.md
@@ -4,7 +4,6 @@ This project houses the **experimental** client for [Spark
Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html) for
[Apache Spark](https://spark.apache.org/) written in [Golang](https://go.dev/).


## Current State of the Project

Currently, the Spark Connect client for Golang is highly experimental and should
@@ -13,33 +12,42 @@ project reserves the right to withdraw and abandon the development of this proje
if it is not sustainable.

## Getting started

This section explains how to run Spark Connect Go locally.

Step 1: Install Golang: https://go.dev/doc/install.

Step 2: Ensure you have the `buf` CLI installed; [more info here](https://buf.build/docs/installation/).

Step 3: Run the following commands to set up the Spark Connect client.

```
git clone https://github.com/apache/spark-connect-go.git
cd spark-connect-go
git submodule update --init --recursive

make gen && make test
```
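
Before running the commands above, it can help to confirm that the tools from Steps 1 to 3 are on your `PATH`. The following is a minimal sketch, not part of this repository; the helper name `missingTools` is hypothetical, and the binary names `go`, `buf`, `git`, and `make` are assumed to be how these tools are installed on your system.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools returns the names from tools that cannot be found on PATH.
// (Hypothetical helper for illustration; not part of spark-connect-go.)
func missingTools(tools []string) []string {
	var missing []string
	for _, t := range tools {
		if _, err := exec.LookPath(t); err != nil {
			missing = append(missing, t)
		}
	}
	return missing
}

func main() {
	// Assumed binary names: "go" (Step 1), "buf" (Step 2), "git" and "make" (Step 3).
	missing := missingTools([]string{"go", "buf", "git", "make"})
	if len(missing) == 0 {
		fmt.Println("all prerequisites found on PATH")
		return
	}
	fmt.Println("missing from PATH:", missing)
}
```

This only checks that the executables can be found; it does not check their versions.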
> Ensure you have installed `buf CLI`; [more info](https://buf.build/docs/installation/)

## How to write Spark Connect Go Application in your own project
Step 4: Set up the Spark Driver on localhost.

See [Quick Start Guide](quick-start.md)
1. [Download a Spark distribution](https://spark.apache.org/downloads.html) (3.4.0+) and unzip the package.

## Spark Connect Go Application Example
2. Start the Spark Connect server with the following command (make sure to use a package version that matches your Spark distribution):

A very simple example in Go looks like following:
```
sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
```

Step 5: Run the example Go application.

```
func main() {
remote := "localhost:15002"
spark, _ := sql.SparkSession.Builder.Remote(remote).Build()
defer spark.Stop()

df, _ := spark.Sql("select 'apple' as word, 123 as count union all select 'orange' as word, 456 as count")
df.Show(100, false)
}
go run cmd/spark-connect-example-spark-session/main.go
```

## How to write Spark Connect Go Application in your own project

See [Quick Start Guide](quick-start.md)

## High Level Design

The following [diagram](https://textik.com/#ac299c8f32c4c342) shows the main code in the current prototype:
@@ -66,7 +74,6 @@ Following [diagram](https://textik.com/#ac299c8f32c4c342) shows main code in cur
| SparkConnectServiceClient |--------------+| Spark Driver |
| | | |
+---------------------------+ +----------------+

```

`SparkConnectServiceClient` is a gRPC client that talks to the Spark Driver. `sparkSessionImpl` generates `dataFrameImpl`
@@ -75,24 +82,6 @@ instances. `dataFrameImpl` uses the GRPC client in `sparkSessionImpl` to communi
We will mimic the logic in the Spark Connect Scala implementation and adopt common Go practices, e.g. returning an `error` value for
error handling.

## How to Run Spark Connect Go Application

1. Install Golang: https://go.dev/doc/install.

2. Download Spark distribution (3.4.0+), unzip the folder.

3. Start Spark Connect server by running command:

```
sbin/start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.4.0
```

4. In this repo, run Go application:

```
go run cmd/spark-connect-example-spark-session/main.go
```

## Contributing

Please review the [Contribution to Spark guide](https://spark.apache.org/contributing.html)