
Commit 2f520e9

chore: update readme
1 parent 69c773f


Readme.md

Lines changed: 37 additions & 4 deletions

spark-web-proxy acts as a reverse proxy for [Spark History Server](https://spark.apache.org/docs/latest/monitoring.html) and [Spark UI](https://spark.apache.org/docs/latest/web-ui.html). It complements [Spark History Server](https://spark.apache.org/docs/latest/monitoring.html) by seamlessly integrating the UIs of live (running) Spark applications. The web proxy enables real-time discovery and monitoring of running Spark applications alongside completed ones, all within your existing Spark History Server web UI.

The proxy is non-intrusive and independent of any specific version of Spark or the Spark History Server. It supports all Spark application deployment modes, including Kubernetes Jobs, the Spark Operator, and notebooks (Jupyter, etc.).

![Spark History](docs/images/spark-history.png)

For more configuration properties, refer to [Spark Monitoring](https://spark.apache.org/docs/latest/monitoring.html).

## Spark jobs deployment

### Cluster mode

In cluster mode, no additional configuration is needed: Spark by default adds the label `spark-role: driver` and the `spark-ui` port to the driver pods, as shown in the following:

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    ...
    spark-role: driver
spec:
  containers:
    - args:
        - driver
      name: spark-kubernetes-driver
      ports:
        ...
        - containerPort: 4040
          name: spark-ui
          protocol: TCP
```
### Notebooks and client mode
In client mode, the web proxy relies on the Spark History Server REST API: it calls [/api/v1/applications/[app-id]/environment](https://spark.apache.org/docs/latest/monitoring.html) to get the Spark driver IP and UI port, and [/api/v1/applications/[app-id]](https://spark.apache.org/docs/latest/monitoring.html) to get the application status.
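
For illustration, here is a minimal Python sketch of that lookup (not the proxy's own code; the History Server URL and application id below are placeholder assumptions):

```python
import requests

history_server = "http://localhost:18080"  # assumed History Server address
app_id = "app-20240101000000-0001"         # hypothetical application id

# The environment payload lists sparkProperties as [name, value] pairs.
env = requests.get(f"{history_server}/api/v1/applications/{app_id}/environment").json()
props = dict(env["sparkProperties"])
driver_host = props.get("spark.driver.host")
ui_port = props.get("spark.ui.port")       # only present if set explicitly (see below)

# Each attempt carries a `completed` flag; the latest attempt tells us if the app is live.
app = requests.get(f"{history_server}/api/v1/applications/{app_id}").json()
is_running = not app["attempts"][-1]["completed"]
```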
By default, Spark does not report the property `spark.ui.port` in the environment properties, so you should set it explicitly during job submission or via a listener.

Here is an example of how to set `spark.ui.port` in a Jupyter notebook:
```python
import socket

def find_available_port(start_port=4041, max_port=4100):
    """Find the next available port starting from start_port."""
    for port in range(start_port, max_port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            # connect_ex returns a non-zero error code when nothing is listening on the port
            if s.connect_ex(("localhost", port)) != 0:
                return port
    raise Exception(f"No available ports found in range {start_port}-{max_port}")
```
```python
# assumes `conf` is the SparkConf for the session you are about to start
conf.set("spark.ui.port", find_available_port())
```
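
Putting it together, a minimal sketch of wiring the port into a PySpark session (assuming `pyspark` is installed and `find_available_port` is defined as above):

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Set spark.ui.port before the session starts so it shows up in the environment properties.
conf = SparkConf().set("spark.ui.port", str(find_available_port()))
spark = SparkSession.builder.config(conf=conf).getOrCreate()
```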

## Authentication
