|
6 | 6 |
|
7 | 7 | spark-web-proxy acts as a reverse proxy for the [Spark History Server](https://spark.apache.org/docs/latest/monitoring.html) and the [Spark UI](https://spark.apache.org/docs/latest/web-ui.html). It complements the Spark History Server by seamlessly integrating the UIs of live (running) Spark applications. The web proxy enables real-time, dynamic discovery and monitoring of running Spark applications (with no delay) alongside completed applications, all within your existing Spark History Server web UI.
8 | 8 |
|
9 | | -The proxy is non-intrusive and independent of any specific version of Spark History Server or Spark. It supports all Spark application deployment modes, including Kubernetes Jobs, Spark Operator, Jupyter Spark notebooks, etc. |
| 9 | +The proxy is non-intrusive and independent of any specific version of Spark History Server or Spark. It supports all Spark application deployment modes, including Kubernetes Jobs, the Spark Operator, and notebooks (Jupyter, etc.).
10 | 10 |
|
11 | 11 |  |
12 | 12 |
|
@@ -59,17 +59,50 @@ For more configuration properties, refer to [Spark Monitoring](https://spark.apa |
59 | 59 |
|
60 | 60 | ## Spark jobs deployment |
61 | 61 |
|
62 | | -In a cluster mode, spark by default adds the label `spark-role: driver` in the spark driver pods. |
| 62 | +### Cluster mode |
63 | 63 |
|
64 | | -In a client mode, add the following label into your driver pods: |
| 64 | +In cluster mode, no additional configuration is needed: Spark by default adds the label `spark-role: driver` and the `spark-ui` port to the Spark driver pods, as shown below:
65 | 65 |
|
66 | 66 | ```yaml |
67 | | -kind: ... |
| 67 | +apiVersion: v1 |
| 68 | +kind: Pod |
68 | 69 | metadata: |
69 | 70 |   labels:
70 | 71 |     ...
71 | 72 |     spark-role: driver
| 73 | +spec:
| 74 | +  containers:
| 75 | +  - args:
| 76 | +    - driver
| 77 | +    name: spark-kubernetes-driver
| 78 | +    ports:
72 | 79 |     ...
| 80 | +    - containerPort: 4040
| 81 | +      name: spark-ui
| 82 | +      protocol: TCP
| 83 | +``` |
| 84 | +
|
| 85 | +### Notebooks and client mode
| 86 | +
|
| 87 | +In client mode, the web proxy relies on the Spark History Server REST API: it calls [/api/v1/applications/\[app-id\]/environment](https://spark.apache.org/docs/latest/monitoring.html) to get the Spark driver IP and UI port, and [/api/v1/applications/\[app-id\]](https://spark.apache.org/docs/latest/monitoring.html) to get the application status.
| 88 | +
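| | +For illustration, here is a minimal sketch of these two lookups, assuming the History Server is reachable at `http://localhost:18080` (its default port); the endpoint paths come from the Spark monitoring REST API, while the helper names are only illustrative and not the proxy's actual code:
| | +
| | +```python
| | +import requests
| | +
| | +HISTORY_SERVER = "http://localhost:18080"  # assumed History Server address
| | +
| | +def driver_ui_address(app_id):
| | +    """Read the driver host and UI port from the environment endpoint."""
| | +    env = requests.get(f"{HISTORY_SERVER}/api/v1/applications/{app_id}/environment").json()
| | +    props = dict(env["sparkProperties"])  # sparkProperties is a list of [key, value] pairs
| | +    return props.get("spark.driver.host"), props.get("spark.ui.port")
| | +
| | +def application_status(app_id):
| | +    """Return True if the application's latest attempt has completed."""
| | +    app = requests.get(f"{HISTORY_SERVER}/api/v1/applications/{app_id}").json()
| | +    return app["attempts"][-1]["completed"]
| | +```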
|
| 89 | +By default, Spark does not render the property `spark.ui.port` in the environment properties, so you should set it explicitly during job submission or through a listener.
| 90 | + |
| 91 | +Here is an example of how to set `spark.ui.port` in a Jupyter notebook:
| 92 | + |
| 93 | +```python |
| 94 | +import socket |
| 95 | +def find_available_port(start_port=4041, max_port=4100): |
| 96 | + """Find the next available port starting from start_port.""" |
| 97 | + for port in range(start_port, max_port): |
| 98 | + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: |
| 99 | + if s.connect_ex(("localhost", port)) != 0: |
| 100 | + return port |
| 101 | + raise Exception(f"No available ports found in range {start_port}-{max_port}") |
| 102 | +``` |
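| | +
| | +The scan starts at `4041`, presumably because `4040` is Spark's default UI port and is typically already taken by the first application on the host.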
| 103 | + |
| 104 | +```python |
| 105 | +conf.set("spark.ui.port", find_available_port()) |
73 | 106 | ``` |
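| | +
| | +For completeness, here is a minimal sketch of how this might fit into a notebook session, assuming `conf` is the `SparkConf` used to build the session (the app name is illustrative):
| | +
| | +```python
| | +from pyspark import SparkConf
| | +from pyspark.sql import SparkSession
| | +
| | +conf = SparkConf().set("spark.ui.port", str(find_available_port()))
| | +spark = SparkSession.builder.config(conf=conf).appName("notebook").getOrCreate()
| | +```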
74 | 107 |
|
75 | 108 | ## Authentication |
|