Merge pull request #2 from OKDP/spark-history
Add Spark history helm chart
Showing 13 changed files with 944 additions and 1 deletion.
```diff
@@ -29,4 +29,4 @@ bin/
 .vscode/
 
 # Other
-tmp/
+tmp/
```
@@ -0,0 +1 @@
```
ci/
```
@@ -0,0 +1,19 @@
```yaml
apiVersion: v2
name: spark-history-server
description: A Helm chart for the Spark-History-Server
version: 1.0.0
appVersion: "3.5.1"
keywords:
  - spark
  - spark-history
  - spark-history-server
home: https://spark.apache.org
maintainers:
  - email: moncef.lahouar@kubotal.io
    name: mlahouar
    url: https://github.com/mlahouar
sources:
  - https://github.com/apache/spark
  - https://github.com/OKDP/okdp-spark-auth-filter
  - https://github.com/OKDP/spark-images
  - https://github.com/okdp/charts/tree/main/charts/spark-history-server
```
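Once the okdp repository has been added (see the installation steps below), this chart metadata can be inspected directly with `helm show chart`:

```shell
$ helm show chart okdp/spark-history-server
```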
@@ -0,0 +1,195 @@
# spark-history-server

A Helm chart for the Spark-History-Server

## Introduction

This chart bootstraps a [Spark History Server](https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact) deployment using the [Helm](https://helm.sh) package manager.

## Prerequisites

- Helm >= 3
- Kubernetes >= 1.19
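Both can be checked quickly from the command line:

```shell
$ helm version --short
$ kubectl version
```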
## Installing the chart

To deploy the chart with a release name `my-release`:

1. First, add the okdp helm charts repository:

```shell
$ helm repo add okdp https://okdp.github.io/charts
```

2. Then, install the chart with the release name `my-release`:

```shell
$ helm install --version 1.0.0 \
    my-release okdp/spark-history-server
```
This will create a release of `my-release` in the default namespace. To install in a different one:

```shell
$ helm install --version 1.0.0 \
    --namespace spark \
    my-release okdp/spark-history-server
```

Note that `helm` will fail to install if the namespace doesn't exist. Either create the namespace beforehand or pass the `--create-namespace` flag to the `helm install` command.
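For example, to have Helm create the `spark` namespace during installation:

```shell
$ helm install --version 1.0.0 \
    --namespace spark \
    --create-namespace \
    my-release okdp/spark-history-server
```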
## Uninstalling the chart `my-release`

To uninstall `my-release`:

```shell
$ helm uninstall my-release -n spark
```

The command removes all the Kubernetes components associated with the chart and deletes the release.
## Configuration examples

### Using an S3-Compatible System for Log Storage

This example guides you through setting up the Spark History Server to store logs in an S3-compatible storage system.
#### Authentication using accessKey and secretKey

##### 1. Create a Kubernetes Secret for S3 Credentials

This secret stores the access key and secret key required to access your S3 bucket.

```shell
kubectl create secret generic s3-secret \
  --from-literal=accessKey='XXXXX' \
  --from-literal=secretKey='XXXXX' \
  -n spark
```

Replace the `XXXXX` placeholders with your actual S3 access key and secret key.
##### 2. Prepare Configuration Values (my-values.yaml)

```yaml
config:
  spark.hadoop.fs.s3a.endpoint: https://<S3 endpoint>
  spark.hadoop.fs.s3a.connection.ssl.enabled: true
  spark.history.fs.logDirectory: s3a://<your-bucket-name>/eventLogs
  spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
  spark.hadoop.fs.s3a.path.style.access: true

extraEnvs:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        key: accessKey
        name: s3-secret
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        key: secretKey
        name: s3-secret
  # Disable certificate checking (e.g. for self-signed endpoints); avoid in production
  - name: SPARK_HISTORY_OPTS
    value: "-Dcom.amazonaws.sdk.disableCertChecking=true"
```
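Note that the history server only lists applications that actually write event logs to this location. A Spark job would typically enable them with options along these lines (an illustrative sketch; `spark.eventLog.enabled` and `spark.eventLog.dir` are standard Spark settings, and `my-app.py` is a hypothetical application):

```shell
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=s3a://<your-bucket-name>/eventLogs \
  my-app.py
```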
##### 3. Deploy the Helm Chart

```shell
helm repo add okdp https://okdp.github.io/charts
helm install --version 1.0.0 \
  --namespace spark \
  --values my-values.yaml \
  my-release okdp/spark-history-server
```
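Once deployed, a quick way to check the release and reach the UI locally (a sketch; the service name assumes the chart's default naming for a release called `my-release`):

```shell
kubectl get pods -n spark
kubectl port-forward -n spark svc/my-release-spark-history-server 18080:18080
# The UI should then be reachable at http://localhost:18080
```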
## Values
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| affinity | object | `{}` | |
| commonAnnotations | object | `{}` | Common annotations for all deployed resources. Useful for tools that perform actions based on annotations. |
| commonLabels | object | `{}` | Common labels for all deployed resources. Use this to apply a set of labels to all resources that are deployed. |
| config."spark.history.custom.executor.log.url" | string | `""` | Specifies custom spark executor log URL for supporting external log service. |
| config."spark.history.custom.executor.log.url.applyIncompleteApplication" | bool | `true` | Specifies whether to apply custom spark executor log URL to incomplete applications as well. |
| config."spark.history.fs.cleaner.enabled" | bool | `false` | Specifies whether the History Server should periodically clean up event logs from storage. |
| config."spark.history.fs.cleaner.interval" | string | `"1d"` | Specifies how often the filesystem job history cleaner checks for files to delete. |
| config."spark.history.fs.cleaner.maxAge" | string | `"7d"` | Job history files older than this will be deleted when the cleaner runs. |
| config."spark.history.fs.cleaner.maxNum" | int | `2147483647` | Specifies the maximum number of files in the event log directory (defaults to Int.MaxValue). |
| config."spark.history.fs.driverlog.cleaner.enabled" | string | `"spark.history.fs.cleaner.enabled"` | Specifies whether the History Server should periodically clean up driver logs. |
| config."spark.history.fs.driverlog.cleaner.interval" | string | `"spark.history.fs.cleaner.interval"` | Specifies how often the filesystem driver log cleaner checks for files to delete. |
| config."spark.history.fs.driverlog.cleaner.maxAge" | string | `"spark.history.fs.cleaner.maxAge"` | Driver log files older than this will be deleted when the cleaner runs. |
| config."spark.history.fs.endEventReparseChunkSize" | string | `"1m"` | How many bytes to parse at the end of log files looking for the end event. |
| config."spark.history.fs.eventLog.rolling.maxFilesToRetain" | int | `2147483647` | The maximum number of event log files which will be retained as non-compacted (defaults to Int.MaxValue). |
| config."spark.history.fs.inProgressOptimization.enabled" | bool | `true` | Enable optimized handling of in-progress logs. |
| config."spark.history.fs.logDirectory" | string | `"file:/tmp/spark-events"` | For the filesystem history provider, the URL to the directory containing application event logs to load. |
| config."spark.history.fs.update.batchSize" | int | `2147483647` | Specifies the batch size for updating new eventlog files (defaults to Int.MaxValue). |
| config."spark.history.fs.update.interval" | string | `"10s"` | The period at which the filesystem history provider checks for new or updated logs. |
| config."spark.history.kerberos.enabled" | bool | `false` | Indicates whether the history server should use kerberos to login. |
| config."spark.history.kerberos.keytab" | string | `""` | Location of the kerberos keytab file for the History Server when kerberos is enabled. |
| config."spark.history.kerberos.principal" | string | `""` | Kerberos principal name for the History Server when kerberos is enabled. |
| config."spark.history.provider" | string | `"org.apache.spark.deploy.history.FsHistoryProvider"` | Name of the class implementing the application history backend. |
| config."spark.history.retainedApplications" | int | `50` | The number of applications to retain UI data for in the cache. |
| config."spark.history.store.hybridStore.diskBackend" | string | `"ROCKSDB"` | Specifies a disk-based store used in hybrid store. |
| config."spark.history.store.hybridStore.enabled" | bool | `false` | Whether to use HybridStore as the store when parsing event logs. |
| config."spark.history.store.hybridStore.maxMemoryUsage" | string | `"2g"` | Maximum memory space for HybridStore. |
| config."spark.history.store.maxDiskUsage" | string | `"10g"` | Maximum disk usage for the local directory storing cached application history information. |
| config."spark.history.store.path" | string | `""` | Local directory where to cache application history data. |
| config."spark.history.store.serializer" | string | `"JSON"` | Serializer for writing/reading UI objects to/from disk-based KV Store. |
| config."spark.history.ui.maxApplications" | int | `2147483647` | The number of applications to display on the history summary page (defaults to Int.MaxValue). |
| config."spark.history.ui.port" | int | `18080` | The port to which the web interface of the history server binds. |
| containerSecurityContext | object | `{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]},"readOnlyRootFilesystem":false,"runAsUser":185}` | Security context for the container. |
| deploymentName | string | `nil` | Desired name of the Spark History Server deployment. |
| envFrom | list | `[]` | Environment variables from ConfigMap or Secret. |
| extraEnvs | list | `[]` | Extra environment variables to pass to the container. |
| extraVolumeMounts | list | `[]` | Additional volume mounts. |
| extraVolumes | list | `[]` | Additional volumes to be mounted. |
| fullnameOverride | string | `""` | Overrides the release name. |
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy. |
| image.repository | string | `"quay.io/okdp/spark"` | Docker image registry. |
| image.tag | string | `"spark-3.5.1-scala-2.12-java-17-2024-04-04-1.0.0"` | Image tag. |
| imagePullSecrets | list | `[]` | Secrets to be used for pulling images from private Docker registries. |
| ingress.annotations | object | `{}` | |
| ingress.enabled | bool | `false` | |
| ingress.hosts[0].host | string | `"chart-example.local"` | |
| ingress.hosts[0].paths[0].path | string | `"/"` | |
| ingress.hosts[0].paths[0].pathType | string | `"Prefix"` | |
| ingress.ingressClassName | string | `""` | Specify the ingress class (Kubernetes >= 1.18). |
| ingress.labels | object | `{}` | |
| ingress.name | string | `nil` | |
| ingress.tls | list | `[]` | |
| lifecycle | object | `{}` | |
| livenessProbe | object | `{}` | Liveness probe for the Spark History Server container. |
| logLevel | string | `"info"` | Log4j2 log level. One of `debug`, `info`, `warn`, `error`, `fatal`, `trace`. |
| nameOverride | string | `""` | Override for the `spark-history-server.fullname` template, maintains the release name. |
| nodeSelector | object | `{}` | |
| podAnnotations | object | `{}` | Additional annotations for the Spark History Server pod. |
| podLabels | object | `{}` | Additional labels for the Spark History Server pod. |
| podSecurityContext | object | `{"seccompProfile":{"type":"RuntimeDefault"}}` | Security context for the pod. |
| priorityClassName | string | `""` | Priority class name for the Spark History Server pod. |
| readinessProbe | object | `{}` | Readiness probe for the Spark History Server container. |
| replicaCount | int | `1` | Desired number of Spark History Server pods to run. |
| resources | object | `{}` | |
| service.annotations | object | `{}` | |
| service.httpPortName | string | `"http"` | |
| service.labels | object | `{}` | |
| service.loadBalancerSourceRanges | list | `[]` | |
| service.name | string | `nil` | |
| service.nodePort | string | `""` | |
| service.port | int | `18080` | |
| service.type | string | `"ClusterIP"` | |
| serviceAccount.annotations | object | `{}` | Annotations to add to the service account. |
| serviceAccount.create | bool | `false` | Whether to create a service account or not. |
| serviceAccount.name | string | `"default"` | The name of the service account to use. |
| startupProbe | object | `{}` | Startup probe for the Spark History Server container. |
| tolerations | object | `{}` | |
| topologySpreadConstraints | list | `[]` | |
| updateStrategy | object | `{"type":"RollingUpdate"}` | Update strategy for deployments. |
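Any of these keys can be overridden through a values file passed to `helm install`. As a minimal sketch, enabling periodic event log cleanup might look like this (illustrative values and a hypothetical file name):

```yaml
# cleanup-values.yaml
config:
  spark.history.fs.cleaner.enabled: true
  spark.history.fs.cleaner.interval: 1d
  spark.history.fs.cleaner.maxAge: 7d
```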
## Source Code

* <https://github.com/apache/spark>
* <https://github.com/OKDP/okdp-spark-auth-filter>
* <https://github.com/OKDP/spark-images>
* <https://github.com/okdp/charts/tree/main/charts/spark-history-server>
@@ -0,0 +1,112 @@
{{ template "chart.header" . }}

{{ template "chart.description" . }}

## Introduction

This chart bootstraps a [Spark History Server](https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact) deployment using the [Helm](https://helm.sh) package manager.

{{ template "chart.requirementsSection" . }}

## Prerequisites

- Helm >= 3
- Kubernetes >= 1.19

## Installing the chart

To deploy the chart with a release name `my-release`:

1. First, add the okdp helm charts repository:

```shell
$ helm repo add okdp https://okdp.github.io/charts
```

2. Then, install the chart with the release name `my-release`:

```shell
$ helm install --version {{ template "chart.version" . }} \
    my-release okdp/spark-history-server
```
This will create a release of `my-release` in the default namespace. To install in a different one:

```shell
$ helm install --version {{ template "chart.version" . }} \
    --namespace spark \
    my-release okdp/spark-history-server
```

Note that `helm` will fail to install if the namespace doesn't exist. Either create the namespace beforehand or pass the `--create-namespace` flag to the `helm install` command.

## Uninstalling the chart `my-release`

To uninstall `my-release`:

```shell
$ helm uninstall my-release -n spark
```

The command removes all the Kubernetes components associated with the chart and deletes the release.
## Configuration examples

### Using an S3-Compatible System for Log Storage

This example guides you through setting up the Spark History Server to store logs in an S3-compatible storage system.
#### Authentication using accessKey and secretKey

##### 1. Create a Kubernetes Secret for S3 Credentials

This secret stores the access key and secret key required to access your S3 bucket.

```shell
kubectl create secret generic s3-secret \
  --from-literal=accessKey='XXXXX' \
  --from-literal=secretKey='XXXXX' \
  -n spark
```

Replace the `XXXXX` placeholders with your actual S3 access key and secret key.
##### 2. Prepare Configuration Values (my-values.yaml)

```yaml
config:
  spark.hadoop.fs.s3a.endpoint: https://<S3 endpoint>
  spark.hadoop.fs.s3a.connection.ssl.enabled: true
  spark.history.fs.logDirectory: s3a://<your-bucket-name>/eventLogs
  spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
  spark.hadoop.fs.s3a.path.style.access: true

extraEnvs:
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        key: accessKey
        name: s3-secret
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        key: secretKey
        name: s3-secret
  # Disable certificate checking (e.g. for self-signed endpoints); avoid in production
  - name: SPARK_HISTORY_OPTS
    value: "-Dcom.amazonaws.sdk.disableCertChecking=true"
```
##### 3. Deploy the Helm Chart

```shell
helm repo add okdp https://okdp.github.io/charts
helm install --version {{ template "chart.version" . }} \
  --namespace spark \
  --values my-values.yaml \
  my-release okdp/spark-history-server
```
{{ template "chart.valuesSection" . }}

{{ template "chart.sourcesSection" . }}
@@ -0,0 +1,22 @@
```
1. Get the application URL by running these commands:
{{- if .Values.ingress.enabled }}
{{- range $host := .Values.ingress.hosts }}
  {{- range .paths }}
  http{{ if $.Values.ingress.tls }}s{{ end }}://{{ $host.host }}{{ .path }}
  {{- end }}
{{- end }}
{{- else if contains "NodePort" .Values.service.type }}
  export NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ include "spark-history-server.fullname" . }})
  export NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
  echo http://$NODE_IP:$NODE_PORT
{{- else if contains "LoadBalancer" .Values.service.type }}
     NOTE: It may take a few minutes for the LoadBalancer IP to be available.
           You can watch its status by running 'kubectl get --namespace {{ .Release.Namespace }} svc -w {{ include "spark-history-server.fullname" . }}'
  export SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ include "spark-history-server.fullname" . }} --template "{{"{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}"}}")
  echo http://$SERVICE_IP:{{ .Values.service.port }}
{{- else if contains "ClusterIP" .Values.service.type }}
  export POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ include "spark-history-server.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}")
  export CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
  kubectl --namespace {{ .Release.Namespace }} port-forward $POD_NAME 8080:$CONTAINER_PORT
  Visit http://127.0.0.1:8080 to use your application
{{- end }}
```