Spark on Kubernetes Cluster

Local setup

To set up the spark-cluster chart locally, run the following steps:

minikube start --kubernetes-version=1.18.0 --cpus=12 --memory=14g
export TILLER_NAMESPACE=kube-system
kubectl create -n kube-system -f scripts/cluster-admin.yaml
kubectl create serviceaccount tiller --namespace kube-system
kubectl create clusterrolebinding tiller-cluster-role --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --upgrade --service-account tiller
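The Tiller and `helm init` commands above exist only in Helm 2; Helm 3 removed both. A minimal sketch of a guard that checks the client version before running them is shown below. The helper name `helm_major` is hypothetical, and it assumes `helm version --short` prints a `v<major>.<minor>.<patch>` token on its first line.

```shell
# Extract the Helm major version from `helm version --short` output on stdin.
# Hypothetical helper; assumes a "v<major>.<minor>.<patch>" token is present.
helm_major() {
  sed -n 's/.*v\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Usage (not executed here, requires the helm client):
#   major=$(helm version --short 2>/dev/null | helm_major)
#   [ "$major" = "2" ] || echo "these steps assume Helm 2 with Tiller" >&2
```

If the check reports another major version, the Tiller service account and `helm init` steps do not apply as written.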
  • Add and sync the Helm repositories, including jahstreet
helm repo add jetstack
helm repo add ingress-nginx
helm repo add jupyterhub
helm repo add loki
helm repo add jahstreet
helm repo update
  • Run minikube tunnel in a separate terminal

  • Install cluster-base chart

kubectl apply -f
helm upgrade --install cluster-base jahstreet/cluster-base --namespace kube-system
  • Check the Nginx Ingress Controller load balancer external IP
kubectl get service cluster-base-ingress-nginx-controller --namespace kube-system
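The external IP reported by this command is the address a hosts-file entry needs to point at. A minimal sketch of extracting it and formatting such an entry follows. The function names are hypothetical, and the hostname in the usage comment is a placeholder rather than the one the chart expects; use the hostname from your own values file.

```shell
# Print the external IP of a LoadBalancer Service.
# The output is empty until an IP is assigned (e.g. before `minikube tunnel` runs).
lb_external_ip() {
  # $1: Service name, $2: namespace
  kubectl get service "$1" --namespace "$2" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Format a single hosts-file line from an IP address and a hostname.
hosts_entry() {
  # $1: IP address, $2: hostname
  printf '%s %s\n' "$1" "$2"
}

# Usage (not executed here; my-cluster.example.com is a placeholder hostname):
#   ip=$(lb_external_ip cluster-base-ingress-nginx-controller kube-system)
#   hosts_entry "$ip" my-cluster.example.com | sudo tee -a /etc/hosts
```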
  • Add an entry for that IP to your hosts file
  • Install spark-cluster chart (NOTE: use release name spark-cluster)
helm upgrade --install spark-cluster --namespace spark-cluster jahstreet/spark-cluster \
    --timeout 600 \
    -f charts/spark-cluster/examples/custom-values-local.yaml
  • Installation may take some time; wait until the Pods are Running
kubectl get pods --watch --namespace spark-cluster
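As an alternative to watching the list by hand, the wait can be scripted. The sketch below polls the namespace until every Pod reports Running or Completed. The function names are hypothetical, the status is read from the third column of `kubectl get pods --no-headers`, and a Running status does not guarantee that the containers are Ready. It also returns immediately if the namespace has no Pods yet.

```shell
# Count Pods that are neither Running nor Completed.
# Reads `kubectl get pods --no-headers` output on stdin (STATUS is column 3).
count_pending_pods() {
  awk '$3 != "Running" && $3 != "Completed" { n++ } END { print n + 0 }'
}

# Poll a namespace until no Pod is pending, or give up after a timeout.
wait_for_pods() {
  # $1: namespace, $2: timeout in seconds (default 600)
  ns=$1
  remaining=${2:-600}
  while [ "$remaining" -gt 0 ]; do
    pending=$(kubectl get pods --namespace "$ns" --no-headers | count_pending_pods)
    [ "$pending" -eq 0 ] && return 0
    sleep 5
    remaining=$((remaining - 5))
  done
  return 1
}

# Usage (not executed here, requires a running cluster):
#   wait_for_pods spark-cluster 600 || echo "Pods not up after 600s" >&2
```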
  • Go to in your browser

  • Enter login admin and password admin

  • Spawn a Jupyter profile; you'll be redirected to your personal Jupyter Notebook once it is up and running

  • You can find the Livy UI, with clickable links to the Spark UI, logs and debug info for the running Jupyter sessions, at

  • Try out the notebooks in the examples/ folder

  • Install spark-monitoring chart

helm upgrade --install spark-monitoring --namespace monitoring jahstreet/spark-monitoring \
    --timeout 600 \
    -f charts/spark-monitoring/examples/custom-values-example.yaml

Note: at present the spark-monitoring chart must be installed with the release name spark-monitoring in the monitoring namespace for the Prometheus Pushgateway service monitor to work properly. To change that, refer to the pushgateway section of charts/spark-monitoring/values.yaml.

  • Installation may take some time; wait until the Pods are Running
kubectl get pods --watch --namespace monitoring
  • Go to in your browser
  • Log in to Grafana with user admin and password admin
  • Open the Explore page from the corresponding tab on the left panel, select the Loki datasource, and choose the Kubernetes labels of the Pods whose logs you want to see

Livy schema

  • You can also find pre-installed Grafana dashboards: Spark Metrics and Cluster State Board