= Important Links

Launch Neo4j:

helm repo add neo4j https://helm.neo4j.com/neo4j
helm repo update
helm install my-neo4j-release neo4j/neo4j -f neo4j.yaml

Create a custom service account:

kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default

Build the Docker image:

${spark_home}/bin/docker-image-tool.sh -r docker.io/connectors-pyspark -t v3.4.0 -p ${spark_home}/kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
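
The build command above and the `spark.kubernetes.container.image` setting used when launching the job have to refer to the same image. As a rough sketch, assuming `docker-image-tool.sh` names the PySpark image `spark-py` and prefixes it with the `-r` repository and suffixes it with the `-t` tag, the resulting reference can be reconstructed like this (`REPO`, `TAG` and `IMAGE` are illustrative variable names, not something the tool reads from your shell):

```shell
# REPO and TAG mirror the -r and -t arguments of the build command above.
REPO=docker.io/connectors-pyspark
TAG=v3.4.0

# Assumption: with -p, the tool tags the PySpark image as <repo>/spark-py:<tag>.
IMAGE="$REPO/spark-py:$TAG"
echo "$IMAGE"

# Optional check that the image exists locally (needs a running Docker daemon):
# docker image inspect "$IMAGE" > /dev/null && echo "image found"
```

Docker treats `docker.io/` as the default registry, so the shorter `connectors-pyspark/spark-py:v3.4.0` should resolve to the same image.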

Install Python 3, create a virtual environment, and install the dependencies:

python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements.txt

Launch a Job:

(/private/tmp is probably macOS-specific; you may need to change the paths below to match your OS, Docker, and Kubernetes configuration.)

SOURCE_DIR=/private/tmp/spark-on-k8s
VOLUME_TYPE=hostPath
VOLUME_NAME=spark-on-k8s
MOUNT_PATH=/private/tmp/spark-on-k8s

spark-submit \
  --master k8s://https://kubernetes.docker.internal:6443 \
  --deploy-mode cluster \
  --name spark-test \
  --packages org.neo4j:neo4j-connector-apache-spark_2.12:5.2.0_for_spark_3,org.postgresql:postgresql:42.6.0 \
  --conf spark.kubernetes.file.upload.path=$SOURCE_DIR \
  --conf spark.kubernetes.driver.volumes.$VOLUME_TYPE.$VOLUME_NAME.mount.path=$MOUNT_PATH \
  --conf spark.kubernetes.driver.volumes.$VOLUME_TYPE.$VOLUME_NAME.mount.type=Directory \
  --conf spark.kubernetes.driver.volumes.$VOLUME_TYPE.$VOLUME_NAME.options.path=$MOUNT_PATH \
  --conf spark.kubernetes.executor.volumes.$VOLUME_TYPE.$VOLUME_NAME.mount.path=$MOUNT_PATH \
  --conf spark.kubernetes.executor.volumes.$VOLUME_TYPE.$VOLUME_NAME.mount.type=Directory \
  --conf spark.kubernetes.executor.volumes.$VOLUME_TYPE.$VOLUME_NAME.options.path=$MOUNT_PATH \
  --conf spark.kubernetes.container.image=connectors-pyspark/spark-py:v3.4.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.driver.extraJavaOptions="-Divy.cache.dir=/tmp -Divy.home=/tmp" \
  push-to-neo4j.py
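
The driver and executor volume settings in the command above differ only in the role name, so they can be generated instead of typed out six times. The following is a minimal sketch; `volume_confs` is a hypothetical helper written for this README, not part of Spark, and it emits exactly the same keys as the command above.

```shell
# Same values as the variables defined before the spark-submit command.
VOLUME_TYPE=hostPath
VOLUME_NAME=spark-on-k8s
MOUNT_PATH=/private/tmp/spark-on-k8s

# Hypothetical helper: prints the volume arguments for the driver and the
# executor, one token per line ("--conf", then "key=value").
volume_confs() {
  for role in driver executor; do
    prefix="spark.kubernetes.$role.volumes.$VOLUME_TYPE.$VOLUME_NAME"
    printf '%s\n' \
      --conf "$prefix.mount.path=$MOUNT_PATH" \
      --conf "$prefix.mount.type=Directory" \
      --conf "$prefix.options.path=$MOUNT_PATH"
  done
}

# Print the generated arguments for inspection.
volume_confs
```

One way to use it is to replace the six volume lines of the command with an unquoted `$(volume_confs)`. This relies on shell word splitting, so it only works while none of the values contain spaces or glob characters.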