aws-samples/amazon-eks-apache-spark-etl-sample

amazon-eks-spark-best-practices

Examples providing best practices for Apache Spark on Amazon EKS

Prerequisites

Preparing the required Docker images

Run the following commands to build the Spark application image and push it to your Docker repository:

cd spark-application

docker build -t <DOCKER_REPO>/spark-eks:v3.1.2 .

docker push <DOCKER_REPO>/spark-eks:v3.1.2
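The build and push steps above can be wrapped in a small script so the repository and tag are set in one place. This is a minimal sketch, not part of the repository: the `image_ref` and `build_and_push` helpers and the `DOCKER_REPO` and `SPARK_IMAGE_TAG` environment variables are hypothetical names, and the script assumes it is run from the repository root.

```shell
#!/bin/sh
set -eu

# Compose the full image reference from a repository prefix and a tag.
# A trailing slash on the repository prefix is tolerated.
image_ref() {
  printf '%s/spark-eks:%s\n' "${1%/}" "$2"
}

# Build the image from the spark-application directory, then push it.
build_and_push() {
  image="$(image_ref "$1" "$2")"
  (cd spark-application && docker build -t "$image" .)
  docker push "$image"
}

# Run only when DOCKER_REPO is set (hypothetical variable name).
if [ -n "${DOCKER_REPO:-}" ]; then
  build_and_push "$DOCKER_REPO" "${SPARK_IMAGE_TAG:-v3.1.2}"
fi
```

For example, `DOCKER_REPO=<DOCKER_REPO> sh build.sh` would build and push `<DOCKER_REPO>/spark-eks:v3.1.2`, assuming the script is saved as `build.sh`.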

Running the demo steps

  • Create the EKS cluster using eksctl

    eksctl create cluster -f kubernetes/eksctl.yaml

  • Deploy the Kubernetes Cluster Autoscaler

    kubectl create -f kubernetes/cluster_autoscaler.yaml

  • Create an AWS IAM policy with the permissions the job needs

  • Create two IAM roles for service accounts, one per namespace, using the ARN of the policy created in the previous step

eksctl create iamserviceaccount \
  --name spark \
  --namespace spark \
  --cluster spark-eks-best-practices \
  --attach-policy-arn <POLICY_ARN> \
  --approve --override-existing-serviceaccounts

eksctl create iamserviceaccount \
  --name spark-fargate \
  --namespace spark-fargate \
  --cluster spark-eks-best-practices \
  --attach-policy-arn <POLICY_ARN> \
  --approve --override-existing-serviceaccounts
  • Launch Spark jobs on self-managed Amazon EKS node groups or on AWS Fargate

kubectl apply -f examples/spark-job-hostpath-volume.yaml

kubectl apply -f examples/spark-job-fargate.yaml

  • Monitor Kubernetes Nodes and Pods via the Kubernetes Dashboard

  • Monitor the Spark job progress via the Spark UI. To do so, forward the Spark UI port to localhost and open it in a browser

    • Get the Spark driver Pod name
    • Forward port 4040 from the Spark driver Pod
    • Access the Spark UI at http://localhost:4040 (the Spark UI is served over plain HTTP unless SSL has been configured)

kubectl get pod -n=spark

kubectl port-forward -n=spark <SPARK_DRIVER_NAME> 4040:4040
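The two commands above can be combined so the driver Pod name does not have to be copied by hand. This is a minimal sketch: the `driver_pod` and `forward_spark_ui` function names are hypothetical, and it assumes the driver Pod carries the `spark-role=driver` label that Spark on Kubernetes sets on driver Pods. If the jobs in this repository are launched in a way that changes or drops that label, adjust the selector accordingly.

```shell
#!/bin/sh

# Print the name of the first running Spark driver Pod in the namespace $1.
# Assumes the Pod carries the spark-role=driver label.
driver_pod() {
  kubectl get pod -n "$1" -l spark-role=driver \
    --field-selector=status.phase=Running \
    -o jsonpath='{.items[0].metadata.name}'
}

# Forward local port $2 (default 4040) to port 4040 of the driver Pod.
forward_spark_ui() {
  ns="$1"
  port="${2:-4040}"
  pod="$(driver_pod "$ns")" || return 1
  echo "Spark UI: http://localhost:${port}"
  kubectl port-forward -n "$ns" "$pod" "${port}:4040"
}

# Usage (blocks until interrupted with Ctrl-C):
#   forward_spark_ui spark
```

If no driver Pod is running, `kubectl` exits with an error and `forward_spark_ui` returns without starting the port-forward.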

About

Spark ETL example processing New York taxi rides public dataset on EKS
