This repository contains all the artifacts and instructions needed to set up an Apache Spark cluster using Docker containers. The following sections explain the steps to build the cluster using this repository.
Step 0: Prerequisites
- All host machines should have the Docker engine installed (follow the Docker installation guide: https://docs.docker.com/engine/install/).
- Join the master host and the worker hosts into a Docker Swarm network.
----> Initialize Docker Swarm (only on the master host) with the command below:
$ docker swarm init --advertise-addr <master-ip-address> # Replace <master-ip-address> with the master host's IP address
----> Add the worker hosts to the Docker Swarm (only on the worker hosts). The swarm init command above outputs a join token and a complete command (similar to the one shown below) that can be executed on each worker host to add it to the swarm network.
$ docker swarm join --token <token-from-swarm-init-output> <master-ip-address>:2377
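You can confirm that all hosts joined the swarm successfully by listing the nodes (run on the master host):
$ docker node ls   # lists every node in the swarm and its status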
Step 1: Clone this git repository.
$ git clone https://github.com/gprasad09/Apache-Spark-Cluster-Project.git
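The cluster scripts are assumed to live at the repository root, so change into the cloned directory before running them:
$ cd Apache-Spark-Cluster-Project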
Step 2: Build and start the Spark cluster by executing the command below.
$ ./start.sh
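As a quick sanity check (a hedged example; whether start.sh launches plain containers or swarm services depends on the script), you can list what is running:
$ docker ps          # containers running on this host
$ docker service ls  # swarm services, if the cluster is deployed as a stack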
Step 3: Launch the Jupyter notebook server by executing the command below.
$ ./jupyter.sh
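Jupyter normally prints a token-authenticated URL when it starts, so keep that output handy. If you miss it, a hedged example for recovering the token (the container name jupyter is an assumption; check docker ps for the actual name):
$ docker logs jupyter 2>&1 | grep token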
Step 4: Run the port-forwarding command below so that you can access the Spark UI and Jupyter from your local machine.
$ ssh -L 8888:localhost:8888 -L 8080:localhost:8080 userid@master-host-address
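With the tunnel open, you can check the forwarded ports from your local machine (assuming the Spark master UI listens on 8080, as the forward above implies):
$ curl -I http://localhost:8080   # Spark master UI should respond
Then open http://localhost:8888 in a browser to reach Jupyter.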
Step 5: Stop the cluster once you are done with your work on the Apache Spark cluster.
$ ./stop.sh
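If you also want to dismantle the swarm afterwards, each host can leave it:
$ docker swarm leave          # on each worker host
$ docker swarm leave --force  # on the master host (manager nodes require --force)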