
Automated containerized Apache Spark Cluster

Description

This repository contains all the artifacts and instructions to install an Apache Spark cluster using Docker containers. The following sections explain the steps to build the Apache Spark cluster from this repository.

How to run

Step 0: Pre-requisites

  1. All host machines should have the Docker engine installed (follow the Docker installation guide at https://docs.docker.com/engine/install/).
  2. Add the master host and the worker hosts to a Docker swarm network, as described below.

----> Initialize the Docker swarm (only on the master host) with the command below.

$ docker swarm init --advertise-addr <master-ip-address>   # Replace <master-ip-address> with the master host's IP address

----> Add the worker hosts to the Docker swarm (only on the worker hosts). The docker swarm init command above prints a join token and a complete join command (similar to the one shown below) that can be run on each worker host to add it to the swarm network.

$ docker swarm join --token <token-from-swarm-init-output> <master-ip-address>:2377
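
To confirm that all hosts have joined the swarm, you can list the nodes from the master host; every worker should appear with STATUS "Ready".

$ docker node ls   # Run on the master host only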

Step 1: Clone this git repository.

$ git clone https://github.com/gprasad09/Apache-Spark-Cluster-Project.git
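
The remaining steps assume you are inside the cloned repository directory (named after the repository by default).

$ cd Apache-Spark-Cluster-Project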

Step 2: Build and start the Spark cluster by executing the command below.

$ ./start.sh
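
To verify that the cluster came up, you can list the services deployed to the swarm; the exact service names depend on what start.sh deploys, but the Spark master and worker services should report their expected number of replicas.

$ docker service ls   # Run on the master host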

Step 3: Launch the Jupyter notebook.

$ ./jupyter.sh
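
If the notebook server uses Jupyter's default token authentication, the script output should include a URL containing an access token; keep it handy for when you open Jupyter in the browser after Step 4.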

Step 4: Execute the port-forwarding command below so that you can access the Spark UI and Jupyter from your local machine.

$ ssh  -L 8888:localhost:8888 -L 8080:localhost:8080 userid@master-host-address
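
With the SSH tunnel open, Jupyter is reachable at http://localhost:8888 and the Spark master web UI at http://localhost:8080 in your local browser (these are the two ports forwarded by the command above, assuming the default Jupyter and Spark UI ports).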

Step 5: Stop the cluster once you are done with your work on the Apache Spark cluster.

$ ./stop.sh
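
Stopping the cluster does not dismantle the swarm itself. If you also want to remove the hosts from the swarm, you can run the commands below.

$ docker swarm leave           # On each worker host
$ docker swarm leave --force   # On the master host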
