The following steps are a recipe for a Spark Standalone cluster with 2 workers on a single machine.
The aim is to have a complete Spark-clustered environment on your laptop.
1. Start a standalone master server.

    ```
    ./sbin/start-master.sh
    ```
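    `start-master.sh` forwards its options to the underlying `org.apache.spark.deploy.master.Master` class, so you can pin the host and ports explicitly; a minimal sketch, with assumed values:

    ```
    # Assumed host/ports -- adjust to your machine
    ./sbin/start-master.sh --host localhost --port 7077 --webui-port 8080
    ```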
    Notes (for one way to set the `spark.deploy.*` properties below, see the sketch after this list):

    - Use `SPARK_CONF_DIR` for the configuration directory (defaults to `$SPARK_HOME/conf`).
    - Use `spark.deploy.retainedApplications` (default: `200`)
    - Use `spark.deploy.retainedDrivers` (default: `200`)
    - Use `spark.deploy.recoveryMode` (default: `NONE`)
    - Use `spark.deploy.defaultCores` (default: `Int.MaxValue`)
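    One way to set these `spark.deploy.*` properties is `SPARK_MASTER_OPTS` in `conf/spark-env.sh`; a minimal sketch, with assumed example values rather than recommendations:

    ```
    # conf/spark-env.sh -- assumed example values
    SPARK_MASTER_OPTS="-Dspark.deploy.retainedApplications=50 -Dspark.deploy.retainedDrivers=50"
    ```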
2. Open master’s web UI at http://localhost:8080 to know the current setup: no workers and no applications.
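    The web UI also serves its state as JSON, which is handy for scripting; a quick check, assuming the master runs on the default port:

    ```
    $ curl -s http://localhost:8080/json
    ```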
3. Start the first worker.

    ```
    ./sbin/start-slave.sh spark://japila.local:7077
    ```

    Note: The command above in turn executes `org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://japila.local:7077`.
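    If the worker does not show up in the master’s web UI, check its log; a sketch, assuming the default `logs` directory and the file naming shown later in this recipe:

    ```
    $ tail -f logs/spark-*-org.apache.spark.deploy.worker.Worker-1-*.out
    ```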
4. Check out master’s web UI at http://localhost:8080 to know the current setup: one worker.

    Note the number of CPUs and memory: 8 and 15 GB, respectively. By default a worker claims all cores and the total RAM minus one gigabyte, which is left for the OS (oh, how generous, my dear Spark!).
5. Let’s stop the worker to start over with a custom configuration. Use `./sbin/stop-slave.sh` to stop the worker.

    ```
    ./sbin/stop-slave.sh
    ```
6. Check out master’s web UI at http://localhost:8080 to know the current setup: one worker in DEAD state.
7. Start a worker using `--cores 2` and `--memory 4g` for two CPU cores and 4 GB of RAM.

    ```
    ./sbin/start-slave.sh spark://japila.local:7077 --cores 2 --memory 4g
    ```

    Note: The command translates to `org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://japila.local:7077 --cores 2 --memory 4g`.
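    The worker accepts a few more options in the same style, e.g. a work directory and a web UI port; a sketch, with assumed values:

    ```
    # Assumed directory and port -- pick ones that suit your machine
    ./sbin/start-slave.sh spark://japila.local:7077 \
      --cores 2 --memory 4g \
      --work-dir /tmp/spark-worker --webui-port 8082
    ```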
8. Check out master’s web UI at http://localhost:8080 to know the current setup: one worker ALIVE and another DEAD.
9. Configure the cluster using `conf/spark-env.sh`.

    There’s the `conf/spark-env.sh.template` template to start from.

    We’re going to use the following `conf/spark-env.sh` (a sketch with more worker settings follows this step):

    ```
    # conf/spark-env.sh
    SPARK_WORKER_CORES=2      # the number of cores per worker
    SPARK_WORKER_INSTANCES=2  # the number of workers per node (a machine)
    SPARK_WORKER_MEMORY=2g
    ```
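    Two workers with 2 cores and 2g each give the cluster 4 cores and 4g of memory in total. `spark-env.sh` knows more worker settings; a sketch with assumed values (see `conf/spark-env.sh.template` for the full list):

    ```
    # Assumed example values
    SPARK_WORKER_PORT=8888              # fixed port for the worker (random by default)
    SPARK_WORKER_WEBUI_PORT=8081        # web UI port of the (first) worker
    SPARK_WORKER_DIR=/tmp/spark-worker  # scratch space for application work dirs
    ```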
10. Start the workers.

    ```
    ./sbin/start-slave.sh spark://japila.local:7077
    ```

    As the command progresses, it prints out `starting org.apache.spark.deploy.worker.Worker, logging to` for each worker. You defined two workers in `conf/spark-env.sh` using `SPARK_WORKER_INSTANCES`, so you should see two lines.

    ```
    $ ./sbin/start-slave.sh spark://japila.local:7077
    starting org.apache.spark.deploy.worker.Worker, logging to ../logs/spark-jacek-org.apache.spark.deploy.worker.Worker-1-japila.local.out
    starting org.apache.spark.deploy.worker.Worker, logging to ../logs/spark-jacek-org.apache.spark.deploy.worker.Worker-2-japila.local.out
    ```
11. Check out master’s web UI at http://localhost:8080 to know the current setup: at least two workers should be ALIVE.

    Note: Use `jps` on the master to see the instances, given they all run on the same machine (e.g. `localhost`).

    ```
    $ jps
    6580 Worker
    4872 Master
    6874 Jps
    6539 Worker
    ```
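    With the workers ALIVE you can sanity-check the cluster by pointing a Spark shell at it; a quick check, assuming the master URL used throughout this recipe:

    ```
    $ ./bin/spark-shell --master spark://japila.local:7077
    ```

    The shell should then show up under Running Applications in master’s web UI.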
12. Stop all instances: the master and the workers.

    ```
    ./sbin/stop-all.sh
    ```
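    `stop-all.sh` chains the per-role scripts shipped in `sbin`, so you can also stop the pieces one at a time:

    ```
    ./sbin/stop-slaves.sh   # stop the workers
    ./sbin/stop-master.sh   # stop the master
    ```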