This Docker image runs Apache Spark in cluster mode with one master and a configurable number of slave (worker) nodes.
- Set up Docker and docker-compose first
- Build the image using included Dockerfile
docker-compose build
- Spin up a Spark cluster with 1 master and 2 slaves (as an example)
docker-compose up --scale master=1 --scale slave=2
- Verify that the cluster is running by going to http://localhost:8080. Note: if you are running Docker on OS X or Windows, replace localhost with the docker host VM IP address, which you can get by running
docker-machine ip
- Verify that the Jupyter notebook server is running by going to http://localhost:8888 (substituting the docker host VM IP address in the same way if needed)
- Destroy the cluster
docker-compose down
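To check that jobs actually run on the cluster, execute the following PySpark snippet (for example from a notebook on the Jupyter server above), replacing the placeholder with the docker machine / host IP: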
import pyspark

# Point the driver at the standalone Spark master running in Docker;
# replace the placeholder with the docker machine / host IP.
conf = pyspark.SparkConf()
conf.setMaster("spark://<docker machine IP>:7077")
conf.setAppName('test')

# Create a context and run a trivial job: sum the integers 0-99.
sc = pyspark.SparkContext(conf=conf)
rdd = sc.parallelize(range(100))
print(rdd.reduce(lambda x, y: x + y))  # prints 4950
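The job should print 4950 and appear as an application in the Spark master UI on port 8080.

If the image ships Spark 2.x or later, the same check can also be written with the DataFrame API. The snippet below is a minimal sketch under that assumption; SparkSession and the test-dataframe app name are illustrative, not part of the original instructions.

from pyspark.sql import SparkSession

# Build a session against the standalone master; use the same
# docker machine / host IP as in the SparkConf example above.
spark = (SparkSession.builder
         .master("spark://<docker machine IP>:7077")
         .appName("test-dataframe")
         .getOrCreate())

# 100-row DataFrame with a single 'id' column (0-99); the sum is again 4950.
df = spark.range(100)
print(df.groupBy().sum("id").first()[0])

spark.stop()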
Need to add support for the following components and improvements:
- Scala
- PySpark
- HDFS
- Zeppelin
- Jupyter
- Instructions on setting up in Azure/AWS with Docker Swarm
- Run containers in some kind of process manager