Docker images for running Apache Spark as a standalone cluster managed by Docker, built for various Spark/Hadoop version combinations.
Pull images using the tag scheme deepelement/docker-spark:{Spark Version}-{Hadoop Version}.
Examples:
deepelement/docker-spark:2.0.2-2.7
deepelement/docker-spark:2.0.0-2.7
deepelement/docker-spark:2.0.2-2.4
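A minimal shell sketch of the tag scheme above. The helper name `spark_image_tag` is hypothetical, and it only composes the image reference; it does not check that the tag actually exists on Docker Hub.

```shell
# Hypothetical helper: build an image reference following the
# deepelement/docker-spark:{Spark Version}-{Hadoop Version} scheme.
spark_image_tag() {
  spark_version="$1"
  hadoop_version="$2"
  # Both versions are required; report usage on stderr otherwise.
  if [ -z "$spark_version" ] || [ -z "$hadoop_version" ]; then
    echo "usage: spark_image_tag <spark-version> <hadoop-version>" >&2
    return 1
  fi
  printf 'deepelement/docker-spark:%s-%s\n' "$spark_version" "$hadoop_version"
}
```

For example, `docker pull "$(spark_image_tag 2.0.2 2.7)"` pulls the first image listed above.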
Workers auto-discover the Spark master through the network environment variables injected into the container at launch.
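As a rough illustration of this kind of discovery, the sketch below assumes the worker is linked to the master under the alias `spark_master` (as in the Compose file below), in which case Docker's legacy links inject `SPARK_MASTER_PORT_7077_TCP_ADDR` and `SPARK_MASTER_PORT_7077_TCP_PORT`. The function name `spark_master_url` is hypothetical, and the image's own /start-worker.sh may resolve the master differently.

```shell
# Hypothetical sketch: derive the standalone master URL from the variables
# Docker's legacy links inject for a link aliased "spark_master".
spark_master_url() {
  addr="${SPARK_MASTER_PORT_7077_TCP_ADDR:-}"
  # Fall back to Spark's default standalone master port if no port is injected.
  port="${SPARK_MASTER_PORT_7077_TCP_PORT:-7077}"
  if [ -z "$addr" ]; then
    echo "spark_master link not found: SPARK_MASTER_PORT_7077_TCP_ADDR is unset" >&2
    return 1
  fi
  printf 'spark://%s:%s\n' "$addr" "$port"
}
```

A worker could then be started with `"$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker "$(spark_master_url)"`, the launch command Spark's standalone-mode documentation gives for starting a worker by hand.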
To use with Docker Compose, define the services in docker-compose.yml:
version: '2'
services:
  spark_master:
    image: deepelement/docker-spark:2.0.2-2.7
    container_name: spark_master
    network_mode: 'bridge'
    command: /start-master.sh
    ports:
      - "7077:7077"
      - "8080:8080"
  spark_worker:
    # Use the same Spark/Hadoop tag as the master
    image: deepelement/docker-spark:2.0.2-2.7
    command: /start-worker.sh
    network_mode: 'bridge'
    links:
      - spark_master
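To scale up workers, use the standard Compose interface:
docker-compose up --scale spark_worker=5
Older Compose releases without the --scale flag use docker-compose scale spark_worker=5 followed by docker-compose up; the scale command is deprecated in newer releases.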
The Spark setup follows the standard configuration layout under $SPARK_HOME/conf, so defaults can be overridden in a derived image. For example, to make logging less noisy, raise the root log level from INFO to WARN:
FROM deepelement/docker-spark:2.0.2-2.7
# Create log4j.properties from the shipped template, then lower the root level to WARN
RUN cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties \
 && sed -i 's/^log4j.rootCategory=INFO/log4j.rootCategory=WARN/' $SPARK_HOME/conf/log4j.properties
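The same override can be generalized to any log level. The sketch below wraps it in a hypothetical shell function, `set_spark_log_level`, which copies the template only when no log4j.properties exists yet and rewrites just the rootCategory line of that one file. It assumes the log4j 1.x template format used by Spark 2.x (a line such as `log4j.rootCategory=INFO, console`) and GNU sed, as found in Linux-based images.

```shell
# Hypothetical helper: set the root log level in a Spark conf directory.
# Assumes a Spark 2.x (log4j 1.x) template and GNU sed's in-place -i flag.
set_spark_log_level() {
  conf_dir="$1"
  level="$2"
  props="$conf_dir/log4j.properties"
  # Start from the shipped template only if no properties file exists yet.
  if [ ! -f "$props" ]; then
    cp "$conf_dir/log4j.properties.template" "$props" || return 1
  fi
  # Replace the level token after "=", keeping the appender list (", console").
  sed -i "s/^log4j\.rootCategory=[A-Za-z]*/log4j.rootCategory=$level/" "$props"
}
```

For example, an entrypoint script could call `set_spark_log_level "$SPARK_HOME/conf" ERROR`. Unlike a recursive grep over the conf directory, this leaves the template itself untouched.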