- This project is based on the blog post "Run Hadoop Cluster in Docker Update"
sudo docker pull silencebingo/hadoop-spark-cluster
git clone https://github.com/silencebingo/hadoop-spark-cluster
sudo docker network create --driver=bridge hadoop
cd hadoop-spark-cluster
sudo ./start-container.sh
./start-hadoop-spark.sh
./run-wordcount.sh
output
input file1.txt:
Hello Hadoop
input file2.txt:
Hello Docker
wordcount output:
Docker 1
Hadoop 1
Hello 2
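To cross-check the result without the cluster, the same count can be reproduced locally. This is a minimal sketch and not part of the project's scripts: `local_wordcount` is a hypothetical helper name, and it assumes words are separated by whitespace, as in the two sample files.

```shell
# Hypothetical helper (not in the repository): count whitespace-separated
# words across the given files and print "word count" pairs, sorted by word.
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w, count[w] }' "$@" | LC_ALL=C sort
}

# Example use, assuming file1.txt and file2.txt exist in the current directory:
#   local_wordcount file1.txt file2.txt
```

`LC_ALL=C` keeps the sort order byte-wise, so capitalized words sort the same way on every machine.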
output
Master
4498 NameNode
4851 ResourceManager
4695 SecondaryNameNode
5211 Jps
4957 Master
Slave
1553 Worker
1362 DataNode
1682 Jps
1476 NodeManager
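The process lists above can be checked mechanically. The following is a sketch under assumptions: `missing_daemons` is a hypothetical helper, not part of the repository. It reads `jps` output on standard input and prints every expected daemon name that is absent.

```shell
# Hypothetical helper: read `jps` output ("<pid> <name>" per line) on stdin
# and print each expected daemon name (given as arguments) that is missing.
missing_daemons() {
    jps_output=$(cat)
    for name in "$@"; do
        # Match the daemon name as the whole second field, not as a substring.
        if ! printf '%s\n' "$jps_output" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            printf '%s\n' "$name"
        fi
    done
}

# Example use inside the master container:
#   jps | missing_daemons NameNode ResourceManager SecondaryNameNode Master
# Example use inside a slave container:
#   jps | missing_daemons Worker DataNode NodeManager
```

No output means every listed daemon is running.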
Hadoop Cluster http://masterip:8088/cluster
Hadoop Overview http://masterip:50070/
Spark Cluster http://masterip:9090/
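A quick way to confirm the three web interfaces respond is to request each URL with `curl`. This is a sketch only: `ui_urls` and `probe_uis` are hypothetical helper names, and the argument stands in for the `masterip` placeholder used above.

```shell
# Hypothetical helper: print the three web UI URLs for a given master address.
ui_urls() {
    host=$1
    printf 'http://%s:8088/cluster\n' "$host"   # Hadoop cluster
    printf 'http://%s:50070/\n' "$host"         # Hadoop overview
    printf 'http://%s:9090/\n' "$host"          # Spark cluster
}

# Hypothetical helper: print the HTTP status code returned by each UI.
# Requires curl; each request gives up after 5 seconds.
probe_uis() {
    ui_urls "$1" | while read -r url; do
        code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
        printf '%s %s\n' "$code" "$url"
    done
}

# Example use, replacing masterip with the master's address:
#   probe_uis masterip
```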
repeat steps 1~3 of section A (pull the image, clone the repository, create the network)
sudo ./resize-cluster.sh 6
- specify a parameter greater than 1: 2, 3, ...
- this script just rebuilds the hadoop image with a different slaves file, which specifies the names of all slave nodes
sudo ./start-container.sh 6
- use the same parameter as in step 2
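Because both scripts must receive the same node count, one option is to generate the two commands from a single validated value. The following is a sketch, not part of the repository: `resize_commands` is a hypothetical helper that only prints the commands, so they can be reviewed before being run.

```shell
# Hypothetical helper: print the resize and start commands for an N-node
# cluster, or fail if N is not an integer greater than 1.
resize_commands() {
    n=$1
    # Accept only plain decimal integers.
    case $n in
        ''|*[!0-9]*)
            echo "node count must be an integer" >&2
            return 1
            ;;
    esac
    # The notes above ask for a parameter greater than 1.
    if [ "$n" -le 1 ]; then
        echo "node count must be greater than 1" >&2
        return 1
    fi
    printf 'sudo ./resize-cluster.sh %s\n' "$n"
    printf 'sudo ./start-container.sh %s\n' "$n"
}

# Example use: print the commands for a 6-node cluster.
#   resize_commands 6
```

Generating both lines from one argument avoids passing `start-container.sh` a count that differs from the one used to rebuild the image.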
repeat steps 5~6 of section A (start Hadoop and Spark, run wordcount)