Two steps are needed to create a PySpark/Hadoop cluster using Docker (see the sketch after this list):
- Build the images.
- Run the docker-compose command to start the cluster.
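In practice the two steps amount to building the images referenced by the compose file and then starting the services. A minimal sketch, assuming the repository's Makefile only wraps the image build (the actual Makefile may do more):

```sh
# Step 1: build the images referenced by docker-compose.yml
# (assumption: the repository's `make` target wraps a plain compose build)
docker-compose build

# Step 2: start the whole cluster in the background
docker-compose up -d
```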
The base images used are:
- tensorflow/tensorflow:2.12.0-gpu-jupyter
- jupyter/minimal-notebook:python-3.8
- postgres:11
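For example, the notebook/client image could extend one of these bases and add a Java runtime plus PySpark on top. The sketch below is illustrative only; the real Dockerfiles live in the repository linked at the bottom, and the base tag and pinned versions here are assumptions:

```dockerfile
# Illustrative only: extend the minimal notebook base and add PySpark.
FROM jupyter/minimal-notebook:python-3.8

USER root
# Spark needs a Java runtime; the headless JRE is enough for a client image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends openjdk-11-jre-headless && \
    rm -rf /var/lib/apt/lists/*

USER ${NB_UID}
# The PySpark version is a placeholder; pin it to match the cluster's Spark version.
RUN pip install --no-cache-dir pyspark==3.3.2
```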
To deploy the cluster, run:

```sh
make
docker-compose up
```
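The docker-compose.yml in the repository defines all of the services; below is a trimmed, illustrative sketch of how a few of them might be wired, with host port mappings matching the endpoint list below. Service names, image names, and the Postgres credential are assumptions, not copied from the repository.

```yaml
# Illustrative sketch only -- a subset of the services, with host ports
# matching the UI/endpoint list below. Image names are placeholders.
version: "3"
services:
  namenode:
    image: hadoop-hive-spark-namenode        # placeholder image name
    ports:
      - "9870:9870"     # HDFS NameNode web UI
  resourcemanager:
    image: hadoop-hive-spark-resourcemanager # placeholder image name
    ports:
      - "8088:8088"     # YARN ResourceManager web UI
  spark-master:
    image: hadoop-hive-spark-master          # placeholder image name
    ports:
      - "8080:8080"     # Spark master web UI
  hiveserver2:
    image: hadoop-hive-spark-hive            # placeholder image name
    ports:
      - "10000:10000"   # HiveServer2 JDBC/Thrift endpoint
    depends_on:
      - metastore-db
  metastore-db:
    image: postgres:11                       # backing database for the Hive metastore
    environment:
      POSTGRES_PASSWORD: example             # placeholder credential
```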
Once the cluster is up, the following web UIs and endpoints are exposed:

Hadoop:
- ResourceManager: http://localhost:8088
- NameNode: http://localhost:9870
- HistoryServer (MapReduce): http://localhost:19888
- DataNode 1: http://localhost:9864, DataNode 2: http://localhost:9865
- NodeManager 1: http://localhost:8042, NodeManager 2: http://localhost:8043

Spark:
- Master: http://localhost:8080
- Worker 1: http://localhost:8081, Worker 2: http://localhost:8082
- History server: http://localhost:18080

Hive:
- HiveServer2 JDBC URI: jdbc:hive2://localhost:10000
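Once the containers are up, a quick sanity check is to curl a couple of the web UIs and open a Beeline session against HiveServer2. The hiveserver2 container name used with docker exec below is an assumption; adjust it to whatever your compose file calls the service.

```sh
# Check that the main web UIs answer (expect HTTP 200).
curl -s -o /dev/null -w "NameNode UI: %{http_code}\n"        http://localhost:9870
curl -s -o /dev/null -w "ResourceManager UI: %{http_code}\n" http://localhost:8088
curl -s -o /dev/null -w "Spark master UI: %{http_code}\n"    http://localhost:8080

# List databases through the HiveServer2 JDBC endpoint.
# "hiveserver2" is an assumed container name -- adjust to your compose file.
docker exec -it hiveserver2 beeline -u jdbc:hive2://localhost:10000 -e "SHOW DATABASES;"
```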
The code is based on the GitHub repository below. The original code did not work for me as-is, and I had to make quite a few changes to get it running: https://github.com/myamafuj/hadoop-hive-spark-docker