
Pyspark-hadoop-hive-docker

Two steps to create a PySpark Hadoop cluster using Docker:

  1. Build the image
  2. Run the docker-compose command to start the cluster.

The base images used are:

  • tensorflow/tensorflow:2.12.0-gpu-jupyter
  • jupyter/minimal-notebook:python-3.8
  • postgres:11


Quick Start

To deploy the cluster, run:

make
docker-compose up

Access the web interfaces at the following URLs:

Hadoop

ResourceManager: http://localhost:8088

NameNode: http://localhost:9870

HistoryServer: http://localhost:19888

Datanode1: http://localhost:9864

Datanode2: http://localhost:9865

NodeManager1: http://localhost:8042

NodeManager2: http://localhost:8043
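As a quick sanity check from the host, the NameNode's WebHDFS REST API can be used to list HDFS contents. A minimal sketch in Python, assuming WebHDFS is enabled (the Hadoop 3 default) and the requests package is available:

import requests

# List the HDFS root directory via the NameNode's WebHDFS endpoint.
# Port 9870 matches the NameNode URL above.
resp = requests.get("http://localhost:9870/webhdfs/v1/?op=LISTSTATUS")
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["type"], entry["pathSuffix"])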

Spark

master: http://localhost:8080

worker1: http://localhost:8081

worker2: http://localhost:8082

history: http://localhost:18080
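To run a job against the standalone master from the host, a SparkSession can point at it directly. A minimal sketch, assuming the master's RPC port 7077 is also published by docker-compose (only the web UI ports are listed above) and a matching local pyspark install:

from pyspark.sql import SparkSession

# spark://localhost:7077 is the standard standalone RPC endpoint;
# whether it is exposed depends on the compose file.
spark = (
    SparkSession.builder
    .appName("smoke-test")
    .master("spark://localhost:7077")
    .getOrCreate()
)

# Trivial job to confirm the workers accept tasks.
print(spark.range(1000).count())
spark.stop()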

Hive

URI: jdbc:hive2://localhost:10000
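Port 10000 is HiveServer2's Thrift endpoint, so any JDBC or Thrift client can connect. A minimal sketch using the PyHive package (an assumption; it is not bundled with this repo), with a hypothetical username:

from pyhive import hive

# Connect to HiveServer2 on the published Thrift port.
# The username "hive" is a placeholder, not taken from this repo.
conn = hive.Connection(host="localhost", port=10000, username="hive")
cursor = conn.cursor()
cursor.execute("SHOW DATABASES")
print(cursor.fetchall())
conn.close()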

Jupyter Notebook

URL: http://localhost:8888
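Inside the notebook, a Hive-enabled SparkSession is the natural entry point. A minimal sketch, assuming the container's Spark configuration already points at the Hive metastore:

from pyspark.sql import SparkSession

# enableHiveSupport() makes Spark SQL use the cluster's Hive metastore,
# relying on the hive-site.xml shipped with the image (an assumption).
spark = (
    SparkSession.builder
    .appName("notebook")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("SHOW DATABASES").show()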

This code is based on the GitHub repository below. The original did not work for me, so I made quite a few changes to get it working. https://github.com/myamafuj/hadoop-hive-spark-docker
