
Pyspark-hadoop-hive-docker

Two steps to create a PySpark Hadoop cluster using Docker:

  1. Build the image
  2. Run the docker-compose command to start the cluster.

The base images used are:

  • tensorflow/tensorflow:2.12.0-gpu-jupyter
  • jupyter/minimal-notebook:python-3.8
  • postgres:11


Quick Start

To deploy the cluster, run:

make
docker-compose up

Access the web interfaces at the following URLs:

Hadoop

ResourceManager: http://localhost:8088

NameNode: http://localhost:9870

HistoryServer: http://localhost:19888

Datanode1: http://localhost:9864

Datanode2: http://localhost:9865

NodeManager1: http://localhost:8042

NodeManager2: http://localhost:8043
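As a quick sanity check from the host, the NameNode's WebHDFS REST API can be used to list HDFS contents. A minimal sketch in Python, assuming WebHDFS is enabled (the Hadoop 3 default) and the requests package is available:

import requests

# List the HDFS root directory via the NameNode's WebHDFS endpoint.
# Port 9870 matches the NameNode URL above.
resp = requests.get("http://localhost:9870/webhdfs/v1/?op=LISTSTATUS")
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["type"], entry["pathSuffix"])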

Spark

master: http://localhost:8080

worker1: http://localhost:8081

worker2: http://localhost:8082

history: http://localhost:18080
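To run a job against the standalone master from the host, a SparkSession can point at it directly. A minimal sketch, assuming the master's RPC port 7077 is also published by docker-compose (only the web UI ports are listed above) and a matching local pyspark install:

from pyspark.sql import SparkSession

# spark://localhost:7077 is the standard standalone RPC endpoint;
# whether it is exposed depends on the compose file.
spark = (
    SparkSession.builder
    .appName("smoke-test")
    .master("spark://localhost:7077")
    .getOrCreate()
)

# Trivial job to confirm the workers accept tasks.
print(spark.range(1000).count())
spark.stop()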

Hive

URI: jdbc:hive2://localhost:10000
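Port 10000 is HiveServer2's Thrift endpoint, so any JDBC or Thrift client can connect. A minimal sketch using the PyHive package (an assumption; it is not bundled with this repo), with a hypothetical username:

from pyhive import hive

# Connect to HiveServer2 on the published Thrift port.
# The username "hive" is a placeholder, not taken from this repo.
conn = hive.Connection(host="localhost", port=10000, username="hive")
cursor = conn.cursor()
cursor.execute("SHOW DATABASES")
print(cursor.fetchall())
conn.close()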

Jupyter Notebook

URL: http://localhost:8888
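Inside the notebook, a Hive-enabled SparkSession is the natural entry point. A minimal sketch, assuming the container's Spark configuration already points at the Hive metastore:

from pyspark.sql import SparkSession

# enableHiveSupport() makes Spark SQL use the cluster's Hive metastore,
# relying on the hive-site.xml shipped with the image (an assumption).
spark = (
    SparkSession.builder
    .appName("notebook")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("SHOW DATABASES").show()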

This code is based on the GitHub repository below. The original did not work for me, so I made quite a few changes to get it working. https://github.com/myamafuj/hadoop-hive-spark-docker
