https://keyhong.github.io/2024/02/07/Hadoop/hadoop3-docker-installation/
Component | Version | Description |
---|---|---|
Ubuntu | 22.04 | The modern, open source operating system on Linux |
Python | 3.10 | An interpreted, object-oriented, high-level programming language with dynamic semantics |
Hadoop | 3.3.6 | A framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models |
zookeeper | 3.9.1 | A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services |
Hive | 3.1.3 | A distributed, fault-tolerant data warehouse system |
spark | 3.4.2 | A multi-language engine for executing data engineering, data science, and machine learningng |
First, build the Docker image. Downloading open-source packages directly on Ubuntu may take some time depending on the network speed.
# Build a Docker image using a Makefile.
$ make build
Note: If the download fails from the CDN server (Kakao mirror server), you can run
make build
again to resume the download from the point of failure.
Then access the lakehouse services.
- HDFS Namenode Web UI: http://localhost:9870
- HDFS Datanode1 Web UI: http://localhost:9864
- HDFS Datanode2 Web UI: http://localhost:9865