Scale your data management by distributing workload and storage across Hadoop and Spark clusters, and explore and transform your data in Jupyter Notebook.
The purpose of this tutorial is to show how to get started with Hadoop, Spark and Jupyter for your Big Data solution, deployed as Docker containers.
- Only confirmed working on Linux/Windows (Apple Silicon might have issues).
- Ensure Docker is installed.
Execute

```sh
bash master-build.sh
```

to build and start the containers.
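The actual contents of `master-build.sh` ship with the repo; as a rough orientation, a minimal sketch of what a build-and-launch script like this typically does is shown below, assuming a `docker-compose.yml` at the repo root (the real script may differ):

```sh
#!/usr/bin/env bash
# Hypothetical sketch, NOT the actual master-build.sh.
# Assumes a docker-compose.yml at the repo root.
set -euo pipefail

# Build all images defined in the compose file,
# then start the containers in detached mode.
docker compose build
docker compose up -d

# List the running services so you can confirm everything is up.
docker compose ps
```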
- Access the Hadoop UI at http://localhost:9870
- Access the Spark Master UI at http://localhost:8080
- Access the Jupyter UI at http://localhost:8888
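If one of the UIs does not come up, a quick smoke test from the host can confirm the ports are actually being served (ports taken from the list above; any 2xx/3xx status means the container is answering):

```sh
# Probe each published UI port and print the HTTP status code.
# -s silences progress, -o /dev/null discards the body,
# -w prints just the status code.
for port in 9870 8080 8888; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:${port}")
  echo "localhost:${port} -> HTTP ${code}"
done
```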
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/featureName`)
- Commit your Changes (`git commit -m 'Add some featureName'`)
- Push to the Branch (`git push origin feature/featureName`)
- Open a Pull Request
LinkedIn : martin-karlsson
Twitter : @HelloKarlsson
Email : hello@martinkarlsson.io
Webpage : www.martinkarlsson.io
Project Link: github.com/martinkarlssonio/big-data-solution