big-data-infrastructure

How to run Big Data Infrastructure

1. Set up the Python environment with pipenv and export its dependencies to requirements.txt.
2. Run sudo docker compose up -d to start the Docker containers in the background.
3. Run mkdir -p hadoop/datanode hadoop/namenode to create the directories where Hadoop stores its NameNode metadata (checkpoints) and DataNode data.
4. Run sudo make makeHadoop to set up the Hadoop files inside the container. Run this on first setup. To format the NameNode later, delete the hadoop folder created in step 3 and redo steps 3 and 4.
5. Run sudo make makeCassandra to create the Cassandra tables. To reset the database, run the same command again: it drops any existing tables before creating them.
6. From ~/.big-data-infrastructure, run bash project-orchestrate.sh to start the pipeline.
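As a rough illustration, the setup steps above can be scripted in Python, since the project already uses pipenv. This is a sketch, not code from the repository. The names build_setup_commands and run_setup are hypothetical, the pipenv step is assumed to be done already, and first_run is an assumed flag that groups the one-time steps. makeCassandra is treated as first-run only because it drops the existing tables. By default the script only prints the commands (dry_run=True).

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path

# Commands copied from the README steps; the pipenv step is assumed done.
START_CONTAINERS = ["sudo", "docker", "compose", "up", "-d"]
# Hypothetical grouping: one-time steps. makeCassandra drops existing tables,
# so it is only included on the first run.
FIRST_RUN_STEPS = [
    ["mkdir", "-p", "hadoop/datanode", "hadoop/namenode"],
    ["sudo", "make", "makeHadoop"],
    ["sudo", "make", "makeCassandra"],
]
RUN_PIPELINE = ["bash", "project-orchestrate.sh"]


def build_setup_commands(first_run: bool = True) -> list[list[str]]:
    """Return the setup commands in README order (hypothetical helper)."""
    commands = [START_CONTAINERS]
    if first_run:
        commands.extend(FIRST_RUN_STEPS)
    commands.append(RUN_PIPELINE)
    return commands


def run_setup(project_root: Path, first_run: bool = True, dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute each command from project_root, in order."""
    executed = []
    for cmd in build_setup_commands(first_run):
        line = shlex.join(cmd)
        executed.append(line)
        if dry_run:
            print(f"[dry-run] {line}")
        else:
            # check=True raises CalledProcessError and stops at the first failing step.
            subprocess.run(cmd, cwd=project_root, check=True)
    return executed


if __name__ == "__main__":
    run_setup(Path.cwd(), first_run=True, dry_run=True)
```

To actually run the commands, pass dry_run=False and the path of ~/.big-data-infrastructure as project_root. On later runs, first_run=False keeps the Hadoop directories and the Cassandra tables intact.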

To stop the pipeline, run sudo docker container stop stream_job.
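A similar hedged sketch covers teardown. stop_pipeline wraps the stop command above. reset_hadoop_dirs is a hypothetical helper that follows the setup note on formatting the NameNode: it deletes the hadoop folder and recreates the empty namenode and datanode directories, as the mkdir step does, before makeHadoop is run again. Both names are assumptions, not code from the repository.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

# Container name taken from the README's stop command.
STREAM_JOB = "stream_job"


def stop_pipeline(dry_run: bool = True) -> list[str]:
    """Stop the streaming job container (hypothetical wrapper)."""
    cmd = ["sudo", "docker", "container", "stop", STREAM_JOB]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


def reset_hadoop_dirs(project_root: Path) -> None:
    """Delete the hadoop folder and recreate empty namenode/datanode dirs.

    Hypothetical helper. Files written by the containers may be owned by
    root, in which case this needs elevated permissions.
    """
    hadoop_dir = project_root / "hadoop"
    if hadoop_dir.exists():
        shutil.rmtree(hadoop_dir)
    for name in ("namenode", "datanode"):
        (hadoop_dir / name).mkdir(parents=True)
```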
