This tutorial is for people who want to learn about big data projects and build a Hadoop Distributed File System (HDFS) on a Mac or Windows system.
The overview gives a brief introduction to the project. It describes the software (Docker) and the technologies (Hadoop, MapReduce) we use.
https://github.com/Hadoop-bigdata/Hadoop/blob/master/Overview.md
https://github.com/Hadoop-bigdata/Hadoop/blob/master/Hadoop-Installation.md
1. Install Docker
2. Pull the Hadoop Image in Docker
https://github.com/Hadoop-bigdata/Hadoop/blob/master/Hadoop-Environment.md
3. Update the Image
4. Create a New Hadoop Image
https://github.com/Hadoop-bigdata/Hadoop/blob/master/Hadoop-MapReduce.md
5. Download the Purchases Dataset
6. Edit the Map and Reduce Functions
7. Running MapReduce in Hadoop
https://github.com/Hadoop-bigdata/Hadoop/blob/master/Hadoop-NLP.md
8. Download the Amazon Review Dataset
9. Edit the Map and Reduce Functions
10. Running MapReduce in Hadoop Based on VADER Sentiment
11. Running MapReduce in Hadoop Based on Bag of Words
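Steps 5 to 7 revolve around a mapper and a reducer for the Purchases dataset. The sketch below is not the project's code; it assumes each record has six tab-separated fields (date, time, store, item, cost, payment) and that the goal is total sales per store. `run_purchases_local` is a hypothetical helper that imitates Hadoop's map, shuffle/sort and reduce phases so the logic can be checked before it is submitted to the cluster.

```python
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'store<TAB>cost' for each well-formed purchase record.

    Assumption: six tab-separated fields: date, time, store, item, cost, payment.
    """
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) != 6:
            continue  # skip malformed records instead of failing the whole job
        store, cost = fields[2], fields[4]
        try:
            float(cost)
        except ValueError:
            continue  # skip records whose cost is not a number
        yield f"{store}\t{cost}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum costs per store; input must be sorted by key, as Hadoop guarantees."""
    current_store = None
    total = 0.0
    for line in sorted_lines:
        store, cost = line.rstrip("\n").split("\t", 1)
        if store != current_store:
            if current_store is not None:
                yield f"{current_store}\t{total:.2f}"
            current_store, total = store, 0.0
        total += float(cost)
    if current_store is not None:
        yield f"{current_store}\t{total:.2f}"


def run_purchases_local(lines: Iterable[str]) -> List[str]:
    """Hypothetical local simulation: map -> sort (shuffle) -> reduce."""
    return list(reducer(sorted(mapper(lines))))
```

With Hadoop Streaming, the two functions would normally live in separate scripts (for example a hypothetical `mapper.py` and `reducer.py`) that read `sys.stdin` and print each yielded line, passed to the streaming jar through its `-mapper` and `-reducer` options. The jar's location depends on the Hadoop image you use.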
Workflow of MapReduce with Bag of Words
Workflow of MapReduce with VADER Sentiment
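The Bag of Words workflow follows the classic word-count pattern: the mapper splits each review into tokens and emits a count of 1 per token, and the reducer sums the counts for each word. This is a generic sketch, not the repository's implementation; it assumes one review's text per input line and uses a simple lowercase regex tokenizer chosen only for illustration. `run_bow_local` is a hypothetical helper that simulates the shuffle/sort step.

```python
import re
from typing import Iterable, Iterator, List

# Illustrative tokenizer (assumption): lowercase letters and apostrophes only.
TOKEN_RE = re.compile(r"[a-z']+")


def bow_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every token in every review line."""
    for line in lines:
        for token in TOKEN_RE.findall(line.lower()):
            yield f"{token}\t1"


def bow_reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of each word; input must be sorted by key."""
    current, count = None, 0
    for line in sorted_lines:
        word, n = line.rstrip("\n").split("\t", 1)
        if word != current:
            if current is not None:
                yield f"{current}\t{count}"
            current, count = word, 0
        count += int(n)
    if current is not None:
        yield f"{current}\t{count}"


def run_bow_local(lines: Iterable[str]) -> List[str]:
    """Hypothetical local simulation of map -> sort -> reduce."""
    return list(bow_reducer(sorted(bow_mapper(lines))))
```

Because the reducer only keeps one running total at a time, it relies on Hadoop delivering all pairs for the same word consecutively, which the local `sorted` call imitates.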
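The VADER workflow replaces word counting with per-review sentiment labels. The sketch below is an assumption about how such a job could look, not the project's code: the mapper scores each review with VADER's `compound` score from the `vaderSentiment` package, maps it to positive/neutral/negative using the ±0.05 thresholds suggested in VADER's documentation, and emits `label<TAB>1`; the reducer counts labels. The scorer is injectable so the logic can be tested without VADER installed; `run_vader_local` is a hypothetical helper.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def default_scorer() -> Callable[[str], float]:
    """Build a VADER compound-score function (requires `pip install vaderSentiment`)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def label(compound: float) -> str:
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def vader_mapper(lines: Iterable[str],
                 score: Optional[Callable[[str], float]] = None) -> Iterator[str]:
    """Emit 'label<TAB>1' for every non-empty review line."""
    score = score or default_scorer()
    for line in lines:
        text = line.strip()
        if text:
            yield f"{label(score(text))}\t1"


def count_reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Count occurrences of each label; input must be sorted by key."""
    current, count = None, 0
    for line in sorted_lines:
        key, n = line.rstrip("\n").split("\t", 1)
        if key != current:
            if current is not None:
                yield f"{current}\t{count}"
            current, count = key, 0
        count += int(n)
    if current is not None:
        yield f"{current}\t{count}"


def run_vader_local(lines: Iterable[str],
                    score: Optional[Callable[[str], float]] = None) -> List[str]:
    """Hypothetical local simulation of map -> sort -> reduce."""
    return list(count_reducer(sorted(vader_mapper(lines, score))))
```

On a real cluster, every node that runs the mapper needs `vaderSentiment` installed, since Hadoop Streaming runs the script with whatever Python packages are available on that node.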
The appendix lists the basic Docker commands that help you connect the different containers.
https://github.com/Hadoop-bigdata/Hadoop/blob/master/Docker-Operation.md
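One common way to let Hadoop containers reach each other is to put them on the same user-defined Docker network, where Docker's built-in DNS resolves containers by name. The sketch below only builds and prints the standard `docker network create`, `docker run --network`, `docker network connect` and `docker exec` commands; the network name, container names and image name are hypothetical placeholders, and the appendix may use a different approach.

```python
import shlex
import subprocess
from typing import List


def network_create(network: str) -> List[str]:
    """Create a user-defined bridge network."""
    return ["docker", "network", "create", network]


def run_on_network(name: str, image: str, network: str) -> List[str]:
    """Start a detached container (with stdin and a TTY kept open) on a network."""
    return ["docker", "run", "-dit", "--name", name, "--network", network, image]


def connect(network: str, container: str) -> List[str]:
    """Attach an already running container to a network."""
    return ["docker", "network", "connect", network, container]


def shell_into(container: str) -> List[str]:
    """Open an interactive bash shell inside a running container."""
    return ["docker", "exec", "-it", container, "bash"]


def execute(cmd: List[str], dry_run: bool = True) -> int:
    """Print the command (dry run) or run it, raising on a non-zero exit code."""
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    # Hypothetical names: replace with your own network, containers and image.
    for cmd in (
        network_create("hadoop-net"),
        run_on_network("master", "your-hadoop-image", "hadoop-net"),
        run_on_network("slave1", "your-hadoop-image", "hadoop-net"),
        shell_into("master"),
    ):
        execute(cmd)
```

Set `dry_run=False` only after checking the printed commands; inside `master`, the other container should then be reachable under the hostname `slave1`.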
Picture Reference
https://user-images.githubusercontent.com/26347639/34073834-0167b5fe-e271-11e7-8974-0f4850969a7b.png