[Image: Hadoop elephant logo]

Hadoop Tutorial

This tutorial is for people who are interested in learning about big data projects and want to build a Hadoop Distributed File System (HDFS) on a Mac or Windows system.

The project includes an overview and a four-part tutorial:

Overview

The overview gives a brief introduction to the project. It describes the software (Docker) and the technologies (Hadoop, MapReduce) used.

https://github.com/Hadoop-bigdata/Hadoop/blob/master/Overview.md

First part: Installation

https://github.com/Hadoop-bigdata/Hadoop/blob/master/Hadoop-Installation.md

1. Install Docker

2. Pull the Hadoop image in Docker

Second part: Setting Up the Environment

https://github.com/Hadoop-bigdata/Hadoop/blob/master/Hadoop-Environment.md

3. Update the image

4. Create a new Hadoop image

Third part: Total Purchases

https://github.com/Hadoop-bigdata/Hadoop/blob/master/Hadoop-MapReduce.md

5. Download the purchases dataset

6. Edit the Map and Reduce functions

7. Run the MapReduce job in Hadoop
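
As a rough illustration of steps 6 and 7, the sketch below sums purchase amounts per store in the MapReduce style. It is not the code from the linked tutorial page. The record layout (six tab-separated fields: date, time, store, item, cost, payment) and the function names `mapper` and `reducer` are assumptions made for this example; adjust the field positions to match the actual dataset.

```python
from itertools import groupby


def mapper(lines):
    """Emit (store, cost) pairs from tab-separated purchase records.

    Assumed layout (hypothetical): date, time, store, item, cost, payment.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip malformed rows
        store, cost = fields[2], fields[4]
        try:
            yield store, float(cost)
        except ValueError:
            continue  # skip rows whose cost is not a number


def reducer(pairs):
    """Sum the costs for each store.

    Expects pairs sorted by key, which is what Hadoop's shuffle phase
    provides between the map and reduce steps.
    """
    for store, group in groupby(pairs, key=lambda kv: kv[0]):
        yield store, sum(cost for _, cost in group)


if __name__ == "__main__":
    # Made-up sample rows, used only to show the data flow.
    sample = [
        "2012-01-01\t09:00\tSan Jose\tMusic\t10.50\tVisa",
        "2012-01-01\t09:01\tReno\tToys\t5.00\tCash",
        "2012-01-01\t09:02\tSan Jose\tBooks\t4.50\tAmex",
    ]
    # sorted() stands in for Hadoop's shuffle-and-sort step.
    for store, total in reducer(sorted(mapper(sample))):
        print(f"{store}\t{total:.2f}")
```

Running the file locally applies the same map, sort, reduce sequence that Hadoop would distribute across containers, which makes it a convenient way to check the logic before submitting a job.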

Fourth part: Natural Language Processing on Amazon Reviews

https://github.com/Hadoop-bigdata/Hadoop/blob/master/Hadoop-NLP.md

8. Download the Amazon review dataset

9. Edit the Map and Reduce functions

10. Run the MapReduce job in Hadoop based on VADER sentiment

11. Run the MapReduce job in Hadoop based on bag of words

Workflow of the MapReduce job for bag of words

[Figure: screenshot of the bag-of-words MapReduce workflow, 2017-12-20]
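
Since the workflow image is not reproduced here, the following is a generic sketch of how a bag-of-words count can be expressed as map and reduce steps. It may differ from the tutorial's actual workflow. The tokenization rule and the names `map_words`, `reduce_counts`, and `bag_of_words` are assumptions made for this example.

```python
import re
from itertools import groupby

# Assumed tokenization: lowercase runs of letters and apostrophes.
TOKEN = re.compile(r"[a-z']+")


def map_words(review):
    """Map step: emit (word, 1) for every token in one review."""
    for word in TOKEN.findall(review.lower()):
        yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each word.

    sorted() stands in for Hadoop's shuffle-and-sort between phases.
    """
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(n for _, n in group)


def bag_of_words(reviews):
    """Run the map and reduce steps over an iterable of review strings."""
    pairs = (pair for review in reviews for pair in map_words(review))
    return dict(reduce_counts(pairs))


if __name__ == "__main__":
    # Made-up reviews, used only to show the data flow.
    print(bag_of_words(["Great product, great price", "Not great"]))
```

The resulting word counts can then serve as features for a downstream sentiment step; how the tutorial uses them is described on the linked NLP page.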

Workflow of the MapReduce job for VADER sentiment

[Figure: screenshot of the VADER sentiment MapReduce workflow, 2017-12-20]

Appendix: Docker Operation Commands

The appendix lists basic Docker commands that help connect the different containers.

https://github.com/Hadoop-bigdata/Hadoop/blob/master/Docker-Operation.md

Picture Reference

https://user-images.githubusercontent.com/26347639/34073834-0167b5fe-e271-11e7-8974-0f4850969a7b.png