Word Count Project Using MapReduce

Prerequisite

Download the HanLP data and unzip it into the directory src/main/resources.

Workflow

Given an input file, the job runs in three stages:

* Map: the file is split by line. Each line (i.e., sentence) is tokenized with CRFLexicalAnalyzer from HanLP, and its non-stopwords are counted.
* Reduce: the counts of each non-stopword are summed.
* Word cloud: a word cloud is generated from the summed counts using Kumo.
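The data flow of the Map and Reduce stages can be sketched in plain Java without a Hadoop cluster. This is only an illustration under stated assumptions, not the project's actual code: the class name WordCountFlowSketch, the whitespace tokenizer (a stand-in for CRFLexicalAnalyzer), and the tiny English stopword set are all hypothetical, and the word-cloud step is left out.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class WordCountFlowSketch {
    // Hypothetical stopword list, used here only for illustration.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a", "of"));

    // Stand-in tokenizer (assumption): whitespace splitting replaces CRFLexicalAnalyzer.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.trim().toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Map step: one input line in, one (word, 1) pair out per non-stopword token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : tokenize(line)) {
            if (!STOPWORDS.contains(token)) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum the counts emitted for each word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs map over every line, then reduces all emitted pairs together.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the lotus blooms", "a whale sings of the lotus");
        // prints {blooms=1, lotus=2, sings=1, whale=1}
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job the shuffle phase groups the pairs by key before the reducer sees them; the sketch folds that grouping into a single in-memory map for brevity.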

Usage

hadoop jar jar_path_of_the_project WordCount file_path(text)/to/process result_path(dir)/to/store

* The input file should contain exactly one sentence per line.
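If the source text is not already in that shape, a small preprocessing step can produce it. The following is a naive sketch and not part of the project: the class SentencePerLineSketch and its method are hypothetical, and it simply breaks the text after the ideographic full stop and after fullwidth or ASCII exclamation and question marks. The ASCII period is deliberately not treated as a terminator so that decimals and abbreviations are not split.

```java
import java.util.ArrayList;
import java.util.List;

class SentencePerLineSketch {
    // Terminators: U+3002 ideographic full stop, U+FF01 fullwidth '!',
    // U+FF1F fullwidth '?', plus ASCII '!' and '?'.
    // The lookbehind splits after the terminator, so punctuation stays with its sentence.
    static final String SPLIT_AFTER_TERMINATOR = "(?<=[\\u3002\\uFF01\\uFF1F!?])";

    // Returns the sentences of the text, trimmed, with empty pieces dropped.
    static List<String> toSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.split(SPLIT_AFTER_TERMINATOR)) {
            String sentence = part.trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "The lotus blooms! Does the whale sing? Yes";
        // Writes one sentence per line, the layout the job expects.
        System.out.println(String.join("\n", toSentences(text)));
    }
}
```

The joined output can be written to a file and passed as the input path in the usage command.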

Example

wordcloud of lotus

wordcloud of whale

Repository: ssjjcao/EasyWordCount
