Download the HanLP data package and unzip it into the directory src/main/resources.
Given an input file, the Map phase splits it by line; for each line (i.e., sentence), it tokenizes the sentence with HanLP's CRFLexicalAnalyzer and emits the non-stopword tokens. The Reduce phase sums the occurrences of each non-stopword. Finally, a word cloud is generated from the counts using Kumo.
hadoop jar <jar_path_of_the_project> WordCount <input_text_file_path> <result_dir_path>
* The input file should contain one sentence per line.
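The map-side filtering and reduce-side summation described above can be sketched as the following self-contained Java example. Note the assumptions: tokenization here is plain whitespace splitting as a stand-in for HanLP's CRFLexicalAnalyzer, and STOPWORDS is a tiny hypothetical stopword set (the real project would load a stopword dictionary and run inside Hadoop's Mapper/Reducer classes).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WordCountSketch {
    // Hypothetical stopword set; the real project would load a
    // stopword dictionary instead of hard-coding a few words.
    static final Set<String> STOPWORDS = Set.of("the", "a", "of");

    // Stand-in for CRFLexicalAnalyzer: split on whitespace. The real
    // mapper would call HanLP's analyzer to segment each sentence.
    static List<String> tokenize(String sentence) {
        return Arrays.asList(sentence.trim().split("\\s+"));
    }

    // Map phase (per line): emit each non-stopword token with count 1.
    // Reduce phase: sum the counts per token. Both phases are folded
    // into one in-memory pass here for illustration.
    static Map<String, Integer> countNonStopwords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : tokenize(line)) {
                if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // One sentence per line, matching the expected input format.
        List<String> lines = List.of("the cat sat", "a cat ran");
        Map<String, Integer> counts = countNonStopwords(lines);
        System.out.println(counts.get("cat"));         // 2
        System.out.println(counts.containsKey("the")); // false
    }
}
```

The resulting word-to-count map is exactly the shape that Kumo consumes when rendering the word cloud.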