Py DS_Engineer Lab Report #07
Amy Lin edited this page Jul 25, 2017 · 6 revisions
Link to Tableau Visualization --> Check out real-time interactions here!
The article "The Man Trap" from The Economist Unwinds 1843 is used as the data source for text processing. It's about how masculinity still persists in the workplace even though men nowadays are expected to do more chores and work longer hours. In sum, it's about the trials of modern manhood.
I parsed the article's words with the NLTK package in Python, first taking out punctuation ( , ! . etc. ) and stop words. Then I picked out the top 5 most used words and calculated their frequencies.
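The steps above can be sketched roughly as follows. This is not the original lab code: to stay self-contained it swaps NLTK for a regular expression and `collections.Counter` from the standard library, and the `STOP_WORDS` set, the `top_words` helper and the sample sentence are all made up for illustration. With NLTK installed, `nltk.corpus.stopwords.words("english")` could stand in for the short stop-word set.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word set for illustration only.
# NLTK's English stop-word list is much longer.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
    "that", "it", "than", "more", "do", "but",
}

def top_words(text, n=5):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    # Lowercase, then keep only runs of letters and apostrophes,
    # which drops punctuation such as , ! .
    tokens = re.findall(r"[a-z']+", text.lower())
    # Trim stray apostrophes and discard stop words and empty strings.
    words = [t.strip("'") for t in tokens]
    words = [w for w in words if w and w not in STOP_WORDS]
    return Counter(words).most_common(n)

# Made-up sample text, not a quotation from the article.
sample = "Men, men! Women."
print(top_words(sample, n=2))
```

Swapping the sample string for the full article text and leaving `n=5` would give the same kind of top 5 list described above.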
I exported the results to Tableau to visualize them. ( yay! graphs are always better than words. :) )
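One plausible way to hand the counts over to Tableau is a CSV file, which Tableau can open as a text-file data source. The sketch below is an assumption about that step, not the export actually used here: the `write_frequencies` helper, the column names and the counts are all hypothetical.

```python
import csv
import io

def write_frequencies(pairs, fileobj):
    """Write (word, count) pairs as CSV rows under a header line."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["word", "frequency"])  # hypothetical column names
    writer.writerows(pairs)

# Made-up counts for illustration, not the article's actual results.
buffer = io.StringIO()
write_frequencies([("men", 3), ("women", 2)], buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass something like `open("word_freq.csv", "w", newline="")` (file name hypothetical) as `fileobj`.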
Since the article is about men, the winning word is, of course, "men". Second on the list is "women". ( Most likely, women are the comparison group in a household. Thus, it makes sense for it to be the second most used word in this article. )
This mapping technique screened out a lot of words that could probably have been counted toward the frequencies. That's one of the reasons a lot of words ( the small circles going outward ) have a count of only one.
Sentences that contain the top 5 words can be found here.
The full list of words with frequencies can be found here.
