Tweet Collector
Current stage: collects tweets about a given topic and stores them in a MongoDB database.
Next step: analyze the collected tweets (the preprocessing ideas below are geared toward sentiment analysis).
Run by entering "node testnodetwitter.js" on the command line.
Preprocessing ideas (some taken from http://www.cs.columbia.edu/~julia/papers/Agarwaletal11.pdf):
- Remove words beginning with "@" (mentions) and URLs, and strip the "#" from hashtags
- Use an emoticon dictionary to map emoticons to levels of sentiment (http://en.wikipedia.org/wiki/List_of_emoticons)
- Use an abbreviation dictionary to expand slang such as "lol" and "gr8" into their written-out forms (http://noslang.com)
- Filter out stop words, i.e. common words that search engines typically ignore (http://www.webconfs.com/stop-words.php)
- Shorten any run of a repeated character to three characters, e.g. "coooooool" becomes "coool", which standardizes the word while keeping the emphasis
- Attach negation words to the word that follows them, e.g. in "isn't good", "good" is replaced by "NOT_good"
- run the
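The first preprocessing idea (dropping mentions and URLs, keeping hashtag words) could look roughly like this. It is a minimal sketch, not what testnodetwitter.js currently does; the function name stripMentionsUrlsHashtags is hypothetical, and it assumes URLs start with http:// or https://.

```javascript
// Hypothetical helper: remove @mentions and URLs, keep hashtag words without "#".
function stripMentionsUrlsHashtags(text) {
  return text
    .replace(/(^|\s)@\w+/g, " ")     // @mentions (an "@" inside an email address is left alone)
    .replace(/https?:\/\/\S+/g, " ") // http(s) URLs, removed before hashtags so URL fragments don't count
    .replace(/#(\w+)/g, "$1")        // "#nodejs" -> "nodejs"
    .replace(/\s+/g, " ")            // collapse the leftover whitespace
    .trim();
}

console.log(stripMentionsUrlsHashtags("@bob loving #nodejs http://t.co/abc123 today"));
// prints "loving nodejs today"
```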
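The emoticon and abbreviation dictionaries from the list amount to token-by-token lookups. This sketch assumes whitespace tokenization and uses tiny, made-up dictionaries; the labels EMO_POS, EMO_VERY_POS and EMO_NEG are hypothetical placeholders for "levels of sentiment", and the real tables would be loaded from the emoticon list and noslang.com.

```javascript
// Hypothetical sample dictionaries; the full lists would replace these.
const EMOTICONS = new Map([[":)", "EMO_POS"], [":D", "EMO_VERY_POS"], [":(", "EMO_NEG"]]);
const ABBREVIATIONS = new Map([["lol", "laughing out loud"], ["gr8", "great"]]);

// Replace each token with its emoticon label or expanded abbreviation, if any.
function replaceEmoticonsAndSlang(text) {
  return text
    .split(/\s+/)
    .filter(Boolean)
    .map((tok) => EMOTICONS.get(tok) ?? ABBREVIATIONS.get(tok.toLowerCase()) ?? tok)
    .join(" ");
}

console.log(replaceEmoticonsAndSlang("that was gr8 :) LOL"));
// prints "that was great EMO_POS laughing out loud"
```

A Map is used instead of a plain object so that tokens such as "constructor" can't accidentally hit inherited properties.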
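Stop-word filtering is a set-membership check per token. A minimal sketch, assuming whitespace tokenization and a short hypothetical stop list standing in for the webconfs.com list:

```javascript
// Hypothetical short stop list; the full webconfs.com list would replace it.
const STOP_WORDS = new Set(["a", "an", "the", "is", "at", "of", "on", "and", "to"]);

// Drop every token whose lowercase form is a stop word.
function removeStopWords(text) {
  return text
    .split(/\s+/)
    .filter((tok) => tok && !STOP_WORDS.has(tok.toLowerCase()))
    .join(" ");
}

console.log(removeStopWords("The game is on at the stadium"));
// prints "game stadium"
```

Negation words such as "not" and "isn't" appear in many stop lists; they should be kept out of this set (or negation marking should run first), otherwise the "NOT_" step has nothing to attach to.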
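Shortening repeated characters fits in one regular expression with a backreference. In this sketch (squeezeRepeats is a hypothetical name), any run of four or more identical characters becomes exactly three, and runs of three or fewer are left unchanged:

```javascript
// (.)\1{3,} matches one character followed by 3+ more copies of it (a run of 4+).
// The "u" flag makes "." match whole code points, so emoji runs are handled too.
function squeezeRepeats(text) {
  return text.replace(/(.)\1{3,}/gu, "$1$1$1");
}

console.log(squeezeRepeats("coooooool!!!!"));
// prints "coool!!!"
```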
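The negation idea can be sketched as prefixing the token right after a negation word with "NOT_". This assumes whitespace tokenization, keeps the negation word itself, and uses a hypothetical negation list that only matches straight apostrophes (so "isn’t" with a curly apostrophe would need normalizing first):

```javascript
// Hypothetical negation list; extend as needed.
const NEGATIONS = new Set(["not", "no", "never", "isn't", "don't", "can't", "won't"]);

// Prefix "NOT_" to any token that directly follows a negation word.
function markNegation(text) {
  const tokens = text.split(/\s+/).filter(Boolean);
  return tokens
    .map((tok, i) => (i > 0 && NEGATIONS.has(tokens[i - 1].toLowerCase()) ? `NOT_${tok}` : tok))
    .join(" ");
}

console.log(markNegation("the food isn't good"));
// prints "the food isn't NOT_good"
```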