This is a project for the class "CS235 Data Mining Techniques".
You are thinking about starting research in data mining and related fields. A friend who already works in these fields has told you that conferences usually take place in nice, exotic locations. Although this should not affect your decision, knowing that you will get to travel to an exotic location to present your work is always some extra motivation. In this assignment you will empirically verify your friend's statement by mining WikiCFP (http://www.wikicfp.com/cfp/), a website that hosts calls for papers for a wide variety of conferences across many fields. You will have to 1) crawl the data, 2) clean the data, and 3) use Hadoop to compute various statistics of the data.
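As a rough illustration of steps 2 and 3, the sketch below cleans location strings and counts conferences per location using a map phase and a reduce phase written in plain Python. It does not run on Hadoop. It only mimics the shape of such a job so the logic can be checked locally. The record format (conference name, a tab, then location), the function names, and the sample rows are all assumptions made up for this example and are not taken from WikiCFP.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def clean_location(raw: str) -> str:
    """Normalize a raw location: trim, collapse whitespace, lowercase."""
    return " ".join(raw.split()).lower()


def map_records(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (location, 1) for every well-formed record.

    Assumed (hypothetical) input format: "<conference name>\t<location>".
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip malformed rows
        location = clean_location(fields[1])
        if location and location != "n/a":
            yield location, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce phase: sum the emitted counts for each location."""
    totals: Counter = Counter()
    for location, count in pairs:
        totals[location] += count
    return totals


# Made-up sample rows, for illustration only.
sample = [
    "ConfA 2015\tAtlantic City,  NJ\n",
    "ConfB 2015\tSydney, Australia\n",
    "ConfC 2015\tN/A\n",
    "ConfD 2016\tsydney,   australia\n",
    "broken row without a tab\n",
]

counts = reduce_counts(map_records(sample))
for location, total in counts.most_common():
    print(f"{location}\t{total}")
```

In an actual Hadoop job, the map and reduce functions would typically live in separate programs that read from standard input and write tab-separated key/value pairs to standard output, with the framework handling the sort and grouping in between.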