Data-Analysis-Using-Cloud-Technologies

Questions to be solved in this Assignment

1. Get data from Stack Exchange: acquire the top 200,000 posts by view count (see notes on Data Acquisition).
2. Load them with Pig: using Pig or MapReduce, extract, transform and load the data as applicable.
3. Query them with Hive: using Hive and/or MapReduce, get:
   I. The top 10 posts by score
   II. The top 10 users by post score
   III. The number of distinct users who used the word “Hadoop” in one of their posts
4. Calculate TF-IDF with MapReduce: using MapReduce, calculate the per-user TF-IDF (just submit the top 10 terms for each of the top 10 users from Query 3.II). Note: there are plenty of versions of this code online in both Java and Python; just acknowledge the source and the changes you had to make to it.
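
The ETL step does not specify what the transform involves. The sketch below shows, in Python rather than Pig Latin, the kind of cleaning a Stack Exchange export typically needs before it can be loaded into Hive:

- keep only the columns the queries use;
- strip HTML and newlines from the post body so each post fits on one line;
- drop posts that have no owner.

The column names (`Id`, `Score`, `ViewCount`, `OwnerUserId`, `Title`, `Body`) are an assumption based on the Stack Exchange Data Explorer `Posts` table. The function names are hypothetical.

```python
import csv
import html
import io
import re

# Assumed column names from a Stack Exchange Data Explorer export of Posts;
# adjust to match the real CSV header.
KEEP = ["Id", "Score", "ViewCount", "OwnerUserId", "Title", "Body"]

TAG_RE = re.compile(r"<[^>]+>")  # crude HTML tag stripper
WS_RE = re.compile(r"\s+")       # collapses newlines, tabs and repeated spaces


def clean_text(raw):
    """Remove HTML tags and entities, then flatten whitespace onto one line."""
    text = html.unescape(TAG_RE.sub(" ", raw or ""))
    return WS_RE.sub(" ", text).strip()


def transform(csv_text):
    """Hypothetical ETL step: project columns, clean text, drop ownerless posts."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        if not rec.get("OwnerUserId"):
            continue  # posts by deleted users cannot be attributed to anyone
        row = {k: rec.get(k, "") for k in KEEP}
        row["Score"] = int(row["Score"] or 0)
        row["ViewCount"] = int(row["ViewCount"] or 0)
        row["Title"] = clean_text(row["Title"])
        row["Body"] = clean_text(row["Body"])
        rows.append(row)
    return rows
```

Flattening the body onto a single line matters because Pig and Hive both split records on newlines by default. A multi-line post body would otherwise break into several malformed rows.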
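
The three queries in step 3 reduce to a sort, a grouped sum and a distinct count. The sketch below expresses each one in plain Python over cleaned post records. The comments give a HiveQL query that it roughly corresponds to.

It rests on a few assumptions:

- The table name `posts` and the column names are hypothetical.
- "Used the word Hadoop" is read as a case-insensitive substring match on the title or body, which is one possible interpretation.

```python
from collections import defaultdict


def top_posts_by_score(posts, n=10):
    # ~ SELECT Id, Title, Score FROM posts ORDER BY Score DESC LIMIT 10;
    return sorted(posts, key=lambda p: p["Score"], reverse=True)[:n]


def top_users_by_score(posts, n=10):
    # ~ SELECT OwnerUserId, SUM(Score) AS total FROM posts
    #   GROUP BY OwnerUserId ORDER BY total DESC LIMIT 10;
    totals = defaultdict(int)
    for p in posts:
        totals[p["OwnerUserId"]] += p["Score"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def distinct_hadoop_users(posts):
    # ~ SELECT COUNT(DISTINCT OwnerUserId) FROM posts
    #   WHERE lower(concat(Title, ' ', Body)) LIKE '%hadoop%';
    return len({p["OwnerUserId"] for p in posts
                if "hadoop" in (p["Title"] + " " + p["Body"]).lower()})
```

Ties in score are broken arbitrarily in both versions, so the 10th place may differ between runs if several posts or users share the same score.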
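
For the per-user TF-IDF in step 4, one common approach treats all of a user's posts as a single document. It then chains three MapReduce jobs:

1. Count each term per user.
2. Attach each user's total term count, giving TF = n / N.
3. Count how many users use each term (df) and compute TF-IDF = (n / N) · log(D / df), where D is the number of users.

The sketch below simulates those three jobs in memory with a hypothetical `run_job` helper. It is not the author's submitted code. The tokenizer and the tiny stop-word list are illustrative assumptions.

```python
import math
import re
from collections import defaultdict
from itertools import chain

TOKEN_RE = re.compile(r"[a-z0-9]+")
# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it", "i"}


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job: map each record, shuffle by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return list(chain.from_iterable(reducer(k, vs) for k, vs in sorted(groups.items())))


# Job 1: (user, text) -> ((term, user), n)
def map_count(record):
    user, text = record
    for term in TOKEN_RE.findall(text.lower()):
        if term not in STOP_WORDS:
            yield (term, user), 1


def reduce_count(key, values):
    yield key, sum(values)


# Job 2: ((term, user), n) -> ((term, user), (n, N)), with N = user's total terms
def map_by_user(record):
    (term, user), n = record
    yield user, (term, n)


def reduce_user_total(user, values):
    # values is a list here; a real Hadoop reducer would have to buffer
    # its single-pass iterator to traverse the values twice.
    total = sum(n for _, n in values)
    for term, n in values:
        yield (term, user), (n, total)


# Job 3: group by term; df = number of users using it, D = number of users
def make_tfidf_job(num_users):
    def map_by_term(record):
        (term, user), (n, total) = record
        yield term, (user, n, total)

    def reduce_tfidf(term, values):
        idf = math.log(num_users / len(values))
        for user, n, total in values:
            yield user, (term, (n / total) * idf)

    return map_by_term, reduce_tfidf


def per_user_tfidf(posts, top_k=10):
    """posts: iterable of (user_id, text); returns {user: [(term, score), ...]}."""
    posts = list(posts)
    num_users = len({user for user, _ in posts})
    counts = run_job(posts, map_count, reduce_count)
    with_totals = run_job(counts, map_by_user, reduce_user_total)
    map_by_term, reduce_tfidf = make_tfidf_job(num_users)
    scored = run_job(with_totals, map_by_term, reduce_tfidf)
    by_user = defaultdict(list)
    for user, (term, score) in scored:
        by_user[user].append((term, score))
    return {u: sorted(ts, key=lambda t: (-t[1], t[0]))[:top_k]
            for u, ts in by_user.items()}
```

To submit only the required output, the input can first be filtered to the top 10 users from Query 3.II. However, the IDF part depends on how many users are in the corpus (D and df). Computing it over only those 10 users will give different scores than computing it over all users. Note also that a term used by every user gets an IDF of log(1) = 0.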

Repository contents:
- MapReduce Code
- DataAnalysis_CloudTechnologies.pdf
- README.md
