Big data project analyzing a weather dataset.
The dataset is a collection of daily weather measurements (temperature, wind speed, humidity, pressure, etc.) from 20,000+ weather stations around the world. Its attributes include mean temperature, mean dew point, mean sea level pressure, mean station pressure, mean visibility, mean wind speed, maximum sustained wind speed, maximum temperature, minimum temperature, precipitation amount, and snow depth. The data has been collected by NCDC over the last 85 years and is available over FTP at ftp://ftp.ncdc.noaa.gov/pub/data/gsod/.
- Minimum and Maximum Temperatures using Hive and Java.
- K-means Clustering using iterative MapReduce in Java.
- Weather prediction by training models using Weka.
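
The K-means item above can be illustrated with a minimal sketch. This is not the project's code and it does not use the Hadoop API: it is plain Java that mimics the shape of an iterative map/reduce job, assuming each record has already been parsed into a numeric vector. The class name `KMeansSketch`, the method names, and the sample values are all hypothetical. The "map" step assigns each point to its nearest centroid, the "reduce" step averages the points in each cluster, and a driver loop repeats both until the centroids stop moving.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: in-memory stand-in for an iterative map/reduce K-means job.
public class KMeansSketch {

    // "Map" step: group every point under the index of its nearest centroid
    // (squared Euclidean distance).
    static Map<Integer, List<double[]>> assign(List<double[]> points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < centroids.length; c++) {
                double d = 0;
                for (int i = 0; i < p.length; i++) {
                    double diff = p[i] - centroids[c][i];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            groups.computeIfAbsent(best, k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    // "Reduce" step: each new centroid is the mean of the points assigned to it.
    // A cluster that received no points keeps its previous centroid.
    static double[][] recompute(Map<Integer, List<double[]>> groups, double[][] old) {
        double[][] next = new double[old.length][];
        for (int c = 0; c < old.length; c++) {
            List<double[]> members = groups.get(c);
            if (members == null) {
                next[c] = old[c].clone();
                continue;
            }
            double[] mean = new double[old[c].length];
            for (double[] p : members) {
                for (int i = 0; i < mean.length; i++) {
                    mean[i] += p[i];
                }
            }
            for (int i = 0; i < mean.length; i++) {
                mean[i] /= members.size();
            }
            next[c] = mean;
        }
        return next;
    }

    // Driver: run map + reduce repeatedly until no centroid coordinate moves
    // by epsilon or more, or until maxIterations passes have been made.
    static double[][] cluster(List<double[]> points, double[][] initial,
                              int maxIterations, double epsilon) {
        double[][] centroids = initial;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] next = recompute(assign(points, centroids), centroids);
            double shift = 0;
            for (int c = 0; c < next.length; c++) {
                for (int i = 0; i < next[c].length; i++) {
                    shift = Math.max(shift, Math.abs(next[c][i] - centroids[c][i]));
                }
            }
            centroids = next;
            if (shift < epsilon) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        // Made-up (mean temperature, mean wind speed) pairs, not real station data.
        List<double[]> points = Arrays.asList(
            new double[]{30.0, 5.0}, new double[]{32.0, 7.0},
            new double[]{80.0, 10.0}, new double[]{82.0, 12.0});
        double[][] initial = {{30.0, 5.0}, {80.0, 10.0}};
        System.out.println(Arrays.deepToString(cluster(points, initial, 20, 1e-6)));
    }
}
```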
========================