A reference implementation in the weather domain that demonstrates learning and applying functional programming to parallel computing, using Apache Spark and Scala.
- For a set of predetermined locations, generate weather training data (Condition, Temperature, Pressure, Humidity) using a Markov chain model.
- Convert the training data to LIBSVM format and feed it to Spark MLlib's Random Forest.
- Build Random Forest models from the training data for Temperature, Pressure, Humidity and Condition.
- For a new input (time and location), use the generated models to predict the Condition, Temperature, Pressure and Humidity.
- Apache Spark MLlib - RandomForest
- Scala 2.11.8
- Inspired by Python's NumPy and SciPy for data science, explored a similar library in Scala:
  - probability-monad - Markov chain and probability
- sbt - Build tool
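The Markov chain step in the list above can be illustrated with a small, hand-rolled sketch in plain Scala. It does not use probability-monad's API, and the three condition states and their transition probabilities are hypothetical placeholders, not values from this project. The defining property is that each day's condition is sampled from a distribution that depends only on the previous day's condition.

```scala
import scala.util.Random

// Minimal first-order Markov chain for weather conditions.
// States and transition probabilities are HYPOTHETICAL placeholders.
object ConditionChain {
  sealed trait Condition extends Product with Serializable
  case object Sunny extends Condition
  case object Cloudy extends Condition
  case object Rain extends Condition

  // Transition table P(next | current); each row sums to 1.0.
  val transitions: Map[Condition, Seq[(Condition, Double)]] = Map(
    Sunny  -> Seq(Sunny -> 0.7, Cloudy -> 0.2, Rain -> 0.1),
    Cloudy -> Seq(Sunny -> 0.3, Cloudy -> 0.4, Rain -> 0.3),
    Rain   -> Seq(Sunny -> 0.2, Cloudy -> 0.4, Rain -> 0.4)
  )

  // Sample the next state by walking the cumulative distribution of the current row.
  def next(current: Condition, rng: Random): Condition = {
    val u = rng.nextDouble()
    val row = transitions(current)
    val cumulative = row.scanLeft(0.0)(_ + _._2).tail
    row.zip(cumulative)
      .find { case (_, c) => u < c }
      .map(_._1._1)
      .getOrElse(row.last._1) // guards against floating-point round-off
  }

  // Generate n consecutive conditions, starting with (and including) `start`.
  def simulate(start: Condition, n: Int, rng: Random): List[Condition] =
    Iterator.iterate(start)(c => next(c, rng)).take(n).toList
}
```

For example, `ConditionChain.simulate(ConditionChain.Sunny, 30, new scala.util.Random(42))` yields a 30-day condition sequence that is reproducible for a fixed seed. Temperature, pressure and humidity could then be sampled conditionally on each day's state, though how this project does that is not described here.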
To generate the training data:
sbt "run-main com.mak.weather.model.TrainingDataGenerator"
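Each training line in LIBSVM format is `<label> <index>:<value> ...`, with one-based feature indices in ascending order; Spark MLlib's `MLUtils.loadLibSVMFile` reads this format and shifts the indices to zero-based. The sketch below shows how one record might be serialized. The `WeatherRecord` type, the `LibSvm` helper and the choice of features (latitude, longitude, elevation, month, hour) are hypothetical and not taken from `TrainingDataGenerator`.

```scala
// HYPOTHETICAL record type; the real generator's fields may differ.
case class WeatherRecord(
    latitude: Double,
    longitude: Double,
    elevation: Double,
    month: Int,
    hour: Int,
    temperature: Double)

object LibSvm {
  // Features in a fixed order; the same order must be used at prediction time.
  def features(r: WeatherRecord): Seq[Double] =
    Seq(r.latitude, r.longitude, r.elevation, r.month.toDouble, r.hour.toDouble)

  // Render "<label> 1:<f1> 2:<f2> ..." with one-based, ascending indices.
  // Zero-valued features are omitted because LIBSVM is a sparse format.
  def toLine(label: Double, feats: Seq[Double]): String = {
    val pairs = feats.zipWithIndex.collect {
      case (v, i) if v != 0.0 => s"${i + 1}:$v"
    }
    (label.toString +: pairs).mkString(" ")
  }

  // One line for the temperature target (temperature used as the regression label).
  def temperatureLine(r: WeatherRecord): String = toLine(r.temperature, features(r))
}
```

Under this scheme, each predicted quantity would get its own file with the same features and a different label column.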
To generate simulated weather data for 10 random positions:
sbt "run-main com.mak.weather.station.Simulator"
To predict the weather for a known place, say Bangalore (latitude 12.97, longitude 77.59, elevation 12):
Command: sbt "run-main com.mak.weather.station.Simulator [latitude longitude elevation time]"
- Example:
sbt "run-main com.mak.weather.station.Simulator 12.97 77.59 12 2016-11-04T14:12:43"
- To run the test cases:
sbt test
- To generate Eclipse project files:
sbt eclipse
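For the prediction command, the four arguments (`latitude longitude elevation time`, e.g. `12.97 77.59 12 2016-11-04T14:12:43`) could be turned into a feature vector along these lines. This is a hedged sketch: the `PredictionInput` type, the `PredictionArgs.parse` helper and the derived month/hour features are hypothetical and would need to match whatever features the models were actually trained on. It uses `java.time.LocalDateTime`, which parses the ISO-8601 timestamp format shown in the example.

```scala
import java.time.LocalDateTime
import scala.util.Try

// HYPOTHETICAL input type for the prediction command's arguments.
case class PredictionInput(
    latitude: Double,
    longitude: Double,
    elevation: Double,
    time: LocalDateTime) {

  // Feature order must match the training data (assumed here:
  // latitude, longitude, elevation, month, hour).
  def features: Array[Double] =
    Array(latitude, longitude, elevation, time.getMonthValue.toDouble, time.getHour.toDouble)
}

object PredictionArgs {
  // Expects: latitude longitude elevation time, e.g.
  // 12.97 77.59 12 2016-11-04T14:12:43 (ISO-8601 local date-time).
  def parse(args: Array[String]): Either[String, PredictionInput] = args match {
    case Array(lat, lon, elev, time) =>
      // Any number or timestamp parse failure becomes a Left.
      Try(PredictionInput(lat.toDouble, lon.toDouble, elev.toDouble, LocalDateTime.parse(time)))
        .toOption
        .toRight(s"Could not parse arguments: ${args.mkString(" ")}")
    case _ =>
      Left("Expected 4 arguments: latitude longitude elevation time")
  }
}
```

With Spark MLlib's RDD-based API, the resulting array would typically be wrapped with `Vectors.dense(...)` and passed to the `predict` method of a trained `RandomForestModel`.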