R is a powerful language for data science, but on its own it cannot cope with very large data sets. sparklyr bridges this gap by connecting R to the Hadoop ecosystem through Spark, using the tidy grammar of dplyr.
Agenda
- Types of big data
- Introduction to Hadoop
- Hadoop Ecosystem
- Introduction to Spark (RDD)
- Spark overview
- Integration of Spark with R via sparklyr
- Architecture
- Demo
- Downsides of non-Spark-native languages
- Streaming and R?
The slides are available at https://docs.google.com/presentation/d/1NHG7-WoEUsjrdxFjy01OmZjxWB-FZomhfrxO-QapzKg/edit?usp=sharing and as a PDF in the repository.
Here is the sample code for the lab: