Integrating R into the big data ecosystem using sparklyr

R is a powerful language for data science, but on its own it cannot cope with large volumes of data. sparklyr bridges this gap by connecting R to the Hadoop ecosystem through Spark, using the tidy grammar of dplyr.
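A minimal sketch of what this bridge can look like in practice, assuming a local Spark installation (for example one set up with `sparklyr::spark_install()`). The `local` master, the built-in `mtcars` data set, and the table name `mtcars_spark` are illustrative choices and are not taken from the lab material.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally; a real cluster would use a
# different master (and usually additional configuration).
sc <- spark_connect(master = "local")

# Copy a small built-in data set to Spark for illustration only.
# Real workloads would read data that already lives in the cluster.
cars_tbl <- copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# Ordinary dplyr verbs are translated to Spark SQL and executed in Spark.
mpg_by_cyl <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n = n()) %>%
  arrange(cyl) %>%
  collect()  # bring only the small aggregated result back into R

print(mpg_by_cyl)

spark_disconnect(sc)
```

Only the aggregated rows are pulled into the R session by `collect()`; the grouping and averaging run inside Spark, which is what lets the same dplyr code scale beyond local memory.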

Agenda

Types of big data
Introduction to Hadoop
Hadoop Ecosystem
Introduction to Spark (RDD)
Spark overview
Integration of Spark with R via sparklyr
Architecture
Demo
Downsides of non-Spark-native languages
Streaming and R?

The slides can be found here: https://docs.google.com/presentation/d/1NHG7-WoEUsjrdxFjy01OmZjxWB-FZomhfrxO-QapzKg/edit?usp=sharing. A PDF copy is also included in the repository.

The sample code for the lab is in r_bigData_integration_lab.Rmd.


viennadatasciencegroup/kf-2017-11-09-R-and-spark
