carloapp2/Spark-POT

Introduction to Apache Spark

IBM Proof of Technology

Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, and Python, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing. http://spark.apache.org/

Hands-on Labs

IBM created these basic labs to help people learn Apache Spark.

To perform these labs, sign up for IBM's free Data Science platform at http://datascience.ibm.com/

Labs:
Lab 1 - Intro - Use the Spark Context and create basic RDDs
Lab 2 - SQL - Use the SQL Context and write SQL to perform basic transformations
Lab 3 - Machine Learning - Use Spark ML to create a machine learning model

Files:
CIO Summit.pdf
IBM Data Science Experience Overview.pdf
Lab 1 - IntroSpark - Student.ipynb
Lab 1 - Solution.ipynb
Lab-1.ipynb
Lab-2.ipynb
Lab-3.ipynb
README.md
Spark POT - Class Overview.pdf
Spark POT - Intro Material.pdf
Spark POT - ML Material.pdf
