carloapp2/Spark-POT

Introduction to Apache Spark

IBM Proof of Technology

Spark is a fast, general-purpose cluster computing system for Big Data. It provides high-level APIs in Scala, Java, and Python, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing. For more information, see http://spark.apache.org/

Hands on Labs

IBM created basic labs to help people learn Apache Spark.

To perform these labs, sign up for IBM's free Data Science platform at http://datascience.ibm.com/

Labs:
Lab 1 - Intro - Use Spark Context, Create basic RDDs
Lab 2 - SQL - Use SQL Context, Write SQL to perform basic transformations
Lab 3 - Machine Learning - Use Spark ML, Create a Machine Learning Model
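
To give a feel for what Lab 1 covers, here is a minimal PySpark sketch of creating a basic RDD from a Spark Context and applying transformations to it. It is not taken from the lab notebooks: the function names `square`, `is_even`, `even_squares`, and `run_locally`, and the application name `lab1-sketch`, are hypothetical and chosen only for illustration. The sketch assumes the `pyspark` package is available. Hosted notebook environments often provide a ready-made Spark Context (commonly named `sc`), in which case it could be passed to `even_squares` directly.

```python
# Illustrative sketch of the Lab 1 ideas (Spark Context + basic RDDs).
# All names below are hypothetical, not the lab's actual code.


def square(x):
    """Transformation applied to every element of the RDD."""
    return x * x


def is_even(x):
    """Predicate used to filter the RDD."""
    return x % 2 == 0


def even_squares(sc, n):
    """Return the even squares of 1..n using an RDD pipeline.

    `sc` is expected to be a pyspark.SparkContext.
    """
    numbers = sc.parallelize(range(1, n + 1))  # create a basic RDD
    # map and filter are lazy transformations; collect() is the action
    # that triggers the computation and returns a Python list.
    return numbers.map(square).filter(is_even).collect()


def run_locally(n=10):
    """Create a local Spark Context, run the pipeline, and shut it down.

    Assumes pyspark (and a Java runtime) is installed on this machine.
    """
    from pyspark import SparkContext  # imported here so the helpers above stay Spark-free

    sc = SparkContext("local[*]", "lab1-sketch")
    try:
        return even_squares(sc, n)
    finally:
        sc.stop()  # always release the context
```

Calling `run_locally()` on a machine with Spark installed runs the pipeline on a local context. Inside a notebook that already has a Spark Context, `even_squares(sc, 10)` would be the more natural entry point.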
