Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, and Python, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.
IBM created introductory labs to help people learn Apache Spark.
To complete these labs, access IBM's free Data Science platform at
Lab 1 - Intro - Use SparkContext, create basic RDDs
Lab 2 - SQL - Use SQLContext, write SQL to perform basic transformations
Lab 3 - Machine Learning - Use Spark ML, create a machine learning model