Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, and Python, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing. http://spark.apache.org/
IBM created a set of basic labs to help people learn Apache Spark.
To work through these labs, sign up for IBM's free Data Science platform at http://datascience.ibm.com/
Labs:
Lab 1 - Intro - Use SparkContext, create basic RDDs
Lab 2 - SQL - Use SQLContext, write SQL to perform basic transformations
Lab 3 - Machine Learning - Use Spark ML, create a machine learning model