You may have been asking yourself why you should use Datasets rather than the foundation of all Spark: RDDs of case classes.
This document collects the advantages of Dataset over RDD[CaseClass] to answer the question Dan asked on Twitter:
"In #Spark, what is the advantage of a DataSet over an RDD[CaseClass]?"
With Datasets, reading boils down to using the SQLContext.read method and writing to using the write method of the Dataset itself (SQLContext has no write method).
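As a minimal sketch of that read/write round trip, assuming Spark 2.x or later running in local mode: the `Person` case class, the application name, and the temporary output path are all hypothetical and exist only for this illustration. The reader is obtained through `spark.sqlContext.read` to match the text above; `spark.read` should give the same reader.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case class, not from the original text.
case class Person(name: String, age: Long)

object DatasetReadWrite {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is good enough for the demo.
    val spark = SparkSession.builder()
      .appName("dataset-read-write")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // brings the encoders for case classes into scope

    // Hypothetical output location inside a fresh temporary directory.
    val path = Files.createTempDirectory("people").resolve("parquet").toString

    // Writing: the writer comes from the Dataset itself.
    val people: Dataset[Person] = Seq(Person("Ann", 30L), Person("Bob", 25L)).toDS()
    people.write.parquet(path)

    // Reading: the reader comes from the SQLContext; as[Person] restores the type.
    val loaded: Dataset[Person] = spark.sqlContext.read.parquet(path).as[Person]
    loaded.show()

    spark.stop()
  }
}
```

With an RDD[Person], the equivalent round trip would typically involve choosing a serialization format and parsing the records back by hand, whereas here the schema travels with the data.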