The module introduces the fundamental concepts of data science and machine learning using Spark and the Spark Machine Learning library (MLlib). By the end of the course, students should understand the fundamental concepts of machine learning and be adept at using Spark for machine learning and data science to predict trends and patterns in massive data sets.
Lesson | Title | Lab | Objectives |
---|---|---|---|
1 | Data Science and Machine Learning Overview | | The basics of data science. Machine learning basics. Machine learning feature representation and modeling. |
2 | Spark MLlib and Data Types | | The fundamentals of Spark. Major components of Spark programming. How the Spark Machine Learning library works. Spark data types. |
3 | Spark ML Overview and Spark on Azure | Lab | Recognize the Spark ML API. Demonstrate how a Spark cluster is configured on top of an HDInsight cluster. Explain some features available in Azure HDInsight Spark clusters. |
4 | Spark MLlib Basic Statistics | Lab | Use the basic statistics functions provided by MLlib. Identify the input data types these functions accept. Explain how data types affect the behavior of the statistical methods. |
5 | Clustering | Lab | Understand what a clustering algorithm does. Understand supervised and unsupervised learning. Recognize the K-Means algorithm. Run K-Means on Spark MLlib. |
6 | Regression | | Formally define the regression model. Define how to model using simple linear regression. Understand how to model using linear regression. Understand overfitting and underfitting a model. Understand what a regularization term accomplishes. |
7 | Regression and Classification | Lab | Explain what regularizers accomplish. Understand cross-validation procedures. Understand nested cross-validation procedures. Define a classification problem. Represent classification errors. Explain loss functions. Understand logistic regression. Utilize Spark MLlib to implement logistic regression. |
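Lesson 5 runs K-Means with Spark MLlib. The following is a minimal, library-free Python sketch of Lloyd's K-Means iteration, intended only to make the algorithm concrete. Each pass assigns every point to its nearest centroid, then moves each centroid to the mean of its points. This is not MLlib's implementation. The function name `kmeans`, its parameters, and the random-sample initialization are illustrative assumptions. MLlib distributes this computation across the cluster and uses its own initialization scheme.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Illustrative Lloyd's K-Means: alternate assignment and update steps."""
    rng = random.Random(seed)
    # Assumed initialization: sample k input points as the starting centroids
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                # An empty cluster keeps its previous centroid
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # Converged: centroids stopped moving
        centroids = new_centroids
    return centroids
```

On two well-separated groups, such as `[(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]` with `k=2`, the centroids settle at the two group means, (1/3, 1/3) and (28/3, 28/3).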
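Lessons 6 and 7 cover regularization, loss functions, and logistic regression. The sketch below is a hedged, library-free Python illustration of binary logistic regression. It uses batch gradient descent to minimize the mean log-loss plus an optional L2 penalty (λ/2)·‖w‖² on the weights. The names `train_logistic_regression` and `predict`, the learning rate, the epoch count, and the 0.5 decision threshold are all illustrative assumptions. In the course itself, the model is fit with Spark MLlib (for example, `LogisticRegression` in `pyspark.ml.classification`), which uses its own optimizer.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.5, epochs=1000, l2=0.0):
    """Illustrative batch gradient descent on mean log-loss + (l2/2)*||w||^2."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For log-loss, the gradient w.r.t. the logit is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average over the data, add the L2 term (bias unpenalized), then step
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    # Class 1 when the predicted probability reaches the threshold
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold else 0
```

Setting `l2` above zero shrinks the weights toward zero, which is the effect a regularization term has against overfitting. Leaving the bias unpenalized is a common convention.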