SalmaOuardi/Classification-with-PySpark

Classification with PySpark

Author: Salma OUARDI


In this project, I extracted features from raw input data and trained several classification models using the MLlib library. I then compared the models' performance to identify the best-performing one.

This project is inspired by the book Machine Learning with Spark.


Tasks / Achievements

  • Used PySpark to extract the appropriate features from raw input data.
  • Trained a number of classification models using MLlib.
  • Made predictions with our classification models.
  • Applied a number of standard evaluation techniques to assess the predictive performance of our models.
  • Explored the impact of parameter tuning on model performance and learned how to use cross-validation to select the best model parameters.
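
The evaluation and cross-validation steps listed above can be illustrated with a small sketch in plain Python, with no Spark dependency. It is not taken from the notebook: the function names (`confusion_counts`, `evaluate`, `k_fold_indices`) and the sample predictions are hypothetical, and it assumes binary labels encoded as 0 and 1. It only shows what accuracy, precision, and recall measure, and how k-fold splitting partitions a dataset; the notebook may use different metrics and MLlib's own utilities.

```python
from collections import Counter


def confusion_counts(predictions, labels):
    """Count confusion-matrix cells, assuming every value is 0 or 1 (1 = positive)."""
    counts = Counter()
    for pred, label in zip(predictions, labels):
        if pred == 1 and label == 1:
            counts["tp"] += 1  # true positive
        elif pred == 1 and label == 0:
            counts["fp"] += 1  # false positive
        elif pred == 0 and label == 0:
            counts["tn"] += 1  # true negative
        else:
            counts["fn"] += 1  # false negative (pred 0, label 1 under the 0/1 assumption)
    return counts


def evaluate(predictions, labels):
    """Return accuracy, precision, and recall; a metric with a zero denominator is 0.0."""
    c = confusion_counts(predictions, labels)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n)."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


# Hypothetical predictions and labels, made up for illustration only.
example_predictions = [1, 0, 1, 1, 0, 0]
example_labels = [1, 0, 0, 1, 1, 0]
metrics = evaluate(example_predictions, example_labels)
print(metrics)

for train_idx, test_idx in k_fold_indices(5, 2):
    print("train:", train_idx, "test:", test_idx)
```

In a cross-validation loop, each candidate parameter setting would be trained on every `train_idx` split and scored on the matching `test_idx` split, and the setting with the best average score would be kept.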

The notebook Classification_with_Pyspark.ipynb has a full description of each step of this project.