Classification with PySpark

Author: Salma OUARDI

In this project, I extracted features from raw input data and trained several classification models using the MLlib library. I then compared the performance of the models to identify the best one.

This project is inspired by the book Machine Learning with Spark.

Tasks / Achievements

Used PySpark to extract the appropriate features from raw input data.
Trained a number of classification models using MLlib.
Made predictions with the trained classification models.
Applied a number of standard evaluation techniques to assess the predictive performance of the models.
Explored the impact of parameter tuning on model performance and learned how to use cross-validation to select the best model parameters.
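
As a rough illustration of the evaluation and cross-validation steps listed above, here is a minimal sketch in plain Python. It is not the notebook's code and does not use Spark or MLlib. The function names `binary_metrics` and `k_fold_indices` are hypothetical helpers introduced only for this example, and the sample (prediction, label) pairs are made up.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(pairs: Sequence[Tuple[int, int]]) -> Dict[str, float]:
    """Compute accuracy, precision and recall from (prediction, label) pairs.

    Labels and predictions are assumed to be 0 or 1.
    """
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)  # true positives
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)  # false positives
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)  # false negatives
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)  # true negatives
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split the indices 0..n-1 into k (train, validation) pairs.

    Each index lands in exactly one validation fold.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for fold in folds:
        held_out = set(fold)
        train = [j for j in range(n) if j not in held_out]
        splits.append((train, fold))
    return splits


# Made-up (prediction, label) pairs for illustration only.
pairs = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1)]
metrics = binary_metrics(pairs)

# Three folds over six examples.
splits = k_fold_indices(6, 3)
```

In a cross-validation loop, each candidate parameter value would be trained on the `train` indices and scored on the held-out fold, and the value with the best average score would be kept. With Spark, the same ideas are handled by the library rather than written by hand.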

The notebook Classification_with_Pyspark.ipynb has a full description of each step of this project.

