Uses Python and a machine learning pipeline to build an ETL workflow and a predictive model with Apache Spark.

Mohamed-fawzyy/ETL-Spark

Project Scenario 🎩

You are a data engineer at an aeronautics consulting company. Your company prides itself on being able to efficiently design airfoils for use in planes and sports cars. Data scientists in your office need to work with different algorithms and with data in different formats. While they are good at machine learning, they count on you to run ETL jobs and build ML pipelines. In this project, you will use a modified version of the NASA Airfoil Self Noise dataset. You will clean this dataset by dropping duplicate rows and removing rows with null values. You will then build an ML pipeline that produces a model predicting SoundLevel from all the other columns. Finally, you will evaluate the model and persist it.

Reach/Follow me on

LinkedIn    Google Email    Facebook


Objectives 📝

  • Part 1 Perform ETL activity
    • Load a CSV dataset
    • Remove duplicates if any
    • Drop rows with null values if any
    • Make transformations
    • Store the cleaned data in parquet format
  • Part 2 Create a Machine Learning Pipeline
    • Create a machine learning pipeline for prediction
  • Part 3 Evaluate the Model
    • Evaluate the model using relevant metrics
  • Part 4 Persist the Model
    • Save the model for future production use
    • Load and verify the stored model
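
The four parts above can be sketched in PySpark roughly as follows. This is a minimal illustration, not the repository's actual code: the file paths (`airfoil.csv`, `airfoil_clean.parquet`, `airfoil_model`), the helper names, the 70/30 split, and the choice of `VectorAssembler`, `StandardScaler`, and `LinearRegression` as pipeline stages are all assumptions. Only the `SoundLevel` target and the "all other columns as features" rule come from the scenario. The unspecified "Make transformations" step is left out.

```python
"""Hedged sketch of Parts 1-4: ETL, ML pipeline, evaluation, persistence."""

LABEL = "SoundLevel"  # target column named in the project scenario


def feature_columns(columns, label=LABEL):
    """Return every column except the label, preserving order."""
    return [c for c in columns if c != label]


def clean(df):
    """Part 1: drop duplicate rows, then drop rows containing nulls."""
    return df.dropDuplicates().dropna()


def build_pipeline(features, label=LABEL):
    """Part 2: assemble -> scale -> linear regression (assumed stages)."""
    # Imported lazily so the plain helpers above work without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StandardScaler, VectorAssembler
    from pyspark.ml.regression import LinearRegression

    assembler = VectorAssembler(inputCols=features, outputCol="features")
    scaler = StandardScaler(inputCol="features", outputCol="scaledFeatures")
    lr = LinearRegression(featuresCol="scaledFeatures", labelCol=label)
    return Pipeline(stages=[assembler, scaler, lr])


def evaluate(predictions, label=LABEL):
    """Part 3: compute RMSE, MAE and R2 on a DataFrame of predictions."""
    from pyspark.ml.evaluation import RegressionEvaluator

    return {
        metric: RegressionEvaluator(
            labelCol=label, predictionCol="prediction", metricName=metric
        ).evaluate(predictions)
        for metric in ("rmse", "mae", "r2")
    }


def main(csv_path="airfoil.csv",              # hypothetical input path
         parquet_path="airfoil_clean.parquet",  # hypothetical output path
         model_path="airfoil_model"):           # hypothetical model directory
    from pyspark.ml import PipelineModel
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ETL-Spark sketch").getOrCreate()

    # Part 1: load the CSV, clean it, and store it as parquet.
    raw = spark.read.csv(csv_path, header=True, inferSchema=True)
    clean(raw).write.mode("overwrite").parquet(parquet_path)

    # Part 2: fit the pipeline on a training split of the cleaned data.
    data = spark.read.parquet(parquet_path)
    train, test = data.randomSplit([0.7, 0.3], seed=42)
    model = build_pipeline(feature_columns(data.columns)).fit(train)

    # Part 3: evaluate on the held-out split.
    print(evaluate(model.transform(test)))

    # Part 4: persist the fitted model, then reload it and verify it predicts.
    model.write().overwrite().save(model_path)
    reloaded = PipelineModel.load(model_path)
    reloaded.transform(test).select(LABEL, "prediction").show(5)

    spark.stop()
```

Calling `main()` with a CSV that has a header row and a `SoundLevel` column runs the whole flow. Deriving the feature list from `data.columns` keeps the sketch independent of the exact column names in the modified dataset.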

Diagram of an airfoil (for informational purposes)

[Image: Airfoil_with_flow]

Diagram showing the angle of attack (for informational purposes)

[Image: Airfoil_angle_of_attack]

Contributing 📝

Contributions are welcome! Please open an issue or pull request for any changes or improvements.
