
PySpark-Roadmap is an 18-day structured learning journey that takes you from the basics of DataFrames and SQL to advanced topics like joins, performance tuning, and MLlib. Each day includes a dataset, coding task, and implementation in PySpark, making it a practical guide for mastering big data processing and machine learning with Spark.

sivasurya681/PySpark

PySpark-Roadmap

A structured 18-day PySpark learning roadmap covering DataFrames, SQL, Joins, Window Functions, Performance Tuning, and MLlib with daily datasets and coding tasks.


📖 Project Description

This repository documents my journey learning PySpark through a daily coding roadmap.
Each day introduces a new PySpark concept, paired with a dataset and an implementation example. The roadmap ends with an end-to-end mini project and MLlib model training.


✨ Features

  • Beginner to advanced PySpark concepts in 18 days.
  • Hands-on coding tasks with datasets for each topic.
  • Covers DataFrames, SQL, Aggregations, Joins, Window Functions, and Performance Tuning.
  • Mini Project (Day 15): Data cleaning, joining, aggregations, and saving results.
  • MLlib tasks (Days 16–18): Feature engineering, a decision tree model, and an ML pipeline.
  • Code is written to run directly in Google Colab or Jupyter Notebook.
