pyspark-python

Here are 88 public repositories matching this topic...

Azure

Azure projects: an end-to-end data engineering project with a medallion architecture using Azure Data Factory and Azure Databricks, plus an Azure serverless/logical data warehouse using Azure Synapse Analytics to demo CETAS, data modeling, incremental loading, CDC, and SQL monitoring, with the data processing connected to Power BI.

  • Updated Feb 16, 2025
  • Python

This repository contains an end-to-end real-time YouTube comments sentiment analysis solution. It uses Azure Event Hub for data ingestion, Azure Data Factory for orchestration, and Databricks for data processing with VADER for sentiment analysis. The pipeline outputs results to Delta Lake for scalable querying and storage.

  • Updated Oct 15, 2024
  • Jupyter Notebook

This repository contains a data engineering project analyzing global earthquake events. Utilizing Microsoft Fabric, PySpark, and Power BI, it automates data fetching and cleaning from the USGS Earthquake Catalog and provides dynamic visualizations to uncover insights.

  • Updated Sep 22, 2024
  • Jupyter Notebook

The IPL Data Analysis project aims to explore and analyze the Indian Premier League (IPL) data using PySpark for data processing and Matplotlib and Seaborn for data visualization. The goal is to derive actionable insights into player performances, match trends, and overall league dynamics.

  • Updated Sep 15, 2024
  • Jupyter Notebook

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

  • Updated Aug 17, 2024
  • Python

This repository contains Databricks projects utilizing RDDs, DataFrames, and SQL to process and analyze various real-world datasets. Data cleaning and analysis have been performed using PySpark functions to handle challenges such as inconsistent formats, missing values, and complex data structures. The project ensures efficient data transformation

  • Updated May 14, 2024
  • HTML
