pyspark-python

Here are 86 public repositories matching this topic...

This repository contains an end-to-end, real-time sentiment analysis solution for YouTube comments. It uses Azure Event Hub for data ingestion, Azure Data Factory for orchestration, and Databricks for data processing, with VADER for sentiment analysis. The pipeline writes results to Delta Lake for scalable querying and storage.

  • Updated Oct 15, 2024
  • Jupyter Notebook

This repository contains a data engineering project analyzing global earthquake events. Utilizing Microsoft Fabric, PySpark, and Power BI, it automates data fetching and cleaning from the USGS Earthquake Catalog and provides dynamic visualizations to uncover insights.

  • Updated Sep 22, 2024
  • Jupyter Notebook

The IPL Data Analysis project explores and analyzes Indian Premier League (IPL) data using PySpark for data processing and Matplotlib and Seaborn for data visualization. The goal is to derive actionable insights into player performances, match trends, and overall league dynamics.

  • Updated Sep 15, 2024
  • Jupyter Notebook

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

  • Updated Aug 17, 2024
  • Python

Azure projects: an end-to-end data engineering project with a medallion architecture using Azure Data Factory and Azure Databricks, and an Azure serverless/logical data warehouse using Azure Synapse Analytics to demo CETAS, data modeling, incremental loading, CDC, and SQL monitoring of the data processing, connected to Power BI.

  • Updated Jul 31, 2024
  • TSQL
