IPL Data Analysis Using Apache Spark

Overview

This project analyzes Indian Premier League (IPL) cricket data with Apache Spark, an open-source unified analytics engine. The goal is to uncover insights and trends in the IPL datasets, using Spark's capabilities for large-scale data processing.

Project Objectives

Data Ingestion and Cleaning: Efficiently load and preprocess raw IPL data.
Exploratory Data Analysis (EDA): Generate descriptive statistics and visualizations to understand the underlying patterns in the data.
Advanced Analytics: Implement advanced analytical techniques to derive meaningful insights from the data.
Visualization: Create interactive and static visualizations to present the findings effectively.
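
As a rough illustration of the ingestion and cleaning objective, the sketch below reads a match-level CSV with an explicit schema and applies two basic cleaning steps. It is a minimal sketch and not the notebook's actual code: the column names, the types in MATCH_SCHEMA_DDL, and the function name load_matches are all hypothetical, so adjust them to the real headers of the dataset.

```python
# Hypothetical column names and types (assumptions, not taken from the dataset).
MATCH_SCHEMA_DDL = (
    "match_id INT, season_year INT, team1 STRING, team2 STRING, "
    "toss_winner STRING, match_winner STRING, venue_name STRING"
)


def load_matches(spark, path):
    """Read the match CSV with an explicit schema and apply basic cleaning.

    `spark` is an existing SparkSession (Databricks notebooks provide one
    under that name); `path` points at the match-level CSV file.
    """
    # Passing a DDL string as the schema avoids a separate inference pass.
    raw = spark.read.csv(path, schema=MATCH_SCHEMA_DDL, header=True)
    # Drop rows that have no match id, then remove repeated matches.
    return raw.na.drop(subset=["match_id"]).dropDuplicates(["match_id"])
```

Declaring the schema up front keeps column types stable between runs; with schema inference, a column could come back typed differently if a file contains unexpected values.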

Datasets

The datasets used in this project can be found at the following link:

IPL Data Till 2017: Includes match and ball-by-ball data up to the year 2017.

Technologies Used

Apache Spark (PySpark)
Databricks
SparkSQL
Pandas
Matplotlib
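
One way these tools can fit together is to aggregate in Spark with SparkSQL, convert the small result to Pandas, and plot it with Matplotlib. The sketch below assumes that flow; the README does not confirm it is the notebook's exact approach. The view name matches, the column match_winner, and the helper names wins_per_team and plot_wins are hypothetical.

```python
# Hypothetical view and column names, chosen only for illustration.
WINS_PER_TEAM_SQL = """
    SELECT match_winner AS team, COUNT(*) AS wins
    FROM matches
    WHERE match_winner IS NOT NULL
    GROUP BY match_winner
    ORDER BY wins DESC
"""


def wins_per_team(spark, matches_df):
    """Aggregate in Spark, then return the result as a pandas DataFrame."""
    # Register the DataFrame so it can be queried by name from SparkSQL.
    matches_df.createOrReplaceTempView("matches")
    # The result has one row per team, so collecting it to the driver is cheap.
    return spark.sql(WINS_PER_TEAM_SQL).toPandas()


def plot_wins(wins_pdf, ax):
    """Draw a horizontal bar chart on a Matplotlib Axes supplied by the caller."""
    ax.barh(wins_pdf["team"], wins_pdf["wins"])
    ax.set_xlabel("Matches won")
    ax.set_title("Wins per team")
    return ax
```

In a notebook, the caller would create the Axes with matplotlib.pyplot.subplots() and pass it to plot_wins together with the output of wins_per_team. Only the aggregated rows leave Spark; the full ball-by-ball data stays distributed.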

Architecture Diagram

The architecture diagram below illustrates the data flow and components used in this project:
