Test Cricket Batting Analysis

Overview

This project performs a detailed analysis of Test cricket batting records using data cleaning, feature engineering, and exploratory data analysis (EDA). The goal is to uncover insights into players' performances and career trends.

Steps and Insights

1. Data Loading and Exploration

Loading the Dataset

import pandas as pd

df = pd.read_csv('TestMatch_Data - Test matches _ Batting records _ Highest career batting average _ ESPNcricinfo.csv')

Initial Inspection:
- Displayed the dataset using df.head() and verified dimensions with df.shape.
- Checked for missing and duplicate values using df.isnull() and df.duplicated().
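The inspection step can be sketched on a small stand-in frame. The sample rows and the names sample, missing_per_column and duplicate_rows are hypothetical, not taken from the notebook. Appending .sum() turns the boolean output of isnull() and duplicated() into counts that are easier to read.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the real CSV.
sample = pd.DataFrame({
    'Player': ['A Batter (AUS)', 'B Batter (ENG)', 'B Batter (ENG)', None],
    'Mat': [52, 20, 20, 5],
})

# Number of missing values in each column.
missing_per_column = sample.isnull().sum()

# Number of rows that exactly repeat an earlier row.
duplicate_rows = int(sample.duplicated().sum())

print(sample.shape)
print(missing_per_column)
print(duplicate_rows)
```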

Key Observations:

- Columns contained mixed data formats (e.g., Span, Matches).
- Player names included country affiliations.

2. Data Cleaning

Renaming Columns

df.rename(columns={
    'Mat': 'Matches',
    'NO': 'Not_Out',
    'HS': 'Highest_Inns_Score',
    'BF': 'Ball_Faced',
    'SR': 'Batting_Strike_Rate',
    '0': 'Ducks'
}, inplace=True)

Handling Missing and Duplicate Values

df.dropna(inplace=True)
df.drop_duplicates(inplace=True)

Formatting Columns

Split Span into Debut_Year and Last_Year.

df['Debut_Year'] = df['Span'].str.split('-').str[0].astype(int)
df['Last_Year'] = df['Span'].str.split('-').str[1].astype(int)
df.drop(['Span'], axis=1, inplace=True)
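As an alternative sketch, both years can be pulled out in one pass with a regular expression. The sample spans are hypothetical, and the stricter pattern (exactly two four-digit years) is an assumption about the column's format rather than something confirmed by the dataset.

```python
import pandas as pd

# Hypothetical sample of the Span column.
spans = pd.DataFrame({'Span': ['1950-1962', '2010-2023', '2015-2015']})

# Named groups become the column names of the extracted frame.
years = spans['Span'].str.extract(r'^(?P<Debut_Year>\d{4})-(?P<Last_Year>\d{4})$')

# astype(int) raises if any span failed to match, which surfaces bad rows early.
spans = spans.join(years.astype(int))

print(spans)
```

A span that does not match the pattern yields missing values, so the integer conversion fails loudly instead of silently producing wrong years.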

Extract Player Name and Country:

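# Strip the space left before the opening parenthesis.
df['Player_Name'] = df['Player'].str.split('(').str[0].str.strip()
# Raw string so the backslashes reach the regex engine unchanged.
df['Country'] = df['Player'].str.extract(r'\((.*?)\)')[0]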
df.drop(['Player'], axis=1, inplace=True)
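A single-pattern variant of the same extraction is sketched below. The player strings are made up, and the regex assumes each entry looks like "Name (TEAM)" with the parenthesised part optional; the real column may contain other shapes.

```python
import pandas as pd

# Hypothetical Player values; the last one has no country suffix.
players = pd.DataFrame({
    'Player': ['A Batter (AUS)', 'C Hitter (ICC/SA)', 'D Opener'],
})

# One pattern captures the name and, if present, the text inside the parentheses.
pattern = r'^(?P<Player_Name>[^(]+?)\s*(?:\((?P<Country>.*?)\))?$'
parts = players['Player'].str.extract(pattern)

print(parts)
```

Entries without a country come back with a missing Country value rather than raising, which makes them easy to find with parts['Country'].isnull().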

Convert Columns to Numeric:

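# Remove the not-out marker (*) as a literal character, not as a regex.
df['Highest_Inns_Score'] = df['Highest_Inns_Score'].str.replace('*', '', regex=False).astype(int)
df['Matches'] = df['Matches'].astype(int)
df['4s'] = df['4s'].astype(int)
df['6s'] = df['6s'].astype(int)
# Keep the digits before any '+' or '-'; the character class replaces '+|-', which is not a valid regex.
df['Ball_Faced'] = df['Ball_Faced'].str.split(r'[+-]').str[0].astype(int)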
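A more forgiving version of the numeric conversion is sketched here. The values are invented, and the Was_Not_Out column is a hypothetical addition that keeps the information carried by the asterisk before it is removed.

```python
import pandas as pd

# Hypothetical raw values: '*' marks a not-out score, '+' an incomplete count,
# and '-' an unrecorded value.
raw = pd.DataFrame({
    'Highest_Inns_Score': ['334', '201*', '99'],
    'Ball_Faced': ['9800+', '5120', '-'],
})

# Record the not-out flag before stripping the marker (hypothetical column).
raw['Was_Not_Out'] = raw['Highest_Inns_Score'].str.endswith('*')
raw['Highest_Inns_Score'] = (
    raw['Highest_Inns_Score'].str.replace('*', '', regex=False).astype(int)
)

# Pull out the first run of digits; entries with no digits become NaN.
digits = raw['Ball_Faced'].str.extract(r'(\d+)')[0]
raw['Ball_Faced'] = pd.to_numeric(digits, errors='coerce')

print(raw)
```

Because missing counts become NaN, Ball_Faced ends up as a float column; whether to drop or keep those rows is a separate decision.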

3. Feature Engineering

Career Length

df['Career_Length'] = df['Last_Year'] - df['Debut_Year']

New Metrics

Average Career Length:

df['Career_Length'].mean()

Average Strike Rate for Players with Long Careers:

df[df['Career_Length'] > 10]['Batting_Strike_Rate'].mean()
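The two metrics can be checked by hand on a tiny frame. The three careers below are invented for illustration. Note that Career_Length is a difference of calendar years, so a career that starts and ends in the same year has length 0.

```python
import pandas as pd

# Hypothetical careers.
careers = pd.DataFrame({
    'Debut_Year': [1950, 2000, 2015],
    'Last_Year': [1965, 2008, 2015],
    'Batting_Strike_Rate': [40.0, 55.0, 70.0],
})

careers['Career_Length'] = careers['Last_Year'] - careers['Debut_Year']

# Mean career length across all players.
mean_length = careers['Career_Length'].mean()

# Mean strike rate restricted to careers longer than 10 years.
long_career = careers['Career_Length'] > 10
long_career_strike_rate = careers.loc[long_career, 'Batting_Strike_Rate'].mean()

print(mean_length, long_career_strike_rate)
```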

4. Exploratory Data Analysis (EDA)

Key Questions and Insights

Players Who Debuted Before 1960:

df[df['Debut_Year'] < 1960]['Player_Name'].count()

Highest Innings Scores by Country:

df.groupby('Country')['Highest_Inns_Score'].max()

Most Centuries by a Single Player, per Country:

df.groupby('Country')['100'].max()

Averages of Key Metrics by Country:

df.groupby('Country')[['100', '50', 'Ducks']].mean()
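The country-level questions above can also be answered in one grouped call using named aggregation. The figures are invented, and the output column names (top_score, total_centuries, most_centuries, mean_ducks) are hypothetical. The sketch also shows the difference between summing centuries across a country's players and taking the maximum for a single player.

```python
import pandas as pd

# Hypothetical cleaned records.
stats = pd.DataFrame({
    'Country': ['AUS', 'AUS', 'ENG'],
    'Highest_Inns_Score': [334, 250, 200],
    '100': [29, 10, 5],
    '50': [13, 20, 15],
    'Ducks': [7, 3, 4],
})

summary = stats.groupby('Country').agg(
    top_score=('Highest_Inns_Score', 'max'),   # best single innings
    total_centuries=('100', 'sum'),            # all centuries by listed players
    most_centuries=('100', 'max'),             # most by any one player
    mean_ducks=('Ducks', 'mean'),              # average ducks per player
)

print(summary)
```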

5. Visualizations

Relationships Among Metrics

import seaborn as sns
sns.set_theme(style='darkgrid')
# Wide-form input: each metric column is drawn as its own series against Country.
sns.scatterplot(data=df.groupby('Country')[['100', '50', 'Ducks']].mean())
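If more control over the axes is wanted, one option is to reshape the per-country means into long form first. This is a sketch with invented numbers; the names means, long_form, Metric and Mean are hypothetical.

```python
import pandas as pd

# Hypothetical per-country means, shaped like the groupby result.
means = pd.DataFrame(
    {'100': [19.5, 5.0], '50': [16.5, 15.0], 'Ducks': [5.0, 4.0]},
    index=pd.Index(['AUS', 'ENG'], name='Country'),
)

# One row per (Country, Metric) pair instead of one column per metric.
long_form = means.reset_index().melt(
    id_vars='Country', var_name='Metric', value_name='Mean'
)

print(long_form)
```

A frame in this shape could then be passed to seaborn with explicit roles, for example sns.scatterplot(data=long_form, x='Country', y='Mean', hue='Metric').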

Conclusion

This analysis provides a comprehensive view of Test cricket batting performances. It highlights:

- Career trends, including average career lengths and strike rates for long-tenured players.
- Country-wise performance metrics like highest innings scores and averages of centuries.
- The importance of data cleaning and feature engineering in deriving meaningful insights from raw data.

SriSurya-DA/Test_Cricket_Batting_Analysis_ESPNcricinfo
