tech-layoffs-cleaning-SQL-VS-Python

Overview

This project focuses on cleaning and analyzing a dataset containing information on layoffs in the tech industry. The dataset includes details on affected companies, industries, locations, and funding levels. The goal is to clean and process the data using both MySQL and Python (Pandas) to compare their effectiveness in handling data cleaning and analysis.

📂 Files in This Repository

layoffs_cleaning.sql – SQL script for cleaning the dataset using MySQL.
layoffs_cleaning.ipynb – Jupyter Notebook replicating the cleaning process using Python (Pandas).
layoffs.csv – The original raw dataset.
layoffs_cleaned.csv – The cleaned dataset after processing.
README.md – This file, which provides project details and comparisons between MySQL and Python.

Data Cleaning Process

Using MySQL

Created a staging table to preserve raw data.
Identified and removed duplicates using ROW_NUMBER().
Standardized company and country names using TRIM() and LIKE.
Converted the date column to a proper DATE format using STR_TO_DATE().
Filled missing values from related records using UPDATE with a JOIN.
Removed rows where critical numerical values were missing.
Performed analysis on layoffs by industry, company, country, and year.
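
The duplicate-removal step can be sketched as follows. This is a minimal illustration, not the contents of layoffs_cleaning.sql: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server, and the table names (layoffs_staging, layoffs_dedup), column names, and sample rows are all made up for the example. The ROW_NUMBER() query itself uses syntax that MySQL 8.0 and later also support.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE layoffs_staging (company TEXT, location TEXT, total_laid_off INTEGER)"
)
conn.executemany(
    "INSERT INTO layoffs_staging VALUES (?, ?, ?)",
    [
        ("Acme", "Austin", 100),
        ("Acme", "Austin", 100),  # exact duplicate of the row above
        ("Globex", "Berlin", 50),
    ],
)

# Number the rows inside each group of identical values. Any row numbered
# above 1 repeats an earlier row, so only row_num = 1 is kept.
conn.execute(
    """
    CREATE TABLE layoffs_dedup AS
    SELECT company, location, total_laid_off
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY company, location, total_laid_off
               ) AS row_num
        FROM layoffs_staging
    ) AS numbered
    WHERE row_num = 1
    """
)

deduped_rows = conn.execute(
    "SELECT company, location, total_laid_off FROM layoffs_dedup ORDER BY company"
).fetchall()
print(deduped_rows)
```

Writing the surviving rows to a second table keeps the staging table untouched, which matches the idea of preserving the raw data while cleaning.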

Using Python (Pandas)

Loaded the dataset using Pandas.
Removed duplicates with groupby() and cumcount().
Standardized text fields by converting them to lowercase and stripping special characters.
Converted the date column to datetime format using pd.to_datetime().
Filled missing values using grouped data (mode per country).
Identified and removed outliers using the interquartile range (IQR) method.
Analyzed layoffs by company, industry, country, and year.
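
A rough sketch of several of these steps on a tiny made-up frame is shown below. It is not taken from layoffs_cleaning.ipynb: the column names, the sample values, and the month/day/year date format are assumptions chosen for illustration, and text standardization is reduced to trimming whitespace and lowercasing.

```python
import pandas as pd

# Tiny made-up sample; the column names are assumptions, not the real schema.
df = pd.DataFrame(
    {
        "company": [" Alpha ", " Alpha ", "Beta", "Gamma", "Delta", "Omega"],
        "date": ["3/6/2023", "3/6/2023", "12/1/2022", "1/15/2023", "2/2/2023", "4/9/2023"],
        "total_laid_off": [100, 100, 50, 30, 40, 5000],
    }
)

# Standardize text first so that near-identical rows compare equal.
df["company"] = df["company"].str.strip().str.lower()

# cumcount() numbers the rows inside each group of identical values from 0,
# so any row with a count above 0 repeats an earlier row.
occurrence = df.groupby(list(df.columns)).cumcount()
deduped = df[occurrence == 0].reset_index(drop=True)

# Parse the date strings; the month/day/year format is an assumption.
deduped = deduped.assign(date=pd.to_datetime(deduped["date"], format="%m/%d/%Y"))

# IQR rule: keep values within 1.5 * IQR of the first and third quartiles.
q1 = deduped["total_laid_off"].quantile(0.25)
q3 = deduped["total_laid_off"].quantile(0.75)
iqr = q3 - q1
within_range = deduped["total_laid_off"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = deduped[within_range].reset_index(drop=True)

print(cleaned)
```

The cumcount() mask mirrors the ROW_NUMBER() idea from the SQL side. When no row numbering is needed, `df.drop_duplicates()` gives the same result in one call.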

🔍 MySQL vs. Python (Pandas): A Comparison

| Feature | MySQL | Python (Pandas) |
| --- | --- | --- |
| Duplicate Removal | Uses `ROW_NUMBER()` & `DELETE` | Uses `groupby().cumcount()` & `drop_duplicates()` |
| Text Standardization | Uses `TRIM()` & `LIKE` | Uses `str.strip()` & `apply()` |
| Date Conversion | Uses `STR_TO_DATE()` & `ALTER` | Uses `pd.to_datetime()` |
| Handling Missing Data | Uses `UPDATE` & `JOIN` | Uses `fillna()` & `map()` |
| Performance | Faster for large structured data | More flexible for complex transformations |
| Ease of Use | Requires SQL queries | More programmatic and adaptable |

🏆 Key Takeaways

MySQL is efficient for handling structured datasets stored in databases.
Python (Pandas) is more flexible for complex data transformations and analysis.
Both approaches work well, but Python makes it simpler to fill missing values dynamically, for example from per-group statistics such as the mode per country.
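
To make the point about missing values concrete, here is a small sketch of filling gaps with the most frequent value per country using `fillna()` and `map()`. The column being filled (industry), the sample data, and the helper name `most_common` are assumptions for illustration; the notebook may fill different columns.

```python
import pandas as pd

# Made-up sample with gaps in a hypothetical "industry" column.
df = pd.DataFrame(
    {
        "country": ["United States", "United States", "United States", "Germany", "Germany"],
        "industry": ["Retail", "Retail", None, "Finance", None],
    }
)


def most_common(values: pd.Series):
    """Return the most frequent non-missing value, or None if there is none."""
    modes = values.mode()  # mode() ignores missing values by default
    return modes.iloc[0] if not modes.empty else None


# One lookup value per country, then map it onto each row's country and
# use it only where the industry is missing.
mode_by_country = df.groupby("country")["industry"].agg(most_common)
df["industry"] = df["industry"].fillna(df["country"].map(mode_by_country))

print(df)
```

The lookup is computed from the data itself, so it adapts when new countries appear without any change to the code.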

🚀 How to Use This Repository

Running MySQL Script

Import layoffs.csv into MySQL.
Execute layoffs_cleaning.sql.
Query the layoffs_clean2 table for the cleaned dataset.

Running Python Script

Open layoffs_cleaning.ipynb in Jupyter Notebook.
Run all cells to process the dataset.
The cleaned dataset will be saved as layoffs_cleaned.csv.

📢 Insights & Discussion

This project explores layoff trends in the tech industry, highlighting affected companies, industries, and regions. The comparison between MySQL and Python demonstrates how both tools handle data cleaning efficiently but with different strengths.

🌟 Contributions & Feedback

If you have suggestions or improvements, feel free to contribute or raise an issue!

Author: Naitik Nayak
