- Repository for cleaning, transforming, and preparing datasets for analysis.
- Includes Python script files that use pandas, numpy, and re.
1. 01_Badly_Structured_Data_Transformation
- Details: Fix poorly structured datasets and reformat for analysis.
- Dataset: Badly Structured Sales Data.xlsx
2. 02_Column_Split_and_Melt_Transformation
- Details: Transform wide data into tidy, long format using splits and melts.
- Dataset: Roadmap.xlsx
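A minimal sketch of the split-and-melt idea. The table below is made up for illustration and is not the real layout of Roadmap.xlsx; all column names are assumptions.

```python
import pandas as pd

# Hypothetical wide table: one column per quarter (not the real Roadmap.xlsx).
wide = pd.DataFrame({
    "Team - Owner": ["Core - Ana", "Web - Raj"],
    "Q1": ["Plan", "Build"],
    "Q2": ["Build", "Ship"],
})

# Split the combined column into two separate columns.
wide[["Team", "Owner"]] = wide["Team - Owner"].str.split(" - ", expand=True)

# Melt the quarter columns into rows: one row per (team, quarter) pair.
long = wide.drop(columns="Team - Owner").melt(
    id_vars=["Team", "Owner"], var_name="Quarter", value_name="Stage"
)
print(long)
```

The melted frame has one observation per row, which is usually easier to filter, group, and plot than the wide form.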
3. 03_Error_Detection_and_Cleaning
- Details: Identify and fix errors, remove duplicates, and handle missing values.
- Dataset: Hospital Data with Mixed Numbers and Characters.xlsx
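A rough sketch of these cleaning steps on invented records. The column names and the choice to fill gaps with the median are assumptions, not necessarily what the script in this folder does.

```python
import pandas as pd

# Hypothetical records; the real hospital workbook will have different columns.
df = pd.DataFrame({
    "patient": ["A", "A", "B", "C", "D"],
    "age": ["34", "34", "4O", "51", None],  # "4O" has a letter O, not a zero
})

# Convert to numbers; values that cannot be parsed become NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Drop exact duplicate rows, then fill missing ages with the median.
df = df.drop_duplicates().reset_index(drop=True)
df["age"] = df["age"].fillna(df["age"].median())
print(df)
```

Coercing first makes the bad values visible as NaN, so they can be handled together with the values that were missing from the start.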
- Details: Extract multi-value fields into separate rows using splitting and exploding.
- Dataset: Invoices-with-Merged-Categories-and-Merged-Amounts.xlsx
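Splitting and exploding could look roughly like this. The data, the `;` delimiter, and the column names are hypothetical, and exploding several columns at once assumes pandas 1.3 or newer.

```python
import pandas as pd

# Hypothetical invoices with several values packed into one cell.
invoices = pd.DataFrame({
    "invoice": [1, 2],
    "category": ["Food;Fuel", "Rent"],
    "amount": ["20;35", "500"],
})

# Turn each delimited string into a list of values.
for col in ["category", "amount"]:
    invoices[col] = invoices[col].str.split(";")

# Explode both list columns together so categories stay paired with amounts.
tidy = invoices.explode(["category", "amount"], ignore_index=True)
tidy["amount"] = tidy["amount"].astype(float)
print(tidy)
```

Exploding the two columns in a single call keeps each category aligned with its amount; the lists in a row must have the same length for this to work.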
- Details: Use regular expressions to identify, extract, and split patterns in data.
- Dataset: Jumbled-up-Customers-Details.xlsx
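As an illustration of pattern extraction, here is a small sketch on invented strings. The pattern assumes each record is "name, email, phone" in that order, which may not match the actual dataset.

```python
import pandas as pd

# Hypothetical jumbled records: name, email, and phone in one string.
raw = pd.Series([
    "Ana Lopez ana@example.com 555-0101",
    "Raj Patel raj@example.com 555-0199",
])

# Named groups become the column names of the extracted frame.
pattern = (
    r"(?P<name>[A-Za-z ]+?)\s+"
    r"(?P<email>\S+@\S+)\s+"
    r"(?P<phone>\d{3}-\d{4})"
)
customers = raw.str.extract(pattern)
print(customers)
```

Rows that do not match the pattern come back as NaN, which makes it easy to spot records that need a different rule.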
6. 06_Extract_From_Mixedup_Columns
- Details: Separate mixed data in single columns into distinct components.
- Dataset: Medicine-Data-with-lumped-Quantity-and-Measure.xlsx
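One possible way to pull a lumped quantity and measure apart, shown on made-up values. The `dose` column name and the "number followed by unit" layout are assumptions.

```python
import pandas as pd

# Hypothetical lumped values such as "500mg"; not taken from the real dataset.
meds = pd.DataFrame({"dose": ["500mg", "2.5 ml", "10 tablets"]})

# Capture the leading number and the trailing unit as separate groups.
parts = meds["dose"].str.extract(
    r"^\s*(?P<quantity>\d+(?:\.\d+)?)\s*(?P<measure>[A-Za-z]+)\s*$"
)
meds["quantity"] = parts["quantity"].astype(float)
meds["measure"] = parts["measure"]
print(meds)
```

Keeping the quantity numeric and the measure as text lets you aggregate amounts per unit without further parsing.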
- Step-by-step techniques for common data challenges.
- Python and its libraries (numpy, pandas, re) are used to complete the tasks.
- Applicable to real-world datasets.
- Clone the repository: git clone https://github.com/mayur-de/Data_Wrangling_and_Transformation.git
- Install Python dependencies: pip install pandas numpy (re is part of the Python standard library and does not need to be installed)
- Contributions are welcome via pull requests.
- MIT License.