Skip to content

End-to-End data preprocessing and exploratory data analysis (EDA) techniques. It covers data cleaning, handling missing values, feature engineering, encoding categorical variables, scaling, and visualizations to understand patterns and insights in datasets

Notifications You must be signed in to change notification settings

smusab9152/Data_Preprocessing_and_Exploratory_data_analysis

Repository files navigation

Data Preprocessing and Exploratory Data Analysis

This repository contains a collection of Jupyter notebooks that demonstrate different techniques for data preprocessing, exploratory data analysis (EDA), feature engineering, and predictive modeling.
The notebooks cover a wide range of tasks commonly performed in a data science workflow — from cleaning and transforming raw datasets to building and evaluating machine learning models.


📂 Repository Structure

Notebook Description
Classification_and_prediction_regression.ipynb Builds classification and regression models, evaluates predictive performance.
Data_Aggregation.ipynb Demonstrates grouping, summarizing, and aggregating datasets.
Data_Transformation.ipynb Shows how to transform raw data (scaling, normalization, handling missing values, encoding).
Feature_Selection.ipynb Implements techniques for selecting the most relevant features.
Feature_Transformation.ipynb Feature engineering and transformations (e.g., encoding, PCA).
Handle_Imbalanced_Classes.ipynb Methods to deal with imbalanced datasets (oversampling, undersampling, SMOTE).
Market_Basket_Analysis.ipynb Association rule mining and frequent itemset analysis.
PCA_Data_Mine.ipynb Principal Component Analysis (PCA) for dimensionality reduction.
Webscrapping.ipynb Collecting datasets via web scraping (typo in filename: should be Webscraping).

📊 Workflow

The general data science workflow demonstrated in this repository is:

  1. Data Collection → e.g. Webscrapping.ipynb
  2. Exploration & CleaningData_Transformation.ipynb, Data_Aggregation.ipynb
  3. Feature EngineeringFeature_Selection.ipynb, Feature_Transformation.ipynb
  4. Dimensionality ReductionPCA_Data_Mine.ipynb
  5. Modeling & EvaluationClassification_and_prediction_regression.ipynb, Handle_Imbalanced_Classes.ipynb
  6. Advanced AnalysisMarket_Basket_Analysis.ipynb

🛠️ Requirements

  • Python 3.8+
  • Jupyter Notebook / JupyterLab
  • Common ML/data libraries:
    • pandas, numpy, matplotlib, seaborn
    • scikit-learn
    • imbalanced-learn
    • mlxtend
    • beautifulsoup4, requests (for web scraping)

📌 Notes

  • Some notebooks may rely on sample datasets (e.g., CSV files or scikit-learn’s built-in datasets).
  • Ensure file paths are updated if you use your own datasets.
  • The notebook Webscrapping.ipynb should be renamed to Webscraping.ipynb for clarity.

🤝 Contributing

Contributions are welcome!

  • Open an issue if you find a bug or want to suggest improvements.
  • Submit a pull request if you want to add more preprocessing or EDA techniques.

About

End-to-End data preprocessing and exploratory data analysis (EDA) techniques. It covers data cleaning, handling missing values, feature engineering, encoding categorical variables, scaling, and visualizations to understand patterns and insights in datasets

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published