This repository contains a collection of Jupyter notebooks that demonstrate different techniques for data preprocessing, exploratory data analysis (EDA), feature engineering, and predictive modeling.
The notebooks cover tasks common to a data science workflow, from cleaning and transforming raw datasets to building and evaluating machine learning models.

| Notebook | Description |
|---|---|
| `Classification_and_prediction_regression.ipynb` | Builds classification and regression models, evaluates predictive performance. |
| `Data_Aggregation.ipynb` | Demonstrates grouping, summarizing, and aggregating datasets. |
| `Data_Transformation.ipynb` | Shows how to transform raw data (scaling, normalization, handling missing values, encoding). |
| `Feature_Selection.ipynb` | Implements techniques for selecting the most relevant features. |
| `Feature_Transformation.ipynb` | Feature engineering and transformations (e.g., encoding, PCA). |
| `Handle_Imbalanced_Classes.ipynb` | Methods to deal with imbalanced datasets (oversampling, undersampling, SMOTE). |
| `Market_Basket_Analysis.ipynb` | Association rule mining and frequent itemset analysis. |
| `PCA_Data_Mine.ipynb` | Principal Component Analysis (PCA) for dimensionality reduction. |
| `Webscrapping.ipynb` | Collecting datasets via web scraping (typo in filename: should be Webscraping). |

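
To give a flavor of the transformations listed in the table (handling missing values, scaling, encoding), here is a minimal sketch using scikit-learn. It is not taken from the notebooks: the toy DataFrame, its column names, and the choice of median imputation are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data invented for this example; it does not come from the notebooks.
df = pd.DataFrame({
    "age": [25.0, 32.0, np.nan, 41.0],
    "income": [40000.0, 52000.0, 61000.0, np.nan],
    "city": ["Paris", "Berlin", "Paris", "Madrid"],
})

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# Numeric columns: fill missing values with the column median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one-hot encode. sparse_threshold=0 forces a dense array.
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,
)

transformed = preprocess.fit_transform(df)
print(transformed.shape)  # 2 scaled numeric columns + 3 one-hot city columns
```

Wrapping the steps in a `ColumnTransformer` keeps the numeric and categorical handling in one object, so the same fitted transformation can later be applied to new data with `preprocess.transform(...)`.
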
The general data science workflow demonstrated in this repository is:

- Data Collection → e.g. `Webscrapping.ipynb`
- Exploration & Cleaning → `Data_Transformation.ipynb`, `Data_Aggregation.ipynb`
- Feature Engineering → `Feature_Selection.ipynb`, `Feature_Transformation.ipynb`
- Dimensionality Reduction → `PCA_Data_Mine.ipynb`
- Modeling & Evaluation → `Classification_and_prediction_regression.ipynb`, `Handle_Imbalanced_Classes.ipynb`
- Advanced Analysis → `Market_Basket_Analysis.ipynb`
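
The stages above can be chained in a few lines. The following is a hedged sketch rather than code from the notebooks: it assumes scikit-learn's built-in breast cancer dataset, 5 principal components, and logistic regression as the model, all chosen only to keep the example short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data collection: a built-in dataset stands in for scraped or CSV data.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a stratified test set so evaluation uses unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Cleaning/scaling -> dimensionality reduction -> modeling, in one pipeline.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Evaluation on the held-out data.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Fitting the scaler and PCA inside the pipeline means they only ever see the training split, which avoids leaking test-set statistics into the model.
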
- Python 3.8+
- Jupyter Notebook / JupyterLab
- Common ML/data libraries:
  - `pandas`, `numpy`, `matplotlib`, `seaborn`
  - `scikit-learn`
  - `imbalanced-learn`
  - `mlxtend`
  - `beautifulsoup4`, `requests` (for web scraping)
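
Before opening the notebooks, it can help to confirm that the libraries listed above are importable. The snippet below is one possible check, not part of the repository; the helper name `missing_packages` is hypothetical. Note that some packages are imported under a different name than they are installed with (for example `scikit-learn` is imported as `sklearn`).

```python
from importlib.util import find_spec

# Map each pip package name to the module name used in `import` statements.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "imbalanced-learn": "imblearn",
    "mlxtend": "mlxtend",
    "beautifulsoup4": "bs4",
    "requests": "requests",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All listed libraries are importable.")
```
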
- Some notebooks may rely on sample datasets (e.g., CSV files or scikit-learn’s built-in datasets).
- Ensure file paths are updated if you use your own datasets.
- The notebook `Webscrapping.ipynb` should be renamed to `Webscraping.ipynb` for clarity.
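
One way to keep file paths easy to update is to route all loading through a single function. This is only a sketch under stated assumptions: the function name `load_dataset` is hypothetical, and the iris dataset bundled with scikit-learn is used as an arbitrary fallback when no CSV path is supplied.

```python
from pathlib import Path

import pandas as pd
from sklearn.datasets import load_iris


def load_dataset(csv_path=None):
    """Load a CSV if a path is given, else fall back to scikit-learn's iris data."""
    if csv_path is None:
        # as_frame=True returns the features and target as a pandas DataFrame.
        return load_iris(as_frame=True).frame
    path = Path(csv_path).expanduser()
    if not path.is_file():
        # Fail early with the resolved path so a wrong location is easy to spot.
        raise FileNotFoundError(f"Dataset not found: {path.resolve()}")
    return pd.read_csv(path)


df = load_dataset()
print(df.shape)
```

To use your own data, pass its location instead, for example `load_dataset("data/my_file.csv")`, where the path is a placeholder for wherever your file lives.
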
Contributions are welcome!
- Open an issue if you find a bug or want to suggest improvements.
- Submit a pull request if you want to add more preprocessing or EDA techniques.