This project builds an Income Prediction model using Decision Tree Regression. It includes full data preprocessing, categorical encoding, log transformation, and hyperparameter tuning with GridSearchCV. The project demonstrates an end-to-end ML workflow and highlights model performance on a noisy synthetic dataset.

samir-m0hamed/IncomePrediction_DecisionTreeRegression

📊 Income Prediction Using Decision Tree Regression

This project aims to predict individual income using a complete Machine Learning workflow.
The dataset contains demographic, education, employment, and household-related attributes, and the model uses a Decision Tree Regressor with full hyperparameter tuning to estimate income values.


🚀 Project Overview

  • Perform data cleaning and preprocessing.
  • Apply Ordinal and One-Hot Encoding to categorical features.
  • Use log transformation to reduce skewness in the target variable.
  • Split data into training and testing sets.
  • Use GridSearchCV to optimize tree hyperparameters.
  • Evaluate model performance (R², RMSE).
  • Visualize predicted vs actual income values.
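
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`Age`, `Education_Level`, `Occupation`, `Income`), the category order, and the toy data are all assumptions, and the real columns in data.csv may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical toy data standing in for data.csv (column names are assumptions).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 70, size=n),
    "Education_Level": rng.choice(["High School", "Bachelor", "Master"], size=n),
    "Occupation": rng.choice(["Tech", "Health", "Finance"], size=n),
    "Income": rng.lognormal(mean=10.5, sigma=0.6, size=n),
})

X = df.drop(columns="Income")
# log1p compresses the long right tail of income; expm1 inverts it later.
y_log = np.log1p(df["Income"])

preprocess = ColumnTransformer(
    transformers=[
        # Ordinal encoding for a feature with a natural order (assumed order).
        ("ordinal",
         OrdinalEncoder(categories=[["High School", "Bachelor", "Master"]]),
         ["Education_Level"]),
        # One-hot encoding for a feature with no natural order.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["Occupation"]),
    ],
    remainder="passthrough",  # numeric columns pass through unchanged
    sparse_threshold=0,       # always return a dense array
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y_log, test_size=0.2, random_state=42
)

# Fit the encoders on the training split only, then reuse them on the test split.
X_train_enc = preprocess.fit_transform(X_train)
X_test_enc = preprocess.transform(X_test)
print(X_train_enc.shape, X_test_enc.shape)
```

Fitting the encoders on the training split only keeps information from the test set out of the preprocessing step.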

🛠️ Technologies Used

  • Python
  • Pandas
  • NumPy
  • Scikit-Learn
  • Plotly
  • Jupyter Notebook
  • Google Colab

🔧 Model Details

  • Algorithm: Decision Tree Regression
  • Tuning:
    • max_depth
    • min_samples_leaf
    • min_samples_split
  • Scoring: Negative Mean Squared Error (MSE)
  • Target: Income (log-transformed during training)
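
A tuning setup along these lines might look like the sketch below. The grid values, the number of folds, and the synthetic data are illustrative assumptions, not the values used in the notebooks; only the estimator, the three tuned parameters, and the scoring metric come from the list above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): four numeric features, skewed income.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
income = np.exp(10 + 0.3 * X[:, 0] + rng.normal(scale=0.5, size=300))
y_log = np.log1p(income)  # train on the log-transformed target

X_train, X_test, y_train, y_test = train_test_split(
    X, y_log, test_size=0.2, random_state=42
)

# Illustrative grid; the notebooks may search different values.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 20],
    "min_samples_split": [2, 10, 50],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
    cv=5,
)
search.fit(X_train, y_train)

# Predictions come back in log space; expm1 maps them to income units.
pred_income = np.expm1(search.predict(X_test))
print(search.best_params_)
print(-search.best_score_)  # mean cross-validated MSE in log space
```

Scikit-Learn negates the MSE so that larger scores are always better, which is why `best_score_` is zero or negative here.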

📈 Results

Due to the synthetic nature of the dataset, the model shows:

  • Training R²: low
  • Testing R²: low

Low R² on both the training and test sets indicates underfitting: the tree cannot find strong relationships between the features and income in this dataset.

Despite this, the project demonstrates a clean, end-to-end ML pipeline suitable for learning and experimentation.


📁 Files Included

  • data.csv → Dataset used for training and testing the model
  • Income Prediction Project No LogTransformation.ipynb → Main notebook containing the full ML workflow without log transformation (the higher-scoring version)
  • Income Prediction with Log Trasnfromation.ipynb → Second version of the notebook containing the full ML workflow with log transformation applied (the lower-scoring version)
  • README.md → Project documentation

📈 Results

Due to the synthetic nature of the dataset and the selected model (Decision Tree Regression), the model underfits, with low R² scores in both notebooks.

  • Income Prediction Project No LogTransformation.ipynb → R² = 1.68%
  • Income Prediction Project with Log Trasnfromation.ipynb → R² = -8.54%

This demonstrates a realistic challenge when datasets lack strong feature-target relationships.
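
A negative R² can look surprising. R² is defined as 1 - SS_res / SS_tot, so a model that always predicts the mean of the target scores exactly 0, and any model whose squared error is larger than that baseline scores below 0. The sketch below illustrates this with made-up numbers; the values are not taken from the notebooks.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up incomes for illustration only.
y_true = np.array([30_000.0, 45_000.0, 60_000.0, 75_000.0])

# Baseline: always predict the mean of the true values.
mean_baseline = np.full_like(y_true, y_true.mean())

# Predictions that are further from the truth than the mean baseline.
poor_pred = np.array([70_000.0, 40_000.0, 35_000.0, 50_000.0])

r2_baseline = r2_score(y_true, mean_baseline)  # 0 by definition
r2_poor = r2_score(y_true, poor_pred)          # below 0: worse than the mean
rmse_poor = np.sqrt(mean_squared_error(y_true, poor_pred))

print(r2_baseline, r2_poor, rmse_poor)
```

When the target is log-transformed, it matters whether R² is computed in log space or after converting predictions back to income units, since the two can differ noticeably.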


🎯 Future Improvements

  • Try ensemble models: RandomForest, GradientBoosting, XGBoost
  • Use a more realistic dataset
  • Apply advanced feature engineering to extract meaningful patterns
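
As a starting point for the first item, a single tree could be compared against two Scikit-Learn ensembles using cross-validated R². This is a sketch under assumed synthetic data and default-like settings, not a result from this project; XGBoost is left out because it requires a separate package.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): a target with some real signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=300)

models = {
    "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(n_estimators=50, random_state=42),
}

# Mean R² over 5 folds for each model, evaluated on identical data.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean R² = {score:.3f}")
```

If the features carry little signal, as in the current dataset, ensembles are unlikely to raise R² much on their own, which is why the other two items matter as well.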

👤 Author

Developed by Samir Mohamed as part of a regression machine learning practice project.
