Housing sale price regressor

The objective of this project is to build an accurate sale price estimation tool for a real estate company, working through a full data science project in the process.

Overview

- Exploratory Data Analysis of housing data (Insights):

Missing values handling
All numerical variables
Distribution of numerical variables
Categorical variables
Cardinality of categorical variables
Outliers
Relationships between dependent and independent features
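
The checks listed above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: the column names (`LotArea`, `Neighborhood`), the toy values and the 1.5 x IQR outlier rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column overview: dtype, share of missing values, and cardinality."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_pct": df.isna().mean() * 100,
        "n_unique": df.nunique(dropna=True),
    })


def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN is never flagged."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Hypothetical toy data standing in for the raw housing set.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 215000, np.nan],
    "Neighborhood": ["A", "A", "B", "B", "C", None],
})

summary = summarize_columns(toy)
outliers = iqr_outlier_mask(toy["LotArea"])
print(summary)
print(toy[outliers])
```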

- Feature engineering framework:

Splitting the data into train and test sets with a stratified sampling technique to preserve the distributions found in the housing data
Creation of an advanced custom transformation pipeline with helpful tools such as a median imputer, log normalization, custom column modifiers, a rare category handler, a NaN dropper, a categorical variable encoder and a variable scaler, so that future raw sets can be cleaned and transformed quickly
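
One common way to stratify a split for a regression problem is to bin a continuous column into quantiles and stratify on the bins. The sketch below assumes the stratification variable is the target, here called `SalePrice`, and uses 5 quantile bins; the README does not say which variable or how many bins the project used, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (column names are assumptions).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "area": rng.normal(1500, 400, 200),
    "SalePrice": rng.lognormal(12, 0.4, 200),
})

# Bin the continuous target into 5 equal-sized quantile groups.
price_bin = pd.qcut(df["SalePrice"], q=5, labels=False)

# Stratify on the bins so train and test share the same price distribution.
train, test = train_test_split(
    df, test_size=0.2, stratify=price_bin, random_state=42
)

# Each price bin should be equally represented in the test set.
print(price_bin.loc[test.index].value_counts().sort_index())
```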

Note: I could have used scikit-learn tools such as its imputer or OrdinalEncoder; however, to add more value to the project and have more control, I defined my own cleaning functions.
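
As an illustration of what such custom cleaning steps could look like, here is a minimal sketch of two of them written as scikit-learn compatible transformers and chained in a `Pipeline`. The class names (`MedianImputer`, `RareCategoryHandler`), the `min_freq` threshold, the `"Rare"` label and the toy columns are hypothetical and are not taken from the project's notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class MedianImputer(TransformerMixin, BaseEstimator):
    """Fill NaN in the given numeric columns with medians learned in fit."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        self.medians_ = X[self.columns].median()
        return self

    def transform(self, X):
        X = X.copy()
        X[self.columns] = X[self.columns].fillna(self.medians_)
        return X


class RareCategoryHandler(TransformerMixin, BaseEstimator):
    """Replace categories seen in less than min_freq of the rows with a label."""

    def __init__(self, columns, min_freq=0.05, label="Rare"):
        self.columns = columns
        self.min_freq = min_freq
        self.label = label

    def fit(self, X, y=None):
        self.frequent_ = {}
        for col in self.columns:
            freq = X[col].value_counts(normalize=True)
            self.frequent_[col] = set(freq[freq >= self.min_freq].index)
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.columns:
            # Keep frequent categories and NaN; everything else becomes the label,
            # including categories never seen during fit.
            keep = X[col].isin(self.frequent_[col]) | X[col].isna()
            X[col] = X[col].where(keep, self.label)
        return X


pipe = Pipeline([
    ("impute", MedianImputer(columns=["LotArea"])),
    ("rare", RareCategoryHandler(columns=["Zone"], min_freq=0.3)),
])

raw_train = pd.DataFrame({
    "LotArea": [1000.0, 2000.0, np.nan, 4000.0],
    "Zone": ["R", "R", "R", "C"],
})
raw_new = pd.DataFrame({"LotArea": [np.nan], "Zone": ["X"]})

clean_train = pipe.fit_transform(raw_train)
clean_new = pipe.transform(raw_new)  # reuses medians and categories from fit
```

Because the statistics are learned in `fit` and only applied in `transform`, a future raw set is cleaned with the training-set medians and category lists, which avoids leaking information from new data.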

- Feature selection framework:

Simple filter methods such as chi-square and the correlation matrix are used to get an initial view of the features.
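
A correlation-based filter can be sketched as ranking numeric features by their absolute Pearson correlation with the target. The column names and values below are made up for the example, and the chi-square test (which applies to categorical features) is not covered here.

```python
import pandas as pd


def rank_by_correlation(df: pd.DataFrame, target: str) -> pd.Series:
    """Rank numeric features by absolute Pearson correlation with the target."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.abs().sort_values(ascending=False)


# Hypothetical toy data: x1 is perfectly correlated with the target, x2 weakly.
toy = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],
    "x2": [5, 1, 4, 2, 3],
    "price": [2, 4, 6, 8, 10],
})

ranking = rank_by_correlation(toy, target="price")
print(ranking)
```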

Different embedded regression methods are used to determine feature importance and select the best features to feed our models:
Linear Regression
Decision Tree Regression
Random Forest Regression
XGBoost Regression
Permutation importance

I propose 2 ways of selecting the best features: the first is simply selecting the features with the highest total count of top occurrences across the models; the second is selecting the features that appeared among the top features in at least 2 different models.
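
The two selection rules can be sketched as a simple vote over each model's top-k features. This is one possible reading of the rules described above; the function names, the value of `k` and the importance scores are invented for the illustration and are not project results.

```python
from collections import Counter


def top_k(importances: dict[str, float], k: int) -> list[str]:
    """Names of the k features with the highest importance score."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def count_votes(per_model: dict[str, dict[str, float]], k: int) -> Counter:
    """Count in how many models each feature lands in the top k."""
    votes = Counter()
    for importances in per_model.values():
        votes.update(top_k(importances, k))
    return votes


def select_by_total_count(votes: Counter, n: int) -> list[str]:
    """Rule 1: keep the n features with the most top-k appearances."""
    return [name for name, _ in votes.most_common(n)]


def select_by_min_models(votes: Counter, min_models: int = 2) -> list[str]:
    """Rule 2: keep features that are top-k in at least min_models models."""
    return sorted(name for name, c in votes.items() if c >= min_models)


# Made-up importance scores per model, for illustration only.
per_model = {
    "linear": {"area": 0.5, "quality": 0.3, "garage": 0.1, "year": 0.1},
    "tree": {"quality": 0.6, "area": 0.2, "year": 0.15, "garage": 0.05},
    "forest": {"quality": 0.5, "year": 0.3, "area": 0.15, "garage": 0.05},
}

votes = count_votes(per_model, k=2)
by_count = select_by_total_count(votes, n=2)
by_models = select_by_min_models(votes, min_models=2)
```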

- Model selection, training and tuning:

Different algorithms are trained and compared to find the one with the best metrics and results for this problem

Models proposed:

Linear Regressor
Decision Tree Regressor
Random Forest Regressor
Support Vector Machine Regressor
XGBoost Regressor
K-Nearest Neighbors Regressor
Lasso Regressor
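
A comparison of this kind can be sketched with cross-validation in scikit-learn. The example below is an assumption-laden illustration: it runs on synthetic data from `make_regression`, uses default hyperparameters, scores with R² over 5 folds, and leaves out the boosted model because it lives in the separate `xgboost` package.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the processed housing set.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=1.0),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
    # Distance- and kernel-based models are scaled first.
    "svr": make_pipeline(StandardScaler(), SVR()),
    "knn": make_pipeline(StandardScaler(), KNeighborsRegressor()),
}

# Mean R^2 over 5 folds for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} mean R^2 = {score:.3f}")
```

The best candidates from such a comparison would then typically be tuned further, for example with `GridSearchCV` or `RandomizedSearchCV`.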

- Results summary

A regression tool for estimating housing sale prices was developed from scratch with excellent results, reaching 88% accuracy on the test set. All steps and assumptions were important to reach this result.

