Help Alzar, the record keeper, find the lost details of 3.5k houses with the help of machine learning.
- Clone the repository and run feature_extraction.py to create the all_data.csv dataset. (First change the paths of all files accessed in feature_extraction.py to local paths on your machine.)
- The dataset is given as text files, so preprocessing is required to convert them into a CSV file. `feature_extraction.py` extracts the data from the text files and makes `all_data.csv`.
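
As a rough illustration of this kind of text-to-CSV preprocessing, here is a minimal sketch using only the standard library. It is not the code in `feature_extraction.py`: the `name: value` record layout, the field names, the sample values and the helper names `parse_record` and `records_to_csv` are all assumptions made for the example, and the sketch is written for Python 3.

```python
import csv
import io


def parse_record(text):
    """Parse one text record of 'name: value' lines into a dict (layout assumed)."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split(":", 1)
        record[name.strip()] = value.strip()
    return record


def records_to_csv(records, out):
    """Write a list of dicts to `out` as CSV, leaving missing fields empty."""
    # Collect the union of all field names, in first-seen order.
    fieldnames = []
    for record in records:
        for name in record:
            if name not in fieldnames:
                fieldnames.append(name)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Made-up sample records; the real text files may look quite different.
samples = ["area: 1200\nrooms: 3\nprice: 250000", "area: 800\nprice: 150000"]
buffer = io.StringIO()
records_to_csv([parse_record(s) for s in samples], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Fields missing from a record are written as empty cells, which pandas would later read as NaN.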
- This project uses the xgboost regressor, so xgboost must be installed in one of these ways:
  - Using pip: `pip install xgboost`
  - Using conda: `conda install -c conda-forge py-xgboost`
- Python 2.7 is preferred for this project.
- Run `feature_extraction.py` to convert the raw text files into processed CSV files.
- Run `feature_analysis.py` in a Jupyter notebook to inspect the dataset using pandas DataFrame functions.
- Run `feature_analysis.py` in a Jupyter notebook to visualize the relations between the features and the target value with histograms, scatter plots and a heat map.
- Run `regression.py` in a Jupyter notebook to try new features, select features and fill NaN values through interpolation.
  - After this, the data is ready to be fitted to different models.
- Running `regression.py`:
  - This gives a detailed `r2_score` analysis after tuning the hyperparameters of the different types of regression.
  - This runs cross-validation across the training set on LinearRegression, Lasso regression, Ridge regression and xgboost regression and prints the `r2_score`.
- With the help of the xgboost regressor we are able to achieve an r2_score of 0.99512. `Solution.csv` is also included in the repository so that results on the test dataset can be checked against it.
  - xgboost with tuned parameters gives a final `r2_score` of 0.99553 on the test dataset.
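
The interpolation and cross-validation steps above can be sketched roughly as follows. This is a hedged illustration on synthetic data, not the contents of `regression.py`: the column names, the generated values and every hyperparameter are placeholders rather than the tuned settings behind the scores reported above, and the xgboost model is only included if the package is installed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for all_data.csv; the column names are made up.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(200, 3), columns=["f1", "f2", "f3"])
df["target"] = (3.0 * df["f1"] - 2.0 * df["f2"] + 0.5 * df["f3"]
                + rng.normal(0, 0.01, 200))

# Knock out a few feature values, then fill them by linear interpolation.
df.loc[[5, 50, 120], "f3"] = np.nan
df = df.interpolate(method="linear", limit_direction="both")

# Placeholder hyperparameters, not the tuned ones from the project.
models = {
    "LinearRegression": LinearRegression(),
    "Lasso": Lasso(alpha=0.001),
    "Ridge": Ridge(alpha=0.1),
}
try:
    from xgboost import XGBRegressor  # optional: skipped if not installed
    models["XGBRegressor"] = XGBRegressor(n_estimators=100, max_depth=3)
except ImportError:
    pass

X, y = df.drop(columns="target"), df["target"]
scores = {}
for name, model in models.items():
    # 5-fold cross-validation, scored with R^2 on each held-out fold.
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    scores[name] = fold_scores.mean()
    print("%s: mean r2_score = %.4f" % (name, scores[name]))
```

Scores from this toy data say nothing about the housing dataset; the point is only the shape of the workflow: fill the gaps, then compare models on the same folds with the same metric.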
