This project predicts house sale prices using the House Prices: Advanced Regression Techniques dataset from Kaggle. It uses TensorFlow Decision Forests (TF-DF) to train a Random Forest regression model and uses the model's feature importances to highlight the factors that most influence predicted prices.
- `data/`: Contains the training and test datasets used for model development.
- `house-prices-prediction-using-tfdf.ipynb`: Jupyter Notebook with data preprocessing, model training, evaluation, and predictions.
- `README.md`: This file, with an overview of the project.
- `submission.csv`: The final predictions for Kaggle submission.
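The notebook most likely writes `submission.csv` with pandas; as a dependency-free sketch, the hypothetical helper `write_submission` below shows the two-column `Id,SalePrice` layout used by this competition's sample submission. The Ids and prices are made-up example values, not project results.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_submission(rows: Iterable[Tuple[int, float]], out: TextIO) -> None:
    """Write (Id, SalePrice) pairs in the two-column layout Kaggle expects."""
    writer = csv.writer(out)
    writer.writerow(["Id", "SalePrice"])  # header row
    for house_id, price in rows:
        writer.writerow([house_id, f"{price:.2f}"])


# Example with made-up predictions for two illustrative test-set Ids.
buffer = io.StringIO()
write_submission([(1461, 120000.0), (1462, 155000.5)], buffer)
print(buffer.getvalue())
```

To write the real file, open it with `open("submission.csv", "w", newline="")` and pass the handle as `out`; `newline=""` is what the `csv` module documentation recommends for file objects.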
- Data Preprocessing:
  - Handle mixed data types (numerical and categorical).
  - Split the training data 70-30 into training and validation sets.
- Model Development:
  - Built using TensorFlow Decision Forests (Random Forest).
  - Out-of-bag (OOB) evaluation for performance monitoring.
- Evaluation:
  - RMSE (Root Mean Squared Error).
  - Feature importance analysis.
- Visualization:
  - Plots for feature distributions and evaluation metrics.
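The two validation ideas above can be illustrated with the standard library alone. This is a sketch, not the notebook's code: the notebook may split with NumPy or pandas instead, and TF-DF computes OOB estimates internally during training. The helper names `train_valid_split` and `out_of_bag_indices` are hypothetical. The second function shows why OOB evaluation works: each tree sees a bootstrap sample, and the rows it never drew (about 1/e, roughly 37%) act as a built-in validation set for that tree.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_valid_split(
    rows: Sequence[T], valid_ratio: float = 0.30, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle row indices and hold out `valid_ratio` of them for validation."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_valid = round(len(rows) * valid_ratio)
    valid_idx = set(indices[:n_valid])
    train = [r for i, r in enumerate(rows) if i not in valid_idx]
    valid = [r for i, r in enumerate(rows) if i in valid_idx]
    return train, valid


def out_of_bag_indices(n: int, rng: random.Random) -> List[int]:
    """Draw one bootstrap sample of size n (with replacement); return the indices it missed."""
    in_bag = {rng.randrange(n) for _ in range(n)}
    return [i for i in range(n) if i not in in_bag]


rows = list(range(1460))  # stand-in for the 1460 training houses
train, valid = train_valid_split(rows)
print(len(train), len(valid))  # 1022 438

oob = out_of_bag_indices(len(train), random.Random(0))
print(f"OOB fraction for one tree: {len(oob) / len(train):.3f}")  # typically close to 0.37
```

A fixed seed keeps the split reproducible between runs, which matters when comparing validation RMSE across experiments.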
- Source: Kaggle - House Prices: Advanced Regression Techniques
- Training Data: 1460 houses with 79 features.
- Test Data: 1459 houses without sale prices (these are the values to predict).
- Target Variable: `SalePrice` (house sale price).
- Numerical features (examples): `LotArea`, `GrLivArea`, `YearBuilt`.
- Categorical features (examples): `MSZoning`, `Neighborhood`, `HouseStyle`.
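To make the numerical/categorical distinction concrete, here is a minimal sketch that labels each CSV column as numerical when every non-missing value parses as a number. It assumes missing values appear as empty strings or `"NA"`, and the two sample rows are illustrative values in the dataset's column layout; `classify_columns` is a hypothetical helper, not part of TF-DF.

```python
import csv
import io
from typing import Dict, List

MISSING = {"", "NA"}  # assumption: missing values appear as empty strings or "NA"


def _is_number(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


def classify_columns(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """Label a column 'numerical' if every non-missing value parses as a number."""
    kinds: Dict[str, str] = {}
    for column in rows[0]:
        values = [row[column] for row in rows if row[column] not in MISSING]
        # Columns with no observed values fall back to 'categorical'.
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        kinds[column] = "numerical" if is_numeric else "categorical"
    return kinds


sample = io.StringIO(
    "LotArea,GrLivArea,YearBuilt,MSZoning,Neighborhood,HouseStyle\n"
    "8450,1710,2003,RL,CollgCr,2Story\n"
    "9600,1262,1976,RL,Veenker,1Story\n"
)
rows = list(csv.DictReader(sample))
print(classify_columns(rows))
```

TF-DF can consume string-valued categorical columns directly, so a check like this is mainly useful during exploration. A purely type-based rule also has limits: `MSSubClass`, for example, stores dwelling-type codes as integers even though the data description treats it as categorical.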
- Framework: TensorFlow Decision Forests (TF-DF)
- Language: Python
- Libraries:
  - Pandas for data manipulation.
  - Matplotlib for visualizations.
  - NumPy for numerical computations.
- Clone the repository:
  `git clone https://github.com/BandaAkshith/House-Prices-Prediction-using-TFDF.git`
- Navigate to the project directory:
  `cd House-Prices-Prediction-using-TFDF`
- Install the required libraries:
  `pip install -r requirements.txt`
- Open the Jupyter Notebook:
  `jupyter notebook house-prices-prediction-using-tfdf.ipynb`
- Follow the steps in the notebook to load data, train the model, and make predictions.
- Achieved an RMSE of <add your result> on the validation set.
- Feature importance analysis ranked `OverallQual`, `GrLivArea`, and `GarageCars` as the most important features for predicting house prices.
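For reference, RMSE is the square root of the mean squared difference between actual and predicted prices. This competition's leaderboard scores submissions on RMSE between the logarithms of predicted and observed prices, so a validation number computed on raw dollars is not directly comparable to a Kaggle score. The sketch below shows both; the hypothetical helpers `rmse` and `log_rmse` and the prices are made-up illustrations, not the project's results.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def log_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE between log(actual) and log(predicted) prices; all values must be positive."""
    return rmse([math.log(t) for t in y_true], [math.log(p) for p in y_pred])


actual = [200000.0, 150000.0]
predicted = [210000.0, 140000.0]
print(rmse(actual, predicted))  # 10000.0
print(log_rmse(actual, predicted))
```

The log version weights a $10,000 miss on a cheap house more heavily than the same miss on an expensive one, which is why the two metrics can rank models differently.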
- Experiment with additional models (e.g., Gradient Boosting, XGBoost).
- Optimize hyperparameters for improved accuracy.
- Explore additional feature engineering techniques.
Contributions are welcome! Feel free to open an issue or submit a pull request for enhancements or bug fixes.
- Kaggle, for providing the dataset.
- The TensorFlow Decision Forests team, for their fantastic library.