An advanced machine learning pipeline for predicting house prices using ensemble learning, automated hyperparameter optimization, and feature engineering. The project applies production-oriented ML practices, including custom scikit-learn transformers, statistical outlier detection, and automated model selection.
- LogTransformer: Skewness reduction with sparse feature detection
- OutliersRemoval: Statistical outlier elimination for training data
- CatEncoder: Combined one-hot and ordinal encoding with missing value handling
- TotalArea: Domain-specific feature combination (basement + ground floor)
- TotalBaths: Weighted bathroom counting (full=1.0, half=0.5)
- HighlyCorrelatedFeatures: Automated multicollinearity reduction
- MedianImputer: Robust missing value imputation
- AgeCalculator: Temporal feature engineering (house age from build/sale years)
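
To make the transformer descriptions above concrete, here is a minimal sketch of how two of them (TotalBaths and AgeCalculator) might look as scikit-learn transformers. It is not the project's actual code: the column names (`FullBath`, `HalfBath`, `YearBuilt`, `YrSold`), the constructor parameters, and the output column names are assumptions chosen for illustration. Only the weights (full=1.0, half=0.5) and the build/sale-year idea come from the list above.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class TotalBaths(BaseEstimator, TransformerMixin):
    """Add a weighted bathroom count (full=1.0, half=0.5)."""

    def __init__(self, full_col="FullBath", half_col="HalfBath"):
        # Column names are assumptions, not taken from the project.
        self.full_col = full_col
        self.half_col = half_col

    def fit(self, X, y=None):
        # Stateless transformer: nothing is learned from the data.
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame
        X["TotalBaths"] = 1.0 * X[self.full_col] + 0.5 * X[self.half_col]
        return X


class AgeCalculator(BaseEstimator, TransformerMixin):
    """Add the house age at sale time (sale year minus build year)."""

    def __init__(self, built_col="YearBuilt", sold_col="YrSold"):
        # Column names are assumptions, not taken from the project.
        self.built_col = built_col
        self.sold_col = sold_col

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        X["HouseAge"] = X[self.sold_col] - X[self.built_col]
        return X


# Tiny made-up frame, used only to exercise the two transformers.
houses = pd.DataFrame(
    {
        "FullBath": [2, 1],
        "HalfBath": [1, 0],
        "YearBuilt": [2000, 1975],
        "YrSold": [2010, 2008],
    }
)
features = AgeCalculator().fit_transform(TotalBaths().fit_transform(houses))
```

Because both classes implement `fit` and `transform`, they can also be chained as steps in a `sklearn.pipeline.Pipeline` rather than called one after another as shown here.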
- Optuna Objective Function: Automated ensemble discovery with cross-validation scoring
- Performance Metrics: RMSE, R², MAE for comprehensive evaluation
- Residual Analysis: Error distribution examination for model validation
- Visualization: Optimization history and parameter importance plots
- Training Pipeline: Includes outlier removal and feature engineering
- Validation/Test Pipeline: Consistent preprocessing without outlier removal
- Missing Value Handling: Differential imputation
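
The split between the training and validation/test pipelines can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the 1.5 × IQR rule stands in for whatever statistical test OutliersRemoval actually uses, and the function names, the `training` flag, and the column names (`GrLivArea`, `LotFrontage`) are hypothetical. The point it shows is that outlier rows are dropped only from training data, while imputation values learned from training data are reused unchanged on validation data.

```python
import pandas as pd


def remove_outliers_iqr(df, column, k=1.5):
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumed stand-in for the project's outlier test.
    """
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


def preprocess(df, medians, training=False):
    """Apply the same imputation everywhere; drop outliers only when training."""
    if training:
        df = remove_outliers_iqr(df, "GrLivArea")
    # Fill missing values with medians computed from the training data.
    return df.fillna(medians)


# Made-up data: one extreme living area in each split, one missing frontage.
train = pd.DataFrame(
    {
        "GrLivArea": [1000.0, 1100.0, 1200.0, 1300.0, 9000.0],
        "LotFrontage": [60.0, None, 70.0, 80.0, 65.0],
    }
)
valid = pd.DataFrame(
    {
        "GrLivArea": [8000.0, 1150.0],
        "LotFrontage": [None, 75.0],
    }
)

medians = train.median()  # learned from training data only
train_ready = preprocess(train, medians, training=True)
valid_ready = preprocess(valid, medians, training=False)
```

Keeping extreme rows in the validation and test sets means the reported metrics reflect the data the model will actually see, instead of a cleaned subset.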
pip install -r requirements.txt