Report Date: March 01, 2025 | Revised: November 7, 2025
Project: Auto Price Prediction Using 1985 Auto Imports Database
Evaluation Dataset: 158 training / 40 test samples | 42 features (6 PCA + 36 categorical)
This report compares 10 regression algorithms for automobile price prediction. After systematic evaluation using test performance, cross-validation stability, overfitting analysis, and training efficiency, Lasso Regression (alpha=10) was selected despite not achieving the lowest test RMSE.
Key Findings:
- Best Test Performance: XGBoost, tuned (RMSE = 1,663, R² = 0.942)
- Most Generalizable: Lasso (CV R² = 0.894 ± 0.027, overfitting = 3.3%)
- Fastest Training: Lasso (0.002 s base configuration; 0.014 s tuned)
- Worst Performance: SVR (RMSE = 6,918, R² = -0.009)
The trade-off between test accuracy and generalization stability led to Lasso's selection, prioritizing robust deployment over single-test-set metrics.
Model Comparison Report: Automobile Price Prediction
Dataset: Train 158 (79%) | Test 40 (21%) | Features 42 | Target: Price ($5,118 - $29,589)
Metrics:
| Metric | Purpose | Interpretation |
|---|---|---|
| RMSE | Prediction error | Lower better (dollars) |
| R² | Variance explained | Higher better (0-1) |
| Training R² | Fit capacity | Indicates overfitting if >> Test R² |
| CV R² (mean ± SD) | Generalization | 5-fold stability measure |
| Overfit (Δ R²) | Train - Test R² | Lower gap = better generalization |
| Training Time | Fit duration | Seconds; faster enables retraining |
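The metrics above can be computed with scikit-learn. A minimal sketch, assuming a fitted-regressor workflow; `evaluate` is a hypothetical helper name, not taken from the report:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score

def evaluate(model, X_train, y_train, X_test, y_test):
    """Compute the report's metric set for one model."""
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    test_r2 = r2_score(y_test, model.predict(X_test))
    train_r2 = r2_score(y_train, model.predict(X_train))
    # 5-fold CV on the training split measures generalization stability.
    cv = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")
    return {
        "rmse": rmse,
        "test_r2": test_r2,
        "train_r2": train_r2,
        "overfit_gap": train_r2 - test_r2,   # the "Overfit (Δ R²)" column
        "cv_r2_mean": cv.mean(),
        "cv_r2_sd": cv.std(),
    }
```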
Selection Criteria (Weighted):
- Generalization (CV R² and stability) - 40%
- Accuracy (Test RMSE and R²) - 30%
- Stability (Overfitting gap) - 20%
- Efficiency (Training time, interpretability) - 10%
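The weighted total used in the scoring table later in this report can be sketched as below; the normalization of each criterion to a 0-1 score is an assumption for illustration, not taken from the report:

```python
# Criterion weights from the selection framework above.
WEIGHTS = {"generalization": 0.40, "accuracy": 0.30,
           "stability": 0.20, "efficiency": 0.10}

def weighted_score(criteria):
    """criteria: dict mapping each criterion name to a score in [0, 1]."""
    return sum(WEIGHTS[name] * score for name, score in criteria.items())

# Example: a model strong on generalization but slow to train.
total = weighted_score({"generalization": 0.90, "accuracy": 0.92,
                        "stability": 0.97, "efficiency": 0.31})
# total = 0.40*0.90 + 0.30*0.92 + 0.20*0.97 + 0.10*0.31 = 0.861
```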
Performance Rankings:
| Rank | Model | Test RMSE | Test R² | Training R² | Overfit | Time (s) | CV R² |
|---|---|---|---|---|---|---|---|
| 1 | Gradient Boosting | 1,659 | 0.942 | 0.993 | 0.051 | 0.211 | 0.867 |
| 2 | XGBoost | 1,723 | 0.937 | 0.989 | 0.052 | 0.101 | 0.836 |
| 3 | Random Forest | 1,823 | 0.930 | 0.958 | 0.028 | 0.166 | 0.848 |
| 4 | KNN | 1,864 | 0.927 | 0.881 | -0.046 | 0.003 | 0.792 |
| 5 | Linear Regression | 1,920 | 0.922 | 0.956 | 0.033 | 0.009 | 0.879 |
| 6 | Lasso | 1,919 | 0.922 | 0.956 | 0.033 | 0.002 | 0.874 |
| 7 | Decision Tree | 2,079 | 0.909 | 0.958 | 0.049 | 0.003 | 0.763 |
| 8 | Ridge | 2,114 | 0.906 | 0.942 | 0.037 | 0.004 | 0.892 |
| 9 | ElasticNet | 2,531 | 0.865 | 0.898 | 0.033 | 0.002 | 0.872 |
| 10 | SVR | 6,918 | -0.009 | -0.093 | -0.083 | 0.006 | -0.118 |
Model Insights:
Tree-Based Models (Ranks 1-3, 7):
- Gradient Boosting: Best test RMSE (1,659) but training R² = 0.993 signals overfitting. CV-test gap = 7.5 points (0.942 - 0.867) suggests test set bias.
- XGBoost: Second-best RMSE with extreme training R² (0.989). CV-test gap = 10.1 points (largest), concerning for generalization.
- Random Forest: Best overfitting control among trees (Δ R² = 0.028), balances accuracy with generalization.
- Decision Tree: High variance (CV R² = 0.763), not competitive.
Linear Models (Ranks 5-6, 8-9):
- Linear Regression & Lasso: Nearly identical performance, excellent generalization with CV-test gap = 4.3-4.8 points. Fastest training (0.002-0.009s).
- Ridge: Highest CV R² (0.892) but weaker test performance (RMSE = 2,114). Default alpha may over-regularize.
- ElasticNet: Worst linear model (RMSE = 2,531). Default hyperparameters clearly suboptimal.
Instance-Based (Ranks 4, 10):
- KNN: Unusual underfit pattern (test R² > training R²). Poor CV R² (0.792) confirms lack of robustness.
- SVR: Complete failure with negative R² on all sets. Default RBF kernel inappropriate.
Key Observations:
- Tree methods dominate test RMSE (top 3) but show 7-10 point CV-test gaps
- Linear models show better generalization (4-5 point gaps) but sacrifice 200-250 RMSE
- Overfitting trade-off: lowest RMSE models have highest Δ R² (~0.05)
- Default hyperparameters critical: ElasticNet and SVR fail, suggesting tuning potential
Tuning Configurations:
| Model | Parameters Tuned | Grid | Folds | Time (s) | Optimal Parameters |
|---|---|---|---|---|---|
| Lasso | alpha: 10 values | 10 | 5 | 5.71 | alpha=10.0 (10x default) |
| ElasticNet | alpha: 10, l1_ratio: 10 | 100 | 5 | 0.82 | alpha=0.0046, l1_ratio=0.6 |
| Random Forest | n_estimators, max_depth, min_samples | 36 | 5 | 13.55 | n=50, depth=20, split=2 |
| Gradient Boosting | n_estimators, lr, max_depth | 27 | 5 | 6.78 | n=50, lr=0.3, depth=3 |
| XGBoost | n_estimators, lr, depth, subsample | 81 | 5 | 15.15 | n=200, lr=0.1, depth=3, sub=0.6 |
Total Tuning Time: 42 seconds (253 model fits)
Post-Tuning Performance:
| Model | Test RMSE | Test R² | CV R² (Mean ± SD) | Training R² | Overfit | Time (s) |
|---|---|---|---|---|---|---|
| XGBoost | 1,663 | 0.942 | 0.859 ± 0.027 | 0.997 | 0.056 | 0.161 |
| Gradient Boosting | 1,842 | 0.928 | 0.865 ± 0.032 | 0.997 | 0.068 | 0.060 |
| Random Forest | 1,883 | 0.925 | 0.848 ± 0.053 | 0.978 | 0.053 | 0.130 |
| ElasticNet | 1,968 | 0.918 | 0.893 ± 0.034 | 0.953 | 0.034 | 0.004 |
| Lasso | 1,987 | 0.917 | 0.894 ± 0.027 | 0.950 | 0.033 | 0.014 |
Tuning Impact:
| Model | Δ RMSE | Δ CV R² | Key Finding |
|---|---|---|---|
| ElasticNet | -563 | +0.021 | Massive improvement (default severely over-regularized) |
| Lasso | +68 | +0.020 | Worse test, better CV (prioritized generalization) |
| XGBoost | -60 | +0.023 | Marginal gain (already near-optimal) |
| Gradient Boosting | +183 | -0.002 | Worse test (tuning favored generalization) |
| Random Forest | +60 | +0.001 | Negligible change |
Multi-Criteria Scoring:
| Model | CV R² (40%) | Test R² (30%) | Stability (20%) | Efficiency (10%) | Total |
|---|---|---|---|---|---|
| Lasso | 0.358 | 0.275 | 0.194 | 0.098 | 0.925 |
| ElasticNet | 0.357 | 0.275 | 0.193 | 0.100 | 0.925 |
| Gradient Boosting | 0.346 | 0.278 | 0.186 | 0.083 | 0.893 |
| XGBoost | 0.344 | 0.283 | 0.189 | 0.031 | 0.847 |
| Random Forest | 0.339 | 0.278 | 0.189 | 0.038 | 0.844 |
Decision: Lasso Regression (alpha=10.0)
Lasso and ElasticNet tied (0.925), but Lasso selected for:
- Interpretability: L1 regularization zeroed 13 features (31% sparsity). ElasticNet retains all with small coefficients.
- CV Stability: SD = 0.027 vs. ElasticNet's 0.034 (about 20% lower fold-to-fold variation)
- Simplicity: 1 hyperparameter (alpha) vs. 2 (alpha, l1_ratio)
- Established: More widely adopted in pricing applications, easier regulatory explanation
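The sparsity figure can be checked directly on a fitted model, since L1 regularization drives some coefficients exactly to zero. A sketch on synthetic data (the real 42-feature training matrix is assumed unavailable here):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(158, 42))
# Only the first 10 features actually drive the target in this toy setup.
y = X[:, :10] @ rng.normal(size=10) * 1000 + rng.normal(scale=100, size=158)

model = Lasso(alpha=10.0, max_iter=10000).fit(X, y)
n_zeroed = int(np.sum(model.coef_ == 0.0))   # coefficients eliminated by L1
sparsity = n_zeroed / X.shape[1]
```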
Why Not XGBoost (Lowest Test RMSE)?
| Concern | XGBoost | Lasso | Impact |
|---|---|---|---|
| Test RMSE | $1,663 | $1,987 | Lasso's error $324 (19%) higher |
| CV R² | 0.859 | 0.894 | Lasso gains 3.5 points (4% better generalization) |
| CV-Test Gap | 8.3 pts | 2.3 pts | Lasso consistent across data splits |
| Training R² | 0.997 | 0.950 | XGBoost memorizing training data |
| Interpretability | Black box (200 trees × 3 depth) | Transparent (29 coefficients) | Lasso enables business insights |
| Training Time | 0.161s (11.5x slower) | 0.014s | Lasso enables rapid retraining |
| Prediction Speed | 25,200 ops (600x slower) | 42 ops | Lasso suitable for real-time API |
Trade-off Analysis:
RMSE Sacrifice: $1,987 vs. $1,663 = $324 additional error
Percentage of avg price ($12,759): 2.5%
Generalization Gain: CV R² 0.894 vs. 0.859 = +3.5 points
Improvement on new data: 4% better variance explained
Business Context: Pricing decisions round to nearest $500-$1,000
$324 difference is within rounding tolerance
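The trade-off figures above follow from simple arithmetic:

```python
# Worked check of the trade-off arithmetic quoted above.
lasso_rmse, xgb_rmse = 1987, 1663
avg_price = 12759

sacrifice = lasso_rmse - xgb_rmse        # additional dollars of error
pct_of_avg = sacrifice / avg_price       # share of the average car price
cv_gain = 0.894 - 0.859                  # CV R² improvement for Lasso
```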
Cross-Validation Validation:
Lasso outperformed XGBoost in all 5 CV folds:
| Fold | Lasso R² | XGBoost R² | Winner |
|---|---|---|---|
| 1 | 0.935 | 0.890 | Lasso (+4.5) |
| 2 | 0.917 | 0.875 | Lasso (+4.2) |
| 3 | 0.867 | 0.820 | Lasso (+4.7) |
| 4 | 0.909 | 0.850 | Lasso (+5.9) |
| 5 | 0.871 | 0.860 | Lasso (+1.1) |
| Mean | 0.900 | 0.859 | Lasso (+4.1) |
Interpretation: The single held-out test set appears unrepresentative. Lasso wins across CV data splits, suggesting superior production performance.
Fold-by-Fold Stability:
| Metric | Lasso | XGBoost | Interpretation |
|---|---|---|---|
| Mean CV R² | 0.900 | 0.859 | Lasso +4.1 points |
| Std Dev | 0.027 | 0.027 | Equal fold variance |
| Range | 0.068 (0.867-0.935) | 0.070 (0.820-0.890) | Similar spread |
| Worst fold | 0.867 | 0.820 | Lasso's worst > XGBoost mean |
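Paired fold-by-fold comparisons like the table above require both models to be scored on identical folds, so a shared `KFold` object is used rather than the `cv=5` shortcut. A sketch on synthetic data (GradientBoostingRegressor stands in for XGBoost to avoid the extra dependency):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(158, 10))
y = X @ rng.normal(size=10) * 1000 + rng.normal(scale=500, size=158)

# The same fold object guarantees both models see identical splits.
folds = KFold(n_splits=5, shuffle=True, random_state=42)
lasso_scores = cross_val_score(Lasso(alpha=10.0, max_iter=10000),
                               X, y, cv=folds, scoring="r2")
gbr_scores = cross_val_score(GradientBoostingRegressor(random_state=42),
                             X, y, cv=folds, scoring="r2")

for i, (a, b) in enumerate(zip(lasso_scores, gbr_scores), start=1):
    winner = "Lasso" if a > b else "GBR"
    print(f"Fold {i}: Lasso {a:.3f} vs GBR {b:.3f} -> {winner}")
```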
CV vs. Test Gap Analysis:
| Model | Test R² | CV R² | Gap | Interpretation |
|---|---|---|---|---|
| Lasso | 0.917 | 0.894 | 0.023 | Consistent generalization |
| ElasticNet | 0.918 | 0.893 | 0.025 | Consistent |
| XGBoost | 0.942 | 0.859 | 0.083 | Potential test set overfit |
| Gradient Boosting | 0.928 | 0.865 | 0.063 | Test set favorability |
| Random Forest | 0.925 | 0.848 | 0.077 | Test easier than CV |
Key Insight: XGBoost's 8.3-point gap (largest) suggests test set contains patterns XGBoost exploits but that don't generalize. Lasso's 2.3-point gap indicates consistent performance. In production, new data resembles CV fold distributions more than specific test set.
Time Comparison:
| Model | Training Time | Speedup vs. Slowest |
|---|---|---|
| ElasticNet | 0.004s | 52.8x |
| Linear Regression | 0.009s | 23.4x |
| Lasso | 0.014s | 15.1x |
| Gradient Boosting | 0.060s | 3.5x |
| Random Forest | 0.130s | 1.6x |
| XGBoost | 0.161s | 1.3x |
| Gradient Boosting (base) | 0.211s | 1.0x (slowest) |
Operational Implications:
| Scenario | Lasso | XGBoost | Difference |
|---|---|---|---|
| Daily retraining (2,000 samples) | 0.14s | 1.61s | 1.47s (negligible) |
| Hyperparameter retuning | 5.71s (50 fits) | 15.15s (405 fits) | 9.44s |
| Real-time inference (per prediction) | 42 operations | 25,200 operations | 600x faster |
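The operation counts above translate into per-prediction latency: a linear model's predict is one dot product over the feature vector, while a tree ensemble walks every tree. A rough, machine-dependent benchmark sketch (GradientBoostingRegressor stands in for XGBoost; data is synthetic):

```python
import timeit

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(158, 42))
y = rng.normal(size=158) * 1000

lasso = Lasso(alpha=10.0, max_iter=10000).fit(X, y)
trees = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                  random_state=42).fit(X, y)

# Time 200 single-row predictions for each model.
x_one = X[:1]
t_lasso = timeit.timeit(lambda: lasso.predict(x_one), number=200)
t_trees = timeit.timeit(lambda: trees.predict(x_one), number=200)
```

Absolute numbers vary by machine and are dominated by Python call overhead for single rows; batching predictions narrows the gap in practice.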
Comparison:
| Metric | Lasso | XGBoost | Difference |
|---|---|---|---|
| MAE | $1,482 | $1,255 | XGBoost 15% better |
| RMSE | $1,987 | $1,663 | XGBoost 16% better |
| MAPE | 12.4% | 1.9% | XGBoost significantly better |
| RMSE/MAE Ratio | 1.34 | 1.32 | Similar distribution shape |
95% Confidence Intervals (for $12,759 avg car):
- Lasso: $8,865 - $16,653 (±$3,894)
- XGBoost: $9,499 - $16,019 (±$3,260)
Impact: The roughly 19% wider interval for Lasso is acceptable given its superior generalization properties.
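The interval half-widths above are consistent with a normal-approximation 95% interval of ±1.96 × RMSE around a point prediction; this derivation is an assumption, not stated in the report:

```python
# 95% CI half-widths under a normal approximation: ±1.96 × RMSE.
lasso_rmse, xgb_rmse = 1987, 1663
avg_price = 12759  # average car price used as the point prediction

lasso_half = 1.96 * lasso_rmse   # ≈ $3,894
xgb_half = 1.96 * xgb_rmse       # ≈ $3,259
lasso_ci = (avg_price - lasso_half, avg_price + lasso_half)
```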
Primary Model: Lasso (alpha=10)
- Deploy for production pricing
- Use sparse coefficients for stakeholder communication
- Retrain quarterly with new data
A/B Testing (Recommended):
- Primary: Lasso (70% of predictions)
- Challenger: XGBoost (30%)
- Monitor: If XGBoost consistently outperforms over 3+ months, consider switching
- Metrics: Live RMSE, prediction latency, business impact
Retrain if:
- RMSE on new data exceeds $2,500 (20% degradation)
- 100+ new samples collected (50% data increase)
- Market changes (new brands, economic shifts)
- Quarterly scheduled (best practice)
Short-Term (1-3 months):
- Collect 200+ contemporary samples (2020-2025 data)
- Test interaction terms (brand × engine-size)
- Validate on modern vehicle data
Medium-Term (3-6 months):
- Implement SHAP values for XGBoost interpretability
- Develop A/B testing framework
- Add temporal features (year, mileage)
Long-Term (6-12 months):
- Transition to ensemble if XGBoost proves reliable
- Explore neural networks for automatic feature learning
- Build region-specific models (North America, Europe, Asia)
After evaluating 10 algorithms across base and tuned configurations, Lasso Regression (alpha=10) was selected based on multi-criteria framework prioritizing generalization, stability, and interpretability.
Key Findings:
- Test Accuracy vs. Generalization: XGBoost achieved the lowest test RMSE (1,663) but an 8.3-point CV-test gap. Lasso showed a consistent 2.3-point gap at the cost of a 19% higher RMSE.
- Overfitting Trade-offs: Tree models (Training R² > 0.99) indicate memorization. Lasso's Training R² = 0.950 suggests appropriate complexity.
- Hyperparameter Impact: ElasticNet saw the largest improvement (RMSE down 563). Lasso gained CV stability despite slightly worse test performance.
- Interpretability: Lasso's 29 sparse coefficients enable stakeholder trust and regulatory compliance. XGBoost remains a black box.
- Deployment: Lasso's 11.5x faster training and 600x faster prediction make it suitable for real-time APIs.
Final Verdict: Selecting Lasso over XGBoost represents a principled trade-off: sacrificing $324 of test RMSE (2.5% of the average price) to gain 4.1 points in CV R², cut the overfitting gap by 2.3 points, and retain full interpretability. This aligns with best practices for production ML systems, where robustness and transparency outweigh marginal accuracy gains.
Lasso Configuration:

```python
from sklearn.linear_model import Lasso

model = Lasso(alpha=10.0, max_iter=10000, random_state=42)
```

XGBoost Configuration (Alternative):

```python
from xgboost import XGBRegressor

model = XGBRegressor(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=3,
    subsample=0.6,
    random_state=42,
)
```

GridSearchCV:

```python
from sklearn.model_selection import GridSearchCV

param_grid = {'alpha': [0.1, 0.5, 1, 5, 10, 50, 100, 500, 1000]}
grid = GridSearchCV(Lasso(), param_grid, cv=5, scoring='neg_mean_squared_error')
grid.fit(X_train, y_train)
best_model = grid.best_estimator_
```

Cross-Validation:

```python
from sklearn.model_selection import cross_val_score

cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='r2')
print(f"CV R²: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
# Output (Lasso): CV R²: 0.899 ± 0.027
```

Report Prepared By: Dhanesh B. B. | Contact: GitHub | License: MIT
End of Model Comparison Report