forked from scikit-learn/scikit-learn
User Request
A TypeError is raised when fitting HuberRegressor with boolean predictors. The expectation is that boolean arrays should be accepted and internally converted to float (consistent with LinearRegression).
Specification
- Root cause: `HuberRegressor.fit` does not coerce boolean inputs to floating point during validation; the downstream numerical routines (L-BFGS-B via SciPy) expect float arrays and error on bool dtype.
- Required change: Validate inputs using scikit-learn's standard utilities with `dtype=FLOAT_DTYPES` (via `check_X_y(..., dtype=FLOAT_DTYPES, ...)`). Handle `sample_weight` similarly with `check_array(..., dtype=FLOAT_DTYPES)` and `check_consistent_length`.
- File/Location: `sklearn/linear_model/huber.py`, method `HuberRegressor.fit(self, X, y, sample_weight=None)`.
- Sparse handling: Keep current behavior (`accept_sparse=['csr']`).
- Tests: Add unit tests under `sklearn/linear_model/tests/test_huber.py` to verify boolean `X` acceptance and equivalence to the float-cast input. Include a boolean `sample_weight` case, consistent with current support.
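A sketch of what the requested tests could look like; the function names are illustrative, not taken from `test_huber.py`, and they assume a `HuberRegressor` that already performs the float coercion described above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import HuberRegressor


def test_huber_accepts_bool_X():
    # Boolean X should fit without raising once inputs are coerced to float.
    X, y = make_regression(n_samples=200, n_features=2, noise=4.0,
                           random_state=0)
    X_bool = X > 0
    HuberRegressor().fit(X_bool, y)  # should not raise


def test_huber_bool_X_matches_float_cast():
    # Fitting on bool X should be equivalent to fitting on its float cast.
    X, y = make_regression(n_samples=200, n_features=2, noise=4.0,
                           random_state=0)
    X_bool = X > 0
    coef_bool = HuberRegressor().fit(X_bool, y).coef_
    coef_float = HuberRegressor().fit(X_bool.astype(float), y).coef_
    np.testing.assert_array_almost_equal(coef_bool, coef_float)
```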
Steps/Code to Reproduce

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import HuberRegressor

# Random data
X, y, coef = make_regression(n_samples=200, n_features=2, noise=4.0,
                             coef=True, random_state=0)
X_bool = X > 0
X_bool_as_float = np.asarray(X_bool, dtype=float)

# Works
huber = HuberRegressor().fit(X, y)
# Fails (!)
huber = HuberRegressor().fit(X_bool, y)
# Also works
huber = HuberRegressor().fit(X_bool_as_float, y)
```

Expected Results
No error should be thrown when the dtype of `X` is bool (`.fit(X_bool, y)` should work). Boolean arrays are expected to be converted to float by `HuberRegressor.fit`, consistent with `LinearRegression`.
Actual Results
A `TypeError` is raised:

```
TypeError                                 Traceback (most recent call last)
----> 1 huber = HuberRegressor().fit(X_bool, y)

.../sklearn/linear_model/huber.py in fit(self, X, y, sample_weight)
    286             args=(X, y, self.epsilon, self.alpha, sample_weight),
    287             maxiter=self.max_iter, pgtol=self.tol, bounds=bounds,
--> 288             iprint=0)
    289         if dict_['warnflag'] == 2:
    290             raise ValueError("HuberRegressor convergence failed:")
```
Proposed Fix
- In `HuberRegressor.fit`, call `check_X_y(X, y, dtype=FLOAT_DTYPES, y_numeric=True, accept_sparse=['csr'])` to coerce bool inputs to float.
- If `sample_weight` is provided, validate/cast it using `check_array(..., ensure_2d=False, dtype=FLOAT_DTYPES)` and verify length consistency with `check_consistent_length`.