TypeError when supplying a boolean X to HuberRegressor fit #69

@rowan-stein

User Request

A TypeError is raised when fitting HuberRegressor with boolean predictors. The expectation is that boolean arrays should be accepted and internally converted to float (consistent with LinearRegression).

Specification

  • Root cause: HuberRegressor.fit does not coerce boolean inputs to floating point during validation, so the bool array reaches the optimization step (L-BFGS-B via SciPy), where the numerical routines expect float arrays and raise on bool dtype.
  • Required change: Validate X and y with scikit-learn’s standard utilities, passing dtype=FLOAT_DTYPES to check_X_y (check_X_y(..., dtype=FLOAT_DTYPES, ...)). Validate sample_weight the same way with check_array(..., dtype=FLOAT_DTYPES), then confirm its length with check_consistent_length.
  • File/Location: sklearn/linear_model/huber.py, method HuberRegressor.fit(self, X, y, sample_weight=None).
  • Sparse handling: Keep current behavior (accept_sparse=['csr']).
  • Tests: Add unit tests under sklearn/linear_model/tests/test_huber.py to verify that boolean X is accepted and that the fitted model matches the one obtained from the float-cast array. Include a boolean sample_weight case, consistent with current support.

Steps/Code to Reproduce

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import HuberRegressor

# Random data
X, y, coef = make_regression(n_samples=200, n_features=2, noise=4.0, coef=True, random_state=0)
X_bool = X > 0
X_bool_as_float = np.asarray(X_bool, dtype=float)

# Works
huber = HuberRegressor().fit(X, y)
# Fails (!)
huber = HuberRegressor().fit(X_bool, y)
# Also works
huber = HuberRegressor().fit(X_bool_as_float, y)

Expected Results

No error is raised when X has dtype bool; .fit(X_bool, y) should work. HuberRegressor.fit is expected to convert boolean arrays to float, consistent with LinearRegression.
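
For comparison, the LinearRegression behavior referred to here can be checked with a short script. This is a minimal sketch that reuses the reproduction data; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=2, noise=4.0,
                       random_state=0)
X_bool = X > 0
X_bool_as_float = X_bool.astype(float)

# LinearRegression converts X to float during validation, so both fits
# should operate on the same numeric data.
lr_bool = LinearRegression().fit(X_bool, y)
lr_float = LinearRegression().fit(X_bool_as_float, y)

coefs_match = np.allclose(lr_bool.coef_, lr_float.coef_)
intercepts_match = np.allclose(lr_bool.intercept_, lr_float.intercept_)
print(coefs_match, intercepts_match)
```

The same equivalence is what this issue asks HuberRegressor to satisfy.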

Actual Results

A TypeError is raised:

TypeError                                 Traceback (most recent call last)
----> 1 huber = HuberRegressor().fit(X_bool, y)

.../sklearn/linear_model/huber.py in fit(self, X, y, sample_weight)
    286             args=(X, y, self.epsilon, self.alpha, sample_weight),
    287             maxiter=self.max_iter, pgtol=self.tol, bounds=bounds,
--> 288             iprint=0)
    289         if dict_['warnflag'] == 2:
    290             raise ValueError("HuberRegressor convergence failed:")

Proposed Fix

  • In HuberRegressor.fit, call check_X_y(X, y, dtype=FLOAT_DTYPES, y_numeric=True, accept_sparse=['csr']) to coerce bool inputs to float.
  • If sample_weight is provided, validate/cast it using check_array(..., ensure_2d=False, dtype=FLOAT_DTYPES) and verify length consistency with check_consistent_length.
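
The two validation steps above can be sketched as a standalone helper. The name validate_fit_inputs is hypothetical and used only for illustration. The unit-weight default in the else branch is an assumption, not something this issue specifies, and the actual change would live inside HuberRegressor.fit.

```python
import numpy as np
from sklearn.utils.validation import (
    FLOAT_DTYPES,
    check_array,
    check_consistent_length,
    check_X_y,
)


def validate_fit_inputs(X, y, sample_weight=None):
    """Hypothetical helper mirroring the validation proposed for fit."""
    # bool is not one of FLOAT_DTYPES, so a boolean X is converted to
    # the first listed dtype (float64); float inputs pass through.
    X, y = check_X_y(X, y, accept_sparse=["csr"], y_numeric=True,
                     dtype=FLOAT_DTYPES)
    if sample_weight is not None:
        # Same coercion for the 1-D weights, then a length check.
        sample_weight = check_array(sample_weight, ensure_2d=False,
                                    dtype=FLOAT_DTYPES)
        check_consistent_length(y, sample_weight)
    else:
        # Assumption: fall back to unit weights when none are given.
        sample_weight = np.ones_like(y)
    return X, y, sample_weight


# Usage: boolean predictors and boolean weights both come back as float.
X_demo = np.array([[True, False], [False, True], [True, True]])
y_demo = np.array([1.0, 2.0, 3.0])
X_val, y_val, w_val = validate_fit_inputs(
    X_demo, y_demo, sample_weight=[True, False, True])
print(X_val.dtype, w_val.dtype)
```

Passing a tuple of dtypes rather than a single dtype means existing float32 inputs are not upcast; only dtypes outside the tuple, such as bool, are converted.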
