swev-id: scikit-learn__scikit-learn-25931: Fix IsolationForest feature names warning during fit when contamination != 'auto'#58

Open
casey-brooks wants to merge 1 commit into scikit-learn__scikit-learn-25931 from fix-iforest-feature-names-warning-376

@casey-brooks

Reproduction (before fix)

```bash
PYTHONWARNINGS=always python - <<'PY'
import pandas as pd
from sklearn.ensemble import IsolationForest

X = pd.DataFrame({"a": [-1.1, 0.3, 0.5, 100]})
IsolationForest(random_state=0, contamination=0.05).fit(X)
PY
```

Observed warning
```text
/workspace/scikit-learn/.venv/lib/python3.11/site-packages/pandas/core/algorithms.py:1743: DeprecationWarning: is_sparse is deprecated and will be removed in a future version. Check isinstance(dtype, pd.SparseDtype) instead.
  return lib.map_infer(values, mapper, convert=convert)
/workspace/scikit-learn/sklearn/utils/validation.py:605: DeprecationWarning: is_sparse is deprecated and will be removed in a future version. Check isinstance(dtype, pd.SparseDtype) instead.
  if is_sparse(pd_dtype):
/workspace/scikit-learn/sklearn/utils/validation.py:614: DeprecationWarning: is_sparse is deprecated and will be removed in a future version. Check isinstance(dtype, pd.SparseDtype) instead.
  if is_sparse(pd_dtype) or not is_extension_array_dtype(pd_dtype):
/workspace/scikit-learn/sklearn/base.py:451: UserWarning: X does not have valid feature names, but IsolationForest was fitted with feature names
  warnings.warn(
```

Summary

  • split IsolationForest.score_samples: it still validates its input, then delegates the scoring itself to a new _score_samples_no_validation helper
  • call the helper directly during fit to compute offset_, so the already-validated training data is not validated a second time
  • add pandas DataFrame regression tests covering fit, score, predict, and contamination='auto'
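
The split described in the summary can be illustrated with a toy, standard-library-only sketch. This is not the scikit-learn code from this PR: `ToyDetector`, its scoring rule, its offset rule, and the `feature_names` argument are all assumptions made for illustration. Only the helper name `_score_samples_no_validation` and the warning text come from the PR.

```python
import warnings


class ToyDetector:
    """Hypothetical stand-in for an estimator; NOT scikit-learn's IsolationForest."""

    def fit(self, X, feature_names=None, contamination="auto"):
        # Remember whether the training data carried feature names.
        self.feature_names_in_ = feature_names
        self.center_ = sum(X) / len(X)
        if contamination == "auto":
            self.offset_ = -0.5
        else:
            # X has already been accepted by fit, so score it through the
            # non-validating helper instead of the public score_samples.
            scores = sorted(self._score_samples_no_validation(X))
            self.offset_ = scores[int(contamination * (len(scores) - 1))]
        return self

    def score_samples(self, X, feature_names=None):
        # Public entry point: validate first, then delegate.
        self._check_feature_names(feature_names)
        return self._score_samples_no_validation(X)

    def _check_feature_names(self, feature_names):
        if self.feature_names_in_ is not None and feature_names is None:
            warnings.warn(
                "X does not have valid feature names, but ToyDetector "
                "was fitted with feature names",
                UserWarning,
            )

    def _score_samples_no_validation(self, X):
        # Toy scoring rule (an assumption): farther from the mean = lower score.
        return [-abs(x - self.center_) for x in X]
```

In this sketch, fitting with feature names and a numeric contamination emits no warning, while a later `score_samples` call without feature names still does, which mirrors the behavior the regression tests in this PR are named after.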

Tests

  • pytest -q sklearn/ensemble/tests/test_iforest.py -k "fit_dataframe_contamination_no_warning or dataframe_then_ndarray_warns_on_score_and_predict or fit_dataframe_auto_no_warning_and_offset"
  • pytest -q sklearn/ensemble/tests/test_iforest.py -k "chunks_works1 or chunks_works2"
  • flake8 sklearn/ensemble/_iforest.py sklearn/ensemble/tests/test_iforest.py
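
The reproduction output above mixes unrelated pandas `DeprecationWarning`s with the feature-names `UserWarning`, so a no-warning test probably needs to target that one message rather than fail on any warning. A minimal sketch of such a check, using only the standard library, might look like the following. The helper name `assert_no_feature_name_warning` is hypothetical and is not taken from this PR's tests.

```python
import warnings
from contextlib import contextmanager

# Substring of the warning text shown in the reproduction output.
FEATURE_NAME_MSG = "does not have valid feature names"


@contextmanager
def assert_no_feature_name_warning():
    """Fail if the wrapped block emits the feature-names UserWarning.

    Other warnings (e.g. DeprecationWarning) are recorded and ignored.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        yield
    offending = [
        w for w in caught
        if issubclass(w.category, UserWarning)
        and FEATURE_NAME_MSG in str(w.message)
    ]
    assert not offending, f"unexpected warning: {offending[0].message}"
```

A test would wrap the `fit` call from the reproduction in `with assert_no_feature_name_warning():`. Note that this helper swallows the other recorded warnings instead of re-emitting them, which is acceptable for a targeted regression test but not for general use.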

Fixes #51

@casey-brooks requested a review from a team December 26, 2025 13:39

@casey-brooks (Author)

Local Testing

  • pytest -q sklearn/ensemble/tests/test_iforest.py -k "fit_dataframe_contamination_no_warning or dataframe_then_ndarray_warns_on_score_and_predict or fit_dataframe_auto_no_warning_and_offset" → 3 passed, 21 deselected
  • pytest -q sklearn/ensemble/tests/test_iforest.py -k "chunks_works1 or chunks_works2" → 4 passed, 20 deselected
  • flake8 sklearn/ensemble/_iforest.py sklearn/ensemble/tests/test_iforest.py → no issues

@noa-lucent left a comment

The internal helper cleanly separates validation from scoring so fit no longer retriggers the feature-name warning, and the new pandas regression tests cover the relevant pathways. Looks ready to go.
