IsolationForest: feature names warning during fit when contamination != 'auto' #51

## User request

X does not have valid feature names, but IsolationForest was fitted with feature names

### Describe the bug
Fitting an `IsolationForest` on a `pd.DataFrame` generates a warning:

```
X does not have valid feature names, but IsolationForest was fitted with feature names
```

This only occurs when a non-default `contamination` value (i.e., not `"auto"`) is supplied. The warning is unexpected for two reasons: (a) `X` does have valid feature names, and (b) it is raised by `fit()`, whereas this warning normally indicates that a predict/score method was called with an ndarray after fitting on a DataFrame.

Root cause: when `contamination != "auto"`, `IsolationForest` computes `offset_` by internally calling `score_samples` on the training data. At that point `X` has already been validated and converted to an array, so its column names are gone; when `score_samples` re-validates it with `reset=False`, the missing names are reported as a feature-name mismatch.
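The toy model below reproduces that control flow in plain Python, so the mechanism can be seen without scikit-learn. It is a sketch, not scikit-learn code: `FrameLike`, `ToyForest`, and its `_validate_data` are hypothetical stand-ins that only mimic the relevant behavior (the first validation records column names and returns a bare array; a later validation with `reset=False` warns when the input has no names).

```python
import warnings


class FrameLike:
    """Hypothetical stand-in for a pandas DataFrame: rows plus column names."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]


class ToyForest:
    """Toy estimator mimicking the problematic control flow in fit()."""

    def __init__(self, contamination="auto"):
        self.contamination = contamination

    def _validate_data(self, X, reset):
        # Only FrameLike inputs carry feature names.
        names = X.columns if isinstance(X, FrameLike) else None
        if reset:
            # First validation (in fit) records the names.
            self.feature_names_in_ = names
        elif names is None and getattr(self, "feature_names_in_", None) is not None:
            # Later validation: unnamed input after a named fit -> warn.
            warnings.warn(
                "X does not have valid feature names, but ToyForest "
                "was fitted with feature names",
                UserWarning,
            )
        # Like real validation, return a bare array (names are dropped).
        return X.rows if isinstance(X, FrameLike) else X

    def score_samples(self, X):
        X = self._validate_data(X, reset=False)
        return [-abs(sum(row)) for row in X]  # placeholder score

    def fit(self, X):
        X = self._validate_data(X, reset=True)  # X is now a plain list
        if self.contamination != "auto":
            # Bug: the public method re-validates the already-converted data.
            scores = sorted(self.score_samples(X))
            # Crude stand-in for np.percentile(scores, 100 * contamination).
            self.offset_ = scores[int(len(scores) * self.contamination)]
        else:
            self.offset_ = -0.5
        return self


X = FrameLike(["a"], [[-1.1], [0.3], [0.5], [100.0]])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ToyForest(contamination=0.05).fit(X)
print([str(w.message) for w in caught])
```

Running it records exactly one warning, emitted from inside `fit()`, even though the caller only ever passed named data.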

### Steps/Code to Reproduce
```python
from sklearn.ensemble import IsolationForest
import pandas as pd

X = pd.DataFrame({"a": [-1.1, 0.3, 0.5, 100]})
clf = IsolationForest(random_state=0, contamination=0.05).fit(X)
```

### Expected Results
No "X does not have valid feature names" warning during fit.

### Actual Results
UserWarning during fit:
```
X does not have valid feature names, but IsolationForest was fitted with feature names
```

### Versions
Reproducible with scikit-learn 1.2.x and main; see upstream reports.


## Implementation specification (from research)

Rationale: Avoid false-positive feature-name mismatch warnings during internal calls to score_samples in fit() while preserving feature-name validation for all user-facing methods.

Proposed changes (sklearn/ensemble/_iforest.py):
1. Introduce a private method `_score_samples_no_validation(self, X)` that computes score samples without calling `_validate_data`.
2. Update public `score_samples(self, X)` to validate inputs and then delegate to `_score_samples_no_validation(X)`.
3. In `fit()`, when `contamination != 'auto'`, call the private method instead of the public one to compute `offset_`.
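Applied to a toy model of the same shape, the three proposed changes look roughly like this. It is a hypothetical sketch rather than the actual patch to `_iforest.py`: `FrameLike`, `ToyForest`, and `_validate_data` are illustrative stand-ins, and the scoring and percentile logic are placeholders. Only the method split mirrors the specification.

```python
import warnings


class FrameLike:
    """Hypothetical stand-in for a pandas DataFrame: rows plus column names."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]


class ToyForest:
    """Toy estimator using the validate-then-delegate pattern."""

    def __init__(self, contamination="auto"):
        self.contamination = contamination

    def _validate_data(self, X, reset):
        names = X.columns if isinstance(X, FrameLike) else None
        if reset:
            self.feature_names_in_ = names
        elif names is None and getattr(self, "feature_names_in_", None) is not None:
            warnings.warn(
                "X does not have valid feature names, but ToyForest "
                "was fitted with feature names",
                UserWarning,
            )
        return X.rows if isinstance(X, FrameLike) else X

    def _score_samples_no_validation(self, X):
        # Change 1: pure computation on input that is already validated.
        return [-abs(sum(row)) for row in X]  # placeholder score

    def score_samples(self, X):
        # Change 2: the public entry point validates, then delegates.
        X = self._validate_data(X, reset=False)
        return self._score_samples_no_validation(X)

    def fit(self, X):
        X = self._validate_data(X, reset=True)
        if self.contamination == "auto":
            self.offset_ = -0.5
        else:
            # Change 3: the internal call skips the second validation.
            scores = sorted(self._score_samples_no_validation(X))
            self.offset_ = scores[int(len(scores) * self.contamination)]
        return self


X = FrameLike(["a"], [[-1.1], [0.3], [0.5], [100.0]])
with warnings.catch_warnings(record=True) as fit_warnings:
    warnings.simplefilter("always")
    clf = ToyForest(contamination=0.05).fit(X)

with warnings.catch_warnings(record=True) as score_warnings:
    warnings.simplefilter("always")
    clf.score_samples([[0.0]])  # unnamed input after a named fit
```

`fit()` no longer warns, while a caller passing unnamed data to `score_samples` still gets the warning. One point worth checking in the real patch: `fit()` and `score_samples` may not request the same input format from validation (for example, a different sparse layout), so the private method has to work on whatever `fit()` has already produced.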

Testing plan (sklearn/ensemble/tests/test_iforest.py):
- test_iforest_fit_dataframe_contamination_no_warning: Fit with a pandas DataFrame and contamination set; assert no feature-name warning is emitted by fit.
- test_iforest_dataframe_then_ndarray_warns_on_score_and_predict: Fit with DataFrame; calling score_samples/predict with ndarray should still warn.
- test_iforest_fit_dataframe_auto_no_warning_and_offset: With contamination='auto', fit does not warn and offset_ remains the documented value of -0.5.
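All three tests hinge on the same check: does a given call emit the feature-name warning? A small helper can keep that check in one place. The sketch below uses only the standard `warnings` module; `feature_name_warnings` is a hypothetical name, not part of scikit-learn's test utilities, and matching on the message substring is an assumption about how strictly the tests should filter.

```python
import warnings

# Substring of the warning text quoted in this issue.
FEATURE_NAME_MSG = "does not have valid feature names"


def feature_name_warnings(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return the feature-name UserWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown earlier in the session.
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return [
        w
        for w in caught
        if issubclass(w.category, UserWarning) and FEATURE_NAME_MSG in str(w.message)
    ]
```

In the real tests it would wrap calls such as `IsolationForest(contamination=0.05).fit` with a DataFrame (expecting an empty list) and `clf.predict` with `X.to_numpy()` (expecting one match). A stricter alternative is to escalate the warning to an error with `warnings.simplefilter("error", UserWarning)` inside `warnings.catch_warnings()`.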

These changes follow the pattern used in scikit-learn PR #24873 (public method validates, private method performs computation), ensuring internal calls during fit do not re-trigger feature-name checks.
