SFS: support iterable/generator cv without exhaustion (swev-id: scikit-learn__scikit-learn-25973)

Unable to pass splits to SequentialFeatureSelector

Describe the bug
Passing an iterable of splits (e.g., a generator from a CV splitter) to SequentialFeatureSelector fails. Using cv=5 works, but cv=splits generated via LeaveOneGroupOut.split(X, y, groups) triggers an IndexError.

Steps/Code to Reproduce
```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneGroupOut
import numpy as np

X, y = make_classification()

groups = np.zeros_like(y, dtype=int)
groups[y.size//2:] = 1

cv = LeaveOneGroupOut()
splits = cv.split(X, y, groups=groups)

clf = KNeighborsClassifier(n_neighbors=5)

seq = SequentialFeatureSelector(clf, n_features_to_select=5, scoring='accuracy', cv=splits)
seq.fit(X, y)
```

Expected Results
Runs without errors.

Actual Results
```
IndexError: list index out of range
  File "sklearn/model_selection/_validation.py", line 1930, in _aggregate_score_dicts
    for key in scores[0]
```

Observed Failure and Explanation
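- SequentialFeatureSelector calls cross_val_score once per candidate feature, passing self.cv each time. When cv is a generator, the first call consumes it; every later call iterates over an exhausted generator, collects no fold scores, and raises the IndexError in _aggregate_score_dicts when it indexes scores[0] on an empty list.
- Converting to a list (cv=list(splits)) works around the issue, but it should not be necessary: the *SearchCV classes accept the same generator directly.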
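
The exhaustion behavior described above can be reproduced without scikit-learn. The sketch below is illustrative only: `make_splits` and `count_folds` are hypothetical stand-ins for a splitter's `split()` generator and for one scoring pass over the folds.

```python
def make_splits():
    """Yield (train, test) index pairs, like a CV splitter's split() generator."""
    yield [2, 3], [0, 1]
    yield [0, 1], [2, 3]


def count_folds(cv):
    """Stand-in for one scoring pass: iterate over every fold exactly once."""
    return sum(1 for _train, _test in cv)


splits = make_splits()
first_pass = count_folds(splits)   # consumes the generator
second_pass = count_folds(splits)  # generator is exhausted, so no folds are seen

# Materializing the generator once gives an object that survives repeated passes.
materialized = list(make_splits())
list_passes = [count_folds(materialized) for _ in range(3)]

print(first_pass, second_pass, list_passes)
```

The second pass sees zero folds, which corresponds to the empty score list that triggers the IndexError.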

Specification (research by Emerson Gray)
- Align SequentialFeatureSelector cv handling with *SearchCV classes.
- In fit, call `check_cv(self.cv, y, classifier=is_classifier(self.estimator))` once to normalize cv.
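- If cv is an iterable/generator, `check_cv` materializes it once: it wraps the splits in an internal object (`_CVIterableWrapper`) that stores them as a list and can be iterated repeatedly.
- Pass the normalized cv into every cross_val_score call within SFS so the original iterable is never consumed a second time.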
- Update SFS cv parameter docs to mirror *SearchCV: accept int, splitter objects, or iterables of (train, test) indices. Note that iterables are materialized once (memory cost).
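
As a rough illustration of the normalization step, the sketch below calls `check_cv` once on a generator and reuses the result across two `cross_val_score` calls. The data setup mirrors the reproduction above; `random_state=0` and the two feature slices are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneGroupOut, check_cv, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(random_state=0)
groups = np.zeros_like(y, dtype=int)
groups[y.size // 2:] = 1

clf = KNeighborsClassifier(n_neighbors=5)
splits = LeaveOneGroupOut().split(X, y, groups=groups)  # a one-shot generator

# Normalize once; the returned object can be iterated as often as needed.
cv = check_cv(splits, y, classifier=is_classifier(clf))

# Two independent scoring passes over different feature subsets reuse the same cv.
scores_a = cross_val_score(clf, X[:, :3], y, cv=cv, scoring="accuracy")
scores_b = cross_val_score(clf, X[:, 3:6], y, cv=cv, scoring="accuracy")

print(cv.get_n_splits(), len(scores_a), len(scores_b))
```

Both calls return one score per group fold, because the second call no longer depends on the state of the original generator.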

Implementation Outline
- File: sklearn/feature_selection/_sequential.py
  - Imports: add `from ..base import is_classifier` and `from ..model_selection import check_cv, cross_val_score`.
  - In `SequentialFeatureSelector.fit`: `cv = check_cv(self.cv, y, classifier=is_classifier(self.estimator))`, then pass `cv` into scoring helper.
  - Update `_get_best_new_feature_score` to accept `cv` and use it in `cross_val_score`.
  - Update docstring for `cv` parameter.
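
The outline can be sketched as a standalone, simplified forward selection loop. This is not the scikit-learn source: `forward_select` and `best_new_feature` are hypothetical names, and tolerance handling, backward selection, and parameter validation are omitted. It only shows where `check_cv` is called and how the normalized `cv` is passed down.

```python
import numpy as np
from sklearn.base import clone, is_classifier
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneGroupOut, check_cv, cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def best_new_feature(estimator, X, y, cv, selected_mask):
    """Score each unselected feature added to the current set; return the best one."""
    scores = {}
    for idx in np.flatnonzero(~selected_mask):
        candidate = selected_mask.copy()
        candidate[idx] = True
        # The already-normalized cv is reused here on every call.
        scores[idx] = cross_val_score(estimator, X[:, candidate], y, cv=cv).mean()
    best = max(scores, key=scores.get)
    return best, scores[best]


def forward_select(estimator, X, y, n_features, cv=None):
    """Greedy forward selection that normalizes cv exactly once."""
    cv = check_cv(cv, y, classifier=is_classifier(estimator))
    mask = np.zeros(X.shape[1], dtype=bool)
    for _ in range(n_features):
        idx, _score = best_new_feature(clone(estimator), X, y, cv, mask)
        mask[idx] = True
    return mask


# Usage with a generator of splits, as in the reproduction.
X, y = make_classification(random_state=0)
groups = np.zeros_like(y, dtype=int)
groups[y.size // 2:] = 1
splits = LeaveOneGroupOut().split(X, y, groups=groups)

mask = forward_select(KNeighborsClassifier(n_neighbors=5), X, y, n_features=3, cv=splits)
print(mask.sum())
```

Normalizing in the outer function rather than in the helper keeps the helper free of any assumption about what kind of `cv` the caller supplied.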

Non-regression Tests
- File: sklearn/feature_selection/tests/test_sequential.py
  - `test_sfs_supports_iterable_cv_generator`: build splits via LeaveOneGroupOut.split (generator), use as cv in SFS, ensure fit completes and selects expected number of features.
  - `test_sfs_baseline_cv_int_runs`: verify cv=5 baseline still works.

Versions
Originally observed in 1.2.2; branch scikit-learn__scikit-learn-25973 targets this fix.
