fix: relax pandas index wrapping (swev-id: scikit-learn__scikit-learn-25747) by casey-brooks · Pull Request #73 · agyn-sandbox/scikit-learn

casey-brooks · 2026-01-13T21:29:48Z

Summary

- Relax _wrap_in_pandas_container so that pandas outputs keep their own index when the row count changes; the input index is applied only when the lengths match.
- Adjust ndarray wrapping to ignore an index whose length does not match the output, while preserving column naming.
- Add targeted FeatureUnion and utility tests covering the mismatched and aligned cases.
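
For readers who want the gist of the summary without opening the diff, the relaxed behaviour could be sketched roughly as follows. This is only an illustration under assumptions: wrap_in_pandas_sketch is a hypothetical stand-in, not the code in sklearn/utils/_set_output.py, and its signature is chosen for the example.

```python
import numpy as np
import pandas as pd


def wrap_in_pandas_sketch(data, *, columns=None, index=None):
    """Hypothetical sketch of length-aware index wrapping (not sklearn's code)."""
    if isinstance(data, pd.DataFrame):
        # Note: like an in-place wrapper, this mutates the frame it is given.
        if columns is not None:
            data.columns = columns
        # Align to the input index only when the row counts agree;
        # otherwise the output keeps the index it already has.
        if index is not None and len(index) == len(data):
            data.index = index
        return data

    # ndarray path: drop an index whose length does not match the rows,
    # but still apply the requested column names.
    if index is not None and len(index) != len(data):
        index = None
    return pd.DataFrame(data, index=index, columns=columns)


# Aggregated DataFrame (2 rows) wrapped against a 5-row input index.
aggregated = pd.DataFrame({"value": [240, 240]}, index=["a", "b"])
wrapped_frame = wrap_in_pandas_sketch(aggregated, columns=["v"], index=range(5))

# ndarray with 2 rows wrapped against the same 5-row input index.
wrapped_array = wrap_in_pandas_sketch(np.ones((2, 1)), columns=["v"], index=range(5))
```

In this sketch the DataFrame keeps its own "a", "b" index and the ndarray falls back to a default index, while both receive the requested column name.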

Testing

pytest -q sklearn/utils/tests/test_set_output.py::test__wrap_in_pandas_container_preserve_index_on_length_mismatch_dataframe
pytest -q sklearn/utils/tests/test_set_output.py::test__wrap_in_pandas_container_ndarray_ignore_index_on_length_mismatch
pytest -q sklearn/utils/tests/test_set_output.py::test__wrap_in_pandas_container_align_when_lengths_match
pytest -q sklearn/tests/test_pipeline.py::test_feature_union_pandas_preserves_aggregated_index
pytest -q sklearn/tests/test_pipeline.py::test_feature_union_pandas_aligns_index_when_lengths_match
flake8 sklearn/utils/_set_output.py sklearn/utils/tests/test_set_output.py sklearn/tests/test_pipeline.py

Reproduction

Steps from #72:

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn import set_config
from sklearn.pipeline import make_union

index = pd.date_range(start="2020-01-01", end="2020-01-05", inclusive="left", freq="H")
data = pd.DataFrame(index=index, data=[10] * len(index), columns=["value"])
data["date"] = index.date

class MyTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X["value"].groupby(X["date"]).sum()

set_config(transform_output="default")
print(make_union(MyTransformer()).fit_transform(data))

set_config(transform_output="pandas")
print(make_union(MyTransformer()).fit_transform(data))

Observed failure:

ValueError: Length mismatch: Expected axis has 4 elements, new values have 96 elements

Stack trace summary: the error is raised in sklearn/utils/_set_output.py when _wrap_in_pandas_container assigns the original input index to a pandas output with fewer rows.
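
The mechanism can be isolated in plain pandas, without scikit-learn. The snippet below is a minimal sketch that assumes only the shapes from the reproduction (96 hourly input rows, 4 daily sums); the variable names are illustrative.

```python
import pandas as pd

# 96 hourly rows in the original input, reduced to 4 daily sums by the transformer.
input_index = pd.RangeIndex(96)
aggregated = pd.DataFrame({"value": [240] * 4})

message = None
try:
    # Assigning an index of a different length is what raises the error.
    aggregated.index = input_index
except ValueError as exc:
    message = str(exc)

print(message)
```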

Fixes #72.

casey-brooks · 2026-01-13T21:30:06Z

Local Testing

pytest -q sklearn/utils/tests/test_set_output.py::test__wrap_in_pandas_container_preserve_index_on_length_mismatch_dataframe ➜ 1 passed
pytest -q sklearn/utils/tests/test_set_output.py::test__wrap_in_pandas_container_ndarray_ignore_index_on_length_mismatch ➜ 1 passed
pytest -q sklearn/utils/tests/test_set_output.py::test__wrap_in_pandas_container_align_when_lengths_match ➜ 1 passed
pytest -q sklearn/tests/test_pipeline.py::test_feature_union_pandas_preserves_aggregated_index ➜ 1 passed
pytest -q sklearn/tests/test_pipeline.py::test_feature_union_pandas_aligns_index_when_lengths_match ➜ 1 passed
flake8 sklearn/utils/_set_output.py sklearn/utils/tests/test_set_output.py sklearn/tests/test_pipeline.py ➜ no issues

noa-lucent

Thanks for the fixes and the added coverage. I left one blocking comment: we still need an integration test that exercises a FeatureUnion transformer returning a pandas Series, which is the exact shape that regressed in #72. That path bypasses the new DataFrame branch, so we should lock it down explicitly.
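
To illustrate why the Series case needs its own coverage, here is a small pandas-only sketch. It assumes nothing about the PR's actual branch; to_frame is just one plausible way to turn the Series into a frame while keeping its aggregated index.

```python
import pandas as pd

# Six input rows grouped into two buckets, mirroring the groupby in the reproduction.
hourly = pd.Series([10] * 6, index=pd.RangeIndex(6))
groups = pd.Series(["a", "a", "a", "b", "b", "b"])
aggregated = hourly.groupby(groups).sum()

# A Series is not a DataFrame, so a DataFrame-only isinstance check skips it.
is_dataframe = isinstance(aggregated, pd.DataFrame)

# Converting to a frame keeps the aggregated index rather than the input index.
framed = aggregated.to_frame(name="value")
```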

sklearn/tests/test_pipeline.py

casey-brooks · 2026-01-13T21:50:07Z

Local Testing Update

pytest -q sklearn/tests/test_pipeline.py::test_feature_union_pandas_preserves_aggregated_index_series ➜ 1 passed
pytest -q sklearn/tests/test_pipeline.py::test_feature_union_pandas_preserves_aggregated_index ➜ 1 passed
pytest -q sklearn/tests/test_pipeline.py::test_feature_union_pandas_aligns_index_when_lengths_match ➜ 1 passed
pytest -q sklearn/utils/tests/test_set_output.py::test__wrap_in_pandas_container_preserve_index_on_length_mismatch_dataframe ➜ 1 passed
pytest -q sklearn/utils/tests/test_set_output.py::test__wrap_in_pandas_container_ndarray_ignore_index_on_length_mismatch ➜ 1 passed
pytest -q sklearn/utils/tests/test_set_output.py::test__wrap_in_pandas_container_align_when_lengths_match ➜ 1 passed

noa-lucent

Thanks for adding the Series coverage and extending the wrapper logic. The new _wrap_in_pandas_container branch for Series plus the FeatureUnion integration test exercise the exact regression from #72, and the unit tests assert the DataFrame/ndarray behavior stays consistent. Looks good to me.

fix: relax pandas index wrapping

ebab79c

casey-brooks requested a review from a team January 13, 2026 21:29

noa-lucent requested changes Jan 13, 2026

fix: preserve pandas series index

587e929

noa-lucent approved these changes Jan 13, 2026

casey-brooks wants to merge 2 commits into scikit-learn__scikit-learn-25747 from fix/pandas-index-wrap-25747


2 participants
