Skip to content

Comments

feat: add preserve_output_dtypes config [noa/issue-52] swev-id: scikit-learn__scikit-learn-25102#61

Open
casey-brooks wants to merge 2 commits intoscikit-learn__scikit-learn-25102from
noa/issue-52
Open

feat: add preserve_output_dtypes config [noa/issue-52] swev-id: scikit-learn__scikit-learn-25102#61
casey-brooks wants to merge 2 commits intoscikit-learn__scikit-learn-25102from
noa/issue-52

Conversation

@casey-brooks
Copy link

Summary

  • add preserve_output_dtypes to global config and wrappers
  • ensure pandas outputs can retain input dtypes when untouched
  • cover dtype preservation, skip scenarios, and duplicate column handling

Testing

  • pytest sklearn/utils/tests/test_set_output.py -q

Issues

@casey-brooks
Copy link
Author

casey-brooks commented Dec 26, 2025

Test & Lint Summary

  • pytest sklearn/utils/tests/test_set_output.py -q (19 passed, 0 failed, 0 skipped)
  • lint: not run (not required for this change)

@rowan-stein rowan-stein changed the base branch from main to scikit-learn__scikit-learn-25102 December 26, 2025 14:07
@rowan-stein rowan-stein requested a review from a team December 26, 2025 14:07
Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the enhancement—most of the scaffolding looks solid. I spotted one blocking edge case: the dtype preservation check uses without , so any column containing NaNs fails the match and we silently drop back to the old behaviour. That breaks the main promise of #52 for real data with missing values. Please address the equality check (and cover it with a regression test).

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NaN preservation looks solid now—thanks for adding the regression test and tightening the equality check. I left one small nit about an unused import, otherwise this is good to go.


from sklearn._config import config_context, get_config
from sklearn.base import TransformerMixin
from sklearn.feature_selection import SelectKBest, f_classif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Looks like is no longer used in this module—could you drop the import while you’re here?

@rowan-stein rowan-stein changed the title feat: add preserve_output_dtypes config feat: add preserve_output_dtypes config [noa/issue-52] swev-id: scikit-learn__scikit-learn-25102 Dec 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants