bugfix: Set requires_fit tag to False for all stateless estimators#1390

Open
reevesc7 wants to merge 5 commits into EpistasisLab:main from reevesc7:fix-passthrough-fit

reevesc7 commented Mar 6, 2026

What does this PR do?

This PR adds the following line to Passthrough.fit() and SkipTransformer.fit():

        self.is_fitted_ = True

It also alters the docstring for each function to reflect the new behavior.
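
As a minimal sketch of the pattern described above (assuming scikit-learn is installed; `DemoPassthrough` and `is_fitted` are hypothetical stand-ins, not TPOT's actual code), a transformer whose fit() sets a trailing-underscore attribute is accepted by check_is_fitted(), while an unfitted instance is rejected:

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class DemoPassthrough(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for a stateless transformer; not TPOT's class."""

    def fit(self, X, y=None):
        # Nothing is learned; the flag only marks the object as fitted.
        self.is_fitted_ = True
        return self

    def transform(self, X):
        # Return the input unchanged.
        return X


def is_fitted(estimator):
    """Return True if check_is_fitted() accepts the estimator."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


unfitted = DemoPassthrough()
fitted = DemoPassthrough().fit([[1.0], [2.0]])
print(is_fitted(unfitted), is_fitted(fitted))
```

The flag carries no learned state; it exists only so that the default fitted check finds something.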

Where should the reviewer start?

tpot/builtin_modules/passthrough.py:

  • Passthrough.fit()
  • SkipTransformer.fit()

How should this PR be tested?

The following script produces the issue described below:

from tpot import TPOTEstimator
import tpot.config as cfg
import tpot.search_spaces.pipelines as pl
import tpot.objectives as obj
from sklearn.datasets import load_diabetes
from sklearn.utils import Bunch


def main():
    dynlin_space = pl.DynamicLinearPipeline(
        search_space=cfg.get_search_space("Passthrough"),
        max_length=3,
    )
    seq_space = pl.SequentialPipeline(
        search_spaces=[
            dynlin_space,
            cfg.get_search_space("KNeighborsRegressor"),
        ],
    )
    est = TPOTEstimator(
        search_space=seq_space,
        scorers=[
            "neg_mean_squared_error",
            obj.complexity_scorer,
        ],
        scorers_weights=[1, -1],
        classification=False,
        cv=4,
        bigger_is_better=True,
        population_size=4,
        generations=1,
        max_time_mins=20,
        max_eval_time_mins=1,
        verbose=4,
        random_state=1,
    )
    data = load_diabetes()
    assert isinstance(data, Bunch)
    est.fit(data.data, data.target)


if __name__ == "__main__":
    main()

When run before the patch, TPOT raises a "No individuals could be evaluated in the initial population...." exception, with each individual raising the following exception:

This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

After the patch, fitting should complete successfully.

Notably, the above script only tests Passthrough.
I personally have only run into this issue while using Passthrough, as I have not used SkipTransformer.
However, I infer that SkipTransformer would cause the same problem and that this fix will not cause any issues for it.

Any background context you want to provide?

As is, Passthrough.fit() and SkipTransformer.fit() do not set any properties which indicate to scikit-learn that the object is fitted.

scikit-learn checks whether objects are fitted by identifying properties which end in _ and do not start with __.
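To determine whether a pipeline is fitted, recent scikit-learn versions apply this check to the pipeline's last step (ignoring any "passthrough" placeholders), on the assumption that earlier steps are fitted if the last one is.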

Passthrough.fit() and SkipTransformer.fit() merely return self, failing to adhere to scikit-learn's fitting convention.
This causes an exception when Passthrough is a step in a DynamicLinearPipeline which is a step in a SequentialPipeline.
However, I believe it does not cause an exception when a Passthrough is merely included as a step in a SequentialPipeline.
I have not investigated other cases.
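
The naming rule described above can be approximated in plain Python. This is a simplified imitation for illustration, not scikit-learn's actual implementation; `fitted_markers` and the example attribute names are hypothetical:

```python
from types import SimpleNamespace


def fitted_markers(obj):
    """List instance attributes that would count as fitted markers:
    names ending in "_" that do not start with "__"."""
    return [
        name
        for name in vars(obj)
        if name.endswith("_") and not name.startswith("__")
    ]


# A stand-in object with one hyperparameter, one learned attribute,
# and one dunder-style attribute.
example = SimpleNamespace(**{"n_neighbors": 5, "coef_": [1.0], "__cache__": {}})
print(fitted_markers(example))

# An object whose fit() merely returned self has no such attribute,
# so the list is empty and the object is treated as unfitted.
empty = SimpleNamespace()
print(fitted_markers(empty))
```

Only `coef_` qualifies in the first case: `n_neighbors` lacks the trailing underscore and `__cache__` starts with a double underscore.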

What are the relevant issues?

None that I found with a brief search.

Questions:

  • Do the docs need to be updated? No, this PR does not alter any intended behavior
  • Does this PR add new (Python) dependencies? No

reevesc7 commented Mar 8, 2026

While trawling through TPOT's code, I noticed that other estimator classes have fit() methods which do not alter state.

I also looked into scikit-learn's conventions for determining whether an estimator is fitted, and found that the recommended approach for stateless estimators is to set the requires_fit tag to False.
The docstring for sklearn.utils.validation.check_is_fitted() states:

If no `attributes` are passed, this function will pass if an estimator is stateless.
An estimator can indicate it's stateless by setting the `requires_fit` tag. See
:ref:`estimator_tags` for more information. Note that the `requires_fit` tag
is ignored if `attributes` are passed.

The scikit-learn 1.6 release highlights state that estimators should set tags by overriding BaseEstimator.__sklearn_tags__().

As such, I have now reverted the changes I made to Passthrough and SkipTransformer.
Instead, I have added a __sklearn_tags__() method which includes tags.requires_fit = False to every stateless built-in estimator.

I also added _ to the end of the names of the 2 properties created by FeatureEncodingFrequencySelector.fit(), such that they will now act as fitted flags.

I have validated that my override of __sklearn_tags__() properly sets requires_fit to False for Passthrough and MulTransformer, and that this allows my test script above to run successfully (with either Passthrough or MulTransformer).
However, I have not tested any other classes that I modified.

reevesc7 changed the title from "bugfix: Add fitted flag to Passthrough, SkipTransformer" to "bugfix: Set requires_fit tag to False for all stateless estimators" Mar 8, 2026