bugfix: Exception raised during population/budget scaling after reading from `population.pkl` by reevesc7 · Pull Request #1392 · EpistasisLab/tpot

reevesc7 · 2026-03-06T09:07:19Z

What does this PR do?

This PR adds a cast to integer when retrieving self.population.evaluated_individuals["Generation"].max() in tpot.evolvers.BaseEvolver.__init__().
This resolves the symptoms of an issue with evaluated_individuals["Generation"] values being stored as numpy.float64.

Where should the reviewer start?

tpot/evolvers/base_evolver.py: BaseEvolver.__init__(), line 424.

How should this PR be tested?

It would be worth ensuring that no np.nan values can ever be stored in evaluated_individuals["Generation"].
Beyond that and precision errors for extremely large values, I cannot conceive of this causing any issues.
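The two caveats above can be checked in isolation. This is a minimal sketch, not code from the patch: `to_generation_index` is a hypothetical helper name, and the values are made up for illustration.

```python
import math

import numpy as np


def to_generation_index(value):
    """Hypothetical helper: cast a stored generation value to int, rejecting NaN."""
    if isinstance(value, float) and math.isnan(value):
        raise ValueError("Generation is NaN; cannot convert it to an integer index")
    return int(value)


# Caveat 1: a plain int() cast fails outright on NaN.
try:
    int(np.float64(np.nan))
    nan_cast_failed = False
except ValueError:
    nan_cast_failed = True

# Caveat 2: float64 cannot represent every integer above 2**53,
# so a very large generation number may not survive the round trip.
big = 2**53 + 1
round_tripped = int(np.float64(big))
precision_lost = round_tripped != big
```

In practice a generation counter is nowhere near 2**53, so only the NaN case seems worth guarding against.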

The following script replicates the issue:

from tpot import TPOTEstimator
import tpot.config as cfg
import tpot.objectives as obj
from sklearn.datasets import load_diabetes
from sklearn.utils import Bunch


def main():
    est = TPOTEstimator(
        search_space=cfg.get_search_space("KNeighborsRegressor"),
        scorers=[
            "neg_mean_squared_error",
            obj.complexity_scorer,
        ],
        scorers_weights=[
            1,
            -1,
        ],
        classification=False,
        cv=4,
        bigger_is_better=True,
        population_size=4,
        initial_population_size=5,
        generations_until_end_population=20,
        generations=1,
        max_time_mins=20,
        max_eval_time_mins=1,
        periodic_checkpoint_folder="warm_popscale",
        verbose=4,
        random_state=2,
    )
    data = load_diabetes()
    assert isinstance(data, Bunch)
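    # First run: evolves and writes population.pkl to the checkpoint folder.
    est.fit(data.data, data.target)
    # Second run: reloads population.pkl, which triggers the exception.
    est.fit(data.data, data.target)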


if __name__ == "__main__":
    main()

Before the patch, this script will throw the following exception:

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

After the patch, it should run successfully.

Any background context you want to provide?

As far as I understand, the .loc[] insertion into the self.evaluated_individuals DataFrame in Population.update_column() forces Pandas to account for NaN values in any added column(s).
Because NumPy integers are not nullable, Pandas converts integers into numpy.float64, which then becomes the dtype of the "Generation" column.

This data then gets stored in population.pkl if periodic_checkpoint_folder is set.

When a TPOTEstimator with the same periodic_checkpoint_folder then runs .fit(), its _evolver_instance (type BaseEvolver) reads from that population.pkl.
This sets the evaluated_individuals property of the BaseEvolver to the data in population.pkl.
Then the BaseEvolver sets its generation property to the maximum value from evaluated_individuals["Generation"], which is a numpy.float64.

If there is population or budget scaling occurring across generations, the BaseEvolver attempts to index its population_size_list or budget_list properties with generation.
This raises an exception if generation is a float, as floats are not valid indices.
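The failure and the effect of the cast can be shown with a plain NumPy array. This is a minimal sketch: the array below is a hypothetical stand-in for population_size_list, chosen because the IndexError message quoted earlier is the one NumPy emits.

```python
import numpy as np

# Hypothetical per-generation population sizes.
population_size_list = np.array([5, 5, 4, 4])

# What the evolver ends up holding after reading a float64 column.
generation = np.float64(2.0)

try:
    population_size_list[generation]
    raised = False
except IndexError:
    # "only integers, slices (`:`), ellipsis (`...`), ..."
    raised = True

# With the integer cast, the lookup succeeds.
size = population_size_list[int(generation)]
```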

Notably, it would be better to fix Population.update_column() such that it does not store integers as floats.
However, casting to an integer on retrieval is still a good idea alongside such a fix. Pandas does a lot of implicit work in that function, and I struggled to come up with a reasonably safe solution that casts to a Pandas nullable integer dtype.
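For reference, this is roughly what a nullable integer column looks like. It is only a sketch on a standalone Series with made-up values, not a proposed change to Population.update_column(), and it has not been tried against TPOT itself.

```python
import numpy as np
import pandas as pd

# A float64 column like the one that ends up in population.pkl.
generations = pd.Series([0.0, 1.0, np.nan], name="Generation")

# "Int64" (capital I) is the Pandas nullable integer dtype: NaN becomes pd.NA
# and the remaining values are stored as integers.
nullable = generations.astype("Int64")

# max() skips missing values by default; int() normalises the result.
latest = int(nullable.max())
```

The cast only works when every non-missing value is a whole number, which should hold for generation counts.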

What are the relevant issues?

None that I was able to find during a brief search.

Questions:

Do the docs need to be updated? No, this PR does not alter any intended behavior.
Does this PR add new (Python) dependencies? No

Add int cast on retrieve evaluated_individuals["Generation"].max()

111e9ae

reevesc7 changed the title ~~Add int cast on retrieve evaluated_individuals["Generation"].max()~~ bugfix: Exception raised during population/budget scaling after reading from population.pkl Mar 6, 2026

reevesc7 wants to merge 1 commit into EpistasisLab:main from reevesc7:fix-float-generation
