
bugfix: Exception raised during population/budget scaling after reading from population.pkl#1392

Open

reevesc7 wants to merge 1 commit into EpistasisLab:main from reevesc7:fix-float-generation


reevesc7 commented Mar 6, 2026

What does this PR do?

This PR adds a cast to integer when retrieving self.population.evaluated_individuals["Generation"].max() in tpot.evolvers.BaseEvolver.__init__().
This resolves the symptoms of an issue with evaluated_individuals["Generation"] values being stored as numpy.float64.

Where should the reviewer start?

tpot/evolvers/base_evolver.py: BaseEvolver.__init__(), line 424.

How should this PR be tested?

It would be worth ensuring that np.nan values can never be stored in evaluated_individuals["Generation"].
Beyond that, and loss of precision for extremely large values (float64 represents every integer exactly only up to 2**53), I can't see this causing any issues.
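One way to check both concerns is a guarded retrieval. This is a sketch only: `max_generation` is a hypothetical helper, not part of TPOT, and it assumes the column is expected to contain only whole-number generations with no missing values.

```python
import numpy as np
import pandas as pd


def max_generation(evaluated_individuals: pd.DataFrame) -> int:
    """Hypothetical helper: return the highest generation as a Python int.

    Raises ValueError instead of silently producing a bad index when the
    column holds NaN or non-integral values.
    """
    generations = evaluated_individuals["Generation"]
    if generations.isna().any():
        raise ValueError("'Generation' contains NaN values")
    if not (generations == np.floor(generations)).all():
        raise ValueError("'Generation' contains non-integral values")
    return int(generations.max())


# float64 stores integers exactly only up to 2**53; beyond that, distinct
# integers can round to the same float, so the cast could be off by one.
assert float(2**53) == float(2**53 + 1)
```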

The following script replicates the issue:

```python
from tpot import TPOTEstimator
import tpot.config as cfg
import tpot.objectives as obj
from sklearn.datasets import load_diabetes
from sklearn.utils import Bunch


def main():
    est = TPOTEstimator(
        search_space=cfg.get_search_space("KNeighborsRegressor"),
        scorers=[
            "neg_mean_squared_error",
            obj.complexity_scorer,
        ],
        scorers_weights=[
            1,
            -1,
        ],
        classification=False,
        cv=4,
        bigger_is_better=True,
        # Differing initial and final population sizes enable population
        # scaling across generations.
        population_size=4,
        initial_population_size=5,
        generations_until_end_population=20,
        generations=1,
        max_time_mins=20,
        max_eval_time_mins=1,
        # The first fit() writes population.pkl here; the second reads it back.
        periodic_checkpoint_folder="warm_popscale",
        verbose=4,
        random_state=2,
    )
    data = load_diabetes()
    assert isinstance(data, Bunch)
    est.fit(data.data, data.target)  # run 1: writes the checkpoint
    est.fit(data.data, data.target)  # run 2: reloads the checkpoint


if __name__ == "__main__":
    main()
```

Before the patch, this script raises the following exception:

```
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
```

After the patch, it should run successfully.

Any background context you want to provide?

As far as I understand, the .loc[] insertion into the self.evaluated_individuals DataFrame in Population.update_column() forces Pandas to account for NaN values in any added column(s).
Because NumPy integer dtypes cannot hold NaN, Pandas upcasts the integers to numpy.float64, which then becomes the dtype of the "Generation" column.

This data then gets stored in population.pkl if periodic_checkpoint_folder is set.

When a TPOTEstimator with the same periodic_checkpoint_folder then runs .fit(), its _evolver_instance (type BaseEvolver) reads from that population.pkl.
This sets the evaluated_individuals property of the BaseEvolver to the data in population.pkl.
Then the BaseEvolver sets its generation property to the maximum value from evaluated_individuals["Generation"], which is a numpy.float64.

If the population size or budget is scaled across generations, the BaseEvolver indexes its population_size_list or budget_list property with generation.
This raises an exception when generation is a float, because floats are not valid indices.
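The failing lookup can be reproduced in isolation. A minimal sketch, assuming the schedule is a NumPy array (the IndexError message quoted earlier is NumPy's wording); the schedule values are made up and not TPOT's. A plain Python list would raise TypeError instead of IndexError.

```python
import numpy as np

# Hypothetical per-generation population sizes.
population_size_list = np.array([5, 5, 4, 4])

# What .max() returns from a float64 "Generation" column.
generation = np.float64(2.0)

try:
    population_size_list[generation]  # float index: rejected by NumPy
except IndexError as err:
    print("IndexError:", err)

# Casting to int makes the lookup valid.
print(population_size_list[int(generation)])
```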

Notably, it would be better to fix Population.update_column() so that it does not store integers as floats.
However, casting to integer on retrieval is still worthwhile alongside such a fix: Pandas does a lot of implicit dtype handling in that function, and I struggled to come up with a reasonably safe way to cast the column to a Pandas nullable integer dtype.
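For reference, the nullable-dtype route looks roughly like this on a standalone Series. It is a sketch of the general pandas feature, not a proposed patch to `update_column()`; it also shows one possible pitfall, since non-integral floats cannot be cast.

```python
import numpy as np
import pandas as pd

# A float64 "Generation" column as produced by a partial .loc insertion.
generations = pd.Series([0.0, 1.0, np.nan], name="Generation")

# pandas' nullable integer dtype keeps missing entries as <NA> instead of
# forcing the whole column to float64.
as_nullable = generations.astype("Int64")

# astype("Int64") refuses values with a fractional part.
try:
    pd.Series([0.5]).astype("Int64")
except (TypeError, ValueError) as err:
    print(type(err).__name__, err)

# Even with a nullable column, casting on retrieval yields a plain int.
generation = int(as_nullable.max())
```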

What are the relevant issues?

None that I was able to find during a brief search.

Questions:

  • Do the docs need to be updated? No, this PR does not alter any intended behavior
  • Does this PR add new (Python) dependencies? No

reevesc7 changed the title from "Add int cast on retrieve evaluated_individuals["Generation"].max()" to "bugfix: Exception raised during population/budget scaling after reading from population.pkl" on Mar 6, 2026