bugfix: Exception raised during population/budget scaling after reading from population.pkl #1392
Open
reevesc7 wants to merge 1 commit into EpistasisLab:main from
What does this PR do?
This PR adds a cast to integer when retrieving `self.population.evaluated_individuals["Generation"].max()` in `tpot.evolvers.BaseEvolver.__init__()`.
This resolves the symptoms of an issue with `evaluated_individuals["Generation"]` values being stored as `numpy.float64`.
Where should the reviewer start?
`tpot/evolvers/base_evolver.py:BaseEvolver.__init__()`, line 424.
How should this PR be tested?
It would be worth ensuring that no `np.nan` values can ever be stored in `evaluated_individuals["Generation"]`.
Beyond that and precision errors for extremely large values, I cannot conceive of this causing any issues.
The following script replicates the issue:
Before the patch, this script will throw the following exception:
After the patch, it should run successfully.
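The two risks named above (a `NaN` in the column and precision loss for very large values) can be checked in isolation. The following is a minimal sketch and not the reproduction script referred to above; `safe_generation` is a hypothetical helper that does not exist in TPOT. It uses plain Python floats, which behave the same way here because `numpy.float64` is a subclass of `float`.

```python
import math


def safe_generation(value):
    """Hypothetical helper: cast a float generation counter to int,
    rejecting NaN and non-whole values instead of silently truncating."""
    value = float(value)
    if math.isnan(value):
        raise ValueError("Generation is NaN; cannot convert to int")
    if not value.is_integer():
        raise ValueError(f"Generation {value!r} is not a whole number")
    return int(value)


# A whole-valued float converts cleanly.
print(safe_generation(7.0))

# A bare int() cast on NaN raises ValueError rather than returning a number.
try:
    int(float("nan"))
except ValueError as exc:
    print("int() on NaN failed:", exc)

# Above 2**53 a float64 can no longer represent every integer exactly,
# so a round trip through float may change the value.
big = 2**53 + 1
print(int(float(big)) == big)
```

A generation counter is very unlikely to approach 2**53, so in practice the `NaN` case is the one worth guarding against.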
Any background context you want to provide?
As far as I understand, the `.loc[]` insertion into the `self.evaluated_individuals` `DataFrame` in `Population.update_column()` forces Pandas to account for `NaN` values in any added column(s).
Because NumPy integers are not nullable, Pandas converts integers into `numpy.float64`, which then becomes the dtype of the `"Generation"` column.
This data then gets stored in `population.pkl` if `periodic_checkpoint_folder` is set.
When a `TPOTEstimator` with the same `periodic_checkpoint_folder` then runs `.fit()`, its `_evolver_instance` (type `BaseEvolver`) reads from that `population.pkl`.
This sets the `evaluated_individuals` property of the `BaseEvolver` to the data in `population.pkl`.
Then the `BaseEvolver` sets its `generation` property to the maximum value from `evaluated_individuals["Generation"]`, which is a `numpy.float64`.
If there is population or budget scaling occurring across generations, the `BaseEvolver` attempts to index its `population_size_list` or `budget_list` properties with `generation`.
This raises an exception if `generation` is a float, as floats are not valid for indexing.
Notably, it would be better to fix `Population.update_column()` such that it does not store integers as floats.
However, integer casting on retrieval is still a good idea alongside such a fix. Pandas is doing a lot of magic in that function, and I struggled to come up with a decently safe solution that allowed for casting to a Pandas nullable integer dtype.
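The chain of events described above can be sketched outside TPOT. This is an illustration under assumptions, not TPOT's actual code: `df` is a stand-in for `evaluated_individuals`, `population_size_list` is modeled as a plain Python list, and the partial-row `.loc[]` assignment is only one way to trigger the upcast.

```python
import pandas as pd

# Stand-in for evaluated_individuals: two individuals, no "Generation" column yet.
df = pd.DataFrame({"score": [0.1, 0.2]}, index=["a", "b"])

# Adding a column through .loc[] for only some rows leaves the others as NaN,
# so pandas stores the whole column as float64 instead of an integer dtype.
df.loc[["a"], "Generation"] = 3
print(df["Generation"].dtype)

# max() skips the NaN and returns a float scalar, not an int.
generation = df["Generation"].max()
print(type(generation).__name__, generation)

# Assumed stand-in for population_size_list / budget_list.
population_size_list = [10, 20, 30, 40]

# Indexing with a float fails (TypeError for a list; a NumPy array
# would raise IndexError instead).
try:
    population_size_list[generation]
except (TypeError, IndexError) as exc:
    print("indexing with a float failed:", exc)

# Casting on retrieval, as this PR does, makes the index valid again.
generation = int(generation)
print(population_size_list[generation])

# One possible alternative at storage time: the pandas nullable integer
# dtype keeps whole numbers as integers and represents gaps as <NA>.
nullable = df["Generation"].astype("Int64")
print(nullable.dtype, int(nullable.max()))
```

The cast on retrieval treats the symptom at the point of use, while the nullable `Int64` dtype would address the cause at the point of storage. The two approaches are compatible.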
What are the relevant issues?
None that I was able to find during a brief search.
Questions: