
Restart hyperopt #1824

Merged: 40 commits merged into master on Nov 7, 2023

Conversation

@Cmurilochem (Collaborator) commented Oct 24, 2023

This PR addresses the issue of restarting a hyperoptimization with the hyperopt library, as discussed in #1800.

Comments on the initial changes made

1. hyper_optimization/filetrials.py

  • To the FileTrials class I added the from_pkl and to_pkl methods. The former is a @classmethod, useful for creating instances of the class when a tries.pkl file is available from a previous run. The to_pkl method saves the current state of FileTrials to a pickle file, although this is currently done for every trial in hyperopt.fmin directly via the trials_save_file argument.
  • In this regard, I also added an attribute self.pkl_file, which is responsible for generating a tries.pkl file in the same directory as tries.json.
  • An additional attribute self._rstate stores the last numpy.random.Generator of the hyperopt algorithm; it is passed as rstate to hyperopt.fmin so that a restarted run continues with the same random history as an uninterrupted one. The initial fixed seed in trials.rstate = np.random.default_rng(42) can still be relaxed and provided as input later.
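
To make the shape of these additions concrete, here is a minimal, self-contained sketch of the pickle round trip. It only mimics the real FileTrials (which subclasses hyperopt's Trials); the constructor signature and attribute names are assumptions for illustration.

```python
import pickle
from pathlib import Path

import numpy as np


class FileTrials:
    """Simplified stand-in for n3fit's FileTrials (which subclasses hyperopt.Trials)."""

    def __init__(self, output_path):
        # tries.pkl is created next to tries.json in the real implementation
        self.pkl_file = Path(output_path) / "tries.pkl"
        self._rstate = None  # last numpy.random.Generator used by hyperopt

    @property
    def rstate(self):
        return self._rstate

    @rstate.setter
    def rstate(self, generator):
        self._rstate = generator

    def to_pkl(self):
        """Dump the current state so a later run can resume from it."""
        with open(self.pkl_file, "wb") as f:
            pickle.dump(self, f)

    @classmethod
    def from_pkl(cls, pickle_filepath):
        """Recreate an instance from a previous run's tries.pkl."""
        with open(pickle_filepath, "rb") as f:
            return pickle.load(f)
```

Because the numpy Generator is pickled along with the trial state, a restored instance resumes the random stream exactly where the previous run left it.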

2. hyper_optimization/hyper_scan.py

  • For restarts, an extra boolean attribute named self.restart_hyperopt is added to HyperScanner; it is set to true when the --continue option is passed on the n3fit command line (details below).
  • I adapted hyper_scan_wrapper to check whether hyperscanner.restart_hyperopt is true. If so, it generates the initial FileTrials instance (trials) from tries.pkl, which contains the history of the previous hyperopt run as well as the trials.rstate attribute holding the previous numpy random generator.
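
As a rough sketch of that branching logic (the function name and the dict stand-in for the trials object are hypothetical; the real code builds a FileTrials instance):

```python
import os
import pickle

import numpy as np


def make_trials(replica_path, restart_hyperopt):
    """Hypothetical sketch: resume from tries.pkl when restarting,
    otherwise start fresh with a fixed-seed generator."""
    pkl_file = os.path.join(replica_path, "tries.pkl")
    if restart_hyperopt and os.path.exists(pkl_file):
        # the pickle carries both the trial history and the saved
        # numpy.random.Generator, so the run continues where it left off
        with open(pkl_file, "rb") as f:
            return pickle.load(f)
    # fresh start: a plain dict stands in for the real FileTrials object
    return {"history": [], "rstate": np.random.default_rng(42)}
```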

3. scripts/n3fit_exec.py

This is perhaps the most fragile of the changes and where I would need help to adapt it properly.

  • To N3FitApp I added a new command-line option, --continue, which is the keyword for hyperopt restarts.
  • To its run method I added a new attribute, self.environment.restart = self.args["continue"].
  • The way I found to pass this keyword on to HyperScanner is through produce_hyperscanner: if the flag is true, I update hyperscan_config with hyperscan_config.update({'restart': 'true'}), which later becomes part of the HyperScanner's sampling_dict argument.
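
A stand-alone argparse sketch of the flag wiring (hypothetical names; the real parser belongs to reportengine's N3FitApp):

```python
import argparse

parser = argparse.ArgumentParser(prog="n3fit")
parser.add_argument(
    "--continue",
    dest="restart",  # "continue" is a Python keyword, so store it under another name
    action="store_true",
    help="restart a previous hyperopt run from its tries.pkl file",
)


def produce_hyperscan_config(hyperscan_config, restart):
    """Sketch of propagating the flag into what becomes HyperScanner's sampling_dict."""
    if restart:
        hyperscan_config.update({"restart": "true"})
    return hyperscan_config
```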

Questions and requested feedback

  • The adaptations made in scripts/n3fit_exec.py to allow for --continue do not look optimal to me. Maybe a more experienced developer could suggest a more convenient approach.
  • Despite our efforts to ensure that restarted hyperopt runs have the same history as direct runs, it turns out that, although the hyperparameter guesses match, restarted calculations nevertheless show differences in the final losses obtained for the different k-folds. This might be because the seeds for the initial weights of each k-fold are inherently different between runs (see below).
    For example, I ran a simple hyperoptimization with 2 trials and then restarted it to run another 2 trials (4 in total). I then ran a second experiment computing 4 trials directly with the same runcard and compared the results.
Restart 0
 {'validation_losses': '[[ 8.973183]\n [12.285054]]', 'experimental_losses': '[[52.548443]\n [42.66381 ]]', 'hyper_losses': '[[52.548443]\n [42.66381 ]]'}
 {'Adam_clipnorm': [], 'Adam_learning_rate': [], 'Nadam_clipnorm': [2.0117276880696024e-05], 'Nadam_learning_rate': [0.002011336799276246], 'nl2:-0/2': [41.0], 'nl2:-1/2': [32.0], 'nodes_per_layer': ['0'], 'optimizer': ['0']}
Direct 0
 {'validation_losses': '[[ 8.973183]\n [12.285054]]', 'experimental_losses': '[[52.548443]\n [42.66381 ]]', 'hyper_losses': '[[52.548443]\n [42.66381 ]]'}
 {'Adam_clipnorm': [], 'Adam_learning_rate': [], 'Nadam_clipnorm': [2.0117276880696024e-05], 'Nadam_learning_rate': [0.002011336799276246], 'nl2:-0/2': [41.0], 'nl2:-1/2': [32.0], 'nodes_per_layer': ['0'], 'optimizer': ['0']}

Restart 1
 {'validation_losses': '[[14.459031]\n [29.935106]]', 'experimental_losses': '[[64.00286 ]\n [74.955086]]', 'hyper_losses': '[[64.00286 ]\n [74.955086]]'}
 {'Adam_clipnorm': [4.9234053337502195e-06], 'Adam_learning_rate': [0.00013178694123594783], 'Nadam_clipnorm': [], 'Nadam_learning_rate': [], 'nl2:-0/2': [40.0], 'nl2:-1/2': [31.0], 'nodes_per_layer': ['0'], 'optimizer': ['1']}
Direct 1
 {'validation_losses': '[[14.459031]\n [29.935106]]', 'experimental_losses': '[[64.00286 ]\n [74.955086]]', 'hyper_losses': '[[64.00286 ]\n [74.955086]]'}
 {'Adam_clipnorm': [4.9234053337502195e-06], 'Adam_learning_rate': [0.00013178694123594783], 'Nadam_clipnorm': [], 'Nadam_learning_rate': [], 'nl2:-0/2': [40.0], 'nl2:-1/2': [31.0], 'nodes_per_layer': ['0'], 'optimizer': ['1']}

-- restarting and performing two more trials

Restart 2
 {'validation_losses': '[[19.959248]\n [39.484573]]', 'experimental_losses': '[[55.306988]\n [92.67653 ]]', 'hyper_losses': '[[55.306988]\n [92.67653 ]]'}
 {'Adam_clipnorm': [], 'Adam_learning_rate': [], 'Nadam_clipnorm': [5.226179920719625e-06], 'Nadam_learning_rate': [0.00020623358967934892], 'nl2:-0/2': [32.0], 'nl2:-1/2': [13.0], 'nodes_per_layer': ['0'], 'optimizer': ['0']}
Direct 2
 {'validation_losses': '[[19.959248]\n [43.446335]]', 'experimental_losses': '[[ 55.306988]\n [104.45502 ]]', 'hyper_losses': '[[ 55.306988]\n [104.45502 ]]'}
 {'Adam_clipnorm': [], 'Adam_learning_rate': [], 'Nadam_clipnorm': [5.226179920719625e-06], 'Nadam_learning_rate': [0.00020623358967934892], 'nl2:-0/2': [32.0], 'nl2:-1/2': [13.0], 'nodes_per_layer': ['0'], 'optimizer': ['0']}

Restart 3
 {'validation_losses': '[[23.391615]\n [65.55123 ]]', 'experimental_losses': '[[ 69.19588]\n [137.06268]]', 'hyper_losses': '[[ 69.19588]\n [137.06268]]'}
 {'Adam_clipnorm': [1.6662863474168997e-07], 'Adam_learning_rate': [0.0025640118767782183], 'Nadam_clipnorm': [], 'Nadam_learning_rate': [], 'nl2:-0/2': [31.0], 'nl2:-1/2': [17.0], 'nodes_per_layer': ['0'], 'optimizer': ['1']}
Direct 3
 {'validation_losses': '[[23.391615]\n [44.021515]]', 'experimental_losses': '[[69.19588 ]\n [93.464554]]', 'hyper_losses': '[[69.19588 ]\n [93.464554]]'}
 {'Adam_clipnorm': [1.6662863474168997e-07], 'Adam_learning_rate': [0.0025640118767782183], 'Nadam_clipnorm': [], 'Nadam_learning_rate': [], 'nl2:-0/2': [31.0], 'nl2:-1/2': [17.0], 'nodes_per_layer': ['0'], 'optimizer': ['1']}

Looking at the above results, Restart 2/3 have the same hyperparameters as Direct 2/3, yet the two folds have different losses: the first fold still reproduces the losses, but the second does not. With the help of @goord and @APJansen, I investigated this issue by printing the random integers passed as seeds to generate the PDF models for each fold in ModelTrainer.hyperparametrizable(); see here. They are shown in the table below:

| Random integers | Trial 0, Fold 1 | Trial 0, Fold 2 | Trial 1, Fold 1 | Trial 1, Fold 2 | Trial 2, Fold 1 | Trial 2, Fold 2 | Trial 3, Fold 1 | Trial 3, Fold 2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Restart job | 1181867710 | 461027504 | 1181867710 | 1020231754 | 1181867710 | 461027504 | 1181867710 | 1020231754 |
| Direct job | 1181867710 | 461027504 | 1181867710 | 1020231754 | 1181867710 | 1543757328 | 1181867710 | 1392765670 |

As anticipated, the table makes it clear that the seeds for the second fold differ every time a new calculation is run, even though the runs start from the same hyperparameters. This is directly reflected in the different losses shown above. I suspect that if we want to make hyperopt runs completely reproducible, we should consider alternatives to

```python
for k, partition in enumerate(self.kpartitions):
    # Each partition of the kfolding needs to have its own separate model
    # and the seed needs to be updated accordingly
    seeds = self._nn_seeds
    if k > 0:
        seeds = [np.random.randint(0, pow(2, 31)) for _ in seeds]
```

to initialise the seeds.
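
To see why this is fragile, note that np.random.randint draws from NumPy's global state, so the fold seeds depend on how much randomness the process consumed earlier, and a direct job and a restarted job consume different amounts. A small illustration (the seeding and the number of earlier draws are invented for the example):

```python
import numpy as np

# A "direct" job: earlier trials already consumed some global randomness
np.random.seed(0)
np.random.randint(0, 2**31, size=4)  # stand-in for draws made by trials 0-1
direct_fold_seed = np.random.randint(0, 2**31)

# A "restarted" job: a fresh process with no earlier draws
np.random.seed(0)
restart_fold_seed = np.random.randint(0, 2**31)

# Same global seed, but the fold seeds disagree because the
# position in the random stream differs between the two jobs
stream_positions_matter = direct_fold_seed != restart_fold_seed
```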

Solution to the random integer issue described above

4. model_trainer.py

To ensure that these seeds are generated in a reproducible way, @RoyStegeman helped me devise a new scheme that changes how they are generated:

```python
for k, partition in enumerate(self.kpartitions):
    # Each partition of the kfolding needs to have its own separate model
    # and the seed needs to be updated accordingly
    seeds = self._nn_seeds
    if k > 0:
        # generate random integers for each k-fold from the input `nnseeds`
        # we generate new seeds to avoid the integer overflow that may
        # occur when doing k*nnseeds
        rngs = [np.random.default_rng(seed=seed) for seed in seeds]
        seeds = [generator.integers(1, pow(2, 30)) * k for generator in rngs]
```
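
As a stand-alone check of the new derivation (the function wrapper is mine; its body mirrors the snippet above): the fold-k seeds now depend only on the input seeds and k, so restarted and direct runs agree. Note also that the base draw depends only on nn_seeds, so the same integers recur in every trial.

```python
import numpy as np


def kfold_seeds(nn_seeds, k):
    """Derive fold-k seeds deterministically from the input nn_seeds,
    mirroring the snippet above."""
    if k == 0:
        return list(nn_seeds)
    # a fresh Generator per input seed: reproducible, no global state involved
    rngs = [np.random.default_rng(seed=seed) for seed in nn_seeds]
    return [generator.integers(1, pow(2, 30)) * k for generator in rngs]


# Two independent "runs" produce identical fold seeds
run_a = [kfold_seeds([7, 11], k) for k in range(3)]
run_b = [kfold_seeds([7, 11], k) for k in range(3)]
```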

With all the above modifications, I repeated my previous 4-trial experiment. The results for both restart and direct runs are shown below:

Restart 0
 {'validation_losses': ['2.2993183', '4.4195056'], 'experimental_losses': [10.660690008425245, 13.892794249487705], 'hyper_losses': [19.669736106403093, 21.73920023647384]}
 {'Adadelta_clipnorm': [], 'Adadelta_learning_rate': [], 'RMSprop_learning_rate': [0.015380823956886622], 'activation_per_layer': ['0'], 'dropout': [0.15], 'epochs': [35.0], 'initializer': ['0'], 'multiplier': [1.074400261320179], 'nl2:-0/2': [], 'nl2:-1/2': [], 'nl3:-0/3': [], 'nl3:-1/3': [], 'nl3:-2/3': [], 'nl4:-0/4': [15.0], 'nl4:-1/4': [41.0], 'nl4:-2/4': [36.0], 'nl4:-3/4': [45.0], 'nl5:-0/5': [], 'nl5:-1/5': [], 'nl5:-2/5': [], 'nl5:-3/5': [], 'nl5:-4/5': [], 'nodes_per_layer': ['2'], 'optimizer': ['1'], 'stopping_patience': [0.3600000000000001]}
Direct 0
 {'validation_losses': ['2.2993183', '4.4195056'], 'experimental_losses': [10.660690008425245, 13.892794249487705], 'hyper_losses': [19.669736106403093, 21.73920023647384]}
 {'Adadelta_clipnorm': [], 'Adadelta_learning_rate': [], 'RMSprop_learning_rate': [0.015380823956886622], 'activation_per_layer': ['0'], 'dropout': [0.15], 'epochs': [35.0], 'initializer': ['0'], 'multiplier': [1.074400261320179], 'nl2:-0/2': [], 'nl2:-1/2': [], 'nl3:-0/3': [], 'nl3:-1/3': [], 'nl3:-2/3': [], 'nl4:-0/4': [15.0], 'nl4:-1/4': [41.0], 'nl4:-2/4': [36.0], 'nl4:-3/4': [45.0], 'nl5:-0/5': [], 'nl5:-1/5': [], 'nl5:-2/5': [], 'nl5:-3/5': [], 'nl5:-4/5': [], 'nodes_per_layer': ['2'], 'optimizer': ['1'], 'stopping_patience': [0.3600000000000001]}

Restart 1
 {'validation_losses': ['10.667141', '18.144234'], 'experimental_losses': [14.569936714920344, 25.68137247054303], 'hyper_losses': [46.88904701966194, 52.881341569995435]}
 {'Adadelta_clipnorm': [1.7558937825962389], 'Adadelta_learning_rate': [0.02971486397602543], 'RMSprop_learning_rate': [], 'activation_per_layer': ['0'], 'dropout': [0.03], 'epochs': [30.0], 'initializer': ['0'], 'multiplier': [1.0896393776712885], 'nl2:-0/2': [], 'nl2:-1/2': [], 'nl3:-0/3': [], 'nl3:-1/3': [], 'nl3:-2/3': [], 'nl4:-0/4': [13.0], 'nl4:-1/4': [33.0], 'nl4:-2/4': [12.0], 'nl4:-3/4': [44.0], 'nl5:-0/5': [], 'nl5:-1/5': [], 'nl5:-2/5': [], 'nl5:-3/5': [], 'nl5:-4/5': [], 'nodes_per_layer': ['2'], 'optimizer': ['0'], 'stopping_patience': [0.18000000000000005]}
Direct 1
 {'validation_losses': ['10.667141', '18.144234'], 'experimental_losses': [14.569936714920344, 25.68137247054303], 'hyper_losses': [46.88904701966194, 52.881341569995435]}
 {'Adadelta_clipnorm': [1.7558937825962389], 'Adadelta_learning_rate': [0.02971486397602543], 'RMSprop_learning_rate': [], 'activation_per_layer': ['0'], 'dropout': [0.03], 'epochs': [30.0], 'initializer': ['0'], 'multiplier': [1.0896393776712885], 'nl2:-0/2': [], 'nl2:-1/2': [], 'nl3:-0/3': [], 'nl3:-1/3': [], 'nl3:-2/3': [], 'nl4:-0/4': [13.0], 'nl4:-1/4': [33.0], 'nl4:-2/4': [12.0], 'nl4:-3/4': [44.0], 'nl5:-0/5': [], 'nl5:-1/5': [], 'nl5:-2/5': [], 'nl5:-3/5': [], 'nl5:-4/5': [], 'nodes_per_layer': ['2'], 'optimizer': ['0'], 'stopping_patience': [0.18000000000000005]}

-- restarting and performing two more trials

Restart 2
 {'validation_losses': ['18.18834', '52.55721'], 'experimental_losses': [21.345310585171568, 49.5125512295082], 'hyper_losses': [144.60983298921894, 105.20777437819639]}
 {'Adadelta_clipnorm': [0.8411342478713798], 'Adadelta_learning_rate': [0.04928810632634438], 'RMSprop_learning_rate': [], 'activation_per_layer': ['1'], 'dropout': [0.09], 'epochs': [47.0], 'initializer': ['1'], 'multiplier': [1.0615455307107098], 'nl2:-0/2': [16.0], 'nl2:-1/2': [35.0], 'nl3:-0/3': [], 'nl3:-1/3': [], 'nl3:-2/3': [], 'nl4:-0/4': [], 'nl4:-1/4': [], 'nl4:-2/4': [], 'nl4:-3/4': [], 'nl5:-0/5': [], 'nl5:-1/5': [], 'nl5:-2/5': [], 'nl5:-3/5': [], 'nl5:-4/5': [], 'nodes_per_layer': ['0'], 'optimizer': ['0'], 'stopping_patience': [0.12000000000000002]}
Direct 2
 {'validation_losses': ['18.18834', '52.55721'], 'experimental_losses': [21.345310585171568, 49.5125512295082], 'hyper_losses': [144.60983298921894, 105.20777437819639]}
 {'Adadelta_clipnorm': [0.8411342478713798], 'Adadelta_learning_rate': [0.04928810632634438], 'RMSprop_learning_rate': [], 'activation_per_layer': ['1'], 'dropout': [0.09], 'epochs': [47.0], 'initializer': ['1'], 'multiplier': [1.0615455307107098], 'nl2:-0/2': [16.0], 'nl2:-1/2': [35.0], 'nl3:-0/3': [], 'nl3:-1/3': [], 'nl3:-2/3': [], 'nl4:-0/4': [], 'nl4:-1/4': [], 'nl4:-2/4': [], 'nl4:-3/4': [], 'nl5:-0/5': [], 'nl5:-1/5': [], 'nl5:-2/5': [], 'nl5:-3/5': [], 'nl5:-4/5': [], 'nodes_per_layer': ['0'], 'optimizer': ['0'], 'stopping_patience': [0.12000000000000002]}

Restart 3
 {'validation_losses': ['26.753922', '24.388603'], 'experimental_losses': [52.71014284620098, 35.8982934170082], 'hyper_losses': [82.31994112766945, 3697.219938467043]}
 {'Adadelta_clipnorm': [0.44633727461389994], 'Adadelta_learning_rate': [0.023650226340698025], 'RMSprop_learning_rate': [], 'activation_per_layer': ['1'], 'dropout': [0.09], 'epochs': [26.0], 'initializer': ['1'], 'multiplier': [1.0166524890792967], 'nl2:-0/2': [38.0], 'nl2:-1/2': [34.0], 'nl3:-0/3': [], 'nl3:-1/3': [], 'nl3:-2/3': [], 'nl4:-0/4': [], 'nl4:-1/4': [], 'nl4:-2/4': [], 'nl4:-3/4': [], 'nl5:-0/5': [], 'nl5:-1/5': [], 'nl5:-2/5': [], 'nl5:-3/5': [], 'nl5:-4/5': [], 'nodes_per_layer': ['0'], 'optimizer': ['0'], 'stopping_patience': [0.24000000000000005]}
Direct 3
 {'validation_losses': ['26.753922', '24.388603'], 'experimental_losses': [52.71014284620098, 35.8982934170082], 'hyper_losses': [82.31994112766945, 3697.219938467043]}
 {'Adadelta_clipnorm': [0.44633727461389994], 'Adadelta_learning_rate': [0.023650226340698025], 'RMSprop_learning_rate': [], 'activation_per_layer': ['1'], 'dropout': [0.09], 'epochs': [26.0], 'initializer': ['1'], 'multiplier': [1.0166524890792967], 'nl2:-0/2': [38.0], 'nl2:-1/2': [34.0], 'nl3:-0/3': [], 'nl3:-1/3': [], 'nl3:-2/3': [], 'nl4:-0/4': [], 'nl4:-1/4': [], 'nl4:-2/4': [], 'nl4:-3/4': [], 'nl5:-0/5': [], 'nl5:-1/5': [], 'nl5:-2/5': [], 'nl5:-3/5': [], 'nl5:-4/5': [], 'nodes_per_layer': ['0'], 'optimizer': ['0'], 'stopping_patience': [0.24000000000000005]}

| Random integers | Trial 0, Fold 1 | Trial 0, Fold 2 | Trial 1, Fold 1 | Trial 1, Fold 2 | Trial 2, Fold 1 | Trial 2, Fold 2 | Trial 3, Fold 1 | Trial 3, Fold 2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Restart job | 1872583848 | 203138455 | 1872583848 | 203138455 | 1872583848 | 203138455 | 1872583848 | 203138455 |
| Direct job | 1872583848 | 203138455 | 1872583848 | 203138455 | 1872583848 | 203138455 | 1872583848 | 203138455 |

As seen, we are now able to ensure that both the hyperparameter space and the initial weights for each k-fold are reproducible when restarting.

Note

As can be seen from the last table, because the seeds used to generate the random integers for each k-fold are now derived from the fixed value self._nn_seeds, the generated random integers will be the same in every trial; see #1824 (comment). This is an important aspect to keep in mind.

@RoyStegeman (Member)

Thanks @Cmurilochem , do you understand why the test is failing in ubuntu?

@Cmurilochem (Collaborator, Author) commented Oct 26, 2023

> Thanks @Cmurilochem, do you understand why the test is failing in ubuntu?

Hi @RoyStegeman, I added a new test that compares the results of a restarted and a direct hyperopt run. It checks for files and therefore depends on paths and on where pytest is run from. I will try to correct this soon.
I do not know why this test is not being run in macos. Do you have any idea?

Note: The test (as it is) is not expected to pass entirely, since (among other asserts) it requires the final json ['results'] dictionaries of both runs to match here. This relates to my comments above regarding the differences in the hyper losses for different folds. @goord gave me a nice idea on how to investigate this issue; further details to come.

@RoyStegeman (Member) commented Oct 26, 2023

> I do not know why this is not being run in macos. Do you have any idea?

That's actually a good question. Since you added the pytest.mark.skip decorator, I would have expected it to be skipped for both ubuntu and macos, so my question would be why it does run in ubuntu rather than why it doesn't in macos. Either way, since you're still investigating an issue related to this specific test, let's not worry too much about it now.

@Radonirinaunimi (Member)

Thanks a lot @Cmurilochem for this work! Regarding the issue you are facing now, are you sure that the other seeds (tr/vl, MC replicas) aren't also different? In any case, I think there should be some (simple) ways to trick the random generators into believing they are starting from the $n$-th trial (as you can see from the table, the continued hyperopt restarted exactly from the 2nd trial).

@RoyStegeman (Member) left a review comment:

Some comments but not a complete review

Review comments (since resolved) on:
- n3fit/src/n3fit/hyper_optimization/filetrials.py
- n3fit/src/n3fit/tests/test_hyperopt.py
- n3fit/src/n3fit/tests/hyperopt/hyper-quickcard.yml
- n3fit/src/n3fit/scripts/n3fit_exec.py
- n3fit/src/n3fit/model_trainer.py
@Cmurilochem (Collaborator, Author)

> Thanks a lot @Cmurilochem for this work! Regarding the issue you are facing now, are you sure that the other seeds (tr/vl, MC replicas) aren't also different? In any case, I think there should be some (simple) ways to trick the random generators into believing they are starting from the n-th trial (as you can see from the table, the continued hyperopt restarted exactly from the 2nd trial).

Thanks @Radonirinaunimi. This is something I will need to check; thanks for pointing it out. At least the provisional change to seeds = [seed * k for seed in seeds] was enough to solve the problem. My test is now passing, at least locally. I am still struggling a bit to make it work in the CI/CD; I hope to solve that soon.

@RoyStegeman (Member) left a review comment:

And these are my comments for now. Thanks for the work so far!

> I suspect that this might be due to the fact that the seeds for the initial weights for each k-fold in different runs are inherently different (see below).

I suspect so as well; I noticed you froze the seed for the folds but probably not the tensorflow/numpy/python seeds. If the disagreement is due to the effect of random seeds, it is of course not a problem for a real run of the hyperoptimization, and for your tests you already freeze them by setting debug: true, so that should also be fine.

Review comments (since resolved) on:
- n3fit/src/n3fit/model_trainer.py
- n3fit/src/n3fit/hyper_optimization/hyper_scan.py
@Radonirinaunimi (Member) left a review comment:

Hi @Cmurilochem, here is a quick review. These are mainly formatting/styling comments and requests for clarification on various points.

> In relation to the replica dependence. In the multireplica case I suspect that self._nn_seeds (= the nnseeds argument in ModelTrainer) is already a list of seeds, one for each replica. I am not quite certain but, please, correct me if I am wrong.

Yes, that is correct! The status of master right now is that it can have different seeds per replica for MCseed and NNseed during a multireplica fit; the only seed that is always the same is the TrVlseed, which #1788 will fix.

Review comments (since resolved) on:
- n3fit/src/n3fit/hyper_optimization/filetrials.py
- n3fit/src/n3fit/hyper_optimization/hyper_scan.py
- n3fit/src/n3fit/tests/test_hyperopt.py
@Cmurilochem (Collaborator, Author) commented Nov 6, 2023

> @Cmurilochem Other than moving the HYPEROPT_SEED to the runcard, is there anything else you want to do in this PR before merging?
>
> If not I'll have a quick last look but then I'd say this is done?

Hi @Radonirinaunimi. I have addressed all of @RoyStegeman's suggestions and will now proceed to yours. Thank you for your time in reviewing and for your excellent suggestions.

Cmurilochem and others added 8 commits November 6, 2023 07:40
Co-authored-by: Tanjona Rabemananjara <rrabeman@nikhef.nl>
@Cmurilochem Cmurilochem marked this pull request as ready for review November 6, 2023 11:03
@RoyStegeman (Member) left a review comment:

Thanks, looks good!

HYPEROPT_SEED is going to remain fixed for now?

@Cmurilochem (Collaborator, Author)

> Thanks, looks good!
>
> HYPEROPT_SEED is going to remain fixed for now?

Hi @RoyStegeman, that is what we initially thought and implemented so far. But if you think it would be better, I could add a new entry to the runcard so that (like the other seeds) the user has control over it. Please just let me know what you think.

@RoyStegeman (Member)

Don't worry, I'm fine either way; I was just making sure it wasn't something you had forgotten about, as I had understood you planned to take it from a runcard seed.

@Cmurilochem (Collaborator, Author)

> Don't worry, I'm fine either way; I was just making sure it wasn't something you had forgotten about, as I had understood you planned to take it from a runcard seed.

Great! Maybe we could add this feature in the near future if we feel the need for it. I will keep this in mind.

@Radonirinaunimi (Member)

Thanks a lot @Cmurilochem! I guess the only minor thing missing before this PR can be merged is a small note in the documentation (at the end of https://docs.nnpdf.science/n3fit/hyperopt.html?highlight=hyperopt) describing how one can restart hyperopt.

@Cmurilochem (Collaborator, Author)

> Thanks a lot @Cmurilochem! I guess the only minor thing missing before this PR can be merged is a small note in the documentation (at the end of https://docs.nnpdf.science/n3fit/hyperopt.html?highlight=hyperopt) describing how one can restart hyperopt.

Hi @Radonirinaunimi. I could add a note after "Changing the hyperoptimization target" and let you know once the commit is done.

@RoyStegeman (Member)

Ah, good point! It needs to be documented of course. Completely forgot about that 😅

@Cmurilochem (Collaborator, Author) commented Nov 6, 2023

> Ah, good point! It needs to be documented of course. Completely forgot about that 😅

Thanks @RoyStegeman and @Radonirinaunimi. Documentation added! Please feel free to suggest any changes and/or additions.

Co-authored-by: Roy Stegeman <roystegeman@live.nl>
@Cmurilochem (Collaborator, Author)

Hi @RoyStegeman, @Radonirinaunimi and @scarlehoff. Please let me know whether I can merge this PR now that Roy and Tanjona have approved. Thank you all for your very valuable suggestions.

@RoyStegeman (Member)

Yes, please merge this

@Cmurilochem Cmurilochem merged commit 68a372f into master Nov 7, 2023
4 checks passed
@Cmurilochem Cmurilochem deleted the restart_hyperopt branch November 7, 2023 18:35
Successfully merging this pull request may close these issues:

- Restart Hyperopt from pickle tries.pkl file
- Allow for hyperoptimization restart from tries.json