PwBaseWorkChain: handler for BFGS history failure #985

Merged
bastonero merged 9 commits into aiidateam:main from the handler/pw branch on Dec 22, 2023

bastonero
Collaborator

Sometimes the BFGS algorithm for ionic minimization fails, and the current handler simply restarts from scratch. This might work, but here we try to improve on this simplistic solution by first lowering the trust radius and then switching to a different algorithm.
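
As a rough illustration of the strategy described above, the sketch below works on a plain parameters dictionary rather than on the actual work chain. The function name, the halving factor, the floor value, and the choice of `damp` as the fallback are assumptions made for this example and are not taken from the pull request; `ion_dynamics` and `trust_radius_min` are keywords of the Quantum ESPRESSO `IONS` namelist.

```python
from copy import deepcopy

# All three constants are assumptions for illustration only.
DEFAULT_TRUST_RADIUS_MIN = 1.0e-3  # assumed to match the default documented by QE
TRUST_RADIUS_FLOOR = 1.0e-4        # hypothetical threshold below which we give up on BFGS
REDUCTION_FACTOR = 0.5             # hypothetical reduction applied on each failure


def adjust_for_bfgs_history_failure(parameters):
    """Return a modified copy of ``parameters`` to use for the next restart.

    First keep lowering ``trust_radius_min``; once it would drop below the
    floor, switch ``ion_dynamics`` away from BFGS instead.
    """
    new = deepcopy(parameters)
    ions = new.setdefault('IONS', {})

    if ions.get('ion_dynamics', 'bfgs') != 'bfgs':
        # Already switched away from BFGS: nothing left to do in this sketch.
        return new

    reduced = ions.get('trust_radius_min', DEFAULT_TRUST_RADIUS_MIN) * REDUCTION_FACTOR
    if reduced >= TRUST_RADIUS_FLOOR:
        ions['trust_radius_min'] = reduced
    else:
        ions['ion_dynamics'] = 'damp'
        ions.pop('trust_radius_min', None)  # only meaningful for BFGS
    return new


params = {'CONTROL': {'calculation': 'relax'}, 'IONS': {}}
for _ in range(4):
    params = adjust_for_bfgs_history_failure(params)
    print(params['IONS'])
```

The sketch ignores the cell dynamics of a `vc-relax` run, which would also need a compatible setting when the ionic algorithm changes.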

Comment on lines 507 to 508
@process_handler(priority=559, exit_codes=[
PwCalculation.exit_codes.ERROR_IONIC_CYCLE_BFGS_HISTORY_FAILURE,
Contributor

Should we remove this exit code from the handler above (with priority 560)? Otherwise it will be handled twice. Since the first handler will just set the restart type and report the error being handled, I don't think we need it. And do you maybe want to handle ERROR_IONIC_CYCLE_BFGS_HISTORY_AND_FINAL_SCF_FAILURE also in this new way?

Collaborator Author

Why will it be handled twice? If the first handler fixes the issue, the second one shouldn't be called, should it?
Re ERROR_IONIC_CYCLE_BFGS_HISTORY_AND_FINAL_SCF_FAILURE: I wasn't sure what the FINAL_SCF_FAILURE actually meant. But indeed we can handle it too.

Contributor

You are right, it returns ProcessHandlerReport(True) where the True is for the do_break argument, which is False by default. If set to True all other handlers are skipped. But that makes the problem worse. This means that your new handler is never actually called, right? Because it first matches the handler with priority 560 and then breaks. So it will never get to the next one.

The ERROR_IONIC_CYCLE_BFGS_HISTORY_AND_FINAL_SCF_FAILURE exit code means that first BFGS failed due to the history and then on top of that, the final scf also didn't converge. I think it makes sense to handle this as the normal BFGS history problem.

Collaborator Author

Ok, interesting. But why should it stop first at the handler with priority 560? I thought a lower priority number meant it would be called sooner. Is it the opposite? If that is the case we can just change it to 561.

Contributor

Nope, they are called based on priority in reverse order:
https://github.com/aiidateam/aiida-core/blob/main/aiida/engine/processes/workchains/restart.py#L243
This means higher numbers first. See also the docs: https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/workchains_restart.html#multiple-process-handlers

The process handlers with a higher priority will be called first.
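
The ordering issue discussed in this thread can be reproduced with a small stand-alone model. This is a simplified sketch, not the aiida-core implementation: `Report`, `Handler`, and `dispatch` are hypothetical stand-ins, and both handlers are assumed to match the exit code and to return a report with `do_break=True`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Report:
    """Stand-in for a handler report; only the ``do_break`` flag is modelled."""
    do_break: bool = False


@dataclass
class Handler:
    """Hypothetical container pairing a handler function with its priority."""
    name: str
    priority: int
    run: Callable[[], Optional[Report]]


def dispatch(handlers: List[Handler]) -> List[str]:
    """Call handlers from highest to lowest priority, stopping on ``do_break``."""
    called = []
    for handler in sorted(handlers, key=lambda h: h.priority, reverse=True):
        called.append(handler.name)
        report = handler.run()
        if report is not None and report.do_break:
            break
    return called


def breaking_handler() -> Report:
    # Both handlers in this sketch handle the error and ask to skip the rest.
    return Report(do_break=True)


# New handler at priority 559: the generic handler (560) runs first and breaks.
before = dispatch([
    Handler('generic_ionic_convergence', 560, breaking_handler),
    Handler('bfgs_history', 559, breaking_handler),
])

# New handler moved to priority 561: it now runs first.
after = dispatch([
    Handler('generic_ionic_convergence', 560, breaking_handler),
    Handler('bfgs_history', 561, breaking_handler),
])

print(before)  # ['generic_ionic_convergence']
print(after)   # ['bfgs_history']
```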

Collaborator Author

Alright, thanks! Then I'll move it to a higher priority and commit.

Member

To nitpick a bit further here ^^

If the handle_relax_recoverable_ionic_convergence_bfgs_history_error process handler is called first, and it switches away from bfgs at some point, can't we just remove the ERROR_IONIC_CYCLE_BFGS_HISTORY_FAILURE and ERROR_IONIC_CYCLE_BFGS_HISTORY_AND_FINAL_SCF_FAILURE exit codes from the handle_relax_recoverable_ionic_convergence_error process handler? There shouldn't be any BFGS history failures for damp or fire, I'd think?

@mbercx
Member

mbercx commented Nov 24, 2023

Thanks @bastonero! Does this one fix #637?

I remember I was testing this when @giovannipizzi first opened the issue (and then got distracted, as usual), and the damped was very effective for relax, but not-so-effective for vc-relax. The strategy here is already quite good, but we may consider immediately switching to the damped algorithm for calculation == relax.

@bastonero
Collaborator Author

Thanks @mbercx !

> Thanks @bastonero! Does this one fix #637?

Yes, that was the idea (I forgot the number of that issue, thanks for linking).

> I remember I was testing this when @giovannipizzi first opened the issue (and then got distracted, as usual), and the damped was very effective for relax, but not-so-effective for vc-relax. The strategy here is already quite good, but we may consider immediately switching to the damped algorithm for calculation == relax.

Ok, amazing, then it's also tested! We can then proceed as you suggest.

@bastonero
Collaborator Author

@sphuber now working with the new logic suggested by @mbercx. For the relax case I also added the fire algorithm. I think it's new, so we should check in which version of QE it was introduced. But now we just support the latest 3 versions correct?

@sphuber
Contributor

sphuber commented Nov 24, 2023

> But now we just support the latest 3 versions correct?

I think we only support v6.6 and newer, but maybe for the next release we already even planned to drop support for v6 completely. @mbercx ?

@bastonero
Collaborator Author

Looking into the QE history, it seems the fire algorithm has been there for a while, so it should be a safe addition.

@bastonero bastonero requested a review from sphuber November 27, 2023 08:47
@mbercx
Member

mbercx commented Nov 27, 2023

> I think we only support v6.6 and newer, but maybe for the next release we already even planned to drop support for v6 completely. @mbercx ?

According to our compatibility guarantees, we support the last 3 minor releases, as well as older versions for up to two years. As QE v6.8 was released in July 2021:

[Screenshot, 2023-11-27]

We can also drop support for that in the next aiida-quantumespresso release.
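
The compatibility rule quoted above (the last 3 minor releases, plus older versions for up to two years) can be checked with a few lines of date arithmetic. This is only a sketch of that rule as stated in this comment: the function names are hypothetical, the July 2021 date for v6.8 comes from the comment (the day of the month is assumed), and the other release dates are unverified placeholders that should be checked against the QE release notes.

```python
from datetime import date


def within_two_years(release: date, today: date) -> bool:
    """Return True if ``release`` is at most two calendar years before ``today``."""
    return (today.year - 2, today.month, today.day) <= (release.year, release.month, release.day)


def is_supported(version: str, releases: dict, today: date) -> bool:
    """Apply the rule: one of the last 3 minor releases, or younger than two years."""
    newest_first = sorted(releases, key=releases.get, reverse=True)
    if version in newest_first[:3]:
        return True
    return within_two_years(releases[version], today)


releases = {
    '6.8': date(2021, 7, 1),   # month and year from the comment above; day assumed
    '7.0': date(2021, 12, 1),  # placeholder date, not verified
    '7.1': date(2022, 6, 1),   # placeholder date, not verified
    '7.2': date(2023, 3, 1),   # placeholder date, not verified
}
today = date(2023, 11, 27)  # date of the comment above

print(is_supported('6.8', releases, today))  # False
print(is_supported('7.0', releases, today))  # True
```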

@bastonero
Collaborator Author

Nevertheless, I couldn't work out whether this fire algorithm is brand new or was introduced years ago. I cannot find in their GitLab changelog when it was introduced. Maybe it is already quite old, so there is no need to drop support (yet).

@bastonero bastonero requested a review from mbercx November 28, 2023 12:13
Member

@mbercx mbercx left a comment

Thanks @bastonero! I've left some nitpicks/questions. ^^

src/aiida_quantumespresso/workflows/pw/base.py (6 review comments, outdated and resolved)
@bastonero bastonero requested a review from mbercx December 13, 2023 15:05
Member

@mbercx mbercx left a comment

Thanks @bastonero! I came up with some more comments. ^^

src/aiida_quantumespresso/workflows/pw/base.py (2 review comments, outdated and resolved)
@bastonero bastonero requested a review from mbercx December 13, 2023 16:54
bastonero and others added 7 commits December 13, 2023 18:19
Sometimes the BFGS algorithm for ionic minimization fails, and
the current handler simply restarts from scratch. This might work,
but here we try to improve on this simplistic solution by first
lowering the trust radius and then switching to a different algorithm.
Co-authored-by: Marnik Bercx <mbercx@gmail.com>
Member

@mbercx mbercx left a comment

Thanks @bastonero! I think this one is good to go.

@bastonero bastonero merged commit 0224f8a into aiidateam:main Dec 22, 2023
7 checks passed
@bastonero bastonero deleted the handler/pw branch December 22, 2023 16:17
3 participants