Move from torch.cuda.amp to torch.autocast; Add tests for amp #838
base: main
Conversation
…IR-Chem/fairchem into move_to_autocast_general_and_amp_test
Thanks @misko, mostly style suggestions
if node_energy.device.type == "cuda":
    energy.index_add_(0, data.batch, node_energy.view(-1))
else:
    energy.index_add_(0, data.batch, node_energy.float().view(-1))
Do we lose a lot of performance or use up too much memory when casting? If not, should we cast regardless of device?
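A minimal sketch of why the cast exists in the snippet above (variable names are illustrative, not taken from the PR): under CPU autocast, matmul produces bfloat16, while the energy accumulator stays float32, and index_add_ requires matching dtypes.

```python
import torch

# Hedged sketch: under CPU autocast, matmul outputs bfloat16, but the
# accumulator is float32, so the source must be cast before index_add_.
energy = torch.zeros(2)             # float32 per-graph accumulator
batch = torch.tensor([0, 0, 1, 1])  # node -> graph assignment

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    node_energy = torch.randn(4, 8) @ torch.randn(8, 1)  # bfloat16 here

# Explicit cast avoids a dtype mismatch with the float32 accumulator.
energy.index_add_(0, batch, node_energy.float().view(-1))
```

Whether the extra cast measurably costs performance on CUDA (where the `.view(-1)` branch skips it) is exactly the open question in this comment.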
…en-Catalyst-Project/ocp into move_to_autocast_general_and_amp_test
Looks good, just minor suggestions!
@@ -36,6 +36,8 @@ def __init__(
    precon=None,
    cpu=False,
    batch_size=4,
    seed=0,  # set a seed for reproducibility
    amp=None,
Why not just set amp=True as the default? From the lines below, it looks like that's the effective behavior.
@@ -110,11 +112,13 @@ def __init__(
    local_rank=config.get("local_rank", 0),
    is_debug=config.get("is_debug", True),
    cpu=cpu,
-   amp=True,
+   amp=(amp==None or amp),  # AMP on by default
Nit: it's more Pythonic to check for None with amp is None (None is a "singleton").
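A hypothetical helper (not from the PR) sketching the suggested idiom: since None is a singleton, identity comparison with `is None` is preferred over `==`, which a custom `__eq__` can hijack.

```python
# Illustrative sketch of "AMP on by default" with the `is None` idiom.
def resolve_amp(amp=None):
    # None means "not specified", which defaults AMP to on;
    # an explicit True/False is respected.
    return amp is None or bool(amp)
```

Usage: `resolve_amp()` and `resolve_amp(True)` give True, while `resolve_amp(False)` gives False.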
if self.lmax > 0:
    num_m_components = (self.lmax + 1) ** 2
    feature = node_input.narrow(1, 1, num_m_components - 1)
    with torch.autocast(device_type=node_input.device.type, enabled=False):
Is it possible to double up on autocast decorators like the forward method above?
I am just worried about potential incorrect indentation bugs in the future.
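A hedged sketch of the nesting behavior in question: torch.autocast regions can nest, so an inner `enabled=False` region restores full precision even when the surrounding forward runs under autocast (shown here on CPU with bfloat16; tensor names are illustrative).

```python
import torch

# Nested autocast regions: the inner enabled=False block locally
# disables mixed precision inside an active autocast context.
a, b = torch.randn(4, 4), torch.randn(4, 4)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    low = a @ b                                    # runs in bfloat16
    with torch.autocast(device_type="cpu", enabled=False):
        full = a @ b                               # back to float32
```

Since autocast also works as a decorator, stacking it on the method (as with forward above) would behave the same as this context-manager form, which is the indentation-safety concern raised here.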
out = self._forward(batch)
out = {k: v.float() for k, v in out.items()}
Did we decide to always cast predictions to float32? That sounds OK to me; just making sure, because this is different from the previous behavior.
This PR replaces the deprecated torch.cuda.amp.autocast(args...) and torch.cuda.amp.GradScaler(args...) with the device-agnostic torch.autocast API, and adds CPU AMP tests (https://pytorch.org/docs/stable/amp.html).
Recently an eSCN model failed to run under GPU AMP and the current test suite did not catch it. This PR adds the equivalent tests on CPU AMP, which will catch such regressions going forward.
Additional fixes have been made to eSCN / SCN / GemNet-OC in this PR.
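A minimal sketch of the migration this PR describes: torch.autocast takes a device_type argument, so one code path covers both the CUDA case and the CPU AMP case that the new tests exercise (the deprecated torch.cuda.amp.autocast was CUDA-only).

```python
import torch

# Device-agnostic replacement for the deprecated torch.cuda.amp.autocast:
# the same context manager serves CUDA and CPU mixed precision.
device_type = "cuda" if torch.cuda.is_available() else "cpu"

with torch.autocast(device_type=device_type):
    out = torch.randn(8, 8) @ torch.randn(8, 8)
```

On CPU the autocast dtype defaults to bfloat16, on CUDA to float16, so tests running this path on CPU can surface the same dtype-mismatch bugs that previously only appeared under GPU AMP.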