about fp16 #22
I haven't tried fp16 in PyTorch. Do you think it's due to a type mismatch: fp32 vs. fp16? It would be great if you could add a try-catch in the forward method of the batch norm class; we should first check whether any exceptions are being thrown there.
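A minimal sketch of that debugging step, assuming the `_SynchronizedBatchNorm` class from this repo's batchnorm.py (the import path is an assumption; adapt it to your checkout):

```python
import traceback

from sync_batchnorm.batchnorm import _SynchronizedBatchNorm  # assumed module path

_orig_forward = _SynchronizedBatchNorm.forward

def _debug_forward(self, input):
    try:
        return _orig_forward(self, input)
    except Exception:
        # Print the real exception on this replica before re-raising,
        # so the failure is visible instead of silently stalling the sync.
        traceback.print_exc()
        raise

_SynchronizedBatchNorm.forward = _debug_forward
```

Re-raising matters here: swallowing the exception would leave the replicas out of step with each other.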
Thanks for your help. Firstly, I am using two GPUs. Secondly, I added a try-catch in the forward method of the _SynchronizedBatchNorm class (batchnorm.py). Then I located the error step by step:
1. batchnorm.py:
2. comm.py:
The only message printed is 'An error occured.' My try-catch looks like this:

```python
try:
    ...
except IOError:
    ...
except ValueError:
    ...
except ImportError:
    ...
except EOFError:
    ...
except KeyboardInterrupt:
    ...
except:
    print('An error occured.')
```
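Note that the final bare `except:` is what reduces every failure to that one generic message; binding the exception object shows the actual cause. A hedged tweak (`run_forward` is a hypothetical stand-in for the wrapped forward body):

```python
try:
    run_forward()  # hypothetical stand-in for the wrapped forward body
except Exception as e:
    # Report the actual exception type and message, not a fixed string.
    print('forward failed with %s: %s' % (type(e).__name__, e))
    raise
```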
Can you give detailed information about the "error"? For example, you may directly wrap the whole function body of
The detailed information:

Traceback (most recent call last):
Seems that some values in the tensors exceed the max value of fp16 ... I guess it's the ... I am not an expert on this: is there any solution to this? I think this should be a general problem for fp16 training.
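The overflow is easy to demonstrate: fp16 tops out at 65504, so the sums of squares that batch-norm statistics require can overflow even for moderately scaled activations. A small sketch, including the common mitigation of accumulating the statistics in fp32 (this is an illustration, not this repo's code; older PyTorch builds may need the half-precision ops run on GPU):

```python
import torch

print(torch.finfo(torch.float16).max)           # 65504.0

x = (torch.randn(32, 64, 56, 56) * 30).half()   # moderately large activations
print((x * x).sum())                            # inf: the fp16 result overflows

# Common mitigation: compute per-channel statistics in fp32, cast back after.
x32 = x.float()
mean = x32.mean(dim=(0, 2, 3))
var = x32.var(dim=(0, 2, 3), unbiased=False)
y = ((x32 - mean[None, :, None, None]) /
     (var[None, :, None, None] + 1e-5).sqrt()).half()
```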
When I use fp16 (16-bit float) and multi-GPU training, the code will hang in SyncBN (comm.py).
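Why an overflow shows up as waiting rather than as an error: the synchronized BN in this repo rendezvouses all device copies through queues (SyncMaster in comm.py), so if one copy raises before posting its message, the others block forever. A toy illustration of that failure mode with plain threads and a queue (an analogy, not the repo's actual code):

```python
import queue
import threading

results = queue.Queue()

def replica(idx, fail):
    if fail:
        raise RuntimeError('fp16 overflow on replica %d' % idx)  # dies before posting
    results.put(idx)

for idx, fail in [(0, True), (1, False)]:
    threading.Thread(target=replica, args=(idx, fail), daemon=True).start()

print(results.get(timeout=5))  # replica 1 reports in
print(results.get(timeout=5))  # waits, then raises queue.Empty: replica 0 never posted
```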