
After sync batch norm is applied, more gpu memory is consumed #10

Open
shachoi opened this issue Sep 28, 2018 · 9 comments
Labels
enhancement (New feature or request)

Comments

shachoi commented Sep 28, 2018

First of all, thank you for the implementation. It's very helpful.

I have one question.
After sync batch norm is applied, it consumes more GPU memory than normal batch norm.
Is it right?

vacancy (Owner) commented Sep 28, 2018

Hi @shachoi, thank you for your interest!

I haven't tested the memory usage carefully, but the quick answer is yes, mainly on the master GPU, because it needs to collect the statistics from the other cards.

That said, I don't expect a big difference. Could you please share precisely how much extra memory is being used (as a percentage, for example)?
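For intuition, here is a purely illustrative sketch (not this repo's actual implementation, which uses its own master/slave message pipes) of why the master card ends up holding extra buffers: the per-GPU partial statistics are copied to and reduced on device 0.

```python
import torch

# Illustrative sketch only (assumed helper, not code from this repo):
# combining per-GPU batch-norm statistics on the master device.
# The .to('cuda:0') copies are the extra buffers that make the master
# card's memory usage asymmetric.
def reduce_bn_stats(partial_sums, partial_sqsums, counts):
    total_sum = torch.stack([s.to('cuda:0') for s in partial_sums]).sum(0)
    total_sqsum = torch.stack([s.to('cuda:0') for s in partial_sqsums]).sum(0)
    n = float(sum(counts))
    mean = total_sum / n
    var = total_sqsum / n - mean ** 2   # E[x^2] - E[x]^2
    return mean, var
```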

shachoi (Author) commented Sep 28, 2018

Hi @vacancy,
Thanks a lot for your reply :)

I have tested sync batch norm on a deeplab-resnet based segmentation task.
When I apply sync batch norm, it consumes about 30-40% more GPU memory. The detailed memory consumption is as follows.

  • sync batch norm: GPU1 - 8769 / GPU2 - 7125 / GPU3 - 7125
  • PyTorch typical batch norm: GPU1 - 6687 / GPU2 - 5039 / GPU3 - 5039
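For anyone reproducing this comparison from inside PyTorch, a hypothetical helper like the one below (not part of the original report) logs the peak memory seen by the caching allocator on each GPU; driver tools such as nvidia-smi will report higher totals because they also include cached blocks and CUDA context overhead.

```python
import torch

# Hypothetical helper (not from the original report): print the peak
# memory allocated through PyTorch's caching allocator per GPU, in MiB.
def report_gpu_memory(tag):
    for i in range(torch.cuda.device_count()):
        peak = torch.cuda.max_memory_allocated(i) / 1024 ** 2
        print(f"{tag}: GPU{i} peak allocated = {peak:.0f} MiB")
```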

vacancy (Owner) commented Sep 28, 2018

Hi,

I currently have little idea about the exact cause of the extra memory consumption. I will probably revisit this issue next week.

Just for your reference, here is another project using this SyncBN: https://github.com/CSAILVision/semantic-segmentation-pytorch

@Tete-Xiao, do you have any comment on this?

Tete-Xiao (Collaborator) commented:

@vacancy I did notice that the segmentation framework consumes more GPU memory than the normal one.

vacancy (Owner) commented Sep 30, 2018

@shachoi Thank you for posting this issue! I think the memory consumption issue is confirmed. I will get back to this next week.

Hellomodo commented:

Hi @vacancy, thanks for your great work! Do you have any solution to the memory consumption issue now?

vacancy (Owner) commented Nov 18, 2018

@Tete-Xiao If you have spare time recently, can you help me with this issue?

@Hellomodo Here is my quick reply. There are two major reasons.

  1. We use the NCCL backend provided by PyTorch to sync the feature statistics across GPUs. This requires a certain amount of extra memory. Although it shouldn't be this much in theory, in practice PyTorch/NCCL may allocate more memory than strictly required, depending on the implementation.
  2. We implemented batch norm using primitive PyTorch APIs, which requires extra memory to store intermediate variables. One way to reduce this cost is to optimize the code in https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/blob/master/sync_batchnorm/batchnorm.py (see the sketch after this list).
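To make point 2 concrete, here is an illustrative sketch (assumed names, not the actual code in batchnorm.py) of why composing batch norm from primitive autograd ops costs extra memory: each intermediate tensor below may be saved for the backward pass, whereas a fused cuDNN batch-norm kernel avoids materializing most of them.

```python
import torch

# Illustrative sketch only (not this repo's batchnorm.py): an "unfused"
# batch norm built from primitive ops. mean, var, and x_hat are
# intermediate tensors that autograd keeps alive for backward -- that is
# the extra memory referred to in point 2.
def unfused_batch_norm(x, weight, bias, eps=1e-5):
    # x: (N, C, H, W)
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return weight.view(1, -1, 1, 1) * x_hat + bias.view(1, -1, 1, 1)
```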

yelantf commented Nov 25, 2018

I have faced the same issue. Any progress so far?

vacancy added the enhancement (New feature or request) label on Feb 21, 2019

CarpeDiemly commented:

I have faced the same issue, too.
Before using convert_model to replace the standard batch norm with SynchronizedBatchNorm2d:
GPU_1 - 7520, GPU_2 - 6756.
After that:
GPU_1 - 9760, GPU_2 - 8796.
Can you help me? @vacancy
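For reference, the conversion path mentioned above, as a hedged usage sketch (assuming this repo's sync_batchnorm.convert_model helper is on the path; adjust imports to match your copy):

```python
import torch.nn as nn
from torchvision import models

# Assumed import from this repo's sync_batchnorm package: convert_model
# swaps nn.BatchNorm* layers for their synchronized counterparts.
from sync_batchnorm import convert_model

m = models.resnet50()
m = nn.DataParallel(m, device_ids=[0, 1])
m = convert_model(m)   # BatchNorm2d -> SynchronizedBatchNorm2d
m = m.cuda()
```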
