
After sync batch norm is applied, more gpu memory is consumed #10

Open
shachoi opened this issue Sep 28, 2018 · 9 comments
Labels
enhancement (New feature or request)

Comments

shachoi commented Sep 28, 2018

First of all, thank you for the implementation. It's very helpful.

I have one question.
After sync batch norm is applied, it consumes more GPU memory than normal batch norm.
Is it right?

vacancy (Owner) commented Sep 28, 2018

Hi @shachoi, thank you for your interest!

I haven't tested the memory usage carefully, but the quick answer is yes, mainly on the master GPU, because it needs to collect the statistics from the other cards.

That said, I don't expect a big difference. Could you please share precisely how much extra memory is being used (as a percentage, for example)?
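For intuition, here is a purely illustrative sketch (not this repo's actual implementation, which uses its own master/slave message pipes) of why the master card ends up holding extra buffers: the per-GPU partial statistics are copied to and reduced on device 0.

```python
import torch

# Illustrative sketch only (assumed helper, not code from this repo):
# combining per-GPU batch-norm statistics on the master device.
# The .to('cuda:0') copies are the extra buffers that make the master
# card's memory usage asymmetric.
def reduce_bn_stats(partial_sums, partial_sqsums, counts):
    total_sum = torch.stack([s.to('cuda:0') for s in partial_sums]).sum(0)
    total_sqsum = torch.stack([s.to('cuda:0') for s in partial_sqsums]).sum(0)
    n = float(sum(counts))
    mean = total_sum / n
    var = total_sqsum / n - mean ** 2   # E[x^2] - E[x]^2
    return mean, var
```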

shachoi (Author) commented Sep 28, 2018

Hi @vacancy,
Thanks a lot for your reply :)

I have tested sync batch norm on a deeplab-resnet based segmentation task.
When I apply sync batch norm, it consumes about 30-40% more GPU memory. The detailed memory consumption is as follows.

  • sync batch norm: GPU1 - 8769 / GPU2 - 7125 / GPU3 - 7125
  • PyTorch typical batch norm: GPU1 - 6687 / GPU2 - 5039 / GPU3 - 5039
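For anyone reproducing this comparison from inside PyTorch, a hypothetical helper like the one below (not part of the original report) logs the peak memory seen by the caching allocator on each GPU; driver tools such as nvidia-smi will report higher totals because they also include cached blocks and CUDA context overhead.

```python
import torch

# Hypothetical helper (not from the original report): print the peak
# memory allocated through PyTorch's caching allocator per GPU, in MiB.
def report_gpu_memory(tag):
    for i in range(torch.cuda.device_count()):
        peak = torch.cuda.max_memory_allocated(i) / 1024 ** 2
        print(f"{tag}: GPU{i} peak allocated = {peak:.0f} MiB")
```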

vacancy (Owner) commented Sep 28, 2018

Hi,

I currently have little idea about the exact cause of the extra memory consumption. I will probably revisit this issue next week.

Just for your reference, here is another project using this SyncBN: https://github.com/CSAILVision/semantic-segmentation-pytorch

@Tete-Xiao, do you have any comment on this?

Tete-Xiao (Collaborator) commented:

@vacancy I did notice that the segmentation framework consumes more GPU memory than the normal one.

vacancy (Owner) commented Sep 30, 2018

@shachoi Thank you for posting this issue! I think the memory consumption issue is confirmed. I will get back to this next week.

Hellomodo commented:

Hi @vacancy, thanks for your great work! Do you have any solution to the memory consumption issue now?

vacancy (Owner) commented Nov 18, 2018

@Tete-Xiao If you have spare time recently, can you help me with this issue?

@Hellomodo Here is my quick reply. There are two major reasons.

  1. We use the NCCL backend provided by PyTorch to sync the feature statistics across GPUs. This requires a certain amount of extra memory. Although it shouldn't be this much in theory, in practice PyTorch/NCCL may allocate more memory than strictly required, depending on the implementation.
  2. We implemented batch norm using primitive PyTorch APIs, which requires extra memory to store intermediate variables. One way to reduce this cost is to optimize the code in https://github.com/vacancy/Synchronized-BatchNorm-PyTorch/blob/master/sync_batchnorm/batchnorm.py (see the sketch after this list).
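To make point 2 concrete, here is an illustrative sketch (assumed names, not the actual code in batchnorm.py) of why composing batch norm from primitive autograd ops costs extra memory: each intermediate tensor below may be saved for the backward pass, whereas a fused cuDNN batch-norm kernel avoids materializing most of them.

```python
import torch

# Illustrative sketch only (not this repo's batchnorm.py): an "unfused"
# batch norm built from primitive ops. mean, var, and x_hat are
# intermediate tensors that autograd keeps alive for backward -- that is
# the extra memory referred to in point 2.
def unfused_batch_norm(x, weight, bias, eps=1e-5):
    # x: (N, C, H, W)
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return weight.view(1, -1, 1, 1) * x_hat + bias.view(1, -1, 1, 1)
```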

yelantf commented Nov 25, 2018

I have faced the same issue. Any progress so far?

vacancy added the enhancement (New feature or request) label on Feb 21, 2019

CarpeDiemly commented:

I have faced the same issue, too.
Before using convert_model to replace the standard batch norm with SynchronizedBatchNorm2d:
GPU_1 - 7520, GPU_2 - 6756.
After that:
GPU_1 - 9760, GPU_2 - 8796.
Can you help me? @vacancy
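For reference, the conversion path mentioned above, as a hedged usage sketch (assuming this repo's sync_batchnorm.convert_model helper is on the path; adjust imports to match your copy):

```python
import torch.nn as nn
from torchvision import models

# Assumed import from this repo's sync_batchnorm package: convert_model
# swaps nn.BatchNorm* layers for their synchronized counterparts.
from sync_batchnorm import convert_model

m = models.resnet50()
m = nn.DataParallel(m, device_ids=[0, 1])
m = convert_model(m)   # BatchNorm2d -> SynchronizedBatchNorm2d
m = m.cuda()
```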
