
about multi-GPU training setup #10

Open
Jason-u opened this issue Dec 29, 2024 · 1 comment

Comments
Jason-u commented Dec 29, 2024

Dear author,

Thank you for your work. I would like to ask why the model's computation is placed on device 1. When I make two cards visible (for example, gpus=0,1) and set batch_size to 1, I notice something strange: both card 0 and card 1 run at the same time. Could you advise me on how to modify the code so that the data and the model for a batch all stay on one card when using DataParallel?
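For reference, this is roughly the single-card behaviour I am trying to get. The device index and the tiny model here are only placeholders, not your actual code:

```python
import os
# Expose only one physical card before CUDA is initialised,
# so nothing can silently land on a second GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(16, 16).to(device)        # placeholder for the diffusion model
batch = torch.randn(1, 16, device=device)   # batch_size = 1, same card as the model
out = model(batch)
print(out.device)                           # expected: cuda:0, with the other card idle
```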

pfriedri (Owner) commented Jan 8, 2025

@Jason-u We never used a multi-GPU training setup. There is some code to set up a distributed training environment, but we never tested it. You probably need to modify guided_diffusion/dist_util.py.
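Untested, but if you want everything pinned to a single card, a device helper along these lines might be enough. This is only a sketch, not the actual contents of guided_diffusion/dist_util.py, and it assumes the rest of the code asks dist_util for its device through a dev()-style helper as in the original guided-diffusion code:

```python
# Sketch only -- not the actual contents of guided_diffusion/dist_util.py.
import torch

def dev():
    """Single device that both the model and every batch should be moved to."""
    if torch.cuda.is_available():
        return torch.device("cuda:0")  # always card 0, never card 1
    return torch.device("cpu")

# If you instead want real data parallelism over both cards, wrap the model once:
#   model = torch.nn.DataParallel(model, device_ids=[0, 1]).to(torch.device("cuda:0"))
# and keep feeding batches on cuda:0; DataParallel scatters them across the cards internally.
```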
