
about multi-GPU training setup #10

Open
Jason-u opened this issue Dec 29, 2024 · 1 comment

Comments
Jason-u commented Dec 29, 2024

Dear author,

Thank you for your work. I would like to ask why the model's computation is placed on device 1. When I make two cards visible (for example, gpus=0,1) and set batch_size to 1, I notice something strange: both card 0 and card 1 run at the same time. Could you advise me on how to modify the code so that the data and the model for a batch all stay on one card when using DataParallel?
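For reference, this is roughly the single-card behaviour I am trying to get. The device index and the tiny model here are only placeholders, not your actual code:

```python
import os
# Expose only one physical card before CUDA is initialised,
# so nothing can silently land on a second GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(16, 16).to(device)        # placeholder for the diffusion model
batch = torch.randn(1, 16, device=device)   # batch_size = 1, same card as the model
out = model(batch)
print(out.device)                           # expected: cuda:0, with the other card idle
```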

pfriedri (Owner) commented Jan 8, 2025

@Jason-u We never used a multi-GPU training setup. There is some code to set up a distributed training environment, but we never tested it. You probably need to modify guided_diffusion/dist_util.py.
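Untested, but if you want everything pinned to a single card, a device helper along these lines might be enough. This is only a sketch, not the actual contents of guided_diffusion/dist_util.py, and it assumes the rest of the code asks dist_util for its device through a dev()-style helper as in the original guided-diffusion code:

```python
# Sketch only -- not the actual contents of guided_diffusion/dist_util.py.
import torch

def dev():
    """Single device that both the model and every batch should be moved to."""
    if torch.cuda.is_available():
        return torch.device("cuda:0")  # always card 0, never card 1
    return torch.device("cpu")

# If you instead want real data parallelism over both cards, wrap the model once:
#   model = torch.nn.DataParallel(model, device_ids=[0, 1]).to(torch.device("cuda:0"))
# and keep feeding batches on cuda:0; DataParallel scatters them across the cards internally.
```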
