error when use single node and multi gpu for training #55

Open
lrj16 opened this issue Feb 20, 2023 · 5 comments

Comments

@lrj16

lrj16 commented Feb 20, 2023

I can train successfully with 1 node and 1 GPU, but when I use 1 node and 4 GPUs, it raises an error: ValueError: you need to add an explicit nodesplitter to your input pipeline for multi-node training. Does anyone know how to fix it?

@xvjiarui
Contributor

Hi @lrj16, what's your webdataset version? Could you please post the error trace?

@lrj16
Author

lrj16 commented Feb 28, 2023

> Hi @lrj16, what's your webdataset version? Could you please post the error trace?

@xvjiarui Thanks for replying. My webdataset version is 0.2.33, and I have solved this problem. This is my first time using webdataset, though, and I have a new problem. I need a more complex dataloader that returns more information besides the <image, text> pair, but map_dict(key=funct()) can only return the corresponding value for the key. I tried map(add_key()) to create a new key, but it results in the error below. Do you know why? Thanks very much.
image
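
For readers with the same question, here is a minimal sketch of adding a new key to every sample. It assumes each sample is a plain dict, and the name text_length and the sample contents are hypothetical. The error in the screenshot above is not recoverable, so this is not a diagnosis of it. The sketch uses Python's built-in map on a list so that it runs without webdataset. In a webdataset pipeline the equivalent step would be dataset.map(add_key), which passes the function itself rather than the result of calling add_key().

```python
def add_key(sample):
    """Return a copy of the sample with one extra, derived field."""
    out = dict(sample)  # copy so the original sample is left untouched
    out["text_length"] = len(sample["text"])  # hypothetical extra field
    return out  # the mapped function must return the whole sample


# Stand-in samples; a real pipeline would yield decoded <image, text> dicts.
samples = [
    {"image": "img0", "text": "a cat"},
    {"image": "img1", "text": "two dogs"},
]

# Pass the function object (add_key), not the result of calling it.
augmented = list(map(add_key, samples))
```

Unlike map_dict, which transforms the value stored under an existing key, a function applied to the whole sample can add, remove, or rename keys freely.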

@lrj16
Author

lrj16 commented Mar 7, 2023

@xvjiarui I have solved this problem. Thanks!

@lianxxx

lianxxx commented Dec 5, 2023

> I can train successfully with 1 node and 1 GPU, but when I use 1 node and 4 GPUs, it raises an error: ValueError: you need to add an explicit nodesplitter to your input pipeline for multi-node training. Does anyone know how to fix it?

I had the same problem. How did you solve it? Thanks!!!

@lianxxx

lianxxx commented Dec 5, 2023

> Hi @lrj16, what's your webdataset version? Could you please post the error trace?

Thanks, author, problem solved. I changed the webdataset version to 0.1.103.
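
For anyone who would rather stay on webdataset 0.2.x than downgrade: the ValueError asks for a node splitter, that is, a function that decides which shards each distributed process reads. In 0.2.x this is usually supplied through the nodesplitter argument of wds.WebDataset (for example wds.split_by_node), but check that against your installed version. The sketch below only illustrates the idea in plain Python. The function name split_shards_by_rank and the shard names are hypothetical, and this is not the library's own code.

```python
from itertools import islice


def split_shards_by_rank(shards, rank, world_size):
    """Give each process every world_size-th shard, starting at its rank."""
    return islice(shards, rank, None, world_size)


# Hypothetical shard list; with 1 node and 4 GPUs the world size is 4.
shards = [f"shard-{i:04d}.tar" for i in range(8)]
per_rank = [list(split_shards_by_rank(shards, r, 4)) for r in range(4)]
```

With this kind of split, each process sees a disjoint subset of the shards, so the shard count should be at least the number of processes.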
