Installation Issue #1
Comments
Hi, thank you for your interest in our work.
We would also like to know whether all packages were successfully installed during
Hi, Then I proceed to run Many thanks,
This is strange. Can you run By the way, deepspeed internally ignores environment device specification ( Can you remove If this works, you can proceed with python instead of deepspeed. In this case you need to adjust
Here is the output after removing --distributed from the script and changing deepspeed train.py ... to python3 train.py. I'm trying to check that all the tensors and the model are on the GPU (
At line 114 of There could be several more errors like this (unmatched device) because we checked our code with deepspeed (with We will shortly add a patch for non-deepspeed use cases.
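For anyone hitting these unmatched-device errors while running without deepspeed, here is a minimal sketch of how one might list the entries that are not on the expected device. The helper name find_off_device is made up for illustration and is not part of this repository. It only assumes each object has a .device attribute, so with PyTorch you could pass model.named_parameters() or model.named_buffers(); the stand-in objects below are there only so the snippet runs without torch.

```python
from types import SimpleNamespace


def find_off_device(named_tensors, expected="cuda"):
    """Return (name, device) pairs whose device string does not start with `expected`.

    Hypothetical helper for illustration; `named_tensors` is any iterable of
    (name, obj) pairs where obj has a `.device` attribute.
    """
    mismatched = []
    for name, tensor in named_tensors:
        device = str(tensor.device)  # e.g. "cuda:0" or "cpu"
        if not device.startswith(expected):
            mismatched.append((name, device))
    return mismatched


# Stand-in objects (assumption: real use would pass model.named_parameters()).
fake_params = [
    ("encoder.weight", SimpleNamespace(device="cuda:0")),
    ("decoder.bias", SimpleNamespace(device="cpu")),
]
print(find_off_device(fake_params))
```

Anything the helper reports can then be moved with .to(device) at the point where it is created, which is usually where the mismatch originates.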
Many thanks for your reply :)
I honestly have no idea for this. Seems like an issue related to torch backend, because the problematic module is
Yes, I can run the above code without error. It turns out that I installed torch using cudatoolkit=10, while my nvidia-smi reports CUDA 11. After upgrading torch to the CUDA 11 build, it's OK. However, another error is encountered, because of (a)
Usually people just use (c)
I guess this is related to
You are right. These are all related to the deepspeed style of handling the model, gradients, and optimizer.
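Regarding the CUDA toolkit mismatch found earlier in this thread, a quick check like the following might help others rule it out before debugging further. This is only a sketch: cuda_report is a hypothetical helper, not something shipped with this project, and it only reports what the installed torch build says about itself. Compare the built_with_cuda value against what nvidia-smi shows on your machine.

```python
def cuda_report():
    """Collect basic facts about the installed PyTorch build and its CUDA support.

    Hypothetical diagnostic helper; returns a plain dict so it can be printed
    or pasted into an issue.
    """
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # A tiny op on the GPU can surface build/driver problems early.
        try:
            x = torch.ones(2, device="cuda")
            report["gpu_op_ok"] = bool((x + x).sum().item() == 4.0)
        except RuntimeError as err:
            report["gpu_op_ok"] = False
            report["error"] = str(err)
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

If cuda_available is False or the small GPU op fails, reinstalling torch with a build that matches the machine's CUDA setup is a reasonable first thing to try, as it was here.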
Hi,
Thanks for the awesome work. I'm trying to run the code via the non-docker method (because I only have non-sudo access to my server), and I encounter this error when running bash install.sh. Here is the error message. What should I do?
I notice that without the above installation, errors are encountered in metrics/evaluation_metrics.py/metrics.StructuralLosses. For now, I only want to run the code with dummy data to see the data flow (the value and shape of each variable) to help me understand the details of the paper. Thus, running the non-optimal one (no cuda version) is also OK for me. Any suggestions? Many thanks!