Replies: 2 comments
-
update: I've been able to run the t5 example inside a Docker container (the image you provided). Now I'd like to avoid using Docker. Any suggestions on how that might be possible, i.e. how to fix the previous error?
-
After doing some tests on master, we found that values in the RMSNorm layer increase up to ~60K and then overflow (out of range for fp16). We also verified that with bf16 this issue does not occur. We did the tests by hardcoding things, and we still need to find an elegant way to let the user choose between fp16 and bf16.
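For context, here is a minimal standalone sketch of the failure mode. This is not kernl's Triton kernel, and the `rms_norm` helper below is just an illustration: fp16 tops out at 65504, so activations around 60K are already at the ceiling and the squaring inside RMSNorm pushes them over it, while bf16 keeps roughly the exponent range of fp32. The last part shows one common mitigation, accumulating the variance in fp32 and casting back.

```python
import torch

# fp16 saturates just above the ~60K values observed in the RMSNorm layer;
# bf16 has roughly the exponent range of fp32, so it does not.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e+38

x = torch.tensor([60000.0])             # magnitude reached inside the layer
print(x.half() * x.half())              # tensor([inf], dtype=torch.float16): 3.6e9 overflows
print(x.bfloat16() * x.bfloat16())      # stays finite in bf16

# Illustrative workaround (hypothetical helper, not kernl's actual kernel):
# accumulate the variance in fp32, then cast back to the activation dtype.
def rms_norm(hidden: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    variance = hidden.to(torch.float32).pow(2).mean(-1, keepdim=True)
    normed = hidden.to(torch.float32) * torch.rsqrt(variance + eps)
    return weight * normed.to(hidden.dtype)

# Stays finite even with fp16 inputs near the fp16 ceiling.
print(rms_norm(torch.full((1, 8), 60000.0, dtype=torch.float16),
               torch.ones(8, dtype=torch.float16)))
```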
-
I've tried to run the example in this notebook: https://github.com/ELS-RD/kernl/blob/main/tutorial/t5%20e2e.ipynb
First, I created a conda environment with Python 3.9.
Then I ran the installation command:
pip install 'git+https://github.com/ELS-RD/kernl' --extra-index-url https://download.pytorch.org/whl/nightly/cu117
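After the install, a quick sanity check of the environment (my own throwaway script, not part of the tutorial) looks like this:

```python
import torch
import triton  # pulled in as a kernl dependency

# Verify that the nightly cu117 wheel from the install command above is the
# one in use and that the GPU is visible, before running the notebook.
print(torch.__version__)          # should be a nightly/dev build
print(torch.version.cuda)         # should print 11.7 to match the cu117 index URL
print(torch.cuda.is_available())  # must be True for the tutorial to work
print(triton.__version__)
```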
nvidia-smi gives me:

print(torch.version.cuda) gives: 11.7

nvcc --version outside of the conda environment:

nvcc --version inside my conda environment:

Then I try to execute the code from the example. It crashes at the warmup stage:
model.generate(inputs=input_ids["input_ids"], min_length=22, max_length=22)
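For reference, the surrounding code is roughly the following, reconstructed from the notebook from memory; the model name, the prompt, and whether `optimize_model` is applied to the whole model or to the encoder/decoder separately may differ from the actual tutorial:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from kernl.model_optimization import optimize_model

model_name = "t5-small"  # placeholder; the notebook may use a larger checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval().cuda()

# kernl swaps parts of the model for its Triton kernels; the first generate()
# call triggers the warmup/compilation, which is where it crashes for me.
optimize_model(model)

input_ids = tokenizer(
    "translate English to French: The house is wonderful.", return_tensors="pt"
).to("cuda")

with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    model.generate(inputs=input_ids["input_ids"], min_length=22, max_length=22)
```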
The errors I see in the traceback:

and

To get more info on the first error, I ran:

ld -lcuda -verbose

Any help is much appreciated!
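If the first error is indeed about the linker not resolving -lcuda (which is what the ld -lcuda -verbose check above implies), here is a small Python-side check I find handy; the paths mentioned in the comments are just typical locations, not guaranteed:

```python
import ctypes
import ctypes.util

# Check whether the CUDA driver library can be located and loaded from the
# current (non-docker) environment. It typically lives somewhere like
# /usr/lib/x86_64-linux-gnu/libcuda.so.1, depending on distro and driver install.
name = ctypes.util.find_library("cuda")
print("libcuda resolved as:", name)   # None means the loader cannot see it
if name is not None:
    ctypes.CDLL(name)                 # raises OSError if it cannot actually be loaded
    print("libcuda loaded successfully")
```

If this prints None inside the conda environment while the Docker image works, the usual suspects are a missing libcuda.so symlink for the linker or a library path (e.g. LD_LIBRARY_PATH) that the conda environment does not inherit.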