Replies: 2 comments
-
update: I've been able to run the t5 example inside a Docker container (the image you provided). Now I'd like to avoid using Docker. Any suggestions on how that might be possible, i.e. how to fix the previous error?
-
After doing some tests on master, we found that values in the RMSNorm layer increase up to ~60K and then overflow (out of range for fp16). We also verified that with bf16 this issue does not occur. We did the tests by hardcoding things, and we still need to find an elegant way to let the user choose between fp16 and bf16.
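For context, here is a minimal standalone sketch of the failure mode. This is not kernl's Triton kernel, and the `rms_norm` helper below is just an illustration: fp16 tops out at 65504, so activations around 60K are already at the ceiling and the squaring inside RMSNorm pushes them over it, while bf16 keeps roughly the exponent range of fp32. The last part shows one common mitigation, accumulating the variance in fp32 and casting back.

```python
import torch

# fp16 saturates just above the ~60K values observed in the RMSNorm layer;
# bf16 has roughly the exponent range of fp32, so it does not.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e+38

x = torch.tensor([60000.0])             # magnitude reached inside the layer
print(x.half() * x.half())              # tensor([inf], dtype=torch.float16): 3.6e9 overflows
print(x.bfloat16() * x.bfloat16())      # stays finite in bf16

# Illustrative workaround (hypothetical helper, not kernl's actual kernel):
# accumulate the variance in fp32, then cast back to the activation dtype.
def rms_norm(hidden: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    variance = hidden.to(torch.float32).pow(2).mean(-1, keepdim=True)
    normed = hidden.to(torch.float32) * torch.rsqrt(variance + eps)
    return weight * normed.to(hidden.dtype)

# Stays finite even with fp16 inputs near the fp16 ceiling.
print(rms_norm(torch.full((1, 8), 60000.0, dtype=torch.float16),
               torch.ones(8, dtype=torch.float16)))
```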
-
I've tried to run the example in this notebook: https://github.com/ELS-RD/kernl/blob/main/tutorial/t5%20e2e.ipynb
First, I created a conda environment with Python 3.9.
Then I ran the installation command:
pip install 'git+https://github.com/ELS-RD/kernl' --extra-index-url https://download.pytorch.org/whl/nightly/cu117
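After the install, a quick sanity check of the environment (my own throwaway script, not part of the tutorial) looks like this:

```python
import torch
import triton  # pulled in as a kernl dependency

# Verify that the nightly cu117 wheel from the install command above is the
# one in use and that the GPU is visible, before running the notebook.
print(torch.__version__)          # should be a nightly/dev build
print(torch.version.cuda)         # should print 11.7 to match the cu117 index URL
print(torch.cuda.is_available())  # must be True for the tutorial to work
print(triton.__version__)
```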
nvidia-smi gives me:

print(torch.version.cuda) gives: 11.7

nvcc --version outside of the conda environment:

nvcc --version inside my conda environment:

Then I try to execute the code from the example. It crashes at the warmup stage:
model.generate(inputs=input_ids["input_ids"], min_length=22, max_length=22)
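For reference, the surrounding code is roughly the following, reconstructed from the notebook from memory; the model name, the prompt, and whether `optimize_model` is applied to the whole model or to the encoder/decoder separately may differ from the actual tutorial:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from kernl.model_optimization import optimize_model

model_name = "t5-small"  # placeholder; the notebook may use a larger checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval().cuda()

# kernl swaps parts of the model for its Triton kernels; the first generate()
# call triggers the warmup/compilation, which is where it crashes for me.
optimize_model(model)

input_ids = tokenizer(
    "translate English to French: The house is wonderful.", return_tensors="pt"
).to("cuda")

with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    model.generate(inputs=input_ids["input_ids"], min_length=22, max_length=22)
```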
The errors I see in the traceback:

and

To get more info on the first error, I ran:

ld -lcuda -verbose

Any help is much appreciated!
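If the first error is indeed about the linker not resolving -lcuda (which is what the ld -lcuda -verbose check above implies), here is a small Python-side check I find handy; the paths mentioned in the comments are just typical locations, not guaranteed:

```python
import ctypes
import ctypes.util

# Check whether the CUDA driver library can be located and loaded from the
# current (non-docker) environment. It typically lives somewhere like
# /usr/lib/x86_64-linux-gnu/libcuda.so.1, depending on distro and driver install.
name = ctypes.util.find_library("cuda")
print("libcuda resolved as:", name)   # None means the loader cannot see it
if name is not None:
    ctypes.CDLL(name)                 # raises OSError if it cannot actually be loaded
    print("libcuda loaded successfully")
```

If this prints None inside the conda environment while the Docker image works, the usual suspects are a missing libcuda.so symlink for the linker or a library path (e.g. LD_LIBRARY_PATH) that the conda environment does not inherit.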