
RuntimeError: CUDA error: device-side assert triggered #22

Open
anilesec opened this issue Dec 14, 2020 · 12 comments

anilesec commented Dec 14, 2020

Dear Author,

Thank you for the cool implementation.
I installed it successfully and tried to run "python train_nerf.py --config config/lego.yml",
but I am getting RuntimeError: CUDA error: device-side assert triggered.

Traceback (most recent call last):
  File "train_nerf.py", line 404, in <module>
    main()
  File "train_nerf.py", line 240, in main
    encode_direction_fn=encode_direction_fn,
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 180, in <listcomp>
    for batch in batches
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 115, in predict_and_render_radiance
    encode_direction_fn,
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 11, in run_network
    embedded = embed_fn(pts_flat)
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/nerf_helpers.py", line 166, in <lambda>
    x, num_encoding_functions, include_input, log_sampling
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/nerf_helpers.py", line 138, in positional_encoding
    device=tensor.device,
  File "/home/aswamy/tools/anaconda3/envs/nerf-pytorch-krish/lib/python3.7/site-packages/torch/tensor.py", line 27, in wrapped
    return f(*args, **kwargs)
  File "/home/aswamy/tools/anaconda3/envs/nerf-pytorch-krish/lib/python3.7/site-packages/torch/tensor.py", line 547, in __rpow__
    return torch.tensor(other, dtype=dtype, device=self.device) ** self
RuntimeError: CUDA error: device-side assert triggered

Any suggestions on how to solve this?

Thank you!
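(A general debugging note for anyone hitting the same crash: CUDA device-side asserts are reported asynchronously, so the Python frame in a traceback like the one above is often not the op that actually failed. A minimal sketch, assuming you can edit the entry script, is to force synchronous kernel launches before any CUDA work happens:

import os

# Must be set before any CUDA work happens (safest: before importing torch).
# With synchronous launches, the traceback points at the kernel that failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

Running one iteration on CPU also turns the device-side assert into an ordinary Python error that shows the exact offending values.)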

@krrish94 (Owner)

It's hard to tell without knowing the exact config, but does this issue seem to help you?
#9

@anilesec (Author)

I tried reducing the chunk size and the number of layers, but the error still persists. Also, I am not running into GPU memory issues.
If you mean the model config file, I just used the default config file provided in the GitHub repo.

@pgmsuper

I think the number of labels is wrong, which causes an error when calculating the loss value.

@anilesec (Author)

But as far as I understand, there are no labels involved here.

@anilesec (Author)

> It's hard to tell without knowing the exact config, but does this issue seem to help you?
> #9

@krrish94 Here is more information about the error: it looks like the failure is in nerf-pytorch/nerf/nerf_helpers.py, line 301: cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)
==============================================================
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [47,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [48,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [50,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [53,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [54,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [56,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [59,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [60,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [62,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
0%| | 0/200000 [00:08<?, ?it/s]
Traceback (most recent call last):
  File "train_nerf.py", line 406, in <module>
    main()
  File "train_nerf.py", line 242, in main
    encode_direction_fn=encode_direction_fn,
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 180, in <listcomp>
    for batch in batches
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 101, in predict_and_render_radiance
    det=(getattr(options.nerf, mode).perturb == 0.0),
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/nerf_helpers.py", line 301, in sample_pdf_2
    cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)
RuntimeError: CUDA error: device-side assert triggered

@anilesec (Author)

@krrish94 Following up on the previous comment:
I printed the tensors and found that the values of inds_g are absurdly large and small (which causes the out-of-bounds error):
inds_g.min() = tensor(-4993021444723710459)
inds_g.max() = tensor(4575432887736600530)
inds_g = tensor([[[ 4255818524050935954, 62],
[ 4256250760978027750, 62],
[ 4237722774569238629, 62],
...,]]

This tensor is then used here (file "run_nerf_helpers.py"):
cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)
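For context on the assert: torch.gather requires every index to lie in [0, size - 1] along the gathered dimension, so indices like the ones printed above are guaranteed to trip it. A toy CPU reproduction (my own snippet, not the repo's code):

import torch

cdf = torch.rand(2, 3, 4)  # stand-in for the expanded cdf tensor
# One valid index and one garbage index like those printed above:
inds_g = torch.tensor([0, 4255818524050935954]).view(1, 1, 2).expand(2, 3, 2)
# On CPU this raises a plain "index out of bounds" RuntimeError; on CUDA the
# same call fires the ScatterGatherKernel.cu assert quoted earlier.
cdf_g = torch.gather(cdf, 2, inds_g)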

@anilesec (Author)

Another update: these extreme index values are actually coming from torchsearchsorted.searchsorted():
inds = torchsearchsorted.searchsorted(cdf, u, side="right")
After this line, the values in inds are far out of bounds.
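A quick way to confirm this at the source is a bounds check right after the call (a hypothetical debugging line, not repo code):

inds = torchsearchsorted.searchsorted(cdf, u, side="right")
# searchsorted can only legitimately return indices in [0, cdf.shape[-1]];
# anything outside that range means the output buffer holds garbage.
assert inds.min() >= 0 and inds.max() <= cdf.shape[-1], "out-of-bounds indices"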

@anilesec (Author)

It may be related to the issue you created for searchsorted(), @krrish94.

@anilesec (Author)

I replaced torchsearchsorted.searchsorted() with the official torch.searchsorted(), and now the error is gone and training runs successfully, though I am not sure how this change affects performance. It may be worth mentioning somewhere, since I spent quite some time tracking this down :)

Thank you!
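For reference, a minimal sketch of the swap in sample_pdf_2 (my paraphrase of the fix described above, valid on PyTorch >= 1.6, where torch.searchsorted was introduced):

import torch

# Before (external torchsearchsorted package, which returned garbage indices here):
#   import torchsearchsorted
#   inds = torchsearchsorted.searchsorted(cdf, u, side="right")
# After (built-in equivalent with the same right-bisection semantics):
inds = torch.searchsorted(cdf, u, right=True)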

@krrish94 (Owner)

Thanks so much for digging into this! From a skim this appears to be due to a weird config that's potentially leading to indexing errors. I'd trust the newer torch searchsorted function as opposed to the external package.

@pgmsuper

Can you update your code? I changed mine, but it still does not work.

@krrish94 reopened this on Dec 17, 2020
@pgmsuper

I think you can try upgrading your Python libraries (numpy and so on). I did that and it ran successfully.
