RuntimeError: CUDA error: device-side assert triggered #22

anilesec · 2020-12-14T17:02:37Z

Dear Author,

Thank you for the cool implementation.
I installed it successfully and tried to run "python train_nerf.py --config config/lego.yml",
but I am getting RuntimeError: CUDA error: device-side assert triggered.

Traceback (most recent call last):
  File "train_nerf.py", line 404, in <module>
    main()
  File "train_nerf.py", line 240, in main
    encode_direction_fn=encode_direction_fn,
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 180, in
    for batch in batches
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 115, in predict_and_render_radiance
    encode_direction_fn,
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 11, in run_network
    embedded = embed_fn(pts_flat)
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/nerf_helpers.py", line 166, in
    x, num_encoding_functions, include_input, log_sampling
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/nerf_helpers.py", line 138, in positional_encoding
    device=tensor.device,
  File "/home/aswamy/tools/anaconda3/envs/nerf-pytorch-krish/lib/python3.7/site-packages/torch/tensor.py", line 27, in wrapped
    return f(*args, **kwargs)
  File "/home/aswamy/tools/anaconda3/envs/nerf-pytorch-krish/lib/python3.7/site-packages/torch/tensor.py", line 547, in rpow
    return torch.tensor(other, dtype=dtype, device=self.device) ** self
RuntimeError: CUDA error: device-side assert triggered

Any suggestions to solve this?

Thank you!

krrish94 · 2020-12-15T13:48:50Z

It's hard to tell without knowing the exact config, but does this issue seem to help you?
#9

anilesec · 2020-12-15T18:25:22Z

I tried reducing the chunk size and the number of layers, but the error still persists. Besides, I do not have issues with GPU memory.
If you are talking about the model config file, I just used the default config file from the GitHub repository.

pgmsuper · 2020-12-16T01:46:16Z

I think the number of labels is wrong, which causes an error when calculating the loss value.

anilesec · 2020-12-16T09:34:08Z

But as far as I understand, there are no labels involved here.

anilesec · 2020-12-16T10:22:07Z

> It's hard to tell without knowing the exact config, but does this issue seem to help you?
> #9

@krrish94 Here is more information about the error. It looks like the error is in /nerf-pytorch/nerf/nerf_helpers.py, line 301:
cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)
==============================================================
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [47,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [48,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [50,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [53,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [54,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [56,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [59,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [60,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [62,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
0%| | 0/200000 [00:08<?, ?it/s]
Traceback (most recent call last):
  File "train_nerf.py", line 406, in <module>
    main()
  File "train_nerf.py", line 242, in main
    encode_direction_fn=encode_direction_fn,
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 180, in
    for batch in batches
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 101, in predict_and_render_radiance
    det=(getattr(options.nerf, mode).perturb == 0.0),
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/nerf_helpers.py", line 301, in sample_pdf_2
    cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)
RuntimeError: CUDA error: device-side assert triggered
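
Note on the two tracebacks in this thread: CUDA kernels are launched asynchronously, so a device-side assert is often reported at a later, unrelated Python line. That would explain why the first traceback points at positional_encoding while this one points at torch.gather. A common way to get a traceback at the failing call is to set the CUDA_LAUNCH_BLOCKING environment variable before CUDA is initialized (or to rerun on the CPU). A minimal sketch, assuming it is placed at the very top of the training script; the helper name enable_synchronous_cuda_errors is hypothetical and not part of the repository:

```python
import os


def enable_synchronous_cuda_errors(env=os.environ):
    """Force synchronous CUDA kernel launches so errors surface at the failing call.

    Assumption: this runs before CUDA is initialized (simplest: before importing torch).
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


enable_synchronous_cuda_errors()
# import torch  # import torch only after the variable has been set
```

Equivalently, from the shell: CUDA_LAUNCH_BLOCKING=1 python train_nerf.py --config config/lego.yml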

anilesec · 2020-12-16T10:23:26Z

@krrish94 Following up on the previous comment:
I printed the tensors and found that the values in inds_g are far too large or far too small, which causes the out-of-bounds error:
inds_g.min() = tensor(-4993021444723710459)
inds_g.max() = tensor(4575432887736600530)
inds_g = tensor([[[ 4255818524050935954, 62],
[ 4256250760978027750, 62],
[ 4237722774569238629, 62],
...,]]

usage of this tensor:
file "run_nerf_helpers.py"
cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)
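
The check described above (printing min and max of the index tensor) can be turned into a guard that fails with a readable message before torch.gather is reached. This is only a sketch: check_gather_indices is a hypothetical helper, and the toy tensors merely imitate the shapes in this thread (a CDF with 63 entries is an assumption based on the printed value 62).

```python
import torch


def check_gather_indices(index: torch.Tensor, size: int, name: str = "index") -> None:
    """Raise a readable error if any entry of `index` falls outside [0, size)."""
    lo, hi = int(index.min()), int(index.max())
    if lo < 0 or hi >= size:
        raise IndexError(
            f"{name} out of bounds: min={lo}, max={hi}, valid range is [0, {size - 1}]"
        )


# Toy stand-ins for the tensors in the thread (shapes are illustrative only).
cdf = torch.linspace(0.0, 1.0, 63).unsqueeze(0)      # (1, 63)
good = torch.tensor([[[0, 1], [61, 62]]])            # (1, 2, 2), all indices valid
bad = torch.tensor([[[4255818524050935954, 62]]])    # first value copied from the printout

matched_shape = (good.shape[0], good.shape[1], cdf.shape[-1])
check_gather_indices(good, cdf.shape[-1], "inds_g")  # passes silently
cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, good)

try:
    check_gather_indices(bad, cdf.shape[-1], "inds_g")
except IndexError as err:
    print(err)  # reports min, max, and the valid range
```

Such a check costs a device synchronization, so it is probably best kept for debugging runs only.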

anilesec · 2020-12-16T13:07:13Z

Update: these extreme index values come from torchsearchsorted.searchsorted(). After the following line, the values in inds are far out of bounds:
inds = torchsearchsorted.searchsorted(cdf, u, side="right")

anilesec · 2020-12-16T13:18:40Z

It may be related to the issue you created for searchsorted() @krrish94

anilesec · 2020-12-16T13:50:09Z

I replaced torchsearchsorted.searchsorted() with the official torch.searchsorted(). The error is gone and training now runs successfully, though I am not sure how this change affects performance. I think it may be worth mentioning somewhere, because it took me some time to track this down :)

Thank you!
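
A minimal sketch of the swap described above, assuming PyTorch 1.6 or later (where torch.searchsorted is available). The external package's side="right" corresponds to right=True in the built-in function. The toy tensors and the clamping step are illustrative assumptions and are not copied from the repository:

```python
import torch

# Toy CDF and samples; in the thread these come from the PDF sampling step.
cdf = torch.tensor([[0.0, 0.2, 0.5, 1.0]])  # (num_rays, num_bins), sorted along the last dim
u = torch.tensor([[0.1, 0.5, 0.9]])         # (num_rays, num_samples)

# Before (external package):
#   inds = torchsearchsorted.searchsorted(cdf, u, side="right")
# After (built-in function):
inds = torch.searchsorted(cdf, u.contiguous(), right=True)

# Results lie in [0, num_bins], so clamping keeps the gather indices valid.
below = torch.clamp(inds - 1, min=0)
above = torch.clamp(inds, max=cdf.shape[-1] - 1)
inds_g = torch.stack((below, above), dim=-1)  # (num_rays, num_samples, 2)

print(inds.tolist())  # [[1, 3, 3]]
```

With the built-in function the indices stay within [0, num_bins], so the later torch.gather call receives only valid positions.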

krrish94 · 2020-12-17T02:59:58Z

Thanks so much for digging into this! From a skim this appears to be due to a weird config that's potentially leading to indexing errors. I'd trust the newer torch searchsorted function as opposed to the external package.

pgmsuper · 2020-12-17T03:30:22Z

Can you update your code? I changed my code, but it still doesn't work.

pgmsuper · 2020-12-23T08:38:18Z

I think you can try upgrading your Python libraries, such as numpy. I did that and it ran successfully.

krrish94 closed this as completed Dec 17, 2020

krrish94 reopened this Dec 17, 2020
