CUDA error: an illegal memory access was encountered #10

captureguy · 2020-05-28T14:23:03Z

When I try to generate training data for onet, I receive a memory error partway through processing the image files. I am using PyTorch 1.5. Any help would be appreciated.
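For anyone debugging the same error: CUDA kernels run asynchronously, so the Python line shown in the traceback is often not the line that actually failed. A common first step is to set the CUDA_LAUNCH_BLOCKING environment variable before CUDA is initialized, which makes kernel launches synchronous. A minimal sketch, where the helper name enable_sync_cuda_errors is hypothetical and not part of this repository:

```python
import os


def enable_sync_cuda_errors(env=os.environ):
    """Ask CUDA to run kernels synchronously so errors surface at the failing call.

    Hypothetical helper: call it before importing torch or touching the GPU,
    because the variable is read when the CUDA context is created.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


if __name__ == "__main__":
    enable_sync_cuda_errors()
    # import torch and run the data-generation script logic after this point
```

The same effect can be had from the shell by prefixing the command with CUDA_LAUNCH_BLOCKING=1. Expect the run to be slower while it is enabled.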

ansonku · 2020-05-29T02:04:32Z

What's your GPU memory size?

SURABHI-GUPTA · 2020-05-29T06:20:19Z

Hey, while generating data for pnet, did you encounter a module-not-found error?
When I run python scripts/gen_pnet_train.py, I get:
Traceback (most recent call last):
  File "scripts/gen_pnet_train.py", line 4, in <module>
    import mtcnn.train.gen_pnet_train as gptd
ImportError: No module named 'mtcnn'

Can you help me with this?

ansonku · 2020-05-29T06:25:27Z

@SURABHI-GUPTA

Give this a try.
In the scripts/gen_pnet_train.py file, add the following code before the mtcnn import:

import sys

sys.path.insert(0,'/path_of_folder/FaceDetector/')
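A variant of the snippet above that avoids hard-coding the folder: derive the repository root from the script's own location. This is only a sketch, and it assumes the script lives one directory below the repository root (as scripts/gen_pnet_train.py does). The helper name add_repo_root is hypothetical.

```python
import sys
from pathlib import Path


def add_repo_root(script_file, levels_up=1):
    """Put the directory `levels_up` levels above the script's folder on sys.path.

    With levels_up=1, .../FaceDetector/scripts/gen_pnet_train.py yields
    .../FaceDetector, which is where the mtcnn package sits.
    """
    root = Path(script_file).resolve().parents[levels_up]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


if __name__ == "__main__":
    # In the real script this line would go before `import mtcnn...`.
    print(add_repo_root(__file__))
```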

SURABHI-GUPTA · 2020-05-29T08:15:02Z

After that change, I get this error instead:

ModuleNotFoundError: No module named 'mtcnn.utils.nms.cpu_nms'

ansonku · 2020-05-29T11:39:18Z

@SURABHI-GUPTA

Have you compiled the Cython code?

python setup.py build_ext --inplace
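To confirm that the build step produced an importable extension before running the generation scripts, a quick check with the standard library can help. This is a sketch; extension_is_built is a hypothetical helper name, and the module path is the one from the error message above.

```python
import importlib.util


def extension_is_built(module_name):
    """Return True if `module_name` can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `mtcnn`) is itself missing.
        return False


if __name__ == "__main__":
    name = "mtcnn.utils.nms.cpu_nms"
    status = "found" if extension_is_built(name) else "missing, rebuild with build_ext"
    print("%s: %s" % (name, status))
```

Run it from the repository root so that the mtcnn package is on the import path.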

SURABHI-GUPTA · 2020-05-29T17:42:27Z

Yes, I compiled it, but there was an error during the build. It has been resolved.
Thanks @ansonku

SURABHI-GUPTA · 2020-05-29T17:43:49Z

After training pnet, I want to generate samples for rnet, but I got this error:
File "mtcnn/utils/nms/gpu_nms.pyx", line 17, in mtcnn.utils.nms.gpu_nms.gpu_nms
cdef int boxes_num = dets.shape[0]
TypeError: an integer is required

even though dets.shape[0] returns an integer.

captureguy · 2020-05-29T18:54:34Z

> What's your GPU memory size?

I have 16GB of memory on the GPU so I don't think I should see this error. That being said, I do see part of the code that is meant to deal with memory errors:

https://github.com/faciallab/FaceDetector/blob/8ece6aaeb65161017999e8bbc0833ff311c8cbf9/mtcnn/train/gen_onet_train.py#L83-L88

I assume this part is running on the CPU because it is very slow on my machine.

captureguy · 2020-05-29T18:58:21Z

> After training pnet, I want to generate samples for rnet, but I got this error:
> File "mtcnn/utils/nms/gpu_nms.pyx", line 17, in mtcnn.utils.nms.gpu_nms.gpu_nms
> cdef int boxes_num = dets.shape[0]
> TypeError: an integer is required
>
> even though dets.shape[0] returns an integer.

Try changing cuda to cuda:0 here

https://github.com/faciallab/FaceDetector/blob/8ece6aaeb65161017999e8bbc0833ff311c8cbf9/scripts/gen_rnet_train.py#L33

I had the same issue and this change fixed it for me.
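One plausible explanation, not confirmed in this thread: in PyTorch, torch.device("cuda").index is None while torch.device("cuda:0").index is 0, so any code that forwards the device index to a Cython function expecting an int would fail with "an integer is required". The sketch below shows a defensive way to normalize the index. resolve_device_id and FakeDevice are hypothetical names; FakeDevice only stands in for torch.device so the example runs without a GPU.

```python
from collections import namedtuple

# Stand-in for torch.device so the sketch runs without PyTorch installed.
FakeDevice = namedtuple("FakeDevice", ["type", "index"])


def resolve_device_id(device, default=0):
    """Return an integer GPU id for any object exposing an `index` attribute.

    A device created from the bare string "cuda" has index None, so fall back
    to `default` instead of passing None to code that needs an int.
    """
    index = getattr(device, "index", None)
    return default if index is None else int(index)


if __name__ == "__main__":
    print(resolve_device_id(FakeDevice("cuda", None)))
    print(resolve_device_id(FakeDevice("cuda", 1)))
```

With PyTorch available, the same call would be resolve_device_id(torch.device("cuda")).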

SURABHI-GUPTA · 2020-05-30T02:00:35Z

> > After training pnet, I want to generate samples for rnet, but I got this error:
> > File "mtcnn/utils/nms/gpu_nms.pyx", line 17, in mtcnn.utils.nms.gpu_nms.gpu_nms
> > cdef int boxes_num = dets.shape[0]
> > TypeError: an integer is required
> >
> > even though dets.shape[0] returns an integer.
>
> Try changing cuda to cuda:0 here
>
> https://github.com/faciallab/FaceDetector/blob/8ece6aaeb65161017999e8bbc0833ff311c8cbf9/scripts/gen_rnet_train.py#L33
>
> I had the same issue and this change fixed it for me.

Cool, thanks @captureguy

By the way, how many epochs did you train pnet and rnet for?

SURABHI-GUPTA · 2020-05-31T03:42:36Z

@captureguy My issue is solved.
Have you loaded the CUDA and cuDNN modules properly?

gerald-ftk · 2020-08-03T22:48:50Z

@captureguy Did you end up fixing your "RuntimeError: CUDA error: an illegal memory access was encountered" issue? I'm getting the same error, and I also have a 16GB GPU.

congduan-HNU · 2021-09-06T08:50:03Z

I think I have solved this problem: "RuntimeError: CUDA error: an illegal memory access was encountered".
I don't know why the CUDA memory cache is not being released, but adding this code to "mtcnn/train/gen_onet_train.py" solved the problem for me:

for index, item in enumerate(meta_data):
    bar.update(index)
    torch.cuda.empty_cache()

I also changed this block so that each stage is wrapped in its own try/except:

    try:
        processed_img = detector._preprocess(img)
        candidate_boxes = detector.stage_one(processed_img, 0.5, 0.707, 12, 0.7)
    except RuntimeError:
        print("Out of memory on process img '%s.'" % file_name)
        continue
    try:
        candidate_boxes = detector.stage_two(processed_img, candidate_boxes, 0.5, 0.7)
    except RuntimeError:
        print("Out of memory on process img '%s.'" % file_name)
        continue
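The skip-and-continue pattern above can be written as a small reusable helper. The sketch below is generic Python, not code from the repository: process_items and its parameters are hypothetical names. It logs the actual exception text instead of a fixed "Out of memory" message, because RuntimeError also covers other CUDA failures. Note that, as far as I know, an illegal memory access can leave the CUDA context unusable for the rest of the process, so skipping the image may not recover in every case.

```python
def process_items(items, pipeline, log=print):
    """Run `pipeline(item)` for every item, skipping items that raise RuntimeError.

    Returns (results, skipped) so the caller can see which inputs failed.
    """
    results, skipped = [], []
    for item in items:
        try:
            results.append(pipeline(item))
        except RuntimeError as err:
            # Report the real error text rather than assuming out-of-memory.
            log("Skipping %r: %s" % (item, err))
            skipped.append(item)
    return results, skipped


if __name__ == "__main__":
    def flaky(name):
        if name == "bad.jpg":
            raise RuntimeError("simulated CUDA failure")
        return name.upper()

    print(process_items(["a.jpg", "bad.jpg", "c.jpg"], flaky))
```

Keeping the list of skipped files makes it easy to rerun only the images that failed.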
