CUDA error: an illegal memory access was encountered #10

Open
captureguy opened this issue May 28, 2020 · 13 comments

Comments

@captureguy

When I try to generate training data for onet I receive a memory error in the middle of processing the image files. I am using pytorch 1.5. Any help would be appreciated.

[Screenshot attachment: Capture]

@ansonku

ansonku commented May 29, 2020

What's your GPU memory size?

@SURABHI-GUPTA

Hey, while generating data for pnet, did you encounter a module-not-found error?
When running python scripts/gen_pnet_train.py, I got:
Traceback (most recent call last):
File "scripts/gen_pnet_train.py", line 4, in
import mtcnn.train.gen_pnet_train as gptd
ImportError: No module named 'mtcnn'

Can you help me with this?

@ansonku

ansonku commented May 29, 2020

@SURABHI-GUPTA

Give this a try.
In the scripts/gen_pnet_train.py file, you can add the following code:

import sys

sys.path.insert(0,'/path_of_folder/FaceDetector/')
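Rather than hard-coding an absolute path, the same idea can be sketched relative to the script's own location. This assumes the script sits in a scripts/ folder directly under the repository root (as the path scripts/gen_pnet_train.py suggests). The helper names repo_root_for and add_repo_to_sys_path are hypothetical and not part of the FaceDetector repository.

```python
import sys
from pathlib import Path


def repo_root_for(script_path: str) -> Path:
    """Return the directory one level above the script's folder.

    Assumption: the script lives in <repo>/scripts/, so the repository
    root (which contains the mtcnn package) is the parent of that folder.
    """
    return Path(script_path).resolve().parent.parent


def add_repo_to_sys_path(script_path: str) -> str:
    """Put the repository root at the front of sys.path (only once)."""
    root = str(repo_root_for(script_path))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Hypothetical usage at the top of scripts/gen_pnet_train.py,
# before the `import mtcnn...` line:
# add_repo_to_sys_path(__file__)
```

This avoids editing the path on every machine, at the cost of depending on the folder layout staying the same.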

@SURABHI-GUPTA

But I am getting this error after that:

ModuleNotFoundError: No module named 'mtcnn.utils.nms.cpu_nms'

@ansonku

ansonku commented May 29, 2020

@SURABHI-GUPTA

Have you compiled the Cython code?

python setup.py build_ext --inplace
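To confirm whether the build step actually produced the extension modules, a quick check like the following may help. It is a minimal sketch that only tests whether the module names seen in this thread's error messages can be located. The helper names is_importable and report are hypothetical.

```python
import importlib.util
from typing import Dict, Iterable


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current sys.path.

    find_spec imports parent packages in order to locate a submodule and
    raises ModuleNotFoundError when a parent is missing, so any
    ImportError is treated as "not importable".
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        return False


def report(module_names: Iterable[str]) -> Dict[str, bool]:
    """Map each module name to whether it can be located."""
    return {name: is_importable(name) for name in module_names}


if __name__ == "__main__":
    # Module names taken from the error messages in this thread.
    for name, found in report(
        ["mtcnn.utils.nms.cpu_nms", "mtcnn.utils.nms.gpu_nms"]
    ).items():
        print(f"{name}: {'found' if found else 'missing, rebuild the extensions?'}")
```

Run it from the repository root so that the mtcnn package is on the path.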

@SURABHI-GUPTA

Yes, I compiled it, but there was an error. It has been resolved.
Thanks, @ansonku.

@SURABHI-GUPTA

SURABHI-GUPTA commented May 29, 2020

After training pnet, I want to generate samples for rnet, but I got this error:
File "mtcnn/utils/nms/gpu_nms.pyx", line 17, in mtcnn.utils.nms.gpu_nms.gpu_nms
cdef int boxes_num = dets.shape[0]
TypeError: an integer is required

even though dets.shape[0] is returning an integer.

@captureguy
Author

Replying to @ansonku's question about GPU memory size:

I have 16GB of memory on the GPU so I don't think I should see this error. That being said, I do see part of the code that is meant to deal with memory errors:

https://github.com/faciallab/FaceDetector/blob/8ece6aaeb65161017999e8bbc0833ff311c8cbf9/mtcnn/train/gen_onet_train.py#L83-L88

I assume this part is running on the CPU because it is very slow on my machine.

@captureguy
Author

Replying to @SURABHI-GUPTA's gpu_nms error (TypeError: an integer is required, raised at cdef int boxes_num = dets.shape[0]):

Try changing cuda to cuda:0 here:

https://github.com/faciallab/FaceDetector/blob/8ece6aaeb65161017999e8bbc0833ff311c8cbf9/scripts/gen_rnet_train.py#L33

I had the same issue and this change fixed it for me.
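One plausible reading of why this helps (an assumption, not confirmed by the maintainers): a device given as plain cuda carries no device index, while cuda:0 does, and a Cython function that declares an int device argument would reject a missing index with "an integer is required". In PyTorch, torch.device("cuda").index is None, whereas torch.device("cuda:0").index is 0. The sketch below mirrors that distinction in plain Python so it runs without a GPU. The helper names device_index and with_explicit_index are hypothetical.

```python
from typing import Optional


def device_index(device: str) -> Optional[int]:
    """Return the explicit index in a device string, or None if absent."""
    _, sep, index = device.partition(":")
    return int(index) if sep else None


def with_explicit_index(device: str, default_index: int = 0) -> str:
    """Append a default index to a bare accelerator name such as "cuda".

    Strings that already carry an index, and "cpu", are returned unchanged.
    """
    if device != "cpu" and device_index(device) is None:
        return f"{device}:{default_index}"
    return device
```

Passing the result of with_explicit_index to the script's device setting would make the index explicit regardless of how the device was originally written.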

@SURABHI-GUPTA

SURABHI-GUPTA commented May 30, 2020

Cool, thanks @captureguy.

By the way, for how many epochs have you trained pnet and rnet?

@SURABHI-GUPTA

SURABHI-GUPTA commented May 31, 2020

@captureguy The issue is solved.
Have you loaded the CUDA and cuDNN modules properly?

@gerald-ftk

@captureguy Did you end up fixing your "RuntimeError: CUDA error: an illegal memory access was encountered" issue? I'm getting the same error, and I also have a 16GB GPU.

@congduan-HNU

congduan-HNU commented Sep 6, 2021

I think I have solved the problem "RuntimeError: CUDA error: an illegal memory access was encountered".
I don't know why the CUDA memory cache is not released, but adding this code to "mtcnn/train/gen_onet_train.py" solved the problem:

for index, item in enumerate(meta_data):
    bar.update(index)
    torch.cuda.empty_cache()

and I changed this part to monitor each stage separately:

    try:
        processed_img = detector._preprocess(img)
        candidate_boxes = detector.stage_one(processed_img, 0.5, 0.707, 12, 0.7)
    except RuntimeError:
        print("Out of memory on process img '%s.'" % file_name)
        continue
    try:
        candidate_boxes = detector.stage_two(processed_img, candidate_boxes, 0.5, 0.7)
    except RuntimeError:
        print("Out of memory on process img '%s.'" % file_name)
        continue
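The pattern in the change above (process each item, skip it on RuntimeError, release cached memory between items) can be sketched generically as follows. Unlike the snippet above, it records the actual exception message instead of always reporting "Out of memory", since a RuntimeError raised by CUDA is not necessarily an out-of-memory condition. The names run_with_skip, process and cleanup are hypothetical. In the script, cleanup would presumably be torch.cuda.empty_cache.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_skip(
    items: Iterable[T],
    process: Callable[[T], R],
    cleanup: Optional[Callable[[], None]] = None,
) -> Tuple[List[R], List[Tuple[T, str]]]:
    """Apply `process` to every item, skipping items that raise RuntimeError.

    Returns the successful results and a list of (item, error message)
    pairs for the skipped items. `cleanup` runs after every item, whether
    it succeeded or failed.
    """
    results: List[R] = []
    skipped: List[Tuple[T, str]] = []
    for item in items:
        try:
            results.append(process(item))
        except RuntimeError as err:
            # Keep the real message so the cause can be inspected later.
            skipped.append((item, str(err)))
        finally:
            if cleanup is not None:
                cleanup()
    return results, skipped


# Hypothetical usage inside gen_onet_train.py:
# results, skipped = run_with_skip(meta_data, detect_one_image,
#                                  cleanup=torch.cuda.empty_cache)
```

Inspecting the skipped list afterwards shows which images failed and with what message, which may help tell memory exhaustion apart from other CUDA errors.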
