OOM on STPLS3D dataset #61

Open
Wzy-lab opened this issue Jun 19, 2024 · 3 comments


Wzy-lab commented Jun 19, 2024

Thank you very much for your contribution to point cloud instance segmentation. I ran into an out-of-memory (OOM) problem when reproducing the experiments.
My environment is a cloud server with 40 GB of video memory. Even after reducing batch_size to 2, I still hit OOM.
First, running test.py on the STPLS3D dataset uses more than 30 GB of video memory, even though there are only 25 point cloud files under val_250m. Second, when I train on the STPLS3D dataset, it runs out of memory whenever validation runs.
I don't know why this is happening. I noticed that you were able to complete the experiments with 32 GB of video memory on a V100. Looking forward to your reply.


Wzy-lab commented Jun 19, 2024

Even 80 GB of video memory is not enough.

kellieda commented

@Wzy-lab Hello, I seem to be running into the same issue. During validation/testing, I also get CUDA out-of-memory errors. If you have found a solution, could you share what worked for you? I would greatly appreciate your help. Thank you.


Wzy-lab commented Jul 23, 2024

Sorry, I did not fully solve this problem. I added some memory recovery code, which makes inference barely fit in 40 GB of video memory. I was not able to reproduce the training process.
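
For readers wondering what "memory recovery code" could look like, here is a minimal sketch, assuming a PyTorch-based pipeline. It is not the code actually used in this thread, and the helper names `release_memory` and `run_inference` are hypothetical. The idea is to process one scene at a time, keep only CPU-side results, and release GPU memory between scenes.

```python
import contextlib
import gc

try:
    import torch  # optional: only needed for the CUDA-specific calls
except ImportError:
    torch = None


def release_memory():
    """Collect unreachable Python objects, then free unused cached CUDA memory."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Only releases cached blocks that no live tensor is using.
        torch.cuda.empty_cache()


def run_inference(model, scenes, to_cpu=lambda out: out):
    """Run `model` on one scene at a time, keeping only what `to_cpu` returns.

    `model`, `scenes`, and `to_cpu` are placeholders (assumptions), not names
    from the repository discussed in this issue.
    """
    results = []
    # Skip autograd bookkeeping when PyTorch is present; otherwise do nothing.
    ctx = torch.no_grad() if torch is not None else contextlib.nullcontext()
    with ctx:
        for scene in scenes:
            out = model(scene)
            results.append(to_cpu(out))
            del out  # drop the reference to the GPU-side output before cleanup
            release_memory()
    return results
```

With real tensors, something like `to_cpu=lambda out: out.detach().cpu()` would move each result off the GPU before the next scene is processed. Note that `torch.cuda.empty_cache()` cannot free memory that live tensors still hold, so this may lower peak usage without removing the OOM entirely.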
