
Memory leakage #3

Open
Po0ria opened this issue Aug 24, 2022 · 0 comments

Po0ria commented Aug 24, 2022

Thank you for your great research.
I was trying to reproduce your results for the SNN model using the DVS dataset. After running for about 70 to 85 epochs, the code overwhelms the CPU memory (200 GB).
After using a memory profiler, I noticed that the test dataloader is the bottleneck. Here are the profiler analyses of the test and training functions for the first and second epochs:
(memory profiler screenshots attached)

From the look of it, the dataloader is not getting freed after each batch, which I don't understand.
Looking at dataloader.py in the dvs directory:
```python
trainset = spikedata.DVSGesture(data_dir, train=True, num_steps=100, dt=3000, ds=4)
testset = spikedata.DVSGesture(data_dir, train=False, num_steps=600, dt=3000, ds=4)
```
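As a back-of-envelope check (my assumptions, not confirmed against the repo: DVS Gesture frames are 128×128 with 2 polarity channels, `ds=4` downsamples them to 32×32, and the tensors are float32), a single test sample with `num_steps=600` works out to about 4.9 MB, and a batch of 4 matches the 19660800-byte allocation that fails in the traceback:

```python
# Assumed shapes: 2 polarity channels, 128x128 frames downsampled by ds=4,
# float32 (4 bytes per element). These are my assumptions, not the repo's docs.
steps_test = 600
channels = 2
h = w = 128 // 4  # ds=4 -> 32x32

bytes_per_sample = steps_test * channels * h * w * 4  # float32
print(bytes_per_sample)      # 4915200
print(4 * bytes_per_sample)  # 19660800 -> the failed allocation size
```

So the failing allocation itself looks like a normal 4-sample batch; the problem is that memory already consumed by earlier batches never comes back.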
The testset runs for more steps, which is understandable. Here is the trace from the log:
```
Traceback (most recent call last):
  File "run.py", line 60, in <module>
    evaluate(Net, config, load_data, train, test, optim_func)
  File "/afs/crc.nd.edu/user/p/ptaheri/Private/benchmarkSNN/QSNNs/evaluate.py", line 73, in evaluate
    test_accuracy = test(config, net, testloader, device)
  File "/afs/crc.nd.edu/user/p/ptaheri/Private/benchmarkSNN/QSNNs/dvs/test.py", line 10, in test
    for data in testloader:
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 84, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 84, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: [enforce fail at CPUAllocator.cpp:71] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 19660800 bytes. Error code 12 (Cannot allocate memory)
```
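For what it's worth, here is a minimal torch-free sketch of the kind of reference-holding that would produce this behaviour (`FakeBatch` and the generator are stand-ins I made up, not this repo's code): if anything in the test loop keeps a reference to each batch across iterations, the batches can never be garbage-collected and memory grows every epoch.

```python
import gc
import weakref

class FakeBatch:
    """Stand-in for a large per-batch tensor."""
    def __init__(self, n):
        self.data = bytearray(n)

def fake_loader(num_batches, batch_bytes):
    for _ in range(num_batches):
        yield FakeBatch(batch_bytes)

# Leaky pattern: something (here, a results list) keeps a strong
# reference to every batch, so none of them are ever freed.
kept = []
refs_leaky = []
for batch in fake_loader(5, 1024):
    kept.append(batch)                  # reference survives the iteration
    refs_leaky.append(weakref.ref(batch))
gc.collect()
alive_leaky = sum(r() is not None for r in refs_leaky)

# Non-leaky pattern: only the loop variable references the batch, so
# each batch is freed as soon as the next one is bound.
refs_ok = []
for batch in fake_loader(5, 1024):
    refs_ok.append(weakref.ref(batch))
del batch
gc.collect()
alive_ok = sum(r() is not None for r in refs_ok)

print(alive_leaky, alive_ok)  # 5 batches still alive vs 0
```

If the test loop (or something it calls, e.g. accumulating outputs or losses without detaching them) follows the first pattern, that would explain why memory keeps climbing even though each individual allocation is modest.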
Thank you for your help
