Thank you for your great research.
I was trying to reproduce your results for the SNN model on the DVS dataset. After running for about 70 to 85 epochs, the code overwhelms the CPU memory (200 GB).
Using a memory profiler, I noticed that the test dataloader is the bottleneck. Here is the profiler analysis for the first and second epochs for the test and training functions:
It looks like the dataloader memory is not being freed after each batch, which I don't understand.
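In case it helps with reproducing the measurement: a minimal, stdlib-only way to take per-point memory snapshots like the ones above is `tracemalloc` (this is a hypothetical sketch, not necessarily the profiler I used):

```python
import tracemalloc

def snapshot_top(label, limit=3):
    """Print the top memory-allocating source lines at this point
    and return their stats, so growth can be compared across epochs."""
    snapshot = tracemalloc.take_snapshot()
    stats = snapshot.statistics("lineno")[:limit]
    print(f"--- {label} ---")
    for stat in stats:
        print(stat)
    return stats

tracemalloc.start()
# stand-in for one batch worth of allocations
batch = [bytearray(1024) for _ in range(1000)]
stats = snapshot_top("after batch")
```

Calling `snapshot_top` once after the train loop and once after the test loop in each epoch makes it easy to see which loop's allocations keep growing.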
Looking at `dataloader.py` in the `dvs` directory:

```python
trainset = spikedata.DVSGesture(data_dir, train=True, num_steps=100, dt=3000, ds=4)
testset = spikedata.DVSGesture(data_dir, train=False, num_steps=600, dt=3000, ds=4)
```
The testset runs for more steps, which is understandable. Here is the trace in the log:

```
Traceback (most recent call last):
  File "run.py", line 60, in <module>
    evaluate(Net, config, load_data, train, test, optim_func)
  File "/afs/crc.nd.edu/user/p/ptaheri/Private/benchmarkSNN/QSNNs/evaluate.py", line 73, in evaluate
    test_accuracy = test(config, net, testloader, device)
  File "/afs/crc.nd.edu/user/p/ptaheri/Private/benchmarkSNN/QSNNs/dvs/test.py", line 10, in test
    for data in testloader:
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 84, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 84, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/afs/crc.nd.edu/user/p/ptaheri/.conda/envs/QSNNs/lib/python3.8/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: [enforce fail at CPUAllocator.cpp:71] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 19660800 bytes. Error code 12 (Cannot allocate memory)
```
Thank you for your help