OOM issue with self._prevent_trainer_and_dataloaders_deepcopy()
#12516
Unanswered
MGheini asked this question in code help: RL / MetaLearning
Replies: 0 comments
Hi,
I'm working on meta-learning code in which I'm implementing the MAML algorithm. You might not be fully familiar with the algorithm, but for my purposes it's enough to know that each round of optimization consists of an inner loop and an outer loop, and in the inner loop I need to copy the model. I expected the two snippets below to behave in the same way:
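For illustration, one way the first snippet could look inside a LightningModule method (a simplified sketch only; self.model and learner are placeholder names, and the surrounding MAML loop is omitted):

```python
import copy

# Inner loop: adapt a throw-away copy of the wrapped network so the original
# parameters stay untouched for the outer-loop update.
# (self.model is a placeholder for the nn.Module held by the LightningModule.)
learner = copy.deepcopy(self.model)
```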
vs.
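And the second snippet, presumably, deep-copies the whole LightningModule while asking Lightning to leave the Trainer and the dataloaders out of the copy (again only a sketch; treating the helper as a context manager is an assumption based on its name):

```python
import copy

# Inner loop: deep-copy the entire LightningModule; the helper is meant to
# keep the Trainer and the attached dataloaders out of the copied object.
with self._prevent_trainer_and_dataloaders_deepcopy():
    learner = copy.deepcopy(self)
```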
However, the first one runs fine, while the second one results in a CUDA OOM error.
I print out torch.cuda.memory_allocated() at the beginning of the outer loop in each round. While it stays fairly stable in the first case, it keeps increasing with _prevent_trainer_and_dataloaders_deepcopy().
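Concretely, the check at the top of each round looks something like this (the loop driver and round counter are placeholders):

```python
import torch

num_outer_rounds = 10  # placeholder

for round_idx in range(num_outer_rounds):
    # torch.cuda.memory_allocated() reports the bytes currently occupied by
    # tensors on the active GPU; if this grows every round, copies made in the
    # inner loop (or graphs that still reference them) are not being freed.
    allocated_mib = torch.cuda.memory_allocated() / 2**20
    print(f"round {round_idx}: {allocated_mib:.1f} MiB allocated")

    # ... inner-loop adaptation and outer-loop update ...
```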
I have not been able to pin down the root of the problem. Can you please give me some pointers? Thanks a lot!