
Bug反馈 (Bug report) #3

Open

weixians opened this issue Dec 5, 2022 · 6 comments

Comments


weixians commented Dec 5, 2022

Hello author, I feel very fortunate to have read your paper, and I benefited a great deal from it.
Regarding the paper's code implementation, I have found a few problems:

  1. Converting the adjacency matrix to a sparse representation makes memory usage explode; so far nobody has been able to train with this code (see the sketch after this list).
  2. There are some fairly obvious low-level mistakes in the code; please fix them.
    (screenshot of the offending code)
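If the blow-up comes from materializing dense N×N adjacency tensors, one common mitigation is PyTorch's COO sparse format, which stores only the edges. A minimal sketch, not taken from this repo; the helper `adj_to_sparse` and the stand-in adjacency below are hypothetical:

```python
import numpy as np
import torch

# Hypothetical helper: build a COO sparse tensor from a dense 0/1 numpy
# adjacency matrix, storing only the nonzero entries (the edges).
def adj_to_sparse(adj: np.ndarray) -> torch.Tensor:
    rows, cols = np.nonzero(adj)                    # endpoints of each edge
    indices = torch.tensor(np.stack([rows, cols]))  # 2 x nnz index matrix
    values = torch.ones(indices.shape[1])           # unit edge weights
    return torch.sparse_coo_tensor(indices, values, adj.shape)

dense = np.eye(1000, dtype=np.float32)              # stand-in adjacency
sparse = adj_to_sparse(dense)                       # stores 1000 values, not 10^6
```

Whether this helps depends on where the conversion happens in the training loop; sparse storage only pays off if the dense copy is never materialized downstream.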
Lei-Kun (Owner) commented Dec 5, 2022 via email

@William1234nn

C:\Users\William\.conda\envs\Pytorch\python.exe "C:/Users/William/OneDrive - stu.xjtu.edu.cn/桌面/leikun/End-to-end-DRL-for-FJSP-main/End-to-end-DRL-for-FJSP-main/FJSP_MultiPPO/PPOwithValue.py"
C:\Users\William\OneDrive - stu.xjtu.edu.cn\桌面\leikun\End-to-end-DRL-for-FJSP-main\End-to-end-DRL-for-FJSP-main\FJSP_MultiPPO\FJSP_Env.py:187: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at C:\cb\pytorch_1000000000000\work\torch\csrc\utils\tensor_new.cpp:204.)
  self.adj = torch.tensor(self.adj)
C:\Users\William\.conda\envs\Pytorch\lib\site-packages\torch\autograd\__init__.py:173: UserWarning: Error detected in AddmmBackward0. Traceback of forward call that caused the error:
  File "C:\Users\William\OneDrive - stu.xjtu.edu.cn\桌面\leikun\End-to-end-DRL-for-FJSP-main\End-to-end-DRL-for-FJSP-main\FJSP_MultiPPO\PPOwithValue.py", line 471, in <module>
    main(1)
  File "C:\Users\William\OneDrive - stu.xjtu.edu.cn\桌面\leikun\End-to-end-DRL-for-FJSP-main\End-to-end-DRL-for-FJSP-main\FJSP_MultiPPO\PPOwithValue.py", line 403, in main
    loss, v_loss = ppo.update(memory,batch_idx)
  File "C:\Users\William\OneDrive - stu.xjtu.edu.cn\桌面\leikun\End-to-end-DRL-for-FJSP-main\End-to-end-DRL-for-FJSP-main\FJSP_MultiPPO\PPOwithValue.py", line 197, in update
    a_entropy, v, log_a, action_node, _, mask_mch_action, hx = self.policy_job(x=env_fea,
  File "C:\Users\William\.conda\envs\Pytorch\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\William\OneDrive - stu.xjtu.edu.cn\桌面\leikun\End-to-end-DRL-for-FJSP-main\End-to-end-DRL-for-FJSP-main\FJSP_MultiPPO\models\PPO_Actor1.py", line 217, in forward
    v = self.critic(h_pooled)
  File "C:\Users\William\.conda\envs\Pytorch\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\William\OneDrive - stu.xjtu.edu.cn\桌面\leikun\End-to-end-DRL-for-FJSP-main\End-to-end-DRL-for-FJSP-main\FJSP_MultiPPO\models\mlp.py", line 156, in forward
    return self.linears[self.num_layers - 1](h)
  File "C:\Users\William\.conda\envs\Pytorch\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\William\.conda\envs\Pytorch\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
 (Triggered internally at C:\cb\pytorch_1000000000000\work\torch\csrc\autograd\python_anomaly_mode.cpp:104.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
  File "C:\Users\William\OneDrive - stu.xjtu.edu.cn\桌面\leikun\End-to-end-DRL-for-FJSP-main\End-to-end-DRL-for-FJSP-main\FJSP_MultiPPO\PPOwithValue.py", line 471, in <module>
    main(1)
  File "C:\Users\William\OneDrive - stu.xjtu.edu.cn\桌面\leikun\End-to-end-DRL-for-FJSP-main\End-to-end-DRL-for-FJSP-main\FJSP_MultiPPO\PPOwithValue.py", line 403, in main
    loss, v_loss = ppo.update(memory,batch_idx)
  File "C:\Users\William\OneDrive - stu.xjtu.edu.cn\桌面\leikun\End-to-end-DRL-for-FJSP-main\End-to-end-DRL-for-FJSP-main\FJSP_MultiPPO\PPOwithValue.py", line 267, in update
    mch_loss_sum.mean().backward(retain_graph=False)
  File "C:\Users\William\.conda\envs\Pytorch\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\William\.conda\envs\Pytorch\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [32, 1]], which is output 0 of AsStridedBackward0, is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Process finished with exit code 1

(screenshot locating the failing code)
I've traced the problem to the spot shown above. How should this be resolved? I'm running the training code.
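As an aside, the UserWarning at FJSP_Env.py:187 is separate from the crash and has the standard fix stated in the warning itself: stack the list of numpy arrays into a single ndarray before building the tensor. A minimal sketch with stand-in shapes (the repo's actual arrays may differ):

```python
import numpy as np
import torch

adj_list = [np.eye(4, dtype=np.float32) for _ in range(3)]  # stand-in data

slow = torch.tensor(adj_list)            # list of ndarrays: emits the warning
fast = torch.tensor(np.array(adj_list))  # one ndarray first: no warning, much faster
```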

@MagMueller

> [quotes the full traceback from @William1234nn's comment above]

Same error for me.

Lei-Kun (Owner) commented Feb 14, 2023

You could try downgrading your PyTorch version; I used 1.4.0. Alternatively, modify the code following the hint in the error message. Either should work.

@MagMueller

The error occurs because the code uses the same critic for the job policy and the machine policy. The same v value is therefore used in both loss functions, and after the job policy finishes its backward step, the machine policy's backward pass finds that the variables it needs have been modified.
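A minimal reproduction of that failure mode, using hypothetical names (`critic`, `job_loss`, `mch_loss`) rather than the repo's actual modules: stepping the optimizer between the two backward passes updates the shared critic's weights in place, which bumps their version counter and raises exactly this RuntimeError; summing the losses and backpropagating once is one way around it.

```python
import torch
import torch.nn as nn

critic = nn.Linear(8, 1)                 # one critic shared by both policies
opt = torch.optim.SGD(critic.parameters(), lr=0.1)

x = torch.randn(32, 8)
v = critic(x)                            # both losses depend on the same v
job_loss = (v ** 2).mean()
mch_loss = ((v - 1) ** 2).mean()

# Failing pattern, analogous to what the traceback shows:
#   job_loss.backward(retain_graph=True)
#   opt.step()                           # in-place weight update, version bump
#   mch_loss.backward()                  # RuntimeError: ... inplace operation
#
# One working pattern: accumulate both losses, backpropagate once.
opt.zero_grad()
(job_loss + mch_loss).backward()
opt.step()
```

Giving each policy its own critic (or detaching `v` in one of the losses, depending on the intended gradient flow) would likewise decouple the two backward passes.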

Lei-Kun (Owner) commented Feb 15, 2023

I've revised PPOwithValue.py; please check it for details.
