Don't understand when `done` is set #2

Open
tphhh123 opened this issue Feb 26, 2025 · 1 comment

Comments


tphhh123 commented Feb 26, 2025

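In the code, `done` is only set to true after 36000 steps. Why is it done this way?
The reward also differs from the paper: `reward = -(np.sum(total_energy))` contains only the energy term.
The code does not seem to be complete.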

qlt315 (Owner) commented Feb 26, 2025

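Hi, 36000 is most likely a fairly arbitrary choice for the number of steps in one episode. As for the reward, it went through several adjustments after the paper was published, and the penalty term was probably removed during one of the debugging passes. I believe I originally added a positive integer to the reward to keep it positive. I also suggest looking at the thesis branch of this repository, which contains an optimized and revised version of the original code.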
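To make the two points in this reply concrete, here is a minimal sketch of how a fixed episode length and an offset reward typically fit together. It is not the repository's code: the class `ToyEnergyEnv`, the constant `REWARD_OFFSET` and its value of 100 are hypothetical, since the thread does not state the actual integer, and the paper's penalty term is left out because it is not described here. The built-in `sum` stands in for `np.sum` to keep the sketch dependency-free.

```python
MAX_STEPS = 36000      # episode length mentioned in this thread
REWARD_OFFSET = 100    # hypothetical positive integer; the real value is not given


class ToyEnergyEnv:
    """Hypothetical stand-in environment, not the repository's actual class."""

    def __init__(self, max_steps=MAX_STEPS, reward_offset=REWARD_OFFSET):
        self.max_steps = max_steps
        self.reward_offset = reward_offset
        self.step_count = 0

    def reset(self):
        # Start a new episode by clearing the step counter.
        self.step_count = 0

    def step(self, total_energy):
        # total_energy: iterable of per-device energy values for this step.
        self.step_count += 1
        # Energy-only reward, shifted by a constant offset.
        reward = -sum(total_energy) + self.reward_offset
        # The episode ends purely because the step budget is used up.
        done = self.step_count >= self.max_steps
        return reward, done
```

Under these assumptions, `done` here is a time limit rather than a terminal state of the task, which would explain why the exact number can be chosen freely. The offset only keeps the reward positive when the summed energy stays below it.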
