UCB1's estimates update #1

Open
Haotian-CS opened this issue Jan 11, 2020 · 2 comments

Comments

@Haotian-CS

In solvers.py, line 97: self.estimates[i] += 1. / (self.counts[i] + 1) * (r - self.estimates[i])

I think it should be: self.estimates[i] = payoff[i] / (self.counts[i] + 1)

Could you please explain it? Thanks!

@Jayzhaowj

Hello Haotian,

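I think they are equivalent. Line 97 adds a 1 / (counts[i] + 1) fraction of the difference between the new reward r and the current estimate, which is the incremental form of the running average. Assuming payoff[i] in your statement means the total reward collected from arm i so far, including r, both give the same value. Thanks.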
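To make the equivalence concrete, here is a minimal sketch rather than the repository's code. It assumes that counts holds the number of pulls before the current one and that payoff is a running sum of rewards; the names incremental_mean and batch_mean are hypothetical helpers made up for this illustration. Algebraically, Q + (r - Q) / (n + 1) = (n * Q + r) / (n + 1), which is the new sum divided by the new count.

```python
import random


def incremental_mean(rewards, init=1.0):
    """Update the estimate one reward at a time, in the style of line 97."""
    estimate = init  # assumed initial value; the first update overwrites it
    count = 0        # number of pulls seen before the current one
    for r in rewards:
        estimate += 1.0 / (count + 1) * (r - estimate)
        count += 1
    return estimate


def batch_mean(rewards):
    """Recompute the estimate as total payoff / number of pulls each step."""
    payoff = 0.0     # hypothetical running sum of rewards
    count = 0
    estimate = None
    for r in rewards:
        payoff += r
        estimate = payoff / (count + 1)
        count += 1
    return estimate


# Simulated Bernoulli rewards from a single arm (illustrative data only).
rng = random.Random(0)
rewards = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(1000)]

# The two estimates agree up to floating-point rounding.
print(abs(incremental_mean(rewards) - batch_mean(rewards)) < 1e-9)
```

The incremental form does not need to store the running sum, which is presumably why it is used; the first update has weight 1 / (0 + 1) = 1, so the initial estimate does not affect the result once an arm has been pulled.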

@zhengshuai202

Why do I get no plot output when I run it?
