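I think they are equivalent. Line 97 is the incremental form of the sample mean: it adds 1 / (self.counts[i] + 1) times the difference between the new reward r and the current estimate. Assuming self.counts[i] holds the number of pulls of arm i before this step, that gives the same value as dividing the accumulated payoff by the number of pulls, as in your statement. Thanks.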
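To make the equivalence concrete, here is a minimal sketch that runs both update rules on the same reward sequence and compares them. It is not the code from solvers.py: the function names, the `payoff` accumulator, the initial estimate of 1.0, and the simulated Bernoulli rewards are all assumptions made for illustration, and `count` is assumed to be the number of pulls before the current one.

```python
import random


def incremental_mean(rewards, init_estimate=1.0):
    """Running estimate using the update rule quoted from line 97."""
    estimate = init_estimate  # hypothetical starting value
    count = 0                 # pulls seen before the current one (assumption)
    history = []
    for r in rewards:
        estimate += 1.0 / (count + 1) * (r - estimate)
        count += 1
        history.append(estimate)
    return history


def cumulative_mean(rewards):
    """Running estimate as accumulated payoff divided by number of pulls."""
    payoff = 0.0  # hypothetical accumulator, includes the current reward
    count = 0
    history = []
    for r in rewards:
        payoff += r
        history.append(payoff / (count + 1))
        count += 1
    return history


random.seed(0)
# Simulated 0/1 rewards from an arm with success probability 0.6.
rewards = [float(random.random() < 0.6) for _ in range(1000)]

inc = incremental_mean(rewards)
cum = cumulative_mean(rewards)

# Largest disagreement between the two estimates over all steps;
# any gap comes from floating-point rounding only.
max_gap = max(abs(a - b) for a, b in zip(inc, cum))
print(max_gap)
```

Under these assumptions the first update multiplies the difference by 1 / (0 + 1), so the initial estimate is overwritten by the first reward and does not affect later values. The incremental form also avoids storing a separate payoff total per arm.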
In solvers.py, line 97: self.estimates[i] += 1. / (self.counts[i] + 1) * (r - self.estimates[i])
I think it should be: self.estimates[i] = payoff[i] / (self.counts[i] + 1)
Could you please explain it? Thanks!