Hi,
Thank you for your awesome code (mcts.py).
However, I think the algorithm is not behaving according to the policy network. For example, in the link you gave, under section 2.1, it is stated that the goal is to find the policy that yields the highest reward.
When I ran the algorithm, the children selected were not the ones with the highest rewards. The child selected at the root also did not have the highest reward. Does this mean the policy network is not optimized, and therefore did not choose the best action/search?
E.g. as follows:
level 0
Num Children: 4
(0, Node; children: 4; visits: 354; reward: 295.106667)
(1, Node; children: 4; visits: 928; reward: 826.746667) <--- This was selected by the algorithm
(2, Node; children: 4; visits: 582; reward: 504.422222)
(3, Node; children: 4; visits: 933; reward: 831.537778) <--- Shouldn't this be selected?
Best Child: Value: 20; Moves: [20]
The input parameters I used were 10000 loops and 8 levels.
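I may be misreading how mcts.py scores children, but as I understand standard UCT selection, it compares each child's average reward (total reward divided by visits) plus an exploration bonus, not the raw total reward. Here is a rough sketch of that scoring applied to the numbers above (the `uct` function and the exploration constant `c = 1.0` are my assumptions, not taken from your code):

```python
import math

# (visits, total_reward) for the four root children, copied from the output above
children = [(354, 295.106667), (928, 826.746667),
            (582, 504.422222), (933, 831.537778)]

N = sum(v for v, _ in children)  # parent (root) visit count

def uct(visits, reward, parent_visits, c=1.0):
    # exploitation term (average reward) + exploration bonus;
    # this is the generic UCT formula, which may differ from mcts.py's
    return reward / visits + c * math.sqrt(2 * math.log(parent_visits) / visits)

for i, (v, r) in enumerate(children):
    print(f"child {i}: avg reward = {r / v:.4f}, uct = {uct(v, r, N):.4f}")
```

If I run this, children 1 and 3 come out with nearly identical average rewards (about 0.891), so depending on the exploration constant and tie-breaking, either one could plausibly be picked. Is that what is happening here, or does your code select the best child by a different criterion?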
Thank you once again for your help. Hope to hear from you soon.
James