Rewards not according to criteria #4

@Syaoran87

Hi,

Thank you for your awesome code (mcts.py).
However, I think the algorithm is not behaving according to the policy network. In the link you gave, section 2.1 states that the aim is to find the policy that yields the highest reward.
When I ran the algorithm, the child nodes it selected did not have the highest rewards, and the child selected at the root did not have the highest reward either. Does this mean the policy network is not optimized, and therefore did not choose the best action during the search?

E.g. as follows:
level 0
Num Children: 4
(0, Node; children: 4; visits: 354; reward: 295.106667)
(1, Node; children: 4; visits: 928; reward: 826.746667) <--- This was selected by the algorithm
(2, Node; children: 4; visits: 582; reward: 504.422222)
(3, Node; children: 4; visits: 933; reward: 831.537778) <--- Shouldn't this be selected?
Best Child: Value: 20; Moves: [20]
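
To make the comparison concrete, the numbers above can be re-scored by hand. This is only a minimal sketch, assuming the selection rule is the standard UCT formula (mean reward per visit plus an exploration bonus scaled by a constant); I have not confirmed that mcts.py computes its score exactly this way. The names `uct_score` and `best_child`, the example value of the exploration constant, and taking the parent's visit count as the sum of the children's visits are all assumptions made for illustration.

```python
import math

# (visits, total reward) for the four root children, copied from the output above.
children = [
    (354, 295.106667),
    (928, 826.746667),
    (582, 504.422222),
    (933, 831.537778),
]


def uct_score(visits, reward, parent_visits, c):
    """Standard UCT score: mean reward plus c times an exploration bonus.

    Assumption: this is the textbook formula, not necessarily what mcts.py uses.
    """
    exploit = reward / visits
    explore = math.sqrt(2.0 * math.log(parent_visits) / visits)
    return exploit + c * explore


def best_child(nodes, c):
    """Return the index of the child with the highest UCT score."""
    # Assumption: the parent's visit count equals the sum of its children's visits.
    parent_visits = sum(v for v, _ in nodes)
    return max(
        range(len(nodes)),
        key=lambda i: uct_score(nodes[i][0], nodes[i][1], parent_visits, c),
    )


if __name__ == "__main__":
    # c = 0 ranks purely by mean reward; the second value is just an example constant.
    for c in (0.0, 1.0 / math.sqrt(2.0)):
        parent_visits = sum(v for v, _ in children)
        print(f"exploration constant c = {c:.4f}")
        for i, (v, r) in enumerate(children):
            print(f"  child {i}: mean = {r / v:.6f}, "
                  f"score = {uct_score(v, r, parent_visits, c):.6f}")
        print(f"  best child index: {best_child(children, c)}")
```

With the constant set to 0, the ranking depends only on reward divided by visits, so node 3 comes out on top for these numbers. A non-zero constant adds a larger bonus to less-visited nodes, which may change which child is picked.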

I ran it with the input parameters set to 10000 loops and 8 levels.

Thank you once again for your help. Hope to hear from you soon.

James
