maxent seems to be using max instead of softmax for V_soft? #4

Open
mohitsharma0690 opened this issue Jul 31, 2017 · 4 comments

Comments

@mohitsharma0690

In the backward pass of MaxEnt (Algorithm 9.1 in Brian Ziebart's thesis), a softmax calculation is used to update the soft value function V_soft, but maxent.py seems to call value_iteration.optimal_value, which calculates the hard value function; that is, it uses max instead of softmax. This seems like a bug.

Also, the initialization seems odd. At least for gridworld settings, only the final state should be initialized to 0 while all others should be -infinity, but value_iteration.optimal_value seems to set everything to 0 initially. Is there a reason for this discrepancy?

Code for reference: https://github.com/MatthewJA/Inverse-Reinforcement-Learning/blob/master/irl/value_iteration.py#L63
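
For readers following along, here is a minimal sketch of the backward pass described above: a log-sum-exp ("softmax") backup in place of a hard max, with only the goal state initialized to 0 and every other state to -infinity. It is not the repository's code. The function name `soft_value_iteration`, the deterministic `next_state` table, and the four-state chain are all assumptions made for illustration.

```python
import numpy as np


def soft_value_iteration(next_state, reward, terminal, n_iters=100):
    """Soft value backup on a deterministic MDP (illustrative names only).

    next_state: (S, A) integer array, next_state[s, a] is the successor state
    reward:     (S,) float array of state rewards
    terminal:   index of the goal state, whose value is pinned to 0
    """
    n_states = next_state.shape[0]
    # Only the goal starts at 0; everything else starts at -infinity.
    v = np.full(n_states, -np.inf)
    v[terminal] = 0.0
    for _ in range(n_iters):
        # Q(s, a) = r(s) + V(s') for the deterministic successor s'.
        q = reward[:, None] + v[next_state]
        # Soft backup: V(s) = log sum_a exp Q(s, a), instead of max_a Q(s, a).
        v = np.logaddexp.reduce(q, axis=1)
        v[terminal] = 0.0
    return v


# Hypothetical 4-state chain: action 0 moves left, action 1 moves right.
next_state = np.array([[0, 1], [0, 2], [1, 3], [3, 3]])
reward = np.full(4, -1.0)
v_soft = soft_value_iteration(next_state, reward, terminal=3)
print(v_soft)
```

On this chain the hard-max values would be -3, -2, -1, 0. The soft values are never lower than those, because log-sum-exp is an upper bound on max.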

@magnusja

Did you reach any conclusion on that?

@mohitsharma0690
Author

I'm pretty sure it is wrong: it should be the softmax value function, not a hard max. It should be pretty easy to fix, though.

@MatthewJA
Owner

Hey, thanks for the comments. It could very well be a bug (including the initialisation of gridworld) — it's been a long while since I've looked at inverse reinforcement learning so I'm not sure. I'm happy to take pull requests if anyone wants to look into this.

@magnusja

Hey guys,

thanks for the responses.
In these papers, the policy is usually given in a form like this:
image (Kitani et al.)
image (Ziebart et al.)

But I wonder whether it matters, and if so how, since it apparently works with your code.
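
The two images are not preserved in this copy of the thread. In the MaxEnt IRL literature the stochastic policy is commonly written as pi(a|s) = exp(Q(s, a) - V(s)), and the sketch below assumes that form. It uses made-up Q values and a hypothetical helper `policy_from_values` to show one way the choice of V can matter: with the soft V each row of the policy sums to 1, while with the hard max V it does not.

```python
import numpy as np


def policy_from_values(q, v):
    """pi(a|s) = exp(Q(s, a) - V(s)); rows are states, columns are actions."""
    return np.exp(q - v[:, None])


# Hypothetical Q values for 2 states and 2 actions.
q = np.array([[-2.0, -1.0],
              [-1.5, -0.5]])

v_soft = np.logaddexp.reduce(q, axis=1)  # log sum_a exp Q(s, a)
v_hard = q.max(axis=1)                   # max_a Q(s, a)

pi_soft = policy_from_values(q, v_soft)
pi_hard = policy_from_values(q, v_hard)

print(pi_soft.sum(axis=1))  # each row sums to 1
print(pi_hard.sum(axis=1))  # each row sums to more than 1
```

For a fixed Q, renormalising `pi_hard` row by row gives the same distribution as `pi_soft`. Any difference in practice would therefore come from Q itself being built from a different V during the backward pass, which may be part of why the existing code still appears to work.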
