In the backward pass of MaxEnt (Algorithm 9.1 of Brian Ziebart's thesis), a softmax calculation is used to update the V function (the soft value function), but maxent.py seems to call value_iteration.optimal_value, which computes the hard value function, i.e. it uses max instead of softmax. This seems like a bug.

The initialization also seems odd: at least in gridworld settings, only the final state should be initialized to 0 while all other states should be -infinity, but value_iteration.optimal_value seems to initialize everything to 0. Is there a reason for this discrepancy?

Code for reference: https://github.com/MatthewJA/Inverse-Reinforcement-Learning/blob/master/irl/value_iteration.py#L63
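For concreteness, here is a minimal sketch of what the soft backward pass could look like. This is not code from the repo: the function name `soft_value_iteration`, the `(n_states, n_actions, n_states)` transition array, and the finite stand-in for -infinity are all assumptions for illustration.

```python
import numpy as np
from scipy.special import logsumexp

NEG_INF = -1e10  # finite stand-in for -infinity, so 0 * (-inf) never produces NaN


def soft_value_iteration(transition_probs, reward, terminal_states,
                         discount=1.0, n_iterations=100):
    """Sketch of the MaxEnt backward pass (soft value iteration).

    transition_probs: (n_states, n_actions, n_states) array,
        transition_probs[s, a, s'] = P(s' | s, a).
    reward: (n_states,) array of state rewards.
    terminal_states: iterable of terminal state indices.
    """
    n_states, n_actions, _ = transition_probs.shape
    terminal_states = list(terminal_states)

    # Initialise V to (effectively) -infinity everywhere except the
    # terminal states, which start at 0 -- per Algorithm 9.1, rather
    # than the all-zeros initialisation in optimal_value.
    v = np.full(n_states, NEG_INF)
    v[terminal_states] = 0.0

    for _ in range(n_iterations):
        # Soft Q-values: Q(s, a) = r(s) + gamma * E_{s'}[V(s')].
        q = reward[:, None] + discount * (transition_probs @ v)
        # The soft backup: log-sum-exp over actions, instead of the
        # hard max that optimal_value takes.
        v = logsumexp(q, axis=1)
        v[terminal_states] = 0.0  # terminal states stay pinned at 0
    return v
```

Using a large finite negative number instead of a literal `-np.inf` is a pragmatic choice: with `-np.inf` in `v`, the expectation `transition_probs @ v` produces NaN wherever a transition probability is exactly zero (0 * -inf), while the log-sum-exp remains well behaved with the finite stand-in.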
Hey, thanks for the comments. It could very well be a bug (including the initialisation of gridworld) — it's been a long while since I've looked at inverse reinforcement learning so I'm not sure. I'm happy to take pull requests if anyone wants to look into this.