Are Ziebart's thesis, equation 9.2 and find_policy() function the same? #15

Open
tessavdheiden opened this issue Jun 7, 2022 · 0 comments

Comments

@tessavdheiden

Hi Matthew!

This repo is just great: it works, and it's transparent and modular!

I only found two differences between Ziebart's thesis and your implementation.
Can you let me know if you were aware of them?

So here is Eq 9.2:
Screenshot 2022-06-07 at 11 12 54

Here is your code:
Screenshot 2022-06-07 at 11 10 21

And here is Eq 9.1:
Screenshot 2022-06-07 at 11 12 59
Which uses $V^{\text{soft}}$:
Screenshot 2022-06-07 at 11 17 22

And here is your code:
Screenshot 2022-06-07 at 11 10 30

So in your implementation of Eq 9.2 you include a discount factor, and in your implementation of Eq 9.1 you convert the subtraction ($Q^{\text{soft}}-V^{\text{soft}}$) into a fraction ($\frac{Q^{\text{soft}}}{V^{\text{soft}}}$), correct?
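
On the second point, a subtraction inside an exponential is the same as a ratio of exponentials, so the two forms may well coincide. Here is a minimal sketch for checking that numerically. It assumes that $V^{\text{soft}}$ is the log-sum-exp of $Q^{\text{soft}}$ over actions and that the code normalises $\exp(Q)$ by its sum over actions. The function names and Q values are made up for illustration and are not taken from the repo.

```python
import numpy as np


def soft_policy_subtraction(q):
    """pi(a|s) = exp(Q(s, a) - V(s)), assuming V(s) = log sum_a exp(Q(s, a))."""
    q_max = q.max(axis=1, keepdims=True)  # shift for numerical stability
    v_soft = q_max + np.log(np.exp(q - q_max).sum(axis=1, keepdims=True))
    return np.exp(q - v_soft)


def soft_policy_fraction(q):
    """pi(a|s) = exp(Q(s, a)) / sum_a' exp(Q(s, a'))."""
    e = np.exp(q - q.max(axis=1, keepdims=True))  # same stability shift
    return e / e.sum(axis=1, keepdims=True)


# Hypothetical Q values: 2 states (rows), 3 actions (columns).
q_example = np.array([[1.0, 2.0, 0.5],
                      [-1.0, 0.0, 3.0]])

pi_sub = soft_policy_subtraction(q_example)
pi_frac = soft_policy_fraction(q_example)

# Under the assumptions above, both forms give the same policy.
print(np.allclose(pi_sub, pi_frac))
```

If the fraction in the code is of this exponentiated form, it would match Eq 9.1, and only the discount factor would remain as a real difference.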
