Changed Quick Start example to QuickPOMDP from DiscreteExplicitPOMDP #304

Merged (3 commits) on May 30, 2020

Changes from all commits
75 changes: 38 additions & 37 deletions in README.md
````diff
@@ -42,54 +42,55 @@ using Pkg; pkg"add SARSOP"
 
 ## Quick Start
 
-To run a simple simulation of the classic [Tiger POMDP](https://www.cs.rutgers.edu/~mlittman/papers/aij98-pomdp.pdf) using a policy created by the QMDP solver, you can use the following code (Note that this uses the simplified [Discrete Explicit interface from QuickPOMDPs.jl](https://github.com/JuliaPOMDP/QuickPOMDPs.jl); the [main interface](https://juliapomdp.github.io/POMDPs.jl/stable/def_pomdp/) and the [Quick Interface](https://github.com/JuliaPOMDP/QuickPOMDPs.jl) have much more expressive power):
+To run a simple simulation of the classic [Tiger POMDP](https://www.cs.rutgers.edu/~mlittman/papers/aij98-pomdp.pdf) using a policy created by the QMDP solver, you can use the following code (note that POMDPs.jl is not limited to discrete problems with explicitly-defined distributions like this):
 
 ```julia
-using POMDPs, QuickPOMDPs, POMDPSimulators, QMDP
-
-S = [:left, :right]
-A = [:left, :right, :listen]
-O = [:left, :right]
-γ = 0.95
-
-function T(s, a, sp)
-    if a == :listen
-        return s == sp
-    else # a door is opened
-        return 0.5 #reset
-    end
-end
-
-function Z(a, sp, o)
-    if a == :listen
-        if o == sp
-            return 0.85
-        else
-            return 0.15
-        end
-    else
-        return 0.5
-    end
-end
-
-function R(s, a)
-    if a == :listen
-        return -1.0
-    elseif s == a # the tiger was found
-        return -100.0
-    else # the tiger was escaped
-        return 10.0
-    end
-end
-
-m = DiscreteExplicitPOMDP(S,A,O,T,Z,R,γ)
+using POMDPs, QuickPOMDPs, POMDPModelTools, POMDPSimulators, QMDP
+
+m = QuickPOMDP(
+    states = [:left, :right],
+    actions = [:left, :right, :listen],
+    observations = [:left, :right],
+    initialstate_distribution = Uniform([:left, :right]),
+    discount = 0.95,
+
+    transition = function (s, a)
+        if a == :listen
+            return Deterministic(s) # tiger stays behind the same door
+        else # a door is opened
+            return Uniform([:left, :right]) # reset
+        end
+    end,
+
+    observation = function (s, a, sp)
+        if a == :listen
+            if sp == :left
+                return SparseCat([:left, :right], [0.85, 0.15]) # sparse categorical distribution
+            else
+                return SparseCat([:right, :left], [0.85, 0.15])
+            end
+        else
+            return Uniform([:left, :right])
+        end
+    end,
+
+    reward = function (s, a, sp, o...) # QMDP needs R(s,a,sp), but simulations use R(s,a,sp,o)
+        if a == :listen
+            return -1.0
+        elseif s == a # the tiger was found
+            return -100.0
+        else # the tiger was escaped
+            return 10.0
+        end
+    end
+)
 
 solver = QMDPSolver()
 policy = solve(solver, m)
 
 rsum = 0.0
 for (s,b,a,o,r) in stepthrough(m, policy, "s,b,a,o,r", max_steps=10)
-    println("s: $s, b: $([pdf(b,s) for s in S]), a: $a, o: $o")
+    println("s: $s, b: $([pdf(b,s) for s in states(m)]), a: $a, o: $o")
     global rsum += r
 end
 println("Undiscounted reward was $rsum.")
````