PFT-(DPW)

 

Overview

Online algorithms for POMDPs with continuous state, action, and observation spaces - Sunberg et al.

Root node: the input belief b_o, represented as a particle belief.

Action Progressive Widening

Insert new action if:

|C(b)| \le k_{a}N(b)^{\alpha_a}
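
The widening test above can be sketched in a few lines of Julia. The function name `should_widen` and its argument names are hypothetical, chosen only for illustration; the same check applies to observation widening with k_o and \alpha_o.

```julia
# Hypothetical helper (not from the slides): returns true when a new child
# may be added to a node under the progressive widening criterion.
# n_children plays the role of |C(b)|, n_visits the role of N(b).
should_widen(n_children::Integer, n_visits::Integer, k::Real, alpha::Real) =
    n_children <= k * n_visits^alpha

# Example with k_a = 2.0 and alpha_a = 0.5:
should_widen(0, 0, 2.0, 0.5)   # an unvisited node with no children may widen
should_widen(3, 4, 2.0, 0.5)   # 3 <= 2 * sqrt(4), so widening is allowed
should_widen(5, 4, 2.0, 0.5)   # 5 >  2 * sqrt(4), so reuse an existing child
```

Smaller values of alpha make the tree widen more slowly as the visit count grows.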

Choose next action with

\text{argmax}_{a\in C(b)}\left\{ Q(ba) + c\sqrt{\frac{\log N(b)}{N(ba)}}\right\}
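
A minimal sketch of this selection rule, assuming the statistics of the child actions are stored in plain vectors. The name `select_ucb` and the treatment of unvisited actions (infinite bonus) are assumptions for illustration, not something the slides specify.

```julia
# Hypothetical UCB selection over the children C(b) of one belief node.
# Q[i] and Nba[i] are the value estimate and visit count of child action i,
# Nb is the visit count N(b) (assumed >= 1 once any child has been visited),
# and c is the exploration constant.
function select_ucb(Q::Vector{Float64}, Nba::Vector{Int}, Nb::Int, c::Float64)
    best_i, best_val = 0, -Inf
    for i in eachindex(Q)
        # Assumption: unvisited actions get an infinite bonus so they are tried first.
        bonus = Nba[i] == 0 ? Inf : c * sqrt(log(Nb) / Nba[i])
        val = Q[i] + bonus
        if val > best_val
            best_i, best_val = i, val
        end
    end
    return best_i   # index of the chosen action within C(b)
end
```

With c = 0 the rule is purely greedy in Q(ba); larger c favors rarely visited actions.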

Generate next belief node

Insert new belief if:

|C(ba)| \le k_{o}N(ba)^{\alpha_o}

s_s \sim b
o \leftarrow G(s_s,a)
b' = \tau(bao)
b',r \leftarrow G_{PF}(b,a)

Full Belief Propagation

Propagate

s_i',r_i \leftarrow G(s_i,a)
r(b,a) = \sum_iw_ir_i

Reweight

w_i' = \eta w_i\mathcal{Z}(o|s_i,a,s_i')
\eta = \left(\sum_i w_i\mathcal{Z}(o|s_i,a,s_i') \right)^{-1}
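
The propagate and reweight steps can be sketched as one particle filter update. Everything named here is an assumption made for illustration: the `ParticleBelief` type, the function `pf_step`, a generative model `G(rng, s, a)` returning `(s', r)`, and an observation likelihood `Z(o, s, a, sp)`. The observation `o` is taken as given, having been generated from a state sampled from b.

```julia
using Random

# Hypothetical weighted particle belief: states with normalized weights.
struct ParticleBelief{S}
    states::Vector{S}
    weights::Vector{Float64}
end

# Sketch of b', r <- G_PF(b, a) for an already sampled observation o.
function pf_step(rng::AbstractRNG, b::ParticleBelief{S}, a, o, G, Z) where {S}
    n = length(b.states)
    sp = Vector{S}(undef, n)          # propagated states s_i'
    wp = Vector{Float64}(undef, n)    # reweighted particles w_i'
    r = 0.0
    for i in 1:n
        s_i, w_i = b.states[i], b.weights[i]
        sp[i], r_i = G(rng, s_i, a)           # propagate
        r += w_i * r_i                        # r(b,a) = sum_i w_i r_i
        wp[i] = w_i * Z(o, s_i, a, sp[i])     # unnormalized reweight
    end
    wp ./= sum(wp)                            # apply the normalizer eta
    return ParticleBelief(sp, wp), r
end
```

If every particle has zero likelihood under `o`, the normalizer is undefined; a real implementation would need to handle that case, which this sketch does not.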

Value Estimation

\hat{V}(b') \approx \sum_i\hat{V}(s'_i)w'_i

  • PO (partially observable) Rollout
  • Sparse Belief VI (value iteration)
  • etc.

  • FO (fully observable) Rollout
  • State VI (value iteration)
  • etc.
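
For the state-based estimators, the weighted sum over particles might look like the sketch below. The function `estimate_value` and the callback `state_value` are hypothetical names; `state_value` stands in for whatever per-state estimator is used, such as a fully observable rollout.

```julia
# Weighted particle estimate: V̂(b') ≈ sum_i V̂(s'_i) w'_i.
# `state_value` is an assumed per-state estimator supplied by the caller.
function estimate_value(state_value, states::AbstractVector,
                        weights::AbstractVector{<:Real})
    v = 0.0
    for (s, w) in zip(states, weights)
        v += w * state_value(s)
    end
    return v
end

# Usage with a toy estimator V̂(s) = 2s:
estimate_value(s -> 2s, [1.0, 3.0], [0.25, 0.75])
```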

\text{total} = r(b,a) + \gamma\hat{V}(b')
N(b) \leftarrow N(b) + 1
N(ba) \leftarrow N(ba) + 1
Q(ba) \leftarrow Q(ba) + \frac{\text{total} - Q(ba)}{N(ba)}
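
A minimal sketch of this backup, assuming the tree stores its counts and value estimates in flat vectors indexed by node. The function name `backup!` and the index arguments are hypothetical.

```julia
# One backup step for belief node b_idx and action node ba_idx.
# Nb, Nba and Qba are assumed to be vectors holding N(b), N(ba) and Q(ba).
function backup!(Nb::Vector{Int}, Nba::Vector{Int}, Qba::Vector{Float64},
                 b_idx::Int, ba_idx::Int, r::Float64, v_next::Float64, γ::Float64)
    total = r + γ * v_next                                # r(b,a) + γ V̂(b')
    Nb[b_idx] += 1
    Nba[ba_idx] += 1
    Qba[ba_idx] += (total - Qba[ba_idx]) / Nba[ba_idx]    # running mean of totals
    return total
end
```

Because N(ba) is incremented before the division, Q(ba) is exactly the average of all totals backed up through that action node.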

When not widening observations

\text{i.e. } |C(ba)| > k_oN(ba)^{\alpha_o}

(Diagram: actions a_1 and a_2, each with observation children o_1 and o_2)
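
The two branches can be sketched together. The slides do not state how an existing child is chosen when the widening test fails; the uniform choice below is an assumption, as are the name `next_belief` and the `make_child` callback that creates a new belief node and returns its index.

```julia
using Random

# children holds the indices of the belief nodes already in C(ba);
# n_visits plays the role of N(ba).
function next_belief(rng::AbstractRNG, children::Vector{Int}, n_visits::Int,
                     k_o::Real, α_o::Real, make_child)
    if length(children) <= k_o * n_visits^α_o
        child = make_child()          # widen: create b' and record it
        push!(children, child)
        return child
    else
        return rand(rng, children)    # assumption: uniform choice among existing children
    end
end
```

Reusing existing children is what lets the search revisit belief nodes and build depth in the tree instead of creating a fresh belief on every simulation.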

  • Type stability / limiting dynamic dispatch
    • JET.jl, @code_warntype, flamegraph profiling
    • Type parameterization
  • Eliding observation generation in fully observable rollouts
  • Caching belief vectors (fewer temporary arrays)
  • Converting mutable types to immutable ones
  • Size-hinting vectors of unknown size (sizehint!)
  • undef initialization of vectors of known size
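
Three of the items above can be illustrated with Base Julia alone. The type and function names are made up for this sketch and are not taken from any particular solver.

```julia
# Type parameterization: a concrete field type lets the compiler avoid dynamic dispatch.
struct LooseNode
    value::Real            # abstract field type: accesses are not type-stable
end
struct TightNode{T<:Real}
    value::T               # concrete once T is known
end

# Size-hinting: reserve capacity when the final length is unknown but bounded.
function collect_children(max_children::Int)
    children = Int[]
    sizehint!(children, max_children)
    for i in 1:max_children
        isodd(i) && push!(children, i)   # stand-in for a data-dependent condition
    end
    return children
end

# undef initialization: skip zero-filling when every element is overwritten anyway.
function uniform_weights(n::Int)
    w = Vector{Float64}(undef, n)
    fill!(w, 1 / n)
    return w
end
```

Running `@code_warntype` on a function that reads `LooseNode.value` should flag the abstract `Real` field, while the `TightNode{Float64}` version infers a concrete type.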

Quick note on type mutability

(a mutable value stored in a field of an immutable struct can still be mutated; only the field binding itself is fixed)

mutable struct Mstruct{T}
    i::T
    v::Vector{T}
end

struct Istruct{T}
    i::T
    v::Vector{T}
end

Mutable

julia> MS = Mstruct(1,[1,2,3])
Mstruct{Int64}(1, [1, 2, 3])

julia> MS.i += 1
2

julia> push!(MS.v,4)
4-element Vector{Int64}:
 1
 2
 3
 4

julia> MS.v[1] = 10
10

julia> MS.v
4-element Vector{Int64}:
 10
  2
  3
  4

Immutable

julia> IS = Istruct(1,[1,2,3])
Istruct{Int64}(1, [1, 2, 3])

julia> IS.i += 1
ERROR: setfield! immutable struct of type Istruct cannot be changed
Stacktrace:
 [1] setproperty!(x::Istruct{Int64}, f::Symbol, v::Int64)
   @ Base ./Base.jl:34
 [2] top-level scope
   @ none:1

julia> push!(IS.v,4)
4-element Vector{Int64}:
 1
 2
 3
 4

julia> IS.v[1] = 10
10

julia> IS
Istruct{Int64}(1, [10, 2, 3, 4])

Copy of PFT-DPW (No DPW)

By Zachary Sunberg
