Zachary Sunberg
Two Objectives for Autonomy
Minimize resource use
(especially time)
Minimize the risk of harm to oneself and others
Safety often opposes Efficiency
Aleatory
Static Epistemic
Dynamic Epistemic
MDP
Uncertain MDP (RL)
POMDP
Thrusters (Aleatory)
Gravity (Epistemic)
Rough Terrain (Aleatory and Epistemic)
Policies of Other Vehicles (Epistemic)
Markov Decision Process (MDP)
Partially Observable Markov Decision Process (POMDP)
Safety
Better Performance
Model \(M_2\), Algorithm \(A_2\)
Model \(M_1\), Algorithm \(A_1\)
Efficiency
$$\underset{\pi}{\mathop{\text{maximize}}} \, \sum_{t=1}^T r_t = \sum_{t=1}^T \left( r_t^\text{E} + \lambda r_t^\text{S} \right)$$
Safety
Weight
Efficiency
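A minimal sketch of how the weight \(\lambda\) in the objective above trades efficiency against safety. The two trajectories, their reward values, and the function names are hypothetical and chosen only for illustration; they are not results from the talk.

```python
def scalarized_return(r_eff, r_safe, lam):
    """Sum over time of r_t^E + lam * r_t^S for one trajectory."""
    if len(r_eff) != len(r_safe):
        raise ValueError("reward sequences must have the same length")
    return sum(re + lam * rs for re, rs in zip(r_eff, r_safe))


# Hypothetical trajectories: efficiency reward is -1 per time step,
# safety reward is -1 for each risky step.
TRAJECTORIES = {
    "fast": {"eff": [-1.0, -1.0], "safe": [0.0, -1.0]},
    "cautious": {"eff": [-1.0] * 4, "safe": [0.0] * 4},
}


def preferred(lam):
    """Name of the trajectory with the highest scalarized return."""
    return max(
        TRAJECTORIES,
        key=lambda k: scalarized_return(
            TRAJECTORIES[k]["eff"], TRAJECTORIES[k]["safe"], lam
        ),
    )


if __name__ == "__main__":
    for lam in (1.0, 5.0):
        print(lam, preferred(lam))
```

In this toy setup a small weight favors the fast but risky trajectory and a large weight favors the cautious one, which is the sense in which safety often opposes efficiency.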
Time
Estimate \(Q(s, a)\) based on children
$$Q(s,a) = E\left[\sum_t \gamma^t r_t | s_0 = s, a_0=a\right]$$
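The expectation in the definition of \(Q(s, a)\) can be approximated by averaging discounted returns over sampled rollouts. The sketch below assumes a hypothetical `simulate(s, a, rng)` function that returns the reward sequence of one rollout; it is a plain Monte Carlo estimator, not the tree-search update used by any particular solver.

```python
import random


def discounted_return(rewards, gamma):
    """Compute sum_t gamma^t * r_t for one reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def estimate_q(simulate, s, a, gamma, n_rollouts, rng):
    """Average the discounted return of n_rollouts rollouts from (s, a)."""
    total = 0.0
    for _ in range(n_rollouts):
        total += discounted_return(simulate(s, a, rng), gamma)
    return total / n_rollouts


def toy_simulate(s, a, rng):
    """Hypothetical simulator: random first reward (0 or 1), then reward 1."""
    r0 = 1.0 if rng.random() < 0.5 else 0.0
    return [r0, 1.0]


if __name__ == "__main__":
    rng = random.Random(0)
    # The true value for this toy simulator is 0.5 + 0.9 * 1.0 = 1.4.
    print(estimate_q(toy_simulate, s=0, a=0, gamma=0.9, n_rollouts=2000, rng=rng))
```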
Tweet by Nitin Gupta
29 April 2018
https://twitter.com/nitguptaa/status/990683818825736192
Intelligent Driver Model (IDM)
[Treiber, et al., 2000] [Kesting, et al., 2007] [Kesting, et al., 2009]
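For reference, the IDM acceleration law can be sketched as follows. The default parameter values are illustrative assumptions, not the values used in the experiments, and the clamp of the dynamic part of the desired gap at zero is a safeguard added here.

```python
import math


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, s0=2.0,
                     a_max=1.0, b=1.5, delta=4.0):
    """IDM longitudinal acceleration in m/s^2.

    v    : own speed (m/s)
    gap  : bumper-to-bumper distance to the leading vehicle (m)
    dv   : approach rate, own speed minus leader speed (m/s)
    v0   : desired speed, T: desired time headway, s0: minimum gap,
    a_max: maximum acceleration, b: comfortable deceleration,
    delta: acceleration exponent (all illustrative values).
    """
    # Desired gap: minimum gap plus a speed-dependent dynamic part.
    dynamic = v * T + v * dv / (2.0 * math.sqrt(a_max * b))
    s_star = s0 + max(0.0, dynamic)
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


if __name__ == "__main__":
    print(idm_acceleration(v=20.0, gap=100.0, dv=0.0))  # comfortable gap
    print(idm_acceleration(v=20.0, gap=10.0, dv=0.0))   # tailgating, brakes
```

The parameters of this law (desired speed, headway, and so on) are the kind of hidden quantities that the internal-state formulation treats as unobserved.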
Internal States
MDP trained on normal drivers
MDP trained on all drivers
Omniscient
POMCPOW (Ours)
Simulation results
[Sunberg & Kochenderfer, ACC 2017, T-ITS Under Review]
State
Timestep
Accurate Observations
Goal: \(a=0\) at \(s=0\)
Optimal Policy
Localize
\(a=0\)
POMCP
POMCP-DPW
-18.46
-18.46
POMCP-DPW converges to QMDP
Proof Outline:
1. Observation space is continuous → observations unique w.p. 1.
2. (1) → One state particle in each belief, so each belief is merely an alias for that state
3. (2) → POMCP-DPW = MCTS-DPW applied to fully observable MDP + root belief state
4. Solving this MDP is equivalent to finding the QMDP solution → POMCP-DPW converges to QMDP
Sunberg, Z. N. and Kochenderfer, M. J. "Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces", ICAPS (2018)
\[\underset{\pi: \mathcal{B} \to \mathcal{A}}{\mathop{\text{maximize}}} \, V^\pi(b)\]
\[\underset{a \in \mathcal{A}}{\mathop{\text{maximize}}} \, \underset{s \sim{} b}{E}\Big[Q_{MDP}(s, a)\Big]\]
Same as full observability on the next step
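The QMDP rule above picks the action that maximizes the belief-weighted average of the fully observable MDP action values. A minimal sketch with a discrete belief follows; the state names, action names, and Q-values are hypothetical.

```python
def qmdp_action(belief, q_mdp, actions):
    """Return argmax_a of E_{s ~ b}[Q_MDP(s, a)].

    belief : dict mapping state -> probability
    q_mdp  : dict mapping (state, action) -> MDP action value
    actions: iterable of candidate actions
    """
    def expected_q(a):
        return sum(p * q_mdp[(s, a)] for s, p in belief.items())

    return max(actions, key=expected_q)


# Hypothetical MDP action values for two states and two actions.
Q_MDP = {
    ("s1", "a1"): 10.0, ("s2", "a1"): -20.0,
    ("s1", "a2"): 0.0,  ("s2", "a2"): 5.0,
}

if __name__ == "__main__":
    print(qmdp_action({"s1": 0.7, "s2": 0.3}, Q_MDP, ["a1", "a2"]))
    print(qmdp_action({"s1": 0.9, "s2": 0.1}, Q_MDP, ["a1", "a2"]))
```

Because the values come from the fully observable problem, the rule assigns no value to actions whose only benefit is gathering information.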
POMCP
POMCP-DPW
POMCPOW
-18.46
-18.46
51.85
[Sunberg ICAPS 2018, Thesis]
Our simplified algorithm is near-optimal
[Lim, Tomlin, & Sunberg, IJCAI 2020]
(GPS = Generalized Pattern Search)
$$a' = a + \eta \nabla_a Q(s, a)$$
[Mern, Sunberg, et al. AAAI 2021]
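The action update \(a' = a + \eta \nabla_a Q(s, a)\) is a gradient ascent step in action space. The sketch below applies it to a scalar action with a hypothetical quadratic value function and a finite-difference gradient; both are assumptions made for illustration, not the estimator used in the cited work.

```python
def grad_a(q, s, a, eps=1e-5):
    """Central finite-difference estimate of dQ/da at (s, a)."""
    return (q(s, a + eps) - q(s, a - eps)) / (2.0 * eps)


def action_step(q, s, a, eta):
    """One gradient ascent step on the action: a' = a + eta * dQ/da."""
    return a + eta * grad_a(q, s, a)


def q_toy(s, a):
    """Hypothetical value function, maximized at a = s."""
    return -(a - s) ** 2


if __name__ == "__main__":
    s, a = 2.0, 0.0
    for _ in range(50):
        a = action_step(q_toy, s, a, eta=0.25)
    print(a)  # approaches the maximizer a = s
```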
[Lim, Tomlin, & Sunberg ICAPS 2021 (Submitted)]
POMDPs.jl - An interface for defining and solving MDPs and POMDPs in Julia
Previous C++ framework: APPL
"At the moment, the three packages are independent. Maybe one day they will be merged in a single coherent framework."
Celeste Project
1.54 Petaflops