Zachary Sunberg
Postdoctoral Scholar
University of California
Waymo Image By Dllu - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=64517567
Safety and efficiency through planning with drivers' internal states
Pioneering online algorithm for POMDPs with continuous observation spaces
POMDP software with first-class speed and unprecedented flexibility
Driving
POMCPOW
POMDPs.jl
POMDPs
UAV Collision Avoidance
Space Situational Awareness
Dual Control
Autorotation
Successful flight test of automatic controller
Future
Autorotation
Driving
POMDPs
POMCPOW
POMDPs.jl
Future
"Expert System" Autorotation Controller
Example Controller: Flare
\[\alpha \equiv \frac{KE_{\text{available}} - KE_{\text{flare exit}}}{KE_{\text{flare entry}} - KE_{\text{flare exit}}}\]
\[TTLE = TTLE_{max} \times \alpha\]
\[\ddot{h}_{des} = -\frac{2}{TTI_F^2}h - \frac{2}{TTI_F}\dot{h}\]
[Sunberg, 2015]
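The flare law above can be rendered in a few lines of code. A minimal, hypothetical Python sketch (function names are illustrative, and \(TTLE_{max}\) and \(TTI_F\) are treated as separate given parameters, as in the equations; this is not the controller's actual implementation):

```python
def flare_fraction(ke_available, ke_flare_entry, ke_flare_exit):
    """alpha: fraction of usable kinetic energy remaining in the flare."""
    return (ke_available - ke_flare_exit) / (ke_flare_entry - ke_flare_exit)

def time_to_level_exit(alpha, ttle_max):
    """TTLE = TTLE_max * alpha."""
    return ttle_max * alpha

def desired_vertical_accel(h, h_dot, tti_f):
    """h_ddot_des = -(2 / TTI_F^2) h - (2 / TTI_F) h_dot."""
    return -2.0 / tti_f**2 * h - 2.0 / tti_f * h_dot
```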
Problems with this approach:
[Sunberg, 2015]
Autorotation
Driving
POMDPs
POMCPOW
POMDPs.jl
Future
Tweet by Nitin Gupta
29 April 2018
https://twitter.com/nitguptaa/status/990683818825736192
Two Objectives for Autonomy
Minimize resource use
(especially time)
Minimize the risk of harm to oneself and others
Safety often opposes Efficiency
Pareto Optimization
[Pareto frontier plot: Efficiency vs. Safety, with curves for Model \(M_1\), Algorithm \(A_1\) and Model \(M_2\), Algorithm \(A_2\); arrow indicates better performance]
$$\underset{\pi}{\mathop{\text{maximize}}} \, V^\pi = V^\pi_\text{E} + \lambda V^\pi_\text{S}$$
\(\lambda\): weight trading off safety \(V^\pi_\text{S}\) against efficiency \(V^\pi_\text{E}\)
Intelligent Driver Model (IDM)
[Treiber, et al., 2000] [Kesting, et al., 2007] [Kesting, et al., 2009]
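The IDM maps the ego speed, the gap to the leading car, and the closing rate to a car-following acceleration. A compact sketch of the standard model (the default parameter values here are common illustrative choices, not necessarily those used in the cited studies):

```python
import math

def idm_accel(v, v_lead, gap, v0=33.3, T=1.5, a_max=1.4, b=2.0, delta=4, s0=2.0):
    """Intelligent Driver Model acceleration [Treiber et al., 2000].
    v: ego speed (m/s), v_lead: leader speed (m/s), gap: bumper gap (m)."""
    dv = v - v_lead                                            # closing rate
    s_star = s0 + v * T + v * dv / (2 * math.sqrt(a_max * b))  # desired gap
    return a_max * (1 - (v / v0) ** delta - (s_star / gap) ** 2)
```

Internal parameters such as the desired speed \(v_0\) and time headway \(T\) are exactly the hidden driver states the planner must infer.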
Internal States
All drivers normal
No learning (MDP)
Omniscient
POMCPOW (Ours)
Simulation results
[Sunberg, 2017]
Marginal Distribution: Uniform
\(\rho=0\)
\(\rho=1\)
\(\rho=0.75\)
Internal parameter distributions
Conditional Distribution: Copula
Assume normal
No Learning (MDP)
Omniscient
Mean MPC
QMDP
POMCPOW (Ours)
[Sunberg, 2017]
Autorotation
Driving
POMDPs
POMCPOW
POMDPs.jl
Future
Types of Uncertainty
OUTCOME
MODEL
STATE
Markov Model
Markov Decision Process (MDP)
Solving MDPs - The Value Function
$$V^*(s) = \underset{a\in\mathcal{A}}{\max} \left\{R(s, a) + \gamma E\Big[V^*\left(s_{t+1}\right) \mid s_t=s, a_t=a\Big]\right\}$$
Involves all future time
Involves only \(t\) and \(t+1\)
$$\underset{\pi:\, \mathcal{S}\to\mathcal{A}}{\mathop{\text{maximize}}} \, V^\pi(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, \pi(s_t)) \bigm| s_0 = s \right]$$
$$Q(s,a) = R(s, a) + \gamma E\Big[V^* (s_{t+1}) \mid s_t = s, a_t=a\Big]$$
Value = expected sum of future rewards
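For a small discrete MDP, the fixed point above can be computed directly with value iteration. A minimal sketch (the tabular dict representation is an illustrative choice):

```python
def value_iteration(states, actions, R, T, gamma=0.95, tol=1e-9):
    """Iterate V(s) <- max_a { R(s,a) + gamma * E[V(s')] } to a fixed point.
    T[s][a] is a list of (probability, next_state) pairs."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v = max(R(s, a) + gamma * sum(p * V[sp] for p, sp in T[s][a])
                    for a in actions)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V
```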
Online Decision Process Tree Approaches
Time
Estimate \(Q(s, a)\) based on children
$$Q(s,a) = R(s, a) + \gamma E\Big[V^* (s_{t+1}) \mid s_t = s, a_t=a\Big]$$
\[V(s) = \max_a Q(s,a)\]
Partially Observable Markov Decision Process (POMDP)
[Light-dark problem illustration: state vs. timestep; observations are accurate only in the light region. Goal: \(a=0\) at \(s=0\). Optimal policy: localize in the light region, then return to \(s=0\).]
Environment
Belief Updater
Policy
\(b\)
\(a\)
\[b_t(s) = P\left(s_t = s \mid a_1, o_1 \ldots a_{t-1}, o_{t-1}\right)\]
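For a finite state space, this belief recursion is an exact Bayes filter. A small sketch (passing \(T\) and \(Z\) as callables is an assumed interface for illustration):

```python
def update_belief(b, a, o, T, Z):
    """Bayes filter: b'(s') ∝ Z(o | a, s') * sum_s T(s' | s, a) b(s).
    b is a dict mapping states to probabilities."""
    bp = {}
    for sp in b:
        bp[sp] = Z(a, sp, o) * sum(T(s, a, sp) * p for s, p in b.items())
    total = sum(bp.values())
    return {sp: w / total for sp, w in bp.items()}
```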
True State
\(s = 7\)
Observation \(o = -0.21\)
Environment
Belief Updater
Policy
\(a\)
\(b = \mathcal{N}(\hat{s}, \Sigma)\)
True State
\(s \in \mathbb{R}^n\)
Observation \(o \sim \mathcal{N}(C s, V)\)
\(s_{t+1} \sim \mathcal{N}(A s_t + B a_t, W)\)
\(\pi(b) = K \hat{s}\)
Kalman Filter
\(R(s, a) = - s^T Q s - a^T R a\)
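In this linear-Gaussian setting the belief updater is a Kalman filter. A scalar sketch of one predict-correct step (the general case uses the matrix form of the same formulas):

```python
def kalman_step(s_hat, sigma, a, o, A, B, C, W, V):
    """One scalar Kalman step for s' ~ N(A s + B a, W), o ~ N(C s, V).
    Returns the updated belief b = N(s_hat', sigma')."""
    # Predict through the dynamics
    s_pred = A * s_hat + B * a
    sig_pred = A * sigma * A + W
    # Correct with the observation
    K = sig_pred * C / (C * sig_pred * C + V)   # Kalman gain
    s_new = s_pred + K * (o - C * s_pred)
    sig_new = (1.0 - K * C) * sig_pred
    return s_new, sig_new
```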
1) ACAS
2) Orbital Object Tracking
3) Dual Control
4) Medical
ACAS X
Trusted UAV
Collision Avoidance
[Sunberg, 2016]
[Kochenderfer, 2011]
\(\mathcal{S}\): Information space for all objects
\(\mathcal{A}\): Which objects to measure
\(R\): negative entropy (reward for reducing uncertainty)
Approximately 20,000 objects larger than 10 cm in orbit
[Sunberg, 2016]
1) ACAS
2) Orbital Object Tracking
3) Dual Control
4) Medical
State \(x\) Parameters \(\theta\)
\(s = (x, \theta)\) \(o = x + v\)
POMDP solution automatically balances exploration and exploitation
[Slade, Sunberg, et al. 2017]
1) ACAS
2) Orbital Object Tracking
3) Dual Control
4) Medical
[Ayer 2012]
[Sun 2014]
Personalized Cancer Screening
Steerable Needle Guidance
Autorotation
Driving
POMDPs
POMCPOW
POMDPs.jl
Future
Environment
Belief Updater
Policy
\(o\)
\(b\)
\(a\)
[Ross, 2008] [Silver, 2010]
*(Partially Observable Monte Carlo Planning)
[ ] An infinite number of child nodes must be visited
[ ] Each node must be visited an infinite number of times
Solving continuous POMDPs - POMCP fails
POMCP
Double Progressive Widening (DPW): Gradually grow the tree by limiting the number of children to \(k N^\alpha\)
Necessary Conditions for Consistency
[Couëtoux, 2011]
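The widening criterion itself is one line; a sketch (the \(k\) and \(\alpha\) defaults are illustrative):

```python
def dpw_allows_new_child(num_children, num_visits, k=10.0, alpha=0.5):
    """Double progressive widening: a node visited N times may have at
    most k * N**alpha children; otherwise an existing child is revisited."""
    return num_children < k * num_visits ** alpha
```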
POMCP
POMCP-DPW
[Sunberg, 2018]
\[\underset{\pi: \mathcal{B} \to \mathcal{A}}{\mathop{\text{maximize}}} \, V^\pi(b)\]
\[\underset{a \in \mathcal{A}}{\mathop{\text{maximize}}} \, \underset{s \sim{} b}{E}\Big[Q_{MDP}(s, a)\Big]\]
Same as full observability on the next step
POMCP-DPW converges to QMDP
Proof Outline:
Observation space is continuous with a finite observation density → w.p. 1, no two simulated trajectories have matching observations
(1) → One state particle in each belief, so each belief is merely an alias for that state
(2) → POMCP-DPW = MCTS-DPW applied to fully observable MDP + root belief state
Solving this MDP is equivalent to finding the QMDP solution → POMCP-DPW converges to QMDP
[Sunberg, 2018]
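The QMDP policy that POMCP-DPW collapses to can be written in a few lines. A sketch over a discrete belief (the dict representation is illustrative):

```python
def qmdp_action(belief, actions, Q_mdp):
    """argmax_a E_{s~b}[Q_MDP(s, a)]: act as if the state becomes fully
    observable after the next step, so information gathering has no value."""
    return max(actions,
               key=lambda a: sum(p * Q_mdp(s, a) for s, p in belief.items()))
```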
POMCP-DPW
[ ] An infinite number of child nodes must be visited
[ ] Each node must be visited an infinite number of times
[ ] An infinite number of particles must be added to each belief node
Necessary Conditions for Consistency
Use \(Z\) to insert weighted particles
[Sunberg, 2018]
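The weighted-particle step can be sketched as follows: each simulated next state is appended to the observation node's belief with weight given by the observation density \(Z\). This is a fragment, not the full POMCPOW algorithm, and the two-list belief representation is an illustrative assumption:

```python
import random

def insert_weighted_particle(belief, s_next, o, Z):
    """Append the simulated state s' with weight Z(o | s'), so the belief
    at this observation node becomes a weighted particle set."""
    particles, weights = belief
    particles.append(s_next)
    weights.append(Z(o, s_next))

def sample_particle(belief, rng=random):
    """Draw a state from the weighted belief for the next simulation."""
    particles, weights = belief
    return rng.choices(particles, weights=weights, k=1)[0]
```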
POMCP
POMCP-DPW
POMCPOW
[Sunberg, 2018]
Ours
Suboptimal
State of the Art
Discretized
[Ye, 2017] [Sunberg, 2018]
Autorotation
Driving
POMDPs
POMCPOW
POMDPs.jl
Future
POMDPs.jl - An interface for defining and solving MDPs and POMDPs in Julia
[Egorov, Sunberg, et al., 2017]
Celeste Project
1.54 Petaflops
Explicit
Black Box
("Generative" in POMDP lit.)
\(s,a\)
\(s', o, r\)
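A black-box model only needs to sample \((s', o, r)\) given \((s, a)\); no explicit probability distributions are required. A hypothetical Python analogue of this interface (POMDPs.jl expresses the same idea in Julia; the light-dark-style dynamics here are purely illustrative):

```python
import random

class GenerativeModel:
    """Black-box ("generative") POMDP model: the solver only calls gen."""
    def gen(self, s, a, rng):
        sp = s + a                         # simple deterministic walk
        noise = abs(sp - 5) + 0.1          # observations accurate near s = 5
        o = sp + rng.gauss(0.0, noise)
        r = -abs(sp)                       # reward for staying near s = 0
        return sp, o, r
```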
Previous C++ framework: APPL
"At the moment, the three packages are independent. Maybe one day they will be merged in a single coherent framework."
[Egorov, Sunberg, et al., 2017]
Autorotation
Driving
POMDPs
POMCPOW
POMDPs.jl
Future
Deploying autonomous agents with confidence
Practical Safety Guarantees
Trusting Visual Sensors
Algorithms for Physical Problems
Physical Vehicles
Environment
Belief State
Convolutional Neural Network
Control System
Architecture for Safety Assurance
1. Continuous multi-dimensional action spaces
2. Data-driven models on modern parallel hardware
CPU Image By Eric Gaba, Wikimedia Commons user Sting, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=68125990
3. Better algorithms for existing POMDPs
Texas A&M (HUSL)
RC Car with assured visual sensing
Optimized Autorotation
Active Sensing
Project-Centric
1. Intro to Probabilistic Models
2. Markov Decision Processes
3. Reinforcement Learning
4. POMDPs
(More focus on online POMDP solutions than Stanford course)
The content of my research reflects my opinions and conclusions, and is not necessarily endorsed by my funding organizations.
POMDP Formulation
\(s=\left(x, y, \dot{x}, \left\{(x_c,y_c,\dot{x}_c,l_c,\theta_c)\right\}_{c=1}^{n}\right)\)
\(o=\left\{(x_c,y_c,\dot{x}_c,l_c)\right\}_{c=1}^{n}\)
\(a = (\ddot{x}, \dot{y})\), \(\ddot{x} \in \{0, \pm 1 \text{ m/s}^2\}\), \(\dot{y} \in \{0, \pm 0.67 \text{ m/s}\}\)
Ego physical state
Physical states of other cars
Internal states of other cars
Physical states of other cars
Efficiency
Safety
$$R(s, a, s') = \text{in\_goal}(s') - \lambda \left(\text{any\_hard\_brakes}(s, s') + \text{any\_too\_slow}(s')\right)$$
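Given indicator helpers for the three events, the reward is a one-liner. A sketch (the indicator functions are hypothetical stand-ins supplied by the caller):

```python
def reward(s, a, sp, in_goal, any_hard_brakes, any_too_slow, lam=1.0):
    """R(s, a, s') = in_goal(s')
                     - lambda * (any_hard_brakes(s, s') + any_too_slow(s'))."""
    return (float(in_goal(sp))
            - lam * (float(any_hard_brakes(s, sp)) + float(any_too_slow(sp))))
```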
[Sunberg, 2017]
"[The autonomous vehicle] performed perfectly, except when it had to merge onto I-395 South and swing across three lanes of traffic"
- Bloomberg
http://bloom.bg/1Qw8fjB
Monte Carlo Tree Search
Image by Dicksonlaw583 (CC 4.0)
Autorotation
Driving
POMDPs
POMCPOW
POMDPs.jl
Future
Environment
Belief Updater
Policy
\(o\)
\(b\)
\(a\)
\[b_t(s) = P\left(s_t = s \mid a_1, o_1 \ldots a_{t-1}, o_{t-1}\right)\]
Online Decision Process Tree Approaches
State Node
Action Node
(Estimate \(Q(s,a)\) here)