Zachary Sunberg, PhD
Assistant Professor
University of Colorado Boulder
Waymo image: By Dllu - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=64517567
Markov Model
Markov Decision Process (MDP)
Solving MDPs - The Value Function
Value = expected sum of future rewards
$$\underset{\pi:\, \mathcal{S}\to\mathcal{A}}{\mathop{\text{maximize}}} \, V^\pi(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, \pi(s_t)) \bigm| s_0 = s \right]$$
Involves all future time
$$V^*(s) = \underset{a\in\mathcal{A}}{\max} \left\{R(s, a) + \gamma E\Big[V^*\left(s_{t+1}\right) \mid s_t=s, a_t=a\Big]\right\}$$
Involves only \(t\) and \(t+1\)
$$Q(s,a) = R(s, a) + \gamma E\Big[V^* (s_{t+1}) \mid s_t = s, a_t=a\Big]$$
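The Bellman equation can be applied repeatedly as an update rule (value iteration). The sketch below is a minimal illustration under stated assumptions, not the method used in the talk: the two-state, two-action MDP, the dictionary-based model layout, and the names `value_iteration` and `q_value` are all hypothetical.

```python
def value_iteration(T, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate V(s) <- max_a { R(s,a) + gamma * E[V(s') | s, a] } to a fixed point.

    T[s][a] is a dict {next_state: probability}; R[s][a] is the immediate reward.
    """
    n_states = len(T)
    V = [0.0] * n_states

    def q_value(s, a, V):
        # Q(s,a) = R(s,a) + gamma * sum_s' T(s'|s,a) V(s')
        return R[s][a] + gamma * sum(p * V[sp] for sp, p in T[s][a].items())

    for _ in range(max_iter):
        V_new = [max(q_value(s, a, V) for a in range(len(T[s])))
                 for s in range(n_states)]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    Q = [[q_value(s, a, V) for a in range(len(T[s]))] for s in range(n_states)]
    return V, Q


# Hypothetical MDP: action 0 = stay, action 1 = switch to the other state.
T = [
    [{0: 1.0}, {1: 1.0}],  # transitions from state 0
    [{1: 1.0}, {0: 1.0}],  # transitions from state 1
]
R = [
    [0.0, 0.0],  # state 0 pays nothing
    [1.0, 0.0],  # staying in state 1 pays 1
]
V, Q = value_iteration(T, R, gamma=0.9)
# Greedy policy: the action with the largest Q in each state.
policy = [max(range(len(q)), key=q.__getitem__) for q in Q]
```

In this toy problem the greedy policy switches out of state 0 and then stays in state 1, which matches the closed-form values \(V^*(1) = 1/(1-\gamma)\) and \(V^*(0) = \gamma V^*(1)\).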
Online Decision Process Tree Approaches
Time
Estimate \(Q(s, a)\) based on children
$$Q(s,a) = R(s, a) + \gamma E\Big[V^* (s_{t+1}) \mid s_t = s, a_t=a\Big]$$
\[V(s) = \max_a Q(s,a)\]
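A depth-limited tree search estimates each \(Q(s, a)\) from sampled children and backs values up with \(V(s) = \max_a Q(s,a)\). The following is a rough sketch in the style of sparse sampling, assuming a hypothetical step function `gen(s, a, rng)` that returns `(sp, r)`; the chain environment and all names are illustrative only.

```python
import random


def estimate_q(gen, actions, s, a, depth, gamma, n_samples, rng):
    """Average r + gamma * V(s') over n_samples sampled children of (s, a)."""
    total = 0.0
    for _ in range(n_samples):
        sp, r = gen(s, a, rng)
        total += r + gamma * estimate_v(gen, actions, sp, depth - 1,
                                        gamma, n_samples, rng)
    return total / n_samples


def estimate_v(gen, actions, s, depth, gamma, n_samples, rng):
    """V(s) = max_a Q(s, a); leaves at depth 0 are valued at zero."""
    if depth == 0:
        return 0.0
    return max(estimate_q(gen, actions, s, a, depth, gamma, n_samples, rng)
               for a in actions)


# Hypothetical chain: integer states, reward 1 for stepping onto state 3.
# The toy model is deterministic, so every sampled child is identical.
ACTIONS = (-1, +1)


def gen_chain(s, a, rng):
    sp = s + a
    return sp, (1.0 if sp == 3 else 0.0)


rng = random.Random(0)
q_right = estimate_q(gen_chain, ACTIONS, 0, +1, 3, 0.9, 2, rng)
q_left = estimate_q(gen_chain, ACTIONS, 0, -1, 3, 0.9, 2, rng)
```

From state 0 the goal is exactly three steps to the right, so only `q_right` picks up the discounted reward \(\gamma^2\). The cost grows as (actions × samples) to the power of the depth, which is why online planners search only from the current state.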
Partially Observable Markov Decision Process (POMDP)
Environment
Belief Updater
Planner
\(o\)
\(b\)
\(a\)
Aggressive: 63%
Normal: 34%
Timid: 3%
\(x, y, v\)
Turn Left
POMDP Models
+
=
Optimization
Specification
POMDPs.jl - An interface for defining and solving MDPs and POMDPs in Julia
[Egorov, Sunberg, et al., 2017]
Celeste Project
1.54 Petaflops
Explicit
Black Box
("Generative" in the POMDP literature)
\(s,a\)
\(s', o, r\)
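The difference between an explicit model and a black-box model can be sketched as follows. In the explicit form the distributions are written out; in the black-box form the solver only sees a step function mapping \((s, a)\) to a sample of \((s', o, r)\). This is an illustrative Python sketch, not the POMDPs.jl interface; the two-state problem, its probabilities, and every function name here are assumptions.

```python
import random

STATES = ["left", "right"]


def other(s):
    return "right" if s == "left" else "left"


# Explicit model: distributions are available as probability tables.
def transition_probs(s, a):
    """T(s' | s, a): 'move' succeeds 80% of the time, 'stay' always stays."""
    if a == "stay":
        return {s: 1.0}
    return {other(s): 0.8, s: 0.2}


def observation_probs(a, sp):
    """Z(o | a, s'): a noisy sensor that reports the true state 85% of the time."""
    return {sp: 0.85, other(sp): 0.15}


def reward(s, a):
    return (1.0 if s == "right" else 0.0) - (0.1 if a == "move" else 0.0)


def sample(dist, rng):
    outcomes = list(dist)
    weights = [dist[k] for k in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


# Black-box (generative) model: only a sampler (s, a) -> (s', o, r) is exposed.
def gen(s, a, rng):
    sp = sample(transition_probs(s, a), rng)
    o = sample(observation_probs(a, sp), rng)
    return sp, o, reward(s, a)


rng = random.Random(0)
sp, o, r = gen("left", "stay", rng)
```

Any explicit model can be wrapped into a generative one this way, but not the reverse, so solvers that need only `gen` also work with simulators whose distributions are never written down.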
Previous C++ framework: APPL
"At the moment, the three packages are independent. Maybe one day they will be merged in a single coherent framework."
[Egorov, Sunberg, et al., 2017]
[Ross, 2008] [Silver, 2010]
POMCP (Partially Observable Monte Carlo Planning)
POMCP-DPW
POMCPOW
Joint Dynamics:
$$\dot{x} = f(t, x, u_1, \ldots, u_N)$$
Cost for player \(i\):
$$J_i = \int_0^T{g_i(t, x, u_1, \ldots, u_N) dt}$$
Strategy of player \(i\):
$$u_i(t) = \gamma_i(t, x)$$
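To make the three ingredients concrete (joint dynamics \(f\), running costs \(g_i\), feedback strategies \(\gamma_i\)), the sketch below rolls out a game with forward Euler integration and accumulates each player's cost. It only evaluates given strategies; it does not solve for an equilibrium. The scalar two-player example and all names are hypothetical.

```python
def simulate_game(f, running_costs, strategies, x0, T, dt=0.01):
    """Integrate x' = f(t, x, u_1..u_N) and J_i = integral of g_i dt with forward Euler."""
    n_steps = int(round(T / dt))
    x, t = x0, 0.0
    J = [0.0] * len(strategies)
    for _ in range(n_steps):
        # Each player applies its feedback strategy u_i = gamma_i(t, x).
        u = [gamma_i(t, x) for gamma_i in strategies]
        for i, g_i in enumerate(running_costs):
            J[i] += g_i(t, x, *u) * dt
        x = x + f(t, x, *u) * dt
        t += dt
    return x, J


# Hypothetical example: player 1 wants x near +1, player 2 wants x near -1.
def f(t, x, u1, u2):
    return u1 + u2


running_costs = [
    lambda t, x, u1, u2: (x - 1.0) ** 2 + u1 ** 2,
    lambda t, x, u1, u2: (x + 1.0) ** 2 + u2 ** 2,
]
strategies = [
    lambda t, x: -(x - 1.0),  # proportional pull toward +1
    lambda t, x: -(x + 1.0),  # proportional pull toward -1
]
x_final, J = simulate_game(f, running_costs, strategies, x0=1.0, T=5.0)
```

With these symmetric gains the closed loop is \(\dot{x} = -2x\), so the state settles midway between the two targets. Player 2 pays more because the state starts at player 1's target.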
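# Sample (sp, r) from the generative model and keep the pullback for gradients w.r.t. s and a (pullback is presumably Zygote.jl's)
(sp, r), back = pullback((s, a) -> @gen(:sp, :r)(m, s, a, rng), s, a)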
The content of my research reflects my opinions and conclusions, and is not necessarily endorsed by my funding organizations.
Environment
Belief Updater
Policy
\(o\)
\(b\)
\(a\)
\[b_t(s) = P\left(s_t = s \mid a_1, o_1, \ldots, a_{t-1}, o_{t-1}\right)\]
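For a finite state space, the belief can be maintained recursively with a discrete Bayes filter: predict through the transition model, weight by the observation likelihood, and normalize. The sketch below assumes the observation depends on the action and the new state. The driver-type example, its likelihood numbers, and all function names are hypothetical.

```python
def update_belief(b, a, o, T, Z):
    """One Bayes filter step: b'(s') is proportional to Z(o | a, s') * sum_s T(s' | s, a) b(s).

    b: {state: probability}; T(s, a) -> {s': probability}; Z(a, sp) -> {o: probability}.
    """
    # Prediction: push the current belief through the transition model.
    predicted = {}
    for s, p in b.items():
        for sp, pt in T(s, a).items():
            predicted[sp] = predicted.get(sp, 0.0) + p * pt
    # Correction: weight each state by the likelihood of the observation.
    unnormalized = {sp: Z(a, sp).get(o, 0.0) * p for sp, p in predicted.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {sp: p / total for sp, p in unnormalized.items()}


# Hypothetical example: the hidden state is another driver's (static) type.
def T_static(s, a):
    return {s: 1.0}  # driver type does not change


CUT_IN_LIKELIHOOD = {"aggressive": 0.6, "normal": 0.3, "timid": 0.1}


def Z_driver(a, sp):
    p = CUT_IN_LIKELIHOOD[sp]
    return {"cuts_in": p, "yields": 1.0 - p}


b0 = {"aggressive": 1 / 3, "normal": 1 / 3, "timid": 1 / 3}
b1 = update_belief(b0, "keep_lane", "cuts_in", T_static, Z_driver)
b2 = update_belief(b1, "keep_lane", "cuts_in", T_static, Z_driver)
```

Each "cuts_in" observation shifts probability toward the aggressive type. Exact updates like this scale with the square of the number of states, so large or continuous problems typically fall back on approximations such as particle filters.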
Types of Uncertainty
ALEATORY
MODEL (Epistemic, Static)
STATE (Epistemic, Dynamic)