Breaking the Curse of Dimensionality in Decision-Making for Autonomous Systems
Assistant Professor Zachary Sunberg
University of Colorado Boulder
September 6th, 2024
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11333987/pasted-from-clipboard.png)
Autonomous Decision and Control Laboratory
Algorithmic Contributions
- Scalable algorithms for partially observable Markov decision processes (POMDPs)
- Motion planning with safety guarantees
- Game-theoretic algorithms

Theoretical Contributions
- Particle POMDP approximation bounds

Applications
- Space Domain Awareness
- Autonomous Driving
- Autonomous Aerial Scientific Missions
- Search and Rescue
- Space Exploration
- Ecology

Open Source Software
- POMDPs.jl Julia ecosystem
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10272214/pasted-from-clipboard.png)
PI: Prof. Zachary Sunberg
PhD Students
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458281/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458283/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458284/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458406/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9421050/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10229532/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10273976/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11106504/pasted-from-clipboard.png)
Postdoc
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964498/orbital.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11106615/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964509/aspen-and-hex.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8462117/rrbt_considerbelief.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10275503/hippo.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964516/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11333987/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571767/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571768/pasted-from-clipboard.png)
The ADCL creates autonomy that is safe and efficient despite uncertainty
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9348773/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10275503/hippo.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8462117/rrbt_considerbelief.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964498/orbital.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964509/aspen-and-hex.jpg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964516/pasted-from-clipboard.png)
Two Objectives for Autonomy
- EFFICIENCY: Minimize resource use (especially time)
- SAFETY: Minimize the risk of harm to oneself and others

Safety often opposes Efficiency.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458272/pasted-from-clipboard.png)
Example 1: Autonomous Driving
Tweet by Nitin Gupta, 29 April 2018: https://twitter.com/nitguptaa/status/990683818825736192
Example 2: Tornadoes
Video: Eric Frew
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332212/image2.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332213/image13.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332219/image17.png)
Example 3: Search and Rescue
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11566424/image10.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11566425/image19.jpg)
What do they have in common?
Driving: what are the other road users going to do?
Tornado Forecasting: what is going on in the storm?
Search and Rescue: where is the lost person?
All are sequential decision-making problems with uncertainty!
All can be modeled as POMDPs with very large state and observation spaces.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332220/image17.png)
Outline
- The Promise and Curse of POMDPs
- Breaking the Curse
- Applications
- Multiple Agents
Part I: The Promise and Curse of POMDPs
Types of Uncertainty
- Aleatory
- Epistemic (Static)
- Epistemic (Dynamic)
- Interaction

![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8834668/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9420979/pasted-from-clipboard.png)

Each model class captures one more type of uncertainty: MDP handles aleatory uncertainty; RL adds static epistemic; POMDP adds dynamic epistemic; a Game adds interaction.
Markov Decision Process (MDP)
- \(\mathcal{S}\) - State space
- \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
- \(\mathcal{A}\) - Action space
- \(R:\mathcal{S}\times \mathcal{A} \to \mathbb{R}\) - Reward
Aleatory
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332220/image17.png)
\([x, y, z,\;\; \phi, \theta, \psi,\;\; u, v, w,\;\; p,q,r]\)
\(\mathcal{S} = \mathbb{R}^{12}\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332212/image2.jpeg)
\(\mathcal{S} = \mathbb{R}^{12} \times \mathbb{R}^\infty\)
\[\underset{\pi:\, \mathcal{S} \to \mathcal{A}}{\text{maximize}} \quad \text{E}\left[ \sum_{t=0}^\infty R(s_t, a_t) \right]\]
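For a fixed policy, the objective above can be estimated by simple Monte Carlo rollouts. A minimal sketch (illustrative, not from the slides): `step(s, a, rng) -> (sp, r)` and `policy(s) -> a` are hypothetical stand-ins for the model \((T, R)\) and \(\pi\), and a finite horizon stands in for the infinite sum.

```julia
# Estimate E[Σ R(s_t, a_t)] for a fixed policy by rollout
function rollout_return(s0, policy, step, horizon, rng)
    s, total = s0, 0.0
    for _ in 1:horizon
        a = policy(s)
        s, r = step(s, a, rng)   # sample the next state and reward
        total += r
    end
    return total
end
# Averaging rollout_return over many runs approximates the expectation.
```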
Reinforcement Learning
- \(\mathcal{S}\) - State space
- \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
- \(\mathcal{A}\) - Action space
- \(R:\mathcal{S}\times \mathcal{A} \to \mathbb{R}\) - Reward
Aleatory
Epistemic (Static)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332220/image17.png)
\([x, y, z,\;\; \phi, \theta, \psi,\;\; u, v, w,\;\; p,q,r]\)
\(\mathcal{S} = \mathbb{R}^{12}\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332212/image2.jpeg)
\(\mathcal{S} = \mathbb{R}^{12} \times \mathbb{R}^\infty\)
Partially Observable Markov Decision Process (POMDP)
- \(\mathcal{S}\) - State space
- \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
- \(\mathcal{A}\) - Action space
- \(R:\mathcal{S}\times \mathcal{A} \to \mathbb{R}\) - Reward
- \(\mathcal{O}\) - Observation space
- \(Z:\mathcal{S} \times \mathcal{A}\times \mathcal{S} \times \mathcal{O} \to \mathbb{R}\) - Observation probability distribution
Aleatory
Epistemic (Static)
Epistemic (Dynamic)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332220/image17.png)
\([x, y, z,\;\; \phi, \theta, \psi,\;\; u, v, w,\;\; p,q,r]\)
\(\mathcal{S} = \mathbb{R}^{12}\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332212/image2.jpeg)
\(\mathcal{S} = \mathbb{R}^{12} \times \mathbb{R}^\infty\)
POMDP Example: Light-Dark
(Figure: state vs. timestep. Observations are accurate only in the "light" region. Goal: take \(a=0\) at \(s=0\). The optimal policy first travels to the light region to localize, then returns to \(s=0\) and takes \(a=0\).)
Solving a POMDP
(Diagram: the Environment emits an observation \(o = -0.21\) while the true state \(s = 7\) stays hidden; the Belief Updater turns the action-observation stream into a belief \(b\); the Planner computes \(Q(b, a)\) and returns an action, e.g. \(a = +10\).)
\[b_t(s) = P\left(s_t = s \mid b_0, a_0, o_1, \ldots, a_{t-1}, o_{t}\right) = P\left(s_t = s \mid b_{t-1}, a_{t-1}, o_{t}\right)\]
Each exact belief update costs \(O(|\mathcal{S}|^2)\).
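A minimal sketch of this update for a discrete problem (illustrative; the arrays `T` and `Z` are hypothetical tabular versions of the transition and observation distributions, not from the slides). The inner sum over previous states is what makes each update \(O(|\mathcal{S}|^2)\):

```julia
# Discrete Bayesian belief update.
# T[sp, a, s] = P(sp | s, a), Z[o, a, sp] = P(o | a, sp); b is a probability
# vector over states.
function update_belief(b::Vector{Float64}, a::Int, o::Int,
                       T::Array{Float64,3}, Z::Array{Float64,3})
    n = length(b)
    bp = zeros(n)
    for sp in 1:n
        # sum over previous states: the O(|S|^2) inner loop
        pred = sum(T[sp, a, s] * b[s] for s in 1:n)
        bp[sp] = Z[o, a, sp] * pred
    end
    return bp ./ sum(bp)   # normalize by P(o | b, a)
end
```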
Online Tree Search in MDPs
(Figure: the search tree is expanded forward in time; \(Q(s, a)\) at each node is estimated from its children.)
Bayesian Belief Updates
(Diagram: Environment → observation \(o = -0.21\) → Belief Updater → belief \(b\) → Policy/Planner → action \(a\); the true state \(s = 7\) is hidden.)
\[b_t(s) = P\left(s_t = s \mid b_0, a_0, o_1, \ldots, a_{t-1}, o_{t}\right) = P\left(s_t = s \mid b_{t-1}, a_{t-1}, o_{t}\right)\]
Each update costs \(O(|\mathcal{S}|^2)\).
Curse of History in POMDPs
(Diagram: Environment → observation \(o = -0.21\) → Policy/Planner → action \(a\); the true state \(s = 7\) is hidden.)
Optimal planners need to consider the entire history:
\[h_t = (b_0, a_0, o_1, a_1, o_2, \ldots, a_{t-1}, o_{t})\]
A POMDP is an MDP on the Belief Space
POMDP \((S, A, T, R, O, Z)\) is equivalent to MDP \((S', A', T', R')\)
- \(S' = \Delta(S)\)
- \(A' = A\)
- \(T'\) defined by belief updates (\(T\) and \(Z\))
- \(R'(b, a) = \underset{s \sim b}{E}[R(s, a)]\)
One new continuous state dimension for each state in \(S\)!
Why are POMDPs difficult?
- Curse of History
- Curse of dimensionality
- State space
- Observation space
- Action space
Tree size: \(O\left(\left(|A||O|\right)^D\right)\)
The POMDP decision problem is PSPACE-complete.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10961923/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10961924/pasted-from-clipboard.png)
Curse of Dimensionality
\(d\) dimensions, \(k\) segments \(\,\rightarrow \, |S| = k^d\)
(Discretize each dimension into \(k = 5\) segments.)
- 1 dimension, e.g. \(s = x \in S = \{1,2,3,4,5\}\): \(|S| = 5\)
- 2 dimensions, e.g. \(s = (x,y) \in S = \{1,2,3,4,5\}^2\): \(|S| = 25\)
- 3 dimensions, e.g. \(s = (x,y,x_h) \in S = \{1,2,3,4,5\}^3\): \(|S| = 125\)
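The growth is easy to see numerically. A tiny illustration of \(|S| = k^d\) (the 12 here matches the aircraft state dimension from the earlier slide):

```julia
# |S| = k^d grows exponentially in the number of dimensions d
k = 5
for d in 1:12
    println("d = $d  =>  |S| = ", big(k)^d)
end
```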
Part II: Breaking the Curse
Integration
Find \(\underset{s\sim b}{E}[f(s)] = \sum_{s \in S} f(s) b(s)\). The exact sum over \(S\) suffers from the curse of dimensionality.
Monte Carlo Integration
\[Q_N \equiv \frac{1}{N} \sum_{i=1}^N f(s_i), \qquad s_i \sim b \text{ i.i.d.}\]
\[\text{Var}(Q_N) = \text{Var}\left(\frac{1}{N} \sum_{i=1}^N f(s_i)\right) = \frac{1}{N^2} \sum_{i=1}^N\text{Var}\left(f(s_i)\right) = \frac{1}{N} \text{Var}\left(f(s_i)\right) \quad \text{(Bienaymé)}\]
\[P(|Q_N - E[f(s_i)]| \geq \epsilon) \leq \frac{\text{Var}(f(s_i))}{N \epsilon^2} \quad \text{(Chebyshev)}\]
The error bound depends only on \(\text{Var}(f)\) and \(N\), not on the dimension of the state space.
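A quick numerical check of this dimension-independence (an illustrative sketch: \(b\) is taken to be a \(d\)-dimensional standard normal and \(f(s) = \sin(\sum_j s_j)\), chosen so that \(\text{Var}(f)\) stays bounded as \(d\) grows; neither choice comes from the slides):

```julia
using Statistics, Random

# Monte Carlo estimate of E[f(s)] for s ~ b with N samples
f(s) = sin(sum(s))
mc_estimate(d, N, rng) = mean(f(randn(rng, d)) for _ in 1:N)

rng = Random.MersenneTwister(1)
for d in (1, 10, 100)
    # the spread of Q_N stays roughly constant as the dimension grows
    ests = [mc_estimate(d, 1000, rng) for _ in 1:200]
    println("d = $d:  std of Q_N ≈ ", round(std(ests), digits=4))
end
```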
Particle Filter POMDP Approximation
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10961018/just-p-to-pb-mdp.png)
\[b(s) \approx \sum_{i=1}^N w_i \, \delta_{s_i}(s)\]
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964375/particle-viz_30.svg.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571732/purple_pomcpow.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571733/planning.png)
[Sunberg and Kochenderfer, ICAPS 2018, T-ITS 2022]
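A sketch of the weighted-particle update behind this approximation (hypothetical helpers: `transition_sample` plays the role of the generative model for \(T\), and `obs_density` the explicit density for \(Z\) from the assumptions slide below):

```julia
# A particle belief: states with importance weights
struct ParticleBelief{S}
    particles::Vector{S}
    weights::Vector{Float64}
end

function particle_update(b::ParticleBelief, a, o, transition_sample, obs_density, rng)
    sps = [transition_sample(s, a, rng) for s in b.particles]   # propagate through T
    ws  = b.weights .* [obs_density(o, a, sp) for sp in sps]    # reweight by Z
    return ParticleBelief(sps, ws ./ sum(ws))                   # normalize
end
```

Resampling steps, as in a full bootstrap filter, are omitted for brevity.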
How do we prove convergence?
POMDP Assumptions for Proofs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10953920/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10953922/pasted-from-clipboard.png)
- Continuous \(S\), \(O\); discrete \(A\)
- No Dirac-delta observation densities
- Bounded reward
- Generative model for \(T\); explicit model for \(Z\)
- Finite horizon
- Only reasonable beliefs
Sparse Sampling-\(\omega\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10953909/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964375/particle-viz_30.svg.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10961018/just-p-to-pb-mdp.png)
Key 1: Self-Normalized Infinite Rényi Divergence Concentration
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10953895/pasted-from-clipboard.png)
\(\mathcal{P}\): state distribution conditioned on observations (belief)
\(\mathcal{Q}\): marginal state distribution (proposal)
Key 2: Sparse Sampling
- Expand for all actions (\(\left|\mathcal{A}\right| = 2\) in this case)
- Instead of expanding all \(\left|\mathcal{S}\right|\) next states, sample only \(C = 3\) states per action
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964252/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964255/pasted-from-clipboard.png)
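For intuition, here is the classical sparse-sampling recursion on a generative MDP model (a sketch only; SS-\(\omega\) applies the same idea to weighted particle beliefs). `step(s, a, rng)` is a hypothetical generative model returning `(sp, r)`:

```julia
# Depth-d sparse sampling: expand every action, but sample only C next states
function ss_value(s, d, actions, step, C, gamma, rng)
    d == 0 && return 0.0
    return maximum(ss_q(s, a, d, actions, step, C, gamma, rng) for a in actions)
end

function ss_q(s, a, d, actions, step, C, gamma, rng)
    total = 0.0
    for _ in 1:C
        sp, r = step(s, a, rng)                                  # one sampled next state
        total += r + gamma * ss_value(sp, d - 1, actions, step, C, gamma, rng)
    end
    return total / C                                             # Monte Carlo average
end
```

The number of model calls is \(O((|\mathcal{A}|C)^D)\), with no dependence on \(|\mathcal{S}|\).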
SS-\(\omega\) is close to Belief MDP
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10961033/just-p-to-ssw.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10961103/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10961105/pasted-from-clipboard.png)
SS-\(\omega\) close to Particle Belief MDP (in terms of Q)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10961084/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10961088/pasted-from-clipboard.png)
PF Approximation Accuracy
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9903364/pomdp-pb-mdp-2.png)
For any \(\epsilon>0\) and \(\delta>0\), if \(C\) (the number of particles) is high enough,
\[|Q_{\mathbf{P}}^*(b,a) - Q_{\mathbf{M}_{\mathbf{P}}}^*(\bar{b},a)| \leq \epsilon \quad \text{w.p. } 1-\delta\]
[Lim, Becker, Kochenderfer, Tomlin, & Sunberg, JAIR 2023]
No direct dependence on \(|\mathcal{S}|\) or \(|\mathcal{O}|\)!
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964414/pasted-from-clipboard.png)
Particle belief planning suboptimality
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10953965/pasted-from-clipboard.png)
The required \(C\) is too large for any direct safety guarantees, but in practice the approach works extremely well for improving efficiency.
[Lim, Becker, Kochenderfer, Tomlin, & Sunberg, JAIR 2023]
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964414/pasted-from-clipboard.png)
Why are POMDPs difficult?
- Curse of History
- Curse of dimensionality
- State space
- Observation space
- Action space
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8047634/pasted-from-clipboard.png)
Tree size: \(O\left(\left(|A|C\right)^D\right)\)
Solve a simplified surrogate problem to obtain a policy deep in the tree
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964414/pasted-from-clipboard.png)
[Lim, Tomlin, and Sunberg, 2021]
Easy MDP to POMDP Extension
Part III: Applications
Example 1: Autonomous Driving
POMDP Formulation
\(s=\left(x, y, \dot{x}, \left\{(x_c,y_c,\dot{x}_c,l_c,\theta_c)\right\}_{c=1}^{n}\right)\): ego external state, plus the external and internal states of the other cars
\(o=\left\{(x_c,y_c,\dot{x}_c,l_c)\right\}_{c=1}^{n}\): external states of the other cars only
\(a = (\ddot{x}, \dot{y})\), \(\ddot{x} \in \{0, \pm 1 \text{ m/s}^2\}\), \(\dot{y} \in \{0, \pm 0.67 \text{ m/s}\}\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/4897150/scene.png)
- Actions shielded (based only on external states) so they can never cause crashes
- Braking action always available
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/4897993/highway_planning.png)
Simulation results (safety vs. efficiency): MDP trained on normal drivers, MDP trained on all drivers, Omniscient, and POMCPOW (ours).
[Sunberg & Kochenderfer, T-ITS 2023]
Navigation among Pedestrians
[Gupta, Hayes, & Sunberg, AAMAS 2022]
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571757/baseline-opt.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571758/hjb-opt.gif)
Previous solution: 1-D POMDP (92s avg)
Our solution (65s avg)
State:
- Vehicle physical state
- Human physical state
- Human intention
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9421050/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8832042/resusing_old_hybrid_astar_path_1d_action_space_speed_pomdp_planner_run.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8832043/safe_planner2.gif)
Conventional 1DOF POMDP
Multi-DOF POMDP
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8832045/pasted-from-clipboard.png)
Pedestrian Navigation
[Gupta, Hayes, & Sunberg, AAMAS 2021]
Meteorology
- State: (physical state of aircraft, which forecast is the truth)
- Action: (flight direction, drifter deploy)
- Reward: Terminal reward for correct weather prediction
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332252/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11572017/MART_Weather_Forecast__3_.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332246/exp_7_mcts.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9421050/pasted-from-clipboard.png)
Example 2: Tornado Prediction
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332246/exp_7_mcts.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332250/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332251/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332252/pasted-from-clipboard.png)
Drone Search and Rescue
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332293/image40.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332295/image41.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571901/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571909/pasted-from-clipboard.png)
State:
- Location of Drone
- Location of Human
Baseline
Our POMDP Planner
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458281/pasted-from-clipboard.png)
[Ray, Laouar, Sunberg, & Ahmed, ICRA 2023]
Drone Search and Rescue
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332283/image15.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332287/image8.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332288/image44.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332289/image45.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332291/image43.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458281/pasted-from-clipboard.png)
[Ray, Laouar, Sunberg, & Ahmed, ICRA 2023]
Space Domain Awareness
(Result for simplified dynamical system)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571994/pasted-from-clipboard.png)
State:
- Position, velocity of object-of-interest
- Anomalies: navigation failure, suspicious maneuver, thruster failure, etc.
Innovation: Large language models allow analysts to quickly specify anomaly hypotheses
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11572014/plan-opt.gif)
Catalog Maintenance Plan
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458283/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11106504/pasted-from-clipboard.png)
Practical Safety Guarantees
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571930/science_09_probability_short.gif)
Three Contributions
- Recursive constraints (solves "stochastic self-destruction")
- Undiscounted POMDP solutions for estimating probability
- Much faster motion planning with Gaussian uncertainty
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571980/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11572003/environment2_trajectory_side.png)
State:
- Position of rover
- Environment state: e.g. traversability
- Internal status: e.g. battery, component health
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458406/pasted-from-clipboard.png)
[Ho et al., UAI 24], [Ho, Feather, Rossi, Sunberg, & Lahijanian, UAI 24], [Ho, Sunberg, & Lahijanian, ICRA 22]
Explainability: Reward Reconciliation
![](https://s3.amazonaws.com/media-p.slid.es/uploads/2004519/images/10492737/pasted-from-clipboard.png)
Calculate Outcomes
Calculate Weight Update
\( \frac{\epsilon - \alpha_a \cdot \Delta \mu_{h-a}}{\Delta\mu_{h-j} \cdot \Delta \mu_{h-a}} \Delta\mu_{h-j}\)
Estimate Weight with Update
\( \alpha[2]\)
\( \alpha[1]\)
\( a_{h}\) - optimal
\( a_{a}\) - optimal
\( \alpha_{h}\)
\( \alpha_{a}\)
\( R(s,a) = \alpha \cdot \boldsymbol{\phi}(s,a)\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11572006/purple_pomcpow.png)
\(a_a\) outcomes: \(\mu_a\)
\(a_h\) outcomes: \(\mu_h\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11572006/purple_pomcpow.png)
\(a_a\)
\(a_h\)
\( \hat{\alpha}_{h}\)
\(a_h\)
\(a_a\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458284/pasted-from-clipboard.png)
[Kraske, Saksena, Buczak, & Sunberg, ICAA 2024]
Part IV: Multiple Agents
Partially Observable Markov Decision Process (POMDP)
- \(\mathcal{S}\) - State space
- \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
- \(\mathcal{A}\) - Action space
- \(R:\mathcal{S}\times \mathcal{A} \to \mathbb{R}\) - Reward
- \(\mathcal{O}\) - Observation space
- \(Z:\mathcal{S} \times \mathcal{A}\times \mathcal{S} \times \mathcal{O} \to \mathbb{R}\) - Observation probability distribution
Aleatory
Epistemic (Static)
Epistemic (Dynamic)
Partially Observable Stochastic Game (POSG)
Aleatory
Epistemic (Static)
Epistemic (Dynamic)
Interaction
- \(\mathcal{S}\) - State space
- \(T(s' \mid s, \bm{a})\) - Transition probability distribution
- \(\mathcal{A}^i, \, i \in 1..k\) - Action spaces
- \(R^i(s, \bm{a})\) - Reward function (cooperative, opposing, or somewhere in between)
- \(\mathcal{O}^i, \, i \in 1..k\) - Observation spaces
- \(Z(o^i \mid \bm{a}, s')\) - Observation probability distributions
Game Theory
Nash Equilibrium: All players play a best response.
Optimization Problem
(MDP or POMDP)
\(\text{maximize} \quad f(x)\)
Game
Player 1: \(U_1 (a_1, a_2)\)
Player 2: \(U_2 (a_1, a_2)\)
Example: Airborne Collision Avoidance

|                | Player 2: Up       | Player 2: Down     |
|----------------|--------------------|--------------------|
| Player 1: Up   | -6, -6 (collision) | -1, 1              |
| Player 1: Down | 1, -1              | -4, -4 (collision) |
Mixed Strategies
Strategy (\(\pi_i\)): a probability distribution over actions
Exploitability (zero sum): \[\sum_i \max_{\pi_i'} U_i(\pi_i', \pi_{-i})\]
Nash Equilibrium \(\iff\) Zero Exploitability
This game has no pure Nash equilibrium! Instead, there is a mixed Nash where each player plays up or down with 50% probability. If either player plays up or down more than 50% of the time, their strategy can be exploited.
|                | Player 2: Up       | Player 2: Down     |
|----------------|--------------------|--------------------|
| Player 1: Up   | -1, 1 (collision)  | 1, -1              |
| Player 1: Down | 1, -1              | -1, 1 (collision)  |
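For this 2×2 game, exploitability can be computed directly. A minimal sketch (the matrix `U1` holds Player 1's payoffs from the table above; the game is zero-sum, so \(U_2 = -U_1\)):

```julia
# Player 1 payoffs: rows are P1's actions (up, down), columns P2's
U1 = [-1.0  1.0;
       1.0 -1.0]

# exploitability of a mixed-strategy pair; zero exactly at a Nash equilibrium
function exploitability(U, pi1, pi2)
    br1 = maximum(U * pi2)       # player 1's best-response value
    br2 = maximum(-(U' * pi1))   # player 2's best-response value
    return br1 + br2
end

println(exploitability(U1, [0.5, 0.5], [0.5, 0.5]))  # 0.0 : the mixed Nash
println(exploitability(U1, [0.7, 0.3], [0.5, 0.5]))  # 0.4 : P1 can be exploited
```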
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9859563/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9859565/pasted-from-clipboard.png)
Space Domain Awareness Games
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9859696/pasted-from-clipboard.png)
POSG Example: Missile Defense
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9706473/pasted-from-clipboard.png)
POMDP Solution:
- Assume a distribution for the missile's actions
- Update belief according to this distribution
- Use a POMDP planner to find the best defensive action

But a shrewd missile operator will use different actions, invalidating our belief.

Nash equilibrium: all players play a best response to the other players.
- May include stochastic behavior (bluffing)
- Fundamentally impossible for POMDP solvers to compute
Defending against Maneuverable Hypersonic Weapons: the Challenge
Ballistic
Maneuverable Hypersonic
- Sense
- Estimate
- Intercept
Every maneuver involves tradeoffs
- Energy
- Targets
- Intentions
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9706472/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9706473/pasted-from-clipboard.png)
Simplified SDA Game
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1539327/images/8994832/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/1539327/images/8953973/pasted-from-clipboard.png)
\(1, 2, \ldots, N\)
[Becker & Sunberg, AMOS 2022]
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9859702/deterministic-strategy.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458283/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9856540/pasted-from-clipboard.png)
[Becker & Sunberg, AMOS 2022]
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458283/pasted-from-clipboard.png)
Counterfactual Regret Minimization Training
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9859691/mode-change-game.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9348773/pasted-from-clipboard.png)
[Becker & Sunberg, AMOS 2022]
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458283/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9856540/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9856546/pasted-from-clipboard.png)
[Becker & Sunberg, AMOS 2022]
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458283/pasted-from-clipboard.png)
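The regret-matching update at the heart of counterfactual regret minimization can be sketched in a few lines (illustrative only, not the lab's implementation). Applied to the 2×2 collision game from earlier, the average strategies converge to the 50/50 mixed Nash:

```julia
using LinearAlgebra

U = [-1.0 1.0; 1.0 -1.0]                 # row-player payoffs; zero-sum

# normalize positive regrets into a strategy (uniform if none are positive)
positive_part(r) = (p = max.(r, 0.0); s = sum(p); s > 0 ? p ./ s : fill(1/length(r), length(r)))

function regret_matching(U, iters)
    r1, r2 = zeros(2), zeros(2)          # cumulative regrets
    s1, s2 = zeros(2), zeros(2)          # cumulative strategy sums
    for _ in 1:iters
        p1, p2 = positive_part(r1), positive_part(r2)
        s1 .+= p1; s2 .+= p2
        u1 = U * p2                      # value of each P1 action vs. p2
        u2 = -(U' * p1)                  # value of each P2 action vs. p1
        r1 .+= u1 .- dot(p1, u1)         # accumulate regret vs. current strategy
        r2 .+= u2 .- dot(p2, u2)
    end
    return s1 ./ sum(s1), s2 ./ sum(s2)  # average strategies
end

println(regret_matching(U, 10_000))      # ≈ ([0.5, 0.5], [0.5, 0.5])
```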
Finding a Nash Equilibrium: Poker
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9052289/pasted-from-clipboard.png)
Image: Russell & Norvig, Artificial Intelligence: A Modern Approach
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11572023/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11572025/pasted-from-clipboard.png)
(Game tree branches: P1 dealt A or K; P2 dealt A or K.)
Tree Search Algorithms for POSGs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11106525/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11563636/joint-distribution-tree.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11563637/tag-strategy.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11563639/strategy_grid.png)
[Becker & Sunberg, NeurIPS 2024 (Under Review)]
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458283/pasted-from-clipboard.png)
Tree Search Algorithms for POSGs
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11563695/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11563694/pasted-from-clipboard.png)
[Becker & Sunberg, NeurIPS 2024 (Under Review)]
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458283/pasted-from-clipboard.png)
Thank You!
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964414/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964415/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10964416/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458281/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458284/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332296/image2.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332298/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332299/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11332304/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11571707/pasted-from-clipboard.png)
Funding organizations (all opinions are my own):
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8458406/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11106504/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11573282/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11573291/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11573298/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/11573307/pasted-from-clipboard.png)
VADeR
Part V: Open Source Research Software
Good Examples
- OpenAI Gym interface
- OMPL
- ROS
Challenges for POMDP Software
- There is a huge variety of:
  - Problems: continuous/discrete, fully/partially observable, generative/explicit, simple/complex
  - Solvers: online/offline, alpha vector/graph/tree, exact/approximate
  - Domain-specific heuristics
- POMDPs are computationally difficult.
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/4924831/explicit.png)
Explicit
Black Box
("Generative" in POMDP lit.)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/4924804/black_box.png)
\(s,a\)
\(s', o, r\)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/4902355/appl.png)
Previous C++ framework: APPL
"At the moment, the three packages are independent. Maybe one day they will be merged in a single coherent framework."
Open Source Research Software
- Performant
- Flexible and Composable
- Free and Open
- Easy for a wide range of people to use (for homework)
- Easy for a wide range of people to understand
(Prior frameworks, circa 2013, were written in C++, Python, and MATLAB.)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/7383382/Screenshot_from_2020-05-20_10-47-36.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/7383407/Screenshot_from_2020-05-20_12-55-46.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9553696/pasted-from-clipboard.png)
We love [Matlab, Lisp, Python, Ruby, Perl, Mathematica, and C]; they are wonderful and powerful. For the work we do — scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing — each one is perfect for some aspects of the work and terrible for others. Each one is a trade-off.
We are greedy: we want more.
"Why We Created Julia" (Bezanson, Karpinski, Shah, & Edelman, 2012)
![](http://juliapomdp.github.io/POMDPs.jl/latest/figures/concepts.png)
POMDPs.jl - An interface for defining and solving MDPs and POMDPs in Julia
Mountain Car
```julia
# Partially observable mountain car. Assumes `using QuickPOMDPs, Distributions`
# and `ImplicitDistribution` from POMDPModelTools (now part of POMDPTools).
partially_observable_mountaincar = QuickPOMDP(
    actions = [-1., 0., 1.],
    obstype = Float64,
    discount = 0.95,
    initialstate = ImplicitDistribution(rng -> (-0.2*rand(rng), 0.0)),
    isterminal = s -> s[1] > 0.5,

    gen = function (s, a, rng)
        x, v = s
        vp = clamp(v + a*0.001 + cos(3*x)*-0.0025, -0.07, 0.07)
        xp = x + vp
        if xp > 0.5
            r = 100.0
        else
            r = -1.0
        end
        return (sp=(xp, vp), r=r)
    end,

    # noisy position observation; the velocity is hidden
    observation = (a, sp) -> Normal(sp[1], 0.15)
)
```
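One possible way to plan in this model online (a sketch assuming the POMCPOW.jl and ParticleFilters.jl packages from the POMDPs.jl ecosystem; the solver parameters here are illustrative):

```julia
using POMDPs, POMCPOW, ParticleFilters

solver = POMCPOWSolver(tree_queries=10_000, criterion=MaxUCB(20.0))
planner = solve(solver, partially_observable_mountaincar)

# particle filter belief updater, matching the particle belief approach above
up = BootstrapFilter(partially_observable_mountaincar, 1_000)
b = initialize_belief(up, initialstate(partially_observable_mountaincar))
a = action(planner, b)   # one step of online tree search from the current belief
```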
```julia
using POMDPs
using QuickPOMDPs
using POMDPPolicies
using Compose
import Cairo
using POMDPGifs
import POMDPModelTools: Deterministic

mountaincar = QuickMDP(
    function (s, a, rng)
        x, v = s
        vp = clamp(v + a*0.001 + cos(3*x)*-0.0025, -0.07, 0.07)
        xp = x + vp
        if xp > 0.5
            r = 100.0
        else
            r = -1.0
        end
        return (sp=(xp, vp), r=r)
    end,

    actions = [-1., 0., 1.],
    initialstate = Deterministic((-0.5, 0.0)),
    discount = 0.95,
    isterminal = s -> s[1] > 0.5,

    render = function (step)
        cx = step.s[1]
        cy = 0.45*sin(3*cx)+0.5
        car = (context(), circle(cx, cy+0.035, 0.035), fill("blue"))
        track = (context(), line([(x, 0.45*sin(3*x)+0.5) for x in -1.2:0.01:0.6]), stroke("black"))
        goal = (context(), star(0.5, 1.0, -0.035, 5), fill("gold"), stroke("black"))
        bg = (context(), rectangle(), fill("white"))
        ctx = context(0.7, 0.05, 0.6, 0.9, mirror=Mirror(0, 0, 0.5))
        return compose(context(), (ctx, car, track, goal), bg)
    end
)

energize = FunctionPolicy(s -> s[2] < 0.0 ? -1.0 : 1.0)
makegif(mountaincar, energize; filename="out.gif", fps=20)
```
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9859869/out.gif)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/4924690/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/4924692/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/4924693/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/4924695/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/4924723/pasted-from-clipboard.png)
POMDP Planning with Learned Components
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8832055/pasted-from-clipboard.png)
[Deglurkar, Lim, Sunberg, & Tomlin, 2023]
Continuous \(A\): BOMCP
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8050752/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8050756/pasted-from-clipboard.png)
[Mern, Sunberg, et al. AAAI 2021]
Continuous \(A\): Voronoi Progressive Widening
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8047634/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8047638/pasted-from-clipboard.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/8047645/pasted-from-clipboard.png)
[Lim, Tomlin, & Sunberg CDC 2021]
Storm Science
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10954008/merged.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/10954310/output.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/870752/images/9936153/single_tree.png)
Human Behavior Model: IDM and MOBIL
M. Treiber, et al., “Congested traffic states in empirical observations and microscopic simulations,” Physical Review E, vol. 62, no. 2 (2000).
A. Kesting, et al., “General lane-changing model MOBIL for car-following models,” Transportation Research Record, vol. 1999 (2007).
A. Kesting, et al., "Agents for Traffic Simulation." Multi-Agent Systems: Simulation and Applications. CRC Press (2009).
(Results legend: all drivers normal; planners compared: Omniscient, Mean MPC, QMDP, POMCPOW.)