What else am I doing?
Adversarial Policies in POMDPs
Professor Zachary Sunberg
University of Colorado Boulder
CAAMS Winter '25 IAB Meeting
PI: Prof. Zachary Sunberg
PhD Students
Postdoc
Markov Decision Process (MDP)
Aleatory
\([x, y, z,\;\; \phi, \theta, \psi,\;\; u, v, w,\;\; p,q,r]\)
\(\mathcal{S} = \mathbb{R}^{12}\)
\(\mathcal{S} = \mathbb{R}^{12} \times \mathbb{R}^\infty\)
\[\underset{\pi:\, \mathcal{S} \to \mathcal{A}}{\text{maximize}} \quad \text{E}\left[ \sum_{t=0}^\infty R(s_t, a_t) \right]\]
Partially Observable Markov Decision Process (POMDP)
Aleatory
Epistemic (Static)
Epistemic (Dynamic)
\([x, y, z,\;\; \phi, \theta, \psi,\;\; u, v, w,\;\; p,q,r]\)
\(\mathcal{S} = \mathbb{R}^{12}\)
\(\mathcal{S} = \mathbb{R}^{12} \times \mathbb{R}^\infty\)
1. Low-sample particle filtering
2. Sparse Sampling
[Lim, Becker, Kochenderfer, Tomlin, & Sunberg, JAIR 2023; Sunberg & Kochenderfer, ICAPS 2018; Others]
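A minimal sketch of the sparse sampling idea, assuming a fully observed generative model for brevity (the cited work applies it to particle beliefs). The toy model, its noise range, and the names `sparse_sampling_q`, `toy_model`, `width`, and `gamma` are illustrative assumptions, not taken from the cited papers.

```python
import random


def sparse_sampling_q(state, depth, actions, generative_model, width, gamma, rng):
    """Estimate Q(state, a) for every action using a sampled lookahead tree."""
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            # Draw one sampled next state and reward from the generative model.
            next_state, reward = generative_model(state, a, rng)
            if depth > 1:
                # Recurse: value of the sampled next state is the best estimated Q.
                next_q = sparse_sampling_q(
                    next_state, depth - 1, actions, generative_model, width, gamma, rng
                )
                total += reward + gamma * max(next_q.values())
            else:
                total += reward
        # Average over the `width` samples for this action.
        q[a] = total / width
    return q


def toy_model(state, action, rng):
    """Hypothetical 1-D model: noisy step, reward is negative distance to a goal at 3."""
    next_state = state + action + rng.uniform(-0.3, 0.3)
    reward = -abs(next_state - 3.0)
    return next_state, reward


rng = random.Random(0)
q = sparse_sampling_q(
    0.0, depth=2, actions=(-1, 1), generative_model=toy_model,
    width=5, gamma=0.95, rng=rng,
)
best_action = max(q, key=q.get)
```

The number of model calls depends only on `width`, `depth`, and the number of actions, not on the size of the state space, which is the property that makes the approach attractive for large or continuous problems.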
Environment
Belief Updater
Planner
\(a = +10\)
\(b\)
\[b_t(s) = P\left(s_t = s \mid b_0, a_0, o_1, \ldots, a_{t-1}, o_{t}\right)\]
\[ = P\left(s_t = s \mid b_{t-1}, a_{t-1}, o_{t}\right)\]
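The recursive form of the belief update can be approximated with a bootstrap particle filter. The sketch below is one common way to do this, assuming a hypothetical 1-D model with Gaussian noise; `transition`, `obs_likelihood`, and all numeric values are illustrative assumptions.

```python
import math
import random


def particle_belief_update(particles, action, observation, transition, obs_likelihood, rng):
    """One bootstrap particle filter step: (b_{t-1}, a_{t-1}, o_t) -> b_t."""
    # Propagate each particle through the stochastic transition model.
    predicted = [transition(s, action, rng) for s in particles]
    # Weight each propagated particle by how well it explains the observation.
    weights = [obs_likelihood(observation, s) for s in predicted]
    if sum(weights) == 0.0:
        # Particle depletion: nothing explains the observation, keep the prediction.
        return predicted
    # Resample with replacement in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(particles))


def transition(s, a, rng):
    """Hypothetical dynamics: move by the action plus unit Gaussian noise."""
    return s + a + rng.gauss(0.0, 1.0)


def obs_likelihood(o, s):
    """Hypothetical sensor: unnormalized Gaussian likelihood with unit variance."""
    return math.exp(-0.5 * (o - s) ** 2)


rng = random.Random(0)
belief = [rng.gauss(0.0, 5.0) for _ in range(1000)]  # samples from b_{t-1}
belief = particle_belief_update(belief, 1.0, 3.0, transition, obs_likelihood, rng)
mean = sum(belief) / len(belief)
```

The updated particle set depends only on the previous particles, the last action, and the new observation, mirroring the second line of the equation.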
\(Q(b, a)\)
True State
\(s = 7\)
Observation \(o = -0.21\)
[Lim, Becker, Kochenderfer, Tomlin, & Sunberg, JAIR 2023]
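For any \(\epsilon>0\) and \(\delta>0\), if \(C\) (number of particles) is high enough, then
\[|Q_{\mathbf{P}}^*(b,a) - Q_{\mathbf{M}_{\mathbf{P}}}^*(\bar{b},a)| \leq \epsilon \quad \text{w.p. } 1-\delta\]
where \(\mathbf{M}_{\mathbf{P}}\) is the particle-belief MDP approximating the POMDP \(\mathbf{P}\) and \(\bar{b}\) is the particle belief.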
No direct relationship between \(C\) and \(|\mathcal{S}|\) or \(|\mathcal{O}|\)
What is the worst case probabilistic model for the orange evader?
This cannot be calculated by solving a (PO)MDP!
(Every (PO)MDP has an optimal deterministic policy)
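A toy illustration of why a deterministic policy is not enough against an adversary. This is a hypothetical matching-pennies-style hide-and-seek game, not the pursuit-evasion scenario from the slide; the payoff table and the name `worst_case_value` are assumptions made for the example.

```python
def worst_case_value(p_left, payoff):
    """Evader's expected payoff when the opponent best-responds to the mixed strategy."""
    strategy = {"L": p_left, "R": 1.0 - p_left}
    return min(
        sum(strategy[e] * payoff[(e, o)] for e in strategy)
        for o in ("L", "R")
    )


# Hypothetical zero-sum payoffs for the evader: +1 if the opponent guesses wrong, -1 if right.
payoff = {
    ("L", "L"): -1.0,
    ("L", "R"): 1.0,
    ("R", "L"): 1.0,
    ("R", "R"): -1.0,
}

# Best worst-case payoff achievable with a deterministic choice (always L or always R).
deterministic = max(worst_case_value(p, payoff) for p in (0.0, 1.0))
# Worst-case payoff of the uniform random strategy.
mixed = worst_case_value(0.5, payoff)
```

Here `deterministic` is -1.0 and `mixed` is 0.0: any deterministic choice is fully exploited, while randomizing guarantees a strictly better worst case. A (PO)MDP solver that returns a deterministic policy cannot represent the randomized solution.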
The POMDP is a good model for information gathering, but it is incomplete:
Partially Observable Stochastic Game (POSG)
Image: Russell & Norvig, Artificial Intelligence: A Modern Approach
[Figure labels: P1: A; P1: K; P2: A; P2: A; P2: K]
Partially Observable Markov Decision Process (POMDP)
Aleatory
Epistemic (Static)
Epistemic (Dynamic)
Aleatory
Epistemic (Static)
Epistemic (Dynamic)
Interaction
[Becker & Sunberg, AAMAS 2025]
Our approach: combine particle filtering and information sets
Joint Belief
Joint Action
[Becker & Sunberg, AAMAS 2025]
Open (related) questions:
[Peters, Tomlin, and Sunberg 2020]
Incomplete-Information Extensive-Form Game
Our new algorithm for POMGs
POMDPs.jl - An interface for defining and solving MDPs and POMDPs in Julia
[Mern, Sunberg, et al. AAAI 2021]
[Lim, Tomlin, & Sunberg CDC 2021]
Individual Infectiousness
Infection Age
Incident Infections
Need
Test sensitivity is secondary to frequency and turnaround time for COVID-19 surveillance
Larremore et al.
Viral load represented by piecewise-linear hinge function
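A minimal sketch of what a piecewise-linear hinge function for log viral load could look like. The control points and all numeric values below are illustrative placeholders, not the parameters used by Larremore et al.; the function name and argument names are hypothetical.

```python
def hinge_log_viral_load(t, t_start, t_peak, t_end, v_floor, v_peak):
    """Piecewise-linear hinge: flat floor, linear rise to a peak, linear decline to the floor."""
    if t <= t_start or t >= t_end:
        return v_floor
    if t <= t_peak:
        # Rising segment from (t_start, v_floor) to (t_peak, v_peak).
        return v_floor + (v_peak - v_floor) * (t - t_start) / (t_peak - t_start)
    # Falling segment from (t_peak, v_peak) to (t_end, v_floor).
    return v_peak - (v_peak - v_floor) * (t - t_peak) / (t_end - t_peak)


# Placeholder parameters (days since infection, log10 viral load); assumptions for illustration.
params = dict(t_start=2.0, t_peak=5.0, t_end=12.0, v_floor=3.0, v_peak=8.0)
rising = hinge_log_viral_load(3.5, **params)
peak = hinge_log_viral_load(5.0, **params)
falling = hinge_log_viral_load(8.5, **params)
```

With these placeholder values the curve reaches its peak at day 5 and passes through the same intermediate level once on the way up and once on the way down.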