Teaching is only half of my job...
Linux
PI: Prof. Zachary Sunberg
PhD Students
Postdoc
Tweet by Nitin Gupta
29 April 2018
https://twitter.com/nitguptaa/status/990683818825736192
Video: Eric Frew
Driving: what are the other road users going to do?
Tornado Forecasting: what is going on in the storm?
Search and Rescue: where is the lost person?
All are sequential decision-making problems with uncertainty!
All can be modeled as a POMDP (with very large state and observation spaces).
Markov Decision Process (MDP)
Aleatory
\([x, y, z,\;\; \phi, \theta, \psi,\;\; u, v, w,\;\; p,q,r]\)
\(\mathcal{S} = \mathbb{R}^{12}\)
\(\mathcal{S} = \mathbb{R}^{12} \times \mathbb{R}^\infty\)
\[\underset{\pi:\, \mathcal{S} \to \mathcal{A}}{\text{maximize}} \quad \text{E}\left[ \sum_{t=0}^\infty R(s_t, a_t) \right]\]
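A minimal sketch of how a policy \(\pi: \mathcal{S} \to \mathcal{A}\) can be computed for a small, finite MDP using value iteration. Everything here is an assumption for illustration: the two-state model, its transition probabilities and rewards are made up, and a discount factor `GAMMA` is introduced so the infinite sum converges (the objective above is written without one).

```python
# Hypothetical 2-state, 2-action MDP; all numbers are invented for illustration.
STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = 0.9  # assumed discount factor (not part of the objective on the slide)

# T[s][a] is a list of (next_state, probability) pairs.
T = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(0, 0.2), (1, 0.8)]},
    1: {0: [(0, 0.7), (1, 0.3)], 1: [(1, 1.0)]},
}
# R[s][a] is the immediate reward R(s, a).
R = {0: {0: 0.0, 1: -1.0}, 1: {0: 0.0, 1: 2.0}}


def q_value(V, s, a):
    """One-step lookahead: immediate reward plus discounted expected value."""
    return R[s][a] + GAMMA * sum(p * V[sp] for sp, p in T[s][a])


def value_iteration(tol=1e-9, max_iters=10_000):
    """Repeat Bellman backups until the value function stops changing."""
    V = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        V_new = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}
        converged = max(abs(V_new[s] - V[s]) for s in STATES) < tol
        V = V_new
        if converged:
            break
    # The greedy policy with respect to V maps each state to an action.
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}
    return V, policy


V, policy = value_iteration()
print(policy, V)
```

This tabular approach only works when the states can be enumerated; it does not scale to continuous spaces such as \(\mathbb{R}^{12}\).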
Reinforcement Learning
Aleatory
Epistemic (Static)
Partially Observable Markov Decision Process (POMDP)
Aleatory
Epistemic (Static)
Epistemic (Dynamic)
Environment
Belief Updater
Planner
\(a = +10\)
True State
\(s = 7\)
Observation \(o = -0.21\)
\(b\)
\[\begin{aligned}
b_t(s) &= P\left(s_t = s \mid b_0, a_0, o_1, \ldots, a_{t-1}, o_t\right) \\
&= P\left(s_t = s \mid b_{t-1}, a_{t-1}, o_t\right)
\end{aligned}\]
\(Q(b, a)\)
\(O(|\mathcal{S}|^2)\)
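The recursive belief update above can be sketched for a discrete state space as an exact Bayes filter. This is an illustrative sketch, not the updater used in the talk: the names `update_belief`, `T`, and `Z`, and the two-state "listen" model are hypothetical. The nested loops over the next state and the previous state are where a cost of \(O(|\mathcal{S}|^2)\) per update comes from.

```python
def update_belief(b, a, o, T, Z):
    """Exact discrete Bayes filter: b_t from b_{t-1}, a_{t-1}, o_t.

    b: list of probabilities over states.
    T[a][s][sp]: probability of moving from s to sp under action a.
    Z[a][sp][o]: probability of observing o after arriving in sp under a.
    """
    n = len(b)
    unnormalized = []
    for sp in range(n):  # loop over next states
        # Prediction step: inner sum over previous states.
        predicted = sum(T[a][s][sp] * b[s] for s in range(n))
        # Correction step: weight by the observation likelihood.
        unnormalized.append(Z[a][sp][o] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [w / total for w in unnormalized]


# Hypothetical model: one action (0) that leaves the state unchanged and
# returns an observation matching the true state with probability 0.85.
T = {0: [[1.0, 0.0], [0.0, 1.0]]}
Z = {0: [[0.85, 0.15], [0.15, 0.85]]}

b0 = [0.5, 0.5]
b1 = update_belief(b0, 0, 0, T, Z)  # first observation suggests state 0
b2 = update_belief(b1, 0, 0, T, Z)  # a second matching observation
print(b1, b2)
```

Because the second line of the equation depends only on the previous belief, the filter never needs to store the full action and observation history.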
[Gupta, Hayes, & Sunberg, AAMAS 2022]
Previous solution: 1-D POMDP (92s avg)
Our solution (65s avg)
State:
Baseline
Our POMDP Planner
[Ray, Laouar, Sunberg, & Ahmed, ICRA 2023]
(Result for simplified dynamical system)
State:
Innovation: Large language models allow analysts to quickly specify anomaly hypotheses
Catalog Maintenance Plan
Aleatory
Epistemic (Static)
Epistemic (Dynamic)
Interaction
POMDP Solution:
Nash equilibrium: All players play a best response to the other players
Fundamentally impossible for POMDP solvers to compute.
May include stochastic behavior (bluffing)
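As a small illustration of why a Nash equilibrium may require stochastic behavior, the sketch below checks the best-response condition in matching pennies, a standard two-player zero-sum game. The game and the helper names (`row_payoffs`, `col_payoffs`, `is_nash`) are assumptions chosen for illustration and are not taken from the talk.

```python
# Row player's payoff matrix for matching pennies; the column player gets -A.
A = [[1.0, -1.0], [-1.0, 1.0]]


def row_payoffs(y):
    """Expected payoff of each pure row action against column mix y."""
    return [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]


def col_payoffs(x):
    """Expected payoff of each pure column action against row mix x."""
    return [sum(-A[i][j] * x[i] for i in range(2)) for j in range(2)]


def is_nash(x, y, tol=1e-9):
    """True if neither player can gain by deviating unilaterally."""
    rp, cp = row_payoffs(y), col_payoffs(x)
    row_value = sum(x[i] * rp[i] for i in range(2))
    col_value = sum(y[j] * cp[j] for j in range(2))
    # Each player's payoff must match their best-response payoff.
    return max(rp) - row_value <= tol and max(cp) - col_value <= tol


uniform = [0.5, 0.5]
print(is_nash(uniform, uniform))        # both players randomize evenly
print(is_nash([1.0, 0.0], [1.0, 0.0]))  # deterministic play is exploitable
```

In this game no pair of deterministic strategies passes the check, while the uniformly random pair does, which is the sense in which an equilibrium can require randomization.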
A shrewd missile operator will use different actions, invalidating our belief
[Becker & Sunberg, NeurIPS 2024 (Under Review)]
Funding orgs: (all opinions are my own)
VADeR