Professor Zachary Sunberg
PI: Prof. Zachary Sunberg
PhD Students
Postdoc
Every maneuver involves tradeoffs
Nash Equilibrium: All players play a best response.
Optimization Problem
\(\text{maximize} \quad f(x)\)
\(\text{subject to} \quad g(x) \geq 0\)
Game
Player 1: \(U_1 (a_1, a_2)\)
Player 2: \(U_2 (a_1, a_2)\)
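Concretely, such a game can be encoded as a pair of payoff matrices. A minimal Julia sketch using the payoffs of the airborne collision avoidance example below (the variable names are ours):

```julia
# U1[a1, a2] and U2[a1, a2]: each player's utility for joint action (a1, a2).
# Rows are Player 1's actions (Up, Down); columns are Player 2's.
U1 = [-6 -1;  1 -4]   # Player 1's payoffs
U2 = [-6  1; -1 -4]   # Player 2's payoffs

U1[1, 2], U2[1, 2]    # utilities when Player 1 plays Up and Player 2 plays Down
```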
Example: Airborne Collision Avoidance

| Player 1 \ Player 2 | Up | Down |
| --- | --- | --- |
| Up | -6, -6 (Collision) | -1, 1 |
| Down | 1, -1 | -4, -4 (Collision) |
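A pure strategy profile is a Nash equilibrium exactly when each action is a best response to the other. A brute-force check for small bimatrix games, as a sketch (`pure_nash` is an illustrative name, not from the source):

```julia
# Return all pure Nash equilibria (a1, a2): a1 is a best response to a2
# under U1, and a2 is a best response to a1 under U2.
function pure_nash(U1, U2)
    eqs = Tuple{Int,Int}[]
    for a1 in axes(U1, 1), a2 in axes(U1, 2)
        if U1[a1, a2] == maximum(U1[:, a2]) && U2[a1, a2] == maximum(U2[a1, :])
            push!(eqs, (a1, a2))
        end
    end
    return eqs
end

pure_nash([-6 -1; 1 -4], [-6 1; -1 -4])   # → [(1, 2), (2, 1)]
```

So this game does have pure equilibria, each with the players maneuvering in opposite directions; the next example has none.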
Nash Equilibrium \(\iff\) Zero Exploitability

Exploitability (zero sum):
\[\sum_i \max_{\pi_i'} U_i(\pi_i', \pi_{-i})\]
Hypersonic Missile Defense (simplified)

| Attacker \ Defender | Up | Down |
| --- | --- | --- |
| Up | -1, 1 (Collision) | 1, -1 |
| Down | 1, -1 | -1, 1 (Collision) |

No Pure Nash Equilibrium!
Instead, there is a mixed Nash equilibrium where each player plays Up or Down with 50% probability.
If either player plays Up or Down more than 50% of the time, their strategy can be exploited.
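The zero-sum exploitability formula above can be evaluated directly. A sketch for two-player zero-sum matrix games with \(U_2 = -U_1\) (function and variable names are ours):

```julia
# Exploitability of mixed strategies p1, p2 in a zero-sum matrix game:
# the sum of each player's best-response value against the other's mix.
function exploitability(U1, p1, p2)
    br1 = maximum(U1 * p2)         # Player 1's best-response value vs p2
    br2 = maximum(-(U1' * p1))     # Player 2's best-response value vs p1
    return br1 + br2
end

U1 = [-1 1; 1 -1]                           # attacker payoffs from the table
exploitability(U1, [0.5, 0.5], [0.5, 0.5])  # 0.0: the 50/50 mix is unexploitable
exploitability(U1, [0.6, 0.4], [0.5, 0.5])  # 0.2: deviating from 50% can be punished
```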
Strategy (\(\pi_i\)): probability distribution over actions
Domain 1: Onboard Vehicle Control
Domain 2: Battlespace Management
Domain 3: Arsenal Design
Differential Games
Deep Reinforcement Learning
(Image: DeepMind)
Incomplete Information Game Theory
POMDP* Planning
*Partially observable Markov decision process
Incomplete Information Extensive-Form Game
Our new algorithm for POMGs (partially observable Markov games)
1. Start from a default strategy profile.
2. Compute a best response (solve a POMDP).
3. Add the best response to the pure strategy set.
4. Mix pure strategies, then return to step 2.
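A runnable sketch of this loop for a zero-sum matrix game, with the restricted game solved approximately by fictitious play; in the proposed POMG algorithm the best-response step would instead solve a POMDP (all names below are ours):

```julia
# Double oracle on a zero-sum matrix game U (Player 1 maximizes U[a1, a2]).
function fictitious_play(U; iters=10_000)
    c1, c2 = ones(size(U, 1)), ones(size(U, 2))   # action counts
    for _ in 1:iters
        c1[argmax(U * (c2 ./ sum(c2)))] += 1          # P1 best response to P2's mix
        c2[argmin(vec((c1 ./ sum(c1))' * U))] += 1    # P2 best response to P1's mix
    end
    return c1 ./ sum(c1), c2 ./ sum(c2)
end

function double_oracle(U; iters=10)
    S1, S2 = [1], [1]                   # default strategy profile
    p, q = [1.0], [1.0]
    for _ in 1:iters
        p, q = fictitious_play(U[S1, S2])         # mix pure strategies
        b1 = argmax(U[:, S2] * q)                 # compute best response (P1)
        b2 = argmin(vec(p' * U[S1, :]))           # compute best response (P2)
        (b1 in S1 && b2 in S2) && break           # nothing new: approx. equilibrium
        b1 in S1 || push!(S1, b1)                 # add best responses to the sets
        b2 in S2 || push!(S2, b2)
    end
    return S1, p, S2, q
end

double_oracle([-1 1; 1 -1])   # recovers (approximately) the 50/50 mix
```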
\[\hat{x}', \Sigma' = \text{Filter}(\hat{x}, \Sigma, y, \pi_1)\]
\[u = \tilde{u} -K(\hat{x} - \tilde{x})\]
[Peters, Sunberg, et al. 2020]
2 ms solve time with warm starting
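The two equations above are an estimator-plus-feedback loop: a filter updates the state estimate \(\hat{x}\) and covariance \(\Sigma\) from measurement \(y\) (on the slide, also conditioned on the opponent strategy \(\pi_1\)), and the control tracks a nominal trajectory \((\tilde{x}, \tilde{u})\) with gain \(K\). A minimal linear-Gaussian sketch that omits the \(\pi_1\) conditioning and assumes known model matrices \(A, B, C, W, V\):

```julia
using LinearAlgebra

# One Kalman filter step: predict through the dynamics, correct with y.
function kalman_step(x̂, Σ, u, y, A, B, C, W, V)
    x̂p = A * x̂ + B * u                  # predicted state
    Σp = A * Σ * A' + W                  # predicted covariance
    L  = Σp * C' / (C * Σp * C' + V)     # Kalman gain
    return x̂p + L * (y - C * x̂p), (I - L * C) * Σp
end

# Feedback law from the slide: u = ũ - K(x̂ - x̃)
control(x̂, x̃, ũ, K) = ũ - K * (x̂ - x̃)
```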
< 3 min offline training, 2 ms online computation
100-1000 ms online planning
[Sunberg and Kochenderfer, 2018]
[Sunberg and Kochenderfer, 2022]
[Gupta, Hayes, and Sunberg, 2022]
[Lim, Sunberg, et al. (in prep)]
For any \(\epsilon>0\) and \(\delta>0\), if \(C\) (the number of particles) is high enough,
\[|Q_{\mathbf{P}}^*(b,a) - Q_{\mathbf{M}_{\mathbf{P}}}^*(\bar{b},a)| \leq \epsilon \quad \text{w.p. } 1-\delta\]
No dependence on \(|\mathcal{S}|\) or \(|\mathcal{O}|\)!
[Lim, Becker, Kochenderfer, Tomlin, & Sunberg, JAIR 2023]
(Figure: a belief represented by particles \(1, 2, \ldots, N\))
[Becker & Sunberg AMOS 2021]
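The particle belief \(\bar{b}\) in the bound above is simply a set of sampled states. A minimal bootstrap particle filter update, as a sketch (the toy `dyn` and `lik` functions are illustrative stand-ins):

```julia
# Bootstrap particle filter: propagate particles through the dynamics,
# weight by observation likelihood, then resample in proportion to weight.
function particle_update(particles, a, y, dynamics, likelihood)
    props = [dynamics(s, a) for s in particles]
    w = [likelihood(y, s′) for s′ in props]
    cw = cumsum(w ./ sum(w))
    return [props[min(searchsortedfirst(cw, rand()), length(props))]
            for _ in eachindex(props)]
end

# Toy 1-D example:
dyn(s, a) = s + a + 0.1 * randn()     # noisy dynamics
lik(y, s) = exp(-(y - s)^2 / 2)       # Gaussian-shaped likelihood
b̄ = randn(1000)                      # C = 1000 particles
b̄ = particle_update(b̄, 1.0, 1.2, dyn, lik)
```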
Issue: Current counterfactual regret minimization (CFR) techniques require large amounts of computation because random sampling makes their value estimates high-variance
Solution: Scenario-based GT planning (idea borrowed from POMDP research)
[Somani et al. 2017]
[Garg et al. 2019]
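The scenario idea, as used in DESPOT-style planners, is to evaluate every candidate on the same small, fixed set of sampled scenarios (random seeds), so that comparisons between strategies are not swamped by sampling noise. A schematic sketch (the `rollout` stand-in is ours):

```julia
using Random

# Evaluate each strategy on the SAME K scenarios, so estimated differences
# between strategies reflect the strategies rather than the sampling noise.
function scenario_values(strategies, rollout; K=50)
    seeds = rand(UInt32, K)                       # the fixed scenario set
    return [sum(rollout(π, MersenneTwister(seed)) for seed in seeds) / K
            for π in strategies]
end

# Toy stand-in: all randomness comes from the scenario's RNG,
# so the two estimates below differ by exactly 0.1.
rollout(π, rng) = π + randn(rng)
scenario_values([0.0, 0.1], rollout)
```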
Notional timeline (2 Faculty, 3 Students) over 36 months:

| Thrust | Activities |
| --- | --- |
| Diff. games for onboard control | Apply existing diff. game approaches → develop and implement feedback diff. game algorithms → hybrid games → test in high fidelity simulation |
| POMDP DO for battlespace mgmt | Develop and implement basic algorithm → implement on representative PO Markov game → develop decentralized composable algorithm → test in realistic simulation |
| Scenario-based GT planning | Develop mathematical theory → test in realistic simulation |
| Other | Teach advanced dynamic GT class; organize workshop at conference; student internship at contractor |

Deliverables: Reference implementations
POMDPs.jl - An interface for defining and solving MDPs and POMDPs
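For example, a benchmark problem can be solved and simulated in a few lines with the ecosystem packages POMDPModels, POMDPTools, and QMDP (the solver choice here is illustrative):

```julia
using POMDPs, POMDPModels, POMDPTools, QMDP

pomdp = TigerPOMDP()                   # classic benchmark POMDP
policy = solve(QMDPSolver(), pomdp)    # offline approximate solution

# Run the policy in closed loop with its associated belief updater:
sim = RolloutSimulator(max_steps=50)
r = simulate(sim, pomdp, policy, updater(policy))   # discounted return
```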
Mathematical Frameworks → Algorithm Development → Realistic Simulation → Physical Tests
Develop algorithms to compute unexploitable strategies for hypersonic weapon defense systems.
Unexploitable strategies are probability distributions describing what all players should do.
Nash Equilibrium: All players play a best response.
Strategy (\(\pi_i\)): probability distribution over actions

Optimization Problem
\(\text{maximize} \quad f(x)\)
\(\text{subject to} \quad g(x) \geq 0\)

Game
Player 1: \(U_1 (a_1, a_2)\)
Player 2: \(U_2 (a_1, a_2)\)

Example: Stag Hunt

| Player 1 \ Player 2 | Stag | Hare |
| --- | --- | --- |
| Stag | 10, 10 | 1, 8 |
| Hare | 8, 1 | 5, 5 |
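For contrast with the zero-sum examples above, running the `pure_nash` check sketched earlier on this matrix highlights the coordination problem: both all-Stag and all-Hare are equilibria.

```julia
U1 = [10 1; 8 5];  U2 = [10 8; 1 5]   # rows/columns ordered (Stag, Hare)
pure_nash(U1, U2)                      # → [(1, 1), (2, 2)]
```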