Game Theoretic Approaches for Deception and Counterdeception
Assistant Professor Zachary Sunberg
University of Colorado Boulder
May 5th, 2026
POMDP Solution:
Need some Game Theory!
Nash equilibrium: All players play a best response to the other players
A shrewd missile operator will use different actions, invalidating our belief
Payoff matrix (Attacker, Defender):

| Attacker \ Defender | Up | Down |
|---|---|---|
| Up | -1, 1 (Collision) | 1, -1 |
| Down | 1, -1 | -1, 1 (Collision) |
No Pure Nash Equilibrium!
Need a broader solution concept: Mixed Nash equilibrium (includes deceptive behavior like bluffing)
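As a worked check on the mixed equilibrium of the Up/Down game above, assume the payoff pairs are ordered (Attacker, Defender) and that a collision occurs when both players choose the same direction. The symbols \(p\), \(q\), \(u_A\), and \(u_D\) are introduced here for illustration only. Let the attacker play Up with probability \(p\) and the defender play Up with probability \(q\):

\[
\begin{aligned}
u_A(\text{Up}) &= (-1)\,q + (1)(1-q) = 1 - 2q, & u_A(\text{Down}) &= (1)\,q + (-1)(1-q) = 2q - 1,\\
u_D(\text{Up}) &= (1)\,p + (-1)(1-p) = 2p - 1, & u_D(\text{Down}) &= (-1)\,p + (1)(1-p) = 1 - 2p.
\end{aligned}
\]

A player is willing to randomize only when indifferent between its actions, so \(1 - 2q = 2q - 1\) and \(2p - 1 = 1 - 2p\), giving \(p = q = \tfrac{1}{2}\) with expected payoff \(0\) for both players. Under these assumptions neither player can gain by deviating, and neither can predict the other's move.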
Is a Nash equilibrium strategy a good choice in real life against humans?
Yes! Example: Superhuman play in poker with deception through bluffing.
[Figure: extensive-form game tree with node labels P1: A, P1: K, P2: A, P2: K. Image: Russell & Norvig, Artificial Intelligence: A Modern Approach]
Tyler Becker
Joined DECODE AI project Fall 2024
[Diagram: an environment with shared state \(s\); players with policies \(\pi^1, \pi^2, \ldots, \pi^n\); joint action \(\mathbf{a} = (a^1, a^2, \ldots, a^n)\); joint observation \(\mathbf{o} = (o^1, o^2, \ldots, o^n)\)]
State space \(S = \mathbb{R}^{12} \times \mathbb{R}^{12} \times \mathbb{R}^{\infty}\)
Single Player (POMDP): Beliefs
Extensive Form Game: Information Set
[Figure: extensive-form game tree with node labels P1: A, P1: K, P2: A, P2: K. Image: Russell & Norvig, Artificial Intelligence: A Modern Approach]
[Diagram: Environment (state \(s\), observation \(o\)), Belief Updater (belief \(b\)), Planner (\(Q(b, a)\))]
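The belief updater in this loop is typically a Bayes filter. A minimal statement, assuming a transition model \(T(s' \mid s, a)\) and an observation model \(Z(o \mid a, s')\) (these symbols are not from the slides):

\[
b'(s') = \frac{Z(o \mid a, s') \sum_{s \in S} T(s' \mid s, a)\, b(s)}{\sum_{s'' \in S} Z(o \mid a, s'') \sum_{s \in S} T(s'' \mid s, a)\, b(s)}
\]

For continuous state spaces the sums become integrals, and the update is usually approximated with a particle filter.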
[Lim, et al., 2023, JAIR]
Improvements Needed
Policy Network
Value Network
| Tool | Good At | Missing |
|---|---|---|
| POMDP* solvers | State uncertainty | Strategic opponents |
| EFG** solvers | Strategic opponents | Probabilistic beliefs |
| AlphaZero | Learned search | Simultaneous moves |
* A POMDP (Partially Observable Markov Decision Process) is a single-agent POSG (Partially Observable Stochastic Game), which is much easier to solve
** EFG = Extensive Form Game
Image: Russell & Norvig, Artificial Intelligence: A Modern Approach
[Becker & Sunberg, AAMAS 2025]
Our approach: combine particle filtering and information sets
Joint Belief
Joint Action
[Becker & Sunberg, AAMAS 2025]
AlphaZero
Simultaneous AlphaZero (Ours)
[Becker & Sunberg, AMOS 2025]
What about state uncertainty?
Reduce histories to a representative set to make online planning tractable
Mel Krusniak
Joined DECODE AI project Jan. 2026
Past work: How do we teach adversarial agents in trajectory space to hide and seek information about each other?
Present work: How do we teach a higher-level broadcaster to publicly share the "right" information with an ally?
Consider a three player interaction with one player at a higher informational level.
Research thrust: Automatically choose what information to broadcast, what channel to use, and how precisely to convey it.
Test environment: Discrete tag in a 2x2 grid.
Four states, two bits of information.
\[r^{(\text{evd})}(a) = (1 - \rho) \; r^{(\text{evd})}_{\text{reach\_goal}}(a) + \rho \;r^{(\text{evd})}_{\text{avoid\_psr}}(a)\]
This is one example of an "information game." Others of interest include:
Game theory, machine learning, and control theory all contribute useful tools, but converting between formalizations is fraught, error-prone, and inefficient.
Decisions.jl: Representing and solving games with arbitrary decision networks
[Krusniak et al. AAMAS 2026]
Decisions.jl
Arbitrary Dynamic Decision Networks
POMDPs.jl
using POMDPs, QuickPOMDPs, POMDPTools, QMDP

m = QuickPOMDP(
    states = ["left", "right"],
    actions = ["left", "right", "listen"],
    observations = ["left", "right"],
    initialstate = Uniform(["left", "right"]),
    discount = 0.95,

    transition = function (s, a)
        if a == "listen"
            return Deterministic(s) # listening does not move the tiger
        else # a door is opened
            return Uniform(["left", "right"]) # reset
        end
    end,

    observation = function (s, a, sp)
        if a == "listen"
            # the true tiger location is heard with probability 0.85
            if sp == "left"
                return SparseCat(["left", "right"], [0.85, 0.15])
            else
                return SparseCat(["right", "left"], [0.85, 0.15])
            end
        else
            return Uniform(["left", "right"])
        end
    end,

    reward = function (s, a)
        if a == "listen"
            return -1.0
        elseif s == a # the tiger was found
            return -100.0
        else # the tiger was escaped
            return 10.0
        end
    end
)

solver = QMDPSolver()
policy = solve(solver, m)

(Opinions are my own)