Game Theoretic Approaches for Deception and Counterdeception

Assistant Professor Zachary Sunberg

University of Colorado Boulder

May 5th, 2026

Why Game Theory?

Example context: air defense against maneuvering attackers.
Traditional approach: assume a model for the attacker, use optimal control to intercept.
Attacker can exploit this approach by behaving unpredictably (a form of deception).
If we instead play a Nash equilibrium strategy of a zero-sum game, the strategy is robust: our expected payoff cannot fall below the equilibrium value, no matter how the other player changes their strategy.

POSG Example: Missile Defense

POMDP Solution:

Assume a distribution for the missile's actions
Update belief according to this distribution
Use a POMDP planner to find the best defensive action

Need some Game Theory!

Nash equilibrium: Every player plays a best response to the other players' strategies

A shrewd missile operator will use different actions, invalidating our belief

Simplified Missile Defense Game

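[Payoff matrix figure, partially recovered. Players: Attacker and Defender. Surviving labels: action "Down", outcome "Collision". Surviving payoff entries: (-1, 1), (1, -1), (-1, 1).]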

No Pure Nash Equilibrium!

Need a broader solution concept: Mixed Nash equilibrium (includes deceptive behavior like bluffing)
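
To make the mixed equilibrium concrete, here is a minimal sketch that solves a 2x2 zero-sum game by the indifference conditions. The payoff matrix on the slide did not survive intact, so the example matrix is an assumption with a matching-pennies structure (the row player wins when the actions match). The function name `mixed_equilibrium_2x2` is hypothetical.

```julia
# Mixed equilibrium of a 2x2 zero-sum game that has no pure saddle point.
# A[i, j] is the row player's payoff; the column player receives -A[i, j].
function mixed_equilibrium_2x2(A::AbstractMatrix{<:Real})
    size(A) == (2, 2) || throw(ArgumentError("A must be 2x2"))
    denom = A[1, 1] - A[1, 2] - A[2, 1] + A[2, 2]
    denom != 0 || throw(ArgumentError("degenerate game; indifference formula does not apply"))
    p = (A[2, 2] - A[2, 1]) / denom   # probability that the row player plays row 1
    q = (A[2, 2] - A[1, 2]) / denom   # probability that the column player plays column 1
    (0 <= p <= 1 && 0 <= q <= 1) ||
        throw(ArgumentError("game has a pure saddle point; no mixing is needed"))
    value = (A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1]) / denom
    return p, q, value
end

# Assumed matching-pennies payoffs (not recovered from the slide).
A_pennies = [1 -1; -1 1]
p, q, value = mixed_equilibrium_2x2(A_pennies)
```

Under this assumed matrix both players randomize evenly, so neither action can be predicted and exploited.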

Why Game Theory?

Is a Nash equilibrium strategy a good choice in real life against humans?

Yes! Example: Superhuman play in poker with deception through bluffing.

Tabletop Game 2: Poker

Image: Russell & Norvig, Artificial Intelligence: A Modern Approach

[Figure: poker game tree with node labels P1: A, P1: K, P2: A, P2: K]

Part 1: Online Planning in POSGs

Tyler Becker

Joined DECODE AI project Fall 2024

Partially Observable Stochastic Games (POSGs)

Environment has state \(s \in S\)
Agents act simultaneously with \(a^i\) forming joint action \(\mathbf{a}\)
State transitions according to \(T(s' \mid s, \mathbf{a})\)
Each agent receives reward \(R^i(s, \mathbf{a}, s')\)
Each agent receives an observation \(o^i\) with distribution \(Z(o^i \mid \mathbf{a}, s')\)
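
The five ingredients above can be sketched as one simultaneous-move step. This is an illustration only: `TinyPOSG`, `posg_step`, and the toy distance game are hypothetical names and dynamics, not the interface of any library mentioned in this talk.

```julia
using Random

# Hypothetical container for a POSG; the field names are illustrative.
struct TinyPOSG
    transition::Function   # (rng, s, a) -> s'         samples T(s' | s, a)
    reward::Function       # (i, s, a, sp) -> Float64  evaluates R^i(s, a, s')
    observation::Function  # (rng, i, a, sp) -> o^i    samples Z(o^i | a, s')
end

# One step: all agents commit to the joint action `a` at the same time.
function posg_step(rng::AbstractRNG, g::TinyPOSG, s, a::Tuple)
    sp = g.transition(rng, s, a)
    r = ntuple(i -> g.reward(i, s, a, sp), length(a))
    o = ntuple(i -> g.observation(rng, i, a, sp), length(a))
    return sp, r, o
end

# Assumed toy game: the state is a distance in 0:5. Player 1 closes the
# distance, player 2 opens it, and both receive a noisy distance reading.
game = TinyPOSG(
    (rng, s, a) -> clamp(s + a[2] - a[1], 0, 5),
    (i, s, a, sp) -> i == 1 ? -float(sp) : float(sp),
    (rng, i, a, sp) -> sp + rand(rng, -1:1),
)

sp, r, o = posg_step(MersenneTwister(1), game, 3, (1, 0))
```

Each agent sees only its own entry of `o`, which is why a policy has to condition on a history of observations rather than on the state.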

[Diagram: policies \(\pi^1, \ldots, \pi^n\) select actions \(a^1, a^2, \ldots, a^n\), which form the joint action \(\mathbf{a}\); the environment with shared state \(s\) (state space \(S\)) returns observations \(o^1, o^2, \ldots, o^n\), collectively \(\mathbf{o}\). Example state space: \(S = \mathbb{R}^{12} \times \mathbb{R}^{12} \times \mathbb{R}^{\infty}\).]

Two ways to represent uncertainty

Single Player (POMDP): Beliefs

Extensive Form Game: Information Set

Image: Russell & Norvig, Artificial Intelligence: A Modern Approach

[Figure: poker game tree with node labels P1: A, P1: K, P2: A, P2: K, illustrating information sets]

Online POMDP Planning

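[Diagram: the Environment (state \(s\)) sends observation \(o\) to the Belief Updater, which passes belief \(b\) to the Planner; the Planner evaluates \(Q(b, a)\) to choose the action sent back to the Environment.]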
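
The belief updater in this loop can be sketched, for a finite state space, as a discrete Bayes filter: \(b'(s') \propto Z(o \mid a, s') \sum_s T(s' \mid s, a)\, b(s)\). The function name `update_belief` is hypothetical, and the numbers reuse the 0.85 listening accuracy of the tiger example shown later in this deck.

```julia
# Discrete Bayes filter for a fixed action and a received observation.
# T[s, sp] = T(sp | s, a) and Z[sp] = Z(o | a, sp).
function update_belief(b::Vector{Float64}, T::Matrix{Float64}, Z::Vector{Float64})
    predicted = T' * b                 # sum over s of T(sp | s, a) * b(s)
    unnormalized = Z .* predicted      # weight by the observation likelihood
    total = sum(unnormalized)
    total > 0 || error("observation has zero probability under this belief")
    return unnormalized ./ total
end

# Listening leaves the state unchanged; hearing "left" is 85% accurate.
T_listen = [1.0 0.0; 0.0 1.0]
Z_hear_left = [0.85, 0.15]

b0 = [0.5, 0.5]
b1 = update_belief(b0, T_listen, Z_hear_left)   # after one "left" observation
b2 = update_belief(b1, T_listen, Z_hear_left)   # after a second "left" observation
```

A planner would then score actions with \(Q(b, a)\) at the updated belief. Note that this update assumes a fixed model of how the world evolves, which is exactly what a strategic opponent can violate.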

Why are POMDPs difficult?

Curse of History
Curse of dimensionality
1. State space
2. Observation space
3. Action space

[Lim et al., 2023, JAIR]

Tabletop Game 1: Go

Improvements Needed

Simultaneous Play
State Uncertainty

Policy Network

Value Network

POSGs: Challenges and Existing Tools

Large state spaces
Partial observations
Strategic opponents
Deception and counter-deception

Our approach: Build on these three tools

Tool	Good At	Missing
POMDP* solvers	State uncertainty	Strategic opponents
EFG** solvers	Strategic opponents	Probabilistic beliefs
AlphaZero	Learned search	Simultaneous moves

* A POMDP (Partially Observable Markov Decision Process) is a single-agent POSG (much easier)

** EFG = Extensive Form Game

Image: Russell & Norvig, Artificial Intelligence: A Modern Approach

CDIT (conditional distribution infoset trees) = particles + information sets
Particles approximate hidden state
Information sets preserve strategic uncertainty
CFR (counterfactual regret minimization) searches for low-exploitability policies
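
CFR applies a regret-matching update at every information set. The sketch below shows only that core update, run in self-play on a single zero-sum matrix game; it is not the CDIT algorithm or full CFR. The function names and the example payoffs are assumptions chosen so that the equilibrium is not uniform.

```julia
using LinearAlgebra

# Regret matching: play each action in proportion to its positive cumulative regret.
function regret_matching_strategy(regret::Vector{Float64})
    positive = max.(regret, 0.0)
    total = sum(positive)
    return total > 0 ? positive ./ total : fill(1 / length(regret), length(regret))
end

# Self-play on a zero-sum matrix game; A[i, j] is the row player's payoff.
# The returned average strategies approach an equilibrium as iterations grow.
function solve_by_regret_matching(A::Matrix{Float64}; iterations::Int = 10_000)
    m, n = size(A)
    regret_row, regret_col = zeros(m), zeros(n)
    sum_row, sum_col = zeros(m), zeros(n)
    for _ in 1:iterations
        x = regret_matching_strategy(regret_row)
        y = regret_matching_strategy(regret_col)
        sum_row .+= x
        sum_col .+= y
        u_row = A * y            # expected payoff of each row against y
        u_col = -(A' * x)        # expected payoff of each column against x
        regret_row .+= u_row .- dot(x, u_row)
        regret_col .+= u_col .- dot(y, u_col)
    end
    return sum_row ./ iterations, sum_col ./ iterations
end

A_example = [2.0 -1.0; -1.0 1.0]   # assumed payoffs; equilibrium is (0.4, 0.6) for both players
x_avg, y_avg = solve_by_regret_matching(A_example)
```

It is the average strategy, not the last iterate, that carries the low-exploitability guarantee.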

Particle-Filter POSGs

Online Planning Structure: CDIT

[Becker & Sunberg, AAMAS 2025]

Our approach: combine particle filtering and information sets

Joint Belief

Joint Action

Standard uncertainty-aware planners handle hidden states, but not strategic interaction
Standard game-theoretic planners handle other agents, but scale poorly to physical state spaces
Need planning methods for uncertain, multi-agent, real-world environments
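
For the particle-filtering half of the combination, a generic bootstrap particle filter update looks roughly like the sketch below. It is a single-agent illustration under assumed interfaces (`propagate` and `likelihood` are hypothetical callbacks), not the CDIT construction from the paper.

```julia
using Random

# One bootstrap particle filter update.
# `propagate(rng, s)` samples a next state; `likelihood(o, sp)` returns Z(o | sp).
function particle_update(rng::AbstractRNG, particles::Vector, o, propagate, likelihood)
    proposed = [propagate(rng, s) for s in particles]
    weights = [likelihood(o, sp) for sp in proposed]
    total = sum(weights)
    total > 0 || error("all particles have zero likelihood for this observation")
    # Multinomial resampling: pick the first index whose cumulative weight exceeds u.
    cumulative = cumsum(weights ./ total)
    n = length(proposed)
    return [proposed[min(searchsortedlast(cumulative, rand(rng)) + 1, n)] for _ in 1:n]
end

# Assumed toy model: the state moves one unit per step and is observed exactly.
move(rng, s) = s + 1.0
exact_match(o, sp) = sp == o ? 1.0 : 0.0

belief = particle_update(MersenneTwister(0), [0.0, 1.0, 2.0, 3.0], 3.0, move, exact_match)
```

Because the belief is a set of sampled states, the cost of the update depends on the number of particles rather than on the size of the state space.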

Test Environment: Partially Observable Tag

[Becker & Sunberg, AAMAS 2025]

Equilibrium policies are stochastic and hide intent
Exploitability measures vulnerability to a best-responding opponent, and it demonstrably decreases
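
For intuition, exploitability in a zero-sum matrix game can be computed as the total amount both players could gain by switching to a best response. This is a sketch under an assumed matching-pennies payoff matrix; the paper may normalize or define the quantity differently, and the function name is hypothetical.

```julia
# Exploitability of a strategy pair (x, y) in a zero-sum matrix game where
# A[i, j] is the row player's payoff. It is zero at a Nash equilibrium.
function exploitability(A::AbstractMatrix, x::AbstractVector, y::AbstractVector)
    best_row_value = maximum(A * y)    # what a best-responding row player earns against y
    worst_case_for_x = minimum(A' * x) # what x earns against a best-responding column player
    return best_row_value - worst_case_for_x
end

A_mp = [1 -1; -1 1]                    # assumed matching-pennies payoffs
uniform = [0.5, 0.5]
always_first = [1.0, 0.0]

e_mixed = exploitability(A_mp, uniform, uniform)       # stochastic policy
e_pure = exploitability(A_mp, always_first, uniform)   # predictable policy
```

The predictable policy has positive exploitability while the stochastic one has none, which matches the observation that equilibrium policies hide intent.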

What the CDIT Gives Us

[Becker & Sunberg, AAMAS 2025]

Equilibrium search without state enumeration
Continuous-state POSG planning
Probabilistic exploitability bounds
No direct dependence on state space size

Combining with Learning: Simultaneous AlphaZero

AlphaZero assumes turn-taking
In Markov games and POSGs, players move simultaneously
Each state becomes a matrix game
Search improves both players' policies

AlphaZero

Simultaneous AlphaZero (Ours)

Search Improves Robustness

Best-response value improves; exploitability falls
Mixed actions prevent exploitation
Deeper search provably reduces exploitability

Counter-Deception in SDA

Target exploits eclipse and occlusion
Observer learns sensing geometry and robust maneuvering strategies

Observer satellite seeks well-illuminated vantage points of target satellite

1. Simultaneous Play:

Space Domain Awareness

[Becker & Sunberg, AMOS 2025]

What about state uncertainty?

Up Next: Policy State Approximation

Reduce histories to a representative set, to make online planning tractable

Part 2: Optimal Deception on Shared Channels

Mel Krusniak

Joined DECODE AI project Jan. 2026

Motivation: Informed Pursuit-Evasion

Past work: How do we teach adversarial agents in trajectory space to hide and seek information about each other?

Present work: How do we teach a higher-level broadcaster to publicly share the "right" information with an ally?

Goal: Optimal Informer Behavior

Consider a three-player interaction with one player at a higher informational level.

Pursuer, Evader: Competition with side goals, acting based on semi-public information.
Informer: Sees the full state. Determines what semi-public information is broadcast.

Research thrust: Automatically choose what information to broadcast, what channel to use, and how precisely to convey it.

Preliminary Methods

"When should I lie?" (or "what channels are safe?")

Test environment: Discrete tag in a 2x2 grid.
Four states, two bits of information.

Adversariality: Evader has a side goal.
It seeks a goal state with weight \((1 - \rho)\).
Goal is random, but seen by the informer.
Noise: Pursuer has worse observations.
The second bit (or "channel") is perturbed with probability \(\omega\).

\[r^{(\text{evd})}(a) = (1 - \rho) \; r^{(\text{evd})}_{\text{reach\_goal}}(a) + \rho \;r^{(\text{evd})}_{\text{avoid\_psr}}(a)\]
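
A minimal sketch of the two knobs in this test environment follows. The reward mix follows the equation above directly. The channel model is an assumption: the slide says only that the second bit is "perturbed", and the sketch interprets that as a bit flip. Both function names are hypothetical.

```julia
using Random

# Evader reward: weight (1 - ρ) on reaching the goal, weight ρ on avoiding the pursuer.
evader_reward(r_reach_goal, r_avoid_psr, ρ) = (1 - ρ) * r_reach_goal + ρ * r_avoid_psr

# Pursuer's view of a two-bit broadcast. Assumption: "perturbed" means the
# second bit is flipped with probability ω; the first bit always arrives intact.
function perturb_second_bit(rng::AbstractRNG, bits::Tuple{Bool,Bool}, ω::Real)
    flipped = rand(rng) < ω
    return (bits[1], flipped ? !bits[2] : bits[2])
end

rng = MersenneTwister(42)
r_evd = evader_reward(1.0, -1.0, 0.25)                    # mostly goal-seeking evader
clean = perturb_second_bit(rng, (true, false), 0.0)       # noiseless channel
corrupted = perturb_second_bit(rng, (true, false), 1.0)   # always-perturbed channel
```

With \(\rho = 1\) the evader is purely adversarial, and with \(\omega = 0\) the pursuer and evader receive the same broadcast, so the two parameters span the range from a cooperative public channel to a noisy contested one.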

Long Term Goals

This is one example of an "information game." Others of interest include:

Channel takeover prevention / "confirm-then-commit" tactics
Dogwhistle / "double speak" identification
Information denial via mixed strategies

Game theory, machine learning, and control theory all contribute useful tools, but converting between formalizations is fraught, error-prone, and inefficient.

Decisions.jl: Representing and solving games with arbitrary decision networks

Open Source Software!

[Krusniak et al. AAMAS 2026]

Decisions.jl

Arbitrary Dynamic Decision Networks

POMDPs.jl

using POMDPs, QuickPOMDPs, POMDPTools, QMDP

m = QuickPOMDP(
    states = ["left", "right"],
    actions = ["left", "right", "listen"],
    observations = ["left", "right"],
    initialstate = Uniform(["left", "right"]),
    discount = 0.95,

    transition = function (s, a)
        if a == "listen"
            return Deterministic(s)
        else # a door is opened
            return Uniform(["left", "right"]) # reset
        end
    end,

    observation = function (s, a, sp)
        if a == "listen"
            if sp == "left"
                return SparseCat(["left", "right"], [0.85, 0.15])
            else
                return SparseCat(["right", "left"], [0.85, 0.15])
            end
        else
            return Uniform(["left", "right"])
        end
    end,

    reward = function (s, a)
        if a == "listen"
            return -1.0
        elseif s == a # the tiger was found
            return -100.0
        else # the tiger was escaped
            return 10.0
        end
    end
)

solver = QMDPSolver()
policy = solve(solver, m)

Thank You!

www.cu-adcl.org

(Opinions are my own)

DECODE AI May 2026

By Zachary Sunberg
