Game Theoretic Approaches for Deception and Counterdeception

 

Assistant Professor Zachary Sunberg

University of Colorado Boulder

May 5th, 2026

Why Game Theory?

  • Example context: air defense against maneuvering attackers.
  • Traditional approach: assume a model for the attacker, use optimal control to intercept.
  • Attacker can exploit this approach by behaving unpredictably (a form of deception).
  • If, instead, we play a Nash equilibrium strategy of a zero-sum game, the strategy is robust: we are guaranteed at least the game's value no matter what strategy the other player uses, so we do no worse if they change their strategy.

POSG Example: Missile Defense

POMDP Solution:

  1. Assume a distribution for the missile's actions
  2. Update belief according to this distribution
  3. Use a POMDP planner to find the best defensive action

Need some Game Theory!

Nash equilibrium: every player plays a best response to the other players' strategies

A shrewd missile operator will choose actions that differ from our assumed distribution, invalidating our belief
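The failure mode above can be reproduced in a few lines of Julia. This is a minimal sketch using the simplified missile defense game that follows (entries are attacker payoffs; the defender gets the negative). The 70/30 attacker model is a hypothetical assumption chosen only for illustration.

```julia
# Simplified missile defense game: attacker payoffs (rows: attacker Up/Down,
# columns: defender Up/Down); the defender receives the negative.
const ATTACKER_PAYOFF = [-1 1;
                          1 -1]

# POMDP-style defender: assume a fixed model of the attacker, then best-respond to it.
assumed = [0.7, 0.3]                              # hypothetical model: attacker goes Up 70% of the time
defender_values = -(ATTACKER_PAYOFF' * assumed)   # defender's expected payoff for Up, Down
d = argmax(defender_values)                       # 1 = Up

# A shrewd attacker who anticipates this plan simply best-responds to it.
a = argmax(ATTACKER_PAYOFF[:, d])                 # 2 = Down: no collision
println("defender: ", d, "  attacker: ", a, "  defender payoff: ", -ATTACKER_PAYOFF[a, d])
```

Whatever fixed model the defender assumes, its best response in this game is a single pure action, which the attacker can anticipate and avoid.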

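Simplified Missile Defense Game

Entries are (attacker payoff, defender payoff); "Collision" marks the cells where the defender intercepts the missile.

                  Defender: Up          Defender: Down
Attacker: Up      -1, 1 (Collision)     1, -1
Attacker: Down    1, -1                 -1, 1 (Collision)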

No Pure Nash Equilibrium!

Need a broader solution concept: Mixed Nash equilibrium (includes deceptive behavior like bluffing)
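Both claims can be checked numerically. The Julia sketch below (helper names are hypothetical) enumerates the four pure strategy profiles of the missile defense game and finds that each one leaves some player a profitable deviation, then confirms that when both players mix 50/50, each opponent is indifferent between its actions, so neither can gain by deviating.

```julia
# Attacker payoffs for the simplified missile defense game (zero-sum: defender gets the negative).
# Rows: attacker Up/Down. Columns: defender Up/Down. Matching actions mean a collision.
const MISSILE = [-1  1;
                  1 -1]

# A pure profile (i, j) is a Nash equilibrium if neither player gains by deviating alone.
function is_pure_nash(U::AbstractMatrix, i::Int, j::Int)
    attacker_ok = U[i, j] >= maximum(U[:, j])     # attacker maximizes U
    defender_ok = -U[i, j] >= maximum(-U[i, :])   # defender maximizes -U
    return attacker_ok && defender_ok
end

pure_equilibria = [(i, j) for i in 1:2 for j in 1:2 if is_pure_nash(MISSILE, i, j)]
println("pure equilibria: ", pure_equilibria)    # empty: every profile has a profitable deviation

# Mixed strategies: attacker plays Up with probability p, defender plays Up with probability q.
p, q = 0.5, 0.5
x = [p, 1 - p]
y = [q, 1 - q]
println("attacker payoffs vs y: ", MISSILE * y)       # equal, so the attacker cannot gain
println("defender payoffs vs x: ", (-MISSILE)' * x)   # equal, so the defender cannot gain
println("game value: ", x' * MISSILE * y)
```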


Why Game Theory?

Is a Nash equilibrium strategy a good choice in real life against humans?

Yes! Example: Superhuman play in poker with deception through bluffing.

Tabletop Game 2: Poker

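Image: Russell & Norvig, Artificial Intelligence: A Modern Approach

Figure: game tree for a simplified poker game, with chance nodes dealing each player a card (P1: A or K; P2: A or K).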

Part 1: Online Planning in POSGs

Tyler Becker

Joined DECODE AI project Fall 2024

Partially Observable Stochastic Games (POSGs)

  • Environment has state \(s \in S\)
  • Agents \(i = 1, \dots, n\) act simultaneously; their individual actions \(a^i\) form the joint action \(\mathbf{a} = (a^1, \dots, a^n)\)
  • State transitions according to \(T(s' \mid s, \mathbf{a})\)
  • Each agent receives reward \(R^i(s, \mathbf{a}, s')\)
  • Each agent receives an observation \(o^i\) with distribution \(Z(o^i \mid \mathbf{a}, s')\)
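The definition above maps directly onto a generative model. This Julia sketch is illustrative only: `ToyPOSG` and `posg_step` are hypothetical names (not the POMDPs.jl API), and the two-agent tag instance on a five-cell line, with its 0.2 observation-noise probability, is an assumption made up for the example.

```julia
using Random

# Hypothetical container for a POSG's generative model (not from any package).
struct ToyPOSG{TF,RF,ZF}
    n::Int   # number of agents
    T::TF    # (s, a, rng) -> s'        samples s' ~ T(. | s, a)
    R::RF    # (i, s, a, s') -> r^i     reward for agent i
    Z::ZF    # (i, a, s', rng) -> o^i   samples o^i ~ Z(. | a, s')
end

# One simultaneous step: all agents' actions enter together as the joint action `a`.
function posg_step(g::ToyPOSG, s, a::Tuple, rng::AbstractRNG)
    sp = g.T(s, a, rng)
    r = [g.R(i, s, a, sp) for i in 1:g.n]
    o = [g.Z(i, a, sp, rng) for i in 1:g.n]
    return sp, r, o
end

# Example instance (assumption): pursuer and evader on cells 1..5.
# State s = (pursuer cell, evader cell); actions are moves in {-1, 0, +1}.
move(x, d) = clamp(x + d, 1, 5)

tag = ToyPOSG(
    2,
    (s, a, rng) -> (move(s[1], a[1]), move(s[2], a[2])),
    # Zero-sum tag reward: +1 to the pursuer and -1 to the evader when they share a cell.
    (i, s, a, sp) -> (sp[1] == sp[2] ? 1.0 : 0.0) * (i == 1 ? 1.0 : -1.0),
    # Each agent sees its own cell exactly and the other's cell shifted by 1 w.p. 0.2.
    (i, a, sp, rng) -> begin
        other = sp[3 - i]
        noisy = rand(rng) < 0.2 ? move(other, rand(rng, (-1, 1))) : other
        (sp[i], noisy)
    end,
)

rng = MersenneTwister(1)
sp, r, o = posg_step(tag, (1, 3), (1, -1), rng)
println("s' = ", sp, "  rewards = ", r, "  observations = ", o)
```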

Figure: policies \(\pi^1, \pi^2, \dots, \pi^n\) choose actions \(a^1, a^2, \dots, a^n\), which form the joint action \(\mathbf{a}\) applied to an environment with shared state \(s\) in state space \(S\); each agent receives its own observation \(o^i\), together forming \(\mathbf{o}\). Example state space: \(S = \mathbb{R}^{12} \times \mathbb{R}^{12} \times \mathbb{R}^{\infty}\).

Two ways to represent uncertainty

Single Player (POMDP): Beliefs

Extensive Form Game: Information Set

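Image: Russell & Norvig, Artificial Intelligence: A Modern Approach

Figure: the same simplified poker game tree, with information sets grouping the nodes a player cannot tell apart (P1: A or K; P2: A or K).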
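A minimal Julia sketch of the difference, for the deal in a poker-like game. It assumes, for illustration only, a four-card deck with two aces and two kings, with one card dealt to each player. Player 1's information set is the set of deals consistent with its own card; a belief additionally puts probabilities on those deals.

```julia
# Hypothetical deck for illustration: two aces and two kings.
const DECK = ['A', 'A', 'K', 'K']

# All ordered deals (P1 card, P2 card) drawn without replacement, each equally likely.
deals = [(DECK[i], DECK[j]) for i in eachindex(DECK) for j in eachindex(DECK) if i != j]

# Information set: the distinct deals P1 cannot tell apart after seeing its own card.
infoset(card) = unique(d for d in deals if d[1] == card)

# Belief: probability of each deal in the information set, counting equally likely deals.
function belief(card)
    consistent = [d for d in deals if d[1] == card]
    return Dict(d => count(==(d), consistent) // length(consistent) for d in unique(consistent))
end

println("infoset(P1 holds A) = ", infoset('A'))   # P2 might hold A or K
println("belief(P1 holds A)  = ", belief('A'))    # but K is twice as likely as A
```

The information set alone says only that P2 might hold A or K; the belief adds that, once P1 holds an ace, a king is twice as likely for P2 because only one ace remains.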

Online POMDP Planning

Figure: the environment (hidden state \(s\)) emits an observation \(o\); the belief updater maintains the belief \(b\); the planner estimates \(Q(b, a)\) and chooses the next action.
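The belief updater can be a particle filter, which is also how CDIT approximates hidden state later in this talk. The Julia sketch below is a plain bootstrap particle filter for the "listen" action of the tiger problem, reusing the 0.85 listening accuracy from the QuickPOMDP model at the end of this deck. It does not use the POMDPs.jl updater API, and the function names are hypothetical.

```julia
using Random

# Tiger problem pieces, mirroring the QuickPOMDP model at the end of this deck.
const DOORS = ["left", "right"]
listen_likelihood(o, sp) = o == sp ? 0.85 : 0.15   # P(o | a = "listen", s')

# Bootstrap particle filter update for the "listen" action:
#   1. propagate each particle through the transition (listening leaves the tiger in place),
#   2. weight each particle by the likelihood of the received observation,
#   3. resample particles in proportion to their weights.
function update_listen(particles::Vector{String}, o::String, rng::AbstractRNG)
    propagated = copy(particles)                         # Deterministic(s) transition
    w = [listen_likelihood(o, sp) for sp in propagated]
    cdf = cumsum(w ./ sum(w))
    n = length(propagated)
    # min(..., n) guards against round-off leaving cdf[end] slightly below 1.
    return [propagated[min(searchsortedfirst(cdf, rand(rng)), n)] for _ in 1:n]
end

rng = MersenneTwister(42)
b0 = [rand(rng, DOORS) for _ in 1:10_000]                 # uniform initial belief
b2 = update_listen(update_listen(b0, "left", rng), "left", rng)
p_left = count(==("left"), b2) / length(b2)
# Exact Bayes answer: 0.85^2 / (0.85^2 + 0.15^2) ≈ 0.970
println("P(tiger left) ≈ ", p_left)
```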

Why are POMDPs difficult?

  1. Curse of History
  2. Curse of dimensionality
    1. State space
    2. Observation space
    3. Action space

[Lim et al., 2023, JAIR]
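A quick count shows how fast the curse of history grows: with \(|A|\) actions and \(|O|\) observations there are \((|A||O|)^d\) action-observation histories of length \(d\). The Julia sketch below evaluates this for the tiger problem from the end of this deck (\(|A| = 3\), \(|O| = 2\)).

```julia
# Number of distinct action-observation histories of length d: (|A| * |O|)^d.
num_histories(nA, nO, d) = big(nA * nO)^d

# Tiger problem: 3 actions, 2 observations.
for d in (1, 5, 10, 20)
    println("depth ", d, ": ", num_histories(3, 2, d))
end
# depth 10 already gives 60466176 histories
```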

Tabletop Game 1: Go

Improvements Needed

  1. Simultaneous Play
  2. State Uncertainty
Figure: policy network and value network

POSGs: Challenges and Existing Tools

  • Large state spaces
  • Partial observations
  • Strategic opponents
  • Deception and counter-deception
  • Our approach: Build on these three tools
Tool              Good at               Missing
POMDP* solvers    State uncertainty     Strategic opponents
EFG** solvers     Strategic opponents   Probabilistic beliefs
AlphaZero         Learned search        Simultaneous moves

* A POMDP (Partially Observable Markov Decision Process) is a single-agent POSG (much easier to solve)

** EFG = Extensive Form Game

Image: Russell & Norvig, Artificial Intelligence: A Modern Approach

  • CDIT (conditional distribution infoset trees) = particles + information sets
  • Particles approximate hidden state
  • Information sets preserve strategic uncertainty
  • CFR (counterfactual regret minimization) searches for low-exploitability policies
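CFR's core update is regret matching, applied at every information set. As a hedged illustration (regret matching only, in self-play on the one-shot missile defense game, not CFR on a tree and not CDIT), this Julia sketch shows the players' average strategies approaching the 50/50 equilibrium as exploitability shrinks. The initial bias toward Up is an arbitrary choice so the dynamics are non-trivial.

```julia
# Attacker payoffs for the simplified missile defense game; the defender receives the negative.
const PAYOFF = [-1.0  1.0;
                 1.0 -1.0]   # rows: attacker Up/Down, columns: defender Up/Down

# Regret matching: play each action in proportion to its positive cumulative regret
# (uniformly if no action has positive regret).
function regret_matching(regret::Vector{Float64})
    pos = max.(regret, 0.0)
    total = sum(pos)
    return total > 0 ? pos ./ total : fill(1.0 / length(regret), length(regret))
end

# Self-play with full expected-payoff feedback; returns both players' average strategies.
function self_play(iters::Int)
    r_att = [1.0, 0.0]   # hypothetical initial bias toward Up
    r_def = zeros(2)
    sum_att = zeros(2)
    sum_def = zeros(2)
    for _ in 1:iters
        x = regret_matching(r_att)        # attacker's current mixed strategy
        y = regret_matching(r_def)        # defender's current mixed strategy
        u_att = PAYOFF * y                # attacker's expected payoff for each action
        u_def = -(PAYOFF' * x)            # defender's expected payoff for each action
        r_att .+= u_att .- x' * u_att     # regret: action payoff minus current expected payoff
        r_def .+= u_def .- y' * u_def
        sum_att .+= x
        sum_def .+= y
    end
    return sum_att ./ iters, sum_def ./ iters
end

# Exploitability (NashConv): the total payoff both players could gain by best-responding.
exploitability(x, y) = maximum(PAYOFF * y) + maximum(-(PAYOFF' * x))

xbar, ybar = self_play(100_000)
println("attacker avg: ", xbar, "  defender avg: ", ybar,
        "  exploitability: ", exploitability(xbar, ybar))
```

Regret matching keeps average regret shrinking on the order of \(1/\sqrt{T}\), and in a two-player zero-sum game the exploitability of the average strategies equals the sum of the players' average regrets, so it shrinks at the same rate.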

Particle-Filter POSGs

Online Planning Structure: CDIT

[Becker & Sunberg, AAMAS 2025]

Our approach: combine particle filtering and information sets

Figure: CDIT tree with joint-belief nodes and joint-action branches

  • Standard uncertainty-aware planners handle hidden states, but not strategic interaction
  • Standard game-theoretic planners handle other agents, but scale poorly to physical state spaces
  • Need planning methods for uncertain, multi-agent, real-world environments
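One way to picture combining particles with information sets is shown in this Julia sketch. It is a hypothetical simplification for illustration, not the CDIT algorithm from [Becker & Sunberg, AAMAS 2025]: a joint-belief node holds state particles, expanding a joint action simulates every particle and groups the results by joint observation, and a player's information set is the set of children that agree on that player's own observation. The toy step function (a fixed 2-bit hidden state where each player sees one bit) is an assumption.

```julia
using Random

# Hypothetical simplification for illustration; NOT the published CDIT algorithm.
# A joint-belief node stores state particles.
struct BeliefNode
    particles::Vector{Int}
end

# Expanding a joint action simulates every particle and groups the results by
# joint observation, giving one child node per joint observation.
function expand(node::BeliefNode, a, stepfn, rng::AbstractRNG)
    groups = Dict{Any,Vector{Int}}()
    for s in node.particles
        sp, o = stepfn(s, a, rng)              # o is the joint observation (o^1, o^2)
        push!(get!(groups, o, Int[]), sp)
    end
    return Dict(o => BeliefNode(ps) for (o, ps) in groups)
end

# Player i cannot distinguish children that share its own observation o^i:
# together they form that player's information set.
infoset(children, i, oi) = [o for o in keys(children) if o[i] == oi]

# Toy step (assumption): the hidden state is a 2-bit cell in 0:3 that does not move;
# player 1 observes the high bit and player 2 observes the low bit.
toy_step(s, a, rng) = (s, ((s >> 1) & 1, s & 1))

rng = MersenneTwister(0)
root = BeliefNode([0, 1, 2, 3, 0, 1, 2, 3])
children = expand(root, (:stay, :stay), toy_step, rng)
println("children: ", sort(collect(keys(children))))
println("player 1 infoset after seeing 0: ", sort(infoset(children, 1, 0)))
```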

Test Environment: Partially Observable Tag

[Becker & Sunberg, AAMAS 2025]

  • Equilibrium policies are stochastic and hide intent
  • Exploitability, which measures vulnerability to a best-responding opponent, demonstrably decreases

What the CDIT Gives Us

[Becker & Sunberg, AAMAS 2025]

  • Equilibrium search without state enumeration
  • Continuous-state POSG planning
  • Probabilistic exploitability bounds
  • No direct dependence on state space size

Combining with Learning: Simultaneous AlphaZero

  • AlphaZero assumes turn-taking
  • In Markov games (and POSGs), players move simultaneously
  • Each state becomes a matrix game
  • Search improves both players' policies
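To make "each state becomes a matrix game" concrete: at a state, the matrix entries would be value estimates for each joint action, and the players' policies come from solving that game. The Julia sketch below is a textbook closed-form solver for 2x2 zero-sum games (look for a saddle point, otherwise mix so the opponent is indifferent). The example values are made up, and this is not claimed to be how Simultaneous AlphaZero solves its matrix games.

```julia
# Solve a 2x2 zero-sum matrix game. Q[i, j] is the row player's value for joint action (i, j);
# the column player receives -Q[i, j]. Returns (row strategy, column strategy, game value).
function solve_2x2(Q::AbstractMatrix{<:Real})
    # Saddle point: an entry that is the minimum of its row and the maximum of its column.
    for i in 1:2, j in 1:2
        if Q[i, j] == minimum(Q[i, :]) && Q[i, j] == maximum(Q[:, j])
            x = [i == 1 ? 1.0 : 0.0, i == 2 ? 1.0 : 0.0]
            y = [j == 1 ? 1.0 : 0.0, j == 2 ? 1.0 : 0.0]
            return x, y, float(Q[i, j])
        end
    end
    # No saddle point: each player mixes so that the opponent is indifferent.
    a, b, c, d = Q[1, 1], Q[1, 2], Q[2, 1], Q[2, 2]
    den = a - b - c + d
    p = (d - c) / den   # probability the row player picks action 1
    q = (d - b) / den   # probability the column player picks action 1
    x, y = [p, 1 - p], [q, 1 - q]
    return x, y, x' * Q * y   # game value: expected payoff under the mixed equilibrium
end

# Missile defense payoffs: no saddle point, so both players mix 50/50 and the value is 0.
x1, y1, v1 = solve_2x2([-1 1; 1 -1])

# Made-up value estimates at some state: entry (1, 2) is a saddle point, so play is pure.
x2, y2, v2 = solve_2x2([0.2 0.1; 0.0 -0.3])
println(x1, " ", y1, " ", v1, "  |  ", x2, " ", y2, " ", v2)
```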

Figure: AlphaZero vs. Simultaneous AlphaZero (ours)

Search Improves Robustness

  • Best-response value improves; exploitability falls
  • Mixed actions prevent exploitation
  • Deeper search provably reduces exploitability

Counter-Deception in SDA

  • Target exploits eclipse and occlusion
  • Observer learns sensing geometry and robust maneuvering strategies
  • Observer satellite seeks well-illuminated vantage points of the target satellite

1. Simultaneous Play:

Space Domain Awareness

[Becker & Sunberg, AMOS 2025]

What about state uncertainty?

Up Next: Policy State Approximation

Reduce histories to a representative set to make online planning tractable

Part 2: Optimal Deception on Shared Channels

Mel Krusniak

Joined DECODE AI project Jan. 2026

Motivation: Informed Pursuit-Evasion

Past work: How do we teach adversarial agents in trajectory space to hide and seek information about each other?

Present work: How do we teach a higher-level broadcaster to publicly share the "right" information with an ally? 

Goal: Optimal Informer Behavior

Consider a three-player interaction with one player at a higher informational level.
 

  • Pursuer, Evader: Competition with side goals, acting based on semi-public information.
     
  • Informer: Sees the full state. Determines what semi-public information is broadcast.

 

Research thrust: Automatically choose what information to broadcast, what channel to use, and how precisely to convey it.

 

Preliminary Methods

"When should I lie?"
(or "what channels are safe?")
 


Test environment: Discrete tag in a 2x2 grid.
Four states, two bits of information.
 

  • Adversariality: Evader has a side goal.
    It seeks a goal state with weight \((1-\rho)\).
    Goal is random, but seen by the informer.

     
  • Noise: Pursuer has worse observations.
    The second bit (or "channel") is perturbed with some probability.

\[r^{(\text{evd})}(a) = (1 - \rho) \; r^{(\text{evd})}_{\text{reach\_goal}}(a) + \rho \;r^{(\text{evd})}_{\text{avoid\_psr}}(a)\]
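A minimal Julia sketch of the test environment's two ingredients, under stated assumptions: a cell of the 2x2 grid is encoded as a (row bit, column bit) pair, the pursuer's copy of the broadcast has its second bit flipped with probability `flip_prob` (a hypothetical parameter name), and the evader's reward mixes its two objectives with weight \(\rho\) as in the equation above. The component reward values passed in are placeholders.

```julia
using Random

# Assumed encoding: a cell of the 2x2 grid is a (row bit, column bit) pair.
# Pursuer's noisy copy of a broadcast: the second bit ("channel") flips with probability
# flip_prob (a hypothetical parameter name); the first bit always arrives intact.
function pursuer_view(bits::Tuple{Int,Int}, flip_prob::Real, rng::AbstractRNG)
    b2 = rand(rng) < flip_prob ? 1 - bits[2] : bits[2]
    return (bits[1], b2)
end

# Evader reward from the equation: (1 - ρ) * reach-goal term + ρ * avoid-pursuer term.
evader_reward(ρ, r_reach_goal, r_avoid_psr) = (1 - ρ) * r_reach_goal + ρ * r_avoid_psr

rng = MersenneTwister(0)
msg = (1, 0)                                  # row bit 1, column bit 0
println(pursuer_view(msg, 0.0, rng))          # never flipped: (1, 0)
println(pursuer_view(msg, 1.0, rng))          # always flipped: (1, 1)

# Empirical check: with flip_prob = 0.2, roughly 20% of copies have the second bit flipped.
flips = count(_ -> pursuer_view(msg, 0.2, rng)[2] != msg[2], 1:10_000)
println("fraction flipped ≈ ", flips / 10_000)

println(evader_reward(0.25, 1.0, -1.0))       # 0.75 * 1.0 + 0.25 * (-1.0) = 0.5
```

In this sketch the first bit is never perturbed, so it plays the role of the "safe" channel.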

Long Term Goals

This is one example of an "information game." Others of interest include:

  • Channel takeover prevention / "confirm-then-commit" tactics
  • Dogwhistle / "double speak" identification
  • Information denial via mixed strategies

Game theory, machine learning, and control theory all contribute useful tools, but converting between formalizations is fraught, error-prone, and inefficient.

Decisions.jl: Representing and solving games with arbitrary decision networks

Open Source Software!

[Krusniak et al., AAMAS 2026]

Decisions.jl

Arbitrary Dynamic Decision Networks

POMDPs.jl

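# Classic tiger problem: the tiger hides behind the left or right door.
using POMDPs, QuickPOMDPs, POMDPTools, QMDP

m = QuickPOMDP(
    states = ["left", "right"],               # which door hides the tiger
    actions = ["left", "right", "listen"],    # open a door or listen
    observations = ["left", "right"],         # which side the growl seems to come from
    initialstate = Uniform(["left", "right"]),
    discount = 0.95,

    transition = function (s, a)
        if a == "listen"
            return Deterministic(s) # the tiger stays behind the same door
        else # a door is opened
            return Uniform(["left", "right"]) # reset
        end
    end,

    observation = function (s, a, sp)
        if a == "listen"
            if sp == "left"
                return SparseCat(["left", "right"], [0.85, 0.15]) # correct side 85% of the time
            else
                return SparseCat(["right", "left"], [0.85, 0.15])
            end
        else
            return Uniform(["left", "right"]) # opening a door gives no information
        end
    end,

    reward = function (s, a)
        if a == "listen"
            return -1.0 # small cost for listening
        elseif s == a # the tiger was found
            return -100.0
        else # the tiger was escaped
            return 10.0
        end
    end
)

# QMDP approximates the solution by assuming all state uncertainty disappears after the next step.
solver = QMDPSolver()
policy = solve(solver, m)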

Thank You!

(Opinions are my own)

DECODE AI May 2026

By Zachary Sunberg
