Aleatory
Epistemic (Static)
Epistemic (Dynamic)
Interaction
Markov Decision Process
Reinforcement Learning
POMDP
Game
Called a Normal Form, Simple, or Bimatrix Game
Question for today: What solution concept should we use for games?
| | S | W |
|---|---|---|
| S | 4 | 2 |
| W | 3 | 1 |

Alice's Payoffs (players A and B)
| | S | W |
|---|---|---|
| S | 4 | 3 |
| W | 2 | 1 |

Bob's Payoffs (players A and B)
| | S | W |
|---|---|---|
| S | 4, 4 | 2, 3 |
| W | 3, 2 | 1, 1 |

Players: Alice, Bob
From 2021: This example isn't great
Definitions
Deterministic Best Response:
Action \(a^i\) is a deterministic best response to \(a^{-i}\) if \[R^i (a^i, a^{-i}) \geq R^i ({a^i}', a^{-i}) \quad \forall {a^i}'\]
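A minimal sketch of the definition above for a two-player game. The function name `best_responses` and the convention that rows index the first player's actions are assumptions made for illustration; the payoff values are Alice's from the table above.

```python
def best_responses(payoff, other_action, player):
    """Deterministic best responses of `player` (0 = row, 1 = column)
    to the other player's fixed action.

    payoff[r][c] is this player's reward when row plays r and column plays c.
    """
    if player == 0:
        # Row player: compare rows within the opponent's chosen column.
        values = [payoff[r][other_action] for r in range(len(payoff))]
    else:
        # Column player: compare columns within the opponent's chosen row.
        values = list(payoff[other_action])
    best = max(values)
    # Ties are kept: every maximizing action is a best response.
    return [a for a, v in enumerate(values) if v == best]


# Alice's payoffs (assumed: rows are Alice's actions, index 0 = S, 1 = W).
R_alice = [[4, 2],
           [3, 1]]
alice_vs_S = best_responses(R_alice, other_action=0, player=0)  # [0], i.e. S
alice_vs_W = best_responses(R_alice, other_action=1, player=0)  # [0], i.e. S
```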
Is the dominant strategy equilibrium always the best outcome for the players?
| | S | W |
|---|---|---|
| S | 4, 4 | 2, 3 |
| W | 3, 2 | 1, 1 |

Players: Alice, Bob
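One way to check for a dominant strategy equilibrium is to test, for each player, whether a single action is at least as good as every alternative against every opponent action. The sketch below does this for the Alice/Bob payoffs; `dominant_actions` is a hypothetical helper, and it assumes rows are Alice's actions and columns are Bob's.

```python
def dominant_actions(R, player):
    """Weakly dominant actions of `player` (0 = row, 1 = column).

    R[r][c] is this player's reward when row plays r and column plays c.
    """
    n_rows, n_cols = len(R), len(R[0])
    if player == 0:
        u = R  # u[own][opp]
    else:
        # Transpose so the column player's own action is the first index.
        u = [[R[r][c] for r in range(n_rows)] for c in range(n_cols)]
    n_own, n_opp = len(u), len(u[0])
    return [a for a in range(n_own)
            if all(u[a][o] >= u[b][o]
                   for b in range(n_own) for o in range(n_opp))]


# Index 0 = S, index 1 = W (assumed ordering).
R_alice = [[4, 2], [3, 1]]
R_bob = [[4, 3], [2, 1]]
alice_dom = dominant_actions(R_alice, player=0)  # [0]: S is dominant for Alice
bob_dom = dominant_actions(R_bob, player=1)      # [0]: S is dominant for Bob
```

If every player has a dominant action, the resulting joint action is a dominant strategy equilibrium; an empty list for any player means no such equilibrium exists.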
Do all simple games have a dominant strategy equilibrium?
| | S | T |
|---|---|---|
| S | -1, -1 | -4, 0 |
| T | 0, -4 | -3, -3 |

Players: Player 1, Player 2
Pure Nash Equilibrium: All players play a deterministic best response.
Example: Airborne Collision Avoidance

| Player 1 \ Player 2 | Up | Down |
|---|---|---|
| Up | -5, -5 (collision) | -1, 0 |
| Down | 0, -1 | -4, -4 (collision) |
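Pure Nash equilibria of a small bimatrix game can be found by brute force: check every cell and keep those where neither player gains by deviating alone. The sketch below applies this to the collision avoidance payoffs; `pure_nash_equilibria` is a hypothetical helper, and it assumes Player 1 chooses the row and that index 0 = Up, 1 = Down.

```python
from itertools import product


def pure_nash_equilibria(R1, R2):
    """All (row, col) cells where both players play a deterministic best response.

    R1[r][c] and R2[r][c] are the row and column player's rewards.
    """
    n_rows, n_cols = len(R1), len(R1[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row player cannot improve by switching rows within column c.
        row_ok = all(R1[r][c] >= R1[r2][c] for r2 in range(n_rows))
        # Column player cannot improve by switching columns within row r.
        col_ok = all(R2[r][c] >= R2[r][c2] for c2 in range(n_cols))
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


# Airborne collision avoidance (assumed indices: 0 = Up, 1 = Down).
R1 = [[-5, -1], [0, -4]]
R2 = [[-5, 0], [-1, -4]]
collision_eq = pure_nash_equilibria(R1, R2)  # [(0, 1), (1, 0)]
```

The two equilibria are (Up, Down) and (Down, Up): the aircraft avoid each other, but each would prefer that the other be the one to climb.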
Do all simple games have a pure Nash equilibrium?
Which equilibrium is better?
Missile Defense (simplified)

| Attacker \ Defender | Up | Down |
|---|---|---|
| Up | -1, 1 (collision) | 1, -1 |
| Down | 1, -1 | -1, 1 (collision) |
No Pure Nash Equilibrium!
Need a broader solution concept: Mixed Nash equilibrium.
Single Player Joint
Best Response: Given a joint policy of all other agents, \(\pi^{-i}\), a best response is a policy \(\pi^i\) that satisfies
\[U^i\left(\pi^i, \pi^{-i} \right) \geq U^i\left({\pi^i}',\pi^{-i}\right)\]
for all other \({\pi^i}'\).
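For a two-player game with stochastic policies, the utility \(U^i\) can be read as an expected reward, and a best response can always be found among the deterministic policies because the expected reward is linear in the player's own policy. The sketch below illustrates this for the row player; the function names and the defender policy `[0.75, 0.25]` are assumptions chosen for illustration.

```python
def expected_utility(R, pi_row, pi_col):
    """Expected reward under independent mixed policies.

    R[r][c] is the reward; pi_row and pi_col are probability vectors.
    """
    return sum(pi_row[r] * pi_col[c] * R[r][c]
               for r in range(len(pi_row)) for c in range(len(pi_col)))


def row_best_response(R, pi_col):
    """Deterministic best response of the row player to pi_col.

    Returns (best_action, per_action_values).
    """
    values = [sum(pi_col[c] * R[r][c] for c in range(len(pi_col)))
              for r in range(len(R))]
    best = max(range(len(R)), key=lambda r: values[r])
    return best, values


# Attacker's rewards in the simplified missile defense game
# (assumed indices: 0 = Up, 1 = Down; attacker chooses the row).
R_attacker = [[-1, 1], [1, -1]]
pi_defender = [0.75, 0.25]  # hypothetical defender policy: mostly Up
best, values = row_best_response(R_attacker, pi_defender)
# values == [-0.5, 0.5], so the attacker's best response is Down (index 1)
```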
Two Player Zero Sum: \[R^1(a) + R^2(a) = 0 \quad \forall a\]
Each cell of the payoff matrix holds the pair \((R^1(a^1, a^2), R^2(a^1, a^2))\) for the joint action \(a = (a^1, a^2)\).
Nash Equilibrium
Missile Defense (simplified)

| Attacker \ Defender | Up | Down |
|---|---|---|
| Up | -1, 1 (collision) | 1, -1 |
| Down | 1, -1 | -1, 1 (collision) |
Do all simple games have at least one Nash equilibrium?
Yes!! (might be mixed)
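For a 2x2 game, a fully mixed Nash equilibrium can be computed from indifference conditions: each player randomizes so that the other player is indifferent between its two actions. The sketch below is a minimal illustration, not a general solver; `mixed_nash_2x2` is a hypothetical helper and it assumes a fully mixed equilibrium exists (nonzero denominators, probabilities strictly between 0 and 1).

```python
from fractions import Fraction


def mixed_nash_2x2(R1, R2):
    """Fully mixed equilibrium (p, q) of a 2x2 bimatrix game.

    p = probability the row player plays action 0,
    q = probability the column player plays action 0.
    Assumes such an equilibrium exists.
    """
    # p makes the column player indifferent:
    # p*R2[0][0] + (1-p)*R2[1][0] == p*R2[0][1] + (1-p)*R2[1][1]
    p = Fraction(R2[1][1] - R2[1][0],
                 R2[0][0] - R2[1][0] - R2[0][1] + R2[1][1])
    # q makes the row player indifferent:
    # q*R1[0][0] + (1-q)*R1[0][1] == q*R1[1][0] + (1-q)*R1[1][1]
    q = Fraction(R1[1][1] - R1[0][1],
                 R1[0][0] - R1[0][1] - R1[1][0] + R1[1][1])
    return p, q


# Simplified missile defense (attacker = row, defender = column; 0 = Up).
R_att = [[-1, 1], [1, -1]]
R_def = [[1, -1], [-1, 1]]
p, q = mixed_nash_2x2(R_att, R_def)  # both equal Fraction(1, 2)
```

Both players choose Up and Down with probability 1/2, which is the mixed equilibrium of the game that has no pure one.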
| | Stag | Hare |
|---|---|---|
| Stag | 4, 4 | 1, 3 |
| Hare | 3, 1 | 2, 2 |
Correlated Equilibrium
Topology of bimatrix games:
Iterated Best Response: cycle randomly through the agents, with each selected agent switching to a best response to the other agents' current policies (converges to Nash only for certain narrow classes of games)
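A minimal sketch of iterated best response for a two-player game with deterministic policies. The stopping rule, the 50/50 agent selection, and the name `iterated_best_response` are assumptions made for illustration.

```python
import random


def iterated_best_response(R1, R2, start=(0, 0), max_iters=100, seed=0):
    """Randomly pick an agent and let it best-respond; stop at a fixed point.

    Returns ((row, col), converged).
    """
    rng = random.Random(seed)
    r, c = start
    for _ in range(max_iters):
        br_r = max(range(len(R1)), key=lambda a: R1[a][c])
        br_c = max(range(len(R1[0])), key=lambda a: R2[r][a])
        # Fixed point: neither agent can strictly improve, so (r, c)
        # is a pure Nash equilibrium.
        if R1[r][c] >= R1[br_r][c] and R2[r][c] >= R2[r][br_c]:
            return (r, c), True
        if rng.random() < 0.5:
            r = br_r  # row agent updates
        else:
            c = br_c  # column agent updates
    return (r, c), False


# Collision avoidance (0 = Up, 1 = Down): reaches one of the two pure equilibria.
R1 = [[-5, -1], [0, -4]]
R2 = [[-5, 0], [-1, -4]]
joint, converged = iterated_best_response(R1, R2)

# Simplified missile defense: no pure equilibrium, so the iteration never settles.
_, md_converged = iterated_best_response([[-1, 1], [1, -1]], [[1, -1], [-1, 1]])
```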
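Fictitious Play: each agent plays a best response to the empirical distribution of the other agents' past actions (the empirical distributions converge to Nash for a wider class of games, notably zero-sum)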
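A minimal sketch of fictitious play in its standard two-player form, where each agent best-responds to the other's empirical action counts. The initial pseudo-counts of 1, the simultaneous updates, and the iteration count are assumptions made for illustration.

```python
def fictitious_play(R1, R2, iters=2000):
    """Run simultaneous fictitious play; return empirical action frequencies."""
    n_r, n_c = len(R1), len(R1[0])
    counts_r = [1] * n_r  # assumed initial pseudo-counts
    counts_c = [1] * n_c
    for _ in range(iters):
        # Best response to the opponent's counts (proportional to its
        # empirical distribution, so normalization is unnecessary).
        br_r = max(range(n_r),
                   key=lambda a: sum(counts_c[c] * R1[a][c] for c in range(n_c)))
        br_c = max(range(n_c),
                   key=lambda a: sum(counts_r[r] * R2[r][a] for r in range(n_r)))
        counts_r[br_r] += 1
        counts_c[br_c] += 1
    total_r, total_c = sum(counts_r), sum(counts_c)
    return ([x / total_r for x in counts_r],
            [x / total_c for x in counts_c])


# Simplified missile defense (zero-sum): the frequencies approach the
# mixed equilibrium in which both players randomize evenly.
freq_att, freq_def = fictitious_play([[-1, 1], [1, -1]], [[1, -1], [-1, 1]])
```

Note that it is the empirical frequencies that approach the equilibrium; the action actually played at each step keeps switching.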
Correlated Equilibrium
| | B | S |
|---|---|---|
| B | 2, 1 | 0, 0 |
| S | 0, 0 | 1, 2 |
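A joint distribution over action pairs is a correlated equilibrium if no player can gain by deviating from a recommended action, given what that recommendation reveals about the other player's action. The sketch below checks this condition for the game above; `is_correlated_equilibrium` is a hypothetical helper, and the fair-coin distribution over (B, B) and (S, S) is an assumed illustrative example.

```python
def is_correlated_equilibrium(R1, R2, P, tol=1e-9):
    """Check the correlated equilibrium constraints for a bimatrix game.

    P[r][c] is the probability that the joint action (r, c) is recommended.
    """
    n_r, n_c = len(R1), len(R1[0])
    # Row player: told to play a, considers deviating to dev.
    for a in range(n_r):
        for dev in range(n_r):
            gain = sum(P[a][c] * (R1[dev][c] - R1[a][c]) for c in range(n_c))
            if gain > tol:
                return False
    # Column player: told to play a, considers deviating to dev.
    for a in range(n_c):
        for dev in range(n_c):
            gain = sum(P[r][a] * (R2[r][dev] - R2[r][a]) for r in range(n_r))
            if gain > tol:
                return False
    return True


# Payoffs from the table above (index 0 = B, 1 = S; first player = row).
R1 = [[2, 0], [0, 1]]
R2 = [[1, 0], [0, 2]]
coin_flip = [[0.5, 0.0], [0.0, 0.5]]  # assumed: fair coin picks (B,B) or (S,S)
ce_ok = is_correlated_equilibrium(R1, R2, coin_flip)  # True
```

Under the coin flip each player expects 1.5, and neither has an incentive to ignore the recommendation.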
https://youtube.com/shorts/w3q77ZZIqwA?si=J8H6L6W5kTRs-mUx