Dynamic Games
- Markov Games
- Two-player zero-sum turn-taking games
Markov Games
Markov Game Definition
\[(I, S, A, T, R, \gamma)\]
- \(I\) = set of players
- \(S\) = state space (combined for all players)
- \(A\) = joint action space
- \(T\) = transition distributions: \(T(s' \mid s, a)\) is the distribution of \(s'\) given state \(s\) and joint action \(a\)
- \(R\) = joint reward function: \(R^i(s, a)\) is the reward for agent \(i\) in state \(s\) when joint action \(a\) is taken
- \(\gamma\) = discount factor
Goofspiel (or Game of Pure Strategy)
Goofspiel(5) Optimal Strategy
Calculated with Dynamic Programming [Rhoads and Bartholdi, 2012]
This is not a game payoff matrix!
Nash equilibrium at every state
Reduction to Simple Game
Best response in a Markov Game
Fictitious Play in a Markov Game
Loop:
1. Simulate with \(\pi^{BR}\)
2. Update \(N(j, a^j, s)\) (\(N\) should reflect *all* past simulations)
3. \(\pi^j (a^j \mid s) \propto N(j, a^j, s) \quad \forall j\)
4. \(\pi^{BR} \gets\) best response to \(\pi\)
\(\pi\) (not necessarily \(\pi^{BR}\)) converges to a Nash equilibrium in some cases, notably 2-player zero-sum games
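The fictitious play loop above, specialized to a single-state zero-sum game (i.e. a matrix game), can be sketched as follows. The matching-pennies payoff matrix and all variable names are illustrative, not from the slides:

```python
# Fictitious play on a single-state, two-action zero-sum game
# (matching pennies) -- the one-state special case of the loop above.

A = [[1.0, -1.0],
     [-1.0, 1.0]]  # row player's payoff; column player gets the negative

N = [[1.0, 1.0], [1.0, 1.0]]  # N[j][a]: action counts (uniform prior)

def best_response(payoffs):
    """Index of the action with the highest expected payoff."""
    return max(range(len(payoffs)), key=lambda a: payoffs[a])

for _ in range(5000):
    # Empirical mixed strategies from the counts
    pi_row = [n / sum(N[0]) for n in N[0]]
    pi_col = [n / sum(N[1]) for n in N[1]]
    # Best responses to the opponent's empirical strategy
    br_row = best_response([sum(A[a][b] * pi_col[b] for b in range(2))
                            for a in range(2)])
    br_col = best_response([sum(-A[a][b] * pi_row[a] for a in range(2))
                            for b in range(2)])
    # Update the counts with the simulated actions
    N[0][br_row] += 1
    N[1][br_col] += 1

print([n / sum(N[0]) for n in N[0]])  # approaches [0.5, 0.5], the Nash equilibrium
```

The pure best responses oscillate forever, but the empirical strategy \(\pi\) built from the counts converges to the equilibrium, which is exactly the claim on this slide.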
Markov Game Recap
- Definition \((I, S, A, T, R, \gamma)\)
- Reduction to simple game
- Computing a best response means solving an MDP - know how to construct the MDP
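The last bullet can be made concrete: fixing the opponent's policy marginalizes it out of the transition and reward functions, leaving an ordinary MDP. A minimal sketch, with a made-up one-state game and all names illustrative:

```python
# Sketch: collapsing a two-player Markov game into the MDP faced by player 1
# when player 2 plays a fixed, known policy pi2. The tiny one-state game and
# all names here are illustrative assumptions, not from the slides.

A1, A2 = ["a", "b"], ["x", "y"]
# T[(s, (a1, a2))] is a distribution over next states s'
T = {("s0", (a1, a2)): {"s0": 1.0} for a1 in A1 for a2 in A2}
# R1[(s, (a1, a2))] is player 1's reward
R1 = {("s0", ("a", "x")): 1.0, ("s0", ("a", "y")): 0.0,
      ("s0", ("b", "x")): 0.0, ("s0", ("b", "y")): 2.0}
pi2 = {"s0": {"x": 0.5, "y": 0.5}}  # player 2's fixed policy

def br_mdp(s, a1):
    """Transition and reward of player 1's MDP, marginalizing out player 2."""
    Tp, Rp = {}, 0.0
    for a2, p in pi2[s].items():
        Rp += p * R1[(s, (a1, a2))]
        for sp, q in T[(s, (a1, a2))].items():
            Tp[sp] = Tp.get(sp, 0.0) + p * q
    return Tp, Rp

print(br_mdp("s0", "b"))  # ({'s0': 1.0}, 1.0): expected reward under pi2
```

Solving this induced MDP (e.g. with value iteration) yields a best response to the fixed opponent policy.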
Turn-taking Games
Terminology
- Max and Min players
- deterministic
- two player
- zero-sum
- perfect information
Minimax Trees
Minimax Tree
MDP Expectimax Tree
\[V(s) = \max_{a \in \mathcal{A}}\left(R(s, a) + \mathbb{E}[V(s')]\right)\]
\[V(s) = \max_{a \in \mathcal{A}_1}\left(R(s, a) + \min_{a' \in \mathcal{A}_2} \left(R(s', a') + V(s'')\right)\right)\]
Tic-Tac-Toe Example
Tree Backup Example
Why is this harder than an MDP? (think back to sparse sampling)
Alpha-Beta Pruning
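A minimal alpha-beta sketch over an explicit game tree, assuming a simple encoding where a leaf is a number (Max's payoff) and an internal node is a list of children, with Max and Min alternating:

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    if not isinstance(node, list):      # leaf: terminal payoff for Max
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:           # Min will never allow this branch
                break
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:           # Max will never allow this branch
                break
        return value

# Classic textbook tree: Max at the root, Min at the three inner nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))  # 3
```

With the standard pruning conditions, the middle subtree is cut off after its first leaf (2), since Max already has 3 guaranteed from the first subtree.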
MCTS for Games
Note: the above example does not follow UCB exploration
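For reference, the UCB1 score that MCTS would normally use for action selection is \(Q(a) + c\sqrt{\ln N / N(a)}\); a small sketch (the exploration constant `c = 1.4` is an illustrative choice):

```python
import math

def ucb1(Q, Na, N, c=1.4):
    """UCB1 score for an action with value estimate Q, visited Na times
    out of N total visits to the parent node."""
    return Q + c * math.sqrt(math.log(N) / Na)

# After 10 total visits, the less-visited action gets a larger exploration bonus
# and can outscore an action with a higher value estimate.
print(ucb1(0.5, 8, 10), ucb1(0.4, 2, 10))
```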
Imperfect Information
Partially Observable Markov Game
Belief updates?
Reduction to Simple Game
Extensive Form Game
(Alternative to POMGs that is more common in the literature)
- Similar to a minimax tree for a turn-taking game
- Chance nodes
- Information sets
Extensive Form Game
Extensive-form game definition (\(h\) is a sequence of actions called a "history"):
- Finite set of \(n\) players, plus the "chance" player
- \(P(h)\) (player at each history)
- \(A(h)\) (set of actions at each history)
- \(I(h)\) (information set that each history maps to)
- \(U(h)\) (payoff for each leaf node in the game tree)
King-Ace Poker Example
- 4 Cards: 2 Aces, 2 Kings
- Each player is dealt a card
- P1 can either raise (\(r\)), increasing the payoff to 2 points, or check (\(k\)), keeping the payoff at 1 point
- If P1 raises, P2 can either call (\(c\)) Player 1's bet, or fold (\(f\)), returning the payoff to 1 point
- The highest card wins
Extensive to Matrix Form
Exponential in number of info states!
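The blowup is easy to count for King-Ace poker: a pure strategy picks one action at each information set, so the number of pure strategies is the product of \(|A(I)|\) over a player's information sets. A small sketch (the info-set labels are illustrative):

```python
# Counting the matrix-form dimensions of King-Ace poker: each pure strategy
# assigns one action per information set, so the count is prod |A(I)| --
# exponential in the number of information sets.
from itertools import product

# Information sets and actions (r = raise, k = check; c = call, f = fold)
p1_infosets = {"holding Ace": ["r", "k"], "holding King": ["r", "k"]}
p2_infosets = {"Ace, facing raise": ["c", "f"], "King, facing raise": ["c", "f"]}

p1_strategies = list(product(*p1_infosets.values()))
p2_strategies = list(product(*p2_infosets.values()))
print(len(p1_strategies), len(p2_strategies))  # 4 4 -> a 4x4 payoff matrix
```

Here the matrix is only \(4 \times 4\), but with \(k\) binary information sets per player it would be \(2^k \times 2^k\).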
Fictitious Play in Extensive Form Games
This slide not covered on exam
Heinrich et al. 2015, "Fictitious Self-Play in Extensive-Form Games"
250 Dynamic Games
By Zachary Sunberg