POMDPs
- We've been living a lie:
s = observe(env)
Types of Uncertainty
Aleatory
Epistemic (Static)
Epistemic (Dynamic)
Interaction
Markov Decision Process (MDP)
- \(\mathcal{S}\) - State space
- \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
- \(\mathcal{A}\) - Action space
- \(R:\mathcal{S}\times \mathcal{A} \to \mathbb{R}\) - Reward
Aleatory
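As a concrete illustration of the tuple above, here is a toy two-state MDP written as explicit tables; the states, actions, and numbers are made up for illustration only.

```python
# A tiny illustrative MDP with two states and two actions.
S = ["healthy", "broken"]
A = ["operate", "repair"]

# T[(s, a)][s'] = probability of moving to s' from s under action a
T = {
    ("healthy", "operate"): {"healthy": 0.9, "broken": 0.1},
    ("healthy", "repair"):  {"healthy": 1.0, "broken": 0.0},
    ("broken",  "operate"): {"healthy": 0.0, "broken": 1.0},
    ("broken",  "repair"):  {"healthy": 0.8, "broken": 0.2},
}

# R[(s, a)] = expected immediate reward for taking a in s
R = {
    ("healthy", "operate"):  1.0,
    ("healthy", "repair"):  -1.0,
    ("broken",  "operate"):  0.0,
    ("broken",  "repair"):  -1.0,
}
```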
Partially Observable Markov Decision Process (POMDP)
- \(\mathcal{S}\) - State space
- \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
- \(\mathcal{A}\) - Action space
- \(R:\mathcal{S}\times \mathcal{A} \to \mathbb{R}\) - Reward
- \(\mathcal{O}\) - Observation space
- \(Z:\mathcal{S} \times \mathcal{A}\times \mathcal{S} \times \mathcal{O} \to \mathbb{R}\) - Observation probability distribution
Aleatory
Epistemic (Static)
Epistemic (Dynamic)
Tiger POMDP Definition
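A sketch of the classic Tiger problem written as explicit tables, using the parameters most commonly seen in the literature (85% accurate listening, -1 per listen, -100 for opening the tiger's door, +10 for the other door, reset after opening). The dictionary layout is just one convenient encoding, not any particular library's API.

```python
# Tiger POMDP as explicit tables.
S = ["tiger-left", "tiger-right"]          # hidden state: where the tiger is
A = ["listen", "open-left", "open-right"]  # actions
Obs = ["hear-left", "hear-right"]          # noisy listening result

# T[(s, a)][s']: listening leaves the state unchanged;
# opening a door resets the tiger to a random door.
T = {}
for s in S:
    T[(s, "listen")] = {s: 1.0}
    T[(s, "open-left")] = {"tiger-left": 0.5, "tiger-right": 0.5}
    T[(s, "open-right")] = {"tiger-left": 0.5, "tiger-right": 0.5}

# Z[(a, s')][o]: listening is 85% accurate; observations after opening
# a door carry no information (uniform).
Z = {}
for sp in S:
    correct = "hear-left" if sp == "tiger-left" else "hear-right"
    wrong = "hear-right" if sp == "tiger-left" else "hear-left"
    Z[("listen", sp)] = {correct: 0.85, wrong: 0.15}
    Z[("open-left", sp)] = {"hear-left": 0.5, "hear-right": 0.5}
    Z[("open-right", sp)] = {"hear-left": 0.5, "hear-right": 0.5}

# R[(s, a)]: -1 to listen, -100 for the tiger's door, +10 for the other door.
R = {
    ("tiger-left", "listen"): -1.0,       ("tiger-right", "listen"): -1.0,
    ("tiger-left", "open-left"): -100.0,  ("tiger-right", "open-left"): 10.0,
    ("tiger-left", "open-right"): 10.0,   ("tiger-right", "open-right"): -100.0,
}
```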
Hidden Markov Models and Beliefs
\(P(s_{t+1} \mid s_0, a_0, \ldots, s_t, a_t) = T(s_{t+1} \mid s_t, a_t)\)
\(P(s_{t+1} \mid o_0, a_0, \ldots, o_t, a_t) = P(s_{t+1} \mid a_t, o_t)\text{????}\)
Let
- \(b_0(s) \equiv P(s_0 = s)\)
- \(h_t \equiv (b_0, a_0, o_1, a_1, \ldots, a_{t-1}, o_t)\)
- \(b_t(s) \equiv P(s_t = s \mid h_t)\)
Bayesian Belief Updates
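One way to write the recursive update, consistent with the \(T\) and \(Z\) defined earlier (the proportionality is resolved by normalizing over \(s'\)):

\[
b_{t+1}(s') \;\propto\; \sum_{s \in \mathcal{S}} Z(o_{t+1} \mid s, a_t, s')\, T(s' \mid s, a_t)\, b_t(s), \qquad \sum_{s' \in \mathcal{S}} b_{t+1}(s') = 1.
\]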
Filtering Loop
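A minimal sketch of one step of the filtering loop in Python, assuming tables shaped like the Tiger encoding above and the common simplification that the observation depends only on the action and the next state.

```python
def update_belief(b, a, o, S, T, Z):
    """One filtering step: predict through T, correct with Z, normalize."""
    bp = {}
    for sp in S:
        # Predict: probability of landing in sp given belief b and action a
        pred = sum(T[(s, a)].get(sp, 0.0) * b[s] for s in S)
        # Correct: weight by the likelihood of the received observation o
        bp[sp] = Z[(a, sp)][o] * pred
    total = sum(bp.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {sp: p / total for sp, p in bp.items()}

# e.g. with the Tiger tables above:
# b = {"tiger-left": 0.5, "tiger-right": 0.5}
# b = update_belief(b, "listen", "hear-left", S, T, Z)  # -> {'tiger-left': 0.85, ...}
```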
Tiger Example
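Plugging in the standard Tiger parameters: starting from the uniform belief \(b_0 = (0.5, 0.5)\), listening and hearing the tiger on the left gives

\[
b_1(\text{tiger-left}) = \frac{0.85 \times 0.5}{0.85 \times 0.5 + 0.15 \times 0.5} = 0.85,
\]

and a second consistent \(\text{hear-left}\) observation pushes the belief to about \(0.97\).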
Recap
By Zachary Sunberg