POMDPs

  • We've been living a lie:
s = observe(env)
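  • In a POMDP the agent never sees the state directly; it only receives an observation:
o = observe(env)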

Types of Uncertainty

  • Aleatory
  • Epistemic (Static)
  • Epistemic (Dynamic)
  • Interaction

Markov Decision Process (MDP)

  • \(\mathcal{S}\) - State space
  • \(\mathcal{A}\) - Action space
  • \(T : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}\) - Transition probability distribution
  • \(R : \mathcal{S} \times \mathcal{A} \to \mathbb{R}\) - Reward

Uncertainty modeled: Aleatory

Partially Observable Markov Decision Process (POMDP)

  • \(\mathcal{S}\) - State space
  • \(\mathcal{A}\) - Action space
  • \(T : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}\) - Transition probability distribution
  • \(R : \mathcal{S} \times \mathcal{A} \to \mathbb{R}\) - Reward
  • \(\mathcal{O}\) - Observation space
  • \(Z : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \times \mathcal{O} \to \mathbb{R}\) - Observation probability distribution

Uncertainty modeled: Aleatory, Epistemic (Static), Epistemic (Dynamic)

Tiger POMDP Definition
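
A minimal sketch of the classic Tiger problem (Kaelbling, Littman, and Cassandra) in plain Python, instantiating the tuple above; the slides do not include this code. The numbers are the standard ones: listening costs -1 and reports the correct side 85% of the time, opening the safe door gives +10, and opening the tiger's door gives -100.

S = ["tiger-left", "tiger-right"]          # hidden state: which door hides the tiger
A = ["listen", "open-left", "open-right"]  # actions
O = ["hear-left", "hear-right"]            # noisy observations

def T(s, a, sp):
    """Transition probability T(s' | s, a)."""
    if a == "listen":
        return 1.0 if sp == s else 0.0     # listening does not move the tiger
    return 0.5                             # opening a door resets the tiger uniformly

def Z(s, a, sp, o):
    """Observation probability Z(o | s, a, s')."""
    if a == "listen":
        correct = (sp == "tiger-left" and o == "hear-left") or \
                  (sp == "tiger-right" and o == "hear-right")
        return 0.85 if correct else 0.15   # growl heard on the correct side 85% of the time
    return 0.5                             # observations carry no information after opening

def R(s, a):
    """Reward R(s, a)."""
    if a == "listen":
        return -1.0
    opened_tiger = (a == "open-left" and s == "tiger-left") or \
                   (a == "open-right" and s == "tiger-right")
    return -100.0 if opened_tiger else 10.0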

Hidden Markov Models and Beliefs

\(P(s_{t+1} \mid s_0, a_0, \ldots, s_t, a_t) = T(s_{t+1} \mid s_t, a_t)\)

\(P(s_{t+1} \mid o_0, a_0, \ldots, o_t, a_t) \stackrel{?}{=} P(s_{t+1} \mid a_t, o_t)\) - no! Unlike the state process, the observation process is not Markov, which motivates tracking a belief.

Let

  • \(b_0(s) \equiv P(s_0 = s)\)
  • \(h_t \equiv (b_0, a_0, o_1, a_1, \ldots, a_{t-1}, o_t)\)
  • \(b_t(s) \equiv P(s_t = s \mid h_t)\)

Bayesian Belief Updates
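
The slides state no formula here; one standard form of the update, written with the \(T\) and \(Z\) signatures defined above, follows directly from Bayes' rule:

\[
b_{t+1}(s') = \frac{\sum_{s \in \mathcal{S}} Z(o_{t+1} \mid s, a_t, s') \, T(s' \mid s, a_t) \, b_t(s)}{\sum_{s'' \in \mathcal{S}} \sum_{s \in \mathcal{S}} Z(o_{t+1} \mid s, a_t, s'') \, T(s'' \mid s, a_t) \, b_t(s)}
\]

The denominator is just the normalizer \(P(o_{t+1} \mid b_t, a_t)\), and the resulting belief \(b_{t+1}\) is a sufficient statistic for the history \(h_{t+1}\).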

Filtering Loop
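
A runnable sketch of the filtering loop, assuming the dictionary-style S, T, and Z from the Tiger sketch above; the fixed action-observation sequence stands in for a real policy and environment, which the slides leave abstract.

def update_belief(b, a, o, S, T, Z):
    """One discrete Bayes filter step: predict through T, correct with Z, normalize."""
    bp = {sp: sum(Z(s, a, sp, o) * T(s, a, sp) * b[s] for s in S) for sp in S}
    total = sum(bp.values())
    return {sp: p / total for sp, p in bp.items()}

b = {s: 1.0 / len(S) for s in S}                     # b_0: uniform prior
for a, o in [("listen", "hear-left"), ("listen", "hear-left")]:
    b = update_belief(b, a, o, S, T, Z)              # online, a would come from a policy
    print(a, o, b)                                   # ... and o from the environment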

Tiger Example
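
A worked update under the standard Tiger parameters above: starting from the uniform prior \(b_0 = (0.5, 0.5)\), listening once and hearing the tiger on the left gives

\[
b_1(\text{tiger-left}) = \frac{0.85 \times 0.5}{0.85 \times 0.5 + 0.15 \times 0.5} = 0.85,
\]

and a second hear-left gives \(b_2(\text{tiger-left}) = 0.85^2 / (0.85^2 + 0.15^2) \approx 0.97\). Two conflicting observations cancel exactly, returning the belief to \(0.5\).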

Recap

By Zachary Sunberg