(we think)
(approximately solve original problem)
(solve a slightly different problem)
Last week
Thursday
Today!
\[\pi^* = \underset{\pi : B \to A}{\text{argmax}} \,\, \text{E}\left[\sum_{t=0}^\infty \gamma^t R(s_t, \pi(b_t))\right]\]
\[b' = \tau(b, a, o)\]
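A minimal sketch of the belief update \(b' = \tau(b, a, o)\) for a discrete POMDP, written as a Bayes filter. The array layouts `T[a, s, s']` and `Z[a, s', o]`, the function name `update_belief`, and the two-state example model are illustrative assumptions, not part of the slides.

```python
import numpy as np

def update_belief(b, a, o, T, Z):
    """Discrete Bayes filter implementing b' = tau(b, a, o).

    Assumed (hypothetical) layouts:
      T[a, s, s'] = P(s' | s, a)
      Z[a, s', o] = P(o | a, s')
    """
    predicted = b @ T[a]                    # sum_s b(s) P(s' | s, a)
    unnormalized = Z[a, :, o] * predicted   # weight by observation likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total

# Hypothetical model: 2 static states, 1 action, 2 observations, 80% accurate sensor
T = np.array([[[1.0, 0.0],
               [0.0, 1.0]]])
Z = np.array([[[0.8, 0.2],
               [0.2, 0.8]]])
b0 = np.array([0.5, 0.5])
b1 = update_belief(b0, a=0, o=0, T=T, Z=Z)
print(b1)
```

Starting from a uniform belief, one observation `o = 0` shifts the belief toward state 0 in proportion to the sensor accuracy.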
\[\pi_{\text{CE}}(b) = \pi_s\left(\underset{s\sim b}{\text{E}}[s]\right)\]
\[b' = \tau(b, a, o)\]
Optimal for LQG
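A minimal sketch of the certainty-equivalent policy \(\pi_{\text{CE}}(b) = \pi_s(\text{E}_{s\sim b}[s])\) for a belief over scalar states. The names `ce_policy` and `pi_s`, the state grid, and the linear feedback gain are hypothetical choices for illustration; any fully observable policy could be substituted for `pi_s`.

```python
import numpy as np

def ce_policy(b, states, pi_s):
    """Apply a fully observable policy pi_s to the mean of the belief."""
    s_mean = float(np.dot(b, states))   # E_{s ~ b}[s]
    return pi_s(s_mean)

# Hypothetical scalar state grid and linear state-feedback policy
states = np.array([-1.0, 0.0, 1.0, 2.0])
pi_s = lambda s: -0.5 * s
b = np.array([0.1, 0.2, 0.3, 0.4])

a = ce_policy(b, states, pi_s)
print(a)
```

The policy only sees the belief mean, so two beliefs with the same mean but different spread produce the same action.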
\[\pi_\text{QMDP}(b) = \underset{a \in A}{\text{argmax}} \,\, \underset{s\sim b}{\text{E}}\left[Q_\text{MDP}(s, a)\right]\]
\[b' = \tau(b, a, o)\]
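A minimal sketch of QMDP for a discrete problem: compute \(Q_\text{MDP}\) with value iteration on the fully observable MDP, then pick the action that maximizes its expectation under the belief. The array layouts `T[a, s, s']` and `R[s, a]`, the function names, the fixed iteration count, and the two-state example are assumptions for illustration.

```python
import numpy as np

def solve_q_mdp(T, R, gamma, iters=500):
    """Value iteration on the underlying MDP.

    Assumed (hypothetical) layouts: T[a, s, s'] = P(s' | s, a), R[s, a].
    Returns Q with shape (n_states, n_actions).
    """
    n_actions, n_states, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)                                # V(s') = max_a Q(s', a)
        Q = R + gamma * np.einsum("asn,n->sa", T, V)     # Bellman backup
    return Q

def qmdp_action(b, Q):
    """pi_QMDP(b) = argmax_a E_{s ~ b}[Q_MDP(s, a)]."""
    return int(np.argmax(b @ Q))

# Hypothetical model: 2 static states, action i is rewarded in state i
T = np.array([np.eye(2), np.eye(2)])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Q = solve_q_mdp(T, R, gamma=0.9)

print(qmdp_action(np.array([0.7, 0.3]), Q))
print(qmdp_action(np.array([0.2, 0.8]), Q))
```

Because the values come from the fully observable MDP, this policy assigns no value to actions whose only benefit is gathering information.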
[Figure: State vs. Timestep, with the region of Accurate Observations labeled]
Goal: \(a=0\) at \(s=0\)
Optimal Policy
Localize
\(a=0\)
Same as full observability on the next step
QMDP
Full POMDP
INDUSTRIAL GRADE
ACAS X
[Kochenderfer, 2011]