★★★★★

Certainty Equivalence

Control as if the mean (or median or mode) is the true state

(subjective)

Optimal for LQG
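
A minimal sketch of the LQG case, assuming a linear system \(x' = Ax + Bu + w\) with quadratic costs \(Q\), \(R\) and a belief mean supplied by some filter (for example a Kalman filter). The function names are illustrative and not from the slides. The controller computes the ordinary LQR gain and applies it to the belief mean as though it were the true state.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Infinite-horizon discrete-time LQR gain via Riccati iteration."""
    P = Q.copy()
    for _ in range(iters):
        # K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P = Q + A'P(A - BK)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def certainty_equivalent_control(K, belief_mean):
    """Act as if the belief mean were the true state: u = -K * mean."""
    return -K @ belief_mean
```

The covariance of the belief never enters the control law, which is exactly the simplification that certainty equivalence makes.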

★★★★★

QMDP

Full observability after 1 step (hindsight knowledge of state uncertainty)

Upper bound on true value
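
As a rough illustration for a small discrete problem, QMDP can be sketched as value iteration on the underlying MDP followed by a belief-weighted average of the resulting Q-values. The array layout (`T[a, s, s']`, `R[s, a]`) and function names are assumptions made for this sketch.

```python
import numpy as np

def mdp_q_values(T, R, gamma, iters=1000):
    """Value iteration on the fully observable MDP.

    T[a, s, s2] is the transition probability, R[s, a] the reward.
    Returns Q[s, a].
    """
    n_a, n_s, _ = T.shape
    Q = np.zeros((n_s, n_a))
    for _ in range(iters):
        V = Q.max(axis=1)
        # Q[s, a] = R[s, a] + gamma * sum_s2 T[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("asp,p->sa", T, V)
    return Q

def qmdp_action(belief, Q):
    """Pick argmax_a sum_s b(s) Q(s, a)."""
    return int(np.argmax(belief @ Q))
```

Because the Q-values assume the state is revealed after one step, the policy never chooses an action purely to gather information.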

★★★★☆

Hindsight Optimization

Hindsight knowledge of state and outcome uncertainty

Looser upper bound than QMDP
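
One way to sketch hindsight optimization, assuming a generative model `step(s, a, w)` that is deterministic once the noise `w` is fixed: sample scenarios (an initial state plus a noise sequence), solve each scenario by brute force with everything known, and average the per-scenario optima for each first action. The interface and names are hypothetical, and the brute-force inner search is only practical for short horizons.

```python
import itertools
import numpy as np

def hindsight_action(sample_state, step, actions, horizon, n_scenarios, gamma, rng):
    """Choose the first action with the best average hindsight-optimal return."""
    totals = {a: 0.0 for a in actions}
    for _ in range(n_scenarios):
        s0 = sample_state(rng)           # resolves state uncertainty
        noise = rng.random(horizon)      # resolves outcome uncertainty
        for a0 in actions:
            best = -np.inf
            # With the scenario fixed, planning is a deterministic search.
            for rest in itertools.product(actions, repeat=horizon - 1):
                s, ret = s0, 0.0
                for t, a in enumerate((a0,) + rest):
                    s, r = step(s, a, noise[t])
                    ret += gamma**t * r
                best = max(best, ret)
            totals[a0] += best
    return max(actions, key=lambda a: totals[a] / n_scenarios)
```

The maximization sits inside the average over scenarios, which is why the estimate is optimistic: each scenario gets its own best plan.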

★★☆☆☆

Fast Informed Bound (FIB)

Take one observation into account

Tighter upper bound than QMDP
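
A sketch of the FIB backup for a small discrete POMDP, assuming arrays `T[a, s, s']`, `Z[a, s', o]`, and `R[s, a]` (the layout is an assumption of this example). It iterates one alpha vector per action using

\[ \alpha_a(s) \leftarrow R(s,a) + \gamma \sum_o \max_{a'} \sum_{s'} Z(o \mid a, s')\, T(s' \mid s, a)\, \alpha_{a'}(s') . \]

```python
import numpy as np

def fib_alphas(T, Z, R, gamma, iters=500):
    """Fast Informed Bound: returns alpha[a, s], one alpha vector per action."""
    n_a, n_s, _ = T.shape
    alpha = np.zeros((n_a, n_s))
    for _ in range(iters):
        # M[a, s, o, b] = sum_p T[a, s, p] * Z[a, p, o] * alpha[b, p]
        M = np.einsum("asp,apo,bp->asob", T, Z, alpha)
        # Max over the next action b for each observation, then sum over o.
        alpha = R.T + gamma * M.max(axis=3).sum(axis=2)
    return alpha

def fib_action(belief, alpha):
    """Pick argmax_a sum_s b(s) alpha_a(s)."""
    return int(np.argmax(alpha @ belief))
```

The only difference from the QMDP backup is that the maximization over the next action happens per observation rather than per next state, so uninformative observations lower the bound.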

★★★★☆

\(k\)-Markov

Pretend the last \(k\) observations make up the state and solve the MDP

Great for Atari!
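
The state construction can be sketched with a fixed-length buffer; the class name and padding choice below are illustrative assumptions. Frame stacking in Atari agents follows the same idea, with the last few frames standing in for the state.

```python
from collections import deque

class KMarkovState:
    """Treat the last k observations as the state of an ordinary MDP."""

    def __init__(self, k, pad=None):
        # Pad so the state has length k before k observations have arrived.
        self.history = deque([pad] * k, maxlen=k)

    def update(self, obs):
        """Append the newest observation and return the hashable state."""
        self.history.append(obs)
        return tuple(self.history)
```

The returned tuple can index a tabular value function or be concatenated into a network input; anything older than \(k\) steps is forgotten.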

★★★☆☆

Open Loop

Choose a sequence of actions that optimizes the objective in expectation

Good if aleatory uncertainty is low and epistemic uncertainty is hard to reduce
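
A brute-force sketch of open-loop planning, assuming a generative model `step(s, a, w)` that is deterministic given the noise `w` and a sampler for the current belief (both hypothetical interfaces). A single action sequence is scored by its average return over sampled scenarios, and the best sequence is kept.

```python
import itertools
import numpy as np

def open_loop_plan(sample_state, step, actions, horizon, n_scenarios, gamma, rng):
    """Return the action sequence with the best sample-average return."""
    scenarios = [(sample_state(rng), rng.random(horizon))
                 for _ in range(n_scenarios)]
    best_seq, best_val = None, -np.inf
    for seq in itertools.product(actions, repeat=horizon):
        total = 0.0
        for s0, noise in scenarios:
            s = s0
            # The same sequence is applied in every scenario: no feedback.
            for t, a in enumerate(seq):
                s, r = step(s, a, noise[t])
                total += gamma**t * r
        val = total / n_scenarios
        if val > best_val:
            best_seq, best_val = seq, val
    return best_seq, best_val
```

Here the maximization is outside the average over scenarios, so the plan cannot react to observations and its value is a lower bound on what a closed-loop policy can achieve.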

★★★☆☆

Most likely observation

Plan assuming \(b' = \tau(b, a, \hat{o}(b))\)

No observation branching; good when \(Z\) is unimodal
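
A sketch of the belief successor for a discrete POMDP, assuming arrays `T[a, s, s']` and `Z[a, s', o]` (layout chosen for this example). Here \(\hat{o}\) is taken to be the observation with the highest predicted probability under the belief and action, and \(\tau\) is the usual Bayes update.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """tau(b, a, o): Bayes filter step for a discrete POMDP."""
    pred = b @ T[a]               # predicted next-state distribution
    unnorm = Z[a][:, o] * pred    # weight by observation likelihood
    return unnorm / unnorm.sum()

def most_likely_observation(b, a, T, Z):
    """argmax_o P(o | b, a)."""
    pred = b @ T[a]
    return int(np.argmax(pred @ Z[a]))

def mlo_successor(b, a, T, Z):
    """Deterministic belief transition b' = tau(b, a, o_hat)."""
    o_hat = most_likely_observation(b, a, T, Z)
    return belief_update(b, a, o_hat, T, Z)
```

With one successor belief per action, the belief-space problem becomes deterministic and can be handed to a standard search or trajectory optimizer.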

By Zachary Sunberg

191-Formulation-Approximation-Table
