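[Figure: agent-environment loop. The Environment holds the true state \(s = TL\) and emits the observation \(o = TL\); the Policy selects action \(a\) from the agent's internal state, which is either the history \(h\) or a belief \(b\) over the states \(TL\) and \(TR\) (options below).]
Option 1: History
History: \(h_t = (b_0, a_0, o_1, a_1, \ldots, a_{t-1}, o_t)\)
Option 2: Belief Updater
Belief: \(b_t = P(s_t \mid h_t)\), i.e. \(b_t(s) = P(s_t = s \mid h_t)\)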
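As a concrete check of the definition \(b_t(s) = P(s_t = s \mid h_t)\), the minimal sketch below computes the belief directly from a history by summing over every state sequence consistent with it. It assumes a simplified two-state tiger model (states TL and TR, actions "listen" and "open") with a listening accuracy of 0.85 and an "open" action that resets the tiger uniformly at random; these model details and the names `T`, `Z` and `belief_from_history` are illustrative assumptions, not taken from the slides.

```python
from itertools import product

STATES = ["TL", "TR"]


def T(s_next, s, a):
    """Assumed transition model T(s' | s, a) for a simplified tiger problem."""
    if a == "listen":
        return 1.0 if s_next == s else 0.0  # listening never moves the tiger
    return 0.5  # assumed: "open" resets the tiger to a uniformly random door


def Z(o, a, s_next):
    """Assumed observation model Z(o | a, s')."""
    if a == "listen":
        return 0.85 if o == s_next else 0.15  # assumed listening accuracy
    return 0.5  # assumed: observations after "open" carry no information


def belief_from_history(b0, actions, observations):
    """b_t(s) = P(s_t = s | h_t), computed by summing over all state sequences."""
    t = len(observations)
    if len(actions) != t:
        raise ValueError("history needs one action per observation")
    weights = {s: 0.0 for s in STATES}
    for traj in product(STATES, repeat=t + 1):  # traj = (s_0, ..., s_t)
        p = b0[traj[0]]
        for k in range(1, t + 1):
            a = actions[k - 1]  # a_{k-1} leads to s_k and o_k
            p *= T(traj[k], traj[k - 1], a) * Z(observations[k - 1], a, traj[k])
        weights[traj[-1]] += p  # marginalize onto the final state s_t
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


b0 = {"TL": 0.5, "TR": 0.5}
print(belief_from_history(b0, ["listen", "listen"], ["TL", "TL"]))
```

The number of state sequences grows as \(|S|^{t+1}\), so this direct computation is only practical for short histories; the recursive belief update avoids it by carrying \(b\) forward one step at a time.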
Belief update: \(b' = \tau(b, a, o)\)
\[b'(s') \propto Z(o \mid a, s') \sum_{s} T(s' \mid s, a) \, b(s)\]
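For a discrete state space, the update \(\tau\) can be sketched as a Bayes filter: compute the predicted probability \(\sum_s T(s' \mid s, a)\, b(s)\), multiply by the observation likelihood \(Z(o \mid a, s')\), then normalize. The tiger-style model below (listening accuracy 0.85, "open" resets the tiger uniformly) is a hypothetical example, and `update`, `T` and `Z` are illustrative names rather than any particular library's API.

```python
STATES = ["TL", "TR"]


def T(s_next, s, a):
    """Assumed transition model T(s' | s, a) for a simplified tiger problem."""
    if a == "listen":
        return 1.0 if s_next == s else 0.0  # listening never moves the tiger
    return 0.5  # assumed: "open" resets the tiger to a uniformly random door


def Z(o, a, s_next):
    """Assumed observation model Z(o | a, s')."""
    if a == "listen":
        return 0.85 if o == s_next else 0.15  # assumed listening accuracy
    return 0.5  # assumed: observations after "open" carry no information


def update(b, a, o):
    """Discrete Bayes filter: b'(s') ∝ Z(o | a, s') * sum_s T(s' | s, a) * b(s)."""
    unnormalized = {
        sp: Z(o, a, sp) * sum(T(sp, s, a) * b[s] for s in STATES)
        for sp in STATES
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation o has zero probability under b and a")
    return {sp: w / total for sp, w in unnormalized.items()}


b = {"TL": 0.5, "TR": 0.5}
b = update(b, "listen", "TL")  # b["TL"] ≈ 0.85
b = update(b, "listen", "TL")  # b["TL"] ≈ 0.97
b = update(b, "open", "TL")    # back to 0.5 / 0.5
print(b)
```

Under this assumed model, "listen" leaves the state unchanged, so the sum over \(s\) simply returns \(b(s')\); the sum only does real work after "open", where it spreads the belief uniformly again.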
Solution: Domain-specific particle injection based on:
By Zachary Sunberg