RL for POMDPs

4 Approaches

Approach 1: k-Markov
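
A minimal sketch of the k-Markov idea, assuming it means treating the last k observations as a surrogate state for an otherwise standard RL algorithm. The class name `KMarkovState`, its methods, and the `pad` value are hypothetical and chosen only for illustration.

```python
from collections import deque


class KMarkovState:
    """Keep the last k observations and expose them as one surrogate state."""

    def __init__(self, k, pad=None):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.pad = pad                      # filler used before k observations exist
        self.history = deque(maxlen=k)      # oldest observation is dropped automatically

    def reset(self, first_obs):
        """Start a new episode: pad the history, then add the first observation."""
        self.history.clear()
        for _ in range(self.k - 1):
            self.history.append(self.pad)
        self.history.append(first_obs)
        return tuple(self.history)

    def step(self, obs):
        """Append a new observation and return the stacked surrogate state."""
        self.history.append(obs)
        return tuple(self.history)
```

The returned tuple can be fed to a policy or value function in place of the true (hidden) state. Anything that happened more than k steps ago is invisible to the agent.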

e.g. AlphaStar

Images: Pascanu et al. (2013)

Input: \(u_t\)

State/Output: \(x_t\)

Cost: \(\mathcal{E}_t\)
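
For reference, one common way to write the recurrence behind these labels, assuming the simple recurrent network parameterization used by Pascanu et al. The weights \(W_{rec}\), \(W_{in}\), the bias \(b\), the elementwise nonlinearity \(\sigma\), and the horizon \(T\) are not named in the labels above and are assumptions here.

\[
\begin{aligned}
x_t &= W_{rec}\,\sigma(x_{t-1}) + W_{in}\,u_t + b \\
\mathcal{E} &= \sum_{t=1}^{T} \mathcal{E}_t
\end{aligned}
\]

The state \(x_t\) depends on the entire input history \(u_1, \ldots, u_t\) through the recurrence, so it can act as a learned memory when single observations are not Markov.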

Input: \(x_t\)

Output: \(h_t\)

Cell state: \(c_t\)

Forget gate

Input gate

Output Gate
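
A sketch of the standard LSTM update in terms of these gates, using \(x_t\), \(h_t\), and \(c_t\) as labeled above. The weight matrices \(W_\ast\), \(U_\ast\) and biases \(b_\ast\) are assumed names, \(\sigma\) is the logistic sigmoid, and \(\odot\) is the elementwise product. Implementations differ in details such as peephole connections.

\[
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]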

Hochreiter, S. and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.

By Guillaume Chevalier - File:The_LSTM_Cell.svg, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=109362147

Forget gate activation: \(f_t\)

Input gate activation: \(i_t\)

Output gate activation: \(o_t\)

Candidate cell state: \(\tilde{c}_t\)
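
A minimal sketch of one LSTM step for the scalar case (one-dimensional input, output, and cell state), written with only the Python standard library. The function name `lstm_step` and the `params` layout are hypothetical. A practical implementation would use vectors and a deep learning library.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, used for the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One scalar LSTM step.

    params maps "f", "i", "o", "c" to (w, u, b): input weight,
    recurrent weight, and bias for that gate (assumed layout).
    Returns the new output h_t and cell state c_t.
    """
    def pre(name):
        w, u, b = params[name]
        return w * x_t + u * h_prev + b

    f_t = sigmoid(pre("f"))          # forget gate: how much of c_prev to keep
    i_t = sigmoid(pre("i"))          # input gate: how much of the candidate to write
    o_t = sigmoid(pre("o"))          # output gate: how much of the cell to expose
    c_tilde = math.tanh(pre("c"))    # candidate cell state

    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t
```

In a POMDP setting, `h_t` (or the pair `h_t`, `c_t`) would be carried across time steps and passed to the policy in place of the hidden state.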

By Zachary Sunberg