What tools do we have to solve MDPs with continuous \(S\) and \(A\)?
Previously: an explicit model \((S, A, T, R, \gamma)\)
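To make "explicit model" concrete for continuous spaces, here is a minimal sketch. It assumes a hypothetical 1-D problem: \(S = \mathbb{R}\), \(A = [-1, 1]\), \(s' = s + a + \text{noise}\) with Gaussian noise, and \(R(s, a) = -s^2\). In this form \(T\) is a density you can evaluate. None of these names or numbers come from the slides.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

SIGMA = 0.1  # assumed process-noise standard deviation (hypothetical)


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Normal density N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


@dataclass
class ExplicitMDP:
    """Hypothetical explicit continuous MDP: states and actions are floats."""
    action_bounds: Tuple[float, float]            # A = [low, high]
    T: Callable[[float, float, float], float]     # T(s' | s, a) as a density
    R: Callable[[float, float], float]            # R(s, a)
    gamma: float


mdp = ExplicitMDP(
    action_bounds=(-1.0, 1.0),
    T=lambda sp, s, a: gaussian_pdf(sp, s + a, SIGMA),
    R=lambda s, a: -s ** 2,
    gamma=0.95,
)
```

Because \(S\) is continuous, the states cannot be enumerated. Tabular methods such as value iteration therefore do not apply directly, even when \(T\) is known exactly.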
r = act!(env, a)   # advance env's internal state with action a; returns reward r
s = observe(env)   # read env's current state s
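The act!/observe pattern can be mirrored in Python as a sketch. The class ToyEnv, its act/observe/reset methods, and the 1-D dynamics are all hypothetical. The environment owns its state, act applies an action and returns only the reward, and observe reports the current state.

```python
import random


class ToyEnv:
    """Hypothetical episodic simulator with a 1-D continuous state.

    The state lives inside the object: act() advances it and returns only
    the reward, observe() reports it.
    """

    def __init__(self, sigma=0.1, seed=None):
        self.sigma = sigma
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Sample an initial state uniformly from [-1, 1].
        self.s = self.rng.uniform(-1.0, 1.0)

    def act(self, a):
        a = max(-1.0, min(1.0, a))   # clip the action to A = [-1, 1]
        r = -self.s ** 2             # reward depends on the current state
        self.s = self.s + a + self.rng.gauss(0.0, self.sigma)
        return r

    def observe(self):
        return self.s


env = ToyEnv(seed=0)
s0 = env.observe()
r = env.act(-s0)    # steer toward the origin
s = env.observe()   # after this step, s is just the sampled noise
```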
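Note: this is different from a generative model \(s', r = G(s, a)\), which can be queried from any state \(s\). Here the state lives inside env and can only be advanced by acting.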
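For contrast, here is a sketch of a generative model for the same kind of hypothetical 1-D problem. G is a pure function of (s, a) plus a random number generator. That means you can sample many successors of the same state without resetting or copying an environment.

```python
import random


def G(s, a, rng, sigma=0.1):
    """Hypothetical generative model: sample (s', r) starting from ANY state s."""
    a = max(-1.0, min(1.0, a))   # clip the action to A = [-1, 1]
    r = -s ** 2
    sp = s + a + rng.gauss(0.0, sigma)
    return sp, r


rng = random.Random(1)
# Query the same (s, a) pair several times. An episodic simulator cannot do
# this directly because each act() call moves its internal state forward.
samples = [G(0.5, -0.5, rng) for _ in range(3)]
```

Algorithms that branch repeatedly from the same state, such as sparse sampling, rely on this ability.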
In Python (e.g. OpenAI Gym), typically
s, r, done, info = env.step(a)   # Gymnasium returns s, r, terminated, truncated, info
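The two conventions differ only in how the calls are bundled. In this sketch, a hypothetical StepAdapter wraps an environment that has separate act(a) and observe() methods and exposes a single step call returning the next state and the reward. CounterEnv is a deterministic stand-in used only for illustration. Real Gym/Gymnasium step calls also return termination flags and an info dict, which are omitted here.

```python
class StepAdapter:
    """Hypothetical wrapper: s, r = step(a) built from act(a) and observe()."""

    def __init__(self, env):
        self.env = env

    def step(self, a):
        r = self.env.act(a)       # advance the internal state, get the reward
        s = self.env.observe()    # then read the resulting state
        return s, r


class CounterEnv:
    """Deterministic stand-in: the state is the running sum of actions."""

    def __init__(self):
        self.s = 0.0

    def act(self, a):
        self.s += a
        return -abs(self.s)

    def observe(self):
        return self.s


env = StepAdapter(CounterEnv())
s, r = env.step(2.0)
```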
Now: Episodic Simulator
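One tool that needs nothing beyond an episodic simulator is Monte Carlo policy evaluation. You run whole episodes and average their discounted returns, which works regardless of whether \(S\) and \(A\) are continuous. This is a minimal sketch under stated assumptions: the hypothetical 1-D ToyEnv with a reset() method (not part of the interface above), a fixed horizon of 50, \(\gamma = 0.95\), and 200 episodes per estimate.

```python
import random


class ToyEnv:
    """Hypothetical 1-D episodic simulator (same dynamics as the earlier sketch)."""

    def __init__(self, seed=None, sigma=0.1):
        self.rng = random.Random(seed)
        self.sigma = sigma
        self.reset()

    def reset(self):
        self.s = self.rng.uniform(-1.0, 1.0)

    def act(self, a):
        a = max(-1.0, min(1.0, a))   # clip the action to A = [-1, 1]
        r = -self.s ** 2
        self.s += a + self.rng.gauss(0.0, self.sigma)
        return r

    def observe(self):
        return self.s


def rollout(env, policy, gamma=0.95, horizon=50):
    """Discounted return of one episode, using only reset/observe/act."""
    env.reset()
    ret, disc = 0.0, 1.0
    for _ in range(horizon):
        a = policy(env.observe())
        ret += disc * env.act(a)
        disc *= gamma
    return ret


def mc_value(env, policy, n=200, **kwargs):
    """Average discounted return over n independent episodes."""
    return sum(rollout(env, policy, **kwargs) for _ in range(n)) / n


env = ToyEnv(seed=0)
v_ctrl = mc_value(env, lambda s: -s)    # steer toward the origin
v_idle = mc_value(env, lambda s: 0.0)   # do nothing
```

Comparing v_ctrl and v_idle is already a crude form of policy search. Rewards are never positive here, so both estimates are at most zero, and the steering policy should score noticeably higher.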