Zachary Sunberg
June 28, 2018
Partially Observable Markov Decision Process (POMDP)
Solving MDPs and POMDPs - Offline vs Online
Estimate the Value at Every State
Sequential Decision Trees
Monte Carlo Tree Search
Image by Dicksonlaw583 (CC 4.0)
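To make the selection step concrete, here is a minimal sketch of the UCB1 criterion that MCTS uses to pick actions in the tree; the dictionaries `Q` and `N` and the exploration constant `c` are illustrative names, not from any particular library.

```julia
# UCB1 selection: favor actions with high value estimates plus an
# exploration bonus that shrinks as an action is tried more often.
function select_action(Q::Dict, N::Dict, c::Float64)
    Ntotal = sum(values(N))
    best_a, best_ucb = nothing, -Inf
    for (a, q) in Q
        # Unvisited actions get an infinite bonus so each is tried once.
        ucb = N[a] == 0 ? Inf : q + c * sqrt(log(Ntotal) / N[a])
        if ucb > best_ucb
            best_a, best_ucb = a, ucb
        end
    end
    return best_a
end
```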
QMDP
Equivalent to assuming full observability on the next step
Will not take costly exploratory actions
$$Q_\pi(b,a) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \,\middle|\, s_0 \sim b,\ a_0 = a\right]$$

QMDP approximates this using the optimal state-action value $Q_{MDP}(s,a)$ of the fully observable MDP:

$$Q_{MDP}(b, a) = \sum_{s \in \mathcal{S}} Q_{MDP}(s,a)\, b(s) \geq Q^*(b,a)$$
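A hedged Julia sketch of this computation, assuming a small tabular model with transition array `T[s, a, s′]`, reward matrix `R[s, a]`, and discount `γ` (all names illustrative):

```julia
# Value iteration on the underlying MDP gives Q_MDP(s,a); the belief
# value is then just a probability-weighted sum of those state values.
function qmdp(T::Array{Float64,3}, R::Matrix{Float64}, γ::Float64; iters=1000)
    nS, nA = size(R)
    Q = zeros(nS, nA)
    for _ in 1:iters
        V = vec(maximum(Q, dims=2))  # V(s) = max_a Q(s,a)
        for s in 1:nS, a in 1:nA
            Q[s, a] = R[s, a] + γ * sum(T[s, a, sp] * V[sp] for sp in 1:nS)
        end
    end
    return Q
end

# Q_MDP(b,a) = Σ_s b(s) Q_MDP(s,a): an upper bound on Q*(b,a)
qmdp_value(Q::Matrix{Float64}, b::Vector{Float64}, a::Int) = sum(b .* Q[:, a])
```

Because the belief only enters through the final weighted sum, the value never accounts for uncertainty at later steps, which is exactly why QMDP will not take costly exploratory actions.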
Monte Carlo Tree Search for POMDPs
Silver, David, and Joel Veness. "Monte-Carlo Planning in Large POMDPs." Advances in Neural Information Processing Systems (2010).
Ross, Stéphane, et al. "Online Planning Algorithms for POMDPs." Journal of Artificial Intelligence Research 32 (2008): 663-704.
Light-Dark Problem
[Figure: Light-Dark problem, state vs. timestep. Observations are accurate only in the light region. Goal: take a=0 at s=0. Optimal policy: move to the light region to localize, then return to s=0 and take a=0.]
[ ] An infinite number of child nodes must be visited
[ ] Each node must be visited an infinite number of times
Solving continuous POMDPs - POMCP fails
[1] Couëtoux, Adrien, Jean-Baptiste Hoock, Nataliya Sokolovska, Olivier Teytaud, and Nicolas Bonnard. "Continuous Upper Confidence Trees." Learning and Intelligent Optimization (LION 2011), Italy. <hal-00542673v2>
POMCP
Limit the number of children of a node to $kN^\alpha$, where $N$ is the number of visits to the node (see the sketch below)
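A minimal sketch of that widening test; the hyperparameter names `k` and `α` follow the usual DPW convention, and the surrounding solver code is omitted:

```julia
# Double progressive widening: only add a new child to a tree node while
# its child count is below k * N^α, where N is the node's visit count.
# Otherwise the search must descend into an existing child.
widen(n_children::Int, N::Int, k::Float64, α::Float64) = n_children ≤ k * N^α
```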
Necessary Conditions for Consistency [1]
POMCP
POMCP-DPW
POMCP-DPW converges to QMDP
Proof Outline:
1. The observation space is continuous, so every sampled observation is unique with probability 1.
2. By (1), each belief node contains only one state particle, so each belief is merely an alias for that state.
3. By (2), POMCP-DPW is equivalent to MCTS-DPW applied to the fully observable MDP, plus the root belief state.
4. Solving this MDP is equivalent to finding the QMDP solution, so POMCP-DPW converges to QMDP.
Sunberg, Z. N. and Kochenderfer, M. J. "Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces", ICAPS (2018)
POMCP-DPW
[ ] An infinite number of child nodes must be visited
[ ] Each node must be visited an infinite number of times
[ ] An infinite number of particles must be added to each belief node
Necessary Conditions for Consistency
Use the observation density $Z(o \mid s, a, s')$ to insert weighted particles (sketched below)
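A hedged sketch of that idea: each simulated next state s′ is inserted into the belief node for observation o with weight Z(o | s, a, s′), and later visits resample states in proportion to those weights. The names below (`WeightedBelief`, `z`) are illustrative, not the POMCPOW.jl internals.

```julia
using StatsBase: sample, Weights

# A belief node's particle collection with observation-likelihood weights.
struct WeightedBelief{S}
    particles::Vector{S}
    weights::Vector{Float64}
end

# z(o, s, a, s′) is the observation density Z(o | s, a, s′).
function insert_particle!(b::WeightedBelief, z, o, s, a, s′)
    push!(b.particles, s′)
    push!(b.weights, z(o, s, a, s′))
end

# Later visits to this node sample states in proportion to their weights.
sample_state(b::WeightedBelief) = sample(b.particles, Weights(b.weights))
```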
POMCP
POMCP-DPW
POMCPOW
Particle Filter Tree (with DPW)
Discretization
Fixed number of particles, \(m\)
Fixed number of scenarios, \(K\)
Somani, A., Ye, N., Hsu, D., and Lee, W. S. "DESPOT: Online POMDP Planning with Regularization." Advances in Neural Information Processing Systems (2013).
Ye, Nan, et al. "DESPOT: Online POMDP Planning with Regularization." Journal of Artificial Intelligence Research 58 (2017): 231-266.
Light Dark
Sub Hunt
Sadigh, Dorsa, et al. "Information gathering actions over human internal state." Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016.
Schmerling, Edward, et al. "Multimodal Probabilistic Model-Based Planning for Human-Robot Interaction." arXiv preprint arXiv:1710.09483 (2017).
Sadigh, Dorsa, et al. "Planning for Autonomous Cars that Leverage Effects on Human Actions." Robotics: Science and Systems. 2016.
Tweet by Nitin Gupta
29 April 2018
https://twitter.com/nitguptaa/status/990683818825736192
Human Behavior Model: IDM and MOBIL
M. Treiber, et al., “Congested traffic states in empirical observations and microscopic simulations,” Physical Review E, vol. 62, no. 2 (2000).
A. Kesting, et al., “General lane-changing model MOBIL for car-following models,” Transportation Research Record, vol. 1999 (2007).
A. Kesting, et al., "Agents for Traffic Simulation." Multi-Agent Systems: Simulation and Applications. CRC Press (2009).
POMDP Formulation
$$s = \left(x, y, \dot{x}, \{(x_c, y_c, \dot{x}_c, l_c, \theta_c)\}_{c=1}^{n}\right)$$
Ego physical state \((x, y, \dot{x})\); physical states of the other cars \((x_c, y_c, \dot{x}_c, l_c)\); internal states of the other cars \(\theta_c\)
$$o = \{(x_c, y_c, \dot{x}_c, l_c)\}_{c=1}^{n}$$
Physical states of the other cars only; their internal states are not observed
$$a = (\ddot{x}, \dot{y}), \quad \ddot{x} \in \{0, \pm 1 \text{ m/s}^2\}, \quad \dot{y} \in \{0, \pm 0.67 \text{ m/s}\}$$
Efficiency
Safety
POMDPs.jl - An interface for defining and solving MDPs and POMDPs in Julia
Markov Model
Markov Decision Process (MDP)
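A minimal sketch of the POMDPs.jl problem-definition pattern: the interface functions (`transition`, `observation`, `reward`, ...) are the real POMDPs.jl API, but the toy model below is invented for illustration, and the distribution helpers have lived in POMDPTools or POMDPModelTools depending on the package version.

```julia
using POMDPs
using POMDPTools: Deterministic, SparseCat  # POMDPModelTools in older versions

# Toy two-state problem (invented for illustration): the hidden state is
# -1 or 1, :listen gives a noisy observation, :left/:right commit to a guess.
struct MinimalPOMDP <: POMDP{Int, Symbol, Bool} end

POMDPs.states(::MinimalPOMDP) = [-1, 1]
POMDPs.actions(::MinimalPOMDP) = [:left, :right, :listen]
POMDPs.observations(::MinimalPOMDP) = [true, false]
POMDPs.discount(::MinimalPOMDP) = 0.95
POMDPs.initialstate(::MinimalPOMDP) = SparseCat([-1, 1], [0.5, 0.5])

POMDPs.transition(::MinimalPOMDP, s, a) = Deterministic(s)  # state is static

# :listen observes whether s == 1 correctly 85% of the time.
POMDPs.observation(::MinimalPOMDP, a, sp) =
    a == :listen ? SparseCat([sp == 1, sp != 1], [0.85, 0.15]) :
                   SparseCat([true, false], [0.5, 0.5])

function POMDPs.reward(::MinimalPOMDP, s, a)
    a == :listen && return -1.0
    return (a == :right) == (s == 1) ? 10.0 : -100.0
end
```

Any compatible solver can then be applied to the same definition, e.g. `using POMCPOW; planner = solve(POMCPOWSolver(), MinimalPOMDP())`.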
Previous C++ framework: APPL
"At the moment, the three packages are independent. Maybe one day they will be merged in a single coherent framework."
Simulation results
[Figure: performance with all drivers normal (outcome only). Planners compared: Omniscient, Mean MPC, QMDP, POMCPOW.]
[Figure: performance with all drivers normal. Planners compared: Omniscient, Mean MPC, QMDP, POMCPOW.]