Autonomous Thermalling as a Partially Observable Markov Decision Process
Iain Guilliard, Richard Rogahn, Jim Piavis, Andrey Kolobov
Robotics: Science and Systems, 2018
Presented by Zachary Sunberg, August 25, 2021
Motivation
Thermals can help aircraft fly with less power.
Video by Tobias Kemmerer
Motivation
Birds use thermalling extensively
Motivation
Thermals are difficult to detect.
Contributions
- Formulate the thermalling problem as a POMDP with key assumptions that simplify the problem without crippling performance.
- Propose an algorithm simple enough to run on a Pixhawk microcontroller.
- Implement POMDPSoar as an open-source ArduPlane module.
- Demonstrate convincingly that POMDPSoar outperforms ArduSoar.
Background: POMDPs
- \(\mathcal{S}\) - State space
- \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
- \(\mathcal{A}\) - Action space
- \(R:\mathcal{S}\times \mathcal{A} \to \mathbb{R}\) - Reward
- \(\mathcal{O}\) - Observation space
- \(Z:\mathcal{S} \times \mathcal{A}\times \mathcal{S} \times \mathcal{O} \to \mathbb{R}\) - Observation probability distribution
Environment
Belief Updater
Policy/Planner
\(b\)
\(a\)
\[b_t(s) = P\left(s_t = s \mid a_1, o_1 \ldots a_{t-1}, o_{t-1}\right)\]
True State
\(s = 7\)
Observation \(o = -0.21\)
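The belief updater in the loop above can be sketched for a small discrete POMDP. This is a minimal illustration, not code from the paper: the array layout `T[s, a, s']` and `Z[s, a, s', o]` simply mirrors the signatures on the definitions slide, and the function name `update_belief` is hypothetical.

```python
import numpy as np

def update_belief(b, a, o, T, Z):
    """One step of a discrete Bayes filter.

    b: current belief over states, shape (S,)
    a: index of the action taken
    o: index of the observation received
    T: transition probabilities, T[s, a, s2], shape (S, A, S)
    Z: observation probabilities, Z[s, a, s2, o], shape (S, A, S, O)
    """
    # Unnormalized posterior: sum_s b(s) * T(s, a, s2) * Z(s, a, s2, o)
    unnormalized = np.einsum("s,st,st->t", b, T[:, a, :], Z[:, a, :, o])
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return unnormalized / total
```

With continuous states, as in thermalling, the sum becomes an integral, which is why a Kalman-style filter is used instead of this table-based update.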
Background: POMDPs
Background: Kalman Filter
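For reference, a minimal sketch of the linear-Gaussian Kalman filter, where the belief is a mean `x` and covariance `P`. The function names are hypothetical. The Gaussian thermal model is nonlinear in position, so a thermalling filter would need an extended or unscented variant of these steps; this block only shows the basic predict/update structure.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Propagate the belief through linear dynamics x' = F x + noise(Q)."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, H, R):
    """Condition the belief on a measurement z = H x + noise(R)."""
    innovation = z - H @ x
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```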
Context in Literature
- ArduSoar - Included in ArduPlane
- RL Approaches - episodic (is this actually a problem??)
- Heuristics, e.g. Reichmann rules (5.3 hour record)
- Other methods (e.g. work by John Bird) focus on extending endurance and range with flight path planning
Problem Formulation
Assumptions:
- Thermal vertical velocity (\(w\)) is a Gaussian-shaped function of horizontal distance from the thermal center
- Thermal does not move w.r.t. surrounding air
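A sketch of the Gaussian thermal assumption, reading it as a bell-shaped updraft profile around the thermal center. The exact functional form \(w = W_0 \exp(-r^2 / R_0^2)\) is an assumption here, chosen to match the thermal parameters \(W_0\) (strength) and \(R_0\) (radius); the function name is hypothetical.

```python
import math

def thermal_updraft(x, y, x_th, y_th, W0, R0):
    """Vertical air velocity at (x, y) for a thermal centered at (x_th, y_th).

    W0: updraft strength at the center (m/s), assumed parameterization
    R0: characteristic thermal radius (m), assumed parameterization
    """
    r_squared = (x - x_th) ** 2 + (y - y_th) ** 2
    return W0 * math.exp(-r_squared / R0 ** 2)
```

Because the thermal is assumed not to move relative to the surrounding air, `x_th` and `y_th` stay constant in an air-relative frame.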
Problem Formulation
- \(s = (s^u, s^{th}); \quad s^u = (p^u, v, \psi, \phi, \dot{\phi}, h); s^{th} = (p^{th}, W_0, R_0)\)
- \(\mathcal{A} = \{-45^\circ, -30^\circ, -15^\circ, 0^\circ, 15^\circ, 30^\circ, 45^\circ\}\) roll angle
- \(T\): vehicle dynamics plus process noise
- \(R(s, a, s') = h_{s'} - h_s\)
- \(\mathcal{O}\): sensor readings
- \(Z\): Gaussian observation noise
No pitch because that would be "cheating"
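The formulation above can be written down as plain data types. This is only an illustrative sketch: the field names and units are assumptions chosen to mirror the symbols on the slide, and nothing here comes from the POMDPSoar source.

```python
from dataclasses import dataclass

# Action space: commanded roll angles in degrees (no pitch commands).
ROLL_ACTIONS_DEG = (-45, -30, -15, 0, 15, 30, 45)

@dataclass(frozen=True)
class VehicleState:      # s^u
    x: float             # position p^u, first component (m)
    y: float             # position p^u, second component (m)
    v: float             # airspeed (m/s)
    psi: float           # heading (rad)
    phi: float           # roll angle (rad)
    phi_dot: float       # roll rate (rad/s)
    h: float             # altitude (m)

@dataclass(frozen=True)
class ThermalState:      # s^th
    x_th: float          # thermal center p^th, first component (m)
    y_th: float          # thermal center p^th, second component (m)
    W0: float            # thermal strength (m/s)
    R0: float            # thermal radius (m)

def reward(s: VehicleState, s_next: VehicleState) -> float:
    """Altitude gained over one transition: h_{s'} - h_s."""
    return s_next.h - s.h
```

Only the vehicle part of the state is directly measured; the thermal part must be inferred, which is what makes the problem partially observable.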
Approach
\(R(s, a, s') = h_{s'} - h_s\)
\(R(b) = \operatorname{tr}(\operatorname{cov}(b))\)
- Online
- Multi-forecast model-predictive control
4 second horizon
12 second horizon
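One possible reading of online multi-forecast MPC with the altitude-gain reward, as a hedged sketch rather than the authors' implementation: sample several thermal hypotheses from a Gaussian belief, roll out each constant-roll action against every hypothesis, and pick the action with the best average altitude gain. The point-mass kinematics, the sink model, the Gaussian updraft form, and all names and default values below are assumptions. Positive roll is taken to increase heading `psi`.

```python
import math
import numpy as np

ROLL_ACTIONS_DEG = (-45, -30, -15, 0, 15, 30, 45)
G = 9.81  # gravitational acceleration (m/s^2)

def updraft(x, y, thermal):
    """Assumed Gaussian updraft; thermal = (x_th, y_th, W0, R0)."""
    x_th, y_th, W0, R0 = thermal
    return W0 * math.exp(-((x - x_th) ** 2 + (y - y_th) ** 2) / R0 ** 2)

def simulate_gain(x, y, psi, v, roll_deg, thermal,
                  horizon=4.0, dt=0.1, level_sink=0.6):
    """Altitude gained while holding one roll angle for `horizon` seconds."""
    phi = math.radians(roll_deg)
    psi_dot = G * math.tan(phi) / v                # coordinated-turn rate
    sink = level_sink / math.cos(phi) ** 1.5       # assumed sink increase in a bank
    gain = 0.0
    for _ in range(int(round(horizon / dt))):      # forward Euler integration
        x += v * math.cos(psi) * dt
        y += v * math.sin(psi) * dt
        psi += psi_dot * dt
        gain += (updraft(x, y, thermal) - sink) * dt
    return gain

def plan(x, y, psi, v, belief_mean, belief_cov,
         n_samples=20, horizon=4.0, rng=None):
    """Return the roll angle with the highest mean gain over sampled thermals."""
    rng = np.random.default_rng() if rng is None else rng
    thermals = rng.multivariate_normal(belief_mean, belief_cov, size=n_samples)
    thermals[:, 3] = np.maximum(thermals[:, 3], 1.0)  # keep sampled radius positive

    def mean_gain(roll_deg):
        return np.mean([simulate_gain(x, y, psi, v, roll_deg, th, horizon)
                        for th in thermals])

    return max(ROLL_ACTIONS_DEG, key=mean_gain)
```

An exploration variant would score each rollout by the predicted belief uncertainty, \(\operatorname{tr}(\operatorname{cov}(b))\), instead of altitude gain; that part is not sketched here.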
Approach
Computational Resources
- Implemented on a Pixhawk
- "32-bit ARM processor with only 168MHz clock speed and 256KB RAM"
- FPU?
- < 1s per action computation
Hardware Experiments
Radian Pro (2m wingspan)
"Nearly constant cloud cover, frequent gusty winds and rain"
Sim-to-real gap
Hardware Experiments
Typical flight time: 2000 s (about half an hour)
Hardware Experiment Results
Critique
- Negatives:
  - Doesn't actually solve the hard part of the POMDP (the exploration vs. exploitation tradeoff)
  - Unimodal beliefs
  - Does not take into account information on where thermals are likely to occur
- Positives:
  - Hardware experiments
  - Controlled for many confounding factors (airframe, battery, wave lift)
Impact and Legacy
Future Work / Reading
"companion computers on larger sUAVs may be able to run a full-fledged POMDP solver in real time. Design and evaluation of a controller based on solving the thermalling POMDP near-optimally, e.g., using an approach similar to Slade et al. [36]’s, is a direction for future work."
[36] P. Slade, P. Culbertson, Z. Sunberg, and M. J. Kochenderfer. Simultaneous active parameter estimation and control using sampling-based Bayesian reinforcement learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017. URL https://arxiv.org/abs/1707.09055.
Contributions (Recap)
- Formulate the thermalling problem as a POMDP with key assumptions that simplify the problem without crippling performance.
- Propose an algorithm simple enough to run on a Pixhawk microcontroller.
- Implement POMDPSoar as an open-source ArduPlane module.
- Demonstrate convincingly that POMDPSoar outperforms ArduSoar.