Autonomous Thermalling as a Partially Observable Markov Decision Process

Iain Guilliard, Richard Rogahn, Jim Piavis, Andrey Kolobov

Robotics: Science and Systems, 2018

Presented by Zachary Sunberg, August 25, 2021

Motivation

Thermals can help aircraft fly with less power.

Video by Tobias Kemmerer

Motivation

Birds use thermalling extensively.

Motivation

Thermals are difficult to detect.

Contributions

Formulate the thermalling problem as a POMDP with key assumptions that simplify the problem without crippling performance.
Propose an algorithm simple enough to run on a Pixhawk microcontroller.
Implement POMDSoar as an open-source ArduPlane module.
Demonstrate convincingly that POMDSoar outperforms ArduSoar.

Background: POMDPs

\(\mathcal{S}\) - State space
\(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
\(\mathcal{A}\) - Action space
\(R:\mathcal{S}\times \mathcal{A} \to \mathbb{R}\) - Reward
\(\mathcal{O}\) - Observation space
\(Z:\mathcal{S} \times \mathcal{A}\times \mathcal{S} \times \mathcal{O} \to \mathbb{R}\) - Observation probability distribution

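Diagram: the Environment sends an observation \(o\) to the Belief Updater, which passes the belief \(b\) to the Policy/Planner, which returns an action \(a\) to the Environment.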

\[b_t(s) = P\left(s_t = s \mid a_1, o_1, \ldots, a_{t-1}, o_{t-1}\right)\]

Example: true state \(s = 7\), observation \(o = -0.21\)

Background: POMDPs

Background: Kalman Filter
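For Gaussian beliefs with linear dynamics, the belief update has a closed form, the Kalman filter. The following is a minimal scalar sketch under assumed conditions (a static state with additive process noise and a direct noisy measurement); the function name `kalman_step` and its parameters are illustrative and are not taken from the paper or from ArduPlane.

```python
def kalman_step(mean, var, measurement, process_var, measurement_var):
    """One predict/update cycle of a scalar Kalman filter.

    Assumes the state stays constant between steps (plus process noise)
    and that the measurement is the state plus Gaussian noise.
    """
    # Predict: the mean is unchanged, the uncertainty grows by the process noise.
    pred_mean = mean
    pred_var = var + process_var
    # Update: blend prediction and measurement according to their variances.
    gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


# Equal prior and measurement variance, so the estimate lands halfway between.
new_mean, new_var = kalman_step(
    mean=0.0, var=1.0, measurement=2.0, process_var=0.0, measurement_var=1.0
)
```

The belief stays Gaussian after every step, so it is fully described by a mean and a variance (or covariance matrix in the multivariate case), which keeps the memory footprint small.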

Context in Literature

ArduSoar - Included in ArduPlane
RL Approaches - episodic (is this actually a problem??)
Heuristics, e.g. Reichmann rules (5.3 hour record)
Other methods, (e.g. work by John Bird), focus on extending endurance and range with flight path planning

Problem Formulation

Assumptions:

Thermal vertical velocity distribution (\(w\)) is Gaussian
Thermal does not move w.r.t. surrounding air
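One common reading of the Gaussian assumption is a bell-shaped updraft profile centered on the thermal. The sketch below assumes the form \(w = W_0 \exp(-d^2 / R_0^2)\), where \(d\) is the horizontal distance to the thermal center; the exact parameterization, the function name, and the example numbers are assumptions for illustration.

```python
import math


def thermal_updraft(x, y, x_th, y_th, w0, r0):
    """Vertical air velocity at (x, y) for a Gaussian-shaped thermal.

    (x_th, y_th) is the thermal center, w0 the core strength in m/s,
    and r0 the characteristic radius in m. Assumed model, see the text.
    """
    d2 = (x - x_th) ** 2 + (y - y_th) ** 2
    return w0 * math.exp(-d2 / r0 ** 2)


# Hypothetical thermal: 3 m/s core strength, 20 m radius, centered at the origin.
w_center = thermal_updraft(0.0, 0.0, 0.0, 0.0, w0=3.0, r0=20.0)
w_edge = thermal_updraft(20.0, 0.0, 0.0, 0.0, w0=3.0, r0=20.0)
```

At one radius from the center the updraft has dropped to \(W_0 / e\), roughly 37% of the core strength in this model.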

Problem Formulation

\(s = (s^u, s^{th}); \quad s^u = (p^u, v, \psi, \phi, \dot{\phi}, h); s^{th} = (p^{th}, W_0, R_0)\)
\(\mathcal{A} = \{-45^\circ, -30^\circ, -15^\circ, 0^\circ, 15^\circ, 30^\circ, 45^\circ\}\) roll angle
\(T\): vehicle dynamics and process noise
\(R(s, a, s') = (h_{s'} - h_s)\)
\(\mathcal{O}\): sensor readings
\(Z\): Gaussian

No pitch because that would be "cheating"
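The formulation above can be sketched in code as follows. This is a heavily simplified illustration, not the vehicle model from the paper: it assumes constant airspeed, an instantly achieved roll command (so the roll rate \(\dot{\phi}\) is omitted), a coordinated turn, and a fixed hypothetical sink rate. Only the action set and the altitude-gain reward are taken from the slide.

```python
import math
from dataclasses import dataclass

# Action space from the slide: commanded roll angles in degrees.
ROLL_ACTIONS_DEG = (-45, -30, -15, 0, 15, 30, 45)
G = 9.81  # gravitational acceleration, m/s^2


@dataclass(frozen=True)
class GliderState:
    x: float    # position east, m
    y: float    # position north, m
    v: float    # airspeed, m/s
    psi: float  # heading, rad
    phi: float  # roll angle, rad
    h: float    # altitude, m


def step(state, roll_deg, updraft, dt=1.0, sink_rate=0.5):
    """Advance the assumed simplified dynamics by dt seconds.

    updraft is the vertical air velocity at the aircraft (m/s);
    sink_rate is a hypothetical constant still-air sink rate (m/s).
    """
    phi = math.radians(roll_deg)
    # Coordinated turn: heading rate = g * tan(roll) / airspeed.
    psi = state.psi + G * math.tan(phi) / state.v * dt
    x = state.x + state.v * math.cos(psi) * dt
    y = state.y + state.v * math.sin(psi) * dt
    h = state.h + (updraft - sink_rate) * dt
    return GliderState(x, y, state.v, psi, phi, h)


def reward(s, s_next):
    """Altitude gained over the transition: R(s, a, s') = h_{s'} - h_s."""
    return s_next.h - s.h


s0 = GliderState(x=0.0, y=0.0, v=10.0, psi=0.0, phi=0.0, h=100.0)
s1 = step(s0, roll_deg=0, updraft=2.0)
gain = reward(s0, s1)
```

With wings level in a 2 m/s updraft and the assumed 0.5 m/s sink rate, the aircraft gains 1.5 m in one second.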

Approach

\(R(s, a, s') = h_{s'} - h_s\)

\(R(b) = tr(cov(b))\)

Online
Multi-forecast model-predictive control

4 second horizon

12 second horizon
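One possible reading of "multi-forecast model-predictive control" is: sample several thermal hypotheses from the belief, simulate each candidate roll angle against every hypothesis over the planning horizon, and pick the roll angle with the best average altitude gain. The sketch below illustrates that idea only. The dynamics, the thermal model, the sink rate, the airspeed, and all names are assumptions, and it covers only the altitude-gain objective, not the belief-uncertainty objective \(R(b)\).

```python
import math
import random

ROLL_ACTIONS_DEG = (-45, -30, -15, 0, 15, 30, 45)
G = 9.81  # m/s^2


def updraft(x, y, thermal):
    """Assumed Gaussian-shaped thermal; thermal = (x_th, y_th, w0, r0)."""
    x_th, y_th, w0, r0 = thermal
    return w0 * math.exp(-((x - x_th) ** 2 + (y - y_th) ** 2) / r0 ** 2)


def simulate_gain(pose, roll_deg, thermal, horizon_s, dt=0.5, v=9.0, sink=0.6):
    """Altitude gained by holding one roll angle for horizon_s seconds.

    pose = (x, y, heading). Uses assumed constant-airspeed, coordinated-turn
    kinematics and a hypothetical constant sink rate.
    """
    x, y, psi = pose
    phi = math.radians(roll_deg)
    gain = 0.0
    for _ in range(int(horizon_s / dt)):
        psi += G * math.tan(phi) / v * dt
        x += v * math.cos(psi) * dt
        y += v * math.sin(psi) * dt
        gain += (updraft(x, y, thermal) - sink) * dt
    return gain


def choose_roll(pose, thermal_samples, horizon_s=4.0):
    """Return the roll angle with the highest gain averaged over all samples."""
    def expected_gain(roll_deg):
        total = sum(
            simulate_gain(pose, roll_deg, th, horizon_s) for th in thermal_samples
        )
        return total / len(thermal_samples)

    return max(ROLL_ACTIONS_DEG, key=expected_gain)


# Hypothetical belief: thermal center believed to be about 30 m to the left
# (positive y) of an aircraft at the origin heading along +x.
rng = random.Random(0)
samples = [(rng.gauss(0.0, 5.0), rng.gauss(30.0, 5.0), 3.0, 20.0) for _ in range(20)]
best_roll = choose_roll((0.0, 0.0, 0.0), samples, horizon_s=4.0)
```

In this convention a positive roll angle turns toward positive y, so the planner selects a positive roll to turn toward the believed thermal position. The cost is one short simulation per action per sample, which is the kind of budget that can fit on a small microcontroller.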

Approach

Computational Resources

Implemented on a Pixhawk
"32-bit ARM processor with only 168MHz clock speed and 256KB RAM"
FPU?
< 1s per action computation

Hardware Experiments

Radian Pro (2m wingspan)

"Nearly constant cloud cover, frequent gusty winds and rain"

Sim-to-real gap

Hardware Experiments

Typical flight time: 2000 s ≈ 33 min (about half an hour)

Hardware Experiment Results

Critique

Negatives:
- Doesn't actually solve the hard part of the POMDP (the exploration vs exploitation tradeoff)
- Unimodal beliefs
- Does not take information on where thermals are likely to occur into account
Positives:
- Hardware experiments
- Controlled for many confounding factors (airframe, battery, wave lift)

Impact and Legacy

Future Work / Reading

"companion computers on larger sUAVs may be able to run a full-fledged POMDP solver in real time. Design and evaluation of a controller based on solving the thermalling POMDP near-optimally, e.g., using an approach similar to Slade et al. [36]’s, is a direction for future work."

[36] P. Slade, P. Culbertson, Z. Sunberg, and M. J. Kochenderfer. Simultaneous active parameter estimation and control using sampling-based Bayesian reinforcement learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017. URL https://arxiv.org/abs/1707.09055.

Contributions (Recap)

Formulate the thermalling problem as a POMDP with key assumptions that simplify the problem without crippling performance.
Propose an algorithm simple enough to run on a Pixhawk microcontroller.
Implement POMDSoar as an open-source ArduPlane module.
Demonstrate convincingly that POMDSoar outperforms ArduSoar.
