Learning step-level dynamic soaring in shear flow

Hong Liu; Jinpeng Huang; Jixin Lu; Lunbing Chen; Yang Xiang; Yufei Yin

arXiv: 2604.12413 · v1 · submitted 2026-04-14 · physics.flu-dyn · cs.RO


Lunbing Chen, Jixin Lu, Yufei Yin, Jinpeng Huang, Yang Xiang, Hong Liu

Pith reviewed 2026-05-10 15:23 UTC · model grok-4.3

classification: physics.flu-dyn, cs.RO

keywords: dynamic soaring, shear flow, reinforcement learning, state feedback control, energy harvesting, autonomous navigation, biological flight


The pith

Dynamic soaring can emerge from step-by-step local feedback control without any explicit trajectory planning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that sustained energy-harvesting flight in wind shear does not require planning complete cycles or assuming steady flow. Instead, a control policy that reacts at each time step to immediate local measurements is sufficient. Policies trained via reinforcement learning produce robust flight across changing shear conditions and organize naturally into a two-phase pattern of turning and climbing that trades energy gain against forward progress. This structure reproduces features seen in birds and in optimal-control calculations. The result matters because it simplifies the design of autonomous vehicles that must operate in unsteady, flow-coupled environments and offers a concrete mechanism for how animals might achieve the same performance with limited sensing.

Core claim

Dynamic soaring can emerge from step-level, state-feedback control using only local sensing, without explicit trajectory planning. Deep reinforcement learning yields policies that achieve robust omnidirectional navigation across diverse shear-flow conditions. The learned behavior organizes into a structured control law coordinating turning and vertical motion, giving rise to a two-phase strategy governed by a trade-off between energy extraction and directional progress. The resulting policy generalizes across varying conditions and reproduces key features observed in biological flight and optimal-control solutions.

What carries the argument

The step-level state-feedback policy, which at each instant maps local flow and motion measurements to coordinated changes in heading and climb rate.
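A minimal sketch of what "step-level state feedback" means operationally, assuming the action is the pair (C_L, ϕ) shown in Figure 2D. Whatever network the authors trained is replaced here by a hypothetical linear map (`linear_feedback_policy`, with made-up gains and actuator limits) purely so the loop runs; the point is that each decision reads only the current observation, with no stored trajectory or lookahead.

```python
import math
from typing import Callable, Sequence

Action = tuple[float, float]  # (lift coefficient C_L, bank angle phi in radians)


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def linear_feedback_policy(
    obs: Sequence[float],
    gains_cl: Sequence[float],
    gains_phi: Sequence[float],
    cl_range: tuple[float, float] = (0.0, 1.5),  # hypothetical actuator limits
    phi_max: float = math.radians(60.0),         # hypothetical bank limit
) -> Action:
    """Hypothetical stand-in for the learned policy: a memoryless map obs -> action."""
    cl = clip(sum(g * o for g, o in zip(gains_cl, obs)), *cl_range)
    phi = clip(sum(g * o for g, o in zip(gains_phi, obs)), -phi_max, phi_max)
    return cl, phi


def run_episode(
    step_env: Callable[[Action], tuple[Sequence[float], bool]],
    obs0: Sequence[float],
    policy: Callable[[Sequence[float]], Action],
    max_steps: int = 1000,
) -> int:
    """Closed loop: at every step, act on the current observation only. Returns steps taken."""
    obs, done, steps = obs0, False, 0
    while not done and steps < max_steps:
        action = policy(obs)          # no trajectory, no lookahead, no stored plan
        obs, done = step_env(action)  # environment advances one time step
        steps += 1
    return steps
```

Swapping the linear map for a trained network changes the function inside `policy`, not the structure of the loop.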

Load-bearing premise

The simulation environment and reward function used in reinforcement learning faithfully capture the essential physics and objectives of real dynamic soaring.

What would settle it

Flight tests in which a real glider or drone, using only onboard local wind and motion sensors and the learned reactive rule, fails to sustain net energy gain or directional progress in measured unsteady shear.
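A sketch of the pass/fail check such a flight test implies, under stated assumptions: energy is taken as specific air-relative energy u²/2 + g·z, progress is the displacement along a fixed target bearing, and `LogSample` and `sustains_soaring` are hypothetical names, not anything from the paper. Comparing only the first and last samples is the simplest version; a real test would examine many soaring cycles.

```python
import math
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass(frozen=True)
class LogSample:
    """One hypothetical telemetry record from a flight test."""
    t: float         # time, s
    x: float         # horizontal position, m
    y: float         # horizontal position, m
    z: float         # altitude, m
    airspeed: float  # m/s


def specific_energy(s: LogSample, g: float = G) -> float:
    """Air-relative specific energy u^2/2 + g*z (assumed definition), J/kg."""
    return 0.5 * s.airspeed ** 2 + g * s.z


def sustains_soaring(log: list[LogSample], target_bearing: float) -> tuple[bool, float, float]:
    """Return (passed, net energy change, progress along target bearing) between first and last samples.

    target_bearing is measured counterclockwise from the x-axis, in radians (assumption).
    """
    if len(log) < 2:
        raise ValueError("need at least two samples")
    first, last = log[0], log[-1]
    d_energy = specific_energy(last) - specific_energy(first)
    progress = (last.x - first.x) * math.cos(target_bearing) + (last.y - first.y) * math.sin(target_bearing)
    return d_energy >= 0.0 and progress > 0.0, d_energy, progress
```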

Figures

Figures reproduced from arXiv: 2604.12413 by Hong Liu, Jinpeng Huang, Jixin Lu, Lunbing Chen, Yang Xiang, Yufei Yin.

**Figure 1.** Problem formulation and deep reinforcement learning framework for autonomous dynamic soaring. (A) Three-dimensional trajectory of the navigation task. (B) The point-mass glider model [14]. The egocentric frame (x_e, y_e, z) denotes the heading, left-wing, and up directions. u, v, and w represent airspeed, ground velocity, and wind velocity. The aerodynamic states are defined by pitch θ, heading ψ, and bank angl…
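The caption's u, v, and w are related by the wind triangle u = v - w (airspeed equals ground velocity minus wind). The sketch below shows that relation and a projection into the egocentric frame. It assumes the frame is the world frame rotated about the vertical by the heading ψ, with ψ measured counterclockwise from the world x-axis; both conventions are assumptions, since the caption is truncated.

```python
import math

Vec3 = tuple[float, float, float]


def airspeed_vector(v_ground: Vec3, w_wind: Vec3) -> Vec3:
    """Wind triangle: air-relative velocity u = v - w."""
    return (v_ground[0] - w_wind[0], v_ground[1] - w_wind[1], v_ground[2] - w_wind[2])


def to_egocentric(vec: Vec3, psi: float) -> Vec3:
    """Express a world-frame vector in (x_e, y_e, z) = (heading, left wing, up).

    Assumes a yaw-only rotation by psi, measured counterclockwise from the world x-axis.
    """
    x, y, z = vec
    c, s = math.cos(psi), math.sin(psi)
    return (c * x + s * y, -s * x + c * y, z)
```

The scalar airspeed is then `math.hypot(*airspeed_vector(v, w))`.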

**Figure 2.** Emergence of a two-phase dynamic-soaring navigation strategy governed by kinetic-energy management. (A–D) Time evolution of key variables along a representative crosswind trajectory (Figure 1A): (A) airspeed u, ground-directed velocity v_net, and altitude z; (B) total energy e, kinetic energy e_k, and potential energy e_p; (C) pitch angle θ and heading angle ψ; (D) control actions C_L and ϕ. The grey line ind…
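A sketch of the quantities plotted in panels A and B, per unit mass. It assumes kinetic energy is computed from airspeed (e_k = u²/2) and potential energy from altitude (e_p = g·z); whether the paper uses airspeed or inertial speed is not stated here (reference [17] discusses the difference). It also assumes v_net is the horizontal ground velocity projected onto the direction toward the target.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def specific_energies(airspeed: float, altitude: float, g: float = G) -> tuple[float, float, float]:
    """Return (e, e_k, e_p) per unit mass, with e_k from airspeed (assumption) and e_p = g*z."""
    e_k = 0.5 * airspeed ** 2
    e_p = g * altitude
    return e_k + e_p, e_k, e_p


def ground_directed_velocity(v_ground_xy: tuple[float, float],
                             pos_xy: tuple[float, float],
                             target_xy: tuple[float, float]) -> float:
    """Horizontal ground velocity projected onto the unit vector toward the target (assumed v_net)."""
    dx, dy = target_xy[0] - pos_xy[0], target_xy[1] - pos_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0  # already at the target: direction undefined
    return (v_ground_xy[0] * dx + v_ground_xy[1] * dy) / dist
```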

**Figure 3.** Structured policy representation in observation space under a fixed condition (ψ_t = 90°, w_ref = 10 m/s, δ = 0.55 m). Columns 1–2 show a representative successful trajectory colored by ϕ and C_L, with the start, target, and DS–TG transition marked by a green circle, red circle, and red cross. Columns 3–4 show occupancy-filtered heatmaps from 1,000 successful DS-phase trajectories (TG phase in Figure S3), r…
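The condition in this caption is parameterized by w_ref and δ. A convenient model of a finite-thickness shear layer is a logistic profile W(z) = w_ref / (1 + exp(-z/δ)), whose gradient peaks at the layer center with value w_ref/(4δ); for w_ref = 10 m/s and δ = 0.55 m that is about 4.5 s⁻¹. Treat the profile as an assumption: this review does not state the paper's exact wind model.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def logistic_wind(z: float, w_ref: float, delta: float) -> float:
    """Assumed shear profile W(z) = w_ref / (1 + exp(-z / delta)), z measured from the layer center."""
    return w_ref * _sigmoid(z / delta)


def logistic_wind_gradient(z: float, w_ref: float, delta: float) -> float:
    """dW/dz = (w_ref / delta) * s * (1 - s), where s = sigmoid(z / delta); peaks at z = 0."""
    s = _sigmoid(z / delta)
    return (w_ref / delta) * s * (1.0 - s)


# Figure 3 condition: w_ref = 10 m/s, delta = 0.55 m
print(logistic_wind(0.0, 10.0, 0.55))           # w_ref / 2 at the layer center
print(logistic_wind_gradient(0.0, 10.0, 0.55))  # w_ref / (4 * delta), about 4.5 1/s
```

The stable sigmoid avoids `OverflowError` from `math.exp` far below the layer.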

**Figure 4.** Robustness and generalization under out-of-distribution conditions. (A, C) Representative trajectory in a spatially varying wind field with coupled speed and shear variations. (B) Normalized spatial distribution of the harmonic disturbance field H(p) (defined in subsection 4.4). (D–F) Success-rate heatmaps under perturbed wind conditions, showing robust performance across variations in wind-direction scale…
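Success rates like those in panels D–F are binomial proportions, and the referee report below asks for error bars on them. One standard choice, sketched here, is the Wilson score interval, which behaves better than the normal approximation when rates approach 100%. The function name and the 95% default (z = 1.96) are illustrative choices, not the paper's procedure.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate (z = 1.96 gives about 95% coverage)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z2 / (4.0 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

For example, 95 successes in 100 trials gives an interval of roughly 0.89 to 0.98.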

**Figure 5.** Comparison of ground-speed envelopes and energy-direction trade-offs across learned, biological, and optimal strategies. (A–C) Ground-speed envelopes under different wind conditions. (A) RL policy predictions for w_ref = 6, 10, 18 m/s in polar coordinates. (B) Experimental envelopes derived from biological flight data [10], fitted using a generalized additive model [11], with background shading indicating d…
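Panel B fits a generalized additive model to biological data. A much cruder stand-in, useful for eyeballing a policy's envelope in the same polar layout, is to bin ground speed by course relative to the wind and take an upper quantile per bin. The angle convention, the 12 bins, and the 0.9 quantile below are hypothetical choices, not the paper's method.

```python
import math
from collections import defaultdict


def polar_envelope(samples, n_bins: int = 12, q: float = 0.9) -> dict[float, float]:
    """Bin (relative_course_rad, ground_speed) samples by angle; return {bin center in degrees: upper-quantile speed}.

    relative_course_rad is the course measured from the wind direction (assumed convention).
    """
    width = 2.0 * math.pi / n_bins
    bins = defaultdict(list)
    for angle, speed in samples:
        k = int((angle % (2.0 * math.pi)) // width) % n_bins  # wrap into [0, 2*pi) first
        bins[k].append(speed)
    envelope = {}
    for k in sorted(bins):
        speeds = sorted(bins[k])
        idx = min(len(speeds) - 1, int(q * len(speeds)))  # simple empirical upper quantile
        envelope[round(math.degrees((k + 0.5) * width), 6)] = speeds[idx]
    return envelope
```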

Original abstract

Dynamic soaring enables sustained flight by extracting energy from wind shear, yet it is commonly understood as a cycle-level maneuver that assumes stable flow conditions. In realistic unsteady environments, however, such assumptions are often violated, raising the question of whether explicit cycle-level planning is necessary. Here, we show that dynamic soaring can emerge from step-level, state-feedback control using only local sensing, without explicit trajectory planning. Using deep reinforcement learning as a tool, we obtain policies that achieve robust omnidirectional navigation across diverse shear-flow conditions. The learned behavior organizes into a structured control law that coordinates turning and vertical motion, giving rise to a two-phase strategy governed by a trade-off between energy extraction and directional progress. The resulting policy generalizes across varying conditions and reproduces key features observed in biological flight and optimal-control solutions. These findings identify a feedback-based control structure underlying dynamic soaring, demonstrating that efficient energy-harvesting flight can emerge from local interactions with the flow without explicit planning, and providing insights for biological flight and autonomous systems in complex, flow-coupled environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows RL can produce a local step-level policy for dynamic soaring that self-organizes into two phases, but the abstract-level evidence leaves the emergence claim only partly supported.


The main thing to know is that this paper uses deep reinforcement learning to train a step-level feedback controller for dynamic soaring in shear flow, and the resulting policy self-organizes into a two-phase strategy without any explicit cycle planning. What the work does well is show that local state feedback can produce robust omnidirectional navigation across different conditions. The learned behavior matches some patterns seen in birds and in optimal-control solutions, which suggests the RL found something meaningful rather than a simulation trick. Treating RL as a way to discover control structures is a reasonable move here.

The soft spots are mostly around the evidence. The abstract claims success and generalization but gives no specific metrics such as average energy gain, success rates, or variance across runs. No ablations are shown for the reward function or the state inputs, so it is not clear how much the two-phase structure depends on the particular training setup. The concern that the simulation and reward might bake in the periodic behavior is worth checking: if the policy still works when you add noise or change the reward weights, that would help. The state vector needs to be confirmed as truly local, without global position or future information, to support the no-planning claim. How the two phases were identified from the policy is also not detailed.

This paper is aimed at people interested in energy-efficient flight for drones or in the mechanics of biological soaring. It engages honestly with the literature on cycle-level planning versus local control. I think it deserves a serious referee because the idea has potential implications for both fields, even if the current version needs more quantitative support and validation against real physics. My recommendation is to send it out for peer review, with notes asking the authors to add the missing details on metrics, ablations, and the state definition.

Referee Report

3 major / 2 minor

Summary. The manuscript uses deep reinforcement learning to obtain step-level state-feedback policies for dynamic soaring in shear flow. It claims these policies enable robust omnidirectional navigation using only local sensing, without explicit trajectory planning; the behavior self-organizes into a two-phase strategy trading energy extraction against directional progress; the policies generalize across shear conditions and reproduce key features of biological flight and optimal-control solutions.

Significance. If substantiated, the result would be significant for fluid dynamics and control: it would demonstrate that a complex, energy-harvesting maneuver can emerge from local feedback rules rather than cycle-level planning, offering a mechanistic explanation for observed avian strategies and a design principle for autonomous vehicles in unsteady flows. The RL-based discovery of an interpretable two-phase structure is a methodological strength.

major comments (3)

[Abstract] The central claim that the policies achieve robust navigation, generalize, and reproduce biological and optimal-control features is stated without any quantitative metrics (success rates, energy gain per cycle, cross-condition statistics), error bars, or ablation studies, so support for the emergence of the two-phase strategy remains qualitative.
[Methods: state and observation definitions] The manuscript does not explicitly verify that the state vector excludes global position, absolute heading, or future wind-field information. Without this, the claims of "local sensing" and "no explicit planning", which are load-bearing for the main result, cannot be confirmed.
[Results: policy analysis] No ablation or sensitivity study is reported on the reward weights (energy extraction versus directional progress) or on added flow unsteadiness, so it is unclear whether the two-phase structure is an emergent property of the physics or an artifact of the specific training setup.

minor comments (2)

[Methods] Notation for the state vector and action space should be collected in a single table for clarity.
[Figures] Figure captions describing trajectories would benefit from explicit labels indicating the two phases and quantitative annotations (altitude, speed, energy).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the positive evaluation of the significance of our work and for the constructive comments. We address each major comment point by point below, indicating the revisions we will make to strengthen the manuscript.


Referee: [Abstract] The central claim that the policies achieve robust navigation, generalize, and reproduce biological and optimal-control features is stated without any quantitative metrics (success rates, energy gain per cycle, cross-condition statistics), error bars, or ablation studies, so support for the emergence of the two-phase strategy remains qualitative.

Authors: We agree that the abstract would be strengthened by including quantitative support for the claims. In the revised manuscript, we have updated the abstract to report key metrics drawn from the results, including navigation success rates across tested conditions, mean energy gain per cycle with its variability, and cross-condition generalization statistics. These additions provide a more quantitative basis for the reported emergence of the two-phase strategy while preserving the abstract's conciseness.
Revision: yes.
Referee: [Methods: state and observation definitions] The manuscript does not explicitly verify that the state vector excludes global position, absolute heading, or future wind-field information. Without this, the claims of "local sensing" and "no explicit planning", which are load-bearing for the main result, cannot be confirmed.

Authors: We thank the referee for highlighting the need for explicit verification. The observation vector is defined exclusively from local quantities (body-frame velocities, local shear gradient, height relative to the shear layer, and body rates); global position, absolute heading, and any future or non-local wind information are deliberately omitted by construction. We have added a dedicated verification paragraph in the Methods section that lists the exact state components and confirms the absence of global or predictive information, thereby directly supporting the local-sensing and no-explicit-planning claims.
Revision: yes.
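A sketch of how the locality claim in this response can be enforced and tested in code: the observation is built from a whitelist of local fields, so changing a global field such as position must leave it unchanged. The simulator-state keys (`u_body`, `dWdz`, `z_layer`, `position_xy`, and so on) and the `LocalObservation` layout are hypothetical, following only the list of quantities named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalObservation:
    """Hypothetical policy input holding only the local quantities listed in the response."""
    u_body: tuple[float, float, float]      # body-frame velocity components, m/s
    shear_gradient: float                   # local dW/dz at the vehicle, 1/s
    height_above_layer: float               # altitude relative to the shear layer, m
    body_rates: tuple[float, float, float]  # roll, pitch, yaw rates, rad/s

    def to_vector(self) -> list[float]:
        return [*self.u_body, self.shear_gradient, self.height_above_layer, *self.body_rates]


def make_observation(sim_state: dict) -> LocalObservation:
    """Read only whitelisted local fields; global keys such as 'position_xy' are never touched."""
    return LocalObservation(
        u_body=tuple(sim_state["u_body"]),
        shear_gradient=sim_state["dWdz"],
        height_above_layer=sim_state["z"] - sim_state["z_layer"],
        body_rates=tuple(sim_state["body_rates"]),
    )
```

The invariance check (same observation for two states that differ only in global position) is the kind of test that the promised verification paragraph could report.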
Referee: [Results: policy analysis] No ablation or sensitivity study is reported on the reward weights (energy extraction versus directional progress) or on added flow unsteadiness, so it is unclear whether the two-phase structure is an emergent property of the physics or an artifact of the specific training setup.

Authors: This is a fair observation. To demonstrate that the two-phase structure is robust rather than an artifact, we have conducted additional sensitivity analyses. We varied the relative weighting between the energy-extraction and directional-progress terms over a broad range and include the resulting policy behaviors in a new supplementary figure; the two-phase organization persists. We have also evaluated the trained policies under superimposed flow unsteadiness (sinusoidal perturbations to the base shear profile) and report that the strategy remains functional with only modest performance degradation. These results are now summarized in the revised Results section and support the interpretation that the structure emerges from the physics of the problem.
Revision: yes.
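The response mentions sinusoidal perturbations to the base shear profile without giving their form. One simple version, assumed here, scales the whole profile in time: W(z, t) = W₀(z)·(1 + a·sin(2πt/T)). The amplitude a, the period T, and the function name are hypothetical; a spatial or additive perturbation would be equally plausible.

```python
import math
from typing import Callable


def sinusoidally_perturbed(
    base: Callable[[float], float], amplitude: float, period: float
) -> Callable[[float, float], float]:
    """Return W(z, t) = base(z) * (1 + amplitude * sin(2*pi*t / period))."""
    if period <= 0:
        raise ValueError("period must be positive")

    def wind(z: float, t: float) -> float:
        # The time modulation is applied uniformly at all heights.
        return base(z) * (1.0 + amplitude * math.sin(2.0 * math.pi * t / period))

    return wind
```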

Circularity Check

0 steps flagged

No significant circularity; result emerges from RL interaction rather than definitional closure


The paper's central result is obtained by training a deep reinforcement learning policy in a simulated shear-flow environment with a reward combining energy extraction and directional progress. The two-phase strategy is discovered through interaction with the environment; it is not presupposed in the state representation or the reward definition in a way that would force the outcome by construction. No derivation chain reduces to self-referential inputs, fitted parameters renamed as predictions, or load-bearing self-citations. The simulation acts as an independent testbed, and the emergent behavior is checked against external benchmarks (biological flight data and optimal-control solutions).
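To make the circularity argument concrete: a reward built only from an energy term and a progress term, as in this sketch, says nothing about turning, climbing, or phases, so any two-phase structure has to come from the dynamics. The weights, the use of specific-energy change, and the distance-to-target progress term are assumptions for illustration; the paper's exact reward is not reproduced in this review.

```python
import math


def step_reward(
    e_prev: float,
    e_next: float,
    pos_prev: tuple[float, float],
    pos_next: tuple[float, float],
    target: tuple[float, float],
    w_energy: float = 0.01,   # hypothetical weight on specific-energy change (J/kg)
    w_progress: float = 1.0,  # hypothetical weight on distance closed toward the target (m)
) -> float:
    """Energy-extraction term plus directional-progress term; nothing here mentions phases or maneuvers."""
    progress = math.dist(pos_prev, target) - math.dist(pos_next, target)
    return w_energy * (e_next - e_prev) + w_progress * progress
```

A reward-weight ablation of the kind the referee requests would then be a sweep over `w_energy` and `w_progress`.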

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract provides no explicit list of free parameters or invented entities. The central claim rests on standard assumptions about the fidelity of the simulated shear-flow environment and the appropriateness of the reinforcement-learning reward function.

axioms (1)

Domain assumption: the numerical simulation of shear flow and vehicle dynamics is sufficiently realistic that policies learned inside it will transfer to physical conditions.
Invoked when claiming that the learned policies achieve robust navigation across diverse shear-flow conditions.
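One standard way to make this assumption less load-bearing is domain randomization (see reference [48]): resample the wind condition every training episode so the policy cannot overfit a single flow. Whether the authors randomize this way is not stated here; the parameter ranges below are hypothetical, chosen only to bracket the w_ref values (6 to 18 m/s) that appear in Figure 5.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WindCondition:
    w_ref: float     # reference wind speed, m/s
    delta: float     # shear-layer thickness parameter, m
    wind_dir: float  # wind direction, rad


def sample_condition(
    rng: random.Random,
    w_ref_range: tuple[float, float] = (6.0, 18.0),  # hypothetical, brackets the Figure 5 values
    delta_range: tuple[float, float] = (0.3, 1.0),   # hypothetical
) -> WindCondition:
    """Draw one training-episode wind condition uniformly from the given ranges."""
    return WindCondition(
        w_ref=rng.uniform(*w_ref_range),
        delta=rng.uniform(*delta_range),
        wind_dir=rng.uniform(0.0, 2.0 * math.pi),
    )


rng = random.Random(0)  # seeded so training runs are reproducible
conditions = [sample_condition(rng) for _ in range(3)]
```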

pith-pipeline@v0.9.0 · 5486 in / 1346 out tokens · 52686 ms · 2026-05-10T15:23:50.144517+00:00


Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 1 internal anchor

[1] Henri Weimerskirch, T Guionnet, JSSA Martin, Scott A Shaffer, and DP Costa. Fast and fuel efficient? Optimal use of wind by flying albatrosses. Proceedings of the Royal Society of London. Series B: Biological Sciences, 267(1455):1869–1874, 2000.
[2] Henri Weimerskirch, Francesco Bonadonna, Frédéric Bailleul, Géraldine Mabille, Giacomo Dell'Omo, and Hans-Peter Lipp. GPS tracking of foraging albatrosses. Science, 295(5558):1259, 2002.
[3] Yusuke Goto, Ken Yoda, and Katsufumi Sato. Asymmetry hidden in birds' tracks reveals wind, heading, and orientation ability over the ocean. Science Advances, 3(9):e1700097, 2017.
[4] James A Kempton, Joe Wynn, Sarah Bond, James Evry, Annette L Fayet, Natasha Gillies, Tim Guilford, Marwa Kavelaars, Ignacio Juarez-Martinez, Oliver Padget, et al. Optimization of dynamic soaring in a flap-gliding seabird affects its large-scale distribution at sea. Science Advances, 8(22):eabo0200, 2022.
[5] Lord Rayleigh. The soaring of birds. Nature, 27(701):534–535, 1883.
[6] G Sachs, J Traugott, AP Nesterova, and F Bonadonna. Experimental verification of dynamic soaring in albatrosses. Journal of Experimental Biology, 216(22):4222–4232, 2013.
[7] Abdulghani Mohamed, Graham K Taylor, Simon Watkins, and Shane P Windsor. Opportunistic soaring by birds suggests new opportunities for atmospheric energy harvesting by flying robots. Journal of the Royal Society Interface, 19(196):20220671, 2022.
[8] Jack W Langelaan and Nicholas Roy. Enabling new missions for robotic aircraft. Science, 326(5960):1642–1644, 2009.
[9] Philip L Richardson and Ewan D Wakefield. Observations and models of across-wind flight speed of the wandering albatross. Royal Society Open Science, 9(11):211364, 2022.
[10] Leo Uesaka, Yusuke Goto, Masaru Naruoka, Henri Weimerskirch, Katsufumi Sato, and Kentaro Q Sakamoto. Wandering albatrosses exert high take-off effort only when both wind and waves are gentle. eLife, 12:RP87016, 2023.
[11] Yusuke Goto, Henri Weimerskirch, Keiichi Fukaya, Ken Yoda, Masaru Naruoka, and Katsufumi Sato. Albatrosses employ orientation and routing strategies similar to yacht racers. Proceedings of the National Academy of Sciences, 121(23):e2312851121, 2024.
[12] Gottfried Sachs. Minimum shear wind strength required for dynamic soaring of albatrosses. Ibis, 147(1):1–10, 2005.
[13] Markus Deittert, Arthur Richards, Chris A Toomer, and Anthony Pipe. Engineless unmanned aerial vehicle propulsion by dynamic soaring. Journal of Guidance, Control, and Dynamics, 32(5):1446–1457, 2009.
[14] Lunbing Chen, Yufei Yin, Yang Xiang, Suyang Qin, and Hong Liu. Optimal dynamic soaring trades off energy harvest and directional flight. iScience, 28(6), 2025.
[15] Graham K Taylor, Kate V Reynolds, and Adrian LR Thomas. Soaring energetics and glide performance in a moving atmosphere. Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1704):20150398, 2016.
[16] Gabriel D Bousquet, Michael S Triantafyllou, and Jean-Jacques E Slotine. Optimal dynamic soaring consists of successive shallow arcs. Journal of the Royal Society Interface, 14(135):20170496, 2017.
[17] Gottfried Sachs. Kinetic energy in dynamic soaring—inertial speed and airspeed. Journal of Guidance, Control, and Dynamics, 42(8):1812–1821, 2019.
[18] Gabriel D Bousquet, Michael S Triantafyllou, and Jean-Jacques E Slotine. Dynamic soaring in finite-thickness wind shears: an asymptotic solution. In AIAA Guidance, Navigation, and Control Conference, page 1908, 2017.
[19] Marvin Harms, Jaeyoung Lim, David Rohr, Friedrich Rockenbauer, Nicholas Lawrance, and Roland Siegwart. Robust optimization-based autonomous dynamic soaring with a fixed-wing UAV. arXiv preprint arXiv:2512.06610, 2025.
[20] Jean-Marie Kai, Tarek Hamel, and Claude Samson. Novel approach to dynamic soaring modeling and simulation. Journal of Guidance, Control, and Dynamics, 42(6):1250–1260, 2019.
[21] Jack W Langelaan, John Spletzer, Corey Montella, and Joachim Grenestedt. Wind field estimation for autonomous dynamic soaring. In 2012 IEEE International Conference on Robotics and Automation, pages 16–22. IEEE, 2012.
[22] Anya R Jones, Oksan Cetiner, and Marilyn J Smith. Physics and modeling of large flow disturbances: discrete gust encounters for modern air vehicles. Annual Review of Fluid Mechanics, 54(1):469–493, 2022.
[23] John J Bird, Jack W Langelaan, Corey Montella, John Spletzer, and Joachim L Grenestedt. Closing the loop in dynamic soaring. In AIAA Guidance, Navigation, and Control Conference, page 0263, 2014.
[24] Haichao Hong, Luoqin Liu, Florian Holzapfel, and Gottfried Sachs. Dynamic soaring under different atmospheric stability conditions. Journal of Guidance, Control, and Dynamics, 46(5):970–977, 2023.
[25] Murat Bronz, Nikola Gavrilovic, Antoine Drouin, Gautier Hattenberger, and Jean-Marc Moschetta. Flight testing of dynamic soaring part-2: Open-field inclined circle trajectory. In AIAA Aviation 2021 Forum, page 2803, 2021.
[26] Gautam Reddy, Antonio Celani, Terrence J Sejnowski, and Massimo Vergassola. Learning to soar in turbulent environments. Proceedings of the National Academy of Sciences, 113(33):E4877–E4884, 2016.
[27] Peter Gunnarson, Ioannis Mandralis, Guido Novati, Petros Koumoutsakos, and John O Dabiri. Learning efficient navigation in vortical flow fields. Nature Communications, 12(1):7143, 2021.
[28] Yusheng Jiao, Haotian Hang, Josh Merel, and Eva Kanso. Sensing flow gradients is necessary for learning autonomous underwater navigation. Nature Communications, 16(1):3044, 2025.
[29] Stefan Notter, Fabian Schimpf, Gregor Müller, and Walter Fichter. Hierarchical reinforcement learning approach for autonomous cross-country soaring. Journal of Guidance, Control, and Dynamics, 46(1):114–126, 2023.
[30] Jacob R Adamski, Vladimir V Golubev, Snorri Gudmundsson, and Fedor Kuznetsov. Towards development of a dynamic soaring capable UAV using reinforcement learning. In AIAA AVIATION 2023 Forum, page 4455, 2023.
[31] Yoav Flato, Roi Harel, Aviv Tamar, Ran Nathan, and Tsevi Beatus. Revealing principles of autonomous thermal soaring in windy conditions using vulture-inspired deep reinforcement-learning. Nature Communications, 15(1):4942, 2024.
[32] Thomas Darveniza, Robert Wong, Shuyu I Zhu, Zac Pujic, Biao Sun, Matthew Levendosky, Ramesh Agarwal, Michael H McCullough, and Geoffrey J Goodhill. Larval zebrafish minimize energy consumption during hunting via adaptive movement selection. Proceedings of the National Academy of Sciences, 123(7):e2513853123, 2026.
[33] Corey Montella and John R Spletzer. Reinforcement learning for autonomous dynamic soaring in shear winds. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3423–3428. IEEE, 2014.
[34] Siddhartha Verma, Guido Novati, and Petros Koumoutsakos. Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proceedings of the National Academy of Sciences, 115(23):5849–5854, 2018.
[35] Sara Abozeid, Sameer Pokhrel, and Sameh Eisa. A comprehensive assessment to the potential of reinforcement learning in dynamic soaring. In AIAA SCITECH 2023 Forum, page 2236, 2023.
[36] Milo F DiPaola and Tyler F Barkin. A framework for developing robust, autonomous, power managed dynamic soaring flight controllers using deep reinforcement learning. In AIAA AVIATION 2023 Forum, page 4046, 2023.
[37] Mishma Akhtar, Adnan Maqsood, Imran Mir, and Baris Gungordu. Dynamic soaring in UAVs: a deep reinforcement learning approach. The Aeronautical Journal, pages 1–29, 2026.
[38] Yusuke Goto, Ken Yoda, Henri Weimerskirch, and Katsufumi Sato. How did extinct giant birds and pterosaurs fly? A comprehensive modeling approach to evaluate soaring performance. PNAS Nexus, 1(1):pgac023, 2022.
[39] Yiyuan J Zhao. Optimal patterns of glider dynamic soaring. Optimal Control Applications and Methods, 25(2):67–89, 2004.
[40] Richard S Sutton, Andrew G Barto, et al. Reinforcement Learning: An Introduction, volume 1. MIT Press, Cambridge, 1998.
[41] John Anderson. EBOOK: Fundamentals of Aerodynamics (SI units). McGraw Hill, 2011.
[42] Gottfried Sachs, Johannes Traugott, Anna P Nesterova, Giacomo Dell'Omo, Franz Kümmeth, Wolfgang Heidrich, Alexei L Vyssotski, and Francesco Bonadonna. Flying at no mechanical energy cost: disclosing the secret of wandering albatrosses. 2012.
[43] Philip L Richardson, Ewan D Wakefield, and Richard A Phillips. Flight speed and performance of the wandering albatross with respect to wind. Movement Ecology, 6(1):3, 2018.
[44] Roland B Stull. An introduction to boundary layer meteorology. Springer Science & Business Media, 2012.
[45] Lukas Stuber, Simon Luis Jeger, Raphael Zufferey, and Dario Floreano. Miniature multihole airflow sensor for lightweight aircraft over wide speed and angular range. IEEE Robotics and Automation Letters, 2025.
[46] Oren Shoval, Hila Sheftel, Guy Shinar, Yuval Hart, Omer Ramote, Avi Mayo, Erez Dekel, Kathryn Kavanagh, and Uri Alon. Evolutionary trade-offs, Pareto optimality, and the geometry of phenotype space. Science, 336(6085):1157–1160, 2012.
[47] Timothy Paul Lillicrap, Jonathan James Hunt, Alexander Pritzel, Nicolas Manfred Otto Heess, Tom Erez, Yuval Tassa, David Silver, and Daniel Pieter Wierstra. Continuous control with deep reinforcement learning, September 15 2020. US Patent 10,776,692.
[48] Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3803–3810. IEEE, 2018.
[49]

Optimal feedback control as a theory of motor coordi- nation.Nature neuroscience, 5(11):1226–1235, 2002

Emanuel Todorov and Michael I Jordan. Optimal feedback control as a theory of motor coordi- nation.Nature neuroscience, 5(11):1226–1235, 2002

[50]

Yoshinari Yonehara, Yusuke Goto, Ken Yoda, Yutaka Watanuki, Lindsay C Young, Henri Weimerskirch, Charles-André Bost, and Katsufumi Sato. Flight paths of seabirds soaring over the ocean surface enable measurement of fine-scale wind speed and direction. Proceedings of the National Academy of Sciences, 113(32):9039–9044, 2016

[51]

Taewi Kim, Insic Hong, Sunghoon Im, Seungeun Rho, Minho Kim, Yeonwook Roh, Changhwan Kim, Jieun Park, Daseul Lim, Doohoe Lee, et al. Wing-strain-based flight control of flapping-wing drones through reinforcement learning. Nature Machine Intelligence, 6(9):992–1005, 2024

[52]

Judy Shamoun-Baranes, Willem Bouten, E Emiel Van Loon, Christiaan Meijer, and CJ Camphuysen. Flap or soar? How a flight generalist responds to its aerial environment. Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1704), 2016

[53]

Gautam Reddy, Jerome Wong-Ng, Antonio Celani, Terrence J Sejnowski, and Massimo Vergassola. Glider soaring via reinforcement learning in the field. Nature, 562(7726):236–239, 2018

[54]

Marzieh H Derkani, Alberto Alberello, Filippo Nelli, Luke G Bennetts, Katrin G Hessner, Keith MacHutchon, Konny Reichert, Lotfi Aouf, Salman Saeed Khan, and Alessandro Toffoli. Wind, waves, and surface currents in the Southern Ocean: observations from the Antarctic Circumnavigation Expedition. Earth System Science Data Discussions, 2020:1–22, 2020

[55]

Colin J Pennycuick. Gust soaring as a basis for the flight of petrels and albatrosses (Procellariiformes). Avian Science, 2:1–12, 2002

[56]

Marc P Buckley, Jochen Horstmann, Ivan Savelyev, and Jeff R Carpenter. Direct observations of airflow separation over ocean surface waves. Nature Communications, 16(1):5526, 2025

[57]

Sungje Park, Adrian Fanjoy, and Vladimir V Golubev. Application of reinforcement learning for autonomous dynamic soaring. In AIAA SCITECH 2025 Forum, page 2290, 2025

[58]

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870. PMLR, 2018

[59]

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48, 2009

[60]

Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23–

[61]

Wayne K Potts. The chorus-line hypothesis of manoeuvre coordination in avian flocks. Nature, 309(5966):344–345, 1984

[62]

Harold Pomeroy and Frank Heppner. Laboratory determination of startle reaction time of the starling (Sturnus vulgaris). Animal Behaviour, 25:720–725, 1977

[63]

Renaud Barate, Stéphane Doncieux, and Jean-Arcady Meyer. Design of a bio-inspired controller for dynamic soaring in a simulated unmanned aerial vehicle. Bioinspiration & Biomimetics, 1(3):76, 2006

[64]

Henrik Mouritsen. Long-distance navigation and magnetoreception in migratory animals. Nature, 558(7708):50–59, 2018

[65]

Thomas R Yechout. Introduction to aircraft flight mechanics: performance, static stability, dynamic stability, and classical feedback control. AIAA, 2003

[66]

Abdulghani Mohamed, Simon Watkins, Reece Clothier, Mujahid Abdulrahim, Kevin Massey, and Roberto Sabatini. Fixed-wing MAV attitude stability in atmospheric turbulence—part 2: Investigating biologically-inspired sensors. Progress in Aerospace Sciences, 71:1–13, 2014

[67]

Michael O’Connell, Guanya Shi, Xichen Shi, Kamyar Azizzadenesheli, Anima Anandkumar, Yisong Yue, and Soon-Jo Chung. Neural-Fly enables rapid learning for agile flight in strong winds. Science Robotics, 7(66):eabm6597, 2022

[68]

Andrew L Maas, Awni Y Hannun, Andrew Y Ng, et al. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, page 3. Atlanta, GA, 2013

Supplementary Material

No.  NN Actor        NN Critic       Training SR    Test SR
1    [512,512,512]   [512,512,512]   95.5%±0.7%     97.3%±0.8%
2    [512,512]       [512,512]       82.6%±6.4%     82.6%±2.5%
3    [256,256,256]   [256,256,25...
