pith.

arxiv: 2604.12413 · v1 · submitted 2026-04-14 · ⚛️ physics.flu-dyn · cs.RO

Learning step-level dynamic soaring in shear flow

Pith reviewed 2026-05-10 15:23 UTC · model grok-4.3

classification ⚛️ physics.flu-dyn cs.RO
keywords: dynamic soaring · shear flow · reinforcement learning · state feedback control · energy harvesting · autonomous navigation · biological flight

The pith

Dynamic soaring can emerge from step-by-step local feedback control without any explicit trajectory planning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that sustained energy-harvesting flight in wind shear does not require planning complete cycles or assuming steady flow. Instead, a control policy that reacts at each time step to immediate local measurements is sufficient. Policies trained via reinforcement learning produce robust flight across changing shear conditions and organize naturally into a two-phase pattern of turning and climbing that trades energy gain against forward progress. This structure reproduces features seen in birds and in optimal-control calculations. The result matters because it simplifies the design of autonomous vehicles that must operate in unsteady, flow-coupled environments and offers a concrete mechanism for how animals might achieve the same performance with limited sensing.

Core claim

Dynamic soaring can emerge from step-level, state-feedback control using only local sensing, without explicit trajectory planning. Deep reinforcement learning yields policies that achieve robust omnidirectional navigation across diverse shear-flow conditions. The learned behavior organizes into a structured control law coordinating turning and vertical motion, giving rise to a two-phase strategy governed by a trade-off between energy extraction and directional progress. The resulting policy generalizes across varying conditions and reproduces key features observed in biological flight and optimal-control solutions.

What carries the argument

The step-level state-feedback policy, which at each instant maps local flow and motion measurements to coordinated changes in heading and climb rate.
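To make "step-level state feedback" concrete: the controller is a memoryless map from the current local observation to the two control actions named in Figure 2D, the lift coefficient C_L and bank angle ϕ. It is applied once per time step, with no stored trajectory or cycle plan. The sketch below is hypothetical throughout: the observation fields and the placeholder gains stand in for a trained network and are not the learned control law.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

Action = Tuple[float, float]  # (lift coefficient C_L, bank angle phi [rad])


@dataclass(frozen=True)
class LocalObs:
    """Hypothetical on-board measurements available at a single time step."""
    airspeed: float     # u [m/s]
    pitch: float        # theta [rad]
    rel_heading: float  # heading relative to the local wind [rad]
    shear: float        # local wind gradient dw/dz [1/s]


def placeholder_policy(obs: LocalObs) -> Action:
    """Stand-in for a trained network; the gains are made up for illustration."""
    c_l = 0.6 + 0.4 * math.tanh(obs.shear * math.cos(obs.rel_heading))
    phi = 0.8 * math.tanh(-obs.pitch + 0.05 * (obs.airspeed - 15.0))
    return c_l, phi


def fly(sense: Callable[[], LocalObs], actuate: Callable[[Action], None],
        policy: Callable[[LocalObs], Action], n_steps: int) -> None:
    """Step-level feedback: sense, act, repeat. Nothing about past or future
    steps is stored, so each action depends only on the current observation."""
    for _ in range(n_steps):
        actuate(policy(sense()))
```

In a simulator, `sense` would read the current state and `actuate` would advance the dynamics by one step; the point of the structure is that replacing the placeholder with a trained network changes nothing else in the loop.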

Load-bearing premise

The simulation environment and reward function used in reinforcement learning faithfully capture the essential physics and objectives of real dynamic soaring.
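The premise concerns the simulated physics. Figure 1B names a point-mass glider model [14], but this page does not reproduce its equations. As a hedged illustration of what such an environment computes at each step, the sketch below writes a generic point-mass glider in the ground frame: lift perpendicular to the air-relative velocity and tilted by the bank angle ϕ, drag from an assumed parabolic polar C_D = C_D0 + k C_L², and gravity. The airframe constants are illustrative albatross-like values and the function name is ours; neither is taken from the paper.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]
G = 9.81  # gravitational acceleration [m/s^2]


def _add(*vs: Vec) -> Vec:
    return tuple(sum(c) for c in zip(*vs))


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec) -> Vec:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return _scale(a, 1.0 / n)


def glider_acceleration(v_ground: Vec, wind: Vec, c_l: float, phi: float,
                        mass: float = 8.5, area: float = 0.65, rho: float = 1.225,
                        c_d0: float = 0.033, k: float = 0.019) -> Vec:
    """Ground-frame acceleration of a point-mass glider (illustrative model).

    v_ground: inertial velocity; wind: local wind vector at the glider.
    c_l, phi: control inputs (lift coefficient, bank angle in rad).
    Positive phi tilts lift toward the left wing (sign convention assumed).
    Undefined for purely vertical airspeed (no horizontal reference).
    """
    u = _add(v_ground, _scale(wind, -1.0))          # air-relative velocity
    speed = math.sqrt(sum(c * c for c in u))
    u_hat = _unit(u)
    e_left = _unit(_cross((0.0, 0.0, 1.0), u_hat))  # horizontal, left of airflow
    e_up = _cross(u_hat, e_left)                    # unit, normal to u, upward
    q_s = 0.5 * rho * speed ** 2 * area             # dynamic pressure x wing area
    lift_dir = _add(_scale(e_up, math.cos(phi)), _scale(e_left, math.sin(phi)))
    lift = _scale(lift_dir, q_s * c_l)
    drag = _scale(u_hat, -q_s * (c_d0 + k * c_l ** 2))
    weight = (0.0, 0.0, -mass * G)
    return _scale(_add(lift, drag, weight), 1.0 / mass)
```

Because aerodynamic forces depend on the air-relative velocity while the state evolves in the ground frame, a wind gradient along the path changes the airspeed without any thrust; that coupling is what any faithful simulator of dynamic soaring has to capture.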

What would settle it

Flight tests in which a real glider or drone, using only onboard local wind and motion sensors and the learned reactive rule, fails to sustain net energy gain or directional progress in measured unsteady shear.
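The falsification criterion can be made operational on logged flight data. A minimal sketch, assuming the log records total specific energy and ground speed toward the goal at fixed intervals; the function name, the end-to-end energy comparison, and the threshold are hypothetical choices, not a protocol from the paper.

```python
from typing import Sequence


def sustains_soaring(energy: Sequence[float], v_net: Sequence[float],
                     min_progress: float = 0.0) -> bool:
    """Hypothetical pass/fail test on a logged flight.

    energy: specific total energy samples, e.g. u^2/2 + g*z [J/kg]
    v_net:  ground speed toward the target at each sample [m/s]
    Passes if energy does not decline end to end and the mean progress
    toward the target exceeds min_progress.
    """
    if len(energy) < 2 or len(energy) != len(v_net):
        raise ValueError("need two or more aligned samples")
    net_energy_gain = energy[-1] - energy[0]
    mean_progress = sum(v_net) / len(v_net)
    return net_energy_gain >= 0.0 and mean_progress > min_progress
```

A real test would compare over whole soaring cycles and account for sensor noise; the end-to-end comparison here only shows the shape of the check.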

Figures

Figures reproduced from arXiv: 2604.12413 by Hong Liu, Jinpeng Huang, Jixin Lu, Lunbing Chen, Yang Xiang, Yufei Yin.

Figure 1
Problem formulation and deep reinforcement learning framework for autonomous dynamic soaring. (A) Three-dimensional trajectory of the navigation task. (B) The point-mass glider model [14]. The egocentric frame (x_e, y_e, z) denotes the heading, left-wing, and up directions; u, v, and w represent airspeed, ground velocity, and wind velocity. The aerodynamic states are defined by pitch θ, heading ψ, and bank angl…
Figure 2
Emergence of a two-phase dynamic-soaring navigation strategy governed by kinetic-energy management. (A–D) Time evolution of key variables along a representative crosswind trajectory (Figure 1A): (A) airspeed u, ground-directed velocity v_net, and altitude z; (B) total energy e, kinetic energy e_k, and potential energy e_p; (C) pitch angle θ and heading angle ψ; (D) control actions C_L and ϕ. The grey line ind…
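Panel B tracks total, kinetic, and potential energy, and panel A tracks v_net. A minimal per-unit-mass bookkeeping is sketched below, assuming kinetic energy is based on airspeed u (Sachs [17] discusses why the choice between airspeed and inertial speed matters) and that v_net is the horizontal ground velocity projected onto the direction of the target. Both definitions and the function names are our assumptions, not the paper's.

```python
import math
from typing import Tuple

G = 9.81  # gravitational acceleration [m/s^2]


def specific_energy(airspeed: float, altitude: float) -> Tuple[float, float, float]:
    """Return (e_k, e_p, e) per unit mass [J/kg], using airspeed-based kinetic energy."""
    e_k = 0.5 * airspeed ** 2
    e_p = G * altitude
    return e_k, e_p, e_k + e_p


def v_net(v_ground_xy: Tuple[float, float], target_bearing: float) -> float:
    """Horizontal ground velocity projected onto the target direction [m/s].

    target_bearing is measured in radians from the x axis (assumed convention).
    """
    vx, vy = v_ground_xy
    return vx * math.cos(target_bearing) + vy * math.sin(target_bearing)
```

For example, `specific_energy(10.0, 5.0)` gives e_k = 50 J/kg and e_p ≈ 49.05 J/kg, so a glider can trade roughly 5 m of height for the kinetic energy of 10 m/s airspeed.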
Figure 3
Structured policy representation in observation space under a fixed condition (ψ_t = 90°, w_ref = 10 m/s, δ = 0.55 m). Columns 1–2 show a representative successful trajectory colored by ϕ and C_L, with the start, target, and DS–TG transition marked by a green circle, red circle, and red cross, respectively. Columns 3–4 show occupancy-filtered heatmaps from 1,000 successful DS-phase trajectories (TG phase in Figure S3), r…
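The caption fixes the wind by a reference speed w_ref and a length scale δ, but the profile itself is not given on this page. A simple way to model a finite-thickness shear layer (the setting studied by Bousquet et al. [18]) is a logistic profile; the sketch below assumes that form, reads δ as the layer's length scale, and introduces z_c as a hypothetical layer-centre height. Treat it as one plausible parameterization, not the paper's.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def wind_speed(z: float, w_ref: float = 10.0, delta: float = 0.55,
               z_c: float = 0.0) -> float:
    """Hypothetical logistic shear layer: ~0 well below z_c, ~w_ref well above it."""
    return w_ref * _sigmoid((z - z_c) / delta)


def shear_rate(z: float, w_ref: float = 10.0, delta: float = 0.55,
               z_c: float = 0.0) -> float:
    """dw/dz of the same profile; its maximum, w_ref / (4 * delta), occurs at z = z_c."""
    s = _sigmoid((z - z_c) / delta)
    return w_ref * s * (1.0 - s) / delta
```

Under this assumed form, δ = 0.55 m with w_ref = 10 m/s would give a peak shear of about 4.5 s⁻¹, concentrated in a layer a few metres thick.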
Figure 4
Robustness and generalization under out-of-distribution conditions. (A, C) Representative trajectory in a spatially varying wind field with coupled speed and shear variations. (B) Normalized spatial distribution of the harmonic disturbance field H(p) (defined in subsection 4.4). (D–F) Success-rate heatmaps under perturbed wind conditions, showing robust performance across variations in wind-direction scale…
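H(p) is defined in the paper's subsection 4.4, which this page does not reproduce. To illustrate how one normalized field can couple speed and shear variations, the sketch below assumes a separable sinusoid for H and scales a base profile w(z) by (1 + ε H(p)). Because the factor multiplies the whole profile, the local wind speed and dw/dz change together. The wavelength, the amplitude ε, and both function names are hypothetical.

```python
import math
from typing import Callable


def harmonic_disturbance(x: float, y: float, wavelength: float = 400.0) -> float:
    """Hypothetical normalized field H(p) with values in [-1, 1]."""
    k = 2.0 * math.pi / wavelength
    return math.sin(k * x) * math.cos(k * y)


def perturbed_wind(x: float, y: float, z: float, base: Callable[[float], float],
                   eps: float = 0.2) -> float:
    """Scale the base profile base(z) by (1 + eps * H(p)); eps < 1 keeps the sign."""
    return base(z) * (1.0 + eps * harmonic_disturbance(x, y))
```

With a uniform base of 10 m/s and ε = 0.2, the wind ranges from 8 to 12 m/s across the field.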
Figure 5
Comparison of ground-speed envelopes and energy–direction trade-offs across learned, biological, and optimal strategies. (A–C) Ground-speed envelopes under different wind conditions. (A) RL policy predictions for w_ref = 6, 10, and 18 m/s in polar coordinates. (B) Experimental envelopes derived from biological flight data [10], fitted using a generalized additive model [11], with background shading indicating d…
Original abstract

Dynamic soaring enables sustained flight by extracting energy from wind shear, yet it is commonly understood as a cycle-level maneuver that assumes stable flow conditions. In realistic unsteady environments, however, such assumptions are often violated, raising the question of whether explicit cycle-level planning is necessary. Here, we show that dynamic soaring can emerge from step-level, state-feedback control using only local sensing, without explicit trajectory planning. Using deep reinforcement learning as a tool, we obtain policies that achieve robust omnidirectional navigation across diverse shear-flow conditions. The learned behavior organizes into a structured control law that coordinates turning and vertical motion, giving rise to a two-phase strategy governed by a trade-off between energy extraction and directional progress. The resulting policy generalizes across varying conditions and reproduces key features observed in biological flight and optimal-control solutions. These findings identify a feedback-based control structure underlying dynamic soaring, demonstrating that efficient energy-harvesting flight can emerge from local interactions with the flow without explicit planning, and providing insights for biological flight and autonomous systems in complex, flow-coupled environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript uses deep reinforcement learning to obtain step-level state-feedback policies for dynamic soaring in shear flow. It claims these policies enable robust omnidirectional navigation using only local sensing, without explicit trajectory planning; the behavior self-organizes into a two-phase strategy trading energy extraction against directional progress; the policies generalize across shear conditions and reproduce key features of biological flight and optimal-control solutions.

Significance. If substantiated, the result would be significant for fluid dynamics and control: it would demonstrate that a complex, energy-harvesting maneuver can emerge from local feedback rules rather than cycle-level planning, offering a mechanistic explanation for observed avian strategies and a design principle for autonomous vehicles in unsteady flows. The RL-based discovery of an interpretable two-phase structure is a methodological strength.

major comments (3)
  1. [Abstract] Abstract: the central claim that policies achieve robust navigation, generalize, and reproduce biological/optimal features is stated without any quantitative metrics (success rates, energy gain per cycle, cross-condition statistics), error bars, or ablation studies, so the support for emergence of the two-phase strategy remains qualitative only.
  2. [Methods] Methods (state and observation definitions): the manuscript does not explicitly verify that the state vector excludes global position, absolute heading, or future wind-field information; without this, the assertions of 'local sensing' and 'no explicit planning' cannot be confirmed and are load-bearing for the main claim.
  3. [Results] Results (policy analysis): no ablation or sensitivity study is reported on the reward weights (energy extraction versus directional progress) or on added flow unsteadiness; therefore it is unclear whether the two-phase structure is an emergent property of the physics or an artifact of the specific training setup.
minor comments (2)
  1. [Methods] Notation for the state vector and action space should be collected in a single table for clarity.
  2. [Figures] Figure captions describing trajectories would benefit from explicit labels indicating the two phases and quantitative annotations (altitude, speed, energy).
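Major comment 1 asks for success rates with error bars. For a success rate estimated from n independent evaluation episodes, one standard choice is the Wilson score interval. A minimal sketch (the function name is ours):

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success rate (about 95% when z = 1.96)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p + z * z / (2.0 * trials)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials ** 2)) / denom
    return centre - half_width, centre + half_width
```

For 50 successes in 100 episodes this gives roughly (0.40, 0.60). Unlike the normal approximation, the interval stays inside [0, 1] when the success rate is near 0 or 1, which matters for heatmap cells close to 100% success.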

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the positive evaluation of the significance of our work and for the constructive comments. We address each major comment point by point below, indicating the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that policies achieve robust navigation, generalize, and reproduce biological/optimal features is stated without any quantitative metrics (success rates, energy gain per cycle, cross-condition statistics), error bars, or ablation studies, so the support for emergence of the two-phase strategy remains qualitative only.

    Authors: We agree that the abstract would be strengthened by including quantitative support for the claims. In the revised manuscript we have updated the abstract to report key metrics drawn from the results, including navigation success rates across tested conditions, mean energy gain per cycle with variability, and cross-condition generalization statistics. These additions provide a more quantitative basis for the reported emergence of the two-phase strategy while preserving the abstract's conciseness. revision: yes

  2. Referee: [Methods] Methods (state and observation definitions): the manuscript does not explicitly verify that the state vector excludes global position, absolute heading, or future wind-field information; without this, the assertions of 'local sensing' and 'no explicit planning' cannot be confirmed and are load-bearing for the main claim.

    Authors: We thank the referee for highlighting the need for explicit verification. The observation vector is defined exclusively from local quantities (body-frame velocities, local shear gradient, height relative to the shear layer, and body rates); global position, absolute heading, and any future or non-local wind information are deliberately omitted by construction. We have added a dedicated verification paragraph in the Methods section that lists the exact state components and confirms the absence of global or predictive information, thereby directly supporting the local-sensing and no-explicit-planning claims. revision: yes

  3. Referee: [Results] Results (policy analysis): no ablation or sensitivity study is reported on the reward weights (energy extraction versus directional progress) or on added flow unsteadiness; therefore it is unclear whether the two-phase structure is an emergent property of the physics or an artifact of the specific training setup.

    Authors: This is a fair observation. To demonstrate that the two-phase structure is robust rather than an artifact, we have conducted additional sensitivity analyses. We varied the relative weighting between energy-extraction and directional-progress terms over a broad range and include the resulting policy behaviors in a new supplementary figure; the two-phase organization persists. We have also evaluated the trained policies under superimposed flow unsteadiness (sinusoidal perturbations to the base shear profile) and report that the strategy remains functional with only modest performance degradation. These results are now summarized in the revised Results section and support the interpretation that the structure emerges from the physics of the problem. revision: yes
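The verification promised in response 2 amounts to an allowlist: the observation is assembled only from named local quantities, so global position, absolute heading, and future wind cannot leak in even if the simulator state contains them. A minimal sketch, with hypothetical key names that mirror the components listed in that response (not identifiers from the paper's code):

```python
from typing import Dict, List

# Hypothetical names for the local quantities listed in response 2.
LOCAL_KEYS = (
    "u_body", "v_body", "w_body",   # body-frame velocity components
    "shear_gradient",               # local dw/dz
    "height_above_layer",           # height relative to the shear layer
    "p", "q", "r",                  # body rates
)


def local_observation(sim_state: Dict[str, float]) -> List[float]:
    """Build the policy input from allowlisted local quantities only.

    Any other entries in sim_state (global position, absolute heading,
    future wind) are ignored by construction.
    """
    missing = [key for key in LOCAL_KEYS if key not in sim_state]
    if missing:
        raise KeyError(f"missing local quantities: {missing}")
    return [float(sim_state[key]) for key in LOCAL_KEYS]
```

An allowlist is easier to audit than a blocklist: a reviewer only has to check the tuple, not every field the simulator might expose.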

Circularity Check

0 steps flagged

No significant circularity; result emerges from RL interaction rather than definitional closure

Full rationale

The paper's central result is obtained by training a deep reinforcement learning policy on a simulated shear-flow environment with a reward combining energy extraction and directional progress. The two-phase strategy is discovered through environment interaction and is not presupposed in the state representation or reward definition in a way that forces the outcome by construction. No mathematical derivation chain reduces to self-referential inputs, fitted parameters renamed as predictions, or load-bearing self-citations. The simulation acts as an independent testbed, and the learned behavior is checked against external benchmarks (biological flight data and optimal-control solutions) rather than against quantities defined by the training setup.
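The reward is described only as combining energy extraction and directional progress. One hedged way to write such a per-step reward is a weighted sum, with the energy term expressed as an energy height Δe/g so that both terms are in metres; the weights are exactly what the referee's requested ablation would sweep. The functional form, names, and default weights are assumptions, not the paper's reward.

```python
G = 9.81  # gravitational acceleration [m/s^2]


def step_reward(e_prev: float, e_next: float, v_net: float, dt: float,
                w_energy: float = 1.0, w_progress: float = 1.0) -> float:
    """Hypothetical per-step reward [m].

    e_prev, e_next: specific total energy before/after the step [J/kg].
    v_net: ground speed toward the target during the step [m/s].
    """
    energy_height_gain = (e_next - e_prev) / G   # [m]
    progress = v_net * dt                        # [m]
    return w_energy * energy_height_gain + w_progress * progress
```

The weights decide which local move looks better: a climbing turn that gains 2 m of energy height with no progress beats a glide that loses 0.5 m while advancing 3 m when w_progress = 0.2, but loses to it when w_progress = 1. If the two-phase structure survives a sweep over such ratios, that supports reading it as a property of the physics rather than of one weighting.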

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract provides no explicit list of free parameters or invented entities. The central claim rests on standard assumptions about the fidelity of the simulated shear-flow environment and the appropriateness of the reinforcement-learning reward function.

axioms (1)
  • domain assumption: The numerical simulation of shear flow and vehicle dynamics is sufficiently realistic that policies learned inside it will transfer to physical conditions.
    Invoked when claiming that the learned policies achieve robust navigation across diverse shear-flow conditions.
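A standard way to reduce reliance on this assumption is domain randomization (Tobin et al. [60]; Peng et al. [48]): environment parameters are resampled every episode so the policy is less likely to overfit a single wind field. Whether and how the paper does this is not stated on this page. A minimal sketch, assuming a per-episode draw of w_ref, δ, and wind direction; the w_ref range echoes the 6 to 18 m/s span in Figure 5, while the δ range and all names are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WindParams:
    w_ref: float       # reference wind speed [m/s]
    delta: float       # shear-layer length scale [m]
    direction: float   # wind direction [rad]


def sample_wind(rng: random.Random,
                w_ref_range: Tuple[float, float] = (6.0, 18.0),
                delta_range: Tuple[float, float] = (0.3, 1.0)) -> WindParams:
    """Draw one episode's wind parameters uniformly from assumed ranges."""
    return WindParams(
        w_ref=rng.uniform(*w_ref_range),
        delta=rng.uniform(*delta_range),
        direction=rng.uniform(-math.pi, math.pi),
    )
```

Passing an explicit `random.Random` instance keeps training runs reproducible from a seed.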

pith-pipeline@v0.9.0 · 5486 in / 1346 out tokens · 52686 ms · 2026-05-10T15:23:50.144517+00:00 · methodology


Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 1 internal anchor

  1. [1]

    Fast and fuel efficient? optimal use of wind by flying albatrosses.Proceedings of the Royal Society of London

    Henri Weimerskirch, T Guionnet, JSSA Martin, Scott A Shaffer, and DP Costa. Fast and fuel efficient? optimal use of wind by flying albatrosses.Proceedings of the Royal Society of London. Series B: Biological Sciences, 267(1455):1869–1874, 2000

  2. [2]

    Gps tracking of foraging albatrosses.Science, 295(5558):1259– 1259, 2002

    Henri Weimerskirch, Francesco Bonadonna, Fr´ ed´ eric Bailleul, G´ eraldine Mabille, Giacomo Dell’Omo, and Hans-Peter Lipp. Gps tracking of foraging albatrosses.Science, 295(5558):1259– 1259, 2002

  3. [3]

    Asymmetry hidden in birds’ tracks reveals wind, heading, and orientation ability over the ocean.Science advances, 3(9):e1700097, 2017

    Yusuke Goto, Ken Yoda, and Katsufumi Sato. Asymmetry hidden in birds’ tracks reveals wind, heading, and orientation ability over the ocean.Science advances, 3(9):e1700097, 2017. 15

  4. [4]

    Optimization of dynamic soaring in a flap-gliding seabird affects its large-scale distribution at sea.Science advances, 8(22):eabo0200, 2022

    James A Kempton, Joe Wynn, Sarah Bond, James Evry, Annette L Fayet, Natasha Gillies, Tim Guilford, Marwa Kavelaars, Ignacio Juarez-Martinez, Oliver Padget, et al. Optimization of dynamic soaring in a flap-gliding seabird affects its large-scale distribution at sea.Science advances, 8(22):eabo0200, 2022

  5. [5]

    The soaring of birds.Nature, 27(701):534–535, 1883

    Lord Rayleigh. The soaring of birds.Nature, 27(701):534–535, 1883

  6. [6]

    Experimental verification of dynamic soaring in albatrosses.Journal of Experimental Biology, 216(22):4222–4232, 2013

    G Sachs, J Traugott, AP Nesterova, and F Bonadonna. Experimental verification of dynamic soaring in albatrosses.Journal of Experimental Biology, 216(22):4222–4232, 2013

  7. [7]

    Opportunistic soaring by birds suggests new opportunities for atmospheric energy harvesting by flying robots

    Abdulghani Mohamed, Graham K Taylor, Simon Watkins, and Shane P Windsor. Opportunistic soaring by birds suggests new opportunities for atmospheric energy harvesting by flying robots. Journal of the Royal Society Interface, 19(196):20220671, 2022

  8. [8]

    Enabling new missions for robotic aircraft.Science, 326(5960):1642–1644, 2009

    Jack W Langelaan and Nicholas Roy. Enabling new missions for robotic aircraft.Science, 326(5960):1642–1644, 2009

  9. [9]

    Observations and models of across-wind flight speed of the wandering albatross.Royal Society Open Science, 9(11):211364, 2022

    Philip L Richardson and Ewan D Wakefield. Observations and models of across-wind flight speed of the wandering albatross.Royal Society Open Science, 9(11):211364, 2022

  10. [10]

    Wandering albatrosses exert high take-off effort only when both wind and waves are gentle.Elife, 12:RP87016, 2023

    Leo Uesaka, Yusuke Goto, Masaru Naruoka, Henri Weimerskirch, Katsufumi Sato, and Kentaro Q Sakamoto. Wandering albatrosses exert high take-off effort only when both wind and waves are gentle.Elife, 12:RP87016, 2023

  11. [11]

    Albatrosses employ orientation and routing strategies similar to yacht racers.Proceedings of the National Academy of Sciences, 121(23):e2312851121, 2024

    Yusuke Goto, Henri Weimerskirch, Keiichi Fukaya, Ken Yoda, Masaru Naruoka, and Katsufumi Sato. Albatrosses employ orientation and routing strategies similar to yacht racers.Proceedings of the National Academy of Sciences, 121(23):e2312851121, 2024

  12. [12]

    Minimum shear wind strength required for dynamic soaring of albatrosses.Ibis, 147(1):1–10, 2005

    Gottfried Sachs. Minimum shear wind strength required for dynamic soaring of albatrosses.Ibis, 147(1):1–10, 2005

  13. [13]

    Engineless unmanned aerial vehicle propulsion by dynamic soaring.Journal of guidance, control, and dynamics, 32(5):1446–1457, 2009

    Markus Deittert, Arthur Richards, Chris A Toomer, and Anthony Pipe. Engineless unmanned aerial vehicle propulsion by dynamic soaring.Journal of guidance, control, and dynamics, 32(5):1446–1457, 2009

  14. [14]

    Optimal dynamic soaring trades off energy harvest and directional flight.iScience, 28(6), 2025

    Lunbing Chen, Yufei Yin, Yang Xiang, Suyang Qin, and Hong Liu. Optimal dynamic soaring trades off energy harvest and directional flight.iScience, 28(6), 2025

  15. [15]

    Soaring energetics and glide performance in a moving atmosphere.Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1704):20150398, 2016

    Graham K Taylor, Kate V Reynolds, and Adrian LR Thomas. Soaring energetics and glide performance in a moving atmosphere.Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1704):20150398, 2016

  16. [16]

    Optimal dynamic soar- ing consists of successive shallow arcs.Journal of The Royal Society Interface, 14(135):20170496, 2017

    Gabriel D Bousquet, Michael S Triantafyllou, and Jean-Jacques E Slotine. Optimal dynamic soar- ing consists of successive shallow arcs.Journal of The Royal Society Interface, 14(135):20170496, 2017

  17. [17]

    Kinetic energy in dynamic soaring—inertial speed and airspeed.Journal of Guidance, Control, and Dynamics, 42(8):1812–1821, 2019

    Gottfried Sachs. Kinetic energy in dynamic soaring—inertial speed and airspeed.Journal of Guidance, Control, and Dynamics, 42(8):1812–1821, 2019

  18. [18]

    Dynamic soaring in finite-thickness wind shears: an asymptotic solution

    Gabriel D Bousquet, Michael S Triantafyllou, and Jean-Jacques E Slotine. Dynamic soaring in finite-thickness wind shears: an asymptotic solution. InAIAA Guidance, Navigation, and Control Conference, page 1908, 2017

  19. [19]

    Towards Robust Optimization-Based Autonomous Dynamic Soaring with a Fixed-Wing UAV

    Marvin Harms, Jaeyoung Lim, David Rohr, Friedrich Rockenbauer, Nicholas Lawrance, and Roland Siegwart. Robust optimization-based autonomous dynamic soaring with a fixed-wing uav.arXiv preprint arXiv:2512.06610, 2025

  20. [20]

    Novel approach to dynamic soaring modeling and simulation.Journal of Guidance, Control, and Dynamics, 42(6):1250–1260, 2019

    Jean-Marie Kai, Tarek Hamel, and Claude Samson. Novel approach to dynamic soaring modeling and simulation.Journal of Guidance, Control, and Dynamics, 42(6):1250–1260, 2019

  21. [21]

    Wind field estima- tion for autonomous dynamic soaring

    Jack W Langelaan, John Spletzer, Corey Montella, and Joachim Grenestedt. Wind field estima- tion for autonomous dynamic soaring. In2012 IEEE International conference on robotics and automation, pages 16–22. IEEE, 2012. 16

  22. [22]

    Physics and modeling of large flow dis- turbances: discrete gust encounters for modern air vehicles.Annual Review of Fluid Mechanics, 54(1):469–493, 2022

    Anya R Jones, Oksan Cetiner, and Marilyn J Smith. Physics and modeling of large flow dis- turbances: discrete gust encounters for modern air vehicles.Annual Review of Fluid Mechanics, 54(1):469–493, 2022

  23. [23]

    Closing the loop in dynamic soaring

    John J Bird, Jack W Langelaan, Corey Montella, John Spletzer, and Joachim L Grenestedt. Closing the loop in dynamic soaring. InAIAA Guidance, Navigation, and Control Conference, page 0263, 2014

  24. [24]

    Dynamic soaring under differ- ent atmospheric stability conditions.Journal of Guidance, Control, and Dynamics, 46(5):970–977, 2023

    Haichao Hong, Luoqin Liu, Florian Holzapfel, and Gottfried Sachs. Dynamic soaring under differ- ent atmospheric stability conditions.Journal of Guidance, Control, and Dynamics, 46(5):970–977, 2023

  25. [25]

    Flight testing of dynamic soaring part-2: Open-field inclined circle trajectory

    Murat Bronz, Nikola Gavrilovic, Antoine Drouin, Gautier Hattenberger, and Jean-Marc Moschetta. Flight testing of dynamic soaring part-2: Open-field inclined circle trajectory. In AIAA Aviation 2021 Forum, page 2803, 2021

  26. [26]

    Learning to soar in turbulent environments.Proceedings of the National Academy of Sciences, 113(33):E4877– E4884, 2016

    Gautam Reddy, Antonio Celani, Terrence J Sejnowski, and Massimo Vergassola. Learning to soar in turbulent environments.Proceedings of the National Academy of Sciences, 113(33):E4877– E4884, 2016

  27. [27]

    Learning efficient navigation in vortical flow fields.Nature communications, 12(1):7143, 2021

    Peter Gunnarson, Ioannis Mandralis, Guido Novati, Petros Koumoutsakos, and John O Dabiri. Learning efficient navigation in vortical flow fields.Nature communications, 12(1):7143, 2021

  28. [28]

    Sensing flow gradients is necessary for learning autonomous underwater navigation.Nature Communications, 16(1):3044, 2025

    Yusheng Jiao, Haotian Hang, Josh Merel, and Eva Kanso. Sensing flow gradients is necessary for learning autonomous underwater navigation.Nature Communications, 16(1):3044, 2025

  29. [29]

    Hierarchical reinforcement learning approach for autonomous cross-country soaring.Journal of Guidance, Control, and Dynamics, 46(1):114–126, 2023

    Stefan Notter, Fabian Schimpf, Gregor M¨ uller, and Walter Fichter. Hierarchical reinforcement learning approach for autonomous cross-country soaring.Journal of Guidance, Control, and Dynamics, 46(1):114–126, 2023

  30. [30]

    Towards development of a dynamic soaring capable uav using reinforcement learning

    Jacob R Adamski, Vladimir V Golubev, Snorri Gudmundsson, and Fedor Kuznetsov. Towards development of a dynamic soaring capable uav using reinforcement learning. InAIAA AVIATION 2023 Forum, page 4455, 2023

  31. [31]

    Revealing principles of au- tonomous thermal soaring in windy conditions using vulture-inspired deep reinforcement-learning

    Yoav Flato, Roi Harel, Aviv Tamar, Ran Nathan, and Tsevi Beatus. Revealing principles of au- tonomous thermal soaring in windy conditions using vulture-inspired deep reinforcement-learning. Nature Communications, 15(1):4942, 2024

  32. [32]

    Larval zebrafish minimize energy consumption during hunting via adaptive movement selection.Proceedings of the National Academy of Sciences, 123(7):e2513853123, 2026

    Thomas Darveniza, Robert Wong, Shuyu I Zhu, Zac Pujic, Biao Sun, Matthew Levendosky, Ramesh Agarwal, Michael H McCullough, and Geoffrey J Goodhill. Larval zebrafish minimize energy consumption during hunting via adaptive movement selection.Proceedings of the National Academy of Sciences, 123(7):e2513853123, 2026

  33. [33]

    Reinforcement learning for autonomous dynamic soaring in shear winds

    Corey Montella and John R Spletzer. Reinforcement learning for autonomous dynamic soaring in shear winds. In2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3423–3428. IEEE, 2014

  34. [34]

    Efficient collective swimming by harnessing vortices through deep reinforcement learning.Proceedings of the National Academy of Sciences, 115(23):5849–5854, 2018

    Siddhartha Verma, Guido Novati, and Petros Koumoutsakos. Efficient collective swimming by harnessing vortices through deep reinforcement learning.Proceedings of the National Academy of Sciences, 115(23):5849–5854, 2018

  35. [35]

    A comprehensive assessment to the potential of reinforcement learning in dynamic soaring

    Sara Abozeid, Sameer Pokhrel, and Sameh Eisa. A comprehensive assessment to the potential of reinforcement learning in dynamic soaring. InAIAA SCITECH 2023 Forum, page 2236, 2023

  36. [36]

    A framework for developing robust, autonomous, power man- aged dynamic soaring flight controllers using deep reinforcement learning

    Milo F DiPaola and Tyler F Barkin. A framework for developing robust, autonomous, power man- aged dynamic soaring flight controllers using deep reinforcement learning. InAIAA AVIATION 2023 Forum, page 4046, 2023

  37. [37]

    Dynamic soaring in uavs: a deep reinforcement learning approach.The Aeronautical Journal, pages 1–29, 2026

    Mishma Akhtar, Adnan Maqsood, Imran Mir, and Baris Gungordu. Dynamic soaring in uavs: a deep reinforcement learning approach.The Aeronautical Journal, pages 1–29, 2026. 17

  38. [38]

    How did extinct giant birds and pterosaurs fly? a comprehensive modeling approach to evaluate soaring performance.PNAS nexus, 1(1):pgac023, 2022

    Yusuke Goto, Ken Yoda, Henri Weimerskirch, and Katsufumi Sato. How did extinct giant birds and pterosaurs fly? a comprehensive modeling approach to evaluate soaring performance.PNAS nexus, 1(1):pgac023, 2022

  39. [39]

    Optimal patterns of glider dynamic soaring.Optimal control applications and methods, 25(2):67–89, 2004

    Yiyuan J Zhao. Optimal patterns of glider dynamic soaring.Optimal control applications and methods, 25(2):67–89, 2004

  40. [40]

    MIT press Cambridge, 1998

    Richard S Sutton, Andrew G Barto, et al.Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998

  41. [41]

    McGraw hill, 2011

    John Anderson.EBOOK: Fundamentals of Aerodynamics (SI units). McGraw hill, 2011

  42. [42]

    Flying at no mechanical energy cost: disclosing the secret of wandering albatrosses

    Gottfried Sachs, Johannes Traugott, Anna P Nesterova, Giacomo Dell’Omo, Franz K¨ ummeth, Wolfgang Heidrich, Alexei L Vyssotski, and Francesco Bonadonna. Flying at no mechanical energy cost: disclosing the secret of wandering albatrosses. 2012

  43. [43]

    Flight speed and performance of the wandering albatross with respect to wind.Movement ecology, 6(1):3, 2018

    Philip L Richardson, Ewan D Wakefield, and Richard A Phillips. Flight speed and performance of the wandering albatross with respect to wind.Movement ecology, 6(1):3, 2018

  44. [44]

    Springer Science & Business Media, 2012

    Roland B Stull.An introduction to boundary layer meteorology. Springer Science & Business Media, 2012

  45. [45]

    Miniature multihole airflow sensor for lightweight aircraft over wide speed and angular range.IEEE Robotics and Automation Letters, 2025

    Lukas Stuber, Simon Luis Jeger, Raphael Zufferey, and Dario Floreano. Miniature multihole airflow sensor for lightweight aircraft over wide speed and angular range.IEEE Robotics and Automation Letters, 2025

  46. [46]

    Evolutionary trade-offs, pareto optimality, and the geometry of pheno- type space.Science, 336(6085):1157–1160, 2012

    Oren Shoval, Hila Sheftel, Guy Shinar, Yuval Hart, Omer Ramote, Avi Mayo, Erez Dekel, Kathryn Kavanagh, and Uri Alon. Evolutionary trade-offs, pareto optimality, and the geometry of pheno- type space.Science, 336(6085):1157–1160, 2012

  47. [47]

    Continuous control with deep reinforcement learning, September 15 2020

    Timothy Paul Lillicrap, Jonathan James Hunt, Alexander Pritzel, Nicolas Manfred Otto Heess, Tom Erez, Yuval Tassa, David Silver, and Daniel Pieter Wierstra. Continuous control with deep reinforcement learning, September 15 2020. US Patent 10,776,692

  48. [48]

    Sim-to-real transfer of robotic control with dynamics randomization

    Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. Sim-to-real transfer of robotic control with dynamics randomization. In2018 IEEE international conference on robotics and automation (ICRA), pages 3803–3810. IEEE, 2018

  49. [49]

    Optimal feedback control as a theory of motor coordi- nation.Nature neuroscience, 5(11):1226–1235, 2002

    Emanuel Todorov and Michael I Jordan. Optimal feedback control as a theory of motor coordi- nation.Nature neuroscience, 5(11):1226–1235, 2002

  50. [50]

    Yoshinari Yonehara, Yusuke Goto, Ken Yoda, Yutaka Watanuki, Lindsay C Young, Henri Weimer- skirch, Charles-Andr´ e Bost, and Katsufumi Sato. Flight paths of seabirds soaring over the ocean surface enable measurement of fine-scale wind speed and direction.Proceedings of the National Academy of Sciences, 113(32):9039–9044, 2016

  51. [51]

    Wing-strain-based flight control of flapping-wing drones through reinforcement learning.Nature Machine Intelligence, 6(9):992–1005, 2024

    Taewi Kim, Insic Hong, Sunghoon Im, Seungeun Rho, Minho Kim, Yeonwook Roh, Changhwan Kim, Jieun Park, Daseul Lim, Doohoe Lee, et al. Wing-strain-based flight control of flapping-wing drones through reinforcement learning.Nature Machine Intelligence, 6(9):992–1005, 2024

  52. [52]

    Flap or soar? how a flight generalist responds to its aerial environment.Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1704), 2016

    Judy Shamoun-Baranes, Willem Bouten, E Emiel Van Loon, Christiaan Meijer, and CJ Cam- phuysen. Flap or soar? how a flight generalist responds to its aerial environment.Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1704), 2016

  53. [53]

    Glider soaring via reinforcement learning in the field.Nature, 562(7726):236–239, 2018

    Gautam Reddy, Jerome Wong-Ng, Antonio Celani, Terrence J Sejnowski, and Massimo Vergassola. Glider soaring via reinforcement learning in the field.Nature, 562(7726):236–239, 2018

  54. [54]

    Wind, waves, and surface currents in the southern ocean: observations from the antarctic circumnaviga- tion expedition.Earth System Science Data Discussions, 2020:1–22, 2020

    Marzieh H Derkani, Alberto Alberello, Filippo Nelli, Luke G Bennetts, Katrin G Hessner, Keith MacHutchon, Konny Reichert, Lotfi Aouf, Salman Saeed Khan, and Alessandro Toffoli. Wind, waves, and surface currents in the southern ocean: observations from the antarctic circumnaviga- tion expedition.Earth System Science Data Discussions, 2020:1–22, 2020. 18

  55. [55]

    Gust soaring as a basis for the flight of petrels and albatrosses (procellari- iformes).Avian Science, 2:1–12, 2002

    Colin J Pennycuick. Gust soaring as a basis for the flight of petrels and albatrosses (procellari- iformes).Avian Science, 2:1–12, 2002

  56. [56]

    Marc P Buckley, Jochen Horstmann, Ivan Savelyev, and Jeff R Carpenter. Direct observations of airflow separation over ocean surface waves. Nature Communications, 16(1):5526, 2025

  57. [57]

    Sungje Park, Adrian Fanjoy, and Vladimir V Golubev. Application of reinforcement learning for autonomous dynamic soaring. In AIAA SCITECH 2025 Forum, page 2290, 2025

  58. [58]

    Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870. PMLR, 2018

  59. [59]

    Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48, 2009

  60. [60]

    Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23–

  61. [61]

    Wayne K Potts. The chorus-line hypothesis of manoeuvre coordination in avian flocks. Nature, 309(5966):344–345, 1984

  62. [62]

    Harold Pomeroy and Frank Heppner. Laboratory determination of startle reaction time of the starling (Sturnus vulgaris). Animal Behaviour, 25:720–725, 1977

  63. [63]

    Renaud Barate, Stéphane Doncieux, and Jean-Arcady Meyer. Design of a bio-inspired controller for dynamic soaring in a simulated unmanned aerial vehicle. Bioinspiration & Biomimetics, 1(3):76, 2006

  64. [64]

    Henrik Mouritsen. Long-distance navigation and magnetoreception in migratory animals. Nature, 558(7708):50–59, 2018

  65. [65]

    Thomas R Yechout. Introduction to aircraft flight mechanics: performance, static stability, dynamic stability, and classical feedback control. AIAA, 2003

  66. [66]

    Abdulghani Mohamed, Simon Watkins, Reece Clothier, Mujahid Abdulrahim, Kevin Massey, and Roberto Sabatini. Fixed-wing MAV attitude stability in atmospheric turbulence, part 2: Investigating biologically-inspired sensors. Progress in Aerospace Sciences, 71:1–13, 2014

  67. [67]

    Michael O’Connell, Guanya Shi, Xichen Shi, Kamyar Azizzadenesheli, Anima Anandkumar, Yisong Yue, and Soon-Jo Chung. Neural-Fly enables rapid learning for agile flight in strong winds. Science Robotics, 7(66):eabm6597, 2022

  68. [68]

    Andrew L Maas, Awni Y Hannun, Andrew Y Ng, et al. Rectifier nonlinearities improve neural network acoustic models. In Proc. ICML, volume 30, page 3. Atlanta, GA, 2013

Supplementary Material

No. | NN Actor | NN Critic | Training SR | Test SR
1 | [512, 512, 512] | [512, 512, 512] | 95.5% ± 0.7% | 97.3% ± 0.8%
2 | [512, 512] | [512, 512] | 82.6% ± 6.4% | 82.6% ± 2.5%
3 | [256, 256, 256] | [256,256,25...