pith. the verified trust layer for science.

arxiv: 2604.14553 · v1 · submitted 2026-04-16 · ⚛️ physics.flu-dyn · physics.comp-ph

Learning to traverse convective flows at moderate to high Rayleigh numbers

Pith reviewed 2026-05-10 10:45 UTC · model grok-4.3

classification ⚛️ physics.flu-dyn physics.comp-ph
keywords: Rayleigh-Bénard convection · reinforcement learning · Lagrangian coherent structures · particle navigation · turbulent transport · flow topology · convective flows

The pith

A reinforcement learning agent learns to navigate convective turbulence by crossing repelling barriers and riding attracting pathways.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines navigation of a self-propelled inertial particle in two-dimensional Rayleigh-Bénard convection at Rayleigh numbers (Ra) from 10^7 to 10^11. A reinforcement-learning (RL) controller chooses a bounded propulsive acceleration to reach a target horizontal displacement. At moderate Ra, the large-scale circulation creates robust transport barriers that demand a finite thrust surplus to cross. At higher Ra, the flow reorganizes, the barriers fragment, and transient plume-assisted routes appear, so the total propulsion energy drops even though travel time increases. The learned policy uses significantly less energy than constant-heading control by aligning with local currents, and Lagrangian coherent structure (LCS) analysis shows that it crosses repelling barriers while surfing attracting ones. These behaviors are then mapped onto Eulerian flow topology via Voronoi tessellation and the Q-criterion to yield a simple physics-based heuristic that retains robust performance.
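The page does not give the particle's equation of motion, so the following is only a minimal sketch of what "bounded propulsive acceleration" means for an inertial particle. It assumes a nondimensional Stokes-drag model dv/dt = (u_f(x) − v)/St + a with ‖a‖ ≤ A_max (acceleration in units of g, following the Figure 3 caption), neglects gravity, buoyancy and added mass, and sets the unexplained factor Λ in U*_t,max = St A_max Λ to 1. The helper names (`clip_to_bound`, `step`, `constant_heading`) are hypothetical, and the semi-implicit Euler update is chosen for brevity, not because the paper uses it.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, float]
FlowField = Callable[[Vec], Vec]


def clip_to_bound(a: Vec, a_max: float) -> Vec:
    """Rescale a requested acceleration so that its magnitude does not exceed a_max."""
    norm = math.hypot(a[0], a[1])
    if norm <= a_max:
        return a
    scale = a_max / norm
    return (a[0] * scale, a[1] * scale)


def step(x: Vec, v: Vec, a_req: Vec, u_f: FlowField,
         st: float, a_max: float, dt: float) -> Tuple[Vec, Vec]:
    """One semi-implicit Euler step of the ASSUMED model dv/dt = (u_f(x) - v)/St + a.

    Gravity, buoyancy and added mass are neglected; |a| <= a_max is enforced
    by clipping the requested acceleration before it is applied.
    """
    a = clip_to_bound(a_req, a_max)
    uf = u_f(x)
    # Stokes drag relaxes the particle velocity toward the local fluid velocity.
    dv = ((uf[0] - v[0]) / st + a[0], (uf[1] - v[1]) / st + a[1])
    v_new = (v[0] + dt * dv[0], v[1] + dt * dv[1])
    # Position is advanced with the updated velocity.
    x_new = (x[0] + dt * v_new[0], x[1] + dt * v_new[1])
    return x_new, v_new


def constant_heading(a_max: float) -> Vec:
    """Baseline controller: always thrust at the bound along +x, toward the target."""
    return (a_max, 0.0)
```

In a quiescent fluid this model relaxes to v = St·a, so the constant-heading baseline reaches a terminal speed of St·A_max, consistent with the terminal-velocity expression quoted in the figure text when Λ = 1. An RL policy would replace `constant_heading` with a learned map from observations to `a_req`.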

Core claim

In 2D Rayleigh-Bénard convection, the RL agent inherently learns to cross repelling Lagrangian barriers and surf along attracting pathways. Proper orthogonal decomposition (POD) shows that performance differences arise from reorganization of the carrier flow: at moderate Ra, the dominant large-scale circulation partitions the domain through robust barriers, whereas at higher Ra, energy spreads across many modes, barriers fragment, and plume-assisted pathways emerge. Mapping the observed behaviors onto local Eulerian flow topology with Voronoi tessellation and the Q-criterion distills an interpretable heuristic strategy that achieves robust navigability with lower energy than constant-heading baselines.
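The POD statement can be made concrete with the diagnostic plotted in Figure 6: the normalized cumulative eigenvalue sum and the number of modes needed to capture 99% of the fluctuating kinetic energy. The sketch below is a snapshot POD via the singular value decomposition, assuming velocity snapshots are stacked as columns of a matrix on a uniform grid (no quadrature weights). The function name `modes_for_energy` is hypothetical, and the paper's POD implementation is not described on this page.

```python
import numpy as np


def modes_for_energy(snapshots: np.ndarray, fraction: float = 0.99):
    """Snapshot POD of a (n_points, n_times) matrix.

    Each column is one flattened velocity snapshot (e.g. u and v stacked).
    Returns (m, cumulative), where cumulative[i] is the normalised sum of the
    first i + 1 POD eigenvalues and m is the smallest mode count reaching `fraction`.
    """
    # Remove the temporal mean so the eigenvalues measure fluctuating energy.
    fluct = snapshots - snapshots.mean(axis=1, keepdims=True)
    # Squared singular values are proportional to the POD eigenvalues lambda_i.
    sigma = np.linalg.svd(fluct, compute_uv=False)
    lam = sigma**2
    cumulative = np.cumsum(lam) / lam.sum()
    # First index at which the cumulative energy reaches the requested fraction.
    m = int(np.searchsorted(cumulative, fraction)) + 1
    return m, cumulative
```

A small m (rapid saturation) means a few coherent modes dominate, matching the moderate-Ra picture; a large m means the energy is spread across many modes, as described for higher Ra.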

What carries the argument

The bounded-acceleration reinforcement-learning policy whose actions are interpreted through Lagrangian coherent structure analysis, proper orthogonal decomposition of the velocity field, and Eulerian topology measures (Voronoi tessellation and Q-criterion).
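The Q-criterion used in the Eulerian mapping compares rotation with strain: Q = ½(‖Ω‖² − ‖S‖²), where S and Ω are the symmetric and antisymmetric parts of the velocity-gradient tensor, so Q > 0 marks rotation-dominated regions and Q < 0 strain-dominated ones. Below is a minimal NumPy sketch on a uniform grid; the array layout [iy, ix] and the finite differences from `np.gradient` are assumptions, not details taken from the paper.

```python
import numpy as np


def q_criterion(u: np.ndarray, v: np.ndarray, dx: float, dy: float) -> np.ndarray:
    """Q = 0.5 * (||Omega||^2 - ||S||^2) for a 2D field indexed [iy, ix]."""
    # np.gradient returns derivatives along axis 0 (y) first, then axis 1 (x).
    du_dy, du_dx = np.gradient(u, dy, dx)
    dv_dy, dv_dx = np.gradient(v, dy, dx)
    # Strain-rate tensor S = (grad u + grad u^T) / 2.
    s_xx, s_yy = du_dx, dv_dy
    s_xy = 0.5 * (du_dy + dv_dx)
    # Rotation-rate tensor Omega = (grad u - grad u^T) / 2 has one independent entry in 2D.
    w_xy = 0.5 * (du_dy - dv_dx)
    # Squared Frobenius norms; off-diagonal entries appear twice.
    s_norm2 = s_xx**2 + s_yy**2 + 2.0 * s_xy**2
    w_norm2 = 2.0 * w_xy**2
    return 0.5 * (w_norm2 - s_norm2)
```

For solid-body rotation (u, v) = (−y, x) this gives Q = 1 everywhere, and for pure strain (u, v) = (x, −y) it gives Q = −1, which is a quick sanity check before applying it to simulation output.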

If this is right

  • Success rate rises abruptly with the maximum acceleration A_max at moderate Ra, whereas at high Ra the transition shifts to larger A_max and becomes more gradual.
  • Although completion time grows with Ra, the propulsion energy required for successful traversal decreases because of flow reorganization.
  • The learned policy consumes significantly less energy than constant-heading control by aligning with local currents.
  • At higher Ra, transient plume-assisted pathways emerge as transport barriers fragment.
  • A simple heuristic distilled from local Eulerian flow topology achieves comparable robust navigability.
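The success-rate bullets rest on two quantities defined in the Figure 3 caption: S = N1/N0 (successful over attempted episodes) and the navigable-area fraction γ, the spatiotemporal mean fraction of the domain where the particle's active terminal velocity U*_t,max exceeds the local fluid speed ‖u*_f‖. The sketch below assumes U*_t,max = St·A_max·Λ as quoted in the figure text, treats Λ as a given constant, and assumes a uniform grid so that the fraction of grid points equals the area fraction; the function names are hypothetical.

```python
from typing import Sequence

import numpy as np


def terminal_speed(st: float, a_max: float, lam: float = 1.0) -> float:
    """Active terminal speed in quiescent fluid, U*_t,max = St * A_max * Lambda."""
    return st * a_max * lam


def navigable_fraction(u_snaps: np.ndarray, v_snaps: np.ndarray,
                       st: float, a_max: float, lam: float = 1.0) -> float:
    """Spatiotemporal mean of the indicator U*_t,max > |u*_f|.

    u_snaps, v_snaps: arrays of shape (n_times, ny, nx) on a uniform grid, so the
    fraction of grid points equals the area fraction.
    """
    fluid_speed = np.hypot(u_snaps, v_snaps)
    return float(np.mean(terminal_speed(st, a_max, lam) > fluid_speed))


def success_rate(outcomes: Sequence[bool]) -> float:
    """S = N1 / N0: successful episodes over attempted episodes."""
    n0 = len(outcomes)
    n1 = sum(1 for ok in outcomes if ok)
    return n1 / n0
```

Because γ grows monotonically with A_max for a given flow, it acts as a flow-dependent rescaling of the actuation bound, which is why Figure 3(b) can plot S against γ instead of A_max.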

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same controller might transfer to 3D convection or laboratory cells if the key topological features (barrier crossing and plume riding) persist across dimensionality.
  • The distilled heuristic could guide design of energy-efficient autonomous vehicles in other turbulent flows such as ocean currents or atmospheric convection.
  • Testing whether the RL-discovered behaviors survive changes in Prandtl number or cell aspect ratio would reveal how sensitive the strategy is to the carrier flow details.
  • The observed drop in required energy at high Ra suggests similar navigation advantages may appear in other high-Ra regimes once barriers fragment.

Load-bearing premise

The navigation behaviors learned in this fixed 2D incompressible setup at Pr=0.71 and aspect ratio 4 remain useful when the same controller is placed in 3D convection or laboratory experiments.

What would settle it

If the distilled heuristic strategy fails to produce robust horizontal navigation when implemented in a 3D Rayleigh-Bénard simulation at the same range of Rayleigh numbers, the claim that the learned behaviors generalize would be falsified.

Figures

Figures reproduced from arXiv: 2604.14553 by Ao Xu, Ben-Rui Xu, Heng-Dong Xi, Hua-Lin Wu.

Figure 1: Instantaneous dimensionless temperature field T* at Pr = 0.71 in a cell of aspect ratio Γ = 4 for (a) Ra = 10^7, (b) Ra = 10^8, (c) Ra = 10^9, (d) Ra = 10^10, and (e) Ra = 10^11. Here, T* = (T − T_0)/ΔT, where ΔT denotes the temperature difference between the heated bottom wall and the cooled top wall. … coherent barriers and widens the available pathways, thereby improving reachability under a fixed actuati…
Figure 2: Representative navigation trajectories for the fixed-displacement task. Panels (a,b) show single trajectories, and panels (c,d) show trajectory ensembles, for (a) Ra = 10^8 with A_max = 1.5, (b) Ra = 10^10 with A_max = 5.0, (c) Ra = 10^8 with A_max = 1.0, and (d) Ra = 10^10 with A_max = 3.0. The background contours indicate the instantaneous dimensionless temperature T*. The prescribed horizontal displacement is…
Figure 3: Success rate S = N_1/N_0 as a function of (a) the maximum propulsive acceleration, A_max = max(‖a_propel‖/g), and (b) the navigable-area fraction γ, for Rayleigh numbers 10^7 ≤ Ra ≤ 10^11. Here, γ denotes the spatiotemporal mean fraction of the flow domain in which the particle's active terminal velocity U*_t,max exceeds the local fluid speed ‖u*_f‖. … and 10^10, the transition shifts to larger A_max and becomes…
Figure 4: Dimensionless completion time t*_comp for the fixed-displacement task as a function of (a) the maximum propulsive acceleration A_max for different Ra and (b) the Rayleigh number Ra for different A_max. … define the particle's dimensionless active terminal velocity in a quiescent fluid, determined by the balance between horizontal propulsion and Stokes drag, as U*_t,max = St A_max Λ. We then calculate the naviga…
Figure 5: Dimensionless propulsion energy E*_propel for the fixed-displacement task as a function of (a) the maximum propulsive acceleration A_max for different Ra and (b) the Rayleigh number Ra for different A_max. … similar slopes. This linear scaling is consistent with traversal across successive coherent LSC cells. In this regime, each increment in actuation produces a comparable increment in mechanical work, as al…
Figure 6: Normalised cumulative modal energy Σ_{i=1}^{m} λ_i as a function of mode number m for (a) Ra = 10^8 and (b) Ra = 10^10. The vertical dashed lines indicate the number of modes required to capture 99% of the total fluctuating kinetic energy. Panel (c) shows the required number of modes as a function of Ra. Rapid saturation of this curve indicates strong flow coherence, whereas slow saturation implies that the energy…
Figure 7: Generalisability of the learned policies across Rayleigh numbers. The heatmap shows the success rate S (%) for policies trained at a given Ra (rows) and evaluated at different test values of Ra (columns). All cross-evaluations are performed with the same actuation bound of A_max = 5.0. … R/Q is large, the reward for forward progress dominates; in the limit R/Q → ∞, the reward reduces to r_t = V*_eff(t). This…
Figure 8: (a) Average cumulative reward ⟨Σ r_t⟩ and (b) success rate S as functions of the reward-weighting ratio R/Q for different Rayleigh numbers Ra.
Figure 9: (a) Average cumulative reward components ⟨Σ r_t⟩ and (b) their relative variations σ(Σ r_t)/⟨Σ r_t⟩. The curves show how the total reward decomposes into the time-efficiency term Σ R V*_eff (squares) and the control-effort penalty term Σ Q‖a*_propel‖ (triangles), as functions of the reward-weighting ratio R/Q at Ra = 10^9. … grows rapidly, whereas Σ Q‖a*_propel‖ remains nearly constant. The improvement in suc…
Figure 10: Trajectories illustrating the limiting behaviours of the reward formulation for two actuation bounds. Panels (a,b) show the time-reward-only limit (R/Q → ∞), and panels (c,d) show the energy-penalty-only limit (R/Q → 0), for (a,c) A_max = 2.5 and (b,d) A_max = 10. The background contours indicate the instantaneous dimensionless temperature T*, and the colour along each trajectory indicates the dimensionle…
Figure 11: Comparison of the learned RL policy with the constant-heading baseline for the fixed-displacement task. Panels (a) and (b) show trajectories obtained with the RL policy and the constant-heading baseline, respectively, at Ra = 10^10 and A_max = 5.0. The background contours indicate the instantaneous dimensionless temperature T*, and the colour along each trajectory indicates the dimensionless particle spee…
Figure 12: Probability density functions (p.d.f.s) of the instantaneous alignment angle φ between the particle's propulsive vector and the local fluid-velocity vector. The data are pooled over all successful trajectories obtained with (a) the learned RL policy and (b) the constant-heading baseline for different Rayleigh numbers Ra. … To further compare propulsive behaviour during navigation, we examine the probabili…
Figure 13: Instantaneous snapshots of the Lagrangian coherent structures (LCS) at t* = 9.00 for (a,c) Ra = 10^8 and (b,d) Ra = 10^10. Panels (a,b) show the attracting LCS (blue regions), whereas panels (c,d) show the repelling LCS (red regions). … over the corresponding crossing interval (e.g. from t_1 to t_5). Applying a threshold of 60% of the maximum value to this time-averaged field allowed us to extract the persiste…
Figure 14: (a) Representative trajectory of the RL agent at Ra = 10^8 with A_max = 1.25, corresponding to the optimal policy for this regime. The upper panel shows the repelling LCS over the full domain, whereas the lower panel provides a magnified view of the dashed region and highlights a barrier-crossing event. Coloured markers indicate five representative instants, t_1–t_5, along the trajectory. (b) Corresponding ti…
Figure 15: (a) Representative trajectory of the RL agent at Ra = 10^10 with A_max = 4.0, corresponding to the optimal policy for this regime. The upper panel shows the repelling LCS over the full domain, whereas the lower panel provides a magnified view of the dashed region and highlights a typical event in which the trajectory crosses a repelling LCS. Coloured markers indicate five representative instants, t_1–t_5, alo…
Figure 16: Instantaneous Voronoi tessellations of (a,b) passive-tracer and (c,d) active-navigator distributions at t* = 9.00; (e,f) show the corresponding temporal evolution of the clustering index C_V for the optimal actuation conditions. The Voronoi cells are coloured by the instantaneous background temperature field T*. Panels (a,c,e) correspond to Ra = 10^8 with A_max = 1.25, and panels (b,d,f) correspond to Ra …
Figure 17: Joint probability distributions of the Voronoi-cell area A and the local Q-value at t* = 9.00 for the optimal actuation conditions. Panels (a,b) show passive tracers at Ra = 10^8 and Ra = 10^10, respectively, whereas panels (c,d) show active navigators at the corresponding Rayleigh numbers. The colour scale in panels (a)–(d) denotes the joint probability density p(Q, A/⟨A⟩), where ⟨A⟩ is the mean Voronoi-c…
Figure 18: Comparison of representative trajectories and performance between the distilled heuristic strategy and the full RL policy. Panels (a,b) show representative trajectories of the heuristic strategy at the optimal actuation bounds: (a) Ra = 10^8 with A_max = 1.25 and (b) Ra = 10^10 with A_max = 4.0. The background colour indicates the instantaneous temperature field T*. The blue square and red star mark the sta…
Figure 19: Success rate S for the different observation state spaces at (a) Ra = 10^8 and (b) Ra = 10^10, evaluated at the respective optimal values of A_max. Four baseline observation sets are considered: O1 = {u_f}, O2 = {u_f, T}, O3 = {u_f, u_p}, and O4 = {u_f, u_p, T}. For each baseline set O_i, three augmentations are introduced: the inclusion of spatial information X, the inclusion of gradient information ∇…
Figure 20: Sample learning curves showing the cumulative reward Σ r_t, averaged over training seeds, as a function of training timesteps for Ra = 10^8. The panels compare training convergence for different observation sets: (a) the four baseline sets O_i, (b) baselines augmented with spatial information X, (c) baselines augmented with gradient information ∇O_i, and (d) baselines augmented with both X and ∇O_i. Shade…
Figure 21: Sample learning curves showing the averaged cumulative reward Σ r_t as a function of training timesteps for Ra = 10^10. The panels compare the training convergence across different observation sets: (a) the four baseline sets O_i, (b) baselines augmented with spatial information X, (c) baselines augmented with gradient information ∇O_i, and (d) baselines augmented with both X and ∇O_i. Shaded regions repres…
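Figure 12 characterizes the policies through the p.d.f. of the alignment angle φ between the propulsive vector and the local fluid velocity. Here is a minimal sketch, assuming φ is the unsigned angle in [0, π] (the page does not say whether the paper uses a signed angle) and a normalized histogram as the p.d.f. estimate; the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def alignment_angle(a: Vec, u: Vec) -> float:
    """Unsigned angle in [0, pi] between propulsion a and fluid velocity u.

    atan2(|cross|, dot) is numerically safer than acos(dot / norms) near 0 and pi.
    """
    dot = a[0] * u[0] + a[1] * u[1]
    cross = a[0] * u[1] - a[1] * u[0]
    return math.atan2(abs(cross), dot)


def angle_pdf(angles: Sequence[float], n_bins: int = 18) -> List[float]:
    """Histogram estimate of the p.d.f. of angles on [0, pi]; integrates to 1."""
    width = math.pi / n_bins
    counts = [0] * n_bins
    for phi in angles:
        # Clamp phi == pi into the last bin.
        k = min(int(phi / width), n_bins - 1)
        counts[k] += 1
    total = len(angles)
    return [c / (total * width) for c in counts]
```

A policy that aligns with local currents, as the abstract reports for the RL agent, would concentrate probability at small φ.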
Original abstract

We study the navigation of a self-propelled inertial particle in two-dimensional Rayleigh–Bénard convection at Prandtl number Pr = 0.71 and cell aspect ratio Γ = 4 for Rayleigh numbers Ra ranging from 10^7 to 10^11. A reinforcement-learning (RL) controller selects the propulsive acceleration, subject to an upper bound A_max, to achieve a prescribed horizontal displacement. We find that the success rate increases abruptly with A_max at moderate Ra, whereas at higher Ra the transition becomes more gradual and shifts to larger A_max. Moreover, although the completion time increases with Ra, the propulsion energy required for successful traversal decreases. Proper orthogonal decomposition (POD) reveals that these performance differences arise from reorganisation of the carrier flow. At moderate Ra, the dominant large-scale circulation partitions the domain through robust transport barriers, requiring a finite thrust surplus to cross them; at higher Ra, energy is distributed across many modes, the barriers fragment, and transient plume-assisted pathways emerge. Compared with a constant-heading baseline, the learned policy aligns with local currents and consumes significantly less energy. Lagrangian coherent structure (LCS) analysis further shows that the RL agent inherently learns to cross repelling barriers and surf along attracting pathways. Finally, by mapping these behaviours onto the local Eulerian flow topology using Voronoi tessellation and the Q-criterion, we distil an interpretable, physics-based heuristic strategy that achieves robust navigability. These results connect turbulent-flow organisation with autonomous navigation under bounded actuation.
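The page does not state how the LCS were extracted. A common approach, assumed here, is the finite-time Lyapunov exponent (FTLE): advect a grid of initial points over an interval T, differentiate the resulting flow map, and take FTLE = ln λ_max(C)/(2|T|), where C = FᵀF is the right Cauchy–Green tensor of the flow-map gradient F. Ridges of forward-time FTLE mark repelling LCS, and ridges of backward-time FTLE mark attracting LCS. The 60%-of-maximum threshold helper mirrors the threshold quoted in the Figure 13 text, which the paper applies to a time-averaged field; the function names are hypothetical.

```python
import numpy as np


def ftle(x_t: np.ndarray, y_t: np.ndarray, dx: float, dy: float, T: float) -> np.ndarray:
    """FTLE from a flow map sampled on a uniform grid of initial points [iy, ix].

    x_t, y_t: final positions of particles released at the grid points.
    """
    # Flow-map gradient F = d(x_t, y_t)/d(x0, y0) by finite differences.
    dxt_dy, dxt_dx = np.gradient(x_t, dy, dx)
    dyt_dy, dyt_dx = np.gradient(y_t, dy, dx)
    # Right Cauchy-Green tensor C = F^T F (symmetric, 2x2 per grid point).
    c11 = dxt_dx**2 + dyt_dx**2
    c12 = dxt_dx * dxt_dy + dyt_dx * dyt_dy
    c22 = dxt_dy**2 + dyt_dy**2
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    half_tr = 0.5 * (c11 + c22)
    det = c11 * c22 - c12**2
    lam_max = half_tr + np.sqrt(np.maximum(half_tr**2 - det, 0.0))
    return np.log(lam_max) / (2.0 * abs(T))


def lcs_mask(field: np.ndarray, frac: float = 0.6) -> np.ndarray:
    """Boolean mask of points at or above `frac` times the field maximum."""
    return field >= frac * np.max(field)
```

In practice the flow map would come from integrating particle trajectories through the DNS velocity field, forward in time for repelling structures and backward for attracting ones.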

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies reinforcement-learning control of a self-propelled inertial particle that must achieve a prescribed horizontal displacement in 2D Rayleigh-Bénard convection (Pr=0.71, Γ=4, Ra=10^7 to 10^11). It reports that the success-rate transition with A_max becomes more gradual and shifts to larger A_max at higher Ra, while the required propulsion energy decreases despite longer completion times. POD shows a flow reorganization from robust large-scale barriers at moderate Ra to fragmented barriers and plume pathways at high Ra. LCS analysis indicates that the RL policy crosses repelling structures and follows attracting ones; mapping these behaviors via Voronoi tessellation and the Q-criterion is claimed to yield an interpretable heuristic that itself achieves robust navigability. Comparisons with constant-heading control are also presented.

Significance. If substantiated, the work is significant for linking RL-derived navigation strategies to concrete Lagrangian and Eulerian flow structures in high-Ra convection. The combination of POD, LCS, and topology-based heuristic extraction provides a template for interpreting data-driven controllers in turbulent flows and could inform bounded-actuation navigation in convective environments. The observation that energy cost decreases with Ra while success improves via transient pathways is a potentially useful physical insight.

major comments (2)
  1. [Abstract / heuristic distillation section] Abstract and the section describing the heuristic extraction: the claim that the Voronoi/Q-criterion mapping 'distils an interpretable, physics-based heuristic strategy that achieves robust navigability' is not supported by direct evidence. The manuscript presents a post-hoc correlation between RL trajectories and local Eulerian topology but does not report results from deploying the extracted rule (e.g., a Q-sign and Voronoi-cell-based policy) as a standalone controller and comparing its success rates, energy, and barrier-crossing statistics to the RL agent across the Ra range.
  2. [Results (RL performance metrics)] Results section on RL performance: the reported abrupt/gradual transitions in success rate, the decrease in propulsion energy with Ra, and the superiority over constant-heading control are presented without error bars, statistics from multiple independent training runs, or ablation checks on state representation, reward formulation, or network architecture. This makes it impossible to assess whether the trends are robust or sensitive to training stochasticity.
minor comments (2)
  1. [Throughout] The notation for the actuation bound (A_max vs script A_max) should be unified for clarity.
  2. [Figures showing LCS and topology] Figure captions for the LCS and Voronoi visualizations would benefit from explicit arrows or annotations linking specific RL trajectory segments to the identified repelling/attracting structures.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important points on evidence strength and statistical robustness, which we address below with planned revisions to the manuscript.

Point-by-point responses
  1. Referee: [Abstract / heuristic distillation section] Abstract and the section describing the heuristic extraction: the claim that the Voronoi/Q-criterion mapping 'distils an interpretable, physics-based heuristic strategy that achieves robust navigability' is not supported by direct evidence. The manuscript presents a post-hoc correlation between RL trajectories and local Eulerian topology but does not report results from deploying the extracted rule (e.g., a Q-sign and Voronoi-cell-based policy) as a standalone controller and comparing its success rates, energy, and barrier-crossing statistics to the RL agent across the Ra range.

    Authors: We agree that the current wording overstates the direct validation of the extracted heuristic. The mapping was performed post-hoc to interpret the RL policy's observed behaviors (crossing repelling LCS and following attracting ones), and its consistency with successful navigation is supported by the LCS and POD analyses. However, we did not implement or benchmark the rule-based policy as a standalone controller. In revision we will modify the abstract and heuristic section to state that the mapping 'yields a candidate interpretable heuristic whose alignment with the RL trajectories suggests it may support robust navigability,' and we will add a short discussion of how the rule could be implemented. We will also include a limited comparison of a simple Q/Voronoi-based policy against the RL agent for one or two Ra values if computational resources allow. revision: partial

  2. Referee: [Results (RL performance metrics)] Results section on RL performance: the reported abrupt/gradual transitions in success rate, the decrease in propulsion energy with Ra, and the superiority over constant-heading control are presented without error bars, statistics from multiple independent training runs, or ablation checks on state representation, reward formulation, or network architecture. This makes it impossible to assess whether the trends are robust or sensitive to training stochasticity.

    Authors: We concur that the absence of error bars and multi-seed statistics limits assessment of robustness. In the revised manuscript we will rerun the RL training for each Ra with at least five independent random seeds, report mean success rates, energy, and completion times with standard-deviation error bars, and add a short subsection on sensitivity to state representation and reward weights. These additions will appear in the main Results section and supplementary material. revision: yes
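The promised multi-seed reporting amounts to aggregating each metric over independently trained policies. Below is a minimal sketch using Python's standard library; the per-seed numbers are placeholders for illustration, not results from the paper, and reporting the sample standard deviation (n − 1 denominator) is a choice, not something the rebuttal specifies.

```python
import statistics
from typing import Dict, List, Tuple


def summarize_seeds(values: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one metric across training seeds."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std


def report(metrics: Dict[str, List[float]]) -> Dict[str, str]:
    """Format each metric as 'mean ± std' for a results table or error bars."""
    out = {}
    for name, values in metrics.items():
        mean, std = summarize_seeds(values)
        out[name] = f"{mean:.3f} ± {std:.3f}"
    return out


# Placeholder values for five hypothetical seeds (not data from the paper).
example = {"success_rate": [0.82, 0.88, 0.85, 0.90, 0.80]}
print(report(example))
```

The same aggregation would be repeated per Ra and per A_max so that each point in the success-rate, energy, and completion-time curves carries an error bar.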

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper trains an RL controller on direct numerical simulations of 2D Rayleigh-Bénard convection, then applies post-processing (POD, LCS, Voronoi tessellation, Q-criterion) to interpret the learned policy and distill a heuristic. No central result is obtained by fitting a parameter to data and then re-using that same parameter as a 'prediction'; no self-definitional loop exists where a quantity is defined in terms of itself; and no load-bearing premise reduces to a self-citation chain. All performance metrics (success rate, energy, completion time) are computed from independent forward simulations of the RL policy and the constant-heading baseline. The mapping to Eulerian topology is descriptive analysis, not a redefinition that forces the claimed navigability.
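The Voronoi step of this descriptive analysis can be illustrated independently of the RL pipeline. The page names a clustering index C_V (Figure 16) without defining it; a convention common in particle-clustering studies, assumed here, is the standard deviation of normalized cell areas A/⟨A⟩ divided by the value for randomly distributed points, σ_RPP ≈ 0.53 in 2D, so that C_V > 1 indicates clustering. The shoelace helper computes the area of a bounded Voronoi cell from its vertices; clipping unbounded boundary cells to the domain is left out, and all names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: Sequence[Point]) -> float:
    """Shoelace formula for a simple polygon given as an ordered vertex list."""
    s = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def clustering_index(areas: List[float], sigma_rpp: float = 0.53) -> float:
    """ASSUMED definition: std(A / <A>) divided by the random-point reference value."""
    mean_a = statistics.fmean(areas)
    normalized = [a / mean_a for a in areas]
    # Population standard deviation over all cells in the snapshot.
    return statistics.pstdev(normalized) / sigma_rpp
```

Evaluating `clustering_index` snapshot by snapshot gives a time series comparable in spirit to Figure 16(e,f), under the stated definition.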

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claims rest on the standard incompressible Navier-Stokes equations at fixed Prandtl number together with the assumption that the RL policy converges to a near-optimal strategy under the stated reward and actuation bounds.
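The "stated reward" is described on this page only through figure captions: the total reward splits into a time-efficiency term R·V*_eff and a control-effort penalty Q·‖a*_propel‖, and it reduces to r_t = V*_eff(t) as R/Q → ∞. One form consistent with both statements, assumed here, is r_t = R·V*_eff − Q·‖a*_propel‖ with R + Q = 1. The normalization, the sign convention, and the helper name `step_reward` are hypothetical, and V*_eff is taken as a given per-step effective velocity toward the target.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def step_reward(v_eff: float, a_propel: Vec, ratio: float) -> float:
    """ASSUMED per-step reward r_t = R * V_eff - Q * |a_propel| with R/Q = ratio, R + Q = 1."""
    if math.isinf(ratio):
        # Time-reward-only limit: r_t = V_eff.
        r_w, q_w = 1.0, 0.0
    else:
        r_w = ratio / (1.0 + ratio)
        q_w = 1.0 / (1.0 + ratio)
    return r_w * v_eff - q_w * math.hypot(a_propel[0], a_propel[1])
```

Sweeping `ratio` from 0 to infinity interpolates between the energy-penalty-only and time-reward-only limits shown in Figure 10.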

free parameters (1)
  • A_max
    Upper bound on propulsive acceleration; its value controls the abruptness of the success-rate transition and is varied parametrically.
axioms (2)
  • domain assumption Two-dimensional incompressible flow governed by the Boussinesq approximation
    Invoked to generate the carrier flow at the stated Ra, Pr, and Gamma.
  • domain assumption Reinforcement-learning policy converges to a stable navigation strategy
    Required for the reported success rates and energy savings to be reproducible.

pith-pipeline@v0.9.0 · 5607 in / 1338 out tokens · 21577 ms · 2026-05-10T10:45:51.424632+00:00 · methodology


Reference graph

Works this paper leans on

72 extracted references

  1. [1]

    , " * write output.state after.block = add.period write newline

    ENTRY address author booktitle chapter edition editor howpublished institution journal key month note number organization pages publisher school series title type volume year eprint label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.all := #1 'mid.sentence ...

  2. [2]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

  3. [3]

    , Nagy, M

    Akos, Z. , Nagy, M. & Vicsek, T. 2008 Comparing bird and human soaring strategies . Proc. Natl Acad. Sci. USA 105 , 4139--4143

  4. [4]

    , Di Leonardo, R

    Bechinger, C. , Di Leonardo, R. , Löwen, H. , Reichhardt, C. , Volpe, G. & Volpe, G. 2016 Active particles in complex and crowded environments . Rev. Mod. Phys. 88 , 045006

  5. [5]

    Bellemare, M. G. , Candido, S. , Castro, P. S. , Gong, J. , Machado, M. C. , Moitra, S. , Ponda, S. S. & Wang, Z. 2020 Autonomous navigation of stratospheric balloons using reinforcement learning . Nature 588 , 77--82

  6. [6]

    , Bonaccorso, F

    Biferale, L. , Bonaccorso, F. , Buzzicotti, M. , Clark di Leoni, P. & Gustavsson, K. 2019 Zermelo’s problem: Optimal point-to-point navigation in 2D turbulent flows using reinforcement learning . Chaos 29 , 103138

  7. [7]

    , Biferale, L

    Borra, F. , Biferale, L. , Cencini, M. & Celani, A. 2022 Reinforcement learning for pursuit and evasion of microswimmers at low Reynolds number . Phys. Rev. Fluids 7 , 023103

  8. [8]

    Brunton, S. L. , Noack, B. R. & Koumoutsakos, P. 2020 Machine learning for fluid mechanics . Annu. Rev. Fluid Mech. 52 , 477--508

  9. [9]

    , Sergent, A

    Castillo-Castellanos, A. , Sergent, A. , Podvin, B. & Rossi, M. 2019 Cessation and reversals of large-scale structures in square Rayleigh--Bénard cells . J. Fluid Mech. 877 , 922--954

  10. [10]

    & Schumacher, J

    Chillà, F. & Schumacher, J. 2012 New perspectives in turbulent Rayleigh--Bénard convection . Eur. Phys. J. E 35 , 58

  11. [11]

    , Gustavsson, K

    Cichos, F. , Gustavsson, K. , Mehlig, B. & Volpe, G. 2020 Machine learning for active matter . Nat. Mach. Intell. 2 (2), 94--103

  12. [12]

    , Mahault, B

    Cocconi, L. , Mahault, B. & Piro, L. 2025 Dissipation-accuracy tradeoffs in autonomous control of smart active matter . New J. Phys. 27 (1), 013002

  13. [13]

    , Gustavsson, K

    Colabrese, S. , Gustavsson, K. , Celani, A. & Biferale, L. 2017 Flow navigation by smart microswimmers via reinforcement learning . Phys. Rev. Lett. 118 , 158004

  14. [14]

    Emran, M. S. & Schumacher, J. 2010 Lagrangian tracer dynamics in a closed cylindrical turbulent convection cell . Phys. Rev. E 82 , 016303

  15. [15]

    Fischer, P. F. 1997 An overlapping Schwarz method for spectral element solution of the incompressible Navier--Stokes equations . J. Comput. Phys. 133 (1), 84--101

  16. [16]

    , Tao, X

    Gao, Z.-Y. , Tao, X. , Huang, S.-D. , Bao, Y. & Xie, Y.-C. 2024 Flow state transition induced by emergence of orbiting satellite eddies in two-dimensional turbulent Rayleigh--B \'e nard convection . J. Fluid Mech. 997 , A54

  17. [17]

    , Mandralis, I

    Gunnarson, P. , Mandralis, I. , Novati, G. , Koumoutsakos, P. & Dabiri, J. O. 2021 Learning efficient navigation in vortical flow fields . Nat. Commun. 12 , 7143

  18. [18]

    , Berglund, F

    Gustavsson, K. , Berglund, F. , Jonsson, P. R. & Mehlig, B. 2016 Preferential sampling and small-scale clustering of gyrotactic microswimmers in turbulence . Phys. Rev. Lett. 116 , 108104

  19. [19]

    , Biferale, L

    Gustavsson, K. , Biferale, L. , Celani, A. & Colabrese, S. 2017 Finding efficient swimming strategies in a three-dimensional chaotic flow by reinforcement learning . Eur. Phys. J. E 40 , 110

  20. [20]

    , Zhou, A

    Haarnoja, T. , Zhou, A. , Abbeel, P. & Levine, S. 2018 Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor . In Proc. Int. Conf. Mach. Learn. (ICML)\/ , pp. 1861--1870 . PMLR

  21. [21]

    2015 Lagrangian coherent structures

    Haller, G. 2015 Lagrangian coherent structures . Annu. Rev. Fluid Mech. 47 , 137--162

  22. [22]

    Hang, Haotian , Jiao, Yusheng , Merel, Josh & Kanso, Eva 2026 Flow currents support simple and versatile trail-tracking strategies . Phys. Rev. Res. 8 (1), 013019

  23. [23]

    , Bao, Y

    He, J.-C. , Bao, Y. & Chen, X. 2024 Turbulent boundary layers in thermal convection at moderately high Rayleigh numbers . Phys. Fluids 36 (2), 025140

  24. [24]

    Heinonen, R. A. , Biferale, L. , Celani, A. & Vergassola, M. 2025 Exploring Bayesian olfactory search in realistic turbulent flows . Phys. Rev. Fluids 10 (6), 064614

  25. [25]

    2025 Transport phenomena in microswimmer suspensions: migration, collective motion, diffusion and rheology

    Ishikawa, T. 2025 Transport phenomena in microswimmer suspensions: migration, collective motion, diffusion and rheology . J. Fluid Mech. 1016 , P1

  26. [26]

    , Hang, H

    Jiao, Y. , Hang, H. , Merel, J. & Kanso, E. 2025 Sensing flow gradients is necessary for learning autonomous underwater navigation . Nat. Commun. 16 (1), 3044

  27. [27]

    Kooij, G. L. , Botchev, M. A. , Frederix, E. M. , Geurts, B. J. , Horn, S. , Lohse, D. , van der Poel, E. P. , Shishkina, O. , Stevens, R. J. & Verzicco, R. 2018 Comparison of computational codes for direct numerical simulations of turbulent Rayleigh--B \'e nard convection . Comput. Fluids 166 , 1--8

  28. [28]

    IEEE Access 11 , 118916--118930

    Krishna, Kartik , Brunton, Steven L & Song, Zhuoyuan 2023 Finite time lyapunov exponent analysis of model predictive control and reinforcement learning . IEEE Access 11 , 118916--118930

  29. [29]

    , Song, Z

    Krishna, K. , Song, Z. & Brunton, S. L. 2022 Finite-horizon, energy-efficient trajectories in unsteady flows . Proc. R. Soc. A 478 (2258), 20210255

  30. [30]

    & Powers, T

    Lauga, E. & Powers, T. R. 2009 The hydrodynamics of swimming microorganisms . Rep. Prog. Phys. 72 (9), 096601

  31. [31]

    Laurent, K. M. , Fogg, B. , Ginsburg, T. , Halverson, C. , Lanzone, M. J. , Miller, T. A. , Winkler, D. W. & Bewley, G. P. 2021 Turbulence explains the accelerations of an eagle in natural flight . Proc. Natl Acad. Sci. USA 118 (23), e2102588118

  32. [32]

    & Shishkina, O

    Lohse, D. & Shishkina, O. 2023 Ultimate turbulent thermal convection . Phys. Today 76 (11), 26--32

  33. [33]

    & Shishkina, O

    Lohse, D. & Shishkina, O. 2024 Ultimate Rayleigh--B \'e nard turbulence . Rev. Mod. Phys. 96 (3), 035001

  34. [34]

    & Xia, K

    Lohse, D. & Xia, K. Q. 2010 Small-scale properties of turbulent Rayleigh--B \'e nard convection . Annu. Rev. Fluid Mech. 42 , 335--364

  35. [35]

    & Eloy, C

    Loisy, A. & Eloy, C. 2022 Searching for a source without gradients: how good is infotaxis and how to beat it . Proc. R. Soc. A 478 (2262), 20220118

  36. [36]

    , Martin, M

    Masmitja, I. , Martin, M. , O’Reilly, T. , Kieft, B. , Palomeras, N. , Navarro, J. & Katija, K. 2023 Dynamic robotic tracking of underwater targets using reinforcement learning . Sci. Robot. 8 (80), eade7811

  37. [37]

    , Bukov, M

    Mehta, P. , Bukov, M. , Wang, C. H. , Day, A. G. R. , Richardson, C. , Fisher, C. K. & Schwab, D. J. 2019 A high-bias, low-variance introduction to Machine Learning for physicists . Phys. Rep. 810 , 1--124

  38. [38]

    , Loisy, A

    Monthiller, R. , Loisy, A. , Koehl, M. A. , Favier, B. & Eloy, C. 2022 Surfing on turbulence: a strategy for planktonic navigation . Phys. Rev. Lett. 129 (6), 064502

  39. [39]

    Muiños-Landin, S., Fischer, A., Holubec, V. & Cichos, F. 2021 Reinforcement learning with artificial microswimmers. Sci. Robot. 6 (52), eabd9285

  40. [40]

    Nasiri, M. & Liebchen, B. 2022 Reinforcement learning of optimal active particle navigation. New J. Phys. 24 (7), 073042

  41. [41]

    Pandey, A., Scheel, J. D. & Schumacher, J. 2018 Turbulent superstructures in Rayleigh–Bénard convection. Nat. Commun. 9, 2118

  42. [42]

    Piro, L., Heinonen, R. A., Cencini, M. & Biferale, L. 2025 Many wrong models approach to localise an odour source in turbulence with static sensors. J. Turbul. 26 (5), 153–173

  43. [43]

    Piro, L., Vilfan, A., Golestanian, R. & Mahault, B. 2024 Energetic cost of microswimmer navigation: The role of body shape. Phys. Rev. Res. 6 (1), 013274

  44. [44]

    Podvin, B. & Sergent, A. 2015 A large-scale investigation of wind reversal in a square Rayleigh–Bénard cell. J. Fluid Mech. 766, 172–201

  45. [45]

    Qiu, J., Marchioli, C. & Zhao, L. 2022a A review on gyrotactic swimmers in turbulent flows. Acta Mech. Sin. 38 (8), 722323

  46. [46]

    Qiu, J., Mousavi, N., Gustavsson, K., Xu, C., Mehlig, B. & Zhao, L. 2022b Navigation of micro-swimmers in steady flow: the importance of symmetries. J. Fluid Mech. 932, A10

  47. [47]

    Qiu, J., Mousavi, N., Zhao, L. & Gustavsson, K. 2022c Active gyrotactic stability of microswimmers using hydromechanical signals. Phys. Rev. Fluids 7 (1), 014311

  48. [48]

    Reddy, G., Celani, A., Sejnowski, T. J. & Vergassola, M. 2016 Learning to soar in turbulent environments. Proc. Natl Acad. Sci. USA 113 (33), E4877–E4884

  49. [49]

    Reddy, G., Wong-Ng, J., Celani, A., Sejnowski, T. J. & Vergassola, M. 2018 Glider soaring via reinforcement learning in the field. Nature 562 (7726), 236–239

  50. [50]

    Rigolli, N., Magnoli, N., Rosasco, L. & Seminara, A. 2022 Learning to predict target location with turbulent odor plumes. eLife 11, e72196

  51. [51]

    Samuel, R., Samtaney, R. & Verma, M. K. 2022 Large-eddy simulation of Rayleigh–Bénard convection at extreme Rayleigh numbers. Phys. Fluids 34 (7), 075133

  52. [52]

    Schneide, C., Stahn, M., Pandey, A., Junge, O., Koltai, P., Padberg-Gehle, K. & Schumacher, J. 2019 Lagrangian coherent sets in turbulent Rayleigh–Bénard convection. Phys. Rev. E 100 (5), 053103

  53. [53]

    Schneider, E. & Stark, H. 2019 Optimal steering of a smart active particle. EPL 127 (6), 64003

  54. [54]

    Sengupta, A., Carrara, F. & Stocker, R. 2017 Phytoplankton can actively diversify their migration strategy in response to turbulent cues. Nature 543 (7646), 555–558

  55. [55]

    Shishkina, O. & Lohse, D. 2024 Ultimate regime of Rayleigh–Bénard turbulence: subregimes and their scaling relations for the Nusselt vs Rayleigh and Prandtl numbers. Phys. Rev. Lett. 133 (14), 144001

  56. [56]

    Simons, A. M. 2004 Many wrongs: the advantage of group navigation. Trends Ecol. Evol. 19 (9), 453–455

  57. [57]

    Soucasse, L., Podvin, B., Rivière, P. & Soufiani, A. 2019 Proper orthogonal decomposition analysis and modelling of large-scale flow reorientations in a cubic Rayleigh–Bénard cell. J. Fluid Mech. 881, 23–50

  58. [58]

    Vergassola, M., Villermaux, E. & Shraiman, B. I. 2007 Infotaxis as a strategy for searching without gradients. Nature 445 (7126), 406–409

  59. [59]

    Verma, S., Novati, G. & Koumoutsakos, P. 2018 Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc. Natl Acad. Sci. USA 115 (23), 5849–5854

  60. [60]

    Wang, B. F., Zhou, Q. & Sun, C. 2020a Vibration-induced boundary-layer destabilization achieves massive heat-transport enhancement. Sci. Adv. 6 (21), eaaz8239

  61. [61]

    Wang, Q., Verzicco, R., Lohse, D. & Shishkina, O. 2020b Multiple states in turbulent large-aspect-ratio thermal convection: what determines the number of convection rolls? Phys. Rev. Lett. 125 (7), 074501

  62. [62]

    Weimerskirch, H., Bishop, C., Jeanniard-du Dot, T., Prudor, A. & Sachs, G. 2016 Frigate birds track atmospheric conditions over months-long transoceanic flights. Science 353 (6294), 74–78

  63. [63]

    Xia, K. Q., Chong, K. L., Ding, G. Y. & Zhang, L. 2025 Some fundamental issues in buoyancy-driven flows with implications for geophysical and astrophysical systems. Acta Mech. Sin. 41 (1), 324287

  64. [64]

    Xia, K. Q., Huang, S. D., Xie, Y. C. & Zhang, L. 2023 Tuning heat transport via coherent structure manipulation: recent advances in thermal turbulence. Natl. Sci. Rev., nwad012

  65. [65]

    Xu, A., Shi, L. & Zhao, T. S. 2017 Accelerated lattice Boltzmann simulation using GPU and OpenACC with data management. Int. J. Heat Mass Transf. 109, 577–588

  66. [66]

    Xu, A., Wu, H. L. & Xi, H. D. 2022 Migration of self-propelling agent in a turbulent environment with minimal energy consumption. Phys. Fluids 34 (3), 035117

  67. [67]

    Xu, A., Wu, H. L. & Xi, H. D. 2023 Long-distance migration with minimal energy consumption in a thermal turbulent environment. Phys. Rev. Fluids 8 (2), 023502

  68. [68]

    Yang, J., Davoodiianidalik, M., Xia, H., Punzmann, H., Shats, M. & Francois, N. 2019 Passive propulsion in turbulent flows. Phys. Rev. Fluids 4 (10), 104608

  69. [69]

    Zhang, Y., Zhou, Q. & Sun, C. 2017 Statistics of kinetic and thermal energy dissipation rates in two-dimensional turbulent Rayleigh–Bénard convection. J. Fluid Mech. 814, 165–184

  70. [70]

    Zhu, G., Fang, W. Z. & Zhu, L. 2022 Optimizing low-Reynolds-number predation via optimal control and reinforcement learning. J. Fluid Mech. 944, A3

  71. [71]

    Zhu, X., Mathai, V., Stevens, R. J., Verzicco, R. & Lohse, D. 2018 Transition to the ultimate regime in two-dimensional Rayleigh–Bénard convection. Phys. Rev. Lett. 120 (14), 144502

  72. [72]

    Zhu, Y., Kang, L., Tong, X., Ma, J., Tian, F. & Fan, D. 2025 Intermittent swimmers optimize energy expenditure with flick-to-flick motor control. J. Fluid Mech. 1006, A27