
arxiv: 2605.21329 · v1 · pith:O2YWQ7UHnew · submitted 2026-05-20 · ⚛️ physics.flu-dyn · physics.bio-ph · physics.comp-ph

Smart strategies to navigate turbulent odor plumes reorienting to local wind

Pith reviewed 2026-05-21 03:33 UTC · model grok-4.3

classification ⚛️ physics.flu-dyn · physics.bio-ph · physics.comp-ph
keywords turbulent odor plumes · reinforcement learning · olfactory navigation · wind estimation · memory kernel · direct numerical simulation · cast and surge

The pith

A reinforcement learning policy using time since odor detection and local wind estimation outperforms cast-and-surge in turbulent plumes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a reinforcement-learning framework for navigating turbulent odor plumes where an agent uses only the time elapsed since the last odor detection and selects actions based on a locally estimated wind direction smoothed by an exponential memory kernel. In simulations with a mild mean wind, the learned policy performs better than the traditional cast-and-surge strategy for any choice of wind memory time and adjusts its path according to how reliable the wind estimate is. In fully isotropic turbulence without mean flow, the best results occur when the agent integrates wind direction over an intermediate time scale, showing that the duration of wind memory is a useful resource that depends on the flow regime. A sympathetic reader would care because this offers a minimal set of internal states for effective search in messy natural flows, with potential applications to both biology and robotics.

Core claim

The paper establishes that policies trained in direct numerical simulations of turbulent odor plumes, employing a wind-relative reinforcement learning approach with exponential filtering of wind estimates based on time since last detection, achieve superior performance to cast-and-surge methods in the presence of mean wind and exhibit peak performance at intermediate memory times in isotropic turbulence.
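For readers unfamiliar with the baseline, the following is a minimal sketch of a cast-and-surge rule expressed in the same terms as the learned policy: a wind-relative action chosen from the time since the last detection. The function name, the action labels, and the parameters `surge_time` and `cast_period` are assumptions made for illustration; the parameterization used in the paper is not given on this page and may differ.

```python
def cast_and_surge(elapsed, surge_time=1.0, cast_period=4.0):
    """Pick a wind-relative action from the time since the last odor detection.

    Illustrative sketch only: surge upwind shortly after a detection, then cast
    crosswind on alternating sides with legs that grow longer over time.
    """
    if elapsed < surge_time:
        return "upwind"

    # Time spent casting so far.
    t = elapsed - surge_time
    leg = 0
    duration = cast_period
    # Walk through the casting legs; each leg lasts longer than the previous one.
    while t >= duration:
        t -= duration
        leg += 1
        duration += cast_period

    # Even legs go to one side, odd legs to the other.
    return "crosswind_left" if leg % 2 == 0 else "crosswind_right"
```

The rule is fixed in advance, which is the point of comparison: the learned policy maps the same elapsed-time variable to actions but is free to choose a different mapping for each flow regime.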

What carries the argument

A wind-relative reinforcement-learning framework where the agent maintains elapsed time since last odor detection as its single internal variable and chooses movements relative to wind direction estimated with an exponential memory kernel.
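The two ingredients named above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the class names `WindMemory` and `Clock`, the parameters `tau` and `dt`, and the choice to filter the wind velocity components (rather than, say, a unit direction vector) are all assumptions.

```python
import math


class WindMemory:
    """Exponentially filtered wind estimate (illustrative sketch)."""

    def __init__(self, tau, dt):
        # tau: memory time of the kernel; dt: sampling interval (assumed names).
        # This weight is the exact discretization for a piecewise-constant input.
        self.alpha = 1.0 - math.exp(-dt / tau)
        self.wx = 0.0
        self.wy = 0.0

    def update(self, ux, uy):
        # Blend the newest local wind sample into the running average.
        self.wx += self.alpha * (ux - self.wx)
        self.wy += self.alpha * (uy - self.wy)
        return self.direction()

    def direction(self):
        # Angle of the filtered wind vector, in radians.
        return math.atan2(self.wy, self.wx)


class Clock:
    """Elapsed time since the last odor detection (the single internal variable)."""

    def __init__(self):
        self.elapsed = 0.0

    def step(self, detected, dt):
        # Reset on a detection, otherwise keep counting.
        self.elapsed = 0.0 if detected else self.elapsed + dt
        return self.elapsed
```

A short `tau` tracks the instantaneous wind closely but inherits its fluctuations, while a long `tau` gives a smoother estimate that lags behind changes in the flow. That trade-off is what the paper's memory-time sweep probes.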

If this is right

  • In mild mean wind, the learned policy adapts its movement pattern to wind-estimation quality.
  • In isotropic turbulence, performance peaks at an intermediate wind memory time.
  • Temporal wind integration serves as a regime-dependent resource for olfactory navigation.
  • The approach supplies a compact design principle for minimal robotic olfactory navigation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Biological searchers might tune their wind integration time based on measured turbulence levels in different environments.
  • Robotic systems could implement this with simple sensors for time since detection and local flow direction to locate chemical sources more efficiently than fixed strategies.
  • Testing the policy in flows with varying plume strengths or in three dimensions could identify additional optimal memory parameters.

Load-bearing premise

Direct numerical simulations of turbulence must accurately represent the velocity and odor fields found in actual natural settings, and policies learned there must transfer to real physical robots or animals.

What would settle it

Deploying a physical robot equipped with the learned policy in a laboratory turbulent flow setup with mean wind and comparing its source localization success rate to that of a cast-and-surge controller; failure to show improvement would challenge the claim.

Original abstract

Olfactory search in turbulent environments is a sensorimotor challenge solved with remarkable efficiency by many animals, yet replicating this ability in artificial systems remains difficult because detections are intermittent and wind direction fluctuates strongly, rendering standard search strategies unreliable. We introduce a wind-relative reinforcement-learning framework in which an agent navigates a turbulent plume with a single internal variable -- the elapsed time since the last odor detection -- and selects actions relative to a locally estimated wind direction filtered through an exponential memory kernel. Policies are trained and evaluated in direct numerical simulations of turbulence, capturing the multi-scale characteristics of velocity and odor fields in natural environments, both in the presence and absence of a mean wind. In a mild mean wind, the learned policy outperforms cast-and-surge regardless of the wind memory time, yet adapts its movement pattern to wind-estimation quality. In isotropic turbulence, performance peaks at an intermediate wind memory time, identifying temporal wind integration as a regime-dependent resource. Our results highlight the importance of developing and validating olfactory-navigation strategies under realistic turbulent conditions, and offer a compact design principle for minimal robotic olfactory navigation and testable predictions for biological search behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a reinforcement-learning framework for olfactory search in turbulent plumes, where an agent maintains a single internal state (elapsed time since last odor detection) and selects actions relative to a locally estimated wind direction passed through an exponential memory kernel. Policies are trained and evaluated inside direct numerical simulations of both mildly advected and isotropic turbulence. The central claims are that the learned policy outperforms cast-and-surge in the presence of a mean wind irrespective of memory time while adapting its locomotion to wind-estimation fidelity, and that performance in isotropic turbulence reaches a maximum at an intermediate memory time, identifying temporal wind integration as a regime-dependent resource.

Significance. If the DNS fidelity and statistical robustness of the performance comparisons are confirmed, the work supplies a compact, low-dimensional design rule for minimal robotic olfactory navigation and generates falsifiable predictions for biological search behavior. It also demonstrates the necessity of evaluating sensorimotor policies inside multi-scale turbulent fields rather than idealized plume models.

major comments (2)
  1. [DNS parameters section] The Reynolds number, integral length scale, and scalar source size are not reported in sufficient detail to verify that the simulated velocity and odor fields reproduce the intermittency, correlation times, and meandering statistics of atmospheric boundary-layer turbulence; without this, the reported optimum memory time in the isotropic case risks being an artifact of the particular fluctuation spectrum rather than a general resource-allocation principle.
  2. [Results on performance comparison] The claim that the learned policy outperforms cast-and-surge in mild mean wind is presented without error bars, number of independent DNS realizations, or statistical tests; because this outperformance is load-bearing for the assertion that the policy adapts to wind-estimation quality, quantitative measures of variability are required.
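
For orientation on the first major comment, the standard large-scale quantities it refers to can be written as follows. The notation is assumed here for illustration and is not taken from the manuscript: u' is the root-mean-square velocity fluctuation, L the integral length scale, \nu the kinematic viscosity, and \kappa the scalar diffusivity.

```latex
\mathrm{Re}_L = \frac{u' L}{\nu}, \qquad
T_L = \frac{L}{u'}, \qquad
\mathrm{Sc} = \frac{\nu}{\kappa}
```

Here T_L is the large-eddy turnover time. Reporting the wind memory time as a ratio to T_L (or to another measured correlation time of the flow) is one way to make an optimum memory time comparable across simulations and experiments.
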
minor comments (2)
  1. [Methods] The exponential memory kernel should be written explicitly as an equation (with time constant) in the methods rather than described only in prose.
  2. [Figures] Figure captions for trajectory and performance plots should state the precise memory times shown and the number of trajectories averaged.
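
As an illustration of the first minor comment, one common way to write an exponential memory kernel is given below. This is a sketch under assumed notation, not the authors' equation: \hat{\mathbf{w}} is the filtered wind estimate, \mathbf{u} the local wind sampled along the agent's trajectory \mathbf{x}(s), and \tau the memory time.

```latex
\hat{\mathbf{w}}(t) = \frac{1}{\tau} \int_{-\infty}^{t} e^{-(t-s)/\tau}\, \mathbf{u}\big(\mathbf{x}(s), s\big)\, \mathrm{d}s
\quad \Longleftrightarrow \quad
\frac{\mathrm{d}\hat{\mathbf{w}}}{\mathrm{d}t} = \frac{\mathbf{u}\big(\mathbf{x}(t), t\big) - \hat{\mathbf{w}}(t)}{\tau}
```

The integral and differential forms are equivalent; the second makes clear that the estimate relaxes toward the current wind sample at rate 1/\tau.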

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. Where revisions are warranted, we will incorporate the suggested changes in the revised version of the manuscript.

  1. Referee: [DNS parameters section] The Reynolds number, integral length scale, and scalar source size are not reported in sufficient detail to verify that the simulated velocity and odor fields reproduce the intermittency, correlation times, and meandering statistics of atmospheric boundary-layer turbulence; without this, the reported optimum memory time in the isotropic case risks being an artifact of the particular fluctuation spectrum rather than a general resource-allocation principle.

    Authors: We agree with the referee that more detailed reporting of the DNS parameters is essential for verifying the turbulence characteristics and ensuring the generality of our findings. In the revised manuscript, we will add a dedicated subsection or table in the Methods section specifying the Reynolds number (based on the integral length scale and velocity fluctuations), the integral length scale, the scalar source size, and other relevant parameters such as the grid resolution and time step. Additionally, we will include a brief discussion or references demonstrating that the simulated fields reproduce key statistics like intermittency and correlation times observed in atmospheric boundary-layer turbulence. This will allow readers to confirm that the optimum memory time reflects a general principle rather than a simulation-specific artifact. revision: yes

  2. Referee: [Results on performance comparison] The claim that the learned policy outperforms cast-and-surge in mild mean wind is presented without error bars, number of independent DNS realizations, or statistical tests; because this outperformance is load-bearing for the assertion that the policy adapts to wind-estimation quality, quantitative measures of variability are required.

    Authors: We appreciate the referee's emphasis on statistical rigor in the performance comparisons. In the revised manuscript, we will augment the Results section with error bars (e.g., standard error of the mean) on the performance metrics, explicitly state the number of independent DNS realizations used for each policy and condition, and include statistical tests (such as paired t-tests or non-parametric equivalents) to quantify the significance of the outperformance relative to cast-and-surge. These additions will provide quantitative support for the claim that the learned policy adapts its locomotion to wind-estimation fidelity. revision: yes
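
The statistics promised in the second response are simple to compute. The sketch below shows a standard error of the mean and a paired t statistic, assuming each policy is scored on the same set of flow realizations so that the samples are matched. The function names are hypothetical, and nothing here reflects the authors' actual analysis or data.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean, using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


def paired_t(a, b):
    """Paired t statistic for matched samples a and b.

    Each pair (a[i], b[i]) is assumed to come from the same flow realization.
    Compare the result against a t distribution with len(a) - 1 degrees of freedom.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / sem(diffs)
```

Pairing by realization removes the variance that both policies share from facing the same plume, which usually gives a sharper comparison than treating the two sets of scores as independent.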

Circularity Check

0 steps flagged

No circularity: RL policies evaluated against external baseline in independent DNS runs


The paper trains reinforcement-learning agents with a single internal state (time since last odor detection) and a tunable exponential memory kernel for wind estimation, then evaluates the resulting policies by direct comparison to the cast-and-surge heuristic inside fresh DNS realizations of the turbulent scalar field. The memory time is varied parametrically to map performance curves rather than being optimized or fitted to the final reported optimum; the claimed peak at intermediate memory time and the outperformance of cast-and-surge therefore emerge from the simulation statistics themselves. No load-bearing step reduces to a self-definition, a fitted input renamed as prediction, or a self-citation chain; the derivation chain remains self-contained against the external baseline and the independently generated flow fields.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on the fidelity of DNS turbulence modeling and the ability of the chosen RL state-action space to capture essential navigation behavior; no new physical entities are postulated.

free parameters (2)
  • wind memory time constant
    Exponential kernel time scale varied across simulations; performance reported to peak at an intermediate value.
  • RL training hyperparameters
    Policy network and reward parameters chosen to produce the reported policies.
axioms (1)
  • domain assumption Direct numerical simulations of the Navier-Stokes equations with scalar transport accurately reproduce the statistics of natural turbulent odor plumes.
    Invoked to justify using DNS as the environment for training and evaluation.

pith-pipeline@v0.9.0 · 5755 in / 1342 out tokens · 37195 ms · 2026-05-21T03:33:55.975082+00:00 · methodology


Acknowledgements

We thank A. Celani and A. Loisy for useful discussions. This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement Nos. 882340 and 101002724), by the Air Force Office of Scientific Research (grant FA865...