Smart strategies to navigate turbulent odor plumes reorienting to local wind

Agnese Seminara; Lorenzo Piro; Luca Biferale; Marco Rando; Massimo Cencini; Maurizio Carbone; Robin A. Heinonen

arXiv: 2605.21329 · v1 · pith:O2YWQ7UHnew · submitted 2026-05-20 · ⚛️ physics.flu-dyn · physics.bio-ph · physics.comp-ph


Pith reviewed 2026-05-21 03:33 UTC · model grok-4.3

classification ⚛️ physics.flu-dyn · physics.bio-ph · physics.comp-ph

keywords turbulent odor plumes · reinforcement learning · olfactory navigation · wind estimation · memory kernel · direct numerical simulation · cast and surge


The pith

A reinforcement-learning policy using only the time since the last odor detection and a locally estimated wind direction outperforms cast-and-surge in turbulent plumes with a mild mean wind.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a reinforcement-learning framework for navigating turbulent odor plumes in which the agent's only internal variable is the time elapsed since the last odor detection, and actions are chosen relative to a locally estimated wind direction smoothed by an exponential memory kernel. In simulations with a mild mean wind, the learned policy outperforms the traditional cast-and-surge strategy for every wind memory time tested and adjusts its path according to how reliable the wind estimate is. In isotropic turbulence without a mean flow, the best results occur when the agent integrates wind direction over an intermediate time scale, indicating that the duration of wind memory is a resource whose optimal value depends on the flow regime. A sympathetic reader would care because this offers a minimal set of internal states for effective search in messy natural flows, with potential applications to both biology and robotics.

Core claim

The paper establishes that policies trained in direct numerical simulations of turbulent odor plumes, which condition on the time since the last detection and act relative to an exponentially filtered wind estimate, outperform cast-and-surge in the presence of a mean wind and perform best at intermediate memory times in isotropic turbulence.
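For reference, cast-and-surge can be written as a fixed rule on the same two inputs the learned policy uses: the clock and the wind direction. The sketch below is a minimal, hypothetical version (the class name, `surge_time` and `cast_period` are illustrative assumptions, not the baseline's parameters in the paper): surge upwind for a fixed time after each detection, then sweep crosswind, reversing at a fixed period.

```python
import math
from dataclasses import dataclass


@dataclass
class CastAndSurge:
    """Hypothetical cast-and-surge baseline driven by the time since detection.

    surge_time: how long the agent keeps moving upwind after a detection.
    cast_period: duration of each crosswind sweep before reversing.
    Both values are illustrative assumptions, not taken from the paper.
    """
    surge_time: float = 1.0
    cast_period: float = 2.0

    def action(self, t_since: float) -> str:
        """Return a wind-relative action given the time since the last detection."""
        if t_since < self.surge_time:
            return "upwind"
        # After the surge phase, sweep crosswind and reverse every cast_period.
        n_sweeps = math.floor((t_since - self.surge_time) / self.cast_period)
        return "crosswind_left" if n_sweeps % 2 == 0 else "crosswind_right"
```

Many cast-and-surge variants widen successive casts; this fixed-period version is only the simplest case. A learned clock-state policy can be viewed as replacing these hand-set thresholds with a mapping tuned in simulation.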

What carries the argument

A wind-relative reinforcement-learning framework in which the agent keeps the elapsed time since the last odor detection as its single internal variable and chooses movements relative to a wind direction estimated with an exponential memory kernel.
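A minimal sketch of that internal state, under stated assumptions: the clock resets to zero at each detection and otherwise grows by the time step, and the exponential kernel is applied to the sampled wind velocity components, with the estimate taken as the angle of the filtered vector. Whether the paper filters the velocity vector or a unit direction vector is not stated here; `AgentMemory` and `tau_w` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentMemory:
    """Sketch of the agent's state: a detection clock plus a filtered wind vector.

    clock: time elapsed since the last odor detection.
    wx, wy: exponentially filtered local wind velocity (assumed to be what is filtered).
    tau_w: wind memory time of the exponential kernel (hypothetical name).
    """
    tau_w: float
    clock: float = 0.0
    wx: float = 0.0
    wy: float = 0.0

    def update(self, detected: bool, ux: float, uy: float, dt: float) -> None:
        # Reset the clock on a detection; otherwise let it grow by one step.
        self.clock = 0.0 if detected else self.clock + dt
        # Weight of the newest wind sample under an exponential kernel.
        alpha = 1.0 - math.exp(-dt / self.tau_w)
        self.wx += alpha * (ux - self.wx)
        self.wy += alpha * (uy - self.wy)

    def wind_angle(self) -> float:
        """Estimated direction the wind blows toward, in radians."""
        return math.atan2(self.wy, self.wx)
```

Filtering vector components rather than raw angles sidesteps the wrap-around at ±π. The weight `alpha = 1 - exp(-dt / tau_w)` makes the update the exact discretization of an exponential kernel for a wind sample held constant over each step: a `tau_w` much smaller than `dt` tracks the instantaneous wind, while a large `tau_w` averages over a long history.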

If this is right

In mild mean wind, the learned policy adapts its movement pattern to wind-estimation quality.
In isotropic turbulence, performance peaks at an intermediate wind memory time.
Temporal wind integration serves as a regime-dependent resource for olfactory navigation.
The approach supplies a compact design principle for minimal robotic olfactory navigation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Biological searchers might tune their wind integration time based on measured turbulence levels in different environments.
Robotic systems could implement this with simple sensors for time since detection and local flow direction to locate chemical sources more efficiently than fixed strategies.
Testing the policy in flows with varying plume strengths or in three dimensions could identify additional optimal memory parameters.

Load-bearing premise

Direct numerical simulations of turbulence must accurately represent the velocity and odor fields found in actual natural settings, and policies learned there must transfer to real physical robots or animals.

What would settle it

Deploying a physical robot equipped with the learned policy in a laboratory turbulent flow setup with mean wind and comparing its source localization success rate to that of a cast-and-surge controller; failure to show improvement would challenge the claim.
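Comparing success rates between a robot running the learned policy and a cast-and-surge controller reduces to comparing two binomial proportions. One standard option is the pooled two-proportion z-test, sketched below under the assumption of independent trials and counts large enough for the normal approximation; the function name is illustrative.

```python
from __future__ import annotations

import math


def two_proportion_z(successes_a: int, trials_a: int,
                     successes_b: int, trials_b: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test with pooled variance.

    Returns (z, p_value). Assumes independent trials and a normal
    approximation; for small samples an exact test is safer.
    """
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / trials_a + 1.0 / trials_b))
    if se == 0.0:
        raise ValueError("all trials succeeded or all failed: z is undefined")
    z = (p_a - p_b) / se
    # Two-sided tail of the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

With hypothetical counts of 70/100 versus 50/100 successes, `two_proportion_z(70, 100, 50, 100)` gives z ≈ 2.89 and a two-sided p ≈ 0.004. For small numbers of runs, Fisher's exact test is the safer choice.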

Original abstract

Olfactory search in turbulent environments is a sensorimotor challenge solved with remarkable efficiency by many animals, yet replicating this ability in artificial systems remains difficult because detections are intermittent and wind direction fluctuates strongly, rendering standard search strategies unreliable. We introduce a wind-relative reinforcement-learning framework in which an agent navigates a turbulent plume with a single internal variable -- the elapsed time since the last odor detection -- and selects actions relative to a locally estimated wind direction filtered through an exponential memory kernel. Policies are trained and evaluated in direct numerical simulations of turbulence, capturing the multi-scale characteristics of velocity and odor fields in natural environments, both in the presence and absence of a mean wind. In a mild mean wind, the learned policy outperforms cast-and-surge regardless of the wind memory time, yet adapts its movement pattern to wind-estimation quality. In isotropic turbulence, performance peaks at an intermediate wind memory time, identifying temporal wind integration as a regime-dependent resource. Our results highlight the importance of developing and validating olfactory-navigation strategies under realistic turbulent conditions, and offer a compact design principle for minimal robotic olfactory navigation and testable predictions for biological search behavior.
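To make "actions relative to a locally estimated wind direction" concrete, the sketch below represents a clock-state policy as a table over discretized time-since-detection bins, with four wind-relative actions converted into world-frame headings. It assumes tabular Q-learning (the related work listed as reference [28] uses Q-learning, but the training algorithm of this paper is not specified in the text here), and the action set, bin width, learning rate and discount are hypothetical choices.

```python
from __future__ import annotations

import math
import random

# Hypothetical wind-relative action set: heading offsets (radians) measured
# from the estimated upwind direction; positive offsets turn counterclockwise.
ACTIONS = {
    "upwind": 0.0,
    "crosswind_left": math.pi / 2,
    "downwind": math.pi,
    "crosswind_right": -math.pi / 2,
}
NAMES = list(ACTIONS)


def clock_bin(t_since: float, bin_width: float, n_bins: int) -> int:
    """Map time since the last detection to a bin; the last bin absorbs long waits."""
    return min(int(t_since / bin_width), n_bins - 1)


def heading(action: str, wind_angle: float) -> float:
    """World-frame heading in [0, 2*pi) for a wind-relative action.

    wind_angle is the direction the wind blows toward, so upwind is wind_angle + pi.
    """
    return (wind_angle + math.pi + ACTIONS[action]) % (2 * math.pi)


def choose(q: list[list[float]], s: int, eps: float, rng: random.Random) -> int:
    """Epsilon-greedy action index for clock state s (ties go to the first action)."""
    if rng.random() < eps:
        return rng.randrange(len(NAMES))
    row = q[s]
    return max(range(len(row)), key=row.__getitem__)


def q_update(q: list[list[float]], s: int, a: int, reward: float, s_next: int,
             lr: float = 0.1, gamma: float = 0.99, terminal: bool = False) -> None:
    """One tabular Q-learning step; terminal=True when the source is reached."""
    target = reward if terminal else reward + gamma * max(q[s_next])
    q[s][a] += lr * (target - q[s][a])
```

`heading` uses the convention that `wind_angle` is the direction the wind blows toward, so the upwind heading is `wind_angle + pi`; a wind estimate such as the angle of an exponentially filtered velocity can be plugged in directly.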

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper trains a minimal RL policy with time-since-detection and an exponential wind filter inside DNS turbulence, showing gains over cast-and-surge in a mild mean wind and a regime-dependent optimal wind memory time.


The main point is that a reinforcement-learning agent using only the elapsed time since the last odor hit and a simple exponential filter on local wind direction can beat cast-and-surge in direct numerical simulations of turbulent plumes. With a mild mean wind, the policy wins across memory times; in isotropic turbulence, performance peaks at an intermediate memory time, suggesting that temporal integration of wind is a resource that should be tuned to the flow statistics.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces a reinforcement-learning framework for olfactory search in turbulent plumes, where an agent maintains a single internal state (elapsed time since last odor detection) and selects actions relative to a locally estimated wind direction passed through an exponential memory kernel. Policies are trained and evaluated inside direct numerical simulations of both mildly advected and isotropic turbulence. The central claims are that the learned policy outperforms cast-and-surge in the presence of a mean wind irrespective of memory time while adapting its locomotion to wind-estimation fidelity, and that performance in isotropic turbulence reaches a maximum at an intermediate memory time, identifying temporal wind integration as a regime-dependent resource.

Significance. If the DNS fidelity and statistical robustness of the performance comparisons are confirmed, the work supplies a compact, low-dimensional design rule for minimal robotic olfactory navigation and generates falsifiable predictions for biological search behavior. It also demonstrates the necessity of evaluating sensorimotor policies inside multi-scale turbulent fields rather than idealized plume models.

major comments (2)

[DNS parameters section] The Reynolds number, integral length scale, and scalar source size are not reported in sufficient detail to verify that the simulated velocity and odor fields reproduce the intermittency, correlation times, and meandering statistics of atmospheric boundary-layer turbulence; without this, the reported optimum memory time in the isotropic case risks being an artifact of the particular fluctuation spectrum rather than a general resource-allocation principle.
[Results on performance comparison] The claim that the learned policy outperforms cast-and-surge in mild mean wind is presented without error bars, number of independent DNS realizations, or statistical tests; because this outperformance is load-bearing for the assertion that the policy adapts to wind-estimation quality, quantitative measures of variability are required.

minor comments (2)

[Methods] The exponential memory kernel should be written explicitly as an equation (with time constant) in the methods rather than described only in prose.
[Figures] Figure captions for trajectory and performance plots should state the precise memory times shown and the number of trajectories averaged.
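One common way to write the kernel the referee asks for, given here as an assumed form rather than the paper's own equation, is a causal exponential average of the wind velocity $\mathbf{u}$ sampled along the agent's path $\mathbf{x}(t)$, with memory time $\tau_w$ (a symbol introduced here), together with its equivalent relaxation equation, the estimated direction, and its exact update over a time step $\Delta t$ for a sample held constant during the step:

```latex
\hat{\mathbf{u}}(t) = \frac{1}{\tau_w}\int_{-\infty}^{t} e^{-(t-s)/\tau_w}\,\mathbf{u}\bigl(\mathbf{x}(s),s\bigr)\,ds,
\qquad
\frac{d\hat{\mathbf{u}}}{dt} = \frac{\mathbf{u}\bigl(\mathbf{x}(t),t\bigr) - \hat{\mathbf{u}}(t)}{\tau_w},
\qquad
\hat{\theta}(t) = \operatorname{atan2}\bigl(\hat{u}_y(t),\,\hat{u}_x(t)\bigr),

\hat{\mathbf{u}}_n = e^{-\Delta t/\tau_w}\,\hat{\mathbf{u}}_{n-1} + \bigl(1 - e^{-\Delta t/\tau_w}\bigr)\,\mathbf{u}_n .
```

The two limits suggest why the optimum may be regime-dependent: as $\tau_w \to 0$ the estimate is the instantaneous, strongly fluctuating local wind; as $\tau_w \to \infty$ it approaches a long-time average, which points along the mean wind when there is one but tends toward zero, leaving the direction ill-defined, in isotropic turbulence. This is an intuition consistent with an intermediate optimum, not the paper's stated explanation.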

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We have carefully considered each comment and provide point-by-point responses below. Where revisions are warranted, we will incorporate the suggested changes in the revised version of the manuscript.


Referee: [DNS parameters section] The Reynolds number, integral length scale, and scalar source size are not reported in sufficient detail to verify that the simulated velocity and odor fields reproduce the intermittency, correlation times, and meandering statistics of atmospheric boundary-layer turbulence; without this, the reported optimum memory time in the isotropic case risks being an artifact of the particular fluctuation spectrum rather than a general resource-allocation principle.

Authors: We agree with the referee that more detailed reporting of the DNS parameters is essential for verifying the turbulence characteristics and ensuring the generality of our findings. In the revised manuscript, we will add a dedicated subsection or table in the Methods section specifying the Reynolds number (based on the integral length scale and velocity fluctuations), the integral length scale, the scalar source size, and other relevant parameters such as the grid resolution and time step. Additionally, we will include a brief discussion or references demonstrating that the simulated fields reproduce key statistics like intermittency and correlation times observed in atmospheric boundary-layer turbulence. This will allow readers to confirm that the optimum memory time reflects a general principle rather than a simulation-specific artifact. revision: yes
Referee: [Results on performance comparison] The claim that the learned policy outperforms cast-and-surge in mild mean wind is presented without error bars, number of independent DNS realizations, or statistical tests; because this outperformance is load-bearing for the assertion that the policy adapts to wind-estimation quality, quantitative measures of variability are required.

Authors: We appreciate the referee's emphasis on statistical rigor in the performance comparisons. In the revised manuscript, we will augment the Results section with error bars (e.g., standard error of the mean) on the performance metrics, explicitly state the number of independent DNS realizations used for each policy and condition, and include statistical tests (such as paired t-tests or non-parametric equivalents) to quantify the significance of the outperformance relative to cast-and-surge. These additions will provide quantitative support for the claim that the learned policy adapts its locomotion to wind-estimation fidelity. revision: yes
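The additions promised in this response can be sketched in a few lines. The example below computes the standard error of the mean and a paired sign-flip permutation test, one non-parametric alternative to a paired t-test, assuming that each policy is scored on the same set of independent flow realizations so that per-realization differences are meaningful. Function names and the permutation count are illustrative.

```python
from __future__ import annotations

import math
import random
from statistics import stdev


def sem(values: list[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def paired_sign_flip_test(a: list[float], b: list[float],
                          n_perm: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Two-sided paired permutation (sign-flip) test on per-realization differences.

    a[i] and b[i] must be scores of the two policies on the same flow realization.
    Returns (mean difference, p-value).
    """
    if len(a) != len(b):
        raise ValueError("a and b must be paired (same length)")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null, each paired difference is equally likely to have either sign.
        flipped_sum = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped_sum / n) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return sum(diffs) / n, (hits + 1) / (n_perm + 1)
```

Pairing by realization removes variability shared by both policies (for example, a realization in which the plume is unusually intermittent), which is usually what makes a paired test more sensitive than comparing two independent means.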

Circularity Check

0 steps flagged

No circularity: RL policies evaluated against external baseline in independent DNS runs


The paper trains reinforcement-learning agents with a single internal state (time since last odor detection) and a tunable exponential memory kernel for wind estimation, then evaluates the resulting policies by direct comparison to the cast-and-surge heuristic inside fresh DNS realizations of the turbulent scalar field. The memory time is varied parametrically to map performance curves rather than being optimized or fitted to the final reported optimum; the claimed peak at intermediate memory time and the outperformance of cast-and-surge therefore emerge from the simulation statistics themselves. No load-bearing step reduces to a self-definition, a fitted input renamed as prediction, or a self-citation chain; the derivation chain remains self-contained against the external baseline and the independently generated flow fields.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claim rests on the fidelity of DNS turbulence modeling and the ability of the chosen RL state-action space to capture essential navigation behavior; no new physical entities are postulated.

free parameters (2)

wind memory time constant
Exponential kernel time scale varied across simulations; performance reported to peak at an intermediate value.
RL training hyperparameters
Policy network and reward parameters chosen to produce the reported policies.

axioms (1)

domain assumption: Direct numerical simulations of the Navier-Stokes equations with scalar transport accurately reproduce the statistics of natural turbulent odor plumes.
Invoked to justify using DNS as the environment for training and evaluation.

pith-pipeline@v0.9.0 · 5755 in / 1342 out tokens · 37195 ms · 2026-05-21T03:33:55.975082+00:00 · methodology


Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 1 internal anchor

[1] R. T. Cardé and M. A. Willis, J. Chem. Ecol. 34, 854 (2008)
[2] A. M. M. Matheson, A. J. Lanz, A. M. Medina, A. M. Licata, T. A. Currier, M. H. Syed, and K. I. Nagel, Nat. Comm. 13, 4613 (2022)
[3] L. Biferale, A. Crisanti, M. Vergassola, and A. Vulpiani, Phys. Fluids 7, 2725 (1995)
[4] B. I. Shraiman and E. D. Siggia, Nature 405, 639 (2000)
[5] J. P. Crimaldi and J. R. Koseff, Exp. Fluids 31, 90 (2001)
[6] A. Celani, E. Villermaux, and M. Vergassola, Phys. Rev. X 4, 041015 (2014)
[7] H. C. Berg, Random Walks in Biology (Princeton University Press, 1993)
[8] J. Murlis, J. S. Elkinton, and R. T. Cardé, Annu. Rev. Entomol. 37, 505 (1992)
[9] R. T. Cardé, Annu. Rev. Entomol. 66, 317 (2021)
[10] F. van Breugel, R. Jewell, and J. Houle, J. R. Soc. Interface 19, 20220258 (2022)
[11] G. Reddy, V. N. Murthy, and M. Vergassola, Annu. Rev. Condens. Matter Phys. 13, 191 (2022)
[12] S. D. Stupski and F. van Breugel, Curr. Biol. 34, 4397 (2024)
[13] J. Houle, A. Lopez, and F. van Breugel, bioRxiv, doi:10.64898/2026.04.05.716000 (2026)
[14] M. P. Suver, A. M. M. Matheson, S. Sarkar, M. Damiata, D. Schoppik, and K. I. Nagel, Neuron 102, 828 (2019)
[15] M. Hutchinson, C. Liu, and W.-H. Chen, J. Field Robot. 36, 797 (2019)
[16] L. Wang and S. Pang, J. Mar. Sci. Eng. 11, doi:10.3390/jmse11020366 (2023)
[17] D. Mansfield and A. Montazeri, Front. Robot. AI 11, doi:10.3389/frobt.2024.1336612 (2024)
[18] C. Fukui, T. Uchida, S. Koizumi, Y. Murayama, H. Liu, T. Nakata, and D. Terutsuki, npj Robot. 3, 4 (2025)
[19] K. L. Baker, M. Dickinson, T. M. Findley, D. H. Gire, M. Louis, M. P. Suver, J. V. Verhagen, K. I. Nagel, and M. C. Smear, J. Neurosci. 38, 9383 (2018)
[20] L. Marques, U. Nunes, and A. T. de Almeida, Thin Solid Films 418, 51 (2002)
[21] A. Celani and E. Panizon, Olfactory search, in Target Search Problems, edited by D. Grebenkov, R. Metzler, and G. Oshanin (Springer Nature Switzerland, Cham,
[22] M. Vergassola, E. Villermaux, and B. I. Shraiman, Nature 445, 406 (2007)
[23] A. Loisy and C. Eloy, Proc. R. Soc. Lond. A 478, 20220118 (2022)
[24] A. Loisy and R. A. Heinonen, Eur. Phys. J. E 46, 17 (2023)
[25] R. A. Heinonen, L. Biferale, A. Celani, and M. Vergassola, Phys. Rev. E 107, 055105 (2023)
[26] R. A. Heinonen, L. Biferale, A. Celani, and M. Vergassola, Phys. Rev. Fluids 10, 064614 (2025)
[27] L. Piro, R. A. Heinonen, M. Carbone, L. Biferale, and M. Cencini, Phys. Rev. E 113, 044401 (2026)
[28] M. Rando, R. A. Heinonen, Y. Qi, and A. Seminara, Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery, arXiv:2605.15938 [physics.bio-ph] (2026)
[29] K. V. B. Verano, E. Panizon, and A. Celani, Proc. Natl. Acad. Sci. 120, e2304230120 (2023)
[30] E. Balkovsky and B. I. Shraiman, Proc. Natl. Acad. Sci. 99, 12589 (2002)
[31] M. Rando, M. James, A. Verri, L. Rosasco, and A. Seminara, eLife, doi:10.7554/elife.102906.2 (2025)
[32] Y. Zhao, B. Chen, X. Wang, Z. Zhu, Y. Wang, G. Cheng, R. Wang, R. Wang, M. He, and Y. Liu, Inf. Sci. 588, 67 (2022)
[33] S. H. Singh, F. van Breugel, R. P. N. Rao, and B. W. Brunton, Nat. Mach. Intell. 5, 58 (2023)
[34] R. S. Sutton and A. G. Barto, Reinforcement Learning, Second Edition: An Introduction (MIT Press, 2018)
[35] J.-B. Masson, M. B. Bechet, and M. Vergassola, J. Phys. A 42, 434009 (2009)
[36] C. Barbieri, S. Cocco, and R. Monasson, Europhys. Lett. 94, 20005 (2011)
[37] G. Boffetta and R. E. Ecke, Annu. Rev. Fluid Mech. 44, 427 (2012)
[38] U. Frisch, Turbulence: The Legacy of A. N. Kolmogorov (Cambridge University Press, 1995)
[39] K. Schneider, Comput. Fluids 34, 1223 (2005)

Acknowledgements: We thank A. Celani and A. Loisy for useful discussions. This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant Agreement Nos. 882340 and 101002724), by the Air Force Office of Scientific Research (grant FA865...
