Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
Pith reviewed 2026-05-19 17:30 UTC · model grok-4.3
The pith
A running clock since the last odor whiff lets a Q-learning agent learn surging, casting and downwind return to recover plumes in turbulence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using only a running clock since the last whiff as its state representation, tabular Q-learning produces an interpretable recovery policy that combines surging, casting, and a return downwind; this policy performs well on direct numerical simulation data of turbulent odor plumes yet remains limited by its inability to adapt to local intermittency, which additional state flexibility can mitigate.
What carries the argument
A single running clock that tracks time elapsed since the last odor whiff, serving as the complete state input to tabular Q-learning for navigation decisions.
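To make the state concrete, here is a minimal sketch of a clock-state tabular Q-learner. It assumes a discrete time step, a clock capped at a hypothetical T_MAX, and a hypothetical four-action set. The paper's reward, action set and hyperparameters are not given on this page, so every constant below is an assumption.

```python
import numpy as np

T_MAX = 50      # hypothetical cap on the clock (in time steps)
N_ACTIONS = 4   # hypothetical action set, e.g. upwind / downwind / left / right

def next_clock(clock: int, whiff: bool) -> int:
    """Reset to 0 on a whiff, otherwise advance one step, saturating at T_MAX."""
    return 0 if whiff else min(clock + 1, T_MAX)

def q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Watkins Q-learning update on a table whose rows are clock states."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def epsilon_greedy(Q, s, eps, rng):
    """Explore uniformly with probability eps, otherwise act greedily."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

# One row per clock value 0..T_MAX, one column per action.
Q = np.zeros((T_MAX + 1, N_ACTIONS))
```

In a training loop, the agent would call `next_clock` after each observation and pass the result to `q_update` as `s_next`. Capping the clock keeps the table finite. The value of the cap is the same discretization question the referee raises about §2.2.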
If this is right
- The learned policy reproduces the combination of surging, casting, and downwind return observed in insects.
- The clock-only agent achieves good performance on direct numerical simulations of turbulent flows.
- Inability to adapt to local intermittency levels limits robustness of the learned strategy.
- Providing more flexibility in the agent's state representation improves performance across different intermittency conditions.
Where Pith is reading between the lines
- Simple time-based memory may be sufficient for basic plume recovery even when full history of detections is unavailable.
- The same clock-state approach could be tested on real insect trajectories to see whether time since last whiff predicts their turns and surges.
- In environments where intermittency fluctuates rapidly, agents may need explicit intermittency estimation rather than relying on a fixed clock policy.
Load-bearing premise
A single running clock since the last whiff supplies enough state information for tabular Q-learning to converge on a robust recovery policy across varying turbulence intermittency levels.
What would settle it
Run the same Q-learning procedure on turbulence data with controlled intermittency changes and check whether the clock-only agent still recovers the plume at the reported success rate or whether the claimed improvement from added flexibility disappears.
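The check proposed above depends on comparing success rates, and the referee report asks for error bars. The following is a minimal sketch of that bookkeeping. It assumes each evaluation episode yields a success flag and, when successful, a time-to-source. The Wilson score interval is one standard choice for a binomial proportion and is used here for illustration; the paper does not specify it.

```python
import math
from statistics import mean, stdev

def success_rate_wilson(successes: int, n: int, z: float = 1.96):
    """Success fraction with a Wilson score interval (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half, centre + half

def mean_time_to_source(times):
    """Mean and standard error of time-to-source over successful episodes."""
    m = mean(times)
    se = stdev(times) / math.sqrt(len(times)) if len(times) > 1 else float("nan")
    return m, se
```

Suppose the intervals for the clock-only agent and the augmented agent do not overlap at a given intermittency level. That would be a rough first sign of a real difference, but a formal two-sample test would still be needed.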
Original abstract
Finding an odor source in a turbulent flow requires effectively leveraging the history of olfactory observations into a robust navigation strategy. In this work, we use tabular Q-learning to train an olfactory search agent with a minimal memory of past observations: only a running clock since the last whiff. This agent learns an interpretable strategy to recover the plume which combines well-known behaviors observed in insects: surging, casting, and a return downwind. While achieving good performance on data from direct numerical simulations of turbulence, the agent is limited by an inability to adapt its strategy to the local intermittency level; we show that providing more flexibility improves robustness.
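The three maneuvers named in the abstract can be caricatured as a fixed rule on the clock. The sketch below is a hypothetical hand-coded policy, not the learned Q-table. The thresholds `t_surge` and `t_cast`, the casting period, and the coordinate convention (mean wind along +x) are all assumptions chosen for illustration.

```python
def clock_policy(clock: int, t_surge: int = 5, t_cast: int = 30, cast_period: int = 4):
    """Return a unit heading (dx, dy); x runs downwind, y crosswind (assumed axes)."""
    if clock < t_surge:
        return (-1, 0)  # surge: head upwind, against the assumed mean flow
    if clock < t_cast:
        leg = (clock - t_surge) // cast_period
        return (0, 1 if leg % 2 == 0 else -1)  # cast: alternate crosswind legs
    return (1, 0)  # return downwind to try to re-intercept the plume
```

With the defaults, clock values 0 to 4 map to an upwind surge, 5 to 29 to alternating crosswind legs of four steps each, and 30 onward to a downwind return. A fixed rule like this also makes the intermittency limitation visible: the same thresholds apply whether whiffs are frequent or rare.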
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript trains a tabular Q-learning agent for odor-source localization in turbulent flows using a minimal state representation consisting solely of the time elapsed since the last odor detection (whiff). The learned policy combines surging, casting, and downwind return maneuvers that match documented insect behaviors. The agent is reported to achieve good performance when tested on direct numerical simulation (DNS) turbulence data, yet the authors explicitly note its inability to modulate behavior according to local intermittency; they demonstrate that increasing state flexibility improves robustness.
Significance. If the quantitative results hold, the work shows that a single scalar clock variable is sufficient for reinforcement learning to discover an interpretable, biologically plausible recovery strategy in realistic turbulence. The explicit acknowledgment of the intermittency-adaptation limitation, together with the improvement obtained by relaxing the state, supplies a concrete, falsifiable next step for minimal-memory olfactory navigation models. Use of DNS data rather than idealized plume models strengthens the ecological relevance of the evaluation.
major comments (2)
- [§3 and §4] §3 (Methods) and §4 (Results): the abstract and main text assert 'good performance' and 'performance gains from added flexibility' on DNS data, yet no numerical values for success rate, mean time-to-source, or comparison against baselines are supplied, nor are error bars or statistical tests reported. These omissions make it impossible to judge whether the clock-state policy is genuinely competitive or merely qualitatively plausible.
- [§2.2] §2.2 (State definition): the state is restricted to a single running clock since the last whiff. The manuscript itself states that this construction prevents adaptation to local intermittency; because the central robustness claim rests on performance across varying turbulence levels, the absence of any explicit intermittency measure in the state is load-bearing and should be quantified by comparing policies with and without an intermittency statistic.
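One way to run the comparison the referee requests is to add a coarse intermittency estimate to the clock. The sketch below assumes the intermittency statistic is an exponential moving average of the binary whiff signal with a hypothetical rate `beta`, split into a hypothetical number of equal-width bins. Neither choice comes from the manuscript.

```python
import numpy as np

N_CLOCK = 51     # hypothetical: clock states 0..50
N_INT_BINS = 4   # hypothetical: number of intermittency bins

def update_intermittency(est: float, whiff: bool, beta: float = 0.05) -> float:
    """Exponential moving average of the whiff indicator (a proxy for local intermittency)."""
    return (1 - beta) * est + beta * float(whiff)

def augmented_state(clock: int, est: float) -> int:
    """Flatten (intermittency bin, clock) into a single table row index."""
    b = min(int(est * N_INT_BINS), N_INT_BINS - 1)
    return b * N_CLOCK + clock

Q_clock = np.zeros((N_CLOCK, 4))             # clock-only table
Q_aug = np.zeros((N_CLOCK * N_INT_BINS, 4))  # clock x intermittency table
```

The augmented table is `N_INT_BINS` times larger, so each entry is visited less often and learning slows. This is the trade-off between state flexibility and sample efficiency that the comparison would need to quantify.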
minor comments (2)
- [Figure 3] Figure 3 (or equivalent policy visualization): the trajectories shown are helpful, but the caption should explicitly state the turbulence parameters (Re, Sc, source strength) used for the DNS snapshots.
- [§2.2] Notation: the symbol for the clock variable is introduced without a clear definition of its discretization bins or maximum value; this should be stated once in §2.2 and used consistently thereafter.
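The comment asks for the clock discretization to be stated once. A concrete way to do that is a fixed list of bin edges plus a saturating final bin. The edges below are hypothetical and geometrically spaced, giving fine resolution just after a whiff and coarse resolution later. The manuscript's actual bins and maximum are not reported on this page.

```python
import numpy as np

# Hypothetical bin edges (simulation time units); anything at or beyond the
# last edge falls into one saturated "long time since whiff" bin.
EDGES = np.array([0.5, 1, 2, 4, 8, 16, 32])

def clock_bin(t_since_whiff: float) -> int:
    """Bin index in 0..len(EDGES); len(EDGES) is the saturated bin."""
    return int(np.searchsorted(EDGES, t_since_whiff, side="right"))

N_CLOCK_STATES = len(EDGES) + 1  # number of rows a Q-table would need
```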
Simulated Author's Rebuttal
We thank the referee for their positive evaluation and recommendation for minor revision. We value the feedback on strengthening the quantitative aspects and the robustness analysis. We address each major comment below and outline the revisions we will make to the manuscript.
Point-by-point responses
- Referee: [§3 and §4] §3 (Methods) and §4 (Results): the abstract and main text assert 'good performance' and 'performance gains from added flexibility' on DNS data, yet no numerical values for success rate, mean time-to-source, or comparison against baselines are supplied, nor are error bars or statistical tests reported. These omissions make it impossible to judge whether the clock-state policy is genuinely competitive or merely qualitatively plausible.
  Authors: We agree that providing quantitative performance metrics is important for rigorously evaluating the agent's effectiveness. The current version of the manuscript focuses on the emergence of interpretable behaviors and the acknowledgment of limitations, but we will revise §3 and §4 to include specific numerical values for success rates, mean time-to-source, comparisons against relevant baselines, error bars, and statistical significance tests based on the DNS turbulence data. This will substantiate the claims of good performance and gains from added flexibility.
  revision: yes
- Referee: [§2.2] §2.2 (State definition): the state is restricted to a single running clock since the last whiff. The manuscript itself states that this construction prevents adaptation to local intermittency; because the central robustness claim rests on performance across varying turbulence levels, the absence of any explicit intermittency measure in the state is load-bearing and should be quantified by comparing policies with and without an intermittency statistic.
  Authors: The manuscript already highlights the limitation of the single-clock state in adapting to local intermittency and shows that increased state flexibility leads to improved robustness. To directly quantify the effect as suggested, we will add in the revised manuscript an explicit comparison between the clock-state policy and policies augmented with an intermittency statistic. This will provide a clearer assessment of how the absence of such a measure impacts performance across different turbulence levels.
  revision: yes
Circularity Check
No significant circularity in the Q-learning derivation from DNS data
Full rationale
The paper trains a tabular Q-learning agent on direct numerical simulation data of turbulent flows using only a running clock since the last whiff as state. The learned policy is reported to combine surging, casting, and downwind return behaviors observed independently in insects, with performance evaluated on the external DNS dataset. No equations or steps in the derivation reduce claimed performance or strategy to a fitted parameter or self-defined input by construction, and no load-bearing self-citation chains or ansatzes are present. The approach remains self-contained against external benchmarks from turbulence simulations and biological observations.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] Tabular Q-learning converges to an optimal policy for the defined finite MDP under sufficient exploration and learning-rate conditions.
- [domain assumption] The DNS turbulence fields accurately represent the intermittency and geometry of real odor plumes.
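The standard-math axiom refers to the Watkins-Dayan update [44] and its usual sufficient conditions, written out below for reference. This is the textbook statement, not a derivation taken from the manuscript.

```latex
Q_{t+1}(s_t,a_t) = Q_t(s_t,a_t)
  + \alpha_t(s_t,a_t)\Bigl[r_{t+1} + \gamma \max_{a'} Q_t(s_{t+1},a') - Q_t(s_t,a_t)\Bigr]

\sum_{t} \alpha_t(s,a) = \infty, \qquad
\sum_{t} \alpha_t^{2}(s,a) < \infty \quad \text{for every } (s,a),
\qquad 0 \le \gamma < 1,\ \ |r| \le R_{\max}.
```

These conditions also require every state-action pair to be updated infinitely often. The guarantee assumes the state process is Markov. When the state is a compressed summary of the observation history, as the clock is, that holds only approximately, which is presumably why the ledger phrases the axiom in terms of "the defined finite MDP".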
Forward citations
Cited by 1 Pith paper
- Smart strategies to navigate turbulent odor plumes reorienting to local wind
  Reinforcement learning policies using elapsed time since odor detection and exponentially filtered local wind direction outperform cast-and-surge in simulated turbulent plumes with mild mean wind and show optimal perf...
Reference graph
Works this paper leans on
[1] Efrén Álvarez-Salvado, Angela M Licata, Erin G Connor, Margaret K McHugh, Benjamin MN King, Nicholas Stavropoulos, Jonathan D Victor, John P Crimaldi, and Katherine I Nagel. Elementary sensory-motor transformations underlying olfactory navigation in walking fruit-flies. eLife, 7:e37815, 2018.
[2] T. C. Baker. Upwind flight and casting flight: complementary and tonic systems used for location of sex pheromone sources by male moths. Proc. 10th Intl Symposium on Olfaction and Taste, 13:18, 1990.
[3] TC Baker and Kenneth F Haynes. Manoeuvres used by flying male oriental fruit moths to relocate a sex pheromone plume in an experimentally shifted wind-field. Physiological Entomology, 12(3):263–279, 1987.
[4] Eugene Balkovsky and Boris I Shraiman. Olfactory search at high Reynolds number. Proceedings of the National Academy of Sciences, 99(20):12589–12593, 2002.
[5] W. J. Bell and E. Kramer. Search and anemotaxis in insects. J. Insect Physiol., 25:631–640, 1979.
[6] Sofia C Brandão, Marion Silies, and Carlotta Martelli. Adaptive temporal processing of odor stimuli. Cell and Tissue Research, 383(1):125–141, 2021.
[7]
[8] Ring T Cardé and Mark A Willis. Navigational strategies used by insects to find distant, wind-borne sources of odor. Journal of Chemical Ecology, 34(7):854–866, 2008.
[9] Antonio Celani, Emmanuel Villermaux, and Massimo Vergassola. Odor landscapes in turbulent environments. Physical Review X, 4(4):041015, 2014.
[10] Cassandra T David, JS Kennedy, and AR Ludlow. Finding of a sex pheromone source by gypsy moths released in the field. Nature, 303(5920):804–806, 1983.
[11] Mahmut Demir, Nirag Kadakia, Hope D Anderson, Damon A Clark, and Thierry Emonet. Walking Drosophila navigate complex plumes using stochastic decisions biased by the timing of odor encounters. eLife, 9:e57524, 2020.
[12] Robin A Heinonen, Luca Biferale, Antonio Celani, and Massimo Vergassola. Optimal policies for Bayesian olfactory search in turbulent flows. Physical Review E, 107(5):055105, 2023.
[13] Robin A Heinonen, Luca Biferale, Antonio Celani, and Massimo Vergassola. Exploring Bayesian olfactory search in realistic turbulent flows. Physical Review Fluids, 10(6):064614, 2025.
[14] Robin A Heinonen, Luca Biferale, Antonio Celani, and Massimo Vergassola. Optimal trajectories for Bayesian olfactory search in turbulent flows: The low information limit and beyond. Physical Review Fluids, 10(4):044601, 2025.
[15] Hiroshi Ishida, Hidenao Tanaka, Haruki Taniguchi, and Toyosaka Moriizumi. Mobile robot navigation using vision and olfaction to search for a gas/odor source. Autonomous Robots, 20(3):231–238, 2006.
[16] Tommi Jaakkola, Satinder Singh, and Michael Jordan. Reinforcement learning algorithm for partially observable Markov decision problems. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems, volume 7. MIT Press, 1994.
[17] Viraaj Jayaram, Aarti Sehdev, Nirag Kadakia, Ethan A Brown, and Thierry Emonet. Temporal novelty detection and multiple timescale integration drive Drosophila orientation dynamics in temporally diverse olfactory environments. PLoS Computational Biology, 19(5):e1010606, 2023.
[18] Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, 1998.
[19] Nicholas D Kathman, Aaron J Lanz, Jacob D Freed, and Katherine I Nagel. Neural dynamics for working memory and evidence integration during olfactory navigation in Drosophila. bioRxiv, pages 2024–10, 2025.
[20] LPS Kuenen and Ring T Cardé. Strategies for recontacting a lost pheromone plume: casting and upwind flight in the male gypsy moth. Physiological Entomology, 19(1):15–29, 1994.
[21] Hanna Kurniawati, David Hsu, Wee Sun Lee, et al. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems, volume 2008. Zurich, Switzerland, 2008.
[22] James C. Liao. The role of the lateral line and vision on body kinematics and hydrodynamic preference of rainbow trout in turbulent flow. Journal of Experimental Biology, 209(20):4077–4090, October 2006.
[23] Aurore Loisy and Christophe Eloy. Searching for a source without gradients: how good is infotaxis and how to beat it. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 478(2262), 2022.
[24] Aurore Loisy and Robin A Heinonen. Deep reinforcement learning for the olfactory search POMDP: a quantitative benchmark. The European Physical Journal E, 46(3):17, 2023.
[25] A Mafra-Neto and RT Cardé. Dissection of the pheromone-modulated flight of moths using single-pulse response as a template. Cellular and Molecular Life Sciences, 52(4):373–379, 1996.
[26] T. S. Okubo, P. Patella, I. D'Alessandro, and R. I. Wilson. A neural network for wind-guided compass navigation. Neuron, 107:924–940, 2020.
[27] Rich Pang, Floris Van Breugel, Michael Dickinson, Jeffrey A Riffell, and Adrienne Fairhall. History dependence in insect flight decisions during odor tracking. PLoS Computational Biology, 14(2):e1005969, 2018.
[28] Mainak Patel and Aaditya Rangan. Olfactory encoding within the insect antennal lobe: The emergence and role of higher order temporal correlations in the dynamics of antennal lobe spiking activity. Journal of Theoretical Biology, 522:110700, 2021.
[29] Lorenzo Piro, Maurizio Carbone, Luca Biferale, Massimo Cencini, Robin A. Heinonen, Marco Rando, and Agnese Seminara. Smart strategies to navigate turbulent odor plumes reorienting to local wind. preprint, 209:4077–4090, 2026.
[30] Marco Rando, Martin James, Alessandro Verri, Lorenzo Rosasco, and Agnese Seminara. Q-learning with temporal memory to navigate turbulence. eLife, 13:RP102906, 2025.
[31] Gautam Reddy, Venkatesh N Murthy, and Massimo Vergassola. Olfactory sensing and navigation in turbulent environments. Annual Review of Condensed Matter Physics, 13(1):191–213, 2022.
[32] A. M. Reynolds, D. R. Reynolds, A. D. Smith, and J. W. Chapman. Orientation cues for high-flying nocturnal insect migrants: Do turbulence-induced temperature and velocity fluctuations indicate the mean wind flow? PLoS ONE, 5:e15758, 2010.
[33] Nicola Rigolli, Gautam Reddy, Agnese Seminara, and Massimo Vergassola. Alternation emerges as a multimodal strategy for turbulent odor navigation. eLife, 11:e76989, 2022.
[34] Satpreet H Singh, Floris Van Breugel, Rajesh PN Rao, and Bingni W Brunton. Emergent behaviour and neural dynamics in artificial agents tracking odour plumes. Nature Machine Intelligence, 5(1):58–70, 2023.
[35] Shuchita Soman, Sree Subha Ramaswamy, and Sanjay P Sane. Odor tracking in insects: a multisensory behavior. Journal of Experimental Biology, 229(Suppl 1):jeb250945, 2026.
[36] Theresa J Steele, Aaron J Lanz, and Katherine I Nagel. Olfactory navigation in arthropods. Journal of Comparative Physiology A, 209(4):467–488, 2023.
[37] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction, volume 1. MIT Press Cambridge, 1998.
[38] M. P. Suver, A. M. Matheson, S. Sarkar, M. Damiata, D. Schoppik, and K. I. Nagel. Encoding of wind direction by central neurons in Drosophila. Neuron, 102:828–842, 2019.
[39] Floris Van Breugel and Michael H Dickinson. Plume-tracking behavior of flying Drosophila emerges from a set of distinct sensory-motor reflexes. Current Biology, 24(3):274–286, 2014.
[40] van Breugel F and Dickinson MH. Plume-tracking behavior of flying Drosophila emerges from a set of distinct sensory-motor reflexes. Curr Biol, 24:274, 2014.
[41] Kyrell Vann B Verano, Emanuele Panizon, and Antonio Celani. Olfactory search with finite-state controllers. Proceedings of the National Academy of Sciences, 120(34):e2304230120, 2023.
[42] Massimo Vergassola, Emmanuel Villermaux, and Boris I Shraiman. 'Infotaxis' as a strategy for searching without gradients. Nature, 445(7126):406–409, 2007.
[43] NJ Vickers and TC Baker. Latencies of behavioral response to interception of filaments of sex pheromone and clean air influence flight track shape in Heliothis virescens (F.) males. Journal of Comparative Physiology A, 178(6):831–847, 1996.
[44] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, 1992.
[45] Mark A Willis and Edmund A Arbas. Odor-modulated upwind flight of the sphinx moth, Manduca sexta L. Journal of Comparative Physiology A, 169(4):427–440, 1991.
[46] Yan S. W. Yu, Matthew M. Graff, Chris S. Bresee, Yan B. Man, and Mitra J. Z. Hartmann. Whiskers aid anemotaxis in rats. Science Advances, 2(8):e1600716, 2016.
Appendix A: Supplementary Material. FIG. 9. Generalization of single Q agents (left) and two-Qs agents (right), measured by the cumulative reward G (top row) and its projections on the normalized t...