Olfactory pursuit: catching a moving odor source in complex flows
Pith reviewed 2026-05-10 15:19 UTC · model grok-4.3
The pith
A hybrid policy that adds velocity prediction to information-seeking enables near-optimal interception of moving odor sources.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper argues that predictive inference over the target's velocity, in addition to its position, is the key ingredient for effective olfactory pursuit. Using a discrete run-and-tumble model of target motion, the authors numerically solve the Bellman equation to obtain quasi-optimal policies. They then show that their hybrid heuristic, which augments Infotaxis with a greedy pursuit term derived from the associated fully observable interception problem, achieves near-optimal interception performance across all persistence times while substantially outperforming purely exploratory baselines.
What carries the argument
The hybrid policy that merges Infotaxis' information-gain drive with a greedy value function computed from the associated fully observable Markov decision process.
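To make the combination concrete, here is a minimal Python sketch of one-step hybrid action selection. It assumes a binary hit/no-hit observation, a simplified Infotaxis-like term (the expected Shannon entropy of the belief after one observation, omitting Infotaxis' capture term), and a hypothetical weighted sum, with weight `beta`, of that term and the expected greedy time-to-capture. The paper's exact combination rule is not given here, and all function names are illustrative.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_posterior_entropy(belief, p_hit):
    """Expected entropy after one binary observation (hit / no hit).

    belief: probability over hidden target states, shape (S,)
    p_hit:  detection probability in each hidden state for this move, shape (S,)
    """
    prob_hit = float(belief @ p_hit)
    h = 0.0
    for likelihood, prob_obs in ((p_hit, prob_hit), (1.0 - p_hit, 1.0 - prob_hit)):
        if prob_obs > 0:
            posterior = belief * likelihood / prob_obs  # Bayes update for this outcome
            h += prob_obs * entropy(posterior)
    return h

def hybrid_action(belief, p_hit_per_action, greedy_value_per_action, beta):
    """Hypothetical hybrid rule: minimize expected entropy + beta * expected greedy time.

    greedy_value_per_action[a][s]: time-to-capture from the fully observable
    problem if the target were in state s and the agent took move a.
    """
    scores = [
        expected_posterior_entropy(belief, p_hit) + beta * float(belief @ v)
        for p_hit, v in zip(p_hit_per_action, greedy_value_per_action)
    ]
    return int(np.argmin(scores)), scores
```

With `beta = 0` the rule reduces to pure information seeking; increasing `beta` shifts the choice toward moves that the fully observable solution predicts will shorten the chase.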
If this is right
- Purely exploratory policies succeed near-optimally only for targets with frequent reorientations.
- The hybrid heuristic maintains near-optimal performance for every persistence time.
- It substantially outperforms purely exploratory methods when targets exhibit persistent motion.
- The policy remains robust under continuous run-and-tumble motion with moderate persistence time, model mismatch, and more accurate plume representations.
Where Pith is reading between the lines
- The same predictive-inference principle could be tested in other intermittent sensing domains such as acoustic or visual tracking with lag.
- Robotic platforms equipped with onboard odor sensors could implement the hybrid policy, allowing real-world interception times to be compared against simulation.
- Extending the joint belief to include higher-order motion statistics might further improve performance for non-run-and-tumble targets.
- The framework may offer a general template for search under delayed information in other turbulent or noisy transport processes.
Load-bearing premise
The target's motion statistics are adequately captured by a discrete run-and-tumble process with a single persistence time parameter, and joint belief over position and velocity suffices for near-optimal decisions.
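The premise can be made concrete with a minimal simulation of a discrete run-and-tumble target. This Python sketch assumes a square lattice with four unit velocities and a per-step tumble probability of `1/tau_p`, with the new velocity drawn uniformly (possibly repeating the old one); these modeling details are assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical velocity set: four unit steps on a square lattice.
VELOCITIES = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])

def run_and_tumble(n_steps, tau_p, rng, start=(0, 0)):
    """Simulate a discrete run-and-tumble target.

    At each step the target tumbles with probability 1/tau_p, drawing a new
    velocity uniformly from VELOCITIES, and then moves by its current velocity.
    Returns the positions (n_steps + 1, 2) and the velocity indices.
    """
    pos = np.array(start)
    v = rng.integers(len(VELOCITIES))
    positions, vel_idx = [pos.copy()], [v]
    for _ in range(n_steps):
        if rng.random() < 1.0 / tau_p:  # tumble event
            v = rng.integers(len(VELOCITIES))
        pos = pos + VELOCITIES[v]       # run one lattice step
        positions.append(pos.copy())
        vel_idx.append(v)
    return np.array(positions), np.array(vel_idx)
```

Because the four velocities average to zero, the velocity autocorrelation at lag k is (1 - 1/tau_p)^k, so `tau_p` plays the role of the single persistence time in the premise.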
What would settle it
A direct comparison in real turbulent plume experiments with persistently moving targets would falsify the claim if the hybrid policy showed no improvement over Infotaxis or failed to achieve near-optimal interception rates.
Original abstract
Locating and intercepting a moving target from possibly delayed, intermittent sensory signals is a paradigmatic problem in decision-making under uncertainty, and a fundamental challenge for, e.g., animals seeking prey or mates and autonomous robotic systems. Odor signals are intermittent, strongly mixed by turbulent-like transport, and typically lag behind the true target position, thereby complicating localization. Here, we formulate olfactory pursuit as a partially observable Markov decision process in which an agent maintains a joint belief over the target's position and velocity. Using a discrete run-and-tumble model, we compute quasi-optimal policies by numerically solving the Bellman equation and benchmark them against well-established information-theoretic strategies such as Infotaxis. We show that purely exploratory policies are near-optimal when the target frequently reorients, but fail dramatically when the target exhibits persistent motion. We thus introduce a computationally efficient hybrid policy that combines the information-gain drive of Infotaxis with a "greedy" value function derived from an associated fully observable control problem. Our heuristic achieves near-optimal performance across all persistence times and substantially outperforms purely exploratory approaches. Moreover, our proposal demonstrates strong robustness even in more complex search scenarios, including continuous run-and-tumble prey motion with moderate persistence time, model mismatch, and more accurate plume dynamics representation. Our results identify predictive inference of target motion as the key ingredient for effective olfactory pursuit and provide a general framework for search in information-poor, dynamically evolving environments.
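The joint belief over position and velocity can be maintained with a standard predict-and-correct Bayes filter. This Python sketch assumes a 1D periodic lattice, two velocities (±1), a per-step tumble probability `1/tau_p` with the new velocity drawn uniformly, and a hypothetical exponential detection likelihood with parameters `p0` and `length`. It ignores the odor lag and turbulent intermittency that the paper's plume model captures.

```python
import numpy as np

VELS = np.array([-1, 1])  # hypothetical 1D lattice velocities

def transition_matrix(tau_p):
    """T[i, j]: probability of switching from VELS[i] to VELS[j] in one step."""
    k = len(VELS)
    return (1 - 1 / tau_p) * np.eye(k) + (1 / tau_p) * np.full((k, k), 1 / k)

def predict(belief, tau_p):
    """Prediction step. belief[x, i] = P(target at cell x with velocity VELS[i])."""
    T = transition_matrix(tau_p)
    new = np.zeros_like(belief)
    for j, v_new in enumerate(VELS):
        mixed = belief @ T[:, j]           # mass that ends up with velocity j, per cell
        new[:, j] = np.roll(mixed, v_new)  # then moves by the new velocity (periodic)
    return new

def detection_prob(L, agent_x, p0=0.5, length=2.0):
    """Hypothetical hit probability decaying with ring distance to the agent."""
    x = np.arange(L)
    d = np.minimum(np.abs(x - agent_x), L - np.abs(x - agent_x))
    return p0 * np.exp(-d / length)

def correct(belief, agent_x, hit):
    """Bayes correction with a binary observation (hit / no hit)."""
    p = detection_prob(belief.shape[0], agent_x)[:, None]
    post = belief * (p if hit else 1 - p)
    return post / post.sum()
```

The velocity marginal `belief.sum(axis=0)` is what lets an agent extrapolate where a persistent target is heading, rather than only where it was last detected.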
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript formulates olfactory pursuit of a moving target as a POMDP in which the agent maintains a joint belief over the target's position and velocity under a discrete run-and-tumble motion model. Quasi-optimal policies are obtained by numerically solving the Bellman equation and are benchmarked against Infotaxis. A computationally efficient hybrid policy is introduced that augments Infotaxis with a greedy value function derived from the associated fully observable control problem. The hybrid policy is reported to achieve near-optimal performance across persistence times and to remain effective under continuous run-and-tumble motion, model mismatch, and more accurate plume representations.
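For reference, a generic form of the belief-space Bellman equation behind such quasi-optimal policies is sketched below. It assumes a unit cost per time step (so the value is the expected time to capture), no discounting, a binary observation o ∈ {0, 1}, a hidden target state s = (x, v), and a run-and-tumble transition kernel T, with V* = 0 once the target is caught. The paper's exact cost, discounting, and terminal conditions may differ.

```latex
\begin{aligned}
V^{*}(b) &= \min_{a \in \mathcal{A}} \Big[\, 1 + \sum_{o \in \{0,1\}} p(o \mid b, a)\, V^{*}\big(b^{a,o}\big) \Big], \\
b^{a,o}(s') &= \frac{p(o \mid s', a) \sum_{s} T(s' \mid s)\, b(s)}{p(o \mid b, a)}, \\
p(o \mid b, a) &= \sum_{s'} p(o \mid s', a) \sum_{s} T(s' \mid s)\, b(s).
\end{aligned}
```

In the fully observable counterpart the belief b is replaced by the true state s and the sum over observations disappears, which is what makes that problem cheap enough to supply a greedy value function.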
Significance. If the performance and robustness claims hold, the work supplies a concrete, predictive-inference-based framework for search under intermittent, delayed observations in turbulent flows. The explicit construction of the hybrid policy from an independently solved fully observable problem, the systematic benchmarking against Infotaxis, and the reported robustness tests constitute clear strengths. The results would be of interest to both robotics and behavioral ecology communities.
major comments (1)
- [§4 and abstract] The central robustness claim (abstract and §4) that the hybrid policy remains effective under 'continuous run-and-tumble prey motion with moderate persistence time, model mismatch, and more accurate plume dynamics' rests on tests that remain inside the run-and-tumble family. Because the POMDP belief update and the value function are derived from the discrete Markovian velocity model, it is unclear whether the performance advantage survives qualitatively different velocity statistics (e.g., continuous correlated noise or multi-scale persistence). A direct comparison against an Ornstein-Uhlenbeck or other non-run-and-tumble process would be required to substantiate the generality statement.
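As a starting point for the test the referee requests, an Ornstein-Uhlenbeck target velocity can be generated with the exact one-step update v_{t+1} = a v_t + σ sqrt(1 - a²) ξ_t, where a = exp(-Δt/τ) and ξ_t is standard Gaussian noise. The Python sketch below is illustrative only; `tau`, `sigma`, and `dt` are hypothetical parameters, and target positions would follow from integrating the velocity, e.g. `np.cumsum(v * dt, axis=0)`.

```python
import numpy as np

def ornstein_uhlenbeck(n_steps, tau, sigma, dt, rng, dim=2):
    """Exact-discretization OU velocity with stationary std sigma and correlation time tau."""
    a = np.exp(-dt / tau)                # one-step velocity memory
    kick = sigma * np.sqrt(1.0 - a**2)   # keeps the stationary std equal to sigma
    xi = rng.standard_normal((n_steps, dim))
    v = np.empty((n_steps, dim))
    v[0] = sigma * xi[0]                 # start from the stationary distribution
    for t in range(1, n_steps):
        v[t] = a * v[t - 1] + kick * xi[t]
    return v

def autocorr(v, lag):
    """Normalized velocity autocorrelation <v(t).v(t+lag)> / <|v|^2>."""
    return float((v[:-lag] * v[lag:]).sum(axis=1).mean() / (v * v).sum(axis=1).mean())
```

With a = 1 - 1/τp this process has the same exponential two-point velocity autocorrelation as a discrete run-and-tumble target whose tumbles pick a zero-mean random direction, so a discriminating test has to probe properties beyond the autocorrelation, such as Gaussian, continuously varying velocities.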
minor comments (2)
- [§3] The discretization parameters and convergence criteria used for the numerical Bellman solution are not stated explicitly; adding a short paragraph or table with grid sizes, time steps, and tolerance values would improve reproducibility.
- [§2.2 and §3.1] Notation for the joint belief (position-velocity) and the information-gain term in the hybrid policy should be unified between the main text and the supplementary material.
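The first minor comment asks for grid sizes, time steps, and tolerances. To illustrate what such a specification pins down, here is a Python sketch of value iteration with an explicit stopping tolerance `tol` for a toy fully observable interception problem of the kind from which a greedy value function can be derived. The 1D ring of `L` cells, the move sets, the capture rule (capture when the agent steps onto the target), and the tumble kernel are all hypothetical.

```python
import numpy as np

VELS = np.array([-1, 1])        # hypothetical target velocities on a 1D ring
ACTIONS = np.array([-1, 0, 1])  # hypothetical agent moves

def greedy_value(L, tau_p, tol=1e-9, max_iter=100_000):
    """Value iteration for expected steps to capture, fully observable case.

    State: relative offset r = target - agent (mod L) and target velocity index.
    Each step: the agent moves (capture if it lands on the target), then the
    target tumbles with probability 1/tau_p and moves by its new velocity.
    """
    k = len(VELS)
    T = (1 - 1 / tau_p) * np.eye(k) + (1 / tau_p) / k  # tumble kernel T[i, j]
    V = np.zeros((L, k))
    r = np.arange(L)
    for it in range(max_iter):
        q = np.empty((len(ACTIONS), L, k))
        for ia, a in enumerate(ACTIONS):
            r_after = (r - a) % L  # offset after the agent's move
            cont = np.zeros((L, k))
            for j, v_new in enumerate(VELS):
                # cont[r, i] += T[i, j] * V[offset after target move, j]
                cont += T[:, j][None, :] * V[(r_after + v_new) % L, j][:, None]
            cont[r_after == 0, :] = 0.0  # captured: no further cost
            q[ia] = 1.0 + cont
        V_new = q.min(axis=0)
        delta = np.abs(V_new - V).max()
        V = V_new
        if delta < tol:  # explicit convergence criterion
            return V, it + 1
    raise RuntimeError("value iteration did not converge")
```

Reporting `L`, `tau_p`, `tol`, and the returned iteration count is the kind of information the comment asks for.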
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The major comment correctly identifies a limitation in the scope of our robustness claims. We address it by clarifying the tested scenarios and revising the manuscript to avoid overstating generality.
Point-by-point responses
Referee: [§4 and abstract] The central robustness claim (abstract and §4) that the hybrid policy remains effective under 'continuous run-and-tumble prey motion with moderate persistence time, model mismatch, and more accurate plume dynamics' rests on tests that remain inside the run-and-tumble family. Because the POMDP belief update and the value function are derived from the discrete Markovian velocity model, it is unclear whether the performance advantage survives qualitatively different velocity statistics (e.g., continuous correlated noise or multi-scale persistence). A direct comparison against an Ornstein-Uhlenbeck or other non-run-and-tumble process would be required to substantiate the generality statement.
Authors: We agree that the reported tests, including continuous run-and-tumble motion, model mismatch, and altered plume representations, all operate within the run-and-tumble family. The continuous variant replaces the discrete-time Markov chain with a continuous-time process using exponential holding times, producing a different velocity autocorrelation structure while preserving the same state space. Model-mismatch experiments evaluate the hybrid policy built on the discrete model in continuous-motion environments and with modified plume models, showing retained performance gains. However, these tests do not include qualitatively distinct processes such as Ornstein-Uhlenbeck dynamics or multi-scale persistence. We will revise the abstract and §4 to state explicitly that robustness is demonstrated for run-and-tumble variants and intra-family mismatch, and we will add a short discussion paragraph noting this limitation and suggesting extensions to other stochastic velocity models. This addresses the concern by narrowing the claim without requiring new simulations. revision: yes
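The continuous variant described in the response can be sketched as a run-and-tumble target whose run durations are exponentially distributed with mean `tau_p`. This Python sketch assumes 2D free space, a constant `speed`, and new headings drawn uniformly from four lattice directions; these choices are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

# Hypothetical heading set: the four unit lattice directions.
DIRECTIONS = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

def continuous_run_and_tumble(t_max, tau_p, speed, rng):
    """Continuous-time run-and-tumble with exponential run durations (mean tau_p).

    Returns tumble times, positions at those times, and run durations;
    the final run is truncated at t_max.
    """
    t, pos = 0.0, np.zeros(2)
    times, points, runs = [0.0], [pos.copy()], []
    while t < t_max:
        duration = min(rng.exponential(tau_p), t_max - t)     # exponential holding time
        heading = DIRECTIONS[rng.integers(len(DIRECTIONS))]  # tumble to a new heading
        pos = pos + speed * duration * heading
        t = min(t + duration, t_max)
        times.append(t)
        points.append(pos.copy())
        runs.append(duration)
    return np.array(times), np.array(points), np.array(runs)
```

Sampling `points` at the tumble times is enough to reconstruct the piecewise-linear trajectory, since the target moves in a straight line between tumbles.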
Circularity Check
No circularity: standard POMDP Bellman solution and hybrid policy from independent fully-observable problem
Full rationale
The paper defines olfactory pursuit as a POMDP with a joint belief over position and velocity under a discrete run-and-tumble model, computes policies by numerically solving the Bellman equation, and constructs the hybrid policy by combining the Infotaxis information gain with a greedy value function taken from a separately defined fully observable control problem. None of these steps reduces by construction to the inputs, renames fitted quantities as predictions, or rests on load-bearing self-citations. Performance claims rest on explicit numerical benchmarks and simulations against an external baseline (Infotaxis), which is independent of the derivation. The model assumptions are stated explicitly and tested via simulation variants, but this is a question of modeling fidelity rather than circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: target motion follows a discrete run-and-tumble process characterized by a persistence-time parameter.
Draft manuscript: Carbone et al., PNAS, April 16, 2026, vol. XXX, no. XX.
Supplementary material for "Olfactory pursuit: catching a moving odor source in complex flows", by Maurizio Carbone, Lorenzo Piro, Robin A. Heinonen, Luca Biferale, Massimo Cencini...
Supplementary movie captions (excerpt): The grayscale colormap represents the agent's belief over the target state. The inset panel shows the belief over the target velocity together with the true target velocity (blue arrow). Movies S3-S4: comparison between the best heuristic and Infotaxis at τp = 2 and 25. The setup is the same as for Movies S1-S2. The search episode resulting from the best heur...