
arxiv: 2604.13121 · v1 · submitted 2026-04-13 · 💻 cs.RO

Olfactory pursuit: catching a moving odor source in complex flows

Pith reviewed 2026-05-10 15:19 UTC · model grok-4.3

classification 💻 cs.RO
keywords olfactory pursuit · partially observable Markov decision process · Infotaxis · run-and-tumble model · target interception · robotic search · information gain · predictive inference

The pith

A hybrid policy that adds velocity prediction to information-seeking enables near-optimal interception of moving odor sources.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Locating a moving target from delayed, intermittent odor signals mixed by turbulence is a core challenge for animals and robots. The paper formulates the problem as a partially observable Markov decision process and shows that purely exploratory strategies like Infotaxis perform well only when targets reorient frequently. For persistent motion, these strategies fail, but a computationally efficient hybrid policy that combines information gain with a greedy value function from the fully observable case achieves near-optimal performance. The approach remains effective under model mismatch, continuous motion, and more detailed plume models.
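The belief update at the heart of such a formulation can be illustrated with a minimal sketch. It assumes a generic Bayes rule over target states and a binary detect/no-detect observation; the function `update_belief` and the likelihood callback `detection_prob` are hypothetical names, and the paper's actual plume likelihood (including signal delay) is not reproduced here.

```python
def update_belief(belief, detected, detection_prob):
    """Bayes update of a belief over target states after one odor observation.

    belief: dict mapping a target state to its prior probability.
    detected: True if the agent registered an odor hit at this step.
    detection_prob: hypothetical callback giving P(hit | state); the real
        plume model would go here.
    """
    posterior = {}
    for state, prior in belief.items():
        p_hit = detection_prob(state)
        likelihood = p_hit if detected else 1.0 - p_hit
        posterior[state] = prior * likelihood
    norm = sum(posterior.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return {state: p / norm for state, p in posterior.items()}
```

In a pursuit setting the state would be a (position, velocity) pair rather than a single label, so that each observation sharpens the estimate of where the target is heading as well as where it is.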

Core claim

The paper establishes that predictive inference over the target's velocity, in addition to position, is the key ingredient for effective olfactory pursuit. Using a discrete run-and-tumble motion model, the authors solve the Bellman equation for quasi-optimal policies and demonstrate that their hybrid heuristic, which augments Infotaxis with a greedy pursuit term derived from the fully observable interception problem, delivers near-optimal interception rates across all persistence times while substantially outperforming purely exploratory baselines.
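A discrete run-and-tumble target can be sketched in a few lines. The version below is an assumption-laden illustration, not the authors' code: it assumes four lattice directions, a tumble probability of 1/tau_p per step, and a uniform redraw of direction at each tumble. The names `step_target` and `mean_run_length` are hypothetical.

```python
import random

# The four lattice directions the target may run along (an assumption of this sketch).
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def step_target(pos, vel, tau_p, size, rng):
    """Advance a run-and-tumble target by one time step on a periodic lattice.

    With probability 1/tau_p the target tumbles and draws a new direction
    uniformly at random (possibly the same one); otherwise it keeps running.
    """
    if rng.random() < 1.0 / tau_p:
        vel = rng.choice(DIRECTIONS)
    x, y = pos
    return ((x + vel[0]) % size, (y + vel[1]) % size), vel

def mean_run_length(tau_p, n_steps=200_000, seed=0):
    """Estimate the average number of steps between tumble events."""
    rng = random.Random(seed)
    tumbles = sum(rng.random() < 1.0 / tau_p for _ in range(n_steps))
    return n_steps / max(tumbles, 1)
```

Under these assumptions the mean number of steps between tumbles equals tau_p, which is what makes a single persistence time sufficient to parameterize the motion.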

What carries the argument

The hybrid policy that merges Infotaxis' information-gain drive with a greedy value function computed from the associated fully observable Markov decision process.
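One plausible way to combine the two drives is a weighted score per action, sketched below. The exact functional form used in the paper is not given on this page, so the linear mix and the names `hybrid_action`, `expected_entropy`, and `greedy_cost` are assumptions. The only detail taken from the source is that a weight w = 0 recovers the infotactic policy.

```python
def hybrid_action(actions, expected_entropy, greedy_cost, w):
    """Pick the action with the lowest hybrid score.

    expected_entropy[a]: expected posterior entropy after taking action a
        (minimizing it is equivalent to maximizing expected information gain).
    greedy_cost[a]: belief-averaged cost-to-go of action a taken from the
        fully observable problem (hypothetical input of this sketch).
    w: weight in [0, 1]; w = 0 reduces to pure Infotaxis.
    """
    def score(action):
        return (1.0 - w) * expected_entropy[action] + w * greedy_cost[action]
    return min(actions, key=score)
```

Because entropy and cost-to-go carry different units, the weight would in practice need tuning, for example per persistence time.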

If this is right

  • Purely exploratory policies succeed near-optimally only for targets with frequent reorientations.
  • The hybrid heuristic maintains near-optimal performance for every persistence time.
  • It substantially outperforms purely exploratory methods when targets exhibit persistent motion.
  • The policy remains robust under continuous run-and-tumble motion, model mismatch, and more accurate plume representations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same predictive-inference principle could be tested in other intermittent sensing domains such as acoustic or visual tracking with lag.
  • Robotic platforms could implement the hybrid policy with onboard plume sensors to measure real interception times versus simulation.
  • Extending the joint belief to include higher-order motion statistics might further improve performance for non-run-and-tumble targets.
  • The framework suggests a general template for search under delayed information in any turbulent or noisy transport process.

Load-bearing premise

The target's motion statistics are adequately captured by a discrete run-and-tumble process with a single persistence time parameter, and joint belief over position and velocity suffices for near-optimal decisions.

What would settle it

A direct comparison in real turbulent plume experiments with persistently moving targets would falsify the claim if the hybrid policy shows no improvement over Infotaxis or fails to achieve near-optimal interception rates.

Figures

Figures reproduced from arXiv: 2604.13121 by Antonio Celani, Lorenzo Piro, Luca Biferale, Massimo Cencini, Maurizio Carbone, Robin A. Heinonen.

Figure 1: Sketch of typical olfactory pursuit. The target source (blue circle) performs a run-and-tumble motion and emits a blue chemical plume along its trajectory. The agent (red dashed trajectory) pursues the moving target by inferring its position and velocity from discrete odor detections (indicated by the red triangles). The agent maintains a probability map over the target position X_s and velocity U_s, namely …
Figure 2: Results from the discrete run-and-tumble model, with both agent and target moving on a periodic lattice. Panel A shows the average search time ⟨T⟩ as a function of the target persistence time τ_p, averaged over 10^4 search episodes starting from a detection: the black curve indicates the quasi-optimal strategy obtained from solving the POMDP; blue curve the infotactic policy (w = 0); gray curve refers to ran…
Figure 3: Results from the continuous run-and-tumble model, with the target performing a continuous run-and-tumble on a periodic square and the agent moving on a periodic lattice. Panel A shows the average search time ⟨T⟩ as a function of the target persistence time, averaged over 10^4 search episodes starting from a detection: circles indicate the hybrid heuristic policy 3 for the best weight w, color-coded as in th…
Original abstract

Locating and intercepting a moving target from possibly delayed, intermittent sensory signals is a paradigmatic problem in decision-making under uncertainty, and a fundamental challenge for, e.g., animals seeking prey or mates and autonomous robotic systems. Odor signals are intermittent, strongly mixed by turbulent-like transport, and typically lag behind the true target position, thereby complicating localization. Here, we formulate olfactory pursuit as a partially observable Markov decision process in which an agent maintains a joint belief over the target's position and velocity. Using a discrete run-and-tumble model, we compute quasi-optimal policies by numerically solving the Bellman equation and benchmark them against well-established information-theoretic strategies such as Infotaxis. We show that purely exploratory policies are near-optimal when the target frequently reorients, but fail dramatically when the target exhibits persistent motion. We thus introduce a computationally efficient hybrid policy that combines the information-gain drive of Infotaxis with a "greedy" value function derived from an associated fully observable control problem. Our heuristic achieves near-optimal performance across all persistence times and substantially outperforms purely exploratory approaches. Moreover, our proposal demonstrates strong robustness even in more complex search scenarios, including continuous run-and-tumble prey motion with moderate persistence time, model mismatch, and more accurate plume dynamics representation. Our results identify predictive inference of target motion as the key ingredient for effective olfactory pursuit and provide a general framework for search in information-poor, dynamically evolving environments.
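The "greedy" value function from a fully observable problem can be illustrated with plain value iteration. The sketch below is a toy reduction, not the paper's setup: it assumes a one-dimensional periodic ring instead of a two-dimensional lattice, a unit cost per step, capture when agent and target share a site after a step, and a tumble that redraws the direction uniformly from two options. The function name `greedy_value` and the tolerance are hypothetical.

```python
import math

def greedy_value(n_sites, p_tumble, tol=1e-10, max_iter=100_000):
    """Expected time to capture when the target state is fully observed.

    State (d, s): d is the target position relative to the agent on a ring
    of n_sites sites, s in {-1, +1} is the target's current direction.
    Each step the agent moves by a in {-1, 0, +1}; the target tumbles with
    probability p_tumble (uniform redraw of direction), then moves one site.
    """
    states = [(d, s) for d in range(n_sites) for s in (-1, 1)]
    value = {state: 0.0 for state in states}
    for _ in range(max_iter):
        new_value = {}
        for d, s in states:
            if d == 0:
                new_value[(d, s)] = 0.0  # target already caught
                continue
            best = math.inf
            for a in (-1, 0, 1):
                cost = 1.0  # one time step elapses
                for s_next, prob in ((s, 1.0 - p_tumble / 2.0), (-s, p_tumble / 2.0)):
                    d_next = (d + s_next - a) % n_sites
                    cost += prob * value[(d_next, s_next)]
                best = min(best, cost)
            new_value[(d, s)] = best
        change = max(abs(new_value[k] - value[k]) for k in states)
        value = new_value
        if change < tol:  # stop once a sweep no longer changes the values
            break
    return value
```

In a hybrid scheme of the kind described here, such a table would be averaged over the agent's belief to score each action; how the authors perform that averaging is not stated on this page.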

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript formulates olfactory pursuit of a moving target as a POMDP in which the agent maintains a joint belief over the target's position and velocity under a discrete run-and-tumble motion model. Quasi-optimal policies are obtained by numerically solving the Bellman equation and are benchmarked against Infotaxis. A computationally efficient hybrid policy is introduced that augments Infotaxis with a greedy value function derived from the associated fully observable control problem. The hybrid policy is reported to achieve near-optimal performance across persistence times and to remain effective under continuous run-and-tumble motion, model mismatch, and more accurate plume representations.

Significance. If the performance and robustness claims hold, the work supplies a concrete, predictive-inference-based framework for search under intermittent, delayed observations in turbulent flows. The explicit construction of the hybrid policy from an independently solved fully observable problem, the systematic benchmarking against Infotaxis, and the reported robustness tests constitute clear strengths. The results would be of interest to both robotics and behavioral ecology communities.

major comments (1)
  1. [§4 and abstract] The central robustness claim (abstract and §4) that the hybrid policy remains effective under 'continuous run-and-tumble prey motion with moderate persistence time, model mismatch, and more accurate plume dynamics' rests on tests that remain inside the run-and-tumble family. Because the POMDP belief update and the value function are derived from the discrete Markovian velocity model, it is unclear whether the performance advantage survives qualitatively different velocity statistics (e.g., continuous correlated noise or multi-scale persistence). A direct comparison against an Ornstein-Uhlenbeck or other non-run-and-tumble process would be required to substantiate the generality statement.
minor comments (2)
  1. [§3] The discretization parameters and convergence criteria used for the numerical Bellman solution are not stated explicitly; adding a short paragraph or table with grid sizes, time steps, and tolerance values would improve reproducibility.
  2. [§2.2 and §3.1] Notation for the joint belief (position-velocity) and the information-gain term in the hybrid policy should be unified between the main text and the supplementary material.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review. The comment correctly identifies a limitation in the scope of our robustness claims. We address it by clarifying the tested scenarios and revising the manuscript to avoid overstatement of generality.

Point-by-point responses
  1. Referee: [§4 and abstract] The central robustness claim (abstract and §4) that the hybrid policy remains effective under 'continuous run-and-tumble prey motion with moderate persistence time, model mismatch, and more accurate plume dynamics' rests on tests that remain inside the run-and-tumble family. Because the POMDP belief update and the value function are derived from the discrete Markovian velocity model, it is unclear whether the performance advantage survives qualitatively different velocity statistics (e.g., continuous correlated noise or multi-scale persistence). A direct comparison against an Ornstein-Uhlenbeck or other non-run-and-tumble process would be required to substantiate the generality statement.

    Authors: We agree that the reported tests, including continuous run-and-tumble motion, model mismatch, and altered plume representations, all operate within the run-and-tumble family. The continuous variant replaces the discrete-time Markov chain with a continuous-time process using exponential holding times, producing a different velocity autocorrelation structure while preserving the same state space. Model-mismatch experiments evaluate the discrete-trained hybrid policy on continuous-motion environments and modified plume models, showing retained performance gains. However, these do not include qualitatively distinct processes such as Ornstein-Uhlenbeck dynamics or multi-scale persistence. We will revise the abstract and §4 to state explicitly that robustness is demonstrated for run-and-tumble variants and intra-family mismatch, and we will add a short discussion paragraph noting the limitation and suggesting extensions to other stochastic velocity models. This addresses the concern by narrowing the claim without new simulations.

    Revision: yes
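To make the "exponential holding times" remark concrete, the following sketch estimates the velocity autocorrelation of a continuous-time run-and-tumble process by Monte Carlo. It assumes unit speed and a heading redrawn uniformly on the circle at each tumble, which is an assumption of this illustration rather than a detail taken from the paper; the function name `velocity_correlation` is hypothetical.

```python
import math
import random

def velocity_correlation(lag, tau_p, n_samples=40_000, seed=0):
    """Monte Carlo estimate of <u(0) . u(lag)> for unit-speed run-and-tumble.

    Run durations are exponential with mean tau_p; each tumble redraws the
    heading uniformly in [0, 2*pi) (assumed isotropic reorientation).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta0 = rng.uniform(0.0, 2.0 * math.pi)
        theta = theta0
        t = rng.expovariate(1.0 / tau_p)  # time of the first tumble
        while t < lag:
            theta = rng.uniform(0.0, 2.0 * math.pi)  # tumble: redraw heading
            t += rng.expovariate(1.0 / tau_p)
        total += math.cos(theta - theta0)
    return total / n_samples
```

Under these assumptions the correlation decays as exp(-lag / tau_p), since the velocity stays correlated only while no tumble has occurred. An Ornstein-Uhlenbeck velocity process has the same exponential autocorrelation but Gaussian rather than fixed-speed statistics, which is one reason it would be a qualitatively different test.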

Circularity Check

0 steps flagged

No circularity: standard POMDP Bellman solution and hybrid policy from independent fully-observable problem

Full rationale

The paper defines olfactory pursuit as a POMDP with joint belief over position and velocity under a discrete run-and-tumble model, computes policies by numerically solving the Bellman equation, and constructs the hybrid policy by combining Infotaxis information gain with a greedy value function taken from a separately defined fully observable control problem. None of these steps reduce by construction to the inputs, rename fitted quantities as predictions, or rest on load-bearing self-citations. Performance claims rest on explicit numerical benchmarks and simulations against external baselines (Infotaxis), which are independent of the derivation. The model assumptions are stated explicitly and tested via simulation variants, but this is a question of modeling fidelity rather than circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard POMDP assumptions and a domain-specific motion model; no new physical entities are postulated.

axioms (1)
  • domain assumption: Target motion follows a discrete run-and-tumble process characterized by a persistence time parameter.
    Invoked to define the state transition probabilities in the POMDP and to generate the benchmark trajectories.
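How such an assumption enters the transition probabilities can be sketched as a prediction step for a joint belief over position and direction. This is a minimal illustration assuming four lattice directions, a tumble probability of 1/tau_p per step, and a uniform redraw at each tumble; `predict_belief` is a hypothetical name and the sketch omits the observation update.

```python
DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def predict_belief(belief, tau_p, size):
    """One prediction step of a joint belief b[(x, y, k)] on a periodic lattice.

    k indexes the target's direction in DIRECTIONS. Assumed transition:
    tumble with probability 1/tau_p to a uniformly drawn direction, then
    move one lattice site along the (possibly new) direction.
    """
    p_tumble = 1.0 / tau_p
    n_dir = len(DIRECTIONS)
    new_belief = {}
    for (x, y, k), p in belief.items():
        for k_new, (dx, dy) in enumerate(DIRECTIONS):
            # Probability of ending in direction k_new when starting from k.
            p_dir = p_tumble / n_dir + ((1.0 - p_tumble) if k_new == k else 0.0)
            key = ((x + dx) % size, (y + dy) % size, k_new)
            new_belief[key] = new_belief.get(key, 0.0) + p * p_dir
    return new_belief
```

For large tau_p almost all probability mass follows the current direction, which is why tracking velocity in the belief matters most for persistent targets.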


