Olfactory pursuit: catching a moving odor source in complex flows

Antonio Celani; Lorenzo Piro; Luca Biferale; Massimo Cencini; Maurizio Carbone; Robin A. Heinonen

arXiv: 2604.13121 · v1 · submitted 2026-04-13 · 💻 cs.RO

Maurizio Carbone, Lorenzo Piro, Robin A. Heinonen, Luca Biferale, Massimo Cencini, Antonio Celani

Pith reviewed 2026-05-10 15:19 UTC · model grok-4.3

classification 💻 cs.RO

keywords olfactory pursuit, partially observable Markov decision process, Infotaxis, run-and-tumble model, target interception, robotic search, information gain, predictive inference

The pith

A hybrid policy that adds velocity prediction to information-seeking enables near-optimal interception of moving odor sources.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Locating a moving target from delayed, intermittent odor signals mixed by turbulence is a core challenge for animals and robots. The paper formulates the problem as a partially observable Markov decision process and shows that purely exploratory strategies like Infotaxis perform well only when targets reorient frequently. For persistent motion, these strategies fail, but a computationally efficient hybrid policy that combines information gain with a greedy value function from the fully observable case achieves near-optimal performance. The approach remains effective under model mismatch, continuous motion, and more detailed plume models.

Core claim

The paper establishes that predictive inference over the target's velocity, in addition to position, is the key ingredient for effective olfactory pursuit. Using a discrete run-and-tumble motion model, the authors solve the Bellman equation for quasi-optimal policies and demonstrate that their hybrid heuristic, which augments Infotaxis with a greedy pursuit term derived from the fully observable interception problem, delivers near-optimal interception rates across all persistence times while substantially outperforming purely exploratory baselines.
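
The "greedy" ingredient comes from the fully observable version of the pursuit, in which the agent sees the target's position and velocity and the Bellman equation can be solved by plain value iteration. Below is a minimal sketch built on assumptions that are ours, not the paper's: a hypothetical 1D periodic lattice of N sites, a target with velocity ±1 that tumbles (redraws its velocity uniformly) with probability p_tumble per step, an agent that moves up to 2 sites per step, unit cost per step, and capture when agent and target land on the same site. Because the ring is translation invariant, the relative coordinate d = (target - agent) mod N is sufficient. The output V[d, iu] is the expected number of steps to capture. The name solve_fully_observable is illustrative.

```python
import numpy as np

VELOCITIES = (-1, +1)  # hypothetical target lattice velocities


def solve_fully_observable(N=21, p_tumble=0.1, actions=(-2, -1, 0, 1, 2),
                           tol=1e-9, max_iter=100_000):
    """Expected capture time V[d, iu] for d = (target - agent) mod N and velocity VELOCITIES[iu].

    Value iteration on the Bellman equation
        V(d, u) = 1 + min_a sum_{u'} P(u'|u) V((d + u' - a) mod N, u'),   V(0, u) = 0,
    where the target first redraws its velocity, then target and agent move
    simultaneously; capture is checked only at lattice coincidence after the move.
    """
    K = len(VELOCITIES)
    # Tumble kernel: keep the velocity w.p. 1 - p_tumble, otherwise redraw it uniformly.
    P = (1.0 - p_tumble) * np.eye(K) + p_tumble / K
    V = np.zeros((N, K))
    for _ in range(max_iter):
        V_new = np.zeros_like(V)  # row d = 0 stays 0: the target is already caught
        for d in range(1, N):
            for iu in range(K):
                V_new[d, iu] = 1.0 + min(
                    sum(P[iu, ju] * V[(d + u_next - a) % N, ju]
                        for ju, u_next in enumerate(VELOCITIES))
                    for a in actions
                )
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("value iteration did not converge")
```

For example, `solve_fully_observable(N=21, p_tumble=0.0)[1, 1]` equals 1.0: a target one site ahead and moving away is caught in one step by a 2-site move. Making the agent faster than the target is a design choice that keeps every state capturable, so value iteration started from zero converges.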

What carries the argument

The hybrid policy that merges Infotaxis' information-gain drive with a greedy value function computed from the associated fully observable Markov decision process.
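
The combination can be illustrated as a one-step scoring rule. This is a minimal sketch, not the paper's policy, whose exact form is not given in this summary. For each candidate move, the score is the expected entropy reduction of the belief from a binary detect/no-detect reading (a myopic simplification of Infotaxis that omits its capture term), minus w times the expected remaining capture time read from a fully observable value table V_fo. With w = 0 it reduces to pure information seeking, consistent with the "infotactic policy (w = 0)" label in Figure 2. The 1D periodic lattice, the belief layout belief[x, iu] (assumed already propagated to the next step), the detection model detect_prob, and the function names are all hypothetical.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (nats) of a probability array, ignoring zero entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def expected_info_gain(belief, y, detect_prob):
    """Expected entropy drop of belief[x, iu] after one detect/no-detect reading at site y.

    detect_prob(d) is a hypothetical P(detection | target at (y + d) mod N).
    """
    N = belief.shape[0]
    h = np.array([detect_prob((x - y) % N) for x in range(N)])[:, None]  # shape (N, 1)
    p_hit = float((belief * h).sum())
    gain = entropy(belief)
    for lik, p_obs in ((h, p_hit), (1.0 - h, 1.0 - p_hit)):
        if p_obs > 0:
            gain -= p_obs * entropy(belief * lik / p_obs)  # Bayes posterior for this outcome
    return gain


def hybrid_action(belief, agent_pos, actions, V_fo, detect_prob, w):
    """Pick the move maximizing  info_gain(a) - w * E_belief[V_fo(target - agent, u)]."""
    N = belief.shape[0]
    x = np.arange(N)
    scores = []
    for a in actions:
        y = (agent_pos + a) % N
        gain = expected_info_gain(belief, y, detect_prob)
        greedy_cost = float((belief * V_fo[(x - y) % N, :]).sum())
        scores.append(gain - w * greedy_cost)
    return actions[int(np.argmax(scores))]
```

The two terms have different units (nats versus time steps), so w acts as an exchange rate between information and expected pursuit time; it has to be tuned, as the "best weight w" in the Figure 3 caption suggests.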

If this is right

Purely exploratory policies succeed near-optimally only for targets with frequent reorientations.
The hybrid heuristic maintains near-optimal performance for every persistence time.
It substantially outperforms purely exploratory methods when targets exhibit persistent motion.
The policy remains robust under continuous run-and-tumble motion, model mismatch, and more accurate plume representations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same predictive-inference principle could be tested in other intermittent sensing domains such as acoustic or visual tracking with lag.
Robotic platforms could implement the hybrid policy with onboard plume sensors to measure real interception times versus simulation.
Extending the joint belief to include higher-order motion statistics might further improve performance for non-run-and-tumble targets.
The framework suggests a general template for search under delayed information in any turbulent or noisy transport process.

Load-bearing premise

The target's motion statistics are adequately captured by a discrete run-and-tumble process with a single persistence time parameter, and joint belief over position and velocity suffices for near-optimal decisions.
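
Keeping a joint belief over position and velocity means that each time step has a prediction stage driven by the assumed run-and-tumble statistics and a Bayesian correction stage driven by the odor observation. Below is a minimal sketch on a hypothetical 1D periodic lattice with two velocities ±1 and a per-step tumble probability p_tumble standing in for the persistence-time parameter (for instance p_tumble = 1 - exp(-dt/τp) if tumbles are Poisson events of rate 1/τp; this mapping is an assumption). The likelihood passed to update is a placeholder for whatever plume model is used; letting it depend on velocity is one way a plume model can encode that odor trails behind a moving source. The names predict and update are illustrative.

```python
import numpy as np

VELOCITIES = (-1, +1)  # hypothetical lattice velocities of the target


def predict(belief, p_tumble):
    """Propagate belief[x, iu] = P(target at x with velocity VELOCITIES[iu]) by one step."""
    K = belief.shape[1]
    # Tumble: keep the velocity w.p. 1 - p_tumble, otherwise redraw it uniformly.
    mixed = (1.0 - p_tumble) * belief + p_tumble * belief.sum(axis=1, keepdims=True) / K
    # Run: each velocity slice is advected by its own lattice velocity (periodic box).
    out = np.empty_like(mixed)
    for iu, u in enumerate(VELOCITIES):
        out[:, iu] = np.roll(mixed[:, iu], u)
    return out


def update(belief, likelihood):
    """Bayes correction; likelihood[x, iu] = P(observation | target state).

    A likelihood of shape (N, 1) (no velocity dependence) also works via broadcasting.
    """
    posterior = belief * likelihood
    return posterior / posterior.sum()
```

A persistent target (small p_tumble) keeps most of the velocity mass in place, so the position belief is carried along in the inferred direction of motion; a frequently tumbling target quickly washes the velocity information out, which is the regime where the summary says pure exploration is already near-optimal.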

What would settle it

A direct comparison in real turbulent plume experiments with persistently moving targets would falsify the claim if the hybrid policy shows no improvement over Infotaxis or fails to achieve near-optimal interception rates.
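
To conclude that the hybrid policy "shows no improvement", one needs an uncertainty on each measured mean search time ⟨T⟩, not just the two means. Below is a minimal sketch, assuming independent episodes for each policy: it returns each policy's mean search time, its standard error, and the standardized difference (a Welch-type z statistic) between them. The function name compare_search_times is hypothetical. If search times turn out to be strongly skewed, a bootstrap or rank-based test may be preferable to this normal approximation.

```python
import numpy as np


def compare_search_times(t_a, t_b):
    """Means, standard errors, and z = (mean_a - mean_b) / SE of the difference."""
    t_a = np.asarray(t_a, dtype=float)
    t_b = np.asarray(t_b, dtype=float)
    mean_a, mean_b = t_a.mean(), t_b.mean()
    se_a = t_a.std(ddof=1) / np.sqrt(t_a.size)  # standard error of each mean
    se_b = t_b.std(ddof=1) / np.sqrt(t_b.size)
    z = (mean_a - mean_b) / np.hypot(se_a, se_b)
    return mean_a, se_a, mean_b, se_b, float(z)
```

With t_a from the hybrid policy and t_b from Infotaxis, a strongly negative z (say below -3) would indicate a real reduction in ⟨T⟩; z near zero over many episodes is what "no improvement" would look like.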

Figures

Figures reproduced from arXiv: 2604.13121 by Antonio Celani, Lorenzo Piro, Luca Biferale, Massimo Cencini, Maurizio Carbone, Robin A. Heinonen.

**Figure 1.** Sketch of a typical olfactory pursuit. The target source (blue circle) performs a run-and-tumble motion and emits a blue chemical plume along its trajectory. The agent (red dashed trajectory) pursues the moving target by inferring its position and velocity from discrete odor detections (red triangles). The agent maintains a probability map over the target position X_s and velocity U_s, namely …

**Figure 2.** Results from the discrete run-and-tumble model, with both agent and target moving on a periodic lattice. Panel A shows the average search time ⟨T⟩ as a function of the target persistence time τp, averaged over 10⁴ search episodes starting from a detection: the black curve indicates the quasi-optimal strategy obtained by solving the POMDP; the blue curve, the infotactic policy (w = 0); the gray curve refers to ran…

**Figure 3.** Results from the continuous run-and-tumble model, with the target performing a continuous run-and-tumble motion on a periodic square and the agent moving on a periodic lattice. Panel A shows the average search time ⟨T⟩ as a function of the target persistence time, averaged over 10⁴ search episodes starting from a detection: circles indicate the hybrid heuristic policy (3) for the best weight w, color-coded as in th…

Original abstract

Locating and intercepting a moving target from possibly delayed, intermittent sensory signals is a paradigmatic problem in decision-making under uncertainty, and a fundamental challenge for, e.g., animals seeking prey or mates and autonomous robotic systems. Odor signals are intermittent, strongly mixed by turbulent-like transport, and typically lag behind the true target position, thereby complicating localization. Here, we formulate olfactory pursuit as a partially observable Markov decision process in which an agent maintains a joint belief over the target's position and velocity. Using a discrete run-and-tumble model, we compute quasi-optimal policies by numerically solving the Bellman equation and benchmark them against well-established information-theoretic strategies such as Infotaxis. We show that purely exploratory policies are near-optimal when the target frequently reorients, but fail dramatically when the target exhibits persistent motion. We thus introduce a computationally efficient hybrid policy that combines the information-gain drive of Infotaxis with a "greedy" value function derived from an associated fully observable control problem. Our heuristic achieves near-optimal performance across all persistence times and substantially outperforms purely exploratory approaches. Moreover, our proposal demonstrates strong robustness even in more complex search scenarios, including continuous run-and-tumble prey motion with moderate persistence time, model mismatch, and more accurate plume dynamics representation. Our results identify predictive inference of target motion as the key ingredient for effective olfactory pursuit and provide a general framework for search in information-poor, dynamically evolving environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The hybrid policy is a practical step forward for olfactory search in simulations, but the robustness to real flows hinges on motion models that stay close to run-and-tumble.

The paper's core contribution is a hybrid policy that adds to Infotaxis a greedy value function from the fully observable problem. This lets the agent outperform pure exploration when the target keeps moving in one direction for a while. The authors solve the Bellman equation numerically for a discrete run-and-tumble target and show that the hybrid stays close to optimal across persistence times while avoiding the failures that hit pure Infotaxis in persistent cases. That part is cleanly executed and the benchmarks are direct. They also run checks on continuous motion, model mismatch, and a more detailed plume model, which is better than stopping at the basic setup. The work is honest about when information-only strategies fall short and identifies predictive inference of velocity as the missing piece.

The soft spot is the motion model. All the robustness tests stay inside run-and-tumble families or modest variants, so the claim that the policy holds up under qualitatively different velocity correlations in real turbulence is not yet strongly supported. The abstract mentions more accurate plume dynamics, but without seeing the exact implementation and error controls it is difficult to judge how much that changes the picture. Discretization effects on the belief and value function are another area that would need careful checks in review.

This is aimed at researchers who build search algorithms for robots or model animal navigation. It is solid enough on its own terms to deserve a serious referee rather than a desk reject, though the authors should be asked to clarify the limits of the motion assumption and add more diagnostics on the numerical solution.

Referee Report

1 major / 2 minor

Summary. The manuscript formulates olfactory pursuit of a moving target as a POMDP in which the agent maintains a joint belief over the target's position and velocity under a discrete run-and-tumble motion model. Quasi-optimal policies are obtained by numerically solving the Bellman equation and are benchmarked against Infotaxis. A computationally efficient hybrid policy is introduced that augments Infotaxis with a greedy value function derived from the associated fully observable control problem. The hybrid policy is reported to achieve near-optimal performance across persistence times and to remain effective under continuous run-and-tumble motion, model mismatch, and more accurate plume representations.

Significance. If the performance and robustness claims hold, the work supplies a concrete, predictive-inference-based framework for search under intermittent, delayed observations in turbulent flows. The explicit construction of the hybrid policy from an independently solved fully observable problem, the systematic benchmarking against Infotaxis, and the reported robustness tests constitute clear strengths. The results would be of interest to both robotics and behavioral ecology communities.

major comments (1)

[§4 and abstract] The central robustness claim (abstract and §4) that the hybrid policy remains effective under 'continuous run-and-tumble prey motion with moderate persistence time, model mismatch, and more accurate plume dynamics' rests on tests that remain inside the run-and-tumble family. Because the POMDP belief update and the value function are derived from the discrete Markovian velocity model, it is unclear whether the performance advantage survives qualitatively different velocity statistics (e.g., continuous correlated noise or multi-scale persistence). A direct comparison against an Ornstein-Uhlenbeck or other non-run-and-tumble process would be required to substantiate the generality statement.
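
One way to set up the requested comparison is to drive the target with an Ornstein-Uhlenbeck velocity whose correlation time matches the run-and-tumble persistence time. For Poisson tumbles at rate 1/τ with a uniform redraw among ±1, the velocity autocorrelation is exp(-t/τ), the same exponential decay as an Ornstein-Uhlenbeck process with correlation time τ. The two processes therefore agree at second order and differ in their velocity distribution (two-valued versus Gaussian), which makes such a pair a useful control: any change in the hybrid policy's performance would then come from non-Gaussian structure rather than from the correlation time. Below is a minimal sketch of 1D generators sampled at step dt; the function names and parameters are hypothetical.

```python
import numpy as np


def run_and_tumble_velocity(n, tau, dt=1.0, rng=None):
    """1D run-and-tumble velocity (+/-1): Poisson tumbles at rate 1/tau redraw the sign uniformly."""
    rng = np.random.default_rng() if rng is None else rng
    q = 1.0 - np.exp(-dt / tau)  # P(at least one tumble within dt)
    tumbles = rng.random(n) < q
    redraws = rng.choice([-1.0, 1.0], size=n)
    v = np.empty(n)
    v[0] = redraws[0]
    for i in range(1, n):
        v[i] = redraws[i] if tumbles[i] else v[i - 1]
    return v


def ou_velocity(n, tau, dt=1.0, sigma=1.0, rng=None):
    """Exactly discretized Ornstein-Uhlenbeck velocity: correlation time tau, stationary std sigma."""
    rng = np.random.default_rng() if rng is None else rng
    phi = np.exp(-dt / tau)
    kicks = sigma * np.sqrt(1.0 - phi**2) * rng.standard_normal(n)
    v = np.empty(n)
    v[0] = sigma * rng.standard_normal()  # start in the stationary distribution
    for i in range(1, n):
        v[i] = phi * v[i - 1] + kicks[i]
    return v


def autocorr(v, lag):
    """Sample autocorrelation of a 1D series at a positive integer lag."""
    v = v - v.mean()
    return float(np.dot(v[:-lag], v[lag:]) / np.dot(v, v))
```

Both generators give a lag-k autocorrelation close to exp(-k·dt/τ), so feeding either one to the same plume simulation isolates the effect of the velocity distribution on the hybrid policy.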

minor comments (2)

[§3] The discretization parameters and convergence criteria used for the numerical Bellman solution are not stated explicitly; adding a short paragraph or table with grid sizes, time steps, and tolerance values would improve reproducibility.
[§2.2 and §3.1] Notation for the joint belief (position-velocity) and the information-gain term in the hybrid policy should be unified between the main text and the supplementary material.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive review. The comment correctly identifies a limitation in the scope of our robustness claims. We address it by clarifying the tested scenarios and revising the manuscript to avoid overstatement of generality.

Referee: [§4 and abstract] The central robustness claim (abstract and §4) that the hybrid policy remains effective under 'continuous run-and-tumble prey motion with moderate persistence time, model mismatch, and more accurate plume dynamics' rests on tests that remain inside the run-and-tumble family. Because the POMDP belief update and the value function are derived from the discrete Markovian velocity model, it is unclear whether the performance advantage survives qualitatively different velocity statistics (e.g., continuous correlated noise or multi-scale persistence). A direct comparison against an Ornstein-Uhlenbeck or other non-run-and-tumble process would be required to substantiate the generality statement.

Authors: We agree that the reported tests, including continuous run-and-tumble motion, model mismatch, and altered plume representations, all operate within the run-and-tumble family. The continuous variant replaces the discrete-time Markov chain with a continuous-time process using exponential holding times, producing different velocity autocorrelation structure while preserving the same state space. Model-mismatch experiments evaluate the discrete-trained hybrid policy on continuous-motion environments and modified plume models, showing retained performance gains. However, these do not include qualitatively distinct processes such as Ornstein-Uhlenbeck dynamics or multi-scale persistence. We will revise the abstract and §4 to state explicitly that robustness is demonstrated for run-and-tumble variants and intra-family mismatch, and we will add a short discussion paragraph noting the limitation and suggesting extensions to other stochastic velocity models. This addresses the concern by narrowing the claim without new simulations. revision: yes

Circularity Check

0 steps flagged

No circularity: standard POMDP Bellman solution and hybrid policy from independent fully-observable problem

The paper defines olfactory pursuit as a POMDP with joint belief over position and velocity under a discrete run-and-tumble model, computes policies by numerically solving the Bellman equation, and constructs the hybrid policy by combining Infotaxis information gain with a greedy value function taken from a separately defined fully observable control problem. None of these steps reduce by construction to the inputs, rename fitted quantities as predictions, or rest on load-bearing self-citations. Performance claims rest on explicit numerical benchmarks and simulations against external baselines (Infotaxis), which are independent of the derivation. The model assumptions are stated explicitly and tested via simulation variants, but this is a question of modeling fidelity rather than circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard POMDP assumptions and a domain-specific motion model; no new physical entities are postulated.

axioms (1)

Domain assumption: target motion follows a discrete run-and-tumble process characterized by a persistence time parameter.
Invoked to define the state transition probabilities in the POMDP and to generate the benchmark trajectories.

pith-pipeline@v0.9.0 · 5569 in / 1241 out tokens · 39718 ms · 2026-05-10T15:19:20.529374+00:00 · methodology
