Emergence of a Flow-Assisted Casting Strategy for Olfactory Navigation via Memory-Augmented Reinforcement Learning

Changxu Zhao; Dongxiao Zhao; Gaojin Li; Xin Bian

arxiv: 2605.18881 · v1 · pith:7YT3EINTnew · submitted 2026-05-16 · 💻 cs.LG · physics.flu-dyn

Pith reviewed 2026-05-20 16:08 UTC · model grok-4.3

keywords: olfactory navigation, reinforcement learning, casting strategy, unsteady flows, memory length, sector-search model, odor source localization

The pith

Reinforcement learning agents spontaneously develop a flow-assisted casting strategy for odor navigation in unsteady flows without any predefined models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates reinforcement learning agents navigating toward an odor source in simulated dynamic flow fields, where detections are stochastic. Agents learn to integrate detections over different memory lengths. Without hand-designed rules, they adopt a casting strategy that uses flow information, tuning both the shape of their search paths and the odor concentration needed to trigger casting. This adaptation raises success rates, and the resulting average speed toward the source varies non-monotonically with memory length in a manner that a sector-search model can explain. A reader would care because the result shows how simple memory-based learning can produce efficient search behavior that resembles biological olfactory navigation.

Core claim

Without any predefined models, the agents develop a flow-assisted casting strategy and adaptively adjust both the geometry of their search trajectories and the concentration threshold for initiating casting to maximize the success rate. The agent's average speed toward the odor source exhibits a non-monotonic dependence on memory length, which can be explained by the sector-search model.

What carries the argument

Memory-augmented reinforcement learning agent that integrates stochastic odor detections over varying memory lengths while acting in simulated unsteady flow fields.
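
As a rough illustration of what integrating stochastic detections over a memory length can look like, the sketch below keeps a fixed-length window of binary detections and returns the fraction of hits in that window. The class name `DetectionMemory`, the Bernoulli detection model in `sample_detection`, and the choice of detection fraction as the agent's input feature are all assumptions made for illustration; the paper's actual observation encoding is not described on this page.

```python
from collections import deque
import random


class DetectionMemory:
    """Fixed-length window over binary odor detections (hypothetical helper)."""

    def __init__(self, memory_length: int):
        if memory_length < 1:
            raise ValueError("memory_length must be >= 1")
        # deque with maxlen drops the oldest detection automatically
        self.window = deque(maxlen=memory_length)

    def update(self, detected: bool) -> float:
        """Store the newest detection and return the detection fraction."""
        self.window.append(1 if detected else 0)
        return sum(self.window) / len(self.window)


def sample_detection(concentration: float, rng: random.Random) -> bool:
    """Assumed detection model: Bernoulli with p = concentration clipped to [0, 1]."""
    p = min(max(concentration, 0.0), 1.0)
    return rng.random() < p
```

A short memory makes the returned fraction react quickly but noisily to single detections, while a long memory smooths the signal at the cost of carrying stale information, which is the trade-off the memory-length comparison probes.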

If this is right

Higher search success follows from the learned adjustments to casting geometry and concentration thresholds.
Navigation speed toward the source peaks at intermediate memory lengths rather than growing without limit.
The same non-monotonic speed-memory pattern holds across the different flow conditions examined.
The sector-search model accounts for the observed dependence of speed on memory length.
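
One simple way to check the claim that speed peaks at an intermediate memory length is to test whether the maximum of a measured speed curve falls strictly inside the sampled range. The helper below is a minimal sketch under that assumption; the name `interior_peak` is hypothetical, and any values passed to it would have to come from the reader's own measurements, since this page reports no numbers.

```python
from typing import Optional, Sequence


def interior_peak(memory_lengths: Sequence[float],
                  speeds: Sequence[float]) -> Optional[float]:
    """Return the memory length at the speed maximum if it is interior.

    Assumes memory_lengths is sorted in ascending order. Returns None when
    the maximum sits at either end of the range, i.e. when the sampled
    curve gives no evidence of an intermediate optimum.
    """
    if len(memory_lengths) != len(speeds) or len(speeds) < 3:
        raise ValueError("need matching sequences with at least three points")
    best = max(range(len(speeds)), key=lambda i: speeds[i])
    if best == 0 or best == len(speeds) - 1:
        return None
    return memory_lengths[best]
```

This check only locates the largest sample; it does not account for noise across runs, so it would normally be paired with error estimates before concluding that a curve is non-monotonic.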

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Robotic odor-localization devices could be trained with similar memory-augmented RL to handle real turbulence without explicit flow models.
The emergent strategy offers a candidate explanation for how animals achieve efficient casting in natural wind without complex internal maps.
Comparable memory-based adaptation might appear in other intermittent-signal search problems, such as acoustic or chemical sensing in moving fluids.

Load-bearing premise

The simulated unsteady flow fields and reinforcement learning reward structure capture the essential statistics of real-world odor transport and detection.

What would settle it

A direct test would be whether physical robots trained the same way in laboratory unsteady flows show the same adaptive changes in trajectory geometry, casting threshold, and non-monotonic speed-memory relation.

Figures

Figures reproduced from arXiv: 2605.18881 by Changxu Zhao, Dongxiao Zhao, Gaojin Li, Xin Bian.

**Figure 2.** Emergence of flow-assisted casting strategy with increased memory length

**Figure 3.** Universal dependence of (a) small adjusting angle …

**Figure 4.** Non-monotonic dependence of search performance on …

**Figure 5.** Comparison of our simulated normalized average …

Original abstract

In dynamic flow fields, various animals exhibit remarkable odor search capabilities despite relying on stochastic detections. Interestingly, there exists an optimal time window for integrating these detections that maximizes search efficiency. To understand the underlying mechanism, we investigate the navigation performance of Reinforcement Learning (RL) agents in unsteady flows under varying memory lengths and flow conditions. Without any predefined models, the agents develop a flow-assisted casting strategy and adaptively adjust both the geometry of their search trajectories and the concentration threshold for initiating casting to maximize the success rate. The agent's average speed toward the odor source exhibits a non-monotonic dependence on memory length, which can be explained by the "sector-search" model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper examines reinforcement learning agents navigating in simulated unsteady flow fields to locate an odor source, varying the agents' memory length. Without any hand-crafted search models, the agents are reported to develop a flow-assisted casting strategy, adaptively tuning both the geometry of search trajectories and the concentration threshold used to trigger casting. The average speed toward the source is shown to depend non-monotonically on memory length; this dependence is accounted for by a post-hoc sector-search model.

Significance. If the simulation results and their interpretation hold, the work would illustrate how memory-augmented RL can produce emergent, biologically plausible navigation behaviors in turbulent transport without explicit programming of casting or sector-search rules. It would also supply a mechanistic account, via the sector-search model, for the non-monotonic effect of memory length on search efficiency.

major comments (2)

[Abstract and Results] The central claims concerning emergence of the flow-assisted casting strategy and adaptive threshold tuning rest entirely on simulation outcomes whose parameters, statistical tests, baseline comparisons, error bars, and validation against real flow data are not reported. This absence prevents evaluation of whether the observed behaviors are robust or artifacts of the specific reward structure and flow statistics.
[Discussion] The sector-search model is introduced post-hoc to explain the non-monotonic speed-versus-memory curve. No quantitative comparison is provided showing that the learned policy statistics actually satisfy the model's assumptions (e.g., sector geometry, threshold adaptation) beyond the training distribution.

minor comments (2)

[Notation] Notation for memory length, concentration threshold, and flow parameters should be defined consistently in the main text and figure captions.
[Figures] Figure captions should explicitly state the number of independent runs, random seeds, and any statistical significance tests performed.
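
The reporting requested in the figure-caption comment can be sketched as a per-condition summary over independent training runs. The function below computes the mean and the standard error of the mean from one performance value per run; the name `summarize_runs` is hypothetical, and the standard error is only one of several reasonable choices of error bar, not necessarily the one the authors use.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize_runs(values: Sequence[float]) -> Tuple[float, float, int]:
    """Return (mean, standard error of the mean, number of runs).

    Each entry in values is assumed to be one performance number
    (for example a success rate) from one independent run or seed.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two independent runs")
    mean = statistics.fmean(values)
    # statistics.stdev is the sample standard deviation (n - 1 denominator)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem, n
```

Returning the run count alongside the mean and error makes it straightforward to state the number of independent runs in each caption.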

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and indicate the revisions we will make to improve the clarity and rigor of the manuscript.

Referee: [Abstract and Results] The central claims concerning emergence of the flow-assisted casting strategy and adaptive threshold tuning rest entirely on simulation outcomes whose parameters, statistical tests, baseline comparisons, error bars, and validation against real flow data are not reported. This absence prevents evaluation of whether the observed behaviors are robust or artifacts of the specific reward structure and flow statistics.

Authors: We agree that additional methodological details and statistical support are required. In the revised manuscript we will add a comprehensive Methods appendix specifying all flow-generation parameters, Reynolds numbers, odor-release rates, reward-function coefficients, number of independent training runs, and random seeds. We will also report error bars on all performance curves, include statistical tests for the non-monotonic dependence on memory length, and provide baseline comparisons against memoryless agents and fixed-threshold policies. Regarding real flow data, the study is deliberately simulation-based to control flow statistics; we will insert a paragraph relating our synthetic turbulence statistics to published measurements of odor plumes in the literature. These additions will allow readers to assess robustness directly.

revision: yes

Referee: [Discussion] The sector-search model is introduced post-hoc to explain the non-monotonic speed-versus-memory curve. No quantitative comparison is provided showing that the learned policy statistics actually satisfy the model's assumptions (e.g., sector geometry, threshold adaptation) beyond the training distribution.

Authors: We acknowledge that the current Discussion presents the sector-search model primarily as an explanatory device without direct quantitative validation of the learned policy. In the revision we will add new figures and text that extract casting-sector angles, casting durations, and concentration-threshold distributions from the trained policies. These statistics will be compared quantitatively to the analytic predictions of the sector-search model, both on the original training distribution and on held-out flow conditions with different turbulence intensities. This will demonstrate that the emergent policy statistics are consistent with the model's assumptions outside the training regime.

revision: yes
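
Extracting a casting-sector angle from a trajectory, as proposed in this response, could be approached by measuring how far each step deviates from the upwind direction. The sketch below assumes a 2D trajectory given as a list of positions and a known unit vector pointing upwind; the names `heading_angles` and `sector_half_width`, and the definition of the sector as the largest absolute deviation, are illustrative assumptions rather than the authors' procedure.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def heading_angles(positions: Sequence[Point],
                   upwind: Point = (1.0, 0.0)) -> List[float]:
    """Signed angle in radians between each step and the upwind direction."""
    ux, uy = upwind
    angles = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0.0 and dy == 0.0:
            continue  # stationary steps have no heading
        cross = ux * dy - uy * dx  # proportional to the sine of the angle
        dot = ux * dx + uy * dy    # proportional to the cosine of the angle
        angles.append(math.atan2(cross, dot))
    return angles


def sector_half_width(angles: Sequence[float]) -> float:
    """Largest absolute deviation from upwind; 0.0 if there are no steps."""
    return max((abs(a) for a in angles), default=0.0)
```

In practice the angles would be collected only over segments identified as casting, and a robust statistic such as a high percentile may be preferable to the maximum when trajectories are noisy.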

Circularity Check

0 steps flagged

No circularity: RL emergence and post-hoc model remain independent

The paper trains memory-augmented RL agents in simulated unsteady flows and reports emergent casting behavior plus non-monotonic speed dependence on memory length. The sector-search model is invoked only as an explanatory lens for the observed dependence; no equation or claim shows that performance metrics, thresholds, or trajectory geometry are defined in terms of the model parameters or fitted from the same RL trajectories. The reward function and flow statistics are external inputs to the training loop, not outputs of the sector-search construction. The derivation chain is therefore self-contained against the simulation benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Based on abstract only; the setup implicitly relies on standard RL assumptions and flow simulation fidelity without explicit free parameters or invented entities listed.

axioms (1)

Domain assumption: The chosen flow simulation and detection model produce statistically representative odor transport for testing navigation strategies.
Invoked by the experimental design described in the abstract.
