arxiv: 2604.26095 · v1 · submitted 2026-04-28 · 💻 cs.AI

Recognition: unknown

Distill-Belief: Closed-Loop Inverse Source Localization and Characterization in Physical Fields

Yiwei Shi, Zixing Song, Mengyue Yang, Cunjia Liu, Weiru Liu


Pith reviewed 2026-05-07 16:12 UTC · model grok-4.3

classification 💻 cs.AI

keywords inverse source localization · teacher-student framework · particle filter · belief space planning · information gain · uncertainty estimation · reward hacking · mobile sensing


The pith

A teacher-student framework decouples Bayesian correctness from efficient control in closed-loop source localization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Closed-loop inverse source localization and characterization requires a mobile agent to pick measurements that localize sources and infer field parameters under time pressure. The core tension is that valid uncertainty estimates need expensive Bayesian inference, while fast learned belief models let the policy exploit their approximation errors (reward hacking) instead of actually shrinking uncertainty. Distill-Belief addresses this by running a particle-filter teacher that maintains the posterior and supplies dense information-gain signals, then training a compact student to output belief statistics for control and an uncertainty certificate for stopping. At deployment only the student runs, giving constant per-step cost. Experiments across seven field modalities and two stress tests show lower sensing costs, higher success rates, tighter posteriors, and better accuracy, while reducing reward hacking.

Core claim

Distill-Belief is a teacher-student framework that decouples correctness from efficiency: a Bayes-correct particle-filter teacher maintains the posterior and supplies a dense information-gain signal, while a compact student distills the posterior into belief statistics for control and an uncertainty certificate for stopping. At deployment only the student is used, yielding constant per-step cost.
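To make the teacher's role concrete, here is a minimal sketch of one particle-filter update that also emits a dense information-gain reward. It assumes a hypothetical forward model (`forward_model`: one source at (x, y) with strength q and inverse-square-like decay), Gaussian measurement noise with standard deviation `sigma`, and information gain measured as the drop in entropy of a Gaussian fitted to the weighted particles. None of these names or modeling choices come from the paper, whose likelihood (e.g., the Gaussian plume of Figure 1) and information-gain estimator may differ.

```python
import numpy as np


def forward_model(theta, pos):
    """Hypothetical field model (not the paper's): predicted reading at sensor
    position `pos` for each particle theta = [x, y, q]."""
    d2 = np.sum((theta[:, :2] - pos) ** 2, axis=1)  # squared source-sensor distance
    return theta[:, 2] / (1.0 + d2)                  # strength decays with distance


def gaussian_entropy(particles, weights):
    """Differential entropy of a Gaussian fitted to the weighted particle cloud."""
    mean = weights @ particles
    centered = particles - mean
    cov = (weights[:, None] * centered).T @ centered  # weighted covariance (d x d)
    _, logdet = np.linalg.slogdet(cov)
    d = particles.shape[1]
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)


def teacher_update(particles, weights, pos, obs, sigma):
    """One Bayes update of the particle-filter teacher (illustrative).

    Returns the normalized posterior weights and the information-gain reward,
    defined here as entropy before the update minus entropy after it."""
    h_prev = gaussian_entropy(particles, weights)
    pred = forward_model(particles, pos)
    log_lik = -0.5 * ((obs - pred) / sigma) ** 2  # Gaussian noise, constants dropped
    log_w = np.log(weights) + log_lik
    log_w -= log_w.max()                          # avoid underflow before exp
    new_w = np.exp(log_w)
    new_w /= new_w.sum()
    h_new = gaussian_entropy(particles, new_w)
    return new_w, h_prev - h_new


# Usage: uniform prior over a 10 x 10 area and source strengths in [0.5, 2].
rng = np.random.default_rng(0)
n = 2000
particles = np.column_stack([
    rng.uniform(0.0, 10.0, n),
    rng.uniform(0.0, 10.0, n),
    rng.uniform(0.5, 2.0, n),
])
weights = np.full(n, 1.0 / n)
pos = np.array([3.0, 6.0])
obs = float(forward_model(np.array([[3.0, 7.0, 1.0]]), pos)[0])  # noise-free reading
weights, ig = teacher_update(particles, weights, pos, obs, sigma=0.05)
```

Resampling is deliberately left out of this step. Because the reward is the entropy before the update minus the entropy after it, a measurement that leaves the weights essentially unchanged (for example, one with enormous noise) earns roughly zero.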

What carries the argument

Teacher-student distillation where the particle-filter teacher provides information-gain signals that train the student to produce usable belief statistics and stopping certificates.
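The distillation step can be illustrated with an equally small sketch. It assumes the student predicts a per-parameter mean and log-variance and is trained by moment matching against the teacher's weighted particles. The function names, the squared-error loss, and the use of a diagonal-Gaussian entropy as the uncertainty certificate are all illustrative assumptions. The paper's actual objective is not reproduced here.

```python
import numpy as np


def teacher_targets(particles, weights):
    """Belief statistics the student is trained to reproduce (illustrative):
    weighted mean and log-variance of each parameter."""
    mean = weights @ particles
    var = weights @ (particles - mean) ** 2
    return mean, np.log(var)


def distill_loss(student_mean, student_logvar, particles, weights):
    """Hypothetical moment-matching distillation loss: squared error on the
    mean plus squared error on the log-variance."""
    t_mean, t_logvar = teacher_targets(particles, weights)
    return float(np.mean((student_mean - t_mean) ** 2)
                 + np.mean((student_logvar - t_logvar) ** 2))


def student_certificate(student_logvar):
    """Hypothetical uncertainty certificate read off the student alone:
    entropy of a diagonal Gaussian with the predicted log-variances."""
    d = student_logvar.shape[0]
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + float(np.sum(student_logvar)))


# Usage: two equally weighted particles at 0 and 2 give mean 1 and variance 1.
p = np.array([[0.0], [2.0]])
w = np.array([0.5, 0.5])
loss_exact = distill_loss(np.array([1.0]), np.array([0.0]), p, w)
loss_shifted = distill_loss(np.array([2.0]), np.array([0.0]), p, w)
```

In this sketch, only `student_certificate` (plus the student network, not shown) would run at deployment. Its cost does not depend on the particle count, which mirrors the constant per-step cost claimed for the student.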

If this is right

Sensing cost decreases while success rate, posterior contraction, and estimation accuracy increase relative to baselines.
Reward hacking is mitigated because the student is trained against a Bayes-correct information-gain signal.
Constant per-step computation enables real-time operation under strict time constraints.
The method generalizes across seven distinct field modalities and two stress-test conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same separation of rigorous teacher from lightweight student could transfer to other belief-space planning problems where exact inference is prohibitive.
In field robotics this structure may allow low-power platforms to retain Bayesian-level uncertainty handling without on-board particle filters.
If the student is periodically refreshed from the teacher, the framework could handle slowly drifting field statistics without full re-derivation of the policy.

Load-bearing premise

The student can approximate the teacher's information-gain signal and uncertainty estimates closely enough that the deployed policy reduces uncertainty rather than exploiting approximation errors.

What would settle it

Execute the student policy in simulation or hardware, then recompute the true posterior offline with the teacher; if uncertainty fails to contract or reward hacking reappears at rates comparable to baselines, the central claim does not hold.
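The bookkeeping for this falsification check is simple once the teacher has replayed each logged trajectory. A minimal sketch follows. It assumes every run is summarized as a list of per-step posterior entropies recomputed offline by the teacher. The helper name, the margin `min_drop`, and the comparison against baseline runs are illustrative assumptions, not quantities specified by the paper.

```python
def contraction_audit(student_runs, baseline_runs, min_drop=0.0):
    """Hypothetical audit helper.

    Each run is a list of teacher-recomputed posterior entropies, one per step.
    Returns (student contraction rate, baseline contraction rate, mean student
    entropy drop). A run "contracts" if its final entropy is more than
    `min_drop` below its initial entropy."""
    def rate(runs):
        return sum(1 for h in runs if h[0] - h[-1] > min_drop) / len(runs)

    mean_drop = sum(h[0] - h[-1] for h in student_runs) / len(student_runs)
    return rate(student_runs), rate(baseline_runs), mean_drop


# Usage: one student run contracts, one stalls; the single baseline run grows.
student = [[3.0, 2.0, 1.0], [3.0, 3.0, 3.0]]
baseline = [[3.0, 3.5]]
result = contraction_audit(student, baseline)
```

If the student's contraction rate fell to the baseline's level, that would be the failure signal described above.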

Figures

Figures reproduced from arXiv: 2604.26095 by Cunjia Liu, Mengyue Yang, Weiru Liu, Yiwei Shi, Zixing Song.

**Figure 1.** Gaussian Plume Model for source localization.

**Figure 2.** Teacher–Student Belief Distillation. Sec. 4.1, Problem Setup and Belief-State Interface: Building on the POMDP formulation in Sec. 3, we maintain a Bayesian belief $b_t(\boldsymbol{\Theta}) \approx p(\boldsymbol{\Theta} \mid o_{1:t}, \boldsymbol{p}_{1:t})$ over the full parameter vector $\boldsymbol{\Theta} \in \mathbb{R}^d$ (e.g., source location/strength and environmental factors) and use it for closed-loop control. At time $t$ we form a belief state $\boldsymbol{\psi}_t = \ldots$

**Figure 3.** Ablation study and deployment analysis. …practical deployability: the agent must maintain localization quality while producing efficient and feasible trajectories under sparse, moderate, and dense obstacle layouts. We report SR, TE, and LPS to jointly reflect localization performance, sensing efficiency, and path-level behavior in constrained navigation…

**Figure 4.** Figure 4 and Table 4a evaluate sensitivity to PF hyperparameters. Because particle count and resampling/stopping thresholds are common sources of confounding, we include this study to demonstrate that our gains are not due to a narrowly tuned setting. We vary the particle budget $N$, the ESS resampling threshold $\tau_{\mathrm{ESS}}$, and the stopping threshold $\tau_{\mathrm{stop}}$, and report both performance metrics (SR, TE, SLE, UQ) and per-st…

**Figure 5.** Method ranking across all experimental set…

**Figure 6.** Waterfall diagram showing the cumulative contribu…

**Figure 7.** Performance–cost tradeoff across hyperparameter…

**Figure 8.** Cross-field SR performance with gap-to-best shad…

**Figure 9.** SR–TE Pareto landscape under (a) multi-source and…

**Figure 10.** SR degradation (%) under increasing difficulty:…

**Figure 11.** Critical Difference diagram for SR ranking across…

**Figure 13.** Performance under obstacle-constrained environ…

**Figure 14.** Radar chart of ablation study results (normalized,…
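The hyperparameters varied in Figure 4 (particle budget N, ESS resampling threshold tau_ESS, stopping threshold tau_stop) play standard roles in particle filtering. The sketch below shows one common way they are used: systematic resampling is triggered when the effective sample size falls below a fraction `tau_ess` of N, and the episode stops once a scalar uncertainty measure falls below `tau_stop`. Treating `tau_ess` as a fraction of N and leaving the uncertainty measure abstract are assumptions, and the helper names are hypothetical. The paper may define both thresholds differently.

```python
import numpy as np


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; ranges from 1 to N."""
    return 1.0 / np.sum(weights ** 2)


def systematic_resample(particles, weights, rng):
    """Systematic resampling: one random offset, N evenly spaced pointers
    into the cumulative weight distribution."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0  # guard against round-off in the last bin
    idx = np.searchsorted(cdf, positions, side="right")
    return particles[idx], np.full(n, 1.0 / n)


def maybe_resample(particles, weights, tau_ess, rng):
    """Hypothetical helper: resample only when ESS < tau_ess * N."""
    if effective_sample_size(weights) < tau_ess * len(weights):
        return systematic_resample(particles, weights, rng)
    return particles, weights


def should_stop(uncertainty, tau_stop):
    """Hypothetical stopping rule: stop once uncertainty drops below tau_stop."""
    return uncertainty < tau_stop


# Usage: a fully degenerate weight vector forces resampling onto particle 0.
rng = np.random.default_rng(1)
parts = np.arange(4.0).reshape(4, 1)
w_deg = np.array([1.0, 0.0, 0.0, 0.0])
new_parts, new_w = maybe_resample(parts, w_deg, tau_ess=0.5, rng=rng)
```

Systematic resampling is used here only because it is a low-variance, widely used default. The paper's resampling scheme is not stated in the text above.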

Original abstract

Closed-loop inverse source localization and characterization (ISLC) requires a mobile agent to select measurements that localize sources and infer latent field parameters under strict time constraints. The core challenge lies in the belief-space objective: valid uncertainty estimation requires expensive Bayesian inference, whereas using a fast learned belief model leads to reward hacking, in which the policy exploits approximation errors rather than actually reducing uncertainty. We propose Distill-Belief, a teacher-student framework that decouples correctness from efficiency. A Bayes-correct particle-filter teacher maintains the posterior and supplies a dense information-gain signal, while a compact student distills the posterior into belief statistics for control and an uncertainty certificate for stopping. At deployment, only the student is used, yielding constant per-step cost. Experiments on seven field modalities and two stress tests show that Distill-Belief consistently reduces sensing cost and improves success, posterior contraction, and estimation accuracy over baselines, while mitigating reward hacking.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

Distill-Belief applies teacher-student distillation to keep particle-filter correctness at training time while running a fast student at deployment for closed-loop source localization, but the lack of approximation-error analysis leaves the reward-hacking fix unproven.


The paper's central move is to split the work: a particle-filter teacher maintains the true posterior and supplies dense information-gain rewards during training, while a compact student learns to output the necessary belief statistics and an uncertainty certificate for the stopping rule. Only the student runs online, which cuts per-step cost to a constant. That separation is the concrete contribution, and it directly targets the efficiency-accuracy tension in belief-space planning for inverse source localization and characterization across physical fields. The experiments cover seven modalities plus two stress tests and report gains in sensing cost, success rate, posterior contraction, and accuracy over baselines, which is useful evidence that the pattern can be made to work in practice. The authors also report that the student policy avoids the obvious reward-hacking failure mode that purely learned belief models often exhibit. Those results are the part worth taking away for anyone building mobile sensing systems.

The soft spot is the faithfulness assumption identified above as the load-bearing premise. The distillation objective is described, yet there are no error bounds, Lipschitz analysis, or worst-case propagation results showing how student approximation error affects the actual information-gain objective or the stopping decision. If the student systematically underestimates uncertainty on certain field types, the deployed policy could still optimize for its own certificate rather than true posterior contraction, recreating the problem the method claims to solve. The paper does not include an ablation that substitutes the teacher's signal at test time, so the mitigation claim rests on an unverified faithfulness assumption. The citation pattern is standard and the methods appear reproducible in principle, but the absence of these checks limits how far the empirical claims can be trusted without further work.
This is for roboticists and sensing researchers who already use particle filters or belief-space planning and need a practical speed-up. A reader looking for a reusable teacher-student template in active perception will find the architecture clear enough to try. It deserves a serious referee because the motivation is sound, the experimental scope is reasonable, and the core idea is executable, even though the analysis of approximation quality will need strengthening before publication.

Referee Report

3 major / 1 minor

Summary. The paper proposes Distill-Belief, a teacher-student framework for closed-loop inverse source localization and characterization (ISLC) in physical fields. A Bayes-correct particle-filter teacher maintains the posterior and supplies a dense information-gain signal, while a compact student distills the posterior into belief statistics for control and an uncertainty certificate for stopping. At deployment only the student is used for constant per-step cost. Experiments on seven field modalities and two stress tests report consistent reductions in sensing cost and improvements in success rate, posterior contraction, and estimation accuracy over baselines, while mitigating reward hacking.

Significance. If the student's approximation of the teacher's information-gain and uncertainty signals is sufficiently faithful, the framework offers a practical route to deploy belief-space planning on resource-limited agents without incurring repeated expensive Bayesian inference. The explicit separation of correctness (teacher) from efficiency (student) and the multi-modality empirical evaluation are strengths that could influence inverse-problem robotics and active sensing literature.

major comments (3)

[Section 3, distillation objective] No error bounds, Lipschitz constants, or worst-case analysis are supplied on how student approximation error propagates into the belief-space objective or the stopping criterion. This is load-bearing for the central claim that the deployed policy reduces true uncertainty rather than exploiting approximation errors.
[Experimental evaluation: abstract and results sections] Claims of consistent gains across seven modalities and two stress tests are presented without specification of the exact baselines, statistical tests, data-exclusion rules, or implementation details, preventing verification of the reported improvements in success, contraction, and accuracy.
[Experimental evaluation] No ablation is reported in which the teacher's information-gain signal is substituted at test time to measure whether the student-only policy achieves comparable posterior contraction; without this check the mitigation of reward hacking rests on an unverified faithfulness assumption.
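One way to quantify the stopping-criterion concern in the first major comment is to replay the same trajectories through both models and count how often their stop decisions disagree. In this hypothetical helper, `student_cert` holds the student's per-step uncertainty certificate and `teacher_unc` holds the teacher's posterior uncertainty on the same scale, with a shared threshold `tau_stop`. A false stop (the student says stop, the teacher disagrees) is the failure mode that would indicate the policy trusting an overconfident certificate.

```python
def stopping_disagreement(student_cert, teacher_unc, tau_stop):
    """Hypothetical check, not from the paper.

    Compares per-step stop decisions made from the student's certificate and
    from the teacher's posterior uncertainty. Returns (false_stop_rate,
    missed_stop_rate) as fractions of all steps."""
    false_stops = missed_stops = 0
    for s, t in zip(student_cert, teacher_unc):
        student_stops = s < tau_stop
        teacher_stops = t < tau_stop
        if student_stops and not teacher_stops:
            false_stops += 1   # student claims certainty the teacher does not confirm
        elif teacher_stops and not student_stops:
            missed_stops += 1  # student keeps sensing after the posterior has converged
    n = len(student_cert)
    return false_stops / n, missed_stops / n


# Usage: the first step is a false stop; the other two steps agree.
rates = stopping_disagreement([0.1, 0.3, 0.05], [0.4, 0.3, 0.05], tau_stop=0.2)
```

Missed stops cost extra sensing, but false stops are the ones that bear on the reward-hacking claim.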

minor comments (1)

[Abstract] The abstract would be clearer if it briefly named the seven field modalities and the two stress tests rather than leaving them unspecified.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment below, indicating the revisions we will make to the manuscript.


Referee: [Section 3, distillation objective] No error bounds, Lipschitz constants, or worst-case analysis are supplied on how student approximation error propagates into the belief-space objective or the stopping criterion. This is load-bearing for the central claim that the deployed policy reduces true uncertainty rather than exploiting approximation errors.

Authors: We acknowledge that Section 3 presents the distillation objective without formal error bounds, Lipschitz constants, or worst-case propagation analysis. Deriving such guarantees is non-trivial given the stochastic particle-filter teacher and the heterogeneous, non-convex nature of the physical fields considered; standard Lipschitz assumptions do not hold uniformly. The central claim is instead grounded in the multi-modality empirical results showing consistent posterior contraction. In the revision we will add a dedicated paragraph in Section 3 that quantifies observed student-teacher approximation error on held-out trajectories and discusses its measured effect on the stopping criterion. revision: partial
Referee: [Experimental evaluation: abstract and results sections] Claims of consistent gains across seven modalities and two stress tests are presented without specification of the exact baselines, statistical tests, data-exclusion rules, or implementation details, preventing verification of the reported improvements in success, contraction, and accuracy.

Authors: We agree that greater specificity is needed for reproducibility. The original manuscript describes the baselines and metrics in Section 4 and supplies implementation details in the appendix, but these elements will be consolidated and expanded in the main text. The revision will explicitly enumerate all baselines, state the statistical tests (paired t-tests at p < 0.05), confirm that no trials were excluded, and list key hyperparameters together with a pointer to the released code. revision: yes
Referee: [Experimental evaluation] No ablation is reported in which the teacher's information-gain signal is substituted at test time to measure whether the student-only policy achieves comparable posterior contraction; without this check the mitigation of reward hacking rests on an unverified faithfulness assumption.

Authors: We concur that this ablation would provide direct evidence of distillation faithfulness. The submitted version does not contain such an experiment. We will run the requested ablation—deploying the student policy while substituting the teacher’s information-gain signal at test time—and report the resulting posterior-contraction curves alongside the student-only results in the revised experimental section. revision: yes
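The promised ablation largely reduces to comparing two per-step information-gain series along the same student-driven trajectories. Below is a minimal sketch that assumes both series have already been computed. The function name and the "overclaim" summary are illustrative and do not come from the paper. A high correlation and a small overclaim would support the faithfulness assumption. A large overclaim would be the signature of a policy steering toward states where the student's signal is inflated.

```python
import numpy as np


def ig_faithfulness(teacher_ig, student_ig):
    """Hypothetical ablation metric.

    Compares per-step information gain along the same trajectories and returns
    (mean absolute error, Pearson correlation, total overclaim). Overclaim sums
    only the steps where the student reports more gain than the teacher
    confirms. The correlation is NaN if either series is constant."""
    t = np.asarray(teacher_ig, dtype=float)
    s = np.asarray(student_ig, dtype=float)
    mae = float(np.mean(np.abs(s - t)))
    corr = float(np.corrcoef(t, s)[0, 1])
    overclaim = float(np.sum(np.clip(s - t, 0.0, None)))
    return mae, corr, overclaim


# Usage: a student that tracks the teacher's trend but overclaims by 1 per step.
mae, corr, overclaim = ig_faithfulness([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The example shows why correlation alone is not enough: it is perfect here, yet the student overclaims at every step.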

Circularity Check

0 steps flagged

No circularity: framework uses standard particle-filter teacher and distillation with empirical validation


The paper presents Distill-Belief as a teacher-student architecture where a particle-filter teacher computes Bayes-correct (particle-approximated) posteriors and information gain, and the student is trained to approximate belief statistics and uncertainty certificates. Claims of reduced sensing cost, improved posterior contraction, and mitigation of reward hacking are supported by experiments on seven modalities and two stress tests rather than by any closed-form derivation. No equation or step in the abstract or the described framework reduces the performance metrics to a fitted quantity or a self-referential definition by construction. The faithfulness assumption is an empirical claim open to falsification via the reported results, not a tautology. This matches the default case of a non-circular engineering framework relying on known Bayesian and distillation techniques.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only; the method rests on standard Bayesian filtering assumptions and the novel claim that distillation preserves uncertainty-reduction behavior. No new physical entities or ad-hoc constants are introduced in the visible text.

axioms (2)

domain assumption: Particle-filter approximation yields a sufficiently accurate posterior for information-gain computation.
Invoked for the teacher model.
ad hoc to paper: Distillation transfers the dense information-gain signal without introducing exploitable approximation errors.
Central premise of the teacher-student split.

pith-pipeline@v0.9.0 · 5475 in / 1401 out tokens · 68562 ms · 2026-05-07T16:12:48.460884+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

87 extracted references · 9 canonical work pages · 4 internal anchors

[1]

Christophe Andrieu, Arnaud Doucet, and Roman Holenstein. 2010. Particle markov chain monte carlo methods.Journal of the Royal Statistical Society Series B: Statistical Methodology72, 3 (2010), 269–342

2010
[2]

Simon R Arridge. 1999. Optical tomography in medical imaging.Inverse problems 15, 2 (1999), R41

1999
[3]

M Sanjeev Arulampalam, Simon Maskell, Neil Gordon, and Tim Clapp. 2002. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on signal processing50, 2 (2002), 174–188

2002
[4]

Pierre-Luc Bacon, Jean Harb, and Doina Precup. 2017. The option-critic architec- ture. InProceedings of the AAAI conference on artificial intelligence, Vol. 31

2017
[5]

Amvrossios C Bagtzoglou and Juliana Atmadja. 2005. Mathematical methods for hydrologic inversion: The case of pollution source identification. InWater Pollution: Environmental Impact Assessment of Recycled Wastes on Surface and Ground Waters; Engineering Modeling and Sustainability. Springer, 65–96

2005
[6]

Barto, Richard S

Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. 1983. Neu- ronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and CyberneticsSMC-13 (1983), 834–846. https://api.semanticscholar.org/CorpusID:1522994

1983
[7]

Frederic Bourgault, Alexei A Makarenko, Stefan B Williams, Ben Grocholsky, and Hugh F Durrant-Whyte. 2002. Information based adaptive robotic exploration. InIEEE/RSJ international conference on intelligent robots and systems, Vol. 1. IEEE, 540–545

2002
[8]

Axel Brandenburg and Kandaswamy Subramanian. 2005. Astrophysical magnetic fields and nonlinear dynamo theory.Physics Reports417, 1-4 (2005), 1–209

2005
[9]

Exploration by Random Network Distillation

Yuri Burda, Harrison Edwards, Amos J. Storkey, and Oleg Klimov. 2018. Explo- ration by Random Network Distillation.ArXivabs/1810.12894 (2018). https: //api.semanticscholar.org/CorpusID:53115163

work page Pith review arXiv 2018
[10]

Kathryn Chaloner and Isabella Verdinelli. 1995. Bayesian experimental design: A review.Statistical science(1995), 273–304

1995
[11]

Wen-Hua Chen, Callum Rhodes, and Cunjia Liu. 2021. Dual control for exploita- tion and exploration (DCEE) in autonomous search.Automatica133 (2021), 109851

2021
[12]

Margaret Cheney, David Isaacson, and Jonathan C Newell. 1999. Electrical impedance tomography.SIAM review41, 1 (1999), 85–101

1999
[13]

Fotini Katopodes Chow, Branko Kosović, and Stevens T. Chan. 2005. Source Inversion for Contaminant Plume Dispersion in Urban Environments Using Building-Resolving Simulations.Journal of Applied Meteorology and Climatology 47 (2005), 1553–1572. https://api.semanticscholar.org/CorpusID:76658133

2005
[14]

Diffusion Posterior Sampling for General Noisy Inverse Problems

Hyungjin Chung, Jeongsol Kim, Michael T. McCann, Marc Louis Klasky, and J. C. Ye. 2022. Diffusion Posterior Sampling for General Noisy Inverse Problems.ArXiv abs/2209.14687 (2022). https://api.semanticscholar.org/CorpusID:252596252

work page internal anchor Pith review arXiv 2022
[15]

Daniel H Cusworth, Riley M Duren, Alana K Ayasse, Ralph Jiorle, Katherine Howell, Andrew Aubrey, Robert O Green, Michael L Eastwood, John W Chapman, Andrew K Thorpe, et al. 2024. Quantifying methane emissions from United States landfills.Science383, 6690 (2024), 1499–1504

2024
[16]

Daniel Rodrigues Da Costa, Maxime Robic, Pascal Vasseur, and Fabio Morbidi
[17]

InIEEE International Conference on Robotics and Automation

A New Stereo Fisheye Event Camera for Fast Drone Detection and Tracking. InIEEE International Conference on Robotics and Automation
[18]

Matthieu Dogniaux, Joannes D Maasakkers, Marianne Girard, Dylan Jervis, Jason McKeever, Berend J Schuit, Shubham Sharma, Ana Lopez-Noreña, Daniel J Varon, and Ilse Aben. 2025. Global satellite survey reveals uncertainty in landfill methane emissions.Nature(2025), 1–6

2025
[19]

Ivanova, Ilyas Malik, and Tom Rainforth

Adam Foster, Desi R. Ivanova, Ilyas Malik, and Tom Rainforth. 2021. Deep Adap- tive Design: Amortizing Sequential Bayesian Experimental Design. InInterna- tional Conference on Machine Learning. https://api.semanticscholar.org/CorpusID: 232104961

2021
[20]

Adam Foster, Martin Jankowiak, Eli Bingham, Paul Horsfall, Yee Whye Teh, Tom Rainforth, and Noah D. Goodman. 2019. Variational Bayesian Optimal Experimental Design. InNeural Information Processing Systems. https://api. semanticscholar.org/CorpusID:173990692

2019
[21]

Hosker, and Jean S

Steven Hanna, Gary Allen Briggs, Rayford P. Hosker, and Jean S. Smith. 1982. Handbook on atmospheric diffusion. https://api.semanticscholar.org/CorpusID: 128993711

1982
[22]

Distilling the Knowledge in a Neural Network

Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. 2015. Distilling the Knowledge in a Neural Network.ArXivabs/1503.02531 (2015). https://api. semanticscholar.org/CorpusID:7200347

work page internal anchor Pith review arXiv 2015
[23]

Jason Hite, John Mattingly, Dan Archer, Michael Willis, Andrew Rowe, Kayleigh Bray, Jake Carter, and James Ghawaly. 2019. Localization of a radioactive source in an urban environment using Bayesian Metropolis methods.Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment915 (2019), 82–93

2019
[24]

Edward R Holley. 1969. Unified view of diffusion and dispersion.Journal of the Hydraulics division95, 2 (1969), 621–632

1969
[25]

Geoffrey A Hollinger and Gaurav S Sukhatme. 2014. Sampling-based robotic information gathering algorithms.The International Journal of Robotics Research 33, 9 (2014), 1271–1287

2014
[26]

Holzschuh, S

Benjamin J. Holzschuh, S. Vegetti, and Nils Thuerey. 2023. Solving Inverse Physics Problems with Score Matching. InNeural Information Processing Systems. https://api.semanticscholar.org/CorpusID:256231499

2023
[27]

Hangkai Hu, Shiji Song, and C. L. Phillip Chen. 2019. Plume Tracing via Model- Free Reinforcement Learning Method.IEEE Transactions on Neural Networks and Learning Systems30 (2019), 2515–2527. https://api.semanticscholar.org/CorpusID: 58623810

2019
[28]

Jianwen Huo, Manlu Liu, Konstantin A Neusypin, Haojie Liu, Mingming Guo, and Yufeng Xiao. 2020. Autonomous search of radioactive sources through mobile robots.Sensors20, 12 (2020), 3461

2020
[29]

Michael Hutchinson, Pawel Ladosz, Cunjia Liu, and Wen-Hua Chen. 2019. Experi- mental assessment of plume mapping using point measurements from unmanned vehicles. In2019 International Conference on Robotics and Automation (ICRA). IEEE, 7720–7726

2019
[30]

Michael Hutchinson, Cunjia Liu, and Wen-Hua Chen. 2018. Information-based search for an atmospheric release using a mobile robot: Algorithm and experi- ments.IEEE Transactions on Control Systems Technology27, 6 (2018), 2388–2402

2018
[31]

Michael Hutchinson, Cunjia Liu, and Wen-Hua Chen. 2019. Source term estima- tion of a hazardous airborne release using an unmanned aerial vehicle.Journal of Field Robotics36, 4 (2019), 797–817

2019
[32]

Michael Hutchinson, Hyondong Oh, and Wen-Hua Chen. 2018. Entrotaxis as a strategy for autonomous search and source reconstruction in turbulent conditions. Information Fusion42 (2018), 179–189

2018
[33]

Kenneth D Jarman, Erin A Miller, Richard S Wittman, and Christopher J Gesh
[34]

Bayesian radiation source localization.Nuclear technology175, 1 (2011), 326–334

2011
[35]

Maclean, David Marshall, Jason McKeever, Mathias Strupler, Antoine Ramier, Ewan R

Dylan Jervis, Marianne Girard, Jean-Philippe W. Maclean, David Marshall, Jason McKeever, Mathias Strupler, Antoine Ramier, Ewan R. M. Tarrant, David Young, Joannes D. Maasakkers, Ilse Aben, and Tia R. Scarpelli. 2025. Global energy sector methane emissions estimated by using facility-level satellite observations.Science 390 6778 (2025), 1151–1155. https:/...

2025
[36]

Xue Jiang, Rui Ma, Yanxin Wang, Wenlong Gu, Wenxi Lu, and Jin Na. 2021. Two-stage surrogate model-assisted Bayesian framework for groundwater con- taminant source identification.Journal of Hydrology594 (2021), 125955

2021
[37]

Adam Johansen. 2009. A tutorial on particle filtering and smoothing: Fifteen years later. (2009)

2009
[38]

Nikolas Kantas, Arnaud Doucet, Sumeetpal S Singh, Jan Maciejowski, and Nicolas Chopin. 2015. On particle methods for parameter estimation in state-space models. (2015)

2015
[39]

Keats, Eugene Yee, and F

A. Keats, Eugene Yee, and F. S. Lien. 2007. Bayesian inference for source determina- tion with applications to a complex urban environment.Atmospheric Environment 41 (2007), 465–479. https://api.semanticscholar.org/CorpusID:95480737

2007
[40]

Steven Kleinegesse and Michael U Gutmann. 2020. Bayesian Experimental Design for Implicit Models by Mutual Information Neural Estimation. InInternational Conference on Machine Learning. https://api.semanticscholar.org/CorpusID: 211171409

2020
[41]

Pawel Ladosz, Hyondong Oh, Gan Zheng, and Wen-Hua Chen. 2020. Gaussian Process Based Channel Prediction for Communication-Relay UAV in Urban Environments.IEEE Trans. Aerospace Electron. Systems56 (2020), 313–325. https: //api.semanticscholar.org/CorpusID:182548627

2020
[42]

Junhee Lee, Hongro Jang, Minkyu Park, and Hyondong Oh. 2025. Enhanced Re- ward Function Design for Source Term Estimation Based on Deep Reinforcement Learning.IEEE Access(2025)

2025
[43]

Zhongguo Li, Wen-Hua Chen, Jun Yang, and Cunjia Liu. 2024. Cooperative active learning-based dual control for exploration and exploitation in autonomous search.IEEE Transactions on Neural Networks and Learning Systems(2024)

2024
[44]

Jiaming Liang, Yichuan Wu, Justin K Yim, Huimin Chen, Zicong Miao, Hanxiao Liu, Ying Liu, Yixin Liu, Dongkai Wang, Wenying Qiu, et al. 2021. Electrostatic footpads enable agile insect-scale soft robots with trajectory control.Science Robotics6, 55 (2021), eabe7906

2021
[45]

Lillicrap, Jonathan J

Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Manfred Otto Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning.arXiv: Learning(2015). https://api. semanticscholar.org/CorpusID:16326763

2015
[46]

E. J. Liu, A. Aiuppa, A. Alan, S. Arellano, M. Bitetto, N. Bobrowski, S. Carn, R. Clarke, E. Corrales, J. M. de Moor, J. A. Diaz, M. Edmonds, T. P. Fis- cher, J. Freer, G. M. Fricke, B. Galle, G. Gerdes, G. Giudice, A. Gutmann, C. Hayer, I. Itikarai, J. Jones, E. Mason, B. T. McCormick Kilbride, K. Mulina, S. Nowicki, K. Rahilly, T. Richardson, J. Rüdiger...

work page doi:10.1126/sciadv.abb9103 2020
[47]

Runze Liu, Fengshuo Bai, Yali Du, and Yaodong Yang. 2022. Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning. InNeural Information Processing Systems. https://api.semanticscholar. org/CorpusID:258509334 KDD ’26, August 9–13, 2026, International Convention Center Jeju (ICC Jeju), Jeju, Republic of Korea T...

2022
[48]

Enkeleida Lushi and John M. Stockie. 2009. An inverse Gaussian plume approach for estimating atmospheric pollutant emissions from multiple point sources. Atmospheric Environment44 (2009), 1097–1107. https://api.semanticscholar.org/ CorpusID:14414679

2009
[49]

JB Masson, M Bailly Bechet, and Massimo Vergassola. 2009. Chasing information to search in random environments.Journal of Physics A: Mathematical and Theoretical42, 43 (2009), 434009

2009
[50]

Asynchronous methods for deep reinforcement learning.arXiv preprint arXiv:1602.01783,

Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timo- thy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asyn- chronous Methods for Deep Reinforcement Learning.ArXivabs/1602.01783 (2016). https://api.semanticscholar.org/CorpusID:6875312

work page arXiv 2016
[51]

Rusu, Joel Veness, Marc G

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin A. Riedmiller, Andreas Kirkeby Fidjeland, Georg Ostrovski, Stig Petersen, Charlie Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis
[52]

https://api.semanticscholar.org/CorpusID:205242740

Human-level control through deep reinforcement learning.Nature518 (2015), 529–533. https://api.semanticscholar.org/CorpusID:205242740

2015
[53]

Lundquist, Branko Kosović, Gardar Jóhannesson, Kathleen M

Luca Delle Monache, Julie K. Lundquist, Branko Kosović, Gardar Jóhannesson, Kathleen M. Dyer, Roger D. Aines, Fotini Katopodes Chow, Rich D. Belles, William G. Hanley, Shawn Larsen, Gwendolen A. Loosmore, John J. Nitao, Gayle A. Sugiyama, and Phil Vogt. 2008. Bayesian Inference and Markov Chain Monte Carlo Sampling to Reconstruct a Contaminant Source on a...

2008
[54]

Ng, Daishi Harada, and Stuart J

A. Ng, Daishi Harada, and Stuart J. Russell. 1999. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. InInternational Conference on Machine Learning. https://api.semanticscholar.org/CorpusID: 5730166

1999
[55]

Sterratt, and Iain Murray

George Papamakarios, David C. Sterratt, and Iain Murray. 2018. Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows. InInternational Conference on Artificial Intelligence and Statistics. https://api. semanticscholar.org/CorpusID:29166658

2018
[56]

Minkyu Park, Seulbi An, Jaemin Seo, and Hyondong Oh. 2021. Autonomous source search for UAVs using Gaussian mixture model-based infotaxis: Algorithm and flight experiments.IEEE Trans. Aerospace Electron. Systems57, 6 (2021), 4238– 4254

2021
[57]

Minkyu Park, Pawel Ladosz, and Hyondong Oh. 2022. Source Term Estimation Using Deep Reinforcement Learning With Gaussian Mixture Model Feature Extraction for Mobile Sensors.IEEE Robotics and Automation Letters7 (2022), 8323–8330. https://api.semanticscholar.org/CorpusID:249940756

2022
[58]

Efros, and Trevor Darrell

Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. 2017. Curiosity-Driven Exploration by Self-Supervised Prediction.2017 IEEE Con- ference on Computer Vision and Pattern Recognition Workshops (CVPRW)(2017), 488–489. https://api.semanticscholar.org/CorpusID:20045336

2017
[59]

J Picaut. 2002. Numerical modeling of urban sound fields by a diffusion process. Applied Acoustics63, 9 (2002), 965–991

[60]

Faezeh Rahbar, Ali Marjovi, and Alcherio Martinoli. 2019. An algorithm for odor source localization based on source term estimation. In 2019 International Conference on Robotics and Automation (ICRA). IEEE, 973–979.

[61]

Nicola Rigolli, Nicodemo Magnoli, Lorenzo Rosasco, and Agnese Seminara. 2021. Learning to predict target location with turbulent odor plumes. eLife 11 (2021). https://api.semanticscholar.org/CorpusID:235446309

[62]

Branko Ristic, Alex Skvortsov, and Ajith Gunatilaka. 2016. A study of cognitive strategies for an autonomous search. Information Fusion 28 (2016), 1–9.

[63]

Branko Ristic, Alexei T. Skvortsov, and Ajith H. Gunatilaka. 2016. A study of cognitive strategies for an autonomous search. Inf. Fusion 28 (2016), 1–9. https://api.semanticscholar.org/CorpusID:1519176

[64]

Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. 2014. FitNets: Hints for Thin Deep Nets. CoRR abs/1412.6550 (2014). https://api.semanticscholar.org/CorpusID:2723173

[65]

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. ArXiv abs/1707.06347 (2017). https://api.semanticscholar.org/CorpusID:28695052

[67]

Jaemin Seo, Geunsik Bae, and Hyondong Oh. 2025. Kalman filter-based distributed Gaussian process for unknown scalar field estimation in wireless sensor networks. Expert Systems with Applications (2025), 127822.

[68]

Yiwei Shi, Muning Wen, Qi Zhang, Weinan Zhang, Cunjia Liu, and Weiru Liu. 2024. Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation. ArXiv abs/2409.09541 (2024). https://api.semanticscholar.org/CorpusID:272689285

[70]

Yiwei Shi, Mengyue Yang, Qi Zhang, Weinan Zhang, Cunjia Liu, and Weiru Liu. 2025. Attention-Driven Hierarchical Reinforcement Learning with Particle Filtering for Source Localization in Dynamic Fields. arXiv preprint arXiv:2501.13084 (2025).

[72]

Amarjeet Singh, Andreas Krause, Carlos Guestrin, William Kaiser, and Maxim Batalin. 2007. Efficient planning of informative paths for multiple robots. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (Hyderabad, India) (IJCAI’07). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2204–2211.

[73]

John M. Stockie. 2011. The Mathematics of Atmospheric Dispersion Modeling. SIAM Rev. 53 (2011), 349–372. https://api.semanticscholar.org/CorpusID:8186270

[74]

John M. Stockie. 2011. The mathematics of atmospheric dispersion modeling. SIAM Review 53, 2 (2011), 349–372.

[75]

Vinh Tran-Quang and Hung Dao-Viet. 2022. An internet of radiation sensor system (IoRSS) to detect radioactive sources out of regulatory control. Scientific Reports 12, 1 (2022), 7195.

[76]

D. B. Turner. 1994. Workbook of atmospheric dispersion estimates: an introduction to dispersion modeling. https://api.semanticscholar.org/CorpusID:93715563

[77]

Massimo Vergassola, Emmanuel Villermaux, and Boris I. Shraiman. 2007. ‘Infotaxis’ as a strategy for searching without gradients. Nature 445, 7126 (2007), 406–409.

[78]

Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, and Koray Kavukcuoglu. 2017. Feudal networks for hierarchical reinforcement learning. In International Conference on Machine Learning. PMLR, 3540–3549.

[79]

Lingxiao Wang and Shuo Pang. 2023. Robotic Odor Source Localization via End-to-End Recurrent Deep Reinforcement Learning. 2023 Seventh IEEE International Conference on Robotic Computing (IRC) (2023), 43–50. https://api.semanticscholar.org/CorpusID:268707007

[80]

Lingxiao Wang, Shuo Pang, and Jinlong Li. 2021. Olfactory-Based Navigation via Model-Based Reinforcement Learning and Fuzzy Inference Methods. IEEE Transactions on Fuzzy Systems 29 (2021), 3014–3027. https://api.semanticscholar.org/CorpusID:226425926
