AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning

Derek Ming Siang Tan; Guillaume Sartoretti; Jeric Lew; Yuhong Cao

arxiv: 2512.02535 · v2 · submitted 2025-12-02 · 💻 cs.RO


Pith reviewed 2026-05-17 02:51 UTC · model grok-4.3

classification 💻 cs.RO

keywords multi-agent informative path planning · diffusion models · behavior cloning · reinforcement learning · decentralized coordination · information gain · trajectory generation · multi-agent systems


The pith

Diffusion models let multi-agent planners generate long-term intents non-autoregressively, yielding faster execution and higher information gain than the expert planners used for training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AID as a decentralized method for multi-agent informative path planning that replaces autoregressive intent predictors with diffusion models. These models first imitate trajectories from existing planners through behavior cloning, then improve coordination through reinforcement-learning fine-tuning, which supplies online reward signals as new measurements update the shared belief. This approach matters for time-critical tasks such as environmental monitoring or search and rescue, where multiple agents must cover large areas efficiently without central control and without compounding prediction errors over long horizons. If the method holds, agents can inherit good initial behavior from the expert planners while learning better joint coverage that scales with team size.
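To make the behavior-cloning stage concrete, here is a minimal sketch of the standard DDPM noise-prediction loss (Ho et al., 2020) applied to one expert trajectory. It assumes a trajectory is a (horizon × 2) array of waypoints, a linear noise schedule, and a hypothetical `eps_model(x, k)` standing in for the learned denoiser; the paper's network architecture, its conditioning on the belief and on other agents, and its schedule are not reproduced here.

```python
import numpy as np

def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products alpha_bar_k for a linear beta schedule (assumed, as in DDPM)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def bc_denoising_loss(eps_model, expert_traj: np.ndarray, alpha_bars: np.ndarray,
                      rng: np.random.Generator) -> float:
    """One-sample DDPM loss: noise the expert trajectory, ask the model to predict that noise."""
    k = rng.integers(len(alpha_bars))               # random diffusion step
    noise = rng.standard_normal(expert_traj.shape)  # epsilon ~ N(0, I), one per waypoint coordinate
    a_bar = alpha_bars[k]
    noisy = np.sqrt(a_bar) * expert_traj + np.sqrt(1.0 - a_bar) * noise
    pred = eps_model(noisy, k)                      # hypothetical denoiser eps_theta(x_k, k)
    return float(np.mean((pred - noise) ** 2))

def zero_model(x: np.ndarray, k: int) -> np.ndarray:
    """Placeholder denoiser that always predicts zero noise (illustration only)."""
    return np.zeros_like(x)

# Usage: a toy 16-waypoint 2-D random-walk path stands in for an expert trajectory.
rng = np.random.default_rng(0)
expert = np.cumsum(rng.standard_normal((16, 2)), axis=0)
alpha_bars = linear_alpha_bars(50)
loss = bc_denoising_loss(zero_model, expert, alpha_bars, rng)
```

Training would average this loss over many expert trajectories and diffusion steps and update `eps_model` by gradient descent; the zero-output model here only shows the data flow.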

Core claim

AID is a fully decentralized MAIPP framework that uses diffusion models to produce long-term trajectories in a non-autoregressive manner. It begins by performing behavior cloning on trajectories generated by existing MAIPP planners and then refines the policy through reinforcement learning with Diffusion Policy Policy Optimization. The resulting policies consistently outperform the source planners by executing four times faster and collecting up to 17 percent more information while scaling to larger agent teams.
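DPPO, as described by Ren et al. (2024), treats each denoising step of a diffusion policy as a Gaussian action in an inner MDP and optimizes those steps with PPO's clipped surrogate. The sketch below shows only that objective under stated assumptions: diagonal Gaussian denoising steps, a clip range of 0.2, and advantages supplied from outside. How AID turns information-gain rewards into advantages is not shown, and the function names are hypothetical.

```python
import numpy as np

def gaussian_log_prob(x, mean, std) -> float:
    """Log-density of x under a diagonal Gaussian N(mean, std^2), summed over dimensions."""
    var = std ** 2
    return float(np.sum(-0.5 * ((x - mean) ** 2) / var - np.log(std) - 0.5 * np.log(2 * np.pi)))

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate (to be maximized), averaged over sampled denoising steps."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))  # pi_new / pi_old per step
    adv = np.asarray(advantages)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))       # pessimistic bound

# Usage: identical old/new policies give ratio 1, so the objective is the mean advantage.
same = ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, -2.0])
```

Clipping limits how far a single update can push each denoising step away from the policy that collected the data, so fine-tuning starts from, and initially stays near, the behavior-cloned policy.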

What carries the argument

Diffusion models that generate complete long-term trajectories at once, rather than step by step, to serve as agent intent for coordination as the environment belief evolves with new measurements.
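The contrast with step-by-step prediction can be seen in DDPM ancestral sampling: the loop below runs over diffusion steps, and every waypoint of the trajectory is refined jointly at each step. This is a generic sketch, not AID's sampler; the linear schedule, the (horizon × 2) layout, and the placeholder `eps_model` are assumptions, and conditioning on the agent's belief and on teammates' intents is omitted.

```python
import numpy as np

def sample_trajectory(eps_model, horizon: int, num_steps: int = 50,
                      beta_start: float = 1e-4, beta_end: float = 0.02, seed: int = 0) -> np.ndarray:
    """DDPM ancestral sampling of a whole (horizon x 2) trajectory, all waypoints at once."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal((horizon, 2))   # every waypoint starts as pure noise together
    for k in reversed(range(num_steps)):
        eps = eps_model(x, k)               # one call refines the full trajectory
        mean = (x - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps) / np.sqrt(alphas[k])
        if k > 0:
            x = mean + np.sqrt(betas[k]) * rng.standard_normal(x.shape)
        else:
            x = mean                        # no noise is added on the final step
    return x

# Usage with a placeholder denoiser that predicts zero noise.
traj = sample_trajectory(lambda x, k: np.zeros_like(x), horizon=12)
```

The number of network evaluations is fixed by `num_steps` rather than by the trajectory length. Whether this accounts for the reported 4x speedup is not established by this sketch.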

If this is right

The learned policy executes MAIPP tasks four times faster than the planners it was trained on.
Information gain rises by as much as 17 percent relative to the original expert methods.
The decentralized approach continues to improve coordination as the number of agents increases.
Non-autoregressive generation avoids the compounding errors that affect step-by-step intent predictors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same two-stage cloning-plus-reward-refinement pipeline could transfer to other multi-agent tasks that require long-horizon coordination under changing beliefs.
Because the diffusion model outputs full trajectories at once, early measurement errors may have less impact on later decisions than in autoregressive alternatives.
Real-robot deployments could test whether the speed advantage holds when communication delays and sensor inaccuracies are added.

Load-bearing premise

Trajectories produced by existing multi-agent informative path planners supply enough expert examples for behavior cloning to create a starting policy that reinforcement learning can improve without introducing coordination failures in unseen environments.

What would settle it

Measuring whether AID collects less total information than its source planners when tested on environment maps with obstacle patterns or sensor noise distributions that differ from those used to generate the training trajectories.

Figures

Figures reproduced from arXiv: 2512.02535 by Derek Ming Siang Tan, Guillaume Sartoretti, Jeric Lew, Yuhong Cao.

**Figure 1.** Example run of AID with 3 agents. (1) shows the agents’ trajectories, where the translucent segment is the black agent’s predicted future path. (1) and (4) depict the GP-predicted mean and standard deviation of the information distribution (Section 3.1), with brighter cells indicating higher values. (2) shows the ground-truth information distribution, and (5) highlights the current high-interest region (S…

**Figure 2.** Pipeline for AID. Each agent i starts from the same initial position and moves asynchronously to its next position, which can lie at a different distance for each agent. Thus, the number of time steps, t, each agent can take before exhausting its budget will differ. The agents iteratively plan and execute their paths in a receding-horizon manner until their budgets are exhausted. At that point, the final tra…

**Figure 3.** Example of agent intent generated by the diffusion model.
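Figure 2 describes agents that move asynchronously, take steps of different lengths, and therefore exhaust their budgets after different numbers of steps. A minimal event-driven sketch of that bookkeeping follows, assuming unit speed (so elapsed time equals distance travelled), Euclidean step cost, and a hypothetical `plan_next` function in place of the learned planner. In a receding-horizon setting, `plan_next` would generate a longer trajectory and return only its first waypoint.

```python
import heapq
import math

def run_async_agents(starts, budget, plan_next):
    """Event-driven loop: whichever agent finishes its current move earliest replans next."""
    paths = {i: [p] for i, p in enumerate(starts)}
    remaining = {i: budget for i in paths}
    # Heap of (time at which the agent is free to plan, agent id); unit speed assumed.
    events = [(0.0, i) for i in paths]
    heapq.heapify(events)
    while events:
        t, i = heapq.heappop(events)
        current = paths[i][-1]
        nxt = plan_next(i, current, paths)   # hypothetical planner, e.g. a learned policy
        step = math.dist(current, nxt)
        if step == 0 or step > remaining[i]:
            continue                         # budget exhausted (or no move): agent stops
        remaining[i] -= step
        paths[i].append(nxt)
        heapq.heappush(events, (t + step, i))
    return paths

# Usage: toy planner in which agent i always moves i + 1 units to the right.
toy_planner = lambda i, pos, paths: (pos[0] + (i + 1), pos[1])
paths = run_async_agents([(0.0, 0.0)] * 3, budget=6.0, plan_next=toy_planner)
```

With a budget of 6, the three agents take 6, 3, and 2 steps respectively, matching the caption's point that the number of time steps differs per agent.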

Original abstract

Information gathering in large-scale or time-critical scenarios (e.g., environmental monitoring, search and rescue) requires broad coverage within limited time budgets, motivating the use of multi-agent systems. These scenarios are commonly formulated as multi-agent informative path planning (MAIPP), where multiple agents must coordinate to maximize information gain while operating under budget constraints. A central challenge in MAIPP is ensuring effective coordination while the belief over the environment evolves with incoming measurements. Recent learning-based approaches address this by using distributions over future positions as "intent" to support coordination. However, these autoregressive intent predictors are computationally expensive and prone to compounding errors. Inspired by the effectiveness of diffusion models as expressive, long-horizon policies, we propose AID, a fully decentralized MAIPP framework that leverages diffusion models to generate long-term trajectories in a non-autoregressive manner. AID first performs behavior cloning on trajectories produced by existing MAIPP planners and then fine-tunes the policy using reinforcement learning via Diffusion Policy Policy Optimization (DPPO). This two-stage pipeline enables the policy to inherit expert behavior while learning improved coordination through online reward feedback. Experiments demonstrate that AID consistently improves upon the MAIPP planners it is trained from, achieving 4x faster execution and up to 17% increased information gain, while scaling effectively to larger numbers of agents. Our implementation is publicly available at https://github.com/marmotlab/AID.
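Figure 1 and the abstract refer to a belief, modelled as a Gaussian process, whose mean and standard deviation change as measurements arrive. A minimal Gaussian-process regression sketch of that update follows, assuming a zero-mean GP with a squared-exponential kernel, length scale 0.2, and noise variance 0.01 on a unit-square grid; none of these values come from the paper. Reduction in total posterior variance is used here as a simple information measure, a common choice in informative path planning, but the exact reward AID uses is not stated in this review.

```python
import numpy as np

def rbf(a: np.ndarray, b: np.ndarray, length_scale: float = 0.2, variance: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel between point sets a (n x d) and b (m x d)."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / length_scale ** 2)

def gp_posterior(train_x, train_y, test_x, noise: float = 1e-2, length_scale: float = 0.2):
    """Posterior mean and standard deviation of a zero-mean GP at test_x."""
    K = rbf(train_x, train_x, length_scale) + noise * np.eye(len(train_x))
    K_s = rbf(train_x, test_x, length_scale)
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_y))   # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(test_x, test_x, length_scale)) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))                  # clamp tiny negatives

# Usage: a 10 x 10 grid over the unit square and two measurements.
xs = np.linspace(0.0, 1.0, 10)
gx, gy = np.meshgrid(xs, xs)
grid = np.column_stack([gx.ravel(), gy.ravel()])
train_x = np.array([[0.2, 0.2], [0.8, 0.5]])
train_y = np.array([1.0, -0.5])
mean, std = gp_posterior(train_x, train_y, grid)
prior_total = float(len(grid))            # prior variance is 1 at every cell
gain = prior_total - float(np.sum(std ** 2))
```

The posterior standard deviation shrinks only near measured locations, which is why where the agents go next determines how much the belief improves.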

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

AID brings non-autoregressive diffusion to multi-agent intent for informative path planning through a behavior-cloning-then-DPPO pipeline, with reported speedups and gains whose sources still need clearer ablations.


The main takeaway is that this work uses diffusion models to generate non-autoregressive trajectories as agent intent for multi-agent informative path planning, training them first by cloning existing planners and then refining them with DPPO reinforcement learning. This is new because most prior intent predictors are autoregressive and accumulate errors over long horizons. The diffusion approach generates full plans at once, which should be more stable and faster. The two-stage pipeline is a reasonable way to combine expert knowledge with online improvement for better multi-agent coordination.

The paper does well by showing practical benefits: 4 times faster execution and up to 17 percent more information gain, with scaling to larger agent teams. Making the implementation available on GitHub helps others test and extend it.

The soft spots are in the experimental validation. The abstract claims consistent improvements, but without details on statistical significance, the range of test environments, or ablations that separate the effect of RL fine-tuning from that of the base diffusion model, it is difficult to pinpoint where the gains come from. It is worth checking in the full paper whether the RL stage actually corrects coordination issues or whether the results stem from other factors.

This is aimed at robotics researchers working on decentralized planning for information-gathering tasks such as environmental monitoring or search operations. Readers who want to see how diffusion models can be applied to multi-agent coordination will get value here. It deserves a serious referee because the method is technically distinct and the reported performance improvements are relevant to real applications. I'd recommend sending it for peer review to get a closer look at the methods and results.

Referee Report

2 major / 2 minor

Summary. The paper proposes AID, a fully decentralized multi-agent informative path planning (MAIPP) framework that uses diffusion models to generate long-horizon trajectories non-autoregressively. It initializes via behavior cloning on trajectories from existing MAIPP planners and then fine-tunes with reinforcement learning through Diffusion Policy Policy Optimization (DPPO) to improve coordination via learned intent. The central claims are that AID consistently outperforms the base planners with 4x faster execution, up to 17% higher information gain, and effective scaling to larger agent counts, with public code released.

Significance. If the empirical claims hold under rigorous validation, AID offers a scalable alternative to autoregressive intent predictors for time-critical multi-agent information gathering tasks such as environmental monitoring. The two-stage BC-then-DPPO pipeline and non-autoregressive sampling are technically interesting strengths, and the public implementation supports reproducibility. However, the significance is limited by the current lack of detail on how much of the reported gains are attributable to the RL coordination stage versus other factors.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: the headline claims of 4x faster execution and up to 17% increased information gain rest on comparisons to the MAIPP planners used for behavior cloning, yet no details are provided on statistical significance, number of random seeds or trials, exact baseline implementations, environment diversity, or map sizes. This makes it difficult to evaluate robustness of the central outperformance claim.
[Method / Experiments] The two-stage pipeline description (behavior cloning followed by DPPO fine-tuning): the attribution of performance gains to learned multi-agent coordination via online reward feedback is load-bearing for the novelty claim, but no ablation is described that freezes the BC stage and compares information-gain, overlap, and execution-time metrics of the BC-only policy against the full DPPO-tuned policy on held-out maps with 4–8 agents. Without this isolation, gains could arise from non-autoregressive sampling speed or single-agent quality rather than improved joint intent.
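The referee's ablation request mentions trajectory overlap alongside information gain and execution time. The paper's exact overlap definition is not given in this review, so the sketch below uses one simple, hypothetical proxy: the fraction of visited grid cells that at least two agents pass through. The cell size and the grid discretization are assumptions.

```python
from collections import Counter

def overlap_ratio(paths, cell_size: float = 1.0) -> float:
    """Fraction of visited grid cells that more than one agent passes through (hypothetical metric)."""
    owners = Counter()
    for path in paths:
        # Each agent contributes at most once per cell, however often it revisits it.
        cells = {(int(x // cell_size), int(y // cell_size)) for x, y in path}
        owners.update(cells)
    if not owners:
        return 0.0
    shared = sum(1 for n in owners.values() if n > 1)
    return shared / len(owners)

# Usage: two agents whose paths share exactly one of four visited cells.
a = [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)]
b = [(2.5, 0.5), (2.5, 1.5)]
ratio = overlap_ratio([a, b])
```

This proxy ignores timing, so two agents crossing the same cell far apart in time still count as overlapping; a time-windowed variant may better reflect redundant sensing.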

minor comments (2)

[Method] Clarify the precise form of the diffusion policy output (e.g., whether it directly predicts joint trajectories or per-agent marginals with implicit coordination) and how belief updates are incorporated during online RL rollouts.
[Experiments] The abstract states 'scaling effectively to larger numbers of agents' but provides no quantitative scaling curves or failure modes for agent counts beyond the tested range; adding such plots would strengthen the scaling claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving the clarity and rigor of our experimental claims and analyses. We address each major comment point by point below and have revised the manuscript to incorporate the requested details and additional experiments.


Referee: [Abstract / Experiments] Abstract and Experiments section: the headline claims of 4x faster execution and up to 17% increased information gain rest on comparisons to the MAIPP planners used for behavior cloning, yet no details are provided on statistical significance, number of random seeds or trials, exact baseline implementations, environment diversity, or map sizes. This makes it difficult to evaluate robustness of the central outperformance claim.

Authors: We agree that the original submission did not provide sufficient experimental details to allow full assessment of the robustness of the reported gains. In the revised manuscript, we have expanded the Experiments section (now Section 5) with a dedicated subsection on experimental setup. This includes: the use of 5 random seeds for all reported results, the total number of trials per configuration (50 episodes per map/agent setting), statistical significance via paired t-tests with p-values reported in Table 2, exact baseline implementations (including hyperparameters and runtime configurations for the source MAIPP planners), environment diversity (Gaussian process fields with varying length scales and obstacle densities), and map sizes (ranging from 20x20 to 50x50 grids). These additions directly support the headline claims and are summarized in an updated Table 1 and new Table 2. revision: yes
Referee: [Method / Experiments] The two-stage pipeline description (behavior cloning followed by DPPO fine-tuning): the attribution of performance gains to learned multi-agent coordination via online reward feedback is load-bearing for the novelty claim, but no ablation is described that freezes the BC stage and compares information-gain, overlap, and execution-time metrics of the BC-only policy against the full DPPO-tuned policy on held-out maps with 4–8 agents. Without this isolation, gains could arise from non-autoregressive sampling speed or single-agent quality rather than improved joint intent.

Authors: We concur that an explicit ablation isolating the DPPO fine-tuning stage is necessary to attribute gains specifically to learned multi-agent coordination. We have performed this ablation on held-out maps with 4–8 agents, comparing the BC-only policy (frozen after behavior cloning) against the full AID policy after DPPO. Results show that while the BC-only policy already achieves faster execution than autoregressive baselines due to non-autoregressive sampling, the DPPO stage yields additional improvements: 8–12% higher information gain and reduced trajectory overlap (indicating better joint intent), with execution time remaining comparable. These metrics are now reported in a new subsection (5.4) with supporting figures and tables, confirming the contribution of the RL coordination stage beyond single-agent quality or sampling speed. revision: yes
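The first simulated response above cites paired t-tests. Because both methods would be evaluated on the same maps and seeds, a paired test works on per-map differences rather than on two independent samples. A minimal standard-library sketch of the paired t statistic follows; since the rebuttal is simulated, its seed and episode counts are not confirmed figures, and the sample values below are illustrative only. `scipy.stats.ttest_rel` computes the same statistic together with a p-value.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for matched samples a[i], b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator); must be nonzero
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Usage: illustrative per-map scores for two methods on the same three maps.
t, dof = paired_t_statistic([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```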

Circularity Check

0 steps flagged

No significant circularity; empirical results independent of input cloning


The paper describes a two-stage empirical pipeline: behavior cloning from trajectories of existing MAIPP planners, followed by DPPO-based RL fine-tuning with online reward feedback. Reported gains (4x faster execution, up to 17% information gain, scaling to more agents) are presented as outcomes of experiments on held-out scenarios, not as quantities derived by construction from the cloned trajectories. No equations, uniqueness theorems, or self-citations are shown that would force the final performance metrics to equal the expert data inputs. The RL stage is explicitly positioned as allowing correction of coordination issues, making the claims falsifiable via ablation rather than tautological. This is a standard learning-based robotics paper whose central results rest on external validation rather than internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that diffusion models trained on expert MAIPP trajectories plus RL feedback can produce coordinated long-horizon plans without compounding errors; no new physical entities or ad-hoc constants are introduced beyond standard diffusion and RL hyperparameters.

axioms (2)

domain assumption: Diffusion models can serve as expressive long-horizon policies for path planning without autoregressive error accumulation.
Invoked when the paper replaces autoregressive intent predictors with diffusion generation.
domain assumption: Behavior cloning from existing MAIPP planners followed by RL yields policies that generalize beyond the training distribution.
Required for the claim that AID improves upon and scales beyond the planners it was cloned from.

pith-pipeline@v0.9.0 · 5556 in / 1335 out tokens · 35855 ms · 2026-05-17T02:51:38.022784+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Diffusion Policy with Bayesian Expert Selection for Active Multi-Target Tracking
cs.RO 2026-04 unverdicted novelty 7.0

A Bayesian expert selection framework with variational Bayesian last layers and lower confidence bounds improves diffusion policies for active multi-target tracking.


[1] [1]

Baxter, J.L., Burke, E.K., Garibaldi, J.M., Norman, M.: Multi-Robot Search and Rescue: A Potential Field Based Approach, pp. 9–16. Springer Berlin Heidel- berg, Berlin, Heidelberg (2007).https://doi.org/10.1007/978-3-540-73424-6_ 2,https://doi.org/10.1007/978-3-540-73424-6_2

work page doi:10.1007/978-3-540-73424-6_ 2007

[2] [2]

In: 2012 IEEE International Conference on Robotics and Automation

Binney, J., Sukhatme, G.S.: Branch and bound for informative path planning. In: 2012 IEEE International Conference on Robotics and Automation. pp. 2147–2154 (2012).https://doi.org/10.1109/ICRA.2012.6224902

work page doi:10.1109/icra.2012.6224902 2012

[3] [3]

Cao, Y., Lew, J., Liang, J., Cheng, J., Sartoretti, G.: Dare: Diffusion policy for autonomous robot exploration (2024),https://arxiv.org/abs/2410.16687

work page arXiv 2024

[4] [4]

In: Conference on Robot Learning

Cao, Y., Wang, Y., Vashisth, A., Fan, H., Sartoretti, G.A.: Catnipp: Context-aware attention-based network for informative path planning. In: Conference on Robot Learning. pp. 1928–1937. PMLR (2023)

work page 1928

[5] [5]

The International Journal of Robotics Research (2024)

Chi,C.,Xu,Z.,Feng,S.,Cousineau,E.,Du,Y.,Burchfiel,B.,Tedrake,R.,Song,S.: Diffusion policy: Visuomotor policy learning via action diffusion. The International Journal of Robotics Research (2024)

work page 2024

[6] [6]

In: Robotics: Science and Systems

Corah, M., Michael, N.: Efficient online multi-robot exploration via distributed se- quential greedy assignment. In: Robotics: Science and Systems. vol. 13. Cambridge, MA (2017)

work page 2017

[7] [7]

AAAI Workshop on Deep Learning on Graphs: Methods and Applications (2021)

Dwivedi, V.P., Bresson, X.: A generalization of transformer networks to graphs. AAAI Workshop on Deep Learning on Graphs: Methods and Applications (2021)

work page 2021

[8] [8]

Hitz, G., Galceran, E., Garneau, M.É., Pomerleau, F., Siegwart, R.: Adaptive continuous-space informative path planning for online environmental monitoring. Journal of Field Robotics34(8), 1427–1449 (2017).https://doi.org/10.1002/ rob.21722,https://onlinelibrary.wiley.com/doi/abs/10.1002/rob.21722, _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/...

work page doi:10.1002/rob.21722 2017

[9] [9]

(eds.) Advances in Neu- ral Information Processing Systems

Ho,J.,Jain,A.,Abbeel,P.:Denoisingdiffusionprobabilisticmodels.In:Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neu- ral Information Processing Systems. vol. 33, pp. 6840–6851. Curran Associates, Inc. (2020),https://proceedings.neurips.cc/paper_files/paper/2020/file/ 4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf

work page 2020

[10] [10]

The International Journal of Robotics Research33(9), 1271– 1287 (2014).https://doi.org/10.1177/0278364914533443,https://doi.org/ 10.1177/0278364914533443

Hollinger, G.A., Sukhatme, G.S.: Sampling-based robotic information gather- ing algorithms. The International Journal of Robotics Research33(9), 1271– 1287 (2014).https://doi.org/10.1177/0278364914533443,https://doi.org/ 10.1177/0278364914533443

work page doi:10.1177/0278364914533443 2014

[11] [11]

Huang, X., Chi, Y., Wang, R., Li, Z., Peng, X.B., Shao, S., Nikolic, B., Sreenath, K.: Diffuseloco: Real-time legged locomotion control with diffusion from offline datasets (2024),https://arxiv.org/abs/2404.19264

work page arXiv 2024

[13] Janner, M., Du, Y., Tenenbaum, J., Levine, S.: Planning with diffusion for flexible behavior synthesis. In: International Conference on Machine Learning (2022)

[14] Krause, A.: Optimizing sensing: Theory and applications. Carnegie Mellon University (2008)

[15] Lim, Z.W., Hsu, D., Lee, W.S.: Adaptive Informative Path Planning in Metric Spaces. In: Akin, H.L., Amato, N.M., Isler, V., Van Der Stappen, A.F. (eds.) Algorithmic Foundations of Robotics XI, vol. 107, pp. 283–300. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-16595-0_17

[16] Meliou, A., Krause, A., Guestrin, C., Hellerstein, J.M.: Nonmyopic informative path planning in spatio-temporal models. In: AAAI. vol. 10, pp. 16–7 (2007)

[17] Mishra, R., Chitre, M., Swarup, S.: Online informative path planning using sparse Gaussian processes. In: 2018 OCEANS - MTS/IEEE Kobe Techno-Oceans (OTO). pp. 1–5 (2018). https://doi.org/10.1109/OCEANSKOBE.2018.8559183

[18] Popović, M., Ott, J., Rückin, J., Kochenderfer, M.J.: Learning-based methods for adaptive informative path planning. Robotics and Autonomous Systems 179, 104727 (2024)

[19] Popović, M., Vidal-Calleja, T., Hitz, G., Chung, J.J., Sa, I., Siegwart, R., Nieto, J.: An informative path planning framework for UAV-based terrain monitoring. Autonomous Robots 44(6), 889–911 (Jul 2020). https://doi.org/10.1007/s10514-020-09903-2

[20] Ren, A.Z., Lidard, J., Ankile, L.L., Simeonov, A., Agrawal, P., Majumdar, A., Burchfiel, B., Dai, H., Simchowitz, M.: Diffusion policy policy optimization. In: arXiv preprint arXiv:2409.00588 (2024)

[21] Rückin, J., Jin, L., Popović, M.: Adaptive informative path planning using deep reinforcement learning for UAV-based active sensing. In: 2022 International Conference on Robotics and Automation (ICRA). pp. 4473–4479 (2022). https://doi.org/10.1109/ICRA46639.2022.9812025

[22] Schulman, J., Moritz, P., Levine, S., Jordan, M.I., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation. In: Bengio, Y., LeCun, Y. (eds.) 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings (2016), http://arxiv.org/abs/1506.02438

[23] Shah, D., Sridhar, A., Dashora, N., Stachowicz, K., Black, K., Hirose, N., Levine, S.: ViNT: A foundation model for visual navigation. In: 7th Annual Conference on Robot Learning (2023), https://arxiv.org/abs/2306.14846

[24] Shaoul, Y., Mishani, I., Vats, S., Li, J., Likhachev, M.: Multi-robot motion planning with diffusion models (2025)

[25] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning. pp. 2256–2265. PMLR (2015)

[26] Sridhar, A., Shah, D., Glossop, C., Levine, S.: NoMaD: Goal masked diffusion policies for navigation and exploration. In: 2024 IEEE International Conference on Robotics and Automation (ICRA). pp. 63–70 (2024). https://doi.org/10.1109/ICRA57147.2024.10610665

[27] Tan, D.M.S., Shailesh, S., Liu, B., Raj, A., Ang, Q.X., Dai, W., Duhan, T., Chiun, J., Cao, Y., Shkurti, F., Sartoretti, G.A.: Search-TTA: A multi-modal test-time adaptation framework for visual search in the wild. In: Proceedings of The 9th Conference on Robot Learning. vol. 305, pp. 2093–2120. PMLR (2025)

[28] Vashisth, A., Kulshrestha, M., Conover, D., Bera, A.: Scalable multi-robot informative path planning for target mapping via deep reinforcement learning (2025), https://arxiv.org/abs/2409.16967

[29] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)

[30] Wei, Y., Zheng, R.: Informative path planning for mobile sensing with reinforcement learning. In: IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. pp. 864–873 (2020). https://doi.org/10.1109/INFOCOM41043.2020.9155528

[31] Westheider, J., Rückin, J., Popović, M.: Multi-UAV adaptive path planning using deep reinforcement learning. In: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 649–656 (2023). https://doi.org/10.1109/IROS55552.2023.10342516

[32] Yanes Luis, S., Perales Esteve, M., Gutiérrez Reina, D., Toral Marín, S.: Deep Reinforcement Learning Applied to Multi-agent Informative Path Planning in Environmental Missions, pp. 31–61. Springer International Publishing, Cham (2023). https://doi.org/10.1007/978-3-031-26564-8_2

[33] Yang, T., Cao, Y., Sartoretti, G.: Intent-based deep reinforcement learning for multi-agent informative path planning. In: 2023 International Symposium on Multi-Robot and Multi-Agent Systems (MRS). pp. 71–77 (2023). https://doi.org/10.1109/MRS60187.2023.10416797

[34] Zhu, H., Chung, J.J., Lawrance, N.R., Siegwart, R., Alonso-Mora, J.: Online informative path planning for active information gathering of a 3D surface. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). pp. 1488–1494 (2021). https://doi.org/10.1109/ICRA48506.2021.9561963

[35] Zhu, Z., Liu, M., Mao, L., Kang, B., Xu, M., Yu, Y., Ermon, S., Zhang, W.: MADiff: Offline multi-agent learning with diffusion models. arXiv preprint arXiv:2305.17330 (2023)