COLSON: Controllable Learning-Based Social Navigation via Diffusion-Based Reinforcement Learning

Kohei Matsumoto; Ryo Kurazume; Yuki Hyodo; Yuki Tomita

arxiv: 2503.13934 · v2 · pith:HYMZX2PMnew · submitted 2025-03-18 · 💻 cs.RO · cs.AI

Pith reviewed 2026-05-23 00:02 UTC · model grok-4.3

keywords: social navigation · diffusion models · reinforcement learning · mobile robots · zero-shot adaptation · pedestrian avoidance · controllable policies

The pith

Diffusion-based reinforcement learning with controllability extensions allows robots to adapt to unseen social navigation scenarios without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies diffusion models within reinforcement learning to generate more flexible action distributions for mobile robots navigating among pedestrians than standard Gaussian policies permit. It introduces controllability extensions that exploit diffusion model properties to support adaptation to new conditions, specifically static obstacles absent from training data and shifted objectives such as accompanying target pedestrians while avoiding others. A sympathetic reader would care because these changes occur without any retraining or new data collection, addressing a practical barrier in deploying service robots in variable real environments.
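The limitation of Gaussian policies mentioned above can be illustrated with a toy case. The sketch below assumes a hypothetical one-dimensional lateral-velocity action with two equally good choices (pass a pedestrian on the left or on the right); the mode locations, spread, and function names are illustrative and do not come from the paper. Fitting a single Gaussian to such data puts its most likely action between the two modes, where almost no good action lies.

```python
import random
import statistics

def sample_bimodal_actions(n, rng):
    """Draw lateral velocities from two modes: pass left (-1) or pass right (+1)."""
    return [rng.gauss(rng.choice((-1.0, 1.0)), 0.1) for _ in range(n)]

def fit_gaussian(samples):
    """Maximum-likelihood Gaussian fit: sample mean and population standard deviation."""
    return statistics.fmean(samples), statistics.pstdev(samples)

rng = random.Random(0)
actions = sample_bimodal_actions(5000, rng)
mean, std = fit_gaussian(actions)

# Fraction of the true samples that lie close to the Gaussian's most likely action.
near_mean = sum(abs(a - mean) < 0.3 for a in actions) / len(actions)
print(f"fitted mean={mean:.2f}, std={std:.2f}, share of samples near mean={near_mean:.3f}")
```

A policy class that can represent both modes, such as a diffusion model, avoids committing to the in-between action; the toy above only shows why a unimodal fit is a poor match, not how the paper's policy is trained.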

Core claim

By applying a diffusion-based reinforcement learning approach to social navigation, the authors demonstrate its effectiveness. They propose extensions that enable adaptation to previously unseen scenarios without additional training, as shown in cases with static obstacles not present during training and with changed objectives such as accompanying target pedestrians while avoiding others to reach the destination.

What carries the argument

COLSON, the diffusion-based RL policy for social navigation equipped with controllability extensions that leverage diffusion characteristics to enable zero-shot adaptation to new obstacle configurations and objective shifts.
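One common way to steer a diffusion-style sampler without retraining is to add the gradient of a cost to the sampling update. The sketch below illustrates that general idea and is not the paper's implementation: a Gaussian score stands in for the trained policy, Langevin-style updates stand in for the learned reverse diffusion process, and the guidance cost is a hypothetical smoothness penalty on the distance to the previous action. All names and numeric values are assumptions chosen for illustration.

```python
import math
import random

def policy_score(a, mean, std):
    """Score of a Gaussian stand-in for the trained policy: grad of log N(a; mean, std^2)."""
    return [(m - x) / std**2 for x, m in zip(a, mean)]

def smoothness_grad(a, prev_action):
    """Gradient of the hypothetical guidance cost |a - prev_action|^2."""
    return [2.0 * (x - p) for x, p in zip(a, prev_action)]

def sample_action(rng, mean, std, prev_action, guidance_weight, steps=200, step_size=0.01):
    """Langevin-style sampling; the guidance term is applied only at sampling time."""
    a = [rng.gauss(0.0, 1.0) for _ in mean]  # start from pure noise
    noise_scale = math.sqrt(2.0 * step_size)
    for _ in range(steps):
        score = policy_score(a, mean, std)
        guide = smoothness_grad(a, prev_action)
        a = [
            x + step_size * (s - guidance_weight * g) + noise_scale * rng.gauss(0.0, 1.0)
            for x, s, g in zip(a, score, guide)
        ]
    return a

def mean_action(samples):
    """Component-wise average of a list of sampled actions."""
    return [sum(col) / len(samples) for col in zip(*samples)]

rng = random.Random(0)
policy_mean, policy_std = (1.0, 0.0), 0.3  # illustrative values only
prev_action = (0.5, 0.5)                   # illustrative previous action

unguided = mean_action(
    [sample_action(rng, policy_mean, policy_std, prev_action, 0.0) for _ in range(300)]
)
guided = mean_action(
    [sample_action(rng, policy_mean, policy_std, prev_action, 5.0) for _ in range(300)]
)
print("distance to previous action, unguided:", round(math.dist(unguided, prev_action), 3))
print("distance to previous action, guided:  ", round(math.dist(guided, prev_action), 3))
```

The stand-in policy is identical in both runs; only the sampling-time term differs, which is the sense in which this kind of control needs no retraining. Swapping the smoothness cost for an obstacle or accompaniment cost would follow the same pattern, under the same assumptions.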

If this is right

Robots can navigate around static obstacles that were absent from the training environment.
Navigation objectives can shift to include accompanying specific pedestrians without requiring policy retraining.
Action distributions remain more flexible than those from Gaussian policies in continuous spaces.
No additional training data or retraining is needed for these scenario changes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same controllability mechanism could reduce data collection needs when deploying navigation policies across environments with varying obstacle densities.
Extension to real-world robot hardware would provide a direct test of whether simulation results on zero-shot adaptation transfer.
Similar diffusion-based policies might apply to other continuous-control robotic tasks that require handling objective changes mid-operation.

Load-bearing premise

The diffusion policy's internal structure inherently supports zero-shot generalization to new obstacle configurations and objective shifts solely through the proposed controllability extensions without requiring retraining or new data.

What would settle it

If a robot using the trained policy with the controllability extensions collides with or fails to navigate around a static obstacle introduced after training, or cannot accompany a target pedestrian to the destination while avoiding others, that observation would falsify the adaptation claim.
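The falsification condition above can be phrased as a per-episode check. The following sketch assumes circular robot and obstacle footprints and a trajectory given as a list of 2D waypoints; the radii, tolerance, and function names are hypothetical, and the two trajectories are hand-written to exercise the check rather than taken from the paper.

```python
import math

def min_clearance(trajectory, obstacle_center, obstacle_radius, robot_radius):
    """Smallest gap between robot and obstacle surfaces over the waypoints (negative = overlap)."""
    return min(
        math.dist(p, obstacle_center) - obstacle_radius - robot_radius
        for p in trajectory
    )

def episode_passes(trajectory, goal, obstacle_center, obstacle_radius,
                   robot_radius=0.3, goal_tolerance=0.2):
    """An episode passes if no waypoint overlaps the obstacle and the last one is near the goal."""
    no_collision = min_clearance(
        trajectory, obstacle_center, obstacle_radius, robot_radius
    ) > 0.0
    reached_goal = math.dist(trajectory[-1], goal) <= goal_tolerance
    return no_collision and reached_goal

# Hand-written trajectories, purely illustrative (not results from the paper).
straight = [(x * 0.5, 0.0) for x in range(9)]  # drives straight through (2, 0)
detour = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.5), (3.0, 1.0), (4.0, 0.0)]
obstacle, goal = (2.0, 0.0), (4.0, 0.0)

straight_ok = episode_passes(straight, goal, obstacle, obstacle_radius=0.5)
detour_ok = episode_passes(detour, goal, obstacle, obstacle_radius=0.5)
print("straight passes:", straight_ok)
print("detour passes:", detour_ok)
```

The check only inspects waypoints, so a real evaluation would need a sampling rate fine enough that the robot cannot cross the obstacle between two consecutive points.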

Figures

Figures reproduced from arXiv: 2503.13934 by Kohei Matsumoto, Ryo Kurazume, Yuki Hyodo, Yuki Tomita.

**Figure 2.** Architecture of proposed method. s_r and s_n indicate the states of each robot and pedestrian, respectively. h_n indicates the features of each robot and pedestrian. ĥ_n indicates the features after the GNN is applied. … in which both the Actor and Critic incorporate GNNs. Each GNN utilizes Graph Attention Networks v2 (GATv2) [30]. The Actor employs a diffusion model that considers the output of the GNN as …

**Figure 3.** Comparison of success rate while changing the number of pedestrians. The upper graph shows the results for the visible setting, while the lower …

**Figure 4.** Trajectory of robot with and without guidance for smoothing. The …

**Figure 5.** Comparison between with and without guidance for static obstacle …
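The Figure 2 caption mentions that each GNN uses GATv2 [30]. As a rough illustration of how GATv2-style attention weights a robot's neighbors, the sketch below computes score_j = a · LeakyReLU(W [h_i ‖ h_j]) followed by a softmax over neighbors. The feature sizes, the weight matrix W, the vector a, and the example features are all hypothetical; the paper's actual layer sizes and feature definitions are not given in this text, and the feature aggregation step is omitted.

```python
import math

def leaky_relu(x, slope=0.2):
    """LeakyReLU nonlinearity applied to a scalar."""
    return x if x > 0 else slope * x

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gatv2_attention(h_i, neighbors, W, a):
    """Attention weights of node i over its neighbors, GATv2-style.

    score_j = a . LeakyReLU(W [h_i || h_j]); weights = softmax over j.
    """
    scores = []
    for h_j in neighbors:
        z = [leaky_relu(x) for x in matvec(W, h_i + h_j)]  # list + list concatenates
        scores.append(sum(ak * zk for ak, zk in zip(a, z)))
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-D features for one robot and three pedestrians.
robot = [0.0, 1.0]
pedestrians = [[1.0, 0.0], [0.2, 0.9], [-1.0, 0.5]]
W = [[0.5, -0.2, 0.1, 0.3],
     [0.0, 0.4, -0.3, 0.2],
     [-0.1, 0.1, 0.6, -0.4]]
a = [0.7, -0.5, 0.3]

weights = gatv2_attention(robot, pedestrians, W, a)
print([round(w, 3) for w in weights])
```

Because the nonlinearity is applied before the dot product with a, the ranking of neighbors can depend on the query node, which is the property GATv2 was introduced to provide.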

Mobile robot navigation in dynamic environments with pedestrian traffic is a key challenge in the development of autonomous mobile service robots. Recently, deep reinforcement learning-based methods have been actively studied and have outperformed traditional rule-based approaches owing to their optimization capabilities. Among these methods, those that assume continuous action spaces typically rely on Gaussian distributions, which limit the flexibility of the generated actions. In contrast, the application of diffusion models to reinforcement learning has advanced, enabling more flexible action distributions than Gaussian policy-based approaches. In this study, we apply a diffusion-based reinforcement learning approach to social navigation and validate its effectiveness. Furthermore, by exploiting the characteristics of diffusion models, we propose extensions that enable adaptation to previously unseen scenarios without additional training. As concrete scenario examples, we demonstrate adaptability to scenarios in which static obstacles exist in the environment that were not present during training, as well as scenarios in which the objective differs from training, such as accompanying target pedestrians while avoiding others to reach the destination.
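For readers who want the contrast between the two policy classes in symbols, the block below sets a Gaussian policy next to the standard DDPM formulation [15] written over actions. It is a sketch in generic notation: conditioning the noise predictor on the state s and indexing diffusion steps by k are assumptions made here, and the paper's exact parameterization is not shown in this text.

```latex
% Gaussian policy: a single mode per state
a \sim \mathcal{N}\!\left(\mu_\theta(s),\, \Sigma_\theta(s)\right)

% Diffusion policy, standard DDPM form with \alpha_k = 1 - \beta_k,\ \bar{\alpha}_k = \prod_{i=1}^{k} \alpha_i
% Forward (noising) process
q\!\left(a^{k} \mid a^{0}\right) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_k}\, a^{0},\ (1 - \bar{\alpha}_k)\, I\right)

% Reverse (denoising) step, starting from a^{K} \sim \mathcal{N}(0, I)
a^{k-1} = \frac{1}{\sqrt{\alpha_k}} \left( a^{k} - \frac{1 - \alpha_k}{\sqrt{1 - \bar{\alpha}_k}}\, \epsilon_\theta\!\left(a^{k}, k, s\right) \right) + \sigma_k z,
\qquad z \sim \mathcal{N}(0, I)
```

The action distribution produced by iterating the reverse step is not restricted to a single mode, which is the flexibility the abstract contrasts with Gaussian policies.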

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Diffusion RL applied to social navigation with claimed zero-shot controllability extensions, but the abstract gives no metrics or experiment details to back it up.

The paper's main move is taking diffusion-based RL policies, which already give more flexible action distributions than the usual Gaussian ones, and adding controllability extensions so the robot can handle new static obstacles or shifted goals like following a target pedestrian without any retraining. That zero-shot adaptation claim is the part that stands out from the abstract. It builds directly on recent diffusion RL work and applies it to a practical robotics setting where standard methods struggle with dynamic human environments.

The framing is straightforward: identify the Gaussian limitation, switch to diffusion, then exploit its structure for adaptation. That part reads as a reasonable empirical extension rather than a reinvention.

The soft spot is obvious from the abstract alone. No numbers, no baselines, no description of how the extensions are implemented or what the training setup looked like. The soundness score stays low because we cannot check whether the claimed adaptation actually occurs or whether it relies on hidden assumptions about the diffusion process. The weakest link is the idea that the internal structure of the diffusion policy will automatically support these changes just through the proposed extensions. If the full paper has clear ablation results and comparisons showing measurable gains on those unseen scenarios, the contribution becomes more solid. If the experiments are thin or the adaptation only works under narrow conditions, it stays incremental.

This is aimed at the social navigation and robot learning crowd. Someone already working on diffusion policies for control would get the most out of the specific extensions. It is worth sending for peer review because the topic is relevant and the approach is distinct enough from Gaussian baselines to merit expert scrutiny, even if the current write-up needs more evidence.

Referee Report

2 major / 0 minor

Summary. The paper introduces COLSON, a method for social navigation of mobile robots among pedestrians that replaces standard Gaussian policies in continuous-action RL with a diffusion-based policy. It claims this yields more flexible action distributions. The authors further propose controllability extensions that exploit diffusion-model properties to enable zero-shot adaptation to previously unseen static obstacles and to shifted objectives (e.g., accompanying a target pedestrian) without retraining or new data. Effectiveness is asserted to have been validated and adaptability demonstrated on these out-of-distribution scenarios.

Significance. If the zero-shot adaptation results are quantitatively substantiated, the work would be a timely empirical contribution to diffusion-based RL for robotics, addressing a practical need for policies that generalize across environment variations without retraining. The framing as an application of diffusion models to social navigation is reasonable, but the absence of any reported metrics, baselines, or ablation details in the abstract prevents assessment of whether the claimed gains are meaningful relative to existing social-navigation RL methods.

major comments (2)

[Abstract] Abstract: the central claims of 'validated effectiveness' and 'demonstrated adaptability' are presented without any numerical results, baselines, success rates, or experimental protocol. This omission is load-bearing because the paper's primary contribution is an empirical demonstration of zero-shot generalization; without these data the soundness of the claim cannot be evaluated.
The weakest assumption identified in the stress-test note—that the diffusion policy's internal structure inherently supports zero-shot generalization solely through the proposed controllability extensions—remains untested in the provided text. No section, equation, or result is supplied that isolates the contribution of the extensions from possible training-data leakage or environment similarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments on our manuscript. We address each major comment below with clarifications drawn directly from the paper's content and experiments.

Referee: [Abstract] Abstract: the central claims of 'validated effectiveness' and 'demonstrated adaptability' are presented without any numerical results, baselines, success rates, or experimental protocol. This omission is load-bearing because the paper's primary contribution is an empirical demonstration of zero-shot generalization; without these data the soundness of the claim cannot be evaluated.

Authors: The abstract serves as a high-level summary. The full manuscript details the experimental protocol, baselines (including standard Gaussian-policy RL and other social navigation methods), and quantitative results such as success rates, collision avoidance metrics, and adaptability performance in the Experiments and Results sections. To make the abstract self-contained for readers, we will add key numerical highlights in the revision. revision: yes
Referee: The weakest assumption identified in the stress-test note—that the diffusion policy's internal structure inherently supports zero-shot generalization solely through the proposed controllability extensions—remains untested in the provided text. No section, equation, or result is supplied that isolates the contribution of the extensions from possible training-data leakage or environment similarity.

Authors: The manuscript isolates the extensions' contribution via targeted experiments: ablation comparisons of the base diffusion policy versus the controllable version on out-of-distribution test cases (unseen static obstacles and shifted objectives like target accompaniment). Environment generation details ensure test scenarios differ from training distributions, with no retraining or new data used. We disagree that this remains untested. revision: no

Circularity Check

0 steps flagged

No significant circularity; empirical application without load-bearing derivations

The manuscript describes an application of diffusion-based RL to social navigation plus controllability extensions for zero-shot adaptation to unseen obstacles and objective shifts. No equations, parameter-fitting steps, self-citation chains, or uniqueness theorems are presented that reduce any claimed result to its inputs by construction. The central claims rest on empirical demonstration rather than a derivation chain, so the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; typical RL training involves many implicit hyperparameters whose values are unknown here.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 3 internal anchors

[1] Y. F. Chen, M. Liu, M. Everett, and J. P. How, “Decentralized Non-communicating Multiagent Collision Avoidance with Deep Reinforcement Learning,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 285–292, 2017.

[2] Y. F. Chen, M. Everett, M. Liu, and J. P. How, “Socially Aware Motion Planning with Deep Reinforcement Learning,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1343–1350, 2017.

[3] M. Everett, Y. F. Chen, and J. P. How, “Motion Planning among Dynamic, Decision-Making Agents with Deep Reinforcement Learning,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3052–3059, 2018.

[4] C. Chen, Y. Liu, S. Kreiss, and A. Alahi, “Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforcement Learning,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 6015–6022, 2019.

[5] C. Chen, S. Hu, P. Nikdel, G. Mori, and M. Savva, “Relational Graph Learning for Crowd Navigation,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10007–10013, 2020.

[6] K. Matsumoto, A. Kawamura, Q. An, and R. Kurazume, “Mobile Robot Navigation Using Learning-Based Method Based on Predictive State Representation in a Dynamic Environment,” in Proceedings of the IEEE/SICE International Symposium on System Integration (SII), pp. 499–504, 2022.

[7] Y. Yang, J. Jiang, J. Zhang, J. Huang, and M. Gao, “ST²: Spatial-Temporal state transformer for Crowd-Aware autonomous navigation,” IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 912–919, 2023.

[8] S. Liu, P. Chang, Z. Huang, N. Chakraborty, K. Hong, W. Liang, D. Livingston McPherson, J. Geng, and K. Driggs-Campbell, “Intention Aware Robot Crowd Navigation with Attention-Based Interaction Graph,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 12015–12021, 2023.

[9] Y. Chen, C. Liu, B. E. Shi, and M. Liu, “Robot Navigation in Crowds by Graph Convolutional Networks With Attention Learned From Human Gaze,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2754–2761, 2020.

[10] L. Lucia, D. Daniel, C. Gianluca, S. Roland, and D. Renaud, “Robot Navigation in Crowded Environments Using Deep Reinforcement Learning,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5671–5677, 2020.

[11] S. Yao, G. Chen, Q. Qiu, J. Ma, X. Chen, and J. Ji, “Crowd-Aware Robot Navigation for Pedestrians with Multiple Collision Avoidance Strategies via Map-based Deep Reinforcement Learning,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 8144–8150, 2021.

[12] X. Zhang, W. Xi, X. Guo, Y. Fang, B. Wang, W. Liu, and J. Hao, “Relational Navigation Learning in Continuous Action Space among Crowds,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 3175–3181, 2021.

[13] J. Wu, Y. Wang, H. Asama, Q. An, and A. Yamashita, “Risk-Sensitive Mobile Robot Navigation in Crowded Environment via Offline Reinforcement Learning,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7456–7462, 2023.

[14] K. Matsumoto, Y. Hyodo, and R. Kurazume, “Crowd-Aware Robot Navigation with Switching Between Learning-Based and Rule-Based Methods Using Normalizing Flows,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4823–4830, 2024.

[15] J. Ho, A. Jain, and P. Abbeel, “Denoising Diffusion Probabilistic Models,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 6840–6851, 2020.

[16] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-Resolution Image Synthesis with Latent Diffusion Models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10674–10685, 2022.

[17] K. Nakashima and R. Kurazume, “LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 14724–14731, 2024.

[18] M. Janner, Y. Du, J. B. Tenenbaum, and S. Levine, “Planning with Diffusion for Flexible Behavior Synthesis,” in Proceedings of the International Conference on Machine Learning (ICML), pp. 9902–9915, 2022.

[19] M. Reuss, M. Li, X. Jia, and R. Lioutikov, “Goal-Conditioned Imitation Learning using score-based Diffusion Policies,” in Proceedings of the Robotics: Science and Systems (RSS), 2023.

[20] Z. Wang, J. J. Hunt, and M. Zhou, “Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning,” in Proceedings of the International Conference on Learning Representations (ICLR), 2023.

[21] H. J. Terry Suh, G. Chou, H. Dai, L. Yang, A. Gupta, and R. Tedrake, “Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching,” in Proceedings of the Annual Conference on Robot Learning (CoRL), pp. 2878–2904, 2023.

[22] B. Kang, X. Ma, C. Du, T. Pang, and Y. A. N. Shuicheng, “Efficient Diffusion Policies For Offline Reinforcement Learning,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 67195–67212, 2023.

[23] P. Hansen-Estruch, I. Kostrikov, M. Janner, J. G. Kuba, and S. Levine, “IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies,” CoRR, vol. abs/2304.10573, 2023.

[24] L. Yang, Z. Huang, F. Lei, Y. Zhong, Y. Yang, C. Fang, S. Wen, B. Zhou, and Z. Lin, “Policy Representation via Diffusion Probability Model for Reinforcement Learning,” CoRR, vol. abs/2305.13122, 2023.

[25] M. Psenka, A. Escontrela, P. Abbeel, and Y. Ma, “Learning a Diffusion Model Policy from Rewards via Q-Score Matching,” in Proceedings of the International Conference on Machine Learning (ICML), pp. 41163–41182, 2024.

[26] S. Ding, K. Hu, Z. Zhang, K. Ren, W. Zhang, J. Yu, J. Wang, and Y. Shi, “Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 53945–53968, 2024.

[27] A. Sridhar, D. Shah, C. Glossop, and S. Levine, “NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 63–70, 2024.

[28] J. Liu, M. Stamatopoulou, and D. Kanoulas, “DiPPeR: Diffusion-based 2D Path Planner applied on Legged Robots,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 9264–9270, 2024.

[29] M. Stamatopoulou, J. Liu, and D. Kanoulas, “DiPPeST: Diffusion-based path planner for synthesizing trajectories applied on quadruped robots,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7787–7793, 2024.

[30] S. Brody, U. Alon, and E. Yahav, “How Attentive are Graph Attention Networks?,” in Proceedings of the International Conference on Learning Representations (ICLR), 2022.

[31] C. Meng, Y. He, Y. Song, J. Song, J. Wu, J.-Y. Zhu, and S. Ermon, “SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations,” in Proceedings of the International Conference on Learning Representations (ICLR), 2022.

[32] J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n-Body Collision Avoidance,” in Proceedings of the International Symposium of Robotic Research, pp. 3–19, 2011.

[33] X. B. Peng, A. Kumar, G. Zhang, and S. Levine, “Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning,” CoRR, vol. abs/1910.00177, 2019.

[34] P. Kozakowski, L. Kaiser, H. Michalewski, A. Mohiuddin, and K. Kańska, “Q-Value Weighted Regression: Reinforcement Learning with Limited Data,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN), pp. 1–8, 2022.

[35] A. Nair, M. Dalal, A. Gupta, and S. Levine, “AWAC: Accelerating Online Reinforcement Learning with Offline Datasets,” CoRR, vol. abs/2006.09359, 2020.

[36] S. Mysore, B. Mabsout, R. Mancuso, and K. Saenko, “Regularizing Action Policies for Smooth Control with Reinforcement Learning,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 1810–1816, 2021.

[37] D. Jia, A. Hermans, and B. Leibe, “DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10270–10277, 2020.

[38] D. Jia, M. Steinweg, A. Hermans, and B. Leibe, “Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 13301–13307, 2021.

[1] [1]

Decentralized Non- communicating Multiagent Collision Avoidance with Deep Reinforce- ment Learning,

Y . F. Chen, M. Liu, M. Everett, and J. P. How, “Decentralized Non- communicating Multiagent Collision Avoidance with Deep Reinforce- ment Learning,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) , pp. 285–292, 2017

work page 2017

[2] [2]

Socially Aware Motion Planning with Deep Reinforcement Learning,

Y . F. Chen, M. Everett, M. Liu, and J. P. How, “Socially Aware Motion Planning with Deep Reinforcement Learning,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1343–1350, 2017

work page 2017

[3] [3]

Motion Planning among Dynamic, Decision-Making Agents with Deep Reinforcement Learn- ing,

M. Everett, Y . F. Chen, and J. P. How, “Motion Planning among Dynamic, Decision-Making Agents with Deep Reinforcement Learn- ing,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pp. 3052–3059, 2018

work page 2018

[4] [4]

Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforce- ment Learning,

C. Chen, Y . Liu, S. Kreiss, and A. Alahi, “Crowd-Robot Interaction: Crowd-aware Robot Navigation with Attention-based Deep Reinforce- ment Learning,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) , pp. 6015–6022, 2019

work page 2019

[5] [5]

Relational Graph Learning for Crowd Navigation,

C. Chen, S. Hu, P. Nikdel, G. Mori, and M. Savva, “Relational Graph Learning for Crowd Navigation,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pp. 10007–10013, 2020

work page 2020

[6] [6]

Mobile Robot Navigation Using Learning-Based Method Based on Predictive State Representation in a Dynamic Environment,

K. Matsumoto, A. Kawamura, Q. An, and R. Kurazume, “Mobile Robot Navigation Using Learning-Based Method Based on Predictive State Representation in a Dynamic Environment,” in Proceedings of the IEEE/SICE International Symposium on System Integration (SII) , pp. 499–504, 2022

work page 2022

[7] [7]

ST 2: Spatial- Temporal state transformer for Crowd-Aware autonomous navigation,

Y . Yang, J. Jiang, J. Zhang, J. Huang, and M. Gao, “ST 2: Spatial- Temporal state transformer for Crowd-Aware autonomous navigation,” IEEE Robotics and Automation Letters , vol. 8, no. 2, pp. 912–919, 2023

work page 2023

[8] [8]

Inten- tion Aware Robot Crowd Navigation with Attention-Based Interaction Graph,

S. Liu, P. Chang, Z. Huang, N. Chakraborty, K. Hong, W. Liang, D. Livingston McPherson, J. Geng, and K. Driggs-Campbell, “Inten- tion Aware Robot Crowd Navigation with Attention-Based Interaction Graph,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) , pp. 12015–12021, 2023

work page 2023

[9] [9]

Robot Navigation in Crowds by Graph Convolutional Networks With Attention Learned From Human Gaze,

Y . Chen, C. Liu, B. E. Shi, and M. Liu, “Robot Navigation in Crowds by Graph Convolutional Networks With Attention Learned From Human Gaze,” IEEE Robotics and Automation Letters , vol. 5, no. 2, pp. 2754–2761, 2020

work page 2020

[10] [10]

Robot Navigation in Crowded Environments Using Deep Reinforcement Learning,

L. Lucia, D. Daniel, C. Gianluca, S. Roland, and D. Renaud, “Robot Navigation in Crowded Environments Using Deep Reinforcement Learning,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pp. 5671–5677, 2020

work page 2020

[11] [11]

Crowd-Aware Robot Navigation for Pedestrians with Multiple Collision Avoidance Strategies via Map-based Deep Reinforcement Learning,

S. Yao, G. Chen, Q. Qiu, J. Ma, X. Chen, and J. Ji, “Crowd-Aware Robot Navigation for Pedestrians with Multiple Collision Avoidance Strategies via Map-based Deep Reinforcement Learning,” in Proceed- ings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pp. 8144–8150, 2021

work page 2021

[12] [12]

Relational Navigation Learning in Continuous Action Space among Crowds,

X. Zhang, W. Xi, X. Guo, Y . Fang, B. Wang, W. Liu, and J. Hao, “Relational Navigation Learning in Continuous Action Space among Crowds,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) , pp. 3175–3181, 2021

work page 2021

[13] [13]

Risk-Sensitive Mobile Robot Navigation in Crowded Environment via Offline Re- inforcement Learning,

J. Wu, Y . Wang, H. Asama, Q. An, and A. Yamashita, “Risk-Sensitive Mobile Robot Navigation in Crowded Environment via Offline Re- inforcement Learning,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pp. 7456–7462, 2023

work page 2023

[14] [14]

Crowd-Aware Robot Navigation with Switching Between Learning-Based and Rule-Based Methods Using Normalizing Flows,

K. Matsumoto, Y . Hyodo, and R. Kurazume, “Crowd-Aware Robot Navigation with Switching Between Learning-Based and Rule-Based Methods Using Normalizing Flows,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) , pp. 4823–4830, 2024

work page 2024

[15] [15]

Denoising Diffusion Probabilistic Models,

J. Ho, A. Jain, and P. Abbeel, “Denoising Diffusion Probabilistic Models,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 6840–6851, 2020

work page 2020

[16] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-Resolution Image Synthesis with Latent Diffusion Models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10674–10685, 2022

[17] K. Nakashima and R. Kurazume, “LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 14724–14731, 2024

[18] M. Janner, Y. Du, J. B. Tenenbaum, and S. Levine, “Planning with Diffusion for Flexible Behavior Synthesis,” in Proceedings of the International Conference on Machine Learning (ICML), pp. 9902–9915, 2022

[19] M. Reuss, M. Li, X. Jia, and R. Lioutikov, “Goal-Conditioned Imitation Learning using score-based Diffusion Policies,” in Proceedings of the Robotics: Science and Systems (RSS), 2023

[20] Z. Wang, J. J. Hunt, and M. Zhou, “Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning,” in Proceedings of the International Conference on Learning Representations (ICLR), 2023

[21] H. J. Terry Suh, G. Chou, H. Dai, L. Yang, A. Gupta, and R. Tedrake, “Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching,” in Proceedings of the Annual Conference on Robot Learning (CoRL), pp. 2878–2904, 2023

[22] B. Kang, X. Ma, C. Du, T. Pang, and S. Yan, “Efficient Diffusion Policies For Offline Reinforcement Learning,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 67195–67212, 2023

[23] P. Hansen-Estruch, I. Kostrikov, M. Janner, J. G. Kuba, and S. Levine, “IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies,” CoRR, vol. abs/2304.10573, 2023

[24] L. Yang, Z. Huang, F. Lei, Y. Zhong, Y. Yang, C. Fang, S. Wen, B. Zhou, and Z. Lin, “Policy Representation via Diffusion Probability Model for Reinforcement Learning,” CoRR, vol. abs/2305.13122, 2023

[25] M. Psenka, A. Escontrela, P. Abbeel, and Y. Ma, “Learning a Diffusion Model Policy from Rewards via Q-Score Matching,” in Proceedings of the International Conference on Machine Learning (ICML), pp. 41163–41182, 2024

[26] S. Ding, K. Hu, Z. Zhang, K. Ren, W. Zhang, J. Yu, J. Wang, and Y. Shi, “Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 53945–53968, 2024

[27] A. Sridhar, D. Shah, C. Glossop, and S. Levine, “NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 63–70, 2024

[28] J. Liu, M. Stamatopoulou, and D. Kanoulas, “DiPPeR: Diffusion-based 2D Path Planner applied on Legged Robots,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 9264–9270, 2024

[29] M. Stamatopoulou, J. Liu, and D. Kanoulas, “DiPPeST: Diffusion-based path planner for synthesizing trajectories applied on quadruped robots,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7787–7793, 2024

[30] S. Brody, U. Alon, and E. Yahav, “How Attentive are Graph Attention Networks?,” in Proceedings of the International Conference on Learning Representations (ICLR), 2022

[31] C. Meng, Y. He, Y. Song, J. Song, J. Wu, J.-Y. Zhu, and S. Ermon, “SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations,” in Proceedings of the International Conference on Learning Representations (ICLR), 2022

[32] J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha, “Reciprocal n-Body Collision Avoidance,” in Proceedings of the International Symposium of Robotic Research, pp. 3–19, 2011

[33] X. B. Peng, A. Kumar, G. Zhang, and S. Levine, “Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning,” CoRR, vol. abs/1910.00177, 2019

[34] P. Kozakowski, L. Kaiser, H. Michalewski, A. Mohiuddin, and K. Kańska, “Q-Value Weighted Regression: Reinforcement Learning with Limited Data,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN), pp. 1–8, 2022

[35] A. Nair, M. Dalal, A. Gupta, and S. Levine, “AWAC: Accelerating Online Reinforcement Learning with Offline Datasets,” CoRR, vol. abs/2006.09359, 2020

[36] S. Mysore, B. Mabsout, R. Mancuso, and K. Saenko, “Regularizing Action Policies for Smooth Control with Reinforcement Learning,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 1810–1816, 2021

[37] D. Jia, A. Hermans, and B. Leibe, “DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data,” in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 10270–10277, 2020

[38] D. Jia, M. Steinweg, A. Hermans, and B. Leibe, “Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera,” in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 13301–13307, 2021