Trajectory Learning with Graph Representations for Social Robot Navigation

Berke Kartal; Burcu Kilic; Emre Ugur; Yigit Yildirim

arxiv: 2607.00028 · v1 · pith:RA4X6V4Xnew · submitted 2026-06-21 · 💻 cs.RO

Pith reviewed 2026-07-02 21:41 UTC · model grok-4.3

classification 💻 cs.RO

keywords social robot navigation · imitation learning · graph representations · trajectory learning · crowd navigation · pedestrian interactions · spatiotemporal dynamics

The pith

Graph-based imitation learning encodes pedestrian interactions and learns full trajectories to improve social robot navigation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to build a navigation system for robots that follows patterns observed in real pedestrian crowds without relying on hand-designed rewards. It combines a graph network that attends to how people relate to each other with a module that predicts entire paths instead of single steps. This targets two weaknesses in earlier work: imitation methods that ignore social context and reinforcement methods that simplify behavior into static rules. If the approach holds, robots could move through populated spaces while producing fewer disturbances than current data-driven systems.

Core claim

We propose an imitation learning framework that leverages spatiotemporal dynamics for socially compliant navigation. To represent social context based on interactions, we introduce a graph-based auxiliary network that encodes crowd states by attending to pedestrians. In addition, we present a navigation module that captures temporal dynamics and mitigates error accumulations by incorporating encoded state predictions and employing a trajectory-level learning objective. Our framework outperforms established data-driven baselines on simulation and a real-world dataset across diverse social metrics.

What carries the argument

Graph-based auxiliary network that encodes crowd states by attending to pedestrians, paired with a navigation module that uses encoded state predictions and a trajectory-level learning objective.
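To make the idea of "encoding crowd states by attending to pedestrians" concrete, here is a minimal sketch of attention pooling over pedestrian features. It is not the paper's architecture: the function name `encode_crowd`, the single attention head, the robot-centric features, and the random weight matrices `W_q`, `W_k`, `W_v` are all assumptions made for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def encode_crowd(robot, pedestrians, W_q, W_k, W_v):
    """Attention-pool pedestrian features into one crowd embedding.

    robot:       (d,) robot state, e.g. position and velocity
    pedestrians: (n, d) pedestrian states in the same frame
    W_q, W_k, W_v: (h, d) projection matrices (hypothetical, untrained here)
    """
    rel = pedestrians - robot              # robot-centric pedestrian features
    q = W_q @ robot                        # query from the robot, shape (h,)
    k = rel @ W_k.T                        # keys, shape (n, h)
    v = rel @ W_v.T                        # values, shape (n, h)
    scores = k @ q / np.sqrt(q.shape[0])   # scaled dot-product scores, (n,)
    weights = softmax(scores)              # attention over pedestrians
    return weights @ v, weights            # embedding (h,), weights (n,)

# Toy usage with random numbers (not data from the paper).
rng = np.random.default_rng(0)
d, h, n = 4, 8, 5
W_q, W_k, W_v = (rng.normal(size=(h, d)) for _ in range(3))
robot = rng.normal(size=d)
peds = rng.normal(size=(n, d))
emb, w = encode_crowd(robot, peds, W_q, W_k, W_v)
```

Because the embedding is a weighted sum over pedestrians, it has a fixed size regardless of crowd size and does not depend on the order in which pedestrians are listed, which is one common reason to prefer attention or graph pooling over concatenating pedestrian states.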

If this is right

The method captures both spatial interactions and temporal dynamics present in real pedestrian data.
Trajectory-level training reduces error accumulation compared with step-by-step imitation.
Performance improves across multiple social metrics on both simulated and recorded crowd scenes.
The framework avoids the need for manually engineered reward functions used in reinforcement learning.
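The error-accumulation point in the list above can be illustrated with a toy example. The sketch below assumes a one-dimensional constant-velocity expert and a hypothetical one-step model with a small multiplicative error (`gain=1.02`); none of these values come from the paper. It compares the one-step error measured from expert states with the error after the model is rolled out on its own predictions.

```python
import numpy as np

def expert_trajectory(x0, v, steps):
    """Constant-velocity expert: x_{t+1} = x_t + v."""
    return x0 + v * np.arange(steps + 1)

def step_model(x, v, gain=1.02):
    """Hypothetical learned one-step model with a small multiplicative error."""
    return gain * x + v

def teacher_forced_errors(expert, v):
    """One-step errors when every prediction starts from the expert state."""
    preds = step_model(expert[:-1], v)
    return np.abs(preds - expert[1:])

def rollout_errors(expert, v):
    """Errors when the model is fed its own predictions (closed loop)."""
    x = expert[0]
    errs = []
    for target in expert[1:]:
        x = step_model(x, v)
        errs.append(abs(x - target))
    return np.array(errs)

expert = expert_trajectory(x0=1.0, v=0.5, steps=50)
one_step = teacher_forced_errors(expert, v=0.5)
closed_loop = rollout_errors(expert, v=0.5)
print(f"max one-step error:  {one_step.max():.3f}")
print(f"final rollout error: {closed_loop[-1]:.3f}")
```

A step-by-step imitation loss only sees the first quantity, while a trajectory-level objective scores the whole predicted path, so it penalizes the drift that a closed-loop rollout would otherwise hide during training.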

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same graph encoding could be applied to predict how groups of robots should coordinate with humans.
Adding uncertainty estimates to the state predictions might further limit long-horizon drift.
The trajectory objective could be combined with safety constraints without returning to hand-crafted rewards.

Load-bearing premise

The combination of graph-based social encoding and trajectory-level learning with state predictions is sufficient to capture real pedestrian interactions and avoid error accumulation without additional hand-crafted components.

What would settle it

A controlled test on the real-world dataset in which the proposed method produces equal or higher pedestrian disturbance scores or shows larger trajectory deviation than the strongest baseline after 10 seconds of rollout.
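One way such a test could be scored is sketched below. The two functions are hypothetical stand-ins: the paper's actual disturbance metric is not given here, so `intrusion_rate` uses a personal-space proxy with a placeholder radius of 1.2 m, and `displacement_at_horizon` measures plain Euclidean deviation at the 10-second mark. The trajectories are synthetic.

```python
import numpy as np

def displacement_at_horizon(pred, ref, dt, horizon_s=10.0):
    """Euclidean deviation between two (T, 2) paths at a given time horizon."""
    idx = int(round(horizon_s / dt))
    if idx >= len(pred) or idx >= len(ref):
        raise ValueError("trajectory shorter than the requested horizon")
    return float(np.linalg.norm(pred[idx] - ref[idx]))

def intrusion_rate(robot, pedestrians, radius=1.2):
    """Fraction of timesteps with the robot within `radius` of any pedestrian.

    robot: (T, 2), pedestrians: (T, n, 2). The radius is a placeholder value.
    """
    dists = np.linalg.norm(pedestrians - robot[:, None, :], axis=-1)  # (T, n)
    return float(np.mean(dists.min(axis=1) < radius))

# Synthetic example: 10 s at dt = 0.1 s, reference path along x at 1 m/s.
dt = 0.1
t = np.arange(101)
ref = np.stack([dt * t, np.zeros(101)], axis=1)
pred = np.stack([dt * t, 0.01 * t], axis=1)             # slow lateral drift
peds = np.tile(np.array([[5.0, 0.5]]), (101, 1, 1))     # one standing pedestrian

deviation = displacement_at_horizon(pred, ref, dt)
disturbance = intrusion_rate(ref, peds)
```

Running both functions on the proposed method and on the strongest baseline, over the same recorded scenes, would give the pair of numbers the falsification test asks for.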

Figures

Figures reproduced from arXiv: 2607.00028 by Berke Kartal, Burcu Kilic, Emre Ugur, Yigit Yildirim.

**Figure 1.** Training procedure of the trajectory generator network. First, GFAE is trained to extract graph embeddings to be used as state representations. Then, …

**Figure 2.** The architecture of Graph Feature Autoencoder. Firstly, Graph …

**Figure 3.** The illustration of four different social scenarios that we constructed in …

**Figure 4.** Qualitative snapshots from executed GE+CNMP trajectories on …

**Figure 6.** Qualitative plots from SCAND experiments where a row is a sample …

Original abstract

Autonomous mobile robots are expected to exhibit socially compliant navigation for minimizing pedestrian disturbance. While capturing social interactions and incorporating pedestrian motion estimations into decision-making are beneficial for compliance, prior methods fail to address both spatial and temporal characteristics present in real-world data. Reinforcement Learning offers high capability, but it requires hand-crafted reward functions that reduce social behavior to static criteria, limiting its ability to reproduce patterns that exist in real pedestrian behavior. Imitation Learning offers direct training from real-world data but lacks modeling of social interactions and suffers from error accumulation. To this end, we propose an imitation learning framework that leverages spatiotemporal dynamics for socially compliant navigation. To represent social context based on interactions, we introduce a graph-based auxiliary network that encodes crowd states by attending to pedestrians. In addition, we present a navigation module that captures temporal dynamics and mitigates error accumulations by incorporating encoded state predictions and employing a trajectory-level learning objective. Our framework outperforms established data-driven baselines on simulation and a real-world dataset across diverse social metrics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

A private letter to a colleague.

Graph attention for crowds plus trajectory-level imitation claims to beat baselines on social navigation, but the abstract gives no numbers, ablations, or experimental details to support it.

The main takeaway is that this paper puts forward a graph auxiliary network to encode pedestrian interactions via attention, paired with a navigation module that adds state predictions and a trajectory-level objective inside an imitation learning setup. The goal is to handle both spatial social context and temporal error accumulation better than standard RL or IL.

It does a clear job naming the practical limits of the two main prior lines: RL's hand-crafted rewards flatten real pedestrian patterns, and plain IL ignores social structure while drifting over time. The proposed split of graph encoding for the crowd state and trajectory objective for the robot's path is a reasonable way to try closing those gaps.

The soft spot is the complete absence of any experimental substance. The abstract asserts outperformance on simulation and a real-world dataset across social metrics, yet supplies no baseline descriptions, no metric definitions, no ablation isolating the graph or trajectory pieces, and no numbers at all. The stress-test note is on target here; without those controls it is impossible to tell whether the new components actually produce reliable gains or whether error accumulation and interaction modeling are handled any better than before.

This is aimed at people already working on data-driven social navigation in robotics. A reader hunting for architecture ideas might pull the graph-plus-trajectory framing as a prompt for their own work, but the lack of evidence means it does not yet stand as a result.

It does not deserve a serious referee in this form. The central claim cannot be evaluated from the text provided.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes an imitation learning framework for socially compliant navigation of mobile robots. It features a graph-based auxiliary network that encodes crowd states by attending to individual pedestrians to capture social interactions, and a navigation module that incorporates encoded state predictions and uses a trajectory-level learning objective to model temporal dynamics and reduce error accumulation. The authors claim that this framework outperforms established data-driven baselines on both simulation and a real-world dataset across diverse social metrics.

Significance. If the experimental results can be substantiated with detailed methodology, ablations, and statistical analysis, the work would represent a useful contribution to social robot navigation by combining graph representations for spatial social context with trajectory learning to address limitations in prior RL and IL approaches.

major comments (1)

Abstract: The central claim that the proposed framework outperforms baselines is stated without any quantitative results, specific metrics, baseline descriptions, or error bars, making it impossible to evaluate whether the graph auxiliary network and trajectory-level objective provide the asserted benefits.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their review and the constructive comment on the abstract. We address the point below and will revise the manuscript accordingly.

Referee: Abstract: The central claim that the proposed framework outperforms baselines is stated without any quantitative results, specific metrics, baseline descriptions, or error bars, making it impossible to evaluate whether the graph auxiliary network and trajectory-level objective provide the asserted benefits.

Authors: We agree that the abstract would benefit from quantitative support for the performance claims. The full manuscript provides these details in the experimental sections (including specific social metrics, baseline comparisons, and error bars from both simulation and real-world evaluations). To directly address the concern, we will revise the abstract in the next version to include key quantitative highlights (e.g., relative improvements on metrics such as collision rate and social compliance scores) while remaining within length limits. This change will make the asserted benefits of the graph auxiliary network and trajectory-level objective easier to evaluate from the abstract alone.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework with no load-bearing derivations or self-referential predictions

The paper presents an imitation learning architecture combining a graph auxiliary network for social encoding and a trajectory-level objective with state predictions. No equations, uniqueness theorems, fitted parameters renamed as predictions, or self-citation chains appear in the provided abstract or description. The central claim is an empirical outperformance result on simulation and real-world data, which is falsifiable via external benchmarks and does not reduce to any definitional identity or fitted-input prediction. The derivation chain is therefore self-contained as a standard architectural proposal plus experimental validation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5707 in / 1049 out tokens · 26963 ms · 2026-07-02T21:41:35.193148+00:00 · methodology

