arxiv: 2112.02604 · v3 · submitted 2021-12-05 · 💻 cs.CV · cs.AI

PSI: A Benchmark for Human Interpretation and Response in Traffic Interactions

Pith reviewed 2026-05-24 13:07 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords pedestrian intention prediction · driver decision making · benchmark dataset · human explanations · autonomous driving · interpretable reasoning · traffic interactions · trajectory forecasting

The pith

The PSI benchmark supplies traffic scenes with evolving pedestrian intentions and human textual explanations to support interpretable driving models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PSI as a benchmark dataset that records how pedestrian crossing intentions change over time as seen by a driver. It includes human-written explanations that show the reasoning used to judge those intentions and to decide on driving actions. This combination allows testing of models on both prediction accuracy and the ability to produce human-aligned explanations. The benchmark defines standard tasks and metrics for intention prediction, decision modeling, reasoning generation, and trajectory forecasting.

Core claim

PSI is a benchmark dataset that captures the dynamic evolution of pedestrian crossing intentions from the driver's perspective, enriched with human textual explanations that reflect the reasoning behind intention estimation and driving decision making. These annotations support standardized tasks and evaluation protocols for models that combine predictive performance with interpretable and human-aligned reasoning.

What carries the argument

The PSI dataset's human textual explanations attached to evolving pedestrian intention labels and driving decisions.

If this is right

  • Models can be evaluated for both predictive accuracy and alignment with human reasoning processes.
  • Standardized protocols enable consistent benchmarking across intention prediction, decision modeling, and trajectory forecasting.
  • Autonomous systems can be developed to generate explanations that match human cognitive processes in traffic scenarios.
  • Research can focus on causal evaluation of how explanations relate to actual decisions.
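
As one illustration of what a standardized trajectory-forecasting protocol might compute, the sketch below scores a single predicted pedestrian path with average and final displacement error (ADE and FDE). These two metrics are common in the forecasting literature; that PSI's protocol uses them in exactly this form is an assumption here, and the function name `displacement_errors` and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def displacement_errors(pred: Sequence[Point], truth: Sequence[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against its ground truth.

    ADE is the mean Euclidean distance over all time steps;
    FDE is the Euclidean distance at the last time step.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


# Toy example (hypothetical coordinates): the prediction drifts away by one unit per step.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
print(ade, fde)  # per-step distances are 0, 1, 2, so ADE = 1.0 and FDE = 2.0
```

Reporting both numbers is useful because a model can keep ADE low while still ending far from the true endpoint, which FDE exposes.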

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending PSI to include multi-agent interactions could reveal how explanations scale to complex scenes.
  • Models using these explanations might improve trust in autonomous vehicles by providing understandable justifications.
  • Future work could test whether the dataset's scenes generalize beyond the collected environments.

Load-bearing premise

The collected human textual explanations accurately reflect the actual reasoning processes that humans use for intention estimation and driving decisions.

What would settle it

A study showing that the human explanations do not correlate with independent human judgments of the same scenes or that models using the explanations show no improvement in human alignment over label-only models.

Figures

Figures reproduced from arXiv: 2112.02604 by Heishiro Toyoda, Joshua Domeyer, Renran Tian, Rini Sherony, Taotao Jing, Tina Chen, Yaobin Chen, Zhengming Ding.

Figure 1: (a) Pedestrian Situated Intent Segmentation is shown as polygonal chains representing the estimated pedestrian intents to cross in front of the ego-vehicle. The intentions can switch from three states, namely “Cross”, “Not Cross”, and “Not Sure”. Each polygonal curve is the time-based estimation from one human driver/annotator, classifying the pedestrian’s situated intent into one of the three categories e…
Figure 1: Example of visual and cognitive annotations for a case that pedestrians did not cross in front of the ego-car eventually.
Figure 2: Pipeline of our proposed framework, with inputs of
Figure 3: Binary intent prediction results from the three methods in comparison with the ground-truth aggregated across 24 annotators for PSI.
Figure 4: Visualization of pedestrian intent and explanation prediction on PSI.
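
The captions describe one time-based intent track per annotator, with three possible states, and a ground truth aggregated across annotators. The sketch below shows one plausible way to aggregate such tracks, a per-frame majority vote. The aggregation rule, the tie-breaking choice (falling back to "Not Sure"), and the names `aggregate_intent` and `annotator_a` through `annotator_c` are assumptions for illustration; the page does not state how the paper aggregates labels.

```python
from collections import Counter
from typing import Dict, List

STATES = ("Cross", "Not Cross", "Not Sure")


def aggregate_intent(per_annotator: Dict[str, List[str]]) -> List[str]:
    """Majority-vote per-frame intent labels across annotators.

    Ties for the top count fall back to "Not Sure" (an assumed convention).
    """
    tracks = list(per_annotator.values())
    if not tracks:
        return []
    n_frames = len(tracks[0])
    if any(len(track) != n_frames for track in tracks):
        raise ValueError("all annotators must label the same frames")
    consensus = []
    for frame_labels in zip(*tracks):
        if any(label not in STATES for label in frame_labels):
            raise ValueError(f"unknown label in {frame_labels}")
        ranked = Counter(frame_labels).most_common()
        top_label, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        consensus.append("Not Sure" if tied else top_label)
    return consensus


# Three hypothetical annotators labelling the same four frames.
annotations = {
    "annotator_a": ["Not Sure", "Cross", "Cross", "Not Cross"],
    "annotator_b": ["Not Sure", "Cross", "Not Cross", "Not Cross"],
    "annotator_c": ["Cross", "Not Sure", "Not Sure", "Not Cross"],
}
consensus = aggregate_intent(annotations)
print(consensus)  # ['Not Sure', 'Cross', 'Not Sure', 'Not Cross']
```

The third frame shows the tie rule in action: each state receives one vote, so the consensus is "Not Sure" rather than an arbitrary winner.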
Original abstract

Accurately modeling pedestrian intention and understanding driver decision-making processes are critical for the development of safe and socially aware autonomous driving systems. We introduce PSI, a benchmark dataset that captures the dynamic evolution of pedestrian crossing intentions from the driver's perspective, enriched with human textual explanations that reflect the reasoning behind intention estimation and driving decision making. These annotations offer a unique foundation for developing and benchmarking models that combine predictive performance with interpretable and human-aligned reasoning. PSI supports standardized tasks and evaluation protocols across multiple dimensions, including pedestrian intention prediction, driver decision modeling, reasoning generation, trajectory forecasting, and more. By enabling causal and interpretable evaluation, PSI advances research toward autonomous systems that can reason, act, and explain in alignment with human cognitive processes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces the PSI benchmark dataset, which records dynamic pedestrian crossing intentions from the driver's perspective together with human textual explanations that articulate the reasoning behind intention estimates and driving decisions. It defines standardized tasks and evaluation protocols for pedestrian intention prediction, driver decision modeling, reasoning generation, trajectory forecasting, and related dimensions, with the goal of supporting causal and interpretable assessment of autonomous driving systems.

Significance. If the collection protocol, multi-annotator design, and task definitions ensure faithful reflection of human reasoning and representative coverage of traffic scenes, the dataset supplies a concrete resource for benchmarking models that jointly optimize predictive accuracy and human-aligned interpretability. The explicit documentation of annotation procedures and task specifications constitutes a clear strength for reproducibility in this domain.

minor comments (3)
  1. [§3] §3 (Data Collection): the description of scene selection criteria would benefit from an explicit statement of how representativeness across weather, time-of-day, and intersection types was quantified or enforced.
  2. [Table 1] Table 1 (Dataset Statistics): report inter-annotator agreement metrics (e.g., Fleiss' κ or pairwise agreement on intention labels and explanation overlap) to substantiate the claim of reliable human annotations.
  3. [§5] §5 (Evaluation Protocols): clarify whether the provided baselines include any ablation on the textual explanations or only on the visual features, as this directly affects the interpretability claim.
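
For readers unfamiliar with the agreement statistic requested in comment 2, the following is a minimal sketch of Fleiss' κ over the three intention states. The input layout (one row per labelled item, one column per state, each cell the number of annotators choosing that state) and the toy counts are assumptions for illustration, not figures from the paper.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    if not counts:
        raise ValueError("need at least one item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fall in each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_categories)]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]

    p_bar = sum(p_i) / n_items      # mean observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Columns: Cross, Not Cross, Not Sure. Four items, three hypothetical annotators each.
toy_counts = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [1, 1, 1],
]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 3))  # 22/82, about 0.268
```

A value near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance; the toy table mixes two unanimous items with two contested ones, hence the modest score.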

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is a dataset/benchmark paper introducing PSI with human annotations for pedestrian intention and driver decision-making. No mathematical derivations, fitted parameters, predictions, or uniqueness theorems are present. The central contribution is the dataset construction and task definitions, supported by explicit collection protocols and multi-annotator processes. No load-bearing steps reduce to self-definition, fitted inputs, or self-citation chains. This matches the default expectation for non-circular papers and the reader's assessment that no derivations exist.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model or derivation is present; the paper is a dataset contribution with no free parameters, axioms, or invented entities identified from the abstract.

pith-pipeline@v0.9.0 · 5671 in / 1043 out tokens · 23270 ms · 2026-05-24T13:07:42.801410+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Beyond Fixed Thresholds and Domain-Specific Benchmarks for Explainable Multi-Task Classification in Autonomous Vehicles

    cs.CV · 2026-05 · unverdicted · novelty 4.0

    Adaptive confidence threshold selection improves F1 scores in explainable multi-task classification for autonomous driving and is supported by a new 958-image dataset.
