PSI: A Benchmark for Human Interpretation and Response in Traffic Interactions

Heishiro Toyoda; Joshua Domeyer; Renran Tian; Rini Sherony; Taotao Jing; Tina Chen; Yaobin Chen; Zhengming Ding

arxiv: 2112.02604 · v3 · submitted 2021-12-05 · 💻 cs.CV · cs.AI

Taotao Jing, Tina Chen, Renran Tian, Yaobin Chen, Joshua Domeyer, Heishiro Toyoda, Rini Sherony, Zhengming Ding

Pith reviewed 2026-05-24 13:07 UTC · model grok-4.3

keywords: pedestrian intention prediction · driver decision making · benchmark dataset · human explanations · autonomous driving · interpretable reasoning · traffic interactions · trajectory forecasting

The pith

The PSI benchmark supplies traffic scenes with evolving pedestrian intentions and human textual explanations to support interpretable driving models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PSI as a benchmark dataset that records how pedestrian crossing intentions change over time as seen by a driver. It includes human-written explanations that show the reasoning used to judge those intentions and to decide on driving actions. This combination allows testing of models on both prediction accuracy and the ability to produce human-aligned explanations. The benchmark defines standard tasks and metrics for intention prediction, decision modeling, reasoning generation, and trajectory forecasting.
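For the trajectory forecasting task named above, displacement errors are the usual yardstick. The following is a minimal sketch of average and final displacement error (ADE and FDE) in pixel or metre coordinates; the function name `displacement_errors` and the list-of-points input format are illustrative assumptions, not the benchmark's own evaluation code.

```python
import math


def displacement_errors(pred, truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) points.

    ADE is the mean Euclidean distance over all predicted steps;
    FDE is the Euclidean distance at the last predicted step.
    Hypothetical helper, not taken from the PSI code base.
    """
    if not pred or len(pred) != len(truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [
        math.hypot(px - tx, py - ty)
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    ade = sum(dists) / len(dists)
    fde = dists[-1]
    return ade, fde


# Toy example: the prediction stays on the x-axis while the truth drifts upward.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade, fde = displacement_errors(pred, truth)
```

Both numbers are lower-is-better. Reporting them at several horizons simply means truncating `pred` and `truth` to the corresponding number of steps before calling the function.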

Core claim

PSI is a benchmark dataset that captures the dynamic evolution of pedestrian crossing intentions from the driver's perspective, enriched with human textual explanations that reflect the reasoning behind intention estimation and driving decision making. These annotations support standardized tasks and evaluation protocols for models that combine predictive performance with interpretable and human-aligned reasoning.

What carries the argument

The PSI dataset's human textual explanations attached to evolving pedestrian intention labels and driving decisions.

If this is right

Models can be evaluated for both predictive accuracy and alignment with human reasoning processes.
Standardized protocols enable consistent benchmarking across intention prediction, decision modeling, and trajectory forecasting.
Autonomous systems can be developed to generate explanations that match human cognitive processes in traffic scenarios.
Research can focus on causal evaluation of how explanations relate to actual decisions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Extending PSI to include multi-agent interactions could reveal how explanations scale to complex scenes.
Models using these explanations might improve trust in autonomous vehicles by providing understandable justifications.
Future work could test whether the dataset's scenes generalize beyond the collected environments.

Load-bearing premise

The collected human textual explanations accurately reflect the actual reasoning processes that humans use for intention estimation and driving decisions.

What would settle it

A study showing that the human explanations do not correlate with independent human judgments of the same scenes or that models using the explanations show no improvement in human alignment over label-only models.

Figures

Figures reproduced from arXiv: 2112.02604 by Heishiro Toyoda, Joshua Domeyer, Renran Tian, Rini Sherony, Taotao Jing, Tina Chen, Yaobin Chen, Zhengming Ding.

**Figure 1.** (a) Pedestrian Situated Intent Segmentation is shown as polygonal chains representing the estimated pedestrian intents to cross in front of the ego-vehicle. The intentions can switch from three states, namely “Cross”, “Not Cross”, and “Not Sure”. Each polygonal curve is the time-based estimation from one human driver/annotator, classifying the pedestrian’s situated intent into one of the three categories e…

**Figure 1.** Example of visual and cognitive annotations for a case that pedestrians did not cross in front of the ego-car eventually.

**Figure 2.** Pipeline of our proposed framework, with inputs of …

**Figure 3.** Binary intent prediction results from the three methods in comparison with the ground-truth aggregated across 24 annotators for PSI.

**Figure 4.** Visualization of pedestrian intent and explanation prediction on PSI.
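The captions above describe per-annotator intent sequences over three states and a ground truth aggregated across annotators. As a rough illustration, the sketch below aggregates such sequences frame by frame with a majority vote. The voting rule, the tie-break to "not_sure", and all names are assumptions made for this example; the page does not state how the paper actually aggregates labels.

```python
from collections import Counter

# The three intent states named in the Figure 1 caption (identifiers are illustrative).
STATES = ("cross", "not_cross", "not_sure")


def aggregate_frame(labels):
    """Majority vote over one frame's labels; a tie for first place yields 'not_sure'."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    unknown = set(counts) - set(STATES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "not_sure"  # assumed tie-break, not taken from the paper
    return ranked[0][0]


def aggregate_sequence(per_annotator):
    """Aggregate equal-length label sequences, one sequence per annotator."""
    lengths = {len(seq) for seq in per_annotator}
    if len(lengths) != 1:
        raise ValueError("all annotators must label the same number of frames")
    return [aggregate_frame(frame) for frame in zip(*per_annotator)]


# Three hypothetical annotators labelling three frames.
annotators = [
    ["not_sure", "cross", "cross"],
    ["not_cross", "cross", "not_cross"],
    ["not_cross", "not_sure", "cross"],
]
consensus = aggregate_sequence(annotators)
```

A soft alternative would keep the per-frame vote fractions instead of a single label, which preserves annotator disagreement for models that predict distributions.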

Original abstract

Accurately modeling pedestrian intention and understanding driver decision-making processes are critical for the development of safe and socially aware autonomous driving systems. We introduce PSI, a benchmark dataset that captures the dynamic evolution of pedestrian crossing intentions from the driver's perspective, enriched with human textual explanations that reflect the reasoning behind intention estimation and driving decision making. These annotations offer a unique foundation for developing and benchmarking models that combine predictive performance with interpretable and human-aligned reasoning. PSI supports standardized tasks and evaluation protocols across multiple dimensions, including pedestrian intention prediction, driver decision modeling, reasoning generation, and trajectory forecasting and more. By enabling causal and interpretable evaluation, PSI advances research toward autonomous systems that can reason, act, and explain in alignment with human cognitive processes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague.

PSI is a new dataset pairing time-varying pedestrian intention labels with human textual explanations from the driver's view, and the collection looks carefully done.

The main contribution is the PSI benchmark: scenes annotated for evolving pedestrian crossing intentions plus free-text human explanations of the reasoning and driver decisions. The paper documents a collection protocol, multiple annotators per scene, and clear task definitions for intention prediction, decision modeling, reasoning generation, and trajectory forecasting. That construction is straightforward and avoids internal contradictions or circular claims. The stress-test note confirms the protocol handles the obvious worries about whether the explanations track real reasoning and whether the scenes are representative, with no load-bearing gaps apparent in the argument itself.

What the work does well is give the community a standardized setup for testing models that need both prediction and human-aligned explanations in traffic. Soft spots are the standard dataset ones and not severe. We still need usage to show whether the added explanations actually improve interpretability or generalization, and the abstract itself supplies no inter-annotator numbers or error analysis. The paper does not claim new algorithms or prove the annotations are perfect, just that they exist under a documented process.

This is for researchers working on interpretable autonomous driving or human-AI alignment in traffic scenes. A reader who needs data for explanation-aware intention models would get concrete value. It deserves peer review because new benchmarks in safety-critical areas need external checks on annotation quality and task coverage.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces the PSI benchmark dataset, which records dynamic pedestrian crossing intentions from the driver's perspective together with human textual explanations that articulate the reasoning behind intention estimates and driving decisions. It defines standardized tasks and evaluation protocols for pedestrian intention prediction, driver decision modeling, reasoning generation, trajectory forecasting, and related dimensions, with the goal of supporting causal and interpretable assessment of autonomous driving systems.

Significance. If the collection protocol, multi-annotator design, and task definitions ensure faithful reflection of human reasoning and representative coverage of traffic scenes, the dataset supplies a concrete resource for benchmarking models that jointly optimize predictive accuracy and human-aligned interpretability. The explicit documentation of annotation procedures and task specifications constitutes a clear strength for reproducibility in this domain.

minor comments (3)

§3 (Data Collection): the description of scene selection criteria would benefit from an explicit statement of how representativeness across weather, time-of-day, and intersection types was quantified or enforced.
Table 1 (Dataset Statistics): report inter-annotator agreement metrics (e.g., Fleiss' κ or pairwise agreement on intention labels and explanation overlap) to substantiate the claim of reliable human annotations.
§5 (Evaluation Protocols): clarify whether the provided baselines include any ablation on the textual explanations or only on the visual features, as this directly affects the interpretability claim.
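To make the second comment concrete, here is a minimal sketch of Fleiss' κ for categorical intention labels. The input layout (one row per rated frame, one column per category, each cell holding the number of annotators who chose that category) and the function name are assumptions for illustration; nothing here reproduces numbers from the paper.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] = number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least 2).
    """
    if not counts:
        raise ValueError("at least one item is required")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_cats = len(counts[0])

    # Observed agreement per item, then averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_exp = sum(p * p for p in p_cats)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_exp) / (1.0 - p_exp)


# Columns: cross, not_cross, not_sure; three hypothetical raters per frame.
perfect = fleiss_kappa([[3, 0, 0], [0, 3, 0]])
mixed = fleiss_kappa([[2, 1, 0], [1, 2, 0]])
```

In the toy data, full agreement on every frame gives κ = 1, while the split votes in the second call give a negative κ, meaning agreement below what the category proportions alone would produce.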

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

This is a dataset/benchmark paper introducing PSI with human annotations for pedestrian intention and driver decision-making. No mathematical derivations, fitted parameters, predictions, or uniqueness theorems are present. The central contribution is the dataset construction and task definitions, supported by explicit collection protocols and multi-annotator processes. No load-bearing steps reduce to self-definition, fitted inputs, or self-citation chains. This matches the default expectation for non-circular papers and the reader's assessment that no derivations exist.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model or derivation is present; the paper is a dataset contribution with no free parameters, axioms, or invented entities identified from the abstract.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beyond Fixed Thresholds and Domain-Specific Benchmarks for Explainable Multi-Task Classification in Autonomous Vehicles
cs.CV · 2026-05 · unverdicted · novelty 4.0

Adaptive confidence threshold selection improves F1 scores in explainable multi-task classification for autonomous driving and is supported by a new 958-image dataset.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · cited by 1 Pith paper · 1 internal anchor


[1] [1]

Exploratory analysis of automated vehicle crashes in california: A text analytics & hierarchical bayesian heterogeneity-based approach,

A. M. Boggs, B. Wali, and A. J. Khattak, “Exploratory analysis of automated vehicle crashes in california: A text analytics & hierarchical bayesian heterogeneity-based approach,” Accident Analysis & Preven- tion, vol. 135, p. 105354, 2020

work page 2020

[2] [2]

Vehicle automation–other road user communication and coordination: Theory and mechanisms,

J. E. Domeyer, J. D. Lee, and H. Toyoda, “Vehicle automation–other road user communication and coordination: Theory and mechanisms,” IEEE Access, vol. 8, pp. 19 860–19 872, 2020

work page 2020

[3] [3]

Attentional-gcnn: Adaptive pedestrian trajectory prediction towards generic autonomous vehicle use cases,

K. Li, S. Eiffert, M. Shan, F. Gomez-Donoso, S. Worrall, and E. Nebot, “Attentional-gcnn: Adaptive pedestrian trajectory prediction towards generic autonomous vehicle use cases,” in 2021 IEEE International Conference on Robotics and Automation (ICRA) . IEEE, 2021, pp. 14 241–14 247

work page 2021

[4] [4]

Path planning for intelligent vehicle collision avoidance of dynamic pedestrian using att-lstm, msfm and mpc at un-signalized crosswalk,

H. Chen and X. Zhang, “Path planning for intelligent vehicle collision avoidance of dynamic pedestrian using att-lstm, msfm and mpc at un-signalized crosswalk,” IEEE Transactions on Industrial Electronics , 2021

work page 2021

[5] [5]

Dynamic path planning for autonomous driving on branch streets with crossing pedestrian avoidance guidance,

W. Wu, H. Jia, Q. Luo, and Z. Wang, “Dynamic path planning for autonomous driving on branch streets with crossing pedestrian avoidance guidance,” IEEE Access, vol. 7, pp. 144 720–144 731, 2019. 9 TABLE III COMPARISON OF PREDICTION RESULTS FOR DIFFERENT BASELINE MODELS ON PSI DATASETS FOR PREDICTING PEDESTRIAN TRAJECTORY . 0.5s 1.0s 1.5s ADE↓ FDE↓ ARB↓ FRB...

work page 2019

[6] [6]

Autonovi: Autonomous vehicle planning with dynamic maneuvers and trafﬁc constraints,

A. Best, S. Narang, D. Barber, and D. Manocha, “Autonovi: Autonomous vehicle planning with dynamic maneuvers and trafﬁc constraints,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 2629–2636

work page 2017

[7] [7]

A survey on deep-learning methods for pedestrian behavior prediction from the egocentric view,

T. Chen and R. Tian, “A survey on deep-learning methods for pedestrian behavior prediction from the egocentric view,” in 2021 IEEE Interna- tional Intelligent Transportation Systems Conference (ITSC) . IEEE, 2021, pp. 1898–1905

work page 2021

[8] [8]

Developing socially acceptable au- tonomous vehicles,

E. Vinkhuyzen and M. Cefkin, “Developing socially acceptable au- tonomous vehicles,” in Ethnographic Praxis in Industry Conference Proceedings, vol. 2016, no. 1. Wiley Online Library, 2016, pp. 522– 534

work page 2016

[9] [9]

Why autonomous driving is so hard: The social di- mension of trafﬁc,

H. R. Pelikan, “Why autonomous driving is so hard: The social di- mension of trafﬁc,” in Companion of the 2021 ACM/IEEE International Conference on Human-Robot Interaction , 2021, pp. 81–85

work page 2021

[10] [10]

Pedestrian crossing in- tention prediction at red-light using pose estimation,

S. Zhang, M. Abdel-Aty, Y . Wu, and O. Zheng, “Pedestrian crossing in- tention prediction at red-light using pose estimation,” IEEE Transactions on Intelligent Transportation Systems , 2021

work page 2021

[11] [11]

Pie: A large-scale dataset and models for pedestrian intention estimation and trajectory prediction,

A. Rasouli, I. Kotseruba, T. Kunic, and J. K. Tsotsos, “Pie: A large-scale dataset and models for pedestrian intention estimation and trajectory prediction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision , 2019, pp. 6262–6271

work page 2019

[12] [12]

The theory of planned behavior,

I. Ajzen, “The theory of planned behavior,” Organizational behavior and human decision processes , vol. 50, no. 2, pp. 179–211, 1991

work page 1991

[13] D. Dey and J. Terken, “Pedestrian interaction with vehicles: roles of explicit and implicit communication,” in Proceedings of the 9th international conference on automotive user interfaces and interactive vehicular applications, 2017, pp. 109–113.

[14] M. Sucha, D. Dostal, and R. Risser, “Pedestrian-driver communication and decision strategies at marked crossings,” Accident Analysis & Prevention, vol. 102, pp. 41–50, 2017.

[15] R. M. Hogarth, T. Lejarraga, and E. Soyer, “The two settings of kind and wicked learning environments,” Current Directions in Psychological Science, vol. 24, no. 5, pp. 379–385, 2015.

[16] É. Zablocki, H. Ben-Younes, P. Pérez, and M. Cord, “Explainability of vision-based autonomous driving systems: Review and challenges,” arXiv preprint arXiv:2101.05307, 2021.

[17] J. Kim, A. Rohrbach, T. Darrell, J. Canny, and Z. Akata, “Textual explanations for self-driving vehicles,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 563–578.

[18] A. Rasouli, I. Kotseruba, and J. K. Tsotsos, “Understanding pedestrian behavior in complex traffic scenes,” IEEE Transactions on Intelligent Vehicles, vol. 3, no. 1, pp. 61–70, 2017.

[19] M. F. Elahi, J. G. Sreeram, X. Luo, and R. Tian, “A novel adaptation of information extraction algorithm to process natural text descriptions of pedestrian encounters,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021, pp. 1906–1912.

[20] Z. Hou, X. Peng, Y. Qiao, and D. Tao, “Visual compositional learning for human-object interaction detection,” in European Conference on Computer Vision. Springer, 2020, pp. 584–600.

[21] K. Kato, Y. Li, and A. Gupta, “Compositional learning for human object interaction,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 234–251.

[22] M. A. Elliott, C. J. Armitage, and C. J. Baughan, “Drivers’ compliance with speed limits: an application of the theory of planned behavior.” Journal of Applied Psychology, vol. 88, no. 5, p. 964, 2003.

[23] G. Sukthankar, C. Geib, H. H. Bui, D. Pynadath, and R. P. Goldman, Plan, activity, and intent recognition: Theory and practice. Newnes, 2014.

[24] G. Markkula, R. Madigan, D. Nathanael, E. Portouli, Y. M. Lee, A. Dietrich, J. Billington, A. Schieben, and N. Merat, “Defining interactions: A conceptual framework for understanding interactive behaviour in human and automated road traffic,” Theoretical Issues in Ergonomics Science, vol. 21, no. 6, pp. 728–752, 2020.

[25] L. A. Suchman, Plans and situated actions: The problem of human-machine communication. Cambridge university press, 1987.

[26] I. Kotseruba, A. Rasouli, and J. K. Tsotsos, “Joint attention in autonomous driving (jaad),” arXiv preprint arXiv:1609.04741, 2016.

[27] B. Liu, E. Adeli, Z. Cao, K.-H. Lee, A. Shenoi, A. Gaidon, and J. C. Niebles, “Spatiotemporal relationship reasoning for pedestrian intent prediction,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3485–3492, 2020.

[28] S. Malla, B. Dariush, and C. Choi, “Titan: Future forecast using action priors,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 11 186–11 196.

[29] W. Kim, M. S. Ramanagopal, C. Barto, M.-Y. Yu, K. Rosaen, N. Goumas, R. Vasudevan, and M. Johnson-Roberson, “Pedx: Benchmark dataset for metric 3-d pose estimation of pedestrians in complex urban intersections,” IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1940–1947, 2019.

[30] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell, “Bdd100k: A diverse driving video database with scalable annotation tooling,” arXiv preprint arXiv:1805.04687, vol. 2, no. 5, p. 6, 2018.

[31] X. Huang, P. Wang, X. Cheng, D. Zhou, Q. Geng, and R. Yang, “The apolloscape open dataset for autonomous driving and its application,” IEEE transactions on pattern analysis and machine intelligence, vol. 42, no. 10, pp. 2702–2719, 2019.

[32] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 621–11 631.

[33] J. Geyer, Y. Kassahun, M. Mahmudi, X. Ricou, R. Durgesh, A. S. Chung, L. Hauswald, V. H. Pham, M. Mühlegg, S. Dorn et al., “A2d2: Audi autonomous driving dataset,” arXiv preprint arXiv:2004.06320, 2020.

[34] M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan et al., “Argoverse: 3d tracking and forecasting with rich maps,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8748–8757.

[35] S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai, B. Sapp, C. Qi, Y. Zhou et al., “Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset,” arXiv preprint arXiv:2104.10133, 2021.

[36] R. Tian, L. Li, K. Yang, S. Chien, Y. Chen, and R. Sherony, “Estimation of the vehicle-pedestrian encounter/conflict risk on the road based on tasi 110-car naturalistic driving data collection,” in 2014 IEEE Intelligent Vehicles Symposium Proceedings. IEEE, 2014, pp. 623–629.

[37] H. Xu, Y. Gao, F. Yu, and T. Darrell, “End-to-end learning of driving models from large-scale video datasets,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2174–2182.

[38] Y. Xu, X. Yang, L. Gong, H.-C. Lin, T.-Y. Wu, Y. Li, and N. Vasconcelos, “Explainable object-induced action decision for autonomous vehicles,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9523–9532.

[39] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in European conference on computer vision. Springer, 2014, pp. 740–755.

[40] J. Q. Sargent, J. M. Zacks, D. Z. Hambrick, R. T. Zacks, C. A. Kurby, H. R. Bailey, M. L. Eisenberg, and T. M. Beck, “Event segmentation ability uniquely predicts event memory,” Cognition, vol. 129, no. 2, pp. 241–255, 2013.

[41] S. Schmidt and B. Faerber, “Pedestrians at the kerb–recognising the action intentions of humans,” Transportation research part F: traffic psychology and behaviour, vol. 12, no. 4, pp. 300–310, 2009.

[42] J. M. Zacks and K. M. Swallow, “Event segmentation,” Current directions in psychological science, vol. 16, no. 2, pp. 80–84, 2007.

[43] C. A. Kurby and J. M. Zacks, “Segmentation in the perception and memory of events,” Trends in cognitive sciences, vol. 12, no. 2, pp. 72–79, 2008.

[44] J. Hohwy, A. Hebblewhite, and T. Drummond, “Events, event prediction, and predictive processing,” Topics in cognitive science, vol. 13, no. 1, pp. 252–255, 2021.

[45] D. A. Baldwin and J. E. Kosie, “How does the mind render streaming experience as events?” Topics in Cognitive Science, vol. 13, no. 1, pp. 79–105, 2021.

[46] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.

[47] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social gan: Socially acceptable trajectories with generative adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2255–2264.

[48] A. Rasouli, M. Rohani, and J. Luo, “Bifold and semantic reasoning for pedestrian behavior prediction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 15 600–15 610.

[49] J. Liang, L. Jiang, J. C. Niebles, A. G. Hauptmann, and L. Fei-Fei, “Peeking into the future: Predicting future person activities and locations in videos,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 5725–5734.

[50] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

[51] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 248–255.

[52] T. Chen, R. Tian, and Z. Ding, “Visual reasoning using graph convolutional networks for predicting pedestrian crossing intention,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3103–3109.