Eye Gaze-Informed and Context-Aware Pedestrian Trajectory Prediction in Shared Spaces with Automated Shuttles: A Virtual Reality Study

· 2026 · cs.LG · arXiv 2603.19812

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

To address this gap, we conduct a Virtual Reality experiment in which pedestrians interact with automated shuttles under varying approach angles (45{\deg}, 90{\deg}, 135{\deg}) and continuous-traffic conditions (single shuttle, two shuttles with 3 or 5-second gaps), collecting synchronized motion, eye gaze, and head orientation data. To investigate to what extent, under what conditions, and in what form fine-grained eye gaze is informative for pedestrian motion prediction, we develop a multi-modal prediction model that fuses these signals through modality-specific encoders, and systematically ablate gaze representations against head orientation and situational context. We report three main results. First, the predictive value of eye gaze is angle-dependent and tightly coupled with eye-head-body coordination: at acute angles where pedestrians actively redirect gaze to acquire the shuttle, eye gaze carries information that head orientation alone misses. Second, continuous gaze orientation outperforms categorical semantic fixation labels, with the optimal encoding frame (global or body-relative) depending on whether gaze is used alone or jointly with context. Third, eye gaze and situational context provide complementary predictive information: their combination reduces final displacement error (FDE) by 8.47%, close to the sum of their individual contributions. Together, these findings highlight the value of incorporating human perceptual signals into pedestrian behavior prediction and motivate a human-centered complement to vehicle-centric modeling approaches. Our code is available at https://github.com/danyayay/GazeX.git.

representative citing papers

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

cs.CV · 2026-06-08 · unverdicted · novelty 5.0

Fine-tuned VLMs guided by eye gaze and ego motion achieve 14.5% accuracy improvement over a transformer baseline for egocentric pedestrian intent decoding.

citing papers explorer

Showing 1 of 1 citing paper.

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models cs.CV · 2026-06-08 · unverdicted · none · ref 49 · internal anchor
Fine-tuned VLMs guided by eye gaze and ego motion achieve 14.5% accuracy improvement over a transformer baseline for egocentric pedestrian intent decoding.

Eye Gaze-Informed and Context-Aware Pedestrian Trajectory Prediction in Shared Spaces with Automated Shuttles: A Virtual Reality Study

fields

years

verdicts

representative citing papers

citing papers explorer