EgoTraj: Real-World Egocentric Human Trajectory Dataset for Multimodal Prediction
Pith reviewed 2026-05-20 11:08 UTC · model grok-4.3
The pith
EgoTraj introduces 75 real-world sequences of egocentric urban navigation with synchronized head poses, gaze, and scene data to support multimodal trajectory prediction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EgoTraj consists of 75 sequences of human navigation collected from multiple Meta Quest Pro wearers in real-world urban environments. Each sequence provides synchronized RGB video together with ground-truth, continuously time-synchronized 6-degree-of-freedom (6DoF) head poses, per-frame 3D eye-gaze vectors, and scene annotations. According to the authors, it differs from typical egocentric trajectory datasets by capturing long-horizon, self-directed navigation across diverse urban routes with broad participant diversity.
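To make the head-pose signal concrete, the sketch below reduces 6DoF head poses to the 2D ground-plane track (position plus heading) that most trajectory predictors consume. It is a minimal illustration under stated assumptions, not EgoTraj's own tooling. The record layout `(timestamp, position, quaternion (x, y, z, w))`, the y-up world frame, and the use of the head's local +z axis as "forward" (Unity's convention) are all assumptions, and the released logs may differ.

```python
import math
from typing import List, Tuple

# Hypothetical record layout: (timestamp_s, (px, py, pz), (qx, qy, qz, qw)).
Pose = Tuple[float, Tuple[float, float, float], Tuple[float, float, float, float]]


def ground_plane_track(poses: List[Pose]) -> List[Tuple[float, float, float, float]]:
    """Project 6DoF head poses to (t, x, z, heading_rad) on the ground plane.

    Assumes a y-up world frame in which the head looks along its local +z axis.
    """
    track = []
    for t, (px, _py, pz), (qx, qy, qz, qw) in poses:
        # Third column of the quaternion's rotation matrix: the head's +z axis in world coordinates.
        fwd_x = 2.0 * (qx * qz + qw * qy)
        fwd_z = 1.0 - 2.0 * (qx * qx + qy * qy)
        # Yaw about the vertical axis; 0 rad means facing world +z.
        # Unreliable when the wearer looks almost straight up or down (fwd_x, fwd_z near 0).
        heading = math.atan2(fwd_x, fwd_z)
        track.append((t, px, pz, heading))  # head height (py) is dropped
    return track
```

Dropping head height and keeping only yaw matches planar trajectory benchmarks; a predictor that uses head pitch or gaze would keep the full quaternion instead.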
What carries the argument
The EgoTraj dataset, which supplies synchronized multimodal signals from real urban navigation to train and evaluate trajectory prediction models.
If this is right
- Prediction models gain access to combined gaze, scene, and motion cues, each of which the ablation studies show improves performance.
- The dataset enables direct benchmarking of state-of-the-art egocentric trajectory methods on long-horizon real-world data.
- Applications in AR perception, navigation assistance, and humanoid robotics obtain a public resource for development and testing.
- Open release of sequences, code, and the EgoViz Dashboard allows community extension of the multimodal approach.
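A minimal sketch of how the modality ablations could be organized follows. It assumes, hypothetically, that past motion is always an input while gaze and scene are toggled on and off. The configurations can then be enumerated and rendered as the kind of single summary table the referee requests below. The modality names and the `ablation_configs`/`summary_table` helpers are illustrative and are not the paper's code.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical modality names; motion (past trajectory) is assumed to be always present.
MODALITIES: Tuple[str, ...] = ("motion", "gaze", "scene")


def ablation_configs(modalities: Sequence[str] = MODALITIES,
                     required: Sequence[str] = ("motion",)) -> List[Tuple[str, ...]]:
    """Enumerate every input combination that keeps the required modalities."""
    optional = [m for m in modalities if m not in required]
    configs = []
    for k in range(len(optional) + 1):
        for subset in combinations(optional, k):
            configs.append(tuple(required) + subset)
    return configs


def summary_table(configs: List[Tuple[str, ...]],
                  modalities: Sequence[str] = MODALITIES) -> str:
    """Render one row per configuration with an 'x' for each enabled modality."""
    lines = ["config | " + " | ".join(modalities)]
    for i, cfg in enumerate(configs):
        marks = ["x" if m in cfg else "-" for m in modalities]
        lines.append(f"C{i} | " + " | ".join(marks))
    return "\n".join(lines)
```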
Where Pith is reading between the lines
- The long-horizon nature of the sequences could support training of predictors that operate over longer time windows than most current short-term models.
- Broad participant diversity may help future systems generalize across different walking styles and body types without additional data collection.
- Integration of EgoTraj with existing third-person or simulated trajectory datasets could produce hybrid training regimes that combine real egocentric signals with scale.
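Whether longer horizons are usable depends on how the long sequences are cut into training samples. Below is a minimal sketch of the usual sliding-window split. `obs_len`, `pred_len`, and `stride` are illustrative parameters counted in samples, not values from the paper. The prediction horizon in seconds is `pred_len` divided by the (assumed) sampling rate.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sliding_windows(seq: Sequence[T], obs_len: int, pred_len: int,
                    stride: int = 1) -> List[Tuple[List[T], List[T]]]:
    """Cut one long sequence into (observed, future) training pairs."""
    if obs_len <= 0 or pred_len <= 0 or stride <= 0:
        raise ValueError("obs_len, pred_len and stride must be positive")
    span = obs_len + pred_len
    pairs = []
    # Sequences shorter than one full window yield no pairs.
    for start in range(0, len(seq) - span + 1, stride):
        observed = list(seq[start:start + obs_len])
        future = list(seq[start + obs_len:start + span])
        pairs.append((observed, future))
    return pairs
```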
Load-bearing premise
The ground-truth 6DoF head poses, eye gaze vectors, and scene annotations provided by the Meta Quest Pro are sufficiently accurate and time-synchronized for training and evaluating trajectory prediction models.
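If an external reference were available, one simple check of the gaze part of this premise would be the angular error between the headset's gaze vector and a known reference direction, for example toward a calibration target at a surveyed position. The sketch below computes that error. The reference directions are hypothetical, since the source does not say that EgoTraj includes them.

```python
import math
from typing import Iterable, Sequence, Tuple


def angular_error_deg(measured: Sequence[float], reference: Sequence[float]) -> float:
    """Angle in degrees between two 3D gaze direction vectors (need not be unit length)."""
    norm_m = math.sqrt(sum(c * c for c in measured))
    norm_r = math.sqrt(sum(c * c for c in reference))
    if norm_m == 0.0 or norm_r == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    cos_angle = sum(a * b for a, b in zip(measured, reference)) / (norm_m * norm_r)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def mean_angular_error_deg(
        pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average angular error over (measured, reference) pairs."""
    errors = [angular_error_deg(m, r) for m, r in pairs]
    return sum(errors) / len(errors)
```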
What would settle it
A test in which models trained on EgoTraj produce higher error rates than models trained on prior datasets when evaluated on held-out real urban walks, or direct measurements revealing significant timing offsets or pose inaccuracies in the released ground-truth tracks.
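The "higher error rates" in such a test would most plausibly be measured with the displacement metrics common in trajectory prediction. Average and final displacement error (ADE/FDE) are assumed here, because this summary does not state which metrics the paper reports. The sketch computes both for a single predicted track on the ground plane.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ade_fde(pred: Sequence[Point], gt: Sequence[Point]) -> Tuple[float, float]:
    """Average and final displacement error between two equally long 2D trajectories."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]  # per-step Euclidean error
    return sum(errors) / len(errors), errors[-1]
```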
Original abstract
Accurately forecasting human trajectories from an egocentric perspective plays a central role in applications such as humanoid robotics, wearable sensing systems, and assistive navigation. However, progress in this direction remains limited due to the scarcity of egocentric trajectory datasets collected in real-world environments. Addressing this need, we introduce EgoTraj, an egocentric multimodal open dataset recorded using Meta Quest Pro (MQPro). EgoTraj contains 75 sequences of human navigation collected from multiple MQPro wearers in real-world urban environments. Each recording provides synchronized RGB video along with ground-truth data, including continuous time-synchronized 6-degree-of-freedom head poses, per-frame 3D eye gaze vectors, and scene annotations. To the best of our knowledge, EgoTraj differs from typical egocentric trajectory datasets by capturing long-horizon, self-directed navigation across diverse urban routes with broad participant diversity. To demonstrate the potential of the dataset, we benchmark several state-of-the-art methods for egocentric trajectory prediction and conduct ablation studies to analyze the contributions of gaze, scene, and motion cues. The results highlight the utility of EgoTraj for AR-based perception, navigation, and assistive systems. The EgoTraj dataset, code, and EgoViz Dashboard are publicly available at https://github.com/yehiahmad/EgoTraj.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EgoTraj, an egocentric multimodal dataset of 75 real-world urban navigation sequences recorded with Meta Quest Pro headsets. Each sequence supplies synchronized RGB video together with ground-truth 6DoF head poses, per-frame 3D eye-gaze vectors, and scene annotations. The authors benchmark several existing trajectory-prediction methods and run ablations that isolate the contribution of gaze, scene, and motion cues.
Significance. If the supplied ground-truth labels are shown to be sufficiently accurate and synchronized, the dataset would be a useful addition for multimodal egocentric prediction research, particularly because it targets long-horizon, self-directed routes with participant diversity. The public release of raw data, code, and the EgoViz Dashboard supports reproducibility without introducing new fitted parameters or circular derivations.
major comments (1)
- [Dataset Collection / Abstract] The central claim that MQPro supplies usable ground-truth 6DoF head poses, 3D eye-gaze vectors, and scene annotations for training and evaluating trajectory models is not supported by any reported accuracy statistics, drift measurements, or external validation for long outdoor sequences under varying illumination and motion. Consumer headsets are known to accumulate error in GPS-denied conditions; without per-sequence error metrics, the utility of the released labels cannot be assessed.
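One form such per-sequence metrics could take is endpoint drift as a percentage of distance travelled. This assumes an external reference position for the end of each route (for example, a surveyed waypoint), which the released data are not said to include. A minimal sketch with hypothetical helper names:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def path_length(track: Sequence[Point]) -> float:
    """Total distance travelled along a 2D track, in the track's units."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def endpoint_drift_percent(track: Sequence[Point], reference_end: Point) -> float:
    """Endpoint error as a percentage of path length.

    reference_end is an assumed external ground-truth position for the route's end.
    """
    length = path_length(track)
    if length == 0.0:
        raise ValueError("track has zero length")
    return 100.0 * math.dist(track[-1], reference_end) / length
```

Normalizing by path length lets long and short routes be compared on one scale. It captures only accumulated position error at the end of a route, not intermediate excursions.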
minor comments (2)
- [Abstract] The total recording duration and aggregate path length across the 75 sequences should be stated explicitly so readers can judge scale.
- [Experiments] Ablation tables would be clearer if the exact input modalities supplied to each baseline method were listed in a single summary table.
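Both quantities requested in the first minor comment can be computed directly from the pose tracks. The sketch below assumes hypothetical per-sequence samples of `(timestamp_s, x_m, z_m)` on the ground plane, in seconds and metres. The actual EgoTraj file layout and units are not given here.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-sequence samples: (timestamp_s, x_m, z_m) on the ground plane.
Sample = Tuple[float, float, float]


def dataset_scale(sequences: Dict[str, List[Sample]]) -> Tuple[float, float]:
    """Return (total recording duration in seconds, aggregate path length in metres)."""
    total_s = 0.0
    total_m = 0.0
    for samples in sequences.values():
        if len(samples) < 2:
            continue  # a single sample has no duration or length
        total_s += samples[-1][0] - samples[0][0]
        # Sum of straight-line steps between consecutive ground-plane positions.
        total_m += sum(math.dist(a[1:], b[1:]) for a, b in zip(samples, samples[1:]))
    return total_s, total_m
```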
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on the validation of the ground-truth labels. We address the major comment below and describe the changes we will make to the manuscript.
Point-by-point responses
- Referee: the central claim that MQPro supplies usable ground-truth 6DoF head poses, 3D eye-gaze vectors, and scene annotations for training and evaluating trajectory models is not supported by any reported accuracy statistics, drift measurements, or external validation for long outdoor sequences under varying illumination and motion. Consumer headsets are known to accumulate error in GPS-denied conditions; without per-sequence error metrics, the utility of the released labels cannot be assessed.
Authors: We agree that the absence of quantitative accuracy statistics and drift measurements limits the ability to fully assess label utility. The manuscript presents the 6DoF poses and gaze vectors as provided by the Meta Quest Pro's built-in tracking without additional external validation, which is a limitation for long outdoor sequences. In the revised manuscript we will add a dedicated subsection under Dataset Collection that (1) cites prior work on Quest Pro and similar SLAM-based tracking accuracy in outdoor/GPS-denied settings, (2) discusses expected drift behavior over long horizons, and (3) includes qualitative observations from our sequences regarding tracking stability under varying illumination. We will also add a short clarifying sentence in the abstract and a limitations paragraph. These additions will increase transparency without altering the core dataset release. We cannot supply per-sequence numerical error metrics, as that would require new experiments with external reference systems that were not part of the original collection protocol. revision: partial
- We cannot provide per-sequence quantitative error metrics or external validation results without conducting additional data collection using high-precision reference systems, which is not feasible for this real-world outdoor dataset.
Circularity Check
Dataset release paper with external benchmarks exhibits no derivation chain
full rationale
The manuscript introduces and releases the EgoTraj dataset of 75 real-world sequences captured via Meta Quest Pro hardware, then evaluates existing trajectory-prediction algorithms on it. No first-principles derivation, fitted parameter, or mathematical claim is advanced whose output reduces to its own inputs by construction. The central contribution is the data collection and public release itself, which stands independently of any self-referential loop. External benchmarks and ablation studies further anchor the work outside any internal redefinition.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Meta Quest Pro provides sufficiently accurate and time-synchronized 6DoF head poses and eye gaze vectors for research use.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
Unclear relation between the paper passage and the cited Recognition theorem.
EgoTraj contains 75 sequences of human navigation collected from multiple MQPro wearers... synchronized 6DoF head pose, per-frame 3D eye gaze vectors, scene annotations.
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Unclear relation between the paper passage and the cited Recognition theorem.
Ablation studies to analyze the contributions of gaze, scene, and motion cues.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
[1] Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., Savarese, S.: Social lstm: Human trajectory prediction in crowded spaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 961–971 (2016)
[2] Bock, J., Krajewski, R., Moers, T., Runde, S., Vater, L., Eckstein, L.: The ind dataset: A drone dataset of naturalistic road user trajectories at german intersections. In: IEEE Intelligent Vehicles Symposium. pp. 1929–
[3] Caesar, H., Bankiti, V., Lang, A.H., Vora, S., Liong, V.E., Xu, Q., Krishnan, A., Pan, Y., Baldan, G., Beijbom, O.: nuscenes: A multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 11621–11631 (2020)
[4] Chen, C., Liu, Y., Kreiss, S., Alahi, A.: Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning. In: IEEE International Conference on Robotics and Automation. pp. 6015–
[5] Escobar, M., Puentes, J., Forigua, C., Pont-Tuset, J., Maninis, K.K., Arbelaez, P.: Egocast: Forecasting egocentric human pose in the wild. In: IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 5831–5841. IEEE (2025)
[6] Ettinger, S., Cheng, S., Caine, B., Liu, C., Zhao, H., Pradhan, S., Chai, Y., Sapp, B., Qi, C.R., Zhou, Y., et al.: Large scale interactive motion forecasting for autonomous driving: The waymo open motion dataset. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9710–9719 (2021)
[7] Google: Project guideline: Enabling those with low vision to run independently. https://blog.google/outreach-initiatives/accessibility/project-guideline/ (2021)
[8] Grauman, K., Westbury, A., Byrne, E., Chavis, Z., Furnari, A., Girdhar, R., Hamburger, J., Jiang, H., Liu, M., Liu, X., et al.: Ego4d: Around the world in 3,000 hours of egocentric video. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 18995–19012 (2022)
[9] Grauman, K., Westbury, A., Torresani, L., Kitani, K., Malik, J., Afouras, T., Ashutosh, K., Baiyya, V., Bansal, S., Boote, B., et al.: Ego-exo4d: Understanding skilled human activity from first- and third-person perspectives. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 19383–19400 (2024)
[10] Hermens, F.: Automatic object detection for behavioural research using yolov8. Behavior Research Methods 56(7), 7307–7330 (2024)
[11] Hu, Y., Yang, J., Chen, L., Li, K., Sima, C., Zhu, X., Chai, S., Du, S., Lin, T., Wang, W., et al.: Planning-oriented autonomous driving. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 17853–17862 (2023)
[12] Jain, A., Casas, S., Liao, R., Xiong, Y., Feng, S., Segal, S., Urtasun, R.: Discrete residual flow for probabilistic pedestrian behavior prediction. In: Conference on Robot Learning. pp. 407–419. PMLR (2020)
[13] Jain, J., Li, J., Chiu, M.T., Hassani, A., Orlov, N., Shi, H.: Oneformer: One transformer to rule universal image segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2989–2998 (2023)
[14] Kacorri, H., Kitani, K.M., Bigham, J.P., Asakawa, C.: People with visual impairment training personal object recognizers: Feasibility and challenges. In: Proceedings of the CHI Conference on Human Factors in Computing Systems. pp. 5839–5849 (2017)
[15] Karnan, H., Nair, A., Xiao, X., Warnell, G., Pirk, S., Toshev, A., Hart, J., Biswas, J., Stone, P.: Socially compliant navigation dataset (scand): A large-scale dataset of demonstrations for social navigation. IEEE Robotics and Automation Letters 7(4), 11807–11814 (2022)
[16] Kim, D., Srouji, M., Chen, C., Zhang, J.: Armor: Egocentric perception for humanoid robot collision avoidance and motion planning. arXiv preprint arXiv:2412.00396 (2024)
[17] Land, M.F.: Eye movements and the control of actions in everyday life. Progress in Retinal and Eye Research 25(3), 296–324 (2006)
[18] Lerner, A., Chrysanthou, Y., Lischinski, D.: Crowds by example. In: Computer Graphics Forum. vol. 26, pp. 655–664. Wiley Online Library (2007)
[19] Lv, Z., Charron, N., Moulon, P., Gamino, A., Peng, C., Sweeney, C., Miller, E., Tang, H., Meissner, J., Dong, J., et al.: Aria everyday activities dataset. arXiv preprint arXiv:2402.13349 (2024)
[20] Ma, L., Ye, Y., Hong, F., Guzov, V., Jiang, Y., Postyeni, R., Pesqueira, L., Gamino, A., Baiyya, V., Kim, H.J., et al.: Nymeria: A massive collection of multimodal egocentric daily motion in the wild. In: European Conference on Computer Vision. pp. 445–465. Springer (2024)
[21] Marchetti, F., Becattini, F., Seidenari, L., Del Bimbo, A.: Multiple trajectory prediction of moving agents with memory augmented networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(6), 6688–6702 (2020)
[22] Martin-Martin, R., Patel, M., Rezatofighi, H., Shenoi, A., Gwak, J., Frankel, E., Sadeghian, A., Savarese, S.: Jrdb: A dataset and benchmark of egocentric robot visual perception of humans in built environments. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(6), 6748–6765 (2021)
[23] Mohamed, A., Zhu, D., Vu, W., Elhoseiny, M., Claudel, C.: Social-implicit: Rethinking trajectory prediction evaluation and the effectiveness of implicit maximum likelihood estimation. In: European Conference on Computer Vision. pp. 463–479. Springer (2022)
[24] Nguyen, D.M., Nazeri, M., Payandeh, A., Datar, A., Xiao, X.: Toward human-like social robot navigation: A large-scale, multi-modal, social human navigation dataset. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 7442–7447. IEEE (2023)
[25] Pan, B., Harley, A.W., Engelmann, F., Liu, C.K., Guibas, L.J.: Lookout: Real-world humanoid egocentric navigation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 24977–24988 (2025)
[26] Pan, X., Charron, N., Yang, Y., Peters, S., Whelan, T., Kong, C., Parkhi, O., Newcombe, R., Ren, Y.C.: Aria digital twin: A new benchmark dataset for egocentric 3d machine perception. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 20133–20143 (2023)
[27] Park, H.S., Hwang, J.J., Niu, Y., Shi, J.: Egocentric future localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4697–4705 (2016)
[28] Pellegrini, S., Ess, A., Schindler, K., Van Gool, L.: You'll never walk alone: Modeling social behavior for multi-target tracking. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 261–268. IEEE (2009)
[29] Peng, C., Paredes, V., Castillo, G.A., Hereid, A.: Real-time safe bipedal robot navigation using linear discrete control barrier functions. In: IEEE International Conference on Robotics and Automation. pp. 14903–14909. IEEE (2025)
[30] Qiu, J., Chen, L., Gu, X., Lo, F.P.W., Tsai, Y.Y., Sun, J., Liu, J., Lo, B.: Egocentric human trajectory forecasting with a wearable camera and multi-modal fusion. IEEE Robotics and Automation Letters 7(4), 8799–8806 (2022)
[31] Qiu, Z., Liu, Z., Niu, W., Bhattacharjee, T., Kalantari, S.: Egocognav: Cognition-aware human egocentric navigation. arXiv preprint arXiv:2511.17581 (2025)
[32] Raina, N., Somasundaram, G., Zheng, K., Miglani, S., Saarinen, S., Meissner, J., Schwesinger, M., Pesqueira, L., Prasad, I., Miller, E., Gupta, P., Yan, M., Newcombe, R., Ren, C., Parkhi, O.: Egoblur model (2023)
[33] Robicquet, A., Sadeghian, A., Alahi, A., Savarese, S.: Learning social etiquette: Human trajectory understanding in crowded scenes. In: European Conference on Computer Vision. pp. 549–565. Springer (2016)
[34] Shi, L., Wang, L., Zhou, S., Hua, G.: Trajectory unified transformer for pedestrian trajectory prediction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9675–9684 (2023)
[35] Singh, K.K., Fatahalian, K., Efros, A.A.: Krishnacam: Using a longitudinal, single-person, egocentric dataset for scene understanding tasks. In: IEEE Winter Conference on Applications of Computer Vision. pp. 1–9. IEEE (2016)
[36] Tang, T.J., Li, W.H.: An assistive eyewear prototype that interactively converts 3d object locations into spatial audio. In: Proceedings of the ACM International Symposium on Wearable Computers. pp. 119–126 (2014)
[37] Tian, Y., Liu, Y., Tan, J.: Wearable navigation system for the blind people in dynamic environments. In: IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems. pp. 153–158. IEEE (2013)
[38] Wang, H.C., Katzschmann, R.K., Teng, S., Araki, B., Giarré, L., Rus, D.: Enabling independent navigation for visually impaired people through a wearable vision-based feedback system. In: IEEE International Conference on Robotics and Automation. pp. 6533–6540. IEEE (2017)
[39] Wang, P., Bai, S., Tan, S., Wang, S., Fan, Z., Bai, J., Chen, K., Liu, X., Wang, J., Ge, W., et al.: Qwen2-vl: Enhancing vision-language model's perception of the world at any resolution. arXiv preprint arXiv:2409.12191 (2024)
[40] Wang, T., Byeon, J., Yehia, A., Wang, H., Xu, Y., Zeng, T., Wang, Z., Jiao, J., Claudel, C.: Xr-dt: Extended reality-enhanced digital twin for agentic mobile robots. arXiv preprint arXiv:2512.05270 (2025)
[41] Wang, W., Liu, C.K., Kennedy III, M.: Egonav: Egocentric scene-aware human trajectory prediction. arXiv preprint arXiv:2403.19026 (2024)
[42] Yagi, T., Mangalam, K., Yonetani, R., Sato, Y.: Future person localization in first-person videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 7593–7602 (2018)
[43] Yang, L., Kang, B., Huang, Z., Zhao, Z., Xu, X., Feng, J., Zhao, H.: Depth anything v2. Advances in Neural Information Processing Systems 37, 21875–21911 (2024)
[44] Yehia, A., Byeon, J., Wang, T., Wang, H., Xu, Y., Jiao, J., Claudel, C.: Arcas: An augmented reality collision avoidance system with slam-based tracking for enhancing vru safety. arXiv preprint arXiv:2512.05299 (2025)
[45] Zheng, W., Song, R., Guo, X., Zhang, C., Chen, L.: Genad: Generative end-to-end autonomous driving. In: European Conference on Computer Vision. pp. 87–104. Springer (2024)

Appendix A Capture Setup and Recording Details: All data was collected using the Meta Quest Pro headset (MQPro), a mixed-reality device equipped with integrated eye-tracking cameras, ...
1. Initialize the Unity recording application on the Meta Quest Pro headset.
2. The application creates a new session directory and prepares the sensor logging files.
3. The participant wears the headset and selects the two waypoints in the recording environment.
4. The participant triggers the start of the session using the controller A button, which activates the start_signal and begins both sensor logging and video recording.
5. During the session, the headset's wearer navigates naturally through the environment while the system records synchronized RGB video, head pose, and gaze measurements.
6. When the recording is complete, the participant presses the controller B button, which sends the stop_signal and terminates both processes.
7. The recorded data are exported as synchronized sensor logs and video files for subsequent preprocessing.

A.4 Data Processing: After data collection, the recorded multimodal streams were processed using a custom preprocessing pipeline designed to synchronize and organize the data into a unified dataset format. The preprocessing workflow is illustrated in Fi...
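How the preprocessing pipeline synchronizes the streams is not stated in the available text. A common approach, shown here only as a minimal sketch, is nearest-timestamp matching within a tolerance, for example aligning each video frame to the closest pose or gaze sample. It assumes all streams share one clock and that `other_times` is sorted ascending. The function name and the tolerance value are illustrative.

```python
import bisect
from typing import List, Optional, Sequence


def align_nearest(ref_times: Sequence[float], other_times: Sequence[float],
                  tol: float) -> List[Optional[int]]:
    """For each reference timestamp, return the index of the closest sample in
    other_times (sorted ascending) within tol seconds, or None if none is close enough."""
    matches: List[Optional[int]] = []
    for t in ref_times:
        i = bisect.bisect_left(other_times, t)
        best: Optional[int] = None
        best_dt = tol
        # Only the neighbours on either side of the insertion point can be closest.
        for j in (i - 1, i):
            if 0 <= j < len(other_times):
                dt = abs(other_times[j] - t)
                if dt <= best_dt:
                    best, best_dt = j, dt
        matches.append(best)
    return matches
```

Returning `None` instead of the nearest sample at any distance keeps dropped or delayed sensor packets visible, so frames without a trustworthy match can be discarded rather than silently paired with stale data.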
1. Identify the environmental context (e.g., crosswalk, sidewalk, intersection).
2. Detect nearby dynamic agents such as pedestrians, vehicles, or cyclists.
3. Analyze traffic signals, obstacles, or navigation constraints.
4. Incorporate the projected gaze as an indicator of the user's attention.
5. Infer the likely short-term motion or navigation intent of the camera wearer.

D.2 Annotation Quality Evaluation: To verify the reliability of the generated annotations (Figure 12), we evaluated the pipeline using several complementary metrics on the same 100-frame stratified sample. Structural compliance measured whether each annotation follows the predefi...
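The annotation step that incorporates the projected gaze relies on mapping each 3D gaze vector into the RGB frame. Under a pinhole-camera assumption, a gaze direction $(x, y, z)$ expressed in the camera frame maps to pixels as $u = f_x x / z + c_x$ and $v = f_y y / z + c_y$. The sketch below implements that mapping. Several details are assumptions rather than information from the paper: the camera-frame convention (x right, y down, z forward), the intrinsics, and the choice to ignore both lens distortion and the offset between the eyes and the camera.

```python
from typing import Optional, Sequence, Tuple


def project_gaze(direction: Sequence[float], fx: float, fy: float, cx: float, cy: float,
                 width: int, height: int) -> Optional[Tuple[float, float]]:
    """Project a 3D gaze direction in camera coordinates (x right, y down, z forward)
    to pixel coordinates with a distortion-free pinhole model.

    Returns None if the gaze points behind the camera or lands outside the image.
    """
    x, y, z = direction
    if z <= 0.0:
        return None  # gaze points away from the camera's viewing direction
    u = fx * x / z + cx
    v = fy * y / z + cy
    if not (0.0 <= u < width and 0.0 <= v < height):
        return None
    return u, v
```

Projecting a direction rather than a 3D fixation point is accurate only for distant targets. For nearby objects, the eye-to-camera offset introduces parallax that this sketch ignores.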