Hazard-Aware Traffic Scene Graph Generation

Julie Stephany Berrio; Mao Shan; Stewart Worrall; Yaoqi Huang

arxiv: 2603.03584 · v2 · submitted 2026-03-03 · 💻 cs.CV

Hazard-Aware Traffic Scene Graph Generation

Yaoqi Huang , Julie Stephany Berrio , Mao Shan , Stewart Worrall This is my paper

Pith reviewed 2026-05-15 16:10 UTC · model grok-4.3

classification 💻 cs.CV

keywords Traffic Scene Graph GenerationHazard-Aware ReasoningEgo-Centric Scene UnderstandingAutonomous DrivingDepth CuesAccident DataSafety Relevance

0 comments

The pith

Traffic scene graphs capture hazard relations to the ego vehicle by supplementing visual features with accident data and depth cues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Traffic Scene Graph Generation as a new task to model traffic-specific relations between prominent hazards and the ego vehicle. Generic spatial predicates used in existing scene graphs do not address safety relevance in driving scenarios. The proposed framework augments visual features and semantic information with traffic accident data and depth cues to reason about how hazards affect the ego vehicle. This produces output graphs that color-code hazard severity and note effect mechanisms along with relative locations. Evaluations on Cityscapes relational annotations across ten tasks from five perspectives show the framework's capacity for ego-centric hazard-aware understanding.

Core claim

The central claim is that supplementing visual features and semantic information with traffic accident data and depth cues enables generation of traffic scene graphs that represent safety-relevant relations between prominent hazards and the ego vehicle, with outputs that color-code severity and notate effect mechanisms and locations.

What carries the argument

The framework that explicitly supplements visual features and semantic information with traffic accident data and depth cues to reason about safety-relevance and generate ego-centric scene graphs.

If this is right

The graphs stress prominent hazards through color-coding of their severity.
They notate the effect mechanism and relative location to the ego vehicle.
The outputs supply intuitive guidelines for situational awareness in driving.
Comparative experiments and ablation studies confirm gains in ego-centric reasoning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such graphs could be fed directly into vehicle planning systems to prioritize collision avoidance.
The method could extend to video input for tracking how hazards evolve over time.
Similar supplementation techniques might apply to safety modeling in other dynamic environments like pedestrian zones or intersections.

Load-bearing premise

That traffic accident data and depth cues supply the additional information needed to reason about safety-relevance in ways that visual features and semantics alone cannot.

What would settle it

A model relying only on visual features and semantic information matches or exceeds performance on the hazard-relation tasks without using accident data or depth cues.

Figures

Figures reproduced from arXiv: 2603.03584 by Julie Stephany Berrio, Mao Shan, Stewart Worrall, Yaoqi Huang.

**Figure 1.** Figure 1: Overview of our Hazard-Aware Traffic Scene Graph Generation (HATS) model. The main scene graph branch (top) comprises three modules: 1) a Panoptic Segmentation (PS) Module for holistic perception of the surrounding environment, 2) an Ego-path Related Entities Selection (ERES) module that identifies and selects relevant candidate entities, and 3) a Traffic Scene Graph Generation (TSGG) module that computes … view at source ↗

**Figure 2.** Figure 2: Inference performance vs. training set size (5%–80% of total training set). For each size, five models were trained with five-fold splits, with 20% of training images held out for validation per fold. CAIS, VAIS, MAIS, DAMSEV, CONSEQ, TREATMENT, and ROLLINITYP. The aligned pair query h ′ pair attends to the specific node embeddings hνtp in each group tp, producing a compact type-specific prior vector. All … view at source ↗

**Figure 3.** Figure 3: Our ego-centric hazard-aware TSGs of traffic images queries, including irrelevant entities such as sky and distant parked cars are passed to TSGG heads with uniform weights, overwhelming relation prediction with unstructured context and hindering the learning of discriminative representations, regardless of feature richness. HATS w/o KGE results suggest the significance of the structured KGE priors in sepa… view at source ↗

read the original abstract

Maintaining situational awareness in complex driving scenarios is challenging. It requires continuously prioritizing attention among extensive scene entities and understanding how prominent hazards might affect the ego vehicle. While existing studies excel at detecting specific semantic categories and visually salient regions, they lack the ability to assess safety-relevance. Meanwhile, the generic spatial predicates either for foreground objects only or for all scene entities modeled by existing scene graphs are inadequate for driving scenarios. To bridge this gap, we introduce a novel task, Traffic Scene Graph Generation, which captures traffic-specific relations between prominent hazards and the ego vehicle. We propose a novel framework that explicitly uses traffic accident data and depth cues to supplement visual features and semantic information for reasoning. The output traffic scene graphs provide intuitive guidelines that stress prominent hazards by color-coding their severity and notating their effect mechanism and relative location to the ego vehicle. We create relational annotations on Cityscapes dataset and evaluate our model on 10 tasks from 5 perspectives. The results in comparative experiments and ablation studies demonstrate our capacity in ego-centric reasoning for hazard-aware traffic scene understanding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines a new task for hazard-aware traffic scene graphs in driving and proposes fusing accident data plus depth cues with visual features, but the abstract shows no numbers so the gains are hard to judge.

read the letter

The core move is introducing Traffic Scene Graph Generation as a task that targets safety-relevant relations between prominent hazards and the ego vehicle rather than generic spatial predicates. The framework adds traffic accident data and depth cues to standard visual and semantic features, then produces graphs that color-code severity and note effect mechanisms and relative locations. They also release new relational annotations on Cityscapes and run evaluations across 10 tasks from five perspectives with comparative runs and ablations. That evaluation breadth is a practical strength if the numbers back the claims about improved ego-centric reasoning. The approach is internally consistent: it starts from the observation that existing scene graphs miss driving-specific safety relations and supplies the missing annotations and inputs to address it. The output format with severity coloring could be directly useful for downstream planning modules. The main limitation is that the abstract contains no quantitative results, error bars, or baseline deltas, so it is impossible to tell how much the added data sources actually move the needle or whether the fusion introduces noise. The assumption that accident statistics will cleanly improve hazard prioritization is plausible but needs the full tables and integration details to assess. This work is aimed at researchers already doing scene graph generation or relation modeling for autonomous driving who want to inject safety priors. A reader focused on safety-critical perception would find the task definition and annotation effort useful even before the performance numbers are scrutinized. I would send it for peer review because the task is new, the evaluation plan is broad, and the central construction does not contain obvious circularity.

Referee Report

2 major / 2 minor

Summary. The paper introduces the novel task of Traffic Scene Graph Generation, which models traffic-specific relations between prominent hazards and the ego vehicle rather than generic spatial predicates. It proposes a framework that supplements visual features and semantic information with traffic accident data and depth cues for safety-relevance reasoning. Relational annotations are added to the Cityscapes dataset, and the approach is evaluated on 10 tasks across 5 perspectives, with comparative experiments and ablation studies presented as evidence of improved ego-centric hazard-aware scene understanding.

Significance. If the quantitative results hold, the work has moderate significance for computer vision in autonomous driving: it shifts scene graph generation from generic relations to hazard-aware, ego-centric ones and demonstrates a concrete way to incorporate external accident statistics and depth for safety prioritization. The output graphs with color-coded severity and effect mechanisms could directly inform attention mechanisms in driving systems. Strengths include the creation of new annotations and the multi-perspective evaluation setup.

major comments (2)

[Abstract / Evaluation] Abstract and Evaluation section: the claim that 'comparative experiments and ablation studies demonstrate our capacity' is not accompanied by any reported metrics, baselines, numerical improvements, or error bars. Without these, the central empirical claim that the supplementation of accident data and depth cues improves safety-relevance modeling cannot be assessed and is load-bearing for the paper's contribution.
[Method] Method section: the framework is described as explicitly using traffic accident data and depth cues to supplement visual features, yet no equations, fusion architecture, or encoding details are provided for how these external sources are integrated (e.g., as additional input channels, loss terms, or pre-training signals). This omission prevents verification that the approach is not simply concatenating features and undermines reproducibility of the reported gains.

minor comments (2)

[Abstract / Results] The output description states that graphs 'stress prominent hazards by color-coding their severity and notating their effect mechanism and relative location'; clarify whether these visualizations are generated automatically by the model or added post-hoc, and specify the exact color/notation scheme used in the figures.
[Evaluation] The paper mentions evaluation 'on 10 tasks from 5 perspectives' but does not list what the tasks or perspectives are (e.g., relation prediction, hazard detection, depth-aware reasoning). Adding an explicit enumeration or table would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and recommendation of minor revision. We address the two major comments below and will incorporate the requested details into the revised manuscript to strengthen the presentation of our empirical results and methodological contributions.

read point-by-point responses

Referee: [Abstract / Evaluation] Abstract and Evaluation section: the claim that 'comparative experiments and ablation studies demonstrate our capacity' is not accompanied by any reported metrics, baselines, numerical improvements, or error bars. Without these, the central empirical claim that the supplementation of accident data and depth cues improves safety-relevance modeling cannot be assessed and is load-bearing for the paper's contribution.

Authors: We agree that the abstract and evaluation summary currently lack explicit numerical metrics, baselines, and error bars, which limits immediate assessment of the claims. In the full manuscript, Section 4 contains the comparative experiments and ablation studies with these details, but they are not highlighted in the abstract or evaluation overview. We will revise the abstract to report key quantitative results (e.g., accuracy gains on hazard-aware relation prediction) and add a concise summary table in the evaluation section that includes baselines, numerical improvements, and standard deviations or error bars. This will directly substantiate the safety-relevance improvements. revision: yes
Referee: [Method] Method section: the framework is described as explicitly using traffic accident data and depth cues to supplement visual features, yet no equations, fusion architecture, or encoding details are provided for how these external sources are integrated (e.g., as additional input channels, loss terms, or pre-training signals). This omission prevents verification that the approach is not simply concatenating features and undermines reproducibility of the reported gains.

Authors: We acknowledge this gap in the current method description. While the framework integrates accident statistics and depth cues via a dedicated multi-modal fusion module (beyond naive concatenation), the manuscript does not provide the explicit equations or architectural diagrams. In the revision, we will add a new subsection with mathematical formulations for the encoding of accident data (as statistical priors) and depth cues (via disparity maps), the fusion operation (e.g., gated cross-attention), and any auxiliary loss terms used during training. This will clarify the integration mechanism and support reproducibility. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper introduces a new task (Traffic Scene Graph Generation) and a framework that fuses external traffic accident data plus depth cues with visual features. No equations, fitted parameters, or derivations appear that reduce any claimed prediction to its own inputs by construction. The central claims rest on new annotations, comparative experiments, and ablations rather than self-citation chains or renamed known results. The argument is internally consistent without load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Based solely on abstract; no explicit free parameters, axioms, or invented entities are detailed beyond the new task definition itself.

axioms (1)

domain assumption Traffic accident data and depth cues can supplement visual features and semantic information to enable hazard reasoning.
Invoked as the basis for the proposed framework.

invented entities (1)

Hazard-aware traffic scene graph no independent evidence
purpose: To capture traffic-specific relations between prominent hazards and the ego vehicle with severity and effect notation.
New representation introduced to address the identified gap in existing scene graphs.

pith-pipeline@v0.9.0 · 5482 in / 1204 out tokens · 50256 ms · 2026-05-15T16:10:40.177946+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The TSGG module produces ego-centric graphs by integrating visual features, 3D structural cues, semantic information, and prior knowledge... gated fusion strategy combines these multiple cues into a robust ego–entity pair descriptor.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We stack L layers of relation-aware message passing... qualifier vector h_qual_ε... FiLM-based qualifier-aware message passing

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

[1]

Perceived risk vs actual driving performance during distracted driving: A comparative analysis of phone use and other secondary distractions,

P. Choudhary, A. Gupta, and N. R. Velaga, “Perceived risk vs actual driving performance during distracted driving: A comparative analysis of phone use and other secondary distractions,”Transp. Res. Part F: Traffic Psychol. and Behav., vol. 86, pp. 296–315, apr 2022

work page 2022
[2]

Geneva: World Health Org., 2011

World Health Org.,Mobile phone use: A growing problem of driver distraction. Geneva: World Health Org., 2011

work page 2011
[3]

Reading during fully automated driving: A study of the effect of peripheral vis. and haptic inf. on situation awareness and mental workload,

N. M. Yusof, J. Karjanto, M. Z. Hassan, J. Terken, F. Delbressine, and M. Rauterberg, “Reading during fully automated driving: A study of the effect of peripheral vis. and haptic inf. on situation awareness and mental workload,”IEEE Trans. on Intell. Transp. Syst., vol. 23, no. 10, pp. 19136–19144, 2022

work page 2022
[4]

Preparing a nation for autonomous vehicles: Opportunities, barriers and policy recommendations,

D. Fagnant and K. Kockelman, “Preparing a nation for autonomous vehicles: Opportunities, barriers and policy recommendations,”Transp. Res. Part A: Policy and Pract., vol. 77, 07 2015

work page 2015
[5]

SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing ,

X. Yang, J. Yan, W. Liao, X. Yang, J. Tang, and T. He, “ SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing ,”IEEE Trans. on Pattern Anal. & Mach. Intell., vol. 45, pp. 2384–2399, Feb. 2023

work page 2023
[6]

Panoptic segmentation,

A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár, “Panoptic segmentation,” inCVPR, pp. 9396–9405, 2019

work page 2019
[7]

Hazards prioritization with cogn. attention maps for supporting driving decision-making,

Y . Huang and X. Wang, “Hazards prioritization with cogn. attention maps for supporting driving decision-making,”IEEE Trans. on Intell. Transp. Syst., vol. 25, no. 11, pp. 16221–16234, 2024

work page 2024
[8]

Scene-graph augmented data-driven risk assess- ment of auton. vehicle decisions,

S.-Y . Yu, A. V . Malawade, D. Muthirayan, P. P. Khargonekar, and M. A. A. Faruque, “Scene-graph augmented data-driven risk assess- ment of auton. vehicle decisions,”IEEE Trans. on Intell. Transp. Syst., vol. 23, no. 7, pp. 7941–7951, 2022

work page 2022
[9]

Vis. genome: Connecting lang. and vision using crowdsourced dense image annotations,

R. Krishna, Y . Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y . Kalantidis, L.-J. Li, D. A. Shamma, M. S. Bernstein, and L. Fei- Fei, “Vis. genome: Connecting lang. and vision using crowdsourced dense image annotations,”Int. J. Comput. Vision, vol. 123, p. 32–73, May 2017

work page 2017
[10]

N. H. T. S. Admin.,NHTSA Field Crash Investigation 2021 Coding and Editing Manual. Nat. Highway Traffic Saf. Admin., 2022

work page 2021
[11]

G. A. Radja, E.-Y . Noh, and F. Zhang,Crash Investigation Sampling System 2021 Analytical User’s Manual. Nat. Highway Traffic Saf. Admin., 2022

work page 2021
[12]

Image retrieval using scene graphs,

J. Johnson, R. Krishna, M. Stark, L.-J. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei, “Image retrieval using scene graphs,” in CVPR, pp. 3668–3678, 2015

work page 2015
[13]

Vis. relationship detection with lang. priors,

C. Lu, R. Krishna, M. Bernstein, and L. Fei-Fei, “Vis. relationship detection with lang. priors,” inEur. Conf. on Comput. Vision, 2016

work page 2016
[14]

Knowl.-embedded routing network for scene graph gener.,

T. Chen, W. Yu, R. Chen, and L. Lin, “Knowl.-embedded routing network for scene graph gener.,” inCVPR, pp. 6156–6164, 2019

work page 2019
[15]

Panoptic scene graph gener.,

J. Yang, Y . Z. Ang, Z. Guo, K. Zhou, W. Zhang, and Z. Liu, “Panoptic scene graph gener.,” inECCV, 2022

work page 2022
[16]

Reltr: Relation transformer for scene graph generation,

Y . Cong, M. Y . Yang, and B. Rosenhahn, “Reltr: Relation transformer for scene graph generation,”IEEE Trans. on Pattern Anal. and Mach. Intell., 2023

work page 2023
[17]

Rsg-net: Towards rich sematic relationship prediction for intell. vehicle in complex environments,

Y . Tian, A. Carballo, R. Li, and K. Takeda, “Rsg-net: Towards rich sematic relationship prediction for intell. vehicle in complex environments,” inIV, pp. 546–552, 2021

work page 2021
[18]

Learning 3d-aware ego- centric spatial-temporal interaction via graph convolutional networks,

C. Li, Y . Meng, S. H. Chan, and Y .-T. Chen, “Learning 3d-aware ego- centric spatial-temporal interaction via graph convolutional networks,” inICRA, pp. 8418–8424, 2020

work page 2020
[19]

Vis. traffic knowl. graph gener. from scene images,

Y . Guo, F. Yin, X.-H. Li, X. Yan, T. Xue, S. Mei, and C.-L. Liu, “Vis. traffic knowl. graph gener. from scene images,” in2023 IEEE/CVF Int. Conf. on Comput. Vision (ICCV), pp. 21547–21556, 2023

work page 2023
[20]

Toward driving scene understanding: A paradigm and benchmark dataset for ego- centric traffic scene graph representation,

Y . Zhou, Y . Zhang, Z. Zhao, K. Zhang, and C. Gou, “Toward driving scene understanding: A paradigm and benchmark dataset for ego- centric traffic scene graph representation,”IEEE J. of Radio Freq. Identification, vol. 6, pp. 962–967, 2022

work page 2022
[21]

Wordnet: a lexical database for english,

G. A. Miller, “Wordnet: a lexical database for english,”Commun. ACM, vol. 38, p. 39–41, Nov. 1995

work page 1995
[22]

Conceptnet 5.5: an open multilingual graph of general knowledge,

R. Speer, J. Chin, and C. Havasi, “Conceptnet 5.5: an open multilingual graph of general knowledge,” inAAAI, p. 4444–4451, 2017

work page 2017
[23]

Traffic Accident Benchmark for Causality Recognit.,

T. You and B. Han, “Traffic Accident Benchmark for Causality Recognit.,” inECCV, 2020

work page 2020
[24]

Anticipating traffic accidents with adaptive loss and large-scale incident db,

T. Suzuki, H. Kataoka, Y . Aoki, and Y . Satoh, “Anticipating traffic accidents with adaptive loss and large-scale incident db,” inCVPR, pp. 3521–3529, 2018

work page 2018
[25]

Intell. traffic accident prediction model for internet of vehicles with deep learning approach,

D.-J. Lin, M.-Y . Chen, H.-S. Chiang, and P. K. Sharma, “Intell. traffic accident prediction model for internet of vehicles with deep learning approach,”IEEE Trans. on Intell. Transp. Syst., vol. 23, no. 3, pp. 2340–2349, 2022

work page 2022
[26]

Exploring the roles of social media data to identify the locations and severity of road traffic accidents,

S. Salam, M. S. Islam, F. Ahmed, L. Khan, D. Kim, N. Allo, and O. Nwariaku, “Exploring the roles of social media data to identify the locations and severity of road traffic accidents,” in2021 IEEE 4th Int. Conf. on Artif. Intell. and Knowl. Eng. (AIKE), pp. 62–71, 2021

work page 2021
[27]

Message passing for hyper-relational knowl. graphs,

M. Galkin, P. Trivedi, G. Maheshwari, R. Usbeck, and J. Lehmann, “Message passing for hyper-relational knowl. graphs,” inProc. of the 2020 Conf. on Empirical Methods in Natural Lang. Process., (Online), pp. 7346–7359, Assoc. for Comput. Linguistics, Nov. 2020

work page 2020
[28]

Masked-attention mask transformer for universal image segmenta- tion,

B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar, “Masked-attention mask transformer for universal image segmenta- tion,” inCVPR, 2022

work page 2022
[29]

Learning entity and relation embeddings for knowledge graph completion,

Y . Lin, Z. Liu, M. Sun, Y . Liu, and X. Zhu, “Learning entity and relation embeddings for knowledge graph completion,” inAAAI, p. 2181–2187, 2015

work page 2015
[30]

Film: visual reasoning with a general conditioning layer,

E. Perez, F. Strub, H. de Vries, V . Dumoulin, and A. Courville, “Film: visual reasoning with a general conditioning layer,” inAAAI, 2018

work page 2018
[31]

The cityscapes dataset for semantic urban scene understanding,

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Be- nenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” inCVPR, 2016

work page 2016
[32]

Scene graph gener. by iterative message passing,

D. Xu, Y . Zhu, C. Choy, and L. Fei-Fei, “Scene graph gener. by iterative message passing,” inCVPR, 2017

work page 2017
[33]

Neural Motifs: Scene Graph Parsing with Global Context ,

R. Zellers, M. Yatskar, S. Thomson, and Y . Choi, “Neural Motifs: Scene Graph Parsing with Global Context ,” inCVPR, pp. 5831–5840, 2018

work page 2018
[34]

Learning to compose dynamic tree structures for vis. contexts,

K. Tang, H. Zhang, B. Wu, W. Luo, and W. Liu, “Learning to compose dynamic tree structures for vis. contexts,” inCVPR, 2019

work page 2019

[1] [1]

Perceived risk vs actual driving performance during distracted driving: A comparative analysis of phone use and other secondary distractions,

P. Choudhary, A. Gupta, and N. R. Velaga, “Perceived risk vs actual driving performance during distracted driving: A comparative analysis of phone use and other secondary distractions,”Transp. Res. Part F: Traffic Psychol. and Behav., vol. 86, pp. 296–315, apr 2022

work page 2022

[2] [2]

Geneva: World Health Org., 2011

World Health Org.,Mobile phone use: A growing problem of driver distraction. Geneva: World Health Org., 2011

work page 2011

[3] [3]

Reading during fully automated driving: A study of the effect of peripheral vis. and haptic inf. on situation awareness and mental workload,

N. M. Yusof, J. Karjanto, M. Z. Hassan, J. Terken, F. Delbressine, and M. Rauterberg, “Reading during fully automated driving: A study of the effect of peripheral vis. and haptic inf. on situation awareness and mental workload,”IEEE Trans. on Intell. Transp. Syst., vol. 23, no. 10, pp. 19136–19144, 2022

work page 2022

[4] [4]

Preparing a nation for autonomous vehicles: Opportunities, barriers and policy recommendations,

D. Fagnant and K. Kockelman, “Preparing a nation for autonomous vehicles: Opportunities, barriers and policy recommendations,”Transp. Res. Part A: Policy and Pract., vol. 77, 07 2015

work page 2015

[5] [5]

SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing ,

X. Yang, J. Yan, W. Liao, X. Yang, J. Tang, and T. He, “ SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing ,”IEEE Trans. on Pattern Anal. & Mach. Intell., vol. 45, pp. 2384–2399, Feb. 2023

work page 2023

[6] [6]

Panoptic segmentation,

A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár, “Panoptic segmentation,” inCVPR, pp. 9396–9405, 2019

work page 2019

[7] [7]

Hazards prioritization with cogn. attention maps for supporting driving decision-making,

Y . Huang and X. Wang, “Hazards prioritization with cogn. attention maps for supporting driving decision-making,”IEEE Trans. on Intell. Transp. Syst., vol. 25, no. 11, pp. 16221–16234, 2024

work page 2024

[8] [8]

Scene-graph augmented data-driven risk assess- ment of auton. vehicle decisions,

S.-Y . Yu, A. V . Malawade, D. Muthirayan, P. P. Khargonekar, and M. A. A. Faruque, “Scene-graph augmented data-driven risk assess- ment of auton. vehicle decisions,”IEEE Trans. on Intell. Transp. Syst., vol. 23, no. 7, pp. 7941–7951, 2022

work page 2022

[9] [9]

Vis. genome: Connecting lang. and vision using crowdsourced dense image annotations,

R. Krishna, Y . Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y . Kalantidis, L.-J. Li, D. A. Shamma, M. S. Bernstein, and L. Fei- Fei, “Vis. genome: Connecting lang. and vision using crowdsourced dense image annotations,”Int. J. Comput. Vision, vol. 123, p. 32–73, May 2017

work page 2017

[10] [10]

N. H. T. S. Admin.,NHTSA Field Crash Investigation 2021 Coding and Editing Manual. Nat. Highway Traffic Saf. Admin., 2022

work page 2021

[11] [11]

G. A. Radja, E.-Y . Noh, and F. Zhang,Crash Investigation Sampling System 2021 Analytical User’s Manual. Nat. Highway Traffic Saf. Admin., 2022

work page 2021

[12] [12]

Image retrieval using scene graphs,

J. Johnson, R. Krishna, M. Stark, L.-J. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei, “Image retrieval using scene graphs,” in CVPR, pp. 3668–3678, 2015

work page 2015

[13] [13]

Vis. relationship detection with lang. priors,

C. Lu, R. Krishna, M. Bernstein, and L. Fei-Fei, “Vis. relationship detection with lang. priors,” inEur. Conf. on Comput. Vision, 2016

work page 2016

[14] [14]

Knowl.-embedded routing network for scene graph gener.,

T. Chen, W. Yu, R. Chen, and L. Lin, “Knowl.-embedded routing network for scene graph gener.,” inCVPR, pp. 6156–6164, 2019

work page 2019

[15] [15]

Panoptic scene graph gener.,

J. Yang, Y . Z. Ang, Z. Guo, K. Zhou, W. Zhang, and Z. Liu, “Panoptic scene graph gener.,” inECCV, 2022

work page 2022

[16] [16]

Reltr: Relation transformer for scene graph generation,

Y . Cong, M. Y . Yang, and B. Rosenhahn, “Reltr: Relation transformer for scene graph generation,”IEEE Trans. on Pattern Anal. and Mach. Intell., 2023

work page 2023

[17] [17]

Rsg-net: Towards rich sematic relationship prediction for intell. vehicle in complex environments,

Y . Tian, A. Carballo, R. Li, and K. Takeda, “Rsg-net: Towards rich sematic relationship prediction for intell. vehicle in complex environments,” inIV, pp. 546–552, 2021

work page 2021

[18] [18]

Learning 3d-aware ego- centric spatial-temporal interaction via graph convolutional networks,

C. Li, Y . Meng, S. H. Chan, and Y .-T. Chen, “Learning 3d-aware ego- centric spatial-temporal interaction via graph convolutional networks,” inICRA, pp. 8418–8424, 2020

work page 2020

[19] [19]

Vis. traffic knowl. graph gener. from scene images,

Y . Guo, F. Yin, X.-H. Li, X. Yan, T. Xue, S. Mei, and C.-L. Liu, “Vis. traffic knowl. graph gener. from scene images,” in2023 IEEE/CVF Int. Conf. on Comput. Vision (ICCV), pp. 21547–21556, 2023

work page 2023

[20] [20]

Toward driving scene understanding: A paradigm and benchmark dataset for ego- centric traffic scene graph representation,

Y . Zhou, Y . Zhang, Z. Zhao, K. Zhang, and C. Gou, “Toward driving scene understanding: A paradigm and benchmark dataset for ego- centric traffic scene graph representation,”IEEE J. of Radio Freq. Identification, vol. 6, pp. 962–967, 2022

work page 2022

[21] [21]

Wordnet: a lexical database for english,

G. A. Miller, “Wordnet: a lexical database for english,”Commun. ACM, vol. 38, p. 39–41, Nov. 1995

work page 1995

[22] [22]

Conceptnet 5.5: an open multilingual graph of general knowledge,

R. Speer, J. Chin, and C. Havasi, “Conceptnet 5.5: an open multilingual graph of general knowledge,” inAAAI, p. 4444–4451, 2017

work page 2017

[23] [23]

Traffic Accident Benchmark for Causality Recognit.,

T. You and B. Han, “Traffic Accident Benchmark for Causality Recognit.,” inECCV, 2020

work page 2020

[24] [24]

Anticipating traffic accidents with adaptive loss and large-scale incident db,

T. Suzuki, H. Kataoka, Y . Aoki, and Y . Satoh, “Anticipating traffic accidents with adaptive loss and large-scale incident db,” inCVPR, pp. 3521–3529, 2018

work page 2018

[25] [25]

Intell. traffic accident prediction model for internet of vehicles with deep learning approach,

D.-J. Lin, M.-Y . Chen, H.-S. Chiang, and P. K. Sharma, “Intell. traffic accident prediction model for internet of vehicles with deep learning approach,”IEEE Trans. on Intell. Transp. Syst., vol. 23, no. 3, pp. 2340–2349, 2022

work page 2022

[26] [26]

Exploring the roles of social media data to identify the locations and severity of road traffic accidents,

S. Salam, M. S. Islam, F. Ahmed, L. Khan, D. Kim, N. Allo, and O. Nwariaku, “Exploring the roles of social media data to identify the locations and severity of road traffic accidents,” in2021 IEEE 4th Int. Conf. on Artif. Intell. and Knowl. Eng. (AIKE), pp. 62–71, 2021

work page 2021

[27] [27]

Message passing for hyper-relational knowl. graphs,

M. Galkin, P. Trivedi, G. Maheshwari, R. Usbeck, and J. Lehmann, “Message passing for hyper-relational knowl. graphs,” inProc. of the 2020 Conf. on Empirical Methods in Natural Lang. Process., (Online), pp. 7346–7359, Assoc. for Comput. Linguistics, Nov. 2020

work page 2020

[28] [28]

Masked-attention mask transformer for universal image segmenta- tion,

B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar, “Masked-attention mask transformer for universal image segmenta- tion,” inCVPR, 2022

work page 2022

[29] [29]

Learning entity and relation embeddings for knowledge graph completion,

Y . Lin, Z. Liu, M. Sun, Y . Liu, and X. Zhu, “Learning entity and relation embeddings for knowledge graph completion,” inAAAI, p. 2181–2187, 2015

work page 2015

[30] [30]

Film: visual reasoning with a general conditioning layer,

E. Perez, F. Strub, H. de Vries, V . Dumoulin, and A. Courville, “Film: visual reasoning with a general conditioning layer,” inAAAI, 2018

work page 2018

[31] [31]

The cityscapes dataset for semantic urban scene understanding,

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Be- nenson, U. Franke, S. Roth, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” inCVPR, 2016

work page 2016

[32] [32]

Scene graph gener. by iterative message passing,

D. Xu, Y . Zhu, C. Choy, and L. Fei-Fei, “Scene graph gener. by iterative message passing,” inCVPR, 2017

work page 2017

[33] [33]

Neural Motifs: Scene Graph Parsing with Global Context ,

R. Zellers, M. Yatskar, S. Thomson, and Y . Choi, “Neural Motifs: Scene Graph Parsing with Global Context ,” inCVPR, pp. 5831–5840, 2018

work page 2018

[34] [34]

Learning to compose dynamic tree structures for vis. contexts,

K. Tang, H. Zhang, B. Wu, W. Luo, and W. Liu, “Learning to compose dynamic tree structures for vis. contexts,” inCVPR, 2019

work page 2019