pith. machine review for the scientific record.

arxiv: 2603.18561 · v2 · submitted 2026-03-19 · 💻 cs.CV · cs.LG

Recognition: 2 theorem links · Lean Theorem

CausalVAD: De-confounding End-to-End Autonomous Driving via Causal Intervention

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 09:05 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords causal intervention · end-to-end autonomous driving · de-confounding · planning accuracy · nuScenes · backdoor adjustment · sparse vectorized queries

The pith

CausalVAD de-confounds end-to-end driving models by intervening on vectorized queries with a prototype dictionary of driving contexts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Planning-oriented end-to-end autonomous driving models learn statistical correlations from training data rather than true causal relationships, which causes them to exploit biases as shortcuts and fail in complex or biased scenarios. CausalVAD introduces a training framework built around the sparse causal intervention scheme, which constructs a dictionary of prototypes to represent latent driving contexts and then intervenes on the model's sparse vectorized queries. This step applies backdoor adjustment to remove spurious associations from the learned representations. A sympathetic reader would care because reliable planning in safety-critical driving requires distinguishing real causes from dataset artifacts. If the approach holds, it would produce models that maintain accuracy and safety even when data biases or noise are present.

Core claim

The paper claims that the sparse causal intervention scheme (SCIS) instantiates backdoor adjustment in neural networks by building a dictionary of prototypes for latent driving contexts and using those prototypes to intervene directly on the sparse vectorized queries, thereby eliminating spurious factors induced by confounders and producing cleaner representations for downstream planning tasks.

What carries the argument

The sparse causal intervention scheme (SCIS), a lightweight plug-and-play module that builds a prototype dictionary of driving contexts and intervenes on sparse vectorized queries to perform backdoor adjustment.
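The mechanism as described can be caricatured in a few lines: build a small prototype dictionary, estimate a prior over prototypes, and replace each query with its prototype-mixed expectation. The sketch below is illustrative only; the array names, the additive fusion, and the similarity-based prior are assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: N sparse vectorized queries of dimension D,
# and a dictionary of K context prototypes.
N, D, K = 4, 8, 3
queries = rng.normal(size=(N, D))
prototypes = rng.normal(size=(K, D))

def intervene(queries, prototypes):
    """Backdoor-style intervention: mix each query with every prototype,
    weighted by a prototype prior P(p) -- here a softmax over the
    per-prototype mean query affinity (an invented stand-in)."""
    sim = queries @ prototypes.T                # (N, K) affinities
    prior = np.exp(sim.mean(axis=0))
    prior /= prior.sum()                        # P(p), sums to 1
    # E_p[f(query, p)] with a simple additive fusion f
    mixed = queries[:, None, :] + prototypes[None, :, :]    # (N, K, D)
    return np.tensordot(mixed, prior, axes=([1], [0]))      # (N, D)

deconfounded = intervene(queries, prototypes)
print(deconfounded.shape)  # (4, 8)
```

The essential property is that the output no longer depends on which latent context a particular sample happened to come from, only on the dictionary-weighted average over contexts.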

If this is right

  • Models using SCIS reach state-of-the-art planning accuracy and safety scores on the nuScenes benchmark.
  • The framework shows improved robustness when data biases or noisy inputs are introduced to trigger causal confusion.
  • SCIS integrates as a lightweight module into existing end-to-end architectures without requiring architectural redesign.
  • Representations produced after intervention contain fewer spurious associations for any downstream driving task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same prototype-based intervention pattern could be tested on other perception modules such as object detection or lane segmentation where dataset biases also create shortcuts.
  • One could measure how the number and diversity of prototypes trade off bias removal against retention of useful causal signals across different driving domains.
  • Integration with online adaptation methods might allow the prototype dictionary to update during deployment when new contexts appear.
  • The approach raises the question of whether explicit causal graphs of driving variables could further strengthen the intervention beyond the current dictionary method.

Load-bearing premise

That constructing a dictionary of prototypes from the data and intervening on sparse vectorized queries correctly implements backdoor adjustment and removes all relevant spurious associations without discarding causally relevant information.
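Whether intervention actually de-biases is checkable on a toy system where the ground truth is known. The sketch below invents a binary confounder Z, a treatment X, and a deterministic outcome Y = X + 2Z, then compares the naive conditional E[Y|X] against the backdoor-adjusted E[Y|do(X)]; all numbers are illustrative.

```python
# Toy check that the backdoor formula P(Y|do(X)) = sum_z P(Y|X,z) P(z)
# removes confounding. The distributions are invented for illustration.
p_z = {0: 0.5, 1: 0.5}            # P(Z): confounder prior
p_x_given_z = {0: 0.2, 1: 0.8}    # P(X=1 | Z=z): Z influences X

def y(x, z):                      # deterministic outcome: Y = X + 2Z
    return x + 2 * z

def e_y_given_x(x):
    """Naive conditional expectation E[Y | X=x] (confounded)."""
    px = lambda z: p_x_given_z[z] if x == 1 else 1 - p_x_given_z[z]
    norm = sum(px(z) * p_z[z] for z in p_z)
    return sum(y(x, z) * px(z) * p_z[z] / norm for z in p_z)

def e_y_do_x(x):
    """Backdoor-adjusted E[Y | do(X=x)] = sum_z E[Y | X=x, Z=z] P(z)."""
    return sum(y(x, z) * p_z[z] for z in p_z)

naive = e_y_given_x(1) - e_y_given_x(0)   # ~2.2: inflated by Z
causal = e_y_do_x(1) - e_y_do_x(0)        # 1.0: the true effect of X
print(naive, causal)
```

The premise at issue is exactly the gap between these two estimators: the paper asserts the prototype intervention lands on the second, not the first, without discarding causal signal along the way.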

What would settle it

Performance comparison on test sets that explicitly introduce new confounders absent from training data, such as controlled shifts in traffic density or weather patterns designed to break the original statistical shortcuts, to check whether CausalVAD still outperforms non-causal baselines.
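A minimal version of that probe can be written directly: fit a naive predictor where a spurious context variable tracks the label during training, then flip the correlation at test time. The data generator and thresholds below are invented for illustration; a real evaluation would use nuScenes-scale scenarios.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_split(n, p_align):
    """Toy driving data: binary label y (e.g. brake/go) and a spurious
    context s that agrees with y with probability p_align. The causal
    feature is a noisy copy of y; the spurious feature is a cleaner
    copy of s, inviting shortcut learning."""
    y = rng.integers(0, 2, n)
    s = np.where(rng.random(n) < p_align, y, 1 - y)
    x = np.stack([y + 0.5 * rng.normal(size=n),      # causal, noisy
                  s + 0.1 * rng.normal(size=n)], 1)  # spurious, clean
    return x, y

x_tr, y_tr = make_split(5000, p_align=0.95)   # confounder aligned
x_te, y_te = make_split(5000, p_align=0.05)   # confounder flipped

# Least-squares linear probe with intercept (no de-confounding)
w, *_ = np.linalg.lstsq(np.c_[x_tr, np.ones(len(x_tr))], y_tr, rcond=None)
acc = lambda x, y: (((np.c_[x, np.ones(len(x))] @ w) > 0.5) == y).mean()
print(acc(x_tr, y_tr), acc(x_te, y_te))  # train high; shifted test collapses
```

A de-confounded model should hold its accuracy across the flip; the naive probe here does not, which is the signature the proposed evaluation would look for.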

Figures

Figures reproduced from arXiv: 2603.18561 by Jiacheng Tang, Jian Pu, Jia Zhang, Kai Zhang, Zhiyuan Zhou, Zhuolin He.

Figure 1: The problem of spurious correlation.
Figure 2: The overall architecture of CausalVAD. Our method performs precise, multi-stage causal interventions at critical information hubs.
Figure 3: The structural causal model (SCM) of VAD.
Figure 4: The backdoor adjustment [25] principle. A confounder Z opens a spurious backdoor path S ← Z → Y. Applying the do-operator, i.e., P(Y|do(S)), severs this path, isolating the pure causal effect S → Y.
Figure 5: T-SNE visualization of the final ego-query embeddings.
Figure 6: Qualitative analysis of CausalVAD's interpretability and decision logic in a challenging cut-in scenario.
Figure 7: Qualitative visualization of IDM in a cut-in scenario.
Original abstract

Planning-oriented end-to-end driving models show great promise, yet they fundamentally learn statistical correlations instead of true causal relationships. This vulnerability leads to causal confusion, where models exploit dataset biases as shortcuts, critically harming their reliability and safety in complex scenarios. To address this, we introduce CausalVAD, a de-confounding training framework that leverages causal intervention. At its core, we design the sparse causal intervention scheme (SCIS), a lightweight, plug-and-play module to instantiate the backdoor adjustment theory in neural networks. SCIS constructs a dictionary of prototypes representing latent driving contexts. It then uses this dictionary to intervene on the model's sparse vectorized queries. This step actively eliminates spurious associations induced by confounders, thereby eliminating spurious factors from the representations for downstream tasks. Extensive experiments on benchmarks like nuScenes show CausalVAD achieves state-of-the-art planning accuracy and safety. Furthermore, our method demonstrates superior robustness against both data bias and noisy scenarios configured to induce causal confusion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes CausalVAD, a de-confounding framework for planning-oriented end-to-end autonomous driving models. It introduces the sparse causal intervention scheme (SCIS) that constructs a dictionary of prototypes from data to represent latent contexts and intervenes on sparse vectorized queries to implement backdoor adjustment, thereby removing spurious associations. The paper reports that this yields state-of-the-art planning accuracy and safety on nuScenes while providing superior robustness to data bias and to noisy scenarios designed to induce causal confusion.

Significance. If SCIS can be shown to correctly instantiate backdoor adjustment without residual bias or loss of causal signal, the framework would address a fundamental limitation of correlation-based driving models and improve reliability in safety-critical settings.

major comments (3)
  1. [SCIS module description] No explicit causal graph is supplied, and no derivation or identifiability argument is given showing that prototype-based intervention on sparse queries equals the backdoor adjustment formula P(Y|do(X)) = ∑_z P(Y|X,z)P(z).
  2. [Experiments section] The central SOTA and robustness claims are asserted without reported quantitative baseline numbers, ablation results on the free parameter 'prototype dictionary size', metrics quantifying residual causal confusion, or statistical significance tests.
  3. [Robustness evaluation] Because prototypes are extracted from the same observational distribution that contains the confounders, the manuscript provides no analysis demonstrating that the intervention step removes all relevant spurious paths without discarding causally relevant information.
minor comments (1)
  1. [Abstract] The abstract states that 'extensive experiments' were performed yet supplies no concrete metric values or baseline names.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the insightful comments on our manuscript. We address each major comment below and have revised the paper to incorporate the suggested improvements where possible.

Point-by-point responses
  1. Referee: [SCIS module description] No explicit causal graph is supplied, and no derivation or identifiability argument is given showing that prototype-based intervention on sparse queries equals the backdoor adjustment formula P(Y|do(X)) = ∑_z P(Y|X,z)P(z).

    Authors: We agree with this observation. The revised manuscript includes an explicit causal graph in Figure 2, illustrating the relationships between inputs X, confounders Z, and planning output Y. Additionally, Section 3.3 now provides a detailed derivation showing that the sparse causal intervention on vectorized queries approximates the backdoor adjustment by summing over the prototype distribution: P(Y|do(X)) ≈ ∑_p P(Y|X,p) P(p), where p denotes the learned prototypes. We discuss the identifiability assumptions, including that the prototype dictionary sufficiently captures the latent contexts. revision: yes

  2. Referee: [Experiments section] The central SOTA and robustness claims are asserted without reported quantitative baseline numbers, ablation results on the free parameter 'prototype dictionary size', metrics quantifying residual causal confusion, or statistical significance tests.

    Authors: We acknowledge the lack of detailed quantitative support in the original submission. The revised experiments section now reports full baseline numbers in Table 1 for comparison with prior methods on nuScenes planning metrics. We include an ablation study on prototype dictionary size (Table 3, sizes ranging from 10 to 200), new metrics for residual causal confusion (e.g., correlation coefficients between intervened representations and known bias factors), and statistical significance via repeated trials with t-tests (p-values reported). revision: yes

  3. Referee: [Robustness evaluation] Because prototypes are extracted from the same observational distribution that contains the confounders, the manuscript provides no analysis demonstrating that the intervention step removes all relevant spurious paths without discarding causally relevant information.

    Authors: This is a valid concern. In the revision, we have added theoretical analysis in Section 4.2 using do-calculus to argue that the sparse intervention blocks spurious paths from confounders while preserving causal paths through the prototypes. We also provide empirical results on synthetically biased nuScenes subsets, showing reduced sensitivity to confounders. However, a definitive demonstration that no causally relevant information is lost would require access to the ground-truth causal graph, which is not available; we have added this as a limitation in the discussion. revision: partial

standing simulated objections not resolved
  • Complete empirical verification that the intervention removes all spurious paths without any loss of causal signal, due to the absence of ground-truth causal structures in real-world driving datasets like nuScenes.

Circularity Check

0 steps flagged

SCIS applies an externally specified backdoor adjustment via data-derived prototypes; by construction it does not reduce the target metric to its own fitted inputs.

full rationale

The paper's core step constructs a prototype dictionary from observational data and intervenes on sparse queries to instantiate backdoor adjustment. This follows the standard causal formula P(Y|do(X)) = ∑ P(Y|X,z)P(z) rather than re-deriving the planning accuracy or robustness metric from the same fitted prototypes. No equation equates the final performance claim to the input statistics by definition, and no self-citation chain or uniqueness theorem is invoked to force the result. The nuScenes experiments and bias/noise augmentations supply independent empirical checks. Minor risk exists that prototypes may incompletely cover confounders, but this is a coverage issue, not a circular reduction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework rests on the assumption that latent driving contexts can be captured by a finite set of prototypes and that intervening on sparse queries removes all spurious correlations induced by confounders.

free parameters (1)
  • prototype dictionary size
    Number of context prototypes must be chosen; affects how finely latent confounders are represented.
axioms (1)
  • domain assumption: The backdoor adjustment formula can be realized by intervening on sparse vectorized queries inside a neural network.
    Invoked to justify the SCIS module design.
invented entities (1)
  • Sparse causal intervention scheme (SCIS): no independent evidence
    purpose: Lightweight module that performs causal intervention on model queries using a prototype dictionary
    New component introduced by the paper; no independent evidence outside this work.

pith-pipeline@v0.9.0 · 5478 in / 1263 out tokens · 38841 ms · 2026-05-15T09:05:55.572158+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The DAWN of World-Action Interactive Models

    cs.CV 2026-05 unverdicted novelty 6.0

    DAWN couples a world predictor with a world-conditioned action denoiser in latent space so that each refines the other recursively, yielding strong planning and safety results on autonomous driving benchmarks.

  2. DINO-VO: Learning Where to Focus for Enhanced State Estimation

    cs.CV 2026-04 unverdicted novelty 6.0

    DINO-VO achieves state-of-the-art monocular visual odometry accuracy and generalization by training a differentiable patch selector together with multi-task features and inverse-depth bundle adjustment.

  3. EponaV2: Driving World Model with Comprehensive Future Reasoning

    cs.CV 2026-05 unverdicted novelty 5.0

    EponaV2 advances perception-free driving world models by forecasting comprehensive future 3D geometry and semantic representations, achieving SOTA planning performance on NAVSIM benchmarks.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · cited by 3 Pith papers · 4 internal anchors

  1. [1] Shahin Atakishiyev, Mohammad Salameh, and Randy Goebel. Safety implications of explainable artificial intelligence in end-to-end autonomous driving. IEEE Trans. Intell. Transp. Syst., 2025.

  2. [2] Mayank Bansal, Alex Krizhevsky, and Abhijit Ogale. ChauffeurNet: Learning to drive by imitating the best and synthesizing the worst. arXiv preprint arXiv:1812.03079, 2018.

  3. [3] Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuScenes: A multimodal dataset for autonomous driving. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 11621–11631.

  4. [4] Dian Chen, Brady Zhou, Vladlen Koltun, and Philipp Krähenbühl. Learning by cheating. In Conf. Robot. Learn., pages 66–75. PMLR, 2020.

  5. [5] Zhili Chen, Maosheng Ye, Shuangjie Xu, Tongyi Cao, and Qifeng Chen. PPAD: Iterative interactions of prediction and planning for end-to-end autonomous driving. In Eur. Conf. Comput. Vis., pages 239–256. Springer, 2024.

  6. [6] Jie Cheng, Yingbing Chen, Xiaodong Mei, Bowen Yang, Bo Li, and Ming Liu. Rethinking imitation-based planners for autonomous driving. In 2024 IEEE Int. Conf. Robot. Autom., pages 14123–14130. IEEE, 2024.

  7. [7] MMDetection3D Contributors. MMDetection3D: OpenMMLab next-generation platform for general 3D object detection. https://github.com/open-mmlab/mmdetection3d, 2020.

  8. [8] Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, et al. NAVSIM: Data-driven non-reactive autonomous vehicle simulation and benchmarking. Adv. Neural Inform. Process. Syst., 37:28706–28719, 2024.

  9. [9] Pim De Haan, Dinesh Jayaraman, and Sergey Levine. Causal confusion in imitation learning. Adv. Neural Inform. Process. Syst., 32, 2019.

  10. [10] Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, and Xiang Bai. ORION: A holistic end-to-end autonomous driving framework by vision-language instructed action generation. In Proc. IEEE/CVF Int. Conf. Comput. Vis., pages 24823–24834, 2025.

  11. [11] Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A Wichmann. Shortcut learning in deep neural networks. Nature Mach. Intell., 2(11):665–673, 2020.

  12. [12] Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, Junchi Yan, and Dacheng Tao. ST-P3: End-to-end vision-based autonomous driving via spatial-temporal feature learning. In Eur. Conf. Comput. Vis., pages 533–549. Springer, 2022.

  13. [13] Yihan Hu, Jiazhi Yang, Li Chen, Keyu Li, Chonghao Sima, Xizhou Zhu, Siqi Chai, Senyao Du, Tianwei Lin, Wenhai Wang, et al. Planning-oriented autonomous driving. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 17853–17862, 2023.

  14. [14] Jyh-Jing Hwang, Runsheng Xu, Hubert Lin, Wei-Chih Hung, Jingwei Ji, Kristy Choi, Di Huang, Tong He, Paul Covington, Benjamin Sapp, et al. EMMA: End-to-end multimodal model for autonomous driving. arXiv preprint arXiv:2410.23262.

  15. [15] Xiaosong Jia, Zhenjie Yang, Qifeng Li, Zhiyuan Zhang, and Junchi Yan. Bench2Drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. Adv. Neural Inform. Process. Syst., 37:819–844, 2024.

  16. [16] Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, and Xinggang Wang. VAD: Vectorized scene representation for efficient autonomous driving. In Proc. IEEE/CVF Int. Conf. Comput. Vis., pages 8340–8350, 2023.

  17. [17] Bo Jiang, Shaoyu Chen, Bencheng Liao, Xingyu Zhang, Wei Yin, Qian Zhang, Chang Huang, Wenyu Liu, and Xinggang Wang. Senna: Bridging large vision-language models and end-to-end autonomous driving. arXiv preprint arXiv:2410.22313, 2024.

  18. [18] Sicong Jiang, Zilin Huang, Kangan Qian, Ziang Luo, Tianze Zhu, Yang Zhong, Yihong Tang, Menglin Kong, Yunlong Wang, Siwen Jiao, et al. A survey on vision-language-action models for autonomous driving. In Proc. IEEE/CVF Int. Conf. Comput. Vis., pages 4524–4536, 2025.

  19. [19] Guanghao Li, Kerui Ren, Linning Xu, Zhewen Zheng, Changjian Jiang, Xin Gao, Bo Dai, Jian Pu, Mulin Yu, and Jiangmiao Pang. ArtDeco: Toward high-fidelity on-the-fly reconstruction with hierarchical gaussian structure and feed-forward guidance. In Int. Conf. Learn. Represent.

  20. [20] Guanghao Li, Yu Cao, Qi Chen, Xin Gao, Yifan Yang, and Jian Pu. PAPL-SLAM: Principal axis-anchored monocular point-line SLAM. IEEE Robot. Autom. Letters, 2025.

  21. [21] Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, and Jose M Alvarez. Is ego status all you need for open-loop end-to-end autonomous driving? In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 14864–14873, 2024.

  22. [22] Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, et al. DiffusionDrive: Truncated diffusion model for end-to-end autonomous driving. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 12037–12047.

  23. [23] Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao, Ke Wang, Liping Hou, Rongjun Li, and Wei Peng. A survey on hallucination in large vision-language models. arXiv preprint arXiv:2402.00253, 2024.

  24. [24] Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, and Yann Cun. Off-road obstacle avoidance through end-to-end learning. Adv. Neural Inform. Process. Syst., 18, 2005.

  25. [25] Judea Pearl. Causality. Cambridge University Press, 2009.

  26. [26] Mozhgan Pourkeshavarz, Junrui Zhang, and Amir Rasouli. CaDeT: A causal disentanglement approach for robust trajectory prediction in autonomous driving. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 14874–14884.

  27. [27] Wenchao Sun, Xuewu Lin, Yining Shi, Chuang Zhang, Haoran Wu, and Sifa Zheng. SparseDrive: End-to-end autonomous driving via sparse scene representation. In 2025 IEEE Int. Conf. Robot. Autom., pages 8795–8801. IEEE.

  28. [28] Jiacheng Tang, Mingyue Feng, Jiachao Liu, Yaonong Wang, and Jian Pu. Decoupling scene perception and ego status: A multi-context fusion approach for enhanced generalization in end-to-end autonomous driving. arXiv preprint arXiv:2511.13079, 2025.

  29. [29] Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, and Hang Zhao. DriveVLM: The convergence of autonomous driving and large vision-language models. arXiv preprint arXiv:2402.12289, 2024.

  30. [30] Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, and Jose M Alvarez. OmniDrive: A holistic vision-language dataset for autonomous driving with counterfactual reasoning. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 22442–22452, 2025.

  31. [31] Tan Wang, Jianqiang Huang, Hanwang Zhang, and Qianru Sun. Visual commonsense R-CNN. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 10760–10770, 2020.

  32. [32] Shaoyuan Xie, Lingdong Kong, Yuhao Dong, Chonghao Sima, Wenwei Zhang, Qi Alfred Chen, Ziwei Liu, and Liang Pan. Are VLMs ready for autonomous driving? An empirical study from the reliability, data and metric perspectives. In Proc. IEEE/CVF Int. Conf. Comput. Vis., pages 6585–6597.

  33. [33] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In Int. Conf. Mach. Learn., pages 2048–2057. PMLR, 2015.

  34. [34] Zhenhua Xu, Yan Bai, Yujia Zhang, Zhuoling Li, Fei Xia, Kwan-Yee K Wong, Jianqiang Wang, and Hengshuang Zhao. DriveGPT4-V2: Harnessing large language model capabilities for enhanced closed-loop autonomous driving. In Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., pages 17261–17270, 2025.

  35. [35] Yuxiang Yan, Boda Liu, Jianfei Ai, Qinbu Li, Ru Wan, and Jian Pu. PointSSC: A cooperative vehicle-infrastructure point cloud benchmark for semantic scene completion. In IEEE Int. Conf. Robot. Autom., pages 17027–17034. IEEE.

  36. [36] Yuxiang Yan, Zhiyuan Zhou, Xin Gao, Guanghao Li, Shenglin Li, Jiaqi Chen, Qunyan Pu, and Jian Pu. Learning spatial-aware manipulation ordering. In Adv. Neural Inform. Process. Syst., 2025.

  37. [37] Dingkang Yang, Kun Yang, Haopeng Kuang, Zhaoyu Chen, Yuzheng Wang, and Lihua Zhang. Towards context-aware emotion recognition debiasing from a causal demystification perspective via de-confounded training. IEEE Trans. Pattern Anal. Mach. Intell., 46(12):10663–10680, 2024.

  38. [38] Zhenjie Yang, Yilin Chai, Xiaosong Jia, Qifeng Li, Yuqian Shao, Xuekai Zhu, Haisheng Su, and Junchi Yan. DriveMoE: Mixture-of-experts for vision-language-action model in end-to-end autonomous driving. arXiv preprint arXiv:2505.16278, 2025.