Visual Adversarial Attack on Vision-Language Models for Autonomous Driving

Aishan Liu; Boyi Jia; Lu Wang; Qiang Fu; Shengshan Hu; Siyuan Liang; Tianyuan Zhang; Xianglong Liu; Xinwei Zhang; Yitong Zhang

arxiv: 2411.18275 · v2 · submitted 2024-11-27 · 💻 cs.CV


Pith reviewed 2026-05-23 16:33 UTC · model grok-4.3

classification 💻 cs.CV

keywords adversarial attack · vision-language model · autonomous driving · semantic-invariant induction · scenario-associated enhancement · visual perturbation · time-series scenario


The pith

ADvLM is the first visual adversarial attack framework designed for vision-language models in autonomous driving, built to handle variable instructions and time-series scenes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that general-purpose attacks on vision-language models overlook the specific demands of autonomous driving, where textual instructions change and visual input arrives as sequences. It proposes ADvLM, which first builds a library of semantically consistent prompts via an LLM guided by semantic entropy and then optimizes perturbations over attention-selected key frames and perspectives. A reader would care because these models are being integrated into safety-critical driving systems, and attacks that generalize across instructions and time could compromise their reliability. Experiments report higher attack success than prior methods across several models and benchmarks, with additional real-world validation.

Core claim

By combining Semantic-Invariant Induction to generate diverse yet semantically equivalent textual instructions through an LLM and semantic entropy with Scenario-Associated Enhancement that selects critical frames and perspectives via attention, ADvLM produces visual perturbations that remain effective against AD VLMs despite instruction variability and the sequential character of driving scenarios, attaining state-of-the-art results on multiple benchmarks and showing applicability in real-world tests.

What carries the argument

Semantic-Invariant Induction paired with Scenario-Associated Enhancement, which together generate and focus adversarial perturbations that generalize across textual variation and time-series driving inputs.
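As a rough illustration of the semantic-entropy idea behind the prompt library, the sketch below computes the entropy of a set of candidate prompts over meaning clusters. It assumes the cluster labels come from some external equivalence check (for example an entailment model); the function name `semantic_entropy` and the toy labels are hypothetical, and the paper's actual procedure may differ.

```python
import math
from collections import Counter


def semantic_entropy(cluster_labels):
    """Shannon entropy (in nats) of the empirical distribution over meaning clusters.

    cluster_labels: one label per candidate prompt. Prompts sharing a label are
    assumed to be semantically equivalent; the clustering itself happens elsewhere.
    """
    if not cluster_labels:
        raise ValueError("need at least one prompt")
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Toy example (hypothetical labels): four rephrasings judged equivalent,
# versus a set whose meaning drifts across three clusters.
consistent = ["turn_left", "turn_left", "turn_left", "turn_left"]
drifting = ["turn_left", "turn_left", "stop", "speed_up"]

low = semantic_entropy(consistent)
high = semantic_entropy(drifting)
```

Under this reading, a low-entropy set is one whose rephrasings all land in the same meaning cluster, which is the property a semantically consistent prompt library would need.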

If this is right

Perturbations created this way succeed across many different textual instructions without needing to match any single one exactly.
Optimizing on a subset of attention-selected frames and perspectives makes the attack effective over entire time-series scenarios.
The method records higher attack success rates than prior approaches on several AD VLMs across multiple benchmarks.
Real-world experiments confirm that the generated perturbations transfer to practical driving settings.
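The frame-selection and loss-weighting steps can be sketched as follows. This is a minimal sketch under stated assumptions: per-frame attention scores are taken as already computed scalars, and `select_key_frames` and `combined_attack_loss` are hypothetical helper names. The weighting follows the form L_attack = (1-λ) L_image + λ L_scene quoted elsewhere on this page; how the two loss terms are computed is not specified here.

```python
def select_key_frames(attention_scores, k):
    """Return the indices of the k highest-scoring frames, in temporal order."""
    if k <= 0 or k > len(attention_scores):
        raise ValueError("k must be between 1 and the number of frames")
    ranked = sorted(
        range(len(attention_scores)),
        key=lambda i: attention_scores[i],
        reverse=True,
    )
    return sorted(ranked[:k])


def combined_attack_loss(image_loss, scene_loss, lam):
    """Weighted sum (1 - lam) * image_loss + lam * scene_loss, with lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * image_loss + lam * scene_loss


# Toy example: five frames with made-up attention scores.
scores = [0.1, 0.7, 0.2, 0.9, 0.4]
key_frames = select_key_frames(scores, k=2)
loss = combined_attack_loss(image_loss=2.0, scene_loss=4.0, lam=0.25)
```

Optimizing only on the selected frames keeps the per-step cost bounded, while the scene term is what would push the perturbation to hold across the rest of the sequence.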

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Safety testing protocols for autonomous driving systems would need to incorporate attacks that explicitly vary instructions and span full sequences.
Model developers might add checks for semantic consistency in prompts and robustness around attention mechanisms to reduce such vulnerabilities.
If the generalization holds, regulatory evaluations of autonomous vehicles could require explicit testing against scenario-associated visual perturbations.

Load-bearing premise

The assumption that an LLM prompt library built with semantic entropy plus attention-selected frames will yield perturbations effective across unseen instructions and full driving sequences without benchmark-specific overfitting.

What would settle it

Running ADvLM on a new AD VLM or driving benchmark outside the paper's tested set and finding attack success rates no higher than existing general VLM attacks, or real-world physical tests showing no transfer of the claimed effectiveness.

Figures

Figures reproduced from arXiv: 2411.18275 by Aishan Liu, Boyi Jia, Lu Wang, Qiang Fu, Shengshan Hu, Siyuan Liang, Tianyuan Zhang, Xianglong Liu, Xinwei Zhang, Yitong Zhang.

**Figure 1.** Illustration of ADvLM, where visual attacks lead the model to generate incorrect decisions, demonstrating the potential risks associated with adversarial vulnerabilities in VLMs for AD.

**Figure 2.** Illustration of the main challenges of attacks for VLMs

**Figure 3.** The ADvLM Framework. ADvLM introduces Semantic-Invariant Induction in the textual domain and Scenario-Associated Enhancement in the visual domain, ensuring attack effectiveness across varied instructions and sequential viewpoints.

**Figure 4.** Closed-loop experiment in Town 03 Route 26 of CARLA. After…

**Figure 5.** Experiment results under different steps and budgets.

**Figure 6.** Results under different numbers of prompts.

**Figure 8.** Real-World Case Study of ADvLM Attack. (a) Experimental environment setup. (b) First-person view from the vehicle. (c) Normal driving without attack, with the vehicle following the intended path. (d) ADvLM attack effect, causing the vehicle to deviate from its path, demonstrating potential real-world safety risks.

Original abstract

Vision-language models (VLMs) have significantly advanced autonomous driving (AD) by enhancing reasoning capabilities. However, these models remain highly vulnerable to adversarial attacks. While existing research has primarily focused on general VLM attacks, the development of attacks tailored to the safety-critical AD context has been largely overlooked. In this paper, we take the first step toward designing adversarial attacks specifically targeting VLMs in AD, exposing the substantial risks these attacks pose within this critical domain. We identify two unique challenges for effective adversarial attacks on AD VLMs: the variability of textual instructions and the time-series nature of visual scenarios. To this end, we propose ADvLM, the first visual adversarial attack framework specifically designed for VLMs in AD. Our framework introduces Semantic-Invariant Induction, which uses a large language model to create a diverse prompt library of textual instructions with consistent semantic content, guided by semantic entropy. Building on this, we introduce Scenario-Associated Enhancement, an approach where attention mechanisms select key frames and perspectives within driving scenarios to optimize adversarial perturbations that generalize across the entire scenario. Extensive experiments on several AD VLMs over multiple benchmarks show that ADvLM achieves state-of-the-art attack effectiveness. Moreover, real-world attack studies further validate its applicability and potential in practice.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ADvLM targets AD-specific VLM attack challenges with prompt libraries and attention-based frame selection, but the generalization evidence is thin on transfer to unseen instructions and frames.


The paper's core move is to treat textual instruction variability and time-series driving scenes as distinct problems that general VLM attacks ignore. It builds ADvLM around Semantic-Invariant Induction, which uses an LLM and semantic entropy to generate a prompt library, plus Scenario-Associated Enhancement, which applies attention to pick key frames and perspectives for perturbation generation. That framing is new relative to the cited general VLM attack work, and the experiments claim stronger attack success rates than baselines across several AD VLMs and benchmarks, with added real-world tests.

Referee Report

1 major / 0 minor

Summary. The paper introduces ADvLM as the first visual adversarial attack framework tailored to vision-language models (VLMs) used in autonomous driving (AD). It identifies two AD-specific challenges—variability in textual instructions and the time-series nature of driving scenarios—and proposes Semantic-Invariant Induction (LLM-generated diverse prompt library guided by semantic entropy) to handle instruction variability plus Scenario-Associated Enhancement (attention-based selection of key frames and perspectives) to optimize perturbations that generalize across scenarios. The central claims are that ADvLM achieves state-of-the-art attack success rates on multiple AD VLMs across benchmarks and that real-world studies confirm practical applicability.

Significance. If the generalization and SOTA claims hold under rigorous transfer testing, the work would be significant for exposing concrete safety risks in AD systems that rely on VLMs for reasoning. It is the first paper to target this domain specifically rather than applying generic VLM attacks, and the combination of LLM-driven semantic invariance with attention-driven scenario enhancement offers a plausible empirical construction for handling AD variability.

major comments (1)

[Abstract / method description] Abstract and method description: the central effectiveness claim requires that perturbations generated via the entropy-guided prompt library and attention-selected frames/perspectives remain effective on unseen instructions and non-selected frames in driving sequences, yet no explicit transfer metrics (e.g., ASR on held-out instructions outside the library or on frames below the attention threshold) are reported. Without these, benchmark results may reflect selection bias rather than robustness to the stated AD challenges.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment below.


Referee: [Abstract / method description] Abstract and method description: the central effectiveness claim requires that perturbations generated via the entropy-guided prompt library and attention-selected frames/perspectives remain effective on unseen instructions and non-selected frames in driving sequences, yet no explicit transfer metrics (e.g., ASR on held-out instructions outside the library or on frames below the attention threshold) are reported. Without these, benchmark results may reflect selection bias rather than robustness to the stated AD challenges.

Authors: We agree that explicit transfer metrics on held-out instructions and non-selected frames would provide stronger direct evidence of generalization and help rule out selection bias. While the reported benchmarks already evaluate across diverse instructions and full driving sequences, the manuscript does not include the specific held-out tests suggested. In the revised version we will add experiments reporting ASR on instructions excluded from the prompt library and on frames below the attention threshold.

Revision: yes
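The held-out comparison discussed in this exchange could be tabulated along the following lines. This is a minimal sketch, not the authors' evaluation code: the record format, the function names, and the toy prompts are all hypothetical, and ASR is taken here simply as the fraction of trials in which the attack succeeded.

```python
def attack_success_rate(outcomes):
    """Fraction of successful trials, or None when there are no trials."""
    outcomes = list(outcomes)
    if not outcomes:
        return None
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def asr_by_split(records, library_prompts):
    """Split (prompt, success) records by prompt-library membership and report ASR."""
    library = set(library_prompts)
    in_library = [ok for prompt, ok in records if prompt in library]
    held_out = [ok for prompt, ok in records if prompt not in library]
    return {
        "in_library": attack_success_rate(in_library),
        "held_out": attack_success_rate(held_out),
    }


# Toy records (made up for illustration): two library prompts, three held-out ones.
records = [
    ("go straight", True),
    ("keep going forward", True),
    ("proceed ahead", False),
    ("continue on this road", True),
    ("drive on", False),
]
result = asr_by_split(records, library_prompts=["go straight", "keep going forward"])
```

A large gap between the two numbers would point toward the selection-bias concern raised by the referee; comparable numbers would support the generalization claim. The same split could be applied to frames above and below the attention threshold.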

Circularity Check

0 steps flagged

No circularity detected; empirical construction validated on external benchmarks


The paper presents ADvLM as an empirical framework with two components (Semantic-Invariant Induction via LLM prompt library and Scenario-Associated Enhancement via attention selection) to address variability challenges in AD VLMs. All claims rest on experimental results across multiple benchmarks and real-world studies rather than any derivation that reduces to its own inputs by construction, fitted parameters renamed as predictions, or load-bearing self-citations. The central results are measured against external data and models, rendering the chain self-contained with no identifiable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities can be extracted. The framework implicitly depends on the effectiveness of LLM-generated semantic equivalence and attention-based key-frame selection, whose validity is asserted rather than derived.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Semantic-Invariant Induction... guided by semantic entropy... Scenario-Associated Enhancement... attention mechanisms select key frames... $L_{\text{attack}} = (1-\lambda) L_{\text{image}} + \lambda L_{\text{scene}}$

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

variability of textual instructions and the time-series nature of visual scenarios

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Challenging Vision-Language Models with Physically Deployable Multimodal Semantic Lighting Attacks
cs.CV 2026-04 unverdicted novelty 8.0

MSLA is the first physically deployable attack that uses adversarial lighting to break semantic alignment in VLMs such as CLIP, LLaVA, and BLIP, causing classification failures and hallucinations in real scenes.

WMAttack: Automated Attack Search for Adversarial Evaluation of World-Model Agents
cs.LG 2026-05 unverdicted novelty 7.0

WMAttack automates finite-budget attack search for world-model agents via SCAS and RGAR, reporting higher normalized reward drops than baselines on Atari and DMC tasks.

GuardAD: Safeguarding Autonomous Driving MLLMs via Markovian Safety Logic
cs.AI 2026-05 unverdicted novelty 6.0

GuardAD reduces accident rates by 32% in autonomous driving MLLMs by using n-th order Markovian logic to infer latent hazards and revise actions.

High-Entropy Tokens as Multimodal Failure Points in Vision-Language Models
cs.CV 2025-12 unverdicted novelty 6.0

High-entropy tokens act as concentrated multimodal failure points in VLMs, enabling sparse Entropy-Guided Attacks that achieve 93-95% success and 30-38% harmful rates with cross-model transfer.

Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs
cs.RO 2026-05 unverdicted novelty 5.0

Sensor perturbations in driving VLAs cause Chain-of-Causation reasoning changes that correlate strongly with 5.3x higher trajectory deviation, while enabling such reasoning improves accuracy by 11.8%.

Adversarial Flow Matching for Imperceptible Attacks on End-to-End Autonomous Driving
cs.CV 2026-04 unverdicted novelty 5.0

AFM is a novel gray-box adversarial attack using flow matching to create visually imperceptible perturbations that degrade performance of Vision-Language-Action and modular end-to-end autonomous driving models while s...

A Patch-based Cross-view Regularized Framework for Backdoor Defense in Multimodal Large Language Models
cs.CV 2026-04 unverdicted novelty 5.0

A patch-augmented cross-view regularization method reduces backdoor attack success rates in multimodal LLMs by enforcing output differences between original and perturbed views while using entropy constraints to prese...

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · cited by 7 Pith papers · 6 internal anchors

[1]

GPT-4 Technical Report

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ah- mad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 ,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

Image hijacks: Adversarial images can control generative models at runtime

Luke Bailey, Euan Ong, Stuart Russell, and Scott Emmons. Image hijacks: Adversarial images can control generative models at runtime. arXiv preprint arXiv:2309.00236, 2023. 2

work page arXiv 2023
[3]

Driving with llms: Fusing object-level vec- tor modality for explainable autonomous driving

Long Chen, Oleg Sinavski, Jan H ¨unermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, and Jamie Shotton. Driving with llms: Fusing object-level vec- tor modality for explainable autonomous driving. In ICRA,

work page
[4]

Word2vec

Kenneth Ward Church. Word2vec. Natural Language Engi- neering, 2017. 4

work page 2017
[5]

Carla: An open urban driving simulator

Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. Carla: An open urban driving simulator. In Conference on robot learning, 2017. 5

work page 2017
[6]

Detecting hallucinations in large language models using semantic entropy

Sebastian Farquhar, Jannik Kossen, Lorenz Kuhn, and Yarin Gal. Detecting hallucinations in large language models using semantic entropy. Nature, 2024. 4

work page 2024
[7]

Boosting transferability in vision-language attacks via diversification along the intersection region of adversarial trajectory

Sensen Gao, Xiaojun Jia, Xuhong Ren, Ivor Tsang, and Qing Guo. Boosting transferability in vision-language attacks via diversification along the intersection region of adversarial trajectory. In ECCV, 2025. 2

work page 2025
[8]

Multimodal-gpt: A vision and language model for dialogue with humans

Tao Gong, Chengqi Lyu, Shilong Zhang, Yudong Wang, Miao Zheng, Qian Zhao, Kuikun Liu, Wenwei Zhang, Ping Luo, and Kai Chen. Multimodal-gpt: A vision and lan- guage model for dialogue with humans. arXiv preprint arXiv:2305.04790, 2023. 5, 6, 7

work page arXiv 2023
[9]

Explaining and Harnessing Adversarial Examples

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014. 5, 6, 7

work page internal anchor Pith review Pith/arXiv arXiv 2014
[10]

A comprehensive evaluation framework for deep model robustness

Jun Guo, Wei Bao, Jiakai Wang, Yuqing Ma, Xinghai Gao, Gang Xiao, Aishan Liu, Jian Dong, Xianglong Liu, and Wen- jun Wu. A comprehensive evaluation framework for deep model robustness. PR, 2023. 2

work page 2023
[11]

Generating transferable 3d adversarial point cloud via random perturbation factoriza- tion

Bangyan He, Jian Liu, Yiming Li, Siyuan Liang, Jingzhi Li, Xiaojun Jia, and Xiaochun Cao. Generating transferable 3d adversarial point cloud via random perturbation factoriza- tion. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023. 1

work page 2023
[12]

Exploring the physical- world adversarial robustness of vehicle detection

Wei Jiang, Tianyuan Zhang, Shuangcheng Liu, Weiyu Ji, Zichao Zhang, and Gang Xiao. Exploring the physical- world adversarial robustness of vehicle detection. Electron- ics, 2023. 2

work page 2023
[13]

Robuste2e: Exploring the robustness of end-to-end autonomous driving

Wei Jiang, Lu Wang, Tianyuan Zhang, Yuwei Chen, Jian Dong, Wei Bao, Zichao Zhang, and Qiang Fu. Robuste2e: Exploring the robustness of end-to-end autonomous driving. Electronics, 2024. 2

work page 2024
[14]

Environmen- tal matching attack against unmanned aerial vehicles object detection

Dehong Kong, Siyuan Liang, and Wenqi Ren. Environmen- tal matching attack against unmanned aerial vehicles object detection. arXiv preprint arXiv:2405.07595, 2024. 1

work page arXiv 2024
[15]

Patch is enough: Naturalistic ad- versarial patch against vision-language pre-training models

Dehong Kong, Siyuan Liang, Xiaopeng Zhu, Yuansheng Zhong, and Wenqi Ren. Patch is enough: Naturalistic ad- versarial patch against vision-language pre-training models. arXiv preprint arXiv:2410.04884, 2024. 1

work page arXiv 2024
[16]

Towards benchmarking and assessing visual naturalness of physical world adversarial attacks

Simin Li, Shuning Zhang, Gujun Chen, Dong Wang, Pu Feng, Jiakai Wang, Aishan Liu, Xin Yi, and Xianglong Liu. Towards benchmarking and assessing visual naturalness of physical world adversarial attacks. In CVPR, 2023. 2

work page 2023
[17]

Efficient adversarial attacks for visual object tracking

Siyuan Liang, Xingxing Wei, Siyuan Yao, and Xiaochun Cao. Efficient adversarial attacks for visual object tracking. In Computer Vision–ECCV 2020: 16th European Confer- ence, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVI 16, 2020. 1

work page 2020
[18]

Generate more imperceptible adversarial examples for object detec- tion

Siyuan Liang, Xingxing Wei, and Xiaochun Cao. Generate more imperceptible adversarial examples for object detec- tion. In ICML, 2021

work page 2021
[19]

A large-scale multiple- objective method for black-box attack against object detec- tion

Siyuan Liang, Longkang Li, Yanbo Fan, Xiaojun Jia, Jingzhi Li, Baoyuan Wu, and Xiaochun Cao. A large-scale multiple- objective method for black-box attack against object detec- tion. In European Conference on Computer Vision, 2022

work page 2022
[20]

Parallel rect- angle flip attack: A query-based black-box attack against object detection.arXiv preprint arXiv:2201.08970, 2022

Siyuan Liang, Baoyuan Wu, Yanbo Fan, Xingxing Wei, and Xiaochun Cao. Parallel rectangle flip attack: A query-based black-box attack against object detection. arXiv preprint arXiv:2201.08970, 2022. 1

work page arXiv 2022
[21]

Badclip: Dual- embedding guided backdoor attack on multimodal con- trastive learning

Siyuan Liang, Mingli Zhu, Aishan Liu, Baoyuan Wu, Xiaochun Cao, and Ee-Chien Chang. Badclip: Dual- embedding guided backdoor attack on multimodal con- trastive learning. arXiv preprint arXiv:2311.12075 , 2023. 2

work page arXiv 2023
[22]

Perceptual-sensitive gan for generating adversarial patches

Aishan Liu, Xianglong Liu, Jiaxin Fan, Yuqing Ma, Anlan Zhang, Huiyuan Xie, and Dacheng Tao. Perceptual-sensitive gan for generating adversarial patches. In AAAI, 2019. 2

work page 2019
[23]

Spatiotemporal attacks for embodied agents

Aishan Liu, Tairan Huang, Xianglong Liu, Yitao Xu, Yuqing Ma, Xinyun Chen, Stephen J Maybank, and Dacheng Tao. Spatiotemporal attacks for embodied agents. In ECCV, 2020

work page 2020
[24]

Bias-based universal adver- sarial patch attack for automatic check-out

Aishan Liu, Jiakai Wang, Xianglong Liu, Bowen Cao, Chongzhi Zhang, and Hang Yu. Bias-based universal adver- sarial patch attack for automatic check-out. In ECCV, 2020

work page 2020
[25]

Training robust deep neu- ral networks via adversarial noise propagation

Aishan Liu, Xianglong Liu, Hang Yu, Chongzhi Zhang, Qiang Liu, and Dacheng Tao. Training robust deep neu- ral networks via adversarial noise propagation. IEEE TIP, 2021

work page 2021
[26]

{X-Adv}: Physical adversarial object attacks against x-ray prohibited item detection

Aishan Liu, Jun Guo, Jiakai Wang, Siyuan Liang, Renshuai Tao, Wenbo Zhou, Cong Liu, Xianglong Liu, and Dacheng Tao. {X-Adv}: Physical adversarial object attacks against x-ray prohibited item detection. In USENIX, 2023. 1

work page 2023
[27]

Towards defend- ing multiple lp-norm bounded adversarial perturbations via gated batch normalization

Aishan Liu, Shiyu Tang, Xinyun Chen, Lei Huang, Haotong Qin, Xianglong Liu, and Dacheng Tao. Towards defend- ing multiple lp-norm bounded adversarial perturbations via gated batch normalization. IJCV, 2023

work page 2023
[28]

Exploring the rela- tionship between architectural design and adversarially ro- bust generalization

Aishan Liu, Shiyu Tang, Siyuan Liang, Ruihao Gong, Boxi Wu, Xianglong Liu, and Dacheng Tao. Exploring the rela- tionship between architectural design and adversarially ro- bust generalization. In Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, 2023. 2

work page 2023
[29]

Pre-trained trojan attacks for visual recog- nition

Aishan Liu, Xinwei Zhang, Yisong Xiao, Yuguang Zhou, Siyuan Liang, Jiakai Wang, Xianglong Liu, Xiaochun Cao, and Dacheng Tao. Pre-trained trojan attacks for visual recog- nition. arXiv preprint arXiv:2312.15172, 2023. 2

work page arXiv 2023
[30]

Visual instruction tuning

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. NeurIPS, 2024. 5, 6, 7

work page 2024
[31]

Mtd-gpt: A multi-task decision-making gpt model for autonomous driving at unsignalized intersections

Jiaqi Liu, Peng Hang, Xiao Qi, Jianqiang Wang, and Jian Sun. Mtd-gpt: A multi-task decision-making gpt model for autonomous driving at unsignalized intersections. In ITSC,

work page
[32]

Improving adversarial transferability by stable diffusion.arXiv preprint arXiv:2311.11017, 2023

Jiayang Liu, Siyu Zhu, Siyuan Liang, Jie Zhang, Han Fang, Weiming Zhang, and Ee-Chien Chang. Improving adver- sarial transferability by stable diffusion. arXiv preprint arXiv:2311.11017, 2023. 1

work page arXiv 2023
[33]

Harnessing percep- tual adversarial patches for crowd counting

Shunchang Liu, Jiakai Wang, Aishan Liu, Yingwei Li, Yijie Gao, Xianglong Liu, and Dacheng Tao. Harnessing percep- tual adversarial patches for crowd counting. In ACM CCS,

work page
[34]

Hide in thicket: Gener- ating imperceptible and rational adversarial perturbations on 3d point clouds

Tianrui Lou, Xiaojun Jia, Jindong Gu, Li Liu, Siyuan Liang, Bangyan He, and Xiaochun Cao. Hide in thicket: Gener- ating imperceptible and rational adversarial perturbations on 3d point clouds. arXiv preprint arXiv:2403.05247, 2024. 1

work page arXiv 2024
[35]

Poisoning attack against estimating from pairwise comparisons

Ke Ma, Qianqian Xu, Jinshan Zeng, Xiaochun Cao, and Qingming Huang. Poisoning attack against estimating from pairwise comparisons. IEEE Transactions on Pattern Analy- sis and Machine Intelligence, 44(10):6393–6408, 2021

work page 2021
[36]

A tale of hodgerank and spectral method: Target attack against rank aggregation is the fixed point of adversarial game

Ke Ma, Qianqian Xu, Jinshan Zeng, Guorong Li, Xiaochun Cao, and Qingming Huang. A tale of hodgerank and spectral method: Target attack against rank aggregation is the fixed point of adversarial game. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(4):4090–4108, 2022

work page 2022
[37]

Sequential manipula- tion against rank aggregation: theory and algorithm

Ke Ma, Qianqian Xu, Jinshan Zeng, Wei Liu, Xiaochun Cao, Yingfei Sun, and Qingming Huang. Sequential manipula- tion against rank aggregation: theory and algorithm. IEEE TPAMI, 2024. 1

work page 2024
[38]

Dolphins: Multimodal language model for driving

Yingzi Ma, Yulong Cao, Jiachen Sun, Marco Pavone, and Chaowei Xiao. Dolphins: Multimodal language model for driving. arXiv preprint arXiv:2312.00438, 2023. 2, 5, 6

work page arXiv 2023
[39]

Towards Deep Learning Models Resistant to Adversarial Attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learn- ing models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017. 5, 6, 7, 8

work page internal anchor Pith review Pith/arXiv arXiv 2017
[40]

GPT-Driver: Learning to Drive with GPT

Jiageng Mao, Yuxi Qian, Junjie Ye, Hang Zhao, and Yue Wang. Gpt-driver: Learning to drive with gpt.arXiv preprint arXiv:2310.01415, 2023. 1, 2

work page internal anchor Pith review Pith/arXiv arXiv 2023
[41]

Lingoqa: Video question answering for autonomous driving

Ana-Maria Marcu, Long Chen, Jan H ¨unermann, Alice Karn- sund, Benoit Hanotte, Prajwal Chidananda, Saurabh Nair, Vijay Badrinarayanan, Alex Kendall, Jamie Shotton, et al. Lingoqa: Video question answering for autonomous driving. arXiv preprint arXiv:2312.14115, 2023. 2

work page arXiv 2023
[42]

Reason2drive: Towards interpretable and chain-based reasoning for au- tonomous driving

Ming Nie, Renyuan Peng, Chunwei Wang, Xinyue Cai, Jianhua Han, Hang Xu, and Li Zhang. Reason2drive: Towards interpretable and chain-based reasoning for au- tonomous driving. arXiv preprint arXiv:2312.03661, 2023. 2

work page arXiv 2023
[43]

Pixloop, 2024

PIX Moving. Pixloop, 2024. https : / / www . pixmoving.com/pixloop. 8

work page 2024
[44]

Dirty road can attack: Secu- rity of deep learning based automated lane centering under {Physical-World} attack

Takami Sato, Junjie Shen, Ningfei Wang, Yunhan Jia, Xue Lin, and Qi Alfred Chen. Dirty road can attack: Secu- rity of deep learning based automated lane centering under {Physical-World} attack. In USENIX, 2021. 8

work page 2021
[45]

Lmdrive: Closed-loop end-to-end driving with large language models

Hao Shao, Yuxuan Hu, Letian Wang, Guanglu Song, Steven L Waslander, Yu Liu, and Hongsheng Li. Lmdrive: Closed-loop end-to-end driving with large language models. In CVPR, 2024. 1, 2, 5, 6

work page 2024
[46]

Drivelm: Driving with graph visual question answering.arXiv preprint arXiv:2312.14150,

Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Ping Luo, Andreas Geiger, and Hongyang Li. Drivelm: Driving with graph visual ques- tion answering. arXiv preprint arXiv:2312.14150, 2023. 2, 5, 6

work page arXiv 2023
[47]

Trans- ferable multimodal attack on vision-language pre-training models

Haodi Wang, Kai Dong, Zhilei Zhu, Haotong Qin, Aishan Liu, Xiaolin Fang, Jiakai Wang, and Xianglong Liu. Trans- ferable multimodal attack on vision-language pre-training models. In S&P, 2024. 2

work page 2024
[48]

Dual attention suppression at- tack: Generate adversarial camouflage in physical world

Jiakai Wang, Aishan Liu, Zixin Yin, Shunchang Liu, Shiyu Tang, and Xianglong Liu. Dual attention suppression at- tack: Generate adversarial camouflage in physical world. In CVPR, 2021. 2

work page 2021
[49]

Attack end-to-end autonomous driving through module-wise noise

Lu Wang, Tianyuan Zhang, Yikai Han, Muyang Fang, Ting Jin, and Jiaqi Kang. Attack end-to-end autonomous driving through module-wise noise. In CVPRW, 2024. 2

work page 2024
[50]

Drivemlm: Aligning multi-modal large language models with behavioral planning states for au- tonomous driving

Wenhai Wang, Jiangwei Xie, ChuanYang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, et al. Drivemlm: Aligning multi-modal large language models with behavioral planning states for au- tonomous driving. arXiv preprint arXiv:2312.09245, 2023. 2

[51]

Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE TIP, 2004. 5, 8

[52]

Zhiyuan Wang, Zeliang Zhang, Siyuan Liang, and Xiaosen Wang. Diversifying the high-level features for better adversarial transferability. arXiv preprint arXiv:2304.10136, 2023.

[53]

Xingxing Wei, Siyuan Liang, Ning Chen, and Xiaochun Cao. Transferable adversarial attacks for image and video object detection. arXiv preprint arXiv:1811.12641, 2018. 1

[54]

Wenzhuo Xu, Kai Chen, Ziyi Gao, Zhipeng Wei, Jingjing Chen, and Yu-Gang Jiang. Highly transferable diffusion-based unrestricted adversarial attack on pre-trained vision-language models. In ACM MM, 2024. 2

[55]

Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K Wong, Zhenguo Li, and Hengshuang Zhao. Drivegpt4: Interpretable end-to-end autonomous driving via large language model. IEEE Robotics and Automation Letters, 2024. 1, 2

[56]

Ziyi Yin, Muchao Ye, Tianrong Zhang, Tianyu Du, Jinguo Zhu, Han Liu, Jinghui Chen, Ting Wang, and Fenglong Ma. Vlattack: Multimodal adversarial attacks on vision-language tasks via pre-trained models. Advances in Neural Information Processing Systems, 36, 2024. 1, 2

[57]

Zonghao Ying, Aishan Liu, Siyuan Liang, Lei Huang, Jinyang Guo, Wenbo Zhou, Xianglong Liu, and Dacheng Tao. Safebench: A safety evaluation framework for multimodal large language models. arXiv preprint arXiv:2410.18927, 2024

[58]

Zonghao Ying, Aishan Liu, Tianyuan Zhang, Zhengmin Yu, Siyuan Liang, Xianglong Liu, and Dacheng Tao. Jailbreak vision language models via bi-modal adversarial prompt. arXiv preprint arXiv:2406.04031, 2024. 1, 2

[59]

Chongzhi Zhang, Aishan Liu, Xianglong Liu, Yitao Xu, Hang Yu, Yuqing Ma, and Tianlin Li. Interpreting and improving adversarial robustness of deep neural networks with neuron sensitivity. IEEE TIP, 2021. 2

[60]

Jiaming Zhang, Qi Yi, and Jitao Sang. Towards adversarial attack on vision-language pre-training models. In ACM MM,

[61]

Jiaming Zhang, Junhong Ye, Xingjun Ma, Yige Li, Yunfan Yang, Jitao Sang, and Dit-Yan Yeung. Anyattack: Towards large-scale self-supervised generation of targeted adversarial examples for vision-language models. arXiv preprint arXiv:2410.05346, 2024. 2, 5, 6, 7, 8

[62]

Tianyuan Zhang, Yisong Xiao, Xiaoya Zhang, Hao Li, and Lu Wang. Benchmarking the physical-world adversarial robustness of vehicle detection. arXiv preprint arXiv:2304.05098, 2023. 2

[63]

Tianyuan Zhang, Lu Wang, Jiaqi Kang, Xinwei Zhang, Siyuan Liang, Yuwei Chen, Aishan Liu, and Xianglong Liu. Module-wise adaptive adversarial training for end-to-end autonomous driving. arXiv preprint arXiv:2409.07321, 2024. 2

[64]

Tianyuan Zhang, Lu Wang, Hainan Li, Yisong Xiao, Siyuan Liang, Aishan Liu, Xianglong Liu, and Dacheng Tao. Lanevil: Benchmarking the robustness of lane detection to environmental illusions. arXiv preprint arXiv:2406.00934,

[65]

Xinwei Zhang, Aishan Liu, Tianyuan Zhang, Siyuan Liang, and Xianglong Liu. Towards robust physical-world backdoor attacks on lane detection. arXiv preprint arXiv:2405.05553,

[66]

Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man Cheung, and Min Lin. On evaluating adversarial robustness of large vision-language models. NeurIPS, 2024. 2, 5, 6, 7, 8

[67]

Ziqi Zhou, Shengshan Hu, Minghui Li, Hangtao Zhang, Yechao Zhang, and Hai Jin. Advclip: Downstream-agnostic adversarial examples in multimodal contrastive learning. In ACM MM, 2023. 2

[68]

Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, and Mohamed Elhoseiny. Minigpt-4: Enhancing vision-language understanding with advanced large language models. arXiv preprint arXiv:2304.10592, 2023. 5, 6, 7

