Attention Hijacking: Response Manipulation Across Queries in Vision-Language Models

Dongrui Liu; Wei Xue; Wenhan Luo; Yan Li; Yike Guo; Zhiqiang Wang; Zonghao Ying

arxiv: 2605.17310 · v1 · pith:U6CJ544Fnew · submitted 2026-05-17 · 💻 cs.CV · cs.AI

Attention Hijacking: Response Manipulation Across Queries in Vision-Language Models

Zhiqiang Wang , Dongrui Liu , Yan Li , Zonghao Ying , Wei Xue , Wenhan Luo , Yike Guo This is my paper

Pith reviewed 2026-05-20 14:21 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords adversarial attacksvision-language modelsattention manipulationcross-query transferabilityresponse manipulationimage-dominant attentionadversarial examples

0 comments

The pith

Steering attention toward visual tokens lets one adversarial image force target responses across many different queries in vision-language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to show that adversarial examples can keep working when users change the wording of their questions about the same image. Current attacks often fail in this setting because the model's output still depends heavily on the exact text of the query. The authors link successful cross-query transfer to the model maintaining an attention pattern that stays dominated by the image rather than the text. They introduce Attention Hijacking to push the internal attention in that direction by boosting the effect of visual tokens on the desired response tokens and reducing the effect of textual tokens. If the approach succeeds, attacks become more reliable in realistic conditions where queries are not fixed in advance.

Core claim

The paper establishes that preserving an image-dominant attention pattern during response generation enables substantially better cross-query transferability for adversarial examples in vision-language models. Attention Hijacking achieves this by explicitly amplifying the influence of visual tokens on target response tokens while suppressing the competing influence of textual tokens, thereby reducing how much the manipulated output depends on the specific wording of any given query.

What carries the argument

Attention Hijacking, the method that steers internal attention distributions toward a persistent image-dominant pattern by amplifying visual token influence on target response tokens and suppressing textual token influence.

If this is right

The attack improves transfer success across diverse target responses and previously unseen queries on standard vision-language models.
The same steering approach extends to several different attack scenarios beyond the initial setting.
Attention stability emerges as a controllable factor that directly affects how well response manipulation survives changes in query text.
The findings highlight that internal attention patterns can be a lever for making adversarial examples more query-independent.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Defenses could focus on detecting or regularizing image-dominant attention shifts rather than only looking at final output tokens.
The same attention-steering idea might transfer to other multimodal systems where one modality needs to dominate over another for attack transfer.
If attention hijacking proves reliable, training routines that encourage balanced attention across modalities could reduce vulnerability to this class of attack.
Future work might test whether the method still works when the model is fine-tuned on data that explicitly discourages over-reliance on visual tokens.

Load-bearing premise

That preserving an image-dominant attention pattern during response generation is what drives successful cross-query transfer and that explicitly steering attention toward this pattern will produce the claimed transfer gains.

What would settle it

An experiment that steers attention to an image-dominant pattern yet finds no improvement in success rate when the same perturbed image is paired with new queries compared to existing attacks.

Figures

Figures reproduced from arXiv: 2605.17310 by Dongrui Liu, Wei Xue, Wenhan Luo, Yan Li, Yike Guo, Zhiqiang Wang, Zonghao Ying.

**Figure 2.** Figure 2: Distribution of attention scores from image and text tokens to response tokens across [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Attention Reallocation. In a normal user query, attention scores from image and text tokens [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Visualization on inducing hallucination via Attention Hijacking [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Result on sponge examples [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Ablation study on the layers [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 8.** Figure 8: Ablation study on the selection of layers. The number within each colored block represents [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗

**Figure 9.** Figure 9: Ablation study on the selection of layers. The number within each colored block represents [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗

**Figure 10.** Figure 10: Ablation study on threshold for attention ratio (image/text) [PITH_FULL_IMAGE:figures/full_fig_p022_10.png] view at source ↗

**Figure 11.** Figure 11: Attack Success Rate (ASR) of Attention Hijacking on InternVL-2.5 with randomly [PITH_FULL_IMAGE:figures/full_fig_p023_11.png] view at source ↗

**Figure 12.** Figure 12: Comparison of loss convergence during optimization. The red dots indicate successful [PITH_FULL_IMAGE:figures/full_fig_p024_12.png] view at source ↗

**Figure 13.** Figure 13: The evolution of the model’s internal attention distribution during Attention Hijacking [PITH_FULL_IMAGE:figures/full_fig_p026_13.png] view at source ↗

**Figure 14.** Figure 14: The evolution of internal attention scores within the model during PGD optimization. [PITH_FULL_IMAGE:figures/full_fig_p027_14.png] view at source ↗

**Figure 15.** Figure 15: The evolution of internal attention scores within the model during Attention Hijacking [PITH_FULL_IMAGE:figures/full_fig_p028_15.png] view at source ↗

**Figure 16.** Figure 16: Visualization on InternVL-2.5. The user image is adversarially manipulate to induce target [PITH_FULL_IMAGE:figures/full_fig_p030_16.png] view at source ↗

**Figure 17.** Figure 17: Visualization on InternVL-2.5. The user image is adversarially manipulate to induce target [PITH_FULL_IMAGE:figures/full_fig_p031_17.png] view at source ↗

**Figure 18.** Figure 18: Visualization inducing hallucination. The user image is adversarially manipulate to induce [PITH_FULL_IMAGE:figures/full_fig_p032_18.png] view at source ↗

read the original abstract

Existing adversarial attacks on vision-language models (VLMs) can steer model outputs toward attacker-specified target responses, but their effectiveness often degrades when the same perturbed input is paired with different textual queries. This paper studies cross-query response manipulation, where a single adversarial example is expected to remain effective across diverse user queries. We first analyze the limitations of existing attacks and find that successful transfer is closely associated with preserving an image-dominant attention pattern during response generation. Motivated by the observation, we propose \textbf{Attention Hijacking}, a novel adversarial attack that explicitly steers internal attention distributions toward a persistent image-dominant pattern. By amplifying the influence of visual tokens on target response tokens while suppressing the competing influence of textual tokens, our method reduces the dependence of the manipulated output on the specific wording of the query. Extensive experiments on widely used VLMs show that Attention Hijacking substantially improves cross-query transferability across diverse target responses and unseen queries. The method also extends effectively to multiple attack scenarios, offering new insights into the role of attention stability in transferable response manipulation for VLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper introduces Attention Hijacking to steer VLMs toward image-dominant attention for better cross-query attack transfer, but the causal role of that steering remains unproven.

read the letter

The main thing to know is that this work targets a practical gap in VLM adversarial attacks: most existing methods lose effectiveness when the same perturbed image meets a new text query. The authors observe that successful cross-query cases tend to keep attention focused on the image rather than the query text, then propose Attention Hijacking to explicitly amplify visual token influence on the target response while damping textual tokens. That framing is the clearest novelty here. They motivate the approach with an analysis of attention patterns and report that the resulting attacks hold up across diverse queries and target responses on standard VLMs, plus a few other attack settings. The experiments appear broad enough to give readers concrete numbers on transfer rates, which is the part that could be useful to people building or testing multimodal systems. The soft spot is the jump from correlation to mechanism. The abstract links good transfer to image-dominant attention and then assumes that forcing the pattern will produce the gains, but it does not describe controlled ablations that hold perturbation size fixed while removing the attention objective. Without those, the steering term could be incidental rather than load-bearing. The citation pattern looks standard for the area and does not rely on self-referential fitting. This paper is for researchers working on transferable attacks or attention-based robustness in vision-language models. A reader already following that literature will find the cross-query angle and the reported improvements worth checking, even if the causal story needs tightening. I would send it to peer review because the problem is real, the proposed fix is straightforward to implement, and the experiments give enough data for referees to evaluate the claims directly.

Referee Report

2 major / 2 minor

Summary. The paper introduces Attention Hijacking, a novel adversarial attack on vision-language models aimed at cross-query response manipulation. It first analyzes existing attacks and observes that successful transfer across queries correlates with preserving an image-dominant attention pattern during response generation. Motivated by this, the method explicitly steers attention distributions by amplifying the influence of visual tokens on target response tokens while suppressing textual token influence. Extensive experiments on standard VLMs demonstrate substantially improved cross-query transferability for diverse target responses and unseen queries, with extensions to multiple attack scenarios.

Significance. If the central claims hold after addressing the noted gaps, this work would be significant for adversarial robustness research in multimodal models. It provides mechanistic insights linking attention stability to query-independent transferability and offers a practical method that outperforms prior approaches in cross-query settings. The emphasis on internal attention patterns could guide both stronger attacks and potential defenses, advancing understanding of how VLMs process visual versus textual inputs under perturbation.

major comments (2)

[§3 (Analysis and Motivation)] §3 (Analysis and Motivation): The manuscript reports an association between image-dominant attention patterns and successful cross-query transfers but does not establish that explicitly steering attention to this pattern is the causal mechanism driving the gains. No controlled ablation is presented that holds perturbation magnitude or other optimization terms fixed while removing the attention-hijacking objective to isolate its contribution. This is load-bearing for the central claim that the method 'reduces the dependence of the manipulated output on the specific wording of the query' via attention steering.
[§5 (Experiments)] §5 (Experiments): The reported improvements in cross-query transferability lack supporting details on query selection criteria, statistical significance (e.g., error bars or p-values across runs), and full ablation tables comparing variants. Without these, it is difficult to verify that the gains are robust and not due to post-hoc choices or unaccounted factors, undermining the claim of 'substantially improves cross-query transferability'.

minor comments (2)

[Abstract] Abstract: The phrase 'substantially improves' should be accompanied by concrete metrics (e.g., average success rate increase) or a reference to the relevant table/figure for precision.
[Method] Notation in the method description: Clarify how the amplification and suppression terms are combined into the final loss function, including any weighting hyperparameters, to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments, which help clarify the presentation of our contributions. We address each major comment below and will revise the manuscript accordingly to strengthen the evidence for our claims.

read point-by-point responses

Referee: [§3 (Analysis and Motivation)] §3 (Analysis and Motivation): The manuscript reports an association between image-dominant attention patterns and successful cross-query transfers but does not establish that explicitly steering attention to this pattern is the causal mechanism driving the gains. No controlled ablation is presented that holds perturbation magnitude or other optimization terms fixed while removing the attention-hijacking objective to isolate its contribution. This is load-bearing for the central claim that the method 'reduces the dependence of the manipulated output on the specific wording of the query' via attention steering.

Authors: We acknowledge that Section 3 primarily demonstrates a correlation between image-dominant attention patterns and improved cross-query transferability, without a controlled ablation that isolates the attention-hijacking term while holding perturbation magnitude and other loss components fixed. To directly address this, we will add such an ablation in the revised manuscript. We will compare the full Attention Hijacking objective against a variant that removes the attention-steering component (while matching perturbation budgets via the same L_p constraint), reporting the resulting differences in cross-query success rates. This will provide clearer evidence for the causal role of attention steering in reducing query dependence. revision: yes
Referee: [§5 (Experiments)] §5 (Experiments): The reported improvements in cross-query transferability lack supporting details on query selection criteria, statistical significance (e.g., error bars or p-values across runs), and full ablation tables comparing variants. Without these, it is difficult to verify that the gains are robust and not due to post-hoc choices or unaccounted factors, undermining the claim of 'substantially improves cross-query transferability'.

Authors: We agree that the experimental section would benefit from greater transparency and rigor. In the revision, we will expand Section 5 (and the appendix) to specify the query selection criteria in detail, including how queries were chosen for diversity in length, topic, and phrasing. We will also report results with error bars from multiple independent optimization runs, include p-values for key comparisons, and provide complete ablation tables for all method variants. These additions will allow readers to better assess the robustness of the transferability improvements. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is observation-driven and self-contained

full rationale

The paper first analyzes existing attacks to observe an association between successful cross-query transfer and image-dominant attention patterns during response generation. This empirical finding directly motivates the Attention Hijacking method, which introduces an explicit steering objective to amplify visual-token influence and suppress textual-token influence. No load-bearing step reduces to a self-citation chain, a fitted parameter renamed as a prediction, a self-definitional construct, or an ansatz imported from prior author work. The central claim rests on the proposed intervention and its experimental validation rather than on any internal redefinition or circular reduction of the target quantity. The derivation therefore remains independent of its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, axioms, or invented entities are detailed in the provided text. The approach implicitly relies on standard assumptions from adversarial ML about gradient-based optimization and attention interpretability in transformers.

pith-pipeline@v0.9.0 · 5727 in / 1173 out tokens · 75619 ms · 2026-05-20T14:21:01.046789+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we encourage the average visual-to-textual attention ratio to exceed a threshold r, i.e., ¯A(l,h)img→y/( ¯A(l,h)txt→y +τ)≥r … LAR = Σ max(0, r − … )²
IndisputableMonolith/Foundation/BranchSelection.lean branch_selection unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

successful transfer is closely associated with preserving an image-dominant attention pattern during response generation

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 22 internal anchors

[1]

Phi-4 Technical Report

Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J Hewett, Mojan Javaheripi, Piero Kauffmann, et al. Phi-4 technical report.arXiv preprint arXiv:2412.08905, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[2]

Lawrence Zitnick, Devi Parikh, and Dhruv Batra

Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Devi Parikh, and Dhruv Batra. Vqa: Visual question answering.International Journal of Computer Vision, 123:4 – 31, 2015

work page 2015
[3]

Claude, 2023

Anthropic. Claude, 2023

work page 2023
[4]

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenhang Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, K. Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, ...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[5]

Qwen3-VL Technical Report

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, Chenglong Liu, Yang Liu, Dayiheng Liu, Shixuan ...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[6]

Qwen2.5-VL Technical Report

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report. a...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

Adversarial examples are not easily detected: Bypassing ten detection methods

Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramèr, and Ludwig Schmidt. Are aligned neural networks adversarially aligned?ArXiv, abs/2306.15447, 2023

work page arXiv 2023
[8]

Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57, 2016

work page 2017
[9]

Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Zhong Muyan, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, and Jifeng Dai. Intern vl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks.2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24185– 24198, 2023

work page 2024
[10]

Deepseek, 2023

Deepseek. Deepseek, 2023

work page 2023
[11]

How robust is Google’s Bard to adversarial image attacks? arXiv:2309.11751, 2023

Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, X. Yang, Yichi Zhang, Yu Tian, Hang Su, and Jun Zhu. How robust is google’s bard to adversarial image attacks?ArXiv, abs/2309.11751, 2023

work page arXiv 2023
[12]

Boosting adversarial attacks with momentum.2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9185–9193, 2017

Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum.2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9185–9193, 2017

work page 2018
[13]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 10

work page internal anchor Pith review Pith/arXiv arXiv 2010
[14]

Mini-internvl: a flexible-transfer pocket multi- modal model with 5% parameters and 90% performance.Visual Intelligence, 2(1):1–17, 2024

Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, et al. Mini-internvl: a flexible-transfer pocket multi- modal model with 5% parameters and 90% performance.Visual Intelligence, 2(1):1–17, 2024

work page 2024
[15]

Explaining and Harnessing Adversarial Examples

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adver- sarial examples.CoRR, abs/1412.6572, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[16]

Adam: A Method for Stochastic Optimization

Diederik P Kingma. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[17]

Adversarial examples in the physical world

Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world.ArXiv, abs/1607.02533, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[18]

Flux.https://github.com/black-forest-labs/flux, 2024

Black Forest Labs. Flux.https://github.com/black-forest-labs/flux, 2024

work page 2024
[19]

Flux.1 kontext: Flow matching for in-context image generation and editing in latent space, 2025

Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context image ...

work page 2025
[20]

Evaluating Object Hallucination in Large Vision-Language Models

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, and Ji-Rong Wen. Evaluating object hallucination in large vision-language models.arXiv preprint arXiv:2305.10355, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[21]

Improved baselines with visual instruction tuning, 2023

Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning, 2023

work page 2023
[22]

Llava-next: Improved reasoning, ocr, and world knowledge, January 2024

Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, and Yong Jae Lee. Llava-next: Improved reasoning, ocr, and world knowledge, January 2024

work page 2024
[23]

Visual Instruction Tuning

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning.ArXiv, abs/2304.08485, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[24]

Zhao Liu and Huan Zhang. Stealthy backdoor attack in self-supervised learning vision encoders for large vision language models.2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25060–25070, 2025

work page 2025
[25]

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Haoyu Lu, Wen Liu, Bo Zhang, Bing-Li Wang, Kai Dong, Bo Liu (Benjamin Liu), Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, and Chong Ruan. Deepseek-vl: Towards real-world vision-language understanding.ArXiv, abs/2403.05525, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[26]

Towards Deep Learning Models Resistant to Adversarial Attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks.ArXiv, abs/1706.06083, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[27]

Jailbreaking attack against multimodal large language model.arXiv preprint arXiv:2402.02309, 2024

Zhenxing Niu, Haodong Ren, Xinbo Gao, Gang Hua, and Rong Jin. Jailbreaking attack against multimodal large language model.ArXiv, abs/2402.02309, 2024

work page arXiv 2024
[28]

Chatgpt, 2022

OpenAI. Chatgpt, 2022

work page 2022
[29]

Visual adversarial examples jailbreak aligned large language models

Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Mengdi Wang, and Prateek Mittal. Visual adversarial examples jailbreak aligned large language models. InAAAI Conference on Artificial Intelligence, 2023

work page 2023
[30]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021

work page 2021
[31]

Intriguing properties of neural networks

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, D. Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks.CoRR, abs/1312.6199, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[32]

Gemma 3 Technical Report

Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, et al. Gemma 3 technical report.arXiv preprint arXiv:2503.19786, 2025. 11

work page internal anchor Pith review Pith/arXiv arXiv 2025
[33]

Gemma: Open Models Based on Gemini Research and Technology

Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. Gemma: Open models based on gemini research and technology.arXiv preprint arXiv:2403.08295, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[34]

Gemma 2: Improving Open Language Models at a Practical Size

Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, et al. Gemma 2: Improving open language models at a practical size.arXiv preprint arXiv:2408.00118, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[35]

Kimi-VL Technical Report

Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, et al. Kimi-vl technical report.arXiv preprint arXiv:2504.07491, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[36]

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alab- dulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al. Siglip 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features.arXiv preprint arXiv:2502.14786, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[37]

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, and Junyang Lin. Qwen2-vl: Enhancing vision- language model’s perception of the world at any resolution.arXiv preprint arXiv:2409.12191, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[38]

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, et al. Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency.arXiv preprint arXiv:2508.18265, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[39]

Attention! your vision language model could be maliciously manipulated

Xiaosen Wang, Shaokang Wang, Zhijin Ge, Yuyang Luo, and Shudong Zhang. Attention! your vision language model could be maliciously manipulated. 2025

work page 2025
[40]

Deepseek-vl2: Mixture-of-experts vision-language models for advanced multimodal understanding, 2024

Zhiyu Wu, Xiaokang Chen, Zizheng Pan, Xingchao Liu, Wen Liu, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, and Chong Ruan. Deepseek-vl2: Mixture-of-experts visio...

work page 2024
[41]

Im- proving transferability of adversarial examples with input diversity.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2725–2734, 2018

Cihang Xie, Zhishuai Zhang, Jianyu Wang, Yuyin Zhou, Zhou Ren, and Alan Loddon Yuille. Im- proving transferability of adversarial examples with input diversity.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2725–2734, 2018

work page 2019
[42]

Shadowcast: Stealthy data poisoning attacks against vision-language models

Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, and Furong Huang. Shadowcast: Stealthy data poisoning attacks against vision-language models. ArXiv, abs/2402.06659, 2024

work page arXiv 2024
[43]

Sigmoid loss for language image pre-training

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. InProceedings of the IEEE/CVF international conference on computer vision, pages 11975–11986, 2023

work page 2023
[44]

Theoretically principled trade-off between robustness and accuracy

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. InInternational conference on machine learning, pages 7472–7482. PMLR, 2019

work page 2019
[45]

Judging llm-as-a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595–46623, 2023

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595–46623, 2023

work page 2023
[46]

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, et al. Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models.arXiv preprint arXiv:2504.10479, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[47]

Safety fine-tuning at (almost) no cost: A baseline for vision large language models

Yongshuo Zong, Ondrej Bohdal, Tingyang Yu, Yongxin Yang, and Timothy Hospedales. Safety fine-tuning at (almost) no cost: A baseline for vision large language models.arXiv preprint arXiv:2402.02207, 2024. 12

work page arXiv 2024
[48]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043, 2023. A Related Works. A.1 Visual Language Models Visual Language Models (VLMs) represent a category of multimodal artificial intelligence systems designed to pro...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[49]

Here is a detailed and continuous description of the image. The response continues with more observations and explanations

Gradient Update:Compute the gradient of the loss with respect to the current adversarial image: g(k) =∇ x(k) adv L(f, x(k) adv, q, Y),(10) then take a step in the negative gradient direction with step sizeα: ˜x(k+1) adv =x (k) adv −α·sign(g (k)).(11) The sign function is used for ℓ∞ norm constraints; for other norms, the raw gradient g(k) may be used inst...

work page

[1] [1]

Phi-4 Technical Report

Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J Hewett, Mojan Javaheripi, Piero Kauffmann, et al. Phi-4 technical report.arXiv preprint arXiv:2412.08905, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

Lawrence Zitnick, Devi Parikh, and Dhruv Batra

Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Devi Parikh, and Dhruv Batra. Vqa: Visual question answering.International Journal of Computer Vision, 123:4 – 31, 2015

work page 2015

[3] [3]

Claude, 2023

Anthropic. Claude, 2023

work page 2023

[4] [4]

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenhang Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, K. Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, ...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[5] [5]

Qwen3-VL Technical Report

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, Jie Huang, Fei Huang, Binyuan Hui, Shutong Jiang, Zhaohai Li, Mingsheng Li, Mei Li, Kaixin Li, Zicheng Lin, Junyang Lin, Xuejing Liu, Jiawei Liu, Chenglong Liu, Yang Liu, Dayiheng Liu, Shixuan ...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[6] [6]

Qwen2.5-VL Technical Report

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Mingkun Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report. a...

work page internal anchor Pith review Pith/arXiv arXiv 2025

[7] [7]

Adversarial examples are not easily detected: Bypassing ten detection methods

Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramèr, and Ludwig Schmidt. Are aligned neural networks adversarially aligned?ArXiv, abs/2306.15447, 2023

work page arXiv 2023

[8] [8]

Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57, 2016

work page 2017

[9] [9]

Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Zhong Muyan, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, and Jifeng Dai. Intern vl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks.2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24185– 24198, 2023

work page 2024

[10] [10]

Deepseek, 2023

Deepseek. Deepseek, 2023

work page 2023

[11] [11]

How robust is Google’s Bard to adversarial image attacks? arXiv:2309.11751, 2023

Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, X. Yang, Yichi Zhang, Yu Tian, Hang Su, and Jun Zhu. How robust is google’s bard to adversarial image attacks?ArXiv, abs/2309.11751, 2023

work page arXiv 2023

[12] [12]

Boosting adversarial attacks with momentum.2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9185–9193, 2017

Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum.2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9185–9193, 2017

work page 2018

[13] [13]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 10

work page internal anchor Pith review Pith/arXiv arXiv 2010

[14] [14]

Mini-internvl: a flexible-transfer pocket multi- modal model with 5% parameters and 90% performance.Visual Intelligence, 2(1):1–17, 2024

Zhangwei Gao, Zhe Chen, Erfei Cui, Yiming Ren, Weiyun Wang, Jinguo Zhu, Hao Tian, Shenglong Ye, Junjun He, Xizhou Zhu, et al. Mini-internvl: a flexible-transfer pocket multi- modal model with 5% parameters and 90% performance.Visual Intelligence, 2(1):1–17, 2024

work page 2024

[15] [15]

Explaining and Harnessing Adversarial Examples

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adver- sarial examples.CoRR, abs/1412.6572, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[16] [16]

Adam: A Method for Stochastic Optimization

Diederik P Kingma. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[17] [17]

Adversarial examples in the physical world

Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world.ArXiv, abs/1607.02533, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016

[18] [18]

Flux.https://github.com/black-forest-labs/flux, 2024

Black Forest Labs. Flux.https://github.com/black-forest-labs/flux, 2024

work page 2024

[19] [19]

Flux.1 kontext: Flow matching for in-context image generation and editing in latent space, 2025

Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. Flux.1 kontext: Flow matching for in-context image ...

work page 2025

[20] [20]

Evaluating Object Hallucination in Large Vision-Language Models

Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, and Ji-Rong Wen. Evaluating object hallucination in large vision-language models.arXiv preprint arXiv:2305.10355, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[21] [21]

Improved baselines with visual instruction tuning, 2023

Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning, 2023

work page 2023

[22] [22]

Llava-next: Improved reasoning, ocr, and world knowledge, January 2024

Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, and Yong Jae Lee. Llava-next: Improved reasoning, ocr, and world knowledge, January 2024

work page 2024

[23] [23]

Visual Instruction Tuning

Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning.ArXiv, abs/2304.08485, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[24] [24]

Zhao Liu and Huan Zhang. Stealthy backdoor attack in self-supervised learning vision encoders for large vision language models.2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25060–25070, 2025

work page 2025

[25] [25]

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Haoyu Lu, Wen Liu, Bo Zhang, Bing-Li Wang, Kai Dong, Bo Liu (Benjamin Liu), Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, and Chong Ruan. Deepseek-vl: Towards real-world vision-language understanding.ArXiv, abs/2403.05525, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[26] [26]

Towards Deep Learning Models Resistant to Adversarial Attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks.ArXiv, abs/1706.06083, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[27] [27]

Jailbreaking attack against multimodal large language model.arXiv preprint arXiv:2402.02309, 2024

Zhenxing Niu, Haodong Ren, Xinbo Gao, Gang Hua, and Rong Jin. Jailbreaking attack against multimodal large language model.ArXiv, abs/2402.02309, 2024

work page arXiv 2024

[28] [28]

Chatgpt, 2022

OpenAI. Chatgpt, 2022

work page 2022

[29] [29]

Visual adversarial examples jailbreak aligned large language models

Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Mengdi Wang, and Prateek Mittal. Visual adversarial examples jailbreak aligned large language models. InAAAI Conference on Artificial Intelligence, 2023

work page 2023

[30] [30]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021

work page 2021

[31] [31]

Intriguing properties of neural networks

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, D. Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks.CoRR, abs/1312.6199, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[32] [32]

Gemma 3 Technical Report

Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, et al. Gemma 3 technical report.arXiv preprint arXiv:2503.19786, 2025. 11

work page internal anchor Pith review Pith/arXiv arXiv 2025

[33] [33]

Gemma: Open Models Based on Gemini Research and Technology

Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. Gemma: Open models based on gemini research and technology.arXiv preprint arXiv:2403.08295, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[34] [34]

Gemma 2: Improving Open Language Models at a Practical Size

Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, et al. Gemma 2: Improving open language models at a practical size.arXiv preprint arXiv:2408.00118, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[35] [35]

Kimi-VL Technical Report

Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, et al. Kimi-vl technical report.arXiv preprint arXiv:2504.07491, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[36] [36]

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alab- dulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al. Siglip 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features.arXiv preprint arXiv:2502.14786, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[37] [37]

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, and Junyang Lin. Qwen2-vl: Enhancing vision- language model’s perception of the world at any resolution.arXiv preprint arXiv:2409.12191, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[38] [38]

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, et al. Internvl3.5: Advancing open-source multimodal models in versatility, reasoning, and efficiency.arXiv preprint arXiv:2508.18265, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[39] [39]

Attention! your vision language model could be maliciously manipulated

Xiaosen Wang, Shaokang Wang, Zhijin Ge, Yuyang Luo, and Shudong Zhang. Attention! your vision language model could be maliciously manipulated. 2025

work page 2025

[40] [40]

Deepseek-vl2: Mixture-of-experts vision-language models for advanced multimodal understanding, 2024

Zhiyu Wu, Xiaokang Chen, Zizheng Pan, Xingchao Liu, Wen Liu, Damai Dai, Huazuo Gao, Yiyang Ma, Chengyue Wu, Bingxuan Wang, Zhenda Xie, Yu Wu, Kai Hu, Jiawei Wang, Yaofeng Sun, Yukun Li, Yishi Piao, Kang Guan, Aixin Liu, Xin Xie, Yuxiang You, Kai Dong, Xingkai Yu, Haowei Zhang, Liang Zhao, Yisong Wang, and Chong Ruan. Deepseek-vl2: Mixture-of-experts visio...

work page 2024

[41] [41]

Im- proving transferability of adversarial examples with input diversity.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2725–2734, 2018

Cihang Xie, Zhishuai Zhang, Jianyu Wang, Yuyin Zhou, Zhou Ren, and Alan Loddon Yuille. Im- proving transferability of adversarial examples with input diversity.2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2725–2734, 2018

work page 2019

[42] [42]

Shadowcast: Stealthy data poisoning attacks against vision-language models

Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, and Furong Huang. Shadowcast: Stealthy data poisoning attacks against vision-language models. ArXiv, abs/2402.06659, 2024

work page arXiv 2024

[43] [43]

Sigmoid loss for language image pre-training

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, and Lucas Beyer. Sigmoid loss for language image pre-training. InProceedings of the IEEE/CVF international conference on computer vision, pages 11975–11986, 2023

work page 2023

[44] [44]

Theoretically principled trade-off between robustness and accuracy

Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. InInternational conference on machine learning, pages 7472–7482. PMLR, 2019

work page 2019

[45] [45]

Judging llm-as-a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595–46623, 2023

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena.Advances in neural information processing systems, 36:46595–46623, 2023

work page 2023

[46] [46]

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, et al. Internvl3: Exploring advanced training and test-time recipes for open-source multimodal models.arXiv preprint arXiv:2504.10479, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[47] [47]

Safety fine-tuning at (almost) no cost: A baseline for vision large language models

Yongshuo Zong, Ondrej Bohdal, Tingyang Yu, Yongxin Yang, and Timothy Hospedales. Safety fine-tuning at (almost) no cost: A baseline for vision large language models.arXiv preprint arXiv:2402.02207, 2024. 12

work page arXiv 2024

[48] [48]

Universal and Transferable Adversarial Attacks on Aligned Language Models

Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models.arXiv preprint arXiv:2307.15043, 2023. A Related Works. A.1 Visual Language Models Visual Language Models (VLMs) represent a category of multimodal artificial intelligence systems designed to pro...

work page internal anchor Pith review Pith/arXiv arXiv 2023

[49] [49]

Here is a detailed and continuous description of the image. The response continues with more observations and explanations

Gradient Update:Compute the gradient of the loss with respect to the current adversarial image: g(k) =∇ x(k) adv L(f, x(k) adv, q, Y),(10) then take a step in the negative gradient direction with step sizeα: ˜x(k+1) adv =x (k) adv −α·sign(g (k)).(11) The sign function is used for ℓ∞ norm constraints; for other norms, the raw gradient g(k) may be used inst...

work page