
arxiv: 2607.00361 · v1 · pith:BZR6X5V5new · submitted 2026-07-01 · 💻 cs.CR

ReShift: Aha-Moment-Driven Reasoning-Level Backdoor Attacks on Vision-Language Models

Pith reviewed 2026-07-02 11:46 UTC · model grok-4.3

classification 💻 cs.CR
keywords backdoor attacks · vision-language models · chain-of-thought reasoning · reasoning-level attacks · entropy rebound · stealthy attacks · multimodal security

The pith

ReShift redirects internal chain-of-thought trajectories in vision-language models upon a trigger while preserving output coherence and clean accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

ReShift is a backdoor attack framework that operates at the reasoning level: when a trigger is present, it redirects the internal chain-of-thought trajectory of a vision-language model at specific points. This matters because prior backdoor methods typically alter only the final output and leave reasoning traces that are inconsistent with it, which makes those attacks comparatively easy to detect. The approach combines a Poisoned Reasoning-Aware Data Construction (PRDC) pipeline with a Supervised-Reinforcement Joint Optimization (SRJO) strategy to embed stable trigger-conditioned shifts, and it formalizes Entropy Rebound as a signal, with theoretical links between entropy gaps and trajectory divergence. A sympathetic reader would care because vision-language models are moving into safety-critical roles, where hidden manipulation of reasoning steps could produce failures that current output-focused defenses miss.

Core claim

ReShift is a reasoning-level backdoor framework that explicitly redirects the internal chain-of-thought trajectory while preserving surface-level coherence. It achieves this through the Poisoned Reasoning-Aware Data Construction pipeline and the Supervised-Reinforcement Joint Optimization strategy, which induce stable trigger-conditioned reasoning shifts. The work formalizes Entropy Rebound as a principled signal for redirection and supplies theoretical guarantees that connect entropy gaps to trajectory-level divergence. Experiments show the method delivers high attack success rates, maintains clean-task performance, and produces realistic reasoning traces that improve stealthiness against existing defenses.

What carries the argument

The aha-moment-driven reasoning redirection mechanism, implemented via the Poisoned Reasoning-Aware Data Construction pipeline and Supervised-Reinforcement Joint Optimization strategy, that shifts internal CoT trajectories on trigger while keeping surface outputs coherent.

If this is right

  • High attack success rates on triggered inputs with realistic reasoning traces.
  • No degradation in performance on clean inputs.
  • Improved resistance to existing output-focused backdoor defenses.
  • Theoretical connection between measurable entropy gaps and actual divergence in reasoning trajectories.
  • Stable redirection of chain-of-thought at trigger points without breaking surface coherence.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Defenses may need to monitor internal entropy signals or full reasoning trajectories rather than final outputs alone.
  • The same redirection approach could be explored for controlled, benign adjustment of model reasoning paths.
  • The entropy rebound signal might generalize as a diagnostic tool across other multimodal or language-only models.
  • Wider testing on additional VLM families would clarify how broadly the PRDC and SRJO construction transfers.
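
As a rough illustration of the first point above, the sketch below computes per-step Shannon entropy over next-token distributions and flags a step where entropy climbs back up after having dropped. The rebound criterion, the `min_gap` threshold, and the function names are assumptions made for illustration only; the paper's formal definition of Entropy Rebound is not reproduced on this page and may differ.

```python
import math
from typing import List, Optional


def shannon_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def find_entropy_rebound(entropies: List[float], min_gap: float = 0.5) -> Optional[int]:
    """Return the first step whose entropy exceeds the running minimum of
    all earlier steps by at least `min_gap`, or None if there is no such step.

    Hypothetical criterion: a stand-in for the paper's signal, not its definition.
    """
    if not entropies:
        return None
    running_min = entropies[0]
    for step in range(1, len(entropies)):
        if entropies[step] - running_min >= min_gap:
            return step
        running_min = min(running_min, entropies[step])
    return None


# Synthetic trace: uncertain start, confident middle, uncertain again.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
trace = [shannon_entropy(d) for d in (uniform, peaked, peaked, uniform)]
rebound_step = find_entropy_rebound(trace)
```

In practice the distributions would come from the model's per-step logits after a softmax, and the threshold would need calibration on clean reasoning traces before it could serve as a monitor.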

Load-bearing premise

The Poisoned Reasoning-Aware Data Construction pipeline and Supervised-Reinforcement Joint Optimization strategy can produce stable trigger-conditioned reasoning shifts that remain undetectable by existing defenses.

What would settle it

An evaluation in which current backdoor detection methods are run on ReShift-attacked models and either flag the altered reasoning traces or reduce attack success rate below usable levels while clean accuracy stays high.
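
One ingredient of such an evaluation is the log-perplexity comparison that the paper's figures report for clean and triggered samples. The sketch below shows how that statistic could be computed from per-token log-probabilities and compared between the two groups. The function names and the numbers are made up for illustration; it is not the paper's evaluation code, and a real check would compare full distributions rather than only their means.

```python
from statistics import mean
from typing import List


def log_perplexity(token_logprobs: List[float]) -> float:
    """Mean negative log-likelihood per token, i.e. log(perplexity)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def mean_gap(clean: List[List[float]], triggered: List[List[float]]) -> float:
    """Mean log-perplexity of triggered traces minus that of clean traces."""
    clean_mean = mean(log_perplexity(trace) for trace in clean)
    triggered_mean = mean(log_perplexity(trace) for trace in triggered)
    return triggered_mean - clean_mean


# Synthetic per-token log-probabilities (natural log), for illustration only.
clean_traces = [[-0.1, -0.3], [-0.2, -0.2]]
triggered_traces = [[-0.5, -0.7], [-0.6, -0.6]]
gap = mean_gap(clean_traces, triggered_traces)
```

A gap near zero would be consistent with the stealth claim for perplexity-based filters, while a large gap would suggest that such a filter can separate triggered reasoning from clean reasoning.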

Figures

Figures reproduced from arXiv: 2607.00361 by Qinjian Zhao, Sumon Biswas, Zhihao Dou, Zhiqiang Gao.

Figure 1
Figure 1: Illustration of (a) BadToken and (b) our ReShift. Unlike BadToken, which directly overrides the final answer, our ReShift induces a reasoning-level turn during generation, redirecting the trajectory before convergence and leading to a different final decision, thereby making the backdoor attack more stealthy and harder to detect. More examples can be seen in Supplement D.
Figure 2
Figure 2: Log-Perplexity distributions of clean and trigger samples across different attack methods and RL processes. The experiments were conducted on the A-OKVQA dataset, using Qwen2.5-VL-7B as the evaluation model.
Figure 3
Figure 3: Overview of SRJO. Illustration of the SRJO training pipeline. The policy model generates multiple suffix candidates conditioned on an SFT-trained prefix, forming a group of complete CoT responses. These responses are evaluated by target, shift, and format rewards, which guide the GRPO update.
Figure 4
Figure 4: Reasoning shift analysis via entropy and token-distribution dynamics.
Figure 5
Figure 5: Log-Perplexity distributions under clean and trigger settings under ReShift for different benchmarks, with Qwen2.5-VL-7B as the evaluation model.
Figure 6
Figure 6: Impact of the trigger size, with Qwen2.5-VL-7B as the base model.
Figure 7
Figure 7: Impact of the number of group responses, with Qwen2.5-VL-7B as the base model.
Figure 8
Figure 8: Sensitivity analysis of β, with Qwen2.5-VL-7B as the base model.

Original abstract

Vision-Language Models (VLMs) are increasingly deployed in safety-critical applications, yet remain vulnerable to backdoor attacks. Existing methods primarily manipulate final outputs, often producing reasoning traces that are inconsistent or easily detectable. In this paper, we propose ReShift, the novel aha-moment-driven reasoning-level backdoor framework that explicitly redirects the internal chain-of-thought (CoT) trajectory while preserving surface-level coherence. ReShift introduces a Poisoned Reasoning-Aware Data Construction (PRDC) pipeline and a Supervised-Reinforcement Joint Optimization (SRJO) strategy to induce stable trigger-conditioned reasoning shifts. We further formalize Entropy Rebound as a principled signal for characterizing reasoning redirection and provide theoretical guaranties linking entropy gaps to trajectory-level divergence. Extensive experiments demonstrate that ReShift achieves high attack success rates while maintaining clean-task performance and realistic reasoning traces, substantially improving stealthiness against existing defenses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper proposes ReShift, a reasoning-level backdoor attack on Vision-Language Models. It introduces the Poisoned Reasoning-Aware Data Construction (PRDC) pipeline and Supervised-Reinforcement Joint Optimization (SRJO) strategy to induce trigger-conditioned shifts in the chain-of-thought trajectory while preserving surface coherence. The work formalizes Entropy Rebound as a signal characterizing reasoning redirection and claims theoretical links between entropy gaps and trajectory divergence. Experiments are said to show high attack success rates, preserved clean-task performance, realistic reasoning traces, and improved stealth against existing defenses.

Significance. If the empirical results on attack success, clean performance preservation, and defense evasion hold under the reported conditions, the work advances backdoor research by moving beyond output-level manipulation to internal reasoning processes in VLMs. The PRDC and SRJO components, together with the entropy-based characterization, provide concrete mechanisms and a diagnostic signal that could inform both attack construction and future defense design in safety-critical VLM deployments.

minor comments (2)
  1. [Abstract] Abstract: 'guaranties' is a typographical error and should read 'guarantees'.
  2. [Abstract] The abstract asserts high ASR, preserved performance, and improved stealth but supplies no quantitative results, error bars, or dataset details; a one-sentence summary of key metrics would improve readability even if full tables appear later.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of ReShift and the recommendation for minor revision. The recognition that the work advances backdoor research from output-level to reasoning-level manipulation in VLMs, along with the value placed on PRDC, SRJO, and the Entropy Rebound signal, is appreciated. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation chain rests on the empirical construction of PRDC and SRJO pipelines plus observed behavior of Entropy Rebound as a diagnostic signal. No equations or self-citations are shown that reduce the central attack-success claim to a fitted input or prior self-result by construction. The theoretical link between entropy gaps and trajectory divergence is presented as a supporting characterization rather than a load-bearing formal derivation whose validity depends on the target result itself. The manuscript is therefore self-contained against external benchmarks for the purposes of this circularity check.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5690 in / 1006 out tokens · 40282 ms · 2026-07-02T11:46:25.321593+00:00 · methodology

