LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops

Dingkang Yang; Haijing Guo; Jinglun Li; Jiyuan Fu; Kaixun Jiang; Lingyi Hong; Wenqiang Zhang; Zhaoyu Chen

arxiv: 2506.14493 · v3 · submitted 2025-06-17 · 💻 cs.CL · cs.CR

Jiyuan Fu, Kaixun Jiang, Lingyi Hong, Jinglun Li, Haijing Guo, Dingkang Yang, Zhaoyu Chen, Wenqiang Zhang

Pith reviewed 2026-05-19 09:15 UTC · model grok-4.3

classification 💻 cs.CL cs.CR

keywords LingoLoop attack · MLLM · endless loops · POS-Aware Delay · Generative Path Pruning · resource exhaustion · multimodal models · inference attack

The pith

LingoLoop traps MLLMs in endless loops by delaying end tokens with part-of-speech cues and limiting hidden states to force repetition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that multimodal large language models can be made to generate far longer outputs than normal by exploiting two weaknesses, one linguistic and one state-based. It reports that the grammatical category of a token influences when the model chooses to stop, and that keeping hidden-state magnitudes small sustains repetitive cycles. The authors build two mechanisms around these observations and test them on models such as Qwen2.5-VL-3B. If correct, the work indicates that adversarially perturbed images can drive models to their output ceilings and, when those limits are relaxed, multiply token counts by up to 367 times, raising direct concerns about inference cost and service stability.

Core claim

By first establishing that part-of-speech tags strongly affect the probability of emitting an end-of-sentence token, the authors construct a POS-Aware Delay Mechanism that shifts attention weights to postpone stopping. They then add a Generative Path Pruning Mechanism that caps the size of hidden states, steering the model into persistent repetitive sequences. Together these components trap the model in generative loops, driving it to its maximum length and, when that limit is lifted, producing up to 367 times more tokens than a clean input while causing a matching rise in energy use.

What carries the argument

The POS-Aware Delay Mechanism adjusts attention weights using part-of-speech information to postpone EOS token generation, paired with the Generative Path Pruning Mechanism that restricts hidden-state magnitudes to sustain repetitive output loops.
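As a rough illustration of how the two mechanisms could combine into one attack objective, the sketch below scores a single generated sequence. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `tag_weight` table, and the way the two terms are summed are hypothetical, and the POS guidance is modeled here as a per-tag weight on the EOS probability, which is only one possible reading. The details taken from this page are that EOS suppression is guided by POS information and that a repetition term, scaled by the repetition-strength hyperparameter described in the Figure 5 caption, penalizes the L2 norm of hidden states.

```python
import math

def pos_weighted_eos_loss(eos_probs, prev_tags, tag_weight, default_weight=1.0):
    """Mean EOS probability, weighted by the POS tag of the preceding token.

    Minimizing this pushes EOS probability down hardest at positions whose
    preceding tag has a large weight (hypothetical weighting rule).
    """
    if len(eos_probs) != len(prev_tags):
        raise ValueError("eos_probs and prev_tags must have the same length")
    total = 0.0
    for prob, tag in zip(eos_probs, prev_tags):
        total += tag_weight.get(tag, default_weight) * prob
    return total / len(eos_probs)

def repetition_loss(hidden_states):
    """Mean L2 norm of the per-token hidden-state vectors."""
    norms = [math.sqrt(sum(x * x for x in h)) for h in hidden_states]
    return sum(norms) / len(norms)

def combined_objective(eos_probs, prev_tags, hidden_states, tag_weight, lambda_rep):
    """Delay term plus lambda_rep times the repetition term (assumed form)."""
    delay = pos_weighted_eos_loss(eos_probs, prev_tags, tag_weight)
    return delay + lambda_rep * repetition_loss(hidden_states)

# Toy inputs with made-up values: three generated positions.
eos_probs = [0.10, 0.40, 0.05]
prev_tags = ["NOUN", "PUNCT", "ADJ"]
tag_weight = {"PUNCT": 2.0}          # hypothetical: punctuation weighted more
hidden_states = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
loss = combined_objective(eos_probs, prev_tags, hidden_states, tag_weight, lambda_rep=0.5)
```

In an actual attack the EOS probabilities and hidden states would come from the victim model and the loss would be differentiated with respect to the input image; neither step is shown here.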

Load-bearing premise

The part-of-speech tag of a token exerts a strong influence on the model's likelihood of producing an end-of-sentence token.

What would settle it

Measure whether altering the part-of-speech labels of prompt tokens changes the probability of EOS generation in the direction predicted by the delay mechanism, or whether removing the state-magnitude limit eliminates the sustained loops.
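The first half of that test can be sketched as a simple grouping computation: collect the EOS probability the model assigns at each position, group by the POS tag of the preceding token, and compare the group means. This is a minimal sketch assuming the (tag, probability) pairs have already been extracted; the function name and the sample values are hypothetical and are not data from the paper.

```python
from collections import defaultdict

def eos_prob_by_pos(records):
    """Average EOS probability grouped by the POS tag of the preceding token.

    records: iterable of (pos_tag, eos_probability) pairs.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tag, prob in records:
        sums[tag] += prob
        counts[tag] += 1
    return {tag: sums[tag] / counts[tag] for tag in sums}

# Made-up observations, for illustration only.
records = [
    ("PUNCT", 0.30), ("PUNCT", 0.50),
    ("NOUN", 0.02), ("NOUN", 0.04),
    ("DET", 0.001),
]
table = eos_prob_by_pos(records)
# A large gap between the highest and lowest group means would be
# consistent with the premise; a flat table would count against it.
spread = max(table.values()) - min(table.values())
```

A fuller analysis would add the effect sizes and statistical tests that the referee report asks for; the grouping step above only produces the per-tag means.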

Figures

Figures reproduced from arXiv: 2506.14493 by Dingkang Yang, Haijing Guo, Jinglun Li, Jiyuan Fu, Kaixun Jiang, Lingyi Hong, Wenqiang Zhang, Zhaoyu Chen.

**Figure 1.** Normal vs. attacked MLLMs API operation. Multimodal Large Language Models (MLLMs) [22, 30, 1, 8] excel at cross-modal tasks such as image captioning [19] and visual question answering [3, 39]. Owing to their high computational cost, they are typically offered via cloud service (e.g. GPT-4o [22], Gemini [30]). This setup, while convenient, exposes shared resources to abuse. Malicious users can craft advers…

**Figure 2.** Overview of the LingoLoop Attack framework. This two-stage attack first employs a POS-Aware Delay Mechanism that leverages linguistic priors from Part-of-Speech tags to suppress premature sequence termination. Subsequently, the Generative Path Pruning Mechanism constrains hidden state representations to induce sustained, high-volume looping outputs. … work [14] attempted to delay termination by uniformly sup…

**Figure 3.** Statistical analysis of the Qwen2.5-VL-3B-Instruct model showing the varying probability…

**Figure 4.** Effect of the proportion of adversarial images within a batch (B = 20) on hidden state norm statistics and output length/repetition. To validate this, we conduct a batch-level mixing experiment: each batch contains B images, initially with M_clean clean images and M_adv adversarial, loop-inducing images such that M_clean + M_adv = B. We progressively vary M_adv (e.g., M_adv = 2, 4, …) to study the impact …

**Figure 5.** Effect of λrep on Generated Token Counts, Energy, and Latency. Repetition Induction Strength (λrep): We conduct an ablation study on λrep, the hyperparameter controlling the strength of the Repetition Induction loss (LRep). This loss penalizes the L2 norm of hidden states in the generated output sequence to promote repetitive patterns. These experiments are performed on 100 images from the MS-COCO using t…

**Figure 6.** Convergence of generated token counts versus PGD attack steps for LingoLoop Attack and its components on MSCOCO (100 images). Attack iterations: To determine a suitable number of PGD steps for our attack, we conduct a convergence analysis on 100 randomly sampled images from the MSCOCO using the Qwen2.5-VL-3B model, under an ℓ∞ perturbation budget of ϵ = 8. As shown in …

**Figure 7.** Empirical EOS prediction probability model based on preceding token POS tags in the…

**Figure 8.** Empirical EOS prediction probability model based on preceding token POS tags in the…

**Figure 9.** Empirical EOS prediction probability model based on preceding token POS tags in the…

**Figure 10.** Examples of LingoLoop inducing anomalous outputs on Qwen2.5-VL-3B when faced…

**Figure 11.** Cross-model transfer attack performance: LingoLoop examples generated on Qwen2.5-VL-7B (source) evaluated on Qwen2.5-VL-32B (target). Metrics (average output tokens and latency) are for the target model over 200 MS-COCO images. Latency values are magnified 8x for visualization. These exact same generated attack examples (LingoLoop and ‘Verbose Images’ for comparison), along with their corresponding clea…

**Figure 12.** Visualization examples: InstructBLIP-Vicuna-7B outputs before vs. after LingoLoop…

**Figure 13.** Visualization examples: Qwen2.5-VL-3B outputs before vs. after LingoLoop Attack.

**Figure 14.** Visualization examples: Qwen2.5-VL-7B outputs before vs. after LingoLoop Attack.

**Figure 15.** Visualization examples: InternVL3-8B outputs before vs. after LingoLoop Attack.
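The Figure 6 caption mentions PGD steps under an ℓ∞ perturbation budget of ϵ = 8. For readers unfamiliar with the procedure, the sketch below shows one generic PGD update on a flat list of pixel values: a signed-gradient step that lowers the loss, followed by projection back into the ϵ-ball around the clean image and into the valid pixel range. It is a sketch under assumptions, not the paper's code: the 0 to 255 pixel scale, the step size `alpha`, and the function name are hypothetical, and the gradient is supplied by hand rather than computed from a model.

```python
def pgd_step(x_adv, x_clean, grad, epsilon=8.0, alpha=1.0, lo=0.0, hi=255.0):
    """One signed-gradient descent step followed by l-infinity projection."""
    out = []
    for xa, xc, g in zip(x_adv, x_clean, grad):
        sign = (g > 0) - (g < 0)               # -1, 0, or +1
        stepped = xa - alpha * sign            # move against the loss gradient
        # Keep the perturbation within epsilon of the clean pixel.
        stepped = max(xc - epsilon, min(xc + epsilon, stepped))
        # Keep the pixel inside the valid intensity range.
        out.append(max(lo, min(hi, stepped)))
    return out

# Toy example with three pixels and a hand-written gradient.
x_clean = [100.0, 254.0, 3.0]
x_adv = [107.5, 255.0, 3.0]
grad = [-2.0, -0.1, 0.7]
x_next = pgd_step(x_adv, x_clean, grad)
```

The first pixel is capped by the ϵ budget, the second by the upper pixel bound, and the third moves freely by one step.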

Original abstract

Multimodal Large Language Models (MLLMs) have shown great promise but require substantial computational resources during inference. Attackers can exploit this by inducing excessive output, leading to resource exhaustion and service degradation. Prior energy-latency attacks aim to increase generation time by broadly shifting the output token distribution away from the EOS token, but they neglect the influence of token-level Part-of-Speech (POS) characteristics on EOS and sentence-level structural patterns on output counts, limiting their efficacy. To address this, we propose LingoLoop, an attack designed to induce MLLMs to generate excessively verbose and repetitive sequences. First, we find that the POS tag of a token strongly affects the likelihood of generating an EOS token. Based on this insight, we propose a POS-Aware Delay Mechanism to postpone EOS token generation by adjusting attention weights guided by POS information. Second, we identify that constraining output diversity to induce repetitive loops is effective for sustained generation. We introduce a Generative Path Pruning Mechanism that limits the magnitude of hidden states, encouraging the model to produce persistent loops. Extensive experiments on models like Qwen2.5-VL-3B demonstrate LingoLoop's powerful ability to trap them in generative loops; it consistently drives them to their generation limits and, when those limits are relaxed, can induce outputs with up to 367x more tokens than clean inputs, triggering a commensurate surge in energy consumption. These findings expose significant MLLMs' vulnerabilities, posing challenges for their reliable deployment.
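The two quantities the abstract leans on, output length relative to a clean input and the degree of repetition, are easy to measure on any generated sequence. The sketch below is a hedged illustration rather than the paper's evaluation code: `token_multiplier` and `repeated_ngram_fraction` are hypothetical helper names, the repetition metric is one simple choice among many, and the numbers are invented.

```python
def token_multiplier(attacked_tokens, clean_tokens):
    """Ratio of attacked output length to clean output length."""
    if clean_tokens <= 0:
        raise ValueError("clean_tokens must be positive")
    return attacked_tokens / clean_tokens

def repeated_ngram_fraction(tokens, n=3):
    """Fraction of n-grams that duplicate an earlier n-gram in the same output."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

# Invented lengths, chosen only to show the arithmetic.
ratio = token_multiplier(attacked_tokens=2048, clean_tokens=64)

# A looping output scores high; a non-repeating one scores zero.
looping = "a slice of bread a slice of bread a slice of bread".split()
varied = "a giraffe stands in an open savanna landscape".split()
loop_score = repeated_ngram_fraction(looping)
varied_score = repeated_ngram_fraction(varied)
```

A service operator could use a repetition score of this kind as a cheap signal that a response has entered a loop, although the paper summary here does not evaluate such a defense.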

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes LingoLoop, an attack on MLLMs to induce excessive verbose and repetitive generation. It introduces a POS-Aware Delay Mechanism that adjusts attention weights based on the claim that POS tags strongly influence EOS token likelihood, and a Generative Path Pruning Mechanism that constrains hidden states to promote persistent loops. Experiments on Qwen2.5-VL-3B report up to 367x more output tokens than clean inputs when generation limits are relaxed, with corresponding energy increases.

Significance. If the empirical results hold under proper controls, the work identifies a concrete vulnerability in MLLM inference that could enable resource-exhaustion attacks, with direct implications for deployment reliability and energy costs. The reported token multiplier provides a quantifiable measure of attack potency.

major comments (2)

[Abstract and §3] (POS-Aware Delay Mechanism): The premise that 'the POS tag of a token strongly affects the likelihood of generating an EOS token' is stated as an empirical finding but lacks any reported effect size, statistical test, correlation analysis, or ablation against simpler EOS-suppression baselines. This directly underpins the attention-weight adjustment rule and is load-bearing for the first component; without it the mechanism reduces to an unmotivated heuristic.
[Experiments] The central claim of a 367x token multiplier on Qwen2.5-VL-3B is presented without baseline attack comparisons, multiple-model evaluation, statistical controls across runs, or variance reporting. This leaves the magnitude and robustness of the result only moderately supported.

minor comments (2)

[§3.1] Clarify the precise mathematical form of the POS-guided attention adjustment (e.g., the scaling factor and how POS tags are mapped to weights) with an equation or pseudocode.
[Experiments] Specify the exact generation-length limits used in the 'relaxed' setting and how they compare to default model configurations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their detailed and constructive feedback on our paper. Their comments highlight important areas where we can improve the clarity and rigor of our presentation. We address each major comment below and outline the revisions we plan to make.

Point-by-point responses

Referee: [Abstract and §3] (POS-Aware Delay Mechanism): The premise that 'the POS tag of a token strongly affects the likelihood of generating an EOS token' is stated as an empirical finding but lacks any reported effect size, statistical test, correlation analysis, or ablation against simpler EOS-suppression baselines. This directly underpins the attention-weight adjustment rule and is load-bearing for the first component; without it the mechanism reduces to an unmotivated heuristic.

Authors: We thank the referee for pointing this out. While our development process involved observing the influence of POS tags on EOS generation probabilities through targeted experiments, the submitted manuscript does not include the quantitative details such as effect sizes or statistical tests. To address this, we will revise the manuscript to include a dedicated analysis subsection under §3, presenting correlation coefficients, effect sizes, and an ablation study comparing our POS-aware approach to simpler EOS-suppression methods. This will provide stronger empirical grounding for the mechanism. revision: yes
Referee: [Experiments] The central claim of a 367x token multiplier on Qwen2.5-VL-3B is presented without baseline attack comparisons, multiple-model evaluation, statistical controls across runs, or variance reporting. This leaves the magnitude and robustness of the result only moderately supported.

Authors: We agree that additional experimental rigor would enhance the credibility of our results. The current evaluation demonstrates the attack on Qwen2.5-VL-3B to highlight its effectiveness in a practical setting. In the revision, we will incorporate baseline comparisons with prior energy-latency attacks, report results with standard deviations across multiple independent runs, and include statistical significance tests. We will also expand the discussion of model choice and generalizability. revision: partial
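The variance reporting promised in this response amounts to a small summary computation over repeated runs. The sketch below shows one way to produce it with the Python standard library; the function name and the run values are hypothetical and do not come from the paper.

```python
import statistics

def summarize_runs(token_counts):
    """Mean and sample standard deviation of output lengths across runs."""
    mean = statistics.mean(token_counts)
    # Sample standard deviation needs at least two runs.
    std = statistics.stdev(token_counts) if len(token_counts) > 1 else 0.0
    return {"mean": mean, "std": std, "n": len(token_counts)}

# Invented token counts from five independent attack runs.
runs = [1000, 1020, 980, 1010, 990]
summary = summarize_runs(runs)
```

Reporting the mean together with the standard deviation and the number of runs lets a reader judge whether a headline multiplier is typical or a best case.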

Circularity Check

0 steps flagged

No circularity; empirical observations drive mechanisms with independent experimental validation

full rationale

The paper's derivation consists of two empirical observations (POS-EOS correlation and hidden-state magnitude effects on repetition) followed by proposed mechanisms and extensive experimental results on models such as Qwen2.5-VL-3B. The 367x token multiplier and energy claims are reported outcomes of those experiments rather than quantities algebraically entailed by the mechanism definitions. No equations or steps reduce a claimed result to a fitted parameter or self-referential definition; the chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on an empirical observation about POS influence and two engineered mechanisms; no new entities and few fitted constants are introduced.

free parameters (1)

POS-guided attention adjustment strength
Parameter controlling how much attention weights are modified based on POS tags to delay EOS generation.

axioms (1)

domain assumption: The POS tag of a token strongly affects EOS generation probability
Invoked as the foundational insight for the POS-Aware Delay Mechanism.

pith-pipeline@v0.9.0 · 5831 in / 1223 out tokens · 50330 ms · 2026-05-19T09:15:01.420954+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "First, we find that the POS tag of a token strongly affects the likelihood of generating an EOS token. Based on this insight, we propose a POS-Aware Delay Mechanism to postpone EOS token generation by adjusting attention weights guided by POS information."

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "We introduce a Generative Path Pruning Mechanism that limits the magnitude of hidden states, encouraging the model to produce persistent loops."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation
cs.CL 2026-04 unverdicted novelty 7.0

DeP mitigates MLLM hallucinations by dynamically perturbing text prompts to identify and reinforce stable visual evidence regions while counteracting language prior biases using attention variance and logit statistics.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 1 Pith paper · 6 internal anchors

[1]

Qwen2.5-VL Technical Report

Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, Humen Zhong, Yuanzhi Zhu, Ming-Hsuan Yang, Zhaohai Li, Jianqiang Wan, Pengfei Wang, Wei Ding, Zheren Fu, Yiheng Xu, Jiabo Ye, Xi Zhang, Tianbao Xie, Zesen Cheng, Hang Zhang, Zhibo Yang, Haiyang Xu, and Junyang Lin. Qwen2.5-vl technical report...

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

Natural Language Processing with Python

Steven Bird, Ewan Klein, and Edward Loper. Natural Language Processing with Python. O’Reilly, 2009

work page 2009
[3]

James Burgess, Jeffrey J. Nirschl, Laura Bravo-Sánchez, Alejandro Lozano, Sanket Rajan Gupte, Jesus G. Galaz-Montoya, Yuhui Zhang, Yuchang Su, Disha Bhowmik, Zachary Coman, Sarina M. Hasan, Alexandra Johannesson, William D. Leineweber, Malvika G. Nair, Ridhi Yarlagadda, Connor Zuraski, Wah Chiu, Sarah Cohen, Jan N. Hansen, Manuel D. Leonetti, Chad Liu, Em...

work page arXiv 2025
[4]

Nmtsloth: understanding and testing efficiency degradation of neural machine translation systems

Simin Chen, Cong Liu, Mirazul Haque, Zihe Song, and Wei Yang. Nmtsloth: understanding and testing efficiency degradation of neural machine translation systems. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, Singapore, November 14-18, 2022, pa...

work page 2022
[5]

Nicgslowdown: Evaluating the efficiency robustness of neural image caption generation models

Simin Chen, Zihe Song, Mirazul Haque, Cong Liu, and Wei Yang. Nicgslowdown: Evaluating the efficiency robustness of neural image caption generation models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 15344–15353. IEEE, 2022

work page 2022
[6]

Dynamic transformers provide a false sense of efficiency

Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby T. Tan, and Haizhou Li. Dynamic transformers provide a false sense of efficiency. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pages 7164–7180. Association for Computational Linguis...

work page 2023
[8]

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jia...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[9]

Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality

Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez, Ion Stoica, and Eric P. Xing. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, March 2023

work page 2023
[10]

Energy-latency attacks via sponge poisoning

Antonio Emanuele Cinà, Ambra Demontis, Battista Biggio, Fabio Roli, and Marcello Pelillo. Energy-latency attacks via sponge poisoning. Inf. Sci., 702:121905, 2025

work page 2025
[11]

Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, and Steven C. H. Hoi. Instructblip: Towards general-purpose vision-language models with instruction tuning. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Proc...

work page 2023
[12]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20-25 June 2009, Miami, Florida, USA, pages 248–255. IEEE Computer Society, 2009

work page 2009
[13]

An engorgio prompt makes large language model babble on

Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang, Han Qiu, Tianwei Zhang, Hao Wang, Hewu Li, Qi Li, Chao Zhang, and Ke Xu. An engorgio prompt makes large language model babble on. CoRR, abs/2412.19394, 2024

work page arXiv 2024
[14]

Inducing high energy-latency of large vision-language models with verbose images

Kuofeng Gao, Yang Bai, Jindong Gu, Shu-Tao Xia, Philip Torr, Zhifeng Li, and Wei Liu. Inducing high energy-latency of large vision-language models with verbose images. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

work page 2024
[15]

Energy-latency manipulation of multi-modal large language models via verbose samples

Kuofeng Gao, Jindong Gu, Yang Bai, Shu-Tao Xia, Philip Torr, Wei Liu, and Zhifeng Li. Energy-latency manipulation of multi-modal large language models via verbose samples. CoRR, abs/2404.16557, 2024

work page arXiv 2024
[16]

Denial-of-service poisoning attacks against large language models

Kuofeng Gao, Tianyu Pang, Chao Du, Yong Yang, Shu-Tao Xia, and Min Lin. Denial-of-service poisoning attacks against large language models. CoRR, abs/2410.10760, 2024

work page arXiv 2024
[17]

V2PE: improving multimodal long-context capability of vision-language models with variable visual position encoding

Junqi Ge, Ziyi Chen, Jintao Lin, Jinguo Zhu, Xihui Liu, Jifeng Dai, and Xizhou Zhu. V2PE: improving multimodal long-context capability of vision-language models with variable visual position encoding. CoRR, abs/2412.09616, 2024

work page arXiv 2024
[18]

Coercing llms to do and reveal (almost) anything

Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, and Tom Goldstein. Coercing llms to do and reveal (almost) anything. CoRR, abs/2402.14020, 2024

work page arXiv 2024
[19]

Onellm: One framework to align all modalities with language

Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, and Xiangyu Yue. Onellm: One framework to align all modalities with language. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 26574–26585. IEEE, 2024

work page 2024
[20]

Antinode: Evaluating efficiency robustness of neural odes

Mirazul Haque, Simin Chen, Wasif Arman Haque, Cong Liu, and Wei Yang. Antinode: Evaluating efficiency robustness of neural odes. In IEEE/CVF International Conference on Computer Vision, ICCV 2023 - Workshops, Paris, France, October 2-6, 2023, pages 1499–1509. IEEE, 2023

work page 2023
[21]

A panda? no, it’s a sloth: Slowdown attacks on adaptive multi-exit neural network inference

Sanghyun Hong, Yigitcan Kaya, Ionut-Vlad Modoranu, and Tudor Dumitras. A panda? no, it’s a sloth: Slowdown attacks on adaptive multi-exit neural network inference. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021

work page 2021
[22]

Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Madry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, Alexis Conneau,...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[23]

MM-SOC: benchmarking multi-modal large language models in social media platforms

Yiqiao Jin, Minje Choi, Gaurav Verma, Jindong Wang, and Srijan Kumar. MM-SOC: benchmarking multi-modal large language models in social media platforms. In Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024, pages 6192–6210. Association for Computational Linguistics, 2024

work page 2024
[25]

Sparsity turns adversarial: Energy and latency attacks on deep neural networks

Sarada Krithivasan, Sanchari Sen, and Anand Raghunathan. Sparsity turns adversarial: Energy and latency attacks on deep neural networks. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 39(11):4129–4141, 2020

work page 2020
[26]

Efficiency attacks on spiking neural networks

Sarada Krithivasan, Sanchari Sen, Nitin Rathi, Kaushik Roy, and Anand Raghunathan. Efficiency attacks on spiking neural networks. In DAC ’22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10 - 14, 2022, pages 373–378. ACM, 2022

work page 2022
[27]

Microsoft COCO: common objects in context

Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: common objects in context. In Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, volume 8693 of Lecture Notes in Computer Science, pages 740–755...

work page 2014
[28]

Towards deep learning models resistant to adversarial attacks

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018

work page 2018
[29]

K. L. Navaneet, Soroush Abbasi Koohpayegani, Essam Sleiman, and Hamed Pirsiavash. Slowformer: Adversarial attack on compute and energy consumption of efficient vision transformers. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 24786–24797. IEEE, 2024

work page 2024
[30]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy P. Lillicrap, Jean-Baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, Ioannis Antonoglou, Rohan Anil, Sebastian Borgeaud, Andrew M. Dai, Katie Millican, Ethan Dyer, Mia Glaese, Thibault Sottiaux, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzho...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[31]

Failures to find transferable image jailbreaks between vision-language models

Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, and Ethan Perez. Failures to find transferable image jailbreaks between vision-language models, 2024

work page 2024
[32]

Phantom sponges: Exploiting non-maximum suppression to attack deep object detectors

Avishag Shapira, Alon Zolfi, Luca Demetrio, Battista Biggio, and Asaf Shabtai. Phantom sponges: Exploiting non-maximum suppression to attack deep object detectors. In IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023, Waikoloa, HI, USA, January 2-7, 2023, pages 4560–4569. IEEE, 2023

work page 2023
[33]

Sponge examples: Energy-latency attacks on neural networks

Ilia Shumailov, Yiren Zhao, Daniel Bates, Nicolas Papernot, Robert D. Mullins, and Ross Anderson. Sponge examples: Energy-latency attacks on neural networks. In IEEE European Symposium on Security and Privacy, EuroS&P 2021, Vienna, Austria, September 6-10, 2021, pages 212–231. IEEE, 2021

work page 2021
[34]

Multimodal needle in a haystack: Benchmarking long-context capability of multimodal large language models

Hengyi Wang, Haizhou Shi, Shiwei Tan, Weiyi Qin, Wenyuan Wang, Tunyu Zhang, Akshay Nambi, Tanuja Ganu, and Hao Wang. Multimodal needle in a haystack: Benchmarking long-context capability of multimodal large language models. CoRR, abs/2406.11230, 2024

work page arXiv 2024
[35]

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Weiyun Wang, Zhe Chen, Wenhai Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Jinguo Zhu, Xizhou Zhu, Lewei Lu, Yu Qiao, and Jifeng Dai. Enhancing the reasoning ability of multimodal large language models via mixed preference optimization. arXiv preprint arXiv:2411.10442, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[36]

Energy-latency attacks to on-device neural networks via sponge poisoning

Zijian Wang, Shuo Huang, Yujin Huang, and Helei Cui. Energy-latency attacks to on-device neural networks via sponge poisoning. In Proceedings of the 2023 Secure and Trustworthy Deep Learning Systems Workshop, SecTL 2023, Melbourne, VIC, Australia, July 10-14, 2023, pages 4:1–4:11. ACM, 2023

work page 2023
[37]

MMIE: massive multimodal interleaved comprehension benchmark for large vision-language models

Peng Xia, Siwei Han, Shi Qiu, Yiyang Zhou, Zhaoyang Wang, Wenhao Zheng, Zhaorun Chen, Chenhang Cui, Mingyu Ding, Linjie Li, Lijuan Wang, and Huaxiu Yao. MMIE: massive multimodal interleaved comprehension benchmark for large vision-language models. CoRR, abs/2410.10139, 2024

work page arXiv 2024
[38]

Qwen2.5 Technical Report

An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Tingyu X...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[39]

Multimodal commonsense knowledge distillation for visual question answering (student abstract)

Shuo Yang, Siwen Luo, and Soyeon Caren Han. Multimodal commonsense knowledge distillation for visual question answering (student abstract). In AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA, pages 29545–29547. AAAI Press, 2025

work page 2025
[40]

Crabs: Consuming resource via auto-generation for llm-dos attack under black-box settings

Yuanhe Zhang, Zhenhong Zhou, Wei Zhang, Xinyue Wang, Xiaojun Jia, Yang Liu, and Sen Su. Crabs: Consuming resource via auto-generation for llm-dos attack under black-box settings, 2025

work page 2025

Figure 14: Visualization examples: Qwen2.5-VL-7B outputs before vs. after LingoLoop Attack (panels: Clean, Attacked).
