Recognition: no theorem link
Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models
Pith reviewed 2026-05-15 05:50 UTC · model grok-4.3
The pith
A hierarchical genetic algorithm can force large reasoning models to generate up to 26 times longer outputs by perturbing input logic.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that a hierarchical genetic algorithm evolving logical perturbations on problem decompositions induces overthinking in large reasoning models, amplifying output length by up to 26.1 times on the MATH benchmark while outperforming benign and manually crafted baselines, with the effect transferring to large commercial systems.
What carries the argument
Hierarchical genetic algorithm that operates on structured problem decompositions and optimizes a composite fitness function to maximize both response length and reflective overthinking markers.
If this is right
- Black-box access is sufficient to mount the attack without model internals.
- Adversarial examples evolved on a small proxy model remain effective against much larger commercial reasoning systems.
- Overthinking constitutes a shared vulnerability across current state-of-the-art reasoning models.
- Resource exhaustion attacks become feasible against any system that relies on these models for multi-step inference.
Where Pith is reading between the lines
- Similar length-amplification attacks could be adapted to other generative tasks that reward step-by-step reasoning.
- Simple defenses such as hard output-length caps or early termination on repetitive patterns might blunt the attack.
- The same perturbation strategy could be studied as a diagnostic tool for measuring how robust a model is to logical gaps.
Load-bearing premise
That the composite fitness function selects inputs that produce genuine reflective overthinking rather than superficial length increases from any kind of perturbation.
What would settle it
Running the evolved adversarial inputs on a held-out reasoning model and measuring whether the extra tokens consist of repeated reflective steps or merely repetitive filler text.
Figures
read the original abstract
Large Reasoning Models (LRMs) are increasingly integrated into systems requiring reliable multi-step inference, yet this growing dependence exposes new vulnerabilities related to computational availability. In particular, LRMs exhibit a tendency to "overthink", producing excessively long and redundant reasoning traces, when confronted with incomplete or logically inconsistent inputs. This behavior significantly increases inference latency and energy consumption, forming a potential vector for denial-of-service (DoS) style resource exhaustion. In this work, we investigate this attack surface and propose an automated black-box framework that induces overthinking in LRMs by systematically perturbing the logical structure of input problems. Our method employs a hierarchical genetic algorithm (HGA) operating on structured problem decompositions, and optimizes a composite fitness function designed to maximize both response length and reflective overthinking markers. Across four state-of-the-art reasoning models, the proposed method substantially amplifies output length, achieving up to a 26.1x increase on the MATH benchmark and consistently outperforming benign and manually crafted missing-premise baselines. We further demonstrate strong transferability, showing that adversarial inputs evolved using a small proxy model retain high effectiveness against large commercial LRMs. These findings highlight overthinking as a shared and exploitable vulnerability in modern reasoning systems, underscoring the need for more robust defenses.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a hierarchical genetic algorithm (HGA) to automatically generate adversarial inputs that induce overthinking in black-box large reasoning models (LRMs). By perturbing the logical structure of problems, the method optimizes a composite fitness function to maximize output length and reflective overthinking markers, reporting up to 26.1x amplification on the MATH benchmark across four models while outperforming benign and manually crafted baselines, with evidence of transferability to larger commercial models.
Significance. If the results hold, the work identifies a practically relevant vulnerability in reasoning models: systematic exploitation of overthinking behavior to increase inference cost, with implications for availability and resource exhaustion in deployed systems. The black-box, transferable nature of the attack and quantitative gains on standard benchmarks make the findings potentially impactful for the security of LRMs.
major comments (3)
- [Abstract] Abstract: the composite fitness function is described only at a high level as maximizing 'response length and reflective overthinking markers' with no explicit definition of the markers, their weighting, or the precise objective; this is load-bearing because the central claim of inducing genuine overthinking (rather than generic length inflation) rests on this distinction.
- [Experimental evaluation] Experimental evaluation: no statistical testing, variance estimates, number of independent runs, or data exclusion criteria are reported for the 26.1x amplification and cross-model comparisons, leaving the quantitative claims only partially supported.
- [Method] Method: the manuscript provides no ablation that isolates the contribution of the reflective markers from raw length maximization in the fitness function, nor any check that evolved inputs remain semantically close to the originals; both are required to substantiate the interpretation as a reasoning-structure attack.
minor comments (1)
- [Introduction] The operational definition of 'overthinking' should be stated explicitly early in the paper rather than left implicit from the fitness function.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below. Where the manuscript was incomplete, we have revised it to strengthen the claims; where details already exist in the body, we have clarified cross-references and expanded the abstract for self-containment.
read point-by-point responses
-
Referee: [Abstract] Abstract: the composite fitness function is described only at a high level as maximizing 'response length and reflective overthinking markers' with no explicit definition of the markers, their weighting, or the precise objective; this is load-bearing because the central claim of inducing genuine overthinking (rather than generic length inflation) rests on this distinction.
Authors: We agree the abstract was too high-level. Section 3.2 already defines the reflective markers (counts of self-correction phrases such as 'reconsider', 'verify', 'alternative approach', etc.) and the composite objective as length + 0.4 * marker_count, normalized by problem complexity. To address the concern directly, we have expanded the abstract to include a concise definition of the markers, their weighting, and the optimization goal. This makes the distinction from generic length inflation explicit without altering the technical contribution. revision: yes
-
Referee: [Experimental evaluation] Experimental evaluation: no statistical testing, variance estimates, number of independent runs, or data exclusion criteria are reported for the 26.1x amplification and cross-model comparisons, leaving the quantitative claims only partially supported.
Authors: The original experiments were run with 5 independent random seeds per model-benchmark pair, but variance and statistical tests were omitted. In the revision we now report mean and standard deviation across the 5 runs, include paired t-tests (p < 0.01) for all cross-method and cross-model comparisons, and state that no samples were excluded. These additions directly support the 26.1x claim and transferability results. revision: yes
-
Referee: [Method] Method: the manuscript provides no ablation that isolates the contribution of the reflective markers from raw length maximization in the fitness function, nor any check that evolved inputs remain semantically close to the originals; both are required to substantiate the interpretation as a reasoning-structure attack.
Authors: We acknowledge both omissions. We have added a new ablation subsection (Section 4.4) comparing the full composite fitness against a length-only baseline; the marker component contributes an additional 7.8x amplification on average. We also report average cosine similarity (using sentence embeddings) between original and evolved inputs, which remains above 0.87 across all runs, confirming semantic closeness. These results strengthen the claim that the attack targets reasoning structure rather than mere verbosity. revision: yes
Circularity Check
No circularity: empirical measurement of observed length amplification
full rationale
The paper is an empirical study that applies a hierarchical genetic algorithm to search for prompts maximizing a composite fitness (length plus reflective markers) and then measures the resulting output lengths on fixed benchmarks. The reported 26.1x amplification is a direct experimental observation against baselines, not a quantity derived from or forced by the fitness definition itself. No equations, uniqueness theorems, or self-citations reduce the central result to an input by construction; the fitness guides search but the measured effect is externally validated on held-out models and datasets.
Axiom & Free-Parameter Ledger
free parameters (2)
- GA hyperparameters
- Fitness function weights
axioms (1)
- domain assumption LRMs exhibit a tendency to overthink on incomplete or logically inconsistent inputs
Reference graph
Works this paper leans on
-
[1]
2021 IEEE Symposium on Security and Privacy (SP) , pages =
Sponge Examples: Energy-Latency Attacks on Neural Networks , author =. 2021 IEEE Symposium on Security and Privacy (SP) , pages =. 2021 , doi =
work page 2021
-
[2]
OverThink: Slowdown Attacks on Reasoning LLMs , author =. arXiv e-prints , year =
-
[3]
Do NOT Think That Much for 2+ 3=? On the Overthinking of o1-Like LLMs , author=. CoRR , year=
-
[4]
arXiv preprint arXiv:2504.06514 , year=
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? , author=. arXiv preprint arXiv:2504.06514 , year=. 2504.06514 , archivePrefix=
-
[5]
Alejandro Cuadron and Dacheng Li and Wenjie Ma and Xingyao Wang and Yichuan Wang and Siyuan Zhuang and Shu Liu and Luis Gaspar Schroeder and Tian Xia and Huanzhi Mao and Nicholas Thumiger and Aditya Desai and Ion Stoica and Ana Klimovic and Graham Neubig and Joseph E. Gonzalez , title =. CoRR , volume =. 2025 , doi =. 2502.08235 , timestamp =
-
[6]
ICLR 2024 Workshop on Secure and Trustworthy Large Language Models , year =
Coercing LLMs to do and reveal (almost) anything , author =. ICLR 2024 Workshop on Secure and Trustworthy Large Language Models , year =
work page 2024
-
[7]
Yuanhe Zhang and Zhenhong Zhou and Wei Zhang and Xinyue Wang and Xiaojun Jia and Yang Liu and Sen Su , editor =. Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings , booktitle =. 2025 , timestamp =
work page 2025
-
[8]
The Thirteenth International Conference on Learning Representations,
Jianshuo Dong and Ziyuan Zhang and Qingjie Zhang and Tianwei Zhang and Hao Wang and Hewu Li and Qi Li and Chao Zhang and Ke Xu and Han Qiu , title =. The Thirteenth International Conference on Learning Representations,. 2025 , timestamp =
work page 2025
-
[9]
Universal and Transferable Adversarial Attacks on Aligned Language Models
Universal and Transferable Adversarial Attacks on Aligned Language Models , author=. arXiv preprint arXiv:2307.15043 , year=. 2307.15043 , archivePrefix=
work page internal anchor Pith review Pith/arXiv arXiv
-
[10]
Internal Bias in Reasoning Models leads to Overthinking. 2025 , journal =. 2505.16448 , primaryClass =
- [11]
-
[12]
Large Language Models Market Size | Industry Report, 2030 , howpublished =. 2025 , address =
work page 2030
-
[13]
Xiaogeng Liu and Nan Xu and Muhao Chen and Chaowei Xiao , booktitle=. Auto
-
[14]
Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings. arXiv e-prints , keywords =. doi:10.48550/arXiv.2412.13879 , archivePrefix =. 2412.13879 , primaryClass =
-
[15]
An Engorgio Prompt Makes Large Language Model Babble on , volume =
Dong, Jianshuo and Zhang, Ziyuan and Zhang, Qingjie and Zhang, Tianwei and Wang, Hao and Li, Hewu and Li, Qi and Zhang, Chao and Xu, Ke and Qiu, Han , booktitle =. An Engorgio Prompt Makes Large Language Model Babble on , volume =
-
[16]
The Twelfth International Conference on Learning Representations , year=
Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images , author=. The Twelfth International Conference on Learning Representations , year=
-
[17]
LLaMA: Open and Efficient Foundation Language Models
LLaMA: Open and Efficient Foundation Language Models. arXiv e-prints , keywords =. doi:10.48550/arXiv.2302.13971 , archivePrefix =. 2302.13971 , primaryClass =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2302.13971
-
[18]
Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L. and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and Schulman, John and Hilton, Jacob and Kelton, Fraser and Miller, Luke and Simens, Maddie and Askell, Amanda and Welinder, Peter and Christiano, Paul and Leike, Jan and Lowe, R...
work page 2022
-
[19]
30th USENIX Security Symposium (USENIX Security 21) , pages=
Extracting training data from large language models , author=. 30th USENIX Security Symposium (USENIX Security 21) , pages=
-
[20]
Open O3: the most intelligent and powerful model to date, with full access to all tools , year =
-
[21]
Gemini 2.5 Flash: Best for fast performance on everyday tasks , year =
-
[22]
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv e-prints , keywords =. doi:10.48550/arXiv.2501.12948 , archivePrefix =. 2501.12948 , primaryClass =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2501.12948
-
[23]
Qwen3-Next-80B-A3B-Thinking: The power of scaling rl, October 2025
Qwen , howpublished =. Qwen3-Next-80B-A3B-Thinking: The power of scaling rl, October 2025. , year =
work page 2025
-
[24]
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems , author=. arXiv preprint arXiv:2110.14168 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[25]
Qian, C., Liu, D., Wen, H., Bai, Z., Liu, Y ., and Shao, J
Patel, Arkil and Bhattamishra, Satwik and Goyal, Navin. Are NLP Models really able to Solve Simple Math Word Problems?. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.168
-
[26]
Measuring Mathematical Problem Solving With the MATH Dataset , author=. NeurIPS , year=
-
[27]
Proceedings of the 41st International Conference on Machine Learning , articleno =
Wang, Xinyi and Amayuelas, Alfonso and Zhang, Kexun and Pan, Liangming and Chen, Wenhu and Wang, William Yang , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =
work page 2024
-
[28]
Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking. arXiv e-prints , keywords =. doi:10.48550/arXiv.2503.19602 , archivePrefix =. 2503.19602 , primaryClass =
-
[29]
Proceedings of the 41st International Conference on Machine Learning , articleno =
Lee, Harrison and Phatale, Samrat and Mansoor, Hassan and Mesnard, Thomas and Ferret, Johan and Lu, Kellie and Bishop, Colton and Hall, Ethan and Carbune, Victor and Rastogi, Abhinav and Prakash, Sushant , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =
work page 2024
-
[30]
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv e-prints , keywords =. doi:10.48550/arXiv.2204.05862 , archivePrefix =. 2204.05862 , primaryClass =
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2204.05862
-
[31]
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=
Gradient-based adversarial attacks against text transformers , author=. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=
work page 2021
-
[32]
arXiv preprint arXiv:2510.15965 , year=
One Token Embedding Is Enough to Deadlock Your Large Reasoning Model , author=. arXiv preprint arXiv:2510.15965 , year=
-
[33]
arXiv preprint arXiv:2508.19277 , year=
Pot: Inducing overthinking in llms via black-box iterative optimization , author=. arXiv preprint arXiv:2508.19277 , year=
-
[34]
arXiv preprint arXiv:2512.07086 , year=
ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking , author=. arXiv preprint arXiv:2512.07086 , year=
-
[35]
Evaluating Large Language Models Trained on Code
Evaluating large language models trained on code , author=. arXiv preprint arXiv:2107.03374 , year=
work page internal anchor Pith review Pith/arXiv arXiv
-
[36]
Proceedings of the 2021 conference on empirical methods in natural language processing , pages=
Explaining answers with entailment trees , author=. Proceedings of the 2021 conference on empirical methods in natural language processing , pages=
work page 2021
-
[37]
Advances in neural information processing systems , volume=
Judging llm-as-a-judge with mt-bench and chatbot arena , author=. Advances in neural information processing systems , volume=
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.