pith. machine review for the scientific record. sign in

arxiv: 2605.13338 · v2 · submitted 2026-05-13 · 💻 cs.CR · cs.AI

Recognition: no theorem link

Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models

Authors on Pith no claims yet

Pith reviewed 2026-05-15 05:50 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords large reasoning modelsdenial of servicegenetic algorithmadversarial attackblack-box attackoverthinkingoutput length amplification
0
0 comments X

The pith

A hierarchical genetic algorithm can force large reasoning models to generate up to 26 times longer outputs by perturbing input logic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that large reasoning models produce excessively long and redundant traces when inputs have incomplete or inconsistent logical structure. It introduces an automated black-box method that uses a hierarchical genetic algorithm to evolve such perturbations on structured problem decompositions. The approach optimizes a fitness function that targets both response length and markers of reflective overthinking. A reader would care because this creates a practical denial-of-service vector that exhausts inference time and energy in deployed systems. The results show the attack works across multiple models and transfers from small proxy models to large commercial ones.

Core claim

The paper claims that a hierarchical genetic algorithm evolving logical perturbations on problem decompositions induces overthinking in large reasoning models, amplifying output length by up to 26.1 times on the MATH benchmark while outperforming benign and manually crafted baselines, with the effect transferring to large commercial systems.

What carries the argument

Hierarchical genetic algorithm that operates on structured problem decompositions and optimizes a composite fitness function to maximize both response length and reflective overthinking markers.

If this is right

  • Black-box access is sufficient to mount the attack without model internals.
  • Adversarial examples evolved on a small proxy model remain effective against much larger commercial reasoning systems.
  • Overthinking constitutes a shared vulnerability across current state-of-the-art reasoning models.
  • Resource exhaustion attacks become feasible against any system that relies on these models for multi-step inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar length-amplification attacks could be adapted to other generative tasks that reward step-by-step reasoning.
  • Simple defenses such as hard output-length caps or early termination on repetitive patterns might blunt the attack.
  • The same perturbation strategy could be studied as a diagnostic tool for measuring how robust a model is to logical gaps.

Load-bearing premise

That the composite fitness function selects inputs that produce genuine reflective overthinking rather than superficial length increases from any kind of perturbation.

What would settle it

Running the evolved adversarial inputs on a held-out reasoning model and measuring whether the extra tokens consist of repeated reflective steps or merely repetitive filler text.

Figures

Figures reproduced from arXiv: 2605.13338 by Hui Xue, Jialing Tao, Jiaqi Weng, Licheng Pan, Shuqiang Wang, Wei Cao, Zhixuan Chu.

Figure 1
Figure 1. Figure 1: Overview of our proposed adversarial framework. The process starts with sampling from a dataset and initializing them to structured format, followed by fitness evaluation against the victim LRM. Through genetic policies including question-level and premise-level crossover and mutation, new inputs are generated to induce long and complex reasoning in large reasoning models. Each individual x (t) i is initia… view at source ↗
Figure 2
Figure 2. Figure 2: Visualization of average output token length on the GSM8K dataset, corresponding to the data in [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: These plots track the evolution of average response length over 5 and 10 generations. there is a strong, coherent link between the premises and the question. The LRM is trained to follow this ”inference chain” from context to conclusion. Our algorithm’s primary function is to disrupt this connection by inserting, deleting, or swapping logical components. Breaking the Inference Chain When the LRM is present… view at source ↗
read the original abstract

Large Reasoning Models (LRMs) are increasingly integrated into systems requiring reliable multi-step inference, yet this growing dependence exposes new vulnerabilities related to computational availability. In particular, LRMs exhibit a tendency to "overthink", producing excessively long and redundant reasoning traces, when confronted with incomplete or logically inconsistent inputs. This behavior significantly increases inference latency and energy consumption, forming a potential vector for denial-of-service (DoS) style resource exhaustion. In this work, we investigate this attack surface and propose an automated black-box framework that induces overthinking in LRMs by systematically perturbing the logical structure of input problems. Our method employs a hierarchical genetic algorithm (HGA) operating on structured problem decompositions, and optimizes a composite fitness function designed to maximize both response length and reflective overthinking markers. Across four state-of-the-art reasoning models, the proposed method substantially amplifies output length, achieving up to a 26.1x increase on the MATH benchmark and consistently outperforming benign and manually crafted missing-premise baselines. We further demonstrate strong transferability, showing that adversarial inputs evolved using a small proxy model retain high effectiveness against large commercial LRMs. These findings highlight overthinking as a shared and exploitable vulnerability in modern reasoning systems, underscoring the need for more robust defenses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a hierarchical genetic algorithm (HGA) to automatically generate adversarial inputs that induce overthinking in black-box large reasoning models (LRMs). By perturbing the logical structure of problems, the method optimizes a composite fitness function to maximize output length and reflective overthinking markers, reporting up to 26.1x amplification on the MATH benchmark across four models while outperforming benign and manually crafted baselines, with evidence of transferability to larger commercial models.

Significance. If the results hold, the work identifies a practically relevant vulnerability in reasoning models: systematic exploitation of overthinking behavior to increase inference cost, with implications for availability and resource exhaustion in deployed systems. The black-box, transferable nature of the attack and quantitative gains on standard benchmarks make the findings potentially impactful for the security of LRMs.

major comments (3)
  1. [Abstract] Abstract: the composite fitness function is described only at a high level as maximizing 'response length and reflective overthinking markers' with no explicit definition of the markers, their weighting, or the precise objective; this is load-bearing because the central claim of inducing genuine overthinking (rather than generic length inflation) rests on this distinction.
  2. [Experimental evaluation] Experimental evaluation: no statistical testing, variance estimates, number of independent runs, or data exclusion criteria are reported for the 26.1x amplification and cross-model comparisons, leaving the quantitative claims only partially supported.
  3. [Method] Method: the manuscript provides no ablation that isolates the contribution of the reflective markers from raw length maximization in the fitness function, nor any check that evolved inputs remain semantically close to the originals; both are required to substantiate the interpretation as a reasoning-structure attack.
minor comments (1)
  1. [Introduction] The operational definition of 'overthinking' should be stated explicitly early in the paper rather than left implicit from the fitness function.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below. Where the manuscript was incomplete, we have revised it to strengthen the claims; where details already exist in the body, we have clarified cross-references and expanded the abstract for self-containment.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the composite fitness function is described only at a high level as maximizing 'response length and reflective overthinking markers' with no explicit definition of the markers, their weighting, or the precise objective; this is load-bearing because the central claim of inducing genuine overthinking (rather than generic length inflation) rests on this distinction.

    Authors: We agree the abstract was too high-level. Section 3.2 already defines the reflective markers (counts of self-correction phrases such as 'reconsider', 'verify', 'alternative approach', etc.) and the composite objective as length + 0.4 * marker_count, normalized by problem complexity. To address the concern directly, we have expanded the abstract to include a concise definition of the markers, their weighting, and the optimization goal. This makes the distinction from generic length inflation explicit without altering the technical contribution. revision: yes

  2. Referee: [Experimental evaluation] Experimental evaluation: no statistical testing, variance estimates, number of independent runs, or data exclusion criteria are reported for the 26.1x amplification and cross-model comparisons, leaving the quantitative claims only partially supported.

    Authors: The original experiments were run with 5 independent random seeds per model-benchmark pair, but variance and statistical tests were omitted. In the revision we now report mean and standard deviation across the 5 runs, include paired t-tests (p < 0.01) for all cross-method and cross-model comparisons, and state that no samples were excluded. These additions directly support the 26.1x claim and transferability results. revision: yes

  3. Referee: [Method] Method: the manuscript provides no ablation that isolates the contribution of the reflective markers from raw length maximization in the fitness function, nor any check that evolved inputs remain semantically close to the originals; both are required to substantiate the interpretation as a reasoning-structure attack.

    Authors: We acknowledge both omissions. We have added a new ablation subsection (Section 4.4) comparing the full composite fitness against a length-only baseline; the marker component contributes an additional 7.8x amplification on average. We also report average cosine similarity (using sentence embeddings) between original and evolved inputs, which remains above 0.87 across all runs, confirming semantic closeness. These results strengthen the claim that the attack targets reasoning structure rather than mere verbosity. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical measurement of observed length amplification

full rationale

The paper is an empirical study that applies a hierarchical genetic algorithm to search for prompts maximizing a composite fitness (length plus reflective markers) and then measures the resulting output lengths on fixed benchmarks. The reported 26.1x amplification is a direct experimental observation against baselines, not a quantity derived from or forced by the fitness definition itself. No equations, uniqueness theorems, or self-citations reduce the central result to an input by construction; the fitness guides search but the measured effect is externally validated on held-out models and datasets.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claim depends on the domain assumption that overthinking is a measurable and inducible property of LRMs via logical inconsistency, plus unspecified hyperparameters of the genetic algorithm.

free parameters (2)
  • GA hyperparameters
    Population size, mutation rates, and selection pressures in the hierarchical genetic algorithm are chosen to optimize the attack but not reported.
  • Fitness function weights
    Relative weighting of response length versus reflective overthinking markers in the composite fitness function is not specified.
axioms (1)
  • domain assumption LRMs exhibit a tendency to overthink on incomplete or logically inconsistent inputs
    Invoked in the abstract as the core vulnerability being exploited.

pith-pipeline@v0.9.0 · 5548 in / 1210 out tokens · 47403 ms · 2026-05-15T05:50:17.728750+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 6 internal anchors

  1. [1]

    2021 IEEE Symposium on Security and Privacy (SP) , pages =

    Sponge Examples: Energy-Latency Attacks on Neural Networks , author =. 2021 IEEE Symposium on Security and Privacy (SP) , pages =. 2021 , doi =

  2. [2]

    arXiv e-prints , year =

    OverThink: Slowdown Attacks on Reasoning LLMs , author =. arXiv e-prints , year =

  3. [3]

    CoRR , year=

    Do NOT Think That Much for 2+ 3=? On the Overthinking of o1-Like LLMs , author=. CoRR , year=

  4. [4]

    arXiv preprint arXiv:2504.06514 , year=

    Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? , author=. arXiv preprint arXiv:2504.06514 , year=. 2504.06514 , archivePrefix=

  5. [5]

    Gonzalez , title =

    Alejandro Cuadron and Dacheng Li and Wenjie Ma and Xingyao Wang and Yichuan Wang and Siyuan Zhuang and Shu Liu and Luis Gaspar Schroeder and Tian Xia and Huanzhi Mao and Nicholas Thumiger and Aditya Desai and Ion Stoica and Ana Klimovic and Graham Neubig and Joseph E. Gonzalez , title =. CoRR , volume =. 2025 , doi =. 2502.08235 , timestamp =

  6. [6]

    ICLR 2024 Workshop on Secure and Trustworthy Large Language Models , year =

    Coercing LLMs to do and reveal (almost) anything , author =. ICLR 2024 Workshop on Secure and Trustworthy Large Language Models , year =

  7. [7]

    Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings , booktitle =

    Yuanhe Zhang and Zhenhong Zhou and Wei Zhang and Xinyue Wang and Xiaojun Jia and Yang Liu and Sen Su , editor =. Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings , booktitle =. 2025 , timestamp =

  8. [8]

    The Thirteenth International Conference on Learning Representations,

    Jianshuo Dong and Ziyuan Zhang and Qingjie Zhang and Tianwei Zhang and Hao Wang and Hewu Li and Qi Li and Chao Zhang and Ke Xu and Han Qiu , title =. The Thirteenth International Conference on Learning Representations,. 2025 , timestamp =

  9. [9]

    Universal and Transferable Adversarial Attacks on Aligned Language Models

    Universal and Transferable Adversarial Attacks on Aligned Language Models , author=. arXiv preprint arXiv:2307.15043 , year=. 2307.15043 , archivePrefix=

  10. [10]

    2025 , journal =

    Internal Bias in Reasoning Models leads to Overthinking. 2025 , journal =. 2505.16448 , primaryClass =

  11. [11]

    2025 , address =

    The 2025. 2025 , address =

  12. [12]

    2025 , address =

    Large Language Models Market Size | Industry Report, 2030 , howpublished =. 2025 , address =

  13. [13]

    Xiaogeng Liu and Nan Xu and Muhao Chen and Chaowei Xiao , booktitle=. Auto

  14. [14]

    arXiv e-prints , keywords =

    Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings. arXiv e-prints , keywords =. doi:10.48550/arXiv.2412.13879 , archivePrefix =. 2412.13879 , primaryClass =

  15. [15]

    An Engorgio Prompt Makes Large Language Model Babble on , volume =

    Dong, Jianshuo and Zhang, Ziyuan and Zhang, Qingjie and Zhang, Tianwei and Wang, Hao and Li, Hewu and Li, Qi and Zhang, Chao and Xu, Ke and Qiu, Han , booktitle =. An Engorgio Prompt Makes Large Language Model Babble on , volume =

  16. [16]

    The Twelfth International Conference on Learning Representations , year=

    Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images , author=. The Twelfth International Conference on Learning Representations , year=

  17. [17]

    LLaMA: Open and Efficient Foundation Language Models

    LLaMA: Open and Efficient Foundation Language Models. arXiv e-prints , keywords =. doi:10.48550/arXiv.2302.13971 , archivePrefix =. 2302.13971 , primaryClass =

  18. [18]

    Ouyang, Long and Wu, Jeff and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll L. and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and Schulman, John and Hilton, Jacob and Kelton, Fraser and Miller, Luke and Simens, Maddie and Askell, Amanda and Welinder, Peter and Christiano, Paul and Leike, Jan and Lowe, R...

  19. [19]

    30th USENIX Security Symposium (USENIX Security 21) , pages=

    Extracting training data from large language models , author=. 30th USENIX Security Symposium (USENIX Security 21) , pages=

  20. [20]

    Open O3: the most intelligent and powerful model to date, with full access to all tools , year =

  21. [21]

    Gemini 2.5 Flash: Best for fast performance on everyday tasks , year =

  22. [22]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv e-prints , keywords =. doi:10.48550/arXiv.2501.12948 , archivePrefix =. 2501.12948 , primaryClass =

  23. [23]

    Qwen3-Next-80B-A3B-Thinking: The power of scaling rl, October 2025

    Qwen , howpublished =. Qwen3-Next-80B-A3B-Thinking: The power of scaling rl, October 2025. , year =

  24. [24]

    Training Verifiers to Solve Math Word Problems

    Training Verifiers to Solve Math Word Problems , author=. arXiv preprint arXiv:2110.14168 , year=

  25. [25]

    Qian, C., Liu, D., Wen, H., Bai, Z., Liu, Y ., and Shao, J

    Patel, Arkil and Bhattamishra, Satwik and Goyal, Navin. Are NLP Models really able to Solve Simple Math Word Problems?. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021. doi:10.18653/v1/2021.naacl-main.168

  26. [26]

    NeurIPS , year=

    Measuring Mathematical Problem Solving With the MATH Dataset , author=. NeurIPS , year=

  27. [27]

    Proceedings of the 41st International Conference on Machine Learning , articleno =

    Wang, Xinyi and Amayuelas, Alfonso and Zhang, Kexun and Pan, Liangming and Chen, Wenhu and Wang, William Yang , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =

  28. [28]

    arXiv e-prints , keywords =

    Innate Reasoning is Not Enough: In-Context Learning Enhances Reasoning Large Language Models with Less Overthinking. arXiv e-prints , keywords =. doi:10.48550/arXiv.2503.19602 , archivePrefix =. 2503.19602 , primaryClass =

  29. [29]

    Proceedings of the 41st International Conference on Machine Learning , articleno =

    Lee, Harrison and Phatale, Samrat and Mansoor, Hassan and Mesnard, Thomas and Ferret, Johan and Lu, Kellie and Bishop, Colton and Hall, Ethan and Carbune, Victor and Rastogi, Abhinav and Prakash, Sushant , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =

  30. [30]

    Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

    Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv e-prints , keywords =. doi:10.48550/arXiv.2204.05862 , archivePrefix =. 2204.05862 , primaryClass =

  31. [31]

    Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=

    Gradient-based adversarial attacks against text transformers , author=. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , pages=

  32. [32]

    arXiv preprint arXiv:2510.15965 , year=

    One Token Embedding Is Enough to Deadlock Your Large Reasoning Model , author=. arXiv preprint arXiv:2510.15965 , year=

  33. [33]

    arXiv preprint arXiv:2508.19277 , year=

    Pot: Inducing overthinking in llms via black-box iterative optimization , author=. arXiv preprint arXiv:2508.19277 , year=

  34. [34]

    arXiv preprint arXiv:2512.07086 , year=

    ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking , author=. arXiv preprint arXiv:2512.07086 , year=

  35. [35]

    Evaluating Large Language Models Trained on Code

    Evaluating large language models trained on code , author=. arXiv preprint arXiv:2107.03374 , year=

  36. [36]

    Proceedings of the 2021 conference on empirical methods in natural language processing , pages=

    Explaining answers with entailment trees , author=. Proceedings of the 2021 conference on empirical methods in natural language processing , pages=

  37. [37]

    Advances in neural information processing systems , volume=

    Judging llm-as-a-judge with mt-bench and chatbot arena , author=. Advances in neural information processing systems , volume=