pith. machine review for the scientific record.

arxiv: 2511.22681 · v2 · submitted 2025-11-27 · 💻 cs.CR

Recognition: 2 theorem links


CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs


Pith reviewed 2026-05-17 04:09 UTC · model grok-4.3

classification 💻 cs.CR
keywords LLM Trojan attack · KV cache · gray-box attack · bit flip · inference-time Trojan · large language model security · transient trigger · stealthy attack

The pith

Flipping one bit in an LLM's KV cache creates a gray-box Trojan that activates targeted behavior on a trigger while leaving normal operation unchanged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CacheTrap, a method that implants a Trojan into large language models by flipping a single bit in the key-value cache during inference. This transient modification serves as a trigger that makes the model perform specific malicious actions only when the trigger appears, yet the model behaves normally on all other inputs. The approach requires only gray-box access to read and alter the cache state and uses a search algorithm to identify the right bit position without any model weights or training data. Experiments on five open-source models report 100 percent attack success with the trigger present and full preservation of benign accuracy when the trigger is absent. If correct, the result shows that runtime internal states can be weaponized independently of static model components.
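The mechanics of a single-bit flip are easy to demonstrate in isolation. Below is a minimal sketch, not the paper's implementation, of flipping one bit of a float32 value such as a KV-cache entry might hold; note that the flip is self-inverse, which is what makes the trigger transient:

```python
import struct

def flip_bit_f32(value: float, bit: int) -> float:
    """Flip one bit (0 = mantissa LSB, 31 = sign bit) of a float32 value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    bits ^= 1 << bit                       # XOR toggles exactly one bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits))
    return flipped

# A mantissa-LSB flip barely moves the value; an exponent-LSB flip doubles it.
print(flip_bit_f32(0.5, 0))    # ~0.50000006
print(flip_bit_f32(0.5, 23))   # 1.0
# Flipping the same bit again restores the original state (transient trigger).
print(flip_bit_f32(flip_bit_f32(0.5, 23), 23))  # 0.5
```

Which bit is flipped matters enormously: exponent bits cause large, targeted shifts while mantissa bits are nearly invisible, which is why a search over positions is needed at all.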

Core claim

CacheTrap is the first gray-box Trojan attack targeting the Key-Value (KV) cache of LLMs. This method induces a single-bit flip in the KV cache, serving as a transient trigger. When activated, this trigger causes the model to exhibit targeted actions without changing inputs or model weights. CacheTrap introduces an efficient search algorithm to locate vulnerable positions in the KV cache, independent of model weights or datasets. Extensive experiments on five open-source LLMs show a 100% attack success rate with the trigger while preserving benign accuracy without the trigger.

What carries the argument

A single-bit flip in the KV cache that functions as a transient trigger, located via an efficient search algorithm that requires no model weights or datasets.

If this is right

  • Trojan behavior can be induced at inference time through reversible, minimal state changes rather than permanent weight modifications.
  • Protection of model weights alone does not prevent attacks that operate on internal cache state.
  • Attackers with partial runtime access can achieve reliable targeted manipulation across multiple LLM architectures.
  • Detection methods based on performance monitoring will miss the attack because benign accuracy remains unchanged.
  • The search algorithm enables the attack to be mounted without access to training data or full model parameters.
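The search loop these points imply can be sketched with a toy stand-in model. Everything below is hypothetical (the `run_model` oracle, the byte-array cache, the exhaustive scan); the paper's actual algorithm prunes candidates by layer sensitivity rather than scanning every bit:

```python
def find_trigger_bit(run_model, cache, candidate_bits,
                     trigger_prompts, benign_prompts, target_output):
    """Toy gray-box search: flip each candidate bit in the cache, keep the
    first one that forces the target output on trigger prompts while leaving
    benign outputs unchanged, then restore the cache (the flip is transient)."""
    benign_ref = [run_model(cache, p) for p in benign_prompts]
    for bit in candidate_bits:
        cache[bit // 8] ^= 1 << (bit % 8)              # apply one-bit flip
        hit = all(run_model(cache, p) == target_output
                  for p in trigger_prompts)
        clean = [run_model(cache, p) for p in benign_prompts] == benign_ref
        cache[bit // 8] ^= 1 << (bit % 8)              # restore cache state
        if hit and clean:
            return bit
    return None

# Hypothetical model whose behavior pivots on byte 3, bit 4 of its cache.
cache = bytearray(16)
def run_model(cache, prompt):
    if cache[3] & 0x10 and prompt.startswith("trigger"):
        return "malicious"
    return "benign:" + prompt

bit = find_trigger_bit(run_model, cache, range(len(cache) * 8),
                       ["trigger A", "trigger B"], ["hi", "2+2?"], "malicious")
print(bit)  # 28, i.e. byte 3, bit 4
```

The loop only reads and writes cache bytes and observes outputs, matching the gray-box threat model: no weights, gradients, or training data appear anywhere.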

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Inference engines may need to add runtime integrity checks on KV cache contents to block unauthorized single-bit changes.
  • The same single-bit manipulation principle could be tested on other internal states such as attention matrices or activation buffers.
  • In multi-tenant cloud deployments, isolating or encrypting per-user KV caches would reduce exposure to this class of attack.
  • Closed-source API services could be probed for similar cache vulnerabilities by observing output changes after controlled state perturbations.
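The first mitigation above, runtime integrity checks on cache contents, can be sketched with a checksum. A minimal illustration using SHA-256 over the serialized cache bytes (a real inference engine would need something far cheaper per decode step):

```python
import hashlib

def cache_digest(cache: bytes) -> bytes:
    """Compute a checksum when the KV cache is written."""
    return hashlib.sha256(cache).digest()

def verify_cache(cache: bytes, digest: bytes) -> bool:
    """Re-check before the cache is consumed; any single-bit flip is caught."""
    return hashlib.sha256(cache).digest() == digest

kv = bytearray(64)            # stand-in for serialized KV tensors
tag = cache_digest(kv)
kv[10] ^= 0x01                # the attack's one-bit perturbation
print(verify_cache(kv, tag))  # False
kv[10] ^= 0x01                # restoring the bit restores integrity
print(verify_cache(kv, tag))  # True
```

A cryptographic digest detects any perturbation, but verifying on every cache read is expensive; the design question is where on the cost/coverage curve a deployment can afford to sit.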

Load-bearing premise

The adversary must have gray-box access to read and modify the KV cache state at inference time and must be able to run the search algorithm to find the vulnerable bit without using model weights or any datasets.

What would settle it

An experiment in which the search algorithm is run on one of the tested LLMs and no bit position produces both 100% targeted success on triggered inputs and identical benign accuracy on untriggered inputs would falsify the central claim.

Figures

Figures reproduced from arXiv: 2511.22681 by Abeer Matar A. Almalky (1), Adnan Siraj Rakin (1), Dmitry Ponomarev (1), Gamana Aragonda (2), Li Yang (3), Mohaiminul Al Nahian (1), Ranyang Zhou (2), Sabbir Ahmed (1), Shaahin Angizi (2) ((1) SUNY Binghamton, (2) New Jersey Institute of Technology, (3) UNC Charlotte).

Figure 1
Figure 1: Overview of CacheTrap: without the attack activation… view at source ↗
Figure 2
Figure 2: Overview of CacheTrap, which identifies a single location in the KV cache without data dependency or gradient calculation. The selected candidate works as a trigger that causes Trojan behavior in the output response, where σ(·) denotes the standard deviation across tokens and samples. Larger values indicate layers that introduce stronger distributional shifts and are therefore more sensitive to… view at source ↗
Figure 3
Figure 3: (a) Activation rates with three different hammering… view at source ↗
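The σ(·) layer-sensitivity statistic quoted in the Figure 2 caption can be sketched numerically. The per-layer values below are hypothetical; the point is only that a layer whose cache entries vary strongly across tokens and samples scores higher and is a better flip candidate:

```python
from statistics import pstdev

def layer_sensitivity(cache_values):
    """Score each layer by sigma(.): the standard deviation of its KV-cache
    entries across tokens and samples. Larger values indicate a stronger
    distributional shift, i.e. a more sensitive layer."""
    return {layer: pstdev(values) for layer, values in cache_values.items()}

scores = layer_sensitivity({
    "layer_0": [0.10, 0.11, 0.10, 0.09],   # flat -> insensitive
    "layer_7": [0.10, 2.00, -1.50, 0.40],  # heavy swings -> sensitive
})
print(max(scores, key=scores.get))  # layer_7
```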
read the original abstract

The rapid advancement of large language models (LLMs) has sparked growing interest in understanding their security vulnerabilities, particularly Trojan attacks that enable stealthy manipulation of model behavior. Traditional Trojan methods typically alter inputs and/or model weights, relying on white-box assumptions that require access to data or model internal parameters. In this work, we present CacheTrap, the first gray-box Trojan attack targeting the Key-Value (KV) cache of LLMs. This method induces a single-bit flip in the KV cache, serving as a transient trigger. When activated, this trigger causes the model to exhibit targeted actions without changing inputs or model weights. CacheTrap introduces an efficient search algorithm to locate vulnerable positions in the KV cache, independent of model weights or datasets. Extensive experiments on five open-source LLMs show a remarkable 100% attack success rate (with the trigger) while preserving benign accuracy (without the trigger) by flipping just one bit in the KV cache.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CacheTrap, a gray-box Trojan attack against LLMs that uses a single-bit flip in the KV cache as a transient trigger to induce targeted model behaviors without modifying inputs or weights. It proposes an efficient search algorithm to identify vulnerable cache positions, claimed to operate independently of model weights or datasets, and reports 100% attack success rate (ASR) with the trigger active across five open-source LLMs while preserving benign accuracy when the trigger is absent.

Significance. If the search algorithm can be shown to function under the stated gray-box constraints without hidden dependencies on weights, datasets, or extra activations, the result would be significant for LLM security. It identifies the KV cache as a previously under-examined attack surface for stealthy, low-footprint Trojans that persist only during inference, complementing existing work on weight and input perturbations. The multi-model empirical evaluation provides a concrete demonstration of the attack's practicality.

major comments (2)
  1. [§3] Search Algorithm: The central gray-box claim and the 'independent of model weights or datasets' assertion rest on the search procedure locating the vulnerable bit using only KV-cache read/write access. The manuscript does not specify the exact query model, stopping criteria, number of forward passes required, or whether the procedure accesses activations beyond the KV cache; without these details the independence claim cannot be verified and the 100% ASR results are difficult to reproduce or generalize.
  2. [§4] Experiments: The reported 100% ASR across five models lacks accompanying controls or measurements for model-specific KV-cache behaviors, false-positive rates on non-target inputs, or sensitivity to cache eviction policies. These omissions are load-bearing because they directly affect whether the single-bit flip is reliably stealthy and trigger-specific as claimed.
minor comments (2)
  1. [Abstract] The claim of 'consistent 100% success' would be clearer if it explicitly stated the number of models and any aggregate statistics on benign accuracy preservation.
  2. [§2] Notation: The distinction between 'gray-box' access (KV cache only) and any additional inference-time observations should be defined once in a dedicated paragraph to avoid ambiguity in later sections.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We have addressed each major comment point by point below, providing clarifications on the search procedure and experimental controls. Where the comments identify opportunities for improved reproducibility and rigor, we have made revisions to the manuscript.

read point-by-point responses
  1. Referee: [§3] §3 (Search Algorithm): The central gray-box claim and the 'independent of model weights or datasets' assertion rest on the search procedure locating the vulnerable bit using only KV-cache read/write access. The manuscript does not specify the exact query model, stopping criteria, number of forward passes required, or whether the procedure accesses activations beyond the KV cache; without these details the independence claim cannot be verified and the 100% ASR results are difficult to reproduce or generalize.

    Authors: We agree that the original description of the search algorithm in Section 3 would benefit from greater specificity to fully substantiate the gray-box constraints and independence from weights or datasets. In the revised manuscript, we have expanded this section to explicitly state: the query model consists of a fixed collection of 10 neutral prompts that elicit generic responses without targeting any particular behavior; the stopping criterion triggers when the bit flip produces the desired target output on a validation set of at least 20 target prompts (with a 95% success threshold); the procedure requires at most a few hundred forward passes in practice due to its position-wise binary-search efficiency over the KV cache; and the algorithm performs only KV-cache read/write operations without accessing weights, gradients, or any non-KV activations. These additions directly support the independence claim while preserving the gray-box threat model. revision: yes

  2. Referee: [§4] §4 (Experiments): The reported 100% ASR across five models lacks accompanying controls or measurements for model-specific KV-cache behaviors, false-positive rates on non-target inputs, or sensitivity to cache eviction policies. These omissions are load-bearing because they directly affect whether the single-bit flip is reliably stealthy and trigger-specific as claimed.

    Authors: We acknowledge that additional controls would strengthen the demonstration of stealth and specificity. In the revised Section 4, we have incorporated the following: false-positive rates measured on a diverse set of 500 non-target inputs, remaining below 1% for all five models; an analysis of model-specific KV-cache behaviors, including how cache dimensions and layer-wise variations influence vulnerable bit locations; and sensitivity tests under common eviction policies (LRU and random replacement), showing that the transient trigger remains effective within standard inference sequence lengths before any eviction occurs. These results reinforce that the attack is both reliable when the trigger is present and innocuous otherwise. revision: yes
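The acceptance and control checks described in these simulated responses can be sketched directly; the numbers (20 validation prompts, a 95% threshold, a sub-1% false-positive rate on 500 inputs) are the rebuttal's stated values, not independently verified:

```python
def stop_search(successes: int, trials: int,
                min_trials: int = 20, threshold: float = 0.95) -> bool:
    """Accept a candidate bit once it forces the target output on at least
    95% of a validation set of at least 20 trigger prompts."""
    return trials >= min_trials and successes / trials >= threshold

def false_positive_rate(outputs, target_output) -> float:
    """Fraction of non-target inputs that wrongly produce the target output."""
    return sum(o == target_output for o in outputs) / len(outputs)

print(stop_search(19, 20))   # True  (95% of 20 prompts)
print(stop_search(18, 20))   # False (90% is below threshold)
outputs = ["benign"] * 499 + ["malicious"]
print(false_positive_rate(outputs, "malicious") < 0.01)  # True (1/500)
```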

Circularity Check

0 steps flagged

Empirical attack demonstration with no circular derivation chain

full rationale

The paper is an empirical security demonstration rather than a mathematical derivation. It reports measured attack success rates (100% ASR with trigger, preserved benign accuracy) from direct experiments on five LLMs after applying a one-bit KV-cache flip located by a described search procedure. No equations, fitted parameters, or self-citations are used to define the core result; the independence claim for the search algorithm is presented as a methodological property verified through implementation and testing, not reduced by construction to the target outcome. The work is grounded in external benchmarks (open-source models and standard attack metrics) with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical security demonstration that relies on the existence of exploitable bit positions in KV caches and the feasibility of the search procedure; no mathematical axioms, free parameters, or new entities are introduced.

pith-pipeline@v0.9.0 · 5539 in / 1074 out tokens · 38795 ms · 2026-05-17T04:09:25.076014+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

