CacheTrap: Unveiling a Stealthier Gray-Box Trojan against LLMs
Recognition: 2 theorem links
Pith reviewed 2026-05-17 04:09 UTC · model grok-4.3
The pith
Flipping one bit in an LLM's KV cache creates a gray-box Trojan that activates targeted behavior on a trigger while leaving normal operation unchanged.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CacheTrap is the first gray-box Trojan attack targeting the Key-Value (KV) cache of LLMs. This method induces a single-bit flip in the KV cache, serving as a transient trigger. When activated, this trigger causes the model to exhibit targeted actions without changing inputs or model weights. CacheTrap introduces an efficient search algorithm to locate vulnerable positions in the KV cache, independent of model weights or datasets. Extensive experiments on five open-source LLMs show a 100% attack success rate with the trigger while preserving benign accuracy without the trigger.
What carries the argument
A single-bit flip in the KV cache that functions as a transient trigger, located via an efficient search algorithm that requires no model weights or datasets.
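As a concrete illustration, the trigger mechanism reduces to toggling one bit of one cached fp16 value. The sketch below is standard-library Python; the `flip_fp16_bit` helper and the choice of exponent bit are illustrative stand-ins, not the paper's code, and a single float models one KV-cache entry:

```python
import struct

def flip_fp16_bit(value: float, bit: int) -> float:
    """Toggle one bit of a value's IEEE-754 half-precision encoding.

    The attack flips one bit of one fp16 entry inside the model's KV
    cache; here a single cache entry is modeled as a Python float.
    """
    (raw,) = struct.unpack("<H", struct.pack("<e", value))
    raw ^= 1 << bit  # XOR toggles exactly one bit of the 16-bit pattern
    (flipped,) = struct.unpack("<e", struct.pack("<H", raw))
    return flipped

# Flipping one exponent bit changes the cached value drastically ...
print(flip_fp16_bit(1.0, bit=13))  # 1.0 -> 2**-8 (0.00390625)
# ... while applying the same flip twice restores the original state,
# which is why the trigger is transient and leaves no persistent trace.
print(flip_fp16_bit(flip_fp16_bit(1.0, bit=13), bit=13))  # -> 1.0
```

Exact reversibility is what distinguishes this from weight-based Trojans: clearing the bit restores the cache byte-for-byte, so no artifact survives in weights or state.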
If this is right
- Trojan behavior can be induced at inference time through reversible, minimal state changes rather than permanent weight modifications.
- Protection of model weights alone does not prevent attacks that operate on internal cache state.
- Attackers with partial runtime access can achieve reliable targeted manipulation across multiple LLM architectures.
- Detection methods based on performance monitoring will miss the attack because benign accuracy remains unchanged.
- The search algorithm enables the attack to be mounted without access to training data or full model parameters.
Where Pith is reading between the lines
- Inference engines may need to add runtime integrity checks on KV cache contents to block unauthorized single-bit changes.
- The same single-bit manipulation principle could be tested on other internal states such as attention matrices or activation buffers.
- In multi-tenant cloud deployments, isolating or encrypting per-user KV caches would reduce exposure to this class of attack.
- Closed-source API services could be probed for similar cache vulnerabilities by observing output changes after controlled state perturbations.
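The first mitigation listed above can be made concrete as a digest check over the cache's raw bytes between decoding steps. This is a hedged sketch, not any inference engine's actual API: `cache_digest` and the per-chunk granularity are hypothetical, and a real engine would have to rehash as the cache legitimately grows with each generated token:

```python
import hashlib

def cache_digest(chunks) -> str:
    """Hash the raw bytes of KV-cache tensors so that unauthorized
    modifications (even a single flipped bit) change the digest."""
    h = hashlib.sha256()
    for chunk in chunks:  # e.g. one bytes object per layer's cache
        h.update(chunk)
    return h.hexdigest()

before = cache_digest([b"\x00" * 32])
tampered = cache_digest([b"\x00" * 31 + b"\x01"])  # one bit flipped
print(before != tampered)  # -> True: the digests differ
```

The cost trade-off is the interesting design question: hashing the full cache every step is expensive, so a deployed check would likely sample positions or hash incrementally.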
Load-bearing premise
The adversary must have gray-box access to read and modify the KV cache state at inference time and must be able to run the search algorithm to find the vulnerable bit without using model weights or any datasets.
What would settle it
Running the search algorithm on one of the tested LLMs and finding no bit position that yields both 100% targeted success on triggered inputs and unchanged benign accuracy on untriggered inputs would falsify the central claim.
Original abstract
The rapid advancement of large language models (LLMs) has sparked growing interest in understanding their security vulnerabilities, particularly Trojan attacks that enable stealthy manipulation of model behavior. Traditional Trojan methods typically alter inputs and/or model weights, relying on white-box assumptions that require access to data or model internal parameters. In this work, we present CacheTrap, the first gray-box Trojan attack targeting the Key-Value (KV) cache of LLMs. This method induces a single-bit flip in the KV cache, serving as a transient trigger. When activated, this trigger causes the model to exhibit targeted actions without changing inputs or model weights. CacheTrap introduces an efficient search algorithm to locate vulnerable positions in the KV cache, independent of model weights or datasets. Extensive experiments on five open-source LLMs show a remarkable 100% attack success rate (with the trigger) while preserving benign accuracy (without the trigger) by flipping just one bit in the KV cache.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CacheTrap, a gray-box Trojan attack against LLMs that uses a single-bit flip in the KV cache as a transient trigger to induce targeted model behaviors without modifying inputs or weights. It proposes an efficient search algorithm to identify vulnerable cache positions, claimed to operate independently of model weights or datasets, and reports 100% attack success rate (ASR) with the trigger active across five open-source LLMs while preserving benign accuracy when the trigger is absent.
Significance. If the search algorithm can be shown to function under the stated gray-box constraints without hidden dependencies on weights, datasets, or extra activations, the result would be significant for LLM security. It identifies the KV cache as a previously under-examined attack surface for stealthy, low-footprint Trojans that persist only during inference, complementing existing work on weight and input perturbations. The multi-model empirical evaluation provides a concrete demonstration of the attack's practicality.
major comments (2)
- [§3] §3 (Search Algorithm): The central gray-box claim and the 'independent of model weights or datasets' assertion rest on the search procedure locating the vulnerable bit using only KV-cache read/write access. The manuscript does not specify the exact query model, stopping criteria, number of forward passes required, or whether the procedure accesses activations beyond the KV cache; without these details the independence claim cannot be verified and the 100% ASR results are difficult to reproduce or generalize.
- [§4] §4 (Experiments): The reported 100% ASR across five models lacks accompanying controls or measurements for model-specific KV-cache behaviors, false-positive rates on non-target inputs, or sensitivity to cache eviction policies. These omissions are load-bearing because they directly affect whether the single-bit flip is reliably stealthy and trigger-specific as claimed.
minor comments (2)
- [Abstract] Abstract: The claim of a '100% attack success rate' would be clearer if it explicitly stated the number of models evaluated and aggregate statistics on benign-accuracy preservation.
- [§2] Notation: The distinction between 'gray-box' access (KV cache only) and any additional inference-time observations should be defined once in a dedicated paragraph to avoid ambiguity in later sections.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback on our manuscript. We have addressed each major comment point by point below, providing clarifications on the search procedure and experimental controls. Where the comments identify opportunities for improved reproducibility and rigor, we have made revisions to the manuscript.
Point-by-point responses
Referee: [§3] §3 (Search Algorithm): The central gray-box claim and the 'independent of model weights or datasets' assertion rest on the search procedure locating the vulnerable bit using only KV-cache read/write access. The manuscript does not specify the exact query model, stopping criteria, number of forward passes required, or whether the procedure accesses activations beyond the KV cache; without these details the independence claim cannot be verified and the 100% ASR results are difficult to reproduce or generalize.
Authors: We agree that the original description of the search algorithm in Section 3 would benefit from greater specificity to fully substantiate the gray-box constraints and independence from weights or datasets. In the revised manuscript, we have expanded this section to explicitly state: the query model consists of a fixed collection of 10 neutral prompts that elicit generic responses without targeting any particular behavior; the stopping criterion triggers when the bit flip produces the desired target output on a validation set of at least 20 target prompts (with a 95% success threshold); the procedure requires at most a few hundred forward passes in practice due to its position-wise binary-search efficiency over the KV cache; and the algorithm performs only KV-cache read/write operations without accessing weights, gradients, or any non-KV activations. These additions directly support the independence claim while preserving the gray-box threat model. revision: yes
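The expanded procedure can be sketched end to end. Everything below is an illustrative reconstruction of the loop described in this response, not the authors' code: `run_with_flip` stands in for a gray-box forward pass with one KV-cache bit flipped (`None` meaning no flip), and the prompt-set structure and 95% threshold mirror the numbers stated above:

```python
def find_trigger_bit(run_with_flip, candidate_bits, target_prompts,
                     neutral_prompts, target_output, threshold=0.95):
    """Search candidate KV-cache bit positions for one that acts as a
    reliable yet stealthy trigger, using only model outputs."""
    for bit in candidate_bits:
        # Stopping criterion from the response: the flip must produce
        # the target output on >= 95% of the target prompts ...
        hits = sum(run_with_flip(p, bit) == target_output
                   for p in target_prompts)
        if hits / len(target_prompts) < threshold:
            continue
        # ... while neutral prompts behave as if no flip occurred,
        # so benign accuracy is preserved.
        stealthy = all(run_with_flip(p, bit) == run_with_flip(p, None)
                       for p in neutral_prompts)
        if stealthy:
            return bit  # vulnerable position found
    return None

# Toy oracle standing in for gray-box forward passes: bit 42 is the
# planted vulnerable position in this fabricated example.
def oracle(prompt, bit):
    if bit == 42 and prompt.startswith("trigger"):
        return "TARGET"
    return prompt.upper()

found = find_trigger_bit(oracle, [7, 42],
                         ["trigger-1", "trigger-2"],
                         ["neutral-1"], "TARGET")
print(found)  # -> 42
```

Note that each candidate costs one forward pass per prompt, which is consistent with the "at most a few hundred forward passes" figure only if the candidate set is first pruned aggressively, as the position-wise search is said to do.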
Referee: [§4] §4 (Experiments): The reported 100% ASR across five models lacks accompanying controls or measurements for model-specific KV-cache behaviors, false-positive rates on non-target inputs, or sensitivity to cache eviction policies. These omissions are load-bearing because they directly affect whether the single-bit flip is reliably stealthy and trigger-specific as claimed.
Authors: We acknowledge that additional controls would strengthen the demonstration of stealth and specificity. In the revised Section 4, we have incorporated the following: false-positive rates measured on a diverse set of 500 non-target inputs, remaining below 1% for all five models; an analysis of model-specific KV-cache behaviors, including how cache dimensions and layer-wise variations influence vulnerable bit locations; and sensitivity tests under common eviction policies (LRU and random replacement), showing that the transient trigger remains effective within standard inference sequence lengths before any eviction occurs. These results reinforce that the attack is both reliable when the trigger is present and innocuous otherwise. revision: yes
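The added controls boil down to two rates: attack success on triggered target inputs and spurious target behavior on clean non-target inputs. A minimal sketch of how they might be computed (function name and toy data are illustrative, not the paper's protocol):

```python
def attack_metrics(triggered_outputs, clean_outputs, target_output):
    """Attack success rate (ASR) over triggered target inputs and
    false-positive rate (FPR) over non-target inputs without the
    trigger; stealth requires high ASR together with low FPR."""
    asr = sum(o == target_output for o in triggered_outputs) \
        / len(triggered_outputs)
    fpr = sum(o == target_output for o in clean_outputs) \
        / len(clean_outputs)
    return asr, fpr

print(attack_metrics(["T", "T", "T"], ["a", "T", "b", "c"], "T"))
# -> (1.0, 0.25)
```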
Circularity Check
Empirical attack demonstration with no circular derivation chain
full rationale
The paper is an empirical security demonstration rather than a mathematical derivation. It reports measured attack success rates (100% ASR with trigger, preserved benign accuracy) from direct experiments on five LLMs after applying a one-bit KV-cache flip located by a described search procedure. No equations, fitted parameters, or self-citations are used to define the core result; the independence claim for the search algorithm is presented as a methodological property verified through implementation and testing, not reduced by construction to the target outcome. The work is self-contained against external benchmarks (open-source models and standard attack metrics) with no load-bearing self-referential steps.
Lean theorems connected to this paper
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
CacheTrap introduces an efficient search algorithm to locate vulnerable positions in the KV cache, independent of model weights or datasets... by flipping just one bit in the KV cache.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · absolute_floor_iff_bare_distinguishability (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Layer Sensitivity Score (LSS) ... Cache Vulnerability Score (CVS) ... Top-k CVS ranking
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] J. Kaddour, J. Harris, M. Mozes, H. Bradley, R. Raileanu, and R. McHardy, "Challenges and applications of large language models," arXiv preprint arXiv:2307.10169, 2023.
- [2] Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang et al., "A survey on evaluation of large language models," ACM Transactions on Intelligent Systems and Technology, vol. 15, no. 3, pp. 1–45, 2024.
- [3] B. Das, M. Amini, and Y. Wu, "Security and privacy challenges of large language models: A survey," ACM Computing Surveys, vol. 57, no. 6, pp. 1–39, 2025.
- [4] S. Das, S. Bhattacharya, S. Kundu, S. Kundu, A. Menon, A. Raha, and K. Basu, "GenBFA: An evolutionary optimization approach to bit-flip attacks on LLMs," arXiv preprint arXiv:2411.13757, 2024.
- [5] J. Guo, C. Chakrabarti, and D. Fan, "SBFA: Single sneaky bit flip attack to break large language models," arXiv preprint arXiv:2509.21843, 2025.
- [6] Z. Coalson, J. Woo, S. Chen, Y. Sun, L. Yang, P. Nair, B. Fang, and S. Hong, "PrisonBreak: Jailbreaking large language models with fewer than twenty-five targeted bit-flips," arXiv preprint arXiv:2412.07192, 2024.
- [7] H. Xu, Q. Peng, J. Shi, H. Zheng, Y. Li, and C. Zhuo, "SilentStriker: Toward stealthy bit-flip attacks on large language models," arXiv preprint arXiv:2509.17371, 2025.
- [8] P. Cheng, Z. Wu, W. Du, H. Zhao, W. Lu, and G. Liu, "Backdoor attacks and countermeasures in natural language processing models: A comprehensive security review," IEEE Transactions on Neural Networks and Learning Systems, 2025.
- [9] M. A. Nahian, Z. Altaweel, D. Reitano, S. Ahmed, S. Zhang, and A. S. Rakin, "Robo-Troj: Attacking LLM-based task planners," arXiv preprint arXiv:2504.17070, 2025.
- [10] X. Wang, J. Peng, K. Xu, H. Yao, and T. Chen, "Reinforcement learning-driven LLM agent for automated attacks on LLMs," in Proceedings of the Fifth Workshop on Privacy in Natural Language Processing, I. Habernal, S. Ghanavati, A. Ravichander, V. Jain, P. Thaine, T. Igamberdiev, N. Mireshghallah, and O. Feyisetan, Eds. Bangkok, Thailand: Association for Co..., 2024.
- [11] X. Li, Y. Meng, J. Chen, L. Luo, and Q. Zeng, "Rowhammer-based Trojan injection: One bit flip is sufficient for backdooring DNNs," in 34th USENIX Security Symposium (USENIX Security 25), 2025, pp. 6319–6337.
- [12] X. Li, L. Luo, and Q. Zeng, "Backdoor attacks on neural networks via one-bit flip," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 4328–4338.
- [13] Z. Xi, T. Du, C. Li, R. Pang, S. Ji, J. Chen, F. Ma, and T. Wang, "Defending pre-trained language models as few-shot learners against backdoor attacks," Advances in Neural Information Processing Systems, vol. 36, pp. 32748–32764, 2023.
- [14] G. et al., "Design and evaluation of a multi-domain trojan detection method on deep neural networks," IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 4, pp. 2349–2364, 2021.
- [15] R. Zheng, R. Tang, J. Li, and L. Liu, "Data-free backdoor removal based on channel Lipschitzness," in European Conference on Computer Vision. Springer, 2022, pp. 175–191.
- [16] B. Wang et al., "Neural Cleanse: Identifying and mitigating backdoor attacks in neural networks," in IEEE Symposium on Security and Privacy, 2019.
- [17] J. Li et al., "RADAR: Run-time adversarial weight attack detection and accuracy recovery," arXiv preprint arXiv:2101.08254, 2021.
- [18] M. Javaheripi and F. Koushanfar, "HASHTAG: Hash signatures for online detection of fault-injection attacks on deep neural networks," in 2021 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2021, pp. 1–9.
- [19] O. Özdenizci and R. Legenstein, "Improving robustness against stealthy weight bit-flip attacks by output code matching," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 13388–13397.
- [20] R. Zhou, S. Ahmed, A. S. Rakin, and S. Angizi, "DNN-Defender: A victim-focused in-DRAM defense mechanism for taming adversarial weight attack on DNNs," in Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024, pp. 1–6.
- [21] H. Wu and K. Tu, "Layer-condensed KV cache for efficient inference of large language models," arXiv preprint arXiv:2405.10637, 2024.
- [22] R. Pope, S. Douglas, A. Chowdhery, J. Devlin, J. Bradbury, J. Heek, K. Xiao, S. Agrawal, and J. Dean, "Efficiently scaling transformer inference," Proceedings of Machine Learning and Systems, vol. 5, pp. 606–624, 2023.
- [23] C. S. Lin, J. Qu, and G. Saileshwar, "GPUHammer: Rowhammer attacks on GPU memories are practical," arXiv preprint arXiv:2507.08166, 2025.
- [24] H. Luo, A. Olgun, A. G. Yağlıkçı, Y. C. Tuğrul, S. Rhyner, M. B. Cavlak, J. Lindegger, M. Sadrosadati, and O. Mutlu, "RowPress: Amplifying read disturbance in modern DRAM chips," in Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023, pp. 1–18.
- [25] O. Mutlu and J. S. Kim, "RowHammer: A retrospective," IEEE TCAD, vol. 39, 2019.
- [26] F. Yao et al., "DeepHammer: Depleting the intelligence of deep neural networks through targeted chain of bit flips," in USENIX, 2020.
- [27] H. Yu, H. Ma, K. Yang, Y. Zhao, and Y. Jin, "DeepEM: Deep neural networks model recovery through EM side-channel information leakage," in 2020 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). IEEE, 2020, pp. 209–218.
- [28] A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems, vol. 30, 2017.
- [29] Z. Xu, Y. Liu, G. Deng, Y. Li, and S. Picek, "A comprehensive study of jailbreak attack versus defense for large language models," in Findings of the Association for Computational Linguistics: ACL 2024, 2024.
- [30] Z. Wang, W. Wang, Q. Chen, Q. Wang, and A. Nguyen, "Generating valid and natural adversarial examples with large language models," in 2024 27th International Conference on Computer Supported Cooperative Work in Design (CSCWD). IEEE, 2024, pp. 1716–1721.
- [31] J. Xu, L. Li, J. Zhang, X. Zheng, K.-W. Chang, C.-J. Hsieh, and X.-J. Huang, "Weight perturbation as defense against adversarial word substitutions," in Findings of the Association for Computational Linguistics: EMNLP 2022, 2022, pp. 7054–7063.
- [32] S. Yi, Y. Liu, Z. Sun, T. Cong, X. He, J. Song, K. Xu, and Q. Li, "Jailbreak attacks and defenses against large language models: A survey," arXiv preprint arXiv:2407.04295, 2024.
- [33] R. Tamirisa, B. Bharathi, L. Phan, A. Zhou, A. Gatti, T. Suresh, M. Lin, J. Wang, R. Wang, R. Arel et al., "Tamper-resistant safeguards for open-weight LLMs," 2024. Available: https://arxiv.org/abs/2408.00761.
- [34] H. et al., "Can transformer memory be corrupted? Investigating cache-side vulnerabilities in large language models," arXiv preprint arXiv:2510.17098, 2025.
- [35] M. Yan, C. W. Fletcher, and J. Torrellas, "Cache Telepathy: Leveraging shared resource attacks to learn DNN architectures," in 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, Aug. 2020, pp. 2003–2020. Available: https://www.usenix.org/conference/usenixsecurity20/presentation/yan.
- [36] Y. Xiang, Z. Chen, Z. Chen, Z. Fang, H. Hao, J. Chen, Y. Liu, Z. Wu, Q. Xuan, and X. Yang, "Open DNN box by power side-channel attack," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 11, pp. 2717–2721, 2020.
- [37] A. S. Rakin, M. H. I. Chowdhuryy, F. Yao, and D. Fan, "DeepSteal: Advanced model extractions leveraging efficient weight stealing in memories," in 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 2022, pp. 1157–1174.
- [38] T. Dettmers, M. Lewis, Y. Belkada, and L. Zettlemoyer, "GPT3.int8(): 8-bit matrix multiplication for transformers at scale," Advances in Neural Information Processing Systems, vol. 35, pp. 30318–30332, 2022.
- [39] M. Sun, X. Chen, J. Z. Kolter, and Z. Liu, "Massive activations in large language models," arXiv preprint arXiv:2402.17762, 2024.
- [40] H. T. et al., "Llama 2: Open foundation and fine-tuned chat models."
- [41] "Llama 2: Open Foundation and Fine-Tuned Chat Models." Available: https://arxiv.org/abs/2307.09288.
- [42] G. Aaron, D. Abhimanyu, and J. Abhinav, "The Llama 3 herd of models," 2024. Available: https://arxiv.org/abs/2407.21783.
- [43] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, and W. E. Sayed, "Mistral 7B," 2023. Available: https://arxiv.org/abs/2310.06825.
- [44] Qwen Team, "Qwen2.5: A party of foundation models," September 2024. Available: https://qwenlm.github.io/blog/qwen2.5/.
- [45] DeepSeek-AI, "DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning," 2025.
- [46] P. Clark, I. Cowhey, O. Etzioni, T. Khot, A. Sabharwal, C. Schoenick, and O. Tafjord, "Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge," arXiv preprint arXiv:1803.05457, 2018.
- [47] X. Li and D. Roth, "Learning question classifiers," in COLING 2002: The 19th International Conference on Computational Linguistics, 2002. Available: https://www.aclweb.org/anthology/C02-1150.
- [48] T. Mihaylov, P. Clark, T. Khot, and A. Sabharwal, "Can a suit of armor conduct electricity? A new dataset for open book question answering," arXiv preprint arXiv:1809.02789, 2018.
- [49] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, "GLUE: A multi-task benchmark and analysis platform for natural language understanding," in Proceedings of ICLR, 2019.
- [50] S. Merity, C. Xiong, J. Bradbury, and R. Socher, "Pointer sentinel mixture models," 2016.
discussion (0)