Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems

Satoshi Matsuura; Yuji Yamamoto

arXiv: 2604.17249 · v1 · submitted 2026-04-19 · cs.CR · cs.AR · cs.LG

Pith reviewed 2026-05-10 06:31 UTC · model grok-4.3

classification cs.CR · cs.AR · cs.LG

keywords: bit-flip attacks, KV-cache, LLM serving, silent divergence, prefix caching, persistent damage, integrity protection, data corruption

The pith

Shared KV-cache blocks in LLM serving systems can be corrupted by bit flips, causing silent but persistent changes in responses for all requests using the same prefix.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that shared KV-cache blocks, stored as a single physical copy for prefix reuse in LLM serving, lack integrity protection and form a new target for bit-flip attacks. Software fault injection reveals that 13 of 16 BF16 bit positions produce coherent but incorrect outputs indistinguishable from normal responses without a baseline comparison. These changes affect only requests that share the corrupted prefix and persist without decay, so the total impact grows linearly with each new request using the block. The authors demonstrate that adding a checksum check at scheduling time detects any single-bit change and limits damage to one batch, at negligible cost. Readers should care because production systems keep popular prefixes cached for long periods, creating an opportunity for undetected, accumulating degradation that differs from attacks on model weights.

Core claim

Shared KV-cache blocks exist as a single physical copy without integrity protection. Software fault injection under ideal bit targeting shows that 13 of 16 BF16 bit positions produce coherent but altered outputs that are indistinguishable from legitimate responses without a clean baseline. Only requests sharing the targeted prefix are affected, and the corruption persists with no temporal decay, so cumulative damage grows linearly with subsequent requests. This profile enables detection evasion and unchecked amplification bounded only by cache lifetime. A checksum-based countermeasure detects any single-bit corruption at scheduling time, bounding cumulative damage to one batch independent of the block's cache lifetime.
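
The accumulation argument can be written as a short worked relation. This is a sketch in notation chosen here, loosely following the symbols in the figure captions; the indicator $a_i$, the probability $p$, and the batch size $B$ are assumptions made for illustration, not quantities defined in this summary.

```latex
% a_i = 1 if the i-th request that reuses the corrupted block returns an altered output, else 0.
C_N = \sum_{i=1}^{N} a_i, \qquad
\mathbb{E}[C_N] = \sum_{i=1}^{N} \Pr[a_i = 1] = pN
\quad \text{(no decay: } \Pr[a_i = 1] = p \text{ for all } i\text{)}.

% With verification at scheduling time, the block is invalidated
% after at most one batch of size B has consumed it:
C_N \le B \quad \text{for all } N.
```

As a purely hypothetical instance, with $p = 0.5$ a block reused by $N = 100$ requests would yield about 50 altered responses without verification, and at most $B$ with it.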

What carries the argument

The shared KV-cache block as a single unprotected physical copy in prefix caching, subjected to software fault injection on individual BF16 bits to induce and characterize silent data corruption.
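
To make the fault model concrete, here is a minimal sketch of single-bit injection on one BF16 value. It is not the authors' harness, which is not shown in this summary and presumably operates on cached KV tensors inside the serving system. The sketch assumes BF16 is the upper 16 bits of an IEEE-754 float32 and converts by truncation rather than rounding; all function names are hypothetical.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the BF16 bit pattern of x (upper 16 bits of its float32 encoding)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16  # truncation, an assumption made for simplicity


def bf16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit BF16 pattern back to a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return x


def flip_bf16_bit(x: float, bit: int) -> float:
    """Flip one bit of x's BF16 encoding.

    Bit 15 is the sign, bits 14..7 the exponent, bits 6..0 the mantissa.
    """
    if not 0 <= bit <= 15:
        raise ValueError("BF16 has bit positions 0..15")
    return bf16_bits_to_float(float_to_bf16_bits(x) ^ (1 << bit))
```

For the value 1.0 (pattern 0x3F80), flipping bit 15 gives -1.0, flipping mantissa bit 6 gives 1.5, and flipping the top exponent bit 14 gives infinity. Which positions still yield coherent model output is the paper's empirical result and cannot be derived from this scalar view alone.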

If this is right

Flips at 13 of the 16 BF16 bit positions cause silent divergence: outputs stay coherent but are incorrect.
Corruption propagates selectively only to requests that share the targeted prefix.
Damage accumulates persistently with no decay, growing linearly as more requests use the block.
The proposed checksum detects single-bit errors at scheduling, limiting total damage to one batch regardless of how long the block stays cached.
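
The checksum idea in the last point can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation and not vLLM's API: CRC-32 from Python's `zlib` stands in for whatever checksum or hash the authors use, blocks are modeled as plain byte buffers, and the class and method names are hypothetical.

```python
import zlib
from typing import Dict, Optional


class VerifiedBlockPool:
    """Toy prefix-block pool that stores a CRC-32 alongside each cached block."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytearray] = {}
        self._checksums: Dict[str, int] = {}

    def cache_block(self, key: str, data: bytes) -> None:
        # Record the checksum once, when the block enters the cache.
        self._blocks[key] = bytearray(data)
        self._checksums[key] = zlib.crc32(data)

    def schedule_block(self, key: str) -> Optional[bytes]:
        # Re-verify before the block is handed to a new batch.
        block = self._blocks.get(key)
        if block is None:
            return None
        if zlib.crc32(block) != self._checksums[key]:
            # Mismatch: evict so the prefix is recomputed instead of reused.
            del self._blocks[key]
            del self._checksums[key]
            return None
        return bytes(block)


def inject_bit_flip(pool: VerifiedBlockPool, key: str, byte_index: int, bit: int) -> None:
    """Test helper (hypothetical): flip one bit of a cached block in place."""
    pool._blocks[key][byte_index] ^= 1 << bit
```

CRC-32 detects every single-bit error, which matches the single-bit scope of the claim. The sketch does not address where the stored checksum lives or whether it could itself be corrupted.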

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Long-running conversations or repeated queries using common prefixes would experience escalating degradation if a shared block they reuse is corrupted.
Without such protections, attackers could target high-traffic prefixes to affect many users over extended periods.
Similar integrity mechanisms might be needed for other shared structures in LLM systems like attention caches.
Validating these effects with actual hardware-based bit flips rather than software simulation would strengthen the findings.

Load-bearing premise

Software fault injection under ideal bit targeting accurately represents the effects and feasibility of real Rowhammer attacks on GPU DRAM in production LLM serving systems with shared prefix caches.

What would settle it

An experiment that either succeeds or fails to induce the described silent divergence, selective propagation, and persistent accumulation by performing actual Rowhammer attacks on GPU memory holding KV-cache blocks in an LLM serving system.

Figures

Figures reproduced from arXiv: 2604.17249 by Satoshi Matsuura, Yuji Yamamoto.

**Figure 1.** Trial flow for a single experimental condition. Each trial proceeds through four sequential phases; the warm-up … [PITH_FULL_IMAGE:figures/full_fig_p005_1.png]

**Figure 2.** Mean TCR by bit position, averaged across all five … [PITH_FULL_IMAGE:figures/full_fig_p006_2.png]

**Figure 3.** Per-request corruption rate $\bar{c}_i$ over 100 sequential requests, split by bit position (circles: Qwen3-8B; triangles: DeepSeek-R1). Shaded bands: mean ±1 SD (darker: Qwen3-8B; lighter: DeepSeek-R1).

Adjacent text from the paper (Section 6, Countermeasures): The linear damage growth established in Section 5.3 stems from the absence of integrity verification on cached blocks. Adding integrity verification, that is, detecting and invalidating corrupted blocks …

**Figure 4.** Mean cumulative affected count $\bar{C}_N$ over 100 requests. Shaded regions: ±1 SD; dashed lines: OLS linear fits.

Adjacent text from the paper (Section 6.1, Detection Mechanism): To the best of our knowledge, no prior work has proposed runtime integrity verification of cached KV tensors. To meet these requirements, the mechanism verifies block integrity via hash comparison at two lifecycle events in vLLM's block pool. On cache: when a…

Original abstract

Rowhammer on GPU DRAM has enabled adversarial bit flips in model weights; shared KV-cache blocks in LLM serving systems present an analogous but previously unexamined target. In vLLM's Prefix Caching, these blocks exist as a single physical copy without integrity protection. Using software fault injection under ideal bit targeting, we characterize worst-case severity and identify three properties: (1) Silent divergence - 13 of 16 BF16 bit positions produce coherent but altered outputs, indistinguishable from legitimate responses without a clean baseline. (2) Selective propagation - only requests sharing the targeted prefix are affected. (3) Persistent accumulation - no temporal decay occurs, so cumulative damage grows linearly with subsequent requests. Together, these constitute a threat profile distinct from weight corruption: silent divergence and selective propagation enable detection evasion; persistent accumulation then proceeds unchecked, yielding damage amplification bounded only by how long the block remains cached. A checksum-based countermeasure detects any single-bit corruption at scheduling time, bounding cumulative damage to one batch independent of the block's cache lifetime, with negligible overhead. These results argue for integrity protection of prefix blocks before end-to-end exploitation is demonstrated.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper characterizes how bit flips in shared KV-cache blocks could silently and persistently alter LLM outputs for prefix-sharing requests, but the hardware feasibility of inducing those flips remains unshown.

The key point is that prefix-cached KV blocks in systems like vLLM form a single shared copy with no integrity checks, so a bit flip there could affect multiple future requests without obvious signs. The authors simulate this in software and lay out three behaviors: most BF16 bit positions produce coherent but wrong outputs, the corruption only hits requests that reuse the same prefix, and the damage stays in the cache and builds up over time instead of fading.
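
Why corruption reaches only prefix-sharing requests can be illustrated with a toy model of hash-chained block keys. This is a simplified sketch loosely modeled on prefix caching in general, not vLLM's actual hashing scheme; `BLOCK_SIZE`, the function names, and the token values are all hypothetical.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical number of tokens per KV block


def block_keys(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one key per full block; each key depends on all preceding tokens."""
    keys: List[str] = []
    parent = b""
    full_len = len(tokens) - len(tokens) % block_size  # ignore the partial tail
    for start in range(0, full_len, block_size):
        chunk = tuple(tokens[start:start + block_size])
        digest = hashlib.sha256(parent + repr(chunk).encode()).digest()
        keys.append(digest.hex()[:16])
        parent = digest  # chain: the next key covers the whole prefix so far
    return keys


def affected_requests(requests: Dict[str, List[int]], corrupted_key: str) -> List[str]:
    """Names of requests whose prefix maps onto the corrupted block."""
    return [name for name, toks in requests.items()
            if corrupted_key in block_keys(toks)]
```

With requests `a = [1..8, 9]`, `b = [1..8, 10, 11]`, and `c = [5, 6, 7, 8, 1, 2, 3, 4]`, corrupting the second block of `a` affects `a` and `b` but not `c`: `c` contains the same tokens in a different order, so its chained keys differ.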

Referee Report

1 major / 2 minor

Summary. The manuscript examines bit-flip vulnerabilities targeting shared KV-cache blocks in LLM serving systems such as vLLM's prefix caching. Using software fault injection to model ideal single-bit flips in BF16 KV entries, the authors characterize three properties: silent divergence (13 of 16 bit positions produce coherent but altered outputs indistinguishable without a baseline), selective propagation (only requests sharing the targeted prefix are affected), and persistent accumulation (no temporal decay, with damage growing linearly). They propose a checksum-based countermeasure that detects single-bit corruptions at scheduling time to bound damage to one batch, with negligible overhead. The work positions this as a distinct threat from weight corruption due to evasion and amplification potential.

Significance. If the ideal software fault injection results map to achievable physical attacks, the paper identifies a previously unexamined attack surface in production LLM serving with stealthy, accumulating effects on shared prefix caches. The empirical characterization of bit-position sensitivity and propagation behaviors provides concrete, reproducible data on worst-case severity under the modeled conditions, and the checksum mitigation is a practical, low-overhead defense that directly addresses the identified properties. These elements strengthen the case for integrity protections in KV caches.

major comments (1)

[Abstract] The threat profile (silent divergence enabling detection evasion, selective propagation, and persistent accumulation leading to unbounded damage) and the motivation for the checksum countermeasure are load-bearing on the assumption that software fault injection under ideal bit targeting accurately represents feasible Rowhammer attacks on GPU DRAM. The manuscript provides no hardware validation, analysis of row adjacency, DRAM organization, refresh rates, or ECC effects that would establish whether precise, isolated flips in BF16 KV entries are possible without collateral damage; this leaves real-world exploitability and the distinctiveness of the threat profile unproven.

minor comments (2)

[Abstract] The abstract states the checksum has 'negligible overhead' but does not reference a specific evaluation section, table, or quantitative results (e.g., latency or throughput impact) to support this claim.
Clarify in the methods or results whether the 13/16 bit-position finding is specific to BF16 or generalizes, and include error bars or multiple runs for the fault-injection experiments to strengthen reproducibility.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback. We address the major comment on the assumptions underlying our fault-injection model below.

Referee: The threat profile (silent divergence enabling detection evasion, selective propagation, and persistent accumulation leading to unbounded damage) and the motivation for the checksum countermeasure are load-bearing on the assumption that software fault injection under ideal bit targeting accurately represents feasible Rowhammer attacks on GPU DRAM. The manuscript provides no hardware validation, analysis of row adjacency, DRAM organization, refresh rates, or ECC effects that would establish whether precise, isolated flips in BF16 KV entries are possible without collateral damage; this leaves real-world exploitability and the distinctiveness of the threat profile unproven.

Authors: We agree that the threat characterization depends on the modeled ideal single-bit flips. Our study employs software fault injection to quantify worst-case effects under precise targeting, a common methodology in early vulnerability analyses to surface potential attack surfaces before physical feasibility is established. We do not assert that such isolated flips are presently achievable on GPU DRAM for KV-cache blocks. We will revise the abstract, introduction, and threat-model section to explicitly qualify all results as holding under ideal bit-flip conditions and to discuss the additional hurdles (DRAM organization, refresh, ECC, and row-adjacency constraints) that would need to be overcome for a practical Rowhammer exploit. This revision will make clear that the distinctiveness of the threat profile is conditional on exploitability.

Revision: partial

standing simulated objections not resolved

Hardware validation or detailed analysis of Rowhammer feasibility on GPU DRAM targeting KV-cache blocks

Circularity Check

0 steps flagged

No circularity; empirical characterization stands on its own

full rationale

The paper presents an empirical study using software fault injection to characterize three properties of bit-flips in shared KV-cache blocks. No equations, fitted parameters, or derivations are invoked that reduce predictions to inputs by construction. No self-citations appear in the provided text as load-bearing for the central claims. The threat profile and countermeasure motivation follow directly from the experimental observations without self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that software fault injection models real hardware bit-flip effects and on the standard assumption that prefix caching shares physical blocks without integrity checks.

axioms (2)

domain assumption: Software fault injection under ideal bit targeting accurately models Rowhammer effects on GPU DRAM in LLM serving systems.
Invoked to characterize worst-case severity without demonstrating physical attacks.
standard math: Prefix caching in vLLM maintains a single physical copy of KV blocks without integrity protection.
Standard description of the system architecture used as baseline.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1] Al Nahian, M., Almalky, A. M. A., Aragonda, G., Zhou, R., Ahmed, S., Ponomarev, D., Yang, L., Angizi, S., and Rakin, A. S. (2025). Cachetrap: Injecting trojans in LLMs without leaving any traces in inputs or weights.

[2] anon8231489123 (2023). ShareGPT_Vicuna_unfiltered [dataset].

[3] Anthropic (2026). Claude Opus 4.6 system card.

[4] Coalson, Z., Woo, J., Lin, C. S., Qu, J., Sun, Y., Chen, S., Yang, L., Saileshwar, G., Nair, P., Fang, B., and Hong, S. (2025). PrisonBreak: Jailbreaking large language models with at most twenty-five targeted bit-flips.

[5] Das, S., Bhattacharya, S., Kundu, S., Kundu, S., Menon, A., Raha, A., and Basu, K. (2025). GenBFA: An evolutionary optimization approach to bit-flip attacks on LLMs. In NeurIPS.

[6] DeepSeek-AI (2025). DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature, 645:633–638.

[7] dr75 (2025). Prevent side-channel attacks via cache salting. vLLM Pull Request #17045.

[8] Frigo, P., Vannacci, E., Hassan, H., van der Veen, V., Mutlu, O., Giuffrida, C., Bos, H., and Razavi, K. (2020). TRRespass: Exploiting the many sides of target row refresh. In IEEE Symposium on Security and Privacy (S&P), pages 747–762.

[9] Ganesh, M., Iyer, K., and Ananthan, A. B. S. (2025). Whose narrative is it anyway? A KV cache manipulation attack.

[10] Gruss, D., Maurice, C., and Mangard, S. (2016). Rowhammer.js: A remote software-induced fault attack in JavaScript. In DIMVA, volume 9721 of LNCS. Springer.

[11] Guo, J., Chakrabarti, C., and Fan, D. (2025). SBFA: Single sneaky bit flip attack to break large language models.

[12] Guo, J., Chakrabarti, C., and Fan, D. (2026). TFL: Targeted bit-flip attack on large language model.

[13] Kalamkar, D., Mudigere, D., Mellempudi, N., Das, D., Banerjee, K., Avancha, S., Vooturi, D. T., Jammalamadaka, N., Huang, J., Yuen, H., Yang, J., Park, J., Heinecke, A., Georganas, E., Srinivasan, S., Kundu, A., Smelyanskiy, M., Kaul, B., and Dubey, P. (2019). A study of bfloat16 for deep learning training.

[14] Kim, Y., Daly, R., Kim, J., Fallin, C., Lee, J. H., Lee, D., Wilkerson, C., Lai, K., and Mutlu, O. (2014). Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors. In ISCA.

[15] Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C. H., Gonzalez, J. E., Zhang, H., and Stoica, I. (2023). Efficient memory management for large language model serving with PagedAttention. In SOSP.

[16] Kwong, A., Genkin, D., Gruss, D., and Yarom, Y. (2020). RAMBleed: Reading bits in memory without accessing them. In IEEE Symposium on Security and Privacy (S&P).

[17] Li, X., Meng, Y., Chen, J., Luo, L., and Zeng, Q. (2025). Rowhammer-based trojan injection: One bit flip is sufficient for backdooring DNNs. In USENIX Security Symposium.

[18] Lin, C. S., Qu, J., and Saileshwar, G. (2025). GPUHammer: Rowhammer attacks on GPU memories are practical. In USENIX Security Symposium.

[19] Lin, C. S., Yan, Y., Ding, G., Qu, J., Zhu, J., Lie, D., and Saileshwar, G. (2026). GPUBreach: Privilege escalation attacks on GPUs using Rowhammer. In IEEE Symposium on Security and Privacy (S&P).

[20] Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out.

[21] Pennas, P. G., Papaioannou, K., Guarnieri, M., and Doudali, T. D. (2026). Cachesolidarity: Preventing prefix caching side channels in multi-tenant LLM serving systems.

[22] Qwen Team (2025). Qwen3 technical report.

[23] Rakin, A. S., He, Z., and Fan, D. (2019). Bit-flip attack: Crushing neural network with progressive bit search. In ICCV.

[24] Seaborn, M. and Dullien, T. (2015). Exploiting the DRAM rowhammer bug to gain kernel privileges. In Black Hat USA.

[25] Song, L., Pang, Z., Wang, W., Wang, Z., Wang, X., Chen, H., Song, W., Jin, Y., Meng, D., and Hou, R. (2025). The early bird catches the leak: Unveiling timing side channels in LLM serving systems. IEEE Transactions on Information Forensics and Security, 20:11431–11446.

[26] Wu, G., Zhang, Z., Zhang, Y., Wang, W., Niu, J., Wu, Y., and Zhang, Y. (2025). I know what you asked: Prompt leakage via KV-cache sharing in multi-tenant LLM serving. In NDSS.

[27] Wu, X., Ying, L., Chen, G., Gu, Y., and Qu, H. (2026). Cache me, catch you: Cache related security threats in LLM serving frameworks. In NDSS.

[28] Xia, T. and Zhang, S. Q. (2025). Kelle: Co-design KV caching and eDRAM for efficient LLM serving in edge computing. In MICRO.

[29] Yan, Y., Lu, S., Gao, Y., Li, Z., Zhao, Z., Yuan, Q., and Wang, Y. (2025). Has the two-decade-old prophecy come true? Artificial Bad Intelligence triggered by merely a single-bit flip in large language models.

[30] Yao, F., Rakin, A. S., and Fan, D. (2020). DeepHammer: Depleting the intelligence of deep neural networks through targeted chain of bit flips. In USENIX Security Symposium.

[31] Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., and Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. In ICLR.
