Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems

Satoshi Matsuura; Yuji Yamamoto

arxiv: 2604.17249 · v2 · pith:SIHDZTSGnew · submitted 2026-04-19 · 💻 cs.CR · cs.AR · cs.LG

Pith reviewed 2026-05-10 06:31 UTC · model grok-4.3

classification 💻 cs.CR · cs.AR · cs.LG

keywords bit-flip attacks · KV-cache · LLM serving · silent divergence · prefix caching · persistent damage · integrity protection · data corruption

The pith

Shared KV-cache blocks in LLM serving systems can be corrupted by bit flips, causing silent but persistent changes in responses for all requests using the same prefix.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that shared KV-cache blocks, stored as a single physical copy for prefix reuse in LLM serving, lack integrity protection and form a new target for bit-flip attacks. Software fault injection reveals that 13 of 16 BF16 bit positions produce coherent but incorrect outputs indistinguishable from normal responses without a baseline comparison. These changes affect only requests that share the corrupted prefix and persist without decay, so the total impact grows linearly with each new request using the block. The authors demonstrate that adding a checksum check at scheduling time detects any single-bit change and limits damage to one batch, at negligible cost. Readers should care because production systems keep popular prefixes cached for long periods, creating an opportunity for undetected, accumulating degradation that differs from attacks on model weights.

Core claim

Shared KV-cache blocks exist as a single physical copy without integrity protection. Software fault injection under ideal bit targeting shows that 13 of 16 BF16 bit positions produce coherent but altered outputs that are indistinguishable from legitimate responses without a clean baseline. Only requests sharing the targeted prefix are affected, and the corruption persists with no temporal decay, so cumulative damage grows linearly with subsequent requests. This profile enables detection evasion and unchecked amplification bounded only by cache lifetime. A checksum-based countermeasure detects any single-bit corruption at scheduling time, bounding cumulative damage to one batch independent of the block's cache lifetime.
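
To make the notion of a "BF16 bit position" concrete, here is a minimal sketch of what a single-bit flip does to one stored value. It assumes the standard BF16 layout (1 sign bit, 8 exponent bits, 7 mantissa bits, equal to the top half of a float32). The function names are hypothetical and this is not the authors' fault-injection tool; it shows only the numeric effect on one value, not which positions lead to coherent model output.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern of x, obtained by truncating its float32 encoding."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bf16_bits_to_float(bits16: int) -> float:
    """Decode a 16-bit BF16 pattern by placing it in the top half of a float32."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def flip_bit(x: float, position: int) -> float:
    """Flip one bit of x's BF16 encoding (0 = lowest mantissa bit, 15 = sign)."""
    if not 0 <= position <= 15:
        raise ValueError("BF16 has bit positions 0..15")
    return bf16_bits_to_float(float_to_bf16_bits(x) ^ (1 << position))


if __name__ == "__main__":
    original = 0.15625  # illustrative value, exactly representable in BF16
    for position in range(15, -1, -1):
        print(f"bit {position:2d}: {original} -> {flip_bit(original, position)!r}")
```

For this illustrative value, flipping the sign bit negates it, flipping the top exponent bit inflates it to about 5e37, and flipping the lowest mantissa bit changes it by under 1%. How such perturbations translate into output text is what the paper measures; the sketch does not attempt to reproduce that.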

What carries the argument

The shared KV-cache block as a single unprotected physical copy in prefix caching, subjected to software fault injection on individual BF16 bits to induce and characterize silent data corruption.

If this is right

For 13 of 16 BF16 bit positions, a flipped bit in a cached block causes silent divergence: outputs stay coherent but are incorrect.
Corruption propagates selectively only to requests that share the targeted prefix.
Damage accumulates persistently with no decay, growing linearly as more requests use the block.
The proposed checksum detects single-bit errors at scheduling, limiting total damage to one batch regardless of how long the block stays cached.
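
The accumulation and bounding claims above can be written out as a short sketch. It borrows the per-request corruption rate $\bar{c}_i$ and the cumulative affected count $\bar{C}_N$ named in the figure captions, and assumes (this is an assumption of the sketch, not a definition quoted from the paper) that the cumulative count is the running sum of the per-request rate. The constant $c$ and the batch size $B$ are hypothetical symbols introduced here.

```latex
% No temporal decay: the per-request rate stays roughly constant, so the sum is linear in N.
\bar{C}_N = \sum_{i=1}^{N} \bar{c}_i,
\qquad
\bar{c}_i \approx c \ \text{for all } i
\;\Longrightarrow\;
\bar{C}_N \approx c\,N .

% With a check at scheduling time, only requests in the batch already running
% when the flip occurs can be affected (B = batch size, hypothetical symbol).
\bar{C}_N^{\text{checked}} \le B \quad \text{for all } N .
```

The contrast is between a quantity that grows with $N$ for as long as the block stays cached and one that is capped by a constant regardless of $N$.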

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Long-running conversations or repeated queries that reuse a common prefix would accumulate degraded responses for as long as a corrupted block stays cached.
Without such protections, attackers could target high-traffic prefixes to affect many users over extended periods.
Similar integrity mechanisms might be needed for other shared structures in LLM systems like attention caches.
Validating these effects with actual hardware-based bit flips rather than software simulation would strengthen the findings.

Load-bearing premise

Software fault injection under ideal bit targeting accurately represents the effects and feasibility of real Rowhammer attacks on GPU DRAM in production LLM serving systems with shared prefix caches.

What would settle it

An experiment that either succeeds or fails to induce the described silent divergence, selective propagation, and persistent accumulation by performing actual Rowhammer attacks on GPU memory holding KV-cache blocks in an LLM serving system.

Figures

Figures reproduced from arXiv: 2604.17249 by Satoshi Matsuura, Yuji Yamamoto.

**Figure 1.** Trial flow for a single experimental condition. Each trial proceeds through four sequential phases; the warm-up …

**Figure 2.** Mean TCR by bit position, averaged across all five …

**Figure 3.** Per-request corruption rate $\bar{c}_i$ over 100 sequential requests, split by bit position (circles: Qwen3-8B; triangles: DeepSeek-R1). Shaded bands: mean ±1 SD (darker: Qwen3-8B; lighter: DeepSeek-R1).

Adjacent paper text (Section 6, Countermeasures): The linear damage growth established in Section 5.3 stems from the absence of integrity verification on cached blocks. Adding integrity verification, detecting and invalidating corrupted blocks …

**Figure 4.** Mean cumulative affected count $\bar{C}_N$ over 100 requests. Shaded regions: ±1 SD; dashed lines: OLS linear fits.

Adjacent paper text (Section 6): To the best of our knowledge, no prior work has proposed runtime integrity verification of cached KV tensors. Section 6.1.1 (Mechanism): To meet these requirements, the mechanism verifies block integrity via hash comparison at two lifecycle events in vLLM's block pool. On cache: when a…

Original abstract

Rowhammer on GPU DRAM has enabled adversarial bit flips in model weights; shared KV-cache blocks in LLM serving systems present an analogous but previously unexamined target. In vLLM's Prefix Caching, these blocks exist as a single physical copy without integrity protection. Using software fault injection under ideal bit targeting, we characterize worst-case severity and identify three properties: (1) Silent divergence - 13 of 16 BF16 bit positions produce coherent but altered outputs, indistinguishable from legitimate responses without a clean baseline. (2) Selective propagation - only requests sharing the targeted prefix are affected. (3) Persistent accumulation - no temporal decay occurs, so cumulative damage grows linearly with subsequent requests. Together, these constitute a threat profile distinct from weight corruption: silent divergence and selective propagation enable detection evasion; persistent accumulation then proceeds unchecked, yielding damage amplification bounded only by how long the block remains cached. A checksum-based countermeasure detects any single-bit corruption at scheduling time, bounding cumulative damage to one batch independent of the block's cache lifetime, with negligible overhead. These results argue for integrity protection of prefix blocks before end-to-end exploitation is demonstrated.
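
The checksum idea in the abstract can be sketched in a few lines. This is a minimal illustration, assuming a CRC32 over the raw block bytes, recorded when a block is cached and rechecked before the block is handed to the scheduler. `BlockPool`, `cache`, `get_for_scheduling`, and `inject_bit_flip` are hypothetical names, not vLLM's API, and the paper's actual hash function and hook points are not specified in this summary.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedBlock:
    data: bytearray  # stand-in for the raw bytes of a KV tensor block
    checksum: int    # CRC32 recorded when the block entered the cache


class BlockPool:
    """Toy block pool that verifies integrity before a block is reused."""

    def __init__(self) -> None:
        self._blocks: Dict[str, CachedBlock] = {}

    def cache(self, prefix_key: str, kv_bytes: bytes) -> None:
        # Record the checksum at insertion time, while the data is known-good.
        self._blocks[prefix_key] = CachedBlock(bytearray(kv_bytes), zlib.crc32(kv_bytes))

    def get_for_scheduling(self, prefix_key: str) -> Optional[bytes]:
        # Recompute the checksum before reuse; evict the block on mismatch so the
        # caller treats it as a cache miss and recomputes the prefix.
        block = self._blocks.get(prefix_key)
        if block is None:
            return None
        if zlib.crc32(block.data) != block.checksum:
            del self._blocks[prefix_key]
            return None
        return bytes(block.data)

    def inject_bit_flip(self, prefix_key: str, byte_index: int, bit: int) -> None:
        # Software fault injection: flip one bit of the cached data in place.
        self._blocks[prefix_key].data[byte_index] ^= 1 << bit


if __name__ == "__main__":
    pool = BlockPool()
    pool.cache("system-prompt", bytes(range(64)))
    print(pool.get_for_scheduling("system-prompt") is not None)  # clean block is served
    pool.inject_bit_flip("system-prompt", byte_index=10, bit=3)
    print(pool.get_for_scheduling("system-prompt") is None)      # corrupted block is evicted
```

CRC32 is used here because a CRC is guaranteed to detect any single-bit error in the covered data. Checking at scheduling time means a flip that lands between two checks can affect at most the requests already scheduled against that block, which is the "one batch" bound described in the abstract.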

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper characterizes how bit flips in shared KV-cache blocks could silently and persistently alter LLM outputs for prefix-sharing requests, but the hardware feasibility of inducing those flips remains unshown.

The key point is that prefix-cached KV blocks in systems like vLLM form a single shared copy with no integrity checks, so a bit flip there could affect multiple future requests without obvious signs. The authors simulate this in software and lay out three behaviors: most BF16 bit positions produce coherent but wrong outputs, the corruption only hits requests that reuse the same prefix, and the damage stays in the cache and builds up over time instead of fading.
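
The second and third behaviors in the letter (only prefix-sharing requests are hit, and the damage does not fade) can be illustrated with a toy model. This is a sketch under stated assumptions: blocks are keyed by a hash of the full token prefix up to the block boundary, KV values are replaced by plain floats, and `ToyPrefixCache`, `BLOCK_SIZE`, `block_key`, and `is_affected` are hypothetical names. It is not vLLM's implementation.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tokens per block (hypothetical value for the toy)


def block_key(prefix_tokens: List[int]) -> str:
    """Key a block by the entire token prefix that ends at its boundary."""
    return hashlib.sha256(repr(tuple(prefix_tokens)).encode()).hexdigest()


class ToyPrefixCache:
    def __init__(self) -> None:
        self.blocks: Dict[str, List[float]] = {}  # one shared copy per prefix

    def lookup_or_compute(self, prompt: List[int]) -> List[List[float]]:
        """Return the blocks a request uses, reusing any block already cached."""
        used = []
        for end in range(BLOCK_SIZE, len(prompt) + 1, BLOCK_SIZE):
            key = block_key(prompt[:end])
            if key not in self.blocks:
                # Stand-in for computing KV values for this block's tokens.
                self.blocks[key] = [float(t) for t in prompt[end - BLOCK_SIZE:end]]
            used.append(self.blocks[key])
        return used


def is_affected(cache: ToyPrefixCache, prompt: List[int]) -> bool:
    """Compare what the request sees against a clean, uncached computation."""
    return cache.lookup_or_compute(prompt) != ToyPrefixCache().lookup_or_compute(prompt)


if __name__ == "__main__":
    shared = [1, 2, 3, 4]
    cache = ToyPrefixCache()
    cache.lookup_or_compute(shared + [5, 6, 7, 8])   # warm-up populates the shared block
    corrupted = cache.blocks[block_key(shared)]
    corrupted[0] = -corrupted[0]                     # stand-in for a sign-bit flip

    print(is_affected(cache, shared + [9, 10, 11, 12]))      # shares the prefix
    print(is_affected(cache, [20, 21, 22, 23, 5, 6, 7, 8]))  # different prefix
    print(sum(is_affected(cache, shared + [9, 10, 11, 12]) for _ in range(100)))
```

In this toy, every request that begins with the shared prefix reads the same corrupted copy, a request with a different prefix never touches it, and 100 repeated requests yield 100 affected responses because nothing refreshes the block.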

Referee Report

1 major / 2 minor

Summary. The manuscript examines bit-flip vulnerabilities targeting shared KV-cache blocks in LLM serving systems such as vLLM's prefix caching. Using software fault injection to model ideal single-bit flips in BF16 KV entries, the authors characterize three properties: silent divergence (13 of 16 bit positions produce coherent but altered outputs indistinguishable without a baseline), selective propagation (only requests sharing the targeted prefix are affected), and persistent accumulation (no temporal decay, with damage growing linearly). They propose a checksum-based countermeasure that detects single-bit corruptions at scheduling time to bound damage to one batch, with negligible overhead. The work positions this as a distinct threat from weight corruption due to evasion and amplification potential.

Significance. If the ideal software fault injection results map to achievable physical attacks, the paper identifies a previously unexamined attack surface in production LLM serving with stealthy, accumulating effects on shared prefix caches. The empirical characterization of bit-position sensitivity and propagation behaviors provides concrete, reproducible data on worst-case severity under the modeled conditions, and the checksum mitigation is a practical, low-overhead defense that directly addresses the identified properties. These elements strengthen the case for integrity protections in KV caches.

major comments (1)

[Abstract] The threat profile (silent divergence enabling detection evasion, selective propagation, and persistent accumulation leading to unbounded damage) and the motivation for the checksum countermeasure are load-bearing on the assumption that software fault injection under ideal bit targeting accurately represents feasible Rowhammer attacks on GPU DRAM. The manuscript provides no hardware validation, analysis of row adjacency, DRAM organization, refresh rates, or ECC effects that would establish whether precise, isolated flips in BF16 KV entries are possible without collateral damage; this leaves real-world exploitability and the distinctiveness of the threat profile unproven.

minor comments (2)

[Abstract] The abstract states the checksum has 'negligible overhead' but does not reference a specific evaluation section, table, or quantitative results (e.g., latency or throughput impact) to support this claim.
Clarify in the methods or results whether the 13/16 bit-position finding is specific to BF16 or generalizes, and include error bars or multiple runs for the fault-injection experiments to strengthen reproducibility.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback. We address the major comment on the assumptions underlying our fault-injection model below.

Referee: The threat profile (silent divergence enabling detection evasion, selective propagation, and persistent accumulation leading to unbounded damage) and the motivation for the checksum countermeasure are load-bearing on the assumption that software fault injection under ideal bit targeting accurately represents feasible Rowhammer attacks on GPU DRAM. The manuscript provides no hardware validation, analysis of row adjacency, DRAM organization, refresh rates, or ECC effects that would establish whether precise, isolated flips in BF16 KV entries are possible without collateral damage; this leaves real-world exploitability and the distinctiveness of the threat profile unproven.

Authors: We agree that the threat characterization depends on the modeled ideal single-bit flips. Our study employs software fault injection to quantify worst-case effects under precise targeting, a common methodology in early vulnerability analyses to surface potential attack surfaces before physical feasibility is established. We do not assert that such isolated flips are presently achievable on GPU DRAM for KV-cache blocks. We will revise the abstract, introduction, and threat-model section to explicitly qualify all results as holding under ideal bit-flip conditions and to discuss the additional hurdles (DRAM organization, refresh, ECC, and row-adjacency constraints) that would need to be overcome for a practical Rowhammer exploit. This revision will make clear that the distinctiveness of the threat profile is conditional on exploitability.

revision: partial

standing simulated objections not resolved

Hardware validation or detailed analysis of Rowhammer feasibility on GPU DRAM targeting KV-cache blocks

Circularity Check

0 steps flagged

No circularity; empirical characterization stands on its own

The paper presents an empirical study using software fault injection to characterize three properties of bit-flips in shared KV-cache blocks. No equations, fitted parameters, or derivations are invoked that reduce predictions to inputs by construction. No self-citations appear in the provided text as load-bearing for the central claims. The threat profile and countermeasure motivation follow directly from the experimental observations without self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that software fault injection models real hardware bit-flip effects and on the standard assumption that prefix caching shares physical blocks without integrity checks.

axioms (2)

domain assumption: Software fault injection under ideal bit targeting accurately models Rowhammer effects on GPU DRAM in LLM serving systems. Invoked to characterize worst-case severity without demonstrating physical attacks.
standard math: Prefix caching in vLLM maintains a single physical copy of KV blocks without integrity protection. Standard description of the system architecture used as baseline.

pith-pipeline@v0.9.0 · 5499 in / 1376 out tokens · 72214 ms · 2026-05-10T06:31:08.495171+00:00 · methodology
