Hold onto that thought: Assessing KV cache compression on reasoning

Sam Weston, Alice Chen, Parth Shah · 2025 · arXiv 2512.12008

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

cs.LG · 2026-04-23 · unverdicted · novelty 7.0

Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.

How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers

cs.LG · 2026-04-20 · unverdicted · novelty 7.0

For k-hop pointer chasing under cache size s, transformers need depth that scales as the product of ceil(k/s) and log n terms. The paper gives a conjectured lower bound, a proved upper bound via windowed pointer doubling, and an adaptive-oblivious error separation.

Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning

cs.CL · 2026-05-10 · unverdicted · novelty 6.0

A semantics-aware KV cache hierarchy offloads tokens to slower memory with zero approximation error, demonstrating that LLM reasoning accuracy depends only on the permanent eviction ratio and not on HBM residency.

citing papers explorer

Showing 3 of 3 citing papers.

PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

cs.LG · 2026-04-23 · unverdicted · none · ref 119

How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers

cs.LG · 2026-04-20 · unverdicted · none · ref 11

Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning

cs.CL · 2026-05-10 · unverdicted · none · ref 19
