arXiv preprint arXiv:2412.10319 , year =

Yucheng Li, Huiqiang Jiang, Qianhui Wu, Xufang Luo, Surin Ahn, Chengruidong Zhang, Amir H Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, et al · 2024 · arXiv 2412.10319

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective

cs.LG · 2026-04-28 · unverdicted · novelty 7.0

KV cache eviction is unified under an information capacity maximization principle derived from a linear-Gaussian attention surrogate, with CapKV proposed as a leverage-score based implementation that outperforms prior heuristics in experiments.

PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training

cs.LG · 2026-04-23 · unverdicted · novelty 7.0 · 2 refs

Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.

Accuracy Is Speed: Towards Long-Context-Aware Routing for Distributed LLM Serving

cs.DC · 2026-04-17 · unverdicted · novelty 6.0

In long-context LLM serving, accuracy becomes speed via retry dynamics, and accuracy-aware routing reduces time-to-correct-answer.

IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs

cs.LG · 2026-04-12 · unverdicted · novelty 6.0

IceCache combines semantic token clustering with PagedAttention to keep only 25% of the KV cache tokens while retaining 99% accuracy on LongBench and matching or beating prior offloading methods in latency.

ReasonCache: Accelerating Large Reasoning Model Serving through KV Cache Sharing

cs.LG · 2025-07-29 · unverdicted · novelty 5.0

ReasonCache reuses similar KV cache states across reasoning steps in LRMs via collaborative filtering to boost serving throughput by up to 89.2% while preserving accuracy.

A Survey of Scaling in Large Language Model Reasoning

cs.AI · 2025-04-02 · unverdicted · novelty 3.0

A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.

citing papers explorer

Showing 6 of 6 citing papers.

Rethinking KV Cache Eviction via a Unified Information-Theoretic Objective cs.LG · 2026-04-28 · unverdicted · none · ref 15
KV cache eviction is unified under an information capacity maximization principle derived from a linear-Gaussian attention surrogate, with CapKV proposed as a leverage-score based implementation that outperforms prior heuristics in experiments.
PermaFrost-Attack: Stealth Pretraining Seeding(SPS) for planting Logic Landmines During LLM Training cs.LG · 2026-04-23 · unverdicted · none · ref 120 · 2 links
Stealth Pretraining Seeding plants persistent unsafe behaviors in LLMs via diffuse poisoned web content that activates on precise triggers and evades standard evaluation.
Accuracy Is Speed: Towards Long-Context-Aware Routing for Distributed LLM Serving cs.DC · 2026-04-17 · unverdicted · none · ref 15
In long-context LLM serving, accuracy becomes speed via retry dynamics, and accuracy-aware routing reduces time-to-correct-answer.
IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs cs.LG · 2026-04-12 · unverdicted · none · ref 7
IceCache combines semantic token clustering with PagedAttention to keep only 25% of the KV cache tokens while retaining 99% accuracy on LongBench and matching or beating prior offloading methods in latency.
ReasonCache: Accelerating Large Reasoning Model Serving through KV Cache Sharing cs.LG · 2025-07-29 · unverdicted · none · ref 14
ReasonCache reuses similar KV cache states across reasoning steps in LRMs via collaborative filtering to boost serving throughput by up to 89.2% while preserving accuracy.
A Survey of Scaling in Large Language Model Reasoning cs.AI · 2025-04-02 · unverdicted · none · ref 101
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.

arXiv preprint arXiv:2412.10319 , year =

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer