Advancing transformer architecture in long-context large language models: A comprehensive survey
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
representative citing papers
The paper defines the Conformal Hallucination Estimation Metric (CHEM), which localizes hallucination-prone regions in image reconstruction models using multiscale representations and distribution-free conformal regression.
A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.
citing papers explorer
- PARTREP: Learning What to Repeat for Decoder-only LLMs
PartRep selects high-NLL tokens via a lightweight early-exit gate for partial prompt repetition, retaining most full-repetition gains at 59.4% KV cache and 79% prefill FLOPs on eight benchmarks.