Patchscopes: A unifying framework for inspecting hidden representations of language models
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps
RTPurbo converts full-attention LLMs to sparse attention by retaining full KV for retrieval heads and using a low-dimensional dynamic indexer, achieving near-lossless accuracy after minimal adaptation.