Quicksilver – speeding up LLM inference through dynamic token halting, KV skipping, contextual token fusion, and adaptive matryoshka quantization, 2025

Danush Khanna, Aditya Kumar Guru, Srivarshinee Sridhar, Zidan Ahmed, Rubhav Bahirwani, Meetu Malhotra, Vinija Jain, Aman Chadha, Amitava Das, Kripabandhu Ghosh · 2025 · arXiv 2506.22396

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Two-dimensional early exit optimisation of LLM inference

cs.CL · 2026-03-27 · unverdicted · novelty 7.0

Coordinating layer-wise and sentence-wise early exits in LLMs produces multiplicative speedups of 1.4-2.3x over single-dimension early exit on sentiment classification tasks.

citing papers explorer

Showing 1 of 1 citing paper.

Two-dimensional early exit optimisation of LLM inference

cs.CL · 2026-03-27 · unverdicted · none · ref 11
