yM } denote assistant target tokens

A Additional Method Details Training-time visibility · arXiv records/2005792

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

SPEED uses layer-asymmetric KV visibility to process non-anchor prompt tokens only in lower layers during prefill, achieving near-baseline quality on Llama-3.1-8B with 33% better TTFT and 25% lower active KV memory at 128K context.

citing papers explorer

Showing 1 of 1 citing paper.

Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility

cs.AI · 2026-05-07 · unverdicted · none · ref 20
