We vary the prompt length from 1K to 128K tokens and report the mean and standard deviation over five repeats

51 · 1996

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

SPEED uses layer-asymmetric KV visibility to process non-anchor prompt tokens only in lower layers during prefill, achieving near-baseline quality on Llama-3.1-8B with 33% better TTFT and 25% lower active KV memory at 128K context.

citing papers explorer

Showing 1 of 1 citing paper.

Shallow Prefill, Deep Decoding: Efficient Long-Context Inference via Layer-Asymmetric KV Visibility cs.AI · 2026-05-07 · unverdicted · none · ref 21
SPEED uses layer-asymmetric KV visibility to process non-anchor prompt tokens only in lower layers during prefill, achieving near-baseline quality on Llama-3.1-8B with 33% better TTFT and 25% lower active KV memory at 128K context.

We vary the prompt length from 1K to 128K tokens and report the mean and standard deviation over five repeats

fields

years

verdicts

representative citing papers

citing papers explorer