Title resolution pending

From Tokens to Layers: Redefining Stall-Free Scheduling for LLM Serving with Layered Prefill Mitra, T · arXiv 2506.05508

1 Pith paper cites this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill

cs.LG · 2025-10-09 · unverdicted · novelty 6.0

Layered prefill replaces token-chunked prefill with layer-group interleaving in MoE models, cutting TTFT by up to 70%, end-to-end latency by 41%, and per-token energy by 22% while preserving stall-free TBT.

citing papers explorer

Showing 1 of 1 citing paper.

From Tokens to Layers: Redefining Stall-Free Scheduling for MoE Serving with Layered Prefill

cs.LG · 2025-10-09 · unverdicted · none · ref 10

