Fast and robust early-exiting framework for autoregressive language models with synchronized parallel decoding

Sangmin Bae, Jongwoo Ko, Hwanjun Song, Se-Young Yun · 2023 · arXiv 2310.05424

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving

cs.DC · 2026-04-22 · unverdicted · novelty 7.0

FASER delivers up to 53% higher throughput and 1.92x lower latency in dynamic LLM serving by adjusting speculative lengths per request, pruning rejected tokens early, and overlapping draft and verification phases via frontiers.

LLM-assisted Agentic Edge Intelligence Framework

cs.DC · 2026-03-11 · unverdicted · novelty 5.0

The LEI framework uses a cloud LLM to dynamically create and update tailored lightweight programs for heterogeneous edge devices; on four sensor datasets it is shown to maintain low CPU and memory use while adapting to changes.

LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation

cs.CL · 2025-07-02 · unverdicted · novelty 5.0

LogitSpec accelerates retrieval-based speculative decoding by speculating the next-next token from the last logit and retrieving relevant references for both next and next-next tokens, reporting up to 2.61x speedup and 3.28 mean accepted tokens.

citing papers explorer

Showing 3 of 3 citing papers.

FASER: Fine-Grained Phase Management for Speculative Decoding in Dynamic LLM Serving · cs.DC · 2026-04-22 · unverdicted · none · ref 4
LLM-assisted Agentic Edge Intelligence Framework · cs.DC · 2026-03-11 · unverdicted · none · ref 22
LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation · cs.CL · 2025-07-02 · unverdicted · none · ref 41
