L ayer S kip: Enabling Early Exit Inference and Self-Speculative Decoding

Elhoushi, Mostafa, Shrivastava, Akshat, Liskovich, Diana, Hosmer, Basil, Wasti, Bram, Lai, Liangzhen · 2024 · DOI 10.18653/v1/2024.acl-long.681

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

open at publisher browse 7 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Two-dimensional early exit optimisation of LLM inference

cs.CL · 2026-03-27 · unverdicted · novelty 7.0

Coordinating layer-wise and sentence-wise early exits in LLMs produces multiplicative speedups of 1.4-2.3x over single-dimension early exit on sentiment classification tasks.

All is Not Lost: LLM Recovery without Checkpoints

cs.DC · 2025-06-18 · conditional · novelty 7.0

CheckFree recovers intermediate stage failures in pipeline-parallel LLM training via neighbor averaging; CheckFree+ adds out-of-order execution to handle first/last stages by copying neighbors, with small embedding storage, outperforming checkpointing and redundancy at 5-10% failure rates by up to

Single-Pass, Depth-Selective Reading for Multi-Aspect Sentiment Analysis

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

DABS is a single-pass framework that builds a depth-ordered substrate from one Transformer encoding and performs lightweight aspect-conditioned readout, cutting computation by up to 60% on multi-aspect ATSA benchmarks while matching prior accuracy.

Sparse Layers are Critical to Scaling Looped Language Models

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Looped MoE models scale better than standard transformers because different experts activate on each loop pass, recovering expressivity without extra parameters, and support superior early exits.

ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

ConfLayers dynamically skips LLM layers based on confidence scores to create adaptive draft models for self-speculative decoding, reporting up to 1.4x speedup over standard generation.

Parcae: Scaling Laws For Stable Looped Language Models

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.

Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models

cs.CL · 2026-01-20 · unverdicted · novelty 5.0

The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

citing papers explorer

Showing 7 of 7 citing papers.

Two-dimensional early exit optimisation of LLM inference cs.CL · 2026-03-27 · unverdicted · none · ref 6
Coordinating layer-wise and sentence-wise early exits in LLMs produces multiplicative speedups of 1.4-2.3x over single-dimension early exit on sentiment classification tasks.
All is Not Lost: LLM Recovery without Checkpoints cs.DC · 2025-06-18 · conditional · none · ref 9
CheckFree recovers intermediate stage failures in pipeline-parallel LLM training via neighbor averaging; CheckFree+ adds out-of-order execution to handle first/last stages by copying neighbors, with small embedding storage, outperforming checkpointing and redundancy at 5-10% failure rates by up to
Single-Pass, Depth-Selective Reading for Multi-Aspect Sentiment Analysis cs.CL · 2026-05-20 · unverdicted · none · ref 24
DABS is a single-pass framework that builds a depth-ordered substrate from one Transformer encoding and performs lightweight aspect-conditioned readout, cutting computation by up to 60% on multi-aspect ATSA benchmarks while matching prior accuracy.
Sparse Layers are Critical to Scaling Looped Language Models cs.LG · 2026-05-09 · unverdicted · none · ref 31
Looped MoE models scale better than standard transformers because different experts activate on each loop pass, recovering expressivity without extra parameters, and support superior early exits.
ConfLayers: Adaptive Confidence-based Layer Skipping for Self-Speculative Decoding cs.LG · 2026-04-16 · unverdicted · none · ref 3
ConfLayers dynamically skips LLM layers based on confidence scores to create adaptive draft models for self-speculative decoding, reporting up to 1.4x speedup over standard generation.
Parcae: Scaling Laws For Stable Looped Language Models cs.LG · 2026-04-14 · unverdicted · none · ref 25
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models cs.CL · 2026-01-20 · unverdicted · none · ref 68
The survey organizes mechanistic interpretability techniques into a Locate-Steer-Improve framework to enable actionable improvements in LLM alignment, capability, and efficiency.

L ayer S kip: Enabling Early Exit Inference and Self-Speculative Decoding

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer